<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1211"> <Title>Question Answering on a Case Insensitive Corpus</Title> <Section position="4" start_page="2" end_page="2" type="metho"> <SectionTitle> 3 Case Restoration </SectionTitle> <Paragraph position="0"> This section presents the case restoration approach [Niu et al. 2003] that supports QA on a case insensitive corpus. The flowchart for using Case Restoration as a plug-in preprocessing module to IE is shown in Figure 4.</Paragraph> <Paragraph position="1"> The incoming documents first go through tokenization. In this process, the case information is recorded as features for each token. This token-based case information provides the basic evidence for an optional procedure, called Case Detection, to decide whether the Case Restoration module needs to be called.</Paragraph> <Paragraph position="2"> A simple bi-gram Hidden Markov Model [Bikel et al. 1999] is selected as the language model for this task. Currently, the system is based on a bi-gram model trained on a normal, case sensitive raw corpus in the chosen domain.</Paragraph> <Paragraph position="3"> Three orthographic tags are defined in this model: (i) initial uppercase followed by at least one lowercase, (ii) all lowercase, and (iii) all uppercase. To handle words with low frequency, each word is also associated with one of five single-token features.</Paragraph> <Paragraph position="4"> Given a word sequence $W = w_0 w_1 \cdots w_n$ (where each $w_i$ carries a single-token feature $f_i$ as defined above), the goal of the case restoration task is to find the optimal tag sequence $T = t_0 t_1 t_2 \cdots t_n$ that maximizes the conditional probability $\Pr(T \mid W)$ [Bikel et al. 1999]. By Bayes' rule, this is equivalent to maximizing the joint probability $\Pr(W, T)$. This joint probability can be computed by a bi-gram HMM as $$\Pr(W, T) = \prod_{i=1}^{n} \Pr(w_i, t_i \mid w_{i-1}, t_{i-1}),$$ where each conditional probability is smoothed by backing off to lower-order estimates, down to the uniform distribution $1/V$; here $V$ denotes the size of the vocabulary, the back-off coefficients $\lambda$ are determined using the Witten-Bell smoothing algorithm, and the component probabilities are computed by maximum likelihood estimation (a sketch of the decoding step is given at the end of this section).</Paragraph> <Paragraph position="5"> A separate HMM is trained for bi-grams involving unknown words. The training corpus is separated into two parts; the words occurring in Part I but not in Part II, and the words occurring in Part II but not in Part I, are all replaced by a special symbol #Unknown#. An HMM for unknown words is then trained on this newly marked corpus. In the tagging stage, the unknown word model is used whenever a word beyond the vocabulary occurs.</Paragraph>
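<Paragraph position="6"> The decoding step can be sketched as a Viterbi search over the three orthographic tags. This is a minimal illustrative sketch, not the system's code: all probabilities below are toy numbers rather than the trained model, the START symbol and FLOOR constant are assumptions, and the separate unknown-word HMM is approximated by the floor probability.

import math

TAGS = ["initial_upper", "all_lower", "all_upper"]

# Toy tag transition probabilities Pr(t_i | t_{i-1}), with START for t_0.
TRANS = {
    "START":         {"initial_upper": 0.8, "all_lower": 0.1, "all_upper": 0.1},
    "initial_upper": {"initial_upper": 0.2, "all_lower": 0.7, "all_upper": 0.1},
    "all_lower":     {"initial_upper": 0.1, "all_lower": 0.8, "all_upper": 0.1},
    "all_upper":     {"initial_upper": 0.2, "all_lower": 0.4, "all_upper": 0.4},
}

# Toy emission probabilities Pr(w_i | t_i) over case folded tokens.
EMIT = {
    "initial_upper": {"gil": 0.30, "mayor": 0.05, "the": 0.10},
    "all_lower":     {"gil": 0.01, "mayor": 0.30, "the": 0.50},
    "all_upper":     {"gil": 0.05, "mayor": 0.01, "the": 0.01},
}

FLOOR = 1e-8  # crude stand-in for the separate unknown-word model


def viterbi(words):
    """Return the most probable orthographic tag sequence for words."""
    words = [w.lower() for w in words]
    # lattice[i][t] = (log probability of the best path ending in tag t
    #                  at position i, backpointer to the previous tag)
    lattice = [{t: (math.log(TRANS["START"][t])
                    + math.log(EMIT[t].get(words[0], FLOOR)), None)
                for t in TAGS}]
    for w in words[1:]:
        column = {}
        for t in TAGS:
            emit = math.log(EMIT[t].get(w, FLOOR))
            prev, score = max(
                ((p, lattice[-1][p][0] + math.log(TRANS[p][t]) + emit)
                 for p in TAGS),
                key=lambda pair: pair[1])
            column[t] = (score, prev)
        lattice.append(column)
    # Trace back the optimal path from the best final tag.
    tag = max(TAGS, key=lambda t: lattice[-1][t][0])
    path = [tag]
    for i in range(len(lattice) - 1, 0, -1):
        tag = lattice[i][tag][1]
        path.append(tag)
    return list(reversed(path))


def restore(words):
    """Rewrite each token according to its decoded orthographic tag."""
    out = []
    for w, tag in zip(words, viterbi(words)):
        w = w.lower()
        if tag == "all_upper":
            out.append(w.upper())
        elif tag == "initial_upper":
            out.append(w.capitalize())
        else:
            out.append(w)
    return out


print(restore(["THE", "MAYOR", "GIL"]))  # -> ['The', 'mayor', 'Gil']
</Paragraph>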
</Section> <Section position="5" start_page="2" end_page="3" type="metho"> <SectionTitle> 4 IE Engine Benchmarking </SectionTitle> <Paragraph position="0"> A series of benchmarks has been conducted to evaluate the approach presented in this paper. They indicate that this is a simple but very effective method for handling case insensitive input in NLP, IE and QA.</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> Case Restoration </SectionTitle> <Paragraph position="0"> A raw corpus of 7.6 million words of mixed-case text drawn from the general news domain is used to train case restoration. A separate testing corpus of 0.88 million words drawn from the same domain is used for benchmarking. Table 1 gives the case restoration performance benchmarks; the overall F-measure is 98% (P for Precision, R for Recall, and F for F-measure).</Paragraph> <Paragraph position="1"> The score that is most important for IE is the F-measure for recognizing non-lowercase words. We found that the majority of errors involve missing the first word of a sentence, due to the lack of a powerful sentence-final punctuation detection module in the case restoration stage. However, such 'errors' are found to have almost no negative effect on the subsequent IE tasks.</Paragraph> <Paragraph position="2"> There is no doubt that the lack of case information in the input text will impact NLP/IE/QA performance. The goal of the case restoration module is to minimize this impact. A series of degradation tests has been run to measure it.</Paragraph> </Section> <Section position="2" start_page="2" end_page="3" type="sub_section"> <SectionTitle> Degradation Tests on IE and Parsing </SectionTitle> <Paragraph position="0"> Since IE is the foundation of our QA system, IE degradation due to case insensitive input directly affects QA performance.</Paragraph> <Paragraph position="1"> The IE degradation benchmarking is designed as follows. We start with a testing corpus drawn from normal case sensitive text and feed it into the IE engine for benchmarking. This is the normal benchmarking for case sensitive input and serves as the baseline. After that, we artificially remove the case information by transforming the corpus into all uppercase. The case restoration module is then plugged in to restore the case before the corpus is fed to the IE engine. By comparing the benchmarks obtained with case restoration against the baseline benchmarks, we can calculate the performance degradation from the baseline in handling case insensitive input (a schematic sketch of this loop is given at the end of this subsection).</Paragraph> <Paragraph position="2"> For NE, an annotated testing corpus of 177,000 words is used for benchmarking (Table 2), using an automatic scorer following Message Understanding Conference (MUC) standards. The NE degradation, due to the loss of case information in the incoming corpus, is 2.1%. In fact, positive effects are observed in some cases: the normal English orthographic rule that the first word of a sentence be capitalized can confuse the NE learning system, due to the lack of the usual orthographic distinction between a candidate proper name and a common word.</Paragraph> <Paragraph position="3"> We have also implemented the traditional NE-retraining approach proposed by [Kubala et al. 1998], [Miller et al. 2000] and [Palmer et al. 2000]; the re-trained NE model leads to 6.3% degradation in the NE F-measure, a drop of more than four percentage points compared with the case restoration two-step approach. Since this comparison between the two approaches is based on the same testing corpus and the same system, we can conclude that the case restoration approach is clearly better than the retraining approach for NE.</Paragraph>
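<Paragraph position="4"> The benchmarking loop above can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: restore_case and extract_entities are hypothetical stand-ins for the Case Restoration module and the IE engine, and gold is a hand-annotated answer key; none of these names come from the paper.

def f_measure(gold, predicted):
    """Balanced F-measure over sets of (span, type) annotations."""
    gold, predicted = set(gold), set(predicted)
    if not gold or not predicted:
        return 0.0
    correct = len(gold & predicted)
    precision = correct / len(predicted)
    recall = correct / len(gold)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def degradation(text, gold, extract_entities, restore_case):
    """Absolute F-measure drop of the two-step approach vs. the baseline."""
    baseline = f_measure(gold, extract_entities(text))  # case sensitive input
    # Artificially remove case, restore it, then re-run the IE engine.
    two_step = f_measure(gold, extract_entities(restore_case(text.upper())))
    return baseline - two_step
</Paragraph>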
<Paragraph position="5"> Beyond NE, some fundamental InfoXtract support for QA comes from the CE relationships and the SVO parsing results. We benchmarked their degradation as follows. From a processed corpus drawn from the news domain, we randomly picked 250 SVO structural links and 60 AFFILIATION and POSITION relationships for manual checking (Table 3, COR for Correct, INC for Incorrect, SPU for Spurious, MIS for Missing, and DEG for Degradation).</Paragraph> <Paragraph position="6"> Surprisingly, there is almost no statistically significant difference in SVO performance: the degradation due to case restoration is only 0.07%. This indicates that parsing is less subject to the case factor, to the degree that the performance difference between normal case sensitive input and case restored input is not readily detectable.</Paragraph> <Paragraph position="7"> The degradation for CE is about 6%. Considering that the CE module underwent no adaptation at all, this degradation is reasonable.</Paragraph> </Section> </Section> <Section position="6" start_page="3" end_page="4" type="metho"> <SectionTitle> 5 QA Degradation Benchmarking </SectionTitle> <Paragraph position="0"> The QA experiments were conducted following the TREC-8 QA standards in the category of 250-byte answer strings. In addition to the TREC-8 benchmarking standard Mean Reciprocal Rank (MRR), we also benchmarked the precision of the top answer string (Table 4).</Paragraph> <Paragraph position="1"> Comparing the QA benchmarks with the benchmarks for the underlying IE engine shows that the limited QA degradation is in proportion to the limited degradation in NE, CE and SVO. The following examples illustrate the chain effect: case restoration errors → NE/CE/SVO errors → QA errors.</Paragraph> <Paragraph position="2"> Q137: 'Who is the mayor of Marbella?' This is a CE question; the decoded CE asking relationship is CeHead for the location entity 'Marbella'. In QA on the original case sensitive corpus, the top answer string has a corresponding CeHead relationship extracted, as shown below. Input: Some may want to view the results of the much-publicised activities of the mayor of Marbella, Jesus Gil y Gil, in cleaning up the town.</Paragraph> <Paragraph position="3"> In contrast, the case insensitive processing is shown below: Input: SOME MAY WANT TO VIEW THE RESULTS OF THE MUCH-PUBLICISED ACTIVITIES OF THE MAYOR OF MARBELLA, JESUS GIL Y GIL, IN CLEANING UP THE TOWN → [case restoration] some may want to view the results of the much-publicised activities of the mayor of marbella , Jesus Gil y Gil , in cleaning up the town → [NE tagging] some may want to view the results of the much-publicised activities of the mayor of marbella , <NeMan>Jesus Gil y Gil</NeMan> , in cleaning up the town</Paragraph> <Paragraph position="4"> The CE module failed to extract the relationship for MARBELLA because this relationship is defined only for the entity types NeOrganization and NeLocation, which are absent here due to the failed case restoration of 'MARBELLA' (a toy rendering of this type constraint follows below).</Paragraph>
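<Paragraph position="5"> The type constraint behind this failure can be made concrete with a toy extractor. This is a minimal sketch under stated assumptions: the entity list representation and the function below are hypothetical illustrations, not InfoXtract's actual CE module or API.

def ce_head_links(entities, candidate_head):
    """Return (entity, head) pairs for entities of an eligible type."""
    eligible = {"NeLocation", "NeOrganization"}
    return [(text, candidate_head)
            for text, etype in entities
            if etype in eligible]

# Original case sensitive run: 'Marbella' is tagged NeLocation.
print(ce_head_links([("Marbella", "NeLocation")], "Jesus Gil y Gil"))
# Case restored run: restoration failed, 'marbella' got no entity type.
print(ce_head_links([("marbella", None)], "Jesus Gil y Gil"))  # [] -> no CeHead
</Paragraph>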
<Paragraph position="6"> The next example shows an NE error leading to a problem in QA. Q119: 'What Nobel laureate was expelled from the Philippines before the conference on East Timor?' In question processing, the NE Asking Point is identified as NePerson. Because Mairead Maguire was successfully tagged as NeWoman, the QA system got the correct answer string in the following snippet: Immigration officials at the Manila airport on Saturday expelled Irish Nobel peace prize winner Mairead Maguire.</Paragraph> <Paragraph position="7"> However, the case insensitive processing fails to tag any NePerson in this snippet, so the system misses this answer string. The process is illustrated below. Input: IMMIGRATION OFFICIALS AT THE MANILA AIRPORT ON SATURDAY EXPELLED IRISH NOBEL PEACE PRIZE WINNER MAIREAD MAGUIRE</Paragraph> <Paragraph position="8"> As shown, errors in case restoration cause mistakes in NE grouping and tagging: Irish Nobel Peace Prize Winner Mairead Maguire is wrongly tagged as NeProduct.</Paragraph> <Paragraph position="9"> We also found one interesting case where case restoration actually leads to a QA performance enhancement over the original case sensitive processing. A correct answer snippet is promoted from the 3rd candidate to the top in answering Q191: 'Where was Harry Truman born?'. This process is shown below. Input: HARRY TRUMAN (33RD PRESIDENT): BORN MAY 8, 1884, IN LAMAR, MO.</Paragraph> <Paragraph position="10"> As shown, LAMAR, MO gets correctly tagged as NeCity after case restoration, whereas LAMAR is mis-tagged as NeOrg in the original case sensitive processing. The original case sensitive snippet is Harry Truman (33rd President): Born May 8, 1884, in Lamar, Mo. In our NE system, there is a learned pattern of the form X , TwoLetterUpperCase → NeCity. This rule fails to apply to the original text because the US state abbreviation appears in the less frequently seen format Mo instead of MO. However, the restoration HMM assigns all uppercase to 'MO', since this is the most frequently seen orthography for this token. This difference between the restored case and the original case enables the NE tagger to tag Lamar, MO as NeCity, which meets the NE Asking Point constraint NeLocation (a toy rendering of this pattern is sketched below).</Paragraph>
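<Paragraph position="11"> The learned pattern above can be rendered as a small regular expression. The pattern name and the NeCity tag come from the paper; the regular expression itself is an illustrative stand-in for the learned rule, not the actual NE grammar.

import re

# X , TwoLetterUpperCase -> NeCity: a capitalized token followed by a
# comma and a two-letter all-uppercase state abbreviation.
NE_CITY = re.compile(r"\b([A-Z][a-zA-Z]+)\s*,\s*([A-Z]{2})\b")

def tag_ne_city(text):
    """Return (city, state) pairs matched by the toy NeCity rule."""
    return NE_CITY.findall(text)

print(tag_ne_city("Born May 8, 1884, in Lamar, MO."))  # [('Lamar', 'MO')]
print(tag_ne_city("Born May 8, 1884, in Lamar, Mo."))  # [] : 'Mo' does not match
</Paragraph>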
<Section position="1" start_page="3" end_page="4" type="sub_section"> <SectionTitle> QA and Case Insensitive Questions </SectionTitle> <Paragraph position="0"> We also conducted a test on case insensitive questions, in addition to the case insensitive corpus, by calling the same case restoration module. This research is useful because, when a speech recognizer is interfaced to a QA system to accept spoken questions, case information is not available in the incoming question. We want to know how the same case restoration technique applies to question processing, and to gauge the degradation effect on the QA performance (Table 5).</Paragraph> <Paragraph position="1"> In addition to missing case information, there are other aspects of spoken questions that require treatment, e.g. the lack of punctuation marks, spelling mistakes, and repetitions. Whether the restoration approach is effective for these calls for more research.</Paragraph> <Paragraph position="2"> We notice that the question processor missed two originally detected NE Asking Points and one Asking Point CE Link. There are a number of other errors due to incorrectly restored case, including non-asking-point NEs in the question and grouping errors in shallow parsing, as shown below for Q26: 'What is the name of the &quot;female&quot; counterpart to El Nino, which results in cooling temperatures and very dry weather?' (Notation: NP for Noun Phrase, VG for Verb Group, PP for Prepositional Phrase and AP for Adjective Phrase.) Input: WHAT IS THE NAME OF THE &quot;FEMALE&quot; COUNTERPART TO EL NINO, WHICH RESULTS IN COOLING TEMPERATURES AND VERY DRY WEATHER?</Paragraph> <Paragraph position="3"> In the original mixed-case question, after parsing, we get the following basic phrase grouping: NP[What] VG[is] NP[the name] PP[of the &quot;female&quot; counterpart] PP[to El Nino] , ... ? There is only one difference between the case-restored question and the original mixed-case question, i.e. Female vs. female. This difference causes the shallow parsing grouping error for the PP of the &quot;female&quot; counterpart. This error affects the weights of the ranking features Headword Matching and Phrase-internal Word Order (a toy illustration is sketched at the end of this section). As a result, the following originally correctly identified answer snippet was dropped: the greenhouse effect and El Nino -- as well as its &quot;female&quot; counterpart, La Nina -- have had a profound effect on weather nationwide.</Paragraph> <Paragraph position="4"> As question processing results are the starting point and basis for snippet retrieval and feature ranking, an error in question processing leads to greater degradation: an almost 10% drop, compared with the roughly 3% drop when only the corpus is case insensitive.</Paragraph> <Paragraph position="5"> A related explanation for this contrast is as follows. Due to the information redundancy in a large corpus, processing errors in some potential answer strings can be compensated for by correctly processed equivalent answer strings: the same answer may be expressed in numerous ways in the corpus, and some of those ways are less subject to the case effect than others. Question processing errors are fatal in the sense that there is no information redundancy to compensate for them; once the question analysis is wrong, it directs the search for answer strings in the wrong direction. Since questions constitute a subset of natural language phenomena with their own characteristics, case restoration needs to adapt to this subset for optimal performance, e.g. by including more questions in the case restoration training corpus.</Paragraph>
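<Paragraph position="6"> The effect of the grouping error above on the Headword Matching feature can be illustrated with a toy scorer. This is a speculative simplification: the headword extraction, the unweighted overlap score, and the erroneous grouping shown below are all hypothetical, not the system's actual ranking features or parser output.

def headwords(phrases):
    """Take the last token of each basic phrase as its headword."""
    return {p.split()[-1].lower() for p in phrases}

def headword_match_score(question_phrases, snippet_phrases):
    """Fraction of question headwords found among snippet headwords."""
    q, s = headwords(question_phrases), headwords(snippet_phrases)
    return len(q & s) / len(q) if q else 0.0

snippet = ['its "female" counterpart', 'La Nina']
good_q = ['What', 'is', 'the name', 'of the "female" counterpart', 'to El Nino']
bad_q = ['What', 'is', 'the name of the', '"Female"', 'counterpart to El Nino']

print(headword_match_score(good_q, snippet))  # 0.2: 'counterpart' heads a PP
print(headword_match_score(bad_q, snippet))   # 0.0 after the grouping error
</Paragraph> </Section> </Section> </Paper>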