<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1211"> <Title>Question Answering on a Case Insensitive Corpus</Title> <Section position="3" start_page="0" end_page="2" type="intro"> <SectionTitle> 2 Question Answering Based on IE </SectionTitle> <Paragraph position="0"> We use a QA system supported by increasingly sophisticated levels of IE [Srihari & Li 2000] [Li et al. 2002]. Figure 1 presents the underlying IE engine InfoXtract [Srihari et al. 2003] that forms the basis for the QA system. The major information objects extracted by InfoXtract include NEs, Correlated Entity (CE) relationships (e.g. Affiliation, Position etc.), Subject-Verb-Object (SVO) triples, entity profiles, and general or predefined events. These information objects capture the key content of the processed text, preparing a foundation for answering factoid questions.</Paragraph> <Paragraph position="2"> (i) Question Processing, (ii) Text Processing, and (iii) Answer Ranking. In text processing, the case insensitive corpus is first pre-processed for case restoration before being parsed by InfoXtract. In addition, keyword indexing on the corpus is required. For question processing, a special module for Asking Point Identification is called for.</Paragraph> <Paragraph position="3"> Linking the two processing components is the Answer Ranking component that consists of two modules: Snippet Retrieval and Feature Ranking.</Paragraph> <Paragraph position="4"> It is worth noting that there are two types of NE: (i) proper names NeName (including NePerson, NeOrganization, NeLocation, etc.) and (ii) non-name NEs (NeItem) such as time NE (NeTimex) and numerical NE (NeNumex). Close to 40% of the NE questions target non-name NEs. Proper name NEs are more subject to the case effect because recognizing a name in the running text often requires case information. Non-name NEs generally appear in predictable patterns. Pattern matching rules that perform case-insensitive matching are most effective in capturing them.</Paragraph> <Paragraph position="5"> There is a third, optional module Answer Point Identification in our QA system [10], which relies on deep parsing for generating phrase-Answer Ranking relies on access to information from both the Keyword Index as well as the IE</Paragraph> <Section position="1" start_page="2" end_page="2" type="sub_section"> <SectionTitle> Snippet Retrieval </SectionTitle> <Paragraph position="0"> Snippet retrieval generates the top n (we chose 200) most relevant sentence-level candidate answer snippets based on the question processing results.</Paragraph> <Paragraph position="1"> We use two types of evidence for snippet retrieval: (i) keyword occurrence statistics at snippet level (with stop words removed), and (ii) the IE results, including NE Asking Points, Asking Point CE Link, head word of a phrase, etc.</Paragraph> <Paragraph position="2"> If the Question Processing component detects an Asking Point CE Link, the system first attempts to retrieve snippets that contain the corresponding CE relationship. If it fails, it backs off to the corresponding NE Asking Point. This serves as a filter in the sense that only the snippets that contain at least one NE that matches the NE Asking Point are extracted. For questions that do not contain NE Asking Points, the system backs off to keyword-based snippet retrieval.</Paragraph> <Paragraph position="3"> A synonym lexicon is also constructed for query expansion to help snippet retrieval. 
</Section>
<Section position="2" start_page="2" end_page="2" type="sub_section">
<SectionTitle> Feature Ranking </SectionTitle>
<Paragraph position="7"> The purpose of Feature Ranking is to re-rank the candidate snippets based on a list of ranking features.</Paragraph>
<Paragraph position="8"> Given the list of top n snippets retrieved in the previous stage, the Feature Ranking module uses a set of re-ranking features to fine-tune the relevancy measures of the initial list of snippets in order to generate the final top five answer strings required as output. Figure 3 gives the ranking model for the Feature Ranking module.</Paragraph>
<Paragraph position="9"> Each ranking feature f_i(S, Q) contributes a score quantifying the snippet's relevance to the question. The ranking model is given by</Paragraph>
<Paragraph position="10"> Score(S, Q) = \sum_i w_{il} f_i(S, Q) </Paragraph>
<Paragraph position="11"> where l represents the question type of Q and w_{il} gives the weight assigned to ranking feature f_i for question type l.</Paragraph>
</Section>
</Section>
</Paper>