<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1188">
  <Title>Information Extraction for Question Answering: Improving Recall Through Syntactic Patterns</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Experimental Setting
</SectionTitle>
    <Paragraph position="0"> We set up experiments to address two related issues.</Paragraph>
    <Paragraph position="1"> First, we wanted to understand how the usual precision/recall trade-off shows up in off-line corpus-based QA, and specifically, whether extracting more data of lower quality (i.e., favoring recall) gives a QA system better performance than extracting smaller amounts of more accurate data (i.e., favoring precision). Second, we tried to verify the hypothesis that syntactic parsing for information extraction does increase extraction recall, by identifying relations between entities that are not adjacent at the surface level but are connected syntactically.</Paragraph>
    <Paragraph position="2"> There are different approaches to the evaluation of information extraction modules. The usual recall and precision metrics (i.e., how many of the interesting bits of information were detected, and how many of the detected bits were actually correct) require either a test corpus previously annotated with the required information, or manual evaluation (Fleischman et al., 2003). Although intrinsic evaluation of an IE module is important, we were mainly interested in measuring the performance of this module in context, that is, working as a sub-part of a QA system. We used the number of questions answered correctly as our main performance indicator.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 QA System
</SectionTitle>
      <Paragraph position="0"> For the experiments described below we used QUARTZ, an open-domain corpus-based QA system (Jijkoun et al., 2004). The system implements a multi-stream approach, in which several different strategies are used in parallel to find possible answers to a question. We ran the system with only one stream enabled, Table Lookup, which implements an off-line strategy for QA.</Paragraph>
      <Paragraph position="1"> The Table Lookup stream uses a number of knowledge bases created by pre-processing a document collection. Currently, QUARTZ' knowledge bases include 14 semi-structured tables containing various kinds of information: birth dates of persons, dates of events, geographical locations of different objects, capitals and currencies of countries, etc. All this information is extracted from the corpus offline, before actual questions are known.</Paragraph>
      <Paragraph position="2"> An incoming question is analyzed and assigned to one of 37 predefined question types. Based on the question type, the Table Lookup stream identifies knowledge bases where answers to the question can potentially be found. The stream uses keywords from the question to identify relevant entries in the selected knowledge bases and extracts candidate answers. Finally, the QA system reranks and sanity checks the candidates and selects the final answer.</Paragraph>
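As an illustration of the routing step just described (and not the actual QUARTZ implementation), the following minimal Python sketch maps a question type to candidate knowledge bases and pulls lookup keywords from the question; the type names, table names, and stopword list are invented for the example.

import string

# Hypothetical mapping from question types to knowledge-base tables.
TYPE_TO_TABLES = {
    "person-identification": ["roles"],
    "person-definition": ["roles"],
    "date-of-birth": ["birthdates"],
    "capital-of": ["capitals"],
}

STOPWORDS = {"who", "what", "was", "is", "the", "of", "a"}

def route_question(question_type: str, question_text: str):
    """Select candidate tables for the question type and extract lookup keywords."""
    tables = TYPE_TO_TABLES.get(question_type, [])
    tokens = [w.strip(string.punctuation).lower() for w in question_text.split()]
    keywords = [t for t in tokens if t and t not in STOPWORDS]
    return tables, keywords

if __name__ == "__main__":
    print(route_question("person-definition", "Who was Abraham Lincoln?"))
    # -> (['roles'], ['abraham', 'lincoln'])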
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Questions and Corpus
</SectionTitle>
      <Paragraph position="0"> To get a clear picture of the impact of using different information extraction methods for the off-line construction of knowledge bases, similarly to Fleischman et al. (2003), we focused only on questions about persons, taken from the TREC-8 through TREC 2003 question sets. The questions we looked at were of two different types: person identification (e.g., 2301. What composer wrote "Die Götterdämmerung"?) and person definition (e.g., 959. Who was Abraham Lincoln?). The knowledge base relevant for answering questions of these types is a table with several fields containing a person name, an information bit about the person (e.g., occupation, position, activities), the confidence value assigned by the extraction modules to this information bit (based on its frequency and the reliability of the patterns used for extraction), and the source document identification. The Table Lookup stream finds the entries whose relevant fields best match the keywords from the question.</Paragraph>
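The table layout just described can be pictured with the following minimal sketch; the field names, example data, and keyword-overlap scoring are our own illustrative assumptions, not the paper's lookup code.

from dataclasses import dataclass

@dataclass
class RoleEntry:
    person: str        # person name
    info_bit: str      # information bit, e.g. occupation, position, activities
    confidence: float  # confidence assigned by the extraction module
    doc_id: str        # source document identification

# Tiny illustrative table (invented data).
TABLE = [
    RoleEntry("Emma Thompson", "British actress", 0.9, "APW19990101.0001"),
    RoleEntry("Gil Garcetti", "District Attorney", 0.8, "NYT19980705.0042"),
]

def lookup(keywords, table):
    """Return entries whose relevant fields best match the question keywords."""
    def score(entry):
        text = f"{entry.person} {entry.info_bit}".lower()
        return sum(1 for kw in keywords if kw.lower() in text)
    ranked = sorted(table, key=score, reverse=True)
    return [entry for entry in ranked if score(entry) > 0]

if __name__ == "__main__":
    for entry in lookup(["Emma", "Thompson"], TABLE):
        print(entry.person, "->", entry.info_bit)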
      <Paragraph position="1"> We performed our experiments with the 336 TREC questions about persons that are known to have at least one answer in the collection. The collection used at TREC 8, 9 and 10 (referred to as TREC-8 in the rest of the paper) consists of 1,727,783 documents, with 239 of the corresponding questions identified by our system as asking about persons. The collection used at TREC 2002 and 2003 (AQUAINT) contains 1,033,461 documents and 97 of the questions for these editions of TREC are person questions.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Extraction of Role Information
</SectionTitle>
    <Paragraph position="0"> In this section we describe the two extraction methods we used to create knowledge bases containing information about persons: extraction using surface text patterns and using syntactic patterns.</Paragraph>
    <Paragraph position="1"> Clearly, the performance of an information extraction module depends on the set of language phenomena or patterns covered, but this relation is not straightforward: having more patterns allows one to find more information, and thus increases recall, but it may also introduce additional noise that hurts precision.</Paragraph>
    <Paragraph position="2"> Since our experiments aimed at comparing extraction modules based on surface text patterns vs. syntactic patterns, we tried to keep the two modules parallel in terms of the phenomena covered.</Paragraph>
    <Paragraph position="3"> First, the collections were tagged with a Named Entity tagger based on TnT (TnT, 2003) and trained on CoNLL data (CoNLL, 2003). The Named Entity tagger was used mainly to identify person names as separate entities. Although the tagging itself was not perfect, we found it useful for restricting our surface text patterns.</Paragraph>
    <Paragraph position="4"> Below we describe the two extraction methods.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Extraction with Surface Text Patterns
</SectionTitle>
      <Paragraph position="0"> To extract information about roles, we used the set of surface patterns originally developed for the QA system we used at TREC 2003 (Jijkoun et al., 2004).</Paragraph>
      <Paragraph position="1"> The patterns are listed in Table 1.
Table 1: Surface patterns and examples.
  Pattern                                  Example
  ... role, person                         The British actress, Emma Thompson
  ... (superlat|first|last) ..., person    The first man to set foot on the moon, Armstrong
  person, ... role ...                     Audrey Hepburn, goodwill ambassador for UNICEF
  person, ... (superlat|first|last) ...    Brown, Democrats' first black chairman
  person, ... role-verb ...                Christopher Columbus, who discovered America, ...
  role person                              District Attorney Gil Garcetti
  role ... person                          The captain of the Titanic Edward John Smith
  person, ... leader ... location          Tony Blair, the prime minister of England
  location ... leader, person              The British foreign secretary, Jack Straw</Paragraph>
      <Paragraph position="2"> In these patterns, person is a phrase that is tagged as person by the Named Entity tagger; role is a word from a list of roles extracted from WordNet (all hyponyms of the word 'person', 15,703 entries);1 role-verb is from a manually constructed list of "important" verbs (discovered, invented, etc.; 48 entries); leader is a phrase identifying leadership, from a manually created list of leaders (president, minister, etc.; 22 entries). Finally, superlat is the superlative form of an adjective and location is a phrase tagged as location by the Named Entity tagger. (Footnote 1: the list of roles is used to filter out snippets that may not be about roles; in some of the experiments below, we turn this filtering mechanism off.)</Paragraph>
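For concreteness, here is a minimal sketch of how one of the surface patterns (role person) might be applied over NE-tagged text; the bracketed tagging format, the tiny role list, and the regular expression are illustrative assumptions, not the system's actual pattern engine.

import re

# Tiny stand-in for the real role list of 15,703 WordNet hyponyms of 'person'.
ROLES = {"actress", "attorney", "captain", "composer"}

# Assume person names arrive from the NE tagger in a simple bracketed form, e.g. "[PER Gil Garcetti]".
PERSON = r"\[PER (?P<person>[^\]]+)\]"

def match_role_person(sentence):
    """Apply the 'role person' pattern: a role word immediately preceding a tagged person."""
    pattern = re.compile(r"\b(?P<role>\w+)\s+" + PERSON)
    for m in pattern.finditer(sentence):
        if m.group("role").lower() in ROLES:
            yield (m.group("person"), m.group("role"))

if __name__ == "__main__":
    sent = "District Attorney [PER Gil Garcetti] announced the decision."
    print(list(match_role_person(sent)))  # -> [('Gil Garcetti', 'Attorney')]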
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Extraction with Syntactic Patterns
</SectionTitle>
      <Paragraph position="0"> To use the syntactic structure of sentences for role information extraction, the collections were parsed with Minipar (Lin, 1998), a broad-coverage dependency parser for English. Minipar is reported to achieve 88% precision and 80% recall with respect to dependency relations when evaluated on the SUSANNE corpus. We found that it performed well on the newspaper and newswire texts of our collections and was fairly robust to the fragmented and ill-formed sentences frequent in this domain. Before extraction, Minipar's output was cleaned and made more compact. For example, we removed some empty nodes in the dependency parses to resolve non-local dependencies. Without losing any important information, this made the parses easier to analyse when developing patterns for extraction.</Paragraph>
      <Paragraph position="1"> Table 2 lists the patterns that were used to extract information about persons; we show syntactic dependencies as arrows from dependents to heads, with Minipar's dependency labels above the arrows.</Paragraph>
      <Paragraph position="2"> As with the earlier surface patterns, role is one of the nouns in the list of roles (hyponyms of person in WordNet) and role-verb is one of the "important" verbs. The only restriction for person was that it should contain a proper noun.</Paragraph>
      <Paragraph position="3"> When an occurrence of a pattern was found in a parsed sentence, the relation (person; info-bit) was extracted, where info-bit is the sequence of all words below role or role-verb in the dependency graph (i.e., all dependents along with their dependents, etc.), excluding the person. For example, for the sentence Jane Goodall, an expert on chimps, says that evidence for sophisticated mental performances by apes has become ever more convincing, which matches the pattern person ←appo role (role being an appositive dependent of person), the extracted information was (Jane Goodall; an expert on chimps).</Paragraph>
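The subtree-collection step just described can be sketched as follows; the toy dependency encoding (a head-to-dependents map over word indices) is an assumption for illustration and does not reproduce Minipar's actual output format.

# Toy parse of "Jane Goodall, an expert on chimps": word indices and head->dependents edges.
WORDS = {1: "Jane", 2: "Goodall", 3: "an", 4: "expert", 5: "on", 6: "chimps"}
CHILDREN = {2: [1, 4],   # 'Goodall' heads 'Jane' and the appositive 'expert'
            4: [3, 5],   # 'expert' heads 'an' and 'on'
            5: [6]}      # 'on' heads 'chimps'

def subtree(node, children):
    """All node indices at or below the given node in the dependency graph."""
    nodes = [node]
    for child in children.get(node, []):
        nodes.extend(subtree(child, children))
    return nodes

def extract_info_bit(role_node, person_nodes, words, children):
    """Words below the role node, in sentence order, excluding the person."""
    keep = sorted(i for i in subtree(role_node, children) if i not in person_nodes)
    return " ".join(words[i] for i in keep)

if __name__ == "__main__":
    # The appositive match puts 'expert' (node 4) in the role slot and nodes {1, 2} in the person slot.
    print(extract_info_bit(4, {1, 2}, WORDS, CHILDREN))  # -> "an expert on chimps"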
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experiments and Results
</SectionTitle>
    <Paragraph position="0"> We ran both surface pattern and syntactic pattern extraction modules on the two collections, with a switch for role filtering. The performance of the Table Lookup stream of our QA system was then evaluated on the 336 role questions using the answer patterns provided by the TREC organizers. An early error analysis showed that many of the incorrect answers were due to the table lookup process (see Section 3) rather than the information extraction method itself: correct answers were in the tables, but the lookup mechanism failed to find them or picked up other, irrelevant bits of information. Since we were interested in evaluating the two extraction methods rather than the lookup mechanism, we performed another experiment: we reduced the sizes of the collections to simplify the automatic lookup.</Paragraph>
    <Paragraph position="1"> For each TREC question with an answer in the collection, NIST provides a list of documents that are known to contain an answer to this question. We put together the document lists for all questions, which left us with much smaller sub-collections (16.4 MB for the questions on the TREC-8 collection and 3.2 MB for those on the AQUAINT collection). Then, we ran the two extraction modules on these small collections and evaluated the performance of the QA system on the resulting tables. All the results reported below were obtained with these sub-collections. A comparison of the extraction modules on the full TREC collections gave very similar relative results.</Paragraph>
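Assembling a sub-collection amounts to taking the union of the per-question document lists; a minimal sketch, assuming the NIST judgments are available as lists of document IDs per question (the identifiers below are invented).

def build_subcollection(per_question_doclists):
    """Union the answer-bearing document IDs of all questions into one sub-collection."""
    doc_ids = set()
    for question_id, docs in per_question_doclists.items():
        doc_ids.update(docs)
    return doc_ids

if __name__ == "__main__":
    doclists = {
        "Q959": ["APW19990101.0001", "NYT19980705.0042"],
        "Q2301": ["NYT19980705.0042", "XIE19990301.0007"],
    }
    print(len(build_subcollection(doclists)), "documents")  # -> 3 documents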
    <Paragraph position="2"> Table 3 gives the results of the different runs for the syntactic pattern extraction and the surface pattern extraction on the TREC-8 collection: the number of correct answers (in the top one and the top three answer candidates) for the 239 person questions. The columns labeled Roles+ show the results for the extraction modules using the list of possible roles from WordNet (Section 4), and the columns labeled Roles- show the results when the extraction modules consider any word as possibly denoting a role. The results of the runs on the AQUAINT collection with 97 questions are shown in Table 4.</Paragraph>
    <Paragraph position="3"> The syntactic pattern module without role filtering scored best of all, with more than a third of the questions answered correctly for the TREC-8 collection.</Paragraph>
    <Paragraph position="4"> Another interesting observation is that in all experiments the modules based on syntactic patterns outperformed the surface-text-based extraction.</Paragraph>
    <Paragraph position="5"> Furthermore, there is a striking difference between the results in Table 3 (questions from TREC 8, 9 and 10) and the results in Table 4 (questions from TREC 2002 and 2003). The questions from the more recent editions of TREC are known to be much harder: indeed, the Table Lookup stream answers only 21% of the questions from TREC 2002 and 2003, vs. 38% for earlier TRECs.</Paragraph>
    <Paragraph position="6"> In all experiments, both for syntactic and surface patterns, using the list of roles as a filtering mechanism decreases the number of correct answers. Using lexical information from WordNet improves the precision of the extraction modules less than it hurts the recall. Moreover, in the context of our knowledge base lookup mechanism, low precision of the extracted information does not seem to be an obstacle: the irrelevant information that gets into the tables is either never asked for or filtered out during the final sanity check and answer selection stage.</Paragraph>
    <Paragraph position="7"> This confirms the conclusions of (Bernardi et al., 2003): in this specific task having more data seems to be more useful than having better data.</Paragraph>
    <Paragraph position="8"> To illustrate the interplay between the precision and recall of the extraction module and the performance of the QA system, Table 5 compares the different extraction mechanisms (syntactic and surface patterns, with and without the list of roles for filtering). The row labelled # facts shows the size of the created knowledge base, i.e., the number of entries of the form (person, info) extracted by each method. The row labelled Precision shows the precision of the extracted information (i.e., how many entries are correct according to a human annotator), estimated by random sampling and manual evaluation of 1% of the data for each table, similarly to Fleischman et al. (2003). The row labelled Corr. answers gives the number of questions correctly answered using the extracted information.</Paragraph>
    <Paragraph position="9"> Table 5: Comparison of the different extraction methods on the TREC-8 collection.</Paragraph>
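The precision estimate used above (random sampling and manual evaluation of 1% of each table) can be sketched as follows; the judge argument stands in for the human annotator and is an assumption of this example.

import random

def estimate_precision(entries, judge, rate=0.01, seed=0):
    """Estimate precision by manually judging a random sample of the extracted entries."""
    random.seed(seed)
    sample_size = max(1, int(len(entries) * rate))
    sample = random.sample(entries, sample_size)
    correct = sum(1 for entry in sample if judge(entry))  # judge(entry) -> True if the entry is correct
    return correct / sample_size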
    <Paragraph position="10"> The results in Table 5 indicate that role filtering affects the syntactic and surface modules quite differently. Filtering seems almost essential for the surface-pattern-based extraction, as it increases the precision from 23% to 68%. This confirms the results of Fleischman et al. (2003): shallow methods may benefit significantly from post-processing.</Paragraph>
    <Paragraph position="11"> On the other hand, the precision improvement for the syntactic module is modest: from 54% to 61%.</Paragraph>
    <Paragraph position="12"> The data from the syntactic module contains much less noise, although the sizes of the extracted tables before role filtering are almost the same. After filtering, the number of valid entries from the syntactic module (i.e., the table size multiplied by the estimated precision) is about 6000. This is substantially better than the recall of the surface module (about 4100 valid entries).</Paragraph>
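The valid-entry counts quoted above follow from a simple back-of-the-envelope identity, stated here for clarity (the figures of roughly 6,000 and 4,100 are the paper's reported estimates after filtering):

\[ \text{valid entries} \approx \#\text{facts} \times \widehat{\text{precision}} \]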
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Error Analysis
</SectionTitle>
    <Paragraph position="0"> In theory, all relatively simple facts extracted by the surface pattern module should also be extracted by the syntactic pattern module. Moreover, the syntactic patterns should extract more facts, especially ones whose structure deviates from the patterns pre-defined in the surface pattern module, e.g., where elements adjacent in the syntactic parse tree are far apart on the surface level. To better understand the differences between the two extraction approaches and to verify the conjecture that syntactic parsing does indeed increase the recall of the extracted information, we performed a further (manual) error analysis, identifying questions that were answered with one extraction method but not with the other.</Paragraph>
    <Paragraph position="1"> Tables 6 and 7 give the breakdown of the performance of the two modules, again in terms of the questions answered correctly. We show the results for the 239 questions on the TREC-8 collection; for the 97 questions on the AQUAINT corpus the relative scores are similar. As Tables 6 and 7 indicate, not all questions answered by the surface pattern module were also answered by the syntactic pattern module, contrary to our expectations. We took a closer look at the questions for which the two modules performed differently.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.1 Syntactic Patterns vs. Surface Patterns
</SectionTitle>
      <Paragraph position="0"> There were three types of errors responsible for the syntactic pattern module producing an incorrect answer for questions correctly answered with surface patterns. The most frequent were parsing errors. For 6 out of 12 questions (see Table 6) the answer was not extracted by the syntactic pattern method because the sentences containing the answers were not parsed correctly. The next most frequent error was caused by the table lookup process. For 4 of the 12 questions, the required information was extracted but simply not selected from the table as the answer, due to a failure of the lookup algorithm. The remaining errors (2 out of 12) were of a different type: in these 2 cases the surface pattern extraction did perform better than the syntactic method. In both cases this was because of the wildcards allowed in the surface patterns. E.g., for the sentence ... aviator Charles Lindbergh married Anne Spencer Morrow ..., the syntactic pattern method extracted only the relation (Charles Lindbergh; aviator), whereas the surface pattern method also extracted (Anne Spencer Morrow; aviator Charles Lindbergh married), because of the pattern role ... person, with role instantiated with aviator and person with Anne Spencer Morrow. In fact, the extracted information is not even correct, because Anne is not an aviator but Lindbergh's wife. However, due to the fuzzy nature of the lookup mechanism, this new entry in the knowledge base allows the QA system to correctly answer question 646. Who was Charles Lindbergh's wife?, which is not answered with the syntactic pattern extraction module.</Paragraph>
      <Paragraph position="1"> To summarize, of the 12 questions where the surface patterns outperformed the syntactic patterns:
* 6 questions were not answered by the syntactic method due to parsing errors,
* 4 were not answered because of a table lookup failure, and
* for 2 the surface-based method was more appropriate.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.2 Surface Patterns vs. Syntactic Patterns
</SectionTitle>
      <Paragraph position="0"> We also took a closer look at the 32 questions for which the syntactic extraction performed better than the surface patterns (see Table 6). For the surface pattern extraction module there were also three types of errors. First, some patterns were missing, e.g., person role-verb ... . The only difference from one of the patterns actually used (person, ... role-verb ...) is that there is no comma between person and role-verb.</Paragraph>
      <Paragraph position="1"> This type of incompleteness of the set of the surface patterns was the cause for 16 errors out of 32.</Paragraph>
      <Paragraph position="2"> The second class of errors was caused by the Named Entity tagger. E.g., Abraham Lincoln was always tagged as a location, so the name never matched any of the surface patterns. Out of 32 questions, 10 were answered incorrectly for this reason. Finally, for 6 questions out of 32, the syntactic extraction performed better because the information could not be captured on the surface level. For example, the surface pattern module did not extract the fact that Oswald killed Kennedy from the sentence ... when Lee Harvey Oswald allegedly shot and killed President John F. Kennedy ..., because none of the patterns matched. Indeed, Lee Harvey Oswald and the potentially interesting verb killed are quite far apart in the text, but there is an immediate relation (subject) on the syntactic level.</Paragraph>
      <Paragraph position="3"> It is worth pointing out that there were no lookup errors for the surface pattern method, even though it used the exact same lookup mechanism as the approach based on syntactic patterns (that did experience various lookup errors, as we have seen).</Paragraph>
      <Paragraph position="4"> It seems that the increased recall of the syntactic pattern approach caused problems by making the lookup process harder.</Paragraph>
      <Paragraph position="5"> To summarize, of the 32 questions answered using the syntactic extraction method but not by the surface pattern approach:
* 16 questions would have required extending the set of surface patterns,
* 10 questions were not answered because of a Named Entity tagging error, and
* 6 questions required syntactic analysis for extraction of the relevant information.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
6.3 Adding Patterns?
</SectionTitle>
      <Paragraph position="0"> We briefly return to a problem noted for extraction based on surface patterns: the absence of certain surface patterns. The surface pattern person role-verb ... was not added because, we felt, it would introduce too much noise into the knowledge base. With dependency parsing this is not an issue, as we can require that person is the subject of role-verb. So in this case the syntactic pattern module has a clear advantage. More generally, while we believe that extraction methods based on hand-crafted patterns are necessarily incomplete (in that they will fail to extract certain relevant facts), these observations suggest that coping with this incompleteness is a more serious problem for the surface patterns than for the syntactic ones.</Paragraph>
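To illustrate the syntactic alternative to the noisy person role-verb surface pattern, here is a minimal sketch that accepts a (person, role-verb) pair only when the person is the verb's syntactic subject; the triple-based edge encoding and the tiny verb list are assumptions, not Minipar's format or the paper's 48-verb inventory.

# Dependency edges as (dependent, label, head) triples for an illustrative parse of
# "Lee Harvey Oswald allegedly shot and killed President John F. Kennedy".
EDGES = [
    ("Oswald", "subj", "shot"),
    ("Oswald", "subj", "killed"),
    ("Kennedy", "obj", "killed"),
]

ROLE_VERBS = {"discovered", "invented", "killed", "shot"}  # tiny stand-in for the verb list

def subject_role_verb_pairs(edges):
    """Extract (person, role-verb) pairs where the person is the verb's syntactic subject."""
    return [(dep, head) for dep, label, head in edges
            if label == "subj" and head in ROLE_VERBS]

if __name__ == "__main__":
    print(subject_role_verb_pairs(EDGES))  # -> [('Oswald', 'shot'), ('Oswald', 'killed')]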
    </Section>
  </Section>
</Paper>