<?xml version="1.0" standalone="yes"?>
<Paper uid="A00-1021">
  <Title>Ranking suspected answers to natural language questions using predictive annotation</Title>
  <Section position="4" start_page="0" end_page="150" type="intro">
    <SectionTitle>
2 System description
</SectionTitle>
    <Paragraph position="0"> Our system (Figure 1) consists of two pieces: an IR component (GuruQA) which returns matching texts, and an answer selection component (AnSel/Werlect) that extracts and ranks potential answers from these texts.</Paragraph>
    <Paragraph position="1"> This paper focuses on the process of ranking potential answers selected by the IR engine, which is itself described in (Prager et al., 1999).</Paragraph>
    <Section position="1" start_page="150" end_page="150" type="sub_section">
      <SectionTitle>
2.1 The Information Retrieval
component
</SectionTitle>
      <Paragraph position="0"> In the context of fact-seeking questions, we made the following observations: * In documents that contain the answers, the query terms tend to occur in close proximity to each other.</Paragraph>
      <Paragraph position="1"> * The answers to fact-seeking questions are usually phrases: &amp;quot;President Clinton&amp;quot;, &amp;quot;in the Rocky Mountains&amp;quot;, and &amp;quot;today&amp;quot;. * These phrases can be categorized by a set of a dozen or so labels (Figure 2) corresponding to question types.</Paragraph>
      <Paragraph position="2"> * The phrases can be identified in text by pattern matching techniques (without full NLP).</Paragraph>
      <Paragraph position="3"> As a result, we defined a set of about 20 categories, each labeled with its own QA-Token, and built an IR system that deviates from the traditional model in three important aspects. * We process the query against a set of approximately 200 question templates, which may replace some of the query words with a set of QA-Tokens, called a SYNclass. Thus &amp;quot;Where&amp;quot; gets mapped to &amp;quot;PLACE$&amp;quot;, but &amp;quot;How long&amp;quot; goes to &amp;quot;@SYN(LENGTH$, DURATION$)&amp;quot;.</Paragraph>
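The query-template step described above can be sketched as a simple rule list: each template maps a question pattern to replacement text containing QA-Tokens, with @SYN grouping alternative tokens into a SYNclass. This is an illustrative sketch only; the pattern set and the rewriting machinery of the actual ~200 templates are assumptions, not the system's implementation.

```python
import re

# Hypothetical miniature of the ~200 question templates: each entry maps a
# question pattern to replacement text containing QA-Tokens (token names
# follow the examples in the text; the patterns themselves are illustrative).
QUERY_TEMPLATES = [
    (re.compile(r"\bwhere\b", re.I), "PLACE$"),
    (re.compile(r"\bhow long\b", re.I), "@SYN(LENGTH$, DURATION$)"),
    # A template that only partially replaces the matched string:
    (re.compile(r"\bwhat is the population\b", re.I), "NUMBER$ population"),
]

def rewrite_query(query: str) -> str:
    """Apply every matching template to the query, in order."""
    for pattern, replacement in QUERY_TEMPLATES:
        query = pattern.sub(replacement, query)
    return query
```

A query such as "Where was the battle?" then reaches the IR engine as "PLACE$ was the battle?", so it can match annotated answer phrases directly.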
      <Paragraph position="4"> Some templates do not cause complete replacement of the matched string. For example, the pattern &amp;quot;What is the population&amp;quot; gets replaced by &amp;quot;NUMBER$ population&amp;quot;. * Before indexing the text, we process it with Textract (Byrd and Ravin, 1998; Wacholder et al., 1997), which performs lemmatization and discovers proper names and technical terms. We added a new module (Resporator) which annotates text segments with QA-Tokens using pattern matching. Thus the text &amp;quot;for 5 centuries&amp;quot; matches the DURATION$ pattern &amp;quot;for :CARDINAL _timeperiod&amp;quot;, where :CARDINAL is the label for cardinal numbers and _timeperiod marks a time expression.</Paragraph>
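The annotation step performed by the Resporator module can be approximated with a single regular expression: a cardinal number followed by a time word yields a duration token. This is a minimal sketch assuming one pattern; the real module matches many patterns over many token classes.

```python
import re

# Illustrative stand-in for the ":CARDINAL _timeperiod" pattern: "\d+"
# plays the role of :CARDINAL, and the alternation plays _timeperiod.
TIME_PERIODS = r"(?:seconds?|minutes?|hours?|days?|weeks?|months?|years?|decades?|century|centuries)"
DURATION_PATTERN = re.compile(r"\bfor\s+\d+\s+" + TIME_PERIODS + r"\b", re.I)

def annotate(text: str):
    """Return (QA-Token, matched span) pairs found in the text."""
    return [("DURATION$", m.group(0)) for m in DURATION_PATTERN.finditer(text)]
```

Running this over indexed text tags spans such as "for 5 centuries" with the DURATION$ token before indexing, so duration questions can retrieve them.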
      <Paragraph position="5"> * GuruQA scores text passages instead of documents. We use a simple document- and collection-independent weighting scheme: QA-Tokens get a weight of 400, proper nouns get 200, and any other word 100 (stop words are removed in query processing after the pattern template matching operation). The density of matching query tokens within a passage contributes a score of 1 to 99 (the highest scores occur when all matched terms are consecutive).</Paragraph>
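The weighting scheme above can be written down directly; only the density formula is an assumption here, since the text specifies its range (1 to 99) and its maximum (all matched terms consecutive) but not its exact form.

```python
# Sketch of GuruQA's passage scoring: QA-Tokens weigh 400, proper nouns 200,
# other words 100.  The density bonus below (matched terms divided by
# passage length, scaled to 1-99) is an assumed formula, not the system's.
def passage_score(matched_tokens, passage_length):
    """matched_tokens: list of (token, kind) pairs, kind being one of
    'qa_token', 'proper_noun', or 'word'; passage_length: tokens in passage."""
    weights = {"qa_token": 400, "proper_noun": 200, "word": 100}
    base = sum(weights[kind] for _, kind in matched_tokens)
    density = int(99 * len(matched_tokens) / max(passage_length, 1))
    return base + min(max(density, 1), 99)
```

Because the weights are fixed constants, the score is independent of document and collection statistics, exactly as the text claims.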
      <Paragraph position="6"> Predictive Annotation works better for Where, When, What, Which and How+adjective questions than for How+verb and Why questions, since the latter are typically not answered by phrases. However, we observed that &amp;quot;by&amp;quot; + the present participle usually indicates the description of a procedure, so we instantiate a METHOD$ QA-Token for such occurrences. We have no such QA-Token for Why questions, but we do replace the word &amp;quot;why&amp;quot; with &amp;quot;@SYN(result, cause, because)&amp;quot;, since the occurrence of any of these words usually betokens an explanation.</Paragraph>
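The two special cases above reduce to one annotation rule and one query rewrite; the sketches below assume a crude "by" + word-ending-in-"ing" test for the present participle, which is looser than real participle detection.

```python
import re

def tag_method(text: str):
    """Tag 'by' + present participle spans with METHOD$ (heuristic:
    any word ending in 'ing'; a real system would use POS tags)."""
    return [("METHOD$", m.group(0)) for m in re.finditer(r"\bby\s+\w+ing\b", text)]

def rewrite_why(query: str) -> str:
    """Replace 'why' with the explanation SYNclass from the text."""
    return re.sub(r"\bwhy\b", "@SYN(result, cause, because)", query, flags=re.I)
```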
    </Section>
  </Section>
</Paper>