
<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1076">
  <Title>Hong Kong</Title>
  <Section position="5" start_page="605" end_page="606" type="metho">
    <SectionTitle>
3. Web-derived Answer Patterns
</SectionTitle>
    <Paragraph position="0"> In addition to using metadata for RC, the proposed approach also leverages knowledge sources that are external to the core RC resources - primarily the Web and other available corpora. This section describes our approach to automatically deriving answer patterns from the Web, as well as scoring useful answer patterns to aid RC. We use the open-domain question-answer pairs (2393 in all) from the Question Answering track of TREC (TREC8-TREC12) as the basis for automatic answer pattern acquisition.</Paragraph>
    <Section position="1" start_page="605" end_page="606" type="sub_section">
      <SectionTitle>
3.1 Deriving Question Patterns
</SectionTitle>
      <Paragraph position="0"> We define a set of question tags (Q_TAGS) that extend the metadata above in order to represent question patterns. The tags include one for main verbs (Q_MVerb), three for named entities (Q_LCN for locations, Q_PRN for persons and Q_ORG for organizations) and one for base noun phrases (Q_BNP). We are also careful to ensure that noun phrases tagged as named entities are not further tagged as base noun phrases.</Paragraph>
      <Paragraph position="1">  A question pattern is expressed in terms of Q_TAGS. A question pattern can be used to represent multiple questions in the TREC QA resource. An example is shown in Table 5.</Paragraph>
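      <Paragraph> The paper does not name its taggers, so as a rough illustration the sketch below approximates Q_TAG labelling with the off-the-shelf spaCy pipeline; the function tag_question and the entity-label mapping are our own hypothetical names, not the authors' implementation.
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical mapping from spaCy entity labels to the paper's Q_TAGS.
NE_TO_QTAG = {"PERSON": "Q_PRN", "GPE": "Q_LCN", "LOC": "Q_LCN", "ORG": "Q_ORG"}

def tag_question(question):
    """Return (pattern, bindings): the question with content spans replaced
    by Q_TAGs, plus the surface string each tag binds to."""
    doc = nlp(question)
    spans, taken = [], set()
    # Named entities first, so NE noun phrases are not re-tagged as Q_BNP.
    for ent in doc.ents:
        qtag = NE_TO_QTAG.get(ent.label_)
        if qtag:
            spans.append((ent.start, ent.end, qtag, ent.text))
            taken.update(range(ent.start, ent.end))
    # Base noun phrases that do not overlap a named entity.
    for np in doc.noun_chunks:
        if not any(i in taken for i in range(np.start, np.end)):
            spans.append((np.start, np.end, "Q_BNP", np.text))
    # Main verb: the lemma of the syntactic root.
    for tok in doc:
        if tok.dep_ == "ROOT" and tok.pos_ == "VERB":
            spans.append((tok.i, tok.i + 1, "Q_MVerb", tok.lemma_))
    spans.sort()
    out, bindings, i = [], {}, 0
    for start, end, qtag, text in spans:
        out.extend(t.text for t in doc[i:start])
        out.append(qtag)
        bindings[qtag] = text
        i = end
    out.extend(t.text for t in doc[i:])
    # E.g. "When did Alexander Graham Bell invent the telephone?" yields
    # roughly "When did Q_PRN Q_MVerb Q_BNP ?" with the obvious bindings.
    return " ".join(out), bindings
</Paragraph>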
      <Paragraph position="2"> Tagging the TREC QA resource provides us with a set of question patterns {QP_i}, where each pattern QP_i covers m_i of the TREC questions.</Paragraph>
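      <Paragraph> Grouping the tagged questions by their pattern string then yields the pattern set and, per pattern, the number of member questions (the m_i used in Section 3.3); a minimal sketch reusing the hypothetical tag_question above:
from collections import defaultdict

def collect_question_patterns(qa_pairs):
    """qa_pairs: (question, answer) string pairs from the TREC QA track."""
    patterns = defaultdict(list)
    for question, answer in qa_pairs:
        qp, _ = tag_question(question)
        patterns[qp].append((question, answer))
    return patterns  # each QP_i covers m_i = len(patterns[qp_i]) questions
</Paragraph>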
    </Section>
    <Section position="2" start_page="606" end_page="606" type="sub_section">
      <SectionTitle>
3.2 Deriving Answer Patterns
</SectionTitle>
      <Paragraph position="0"> For each question pattern, we aim to derive answer patterns for it automatically from the Web.</Paragraph>
      <Paragraph position="1"> The set of answer patterns captures possible ways of embedding a specific answer in an answer sentence. We describe the algorithm for deriving answer patterns below and illustrate it with the following question-answer pair from TREC QA: Q: When did Alexander Graham Bell invent the telephone? A: 1876</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="606" end_page="606" type="metho">
    <SectionTitle>
1. Formulate the Web Query
</SectionTitle>
    <Paragraph position="0"> The question is tagged and the Web query is formulated as "Q_TAG" + "ANSWER", i.e.
Question: "When did Alexander Graham Bell invent the telephone?"
QP: When do Q_PRN Q_MVerb Q_BNP ?
where Q_PRN = "Alexander Graham Bell", Q_MVerb = "invent", and Q_BNP = "the telephone", hence
Web query: "Alexander Graham Bell" + "invent" + "the telephone" + "1876"
2. Web Search and Snippet Selection
The Web query is submitted to the search engine Google using the GoogleAPI and the top 100 snippets are downloaded. From each snippet, we select up to ten contiguous words to the left as well as to the right of the "ANSWER" for answer pattern extraction. The selected words must not cross a snippet boundary, which Google denotes with '...'.</Paragraph>
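    <Paragraph> As a minimal sketch of steps 1 and 2, the code below formulates the quoted query and cuts the ten-word windows around the answer; all names here are hypothetical, and the web search itself is left out since the GoogleAPI service the paper used has been retired.
def formulate_query(bindings, answer):
    # "Q_TAG" + "ANSWER": the quoted tag fillers plus the quoted answer, e.g.
    # "Alexander Graham Bell" + "invent" + "the telephone" + "1876"
    parts = list(bindings.values()) + [answer]
    return " + ".join('"%s"' % p for p in parts)

def select_windows(snippet, answer, width=10):
    """Up to `width` contiguous words on each side of the answer,
    never crossing a snippet boundary, which Google marks with '...'."""
    windows = []
    for segment in snippet.split("..."):
        words = segment.split()
        for i, w in enumerate(words):
            if answer in w:
                left = words[max(0, i - width):i]
                right = words[i + 1:i + 1 + width]
                windows.append(" ".join(left + [w] + right))
    return windows
</Paragraph>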
  </Section>
  <Section position="7" start_page="606" end_page="606" type="metho">
    <SectionTitle>
3. Answer Pattern Selection
</SectionTitle>
    <Paragraph position="0"> We label the terms in each selected snippet with the Q_TAGs from the question as well as the answer tag &lt;A&gt;. The shortest string containing all of these tags is extracted as the answer pattern (AP). For example: Snippet 1: 1876, Alexander Graham Bell invented the telephone in the United States...</Paragraph>
  </Section>
  <Section position="8" start_page="606" end_page="606" type="metho">
    <SectionTitle>
AP 1: &lt;A&gt;, Q_PRN Q_MVerb Q_BNP.
</SectionTitle>
    <Paragraph position="0"> (N.B. The answer tag &lt;A&gt; denotes "1876" in this example).</Paragraph>
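    <Paragraph> One way to realize the shortest-string extraction, assuming the snippet window has already been relabelled with the Q_TAGs and the answer tag; a hedged sketch with hypothetical names:
def extract_answer_pattern(tokens, required):
    """tokens: the relabelled window, e.g.
    ['&lt;A&gt;', ',', 'Q_PRN', 'Q_MVerb', 'Q_BNP', 'in', 'the', ...];
    required: the set of tags that must all appear in the span."""
    best = None
    for i in range(len(tokens)):
        if tokens[i] not in required:
            continue  # a shortest span must start at a required tag
        seen = set()
        for j in range(i, len(tokens)):
            if tokens[j] in required:
                seen.add(tokens[j])
            if seen == required:
                span = tokens[i:j + 1]
                if best is None or len(best) > len(span):
                    best = span
                break
    # For Snippet 1 this returns '&lt;A&gt; , Q_PRN Q_MVerb Q_BNP', i.e. AP 1.
    return " ".join(best) if best is not None else None
</Paragraph>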
    <Paragraph position="1"> Our algorithm for Web-derived answer patterns calls for specific answers, such as a factoid expressed in a word or phrase. Hence the question-answer pairs from TREC QA are suitable for use. On the other hand, Remedia is less suitable here because it contains labelled answer sentences instead of factoids. Including whole answer sentences in Web query formulation generally does not return the answer patterns that we seek in this work.</Paragraph>
    <Section position="1" start_page="606" end_page="606" type="sub_section">
      <SectionTitle>
3.3 Scoring the Acquired Answer Patterns
</SectionTitle>
      <Paragraph position="0"> The answer pattern acquisition algorithm returns multiple answer patterns for every question-answer pair submitted to the Web. In this subsection we present an algorithm for deriving scores for these answer patterns. The methodology is motivated by the concept of confidence level, similar to that used in data mining. The algorithm is as follows:</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="606" end_page="607" type="metho">
    <SectionTitle>
1. Formulate the Web Query
</SectionTitle>
    <Paragraph position="0"> For each question pattern QP_i obtained previously, randomly select an example question among the m_i questions that belong to this pattern. The question is tagged and the Web query is formulated in terms of the Q_TAGs only. (Please note that the corresponding answer is excluded from Web query formulation here, which differs from the answer pattern acquisition algorithm.) E.g.,
Question: "When did Alexander Graham Bell invent the telephone?"
Q_TAGs: Q_PRN Q_MVerb Q_BNP
Web query: "Alexander Graham Bell" + "invent" + "the telephone"
2. Web Search and Snippet Selection
The Web query is submitted to the search engine Google and the top 100 snippets are downloaded.
3. Scoring each Answer Pattern AP_ij
Using the question pattern QP_i and the retrieved snippets, we tally two counts for each answer pattern AP_ij: c_total(i,j), the number of retrieved snippets that match AP_ij, and c_correct(i,j), the number of those snippets for which the tag &lt;A&gt; matches the correct answer. Example questions are sampled per pattern in order to achieve decent coverage of the available examples. The confidence for AP_ij is then
Confidence(AP_ij) = c_correct(i,j) / c_total(i,j)    (3)
Equation (3) assigns high confidence values to answer patterns AP_ij that choose the correct answers, while other answer patterns are assigned low confidence values. E.g.:
&lt;A&gt;, Q_PRN Q_MVerb Q_BNP (Confidence=0.8)
Q_MVerb by Q_PRN in &lt;A&gt;. (Confidence=0.76)</Paragraph>
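    <Paragraph> A minimal sketch of the confidence computation in equation (3), assuming answer patterns are instantiated as regular expressions; ap_to_regex, its slot-filling strategy, and the count names c_total/c_correct are our own illustration, not the authors' code:
import re

def ap_to_regex(ap, bindings):
    """Instantiate an AP: Q_TAGs become their fillers from the example
    question, and the answer tag becomes a capture group."""
    for qtag, text in bindings.items():
        ap = ap.replace(qtag, text)
    parts = [re.escape(p) for p in ap.split("&lt;A&gt;")]
    return "(.+?)".join(parts)

def confidence(ap, bindings, answer, snippets):
    """Equation (3): c_correct(i,j) / c_total(i,j) over retrieved snippets."""
    c_total = c_correct = 0
    regex = re.compile(ap_to_regex(ap, bindings), re.IGNORECASE)
    for snippet in snippets:
        m = regex.search(snippet)
        if m:
            c_total += 1
            if answer.lower() in m.group(1).lower():
                c_correct += 1
    return c_correct / c_total if c_total else 0.0
</Paragraph>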
    <Section position="1" start_page="607" end_page="607" type="sub_section">
      <SectionTitle>
3.4 Answer Pattern Matching in RC
</SectionTitle>
      <Paragraph position="0"> The Web-derived answer patterns are used in the RC task. Based on the question and its QP, we select the related APs to match among the answer sentence candidates. The candidate that matches the highest-scoring AP is selected. We find this technique very effective for RC, as it can discriminate among candidate answer sentences that are rated "equally good" by the BOW or metadata matching approaches, e.g.:
Q: When is the Chinese New Year?
QP: When is the Q_BNP ? where Q_BNP = Chinese New Year
Related AP: Q_BNP is &lt;A&gt; (Confidence=0.82)
Candidate answer sentence 1: you must wait a few more weeks for the Chinese New Year.</Paragraph>
      <Paragraph position="1"> Candidate answer sentence 2: Chinese New Year is most often between January 20 and February 20.</Paragraph>
      <Paragraph position="2"> Both candidate answer sentences have the same number of matching terms - "Chinese", "New" and "Year" - and the same metadata, i.e.</Paragraph>
      <Paragraph position="3"> Q_BNP = Chinese New Year. The term "is" is excluded by stopword removal. However, the Web-derived answer pattern is able to select the second candidate as the correct answer sentence.</Paragraph>
      <Paragraph position="4"> Hence our system gives high priority to the Web-derived AP - if a candidate answer sentence can match an answer pattern with confidence &gt; 0.6, the candidate is taken as the final answer. No further knowledge constraints will be enforced.</Paragraph>
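      <Paragraph> The selection rule just described, as a sketch (reusing the hypothetical ap_to_regex from the scoring sketch above): among the candidate answer sentences, take the one matching the highest-confidence related AP, and accept it outright when the confidence clears 0.6.
def select_by_answer_pattern(candidates, related_aps, bindings, threshold=0.6):
    """related_aps: (ap, confidence) pairs derived for the question's QP."""
    best, best_conf = None, 0.0
    for ap, conf in related_aps:
        regex = re.compile(ap_to_regex(ap, bindings), re.IGNORECASE)
        for cand in candidates:
            if conf > best_conf and regex.search(cand):
                best, best_conf = cand, conf
    if best_conf > threshold:
        return best  # taken as the final answer; no further constraints
    return None  # fall back to the other knowledge sources
</Paragraph>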
    </Section>
  </Section>
  <Section position="10" start_page="607" end_page="608" type="metho">
    <SectionTitle>
4. Context Assistance
</SectionTitle>
    <Paragraph position="0"> During RC, the initial application of the BOW approach focuses the system's attention on a small set of answer sentence candidates. However, it may occur that the true answer sentence is not contained in this set. As observed by (Riloff and Thelen, 2000) and (Charniak et al., 2000), the correct answer sentence often precedes or follows the sentence with the highest number of matching words. Hence both the preceding and following context sentences are searched in their work to find the answer sentence, especially for why questions.</Paragraph>
    <Paragraph position="1"> Our proposed approach references this idea in leveraging contextual knowledge for RC.</Paragraph>
    <Paragraph position="2"> Incorporation of contextual knowledge is very effective when used in conjunction with named entity (NE) identification. For instance, who questions should be answered with words tagged with Q_PRN (for persons). If the candidate sentence with the highest number of matching words does not contain the appropriate NE, it will not be selected as the answer sentence. Instead, our system searches among the two preceding and two following context sentences for the appropriate NE. Table 6 offers an illustration.</Paragraph>
    <Paragraph position="3"> Data analysis of the Remedia training set shows that the selected context window size is appropriate for when, who and where questions.</Paragraph>
    <Section position="1" start_page="607" end_page="608" type="sub_section">
      <SectionTitle>
Football Catches On Fast
</SectionTitle>
      <Paragraph position="0"> (LATROBE, PA., September 4, 1895) - The new game of football is catching on fast, and each month new teams are being formed.</Paragraph>
      <Paragraph position="1"> Last night was the first time that a football player was paid. The man's name is John Brallier, and he was paid $10 to take the place of someone who was hurt....</Paragraph>
      <Paragraph position="2"> Question: Who was the first football player to be paid? Sentence with maximum # matching words: Last night was the first time that a football player was paid.</Paragraph>
      <Paragraph position="3"> Correct answer sentence: The man's name is John Brallier, and he was paid $10 to take the place of someone who was hurt.</Paragraph>
      <Paragraph position="4">  As for why questions, a candidate answer sentence is selected from the context window if its first word is one of "this", "that", "these", "those", "so" or "because". We did not utilize contextual constraints for what questions.</Paragraph>
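      <Paragraph> The two context rules above admit a compact sketch: for when/who/where questions, fall back to the two preceding and two following sentences when the best BOW match lacks the expected NE type; for why questions, prefer a window sentence that opens with a causal cue word. All names below are hypothetical.
WHY_CUES = {"this", "that", "these", "those", "so", "because"}

def search_context(sentences, best_idx, has_expected_ne, qtype):
    """sentences: the story; best_idx: index of the sentence with the most
    matching words; has_expected_ne(s): True if s contains the NE type the
    question calls for (e.g. a person, Q_PRN, for a who question)."""
    lo = max(0, best_idx - 2)
    window = sentences[lo:best_idx + 3]  # two sentences on each side
    if qtype in ("when", "who", "where"):
        if has_expected_ne(sentences[best_idx]):
            return sentences[best_idx]
        for s in window:
            if has_expected_ne(s):
                return s
    elif qtype == "why":
        for s in window:
            first = s.split()[:1]
            if first and first[0].lower().strip(".,") in WHY_CUES:
                return s
    return sentences[best_idx]  # no contextual override for what questions
</Paragraph>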
    </Section>
  </Section>
</Paper>