<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1135">
  <Title>Improving QA Accuracy by Question Inversion</Title>
  <Section position="5" start_page="1073" end_page="1073" type="metho">
    <SectionTitle>
3 Algorithm
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1073" end_page="1073" type="sub_section">
      <SectionTitle>
3.1 System Architecture
</SectionTitle>
      <Paragraph position="0"> A simplified block-diagram of our PIQUANT system is shown in Figure 1. The outer block on the left, QS1, is our basic QA system, in which the</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1073" end_page="1075" type="metho">
    <SectionTitle>
QUESTION PROCESSING (QP), SEARCH (S) and
ANSWER SELECTION (AS) subcomponents are indi-
</SectionTitle>
    <Paragraph position="0"> cated. The outer block on the right, QS2, is another QA-System that is used to answer the inverted questions. In principle QS2 could be QS1 but parameterized differently, or even an entirely different system, but we use another instance of QS1, as-is. The block in the middle is our Constraints Module CM, which is the subject of this paper.</Paragraph>
    <Paragraph position="1">  The Question Processing component of QS2 is not used in this context since CM simulates its output by modifying the output of QP in QS1, as described in Section 3.3.</Paragraph>
    <Section position="1" start_page="1074" end_page="1075" type="sub_section">
      <SectionTitle>
3.2 Inverting Questions
</SectionTitle>
      <Paragraph position="0"> Our open-domain QA system employs a named-entity recognizer that identifies about a hundred types. Any of these can be answer types, and there are corresponding sets of patterns in the QUESTION PROCESSING module to determine the answer type sought by any question. When we wish to invert a question, we must find an entity in the question whose type we recognize; this entity then becomes the sought answer for the inverted question. We call this entity the inverted or pivot term.</Paragraph>
      <Paragraph position="1"> Thus for the question:  (1) &amp;quot;What was the capital of Germany in 1985?&amp;quot; Germany is identified as a term with a known type (COUNTRY). Then, given the candidate answer &lt;CANDANS&gt;, the inverted question becomes (2) &amp;quot;Of what country was &lt; CANDANS&gt; the capital in 1985?&amp;quot; Some questions have more than one invertible term. Consider for example: (3) &amp;quot;Who was the 33 rd president of the U.S.?&amp;quot; This question has 3 inversion points: (4) &amp;quot;What number president of the U.S. was &lt;CANDANS&gt;?&amp;quot; (5) &amp;quot;Of what country was &lt;CANDANS&gt; the 33 rd president?&amp;quot; (6) &amp;quot;&lt;CANDANS&gt; was the 33 rd  what of the U.S.?&amp;quot; Having more than one possible inversion is in theory a benefit, since it gives more opportunity for enforcing consistency, but in our current implementation we just pick one for simplicity. We observe on training data that, in general, the smaller the number of unique instances of an answer type, the more likely it is that the inverted question will be correctly answered. We generated a set NELIST of the most frequently-occurring named-entity types in questions; this list is sorted in order of estimated cardinality. null It might seem that the question inversion process can be quite tricky and can generate possibly unnatural phrasings, which in turn can be difficult to reparse. However, the examples given above were simply English renditions of internal inverted structures - as we shall see the system does not need to use a natural language representation of the inverted questions. Some questions are either not invertible, or, like &amp;quot;How did X die?&amp;quot; have an inverted form (&amp;quot;Who died of cancer?&amp;quot;) with so many correct answers that we know our algorithm is unlikely to benefit us. However, as it is constituted it is unlikely to hurt us either, and since it is difficult to automatically identify such questions, we don't attempt to intercept them. As reported in (Prager et al. 2004a), an estimated 79% of the questions in TREC question sets can be inverted meaningfully. This places an upper limit on the gains to be achieved with our algorithm, but is high enough to be worth pursuing.</Paragraph>
    </Section>
    <Section position="2" start_page="1075" end_page="1075" type="sub_section">
      <SectionTitle>
3.3 Inversion Algorithm
</SectionTitle>
      <Paragraph position="0"> As shown in the previous section, not all questions have easily generated inverted forms (even by a human). However, we do not need to explicate the inverted form in natural language in order to process the inverted question.</Paragraph>
      <Paragraph position="1"> In our system, a question is processed by the</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="1075" end_page="1076" type="metho">
    <SectionTitle>
QUESTION PROCESSING module, which produces a
</SectionTitle>
    <Paragraph position="0"> structure called a QFrame, which is used by the subsequent SEARCH and ANSWER SELECTION modules.</Paragraph>
    <Paragraph position="1"> The QFrame contains the list of terms and phrases in the question, along with their properties, such as POS and NE-type (if it exists), and a list of syntactic relationship tuples. When we have a candidate answer in hand, we do not need to produce the inverted English question, but merely the QFrame that would have been generated from it. Figure 1 shows that the CONSTRAINTS MODULE takes the QFrame as one of its inputs, as shown by the link from QP in QS1 to CM. This inverted QFrame can be generated by a set of simple transformations, substituting the pivot term in the bag of words with a candidate answer &lt;CANDANS&gt;, the original answer type with the type of the pivot term, and in the relationships the pivot term with its type and the original answer type with &lt;CANDANS&gt;. When relationships are evaluated, a type token will match any instance of that type. Figure 2 shows a simplified view of the original QFrame for &amp;quot;What was the capital of Germany in 1945?&amp;quot;, and Figure 3 shows the corresponding Inverted QFrame. COUNTRY is determined to be a better type to invert than YEAR, so &amp;quot;Germany&amp;quot; becomes the pivot. In Figure 3, the token &lt;CANDANS&gt; might take in turn &amp;quot;Berlin&amp;quot;, &amp;quot;Moscow&amp;quot;, &amp;quot;Prague&amp;quot; etc.</Paragraph>
    <Paragraph position="2">  The output of QS2 after processing the inverted QFrame is a list of answers to the inverted question, which by extension of the nomenclature we call &amp;quot;inverted answers.&amp;quot; If no term in the question has an identifiable type, inversion is not possible.</Paragraph>
    <Section position="1" start_page="1075" end_page="1076" type="sub_section">
      <SectionTitle>
3.4 Profiting From Inversions
</SectionTitle>
      <Paragraph position="0"> Broadly speaking, our goal is to keep or re-rank the candidate answer hit-list on account of inversion results. Suppose that a question Q is inverted around pivot term T, and for each candidate answer</Paragraph>
      <Paragraph position="2"> } is generated as described in the previous section. If T is on one of the {C ij }, then we say that C i is validated. Validation is not a guarantee of keeping or improving C</Paragraph>
      <Paragraph position="4"> position or score, but it helps. Most cases of failure to validate are called refutation; similarly, refutation</Paragraph>
      <Paragraph position="6"> is not a guarantee of lowering its score or position. null It is an open question how to adjust the results of the initial candidate answer list in light of the results of the inversion. If the scores associated with candidate answers (in both directions) were true probabilities, then a Bayesian approach would be easy to develop. However, they are not in our system. In addition, there are quite a few parameters that describe the inversion scenario.</Paragraph>
      <Paragraph position="7"> Suppose Q generates a list of the top-N candidates</Paragraph>
      <Paragraph position="9"> }. If this inversion method were not to be used, the top candidate on this list,</Paragraph>
      <Paragraph position="11"> , and generates an ordered list C ij of candidate answers found in this set. Each inverted question QT</Paragraph>
      <Paragraph position="13"> is run through our system, generating inverted answers {C</Paragraph>
      <Paragraph position="15"> }, and whether and where the pivot term T shows up on this list, represented by a list of positions {P</Paragraph>
      <Paragraph position="17"> We added to the candidate list the special answer nil, representing &amp;quot;no answer exists in the corpus.&amp;quot; As described earlier, we had observed from training data that failure to validate candidates of certain types (such as Person) would not necessarily be a real refutation, so we established a set of types SOFTREFUTATION which would contain the broadest of our types. At the other end of the spectrum, we observed that certain narrow candidate types such as UsState would definitely be refuted if validation didn't occur. These are put in set MUSTCONSTRAIN.</Paragraph>
      <Paragraph position="18"> Our goal was to develop an algorithm for recomputing all the original scores {S</Paragraph>
      <Paragraph position="20"> and MUSTCONSTRAIN. Reliably learning all those weights, along with set membership, was not possible given only several hundred questions of training data. We therefore focused on a reduced problem.</Paragraph>
      <Paragraph position="21"> We observed that when run on TREC question sets, the frequency of the rank of our top answer fell off rapidly, except with a second mode when the tail was accumulated in a single bucket. Our numbers for TRECs 11 and 12 are shown in Table 1.</Paragraph>
      <Paragraph position="22">  We decided to focus on those questions where we got the right answer in second place (for brevity, we'll call these second-place questions). Given that TREC scoring only rewards first-place answers, it seemed that with our incremental approach we would get most benefit there. Also, we were keen to limit the additional response time incurred by our approach. Since evaluating the top N answers to the original question with the Constraints process requires calling the QA system another N times per question, we were happy to limit N to 2. In addition, this greatly reduced the number of parameters we needed to learn.</Paragraph>
      <Paragraph position="23"> For the evaluation, which consisted of determining if the resulting top answer was right or wrong, it meant ultimately deciding on one of three possible outcomes: the original top answer, the original second answer, or nil. We hoped to promote a significant number of second-place finishers to top place and introduce some nils, with minimal disturbance of those already in first place.</Paragraph>
      <Paragraph position="24"> We used TREC11 data for training, and established a set of thresholds for a decision-tree approach to determining the answer, using Weka (Witten &amp; Frank, 2005). We populated sets SOFTREFUTATION and MUSTCONSTRAIN by manual inspection.</Paragraph>
      <Paragraph position="25"> The result is Algorithm A, where (i [?] {1,2}) and</Paragraph>
      <Paragraph position="27"> are the original candidate answers o The a k are learned parameters (k [?] {1..13})</Paragraph>
      <Paragraph position="29"> means the ith answer was validated</Paragraph>
      <Paragraph position="31"/>
    </Section>
  </Section>
class="xml-element"></Paper>