<?xml version="1.0" standalone="yes"?> <Paper uid="P05-2015"> <Title>Learning Strategies for Open-Domain Natural Language Question Answering</Title> <Section position="3" start_page="85" end_page="88" type="metho"> <SectionTitle> 2 QABLe - Learning to Answer Questions </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="85" end_page="85" type="sub_section"> <SectionTitle> 2.1 Overview </SectionTitle> <Paragraph position="0"> Figure 1 shows a diagram of the QABLe framework. The bottom-most layer is the natural language textual domain. It represents raw textual sources, questions, and answers. The intermediate layer consists of processing modules that translate between the raw textual domain and the top-most layer, an abstract representation used to reason and learn.</Paragraph> <Paragraph position="1"> This framework is used both for learning to answer questions and for the actual QA task.</Paragraph> <Paragraph position="2"> While learning, the system is provided with a set of training instances, each consisting of a textual narrative, a question, and a corresponding answer. During the performance phase, only the narrative and question are given.</Paragraph> <Paragraph position="3"> At the lexical level, an answer to a question is generated by applying a series of transformation rules to the text of the narrative. These transformation rules augment the original text with one or more additional sentences, such that one of these explicitly contains the answer, and matches the form of the question.</Paragraph> <Paragraph position="4"> On the abstract level, this is essentially a process of searching for a path through problem space that transforms the world state, as described by the textual source and question, into a world state containing an appropriate answer. This process is made efficient by learning answer-generation strategies. These strategies store procedural knowledge regarding the way in which answers are derived from text, and suggest appropriate transformation rules at each step in the answer-generation process. Strategies (and the procedural knowledge stored therein) are acquired by explaining (or deducing) correct answers from training examples. The framework's ability to answer questions is tested only with respect to the kinds of documents it has seen during training, the kinds of questions it has practiced answering, and its interface to the world (domain sensors and operators).</Paragraph> <Paragraph position="5"> In the next two sections we discuss lexical preprocessing, and the representation of features and relations over them in the QABLe framework. In section 2.4 we look at the structure of transformation rules and describe how they are instantiated. In section 2.5, we build on this information and describe details of how strategies are learned and utilized to generate answers. In section 2.6 we explain how candidate answers are matched to the question, and extracted.</Paragraph> </Section> <Section position="2" start_page="85" end_page="86" type="sub_section"> <SectionTitle> 2.2 Lexical Pre-Processing </SectionTitle> <Paragraph position="0"> Several levels of syntactic and semantic processing are required in order to generate structures that facilitate higher order analysis. We currently use MontyTagger 1.2, an off-the-shelf POS tagger based on (Brill, 1995) for POS tagging. 
<Paragraph position="2"> At the next tier, we utilize a Named Entity (NE) tagger for proper nouns, a semantic category classifier for nouns and noun phrases, and a co-reference resolver (limited to pronominal anaphora). Our taxonomy of semantic categories is derived from the list of unique beginners for WordNet nouns (Fellbaum, 1998). We also have a parallel stage that identifies phrase types. Table 1 gives a list of the phrase types currently in use, together with the categories of questions each phrase type can answer. In the near future, we plan to utilize a link parser to boost phrase-type tagging accuracy.</Paragraph> <Paragraph position="3"> For questions, we have a classifier that identifies the semantic category of the information requested by the question. Currently, this taxonomy is identical to that of the semantic categories. However, in the future, it may be expanded to accommodate a wider range of queries. A separate module reformulates questions into statement form for later matching with answer-containing phrases.</Paragraph> </Section> <Section position="3" start_page="86" end_page="86" type="sub_section"> <SectionTitle> 2.3 Representing the QA Domain </SectionTitle> <Paragraph position="0"> In this section we explain how features are extracted from the raw textual input and from the tags generated by the pre-processing modules.</Paragraph> <Paragraph position="1"> A sentence is represented as a sequence of words reflecting the syntax of the sentence. The sentence Main (e.g., the main verb) is the controlling element of the sentence, and is recognized by main(w_m). Parts of speech are recognized by the function pos, as in pos(w_i, VBD). The relative syntactic ordering of words is captured by the function before(), which may be applied iteratively to generate the entire sentence starting with an arbitrary word, usually the sentence Main. before() may also be applied as a predicate, such as before(w_i, w_j). Thus each word w_i is situated in the sentence by its part of speech and by its ordering relative to the other words.</Paragraph> <Paragraph position="2"> A consecutive sequence of words is a phrase entity, or simply entity. It is given the designation e_x and declared by a binding function, such as entity(e_x, NE) for a named entity, and entity(e_x, NP) for a syntactic group of type noun phrase. Each phrase entity is identified by its head, as head(w_h, e_x), and we say that the phrase head controls the entity. A phrase entity is thus defined in terms of its head and its member words.</Paragraph> <Paragraph position="3"> We also wish to represent higher-order relations such as functional roles and semantic categories. Functional dependency between pairs of words is encoded as, for example, subj(w_i, w_j). Functional groups are represented just like phrase entities. Each is assigned a designation r_x and is defined in terms of its head and members (which may be individual words or composite entities). Semantic categories are similarly defined over the set of words and syntactic phrase entities - for example, sem_cat(c_k, e_x).</Paragraph> <Paragraph position="4"> Semantically, sentences are treated as events defined by their verbs. A multi-sentential passage is represented by tying the member sentences together with relations over their verbs. We declare two such relations - seq and cause. The seq relation between two sentences, seq(s_1, s_2), is defined as the sequential ordering in time of the corresponding events. The cause relation cause(s_1, s_2) is defined such that the second event is causally dependent on the first.</Paragraph>
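<Paragraph position="5"> To make the representation concrete, the sketch below shows one possible encoding of these features and relations as simple data structures. The class and function names are illustrative assumptions only; they are not taken from the QABLe implementation.</Paragraph>

# A hedged sketch of the state representation of this section: words carry POS
# tags, phrase entities are controlled by their heads, and sentences are tied
# together by seq/cause relations over their main verbs.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Word:
    index: int
    text: str
    pos: str                                  # pos(w_i, VBD), pos(w_i, NN), ...

@dataclass
class Entity:                                 # entity(e_x, NE), entity(e_x, NP), ...
    kind: str
    members: List[int]                        # indices of member words
    head: int                                 # head(w_h, e_x): the controlling word

@dataclass
class Sentence:
    words: List[Word]
    main: int                                 # main(w_m): index of the main verb
    entities: List[Entity] = field(default_factory=list)
    sem_cats: Dict[int, str] = field(default_factory=dict)   # sem_cat(c_k, w_i or e_x)
    roles: List[tuple] = field(default_factory=list)         # e.g. ("subj", i, j)

def before(i: int, j: int) -> bool:
    """before(w_i, w_j): word i precedes word j in the sentence."""
    return i < j

# Passage-level relations tie sentences together through their main verbs.
def seq(s1: Sentence, s2: Sentence) -> tuple:
    return ("seq", s1.main, s2.main)

def cause(s1: Sentence, s2: Sentence) -> tuple:
    return ("cause", s1.main, s2.main)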
</Section> <Section position="4" start_page="86" end_page="87" type="sub_section"> <SectionTitle> 2.4 Primitive Operators and Transformation Rules </SectionTitle> <Paragraph position="0"> The system, in general, starts out with no procedural knowledge of the domain (i.e., no transformation rules). However, it is equipped with 9 primitive operators that define basic actions in the domain. Primitive operators are existentially quantified. They have no activation condition, but only an existence condition - the minimal binding condition for the operator to be applicable in a given state. A primitive operator has the form E => a, where E is the existence condition and a is an action implemented in the domain. An example primitive operator inserts a word into a sentence; other primitive operators delete words or manipulate entire phrases. Note that primitive operators act directly on the syntax of the domain; in particular, they manipulate words and phrases.</Paragraph> <Paragraph position="1"> A primitive operator bound to a state in the domain constitutes a transformation rule. The procedure for instantiating transformation rules using primitive operators is given in Figure 2. The result of this procedure is a universally quantified rule of the form C => A, where A is either the name of an action in the world or an internal predicate, and C represents the necessary condition for rule activation in the form of a conjunction over the relevant attributes of the world state.</Paragraph> <Paragraph position="2"> Each rule also carries a strength with three components. The first component, the priority rating, is an inductively acquired measure of the rule's performance on previous instances. The second component modulates the priority rating with respect to a frequency-of-use measure. The third component captures any uncertainty inherent in the underlying features serving as parameters to the rule.</Paragraph> <Paragraph position="3"> Each time a new rule is added to the rule base, an attempt is made to combine it with similar existing rules to produce more general rules having a wider relevance and applicability. A general rule produced in this way is induced only from those states in which one of its parent rules was active. Therefore the hypothesis represented by its triggering condition is likely an overgeneralization of the target concept. This means that such a rule A may bind in some states erroneously. However, since all rules that can bind in a state compete to fire in that state, if there is a better rule, then A will be preempted and will not fire.</Paragraph>
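<Paragraph position="4"> The sketch below illustrates one way the instantiation procedure of Figure 2 could be realized. The state layout, types, and names are assumptions made for illustration; they are not the QABLe implementation.</Paragraph>

# A hedged sketch of instantiating a transformation rule from a primitive
# operator (the four-step procedure of Figure 2).  All names and the state
# layout are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

State = Dict[str, object]          # active features of the current world state

@dataclass
class PrimitiveOperator:
    name: str
    existence_cond: Callable[[State], Optional[Dict]]   # minimal binding condition
    action: Callable[[State, Dict], State]               # action implemented in the domain

@dataclass
class TransformationRule:
    operator: PrimitiveOperator
    condition: Dict                 # C: conjunction over relevant state attributes
    expected_effect: Dict = field(default_factory=dict)
    priority: float = 0.0           # inductively acquired performance measure
    uses: int = 0                   # frequency-of-use modulation
    uncertainty: float = 0.0        # uncertainty in the rule's underlying features

def instantiate_rule(op: PrimitiveOperator, state: State, goal: State) -> Optional[TransformationRule]:
    # 1. select a primitive operator (op), and
    # 2. bind active state variables and the goal spec to its condition variables
    binding = op.existence_cond({**state, **goal})
    if binding is None:
        return None
    # 3. execute the action in the domain
    new_state = op.action(state, binding)
    # 4. record the expected effect as the change in state-variable values
    effect = {k: v for k, v in new_state.items() if state.get(k) != v}
    return TransformationRule(operator=op, condition=dict(binding), expected_effect=effect)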
</Section> <Section position="5" start_page="87" end_page="88" type="sub_section"> <SectionTitle> 2.5 Generating Answers </SectionTitle> <Paragraph position="0"> Returning to Figure 1, we note that at the abstract level the process of answer generation begins with the extraction of the features active in the current state. These features represent the low-level textual attributes and the relations over them described in section 2.3.</Paragraph> <Paragraph position="1"> Immediately upon reading the current state, the system checks to see whether this is a goal state. A goal state is a state whose corresponding textual domain representation contains an explicit answer in the right form to match the question. In the abstract representation, we say that in this state all of the goal constraints are satisfied.</Paragraph> <Paragraph position="2"> If the current state is indeed a goal state, no further inference is required. The inference process terminates and the actual answer is identified by the matching technique described in section 2.6 and extracted.</Paragraph> <Paragraph position="3"> If the current state is not a goal state and more processing time is available, QABLe passes the state to the Inference Engine (IE). This module stores strategies in the form of decision lists of rules. For a given state, each strategy may recommend at most one rule to execute; for each strategy this is the first rule in its decision list to fire. The IE selects the rule among these with the highest relative rank, and recommends it as the next transformation rule to be applied to the current state.</Paragraph> <Paragraph position="4"> If a valid rule exists, it is executed in the domain. This modifies the concrete textual layer. At this point, the pre-processing and feature extraction stages are invoked, a new current state is produced, and the inference cycle begins anew.</Paragraph> <Paragraph position="5"> If a valid rule cannot be recommended by the IE, QABLe passes the current state to the Search Engine (SE). The SE uses the current state and its set of primitive operators to instantiate a new rule, as described in section 2.4. This rule is then executed in the domain, and another iteration of the process begins.</Paragraph> <Paragraph position="6"> If no more primitive operators remain to be applied to the current state, the SE cannot instantiate a new rule. At this point, search for the goal state cannot proceed, processing terminates, and QABLe returns failure.</Paragraph> <Paragraph position="7"> Figure 2 summarizes the rule-instantiation procedure: (1) select a primitive operator to instantiate; (2) bind the active state variables and the goal specification to its existentially quantified condition variables; (3) execute the action in the domain; (4) update the expected effect of the new rule according to the change in state variable values.</Paragraph> <Paragraph position="8"> When the system is in the training phase and the SE instantiates a new rule, that rule is generalized against the existing rule base. This procedure attempts to create more general rules that can be applied to unseen example instances. Once the inference/search process terminates (successfully or not), a reinforcement learning algorithm is applied to the entire rule search/inference tree. Specifically, rules on the solution path receive positive reward, and rules that fired but are not on the solution path receive negative reinforcement.</Paragraph>
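<Paragraph position="9"> The control loop just described can be summarized by the following sketch. It assumes rule objects exposing fires(), rank(), and apply() methods and reuses the instantiate_rule() helper sketched for section 2.4; both assumptions are for illustration only, and the reinforcement step over the search/inference tree is omitted.</Paragraph>

# A minimal sketch of the answer-generation loop of this section, under the
# assumption that each strategy is an ordered decision list of rule objects
# with fires(state), rank(), and apply(state) methods, and that
# instantiate_rule() is available as sketched for section 2.4.

def generate_answer(state, strategies, primitive_ops, is_goal, goal_spec, max_steps=50):
    for _ in range(max_steps):
        if is_goal(state):
            return state                        # answer matched and extracted per section 2.6
        # Inference Engine: each strategy recommends at most one rule,
        # the first rule in its decision list that fires in this state.
        recommended = []
        for strategy in strategies:
            rule = next((r for r in strategy if r.fires(state)), None)
            if rule is not None:
                recommended.append(rule)
        if recommended:
            best = max(recommended, key=lambda r: r.rank())
            state = best.apply(state)           # modifies the textual layer; features re-extracted
            continue
        # Search Engine: no strategy fired, so try to instantiate a new rule
        # from one of the primitive operators.
        new_rule = None
        for op in primitive_ops:
            new_rule = instantiate_rule(op, state, goal_spec)
            if new_rule is not None:
                break
        if new_rule is None:
            return None                         # search cannot proceed: QABLe returns failure
        state = new_rule.operator.action(state, new_rule.condition)   # execute the new rule
    return None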
</Section> <Section position="6" start_page="88" end_page="88" type="sub_section"> <SectionTitle> 2.6 Candidate Answer Matching and Extraction </SectionTitle> <Paragraph position="0"> As discussed in the previous section, when a goal state is generated in the abstract representation, it corresponds to a textual domain representation that contains an explicit answer in the right form to match the question. Such a candidate answer may be present in the original text, or may be generated by the inference/search process. In either case, the answer-containing sentence must be found, and the actual answer extracted. This is accomplished by the Answer Matching and Extraction procedure.</Paragraph> <Paragraph position="1"> The first step in this procedure is to reformulate the question into statement form. This results in a sentence containing an empty slot for the information being queried. Recall further that QABLe's pre-processing stage analyzes text with respect to various syntactic and semantic types. In addition to supporting abstract feature generation, these tags can be used to analyze text on a lexical level. The goal now is to find a sentence whose syntactic and semantic analysis matches that of the reformulated question as closely as possible.</Paragraph> </Section> </Section> <Section position="4" start_page="88" end_page="89" type="metho"> <SectionTitle> 3 Experimental Evaluation </SectionTitle> <Section position="1" start_page="88" end_page="89" type="sub_section"> <SectionTitle> 3.1 Experimental Setup </SectionTitle> <Paragraph position="0"> We evaluate our approach to open-domain natural language question answering on the Remedia corpus. This is a collection of 115 children's stories provided by Remedia Publications for reading comprehension. The comprehension of each story is tested by answering five who, what, when, where, and why questions.</Paragraph> <Paragraph position="1"> The Remedia corpus was initially used to evaluate the Deep Read reading comprehension system, and later also other systems, including Quarc and the Brown University statistical language processing class project.</Paragraph> <Paragraph position="2"> The corpus includes two answer keys. The first answer key contains annotations indicating the story sentence that is lexically closest to the answer found in the published answer key (AutSent). The second answer key contains sentences that a human judged to best answer each question (HumSent). Examination of the two keys shows the latter to be more reliable. We trained and tested using the HumSent answers, and we compare our results to the HumSent results of prior systems. In the Remedia corpus, approximately 10% of the questions lack an answer; following prior work, only questions with annotated answers were considered.</Paragraph> <Paragraph position="3"> We divided the Remedia corpus into a set of 55 tests used for development and 60 tests used to evaluate our model, employing the same partition scheme as the prior work mentioned above. With five questions supplied with each test, this breakdown provided 275 example instances for training and 300 example instances for testing. However, due to the heavy reliance of our model on learning, many more training examples were necessary. We widened the training set by adding story-question-answer sets obtained from several online sources. With the extended corpus, QABLe was trained on 262 stories with 3-5 questions each, corresponding to</Paragraph> </Section> <Section position="2" start_page="89" end_page="89" type="sub_section"> <SectionTitle> 3.2 Discussion of Results </SectionTitle> <Paragraph position="0"> Table 2 compares the performance of different versions of QABLe with the results reported by the three systems described above. We wish to discern the particular contribution of transformation rule learning in the QABLe model, as well as the value of expanding the training set. Thus, the QABLe-N/L results indicate the accuracy of answers returned by the QA matching and extraction algorithm of section 2.6 only.
This algorithm is similar to prior answer extraction techniques, and provides a baseline for our experiments. The QABLe-L results include answers returned by the full QABLe framework, including the use of learned transformation rules, but trained only on the limited training portion of the Remedia corpus. The QABLe-L+ results are for the version trained on the expanded training set.</Paragraph> <Paragraph position="1"> As expected, the accuracy of QABLe-N/L is comparable to that of the earlier systems. The Remedia-only training version, QABLe-L, shows an improvement over both the QABLe-N/L baseline and most of the prior systems' results. This is due to its expanded ability to deal with semantic alternations in the narrative by finding and learning transformation rules that reformulate the alternations into a lexical form matching that of the question.</Paragraph> <Paragraph position="2"> The results of QABLe-L+, trained on the expanded training set, are for the most part noticeably better than those of QABLe-L. This is because training on more example instances leads to wider domain coverage through the acquisition of more transformation rules. Table 3 gives a breakdown of rule learning and use for the two learning versions of QABLe. The first column is the total number of rules learned by each system version. The second column is the number of rules that ended up being successfully used in generating an answer. The third column gives the average number of rules each system needed to generate an answer (where a correct answer was generated).</Paragraph> <Paragraph position="3"> Note that QABLe-L+ used fewer rules on average to generate more correct answers than QABLe-L. This is because QABLe-L+ had more opportunities to refine its policy controlling rule firing through reinforcement and generalization.</Paragraph> <Paragraph position="4"> Note also that the learning versions of QABLe do significantly better than QABLe-N/L and all the prior systems on why-type questions. This is because many of these questions require an inference step, or the combination of information spanning multiple sentences. QABLe-L and QABLe-L+ are able to successfully learn transformation rules to deal with a subset of these cases.</Paragraph> </Section> </Section> </Paper>