<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2501">
  <Title>Strategies for Advanced Question Answering</Title>
  <Section position="4" start_page="1" end_page="1" type="metho">
    <SectionTitle>
2 Answer Fusion, Ranking and Reliability
</SectionTitle>
    <Paragraph position="0"> Given the size of today's very large document repositories, one can expect that any complex topic will be covered from multiple points of view. This feature is exploited by the question decomposition techniques, which generate a set of multiple questions in order to cover all of the possible interpretations of a complex topic. However, a set of decomposed questions may end up producing a disparate (and potentially contradictory) set of answers. In order for Q/A systems to use these collections of answers to their advantage, answer fusion must be performed in order to identify a single, unique, and coherent answer.</Paragraph>
    <Paragraph position="1"> We view answer fusion as a three-step process. First, an open-domain, template-based answer formalization is constructed based on predicate-argument frames. Second, a probabilistic model is trained to detect relations between the extracted templates. Finally, a set of template merging operators are introduced to construct the merged answer. The block architecture for answer fusion is illustrated in Figure 2. The system functionality is demonstrated with the example illustrated in Figure 3.</Paragraph>
    <Paragraph position="2"> Our method first converts the extracted answers into a series of open-domain templates, which are based on predicate-argument frames (Surdeanu et al, 2003). The next component detects generic inter-template relations.</Paragraph>
    <Paragraph position="3"> Typical &amp;quot;greedy&amp;quot; approaches in Information Extraction (Hobbs et al, 1997; Surdeanu and Harabagiu, 2002) use heuristics that favor proximity for template merging.</Paragraph>
    <Paragraph position="4"> The example in Figure 3 proves that this is not always the best decision, even for templates that share the same predicate and have compatible slots.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Open-domain template representation
</SectionTitle>
      <Paragraph position="0"> A key issue to the proposed approach is the open-domain template representation. While template-based representations have been proposed for information merging in the past (Radev and McKeown, 1998), they considered only domain-specific scenarios. Based on our recent successes with the extraction of predicate-argument frames (Surdeanu et al, 2003), we propose a template representation that is a direct mapping of predicate-argument frames. For example, the first template in Figure 3 is generated from the frame detected for the predicate &amp;quot;assassinate&amp;quot;: the first slot - ARG0 typically stands for subject or agent; the second slot ARG1 - stands for the predicate object, and the modifier arguments ARGM-LOC and ARGM-TMP indicate the location and date of the event.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.2 Detection of template relations
</SectionTitle>
      <Paragraph position="0"> In this section, we introduce a probabilistic model for the detection of template relations that has been proven to infer better connectivity.</Paragraph>
      <Paragraph position="1"> If the templates that are candidates for merging are selected entirely based on heuristics (Radev and McKeown, 1998; Surdeanu and Harabagiu, 2002), the application of fusion operators for QA is unreliable, due to their relatively weak semantic understanding of the  templates. The novelty in our approach is to precede template merging by the discovery of relations among templates.</Paragraph>
      <Paragraph position="2"> We propose a novel matching approach based on template attributes that support relation detection for merging. The approach combines phrasal parsing, lemma normalization and semantic approximation (via WordNet lexical chains). For example, this approach detects that the attributes ARG1 of the first template (&amp;quot;assassinate&amp;quot;) and ARG1 of the third template (&amp;quot;kill&amp;quot;) from Figure 3 refer to the same entity, by matching &amp;quot;terrorist&amp;quot; with &amp;quot;terrorists&amp;quot;. Moreover, the names of the templates (&amp;quot;assassinate&amp;quot; and &amp;quot;kill&amp;quot;) are connected through a WordNet lexical chain.</Paragraph>
      <Paragraph position="3"> A &amp;quot;greedy&amp;quot; detection procedure would incorrectly merge templates with the same name and a similar structure. The second and third templates from Figure 3, both named &amp;quot;kill&amp;quot;, illustrate this case. Instead, we propose a novel probabilistic algorithm for relation detection. The algorithm computes a probability distribution of possible relations among entity templates, and retains those relations whose probabilities exceed a confidence threshold.</Paragraph>
    </Section>
    <Section position="3" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
Operator Description
</SectionTitle>
      <Paragraph position="0"> CONTRADICTION Two templates contain contradicting information, e.g. the same terrorist event is reported to have a different number of victims.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1" end_page="1" type="metho">
    <SectionTitle>
ADDITION The second template introduces additional
</SectionTitle>
    <Paragraph position="0"> facts, e.g. one template indicates the location/date of a terrorist event while the second indicates number of victims.</Paragraph>
  </Section>
  <Section position="6" start_page="1" end_page="1" type="metho">
    <SectionTitle>
REFINEMENT The second template provides more refined
</SectionTitle>
    <Paragraph position="0"> information about the same event, e.g. the town instead of the country of location.</Paragraph>
    <Paragraph position="1"> AGREEMENT The templates contain redundant information. This operator is useful to heighten the answer strength.</Paragraph>
    <Paragraph position="2"> GENERALIZATION The two templates contain only incomplete facts that form an event only when combined.</Paragraph>
    <Paragraph position="3"> TREND The templates indicate similar patterns over time.</Paragraph>
  </Section>
  <Section position="7" start_page="1" end_page="1" type="metho">
    <SectionTitle>
NO INFORMATION The templates contain no useful information,
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.3 Fusion Operators
</SectionTitle>
      <Paragraph position="0"> The probabilistic model detects relations. A set of 7 template fusion operators is applied on the detected relations to generate the final set of templates. The operators are described in Table 1. The purpose of the fusion operators is to label the generic relations with the required merge operation, e.g. ADDITION, CONTRADICTION, TREND. For example, the templates T1 and T3 can be merged with the ADDITION operator. Optionally, the resulting template can be merged with template T2 with the weaker operator TREND, because they mark a similar type of event that takes place in the same location and date.</Paragraph>
      <Paragraph position="1"> The generic template relations are labeled with one of the operators described in Table 1 with a machine learning approach based on Support Vector Machines (SVM) and a dedicated SVM kernel. SVMs are ideal for this task because they do not require an explicit set of features (a very complex endeavor in the planned open-domain environment), but localized kernels that provide a measure of template similarity. The labeled template relations direct the actual merging operation, which yields the final list of templates. Actual text can be generated from these templates, but this is beyond the goal of this paper.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="1" end_page="1" type="metho">
    <SectionTitle>
3 Bootstrapping Question Answering
</SectionTitle>
    <Paragraph position="0"> Two key components of modern QA systems are the identification of the answer type, and the extraction of candidate answers that are classified in the corresponding answer type category. For example, the question &amp;quot;What weapons of mass destruction (WMD) does Iraq have?&amp;quot; has the answer type &amp;quot;WMD&amp;quot; and accepts concepts such as &amp;quot;anthrax&amp;quot; as valid (but not necessarily correct) candidate answers. This approach provides an efficient implementation for the &amp;quot;exact answer&amp;quot; question answering spirit, but it is plagued by limited scalability. null We address the above scalability problem through several innovations: we develop a novel bootstrapping technique that increases significantly the coverage of the existing answer type categories. Furthermore, new answer type categories are created for concepts that can not be classified according to the currently available knowledge. In addition to the immediate application for answer extraction, the induced answer type knowledge is used to bootstrap the passage retrieval component, through intelligent query expansion.</Paragraph>
    <Paragraph position="1"> Like most of the successful AQUAINT QA systems, LCC's system uses an answer type (AT) ontology for the classification of AT categories. The AT ontology is based on WordNet but can be extended with other open-domain or domain-specific categories. Instances of given categories are identified in answer passages using a modified version of the CiceroLite Named-Entity Recognizer (NER).</Paragraph>
    <Paragraph position="2"> The first innovation in bootstrapping focuses on the capability of LCC's QA system to identify AT instances. The algorithm is summarized in Figure 4. The algorithm uses as input a very large set of question/answer pairs, and the existing AT ontology currently used by LCC's QA system. For each AT category, the algorithm adds the exact answers from the training question/answer pairs that share the same AT category to the BootstrappedLexicon, which is the lexicon generated as one outcome of this algorithm. Besides the lexicon, the algorithm induces a set of answer extraction patterns, BootstrappedPatterns, which guarantees the scalability of the proposed approach. BootstrappedPatterns is initialized to the empty set and is iteratively increased during the bootstrap loop. During the loop, the system scores all possible extraction patterns, and selects the best pattern to be added to BootstrappedPatterns. Concepts discovered with the newly extracted pattern are appended to BootstrappedLexicon, and the process repeats.</Paragraph>
    <Paragraph position="3"> If a question/answer pair exists in the training set with &amp;quot;anthrax&amp;quot; as the exact answer, step 1.2 of the bootstrapping algorithm adds &amp;quot;anthrax&amp;quot; to BootstrappedLexicon. The bootstrap loop (step 1.4) mines the training documents for all possible patterns that contain anthrax. The best pattern selected is &amp;quot;deploy anthrax&amp;quot;, which is generalized to &amp;quot;deploy ANY-WMD&amp;quot;. This pattern is then used to extract other candidates for the WMD category, such as &amp;quot;smallpox&amp;quot;, &amp;quot;botulinum&amp;quot; etc.</Paragraph>
    <Paragraph position="4"> The algorithm illustrated in Figure 4 addresses the discovery of new instances for existing AT categories.</Paragraph>
    <Paragraph position="5"> A direct extension of this algorithm handles the situation when the discovered entities and patterns do not belong to a known category. The detection of new AT categories will be performed based on the AT word, which is the question concept that drives the selection of the AT. For example, the AT concept in the question: &amp;quot;What viral agent was used in Iraq?&amp;quot; is &amp;quot;viral agent&amp;quot;, which does not exist in the current WordNet ontology.</Paragraph>
    <Paragraph position="6"> If the answer type concept does not exist in WordNet, the bootstrapping algorithm will create a distinct category for this concept. If the answer type concept exists in WordNet, the algorithm attaches the bootstrapped entities and patterns to the concept hypernym that provides the largest coverage without overlapping any other known categories. This approach is robust enough to function without word sense disambiguation: the algorithm explores all relevant synsets and selects the one that maximizes the above condition. For example, the answer type concept &amp;quot;fighter aircraft&amp;quot; from the question: &amp;quot;What fighter aircrafts are in use in the Iraqi army?&amp;quot; is mapped to the hypernym synset airplane (Sense #1), instead of vehicle (Sense #1), which overlaps with other vehicle categories such as cars.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 Enhancing retrieval, navigation, and fusion
</SectionTitle>
      <Paragraph position="0"> Answer accuracy is conditioned by the ability of the QA system to generate effective queries for the retrieval subsystem (Moldovan et al., 2003). Queries that are too restrictive will incorrectly narrow the search space, and fail to retrieve the relevant answer passages. An example of such a query is (biological AND agents AND Qaeda), which is generated for the question &amp;quot;What biological agents does al Qaeda possess?&amp;quot;. This query will miss most of the relevant text passages since they do not include any explicit reference to biological agents.</Paragraph>
      <Paragraph position="1"> The extensions to the AT ontology, described above, enable an intelligent query expansion based on two expansion resources: AT instances, and extraction patterns. More precisely, each question concept mapped under any of the AT categories is expanded with the instances and keywords from the extraction patterns associated with that category. In this case, the expanded query for the above question is: ((biological AND agents) OR (bacterial AND agent) OR (viral AND agent) OR (fungal AND agent) OR (toxic AND agent) OR botulism OR botulinum OR smallpox OR encephalitis OR (deploy)) AND (Qaeda). This query illustrates two important requirements: the conversion of extraction patterns into keywords (e.g., &amp;quot;deploy&amp;quot; for &amp;quot;deploy ANY-WMD&amp;quot;); and the controlled expansion through selective keyword selection (e.g., for &amp;quot;biological agents&amp;quot;).</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.2 Continuous updating of scenario knowledge
</SectionTitle>
      <Paragraph position="0"> The bootstrapping algorithm described in the previous section is based on the large question/answer data set, which is largely open-domain. We consider a direct extension of this algorithm that automatically learns scenario knowledge by monitoring the user's browsing habits.</Paragraph>
      <Paragraph position="1"> Question/answer pairs are extracted based on the user's feedback. These pairs form the seeds for a meta-bootstrapping loop, as illustrated in Figure 5. Similar documents - i.e. documents where the identical exact answer and the question keywords are identified - are produced from the relevant Q/A pairs. This process can  be equally applied on the Web or on a static document collection. The bootstrapping algorithm described in Figure 5 is applied on the extracted documents. The inferred AT instances are further used to enrich the collection of considered documents, which forms the meta-bootstrapping loop.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="1" end_page="1" type="metho">
    <SectionTitle>
4 User Background
</SectionTitle>
    <Paragraph position="0"> Research in question answering, and in the more general field of information retrieval, has traditionally focused on building generic representations of the document content, largely independent of any subjective factors. It is important to note, however, that all users are different: not only do they have different backgrounds and expertise, but they also vary in their goals and reasons for using a Q/A system. This variety has made it difficult for systems to represent user intentions automatically or to make use of them in Q/A systems.</Paragraph>
    <Paragraph position="1"> Figure 6 illustrates the inherent differences between system end-users. Since (by definition) a novice lacks the domain-specific knowledge available to an expert, we should expect a novice user to choose a path completely different than an expert user, leading to extremely different results for the same top level question.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.1 Assessing User Background
</SectionTitle>
      <Paragraph position="0"> We evaluate users via a discrete evaluation scale, which ranks users as novice, casual, or expert users based on how much background knowledge they have on the given topic. The approach classifies users based on the path chosen in the generated question decomposition tree.</Paragraph>
      <Paragraph position="1"> This kind of characterization of user expertise can be used to reduce the exploration space generated through question decomposition. The most significant drawback of question decomposition is the exponential increase in the number of questions to be answered, which, to our knowledge, is not addressed by current QA research.</Paragraph>
      <Paragraph position="2"> We filter the generated question decomposition tree using the detected user expertise: for example, if the user is known to be an &amp;quot;expert&amp;quot;, only the paths generated through &amp;quot;expert&amp;quot; decomposition - i.e. generated using significant world and topic knowledge - will be followed.</Paragraph>
      <Paragraph position="3"> To be able to use the question decomposition tree for user classification, we must first classify the decomposition tree itself, i.e. the branches must be marked with one of the three discrete classification values. By shifting the classification problem from the (yet) abstract user background to the decomposition tree, we argue that the problem is simplified because we know how much background and world knowledge was necessary for the question decomposition. For example, to generate the &amp;quot;expert user&amp;quot; path in Figure 6, the system must have access to world knowledge that indicates that an &amp;quot;impact&amp;quot; can be economic, social etcetera and that  How have thefts impacted the safety of Russia's nuclear navy, and has the the theft problem been increased or reduced over time? How have thefts impacted the safety of Russia's nuclear navy, and has the the theft problem been increased or reduced over time? What sort of items have been stolen? What sort of items have been stolen? To what degree do different thefts put nuclear or radioactive materials at risk? To what degree do different thefts put nuclear or radioactive materials at risk?  nuclear materials are sensitive equipment. We will quantify the amount of knowledge used for decomposition and label the generated branches accordingly. Once a labeled decomposition tree is available, the user's background can be classified based on the selected path. The relevant answers (where &amp;quot;relevancy&amp;quot; can be explicitly requested from the user, or implicitly detected based on the documents visited) are mapped back to the corresponding questions, which provides a dynamic trace in the question decomposition tree. Using the tree structure and the classification labels previously assigned, we will train machine learning algorithms that will infer the final user expertise classification.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
4.2 Representing User Background
</SectionTitle>
      <Paragraph position="0"> We propose a new multi-modal approach for the representation of the user profile that includes concepts and relations in addition to terms.</Paragraph>
      <Paragraph position="1"> Traditionally, the user profile (or background) has been represented as a term vector, derived from the previously relevant document (be it online or offline information). Under this approach, each profile P, is represented as: P = ((t</Paragraph>
      <Paragraph position="3"> are terms from relevant documents and w i are term weights, typically computed with the tf * idf metric. Our approach is novel in two regards. The first innovation stems from the observation that it is common for one user to explore multiple topics even during the same session. For example, an analyst interested in the current Iraq situation, must explore topics related to military action, peace keeping, and terrorism. Hence the one vector representation for the profile P is clearly insufficient. In the proposed representation, the profile P is represented as a set of vectors p  )), i = 1, 2, ..., n, and m is the size of vector p i . We expect the number and size of the profile vectors to change dynamically. When a new document is marked as relevant, the document vector is either: (a) merged with an existing profile, if their similarities are higher than a given threshold, or (b) used to generate a new profile. Profile vectors are removed based on negative feedback: if a document vector similar to an existing profile receives continuous negative feedback the corresponding profile is deleted in order to keep the profile synchronized with the user's current interest patterns. We believe this profile representation to be flexible enough to accommodate all expertise levels, from novice to expert. For example, the expert user's background will consist of multiple vectors; each specializes on a clear, domain-specific direction, while the novice user's profile will most likely contain fewer vectors with more generic terms.</Paragraph>
      <Paragraph position="4"> The second innovation includes concepts and relations in addition to lexical terms in the user profile. A preliminary analysis of the CNS documents indicates that &amp;quot;al&amp;quot; is among the most frequent terms, but, by itself, &amp;quot;al&amp;quot; is considered a stop word by most information retrieval systems. However, the significance of the term becomes evident when the complete concept, &amp;quot;al Qaeda&amp;quot; is considered. This observation indicates that semantic information is significant for the representation of the user profile. In addition to indexing entities, we index generic of entity-to-entity relations that are significant, and often the goal, of the intelligence analyst's work.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="1" end_page="1" type="metho">
    <SectionTitle>
5 Processing Negation in Question Answering
</SectionTitle>
    <Paragraph position="0"> Although all human systems of communication represent negation in some form, the issue of how best to address negation in open-domain Q/A remains an open research question. Previous Q/A systems have dealt with negation either by filtering the retrieved answers and eliminating those that share key terms with the query but are irrelevant for reasons of negation (Martinovic, 2002; Attardi et al., 2001), or by constructing relational databases to query; such systems can handle negation in the question because its scope is clearly defined in the relational database (Jung and Lee, 2001). However, neither of these systems has dealt with the central problem that negation poses for Q/A: determining the scope of the negation context. Consider the following examples:
a. Which countries did not vote for the Iraq war resolution in the Security Council?
b. Which countries did not provide help to the coalition during the Gulf War in 1991?
c. What planets have no moon?
In question (a), the scope of negation only includes the countries that were members of the Security Council during the Iraq war resolution that were able to vote but did not. However, examples (b) and (c) are ambiguous with respect to the scope of negation. In question (b), the scope could encompass the whole world, or all the countries in the Middle East that should have provided help but did not. In question (c), even more entities can be included under the scope of negation: all of the planets in the solar system or even all of the planets in the entire universe (including planets that have not yet been discovered).</Paragraph>
    <Paragraph position="1"> In order for a Q/A system to answer questions like (b) or (c), the scope of negation must first be determined. We initially propose to develop empirical studies for recognizing the most frequent cases of negation: e.g. the &amp;quot;no&amp;quot; negation - &amp;quot;with no terrorists, the world would be safer&amp;quot;, the &amp;quot;nothing&amp;quot; negation - &amp;quot;the inspectors found nothing&amp;quot;, and other core cases of local negation - e.g. &amp;quot;thefts did not occur at the beginning&amp;quot;. We shall complement our methods of recognizing negation in textual sentences by analyzing various syntactic and semantic contexts of negation, e.g. adverbial negation &amp;quot;the president never leaves the White House without the Secret Service approval&amp;quot;.</Paragraph>
    <Paragraph position="2"> In addition, we assume that when a speaker is formulating a question to find out whether a proposition is true or false, s/he formulates the question with the form of the proposition which would be the most informative if it turned out to be true. We expect that if a question has the form of negation, the speaker believes that the negative answer is the most informative. Using such hypotheses, we argue that in a negation question, if the scope is ambiguous, like in (b) or (c), then we can solve the ambiguity by choosing the scope that will be more informative for the user.</Paragraph>
    <Paragraph position="3"> Given these assumptions, we propose that negation can be addressed in Q/A in three ways: By using the user background. In questions like (b) above, if the user background is terrorism, then we can limit the scope of the countries to those who have been linked to terrorism.</Paragraph>
    <Paragraph position="4"> By interacting with the user. If no user background can be established, as in question (c), we expect to use dialogue techniques to enable the user can specify the relevant scope.</Paragraph>
    <Paragraph position="5"> By finding cues from the answers to the positive question. Finally, we expect to be able to use a combination of heuristics and information extraction methods to deduce the answer to a negative question from the answers to the corresponding positive question. For example, when searching for the answer to the positive analog of question (c), we can limit the scope of the negation to the solar systems where there are planets with moons.</Paragraph>
  </Section>
class="xml-element"></Paper>