<?xml version="1.0" standalone="yes"?> <Paper uid="W04-3251"> <Title>Instance-Based Question Answering: A Data-Driven Approach</Title>
<Section position="5" start_page="0" end_page="0" type="metho">
<SectionTitle> 4 An Instance-Based Approach </SectionTitle>
<Paragraph position="0"> This paper presents a data-driven, instance-based approach to Question Answering. We adopt the view that the strategies required to answer new questions can be learned directly from similar training examples (question-answer pairs). Consider a multi-dimensional space determined by features extracted from training data. Each training question is represented as a data point in this space. Features can range from lexical n-grams to parse tree elements, depending on the available processing.</Paragraph>
<Paragraph position="1"> Each test question is also projected onto the feature space. Its neighborhood consists of training instances that share a number of features with the new data point. Intuitively, each neighbor is similar in some fashion to the new question. The obvious next step would be to learn from the entire neighborhood, as in KNN classification. However, due to the sparsity of the data, and because different groups of neighbors capture different aspects of the test question, we choose instead to cluster the neighborhood. Inside the neighborhood, we build individual clusters based on internal similarity. Figure 1 shows an example of neighborhood clustering (a schematic sketch of this step follows the model list below). Notice that clusters may also have different granularity, i.e. they can share more or fewer features with the new question.</Paragraph>
<Paragraph position="2"> [Figure 2 caption fragment: ...dimensional feature space. A set of neighborhood clusters is identified and a model is dynamically built for each of them. Each model is applied to the test question in order to produce its own set of candidate answers.]</Paragraph>
<Paragraph position="3"> By clustering the neighborhood, we set the stage for supervised methods, provided the clusters are sufficiently dense. The goal is to learn models that explain individual clusters. A model explains the data if it successfully answers questions from its corresponding cluster. For each cluster, a model is constructed and tailored to the local data.</Paragraph>
<Paragraph position="4"> Models generating high-confidence answers are applied to the new question to produce answer candidates (Figure 2). Since the test question belongs to multiple clusters, it benefits from different answer-seeking strategies and different granularities.</Paragraph>
<Paragraph position="5"> Answering clusters of similar questions involves several steps: learning the distribution of the expected answer type, learning the structure and content of queries, and learning how to extract the answer. Although present in most systems, these steps are often static, manually defined, or based on limited resources (Section 2). This paper proposes a set of trainable, cluster-specific models:</Paragraph>
<Paragraph position="6"> 1. the Answer Model A_i learns the cluster-specific distribution of answer types.</Paragraph>
<Paragraph position="7"> 2. the Query Content Model U_i is trained to enhance keyword-based queries with cluster-specific content conducive to better document retrieval. This model is orthogonal to query expansion.</Paragraph>
<Paragraph position="8"> 3. the Extraction Model E_i is dynamically built for answer candidate extraction, by classifying whether snippets of text contain a correct answer or not.</Paragraph>
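As a schematic illustration of the neighborhood clustering described above, the following sketch groups training questions by the set of features they share with a test question. It is not taken from the text: the set-based feature representation, the min_shared threshold, and the grouping-by-shared-feature-subset criterion are illustrative assumptions.

    from collections import defaultdict

    def neighborhood(test_features, training_questions, min_shared=2):
        """Training questions that share at least `min_shared` features with the test question."""
        return [q for q in training_questions
                if len(test_features & q["features"]) >= min_shared]

    def cluster_neighborhood(test_features, neighbors):
        """Group neighbors by the subset of features they share with the test question."""
        clusters = defaultdict(list)
        for q in neighbors:
            shared = frozenset(test_features & q["features"])
            clusters[shared].append(q)
        return clusters

In this toy formulation, a cluster keyed by many shared features corresponds to a fine-grained template (e.g. questions matching "when did <NNP> die"), while a cluster keyed by few shared features corresponds to a coarser one.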
<Paragraph position="9"> Cluster-specific models are thus learned in order to better retrieve relevant documents, model the expected answer, and then extract it from raw text. Local question-answer pairs (Q,A) are used as training data. These models are derived directly from cluster data and collectively define a focused strategy for finding answers to similar questions (Figure 3).</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 4.1 The Answer Model </SectionTitle>
<Paragraph position="0"> Learning cluster-specific answer type distributions is useful not only for identifying answers in running text but also for answer ranking. A probabilistic approach has the advantage of postponing answer type decisions from early in the QA process until answer extraction or answer ranking. It also has the advantage of allowing training data to shape the expected structure of answers.</Paragraph>
<Paragraph position="1"> The answer modeling task consists of learning specific answer type distributions for each cluster of questions. Provided there is enough data, simple techniques such as constructing finite state machines or learning regular expressions are sufficient. The principle can also be applied to current answer ontologies by replacing the hard classification with a distribution over answer types.</Paragraph>
<Paragraph position="2"> For high-density clusters, the problem of learning the expected answer type is reduced to learning the possible answer types and performing a reliable frequency count. However, clusters are very often sparse (e.g. they are based on rare features) and a more reliable method is required. The k nearest training data points Q_1..Q_k can be used to estimate the probability that the test question q will observe an answer type j:</Paragraph>
<Paragraph position="3"> P(j|q) = \frac{1}{Z} \sum_{i=1}^{k} \sigma(q, Q_i) P(j|Q_i) </Paragraph>
<Paragraph position="4"> where P(j|Q_i) is the probability of observing an answer of type j when asking question Q_i, \sigma(q, Q_i) represents a distance function between q and Q_i, and Z is a normalizing factor over the set of all viable answer types in the neighborhood of q.</Paragraph>
</Section>
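A minimal sketch of the estimate above, assuming each training question carries the answer types observed for it and that a numeric distance function dist() is available. The exponential conversion from distance to weight and the dict-based question representation are illustrative assumptions, not details specified above.

    import math
    from collections import Counter, defaultdict

    def answer_type_distribution(test_q, training_questions, dist, k=10):
        """Estimate P(answer type j | test question q) from the k nearest training questions."""
        nearest = sorted(training_questions, key=lambda q: dist(test_q, q))[:k]
        scores = defaultdict(float)
        for q in nearest:
            weight = math.exp(-dist(test_q, q))    # distance turned into a similarity weight (assumed form)
            counts = Counter(q["answer_types"])    # answer types observed when asking Q_i
            total = sum(counts.values())
            for j, c in counts.items():
                scores[j] += weight * (c / total)  # weight * P(j | Q_i)
        z = sum(scores.values()) or 1.0            # normalizing factor Z
        return {j: s / z for j, s in scores.items()}

Falling back to the k nearest training questions in this way is what allows a distribution to be estimated even for sparse clusters based on rare features.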
<Section position="2" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 4.2 The Query Content Model </SectionTitle>
<Paragraph position="0"> Current Question Answering systems use IR in a straightforward fashion. Query terms are extracted and then expanded using statistical and semantic similarity measures. Documents are retrieved and the top K are further processed. This approach describes the traditional IR task and does not take advantage of the specific constraints, requirements, and rich context available in the QA process.</Paragraph>
<Paragraph position="1"> The data-driven framework we propose takes advantage of knowledge available at retrieval time and incorporates it to create better cluster-specific queries. In addition to query expansion, the goal is to learn content features: n-grams and paraphrases (Hermjakob et al., 2002) which yield better queries when added to simple keyword-based queries. The Query Content Model is a cluster-specific collection of content features that generate the best document set (Table 1).</Paragraph>
<Paragraph position="2"> [Table 1 fragment - Cluster: When did X start working for Y? ...terms may not be appropriate if the two entities share a long history. A focused, cluster-specific content model is likely to generate more precise queries.]</Paragraph>
<Paragraph position="3"> For training, simple keyword-based queries are run through a retrieval engine in order to produce a set of potentially relevant documents. Features (n-grams and paraphrases) are extracted and scored based on their co-occurrence with the correct answer. More specifically, consider a positive class (documents which contain the correct answer) and a negative class (documents which do not contain the answer). We compute the average mutual information I(C; F_i) between the class of a document and the absence or presence of a feature f_i in the document (McCallum and Nigam, 1998). We let C be the class variable and F_i the feature variable:</Paragraph>
<Paragraph position="4"> I(C; F_i) = H(C) - H(C|F_i) </Paragraph>
<Paragraph position="5"> where H(C) is the entropy of the class variable and H(C|F_i) is the entropy of the class variable conditioned on the feature variable. Features that best discriminate passages containing correct answers from those that do not are selected as potential candidates for enhancing keyword-based queries.</Paragraph>
<Paragraph position="6"> For each question-answer pair, we generate candidate queries by individually adding selected features (e.g. Table 1) to the expanded word-based query. The resulting candidate queries are subsequently run through a retrieval engine and scored based on the number of passages containing correct answers (precision). The content features found in the top u candidate queries are included in the Query Content Model.</Paragraph>
<Paragraph position="7"> The Content Model is cluster-specific, not instance-specific. It does not replace traditional query expansion; both methods can be applied simultaneously to the test questions: specific keywords are the basis for traditional query expansion, and clusters of similar questions are the basis for learning additional content conducive to better document retrieval. Through the Query Content Model we allow shared context to play a more significant role in query generation.</Paragraph>
</Section>
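A compact sketch of the feature-scoring step in Section 4.2, computing I(C; F_i) = H(C) - H(C|F_i) over binary document labels (contains / does not contain the correct answer) and binary feature presence. The tuple-based document representation and the top-m cutoff are illustrative assumptions.

    import math
    from collections import Counter

    def entropy(counts):
        """Entropy (in bits) of a distribution given as a list of counts."""
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    def average_mutual_information(docs, feature):
        """I(C; F) between the answer-bearing class and the presence of `feature`.

        Each element of `docs` is (feature_set, contains_answer_flag).
        """
        n = len(docs)
        h_c = entropy(list(Counter(label for _, label in docs).values()))
        h_c_given_f = 0.0
        for present in (True, False):
            group = [label for feats, label in docs if (feature in feats) == present]
            if group:
                h_c_given_f += (len(group) / n) * entropy(list(Counter(group).values()))
        return h_c - h_c_given_f

    def select_content_features(docs, candidate_features, m=20):
        """Rank candidate n-grams/paraphrases by average mutual information and keep the top m."""
        return sorted(candidate_features,
                      key=lambda f: average_mutual_information(docs, f),
                      reverse=True)[:m]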
<Section position="3" start_page="0" end_page="0" type="sub_section">
<SectionTitle> 4.3 The Extraction Model </SectionTitle>
<Paragraph position="0"> During training, documents are retrieved for each question cluster and a set of one-sentence passages containing a minimum number of query terms is selected. The passages are then transformed into feature vectors to be used for classification. The features consist of n-grams, paraphrases, distances between keywords and potential answers, simple statistics such as document and sentence length, part-of-speech features such as required verbs, etc. More extensive sets of features can be found in the information extraction literature (Bikel et al., 1999).</Paragraph>
<Paragraph position="1"> Under our data-driven approach, answer extraction consists of deciding the correctness of candidate passages. The task is to build a model that accepts snippets of text and decides whether they contain a correct answer.</Paragraph>
<Paragraph position="2"> A classifier is trained for each question cluster. When new question instances arrive, the already trained cluster-specific models are applied to new, relevant text snippets in order to test for correctness. We will refer to the resulting classifier scores as answer confidence scores.</Paragraph>
</Section>
</Section>
<Section position="7" start_page="0" end_page="0" type="metho">
<SectionTitle> 5 Experiments </SectionTitle>
<Paragraph position="0"> We present a basic implementation of the instance-based approach. The resulting QA system is trained fully automatically, without human intervention.</Paragraph>
<Paragraph position="1"> Instance-based approaches are known to require large, dense training datasets, which are currently under development. Although still sparse, the subset of all temporal questions from the TREC 9-12 (Voorhees, 2003) datasets is relatively dense compared to the rest of the question space. This makes it a good candidate for evaluating our instance-based QA approach until larger and denser datasets become available. It is also broad enough to include different question structures and varying degrees of difficulty and complexity, such as: &quot;When did Beethoven die?&quot; &quot;How long is a quarter in an NBA game?&quot; &quot;What year did General Montgomery lead the Allies to a victory over the Axis troops in North Africa?&quot;</Paragraph>
<Paragraph position="2"> The 296 temporal questions and their corresponding answer patterns provided by NIST were used in our experiments. The questions were processed with a part-of-speech tagger (Brill, 1994) and a parser (Collins, 1999).</Paragraph>
<Paragraph position="3"> The questions were clustered using template-style frames that incorporate lexical items, parser labels, and surface-form flags (Figure 1). Consider the following question and several of its corresponding frames: &quot;When did Beethoven die?&quot;
when did <NNP> die
when did <NNP> <VB>
when did <NNP> <Q>
when did <NP> <Q>
when did <Q>
where <NNP>, <NP>, <VB>, <Q> denote a proper noun, noun phrase, verb, and generic question term sequence, respectively. Initially, frames are generated exhaustively for each question. Each frame that applies to more than three questions is then selected to represent a specific cluster.</Paragraph>
<Paragraph position="4"> One hundred documents were retrieved for each query through the Google API (www.google.com/api). Documents containing the full question, the question number, references to TREC, NIST, AQUAINT, Question Answering, and other similar problematic content were filtered out. When building the Query Content Model, keyword-based queries were initially formulated and expanded. From the retrieved documents, a set of content features (n-grams and paraphrases) was selected through average mutual information. The features were added to the simple queries and a new set of documents was retrieved. The enhanced queries were scored and the corresponding top 10 n-grams/paraphrases were included in the Query Content Model. The maximum n-gram and paraphrase size for these features was set to 6 words.</Paragraph>
<Paragraph position="5"> The Extraction Model uses a support vector machine (SVM) classifier (Joachims, 2002) with a linear kernel. The task of the classifier is to decide if text snippets contain a correct answer. The SVM was trained on features extracted from one-sentence passages containing at least one keyword from the original question. The features consist of: distance between keywords and potential answers, keyword density in a passage, simple statistics such as document and sentence length, query type, lexical n-grams (up to 6-grams), and paraphrases.</Paragraph>
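As an illustration of this per-cluster extraction classifier, the sketch below trains a linear SVM on dictionary-valued passage features using scikit-learn; the feature names and the use of scikit-learn in place of the SVM-light-style implementation (Joachims, 2002) cited above are assumptions made only to keep the example runnable.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.svm import LinearSVC

    def train_extraction_model(passage_features, labels):
        """Train a cluster-specific passage classifier.

        `passage_features` is a list of dicts such as
        {"kw_density": 0.3, "dist_kw_candidate": 4, "ngram:when did": 1},
        and `labels` marks whether each passage contains a correct answer.
        """
        vectorizer = DictVectorizer()
        x = vectorizer.fit_transform(passage_features)
        classifier = LinearSVC()   # linear-kernel SVM, as in the setup described above
        classifier.fit(x, labels)
        return vectorizer, classifier

    def answer_confidence(vectorizer, classifier, features):
        """Signed distance to the decision boundary, used here as an answer confidence score."""
        return classifier.decision_function(vectorizer.transform([features]))[0]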
<Paragraph position="6"> We performed experiments using leave-one-out cross validation. The system was trained and tested without any question filtering or manual input. Each cluster produced an answer set with corresponding scores. The top 5 answers for each instance were scored with a mean reciprocal rank (MRR) metric over all N questions: MRR_N = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{rank_i}, where rank_i refers to the rank of the first correct occurrence in the top 5 answers for question i. While not the focus of this paper, answer clustering algorithms are likely to further improve performance.</Paragraph>
</Section>
</Paper>