<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1003"> <Title>A Noisy-Channel Approach to Question Answering</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Current state-of-the-art Question Answering (QA) systems are extremely complex. They contain tens of modules that perform information retrieval, sentence parsing (Ittycheriah and Roukos, 2002; Hovy et al., 2001; Moldovan et al., 2002), question-type pinpointing (Ittycheriah and Roukos, 2002; Hovy et al., 2001; Moldovan et al., 2002), semantic analysis (Xu et al.; Hovy et al., 2001; Moldovan et al., 2002), and reasoning (Moldovan et al., 2002). They access external resources such as WordNet (Hovy et al., 2001; Pasca and Harabagiu, 2001; Prager et al., 2001), the web (Brill et al., 2001), and structured and semi-structured databases (Katz et al., 2001; Lin, 2002; Clarke, 2001). They contain feedback loops, ranking, and re-ranking modules. Given their complexity, it is often difficult (and sometimes impossible) to understand what contributes to the performance of a system and what does not.</Paragraph> <Paragraph position="1"> In this paper, we propose a new approach to QA in which the contribution of various resources and components can be easily assessed. 
The fundamental insight of our approach, which departs significantly from current architectures, is that, at its core, a QA system is a pipeline of only two modules: * An IR engine that retrieves a set of M documents/N sentences that may contain answers to a given question Q.</Paragraph> <Paragraph position="2"> * An answer identifier module that, given a question Q and a sentence S (from the set of sentences retrieved by the IR engine), identifies a sub-string S_A of S that is likely to be an answer to Q and assigns a score to it.</Paragraph> <Paragraph position="3"> Once one has these two modules, one has a QA system, because finding the answer to a question Q amounts to selecting the sub-string S_A of highest score. Although this view is not made explicit by QA researchers, it is implicitly present in all systems we are aware of.</Paragraph> <Paragraph position="4"> In its simplest form, if one accepts a whole sentence as an answer (S_A = S), one can assess the likelihood that a sentence S contains the answer to a question Q by measuring the cosine similarity between Q and S. However, as research in QA demonstrates, word overlap is not a good enough metric for determining whether a sentence contains the answer to a question. 
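As a toy illustration of this baseline (our own sketch, not code from the paper; the tokenizer and scoring choices are assumptions), bag-of-words cosine similarity between a question and a candidate sentence can be written as:

```python
# Hypothetical sketch: cosine similarity between a question Q and a
# candidate answer sentence S over bag-of-words count vectors.
import math
import re
from collections import Counter

def tokens(text):
    # deliberately crude tokenizer: lowercase alphabetic runs
    return re.findall(r"[a-z]+", text.lower())

def cosine(q, s):
    qv, sv = Counter(tokens(q)), Counter(tokens(s))
    dot = sum(qv[w] * sv[w] for w in qv)
    norm = math.sqrt(sum(c * c for c in qv.values()))
    norm = norm * math.sqrt(sum(c * c for c in sv.values()))
    return dot / norm if norm else 0.0
```

A sentence identical to the question scores 1.0, and the score decays with shrinking word overlap, which is precisely the weakness noted above.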
Consider, for example, the question &quot;Who is the leader of France?&quot; The sentence &quot;Henri Hadjenberg, who is the leader of France's Jewish community, endorsed confronting the specter of the Vichy past&quot; overlaps with all question terms but does not contain the correct answer, while the sentence &quot;Bush later met with French President Jacques Chirac&quot; overlaps with no question term yet does contain it.</Paragraph> <Paragraph position="5"> To circumvent this limitation of word-based similarity metrics, QA researchers have developed methods through which they first map questions and candidate answer sentences into a different space and then compute the &quot;similarity&quot; between them there. For example, the systems developed at IBM and ISI map questions and answer sentences into parse trees and surface-based semantic labels and measure the similarity between questions and answer sentences in this syntactic/semantic space, using QA-motivated metrics. The systems developed by CYC and LCC map questions and answer sentences into logical forms and compute the &quot;similarity&quot; between them using inference rules. And systems such as those developed by IBM and BBN map questions and answers into feature sets and compute the similarity between them using maximum entropy models that are trained on question-answer corpora. From this perspective, then, the fundamental problem of question answering is that of finding spaces where the distance between questions and sentences that contain correct answers is small and where the distance between questions and sentences that contain incorrect answers is large.</Paragraph> <Paragraph position="6"> In this paper, we propose a new space and a new metric for computing this distance. 
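The failure mode in the France example can be checked mechanically. The sketch below (our own illustration, with an assumed crude tokenizer) counts shared word types between the question and each sentence:

```python
# Hypothetical sketch: word-type overlap for the two example sentences.
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def overlap(question, sentence):
    return len(set(tokens(question)).intersection(tokens(sentence)))

q = "Who is the leader of France?"
s_wrong = ("Henri Hadjenberg, who is the leader of France's Jewish "
           "community, endorsed confronting the specter of the Vichy past")
s_right = "Bush later met with French President Jacques Chirac"

# overlap(q, s_wrong) is 6 (every question word matches);
# overlap(q, s_right) is 0, even though it contains the answer.
```

Any purely overlap-based ranker therefore prefers the wrong sentence here, which is what motivates mapping into a different space.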
Inspired by the success of noisy-channel approaches in applications as diverse as speech recognition (Jelinek, 1997), part-of-speech tagging (Church, 1988), machine translation (Brown et al., 1993), information retrieval (Berger and Lafferty, 1999), and text summarization (Knight and Marcu, 2002), we develop a noisy-channel model for QA. This model explains how a given sentence S_A that contains an answer sub-string A to a question Q can be rewritten into Q through a sequence of stochastic operations. Given a corpus of question-answer pairs (Q, S_A), we can train a probabilistic model for estimating the conditional probability P(Q | S_A, A). In Section 2, we first present the noisy-channel model that we propose for this task. In Section 3, we describe how we generate training examples. In Section 4, we describe how we use the learned models to answer factoid questions, evaluate the performance of our system under a variety of experimental conditions, and compare it with a rule-based system that we have previously used in several TREC evaluations. In Section 5, we demonstrate that the framework we propose is flexible enough to accommodate a wide range of resources and techniques that have been employed in state-of-the-art QA systems.</Paragraph> </Section></Paper>
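To make the channel intuition concrete, here is a minimal, hypothetical scorer in the spirit of word-to-word translation models; the table t below is invented for illustration and is not the trained model of this paper. Each question word is treated as being emitted by some word of the answer sentence:

```python
# Hypothetical noisy-channel sketch: score p(Q given S_A) under a toy
# word-to-word "translation" table t. Unseen word pairs receive a small
# smoothing floor so that no sentence scores exactly zero.
def channel_score(question_words, sentence_words, t, floor=1e-6):
    score = 1.0
    n = len(sentence_words)
    for q in question_words:
        # probability that some sentence word emits the question word q
        emit = sum(t.get((q, s), floor) for s in sentence_words) / n
        score = score * emit
    return score

# Invented table: the channel can bridge lexical gaps that plain word
# overlap cannot (leader~president, France~French).
t = {("leader", "president"): 0.4, ("france", "french"): 0.4}

q = ["leader", "france"]
answer = ["bush", "met", "with", "french", "president", "jacques", "chirac"]
distractor = ["the", "weather", "was", "nice"]
```

Even with no surface overlap, the Chirac sentence scores orders of magnitude above the distractor; training t from question-answer pairs, rather than writing it by hand, is the direction the following sections develop.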