<?xml version="1.0" standalone="yes"?> <Paper uid="N06-2009"> <Title>Answering the Question You Wish They Had Asked: The Impact of Paraphrasing for Question Answering</Title> <Section position="2" start_page="0" end_page="33" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In a typical Question Answering system, an input question is analyzed to formulate a query to retrieve relevant documents from a target corpus (Chu-Carroll et al., 2006; Harabagiu et al., 2006; Sun et al., 2006). This analysis of the input question affects the subset of documents that will be examined and ultimately plays a key role in determining the answers the system chooses to produce. However, most existing QA systems, whether they adopt knowledge-based, statistical, or hybrid methods, are very sensitive to small variations in question form, often yielding substantially different answers for questions that are semantically equivalent. For example, our system's answer to &quot;Who invented the telephone?&quot; is &quot;Alexander Graham Bell.&quot; However, its top answer to a paraphrase of that question, &quot;Who is credited with the invention of the telephone?&quot;, is &quot;Gutenberg,&quot; who is credited with the invention of the printing press, while &quot;Alexander Graham Bell,&quot; who is credited with the invention of the telephone, appears only at rank four.</Paragraph> <Paragraph position="1"> To demonstrate the ubiquity of this phenomenon, we posed the same two questions to several QA systems on the web, including LCC's PowerAnswer system, MIT's START system, AnswerBus, and Ask Jeeves. All systems behaved differently on the two phrasings of the question, ranging from minor variations in the documents presented to justify an answer to major differences in whether correct answers appeared in the answer list at all. For some systems, the more complex question form posed sufficient difficulty that they declined to answer it.</Paragraph> <Paragraph position="2"> In this paper we investigate a high-risk but potentially high-payoff approach: improving system performance by replacing the user question with a paraphrased version of it. To obtain candidate paraphrases, we adopt a simple yet powerful technique based on machine translation, which we describe in the next section. Our experimental results show that we could achieve a 35% relative improvement in system performance given an oracle that always picks the optimal paraphrase for each question. Our ultimate goal is to automatically select from the set of candidates a high-potential paraphrase using a component trained against the QA system. In Section 3, we present our initial approach to paraphrase selection, which shows that, despite the tremendous odds against selecting performance-improving paraphrases, our conservative selection algorithm yields a marginal improvement in system performance.</Paragraph> <Paragraph position="3"> To measure the impact of paraphrases on QA systems, we seek a methodology by which paraphrases can be automatically generated from a user question. Inspired by the use of parallel translations to mine paraphrasing lexicons (Barzilay and McKeown, 2001) and the use of MT engines for word sense disambiguation (Diab, 2000), we leverage existing machine translation systems to generate semantically equivalent, albeit lexically and syntactically distinct, questions.</Paragraph> <Paragraph position="4"> Figure 1 (A) illustrates how MT-based paraphrasing captures lexical paraphrasing, ranging from simple synonyms such as hazardous and dangerous to more complex equivalent phrases such as expectant mother and pregnant woman. In addition to lexical paraphrasing, some two-way translations achieve structural paraphrasing, as illustrated by the example in Figure 1 (B). Using multiple MT engines can further increase paraphrase diversity. For example, in Figure 1 (B), if we use the @promt translator for English-to-Spanish translation and Babelfish for Spanish-to-English translation, we get &quot;Find out on the nuclear armament program of India,&quot; in which both lexical and structural paraphrasing are observed.</Paragraph>
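<Paragraph> To make the round-trip idea concrete, the sketch below shows one way MT-based paraphrase generation could be realized. It is a minimal illustration under stated assumptions, not the system described in this paper: the translate() function is a hypothetical stand-in for a real MT service call, and the engine names and pivot languages are placeholders for whatever engines are available.
```python
# Minimal sketch of MT-based (round-trip) paraphrase generation.
# `translate`, the engine names, and the pivot languages are all
# hypothetical placeholders, not the authors' actual components.

from itertools import product

def translate(text: str, src: str, tgt: str, engine: str) -> str:
    """Placeholder: translate `text` from `src` to `tgt` with an MT engine."""
    raise NotImplementedError("wire this to an actual MT service")

PIVOT_LANGUAGES = ["es", "fr", "de"]   # e.g., Spanish, French, German
ENGINES = ["engine_a", "engine_b"]     # e.g., two distinct MT systems

def generate_paraphrases(question: str) -> list[str]:
    """Round-trip the question through every (pivot, outbound engine,
    return engine) combination; mixing engines, as with @promt plus
    Babelfish in Figure 1 (B), tends to yield more lexically and
    structurally diverse paraphrases."""
    candidates = set()
    for pivot, eng_out, eng_back in product(PIVOT_LANGUAGES, ENGINES, ENGINES):
        foreign = translate(question, "en", pivot, eng_out)
        back = translate(foreign, pivot, "en", eng_back)
        # Keep only genuine variants of the original question.
        if back.strip().lower() != question.strip().lower():
            candidates.add(back.strip())
    return sorted(candidates)
```
</Paragraph>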
<Paragraph position="5"> The motivation for generating an array of lexically and structurally distinct paraphrases is that some of these paraphrases may better match the processing capabilities of the underlying QA system than the original question does, and are thus more likely to produce correct answers. Our observation is that while the paraphrase set contains valuable performance-improving phrasings, it also includes a large number of phrasings that hurt performance and must be filtered out to reduce their negative impact.</Paragraph> </Section> </Paper>