<?xml version="1.0" standalone="yes"?> <Paper uid="P02-1054"> <Title>Is It the Right Answer? Exploiting Web Redundancy for Answer Validation</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Overall Methodology </SectionTitle> <Paragraph position="0"> Given a question a0 and a candidate answer a1 the answer validation task is defined as the capability to assess the relevance of a1 with respect to a0 . We assume open domain questions and that both answers and questions are texts composed of few tokens (usually less than 100). This is compatible with the TREC-2001 data, that will be used as examples throughout this paper. We also assume the availability of the Web, considered to be the largest open domain text corpus containing information about almost all the different areas of the human knowledge.</Paragraph> <Paragraph position="1"> The intuition underlying our approach to answer validation is that, given a question-answer pair ([a0 ,a1 ]), it is possible to formulate a set of validation statements whose truthfulness is equivalent to the degree of relevance of a1 with respect to a0 . For instance, given the question &quot;What is the capital of the USA?&quot;, the problem of validating the answer &quot;Washington&quot; is equivalent to estimating the truthfulness of the validation statement &quot;The capital of the USA is Washington&quot;. Therefore, the answer validation task could be reformulated as a problem of statement reliability. There are two issues to be addressed in order to make this intuition effective.</Paragraph> <Paragraph position="2"> First, the idea of a validation statement is still insufficient to catch the richness of implicit knowledge that may connect an answer to a question: we will attack this problem defining the more flexible idea of a validation pattern. Second, we have to design an effective and efficient way to check the reliability of a validation pattern: our solution relies on a procedure based on a statistical count of Web searches.</Paragraph> <Paragraph position="3"> Answers may occur in text passages with low similarity with respect to the question. Passages telling facts may use different syntactic constructions, sometimes are spread in more than one sentence, may reflect opinions and personal attitudes, and often use ellipsis and anaphora. For instance, if the validation statement is &quot;The capital of USA is Washington&quot;, we have Web documents containing passages like those reported in Table 1, which can not be found with a simple search of the statement, but that nevertheless contain a significant amount of knowledge about the relations between the question and the answer. We will refer to these text fragments as validation fragments.</Paragraph> <Paragraph position="4"> 1. Capital Region USA: Fly-Drive Holidays in and Around Washington D.C.</Paragraph> <Paragraph position="5"> 2. the Insider's Guide to the Capital Area Music Scene (Washington D.C., USA).</Paragraph> <Paragraph position="6"> 3. The Capital Tangueros (Washington, DC Area, USA) 4. I live in the Nation's Capital, Washington Metropolitan Area (USA).</Paragraph> <Paragraph position="7"> 5. in 1790 Capital (also USA's capital): Wash null A common feature in the above examples is the co-occurrence of a certain subset of words (i.e.</Paragraph> <Paragraph position="8"> &quot;capital&quot;,&quot;USA&quot; and &quot;Washington&quot;). 
<Paragraph position="6"> We will make use of validation patterns that cover a larger portion of text fragments, including those lexically similar to the question and the answer (e.g. fragments 4 and 5 in Table 1) and also those that are not similar (e.g. fragment 2 in Table 1). In the case of our example, the set of validation statements can be generalized by the validation pattern [capital &lt;text&gt; USA &lt;text&gt; Washington], where &lt;text&gt; is a placeholder for any portion of text with a fixed maximal length.</Paragraph> <Paragraph position="7"> To check the correctness of a with respect to q we propose a procedure that measures the number of occurrences on the Web of a validation pattern derived from a and q. A useful feature of such patterns is that when we search for them on the Web they usually produce many hits, thus making statistical approaches applicable. In contrast, searching for strict validation statements generally returns a small number of documents (if any), which makes statistical methods inapplicable. A number of techniques used for finding collocations and co-occurrences of words, such as mutual information, may be used to measure the co-occurrence tendency between the question and the candidate answer on the Web. If we verify that this tendency is statistically significant, we may consider the validation pattern consistent and therefore assume a high level of correlation between the question and the candidate answer. Starting from the above considerations, and given a question-answer pair (q, a), we propose an answer validation procedure based on the following steps:
1. Compute the sets of representative keywords Kq and Ka from q and from a respectively; this step is carried out using linguistic techniques, such as answer type identification (from the question) and named entity recognition (from the answer);
2. From the extracted keywords compute the validation pattern for the pair [q, a];
3. Submit the patterns to the Web and estimate an answer validity score considering the number of retrieved documents.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Extracting Validation Patterns </SectionTitle> <Paragraph position="0"> In our approach a validation pattern consists of two components: a question sub-pattern (Qsp) and an answer sub-pattern (Asp).</Paragraph> <Paragraph position="1"> Building the Qsp. A Qsp is derived from the input question by cutting off non-content words with a stop-word filter. The remaining words are expanded with both synonyms and morphological forms in order to maximize the recall of retrieved documents. Synonyms are automatically extracted from the most frequent sense of the word in WordNet (Fellbaum, 1998), which considerably reduces the risk of adding disturbing elements. As for morphology, verbs are expanded with all their tense forms (i.e. present, present continuous, past tense and past participle). Synonyms and morphological forms are added to the Qsp and composed in an OR clause.</Paragraph> <Paragraph position="2"> The following example illustrates how the Qsp is constructed. Given the TREC-2001 question &quot;When did Elvis Presley die?&quot;, the stop-word filter removes &quot;When&quot; and &quot;did&quot; from the input. Then synonyms of the first sense of &quot;die&quot; (i.e. &quot;decease&quot;, &quot;perish&quot;, etc.) are extracted from WordNet. Finally, morphological forms for all the corresponding verb tenses are added. The resultant Qsp is: [Elvis &lt;text&gt; Presley &lt;text&gt; (die OR died OR dying OR perish OR ...)]</Paragraph>
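As an illustration only, the following sketch builds such a Qsp with NLTK's WordNet interface; the stop-word list and the verb-form table are small illustrative stand-ins for the resources used in the paper:

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

STOP_WORDS = {"when", "did", "do", "does", "the", "a", "an", "of", "is", "was"}

# Hand-written tense forms for the example verb; a full system would use a
# morphological generator instead of a fixed table.
VERB_FORMS = {"die": ["die", "dies", "died", "dying"]}

def question_subpattern(question):
    """Drop stop-words, then expand each remaining word with the synonyms of
    its most frequent WordNet sense and with its verb tense forms."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    content = [w for w in words if w not in STOP_WORDS]
    expansions = []
    for w in content:
        variants = set(VERB_FORMS.get(w, [w]))
        # Restrict the lookup to verb senses when we know the word is a verb.
        synsets = wn.synsets(w, pos=wn.VERB) if w in VERB_FORMS else wn.synsets(w)
        if synsets:  # most frequent sense only, to limit noisy expansions
            variants.update(l.replace("_", " ").lower() for l in synsets[0].lemma_names())
        expansions.append("(" + " OR ".join(sorted(variants)) + ")")
    return " <text> ".join(expansions)

print(question_subpattern("When did Elvis Presley die?"))
# the last group comes out as e.g. (decease OR die OR died OR dies OR dying OR ... OR perish ...)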
<Paragraph position="3"> Building the Asp. An Asp is constructed in two steps. First, the answer type of the question is identified considering both morpho-syntactic features (a part-of-speech tagger is used to process the question) and semantic features (by means of semantic predicates defined on the WordNet taxonomy; see (Magnini et al., 2001) for details). Possible answer types are: DATE, MEASURE, PERSON, LOCATION, ORGANIZATION, DEFINITION and GENERIC. DEFINITION is the answer type peculiar to questions like &quot;What is an atom?&quot;, which represent a considerable part (around 25%) of the TREC-2001 corpus. The answer type GENERIC is used for non-definition questions asking for entities that cannot be classified as named entities (e.g. the questions &quot;Material called linen is made from what plant?&quot; or &quot;What mineral helps prevent osteoporosis?&quot;). In the second step, a rule-based named entity recognition module identifies in the answer string all the named entities matching the answer type category. If the category corresponds to a named entity, an Asp is created for each selected named entity. If the answer type category is either DEFINITION or GENERIC, the entire answer string except the stop-words is considered. In addition, in order to maximize the recall of retrieved documents, the Asp is expanded with verb tenses. The following example shows how the Asp is created. Given the TREC question &quot;When did Elvis Presley die?&quot; and the candidate answer &quot;though died in 1977 of course some fans maintain&quot;, since the answer type category is DATE the named entity recognition module selects [1977] as the answer sub-pattern.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Estimating Answer Validity </SectionTitle> <Paragraph position="0"> The answer validation algorithm queries the Web with the patterns created from the question and the answer, and then estimates the consistency of the patterns.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Querying the Web </SectionTitle> <Paragraph position="0"> We use a Web-mining algorithm that considers only the number of pages retrieved by the search engine. In contrast, qualitative approaches to Web mining (e.g. (Brill et al., 2001)) analyze the document content and, as a result, consider only a relatively small number of pages. For information retrieval we used the AltaVista search engine. Its advanced syntax allows the use of operators that implement the idea of validation patterns introduced in Section 2. Queries are composed using the NEAR, OR and AND boolean operators. The NEAR operator searches for pages where two words appear at a distance of no more than 10 tokens: it is used to put the question and the answer sub-patterns together in a single validation pattern.</Paragraph> <Paragraph position="1"> The OR operator introduces variations in word order and verb forms. Finally, the AND operator is used as an alternative to NEAR, allowing more distance among pattern elements.</Paragraph>
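A minimal sketch of how sub-patterns of this kind could be assembled into a query string in the advanced boolean syntax described above; the helper names and the fallback to AND are assumptions for illustration:

def or_clause(variants):
    """Group alternative word forms: ('die', 'died', 'perish') -> '(die OR died OR perish)'."""
    return "(" + " OR ".join(variants) + ")"

def compose_query(qsp_terms, asp_terms, operator="NEAR"):
    """Join the question and answer sub-pattern terms with NEAR
    (or with AND as a looser alternative)."""
    terms = list(qsp_terms) + list(asp_terms)
    rendered = [or_clause(t) if isinstance(t, tuple) else t for t in terms]
    return (" " + operator + " ").join(rendered)

qsp = ["Elvis", "Presley", ("die", "died", "dying", "perish")]
asp = ["1977"]
print(compose_query(qsp, asp))
# Elvis NEAR Presley NEAR (die OR died OR dying OR perish) NEAR 1977
print(compose_query(qsp, asp, operator="AND"))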
<Paragraph position="2"> If the question sub-pattern Qsp returns no documents, or fewer than a certain threshold (experimentally set to 7), the question pattern is relaxed by cutting one word; in this way a new query is formulated and submitted to the search engine. This is repeated until no more words can be cut or the number of returned documents becomes higher than the threshold. Pattern relaxation is performed using word-ignoring rules applied in a specified order. Such rules, for instance, ignore the focus of the question, because it is unlikely to occur in a validation fragment; ignore adverbs and adjectives, because they are less significant; and ignore nouns belonging to the WordNet classes &quot;abstraction&quot;, &quot;psychological feature&quot; or &quot;group&quot;, because they usually specify finer details and human attitudes. Names, numbers and measures are preferred over all lower-case words and are cut last.</Paragraph> </Section>
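The relaxation loop itself is straightforward; below is a sketch in which hits_count stands in for submitting the composed query to the search engine and ignore_rules is an ordered list of predicates implementing the word-ignoring rules above (both are assumptions, not the paper's code):

MIN_HITS = 7  # threshold mentioned above, set experimentally

def relax_and_search(keywords, hits_count, ignore_rules):
    """Drop one keyword at a time, following the ignore rules in order,
    until the query returns enough documents or only one keyword is left."""
    while True:
        hits = hits_count(keywords)
        if hits >= MIN_HITS or len(keywords) <= 1:
            return keywords, hits
        for rule in ignore_rules:  # e.g. question focus, adverbs/adjectives, abstract nouns, ...
            removable = [w for w in keywords if rule(w)]
            if removable:
                keywords = [w for w in keywords if w != removable[0]]
                break
        else:
            # No rule applies: cut the last lower-case word, keeping names and numbers.
            lower = [w for w in keywords if w.islower()]
            keywords = [w for w in keywords if w != (lower[-1] if lower else keywords[-1])]

# Toy demo using the Big Muddy question discussed in Section 4.3, with a fake engine.
fake_hits = lambda kws: 28 if kws == ["US", "Big", "Muddy"] else 0
first_keyword_noun = lambda w: w == "river"          # stands in for the question-focus rule
verbs_and_abstract = lambda w: w in {"known", "name", "call"}
print(relax_and_search(["river", "US", "known", "Big", "Muddy"], fake_hits,
                       [first_keyword_noun, verbs_and_abstract]))
# (['US', 'Big', 'Muddy'], 28)

Because each iteration removes at least one keyword, the loop always terminates.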
<Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Estimating pattern consistency </SectionTitle> <Paragraph position="0"> The Web-mining module submits three searches to the search engine: the sub-patterns [Qsp] and [Asp] and the validation pattern [QAp], the last built as the composition [Qsp NEAR Asp]. The search engine returns, respectively, hits(Qsp), hits(Asp) and hits(Qsp NEAR Asp). The probability P(p) of a pattern p on the Web is calculated as:
P(p) = hits(p) / MaxPages
where hits(p) is the number of Web pages in which p appears and MaxPages is the maximum number of pages that can be returned by the search engine. We set this constant experimentally. However, in two of the formulas we use (i.e. Pointwise Mutual Information and Corrected Conditional Probability) MaxPages may be ignored, as it only contributes a constant factor. The joint probability P(Qsp, Asp) is calculated by means of the validation pattern probability:
P(Qsp, Asp) = P(QAp) = P(Qsp NEAR Asp)
We have tested three alternative measures to estimate the degree of relevance of Web searches: Pointwise Mutual Information (PMI), Maximal Likelihood Ratio (MLHR) and Corrected Conditional Probability (CCP), a variant of Conditional Probability which considers the asymmetry of the question-answer relation. Each measure provides an answer validity score: high values are interpreted as strong evidence that the validation pattern is consistent. This is a clue to the fact that the Web pages where this pattern appears contain validation fragments, which in turn imply answer accuracy.</Paragraph> <Paragraph position="1"> Pointwise Mutual Information is used as a clue to the internal coherence of the question-answer validation pattern QAp:
PMI(Qsp, Asp) = P(Qsp, Asp) / (P(Qsp) * P(Asp))
Substituting the probabilities with the previously introduced Web statistics, we obtain:
PMI(Qsp, Asp) = (hits(Qsp NEAR Asp) * MaxPages) / (hits(Qsp) * hits(Asp))</Paragraph> <Paragraph position="2"> Maximal Likelihood Ratio is a measure originally proposed for word co-occurrence mining (Dunning, 1993). We decided to check MLHR for answer validation because it is supposed to outperform PMI in the case of sparse data, a situation that may arise for questions with complex patterns that return a small number of hits. MLHR is computed, following (Dunning, 1993), from the counts hits(Qsp NEAR Asp), hits(Asp), hits(Qsp ¬Asp) and hits(¬Asp), where hits(Qsp ¬Asp) is the number of appearances of Qsp when Asp is not present, calculated as hits(Qsp) − hits(Qsp NEAR Asp). Similarly, hits(¬Asp) is the number of Web pages where Asp does not appear, calculated as MaxPages − hits(Asp).</Paragraph> <Paragraph position="3"> Corrected Conditional Probability, in contrast with PMI and MLHR, is not symmetric (i.e. generally CCP(Qsp, Asp) ≠ CCP(Asp, Qsp)). This reflects the fact that we search for the occurrence of the answer pattern Asp only in the cases when Qsp is present. The statistical evidence for this can be measured through the conditional probability P(Asp | Qsp); however, this value is corrected with P(Asp)^(2/3) in the denominator, to avoid cases in which high-frequency words and patterns are taken as relevant answers:
CCP(Qsp, Asp) = P(Asp | Qsp) / P(Asp)^(2/3)</Paragraph> </Section>
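For concreteness, a sketch of how the PMI and CCP scores follow from the three hit counts (MLHR is omitted, since it needs the full log-likelihood machinery of Dunning's test); MAX_PAGES and the example counts are placeholders, not values from the paper:

MAX_PAGES = 1e9  # assumed engine-dependent constant, set experimentally in the paper

def pmi(hits_q, hits_a, hits_qa, max_pages=MAX_PAGES):
    """Pointwise mutual information: P(Qsp, Asp) / (P(Qsp) * P(Asp))."""
    return (hits_qa * max_pages) / (hits_q * hits_a)

def ccp(hits_q, hits_a, hits_qa, max_pages=MAX_PAGES):
    """Corrected conditional probability: P(Asp | Qsp) / P(Asp) ** (2/3)."""
    p_asp = hits_a / max_pages
    return (hits_qa / hits_q) / (p_asp ** (2 / 3))

# Placeholder counts for [Qsp], [Asp] and [Qsp NEAR Asp].
print(pmi(hits_q=28, hits_a=120_000, hits_qa=25))
print(ccp(hits_q=28, hits_a=120_000, hits_qa=25))

Since MAX_PAGES only rescales both scores by a constant factor, it can be dropped when candidate answers for the same question are only compared with each other.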
<Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 An example </SectionTitle> <Paragraph position="0"> Consider an example taken from the question-answer corpus of the main task of TREC-2001: &quot;Which river in US is known as Big Muddy?&quot;. The question keywords are: &quot;river&quot;, &quot;US&quot;, &quot;known&quot;, &quot;Big&quot;, &quot;Muddy&quot;. The search for the pattern [river NEAR US NEAR (known OR know OR...) NEAR Big NEAR Muddy] returns 0 pages, so the algorithm relaxes the pattern by cutting the initial noun &quot;river&quot;, according to the heuristic for discarding a noun if it is the first keyword of the question. The second pattern [US NEAR (known OR know OR...) NEAR Big NEAR Muddy] also returns 0 pages, so we apply the heuristic for ignoring verbs like &quot;know&quot;, &quot;call&quot; and abstract nouns like &quot;name&quot;. The third pattern [US NEAR Big NEAR Muddy] returns 28 pages, which is over the experimentally set threshold of seven pages.</Paragraph> <Paragraph position="1"> One of the 50-byte candidate answers from the TREC-2001 answer collection is &quot;recover Mississippi River&quot;. Taking into account the answer type LOCATION, the algorithm considers only the named entity &quot;Mississippi River&quot;. To calculate the answer validity score (in this example PMI) for [Mississippi River], the procedure constructs the validation pattern [US NEAR Big NEAR Muddy NEAR Mississippi River] together with the answer sub-pattern [Mississippi River]. These two patterns are passed to the search engine, and the returned numbers of pages are substituted in the mutual information expression at the places of hits(Qsp NEAR Asp) and hits(Asp) respectively; the previously obtained number (i.e. 28) is substituted at the place of hits(Qsp). In this way an answer validity score of 55.5 is calculated.</Paragraph> <Paragraph position="2"> It turns out that this value is the maximal validity score for all the answers to this question. Other correct answers from the TREC-2001 collection contain the named entity &quot;Mississippi&quot;. Their answer validity score is 11.8, which is greater than 1.2 and also greater than the relative threshold derived from the maximal validity score (55.5). This score (i.e. 11.8) classifies them as relevant answers. On the other hand, all the wrong answers have validity scores below 1 and as a result all of them are classified as irrelevant answer candidates.</Paragraph> </Section> </Section> </Paper>