<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1201">
<Title>Question answering via Bayesian inference on lexical relations</Title>
<Section position="3" start_page="1" end_page="1" type="intro">
<SectionTitle>2 Related work</SectionTitle>
<Paragraph position="0">Information Retrieval (IR) systems such as SMART (Buckley, 1985) rank documents by relevance to a user query, based on keyword matches between the query and each document, both represented in the well-known "vector space model". The degree of match is measured as the cosine of the angle between the query and document vectors.</Paragraph>
<Paragraph position="1">In QA, an IR subsystem is typically used to short-list passages that are likely to embed the answer. Several enhancements are usually made to stock IR systems for this task.</Paragraph>
<Paragraph position="2">First, the cosine measure used in stock vector-space systems is biased against long documents, even if they embed the answer in a narrow zone. This problem can be ameliorated by representing suitably sized passage windows (rather than whole documents) as vectors. While scoring passages with the cosine measure, we can also ignore passage terms that do not occur in the query.</Paragraph>
<Paragraph position="3">The second issue is one of proximity. A passage is likely to be promising if the query words occur close to one another. Commercial search engines reward proximity of matched query terms, but in undocumented ways. Clarke et al. (2001) exploit term proximity within documents for passage scoring.</Paragraph>
<Paragraph position="4">The third and most important limitation of stock IR systems is their inability to bridge, via lexical networks, the lexical chasm between a question and a potential answer. One query from TREC (Voorhees, 2000) asks, "Who painted Olympia?" The answer is in the passage: "Manet, who, after all, created Olympia, gets no credit." QA systems use a gamut of techniques to deal with this problem. FALCON (Harabagiu et al., 2000), one of the best QA systems in recent TREC competitions, integrates syntactic, semantic and pragmatic knowledge for QA. It uses WordNet-based query expansion to try to bridge the lexical chasm. WordNet is customized into an answer-type taxonomy to infer the expected answer type for a question. Named-entity recognition techniques are also employed to improve the quality of retrieved passages. The answers are finally filtered by justifying them using abductive reasoning. Mulder (Kwok et al., 2001) uses a similar approach to perform QA at Web scale. The well-known START system (Katz, ) goes even further in this direction.</Paragraph>
<Paragraph position="5">Discussion: In general, the TREC QA systems divide QA into two tasks: identifying relevant documents and extracting answer passages from them.</Paragraph>
<Paragraph position="6">For the former task, most systems use traditional IR engines coupled with ad hoc query expansion based on WordNet. Handcrafted knowledge bases, question/answer type classifiers and a variety of heuristics are used for the latter task. Success in QA thus comes at the cost of great effort spent on custom-designed wordnets and ontologies, and on expansion, matching and scoring heuristics that must be upgraded as the knowledge bases are enhanced. Ideally, we should use a knowledge base that can be readily extended, and a core scoring algorithm that is elegant and "universal".</Paragraph>
</Section>
</Paper>
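To make the passage-scoring ideas in Section 2 concrete, here is a minimal sketch of cosine scoring over sliding passage windows, with passage terms that do not occur in the query ignored. It is not the SMART implementation; it uses plain term frequencies (no tf-idf weighting, stemming, or stopword removal), and all function names, window sizes, and parameters below are illustrative assumptions.

```python
# Sketch of cosine-based passage scoring restricted to query terms.
# Illustrative only: raw term frequencies, no tf-idf, stemming, or stopwords.
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def cosine_passage_score(query, passage):
    q_vec = Counter(tokenize(query))
    # Keep only passage terms that also occur in the query.
    p_vec = Counter(t for t in tokenize(passage) if t in q_vec)
    dot = sum(q_vec[t] * p_vec[t] for t in q_vec)
    q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
    p_norm = math.sqrt(sum(v * v for v in p_vec.values()))
    if q_norm == 0 or p_norm == 0:
        return 0.0
    return dot / (q_norm * p_norm)

def best_passages(query, document, window=30, stride=15, top_k=3):
    """Slide a fixed-size window over the document and rank windows by cosine score."""
    tokens = tokenize(document)
    windows = [" ".join(tokens[i:i + window])
               for i in range(0, max(1, len(tokens) - window + 1), stride)]
    scored = sorted(((cosine_passage_score(query, w), w) for w in windows),
                    reverse=True)
    return scored[:top_k]
```

Scoring fixed-size windows rather than whole documents is what removes the length bias noted above: a long document containing one dense answer zone still yields at least one high-scoring window.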
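The proximity idea can likewise be sketched as a simple heuristic: a passage scores higher when more of the matched query terms fit into a shorter token window. This is only a toy stand-in, not the actual document-structure-based method of Clarke et al. (2001); the function names and the scoring formula are assumptions.

```python
# Toy proximity heuristic: reward passages whose matched query terms
# are packed into a short window. Not Clarke et al.'s (2001) method.
def min_cover_span(query_terms, passage_tokens):
    """Length of the shortest token window containing every query term
    that occurs in the passage at least once (None if none occur)."""
    present = {t for t in passage_tokens if t in query_terms}
    if not present:
        return None
    need, have, left = len(present), 0, 0
    counts = {}
    best = len(passage_tokens)
    for right, tok in enumerate(passage_tokens):
        if tok in present:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                have += 1
        while have == need:
            best = min(best, right - left + 1)
            ltok = passage_tokens[left]
            if ltok in present:
                counts[ltok] -= 1
                if counts[ltok] == 0:
                    have -= 1
            left += 1
    return best

def proximity_score(query_terms, passage_tokens):
    """More matched terms in a shorter covering window -> higher score."""
    span = min_cover_span(query_terms, passage_tokens)
    if span is None:
        return 0.0
    matched = len(set(passage_tokens) & set(query_terms))
    return matched / span
```

A score like this would typically be combined with the cosine score above rather than used alone.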
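Finally, a sketch of the kind of WordNet-based query expansion that systems such as FALCON use to narrow the lexical chasm: each query term is expanded with lemmas from its WordNet synsets. This assumes NLTK's WordNet interface is available and is far simpler than FALCON's actual machinery (no sense disambiguation, no answer-type taxonomy); the function name and the cap on added terms are assumptions for illustration.

```python
# Naive WordNet-based query expansion: add synset lemmas for each query term.
# Assumes NLTK with the 'wordnet' corpus installed; real systems are far more
# selective (word-sense disambiguation, answer-type constraints, weighting).
from nltk.corpus import wordnet as wn

def expand_query(query_terms, max_new_terms_per_word=3):
    expanded = set(t.lower() for t in query_terms)
    for term in query_terms:
        added = 0
        for synset in wn.synsets(term):
            for lemma in synset.lemma_names():
                lemma = lemma.replace("_", " ").lower()
                if lemma not in expanded:
                    expanded.add(lemma)
                    added += 1
                    if added >= max_new_terms_per_word:
                        break
            if added >= max_new_terms_per_word:
                break
    return expanded

# Usage: expand_query(["painted", "Olympia"]) adds whatever synonym lemmas
# WordNet lists for each term; unconstrained expansion like this is exactly
# the kind of ad hoc step the Discussion paragraph argues against.
```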