<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1199">
  <Title>Learning to Identify Single-Snippet Answers to Definition Questions</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Since the introduction of the TREC QA track (Voorhees, 2001), question answering systems for document collections have attracted a lot of attention. The goal is to return from the collection text snippets (eg., 50 or 250 characters long) or exact answers (e.g., names, dates) that answer natural language questions submitted by users.</Paragraph>
    <Paragraph position="1"> A typical system first classifies the question into one of several categories (questions asking for locations, persons, etc.), producing expectations of types of named entities that must be present in the answer (location names, person names, etc.). Using the question terms as a query, an information retrieval (IR) system identifies possibly relevant passages in the collection, often after query expansion (e.g., adding synonyms). Snippets of these passages are then selected and ranked, based on criteria such as whether or not they contain the expected types of named entities, the percentage of question words in each snippet, the percentage of words that also occur in other candidate snippets, etc. The system reports the most highly-ranked snippets, or, in the case of exact answers, named entities of the required type therein.</Paragraph>
    <Paragraph position="2"> Unfortunately, the approach highlighted above falls short with questions that do not generate expectations of particular types of named entities and contain very few non-stop-words. Definition questions (e.g. &amp;quot;What is a nanometer?&amp;quot;, &amp;quot;Who was Duke Ellington?&amp;quot;) have both properties, and are particularly common. In TREC-2001, where the distribution of question types reflected that of real user logs, 27% of the questions were requests for definitions. Hence, techniques to handle this category of questions are very important.</Paragraph>
    <Paragraph position="3"> We propose a new method to answer definition questions, that combines and extends the technique of Prager et al. (2001, 2002), which relied on WordNet hypernyms, and that of Joho et al. (2001, 2002), which relied on manually crafted lexical patterns, sentence position, and word co-occurrence across candidate answers. We train an SVM (Scholkopf and Smola, 2002) on vectors whose attributes include the verdict of Prager et al.'s method, the attributes of Joho et al., and additional phrasal attributes that we acquire automatically. The SVM is then used to identify and rank 250-character snippets, each intended to contain a stand-alone definition of a given term, much as in TREC QA tasks prior to 2003.</Paragraph>
    <Paragraph position="4"> In TREC-2003, the answers to definition questions had to be lists of complementary snippets (Voorhees, 2003), as opposed to single-snippet definitions. Here, we focus on the pre-2003 task, for which TREC data were publicly available during our work. We believe that this task is still interesting and of practical use. For example, a list of single-snippet definitions accompanied by their source URLs can be a good starting point for users of search engines wishing to find definitions.</Paragraph>
    <Paragraph position="5"> Single-snippet definitions can also be useful in information extraction, where the templates to be filled in often require short entity descriptions; see Radev and McKeown (1997). Experiments indicate that our method clearly outperforms the techniques it builds upon in the task we considered. We sketch in section 6 how we plan to adapt our method to the post-2003 TREC task.</Paragraph>
  </Section>
</Paper>