File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/h05-1040_intro.xml

Size: 5,882 bytes

Last Modified: 2025-10-06 14:02:50

<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1040">
  <Title>Enhanced Answer Type Inference from Questions using Sequential Models</Title>
  <Section position="2" start_page="0" end_page="315" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> An important step in factual question answering (QA) and other dialog systems is to classify the question (e.g., Who painted Olympia?) to the anticipated type of the answer (e.g., person). This step is called &amp;quot;question classification&amp;quot; or &amp;quot;answer type identification&amp;quot;.</Paragraph>
    <Paragraph position="1"> The answer type is picked from a hand-built taxonomy having dozens to hundreds of answer types (Harabagiu et al., 2000; Hovy et al., 2001; Kwok et al., 2001; Zheng, 2002; Dumais et al., 2002). QA [?] soumen@cse.iitb.ac.in systems can use the answer type to short-list answer tokens from passages retrieved by an information retrieval (IR) subsystem, or use the type together with other question words to inject IR queries.</Paragraph>
    <Paragraph position="2"> Early successful QA systems used manually-constructed sets of rules to map a question to a type, exploiting clues such as the wh-word (who, where, when, how many) and the head of noun phrases associated with the main verb (what is the tallest mountain in ...).</Paragraph>
    <Paragraph position="3"> With the increasing popularity of statistical NLP, Li and Roth (2002), Hacioglu and Ward (2003) and Zhang and Lee (2003) used supervised learning for question classification on a data set from UIUC that is now standard1. It has 6 coarse and 50 fine answer types in a two-level taxonomy, together with 5500 training and 500 test questions. Webclopedia (Hovy et al., 2001) has also published its taxonomy with over 140 types.</Paragraph>
    <Paragraph position="4"> The promise of a machine learning approach is that the QA system builder can now focus on designing features and providing labeled data, rather than coding and maintaining complex heuristic rulebases. The data sets and learning systems quoted above have made question classification a well-defined and non-trivial subtask of QA for which algorithms can be evaluated precisely, isolating more complex factors at work in a complete QA system.</Paragraph>
    <Paragraph position="5"> Prior work: Compared to human performance, the accuracy of question classifiers is not high. In all studies, surprisingly slim gains have resulted from sophisticated design of features and kernels.</Paragraph>
    <Paragraph position="6"> Li and Roth (2002) used a Sparse Network of Winnows (SNoW) (Khardon et al., 1999). Their features included tokens, parts of speech (POS), chunks (non-overlapping phrases) and named entity (NE) tags. They achieved 78.8% accuracy for 50 classes, which improved to 84.2% on using an (unpublished, to our knowledge) hand-built dictionary of &amp;quot;semantically related words&amp;quot;.</Paragraph>
    <Paragraph position="7">  Hacioglu and Ward (2003) used linear support vector machines (SVMs) with question word 2grams and error-correcting output codes (ECOC)-but no NE tagger or related word dictionary--to get 80.2-82% accuracy.</Paragraph>
    <Paragraph position="8"> Zhang and Lee (2003) used linear SVMs with all possible question word q-grams, and obtained 79.2% accuracy. They went on to design an ingenious kernel on question parse trees, which yielded visible gains for the 6 coarse labels, but only &amp;quot;slight&amp;quot; gains for the 50 fine classes, because &amp;quot;the syntactic tree does not normally contain the information required to distinguish between the various fine categories within a coarse category&amp;quot;.</Paragraph>
    <Paragraph position="9">  (1) SNoW accuracy without the related word dictionary was not reported. With the related-word dic null tionary, it achieved 91%. (2) SNoW with a relatedword dictionary achieved 84.2% but the other algorithms did not use it. Our results are summarized in the last two rows, see text for details.</Paragraph>
    <Paragraph position="10"> Our contributions: We introduce the notion of the answer type informer span of the question (in SS2): a short (typically 1-3 word) subsequence of question tokens that are adequate clues for question classification; e.g.: How much does an adult elephant weigh? We show (in SS3.2) that a simple linear SVM using features derived from human-annotated informer spans beats all known learning approaches. This confirms our suspicion that the earlier approaches suffered from a feature localization problem.</Paragraph>
    <Paragraph position="11"> Of course, informers are useful only if we can find ways to automatically identify informer spans. Surprisingly, syntactic pattern-matching and heuristics widely used in QA systems are not very good at capturing informer spans (SS3.3). Therefore, the notion of an informer is non-trivial.</Paragraph>
    <Paragraph position="12"> Using a parse of the question sentence, we derive a novel set of multi-resolution features suitable for training a conditional random field (CRF) (Lafferty et al., 2001; Sha and Pereira, 2003). Our feature design paradigm may be of independent interest (SS4). Our informer tagger is about 85-87% accurate.</Paragraph>
    <Paragraph position="13"> We use a meta-learning framework (Chan and Stolfo, 1993) in which a linear SVM predicts the answer type based on features derived from the original question as well as the output of the CRF. This meta-classifier beats all published numbers on standard question classification benchmarks (SS4.4). Table 1 (last two rows) summarizes our main results.</Paragraph>
  </Section>
class="xml-element"></Paper>