<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1136">
  <Title>Reranking Answers for Definitional QA Using Language Modeling</Title>
  <Section position="3" start_page="0" end_page="1081" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In recent years, QA systems in TREC (Text REtrieval Conference) have made remarkable progress (Voorhees, 2002). The task of TREC QA before 2003 has mainly focused on the factoid questions, in which the answer to the question is a number, a person name, or an organization name, or the like.</Paragraph>
    <Paragraph position="1"> Questions like &amp;quot;Who is Colin Powell?&amp;quot; or &amp;quot;What is mold?&amp;quot; are definitional questions *This work was finished while the first author was visiting Microsoft Research Asia during March 2005-March 2006 as a component of the project of AskBill Chatbot led by Dr. Ming Zhou.</Paragraph>
    <Paragraph position="2"> (Voorhees, 2003). Statistics from 2,516 Frequently Asked Questions (FAQ) extracted from Internet FAQ Archives1 show that around 23.6% are definitional questions. This indicates that definitional questions occur frequently and are important question types. TREC started the evaluation for definitional QA in 2003. The definitional QA systems in TREC are required to extract definitional nuggets/sentences that contain the highly descriptive information about the question target from a given large corpus.</Paragraph>
    <Paragraph position="3"> For definitional question, statistical ranking methods based on centroid vector (profile) extracted from external resources, such as the online encyclopedia, are widely adopted in the top systems in TREC 2003 and 2004 (Xu et al., 2003; Blair-Goldensohn et al., 2003; Wu et al., 2004). In these systems, for a given question, a vector is formed consisting of the most frequent co-occurring terms with the question target as the question profile. Candidate answers extracted from a given large corpus are ranked based on their similarity to the question profile. The similarity is normally the TFIDF score in which both the candidate answer and the question profile are treated as a bag of words in the framework of Vector Space Model (VSM).</Paragraph>
    <Paragraph position="4"> VSM is based on an independence assumption, which assumes that terms in a vector are statistically independent from one another. Although this assumption makes the development of retrieval models easier and the retrieval operation tractable, it does not hold in textual data. For example, for question &amp;quot;Who is Bill Gates?&amp;quot; words &amp;quot;born&amp;quot; and &amp;quot;1955&amp;quot; in the candidate answer are not independent.</Paragraph>
    <Paragraph position="5"> In this paper, we are interested in considering the term dependence to improve the answer reranking for definitional QA. Specifically, the  language model is utilized to capture the term dependence. A language model is a probability distribution that captures the statistical regularities of natural language use. In a language model, key elements are the probabilities of word sequences, denoted as P(w1, w2, ..., wn) or P (w1,n) for short. Recently, language model has been successfully used for information retrieval (IR) (Ponte and Croft, 1998; Song and Croft, 1998; Lafferty et al., 2001; Gao et al., 2004; Cao et al., 2005). Our natural thinking is to apply language model to rank the candidate answers as it has been applied to rank search results in IR task.</Paragraph>
    <Paragraph position="6"> The basic idea of our research is that, given a definitional question q, an ordered centroid OC which is learned from the web and a language model LM(OC) which is trained with it. Candidate answers can be ranked by probability estimated by LM(OC). A series of experiments on standard TREC 2003 collection have been conducted to evaluate bigram and biterm language models. Results show that both these two language models produce promising results by capturing the term dependence and biterm model achieves the best performance. Biterm language model interpolating with unigram model significantly improves the VSM and unigram model by 14.9% and 12.5% in F-Measure(5).</Paragraph>
    <Paragraph position="7"> In the rest of this paper, Section 2 reviews related work. Section 3 presents details of the proposed method. Section 4 introduces the structure of our experimental system. We show the experimental results in Section 5, and conclude the paper in Section 6.</Paragraph>
  </Section>
class="xml-element"></Paper>