File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/01/w01-1202_intro.xml
Size: 4,648 bytes
Last Modified: 2025-10-06 14:01:17
<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1202"> <Title>MAYA: A Fast Question-answering System Based On A Predictive Answer Indexer*</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Information Retrieval (IR) systems have been applied successfully to large-scale search tasks in which indexing and searching speed is important. Unfortunately, they return a large number of documents that contain the index terms of a user's query. Hence, the user must carefully read through the retrieved text to find a short phrase that precisely answers his/her question.</Paragraph> <Paragraph position="1"> * This research was partly supported by the BK21 program of Telecommunications.</Paragraph> <Paragraph position="2"> Question-answering (QA), an area of IR, is attracting more attention, as shown in the proceedings of AAAI (AAAI, 1999) and TREC (TREC, http://trec.nist.gov/overview.html). A QA system searches a large collection of texts and filters out inadequate phrases or sentences within the texts. By using a QA system, a user can promptly reach the answer phrases without tedious manual searching. However, most current QA systems (Ferret et al., 1999; Hull, 1999; Srihari and Li, 1999; Prager et al., 2000) have the following two problems: • They cannot respond correctly to all of the users' questions. They can answer only questions that fall into pre-defined categories such as person, date, and time.</Paragraph> <Paragraph position="3"> • They require more indexing or searching time than traditional IR systems do because they need deep linguistic knowledge, such as the syntactic or semantic roles of words.</Paragraph> <Paragraph position="4"> To solve these problems, we propose MAYA (MAke Your Answer), a QA system that uses a predictive answer indexer. 
We can easily add new categories to MAYA simply by supplementing its domain dictionaries and rules.</Paragraph> <Paragraph position="5"> We do not have to revise the search engine of MAYA because the indexer is designed as a separate component that extracts candidate answers. In addition, a user can promptly obtain answer phrases at retrieval time because MAYA indexes answer candidates in advance.</Paragraph> <Paragraph position="6"> Most previous approaches in IR have focused on methods for efficiently representing the terms in a document, because they aim to index and search a large amount of data in a short time (Salton et al., 1983; Salton and McGill, 1983; Salton, 1989). These approaches have been applied successfully to commercial search engines (e.g.</Paragraph> <Paragraph position="7"> http://www.altavista.com) on the World Wide Web (WWW). However, in the true sense of information retrieval, as opposed to document retrieval, a user still needs to find an answer phrase within the vast number of retrieved documents, even though he/she can promptly find the relevant documents by using these engines. Recently, several QA systems have been proposed to avoid this unnecessary answer-finding effort (Ferret et al., 1999; Hull, 1999; Moldovan et al., 1999; Prager et al., 1999; Srihari and Li, 1999). Recent research has combined the strengths of traditional IR systems and QA systems (Prager et al., 2000; Prager et al., 1999; Srihari and Li, 1999). Most of the combined systems access a huge amount of electronic information by using IR techniques and improve precision by using QA techniques. In detail, they retrieve a large number of documents that are relevant to a user's query by using the well-known TF·IDF weighting.</Paragraph> <Paragraph position="8"> Then, they extract answer candidates from the documents and filter the candidates by using an expected answer type and some rules at retrieval time. 
Although they are based on shallow NLP techniques (Sparck-Jones, 1999), they consume much more retrieval time than traditional IR systems do because of the additional effort mentioned above. To save retrieval time, MAYA extracts answer candidates and computes their scores at indexing time. At retrieval time, it only calculates the similarities between a user's query and the candidates. As a result, it can minimize the retrieval time.</Paragraph> <Paragraph position="9"> This paper is organized as follows. In Section 2, we review previous work on QA systems. In Section 3, we describe the applied NLP techniques and present our system. In Section 4, we analyze the results of our experiments. Finally, we draw conclusions in Section 5.</Paragraph> </Section> </Paper>