<?xml version="1.0" standalone="yes"?> <Paper uid="W01-1202"> <Title>MAYA: A Fast Question-answering System Based On A Predictive Answer Indexer*</Title> <Section position="5" start_page="21" end_page="21" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 4.1 The experiment data </SectionTitle> <Paragraph position="0"> To evaluate MAYA, we collected 14,321 documents (65,752 kilobytes) from two web sites: korea.internet.com (6,452 documents) and www.sogang.ac.kr (7,869 documents). The former provides its members with on-line articles on Information Technology (IT); the latter is the homepage of Sogang University. The indexing engine created 14 answer DBs, one per semantic category.</Paragraph> <Paragraph position="1"> For the test data, we collected 50 question-answer pairs from 10 graduate students.</Paragraph> <Paragraph position="2"> Table 1 shows the 14 semantic categories and the number of collected question-answer pairs in each category. As shown in Table 1, 2 of the question-answer pairs fall outside the 14 semantic categories. They are not closed-class question-answers but explanation-seeking question-answers such as &quot;Question: How can I search on- ...&quot;. We use two evaluation schemes. To evaluate MAYA itself, we compute the performance score for each question as the Reciprocal Answer Rank (RAR) of its first correct answer. To compute the overall performance, we use the Mean Reciprocal Answer Rank (MRAR), as shown in Equation 5 (Voorhees and Tice, 1999).</Paragraph> <Paragraph position="4"> With respect to the total system that combines MAYA with the IR system, we use the RDR, which is the reciprocal rank of the first document that includes a correct answer for each question.</Paragraph> </Section>
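Equation 5 is not reproduced in this excerpt. Following the standard definition in Voorhees and Tice (1999), the per-question RAR is 1/rank of the first correct answer (0 if no correct answer is returned), and MRAR is the mean of these values over all questions; the same computation over ranked documents gives the document-level RDR score used for the total system. The Python sketch below illustrates this scoring under those assumptions; the helper names and the top-5 rank cut-off are illustrative, not taken from the paper.

from typing import List, Set, Tuple

def reciprocal_rank(ranked_items: List[str], correct: Set[str], max_rank: int = 5) -> float:
    """Reciprocal rank of the first correct item within the top max_rank, 0.0 if none.

    Hypothetical helper: the paper does not state a cut-off; TREC-8 QA
    (Voorhees and Tice, 1999) scores only the top 5 responses.
    """
    for rank, item in enumerate(ranked_items[:max_rank], start=1):
        if item in correct:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Mean of per-question reciprocal ranks (MRAR over answers, or mean RDR over documents)."""
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)

# Toy usage: three questions with ranked answer strings and their gold answers.
runs = [
    (["1998", "2001", "1995"], {"1998"}),           # correct at rank 1 -> RAR 1.0
    (["Seoul", "Sogang Univ."], {"Sogang Univ."}),   # correct at rank 2 -> RAR 0.5
    (["10 km", "5 km"], {"42 km"}),                  # no correct answer -> RAR 0.0
]
print(round(mean_reciprocal_rank(runs), 3))  # 0.5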
<Section position="2" start_page="21" end_page="21" type="sub_section"> <SectionTitle> 4.2 Analysis of experiment results </SectionTitle> <Paragraph position="0"> The performance of MAYA is shown in Table 2.</Paragraph> <Paragraph position="1"> We obtained the correct answers for 33 of the 50 questions at Top 1.</Paragraph> <Paragraph position="2"> Table 3 shows the performance of the total system. As shown in Table 3, the total system significantly improves the document retrieval performance of the underlying IR system on the closed-class questions.</Paragraph> <Paragraph position="3"> The average retrieval time of the IR system is 0.022 seconds per query; the total system takes 0.029 seconds per query. The difference in retrieval time between the IR system and the total system is small, which means that the overhead added by the QA component is negligible. The IR system shows the user sentences containing query terms, whereas the total system shows the user sentences containing answer candidates. This function spares the user from reading through the whole document to find the answer phrase.</Paragraph> <Paragraph position="4"> MAYA could not extract the correct answers for some questions in this experiment. The failure cases are as follows, and all of them can be resolved by extending the resources and pattern rules: - The lexico-syntactic parser failed to classify users' queries into the predefined semantic categories. We believe that most of these failed queries can be handled by adding lexico-syntactic grammar rules.</Paragraph> <Paragraph position="5"> - The NE recognizer failed to extract answer candidates. To resolve this problem, we should add entries to the PLO dictionary and the unit dictionary, and extend the regular expressions. We should also work to improve the precision of the NE recognizer.</Paragraph> </Section> </Section> </Paper>
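To make the first failure case concrete, the following minimal Python sketch shows how lexico-syntactic pattern rules might classify a question into a predefined semantic category; a question that matches no rule falls through, which corresponds to the classification failures described above, and coverage grows as rules are added. The category names and patterns are assumptions for illustration, not the authors' grammar.

import re

# Hedged sketch: pattern rules mapping a question onto a semantic category.
# Categories and patterns here are illustrative assumptions.
CATEGORY_PATTERNS = {
    "person":   [re.compile(r"\bwho\b", re.I)],
    "date":     [re.compile(r"\bwhen\b", re.I), re.compile(r"\bwhat year\b", re.I)],
    "location": [re.compile(r"\bwhere\b", re.I)],
    "quantity": [re.compile(r"\bhow (many|much|long|far)\b", re.I)],
}

def classify(question: str):
    """Return the first semantic category whose pattern matches, or None on failure."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(p.search(question) for p in patterns):
            return category
    return None  # analogous to the parser failures described in Section 4.2

print(classify("When was Sogang University founded?"))  # date
print(classify("Explain how MAYA indexes answers."))    # None: explanation-seeking question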