<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1112">
  <Title>AnyQ: Answer Set based Information Retrieval System</Title>
  <Section position="4" start_page="6" end_page="6" type="metho">
    <SectionTitle>
5. Experiments
</SectionTitle>
    <Paragraph position="0"> Whereas traditional Q/A and IR system have competition conference, like TREC, so that they can start with standard retrieval test collection, to explore how useful the proposed approach, we evaluate performance of answer document and candidate answer sentence. Another difference comes from the fact that result units for these systems are different. That is Q/A system returns exactly relevant answer (50 byte or 250 byte), while IR system returns document scored by ranking mechanism. Our system returns answer set distilled semantic knowledge as retrieval result</Paragraph>
    <Section position="1" start_page="6" end_page="6" type="sub_section">
      <SectionTitle>
5.1. Automatic Answer Set Construction
</SectionTitle>
      <Paragraph position="0"> Before evaluating our retrieval system, we were interested in knowing how effective and efficient the proposed knowledge base construction method. We tested the attribute-based classification for automatic construction method with 4,599 documents, 120 concepts, and 83 attributes. For performance comparisons, we used the standard precision, recall, and F-score [12]. Table 2 shows that the scores for using a-relation are higher than that of not using the relation. We gain a 31.4% increase in F-score and 400% in speed by using the knowledge. The potential advantage of using the a-relation is the ability to minimize the efforts not only required training set  nstruction but also new answer set construction. On e other hand, a disadvantage is that it has a less ance to assign new attributes. The result our tomatic answer set construction was to establish a ound work for further experiments.</Paragraph>
      <Paragraph position="1"> Performance of Answer set retrieval For our experimental evaluations we constructed al system in Web, named AnyQ. Our nyQ system currently consists of 14,700 concepts, unique attributes, and more than 1.8 million web ents in the economy domain for Korean. The erage number of document under each concept is the average number of answer document is 25, d the average number of attribute is 18. To measure ormance of retrieving answer set, we build 110 ery-relevant answer set, judged by 4 assessor. Our assessors team with 2 people. For performance comparisons, we used the P, R, F-score and MRR[5] for highlighted sentence. All retrieval runs are completely automatic, starting with queries, retrieve answer documents, and finally generating a ranked list of 5 candidate answer sentence.</Paragraph>
      <Paragraph position="2"> We build traditional Web IR system on the same document set for baseline system. The Web IR system uses 2-poisson model for term indexing and vector space model for document retrieving. Table 3 summarized the effectiveness of our initial search step, answer set search. As expected, we obtained better results progressively as answer set based approach. The accuracy of Web IR become higher top10(0.31) to top5(0.291) when we determine more number of documents retrieved. By contrast, AS based IR has improvement both precision(0.769) and recall(0.655) when we assess less number of documents on top ranked. Even when all documents was considered(0.468) is higher than Web IR top 10(0.3). It comes from the fact that Web IR retrieves massive documents appeared term query. But AS based IR handled prepared answer set. That is, AS based IR tend to set highly relevant documents on top result. In other words, answer set based approach can be easier for user to find information they need with less effort.</Paragraph>
      <Paragraph position="3"> To evaluate highlighted paragraphs, we generate a ranked list of 5 candidate answer sentences considered to contain the answer to the query. The score is 0.78 MRR. As mentioned before, our result is not the same type as TREC answer. But we can say that highlighted sentences are helpful to satisfy user information need.</Paragraph>
      <Paragraph position="4"> We further realized that the query pattern as attribute was not sufficient for finding answer.</Paragraph>
      <Paragraph position="5"> Moreover, Korean has characteristic, various variation of same pattern, its duplicate over the attributes. It brings the fact that query processing has ambiguity. Another weakness of our system is that the accuracy of retrieval depends on knowledge base granularity. That is, the effectiveness of attribute-based classification influences whole process of our approach.</Paragraph>
      <Paragraph position="6"> Unfortunately, Our experience cannot compare with other commercial system since there is no standard test collection. By the way AskJeeves was published their accuracy of retrieval is over 30~40%[7], however, this is not absolute contrast.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>