<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1037">
  <Title>The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
6 Performance evaluation
</SectionTitle>
    <Paragraph position="0"> To evaluate the role of lexico-semantic feedback loops in an open-domain textual Q&amp;A system we have relied on the 890 questions employed in the TREC-8 and TREC-9 Q&amp;A evaluations.</Paragraph>
    <Paragraph position="1"> In TREC, for each question the performance was computed as the reciprocal value of the rank (RAR) of the highest-ranked correct answer given by the system. Since only the first five answers were considered in the TREC evaluations, the RAR for question i is defined as RAR_i = 1/rank_i, where rank_i is the rank of the highest-ranked correct answer (with RAR_i = 0 if no correct answer appears among the first five).</Paragraph>
    <Paragraph position="3"> Thus the RAR is 1 when the first answer is correct; 0.5 if the second answer was correct, but not the first one; 0.33 when the correct answer was in the third position; 0.25 if the fourth answer was correct; 0.2 when the fifth answer was correct; and 0 if none of the first five answers was correct. The Mean Reciprocal Answer Rank (MRAR) is used to compute the overall performance of the systems participating in the TREC evaluations. In addition, TREC-9 imposed the constraint that an answer is considered correct only when the textual context from the document that contains it can account for it. When the human assessors were convinced this constraint was satisfied, they considered the RAR strict; otherwise, the RAR was considered lenient.</Paragraph>
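The RAR and MRAR scoring described above can be sketched as follows (a minimal illustration; the function names and the example ranks are ours, not from the paper):

```python
def reciprocal_answer_rank(correct_ranks, cutoff=5):
    """RAR: reciprocal of the rank of the highest-ranked correct answer.

    correct_ranks lists the positions (1-based) at which correct answers
    appeared; only the first `cutoff` answers count, as in TREC.
    Returns 0.0 if no correct answer is within the cutoff.
    """
    in_window = [r for r in correct_ranks if 1 <= r <= cutoff]
    return 1.0 / min(in_window) if in_window else 0.0

def mean_reciprocal_answer_rank(per_question_ranks):
    """MRAR: average of the per-question RARs over the whole question set."""
    rars = [reciprocal_answer_rank(ranks) for ranks in per_question_ranks]
    return sum(rars) / len(rars)

# Example: correct answer at rank 1 for Q1, rank 3 for Q2, outside the
# top five for Q3 -> MRAR = (1 + 0.333... + 0) / 3
print(mean_reciprocal_answer_rank([[1], [3], [7]]))
```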
    <Paragraph position="4"> Table 2 summarizes the MRARs provided by NIST for the system on which we evaluated the role of lexico-semantic feedbacks. Table 3 lists the quantitative analysis of the feedback loops. Loop 1 was generated more often than any other loop. However, the small overall average number of feedback loops that were carried out indicates that they add little overhead to the Q&amp;A system. More interesting is the qualitative analysis of the effect of the feedback loops on the Q&amp;A evaluation. Overall, the precision increases substantially when all loops are enabled, as illustrated in Table 4.</Paragraph>
    <Paragraph position="6"> Individually, Loop 1 produced an accuracy increase of over 40%, Loop 2 an enhancement of more than 52%, while Loop 3 produced an enhancement of only 8%. Table 4 also lists the combined effect of the feedbacks, showing that when all feedbacks are enabled, for short answers we obtained an MRAR of 0.568, i.e., a 76% increase over Q&amp;A without feedbacks. The MRAR for long answers had a similar increase of 91%. Because we also used the answer caching technique, we gained more than 1% for short answers and almost 3% for long answers, obtaining the results listed in Table 2. In our experiments, out of the total of 890 TREC questions, lexical alternations were used for 129 questions and semantic alternations were needed for only 175 questions.</Paragraph>
  </Section>
</Paper>