<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0604">
  <Title>Answer Extraction Towards better Evaluations of NLP Systems</Title>
  <Section position="8" start_page="25" end_page="26" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We are convinced that reading comprehension tests are too difficult for the current state of the art in natural language processing. Our analysis of the Maple Syrup story shows how much world knowledge and how many inference rules are needed to actually answer the test questions correctly.</Paragraph>
    <Paragraph position="1"> Therefore, we think that a more restricted kind of task, one that focuses on tractable problems rather than on the AI-hard problems of question answering (QA), is better suited to take our field a step further. Answer Extraction (AE) is an alternative to QA that relies mainly on linguistic knowledge. AE aims at retrieving those exact passages of a document that directly answer a given user query. AE is less ambitious than full-fledged QA since the answers are not generated from a knowledge base but looked up in the documents. These documents come from a well-defined (technical) domain and consist of a relatively small volume of data. Our test queries are real-world queries that express a concrete information need. To evaluate our AE systems, we propose, besides precision and recall, two additional measures: succinctness and correctness.</Paragraph>
    <Paragraph position="2"> These measures assess the quality of answers at the sentence level and are computed on the basis of the overlap of logical predicates.</Paragraph>
    <Paragraph position="3"> To round out the picture, we address the questions in (WRC, 2000) in view of what we have said in this paper: Q: Can such exams [reading comprehension tests] be used to evaluate computer-based language understanding effectively and efficiently? A: We think that no language understanding system will currently be able to answer a significant proportion of such questions, which will make evaluation results difficult at best, meaningless at worst.</Paragraph>
    <Paragraph position="4"> Q: Would they provide an impetus and test bed for interesting and useful research? A: We think that the impetus they might provide would drive development in the wrong direction, viz. towards the creation of (possibly impressive) engineering feats without much linguistically interesting content.</Paragraph>
    <Paragraph position="5"> Q: Are they too hard for current technology? A: Definitely, and by a long shot.</Paragraph>
    <Paragraph position="6"> Q: Or are they too easy, such that simple hacks can score high, although there is clearly no understanding involved? A: &amp;quot;Simple hacks&amp;quot; would almost certainly score higher than linguistically interesting methods, but not because the task is too simple; rather, because it is far too difficult.</Paragraph>
  </Section>
</Paper>