<?xml version="1.0" standalone="yes"?> <Paper uid="N03-2004"> <Title>Exploiting Diversity for Answering Questions</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Progress in question answering technology can be measured as individual systems improve in accuracy, but that is not the only way to witness technological progress. One can also ask how well we can perform automatic question answering as a community. If we were asked to enter an Earth English system in an intergalactic TREC, how well would we do? One easy answer is that we would perform as well as the best QA system.</Paragraph> <Paragraph position="1"> A second answer is that perhaps we could do even better by combining systems; this might be expected to work if different systems were independent in their errors. The follow-up question is how we would build such a system. Lower bounds on the highest possible performance current technology can achieve on a given dataset have practical value as well. They allow us to better estimate how well systems are doing with respect to the underlying difficulty of the dataset, and they continually provide performance targets that are known to be achievable. Without such lower bounds on optimal performance, one cannot determine whether technological progress in a domain has simply stalled.</Paragraph> <Paragraph position="2"> NIST's ROVER system for combining speech recognizer output gives ASR researchers an updated goal to shoot for after every evaluation, as well as an implicit measure of the extent to which systems are making the same errors (Fiscus, 1997). The work herein initiates a similar set of experiments for question answering technology.</Paragraph> </Section> </Paper>