<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1082"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 652-659, Vancouver, October 2005. ©2005 Association for Computational Linguistics. A Methodology for Extrinsically Evaluating Information Extraction Performance</Title> <Section position="5" start_page="658" end_page="658" type="concl"> <SectionTitle> 4 Conclusions </SectionTitle> <Paragraph position="0"> We presented a methodology for assessing information extraction effectiveness using an extrinsic study. In addition, we demonstrated how a novel database blending (merging) strategy allows extraction quality to be interpolated from automated performance up through human accuracy, thereby decreasing the resources required to conduct effectiveness evaluations.</Paragraph> <Paragraph position="1"> Experiments showed that QA accuracy and speed increased with higher IE performance, and that the database blend percentage was a good proxy for ACE value scores. We emphasize that the study was not intended to show that IE supports QA better than other technologies, but rather to isolate the utility gains due to IE performance improvements.</Paragraph> <Paragraph position="2"> QA performance was plotted against the human-machine IE blend; for example, 70% QA performance was achieved with a database blend between 41% and 46% machine extraction. This corresponded to entity and relationship value scores of roughly 74 and 47, respectively.</Paragraph> <Paragraph position="3"> The logistic dose-response model provided a good fit and allowed computation of confidence bounds for the IE level associated with a particular level of QA performance.
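As a rough illustration of how a logistic dose-response curve can be inverted to find the blend level associated with a target QA accuracy, the following Python sketch fits a two-parameter logistic by least squares on the logit scale. The data points, the parameter values, and all function names are hypothetical and are not taken from the study; the study's own fitting procedure and its confidence-bound computation are not reproduced here.

```python
import math


def logistic(x, a, b):
    """Logistic dose-response curve: predicted QA accuracy at blend level x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def logit(p):
    """Inverse of the logistic function (log-odds of a proportion p)."""
    return math.log(p / (1.0 - p))


def fit_logit_line(xs, ps):
    """Ordinary least squares of logit(p) on x; returns (a, b).

    Simplification (assumption): a full analysis would instead use maximum
    likelihood on the per-question correct/incorrect outcomes.
    """
    ys = [logit(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def blend_for_accuracy(target, a, b):
    """Invert the fitted curve: blend level at which accuracy equals target."""
    return (logit(target) - a) / b


# Hypothetical, noise-free data generated from made-up parameters
# (NOT the paper's measurements). x is the percent of machine extraction.
TRUE_A, TRUE_B = 2.0, -0.03
blends = [0.0, 25.0, 50.0, 75.0, 100.0]
accuracies = [logistic(x, TRUE_A, TRUE_B) for x in blends]

a_hat, b_hat = fit_logit_line(blends, accuracies)
x70 = blend_for_accuracy(0.70, a_hat, b_hat)
print(f"fitted a={a_hat:.3f}, b={b_hat:.4f}, blend at 70% accuracy={x70:.1f}")
```

Because the synthetic points lie exactly on a logistic curve, the fit recovers the generating parameters; with real, noisy proportions the estimate and its uncertainty would depend on the fitting method chosen.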
The constraints imposed by AnswerPad and FactBrowser ensured that world knowledge was neutralized, and the repeated-measures design (using participants as their own controls across multiple levels of database quality) excluded inter-participant variability from experimental error, increasing the ability to detect differences with relatively small sample sizes.</Paragraph> </Section> </Paper>