<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1097"> <Title>INFORMATION EXTRACTION SYSTEM EVALUATION</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> INFORMATION EXTRACTION SYSTEM EVALUATION </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> PROJECT GOALS </SectionTitle> <Paragraph position="0"> This year, project efforts are focused on reapplying and revising existing evaluation techniques to evaluate English and Japanese information extraction systems in the joint ventures and microelectronics domains. This year's effort will culminate in the Fifth Message Understanding Conference (MUC-5) in August, 1993.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> MUC-4: The MUC-4 evaluation was conducted in FY92, the conference was held in June, 1992, and the proceedings were published in September. A single-value metric based on recall and precision was developed, and statistical significance tests were conducted. A blind test of 17 systems was conducted using an improved version of the Latin American terrorism information extraction task originally defined for MUC-3. Nearly all veteran systems achieved higher levels of performance in MUC-4, but the top scores are still only moderate. 
Progress in controlling the tendency to generate spurious data was obvious, but the problem still exists, along with the problems of insufficient domain coverage and insufficient general world knowledge. The push to extend the systems has brought into focus the adverse effect that errors made in early stages of processing, at the sentence and phrasal level, have on suprasentential processing done in subsequent stages.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> TIPSTER INTERIM EVALUATIONS </SectionTitle> <Paragraph position="0"> The scoring software used for MUC-4 was rewritten for the object-oriented Tipster template design. Accommodations were made for scoring Japanese. Alternative scoring procedures and new metrics were introduced. The Tipster English and Japanese systems were evaluated in September, 1992 on joint ventures, and in February, 1993 on both joint ventures and microelectronics. The results of these evaluations are being used to make decisions concerning the evaluation methodology to be used for the final Tipster evaluation (which will be the MUC-5 evaluation).</Paragraph> <Paragraph position="1"> MUC-5: The call for participation in MUC-5 was issued in October, 1992, and participants began development in March, 1993, in preparation for the evaluation in July and the conference in August. Over 20 organizations (including Tipster-sponsored organizations) are planning to participate. Most of the non-Tipster organizations will be working only on the English joint ventures task or the English microelectronics task; however, two will be working on joint ventures in both languages, and one will be working on microelectronics in Japanese only.</Paragraph> </Section> <Section position="5" start_page="0" end_page="403" type="metho"> <SectionTitle> PLANS FOR THE YEAR </SectionTitle> <Paragraph position="0"> * Improve the evaluation methodology to be used for MUC-5 based on the experiences of the Tipster interim evaluations.</Paragraph> <Paragraph position="1"> * Coordinate the MUC-5 evaluation and conduct the conference.</Paragraph> <Paragraph position="2"> * Foster interest in resource-sharing among evaluation participants to support future R&D on information extraction and NLP in general.</Paragraph> </Section> </Paper>