<?xml version="1.0" standalone="yes"?>
<Paper uid="M93-1004">
  <Title>TIPSTER/MUC-5 INFORMATION EXTRACTION SYSTEM EVALUATION</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Three information extraction system evaluations using Tipster data were conducted in the context of Phase 1 of the Tipster Text program. Interim evaluations were conducted in September, 1992, and February, 1993; the final evaluation was conducted in July, 1993. The final evaluation included not only the Tipster-supported information extraction contractors but thirteen other participants as well. This evaluation was the topic of the Fifth Message Understanding Conference (MUC-5) in August, 1993. With particular respect to the research and development tasks of the Tipster contractors, the goal of these evaluations has been to assess success in terms of the development of systems to work in both English and Japanese (BBN, GE/CMU, and NMSU/Brandeis) and/or in both the joint ventures and microelectronics domains (BBN, GE/CMU, NMSU/Brandeis, and UMass/Hughes).</Paragraph>
    <Paragraph position="1"> The methodology associated with these evaluations has been under development since 1987, when the series of Message Understanding Conferences began. The evaluations have pushed technology to handle the recurring language problems found in sizeable samples of naturally occurring text. Designing the evaluations around an information extraction application of text processing technology has made it possible to discuss NLP techniques at a practical level and to gain insight into the capabilities of complex systems.</Paragraph>
    <Paragraph position="2"> However, any such evaluation testbed application will undoubtedly differ in important respects from a real-life application. Thus, there is only an indirect connection between the evaluation results for a system and the suitability of applying the system to performance of a task in an operational setting. A fairly large number of metrics have been defined that respond to the variety of subtasks inherent in information extraction and the varying perspectives of evaluation consumers.</Paragraph>
    <Paragraph position="3"> The evaluations measure coverage, accuracy, and classes of error on each language-domain pair, independently of all other language-domain pairs that the system may be tested on. With its dual language and domain requirements and challenging task definition, Tipster Phase 1 pushed especially hard on issues such as portability tools, language- and domain-independent architectures and algorithms, and system efficiency. These aspects of software were not directly evaluated, although information concerning some or all of them may be found in the papers prepared by the evaluation participants.</Paragraph>
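    <Paragraph position="4"> To illustrate the kind of metrics referred to above, the following is a simplified sketch of MUC-style slot-fill scoring, computing recall, precision, and error per response fill from counts of correct, partial, incorrect, missing, and spurious fills. The function name and the half-credit treatment of partial matches are illustrative assumptions; the actual MUC-5 scoring procedure involved additional categories and per-template alignment.

```python
def muc_scores(cor, par, inc, mis, spu):
    """Simplified MUC-style scoring sketch (illustrative only).

    cor: correct fills, par: partially correct fills, inc: incorrect
    fills, mis: missing fills (in key but not response), spu: spurious
    fills (in response but not key).
    """
    possible = cor + par + inc + mis       # fills present in the answer key
    actual = cor + par + inc + spu         # fills the system produced
    credit = cor + 0.5 * par               # partial matches get half credit
    recall = credit / possible if possible else 0.0
    precision = credit / actual if actual else 0.0
    total = cor + par + inc + mis + spu
    # Error per response fill: fraction of all fills that are wrong in
    # some way (incorrect, half of partial, missing, or spurious).
    err = (inc + 0.5 * par + mis + spu) / total if total else 0.0
    return recall, precision, err

r, p, e = muc_scores(cor=50, par=10, inc=5, mis=20, spu=15)
```

A scorer of this shape makes the independence noted above concrete: each language-domain pair yields its own counts and hence its own recall, precision, and error figures.</Paragraph>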
  </Section>
</Paper>