<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1002">
<Title>OVERVIEW OF RESULTS OF THE MUC-6 EVALUATION</Title>
<Section position="19" start_page="89" end_page="89" type="concl">
<SectionTitle> CONCLUSIONS </SectionTitle>
<Paragraph position="0"> The results of the evaluation give clear evidence of the challenges that have been overcome and the ones that remain along dimensions of both breadth and depth in automated text analysis. The NE evaluation results serve mainly to document in the MUC context what was already strongly suspected:
1. Automated identification is extremely accurate when identification of lexical pattern types depends only on "shallow" information, such as the form of the string that satisfies the pattern and/or immediate context (a minimal sketch of such shallow patterns appears after this list);
2. Automated identification is significantly less accurate when identification is clouded by uncertainty or ambiguity (as when case distinctions are not made, when organizations are named after persons, etc.) and must depend on one or more "deep" pieces of information (such as world knowledge, pragmatics, or inferences drawn from structural analysis at the sentential and suprasentential levels).</Paragraph>
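To make concrete what the "shallow" identification in item 1 can look like, here is a minimal, purely illustrative sketch; it is not drawn from any MUC-6 system, and the particular titles and corporate designators used as patterns are assumptions chosen for the example:

    import re

    # Illustrative "shallow" patterns: the surface form of the string and its
    # immediate context alone identify the entity; no world knowledge or
    # discourse analysis is consulted.
    SHALLOW_PATTERNS = [
        # A personal title immediately before a capitalized name signals a PERSON.
        ("PERSON", re.compile(r"(?:Mr\.|Ms\.|Mrs\.|Dr\.)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*")),
        # A corporate designator immediately after capitalized words signals an ORGANIZATION.
        ("ORGANIZATION", re.compile(r"[A-Z][A-Za-z.]*(?:\s+[A-Z][A-Za-z.]*)*\s+(?:Inc\.|Corp\.|Co\.|Ltd\.)")),
    ]

    def tag_entities(text):
        """Return (type, matched string) pairs found by the shallow patterns."""
        hits = []
        for etype, pattern in SHALLOW_PATTERNS:
            for match in pattern.finditer(text):
                hits.append((etype, match.group(0)))
        return hits

    print(tag_entities("Mr. James Smith joined Consolidated Widget Corp. last week."))
    # [('PERSON', 'Mr. James Smith'), ('ORGANIZATION', 'Consolidated Widget Corp.')]

Patterns of this kind cover the simple cases described in item 1 but offer no help when, as in item 2, case distinctions are absent or an organization is named after a person and only deeper knowledge can disambiguate.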
<Paragraph position="1"> The vast majority of cases are simple ones; thus, some systems score extremely well -- well enough, in fact, to compete overall with human performance. Commercial systems are already available that include identification of the named entity types defined for this MUC-6 task, and since a number of systems performed very well for MUC-6, it is evident that high performance is probably within reach of any development site that devotes enough effort to the task.</Paragraph>
<Paragraph position="2"> Any participant in a future MUC evaluation faces the challenge of providing a named entity identification capability that would score in the 90th percentile on the F-measure on a task such as the MUC-6 one.</Paragraph>
<Paragraph position="3"> The TE evaluation task makes explicit one aspect of extraction that is fundamental to a very broad range of higher-level extraction tasks. The identification of a name as that of an organization (hence, instantiation of an ORGANIZATION object) or as a person (PERSON object) is a named entity identification task. The association of shortened forms of the name with the full name depends on techniques that could be used for NE and CO as well as for TE. The real challenge of TE comes from associating other bits of information with the entity. For PERSON objects, this challenge is small, since the only additional bit of information required is the person's title ("Mr.," "Ms.," "Dr.," etc.), which appears immediately before the name/alias in the text. For ORGANIZATION objects, the challenge is greater, requiring extraction of location, description, and identification of the type of organization.</Paragraph>
<Paragraph position="4"> Performance on TE overall is as high as 80% on the F-measure, with performance on ORGANIZATION objects significantly lower (70th percentile) than on PERSON objects (90th percentile). Top performance on PERSON objects came close to human performance, while performance on ORGANIZATION objects fell significantly short of human performance, with the caveat that human performance was measured on only a portion of the test set. Some of the shortfall in performance on the ORGANIZATION object is due to inadequate discourse processing, which is needed in order to get some of the non-local instances of the ORG_DESCRIPTOR, ORG_LOCALE and ORG_COUNTRY slot fills. In the case of ORG_DESCRIPTOR, the results of the CO evaluation seem to provide further evidence for the relative inadequacy of current techniques for relating entity descriptions with entity names.</Paragraph>
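As a rough sketch of the asymmetry just described, the object layouts below show a PERSON element carrying little beyond the name, alias, and title, while an ORGANIZATION element also calls for descriptor, locale, and country fills that may have to be recovered from non-local context. The layout is illustrative only, not the official MUC-6 template definition, and the field names other than the ORG_DESCRIPTOR, ORG_LOCALE, and ORG_COUNTRY slots mentioned above are assumptions:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class PersonElement:
        # The only additional information beyond the name and its shortened forms
        # is a title such as "Mr.", "Ms.", or "Dr." found immediately before the
        # name or alias in the text.
        per_name: str
        per_alias: List[str] = field(default_factory=list)
        per_title: Optional[str] = None

    @dataclass
    class OrganizationElement:
        # Beyond the name and its shortened forms, the element asks for a
        # descriptor, locale, and country, which may appear far from the name
        # and therefore require discourse processing to recover.
        org_name: str
        org_alias: List[str] = field(default_factory=list)
        org_descriptor: Optional[str] = None
        org_locale: Optional[str] = None
        org_country: Optional[str] = None

    person = PersonElement(per_name="James Smith", per_alias=["Smith"], per_title="Mr.")
    org = OrganizationElement(
        org_name="Consolidated Widget Corp.",
        org_alias=["Consolidated Widget"],
        org_descriptor="the widget maker",
        org_locale="Chicago",
        org_country="United States",
    )

The descriptor, locale, and country fields are exactly the fills the preceding paragraph identifies as needing discourse processing when their evidence does not sit next to the organization's name.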
<Paragraph position="5"> Systems scored approximately 15-25 points lower (F-measure) on ST than on TE. As defined for MUC-6, the ST task presents a significant challenge in terms of system portability, in that the test procedure required that all domain-specific development be done in a period of one month. For past MUC evaluations, the formal run had been conducted using the same scenario as the dry run, and the task definition was released well before the dry run. Since the development time for the MUC-6 task was extremely short, it could be expected that the test would result in only modest performance levels. However, there were at least three factors that might lead one to expect higher levels of performance than seen in previous MUC evaluations:
1. The standardized template structure minimizes the amount of idiosyncratic programming required to produce the expected types of objects, links, and slot fills.</Paragraph>
<Paragraph position="6"> 2. The fact that the domain-neutral Template Element evaluation was being conducted led to increased focus on getting the low-level information correct, which would carry over to the ST task, since approximately 25% of the expected information in the ST test set was contained in the low-level objects.</Paragraph>
<Paragraph position="7"> 3. Many of the veteran participating sites had gotten to the point in their ongoing development where they had fast and efficient methods for updating their systems and monitoring their progress.</Paragraph>
<Paragraph position="8"> It appears that there is a wide variety of sources of error that impose limits on system effectiveness, whatever the techniques employed by the system. In addition, the short time frame allocated for domain-specific development naturally makes it very difficult for developers to do sufficient development to fill complex slots that either are not always expected to be filled or are not crucial elements in the template structure.</Paragraph>
<Paragraph position="9"> Sites have developed architectures that are at least as general-purpose as ever, perhaps as a result of having to produce outputs for as many as four different tasks. Many of the sites have emphasized their pattern matching techniques in discussing the strengths of their MUC-6 systems. However, we still have full-sentence parsing (e.g., USheffield, UDurham, UManitoba); we sometimes have expectations of "deep understanding" (cf. UDurham's use of a world model) and sometimes not (cf. UManitoba's production of ST output directly from dependency trees, with no semantic representation per se). Some systems completed all stages of analysis before producing outputs for any of the tasks, including NE. Six of the seven sites that participated in the coreference evaluation also participated in the MUC-6 information extraction evaluation, and five of the six made use of the results of the processing that produced their coreference output in the processing that produced their information extraction output.</Paragraph>
<Paragraph position="10"> The introduction of two new tasks into the MUC evaluations and the restructuring of information extraction into two separate tasks have infused new life into the evaluations. Other sources of excitement are the spinoff efforts that the NE and CO tasks have inspired, which bring these tasks and their potential applications to the attention of new research groups and new customer groups. In addition, there are plans to put evaluations on line, with public access, starting with the NE evaluation; this is intended to make the NE task familiar to new sites and to give them a convenient and low-pressure way to try their hand at following a standardized test procedure. Finally, a change in administration of the MUC evaluations is occurring that will bring fresh ideas. The author is turning over government leadership of the MUC work to Elaine Marsh at the Naval Research Laboratory in Washington, D.C. Ms. Marsh has many years of experience in computational linguistics to offer, along with extensive familiarity with the MUC evaluations, and will undoubtedly lead the work exceptionally well.</Paragraph>
</Section>
</Paper>