<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1018">
  <Title>System Walkthrough</Title>
  <Section position="5" start_page="224" end_page="224" type="metho">
    <SectionTitle>
CUSTOMIZATION
</SectionTitle>
    <Paragraph position="0"> HASTEN is simple to customize, and involves the steps listed below . Due to the complexity of the task, these steps may be repeated to adjust the definitions .</Paragraph>
    <Paragraph position="1"> * Encode the template specification according to the task specification . This step not only supports th e creation of the Generator output scripts, but also enables the loading of answer key templates fo r analysis and evaluation.</Paragraph>
    <Paragraph position="2"> * Define the Collector concept specifications, including semantic roles and constraints. * Define the Generator output script which maps the Collector concepts to the template format. The output script invokes utility functions to convert the Collector data structure into the template format . Depending on task specification, special purpose routines may be required .</Paragraph>
    <Paragraph position="3"> Formulate Egraphs for examples from the training texts. Encoding an Egraph requires less than a minute , using a graphical editor. However, for MUC-6, significant time was spent comprehending the template s and locating the originating text units.</Paragraph>
    <Paragraph position="4"> The remainder of the effort involves determining the similarity metric weights and thresholds to maximize th e extraction performance.</Paragraph>
    <Paragraph position="5"> The Template Element task required 2 PERSONEgraphs, one for an untitled personal name and one for a titled persona l name. The Template Element task required 44 ORGANIZATION Egraphs, in order to extract the locations, nationalities, local descriptors, and unnamed organizations . The Scenario Template task required 132 SUCCESSION Egraphs. In total, the Egraphs referenced 12 structural element classes (e .g. NP), and were constrained to form 10 0 unique structural elements. The Egraphs required 14 syntactic categories, 20 semantic classes, and 2 lexical properties .</Paragraph>
  </Section>
  <Section position="6" start_page="224" end_page="225" type="metho">
    <SectionTitle>
SYSTEM WALKTHROUGH
</SectionTitle>
    <Paragraph position="0"> This section will provide a brief description of HASTEN's performance on the selected walkthrough document .</Paragraph>
    <Paragraph position="1"> HASTEN performed reasonably well, achieving a recall/precision of 38/77 . HASTEN extracted the principal succession event involving &amp;quot;James&amp;quot; and &amp;quot;Dooner, &amp;quot; but failed to detect both management posts . HASTEN failed to extract a secondary succession event involving &amp;quot;Kim .&amp;quot;</Paragraph>
    <Section position="1" start_page="224" end_page="225" type="sub_section">
      <SectionTitle>
Analysis
</SectionTitle>
      <Paragraph position="0"> For each text unit, the Analyzer compared the SUCCESSION Egraphs, computed the similarity metric value, an d selected the maximal matching Egraph that exceeded the similarity threshold . The first successful match occurred in the headline, resulting in the extraction of the succession event, but not the post :  INPUT : &amp;quot;Marketing &amp; Media--Advertising : John Dooner Will Succeed James At Helm of McCann- Erickson&amp;quot; EXAMPLE : 930219-0013 .B (similarity 1 .0) &amp;quot;He succeeds Lance R. Primis &amp;quot; COLLECT : #&lt;SEM :SUCCESSION 747&gt; :IN #&lt;SEM :PERSON 2441 :NAME &amp;quot;John Dooner&amp;quot; &gt; :OUT #&lt;SEM :PERSON 2442 :NAME &amp;quot;James&amp;quot; &gt;  The Analyzer had created the semantic PERSON representations during a previous processing phase, and linked the m to the originating text . The Analyzer accesses these representations and fills the : IN and : OUT slots. The next match occurred in sentence 2, resulting in the additional extraction of the organization and post: INPUT: &amp;quot;Yesterday, McCann made official what had been widely anticipated : Mr. James, 57 years old, is stepping down as chief executive officer o n July 1 and will retire as chairman at the end of the year . &amp;quot; EXAMPLE: 940128-0022 (similarity .86) &amp;quot;E-Systems Inc. said E . Gene Keiffer stepped down as chief executive officer &amp;quot;  Note that the semantic PERSON representation for &amp;quot;James&amp;quot; has a different identifier (i .e. 2391) than the representation from the headline (i.e. 2442). It is the Collector's responsibility to merge identical or compatible representations . The Collector will also merge the SUCCESS ION representations from the headline and sentence 2, as described in the nex t section. The next match occurred in sentence 3, resulting in the erroneous extraction of the succession event in reverse . The example was encoded without regard to the active/passive feature, and therefore, the only structural differenc e between the example and the input is that the input has the preposition &amp;quot;by,&amp;quot; thus resulting in the near-perfect similarit y value of 0.99. If the Analyzer had a passive example, its Egraph would have matched perfectly and therefore pre-empted this erroneous match . In this sentence, the Reference Resolver correctly resolved the pronoun &amp;quot;He&amp;quot; to the last mention of &amp;quot;James&amp;quot;, thus resulting in the extraction of the PERSON2391 representation for &amp;quot;James .&amp;quot;</Paragraph>
      <Paragraph position="2"> Sentence 20 contained the succession event for &amp;quot;Kim.&amp;quot; The training examples did not contain a sentence involvin g the word &amp;quot;hire,&amp;quot; and thus the Egraphs were not similar enough to result in a match . The closest examples achieve d a similarity value of approximately 0.60.</Paragraph>
      <Paragraph position="3"> In addition, Peter Kim was hired from WPP Group's J . Walter Thompson last September as vice chairman, chief strategy officer, world-wide.</Paragraph>
      <Paragraph position="4"> Collection The Collector receives the semantic representations from all the sentences, and merges them into a cumulative semantic representation . The Collector maintains separate semantic representations for incompatible information . In the walkthrough document, the Collector combines the semantic representations from the headline and sentence 2 into the following representation :</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="225" end_page="227" type="metho">
    <SectionTitle>
#&lt;EXT:SUCCESSION 56&gt;
:IN #&lt;EXT:PERSON 373 :NAME &amp;quot;John Dooner&amp;quot;/&amp;quot;Dooner&amp;quot;/&amp;quot;John J. Dooner Jr.&amp;quot;
:TITLE &amp;quot;Mr.&amp;quot;&gt;
:ORGANIZATION #&lt;EXT:ORGANIZATION 211 :NAME &amp;quot;McCann-Erickson&amp;quot;/&amp;quot;McCann&amp;quot;&gt;
:OUT #&lt;EXT:PERSON 374 :NAME &amp;quot;James&amp;quot;/&amp;quot;Robert L. James&amp;quot; :TITLE &amp;quot;Mr.&amp;quot;&gt;
</SectionTitle>
    <Paragraph position="0"> :POST &amp;quot;chief executive officer &amp;quot; &gt; Multiple references to the same named entity (e.g. &amp;quot;James&amp;quot; and &amp;quot;Robert L . James&amp;quot;) are merged, relying on the alias information provided by NameTag. The PERSON Egraphs extract and fill the :TITLE slot. The erroneous SUCCESSION representation from sentence 3 is incompatible with this structure, and is maintained separately . Generation The Generator applies an output script to the Collector representations to produce the data templates . Since the erroneous succession event from sentence 3 does not have a :POST fill, the output script invalidates it and no templat e is generated. For object-oriented templates used in MUC-6, the output script must recursively traverse the Collecto r representations and apply conversion routines for each sub-template . The Generator actually produces a template data structure, which can be easily printed, but also fed directly to HASTEN's scoring program . The scoring program employs a top-down comparison algorithm that produces performance measures as well as a side-by-side display, as illustrated below. The display shows the individual credit assignments as well as the recall/precision subtotals fo r</Paragraph>
  </Section>
  <Section position="8" start_page="227" end_page="228" type="metho">
    <SectionTitle>
TEST RESULTS AND ANALYSIS
</SectionTitle>
    <Paragraph position="0"> For the formal MUC-6 test data, HASTEN had three official configurations: one to maximize recall, one to maximize precision, and one to maximize both . A simple adjustment to the similarity metric threshold created thes e configurations. The training module determined the values of the thresholds, and also determined the optima l extraction bias, which disabled the most over-generating Egraphs . Figure 9 shows the results of the three official configurations for both the training and the test data, as well as additional data points for other threshold settings . This figure clearly illustrates that HASTEN has the ability to trade recall for precision .</Paragraph>
    <Paragraph position="2"> HASTEN rapidly achieved its extraction performance, as illustrated in Figure 10 . After the initial effort to encode the training examples, the training module determined the optimal similarity metric parameters (see (c)) . During this time, no effort was made to actually generate the template slots VACANCY_REASON and ON_THE_JOB, and HASTEN generated a default fill of UNCLEAR and NO, respectively. HASTEN also defaulted the OTHER_ORG to be the same as the SUCCESSION_ORG, and therefore REL_OTHER_ORG was always SAME_ORG. A few days were then spent on those slots, raising the performance slightly (see (c)) . HASTEN's performance got a boost from the latest upgrade to the scoring program and keys (see l3 ). The remainder of the test period was spent on improving the nam e recognition, which impacts all three tasks, but resulted in very little improvement on the scenario template task .</Paragraph>
  </Section>
  <Section position="9" start_page="228" end_page="229" type="metho">
    <SectionTitle>
EXPERIMENTAL RESULTS
</SectionTitle>
    <Paragraph position="0"> Even though the MUC-6 extraction task focused on one scenario, SRA did not want to produce a single extractio n result. SRA's focus was on experimentation with MUC-6 providing a testing environment . This section reports on various other test results that fall outside of the official MUC-6 tests .</Paragraph>
    <Section position="1" start_page="228" end_page="228" type="sub_section">
      <SectionTitle>
Example Sampling
</SectionTitle>
      <Paragraph position="0"> Since extraction examples are the core knowledge source for HASTEN 's extraction capability, it is worthwhil e to explore the relationship between the number of examples and extraction performance . Furthermore, the order o f encoding the examples may also effect performance .</Paragraph>
      <Paragraph position="1">  sequentially ordered by the document number of the originating text unit . The next three runs use one-half, three-quarters, and all of the Egraphs, respectively . The first quarter run is connected with these three runs to sho w the gradual improvement in recall (26 to 46) and the minor degradation in precision (65 to 63) as more examples wer e given to HASTEN . Notice that the fourth Egraph quarter out-performs the first three quarters. Presumably, the fourth Egraph quarter includes generally applicable examples, while the first three Egraph quarters include unusual o r redundant examples. The last run, labelled Freq, consists of running only those Egraphs that matched at least tw o training text units (42 total), approximately one third of the total Egraphs . Presumably, this configuration eliminate s the unusual and redundant examples, and produces the performance near the level of all Egraphs .</Paragraph>
    </Section>
    <Section position="2" start_page="228" end_page="229" type="sub_section">
      <SectionTitle>
Alternative Metric Weights
</SectionTitle>
      <Paragraph position="0"> The Egraph similarity metric utilizes a weighted sum of factors . The official MUC-6 test results considered onl y one configuration of weights, which created a strong preference for the semantic content, especially the ANCHOR label.</Paragraph>
      <Paragraph position="1"> As an alternative, HASTEN was configured with weights that created a strong preference for the structural match .</Paragraph>
      <Paragraph position="2"> This experiment did not produce significantly different results than the official configuration, as illustrated in Figure 12. The five point drop in recall for the BASE configuration does demonstrate that structural differences in example s may interfere with the extraction of semantic content.</Paragraph>
      <Paragraph position="3">  As described earlier, the Egraphs originating from a particular text unit are withheld and not used for extraction on that unit. However, HASTEN has a special mode that runs only the Egraph keys, in order to test the rest of the system. This mode can test how well the Collector is merging information, how well the Reference Resolver is working, or test the format produced by the Generator. In effect, this mode simulates perfect Egraph matching, an d predicts the upper bound on extraction performance for those Egraphs .</Paragraph>
      <Paragraph position="4"> The Egraph key extraction performance is not 100% recall and precision, for a variety of reasons . First, not all occurrences of the extraction concept can be encoded with an Egraph . Elliptical or other highly contextual reference s can not be feasibly encoded . Second, mistakes in reference resolution can cause the extraction of erroneous semanti c representations, even if the Egraph match is correct. Third, the Collector can erroneously merge or split the extracte d semantic representations, even if the Egraph matches are independently correct. Fourth, due to the task specification , some of the scenario extraction output may not come directly from the Egraph matches . Fifth, for MUC-6, about 25% of the management scenario template fills are contained in the PERSON and ORGANI ZATION objects.</Paragraph>
      <Paragraph position="5">  The significant improvement to precision is due to the elimination of all spurious Egraph matches, since Egraph key s are by definition relevant. Recall also improves mainly because unusual examples that fail to match other Egraph s are now matched by their own Egraph.</Paragraph>
      <Paragraph position="6">  included a few template slots that forced systems to attempt to make subtle and inferential judgements ; namely, the VACANCY_REASON, ON_THE_JOB, and REL_OTHER_ORG slots. Furthermore, template specification itself wa s rather cumbersome due to the IN_AND_oUT object, which really was a &amp;quot;pseudo&amp;quot; object for grouping relate d information. These features of the task specification confused the customization and evaluation of extraction system s on the central scenario event, namely the management succession . Therefore, as an additional experiment, SRA devised the very minimal (or micro-MUC) template specification to represent the management succession event, as shown below:  This template specification completely eliminates the IN_AND_oUT object, the set fill slots, and the distinction of an acting post . This specification required changes to only HASTEN's Generator script. To evaluate its performance, SRA automatically converted the answer keys, and edited the official scoring program configuration file .  The generality of HASTEN's design can only be tested by using other task definitions in other domains . The MUC-6 interim scenario task of labor negotiations provides another good application for HASTEN . Figure 15 shows the final performance results on the labor negotiation data.</Paragraph>
    </Section>
    <Section position="3" start_page="229" end_page="229" type="sub_section">
      <SectionTitle>
Egraph Mutations
</SectionTitle>
      <Paragraph position="0"> Since the ultimate goal of HASTEN is to minimize the customization effort, HASTEN must strive to maximize its performance from as few examples as possible . One possibility is to automatically derive other Egraphs from thos e that have been encoded manually. HASTEN has another special module that tries to mutate Egraphs in a variety of ways, based on their similarity with other Egraphs . Currently, there are three mutation methods : * cross-over - replace some structural elements of one Egraph with elements from another ; * trim - eliminate structural elements from the ends of the Egraph; * merge - combine the structural elements of multiple Egraphs ; The mutation module compares every Egraph with each other, and for each pair of significantly similar Egraphs , applies the three methods . The resulting Egraphs are saved and can be treated in the same way as a manually created Egraph. Thus, the training module can run these derivative Egraphs to determine how well they perform, and construc t an extraction bias to include best ones . This module is promising, but there was insufficient time to fully investigat e it for the MUC-6 evaluation .</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="229" end_page="229" type="metho">
    <SectionTitle>
HASTEN IMPLEMENTATION
</SectionTitle>
    <Paragraph position="0"> HASTEN is implemented in Allegro Common LISP, including a development environment written in CLIM .</Paragraph>
    <Paragraph position="1"> HASTEN consists of 12,675 lines of code, and the development environment consists of 12,450 lines of code .</Paragraph>
    <Paragraph position="2"> HASTEN required approximately 20 person-weeks for its development, and MUC-6 required 16 person-weeks o f effort, including the interim test, formal test, and final report . Table 16 shows the processing time for the three officia l</Paragraph>
  </Section>
  <Section position="11" start_page="229" end_page="229" type="metho">
    <SectionTitle>
NAMED ENTITY TASK
</SectionTitle>
    <Paragraph position="0"> SRA performed the Named Entity task using its commercial name recognition product, called NameTag&amp;quot; .</Paragraph>
    <Paragraph position="1"> NameTag is a high-speed software program consisting of a C++ engine and name recognition data . NameTag uses its own tag specification that classifies names and other key phrases, and can either generate SGML annotated tex t or a table of extracted entities. Besides the classification, NameTag also assigns unique identifiers to those names that refer to the same entity, such as &amp;quot;International Business Machines &amp;quot; and &amp;quot;IBM.&amp;quot; NameTag also assigns country codes to place names . NameTag required 20 person-weeks for its engine development, 9 person-weeks for its data, and 1 0 person-weeks for its development interface and utilities .</Paragraph>
    <Paragraph position="2"> NameTag has three major processing modes that represent trade-offs between performance and speed . The BASE configuration performs the maximum analysis, achieves the best results, but is the slowest . The FAST mode reduces the analysis to increase speed with minimal degradation in performance . The FASTEST mode performs the minimum analysis at the greatest speed with the lowest performance . NameTag includes a small number of personal an d organization names (currently 530), which eliminate the need to dynamically recognize them. NameTag can process text without the use of these names . NameTag also can be run in case-insensitive mode to handle text in all uppe r case.</Paragraph>
    <Paragraph position="3"> For the MUC-6 Named Entity task, a 300 line C++ driver program used the NameTag API to run its name recognition, access the table of extracted entities, map the NameTag classification into the MUC-6 specification, an d generate the SGML annotated document. Since NameTag recognizes more names and phrases than defined for MUC-6, such as publications and relative temporal expressions, the driver program filtered some extracted entities . Since the links between aliases are not required for the Named Entity task, the driver suppressed this NameTag information.</Paragraph>
    <Paragraph position="4">  SRA submitted four official configurations : the BASE configuration, the two speed configurations, and the configuration without the use of personal and organizational names . Figure 17 shows the name recognition performance for the final test data, plus two reference points using the interim test data . SRA also conducted a test run using the case-insensitive mode, which is labelled allcaps in the figure. The NameTag case-insensitive mode was run on the upper-case version of the test data. Since the test data was manually tagged in mixed case and the MUC- 6 task specification includes case-sensitive tagging rules, the case-insensitive performance would actually be slightl y higher. For example, in mixed case, &amp;quot;group&amp;quot; is not included in the tag for &amp;quot;Chrysler group .&amp;quot; However, in upper case text, &amp;quot;GROUP&amp;quot; would presumably be included in the tag .</Paragraph>
    <Paragraph position="5">  run on a Sun SPARCstation 20 . The 30 test documents had a size of 87,203 characters . The number of recognition rules for each configuration is also shown . Table 19 contains the performance measures for the ENAMEX tag and its sub-classifications. The ENAMEX tag is the most difficult due to the ambiguity between people, places, and organizations . Furthermore, case sensitivity is more significant in the recognition of these names, as opposed to the numeric and temporal entities . Also, the NO-NAMES configuration excluded the use of personal and organizational names.</Paragraph>
    <Paragraph position="6">  The three speed configurations show the general trade-off between speed and recall, with precision remainin g about the same. Note the anomaly in Location precision measure for the fast configuration ; this is a side effect of recognizing less organizations, which pre-empt the location classification . The NO-NAMES configuration had little effect on person names, illustrating how simple they are to dynamically recognize . The NO-NAMES configuration  resulted in a significant drop in recall for organization names, reflecting the references to household names with little contextual clues, such as &amp;quot;Microsoft.&amp;quot;</Paragraph>
    <Section position="1" start_page="229" end_page="229" type="sub_section">
      <SectionTitle>
System Walkthrough
</SectionTitle>
      <Paragraph position="0"> NameTag performed very well on the selected walkthrough document, achieving a recall/precision of 98/97 fo r the BASE configuration, resulting from three errors . NameTag classified &amp;quot;J. Walter Thompson&amp;quot; as a person rather than a organization, since it looks like a personal name. NameTag tagged &amp;quot;Coca--Col a&amp;quot; within &amp;quot;Coca--Cola Classic&amp;quot; as an organization, since it failed to recognize the larger product name . NameTag tagged &amp;quot;Goldman&amp;quot; within &amp;quot;Kevin Goldman&amp;quot; as an organization, since that is a company alias in its static list of names. The NO--NAMES configuration eliminates this error, but causes NameTag to miss the mentions of &amp;quot;Coke&amp;quot; and &amp;quot;Coca--Cola, &amp;quot; which were also contained in its static list of names .</Paragraph>
      <Paragraph position="1"> The FAST configuration drops the performance to 87/96 due to its failure to learn &amp;quot;McCann--Erickson &amp;quot; as an organization, which consequently causes NameTag to miss 9 &amp;quot;McCann&amp;quot; aliases. The FASTEST configuration further drops performance to 82/94, since it does not apply the ampersand recognition rule to find &amp;quot;Ammirati &amp; Pulls&amp;quot; as an organization, and then individually tags &amp;quot;Puns &amp;quot; as an alias to &amp;quot;Martin Pulls .&amp;quot; The ALLCAPS configuration achieve s 92/92, due to additional erroneous names &amp;quot;BIG HOLLYWOOD TALENT AGENCY,&amp;quot; &amp;quot;JAMES PLACES, &amp;quot;</Paragraph>
    </Section>
  </Section>
  <Section position="12" start_page="229" end_page="234" type="metho">
    <SectionTitle>
&amp;quot;DOONER DECLINES,&amp;quot; and &amp;quot;COKE ADVERTISING.&amp;quot;
TEMPLATE ELEMENT TASK
</SectionTitle>
    <Paragraph position="0"> SRA combined the results of NameTag and some additional processing by HASTEN to perform the Template Element task . NameTag performs the majority of the work, since it identifies the person and organization names , classifies them, and resolves the aliases . NameTag also assigns country codes to the place names which support th e normalized oRG_coUNTRY slot. HASTEN matched person and organization Egraphs to extract additional loca l information, such as location, nationality, and descriptors. The Reference Resolver attempted to resolve organizational references to the names in order to extract additional non-local information .</Paragraph>
    <Paragraph position="1"> SRA submitted two official configurations: a BASE configuration running all components of HASTEN ; and a NO-REF configuration with the Reference Resolver disabled. Since the Reference Resolver provides some of the organizational descriptors, which may include some location or nationality information, the second configuratio n resulted in lower recall . However, since reference resolution is difficult, erroneous references can hurt precision . To demonstrate the portion of HASTEN's performance comes from NameTag, a third unofficial configuration was run that disabled reference resolution and the extraction of locations, nationalities, and descriptors . Figure 20 shows the performance results for these three configurations on the final test and training data, as well as the BASE configuration performance on the interim test and training data .</Paragraph>
    <Paragraph position="2">  Table 21 shows the processing time for the three Template Element configurations, run on a Sun SPARCstatio n 20. The 100 fmal test data documents have an original size of 261,658 characters . NameTag took 19 seconds to process these documents, as shown in the CPU Time column .</Paragraph>
    <Section position="1" start_page="234" end_page="234" type="sub_section">
      <SectionTitle>
System Walkthrough
</SectionTitle>
      <Paragraph position="0"> HASTEN and NameTag performed very well on the selected walkthrough document, achieving a recall/precision of 76/84 for the BASE and NO-REF configurations . The &amp;quot;Goldman&amp;quot; and &amp;quot;J. Walter Thompson&amp;quot; errors described in the Named Entity system walkthrough caused a spurious ORGANIZATION and PERSON object.</Paragraph>
      <Paragraph position="1"> NameTag classified three organization names as OTHER instead of COMPANY, due to the lack of explicit company indicators . They were &amp;quot;Ammirati &amp; Puri's,&amp;quot; &amp;quot;PaineWebber,&amp;quot; and &amp;quot;McCann-Erickson .&amp;quot; NameTag also classified &amp;quot;Creative Artists Agency&amp;quot; as GOVERNMENT due to the organizational head noun AGENCY. The NAMETAG-ONLY configuration achieved a performance of 68/82 . HASTEN generated one of the three organization descriptors, usin g an Egraph to extract the appositive &amp;quot;the big Hollywood agency,&amp;quot; which then enabled it to extract the locale and countr y fills. The Reference Resolver did not attempt to resolve the other two descriptors. HASTEN did not possess an Egraph to match &amp;quot;Coke headquarters in Atlanta, &amp;quot; thus causing a missing locale and country fill .</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>