<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1011">
  <Title>DESCRIPTION OF THE UMASS SYSTEM AS USED FOR MUC-6</Title>
  <Section position="3" start_page="0" end_page="128" type="metho">
    <SectionTitle>
SYSTEM OVERVIEW
</SectionTitle>
    <Paragraph position="0"> At the foundation of all our system configurations are the string specialists : these are the pattern matchin g routines that attempt to recognize proper names, dates, and other stylized noun phrase descriptions . The specialists that we used for NE consist of separate routines designed to handle locations, organizations, people, dates, money , and percentages . The location, organization, and people specialists rely on dictionaries for their recognition . The organization and people specialists used for NE were based on code (heavily modified) developed in the Informatio n Retrieval Laboratory at UMass . The dictionaries supporting those specialists were also borrowed from the IR La b with some adjustments for MUC-6. All other specialists were developed in the NLP Lab, and the location specialis t accessed a dictionary based on a subset of the Gazetteer entries .</Paragraph>
    <Paragraph position="1"> The BADGER sentence analyzer refers to a collection of processes associated with part-of-speech (p-o-s) tagging , a trainable decision tree used to locate appositive constructions, local syntactic analysis, and semantic case fram e instantiation . The processing of BADGER is not significantly different than that of the CIRCUS system used i n previous MUC evaluations [4, 5, 6] . Concept node (CN) definitions are still used to create case frame instantiation s and multiple CN definitions can apply to the same text fragment . BADGER is domain/task independent and require s no adjustment in order to move from one application to another. It does depend on CN definitions that ar e appropriate for a given domain/task, but different CN dictionaries can be plugged into BADGER as fully portable dictionary files . BADGER also relies on a p-o-s dictionary as well as semantic feature tags, and we do customize a p o-s dictionary and a semantic feature hierarchy for specific domains . This customization is handled manually, but it is not a difficult task .</Paragraph>
    <Paragraph position="2">  BADGER currently recognizes 27 p-o-s tags and it took 24 hours to manually create a p-o-s dictionary fo r MUC-6. We began with our domain independent core tags lexicon, which includes prepositions, determiners, and a number of common verbs . We then added terms that occurred 500 or more times in the six years of Wall Stree t Journal articles (1987-1992) from the Tipster collection, based on manual inspection . The p-o-s tag lexicon has 2084 entries. Semantic features were assigned to those same terms and also to those terms of interest to the ST tas k that appeared in the formal training set (100 texts) . The semantic lexicon has 5453 entries . We used 45 semantic features and the creation of this semantic tagging dictionary took 36 hours .</Paragraph>
    <Paragraph position="3"> The creation of a CN dictionary is now fully automated and accomplished by CRYSTAL, an inductiv e dictionary construction system. In previous MUC evaluations we used the AutoSlog system to generate C N dictionaries [10, 11, 12], but AutoSlog required human interaction for a quality control assessment of proposed C N definitions. CRYSTAL requires no such human review, and creates CN dictionaries on the basis of machine learnin g techniques [13] .</Paragraph>
    <Paragraph position="4"> In order to map BADGER output into MUC-6 text annotations, a number of decisions must be made abou t important noun phrases, including semantic type identification, concept attribute assignments, and coreferenc e recognition. These decisions are handled primarily by hand-coded consolidation routines, WRAP-UP, a trainabl e discourse analyzer designed to establish relational links between referents, and RESOLVE, a trainable coreference analyzer.</Paragraph>
    <Paragraph position="5"> When WRAP-UP makes a decision about the proper role of a given noun phrase or a possible relationshi p between two noun phrases, it considers evidence from a variety of sources . All of the features used by WRAP-UP are extracted using a domain-independent mechanism to encode features from CN slot values, from the relativ e position of the referents, and from verb patterns in which the noun phrase appeared . Sometimes these variou s sources provide consistent interpretations, but they often disagree with one another . All of WRAP-UP's decision s are handled by trained decision trees so discrepancies in the incoming data are managed on the basis of simila r situations encountered during training . WRAP-UP is most effective when its training has given it the experience i t needs to deal with various kinds of incongruencies .</Paragraph>
    <Paragraph position="6"> RESOLVE handles all the crucial merging decisions used by consolidation and WRAP-UP . It determines when two noun phrases refer to the same entity and should therefore be merged in order to consolidate feature descriptors into a single entity description. If RESOLVE merges entities too aggressively, recall will fall, and if RESOLVE is too passive in its merging decisions, precision will suffer due to spurious entities . RESOLVE's decisions are based on a decision tree induced from feature vector representations of noun phrases . Some of the features used in these representations are domain independent, and some are created for specific domains . The version of RESOLVE used for TE and ST relied on a subset of its domain-independent feature set rather than the larger, domain-enhanced featur e set used for the CO task. TE and ST would probably have benefited from the larger feature set used for the CO task , but there was not enough time to incorporate all of RESOLVE's features into the systems used for the TE and S T tasks .</Paragraph>
    <Paragraph position="7"> Space prohibits us from going into much detail about each of the major components in our various MUC- 6 system configurations, so we will concentrate on the trainable components that were present in the TE, CO, and S T tasks. Since different tasks required different configurations, we summarize the relevant system components in the  We also note that some major system components were designed and implemented during our preparations fo r MUC-6, including all the string specialists, all the consolidation routines, template parsers and text-markin g interfaces to translate MUC-6 training documents into formats compatible with our various trainable components , and a completely new implementation of WRAP-UP .</Paragraph>
  </Section>
  <Section position="4" start_page="128" end_page="130" type="metho">
    <SectionTitle>
A SYSTEM WALK-THROUGH
</SectionTitle>
    <Paragraph position="0"> It always helps to look at concrete examples, and the designated walk-through text provides us with a n opportunity to describe some selected processing as it tackles a specific text . We will limit most of our discussio n to a single sentence as it was handled by RESOLVE, CRYSTAL, and WRAP-UP .</Paragraph>
    <Paragraph position="1"> &amp;quot;Mr. James, 57 years old, is stepping down as chief executive office r on July 1 and will retire as chairman at the end of the year .&amp;quot; A Walk With RESOLVE RESOLVE is needed here to determine that &amp;quot;Mr . James&amp;quot;, &amp;quot;chief executive officer&amp;quot; and &amp;quot;chairman&amp;quot; are al l descriptions of the same person entity . RESOLVE succeeds here by examining pairwise combinations of nou n phrases (NPs) .</Paragraph>
    <Paragraph position="2"> The pairwise examination of NPs is handled by a C4 .5 decision tree. This tree was created by an inductive algorithm in response to a collection of representative NP pairs extracted from available training data. The complete tree is large, containing 133 nodes or possible decisions points . To get a feel for the RESOLVE d-tree, we present a piece of the tree starting at the root node . The parenthetical numbers indicate how many training instances were encountered at each leaf nodes of the tree . Decision points are more reliable when a large number of trainin g instances are examined for a given condition .</Paragraph>
    <Paragraph position="3"> Tree 1: a portion of the RESOLVE coreference tre e</Paragraph>
    <Paragraph position="5"> The ALIAS feature is the root node of the tree . This feature is true when an NP is a recognized alias of the othe r NP (e.g. &amp;quot;GM&amp;quot; is an alias of &amp;quot;General Motors Corp .&amp;quot;). After checking ALIAS and SAME-STRING, the featur e MOST-RECENT-COMPATIBLE-SUBJECT is checked . This feature is true when phrase-1 and phrase-2 ar e compatible (number and gender), and when phrase-1 is the most recent SUBJECT in the text . This feature was included specifically to handle pronoun resolution, and represents a variation on the well-known heuristic of mergin g with any referent found in the most recent compatible phrase . However, RESOLVE's version adds an extr a constraint: the previous phrase must be found in the SUBJECT buffer .</Paragraph>
    <Paragraph position="6"> If phrase-1 is the MOST-RECENT-COMPATIBLE-SUBJECT, and NAME-2 = No (i .e., phrase-2 has no name information - meaning that its probably a pronoun or other anaphoric reference), then the phrases are judge d coreferent . If phrase-2 does have name information (NAME-2 = Yes), but phrase-1 does not have name information (NAME-1 = No), then they are also judged coreferent.</Paragraph>
    <Paragraph position="7"> It is interesting to note that SAME-TYPE is not checked until the fourth layer of the tree : if SAME-TYPE = No, then they are not coreferent. One might reasonably expect to see this at the root node since incompatible type s should certainly be strong evidence for non coreference. But keep in mind that RESOLVE was not trained on perfec t data. If NPs were not processed correctly by the string specialists, then RESOLVE is right to reduce its confidence i n the SAME-TYPE feature accordingly . We know from our NE evaluation, that our string specialists sometimes  labeled an organization as a person or location by mistake . This created a significant noise level for SAME-TYPE , enough to render the feature questionable. Had SAME-TYPE been more reliable, it probably would have found its way to the top of the tree.</Paragraph>
    <Paragraph position="8"> Here is a rundown of how the features at the top of RESOLVE's decision tree were operating during th e complete walkthrough text : ALIAS = YES was used for 25 instances . Every instance was correctly classified, though 4 instance s were scored incorrect due to faulty string trimming .</Paragraph>
    <Paragraph position="9"> SAME-STRING = YES was used for 19 instances. 14 of those were correctly classified ; all the misclassified instances were &amp;quot;it&amp;quot; phrases, so a little contextual knowledge would have probably helpe d (all &amp;quot;it&amp;quot; phrases were attempted, but many were irrelevant) .</Paragraph>
    <Paragraph position="10"> MOST-RECENT-COMPATIBLE-SUBJECT = YES was used for 2 instances . Both times i t misclassified the instances. This feature was supposed to apply only to pronouns and generi c descriptions (&amp;quot;the company&amp;quot;), but in looking over the code for this feature extractor, we see that it wa s not properly constrained.</Paragraph>
    <Paragraph position="11"> PERSON-IS-ROLE = YES This feature was never used in the walk-through text . If the patterns used in this feature extractor were expanded (to include, for example, &amp;quot;IS STEPPING DOWN AS&amp;quot;), then i t might have been more useful .</Paragraph>
    <Paragraph position="12"> To understand how RESOLVE handled our focus sentence, we have to examine portions of the tree further fro m the root. It appears that the following decision points were important for this sentence : Tree 2: another portion of the coreference tre e</Paragraph>
    <Paragraph position="14"> Some intermediate nodes have been removed (all had value &amp;quot;NO&amp;quot;) . These branch points indicate that anytime w e had two references to the same type of object, neither phrase is a pronoun, the second phrase is not a proper name , both are in the same sentence, and the first phrase is a proper name, then RESOLVE classified the two references a s coreferent . The numbers at the leaf node (NAME-1 = YES) indicate that there were 20 instances that had the same feature values in the training set, and that 12 of these were positive and 8 were negative . Since the default classification for an ambiguous leaf node -- a leaf node that contained both positive and negative instances -- was to take the &amp;quot;majority class&amp;quot;, the tree returned &amp;quot;+&amp;quot; .</Paragraph>
    <Paragraph position="15"> This may seem to be an unintuitive and risky pattern for coreference classification, but in fact, this processin g turned out to be correct in the walk-through text whenever RESOLVE received correctly extracted NPs . The only time it erred was when RESOLVE was handed badly trimmed phrases or phrases with incorrect semantic features . So we've stumbled upon a rule induced by RESOLVE that probably wouldn't have been discovered manually . Yet it appears to be effective on real data -- which is precisely why machine learning may be more effective in the long ru n than manual knowledge engineering.</Paragraph>
    <Paragraph position="16">  A Walk with CRYSTAL When we look at the full sentence analysis for our target sentence, we begin with a simple syntactic analysi s into two segments : Subj : Mr. James, 57 years old Verb: i s Obj : steppingPP: down as chief executive officer Conj: and Verb: will retire PP: as chairman PP: at the end of the year Note that BADGER recognized &amp;quot;stepping&amp;quot; as a noun . This is because our p-o-s dictionary (derived from WSJ articles) didn't contain &amp;quot;stepping&amp;quot; as a verb form . Interestingly, this error does not cause us any difficultie s downstream because the exact same interpretation would have been applied during training . As long as the system i s handling noun/verb ambiguities consistently, we do not suffer substantially from these tagging errors. BADGER then applies CN definitions from CRYSTAL's dictionary and finds four CNs that apply to the firs t segment, three that extract &amp;quot;Mr. James, 57 years old&amp;quot; and one that extracts &amp;quot;chief executive officer&amp;quot; . Mr. James. 57 years old, is stepping down as chief executive officer on July 1</Paragraph>
    <Paragraph position="18"/>
  </Section>
  <Section position="5" start_page="130" end_page="132" type="metho">
    <SectionTitle>
ON THE JOB-YES
</SectionTitle>
    <Paragraph position="0"> These are the CN definitions that applied here :  difficulty determining which person to link with a Status_Evidence, and tended to attach the Status_Evidence to all persons in the same sentence.</Paragraph>
    <Paragraph position="1"> The CRYSTAL dictionary that was trained on the 300 TE texts generated 2945 CN definitions . An additional 596 task-specific CN definitions were learned from the 100 ST training texts . A Walk with WRAP-U P With these CNs in hand, WRAP-UP can then apply its trained d-trees to the CNs in order to establish relationa l links between objects . WRAP-UP used 17 different C4 .5 decision trees in its processing. Eight of these identify specific relationships between entities, five of them attempt to filter out spurious entities that do not meet the scenario relevance guidelines, and four fill in default values for template element attributes . We will now look at two examples of selected trees in action.</Paragraph>
    <Paragraph position="2"> Tree 3 shows a portion of a tree that considers relationships between Person and Status . Each Status o r Status_Evidence which has been identified in the text is paired with each Person to form an instance . An instance i s formed for &amp;quot;Mr . James&amp;quot; paired with the Status-Out from the &amp;quot;X is stepping down&amp;quot; concept node . This tree returns a negative classification if the Person and Status are not found in the same sentence . If both are found in the same noun phrase (which is the case for Mr . James and Status-Out), the tree returns a positive classification . An In-and Out relationship is created with IO_Person = Mr . James and New_Status = Out .</Paragraph>
    <Paragraph position="3"> If the Person was not from the same noun phrase as a Status CN, the tree returns negative . If the CN was of type Status_Evidence, as was the case for &amp;quot;will retired &amp;quot; in the second segment of the sentence, the tree branches to a large subtree in which two thirds of the training instances were positive. The tree returns a positive classificatio n for the instance with &amp;quot;Mr . James&amp;quot; and &amp;quot;will retire&amp;quot; . This leads to an In-and-Out relationship which is eventuall y merged with the textually identical In-and-Out already created .</Paragraph>
    <Paragraph position="4"> Tree 3: a portion of the Person-Status-Links tree</Paragraph>
    <Paragraph position="6"> Tree 4 is from the Filter stage of WRAP-UP and classifies Persons as relevant (&amp;quot;+&amp;quot;) or irrelevant (&amp;quot;-&amp;quot;) . The main criterion for relevance is whether a Person is involved in a management succession event, which is reflected i n this tree. Persons with multiple links from In_and_Out objects are classified as relevant. By the time this tree is applied, &amp;quot;Mr. James&amp;quot; from our example has been linked to more than one In-and-Out and is classified as relevant .</Paragraph>
    <Paragraph position="7"> Each of persons from this text who are not involved in a change of status were correctly classified as irrelevant b y WRAP-UP and were discarded .</Paragraph>
    <Paragraph position="8"> Tree 4: a portion of the Person-Filter tree</Paragraph>
    <Paragraph position="10"/>
  </Section>
  <Section position="6" start_page="132" end_page="439" type="metho">
    <SectionTitle>
SCORE REPORTS AND DISCUSSION
</SectionTitle>
    <Paragraph position="0"> We participated in all four MUC-6 evaluations in order to obtain as much feedback about our variou s components as possible. In an effort to reach beneath the numbers of the score reports, we conducted a few informal comparison-point experiments which we will report here as well .</Paragraph>
    <Paragraph position="1"> Named Entities (NE ) The NE task was handled by four independent string specialists designed and implemented during our MUC- 6 preparations. Dates, money, and percentages were all handled by a single specialist. A breakdown of our recall and precision for each specialist is shown below:  Our string specialists were organized in a serial architecture which allowed upstream components to clai m strings in a non-negotiable manner . The chain of specialists operated in the following order : [money/dates/percentages] ---&gt; organizations ---&gt; people ---&gt; location s The numeric specialists were reliable and did not interfere with downstream components by claiming false hits . However, the low coverage of the organization specialist's dictionary left many organization names free to be claime d by the person and location specialists. This introduces an interference effect into the precision scores for each of the three ENAMEX types. Text claimed by the wrong specialist, say an organization name marked as a location, i s counted as an incorrect organization type .</Paragraph>
    <Paragraph position="2"> To see how each specialist was performing individually, we broke down the ACTUAL column of the scor e report into three columns, one for each of the three specialists. The resulting RIP scores are as follows, with P1 th e precision for the type as reported by the scoring program, and P2 the precision for the individual specialists . P2 is computed as the number correct for a type divided by the objects reported as having that type .</Paragraph>
    <Paragraph position="3">  From this table we see that the organization specialist actually made the fewest classification errors , misclassifying only seven locations . Names that slipped past the organization specialist and were claimed by th e person or location specialists were charged against organization precision by the scoring program . So the organization specialist was penalized twice for those errors .</Paragraph>
    <Paragraph position="4"> While it is apparent that the serial architecture is far from ideal, our most glaring weakness was the recall of th e organization specialist. We had not anticipated this problem on the basis of the dry run materials . A post mortem o f the official NE test set shows that it contained a large number of government organizations, which represented a weak spot in our organization dictionary. This was a regrettable oversight on our part, which undoubtedly hurt recal l for CO, TE, and ST, given the importance of organization entities throughout .</Paragraph>
    <Paragraph position="5"> 13 3 We also note that the precision of the money and percentage specialists would have been perfect had w e succeeded in filtering out a data table in one of the test texts . The specialists were not at fault for that filtering error which failed to follow stated extraction guidelines .</Paragraph>
    <Paragraph position="6"> Coreference (CO) RESOLVE is a coreference resolution system that uses machine learning techniques to determine coreferen t relationships among relevant phrases in a text . For the MUC-6 evaluation, we used the C4.5 decision tree induction system [9]. RESOLVE was designed to work in conjunction with an information extraction system; as such, it s expected input is a set of phrases that are relevant to a specified information extraction task . The knowledge RESOLVE uses in order to learn to classify coreferent phrases is based on the same shallow knowledge used by ou r other system components .</Paragraph>
    <Paragraph position="7"> The MUC-6 CO task was defined to include nearly all noun phrases, not just those that were relevant to eithe r the TE or ST tasks. Competent analysis of coreferent relationships among all noun phrases requires a much mor e refined knowledge base and much deeper linguistic analysis than we employ in any of our current informatio n extraction components . However, we discovered that 66% of the recall in the CO dry-run test materials was based o n references to people and organizations . We also intended to use RESOLVE for coreference resolution within the T E and ST tasks, where it would have to deal with person and organization references .</Paragraph>
    <Paragraph position="8"> With these factors in mind, we decided to run RESOLVE on the MUC-6 CO task, but to constrain its input s o that it only attempted to find coreferent relationships among references to people and organizations, references tha t were potentially relevant to the MUC-6 TE and ST tasks . We therefore expected that our recall would be lower than other systems that attempted to find coreferent relationships among the full set of noun phrases defined by the MUC -</Paragraph>
    <Paragraph position="0"> Our official CO scores were 44% recall and 51% precision . If 66% of the recall for the MUC-6 CO fina l evaluation was based on references to people and organizations, as it was for the dry-run evaluation, then RESOLVE was actually achieving approximately 67% of the total recall for which it was originally intended . Earlier experiments with RESOVE in the MUC-5 EJV domain showed that our unpruned C4 .5 decision trees used fo r coreference resolution tend to get higher recall and lower precision than unpruned decision trees. Since RESOLV E was already handicapped with respect to potential recall -- focusing only on person and organization references -- w e decided to use unpruned trees in the final evaluation. After the official evaluation, we retrained RESOLVE based on a prepruning procedure in which only the &amp;quot;easiest&amp;quot; subset of the positive training instances (pairs of coreferent phrases ) are used. This version of RESOLVE suffered a small decrease in recall, 41%, but a much larger increase i n precision, 59% .</Paragraph>
    <Paragraph position="1"> Although our recall and precision results are not among the best reported in this evaluation, we find the results extremely encouraging given the fact that RESOLVE is a fully trainable system . RESOLVE was designed to find coreferent relationships among references to people and organizations ; since the MUC-6 CO training material included annotations for entities other than people and organizations, a special interface was used to mark 25 relevan t texts from the ST task for person and organization references (5 hours of work) . Another reason for specially marking ST texts was a hope that the CO training could also be used for coreference resolution in the TE and S T tasks; unfortunately, time constraints prevented us from using this domain-specific coreference resolution training in the latter two tasks.</Paragraph>
    <Paragraph position="2"> We spent 2 weeks on a system component that could reliably identify phrases from BADGER that wer e relevant candidates for the CO task, which in our view were phrases referring to people and organizations . Another week was spent modifying feature extractors used in the MUC-5 English Joint Ventures domain and adding new features designed specifically for the MUC-6 CO task (especially resolution of pronominal references to people and organizations) and the ST task (especially associating persons with their roles) . The output generator, which was required to handle potentially nested COREF SGML tags, required nearly a week-long effort .</Paragraph>
    <Paragraph position="3"> Given the nature of trainable decision tree technology, it is safe to assume that RESOLVE's performance woul d improve for both recall and precision with additional training . We are frankly surprised to see RESOLVE operatin g as well as it does on the basis of only 25 training documents .</Paragraph>
    <Paragraph position="4">  Template Entities (TE) In moving from NE to TE, we add BADGER's processing and a trainable CN dictionary to support BADGER' s case frame instantiations . It is interesting to compare our TE scores with our NE scores for organizations and people . TE is a test of fine-grained information extraction . Not only do we need to locate noun phrases that describ e people and organizations, we need to pull those nouns phrases apart in order to separate out names and titles, aliases and locales. This is primarily a noun phrase analysis challenge, with additional points to be won from correc t merging and consolidation across multiple noun phrases . As such, the RESOLVE decision trees represent crucial capabilities, along with routines designed to analyze appositives and other complex noun phrases . Unfortunately, CRYSTAL's CN definitions offer little help with noun phrase analysis, since they operate at a relatively coarse leve l of granularity with respect to a complex noun phrase . The CNs that CRYSTAL induces are designed to locate relevant noun phrases for an extraction task, but they do not help us understand where to look inside a given nou n phrase for the relevant information .</Paragraph>
    <Paragraph position="5">  The irrelevance of CRYSTAL's dictionary to noun phrase analysis has been confirmed by an experimental T E run in which we removed the official CRYSTAL dictionary (containing 2945 CN definitions) and replaced it with a dictionary of only two CN definitions .</Paragraph>
    <Paragraph position="6"> Defl extracts a person from any syntactic buffer, as long as the people string specialist identified a person in that buffer.</Paragraph>
    <Paragraph position="7"> Def2 extracts an organization from any syntactic buffer, as long as th e organization string specialist identified an organization in that buffer .</Paragraph>
    <Paragraph position="8"> Scores using this dictionary are remarkably close to the official score report based on a fully trained CRYSTAL dictionary, as shown by the following score report. So we must conclude that CRYSTAL, and indeed, all coarse grained sentence analysis, was totally irrelevant to the TE task. We would have obtained a higher score report b y trusting the specialists, working to tighten the performance of the specialists, and focusing on the manually-code d routines needed to dissect noun phrases correctly .</Paragraph>
    <Paragraph position="9"> CRYSTAL and BADGER neither helped nor hindered : they just weren't needed . RESOLVE probably did contribute to TE processing, but its positive effect was overwhelmed by critical weaknesses in noun phrase analysis , an area that has not received adequate attention thus far in our quest for trainable technologies .</Paragraph>
    <Paragraph position="10">  Here is the TE score report that results from this minimal CN dictionary :  In our system design, the ST task is equivalent to TE plus scenano-specific CNs for changes in position and status and WRAP-UP. WRAP-UP is responsible for all the relational links that describe successions and in-and-out instances. So it is fair to expect that TE will operate as an upper bound for ST . But we see significant drops in both recall and precision as we move from TE to ST .</Paragraph>
    <Paragraph position="11"> Part of this drop in recall and precision comes from low recall for new status, which is critical to this scenario . Persons and organizations which are not involved in a change of job status are discarded as irrelevant . CRYSTAL learned CNs named STATUS-IN and STATUS-OUT to identify persons involved in a change of status . The S T training texts provided only about 150 instances each for STATUS-IN and STATUS-OUT, which was insufficien t training . CRYSTAL had limited recall for these CNs.</Paragraph>
    <Paragraph position="12"> WRAP-UP takes as input the output from BADGER and forms In-and-Out relations and Succession events .</Paragraph>
    <Paragraph position="13"> WRAP-UP learns to discard as irrelevant any persons and organizations that are not attached to an In-and-Out or a Succession . When our system does not extract a new status due to low recall by domain-specific CNs, this ca n cause WRAP-UP to discard relevant persons and organizations, further lowering recall . We believe that the training available to CRYSTAL and WRAP-UP was too sparse to enable intelligent inferences about succession events . The following graph shows the learning curve for CN definitions that identified Organization names . The 30 0 available TE texts were randomly partitioned into training set and a blind test set with training size ranging fro m 10% to 90% of the documents. This graph shows the average recall and precision for 50 random partitionings at each training size.</Paragraph>
    <Paragraph position="14">  This data suggests that ORG was probably finding quite a few of the organizations missed by the scanner, but i t needed more training . Recall and precision had not leveled out after 270 training texts . With almost no training i t got a fourth of the org names (relying heavily on the scanner) . After 270 training texts it was getting half the or g names, but appears from the slope of the learning curve that it was not reaching saturation yet .</Paragraph>
    <Paragraph position="15"> We expect that WRAP-UP would tell a similar story if we were to compute the learning curve for key relationa l decisions . Inductive learning algorithms eventually flatten out with enough training, but performance tends t o increase steadily up until that plateau is reached .</Paragraph>
    <Paragraph position="16">  Although CRYSTAL was trained on 300 TE documents, WRAP-UP was only trained on 100 ST documents .</Paragraph>
    <Paragraph position="17"> The ST training corpus was undoubtedly much too small to support an inductive algorithm designed to lear n relational decisions . Trainable technologies are valuable in the battle against the knowledge engineering bottleneck , but we feel that it is important to provide adequate levels of training in order to realize their potential .</Paragraph>
  </Section>
class="xml-element"></Paper>