<?xml version="1.0" standalone="yes"?>
<Paper uid="M95-1012">
  <Title>MITRE: DESCRIPTION OF THE ALEMBIC SYSTEM USED FOR MUC-6</Title>
  <Section position="3" start_page="0" end_page="142" type="metho">
    <SectionTitle>
THE PREPROCESSORS
</SectionTitle>
    <Paragraph position="0"> As noted above, the UNIX-based portion of the system is primarily responsible for part-of-speech tagging .</Paragraph>
    <Paragraph position="1"> Prior to the part-of-speech tagger, however, a text to be processed by Alembic passes through severa l preprocess stages; each preprocessor &amp;quot;enriches&amp;quot; the text by means of SGML tags. All of these preprocess components are implemented with LEX (the lexical analyzer generator) and are very fast.</Paragraph>
    <Paragraph position="2"> An initial preprocessor, the punctoker, makes decisions about word boundaries that are not coincident with whitespace. It tokenizes abbreviations (e.g., &amp;quot;Dr.&amp;quot;), and decides when sequences of punctuation and alphabeti c characters are to be broken up into several lexemes (e.g., &amp;quot;Singapore-based&amp;quot;) . The punctoker wraps &lt;LEX&gt; tags around text where necessary to indicate its decisions, as in the following: Singapore&lt; L EX pos=JJ&gt;-based&lt;/LEX &gt; As this example suggests, in some cases, the punctoker guides subsequent part-of-speech tagging by addin g a part-of-speech attribute to the &lt;LEX&gt; tags that it emits .</Paragraph>
    <Paragraph position="3"> The parasenter zones text for paragraph and sentence boundaries, the former being unnecessary for Muc-6.</Paragraph>
    <Paragraph position="4"> The sentence tagging component is both simple and conservative . If any end-of-sentence punctuation has no t been &amp;quot;explained&amp;quot; by the punctoker as part of a lexeme, as in abbreviations, it is taken to indicate a sentenc e boundary. The parasenter is also intended to filter lines in the text body that begin with &amp;quot;(r)&amp;quot; (but see our error analysis below) . A separate hl-taggeris invoked to zone sentence-like constructs in the headline field . The preprocess includes specialized phrase taggers. The title-tagger marks personal titles, making distinctions along the lines drawn by the NE and ST tasks. Included are personal honorifics (Dr., Ms.); military an d religious titles (Vicar, Sgt.); corporate posts (CEO, chairman); and &amp;quot;profession&amp;quot; words (analyst, spokesperson) . The date-tagger identifies TIMEX phrases. It uses a lex-based scanner as a front-end for tokenizing and typing its input; then a pattern-matching engine finds the actual date phrases . The date-tagger is fast, sinc e the pattern matcher itself is highly optimized, and since the lex-based front-end does not actually tokenize th e input or fire the pattern-matcher unless it suspects that a date phrase may be occurring in the text .</Paragraph>
    <Paragraph position="5">  Both the date- and title-tagger can tag a phrase as either (I) a single SGML element, or (2) individual lexemes , with special attributes that indicate the beginning and end of the matrix phrase, as i n &lt;LEX post=start&gt;chief&lt;/LEX &gt; &lt;LEX post=mid&gt;executive&lt;/LEX &gt; &lt;LEX post=end&gt;officer&lt;/LEX &gt; We adopted this LEX-based phrase encoding so as to simplify (and speed up) the input scanner of the part-of-speech tagger. In addition, a phrase's LEX tags can encode parts-of-speech to help guide the p-o-s tagger.</Paragraph>
  </Section>
  <Section position="4" start_page="142" end_page="143" type="metho">
    <SectionTitle>
THE PART-OF-SPEECH TAGGER
</SectionTitle>
    <Paragraph position="0"> Our part-of-speech tagger is closest among the components of our Muc-6 system to Brill's original work o n rule sequences [S, 6, 7] . The tagger is in fact a re-implementation of Brill's widely-disseminated system, wit h various speed and maintainability improvements. Most of the rule sequences that drive the tagger were automatically learned from hand-tagged corpora, rather than hand-crafted by human engineers . However, the rules are in a human-understandable form, and thus hand-crafted rules can easily be combined with automatically learned rules, a property which we exploited in the Muc-6 version of Alembic.</Paragraph>
    <Paragraph position="1"> The tagger operates on text that has been lexicalized through pre-processing . The following, for example , is how a sample walkthrough sentence is passed to the part-of-speech tagger. Note how punctuation has bee n tokenized, and &amp;quot;Mr.&amp;quot; has been identified as a title and assigned the part-of-speech NNP (proper noun) .</Paragraph>
    <Paragraph position="2"> &lt;5&gt;Even so&lt;lex&gt;,&lt;/lex&gt; &lt;LEX pos=NNP ttl=WHOLE&gt;Mr .&lt;/LEX&gt; Dooner is on the prowl for more creative talen t and is interested in acquiring a hot agency&lt;lex&gt;.&lt;/lex&gt;&lt;/5 &gt; The part-of-speech tagger first assigns initial parts-of-speech by consulting a large lexicon . The lexico n maps words to their most frequently occurring tag in the training corpus . Words that do not appear in th e lexicon are assigned a default tag of NN (common noun) or NNP (proper noun), depending on capitalization.</Paragraph>
    <Paragraph position="3"> For unknown words, after a default tag is assigned, lexical rules apply to improve the initial guess . These rules operate principally by inspecting the morphology of words . For example, an early rule in the lexical rul e sequence retags unknown words ending in &amp;quot;ly&amp;quot; with the 10 tag (adverb) . In the sentence above, the only unknown word (&amp;quot;Dooner&amp;quot;) is not subject to retagging by lexical rules; in fact, the default NNP tag assignment i s correct. Lexical rules play a larger role when the default tagging lexicon is less complete than our own, which we generated from the whole Brown Corpus plus 3 million words of Wall Street Journal text . For example, in our experiments tagging Spanish texts (for which we had much smaller lexica), we have found that lexica l rules play a larger role (this can also be partially attributed to the more inflected nature of Spanish) .</Paragraph>
    <Paragraph position="4"> After the initial tagging, contextual rules apply in an attempt to further fix errors in the tagging . These rules reassign a word's tag on the basis of neighboring words and their tags . In this sentence, &amp;quot;more&amp;quot; changes from its initial JJR (comparative adjective) to RBR (comparative adverb) . Note that this change is arguably erroneous, depending on how one reads the scope of &amp;quot;more&amp;quot;. This tagging is changed by the following rule, which roughly reads: change word W from JJR to ROR if the the word to W's immediate right is tagged JJ JJR RBR nexttag JJ Table 1, below, illustrates the tagging process . The sample sentence is on the first line; its initial lexicon-based tagging is on the second line; the third line shows the final tagging produced by the contextual rules . In controlled experiments, we measured the tagger's accuracy on Wall Street Journal text at 95.1% based on a training set of140,000 words. The production version of the tagger, which we used for Muc-6, relies on the Even so . Mr. Dooner Is on the prowl for more creative talent and Is Interested In acquiring a hot agency</Paragraph>
    <Paragraph position="6"> Table 1 : Tagging a text with the lexicon (line 2) and contextual rules (line 3) . Note the default lexicon assignment of nnp to &amp;quot;Dooner&amp;quot; and the rule-based correction of &amp;quot;more&amp;quot; .</Paragraph>
    <Paragraph position="7">  learned rules from Brill's release 1.1 (148 lexical rules, 283 contextual rules), for which Brill has measure d accuracies that are 2-3 percentage points higher than in our own smaller-scale experiments . For MUC-6, we combined these rules with 19 hand-crafted contextual rules that correct residual tagging errors that were especially detrimental to our NE performance. Tagger throughput is around 3000 words/sec.</Paragraph>
  </Section>
  <Section position="5" start_page="143" end_page="149" type="metho">
    <SectionTitle>
THE PHRASER
</SectionTitle>
    <Paragraph position="0"> The Alembic phrase finder, or phraser for short, performs the bulk of the system's syntactic analysis . As noted above, it has somewhat less recognition power than a finite-state machine, and as such shares many characteristics of pattern-matching systems, such as CIRCUS [10] or FASTUS [2] . Where it differs from these systems is in being driven by rule sequences. We have experimented with both automatically-learned rul e sequences and hand-crafted ones . In the system we fielded for Muc-6, we ended up running entirely with hand-crafted sequences, as they outperformed the automatically-learned rules .</Paragraph>
    <Paragraph position="1"> How the phraser works The phraser process operates in several steps. First, a set of initial phrasing functions is applied to all of the sentences to be analyzed . These functions are responsible for seeding the sentences with likely candidat e phrases of various kinds. This seeding process is driven by word lists, part-of-speech information, and pretaggings provided by the preprocessors. Initial phrasing produces a number of phrase structures, many o f which have the initial null labeling (none), while some have been assigned an initial label (e .g., num) . The following example shows a sample sentence from the walkthrough message after initial phrasing .</Paragraph>
    <Paragraph position="2"> Yesterday, &lt;none&gt;McCann&lt;/none&gt; made official what had been widely anticipated : &lt;ttl&gt;Mr.&lt;/ttl&gt; &lt;none&gt;James&lt;/none&gt;, &lt;num&gt;57&lt;/num&gt; years old, is stepping down as &lt;post&gt;chief executive officer&lt;/post&gt; o n &lt;date&gt;July 1&lt;/date&gt; and will retire as &lt;post&gt;chairman&lt;/post&gt; at the end of the year .</Paragraph>
    <Paragraph position="3"> The post, ttl, and date phrases were identified by the title and date taggers . Mr. James' num-tagged age is identified on the basis of part of speech information, as is the organization name &amp;quot;McCann&amp;quot; .</Paragraph>
    <Paragraph position="4"> Once the initial phrasing has taken place, the phraser proceeds with phrase identification proper . This is driven by a sequence of phrase-finding rules . Each rule in the sequence is applied in turn against all of the phrases in all the sentences under analysis. If the antecedents of the rule are satisfied by a phrase, then th e action indicated by the rule is executed immediately. The action can either change the label of the satisfyin g phrase, grow its boundaries, or create new phrases. After the nth rule is applied in this way against every phrase in all the sentences, the n+lth rule is applied in the same way, until all rules have been applied . After all of the rules have been applied, the phraser is done.</Paragraph>
    <Paragraph position="5"> It is important to note that the search strategy in the phraser differs significantly from that in standar d parsers. In standard parsing, one searches for any and all rules whose antecedents might apply given the stat e of the parser's chart : all these rules become candidates for application, and indeed they all are applie d (modulo higher-order search control) . In our phraser, only the current rule in a rule sequence is tested: the rule is applied wherever this test succeeds, and the rule is never revisited at any subsequent stage of processing . After the final rule of a sequence is run, no further processing occurs .</Paragraph>
    <Paragraph position="6"> The language of phraser rules The language of the phraser rules is as simple as their control strategy . Rules can test lexemes to the left and right of the phrase, or they can look at the lexemes in the phrase . Tests in turn can be part-of-speech queries, literal lexeme matches, tests for the presence of neighboring phrases, or the application of predicates that are evaluated by invoking a Lisp procedure. There are several reasons for keeping this rule languag e simple. In the case of hand-crafted rules, it facilitates the process of designing a rule sequence. In the case o f machine-learned rules, it restricts the size of the search space on each epoch of the learning regimen, thus making it tractable. In either case, the overall processing power derives as much from the fact that the rules are sequenced, and feed each other in turn, as it does from the expressiveness of the rule language.</Paragraph>
    <Paragraph position="7">  This rule changes the label of a phrase from none to person if the phrase is bordered on its left by a ttl phrase. On the sample sentence, this rule causes the following relabeling of the phrase around &amp;quot;James&amp;quot; . Yesterday, &lt;none&gt;McCann&lt;/none&gt; made official what had been widely anticipated : &lt;ttl&gt;Mr.&lt;/ttl&gt; &lt;person&gt;James&lt;/person&gt;, &lt;num&gt;57&lt;/num&gt; years old, is stepping down as &lt;post&gt;chief executiv e officer&lt;/post&gt; on &lt;date&gt;July 1&lt;/date&gt; and will retire as &lt;post&gt;chairman&lt;/post&gt; at the end of the year .</Paragraph>
    <Paragraph position="8"> Once this rule has run, the labelings it instantiates become available as input to subsequent rules in th e sequence, e.g., rules that attach the title to the person in &amp;quot;Mr. James&amp;quot;, that attach the age apposition, and s o forth. Phraser rules do make mistakes, but as with other sequence-based processors, the phraser applies later rules in a sequence to patch errors made by earlier rules . In the walkthrough message, for example, &amp;quot;Amarati &amp; Purls&amp;quot; is identified as an organization, which ultimately leads to an incorrect org tag for &amp;quot;Martin Purls&amp;quot;, since this person's name shares a common substring with the organization name . However, rules that find personal names occur later in our named entities sequence than those which find organizations, thus allowing th e phraser to correctly relabel &amp;quot;Martin Purls&amp;quot; as a person on the basis of a test for common first names. Rule sequences for MUG6 For MUC-6, Alembic relies on three sequences of phraser rules, divided roughly into rules for generatin g NE-specific phrases, those for finding TE-related phrases, and those for ST phrases. The division is only rough , as the NE sequence yields some number of TE-related phrases as a side-effect of searching for named entities . To illustrate this process, consider the following walkthrough sentence, as tagged by the NE rule sequence .</Paragraph>
    <Paragraph position="9"> But the bragging rights to &lt;org&gt;Coke&lt;/org&gt; 's ubiquitous advertising belongs to &lt;org&gt;Creative Artists Agency &lt;/org&gt;, the big &lt;location&gt;Hollywood&lt;/location&gt; talent agency.</Paragraph>
    <Paragraph position="10"> The org label on &amp;quot;Creative Artists Agency&amp;quot; was set by a predicate that tests for org keywords (like &amp;quot;Agency&amp;quot;). &amp;quot;Coke&amp;quot; was found to be an org elsewhere in the document, and the label was then percolated . Finally, the location label on &amp;quot;Hollywood&amp;quot; was set by a predicate that inspects the tried-and-not-so-true TIPSTER gazetteer. What is important to note about these NE phraser rules is that they do not rely on a large database o f known company names . Instead, the rules are designed to recognize organization names in almost complet e absence of any information about particular organization names (with the sole exception of a few acronyms such as IBM, GM, etc.) . This seems to auger well for the ability to apply Alembic to different application tasks . Proceeding beyond named entities, the phraser next applies its TE-specific rule sequence . This sequence performs manipulations that resemble NP parsing, e.g., attaching locational modifiers. In addition, a subsequence of TE rules concentrates on recognizing potential organization descriptors . These rules generate so-called corpnp phrases, that is noun phrases that are headed by an organizational common noun (such a s &amp;quot;agency&amp;quot;, &amp;quot;maker&amp;quot;, and of course &amp;quot;company&amp;quot;) . The rules expand these heads leftwards to incorporate lexemes that satisfy a set of part-of-speech constraints. One such phrase, for example, is in the sample sentence above . But the bragging rights to &lt;org&gt;Coke&lt;/org&gt; 's ubiquitous advertising belongs to &lt;org&gt;&lt;org&gt;Creative Artist s Agency &lt;/org&gt;, &lt;corpnp&gt; the big &lt;location&gt;Hollywood&lt;/location&gt; talent agency&lt;/corpnp&gt;&lt;/org &gt; After corpnp phrases have been marked, another collection of TE rules associates these phrases with neighboring org phrases. In this case such a phrase is found two places to the left (on the other side of a comma) , so a new org phrase is created which spans both the original org phrase and its corpnp neighbor. See above.</Paragraph>
    <Paragraph position="11"> Note that these rule sequences encode a semantic grammar . Organizationally-headed noun phrases are labeled as org, regardless of whether they are simple proper names or more complex constituents such as th e</Paragraph>
    <Paragraph position="13"> org-corpnp apposition above. This semantic characteristic of the phraser grammar is clearer still with ST rules.</Paragraph>
    <Paragraph position="14"> These rules are responsible for finding phrases denoting events relevant to the MUC-6 scenario templates .) For the succession scenario, this consists of a few key phrase types, most salient among them : job (a post at an org), job-in and job-out (fully specified successions) and post-in and post-out (partially specified successions) .</Paragraph>
    <Paragraph position="15"> The following example shows the ST phrases parsed out of a key sentence from the walkthrough message.</Paragraph>
    <Paragraph position="16"> Yesterday, &lt;person&gt;McCann&lt;/person&gt; made official what had been widely anticipated :  The post-out phrase encodes the resignation of a person in a post. Note that in the formal evaluation we failed to find a more-correct job-out phrase, which should have included &amp;quot;McCann&amp;quot;. This happened because we did not successfully identify &amp;quot;McCann&amp;quot; as an organization, thus precluding the formation of the job-out phrase.</Paragraph>
    <Section position="1" start_page="145" end_page="147" type="sub_section">
      <SectionTitle>
Learning Phrase Rules
</SectionTitle>
      <Paragraph position="0"> We have applied the same general error-reduction learning approach that Brill designed for generating part-of-speech rules to the problem of learning phraser rules in support of the NE task. The official version of Alembic for MUC-6 did not use any of the rule sequences generated by this phrase rule learner, but we hav e since generated unofficial scores . In these runs we used phrase rules that had been learned for the ENAMEX expressions only--we still used the hand-coded pre-processors and phraser rules for recognizing TIMEX and NUMEX phrases. Our performance on this task is shown in Fig. 4, above. These rules yield six fewer points o f P&amp;R than the hand-coded ENAMEX rules--still an impressive result for machine-learned rules . Interestingly, the bulk of the additional error in the machine-learned rules is not with the &amp;quot;hard&amp;quot; organization names, but with person names OR=-If, LP=-14) and locations (AR-I2, AP=-18) .</Paragraph>
      <Paragraph position="1"> 1We put about one staff week of work into the sT task, during which we experienced steep hill-climbing on the training set . Nevertheless, we felt that the maturity of our sT processing was sufficiently questionable to preclude participating in the official evaluation . The present discussion should be taken in this light, i .e., with the understanding that it was not officially evaluated atMuc-6 .  Facts enter the propositional database as the result of phrase interpretation . The phrase interpreter is controlled by a small set of Lisp interpretation functions, roughly one for each phrase type . Base-level phrases, i .e. phrases with no embedded phrases, are mapped to unary interpretations. The phras e &lt;person&gt;IZobert L. James&lt;/person&gt;, for example is mapped to the following propositional fact . Note the pers-01 term in this proposition: it designates the semantic individual denoted by the phrase, and is generated in the process of interpretation .</Paragraph>
      <Paragraph position="2"> person(pers-01) Complex phrases, those with embedded phrases, are typically interpreted as conjuncts of simple r interpretations (the exception being NP coordination, as in &amp;quot;chairman and chief executive&amp;quot;) . Consider the phrase &amp;quot;Mr. James, 57 years old&amp;quot; which is parsed by the phraser as follows . Note in particular that the overal l person-age apposition is itself parsed as a person phrase.</Paragraph>
      <Paragraph position="3"> &lt;person&gt;&lt;person&gt;Mr. James&lt;/person&gt;, &lt;age&gt;&lt;num&gt;57&lt;/num&gt; years old&lt;/age&gt;&lt;/person &gt; The treatment of age appositions is compositional, as is the case for the interpretation of all but a few complex phrases . Once again, the embedded base-level phrase ends up interpreted as a unary person fact.</Paragraph>
      <Paragraph position="4"> The semantic account of the overall apposition ends up as a has-age relation modifying pers-02, the semantic individual for the embedded person phrase. This proposition designates the semantic relationship between a person and that person's age. More precisely, the following facts are added to the inferential database .</Paragraph>
      <Paragraph position="6"> What appears to be a spare argument to the has-age predicate above is the event individual for the predicate. Such arguments denote events themselves (in this case the event of being a particular number o f years old), as opposed to the individuals participating in the events (the individual and his or her age) . This treatment is similar to the partial Davidsonian analysis of events due to Hobbs [8] . Note that event individuals are by definition only associated with relations, not unary predicates .</Paragraph>
      <Paragraph position="7"> As a point of clarification, note that the inference system does not encode facts at the predicate calculu s level so much as at the interpretation level made popular in such systems as the SRI core language engine [1, 3] .</Paragraph>
      <Paragraph position="8"> In other words, the representation is actually a structured attribute-value graph such as the following, whic h encodes the age apposition above.</Paragraph>
      <Paragraph position="9"> [[head :person] [proxy pers-02] [modifiers [[head has-age] [proxy ha-04] [arguments (pers-02 [[head age ] [proxy age-03]])]]] ] The first two fields correspond to the embedded phrase : the head field is a semantic sort, and the proxy field holds the designator for the semantic individual denoted by the phrase . The interpretation encoding the  overall apposition ends up in the modifiers slot, an approach adopted from the standard linguistic account o f phrase modification. Inference in Alembic is actually performed directly on interpretation structures, an d there is no need for a separate translation from interpretations to more traditional-looking propositions . The propositional notation is more perspicuous to the reader, and we have adopted it here . Finally, note that the phrase interpretation machinery maintains pointers between semantic individual s and the surface strings from which they originated . One of the fortunate--if unexpected--consequences o f the phraser's semantic grammar is that maintaining these cross-references is considerably simpler than was th e case in our more linguistically-inspired categorial parser of old . Except for the ORG_DESCRIPTOR slot, the fil l rules line up more readily with semantic notions than with syntactic considerations, e .g., maximal projections . Equality reasoning Much of the strength of this inferential framework derives from its equality mechanism . This subcomponent allows one to make two semantic individuals co-designating, i.e., to &amp;quot;equate&amp;quot; them . Facts that formerly held of only one individual are then copied to its co-designating siblings . This in turn enables inference that may have been previously inhibited because the necessary antecedents were distributed ove r (what were then) distinct individuals .</Paragraph>
      <Paragraph position="10"> This equality machinery is exploited at many levels in processing semantic and domain constraints . One of the clearest such uses is in enforcing the semantics of coreference, either definite reference or appositional coreference. Take for example the following phrase from the walkthrough message, which we show here a s parsed by the phraser.</Paragraph>
      <Paragraph position="11">  Pressing on, the phraser parses the overall org-orgnp apposition as an overarching org. To interpret th e apposition, the interpreter also adds the following proposition to the database .</Paragraph>
      <Paragraph position="12"> entity-np-app((org-05, org-06) e-n-a-09) This ultimately causes org-05 and org-06 to become co-designating through the equality system, and th e following fact appears in the inferential database .</Paragraph>
      <Paragraph position="13"> has-location((org-05, geo-07) hasloc-10) i.e., Creative Artists Agency is located in Hollywood This propagation of facts from one individual CO its co-designating siblings is the heart of our coreferenc e mechanism. Its repercussions are particularly critical to the subsequent stage of template generation. By propagating facts in this way, we can dramatically simplify the process of collating information into templates , since all the information relevant to, say, an individual company will have been attached to that company b y equality reasoning. We will touch on this point again below.</Paragraph>
    </Section>
    <Section position="2" start_page="147" end_page="149" type="sub_section">
      <SectionTitle>
Inference
</SectionTitle>
      <Paragraph position="0"> The final role of the Alembic inference component is to derive new facts through the application o f carefully-controlled forward inference. As was the case with our MUC-5 system, the present Alembic allows only limited forward inference. Though the full details of this inference process are of legitimate interest i n  their own right, we will only note some highlights here . To begin with, the tractability of forward inference in this framework is guaranteed just in case the inference axioms meet a certain syntactic requirement . To date, all the rules we have written for even complex domains, such as the joint-venture task in MUC-5, have met this criterion. Aside from this theoretical bound on computation, we have found in practice that th e inference system is remarkably fast, with semantic interpretation, equality reasoning, rule application, and al l other aspects of inference together accounting for 6-7% of all processing time in Alembic. Details are in [II] . We exploited inference rules in several primary ways for the TE and ST tasks. The first class of inferenc e rules enforce so-called terminological reasoning, local inference that composes the meaning of words . One such rule distributes the meaning of certain adjectives such as &amp;quot;retired&amp;quot; across coordinated titles, as in &amp;quot;retire d chairman and CEO&amp;quot;. The phrase parses as follows; note the embedded post semantic phrase types .</Paragraph>
      <Paragraph position="1">  This rule yields the fact retired-ttl(ttl-11), and a similar rule yields retired-ttl(ttl-12) . Other like rules distribute coordinated titles across the title-holder, and so forth . The fact that multiple rules are needed t o distribute adjectives over coordinated noun phrases is one of the drawbacks of semantic grammars . On the other hand, these rules simplify semantic characteristics of distributivity by deferring questions of scope and non-compositionality to a later stage, i.e., inference. Interpretation procedures can thus remain compositional, which makes them substantially simpler to write . Additionally, these kinds of distribution rules further contribute to collating facts relevant to template generation onto the individuals for which these facts hold . Of greater importance, however, is the fact that inference rules are the mechanism by which we instantiat e domain-specific constraints and set up the particulars required for scenario-level templates . Some of this information is again gained by fairly straightforward compositional means . For example, the phrase &amp;quot;Walter IZawleigh Jr., retired chairman of Mobil Corp .&amp;quot; yields a succession template through the mediation of on e inference rule. The phrase is compositionally interpreted as</Paragraph>
      <Paragraph position="3"> The rule that maps these propositions to a succession event is job-out(pers, ttl, org) &lt;-- holds job((pers,job) x) +job((ttl, org), job) + retired-ttl(ttl) When applied to the above propositions this rule yields job-out((pers-17, ttl-15, org-07) j-o-19) . This fact is all that is required for the template generator to subsequently issue the appropriate succession event templates.  The most interesting form of domain inference is not compositional of course, but based on discours e considerations. In the present ST task, for example, succession events are not always fully fleshed out, bu t depend for their complete interpretation on information provided earlier in the discourse . In the walkthrough message, this kind of contextualized interpretation is required early on : Yesterday, McCann made official what had been widely anticipated : Mr. James, 57 years old, is stepping down a s chief executive officer on July 1 [ . . .]. He will be succeeded by Mr. Donner, 45.</Paragraph>
      <Paragraph position="4"> ST-level phrasing and interpretation of this passage produces two relevant facts, a job-out for the firs t clause, and a successor for the second . Note that although successor is normally a two-place relation, its valence here is one by virtue of the phraser not finding a named person as a subject to the clause .</Paragraph>
      <Paragraph position="5">  One approach to contextualizing the succession clause in this text would require first resolving th e pronominal subject &amp;quot;He&amp;quot; to &amp;quot;Mr. James&amp;quot; and then exploiting any job change facts that held about thi s antecedent. An equally effective approach, and simpler, is to ignore the pronoun and reason directly from th e successor fact to any contextualizing job-out fact. The rule that accomplishes this is job-in(pers-a, ttl, org) &lt;-- successor((pers-a) succ) + job-out-in-context?((succ, job-out) x-1) + job-out((pers-b, ttl, org) x-2 ) The mysterious-looking job-out-in-context? predicate implements a simple discourse model : it is true just in case its second argument is the most immediate job-out fact in the context of its first argument . Contextencoding facts are not explicitly present in the database, as their numbers would be legion, but are instantiate d &amp;quot;on demand&amp;quot; when a rule attempts to match such a fact . Note that what counts as a most immediate contextualized fact is itself determined by a separate search procedure . The simple-minded strategy we adopted here is to proceed backwards from the current sentence, searching for the most recent sentence containing a n occurrence of ajob-out phrase, and returning the semantic individual it denotes . In this example, the job-outin-contexa predicate succeeds by binding the succ variable to j-o-23, with the rule overall yielding a job-in fact . job-in((pers-24, ttl-21, org-22) j-i-26 ) As with job-out, this fact is mapped directly by the template generator to an incoming succession template. Note that this process of combining a job-out and successor fact effectively achieves what is ofte n accomplished in data extraction systems by template merging . However, since the merging takes place in th e inferential database, with propagation of relevant facts as a side-effect, the process is greatly simplified an d obviates the need for explicit template comparisons .</Paragraph>
      <Paragraph position="6"> One final wrinkle must be noted . Inference is generally a non-deterministic search problem, with no firm guarantee as to whether facts will be derived in the same chronological order as the sentences which underli e the facts . Rules that require contextualized facts, however, crucially rely on the chronological order of th e sentences underlying these facts. We have thus pulled these rules out of the main fray of inference, and apply them only after all other forward chaining is complete . In fact, these rules are organized as a Brill-style rule sequence, where each rule is allowed to run to quiescence at only one point in the sequence before the nex t rule becomes active. It is our hypothesis, though, that alldomain inference rules can be so organized, not jus t contextualized ones, and that by this organizational scheme, rules can be automatically learned from example .</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="149" end_page="150" type="metho">
    <SectionTitle>
TASK SPECIFIC PROCESSING AND TEMPLATE GENERATION
</SectionTitle>
    <Paragraph position="0"> Aside from phrasing and inference, a relatively small--but critical--amount of processing is required t o perform the Muc-6 named entities and template generation tasks .</Paragraph>
    <Paragraph position="1">  For NE, little is actually required beyond careful document management and printing routines . TIMEX forms, introduced by the preprocessing date-tagger, must be preserved through the rest of the processing pipe . Named entity phrases that encode an intermediate stage of NE processing must be suppressed at printout . Examples such as these abound, but by and large, Alembic's NE output is simply a direct readout of the resul t of running the named entity phraser rules.</Paragraph>
    <Paragraph position="2"> Name coreference in TE Of all three tasks, TE is actually the one that explicitly requires most idiosyncratic processing beyon d phrasing and inference . Specifically, this task is the crucible for name coreference, i .e., the process by which short name forms are reconciled with their originating long forms .</Paragraph>
    <Paragraph position="3"> This merging process takes place by iterating over the semantic individuals in the inferential database tha t are of a namable sort (e.g., person or organization) . Every such pair of same-sort individuals is compared t o determine whether one is a derivative form of the other . Several tests are possible.</Paragraph>
    <Paragraph position="4"> * Identity. If the forms are identical strings, as in the frequently repeated &amp;quot;Dooner&amp;quot;, or &amp;quot;McCann&amp;quot; in th e walkthrough article, then they are merged.</Paragraph>
    <Paragraph position="5">  * Shortening . If one form is a shortening of the other, as in &amp;quot;Mr. James&amp;quot; for &amp;quot;Robert L. James&amp;quot;, then the short form is merged as an alias of the longer.</Paragraph>
    <Paragraph position="6"> * Acronyms . If one form appears to be an acronym for the other, as in &amp;quot;CAA&amp;quot; and &amp;quot;Creative Artist s  Agency&amp;quot;, then the forms should be merged, with the acronym designated as an alias . Merging two forms takes place in several steps . First, their respective semantic individuals are equated in the inferential database . This allows facts associated with one form to become propagated to the other. In this way, the nationality information in &amp;quot;Japanese giant NEC&amp;quot; becomes associated with the canonical nam e &amp;quot;Nippon Electric Corp .&amp;quot; As a second step, names that are designated as aliases are recorded as such .</Paragraph>
    <Section position="1" start_page="150" end_page="150" type="sub_section">
      <SectionTitle>
Template generation
</SectionTitle>
      <Paragraph position="0"> We mentioned above that the inferential architecture that we have adopted here is in good part motivate d by a desire to simplify template generation . Indeed, template generation consists of nothing more than reading out the relevant propositions from the database .</Paragraph>
      <Paragraph position="1"> For the TE task, this means identifying person and organization individuals by matching on person(x) o r organization(y) . For each so-matching semantic individual, we create a skeletal template . The skeleton is initialized with name and alias strings that were attached to the semantic individuals during name merging . It is further fleshed out by looking up related facts that hold of the matched individual, e.g., has-location(y, z ) for organizations or has-title(x, w) for persons . These facts are literally just read out of the database .</Paragraph>
      <Paragraph position="2"> Finalization routines are then invoked on the near-final template to fill the ORG TYPE slot and to normalize the geographical fills of the ORG_LOCALE and ORG COUNTRY slots.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="150" end_page="153" type="metho">
    <SectionTitle>
PERFORMANCE ANALYSIS
</SectionTitle>
    <Paragraph position="0"> We participated in two officially-scored tasks at MUC-6, named entities and template elements. As noted above, we put roughly a staff week into customizing the system to handle the scenario templates task, bu t chose not to participate in the evaluation because another staff week or so would have been required to achieve performance on a par with other parts of the system.</Paragraph>
    <Paragraph position="1"> Overall performance On the named entity task, we obtained an official P&amp;R score of 91 .2, where the separate precision and recal l scores were both officially reported to be 91 . The overall score is remarkably close to our performance on th e  dry-run test set which served as our principal source of data for NE training and self-evaluation . To be precise, our final dry-run P&amp;R score prior to the MUC-6 evaluation run was 91.8, a scant o.6 higher than the officially measured evaluation score. The fact that the score dropped so little is encouraging to us .</Paragraph>
    <Paragraph position="2"> On the template elements task, our initial TE score was P&amp;R=78.5, and our revised official score was 77.3. Once again, this performance is encouragingly close to Alembic's performance on our final self-evaluatio n using the formal training data set. By the non-revised metric, we achieved a performance of P&amp;R=80.2 on the training data, with an overall drop of 1.7 points ofP&amp;R between training and official test. Table 2 summarizes slot-by-slot differences between our training and test performance on the TE task. The major differences we noted between training and testing performance lie in the organization alias and descriptor slots, and in th e person name and alias fields ; we have marked these discrepancies with asterisks (*) and will address their caus e later on in this document.</Paragraph>
    <Paragraph position="3"> Walkthrough errors In order to quantify Alembic's performance on the walkthrough message, we compiled an exhaustive analysis of our errors on this text. This was a difficult message for us, and we scored substantially less well on this message than in average, especially on the NE task. To our surprise, the majority of our errors was du e not to knowledge gaps in the named entity tagger, so much as to odd bugs and incompletely thought-ou t design decisions. Table 3 summarizes these errors . Entries marked with daggers (t) correspond to knowledg e gaps, e.g., missing or incorrect rules ; the other entries are coding or design problems. Fully half the problem instances were due to string matching issues for short name forms . For example, by not treating embedde d mid-word hyphens as white space, we failed to process &amp;quot;McCann&amp;quot; as a shortened form of &amp;quot;McCann-Erickson&amp;quot;. Turning now to the template element task, we note that the largest fraction of TE errors are repercussions of errors committed while performing the NE task. In particular, the people-name companies that wer e treated as persons during NE processing in turn led to spurious person templates . The magnitude of the NE error is mitigated by the fact that identical mentions of incorrectly-tagged named entities are merged for th e sake of TE template generation, and thus do not all individually spawn spurious templates . Among the TE errors not arising from NE processing errors, note in particular those that occurred on the most difficult slots , ORG DESCRIPTOR, ORG_LOCALE, and ORG_COUNTRY. These are all due in this case to missing locational an d 15 2 Nature of the problem Problem cases Resulting errors Naive string matching &amp;quot;McCann&amp;quot; vs. &amp;quot;McCann-Erickson &amp;quot; 9 inc type &amp;quot;John Dooner&amp;quot; vs. &amp;quot;John J. Dooner Jr.&amp;quot; 1 inc text, 1 spu type/text Missing phraser patterns t &amp;quot;Fallon McElligott&amp;quot; -- treated as person 1 inc type &amp;quot;Taster's Choice&amp;quot; -- naive `s prorpssing 1 spu type/text Poor phraser patterns t &amp;quot;Coca-Cola Classic&amp;quot; -- zealous org rule 1 spu type/text Missing date patterns t &amp;quot; the 21st century&amp;quot; 1 mis type/text Ambiguous name &amp;quot;New York Times&amp;quot; -- not an org 1 spu type/text Misc. embarrassing bugs &amp;quot;James&amp;quot; in &lt;HL&gt; -- treated as location 1 inc. type &amp;quot;J. Walter Thompson&amp;quot; -- punctoker lost &amp;quot;J .&amp;quot; 1 inc type, 1 inc text Table 3 : NE errors on walkthrough message Nature of the problem Problem cases Resulting errors Repercussions of NE errors &amp;quot;Walter Thompson&amp;quot;, &amp;quot;Fallon McElligott&amp;quot;, 3 spu pers. 1 mis org. alias &amp;quot;McCann&amp;quot; all treated as person &amp;quot;John Dooner&amp;quot; treated as two persons 2 spu pets, 1 mis pers. alias &amp;quot;Coca-Cola Classic&amp;quot; treated as organization 1 inc org . namett, 1 inc. org alias Missing org. NP patternst &amp;quot;the agency with billings of $400 million &amp;quot; 2 mis org. descriptor &amp;quot;one of the largest world-wide agencies &amp;quot; Missing location patterns t &amp;quot;Coke's headquarters in Atlanta&amp;quot; 1 mis org. locale/country Org. type determination t &amp;quot;Creative Artists Agency&amp;quot; -- treated as gov. 1 inc. org type Acronym resolution snafu &amp;quot;CAA&amp;quot; vs. &amp;quot;Creative Artists Agency&amp;quot; 1 inc org. namett, 1 mis org. 
alias Enthusiastic scorer mapping &amp;quot;New York Times &amp;quot; (spurious entity) mapped to 1 inc. org. namett &amp;quot;Fallon McElligott&amp;quot; (inc. entity type) Table 4 : TE errors on walkthrough message organizational NP phraser rules, which is consistent with trends we noted during training. These observation s are summarized in Table 4. Once again, single daggers (t) mark errors attributable to knowledge gaps . Note also that because of lenient template mappings on the part of the scorer, a number of errors that might intuitively have been whole organization template errors turned out only to manifest themselves a s organization name errors . These cases are marked with double daggers (tt) .</Paragraph>
    <Paragraph position="4"> Other trends In addition to this analysis of the single walkthrough message, we opened up some to% of the test data t o inspection, and performed a rough trend estimation . In particular, we wanted to explain the slot-by-slot discrepancies we had noted between our training and test performance (cf. Table 2) . We found a combination of knowledge gaps, known design problems that had been left unaddressed by the time of the evaluation run, and some truly embarrassing bugs .</Paragraph>
    <Paragraph position="5"> To dispense quickly with the latter, we admit to failing to filter lines beginning with &amp;quot;0&amp;quot; in the body of the message. This was due to the fact that earlier training data had these lines marked with &lt;5&gt; tags, whereas the official test data did not. These 0-lines were so rare in the formal training data that we had simply no t noticed our omission. This primarily affected our NUMEX and TIMEX precision in the named entity task.  In the template element task, our largest performance drop was on the ORG_DESRIPTOR SIOt, where we los t ii points of recall and 13 points of precision. This can be largely attributed to knowledge gaps in our phrase r rules for organizational noun phrases. In particular, we were missing a large number of head nouns that would have been required to identify relevant descriptor NPS .</Paragraph>
    <Paragraph position="6"> On the PERSON_NAME and PERSON_ALIAS Slots, we respectively found a 7 point drop in precision and an 8 point drop in recall . These were due to the same problem, a known flaw that had been left unaddressed i n the days leading to the evaluation . In particular, we had failed to merge short name forms that appeared i n headlines with the longer forms that appeared in the body of the message. For example, &amp;quot;James&amp;quot; in the walkthrough headline field should have been merged with &amp;quot;Robert L. James&amp;quot; in the body of the message . Because these short forms went unmerged, they in turn spawned incorrect person templates, hence the dro p in PERSON and PERSON_NAME precision. For the same reason, the templates that were generated for the long forms of these names ended up without their alias slot filled, accounting for the drop in PERSON_ALIAS recall. A similar problem held true for the ORG ALIAS slot. In this case, we failed both to extract organizatio n templates from the headline fields, or merge short name forms from headlines with longer forms in the tex t bodies. We were aware of these &amp;quot;mis-features&amp;quot; in our handling of person and organization name templates , but had left these problems unaddressed since they seemed to have only minimal impact on the forma l training data. Errare humanum est.</Paragraph>
  </Section>
  <Section position="8" start_page="153" end_page="154" type="metho">
    <SectionTitle>
POST-HOC EXPERIMENTS
</SectionTitle>
    <Paragraph position="0"> With this error analysis behind us, we pursued a number of post-hoc experiments . Most interesting among them was a simple attempt at improving recall on organization names. Indeed, Alembic has only a short list of known organizations--less than a half-dozen in total . Virtually all of the organizations found by Alembic are recognized from first principles . We decided to compare this strategy with one that uses a large lexicon of organization names .</Paragraph>
    <Paragraph position="1"> All of the Muc-6 NE training set was used to generate a list of i,8o8 distinct organization name strings . This could certainly be larger, but seemed a reasonable size . Nonetheless, this lexicon by itself got less than half of the organizations in the official named-entity test corpus : organization recall was 45 and precision 91. Another interesting question is how much an organization lexicon might have helped had it been addedto our rule-based phrasing algorithm, not simply used by itself. This configuration actually decreased ou r performance slightly (F-score down by 0 .5 points of P&amp;R), trading a slight increase in organization recall for a larger decrease in precision . The biggest problem here is due to overgeneration (up from 4% to 6%), and partial matches such as the following, Kraft &lt;ENAMEX&gt;General Foods&lt;/ENAMEX &gt; First &lt; E NA M EX&gt; Fidel ity&lt; /ENAMEX &gt; where &amp;quot;General Foods&amp;quot; and &amp;quot;Fidelity&amp;quot; were in the training corpus for the organization lexicon, but the longer names above were not .</Paragraph>
    <Paragraph position="2"> Admittedly, the way we integrated the organization lexicon into Alembic was relatively naYve, thereb y leading to some of these silly precision errors . We believe that if we more intelligently took advantage of thi s knowledge source, we could reduce the additional precision errors almost entirely . In addition, we were disappointed by the fact that our exhaustive compilation only produced somewhat less than 2,000 organization names, and only led to a piffling improvement in recall . Perhaps had we made use of larger name lists , we might have obtained better recall improvements--a case in point is the gargantuan Dunn &amp; Bradstree t listing exploited by Knight-Ridder for their named entities tagger [ 41 . Note, however, that all but a few of the organizations that were found in both the training name list and the test data were found by Alembic from first principles anyway. We may thus tentatively conclude that short of being as encyclopedic as the D&amp;B listing, a larger, better-integrated organization lexicon may have provided no more than a limite d improvement in F-score. To further improve our organization tagging, it appears that we will simply have t o expend more energy writing named entity phraser rules .</Paragraph>
  </Section>
class="xml-element"></Paper>