MITRE:
DESCRIPTION OF THE ALEMBIC SYSTEM USED FOR MUC-6
John Aberdeen, John Burger, David Day, Lynette Hirschman, Patricia Robinson, and Marc Vilain
The MITRE Corporation
202 Burlington Rd.
Bedford, MA 01730
{aberdeen, john, clay, lynette, parann, mbv}@mitre.org
As with several other veteran MUC participants, MITRE's Alembic system has undergone a major transformation in the past two years. The genesis of this transformation occurred during a dinner conversation at the last MUC conference, MUC-5. At that time, several of us reluctantly admitted that our major impediment towards improved performance was reliance on then-standard linguistic models of syntax. We knew we would need an alternative to traditional linguistic grammars, even to the somewhat non-traditional categorial pseudo-parser we had in place at the time. The problem was, which alternative?

The answer came in the form of rule sequences, an approach Eric Brill originally laid out in his work on part-of-speech tagging [5, 7]. Rule sequences now underlie all the major processing steps in Alembic: part-of-speech tagging, syntactic analysis, inference, and even some of the set-fill processing in the Template Element task (TE). We have found this approach to provide almost an embarrassment of advantages, speed and accuracy being the most externally visible benefits. In addition, most of our rule sequence processors are trainable, typically from small samples. The rules acquired in this way also have the characteristic that they allow one to readily mix hand-crafted and machine-learned elements. We have exploited this opportunity to apply both machine-learned and hand-crafted rules extensively, choosing in some instances to run sequences that were primarily machine-learned, and in other cases to run sequences that were entirely crafted by hand.
ALEMBIC'S OVERALL ARCHITECTURE
For all the changes that the system has undergone, the coarse architecture of the MUC-6 version of Alembic is remarkably close to that of its predecessors. As illustrated in Fig. 1, below, processing is still divided into three main steps: a UNIX- and C-based preprocess, a Lisp-based syntactic analysis, and a Lisp-based inference phase. Beyond these coarse-grain similarities, the system diverges significantly from earlier incarnations. We replaced our categorial grammar pseudo-parser, as suggested above. We also redesigned the preprocess from the ground up. Only the inferential back end of the system is largely unchanged.
The internal module-by-module architecture of the current Alembic is illustrated in Fig. 2, below. The central innovation in the system is its approach to syntactic analysis, which is now performed through a sequence of phrase-finding rules that are processed by a simple interpreter. The interpreter has somewhat less recognition power than a finite-state machine, and operates by successively relabeling the input according to the rule actions—more on this below. In support of the syntactic phrase finder, or phraser as we call it, the input text must be tagged for part-of-speech. This part-of-speech tagging is the principal role of the UNIX preprocess, and it is itself supported by a number of pretaggers (e.g., for labeling dates and title words) and zoners (e.g., for word tokenization, sentence boundary determination, and headline segmentation).
The phrases that are parsed by the phraser are subsequently mapped to facts in the inferential database, a mapping mediated by a simple semantic interpreter. We then exploit inference to instantiate domain constraints and resolve restricted classes of coreference. The inference system also supports equality reasoning by congruence closure, and this equality machinery is in turn exploited to perform TE-specific processing, in particular acronym and alias merging. Finally, the template generation module forms the final TE and ST output by a roughly one-to-one mapping from facts in the inferential database to templates.

Figure 1: Coarse-grained system architecture. [Diagram: a UNIX preprocess feeds Lisp-based syntactic analysis and Lisp-based tractable inference, producing NE markup, TE templates, and ST templates.]

Figure 2: Processing modules in Alembic. [Diagram: the UNIX preprocess (zoning, pre-tagging, part-of-speech tagging), the phraser with its NE, TE, CorpNP, and ST rule sequences, TE processing (acronyms, aliases), template printing, and a supporting gazetteer.]
THE PREPROCESSORS
As noted above, the UNIX-based portion of the system is primarily responsible for part-of-speech tagging. Prior to the part-of-speech tagger, however, a text to be processed by Alembic passes through several preprocess stages; each preprocessor "enriches" the text by means of SGML tags. All of these preprocess components are implemented with LEX (the lexical analyzer generator) and are very fast.
An initial preprocessor, the punctoker, makes decisions about word boundaries that are not coincident with whitespace. It tokenizes abbreviations (e.g., "Dr."), and decides when sequences of punctuation and alphabetic characters are to be broken up into several lexemes (e.g., "Singapore-based"). The punctoker wraps <LEX> tags around text where necessary to indicate its decisions, as in the following:

    Singapore<LEX pos=JJ>-based</LEX>

As this example suggests, in some cases the punctoker guides subsequent part-of-speech tagging by adding a part-of-speech attribute to the <LEX> tags that it emits.
The parasenter zones text for paragraph and sentence boundaries, the former being unnecessary for MUC-6. The sentence tagging component is both simple and conservative. If any end-of-sentence punctuation has not been "explained" by the punctoker as part of a lexeme, as in abbreviations, it is taken to indicate a sentence boundary. The parasenter is also intended to filter lines in the text body that begin with "@" (but see our error analysis below). A separate hl-tagger is invoked to zone sentence-like constructs in the headline field.
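To make the splitting heuristic concrete, the following is a minimal Common Lisp sketch of the parasenter's conservative strategy. It is illustrative only: the actual parasenter is LEX-based and reads the punctoker's SGML markup rather than a token list, and the abbreviation list below is a stand-in for the punctoker's decisions.

    ;;; Illustrative sketch of the conservative sentence-splitting heuristic.
    ;;; A period is taken as a sentence boundary unless it has already been
    ;;; "explained" as part of an abbreviation token.

    (defparameter *abbreviations* '("Dr." "Mr." "Ms." "Sgt." "Corp." "Inc.")
      "Stand-in for the punctoker's abbreviation decisions.")

    (defun sentence-boundary-p (token)
      "True if TOKEN ends in sentence-final punctuation not absorbed by an abbreviation."
      (and (plusp (length token))
           (member (char token (1- (length token))) '(#\. #\? #\!))
           (not (member token *abbreviations* :test #'string=))))

    (defun split-sentences (tokens)
      "Group a list of word tokens into sentences."
      (let ((sentences '()) (current '()))
        (dolist (tok tokens (nreverse (if current
                                          (cons (nreverse current) sentences)
                                          sentences)))
          (push tok current)
          (when (sentence-boundary-p tok)
            (push (nreverse current) sentences)
            (setf current nil)))))

    ;; (split-sentences '("Mr." "James" "is" "stepping" "down." "He" "will" "retire."))
    ;; => (("Mr." "James" "is" "stepping" "down.") ("He" "will" "retire."))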
The preprocess includes specialized phrase taggers. The title-tagger marks personal titles, making distinctions along the lines drawn by the NE and ST tasks. Included are personal honorifics (Dr., Ms.); military and religious titles (Vicar, Sgt.); corporate posts (CEO, chairman); and "profession" words (analyst, spokesperson).

The date-tagger identifies TIMEX phrases. It uses a LEX-based scanner as a front-end for tokenizing and typing its input; a pattern-matching engine then finds the actual date phrases. The date-tagger is fast, since the pattern matcher itself is highly optimized, and since the LEX-based front-end does not actually tokenize the input or fire the pattern-matcher unless it suspects that a date phrase may be occurring in the text.
Both the date- and title-tagger can tag a phrase as either (1) a single SGML element, or (2) individual lexemes, with special attributes that indicate the beginning and end of the matrix phrase, as in

    <LEX post=start>chief</LEX> <LEX post=mid>executive</LEX> <LEX post=end>officer</LEX>

We adopted this LEX-based phrase encoding so as to simplify (and speed up) the input scanner of the part-of-speech tagger. In addition, a phrase's LEX tags can encode parts-of-speech to help guide the p-o-s tagger.
THE PART-OF-SPEECH TAGGER
Our part-of-speech tagger is the component of our MUC-6 system that is closest to Brill's original work on rule sequences [5, 6, 7]. The tagger is in fact a re-implementation of Brill's widely-disseminated system, with various speed and maintainability improvements. Most of the rule sequences that drive the tagger were automatically learned from hand-tagged corpora, rather than hand-crafted by human engineers. However, the rules are in a human-understandable form, and thus hand-crafted rules can easily be combined with automatically learned rules, a property which we exploited in the MUC-6 version of Alembic.
The tagger operates on text that has been lexicalized through pre-processing. The following, for example, is how a sample walkthrough sentence is passed to the part-of-speech tagger. Note how punctuation has been tokenized, and "Mr." has been identified as a title and assigned the part-of-speech NNP (proper noun).

    <S>Even so<lex>,</lex> <LEX pos=NNP ttl=WHOLE>Mr.</LEX> Dooner is on the prowl for more creative talent and is interested in acquiring a hot agency<lex>.</lex></S>
The part-of-speech tagger first assigns initial parts-of-speech by consulting a large lexicon. The lexicon maps words to their most frequently occurring tag in the training corpus. Words that do not appear in the lexicon are assigned a default tag of NN (common noun) or NNP (proper noun), depending on capitalization. For unknown words, after a default tag is assigned, lexical rules apply to improve the initial guess. These rules operate principally by inspecting the morphology of words. For example, an early rule in the lexical rule sequence retags unknown words ending in "ly" with the RB tag (adverb). In the sentence above, the only unknown word ("Dooner") is not subject to retagging by lexical rules; in fact, the default NNP tag assignment is correct. Lexical rules play a larger role when the default tagging lexicon is less complete than our own, which we generated from the whole Brown Corpus plus 3 million words of Wall Street Journal text. For example, in our experiments tagging Spanish texts (for which we had much smaller lexica), we have found that lexical rules play a larger role (this can also be partially attributed to the more inflected nature of Spanish).
After the initial tagging, contextual rules apply in an attempt to further fix errors in the tagging. These rules reassign a word's tag on the basis of neighboring words and their tags. In this sentence, "more" changes from its initial JJR (comparative adjective) to RBR (comparative adverb). Note that this change is arguably erroneous, depending on how one reads the scope of "more". This tagging is changed by the following rule, which roughly reads: change word W from JJR to RBR if the word to W's immediate right is tagged JJ.

    JJR RBR nexttag JJ
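For concreteness, here is a minimal Common Lisp sketch of how such a contextual rule sequence is applied. The toy lexicon, the single nexttag trigger, and the tag-sentence driver are our own simplifications; the actual tagger supports Brill's full range of triggers (previous tags, surrounding words, and so on) and applies these rules after the lexical rules described above.

    ;;; Sketch of applying Brill-style contextual rules.  A rule is a list
    ;;; (FROM TO NEXTTAG TRIGGER), read "retag FROM as TO when the next word
    ;;; is tagged TRIGGER".

    (defparameter *toy-lexicon*
      '(("more" . :JJR) ("creative" . :JJ) ("talent" . :NN))
      "Maps a word to its most frequent training-corpus tag.")

    (defun initial-tag (word)
      "Lexicon lookup, defaulting to :NNP or :NN by capitalization."
      (or (cdr (assoc word *toy-lexicon* :test #'string-equal))
          (if (upper-case-p (char word 0)) :NNP :NN)))

    (defun apply-contextual-rule (rule tagged)
      "Destructively retag TAGGED, a list of (word . tag) pairs, with one rule."
      (destructuring-bind (from to trigger next) rule
        (declare (ignore trigger))
        (loop for (pair . rest) on tagged
              when (and (eq (cdr pair) from)
                        rest
                        (eq (cdr (first rest)) next))
                do (setf (cdr pair) to)))
      tagged)

    (defun tag-sentence (words rules)
      (let ((tagged (mapcar (lambda (w) (cons w (initial-tag w))) words)))
        (dolist (rule rules tagged)
          (apply-contextual-rule rule tagged))))

    ;; The rule from the text: change JJR to RBR when the next tag is JJ.
    ;; (tag-sentence '("more" "creative" "talent") '((:JJR :RBR nexttag :JJ)))
    ;; => (("more" . :RBR) ("creative" . :JJ) ("talent" . :NN))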
Table 1, below, illustrates the tagging process. The sample sentence is on the first line; its initial lexicon-based tagging is on the second line; the third line shows the final tagging produced by the contextual rules.
    Even so , Mr. Dooner is  on the prowl for more creative talent and is  interested in acquiring a  hot agency
    rb   rb , nnp NNP    vbz in dt  nn    in  JJR  jj       nn     cc  vbz jj         in vbg       dt jj  nn
    rb   rb , nnp NNP    vbz in dt  nn    in  RBR  jj       nn     cc  vbz jj         in vbg       dt jj  nn

    Table 1: Tagging a text with the lexicon (line 2) and contextual rules (line 3). Note the default
    lexicon assignment of nnp to "Dooner" and the rule-based correction of "more".

In controlled experiments, we measured the tagger's accuracy on Wall Street Journal text at 95.1%, based on a training set of 140,000 words. The production version of the tagger, which we used for MUC-6, relies on the learned rules from Brill's release 1.1 (148 lexical rules, 283 contextual rules), for which Brill has measured accuracies that are 2-3 percentage points higher than in our own smaller-scale experiments. For MUC-6, we combined these rules with 19 hand-crafted contextual rules that correct residual tagging errors that were especially detrimental to our NE performance. Tagger throughput is around 3000 words/sec.
THE PHRASER
The Alembic phrase finder, or phraser for short, performs the bulk of the system's syntactic analysis. As noted above, it has somewhat less recognition power than a finite-state machine, and as such shares many characteristics of pattern-matching systems, such as CIRCUS [10] or FASTUS [2]. Where it differs from these systems is in being driven by rule sequences. We have experimented with both automatically-learned rule sequences and hand-crafted ones. In the system we fielded for MUC-6, we ended up running entirely with hand-crafted sequences, as they outperformed the automatically-learned rules.
How the phraser works
The phraser process operates in several steps. First, a set of initial phrasing functions is applied to all of the sentences to be analyzed. These functions are responsible for seeding the sentences with likely candidate phrases of various kinds. This seeding process is driven by word lists, part-of-speech information, and pre-taggings provided by the preprocessors. Initial phrasing produces a number of phrase structures, many of which have the initial null labeling (none), while some have been assigned an initial label (e.g., num). The following example shows a sample sentence from the walkthrough message after initial phrasing.

    Yesterday, <none>McCann</none> made official what had been widely anticipated: <ttl>Mr.</ttl>
    <none>James</none>, <num>57</num> years old, is stepping down as <post>chief executive officer</post> on
    <date>July 1</date> and will retire as <post>chairman</post> at the end of the year.

The post, ttl, and date phrases were identified by the title and date taggers. Mr. James' num-tagged age is identified on the basis of part-of-speech information, as is the organization name "McCann".
Once the initial phrasing has taken place, the phraser proceeds with phrase identification proper. This is driven by a sequence of phrase-finding rules. Each rule in the sequence is applied in turn against all of the phrases in all the sentences under analysis. If the antecedents of the rule are satisfied by a phrase, then the action indicated by the rule is executed immediately. The action can either change the label of the satisfying phrase, grow its boundaries, or create new phrases. After the nth rule is applied in this way against every phrase in all the sentences, the (n+1)th rule is applied in the same way, and so on until all rules have been applied, at which point the phraser is done.
It is important to note that the search strategy in the phraser differs significantly from that in standard parsers. In standard parsing, one searches for any and all rules whose antecedents might apply given the state of the parser's chart: all these rules become candidates for application, and indeed they all are applied (modulo higher-order search control). In our phraser, only the current rule in a rule sequence is tested: the rule is applied wherever this test succeeds, and the rule is never revisited at any subsequent stage of processing. After the final rule of a sequence is run, no further processing occurs.
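The control regime is simple enough to sketch directly. The following Common Lisp fragment illustrates that regime only: phrase and rule structures are reduced to the bare minimum, the only action shown is relabeling, and the structure and function names below are our own, standing in for Alembic's richer rule language.

    ;;; Sketch of the phraser's control strategy: each rule in the sequence is
    ;;; applied exactly once, in order, to every current phrase, and its action
    ;;; is executed immediately; rules are never revisited.

    (defstruct phrase label start end)   ; label is a keyword such as :none, :ttl, :person

    (defstruct rule test action)         ; test: phrase x phrase-list -> boolean
                                         ; action: phrase -> side effect

    (defun run-phraser (rules phrases)
      (dolist (rule rules phrases)
        (dolist (ph phrases)
          (when (funcall (rule-test rule) ph phrases)
            (funcall (rule-action rule) ph)))))

    ;; Example: the person rule discussed below -- relabel a :none phrase as
    ;; :person when a :ttl phrase ends immediately to its left.
    (defun ttl-to-the-left-p (ph all)
      (and (eq (phrase-label ph) :none)
           (find-if (lambda (other)
                      (and (eq (phrase-label other) :ttl)
                           (= (phrase-end other) (phrase-start ph))))
                    all)))

    (defparameter *person-rule*
      (make-rule :test #'ttl-to-the-left-p
                 :action (lambda (ph) (setf (phrase-label ph) :person))))

    ;; (let ((phrases (list (make-phrase :label :ttl  :start 10 :end 11)     ; "Mr."
    ;;                      (make-phrase :label :none :start 11 :end 12))))  ; "James"
    ;;   (run-phraser (list *person-rule*) phrases))
    ;; ;; the "James" phrase is now labeled :person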
The language of phraser rules
The language of the phraser rules is as simple as their control strategy. Rules can test lexemes to the left and right of the phrase, or they can look at the lexemes in the phrase. Tests in turn can be part-of-speech queries, literal lexeme matches, tests for the presence of neighboring phrases, or the application of predicates that are evaluated by invoking a Lisp procedure. There are several reasons for keeping this rule language simple. In the case of hand-crafted rules, it facilitates the process of designing a rule sequence. In the case of machine-learned rules, it restricts the size of the search space on each epoch of the learning regimen, thus making it tractable. In either case, the overall processing power derives as much from the fact that the rules are sequenced, and feed each other in turn, as it does from the expressiveness of the rule language.
To make this clearer, consider a simple named entity rule that is applied to identify persons.

    (def-phraser
      label        none
      left-1       phrase ttl
      label-action person)
This rule changes the label of a phrase from none to person if the phrase is bordered on its left by a ttl phrase. On the sample sentence, this rule causes the following relabeling of the phrase around "James".

    Yesterday, <none>McCann</none> made official what had been widely anticipated: <ttl>Mr.</ttl>
    <person>James</person>, <num>57</num> years old, is stepping down as <post>chief executive
    officer</post> on <date>July 1</date> and will retire as <post>chairman</post> at the end of the year.
Once this rule has run, the labelings it instantiates become available as input to subsequent rules in the sequence, e.g., rules that attach the title to the person in "Mr. James", that attach the age apposition, and so forth. Phraser rules do make mistakes, but as with other sequence-based processors, the phraser applies later rules in a sequence to patch errors made by earlier rules. In the walkthrough message, for example, "Amarati & Puris" is identified as an organization, which ultimately leads to an incorrect org tag for "Martin Puris", since this person's name shares a common substring with the organization name. However, rules that find personal names occur later in our named entities sequence than those which find organizations, thus allowing the phraser to correctly relabel "Martin Puris" as a person on the basis of a test for common first names.
Rule sequences for MUC-6
For MUC-6, Alembic relies on three sequences of phraser rules, divided roughly into rules for generating NE-specific phrases, those for finding TE-related phrases, and those for ST phrases. The division is only rough, as the NE sequence yields some number of TE-related phrases as a side-effect of searching for named entities. To illustrate this process, consider the following walkthrough sentence, as tagged by the NE rule sequence.

    But the bragging rights to <org>Coke</org>'s ubiquitous advertising belongs to <org>Creative Artists
    Agency</org>, the big <location>Hollywood</location> talent agency.
The org label on "Creative Artists Agency" was set by a predicate that tests for org keywords (like "Agency"). "Coke" was found to be an org elsewhere in the document, and the label was then percolated. Finally, the location label on "Hollywood" was set by a predicate that inspects the tried-and-not-so-true TIPSTER gazetteer.

What is important to note about these NE phraser rules is that they do not rely on a large database of known company names. Instead, the rules are designed to recognize organization names in almost complete absence of any information about particular organization names (with the sole exception of a few acronyms such as IBM, GM, etc.). This seems to augur well for the ability to apply Alembic to different application tasks.
Proceeding beyond named entities, the phraser next applies its TE-specific rule sequence. This sequence performs manipulations that resemble NP parsing, e.g., attaching locational modifiers. In addition, a subsequence of TE rules concentrates on recognizing potential organization descriptors. These rules generate so-called corpnp phrases, that is, noun phrases that are headed by an organizational common noun (such as "agency", "maker", and of course "company"). The rules expand these heads leftwards to incorporate lexemes that satisfy a set of part-of-speech constraints. One such phrase, for example, is in the sample sentence above.

    But the bragging rights to <org>Coke</org>'s ubiquitous advertising belongs to <org><org>Creative Artists
    Agency</org>, <corpnp>the big <location>Hollywood</location> talent agency</corpnp></org>
After corpnp phrases have been marked, another collection of TE rules associates these phrases with neighboring org phrases. In this case such a phrase is found two places to the left (on the other side of a comma), so a new org phrase is created which spans both the original org phrase and its corpnp neighbor. See above.
Note that these rule sequences encode a semantic grammar. Organizationally-headed noun phrases are labeled as org, regardless of whether they are simple proper names or more complex constituents such as the org-corpnp apposition above. This semantic characteristic of the phraser grammar is clearer still with the ST rules. These rules are responsible for finding phrases denoting events relevant to the MUC-6 scenario templates.¹ For the succession scenario, this consists of a few key phrase types, most salient among them: job (a post at an org), job-in and job-out (fully specified successions), and post-in and post-out (partially specified successions).

    *** TOTAL SLOT SCORES ***

    SLOT          POS   ACT |  COR  PAR  INC |  SPU  MIS  NON | REC  PRE  UND  OVG  ERR  SUB
    <enamex>      938   991 |  881    0    0 |  110   57    0 |  94   89    6   11   16    0
      type        938   991 |  775    0  106 |  110   57    0 |  83   78    6   11   26   12
      text        938   991 |  840    0   41 |  110   57    0 |  90   85    6   11   20    5
      subtotal   1876  1982 | 1615    0  147 |  220  114    0 |  86   81    6   11   23    8
    ALL OBJECTS  2286  2406 | 1993    0  163 |  250  130    0 |  87   83    6   10   21    8
    MATCHED      2156  2156 | 1993    0  163 |    0    0    0 |  92   92    0    0    8    8

    F-MEASURES:   P&R 84.95    2P&R 83.67    P&2R 86.28

    *** TASK SUBCATEGORIZATION SCORES ***

    SLOT            POS   ACT |  COR  PAR  INC |  SPU  MIS  NON | REC  PRE  UND  OVG  ERR  SUB
    Enamex:
      organization  454   493 |  392    0   28 |   73   34    0 |  86   80    7   15   26    7
      person        373   364 |  292    0   60 |   12   21    0 |  78   80    6    3   24   17
      location      111   134 |   91    0   18 |   25    2    0 |  82   68    2   19   33   16

    Figure 4: Performance of rules learned for the ENAMEX portion of the NE task (unofficial score).
The following example shows the ST phrases parsed out of a key sentence from the walkthrough message.

    Yesterday, <person>McCann</person> made official what had been widely anticipated:
    <post-out>
      <person>
        <person><ttl>Mr.</ttl> <person>James</person></person>,
        <age><num>57</num> years old</age>
      </person>,
      is stepping down as
      <post>chief executive officer</post>
    </post-out> [...]

The post-out phrase encodes the resignation of a person in a post. Note that in the formal evaluation we failed to find a more correct job-out phrase, which should have included "McCann". This happened because we did not successfully identify "McCann" as an organization, thus precluding the formation of the job-out phrase.
Learning Phrase Rules
We have applied the same general error-reduction learning approach that Brill designed for generating part-of-speech rules to the problem of learning phraser rules in support of the NE task. The official version of Alembic for MUC-6 did not use any of the rule sequences generated by this phrase rule learner, but we have since generated unofficial scores. In these runs we used phrase rules that had been learned for the ENAMEX expressions only—we still used the hand-coded pre-processors and phraser rules for recognizing TIMEX and NUMEX phrases. Our performance on this task is shown in Fig. 4, above. These rules yield six fewer points of P&R than the hand-coded ENAMEX rules—still an impressive result for machine-learned rules. Interestingly, the bulk of the additional error in the machine-learned rules is not with the "hard" organization names, but
with person names (ΔR=-16, ΔP=-14) and locations (ΔR=-12, ΔP=-18).
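The learner itself follows the usual transformation-based recipe, which we can sketch as follows. Everything here is schematic and of our own naming: candidate generation and rule application are passed in as functions, and the greedy, net-error-reduction selection loop is the only part shown.

    ;;; Sketch of Brill-style error-driven rule-sequence learning: at each
    ;;; epoch, score every candidate rule by the net number of training
    ;;; errors it removes, append the best one to the sequence, apply it,
    ;;; and repeat until no rule yields sufficient improvement.

    (defun count-errors (labels gold)
      "Number of positions where LABELS disagrees with GOLD."
      (count nil (mapcar #'eq labels gold)))

    (defun learn-rule-sequence (labels gold candidates apply-rule &key (min-gain 1))
      "Greedily build a rule sequence.  CANDIDATES is a list of rules;
    APPLY-RULE maps (rule labels) to a new label list."
      (let ((sequence '()))
        (loop
          (let ((baseline (count-errors labels gold))
                (best nil)
                (best-gain 0))
            (dolist (rule candidates)
              (let ((gain (- baseline
                             (count-errors (funcall apply-rule rule labels) gold))))
                (when (> gain best-gain)
                  (setf best rule best-gain gain))))
            (when (< best-gain min-gain)
              (return (nreverse sequence)))
            (push best sequence)
            (setf labels (funcall apply-rule best labels))))))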
¹ We put about one staff week of work into the ST task, during which we experienced steep hill-climbing on the training set. Nevertheless, we felt that the maturity of our ST processing was sufficiently questionable to preclude participating in the official evaluation. The present discussion should be taken in this light, i.e., with the understanding that it was not officially evaluated at MUC-6.
PHRASE INTERPRETATION AND INFERENCE
The inference component is central to all processing beyond phrase identification. It has three roles.
• As a representational substrate, it records propositions encoding the semantics of parsed phrases;
• As an equational system, it allows initially distinct semantic individuals to be equated to each other, and allows propositions about these individuals to be merged through congruence closure;
• As a limited inference system, it allows domain-specific and general constraints to be instantiated through carefully controlled forward-chaining.
Phrase Interpretation
Facts enter the propositional database as the result of phrase interpretation. The phrase interpreter is controlled by a small set of Lisp interpretation functions, roughly one for each phrase type. Base-level phrases, i.e. phrases with no embedded phrases, are mapped to unary interpretations. The phrase

    <person>Robert L. James</person>,

for example, is mapped to the following propositional fact. Note the pers-01 term in this proposition: it designates the semantic individual denoted by the phrase, and is generated in the process of interpretation.

    person(pers-01)
Complex phrases, those with embedded phrases, are typically interpreted as conjuncts of simpler interpretations (the exception being NP coordination, as in "chairman and chief executive"). Consider the phrase "Mr. James, 57 years old", which is parsed by the phraser as follows. Note in particular that the overall person-age apposition is itself parsed as a person phrase.

    <person><person>Mr. James</person>, <age><num>57</num> years old</age></person>

The treatment of age appositions is compositional, as is the case for the interpretation of all but a few complex phrases. Once again, the embedded base-level phrase ends up interpreted as a unary person fact. The semantic account of the overall apposition ends up as a has-age relation modifying pers-02, the semantic individual for the embedded person phrase. This proposition designates the semantic relationship between a person and that person's age. More precisely, the following facts are added to the inferential database.

    person(pers-02)
    has-age((pers-02, age-03) ha-04)
    age(age-03)
What appears to be a spare argument to the has-age predicate above is the event individual for the predicate. Such arguments denote events themselves (in this case the event of being a particular number of years old), as opposed to the individuals participating in the events (the individual and his or her age). This treatment is similar to the partial Davidsonian analysis of events due to Hobbs [8]. Note that event individuals are by definition only associated with relations, not unary predicates.
As a point of clarification, note that the inference system does not encode facts at the predicate calculus level so much as at the interpretation level made popular in such systems as the SRI Core Language Engine [1, 3]. In other words, the representation is actually a structured attribute-value graph such as the following, which encodes the age apposition above.

    [[head :person]
     [proxy pers-02]
     [modifiers [[head has-age]
                 [proxy ha-04]
                 [arguments (pers-02 [[head age]
                                      [proxy age-03]])]]]]
The first two fields correspond to the embedded phrase: the head field is a semantic sort, and the proxy field holds the designator for the semantic individual denoted by the phrase. The interpretation encoding the overall apposition ends up in the modifiers slot, an approach adopted from the standard linguistic account of phrase modification. Inference in Alembic is actually performed directly on interpretation structures, and there is no need for a separate translation from interpretations to more traditional-looking propositions. The propositional notation is more perspicuous to the reader, and we have adopted it here.

Finally, note that the phrase interpretation machinery maintains pointers between semantic individuals and the surface strings from which they originated. One of the fortunate—if unexpected—consequences of the phraser's semantic grammar is that maintaining these cross-references is considerably simpler than was the case in our more linguistically-inspired categorial parser of old. Except for the ORG_DESCRIPTOR slot, the fill rules line up more readily with semantic notions than with syntactic considerations, e.g., maximal projections.
Equality reasoning
Much of the strength of this inferential framework derives from its equality mechanism. This subcomponent allows one to make two semantic individuals co-designating, i.e., to "equate" them. Facts that formerly held of only one individual are then copied to its co-designating siblings. This in turn enables inference that may have previously been inhibited because the necessary antecedents were distributed over (what were then) distinct individuals.
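A minimal sketch of this mechanism, with our own simplified fact representation, appears below. Co-designation classes are tracked union-find style, and the facts known of a class are pooled under its representative; the congruence-closure aspect (re-deriving facts whose arguments have been equated) is not shown.

    ;;; Sketch of the equality mechanism: equating two individuals pools the
    ;;; facts known of either, so that each fact now holds of both.

    (defvar *class-of* (make-hash-table :test #'eq)
      "Maps an individual to the representative of its co-designation class.")
    (defvar *facts-of* (make-hash-table :test #'eq)
      "Maps a class representative to the facts holding of the class.")

    (defun canonical (ind)
      (let ((parent (gethash ind *class-of* ind)))
        (if (eq parent ind) ind (canonical parent))))

    (defun assert-fact (ind fact)
      (push fact (gethash (canonical ind) *facts-of*)))

    (defun facts-about (ind)
      (gethash (canonical ind) *facts-of*))

    (defun equate (a b)
      "Make A and B co-designating; facts about either now hold of both."
      (let ((ra (canonical a)) (rb (canonical b)))
        (unless (eq ra rb)
          (setf (gethash rb *class-of*) ra)
          (setf (gethash ra *facts-of*)
                (append (gethash rb *facts-of*) (gethash ra *facts-of*)))
          (remhash rb *facts-of*))))

    ;; (assert-fact 'org-06 '(has-location org-06 geo-07))
    ;; (equate 'org-05 'org-06)
    ;; (facts-about 'org-05)   ; => ((HAS-LOCATION ORG-06 GEO-07))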
This equality machinery is exploited at many levels in processing semantic and domain constraints. One of the clearest such uses is in enforcing the semantics of coreference, either definite reference or appositional coreference. Take for example the following phrase from the walkthrough message, which we show here as parsed by the phraser.

    <org>
      <org>Creative Artists Agency</org>,
      <corpnp>the big <location>Hollywood</location> talent agency</corpnp>
    </org>
In propositional terms, the embedded organization is interpreted as

    organization(org-05)                              "Creative Artists Agency"

The appositive noun phrase is interpreted as

    organization(org-06)                              "the ... agency"
    geo-region(geo-07)                                "Hollywood"
    has-location((org-06, geo-07) hasloc-08)          locational pre-modifier
Pressing on, the phraser parses the overall org-corpnp apposition as an overarching org. To interpret the apposition, the interpreter also adds the following proposition to the database.

    entity-np-app((org-05, org-06) e-n-a-09)

This ultimately causes org-05 and org-06 to become co-designating through the equality system, and the following fact appears in the inferential database.

    has-location((org-05, geo-07) hasloc-10)          i.e., Creative Artists Agency is located in Hollywood
This propagation of facts from one individual to its co-designating siblings is the heart of our coreference mechanism. Its repercussions are particularly critical to the subsequent stage of template generation. By propagating facts in this way, we can dramatically simplify the process of collating information into templates, since all the information relevant to, say, an individual company will have been attached to that company by equality reasoning. We will touch on this point again below.
Inference
The final role of the Alembic inference component is to derive new facts through the application of carefully-controlled forward inference. As was the case with our MUC-5 system, the present Alembic allows only limited forward inference. Though the full details of this inference process are of legitimate interest in their own right, we will only note some highlights here. To begin with, the tractability of forward inference in this framework is guaranteed just in case the inference axioms meet a certain syntactic requirement. To date, all the rules we have written for even complex domains, such as the joint-venture task in MUC-5, have met this criterion. Aside from this theoretical bound on computation, we have found in practice that the inference system is remarkably fast, with semantic interpretation, equality reasoning, rule application, and all other aspects of inference together accounting for 6-7% of all processing time in Alembic. Details are in [11].
We exploited inference rules in several primary ways for the TE and ST tasks. The first class of inference rules enforces so-called terminological reasoning, local inference that composes the meaning of words. One such rule distributes the meaning of certain adjectives such as "retired" across coordinated titles, as in "retired chairman and CEO". The phrase parses as follows; note the embedded post semantic phrase types.

    <post>
      <post-qual>retired</post-qual>
      <post>
        <post>chairman</post>
        and
        <post>CEO</post>
      </post>
    </post>
This particular example propositionalizes as follows, where the group construct denotes a plural individual in Landman's sense (roughly a set [9]).

    title(ttl-11)                        "chairman"
    title(ttl-12)                        "CEO"
    group((ttl-11, ttl-12) grp-13)       "chairman and CEO"
    retired-ttl(grp-13)                  "retired"
To shift the scope of "retired" from the overall coordination to individual titles, the following rule applies.

    retired-ttl(ttl) ← group((ttl, x) grp) + retired-ttl(grp)

This rule yields the fact retired-ttl(ttl-11), and a similar rule yields retired-ttl(ttl-12). Other like rules distribute coordinated titles across the title-holder, and so forth. The fact that multiple rules are needed to distribute adjectives over coordinated noun phrases is one of the drawbacks of semantic grammars. On the other hand, these rules simplify semantic characteristics of distributivity by deferring questions of scope and non-compositionality to a later stage, i.e., inference. Interpretation procedures can thus remain compositional, which makes them substantially simpler to write. Additionally, these kinds of distribution rules further contribute to collating facts relevant to template generation onto the individuals for which these facts hold.
Of greater importance, however, is the fact that inference rules are the mechanism by which we instantiate domain-specific constraints and set up the particulars required for scenario-level templates. Some of this information is again gained by fairly straightforward compositional means. For example, the phrase "Walter Rawleigh Jr., retired chairman of Mobil Corp." yields a succession template through the mediation of one inference rule. The phrase is compositionally interpreted as

    organization(org-14)
    title(ttl-15)
    retired-ttl(ttl-15)
    job((ttl-15, org-14) job-16)
    person(pers-17)
    holds-job((pers-17, job-16) h-j-18)

The rule that maps these propositions to a succession event is

    job-out(pers, ttl, org) ← holds-job((pers, job) x) + job((ttl, org) job) + retired-ttl(ttl)

When applied to the above propositions this rule yields job-out((pers-17, ttl-15, org-14) j-o-19). This fact is all that is required for the template generator to subsequently issue the appropriate succession event templates.
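The effect of this rule can be sketched directly, though only as a hand-coded instance: Alembic's actual forward chainer is a general rule engine operating on interpretation structures, whereas the fragment below simply scans a list of simplified facts for the rule's three antecedents.

    ;;; Hand-coded sketch of the job-out rule: find a HOLDS-JOB antecedent,
    ;;; the JOB fact it points at, and a RETIRED-TTL fact on the same title,
    ;;; then derive a JOB-OUT fact.  Facts here are flat lists, e.g.
    ;;; (JOB ttl org job-individual); the event individuals of HOLDS-JOB and
    ;;; of the derived JOB-OUT facts are omitted.

    (defun lookup (pred facts)
      (remove-if-not (lambda (f) (eq (first f) pred)) facts))

    (defun derive-job-outs (facts)
      (loop for (nil pers job) in (lookup 'holds-job facts)
            append (loop for (nil ttl org job2) in (lookup 'job facts)
                         when (and (eq job job2)
                                   (member (list 'retired-ttl ttl) facts :test #'equal))
                           collect (list 'job-out pers ttl org))))

    ;; (derive-job-outs '((organization org-14) (title ttl-15) (retired-ttl ttl-15)
    ;;                    (job ttl-15 org-14 job-16) (person pers-17)
    ;;                    (holds-job pers-17 job-16)))
    ;; => ((JOB-OUT PERS-17 TTL-15 ORG-14))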
The most interesting form of domain inference is not compositional, of course, but based on discourse considerations. In the present ST task, for example, succession events are not always fully fleshed out, but depend for their complete interpretation on information provided earlier in the discourse. In the walkthrough message, this kind of contextualized interpretation is required early on:

    Yesterday, McCann made official what had been widely anticipated: Mr. James, 57 years old, is stepping down as
    chief executive officer on July 1 [...]. He will be succeeded by Mr. Dooner, 45.

ST-level phrasing and interpretation of this passage produces two relevant facts, a job-out for the first clause, and a successor for the second. Note that although successor is normally a two-place relation, its valence here is one by virtue of the phraser not finding a named person as a subject to the clause.
    person(pers-20)                              "Mr. James"
    title(ttl-21)                                "chief executive officer"
    organization(org-22)                         "McCann"
    job-out((pers-20, ttl-21, org-22) j-o-23)
    person(pers-24)                              "Mr. Dooner"
    successor((pers-24) succ-25)
One approach to contextualizing the succession clause in this text would require first resolving the pronominal subject "He" to "Mr. James" and then exploiting any job change facts that held of this antecedent. An equally effective, and simpler, approach is to ignore the pronoun and reason directly from the successor fact to any contextualizing job-out fact. The rule that accomplishes this is

    job-in(pers-a, ttl, org) ← successor((pers-a) succ) + job-out-in-context?((succ, job-out) x-1) +
                               job-out((pers-b, ttl, org) x-2)
The mysterious-looking job-out-in-context? predicate implements a simple discourse model: it is true just in case its second argument is the most immediate job-out fact in the context of its first argument. Context-encoding facts are not explicitly present in the database, as their numbers would be legion, but are instantiated "on demand" when a rule attempts to match such a fact. Note that what counts as a most immediate contextualized fact is itself determined by a separate search procedure. The simple-minded strategy we adopted here is to proceed backwards from the current sentence, searching for the most recent sentence containing an occurrence of a job-out phrase, and returning the semantic individual it denotes. In this example, the job-out-in-context? predicate succeeds by binding the job-out variable to j-o-23, with the rule overall yielding a job-in fact.
    job-in((pers-24, ttl-21, org-22) j-i-26)

As with job-out, this fact is mapped directly by the template generator to an incoming succession template. Note that this process of combining a job-out and a successor fact effectively achieves what is often accomplished in data extraction systems by template merging. However, since the merging takes place in the inferential database, with propagation of relevant facts as a side-effect, the process is greatly simplified and obviates the need for explicit template comparisons.
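The backward search behind job-out-in-context? can be sketched as follows; the vector-of-fact-lists representation of the discourse is our own stand-in for Alembic's sentence records.

    ;;; Sketch of the discourse search: walk back from the sentence bearing
    ;;; the SUCCESSOR fact until a sentence containing a JOB-OUT fact is
    ;;; found, and return that fact's event individual.

    (defun job-out-in-context (sentences current-index)
      "SENTENCES is a vector; each element is a list of facts such as (JOB-OUT j-o-23).
    Return the event individual of the most recent JOB-OUT fact at or before
    CURRENT-INDEX, or NIL if none is found."
      (loop for i from current-index downto 0
            for hit = (find 'job-out (aref sentences i) :key #'first)
            when hit return (second hit)))

    ;; (job-out-in-context
    ;;   (vector '((job-out j-o-23))      ; "... Mr. James ... is stepping down ..."
    ;;           '((successor succ-25)))  ; "He will be succeeded by Mr. Dooner, 45."
    ;;   1)
    ;; => J-O-23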
One final wrinkle must be noted. Inference is generally a non-deterministic search problem, with no firm guarantee as to whether facts will be derived in the same chronological order as the sentences which underlie the facts. Rules that require contextualized facts, however, crucially rely on the chronological order of the sentences underlying these facts. We have thus pulled these rules out of the main fray of inference, and apply them only after all other forward chaining is complete. In fact, these rules are organized as a Brill-style rule sequence, where each rule is allowed to run to quiescence at only one point in the sequence before the next rule becomes active. It is our hypothesis, though, that all domain inference rules can be so organized, not just contextualized ones, and that by this organizational scheme, rules can be automatically learned from example.
TASK-SPECIFIC PROCESSING AND TEMPLATE GENERATION
Aside from phrasing and inference, a relatively small—but critical—amount of processing is required to perform the MUC-6 named entities and template generation tasks.

For NE, little is actually required beyond careful document management and printing routines. TIMEX forms, introduced by the preprocessing date-tagger, must be preserved through the rest of the processing pipe. Named entity phrases that encode an intermediate stage of NE processing must be suppressed at printout. Examples such as these abound, but by and large, Alembic's NE output is simply a direct readout of the result of running the named entity phraser rules.
Name coreference in TE
Of all three tasks, TE is actually the one that explicitly requires the most idiosyncratic processing beyond phrasing and inference. Specifically, this task is the crucible for name coreference, i.e., the process by which short name forms are reconciled with their originating long forms.

This merging process takes place by iterating over the semantic individuals in the inferential database that are of a namable sort (e.g., person or organization). Every such pair of same-sort individuals is compared to determine whether one is a derivative form of the other. Several tests are possible.
• Identity. If the forms are identical strings, as in the frequently repeated "Dooner" or "McCann" in the walkthrough article, then they are merged.
• Shortening. If one form is a shortening of the other, as in "Mr. James" for "Robert L. James", then the short form is merged as an alias of the longer.
• Acronyms. If one form appears to be an acronym for the other, as in "CAA" and "Creative Artists Agency", then the forms should be merged, with the acronym designated as an alias.
Merging two forms takes place in several steps. First, their respective semantic individuals are equated in the inferential database. This allows facts associated with one form to be propagated to the other. In this way, the nationality information in "Japanese giant NEC" becomes associated with the canonical name "Nippon Electric Corp." As a second step, names that are designated as aliases are recorded as such.
Template generation
We mentioned above that the inferential architecture that we have adopted here is in good part motivated by a desire to simplify template generation. Indeed, template generation consists of nothing more than reading out the relevant propositions from the database.

For the TE task, this means identifying person and organization individuals by matching on person(x) or organization(y). For each so-matching semantic individual, we create a skeletal template. The skeleton is initialized with name and alias strings that were attached to the semantic individuals during name merging. It is further fleshed out by looking up related facts that hold of the matched individual, e.g., has-location(y, z) for organizations or has-title(x, w) for persons. These facts are literally just read out of the database. Finalization routines are then invoked on the near-final template to fill the ORG_TYPE slot and to normalize the geographical fills of the ORG_LOCALE and ORG_COUNTRY slots.
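A readout of this kind can be sketched directly. The slot keywords and the HAS-NAME and HAS-ALIAS predicates below are illustrative stand-ins (the real skeleton is initialized from the strings recorded during name merging), and the flat fact lists are the simplified representation used in the sketches above.

    ;;; Sketch of TE template generation as a database readout: match each
    ;;; ORGANIZATION(x) fact, then fill a skeletal template from the related
    ;;; facts that hold of x.

    (defun find-fact (pred ind facts)
      "Return the first fact of the form (PRED IND value), or NIL."
      (find-if (lambda (f) (and (eq (first f) pred) (eq (second f) ind))) facts))

    (defun generate-org-templates (facts)
      "Return one template, as a property list, per ORGANIZATION individual."
      (loop for fact in facts
            when (eq (first fact) 'organization)
              collect (let ((org (second fact)))
                        (list :org-name   (third (find-fact 'has-name org facts))
                              :org-alias  (third (find-fact 'has-alias org facts))
                              :org-locale (third (find-fact 'has-location org facts))))))

    ;; (generate-org-templates
    ;;  '((organization org-05) (has-name org-05 "Creative Artists Agency")
    ;;    (has-alias org-05 "CAA") (has-location org-05 "Hollywood")))
    ;; => ((:ORG-NAME "Creative Artists Agency" :ORG-ALIAS "CAA" :ORG-LOCALE "Hollywood"))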
PERFORMANCE ANALYSIS
We participated in two officially-scored tasks at MUC-6, named entities and template elements. As noted above, we put roughly a staff week into customizing the system to handle the scenario templates task, but chose not to participate in the evaluation because another staff week or so would have been required to achieve performance on a par with other parts of the system.
Overall performance
On the named entity task, we obtained an official P&R score of 91.2, where the separate precision and recall scores were both officially reported to be 91. The overall score is remarkably close to our performance on the dry-run test set, which served as our principal source of data for NE training and self-evaluation. To be precise, our final dry-run P&R score prior to the MUC-6 evaluation run was 91.8, a scant 0.6 higher than the officially measured evaluation score. The fact that the score dropped so little is encouraging to us.

On the template elements task, our initial TE score was P&R=78.5, and our revised official score was 77.3. Once again, this performance is encouragingly close to Alembic's performance on our final self-evaluation using the formal training data set. By the non-revised metric, we achieved a performance of P&R=80.2 on the training data, with an overall drop of 1.7 points of P&R between training and official test. Table 2 summarizes slot-by-slot differences between our training and test performance on the TE task. The major differences we noted between training and testing performance lie in the organization alias and descriptor slots, and in the person name and alias fields; we have marked these discrepancies with asterisks (*) and will address their cause later on in this document.

                            Formal training        Official test
                            Recall   Precision     Recall   Precision     Recall Δ   Precision Δ
    organization              86        92           84        92           -2           —
      name                    76        78           77        80           +1          +2
      alias                   60        79           56        78           -4*         -1
      descriptor              27        62           16        49          -11*        -13*
      type                    83        90           81        89           -2          -1
      locale                  46        87           43        87           -3           —
      country                 47        88           45        93           -2          +5
    person                    94        92           95        87           +1          -5*
      name                    93        91           93        84            —          -7*
      alias                   94        95           86        96           -8*         +1
      title                   95        96           94        93           -1          -3
    All Objects               75        86           73        85           -2          -1
    F-Measure, unrevised        80.21                  78.52                    -1.69

    Table 2: Slot-by-slot performance differences, TE task (unrevised scores).
Walkthrough errors
In order to quantify Alembic's performance on the walkthrough message, we compiled an exhaustive analysis of our errors on this text. This was a difficult message for us, and we scored substantially less well on it than on average, especially on the NE task. To our surprise, the majority of our errors were due not so much to knowledge gaps in the named entity tagger as to odd bugs and incompletely thought-out design decisions. Table 3 summarizes these errors. Entries marked with daggers (†) correspond to knowledge gaps, e.g., missing or incorrect rules; the other entries are coding or design problems. Fully half the problem instances were due to string matching issues for short name forms. For example, by not treating embedded mid-word hyphens as white space, we failed to process "McCann" as a shortened form of "McCann-Erickson".
Turning now to the template element task, we note that the largest fraction of TE errors are repercussions of errors committed while performing the NE task. In particular, the people-name companies that were treated as persons during NE processing in turn led to spurious person templates. The magnitude of the NE error is mitigated by the fact that identical mentions of incorrectly-tagged named entities are merged for the sake of TE template generation, and thus do not all individually spawn spurious templates. Among the TE errors not arising from NE processing errors, note in particular those that occurred on the most difficult slots, ORG_DESCRIPTOR, ORG_LOCALE, and ORG_COUNTRY. These are all due in this case to missing locational and organizational NP phraser rules, which is consistent with trends we noted during training. These observations are summarized in Table 4. Once again, single daggers (†) mark errors attributable to knowledge gaps.

Note also that because of lenient template mappings on the part of the scorer, a number of errors that might intuitively have been whole organization template errors turned out only to manifest themselves as organization name errors. These cases are marked with double daggers (††).

    Nature of the problem         Problem cases                                      Resulting errors
    Naive string matching         "McCann" vs. "McCann-Erickson"                     9 inc type
                                  "John Dooner" vs. "John J. Dooner Jr."             1 inc text, 1 spu type/text
    Missing phraser patterns †    "Fallon McElligott" — treated as person            1 inc type
                                  "Taster's Choice" — naive 's processing            1 spu type/text
    Poor phraser patterns †       "Coca-Cola Classic" — zealous org rule             1 spu type/text
    Missing date patterns †       "the 21st century"                                 1 mis type/text
    Ambiguous name                "New York Times" — not an org                      1 spu type/text
    Misc. embarrassing bugs       "James" in <HL> — treated as location              1 inc type
                                  "J. Walter Thompson" — punctoker lost "J."         1 inc type, 1 inc text

    Table 3: NE errors on walkthrough message.

    Nature of the problem         Problem cases                                      Resulting errors
    Repercussions of NE errors    "Walter Thompson", "Fallon McElligott", "McCann"   3 spu pers, 1 mis org alias
                                    all treated as persons
                                  "John Dooner" treated as two persons               2 spu pers, 1 mis pers alias
                                  "Coca-Cola Classic" treated as organization        1 inc org name††, 1 inc org alias
    Missing org. NP patterns †    "the agency with billings of $400 million",        2 mis org descriptor
                                    "one of the largest world-wide agencies"
    Missing location patterns †   "Coke's headquarters in Atlanta"                   1 mis org locale/country
    Org. type determination †     "Creative Artists Agency" — treated as gov.        1 inc org type
    Acronym resolution snafu      "CAA" vs. "Creative Artists Agency"                1 inc org name††, 1 mis org alias
    Enthusiastic scorer mapping   "New York Times" (spurious entity) mapped to       1 inc org name††
                                    "Fallon McElligott" (inc. entity type)

    Table 4: TE errors on walkthrough message.
Other trends
In addition to this analysis of the single walkthrough message, we opened up some 10% of the test data to inspection, and performed a rough trend estimation. In particular, we wanted to explain the slot-by-slot discrepancies we had noted between our training and test performance (cf. Table 2). We found a combination of knowledge gaps, known design problems that had been left unaddressed by the time of the evaluation run, and some truly embarrassing bugs.

To dispense quickly with the latter, we admit to failing to filter lines beginning with "@" in the body of the message. This was due to the fact that earlier training data had these lines marked with <S> tags, whereas the official test data did not. These "@" lines were so rare in the formal training data that we had simply not noticed our omission. This primarily affected our NUMEX and TIMEX precision in the named entity task.
In the template element task, our largest performance drop was on the ORG_DESCRIPTOR slot, where we lost 11 points of recall and 13 points of precision. This can be largely attributed to knowledge gaps in our phraser rules for organizational noun phrases. In particular, we were missing a large number of head nouns that would have been required to identify relevant descriptor NPs.
On the PERSON_NAME and PERSON_ALIAS slots, we respectively found a 7-point drop in precision and an 8-point drop in recall. These were due to the same problem, a known flaw that had been left unaddressed in the days leading up to the evaluation. In particular, we had failed to merge short name forms that appeared in headlines with the longer forms that appeared in the body of the message. For example, "James" in the walkthrough headline field should have been merged with "Robert L. James" in the body of the message. Because these short forms went unmerged, they in turn spawned incorrect person templates, hence the drop in PERSON and PERSON_NAME precision. For the same reason, the templates that were generated for the long forms of these names ended up without their alias slot filled, accounting for the drop in PERSON_ALIAS recall.
A similar problem held true for the ORG_ALIAS slot. In this case, we failed both to extract organization templates from the headline fields and to merge short name forms from headlines with longer forms in the text bodies. We were aware of these "mis-features" in our handling of person and organization name templates, but had left these problems unaddressed since they seemed to have only minimal impact on the formal training data. Errare humanum est.
POST-HOC EXPERIMENTS
With this error analysis behind us, we pursued a number of post-hoc experiments. Most interesting among them was a simple attempt at improving recall on organization names. Indeed, Alembic has only a short list of known organizations—less than a half-dozen in total. Virtually all of the organizations found by Alembic are recognized from first principles. We decided to compare this strategy with one that uses a large lexicon of organization names.

All of the MUC-6 NE training set was used to generate a list of 1,808 distinct organization name strings. This could certainly be larger, but seemed a reasonable size. Nonetheless, this lexicon by itself got less than half of the organizations in the official named-entity test corpus: organization recall was 45 and precision 91.
Another interesting question is how much an organization lexicon might have helped had it been added to our rule-based phrasing algorithm, not simply used by itself. This configuration actually decreased our performance slightly (F-score down by 0.5 points of P&R), trading a slight increase in organization recall for a larger decrease in precision. The biggest problem here is due to overgeneration (up from 4% to 6%), and partial matches such as the following,

    Kraft <ENAMEX>General Foods</ENAMEX>
    First <ENAMEX>Fidelity</ENAMEX>

where "General Foods" and "Fidelity" were in the training corpus for the organization lexicon, but the longer names above were not.
Admittedly, the way we integrated the organization lexicon into Alembic was relatively naïve, thereby leading to some of these silly precision errors. We believe that if we took advantage of this knowledge source more intelligently, we could eliminate the additional precision errors almost entirely. In addition, we were disappointed by the fact that our exhaustive compilation only produced somewhat less than 2,000 organization names, and only led to a piffling improvement in recall. Perhaps had we made use of larger name lists, we might have obtained better recall improvements—a case in point is the gargantuan Dun & Bradstreet listing exploited by Knight-Ridder for their named entities tagger [4]. Note, however, that all but a few of the organizations that were found in both the training name list and the test data were found by Alembic from first principles anyway. We may thus tentatively conclude that, short of being as encyclopedic as the D&B listing, a larger, better-integrated organization lexicon may have provided no more than a limited improvement in F-score. To further improve our organization tagging, it appears that we will simply have to expend more energy writing named entity phraser rules.
CONCLUDING THOUGHTS
All in all, we are quite pleased with the performance of Alembic in MUC-6. While we regret not participating in the ST task, we do believe that the framework was up to it, especially in light of our TE scores. There were many lessons learned, and there will continue to be, as we further analyze our results and make improvements to the system. Several points stand out. We had hoped to avoid full NP parsing, but the definition of the ORG_DESCRIPTOR slot clearly requires it, and we will need to return to larger-scale parsing strategies in the future. We had hoped to include more machine-learned phraser rules, and as the rule learner matures, we almost certainly will.

One thing is clear to us, however, and that is that rule sequences are an extremely powerful tool. They were easy to hand-craft and adapt to the MUC-6 requirements. They run fast. And they work well.
References
[1] Alshawi, H. & van Eijck, J. "Logical forms in the Core Language Engine". In Proceedings of the 27th Meeting of the Association for Computational Linguistics (ACL-89). Vancouver, B.C., 1989.
[2] Appelt, D., Hobbs, J., Bear, J., Israel, D., & Tyson, M. "FASTUS: A finite-state processor for information extraction from real-world text". In Proceedings of the 13th Intl. Joint Conference on Artificial Intelligence (IJCAI-93). Chambéry, 1993. 1172-1178.
[3] Bayer, S. & Vilain, M. "The relation-based knowledge representation of King Kong". SIGART Bulletin, 2(3), 15-21.
[4] Borkovsky, A. "Knight-Ridder's value adding name finder: a variation on the theme of FASTUS". In Proceedings of the 6th Message Understanding Conference (MUC-6). Columbia, Md., 1995.
[5] Brill, E. "Some advances in rule-based part of speech tagging". In Proceedings of the 12th National Conference on Artificial Intelligence (AAAI-94). Seattle, 1994.
[6] Brill, E. A corpus-based approach to language learning. Doctoral dissertation, Univ. of Pennsylvania, 1993.
[7] Brill, E. "A simple rule-based part of speech tagger". In Proceedings of the 3rd Conference on Applied Natural Language Processing (Applied ACL-92). Trento, 1992.
[8] Hobbs, J. "Ontological promiscuity". In Proceedings of the 23rd Meeting of the Association for Computational Linguistics (ACL-85). Chicago, Ill., 1985.
[9] Landman, F. "Groups". Linguistics and Philosophy, 12(5), 559-605 and 12(6), 723-744.
[10] Lehnert, W., McCarthy, J., Soderland, S., Riloff, E., Cardie, C., Peterson, J., Feng, F., Dolan, C., & Goldman, S. "University of Massachusetts/Hughes: Description of the CIRCUS system as used for MUC-5". In Proceedings of the 5th Message Understanding Conference (MUC-5). Baltimore, Md., 1993.
[11] Vilain, M. "Semantic inference in natural language: validating a tractable approach". In Proceedings of the 14th Intl. Joint Conference on Artificial Intelligence (IJCAI-95). Montreal, 1995. 1346-1351.
