WAYNE STATE UNIVERSITY: DESCRIPTION OF
THE UNO NATURAL LANGUAGE PROCESSING
SYSTEM AS USED FOR MUC-6
Lucja Iwanska, Mary Croll, Taewan Yoon, Maria Adams
Wayne State University
Department of Computer Science
Detroit, Michigan 48202, USA
lucja@cs.wayne.edu
ph. (313) 577-1667, fax: (313) 577-6868
INTRODUCTION
Our Research Hypothesis
The UNO natural language processing (NLP) system implements a Boolean algebra computational model of
natural language [Iwanska, 1992] [Iwanska, 1993] [Iwanska, 1994] [Iwanska, 1996b] and reflects our research
hypothesis that natural language is a very expressive, yet computationally tractable, knowledge representation
and reasoning system with its own representational and inferential machinery. One of our goals is to
demonstrate experimentally that an NLP system that closely parallels the representational and inferential
characteristics of natural language allows one to achieve in-depth processing (e.g., querying knowledge bases
automatically created from texts, or entity classification) with close-to-real-time, high-recall, high-precision
performance.
Why and What for MUC-6
MUC-6 Tasks Not Inconsistent with Our Goals
We view the MUC-6 tasks as not inconsistent with our goal of providing further experimental evidence in
support of our research hypothesis. We have also been conducting experiments in non-committal processing
with controlled resource allocation, such as exploiting local interpretations, taking advantage of local
contexts, and performing expensive computations such as disambiguation only when needed. We have also
been experimenting with flexible processing, such as undoing various decisions and processing related tasks
independently. The MUC-6 NE task is a good test for such experiments.
Coreference: Most Interesting Task
We were particularly interested in the coreference task, and largely considered the NE task a preparation
stage for it, because:
1. Reference resolution is critical for processing virtually any text in any domain;
2. We believe that the mechanism for resolving references to named entities is basically the same regardless
of the entities' types. Identifying certain named entities, and therefore resolving references to them,
may be facilitated by extra constraints on the beginning or the end of the name.
For example, identifying personal and company names is facilitated by the fact that personal names
often start with a title such as "Ms." and company names often end with a corporate extension such
as "Ltd".
3. We are in the process of demonstrating that natural language is a means of acquiring knowledge, rather
than something that requires knowledge to be acquired before it can be processed. Our very encouraging
preliminary results show that definite anaphora facilitates the acquisition of taxonomic knowledge
[Iwanska, 1996a], including the "type-subtype" and "part-subpart" relations. These results demonstrate that
approaches to definite anaphora resolution that rely primarily on the availability of such knowledge, a
standard approach in all NLP systems we are aware of, are misguided.
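Item 2 above notes that edge cues, a title at the start of a personal name or a corporate extension at its end, make some named entities easier to identify. A minimal sketch of that cue check, assuming small hypothetical cue lists (only "Ms." and "Ltd" come from the text) and a name already split into tokens; it is an illustration, not the UNO classifier:

```python
from __future__ import annotations

# Hypothetical cue lists; the text mentions only "Ms." and "Ltd" as examples.
PERSONAL_TITLES = {"Mr.", "Ms.", "Mrs.", "Dr."}
CORPORATE_EXTENSIONS = {"Ltd", "Ltd.", "Inc.", "Corp.", "Co."}


def guess_name_type(tokens: list[str]) -> str | None:
    """Guess an entity type from cues at the beginning or end of a name."""
    if not tokens:
        return None
    if tokens[0] in PERSONAL_TITLES:
        return "PERSON"
    if tokens[-1] in CORPORATE_EXTENSIONS:
        return "ORGANIZATION"
    return None  # no boundary cue: other evidence is needed


print(guess_name_type(["Ms.", "Washington"]))  # PERSON
print(guess_name_type(["News", "Corp."]))      # ORGANIZATION
print(guess_name_type(["McCann-Erickson"]))    # None
```

Names without such cues, like "McCann-Erickson", fall through to other evidence, which is why the text describes these constraints as facilitating identification rather than deciding it.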
Successful Techniques for Developing and Testing In-Depth Processing of a Large Number of Texts
Finally, by participating in MUC-6 we hoped to share our experience and learn from the other participants
about successful techniques for developing and testing in-depth processing of a large number of texts (more
than two thousand) and for developing large knowledge bases.
UNO MODEL OF NATURAL LANGUAGE
Closely Mimics Representation and Reasoning Inherent in Natural Language
The UNO model closely mimics the representation and reasoning inherent in natural language because its
translation procedure, the data structures of its representation, and its inference engine are motivated by the
semantics and pragmatics of natural language. One payoff of this close correspondence to natural language
is the capability of automatically creating and querying knowledge bases from textual documents¹.
Sentences asserting properties of (sets of) individuals and sentences describing subtyping relations, including
extensional type definitions, as well as intensional definitions of concepts, such as
1. "John is a neither good nor hard-working nurse", "Not many students did well"
2. "Dobermans, poodles and terriers are dogs"
3. "Elephant: a huge, thick-skinned mammal with very few hairs, with a long, flexible snout, and two
ivory tusks"
are uniformly represented by the following type equations:
$$ \mathit{type} == \{ \langle P_1, TP_1 \rangle, \langle P_2, TP_2 \rangle, \ldots, \langle P_n, TP_n \rangle \} $$
Their left-hand side, "type", is the representation of a noun phrase, the largest type, or the name of a
concept; the right-hand side is a set of two-element pairs $\langle P, TP \rangle$: a property value $P$, and $TP$, a set of
$\langle t, p \rangle$ elements representing the fact that the property value $P$ holds at a temporal interval $t$ with the probability $p$.
Individual property values, temporal intervals and probabilities are represented by sets $[a_1, a_2, \ldots, a_n]$
whose elements $a_i$ are terms: record-like structures consisting of a head, which is a type symbol, and a body,
which is a list of attribute-value pairs attribute => value. For example, the complex noun phrase "sick, very
unhappy woman" is represented by
[ woman( health => sick,
         happy  => (not happy)(degree => very) ) ]
whose only term has the type "woman" as its head and two attributes: "health" with the value sick, and "happy"
with the value (not happy)(degree => very). Semantically, this data structure represents the subset of
individuals of the type "woman" for which the attribute "health" has the value "sick" and the function
"happy" has the value "very unhappy".
Various relations, such as entailment or its dual, subsumption (set inclusion), and negation (set complement),
are computed automatically, and the results are guaranteed to be both intuitively and formally correct. The
UNO NLP system uses such type equations bi-directionally: for answering questions about the properties
of a particular individual, and for matching particular properties against the properties of individuals in its
knowledge base.
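The term structure and the subsumption relation described above can be sketched as follows. This is a simplified illustration, not UNO's algorithm: it assumes a flat encoding where two terms are comparable only when their heads are identical, it compares attribute values by equality (so it ignores, e.g., that "very unhappy" is subsumed by "unhappy"), and the interval and probability symbols `t1`, `p1` in the type equation are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A record-like term: a head (type symbol) and attribute => value pairs."""
    head: str
    attrs: tuple = ()  # tuple of (attribute, value) pairs

    def get(self, name: str):
        return dict(self.attrs).get(name)


def subsumes(general: Term, specific: Term) -> bool:
    """True if `general` denotes a superset of `specific`.

    Simplification: heads must be identical and every attribute of `general`
    must occur in `specific` with an equal value. A term with fewer
    constraints denotes a larger set, so subsumption is set inclusion.
    """
    return general.head == specific.head and all(
        specific.get(name) == value for name, value in general.attrs
    )


# "sick, very unhappy woman": one term whose "happy" value is itself a term.
very_unhappy = Term("not happy", (("degree", "very"),))
sick_very_unhappy_woman = Term("woman", (("health", "sick"), ("happy", very_unhappy)))
sick_woman = Term("woman", (("health", "sick"),))

# A type equation type == {<P, TP>, ...}; TP holds hypothetical <t, p> pairs.
type_equations = {
    "individual_1": {(frozenset({sick_very_unhappy_woman}), frozenset({("t1", "p1")}))},
}

print(subsumes(sick_woman, sick_very_unhappy_woman))  # True
print(subsumes(sick_very_unhappy_woman, sick_woman))  # False
```

Because the relation is computed from the structure of the two terms, nothing about "sick woman" versus "sick, very unhappy woman" has to be stored in advance.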
¹ Needless to say, limited by the subset of English covered by the model.
Solid Computational and Mathematical Framework Consistent with Linguistic Theories
The UNO representation offers a solid computational and mathematical framework consistent with linguistic
theories. Updating the knowledge base and automated inferencing are both done by the same semantically clean
computational mechanism: performing Boolean operations on the representation of the natural language input
and the representation of previously obtained information stored in the knowledge base.
The underlying knowledge representation formalisms, computable Boolean algebras with set-theoretic and
interval-theoretic semantics, allow one to capture the semantics of different syntactic categories, because sets
and intervals underlie the semantics of many syntactic categories: common nouns, intransitive verbs, and
adjectives can be thought of as denoting sets of persons or objects that possess the properties denoted by
the words; adjectives and adverbs are functions mapping sets of objects into sets of objects; determiners are
functions mapping sets of objects into sets of sets of objects; and the denotations of proper nouns are sets
of sets of objects [Dowty et al., 1981] [Barwise and Cooper, 1981] [Keenan and Faltz, 1985] [Hamm, 1989].
The same machinery is used as a metalanguage for describing and propagating arbitrary Boolean constraints,
including dictionary entries describing morphological and grammatical constraints. The data structures
are partially specified, negative constraints are propagated via unification, and the nonmonotonicity
of negation [Pereira, 1987] is not problematic.
The UNO model shares many computational characteristics with the programming language LIFE
[Ait-Kaci, Meyer and Van Roy, 1993] because the efficiently computable calculus that underlies LIFE
[Ait-Kaci, 1986] is extended in the UNO model to handle negation and generalized quantifiers.
Some of the linguistic theories that the UNO model encompasses and (or) extends include insights of the
Montague semantics of natural language [Montague, 1973] [Dowty et al., 1981], the Boolean algebra mathematical
models of [Keenan and Faltz, 1985], the theory of generalized quantifiers [Barwise and Cooper, 1981]
[Hamm, 1989], the theory of the pragmatic inference of quantity-based implicature of [Horn, 1972] [Horn, 1989],
and the theory of negation in natural language of [Horn, 1989].
Recent Extension: Temporal Reasoning
We have recently extended the UNO model to incorporate temporal reasoning. This extension demonstrates
that important inferences about time can be captured by a general representation and reasoning mechanism
inherent in natural language, many aspects of which are closely mimicked by the UNO model. We have
shown that computing logical, context-independent inferences and non-monotonic, context-dependent inferences
for temporal and non-temporal objects is almost exactly analogous.
Theory and Practice
We are committed to addressing research problems that hold strong promise for facilitating the processing of
natural language input. For example, we decided to extend the UNO model of natural language to handle
temporal information because virtually all real-life tasks involve some aspect of time. There is a
large body of existing work on morphologically marked time and aspect, but we decided against handling
this type of temporal information because it requires high-recall, high-precision sentential-level parsing,
a task that no NLP system, including ours, can perform well. Instead, we decided to address the temporal
information carried by explicit temporal expressions, because we can recover such expressions extremely
reliably via local parsing.
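To make "local parsing" of explicit temporal expressions concrete, here is a minimal pattern-based sketch. The patterns are hypothetical and far narrower than UNO's grammar; they only show that such expressions can be found without a sentential-level parse. The `<TE>` and `</TE>` marks follow the marking scheme described in the next paragraph.

```python
import re

# Hypothetical local patterns; they only illustrate recovering explicit
# temporal expressions without a full sentential parse.
MONTH = (r"(?:January|February|March|April|May|June|July|August|"
         r"September|October|November|December)")
TEMPORAL_PATTERNS = [
    MONTH + r"\s+\d{1,2}(?:,\s*\d{4})?",        # "July 1", "April 22, 1992"
    r"\d{1,2}/\d{1,2}/\d{2,4}",                  # "02/24/94"
    r"the\s+\d{1,2}(?:st|nd|rd|th)\s+century",   # "the 21st century"
    r"yesterday|today|tomorrow",
]
TEMPORAL_RE = re.compile(r"\b(?:" + "|".join(TEMPORAL_PATTERNS) + r")\b", re.IGNORECASE)


def mark_temporal(text: str) -> str:
    """Wrap every match in <TE> ... </TE> marks."""
    return TEMPORAL_RE.sub(lambda m: "<TE>" + m.group(0) + "</TE>", text)


print(mark_temporal("He steps down on July 1 and will guide McCann into the 21st century."))
```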
Our natural-language-based temporal reasoner was developed and tested on more than three hundred
1989 Wall Street Journal articles. We chose a batch of 300 WSJ articles somewhat randomly and, using
SGML-like marks, an opening mark <TE> and a closing mark </TE>, we marked all expressions
of different syntactic categories that contained any information pertaining to time. Incidentally, this number
of articles coincides with the number of articles in the MUC-6 development data.
Our temporal reasoner automatically extracts explicit temporal expressions from on-line textual documents
and creates their representation. This representation allows the system to compute entailed logical,
context-independent, deductive inferences and facilitates computing context-dependent, non-monotonic inferences,
including implicature, specialization, and generalization.
For any set of English temporal expressions, their information content can be computed and compared,
which allows the system to compute answers to "Yes-No" questions about various aspects of time,
answers to "When?", "How long?" and "How often?" queries of the resulting knowledge base and, to
a limited extent, the temporal ordering of the events described in the documents.
UNO NLP SYSTEM
In this section, we briefly comment on the most distinctive characteristics of our system, as requested by the
MUC-6 program committee; technical details can be found in the references provided earlier. Whenever
possible, we illustrate our system's capabilities with examples from the walkthrough article.
Demonstrated by the Pre-MUC-6 Research and Implementation
Reasoning with Explicit Negative, Disjunctive, and Conjunctive Information at All Syntactic Levels
One consequence of this rather unique capability is flat taxonomies: complex Boolean types need not be
stored explicitly, which prevents the unnecessary, but common, exponential growth of a knowledge base. For
example, the UNO system does not need to store explicitly the entailment (subsumption) relation between
the complex disjunctive type "power-boat or sail-boat" and the lexically simple type "boat", because this
relation is computed directly (and cheaply) from the UNO representation of these two expressions, common
nouns in this case.
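The point about flat taxonomies can be illustrated with a small sketch: only links between lexically simple types are stored, and the relation for a disjunctive type is computed on demand (a disjunction is a subtype of T exactly when every disjunct is). The parent table, including the "vehicle" entry, is a hypothetical example and not UNO's knowledge base format.

```python
# Flat taxonomy (hypothetical): only lexically simple types are stored.
PARENT = {"power-boat": "boat", "sail-boat": "boat", "boat": "vehicle"}


def is_subtype(sub: str, sup: str) -> bool:
    """Walk the stored parent links from `sub` upward looking for `sup`."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT.get(sub)
    return False


def disjunction_entails(disjuncts: list, sup: str) -> bool:
    """A disjunctive type is a subtype of `sup` iff every disjunct is."""
    return all(is_subtype(d, sup) for d in disjuncts)


print(disjunction_entails(["power-boat", "sail-boat"], "boat"))  # True
print(disjunction_entails(["power-boat", "car"], "boat"))        # False
```

No entry for "power-boat or sail-boat" ever enters the table, which is what keeps the knowledge base from growing with the number of Boolean combinations.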
Flat taxonomies are highly desirable because, among other things, they make knowledge base maintenance
easier and improve its quality.
Handling general negation in natural language allows the UNO system to compute correctly the possible
interpretations of the sentence "Mr. Dooner doesn't see a creative malaise permeating the agency", one of
which is that Mr. Dooner sees something else. The system automatically prefers this correct interpretation
because of the context created by the immediately following sentence, "He points to several campaigns
with pride, including the Taster's Choice commercials that are like a running soap opera".
Adverbial and Adjectival Modification at All Syntactic Levels
Adverbial and adjectival modification also contributes to the flatness of our taxonomies. For example, the
UNO system automatically computes the relations between the following pairs of expressions denoting types
and qualitative frequency values:
"smooth process"        "not very smooth process"
"no immediate plans"    "plans"
"huge models"           "small models"
"unusual"               "not extremely unusual"
Nonstandard Quantifiers
Some of the nonstandard quantifiers that our system can handle include vague quantifiers involving the
determiner "many". The UNO system automatically computes relations between the following expressions:
"many differences"    "very many differences"    "not very many differences"
Reasoning with Uncertainty and Qualitative Probabilistic Reasoning without Underlying Numeric Values
Like other systems participating in MUC-6, our system often misses a high-level relation between
entities described in a sentence because our parser either does not attempt, or fails to compute, a full
sentential-level parse. However, even without such a quality parse, the system can automatically compute
relations such as the relation between the following two expressions:
"possible acquisitions"    "possible, but not very likely, acquisitions"
While the lack of a quality parse may prevent the system from understanding a high-level relation (in
this case, what exactly was said about possible acquisitions), it understands the difference between "possible,
but not very likely" and "possible" acquisitions.
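The relations in the last three subsections ("many" vs. "very many" vs. "not very many", and "possible" vs. "possible, but not very likely") can be imitated with sets over a small qualitative scale, in keeping with the absence of numeric values: "not X" is the complement of X and "X, but Y" is their intersection. The scales, the base denotations, and the restriction to the modifier part of each noun phrase are assumptions of this sketch, which also ignores implicature; it is not the UNO representation.

```python
# Hypothetical qualitative scales; no numeric values are involved.
QUANTITY = ("none", "few", "several", "many", "very many")
QUANT_BASE = {"many": frozenset({"many", "very many"}), "very many": frozenset({"very many"})}
LIKELIHOOD = ("impossible", "unlikely", "possible", "likely", "very likely")
LIKE_BASE = {"possible": frozenset({"possible", "likely", "very likely"}),
             "very likely": frozenset({"very likely"})}


def denote(expr: str, scale: tuple, base: dict) -> frozenset:
    """Map an expression to scale points: "not X" = complement, "X, but Y" = intersection."""
    expr = expr.strip()
    if ", but " in expr:
        left, right = expr.split(", but ", 1)
        return denote(left, scale, base) & denote(right, scale, base)
    if expr.startswith("not "):
        return frozenset(scale) - denote(expr[4:], scale, base)
    return base[expr]


def relation(a: str, b: str, scale: tuple, base: dict) -> str:
    x, y = denote(a, scale, base), denote(b, scale, base)
    if x == y:
        return "equivalent"
    if x <= y:
        return "entails"
    if y <= x:
        return "entailed by"
    return "contradicts" if not x & y else "overlaps"


print(relation("very many", "not very many", QUANTITY, QUANT_BASE))              # contradicts
print(relation("possible, but not very likely", "possible", LIKELIHOOD, LIKE_BASE))  # entails
```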
Temporal Reasoning
The UNO system not only identifies explicit temporal expressions, but also automatically reasons with
them. Some of the expressions uniquely handled by our system include the following:
1. Temporal frequency quantifiers
"Not very often"    "Very often"
2. An infinite number of temporal relations
"Not in the immediate future"    "immediately precedes"
"Shortly before or after"    "Not long before"    "Five or six days before"
3. Uncertain and underspecified (explicitly negative) temporal information
"It did not happen on April 22, 1992"
"X happened in April, 1992, or X happened in May, but not early May, 1992"
Reasoning with Underspecified Information
The UNO system computes the fact that the following expressions differ in their information content and
that the first has strictly more information than the second:
"X happened in May, 1992"    "X happened in May, but not early May, 1992"
Automatic Combination of Information
Our system offers a highly efficient Boolean "meet" operation, which is mathematically guaranteed to combine
information in the most general way. This UNO operation provides an alternative to the ad-hoc merging
employed by many other systems. We must note, however, that this advantage of our system is currently
rarely realized in practice because it strongly depends on a correct, quality parse of the input sentences.
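As a rough picture of what a meet does with the temporal examples given earlier, here is a sketch that models each piece of temporal information as the set of days it allows and combines information by intersection. The day granularity and the extent of "early May" (May 1 to 10) are assumptions made only for this illustration; UNO itself works with interval-based representations.

```python
from datetime import date, timedelta


def days(start: date, end: date) -> frozenset:
    """All calendar days from start to end inclusive (assumed day granularity)."""
    return frozenset(start + timedelta(days=n) for n in range((end - start).days + 1))


def meet(*constraints: frozenset) -> frozenset:
    """Combine pieces of information by intersecting the sets of days they allow."""
    result = constraints[0]
    for c in constraints[1:]:
        result &= c  # rebinds to a new frozenset; inputs are not modified
    return result


ALL_1992 = days(date(1992, 1, 1), date(1992, 12, 31))
APRIL = days(date(1992, 4, 1), date(1992, 4, 30))
MAY = days(date(1992, 5, 1), date(1992, 5, 31))
EARLY_MAY = days(date(1992, 5, 1), date(1992, 5, 10))  # assumed extent of "early May"

# "X happened in April, 1992, or X happened in May, but not early May, 1992"
s1 = APRIL | (MAY - EARLY_MAY)
# "It did not happen on April 22, 1992": the complement of one day
s2 = ALL_1992 - {date(1992, 4, 22)}

combined = meet(s1, s2)
print(len(s1), len(combined))  # 51 50
```

Disjunction, negation, and conjunction all reduce to set operations here, so combining the two statements needs no special-purpose merging rule.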
Uniform Natural-Language-Based Representation of Taxonomic and Temporal Reasoning
As we explain later, this uniformity of our representation greatly simplifies our architecture and control .
Work Done during June-October 1995
Below we elaborate on the work done between the end of June and the beginning of October 1995. We briefly
comment on the scope, importance and quantity of the tasks we decided to do.
1. Implemented changes necessary to participate in MUC-6
For example, before MUC-6, the UNO system preserved the exact image of the input text, but it
did not keep track of the correspondence between the resulting knowledge base and the actual pieces of
input text from which the knowledge base resulted. For MUC-6, we had to redesign our bookkeeping
structures in order to do this. We devoted considerable effort to this task because such a
change directly affects every processing stage.
We also had to develop functions for choosing markables and outputting SGML-tagged text.
2. Improved the accuracy of identifying unmarked sentential boundaries
Our pre-MUC-6 system was already quite good at correctly identifying sentential boundaries in newspaper
articles. A simple list of 200 or so standard abbreviations and sensitivity to the most common
occurrences of periods in numbers were largely responsible for this good performance. We wanted the
system to perform this task near-perfectly because doing so improves this generally needed capability
and because, based on the existing literature, we expected the distance expressed in a number of
sentences to be a very important factor in computing pronominal referents.
3. Improved handling of punctuation
It is fair to say that before MUC-6, we largely ignored, rather than handled, punctuation. Now
we can handle periods, exclamation marks, most unpaired single quotes, most commas, and
some dashes. While handling punctuation is good in general, for MUC-6 specifically it is needed for
processing numbers and for identifying appositives.
4. Completed the half-built semantic hierarchy of structured geographical knowledge
The UNO NLP hierarchy of geographical knowledge contains major geographical information about
all countries, including capital cities, major and important cities, towns, ports, suburbs, local settlements,
geographical and political regions that divide land such as provinces, islands, major ports and
airports, landmarks, monetary, length, area, and volume systems, official languages, major political
organizations, waters such as seas, lakes, and rivers, and geographical landmarks and points of interest
such as mountains, hills, woods, and national parks.
This geographical knowledge is encoded in our uniform, general-purpose UNO knowledge representation;
the UNO NLP system supports geographical reasoning with its general inferencing mechanism.
We certainly did not need to do this for MUC-6; a simple gazetteer list would have sufficed (an approach
adopted by most MUC-6 participants). However, we put a lot of effort into encoding this hierarchy because:
(a) With this hierarchy, we further substantiate our claim that natural language is a powerful and
efficient knowledge representation system, and we add geographical knowledge to the list of types of
knowledge that are uniformly represented and reasoned about. Right now, our system can reason about
geographical region containment in exactly the same fashion as about the type subset relation and
temporal interval containment.
(b) We wanted to demonstrate that the same in-depth mechanism for some aspects of geographical
reasoning can be used efficiently to perform a much lesser, MUC-6-like task of marking locations.
5. Identified common named types, including organization types, and existing named entities
We identified more than 100 types other than the type "organization" and developed sizable knowledge
bases and dictionaries with actual existing, classified named entities of these types. We did so in
order to substantiate experimentally our belief that the reference mechanism for named entities is basically
the same for all entity types, and that references can be computed by the same piece of code.
6. Implemented numbers and personal names
Our effort to make the UNO system process numbers and to improve its handling of personal names was
driven strictly by MUC-6.
7. Developed and tested a general approach to handling abbreviations, acronyms and aliases
We spent much more effort on abbreviations, acronyms and aliases than originally planned. First,
such short forms are very common in written language. Second, handling them resembles handling
semantic ambiguity. And third, they fit with our ongoing research on context.
8. Implemented quick-and-dirty pronoun resolution
While this code appears to perform well, it occasionally breaks and needs further debugging.
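Item 2 above attributes good sentence splitting to a list of about 200 abbreviations plus sensitivity to periods inside numbers. A minimal sketch of that idea follows, assuming a small hypothetical abbreviation set; treating a single capital letter followed by a period as an initial is an extra heuristic of this sketch, not something the text describes.

```python
import re

# A small hypothetical sample; the text mentions a list of roughly 200.
ABBREVIATIONS = {"Mr.", "Ms.", "Mrs.", "Dr.", "Jr.", "Corp.", "Inc.", "Co.", "Ltd."}


def split_sentences(text: str) -> list:
    """Split after '.', '!' or '?' unless the period ends an abbreviation or initial.

    Periods inside numbers such as "3.5" never qualify, because a split
    point requires whitespace (or the end of the text) after the terminator.
    """
    sentences, start = [], 0
    for m in re.finditer(r"[.!?](?=\s|$)", text):
        token = re.search(r"\S+\Z", text[start:m.end()]).group(0)
        if token in ABBREVIATIONS or re.fullmatch(r"[A-Z]\.", token):
            continue  # "Mr." or an initial such as "L." does not end a sentence
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


print(split_sentences("Mr. James, 57, is stepping down. Sales rose 3.5 percent."))
```

A sentence that genuinely ends with an abbreviation ("... McCann-Erickson Co.") is not split by this sketch, which is one reason a simple list gets a system only to "quite good" rather than near-perfect.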
Figure 1: Modules of the UNO natural language processing system (NL Input; Dictionary, with dictionary entries and a Meta Dictionary; Parser and Grammar; Knowledge Representation: Boolean Algebras, KB Interpreter, Inference Engine, static and dynamic knowledge; Discourse; Learning)
UNO Architecture
The practical significance of the uniformity of the UNO natural-language-based representation and inference
is the simple and flexible architecture of our NLP system:
1. All UNO modules access the knowledge representation module and share its uniform representation.
2. There is no need for external specialists such as knowledge representation systems or temporal reasoners.
Our system uniformly represents and reasons with taxonomic, temporal and geographical knowledge.
3. With no external specialists, no interfaces to access them are needed, and therefore there is no need to
translate between incompatible representations.
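The uniformity claim in item 2 can be pictured by giving taxonomic, geographical, and temporal knowledge one common denotation, sets, so that a single containment test serves all three. The toy extensions below are hypothetical, and modeling a region as a set of named places is a deliberate oversimplification; the sketch only shows why no separate specialist or translation layer is needed once the representation is shared.

```python
def contained_in(a: frozenset, b: frozenset) -> bool:
    """One containment test for every kind of knowledge: set inclusion."""
    return a <= b


# Toy, hypothetical extensions; each kind of knowledge denotes a set.
BOATS = frozenset({"boat1", "boat2", "boat3"})           # taxonomic: individuals
SAIL_BOATS = frozenset({"boat2"})
MICHIGAN = frozenset({"Detroit", "Lansing", "Ann Arbor"})  # geographical: places
DETROIT = frozenset({"Detroit"})
YEAR_1995 = frozenset(range(1, 366))                     # temporal: day numbers
JUNE_TO_OCTOBER_1995 = frozenset(range(152, 305))        # June 1 .. October 31, 1995

for small, large in [(SAIL_BOATS, BOATS), (DETROIT, MICHIGAN), (JUNE_TO_OCTOBER_1995, YEAR_1995)]:
    print(contained_in(small, large))
```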
An NLP system that needs to perform tasks beyond information extraction and to exhibit some in-depth
processing, such as question answering, virtually always calls some external specialists, typically
knowledge representation systems. As reported in the literature, translating between the representation
of the NLP system and that of such an external specialist is very hard, and it tremendously
complicates control [Palmer et al., 1993] [exp, 1996].
UNO NLP Modules
The UNO NLP system consists of the following modules: Reader, Dictionary, Parser, Knowledge
Representation, Discourse, and Learning.
The first three, Reader, Dictionary, and Parser, are modules of the BILING system, an NLP system
that processes a large number of narratives written by bilingual English/Spanish students [Iwanska, 1989].
The changes to these old modules include augmenting the parser to produce the UNO representation of
sentences, enhancing the morphological analyzer to handle prefixes, and supplying the reader with structures for
storing the information gained at various stages of processing. The Knowledge Representation module
implements the theory behind the UNO model of natural language.
The Reader module contains functions for breaking input text into documents, paragraphs, sentences,
and words. It recognizes abbreviated phrases, contractions, punctuation, etc. This module also contains
functions for creating various structures from strings and LISP s-expressions, and routines for initializing
global variables used by other modules.
The Dictionary module contains functions for creating, updating, loading and checking the consistency
of the UNO dictionary, and functions for performing morphological analysis of the input. Metaknowledge
about the dictionary describes its content: it lists known features, specifies feature applicability to different
syntactic categories, and describes possible and default values of different features (the default values are not
shown explicitly in the entries). This metaknowledge facilitates maintaining the consistency of the dictionary.
The morphological analyzer uses the dictionary to supply each input word with syntactic,
semantic, and pragmatic information. The morphological analyzer can recognize and generate various forms
of nouns and verbs (for example, cry, cries, crying, cried) and adjectives (for example, angrier), derive adverbs
from adjectives (for example, slowly), etc. The morphological analyzer handles both prefixes, e.g., the prefix
im- in the word impossible, and suffixes, e.g., -less in the word brainless.
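The prefix, suffix, and inflection handling described above can be sketched with a tiny affix-stripping analyzer that accepts an analysis only when the remaining stem is a dictionary entry. The lexicon, affix tables, and labels are hypothetical stand-ins for the UNO dictionary and its feature-rich entries.

```python
# Hypothetical affix tables and lexicon; the UNO dictionary is far larger.
LEXICON = {"cry": "verb", "possible": "adjective", "brain": "noun", "slow": "adjective"}
PREFIXES = {"im": "not"}        # im-possible
SUFFIXES = {"less": "without"}  # brain-less
ENDINGS = (("ies", "y", "3sg present"), ("ied", "y", "past"),
           ("ing", "", "progressive"), ("ly", "", "adverb"))


def analyze(word: str) -> list:
    """Return (stem, affix, meaning) analyses whose stem is in LEXICON."""
    analyses = []
    if word in LEXICON:
        analyses.append((word, "", ""))
    for p, meaning in PREFIXES.items():
        if word.startswith(p) and word[len(p):] in LEXICON:
            analyses.append((word[len(p):], p + "-", meaning))
    for s, meaning in SUFFIXES.items():
        if word.endswith(s) and word[:-len(s)] in LEXICON:
            analyses.append((word[:-len(s)], "-" + s, meaning))
    for ending, replacement, meaning in ENDINGS:  # cries -> cry, slowly -> slow
        if word.endswith(ending):
            stem = word[:-len(ending)] + replacement
            if stem in LEXICON:
                analyses.append((stem, "-" + ending, meaning))
    return analyses


for w in ("impossible", "brainless", "cries", "slowly"):
    print(w, analyze(w))
```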
The Parser module contains functions implementing a chart parser [Winograd, 1983] [Earley, 1985].
The parser produces both syntactic parse trees and the UNO semantic representation of the natural language
input. The grammar allows limited context-sensitivity via features on lexical categories and non-terminals.
Each grammar rule is supplied with the name of a function that translates the recognized natural language
expressions into the UNO representation.
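For readers unfamiliar with chart parsing, here is a minimal Earley-style recognizer. It is only an illustration of the technique: the toy grammar and lexicon are hypothetical, it answers only whether a parse exists, it omits the per-rule translation functions that UNO attaches to grammar rules, and it assumes the grammar has no empty productions.

```python
# Toy grammar (hypothetical): nonterminals map to right-hand sides;
# symbols not in GRAMMAR are lexical categories.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "Adj", "N"), ("Name",)],
    "VP": [("V", "NP")],
}
LEXICON = {"Dooner": "Name", "owns": "V", "a": "Det", "fast": "Adj", "powerboat": "N"}


def earley_recognize(words: list, start: str = "S") -> bool:
    """True if `words` can be derived from `start` (no empty productions)."""
    # A state is (lhs, rhs, dot, origin); chart[i] holds states ending at position i.
    chart = [set() for _ in range(len(words) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in GRAMMAR[start]}
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in GRAMMAR:           # predict
                for prod in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], prod, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            elif dot < len(rhs):                                  # scan
                if i < len(words) and LEXICON.get(words[i]) == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                                 # complete
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(words)])


print(earley_recognize("Dooner owns a fast powerboat".split()))
```

The chart stores every partial constituent once per position, which is what lets a chart parser share work across alternative analyses instead of re-parsing.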
The Knowledge Representation module consists of the Boolean Algebras module, the Knowledge
Base Interpreter, and the Inference Engine module. The Boolean Algebras module implements the
UNO knowledge representation formalisms and some standard Boolean algebras such as predicate calculus
and the powerset of a finite set. The module contains functions for deriving the disjunctive normal form of a
complex Boolean expression independently of its algebra, as well as functions for creating the representation
of the element that this complex Boolean expression stands for.
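Deriving a disjunctive normal form "independently of its algebra" means the procedure never looks inside the atoms. A minimal sketch of that idea, assuming expressions are nested tuples and literals are (atom, polarity) pairs; it pushes negation inward with De Morgan's laws and distributes conjunction over disjunction, but does not simplify away contradictory conjunctions. The encoding is an assumption of this sketch, not the UNO data structure.

```python
from itertools import product


def dnf(expr) -> list:
    """Disjunctive normal form of an expression over arbitrary atoms.

    `expr` is an atom (str) or ("not", e), ("and", e1, e2, ...), ("or", e1, e2, ...).
    The result is a list of conjunctions, each a frozenset of (atom, polarity)
    literals. Atoms are never inspected, so the underlying algebra is irrelevant.
    """
    return _dnf(expr, True)


def _dnf(expr, positive: bool) -> list:
    if isinstance(expr, str):
        return [frozenset({(expr, positive)})]
    op, *args = expr
    if op == "not":
        return _dnf(args[0], not positive)
    # De Morgan: under negation, "and" behaves like "or" and vice versa.
    is_and = (op == "and") == positive
    parts = [_dnf(a, positive) for a in args]
    if not is_and:
        return [c for part in parts for c in part]
    # Distribute conjunction over the disjunctions produced by the parts.
    return [frozenset().union(*combo) for combo in product(*parts)]


# "neither good nor hard-working" = not (good or hard-working)
print(dnf(("not", ("or", "good", "hard-working"))))
```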
The Knowledge Base Interpreter implements the interpreter of the sets of type equations encoding
taxonomic, temporal and geographical knowledge .
The Inference Engine module implements the UNO algorithm for representing and utilizing knowledge
derived from natural language sentences. This algorithm updates the dynamic knowledge bases of the UNO
system.
The Discourse module implements anaphora resolution (functions for identifying referring expressions
and computing their referents if needed) and discourse processing (functions for computing certain discourse
structures that facilitate maintaining dynamic knowledge bases).
The Learning module consists of functions mixing statistics and inductive learning techniques and is
used for corpus analysis and definite-anaphora-based knowledge acquisition.
Control
Control is flexible and non-sequential, with all modules accessing the Knowledge Representation module.
Speed
Our system is reasonably fast. For MUC-6, it took slightly under half a minute to process a typical WSJ
article in the development set.
WALKTHROUGH ARTICLE AND MORE
Somewhat Expected and Unexpected Problems
Our extremely ambitious goals for such a short period of time, and particularly knowledge engineering
large knowledge bases, prevented us from doing the coreference task. We fully expected that we would not be
able to finish improving our existing anaphora resolution code and make it work reliably with the
newly created large knowledge bases.
However, we did not expect the function that prints out the markings to fail. This buggy piece of code
is about the easiest to fix, but at the same time it is the most damaging in terms of the score. We had to
suppress printing of the recognized organization names and turn off processing of the "HL" and "DATELINE"
parts of the articles.
Our Official, Unofficial, and Very Unofficial Scores
Our scores are not very meaningful: although the problematic markings constituted only a small percentage
of all the markings (8% for the first six test articles), the scorer failed to score anything but the four
shortest articles. This can be seen in the official results table below: the "DD" slot, containing a date
in a canned "Month/Day/Year" format, shows only four matches.
In order to obtain an unofficial score, we had to edit our results, which we did strictly according to the
trace. Even this unofficial score does not reflect our real performance well; it only shows that we did something.
For those things that we were able to mark, our own very unofficial estimate is that we performed in the
high nineties in both recall and precision (as can be seen in the enclosed sample article). For example, we
correctly marked two expressions that were problematic for most other systems: "the 21st century" and "Hollywood".
OFFICIAL
*** TOTAL SLOT SCORES ***
SLOT          POS  ACT  COR  PAR  INC  SPU  MIS  NON  REC  PRE  UND  OVG  ERR  SUB
<enamex>      909    9    7    0    0    2  907    0    1   78   99   77   99    0
  type        909    9    5    0    2    2  902    0    0   56   99   77   99   28
  text        909    9    6    0    2    7  907    0    0   56   99   22   99   76
  subtotals  1818   18   10    0    4    4 1804    0    0   56   99   77   99   78
<timex>       112    6    5    0    0    1  107    0    4   63   96   17   96    0
  type        112    6    5    0    0    1  107    0    4   83   96   17   96    0
  text        117    6    4    0    1    1  107    0    4   67   96   17   96   70
  subtotals   224   12    9    0    1    7  714    0    4   75   96   17   96   10
<numex>        93    5    5    0    0    0   88    0    5  100   95    0   95    0
  type         93    5    3    0    0    0   66    0    5  100   93    0   95    0
  text         93    3    3    0    2    0   88    0    3   60   96    0   97   40
  subtotals   186   10    6    0    7    0  176    0    4   80   95    0   96   70
ALL OBJECTS  7778   40   77    0    7    6 2194    0    1   66   98   15   99   70
MATCHED ONLY   34   34   77    0    7    0    0    0   79   79    0    0   20   20

F-MEASURES    P&R 7.38    2P&R 5.65    P&2R 1.51

*** DOCUMENT SECTION SCORES ***
SLOT          POS  ACT  COR  PAR  INC  SPU  MIS  NON  REC  PRE  UND  OVG  ERR  SUB
HL            176    0    0    0    0    0  128    0    0    *  100    *  100
DD             60    4    4    0    0    0   56    0    7  100   93    0   93    0
DATELINE       37    0    0    0    0    0   57    0    0    *  100    *  100
TXT          1986   36   73    0    7    6 1958    0    1   64   98   17   99   73
UNOFFICIAL SCORE OF EDITED RESULTS
*** TOTAL SLOT SCORES ***
SLOT          POS  ACT  COR  PAR  INC  SPU  MIS  NON  REC  PRE  UND  OVG  ERR  SUB
<enamex>      917  477  754    0    0  168  656    0   78   60   77   40   76    0
  type        912  477  730    0   24  166  658    0   25   54   72   40   79    9
  text        912  422  204    0   50  166  658    0   22   48   72   40   81   20
  subtotals  1874  644  434    0   74  336 1316    0   74   31   77   40   80   14
<timex>       112  176   67    0    0   91   73    0   78   49   22   51   57    0
  type        112  178   86    0    1   91   25    0   77   48   77   51   58    1
  text        117  178   79    0    8   91   25    0   70   44   77   31   61    9
  subtotals   774  356  165    0    9  182   30    0   74   46   22   51   59    3
<numex>        93  103   70    0    0   35   23    0   75   67   75   33   45    0
  type         93  103   70    0    0   35   73    0   75   67   75   33   45    0
  text         93  105   59    0   11   35   23    0   63   56   25   33   54   16
  subtotals   166  710  129    0   11   70   46    0   69   61   25   33   50    8
ALL OBJECTS  7734 1410  778    0   94  366 1417    0   32   52   63   42   74   11
MATCHED ONLY  677  877  776    0   94    0    0    0   88   68    0    0   11   11

F-MEASURES    P&R 39.96    2P&R 46.23    P&2R 36.16

*** DOCUMENT SECTION SCORES ***
SLOT          POS  ACT  COR  PAR  INC  SPU  MIS  NON  REC  PRE  UND  OVG  ERR  SUB
HL            126    0    0    0    0    0  126    0    0    *  100    *  100
DD             60   60   60    0    0    0    0    0  100  100    0    0    0    0
DATELINE       52    0    0    0    0    0   32    0    0    *  100    *  100
TXT          1994 1350  666    0   94  666 1232    0   34   49   62   44   74   12
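The F-MEASURES rows combine precision and recall into one number. A minimal sketch of the weighted F-measure as it is commonly defined for MUC scoring, F = (β² + 1)·P·R / (β²·P + R), where β = 1 corresponds to the "P&R" column, β = 0.5 weighs precision more heavily ("2P&R"), and β = 2 weighs recall more heavily ("P&2R"). The example values are made up; because the scorer works from unrounded counts, plugging the rounded REC and PRE columns into this formula need not reproduce the printed F-measures exactly.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (both in percent)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# Made-up example values: precision 60, recall 40.
print(f_measure(60, 40))       # P&R
print(f_measure(60, 40, 0.5))  # 2P&R: precision weighted more
print(f_measure(60, 40, 2.0))  # P&2R: recall weighted more
```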
Edits
Below, we explain our edits and enclose the answer key for the walkthrough article, with hand-written
markings showing the expressions we recognized as well as our edited markings.
In the first six test articles, there were 16 problematic markings (8%) and 186 unproblematic ones (92%).
In the problematic markings, bits of text that were not in the input appeared and portions of the existing
text disappeared:
1. Double-closed marks
Example: Ms. <ENAMEX TYPE="PERSON">Washington</ENAMEX>,</ENAMEX>,
Edit: Ms. <ENAMEX TYPE="PERSON">Washington</ENAMEX>,
Number of edits: 4
2. Simple unclosed/unopened marks
Example: of News Corp.'s News <ENAMEX TYPE="LOCATION">America Publishing unit
Edit: of News Corp.'s News <ENAMEX TYPE="LOCATION">America</ENAMEX>
Number of edits: 10
3. Really scrambled marks
Example: <TIMEX TYPE="TIME">$<NUMEX TYPE="MONEY">$725,000 last year</TIMEX>
Edit: <NUMEX TYPE="MONEY">$725,000</NUMEX> <TIMEX TYPE="TIME">last year</TIMEX>
Number of edits: 2
4. We did nothing about the introduced text, nor about markings that were not printed at all even though
the expressions were clearly recognized, as evidenced by the trace.
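The three kinds of problematic markings listed above (double-closed, unclosed or unopened, and scrambled) are all detectable mechanically. A minimal sketch of such a well-formedness check, assuming the ENAMEX, TIMEX and NUMEX markings never nest; it is a hypothetical helper for illustration, not part of the UNO system or the MUC-6 scorer.

```python
import re

TAG = re.compile(r"<(/?)(ENAMEX|TIMEX|NUMEX)\b[^>]*>")


def check_markings(text: str) -> list:
    """Report unopened, unclosed, nested, and crossed markings."""
    problems, stack = [], []
    for m in TAG.finditer(text):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            if stack:  # assumption: markings never nest
                problems.append(f"nested <{name}> inside <{stack[-1]}> at {m.start()}")
            stack.append(name)
        elif not stack:
            problems.append(f"unopened </{name}> at {m.start()}")
        elif stack[-1] != name:
            problems.append(f"</{name}> closes <{stack[-1]}> at {m.start()}")
            stack.pop()
        else:
            stack.pop()
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems


print(check_markings('Ms. <ENAMEX TYPE="PERSON">Washington</ENAMEX>,</ENAMEX>,'))
print(check_markings('<NUMEX TYPE="MONEY">$725,000</NUMEX> <TIMEX TYPE="TIME">last year</TIMEX>'))
```

Running such a check before printing would have flagged each of the 16 problematic markings; it cannot, however, recover text that was dropped or introduced.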
Qualitative Indication of Our Performance for the Walkthrough Article
[Figure: the answer key for the walkthrough article (DOCNO 940224-0133), with hand-written markings distinguishing correctly identified expressions from correctly identified expressions that were too short or too long.]
Edited Markings for the Walkthrough Article
<DOC>
<DOCID> wsj94_026.0231 </DOCID>
<DOCNO> 940224-0133. </DOCNO>
<HL> Marketing & Media - Advertising:
@ John Dooner Will Succeed James
@ At Helm of McCann-Erickson
@ ----
@ By Kevin Goldman </HL>
<DD> <TIMEX TYPE="DATE">02/24/94</TIMEX> </DD>
<SO> WALL STREET JOURNAL (J), PAGE B6 </SO>
<CO> IPG K </CO>
<IN> ADVERTISING (ADV), ALL ENTERTAINMENT & LEISURE (ENT),
FOOD PRODUCTS (FOD), FOOD PRODUCERS, EXCLUDING FISHING (OFP),
RECREATIONAL PRODUCTS & SERVICES (REC), TOYS (TMF) </IN>
<TXT>
<p>One of the many differences between <ENAMEX TYPE="PERSON">Robert L. James</ENAMEX>, chairman and chief executive officer of McCann-Erickson, and <ENAMEX TYPE="PERSON">John J. Dooner Jr</ENAMEX>., the agency's president and chief operating officer, is quite
telling: Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> enjoys sailboating, while Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX>
owns a powerboat.
</p>
<p>Now, Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> is preparing to sail into the sunset, and Mr.
<ENAMEX TYPE="PERSON">Dooner</ENAMEX> is poised to rev up the engines to guide Interpublic Group's McCann-Erickson into the <TIMEX TYPE="TIME">21st century</TIMEX>.
Yesterday, McCann made official what had been widely anticipated: Mr. <ENAMEX TYPE="PERSON">James</ENAMEX>, <TIMEX TYPE="TIME">57 years</TIMEX> old,
is stepping down as chief executive officer on <TIMEX TYPE="DATE">July 1</TIMEX> and will
retire as chairman at the end of the year. He will be succeeded by Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX>, 45.
</p>
<p>
It promises to be a smooth process, which is unusual given the
volatile atmosphere of the advertising business. But Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> has a
big challenge that will be his top priority. "I'm going to focus
on strengthening the creative work," he says. "There is room to grow.
We can make further improvements in terms of the perception of
our creative work."
</p>
<p>
<ENAMEX TYPE="PERSON">Evea Alan Gottesman</ENAMEX>, an analyst with PaineWebber, who believes
McCann is filled with "vitality" and is in "great shape," says that
from a creative standpoint, "You wouldn't pay to see their reel" of commercials.
</p>
<p>
While McCann's world-wide billings rose <NUMEX TYPE="PERCENT">17%</NUMEX> to
<TIMEX TYPE="TIME"><NUMEX TYPE="MONEY">$6.4 billion</NUMEX> last year</TIMEX>
from <NUMEX TYPE="MONEY">$5.7 billion</NUMEX> in <TIMEX TYPE="DATE">1992</TIMEX>, the agency still is dogged by the
loss of the key creative assignment for the prestigious Coca-Cola Classic account.
"I would be less than honest to say I'm not
disappointed not to be able to claim creative leadership for Coke," Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> says.
</p>
<p>
McCann still handles promotions and media buying for Coke. But
the bragging rights to Coke's ubiquitous advertising belongs to
Creative Artists Agency, the big <ENAMEX TYPE="LOCATION">Hollywood</ENAMEX> talent agency. "We are
striving to have a strong renewed creative partnership with
Coca-Cola," Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> says. However, odds of that happening are
slim since word from Coke headquarters in <ENAMEX TYPE="LOCATION">Atlanta</ENAMEX> is that CAA and
other ad agencies, such as Fallon McElligott, will continue to
handle Coke advertising.
</p>
<p>Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX>, who recently lost <NUMEX TYPE="MONEY">60 pounds</NUMEX> over three-and-a-<TIMEX TYPE="TIME">half
months</TIMEX>, says now that he has "reinvented" himself, he wants to do
the same for the agency. For Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX>, it means maintaining his
running and exercise schedule, and for the agency, it means
developing more global campaigns that nonetheless reflect local cultures.
One McCann account, "I Can't Believe It's Not <ENAMEX TYPE="PERSON">Butter</ENAMEX>," a
butter substitute, is in 11 countries, for example.
</p>
<p>McCann has initiated a new so-called global collaborative system,
composed of world-wide account directors paired with creative
partners. In addition, <ENAMEX TYPE="PERSON">Peter Kim</ENAMEX> was hired from WPP Group's J.
<ENAMEX TYPE="PERSON">Walter Thompson</ENAMEX> last <TIMEX TYPE="DATE">September</TIMEX> as vice chairman, chief strategy officer, world-wide.
</p>
<p>Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> doesn't see a creative malaise permeating the agency.
He points to several campaigns with pride, including the Taster's
Choice commercials that are like a running soap opera. "It's a <NUMEX TYPE="MONEY">$19</NUMEX>
million campaign with the recognition of a <NUMEX TYPE="MONEY">$200 million</NUMEX> campaign,"
he says of the commercials that feature a couple that must hold a
record for the length of time dating before kissing.
</p>
<p>Even so, Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> is on the prowl for more creative talent and
is interested in acquiring a hot agency. He says he would like to finalise an acquisition "yesterday. I'm not known for patience."
</p>
<p>Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> met with <ENAMEX TYPE="PERSON">Martin Puris</ENAMEX>, president and
chief executive
officer of Ammirati & Puris, about McCann's acquiring the agency
with billings of <NUMEX TYPE="MONEY">$400 million</NUMEX>, but nothing has materialised. "There
is no question," says Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX>, "that we are looking for quality
acquisitions and Ammirati & Puris is a quality operation. There are
some people and entire agencies that I would love to see be part of
the McCann family." Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> declines to identify possible
acquisitions.
</p>
<p>Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> is just gearing up for the headaches of running one of
the largest world-wide agencies. (There are no immediate plans to
replace Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX> as president; Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> operated
as chairman, chief executive officer and president for a period of time.) Mr.
<ENAMEX TYPE="PERSON">James</ENAMEX> is filled with thoughts of enjoying his three hobbies: sailing, skiing and hunting.
</p>
<p>
Asked why he would choose to voluntarily exit while he still is
so young, Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> says it is time to be a tad selfish about how he
spends his days. Mr. <ENAMEX TYPE="PERSON">James</ENAMEX>, who has a reputation as an
extraordinarily tough taskmaster, says that because he "had a great
time" in advertising, he doesn't want to "talk about the
disappointments." In fact, when he is asked his opinion of the new
batch of Coke ads from CAA, Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> places his hands over his
mouth. He shrugs. He doesn't utter a word. He has, he says, fond
memories of working with Coke executives. "Coke has given us great
highs," says Mr. <ENAMEX TYPE="PERSON">James</ENAMEX>, sitting in his plush office, filled with
photographs of sailing as well as huge models of, among other things, a Dutch tugboat.
</p>
<p>
He says he feels a "great sense of accomplishment." In 36
countries, McCann is ranked in the top three; in 75 countries, it is
in the top 10.
</p>
<p>Soon, Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> will be able to compete in as many sailing races
as he chooses. And concentrate on his duties as rear commodore at
the <ENAMEX TYPE="LOCATION">New York</ENAMEX> Yacht Club.
</p>
<p>Maybe he'll even leave something from his office for Mr. <ENAMEX TYPE="PERSON">Dooner</ENAMEX>.
Perhaps a framed page from the <ENAMEX TYPE="LOCATION">New York</ENAMEX> Times, dated <TIMEX TYPE="DATE">Dec. 8</TIMEX>,
<TIMEX TYPE="DATE">1987</TIMEX>,
showing a year-end chart of the stock market crash earlier that year.
Mr. <ENAMEX TYPE="PERSON">James</ENAMEX> says he framed it and kept it by his desk as a "personal reminder.
It can all be gone like that."
</p>
</TXT >
</DOC>
Discussion of Results
For each subtask, we show the D/T (developed/tested) ratio: the ratio of the number of articles used for
development to the number of articles used for testing. For errors, we give a brief explanation of the cause.
NUMEXES
D/T = 100/600 + many convoluted made-up examples
Our performance on numexes is very good. It is not an extremely difficult task, and we strove for, and achieved,
near-completeness. We correctly identified interesting cases such as:
"C$7.4 million"
"135 million Canadian dollars"
"$414,167"
"87.5 cents"
"18 Canadian cents"
"three cents"
"five to 10 cents"
Occasionally the system makes an error. For example, "though it is assumed that dozens of them won't
be", "Co., to succeed Mark" and "60 pounds" are identified as money. The system knows pound, mark and won
to be British, German and Korean monetary units, respectively; the culprit is too local an interpretation.
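The "too local" failure can be illustrated with a toy recognizer. In the hypothetical Python sketch below, `naive_money` flags any token found in a small currency lexicon, which is a purely local decision. `money_with_amount` additionally requires a numeral or number word immediately before the unit. The lexicon and both functions are our assumptions, not UNO's implementation.

```python
# Hypothetical miniature lexicon: currency unit -> country of the currency.
CURRENCY_UNITS = {"pound": "British", "pounds": "British",
                  "mark": "German", "marks": "German", "won": "Korean"}
NUMBER_WORDS = {"one", "two", "three", "four", "five", "ten",
                "hundred", "thousand", "million", "billion"}


def naive_money(tokens):
    """Indices of tokens that are currency units, ignoring all context."""
    return [i for i, t in enumerate(tokens) if t.lower() in CURRENCY_UNITS]


def is_amount(token):
    """True for numerals such as '60', '1,000' or '87.5', or a number word."""
    digits = token.replace(",", "").replace(".", "", 1)
    return digits.isdigit() or token.lower() in NUMBER_WORDS


def money_with_amount(tokens):
    """Keep only currency units that directly follow an amount."""
    return [i for i in naive_money(tokens) if i > 0 and is_amount(tokens[i - 1])]


for toks in (["dozens", "of", "them", "won", "'t", "be"],
             ["Co.", ",", "to", "succeed", "Mark"],
             ["lost", "60", "pounds"]):
    print(toks, naive_money(toks), money_with_amount(toks))
```

The amount check removes the "won't" and "Mark" false alarms but not "60 pounds". Telling weight from money there requires knowing what was lost, which is wider context than the two tokens the rule inspects.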
TIMEXES
D/T = 100/600
Our very good performance on identifying temporal expressions improved slightly once we handled
numbers. Previously we were missing "bare" numeric years such as "the 1980 election", "1980s" and
"pre-1970", and dates such as "4.12" (referring to "April 12th").
Our errors included "57 years old" (we probably misinterpreted the task definition), "last year" (deliberately
kept) and "three-and-a-half months" (only some dashes are handled).
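The bare-year and month.day cases above can be approximated with regular expressions. The Python sketch below uses hypothetical patterns, not UNO's grammar. A year is taken to be a four-digit number from 1900 to 2099, optionally a decade ("1980s") or prefixed with "pre-", "post-" or "mid-". A month.day token is read as "4.12" = April 12.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical patterns for illustration only.
YEAR = re.compile(r"\b(?:(?:pre|post|mid)-)?((?:19|20)\d\d)(s?)\b")
MONTH_DAY = re.compile(r"(1[0-2]|[1-9])\.(3[01]|[12]\d|[1-9])")


def find_years(text):
    """Return (matched text, year, is_decade) for each bare numeric year."""
    return [(m.group(0), int(m.group(1)), m.group(2) == "s")
            for m in YEAR.finditer(text)]


def month_day(token):
    """Read a whole token such as '4.12' as a month.day date, else None."""
    m = MONTH_DAY.fullmatch(token)
    if not m:
        return None
    return f"{MONTHS[int(m.group(1)) - 1]} {int(m.group(2))}"


print(find_years("the 1980 election, the 1980s and pre-1970 levels"))
print(month_day("4.12"), month_day("87.5"))
```

Patterns this local have the same weakness as the money rule. `month_day("6.4")` returns "June 4", so the "6.4" in "$6.4 billion" could only be rejected by looking at its surroundings.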
LOCATIONS
D/T = 100/600 (100/3000)
Our good performance on identifying locations stems from combining our rather complete critical
knowledge bases, covering the major named types and geographical information for all countries, with automatic
interpretation of locative expressions that carry a known geographical type, such as "the city of Farmington Hills"
and "The Isle of Man", and of expressions of the form "Smaller Region, Larger Region", such as "Poland, New
York".
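The two interpretation rules can be sketched as follows. This is a minimal Python illustration assuming a hypothetical miniature gazetteer. A leading geographical type word ("the city of ...") assigns its type directly. In "X, Y", a known larger region Y makes X a region inside it, whatever X would mean on its own; this is why "Poland" in "Poland, New York" is not read as the country.

```python
import re

# Hypothetical miniature gazetteer: name -> geographical type.
GAZETTEER = {"New York": "state", "Michigan": "state", "Poland": "country"}

# "the city of X", "The Isle of X", "state of X", ... (case-insensitive).
TYPED = re.compile(r"^(?:the\s+)?(city|town|county|state|isle)\s+of\s+(.+)$",
                   re.IGNORECASE)


def interpret_location(expr):
    """Return a small description of a locative expression, or None."""
    m = TYPED.match(expr)
    if m:
        return {"name": m.group(2), "type": m.group(1).lower()}
    if "," in expr:
        smaller, larger = (part.strip() for part in expr.split(",", 1))
        if GAZETTEER.get(larger) in {"state", "country"}:
            return {"name": smaller, "type": "region", "part_of": larger}
    if expr in GAZETTEER:
        return {"name": expr, "type": GAZETTEER[expr]}
    return None


for e in ("the city of Farmington Hills", "The Isle of Man",
          "Poland, New York", "Poland"):
    print(e, "->", interpret_location(e))
```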
PERSONAL NAMES
D/T = 100/600 (100/3000)
Our unsurprisingly good performance on personal names stems from fairly complete knowledge bases of
reasonable, not very strange, first names, together with information about gender and about nicknames as
aliases. Processing personal names is also supported by an extensive, 300-entry dictionary of titles and
professional degrees. Some of our more interesting titles include "Lieutenant Junior Grade", abbreviated as
"Lt. jg", "Knight Commander of the Order of the British Empire" and "Your Serene Highness".
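The following minimal Python sketch shows how a title dictionary and a first-name knowledge base can drive name parsing. The tiny `TITLES`, `FIRST_NAMES` and `SUFFIXES` sets and the `parse_name` function are hypothetical, not UNO's. The sketch shows why "Ms. Washington" can be read as a person rather than a location, and it encodes the rule, noted in the error list below, that a single letter cannot be a last name.

```python
# Hypothetical miniature knowledge bases; the real ones are far larger.
TITLES = {"Mr.", "Ms.", "Mrs.", "Dr."}
FIRST_NAMES = {"John": "male", "Kenneth": "male", "Peter": "male", "Robert": "male"}
SUFFIXES = {"Jr", "Jr.", "Sr."}


def parse_name(tokens):
    """Split a candidate token sequence into name parts, or return None."""
    if not tokens:
        return None
    title = tokens[0] if tokens[0] in TITLES else None
    rest = list(tokens[1:] if title else tokens)
    suffix = rest.pop() if rest and rest[-1] in SUFFIXES else None
    if not rest:
        return None
    last = rest[-1]
    # A single letter such as "S." cannot serve as a last name.
    if len(last.rstrip(".")) == 1:
        return None
    first = rest[0] if len(rest) > 1 else None
    # Without a title, require a known first name as evidence of a person.
    if title is None and first not in FIRST_NAMES:
        return None
    return {"title": title, "first": first, "middle": rest[1:-1],
            "last": last, "suffix": suffix, "gender": FIRST_NAMES.get(first)}


print(parse_name(["John", "J.", "Dooner", "Jr"]))
print(parse_name(["Ms.", "Washington"]))
print(parse_name(["Robert", "S."]))
```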
We correctly identified the following interesting cases:
"Ms. Washington", "Mr. York" and "Ms. Lansing" were not confused with locations
"John J. Dooner Jr"
"Steve" as a first-name reference
"Kenneth H. Thorn" with no title
"Peter A. Left" was not confused with the direction, as in "go to the left"
"Howard Dean" was not confused with the title
Our errors included:
"Robert S. "Steve" Miller" as two persons, "Robert S." and "Steve Miller" (double quotes basically ignored)
"CFO Paul Rizzo" as a personal name (no real correction of misspellings)
"Thomas H. O'Brien Jr." (we forgot about the single quote in names, and a single letter cannot be a last name)
"Dawn Capital" as a personal name (organization names suppressed)
"Sun Chief Executive Scott McNealy", with "Sun" as a TIMEX because it abbreviates "Sunday", "Chief" as a
title because of a shortcut, and "Executive Scott McNealy" as a first-middle-last name triple
(organization names not marked and interpretation too local)
"does not consider himself a Butt-Head Astronomer" as a person with the title "Head" and the last name
"Astronomer" (interpretation too local)
Surprises
We were surprised by two things. First, abbreviations are so numerous and run so deep. Second, personal
pronouns do not obey the expected distance constraints, contrary to the claims in the existing literature
that 96% of pronominal referents are no further than three sentences away.
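The distance statistic can be made concrete as the share of pronoun-antecedent pairs whose sentence indices differ by at most three. The following is a minimal Python sketch. The `share_within` function and the sample pairs are hypothetical illustrations, not MUC-6 data. The sketch assumes each pair records the sentence of the pronoun and the sentence of its nearest antecedent mention.

```python
def share_within(pairs, max_distance=3):
    """Fraction of (pronoun sentence, antecedent sentence) pairs that are
    at most max_distance sentences apart."""
    if not pairs:
        return 0.0
    near = sum(1 for pron, ante in pairs if abs(pron - ante) <= max_distance)
    return near / len(pairs)


# Hypothetical annotations: sentence distances are 0, 1, 8 and 3.
pairs = [(0, 0), (5, 4), (10, 2), (7, 4)]
print(share_within(pairs))  # 0.75
```

On a corpus where the literature's claim held, this figure would be about 0.96. Our observation is that long-distance pairs such as (10, 2) are more common than that.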
Greatest Limiting Factors
For us, the greatest limiting factors were time, time, and time again. We also found it very hard to create and
debug large, typed knowledge bases of circa 20,000 entries. Our knowledge acquisition and debugging techniques
involved a combination of hand-crafting (the geographical knowledge base), scanning (abbreviations), material
that others shared with us (for example, our department gave us an on-line list of 800 or so universities), and
semi-automatic acquisition (our newly developed technique based on definite anaphora, mentioned earlier).
References
[exp, 1996] (1996). Special Issue of the International Journal of Expert Systems on Knowledge Representation
and Inference for Natural Language Processing. JAI Press, Inc.
[Ait-Kaci, 1986] Ait-Kaci, H. (1986). An algebraic semantics approach to the effective resolution of type
equations. Journal of Theoretical Computer Science, 45:293-351.
[Ait-Kaci and Richard Meyer and Peter Van Roy, 1993] Ait-Kaci, H., Meyer, R., and Van Roy, P. (1993).
Wild LIFE: A User Manual (Preliminary version). PRL Research Report, Digital Equipment
Corporation, Paris Research Laboratory.
[Barwise and Cooper, 1981] Barwise, J. and Cooper, R. (1981). Generalized quantifiers and natural language.
Linguistics and Philosophy, 4:159-219.
[Dowty et al., 1981] Dowty, D. R., Wall, R. E., and Peters, S. (1981). Introduction to Montague semantics.
D. Reidel Publ. Co.
[Earley, 1985] Earley, J. (1985). An efficient context-free parsing algorithm. In Readings in Natural Language
Processing, pages 25-33. Morgan Kaufmann Publishers, Inc.
[Hamm, 1989] Hamm, F. (1989). Natürlich-sprachliche Quantoren. Max Niemeyer Verlag.
[Horn, 1972] Horn, L. R. (1972). On the Semantic Properties of Logical Operators in English. IULC.
[Horn, 1989] Horn, L. R. (1989). A Natural History of Negation. The University of Chicago Press.
[Iwanska, 1989] Iwanska, L. (1989). Automated Processing of Narratives Written by 6-12 Grade Students:
The BILING Program. Technical Report UIUCDCS-R-89-1508, Dept. of Computer Science, University of
Illinois at Urbana-Champaign.
[Iwanska, 1992] Iwanska, L. (1992). A General Semantic Model of Negation in Natural Language: Representation
and Inference. In Proceedings of the Third International Conference on Principles of Knowledge
Representation and Reasoning (KR92), pages 357-368.
[Iwanska, 1993] Iwanska, L. (1993). Logical Reasoning in Natural Language: It Is All About Knowledge.
International Journal of Minds and Machines, Special Issue on Knowledge Representation for Natural
Language, 3(4):475-510.
[Iwanska, 1994] Iwanska, L. (1994). Talking about Time: Temporal Reasoning as A Problem of Natural
Language. In Working Notes of the AAAI Fall Symposium on Knowledge Representation for Natural
Language Processing in Implemented Systems, pages 70-81. Also appeared in Proceedings of the Third
International Workshop on Intelligent Information Systems, Springer Verlag.
[Iwanska, 1996a] Iwanska, L. (1996a). Definite Anaphora for Knowledge Acquisition. In preparation.
[Iwanska, 1996b] Iwanska, L. (1996b). Natural (Language) Temporal Logic: Reasoning about Absolute and
Relative Time. International Journal of Expert Systems (to appear).
[Keenan and Faltz, 1985] Keenan, E. L. and Faltz, L. M. (1985). Boolean Algebra Semantics Of Natural
Language. D. Reidel Publ. Comp.
[Montague, 1973] Montague, R. (1973). The Proper Treatment of Quantification in Ordinary English. In
Hintikka, J., Moravcik, J., and Suppes, P., editors, Approaches to Natural Language. D. Reidel Publ. Co.
[Palmer et al., 1993] Palmer, M., Passonneau, R., Weir, C., and Finin, T. (1993). The KERNEL Text
Understanding System. Artificial Intelligence, 63:17-69.
[Pereira, 1987] Pereira, F. (1987). Grammars and logics of partial information. In Proceedings of the
International Conference on Logic Programming, volume 2, pages 989-1013.
[Winograd, 1983] Winograd, T. (1983). Language as a Cognitive Process. Addison-Wesley Publ. Comp.
