Recognizing Referential Links: 
An Information Extraction Perspective 
Megumi Kameyama 
Artificial Intelligence Center 
SRI International 
333 Ravenswood Ave., Menlo Park, CA 94025, U.S.A. 
megumi@ai, sri. tom 
Abstract 
We present an efficient and robust reference reso- 
lution algorithm in an end-to-end state-of-the-art 
information extraction system, which must work 
with a considerably impoverished syntactic analy- 
sis of the input sentences. Considering this disad- 
vantage, the basic setup to collect, filter, then order 
by salience does remarkably well with third-person 
pronouns, but needs more semantic and discourse 
information to improve the treatments of other ex- 
pression types. 
Introduction 
Anaphora resolution is a component technology of 
an overall discourse understanding system. This 
paper focuses on reference resolution in an infor- 
mation extraction system, which performs a par- 
tial and selective 'understanding' of unrestricted 
discourses. 
Reference Resolution in IE 
An information extraction (IE) system automat- 
ically extracts certain predefined target informa- 
tion from real-world online texts or speech tran- 
scripts. The target information, typically of the 
form "who did what to whom where when," is ex- 
tracted from natural language sentences or format- 
ted tables, and fills parts of predefined template 
data structures with slot values. Partially filled 
template data objects about the same entities, en- 
tity relationships, and events are then merged to 
create a network of related data objects. These 
template data objects depicting instances of the 
target information are the raw output of IE, ready 
for a wide range of applications such as database 46 
updating and summary generation. 1 
In this IE context, reference resolution takes the 
form of merging partial data objects about the 
same entities, entity relationships, and events de- 
scribed at different discourse positions. Merging 
in IE is very difficult, accounting for a significant 
portion of IE errors in the final output. This pa- 
per focuses on the referential relationships among 
entities rather than the more complex problem of 
event merging. 
An IE system recognizes particular target infor- 
mation instances, ignoring anything deemed irrel- 
evant. Reference resolution within IE, however, 
cannot operate only on those parts that describe 
target information because anaphoric expressions 
within target linguistic patterns may have an- 
tecedents outside of the target, and those that oc- 
cur in an apparently irrelevant pattern may ac- 
tually resolve to target entities. For this reason, 
reference resolution in the IE system needs ac- 
cess to all of the text rather than some selective 
parts. Furthermore, it is reasonable to assume that 
a largely domain-independent method of reference 
resolution can be developed, which need not be 
tailored anew each time a new target is defined. 
In this paper, I discuss one such entity refer- 
ence resolution algorithm for a general geo-political 
business domain developed for SRI's FASTUS TM 
system (Hobbs et al., 1996), one of the leading IE 
systems, which can also be seen as a representative 
of today's IE technology. 
IThe IE technology has undergone a rapid development 
in the 1990s driven by the series of Message Understand- 
ing Conferences (MUCs) in the U.S. government-sponsored 
TIPSTER program (http ://www. tipster, org). 
The Input to Reference Resolution 
Multiple top-scoring sites working on IE have con- 
verged on the use of finite-state linguistic patterns 
applied in stages of smaller to larger units. This 
finite-state transduction approach to IE, first intro- 
duced in SRI's FASTUS, has proven effective for 
real-world texts because full parsing is far too am- 
biguous, slow, and brittle against real-world sen- 
tences. This means that we cannot assume correct 
and full syntactic structures in the input to ref- 
erence resolution in a typical IE system. The in- 
put is a set of (often overlapping or discontiguous) 
finite-state approximations of sentence parts. We 
must approximate fine-grained theoretical propos- 
als about referential dependencies, and adapt them 
to the context of sparse and incomplete syntactic 
input. 
The input to reference resolution in the the- 
oretical literature is assumed to be fully parsed 
sentences, often with syntactic attributes such as 
grammatical functions and thematic roles on the 
constituents (Webber, 1978; Sidner, 1979; Hobbs, 
1978; Grosz, Joshi, and Weinstein, 1995). In im- 
plemented reference resolution systems, for pro- 
noun resolution in particular, there seems to be 
a trade-off between the completeness of syntac- 
tic input and the robustness with real-world sen- 
tences. In short, more robust and partial parsing 
gives us wider coverage, but less syntactic infor- 
mation also leads to less accuyate reference reso- 
lution. For instance, Lappin and Leass (1994) re- 
port an 86% accuracy for a resolution algorithm 
for third-person pronouns using fully parsed sen- 
tences as input. Kennedy and Boguraev (1996) 
then report a 75% accuracy for an algorithm that 
approximates Lappin and Leass's with more robust 
and coarse-grained syntactic input. After describ- 
ing the algorithm in the next section, I will briefly 
compare the present approach with these pronoun 
resolution approaches. 
Algorithm 
This algorithm was first implemented for the 
MUC-6 FASTUS system (Appelt et al., 1995), and 
produced one of the top scores (a recall of 59% 
and precision of 72%) in the MUC-6 Coreference 
Task, which evaluated systems' ability to recog- 47 
nize coreference among noun phrases (Sundheim, 
1995). Note that only identity of reference was 
evaluated there. 2 
The three main factors in this algorithm are (a) 
accessible text regions, (b) semantic consistency, 
and (c) dynamic syntactic preference. The algo- 
rithm is invoked for each sentence after the earlier 
finite-state transduction phases have determined 
the best sequence(s) of nominal and verbal expres- 
sions. Crucially, each nominal expression is associ- 
ated with a set of template data objects that record 
various linguistic and textual attributes of the re- 
ferring expressions contained in it. These data ob- 
jects are similar to discourse referents in discourse 
semantics (Karttunen, 1976; Kamp, 1981; Heim, 
1982; Kamp and Reyle, 1993), in that anaphoric 
expressions such as she are also associated with 
corresponding anaphoric entities. A pleonastic it 
has no associated entities. Quantificational nomi- 
nals such as each company are associated with en- 
tity objects because they are 'anaphoric' to group 
entities accessible in the context. In this setup, the 
effect of reference resolution is merging of multiple 
entity objects. Here is the algorithm. 
1. INPUT: Template entities with the following 
textual, syntactic, and semantic features: 
(a) determiner type (e.g., DEF, INDEF, PRON) 
(b) grammatical or numerical number (e.g., SG, 
PL, 3) 
(c) head string (e.g., automaker) 
(d) the head string sort in a sort hierarchy (e.g., 
aut omaker-+ company-+organizat ion) 
(e) modifier strings (e.g., newly founded, with the 
pilots) 
(f) text span (the start and end byte positions) 
(g) sentence and paragraph positions 3 
(h) text region type (e.g., HEADLINE, TEXT) 
2Other referential relationships such as subset and part- 
whole did not reach sufficiently reliable interannotator 
agreements. Only identity of reference had a sufficiently 
high agreement rate (about 85%) between two human 
annotators. 
ZHigher text structure properties such as subsections and 
sections should also be considered if there are any. Exact 
accessibility computation using complex hierarchical text 
structures is a future topic of study. 
2. FOR EACH potentially anaphoric entity object 
in the current sentence, in the left-to-right order, 
DO 
(1) COLLECT antecedent entity objects from the 
accessible text region. 
• For an entity in a HEADLINE text region, 
the entire TEXT is accessible because the 
headline summarizes the text. 
• For an entity in a TEXT region, everything 
preceding its text span is accessible (except for 
the HEADLINE). Intrasentential cataphora 
is allowed only for first-person pronouns. 
• In addition, a locality assumption on 
anaphora sets a (soft) window of search for 
each referring expression type--the entire 
preceding text for proper names, narrower for 
definite noun phrases, even narrower for pro- 
nouns, and only the current sentence for re- 
flexives. In the MUC-6 system, the window 
size was arbitrarily set to ten sentences for 
definites and three sentences for pronouns, 
ignoring paragraph boundariesl and no an- 
tecedents beyond the limit were considered. 
This clearly left ample room for refinement. 4 
(2) FILTER with semantic consistency between 
the anaphoric entity E1 and the potential an- 
tecedent entity E2. 
• Number Consistency: El's number must be 
consistent with E2's number--for example, 
twelve is consistent with PLURAL, but not 
with SINGULAR. As a special case, plural 
pronouns (they, we) can take singular organi- 
zation antecedents. 
• Sort Consistency: El's sort must ei- 
ther EQUAL or SUBSUME E2's sort. 
This reflects a monotonicity assumption on 
anaphora--for example, since company sub- 
sumes automaker, the company can take a 
Chicago-based automaker as an antecedent, 
but it is too risky to allow the automaker 
to take a Chicago-based company as an 
antecedent. On the other hand, since 
4For example, in a more recent FASTUS system, para- 
graphs are also considered in setting the limit, and at most 
one candidate beyond the limit is proposed when no candi- 
dates are found within the limit. 48 
automaker and airline are neither the same 
sort nor in a subsumption relation, an au- 
tomaker and the airline cannot corefer. (The 
system's sort hierarchy is still sparse and in- 
complete.) 
• Modifier Consistency: El's modifiers must 
be consistent with E2's modifiers--for ex- 
ample, French and British are inconsistent, 
but French and multinational are consistent. 
(The system doesn't have enough knowledge 
to do this well.) 
(3) ORDER by dynamic syntactic preference. The 
following ordering approximates the relative 
salience of entities. The basic underlying 
hypothesis is that intrasentential candidates 
are more salient than intersentential candi- 
dates as proposed, for example, in Hobbs 
(1978) and Kameyama (in press), and that 
fine-grained syntax-based salience fades with 
time. Since fine-grained syntax with gram- 
matical functions is unavailable, the syntactic 
prominence of subjects and left-dislocation is 
approximated by the left-right linear ordering. 
i. the preceding part of the same sentence in 
the left-right order 
ii. the immediately preceding sentence in the 
left-right order 
iii. other preceding sentences within the 'limit' 
(see above) in the right-left order 
3. OUTPUT: After each anaphoric entity has found 
an ordered set of potential antecedent entities, 
there are destructive (indefeasible) and nonde- 
structive (defeasible) options. 
(a) Destructive Option: MERGE the anaphoric 
entity into the preferred antecedent entity . 
(b) Nondestructive Option: RECORD the an- 
tecedent entity list in the anaphoric entity to 
allow reordering (i.e., preference revisions) by 
event merging or overall model selection. 
The MUC-6 system took the destructive op- 
tion. The nondestructive option has been im- 
plemented in a more recent system. 
These basic steps of "COLLECT, FILTER, and 
ORDER by salience" are analogous to Lappin and 
Leass's (1994) pronoun resolution algorithm, but 
each step in FASTUS relies on considerably poorer 
syntactic input. The present algorithm thus pro- 
vides an interesting case of what happens with 
extremely poor syntactic input, even poorer than 
in Kennedy and Boguraev's (1996) system. This 
comparison will be discussed later. 
Name Alias Recognition 
In addition to the above general algorithm, a 
special-purpose alias recognition algorithm is in- 
voked for coreference resolution of proper names. 5 
1. INPUT: The input English text is in mixed 
cases. An earlier transduction phase has rec- 
ognized unknown names as well as specific- 
type names for persons, locations, or 
organizations using name-internal pattern 
matching and known name lists. 
2. FOR EACH new sentence, FOR EACH unknown 
name, IF it is an alias or acronym of an- 
other name already recognized in the given text, 
MERGE the two--an alias is a selective sub- 
string of the full name (e.g., Colonial for Colo- 
nial Bee)~, and acronym is a selective sequence 
of initial characters in the full name (e.g., GM 
for General Motors). 
Overall Performance 
The MUC-6 FASTUS reference resolution algo- 
rithm handled only coreference (i.e., identity of ref- 
erence) of proper names, definites, and pronouns. 
These are the 'core' anaphoric expression types 
whose dependencies tend to be constrained by sur- 
face textual factors such as locality. The MUC-6 
Coreference Task evaluation included coreference 
of bare nominals, possessed nominals, and indefi- 
nites as well, which the system did not handle be- 
cause we didn't have a reliable algorithm for these 
mostly 'accidental' coreferences that seemed to re- 
quire deeper inferences. Nevertheless, the system 
scored a recall of 59% and precision of 72% in the 
blind evaluation of thirty newspaper articles. 
5 In addition, a specific-type name may be converted into 
another type in certain linguistic contexts. For instance, in 
a subsidiary of Mrs. Field, Mrs. Field is converted from a 
person name into a company name. 49 
Expression Type Number of 
Occurrences 
Definites ' 61 
Pronouns 39 
Proper Names 32 
Reflexives 1 
TOTAL 133 
Correctly 
Resolved 
28(46%) 
24(62%) 
22(69%) 
1(100%) 
75(56%) 
Table 1: Core Discourse Anaphors in Five Articles 
Grammatical 
Person 
3rd person 
3rd person 
that 
lst/2nd person 
reflexive 
Intra/Inter-S 
Antecedent 
intra-S 
inter-S 
inter-S 
inter-S 
intra-S 
Number of 
Occurrences 
27 
6 
1 
5 
1 
Correctly 
Resolved 21(78%) 
2(33%) 0(0%) 
1(20%) 
1(100%) 
Table 2: Pronouns in Five Articles 
Table 1 shows the system's performance in re- 
solving the core discourse anaphors in five ran- 
domely selected articles from the development set. 
Only five articles were examined here because the 
process was highly time-consuming. The perfor- 
mance for each expression type varies widely from 
article to article because of unexpected features of 
the articles. For instance, one of the five articles is 
a letter to the editor with a text structure drasti- 
cally different from news reports. On average, we 
see that the resolution accuracy (i.e., recall) was 
the highest for proper names (69%), followed by 
pronouns (62%) and definites (46%). There were 
not enough instances of reflexives to compare. 
Table 2 shows the system's performance for pro- 
nouns broken down by two parameters, gram- 
matical person and inter- vs. intrasentential 
antecedent. The system did quite well (78%) 
with third-person pronouns with intrasentential 
antecedents, the largest class of such pronouns. 
Part of the pronoun resolution performance here 
enables a preliminary comparison with the results 
reported in (1) Lappin and Leass (1994) and (2) 
Kennedy and Boguraev (1996). For the third- 
person pronouns and reflexives, the performance 
was (1) 86% of 560 cases in five computer manu- 
als and (2) 75% of 306 cases in twenty-seven Web 
page texts. The present FASTUS system correctly 
resolved 71% of 34 cases in five newspaper arti- 
cles. This progressive decline in performance cor- 
responds to the progressive decline in the amount 
of syntactic information in the input to reference 
resolution. To summarize the latter decline, Lap-- 
pin and Leass (1994) had the following components 
in their algorithm. 
1. INPUT: fully parsed sentences with grammati- 
cal roles and head-argument and head-adjunct 
relations 
2. Intrasentential syntactic filter based on syntactic 
noncoreference 
3. Morphological filter based on person, number, 
and gender features 
4. Pleonastic pronoun recognition 
5. Intrasentential binding for reflexives and recip- 
rocals 
6. Salience computation based on grammatical 
role, grammatical parallelism, frequency of men- 
tion, proximity, and sentence recency 
7. Global salience computation for noun phrases 
(NPs) in equivalence classes (with seven salience 
factors) 
8. Decision procedure for choosing among equally 
preferred candidate antecedents 
Kennedy and Boguraev (1996) approximated the 
above components with a poorer syntactic input, 
which is an output of a part-of-speech tagger with 
grammatical function information, plus NPs recog- 
nized by finite-state patterns and NPs' adjunct and 
subordination contexts recognized by heuristics. 
With this input, grammatical functions and prece- 
dence relations were used to approximate 2 and 
5. Finite-state patterns approximated 4. Three 
additional salience factors were used in 7, and a 
preference for intraclausal antecedents was added 
in 6; 3 and 8 were the same. 
The present algorithm works with an even poorer 
syntactic input, as summarized here. 
1. INPUT: a set of finite-state approximations of 
sentence parts, which can be overlapping or dis- 
contiguous, with no grammatical function, sub- 
ordination, or adjunct information. 
. 
3. 
4. 
. 
50 
No disjoint reference filter is used. 
Morphological filter is used. 
Pleonastic pronouns are recognized with finite- 
state patterns. 
Reflexives simply limit the search to the current 
sentence, with no attempt at recognizing coar- 
guments. No reciprocals are treated. 
. Salience is approximated by computation based 
on linear order and recency. No grammatical 
parallelism is recognized. 
. Equivalence classes correspond to merged entity 
objects whose 'current' positions are always the 
most recent mentions. 
8. Candidates are deterministically ordered, so no 
decision procedure is needed. 
Given how little syntactic information is used in 
FASTUS reference resolution, the 71% accuracy in 
pronoun resolution is perhaps unexpectedly high. 
This perhaps shows that linear ordering and re- 
cency are major indicators of salience, especially 
because grammatical functions correspond to con- 
stituent ordering in English. The lack of disjoint 
reference filter is not the most frequent source of 
errors, and a coarse-grained treatment of reflexives 
does not hurt very much, mainly because of the in- 
frequency of reflexives. 
An Example Analysis 
In the IE context, the task of entity reference res- 
olution is to recognize referential links among par- 
tially described entities within and across docu- 
ments, which goes beyond third-person pronouns 
and identity of reference. The expression types to 
be handled include bare nominals, possessed nomi- 
nals, and indefinites, whose referential links tend to 
be more 'accidental' than textually signaled, and 
the referential links to be recognized include sub- 
set, membership, and part-whole. 
Consider one of the five articles evaluated, the 
one with the most number and variety of referential 
links, for which FASTUS's performance was the 
poorest. Even for the 'core' anaphoric expressions 
of 20 definites, 14 pronouns, and 7 names limited 
Expression Type Number of 
Occurrences 
Definites 
Pronouns 
Bare Nominals 
Proper Names 
Possessed Nominals 
Indefinites 
TOTAL 
25 
14 
12 
7 
6 
3 
Correctly 
Resolved 
10(40%) 
10(71%) 3(25%) 
2(29%) 0(0%) 
0(0%) 
25(37%) 
Table 3: Referential Links in the Example 
to coreference, for which the code was prepared, 
the recall was only 51%. 
Figure 1 shows this article annotated with ref- 
erential indices. The same index indicates corefer- 
ence. Index subscripts (such as 4a) indicate subset, 
part, or membership of another expression (e.g., 
indexed 4). The index number ordering, 1,...,N, 
has no significance. Each sentence in the TEXT 
region is numbered with paragraph and sentence 
numbers, so, for instance, 2-1 is the first sentence 
in the second paragraph. 
Note that not all of these referential links need 
be recognized for each particular IE application. 
However, since reference resolution must consider 
all of the text for any particular application for the 
reason mentioned above, it is reasonable to assume 
that an ideal domain-independent reference resolu- 
tion component should be able to recognize all of 
these. Is this a realistic goal, especially in an IE 
context? This question is left open for now. 
Error Analysis 
Table 3 shows the system's performance in recog- 
nizing the referential links in this article, grouped 
by referring expression types. These exclude the 
initial mention of each referential chain. Notable 
sources of errors and necessary extensions are sum- 
marized for each expression type here. 
Pronouns: Of the four pronoun resolution errors, 
one is due to a parse error (American in 7-1 was 
incorrectly parsed as a person entity, to which 
she in 8-1 was resolved), that in 6-2 is a discourse 
deixis (Webber, 1988), beyond the scope of the 
current approach, and two errors (it in 3-1 and 
its in 7-1) were due to the left-right ordering of 
intrasentential candidates. Recognition of par- 51 
allelism among clause conjuncts and a stricter 
locality preference for possessive pronouns may 
help here. 
Definites: Of the fifteen incorrect resolutions of 
definites, five have nonidentity referential rela- 
tionships, and hence were not handled. These 
nonidentity cases must be handled to avoid er- 
roneous identity-only resolutions. Two errors 
were due to the failure to distinguish between 
generic and specific events. Token-referring def- 
inites (the union in 8-2 and the company in 9-1) 
were incorrectly resolved to recently mentioned 
types. Three errors were due to the failure to 
recognize synonyms between, for example, call 
(3-2) vs. request (3-1) and campaign (9-1) vs. 
strategy (9-1). Other error sources are a failure 
in recognizing an appositive pattern (9-1), the 
left-right ordering of the candidates in the pre- 
vious sentence (9-2), and three 'bugs'. 
Proper Names: Name alias recognition was un- 
usually poor (2 out of 7) because American was 
parsed as a person-denoting noun. The lower- 
case print in Patt gibbs also made it difficult to 
link it with Ms. Gibbs. Such parse errors and 
input anomalies hurt performance. 
Bare Nominals: Since bare nominaJs were not 
explicitly resolved, the only correct resolutions 
(3 out of 12) were due to recognition of ap- 
positive patterns. How can the other cases be 
treated? We need to understand the discourse 
semantics of bare nominMs better before devel- 
oping an effective algorithm. 
Possessed Nominals: A systematic treatment of 
possessed nominals is a necessary extension. The 
basic algorithm will look like this--resolve the 
possessor entity A of the possessed nominal 
ASs B, then if there is already an entity of type 
B associated with A mentioned anywhere in the 
preceding text, then this is the referent of B. Pos- 
sessed nominal resolution also requires a 'syn- 
onymy' computation to resolve, for example, its 
battle (9-1) to its corporate campaign (7-1). It 
also needs 'inferences' that rely on multiple suc- 
cessful resolutions. For instance, her members 
in 8-1 must first resolve her to Ms. Gibbs, who 
HEADLINE: American Airlines1 Calls for Mediation15 In Its1 Union4 Talks2 
DOCUMENT DATE: 02/09/87 
SOURCE: WALL STREET JOURNAL 
1-1 Amr corp.'s American Airlines1 unit said it1 has called for federal mediation15 in itsl contract talks2 with unions4 
representing its1 pilots10 and flight attendants17. 
2-1 A spokesman for the companyl said Americanl officials "felt talks2 had reached a point where mediation15 would be 
helpful." 
2-2 Negotiations2a with the pilots16 have been going on for 11 months; talks2b with flight attendantslv began six months 
ago. 
3-1 The president5 of the Association of Professional Flight Attendants4a, which represents Americanl's more than 
10,000 flight attendantsl~, called the requests for mediationl~ "premature" and characterized its as a bargaining 
tactic that could lead to a lockout. 
3-2 Patt gibbss, presidents of the association4a, said talks2b with the company1 seemed to be progressing well and the 
calls for mediation15 came as a surprise. 
4-1 The major outstanding issue in the negotiations2b with the flight attendants17 is a two-tier wage scale, in which recent 
employees' salaries increase on a different scale than the salaries of employees who have worked at american1 for a longer 
time. 
4-2 The union4a wants to narrow the differences between the new scale and the old one. 
5-1 The company1 declined to comment on the negotiatlons2b or the outstanding issues. 
5-2 Representatives for the 5,400-member Allied Pilots Association4b didn't return phone calls. 
6-1 Under the Federal Railway Labor Act, \[if the mediatorls~ fails to bring the two sides~ together and the two sidesT 
don't agree to binding arbitration, a 30-day cooling-off period follows\]0. 
6-2 After that6, the unionTa can strike or the companyT~ can lock the unlon~a out. 
7-1 Ms. Gibbs5 said that in response to the companyl's move, hers union4a will be "escalating" its4~ "corporate campaign" 
against American1 over the next couple of months. 
7-2 In a corporate campaign10, a union9 tries to get a companys's financiers, investors, directors and other financial 
partners to pressure the companys to meet union9 demands. 
8--1 A corporate campaign~0, she5 said, appeals to her5 membersl~ because "it10 is a nice, clean way to take a job action, 
and our4~ women17 are hired to be nice." 
8-2 the union4a has decided not to strike, she5 said. 
9-1 The union4~ has hired a number of professional consultants14 in its4~ battle18 with the company1, including Ray 
Rogers14,~ of Corporate Campaign Inc., the New York labor consultant14a who developed the strategy12 at Geo. 
A. Hormel ~ Co.ls's Austin, Minn., meatpacking plant last year. 
9-2 That campaign12, which included a strike, faltered when the company13 hired new workers and the International 
Meatpacking Union wrested control of the local union from Rogers14a' supporters. 
Figure 1: Example Article Annotated with Referential Links 
52 
is president of the association, and this 'associa- 
tion' is the Association of Professional Flight At- 
tendants. After it is understood as 'the members 
of the Association of Professional Flight Atten- 
dants,' the coreference with the flight attendants 
can be inferred. Similarly for our women in 8-1. 
Indefinites: Some indefinites are 'coreferential' to 
generic types, for example, a corporate campaign 
(7-2, 8-1). Given the difficulty in distinguishing 
between generic and specific event descriptions, 
it is unclear whether it will ever be treated in a 
systematic way. 
Conclusions 
In an operational end-to-end discourse understand- 
ing system, a reference resolution component must 
work with input data containing parse errors, lex- 
icon gaps, and mistakes made by earlier reference 
resolution. In a state-of-the-art IE system such 
as SRI's FASTUS, reference resolution must work 
with considerably impoverished syntactic analysis 
of the input sentences. The present reference res- 
olution approach within an IE system is robust 
and efficient, and performs pronoun resolution to 
an almost comparable level with a high-accuracy 
algorithm in the literature. Desirable extensions 
include nonidentity referential relationships, treat- 
ments of bare nominals, possessed nominals, and 
indefinites, type-token distinctions, and recogni- 
tion of synonyms. Another future direction is to 
turn this component into a corpus-based statisti- 
cal approach using the relevant factors identified 
in the rule-based approach. The need for a large 
tagged corpus may be difficult to satisfy, however. 
Acknowledgment 
This work was in part supported by U.S. govern- 
ment contracts~ _--- _" __:3 

References 
Appelt, Douglas, Jerry Hobbs, John Bear, David Israel, 
Megumi Kameyama, Andy Kehler, David Martin, Karen 
Myers, and Mabry Tyson. 1995. SRI International FAS- 
TUS system: MUC-6 test results and analysis. In Pro- 
ceedings of the 6th Message Understanding Conference, 
pages 237-248. DARPA. 53 
Grosz, Barbara, Aravind Joshi, and Scott Weinstein. 1995. 
Centering: A framework for modelling the local coherence 
of discourse. Computational Linguistics, 21(2):203-226. 
Heim, Irene. 1982. The Semantics off Definite and Indefinite 
Noun Phrases. Ph.D. thesis, University of Massachusetts 
at Amherst. 
Hobbs, Jerry. 1978. Resolving pronoun references. Lin- 
gua, 44:311-338. Also in B. Grosz, K. Sparck-Jones, and 
B. Webber, eds., Readings in Natural Language Process- 
ing, Morgan Kaufmann, Los Altos, CA, 1986, 339-352. 
Hobbs, Jerry R., Douglas E. Appelt, John Bear, David 
Israel, Megumi Kameyama, Mark Stickel, and Mabry 
Tyson. 1996. FASTUS: A cascaded finite-state trans- 
ducer for extracting information from natural-language 
text. In E. Roche and Y. Schabes, editors, Finite State 
Devices \]for Natural Language Processing. MIT Press, 
Cambridge,Massachusetts. 
Kameyama, Megumi. in press. Intrasentential centering: A 
case study. In Marilyn Walker, Aravind Joshi, and Ellen 
Prince, editors, Centering in Discourse. Oxford Univer- 
sity Press. 
Kamp, Hans. 1981. A theory of truth and semantic repre- 
sentation. In J. Groenendijk, T. Janssen, and M. Stokhof, 
editors, Formal Methods in the Study of Language. Math- 
ematical Center, Amsterdam, pages 277-322. 
Kamp, Hans and Uwe Reyle. 1993. From Discourse to 
Logic. Kluwer, Dordrecht. 
Karttunen, Lauri. 1976. Discourse referents. In James D. 
McCawley, editor, Syntax and Semantics: Notes \]from the 
Linguistic Underground, volume 7. Academic Press, New 
York, pages 363-386. 
Kennedy, Christopher and Branimir Boguraev. 1996. 
Anaphora for everyone: Pronominal anaphora resolu- 
tion without a parser. In Proceedings off the 16th 
International Con\]ference on Computational Linguistics 
(COLING-'96}. Association for Computational Linguis- 
tics. 
Lappin, Shalom and Herbert Leass. 1994. An algorithm for 
pronominal anaphora resolution. Computational Linguis- 
tics, 20(4):535-562. 
Sidner, Candace. 1979. Towards a computational theory 
of definite anaphora comprehension in English discourse. 
Technical Report 537, MIT Artificial Intelligence Labo- 
ratory, Cambridge, MA, June. 
Sundheim, Beth. 1995. Overview of results of the MUC- 
6 evaluation. In Proceedings of the 6th Message Under- 
standing Conference, pages 13-32. DARPA. 
Webber, Bonnie. 1988. Discourse deixis: Reference to 
discourse segments. In Proceedings of the 26th Annual 
Meeting off the Association \]for Computational Linguis- 
tics, pages 113-122. Association for Computational Lin- 
guistics, June. 
Webber, Bonnie Lynn. 1978. A Formal Approach to Dis- 
course Anaphora. Ph.D. thesis, Harvard University. 
