A LANGUAGE-INDEPENDENT ANAPHORA RES()LUTION 
SYSTEM FOR UNDERSTANDING MULTILINGUAL TEXTS 
Chinatsu Aone and Douglas McKee 
Systems Research and Applications (SRA) 
2000 15th Street North 
Arlington, VA 22201 
aonec@sra.com, mckeed@sra.com 
Abstract 
This paper describes a new discourse module 
within our multilingual NLP system. Because of 
its unique data-driven architecture, the discourse 
module is language-independent. Moreover, the 
use of hierarchically organized multiple knowledge 
sources makes the module robust and trainable using 
discourse-tagged corpora. Separating discourse phe- 
nomena from knowledge sources makes the discourse 
module easily extensible to additional phenomena. 
1 Introduction 
This paper describes a new discourse module within 
our multilingual natural language processing system 
which has been used for understanding texts in En- 
glish, Spanish and Japanese (el. \[1, 2\])) The follow- 
ing design principles underlie the discourse module: 
• Language-independence: No processing code de- 
pends on language-dependent facts. 
• Extensibility: It is easy to handle additional phe- 
nomena. 
• Robustness: The discourse module does its best 
even when its input is incomplete or wrong. 
• Trainability: The performance can be tuned for 
particular domains and applications. 
In the following, we first describe the architecture 
of the discourse module. Then, we discuss how its 
performance is evaluated and trained using discourse- 
tagged corpora. Finally, we compare our approach to 
other research. 
1 Our system has been used in several data extraction tasks 
and a prototype nlachine translation systeln. 
perfo.m .... ~nti ~u2k c$~ " e dv 
....... ................... 
r ............ . ....... o ..................................... -, 
l:)i~ ~ Module 
Figure 1: Discourse Architecture 
2 Discourse Architecture 
Our discourse module consists of two discourse pro- 
cessing submodules (the Discourse A dministralor and 
the Resolution Engine), and three discourse knowl- 
edge bases (the Discourse Knowledge Source KB, 
the Discourse Phenomenon KB, and the Discourse 
Domain KB). The Discourse Administrator is a 
development-time tool for defining the three dis- 
course KB's. The Resolution Engine, on the other 
hand, is the run-time processing module which ac- 
tually performs anaphora resolution using these dis- 
course KB's. 
The Resolution Engine also has access to an ex- 
ternal discourse data structure called the global dis- 
course world, which is created by the top-level text 
processing controller. The global discourse world 
holds syntactic, semantic, rhetorical, and other infor- 
mation about the input text derived by other parts 
of the system. The architecture is shown in Figure i. 
2.1 Discourse Data Structures 
There are four major discourse data types within the 
global discourse world: Discourse World (DW), \[)is- 
156 
course Clause (DC), Discourse Marker (DM), and 
File Card (FC), as shown in Figure 2. 
The global discourse world corresponds to an entire 
text, and its sub-discourse worlds correspond to sub- 
components of the text such as paragraphs. Discourse 
worlds form a tree representing a text's structure. 
A discourse clause is created for each syntactic 
structure of category S by the semantics module. It 
can correspond to either a full sentence or a part of a 
flfll sentence. Each discourse clause is typed accord- 
ing to its syntactic properties. 
A discourse marker (cf. Kamp \[14\], or "discourse 
entity" in Ayuso \[3\]) is created for each noun or verb 
in the input sentence during semantic interpietation. 
A discourse marker is static in that once it is intro- 
duced to the discourse world, the information within 
it is never changed. 
Unlike a discourse marker, a file card (cf. Heim \[11\], 
"discourse referent" in Karttunen \[15\], or "discourse 
entity" in Webber \[19\]) is dynamic in a sense that 
it is continually updated as the discourse process- 
ing proceeds. While an indefinite discourse marker 
starts a file card, a definite discourse marker updates 
an already existing file card corresponding to its an- 
tecedent. In this way, a file card keeps track of all 
its co-referring discourse markers, and accumulates 
semantic information within them. 
2.2 Discourse Administrator 
Our discourse module is customized at development 
time by creating and modifying the three discourse 
KB's using the Discourse Administrator. First, a dis- 
course domain is established for a particular NLP ap- 
plication. Next, a set of discourse phenomena which 
should be handled within that domain by the dis- 
course module is chosen (e.g. definite NP, 3rd per- 
son pronoun, etc.) because some phenomena may 
not be necessary to handle for a particular applica- 
tion domain. Then, for each selected discourse phe- 
nomenon, a set of discourse knowledge sources are 
chosen which are applied during anaphora resolution, 
since different discourse phenomena require different 
sets of knowledge sources. 
2.2.1 Discourse Knowledge Source KB 
The discourse knowledge source KB houses small 
well-defined anaphora resolution strategies. Each 
knowledge source (KS) is an object in the hierarchi- 
cally organized KB, and information in a specific KS 
can be inherited from a more general KS. 
There are three kinds of KS's: a generator, a filter 
and an orderer. A generator is used to generate pos- 
w w     
hi* Edit '~4=1p 
/ 10 ..... J 
't-- "F'~-''=~ I 
i 
Figure 3: Discourse Knowledge Source KB 
sible antecedent hypotheses from the global discourse 
world. Unlike other discourse systems, we have multi- 
ple generators because different discourse phenomena 
exhibit different antecedent distribution patterns (cf. 
Guindon el al. \[10\]). A filter is used to eliminate im- 
possible hypotheses, while an orderer is used to rank 
possible hypotheses in a preference order. The KS 
tree is shown in Figure 3. 
Each KS contains three slots: ks-flmction, ks-data, 
and ks-language. The ks-function slot contains a 
functional definition of the KS. For example, the func- 
tional definition of the Syntactic-Gender filter defines 
when the syntactic gender of an anaphor is compati- 
ble with that of an antecedent hypothesis. A ks-data 
slot contains data used by ks-function. The sepa- 
ration of data from function is desirable because a 
parent KS can specify ks-function while its sub-KS's 
inherit the same ks-function but specify their own 
data. For example, in languages like English and 
Japanese, the syntactic gender of a pronoun imposes 
a semantic gender restriction on its antecedent. An 
English pronoun "he", for instance, can never refer 
to an NP whose semantic gender is female like "Ms. 
Smith". The top-level Semantic-Gender KS, then, 
defines only ks-flmction, while its sub-KS's for En- 
glish and Japanese specify their own ks-data and in- 
herit the same ks-function. A ks-language slot speci- 
fies languages if a particular KS is applicable for spe- 
cific languages. 
Most of the KS's are language-independent (e.g. 
all the generators and the semantic type filters), and 
even when they are language-specific, the function 
157 
(defframe discourse-world (discourse-d*ta-structure) 
date 
location 
topics 
position 
discourse-clauses 
s u b-discou rse-worlds~ 
; DW 
date of the text 
; loc~tion where the text is originated 
; semantic concepts which correspond to globM topics of the text 
; the corresponding character position in the text 
; ~ list of discourse clauses in the current DW 
; a list of DWs subordinate to the current one 
(defframe discourse-clause (discourse-d~ta-structure ; D(: 
discourse-markers ; ~ list of discourse m~rkers in the current D(:~ 
syntax ; ~n f-structure for the current DC 
parse-tree ; ~ p~rse tree of this S 
semantics ; ~ semantic (KB) object representing the current DC 
position ; the corresponding character position in the text 
d~te ; date of the current DC~ 
loca.tion ; Ioco.tlon of the current D(2 
subordinate-discourse-clsuse ; a DC," subordinate to the current D(: 
coordin~te-dlscourse-clattses) ; coordinate DC's which a conjoined sentence consists of 
II (dell di .... ........... ker(dl d ture' ;DM ........ ......... Jr 
position ; the corresponding character position in the text 
discourse-clause ; a pointer b~ck to DC: 
syntax ; an f-structure for the current DM 
semantics ; a semantic (KB) object 
file card) ; a pointer to the file card 
(deffr&me file-card (discourse-d~t~-structure) 
co-referring-discou rse-m~r kers 
u pd ated-semantic-info) 
; FC: 
a list of co-referring DM's 
; a semantic (KB) object which contains cumulative sem&ntlcs 
Figure 2: Discourse World, Discourse Clause, Discourse Marker, and File Card 
definitions are shared. In this way, much of the dis- 
course knowledge source KB is sharable across differ- 
ent languages. 
2.2.2 Discourse Phenomenon KB 
The discourse phenomenon KB contains hierarchi- 
cally organized discourse phenomenon objects as 
shown in Figure 4. Each discourse phenomenon ob- 
ject has four slots (alp-definition, alp-main-strategy, 
dp-backup-strategy, and dp-language) whose values 
can be inherited. The dp-definilion of a discourse 
phenomenon object specifies a definition of the dis- 
course phenomenon so that an anaphoric discourse 
marker can be classified as one of the discourse phe- 
nomena. The dp-main-strategy slot specifies, for each 
phenomenon, a set of KS's to apply to resolve this 
particular discourse phenomenon. The alp-backup- 
strategy slot, on the other hand, provides a set of 
backup strategies to use in case the main strategy 
fails to propose any antecedent hypothesis. The dp- 
language slot specifies languages when the discourse 
phenomenon is only applicable to certain languages 
(e.g. Japanese "dou" ellipsis). 
When different languages use different sets of KS's 
for main strategies or backup strategies for the same 
discourse phenomenon, language specific dp-main- 
strategy or dp-backup-strategy values are specified. 
For example, when an anaphor is a 3rd person pro- 
noun in a partitive construction (i.e. 3PRO-Partitive- 
Parent) 2, Japanese uses a different generator for the 
main strategy (Current-and-Previous-DC) than En- 
glish and Spanish (Current-and-Previous-Sentence). 
2e.g. "three of them" ill English, "tres de ellos" in Spanish, 
"uchi san-nin" in Japaamse 
Because the discourse KS's are independent of dis- 
course phenomena, the same discourse KS can be 
shared by different discourse phenomena. For exam- 
ple, the Semantic-Superclass filter is used by both 
Definite-NP and Pronoun, and the Recency orderer 
is used by most discourse phenomena. 
2.2.3 Discourse Domain KB 
The discourse domain KB contains discourse domain 
objects each of which defines a set of discourse phe- 
nomena to handle \[n a particular domain. Since 
texts in different domains exhibit different sets of dis- 
course phenomena, and since different applications 
even within the same domain may not have to handle 
the same set of discourse phenomena, the discourse 
domain KB is a way to customize and constrain the 
workload of the discourse module. 
2.3 Resolution Engine 
The Resolution Engine is the run-time processing 
module which finds the best antecedent hypothesis 
for a given anaphor by using data in both the global 
discourse world and the discourse KB's. The Resolu- 
tion Engine's basic operations are shown in Figure 5. 
2.3.1 Finding Antecedents 
The Resolution Engine uses the discourse phe- 
nomenon KB to classify an anaphor as one of the 
discourse phenomena (using dp-definition values) and 
to determine a set of KS's to apply to the anaphor 
(using dp-main-strategy values). The Engine then 
applies the generator KS to get an initial set of hy- 
potheses and removes those that do not pass tile filter 
158 
; • -~ . ~_. ..... _-._~_-'~ ~, ~,-,~-~ ..................... 
Figure 4: Discourse Phenomenon KB 
For each anaphoric discourse marker ill the current sentence: 
Find-Antecedent 
Input: aalaphor to resolve, global discourse world 
Get-KSs-for-Discourse-Phenomenon 
Input: anaphor to resolve, discourse phenomenon KB 
Output: a set of discourse KS's 
Apply-KSs 
hlput: aalaphor to resolve, global discourse world, discourse KS's 
Output: the best hypothesis 
Output: the best hypothesis 
Update-Discourse-World 
Input: anaphor, best hypothesis, global discourse world 
Output: updated global discourse world 
Figure 5: Resolution Engine Operations 
KS's. If only one hypothesis rernains, it is returned as 
the anaphor's referent, but there may be more than 
one hypothesis or none at all. 
When there is more than one hypothesis, orderer 
KS's are invoked. However, when more than one or- 
derer KS could apply to the anaphor, we face the 
problem of how to combine the preference values re- 
turned by these multiple orderers. Some anaphora 
resolution systems (cf. Carbonell and Brown \[6\], l~ich 
and LuperFoy \[16\], Rimon el al. \[17\]) assign scores 
to antecedent hypotheses, and the hypotheses are 
ranked according to their scores. Deciding the scores 
output by the orderers as well as the way the scores 
are combined requires more research with larger data. 
In our current system, therefore, when there are mul- 
tiple hypotheses left, the most "promising" orderer 
is chosen for each discourse phenomenon. In Section 
3, we discuss how we choose such an orderer for each 
discourse phenomenon by using statistical preference. 
In the future, we will experiment with ways for each 
orderer to assign "meaningful" scores to hypotheses. 
When there is no hypothesis left after the main 
strategy for a discourse phenomenon is performed, a 
series of backup strategies specified in the discourse 
phenomenon KB are invoked. Like the main strut- 
egy, a backup strategy specifies which generators, fil- 
ters, and orderers to use. For example, a backup 
strategy may choose a new generator which gener- 
ates more hypotheses, or it may turn off some of the 
filters used by the main strategy to accept previously 
rejected hypotheses. How to choose a new generator 
or how to use only a subset of filters can be deter- 
mined by training the discourse module on a corpus 
tagged with discourse relations, which is discussed in 
Section 3. 
Thus, for example, in order to resolve a 3rd per- 
son pronoun in a partitive in an appositive (e.g. 
anaphor ID=1023 in Figure 7), the phenomenon KB 
specifies the following main strategy for Japanese: 
generator = Head-NP, filters = {Semantic-Amount, 
Semantic-Class, Semantic-Superclass}, orderer = Re- 
cency. This particular generator is chosen because in 
almost every example in 50 Japanese texts, this type 
of anaphora has its antecedent in its head NP. No 
syntactic filters are used because the anaphor has no 
useful syntactic information. As a backup strategy, 
a new generator, Adjacent-NP, is chosen in case the 
parse fails to create an appositive relation between 
the antecedent NP ID=1022 and the anaphor. 
159 
The AIDS Surveillance Committee confirmed 7A1DSpatients yesterday. 
IDM-1 
semantics: Patient.101 I 
Three of them were 
hemophiliac. 
DM-2 semantics: 
Person.102 
FC-5 
coreferring-DM's: { DM-I DM-2} 
semantics: PatienL101 ^ Person.102 
Figure 6: Updating Discourse World 
2.3.2 Updating the Global Discourse World 
After each anaphor resolution, the global discourse 
world is updated as it would be in File Change Se- 
mantics (cf. Helm \[11\]), and as shown in Figure 6. 
First, the discourse marker for the anaphor is in- 
corporated into the file card to which its antecedent 
discourse marker points so that the co-referring dis- 
course markers point to the same file card. Then, the 
semantics information of the file card is updated so 
that it reflects the union of the information from all 
the co-referring discourse markers. In this way, a file 
card accumulates more information as the discourse 
processing proceeds. 
The motivation for having both discourse markers 
and file cards is to make the discourse processing a 
monotonic operation. Thus, the discourse process- 
ing does not replace an anaphoric discourse marker 
with its antecedent discourse marker, but only creates 
or updates file cards. This is both theoretically and 
computationally advantageous because the discourse 
processing can be redone by just retracting the file 
cards and reusing the same discourse markers. 
2.4 Advantages of Our Approach 
Now that we have described the discourse module in 
detail, we summarize its unique advantages. First, 
it is the only working language-independent discourse 
system we are aware of. By "language-independent," 
we mean that the discourse module can be used for 
different languages if discourse knowledge is added 
for a new language. 
Second, since the anaphora resolution algorithm is 
not hard-coded in the Resolution Engine, but is kept 
in the discourse KB's, the discourse module is ex- 
tensible to a new discourse phenomenon by choosing 
existing discourse KS's or adding new discourse KS's 
which the new phenomenon requires. 
Making the discourse module robust is another im- 
portant goal especially when dealing with real-world 
input, since by the time the input is processed and 
passed to the discourse module, the syntactic or se- 
mantic information of the input is often not as accu- 
rate as one would hope. The discourse module must 
be able to deal with partial information to make a 
decision. By dividing such decision-making into mul- 
tiple discourse KS's and by letting just the applicable 
KS's fire, our discourse module handles partial infor- 
mation robustly. 
Robustness of the discourse module is also mani- 
fested when the imperfect discourse KB's or an inac- 
curate input cause initial anaphor resolution to fail. 
When the main strategy fails, a set of backup strate- 
gies specified in the discourse phenomenon KB pro- 
vides alternative ways to get the best antecedent hy- 
pothesis. Thus, the system tolerates its own insuffi- 
ciency in the discourse KB's as well as degraded input 
in a robust fashion. 
3 Evaluating and Training the 
Discourse Module 
In order to choose the most effective KS's for a par- 
ticular phenomenon, as well as to debug and track 
progress of the discourse module, we must be able to 
evaluate the performance of discourse processing. To 
perform objective evaluation, we compare the results 
of running our discourse module over a corpus with 
a set of manually created discourse tags. Examples 
of discourse-tagged text are shown in Figure 7. The 
metrics we use for evaluation are detailed in Figure 8. 
3.1 Evaluating the Discourse Module 
We evaluate overall performance by calculating re- 
call and precision of anaphora resolution results. The 
higher these measures are, the better the discourse 
module is working. In addition, we evaluate the dis- 
course performance over new texts, using blackbox 
evaluation (e.g. scoring the results of a data extrac- 
tion task.) 
To calculate a generator's failure vale, a filter's false 
positive rate, and an orderer's effectiveness, the algo- 
rithms in Figure 9 are used. 3 
3.2 Choosing Main Strategies 
The uniqueness of our approach to discourse analysis 
is also shown by the fact that our discourse mod- 
ule can be trained for a particular domain, similar 
to the ways grammars have been trained (of. Black 
3,,Tile remaining antecedent hypotheses" are the hypothe- 
ses left after all the filters are applied for all anaphor. 
160 
Overall Performance: Recall = No~I, Precision = N¢/Nh 
I Number of anaphors in input 
Arc. Number of correct resolutions 
Nh Number of resolutions attempted 
Filter: Recall = OPc/IPc, \['recision = OPc/OP 
IP 
OP 
OF~ 
1 - OP/IP 
- or~/IF~ 
Number of correct pairs in input 
Number of pairs in input 
Number of pairs output and passed by filter 
Number of correct pairs output by filter 
Fraction of input pairs filtered out 
Fraction of correct answers filtered out (false positive rate) 
Generator: Recall = N¢/I, \['recision = Nc/Nh 
I 
Nh gc 
Nh/I 
1 - N~/I 
Number of anaphors in input 
Number of hypotheses in input 
Number of times correct answer in output 
Average number of hypotheses 
Fraction of correct answers not returned (failure rate) 
Orderer: 
I Number of anaphors in input 
N¢ Number of correct answers output first 
Nc/I Success rate (effectiveness) 
Figure 8: Metrics used for Evaluating and Training Discourse 
For each discourse phenomenon, 
given anaphor and antecedent pairs in the corpus, 
calculate how often the generator fails to generate the antecedents. 
For each discourse phenomenon, 
given anaphor and antecedent pairs in the corpus, 
for each filter, 
calculate how often the filter incorrectly eliminates the antecedents. 
For each anaphor exhibiting a given discourse phenomenon in the corpus, 
given the remaining antecedent hypotheses for the anaphor, 
for each applicable orderer, 
test if the orderer chooses the correct antecedent as the best hypothesis. 
Figure 9: Algorithms for Evaluating Discourse Knowledge Sources 
161 
<DM ID=-I000>T 1 ' ~'.~.~4S\]~<./DM> (<DM ID=1001 Type=3PARTA 
\[The AIDS Surveillance Corru~ttee of the Health and Welfare Ministry 
(Chairman, Prof¢.~or Emeritus Junlchi Sh/okawa), on the 6~h, newly 
COnfirmed 7 AIDS patients (of them 3 arc dead) and 17 iafec~d pcop!¢.\] 
<DM IDol 020 Typc-~DNP Ref=1000>~'/',: ~-?'~)~ ~ ~,:.~.~" J~D M > 
(7)-~ "k~<DM ID=1021>IKIJ~.</DM>~<DM lD=1022 Type=BE Ref=1021> 
~\[~'\]~.:~'~</DM> (<DM ID=1023 Type=3PARTA Ref=1021>5 
</DM>~-'Jx) . <DM ID=I02AType-ZPARTF Ref=1020></DM>--j ~, 
~'-~.~'~.~1~)~. <DM ID=1025 Typc--ZPARTF Ref=1020></DM> 
<\[}M ID=I026>~J~,</DM> (<DM ID=1027 Typc=JDEL Ref=1026>~ 
\[4 of ~ 7 ~:wly discovered patients were male homosexuals<t022> 
(of them<1023> 2 are dead), I is heterosexual woaran, and 2 (ditto l) 
are by contaminated blood product.\] 
La Comisio~n de Te'cnicos del SIDA informo' dyer 
de que existen <DM ID=2000>196 enfermos de 
<DM ID=2OOI>SIDA</DM></DM> en la Comunidad 
Valenciana. De <DM ID=2002 Type=PRO Reffi000>ellos 
</DM>, 147 corresponden a Valencia; 34, a Alicante; 
y 15, a Castello'n. Mayoritariamente <DM ID=2003 
Type=DNP Ref=2001>la enfermedad</DM> afecta a <DM 
ID=2004 Type=GEN~Ios hombres</DM>, con 158 cases. 
Entre <DN ID=2OOfi Type=DNP Ref=2OOO>los afectados 
</DM> se encuentran nueve nin~os menores de 13 an'os. 
Figure 7: Discourse Tagged Corpora 
\[4\]). As Walker \[lS\] reports, different discourse algo- 
rithms (i.e. Brennan, Friedman and Pollard's center- 
ing approach \[5\] vs. Hobbs' algorithm \[12\]) perform 
differently on different types of data. This suggests 
that different sets of KS's are suitable for different 
domains. 
In order to determine, for each discourse phe- 
nomenon, the most effective combination of gener- 
ators, filters, and orderers, we evaluate overall per- 
formance of the discourse module (cf. Section 3.1) at 
different rate settings. We measure particular gen- 
erators, filters, and orders for different phenomena 
to identify promising strategies. We try to mini- 
mize the failure rate and the false positive rate while 
minimizing the average number of hypotheses that 
the generator suggests and maximizing the number 
of hypotheses that the filter eliminates. As for or- 
derers, those with highest effectiveness measures are 
chosen for each phenomenon. The discourse module 
is "trained" until a set of rate settings at which the 
overall performance of the discourse module becomes 
highest is obtained. 
Our approach is more general than Dagan and Itai 
\[7\], which reports on training their anaphora reso- 
lution component so that "it" can be resolved to its 
correct antecedent using statistical data on lexical re- 
lations derived from large corpora. We will certainly 
incorporate such statistical data into our discourse 
KS's. 
3.3 Determining Backup Strategies 
If the main strategy for resolving a particular anaphor 
fails, a backup strategy that includes either a new 
set of filters or a new generator is atternpted. Since 
backup strategies are eml)loyed only when the main 
strategy does not return a hypothesis, a backup strat- 
egy will either contain fewer filters than the main 
strategy or it will employ a generator that returns 
more hypotheses. 
If the generator has a non-zero failure rate 4, a new 
generator with more generating capability is chosen 
from the generator tree in the knowledge source KB 
as a backup strategy. Filters that occur in the main 
strategy but have false positive rates above a certain 
threshold are not included in the backup strategy. 
4 Related Work 
Our discourse module is similar to Carbonell and 
Brown \[6\] and Rich and LuperFoy's \[16\] work in us- 
ing multiple KS's rather than a monolithic approach 
(cf. Grosz, Joshi and Weinstein \[9\], Grosz and Sidner 
\[8\], Hobbs \[12\], Ingria and Stallard \[13\]) for anaphora 
resolution. However, the main difference is that our 
system can deal with multiple languages as well as 
multiple discourse phenomena 5 because of our more 
fine-grained and hierarchically organized KS's. Also, 
our system can be evaluated and tuned at a low level 
because each KS is independent of discourse phenom- 
ena and can be turned off and on for automatic eval- 
uation. This feature is very important because we 
use our system to process real-world data in different 
domains for tasks involving text understanding. 

References 
\[i\] Chinatsu Aone, Hatte Blejer, Sharon Flank, 
Douglas McKee, and Sandy Shinn. The 
Murasaki Project: Multilingual Natural Lan- 
guage Understanding. In Proceedings of the 
ARPA Human Language Technology Workshop, 
1993. 
\[2\] Chinatsu Aone, Doug McKee, Sandy Shinn, 
and Hatte Blejer. SRA: Description of the 
SOLOMON System as Used for MUC-4. In Pro- 
ceedings of Fourth Message Understanding Con- 
ferencc (MUC-4), 1992. 
\[3\] Damaris Ayuso. Discourse Entities in JANUS. 
In Proceedings of 27th Annual Meeting of the 
ACL, 1989. 
\[4\] Ezra Black, John Lafferty, and Salim Roukos. 
Development and Evaluation of a Broad- 
(:',overage Probablistic Grammar of English- 
Language Computer Manuals. In Proceedings of 
30lh Annual Meeting of the ACL, 1992. 
\[5\] Susan Brennan, Marilyn Friedman, and Carl 
Pollard. A Centering Approach to Pronouns. In 
Proceedings of 25th Annual Meeting of the A(,'L, 
1987. 
\[6\] Jairne G. Carbonell and Ralf D. Brown. 
Anaphora Resolution: A Multi-Strategy Ap- 
/)roach. In Proceedings of the 12lh International 
Conference on Computational Linguistics, 1988. 
\[7\] Ido Dagan and Alon Itai. Automatic Acquisition 
of Constraints for the Resolution of Anaphora 
References and Syntactic Ambiguities. In Pro- 
ceedings of the 13th International Conference on 
Computational Linguistics, 1990. 
\[8\] Barbara Crosz and Candace L. Sidner. Atten- 
tions, Intentions and the Structure of Discourse. 
Computational Linguistics, 12, 1986. 
\[9\] Barbara J. Grosz, Aravind K. Joshi, and Scott 
Weinstein. Providing a Unified Account of Def- 
inite Noun Phrases in Discourse. In Proceedings 
of 21st Annual Meeting of the ACL, 1983. 
\[10\] Raymonde Guindon, Paul Stadky, Hans Brun- 
net, and Joyce Conner. The Structure of User- 
Adviser Dialogues: Is there Method in their 
Madness? In Proceedings of 24th Annual Meet- 
ing of the ACL, 1986. 
\[11\] Irene Helm. The Semantics of Definite and In- 
definite Noun Phrases. PhD thesis, University of 
Massachusetts, 1982. 
\[12\] Jerry R. Hohbs. Pronoun Resolution. Technical 
Report 76-1, Department of Computer Science, 
City College, City University of New York, 1976. 
\[13\] Robert Ingria and David Stallard. A Computa- 
tional Mechanism for Pronominal Reference. In 
Proceedings of 27th Annual Meeting of the ACL, 
1989. 
\[14\] Hans Kamp. A Theory of Truth and Semantic 
Representation. In J. Groenendijk et al., edi- 
tors, Formal Methods in the Study of Language. 
Mathematical Centre, Amsterdam, 1981. 
\[15\] Lauri Karttunen. Discourse Referents. In J. Mc- 
Cawley, editor, Syntax and Semantics 7. Aca- 
demic Press, New York, 1976. 
\[16\] Elaine Rich and Susan LuperFoy. An Architec- 
ture for Anaphora Resolution. In Proceedings of 
the Second Conference on Applied Natural Lan- 
guage Processing, 1988. 
\[17\] Mort Rimon, Michael C. McCord, Ulrike 
Schwall, and Pilar Mart~nez. Advances in Ma- 
chine Translation Research in IBM. In Proceed- 
zngs of Machine Translation Summit IIl, 1991. 
\[18\] Marilyn A. Walker. Evaluating Discourse Pro- 
cessing Algorithms. In Proceedings of 27th An- 
nual Meeting of the ACL, 1989. 
\[19\] Bonnie Webber. A Formal Approach to Dis- 
course Anaphora. Technical report, Bolt, Be- 
ranek, and Newman, 1978. 
