CUSTOMIZING AND EVALUATING A MULTILINGUAL DISCOURSE MODULE 
Chinatsu Aone 
System Research and Applications Corporation (SRA) 
2000 15th Street North 
Arlington, VA 22201 
emaih aonec @ sra.com 
ABSTRACT 
In this papeh we first describe how we have custom- 
ized our data-driven multilingu~fl discourse module within 
our text understanding system lor dill'erent lm~guages and 
for a particular NLP application by utilizing hierm'chic~dly 
organized discourse KB's. Then, we report qum~titalive 
and qmditative findings from ewduating the system both 
with and without discourse processing, ~md discuss how 
resolving certain kinds of mmphora affects system perh)r- 
lnance. 
1 INTRODUCTION 
Although previous discourse rese,'uch (cf Hobbs \[7\], 
Webber 191, Grosz mid Sidner 16/, etc.) made significant 
contributions at a theorelic~d level, the effectiveness of 
discourse processing in NLP systems h~s not been studied 
so Ira' at a practical level (of. Walker \[8\]). In systems used 
in NLP applications such us the Message Underst,'ulding 
Conferences (of. 14, 5 I), discourse processing is often not a 
sep~u'ate module hut is pmt mid proeel of "template gener- 
ation." Thus, the eflbct of different types of discourse pro- 
cessing on a pm'liculm" task has not been shown either. 
In addition, both at Iheoretical and practical levels, few 
seem to have considered designing discourse processing in 
a way that is customizable for multiple languages and 
domains. However, since discourse phenomena differ 
mnong Imlguages mid even among domains within the 
same language, it ix desirable that discourse processing be 
custonfizahle and its result ewduable. 
In this paper, we descril)e how we have customized our 
multilingual discourse module wilhin our tcxl understand- 
ing system for a pm'ticulm" I~mk (i.e. data exla'action in the 
joint venture domain) in two different lmlguagcs (i.e. 
English mid Japanese), mid report the cwduation rcsul|s. 
2 DISCOURSE MODULE 
ARCHITECTURE 
In Aone ~ultl McKee \[2\], we have described our new 
language- and domain-independent discourse module 
within our text underslanding system. In addition to being 
lmlguage- mid domain-independent, the module is ewdu- 
able m~d mfinable to different applications and domains. 
The discourse mchitecture is motiwtted by our need to port 
our text uuderstmlding system to diflcrent languages (e.g. 
English, Japanese, Spanish) mid to different dom~dns (of. 
Aone et al. \[1\]). The discourse mtxlule is strictly dam- 
~hiven so that mmphora resolution h)r different lmlguages 
mid domains can be achieved sunply by selecling neces- 
sary dala. It consists of one discourse processor (the Reso- 
lution Engine) and three discourse knowledge bases (the 
Discourse Phenomenon KB, the Discourse Knowledge 
Source KB, the Discourse Domain KB). The Discourse 
AdminisUator ix a developmenl-time tool for defitfiug the 
three discourse KB's. The m'chitecture is shown in Figure 
I. 
peFfortll-At~tllatlliC,g 
deft,,, I \[ "q'~':Y 
I)iscc,urs¢ M~chlle 
Figure 1. Discourse Architecture 
2.1 I)iscourse Knowledge Bases 
The Discourse Knowledge Source KB houses small 
well-delined mmphora resolution strategies. Each knowl- 
edge source (KS) is an object in the hierarchically orga- 
nized KB, and infl)rmation can be inherited from more 
general to more specific KS's. This KB consists of three 
kinds of KS's: generators, \[liters and orderers. A generator 
is uscd to generate possible anlecedent hypotheses fi'om a 
certain region of text. Afilter is used to eliminate impossi- 
ble hypotheses, while an ot~lerer is used to rmlk possible 
hyl)othescs in a preference order if there is more than one. 
Most of the KS's are language-independen! (e.g. all the 
generalors and the semanlic tilters). Even when they are 
language-specilic, a sub-KS can inherit information from 
its superclass KS's while defining specific data lee:ally. For 
ex,'unple, the Semantic-Gender-Filter KS 1 deliues only 
funclional definition of this KS, while its sub-KS's for 
English ~md Japanese each specify \]~mguage-specific data 
~md inherit the stone funclioual definilion from their pro'on! 
KS. 
1. Seluanlic-Gender-Filter filters out an antecedent 
hypothesis whose semantic gende1 is not consistent 
with the restriction imposed by the syntaclic gender of 
~l pI'OIIOHI|, 
1109 
The Discourse Phenomenon KB contains hierarchi- 
cally organized discourse phenomenon objects (e.g. 
Nmne-Anaphora, DeIinite-NP) each of which specifies a 
definition of the discourse phenomenon and a set of KS's 
(i.e. generators, tilters, and orderers) to apply to resolve 
this particular discourse phenomenon. Because the dis- 
course KS's are independent of discourse phenomena, the 
stone discourse KS cm~ be shared by different discourse 
phenomena in different languages ,and domains. For exam- 
pie, KS's such as Sem,-mtic-Type-Filter and Recency- 
Orderer are used by most discourse phenomena in multiple 
languages. 
Finally, the Discourse Domain KB contains discourse 
domain objects each of which defines a set of discourse 
phenomena to hmldle in a particular domain. Since texts in 
different domains exhibit different sets of discourse phe- 
nomena, and since dilt'erent applications even within the 
same domain may not have to handle the same set of dis- 
course phenomena, the discourse domain KB is a way to 
customize ,and constrain the workload of tile discourse 
module. 
These three hierarchically organized discourse KB's 
make it possible to share some of the discourse KB's while 
also being able to add language- mid domain-specitic dis- 
course data. 
2.2 Resolution Engine 
The Resolution Engine is the run-time processing 
module which finds tile best ,antecedent hypothesis tot a 
given ~maphor by using the discourse KB's described 
above. First, it determines from the Discourse Dom~fin KB 
which discourse phenomena to handle giveu a particular 
language eald domain. Then, it uses the Discourse Phe- 
nomenon KB to classify ml auaphor as one of the dis- 
course phenomena and to decide which KS's to apply to it. 
Next, the Engine applies appropriate generator KS's to get 
,'m initial set of antecedent hypotheses, mid then applies fil- 
ter KS's to remove inconsistent hypotheses. When there is 
more than one hypothesis left, orderer KS's specified in 
the Discourse Phenomenon KB are invoked to rank the 
hypotheses. 
3 CUSTOMIZING DISCOURSE KB'S 
We have customized our discourse KB's to perform a 
data extraction t,'~sk in the joint venture domain. Our text 
understanding system takes English mid Japanese newspa- 
per articles about joint ventures as input (cf. Figure 2), and 
outputs database templates (eL Figure 3). The system has 
to extract from the ,articles infonnation regm'ding which 
organizations participate iu a joint venture (including a 
new joint venture compmly if any), what the purpose of 
tile joint venture is (e.g. selling coal), who tim people m'e 
that are associated with these organizations, etc. We made 
a task-oriented decision that handling organization mm- 
phora, both definite NPs (e.g. "the company") and name 
anaphora (e.g. "Toyota" for "Toyota Motors Corp."), is a 
top priority initially in order to improve performance. 
Thus, we created in the Discourse Domain KB a discourse 
domain object called JV-Data-Extraction which specifies 
that two discourse phenomenon objects from the Dis- 
course Phenomenon KB, namely mune anaphora (DP- 
Nmne) mid definite NP anaphora for orgmlizations (DP- 
DNP-Orgmlization), should be handled ill this application 
domain. 
NEW YORK -- A joint veature to export con 
from tile United States has been lbnned between 
M&M Ferrous America Ltd. here and Crown 
Coal & Coke Co., Pittsburgh. 
Coal obtained by Crown lroln v,'u-ious domes- 
tic mines will be marketed oflMlore by M&M, a 
lrading colnp~my formed six years ago by former 
PhilippiBrothers Inc. employees. Crown, which 
formerly had its own mines, heretofore marketed 
coati from v,'uious sources to domestic steehnak- 
ers only, according to Eric S. Katzenstein, M&M 
vice president. 
((omitted)) 
Eastern European countries such as Rommlia are 
likely mm'kets, he said. 
Figure 2. All Exmnple o1' Input Text 
<TIE UP REI, ATIONSHII'-2975348 1> := 
TIlL U P S TAT\[ IS: EXISTING 
ENTITY: <ENTITY-2975348-1> <F, NTITY-2975348-2> 
ACTIVITY: <ACTIVITY-2975348-1> 
<ENTITY-2975348-1 > := 
NAME: M&M Ferrous America I,TD 
AMASES: "M&M" 
I,OCATION: New York (CITY 4) New York (PROVINCE 1) 
United States (COUNTI),Y) 
TYPE: COMPANY 
PERSON: <PERSON-2975348-1> 
<ENTITY-2975348-2> := 
NAME: Crown C~al & Coke CO 
A/JASI~ : "(?rowll" 
I,OCAT\[ON: Pittsburgh (CITY 4) Pem~sylwtnia (IqU)VINCF, 1) 
United States (COUNTRY) 
TYPE: COMPANY 
<INDUSTRY-2975348-1> := 
IN D\[ ISTRY-TYPE: S ALES 
PRODUCT/SERVICE: (50 "Crown's coal") 
<ACTIVITY-2975348-1 > := 
\[NDUSTRY: <INDUSTRY-2975348 1> 
ACTIVITY-SIT|';: (Romania (COUNTRY) <ENTITY-2975348- 
1>) 
<PERSON-297534g- 1 > := 
NAME: Eric S. Katzenstein 
PERSON'S ENTITY: <I.;NTITY-2975348- I > 
POSITION: SREXEC 
Figure 3. An Exmnple ofau Output Template 
3.1 Name Anaphora 
In order to resolve name ~maphora, English mid Japa- 
nese share some of the KS's ill tile Discourse Knowledge 
Source KB, nmnely Current-Text-General01; Semmltic- 
1110 
Type-Filter, and Recency-Orderer. Tiffs generator gener- 
ates all the possible antecedenl hypotheses up to the cur- 
rent sentence. The Semantic-Type-Filter then checks if rice 
semantic type of amphor is consistent with that of an ;mte- 
cedent \[iypothesis. When there is more than one hypothe- 
sis left, the Recency-Orderer orders the hypotheses 
according to their proximity to the ataphot. 
In addition to the three lmcguage-independent KS's, 
each h'mguage uses a language-specific lilter. For English, 
a filter named Englis\[i-N,'une-Filter, which matches an 
anaphor (e.g. "Crown") with a subsequence of a~ mtte- 
cedent nane string (e.g. "Crown Coal & Coke CO"), is 
currently employed. For Japatese, mt additional sittgle fil- 
ler called Japanese-N,'une-Filter covers seemingly vast 
wu'iatious of Japanese company crane anaphora 2. This KS 
matches an attuphor with any conthiualion of characters in 
an ~mtecedenl as long as the character order is preserved 
(e.g. "abe" can be an anaphor of "abede"). One exceplion 
is lhal a~ mtaphor c~m have an extra word "s\[ia" at the end 
that is not a part of flte fnll company mune or a compmty 
acronym (e.g. "Westinghouse (WH)" can be refen'cd to 
auapltoric~dly by "Weslinghouse-sha" or "WH-sha"). 
3.2 Definite NP 
Attother discourse phenomctton which is handled lor 
this lask ix definite NPs relerriug Io organizations such as 
"the venture," "the West Germ+m electronics concern," 
etc., where the words "venture" and "cottcern" in these 
cotttexts point to subcltksses of I/to semanlic concept l+or an 
org~utization. Although Japanese does not have a delinite 
article, in writlen Japmlese the word "dou" (literally 
meaning "lice sane") prefixed to certain nout~s performs 
approximately the sane function ~Ls English tlelinite a'tiele 
"the". Both English and Japatese currently share the 
sane three KS's (i.e. Current-Text-Generatoc', Semantic- 
Type-Filter, Recency-Orderer) lot delinite NP resolution. 
Additionally, English uses Syntactic-Number-Filter, 
which checks if the syntaclic nnmber of the anaphor is 
consistent with that of ~m anlecedent hypolhesis. Although 
Japalese does not exhibit syntactic number distinction, a 
"don" phr~Lse can only refer semmttic~dly Io a single 
entity. 3 Thus, Japanese uses Semanlic-Amount-Eilter, 
which exchtdes semantically plural entities (e.g. a con- 
joined NP, ~m NP with a plural qnmttifier) as possible aate- 
cedents for a "dou" phrase. 
4 EVALUATION RESUI:I'S 
hi this section, we will report onr evahtalion results. 
We ran 100 Japanese and 100 English blind test joint vett- 
2, For example: 
~\[~(~H~f~), ~¢,~,~ (~.,~2Z~), • . 
3. A definite plural NP can be expressed in Japanese by 
a numeral or numerical quantilier plus a classifier, as in 
"ryousha" (file two companies) and "san-sha" (the 
three companies). 
lure ,'uticlcs through our text uuderslmlding system with 
and without the discourse module turned on, and scored 
the resnlls using an automalic scoring prognam. The scor- 
ing program uses a scoring metric from information 
retrieval, and reports recall and precision for each slot in 
the lemplates as well as a single combined score called F- 
measure 4 for overall perforlnance (of. 1141). 
It shouM be noted that this ewduation is a blackbox 
ewduation of the syslem as used in a particular application 
task. Consequently, the results do not directly reflect the 
perfonn~mce of the discourse module itself• For cxanple, 
this task does not require all company name anaphora (i.e. 
aliases) to be reported, but only those which are involved 
in joint ventures. Also, the causes of task l~tilure or success 
are somelimes due to the lhilure or success of system mod- 
ules other ttum the discourse module. For instance, the pro- 
processing system does not always recognize company 
names which me potential autecedenls. On the other hard, 
the preprocessing module rather than the discourse module 
sometimes recognizes compaty acronyms as aliases. 
Thus, the resnlts of the hlackbox ev~dnation reflect more 
on how the discourse module helps the whole system per- 
Ibnn a p~uticular task. 
4.1 Name Anaphora 
It is clem* that the perlbnmmce of name auaphora reso- 
htlion is directly linked to how well the system tills in the 
ALIASES slot in the output templates (of. Figure 3), The 
100 Japanese texts required idenlifying a total of 127 com- 
pany name aliases. With the discourse module tnnted on, 
the recall of Ihc ALIASES slot increases by 38 poinls and 
the precision by 16 points. Though the set of KS's used for 
nane amphora was mostly satisihctory, we lound one 
problem paticular to this domain in tx~th l~mguages. Since 
the texts arc in the joint venture domain, it is often i\[ie c~tse 
that the nane of a new joint venture company (e.g. 
"'Chrysler Japan") overlaps the nanes of its p~u'ent corn- 
panics (e.g. "Chrysler Corp."). Wlten the text nses a nane 
anaphor (e.g. "Chrysler"), it must refer to the pm+ent com- 
pany even when the joint venture company is mentioned 
most recently. We are plmming to add another orderer 
which preli~rs the pm'ent company when there is such a 
conllicl. 
4.2 I)elinite NP 
We hyt×)thesized that resolving delinitc NP's affects 
the extraclion of information about which company is per- 
forming which "economic activity" in a joint venture (e.g. 
Compaty A will nlanufaelufe ca's while Company B will 
mm'ket them), since snch information appem's later in at 
4. F-measure is calculated by: 
( \[\]7, + 1.0) x 1' x R 
1~2 x f'+R 
where 1' is precision, R is recall, and \[\] is the relative 
importmtce given to recall over precision. In this case, 
~= 1.0. 
111I 
article after compmties involved ill It joint venture are 
"already introduced into the discourse (e.g. "Publishing 
rivals Time Inc. and New York Thnes Co. said they agreed 
ill principle to form ajointiy owned national magazine dis- 
tribution partnership... The joint venture will continue to 
market mag~ines currently marketed by Tune Distribu- 
tion..."). 
Under the same test condition as above, the precision 
of the relevant slot (i.e. ACTIVITY-SITE slot ill Figure 3) 
increased by 5 points in JapaJmse when discourse process- 
ing was used. The recall was not affected much by the dis- 
course processing; it increased only by 1 point. In the 
English test, the changes in both precision and recall were 
negligible. One of the reasons for this less drastic incre~tse 
of this slot value is that the sentence expressing economic 
activities do not always use delinite NPs for the agents of 
such activities. Such agents can be expressed by name 
mlaphora or pronouns or, often in English, by implicit sub- 
jects of infinitives, as in "Siemens AG and GTE Corp. 
agreed to set up a new holding eomp~my in West Germany 
to oversee their telecommunications joint venture...". 
In addition, examination of the test results showed that 
when there are more than one antecedent hypothesis, topic 
marking (using particle "wa") plays a more significant 
role in determining the antecedent of a Japmmse "dou" 
definite NP th,'m recency. At the time of the testing, how- 
ever, we were not using topic marking infonnafion to pre- 
fer topicalized amecedent hypotheses. Another finding 
which is true of both Japanese and English is that definite 
NP ,'maphora resolution often requires pragmatic infercnc- 
ing ill order to obtain a fact which is not explicitly slated in 
the text. For ex,-unple, in order to resolve the definite NP in 
the senteuce "Chevron, an oil company, also said it 
acquired Rhonc-Poulenc's 30% interest in Petrosynthese 
S.A., boosting its holding in the French joint venture to 
65%," the discourse module has to infer either that 
Petrosyuthese S.A. is a French comp~my (perhaps from the 
company designator?) or that acquiring someone's holding 
ill a company increases one's holding in that company. Wc 
are currently adding KS's which m~dce use of topic infor- 
mation and pragmatic inferencing, ,and also investigating 
which combinations of KS's will optimize discourse pcr- 
fo iTllallce. 
Furthermore, we think that very little ch,-mge in recall 
is due to the fact that the system a~ssumed tile parent com- 
panies to be the value of ACTIVITY-SITE when it is 
undetermined. Thus, this detault value kept the recall of 
the system without discourse processing higher, mid them- 
fore the ACTIVITY-SITE slot was not as good an indica- 
tor of the discourse module performance as the ALIASES 
slot. 
It is interesting to note that ml approach like Dagan ~u~(I 
Itai's \[3\], which uses statistical data on semantic selec- 
tional restriction that is automatically acquired from large 
corpora to resolve anaphora 5, tines not work well in this 
domain. This is because a typical text in this domain con- 
tains at least two lX)ssible antecedents (joint venture part- 
ners m~d possibly a joint venture comp~my) of the s~une 
semm~tic type, munely organization, hn" a delinite NP ana- 
phora referring to organizations. 
4.3 Overall Performance 
Overall, discourse processing increased the system 
perh~rmance measured by tile combination of overall 
recall mid precision scores (i.e. F-measure) by 4 points in 
Japanese, mostly due to ~m overall increa.se in precision. 
Interestingly, the discourse processing helped also in the 
identification of links between organizations mid people, 
,'~s indicated by the PERSON slot of the <ENTITY> object 
,'rod the PERSON'S ENTITY slot of tile <PERSON> 
object (cf. Figure 3). With the discourse processing lunged 
on, the recall of both PERSON and PERSON'S ENTITY 
slots incre~Lsed by 7 points, and the precision by 10 points 
and 12 points respectively. 
We think that this is because when a person associated 
with an organization is mentioncd, the company mune or 
the person's naJne is often an anaphoric form as in "Carlos 
M. Herrera, president of Preferred," or "Katzenstein, a 
former executive with Bomar Resources Inc.". In order to 
undersUmd the relation between ,'m organization and a per- 
son as in "Eric S. Katzenstein, M&M vice president" (cf. 
Figure 2), tile system has to recognize both the alfilialion 
link between the person and the comDmy hnplicit in tile 
appositive phrase, and the mmphoric link between Ihe 
objects under different aliases. Our discourse module 
takes care of both identifying appositive relations (e.g. 
Eric S. Katzenstein is vice presideu0 and resolving u~une 
anaphora (e.g. "M&M" refers to "M&M Ferrous America 
Ltd."). 
5 CURRENT AND FUTURE WORK 
In this paper, we have described our multilingual dis- 
course module mid ils customized discourse KB's, and 
reported the blackbox ewduation results when it was used 
in a data extraction task in the joint venture domain. Cur- 
rently we arc working on the following two research areas 
in order to improve anaphora resolution. 
First, we are experimenting with ways to automate 
Iraining of anaphora resolulion by applying machine learn- 
ing so that the discourse module can be castomized auto- 
maritally to a p~ticular hmguage, domain or application 
without extensive manual knowledge engineering. Ill 
order to obtain feedback liar training, we must be able to 
automate glassbox evaluation of discourse processing 
itself. For this, we have built Iwo tools: a discourse tag- 
ging tool and a discourse evaluation tool. The former has 
been used to tag texts with discourse relations, while tile 
latter lakes discourseqagged corpora as a key and the sys- 
tem output as results to be ev~duated. 
5. According to theu approach, for a sentence "It was 
going to collect it," "governnmnt" is a preferred ante° 
cedent of the first "it," while "money" is of the sec- 
ond, using such statistics. 
11"12 
Second, we are expanding the rm~ge of anaphoric phe- 
nomena which our discourse module can hmldle. They 
include overt pronouns in English mM Spanish, and zero 
pronouns in Japanese and Spmfish. 
REFERENCES 
I1\] Chinalsu Aone, Hatte Blcjcr, Sharon Flank, Douglas 
McKee, and SmMy Shinn. The Murasaki Project: 
Multilingual Natural Language UnderslmMing. In 
Proceedings of the ARPA Human Language Technol- 
ogy Wvrkshop, 1993. 
12\] Chinatsu Aone and Dougl,~s McKce. Language-Inde- 
pendent Anaphora Resolution System lot Under- 
standing Multilingu~d "lbxls. In Proceedings of 31st 
Annual Meeting of the ACL, 1993. 
1131 Ido Dagml ~md Alon llai. Aulomatic Acquisition of 
Constraints for Ihe Resolution of Anaphora Refer- 
ences and Synlactic Ambiguities. In Proceedings of 
the 13th International Cot!ference on Computational 
Linguistics, 1990. 
141 Delense Adwmced Resem'ch Projecks Agency. Pro- 
ceedings of t"ourth Message Understanding Confi~r- 
ettce (MUC-4). Morgml Kauflmmn Publishers, 1992. 
\[51 Adwmced Resem'ch Pro}ecls Agency. Proceedings of 
Fourth Message Understanding Cot(erence (MUC- 
5). Morg,'m Kaufinmm Publishers, 1993. 
16\] Bm'bara Grosz and Candace L. Sidner. Attentions, 
hltenl.iol~s alld file Strtlcture of Discourse. Cotttpula- 
tional Linguistics, 12, 1986. 
171 Jen'y R. Hobbs. Pronoun Resolution. Technical 
Report 76-1, Depmlmenl of Computer Science, City 
College, City University of New York, 1976. 
\[81 Marilyn A. Walkel; Evaluating Discourse Processing 
Algorithms. In Proceedings of 27th Annual Meeting 
of the ACL, 1989. 
191 Bonnie Webber. A Formal Approach to Discourse 
Anaphora. Technical report, Bolt, Beranek, and New- 
mini, 1978. 
1113 
