Pronominalization revisited* 
Renate Henschel and Hua Cheng ~md Massimo Poesio 
HCRC, University of Edinburgh, UK 
{henschel,huac,poesio}@cogsci.ed.ac.uk 
Abstract 
Pronolninalization has been related to tile idea 
of a local focus - a set of discourse entities in 
the speaker's centre of attention, for exmnple ill 
Gundel et al. (1993)'s givenness hierarchy or 
in centering theory. Both accounts say that the 
determination of tile tbcus depends on syntac- 
tic as well as pragmatic factors, but have not 
been able to pin those factors down. In this 
paper, we uncover the major factors which de- 
termine the focus set in descriptive texts. This 
new tbcus definition has been ew, luated with re- 
spect to two corporm museum exhibit labels, 
mid newspaper mtieles. It provides an opera- 
tionalizable basis for pronoun production, and 
has been implemented as the reusable module 
gnome-np. The algorithm l)ehind gnome-np is 
conlpared with the most recent pronoun gener- 
ation algorithm of McCoy and Strube (1999). 
1 Introduction 
Besides the well established problem of ln'onoun 
resolution, pronoun generation is now attract- 
ing renewed attention. In the past, generation 
systelns generated pronouns without attaching 
much importance to the problem, one notable 
exception being the classical algorithm of Dale 
(1990), loosely based on centering theory. With 
the emergence of corpus based studies in comtm- 
rational linguistics, the question arises whether 
it is possible to refine known standard algo- 
rithms, or whether an improvement is only to 
be achieved with the hell) of world knowledge 
reasoning - a matter too complex to be dealt 
with reliably at this time. Tile Ibrlner direction 
is represented by tile pioneering work of McCoy 
and Strube (1999). They propose a refined algo- 
rithm for tile choice between definite description 
on the one hand and pronoun on the other for 
* The work reported in this paper has been calmed out 
with the tinancial support of UK ESPllC grant L51126. 
animate referents I , which is based on distancG 
time structure mid ambiguity constraints. 
Here we introduce a more general algoritlnn 
for the pronominalization decision that is valid 
not only for animate but for inanimate referents 
as well. In conformity with McCoy and Strube, 
we group noun phrases with definite determiner 
and proper nalnes together under tile term "det'- 
inite description". The algorithm proposes a 
new pronominalization strategy, which beyond 
McCoy and Strube (1999)'s criteria makes use 
of the discom'se status of the antecedent and 
parallelism effects. 
The algorithm has been implemented as the 
reusable module gnome-np. It has been re-used 
in tile web hypertext generation system ILEX 
(see Oberlander el; al. (1998)). It shows ml 
accuracy over 87% with respect to two corpora 
(each 5000 words) of difl'erent genres. 
2 Accounts of pronominalization 
In previous ace(rants pronominalization has 
been related to the idea of a local focus of ~d;- 
tention: a set of discourse referents who/wlfich 
is in the center of attention of the speaker (e.g. 
Sidner (1979), givenness hierarchy (Gun(lel et 
al., 1993), centering theory (Grosz et al., 1995), 
RAFT/RAPId. (Suri, 1993)). Whereas (Gundel 
el; al., 1993) do not atteml)t to make their fo- 
cus notion operationalizable, this has been at- 
tempted by fllrther develolmlents of centering. 
However these have mostly been applied to the 
pronoun resolution problem. In the following 
we discuss three versions of centering and show 
that their application to the pronoun generation 
problem is nevertheless linfited. 
Centering. Centering was developed to ex- 
plain local discourse coherence; the extent to 
which it benefits pronoun generation is how- 
ever not immediately clear, hi centering, 
1We llSO the terms "discourse entity" and "refc'rent" 
synonymously in this paper. 
306 
the discour.~e entiti(:s (:vok(:d in a (',(n'l;ain ui:- 
terall(;e 'tt i ga'e c~dle(t fi)rward-looldng centers 
(Cfs). It is assumed i;lmt they are 1)ari;ially or- 
der(:d. As a major dei;erminant of the ordering, 
the gramma.tical fun(:t;ion hierarchy (roughly: 
SUI/.I~>OIM>OTIII~;IIS) has 1)een 1)r,')t)o,~ed. 13e- 
CallSe other fa.cl;ors afl'e(:ting th(: ord(:r have no(; 
1)een (',lld)or;~ted in de(;ail, this ranking (as tit(' 
only Ol)erai, ionaliza/)le ha.n(lle) has 1)(:(x)m(: the 
si;:m(lard ra.nking in several comt)utational ai)- 
1)\]ic~(,ions of (:entering. The 1)ackward-tooking 
center (hencefl)rth C1)) is a distinguished 1nero- 
bet of (;lie CI:~, which is defined as the most 
highly ranked member of the Ct5 of th(' previous 
lltterallee 'u,i_ \] which is realized in v,i. The Cb is 
consid(:red as (;h(: \]o('al focus of ai;i;(:ntion. Cen- 
tering sta,t('s two rules. Only the first; rule makes 
~t (:brim at)()u(; l)ronominaliz~tion: ll! any (;lemen(; 
of the uti;eran(:e ui--1 is realiz(,,d in v,i as l)ro- 
nomh th(m l;he C\]) must l)e l)r()l~()minalized in 'it i 
as well. As lloted tw McCoy mid Struhe (\]999), 
this rule apl)lies only in (,h(; ease that (:we sul)se- 
qllent lltterallCeS share Hl()r('~ (;}l~tll o11(~ ref('A'ellt~ 
m:td (;hal; l;h(: non-el) ref(.'r(,.nt is 1)ronominalized 
in (,he se(xmd ui:termme. \]}ut why (;\]fis non-el) 
referent is realized as a, l)ronomt is not given t)y 
(~h(' (:heory. 
Itowev(:r, f()lk)wing mot(: (;he sl>irit of (xuli;er- 
ing tha.n the actual definition, ()he (:au under- 
stand (;he el) as (;he refin'ent which is prefer¢d)lv 
l)ronominalize(t. General t)r(mominalizai;ion ()f 
the backward-looking center was in fact a claim 
of (mrly c(mtering, lmi; h~d 1;o l)e al)andone(1 be- 
cause of (:olmter-evidenee from r(,'al discourse. 
Bnt the id(::r (,ha.t t)l'on()nfinaliztti;ion of the Ct) 
could 1)e a, m(:ans of establishing lo(:td discourse 
(x)herence is still 1)revalenl;. \]t has accordingly 
})een use(t l)y seine generation systems to (:on- 
trol 1)ronominalization e.g. in the IIA,;X sys- 
l~em (O1)erlm~dcr et al., 1998), the el) is always 
realized as a pronomL 
Semantic centering. Centering is a.lso found 
in Dale (1990) as the method of t)ronominaliza- 
tion control. However, Dale's center detinii;ion 
differs from standard centering theory in i;hat it 
is defined semantically and not on the basis of 
a syni;aetie ranking. 2 This apl)roaeh has some 
appeal, espc.cially for generation, ix;cause it sup- 
l)ori, s the natural mo(hfla.rity bel;ween strate- 
2In 1)ari;icular, \])al(: adol)l;s (;he l'estll(; o1' I;he aci;ion 
(Ienol:ed 1)3' ihe previous claus(; of a recil)(: as (;he center. 
~i(: generation -wlfi(:h would determine t;he se- 
mantk: c(:nter for each uttera.llce and tactical 
g(:neration which dec.ides about granmmt:.ical 
fimetions. 
Functional centering. Finally, the cent;ering 
version suggest;ed \])y Stl'ul)e and Hahn (\]!)99) 
al>l)ears to r(:veal an underlying discourse mech- 
anism resl)onsil)le for centering: the information 
si;rH(:ture of an utterance (roughly the given- 
new i)ai;t(:rn) is (;he de('l)er reason for the rank- 
ing ()\[ the %rward-h)oking (:eni:(:rs. This l)er- 
mii;s a generalization of sl;andar(l centering into 
a -indel)ell(leld; l;heory eovering 1)oth 
trex: and fixed word-order s. It is how- 
ever then surprising that this result is not made 
maximld use of in the Sltb.sc,,(lttent generaISon- 
orient, cA work of McCoy an(1 Strube (1999). 
Beyond centering. The questions wlfi(:h re- 
main ol)en with all (;hree al)l)roat:hes - stan- 
dard (;(;nterillg, S(:lll;~ll(;ic centering and fun(:-- 
tional eentering- are: 
\].~.~ \~?\]ly are ill real texts a. lat%e nunl\])(:r of 
C1)'s not t)rononfinalized? 
_P21 \~q~y are non-C1)r(:ferents 1)ronominalized? 
or (:x1)ressc(l indel)(,.1Mently of centering: 
IP~I \~qty are in real texl;s a large nunfl)(,r of 
(tis(:our,s(: entities with an ani;ecedent in i;h(', 
previous utteran(:e not l)ronomina\]ized? 
FI?2~ \NqW can more tlmn one entity 1)e t)ronom- 
inalize(l in one ui;t(,rance? 
\].h'on~ a corlms-driven view, question \[~ is the 
larger prol)lem. 
~/\[(;C()y all(t Stl'lt\])e (19,()9) were the first to 
sng;~est all al~ol;il~llln for ~ellel'~ttioll which solves 
t;h(,se problems. It was motivated by the ol)- 
servation that ~t la.rge percentage of NPs which 
would have been realized by 1)ronouns using 
known algorithms, are in fact not, realized as 
pronomls in real text. They suggest that such 
NPs serve to mark ~time changes' in the dis- 
course. Their algorithm aecordingly makes use 
of distance, context ambiguity mid telnl)ora\] 
discourse structure to decide about 1)ronolni- 
nalization. In our work, we have considered a 
corpus of a ditl'erent genre in which I, emt)oral 
cha.nge does nol, 1)bW a determining role: de- 
script, ive (;exts. \¥e 1)repose a new algorithm 
307 
that significantly simplifies the problem of pro- 
noun choice. It is based on a new definition of 
the local focus, which views the discourse status 
of the antecedent as the major motivation be- 
hind focusing. The algorithm performs equally 
well when applied to McCoy and Strube's cor- 
pus of newspaper articles. 
3 Corpus analysis 
The algorithm we will present below has been 
developed in close relation to the MUSE corpus - 
a corpus of museum exhibit labels a. The corpus 
is a collection of web pages of the Paul Getty 
Museum, pages from an exhibition catalogue, 
and pages froln a jewellery book. Typical char- 
acteristics are tile central role of inanimate ref- 
erents in these texts, and the lack of temporal 
change thus providing an interesting counter- 
t)art to the newst)aper genre investigated by Mc- 
Coy and Strube. 
With an overall set of around 5000 words, tile 
cortms contains 1450 NPs. Each NP has been 
annotated with respect to, among others, gram- 
matical function, discourse status, gender, num- 
ber, countability, and antecedent relationships. 
23% of the NPs form reference chains (i.e. at 
least two mentions of one and the same referent 
in one text), the other 77% are only mentioned 
once. We have 101 different reference chains; 
the chain-fbrming NPs fall into 10\] discourse- 
new and 213 anaphoric NPs. In the following, 
we will only discuss the anaphoric NPs. 50% of 
the anaphoric NPs are realized as definite de- 
scriptions, 50% as pronouns. We distinguish be- 
tween locally bound pronouns, which are deter- 
mined syntactically (Binding Theory, (Chore- 
sky, 1981)), and which we expect the tacti- 
cal generator to handle correctly, and pronouns 
which are not locally bound so-called dis- 
course pronouns. We investigated possible col 
relations between the discourse 1)ronouns and 
semantic/pragmatic features of their context. 
The basic notions that we found were dis- 
tance, discourse status of the antecedent, and 
grammatical function of the antecedent. All 
three notions need a precise definition. 
Distance. ~lb be able to determine the dis- 
tance between a discourse entity and its an- 
tecedent, a precise determination of what counts 
3UI{L: http ://www. hcrc. ed. ac. uk/'gnome/corpora 
as utterance unit is necessary. Following 
Kameyama (1998), we take as utterance unit 
the finite clause, l{elative clauses and con> 
plenmnt clauses are not counted as utterances 
on their own. This means that we count 
clauses containing complement clauses or rel- 
ative clauses as single utterances. 4,a The pre- 
vious utterance is the preceding utterance at 
the same level of embedding. 
Note that we allow the treatment of clauses 
with VP coordination (subject ellipsis) as com- 
plex coordinated clauses as done in Kameyalna 
(1998), thus handling subject ellipsis as a dis- 
course pronoun; our algorithm does not; insist 
on this view however. 
The following correlation between pronoun 
use and distance was tbund in our corpus: 97% 
of the pronouns have an antecedent in the same 
or the previous utterance. 
Discourse status. The information status of 
a discourse entity in an utterance is either given 
or new. We use these terms with an identi- 
cal Lneaning as g~vn'nd and focus in Vallduvi 
(1993). Discourse status, as introduced by 
Prince (1992), is a similar but different notion: 
A discourse entity is discourse-old, if it has 
been mentioned before in the same discourse; 
it is discourse-new otherwise. All cases of 
givenness by indirect means like part-whole, 
set-member relationships, other bridging rela- 
tions, inferences (Prince's inferrables, anchored 
and situationally evoked entities) are judged as 
discourse-new, thus taking into account only 
tile identity antecedent relationship. We share 
Prince's opinion that pronominalization has to 
do with discourse status, whereas definiteness 
has to do with information status. 
66% of all short-distance discourse pronouns 
in the MUSE corpus refer to an antecedent which 
is in itself discourse-old. 
Subjecthood. The third strong correlation is 
the relation between pronoun use and the grmn- 
matical function of the antecedent. 63% of dis- 
course pronouns have a subject as antecedent. 
The following table shows the overall distribu- 
tion of antecedent properties for short-distance 
4This deviates from Kameyama, who analyzes re- 
ported speech as separate utterance. 
'SComplement and relative clauses consisting of inore 
than one tinite clause create their own internal level of 
focusing. 
308 
discourse 1}ronouns and (shown ill 1}rackets) h),' 
short-distance definite descriptions. 
old new 
,~,t)io{:{; a8% (22%) 25% (12%) 
~ot s,,1,.i 28% (18%) .(}% (48%) 
4 Algorithm 
\] ~ased on these corpus s{,ndy resnl|;s, we {lefhle a 
new notion of the local focus -- the set of refer- 
ents which arc awfilal)le for prononfilmlization. 
The local focus is ut}dated at each utterance 
boundary, and is defined as the set of referents 
of the 1)revious utterance which are: 
(a) discourse-old, or 
(b) realized as subject. 
This set {:all theoretically {:ontain \]11{}re thnu one 
\].lt ill \]HOSt cas,, s, O0 a.(1 (\])) are o,,o 
and the same singlel;on sol;, which coul{l be seen 
as the well-known Cb. Thus sta\]ldar(l cent{~r- 
ing app{,,a\]'s as ~t spe{'ial case of {}m a\])l}roa{:h. 
This account means that newly introduced r(;t L 
{;l'C\]l|,S arc ll(}l; immediately l)ron{}mina\]ized i\]1 
the following utterance, Ulfless they have bc{,.n 
introduced as subject--ml ol)servation made l}y 
Brenlmn (19!}8) and now confirlile(t with respect 
to our data also. 
The l}rOl}osed definition of the lo{:al focus ~,,ou-.. 
eralizes the t2){:using mechanisms assm~led i\]1 
{x211i,el'\]llg all(t intro(hlces the discourse status ()f 
the antecedent as (}he lnain {:rii;eriol~ })ehi\]\]d th{~ 
l}\]onomi\]laliza.ti{}n decision. It is i\]\]t{;\]'{>ting iT() 
lIOl,e I;ll:tl; \]X/i{:Coy all{1 Sl;rul}{; (1.9991 also nmke 
use of the discourse status of the ant(x:{,Aent; 
without mentioning it exl)lh:itly. For a. certah\] 
sul)sel; of' intrasentealtial anat)horic relal, ions ill 
amMguous cont(,xts they I)rOl}OSe \])ro\]\]on\]ina.1- 
ization in case the antec(;dent would 1)c the 1)re- 
ferred one in Stl'ut)e (1998)'s pr{}lloun resohl- 
lion algoritlnn. Because the set {}i' a.ntecedents 
is l'mlkcd there with resl)eCt to infbrm~tion sta- 
tus, this is identical with {mr proposal. Why 
tlmy do not use the discourse status as a gen- 
eral criBerion is not clear. We believe that the 
discourse status of the antecedent as pr{momi- 
nalization trigger is a general rule, in discourse 
Sell-l&ii/;,ies. 
The central role of discourse sLat;us and sub- 
jecthoo{1 are in our opinion 1101; accidental. The 
Bwo nol;ions retlecl; tw{\] tyt)ical stra{,egies 1;o 
introduce a new referenl; inBo l;he (liscourse. 
Wc will assume here the mnm~rked inf(}rnmt;ion 
structure of an utterance: given - new. The 
subject usually is part of (or identical to) the 
9i'uen. Let X i)e a certain referent; which is newly 
intl'oduced in utterance (ul), and referred to 
again in t;lle following ui~tera.nce 0121. In the 
first strategy, X is introduced in the new non- 
subject t)arl; of (u\].). And ill this l)ati;el'n the sec- 
ond lnention of X in (u2) is not pronominalized. 
In exalnple (1) given in Figure 1 tile local focus 
for ,,t ;el'a,,ce one el m0,1  : {t,.4; 
'm..,in 'morns" is new in (\]1\].) and \]1ol; pronominal- 
ized in (112). The other typical strategy is where 
the referent is tirst mentioned ill a subject posi- 
tion. This is typical for a segment onset, or the 
beginning of ~ text,. Ofl;en this referent is given 
})y other lneans - - for example, l)y refhrence to a 
1)icture., or to a r(;lated object. In example (2) 
of Figure \]., the second mention is i)rononfinal - 
ized. ~\]'\]ms 1;11(} sul)jecI; position seems to time- 
lion as creating a givemless allocation for the 
denotexl rcfercnl;. These two strategies roughly 
correspond with two types of thcnlatic develop- 
nlent identified in l)mm,q (197d). 
Parallelism. Our definition of l;he local tS- 
cus licenses 91% (62 of 68 pronouns) of all 
short-distance discourse pronouns in ore" corpus. 
Looking at tile pronomls violating the prol)osed 
accounl;, we nm.de ,,11 interesting observal;ion: 
n}osl; el l;heln occtlr ill conl;exts of strong t)ar- 
allelism. \'Vc call an anphoric NP ~/~,1~2 parallel 
if it has ml a.ntccedenl; 'l~q)l in the previous utter- 
~Ill(;{;~ alld 'll,l) I alld '**,i12 \]rove Lhe 5alllC graummti- 
cal function. 1,k)l" work with real I;ex{;, il; is useful 
to inchlde cases whel'(~' 7L\]) 2 is a 1)osscssive or gen- 
itive NP inside a certain 'npa, and 'np\] and 'np:~ 
have the same gralnmatical flmction. Depend- 
ing on the concrel;e function, we distinguish sub- 
.ic{:t and object 1)arallelisln. Strong parallelism 
is a simulta.neous subject mid object para.lMism 
in two consecutive clauses. Strong i)arallelism 
always overrides the local focus criterion, mid 
allows tbr pronominalization of referents with 
discourse-new antecedents in nonsut)ject posi- 
tion. 
The local focus definition refilled by the par- 
allelism eff'ect is ml explanation for question P~ 
and a small portion of \[~\], but most cases of 
problem \[~ r(nnain open. q_'wo reasons for not 
prononfinalizing a reh~rent which is a nwanber 
of the local focus need to be considered: 
~ alnbiguous context, 
309 
(1) (\[11.) Shortly after irzh, eriti~,.q the building in 1752, he commissioned th, e areh, iteet Pierre Conta, nt 
d'Ivry to renovate the main rooms. 
(u2) The engravings for these rooms , showing the wall lights in place, were reproduced in Dide~vt's 
Encyclopaedic, one of the principal works of the Age of E'nlightenment. 
(2) (ul) Scottish born, Canadian based jeweller, Alison Bailey-Smith, constructs elaborate and cer- 
emoniaI jewellers} from irtd,ustrial wire. 
(u2) Her materials are often gathered from so'arces such as abandoned television .sets ... 
(3) ~i~ With attachments s'ach as an omtlav micvometer~ the microscope irtcovporates the latest sei- 
entitle technology of the mid-17OOs. 
(u2) The design of its era'ring gilt b~vnze stand was the heigh, t of the Rococo stifle ... (4) 
(u0) the table probably came from, the Trianon de Porcelaine , a small house built for th, e King's 
mistress, Madame de }l/\[ontespar~,, on the 9ro'unds of the Palace of Vet.sallies. 
(ul) This table's marquetry of ivory and horn, painted blue underneath, would have followed the 
house's blue-and-white color sehcrne, imitating blue-and-white Chinese porcelain, a fashionable aTI.d 
highly prized material. 
(u2) Blue-arM-white cevarnie tiles decorated the house, ... 
Figure t: Corlms examples 
discourse structure signalling. 
Ambiguity. Along with McCoy and Strube 
we argue that ambiguity with respect to gen- 
der/number influences the pronominalization 
decision: members of the local fbcus which have 
a competing referent (refbrent with similar gen- 
der/number) in some span to the left of the ref- 
erent to be generated should not be realized as 
t)ronouns so as to minimize the inference load 
for the reader. However, not to allow pronom- 
inalization in all ambiguous context situations 
does not ~I)t)ear to be consistent with real texts 
(McCoy and Strube, 1999). In the MUSE cortms 
one third of all focal NPs occur in ambiguous 
contexts, one half of them is pronominalized, 
the other half is not. Two questions require a 
precise answer to use the ambiguity constraint 
in a generation algorithm: 
• Which set of 1)reviously mentioned refer- 
ents or text st)an is taken into account tbr 
referents to be in competition? 
• Which referents are pronominalizcd despite 
an alnbiguous context? 
The answer is surprisingly simple: I/.eferents 
of the previous utterance which are not in the lo- 
cal fbcus do not disturb pronominalization, even 
if they have the stone gender/nmnber. Only if 
the actual referent has a competitor in the local 
fbcus, is pronominalization blocked. This is il- 
lustrated in Figure 1 with exmnples (3) and (4), 
respectively. In (3) the microscope is discourse- 
old and the only member in the local focus for 
(u2); the competing referents ocular micrvmc- 
tcr" mid technology are new and hence not local 
fbr utterance (u2). In (4), the local focus for 
(u2) is {the  at, th,  tl, e l,.o'asc} 
A slight improvement of the performance of 
the algorithm can be achieved by regarding 
the role of "heavy" nonrestrictive modification. 
hm\]uding the referents of discom'se-new NPs 
which are amplified by appositions or nonre- 
strictive relative clauses into the set; of' possible 
competitors improves accuracy slightly. 
Discourse structure signalling. It is now 
known that detinite descriptions (or more gen- 
eral overspecified NPs) signal the start of a new 
discourse segment (Passommau, 1996; Vonk et 
al., 1992). For most generation systems gener- 
ate from an I/ST-like text; plan, discourse seg- 
ments are naturally given. The only question 
fl'om the generation perspective is the degree of 
detail provided by the segmentation. 
Our algorithm gnome-up assumes that the 
discourse segmentation has already been speci- 
fied. At each segment boundary, the local focus 
is set to nil, thereby disallowing pronominal- 
ization for all discourse entities of the first ut- 
terance in the segment onset. 
It is also well known that plmmed discourse 
with repeated phrases at the begimfing of a 
clause are seen as 'bad style'. Identical repeated 
pronouns at the clause onset are rarely found 
in expository and descriptive texts (2.6% of all 
discourse pronouns in our corpus). Hmnan writ- 
ers usually avoid possibly dull lack of variation 
by employing various aggregation techniques. 
310 
Let X 1)e a refl,'renl; I;o 1)e generated in Ill;l;erailCO (112), and focu,.s 1)e the' scl; of rc'h;reni;s of the 1)rcvious 
ul;l;eran(:(; (ul) which are 
(a) discoursc-okl, or 
(b) realized as subject. 
(1) X has an antecedent beyond a segment boundary def description 
(2) X has an antecedent two or more ul;i;cranccs distant def (lescril)tion 
(:~) X hits ~Ul alll;(X'(xl(,ll(; ill (ll\]): ~lll(l 
(3a) X occurs in strong 1)aralM contc'xt 1)ronoun 
(31)) X ¢ focu,.s (lcf dcscril)tion 
(3(:) X C .foc~z.s and 
• X has a coral)cling relbrcnt Y c focus dc'f description 
• X has a comp(~ting retba'cnt Y in (ul) amplitic(1 with appo- (lef dcscril)tion 
sition or nonrestrictive relative clause 
• olso prollOllll 
The repeti|;ion 1)locking rul(; overri(lcs the 1)ronominalization suggesl;ed in (3c) to a definite description. 
Figme 2: The algoritlnn 
rl'hlls pronoun rel)etition 1)hwking seems 1.o Jw~ 
an aggregation trigger rather than ~ motivation 
for definite description generation. We hyl)ot\]> 
(;size l;hat t;he at)l)ar(mt Kcquen(:y of (lelinite (le- 
scriptions ill t)lmnm(l discourse has much to do 
with repetition blocking, but is used with re- 
specl; to a very line-grained, 1)tel)ably genre- 
specific discourse, si;rtlCl;lll'e. Olle candidate for 
this is the, t(,mt)oral structure in newst)at)er ar- 
i;ieles proposed by McCoy and Sta'ul)e. 
When evahutting Ollr algorM m l, w(' only used 
tile pa.l'agr~I)h seglnenl;~ti;ion given in the corpus. 
\]{lit for g;etlel'al;ioll systellls, which usually sir(' 
not equil)l)Cd wit.h develol)ed a.ggrcgal.i(m meal- 
tries, we have also made avai\]ablc a t)ronoun rci> 
et,itioli blocking rule: If a discourse entity in the 
local focus has a nont)ossessiv(~ l)ronomilml an- 
|;ecedelit, in'onolninalizal;ioli will 1)e J)loel¢cd at 
this l/line. Figure 2 SUll:lnlarizcs the algorithm. 
The presented pronominalization algorithm 
has been implelnented ill the reusable module 
gnome-np, gnome-np consists of a colnponent 
for discom'se model lnanagement and one for 
NP form determination, it is designed to 1)e 
plugged ill ~~:\['1;(;1' text 1)lanning, coneeI)tualiza- 
ti(m, and sentence plalming, trot 1)etbre tactical 
generation. 
5 Evaluation 
A comparison of the t)erforlnance of our algo- 
rithln with 1,he annotated MUS1.; corlms and Mc- 
Coy and Strube's newspatmr corlms is given in 
Table 1. The e, valuation has been carried out 
for the algorithm gnome-np without cm\])loying 
the rel)etith)n blocking rule and without; a line- 
grained discourse segmentation. Layout scg- 
lllell{;s Wel'e llse(l for the MUSE COl'l)llS. Beeallse 
l/he munl)er of annotal;e(l seglnent OllSe\[;s Jill' the 
newsl)aper corpus is not easy to r('-estat)lish, wc 
giv(; here two figures fol" this eori)us: tirst with- 
out any segment, ons('t signalling (lower 1)ound), 
and second with the assulnt)l;ion that 15 short- 
distance definite (tcscriptions mark segment on- 
s<%s. The tigures include locally-herald l/re - 
nouns to yield J)(;tter cOlnl)arability wil;h McCoy 
and Sl;rul)e. '.\[lic, figur(,'s in l,hc, (:ohmms 'gnome- 
nil' represc, nl; I;\]lose NPs whose form is l)re(li(',led 
correctly 1)y 1;hi; new algoril;hm when evaluatc(l 
against l;h('~ a\]moi;at,(~(l corpora. 
The figures in T~d)le 1 show that our al- 
gorithm performs very well in both domains, 
even without using a tiner discourse Seglnen- 
ration such as telnt)ol'al structure. Moreover, 
it; pertBrms better on McCoy and Stl'ul)e's cor- 
pus than their own algorithm, which success- 
fldly predicted the choice between realization by 
pronoml and realization by detinite description 
in 84.7% of all eases. The disagreements oc- 
('ur tirsl; tbr long distance t)rol~ouns (in our ter- 
lilino\]ogy: prollOtlllS lIlore than one clause dis- 
tanI;) and, second, ill hmger tel'trent chains with 
well established focus. For the latter, whereas 
gnome-np wouhl always suggest a tn'OlmUn, the 
real discourse swaps betweeli pronoun mid deft- 
nile description. Thus a finer segmentation or a 
repetition blocking rule could still improve the 
result fllrther. 
311 
MUSE gnome-rip agreement newspaper gnome-np agreement 
pronouus 112 101 90.2% 302 267 88.d% 
def descril)tions 101 86 85.1% 225 187 202 83.1% 89.70{o 
total 213 t87 87.8% 527 454 469 86.1% 89.0% 
'Dtble 1: Performance comparison 
6 Conclusions 
This paper has presented a new algorithm tbr 
the pronominalization of third person discourse 
entities. The Mgorithm, first, is implemented 
as a reusable module tbr generation systenls 
and, second, provides a theoretical account of 
pronominalization in general. 
The proposed algorithm provides a solution 
for question \[~ above by widening the defini- 
tion of local Ibcus to be a set with possibly more 
than one referent. The Mgorithm also oilers a 
new solution fbr problem \[~ above, aml)igu- 
ous pronoun generation. Discom'se structuring 
(~\]) is assumed as given. A sufficiently fine- 
grained discourse structuring has been explored, 
for example, by McCoy and Strube fbr their do- 
nlmn of newspaper articles, but remMns an issue 
fbr future research fbr other domains. We have 
shown that next to 1)roxilnity, the discourse sta- 
tus of the mltecedcnt is a main criterion for trig- 
gering pronomilmlization. 
The suggested algorithm generalizes known 
fbcusing accounts. Gundel et al. (1993)'s cog- 
nitive slat, us of being "in fbcus" is now approxi- 
mated by the set of all discourse-old entities and 
the subject of the previous utterance. The new 
focus determination is also a generalization of 
centering's Cb. The focus so defined serves two 
functions simultaneously: to trigger 1)ronomi- 
nMization, and to provide the set; of competitors 
for pronoun generation in ambiguous contexts. 
Although our training corpus is too small to jus- 
tify general clMms, the ewduation with respect 
to tile newspaper genre provides evidence that 
this finding is valid for planned discourse ill gen- 
erM, independent of the concrete genre. 

References 

Susan Brennan. 1998. Centering as a, psychological re- 
source for achieving joint reference in spontaneous 
discourse. In Marilyn A. Walker, Aravind K..loshi, 
and Ellen F. Prince, editors, Centering Theory in Dis- 
course, pages 227 - 250. Clarendon Press, Oxtbrd. 

Noam Chomsky. 1981. Lectures on government and 
binding. Foris, l)ordrecht. 

Robert Dale. 1990. Generating referring cxpression.s. 
The MIT Press, Cambridge, Massachusetts. 

Fl'anti~ek Dane~. 1974. \]hmctional sentence perspective 
mid the orga.nisa.tion of the text. In Frantigek Dane~, 
editor, Papers on Functional Sentence Perspective, 
pages 106 128. Academia, Prague. 

Barbara J. Grosz, Aravind K. Joshi, and Scott Weinstein. 1995. Centering: A framework for modelling 
the local coherence of discourse. Computational Linguistics, 21 (2):203 16d. 

Jeanette K. Gundel, Nancy Iledberg, and Ron Zaeharski. 
1993. Cognitive status aim the tbrm of rethr,:ing ex- 
pressions ill discourse. Lang'uagc , 69:27d 3(}7. 

Megulni Kameyama. 19.(t8. lntrasentcntial centering: A 
case study. In Marilyn A. Walker, Aravind K..Joshi, 
and Ellen F. Prince, editors, Centering 7'hcory in Dis- 
course, pages 89 -- 114. Clarendon Press, Oxtbrd. 

Kathleen McCoy and Michael Strube. 1999. Generating 
anaphoric expressions: Pronoun or definite description? In Proceedings of ACL '99 Workshop: Reference and discourse structure, pages 63 -- 71. 

J. Obcrlander, M. O'Donnell, A. Knott, and C. Mellish. 
1998. Conversation in the museuln: experiments in 
dynamic hypermedia with the intelligent labelling ex- 
I)Iorcr. New l~,(:view of Multimedia and Hypermedia, 
pages 11 32. 

l/,ebecca Passonlmau. 1996. Using centering to re- 
lax gricean constraints on discourse anaphorie noun 
phrases. L(tngu, age wn, d ,qp('.ech, 39(2):229-- 264. 

Ellen F. Prince. 1992. The ZPC letter: Subjects, deli- 
niteness aim inforina.tion status. In W. C. Mam~ and 
S. A. Thompson, editors, Discourse desciption: Di- 
verse linguistic anab.lSCS of a flmd-raisi.ng text..lohn 
Benjamins, Amsterdam. 

Cmldace L. Sidner. 1979. 7bwards a computationally 
theory of definite anaphora comprehension in English 
disourse. PhD thesis. 

Michael Strube and Udo Hahn. 1999. Functional centering - grounding referential coherence in information structure. Computational Linguistics, 25(3):309-344. 

Michael Strube. 1998. Never look back: An alternative 
to centering. In Proceedings of Coling-ACL '98, pages 
1251-1257. 

Linda Z. Suri. 1993. Extending focussing frameworks to 
process complex sentences and to correct the written 
English of proficient signers of American Sign Lan- 
guage. PhD thesis. 

Enrico VMlduvi. 1993. lMbrmation packaging- a survey. 
Technical rel)ort, HCRC research Pal)er RP-d4. 

W. Vonk, G. Hustinx, and W. Simons. 1992. The use of 
referential expressions in structuring discourse. Lan- 
guage and Cognitive P~vcesses, 7(3/4):301-333. 
