Functional Centering 
Michael Strube & Udo Hahn 
Freiburg University 
(~\[} Computational Linguistics Lab 
Europaplatz 1, D-79085 Freiburg, Germany 
{strube, hahn}@coling, uni-freiburg, de 
Abstract 
Based on empirical evidence from a free 
word order language (German) we propose a 
fundamental revision of the principles guid- 
ing the ordering of discourse entities in the 
forward-looking centers within the center- 
ing model. We claim that grammatical role 
criteria should be replaced by indicators 
of the functional information structure of 
the utterances, i.e., the distinction between 
context-bound and unbound discourse ele- 
ments. This claim is backed up by an empir- 
ical evaluation of functional centering. 
1 Introduction 
The centering model has evolved as a methodology for 
the description and explanation of the local coherence 
of discourse (Grosz et al., 1983; 1995), with focus on 
pronominal and nominal anaphora. Though several 
cross-linguistic studies have been carded out (cf. the 
enumeration in Grosz et al. (1995)), an almost canon- 
ical scheme for the ordering on the forward-looking 
centers has emerged, one that reflects well-known reg- 
ularities of fixed word order languages such as En- 
glish. With the exception of Walker et al. (1990; 
1994) for Japanese, Turan (1995) for Turkish, Ram- 
bow (1993) for German and Cote (1996) for English, 
only grammatical roles are considered and the (par- 
tial) ordering in Table 11 is taken for granted. 
I subject > dir-object > indir-object I 
> complement(s) > adjunct(s) 
Table 1: Grammatical Role Based Ranking on the C! 
~Table 1 contains the most explicit ordering of grammat- 
ical roles we are aware of and has been taken from Bren- 
nan et al. (1987). Often, the distinction between comple- 
ments and adjuncts is collapsed into the category "others" 
(c.f., e.g., Grosz et al. (1995)). 
Our work on the resolution of anaphora (Strube & 
Hahn, 1995; Hahn & Strube, 1996) and textual el- 
lipsis (Hahn et al., 1996), however, is based on Ger- 
man, a free word order language, in which grammat- 
ical role information is far less predictive for the or- 
ganization of centers. Rather, for establishing proper 
referential relations, the functional information struc- 
ture of the utterances becomes crucial (different per- 
spectives on functional analysis are brought forward 
in Dane~ (1974b) and Dahl (1974)). We share the no- 
tion of functional information structure as developed 
by Dane~ (1974a). He distinguishes between two cru- 
cial dichotomies, viz. given information vs. new infor- 
mation (constituting the information structure of ut- 
terances) on the one hand, and theme vs. rheme on 
the other (constituting the thematic structure of utter- 
ances; cf. Halliday & Hasan (1976, pp.325-6)). Dane~ 
refers to a definition given by Halliday (1967) to avoid 
the confusion likely to arise in the use of these terms: 
"\[...\] while given means what you were talking about 
(or what I was talking about before), theme means 
what I am talking about (now) \[...\]" Halliday (1967, 
p.212). Dane~ concludes that the distinction between 
given information and theme is justified, while the dis- 
tinction between new information and rheme is not. 
Thus, we arrive at a trichotomy between given infor- 
mation, theme and rheme (the latter being equivalent 
to new information). We here subscribe to these con- 
siderations, too, and will return in Section 3 to these 
notions in order to rephrase them more explicitly by 
using the terminology of the centering model. 
In this paper, we intend to make two contributions 
to the centering approach. The first one, the intro- 
duction of functional notions of information structure 
in the centering model, is methodological in nature. 
The second one concerns an empirical issue in that we 
demonstrate how a functional model of centering can 
successfully be applied to the analysis of several forms 
of anaphoric text phenomena. 
At the methodological level, we develop arguments 
that (at least for free word order languages) grammat- 
ical role indicators should be replaced by functional 
270 
role patterns to more adequately account for the or- 
dering of discourse entities in center lists. In Section 
3 we elaborate on the particular information structure 
criteria underlying a function-based center ordering. 
We also make a second, even more general method- 
ological claim for which we have gathered some pre- 
liminary, though still not conclusive evidence. Based 
on a re-evaluation of empirical arguments discussed 
in the literature on centering, we stipulate that ex- 
changing grammatical by functional criteria is also a 
reasonable strategy for fixed word order languages. 
Grammatical role constraints can indeed be rephrased 
by functional ones, which is simply due to the fact 
that grammatical roles and the information structure 
patterns, as we define them, coincide in these kinds 
of languages. Hence, the proposal we make seems 
more general than the ones currently under discus- 
sion in that, given a functional framework, fixed and 
free word order languages can be accounted for by the 
same ordering principles. As a consequence, we argue 
against Walker et al.'s (1994, p.227) stipulation, which 
assumes that the C I ranking is the only parameter of 
the centering theory which is language-dependent. In- 
stead, we claim that functional centering constraints 
for the C! ranking are possibly universal. 
The second major contribution of this paper is re- 
lated to the unified treatment of specific text phe- 
nomena. It consists of an equally balanced treatment 
of intersentential (pro)nominal anaphora and textual 
ellipsis (also called functional or partial anaphora). 
The latter phenomenon (cf. the examples given in the 
next section), in particular, is usually only sketchily 
dealt with in the centering literature, e.g., by assert- 
ing that the entity in question "is realized but not di- 
rectly realized" (Grosz et al., 1995, p.217). Further- 
more, the distinction between those two kinds of re- 
alization is generally delegated to the underlying se- 
mantic theory. We will develop arguments how to lo- 
cate elliptical discourse entities and resolve textual el- 
lipsis properly at the center level. The ordering con- 
straints we supply account for all of the above men- 
tioned types of anaphora in a precise way, includ- 
ing (pro)nominal anaphora (Strube & Hahn, 1995; 
Hahn & Strube, 1996). This claim will be validated 
by a substantial body of empirical data (cf. Section 4). 
2 Types of Anaphora Considered 
Text phenomena, e.g., textual forms of ellipsis and 
anaphora, are a challenging issue for the design of 
parsers for text understanding systems, since imper- 
fect recognition facilities either result in referentially 
incoherent or invalid text knowledge representations. 
At the conceptual level, textual ellipsis relates a quasi- 
anaphoric expression to its extrasentential antecedent 
by conceptual attributes (or roles) associated with that 
antecedent (see, e.g., the relation between "Akkus" 
(accumulator) and "316LT", a particular notebook, in 
(lb) and (la)). Thus, it complements the phenomenon 
of nominal anaphora, where an anaphoric expression 
is related to its antecedent in terms of conceptual gen- 
eralization (as, e.g., "Rechner" (computer) in (lc) 
refers to "316LT' in (la) mediated by the textual ellip- 
sis in (lb)). The resolution of text-level nominal (and 
pronominal) anaphora contributes to the construction 
of referentially valid text knowledge bases, while the 
resolution of textual ellipsis yields referentially coher- 
ent text knowledge bases. 
(1) a. Ein Reserve-Batteriepaek versorgt den 316LT ca. 
2 Minuten mit Strom. 
(A reserve battery pack - supplies - the 316LT - 
for approximately 2 minutes - with power.) 
b. Der Status des Akkus wird dem Anwender ange- 
zeigt. 
(The status of the accumulator - is - to the user - 
indicated.) 
c. Ca. 30 Minuten vor der Entleerung beginnt der 
Rechner 5 Sekunden zu beepen. 
(Approximately 30 minutes - before the discharge 
- starts - the computer - for 5 seconds - to beep.) 
d. 5 Minuten bevor er sich ausschaltet, f'angt die 
Low-Battery-LED an zu blinken. 
(5 minutes - before - it - itself- turns off- begins 
- the low-battery-LED - to flash.) 
In the case of textual ellipsis, the missing concep- 
tual link between two discourse elements occurring in 
adjacent utterances must be inferred in order to estab- 
lish the local coherence of the discourse (for an early 
statement of that idea, cf. Clark (1975)). In the sur- 
face form of utterance (lb) the information is missing 
that "Akkus'" (accumulator) links up with "316LT". 
This relation can only be made explicit if conceptual 
knowledge about the domain, viz. the relation part-of 
between the concepts ACCUMULATOR and 316LT, is 
available (see Hahn et al. (1996) for a more detailed 
treatment of text ellipsis resolution). 
3 Principles of Functional Centering 
Within the framework of the centering model 
(Grosz et al., 1995), we distinguish each utterance's 
backward-looking center (Gb(U,~)) and its forward- 
looking centers (G! (Un)). The ranking imposed on 
the elements of the G I reflects the assumption that the 
most highly ranked element of G I (tin) - the preferred 
center Cp(Un) - is the most preferred antecedent of 
an anaphoric or elliptical expression in Un+l, while 
the remaining elements are partially ordered accord- 
ing to decreasing preference for establishing referen- 
tial links. Hence, the most important single construct 
of the centering model is the ordering of the list of 
forward-looking centers (Walker et al., 1994). 
271 
The main difference between Grosz et al.'s work 
and our proposal concerns the criteria for ranking the 
forward-looking centers. While Grosz et al. assume 
that grammatical roles are the major determinant for 
the ranking on the C'y, we claim that for languages 
with relatively free word order (such as German), it 
is the functional information structure (IS) of the ut- 
terance in terms of the context-boundedness or un- 
boundedness of discourse elements. The centering 
data structures and the notion of context-boundedness 
can be used to redefine Dane~' (1974a) trichotomy be- 
tween given information, theme and new information 
(rheme). The Cb(U,), the most highly ranked element 
of C.t(Un-i) realized in \[In, corresponds to the el- 
ement which represents the given information. The 
theme of U, is represented by the preferred center 
C'p(U,), the most highly ranked element of C! (Un). 
The theme/rheme hierarchy of \[In is represented by 
CI(U,~ ) which - in our approach - is partly deter- 
mined by the C! (Un-i): the rhematic elements of Un 
are the ones not contained in C! (U,_ i ) (unbound dis- 
course dements); they express the new information in 
Un. The ones contained in Cl(U,_i ) and Cy(U,) 
(bound discourse elements) are thematic, with the 
theme/rheme hierarchy corresponding to the ranking 
in the Cls. The distinction between context-bound 
and unbound elements is important for the ranking 
on the C I, since bound elements are generally ranked 
higher than any other non-anaphoric elements (cf. also 
Haji~ov~i et al. (1992)). 
An alternative definition of theme and rheme in the 
context of the centering approach is proposed by Ram- 
bow (1993). In his approach the theme corresponds to 
the Cb and the theme/rheme hierarchy can be derived 
from those elements of C! (U,-i) that are realized in 
\[In. Rambow does not distinguish, however, between 
the information structure and the thematic structure 
of utterances, which leads to problems when a change 
of the criteria for recognizing the thematic structure is 
envisaged, Our approach is flexible enough to acco- 
modate other conceptions of theme/rheme as defined, 
e.g., by Haji6ov~i et al. (1995), since this change af- 
fects only the thematic but not the information struc- 
ture of utterances. 
bound element(s) >~sb~.. unbound element(s) 
anaphora >X Sbo,,a 
(possessive pronoun xor elliptical antecedent) >,Sbo,,a 
(elliptical expression xor head of anaphoric expression) 
nom head, >pr,c nom head2 >p~,o ... >~, nom headn 
Table 2: Functional Ranking Constraints on the C! 
The rules holding for the ranking on the C' I, derived 
from a German language corpus, are summarized in 
Table 2. They are organized into three layers 2. At 
the top level, >,sb,,~ denotes the basic relation for the 
overall ranking of information structure (IS) patterns. 
Accordingly, any context-bound expression in the ut- 
terance U,_ i is given the highest preference as a po- 
tential antecedent of an anaphoric or elliptical expres- 
sion in \[In while any unbound expression is ranked 
next to context-bound expressions. 
The second relation depicted in Table 2, >iSbou~u, 
denotes preference relations dealing exclusively with 
multiple occurrences of (resolved) anaphora, i.e., 
bound elements, in the preceding utterance. >'Sbo,,d 
distinguishes among different forms of context-bound 
elements (viz., anaphora, possessive pronouns and tex- 
tual ellipses) and their associated preference order. 
The final element of >,Sbou~u is either the elliptical 
expression or the head of an anaphoric expression 
which is used as a possessive determiner, a Saxon gen- 
itive, a prepositional or a genitival attribute (cf. the 
ellipsis in (2c): "die Ladezeit" (the charge time) vs. 
"seine Ladezeit" (its charge time) or "die Ladezeit des 
Akkus" (the accumulator's charge time)). 
For illustration purposes, consider text fragment (1) 
and the corresponding Oh~C! data in Table 33: In (ld) 
the pronoun "er" (it) might be resolved to "Akku" 
(accumulator) or "Rechner" (computer), since both 
fulfill the agreement condition for pronoun resolu- 
tion. Now, "der Rechner" (computer) figures as a 
nominal anaphor, already resolved to DELL-3 16LT, 
while "Akku" (accumulator) is only the antecedent 
of the elliptical expression "der Entleerung" (dis- 
charge). Therefore, the preferred antecedent of "er" 
(it) is determined as Rechner (computer). 
The bottom level of Table 2 specifies >~rco which 
covers the preference order for multiple occurrences 
of the same type of any information structure pattern, 
e.g., the occurrence of two anaphora or two unbound 
elements (all heads in an utterance are ordered by 
linear precedence relative to their text position). In 
sentence (2b), two nominal anaphors occur, "Akku" 
(accumulator) and "Rechner" (computer). The tex- 
tual ellipsis "Ladezeit" (charge time) in (2c) has to 
be resolved to the most preferred dement of the C' I 
of (2b), viz. the entity denoted by "Akku" (accumula- 
tor) (cf. Table 4). Note that "Rechner" (computer) is 
the subject of the sentence, though it is not the pre- 
ferred antecedent, since "Akku" (accumulator) pre- 
cedes "Rechner" (computer) and is anaphoric as well. 
2Disregarding coordinations, the ordering we propose in- 
duces a strict ordering on the entities in a center list. 
3Minuten (minutes) is excluded from the C! for reasons 
concerning the processing of complex sentences (cf. Strube 
(1996)). 
272 
(la) Cb: DELL-3 16LT: 316LT 
Cf." \[DELL-316LT: 316LT, RESERVE-BATTERY-PAcK: Reserve-Batteriepack, 
TIME-UNIT-PAIR: 2 Minuten, POWER: Strom\] 
(lb) Cb: DELL-316LT:- 
Cf: \[DELL-316LT:--, Accu: Akku, STATUS: Status, USER: Anwender\] 
(lc) Cb: DELL-3 16LT: Rechner 
Cf: \[DELL-316LT: Rechner, Accu: --, DISCHARGE: Enfleerung, 
TIME-UNIT-PAIR: 30 Minuten, TIME-UNIT-PAIR: 5 Sekunden\] 
(ld) Cb: DELL-3 16LT: er 
Cf: \[DELL-316LT: er, LoW-BATTERY-LED: Low-Battery-LED 
Table 3: Centering Data for Text Fragment (1) 
CONTINUE 
CONTINUE 
CONTINUE 
CONTINUE 
(2a) Cb: DELL-3 16LT: 316LT 
Cf: \[DELL-316LT: 316LT, NIMH-Accu: NiMH-Akku\] 
(2b) Ch: DELL-3 16LT: Reehner 
Cf: \[NIMH-Accu: Akku, DELL-316LT: Rechner, TIME-UNIT-PAIR: 4 Stunden, 
POWER: Sffom\] 
(2c) . Cb: NIMH-Accu:- 
Cf: \[NIMH-Accu: --, CHARGE-TIME: Ladezeit, TIME-UNIT-PAIR: 1,5 Stunden\] 
CONTINUE 
RETAIN 
SMOOTH-SHIFT 
Table 4: Centering Data for Text Fragment (2) 
(2) Der316LTw~dmiteinemNiMH-Akku bestllckt. 
(I'he 316LT is - with a NiMH-accumulator - 
equipped.) 
b. Durch diesen neuartigen Akku wird der Rechner 
ffir ca. 4 Stunden mit Strom versorgt. 
(Because of this new type of accumulator - is 
the computer - for approximately 4 hours - with 
power - provided.) 
c. Dartiberhinaus ist die Ladezeit mit 1,5 Stunden 
sehr kurz. 
(Also - is - the charge time of 1.5 hours - quite 
short.) 
Given these basic relations, we may formulate the 
composite relation :>,s (Table 5). It states the condi- 
tions for the comprehensive ordering of items on C! 
(x and y denote lexical heads). 
>,s := { (x,y) I 
~fx and y both represent the same type of IS pattern 
then the relation >~,,c applies to x and y 
else/fx and y both represent different forms 
of bound elements 
then the relation >rSbo,,a applies to x and y 
else the relation >rsb°,= applies to x and y } 
Table 5: Information Structure Relation 
4 Evaluation 
In this section, we first describe the empirical and 
methodological framework in which our evaluation 
experiments were embedded, and then turn to a dis- 
cussion of evaluation results and the conclusions we 
draw from the data. 
4.1 Evaluation Framework 
The test set for our evaluation experiment consisted of 
three different text sorts: 15 product reviews from the 
information technology (IT) domain (one of the two 
main corpora at our lab), one article from the German 
news magazine Der Spiegel, and the first two chapters 
of a short story by the German writer Heiner Miiller 4. 
The evaluation was carried out manually in order to 
circumvent error chaining 5. Table 6 summarizes the 
total numbers of anaphors, textual ellipses, utterances 
and words in the test set. 
ag. apho~ ellipses utterances words 
IT 308 294 451 5542 
Spiegel 102 25 82 1468 
MfiUer 153 20 87 867 
563 339 620 7877 
Table 6: Test Set 
Given this test set, we compared three major ap- 
proaches to centering, viz. the original model whose 
ordering principles are based on grammatical role in- 
dicators only (the so-called canonical model) as char- 
acterized by Table 1, an "intermediate" model which 
can be considered a naive approach to free word order 
languages, and, of course, the functional model based 
on information structure constraints as stated in Table 
2. For reasons discussed below, augmented versions 
of the naive and the canonical approaches will also be 
considered. They are characterized by the additional 
4Liebesgeschichte. In Heiner Mflller. Geschichten aus 
der Produktion 2. Berlin: Rotbuch Verlag, pp. 57-63. 
SA performance evaluation of the current anaphora and 
ellipsis resolution capacities of our system is reported in 
Hahn et al. (1996). 
273 
constraint that elliptical antecedents are ranked higher 
than elliptical expressions (short: "ante > express"). 
For the evaluation of a centering algorithm on nat- 
urally occurring text it is necessary to specify how to 
deal with complex sentences. In particular, methods 
for the interaction between intra- and intersentential 
anaphora resolution have to be defined, since the cen- 
tering model is concerned only with the latter case (see 
Suri & McCoy (1994)). We use an approach as de- 
scribed by Strube (1996) for the evaluation. 
Since most of the anaphors in these texts are nom- 
inal anaphors, the resolution of which is much more 
restricted than that of pronominal anaphors, the rate of 
success for the whole anaphora resolution process is 
not significant enough for a proper evaluation of the 
functional constraints. The reason for this lies in the 
fact that nominal anaphors are far more constrained by 
conceptual criteria than pronominal anaphors. So the 
chance to properly resolve a nominal anaphor, even 
at lower ranked positions in the center lists, is greater 
than for pronominal anaphors. While we shift our 
evaluation criteria away from simple anaphora resolu- 
tion success data to structural conditions based on the 
proper ordering of center lists (in particular, we focus 
on the most highly ranked item of the forward-looking 
centers) these criteria compensate for the high propor- 
tion of nominal anaphora that occur in our test sets. 
The types of centering transitions we make use of (cf. 
Table 7) are taken from Walker et al. (1994). 
~(~)= ~(~) 
~(~)# ~(~) 
cb(u.) = c~(u._~) 
OR Cb(U,,-I) undef. 
CONTINUE 
RETAIN ROUGH-SHIFT 
Cb(U.) # 
C~(U~_~) 
SMOOTH-SHIFT 
Table 7: Transition Types 
4.2 Evaluation Results 
In Table 8 we give the numbers of centering transi- 
tions between the utterances in the three test sets. The 
first column contains those which are generated by the 
naive approach (such a proposal was made by Gordon 
et al. (1993) as well as by Rambow (1993) who, nev- 
ertheless, restricts it to the German middlefield only). 
We simply ranked the elements of C! according to 
their text position. While it is usually assumed that the 
elliptical expression ranks above its antecedent (Grosz 
et al., 1995, p.217), we assume the contrary. The sec- 
ond column contains the results of this modification 
with respect to the naive approach. In the third column 
of Table 8 we give the numbers of transitions which 
are generated by the canonical constraints as stated by 
Grosz et al. (1995, p.214, 217). The fourth column 
supplies the results of the same modification as was 
used for the naive approach, viz. elliptical antecedents 
are ranked higher than elliptical expressions. The fifth 
column shows the results which are generated by the 
functional constraints from Table 2. 
First, we examine the error data for anaphora res- 
olution for the five cases. All approaches have 99 
errors in common. These are due to underspecifica- 
tions at different levels, e.g., the failure to account 
for prepositional anaphors (16), plural anaphors (8), 
anaphors which refer to a member of a set (14), sen- 
tence anaphors (21), and anaphors which refer to the 
global focus (12). Only 6 errors of the functional ap- 
proach are directly caused by an inappropriate order- 
ing of the C I, while the naive approach leads to 10 
errors and the canonical to 7. When the antecedent of 
an elliptical expression is ranked above the elliptical 
expression itself the error rate of these two augmented 
approaches increases to 12 and 9, respectively. 
We now turn to the distribution of transition types 
for the different approaches. The centering model as- 
sumes a preference order among these transitions, e.g., 
CONTINUE ranks above RETAIN and RETAIN ranks 
above SHIFT. This preference order reflects the pre- 
sumed inference load put on the hearer or speaker 
to coherently decode or encode a discourse. Since 
the functional approach generates a larger amount 
of CONTINUE transitions, we interpret this as a first 
rough indication that this approach provides for more 
efficient processing than its competitors. 
But this reasoning is not entirely conclusive. Count- 
ing single occurrences of transition types, in general, 
does not reveal the entire validity of the center lists. 
Instead, considering adjacent transition pairs gives a 
more reliable picture, since depending on the text sort 
considered (e.g., technical vs. news magazine vs. lit- 
erary texts) certain sequences of transition types may 
be entirely plausible, though they include transitions 
which, when viewed in isolation, seem to imply con- 
siderable inferencing load (cf. Table 8). For instance, 
a CONTINUE transition which follows a CONTINUE 
transition is a sequence which requires the lowest pro- 
cessing costs. But a CONTINUE transition which fol- 
lows a RETAIN transition implies higher processing 
costs than a SMOOTH-SHIFT transition following a 
RETAIN transition. This is due to the fact that a RE- 
TAIN transition ideally predicts a SMOOTH-SHIFT in 
the following utterance. In this case the SMOOTH- 
SHIFT is the "least effort" transition, because only the 
first element of the C! of the preceding utterance has 
to be checked to perform the SMOOTH-SHIFT transi- 
tion, while in the case of CONTINUE at least one more 
check has to be performed. Hence, we claim that no 
one particular centering transition is preferred over an- 
other. Instead, we postulate that some centering tran- 
sition pairs are preferred over others. Following this 
274 
IT 
Transition Types naive naive & ante > express 
CONTINUE 49 167 
RETAIN 269 158 
SMOOTH-SHIFT 32 41 
ROUGH-SHIFT 39 23 
Errors ' 69 70 
canonical 
102 
226 
24 
37 
68 
canonic~ & functional 
an~ > express 
197 309 
131 25 
35 51 
26 4 
69 67 
Spiegel 
Miiller 
CONTINUE 17 
RETAIN 42 
SMOOTH-SHIFT 9 
ROUGH-SHIFT 7 
Errors 18 
CONTINUE 31 
RETAIN 19 
SMOOTH-SHIFT 15 
ROUGH-SItlFT 14 
Errors 22 
CONTINUE 97 
RETAIN 330 
SMOOTH-SHIFT 56 
ROUGH-SHIFT 60 
Errors (specific errors) 109 (10) 
28 
32 
9 
6 
19 
31 
19 
17 
12 
22 
37 
28 
7 
3 
16 
32 
18 
15 
14 
22 
43 50 
23 12 
8 13 
1 0 
17 f6 
32 36 
18 15 
16 18 
13 10 
22 22 
272 395 
172 52 
59 82 
40 14 
108 (9) 105 (6) 
226 171 
209 272 
67 46 
41 54 
111 (12) 106 (7) 
Table 8: Numbers of Centering Transitions 
line of argumentation, we here propose to classify all 
occurrences of centering transition pairs with respect 
to the costs they imply. The cost-based evaluation 
of different C! orderings refers to evaluation criteria 
which form an intrinsic part of the centering model 6. 
Transition pairs hold for two immediately succes- 
sive utterances. We distinguish between two types of 
transition pairs, cheap ones and expensive ones. We 
call a transition pair cheap if the backward-looking 
center of the current utterance is correctly predicted 
by the preferred center of the immediately preced- 
ing utterance, i.e., Cb(Ui) = Gp(Ui_l),i = 2...n. 
Transition pairs are called expensive if the backward- 
looking center of the current utterance is not correctly 
predicted by the preferred center of the immediately 
preceding utterance, i.e., Cb(Ui) # Gp(Ui_l),i = 
2... n. Table 9 contains a detailed synopsis of cheap 
and expensive transition pairs. In particular, chains 
of the RETAIN transition in passages where the Cb 
does not change (passages with constant theme) show 
that the canonical ordering constraints for the forward- 
looking centers are not appropriate, 
The numbers of centering transition pairs generated 
by the different approaches are shown in Table 10, In 
general, the functional approach shows the best re- 
6As a consequence of this postulate, we have to rede- 
fine Rule 2 of the Centering Constraints (Grosz et al., 1995, 
p.215) appropriately, which gives an informal characteriza- 
tion of a preference for sequences of CONTINUE over se- 
quences of RETAIN arid, similarly, sequences of RETAIN 
over sequences of SHIFT. Our specification for the case of 
text interpretation says that cheap transitions are preferred 
over expensive ones, with cheap and expensive transitions 
as defined in Table 9. 
Suits, while the naive and the canonical approaches 
work reasonably well for the literary text, but exhibit 
a poor performance for the texts from the IT domain 
and the news magazine. The results for the latter ap- 
proaches become only slightly more positive with the 
modification of ranking the antecedent of a textual el- 
lipsis above the elliptical expression, but they do not 
compare to the results of the functional approach. 
We were also interested in finding out whether the 
functional ordering we propose possibly "includes" 
the grammatical role based criteria discussed so far. 
We, therefore, re-evaluated the examples already an- 
notated with Gb/C! data available in the literature 
(for the English language, we considered all exam- 
pies from Grosz et al. (1995) and Brennan et al. 
(1987); for Japanese we took the data from Walker 
et al. (1994)). Surprisingly enough, all examples of 
Grosz et al. (1995) passed the test successfully. Only 
with respect to the troublesome Alfa Romeo driving 
scenario (cf. Brennan et al. (1987, p.157)) our con- 
straints fail to properly rank the elements of the third 
sentence C! of that example. 7 Note also that these 
results were achieved without having recourse to ex- 
tra constraints, e.g., the shared property constraint to 
account for anaphora parallelism (Kameyama, 1986). 
We applied our constraints to Japanese examples in 
the same way. Again we abandoned all extra con- 
straints set up in these studies, e.g., the Zero Topic As- 
signment (ZTA) rule and the special role of empathy 
7In essence, the very specific problem addressed by that 
example seems to be that Friedman has not been previously 
introduced in the local discourse segment and is only acces- 
sible via the global focus. 
275 
CONTINUE 
- cheap 
CONTINUE cheap 
RETAIN expensive 
SMOOTH-SHIFT cheap 
ROUGH-SHIFT expensive 
RETAIN 
expensive 
cheap 
expensive 
expensive 
expensive 
SMOOTH-SHIFT 
expensNe 
cheap 
expens~e 
cheap 
ROUGH-SHIFT 
i 
expensive 
expensive 
expensive 
expensive 
Table 9: Costs for Transition Pairs 
naive & cost type naive 
ante > express 
cheap 72 180 
1T expensive 317 209 
Cheap 25 36 
Spiegel expensive 50 39 
cheap 45 48 
MOiler expensive 34 31 
cheap 142 264 
expensive 401 279 
c~onical 
129 
260 
45 
30 
46" 
33 
,220 
323 
functional ante > express 
236 
153 
51 
24 
48 
31 
335 
208 
321 
68 
62 
13 
55 
24 
438 
105 
Table 10: Cost Values for Centering Transition Pair Types 
verbs (Walker et al., 1994). However, the results our 
constraints generate are the same as those generated by 
Walker et al. including these model extensions. Only a 
single problematic case remains, viz. example (30) of 
Walker et al. (1994, p.214) causes the same problems 
they described (discourse-initial utterance, semantic 
or world knowledge should be available). Even for 
the crucial examples (32)-(36) of Walker et al. (1994, 
p.216-221) our constraints generate the same Cls as 
Walker et al.' s constraints with ZTA. 
To summarize the results of our empirical evalua- 
tion, we first claim that our proposal based on func- 
tional criteria leads to substantially better and -- with 
respect to the inference load placed on the text under- 
stander, whether human or machine -- more plausi- 
ble results for languages with free word order than the 
structural constraints given by Grosz et al. (1995) and 
those underlying a naive approach. We base these ob- 
servations on an evaluation approach which considers 
transition pairs in terms of the inference load specific 
pairs imply. Second, we have gathered some evidence, 
still far from being conclusive, that the functional con- 
straints on centering seem to incorporate the struc- 
tural constraints for English and the modified struc- 
tural constraints for Japanese. Hence, we hypothesize 
that functional constraints on centering might consti- 
tute a general mechanism for treating free an___dd fixed 
word order languages by the same descriptive mecha- 
nism. This claim, however, has to be further substan- 
tiated by additional cross-linguistic empirical studies. 
5 Comparison with Related Approaches 
The centering model (Grosz et al., 1983; 1995) is con- 
cerned with the interactions between the local coher- 
ence of discourse and the choices of referring expres- 
sions. Crucial for the centering model is the way 
the forward-looking centers are organized. Despite 
several cross-linguistic studies a kind of "standard" 
has emerged based on the study of English (cf. Ta- 
ble 1 in Section 1). Only few of these cross-linguistic 
studies have led to changes in the basic order of dis- 
course entities, the work of Walker et al. (1990; 
1994) being the most far reaching exception. They 
consider the role of expressive means in Japanese to 
indicate topic status and the speaker's perspective, 
thus introducing functional notions, viz. ToPIc and 
EMPATHY, into the discussion. German, the object 
language we deal with, is also a free word order lan- 
guage like Japanese (possibly even more constrained). 
Our basic revision of the ordering scheme completely 
abandons grammatical role information and replaces it 
with entirely functional notions reflecting the informa- 
tion structure of the utterances in the discourse. Inter- 
estingly enough, several extra assumptions introduced 
to account, e.g., for anaphora parallelism (e.g., the 
shared property constraint formulated by Kameyama 
(1986)) can be eliminated without affecting the cor- 
rectness of anaphora resolutions. Rambow (1993) has 
presented a theme/rheme distinction within the cen- 
tering model to which we fully subscribe. His pro- 
posal concerning the centering analysis of German (al- 
ready referred to as the "naive" approach; cf. Section 
4) is limited, however, to the German middlefield and, 
hence, incomplete. 
A common topic of criticism relating to focusing 
approaches to anaphora resolution has been the diver- 
sity of data structures they require, which are likely 
to hide the underlying linguistic regularities. Focus- 
ing algorithms prefer the discourse element already 
in focus for anaphora resolution, thus considering 
context-boundedness, too. But the items of the fo- 
cus lists are either ordered by thematic roles (Sidner, 
276 
1983) or grammatical roles (Suri & McCoy, 1994; 
Dahl & Ball, 1990)). Dahl & Ball (1990) improve 
the focusing mechanism by simplifying its data struc- 
tures and, thus, their proposal is more closely related 
to the centering model than any other focusing mecha- 
nism. But their approach still relies upon grammatical 
information for the ordering of the centering list, while 
we use only the functional information structure as the 
guiding principle. 
6 Conclusion 
In this paper, we provided an account for ordering 
the forward-looking centers which is entirely based on 
functional notions, grounded on the information struc- 
ture of utterances in a discourse. We motivated our 
proposal by the constraints which hold for a free word 
order language such as German and derived our results 
from data-intensive empirical studies of (real-world) 
expository texts. We have gathered preliminary evi- 
dence that the functional ordering of discourse enti- 
ties in the centers seems to coincide with the gram- 
matical roles of fixed word order languages. We also 
augmented the ordering criteria of the forward-looking 
center such that it accounts not only for (pro)nominal 
but also for functional anaphora (textual ellipsis), an 
issue that, so far, has only been sketchily dealt with 
in the centering framework. The extensions we pro- 
pose have been validated by the empirical analysis of 
real-world expository texts of considerable length. We 
thus follow methodological principles of corpus-based 
studies that have been successfully exercised in the 
work of Passonneau (1993). Still open are proper de- 
scriptions of deictic expressions, proper names (cf. the 
Alfa Romeo driving scenario), and plural or generic 
definite noun phrases. An anaphora resolution module 
and an ellipsis handler based on this functional center- 
ing model has been implemented as part of a compre- 
hensive text parser for German. 
Acknowledgments. We would like to thank our colleagues 
in the 8£ZY r group for fruitful discussions and Jon A1- 
cantara (Cambridge, UK) for re-reading the final version 
via Interact. This work has been funded by LGFG Baden- 
Wiirttemberg (M. S trube). 

References 
Brennan, S. E., M. W. Friedman & C. J. Pollard (1987). A 
centering approach to pronouns. In Proc. of ACL-87, 
pp. 155-162. 
Clark, H. H. (1975). Bridging. In Proc. of TINLAP-1, pp. 
169-174. 
Cote, S. (1996). Ranking forward-looking centers. In 
E. Prince, A. Joshi & M. Walker (Eds.), Centering in 
Discourse. Oxford: Oxford University Press (to ap- 
pear). 
Dahl, D. A. & C. N. Ball (1990). Reference resolution in 
PUNDIT. In P. Saint-Dizier & S. Szpakowicz (Eds.), 
Logic and Logic Grammars for Language Processing, 
pp.. 168-184. Chichester, U.K.: Ellis Horwood. 
Dahl, O. (Ed.) (1974). Topic and Comment, Contextual 
Boundness, and Focus. Hamburg: Buske. 
Dane~, E (1974a). Functional sentence perspective and 
the organization of the text. In F. Dane~ (Ed.), Pa- 
person Functional Sentence Perspective, pp. 106-128. 
Prague: Academia. 
Dane~, F. (Ed.) (1974b). Paperson FunctionalSentencePer- 
spective. Prague: Academia. 
Gordon, P. C., B. J. Grosz & L. A. Gilliom (1993). Pro- 
nouns, names, and the centering of attention in dis- 
course. Cognitive Science, 17:311-347. 
Grosz, B. J., A. K. Joshi & S. Weinstein (1983). Providing a 
unified account of definite noun phrases in discourse. 
In Proc. of ACL-83, pp. 4430. 
Grosz, B. J., A. K. Joshi & S. Weinstein (1995). Center- 
.ing: A framework for modeling the local coherence of 
discourse. Computational Linguistics, 21 (2):203-225. 
Hahn, U. & M. Strube (1996). Incremental centering and 
center ambiguity. In Proc. of the 18 th Annual Confer- 
ence of the Cognitive Science Society. La Jolla, CA. 
Hahn, U., M. Strube & K. Markert (1996). Bridging textual 
ellipses. In Proc. of COLING-96. 
Hajieowi, E., V. Kubofi & P. Kubofi (1992). Stock of shared 
knowledge: A tool for solving pronominal anaphora: 
InProc. ofCOL1NG-92, Vol. 1, pp. 127-133. 
Haji~ov~, E., H. Skoumalov~ & P. Sgall (1995). An auto- 
marie procedure for topic-focus identification. Com- 
putational Linguistics, 21(1):81-94. 
Halliday, M. A. K. (1967). Notes on transitivity and theme 
in English, Part 2. Journal of Linguistics, 3:199-244. 
Halliday, M. A. K, & R. Hasan (1976). Cohesion in English. 
London: Longman. 
Kameyama, M. (1986). A property-sharing constraint in 
centering. In Proc. of ACL-86, pp. 200-206. 
Passormeau, R. J. (1993). Getting and keeping the center of 
attention. In M. Bates & R. Weisehedel (Eds.), Chal- 
lenges in Natural Language Processing, pp. 179-227. 
Cambridge, UK: Cambridge University Press. 
Rambow, O. (1993), Pragmatic aspects of scrambling and 
topicalization in German. In IRCS Workshop on Cen- 
tering in Discourse. Univ. of Pennsylvania, 1993. 
Sidner, C. L. (1983). Focusing in the comprehension of deft- 
nite anaphora. In M. Brady & R. Berwick (Eds.), Com- 
putational Models of Discourse, pp. 267-330. Cam- 
bridge, MA: MIT Press. 
Strube, M. (1996). Processing complex sentences in the cen- 
tering framework. In this volume. 
Strube, M. & U. Hahn (1995). ParseTalk about sentence- 
and text-level anaphora. In Proc. of EACL-95, pp. 237- 
244. 
Suri, L. Z. & K. F. McCoy (1994). RAFT/RAPR and center° 
ing: A comparison and discussion of problems related 
to processing complex sentences. Computational Lin- 
guistics, 20(2):301-317. 
Turan, U. (1995). Null vs. Overt Subjects in Turkish: A Cen- 
tering Approach. (Ph.D. thesis). University of Penn- 
sylvania. 
Walker, M. A., M. Iida & S. Cote (1990). Centering in 
Japanese discourse. In Proc. of COLING.90, Ap- 
pendix, 6pp. 
Walker, M. A., M. Iida & S. Cote (1994). Japanesediscourse 
and the process of centering. Computational Linguis- 
tics, 20(2): 193-233. 
