Processing Complex Sentences in the Centering Framework 
Michael Strube 
Freiburg University 
Computational Linguistics Lab 
Europaplatz 1, D-79085 Freiburg, Germany 
strube@coling.uni-freiburg.de
Abstract 
We extend the centering model for the resolution
of intra-sentential anaphora and specify how to
handle complex sentences. An empirical evaluation
indicates that the functional information
structure guides the search for an antecedent
within the sentence.
1 Introduction 
The centering model (Grosz et al., 1995) focuses
on the resolution of inter-sentential anaphora. Since
intra-sentential anaphora occur at high rates in
real-world texts, the model has to be extended for the
resolution of anaphora at the sentence level. However,
the centering framework is not fully specified to handle
complex sentences (Suri & McCoy, 1994). This
underspecification corresponds to the lack of a precise
definition of the expression utterance, a term always
used but intentionally left undefined 1. Therefore,
the centering algorithms currently under discussion
are not able to handle naturally occurring discourse.
Possible strategies for treating sentence-level
anaphora within the centering framework are
1. processing sentences linearly one clause at a time 
(as suggested by Grosz et al. (1995)), 
2. preference for sentence-external antecedents 
which are proposed by the centering mechanism, 
3. preference for sentence-internal antecedents 
which are filtered by the usual binding criteria, 
4. a mixed-mode which prefers only a particular set 
of sentence-internal over sentence-external an- 
tecedents (e.g. Suri & McCoy (1994)). 
The question arises as to which strategy fits best for the 
interaction between the resolution of intra- and inter- 
sentential anaphora. In my contribution, evidence for a 
mixed-mode strategy is brought forward, which favors 
a particular set of sentence-internal antecedents given 
by functional criteria. 
1 Cf. the sketchy statements by Brennan et al. (1987,
p.155): "[...] U is an utterance (not necessarily a full clause)
[...]", and by Grosz et al. (1995, p.209): "U need not to be a
full clause."
2 Constraints on Sentential Anaphora 
Our studies on German texts have revealed that the
functional information structure of the sentence,
considered in terms of the context-boundedness of
discourse elements, is the major determinant for the
ranking on the forward-looking centers (Cf(Un)) (Strube
& Hahn, 1996). Hence, context-bound discourse elements
are generally ranked higher in the Cf than any
other non-anaphoric element. The functional information
structure has impact not only on the resolution of
inter-sentential anaphora, but also on the resolution of
intra-sentential anaphora. Hence, the most preferred
antecedent of an intra-sentential anaphor is a phrase
which is also anaphoric. Consider sentences (1) and
(2) and the corresponding centering data in Table 1
(Cb: backward-looking center; the first element of the
pairs denotes the discourse entity, the second element
the surface). In sentence (1), a nominal anaphor occurs,
der T3100SX (a particular notebook). In sentence
(2), another nominal anaphor appears, der Rechner
(the computer), which is resolved to T3100SX from
the previous sentence. In the matrix clause, the pronoun
er (it) co-specifies the already resolved anaphor
der Rechner in the subordinate clause.
(1) Ist der Resume-Modus aktiviert, schaltet sich der
T3100SX selbständig ab.
(If the resume mode is active, - switches - itself - the
T3100SX - automatically - off.)
(2) Bei späterem Einschalten des Rechners arbeitet er
sofort an der alten Stelle weiter.
(The - later - turning on - of the computer - it -
resumes working - at exactly the same place.)
(1) Cb: T3100SX: T3100SX
    Cf: [T3100SX: T3100SX]
(2) Cb: T3100SX: er
    Cf: [T3100SX: er,
         TURN-ON: Einschalten,
         PLACE: Stelle]
Table 1: Centering Data for Sentences (1) and (2) 
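The centering data in Table 1 can be pictured as ranked lists of (entity, surface) pairs. The following is a minimal sketch and not the implementation used in this study: the names Center and backward_looking_center are hypothetical, and the sketch assumes the standard centering definition (Grosz et al., 1995) under which Cb(Un) is the highest-ranked element of Cf(Un-1) that is realized in Un.

```python
from typing import NamedTuple, Optional, Sequence


class Center(NamedTuple):
    entity: str   # discourse entity, e.g. "T3100SX"
    surface: str  # surface form realizing the entity, e.g. "er"


def backward_looking_center(prev_cf: Sequence[Center],
                            current: Sequence[Center]) -> Optional[str]:
    """Return the highest-ranked entity of prev_cf that is realized in current."""
    realized = {center.entity for center in current}
    for center in prev_cf:  # prev_cf is ordered by rank, highest first
        if center.entity in realized:
            return center.entity
    return None


# Cf lists transcribed from Table 1
cf_1 = [Center("T3100SX", "T3100SX")]
cf_2 = [Center("T3100SX", "er"),
        Center("TURN-ON", "Einschalten"),
        Center("PLACE", "Stelle")]

cb_2 = backward_looking_center(cf_1, cf_2)
print(cb_2)
```

Under these assumptions the sketch yields T3100SX as the Cb of sentence (2), in line with Table 1.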
This example illustrates our hypothesis that
intra-sentential anaphors preferably co-specify
context-bound discourse elements. In order to empirically
strengthen this argument, we have examined several
texts of different types: 15 texts from the information
technology (IT) domain, one text from the German
news magazine Der Spiegel, and the first chapters of
a short story by the German writer Heiner Müller 2 (cf.
Table 2).
          text ana.   sent. ana.   anaphors   words
IT           284          24          308      5542
Spiegel       90          12          102      1468
Müller       124          29          153       867
Total        498          65          563      7877
Table 2: Distribution of Anaphors in the Text Corpus
In the texts, 65 intra-sentential anaphors occur; 58 of
them (89.2%) have an antecedent which is a
resolved anaphor, while only 32 of them (49.2%) have
an antecedent which is the subject of the matrix clause
(cf. Table 3). These data indicate that an approach
based on grammatical roles (Suri & McCoy, 1994) is
inappropriate for the German language, while an
approach based on the functional information structure
seems preferable. In addition, we maintain that
exchanging grammatical with functional criteria is also
a reasonable strategy for fixed word order languages.
Grammatical criteria can be rephrased in terms of
functional criteria, simply due to the fact that
grammatical roles and the information structure patterns
we defined, unless marked, coincide in these languages.
          cont.-bound   ¬bound   subj.   ¬subj.
IT             20          4       16       8
Spiegel        10          2        6       6
Müller         28          1       10      19
Total          58          7       32      33
Table 3: Types of Intra-Sentential Antecedents 
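The two proportions quoted in the text follow directly from the counts in Table 3. The short sketch below only recomputes them. The dictionary layout and variable names are illustrative choices, not part of the original study.

```python
# Counts of intra-sentential antecedents per text type, transcribed from Table 3:
# (context-bound, not context-bound, subject, not subject)
table3 = {
    "IT":      (20, 4, 16, 8),
    "Spiegel": (10, 2, 6, 6),
    "Müller":  (28, 1, 10, 19),
}

# Every antecedent is either context-bound or not, so these two columns sum to the total.
total = sum(bound + unbound for bound, unbound, _, _ in table3.values())
context_bound = sum(row[0] for row in table3.values())
subject = sum(row[2] for row in table3.values())

print(f"context-bound antecedents: {context_bound}/{total} = {context_bound / total:.1%}")
print(f"subject antecedents:       {subject}/{total} = {subject / total:.1%}")
```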
Since the strategy described above is valid only for
complex sentences which consist of a matrix clause
and one or more subordinate clauses, compound
sentences which consist of several main clauses must be
considered separately. Each of these sentences is
processed by our algorithm in linear order, one clause at
a time, with the usual centering operations. Compound
sentences which consist of multiple full clauses
therefore also have multiple Cb/Cf data.
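The clause-by-clause treatment of compound sentences can be sketched as follows. This is an illustrative simplification only: the Sentence class, its kind labels and the function processing_units are hypothetical, clause boundaries are assumed to come from a parser, and a main clause that itself contains subordinate clauses is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    kind: str           # "simple", "complex" or "compound" (assumed labels)
    clauses: List[str]  # clause strings in surface order


def processing_units(sentence: Sentence) -> List[List[str]]:
    """Split a sentence into the units that each receive their own Cb/Cf data."""
    if sentence.kind == "compound":
        # every full main clause is processed separately, in linear order
        return [[clause] for clause in sentence.clauses]
    # simple and complex sentences are kept together as a single unit
    return [list(sentence.clauses)]


# Sentence (1) is complex: subordinate clause plus matrix clause, one unit.
sentence_1 = Sentence("complex", ["Ist der Resume-Modus aktiviert",
                                  "schaltet sich der T3100SX selbständig ab"])
# A made-up compound sentence with two main clauses, two units.
compound = Sentence("compound", ["main clause A", "main clause B"])

units_1 = processing_units(sentence_1)
units_compound = processing_units(compound)
print(len(units_1), len(units_compound))
```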
Now, we are able to define the expression utterance
in a satisfactory manner: An utterance U is a simple
sentence, a complex sentence, or each full clause of a
compound sentence 3. The Cf of an utterance is
computed only with respect to the matrix clause. Given
these findings, complex sentences can be processed at
three stages (2a-2c; transitions from one stage to the
next occur only when a suitable antecedent has not
been found at the previous stage):
2 Liebesgeschichte. In Heiner Müller, Geschichten aus
der Produktion 2, Berlin: Rotbuch Verlag, pp.57-63.
3 We do not consider dialogues with elliptical utterances.
1. For resolving an anaphor in the first clause of Un,
propose the elements of Cf(Un-1) in the given
order.
2. For resolving an anaphor in a subsequent clause
of Un,
(a) propose already context-bound elements of
Un from left to right 4.
(b) propose the elements of Cf(Un-1) in the
given order.
(c) propose all elements of Un not yet checked
from left to right.
3. Compute the Cf(Un), considering only the
elements of the matrix clause of Un.
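The three steps above can be sketched in code. This is a minimal sketch under stated assumptions, not the system evaluated here. All function names are hypothetical. Discourse entities are plain strings. The admissibility test (binding, agreement and similar filters) is passed in as a function. The Cf ranking is reduced to "context-bound elements first, otherwise surface order", which simplifies the ranking of Strube & Hahn (1996).

```python
from typing import Callable, List, Optional, Sequence


def candidate_order(in_first_clause: bool,
                    prev_cf: Sequence[str],
                    preceding: Sequence[str],
                    context_bound: Sequence[str]) -> List[str]:
    """List antecedent candidates in the order in which steps 1 and 2a-2c propose them.

    preceding holds the elements of Un available to the anaphor, left to right.
    """
    if in_first_clause:
        return list(prev_cf)                                  # step 1
    order = [e for e in preceding if e in context_bound]      # step 2a
    order += [e for e in prev_cf if e not in order]           # step 2b
    order += [e for e in preceding if e not in order]         # step 2c
    return order


def resolve(anaphor: str,
            candidates: Sequence[str],
            is_admissible: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first proposed candidate that passes the admissibility test."""
    for candidate in candidates:
        if is_admissible(anaphor, candidate):
            return candidate
    return None


def compute_cf(matrix_elements: Sequence[str],
               context_bound: Sequence[str]) -> List[str]:
    """Step 3 (simplified): rank context-bound matrix-clause elements first."""
    bound = [e for e in matrix_elements if e in context_bound]
    unbound = [e for e in matrix_elements if e not in context_bound]
    return bound + unbound


# Toy run on sentence (2); the agreement table is an assumption for illustration.
MASCULINE = {"T3100SX"}


def agrees(anaphor: str, candidate: str) -> bool:
    return anaphor == "er" and candidate in MASCULINE


order = candidate_order(False, ["T3100SX"], ["TURN-ON", "T3100SX"], ["T3100SX"])
antecedent = resolve("er", order, agrees)
# All three entities are treated as matrix-clause elements here, for illustration only.
cf_2 = compute_cf(["TURN-ON", "T3100SX", "PLACE"], ["T3100SX"])
print(order, antecedent, cf_2)
```

In this toy run the pronoun er is resolved at stage 2a, because the context-bound element realized by der Rechner is proposed before any other candidate.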
3 Evaluation 
In order to evaluate the functional approach to the res- 
olution of intra-sentential anaphora within the center- 
ing model, we compared it to the other approaches 
mentioned in Section 1, employing the test set re- 
ferred to in Table 2. Note that we tried to eliminate er- 
ror chaining and false positives (for some remarks on 
evaluating discourse processing algorithms, cf. Walker 
(1989); we consider her results as a starting point for 
our proposal). 
First, we examine the errors which all strategies 
have in common (for the success rate, cf. Table 4). 
99 errors are caused by underspecification at differ- 
ent levels, e.g., prepositional anaphors (16), plural 
anaphors (8), anaphors which refer to a member of a 
set (14), sentence anaphors (21), and anaphors which 
refer to a global focus (12) are not yet included in the 
mechanism. In 9 cases, any strategy will choose the 
false antecedent. 
The most interesting cases are the ones for which 
the performance of the different strategies varies. The 
linear approach generates 40 additional errors in the 
anaphora resolution, which are caused only by the or- 
dering strategy to process each clause of sentences 
with the centering mechanism. The approach which 
prefers inter-sentential anaphora causes 60 additional 
errors. Note that this strategy performs remarkably 
well at first sight. For 44 of the errors it chooses an 
inter-sentential antecedent which is, on the surface, 
identical to the correct intra-sentential antecedent. We 
count these 44 resolutions as false positives, since the 
anaphor has been resolved to the false discourse en- 
tity. The approach which prefers intra-sentential an- 
tecedents causes 27 additional errors. These errors 
occur whenever an inter-sentential anaphor can be re- 
solved with an incorrect intra-sentential antecedent. 
4 We abstract here from the syntactic criteria for filtering
out some elements of the current sentence by applying
binding criteria (Strube & Hahn, 1995). Syntactic constraints
like control phenomena override the preferences given by
the context.
          anaphors   linear         extra > intra   intra > extra   functional
IT           308      237 (76.9%)    229 (74.3%)     240 (77.9%)     247 (80.2%)
Spiegel      102       82 (80.4%)     76 (74.5%)      82 (80.4%)      86 (84.3%)
Müller       153      105 (68.8%)     99 (64.7%)     115 (75.3%)     128 (83.7%)
Total        563      424 (75.3%)    404 (71.8%)     437 (77.6%)     461 (81.9%)
Table 4: Success Rate without Semantic Constraints 
The functional approach causes only 3 additional er- 
rors. These errors occur whenever the antecedent of 
an intra-sentential anaphor is not bound by the context 
(which is possible but rare) and when the anaphor can 
be resolved at the text level. 
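The overall success rates in Table 4 can be recovered from the error counts given in the text: 563 anaphors in total, 99 errors common to all strategies, and the strategy-specific additional errors. The sketch below only redoes this arithmetic. The variable names and strategy labels are illustrative.

```python
TOTAL_ANAPHORS = 563  # total number of anaphors (Table 2)
COMMON_ERRORS = 99    # errors shared by all strategies

# additional errors per strategy, as reported in the text
additional_errors = {
    "linear": 40,
    "extra > intra": 60,
    "intra > extra": 27,
    "functional": 3,
}

results = {}
for strategy, extra in additional_errors.items():
    correct = TOTAL_ANAPHORS - COMMON_ERRORS - extra
    results[strategy] = (correct, correct / TOTAL_ANAPHORS)

for strategy, (correct, rate) in results.items():
    print(f"{strategy:14s} {correct} ({rate:.1%})")
```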
The results change slightly if semantic/conceptual
constraints (type and further admissibility constraints)
on anaphora are considered. 22 errors of the linear
approach, 8 errors of the approach which prefers
inter-sentential antecedents, and 12 errors of the
approach which prefers intra-sentential antecedents can
be avoided. Only 6 errors of the functional approach
can be avoided by incorporating semantic criteria.
This might constitute a cognitively valid argument for
the functional approach: the better the strategy, the
lower the influence of semantics or world knowledge
on anaphora resolution.
To summarize the results of our empirical eval- 
uation, we claim that our proposal based on func- 
tional criteria leads to substantively better results for 
languages with free word order than the linear ap- 
proach suggested by Grosz et al. (1995) and the 
two approaches which prefer inter-sentential or intra- 
sentential antecedents. 
4 Comparison to Related Work 
Crucial for the evaluation of the centering model 
(Grosz et al., 1995) and its applicability to naturally 
occurring discourse is the lack of a specification
concerning how to handle complex sentences and
intra-sentential anaphora. Grosz et al. suggest the
processing of sentences linearly one clause at a time.
We have shown that such an approach is not appro- 
priate for some types of complex sentences. Suri 
& McCoy (1994) argue in the same manner, but we 
consider the functional approach for languages with 
free word order superior to their grammatical criteria, 
while, for languages with fixed word order, both ap- 
proaches should give the same results. Hence, our ap- 
proach seems to be more generally applicable. Other 
approaches which integrate the resolution of sentence- 
and text-level anaphora are based on salience metrics 
(Hajičová et al., 1992; Lappin & Leass, 1994). We
consider such metrics to be a method which detracts 
from the exact linguistic specifications as we propose 
them. 
At first sight, grammar theories like GB (Chomsky, 
1981) or HPSG (Pollard & Sag, 1994), are the best
choice for resolving anaphora at the sentence-level. 
But these grammar theories only give filters for ex- 
cluding some elements from consideration. Neither 
gives any preference for a particular antecedent at the 
sentence-level, nor do they consider text anaphora. 
5 Conclusions 
In this paper, we gave a specification for handling 
complex sentences in the centering model based on the 
functional information structure of utterances in dis- 
course. We motivated our proposal by the constraints 
which hold for a free word order language (German) 
and derived our results from data-intensive empirical 
studies of real texts of different types. 
Some issues remain open: the evaluation of the 
functional approach for languages with fixed word or- 
der, a fine-grained analysis of subordinate clauses as 
Suri & McCoy (1994) presented for SX because SY 
clauses, and, in general, the solution for the cases 
which cause errors in our evaluation. 
Acknowledgments. This work has been funded by LGFG
Baden-Württemberg. I would like to thank my colleagues in
the CLIF group for fruitful discussions. I would also like to
thank Jon Alcantara (Cambridge) who kindly took the role
of the native speaker via Internet.

References 
Brennan, S. E., M. W. Friedman & C. J. Pollard (1987). A
centering approach to pronouns. In Proc. of ACL-87,
pp. 155-162.
Chomsky, N. (1981). Lectures on Government and Binding.
Dordrecht: Foris.
Grosz, B. J., A. K. Joshi & S. Weinstein (1995). Centering:
A framework for modeling the local coherence of
discourse. Computational Linguistics, 21(2):203-225.
Hajičová, E., V. Kuboň & P. Kuboň (1992). Stock of shared
knowledge: A tool for solving pronominal anaphora.
In Proc. of COLING-92, Vol. 1, pp. 127-133.
Lappin, S. & H. J. Leass (1994). An algorithm for
pronominal anaphora resolution. Computational Linguistics,
20(4):535-561.
Pollard, C. & I. V. Sag (1994). Head-Driven Phrase
Structure Grammar. Chicago, ILL: Chicago Univ. Press.
Strube, M. & U. Hahn (1995). ParseTalk about sentence-
and text-level anaphora. In Proc. of EACL-95, pp. 237-244.
Strube, M. & U. Hahn (1996). Functional centering. In this
volume.
Suri, L. Z. & K. F. McCoy (1994). RAFT/RAPR and
centering: A comparison and discussion of problems related
to processing complex sentences. Computational
Linguistics, 20(2):301-317.
Walker, M. A. (1989). Evaluating discourse processing
algorithms. In Proc. of ACL-89, pp. 251-261.
