The Role of Centering Theory's Rough-Shift in the Teaching
and Evaluation of Writing Skills
Eleni Miltsakaki
UniversityofPennsylvania
Philadelphia, PA 19104 USA
elenimi@unagi.cis.upenn.edu
Karen Kukich
Educatinal Testing Service
Princeton, NJ 08541 USA
kkukich@ets.org
Abstract
Existing software systems for automated essay scor-
ing can provide NLP researchers with opportunities
to test certain theoretical hypotheses, including some
derived from Centering Theory. In this study we em-
ploy ETS's e-rater essay scoring system to examine
whether local discourse coherence, as de#0Cned bya
measure of Rough-Shift transitions, might be a sig-
ni#0Ccant contributor to the evaluation of essays. Our
positive results indicate that Rough-Shifts do indeed
capture a source of incoherence, one that has not been
closely examined in the Centering literature. These re-
sults not only justify Rough-Shifts as a valid transition
type, but they also support the original formulation of
Centering as a measure of discourse continuityeven in
pronominal-free text.
1 Introduction
The task of evaluating student's writ-
ing ability has traditionally been a labor-
intensivehuman endeavor. However, sev-
eral di#0Berent software systems, e.g., PEG
Page and Peterson #281995#29, Intelligent Essay
Assessor
1
and e-rater
2
, are now being used
to perform this task fully automatically.Fur-
thermore, by at least one measure, these soft-
ware systemsevaluate student essays with the
same degree of accuracy as human experts.
That is, computer-generated scores tend to
matchhuman expert scores as frequently as
twohuman scores match each other #28Burstein
et al., 1998#29.
Essay scoring systems such as these can
provide NLP researchers with opportunities
to test certain theoretical hypotheses and to
explore a variety of practical issues in compu-
tational linguistics. In this study,we employ
the e-rater essay scoring system to test a hy-
1
http:#2F#2Flsa.colorado.edu.
2
http:#2F#2Fwww.ets.org#2Fresearch#2Ferater.html
pothesis related to Centering Theory #28Joshi
and Weinstein, 1981; Grosz et al., 1983, in-
ter alia#29. We focus on Centering Theory's
Rough-Shift transition which is the least well
studied among the four transition types. In
particular, we examine whether the discourse
coherence found in an essay, as de#0Cned bya
measure of relative proportion of Rough-Shift
transitions, might be a signi#0Ccant contributor
to the accuracy of computer-generated essay
scores. Our positive #0Cnding validates the role
of the Rough-Shift transition and suggests a
route for exploring Centering Theory's prac-
tical applicability to writing evaluation and
instruction.
2 The e-rater essay scoring system
One goal of automatic essay scoring systems
suchase-rater is to represent the criteria that
human experts use to evaluate essays. The
writing features that e-rater evaluates were
speci#0Ccally chosen to re#0Dect scoring criteria
fortheessayportion ofthe GraduateManage-
ment Admissions Test #28GMAT#29. These cri-
teria are articulated in GMAT test prepara-
tion materials at http:#2F#2Fwww.gmat.org. In
e-rater, syntactic variety is represented by
features that quantify occurrences of clause
types. Logical organization and clear transi-
tions are represented by features that quan-
tify cue words in certain syntactic construc-
tions. The existence of main and supporting
points is represented by features that detect
where new points begin and where they are
developed. E-rater also includes features that
quantify the appropriateness of the vocabu-
lary content of an essay.
One feature of writing valued by writing
experts that is not explicitly represented in
the currentversion of e-rater is local coher-
ence. Centering Theory provides an algo-
rithm for computing local coherence in writ-
ten discourse. Our study investigates the ap-
plicability of Centering Theory's local coher-
ence measure toessayevaluation by determin-
ing the e#0Bect of adding this new feature to
e-rater's existing array of features.
3 Overview of Centering
A synthesis of two di#0Berent lines of work
#28Joshi and Kuhn, 1979; Joshi and Weinstein,
1981#29 and #28Sidner, 1979; Grosz, 1977; Grosz
and Sidner, 1986#29 yielded the formulation
of Centering Theory as a model for moni-
toring local focus in discourse. The Cen-
tering model was designed to account for
those aspects of processing that are respon-
sible for the di#0Berence in the perceived co-
herence of discourses such as those demon-
strated in #281#29 and #282#29 below #28examples from
Hudson-D'Zmura #281988#29#29.
#281#29 a. John went to his favorite music store to
buy a piano.
b. He had frequented the store for many
years.
c. He was excited that he could #0Cnally buy a
piano.
d. He arrived just as the store was closing for
the day.
#282#29 a. John went to his favorite music store to
buy a piano.
b. It was a store John had frequented for
manyyears.
c. He was excited that he could #0Cnally buy a
piano.
d. It was closing just as John arrived.
Discourse #281#29 is intuitively more coherent
than discourse #282#29. This di#0Berence maybe
seen to arise from the di#0Berent degrees of con-
tinuity in what the discourse is about. Dis-
course #281#29 centers a single individual #28John#29
whereas discourse #282#29 seems to focus in and
out on di#0Berententities #28John, store, John,
store#29. Centering is designed to capture these
#0Ductuations in continuity.
4 The Centering model
In this section, we present the basic def-
initions and common assumptions in Cen-
tering as discussed in the literature #28e.g.,
Walker et al. #281998#29#29. We present the as-
sumptions and modi#0Ccations we made for this
study in Section 6.1.
4.1 Discourse segments and entities
Discourse consists of a sequence of textual
segments and each segment consists of a se-
quence of utterances. In Centering The-
ory, utterances are designated by U
i
, U
n
.
Each utterance U
i
evokes a set of dis-
course entities, the FORWARD-LOOKING
CENTERS, designated by Cf#28U
i
#29. The
members of the Cf set are ranked accord-
ing to discourse salience. #28Ranking is de-
scribed in Section 4.4.#29The highest-ranked
member of the Cf set is the PREFERRED
CENTER, Cp. A BACKWARD-LOOKING
CENTER, Cb,is also identi#0Ced for utterance
U
i
. The highest ranked entity in the pre-
vious utterance, Cf#28U
i,1
#29, that is realized
in the current utterance, U
i
, is its des-
ignated BACKWARD-LOOKING CENTER,
Cb. The BACKWARD-LOOKING CEN-
TER is a special member of the Cfset because
it represents the discourse entity that U
i
is
about, what in the literature is often called
the 'topic' #28Reinhart, 1981; Horn, 1986#29.
The Cp for a given utterance may be iden-
tical with its Cb, but not necessarily so. It
is precisely this distinction between looking
back in the discourse with the Cb and pro-
jecting preferences for interpretations in the
subsequent discourse with the Cp that pro-
vides the key element in computing local co-
herence in discourse.
4.2 Centering transitions
Four types of transitions, re#0Decting four de-
grees of coherence, are de#0Cned in Centering.
They are shown in transition ordering rule
#281#29. The rules for computing the transitions
are shown in Table 1.
#281#29 Transition ordering rule: Continue
is preferred to Retain, which is preferred to
Smooth-Shift, which is preferred to Rough-
Shift.
Centering de#0Cnes one more rule, the Pro-
noun rule whichwe will discuss in detail in
Section 5.
Cb#28Ui#29=Cb#28Ui-1#29 Cb#28Ui#296=Cb#28Ui-1#29
Cb#28Ui#29=Cp Continue Smooth-Shift
Cb#28Ui#296=Cp Retain Rough-Shift
Table 1: Table of transitions
4.3 Utterance
In early formulations of Centering Theory,
the 'utterance' was not de#0Cned explicitly.In
subsequentwork #28Kameyama, 1998#29, the ut-
terance was de#0Cned as, roughly, the tensed
clause with relative clauses and clausal com-
plements as exceptions. Based on crosslin-
guistic studies, Miltsakaki #281999#29 de#0Cned the
utterance as the traditional 'sentence', i.e.,
the main clause and its accompanying subor-
dinate and adjunct clauses constitute a single
utterance.
4.4 Cf ranking
As mentioned earlier, the PREFERRED
CENTER of an utterance is de#0Cned as the
highest ranked member of the Cf set. The
ranking of the Cf members is determined
by the salience status of the entities in the
utterance and mayvary crosslinguistically.
Kameyama #281985#29 and Brennan et al. #281987#29
proposed that the Cf ranking for English is
determined by grammatical function as fol-
lows:
#282#29 Rule for ranking of
forward-looking centers: SUBJ#3EIND.
OBJ#3EOBJ#3EOTHERS
Later crosslinguistic studies based on em-
pirical work #28Di Eugenio, 1998; Turan, 1995;
Kameyama, 1985#29 determined the following
detailed ranking, with QIS standing for quan-
ti#0Ced inde#0Cnite subjects #28people, everyone
etc#29 and PRO-ARB #28we, you#29 for arbitrary
plural pronominals.
#283#29Revised rule for the ranking of
forward-looking centers: SUBJ#3EIND.
OBJ#3EOBJ#3EOTHERS#3EQIS, PRO-ARB.
4.4.1 Complex NPs
In the case of complex NPs, whichhave
the propertyofevoking multiple discourse en-
tities #28e.g. his mother, software industry#29,
the working hypothesis commonly assumed
#28e.g. Walker and Prince #281995#29#29 is ordering
from left to right.
3
5 The role of Rough-Shift
transitions
As mentioned brie#0Dy earlier, the Centering
model includes one more rule, the Pronoun
Rule given in #284#29.
#284#29 Pronoun Rule: If some elementof
Cf#28Ui-1#29 is realized as a pronoun in Ui, then
so is the Cb#28Ui#29.
The Pronoun Rule re#0Dects the intuition
that pronominals are felicitously used to re-
fer to discourse-saliententities. As a result,
Cbs are often pronominalized, or even deleted
#28if the grammar allows it#29. Rule #284#29 then
predicts that if there is only one pronoun in
an utterance, this pronoun must realize the
Cb. The Pronoun Rule and the distribution
offorms#28de#0Cnite#2Finde#0Cnite NPs and pronom-
inals#29 over transition types plays a signi#0Ccant
role in the development of anaphora resolu-
tion algorithms in NLP. Note that the utility
ofthe Pronoun Rule and the Centering transi-
tions in anaphora resolution algorithms relies
heavily on the assumption that the texts un-
der consideration are maximally coherent. In
maximally coherent texts, however, Rough-
Shifts transitions are rare, and even in less
than maximally coherent texts they occur
infrequently. For this reason the distinc-
tion between Smooth-Shifts and Rough-Shifts
was collapsed in previous work #28Di Eugenio,
1998; Hurewitz, 1998, inter alia#29. The status
of Rough-Shift transitions in the Centering
model was therefore unclear, receiving only
negative evidence: Rough-Shifts are valid be-
cause they are found to be rare in coherent
discourse.
In this study we gain insights pertaining
to the nature of the Rough-Shifts precisely
because we are forced to drop the coherence
assumption. Our data consist of student es-
says whose degree of coherence is under eval-
uation and therefore cannot be assumed. Us-
ing students' paragraph marking as segment
boundaries, we 'centered' 100 GMAT essays.
The average length of these essays was about
3
But see also Di Eugenio #281998#29 for the treatment
of complex NPs in Italian.
Def. Phr. Indef. Phr. Prons
Rough-Shifts 75 120 16
Total 195 16
Table 2: Distribution of forms over Rough-Shifts
250 words. In the next section we show
that Rough-Shift transitions provide a reli-
able measure of incoherence, correlating well
with scores provided by writing experts.
One of the crucial insights was that, in
our data, the incoherence detected by the
Rough-Shift measure is not due to violations
of the Pronominal Rule or infelicitous use of
pronominal forms in general. In Table 2,
we report the results of the distribution of
forms over Rough-Shift transitions. Out of
the 211 Rough-Shift transitions, found in the
set of 100 essays, in 195 occasions the Cp
was a nominal phrase, either de#0Cnite or indef-
inite. Pronominals occurred in only 16 cases
of which 6 cases instantiated the pronominals
'we' or 'you' in their generic sense. Table 2
strongly indicates that student essays were
not incoherent in terms of the processing load
imposed on the reader to resolve anaphoric
references. Instead, the incoherence in the es-
says was due to discontinuities in students'
essays caused by their introducing too many
undeveloped topics within what should be a
conceptually uniform segment, i.e. their para-
graphs. This is, in fact, what Rough-Shift
picked up.
These results not only justify Rough-Shifts
as a valid transition type but they also sup-
port the original formulation of Centering as
a measure of discourse continuityeven when
anaphora resoluion is not an issue. It seems
that Rough-Shifts are capturing a source of
incoherence that has been overlooked in the
Centering literature. The processing load in
the Rough-Shift cases reported here is not
increased by the e#0Bort required to resolve
anaphoric reference but instead by the e#0Bort
required to #0Cnd the relevant topic connections
in a discourse bombarded with a rapid suc-
cession of multiple entities. That is, Rough-
Shifts are the result of absent or extremely
short-lived Cbs. Weinterpret the Rough-
Shift transitions in this context as a re#0Dection
of the incoherence perceived by the reader
when s#2Fhe is unable to identify the topic #28fo-
cus#29 structure of the discourse. This is a
signi#0Ccant insight which opens up new av-
enues for practical applications of the Cen-
tering model.
6 The e-rater Centering study
In anearlier preliminary study,weapplied the
Centering algorithm manually to a sample of
36 GMAT essays to explore the hypothesis
that the Centering model provides a reason-
able measure of coherence #28or lack of#29 re#0Dect-
ing the evaluation performed byhuman raters
with respect to the corresponding require-
ments described in the instructions for human
raters. We observed that essays with higher
scores tended to have signi#0Ccantly lower per-
centagesofROUGH-SHIFTs thanessayswith
lower scores. As expected, the distribution of
the other types of transitions was not signif-
icant. In general, CONTINUEs, RETAINs,
and SMOOTH-SHIFTs do not yield incoher-
ent discourses #28in fact, an essay with only
CONTINUE transitions might sound rather
boring!#29.
In this study we test the hypothesis that
a predictor variable derived from Centering
can signi#0Ccantly improve the performance of
e-rater. Since we are in fact proposing Cen-
tering's ROUGH-SHIFTs as a predictor vari-
able, our model, strictly speaking, measures
incoherence.
The corpus for our study came from a
pool of essays written by students taking the
GMAT test. We randomly selected a total
of 100 essays, covering the full range of the
scoring scale, where 1 is lowest and 6 is high-
est #28see appendix#29. We applied the Center-
ing algorithm to all 100 essays, calculated the
percentage of ROUGH-SHIFTs in each essay
and then ran multiple regression to evaluate
the contribution of the proposed variable to
the e-rater's performance.
6.1 Centering assumptions and
modi#0Ccations
Utterance. Following Miltsakaki #281999#29, we
assumethat the each utteranceconsists of one
main clause and all its subordinate and ad-
junct clauses.
Cf ranking. We assumed the Cf ranking
given in #283#29.
A modi#0Ccation we made involved the sta-
tus of the pronominal I.
4
We observed that
in low-scored essays the #0Crst person pronom-
inal I was used extensively, normally present-
ing personal narratives. However, personal
narratives were unsuited to this essay writing
task and were assigned lower scores by ex-
pert readers. The extensive use of I in the
subject position produced an unwanted e#0Bect
of high coherence. We prescriptively decided
to penalize the use of I's in order to better
re#0Dect the coherence demands made by the
particular writing task. The way to penal-
ize was to omit I's. As a result, coherence
was measured with respect to the treatment
of the remaining entities in the I-containing
utterances. This gave us the desired result of
being able to distinguish those I-containing
utterances which made coherent transitions
with respect to the entities they were talking
about and those that did not.
LackofFit
Source
DF Sum of
Squares
Mean
Square
F-
Ratio
LackofFit 71 53.55 0.75 1.30
Pure Error 24 13.83 0.57 Prob#3EF
Total Error 95 67.38 0.23
Max RSq
0.94
Parameter
Estimates
Term
Esti-
mate
Std
Error
t-
Ratio
Prob#3E
jtj
Intercept 1.46 0.37 3.92 0.0002
E-RATER 0.80 0.06 11.91 #3C.0001
ROUGH -0.013 0.0041 -3.32 0.0013
E#0Bect Test
Source
Nparm
DF Sum of
Squares
F-
Ratio
Prob#3E
F
E-RATER 1 1 100.56 141.77 #3C.0001
ROUGH 1 1 7.81 11.01 0.0013
Table 3: Regression
Segments. Segment boundaries are ex-
4
In fact, a similar modi#0Ccation has been proposed
by Hurewitz #281998#29 and Walker #281998#29 observed that
the use of I in sentences such as 'I believe that...', 'I
think that...' do not a#0Bect the focus structure of the
text.
tremely hard to identify in an accurate and
principled way. Furthermore, existing algo-
rithms #28Morris and Hirst, 1991; Youmans,
1991; Hearst, 1994; Kozima, 1993; Reynar,
1994; Passonneau and Litman, 1997; Passon-
neau, 1998#29 rely heavily on the assumption of
textual coherence. In our case, textual coher-
ence cannot be assumed. Given that text or-
ganization is also part of the evaluation of the
essays, we decided to use the students' para-
graph breaks to locate segment boundaries.
6.2 Implementation
For this study,we decided to manually tag
coreferring expressions despite the availabil-
ity of coreference algorithms. We made this
decision because a poor performance of the
coreference algorithm would give us distorted
results and wewould not be able to test our
hypothesis. For the same reason, we manu-
ally tagged the Preferred centers as Cp. We
only needed to mark all the other entities as
OTHER. This information was adequate for
the computation of the Cb and all of the tran-
sitions.
Discourse segmentation and the implemen-
tationof the Centering algorithm for the com-
putation of the transitions were automated.
Segments boundaries were marked at para-
graph breaks and the transitions were calcu-
lated according to the instructions given in
Table 1. As output, the system computed
the percentage of Rough-Shifts for each es-
say. The percentage of Rough-Shifts was cal-
culated as the number of Rough-Shifts over
the total number of identi#0Ced transitions in
the essay.
7 Study results
In the appendix, we give the percentages of
Rough-Shifts #28ROUGH#29 for each of the actual
student essays #28100#29 on whichwe tested the
ROUGH variable in the regression discussed
below. The HUMAN #28HUM#29 column con-
tains the essay scores given byhuman raters
and the EARTER #28E-R#29 column contains the
corresponding score assigned by the e-rater.
Comparing HUMAN and ROUGH, we ob-
serve that essays with scores from the higher
end of the scale tend to havelower percent-
ages of Rough-Shifts than the ones from the
lower end. Toevaluate that this observa-
tion can be utilized to improve the e-rater's
performance, we regressed X=E-RATER and
X=ROUGH #28the predictors#29 by Y=HUMAN.
The results of the regression are shown in Ta-
ble 3. The 'Estimate' cell contains the coef-
#0Ccients assigned for eachvariable. The coef-
#0Ccient for ROUGH is negative, thus penaliz-
ing occurrences of Rough-Shifts in the essays.
The t-test #28't-ratio' in Table 3#29 for ROUGH
has ahighly signi#0Ccant p-value #28p#3C0.0013#29for
these 100 essays suggesting that the added
variable ROUGH can contribute to the ac-
curacy of the model. The magnitude of the
contribution indicated by this regression is
approximately 0.5 point, a reasonalby siz-
able e#0Bect given the scoring scale #281-6#29. Ad-
ditional work is needed to precisely quan-
tify the contribution of ROUGH. That would
involve incorporating the ROUGH variable
into the building of a new e-rater model and
comapring the results of the new model to the
original e-rater model.
As a preliminary test of the predictability
of the model, we jacknifed the data. We per-
formed 100 tests with ERATER as the sole
variable leaving out one essay each time and
recorded the prediction of the model for that
essay.We repeated the procedure using both
variables. The predicted values for ERATER
alone and ERATER+ROUGH are shown in
columns PrH#2FE and PrH#2FE+R respectively
in Table 4. In comparing the predictions, we
observe that, indeed, 57 #25 of the predicted
values shown in the PrH#2FE+R column are
better approximations of the HUMAN scores,
especially in the cases where the ERATER's
score is discrepantby 2 points from the HU-
MAN score.
8 Discussion
Our positive #0Cnding, namely that Centering
Theory's measure of relative proportion of
Rough-Shift transitions is indeed a signi#0C-
cant contributor to the accuracy of computer-
generated essay scores, has several practical
and theoretical implications. Clearly, it in-
dicates that adding a local coherence feature
to e-rater could signi#0Ccantly improve e-rater's
scoring accuracy. Note, however, that over-
all scores and coherence scores need not be
strongly correlated. Indeed, our data contain
several examples of essays with high coher-
ence scores but lowoverall scores and vice
versa.
We brie#0Dy reviewed these cases with several
ETS writing assessment experts to gain their
insights into the value of pursuing this work
further. In an e#0Bort to maximize the use of
their time with us, we carefully selected three
pairs of essays to elicit speci#0Cc information.
One pair included twohigh-scoring #286#29essays,
one with a high coherence score and the other
with a low coherence score. Another pair in-
cluded two essays with low coherence scores
but di#0Bering overall scores #28a 5 and a 6#29. A
#0Cnal pair was carefully chosen to include one
essay with an overall score of 3 that made
several main points but did not develop them
fully or coherently, and another essay with an
overall score of 4 that made only one main
point but did develop it fully and coherently.
After brie#0Dy describing the Rough-Shift co-
herence measure and without revealing either
the overall scores or the coherence scores of
the essay pairs, we asked our experts for their
comments on the overall scores and coherence
of the essays. In all cases, our experts pre-
cisely identi#0Ced the scores the essays had been
given. In the #0Crst case, they agreed with the
high Centering coherence measure, but one
expert disagreed with the low Centering co-
herence measure. For that essay, one expert
noted that "coherence comes and goes" while
another found coherence in a "chronological
organization of examples" #28a notion beyond
the domain of Centering Theory#29. In the sec-
ond case, our experts' judgments con#0Crmed
the Rough-Shift coherence measure. In the
third case, our experts speci#0Ccally identi#0Ced
both the coherence and the development as-
pects as determinants of the essays' scores. In
general, our experts felt that the development
of an automated coherence measure would be
a useful instructional aid.
The advantage of the Rough-Shift metric
over other quanti#0Ced components of the e-
rater is thatit can be appropriately translated
into instructive feedback for the student. In
an interactive tutorial system, segments con-
taining Rough-Shift transitions can be high-
lighted and supplementary instructional com-
ments will guide the studentinto revising the
relevant section paying attention to topic dis-
continuities.
9 Future work
Our study prescribes a route for several fu-
ture research projects. Some, such as the
need to improve on fully automated tech-
niques for noun phrase#2Fdiscourse entity iden-
ti#0Ccation and coreference resolution, are es-
sential for converting this measure of local co-
herence to a fully automated procedure. Oth-
ers, not explicitly discussed here, such as the
status of discourse deictic expressions, nom-
inalization resolution, and global coherence
studies are fair game for basic, theoretical re-
search.
Acknowledgements
Wewould like to thank Jill Burstein who provided us
with the essay set and human and e-rater scores used
in this study; Mary Fowles, Peter Cooper, and Seth
Weiner who provided us with the valuable insights
of their writing assessment expertise; Henry Brown
who kindly discussed some statistical issues with us;
Ramin Hemat who provided perl code for automati-
cally computing Centering transitions and the Rough-
Shift measure for each essay.We are grateful to Ar-
avind Joshi and Alistair Knott for useful discussions.
References
S. Brennan, M. Walker-Friedman, and C. Pollard.
1987. A Centering approach to pronouns. In Pro-
ceedings of the 25th Annual Meeting of the Associa-
tion for Computational Linguistics, pages 155#7B162.
Stanford, Calif.
J. Burstein, K. Kukich, S. Wol#0B, M. Chodorow,
L. Braden-Harder, M.D. Harris, and C. Lu. 1998.
Automated essay scoring using a hybrid feature
identi#0Ccation technique. In Annual Meeting of the
Association for Computational Linguistics, Mon-
treal, Canada, August.
B. Di Eugenio. 1998. Centering in Italian. In Center-
ing Theory in Discourse, pages 115#7B137. Clarendon
Press, Oxford.
B. Grosz and C. Sidner. 1986. Attentions, intentions
and the structure of discourse. Computational Lin-
guistics, 12:175#7B204.
B. Grosz, A. Joshi, and S. Weinstein. 1983. Provid-
ing a uni#0Ced account of de#0Cnite noun phrases in
discourse. In Annual Meeting of the Association
for Computational Linguistics, pages 44#7B50.
B. Grosz. 1977. The representation and use of focus
in  underastanding. Technical Report No.
151, Menlo Park, Calif., SRI International.
M. Hearst. 1994. Multiparagraph segmentation of
expository text. In Proc. of the 32nd ACL.
L. Horn. 1986. Presupposition, theme and variations.
In Chicago Linguistics Society,volume 22, pages
168#7B192.
S. Hudson-D'Zmura. 1988. The Structure of Dis-
course and Anaphor Resolution: The Discourse
Center and the Roles of Nouns and Pronouns.
Ph.D. thesis, UniversityofRochester.
F. Hurewitz. 1998. A quantitative look at discourse
coherence. In M. Walker, A. Joshi, and E. Prince,
editors, Centering Theory in Discourse,chapter 14.
Clarendon Press, Oxford.
A. Joshi and S. Kuhn. 1979. Centered logic: The
role of entity centered sentence representation in
natural  inferencing. In 6th International
Joint ConferenceonArti#0Ccial Intelligence, pages
435#7B439.
A. Joshi and S. Weinstein. 1981. Control of infer-
ence: Role of some aspects of discourse structure:
Centering. In 7th International Joint Conference
on Arti#0Ccial Intelligence, pages 385#7B387.
M. Kameyama. 1985. ZeroAnaphora: The Case of
Japanese. Ph.D. thesis, Stanford University.
M. Kameyama. 1998. Intrasentential Centering: A
case study.InM.Walker, A. Joshi, and E. Prince,
editors, Centering Theory in Discourse, pages 89#7B
112. Clarendon Press: Oxford.
H. Kozima. 1993. Text segmentation based on sim-
ilaritybetween words. In Proc. of the 31st ACL
#28Student Session#29, pages 286#7B288.
E. Miltsakaki. 1999. Locating topics in text process-
ing. In Proceedingsof ComputationalLinguisticsin
the Netherlands #28CLIN'99#29.
J. Morris and G. Hirst. 1991. Lexical cohesion com-
puted by thesaural relations as an indicator of the
structure of the text. Computational Linguistics,
17:21#7B28.
E. B. Page and N. Peterson. 1995. The computer
moves into essay grading: Updating the ancient
test. Phi Delta Kappan, March:561#7B565.
R. Passonneau and D. Litman. 1997. Discourse seg-
mentation byhuman and automated means. Com-
putational Linguistics, 23#281#29:103#7B139.
R. Passonneau. 1998. Interaction of discourse struc-
ture with explicitness of discourse anaphoric noun
phrases. In M. Walker, A. Joshi, and E. Prince,
editors, Centering Theory in Discourse, pages 327#7B
358. Clarendon Press: Oxford.
T. Reinhart. 1981. Pragmatics and linguistics: An
analysis of sentence topics. Philosophica,27:53#7B94.
J. Reynar. 1994. An automatic method of #0Cnding
topic boundaaries. In Proc. of 32nd ACL #28Studen
Session#29, pages 331#7B333.
C. Sidner. 1979. Toward a computational theory of
de#0Cnite anaphora comprehension in English. Tech-
nical Report No. AI-TR-537, Cambridge, Mass.
MIT Press.
U. Turan. 1995. Null vs. Overt Subjects in Turk-
ish Discourse: A Centering Analysis. Ph.D. thesis,
UniversityofPennsylvania.
M. Walker and E. Prince. 1995. A bilateral approach
to givenness: A hearer-status algorithm and a Cen-
tering algorithm. In T. Fretheim and J. Gundel,
editors, Reference and Referent Accessibility. Ams-
terdam: John Benjamins.
M. Walker, A. Joshi, and E. Prince #28eds#29. 1998. Cen-
tering Theory in Discourse. Clarendon Press: Ox-
ford.
M. Walker. 1998. Centering : Anaphora resolution
and discourse structure. In M. Walker, A. Joshi,
and E. Prince, editors, Centering Theory in Dis-
course, pages 401#7B35. Clarendon Press: Oxford.
G. Youmans. 1991. A new tool for discourse ana-
lyis: The vocabulary-management pro#0Cle. Lan-
guage, 67:763#7B789.
HUM E-R ROUGH PrH#2FE PrH#2FE+R
6 5 15 5.05 5.26
6 6 22 5.9921 5.9928
6 6 15 5.99 6.09
6 6 22 5.9921 5.9928
6 6 24 5.99 5.96
6 4 22 4.13 4.35
6 4 13 4.13 4.46
6 6 28 5.99 5.90
6 5 30 5.0577 5.0594
6 4 30 4.13 4.24
6 4 0 4.13 4.62
6 5 20 5.05 5.19
6 6 21 5.99 6.00
6 6 50 5.99 5.58
6 6 25 5.99 5.94
6 5 21 5.05 5.18
6 6 6 5.99 6.22
6 5 35 5.05 4.98
6 5 25 5.05 5.12
6 5 30 5.057 5.059
5 4 15 4.14 4.46
5 5 7 5.07 5.40
5 4 5 4.14 4.60
5 5 38 5.07 4.96
5 4 40 4.14 4.12
5 5 45 5.07 4.86
5 6 27 6.02 5.95
5 4 30 4.28 4.14
5 5 21 5.07 5.20
5 5 16 5.07 5.27
5 5 20 5.07 5.22
5 6 32 6.02 5.88
5 4 40 4.143 4.148
5 4 10 4.14 4.53
5 4 23 4.14 4.35
5 5 20 5.07 5.22
5 6 25 6.02 5.98
5 4 25 4.14 4.33
5 5 50 5.07 4.79
5 6 10 6.02 6.20
4 3 11 3.22 3.71
4 5 45 5.09 4.88
4 4 46 4.15 4.04
4 3 50 3.22 3.17
4 3 36 3.22 3.37
4 3 33 3.22 3.41
4 5 42 5.09 4.92
4 3 50 3.22 3.17
4 4 36 4.15 4.18
4 4 40 4.15 4.13
HUM E-R ROUGH PrH#2FE PrH#2FE+R
4 3 11 3.22 3.71
4 3 75 3.22 2.79
4 4 38 4.15 4.16
4 3 62 3.22 3.00
4 4 12 4.15 4.53
4 4 40 4.15 4.13
4 5 48 5.09 4.84
4 3 9 3.22 3.74
4 3 81 3.22 2.69
4 3 100 3.22 2.34
3 3 55 3.24 3.11
3 4 30 4.16 4.28
3 4 81 4.16 3.59
3 4 42 4.16 4.11
3 3 50 3.24 3.18
3 3 66 3.24 2.96
3 3 42 3.24 3.30
3 2 40 2.30 2.50
3 3 75 3.24 2.83
3 3 40 3.24 3.33
3 3 78 3.24 2.78
3 3 62 3.24 3.02
3 2 55 2.30 2.29
3 2 30 2.30 2.64
3 3 ? 3.29 ?
3 5 45 5.11 4.91
3 3 80 3.24 2.75
3 2 37 2.30 2.54
3 3 75 3.24 2.83
3 2 50 2.30 2.36
2 2 67 2.32 2.14
2 2 67 2.32 2.14
2 4 78 4.17 3.68
2 3 67 3.25 2.97
2 3 41 3.25 3.33
2 2 ? 2.32 ?
2 1 67 1.37 1.30
2 2 20 2.32 2.84
2 2 42 2.32 2.50
2 2 50 2.32 2.39
1 2 50 2.35 2.41
1 2 0 2.35 3.29
1 1 67 1.42 1.35
1 3 71 3.26 2.95
1 3 57 3.26 3.12
1 0 100 0.44 -0.03
1 1 85 1.42 1.09
1 1 67 1.42 1.35
1 2 57 2.35 2.31
1 1 0 1.42 2.48
Table 4: Table with the human scores #28HUM#29, the e-rater scores #28E-R#29, the Rough-Shift measure #28ROUGH#29,
the #28jacknifed#29 predicted values using e-rater as the only variable #28PrH#2FE#29 and the #28jacknifed#29 predicted values
using the e-rater and the added variable Rough-Shift #28PrH#2FE+R#29. The ROUGH measure is the percentage of
Rough-Shifts over the total number of identi#0Ced transitions. The question mark appears where no transitions
were identi#0Ced.
