<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1052">
  <Title>The Role of Centering Theory's Rough-Shift in the Teaching and Evaluation of Writing Skills</Title>
  <Section position="4" start_page="2" end_page="2" type="metho">
    <SectionTitle>
3 Overview of Centering
</SectionTitle>
    <Paragraph position="0"> A synthesis of two different lines of work (Joshi and Kuhn, 1979; Joshi and Weinstein, 1981) and (Sidner, 1979; Grosz, 1977; Grosz and Sidner, 1986) yielded the formulation of Centering Theory as a model for monitoring local focus in discourse. The Centering model was designed to account for those aspects of processing that are responsible for the difference in the perceived coherence of discourses such as those demonstrated in (1) and (2) below (examples from Hudson-D'Zmura (1988)).</Paragraph>
    <Paragraph position="1"> (1) a. John went to his favorite music store to buy a piano.</Paragraph>
    <Paragraph position="2"> b. He had frequented the store for many years.</Paragraph>
    <Paragraph position="3"> c. He was excited that he could finally buy a piano.</Paragraph>
    <Paragraph position="4"> d. He arrived just as the store was closing for the day.</Paragraph>
    <Paragraph position="5"> (2) a. John went to his favorite music store to buy a piano.</Paragraph>
    <Paragraph position="6"> b. It was a store John had frequented for many years.</Paragraph>
    <Paragraph position="7"> c. He was excited that he could finally buy a piano.</Paragraph>
    <Paragraph position="8"> d. It was closing just as John arrived.
Discourse (1) is intuitively more coherent than discourse (2). This difference may be seen to arise from the different degrees of continuity in what the discourse is about. Discourse (1) centers a single individual (John), whereas discourse (2) seems to focus in and out on different entities (John, store, John, store). Centering is designed to capture these fluctuations in continuity.</Paragraph>
  </Section>
  <Section position="5" start_page="2" end_page="3" type="metho">
    <SectionTitle>
4 The Centering model
</SectionTitle>
    <Paragraph position="0"> In this section, we present the basic definitions and common assumptions in Centering as discussed in the literature (e.g., Walker et al. (1998)). We present the assumptions and modifications we made for this study in Section 6.1.</Paragraph>
    <Section position="1" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.1 Discourse segments and entities
</SectionTitle>
      <Paragraph position="0"> Discourse consists of a sequence of textual segments, and each segment consists of a sequence of utterances. In Centering Theory, utterances are designated by U_i.</Paragraph>
      <Paragraph position="2"> Each utterance U_i evokes a set of discourse entities, the FORWARD-LOOKING CENTERS, designated by Cf(U_i).</Paragraph>
      <Paragraph position="4"> The members of the Cf set are ranked according to discourse salience (ranking is described in Section 4.4). The highest-ranked member of the Cf set is the PREFERRED CENTER, Cp. A BACKWARD-LOOKING CENTER, Cb, is also identified for utterance U_i.</Paragraph>
      <Paragraph position="6"> The highest-ranked member of the previous utterance's Cf set, Cf(U_{i-1}), that is realized in the current utterance, U_i, is its designated BACKWARD-LOOKING CENTER, Cb.</Paragraph>
      <Paragraph position="8"> The BACKWARD-LOOKING CENTER is a special member of the Cf set because it represents the discourse entity that U_i is about, what the literature often calls the 'topic' (Reinhart, 1981; Horn, 1986).</Paragraph>
      <Paragraph position="9"> The Cp for a given utterance may be identical with its Cb, but not necessarily so. It is precisely this distinction between looking back in the discourse with the Cb and projecting preferences for interpretations in the subsequent discourse with the Cp that provides the key element in computing local coherence in discourse.</Paragraph>
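      <Paragraph>The definitions of Cp and Cb above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that coreference has already been resolved and that each utterance has been reduced to its Cf list, i.e., a sequence of entity labels ordered from most to least salient. The names CfList, preferred_center and backward_center are illustrative only and do not come from any Centering implementation.

```python
from typing import Optional, Sequence

# Hypothetical representation: a Cf list is a sequence of entity labels,
# ordered from most to least salient (the ranking of Section 4.4).
CfList = Sequence[str]


def preferred_center(cf: CfList) -> Optional[str]:
    """Cp(U_i): the highest-ranked member of Cf(U_i), or None if Cf is empty."""
    return cf[0] if cf else None


def backward_center(prev_cf: CfList, cur_cf: CfList) -> Optional[str]:
    """Cb(U_i): the highest-ranked member of Cf(U_{i-1}) realized in U_i."""
    realized = set(cur_cf)
    for entity in prev_cf:  # scan Cf(U_{i-1}) from most to least salient
        if entity in realized:
            return entity
    return None  # nothing carried over from U_{i-1}: Cb(U_i) is undefined


print(preferred_center(["John", "piano"]))                   # John
print(backward_center(["store", "John"], ["John", "piano"]))  # John
```

With our own annotation of example (2), Cf(2b) = [store, John] and Cf(2c) = [John, piano], the sketch gives Cp(2c) = Cb(2c) = John.</Paragraph>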
    </Section>
    <Section position="2" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.2 Centering transitions
</SectionTitle>
      <Paragraph position="0"> Four types of transitions, reflecting four degrees of coherence, are defined in Centering.</Paragraph>
      <Paragraph position="1"> They are shown in transition ordering rule (1). The rules for computing the transitions are shown in Table 1.</Paragraph>
      <Paragraph position="2"> (1) Transition ordering rule: Continue is preferred to Retain, which is preferred to Smooth-Shift, which is preferred to Rough-Shift.
Centering defines one more rule, the Pronoun Rule, which we discuss in detail in Section 5.</Paragraph>
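      <Paragraph>Table 1 is not reproduced in this section, so the following Python sketch assumes the standard formulation of the transition rules in the Centering literature (e.g., Brennan et al., 1987; Walker et al., 1998). Continue applies if Cb(U_i) = Cb(U_{i-1}) (or Cb(U_{i-1}) is undefined) and Cb(U_i) = Cp(U_i). Retain applies if the Cb is unchanged but differs from Cp(U_i). Smooth-Shift applies if the Cb changes and equals Cp(U_i). Rough-Shift applies if the Cb changes and differs from Cp(U_i). Returning no transition when Cb(U_i) is undefined is our own assumption, as are the Cf annotations of examples (1) and (2).

```python
# Hypothetical Cf annotations (our own) for examples (1) and (2) above,
# one salience-ordered list of entity labels per utterance.
DISCOURSE_1 = [["John", "store", "piano"], ["John", "store"],
               ["John", "piano"], ["John", "store", "day"]]
DISCOURSE_2 = [["John", "store", "piano"], ["store", "John"],
               ["John", "piano"], ["store", "John"]]


def backward_center(prev_cf, cur_cf):
    """Highest-ranked member of Cf(U_{i-1}) realized in U_i, else None."""
    realized = set(cur_cf)
    return next((e for e in prev_cf if e in realized), None)


def transition(prev_cb, cb, cp):
    """Label one transition from Cb(U_{i-1}), Cb(U_i) and Cp(U_i)."""
    if cb is None:
        return None  # no Cb in U_i, so no transition is identified
    # An undefined Cb(U_{i-1}) is treated like an unchanged Cb.
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "Continue" if cb == cp else "Retain"
    return "Smooth-Shift" if cb == cp else "Rough-Shift"


def transitions(segment):
    """Transitions for consecutive utterance pairs within one segment."""
    labels, prev_cb = [], None  # Cb of the segment-initial utterance is undefined
    for prev_cf, cur_cf in zip(segment, segment[1:]):
        cb = backward_center(prev_cf, cur_cf)
        cp = cur_cf[0] if cur_cf else None
        labels.append(transition(prev_cb, cb, cp))
        prev_cb = cb
    return labels


print(transitions(DISCOURSE_1))  # ['Continue', 'Continue', 'Continue']
print(transitions(DISCOURSE_2))  # ['Retain', 'Continue', 'Retain']
```

Under these annotations, the sketch labels (1) as three Continues and (2) as Retain, Continue, Retain. This tracks the alternation of the Cp between John and the store, and the ordering rule ranks the second sequence as less preferred.</Paragraph>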
    </Section>
    <Section position="3" start_page="2" end_page="2" type="sub_section">
      <SectionTitle>
4.3 Utterance
</SectionTitle>
      <Paragraph position="0"> In early formulations of Centering Theory, the 'utterance' was not defined explicitly. In subsequent work (Kameyama, 1998), the utterance was defined as, roughly, the tensed clause, with relative clauses and clausal complements as exceptions. Based on crosslinguistic studies, Miltsakaki (1999) defined the utterance as the traditional 'sentence', i.e., the main clause and its accompanying subordinate and adjunct clauses constitute a single utterance.</Paragraph>
    </Section>
    <Section position="4" start_page="2" end_page="3" type="sub_section">
      <SectionTitle>
4.4 Cf ranking
</SectionTitle>
      <Paragraph position="0"> As mentioned earlier, the PREFERRED CENTER of an utterance is defined as the highest-ranked member of the Cf set. The ranking of the Cf members is determined by the salience status of the entities in the utterance and may vary crosslinguistically.</Paragraph>
      <Paragraph position="1"> Kameyama (1985) and Brennan et al. (1987) proposed that the Cf ranking for English is determined by grammatical function as follows:</Paragraph>
      <Paragraph position="2"> (2) Rule for ranking of forward-looking centers: SUBJ > IND. OBJ > OBJ > OTHERS</Paragraph>
      <Paragraph position="3"> Later crosslinguistic studies based on empirical work (Di Eugenio, 1998; Turan, 1995; Kameyama, 1985) determined the following detailed ranking, with QIS standing for quantified indefinite subjects (people, everyone, etc.) and PRO-ARB (we, you) for arbitrary plural pronominals.</Paragraph>
      <Paragraph position="4"> (3) Revised rule for the ranking of forward-looking centers: SUBJ > IND. OBJ > OBJ > OTHERS > QIS, PRO-ARB</Paragraph>
      <Paragraph position="5"> In the case of complex NPs, which have the property of evoking multiple discourse entities (e.g., his mother, software industry), the working hypothesis commonly assumed (e.g., Walker and Prince (1995)) is ordering from left to right.</Paragraph>
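      <Paragraph>Rule (3), together with the left-to-right convention for complex NPs, amounts to sorting mentions by a two-part key: grammatical role first, textual position second. The Python sketch below assumes each mention arrives as a hypothetical (entity, role, token_position) triple from manual annotation or a parser. The role labels and the name rank_cf are illustrative, not an established tag set or API.

```python
# Rank of each grammatical role under rule (3); smaller means more salient.
# The role labels are hypothetical annotation tags. QIS and PRO-ARB share
# the lowest rank, and a QIS or PRO-ARB subject is tagged with those labels
# rather than SUBJ.
RANK = {"SUBJ": 0, "IND_OBJ": 1, "OBJ": 2, "OTHERS": 3, "QIS": 4, "PRO_ARB": 4}


def rank_cf(mentions):
    """Build a Cf list from (entity, role, token_position) mentions.

    Mentions are sorted by role rank; ties (such as the entities evoked by
    one complex NP) are broken left to right by token position. An entity
    mentioned twice keeps only its highest-ranked occurrence.
    """
    ordered = sorted(mentions, key=lambda m: (RANK[m[1]], m[2]))
    cf = []
    for entity, _role, _position in ordered:
        if entity not in cf:
            cf.append(entity)
    return cf


# 'His mother gave Sue a book.': the complex NP 'his mother' evokes both
# the referent of 'his' (here assumed to be John) and the mother.
print(rank_cf([("John", "SUBJ", 0), ("mother", "SUBJ", 1),
               ("Sue", "IND_OBJ", 3), ("book", "OBJ", 5)]))
# ['John', 'mother', 'Sue', 'book']
```

Under rule (3), a quantified indefinite subject such as 'people' falls below an ordinary object, which the sort key reproduces through its lower rank.</Paragraph>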
    </Section>
  </Section>
  <Section position="6" start_page="3" end_page="3" type="metho">
    <SectionTitle>
5 The role of Rough-Shift transitions
</SectionTitle>
    <Paragraph position="0"> As mentioned briefly earlier, the Centering model includes one more rule, the Pronoun Rule given in (4).</Paragraph>
    <Paragraph position="1"> (4) Pronoun Rule: If some element of Cf(U_{i-1}) is realized as a pronoun in U_i, then so is Cb(U_i).</Paragraph>
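    <Paragraph>Rule (4) can be checked mechanically once Cf(U_{i-1}), Cb(U_i), and the form of each mention in U_i are known. The following Python sketch assumes a hypothetical dictionary that records, for each entity realized in U_i, whether it is realized as a pronoun. It reports whether the utterance violates the rule. The function name and input format are illustrative only.

```python
def violates_pronoun_rule(prev_cf, realizations, cb):
    """Check rule (4) for one utterance U_i.

    prev_cf: Cf(U_{i-1}), a list of entity labels.
    realizations: maps each entity realized in U_i to True if it is
        realized as a pronoun there, False otherwise (hypothetical format).
    cb: Cb(U_i), or None if undefined.
    """
    # Is some element of Cf(U_{i-1}) realized as a pronoun in U_i?
    antecedent_pronominalized = any(realizations.get(e, False) for e in prev_cf)
    # If so, Cb(U_i) must be pronominalized as well.
    return antecedent_pronominalized and not realizations.get(cb, False)


# (2c) 'He was excited that he could finally buy a piano.'
print(violates_pronoun_rule(["store", "John"], {"John": True, "piano": False}, "John"))
# False
# Hypothetical violation: the store is pronominalized but Cb = John is a full NP.
print(violates_pronoun_rule(["John", "store"], {"store": True, "John": False}, "John"))
# True
```</Paragraph>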
    <Paragraph position="2"> The Pronoun Rule reflects the intuition that pronominals are felicitously used to refer to discourse-salient entities. As a result, Cbs are often pronominalized, or even deleted (if the grammar allows it). Rule (4) then predicts that if there is only one pronoun in an utterance, this pronoun must realize the Cb. The Pronoun Rule and the distribution of forms (definite/indefinite NPs and pronominals) over transition types play a significant role in the development of anaphora resolution algorithms in NLP. Note that the utility of the Pronoun Rule and the Centering transitions in anaphora resolution algorithms relies heavily on the assumption that the texts under consideration are maximally coherent. In maximally coherent texts, however, Rough-Shift transitions are rare, and even in less than maximally coherent texts they occur infrequently. For this reason, the distinction between Smooth-Shifts and Rough-Shifts was collapsed in previous work (Di Eugenio, 1998; Hurewitz, 1998, inter alia). The status of Rough-Shift transitions in the Centering model was therefore unclear, supported only by negative evidence: Rough-Shifts were considered valid because they are found to be rare in coherent discourse.</Paragraph>
    <Paragraph position="3"> In this study we gain insights pertaining to the nature of the Rough-Shifts precisely because we are forced to drop the coherence assumption. Our data consist of student essays whose degree of coherence is under evaluation and therefore cannot be assumed. Using students' paragraph marking as segment boundaries, we 'centered' 100 GMAT essays.</Paragraph>
    <Paragraph position="4"> The average length of these essays was about 250 words. In the next section we show that Rough-Shift transitions provide a reliable measure of incoherence, correlating well with scores provided by writing experts.</Paragraph>
    <Paragraph position="5"> (On the left-to-right ordering of complex NPs in Section 4.4, see also Di Eugenio (1998) for the treatment of complex NPs in Italian.)</Paragraph>
    <Paragraph position="6"> One of the crucial insights was that, in our data, the incoherence detected by the Rough-Shift measure is not due to violations of the Pronoun Rule or to infelicitous use of pronominal forms in general. In Table 2, we report the distribution of forms over Rough-Shift transitions. Of the 211 Rough-Shift transitions found in the set of 100 essays, in 195 cases the Cp was a nominal phrase, either definite or indefinite. Pronominals occurred in only 16 cases, of which 6 instantiated the pronominals 'we' or 'you' in their generic sense. Table 2 strongly indicates that the student essays were not incoherent in terms of the processing load imposed on the reader to resolve anaphoric references. Instead, the incoherence in the essays was due to discontinuities caused by students introducing too many undeveloped topics within what should be a conceptually uniform segment, i.e., their paragraphs. This is, in fact, what Rough-Shift picked up.</Paragraph>
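    <Paragraph>The counts in this paragraph translate into proportions with simple arithmetic. The Python below uses only the figures reported above:

```python
# Figures reported above for the 100 essays.
rough_shifts = 211     # Rough-Shift transitions found
nominal_cp = 195       # Cp realized as a definite or indefinite nominal phrase
pronominal_cp = 16     # Cp realized as a pronominal
generic_we_you = 6     # pronominal cases that are generic 'we' or 'you'

assert nominal_cp + pronominal_cp == rough_shifts  # the two cases are exhaustive
print(f"nominal Cp:        {nominal_cp / rough_shifts:.1%}")     # 92.4%
print(f"pronominal Cp:     {pronominal_cp / rough_shifts:.1%}")  # 7.6%
print(f"non-generic pron.: {(pronominal_cp - generic_we_you) / rough_shifts:.1%}")  # 4.7%
```

In other words, fewer than one Rough-Shift in twenty involves a non-generic pronominal Cp.</Paragraph>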
    <Paragraph position="7"> These results not only justify Rough-Shifts as a valid transition type but also support the original formulation of Centering as a measure of discourse continuity even when anaphora resolution is not an issue. It seems that Rough-Shifts capture a source of incoherence that has been overlooked in the Centering literature. The processing load in the Rough-Shift cases reported here is increased not by the effort required to resolve anaphoric reference but by the effort required to find the relevant topic connections in a discourse bombarded with a rapid succession of multiple entities. That is, Rough-Shifts are the result of absent or extremely short-lived Cbs. We interpret the Rough-Shift transitions in this context as a reflection of the incoherence perceived by the reader when s/he is unable to identify the topic (focus) structure of the discourse. This is a significant insight which opens up new avenues for practical applications of the Centering model.</Paragraph>
  </Section>
  <Section position="7" start_page="3" end_page="4" type="metho">
    <SectionTitle>
6 The e-rater Centering study
</SectionTitle>
    <Paragraph position="0"> In an earlier preliminary study, we applied the Centering algorithm manually to a sample of 36 GMAT essays. We explored the hypothesis that the Centering model provides a reasonable measure of coherence (or lack of it), reflecting the evaluation performed by human raters with respect to the corresponding requirements described in the instructions for human raters. We observed that essays with higher scores tended to have significantly lower percentages of ROUGH-SHIFTs than essays with lower scores. As expected, the distribution of the other types of transitions showed no significant differences between higher- and lower-scored essays. In general, CONTINUEs, RETAINs, and SMOOTH-SHIFTs do not yield incoherent discourses (in fact, an essay with only CONTINUE transitions might sound rather boring!).</Paragraph>
    <Paragraph position="1"> In this study we test the hypothesis that a predictor variable derived from Centering can significantly improve the performance of e-rater. Since we are in fact proposing Centering's ROUGH-SHIFTs as a predictor variable, our model, strictly speaking, measures incoherence.</Paragraph>
    <Paragraph position="2"> The corpus for our study came from a pool of essays written by students taking the GMAT. We randomly selected a total of 100 essays, covering the full range of the scoring scale, where 1 is lowest and 6 is highest (see appendix). We applied the Centering algorithm to all 100 essays, calculated the percentage of ROUGH-SHIFTs in each essay, and then ran a multiple regression to evaluate the contribution of the proposed variable to e-rater's performance.</Paragraph>
    <Section position="1" start_page="3" end_page="4" type="sub_section">
      <SectionTitle>
6.1 Centering assumptions and modifications
</SectionTitle>
      <Paragraph position="0"> Utterance. Following Miltsakaki (1999), we assume that each utterance consists of one main clause and all its subordinate and adjunct clauses.</Paragraph>
      <Paragraph position="1"> Cf ranking. We assumed the Cf ranking given in (3).</Paragraph>
      <Paragraph position="2"> A modification we made involved the status of the pronominal I.</Paragraph>
      <Paragraph position="3"> We observed that in low-scored essays the first-person pronominal I was used extensively, normally presenting personal narratives. However, personal narratives were unsuited to this essay writing task and were assigned lower scores by expert readers. The extensive use of I in subject position produced an unwanted effect of high coherence. We prescriptively decided to penalize the use of I in order to better reflect the coherence demands made by the particular writing task. We implemented this penalty by omitting all occurrences of I. As a result, coherence was measured with respect to the treatment of the remaining entities in the I-containing utterances. This gave us the desired result of being able to distinguish those I-containing utterances that made coherent transitions with respect to the entities they were talking about from those that did not.</Paragraph>
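      <Paragraph>The I modification can be sketched as a filter applied to each Cf list before Cb, Cp, and the transitions are computed. The Python below assumes, hypothetically, that first-person singular mentions are labelled 'I' and that the toy paragraph's entities were annotated by hand. It illustrates the idea and does not reproduce the study's implementation.

```python
def omit_first_person_singular(segment):
    """Drop every 'I' from each Cf list, preserving the order of the rest.

    Hypothetical representation: a segment is a list of salience-ordered Cf
    lists, and mentions of the first-person singular are labelled 'I'.
    """
    return [[entity for entity in cf if entity != "I"] for cf in segment]


essay_paragraph = [["I", "tax"], ["I", "cars", "pollution"], ["I", "tax", "cars"]]
filtered = omit_first_person_singular(essay_paragraph)
print([cf[0] for cf in essay_paragraph])  # Cp before: ['I', 'I', 'I']
print([cf[0] for cf in filtered])         # Cp after:  ['tax', 'cars', 'tax']
```

Once I is removed, the Cp reflects the entities the writer is actually discussing. An utterance whose only entity is I is left with an empty Cf list, so any code built on this filter has to handle that case.</Paragraph>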
      <Paragraph position="4"> In fact, a similar modification has been proposed by Hurewitz (1998), and Walker (1998) observed that the use of I in sentences such as 'I believe that...' and 'I think that...' does not affect the focus structure of the text.</Paragraph>
      <Paragraph position="5"> Segment boundaries are extremely hard to identify in an accurate and principled way. Furthermore, existing algorithms (Morris and Hirst, 1991; Youmans, 1991; Hearst, 1994; Kozima, 1993; Reynar, 1994; Passonneau and Litman, 1997; Passonneau, 1998) rely heavily on the assumption of textual coherence. In our case, textual coherence cannot be assumed. Given that text organization is also part of the evaluation of the essays, we decided to use the students' paragraph breaks to locate segment boundaries.</Paragraph>
    </Section>
    <Section position="2" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
6.2 Implementation
</SectionTitle>
      <Paragraph position="0"> For this study, we decided to tag coreferring expressions manually despite the availability of coreference algorithms. We made this decision because poor performance of a coreference algorithm would distort our results, and we would not be able to test our hypothesis. For the same reason, we manually tagged the preferred centers as Cp. All other entities only needed to be marked as OTHER. This information was adequate for the computation of the Cb and all of the transitions.
Discourse segmentation and the implementation of the Centering algorithm for the computation of the transitions were automated.</Paragraph>
      <Paragraph position="1"> Segment boundaries were marked at paragraph breaks, and the transitions were calculated according to the rules for computing transitions given in Table 1. The percentage of Rough-Shifts was then computed for each essay as the number of Rough-Shifts over the total number of identified transitions in the essay.</Paragraph>
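      <Paragraph>Combining paragraph-based segmentation with the transition rules, the Rough-Shift percentage can be sketched in Python as below. The sketch makes several assumptions of its own. The transition labelling follows the standard Centering rules rather than reproducing Table 1. Cb is treated as undefined at the start of every paragraph. An essay with no identified transitions scores 0. The toy essay is invented for illustration.

```python
from collections import Counter


def backward_center(prev_cf, cur_cf):
    """Highest-ranked member of Cf(U_{i-1}) realized in U_i, else None."""
    realized = set(cur_cf)
    return next((e for e in prev_cf if e in realized), None)


def transition(prev_cb, cb, cp):
    """Standard Continue/Retain/Smooth-Shift/Rough-Shift labelling (assumed)."""
    if cb is None:
        return None
    if prev_cb is None or cb == prev_cb:
        return "Continue" if cb == cp else "Retain"
    return "Smooth-Shift" if cb == cp else "Rough-Shift"


def rough_shift_percentage(essay):
    """Percentage of Rough-Shifts among all identified transitions.

    essay: list of paragraphs (segments); each paragraph is a list of
    salience-ordered Cf lists, one per utterance. Transitions are only
    computed inside a paragraph, never across a paragraph break.
    """
    counts = Counter()
    for paragraph in essay:
        prev_cb = None  # Cb is undefined at the start of each segment
        for prev_cf, cur_cf in zip(paragraph, paragraph[1:]):
            cb = backward_center(prev_cf, cur_cf)
            label = transition(prev_cb, cb, cur_cf[0] if cur_cf else None)
            if label is not None:
                counts[label] += 1
            prev_cb = cb
    total = sum(counts.values())
    # Assumption: an essay with no identified transitions scores 0.
    return 100.0 * counts["Rough-Shift"] / total if total else 0.0


toy_essay = [
    [["tax", "state"], ["state", "tax"], ["cars", "state"]],  # Retain, Rough-Shift
    [["John", "store"], ["John", "piano"]],                   # Continue
]
print(round(rough_shift_percentage(toy_essay), 1))  # 33.3
```</Paragraph>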
    </Section>
  </Section>
  <Section position="8" start_page="4" end_page="4" type="metho">
    <SectionTitle>
7 Study results
</SectionTitle>
    <Paragraph position="0"> In the appendix, we give the percentage of Rough-Shifts (ROUGH) for each of the 100 student essays on which we tested the ROUGH variable in the regression discussed below. The HUMAN (HUM) column contains the essay scores given by human raters, and the ERATER (E-R) column contains the corresponding score assigned by e-rater.</Paragraph>
    <Paragraph position="1"> Comparing HUMAN and ROUGH, we observe that essays with scores from the higher end of the scale tend to have lower percentages of Rough-Shifts than those from the lower end. To evaluate whether this observation can be used to improve e-rater's performance, we regressed Y=HUMAN on the predictors X1=ERATER and X2=ROUGH.</Paragraph>
    <Paragraph position="2"> The results of the regression are shown in Table 3. The 'Estimate' cell contains the coefficient assigned to each variable. The coefficient for ROUGH is negative, thus penalizing occurrences of Rough-Shifts in the essays. The t-test ('t-ratio' in Table 3) for ROUGH has a highly significant p-value (p &lt; 0.0013) for these 100 essays, suggesting that the added variable ROUGH can contribute to the accuracy of the model. The magnitude of the contribution indicated by this regression is approximately 0.5 point, a reasonably sizable effect given the scoring scale (1-6). Additional work is needed to quantify the contribution of ROUGH precisely. That would involve incorporating the ROUGH variable into the building of a new e-rater model and comparing the results of the new model to those of the original e-rater model.</Paragraph>
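    <Paragraph>The regression can be sketched with NumPy's least-squares solver. This is a minimal illustration, not the analysis behind Table 3. The data below are synthetic, generated so that the human score decreases with the Rough-Shift percentage, and fit_with_t_ratios is a hypothetical helper. Turning the t-ratios into p-values would additionally require the t distribution with n - 3 degrees of freedom (for example from SciPy), which is omitted here.

```python
import numpy as np


def fit_with_t_ratios(erater, rough, human):
    """OLS fit of HUMAN = b0 + b1*ERATER + b2*ROUGH.

    Returns the coefficient vector (b0, b1, b2) and the corresponding
    t-ratios (each coefficient divided by its standard error).
    """
    X = np.column_stack([np.ones(len(erater)), erater, rough])
    y = np.asarray(human, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]              # n minus number of parameters
    sigma2 = resid @ resid / dof           # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se


# Synthetic stand-in data (NOT the paper's essays): 100 'essays' whose
# human score falls as the Rough-Shift percentage rises.
rng = np.random.default_rng(0)
erater = rng.integers(1, 7, size=100)          # e-rater scores 1..6
rough = rng.uniform(0, 40, size=100)           # percent Rough-Shifts
human = 0.5 + 0.8 * erater - 0.03 * rough + rng.normal(0, 0.3, size=100)

coef, t = fit_with_t_ratios(erater, rough, human)
print(np.round(coef, 3), np.round(t, 1))
```

With data built this way, the fitted ROUGH coefficient should come out negative, close to the -0.03 used to generate it, with a large negative t-ratio.</Paragraph>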
    <Paragraph position="3"> As a preliminary test of the predictive power of the model, we jackknifed the data. We performed 100 tests with ERATER as the sole variable, each time leaving out one essay and recording the model's prediction for that essay. We repeated the procedure using both variables. The predicted values for ERATER alone and for ERATER+ROUGH are shown in columns PrH/E and PrH/E+R, respectively, in Table 4. Comparing the predictions, we observe that 57% of the predicted values in the PrH/E+R column are better approximations of the HUMAN scores, especially in cases where ERATER's score is discrepant by 2 points from the HUMAN score.</Paragraph>
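    <Paragraph>The jackknife procedure amounts to leave-one-out refitting. The Python sketch below uses synthetic data rather than the essays of Table 4. It refits the model once per held-out essay and then counts how often the two-predictor model lands closer to the human score. The name loo_predictions is a hypothetical helper.

```python
import numpy as np


def loo_predictions(predictors, human):
    """Leave-one-out predictions from an OLS model with an intercept.

    For each essay k, the model is refit on the other essays and then used
    to predict essay k.
    """
    y = np.asarray(human, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n)] + [np.asarray(p, dtype=float) for p in predictors])
    preds = np.empty(n)
    for k in range(n):
        keep = np.arange(n) != k                      # drop essay k
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[k] = X[k] @ coef
    return preds


# Synthetic stand-in data (NOT the paper's essays).
rng = np.random.default_rng(0)
erater = rng.integers(1, 7, size=100)
rough = rng.uniform(0, 40, size=100)
human = 0.5 + 0.8 * erater - 0.03 * rough + rng.normal(0, 0.3, size=100)

pred_e = loo_predictions([erater], human)           # analogue of column PrH/E
pred_er = loo_predictions([erater, rough], human)   # analogue of column PrH/E+R
better = np.abs(pred_e - human) > np.abs(pred_er - human)
print(f"E+R closer to HUMAN for {better.mean():.0%} of essays")
```</Paragraph>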
  </Section>
  <Section position="9" start_page="4" end_page="4" type="metho">
    <SectionTitle>
8 Discussion
</SectionTitle>
    <Paragraph position="0"> Our positive finding has several practical and theoretical implications: Centering Theory's measure of the relative proportion of Rough-Shift transitions is indeed a significant contributor to the accuracy of computer-generated essay scores. Clearly, it indicates that adding a local coherence feature to e-rater could significantly improve e-rater's scoring accuracy. Note, however, that overall scores and coherence scores need not be strongly correlated. Indeed, our data contain several examples of essays with high coherence scores but low overall scores, and vice versa.</Paragraph>
    <Paragraph position="1"> We briefly reviewed these cases with several ETS writing assessment experts to gain their insights into the value of pursuing this work further. In an effort to make the best use of their time with us, we carefully selected three pairs of essays to elicit specific information.</Paragraph>
    <Paragraph position="2"> One pair included two high-scoring (6) essays, one with a high coherence score and the other with a low coherence score. Another pair included two essays with low coherence scores but differing overall scores (a 5 and a 6). A final pair was carefully chosen to include one essay with an overall score of 3 that made several main points but did not develop them fully or coherently, and another essay with an overall score of 4 that made only one main point but did develop it fully and coherently.</Paragraph>
    <Paragraph position="3"> After briefly describing the Rough-Shift coherence measure, and without revealing either the overall scores or the coherence scores of the essay pairs, we asked our experts for their comments on the overall scores and coherence of the essays. In all cases, our experts precisely identified the scores the essays had been given. In the first case, they agreed with the high Centering coherence measure, but one expert disagreed with the low Centering coherence measure. For that essay, one expert noted that &quot;coherence comes and goes&quot;, while another found coherence in a &quot;chronological organization of examples&quot; (a notion beyond the domain of Centering Theory). In the second case, our experts' judgments confirmed the Rough-Shift coherence measure. In the third case, our experts specifically identified both the coherence and the development aspects as determinants of the essays' scores. In general, our experts felt that the development of an automated coherence measure would be a useful instructional aid.</Paragraph>
    <Paragraph position="4"> The advantage of the Rough-Shift metric over other quantified components of e-rater is that it can be appropriately translated into instructive feedback for the student. In an interactive tutorial system, segments containing Rough-Shift transitions can be highlighted. Supplementary instructional comments can then guide the student in revising the relevant section, paying attention to topic discontinuities.</Paragraph>
  </Section>
</Paper>