<?xml version="1.0" standalone="yes"?>
<Paper uid="J01-4003">
  <Title>A Corpus-Based Evaluation of Centering and Pronoun Resolution</Title>
  <Section position="4" start_page="514" end_page="517" type="intro">
    <SectionTitle>
4. Discussion
</SectionTitle>
    <Paragraph position="0"> For this study, we use McNemar's test to determine whether the difference in performance between two algorithms is significant. We adopt the standard statistical convention of p &lt; 0.05 as the threshold for significance.</Paragraph>
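For illustration only (not part of the original system), McNemar's test on the paired outcomes of two resolution algorithms can be sketched in Python. The counts `b` and `c` are the discordant pairs, i.e., pronouns that exactly one of the two algorithms resolves correctly; the normal approximation with continuity correction is used here, and the example counts are hypothetical:

```python
# Sketch of McNemar's test for comparing two pronoun-resolution algorithms.
# b = pronouns only algorithm A gets right, c = pronouns only B gets right.
from math import erf, sqrt

def mcnemar_p(b: int, c: int) -> float:
    """Two-sided p-value from the normal approximation (with continuity
    correction) to McNemar's test on the discordant counts b and c."""
    if b + c == 0:
        return 1.0
    z = max(0.0, abs(b - c) - 1) / sqrt(b + c)
    # survival function of the standard normal, doubled for a two-sided test
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))

# Hypothetical counts: 40 pronouns only A resolves, 18 only B resolves
print(mcnemar_p(40, 18) < 0.05)  # True: significant at the p < 0.05 convention
```

Only the discordant cells matter: pronouns that both algorithms get right (or both get wrong) carry no information about their relative performance.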
    <Paragraph position="1"> First, we consider LRC in relation to the classical algorithms: Hobbs, BFP, and S-list. We found a significant difference in the performance of all four algorithms (e.g., LRC and S-list: p &lt; 0.00479), though Hobbs and LRC performed the closest in terms of getting the same pronouns right. These two algorithms perform similarly for two reasons. First, both search for referents intrasententially and then intersententially. In the New York Times corpus, over 71% of all pronouns have intrasentential referents, so clearly an algorithm that favors the current utterance will perform better. Second, both search their respective data structures in a salience-first manner. Intersententially, both examine previous utterances in the same manner: breadth-first based on syntax.</Paragraph>
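The shared search order described above — intrasentential candidates first, then previous utterances scanned in salience order — can be sketched as follows. This is a simplified illustration, not the authors' implementation; `agrees` stands in for the morphological and syntactic compatibility checks, and the entity names are invented:

```python
# Sketch of a salience-first, intrasentential-then-intersentential search.
# Each utterance is a list of candidate entities already sorted by salience.

def resolve(pronoun, current_partial, history, agrees):
    # 1. Intrasentential: entities of the current utterance, most salient first
    for entity in current_partial:
        if agrees(pronoun, entity):
            return entity
    # 2. Intersentential: previous utterances, most recent first,
    #    each scanned breadth-first in salience order
    for utterance in reversed(history):
        for entity in utterance:
            if agrees(pronoun, entity):
                return entity
    return None

history = [["John", "store"], ["Mary", "book"]]  # oldest utterance first
print(resolve("she", ["clerk"], history, lambda p, e: e == "Mary"))  # Mary
```

Because over 71% of pronouns in the New York Times corpus have intrasentential referents, step 1 of this ordering does most of the work.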
    <Paragraph position="2"> Intrasententially, Hobbs does slightly better since it first favors antecedents close to the pronoun before searching the rest of the tree. LRC favors entities near the head of the sentence under the assumption that they are more salient. These algorithms' similarities in intra- and intersentential evaluation are reflected in the similarities in their percentage correct for the respective categories.</Paragraph>
    <Paragraph position="3"> Although S-list performed worse than LRC over the New York Times corpus, it did fare better over the fictional texts. This is due to the high density of pronouns in these texts; since pronouns are hearer-old, S-list ranks them high in its salience list.</Paragraph>
    <Paragraph position="4"> It should be restated that a shallow version (syntax only) of the S-list algorithm is implemented here.</Paragraph>
    <Paragraph position="5"> Computational Linguistics, Volume 27, Number 4. The standing of the BFP algorithm should not be surprising given past studies.</Paragraph>
    <Paragraph position="6"> For example, Strube (1998) found that the S-list algorithm performed at 91% correct on three New York Times articles, while the best version of BFP performed at 81%. This 10% difference is reflected in the present evaluation as well. The main drawback for BFP was its preference for intersentential resolution. Also, BFP, as formally defined, does not have an intrasentential processing mechanism. For the purposes of the project, the LRC intrasentential technique was used to resolve pronouns that could not be resolved by the BFP (intersentential) algorithm. It is unclear whether this is the optimal intrasentential algorithm for BFP.</Paragraph>
    <Paragraph position="7"> LRC-F is much better than LRC alone considering its improvement of over 5% in the newspaper article domain and over 7% in the fictional domain. This increase is discussed in the following section. The hybrid algorithm (LRC-P) has the same accuracy rate as LRC-F, though each gets 5 instances right that the other does not.</Paragraph>
    <Paragraph position="8"> 5. Examining Psycholinguistic Claims of Centering Having established LRC as a fair model of centering given its performance and incremental processing of utterances, we can use it to test empirically whether psycholinguistic claims about the ordering of the Cf-list are reflected in an increase in accuracy in resolving pronouns. The reasoning behind the following corpus tests is that if the predictions made by psycholinguistic experiments fail to increase performance or even lower performance, then this suggests that the claims may not be useful. As Suri, McCoy, and DeCristofaro (1999, page 180) point out: &amp;quot;the corpus analysis reveals how language is actually used in practice, rather than depending on a small set of discourses presented to the human subjects.&amp;quot; In this section, we use our corpus evaluation to provide counterevidence to the claims made about using genitives and prepended phrases to rank the Cf-list, and we propose a new Cf-list ranking based on these results.</Paragraph>
    <Section position="1" start_page="515" end_page="516" type="sub_section">
      <SectionTitle>
5.1 Moving Prepended Phrases
</SectionTitle>
      <Paragraph position="0"> Gordon, Grosz, and Gilliom (1993) carried out five self-paced reading time experiments that provided evidence for the major tenets of centering theory: that the backward-looking center (Cb) should be realized as a pronoun and that the grammatical subject of an utterance is most likely to be the Cb if possible. Their final experiment showed that surface position also plays a role in ranking the Cf-list. They observed that entities in surface-initial nonsubject positions in the previous sentence had about the same repeated-name penalty as an entity that had been the noninitial subject of the previous sentence. These results can be interpreted to mean that entities in subject position and in prepended phrases (nonsubject surface-initial positions) are equally likely to be the Cb.</Paragraph>
      <Paragraph position="1"> So the claim we wished to test was whether sentence-initial and subject position can serve equally well as the Cb. To evaluate this claim, we changed our parser to find the subject of the utterance. By tagging the subject, we know what entities constitute the prepended phrase (since they precede the subject). We developed two different methods of locating the subject. The first simply takes the first NP that is the subject of the first S constituent. It is possible that this S constituent is not the top-level S structure and may even be embedded in a prepended phrase. This method is called LRC-F since it takes the first subject NP found. The second method (LRC-S) selects the NP that is the subject of the top-level S structure. If one cannot be found, then the system defaults to the first method. The result of both tagging methods is that all NPs preceding the chosen subject are marked as being in a prepended phrase.</Paragraph>
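The two subject-tagging strategies can be illustrated over a toy parse tree. This is a hypothetical sketch, not the authors' code: a node is a `(label, children)` pair, "subject" is simplified to the first NP child of an S node, and the real systems operate on full parser output:

```python
# Sketch of the two subject-locating methods over a toy parse tree.
# A node is (label, children); leaves are plain strings.

def first_s_subject(node):
    """LRC-F: the first NP (in surface order) that is the subject of any S
    constituent, even an S embedded in a prepended phrase."""
    label, children = node
    for child in (c for c in children if isinstance(c, tuple)):
        # an embedded S to the left is found first (surface order)
        found = first_s_subject(child)
        if found is not None:
            return found
        if label == "S" and child[0] == "NP":
            return child
    return None

def top_level_subject(node):
    """LRC-S: the subject NP of the top-level S; if none, default to
    the LRC-F method."""
    label, children = node
    if label == "S":
        for child in children:
            if isinstance(child, tuple) and child[0] == "NP":
                return child
    return first_s_subject(node)

# "After he left, Mary smiled." (toy bracketing)
tree = ("S", [("SBAR", [("S", [("NP", ["he"]), ("VP", ["left"])])]),
              ("NP", ["Mary"]), ("VP", ["smiled"])])
print(first_s_subject(tree))   # ('NP', ['he'])   -- LRC-F
print(top_level_subject(tree)) # ('NP', ['Mary']) -- LRC-S
```

In this toy example the two methods diverge exactly as the text describes: LRC-F picks the subject embedded in the prepended clause, while LRC-S picks the main-clause subject; all NPs preceding the chosen subject would then be marked as prepended.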
      <Paragraph position="2">  Eight different corpus trials were carried out involving the two different parsing algorithms (LRC-F and LRC-S) and two different ordering modifications: (1) ranking the Cf-list after processing and (2) modifying the order of entities before processing the utterance. The standard Cf-list consists of ranking entities by grammatical role and surface order. As a result, prepended phrases would still be ranked ahead of the main subject. The modified Cf-list consists of ranking the main clause by grammatical role and placing all entities in the prepended phrase after all entities from the main clause. The second method involves reordering the utterance before processing. This technique was motivated mostly by the order we selected for pronoun resolution: an antecedent is first searched for in the Cf-partial, then in the past Cf-lists, and finally in the entities of the same utterance not in the Cf-partial. Pronouns in prepended phrases frequently refer to the subject of the same utterance as well as to entities in the previous utterance. Moving the prepended entities after the main clause entities before evaluation achieves the same result as looking in the main clause before the intersentential search.</Paragraph>
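The modified ranking — demoting prepended-phrase entities behind all main-clause entities while preserving the order within each group — can be sketched as follows (an illustrative sketch; the entity names and the boolean prepended flag are invented for the example):

```python
# Sketch of the modified Cf-list: entities from the prepended phrase are
# moved behind the main-clause entities; each group keeps its original
# grammatical-role/surface order.

def demote_prepended(cf_list):
    main = [e for e in cf_list if not e[1]]       # (name, in_prepended)
    prepended = [e for e in cf_list if e[1]]
    return main + prepended  # stable within each group

# "After his meeting, John called Mary."
cf = [("his", True), ("meeting", True), ("John", False), ("Mary", False)]
print(demote_prepended(cf))
# [('John', False), ('Mary', False), ('his', True), ('meeting', True)]
```

Applying this reordering before processing the utterance has the effect the text describes: a pronoun in the prepended phrase is checked against the main-clause entities before the intersentential search begins.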
      <Paragraph position="3"> Table 4 contains the results of the trials over the New York Times domain. &amp;quot;Prepended movement&amp;quot; refers to ranking the Cf-list with prepended entities moved to the end of the main clause; &amp;quot;Standard sort&amp;quot; refers to maintaining the order of the Cf-list. &amp;quot;Norm&amp;quot; means that prepended entities were not moved before the utterance was processed.</Paragraph>
      <Paragraph position="4"> &amp;quot;Pre&amp;quot; means that the entities were placed behind the main clause.</Paragraph>
      <Paragraph position="5"> All statistics (within the respective algorithms) were deemed significant relative to each other using McNemar's test. However, it should be noted that between the best performers for LRC-F and LRC-S (movement of prepended phrases before and after Cf-list, column 2), the difference in performance is insignificant (p ~ 0.624), indicating that the two algorithms perform comparably. The conclusion is that if an algorithm prefers the subject and marks entities in prepended phrases as less salient, it will resolve pronouns better.</Paragraph>
    </Section>
    <Section position="2" start_page="516" end_page="517" type="sub_section">
      <SectionTitle>
5.2 Ranking Complex NPs
</SectionTitle>
      <Paragraph position="0"> The second claim we wished to test involved ranking possessor and possessed entities realized in complex NPs. Walker and Prince (1996) developed the complex NP assumption that &amp;quot;In English, when an NP evokes multiple discourse entities, such as a subject NP with a possessive pronoun, we assume that the Cf ordering is from left to right within the higher NP&amp;quot; (page 8). So the Cf-list for the utterance Her mother knows Queen Elizabeth would be {her, mother, Elizabeth}. Walker and Prince note that the theory is just a hypothesis but motivate its plausibility with a complex example.</Paragraph>
      <Paragraph position="1"> However, a series of psycholinguistic experiments carried out by Gordon et al. (1999) refutes Walker and Prince's claim that the entities are ordered left to right. Gordon et al. found that subjects had faster reading rates for small discourses in which a pronoun referred to the possessed entity rather than the possessor entity.</Paragraph>
      <Paragraph position="3">  He assumes that possessor entities are nested deeper in the parse tree, so when the algorithm does a breadth-first search of the tree, it considers the possessed NP to be the most prominent.</Paragraph>
      <Paragraph position="4"> To see which claim is correct, we altered the Cf-list ranking to put possessed entities before possessor entities. The original LRC ordered them left to right as Walker and Prince (WP) suggest. Tables 5 and 6 include results for both domains. &amp;quot;+gen&amp;quot; indicates that only complex NPs containing genitive pronouns were reversed; &amp;quot;+pos&amp;quot; indicates that all possessive NPs were reversed, matching Gordon et al.'s study. The results indicate for both domains that Walker and Prince's theory works better, though marginally (for all domains and algorithms, significance levels between WP and +gen are under 0.05). For the New York Times domain, the difference in the actual number correct between LRC-S with WP and LRC-S with +pos is 1,362 to 1,337, or 25 pronouns, which is substantial (p &lt; 1.4e-06) over a corpus of 1,691 pronouns. Likewise, for the fictional texts, 1 extra pronoun is resolved incorrectly when using Gordon et al.'s method.</Paragraph>
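The two competing orderings for a complex NP can be written out explicitly (an illustrative sketch; the real Cf-list ranking operates over full utterances, not isolated NPs):

```python
# Sketch of the two complex-NP rankings for an NP like "her mother":
# possessor = "her", possessed = "mother".

def rank_wp(possessor, possessed):
    # Walker and Prince: left-to-right order within the higher NP
    return [possessor, possessed]

def rank_gordon(possessor, possessed):
    # Gordon et al. (1999): the possessed entity is more prominent
    return [possessed, possessor]

print(rank_wp("her", "mother"))      # ['her', 'mother']
print(rank_gordon("her", "mother"))  # ['mother', 'her']
```

The +gen and +pos trials differ only in which complex NPs undergo the second, reversed ordering: +gen reverses only NPs with genitive pronoun possessors, while +pos reverses all possessive NPs.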
      <Paragraph position="5"> Looking at the difference in what each algorithm gets right and wrong, it seems that type of referring expression and mention count play a role in which entity should be selected from the complex NP. If an entity has been mentioned previously or is realized as a pronoun, it is more likely to be the referent of a following pronoun.</Paragraph>
      <Paragraph position="6"> This would lend support to Strube and Hahn's S-list and functional centering theories (Strube and Hahn 1996), which maintain that type of referring expression and previous mention influence the salience of each entity with the S-list or Cf-list.</Paragraph>
    </Section>
  </Section>
</Paper>