<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1074">
  <Title>Optimizing Algorithms for Pronoun Resolution</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Factors in Pronoun Resolution
</SectionTitle>
    <Paragraph position="0"> Pronoun resolution is conditioned by a wide range of factors. Two questions arise: Which factors are the most effective? How is the interaction of the factors modelled? The present section deals with the first question, while the second is postponed to section 4.</Paragraph>
    <Paragraph position="1"> Many approaches distinguish two classes of resolution factors: filters and preferences. Filters express linguistic rules, while preferences are merely tendencies in interpretation. Logically, filters are monotonic inferences that select a certain subset of possible antecedents, while preferences are non-monotonic inferences that partition the set of antecedents and impose an order on the cells.</Paragraph>
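This division of labor can be illustrated by a minimal resolution loop (our own sketch, not code from the paper): filters monotonically shrink the candidate set, while preferences only reorder the survivors.

```python
# Minimal sketch (illustrative, not from the paper): filters discard
# candidates outright; preferences are stable sorts that impose an order.
def resolve(pronoun, candidates, filters, preferences):
    # Filters: monotonic inferences that select a subset of antecedents.
    for passes in filters:
        candidates = [c for c in candidates if passes(pronoun, c)]
    # Preferences: applied in reverse so that earlier preferences dominate
    # (Python's sort is stable, so later-applied keys take precedence).
    for score in reversed(preferences):
        candidates = sorted(candidates, key=lambda c: score(pronoun, c))
    return candidates[0] if candidates else None

# Toy example with integer "candidates": filter keeps even numbers,
# the single preference ranks larger numbers first.
best = resolve(None, [1, 2, 3, 4],
               filters=[lambda p, c: c % 2 == 0],
               preferences=[lambda p, c: -c])
```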
    <Paragraph position="2"> In the sequel, factors proposed in the literature are discussed and their value is appraised on evaluation data. Every factor narrows the set of antecedents and potentially discards correct antecedents. Table 1 lists both the success rate maximally achievable (broken down according to different types of pronouns) and the average number of antecedents remaining after applying each factor. Figures are also given for parsed input. Preferences are evaluated on filtered sets of antecedents.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Filters
</SectionTitle>
      <Paragraph position="0"> Agreement. An important filter comes from morphology: Agreement in gender and number is generally regarded as a prerequisite for coreference. Exceptions are existent but few (2.5%): abstract pronouns (such as that in English) referring to non-neuter or plural NPs, plural pronouns co-referring with singular collective NPs (Ge et al., 1998), and antecedent and anaphor matching in natural rather than grammatical gender. (Footnote 1: Here, we only count anaphoric pronouns, i.e. third-person pronouns not used expletively.)</Paragraph>
      <Paragraph position="1"> All in all, a maximal performance of 88.9% is maintained. The filter is very restrictive and cuts the set of possible antecedents in half. See Table 1 for details.</Paragraph>
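The agreement filter can be sketched as follows (a hypothetical illustration; the mention representation is our assumption, not the paper's):

```python
# Sketch of the morphological agreement filter: keep only candidate
# antecedents matching the pronoun in gender and number.
# The Mention structure is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    gender: str  # e.g. "masc", "fem", "neut"
    number: str  # "sg" or "pl"

def agreement_filter(pronoun, candidates):
    """Monotonic filter: discard candidates that disagree with the
    pronoun in gender or number."""
    return [c for c in candidates
            if c.gender == pronoun.gender and c.number == pronoun.number]

# Toy example: the German pronoun "sie" (fem sg) keeps only matching NPs.
pron = Mention("sie", "fem", "sg")
cands = [Mention("der Mann", "masc", "sg"),
         Mention("die Frau", "fem", "sg"),
         Mention("die Kinder", "neut", "pl")]
filtered = agreement_filter(pron, cands)
```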
      <Paragraph position="2"> Binding. Binding constraints have been a focus of linguistic research for more than thirty years. They restrict the co-indexation of pronouns with their clause siblings, and can therefore only be applied by systems that determine clause boundaries, i.e. parsers (Mitkov, 1998). Empirically, binding constraints are rules without exceptions, hence they do not lead to any loss in achievable performance. The downside is that their restrictive power is quite low as well (0.3% in our corpus, cf. Table 1).</Paragraph>
      <Paragraph position="3"> Sortal Constraints. More controversial are sortal constraints. Intuitively, they also provide a hard filter: The correct antecedent must fit into the environment of the pronoun (Carbonell and Brown, 1988). In general, however, the required knowledge sources are lacking, so they must be hand-coded and can only be applied in restricted domains (Strube and Hahn, 1999). Selectional restrictions can also be modelled by collocational data extracted by a parser, which have, however, only a very small impact on overall performance (Kehler et al., 2004).</Paragraph>
      <Paragraph position="4"> We will neglect sortal constraints in this paper.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Preferences
</SectionTitle>
      <Paragraph position="0"> Preferences can be classified according to their requirements on linguistic processing. Sentence Recency and Surface Order can be read directly off the surface. NP Form presupposes at least tagging. A range of preferences (Grammatical Roles, Role Parallelism, Depth of Embedding, Common Path), as well as all filters, presuppose full syntactic analysis.</Paragraph>
      <Paragraph position="1"> Mention Count and Information Status are based on previous decisions of the anaphora resolution module. Sentence Recency (SR). The most important criterion in pronoun resolution (Lappin and Leass, 1994) is the textual distance between anaphor and antecedent, measured in sentences. Lappin and Leass (1994) motivate this preference as a dynamic expression of the attentional state of the human hearer: memory capability for the storage of discourse referents degrades rapidly.</Paragraph>
      <Paragraph position="2"> Several implementations are possible. Perhaps most obvious is the strategy implicit in Lappin and Leass (1994)'s algorithm: The antecedent is searched for in a sentence that is as recent as possible, beginning with the already uttered part of the current sentence, continuing in the last sentence, then in the one before that, and so forth. In case no antecedent is found in the previous context, subsequent sentences are inspected (cataphora), also ordered by proximity to the pronoun.</Paragraph>
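The search order just described can be sketched as follows (an illustrative implementation under our own assumptions; candidates are paired with their sentence index):

```python
# Sketch of the sentence recency ordering: anaphoric candidates come
# first, most recent sentence first; cataphoric candidates (from later
# sentences) come last, also ordered by proximity to the pronoun.
# Each candidate is a (sentence_index, mention) pair -- an assumed format.
def sentence_recency_order(pronoun_sentence, candidates):
    anaphoric = [c for c in candidates if c[0] <= pronoun_sentence]
    cataphoric = [c for c in candidates if c[0] > pronoun_sentence]
    # Smaller sentence distance = more recent; stable sort preserves
    # the within-sentence order of the input.
    anaphoric.sort(key=lambda c: pronoun_sentence - c[0])
    cataphoric.sort(key=lambda c: c[0] - pronoun_sentence)
    return anaphoric + cataphoric

# Pronoun in sentence 5; candidates in sentences 2, 5, 4 and 6.
order = sentence_recency_order(5, [(2, "A"), (5, "B"), (4, "C"), (6, "D")])
```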
      <Paragraph position="3"> Table 2 shows the upper limits for different sentence recency values when only the most recent antecedent (in the order just stated) is considered. In Negra, 55.3% of all pronominal anaphora can be resolved intrasententially, and 97.6% within the last three sentences. Since only 1.6% of all pronouns are cataphoric, it seems reasonable to neglect cataphora, as is mostly done (Strube and Hahn, 1999; Hobbs, 1978). Table 1 underscores the virtues of Sentence Recency: In the most recent sentence with antecedents satisfying the filters, there are on average only 2.4 such antecedents. However, the benefit comes at a cost: The performance ceiling is lowered to 82.0% in our corpus, since in many cases an incorrect antecedent is found in a more recent sentence.</Paragraph>
      <Paragraph position="4"> Similarly, we can assess other strategies of sentence ordering that have been proposed in the literature. Hard-core centering approaches only deal with the last sentence (Brennan et al., 1987). In Negra, these approaches can consequently have at most a success rate of 44.2%. Performance is particularly low with possessive pronouns which often only have antecedents in the current sentence. Strube (1998)'s centering approach (whose sentence ordering is designated as SR2 in Table 2) also deals with and even prefers intrasentential anaphora, which raises the upper limit to a more acceptable 80.2%. Strube and Hahn (1999) extend the context to more than the last sentence, but switch preference order between the last and the current sentence so that an antecedent is determined in the last sentence, whenever possible. In Negra, this ordering imposes an upper limit of 51.2%.</Paragraph>
      <Paragraph position="5"> Grammatical Roles (GR). Another important factor in pronoun resolution is the grammatical role of the antecedent. The role hierarchy used in centering (Brennan et al., 1987; Grosz et al., 1995) ranks subjects over direct objects over indirect objects over others. Lappin and Leass (1994) provide a more elaborate model which ranks NP complements and NP adjuncts lowest. Two other distinctions in their model express a preference for rhematic over thematic arguments (see Footnote 2): Existential subjects, which follow the verb, rank very high, between subjects and direct objects. Topic adjuncts in pre-subject position separated by a comma rank very low, between adjuncts and NP complements. Neither position is clearly demarcated in German. When the Lappin&amp;Leass hierarchy is carried over to German without changes, a small drop in performance results as compared with the obliqueness hierarchy used in centering. So we will use the centering hierarchy.</Paragraph>
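The centering obliqueness hierarchy can be used as a preference as sketched below (our own illustration; role labels are assumptions):

```python
# Sketch of the centering obliqueness hierarchy as a ranking preference:
# subjects over direct objects over indirect objects over others.
ROLE_RANK = {"subject": 0, "direct_object": 1, "indirect_object": 2, "other": 3}

def rank_by_role(candidates):
    """Non-monotonic preference: order candidates by grammatical role,
    lower rank preferred. Each candidate is an assumed (text, role) pair;
    unknown roles fall into the lowest cell. Stable sort keeps the prior
    order within each cell."""
    return sorted(candidates,
                  key=lambda c: ROLE_RANK.get(c[1], ROLE_RANK["other"]))

ordered = rank_by_role([("Buch", "direct_object"),
                        ("Mann", "subject"),
                        ("ihm", "indirect_object")])
```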
      <Paragraph position="6"> Table 1 shows the effect of the role-based preference on our data. The factor is both less restrictive and less precise than sentence recency.</Paragraph>
      <Paragraph position="7"> The definition of a grammatical role hierarchy is more involved in the case of automatically derived input, as the parser cannot always decide on the grammatical role (determining grammatical roles in German may require world knowledge). The parser does, however, propose a syntactically preferred role, which we adopt.</Paragraph>
      <Paragraph position="8"> Role Parallelism (RP). Carbonell and Brown (1988) argue that pronouns prefer antecedents in the same grammatical roles. Lappin and Leass (1994) also adopt such a principle. The factor is, however, not applicable to possessive pronouns.</Paragraph>
      <Paragraph position="9"> Again, role ambiguities make this factor slightly problematic. Several approaches are conceivable: Antecedent and pronoun are required to have a common role in one reading (weak match). Antecedent and pronoun are required to have the same role in the reading preferred by surface order (strong match). Antecedent and pronoun must display the same role ambiguity (strongest match). Weak match restricted performance to 49.9% with 12.1 antecedents on average. Strong match gave an upper limit of 47.0% but with only 10.3 antecedents on average. Strongest match lowered the upper limit to 43.1% but yielded only 9.3 antecedents. In interaction, strong match performed best, so we adopt it. Surface Order (LR, RL). Surface Order is usually used to bring the number of available antecedents down to one, since it is the only factor that produces a unique discourse referent. There is less consensus on the preference order: (sentence-wise) left-to-right (Hobbs, 1978; Strube, 1998; Strube and Hahn, 1999; Tetreault, 1999) or right-to-left (recency) (Lappin and Leass, 1994). Furthermore, something has to be said about antecedents which embed other antecedents (e.g. conjoined NPs and their conjuncts). (Footnote 2: Carbonell and Brown (1988) also argue that clefted or fronted arguments should be preferred.)</Paragraph>
      <Paragraph position="10"> We registered performance gains of up to 3% by ranking embedding antecedents higher than embedded ones (Tetreault, 2001).</Paragraph>
      <Paragraph position="11"> Left-to-right order is often used as a surrogate for the grammatical role hierarchy in English. The most notable exceptions to this equivalence are fronting constructions, where grammatical roles outperform surface order (Tetreault, 2001). A comparison of the lines for grammatical roles and for surface order in Table 1 shows that the same is true in German.</Paragraph>
      <Paragraph position="12"> Left-to-right order performs better (upper limit 56.8%) than right-to-left order (upper limit 49.2%).</Paragraph>
      <Paragraph position="13"> The gain is largely due to personal pronouns; demonstrative pronouns are better modelled by right-to-left order. It is well-known that German demonstrative pronouns contrast with personal pronouns in that they function as topic-shifting devices. Another effect of this phenomenon is the poor performance of the role preferences in connection with demonstrative pronouns.</Paragraph>
      <Paragraph position="14"> Depth of Embedding (DE). A prominent factor in Hobbs (1978)'s algorithm is the level of phrasal embedding: Hobbs's algorithm performs a breadth-first search, so antecedents at higher levels of embedding are preferred.</Paragraph>
      <Paragraph position="15"> Common Path (CP). The syntactic version of Hobbs (1978)'s algorithm also assumes maximization of the common path between antecedents and anaphors as measured in NP and S nodes. Accordingly, intra-sentential antecedents that are syntactically nearer to the pronoun are preferred. The factor only applies to intrasentential anaphora.</Paragraph>
      <Paragraph position="16"> The anaphora resolution module itself generates potentially useful information when processing a text. Arguably, discourse entities that have often been referred to in the previous context are topical and thus more likely to serve as antecedents again. This principle can be captured in different ways.</Paragraph>
      <Paragraph position="17"> Equivalence Classes (EQ). Lappin and Leass (1994) make use of a mechanism based on equivalence classes of discourse referents which manages the attentional properties of the individual entities referred to. The mechanism stores and provides information on how recently and in which grammatical role the entities were realized in the discourse. The net effect of the storage mechanism is that discourse entities are preferred as antecedents if they recently came up in the discourse. But the mechanism also integrates the preferences Role Hierarchy and Role Parallelism. Hence, it is one of the best-performing factors on our data. Since the equivalence class scheme is tightly integrated in the parser, the problem of ideal anaphora resolution data does not arise.</Paragraph>
      <Paragraph position="18"> Mention Count (MC). Ge et al. (1998) try to factorize the same principle by counting the number of times a discourse entity has already been mentioned in the discourse. However, they not only train but also test on the manually annotated counts, and hence presuppose an optimal anaphora resolution system. In our implementation, we did not bother with intrasentential mention count, which depends on the exact traversal. Rather, mention count was computed only from previous sentences.</Paragraph>
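Computing mention count from previous sentences only, as described above, can be sketched as follows (an illustrative implementation; the entity-id representation is our assumption):

```python
# Sketch of sentence-external mention counting: tally how often each
# discourse entity was mentioned in the preceding sentences, ignoring
# intrasentential mentions (which would depend on the exact traversal).
from collections import Counter

def mention_counts(previous_sentences):
    """Each sentence is given as a list of entity ids (an assumed
    representation); returns a Counter of mentions per entity."""
    counts = Counter()
    for sentence in previous_sentences:
        counts.update(sentence)
    return counts

# Entity e1 was mentioned in three preceding sentences.
counts = mention_counts([["e1", "e2"], ["e1"], ["e3", "e1"]])
```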
      <Paragraph position="19"> Information Status (IS). Strube (1998) and Strube and Hahn (1999) argue that the information status of an antecedent is more important than the grammatical role in which it occurs. They distinguish three levels of information status: entities known to the hearer (as expressed by coreferential NPs, unmodified proper names, appositions, relative pronouns, and NPs in titles), entities related to such hearer-old entities (either overtly via modifiers or by bridging), and entities new to the hearer. Like Ge et al. (1998), Strube (1998) evaluates on ideal, hand-annotated data.</Paragraph>
      <Paragraph position="20"> NP Form (NF, NP). A cheap way to model information status is to consider the form of an antecedent (Tetreault, 2001; Soon et al., 2001; Strube and Muller, 2003). Personal and demonstrative pronouns are necessarily context-dependent, and proper nouns are nearly always known to the hearer.</Paragraph>
      <Paragraph position="21"> Definite NPs may be coreferential or interpreted by bridging, while indefinite NPs are in their vast majority new to the hearer. We considered two proposals for orderings of form: preferring pronouns and proper names over other NPs over indefinite NPs (Tetreault, 2001) (NF) or preferring pronouns over all other NPs (Tetreault, 2001) (NP).</Paragraph>
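The two NP-form orderings can be sketched as ranking functions (an illustrative encoding; the form labels are our assumptions):

```python
# Sketch of the two NP-form preference orderings. NF prefers pronouns
# and proper names over other NPs over indefinite NPs; NP simply
# prefers pronouns over all other NPs. Form labels are assumed.
NF_RANK = {"pronoun": 0, "proper_name": 0,
           "definite_np": 1, "indefinite_np": 2}
NP_RANK = {"pronoun": 0}  # everything else ranks equally after pronouns

def rank_nf(form):
    """NF ordering: pronouns/proper names < other NPs < indefinite NPs."""
    return NF_RANK.get(form, 1)

def rank_np(form):
    """NP ordering: pronouns < all other NP forms."""
    return NP_RANK.get(form, 1)
```

Either function can serve as a sort key over candidate antecedents, analogously to the role-based preferences.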
    </Section>
  </Section>
</Paper>