<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0713"> <Title>An Algorithm for Resolving Individual and Abstract Anaphora in Danish Texts and Dialogues</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Background for DAR </SectionTitle> <Paragraph position="0"> In most applied approaches, pronominal anaphora resolution is equivalent to determining the antecedent domain and choosing the most prominent or salient antecedent among the possible candidates. Although there is not always an identity relation between linguistic antecedents and referents, we also follow this strategy, well aware that it is particularly problematic for APAs. In fact, the same linguistic expression can evoke different abstract objects depending on the context in which the APA occurs; see (Webber, 1991).</Paragraph> <Paragraph position="1"> Determining the degree of salience of discourse elements, henceforth DEs, is essential to anaphor resolution because personal pronouns refer to the most salient candidate antecedent that matches the given predication (Sidner, 1983). Nearly all salience-based models identify a high degree of salience with a high degree of givenness of DEs. In fact, although the various algorithms use different criteria for ranking DEs, such as linear order, hierarchy of grammatical roles, information structure and Prince's Familiarity Scale (Prince, 1981), they all assign the highest prominence to the DEs which are most topical, known, bound, familiar and thus given; see, inter alia, (Grosz et al., 1995; Brennan et al., 1987; Strube and Hahn, 1996; Strube, 1998). Grosz et al. (1995) also suggest that continuing to speak about the same elements in a discourse segment is perceived as more coherent than shifting the focus of attention. They implement this by the following ranking of transition states: continue > retain > shift.</Paragraph> <Paragraph position="2"> One salience model departs from the givenness² assumption. 
It was proposed by Hajičová et al. (1990) and assigns the highest degree of salience to DEs in the focal part of an utterance, in information structure terms (Sgall et al., 1986). These entities often represent new information. Hajičová et al.'s approach is original and can account for the data in (1) and (2). However, it is problematic from an applied point of view. In the first place, it is difficult to determine the information structure of all utterances. [Footnote fragment: ... and familiarity.]</Paragraph> <Paragraph position="3"> Secondly, focal candidate antecedents are ranked highest in Hajičová et al.'s model, but they still compete with given candidate antecedents in their system. Finally, the data does not confirm that all entities in the focal part of an utterance have the highest degree of accessibility. We agree with Hajičová's insight, but in order to operationalise the role of focality in resolution in a reliable way we propose the following. By default, accessibility is connected with givenness, as assumed in most resolution algorithms. However, speakers can explicitly change the degree of accessibility of entities in discourse by marking them as salient with information-structure-related devices. These entities represent the main focus of an utterance, have the highest degree of salience and are, in the majority of cases, the preferred antecedents of anaphors.</Paragraph> <Paragraph position="4"> In these cases the shift of focus of attention is, in our opinion, as coherent as continuing to speak about the same entities, because it is preannounced to the addressee. On the basis of the data we propose a list of identifiable constructions in which explicit focus marking occurs and in which the focal DEs have the highest degree of salience.³ Examples from the list are the following: a: Entities referred to by NPs which are focally marked structurally. 
In Danish this marking occurs in clefts, existential and topicalised constructions.⁴ b: Entities referred to by NPs that follow focusing adverbs, as in (1).</Paragraph> <Paragraph position="5"> c: Entities focally marked by the prosody (if this information is available) and/or entities providing the information requested in questions, as in (2).</Paragraph> <Paragraph position="6"> The hierarchy of verbal complements can model givenness preference in Danish. As in English, pronouns have a high degree of givenness (pronominal chain preference). In addition to salience preferences, we found that parallelism can account for numerous uses of Danish anaphors. According to parallelism, in adjacent utterances with parallel grammatical complements, the preferred antecedent of an anaphor in the second utterance is the linguistic expression in the first utterance with the same grammatical function. [Footnote fragment: e.g. (Sidner, 1983).]</Paragraph> <Paragraph position="7"> Inspired by the work of Kameyama (1996), we have defined a preference interaction model to be used in resolution. Our model is given in figure 1.⁵ The interaction model states that givenness preferences are overridden by focality preference when in conflict, and that they are all overridden by parallelism. In Danish, too, demonstrative and personal pronouns refer to entities with different status in the discourse model. Weak (cliticised and unstressed) pronouns usually refer to the most salient entity in the utterance. Strong (stressed and demonstrative) pronouns emphasise or put in contrast the entities they refer to and/or indicate that their antecedents are not the most expected ones.⁶ Demonstratives preferentially refer to abstract entities, while personal pronouns preferentially refer to individual entities in ambiguous contexts. All these differences are implemented in DAR.</Paragraph> <Paragraph position="8"> Approximately 
half of the APA occurrences in our dialogues refer to entities evoked by larger discourse segments (several turn takings). Thus we follow Eckert and Strube's approach of marking the structure of dialogues and searching for APA antecedents in the right frontier of the discourse tree (Webber, 1991). DAR presupposes different discourse structures for texts and dialogues. DAR follows the ES00 and PHORA strategy of discriminating between IPAs and APAs by rules that look at the semantic constraints on the predication contexts in which the anaphors occur. DAR relies on many more discriminating rules than ES00. These rules were defined by analysing large amounts of data and using the encodings of the Danish PAROLE computational lexicon (Braasch et al., 1998; Navarretta, 1997). DAR uses language-specific rules to account for Danish APAs. [Footnote 5: The interaction model was defined on the basis of the data and the results of a survey of pronominal uses. Commonsense preferences, which override all the other preferences (see, inter alia, (Hobbs, 1983)), are not implemented.] [Footnote 6: [...] gender pronoun det can be both a personal pronoun (corresponding to it) and a demonstrative pronoun (corresponding to this/that). In the latter case it is always stressed.]</Paragraph> <Paragraph position="9"> These occur in many more contexts than in English, where elliptical constructions or other anaphors such as too and so are used. Examples of Danish-specific uses of abstract anaphors are given in (3) and (4).</Paragraph> <Paragraph position="10"> (3) Han var sulten. Det var jeg ikke. [pid] (lit. He was hungry. That was I not.) (He was hungry. I wasn't.) (4) Han kunne svømme, men det kunne hun ikke.</Paragraph> <Paragraph position="11"> (lit. He could swim, but it could she not.) (He could swim, but she couldn't.) A language-specific rule recognising APAs is the following: constructions with modal verbs and an object, such as x skal man (lit. x shall one) (one shall), x vil man (lit. 
x will one) (one will).</Paragraph> <Paragraph position="12"> An example of a rule identifying IPAs is the following: adjectival constructions in which the prepositional complement only subcategorises for concrete entities, such as let for x (easy for x), fuld af x (full of x).</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The DAR-algorithm </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Search Space and DE lists </SectionTitle> <Paragraph position="0"> DAR presupposes the discourse structure described by Grosz and Sidner (1986). The minimal discourse unit is the utterance U. Paragraphs correspond to discourse segments in texts. Discourse segments in dialogues were manually marked. The dialogues were structured with Synchronising Units (SUs) according to the definitions in ES00.</Paragraph> <Paragraph position="1"> The immediate antecedent search space of a pronoun x in utterance Un is the previous utterance, Un−1. In dialogues, if Un is the first component of SUm, the immediate search space for x is SUm−1. DAR assumes two antecedent domains, depending on whether or not the pronoun has been recognised as an IPA. The antecedent domain for IPAs is first Un−1 and then the preceding utterances in the right frontier of the discourse tree, searched in recency order.⁷ The antecedent domain for APAs, or for anaphors which can be either IPAs or APAs, is Un−1.</Paragraph> <Paragraph position="2"> DAR operates on two lists of DEs, the Ilist and the Alist. The Ilist contains the NPs referred to in Un−1, ranked according to their degree of salience and enriched with information on gender, number, animacy and other simple semantic types necessary to implement selectional restrictions. The Ilist also records the grammatical role of nominals and indicates strongly focally marked elements. 
The leftmost element in the Ilist is the most salient one. Givenness and focality preferences are accounted for in the Ilist, as illustrated in figure 2. Focally marked entities are put at the front of the list, while the remaining DEs are ordered according to verbal complement order. Inside verbal complements, nominals are ordered according to their order of occurrence, as illustrated in the second row of figure 2. The abstract entities which are referred to by an APA in Un−1 or SUm−1 are encoded in the Alist. They are removed from the list after a new utterance (SU in dialogues) has been processed if they have not been mentioned in it.</Paragraph> <Paragraph position="3"> The context ranking for abstract entities is that proposed by Eckert and Strube (2000) and is given in figure 3.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 The Algorithm </SectionTitle> <Paragraph position="0"> DAR consists of two different functions, ResolveDet and ResolveIpa. The former is applied if the current pronoun x is third person singular neuter, while the latter is applied in all the remaining cases: if x is singular and neuter then go to ResolveDet(x) else go to ResolveIpa(x). ResolveIpa takes the IPA x as argument and looks for possible antecedents in the Ilist for the preceding Un−1 or SUm−1, after having applied syntactic constraints and selectional restrictions to the elements of the list. Three different cases are considered: (A) no antecedent has been found in the immediate search space; (B) one antecedent has been found; (C) several antecedents have been found.</Paragraph> <Paragraph position="1"> If no antecedent has been found (case A), ResolveIpa looks for the highest ranked antecedent in recency order in the Ilists of the preceding discourse. If an antecedent is found, the algorithm returns it. If no antecedent is found, x is classified as inferable.⁸ If one antecedent is found (case B), it is returned. 
If several candidate antecedents are found (case C), ResolveIpa performs tests implementing the preference interaction model described in section 3, as follows. If Un and Un−1 are parallel⁹ and one of the candidate antecedents has the same grammatical role in Un−1 as x in Un, this &quot;parallel&quot; antecedent is marked. In the remaining cases the algorithm marks the highest ranked candidate in the Ilist. Pronouns are preferred, unless there are focally marked candidate antecedents. At this point the algorithm identifies the preferred antecedent on the basis of x's type. If x is weak, the marked candidate proposed in the preceding steps is returned together with the list of the remaining candidate antecedents (possible ambiguity). If x is strong, the highest ranked candidate antecedent which was not marked in the preceding steps is returned together with the list of candidate antecedents.¹⁰ The approach of marking ambiguities resembles that proposed by Kameyama (1996).</Paragraph> <Paragraph position="2"> The main structure of the function ResolveDet is inspired by ES00. ResolveDet tests the pronoun x using the IPA and APA discriminating rules discussed in section 3. If x is an IPA, the function ResolveIpa-neu is applied. If x is an APA, the function ResolveApa is applied. Finally, if the pronoun is neither an IPA nor an APA, ResolveDet looks at its type. If x is strong, the algorithm attempts to find an abstract antecedent (ResolveApa), while if it is weak DAR tries to find an individual antecedent (ResolveIpa-neu). ResolveIpa-neu is like ResolveIpa except that it returns if no NP antecedents are found in Un−1 (case A), so that ResolveApa can be applied.</Paragraph> <Paragraph position="3"> [Footnote fragment: ... demonstratives dette/denne/disse (this/these), which never corefer with subject candidates.]</Paragraph> <Paragraph position="4"> ResolveApa distinguishes between types of pronoun. 
If x is weak, the preferred antecedent is searched for among the elements indicated in the context ranking, unless x is the object of the verb gøre (do), of modals or of have (have), or the abstract subject in copula constructions. In these cases the pronoun is resolved to the VP of the element in the Alist or in the context ranking. If x is strong, ResolveApa attempts to resolve it or classifies it as vague, depending on the type of pronoun. This part of the algorithm is specific to Danish and accounts for the fact that different strong pronouns preferentially refer to different abstract entities in the data.</Paragraph> <Paragraph position="5"> Resolved APAs are inserted into the Alist.</Paragraph> <Paragraph position="6"> In case of failure, ResolveApa returns so that ResolveIpa-neu can be applied. If both functions fail, the pronoun is classified as vague.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Some Examples </SectionTitle> <Paragraph position="0"> In the following we look at the resolution of example (2) from section 3 and the example in (5).</Paragraph> <Paragraph position="1"> (5): Du har svært ved at se musemarkøren på skærmen. Hvordan klarer du det? [edb] (You have difficulties seeing the mouse-cursor (common-gend) on the screen (common-gend).</Paragraph> <Paragraph position="2"> How do you manage it/this (neuter gender)?) The simplified Ilists and Alists after each utterance has been processed in example (2) are given in figure 4. (2) contains three SUs. U2 is an I/A and thus belongs to two synchronising units (SU1 and SU2). After U1 has been processed, the Ilist contains one element, din mor (your mother). In U2 the personal pronoun hun (she) occurs, thus ResolveIpa is applied. It resolves hun to the compatible NP in the Ilist, din mor. After U2 has been processed, the Ilist contains two elements in this order: the focally marked entity vores nabo (our neighbour) and the pronoun hun (= din mor). 
ResolveIpa resolves the occurrence of the pronoun hun (she) in U3 to the most salient candidate NP in the Ilist, vores nabo. Here focal preference overrides pronominal chain preference. [Figure 4, partially recovered: SU1: U1 (I), U2 (I/A); U1: hvem...hvem arbejdede din mor med? (with whom... whom did your mother work); Ilist: [din mor].] The simplified Ilists and Alists after the two utterances in (5) have been processed are given in figure 5.</Paragraph> <Paragraph position="3"> After U1 has been processed there are two common-gender DEs in the Ilist, musemarkøren (the mouse cursor) and skærmen (the screen). In U2 the singular neuter gender pronoun det (it) occurs, thus ResolveDet is applied. The pronoun is neither an IPA nor an APA according to the discriminating rules. ResolveDet then attempts to find an individual antecedent of the weak pronoun by applying the function ResolveIpa-neu. ResolveIpa-neu fails because the two DEs in the Ilist do not agree with the pronoun. The function ResolveApa then resolves x by looking at the context ranking. Since the Alist is empty, U1 is proposed as the antecedent. The resolved APA is added to the Alist.</Paragraph> </Section> </Section> </Paper>
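<!--
The Ilist ordering of section 4.1 and the three cases of ResolveIpa in section 4.2 can be illustrated with a minimal sketch. It is not the authors' implementation. The names DE, ROLE_RANK, build_ilist and resolve_ipa are hypothetical, the complement hierarchy is a placeholder (figure 2 is not reproduced in this section), the pronoun preference is folded into the ranking as a simplification, and the candidate lists are assumed to be already filtered by syntactic constraints and selectional restrictions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# ASSUMPTION: placeholder hierarchy of verbal complements; the paper's own
# ranking is given in its figure 2, which is not available here.
ROLE_RANK = {"subject": 0, "object": 1, "prep_object": 2}


@dataclass(frozen=True)
class DE:
    """A discourse element in the Ilist (hypothetical record layout)."""
    text: str
    role: str              # grammatical role in its utterance
    position: int          # order of occurrence in the utterance
    focal: bool = False    # explicitly focally marked
    pronoun: bool = False  # realised as a pronoun


def build_ilist(des: Sequence[DE]) -> List[DE]:
    """Rank DEs: focally marked first, then pronouns, then complement order."""
    def key(d: DE) -> Tuple[bool, bool, int, int]:
        return (not d.focal, not d.pronoun,
                ROLE_RANK.get(d.role, len(ROLE_RANK)), d.position)
    return sorted(des, key=key)


def resolve_ipa(x_role: str, x_is_weak: bool, candidates: List[DE],
                parallel: bool,
                earlier_ilists: Sequence[List[DE]] = ()
                ) -> Tuple[Optional[DE], List[DE]]:
    """Return (preferred antecedent, remaining candidates)."""
    if not candidates:                    # case A: look further back
        for ilist in earlier_ilists:      # most recent Ilist first
            if ilist:
                return ilist[0], []
        return None, []                   # caller classifies x as inferable
    if len(candidates) == 1:              # case B: a single candidate
        return candidates[0], []
    # Case C: parallelism overrides the salience ranking.
    marked = None
    if parallel:
        marked = next((d for d in candidates if d.role == x_role), None)
    if marked is None:
        marked = candidates[0]
    if x_is_weak:
        chosen = marked
    else:                                 # strong pronoun: best unmarked one
        chosen = next(d for d in candidates if d is not marked)
    return chosen, [d for d in candidates if d is not chosen]


# Example (2); the roles and the parallel flag are chosen for illustration.
hun = DE("hun (= din mor)", "subject", 0, pronoun=True)
nabo = DE("vores nabo", "prep_object", 1, focal=True)
ilist_u2 = build_ilist([hun, nabo])
antecedent, others = resolve_ipa("subject", True, ilist_u2, parallel=False)
print(antecedent.text)
```

Under these assumptions the weak pronoun hun in U3 is resolved to vores nabo, as in the walk-through of example (2), and hun (= din mor) is returned as the remaining candidate. Setting parallel to True, or treating the pronoun as strong, selects hun (= din mor) instead.
-->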