<?xml version="1.0" standalone="yes"?> <Paper uid="W06-2302"> <Title>Another Evaluation of Anaphora Resolution Algorithms and a Comparison with GETARUNS' Knowledge Rich Approach</Title> <Section position="4" start_page="0" end_page="3" type="metho"> <SectionTitle> 2 The Anaphora Resolution Algorithms </SectionTitle> <Paragraph position="0"> We start by presenting a brief overview of three state-of-the-art algorithms for anaphora resolution: GuiTAR, JavaRAP and MARS.</Paragraph> <Section position="1" start_page="0" end_page="3" type="sub_section"> <SectionTitle> 2.1 JavaRAP </SectionTitle> <Paragraph position="0"> As reported by the authors of the Java implementation (Long Qiu, Min-Yen Kan, Tat-Seng Chua, 2004), the head-dependent relations required by RAP are obtained by looking into the structural &quot;argument domain&quot; for arguments and into the structural &quot;adjunct domain&quot; for adjuncts. Domain information is important for establishing disjointness relations, i.e. for telling whether a third person pronoun can look for antecedents within a certain structural domain or not. According to the Binding Principles, Anaphors (i.e. reciprocal and reflexive pronouns) must be bound, i.e. they must find their binder-antecedent within their own binding domain, which roughly corresponds to the notion of structural &quot;argument/adjunct domain&quot;. Within the same domains, Pronouns must be free. A head-argument or head-adjunct relation is determined whenever two or more NPs are siblings of the same VP.</Paragraph> <Paragraph position="1"> Additional information is related to agreement features, which in the case of pronominal expressions are directly derived. As for nominal expressions, features are assigned when they are available either on the verb (for SUBJect NPs) or on the noun itself; some further heuristics are applied to conjoined nouns.</Paragraph> <Paragraph position="2"> Gender is looked up in a list of names available on the web. 
This list is also used to provide the semantic feature of animacy.</Paragraph> <Paragraph position="3"> RAP is also used to find pleonastic pronouns, i.e. pronouns which have no referents. To detect pleonastic pronouns, a list of patterns is provided which uses both lexical and structural information.</Paragraph> <Paragraph position="4"> A salience weight is produced for each candidate antecedent from a set of salience factors. These factors include the main Grammatical Relations, Headedness, non-Adverbiality, and belonging to the same sentence. The information is computed again by RAP, directly on the syntactic structure. The weight computed for each noun phrase is divided by two whenever the distance from the current sentence increases. Only NPs contained within a distance of three sentences preceding the anaphor are considered by JavaRAP.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.2 GuiTAR </SectionTitle> <Paragraph position="0"> The authors (Poesio, M. and Mijail A. Kabadjov 2004) present their algorithm as an attempt at providing a domain-independent anaphora resolution module, &quot;that developers of NLE applications can pick off the shelf in the way of tokenizers, POS taggers, parsers, or Named Entity classifiers&quot;. For this reason, GuiTAR has been designed to be as independent as possible from other modules, and to be as modular as possible, thus &quot;allowing for the possibility of replacing specific components (e.g., the pronoun resolution component)&quot;.</Paragraph> <Paragraph position="1"> The authors have also made an attempt at specifying what they call the Minimal Anaphoric Syntax (MAS) and have devised a markup language based on the GNOME mark-up scheme. 
In MAS, Nominal Expressions constitute the main processing units and are identified with the tag NE (&lt;ne&gt;), which has a CAT attribute specifying the NP type (the-np, pronoun, etc.), as well as Person, Number and Gender attributes for agreement features. The internal structure of the NP is also marked, with Mod and NPHead tags.</Paragraph> <Paragraph position="2"> The pre-processing phase uses a syntactic guesser, which is a heuristics-based chunker of NPs. All NEs add up to a discourse model, or rather a History List, which is then used as the basic domain where Discourse Segments are contained. Each Discourse Segment may in turn be constituted by one or more Utterances. Each Utterance in turn contains a list of forward looking centers (Cfs).</Paragraph> <Paragraph position="3"> The Anaphora Resolution algorithm implemented is the one proposed for MARS, which is discussed below. The authors also implemented a simple algorithm for resolving Definite Descriptions on the basis of the History List, using a same-head matching approach.</Paragraph> </Section> <Section position="3" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 2.3 MARS </SectionTitle> <Paragraph position="0"> The approach is presented as a knowledge-poor anaphora resolution algorithm (Mitkov R. [1995;1998]), which makes use of POS tagging and NP chunking, tries to identify pleonastic &quot;it&quot; occurrences, and assigns animacy. The weighting algorithm seems to contain the most original approach. 
It is organized as a filtering approach, with a series of indicators that are used to boost or reduce the antecedenthood score of a given NP.</Paragraph> <Paragraph position="2"> The indicators are the following ones:</Paragraph> </Section> </Section> <Section position="5" start_page="3" end_page="4" type="metho"> <SectionTitle> RD (Referential Distance); TP (Term Preference), </SectionTitle> <Paragraph position="0"> As the author comments, antecedent indicators (preferences) play a decisive role in tracking down the antecedent from a set of possible candidates.</Paragraph> <Paragraph position="1"> Candidates are assigned a score (-1, 0, 1 or 2) for each indicator; the candidate with the highest aggregate score is proposed as the antecedent.</Paragraph> <Paragraph position="2"> The author's comment is that antecedent indicators have been identified empirically and are related to salience (definiteness, givenness, indicating verbs, lexical reiteration, section heading preference, &quot;non-prepositional&quot; noun phrases), to structural matches (collocation, immediate reference), to referential distance or to preference of terms. However, it is clear that most of the indicators have been suggested for lack of better information; in particular, no syntactic constituency was available.</Paragraph> <Paragraph position="3"> In a more recent paper (Mitkov et al., 2003) MARS has been fully reimplemented and the indicators updated. 
The authors seem to acknowledge the fact that anaphora resolution is a much more difficult task than previous work had suggested. In unrestricted text analysis, the tasks involved in the anaphora resolution process contribute a lot of uncertainty and errors that may be the cause of low performance measures.</Paragraph> <Paragraph position="4"> The actual algorithm uses the output of Connexor's FDG Parser, filters instances of &quot;it&quot; and eliminates pleonastic cases, then produces a list of potential antecedents by extracting nominal and pronominal heads from NPs preceding the pronoun. Constraints are then applied to this list in order to produce the &quot;set of competing candidates&quot; to be considered further, i.e. those candidates that agree in number and gender with the pronoun, and also obey syntactic constraints. The authors also introduced the use of Genetic Algorithms in the evaluation phase.</Paragraph> <Paragraph position="5"> The new version of MARS includes three new indicators which seem more general and applicable to any text, so we shall comment on them.</Paragraph> <Paragraph position="6"> Frequent Candidates (FC) - this is a boosting score for the three most frequent NPs; Syntactic Parallelism (SP) - this is a boosting score for NPs with the same syntactic role as the pronoun, with roles provided by the FDG Parser; Boost Pronoun (BP) - pronoun candidates are given a bonus (no indication is given of the conditions for such a bonus).</Paragraph> <Paragraph position="7"> The authors also substantially reimplemented the indicator First NPs, which has been renamed &quot;Obliqueness (OBL) - score grammatical functions, SUBJect > OBJect > IndirectOBJect > Undefined&quot;. MARS has a procedure for automatically identifying pleonastic pronouns: the classification is done by means of 35 features organized into 6 types, which are expressed by a mixture of lexical and grammatical heuristics. 
The output should be a fine-grained characterization of the use of pleonastic pronouns which includes, among others, discourse anaphora, clause level anaphora and idiomatic cases.</Paragraph> <Paragraph position="8"> In the same paper, the authors deal with two more important topics: syntactic constraints and animacy identification.</Paragraph> </Section> <Section position="6" start_page="4" end_page="6" type="metho"> <SectionTitle> 3 GETARUNS </SectionTitle> <Paragraph position="0"> In a number of papers (Delmonte 1990; 1991; 1992; 1994; 2003; 2004) and in a book (Delmonte 1992) we described our algorithms and the theoretical background which inspired them. Whereas the old version of the system had a limited vocabulary and was intended to work only in limited domains with high precision, the current version of the system has been created to cope with unrestricted text. In Delmonte (2002), we reported preliminary results obtained on a corpus of anaphorically annotated texts made available by R. Mitkov on his website. Both definite descriptions and pronominal expressions were considered; the success rate was 75% F-measure. In that case we used a very shallow and robust parser which produced only NP chunks, which were then used to fire anaphoric processes. However, the texts making up the corpus were technical manuals, where the scope and usage of pronominal expressions is very limited.</Paragraph> <Paragraph position="1"> The current algorithm for anaphora resolution works on the output of a complete deep robust parser which builds an indexed linear list of dependency structures where clause boundaries are clearly indicated; unlike Connexor, our system computes both grammatical relations and semantic role information for arguments and adjuncts.</Paragraph> <Paragraph position="2"> Semantic roles are very important in the weighting procedures. 
Our system also produces implicit grammatical relations, which are either controlled SUBJects of untensed clauses or arguments or adjuncts of relative clauses.</Paragraph> <Paragraph position="3"> As to the anaphoric resolution algorithm, it is based on the original intuitions of Sidner (1983:Chapter 5) and Webber (1983:Chapter 6) on Focussing in Discourse. We find distributed, local approaches to anaphora resolution more efficient than monolithic, global ones. In particular we believe that, due to the relevance of structural constraints in the treatment of locally restricted classes of pronominal expressions, it is more appropriate to activate different procedures which, by dealing separately with non-locally restricted classes, also afford separate evaluation procedures. There are also at least two principled reasons for the separation into two classes.</Paragraph> <Paragraph position="4"> The first reason is a theoretical one. Linguistic theory has long since established without any doubt the existence in most languages of the world of at least two classes: the class of pronouns which must be bound locally in a given domain and the class of pronouns which must be left free in the same domain. As a matter of fact, English also has a third class of pronominals, the so-called long-distance subject-of-consciousness bound pronouns (see Zribi-Hertz A., 1989). The second reason is empirical. Anaphora resolution is usually carried out by searching for antecedents backward w.r.t. the position of the current anaphoric expression. In our approach, we proceed in a clause by clause fashion, weighting each candidate antecedent w.r.t. that domain and trying to resolve it locally. Weighting criteria are amenable on the one hand to linear precedence constraints, with scores assigned on a functional/semantic basis. 
On the other hand, these criteria may be overridden by a functional ranking of clauses, which requires main clauses to be treated differently from secondary clauses, and both of these differently from complement clauses. By contrast, global algorithms neglect such requirements altogether: they weight each referring expression w.r.t. the utterance, linear precedence is only physically evaluated, and no functional correction is introduced.</Paragraph> <Section position="1" start_page="5" end_page="6" type="sub_section"> <SectionTitle> 3.1 Referential Policies and Algorithms </SectionTitle> <Paragraph position="0"> There are also two general referential policy assumptions that we adopt in our approach. The first one is related to pronominal expressions, the second one to referring expressions or entities to be asserted in the History List. They are expressed as follows: - no more than two pronominal expressions are allowed to refer back in the previous discourse portion; - at discourse level, referring expressions are stored in a push-down stack according to Persistence principles.</Paragraph> <Paragraph position="1"> Persistence principles respond to psychological principles and limit the topicality space available to the user w.r.t. a given text. Persistence has a two-dimensional nature: it is determined both in relation to an overall topicality frequency value and to an utterance number proximity value.</Paragraph> <Paragraph position="2"> Only &quot;persistent&quot; referring expressions are allowed to build up the History List, where persistence is established on the basis of the frequency of topicality for each referring expression, which must be higher than 1. All referring expressions asserted as Topic (Secondary, Potential) only once are discarded in case they appeared at a distance measured in 5 previous utterances. 
Proximate referring expressions are allowed to be asserted in the History List.</Paragraph> <Paragraph position="3"> In particular, whereas Mitkov considers the paragraph as the discourse unit most suitable for coreferring and cospecifying operations at discourse level, we prefer to adopt a parameterized procedure which is definable by the user and activated automatically: it can be fired at an interval that can vary from every 10 up to every 50 sentences. Our procedure has the task of pruning the topicality space and reducing the number of prospective topics for Main and Secondary Topic. Thus we garbage-collect all non-relevant entities. This responds to the empirically validated fact that, as the distance between the first and second mention of the same referring expression increases, people are obliged to repeat the same linguistic description, using a definite expression or a bare NP. Indefinites are not allowed and may only serve as first mention; they can also be used as bridging expressions within opaque propositions. The first procedure is organized as follows: A. For each clause, 1. we collect all referential expressions and weight them (see B below for criteria) - this is followed by an automatic ranking; 2. then we subtract pronominal expressions; 3. at clause level, we try to bind personal and possessive pronouns obeying specific structural properties; we also bind reflexive pronouns and reciprocals, if any, which must be bound obligatorily in this domain; 4. when binding a pronoun, we check for disjointness w.r.t. a previously bound pronoun, if any; 5. all unbound pronouns and all remaining personal pronouns are asserted as &quot;externals&quot;, and are passed up to the higher clause levels; B. Weighting is carried out by taking into account the following linguistic properties associated with each referring expression: 1. Grammatical Function with the usual hierarchy (SUBJ > ARG_MOD > OBJ > OBJ2 > IOBJ > NCMOD); 2. 
Semantic Roles, as they have been labelled in FrameNet and in our manually produced frequency lexicon of English; 3. Animacy: we use 75 semantic features derived from WordNet, and reward referring expressions labelled as Human or Institution/Company; 4. Functional Clause Type is further used to introduce penalties associated with those referring expressions which do not belong to the main clause. C. Then we turn to the higher level, if any, and proceed as in A.; in addition, 1. we try to bind pronouns passed up by the lower clause levels: - if successful, this will activate a retract of the &quot;external&quot; label and a label of &quot;antecedenthood&quot; for the current pronoun with a given antecedent; - the best antecedent is chosen by recursively trying to match features of the pronoun with the first available antecedent previously ranked by weighting; - here again, whenever a pronoun is bound we check for disjointness at utterance level.</Paragraph> <Paragraph position="4"> D. This is repeated until all clauses are examined and all pronouns are scrutinised and bound or left free.</Paragraph> <Paragraph position="5"> E. Pronouns left free (those asserted as externals) will be matched tentatively with the best candidates, provided this time by a &quot;centering-like&quot; algorithm. Step A. is identical and is recursively repeated until all clauses are processed.</Paragraph> <Paragraph position="6"> Then, we move to step B., which in this case will use all referring expressions present in the utterance, rather than only those available locally.</Paragraph> <Paragraph position="7"> Fig. 
1 GETARUNS AR algorithm</Paragraph> </Section> <Section position="2" start_page="6" end_page="6" type="sub_section"> <SectionTitle> 3.2 Focussing Revisited </SectionTitle> <Paragraph position="0"> Our version of the focussing algorithm follows Sidner's proposal (Sidner C., 1983; Grosz B., Sidner C., 1986) to use a Focus Stack, a certain Focus Algorithm with Focus movements, and data structures to allow for processing simple inferential relations between different linguistic descriptions co-specifying or coreferring to a given entity.</Paragraph> <Paragraph position="1"> Our Focus Algorithm is organized as follows: for each utterance, we assert three &quot;centers&quot; that we call Main, Secondary and the first Potential Topic, which represent the best three referring expressions as they have been weighted in the candidate list used for pronominal binding; then we also keep a list of Potential Topics for the remaining best candidates. These three best-candidate repositories are renewed at each new utterance, and are used to resolve both pronominal and nominal cospecification and coreference: this is done both in case of strict identity of linguistic description and in case of non-identity. The second case may occur either when derivational morphological properties allow the two referring expressions to be matched successfully, or when a simple hyponym/hypernym relation holds between two terms, one of which is contained in the list of referring expressions collected from the current sentence, while the other is among the entities stored in the focus list.</Paragraph> <Paragraph position="2"> The Main Topic may be regarded as the Forward Looking Center in the centering terminology or the Current Focus. 
All entities are stored in the History List (HL), which is a stack containing their morphological and semantic features: this is not to be confused with a Discourse Model (what we did in the anaphora resolution module of the deep complete system), which is a highly semantically wrought elaboration of the current text. In the HL every new entity is assigned a semantic index which identifies it uniquely. To allow for Persistence evaluation, we also assert rhetorical properties associated with each entity, i.e. we store the information of topicality (i.e. whether it has been evaluated as Main, Secondary or Potential Topic), together with the semantic ID and the number of the current utterance. This is subsequently used to measure the degree of Persistence of a given entity in the overall text, as explained below.</Paragraph> <Paragraph position="3"> In order to decide which entity has to become Main, Secondary or Potential Topic we proceed as follows: - we collect all entities present in the History List with their semantic identifier and feature list and proceed to an additional weighting procedure; - nominal expressions are divided up into four semantic types: definite, indefinite, bare NPs, quantified NPs. 
Both definite and indefinite NPs may be computed as new or old entities according to contextual conditions, as will be discussed below, and are given a rewarding score; - we enumerate for each entity its persistence in the previous text, and keep entities which have a frequency higher than 1, discarding the others; - we recover entities which have been asserted in the HL in proximity to the current utterance, up to four utterances back; - we use this list to &quot;resolve&quot; referring expressions contained in the current utterance; - if this succeeds, we use the &quot;resolved&quot; entities as new Main, Secondary, and Potential Topics and assert the rest in the Potential Topics stack; - if this fails, even partially, we use the best candidates in the weighted list of referring expressions to assert the new Topics. It may be the case that both resolved and current best candidates are used, and this is by far the most common case.</Paragraph> </Section> </Section> </Paper>
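<!--
The topic-selection steps listed at the end of Section 3.2 can be illustrated with a minimal Python sketch. It is not the GETARUNS implementation: the Entity class, the function names is_kept and select_topics, the example entities and their weights are all hypothetical, and the step that resolves the referring expressions of the current utterance against the list is left out. The sketch only shows the persistence filter (topic frequency higher than 1), the proximity recovery (up to four utterances back) and the ranking of the surviving entities into Main, Secondary and Potential Topics.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    sem_id: int      # unique semantic index in the History List
    head: str        # head noun, used here only for display
    weight: float    # score from the weighting step (made-up values below)
    topic_utts: list = field(default_factory=list)  # utterances where it was a topic


def is_kept(entity, current_utt, window=4):
    """Keep an entity if it is persistent (topic more than once) or proximate."""
    persistent = len(entity.topic_utts) > 1
    # Proximate: asserted as topic at most `window` utterances back.
    proximate = any(window >= current_utt - u >= 0 for u in entity.topic_utts)
    return persistent or proximate


def select_topics(history_list, current_utt):
    """Return (main, secondary, first potential, remaining potential topics)."""
    kept = [e for e in history_list if is_kept(e, current_utt)]
    ranked = sorted(kept, key=lambda e: e.weight, reverse=True)
    best = ranked[:3]
    # Pad with None when fewer than three entities survive the filter.
    main, secondary, potential = best + [None] * (3 - len(best))
    return main, secondary, potential, ranked[3:]


# Hypothetical History List at utterance 8.
history = [
    Entity(1, "company", 9.0, [1, 2, 7]),  # persistent
    Entity(2, "manager", 7.5, [6]),        # seen once, but proximate
    Entity(3, "report", 8.0, [1]),         # seen once and too far back
    Entity(4, "meeting", 5.0, [3, 5]),     # persistent
]
main, secondary, potential, rest = select_topics(history, current_utt=8)
print(main.head, secondary.head, potential.head, [e.head for e in rest])
```

With the made-up data above, the entity seen only once at utterance 1 is discarded, while the one seen only once at utterance 6 survives because it is proximate to utterance 8.
-->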