<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1040"> <Title>The Influence of Minimum Edit Distance on Reference Resolution</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Features for Reference Resolution in </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Previous Work </SectionTitle> <Paragraph position="0"> Driven by the necessity to provide robust systems for the MUC system evaluations, researchers began to look for those features which were particularly important for the task of reference resolution. While most features for pronoun resolution have been described in the literature for decades, researchers only recently began to look for robust and cheap features, i.e., features which perform well over several domains and can be annotated (semi-)automatically.</Paragraph> <Paragraph position="1"> In the following, we describe a few earlier contributions to reference resolution with respect to the features used.</Paragraph> <Paragraph position="2"> Decision tree algorithms were used for reference resolution by Aone and Bennett (1995, C4.5), McCarthy and Lehnert (1995, C4.5) and Soon et al. (2001, C5.0). This approach requires defining a set of features describing pairs of anaphors and their antecedents, and collecting a training corpus annotated with them.</Paragraph> <Paragraph position="3"> Aone and Bennett (1995), working on reference resolution in Japanese newspaper articles, use 66 features. They do not mention all of these explicitly but emphasize the features POS-tag, grammatical role, semantic class and distance.</Paragraph> <Paragraph position="4"> The set of semantic classes they use appears to be rather elaborate and highly domain-dependent.</Paragraph> <Paragraph position="5"> Aone and Bennett (1995) report that their best classifier achieved an F-measure of about 77% after training on 250 documents.
They mention that it was important for the training data to contain transitive positives, i.e., all possible coreference relations within an anaphoric chain.</Paragraph> <Paragraph position="6"> McCarthy and Lehnert (1995) describe a reference resolution component which they evaluated on the MUC-5 English Joint Venture corpus. They distinguish between features which focus on individual noun phrases (e.g. Does noun phrase contain a name?) and features which focus on the anaphoric relation (e.g. Do both share a common NP?). Soon et al. (2001) criticized the features used by McCarthy and Lehnert (1995) as highly idiosyncratic and applicable only to one particular domain. McCarthy and Lehnert (1995) achieved results of about 86% F-measure (evaluated according to Vilain et al. (1995)) on the MUC-5 data set. However, only a defined subset of all possible reference resolution cases was considered relevant in the MUC-5 task description, e.g., only entity references. For this case, the domain-dependent features may have been particularly important, making it difficult to compare the results of this approach to others working on less restricted domains.</Paragraph> <Paragraph position="7"> Soon et al. (2001) use twelve features (see Table 1). Soon et al. (2001) show a part of their decision tree in which the weak string identity feature (i.e.</Paragraph> <Paragraph position="8"> identity after determiners have been removed) appears to be the most important one. They also report on the relative contribution of the features, where the three features weak string identity, alias (which maps named entities in order to resolve dates, person names, acronyms, etc.) and appositive seem to cover most of the cases (the other nine features contribute only 2.3% F-measure for MUC-6 texts and 1% F-measure for MUC-7 texts). Soon et al. (2001) include all noun phrases returned by their NP identifier and report an F-measure of 62.6% for MUC-6 data and 60.4% for MUC-7 data.
They only used pairs of anaphors and their closest antecedents as positive examples in training, but evaluated according to Vilain et al. (1995).</Paragraph> <Paragraph position="9"> Cardie and Wagstaff (1999) describe an unsupervised clustering approach to noun phrase coreference resolution in which features are assigned to single noun phrases only. They use the features shown in Table 2, all of which are obtained automatically without any manual tagging. The feature semantic class used by Cardie and Wagstaff (1999) seems to be a domain-dependent one which can only be used for the MUC domain and similar ones. Cardie and Wagstaff (1999) report a performance of 53.6% F-measure (evaluated according to Vilain et al. (1995)).</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Data </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Text Corpus </SectionTitle> <Paragraph position="0"> Our corpus consists of 242 short German texts (36924 tokens in total) about sights, historic events and persons in Heidelberg. The average length is 151 tokens. The texts were POS-tagged using TnT (Brants, 2000). A basic identification of markables (referring expressions, i.e. NPs) was obtained by using the NP-Chunker Chunkie (Skut and Brants, 1998). The POS-tagger was also used to assign attributes such as the NP form to markables. The automatic annotation was followed by a manual correction and annotation phase in which the markables were annotated with further tags (e.g. semantic class). In this phase manual coreference annotation was performed as well. In our annotation, coreference is represented in terms of a member attribute on markables. Markables with the same value in this attribute are considered coreferring expressions. The annotation was performed by two students. The reliability of the annotations was checked using the kappa statistic (Carletta, 1996).</Paragraph> <Paragraph position="1">
- distance in sentences between anaphor and antecedent
- antecedent is a pronoun?
- anaphor is a pronoun?
- weak string identity between anaphor and antecedent
- anaphor is a definite noun phrase?
- anaphor is a demonstrative pronoun?
- number agreement between anaphor and antecedent
- semantic class agreement between anaphor and antecedent
- gender agreement between anaphor and antecedent
- anaphor and antecedent are both proper names?
- an alias feature (used for proper names and acronyms)
- an appositive feature
Table 1: Features used by Soon et al.</Paragraph> <Paragraph position="2">
- position (NPs are numbered sequentially)
- pronoun type (nom., acc., possessive, ambiguous)
- article (indefinite, definite, none)
- appositive (yes, no)
- number (singular, plural)
- proper name (yes, no)
- semantic class (based on WordNet: time, city, animal, human, object; based on a separate algorithm: number, money, company)
- gender (masculine, feminine, either, neuter)
- animacy (anim, inanim)
Table 2: Features used by Cardie and Wagstaff.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Data Generation </SectionTitle> <Paragraph position="0"> The problem of coreference resolution can easily be formulated as a binary classification task: given a pair consisting of a potential anaphor and a potential antecedent, classify it as positive if the antecedent is in fact the closest antecedent, and as negative otherwise. In anaphoric chains only the immediately adjacent pairs are classified as positive.
From our corpus we generated data suitable as input to a machine learning algorithm, using a straightforward procedure which combined potential anaphors with their potential antecedents.</Paragraph> <Paragraph position="1"> We then applied the following filters to the resulting pairs: Discard an antecedent-anaphor pair - if the anaphor is an indefinite NP, - if one entity is embedded into the other, e.g. if the potential anaphor is the head of the potential antecedent NP (or vice versa),</Paragraph> <Paragraph position="3"> singular or plural in its agreement feature, - if both entities have different values in their agreement features.</Paragraph> <Paragraph position="4"> For some texts, these heuristics (which were applied to the entire corpus) discarded up to 50% of the potential anaphor-antecedent pairs, all of which would have been negative cases. We consider the discarded cases irrelevant because they do not contribute any knowledge to the classifier. After application of the filters, the remaining candidate pairs were labeled as follows: - Pairs of anaphors and their direct (i.e. closest) antecedents were labeled P. This means that each anaphoric expression produced exactly one positive instance.</Paragraph> <Paragraph position="5"> - Pairs of anaphors and those non-antecedents which occurred closer to the anaphor than the direct antecedent were labeled N. The number of negative instances that each expression produced thus depended on the number of non-antecedents occurring between the anaphor and the direct antecedent (or the beginning of the text if there was none).</Paragraph> <Paragraph position="6"> Pairs of anaphors and non-antecedents which occurred further away than the direct antecedent, as well as pairs of anaphors and non-direct (transitive) antecedents, were not considered in the data sets.
This produced 242 data sets with a total of 72093 instances of potential antecedent-anaphor pairs.</Paragraph> </Section> </Section> </Paper>
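<!-- The labeling scheme of Section 3.2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the names Markable and generate_instances are hypothetical, markables are assumed to be given in textual order, coreference is read off the member attribute described in Section 3.1, and the filters (indefinite anaphors, embedding, agreement) are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Markable:
    """A referring expression (hypothetical container for this sketch)."""
    index: int              # position of the markable in textual order
    member: Optional[str]   # coreference set id; None if it corefers with nothing


def generate_instances(markables):
    """Return (candidate_index, anaphor_index, label) triples.

    For every potential anaphor, walk backwards through the preceding
    markables. Each non-antecedent met on the way is labeled "N". The
    first coreferring markable is the direct antecedent: it is labeled
    "P" and the walk stops, so candidates further away are never paired.
    If there is no antecedent, the walk runs to the beginning of the text.
    """
    instances = []
    for j, anaphor in enumerate(markables):
        for candidate in reversed(markables[:j]):
            corefers = (
                anaphor.member is not None
                and candidate.member == anaphor.member
            )
            label = "P" if corefers else "N"
            instances.append((candidate.index, anaphor.index, label))
            if corefers:
                break  # transitive antecedents and anything beyond are skipped
    return instances
```

Under these assumptions, calling generate_instances on the markables of one text yields at most one P instance per anaphoric expression, in line with the description above. -->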