<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1040">
  <Title>The Influence of Minimum Edit Distance on Reference Resolution</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> For the automatic understanding of written or spoken natural language it is crucial to be able to identify the entities referred to by referring expressions.</Paragraph>
    <Paragraph position="1"> The most common and thus most important types of referring expressions are pronouns and definite noun phrases (NPs). Supervised machine learning algorithms have been used for pronoun resolution (Ge et al., 1998) and for the resolution of definite NPs (Aone and Bennett, 1995; McCarthy and Lehnert, 1995; Soon et al., 2001). An unsupervised approach to the resolution of definite NPs was applied by Cardie and Wagstaff (1999). However, although machine learning algorithms may learn to make the best use of a given set of features for a given problem, identifying a set of features that describes the data sufficiently is a linguistic question and a non-trivial task.</Paragraph>
    <Paragraph position="2"> We report on experiments in the resolution of anaphoric expressions in general, including definite noun phrases, proper names, and personal, possessive and demonstrative pronouns. Based on the work mentioned above, we started with a feature set including NP-level and coreference-level features.</Paragraph>
    <Paragraph position="3"> Applied to the whole data set, these features led to only moderate results. Since the NP form of the anaphor (i.e., whether the anaphoric expression is realized as a pronoun, definite NP, or proper name) appeared to be the most important feature, we divided the data set into several subsets based on the NP form of the anaphor. This led to the insight that the moderate performance of our system was caused by the low performance for definite NPs. We adopted a new feature based on the minimum edit distance (Wagner and Fischer, 1974) between anaphor and antecedent, which led to a significant improvement on definite NPs and proper names. When applied to the whole data set, the feature yielded a smaller but still significant improvement.</Paragraph>
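The minimum edit distance between an anaphor and a candidate antecedent can be illustrated with a minimal sketch of the Wagner-Fischer dynamic program. This is not the authors' implementation: the function name min_edit_distance is hypothetical, and the sketch assumes character-level comparison with unit costs for insertion, deletion and substitution, none of which is specified in this section.

```python
def min_edit_distance(source, target):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn source into target.

    Assumption: unit costs and character-level comparison; the
    paper's actual cost scheme is not given in this section.
    """
    # previous[j] holds the distance between the empty prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # Turning the first i characters of source into the empty
        # string costs i deletions.
        current = [i]
        for j, t_char in enumerate(target, start=1):
            # Cost is 0 when the characters match, 1 otherwise.
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


# Hypothetical anaphor/antecedent pair, not taken from the corpus.
distance = min_edit_distance("the company", "the software company")
```

Under these assumptions, a definite NP that repeats most of its antecedent's wording receives a small distance relative to its length, which is the kind of surface overlap such a feature can expose to a classifier.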
    <Paragraph position="4"> In this paper, we first discuss features that have been found to be relevant for the task of reference resolution (Section 2). Then we describe our corpus, the corpus annotation, and the way we prepared the data for use with a binary machine learning classifier (Section 3). In Section 4 we first describe the feature set used initially and the results it produced. We then introduce the minimum edit distance feature and give the results it yielded on different data sets.</Paragraph>
    <Paragraph position="5"> Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, July 2002, pp. 312-319. Association for Computational Linguistics.</Paragraph>
  </Section>
</Paper>