The Reliability of Anaphoric Annotation, Reconsidered: Taking Ambiguity into Account

1 Introduction

We tackle three limitations of the current state of the art in the annotation of anaphoric relations. The first problem is the lack of a truly systematic study of agreement on anaphoric annotation in the literature: none of the studies we are aware of (Hirschman, 1998; Poesio and Vieira, 1998; Byron, 2003; Poesio, 2004) is completely satisfactory, either because only a small number of coders was involved, or because agreement beyond chance could not be assessed for lack of an appropriate statistic, a situation recently corrected by Passonneau (2004). The second limitation, which is particularly serious when working on dialogue, is our still limited understanding of the degree of agreement on references to abstract objects, as in discourse deixis (Webber, 1991; Eckert and Strube, 2001).

The third shortcoming is a problem that affects all types of semantic annotation. In all annotation studies we are aware of,[1] the fact that an expression may not have a unique interpretation in the context of its occurrence is viewed as a problem with the annotation scheme, to be fixed by, e.g., developing suitably underspecified representations, as done particularly in work on wordsense annotation (Buitelaar, 1998; Palmer et al., 2005), but also on dialogue act tagging. Unfortunately, the underspecification solution only genuinely applies to cases of polysemy, not homonymy (Poesio, 1996), and anaphoric ambiguity is not a case of polysemy. Consider the dialogue excerpt in (1):[2] it is not clear to us (nor was it to our annotators, as we will see below) whether the demonstrative "that" in utterance unit 18.8 refers to the 'bad wheel' or 'the boxcar'; as a result, annotators' judgments may disagree, but this does not mean that the annotation scheme is faulty, only that what is being said is genuinely ambiguous.

(1)  18.1  S: ....
     18.6     it turns out that the boxcar at Elmira
     18.7     has a bad wheel
     18.8     and they're .. gonna start fixing that at midnight
     18.9     but it won't be ready until 8
     19.1  M: oh what a pain in the butt

This problem is encountered with all types of annotation; the view that all types of disagreement indicate a problem with the annotation scheme, i.e., that somehow the problem would disappear if only we could find the right annotation scheme, or concentrate on the 'right' types of linguistic judgments, is, in our opinion, misguided. A better approach is to find out when annotators disagree because of intrinsic problems with the text, or, even better, to develop methods to identify genuinely ambiguous expressions; the latter is the ultimate goal of this work.
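To make concrete why a chance-corrected statistic matters here, the following is a minimal Python sketch of Krippendorff's alpha with a set-valued Jaccard distance, loosely in the spirit of the distance-based agreement measures Passonneau (2004) applied to anaphora. The jaccard_distance function and the toy annotations of the demonstrative in (1) are illustrative assumptions of ours, not the scheme or data used in this paper, and the sketch assumes every item is marked by the same number of coders.

from itertools import combinations

def jaccard_distance(a, b):
    """Set distance: 0 for identical antecedent sets, 1 for disjoint ones."""
    a, b = frozenset(a), frozenset(b)
    return 1.0 - len(a & b) / len(a | b)

def krippendorff_alpha(items, distance):
    """alpha = 1 - D_o / D_e, assuming equally many coders per item.

    items: one list per annotated expression, holding one label per coder.
    """
    # Observed disagreement: mean distance over coder pairs within each item.
    within = [distance(x, y)
              for labels in items
              for x, y in combinations(labels, 2)]
    d_o = sum(within) / len(within)
    # Expected disagreement: mean distance over all label pairs, pooled
    # across items, i.e., what disagreement would look like by chance.
    pooled = [label for labels in items for label in labels]
    between = [distance(x, y) for x, y in combinations(pooled, 2)]
    d_e = sum(between) / len(between)
    return 1.0 - d_o / d_e

# Hypothetical judgments for "that" in utterance 18.8 of (1), plus an
# unambiguous control item; each set is one coder's chosen antecedent(s).
items = [
    [{"boxcar"}, {"bad wheel"}, {"boxcar", "bad wheel"}],  # ambiguous
    [{"boxcar"}, {"boxcar"}, {"boxcar"}],                  # unambiguous
]
print(round(krippendorff_alpha(items, jaccard_distance), 3))

On this toy data the genuinely ambiguous item depresses alpha even though no coder is wrong, which is precisely the situation that fixing the annotation scheme, e.g., by underspecification, cannot repair.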
The paper is organized as follows. We first briefly review previous work on anaphoric annotation and on reliability indices. We then discuss our experiment with anaphoric annotation and its results. Finally, we discuss the implications of this work.

[1] The one exception is Rosenberg and Binkowski (2004).
[2] This example, like most of those in the rest of the paper, is taken from the first edition of the TRAINS corpus collected at the University of Rochester (Gross et al., 1993). The dialogues are available at ftp://ftp.cs.rochester.edu/pub/papers/ai/92.tn1.trains_91_dialogues.txt.