<?xml version="1.0" standalone="yes"?> <Paper uid="W97-1308"> <Title>Supporting anaphor resolution in dialogues with a corpus-based probabilistic model</Title> <Section position="3" start_page="0" end_page="54" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> The emergence of corpus-based approaches brought to the fore the importance of extensive records of real-life language. Corpus annotation and statistical measures are standard research tools in these approaches. This paper presents a study that relies on corpus annotation to describe anaphoric phenomena in two languages, English and Portuguese. The investigation concentrates on dialogues. The English data come from the London-Lund Corpus, whereas the Portuguese data come from a corpus collected specifically for this research.</Paragraph> <Paragraph position="1"> Fligelstone's (Fli92) study of anaphora bears important similarities to the present one, as it also uses annotation to describe features of anaphoric phenomena. The annotation created for the present study draws on some of the ideas behind Fligelstone's scheme, but it is quite distinct in both form and content. Biber's (Bib92) systematic use of statistical techniques to explore corpus data, together with his broad concept of referring expressions, also shaped choices made for this project.</Paragraph> <Paragraph position="2"> In line with Biber's non-restrictive approach, anaphora is defined, for the purposes of this research, as the relationship between a term, called the anaphor, and an explicit or inferable element in the discourse, called the antecedent, to which the anaphor must be linked for its semantic interpretation to succeed. 
All types of anaphor are annotated, including pronouns, noun phrases, verb phrases and all elliptical phenomena.</Paragraph> <Paragraph position="3"> A number of studies attempt to incorporate the notion of topic, focus or centre into the analysis of anaphora (see, among others, (Sial86) and (Fox87)). This has led to discussion of ways to track the topic, under any of its various names, through discourse (among many others, (Rei85), (GS86) and (GJW95)), and of ways to relate topicality to anaphor resolution. The research described here is no exception. To assess the importance of topicality for anaphor resolution, topic structure was made an integral part of the investigation and was therefore encoded in the annotation. The notion of topic is, however, notoriously difficult to deal with (see (BY83) for an extensive discussion). A routine dialogue contains a number of discourse entities, typically expressed by noun phrases, which may, to mention only a few possibilities, retain a salient status throughout the whole dialogue; pop in and fade out any number of times; pop in once and fade out for good; or pop in, subdivide into subordinate topics, fade out and later return. Several other combinations and interactions are also possible. Moreover, real-life conversations often cannot easily be summed up in terms of a title-like global topic.</Paragraph> <Paragraph position="4"> The study therefore sought a working definition of the different levels of saliency that would make the notion of topicality useful for anaphor resolution. A set of categories was created to classify discourse entities into topical roles covering the various levels of saliency. 
Global and local topics for a given dialogue had to be established a priori, independently of the analysis of anaphoric relations, so as to avoid circularity, as pointed out in (Fox87); subsequent adjustments, however, may take into account discourse information related to those anaphoric relations. Procedures for identifying each topical role were spelled out as precisely as possible, bearing in mind that a measure of flexibility was necessary. The resulting picture of topicality does not claim to be more than part of the truth. However, assigning topical roles to discourse entities is claimed to be an effective way of supporting anaphor resolution, because it keeps track of salient discourse entities.</Paragraph> </Section> </Paper>