<?xml version="1.0" standalone="yes"?> <Paper uid="J00-4003"> <Title>An Empirically Based System for Processing Definite Descriptions</Title> <Section position="2" start_page="0" end_page="0" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> Most models of definite description processing proposed in the literature tend to emphasise the anaphoric role of these elements.1 (Heim [1982] is perhaps the best formalization of this type of theory.) This approach is challenged by the results of experiments we reported previously (Poesio and Vieira 1998), in which subjects were asked to classify the uses of definite descriptions in Wall Street Journal articles according to schemes derived from proposals by Hawkins (1978) and Prince (1981). The results of these experiments indicated that definite descriptions are not primarily anaphoric; about half of the time they are used to introduce a new entity in the discourse. In this paper, we present an implemented system for processing definite descriptions based on the results of that earlier study. In our system, techniques for recognizing discourse-new descriptions play a role as important as techniques for identifying the antecedents of anaphoric ones. (Footnote 1: We use the term definite description (Russell 1905) to indicate definite noun phrases with the definite article the, such as the car. We are not concerned with other types of definite noun phrases, such as pronouns, demonstratives, or possessive descriptions. Anaphoric expressions are those linguistic expressions used to signal, evoke, or refer to previously mentioned entities.)</Paragraph>
<Paragraph position="1"> A central characteristic of the work described here is that we intended from the start to develop a system whose performance could be evaluated using the texts annotated in the experiments mentioned above. Assessing the performance of an NLP system on a large number of examples is increasingly seen as a much more thorough evaluation than trying to come up with counterexamples; it is considered essential for language engineering applications. These advantages are thought by many to offset some of the obvious disadvantages of this way of developing NLP theories--in particular, the fact that, given the current state of language processing technology, many hypotheses of interest cannot be tested yet (see below). As a result, quantitative evaluation is now commonplace in areas of language engineering such as parsing, and quantitative evaluation techniques are being proposed for semantic interpretation as well, for example, at the Sixth and Seventh Message Understanding Conferences (MUC-6 and MUC-7) (Sundheim 1995; Chinchor 1997), which also included evaluations of systems on the so-called coreference task, a subtask of which is the resolution of definite descriptions.</Paragraph>
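As a concrete illustration of the quantitative scoring discussed above and of the two kinds of measures used in the evaluation described in the next paragraph (precision/recall against a gold standard, and agreement between two annotations), here is a minimal Python sketch. It is not the scorer used in this work; the function names, the toy three-way label set, and the pairing of descriptions with classes are purely hypothetical.

from collections import Counter

def precision_recall(system_decisions, gold_decisions):
    # Both arguments are sets of hashable decisions, e.g.
    # (description_id, assigned_class) or (description_id, antecedent_id) pairs.
    true_positives = len(system_decisions & gold_decisions)
    precision = true_positives / len(system_decisions) if system_decisions else 0.0
    recall = true_positives / len(gold_decisions) if gold_decisions else 0.0
    return precision, recall

def cohen_kappa(labels_a, labels_b):
    # One common chance-corrected agreement coefficient between two
    # parallel sequences of labels over the same items.
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Toy run: five definite descriptions classified as discourse-new (N),
# anaphoric (A), or bridging (B) by the system and by an annotator.
system = ["N", "A", "A", "B", "N"]
gold = ["N", "A", "B", "B", "N"]
print(precision_recall(set(enumerate(system)), set(enumerate(gold))))  # (0.8, 0.8)
print(round(cohen_kappa(system, gold), 2))                             # 0.71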
<Paragraph position="2"> The system we present was developed to be evaluated in a quantitative fashion as well, but because of the problems concerning agreement between annotators observed in our previous study, we evaluated the system both by measuring precision/recall against a &quot;gold standard,&quot; as done in MUC, and by measuring agreement between the annotations produced by the system and those proposed by the annotators.</Paragraph>
<Paragraph position="3"> The decision to develop a system that could be quantitatively evaluated on a large number of examples resulted in an important constraint: we could not make use of inference mechanisms such as those assumed by traditional computational theories of definite description resolution (e.g., Sidner 1979; Carter 1987; Alshawi 1990; Poesio 1993). Too many facts and axioms would have to be encoded by hand for theories of this type to be tested even on a medium-sized corpus. Our system, therefore, is based on a shallow-processing approach even more radical than that attempted by the first advocate of this approach, Carter (1987), or by the systems that participated in the MUC evaluations (Appelt et al. 1995; Gaizauskas et al. 1995; Humphreys et al. 1998), since we made no attempt to fine-tune the system to maximize performance on a particular domain. The system relies only on structural information, on the information provided by preexisting lexical sources such as WordNet (Fellbaum 1998), on minimal amounts of general hand-coded information, or on information that could be acquired automatically from a corpus. As a result, the system does not really have the resources to correctly resolve those definite descriptions whose interpretation does require complex reasoning (we grouped these in what we call the &quot;bridging&quot; class).</Paragraph>
<Paragraph position="4"> We nevertheless developed heuristic techniques for processing these types of definites as well, the idea being that these heuristics may provide a baseline against which the gains in performance due to the use of commonsense knowledge can be assessed more clearly. 2 The paper is organized as follows: We first summarize the results of our previous corpus study (Poesio and Vieira 1998) (Section 2) and then discuss the model of definite description processing that we adopted as a result of that work and the general architecture of the system (Section 3). In Section 4 we discuss the heuristics that we developed for resolving anaphoric definite descriptions, recognizing discourse-new descriptions, and processing bridging descriptions, and, in Section 5, how the performance of these heuristics was evaluated using the annotated corpus. Finally, we present the final configuration of the two versions of the system that we developed (Section 6), review other systems that perform similar tasks (Section 7), and present our conclusions and indicate future work (Section 8).</Paragraph> </Section> </Paper>
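To make the idea of shallow, resource-based processing described in the introduction above more concrete, the sketch below shows one way a head-noun compatibility check could be written on top of a preexisting lexical source, here WordNet accessed through the NLTK interface. This is an illustrative sketch under stated assumptions, not a reconstruction of the system described in this paper: the function name, the hypernym-depth cutoff, and the example nouns are hypothetical, and it assumes NLTK and its WordNet data have been installed.

# Minimal sketch: shallow lexical compatibility between the head noun of a
# definite description (anaphor) and that of a candidate antecedent.
# Assumes: pip install nltk  and  nltk.download('wordnet') have been run.
from nltk.corpus import wordnet as wn

def heads_compatible(anaphor_head, antecedent_head, max_depth=4):
    # Accept identical heads, shared synsets (rough synonymy), or an anaphor
    # whose head is a WordNet hypernym of the antecedent's head within a few
    # steps, the typical "a car ... the vehicle" pattern.
    if anaphor_head == antecedent_head:
        return True
    ana = set(wn.synsets(anaphor_head, pos=wn.NOUN))
    ant = set(wn.synsets(antecedent_head, pos=wn.NOUN))
    if ana & ant:
        return True
    frontier = ant
    for _ in range(max_depth):
        frontier = {hyper for s in frontier for hyper in s.hypernyms()}
        if frontier & ana:
            return True
    return False

print(heads_compatible("vehicle", "car"))  # expected True: hypernym of "car"
print(heads_compatible("treaty", "car"))   # expected False

A check of this kind covers only the lexically transparent cases; as the introduction notes, definite descriptions whose interpretation requires genuine commonsense reasoning (the bridging class) fall outside what such shallow heuristics can resolve reliably.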