<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1075">
  <Title>A High-Performance Coreference Resolution System using a Constraint-based Multi-Agent Strategy</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Coreference accounts for cohesion in texts.</Paragraph>
    <Paragraph position="1"> Specifically, coreference denotes an identity of reference and holds between two expressions, which can be named entities, definite noun phrases, pronouns and so on. Coreference resolution is the process of determining whether two referring expressions refer to the same entity in the world. The ability to link referring expressions both within and across sentences is critical to discourse and language understanding in general. For example, coreference resolution is a key task in natural language interfaces, machine translation, text summarization, information extraction and question answering. In particular, information extraction systems like those built in the DARPA Message Understanding Conferences (MUC) have revealed that coreference resolution is such a crucial component of an information extraction system that a separate coreference task was defined and evaluated in MUC-6 (1995) and MUC-7 (1998).</Paragraph>
    <Paragraph position="2"> There is a long tradition of work on coreference resolution within computational linguistics. Much of the earlier work on coreference resolution relied heavily on domain and linguistic knowledge (Carter 1987; Rich and LuperFoy 1988; Carbonell and Brown 1988).</Paragraph>
    <Paragraph position="3"> However, the pressing need for robust and inexpensive solutions encouraged a drive toward knowledge-poor strategies (Dagan and Itai 1990; Lappin and Leass 1994; Mitkov 1998; Soon, Ng and Lim 2001; Ng and Cardie 2002). This drive was further motivated by the emergence of cheaper and more reliable corpus-based NLP tools, such as part-of-speech taggers and shallow parsers, alongside the increasing availability of corpora and other resources (e.g. ontologies).</Paragraph>
    <Paragraph position="4"> Approaches to coreference resolution usually rely on a set of factors which include gender and number agreement, c-command constraints, semantic consistency, syntactic parallelism, semantic parallelism, salience, proximity, etc.</Paragraph>
    <Paragraph position="5"> These factors can be either &quot;constraints&quot;, which discard invalid candidates from the set of possible antecedents (such as gender and number agreement, c-command constraints, semantic consistency), or &quot;preferences&quot;, which favor certain candidates over others (such as syntactic parallelism, semantic parallelism, salience, proximity). While a number of approaches use a similar set of factors, the computational strategies (the way antecedents are determined, i.e. the algorithm and formula for assigning antecedents) may differ, ranging from simple co-occurrence rules (Dagan and Itai 1990) to decision trees (Soon, Ng and Lim 2001; Ng and Cardie 2002) to pattern induced rules (Ng and Cardie 2002) to centering algorithms (Grosz and Sidner 1986; Brennan, Friedman and Pollard 1987; Strube 1998; Tetreault 2001).</Paragraph>
    <Paragraph position="6"> This paper proposes a simple constraint-based multi-agent system for coreference resolution of general noun phrases in unrestricted English text. For a given anaphor, all the preceding referring expressions are taken as the antecedent candidates. A common constraint agent first filters out invalid antecedent candidates using various kinds of general knowledge. Then, according to the type of the anaphor, a special constraint agent filters out further invalid antecedent candidates using constraints derived from various kinds of special knowledge. Finally, a simple preference agent chooses an antecedent for the anaphor from the remaining antecedent candidates, based on the proximity principle. One interesting observation is that the most recent antecedent of an anaphor in the coreferential chain is sometimes only indirectly linked to the anaphor via some other antecedents in the chain. In this case, we find that the most recent antecedent always contains too little information to directly determine the coreference relationship with the anaphor. Therefore, for a given anaphor, the corresponding special constraint agent can always safely filter out these less informative antecedent candidates. In this way, rather than finding the most recent antecedent for an anaphor, our system tries to find the most direct and informative antecedent.</Paragraph>
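The three-stage flow described above can be illustrated with a minimal Python sketch. It is not the system described in this paper: the Mention record, its field names, the agreement test, and the single special rule (a name anaphor ignores pronoun candidates) are all hypothetical stand-ins chosen only to show how two filtering agents and a proximity-based preference agent might be chained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical mention record; every field name here is an assumption.
@dataclass(frozen=True)
class Mention:
    text: str
    position: int  # order of appearance in the document
    kind: str      # e.g. "name", "pronoun", "definite_np"
    gender: str    # "m", "f", "n" or "unknown"
    number: str    # "sg", "pl" or "unknown"


# A constraint takes (anaphor, candidate) and returns True to keep the candidate.
Constraint = Callable[[Mention, Mention], bool]


def agrees(a: str, b: str) -> bool:
    """Two feature values agree if they are equal or either is unknown."""
    return a == b or "unknown" in (a, b)


def common_constraint(anaphor: Mention, cand: Mention) -> bool:
    """Toy common constraint agent: gender and number agreement only."""
    return agrees(anaphor.gender, cand.gender) and agrees(anaphor.number, cand.number)


# Toy special constraint agents, keyed by anaphor type (assumed rule):
# a name anaphor skips pronoun candidates, which carry little information.
SPECIAL_CONSTRAINTS: Dict[str, Constraint] = {
    "name": lambda anaphor, cand: cand.kind != "pronoun",
}


def resolve(anaphor: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Return the closest surviving candidate, or None if all are filtered out."""
    # Antecedent candidates: every referring expression preceding the anaphor.
    candidates = [m for m in mentions if anaphor.position > m.position]
    # Stage 1: common constraint agent.
    candidates = [m for m in candidates if common_constraint(anaphor, m)]
    # Stage 2: special constraint agent chosen by the anaphor type, if any.
    special = SPECIAL_CONSTRAINTS.get(anaphor.kind)
    if special is not None:
        candidates = [m for m in candidates if special(anaphor, m)]
    # Stage 3: preference agent based on proximity.
    return max(candidates, key=lambda m: m.position, default=None)


mentions = [
    Mention("Microsoft Corp.", 0, "name", "n", "sg"),
    Mention("it", 1, "pronoun", "n", "sg"),
    Mention("Microsoft", 2, "name", "n", "sg"),
]
antecedent = resolve(mentions[2], mentions)
print(antecedent.text if antecedent else None)  # Microsoft Corp.
```

In this toy chain, the most recent candidate for "Microsoft" is the pronoun "it", but the assumed special rule removes it, so the proximity step selects the more informative "Microsoft Corp." instead.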
    <Paragraph position="7"> In this paper, we focus on the task of determining coreference relations as defined in MUC-6 (1995) and MUC-7 (1998). In order to evaluate the performance of our approach on coreference resolution, we utilize the annotated corpora and the scoring programs from MUC-6 and MUC-7. For MUC-6, 30 dry-run documents annotated with coreference information are used as the training data. There are also 30 annotated training documents from MUC-7. The total size of the 30 training documents is close to 12,400 words for MUC-6 and 19,000 words for MUC-7. For testing, we utilize the 30 standard test documents from MUC-6 and the 20 standard test documents from MUC-7. The layout of this paper is as follows. In Section 2, we briefly describe the preprocessing step, the determination of referring expressions. In Section 3, we differentiate coreference types and discuss how to restrict possible types of direct and informative antecedent candidates according to anaphor types. In Section 4, we describe the constraint-based multi-agent system. In Section 5, we evaluate the multi-agent algorithm. Finally, we present our conclusions.</Paragraph>
  </Section>
</Paper>