<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1020">
<Title>Learning Noun Phrase Anaphoricity to Improve Coreference Resolution: Issues in Representation and Optimization</Title>
<Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 The Anaphoricity Determination System: Local vs. Global Optimization </SectionTitle>
<Paragraph position="0"> In this section, we will show how to build a model of anaphoricity determination. We will first present the standard, locally-optimized approach and then introduce our globally-optimized approach.</Paragraph>
<Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 The Locally-Optimized Approach </SectionTitle>
<Paragraph position="0"> In this approach, the anaphoricity model is simply a classifier that is trained and optimized independently of the coreference system (e.g., Evans (2001), Ng and Cardie (2002a)).</Paragraph>
<Paragraph position="1"> Building a classifier for anaphoricity determination. A learning algorithm is used to train a classifier that, given a description of an NP in a document, decides whether or not the NP is anaphoric. Each training instance represents a single NP and consists of a set of features that are potentially useful for distinguishing anaphoric and non-anaphoric NPs. The classification associated with a training instance, one of ANAPHORIC or NOT ANAPHORIC, is derived from coreference chains in the training documents. Specifically, a positive instance is created for each NP that is involved in a coreference chain but is not the head of the chain. A negative instance is created for each of the remaining NPs.</Paragraph>
<Paragraph position="2"> Applying the classifier. To determine the anaphoricity of an NP in a test document, an instance is created for it as during training and presented to the anaphoricity classifier, which returns a value of ANAPHORIC or NOT ANAPHORIC.</Paragraph>
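To make the instance-creation and classification steps above concrete, here is a minimal sketch of the locally-optimized approach. It is not the authors' implementation: the toy Document structure and the feature extractor are illustrative stand-ins, and a generic scikit-learn learner substitutes for whatever learning algorithm is actually used to train the classifier.

```python
# Minimal sketch of the locally-optimized anaphoricity classifier (Section 2.1).
# Not the authors' code: Document, extract_features, and the toy data are
# illustrative assumptions; LogisticRegression stands in for the real learner.
from dataclasses import dataclass
from typing import List

from sklearn.linear_model import LogisticRegression

ANAPHORIC, NOT_ANAPHORIC = 1, 0

@dataclass
class Document:
    noun_phrases: List[str]               # NPs in textual order (toy representation)
    coreference_chains: List[List[str]]   # each chain lists its mentions in order

def extract_features(np_text: str, doc: Document) -> List[float]:
    # Placeholder features only; a real system would use lexical, grammatical,
    # semantic, and positional properties of the NP.
    looks_anaphoric = np_text.lower().startswith(("the ", "this ", "that ", "it", "he", "she", "they"))
    return [float(len(np_text.split())), float(looks_anaphoric)]

def make_instances(doc: Document):
    """One instance per NP; the label is derived from the gold coreference
    chains: ANAPHORIC iff the NP occurs in some chain but is not the chain
    head (assumed here to be the chain's first mention)."""
    anaphoric = set()
    for chain in doc.coreference_chains:
        anaphoric.update(chain[1:])
    X = [extract_features(np, doc) for np in doc.noun_phrases]
    y = [ANAPHORIC if np in anaphoric else NOT_ANAPHORIC for np in doc.noun_phrases]
    return X, y

# Toy training document: "he" and "the president" are anaphoric mentions of
# "Barack Obama"; "a reporter" is a singleton and therefore non-anaphoric.
train_doc = Document(
    noun_phrases=["Barack Obama", "a reporter", "he", "the president"],
    coreference_chains=[["Barack Obama", "he", "the president"]],
)
X_train, y_train = make_instances(train_doc)
classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Applying the classifier: create an instance for a test NP exactly as during
# training and ask for an ANAPHORIC / NOT ANAPHORIC decision.
test_doc = Document(noun_phrases=["the senator"], coreference_chains=[])
print(classifier.predict([extract_features("the senator", test_doc)]))
```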
</Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 The Globally-Optimized Approach </SectionTitle>
<Paragraph position="0"> To achieve global optimization, we construct a parametric anaphoricity model with which we optimize the parameter for coreference accuracy on held-out development data. In other words, we tighten the connection between anaphoricity determination and coreference resolution by using the parameter to generate a set of anaphoricity models from which we select the one that yields the best coreference performance on held-out data. (To simplify the optimization process, we will only consider single-parameter models in this paper.)</Paragraph>
<Paragraph position="1"> Global optimization for a constraint-based representation. We view anaphoricity determination as a problem of determining how conservative an anaphoricity model should be in classifying an NP as (non-)anaphoric. Given a constraint-based representation of anaphoricity information for the coreference system, if the model is too liberal in classifying an NP as non-anaphoric, then many anaphoric NPs will be misclassified, ultimately leading to a deterioration of recall and of the overall performance of the coreference system. On the other hand, if the model is too conservative, then only a small fraction of the truly non-anaphoric NPs will be identified, and so the resulting anaphoricity information may not be effective in improving the coreference system. The challenge then is to determine a &quot;good&quot; degree of conservativeness. As a result, we can design a parametric anaphoricity model whose conservativeness can be adjusted via a conservativeness parameter. To achieve global optimization, we can simply tune this parameter to optimize for coreference performance on held-out development data.</Paragraph>
<Paragraph position="2"> Now, to implement this conservativeness-based anaphoricity determination model, we propose two methods, each of which is built upon a different definition of conservativeness.</Paragraph>
<Paragraph position="3"> The first method defines conservativeness in terms of a parameter used when training a classifier: the cost ratio (cr), which is defined as follows.
cr := (cost of misclassifying a positive instance) / (cost of misclassifying a negative instance)
Inspection of this definition shows that cr provides a means of adjusting the relative misclassification penalties placed on training instances of different classes. In particular, the larger cr is, the more conservative the classifier is in classifying an instance as negative (i.e., non-anaphoric). Given this observation, we can naturally define the conservativeness of an anaphoricity classifier as follows. We say that classifier A is more conservative than classifier B in determining an NP as non-anaphoric if A is trained with a higher cost ratio than B.</Paragraph>
<Paragraph position="4"> Based on this definition of conservativeness, we can construct an anaphoricity model parameterized by cr. Specifically, the parametric model maps a given value of cr to the anaphoricity classifier trained with this cost ratio. (For the purpose of training anaphoricity classifiers with different values of cr, we use RIPPER (Cohen, 1995), a propositional rule learning algorithm.) It should be easy to see that increasing cr makes the model more conservative in classifying an NP as non-anaphoric. With this parametric model, we can tune cr to optimize for coreference performance on held-out data.</Paragraph>
<Paragraph position="5"> We can also define conservativeness in terms of the number of NPs classified as non-anaphoric for a given set of NPs. Specifically, given two anaphoricity models A and B and a set of instances I to be classified, we say that A is more conservative than B in determining an NP as non-anaphoric if A classifies fewer instances in I as non-anaphoric than B. Again, this definition is consistent with our intuition regarding conservativeness.</Paragraph>
<Paragraph position="6"> We can now design a parametric anaphoricity model based on this definition. First, we train in a supervised fashion a probabilistic model of anaphoricity PA(c | i), where i is an instance representing an NP and c is one of the two possible anaphoricity values. (In our experiments, we use maximum entropy classification (MaxEnt) (Berger et al., 1996) to train this probability model.) Then, we can construct a parametric model making binary anaphoricity decisions from PA by introducing a threshold parameter t as follows. Given a specific t (0 ≤ t ≤ 1) and a new instance i, we define an anaphoricity model MtA in which MtA(i) = NOT ANAPHORIC if and only if PA(NOT ANAPHORIC | i) ≥ t, and MtA(i) = ANAPHORIC otherwise. It should be easy to see that increasing t yields progressively more conservative anaphoricity models. Again, t can be tuned using held-out development data.</Paragraph>
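As a purely illustrative rendering of this second, threshold-based parameterization, the sketch below assumes the decision rule stated above (an instance is labeled NOT ANAPHORIC exactly when its estimated probability of being non-anaphoric reaches t); the toy probabilities stand in for the output of a trained probabilistic classifier such as the MaxEnt model.

```python
# Illustrative sketch of the threshold-parameterized anaphoricity model MtA
# (assumed rule: classify an instance as NOT ANAPHORIC iff P_A(NOT ANAPHORIC | i) >= t).
ANAPHORIC, NOT_ANAPHORIC = 1, 0

def anaphoricity_decision(p_not_anaphoric: float, t: float) -> int:
    """Binary decision for one instance, given its estimated probability of
    being non-anaphoric and the threshold parameter t (0 <= t <= 1)."""
    return NOT_ANAPHORIC if p_not_anaphoric >= t else ANAPHORIC

def num_nonanaphoric(probs, t):
    """Conservativeness in the second sense: how many instances in a set the
    model classifies as non-anaphoric."""
    return sum(anaphoricity_decision(p, t) == NOT_ANAPHORIC for p in probs)

# Toy values of P_A(NOT ANAPHORIC | i) for five instances: raising t makes the
# model classify fewer of them as non-anaphoric, i.e. it becomes more conservative.
probs = [0.15, 0.40, 0.55, 0.70, 0.90]
for t in (0.3, 0.5, 0.7, 0.9):
    print(f"t = {t}: {num_nonanaphoric(probs, t)} NPs classified as non-anaphoric")
```

For the first, cost-ratio parameterization, the analogous step is to retrain the classifier once per candidate value of cr rather than to move a decision threshold; a rough analogue in many learners is a misclassification-cost or class-weighting option (for example, class_weight in scikit-learn).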
<Paragraph position="7"> Global optimization for a feature-based representation. We can similarly optimize our proposed conservativeness-based anaphoricity model for coreference performance when anaphoricity information is represented as a feature for the coreference system. Unlike in a constraint-based representation, however, we cannot expect the recall of the coreference system to increase with the conservativeness parameter. The reason is that we have no control over whether or how the anaphoricity feature is used by the coreference learner. In other words, the behavior of the coreference system is less predictable than with a constraint-based representation. Nevertheless, the conservativeness-based anaphoricity model can be used for global optimization with a feature-based representation just as it can with a constraint-based representation.</Paragraph>
<Paragraph position="8"> We conclude this section by pointing out that the locally-optimized approach to anaphoricity determination is indeed a special case of the global one. Unlike the global approach, in which the conservativeness parameter is tuned on labeled data, the local approach uses &quot;default&quot; parameter values. For instance, when RIPPER is used to train an anaphoricity classifier in the local approach, cr is set to the default value of one. Similarly, when probabilistic anaphoricity decisions generated via a MaxEnt model are converted to binary anaphoricity decisions for subsequent use by a coreference system, t is set to the default value of 0.5.</Paragraph>
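To make the relationship between the two approaches concrete, the following is a minimal sketch of the global tuning loop for the threshold parameter t; the same scheme applies to cr, except that the classifier is retrained for each candidate value. The helpers resolve_coreference and coreference_f1 are hypothetical stand-ins for running the coreference system with the given anaphoricity decisions and scoring its output on the held-out development data; fixing t at 0.5 (or cr at 1) instead of searching recovers the locally-optimized approach.

```python
# Sketch of global optimization on held-out development data (hypothetical
# helpers, not the authors' code): choose the conservativeness parameter that
# maximizes coreference performance rather than anaphoricity accuracy.
ANAPHORIC, NOT_ANAPHORIC = 1, 0

def tune_threshold(dev_docs, p_not_anaphoric,
                   candidate_ts=(0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """p_not_anaphoric(doc) is assumed to return P_A(NOT ANAPHORIC | i) for
    every NP instance i in doc, in document order."""
    best_t, best_score = 0.5, float("-inf")   # 0.5 is the "default" (local) value
    for t in candidate_ts:
        # Convert the probabilistic estimates into binary decisions at threshold t.
        decisions = [[NOT_ANAPHORIC if p >= t else ANAPHORIC
                      for p in p_not_anaphoric(doc)]
                     for doc in dev_docs]
        # Hypothetical calls: run the coreference system with these decisions
        # (used as constraints or as a feature) and score the resulting chains.
        score = coreference_f1(resolve_coreference(dev_docs, decisions))
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

</Section> </Section> </Paper>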