<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1017"> <Title>Improving Pronoun Resolution by Incorporating Coreferential Information of Candidates</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The Baseline Learning Framework </SectionTitle> <Paragraph position="0"> Our baseline system adopts the common learning-based framework employed in the system by Soon et al. (2001).</Paragraph> <Paragraph position="1"> In the learning framework, each training or testing instance takes the form of {ana, candi}, where ana is the possible anaphor and candi is its antecedent candidate. An instance is associated with a feature vector to describe their relationships. As listed in Table 2, we only consider those knowledge-poor and domain-independent features which, although superficial, have been proved efficient for pronoun resolution in many previous systems.</Paragraph> <Paragraph position="2"> During training, for each anaphor in a given text, a positive instance is created by pairing the anaphor and its closest antecedent. Also, a set of negative instances is formed by pairing the anaphor and each of the intervening candidates.</Paragraph> <Paragraph position="3"> Based on the training instances, a binary classifier is generated using the C5.0 learning algorithm (Quinlan, 1993). During resolution, each possible anaphor, ana, is paired in turn with each preceding antecedent candidate, candi, from right to left to form a testing instance. This instance is presented to the classifier, which then returns a positive or negative result indicating whether or not they are co-referent.
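The instance-creation and right-to-left resolution procedure described here can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation (which trains a C5.0 tree on the Table 2 features); the function names, data representation and stand-in classifier are all assumptions.

```python
# Sketch of the Soon et al. (2001)-style baseline described above.
# Mentions are plain strings here; a real system would use NP objects.

def make_training_instances(anaphor, antecedents, candidates):
    """antecedents: coreferring NPs of the anaphor, nearest first.
    candidates: all preceding candidate NPs, nearest first."""
    instances = []
    if antecedents:
        closest = antecedents[0]
        # one positive instance: the anaphor paired with its closest antecedent
        instances.append((anaphor, closest, 1))
        # negative instances: the anaphor paired with every candidate
        # intervening between it and the closest antecedent
        for cand in candidates:
            if cand == closest:
                break
            instances.append((anaphor, cand, 0))
    return instances

def resolve(anaphor, candidates, classify):
    """Pair the anaphor with each candidate from right to left (nearest
    first) and stop at the first pair the classifier labels positive."""
    for cand in candidates:
        if classify(anaphor, cand) == 1:
            return cand
    return None
```

A pairwise classifier trained on such instances can then be plugged in as the `classify` callback.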
The process terminates once an instance {ana, candi} is labelled as positive, and ana will be resolved to candi in that case.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The Learning Model Incorporating Coreferential Information </SectionTitle> <Paragraph position="0"> The learning procedure in our model is similar to the above baseline method, except that for each candidate, we take into consideration its closest antecedent, if possible.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Instance Structure </SectionTitle> <Paragraph position="0"> During both training and testing, we adopt the same instance selection strategy as in the baseline model. The only difference, however, is the structure of the training or testing instances.</Paragraph> <Paragraph position="1"> Specifically, each instance in our model is composed of three elements, as described below. (Candidates are filtered by gender, number and animacy agreements in advance.) Table 2 lists the features of the baseline system. Features describing the candidate (candi): 1. candi DefNp: 1 if candi is a definite NP; else 0. 2. candi DemoNP: 1 if candi is an indefinite NP; else 0. 3. candi Pron: 1 if candi is a pronoun; else 0. 4. candi ProperNP: 1 if candi is a proper name; else 0. 5. candi NE Type: 1 if candi is an &quot;organization&quot; named-entity; 2 if &quot;person&quot;; 3 if other types; 0 if not a NE. 6. candi Human: the likelihood (0-100) that candi is a human entity (obtained from WordNet). 7. candi FirstNPInSent: 1 if candi is the first NP in the sentence where it occurs. 8. candi Nearest: 1 if candi is the candidate nearest to the anaphor; else 0. 9. candi SubjNP: 1 if candi is the subject of the sentence where it occurs; else 0. Features describing the anaphor (ana): 10. ana Reflexive: 1 if ana is a reflexive pronoun; else 0. 11.
ana Type: 1 if ana is a third-person pronoun (he, she, ...); 2 if a single neuter pronoun (it, ...); 3 if a plural neuter pronoun (they, ...); 4 if other types. Features describing the relationships between candi and ana: 12. SentDist: distance between candi and ana in sentences. 13. ParaDist: distance between candi and ana in paragraphs. 14. CollPattern: 1 if candi has an identical collocation pattern with ana; else 0. Each instance takes the form {ana, candi, ante-of-candi}, where ana and candi, similar to the definition in the baseline model, are the anaphor and one of its candidates, respectively. The newly added element in the instance definition, ante-of-candi, is the possible closest antecedent of candi in its coreferential chain. The ante-of-candi is set to NIL in the case when candi has no antecedent.</Paragraph> <Paragraph position="2"> Consider the example in Table 1 again. For the pronoun &quot;its6&quot;, three training instances will be generated, namely, {its6, The campaign5, NIL}, {its6, its advertising4, NIL}, and {its6, its3, Gitano1}.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Backward Features </SectionTitle> <Paragraph position="0"> In addition to the features adopted in the baseline system, we introduce a set of backward features to describe the element ante-of-candi. The ten features (15-24) are listed in Table 3 with their respective possible values.</Paragraph> <Paragraph position="1"> Like features 1-9, features 15-22 describe the lexical, grammatical and semantic properties of ante-of-candi. The inclusion of the two features Apposition (23) and candi NoAntecedent (24) is inspired by the work of Strube (1998). The feature Apposition marks whether or not candi and ante-of-candi occur in the same appositive structure. The underlying purpose of this feature is to capture the pattern that proper names are accompanied by an appositive.
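A minimal sketch of the three-element instance structure of Section 4.1, including the NIL convention; the chain representation (lists of mentions in textual order) and all names are assumptions for illustration, not the paper's code.

```python
# Build {ana, candi, ante-of-candi} triples from annotated chains.

NIL = None

def closest_antecedent(candi, chains):
    """Return the closest antecedent of candi in its coreferential chain
    (a list ordered by textual position), or NIL if candi has none."""
    for chain in chains:
        if candi in chain:
            i = chain.index(candi)
            # the mention immediately before candi in its chain, if any
            return chain[i - 1] if i > 0 else NIL
    return NIL

def make_instance(ana, candi, chains):
    # the third element augments the baseline {ana, candi} pair
    return (ana, candi, closest_antecedent(candi, chains))
```

On the Table 1 example, pairing "its6" with "its3" yields the triple ending in "Gitano1", while candidates outside any chain get NIL, which is exactly the case where candi NoAntecedent would be set to 1.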
The entity with such a pattern may often be related to the hearers' knowledge and has low preference. The feature candi NoAntecedent marks whether or not a candidate has a valid antecedent in the preceding text. As stipulated in Strube's work, co-referring expressions belong to hearer-old entities and therefore have higher preference than other candidates. When the feature is assigned value 1, all the other backward features (15-23) are set to 0.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Results and Discussions </SectionTitle> <Paragraph position="0"> In our study we used the standard MUC-6 and MUC-7 coreference corpora. In each data set, 30 &quot;dry-run&quot; documents were annotated for training, as well as 20-30 documents for testing. The raw documents were preprocessed by a pipeline of automatic NLP components (e.g. NP chunker, part-of-speech tagger, named-entity recognizer) to determine the boundaries of the NPs and to provide the necessary information for feature calculation.</Paragraph> <Paragraph position="1"> In an attempt to investigate the capability of our model, we evaluated the model in an optimal environment where the closest antecedent of each candidate is correctly identified. MUC-6 and MUC-7 can serve this purpose quite well; the annotated coreference information in the data sets enables us to obtain the correct closest antecedent for each candidate and accordingly generate the training and testing instances. Features describing the antecedent of the candidate (ante-of-candi): 15. ante-candi DefNp: 1 if ante-of-candi is a definite NP; else 0. 16. ante-candi IndefNp: 1 if ante-of-candi is an indefinite NP; else 0. 17. ante-candi Pron: 1 if ante-of-candi is a pronoun; else 0. 18. ante-candi Proper: 1 if ante-of-candi is a proper name; else 0. 19. ante-candi NE Type: 1 if ante-of-candi is an &quot;organization&quot; named-entity; 2 if &quot;person&quot;; 3 if other types; 0 if not a NE. 20. ante-candi Human: the likelihood (0-100) that ante-of-candi is a human entity. 21.
ante-candi FirstNPInSent: 1 if ante-of-candi is the first NP in the sentence where it occurs. 22. ante-candi SubjNP: 1 if ante-of-candi is the subject of the sentence where it occurs. Features describing the relationships between the candidate (candi) and ante-of-candi: 23. Apposition: 1 if ante-of-candi and candi are in an appositive structure. Features describing the candidate (candi): 24. candi NoAntecedent: 1 if candi has no antecedent available; else 0. (Table 3: Backward features used to capture the coreferential information of a candidate.) In the next section we will further discuss how to apply our model to real resolution.</Paragraph> <Paragraph position="2"> Table 4 shows the performance of different systems for resolving the pronominal anaphors in MUC-6 and MUC-7. Default learning parameters for C5.0 were used throughout the experiments. In this table we evaluated the performance based on two kinds of measurements: + &quot;Recall-and-Precision&quot;: Recall = (#positive instances classified correctly) / (#positive instances); Precision = (#positive instances classified correctly) / (#instances classified as positive). The above metrics evaluate the capability of the learned classifier in identifying positive instances. F-measure is the harmonic mean of the two measurements.</Paragraph> <Paragraph position="3"> + &quot;Success&quot;: Success = (#anaphors resolved correctly) / (#total anaphors). This metric directly reflects the pronoun resolution capability.</Paragraph> <Paragraph position="4"> The first and second lines of Table 4 compare the performance of the baseline system (Baseline)</Paragraph> <Paragraph position="6"> and our system (Optimal), where DTpron and DTpron→opt are the classifiers learned in the two systems, respectively. The results indicate that our system outperforms the baseline system significantly. Compared with Baseline, Optimal achieves gains in both recall (6.4% for MUC-6 and 4.1% for MUC-7) and precision (1.3% for MUC-6 and 9.0% for MUC-7). For Success, we also observe an apparent improvement of 4.7% (MUC-6) and 3.5% (MUC-7).</Paragraph> <Paragraph position="7"> Figure 1 shows the portion of the pruned decision tree learned for the MUC-6 data set. It visualizes the importance of the backward features for pronoun resolution on the data set. (Here we only list the backward feature assigner for pronominal candidates. In RealResolve-1 to RealResolve-4, the backward features for non-pronominal candidates are all found by DTnon→pron.) From the tree we can find that: 1.) Feature ante-candi SubjNP is of the most importance, as the root feature of the tree.</Paragraph> <Paragraph position="8"> The decision tree would first examine the syntactic role of a candidate's antecedent, followed by that of the candidate. This nicely proves our assumption that the properties of the antecedents of the candidates provide very important information for the candidate evaluation.</Paragraph> <Paragraph position="9"> 2.) Both features ante-candi SubjNP and candi SubjNP rank top in the decision tree.</Paragraph> <Paragraph position="10"> That is, for reference determination, the subject roles of the candidate's referent within a discourse segment will be checked in the first place. This finding supports well the suggestion in centering theory that grammatical relations should be used as the key criteria to rank forward-looking centers in the process of focus tracking (Brennan et al., 1987; Grosz et al., 1995).</Paragraph> <Paragraph position="11"> 3.)
candi Pron and candi NoAntecedent are to be examined in the cases when the subject-role checking fails, which confirms the hypothesis in the S-List model by Strube (1998) that co-referring candidates would have higher preference than other candidates in pronoun resolution.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Applying the Model in Real </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Resolution </SectionTitle> <Paragraph position="0"> In Section 4 we explored the effectiveness of the backward features for pronoun resolution. In those experiments our model was tested in an ideal environment where the closest antecedent of a candidate can be identified correctly when generating the feature vector. However, during real resolution such coreferential information is not available, and thus a separate module has to be employed to provide it. (Figure 2: algorithm PRON-RESOLVE. Input: DTnon→pron, a classifier for resolving non-pronouns, and DTpron, a classifier for resolving pronouns. The algorithm collects the valid markables M1..n in the given document and finds the antecedent of each markable with the appropriate classifier.)</Paragraph> <Paragraph position="2"> The algorithm takes as input two classifiers, one for non-pronoun resolution and the other for pronoun resolution. Given a testing document, the antecedent of each NP is identified using one of these two classifiers, depending on the type of NP. Although a separate non-pronoun resolution module is required for the pronoun resolution task, this is usually not a big problem, as these two modules are often integrated in coreference resolution systems.
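The dispatch performed by PRON-RESOLVE might be sketched as follows, under the assumption that each classifier is a pairwise function returning 1 for coreferent pairs; the helper names and data representation are hypothetical.

```python
# Sketch of the PRON-RESOLVE idea: choose a classifier by NP type and
# test candidates from right to left, as in the baseline framework.

def pron_resolve(markables, dt_non_pron, dt_pron, is_pronoun):
    """markables: valid NPs in document order. Each dt_* takes an
    anaphor and a candidate and returns 1 if they are coreferent."""
    links = {}
    for j, ana in enumerate(markables):
        classify = dt_pron if is_pronoun(ana) else dt_non_pron
        # candidates are the preceding markables, tested nearest first
        for cand in reversed(markables[:j]):
            if classify(ana, cand) == 1:
                links[ana] = cand
                break
    return links
```

The returned antecedent links are exactly what the next section uses to fill in backward feature values.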
We simply use the results of one module to improve the performance of the other.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.1 New Training and Testing Procedures </SectionTitle> <Paragraph position="0"> For a pronominal candidate, its antecedent can be obtained by simply using DTpron→opt.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Training Procedure: </SectionTitle> <Paragraph position="0"> T1. Train a non-pronoun resolution classifier DTnon→pron and a pronoun resolution classifier DTpron, using the baseline learning framework (without backward features).</Paragraph> <Paragraph position="1"> T2. Apply DTnon→pron and DTpron to identify the antecedent of each non-pronominal and pronominal markable, respectively, in a given document.</Paragraph> <Paragraph position="2"> T3. Go through the document again. Generate instances with backward features assigned using the antecedent information obtained in T2.</Paragraph> <Paragraph position="3"> For a non-pronominal candidate, we built a non-pronoun resolution module to identify its antecedent. The module is a duplicate of the NP coreference resolution system by Soon et al. (2001), which uses a similar learning framework to that described in Section 3. In this way, we could do pronoun resolution just by running PRON-RESOLVE(DTnon→pron, DTpron→opt), where DTnon→pron is the classifier of the non-pronoun resolution module.</Paragraph> <Paragraph position="4"> One problem, however, is that DTpron→opt is trained on instances whose backward features are correctly assigned. During real resolution, the antecedent of a candidate is found by DTnon→pron or DTpron→opt, and the backward feature values are not always correct. Indeed, for most noun phrase resolution systems, the recall is not very high.
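The T1-T3 procedure can be outlined as below; every callback is left abstract and hypothetical, since the paper's concrete feature extraction and C5.0 training are not reproduced here.

```python
# Sketch of the modified training procedure (T1-T3): backward features
# are assigned from the output of the baseline classifiers rather than
# from the gold coreference chains, so training matches testing.

def train_with_noisy_backward_features(docs, train_baseline, apply_classifiers, build_instances):
    # T1: train DTnon-pron and DTpron without backward features
    dt_non_pron, dt_pron = train_baseline(docs)
    instances = []
    for doc in docs:
        # T2: let the baseline classifiers pick an antecedent for each
        # markable, yielding possibly noisy antecedent links
        links = apply_classifiers(doc, dt_non_pron, dt_pron)
        # T3: second pass over the document, filling in backward
        # features from those system-produced links
        instances.extend(build_instances(doc, links))
    return instances
```

The point of the design is consistency: the classifier is trained on the same (imperfect) backward feature distribution it will see at test time.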
The antecedent sometimes cannot be found, or is not the closest one in the preceding coreferential chain. Consequently, the classifier trained on the &quot;perfect&quot; feature vectors would probably fail to output the anticipated results on the noisy data during real resolution.</Paragraph> <Paragraph position="5"> Thus we modify the training and testing procedures of the system. For both training and testing instances, we assign the backward feature values based on the results from the separate NP resolution modules. The detailed procedures are described in Table 5.</Paragraph> <Paragraph position="6"> (Details of the features can be found in Soon et al. (2001).) (Figure 3: algorithm REFINE-CLASSIFIER.)</Paragraph> <Paragraph position="8"> (For i = 1, 2, ...: use DTipron to update the antecedents of pronominal candidates and the corresponding backward features; train DTi+1pron based on the updated training instances; terminate if DTi+1pron is not better than DTipron.) The idea behind our approach is to train and test the pronoun resolution classifier on instances with feature values set in a consistent way. Here the purpose of DTpron and DTnon→pron is to provide backward feature values for training and testing instances. From this point of view, the two modules could be thought of as a preprocessing component of our pronoun resolution system.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.2 Classifier Refining </SectionTitle> <Paragraph position="0"> If the classifier DT′pron outperforms DTpron as expected, we can employ DT′pron in place of DTpron to generate backward features for pronominal candidates, and then train a classifier DT″pron based on the updated training instances. Since DT′pron produces more correct feature values than DTpron, we could expect that DT″pron will not be worse, if not better, than DT′pron. Such a process could be repeated to refine the pronoun resolution classifier.
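The refining loop just described could be sketched as follows; the retraining and scoring callbacks are assumptions standing in for the T1-T3 regeneration of backward features and a held-out evaluation, and the round cap is an added safeguard not in the paper.

```python
# Sketch of the REFINE-CLASSIFIER loop of Section 5.2 / Figure 3:
# retrain on backward features produced by the previous round's
# classifier, and stop once the new classifier is no better.

def refine_classifier(dt_pron, retrain, score, max_rounds=10):
    best = dt_pron
    for _ in range(max_rounds):
        # regenerate backward features with the current classifier and
        # train a new classifier on the updated training instances
        new = retrain(best)
        if score(new) > score(best):
            best = new
        else:
            break  # no further improvement: keep the current classifier
    return best
```

With a score function that plateaus, the loop terminates as soon as a round brings no gain, mirroring the fast convergence reported in Section 5.3.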
The algorithm is described in Figure 3.</Paragraph> <Paragraph position="1"> In algorithm REFINE-CLASSIFIER, the iteration terminates when the newly trained classifier DTi+1pron provides no further improvement over DTipron. In this case, we can replace DTi+1pron by DTipron during the (i+1)th testing procedure. That means, by simply running PRON-RESOLVE(DTnon→pron, DTipron), we can use DTipron for both the backward feature computation and instance classification tasks, rather than applying DTpron and DT′pron successively.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 5.3 Results and Discussions </SectionTitle> <Paragraph position="0"> In the experiments we evaluated the performance of our model in real pronoun resolution.</Paragraph> <Paragraph position="1"> The performance of our model depends on the performance of the non-pronoun resolution classifier, DTnon→pron. Hence we first examined the coreference resolution capability of DTnon→pron based on the standard scoring scheme by Vilain et al. (1995). For MUC-6, the module obtains 62.2% recall and 78.8% precision, while for MUC-7, it obtains 50.1% recall and 75.4% precision. The poor recall and comparatively high precision reflect the capability of state-of-the-art learning-based NP resolution systems. The third block of Table 4 summarizes the performance of the classifier DTpron→opt in real resolution. In the systems RealResolve-1 and RealResolve-2, the antecedents of pronominal candidates are found by DTpron→opt and DTpron respectively, while in both systems the antecedents of non-pronominal candidates are found by DTnon→pron. As shown in the table, compared with Optimal, where the backward features of testing instances are optimally assigned, the recall rates of the two systems drop sharply: by 7.8% for MUC-6 and by about 14% for MUC-7. The recall scores are even lower than those of Baseline.
As a result, in comparison with Optimal, we see a degradation of the F-measure and the success rate, which confirms our hypothesis that a classifier learned on perfect training instances would probably not perform well on noisy testing instances.</Paragraph> <Paragraph position="2"> The system RealResolve-3, listed in the fifth line of the table, uses the classifier trained and tested on instances whose backward features are assigned according to the results from DTnon→pron and DTpron. From the table we can find that: (1) Compared with Baseline, the system produces gains in recall (2.1% for MUC-6 and 2.8% for MUC-7) with no significant loss in precision. Overall, we observe an increase in F-measure for both data sets. If measured by Success, the improvement is more apparent: 4.7% (MUC-6) and 1.8% (MUC-7). (2) Compared with RealResolve-1 and RealResolve-2, the performance decrease of RealResolve-3 against Optimal is not so large. Especially for MUC-6, the system obtains a success rate as high as Optimal.</Paragraph> <Paragraph position="3"> The above results show that our model can be successfully applied in the real pronoun resolution task, even given the low recall of the current non-pronoun resolution module. This can be attributed to the fact that for a candidate, its adjacent antecedents, even if not the closest one, could give clues reflecting its salience in the local discourse. That is, the model prefers a high precision to a high recall, which copes well with the capability of the existing non-pronoun resolution module.</Paragraph> <Paragraph position="4"> In our experiments we also tested the classifier refining algorithm described in Figure 3. We found that for both the MUC-6 and MUC-7 data sets, the algorithm terminated in the second round. The comparison of DT2pron and DT1pron (i.e. DT′pron) showed that these two trees were exactly the same.
The algorithm converges quickly, probably because in the data sets most of the antecedent candidates are non-pronouns (89.1% for MUC-6 and 83.7% for MUC-7). Consequently, the proportion of training instances whose backward features changed may not be substantial enough to affect the classifier generation. Although the algorithm provided no further refinement for DT′pron, we can use DT′pron, as suggested in Section 5.2, to calculate backward features and classify instances by running PRON-RESOLVE(DTnon→pron, DT′pron). The results of such a system, RealResolve-4, are listed in the last line of Table 4. For both MUC-6 and MUC-7, RealResolve-4 obtains exactly the same performance as RealResolve-3.</Paragraph> </Section> </Section> </Paper>