<?xml version="1.0" standalone="yes"?> <Paper uid="C04-1033"> <Title>An NP-Cluster Based Approach to Coreference Resolution</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Baseline: the NP-NP based approach </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Framework description </SectionTitle> <Paragraph position="0"> We built a baseline coreference resolution system, which adopts the common NP-NP based learning framework as employed in (Soon et al., 2001).</Paragraph> <Paragraph position="1"> Each instance in this approach takes the form of {NPj, NPi}, which is associated with a feature vector consisting of 18 features (f1-f18) as described in Table 2. Most of the features come from Soon et al. (2001)'s system. Inspired by the work of (Strube et al., 2002) and (Yang et al., 2004), we use two features, StrSim1 (f17) and StrSim2 (f18), to measure the string-matching degree of NPj and NPi, given the following similarity function:</Paragraph> <Paragraph position="3"> StrSim1 and StrSim2 are computed as Str_Similarity(SNPj, SNPi) and Str_Similarity(SNPi, SNPj), respectively. Here SNP is the token list of NP, which is obtained by applying word stemming, stopword removal and acronym expansion to the original string as described in Yang et al. (2004)'s work.</Paragraph> <Paragraph position="4"> During training, for each anaphor NPj in a given text, a positive instance is generated by pairing NPj with its closest antecedent, NPi. A set of negative instances is also formed by pairing NPj with each NP occurring between NPj and NPi.</Paragraph> <Paragraph position="5"> When the training instances are ready, a classifier is learned by the C5.0 algorithm (Quinlan, 1993). During resolution, each encountered noun phrase, NPj, is paired in turn with each preceding noun phrase, NPi. 
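The training-instance generation described above, together with the leaf-node confidence smoothing from footnote 2, can be sketched as follows. This is a minimal illustration, not the authors' code: the mention representation (dicts with a gold `cluster_id`) and the function names are assumptions made for the sketch.

```python
# Sketch of the baseline's (Soon et al., 2001 style) instance generation:
# one positive instance per anaphor (its closest antecedent), and one
# negative instance for every NP in between.

def make_training_instances(mentions):
    """mentions: document-ordered list of dicts carrying a gold `cluster_id`.
    Returns (NPj, NPi, label) triples with label 1 (positive) or 0 (negative)."""
    instances = []
    for j, npj in enumerate(mentions):
        # find the closest preceding mention in the same gold cluster
        antecedent_idx = None
        for i in range(j - 1, -1, -1):
            if mentions[i]["cluster_id"] == npj["cluster_id"]:
                antecedent_idx = i
                break
        if antecedent_idx is None:
            continue  # NPj is not anaphoric: no instances generated
        instances.append((npj, mentions[antecedent_idx], 1))  # positive
        for i in range(antecedent_idx + 1, j):                 # negatives
            instances.append((npj, mentions[i], 0))
    return instances

def leaf_confidence(p, t):
    """Smoothed confidence value CF = (p + 1) / (t + 2), where p is the number
    of positive instances and t the total at the matching C5.0 leaf node."""
    return (p + 1) / (t + 2)
```

A three-mention document with mentions 1 and 3 coreferential would thus yield one positive instance (mention 3 with mention 1) and one negative instance (mention 3 with the intervening mention 2).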
For each pair, a testing instance is created as during training, and then presented to the decision tree, which returns a confidence value (CF)2 indicating the likelihood that NPi is coreferential to NPj. In our study, two antecedent selection strategies, Most Recent First (MRF) and Best First (BF), are tried to link NPj to a proper antecedent with CF above a threshold (0.5). MRF (Soon et al., 2001) selects the candidate closest to the anaphor, while BF (Aone and Bennett, 1995; Ng</Paragraph> <Paragraph position="6"> and Cardie, 2002b) selects the candidate with the maximal CF. (Footnote 2: The confidence value is obtained by using the smoothed ratio (p+1)/(t+2), where p is the number of positive instances and t is the total number of instances contained in the corresponding leaf node.)</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Limitation of the approach </SectionTitle> <Paragraph position="0"> Nevertheless, the problem of the NP-NP based approach is that an individual NP usually lacks adequate description information about its referred entity. Consequently, it is often difficult to determine whether or not two NPs refer to the same entity simply from the properties of the pair. See the text segment in Table 1, for example: [1 A mutant of [2 KBF1/p50] ], unable to bind to DNA but able to form homo- or [3 heterodimers] , has been constructed.</Paragraph> <Paragraph position="1"> [4 This protein] reduces or abolishes the DNA binding activity of wild-type proteins of [5 the same family ([6 KBF1/p50] , c- and v-rel)].</Paragraph> <Paragraph position="2"> [7 This mutant] also functions in vivo as a transacting dominant negative regulator:...</Paragraph> <Paragraph position="3"> In the above text, [1 A mutant of KBF1/p50], [4 This protein] and [7 This mutant] are annotated in the same coreferential cluster. According to the above framework, NP7 and its closest antecedent, NP4, will form a positive instance. 
Nevertheless, such an instance is not informative in that NP4 bears little information related to the entity and thus provides few clues to explain its coreference relationship with NP7.</Paragraph> <Paragraph position="4"> In fact, this relationship would be clear if [1 A mutant of KBF1/p50], the antecedent of NP4, were taken into consideration. NP1 gives a detailed description of the entity. By comparing the string of NP7 with this description, it is apparent that NP7 belongs to the cluster of NP1, and thus should be coreferential to NP4. This suggests that we use the coreferential cluster, instead of a single element of it, to resolve an NP correctly. In our study, we propose an approach which adopts an NP-Cluster based framework to do resolution. The details of the approach are given in the next section.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The NP-Cluster based approach </SectionTitle> <Paragraph position="0"> Similar to the baseline approach, our approach also recasts coreference resolution as a binary classification problem. The difference, however, is that our approach aims to learn a classifier which would select the most preferred cluster, instead of the most preferred antecedent, for an encountered NP in text. We will give the framework of the approach, including the instance representation, the training and the resolution procedures, in the following subsections.

Table 2: The feature set (features 1-18 are used in the baseline system using the NP-NP based approach).

Features describing the relationships between NPj and NPi:
1. DefNP 1: 1 if NPj is a definite NP; else 0
2. DemoNP 1: 1 if NPj starts with a demonstrative; else 0
3. IndefNP 1: 1 if NPj is an indefinite NP; else 0
4. Pron 1: 1 if NPj is a pronoun; else 0
5. ProperNP 1: 1 if NPj is a proper NP; else 0
6. DefNP 2: 1 if NPi is a definite NP; else 0
7. DemoNP 2: 1 if NPi starts with a demonstrative; else 0
8. IndefNP 2: 1 if NPi is an indefinite NP; else 0
9. Pron 2: 1 if NPi is a pronoun; else 0
10. ProperNP 2: 1 if NPi is a proper NP; else 0
11. Appositive: 1 if NPi and NPj are in an appositive structure; else 0
12. NameAlias: 1 if one of NPi and NPj is an alias of the other; else 0
13. GenderAgree: 1 if NPi and NPj agree in gender; else 0
14. NumAgree: 1 if NPi and NPj agree in number; else 0
15. SemanticAgree: 1 if NPi and NPj agree in semantic class; else 0
16. HeadStrMatch: 1 if NPi and NPj contain the same head string; else 0
17. StrSim1: the string similarity of NPj against NPi
18. StrSim2: the string similarity of NPi against NPj

Features describing the relationships between NPj and cluster Ck:
19. Cluster NumAgree: 1 if Ck and NPj agree in number; else 0
20. Cluster GenAgree: 1 if Ck and NPj agree in gender; else 0
21. Cluster SemAgree: 1 if Ck and NPj agree in semantic class; else 0
22. Cluster Length: the number of elements contained in Ck
23. Cluster StrSim: the string similarity of NPj against Ck
24. Cluster StrLNPSim: the string similarity of NPj against the longest NP in Ck</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Instance representation </SectionTitle> <Paragraph position="0"> An instance in our approach is composed of three elements, as below: {NPj, Ck, NPi} where NPj, like the definition in the baseline, is the noun phrase under consideration, while Ck is an existing coreferential cluster. Each cluster can be referred to by a reference noun phrase NPi, a certain element of the cluster. A cluster may contain more than one reference NP and thus may have multiple associated instances. For a training instance, the label is positive if NPj is annotated as belonging to Ck, and negative otherwise.</Paragraph> <Paragraph position="1"> In our system, each instance is represented as a set of 24 features, as shown in Table 2. The features are supposed to capture the properties of NPj and Ck as well as their relationships. 
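To make the cluster-level features concrete, the following sketch computes f22-f24 for an NP against a cluster of token lists. The paper's exact Str_Similarity function is not reproduced in this extract, so the token-overlap ratio used here is a stand-in assumption for illustration only, as are the data shapes and function names.

```python
# Illustrative computation of the cluster-related features
# Cluster_Length (f22), Cluster_StrSim (f23) and Cluster_StrLNPSim (f24).

def str_similarity(s1, s2):
    """Hypothetical stand-in for the paper's Str_Similarity:
    the fraction of s1's distinct tokens that also appear in s2."""
    if not s1:
        return 0.0
    return len(set(s1) & set(s2)) / len(set(s1))

def cluster_features(npj_tokens, cluster_token_lists):
    """npj_tokens: token list of NPj; cluster_token_lists: one token list
    per NP in the cluster Ck (assumed non-empty)."""
    # pool every token mentioned anywhere in the cluster
    all_cluster_tokens = [tok for np in cluster_token_lists for tok in np]
    # the element with the longest string (here: the most tokens)
    longest_np = max(cluster_token_lists, key=len)
    return {
        "Cluster_Length": len(cluster_token_lists),                      # f22
        "Cluster_StrSim": str_similarity(npj_tokens, all_cluster_tokens),  # f23
        "Cluster_StrLNPSim": str_similarity(npj_tokens, longest_np),       # f24
    }
```

On the Table 1 example, "This mutant" matches the pooled cluster tokens fully (its informative token "mutant" comes from NP1, the longest element) even though it overlaps little with the closest antecedent "This protein" alone, which is exactly the effect the cluster features are meant to capture.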
In the table we divide the features into two groups, one describing NPj and NPi and the other describing NPj and Ck. For the former group, we just use the same feature set as in the baseline system, while for the latter, we introduce 6 more features: Cluster NumAgree, Cluster GenAgree and Cluster SemAgree: These three features mark the compatibility of NPj and Ck in number, gender and semantic class, respectively. If NPj disagrees with any element in Ck, the corresponding feature is set to 0.</Paragraph> <Paragraph position="2"> Cluster Length: The number of NPs in the cluster Ck. This feature reflects the global salience of an entity in the sense that the more frequently an entity is mentioned, the more important it would probably be in the text.</Paragraph> <Paragraph position="3"> Cluster StrSim: This feature marks the string similarity between NPj and Ck. Supposing SNPj is the token set of NPj, we compute the feature value using the similarity function</Paragraph> <Paragraph position="5"> Cluster StrLNPSim: It marks the string-matching degree of NPj and the noun phrase in Ck with the largest number of tokens. The intuition here is that the NP with the longest string would probably bear richer description information of the referent than other elements in the cluster. The feature is calculated using the similarity function Str_Similarity(SNPj, SNPk), where</Paragraph> <Paragraph position="7"/> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Training procedure </SectionTitle> <Paragraph position="0"> Given an annotated training document, we process the noun phrases from beginning to end.</Paragraph> <Paragraph position="1"> For each anaphoric noun phrase NPj, we consider its preceding coreferential clusters from right to left3. For each cluster, we create only one instance by taking the last NP in the cluster as the reference NP. 
The process terminates once the cluster to which NPj belongs is found.</Paragraph> <Paragraph position="2"> To make it clear, consider the example in Table 1 again. For the noun phrase [7 This mutant], the annotated preceding coreferential clusters are considered in turn. Thus three training instances are generated: {NP7, C1, NP6}, {NP7, C2, NP5} and {NP7, C3, NP4}. Among them, the first two instances are labelled as negative while the last one is positive. After the training instances are ready, we use the C5.0 learning algorithm to learn a decision tree classifier, as in the baseline approach.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Resolution procedure </SectionTitle> <Paragraph position="0"> The resolution procedure is the counterpart of the training procedure. Given a testing document, for each encountered noun phrase, NPj, we create a set of instances by pairing NPj with each cluster found previously. The instances are presented to the learned decision tree to judge the likelihood that NPj is linked to a cluster.</Paragraph> <Paragraph position="1"> The resolution algorithm is given in Figure 1.</Paragraph> <Paragraph position="2"> As described in the algorithm, the confidence value of each cluster is the maximal confidence value of its instances. Similar to the baseline system, two cluster selection strategies, i.e. MRF and BF, could be applied to link NPj to a proper cluster. For the MRF strategy, NPj is linked to the closest cluster with a confidence value above 0.5, while for BF, it is linked to the cluster with the maximal confidence value (above 0.5).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.4 Comparison of NP-NP and </SectionTitle> <Paragraph position="0"> NP-Cluster based approaches As noted above, the idea of the NP-Cluster based approach is different from the NP-NP based approach. 
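The resolution step with its two cluster-selection strategies can be sketched as below. This is a schematic reading of the procedure, not the authors' implementation: `classify` stands in for the learned C5.0 tree (returning a CF in [0, 1]), and the cluster representation is an illustrative assumption.

```python
# Cluster selection for an encountered NP under the MRF and BF strategies.
# `clusters` is assumed ordered from the most recent (closest to NPj)
# to the most distant; each cluster lists its reference NPs in `refs`.

THRESHOLD = 0.5

def select_cluster(npj, clusters, classify, strategy="BF"):
    # a cluster's confidence is the maximal CF over its instances,
    # one instance per reference NP
    scored = [(c, max(classify(npj, c, ref) for ref in c["refs"]))
              for c in clusters]
    if strategy == "MRF":
        # Most Recent First: the closest cluster with CF above the threshold
        for cluster, cf in scored:
            if cf > THRESHOLD:
                return cluster
        return None
    # Best First: the cluster with the maximal CF above the threshold
    best, best_cf = None, THRESHOLD
    for cluster, cf in scored:
        if cf > best_cf:
            best, best_cf = cluster, cf
    return best  # None if no cluster exceeds the threshold
```

With scores {C1: 0.6, C2: 0.9} for two preceding clusters, MRF links NPj to C1 (the closest above 0.5) while BF links it to C2, mirroring the contrast the experiments later measure.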
However, due to the fact that in our approach a cluster is processed based on its reference NPs, the framework of our approach could be reduced to the NP-NP based framework if the cluster-related features were removed. From this point of view, this approach could be considered as an extension of the baseline approach that applies additional cluster features as properties of NPi. These features provide richer description information of the entity, and thus make the coreference relationship between two NPs more apparent. In this way, both the rule-learning and coreference-determination capabilities of the original approach could be enhanced.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Data collection </SectionTitle> <Paragraph position="0"> Our coreference resolution system is a component of our information extraction system in the biomedical domain. For this purpose, an annotated coreference corpus has been built4, which consists of 228 MEDLINE abstracts selected from the GENIA data set. The average length of the documents in the collection is 244 words. One characteristic of the bio-literature is that pronouns account for only about 3% of all the NPs. This ratio is quite low compared to that in the newswire domain (e.g. above 10% for the MUC data set).</Paragraph> <Paragraph position="1"> A pipeline of NLP components is applied to pre-process an input raw text. Among them, NE recognition, part-of-speech tagging and text chunking adopt the same HMM based engine with error-driven learning capability (Zhou and Su, 2002). The NE recognition component trained on GENIA (Shen et al., 2003) can recognize up to 23 common biomedical entity types with an overall performance of 66.1 F-measure (P=66.5% R=65.7%). 
In addition, to remove the apparent non-anaphors (e.g., embedded proper nouns) in advance, a heuristic-based non-anaphoricity identification module is applied, which successfully removes 50.0% of non-anaphors with a precision of 83.5% on our data set.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Experiments and discussions </SectionTitle> <Paragraph position="0"> Our experiments were done on the first 100 documents from the annotated corpus, among them 70 for training and the remaining 30 for testing.</Paragraph> <Paragraph position="1"> Throughout these experiments, default learning parameters were applied in the C5.0 algorithm.</Paragraph> <Paragraph position="2"> The recall and precision were calculated automatically according to the scoring scheme proposed by Vilain et al. (1995).</Paragraph> <Paragraph position="3"> In Table 3 we compare the performance of different coreference resolution systems. The first line summarizes the results of the baseline system using the traditional NP-NP based approach as described in Section 2. Using the BF strategy, Baseline obtains 80.3% recall and 77.5% precision. These results are better than the work by Castano et al. (2002) and Yang et al. (2004), which were also tested on MEDLINE data and reported an F-measure of about 74% and 69%, respectively.</Paragraph> <Paragraph position="4"> In the experiments, we evaluated another NP-NP based system, AllAnte. It adopts a learning framework similar to Baseline's, except that during training it generates the positive instances by pairing an NP with all its antecedents instead of only the closest one. The system attempts to use such an instance selection strategy to incorporate the information from coreferential clusters. But the results are nevertheless disappointing: although this strategy boosts the recall by 5.4%, the precision drops considerably, by above 6%, at the same time. 
The overall F-measure is even lower than that of the baseline system.</Paragraph> <Paragraph position="5"> The last line of Table 3 demonstrates the results of our NP-Cluster based approach. For the BF strategy, the system achieves 84.9% recall and 78.8% precision. Compared with the baseline system, the recall rises by 4.6% while the precision still gains slightly, by 1.3%. Overall, we observe an increase in F-measure of 2.8%.</Paragraph> <Paragraph position="6"> The results in Table 3 also indicate that the BF strategy is superior to the MRF strategy.</Paragraph> <Paragraph position="7"> A similar finding was also reported by Ng and Cardie (2002b) on the MUC data set.</Paragraph> <Paragraph position="8"> To gain insight into the difference in performance between our NP-Cluster based system and the NP-NP based system, we compared the decision trees generated in the two systems in Figure 2. In both trees, the string-similarity features occur in the top portion, which supports the arguments by (Strube et al., 2002) and (Yang et al., 2004) that string-matching is a crucial factor for NP coreference resolution. As shown in the figure, the feature StrSim 1 in the left</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> tree is completely replaced by the Cluster StrSim </SectionTitle> <Paragraph position="0"> and Cluster StrLNPSim in the right tree, which means that matching the tokens with a cluster is more reliable than with a single NP. Moreover, the cluster length will also be checked when the NP under consideration has low similarity against a cluster. These observations indicate that the information from clusters is quite important for coreference resolution on this data set.</Paragraph> <Paragraph position="1"> The decision tree visualizes the importance of the features for a data set. However, the tree is learned from documents where the coreferential clusters are correctly annotated. 
(Figure 2 caption: fi refers to the i-th feature listed in Table 2.) During resolution, unfortunately, the found clusters are usually not completely correct, and as a result the features important on the training data may not also be helpful on the testing data. Therefore, in the experiments we were concerned with which features really matter for real coreference resolution. For this purpose, we tested our system using different features and evaluated their performance in Table 4. Here we just considered the features Cluster Length (f22), Cluster StrSim (f23) and Cluster StrLNPSim (f24), as Figure 2 has indicated that among the cluster-related features only these three are possibly effective for resolution. Throughout the experiment, the Best-First strategy was applied.</Paragraph> <Paragraph position="2"> As illustrated in the table, we could observe that: 1. Without the three features, the system is equivalent to the baseline system, with the same recall and precision.</Paragraph> <Paragraph position="3"> 2. Cluster StrSim (f23) is the most effective, as it contributes most to the system performance. Simply using this feature boosts the F-measure by 2.7%.</Paragraph> <Paragraph position="4"> 3. Cluster StrLNPSim (f24) is also effective, improving the F-measure by 2.1% alone.</Paragraph> <Paragraph position="5"> When combined with f23, it leads to the best F-measure.</Paragraph> <Paragraph position="6"> 4. Cluster Length (f22) brings only a 0.1% F-measure improvement. It barely increases, or even reduces, the F-measure when used together with the other two features.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Related work </SectionTitle> <Paragraph position="0"> To our knowledge, our work is the first supervised-learning based attempt to do coreference resolution by exploring the relationship between an NP and coreferential clusters. 
In the heuristic salience-based algorithm for pronoun resolution, Lappin and Leass (1994) introduce a procedure for identifying anaphorically linked NPs as a cluster, for which a global salience value is computed as the sum of the salience values of its elements. Cardie and Wagstaff (1999) have proposed an unsupervised approach which also takes cluster information into consideration. Their approach uses hard constraints to preclude the linking of an NP to a cluster that mismatches it in number, gender or semantic class, while our approach takes these agreements together with other features (e.g. cluster length, string-matching degree, etc.) as preference factors for cluster selection. Besides, the idea of clustering can be seen in the research on cross-document coreference, where NPs with high context similarity would be chained together based on certain clustering methods (Bagga and Biermann, 1998; Gooi and Allan, 2004).</Paragraph> </Section> </Paper>