<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1020">
<Title>Machine Learning for Coreference Resolution: From Local Classification to Global Ranking</Title>
<Section position="6" start_page="161" end_page="163" type="concl">
<SectionTitle>5 Discussion</SectionTitle>
<Paragraph position="0">Two questions naturally arise after examining the above results. First, which of the 54 coreference systems generally yields superior results? Second, why is the same set of candidate partitions scored so differently by the two scoring programs? To address the first question, we take the 54 coreference systems that were trained on half of the available training texts (see Section 4) and apply them to the three ACE test data sets. Table 5 shows the best-performing resolver for each test set and scoring program combination. Interestingly, with respect to the MUC scorer, the best performance on the three data sets is achieved by the same resolver. The results with respect to B-CUBED are mixed, however.</Paragraph>
<Paragraph position="1">For each resolver shown in Table 5, we also compute the average rank of the partitions generated by the resolver for the corresponding test texts. (Footnote 6: The rank of a partition is computed in the same way as in Section 3.2, except that we now adopt the common convention of assigning rank i to the i-th highest scored partition.) Intuitively, a resolver that consistently produces good partitions (relative to other candidate partitions) would achieve a low average rank. Hence, we can infer from the fairly high rank associated with the top B-CUBED resolvers that they do not perform consistently better than their counterparts.</Paragraph>
<Paragraph position="2">Regarding our second question of why the same set of candidate partitions is scored differently by the two scoring programs, the reason can be attributed to two key algorithmic differences between these scorers. First, while the MUC scorer only rewards correct identification of coreferent links, B-CUBED additionally rewards successful recognition of non-coreference relationships. Second, the MUC scorer applies the same penalty to each erroneous merging decision, whereas B-CUBED penalizes erroneous merging decisions involving two large clusters more heavily than those involving two small clusters.</Paragraph>
<Paragraph position="3">Both of the above differences can potentially cause B-CUBED to assign a narrower range of F-measure scores to each set of 54 candidate partitions than the MUC scorer, for the following reasons.</Paragraph>
<Paragraph position="4">First, our candidate partitions in general agree more on singleton clusters than on non-singleton clusters.</Paragraph>
<Paragraph position="5">Second, by employing a non-uniform penalty function, B-CUBED effectively removes a bias inherent in the MUC scorer that leads to under-penalization of partitions in which entities are over-clustered.</Paragraph>
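To make the two algorithmic differences concrete, the following minimal Python sketch (our own illustration, not the official MUC or B-CUBED scoring programs; the mention ids and partitions are hypothetical) computes both F-measures for two candidate partitions of the same six mentions: one that splits a key entity and one that merges everything into a single cluster.

# A minimal, illustrative sketch of the two metrics discussed above:
# the link-based MUC metric (Vilain et al., 1995) and the mention-based
# B-CUBED metric (Bagga and Baldwin, 1998).  A partition is a list of sets
# of mention ids, and every mention must appear in both key and response.

def muc_f1(key, response):
    """MUC F-measure: rewards only correctly identified coreference links."""
    def link_recall(gold, system):
        num = den = 0
        for cluster in gold:
            # number of system clusters needed to cover this gold cluster
            touched = {i for i, c in enumerate(system) if cluster & c}
            num += len(cluster) - len(touched)
            den += len(cluster) - 1
        return num / den if den else 0.0

    r = link_recall(key, response)
    p = link_recall(response, key)   # precision = recall with roles swapped
    return 2 * p * r / (p + r) if p + r else 0.0


def b_cubed_f1(key, response):
    """B-CUBED F-measure: also credits correctly isolated (singleton) mentions."""
    def mention_average(gold, system):
        scores = []
        for cluster in system:
            for m in cluster:
                gold_cluster = next(c for c in gold if m in c)
                scores.append(len(cluster & gold_cluster) / len(cluster))
        return sum(scores) / len(scores)

    p = mention_average(key, response)   # per-mention precision
    r = mention_average(response, key)   # per-mention recall
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical key and two candidate partitions over mentions 1-6.
    key = [{1, 2, 3}, {4, 5}, {6}]
    split = [{1, 2}, {3}, {4, 5}, {6}]        # splits one key entity
    merged = [{1, 2, 3, 4, 5, 6}]             # over-clusters everything
    for name, cand in [("split", split), ("merged", merged)]:
        print(name, "MUC=%.3f" % muc_f1(key, cand),
              "B3=%.3f" % b_cubed_f1(key, cand))

On this toy data the sketch gives MUC F-scores of 0.80 (split) and 0.75 (merged) but B-CUBED F-scores of roughly 0.88 and 0.56, illustrating the second difference noted above: the over-clustered candidate is penalized far more heavily by B-CUBED than by the MUC scorer.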
<Paragraph position="6">Nevertheless, our B-CUBED results suggest that (1) despite its modest improvement over the baselines, our approach offers robust performance across the data sets; and (2) we could obtain better scores by improving the ranking model and expanding our set of candidate partitions, as elaborated below.</Paragraph>
<Paragraph position="7">To improve the ranking model, we can potentially (1) design new features that better characterize a candidate partition (e.g., features that measure the size and the internal cohesion of a cluster; see the illustrative sketch following this section), and (2) reserve more labeled data for training the model. In the latter case we may have less data for training coreference classifiers, but at the same time we can employ weakly supervised techniques to bootstrap the classifiers. Previous attempts at bootstrapping coreference classifiers have been only mildly successful (e.g., Müller et al. (2002)), and this is also an area that deserves further research.</Paragraph>
<Paragraph position="8">To expand our set of candidate partitions, we can potentially incorporate more high-performing coreference systems into our framework, which is flexible enough to accommodate even those that adopt knowledge-based (e.g., Harabagiu et al. (2001)) and unsupervised approaches (e.g., Cardie and Wagstaff (1999), Bean and Riloff (2004)). Of course, we can also expand our pre-selected set of coreference systems by incorporating additional learning algorithms, clustering algorithms, and feature sets.</Paragraph>
<Paragraph position="9">Once again, we may use previous work to guide our choices. For instance, Iida et al. (2003) and Zelenko et al. (2004) have explored the use of SVMs, voted perceptrons, and logistic regression for training coreference classifiers. McCallum and Wellner (2003) and Zelenko et al. (2004) have employed graph-based partitioning algorithms such as correlation clustering (Bansal et al., 2002). Finally, Strube et al. (2002) and Iida et al. (2003) have proposed new edit-distance-based string-matching features and centering-based features, respectively.</Paragraph>
</Section>
</Paper>
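As a supplement to the feature-design suggestion in the discussion above, the following small sketch illustrates the kind of cluster-level features a partition ranker could consume: the size of each cluster in a candidate partition and a crude internal-cohesion score. The function names, the Jaccard token-overlap cohesion measure, and the example mentions are our own hypothetical choices, not features taken from the paper.

# A hypothetical sketch of cluster-level partition features for a ranker:
# cluster sizes plus a simple internal-cohesion score (mean pairwise token
# overlap between mention strings within a cluster).

from itertools import combinations

def token_overlap(m1, m2):
    """Jaccard overlap between the token sets of two mention strings."""
    t1, t2 = set(m1.lower().split()), set(m2.lower().split())
    return len(t1 & t2) / len(t1 | t2) if t1 | t2 else 0.0

def partition_features(partition):
    """Map a candidate partition (list of lists of mention strings)
    to a small feature dictionary a ranking model could consume."""
    sizes = [len(cluster) for cluster in partition]
    cohesions = []
    for cluster in partition:
        if len(cluster) < 2:
            continue  # singletons contribute no pairwise cohesion
        pairs = list(combinations(cluster, 2))
        cohesions.append(sum(token_overlap(a, b) for a, b in pairs) / len(pairs))
    return {
        "num_clusters": len(partition),
        "max_cluster_size": max(sizes),
        "avg_cluster_size": sum(sizes) / len(sizes),
        "avg_cohesion": sum(cohesions) / len(cohesions) if cohesions else 0.0,
    }

if __name__ == "__main__":
    # Hypothetical candidate partition over six mention strings.
    candidate = [["Barack Obama", "Obama", "the president"],
                 ["the White House"],
                 ["Michelle Obama", "his wife"]]
    print(partition_features(candidate))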