<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1031">
  <Title>Example Selection for Bootstrapping Statistical Parsers</Title>
  <Section position="6" start_page="80" end_page="80" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> We have considered three selection methods that assign different priorities to the two (often competing) criteria of accuracy and training utility. We have empirically compared their effect on co-training, in which two parsers label data for each other, as well as on corrected co-training, in which a human corrects the parser-labeled data before it is added to the training set. Our results suggest that training utility is an important selection criterion to consider, even at the cost of potentially reducing the accuracy of the training data. In our empirical studies, the selection method that aims to maximize training utility, Sdiff-n, consistently finds better examples than the one that aims to maximize accuracy, Sabove-n. Our results also suggest that the selection method that aims to maximize both accuracy and utility, Sint-n, shows promise for improving co-trained parsers and for reducing human effort in corrected co-training; however, a much larger unlabeled data set is needed to verify the benefit of Sint-n.</Paragraph>
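The three selection strategies contrasted above can be sketched as simple filters over scored parses. This is a minimal illustration under stated assumptions, not the authors' implementation: `teacher_score` and `student_score` are hypothetical stand-ins for the parsers' conditional-probability scores, and the threshold `n` is illustrative.

```python
# Hedged sketch of the three selection heuristics (Sabove-n, Sdiff-n, Sint-n).
# Scores and thresholds are hypothetical; the paper defines the actual
# scoring functions via the parsers' conditional probabilities.

def s_above_n(examples, n):
    """Accuracy-oriented: keep parses the teacher scores above n."""
    return [ex for ex in examples if ex["teacher_score"] > n]

def s_diff_n(examples, n):
    """Utility-oriented: keep parses where the teacher's score exceeds
    the student's by more than n (the student has the most to gain)."""
    return [ex for ex in examples
            if ex["teacher_score"] - ex["student_score"] > n]

def s_int_n(examples, n):
    """Both criteria: the intersection of the two filters."""
    above = {id(ex) for ex in s_above_n(examples, n)}
    return [ex for ex in s_diff_n(examples, n) if id(ex) in above]
```

In this toy form the trade-off is visible directly: `s_diff_n` may admit parses of lower absolute quality than `s_above_n`, trading labeling accuracy for training utility, while `s_int_n` accepts only examples passing both tests.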
    <Paragraph position="1"> The results of this study indicate the need for scoring functions that estimate the accuracy of the parser's output better than conditional probabilities do. Our oracle experiments show that, with effective selection methods, the co-training process can improve parser performance even when the newly labeled parses are not completely accurate. This suggests that co-training may still be beneficial with a practical scoring function that only coarsely distinguishes accurate parses from inaccurate ones. Further avenues to explore include developing selection methods that efficiently approximate maximizing the objective function of parser agreement on unlabeled data, following the work of Dasgupta et al. (2002) and Abney (2002). Co-training might also be made more effective if partial parses were used as training data. Finally, we are conducting experiments that compare corrected co-training with other active learning methods. We hope these studies will reveal ways to combine the strengths of co-training and active learning to make better use of unlabeled data.</Paragraph>
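One coarse, practical scoring function in the spirit of the parser-agreement objective mentioned above is the overlap between the two parsers' outputs on the same sentence. This is a hypothetical sketch, not the cited authors' formulation; it assumes parses are represented as labeled constituent spans.

```python
def agreement_score(parse_a, parse_b):
    """Hypothetical coarse score: Jaccard overlap of the constituents
    (as (label, start, end) spans) produced by two parsers for the
    same sentence. Higher agreement is taken as a rough proxy for
    parse accuracy."""
    a, b = set(parse_a), set(parse_b)
    if not a and not b:
        return 1.0  # two empty parses trivially agree
    return len(a & b) / len(a | b)
```

Such a score only coarsely separates accurate from inaccurate parses, which is exactly the regime the oracle experiments suggest co-training can still exploit.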
  </Section>
</Paper>