File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/00/a00-2005_concl.xml
Size: 2,111 bytes
Last Modified: 2025-10-06 13:52:38
<?xml version="1.0" standalone="yes"?> <Paper uid="A00-2005"> <Title>Bagging and Boosting a Treebank Parser</Title> <Section position="7" start_page="39832" end_page="39832" type="concl"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> We have shown two methods, bagging and boosting, for automatically creating ensembles of parsers that produce better parses than any individual in the ensemble. Neither algorithm exploits any specialized knowledge of the underlying parser induction algorithm, and the data used in creating the ensembles has been restricted to a single common training set so that differences in training data quantity cannot affect the outcome.</Paragraph> <Paragraph position="1"> Our best bagging system performed consistently well on all metrics, including exact sentence accuracy. It resulted in a statistically significant F-measure gain of 0.6 over the performance of the baseline parser. That baseline system is the best known Treebank parser. This gain compares favorably with a bound on potential gain from increasing the corpus size.</Paragraph> <Paragraph position="2"> Even though it is computationally expensive to create and evaluate a small (15-30) ensemble of parsers, the cost is far outweighed by the opportunity cost of hiring humans to annotate 40,000 more sentences. The economic basis for using ensemble methods will continue to improve with the increasing value (performance per price) of modern hardware.</Paragraph> <Paragraph position="3"> Our boosting system, although dominated by the bagging system, also performed significantly better than the best previously known individual parsing result. We have shown how to exploit the distribution created as a side effect of the boosting algorithm to uncover inconsistencies in the training corpus. We presented a semi-automated technique for doing this, along with examples of inconsistent annotation from the Treebank.
Perhaps the biggest advantage of this technique is that it requires no a priori notion of how the inconsistencies can be characterized.</Paragraph> </Section> </Paper>