<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0606">
  <Title>Boosting Applied to Tagging and PP Attachment</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Boosting is a machine learning algorithm that has been applied successfully to a variety of problems, but is almost unknown in computational linguistics. We describe experiments in which we apply boosting to part-of-speech tagging and prepositional phrase attachment. Results on both PP-attachment and tagging are within sampling error of the best previous results.</Paragraph>
    <Paragraph position="1"> The current best technique for PP-attachment (backed-off density estimation) does not perform well for tagging, and the current best technique for tagging (maxent) is below state-of-the-art on PPattachment. Boosting achieves state-of-the-art performance on both tasks simultaneously.</Paragraph>
    <Paragraph position="2"> The idea of boosting is to combine many simple &amp;quot;rules of thumb,&amp;quot; such as &amp;quot;the current word is a noun if the previous word is the.&amp;quot; Such rules often give incorrect classifications. The main idea of boosting is to combine many such rules in a principled manner to produce a single highly accurate classification rule.</Paragraph>
    <Paragraph position="3"> There are similarities between boosting and transformation-based learning (Brill, 1993): both build classifiers by combining simple rules, and both are noted for their resistance to overfitting.</Paragraph>
    <Paragraph position="4"> But boosting, unlike transformation-based learning, rests on firm theoretical foundations; and it outperforms transformation-based learning in our experiments. null There are also superficial similarities between boosting and maxent. In both, the parameters are weights in a log-linear function. But in maxent, the log-linear function defines a probability, and the objective is to maximize likelihood, which may not minimize classification error. In boosting, the log-linear function defines a hyperplane dividing examples into (binary) classes, and boosting minimizes classification error directly. Hence boosting is usually more appropriate when the objective is classification rather than density estimation.</Paragraph>
    <Paragraph position="5"> A notable property of boosting is that it maintains an explicit measure of how difficult it finds particular training examples to be. The most difficult examples are very often mislabelled examples. Hence, boosting can contribute to improving data quality by identifying annotation errors.</Paragraph>
  </Section>
class="xml-element"></Paper>