<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0407">
  <Title>Bootstrapping POS taggers using Unlabelled Data</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Co-training
</SectionTitle>
    <Paragraph position="0"> Given two (or more) &amp;quot;views&amp;quot; (as described in Blum and Mitchell (1998)) of a classification task, co-training can be informally described as follows: AF Learn separate classifiers for each view using a small amount of labelled seed data.</Paragraph>
    <Paragraph position="1"> AF Use each classifier to label some previously unlabelled data.</Paragraph>
    <Paragraph position="2"> AF For each classifier, add some subset of the newly labelled data to the training data.</Paragraph>
    <Paragraph position="3"> AF Retrain the classifiers and repeat.</Paragraph>
    <Paragraph position="4"> The intuition behind the algorithm is that each classifier is providing extra, informative labelled data for the other classifier(s). Blum and Mitchell (1998) derive PAC-like guarantees on learning by assuming that the two views are individually sufficient for classification and the two views are conditionally independent given the class. Collins and Singer (1999) present a variant of the Blum and Mitchell algorithm, which directly maximises an objective function that is based on the level of agreement between the classifiers on unlabelled data.</Paragraph>
    <Paragraph position="5"> Dasgupta et al. (2002) provide a theoretical basis for this approach by providing a PAC-like analysis, using the same independence assumption adopted by Blum and Mitchell. They prove that the two classifiers have low generalisation error if they agree on unlabelled data.</Paragraph>
    <Paragraph position="6"> Abney (2002) argues that the Blum and Mitchell independence assumption is very restrictive and typically violated in the data, and so proposes a weaker independence assumption, for which the Dasgupta et al. (2002) results still hold. Abney also presents a greedy algorithm that maximises agreement on unlabelled data, which produces comparable results to Collins and Singer (1999) on their named entity classification task.</Paragraph>
    <Paragraph position="7"> Goldman and Zhou (2000) show that, if the newly labelled examples used for re-training are selected carefully, co-training can still be successful even when the views used by the classifiers do not satisfy the independence assumption.</Paragraph>
    <Paragraph position="8"> In remainder of the paper we present a practical method for co-training POS taggers, and investigate the extent to which example selection based on the work of Dasgupta et al. and Abney can be effective.</Paragraph>
  </Section>
</Paper>