<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1031">
  <Title>Example Selection for Bootstrapping Statistical Parsers</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Co-training
</SectionTitle>
    <Paragraph position="0"> Blum and Mitchell (1998) introduced co-training to bootstrap two classifiers with different views of the data.</Paragraph>
    <Paragraph position="1"> The two classifiers are initially trained on a small amount of annotated seed data; then they label unannotated data for each other in an iterative training process. Blum and Mitchell prove that, when the two views are conditionally independent given the label, and each view is sufficient for learning the task, co-training can boost an initial weak learner using unlabeled data.</Paragraph>
    <Paragraph position="2"> The theory underlying co-training has been extended by Dasgupta et al. (2002) to prove that, by maximizing their agreement over the unlabeled data, the two learners make few generalization errors (under the same independence assumption adopted by Blum and Mitchell).</Paragraph>
    <Paragraph position="3"> Abney (2002) argues that this assumption is extremely strong and typically violated in the data, and he proposes a weaker independence assumption.</Paragraph>
    <Paragraph position="4"> Goldman and Zhou (2000) show that, through careful selection of newly labeled examples, co-training can work even when the classifiers' views do not satisfy the independence assumption. In this paper we investigate methods for selecting labeled examples produced by two statistical parsers. We do not explicitly maximize agreement (along the lines of Abney's algorithm (2002)) because it is too computationally intensive for training parsers.</Paragraph>
    <Paragraph position="5"> The pseudocode for our co-training framework is given in Figure 1. It consists of two different parsers and a central control that interfaces between the two parsers and the data. At each co-training iteration, a small set of sentences is drawn from a large pool of unlabeled sentences and stored in a cache. Both parsers then attempt to label every sentence in the cache. Next, a subset of the newly labeled sentences is selected to be added to the training data. The examples added to the training set of one parser (referred to as the student) are only those produced by the other parser (referred to as the teacher), although the methods we use generalize to the case in which the parsers share a single training set. During selection, one parser first acts as the teacher and the other as the student, and then the roles are reversed.</Paragraph>
    <Paragraph position="6"> A and B are two different parsers.</Paragraph>
    <Paragraph position="7"> MiA and MiB are the models of A and B at step i.</Paragraph>
    <Paragraph position="8"> U is a large pool of unlabeled sentences.</Paragraph>
    <Paragraph position="9"> Ui is a small cache holding a subset of U at step i.</Paragraph>
    <Paragraph position="10"> L is the manually labeled seed data.</Paragraph>
    <Paragraph position="11"> LiA and LiB are the labeled training examples for A and B at step i.</Paragraph>
    <Paragraph position="12"> Initialize:</Paragraph>
    <Paragraph position="14"> Loop: Ui-Add unlabeled sentences from U. MiA and MiB parse the sentences in Ui and assign scores to them according to their scoring functions fA and fB.</Paragraph>
    <Paragraph position="15"> Select new parses{PA}and{PB}according to some selection method S, which uses the scores from fA and fB.</Paragraph>
    <Paragraph position="17"/>
  </Section>
class="xml-element"></Paper>