<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0407">
  <Title>Bootstrapping POS taggers using Unlabelled Data</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 The POS taggers
</SectionTitle>
    <Paragraph position="0"> The two POS taggers used in the experiments are TNT, a publicly available Markov model tagger (Brants, 2000), and a reimplementation of the maximum entropy (ME) tagger MXPOST (Ratnaparkhi, 1996). The ME tagger, which we refer to as C&amp;C, uses the same features as MX-POST, but is much faster for training and tagging (Curran and Clark, 2003). Fast training and tagging times are important for the experiments performed here, since the bootstrapping process can require many tagging and training iterations.</Paragraph>
    <Paragraph position="1"> The model used by TNT is a standard tagging Markov model, consisting of emission probabilities, and transition probabilities based on trigrams of tags. It also deals with unknown words using a suffix analysis of the target word (the word to be tagged). TNT is very fast for both training and tagging.</Paragraph>
    <Paragraph position="2"> The C&amp;C tagger differs in a number of ways from TNT. First, it uses a conditional model of a tag sequence given a string, rather than a joint model. Second, ME models are used to define the conditional probabilities of a tag given some context. The advantage of ME models over the Markov model used by TNT is that arbitrary features can easily be included in the context; so as well as considering the target word and the previous two tags (which is the information TNT uses), the ME models also consider the words either side of the target word and, for unknown and infrequent words, various properties of the string of the target word.</Paragraph>
    <Paragraph position="3"> A disadvantage is that the training times for ME models are usually relatively slow, especially with iterative scaling methods (see Malouf (2002) for alternative methods). Here we use Generalised Iterative Scaling (Darroch and Ratcliff, 1972), but our implementation is much faster than Ratnaparkhi's publicly available tagger. The C&amp;C tagger trains in less than 7 minutes on the 1 million words of the Penn Treebank, and tags slightly faster than TNT.</Paragraph>
    <Paragraph position="4"> Since the taggers share many common features, one might think they are not different enough for effective co-training to be possible. In fact, both taggers are sufficiently different for co-training to be effective. Section 4 shows that both taggers can benefit significantly from the information contained in the other's output.</Paragraph>
    <Paragraph position="5"> The performance of the taggers on section 00 of the WSJ Penn Treebank is given in Table 1, for different seed set sizes (number of sentences). The seed data is taken  from sections 2-21 of the Treebank. The table shows that the performance of TNT is significantly better than the performance of C&amp;C when the size of the seed data is very small.</Paragraph>
  </Section>
class="xml-element"></Paper>