<?xml version="1.0" standalone="yes"?>
<Paper uid="W95-0101">
  <Title>Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> In this paper we describe an unsupervised learning algorithm for automatically training a rule-based part of speech tagger without using a manually tagged corpus. We compare this algorithm to the Baum-Welch algorithm, used for unsupervised training of stochastic taggers. Next, we show a method for combining unsupervised and supervised rule-based training algorithms to create a highly accurate tagger using only a small amount of manually tagged text.</Paragraph>
    <Paragraph position="1"> Introduction. There has recently been a great deal of work exploring methods for automatically training part of speech taggers, as an alternative to laboriously hand-crafting rules for tagging, as was done in the past [Klein and Simmons, 1963; Harris, 1962]. Almost all of the work in the area of automatically trained taggers has explored Markov-model based part of speech tagging [Jelinek, 1985; Church, 1988; DeRose, 1988; DeMarcken, 1990; Cutting et al., 1992; Kupiec, 1992; Charniak et al., 1993; Weischedel et al., 1993; Schutze and Singer, 1994; Lin et al., 1994; Elworthy, 1994; Merialdo, 1995]. For a Markov-model based tagger, training consists of learning both lexical probabilities (P(word|tag)) and contextual probabilities (P(tag_i | tag_{i-1} ... tag_{i-n})). Once the tagger is trained, a sentence can be tagged by searching for the tag sequence that maximizes the product of lexical and contextual probabilities.</Paragraph>
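The search described above can be illustrated with a minimal Viterbi sketch in Python. It is not taken from the paper: it assumes a bigram model (n = 1), and the function name viterbi_tag, the START marker, the dictionary layout, and every probability in the toy example are hypothetical values chosen only for illustration.

```python
import math


def viterbi_tag(words, tags, lexical, contextual, start="START"):
    """Return the tag sequence that maximizes the product of lexical
    P(word|tag) and bigram contextual P(tag|previous tag) probabilities.

    lexical maps (word, tag) to a probability; contextual maps
    (previous_tag, tag) to a probability. Both layouts are assumptions.
    """
    if not words:
        return []

    def logp(table, key):
        # Missing entries are treated as probability zero.
        p = table.get(key, 0.0)
        return math.log(p) if p else float("-inf")

    # best[t] = (log score of the best path ending in tag t, that path)
    best = {
        t: (logp(contextual, (start, t)) + logp(lexical, (words[0], t)), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            # Pick the best previous tag for reaching tag t.
            score, path = max(
                (best[prev][0] + logp(contextual, (prev, t)), best[prev][1])
                for prev in tags
            )
            new_best[t] = (score + logp(lexical, (word, t)), path + [t])
        best = new_best
    return max(best.values())[1]


# Toy model with invented probabilities (illustration only).
TAGS = ["DET", "NOUN", "VERB"]
LEXICAL = {
    ("the", "DET"): 1.0,
    ("can", "NOUN"): 0.1, ("can", "VERB"): 0.3,
    ("rusts", "NOUN"): 0.05, ("rusts", "VERB"): 0.2,
}
CONTEXTUAL = {
    ("START", "DET"): 0.6,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "NOUN"): 0.2, ("NOUN", "VERB"): 0.7,
    ("VERB", "NOUN"): 0.5, ("VERB", "VERB"): 0.1,
}

print(viterbi_tag(["the", "can", "rusts"], TAGS, LEXICAL, CONTEXTUAL))
```

Working in log space turns the product of probabilities into a sum, which avoids numerical underflow on long sentences. Keeping only the best path per tag at each position is what lets the search run in time linear in sentence length instead of enumerating every possible tag sequence.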
    <Paragraph position="2"> The most accurate stochastic taggers use estimates of lexical and contextual probabilities extracted from large manually annotated corpora (e.g., [Weischedel et al., 1993; Charniak et al., 1993]). It is possible to use unsupervised learning to train stochastic taggers without the need for a manually annotated corpus by using the Baum-Welch algorithm [Baum, 1972; Jelinek, 1985; Cutting et al., 1992; Kupiec, 1992; Elworthy, 1994; Merialdo, 1995]. This algorithm works by iteratively adjusting the lexical and contextual probabilities to increase the overall probability of the training corpus. If no prior knowledge is available, probabilities are initially either assigned randomly or evenly distributed. Although taggers trained this way are less accurate than those built using manually annotated corpora, the fact that they can be trained using only a dictionary listing the allowable parts of speech for each word, without a manually tagged corpus, is a huge advantage in many situations. Although a number of manually tagged corpora are available (e.g., [Francis and Kucera, 1982; Marcus et al., 1993]), training on a corpus of one type and then applying the tagger to a corpus of a different type usually results in a tagger with low accuracy [Weischedel et al.,  text each time the tagger is to be applied to a new language, and even when being applied to a new type of text.</Paragraph>
    <Paragraph position="3"> In [Brill, 1992; Brill, 1994], a rule-based part of speech tagger is described which achieves highly competitive performance compared to stochastic taggers, and captures the learned knowledge in a set of simple deterministic rules instead of a large table of statistics. In addition, the learned rules can be converted into a deterministic finite state transducer. Tagging with this finite state transducer requires n steps to tag a sequence of length n, independent of the number of rules, and results in a part of speech tagger ten times faster than the fastest stochastic tagger [Roche and Schabes, 1995]. One weakness of this rule-based tagger is that no unsupervised training algorithm has been presented for learning rules automatically without a manually annotated corpus. In this paper we present such an algorithm. We describe an algorithm for both unsupervised and weakly supervised training of a rule-based part of speech tagger, and compare the performance of this algorithm to that of the Baum-Welch algorithm.</Paragraph>
  </Section>
</Paper>