<?xml version="1.0" standalone="yes"?>
<Paper uid="P94-1025">
  <Title>PART-OF-SPEECH TAGGING USING A VARIABLE MEMORY MARKOV MODEL Hinrich Schütze, Center for the Study of Language and Information</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> Many words in English have several parts of speech (POS). For example &amp;quot;book&amp;quot; is used as a noun in &amp;quot;She read a book.&amp;quot; and as a verb in &amp;quot;She didn't book a trip.&amp;quot; Part-of-speech tagging is the problem of determining the syntactic part of speech of an occurrence of a word in context. In any given English text, most tokens are syntactically ambiguous since most of the high-frequency English words have several parts of speech. Therefore, a correct syntactic classification of words in context is important for most syntactic and other higher-level processing of natural language text.</Paragraph>
    <Paragraph position="1"> Two stochastic methods have been widely used for POS tagging: fixed order Markov models and Bidden Markov models. Fixed order Markov models are used in (Church, 1989) and (Charniak et al., 1993). Since the order of the model is assumed to be fixed, a short memory (small order) is typically used, since the number of possible combinations grows exponentially. For example, assuming there are 184 different tags, as in the Brown corpus, there are 1843 = 6,229,504 different order 3 combinations of tags (of course not all of these will actually occur, see (Weischedel et al., 1993)). Because of the large number of parameters higher-order fixed length models are hard to estimate. (See (Brill, 1993) for a rule-based approach to incorporating higher-order information.) In a Hidden iarkov Model (HMM) (Jelinek, 1985; Kupiec, 1992), a different state is defined for each POS tag and the transition probabilities and the output probabilities are estimated using the EM (Dempster et al., 1977) algorithm, which guarantees convergence to.a local minimum (Wu, 1983). The advantage of an HMM is that it can be trained using untagged text. On the other hand, the training procedure is time consuming, and a fixed model (topology) is assumed. Another disadvantage is due to the local convergence properties of the EM algorithm. The solution obtained depends on the initial setting of the model's parameters, and different solutions are obtained for different parameter initialization schemes. This phenomenon discourages linguistic analysis based on the output of the model.</Paragraph>
    <Paragraph position="2"> We present a new method based on variable memory Markov models (VMM) (Ron et al., 1993; Ron et al., 1994). The VMM is an approximation of an unlimited order Markov source. It can incorporate both the static (order 0) and dynamic (higher-order) information systematically, while keeping the ability to change the model due to future observations. This approach is easy to implement, the learning algorithm and classification of new tags are computationally efficient, and the results achieved, using simplified assumptions for the static tag probabilities, are encouraging.</Paragraph>
  </Section>
class="xml-element"></Paper>