<?xml version="1.0" standalone="yes"?> <Paper uid="P99-1023"> <Title>A Second-Order Hidden Markov Model for Part-of-Speech Tagging</Title> <Section position="2" start_page="0" end_page="175" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Part-of-speech tagging is the act of assigning each word in a sentence a tag that describes how that word is used in the sentence. Typically, these tags indicate syntactic categories, such as noun or verb, and occasionally include additional feature information, such as number (singular or plural) and verb tense. The Penn Treebank documentation (Marcus et al., 1993) defines a commonly used set of tags.</Paragraph> <Paragraph position="1"> Part-of-speech tagging is an important research topic in Natural Language Processing (NLP). Taggers are often preprocessors in NLP systems, making accurate performance especially important. Much research has been done to improve tagging accuracy using several different models and methods, including: hidden Markov models (HMMs) (Kupiec, 1992), (Charniak et al., 1993); rule-based systems (Brill, 1994), (Brill, 1995); memory-based systems (Daelemans et al., 1996); maximum-entropy systems (Ratnaparkhi, 1996); path voting constraint systems (Tür and Oflazer, 1998); linear separator systems (Roth and Zelenko, 1998); and majority voting systems (van Halteren et al., 1998).</Paragraph> <Paragraph position="2"> This paper describes various modifications to an HMM tagger that improve the performance to an accuracy comparable to or better than the best current single classifier taggers. This improvement comes from using second-order approximations of the Markov assumptions. Section 2 discusses a basic first-order hidden Markov model for part-of-speech tagging and extensions to that model to handle out-of-lexicon words. 
The new second-order HMM is described in Section 3, and Section 4 presents experimental results and conclusions.</Paragraph> </Section> </Paper>