<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2251"> <Title>Predicting Part-of-Speech Information about Unknown Words using Statistical Methods</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Part-of-speech tagging involves selecting the most likely sequence of syntactic categories for the words in a sentence. These syntactic categories, or tags, generally consist of parts of speech, often with feature information included.</Paragraph> <Paragraph position="1"> An example set of tags can be found in the Penn Treebank project (Marcus et al., 1993). Part-of-speech tagging is useful for speeding up parsing systems and for enabling partial parsing.</Paragraph> <Paragraph position="2"> Many current systems use a Hidden Markov Model (HMM) for part-of-speech tagging. Other methods include rule-based systems (Brill, 1995), maximum entropy models (Ratnaparkhi, 1996), and memory-based models (Daelemans et al., 1996). In an HMM tagger, the Markov assumption is made so that the current word depends only on the current tag, and the current tag depends only on adjacent tags. Charniak (Charniak et al., 1993) gives a thorough explanation of the equations for an HMM model, and Kupiec (Kupiec, 1992) describes an HMM tagging system in detail.</Paragraph> <Paragraph position="3"> One important area of research in part-of-speech tagging is how to handle unknown words. If a word is not in the lexicon, then its lexical probabilities must be provided from some other source. One common approach is to use affixation rules to &quot;learn&quot; the probabilities for words based on their suffixes or prefixes. Weischedel's group (Weischedel et al., 1993) examines unknown words in the context of part-of-speech tagging. Their method creates a probability distribution for an unknown word based on certain features: word endings, hyphenation, and capitalization. 
The features to be used are chosen by hand for the system. Mikheev (Mikheev, 1996; Mikheev, 1997) uses a general-purpose lexicon to learn affix and word-ending information for tagging unknown words. His method returns a set of possible tags for unknown words, with no probabilities attached, relying on the tagger to disambiguate among them.</Paragraph> <Paragraph position="4"> This work investigates the possibility of automatically creating a probability distribution over all tags for an unknown word, instead of a simple set of tags. This can be done by building a probabilistic lexicon from a large tagged corpus (in this case, the Brown corpus) and using that data to estimate distributions for words with a given &quot;prefix&quot; or &quot;suffix&quot;. Here, prefix and suffix denote substrings that come at the beginning and end of a word, respectively, and are not necessarily morphologically meaningful.</Paragraph> <Paragraph position="5"> This predictor offers a probability distribution over possible tags for an unknown word, based solely on statistical data available in the training corpus. Mikheev's and Weischedel's systems, along with many others, use language-specific information in the form of a hand-generated set of English affixes. This paper investigates which information sources can be automatically constructed, and which are most useful in predicting tags for unknown words.</Paragraph> </Section> </Paper>