<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0102"> <Title>MBT: A Memory-Based Part of Speech Tagger-Generator</Title> <Section position="3" start_page="0" end_page="14" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Part of Speech (POS) tagging is a process in which syntactic categories are assigned to words. It can be seen as a mapping from sentences to strings of tags.</Paragraph> <Paragraph position="1"> Automatic tagging is useful for a number of applications: as a preprocessing stage to parsing, in information retrieval, in text-to-speech systems, in corpus linguistics, etc. The two factors determining the syntactic category of a word are its lexical probability (e.g., without context, man is more probably a noun than a verb) and its contextual probability (e.g., after a pronoun, man is more probably a verb than a noun, as in they man the boats). Several approaches have been proposed for constructing automatic taggers.</Paragraph> <Paragraph position="2"> Most work on statistical methods has used n-gram models or Hidden Markov Model-based taggers (e.g., Church, 1988; DeRose, 1988; Cutting et al., 1992; Merialdo, 1994). In these approaches, the tag sequence chosen for a sentence is the one that maximizes the product of lexical and contextual probabilities as estimated from a tagged corpus.</Paragraph> <Paragraph position="3"> In rule-based approaches, words are assigned a tag based on a set of rules and a lexicon. These rules can either be hand-crafted (Garside et al., 1987; Klein & Simmons, 1963; Green & Rubin, 1971) or learned, as in Hindle (1989) or the transformation-based error-driven approach of Brill (1992).</Paragraph> <Paragraph position="4"> In a memory-based approach, a set of cases is kept in memory. Each case consists of a word (or a lexical representation of the word) with its preceding and following context, and the corresponding category for that word in that context. 
A new sentence is tagged by selecting, for each word in the sentence and its context, the most similar case(s) in memory, and extrapolating the category of the word from these 'nearest neighbors'. A memory-based approach has features of both learned rule-based taggers (each case can be regarded as a very specific rule, and the similarity-based reasoning as a form of conflict resolution and rule selection mechanism) and of stochastic taggers: it is fundamentally a form of k-nearest neighbors (k-nn) modeling, a well-known non-parametric statistical pattern recognition technique. In its basic form, however, the approach is computationally expensive: each new word in context to be tagged has to be compared to every pattern kept in memory. In this paper we show that a heuristic case-base compression formalism (Daelemans et al., 1996) makes the memory-based approach computationally attractive.</Paragraph> </Section></Paper>
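The memory-based tagging scheme described above can be sketched as a minimal k-nn classifier. This is an illustrative toy, not the authors' MBT implementation: the case base, the feature layout (focus word, tag of the previous word, next word), and the flat overlap metric are all simplifying assumptions made here for exposition.

```python
from collections import Counter

# Hypothetical case base (illustrative, not real MBT data).
# Each case: (focus word, tag of previous word, next word) -> category.
CASE_BASE = [
    (("they", "<s>", "man"), "PRON"),
    (("man", "PRON", "the"), "VERB"),   # "they man the boats"
    (("the", "VERB", "boats"), "DET"),
    (("man", "DET", "walks"), "NOUN"),  # "the man walks"
    (("the", "<s>", "man"), "DET"),
]

def overlap(a, b):
    """Count matching feature values (a simple, unweighted similarity)."""
    return sum(1 for x, y in zip(a, b) if x == y)

def tag(features, k=1):
    """Return the majority category among the k most similar stored cases.

    In this naive form every new pattern is compared to every case in
    memory, which is the computational cost the paper's case-base
    compression addresses.
    """
    ranked = sorted(CASE_BASE,
                    key=lambda case: overlap(features, case[0]),
                    reverse=True)
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, `tag(("man", "PRON", "the"))` retrieves the stored "they man the boats" case and returns "VERB", while `tag(("man", "DET", "walks"))` returns "NOUN": the same word is disambiguated purely by the similarity of its context to stored cases.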