File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/96/w96-0102_abstr.xml

Size: 1,871 bytes

Last Modified: 2025-10-06 13:48:46

<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0102">
  <Title>MBT: A Memory-Based Part of Speech Tagger-Generator</Title>
  <Section position="2" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning: the part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger that can tag new text in the same way, considerably reducing the development time needed to construct a tagger.</Paragraph>
    <Paragraph position="1"> Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy on a par with that of known statistical approaches, with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is computed dynamically.</Paragraph>
  </Section>
</Paper>