XML Viewer - w02-2006

File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/w02-2006_abstr.xml

Size: 1,395 bytes

Last Modified: 2025-10-06 13:42:43

<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-2006">
  <Title>Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only three resources, which are currently readily available in 60-100 world languages: (1) an online or hard-copy pocket-sized bilingual dictionary, (2) a basic library reference grammar, and (3) access to an existing monolingual text corpus in the language. The algorithm begins by inducing initial lexical POS distributions from English translations in a bilingual dictionary without POS tags. It handles irregular, regular and semi-regular morphology through a robust generative model using weighted Levenshtein alignments. Unsupervised induction of grammatical gender is performed via global modeling of contextwindow feature agreement. Using a combination of these and other evidence sources, interactive training of context and lexical prior models are accomplished for fine-grained POS tag spaces. Experiments show high accuracy, fine-grained tag resolution with minimal new human effort.</Paragraph>
  </Section>
class="xml-element"></Paper>

Download Original XML