<?xml version="1.0" standalone="yes"?> <Paper uid="E95-1020"> <Title>Distributional Part-of-Speech Tagging Hinrich Schütze</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> As online text becomes available in ever-increasing volumes and in an ever-increasing number of languages, there is a growing need for robust processing techniques that can analyze text without expensive and time-consuming adaptation to new domains and genres. This need motivates research on fully automatic text processing that may rely on general principles of linguistics and computation, but does not depend on knowledge about individual words.</Paragraph> <Paragraph position="1"> In this paper, we describe an experiment on the fully automatic derivation of the knowledge necessary for part-of-speech tagging. Part-of-speech tagging is of interest for a number of applications, for example access to text databases (Kupiec, 1993), robust parsing (Abney, 1991), and general parsing (deMarcken, 1990; Charniak et al., 1994).</Paragraph> <Paragraph position="2"> The goal is to find an unsupervised method for tagging that relies on general distributional properties of text, properties that are invariant across languages and sublanguages. While the proposed algorithm is not successful for all grammatical categories, it does show that fully automatic tagging is possible when demands on accuracy are modest.</Paragraph> <Paragraph position="3"> The following sections discuss related work, describe the learning procedure, and evaluate it on the Brown Corpus (Francis and Kučera, 1982).</Paragraph> </Section> </Paper>