<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3229">
<Title>A Resource-light Approach to Russian Morphology: Tagging Russian using Czech resources</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Morphological processing and part-of-speech tagging are essential for many NLP tasks, including machine translation, information retrieval and parsing. In this paper, we describe a resource-light approach to the tagging of Russian. Because Russian is a highly inflected language with a high degree of morpheme homonymy (cf. Table 1 [1]), the tags involved are more numerous and elaborate than those typically used for English. This complicates the tagging task, although, as has been previously noted (Elworthy, 1995), the increased complexity of the tags does not necessarily translate into a more demanding tagging task. Because no large annotated corpora of Russian are available to us, we instead chose to use an annotated corpus of Czech. Czech is sufficiently similar to Russian that it is reasonable to suppose that information about Czech will be relevant in some way to the tagging of Russian.</Paragraph>
<Paragraph position="1"> The languages share many linguistic properties (free word order and a rich morphology which plays a considerable role in determining agreement and argument relationships). We created a morphological analyzer for Russian, combined the results with information derived from Czech and used the TnT (Brants, 2000) tagger in a number of different ways, including a committee-based approach, which turned out to give the best results. To evaluate the results, we morphologically annotated (by hand) a small corpus of Russian: part of the translation of Orwell's &quot;1984&quot; from the MULTEXT-EAST project (Véronis, 1996).</Paragraph>
<Paragraph position="2"> Table 1:
krasiv-a   beautiful (short adjective, feminine)
muž-a      husband (noun, masc., sing., genitive); husband (noun, masc., sing., accusative)
okn-a      window (noun, neuter, sing., genitive); window (noun, neuter, pl., nominative); window (noun, neuter, pl., accusative)
knig-a     book (noun, fem., sing., nominative)
dom-a      house (noun, masc., sing., genitive); house (noun, masc., pl., nominative); house (noun, masc., pl., accusative)
skazal-a   say (verb, fem., sing., past tense)
dv-a       two (numeral, masc., nominative)
[1] All Russian examples in this paper are transcribed in the Roman alphabet. Our system is able to analyze Russian texts in both Cyrillic and various transcriptions.</Paragraph>
</Section>
</Paper>