<?xml version="1.0" standalone="yes"?> <Paper uid="J86-4001"> <Title>ASSOCIATIVE MODEL OF MORPHOLOGICAL ANALYSIS: AN EMPIRICAL INQUIRY</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 THE PROBLEM DECOMPOSED </SectionTitle> <Paragraph position="0"> Henceforth we call lexical entry or lexeme the basic word form that serves as a carrier of morpho-syntactic information about a word in a lexicon. Word form or simply form denotes an inflected lexeme. The grammatical representation of a word form is called grammatical word or word. The grammatical word of a lexeme in our model is singular nominative case for nominals and first infinitive for verbs. This is the practice of printed Finnish mono- and bilingual dictionaries and lexicons.</Paragraph> <Paragraph position="1"> There is a one-to-many (one-to-thousands) mapping between lexemes and their valid forms. Finding an effective inverse mapping is the analysis problem. Two distinct lexemes may, and frequently do, produce identical forms (homographs). Consequently, the analysis of a form is occasionally ambiguous, and it is paramount for the analysis to find all lexemes a given form represents.</Paragraph> <Paragraph position="2"> Any native Finn is able to produce without effort the proper phonemic (or graphemic) word form of a given lexeme to fit a given context. When various dialects are reduced into standard written orthographic Finnish, the lexeme takki ('coat'), to take a random example, may appear in text in any of the following forms:</Paragraph> <Paragraph position="4"> Similarly, we can take any verb lexeme, say jakaa ('deliver'), and list all its distinct forms: (2) { jakaa, jaan, jaat, jakavat, jaoin, jakanevat, jakakoot, jakaisimme, jaetaankohan, jaettaisiinkohan, ... } A native speaker, when browsing lists such as (1) or (2), would accept the elements as valid forms and, furthermore, easily assign each its basic form, its lexeme. 
An average adult Finn would, for instance, spontaneously recognize the ninth element in (1) as a properly inflected form of takki. But if asked to interpret the form, he would, after some hesitation, probably not be able to extract and identify the attached affixes.</Paragraph> <Paragraph position="5"> The basic meaning-bearing unit in a word is a morpheme. The linear arrangements of morpheme classes in Finnish word forms are (Ikola 1977): Brackets are used to distinguish optional morpheme classes. VERBNOM (verb nominal) comprises two participle and four infinitive morphemes. Some of the classes are cross-categorial. COMPAR (comparison) may appear only with adjectival stems or with participial forms. Past tense can be joined only with the indicative mood; tense and mood are therefore expressed in a common class.</Paragraph> <Paragraph position="6"> Two clitics may be attached in a row, on rare occasions even three. In speech a < b means b follows a in time; in writing b is to the right of a.</Paragraph> <Paragraph position="7"> Once the morpheme classes and their ordering are made explicit, it is easy to isolate the representatives of the morpheme classes in any given word form. A noun, say takkeinensakohan from (1), when matched against (3) is analyzed as takke+i+ne+nsa+ko+han, corresponding to a stem, plural, comitative case, third person possessive, and two clitic morphemes. Similarly, a randomly picked complex verb form such as jaettaisiinkohan from (2) reads as jae+tta+isi+:n+ko+han, representing a stem, passive, conditional (and hence present tense), passive suffix (serving as a kind of &quot;fourth&quot; person), and two clitic morphemes.</Paragraph> <Paragraph position="8"> Each class has a small number of elements, and each morpheme has at most a few allomorphs (phonemic realizations). From the strategic point of view the morpheme classes fall into two categories. Morphemes other than stems have a closed set of allomorphs. 
They can be recognized with a finite set of rules. Stems constitute a large and unbounded set of morphemes, as new lexemes may develop and enter the vocabulary. Hence the allomorphic variants of stems cannot be directly used in a closed set of rules. To bound the set of rules, the rules for stems must recognize invariant parts of phoneme strings, not entire stem alternants.</Paragraph> <Paragraph position="9"> The original problem was thus decomposed into two parts: The Morphotactic Problem segments word forms and solves the allomorph relation for morphemes other than stems; The Stem Alternation Problem solves the residual allomorph relation for stems.</Paragraph> </Section> </Paper>