<?xml version="1.0" standalone="yes"?> <Paper uid="C90-1020"> <Title>AN INTEGRATED SYSTEM FOR MORPHOLOGICAL ANALYSIS OF THE SLOVENE LANGUAGE</Title> <Section position="4" start_page="85" end_page="85" type="metho"> <SectionTitle> 2. Structure </SectionTitle> <Paragraph position="0"> The system was implemented in Quintus Prolog on VAX/VMS and consists of the following parts: (1) The compiler, which takes two-level rules as its input and produces finite-state automata (transducers).</Paragraph> <Paragraph position="1"> (2) The lexicon module, which provides a user interface for creating and updating the lexicon (the lexicon input module). This module embodies the part of the knowledge of Slovene inflectional morphology that cannot be covered (elegantly) by two-level rules. It is also the part of the system responsible for passing lexical word forms to (3); this part is the lexicon output module.</Paragraph> <Paragraph position="2"> (3) The MAS itself, which, having access to the transducers and (indirectly) to the lexicon, is able to analyze Slovene word forms into their lexical counterparts and to synthesize word forms from lexical data.</Paragraph> <Paragraph position="3"> The MAS, with its knowledge of phono-morphological alternations embodied in the transducers, thus guides the lexicon module in choosing the correct lexical word from the lexicon. The MAS module is, of course, also able to synthesize words, given their lexical representation. The &quot;feeding&quot; of lexical words to the MAS is, however, application-dependent and will not be dealt with further in this paper. Nor will we discuss the workings of the compiler, as this is not the first implementation of such a compiler (Karttunen 87).</Paragraph> </Section> <Section position="5" start_page="85" end_page="85" type="metho"> <SectionTitle> 3. Lexicon module </SectionTitle> <Paragraph position="0"> A basic part of our MAS system is the lexicon.
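The division of labour among the three components of Section 2 can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the Prolog system: all names (realize, STEMS, ENDINGS, analyze, synthesize) are hypothetical, realize() is a one-rule stand-in for the compiled transducers modelled on the schwa alternation of bolezen (nom. sg.) vs. bolezni (gen. sg.) described in this section, and the lexicon holds only those two endings. Analysis here is brute-force generate-and-test, not the transducer-guided lexicon search of the actual system.

```python
"""Toy sketch of the MAS division of labour (hypothetical names throughout).

realize() stands in for the compiled two-level transducers, STEMS/ENDINGS
for the lexicon module, and analyze()/synthesize() for the MAS itself.
"""


# Hypothetical one-rule stand-in for the transducers: lexical "E"
# (unstressed e) surfaces as "e" before a null ending and is deleted
# before a non-null ending (bolezen nom. sg. vs. bolezni gen. sg.).
def realize(stem: str, ending: str) -> str:
    surface_stem = stem.replace("E", "e" if ending == "" else "")
    return surface_stem + ending


# Toy lexicon fragment: stem -> (continuation lexicon, inherent features),
# and continuation lexicon -> {ending: grammatical label}.
# Only the two endings mentioned in the text are included.
STEMS = {"bolezEn": ("decl_subst_f2", {"bv": "subst", "gen": "fem"})}
ENDINGS = {"decl_subst_f2": {"": "nom.sg", "i": "gen.sg"}}


def synthesize(stem: str, label: str) -> str:
    """Lexical representation (stem + grammatical label) -> surface word form."""
    continuation, _features = STEMS[stem]
    for ending, gram in ENDINGS[continuation].items():
        if gram == label:
            return realize(stem, ending)
    raise KeyError(f"{stem} has no form {label}")


def analyze(word: str) -> list[tuple[str, str, str]]:
    """Surface word form -> all matching (stem, ending, label) triples.

    Brute-force generate-and-test: every stem+ending pair in the lexicon is
    realized and compared with the input. The system described in the paper
    instead lets the transducers guide the walk through the lexicon.
    """
    return [
        (stem, ending, gram)
        for stem, (continuation, _features) in STEMS.items()
        for ending, gram in ENDINGS[continuation].items()
        if realize(stem, ending) == word
    ]


print(analyze("bolezni"))               # [('bolezEn', 'i', 'gen.sg')]
print(synthesize("bolezEn", "nom.sg"))  # bolezen
```

The same lexicon entry serves both directions: synthesis applies the rule forward, and analysis searches for the lexical form whose realization matches the input.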
The structure of the lexicon follows the lexicon model of two-level morphology; that is, the lexicon is composed of letter-tree sub-lexicons (tries), each consisting of morphemes with a common property. We can have, for instance, a sub-lexicon for stems, another for the endings of a masculine noun declension, another for the conjugational endings of certain verbs, and so on. A set of sub-lexicons is marked as initial, meaning that a (recognizable) word can only start with a member of one of these sub-lexicons. The other sub-lexicons are connected to the initial sub-lexicons through pointers; typically, they hold the inflectional paradigms of various word classes.</Paragraph> <Paragraph position="1"> An entry in a sub-lexicon consists of three parts: (1) the &quot;morpheme&quot;, which, in stem sub-lexicons (two-level rules aside), is the invariant part of the lexeme's stem, written in the symbols of the lexical alphabet; (2) the continuation sub-lexicon(s) of the morpheme; (3) the morpho-syntactic features of the morpheme.</Paragraph> <Paragraph position="2"> To illustrate, consider the entry &quot;bolezEn decl_subst_f2 / bv=subst gen=fem;&quot;. Its three parts are: (1) bolezEn, the stem of the lexeme &quot;illness&quot;, where the lexical symbol &quot;E&quot; denotes an unstressed &quot;e&quot; (schwa sound) that is deleted in word forms with non-null endings (&quot;bolezen&quot;, nom. sg., but &quot;bolezni&quot;, gen. sg.); (2) decl_subst_f2, the name of the sub-lexicon containing the endings of the second feminine declension; (3) bv=subst gen=fem, the inherent morpho-syntactic properties of the lexeme (noun, feminine gender).</Paragraph> <Paragraph position="3"> The lexicon system can thus take care of the regular paradigms of inflecting words of the language (at least for suffixing languages such as Slovene), while the two-level rules handle phono-morphological alternations. Slovene, however, abounds in alternations that are lexically conditioned. This is not to say that no rules can be constructed to cover these alternations, but rather that they are not (purely) phonologically conditioned.
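The sub-lexicon organization and entry format just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Prolog implementation: the class and function names (Entry, SubLexicon, parse_entry, recognize) and the case/num feature names on the two endings are hypothetical, and recognition matches lexical-alphabet strings directly, without any two-level rules.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    morpheme: str                # (1) string in the lexical alphabet ("" = null ending)
    continuations: list[str]     # (2) names of continuation sub-lexicons
    features: dict[str, object]  # (3) morpho-syntactic features


@dataclass
class TrieNode:
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    entries: list[Entry] = field(default_factory=list)


class SubLexicon:
    """A letter tree (trie) holding morphemes with a common property."""

    def __init__(self, name: str):
        self.name = name
        self.root = TrieNode()

    def add(self, entry: Entry) -> None:
        node = self.root
        for ch in entry.morpheme:
            node = node.children.setdefault(ch, TrieNode())
        node.entries.append(entry)

    def matches(self, text: str, start: int):
        """Yield (entry, end) for every stored morpheme that is a prefix of text[start:]."""
        node = self.root
        yield from ((e, start) for e in node.entries)  # null morphemes
        for offset, ch in enumerate(text[start:], start=1):
            node = node.children.get(ch)
            if node is None:
                return
            yield from ((e, start + offset) for e in node.entries)


def parse_entry(line: str) -> Entry:
    """Parse an entry such as 'bolezEn decl_subst_f2 / bv=subst gen=fem;'."""
    head, _, tail = line.strip().rstrip(";").partition("/")
    morpheme, *continuations = head.split()
    features = {}
    for item in tail.split():
        key, sep, value = item.partition("=")
        features[key] = value if sep else True  # bare flags such as -anim
    return Entry(morpheme, continuations, features)


def recognize(word, lexicons, initial):
    """Return (morphemes, features) for every way word can be built,
    starting in an initial sub-lexicon and following continuation pointers."""
    results = []

    def walk(names, pos, morphs, feats):
        for name in names:
            for entry, end in lexicons[name].matches(word, pos):
                m, f = morphs + [entry.morpheme], {**feats, **entry.features}
                if entry.continuations:
                    walk(entry.continuations, end, m, f)
                elif end == len(word):  # no continuation: the word must end here
                    results.append((m, f))

    walk(initial, 0, [], {})
    return results


# Toy fragment: one stem and the two endings of decl_subst_f2 named in the text.
stems = SubLexicon("stems")
stems.add(parse_entry("bolezEn decl_subst_f2 / bv=subst gen=fem;"))
f2 = SubLexicon("decl_subst_f2")
f2.add(Entry("", [], {"case": "nom", "num": "sg"}))
f2.add(Entry("i", [], {"case": "gen", "num": "sg"}))
lexicons = {"stems": stems, "decl_subst_f2": f2}

print(recognize("bolezEni", lexicons, initial=["stems"]))
# [(['bolezEn', 'i'], {'bv': 'subst', 'gen': 'fem', 'case': 'gen', 'num': 'sg'})]
```

Because bolezEn is stored in lexical form, the surface form bolezni is not recognized by this sketch alone; relating it to bolezEn + i is the job of the two-level rules.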
For instance, one alternation affects only masculine nouns with the &quot;animate&quot; property, and another pertains only to the plural and dual of certain Slovene nouns. Since two-level rules are sensitive only to the form of the word (string) they process, they cannot express such alternations.</Paragraph> <Paragraph position="4"> To handle lexically conditioned alternations, we have concentrated on the linking mechanism between the sub-lexicons. Along with a pointer to another sub-lexicon, the &quot;continuation&quot; information of an entry can also include a list of lexical alternations. When word forms are accessed from the lexicon, these alternations tell the lexicon output module how to modify the continuation sub-lexicon to express the desired changes. The rules governing such modifications can perform a number of primitive &quot;transformational&quot; operations on the sub-lexicon in question.</Paragraph> <Paragraph position="5"> To make the point clearer, we give a simple case of an alternation that affects certain masculine nouns. The alternation &quot;j epenthesis&quot; inserts a &quot;j&quot; in stem-final position in word forms with a non-null ending; e.g. &quot;krompir&quot; (potato), but &quot;krompirja&quot; (gen. sg.). The lexicon entry is &quot;krompir decl_subst_m(pod_j) / bv=subst gen=mas -anim;&quot;. When the lexicon output module &quot;jumps&quot; to the continuation lexicon, the &quot;pod_j&quot; item triggers the corresponding alternation in the morphological rule base of the system. The alternation procedure takes the continuation lexicon as its input, modifies it, and returns the modified lexicon (with &quot;j&quot; prefixed to the non-null grammatemes). Analysis then proceeds with the entries of the modified lexicon.</Paragraph> </Section> </Paper>