<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0311"> <Title>MORPHOLOGICAL PRODUCTIVITY IN THE LEXICON</Title> <Section position="3" start_page="0" end_page="105" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Languages like Finnish, Hungarian, and Turkish have relatively rich morphology, which governs grammatical functions that are often delegated to syntax in languages such as English. The prominence of morphology places greater demands on the information in the lexicon, which may grow to an unmanageable size due to heavy use of inflections and derivations. In Turkish, for instance, the nominal paradigm has three affixes (number, case, relativizer), and the verbal paradigm has eight (for voice, tense, person, aspect, and mood). Since each affix may be present or absent, generating the full paradigm for a nominal and a verbal root requires $2^3 = 8$ and $2^8 = 256$ entries in the lexicon, respectively.</Paragraph> <Paragraph position="1"> The problem is further complicated by the rich inventory of derivational affixes for both paradigms, as exemplified in (1). Hankamer [7] argues convincingly that full listing of every word form in the lexicon is untenable for agglutinative languages.</Paragraph> <Paragraph position="3"> (1) 'The clerks have not been informed of their duties' Handling inflections and derivations with lexical rules opens up possibilities for encoding semantic and grammatical changes in the lexicon as well. For instance, a causative suffix will demote an agent to a patient or a recipient, and it will add a new grammatical role for the causer (the new agent). A locative case suffix will mark an NP as an adjunct, which can then no longer satisfy the subcategorization requirements of verbs or postpositions. We elaborate on the consequences of these phenomena in section 3.</Paragraph> <Paragraph position="4"> Another source for economy of representation can be seen in example (2), where attributive adjectives are used as nouns in 2b and 2d. 
One solution to this problem is syntactic underspecification, e.g., grouping nouns and adjectives under a single lexical category (see footnote 1). An alternative is to use a lexical rule that distinguishes the predicate and term readings of the lexical entry.</Paragraph> <Paragraph position="5"> (2) a. kuru yaprak dry leaf 'dry leaf' b. meyve kuru-su fruit dry-POSS 'dried fruit' c. yaş-lı hanım age-ADJ lady 'old lady' d. bütün yaş-lı-lar all age-ADJ-PLU 'all elderly' In what follows, we describe different kinds of lexical rules for imposing type constraints and for handling changes in grammatical roles or subcategorization requirements. We also discuss processing issues such as run-time generation versus pre-compilation of word forms.</Paragraph> </Section> <Section position="4" start_page="105" end_page="107" type="metho"> <SectionTitle> 2 Morphology-syntax Interface </SectionTitle> <Paragraph position="0"> Modelling inflections, derivations, and the corresponding phonological alternations via lexical rules amounts to the lexicalization of morphology. Alternatives to this approach have also been explored for Turkish, e.g., modularizing syntax and morphology by keeping them (and their lexicons) as separate systems that communicate with each other [5], or integrating morphology, syntax, and semantics, thus treating morphotactics in the same manner as syntax with respect to semantic composition [1]. From a computational point of view, the modular approach offers efficient lexical access, since lexical search is performed on root forms and bound morphemes are not considered lexical items. In the integrated (multi-dimensional) approach, the lexicon contains both free and bound morphemes, each with complete syntactic and semantic specifications. Some inflections, e.g., person and number, make no contribution to semantics; hence their semantic form (LF) is the identity. 
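As an illustration of the difference between identity LFs and LFs that compose with the stem, here is a small Python sketch. It is not the paper's ALE encoding: logical forms are modelled as nested tuples, and the affix labels (`PERSON`, `NUMBER`, `CAUS`) and the helper `apply_affix` are hypothetical.

```python
# Hypothetical sketch: affixes as functions from the stem's LF to a new LF.
# LFs are nested tuples, e.g. ("read", "x", "y"); all names are illustrative.

def identity_lf(stem_lf, new_arg=None):
    """Person/number inflections: no semantic contribution, LF is unchanged."""
    return stem_lf


def causative_lf(stem_lf, new_arg):
    """Causative: LF_s becomes (cause z LF_s), z being the new causer argument."""
    return ("cause", new_arg, stem_lf)


# Assumed affix inventory, not taken from the paper's lexicon.
AFFIX_SEMANTICS = {
    "PERSON": identity_lf,
    "NUMBER": identity_lf,
    "CAUS": causative_lf,
}


def apply_affix(stem_lf, affix, new_arg=None):
    """Compose an affix's semantic contribution with the stem's LF."""
    return AFFIX_SEMANTICS[affix](stem_lf, new_arg)


lf = ("read", "x", "y")
lf = apply_affix(lf, "CAUS", new_arg="z")  # add the causer z
lf = apply_affix(lf, "PERSON")             # identity: LF unchanged
print(lf)  # ('cause', 'z', ('read', 'x', 'y'))
```

Person and number leave the LF untouched, while the causative wraps it in the (cause z LF_s) form; adjunct case markers could be added to the same table in a similar way.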
Other inflections, such as case and causative affixes, compose the semantic form of the stem ($LF_s$) with that of the affix. For causatives, $LF_s$ can be turned into $(\mathit{cause}\ z\ LF_s)$, where $z$ is the new argument introduced by the causative affix (see footnote 2). Similar arguments can be made for the semantic contribution of adjunct case markers.</Paragraph> <Paragraph position="1"> The lexical approach to morphology presented here is a midpoint in the design of the morphology-syntax interface. In this view, morphology is not isolated from syntax, but, as in the modular organization, bound morphemes are not considered lexical items; instead, they are attached to stems via lexical rules.</Paragraph> <Paragraph position="2"> This implies that lexical rules are responsible both for semantic composition and for changes in syntactic requirements. This view also represents a middle ground in the complexity of lexical structures.</Paragraph> <Paragraph position="3"> Footnote 1: In fact, traditional Turkish grammar books such as [10] collectively call them &quot;substantives&quot;. Footnote 2: Cf. example (9). Keeping morphology and syntax entirely separate forces one to stipulate different scopes for affixes. For instance, the adverbial suffix -ken and the adjectival -lu might have phrasal (3a and 3c) or lexical scope (3b and 3d). The multi-dimensional approach allows affixes to 'pick out' different scopes in mixed morphological and syntactic composition. The lexical approach can accommodate both readings, provided that lexical rules are invoked with the relevant syntactic information, e.g., the valency of the verb. Morphologically ambiguous cases such as (4) are handled by multiple instantiations of the lexical rules.</Paragraph> <Paragraph position="4"> (3) a. Çocuk top-a [kaleci-ye bakar]-ken vurdu child ball-DAT goalkeeper-DAT look-ADV hit 'The child hit the ball facing the goalkeeper.' b. Çocuklar [yürür]-ken taş toplamışlar children walk-ADV stone picked 'The children had picked stones while walking.' c. 
[Uzun kol]-lu gömlek long sleeve-ADJ shirt 'shirt with long sleeves' d. Uzun [çiçek]-li gömlek long flower-ADJ shirt 'long shirt with flower patterns' (4) a. kalem-ler-i 'the pencils (=OBJ)' b. kalem-ler-i 'his/her pencils' c. kalem-leri 'their pencils'</Paragraph> <Paragraph position="6"> It is too early to evaluate the advantages and disadvantages of these approaches with respect to competence grammars and performance. However, the choice of strategy also affects the design of lexical organization. For instance, if inflections and derivations are handled by lexical rules, morphological features need not be kept in the lexicon, since the lexical rules will reflect the changes in syntactic and semantic requirements that come from morphology. If morphology is treated almost like syntax, lexical knowledge must contain richer morphological information, including a semantic representation for bound forms (affixes), information about the boundedness or freeness of morphemes, and the type of attachment (e.g., affixation, cliticization, syntactic concatenation) [1, 8]. This enables the system, for instance, to rule out the affixation of two free forms, or to impose selectional restrictions on the stems that affixes attach to.</Paragraph> <Paragraph position="7"> In this study, a lexical inheritance hierarchy is used in conjunction with lexical rules to obtain type constraints and feature structures for free forms (words); bound forms are not part of the lexicon. The hierarchy is given in Figure 1.</Paragraph> <Paragraph position="8"> This tree is part of a larger hierarchy that includes inheritance information for both words and phrases. We use the inheritance and type-checking mechanism of ALE [2] to impose type-specific constraints on words. Words are distinguished from phrases by disallowing any kind of gapping below the word level in the tree. 
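Outside ALE, the effect of constraint inheritance along the path to word can be approximated with a parent map and per-type constraint tables. The sketch below is an assumption-laden illustration: the type names (including `qualitative_adj` and `quantitative_adj`, anticipating the adjective contrast discussed next), the features `gap`, `cat`, and `intensifiable`, and the clash check standing in for unification failure are all hypothetical; Figure 1 gives the actual hierarchy.

```python
# Hypothetical type hierarchy fragment: child -> parent, rooted at "word".
PARENT = {
    "adjective": "word",
    "qualitative_adj": "adjective",
    "quantitative_adj": "adjective",
}

# Constraints declared locally on each type (illustrative features only).
LOCAL_CONSTRAINTS = {
    "word": {"gap": "none"},  # no gapping below the word level
    "adjective": {"cat": "adj"},
    "qualitative_adj": {"intensifiable": True},
    "quantitative_adj": {"intensifiable": False},
}


def path_to_root(t):
    """Return the types from t up to the root, most specific first."""
    path = [t]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def inherited_constraints(t):
    """Merge constraints from the root down; a conflicting value is treated
    as a type clash, loosely mimicking unification failure."""
    merged = {}
    for ty in reversed(path_to_root(t)):
        for feat, val in LOCAL_CONSTRAINTS.get(ty, {}).items():
            if feat in merged and merged[feat] != val:
                raise ValueError(f"type clash on {feat} at {ty}")
            merged[feat] = val
    return merged


print(inherited_constraints("quantitative_adj"))
# {'gap': 'none', 'cat': 'adj', 'intensifiable': False}
```

Assigning rahat to `qualitative_adj` and çift to `quantitative_adj` would then yield all supertype constraints at once; ALE's typed feature structures and full unification are not modelled here.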
Assigning a lexical item to one of the subtypes in the hierarchy applies all the constraints, and incorporates the feature structures, of the supertypes along the path up to the type word. For instance, a qualitative adjective (e.g., rahat 'comfortable') is distinguished from a quantitative one (e.g., çift 'double') by its choice of modifiers: the latter does not allow intensifiers (5).</Paragraph> <Paragraph position="9"> (5) a. çok rahat koltuk very comfortable couch 'very comfortable couch' b. * çok çift koltuk c. rahat çift koltuk comfortable double couch 'comfortable twin couch' Fragments of the type constraints for these subtypes (see footnote 3) are given in Figure 2. The controlled use of type constraints at different levels of the lexical hierarchy eliminates the need to enumerate type-specific lexical rules to achieve the same effect.</Paragraph> </Section> <Section position="5" start_page="107" end_page="107" type="metho"> <SectionTitle> 3 Types of lexical rules </SectionTitle> <Paragraph position="0"> Inflections: Lexical rules for inflections can check morphotactic constraints to ensure the proper ordering of morphemes. More importantly, they should reflect the grammatical or semantic requirements imposed by inflections. For instance, the locative case suffix in Turkish also marks an NP as an adjunct (6).</Paragraph> <Paragraph position="1"> The type of the NP is changed to an adjunct. This is achieved by modifying the head feature MOD: whereas the nominative-marked noun has a null MOD value, a MODSYN value with a verbal head is introduced in the head features of the locative noun. This allows the locative-marked noun to modify a verb; consequently, it cannot satisfy the subcategorization requirements of verbs or postpositions. This issue is critical for parsing languages with relatively free word order, where grammatical relations are often indicated by overt case marking rather than by structural position. 
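The locative rule's effect can be sketched as a function from one feature structure to another. This is a hypothetical Python rendering rather than the rule in Figure 3: feature structures are plain dictionaries, the keys (`case`, `mod`, `can_fill_subcat`, `restr`) and the sample entry for kalem 'pencil' are our own simplifications, and the at(x, y) restriction anticipates the semantic side described in the following sentences.

```python
import copy


def locative_rule(noun_fs):
    """Hypothetical lexical rule: unmarked (nominative) noun -> locative adjunct.

    Feature structures are plain dicts; the keys are illustrative, not ALE syntax.
    """
    if noun_fs["case"] != "nom" or noun_fs["mod"] is not None:
        raise ValueError("rule applies only to unmarked (nominative) nouns")
    out = copy.deepcopy(noun_fs)
    out["form"] = noun_fs["form"] + "-DA"  # suffix left in archiphonemic form
    out["case"] = "loc"
    out["mod"] = {"head": "verb"}          # MOD now selects a verbal head
    out["can_fill_subcat"] = False         # adjuncts cannot fill V or P subcat slots
    # Semantic side: add at(x, y) to the noun's restrictions, where y is a
    # placeholder for the predicate the adjunct will modify.
    out["restr"] = noun_fs["restr"] + [("at", noun_fs["index"], "y")]
    return out


kalem = {"form": "kalem", "case": "nom", "mod": None, "can_fill_subcat": True,
         "index": "x", "restr": [("pencil", "x")]}
print(locative_rule(kalem)["restr"])  # [('pencil', 'x'), ('at', 'x', 'y')]
```

The input entry is copied rather than modified, so the unmarked noun remains available for other rules (e.g., accusative marking).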
Figure 3 also shows the derivation of the semantic representation for the case-marked NP; at(x,y) is a second-order predicate that holds between a term x and a predicate y. This predicate is inserted into the set of restrictions for the noun. Although this method is not generative in the sense of [14], it allows semantic composition in the lexicon.</Paragraph> <Paragraph position="2"> Derivations: Denominal verbs, deverbal nouns, and part-of-speech changes can be modelled via lexical rules by, respectively, adding subcategorization frames, discharging subcategorization frames, and type coercion. The most difficult issue in derivations is semantic composition. For instance, the -CI morpheme (with allomorphs -cı/-ci/-cu/-cü/-çı/-çi/-çu/-çü) adds the meaning &quot;doer/user of something&quot; (7a), &quot;seller/lover of something&quot; (7b), or a habitual meaning (7c).</Paragraph> <Paragraph position="3"> Clearly, this ambiguity cannot be resolved without incorporating into lexical semantics a Qualia Structure à la Pustejovsky [14], or lexical semantic constraints [4]. We have been incorporating these types of constraints. Unfortunately, descriptive work on Turkish linguistics in this regard is very scarce, and there is no ontology such as Levin's [9]. Using features such as [±animate], [±artifact], [±container], and [±period], one can define semantic fields for the derivational morphemes. We expand the set of features as more lexical items are added to the lexicon. This is a very labour-intensive task; the lack of a large-scale lexicographic initiative in the manner of LDOCE or COBUILD hinders efforts toward the automatic extraction of lexical knowledge from on-line resources.</Paragraph> <Paragraph position="4"> Our strategy is to obtain complex forms derivationally if the semantic relation of the bound morpheme to its stem is fairly predictable. We use lexicalized forms when the meaning is not compositional. 
One such case is the denominal verb suffix -le, which is very productive but has no predictable meaning that can be derived from the lexical semantics of the stem.</Paragraph> <Paragraph position="5"> Lexical Category Changes: As described in section 1, we model the nominal use of adjectives in Turkish by a single lexical item, which a lexical rule may interpret as a term or as a predicate. Other linguistic phenomena on the boundary between lexicon and syntax, which we chose to handle in the lexicon, include non-referential objects and valency change in causatives. In the following, we briefly describe the lexical rules for them.</Paragraph> <Paragraph position="6"> Case assignment is overt in Turkish, which allows constituents to scramble. All six permutations of the SOV order are felicitous if the object NP is case-marked (e.g., 8a and 8b). If the object is non-referential or indefinite (cf. 8a and 8c), it is not marked morphologically, which blocks scrambling, and the unmarked SOV order is used (cf. 8c and 8d).</Paragraph> <Paragraph position="7"> (8) a. Çocuk kitab-ı oku-du child.NOM book-ACC(=object) read-TENSE.3SG 'The child read the book.' b. Kitab-ı çocuk oku-du c. Çocuk kitap oku-du 'The child read a book (~ the child did book-reading)' d. * Kitap çocuk okudu</Paragraph> <Paragraph position="9"> Non-referential objects are not inflected, and they must occupy the immediately preverbal position. One way of dealing with such nouns, then, is to keep two entries in the lexicon: one for the unmarked form, which may receive case marking and scramble, and one with lexically assigned (accusative) case, which may not scramble. Our solution is instead to have a lexical rule that changes the subcategorization frames of verbs to handle cases where objects may be either case-marked NPs or unmarked Ns. In the second case, the entity is marked indefinite and all scrambling is blocked by the lexical rule. 
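A rough Python sketch of such a rule follows, assuming that a verb's object slot can be expanded into alternative frames; the function names `object_frames` and `licensed_orders` and all frame keys are hypothetical, and this is not the paper's ALE formulation.

```python
import itertools


def object_frames(verb):
    """Hypothetical rule: expand a transitive verb's object slot into two frames."""
    marked = {"verb": verb, "obj_cat": "NP", "obj_case": "acc",
              "definite": True, "scramble": True}    # case-marked NP, e.g. (8a, 8b)
    bare = {"verb": verb, "obj_cat": "N", "obj_case": None,
            "definite": False, "scramble": False}    # unmarked N, e.g. (8c)
    return [marked, bare]


def licensed_orders(frame):
    """Word orders a frame allows: all six permutations of S, O, V if the
    object may scramble, otherwise only the unmarked S O V order."""
    if frame["scramble"]:
        return sorted(itertools.permutations("SOV"))
    return [("S", "O", "V")]


for frame in object_frames("oku"):
    print(frame["obj_cat"], len(licensed_orders(frame)))
# NP 6
# N 1
```

The bare-N frame licenses only the unmarked SOV order, as in 8c versus 8d, while the case-marked frame licenses all six permutations.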
Figure 4 shows the lexical rule in ALE notation (the rule is simplified for ease of exposition).</Paragraph> <Paragraph position="10"> Causatives can be modelled in a similar vein. A causative suffix changes the subcategorization frame of the verb by adding one more argument and changing the grammatical constraints on the other arguments. For instance, the new argument becomes the subject (causer), and the old subject (agent) is demoted down the grammatical hierarchy [3] to direct object or indirect object, depending on the valency of the verb. Morphophonemic rules: The rules for inflectional and derivational morphology may also need to take into account archiphonemes that are unmarked for certain features. For instance, the locative case marker has the allomorphs -de/-da/-te/-ta. They can be represented uniformly as -DA, using two archiphonemes: D, a dental stop unmarked for voicing, and A, a low unrounded vowel unmarked for backness. Vowel harmony and voicing constraints (see footnote 4) determine their surface realization during morphological composition. These rules are not lexical rules per se, since they do not operate on the lexical properties of words. In our model, they are embedded in the lexical rules for inflections and derivations.</Paragraph> </Section> </Paper>