! 
MORPHOLOGICAL PRODUCTIVITY 
IN THE LEXICON 
Onur T. ~ehito~lu and H. Cem Bozsahin 
Laboratory for the Computational Studies of Language 
Department of Computer Engineering 
Middle East Technical University, Ankara, Turkey 
{onur, bozsahin}@LcsL, metu. edu. tr 
Abstract 
In this paper we outline a lexical organization for Turkish that makes use of lexical rules for inflec- 
tions, derivations, and lexical category changes to control the proliferation of lexical entries. Lexical 
rules handle changes in grammatical roles, enforce type constraints, and control the mapping of sub- 
categorization frames in valency changing operations. A lexical inheritance hierarchy facilitates the 
enforcement of type constraints. Semantic compositions in inflections and derivations are constrained 
by the properties of the terms and predicates. 
The design has been tested as part of a HPSG grammar for Turkish. In terms of performance, 
run-time execution of the rules seems to be a far better alternative than pre-compilation. The latter causes 
exponential growth in the lexicon due to intensive use of inflections and derivations in Turkish. 
1 Introduction 
Languages like Finnish, Hungarian, and Turkish have relatively rich morphology which governs grammatical 
functions often delegated to syntax in languages such as English. Prominence of morphology puts a greater 
demand on the information in the lexicon, which may grow to an unmanageable size due to heavy use of 
inflections and derivations. In Turkish, for instance, the nominal paradigm has three affixes (number, case, 
relativizer), and the verbal paradigm has eight (for voice, tense, person, aspect, and mood). Generating 
the full paradigm for a nominal and a verbal root requires 2 3 and 2 8 entries in the lexicon, respectively. 
The problem is further complicated by the rich inventory of derivational affixes for both paradigms, as 
exemplified in 11 Hankamer \[7\] argues convincingly that full listing of every word form in the lexicon is 
untenable for agglutinative languages. 
(1) Yaz-tct-lar-a gOr-ev-ler-i bil-dir-il-me-mipti 
write-VtoN-PLU-DAT able-VtoN-PLU-ACC know-CAUS-PASS-NEG-ASP-TENSE 
'The clerks have not been informed of their duties' 
Handling inflections and derivations with lexical rules opens us possibilities for encoding semantic and 
grammatical changes in the lexicon as well. For instance, a causative suffix will demote an agent to a patient 
or a recipient, and it will add a new grammatical role for the causer (the new agent). A locative case suffix 
will mark a NP as an adjunct, which can no longer satisfy subcategorization requirements of the verbs or 
postpositions. We elaborate on the consequences of these phenomena in section 3. 
Another source for economy of representation can be seen in example (2), where attributive adjectives are 
used as nouns in 2b and 2d. One solution to this problem is syntactic underspecification, e.g., grouping the 
105 
nouns and adjectives under a single lexical category. 1 An alternative is to use a lexical rule for differentiating 
predicate and term reading of the lexical entry. 
(2) a. kuru yaprak 
dry leaf 
'dry leaf' 
b. meyve kuru-su 
fruit dry-POSS 
'dried fruit' 
c. ya#-h hantm 
age-ADJ lady 
'old lady' 
d. biitiin ya#-h-lar 
all age-ADJ-PLU 
'all elderly' 
In what follows, we will describe different kinds of lexical rules for type constraints, and handling 
changes in grammatical roles or subcategorization requirements. We also discuss processing issues such as 
run-time generation versus pre-compiling of word forms. 
2 Morphology-syntax Interface 
Modelling inflections, derivations, and the corresponding phonological alternations via lexical rules amounts 
to the lexicalization of morphology. The alternatives to this approach (for Turkish) have also been explored, 
e.g., the modularization of syntax and morphology by keeping them (and their lexicons) as separate systems 
that communicate with each other \[5\], or integrating morphology, syntax and semantics, thus treating 
morphotactics in the same manner as syntax with respect to semantic composition \[1\]. From a computational 
point of view, the modular approach has efficient lexical access since lexical search is performed on root 
forms, and bound morphemes are not considered lexical items. In the integrated (multi-dimensional) 
approach, the lexicon contains free and bound morphemes; they have complete syntactic and semantic 
specifications. Some of the inflections, e.g. person and number, do not have any contribution to semantics, 
hence their semantic form (or LF) is that of identity. Some inflections, such as case and causative affixes, 
compose semantic form of the stem (LFs) with that of the affix. LF, can be turned into (cause z LF,) for 
causatives where z is the new argument introduced by the causative affix. 2 Similar arguments can be made 
for the semantic contribution of adjunct case markers. 
The lexical approach to morphology presented here is a mid-point in the design of the morphology-syntax 
interface. In this view, morphology is not isolated from syntax, but, similar to the modular organization, 
bound morphemes are not considered lexical items. They can be attached to stems via lexical rules. 
This implies that lexical rules are responsible for semantic composition and for the changes in syntactic 
requirements. This view also represents a middle ground in the complexity of lexical structures. 
I In fact, traditional Turkish grammar books such as \[ 10\] collectively call them "substantives:' 
2cf. example 9 
106 
Keeping morphology and syntax entirely separate forces one to stipulate different scopes for affixes. For 
instance, the adverbial suffix -ken and the adjectival -lu might have phrasal (3a and 3c) or lexical scope (3b 
and 3d). Multi-dimensional approach allows affixes to 'pick out' different scopes in mixed morphological 
and syntactic composition. The lexical approach can accomodate both readings, provided that lexical rules 
are invoked with relevant syntactic information, e.g., valency of the verb. Morphologically ambiguous cases 
such as 4 are handled by multiple instantiations of the lexical rules. 
(3) a. ~ocuk top-a \[kaleci-ye bakar\]-ken vurdu 
child ball-DAT goalkeeper-DAT look-ADV hit 
'The child hit the ball facing the goalkeeper.' 
b. ~ocuklar \[yiiriir\]-ken tan toplamt#lar 
children walk-ADV stone picked 
'The children had picked stones while walking.' 
C. \[Uzun kol\]-lu g6mlek 
long sleeve-ADJ shirt 
'shirt with long sleeves' 
d. Uzun \[9igek\]-li g6mlek 
long fiower-ADJ shirt 
'long shirt with flower patterns' 
(4) a. kalem-ler-i b. kalem-ler-i c. kalem-leri 
pencil-PLU-ACC pencil-PLU-POSS.3SG pencil-POSS.3PL 
'the pencils (=OBJ)' 'his/her pencils' 'their pencils' 
It is too early to evaluate the advantages and disadvantages of these approaches in terms of competence 
grammars and performance issues. But the choice of the strategy also affects the design of lexical organi- 
zation. For instance, if inflections and derivations are handled by lexical rules, the morphological features 
need not be kept in the lexicon, since the lexical rules will reflect the changes in syntactic and semantic 
requirements coming from morphology. If morphology is treated almost like syntax, lexical knowledge 
should contain richer morphological information, including a semantic representation for bound forms (af- 
fixes), information about boundedness/freeness of morphemes, and the type of attachment (e.g., affixation, 
cliticization, syntactic concatenation) \[1, 8\]. This will enable the system to rule out, for instance, affixation 
of two free forms, or impose selectional restrictions on the stems of affixes. 
In this study, a lexical inheritance hierarchy is used in conjunction with the lexical rules to obtain type 
constraints and feature structures for free forms (words); bound forms are not part of the lexicon. The 
hierarchy is given in Figure 1. 
This tree is part of a greater hierarchy which includes inheritance information for words and phrases. We 
make use of the inheritance and type-checking mechanism of ALE \[2\] to impose type-specific constraints 
on words. Words are distinguished from phrases by disallowing any kind of gapping below the word level 
in the tree. Designating a lexical item as one of the subtypes in the hierarchy will apply all the constraints 
and incorporate the feature structures of the supertypes along the path to word. For instance, a qualitative 
adjective (e.g., rahat=comfortable) is distinguished from a quantitative one (e.g., gift=double) by its choice 
of modifiers; the latter does not allow intensifiers (5). 
107 
word 
comma 
p~o;er- relativized-l art~ " - verb_l 
demonstrative_l qualitative_l ~ 
infinitival_l adverbial_l finite_l 
relative_l complement_l 
subj_re~obj_rel l 
Figure 1: Lexical hierarchy 
(5) a. gok rahat koltuk 
very comfortable couch 
'very comfortable couch' 
b. * gok gift koltuk 
C. rahat gift koltuk 
comfortable double couch 
'comfortable twin couch' 
The fragments 3 of the type constraints for these subtypes are given in Figure 2. The controlled use of 
type constraints at different levels of the lexical hierarchy eliminate the need to enumerate type-specific 
lexical rules to achieve the same effect. 
3 Types of lexical rules 
Inflections: Lexical rules for inflections can check morphotactic constraints for proper ordering of mor- 
phemes. More importantly, they should reflect the grammatical or semantic requirements imposed by 
inflections. For instance, the locative case suffix in Turkish also marks an NP as adjunct (6). 
(6) Adam araba-da uyu-du 
man car-LOC sleep-TENSE.3SG 
'The man slept in the car' 
The lexical rule for locative case is given (in ALE notation) in Figure 3. This rule is applied when the 
locative suffix is attached to a nominal stem. The head of the NP is marked with the locative case, and 
3We use HPSG style feature structures and signatures in our descriptions. See Pollard and Sag \[13\]. 
!08 
quantitative-l \[\]\] 
SYNSEM I LOCAL I CAT I HEAD 
quantitative-adj 
MOD 
\[\] 
MODSYN 
MODADJ 
qualitative-l 
SYNSEM LOCAL I CAT I HEAD 
"qualitative-adj 
MOD 
VlODSYN 
\[\] 
MODADJ 
LOCAL I CAT 
QUANT - \] 
QUANT-ADJ 
QUAL-ADJ \[~ 
NON-REF \[\] 
"HEAD common 
"QUANT - 
QUANT-ADJ ADJUNCTS 
QUAL-ADJ 
NON-REF 
LOCALICAT 
"HEAD r OUANT \[\] 
ADJUNCTS /QUANToADJ 
LNON-REF \[\] 
"QUANT \[\] ?1 QUANT-ADJ 
QUAL-ADJ 
NON-REF \[\] 
Figure 2: Type constraints for words and some subtypes. 
the type of NP is changed to an adjunct. This is achieved by modifying the head feature MOD: While the 
nominative marked noun has null value, a MODSYN value with verbal head is introduced in the head feature 
of the locative noun. This will allow the locative marked noun to modify a verb. Thus, it cannot satisfy the 
subcategonzation requirements of verbs or postpositions. This issue is critical for parsing relatively free 
word-order languages where grammatical relations are often indicated by overt case marking rather than 
structural position. Figure 3 also shows the derivation of the semantic representation for the case marked 
NP; at(x,y) is a second-order predicate that holds between a term z and a predicate y. This predicate is 
inserted into the set of restrictions for the noun. Although this method is not generative in the sense of \[14\], 
it allows semantic composition in the lexicon. 
Derivations: Denominal verbs, deverbal nouns, and part of speech changes can be modelled respectively 
by adding subcategorization frames, discharging subcategorization frames, and type coercions, via lexical 
rules. The most difficult issue in derivations is the semantic composition, For instance, the -CI morpheme 
(with allomorph s -ct/-ci/-cu/-cii/-ft/-fi/-~u/-fii) adds the meaning "doer/user of something" (7a), "seller/lover 
of something" (7b), or habitual (7c). 
(7) a. yol -cu 
road 
'traveller' 
109 
"word 
PHON 
SYNSEM 
\[\] 
word 
PHON 
LOCAL 
NONLOCAL 
r"°"" |CASE nominativ 
3ATIHEAD ,MOD u~\] J|REL J 
LPRED 
3ONT \[INDEX LRESTR %\] 
\[\] 
SYNSEM 
\[\] @ (-del-dal-tel-ta) 
"CAT I HEAD 
LOCAL 
'INDEX 
RESTR 
CONT 
NONLOCAL \[\] 
"noun 
CASE locative 
\[ \[\] -1\] MOD MODSYNILOCAL 'CONT 
REL \[\] 
PRED \[\] 
\[\] 
NUCLEUS 
QUANTS 
( WHAT 
LWHERE ~\]'> I~ 
< > 
Figure 3: Lexical rule for the locative case. 
b. ~;eker -ci 
candy 
'candy seller or lover' 
c. sabah -qt 
morning 
'morning person' 
Clearly, this ambiguity cannot be resolved without incorporating into lexical semantics a Qualia Structure 
a la Pustejovsky \[14\], or lexical semantic constraints \[4\]. We have been incorporating these types of 
constraints. Unfortunately, descriptive work on Turkish linguistics in this regard is very scarce, and there is 
no ontology such as Levin's \[9\]. Using features like \[Tanimate\], \[:Fartifact\], \[=Fcontainer\], and \[=Fperiod\], 
one can define semantic fields for the derivational morphemes. We expand the set of features as more lexical 
items are added to the lexicon. This is a very labour intensive task; the lack of a large-scale initiative on 
lexicography in the manner of LDOCE or COBUILD is hindering the efforts for automatic extraction of 
lexical knowledge from on-line resources. 
Our strategy is to obtain complex forms derivationally if the semantic relation of the bound morpheme 
to its stem is fairly predictable. We use lexicalized forms when the meaning is not compositional. One such 
ii0 
case is the denominal verb suffix -le, which is very productive but has no predictable meaning that can be 
derived from the lexical semantics of the stem. 
Lexical Category Changes: As described in section 1, we model the nominal use of adjectives in Turkish 
by a single lexical item which may be interpreted as a term or a predicate by a lexical rule. There are 
other linguistic phenomena that are on the boundary of lexicon and syntax, which we opted to contain in 
the lexicon, e.g., non-referential objects, and valency change in the causatives. In the following, we briefly 
describe the lexical rules for them. 
Case assignment is overt in Turkish, which allows for scrambling of the constituents, All six permutations 
of the SOV order are felicitous if the object NP is case marked (e.g., 8a and 8b). If the object is non-referential 
or indefinite (cf. 8a and 8c), it is not marked morphologically, which blocks scrambling, and the unmarked 
SOV order is used (cf. 8c and 8d). 
(8) a. ~ocuk kitab-t oku-du 
child.NOM book-ACC(=object) read-TENSE.3SG 
'The child read the book.' 
b, Kitab-t 9ocuk oku-du 
c. (ocuk kitap oku-du 
child.NOM book.ACC read-TENSE.3SG 
'the child read a book (~ the child did book-reading)' 
d. * Kitap 9ocuk okudu 
Non-referential objects are not inflected, and they must occupy the immediately preverbal position. 
One way of dealing with nouns, then, is to keep two entries in the lexicon: one for unmarked form which 
may receive case marking and scramble, and one with lexically assigned case (accusative), which may not 
scramble. Our solution is to have a lexical rule that changes the subcategofization frames of verbs to handle 
cases where objects may be case-marked NPs or unmarked Ns. In the second case, the entity is marked 
indefinite and all scrambling is blocked by the lexical rule. Figure 4 shows the lexical rule in ALE notation 
(the rule is simplified for ease of exposition). 
Causatives can be modelled in a similar vein. A causative suffix changes the subcategofization frame of 
the verb by adding one more argument and changing the grammatical constraints on the other arguments. 
For instance, the new argument becomes the subject (causer), and the old subject (agent) is demoted down 
the grammatical hierarchy \[3\] to direct object or indirect object, depending on the valency of the verb: 
(9) a. Can arkadaut-m 9a~tr-dt 
ffiend-POSS-ACC call-TENSE.3SG 
'Can called his friend.' 
b. Mehmet Can-a arkada#-t-m cagtr-t-n 
Can-DAT friend-POSS-ACC calI-CAUS-TENSE.3SG 
'Mehmet had Can call his friend.' 
iii 
verb-I 
SYNSEM LOCAL I CAT 
NONLOCAL "verb-I 
SYNSEM 
\[\] I11 SUBCAT mjjj- 
"HEAD \[\] 
SUBCAT 
LOCAL I CAT 
NONLOCAL \[\] 
(\[\], 
r.  oP o mo. 11\] CAT / LCASE 
n°minativeJll 
LADJUNCTSlNON'REF + J/ 
CONT \[\] j 
raove-obj ect (\[\],\[~}, \[ CAT I HEAD \[~A~E accusative\]\]) 
LCONT \[\] 
Where move-obj ect is a definite clause which deletes the accusative object from the SUBCAT structure in first argument and return resulting 
structure and accusative object in second and third argument respectively. 
Figure 4: Lexical rule for non-referential objects. 
Morphophonemic rules: The rules for inflectional and defivational morphology might also take into 
account the archiphonemes that are not marked for certain features. For instance, the locative case marker 
has allomorphs -de/-da/-te/-ta. They may be represented uniquely by two metaphonemes -DA where D is 
a dental stop unmarked for voice and A is a low unround vowel unmarked for backness/frontness. Vowel 
harmony and voicing constraints 4 determine their surface realization during morphological composition. 
These kinds of rules are not lexical rules per se since they do not operate on lexical properties of the words. 
In our model, they are embedded in lexical rules for inflections and derivations. 
4 Conclusion 
For a language with rich morphology, lexical rules can be used for controlled generation of surface forms. 
Inflections and derivations can be seen as word-based (local) operations on the root, and thus be modelled 
as lexical rules. Phonological alternations in stems can be embedded in the rules as well. Grammatical role 
changes, type constraints on word subtypes, and noun to NP promotions (as in non-referential objects) control 
the proliferation of lexical entries. Semantic contribution of inflections seems to be morpheme specific: All 
derivations take part in semantic composition, but some inflections (such as case and causatives) contribute 
semantically as well. Most inflections (e.g., person and number markers), however, have grammatical 
functions only. This is not to say they do not have a semantic form, just that in many cases the form is that 
of identity. Productive use of derivations is limited by the predictability of the semantic relation of the stem 
to the affix. 
We have been testing our lexicon design as part of an HPSG grammar for Turkish \[15\]. The grammar 
development environment, ALE, had to be modified to allow run-time evaluation of lexical rules. Compiling 
4cf. \[11, 12\] for a description of these processes. \[6\] is the original work on Turkish that combines finite state morphotactics 
with morphophonemic alternations. 
112 
out the lexical rules seems to be impractical, since generating every possible form for a large lexicon of 
roots causes exponential growth in the lexicon. Compilation of all surface forms for a lexicon of only 40 
root forms produces around 2800 entries, and takes about 8 minutes on a Sun Sparcstation 10. Run-time 
execution of rules puts the burden on parsing or generation. We believe that as the lexicons of NLP systems 
become more comprehensive and open-ended, the trade-off will be resolved in favour of using the lexical 
rules on demand at the expense of slower performance. 
Acknowledgements: This research is supported in part by grants from NATO Science Division SfS 
III (contract TU-LANGUAGE), and Scientific and Technological Research Council of Turkey (contract 
EEEAG-90). 

References 
Cem Bozsahin and Elvan G6~men. A categorial framework for composition in multiple linguis- 
tic domains. In Proceedings of the Fourth International Conference on Cognitive Science of NLP 
(CSNLP'95), Dublin, Ireland, July 1995. 
Bob Carpenter and Gerald Penn. The Attribute Logic Engine User's Guide, Version 2.0. Carnegie 
Mellon University, Pittsburgh, August 1994. 
Bernard Comrie. The syntax of causative constructions: cross-language similarities and divergences. 
In Shibatani and Masayoshi, editors, Syntax and Semantics 6: The grammar of causative constructions. 
Academic Press, 1976. 
Dan Fass. Lexical semantic constraints. In James Pustejovsky, editor, Semantics and the Lexicon. 
Kluwer, 1993. 
Zelal Giingfrdti and Kemal Oflazer. Parsing Turkish using the Lexical-Functional Grammar formalism. 
Machine Translation, 10:293-319, 1995. 
Jorge Hankamer. Finite state morphology and left to right phonology. In Proceedings of the West 
Coast Conference on Formal Linguistics (WCCFL-5), Stanford, 1986. 
Jorge Hankamer. Morphological parsing and the lexicon. In W. Marslen-Wilson, editor, Lexical 
Representation and Process. MIT Press, 1989. 
Jack Hoeksema and Richard D. Janda. Implications of process-morphology for categorial grammar. 
In D. Wheeler R.T. Oehrle, E. Bach, editor, Categorial Grammars and Natural Language Structures. 
Dordrecht, 1988. 
Beth Levin, English verb classes and alternations: a preliminary investigation. University of Chicago 
Press, 1993. 
Geoffrey L. Lewis. Turkish Grammar. Oxford University Press., Oxford, UK, 1967. 
Kemal Oflazer. Two-level description of Turkish morphology. Literary and Linguistic Computing, 
9(2), 1994. 
\[12\] Murat Oztaner. A word grammar of Turkish with morphophonemic rules. Master's thesis, Middle 
East Technical University, January 1996. 
\[13\] Carl Pollard and Ivan A. Sag. Head-driven Phrase Structure Grammar. CSLI Chicago, 1994. 
\[14\] James Pustejovsky. The generative lexicon. Computational Linguistics, 17(4):409-441, 1991. 
\[15\] Onur Sehitoglu. A sign-based phrase structure grammar for Turkish. Master's thesis, Middle East 
Technical University, January 1996. 
