A Computational Framework for Composition in Multiple 
Linguistic Domains 
Elvan Göçmen
Computer Engineering Department 
Middle East Technical University 
06531, Ankara, Turkey 
elvan@lcsl.metu.edu.tr 
Abstract 
We describe a computational framework
for a grammar architecture in which different
linguistic domains such as morphology,
syntax, and semantics are treated not
as separate components but as compositional
domains. The framework is based on
Combinatory Categorial Grammars and
uses the morpheme as the basic building
block of the categorial lexicon.
1 Introduction 
In this paper, we address the problem of mod- 
elling interactions between different levels of lan- 
guage analysis. In agglutinative languages, affixes 
are attached to stems to form a word that may cor- 
respond to an entire phrase in a language like En- 
glish. For instance, in Turkish, word formation is 
based on suffixation of derivational and inflectional 
morphemes. Phrases may be formed in a similar 
way (1). 
(1) Yoksul-laş-tır-ıl-makta-lar
poor-V-CAUS-PASS-ADV-PERS 
'(They) are being made poor (impoverished)'. 
In Turkish, there is a significant amount of in- 
teraction between morphology and syntax. For in- 
stance, causative suffixes change the valence of the 
verb, and the reciprocal suffix subcategorizes the verb
for a noun phrase marked with the comitative case. 
Moreover, the head that a bound morpheme modi- 
fies may be not its stem but a compound head cross- 
ing over the word boundaries, e.g., 
(2) iyi oku-muş çocuk
well read-REL child 
'well-educated child' 
In (2), the relative suffix -muş (in past form of
subject participle) modifies [iyi oku] to give the
scope [[[iyi oku]muş] çocuk]. If syntactic composition
is performed after morphological composition,
we would get compositions such as [iyi [okumuş
çocuk]] or [[iyi okumuş] çocuk], which yield ill-formed
semantics for this utterance.
As pointed out by Oehrle (1988), there is no rea- 
son to assume a layered grammatical architecture 
which has linguistic division of labor into compo- 
nents acting on one domain at a time. As a computa- 
tional framework, rather than treating morphology, 
syntax and semantics in a cascaded manner, we pro- 
pose an integrated model to capture the high level of 
interaction between the three domains. The model, 
which is based on Combinatory Categorial Gram- 
mars (CCG) (Ades and Steedman, 1982; Steedman, 
1985), uses the morpheme as the building block of 
composition at all three linguistic domains. 
2 Morpheme-based Compositions 
When the morpheme is given the same status as 
the lexeme in terms of its lexical, syntactic, and 
semantic contribution, the distinction between the 
process models of morphotactics and syntax disap- 
pears. Consider the example in (3). 
(3) uzun kol-lu gömlek
long sleeve-ADJ shirt 
Two different compositions¹ in the CCG formalism
are given in Figure 1. Both interpretations are plausible,
with (1a) being the more likely in the absence
of a long pause after the first adjective. To account
for both cases, the suffix -lu must be allowed to modify
either the head it is attached to (e.g., 1b in Figure 1)
or a compound head encompassing the word boundaries
(e.g., 1a in Figure 1).
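The two readings can be traced with a minimal sketch of function application over morpheme-level entries. This is an illustration in Python, not the implementation described in this paper: the names Term, Functor, Sign, and combine are hypothetical, only forward and backward application are modelled, and semantic terms are simplified to predicate-argument trees so that the results can be compared with the strings in Figure 1.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Term:
    """Simplified semantic term: a predicate applied to arguments."""
    pred: str
    args: tuple = ()

    def __str__(self) -> str:
        return f"{self.pred}({', '.join(str(a) for a in self.args)})"


@dataclass(frozen=True)
class Functor:
    """Functor category: slash '/' seeks arg to the right, '\\' to the left."""
    res: Any
    slash: str
    arg: Any


@dataclass(frozen=True)
class Sign:
    phon: str
    cat: Union[str, Functor]
    sem: Any  # a Term, or a Python function still awaiting its argument


def combine(left: Sign, right: Sign) -> Optional[Sign]:
    """Combine two adjacent signs by application, or return None."""
    joiner = "" if right.phon.startswith("-") else " "
    phon = left.phon + joiner + right.phon
    # Forward application: X/Y  Y  =>  X
    if (isinstance(left.cat, Functor) and left.cat.slash == "/"
            and left.cat.arg == right.cat):
        return Sign(phon, left.cat.res, left.sem(right.sem))
    # Backward application: Y  X\Y  =>  X
    if (isinstance(right.cat, Functor) and right.cat.slash == "\\"
            and right.cat.arg == left.cat):
        return Sign(phon, right.cat.res, right.sem(left.sem))
    return None


ADJ = Functor("n", "/", "n")  # the category n/n
uzun = Sign("uzun", ADJ, lambda p: Term("long", (p,)))
kol = Sign("kol", "n", Term("sleeve", ("z",)))
# -lu takes a noun on its left and yields a noun modifier: (n/n)\n
lu = Sign("-lu", Functor(ADJ, "\\", "n"),
          lambda q: lambda r: Term(r.pred, ("y", Term("has", (q,)))))
gomlek = Sign("gömlek", "n", Term("shirt", ("w",)))

# (1a): -lu attaches to the compound head [uzun kol]
reading_1a = combine(combine(combine(uzun, kol), lu), gomlek)
# (1b): -lu attaches to its own stem kol; uzun modifies the whole noun
reading_1b = combine(uzun, combine(combine(kol, lu), gomlek))

print(reading_1a.sem)
print(reading_1b.sem)
```

Both derivations use the same four lexical entries; only the order of combination differs, which is the point of giving the suffix its own category.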
3 Multi-domain Combination 
Operator 
Oehrle (1988) describes a model of multi-dimensional
composition in which every domain D_i has
an algebra with a finite set of primitive operations
F_i. As indicated by Turkish data in sections 1 and 2,
F_i may in fact have a domain larger than, but compatible
with, D_i.

lexical entry   syntactic category   semantic category
uzun            n/n                  λp.long(p(z))
kol             n                    λx.sleeve(x)
-lu             (n/n)\n              λq.λr.r(y, has(q))
gömlek          n                    λw.shirt(w)

(1a) uzun kol -lu gömlek
     [derivation tree omitted; intermediate category n/n]
     shirt(y, has(long(sleeve(z)))) = 'a shirt with long sleeves'

(1b) uzun kol -lu gömlek
     [derivation tree omitted; intermediate category n/n]
     long(shirt(y, has(sleeve(z)))) = 'a long shirt with sleeves'

Figure 1: Scope ambiguity of a nominal bound morpheme

¹ Derived and basic categories in the examples are in
fact feature structures; see section 4. We use [...] to
denote the combination of categories x and y giving
the result z.
In order to perform morphological and syntactic 
compositions in a unified framework, the slash oper- 
ators of Categorial Grammar must be enriched with 
the knowledge about the type of process and the 
type of morpheme. We adopt a representation sim- 
ilar to Hoeksema and Janda's (1988) notation for 
the operator. The 3-tuple <direction, morpheme 
type, process type> indicates direction² (left, right,
unspecified), morpheme type (free, bound), and 
the type of morphological or syntactic attachment 
(e.g., affix, clitic, syntactic concatenation, reduplica- 
tion). Examples of different operator combinations 
are given in Figure 2. 
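The operator 3-tuple can be pictured as a small record type. The following Python sketch is only an illustration under stated assumptions: the class names, the accepts method, and the FIGURE_2 dictionary are hypothetical, and the six entries simply transcribe the operator and morpheme columns of Figure 2.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    LEFT = "\\"
    RIGHT = "/"
    UNSPECIFIED = "|"


class MorphemeType(Enum):
    FREE = "free"
    BOUND = "bound"


class ProcessType(Enum):
    AFFIX = "affix"
    CLITIC = "clitic"
    CONCAT = "concat"
    REDUP = "redup"


@dataclass(frozen=True)
class Operator:
    direction: Direction
    morpheme: MorphemeType
    process: ProcessType

    def accepts(self, argument_side: Direction) -> bool:
        """True if an argument on `argument_side` of the functor is allowed."""
        return self.direction in (Direction.UNSPECIFIED, argument_side)

    def __str__(self) -> str:
        return (f"<{self.direction.value}, {self.morpheme.value}, "
                f"{self.process.value}>")


# Operators of the six morphemes listed in Figure 2.
FIGURE_2 = {
    "de": Operator(Direction.LEFT, MorphemeType.BOUND, ProcessType.CLITIC),
    "-de": Operator(Direction.LEFT, MorphemeType.BOUND, ProcessType.AFFIX),
    "ap-": Operator(Direction.RIGHT, MorphemeType.BOUND, ProcessType.REDUP),
    "uzun": Operator(Direction.RIGHT, MorphemeType.FREE, ProcessType.CONCAT),
    "başka": Operator(Direction.LEFT, MorphemeType.FREE, ProcessType.CONCAT),
    "gör": Operator(Direction.UNSPECIFIED, MorphemeType.FREE,
                    ProcessType.CONCAT),
}

for morpheme, op in FIGURE_2.items():
    print(morpheme, op)
```

Keeping the morpheme type and process type next to the direction lets a single combination routine decide, for example, whether the phonological result is written as one word (affix) or two (clitic, concatenation).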
Operator              Morpheme   Example
<\, bound, clitic>    de         Ben de git-ti-m
                                 I too go-TENSE-PERS
                                 'I went too.'
<\, bound, affix>     -de        Ben-de kalem var
                                 I-LOCATIVE pen exist
                                 'I have a pen.'
</, bound, redup>     ap-        ap-açık durum
                                 INT-clear situation
                                 'Very clear situation'
</, free, concat>     uzun       uzun yol
                                 long road
                                 'long road'
<\, free, concat>     başka      bu-ndan başka
                                 this-ABLATIVE other
                                 'other than this'
<|, free, concat>     gör        kız kedi-yi gör-dü
                                 girl cat-ACC see-TENSE
                                 or
                                 kız gördü kediyi
                                 'The girl saw the cat'

Figure 2: Operators in the proposed model.

² We have not yet incorporated word-order variation
in syntax into our model. See Hoffman (1992) for
a CCG-based approach to this phenomenon.

4 Information Structure and
Tactical Constraints
Entries in the categorial lexicon have tactical constraints,
grammatical and semantic features, and a
phonological representation. Similar to HPSG (Pollard
and Sag, 1994), every entry is a signed
attribute-value matrix. Lexical and phrasal elements
are of the following f (function) sign:

    [ res  ]
    [ op   ]
    [ arg  ]
    [ phon ] f

res-op-arg is the categorial notation for the element.
phon represents the phonological string. Lexical
elements may have (a) phonemes, (b) metaphonemes
such as H for a high vowel and D for a dental
whose voicing is not yet determined, and (c) optional
segments, e.g., -(y)lA, to model vowel/consonant
drops, in the phon feature. During composition,
the surface forms of composed elements are mapped
and saved in phon. phon also allows efficient lexicon
search. For instance, the causative suffix -DHr has
eight different realizations but only one lexical entry.
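The count of eight follows from expanding the metaphonemes: two choices for D times four high vowels for H. A minimal Python sketch of that expansion is given below. It is an assumption-laden illustration, not the system's actual lexicon code: the function names realizations and lookup are hypothetical, the reading of A as a/e is an assumption (the text defines only H and D), and the sketch enumerates candidate forms without applying vowel harmony or voicing rules to pick the right one for a given stem.

```python
import itertools
import re

# Metaphoneme inventory. H and D follow the text; A = {a, e} is an assumption.
METAPHONEMES = {"H": "ıiuü", "D": "dt", "A": "ae"}


def realizations(template: str) -> list:
    """Enumerate every surface string a phon template may stand for."""
    slots = []
    # Each match is either an optional segment "(x)" or a single symbol.
    for optional, plain in re.findall(r"\((.)\)|(.)", template):
        if optional:
            slots.append(["", optional])          # segment may drop
        else:
            slots.append(list(METAPHONEMES.get(plain, plain)))
    return ["".join(combo) for combo in itertools.product(*slots)]


def lookup(surface: str, templates: list) -> list:
    """Return the lexical templates that can surface as `surface`."""
    return [t for t in templates if surface in realizations(t)]


LEXICON = ["DHr", "(y)lA"]  # one entry per suffix, leading hyphen dropped

print(realizations("DHr"))
print(realizations("(y)lA"))
print(lookup("tür", LEXICON))
```

A single template per suffix keeps the lexicon small, while the enumeration (or an equivalent matcher) lets any of the surface variants retrieve the same entry.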
Every res and arg feature has an f or p (property)
sign:

    [ syn ]
    [ sem ] p

syn and sem are the sources of grammatical (g
sign) and semantic (s sign) properties, respectively.
These properties include agreement features such as
person, number, and possessive, and selectional restrictions:

    g sign:
      cat
      nprop: person, number, poss, case, relative, form
      vprop: reflexive, reciprocal, causative, passive,
             tense, modal, aspect, person, form
      restr <cond>

    s sign:
      type
      form
      restr <cond>

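The nesting of f and p signs can be made concrete with a small data-structure sketch. The Python below is an illustration only: the class names FSign and PSign, the use of plain dictionaries for syn and sem, and the helper notation are assumptions, and the entry shown for -lu uses its surface form in phon for simplicity.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class PSign:
    """Property sign: grammatical (syn) and semantic (sem) features."""
    syn: dict = field(default_factory=dict)
    sem: dict = field(default_factory=dict)


@dataclass
class FSign:
    """Function sign: result, operator, argument, phonological string."""
    res: Union["FSign", PSign]
    op: str
    arg: Union["FSign", PSign]
    phon: str = ""


def notation(sign: Union[FSign, PSign], top: bool = True) -> str:
    """Render a sign in res-op-arg categorial notation."""
    if isinstance(sign, PSign):
        return sign.syn.get("cat", "?")
    text = f"{notation(sign.res, False)}{sign.op}{notation(sign.arg, False)}"
    return text if top else f"({text})"


def noun() -> PSign:
    return PSign(syn={"cat": "n"})


# The suffix -lu of Figure 1: takes a noun to its left, yields n/n.
lu = FSign(res=FSign(res=noun(), op="/", arg=noun()),
           op="\\", arg=noun(), phon="lu")

print(notation(lu))
```

Because res and arg may themselves be f signs, arbitrarily nested categories such as (n/n)\n fall out of the same two record types.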
A special feature value called none is used for 
imposing certain morphotactic constraints, and to 
make sure that the stem is not inflected with the 
same feature more than once. It also ensures, 
through syn constraints, that inflections are marked 
in the right order (cf., Figure 3). 
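How a none value can block repeated or misordered inflection may be seen in the following Python sketch. It is a hedged illustration rather than the paper's constraint mechanism: the function inflect, the requires/sets layout, and the feature names chosen for the suffixes are assumptions, and the ordering encoded here (plural before case, as in ev-ler-de 'in the houses') is only one example of a tactical constraint.

```python
NONE = "none"


def inflect(stem: dict, suffix: dict):
    """Return the stem's updated features, or None if a constraint fails."""
    for feature, required in suffix["requires"].items():
        if stem.get(feature, NONE) != required:
            return None
    return {**stem, **suffix["sets"]}


# A bare nominal stem: nothing has been marked yet.
ev = {"number": NONE, "poss": NONE, "case": NONE}

# The plural must attach before possessive and case marking,
# and only if number has not been marked already.
PLURAL = {"requires": {"number": NONE, "poss": NONE, "case": NONE},
          "sets": {"number": "plural"}}
LOCATIVE = {"requires": {"case": NONE},
            "sets": {"case": "locative"}}

ev_ler = inflect(ev, PLURAL)
ev_ler_de = inflect(ev_ler, LOCATIVE)      # ev-ler-de: accepted
ev_de = inflect(ev, LOCATIVE)
ev_de_ler = inflect(ev_de, PLURAL)         # *ev-de-ler: wrong order
ev_ler_ler = inflect(ev_ler, PLURAL)       # *ev-ler-ler: marked twice

print(ev_ler_de, ev_de_ler, ev_ler_ler)
```

Each suffix states which features must still be none on its argument, so both double marking and out-of-order marking fail at the same check.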
5 Conclusion 
Turkish is a language in which grammatical func- 
tions can be marked morphologically (e.g., case), 
or syntactically (e.g., indirect objects). Semantic 
composition is also affected by the interplay of mor- 
phology and syntax, for instance the change in the 
scope of modifiers and genitive suffixes, or valency 
and thematic role change in causatives. To model 
interactions between domains, we propose a catego- 
rial approach in which composition in all domains 
proceeds in parallel. As an implementation, we have
been working on the modelling of Turkish causatives 
using this framework. 
6 Acknowledgements 
I would like to thank my advisor Cem Bozsahin for 
sharing his ideas with me. This research is supported 
in part by grants from the Scientific and Technical
Research Council of Turkey (contract no. EEEAG-90),
the NATO Science for Stability Programme (contract
name TU-LANGUAGE), and the METU Graduate
School of Applied Sciences.

References 
A. E. Ades and M. Steedman. 1982. On the order 
of words. Linguistics and Philosophy, 4:517-558. 
Jack Hoeksema and Richard D. Janda. 1988. Im- 
plications of process-morphology for categorial 
grammar. In R. T. Oehrle, E. Bach, and D. 
Wheeler, editors, Categorial Grammars and Nat- 
ural Language Structures, D. Reidel, Dordrecht, 
1988. 
Beryl Hoffman. 1992. A CCG approach to free word 
order languages. In Proceedings of the 30th Annual
Meeting of the ACL, Student Session, 1992.
Richard T. Oehrle. 1988. Multi-dimensional compo- 
sitional functions as a basis for grammatical anal- 
ysis. In R. T. Oehrle, E. Bach, and D. Wheeler, 
editors, Categorial Grammars and Natural Lan- 
guage Structures, D. Reidel, Dordrecht, 1988. 
C. Pollard and I. A. Sag. 1994. Head-driven Phrase 
Structure Grammar. University of Chicago Press. 
M. Steedman. 1985. Dependencies and coordination 
in the grammar of Dutch and English. Language, 
61:523-568. 
