Two-Level Morphology with Composition 
Lauri Karttunen, Ronald M. Kaplan, and Annie Zaenen 
Xerox Palo Alto Research Center 
Center for the Study of Language and Information 
Stanford University 
1. Limitations of "Kimmo" systems 
The advent of two-level morphology 
(Koskenniemi \[1\], Karttunen \[2\], Antworth 
\[3\], Ritchie et al. \[4\]) has made it relatively 
easy to develop adequate morphological 
(or at least morphographical) descriptions 
for natural languages, clearly superior to 
earlier "cut-and-paste" approaches to mor- 
phology. Most of the existing "Kimmo" 
systems developed within this paradigm 
consist of 
• linked lexicons stored as annotated 
letter trees 
• morphological information on the leaf 
nodes of trees 
• transducers that encode morphological 
alternations 
An analysis of an inflected word form is 
produced by mapping the input form to a 
sequence of lexical forms through the 
transducers and by composing some out- 
put from the annotations on the leaf nodes 
of the lexical paths that were traversed. 
Comprehensive morphological descrip- 
tions of this type have been developed for 
several languages including Finnish, 
Swedish, Russian, English, Swahili, and 
Arabic. Although they have several good 
features, these Kimmo-systems also have 
some limitations. The ones we want to ad- 
dress in this paper are the following: 
(1) Lexical representations tend to be 
arbitrary. Because it is difficult to write 
and test two-level systems that map 
between pairs of radically dissimilar 
forms, lexical representations in existing 
two-level analyzers tend to stay close to 
the surface forms. 
This is not a problem for morpho- 
logically simple languages like English 
because, for most words, inflected forms 
are very similar to the canonical dictionary 
entry. Except for a small number of 
irregular verbs and nouns, it is not 
difficult to create a two-level description 
for English in which lexical forms coincide 
with the canonical citation forms found in 
a dictionary. 
However, current analyzers for mor- 
phologically more complex languages 
(Finnish and Russian, for example) are not 
as satisfying in this respect. In these 
systems, lexical forms typically contain 
diacritic markers and special symbols; 
they are not real words in the language. 
For example, in Finnish the lexical 
counterpart of otin 'I took' might be 
rendered as otTa1I1n, where T, a1, and I1 
are an arbitrary encoding of morpho- 
logical alternations that determine the 
allomorphs of the stem and the past tense 
morpheme. The canonical citation form 
ottaa 'to take' is composed from 
annotations on the leaf nodes of the letter 
trees that are linked to match the input. It 
is not in any direct way related to the 
lexical form produced by the transducers. 
(2) Morphological categories are not 
directly encoded as part of the lexical 
form. Instead of morphemes like Plural or 
Past, we typically see suffix strings like +s, 
and +ed, which do not by themselves indi- 
cate what morpheme they express. 
Different realizations of the same morpho- 
logical category are often represented as 
different even on the lexical side. 
These characteristics lead to some un- 
desirable consequences: 
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 / PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
I. Generation is more cumbersome and 
less efficient than analysis. Because the 
information about morphological cate- 
gories is available only on the leaf nodes of 
the trees, many paths through the struc- 
ture may have to be tried before the right 
one is found. Some ways around this 
problem have been invented (Barton \[5\]) 
but in practice their use is limited. 
II. Annotated letter trees cannot be 
minimized. Although letter trees, anno- 
tated with morphological information, are 
a kind of finite-state network, they cannot 
be minimized because all the information 
associated with the leaf nodes would get 
lost when the branching tails are merged. 
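The effect of leaf annotations on minimization can be illustrated with a small sketch. It is not taken from any Kimmo implementation: the two-stem lexicon, the suffix strings, the category names, and all function names are hypothetical. The sketch counts the states of a letter tree before and after merging structurally identical subtrees, once with the analysis stored on the leaves and once with tags as ordinary arc symbols.

```python
def build_trie(entries):
    """Build a letter tree from (symbols, final_marker) pairs."""
    root = {"arcs": {}, "final": None}
    for symbols, final_marker in entries:
        node = root
        for sym in symbols:
            node = node["arcs"].setdefault(sym, {"arcs": {}, "final": None})
        node["final"] = final_marker  # None means "not a final state"
    return root


def count_states(node):
    """Number of nodes in the unminimized letter tree."""
    return 1 + sum(count_states(child) for child in node["arcs"].values())


def count_minimal_states(root):
    """Number of states left after merging structurally identical subtrees."""
    seen = set()

    def signature(node):
        arcs = tuple(sorted((sym, signature(child))
                            for sym, child in node["arcs"].items()))
        sig = (node["final"], arcs)
        seen.add(sig)
        return sig

    signature(root)
    return len(seen)


stems = ["walk", "talk"]  # hypothetical toy lexicon

# Kimmo style: suffix strings on the arcs, the analysis stored on the leaf.
annotated = build_trie(
    (list(stem) + [suffix], f"{stem} {category}")
    for stem in stems
    for suffix, category in [("+ed", "PAST"), ("+ing", "PROG")]
)

# Tags as ordinary arc symbols; leaves carry only a "final" flag.
tagged = build_trie(
    (list(stem) + [tag], True)
    for stem in stems
    for tag in ["+Past", "+Prog"]
)

print(count_states(annotated), count_minimal_states(annotated))
print(count_states(tagged), count_minimal_states(tagged))
```

In the annotated tree every leaf is distinct, so no two subtrees are identical and nothing can be merged. Once the tags sit on the arcs, all leaves are interchangeable and the shared tails collapse.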
The approach that we describe in this 
paper overcomes these problems and al- 
lows a representation of morphological in- 
formation that maps more easily to the 
representation found in traditional lexi- 
cons. On this basis we have constructed 
morphological analyzers for English and 
French (with Carol Neidle) at Xerox PARC. 
2. Desiderata 
We follow two simple principles: 
(1) Inflected forms of the same word 
are mapped to the same canonical dictionary 
form. This applies to both regular 
and irregular forms. For example, in our 
English analyzer the surface forms happier 
and better are directly matched with the 
lexical forms happy and good, respectively, 
rather than with some nonwords. 
(2) Morphological categories are represented 
as part of the lexical form. Instead 
of encoding morphological categories such 
as Plural, Comparative, 1stPerson as annotations 
on strings that realize them, we include 
them directly in the lexical representation. 
Consequently, our two-level representations 
of happier and better are: 
lexical level   happy   +Comp   +Adj 
surface level   happi   er      0 
lexical level   good    +Comp   +Adj 
surface level   bett    er      0 
Figure 1 
The stems are presented as the lemmas 
found in a dictionary, followed by morphological 
tags. 0 serves here as the 
epsilon symbol. Because there is no need 
to have other annotations on the lexicon 
trees, problems I and II in Section 1 have 
been eliminated. Lexical forms are always 
sequences of morphemes in their canonical 
representation. 
The only obstacle to this approach is 
that the rules that constrain the surface 
realization of lexical forms become more 
difficult to write when there is little or no 
similarity between the two levels of representation. 
Designing such rules and understanding 
their interactions is a hard task 
even with the computational assistance 
provided by a complete compiler for the 
two-level formalism (Karttunen et al. \[6\]). 
As the distance between lexical and surface 
form increases, the mapping is easier 
to describe by allowing one or more 
intermediate levels of representation. The 
solution we adopted combines the two-level 
rule formalism with the cascade 
model of finite-state morphology discussed 
by Kaplan & Kay \[7\]. 
3. Composition of two-level rules 
Our formal understanding of finite-state 
morphology is based on the 
demonstrations that both rewriting rules 
and two-level rules denote regular 
relations on strings (Kaplan \[9\]). The 
correspondence between regular relations 
and finite-state transducers and the 
closure properties of regular relations 
provide the computational and 
mathematical tools that our approach 
depends on. One of the earliest results of 
finite-state morphology is the observation 
that regular relations are closed under 
composition (Johnson \[8\], Kaplan & Kay \[7\], 
Kaplan \[9\]). Consequently, a single 
transducer can be constructed whose 
behavior is exactly the same as a set of 
transducers arranged in an ordered 
feeding cascade: 
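[Diagram: a cascade in which fst1 maps the lexical string to an 
intermediate string and fst2 maps the intermediate string to the 
surface string, shown next to the equivalent single composite 
transducer that maps the lexical string directly to the surface string.] 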
Figure 2 
This observation was originally made 
about transducers corresponding to 
phonological rewrite rules, but it applies 
to regular relations or transducers no 
matter how they are specified. Although 
regular relations in general are not closed 
under intersection, the subclass of 
relations denoted by standard two-level 
rules is closed under this operation 
(Kaplan \[9\]). Thus fst1 and fst2 in 
Figure 2 may represent either a single two- 
level rule or the intersection of any 
number of rules. 
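The closure under composition can be made concrete with a minimal sketch. It assumes epsilon-free transducers in which every arc carries exactly one input and one output symbol, which sidesteps the bookkeeping for epsilon transitions; the `FST` tuple, the function names, and the toy alphabet with the intermediate symbol `S` are hypothetical and not drawn from the systems described here.

```python
from collections import namedtuple

# arcs is a set of (source, input_symbol, output_symbol, destination) tuples.
FST = namedtuple("FST", "start finals arcs")


def compose(t1, t2):
    """Build a transducer equivalent to feeding the output of t1 into t2."""
    start = (t1.start, t2.start)
    arcs, finals, seen, todo = set(), set(), {start}, [start]
    while todo:
        state = todo.pop()
        p, q = state
        if p in t1.finals and q in t2.finals:
            finals.add(state)
        for (s1, a, b, d1) in t1.arcs:
            if s1 != p:
                continue
            for (s2, b2, c, d2) in t2.arcs:
                # An a:b arc in t1 meets a b:c arc in t2 and yields a:c.
                if s2 == q and b2 == b:
                    nxt = (d1, d2)
                    arcs.add((state, a, c, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        todo.append(nxt)
    return FST(start, finals, arcs)


def apply_down(t, symbols):
    """Return the set of output sequences that t pairs with the input."""
    results = set()

    def walk(state, i, out):
        if i == len(symbols):
            if state in t.finals:
                results.add(tuple(out))
            return
        for (s, a, b, d) in t.arcs:
            if s == state and a == symbols[i]:
                walk(d, i + 1, out + [b])

    walk(t.start, 0, [])
    return results


letters = {(0, ch, ch, 0) for ch in "chat"}
fst1 = FST(0, {0}, letters | {(0, "+pl", "S", 0)})  # lexical -> intermediate
fst2 = FST(0, {0}, letters | {(0, "S", "s", 0)})    # intermediate -> surface
composite = compose(fst1, fst2)

lexical = ["c", "h", "a", "t", "+pl"]
print(apply_down(composite, lexical))
```

Running the cascade step by step (first `fst1`, then `fst2` on each intermediate result) gives the same set of surface strings as the composite, but the composite never materializes the intermediate string.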
When the relationship between lexical 
and surface forms is complex, the descrip- 
tive task of setting up rules that relate the 
two levels can be simplified by decompos- 
ing the complex relation to a series of less 
opaque matches. For efficient recognition 
and generation, the resulting cascade can 
be reduced to a single transducer. Al- 
though it would be possible in principle to 
produce the same single transducer 
directly from two-level rules, we have 
found many cases in our descriptions of 
English and French where the composition 
approach is not only easier but also 
linguistically more justified. We describe 
one such case in detail. 
4. French compound plurals 
French plurals can be formed in a vari- 
ety of ways. Some of the most common 
patterns are illustrated in Figure 3. 
We omit here the actual two-level 
rules; what Figure 3 illustrates is simply 
the joint effect of several rules that 
constrain the realization of the plural 
morpheme and the shape of the stem in 
regular nouns. Note that the constraints here 
are local; the stem and the plural 
morpheme are in a fixed position with 
respect to each other. 
message   +masc   +pl 
message   0       s       'message' 

nez       +masc   +pl 
nez       0       0       'nose' 

cheveu    +masc   +pl 
cheveu    0       x       'hair' 

cheval    +masc   +pl 
chevau    0       x       'horse' 
Figure 3 
In compound nouns and adjectives, 
several patterns are possible: (1) only the 
first part of the compound is marked for 
the plural, (2) both are, (3) none are or (4) 
only the last is. The possible patterns and 
some examples are given in Figure 4. 
The interesting cases are those in which 
the first part needs to be pluralized. In a 
simple two-level system, the information 
about plural formation summarized in 
Figure 3 would have to be rewritten and 
adapted so that the rules could apply over 
a distance in the position just before the 
hyphen. 
No plural marking at all 
    un je-ne-sais-quoi 'a certain something' 
    des je-ne-sais-quoi 
Plural marking on the first compound 
    un chef-d'oeuvre 'masterpiece' 
    des chefs-d'oeuvre 
Plural marking on the second compound 
    une mini-jupe 'mini-skirt' 
    des mini-jupes 
Plural marking on both compounds 
    une porte-fenêtre 'French window' 
    des portes-fenêtres 
Figure 4 
The simple rules for regular plural for- 
mation illustrated in Figure 3 do not work 
for first parts of compounds because the 
affected elements are not in the same 
configuration relative to each other. Although 
it is possible to modify the rules, the new 
versions would be rather complicated and 
would not capture the simple fact that the 
plurals portes and fenêtres in portes-fenêtres 
are in themselves regular; the only thing that 
is special about the word is that plurality 
is expressed in both parts of the compound. 
We avoid these complications by creat- 
ing a cascade of two-level rules in which 
the first stage is only concerned with the 
plurals of compounds. It starts from a lexi- 
cal form in which the words are marked 
for the pattern that they take and creates 
an intermediate level in which the 
information about number and gender is 
distributed over the agreeing parts. This is 
illustrated in Figure 5 for the masculine 
plural of social-démocrate, a word in which 
both parts get pluralized. 
lexical level        social   0       0     -démocrate   +DPL   +masc   +pl 
intermediate level   social   +masc   +pl   -démocrate   0      +masc   +pl 
Figure 5 
The effect of the first stage of rules is to 
copy the morphological tags from the end 
of the compound to the middle whenever 
the +DPL (double plural) diacritic is pre- 
sent. 
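The first stage can be approximated by a short function. This is only an illustrative sketch, not the two-level rules themselves: it works on whole stems and tags rather than on aligned symbol pairs, it omits the 0 placeholders shown in Figure 5, and the function name and list representation are assumptions made for the example.

```python
DIACRITIC = "+DPL"  # double-plural marker, as in Figure 5


def distribute_tags(symbols):
    """Copy the trailing tags after the first part when +DPL is present.

    `symbols` is a list such as ["social", "-démocrate", "+DPL", "+masc", "+pl"],
    with the compound parts first and all tags at the end.
    """
    parts = [s for s in symbols if not s.startswith("+")]
    tags = [s for s in symbols if s.startswith("+") and s != DIACRITIC]
    if DIACRITIC not in symbols:
        return parts + tags  # simple nouns and other compounds pass through
    return [parts[0]] + tags + parts[1:] + tags


print(distribute_tags(["social", "-démocrate", "+DPL", "+masc", "+pl"]))
print(distribute_tags(["message", "+masc", "+pl"]))
```

Forms without the diacritic come out unchanged, which is what lets the next layer treat simple nouns and compounds alike.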
The second layer of rules applies 
uniformly to simple nouns as well as 
compounds. In the case at hand, the two 
plurals in sociaux-démocrates are realized in the 
regular way, as shown in Figure 6. 
sociau   0   x   -démocrate   0   s 
Figure 6 
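Since the actual two-level rules are omitted, the following sketch substitutes a few textbook spelling patterns that cover only the examples in Figure 3 and Figure 6. The patterns are assumptions (they ignore well-known exceptions to the -al and -eu patterns), and the function names are hypothetical. The input is an intermediate-level sequence in which each pluralized part is followed by its own tags.

```python
def realize_plural(stem):
    """Spell out the plural of one stem (simplified, assumed patterns)."""
    if stem.endswith(("s", "x", "z")):
        return stem                   # nez -> nez
    if stem.endswith("al"):
        return stem[:-1] + "ux"       # cheval -> chevaux
    if stem.endswith(("eu", "au")):
        return stem + "x"             # cheveu -> cheveux
    return stem + "s"                 # message -> messages


def realize(symbols):
    """Map an intermediate-level symbol sequence to a surface string."""
    surface = []
    for sym in symbols:
        if sym == "+pl":
            # The plural tag affects the part immediately before it.
            surface[-1] = realize_plural(surface[-1])
        elif sym.startswith("+"):
            continue                  # other tags have no surface form here
        else:
            surface.append(sym)
    return "".join(surface)


print(realize(["cheval", "+masc", "+pl"]))
print(realize(["social", "+masc", "+pl", "-démocrate", "+masc", "+pl"]))
```

Because the tags have already been distributed, the same local patterns handle both parts of the compound without any long-distance context.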
By first intersecting the rules in each set 
and then composing the results in the way 
shown in Figure 2, we end up with a 
transducer that eliminates the intermedi- 
ate level altogether and maps the lexical 
representation directly to the correct sur- 
face form, and vice versa. Figure 7 illus- 
trates the final result. 
social   0   -démocrate   +DPL   +masc   +pl 
sociau   x   -démocrate   0      0       s 
Figure 7 
The representation in Figure 7 fulfills 
the desiderata laid out in Section 2 except 
that it contains a special diacritic +DPL that 
marks the behavior of social-démocrate 
with respect to plural formation. In the 
next section, we show how that diacritic 
can be eliminated. 
5. Composition with the lexicon 
By choosing the canonical dictionary 
form as the lexical form in our English and 
French analyzers and by including mor- 
phological categories directly as part of 
that representation, we have eliminated 
the need for additional annotations in the 
lexical structure that are common in exist- 
ing Kimmo systems. We can treat the letter 
tree as a simple finite-state network in 
which all morphological information is 
carried on the branches of the tree and not 
on the leaves. 
Taking this idea one step further, we 
may think of the lexicon as a trivial first 
stage in a cascade of transducers that maps 
between the lexical and the surface levels. 
The second stage is the two-level rule sys- 
tem. In the case of our analyzers for 
English and French, the rule system starts 
out with three levels but reduces to two by 
intersection and composition. The final 
stage is the composition of the rule system 
with the lexicon. 
This progression of pushing the original 
Kaplan & Kay \[7\] program to its logical 
conclusion is depicted in Figure 8. 
[Diagram: Stages 1 to 4. The LEXICON and the rule transducers are 
combined step by step, by intersection (&) within each rule level and 
by composition (o) across levels, until Stage 4, LEXICON o FST 1 o FST 2, 
maps directly to the surface string.] 
Figure 8 
Figure 8 sketches the construction of 
our morphological analyzers for English 
and French. Arrows labeled with & repre- 
sent intersection, arrows marked with o 
stand for composition. (We have simpli- 
fied this picture slightly by omitting the 
composition of small bookkeeping rela- 
tions that are necessary to model properly 
the interpretation of epsilon transitions in 
two-level rules.) 
Stage 1 consists of two parallel two- 
level rule systems arranged in a cascade, 
as illustrated in Section 4. In Stage 2, the 
rules on each level have been intersected 
to a single transducer. Stage 3 shows the 
composition of the two-level rule systems 
to a single transducer and Stage 4 repre- 
sents the final result: a transducer that 
maps sequences of canonical dictionary 
forms and morphological categories to the 
corresponding surface forms, and vice 
versa. Although the conceptual picture is 
quite straightforward, the actual com- 
putations to produce the structures can be 
resource intensive, in some cases quite 
impractical. 
At the last stage, when the idiosyncratic 
behavior of particular lexical items has 
been taken into account in the composition 
of the lexicon with the rule transducers, all 
morphological diacritics such as the +DPL 
tag for French nouns with double plurals 
can be eliminated because the rules that 
depend on them have been applied. In full 
compliance with our desiderata in Section 
2, the resulting transducer maps, among 
other things, social-démocrate+masc+pl 
directly to sociaux-démocrates, and vice 
versa. 
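The last step can be sketched by treating the final transducer as a finite relation. Everything here is an assumption made for illustration: the two-entry lexicon, the `+sg` and `+fem` tags, and the `toy_rules` function, which stands in for the composed rule transducers and knows only two spelling patterns. The point of the sketch is that the diacritic is consulted while the relation is built and is absent from the result.

```python
def toy_rules(lemma, diacritics, tags):
    """Stand-in for the rule cascade: pluralize the marked compound parts."""
    if "+pl" not in tags:
        return lemma
    parts = lemma.split("-")
    # +DPL marks every part for plural; otherwise only the last part is marked.
    marked = range(len(parts)) if "+DPL" in diacritics else [len(parts) - 1]
    for i in marked:
        if parts[i].endswith("al"):
            parts[i] = parts[i][:-1] + "ux"
        else:
            parts[i] = parts[i] + "s"
    return "-".join(parts)


def build_lexical_transducer(lexicon, rules):
    """Pair every lexical form with its surface form, in both directions."""
    generate, analyze = {}, {}
    for lemma, gender, diacritics in lexicon:
        for number in ("+sg", "+pl"):
            tags = (gender, number)
            surface = rules(lemma, diacritics, tags)
            lexical = lemma + "".join(tags)  # the diacritics are left out
            generate[lexical] = surface
            analyze.setdefault(surface, set()).add(lexical)
    return generate, analyze


lexicon = [
    ("social-démocrate", "+masc", {"+DPL"}),
    ("mini-jupe", "+fem", set()),
]
generate, analyze = build_lexical_transducer(lexicon, toy_rules)

print(generate["social-démocrate+masc+pl"])
print(analyze["sociaux-démocrates"])
```

Generation and analysis are then lookups in the two directions of the same relation, and `toy_rules` is no longer needed once the tables exist.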
6. Discussion 
Finite-state morphology rests on the 
observation that ordinary morphological 
alternations involve regular relations. This is 
the basis of the early work by Kaplan and 
Kay \[7\] on converting ordered rewrite 
rules to a cascade of transducers and the 
parallel transducers of Koskenniemi's two- 
level model \[1\]. In recent times the two- 
level model has been more popular. It has 
turned out (Karttunen \[10\]) that parallel 
two-level constraints are even expressive 
enough to account for phenomena that 
require rule ordering in the classical 
phonological rule formalism. But there is 
no computational or theoretical reason to 
insist on two-level descriptions. Because 
the mathematical properties of rewrite and 
two-level rules are now well-understood 
(Kaplan \[9\], Ritchie \[11\]), we can compose 
any n-level description to just two levels. 
In our work on English and French mor- 
phology we came across many instances in 
which the introduction of an extra level is 
both practical and linguistically motivated. 
The case of French compound plurals is a 
typical example. 
Our success in composing the rule sys- 
tem with the lexicon (Stage 4 in Figure 8) is 
due to a number of fortunate characteris- 
tics that morphological alternations and 
lexicons of natural languages seem to have 
even though they are not necessary or 
even probable from a formal point of view. 
We at least were surprised by some of our 
results. The most important of these 
delightful discoveries are: 
(1) Small case studies can be mis- 
leading. The composition of a rule trans- 
ducer against a lexicon containing a hand- 
ful of words is so much larger than the 
input lexicon that one is tempted to con- 
clude that the method can never succeed 
on a large scale. However, this blowup 
seems not to occur when the lexicon is 
already large. 
(2) Intersections and compositions of 
rule transducers tend to be large, but not 
nearly as large as they might be. The 
result of intersecting a few dozen two- 
level rules may have thousands or tens of 
thousands of states, but not trillions as the 
worst case scenario predicts. Many rules 
tend to apply either in quite similar or in 
quite different environments. The finite 
state machinery can represent such pat- 
terns without multiplying state sets. 
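The gap between the worst case and what is actually built can be seen in a small product construction. The sketch below intersects two toy automata over plain letters; in a two-level setting each symbol would be a lexical:surface pair. The automata, the tuple layout `(start, finals, delta)`, and the function names are hypothetical. Only state pairs reachable from the start pair are created.

```python
def intersect(dfa1, dfa2):
    """Product construction restricted to reachable state pairs."""
    start1, finals1, delta1 = dfa1
    start2, finals2, delta2 = dfa2
    start = (start1, start2)
    delta, finals, seen, todo = {}, set(), {start}, [start]
    symbols = {sym for (_, sym) in delta1} & {sym for (_, sym) in delta2}
    while todo:
        pair = todo.pop()
        p, q = pair
        if p in finals1 and q in finals2:
            finals.add(pair)
        for sym in symbols:
            if (p, sym) in delta1 and (q, sym) in delta2:
                nxt = (delta1[p, sym], delta2[q, sym])
                delta[pair, sym] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return start, finals, delta


def states_of(dfa):
    """All states mentioned in a (start, finals, delta) automaton."""
    start, _, delta = dfa
    return {start} | {src for (src, _) in delta} | set(delta.values())


def accepts(dfa, word):
    start, finals, delta = dfa
    state = start
    for sym in word:
        if (state, sym) not in delta:
            return False  # a missing transition means rejection
        state = delta[state, sym]
    return state in finals


# Constraint 1: no "aa". Constraint 2: no "bb". Both have two states.
no_aa = (0, {0, 1}, {(0, "a"): 1, (0, "b"): 0, (1, "b"): 0})
no_bb = (0, {0, 1}, {(0, "b"): 1, (0, "a"): 0, (1, "a"): 0})
both = intersect(no_aa, no_bb)

worst_case = len(states_of(no_aa)) * len(states_of(no_bb))
print(len(states_of(both)), "reachable states; worst case", worst_case)
```

With only two tiny constraints the saving is a single state, but the bound grows as the product of all the state counts while the reachable part grows only as fast as the constraints genuinely interact.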
(3) Composition with the lexicon 
reduces the complexity of rule inter- 
actions. It might turn out that the 
composition of a large lexicon with an 
even larger rule transducer is bigger than 
either one of the input structures. In 
reality, the size of the result seems to be 
somewhere in the middle. The rules 
constrain the realization of all possible 
lexical forms. In the composition, their 
scope is restricted to just the forms that 
actually exist in the language. It turns out 
that this restriction makes the result 
smaller rather than larger even though the 
lexicon itself is a very irregular collection 
of forms. 
The fact that it is possible to construct a 
lexical transducer for the whole language 
raises interesting theoretical issues. In lin- 
guistics it is commonly assumed that lexi- 
cal entries and the rules for realizing them 
exist independently from one another. 
That assumption is, of course, also the 
starting point of the work that we are re- 
porting about in this paper. The initial 
separation between the lexicon and the 
rules is useful in constructing a system for 
word recognition and generation. The 
rules are, in a sense, a decomposition of a 
very complex mapping between lexical 
and surface forms to a set of simpler rela- 
tions that we can comprehend and ma- 
nipulate. But in the construction of the fi- 
nal result individual rules and the distinct 
lexicon disappear. The rules play no role at 
all in the actual generation and recognition 
process. They are needed only for the pur- 
pose of enlarging the lexicon, although 
other acquisition methods can be envi- 
sioned. The rules are true generalizations 
about the two-level lexicon constructed 
with them but they are not a part of it. 
In linguistics the psychological reality of 
rules is often taken to be established by the 
observation that a simple listing of all 
forms would be not only implausible but 
even impossible, given that the brain must 
have some storage limitations. The general 
organization of a system like the one we 
have described suggests that the role of 
rules might be quite different. Instead of 
being essential for the production and 
comprehension of speech, the rules that 
linguists are trying to discover may be--if 
they exist in the mind at all--only sec- 
ondary reflections on the generalizations 
that can be encoded in the finite-state lexi- 
cal structure itself. 
Summary 
This article describes a new use of regular transducers in 
morphological analysis. Standard Kimmo systems consist of a 
lexicon in the form of a character tree (trie), whose final nodes are 
annotated with morphological information, and a set of transducers 
that map lexical representations to inflected forms. Although these 
systems are superior to earlier "cut-and-paste" techniques of 
morphological analysis, they have a number of disadvantages: the 
lexical forms are often arbitrary and differ from the lemmas of an 
ordinary dictionary; the morphological analysis is not encoded 
directly in the lexical form. As a result, generation is often more 
laborious than analysis and the structures are not optimal. 
The morphological analyzers built at Xerox PARC for French and 
English rest on two simple principles: 1. the inflected forms of the 
same word are based on the same lemma; 2. morphological 
categories are an integral part of the lexical form. Thus lexical forms 
are always sequences of morphemes. It is difficult to achieve these 
two desirable results within a classical two-level description 
because the distance between the lexical forms and the surface 
forms is large and very difficult to describe with a single set of 
two-level phonological rules. 
These problems can be solved by exploiting the principles of 
two-level phonology more thoroughly. By way of example, the 
article describes a cascade of two-level rules that permits a simple 
description of the plural of compound words in French. The first 
set of rules inserts number and gender annotations after each 
element of words with a double plural (social-démocrate → 
sociaux-démocrates); the second set of rules determines the 
realization of number and gender required by the stems. 
The mathematical properties of regular transducers are well 
known. They allow transducers corresponding to systems of 
two-level rules to be combined by composition and by intersection. 
It is thus possible to reduce a multi-level system to a single 
transducer that simultaneously controls all the morphological 
alternations of a language. Since in the English and French lexicons 
developed at Xerox all morphological information is encoded 
directly with the lemma, it is possible to go further and compose 
the entire lexicon with the rules. The resulting transducer, a 
two-level lexicon, maps lexical forms directly to surface forms and 
vice versa. The rules are used only in the construction phase. 
Analysis and generation use only the resulting lexical transducer. 

References 

\[1\] Koskenniemi, K. Two-level Morphology. A General Computational 
Model for Word-Form Recognition and Production. Department of 
General Linguistics. University of Helsinki. 1983. 

\[2\] Karttunen, L. KIMMO: a General Morphological Processor. Texas 
Linguistics Forum, 22:217-228. 1983. 

\[3\] Antworth, E. L. PC-KIMMO: a two-level processor for morphological 
analysis. Occasional Publications in Academic Computing No. 16, 
Summer Institute of Linguistics, Dallas, Texas. 1990. 

\[4\] Ritchie, G. D., G. J. Russell, A. W. Black, S. G. Pulman. 
Computational Morphology. Practical Mechanisms for the English 
Lexicon. The MIT Press, Cambridge, MA. 1991. 

\[5\] Barton, E., R. Berwick, E. Ristad. Computational Complexity and 
Natural Language. The MIT Press, Cambridge, MA. 1987. 

\[6\] Karttunen, L., K. Koskenniemi, and R. M. Kaplan. A Compiler for 
Two-level Phonological Rules. In Dalrymple, M. et al. Tools for 
Morphological Analysis. Center for the Study of Language and 
Information. Stanford University. Palo Alto. 1987. 

\[7\] Kaplan, R. M. and M. Kay. Phonological rules and finite-state 
transducers \[Abstract\]. Linguistic Society of America Meeting 
Handbook. Fifty-sixth Annual Meeting, December 27-30, 1981. 
New York. 

\[8\] Johnson, C. Douglas. Formal Aspects of Phonological Description. 
Mouton. The Hague. 1972. 

\[9\] Kaplan, R. M. Regular models of phonological rule systems. Alvey 
Workshop on Parsing and Pattern Recognition. Oxford University, 
April 1988. 

\[10\] Karttunen, Lauri. Finite-State Constraints. In the Proceedings of 
the International Conference on Current Issues in Computational 
Linguistics, June 10-14, 1991. Universiti Sains Malaysia, Penang, 
Malaysia. 1991. 

\[11\] Ritchie, Graeme D. Languages Generated by Two-level 
Morphological Rules. Research Paper 496. Department of Artificial 
Intelligence, University of Edinburgh, 1990. To appear in 
Computational Linguistics. 
