Syllable-based Morphology 
Lynne J. Cahill* 
School of Cognitive and Computing Sciences 
University of Sussex 
Brighton BN1 9QH 
England 
Abstract 
This paper presents a language for the description of
morphological alternations which is based on syllable
structure. The justification for such an approach
is discussed with reference to examples from a variety
of languages, and the approach is compared to
Koskenniemi's two-level account of morphonology.
Keywords: morphology, phonology, syllables. 
1 Introduction 
The field of computational morphology was revolutionized
by the work of Kimmo Koskenniemi (1983,
1984), whose two-level model of morphonology has
been used for the description of several languages,
including English, French, Finnish and Japanese. It
is attractive computationally, being based on finite
state transducers (FSTs). However, we shall argue that,
although FSTs are good at mapping strings of symbols
onto strings of symbols, morphological representation
is tree-like rather than string-like.
Work on computational morphology can roughly 
be divided into Koskenniemi-type models, which give 
an account of phonological (or orthographical) alter- 
nations, but which pay less attention to morphologi- 
cal aspects, such as inheritance; and inheritance-type 
models, which do the opposite - ignoring phonologi- 
cal aspects. 
The latter includes lexical representation lan- 
guages which provide as output objects like: 
1. (suffix_er (umlaut Buch)) \[after Evans & Gazdar, 1989a, p.67\]
2. ... is realized by the suffixation of /en/ \[Zwicky, 1985, p.374\]
3. VERB -> past tense suffix: +te or +de \[de Smedt, 1984, p.183\]
4. (>>REFERENT
(APPEND
(APPEND \[STRING "werk"\])
OF PAST-PARTICIPLE-SUFFIX))
OF PAST-PARTICIPLE-PREFIX))
\[Daelemans, 1987, p.40\]

*Supported by a grant from IBM (UK) Scientific Centre, Winchester.

Work of this kind thus opens a gap which needs to
be filled, namely a means of defining such morphological
functions as umlaut and suffixation.
The aim of the work described here is to provide
a formal language for describing (in phonological
terms) those morphological alternations that are
to be found in natural languages.
Section 2 will discuss the approach to morphol- 
ogy being advocated here, in particular contrasting 
it with Koskenniemi's approach, and explaining why 
concepts from non-linear phonologies may be useful 
to morphology. Section 3 will present the language 
itself, and some examples of how it may be applied 
to natural language alternations will be discussed in 
Section 4. 
2 The Approach 
Koskenniemi's model divides the morphological processor
into two elements: the FST, which handles
(mor)phonological alternations by matching lexical
and surface forms, and a system of mini-lexicons,
which handle the concatenation of morphemes to
make words. While this system has been shown to
work with a number of languages, and while its computational
simplicity makes it very attractive, such a
model forces one to make a rather radical distinction
between, say, suffixation and a modification function
such as umlaut. Furthermore, the model is concerned
with strings of segments; that is, it does not allow
one to make any reference to entities above the
level of the segment (syllables, feet, etc.), except by
the use of rather awkward boundary diacritics. This
is in contrast with much recent linguistic work on
phonology, which has rediscovered suprasegmental
concepts such as syllables, metrical feet and tone
groups (see e.g. Liberman & Prince, 1977; Durand,
1986).
That there is some interaction between the processes
of phonology and those of morphology can be
readily seen by looking at an example from Matthews
(1974). The Latin verb alternation 'scribo' (I write)
and 'scripsi' (I wrote) apparently involves both. The
addition of the '-s-' in the past tense form is purely
morphological. The alteration of the /b/ to a
/p/ could again be morphological, but it could also
be accounted for by natural phonological processes
which operate in other morphological environments:
it is a case of voicing assimilation, with the voicing
feature of the /b/ assimilating to that of the /s/ to give
/p/.
With this in mind, it is no surprise that the types
of functions which can occur in morphology are generally
similar to phonological processes and often
require reference to at least a subset of those concepts
required for the description of phonological
phenomena. (Indeed, many morphological functions
are phonological functions which have become "fossilized"
in the morphology after the disappearance
of the phonological conditioning context.) Koskenniemi's
model was strictly segmental and used only
monadic phones and phonemes, rather than feature
bundles. The present work proposes that tree structures
to the level of the syllable are required for
morphological description, as well as feature bundles.
The motivation comes from examples such as
the English stressed syllable shift in pairs like
/kon'vikt/ - /'konvikt/, and pairs distinguished by a
single feature, such as the voicing feature in /reli:f/ -
/reli:v/.
The present work aims to provide a language
for defining morphological functions using tree
structures, feature bundles and concepts from non-linear
phonologies. It assumes the existence of a lexical
representation language like those in Section 1
above, although the exact nature of that language
has no bearing on the language presented here.
3 The Language 
MOLUSC¹, the formal language presented here, is
a declarative language based on the concept of the
syllable, and hence it embodies the claim that all the
functions used in morphology can be defined in terms
of syllables and subsyllabic constituents. A syllable
structure as in Figure 1 is assumed, where the onset
and coda consist of consonants and the peak consists
of vowels. (The onset and coda may optionally be
empty.)

        syll
       /    \
   onset    rhyme
           /     \
        peak     coda

Figure 1: Syllable structure

The existence of a rhyme constituent is not entirely
uncontroversial (see Clements and Keyser, 1983),
but is nonetheless widely accepted by linguists, and
its role in morphological description can be seen by
looking at the English verb alternations "bring" -
"brought", "think" - "thought" etc., in which the
rhyme constituent in the past tense form is always
/ɔ:/ (assuming that the /t/ is a past tense suffix).

¹ Morphological Operations Language Using Syllabic-based
Constructs

A stem² consists of a sequence of syllables, without
any further structure between the level of syllable
and stem. A disyllabic stem, such as /begin/, would
therefore have the structure in Figure 2.
It should be noted that this analysis is distinct
from most non-linear phonologies, which postulate
further levels of structure between the syllable and
the stem, such as metrical feet. We maintain that
these further levels of analysis are not necessary for
morphological description, although this is not to
deny their role in phonological description.
With this kind of structure, we still need some way 
of referring to particular syllables within the stem, 
or segments within the peak, onset and coda. We 
achieve this by means of a simple numbering con- 
vention, where +N refers to the Nth element from 
the left, and -N refers to the Nth element from the 
right. In addition, +0 refers to the pre-initial position
with respect to a sequence of nodes, and -0
refers to the post-final position. This latter convention
is intended primarily for prefixation and suffixation.
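
As an illustration only, the numbering convention can be sketched in Python. The helper name `resolve` and the use of strings for the indices are assumptions of this sketch, not part of MOLUSC; strings are used because Python integers cannot distinguish +0 from -0.

```python
def resolve(index: str, length: int) -> int:
    """Map an index such as '+1', '-1', '+0' or '-0' to a Python position.

    For '+N' and '-N' with N >= 1 the result is the 0-based position of an
    existing element. For '+0' and '-0' the result is an insertion point:
    before the first element, or after the last one.
    """
    sign, n = index[0], int(index[1:])
    if sign not in "+-":
        raise ValueError(f"index must start with + or -: {index!r}")
    if n == 0:
        return 0 if sign == "+" else length
    if n > length:
        raise IndexError(f"{index} is out of range for {length} elements")
    return n - 1 if sign == "+" else length - n


syllables = ["be", "gin"]
print(syllables[resolve("+1", len(syllables))])  # first syllable
print(syllables[resolve("-1", len(syllables))])  # last syllable
```

Treating +0 and -0 as insertion points rather than element positions mirrors their intended use for prefixation and suffixation.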
MOLUSC contains one basic operation, substitution.
Conceptually, this means that affixation functions
are regarded as substitutions of null elements
with non-null elements, and deletion functions ("subtraction"
in Matthews, 1974) vice versa. A substitution
is expressed in the following form:
\[LHS => RHS\]
where LHS is an expression which specifies the node
of a tree which is to be replaced and RHS is an expression
which specifies the subtree which is to replace
it. The above function template consists of a
single rule (LHS => RHS), each rule defining a substitution.
All alternations are defined in terms of
functions on trees, so that a substitution involves
the replacement of a single node in the tree, together
with all its daughters, with another node and all its
daughters.

                    stem
             /                \
          syll                syll
         /    \              /    \
     onset    rhyme      onset    rhyme
       |      /   \        |      /   \
       |   peak   coda     |   peak   coda
       |     |      |      |     |      |
       b     e      0      g     i      n

Figure 2: /begin/

² We use the word stem to refer to any word-form; this is not
intended to carry any implications about the nature or role of
the object in question, but is merely a relatively neutral word
for referring to objects above the level of the syllable.

The LHS of the function is an expression consisting
of a structurally unique mother category, a numerical
argument indicating the syllable, and a numerical
argument indicating the segment (the last two being
optional with certain restrictions). This gives objects
such as the following:
(stem, +1) - the first syllable.
(rhyme, -1) - the rhyme of the final syllable.
(coda, +1, -1) - the final segment of the coda of the first syllable.
The LHS of the function can also be qualified with
additional phonological information, such as:
(coda,+1,-1)/d/
(coda,+1,-1)\[+,voice\]
which say respectively that the final segment of the
coda of the first syllable must be /d/ or must have
the feature \[+,voice\] for the function to have any
effect. (In the event of any of the conditions failing,
the identity function applies, so that a function
always applies although it may not have any effect.)
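
The addressing of a segment and the fall-back to the identity function might be sketched as follows. This is a minimal Python sketch under stated assumptions: a stem is represented as a list of dictionaries with the keys `onset`, `peak` and `coda`, the names `pick` and `substitute` are invented for the illustration, the qualification is limited to a whole-segment match such as /d/, and the stem /bed/ is a made-up input rather than an example from any particular language.

```python
from copy import deepcopy


def pick(seq, n):
    """Python index of the Nth element from the left (n > 0) or right (n < 0)."""
    return n - 1 if n > 0 else len(seq) + n


def substitute(stem, category, syll, seg, new, required=None):
    """Replace one segment, e.g. (coda, +1, -1), or return the stem unchanged.

    If the addressed segment does not exist, or `required` is given and does
    not match it, the identity function applies.
    """
    syllable_pos = pick(stem, syll)
    if not 0 <= syllable_pos < len(stem):
        return stem
    segments = stem[syllable_pos][category]
    segment_pos = pick(segments, seg)
    if not 0 <= segment_pos < len(segments):
        return stem
    if required is not None and segments[segment_pos] != required:
        return stem
    result = deepcopy(stem)  # the input tree itself is never modified
    result[syllable_pos][category][segment_pos] = new
    return result


# A hypothetical monosyllabic stem /bed/, used only for illustration.
bed = [{"onset": ["b"], "peak": ["e"], "coda": ["d"]}]
print(substitute(bed, "coda", +1, -1, "t", required="d"))  # condition met
print(substitute(bed, "coda", +1, -1, "t", required="g"))  # identity applies
```

Returning the input unchanged when a qualification fails reflects the statement that a function always applies, although it may have no effect.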
The RHS may be a description of a complete
phonological object (i.e. a tree) or a feature set. Since
all node names are assumed to be abbreviations for
feature sets, a substitution of a feature set involves
simply the substitution of some subset of features in
the set.
The RHS may also consist of variables, indicated
by upper case letters, which are bound to parts of
the input. This is needed for reduplication functions,
which require the affixation of some element (usually
a syllable) which is the same as some part of the
stem.
The tree-structured phonological representations
are expressed with punctuation marks: a period
separates the three terminal groups (onset,
peak and coda), a semicolon separates syllables and
commas separate the segments within a terminal
group. Thus, the representation for /begin/ would
be /b.e.;g.i.n/. All phonological objects, whether as
input to the function or as a subtree to be substituted
in within a function, must have their structure
specified in this way.
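
The punctuated notation is simple enough to read mechanically. The following Python sketch shows one way of doing so; the function names `parse` and `unparse` and the dictionary representation of a syllable are assumptions of the sketch, and each comma-separated item (for example a long vowel written e:) is treated as a single unanalysed segment.

```python
def parse(form: str):
    """Parse a form such as '/b.e.;g.i.n/' into a list of syllable dicts."""
    syllables = []
    for chunk in form.strip("/").split(";"):
        groups = chunk.split(".")
        if len(groups) != 3:
            raise ValueError(f"expected onset.peak.coda, got {chunk!r}")
        # An empty group (e.g. the coda in 'b.e.') becomes an empty list.
        onset, peak, coda = ([s for s in g.split(",") if s] for g in groups)
        syllables.append({"onset": onset, "peak": peak, "coda": coda})
    return syllables


def unparse(stem) -> str:
    """Inverse of parse: render syllable dicts in the punctuated notation."""
    return "/" + ";".join(
        ".".join(",".join(syl[key]) for key in ("onset", "peak", "coda"))
        for syl in stem
    ) + "/"


begin = parse("/b.e.;g.i.n/")
print(begin)
print(unparse(begin))
```

Because empty groups are kept as empty lists, an affix such as /.e.n/ (empty onset) or /.u./ (empty onset and coda) is parsed in the same way as a full syllable.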
In addition to the qualifications which the LHS
may have, any number of conditions may be placed
on the input tree. A function which has conditions
takes the form
\[LHS => RHS : C\]
where the : behaves like the context slash / standardly
used in phonology and C is any number of
conditions. Each condition takes the same form as
the LHS with qualifications, and conditions may be combined
as conjunctions (linked with commas) or disjunctions
(linked with commas within curly braces {}). Thus,
we may define a function which adds a suffix /.u./
only if the final syllable has the feature \[+,light\] (the
plural suffixation to nouns in Old English):
\[(stem,-0) => /.u./ : (stem,-1)\[+,light\]\]
In many cases, more than one function needs to
be applied, and the ways in which the functions are
combined can be crucial. There are three possible
situations:
• where one function must apply before another
• where two functions must apply simultaneously
• where two functions do not interact and so may
apply in either order or simultaneously and yield
the same result.
The first case is handled with composite application
of functions. By composite application we mean
the application of functions combined with function
composition, denoted by o, as standardly used. (See
the example of Latin reduplication below.)
For the second case we use conjoint application, indicated
by &, which may apply to whole functions,
or to rules within functions. Thus, we may have two
alternations which both require the same conditions.
For example, the Austronesian language Rotuman
exhibits morphological metathesis, with alternations
like /tiko/ - /tiok/, /fupa/ - /fuap/. This can be
defined as a swapping of the onset and coda of the
final syllable with the use of variables, but the variables
must both be instantiated before either of the
rules applies. We can define this using conjoint
application of rules (where a rule is LHS => RHS)
within a single function, e.g.
meta = \[(onset,-1) => /C/ & (coda,-1) => /O/
: (coda,-1)/C/, (onset,-1)/O/\]
where O and C are variables which are bound to
parts of the input, in this case the onset and coda
of the final syllable respectively.
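
The need to bind both variables before either rule rewrites anything can be seen in a small Python sketch. The dictionary representation, the function names and the syllabification of /tiko/ as two open syllables are assumptions made for the illustration; `sequential` is included only to show what goes wrong when the two rules are applied one after the other.

```python
from copy import deepcopy


def meta(stem):
    """Swap the onset and coda of the final syllable in one simultaneous step."""
    result = deepcopy(stem)
    last = result[-1]
    # Both values are read from the input before either one is overwritten,
    # which is what conjoint application (&) requires.
    onset_value, coda_value = last["onset"], last["coda"]
    last["onset"], last["coda"] = coda_value, onset_value
    return result


def sequential(stem):
    """Apply the two rules one after the other: the original onset is lost."""
    result = deepcopy(stem)
    last = result[-1]
    last["onset"] = last["coda"]
    last["coda"] = last["onset"]
    return result


def flatten(stem) -> str:
    """Concatenate all segments for display."""
    return "".join(
        seg for syl in stem for key in ("onset", "peak", "coda") for seg in syl[key]
    )


tiko = [
    {"onset": ["t"], "peak": ["i"], "coda": []},
    {"onset": ["k"], "peak": ["o"], "coda": []},
]
print(flatten(tiko), "->", flatten(meta(tiko)))
print(flatten(tiko), "->", flatten(sequential(tiko)))
```

In the sequential version the /k/ disappears altogether, since the onset has already been overwritten by the time the coda rule looks for it.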
There are two types of situation which fit the third 
case, 
• where there is an exclusive disjunction, only one 
of which can ever apply 
• where the functions affect different parts of the 
structure, as in the case of Semitic verbs, where
two peak change functions apply to the stem - 
one to the initial syllable and the other to the 
final syllable. 
The latter is treated as function composition since 
there can be no question of interaction in these cases 
as they affect different parts of the stem. The former 
is treated as conjoint application of functions, since 
the use of function composition would seem a misuse 
in a situation where only one of the functions could 
ever apply. 
4 Some Examples 
There is a wide variety of different morphological
functions which occur in language. This section will
examine three in detail. The first of these is simple
suffixation. Suffixation is the main function used
in many European languages, and examples are numerous.
We shall take the German example /ze:/ -
/ze:en/, "See" - "Seen" ('lake' - 'lakes'). The basic
function for describing suffixation is:
\[(stem,-0) => /S/\]
where S is the suffix. This says, intuitively, "the
space after the stem is replaced with the suffix, S".
The function required to define the alternation
/ze:/ - /ze:en/, then, is:
suffix_en = \[(stem,-0) => /.e.n/\]
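
Read as an operation on a sequence of syllables, suffixation amounts to filling the post-final position. A minimal Python sketch, assuming a stem is a list of syllable dictionaries and using the invented names `suffix` and `flatten`:

```python
def suffix(stem, affix):
    """[(stem,-0) => /S/]: place the affix syllables in the post-final position."""
    return stem + affix


def flatten(stem) -> str:
    """Concatenate all segments for display."""
    return "".join(
        seg for syl in stem for key in ("onset", "peak", "coda") for seg in syl[key]
    )


see = [{"onset": ["z"], "peak": ["e:"], "coda": []}]   # /z.e:./
en = [{"onset": [], "peak": ["e"], "coda": ["n"]}]     # the suffix /.e.n/
print(flatten(see), "->", flatten(suffix(see, en)))
```

The suffix is itself a well-formed syllable with an empty onset, so the result remains a sequence of syllables and can be passed to further functions.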
The second function we shall look at is the German
umlaut. The German umlaut is a classic example of
a phonological process which has become fossilized in
the morphology. Once occurring as part of a vowel
harmony process, it applied to stems, fronting a back
vowel when a suffix with a front vowel was added, to
preserve vowel harmony. In modern German, vowel
harmony is no longer a productive phonological process,
but the umlaut remains, largely without any
suffix, as a morphological marker for plurality in
some classes of nouns.
The description of the umlaut as the change from a
vowel with the feature \[+,back\] to the corresponding
vowel with the feature \[-,back\] is perhaps slightly
misleading. Although with most vowels this is the
case (/u/ - /y/, /o/ - /oe/, etc.), with the diphthong
/au/ the "umlauted" equivalent is /oy/, in
which the second vowel is fronted, but the first is
raised. This raising can be seen as a phonological
assimilation process, leaving us to define the umlaut
as the fronting of a vowel if it is the only vowel,
the fronting of both vowels if the peak consists of a
doubled (long) vowel, and the fronting of the second
vowel only in a peak which is a diphthong. This can
be expressed by means of the following functions:
umlaut1' = \[(peak,?,-1)\[+,back\] => \[-,back\]\]
umlaut2' = \[(peak,?,-2)\[+,back\] => \[-,back\]
: (peak,?,-1)/V/, (peak,?,-2)/V/\]
the second of which says that the first vowel in a
peak with two vowels is fronted only if it has the
same value as the second vowel.
The other problem with the umlaut is the syllable
to which it applies. MOLUSC requires that a function
be defined in terms of the syllable to which it
applies, and this is one of the things which makes
it a language for morphology rather than phonology.
Phonological phenomena occur whatever the
syllable, provided only that the context restrictions
are satisfied. Morphological functions, we argue, although
in other ways very similar to phonological
ones, apply in a more restricted way to particular
parts of stems, which we show by means of examples
can be defined in terms of the representations
described above.
Most German noun stems are monosyllabic, but
those di-syllabic stems there are seem to divide fairly
evenly between those in which the umlaut applies to
the first syllable and those in which it applies to the
second syllable, as Table 1 shows.

Nouns which take the umlaut      Nouns which take the umlaut
on the first syllable            on the second syllable
Apfel    /'apfəl/                Ausflucht  /'ausfluxt/
Boden    /'boodən/               Auskunft   /'auskunft/
Bruder   /'bruudə/               Gebrauch   /gə'braux/
Garten   /'gaatən/               Gewand     /gə'vant/
Hammer   /'hamə/                 Irrtum     /'iatum/
Laden    /'laadən/               Reichtum   /'raixtum/
Ofen     /'ofən/                 Vormund    /'famunt/
Sattel   /'zatəl/                Vorhang    /'fahaŋ/

Table 1: Di-syllabic German nouns

However, if we look more closely at this table, we can see that all
those nouns which undergo umlaut on the first syllable
have the unstressed, neutral-vowelled /ən/, /ə/
or /əl/ as their second syllables. This might lead
us to propose an analysis which requires reference to
metrical feet, in order to specify that the stressed
syllable takes the umlaut, but if we look again, we
can see that in words such as /iatum/ ("Irrtum",
'error') and /raixtum/ ("Reichtum", 'wealth'), the
unstressed, but non-neutral-vowelled, second syllable
is umlauted. Thus, it would seem that the neutralness
of the vowel is the deciding factor, not the stress.
The functions required for the umlaut in German are
thus,
umlaut1 = \[(peak,+1,-1)\[+,back\] => \[-,back\]
: (peak,-1,+1)/ə/\]
umlaut2 = \[(peak,+1,-2)\[+,back\] => \[-,back\]
: (peak,+1,-1)/V/, (peak,+1,-2)/V/,
(peak,-1,+1)/ə/\]
umlaut3 = \[(peak,-1,-1)\[+,back\] => \[-,back\]\]
umlaut4 = \[(peak,-1,-2)\[+,back\] => \[-,back\]
: (peak,-1,-1)/V/, (peak,-1,-2)/V/\]
Note that there is no need to specify the context in
which the second syllable undergoes the umlaut, as
any stem with /ə/ in the second peak will be unchanged
by the second two functions anyway, as /ə/
does not have the feature \[+,back\].
Since there is no interaction between these functions
and only one is ever going to have any effect,
we combine these with &, indicating conjoint application,
thus,
umlaut = umlaut1 & umlaut2 & umlaut3
& umlaut4
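
The combined effect of the four umlaut functions can be approximated in a few lines of Python. This is a sketch under several assumptions and not the interpreter's own treatment: features are replaced by a small lookup table `FRONT` (the pairs for /u/ and /o/ follow the text, the pair for /a/ is added here), the neutral vowel is written ə, and the syllabifications of "Bruder" and "Gebrauch" are supplied by hand.

```python
from copy import deepcopy

# Assumed stand-in for the feature change [+,back] => [-,back].
FRONT = {"u": "y", "o": "oe", "a": "ɛ"}
SCHWA = "ə"


def umlaut(stem):
    """Front the peak of the first syllable if the final peak begins with the
    neutral vowel, otherwise front the peak of the final syllable."""
    result = deepcopy(stem)
    final_peak = stem[-1]["peak"]
    target = 0 if final_peak and final_peak[0] == SCHWA else len(stem) - 1
    old = stem[target]["peak"]      # conditions are checked against the input
    new = result[target]["peak"]
    if old and old[-1] in FRONT:
        new[-1] = FRONT[old[-1]]                 # umlaut1 / umlaut3
        if len(old) > 1 and old[-2] == old[-1]:
            new[-2] = FRONT[old[-2]]             # umlaut2 / umlaut4 (long vowel)
    return result


def flatten(stem) -> str:
    """Concatenate all segments for display."""
    return "".join(
        seg for syl in stem for key in ("onset", "peak", "coda") for seg in syl[key]
    )


bruder = [
    {"onset": ["b", "r"], "peak": ["u", "u"], "coda": []},
    {"onset": ["d"], "peak": [SCHWA], "coda": []},
]
gebrauch = [
    {"onset": ["g"], "peak": [SCHWA], "coda": []},
    {"onset": ["b", "r"], "peak": ["a", "u"], "coda": ["x"]},
]
print(flatten(bruder), "->", flatten(umlaut(bruder)))
print(flatten(gebrauch), "->", flatten(umlaut(gebrauch)))
```

In the diphthong only the second vowel is fronted; the raising of the first vowel is left to phonology, as in the analysis above.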
Finally, let us look at a more complex function:
the partial reduplication found in Latin. This involves
alternations such as /fal/ - /fefeli/, /kur/ -
/kukuri/. For this we need two functions. The first
reduplicates the whole of the first syllable, and the
second deletes the initial coda (that is, the coda of
the reduplicative affix). This is necessary because
we cannot reduplicate the onset and peak of the first
syllable, as this is not a constituent in the structures
we use. However, this analysis is actually quite attractive
from a linguistic point of view, as it leaves
us with two very natural functions: (reduplicative)
prefixation and consonant cluster reduction, a common
phonological process. The two functions are:
redup' = \[(stem,+0) => /P/
: (stem,+1)/P/\]
coda_0 = \[(coda,+1) => /0/\]
The first of these reads something like "/P/ is prefixed
where /P/ is the same as the first syllable of the
input stem", and uses variable binding. The whole
reduplication can then be defined thus:
redup = \[(coda,+1) => /0/\] o
\[(stem,+0) => /P/
: (stem,+1)/P/\]
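
Composite application can be illustrated with ordinary function composition. The Python sketch below assumes a list-of-dictionaries representation of the stem and invents the names `redup_prefix`, `coda_0` and `compose`. It covers only the two functions just given, so the final /i/ of /kukuri/ and the vowel change in /fefeli/ fall outside it.

```python
from copy import deepcopy


def redup_prefix(stem):
    """Prefix a copy P of the first syllable at position (stem,+0)."""
    return [deepcopy(stem[0])] + deepcopy(stem)


def coda_0(stem):
    """Replace the coda of the first syllable with the empty coda."""
    result = deepcopy(stem)
    result[0]["coda"] = []
    return result


def compose(f, g):
    """Function composition f o g: apply g first, then f."""
    return lambda stem: f(g(stem))


def flatten(stem) -> str:
    """Concatenate all segments for display."""
    return "".join(
        seg for syl in stem for key in ("onset", "peak", "coda") for seg in syl[key]
    )


redup = compose(coda_0, redup_prefix)

kur = [{"onset": ["k"], "peak": ["u"], "coda": ["r"]}]
print(flatten(kur), "->", flatten(redup(kur)))
```

The order matters: if the coda were deleted before the prefixation, the copy would be made from a stem that had already lost its /r/, and both syllables would end up without a coda.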
5 Concluding Remarks 
The approach to computational morphology advocated
here allows one to enjoy all the benefits of powerful
and economical inheritance mechanisms provided
by lexical representation languages like those
mentioned in Section 1 above, but still provide phonologically
(semi-)realized forms as output. Using syllables
as the basic unit of description for the realization
language enables succinct and linguistically attractive
definitions of a wide variety of morphological
alternations, from simple affixation to partial reduplication
and morphological metathesis.
An interpreter for the language has been implemented
in Prolog, and used to test a wide variety of
morphological functions from a range of languages
as diverse as English, Arabic and Nakanai. Cahill
(1989) presents the language in full, with a formal
syntax and semantics of the language, a description
of the interpreter and analyses of substantial fragments
of English and Arabic.

References 

Cahill, L.J. - Syllable-based Morphology for
Natural Language Processing, DPhil dissertation,
University of Sussex, 1989.

Clements, G.N. and S.J. Keyser - CV Phonology:
A Generative Theory of the Syllable,
MIT Press, 1983.

Daelemans, W. - An Object-Oriented Computer
Model of Morphonological Aspects
of Dutch, Doctoral Dissertation, University of
Leuven, 1987.

Durand, J. (ed.) - Dependency and Non-linear
Phonology, Croom Helm, 1986.

Evans, R. and G. Gazdar - "Inference in Datr",
in EACL-89, Manchester, England, 1989, pp.66-71.

Evans, R. and G. Gazdar - "The Semantics of
Datr", in AISB-89, Sussex, England, 1989,
pp.79-88.

Koskenniemi, K. - A Two-level Morphological
Processor, PhD dissertation, University of
Helsinki, 1983.

Koskenniemi, K. - "A General Computational
Model for Word-form Recognition and Production",
in COLING '84, 1984, pp.178-181.

Liberman, M. and A. Prince - "On Stress and
Linguistic Rhythm", in Linguistic Inquiry 8, 1977,
pp.249-336.

Matthews, P. - Morphology, CUP, 1974.

de Smedt, K. - "Using Object-Oriented
Knowledge-Representation Techniques in Morphology
and Syntax Programming", in ECAI-84,
1984, pp.181-184.

Zwicky, A. - "How to Describe Inflection", in
Berkeley Linguistics Society 11, 1985, pp.372-386.