Computing Prosodic Morphology 
George Anton Kiraz* 
University of Cambridge (St John's College) 
Computer Laboratory 
Pembroke Street 
Cambridge CB2 1TP 
George.Kiraz@cl.cam.ac.uk
Abstract 
This paper establishes a framework un- 
der which various aspects of prosodic 
morphology, such as templatic morphol- 
ogy and infixation, can be handled under 
two-level theory using an implemented 
multi-tape two-level model. The paper 
provides a new computational analysis of 
root-and-pattern morphology based on 
prosody. 
1 Introduction 
Prosodic Morphology (McCarthy and Prince, 
1986, et seq.) provides adequate means for de- 
scribing non-linear phenomena such as infixation, 
reduplication and templatic morphology. Stan- 
dard two-level systems proved to be cumbersome 
in describing such operations; see (Sproat, 1992,
p. 159 ff.) for a discussion. Multi-tape two-level
morphology (Kay, 1987; Kiraz, 1994, et seq.) addresses
various issues in the domain of non-linear
morphology: It has been used in analysing root- 
and-pattern morphology (Kiraz, 1994), the Arabic 
broken plural phenomenon (Kiraz, 1996a), and er- 
ror detection in non-concatenative strings (Bow- 
den and Kiraz, 1995). The purpose of this pa- 
per is to demonstrate how non-linear operations 
which are motivated by prosody can also be de- 
scribed within this framework, drawing examples 
from Arabic. 
The analysis of Arabic presented here differs
from earlier computational accounts in that it employs
new linguistic descriptions of Arabic morphology,
viz. moraic and affixational theories
(McCarthy and Prince, 1990b; McCarthy, 1993).
The former argues that a different vocabulary is
needed to represent the pattern morpheme according
to the Prosodic Morphology Hypothesis (see
§1.1), contrary to the earlier CV model where templates
are represented as sequences of Cs (consonants)
and Vs (vowels). The latter departed radically
from the notion of root-and-pattern morphology
in the description of the Arabic verbal
stem (see §3).
* Supported by a Benefactor Studentship from St
John's College. This research was done under the
supervision of Dr Stephen G. Pulman.
The choice of the linguistic model depends on 
the application in question and is left for the
grammarian. The purpose here is to demonstrate that
multi-tape two-level morphology is adequate for 
representing these various linguistic models. 
The following convention has been adopted. 
Morphemes are represented in braces, { }, and 
surface forms in solidi, / /. In listings of gram- 
mars and lexica, variables begin with a capital 
letter. 
The structure of the paper is as follows:
Section 2 demonstrates how Arabic templatic
morphology can be analysed in prosodic terms, and
section 3 looks into infixation; finally, section 4
provides some concluding remarks. The rest of 
this section introduces prosodic morphology and 
establishes the computational framework behind 
this presentation. 
1.1 Prosodic Morphology 
There are three essential principles in prosodic 
morphology (McCarthy and Prince, 1990a; Mc- 
Carthy and Prince, 1993). They are: 
(1) a. PROSODIC MORPHOLOGY HYPOTHESIS.
Templates are defined in terms of
the authentic units of prosody: mora
($\mu$), syllable ($\sigma$), foot (Ft), prosodic
word (PrWd).
b. TEMPLATE SATISFACTION CONDITION.
Satisfaction of template constraints
is obligatory and is determined
by the principles of prosody, both
universal and language-specific.
c. PROSODIC CIRCUMSCRIPTION. The 
domain to which morphological oper- 
ations apply may be circumscribed by 
prosodic criteria as well as by the more 
familiar morphological ones. 
In the Prosodic Morphology Hypothesis, the
mora is the unit of syllabic weight; a monomoraic
syllable, $\sigma_\mu$, is light (L), and a bimoraic syllable,
$\sigma_{\mu\mu}$, is heavy (H). The most common types of
syllables are: open light, CV, open heavy, CVV, and
closed heavy, CVC. This typology is represented
graphically in (2).
(2) [Figure: syllable trees for CV ($\sigma_\mu$), CVV ($\sigma_{\mu\mu}$) and CVC ($\sigma_{\mu\mu}$)]
Association of Cs and Vs to templates is based on
the Template Satisfaction Condition. Association
takes the following form: a node $\sigma$ always
takes a C, and a mora $\mu$ takes a V; however, in
bimoraic syllables, the second $\mu$ may be associated
to either a C or a V.¹
Prosodic Circumscription (PC) defines the 
domain of morphological operations. Normally, 
the domain of a typical morphological operation 
is a grammatical category (root, stem or word), 
resulting in prefixation or suffixation. Under PC,
however, the domain of a morphological operation
is a prosodically-delimited substring within
a grammatical category, often resulting in some
sort of infixation. Essential to PC is a parsing
function $\Phi$ of the form in (3).
(3) PARSING FUNCTION
$\Phi(C, E)$
Let $B$ be a base (i.e. stem or word). The function
$\Phi$ returns the constituent $C$ that sits on the
edge $E \in \{\text{right}, \text{left}\}$ of the base $B$. The result
is a factoring of $B$ into: kernel, designated by
$B{:}\Phi$, which is the string returned by the parsing
function, and residue, designated by $B/\Phi$, which
is the remainder of $B$. The relation between $B{:}\Phi$
and $B/\Phi$ is given in (4), where $*$ is the
concatenation operator.
(4) FACTORING OF B BY $\Phi$
$B = B{:}\Phi * B/\Phi$
To illustrate this, let $B$ = /katab/; applying
the function $\Phi(\sigma_\mu, \text{Left})$ on $B$ factors it into:
(i) the kernel $B{:}\Phi$ = /ka/, and (ii) the residue
$B/\Phi$ = /tab/.
¹ Other conventions associate consonant melodies
left-to-right to the moraic nodes, followed by
associating vowel melodies to syllable-initial morae.
A morphological operation $O$ (e.g. $O$ = "Prefix {t}")
defined on a base $B$ is denoted by $O(B)$.
There are two types of PC: positive (PPC) and
negative (NPC). In PPC, the domain of the operation
is the kernel $B{:}\Phi$; this type is denoted by
$O{:}\Phi$ and is defined in (5a). In NPC, the domain
is the residue $B/\Phi$; this type is denoted by $O/\Phi$
and is defined in (5b).
(5) DEFINITION OF PPC AND NPC
a. PPC, $O{:}\Phi(B) = O(B{:}\Phi) * B/\Phi$
b. NPC, $O/\Phi(B) = B{:}\Phi * O(B/\Phi)$
In other words, in PPC, $O$ applies to the kernel
$B{:}\Phi$, concatenating the result with the residue
$B/\Phi$; in NPC, $O$ applies to the residue $B/\Phi$,
concatenating the result with the kernel $B{:}\Phi$.
Examples are provided in section 3.
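The definitions in (3)-(5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation described in this paper: the names `factor`, `ppc`, `npc`, `prefix_t` and `prefix_mora` are hypothetical, only the left edge is handled, and a light syllable is approximated as a plain CV string over the vowels a, i, u.

```python
from typing import Callable, Tuple

VOWELS = set("aiu")  # assumption: a three-vowel inventory, short vowels only


def factor(base: str, constituent: str) -> Tuple[str, str]:
    """Phi(constituent, Left): split base into (kernel, residue)."""
    if constituent == "C":                  # a single consonant
        size = 1
        ok = len(base) >= 1 and base[0] not in VOWELS
    elif constituent == "light_syllable":   # sigma_mu, approximated as CV
        size = 2
        ok = len(base) >= 2 and base[0] not in VOWELS and base[1] in VOWELS
    else:
        raise ValueError(f"unsupported constituent: {constituent}")
    if not ok:
        raise ValueError(f"{base!r} does not begin with {constituent}")
    return base[:size], base[size:]


def ppc(op: Callable[[str], str], base: str, constituent: str) -> str:
    """Positive circumscription (5a): apply op to the kernel."""
    kernel, residue = factor(base, constituent)
    return op(kernel) + residue


def npc(op: Callable[[str], str], base: str, constituent: str) -> str:
    """Negative circumscription (5b): apply op to the residue."""
    kernel, residue = factor(base, constituent)
    return kernel + op(residue)


def prefix_t(s: str) -> str:
    """O = 'prefix {t}'."""
    return "t" + s


def prefix_mora(s: str) -> str:
    """O = 'prefix a mora', filled here by spreading the adjacent consonant."""
    return s[0] + s


print(factor("katab", "light_syllable"))             # ('ka', 'tab')
print(npc(prefix_mora, "kutib", "light_syllable"))   # kuttib
print(npc(prefix_t, "kutib", "C"))                   # ktutib
```

The last two calls reproduce the Measure 2 and Measure 8 forms derived by hand in section 3.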
1.2 Multi-Tape Two-Level Formalism 
Two-level morphology (Koskenniemi, 1983) de- 
fines two levels of strings in recognition and syn- 
thesis: lexical strings represent morphemes, and 
surface strings represent surface forms. Two-level 
rules map the two strings; the rules are compiled 
into finite state transducers, where lexical strings 
sit on one tape of the transducers and surface 
strings on the other. 
Multi-tape two-level morphology is an extension 
to standard two-level morphology, where more 
than one lexical tape is allowed. The notion of us- 
ing multiple tapes first appeared in (Kay, 1987). 
Motivated by Kay's work, (Kiraz, 1994) proposed 
a multi-tape two-level model. The model adopts 
the formalism in (6) as reported by (Pulman and 
Hepple, 1993). 
(6) LLC - LEX - RLC  $\{\Rightarrow, \Leftrightarrow\}$
    LSC - SURF - RSC
where LLC is the left lexical context, LEX is the
lexical form, RLC is the right lexical context,
LSC is the left surface context, SURF is the surface
form, and RSC is the right surface context.
The special symbol * indicates an empty context,
which is always satisfied. The operator $\Rightarrow$
states that LEX may surface as SURF in the given
context, while the operator $\Leftrightarrow$ adds the condition
that when LEX appears in the given context, then
the surface description must satisfy SURF. The
latter caters for obligatory rules. A lexical string
maps to a surface string iff (1) they can be
partitioned into pairs of lexical-surface subsequences,
where each pair is licensed by a rule, and (2) no
partition violates an obligatory rule.
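Condition (1) above can be illustrated with a small Python sketch. It is a deliberate simplification and not the formalism itself: there is a single lexical tape, the rules carry no contexts, all rules are optional (so condition (2) is not modelled), and the rule set `RULES` and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Rule = Tuple[str, str]  # (LEX, SURF), context-free for this sketch

# Hypothetical toy rules: identity pairs plus deletion of the boundary '+'.
RULES: List[Rule] = [(c, c) for c in "ktbui"] + [("+", "")]


def partitions(lexical: str, surface: str,
               rules: List[Rule]) -> Iterator[List[Rule]]:
    """Yield every way of splitting the two strings into rule-licensed pairs."""
    if not lexical and not surface:
        yield []
        return
    for lex, surf in rules:
        if not (lex or surf):
            continue  # a rule must consume something, or recursion never ends
        if lexical.startswith(lex) and surface.startswith(surf):
            rest_lex, rest_surf = lexical[len(lex):], surface[len(surf):]
            for rest in partitions(rest_lex, rest_surf, rules):
                yield [(lex, surf)] + rest


def maps_to(lexical: str, surface: str, rules: List[Rule]) -> bool:
    """True if at least one licensed partition exists."""
    return next(partitions(lexical, surface, rules), None) is not None


print(maps_to("kutib+", "kutib", RULES))    # True
print(maps_to("kutib+", "kutib+", RULES))   # False: no rule keeps '+'
```

Adding contexts and obligatory rules would require checking each candidate partition against the neighbouring pairs, which this sketch leaves out.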
One of the extensions introduced in the multi-tape
version is that all expressions in the lexical
side of the rules (i.e. LLC, LEX and RLC) are
n-tuples of regular expressions of the form
$(x_1, x_2, \ldots, x_n)$. The $i$th expression refers to symbols on
the $i$th tape. When $n = 1$, the parentheses can be
ignored; hence, $(x)$ and $x$ are equivalent.²
2 Templatic Morphology 
Templatic morphology is best exemplified in
Semitic root-and-pattern morphology. This section
sets a framework under which templatic morphology
can be described using (augmented) two-level
theory. Our presentation differs from previous
proposals³ in that it employs prosodic morphology
in the analysis of Arabic, rather than earlier
CV accounts. Arabic verbal forms appear in
(7) in the passive (rare forms are not included).
(7) ARABIC VERBAL MEASURES (1-8, 10) 
1 kutib 
2 kuttib 
3 kuutib 
4 ʔuktib
5 tukuttib 
6 tukuutib 
7 nkutib 
8 ktutib 
10 stuktib 
(McCarthy, 1993) points out that Arabic verbal
forms are derived from the base template in (8),
which represents Measure 1. $\sigma_x$ represents an
extrametrical consonant; that is, the last consonant
in a stem.
(8) ARABIC BASE TEMPLATE 
$\sigma_\mu$  $\sigma_\mu$  $\sigma_x$
ku  ti  b
The remaining measures are derived from the base
template by affixation; they have no templates of
their own. The simplest operation is prefixation,
e.g. {n} + Measure 1 $\rightarrow$ /nkutib/ (Measure 7).
Measures 4 and 10 are derived in a similar fashion,
but undergo a rule of syncope as shown in (9).
² Our implementation interprets rules directly (see
(Kiraz, 1996c)); hence, we allow unequal representation
of strings. If the rules were to be compiled into
automata, a genuine symbol, e.g. 0, must be introduced
by the rule compiler. For the compilation of our
formalism into automata, see (Grimley-Evans et al.,
1996).
³ Non-linear proposals include (Kay, 1987), (Kornai,
1991), (Wiebe, 1992), (Narayanan and Hashem,
1993), (Bird and Ellison, 1994) and (Kiraz, 1994).
A working system for Arabic is reported by (Beesley
et al., 1989; Beesley, 1990; Beesley, 1991).
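(9) DERIVATION OF MEASURES 4 AND 10
Syncope: $V \rightarrow \emptyset \,/\, [\mathrm{CVC}\ \underline{\quad}\ \mathrm{CVC}]_{\text{stem}}$
a. Measure 4: ʔu + kutib $\rightarrow$ */ʔukutib/ $\xrightarrow{\text{syncope}}$ /ʔuktib/
b. Measure 10: stu + kutib $\rightarrow$ */stukutib/ $\xrightarrow{\text{syncope}}$ /stuktib/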
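As a string-rewriting illustration of the syncope rule in (9), the following Python sketch uses a regular expression. It is not the two-level treatment developed below: stem boundaries are ignored, the vowel inventory a, i, u is an assumption, and the name `syncope` is hypothetical.

```python
import re

# C V C _ C V C, where the slot holds the vowel to be deleted.
# Assumption: vowels are a, i, u; every other symbol counts as a consonant.
SYNCOPE = re.compile(r"([^aiu][aiu][^aiu])[aiu]([^aiu][aiu][^aiu])")


def syncope(form: str) -> str:
    """Delete a vowel flanked by CVC on both sides (first match only)."""
    return SYNCOPE.sub(r"\1\2", form, count=1)


print(syncope("ʔukutib"))    # ʔuktib   (Measure 4)
print(syncope("stukutib"))   # stuktib  (Measure 10)
print(syncope("nkutib"))     # nkutib   (Measure 7, context not met)
```

Measure 7 is left untouched because the prefix {n} supplies no vowel, so there is no CVC sequence to the left of the first stem vowel.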
The following lexicon and two-level grammar 
demonstrate how the above measures can be anal- 
ysed under two-level theory. The lexicon main- 
tains four tapes: pattern, root, vocalism and affix 
tapes. 
1   {$\sigma_\mu\sigma_\mu\sigma_x$}   pattern: [measure=(1-8,10)]
2   {ktb}   root: [measure=(1-4,6-8,10)]
3   {ui}    vocalism: [tense=perf, voice=pass]
4   {ʔV}    verb_affix: [measure=4]
4   {n}     verb_affix: [measure=7]
4   {stV}   verb_affix: [measure=10]
The first column indicates the tape on which the
morpheme sits, and the second column gives the
morpheme. Each lexical entry is associated with
a category and a feature structure of the form
cat : FS (column 3). Feature values in parentheses
are disjunctive and are implemented using boolean
vectors (Mellish, 1988; Pulman, 1994).
{$\sigma_\mu\sigma_\mu\sigma_x$} is the base-template. {ktb} 'notion
of writing' is the root; it may occur in all measures
apart from Measure 5.⁴ {ui} is the perfective passive
vocalism. The remaining morphemes represent
the affixes for Measures 4, 7 and 10. Notice
that the vowel in the affixes of Measures 4 and 10
is a variable V. This makes it possible for the affix
to have a different vowel according to the mood of
the following stem, e.g. [a] in /ʔaktab/ (Measure
4, active) and [u] in /ʔuktib/ (Measure 4, passive).
Since the lexicon declares 4 lexical tapes, each 
lexical expression in the two-level grammar must 
be at most a 4-tuple. A grammar for the deriva- 
tion of the cited data appears below. 
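R1   * - $(\sigma_\mu, C, V, \varepsilon)$ - *   $\Rightarrow$
     * - CV - *
R2   * - $(\sigma_x, C, \varepsilon, \varepsilon)$ - *   $\Rightarrow$
     * - C - *
R3   * - $(\varepsilon, \varepsilon, \varepsilon, A)$ - *   $\Rightarrow$
     * - A - *
R4   $(X, *, *, *)$ - $(+, +, +, \varepsilon)$ - *   $\Rightarrow$
     * - $\varepsilon$ - *
R5   $(\varepsilon, \varepsilon, \varepsilon, A)$ - $(\varepsilon, \varepsilon, \varepsilon, +)$ - *   $\Rightarrow$
     * - $\varepsilon$ - *
R6   * - $(\sigma_\mu, C, V, \varepsilon)$ - *   $\Leftrightarrow$
     $C_1 V$ - C - $C_2 V_1 C_3$
where $C_i$ = radical, $V_i$ = vowel, A = verbal
affix, and $X \neq +$.
⁴ Roots do not occur in all measures in the literature.
Each root is lexically marked with the measures
it occurs in.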
Rule R1 handles monomoraic syllables, mapping
$(\sigma_\mu, C, V, \varepsilon)$ on the lexical tapes to CV on the surface
tape. Rule R2 maps the extrametrical consonant
in a stem (i.e. the last consonant in a stem)
to the surface. Rule R3 maps an affix symbol from
the fourth tape to the surface. Rules R4 and R5
delete the boundary symbols from stems and affixes,
respectively. Finally, rule R6 simulates the
syncope rule in (9); note that V in LSC must unify
with V in LEX, ensuring that the vowel of the affix
has the same quality as that of the stem, e.g.
/ʔaktab/ and /ʔuktib/ (Measure 4).
The two-level analysis of the cited forms appears
below. ST = surface tape, PT = pattern
tape, RT = root tape, VT = vocalism tape, and
AT = affix tape.
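Measure 1
VT   u             i                          +
RT   k             t             b            +
PT   $\sigma_\mu$  $\sigma_\mu$  $\sigma_x$   +
     1             1             2            4
ST   ku            ti            b

Measure 4
AT   ʔ   u   +
VT               u             i                          +
RT               k             t             b            +
PT               $\sigma_\mu$  $\sigma_\mu$  $\sigma_x$   +
     3   3   5   6             1             2            4
ST   ʔ   u       k             ti            b

Measure 7
AT   n   +
VT           u             i                          +
RT           k             t             b            +
PT           $\sigma_\mu$  $\sigma_\mu$  $\sigma_x$   +
     3   5   1             1             2            4
ST   n       ku            ti            b

Measure 10
AT   s   t   u   +
VT                   u             i                          +
RT                   k             t             b            +
PT                   $\sigma_\mu$  $\sigma_\mu$  $\sigma_x$   +
     3   3   3   5   6             1             2            4
ST   s   t   u       k             ti            b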
The numbers between the two levels indicate the
numbers of the rules (R1-R6) which sanction the sequences.
The remaining Measures involve infixation and are
discussed in the next section.
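The analyses above can be imitated procedurally. The Python sketch below is a hand-written stand-in for rules R1-R6, not an interpreter for the formalism and not the SemHe system: the function name `synthesise` is hypothetical, the affix variable V is simply bound to the first stem vowel in place of unification, and the context of R6 is approximated by inspecting the surface built so far and the remaining pattern symbols.

```python
from typing import List, Tuple

VOWELS = set("aiu")
BASE_TEMPLATE = ["σμ", "σμ", "σx"]  # pattern tape: light, light, extrametrical


def synthesise(root: str, vocalism: str,
               affix: str = "") -> Tuple[str, List[int]]:
    """Return (surface form, numbers of the rules applied in order)."""
    surface: List[str] = []
    rules: List[int] = []
    radicals, vowels = list(root), list(vocalism)

    # R3 copies each affix symbol; the variable V takes the first stem vowel.
    for sym in affix:
        surface.append(vowels[0] if sym == "V" else sym)
        rules.append(3)
    if affix:
        rules.append(5)  # R5 deletes the affix boundary

    for i, slot in enumerate(BASE_TEMPLATE):
        if slot == "σμ":
            c, v = radicals.pop(0), vowels.pop(0)
            left = "".join(surface)
            # R6 (obligatory): the left surface context ends in C plus the
            # same vowel, and a CVC sequence follows.
            in_syncope_context = (
                len(left) >= 2
                and left[-1] == v
                and left[-2] not in VOWELS
                and BASE_TEMPLATE[i + 1:] == ["σμ", "σx"]
            )
            if in_syncope_context:
                surface.append(c)
                rules.append(6)
            else:
                surface.append(c + v)  # R1: light syllable surfaces as CV
                rules.append(1)
        else:
            surface.append(radicals.pop(0))  # R2: extrametrical consonant
            rules.append(2)

    rules.append(4)  # R4 deletes the stem boundary
    return "".join(surface), rules


print(synthesise("ktb", "ui"))          # ('kutib', [1, 1, 2, 4])
print(synthesise("ktb", "ui", "ʔV"))    # ('ʔuktib', [3, 3, 5, 6, 1, 2, 4])
print(synthesise("ktb", "ui", "n"))     # ('nkutib', [3, 5, 1, 1, 2, 4])
print(synthesise("ktb", "ui", "stV"))   # ('stuktib', [3, 3, 3, 5, 6, 1, 2, 4])
```

The rule sequences printed for Measures 1, 4, 7 and 10 match the numbers shown between the lexical and surface levels in the analyses above.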
3 Infixation 
Standard two-level models can describe some
classes of infixation, but only by resorting to the use of
ad hoc diacritics which have no linguistic significance,
e.g. (Antworth, 1990, p. 156). This section
presents a framework for describing infixation
rules using our multi-tape two-level formalism.
This is illustrated here by analysing Measures
2 and 8 of the Arabic verb. Measure 2, /kuttib/,
is derived by prefixing a mora to the base
template under NPC. The operation is $O$ = 'prefix
$\mu$' and the rule is $O/\Phi(\sigma_\mu, \text{Left})$. The new mora
is filled by the spreading of the adjacent (second)
consonant. The steps of the derivation are:
$O/\Phi(\text{kutib}) = \text{kutib}{:}\Phi * O(\text{kutib}/\Phi)$
$= \text{ku} * O(\text{tib})$
$= \text{ku} * \mu\text{tib}$
$= \text{ku} * \text{ttib}$
$= \text{kuttib}$
Measure 8, /ktutib/, is derived by the affixation
of a {t} to the base template, under NPC. The
operation is $O$ = 'prefix {t}'; the rule is $O/\Phi(C, \text{Left})$,
where C is a consonant. The process is:
$O/\Phi(\text{kutib}) = \text{kutib}{:}\Phi * O(\text{kutib}/\Phi)$
$= \text{k} * O(\text{utib})$
$= \text{k} * \text{tutib}$
$= \text{ktutib}$
The following two-level grammar builds on the
one discussed in section 2. The following lexical
entry gives the Measure 8 morpheme.
4   {t}   verb_affix: [measure=8]
The additional two-level rules are:
R7   $(\sigma_\mu, C_1, V_1, \varepsilon)$ - $\varepsilon$ - $(\sigma_\mu, C, *, \varepsilon)$   $\Rightarrow$
     * - C - *
     Features: [measure=(2,5)]
R8   * - $(\sigma_\mu, C, V, A)$ - *   $\Rightarrow$
     * - CAV - *
     Features: [measure=8]
where $C_i$ = radical, $V_i$ = vowel, A = verbal
affix, and $X \neq +$.
Rules R7-R8 are measure-specific. Each rule
is associated with a feature structure which must
unify with the feature structures of the affected
lexical entries. This ensures that each rule is
applied only to the proper measure.
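One simple way to picture the disjunctive measure values and their unification is a bit mask with one bit per measure, where unification is a bitwise AND and an empty result means failure. The sketch below is only an assumed encoding for illustration; it is not claimed to be the boolean-vector scheme of the works cited in section 2, and the names `encode`, `decode` and `unify` are hypothetical.

```python
from typing import Iterable, List, Optional

MEASURES = [1, 2, 3, 4, 5, 6, 7, 8, 10]  # the measures listed in (7)


def encode(values: Iterable[int]) -> int:
    """Encode a disjunction of measures as a bit mask."""
    mask = 0
    for value in values:
        mask |= 1 << MEASURES.index(value)
    return mask


def decode(mask: int) -> List[int]:
    """List the measures whose bits are set."""
    return [m for i, m in enumerate(MEASURES) if mask & (1 << i)]


def unify(a: int, b: int) -> Optional[int]:
    """Intersect two disjunctions; None signals unification failure."""
    common = a & b
    return common if common else None


ROOT_KTB = encode([1, 2, 3, 4, 6, 7, 8, 10])  # {ktb}: measure=(1-4,6-8,10)
AFFIX_T = encode([8])                          # {t}:   measure=8
R7 = encode([2, 5])                            # rule R7 features
R8 = encode([8])                               # rule R8 features

print(decode(unify(ROOT_KTB, R7)))                  # [2]
print(unify(AFFIX_T, R7))                           # None: R7 is blocked
print(decode(unify(unify(ROOT_KTB, R8), AFFIX_T)))  # [8]
```

With this encoding, R7 can fire on {ktb} only in Measure 2, and it can never combine with the Measure 8 affix {t}.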
R7 handles Measure 2; it represents the operation
$O$ = 'prefix $\mu$' and the rule $O/\Phi(\sigma_\mu, \text{Left})$ by
placing $B{:}\Phi$ in LLC and the residue $B/\Phi$ in RLC,
and inserting a consonant C (representing $\mu$) on
the surface. The filling of $\mu$ by the spreading of
the second radical is achieved by the unification
of the inserted C with C in RLC.
R8 takes care of Measure 8; it represents the
operation $O$ = 'prefix {t}' and the rule $O/\Phi(C, \text{Left})$.
Note that one cannot place $B{:}\Phi$ and $B/\Phi$
in LLC and RLC, respectively, as is the case in R7,
because the parsing function cuts into the first
syllable.
One remaining Measure has not been discussed,
Measure 3. It is derived by prefixing the base
template with $\mu$. The process is as follows:
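[Figure: the base template $\sigma_\mu\,\sigma_\mu\,\sigma_x$ (/kutib/) is prefixed with $\mu$, yielding $\sigma_{\mu\mu}\,\sigma_\mu\,\sigma_x$ (/kuutib/).]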
The corresponding two-level rule follows. It
adds a $\mu$ by lengthening the vowel V into VV.
R9   * - $(\sigma_\mu, C, V, \varepsilon)$ - *   $\Rightarrow$
     * - CVV - *
     Features: [measure=(3,6)]
The two-level derivations are:

Measure 2
VT   u                 i                          +
RT   k                 t             b            +
PT   $\sigma_\mu$      $\sigma_\mu$  $\sigma_x$   +
     1             7   1             2            4
ST   ku            t   ti            b

Measure 3
VT   u             i                          +
RT   k             t             b            +
PT   $\sigma_\mu$  $\sigma_\mu$  $\sigma_x$   +
     9             1             2            4
ST   kuu           ti            b

Measure 8
AT   t             +
VT   u                 i                          +
RT   k                 t             b            +
PT   $\sigma_\mu$      $\sigma_\mu$  $\sigma_x$   +
     8             5   1             2            4
ST   ktu               ti            b
Finally, Measures 5 and 6 are derived by prefix- 
ing {tu} to Measures 2 and 3, respectively. 
4 Conclusion 
This paper has demonstrated that multi-tape
two-level systems offer richer and more powerful
devices than those in standard two-level models.
This makes the multi-tape version capable of mod- 
elling non-linear operations such as infixation and 
templatic morphology. 
The rules and lexica samples reproduced here
are based on a larger morphological grammar 
written for the SemHe implementation (a multi- 
tape two-level system) - for a full description of 
the system, see (Kiraz, 1996c; Kiraz, 1996b). 
References
Antworth, E. (1990). PC-KIMMO: A Two-Level
Processor for Morphological Analysis. Occasional
Publications in Academic Computing 16.
Summer Institute of Linguistics, Dallas.
Beesley, K. (1990). Finite-state description of
Arabic morphology. In Proceedings of the Second
Cambridge Conference: Bilingual Computing
in Arabic and English.
Beesley, K. (1991). Computer analysis of Ara- 
bic morphology. In Comrie, B. and Eid, M., 
editors, Perspectives on Arabic Linguistics III: 
Papers from the Third Annual Symposium on 
Arabic Linguistics. Benjamins, Amsterdam. 
Beesley, K., Buckwalter, T., and Newton, S. 
(1989). Two-level finite-state analysis of Ara- 
bic morphology. In Proceedings of the Seminar 
on Bilingual Computing in Arabic and English. 
The Literary and Linguistic Computing Centre, 
Cambridge. 
Bird, S. and Ellison, T. (1994). One-level phonology.
Computational Linguistics, 20(1):55-90.
Bowden, T. and Kiraz, G. (1995). A mor- 
phographemic model for error correction in non- 
concatenative strings. In Proceedings of the 
33rd Annual Meeting of the Association for 
Computational Linguistics, pages 24-30. 
Grimley-Evans, E., Kiraz, G., and Pulman, S. 
(1996). Compiling a partition-based two-level 
formalism. In COLING-96. 
Kay, M. (1987). Nonconcatenative finite-state 
morphology. In Proceedings of the Third Con- 
ference of the European Chapter of the Associa- 
tion for Computational Linguistics, pages 2-10. 
Kiraz, G. (1994). Multi-tape two-level morphology:
a case study in Semitic non-linear morphology.
In COLING-94: Papers Presented to
the 15th International Conference on Computational
Linguistics, volume 1, pages 180-6.
Kiraz, G. (1996a). Analysis of the Arabic broken 
plural and diminutive. In Proceedings of the 
5th International Conference and Exhibition on 
Multi-Lingual Computing. Cambridge. 
Kiraz, G. (1996b). Computational Approach to 
Non-Linear Morphology. PhD thesis, Univer- 
sity of Cambridge. 
Kiraz, G. (1996c). SemHe: A generalised two-level
system. In Proceedings of the 34th Annual
Meeting of the Association for Computational
Linguistics.
Kornai, A. (1991). Formal Phonology. PhD the- 
sis, Stanford University. 
Koskenniemi, K. (1983). Two-Level Morphology. 
PhD thesis, University of Helsinki. 
McCarthy, J. (1993). Template form in prosodic
morphology. In Stvan, L. et al., editors, Papers
from the Third Annual Formal Linguistics Society
of Midamerica Conference, pages 187-218.
Indiana University Linguistics Club, Bloomington.
McCarthy, J. and Prince, A. (1986). Prosodic
morphology. ms.
McCarthy, J. and Prince, A. (1990a). Foot and 
word in prosodic morphology: The Arabic bro- 
ken plural. Natural Language and Linguistic 
Theory, 8:209-83. 
McCarthy, J. and Prince, A. (1990b). Prosodic
morphology and templatic morphology. In Eid,
M. and McCarthy, J., editors, Perspectives on
Arabic Linguistics II: Papers from the Second
Annual Symposium on Arabic Linguistics,
pages 1-54. Benjamins, Amsterdam.
McCarthy, J. and Prince, A. (1993). Prosodic
morphology. ms.
Mellish, C. (1988). Implementing systemic classification
by unification. Computational Linguistics,
14(1):40-51.
Narayanan, A. and Hashem, L. (1993). On abstract
finite-state morphology. In Proceedings
of the Sixth Conference of the European Chapter
of the Association for Computational Linguistics,
pages 297-304.
Pulman, S. (1994). Expressivity of lean formalisms.
In Markantonatou, S. and Sadler,
L., editors, Grammatical Formalisms: Issues in
Migration, Studies in Machine Translation and
Natural Language Processing: 4, pages 35-59.
Commission of the European Communities.
Pulman, S. and Hepple, M. (1993). A feature-based
formalism for two-level phonology: a
description and implementation. Computer
Speech and Language, 7:333-58.
Sproat, R. (1992). Morphology and Computation.
MIT Press, Cambridge Mass. 
Wiebe, B. (1992). Modelling autosegmental
phonology with multi-tape finite-state transducers.
Master's thesis, Simon Fraser University.
