MULTI-TAPE TWO-LEVEL MORPHOLOGY: 
A Case Study in Semitic Non-linear Morphology 
George Anton Kiraz* 
COMPUTER LABORATORY, UNIVERSITY OF CAMBRIDGE
(St John's College)
E-mail: George.Kiraz@cl.cam.ac.uk
April 22, 1994
Abstract 
This paper presents an implemented multi-tape two-level model capable
of describing Semitic non-linear morphology. The computational
framework behind the current work is motivated by [Kay 1987]; the
formalism presented here is an extension to the formalism reported by
[Pulman and Hepple 1993]. The objectives of the current work are: to
stay as close as possible, in spirit, to standard two-level morphology,
to stay close to the linguistic description of Semitic stems, and to
present a model which can be used with ease by the Semitist. The paper
illustrates that if finite-state transducers (FSTs) in a standard
two-level morphology model are replaced with multi-tape auxiliary
versions (AFSTs), one can account for Semitic root-and-pattern
morphology using high level notation.
1 INTRODUCTION 
This paper aims at presenting a computational morphology model which
can handle the non-linear phenomenon of Semitic morphology. The
approach presented here builds on two-level morphology
[Koskenniemi 1983], extending it to achieve the desired objective. The
contribution of this paper may be summarised as follows:
With regards to the two-level model, we extend this model by allowing
it to have multiple tapes on the lexical level and retaining the one
tape on the surface level; hence, 'multi-tape two-level morphology'.
Feasible pairs in the standard two-level model become 'feasible tuple
pairs' in our multi-tape model.
With regards to the formalism, we have chosen a two-level formalism and
extended it to be able to write multi-tape two-level grammars which
involve non-linear operations. To achieve this, we made all lexical
expressions n-tuple regular expressions. In addition, we introduced the
notion of 'ellipsis', which indicates the (optional) omission from
left-context lexical expressions of tuples; this accounts for
spreading.
* Supported by a Benefactor Studentship from St John's College. This
research was done under the supervision of Dr Stephen G. Pulman whom I
thank for guidance, support and feedback. Thanks to Dr John Carroll for
editorial comments, Arturo Trujillo for useful 'chats' and Tanya Bowden
for Prolog tips.
Two-level implementations either work directly on rules or compile
rules into FSTs. For the latter case, we propose an auxiliary
finite-state transducer into which multi-tape two-level rules can be
compiled. The machine scans 'tuple pairs' instead of pairs of symbols.
The outline of the paper is as follows: Section 2 introduces the
root-and-pattern nature of Semitic morphology. Section 3 provides a
review of the previous proposals for handling Semitic morphology.
Section 4 presents our proposal, extending two-level morphology and
proposing a formalism which is adequate for writing non-linear grammars
using high level notation. Section 5 applies our model on the Arabic
verb. Section 6 presents an auxiliary automaton into which multi-tape
two-level rules can be compiled. Finally, section 7 gives concluding
remarks.
2 ROOT-AND-PATTERN MORPHOLOGY
Non-linear root-and-pattern morphology is best illustrated in Semitic.
A Semitic stem consists of a root and a vowel melody, arranged
according to a canonical pattern. For example, Arabic /kuttib/ 'caused
to write' is composed from the root morpheme {ktb} 'notion of writing'
and the vowel melody morpheme {ui} 'perfect passive'; the two are
arranged according to the pattern morpheme {CVCCVC} 'causative'.
Table 1 gives the Arabic perfective verbal forms (from
[McCarthy 1981]).¹
¹ As indicated by [McCarthy 1981], the data in Table 1 provides stems
in underlying morphological forms. Hence, it should be noted that:
mood, case, gender and number marking is not shown; many stems
experience phonological processing to give surface forms, e.g.
/nkatab/ -> /ʔinkatab/ (form 7); the root morphemes shown are not
attested in the literature in all forms, e.g. there is no such verb as
*/takattab/ (form 5), but there is /takassab/ from the root morpheme
{ksb}; the quality of the second vowel in form 1 is different from one
root to another, e.g. /qatal/ 'to kill', /qabil/ 'to accept', /kabur/
'to become big', from the root morphemes {qtl}, {qbl} and {kbr},
respectively. Some forms do not occur in the passive.
Table 1  Arabic Verbal Stems

Form  Active     Passive
1     katab      kutib
2     kattab     kuttib
3     kaatab     kuutib
4     ʔaktab     ʔuktib
5     takattab   tukuttib
6     takaatab   tukuutib
7     nkatab     nkutib
8     ktatab     ktutib
9     ktabab
10    staktab    stuktib
11    ktaabab
12    ktawtab
13    ktawwab
14    ktanbab
15    ktanbay
Q1    dahraj     duhrij
Q2    tadahraj   tuduhrij
Q3    dhanraj    dhunrij
Q4    dharjaj    dhurjij
Moving horizontally across the table, one notices a change in vowel
melody (active {a}, passive {ui}); everything else remains invariant.
Moving vertically, a change in canonical pattern occurs; everything
else remains invariant.
[Harris 1941] suggested that Semitic stem morphemes are classified
into: root morphemes consisting of consonants and pattern morphemes
consisting of vowels and affixes. Morphemes which fall out of the
domain of the root-and-pattern system, such as particles and
prepositions, are classified as belonging to a third class consisting
of successions of consonants and vowels. The analysis of /kuttib/
produces: the root {ktb} 'notion of writing' and the pattern {_u_:i_}
'causative - perfect passive' (where _ indicates a consonant slot, and
: indicates gemination).
[McCarthy 1981] provided a deeper analysis under the framework of
autosegmental phonology [Goldsmith 1976]. Here, morphemes are
classified into: root morphemes consisting of consonants, vocalism
morphemes consisting of vowels, and pattern morphemes which are
CV-skeleta.² Each sits on a separate tier in the autosegmental model,
and they are coordinated with association lines according to the
principles of autosegmental phonology; when universal principles fail,
language specific rules apply. The analysis of /kuttib/ produces three
morphemes, linked as illustrated below.
Fig. 1 Autosegmental analysis of /kuttib/

     u           i         vocalism
     |           |
 C   V   C   C   V   C     pattern
 |        \ /        |
 k         t         b     root
Similarly, one can describe nominals such as /kitaab/ 'book', /kutub/
'books', /kaatib/ 'writer', /kitaaba/ 'writing' and /katiiba/
'squadron' etc.
² The analysis of Arabic here is based on CV theory [McCarthy 1981].
Moraic [McCarthy and Prince 1990a] and affixational [McCarthy 1992]
analyses will be discussed in a future work.
3 COMPUTATIONAL MODELS
In the past decade, two-level morphology, introduced by
[Koskenniemi 1983], has become ubiquitous. In section 3.1, we shall
take a brief look at two-level morphology. Section 3.2 gives a brief
review of the previous proposals for dealing with Semitic non-linear
morphology. Section 3.3 looks at the development of the formalism which
we have chosen for our proposal.
3.1 Two-Level Morphology 
This approach defines two levels of strings in recognition and
synthesis: lexical and surface. The former is a representation of
lexical strings; the latter is a representation of surface strings. A
mapping scheme between the two levels is described by rules which are
compiled into FSTs; the set of FSTs run in parallel. One case of
two-level rules takes the following form:
a:b => c:d ___ e:f
i.e. lexical a corresponds to surface b when preceded by lexical c
corresponding to surface d and followed by lexical e corresponding to
surface f. The operator is one of four types: => for a context
restriction rule, <= for a surface coercion rule, <=> for a composite
rule (i.e. a composition of => and <=), and /<= for an exclusion rule.
Here is an example from [Ritchie 1992]:
Fig. 2 Two-level description of moved

m o v e + e d    lexical
m o v 0 0 e d    surface

The process can be described by the rules:
X:X => ___ (1)
+:0 => ___ (2)
e:0 => v:v ___ +:0 (3)
Rule 1 is the default rule, where a lexical character appears on the
surface. Rule 2 is the boundary rule, where the lexical morpheme
boundary symbol is deleted on the surface (i.e. surfaces as '0').
Rule 3 states the deletion of lexical [e] in {move} in the context
shown.
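The way Rules 1-3 license the alignment in Fig. 2 can be sketched in a few lines of Python. This is only an illustration, not part of the implementation reported in this paper: the function name `licensed` is hypothetical, the two strings are assumed to be already aligned symbol by symbol, and '0' stands for the empty surface symbol as in Fig. 2.

```python
def licensed(lexical, surface):
    """Return True if every aligned lexical:surface pair is sanctioned by Rules 1-3.

    Assumption: both strings have the same length and '0' marks a deleted symbol.
    """
    if len(lexical) != len(surface):
        return False
    pairs = list(zip(lexical, surface))
    for i, (lex, surf) in enumerate(pairs):
        if lex == surf:
            continue  # Rule 1: X:X is allowed in any context
        if (lex, surf) == ("+", "0"):
            continue  # Rule 2: +:0 is allowed in any context
        if (lex, surf) == ("e", "0"):
            # Rule 3: e:0 is allowed only between v:v and +:0
            left_ok = i > 0 and pairs[i - 1] == ("v", "v")
            right_ok = i + 1 < len(pairs) and pairs[i + 1] == ("+", "0")
            if left_ok and right_ok:
                continue
        return False  # no rule sanctions this pair
    return True


print(licensed("move+ed", "mov00ed"))
print(licensed("mole+ed", "mol00ed"))
```

Since all three rules are context restriction rules, the sketch also accepts the alignment of move+ed with move0ed; a surface coercion rule would be needed to exclude it.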
One can see that two-level morphology is highly influenced by
concatenative morphology: the first requirement for a surface form to
be related to a lexical form, given by [Ritchie 1992], states that "the
lexical tape is the concatenation of the lexical forms in question..."
(italics mine). This makes it extremely difficult, if not impossible,
to apply the autonomous morphemes of Semitic to mainstream two-level
notation.
3.2 Previous Proposals 
Working within standard two-level morphology,
[Kataja and Koskenniemi 1988] went around the problem. Nominal forms,
such as /kitaab/ 'book', were entered in the lexicon. Verbal forms were
derived by a 'lexicon component'. A verb, such as /nkutib/ (form 7),
has the lexical entries
n Σ1 u Σ1 i Σ1
where Σ1 is the alphabet of the root and Σ2 the alphabet of the
vocalism/affixes. The lexicon component takes the intersection of these
two expressions and produces /nkutib/. Now /nkutib/ is fed on the
lexical tape of a standard two-level system which takes care of
conditional phonetic changes (assimilation, deletion, etc.) and
produces /ʔinkutib/.³ A similar approach was used by
[Lavie et al. 1988] for Hebrew using a 'pre-lexical compiler'.
[Kay 1987] proposed a finite-state approach using four tapes for root,
CV-skeleton, vowel melody and surface, each having an independent head,
i.e. the machine can scan from one lexical tape without moving the head
on other lexical tapes. The absence of motion is indicated by ad hoc
notation coded in the lexical strings.
[Beesley 1991], working on Arabic, implemented a two-level system with
'detours', where, according to [Sproat 1992, p. 163-64], detouring
involves multiple dictionaries being open at a time, one for roots and
one for templates with vowels pre-compiled (as in Harris' description).
Other non two-level models were proposed (there is no place here for a
review of these works): [Kornai 1991] proposed a model for
autosegmental phonology using FSTs, where non-linear autosegmental
representations are coded as linear strings. [Bird and Ellison 1992]
proposed a model based on one-level phonology using FSA to model
representations and rules. [Wiebe 1992] proposed modelling
autosegmental phonology using multi-tape FSTs, where autosegmental
representations are coded in arrays. [Pulman and Hepple 1993] proposed
a formalism for bidirectional segmental phonological processing, and
proposed using it for Arabic. The next subsection presents the
development of this formalism.
3.3 Previous Formalisms 
[Black et al. 1987] pointed out that previous two-level rules (cf.
§3.1) affect one character at a time and proposed a formalism which
maps between (equal numbered) sequences of surface and lexical
characters of the form,
SURF => LEX
A lexical string maps to a surface string iff they can be partitioned
into pairs of lexical-surface subsequences, where each pair is licenced
by a rule. [Ruessink 1989] added explicit contexts and allowed unequal
sequences. [Pulman and Hepple 1993] developed the formalism further,
allowing feature-based representations interpreted via unification.
³ Initial consonant clusters, CC, take a prosthetic /ʔi/.
The developed formalism is based on the existence of only two levels of
representation: surface and lexical. Two types of rules are provided:
LSC - SURF - RSC => LLC - LEX - RLC
LSC - SURF - RSC <=> LLC - LEX - RLC
where
LSC = left surface context
SURF = surface form
RSC = right surface context
LLC = left lexical context
LEX = lexical form
RLC = right lexical context
The special symbol * indicates an empty context, which is always
satisfied. The operator => states that LEX may surface as SURF in the
given context, while the operator <=> adds the condition that when LEX
appears in the given context, then the surface description must satisfy
SURF. The latter caters for obligatory rules. The advantage of this
formalism over others is that it allows inter alia mappings between
lexical and surface strings of unequal lengths.⁴
Rules 1-3 can be expressed in this formalism as follows:⁵
* - X - * => * - X - * (4)
* -   - * => * - + - * (5)
* -   - * <=> v - e - + (6)
Pulman and Hepple proposed using the formalism for Arabic in the
following manner: surface /kuttib/ can be expressed with the rule:
* - C1uC2C2iC3 - * => + - C1C2C3 - +
where Cn represents the nth radical of the root. They conclude that
their representation is closer to the linguistic analysis of Harris
than McCarthy. The only disadvantage is that lexical elements, sc.
pattern and vocalism, appear in rules resulting in one rule per
template-vocalism.
4 A MULTI-TAPE TWO-LEVEL APPROACH
Now we present our proposed model. Section 4.1 defines a multi-tape
two-level model. Section 4.2 expands the formalism presented in
section 3.3 making it a multi-tape two-level formalism.
⁴ This allows two-level grammars to handle CV, moraic and affixational
analyses which we shall present in a future work.
⁵ 0 in rules 1-3 is indicated here by blank.
4.1 A Multi-Tape Two-Level Model 
This work follows [Kay 1987] in using three tapes for the lexical
level: pattern tape (PT), root tape (RT) and vocalism tape (VT), and
one surface tape (ST). In synthesis, the lexical tapes are in read mode
and the surface tape is in write mode; in recognition, the opposite
state of affairs holds. One of the lexical tapes is called the primary
lexical tape (PLT) through which all lexical morphemes which fall out
of the domain of root-and-pattern morphology are passed (e.g. prefixes,
suffixes, particles, prepositions). Since characters in PT correspond
to those on ST, PT was chosen as PLT.
There is linguistic support for n lexical tapes mapping to one surface
tape. As described by [McCarthy 1986], when a word is uttered, it is
pronounced in a linear string of segments (corresponding to the linear
ST in this model), i.e. the multi-tier representation is linearised.
McCarthy calls this process tier conflation.
4.2 A Multi-Tape Two-Level Formalism
The Pulman-Hepple/Ruessink/Black et al. formalism is adopted here with
two extensions. The first extension is that all expressions in the
lexical side of the rules (i.e. LLC, LEX and RLC) are n-tuple regular
expressions of the form:
(x1, x2, ..., xn)
If a regular expression ignores all tapes but PLT, the parentheses can
be ignored; hence, (x) is the same as x, where x is on PLT. Having
n-tuple lexical expressions and 1-tuple surface expression corresponds
to having n-tapes on the lexical level and one on the surface.
The second extension is giving LLC the ability to contain ellipsis,
... , which indicates the (optional) omission from LLC of tuples,
provided that the tuples to the left of ... are the first to appear on
the left of LEX. For example, the LLC expression
(a) ... (b)
matches ab, ax1b, ax1x2b, ax1x2...b, where xi ≠ (a).
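The matching behaviour of ellipsis can be sketched as a leftward scan. This is a minimal illustration under stated assumptions rather than the facility of the actual implementation: the function name `bind_ellipsis` is hypothetical, tuples are simplified to single symbols, and the left context is taken to be the sequence ending immediately before LEX. The condition xi ≠ (a) means that the expression binds to the nearest (a) on the left.

```python
def bind_ellipsis(left, first, last):
    """Match the LLC expression `first ... last` against the symbols left of LEX.

    Returns the index of the `first` symbol that is bound, or None if the
    context does not match. Skipped symbols are by construction different
    from `first`, because the scan stops at the nearest occurrence.
    """
    if not left or left[-1] != last:
        return None  # `last` must be immediately to the left of LEX
    for i in range(len(left) - 2, -1, -1):
        if left[i] == first:
            return i  # nearest `first`; everything in between was omitted
    return None


print(bind_ellipsis(list("ab"), "a", "b"))
print(bind_ellipsis(list("aaxb"), "a", "b"))
```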
In standard two-level morphology we talk of feasible pairs. Here we
talk of feasible tuple pairs of the form
(x1, x2, ..., xn) : (y)
For example, Rule 8 (see below) gives rise to four feasible tuple pairs
(ci, X, ):(X), 1 ≤ i ≤ 4. The set of feasible tuple pairs is determined
the same way as the set of feasible pairs in standard two-level
grammars.
Now that we have presented our proposal, we are ready to apply it to
the Arabic data of Table 1.
5 ANALYSIS OF THE ARABIC VERB
Section 5.1 presents the default and boundary rules for Arabic in the
two-level formalism. Section 5.2 gives rules which handle vocalised-,
non-vocalised-, and partially vocalised texts. Finally, we shall see
the use of ellipsis to account for gemination and spreading in
section 5.3.
5.1 Default and Boundary Rules
The default and boundary rules for Arabic in the multi-tape formalism
are:⁶
* - X - * => * - X - * (7)
* - X - * => * - (ci, X, ) - * (8)
* - X - * => * - (V, , X) - *
      V ∈ {v1, v2} (9)
* -   - * => * - + - * (10)
* -   - * => * - (+, +, +) - * (11)
Rule 7 is equivalent to Rule 1. Rule 8 states that any ci on the
pattern tape and X on the root tape with no transition on the vocalism
tape correspond to X on the surface tape. Rule 9 states that any V on
the pattern tape and X on the vocalism tape with no transition on the
root tape correspond to X on the surface tape. Rule 10 is the boundary
rule for morphemes which lie out of the domain of root-and-pattern
morphology. Rule 11 is the boundary rule for stems.
Here is the derivation of /dhunrija/ (form Q3) from the three morphemes
{c1c2v1nc3v2c4},⁷ {dhrj} and {ui}, and the suffix {a} '3rd person'
which falls out of the domain of root-and-pattern morphology and,
hence, takes its place on PLT.
Fig. 3a Form Q3 + {a}

VT          u        i     +
RT    d  h     r        j  +
PT    c1 c2 v1 n  c3 v2 c4 +  a  +
      8  8  9  7  8  9  8  11 7  10
ST    d  h  u  n  r  i  j     a
The numbers between ST and the lexical tapes indicate the rules which
sanction the moves.
We find that default and boundary rules represent a wide range of
Semitic stems.
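The derivation above can be replayed with a short synthesis sketch. It is an illustration under stated assumptions, not the Prolog implementation described later: the function name `synthesise` is hypothetical, each tape is a Python list of symbols, and the choice between Rule 10 and Rule 11 is made by checking whether the root and vocalism heads are both on a boundary symbol.

```python
CONSONANT_SLOTS = {"c1", "c2", "c3", "c4"}
VOWEL_SLOTS = {"v1", "v2"}


def synthesise(pattern, root, vocalism):
    """Walk the pattern tape (PLT) and build the surface string.

    Returns the surface string and the list of rule numbers (7-11)
    that sanctioned each move.
    """
    surface, rules = [], []
    r = v = 0  # head positions on the root and vocalism tapes
    for sym in pattern:
        if sym in CONSONANT_SLOTS:      # Rule 8: (ci, X, ) surfaces as X
            surface.append(root[r])
            r += 1
            rules.append(8)
        elif sym in VOWEL_SLOTS:        # Rule 9: (V, , X) surfaces as X
            surface.append(vocalism[v])
            v += 1
            rules.append(9)
        elif sym == "+":
            stem_boundary = (
                r < len(root) and root[r] == "+"
                and v < len(vocalism) and vocalism[v] == "+"
            )
            if stem_boundary:           # Rule 11: (+, +, +) is deleted
                r += 1
                v += 1
                rules.append(11)
            else:                       # Rule 10: + on PLT only is deleted
                rules.append(10)
        else:                           # Rule 7: default, X surfaces as X
            surface.append(sym)
            rules.append(7)
    return "".join(surface), rules


pattern = ["c1", "c2", "v1", "n", "c3", "v2", "c4", "+", "a", "+"]
print(synthesise(pattern, ["d", "h", "r", "j", "+"], ["u", "i", "+"]))
```

The rule trace returned for this input is the row of numbers shown in Fig. 3a.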
⁶ Variables are indicated by upper-case letters and atomic elements by
lower-case letters.
⁷ Note that association lines are indicated implicitly by numbering the
CV elements in the pattern morpheme.
5.2 Vocalisation 
Orthographically, Semitic texts appear in three forms: consonantal
texts do not incorporate any vowels but matres lectionis⁸, e.g. ktb for
/katab/ (form 1, active), /kutib/ (form 1, passive) and /kutub/
'books', but kaatb for /kaatab/ (form 3, active) and /kaatib/ 'writer';
partially vocalised texts incorporate some vowels to clarify ambiguity,
e.g. kutb for /kutib/ (form 1, passive) to distinguish it from /katab/
(form 1, active); and vocalised texts incorporate full vocalisation,
e.g. staktab (form 10, active).
This phenomenon is taken care of by the following rules:
* -   - * => (X1) - (V) - (X2)
      X1, X2 ≠ vowel (12)
* -   - * => (P1, X1, ) - (P, , X) - (P2, X2, )
      P ∈ {v1, v2}, X = vowel,
      P1, P2 ∈ {c1, c2, c3, c4},
      X1, X2 = radical (13)
Rule 12 allows the omission of non-stem vowels (i.e. prefixes and
suffixes). Rule 13 allows the omission of stem vowels. Note that the
lexical contexts, LLC and RLC, ensure that matres lectionis are not
omitted in the surface. Here is form Q3 with partial vocalisation on
the surface.
Fig. 3b Form Q3 + {a} partially vocalised

VT          u        i     +
RT    d  h     r        j  +
PT    c1 c2 v1 n  c3 v2 c4 +  a  +
      8  8  9  7  8  13 8  11 12 10
ST    d  h  u  n  r     j
One additional rule is required to allow the omission of vowels which
experience spreading (see Rule 17 below).
5.3 Gemination and Spreading 
The only two phonological changes in the Arabic stem are gemination and
spreading, e.g. /tukuttib/ (form 5) from the morphemes
{tv1c1v1c2c2v2c3}, {ktb} and {ui}. The gemination of the second radical
[t] and the spreading of the first vowel [u] can be expressed by
Rule 14 and Rule 15, respectively:
* - X - * => (c2, X, ) - c2 - * (14)
* - X - * => (v1, , X) ... - v1 - * (15)
⁸ 'Mothers of reading', these are consonantal letters which play the
role of vowels, and are represented in the pattern morpheme by VV (e.g.
/aa/, /uu/, /ii/). Matres lectionis cannot be omitted from the
orthographic string.
Note the use of ellipsis to indicate that there are elements separating
the two [u]s. Form 5 is illustrated below (without boundary symbols).
Fig. 4 Form 5

VT       u                 i
RT          k     t           b
PT    t  v1 c1 v1 c2 c2 v2 c3
      7  9  8  15 8  14 9  8
ST    t  u  k  u  t  t  i  b
In fact, gemination can be considered as a case of spreading; Rule 14
becomes,
* - X - * => (c2, X, ) ... - c2 - * (16)
This allows for /tukuttib/ (form 5) and /ktawtab/ (form 12).
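The effect of Rules 15 and 16 can be sketched by letting a pattern slot that has already been filled surface again without moving the root or vocalism heads. The sketch below is an illustration only: the name `synthesise_spreading` is hypothetical, boundary symbols are left out as in Fig. 4, and the form 12 pattern {c1c2v1wc2v1c3} used in the second call is an assumption made for this example, not a pattern given in the text.

```python
C_SLOTS = {"c1", "c2", "c3", "c4"}
V_SLOTS = {"v1", "v2"}


def synthesise_spreading(pattern, root, vocalism):
    """Build a surface string, re-using a segment when its slot recurs."""
    bound = {}  # slot name -> segment already read for that slot
    surface = []
    root_segments, vowel_segments = iter(root), iter(vocalism)
    for slot in pattern:
        if slot in bound:
            # Rules 15/16: the slot was filled earlier (possibly with other
            # elements in between, hence the ellipsis); no tape head moves.
            surface.append(bound[slot])
        elif slot in C_SLOTS:
            bound[slot] = next(root_segments)    # Rule 8
            surface.append(bound[slot])
        elif slot in V_SLOTS:
            bound[slot] = next(vowel_segments)   # Rule 9
            surface.append(bound[slot])
        else:
            surface.append(slot)                 # Rule 7: literal on PLT
    return "".join(surface)


form5 = ["t", "v1", "c1", "v1", "c2", "c2", "v2", "c3"]
print(synthesise_spreading(form5, "ktb", "ui"))

form12 = ["c1", "c2", "v1", "w", "c2", "v1", "c3"]  # assumed pattern
print(synthesise_spreading(form12, "ktb", "a"))
```

The dictionary lookup plays the role of the left lexical context: the segment that surfaces comes from LLC, not from LEX.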
We also need to allow a vowel which originally surfaces by spreading to
be omitted in the surface in unvocalised words. This is accomplished by
the following rule:
* -   - * => (v1, , X) ... (P1, X1, ) - v1 - (P2, X2, )
      X = vowel,
      P1, P2 ∈ {c1, c2, c3, c4},
      X1, X2 = radical (17)
Note that the segments in SURF in the above rules do not appear in LEX,
rather in LLC. This means that, if rules are to be compiled into
automata, the automata have to remember the segments from LLC.⁹ This
leads us on thinking about what sort of automata are needed to describe
a multi-tape two-level grammar.
6 COMPILATION INTO AUTOMATA
We define the following automaton into which rules can be compiled:
A multi-tape ℓ-register auxiliary finite-state automaton (AFSA) with
n-tapes consists of: n read tapes and heads, a finite state control,
and a read-write storage tape of length ℓ, where ℓ ≤ w, and w is the
length of the input strings (cf. APDA in [Hopcroft and Ullman 1979]).
The automaton is illustrated in Fig. 5.¹⁰
In one move, depending on the state of the finite control, along with
the symbols scanned by the input and storage heads, the AFSA may do any
or all of the following:
⁹ If the implementation works directly on rules, this can be achieved
by unification.
¹⁰ ℓ = 4 in the diagram.
Fig. 5 [Diagram: input tapes, the AFST finite control, and the storage tape]
• change state;
• move its n input heads independently one position to the right;
• print a symbol on the cell scanned by the storage head and
  (optionally) move that head one position to the right or left.
More formally, an AFSA is a sextuple of the form (Q, Σ, Γ, δ, q0, F),
where:
• Q is a finite set of states;
• Σ is the machine's alphabet;
• Γ ⊆ Σ is the storage alphabet;
• δ is the transition function, a map from Q × σ × Γ to
  Q × Γ × {L, R}, where σ is (σ1, ..., σn) and σi ∈ Σ;
• q0 ∈ Q is the initial state;
• F ⊆ Q is the set of final states.
The transition function δ(p, σ, r) = (q, w, m) iff the machine can move
from state p to state q while scanning the n-tuple σ from the input
tapes and r from the current storage cell, and upon entering state q,
writes the symbol w onto the current storage cell and moves the storage
head according to m ∈ {L, R}.
A multi-tape ℓ-register auxiliary finite-state transducer (AFST) with n
input tapes and k output tapes is an AFSA with (n + k)-tapes. AFSTs
behave like AFSAs, but scan tuple pairs.
Note that an AFST with n = k = 1 and ℓ = 0 is equivalent to a FST.
The rules are compiled into AFSTs in the same lines of standard
two-level morphology. We shall use a special case of AFSTs: We
hypothesise that, in line with tier conflation, for all morphological
processes, k=1 (i.e. one surface tape); further, we assume that, unless
one proves otherwise, all morphological processes require that ℓ ≤ 1
(hence, we shall ignore m in δ). For Semitic, n=3. The AFST for Rule 15
is illustrated below.
Fig. 6 AFST for Rule 15
[State diagram. Arc labels: Def, 0 ; 0
 (v1,0,X):X, 0 ; X (Backtracking)
 (v1,0,0):X, X ; 0 (Read)]
Transitions marked with Def (for default) take place when σ is a
feasible tuple pair, other than those explicitly shown. The empty
string is represented by 0. The transitions are:
• δ(s0, Def, 0) = (s0, 0) allows strings not related to this rule to be
  accepted;
• δ(s0, (v1,0,X):X, 0) = (s1, X) enters the rule writing X in the
  storage cell;
• δ(s1, (v1,0,X):X, 0) = (s1, X) and δ(s2, (v1,0,X):X, 0) = (s1, X)
  ensure backtracking;
• δ(s1, Def, 0) = (s1, 0) represents ellipsis;
• δ(s1, (v1,0,0):X, X) = (s2, 0) retrieves the contents of the storage
  cell;
• δ(s2, (v1,0,0):X, X) = (s2, 0) allows consecutive reading operations,
  e.g. [aa] in /takaatab/ (form 6);
• δ(s2, Def, 0) = (s1, 0) allows non-consecutive reading operations,
  e.g. the three [a]s in /takattab/ (form 5).
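The behaviour of this machine can be approximated with a small simulation. It is a sketch under loudly stated assumptions and not a transcription of Fig. 6: the function name `accepts_rule15` is hypothetical, the empty string 0 is written as '' in the code, the single storage cell is kept intact by default and read moves, and every state is treated as final. Input is a list of tuple pairs ((PT, RT, VT), surface).

```python
def accepts_rule15(tuple_pairs):
    """Check a sequence of tuple pairs against a one-register machine for Rule 15.

    Assumptions of this sketch: '' marks a tape that does not move, the
    register keeps its contents until overwritten, and all states are final.
    """
    state, register = "s0", None
    for (pt, rt, vt), surf in tuple_pairs:
        on_v1 = pt == "v1" and rt == ""
        if on_v1 and vt != "":
            # (v1, 0, X):X : the vowel is read from VT and stored
            # (from s1 or s2 this is the backtracking case).
            if surf != vt:
                return False
            state, register = "s1", surf
        elif on_v1:
            # (v1, 0, 0):X : the vowel surfaces by spreading and must
            # equal the stored segment; impossible before any read.
            if state == "s0" or surf != register:
                return False
            state = "s2"
        else:
            # Def: any other feasible tuple pair (ellipsis when in s1)
            if state == "s2":
                state = "s1"
    return True


tukuttib = [
    (("t", "", ""), "t"), (("v1", "", "u"), "u"), (("c1", "k", ""), "k"),
    (("v1", "", ""), "u"), (("c2", "t", ""), "t"), (("c2", "", ""), "t"),
    (("v2", "", "i"), "i"), (("c3", "b", ""), "b"),
]
print(accepts_rule15(tukuttib))
```

Replacing the storage cell by duplicated states, one copy of s1 and s2 per vowel, would give an equivalent machine without storage.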
7 CONCLUSION
This paper has shown that a multi-tape two-level approach using the
Pulman-Hepple/Ruessink/Black et al. formalism with the extensions
mentioned is capable of describing the whole range of Arabic stems.
Why do we need storage in the automata? It is known that an automaton
with finite storage can be replaced with a larger one without storage
(a simple solution is to duplicate the machine for each case); hence,
using finite storage (especially with ℓ ≤ 1 and a small finite set of
Γ) does not give the machine extra power. The reason for using storage
is to minimise the number of machines and states.
With regards to the implementation, first we implemented a small system
in order to test the usage of AFSTs in our model. Once this was
established, we made a second implementation based on the work of
[Pulman and Hepple 1993]. This implementation differs from theirs as
follows: Lexical expressions are n-tuples, i.e. implemented as
lists-of-lists instead of lists-of-characters. A facility to check
ellipsis in rules was added. The lexicon consists of multiple trees,
one tree per tape. Finally, a morphosyntactic parser was added.
We conclude this paper by looking at the possibility of using our model
for tonal languages.
7.1 Beyond Semitic 
This approach may be capable of describing other types of non-linear
morphology, though we have not yet looked at a whole range of examples.
The following may form a theoretical framework for a number of
non-linear phenomena.
Consider suprasegmental morphology in tonal languages. Tense in Ngbaka,
a language of Zaire, is indicated by tone, e.g. {kpolo} 'return' gives
/kpòlò/ (Low), /kpōlō/ (Mid), /kpòló/ (Low-High), and /kpóló/ (High)
[Nida 1949]. This can be expressed with the stem morpheme {kpolo} on
one tape and the tonal morphemes {L}, {M}, {LH} and {H} on a second
tape with the following rules:
* - C - * => * - C - * (18)
* - V - * => * - V - * (19)
* - T - * <=> (V, ) - ( , T) - * (20)
where C is a consonant, V is a vowel and T is a tonal segment (these
rules are for the above data only). The transitions for /kpòló/ are
shown below:
Fig. 7 {kpolo} + {LH}

Tone              L         H
Stem    k  p  o      l  o
        18 18 19  20 18 19  20
For all other cases one needs to add a rule for spreading 
the tonal morpheme. 
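The two-tape tonal analysis can be sketched in the same style. This is an illustration only: the name `synthesise_tone` is hypothetical, the vowel set and the use of Unicode combining diacritics to write the surface tone are assumptions of this sketch, and, in line with the remark above, inputs that would require tone spreading are rejected rather than handled.

```python
import unicodedata

VOWELS = set("aeiou")  # assumed vowel inventory for this example
TONE_MARKS = {"L": "\u0300", "M": "\u0304", "H": "\u0301"}  # grave, macron, acute


def synthesise_tone(stem, tones):
    """Combine a stem tape and a tone tape using Rules 18-20.

    Returns the surface string and the list of rule numbers applied.
    Raises ValueError when the number of tones differs from the number
    of vowels, since that case would need a spreading rule.
    """
    if len(tones) != sum(ch in VOWELS for ch in stem):
        raise ValueError("tone spreading is not covered by Rules 18-20")
    surface, rules = [], []
    tone_tape = iter(tones)
    for ch in stem:
        if ch in VOWELS:
            surface.append(ch)                           # Rule 19: V surfaces as V
            rules.append(19)
            surface.append(TONE_MARKS[next(tone_tape)])  # Rule 20: T after (V, )
            rules.append(20)
        else:
            surface.append(ch)                           # Rule 18: C surfaces as C
            rules.append(18)
    return unicodedata.normalize("NFC", "".join(surface)), rules


print(synthesise_tone("kpolo", "LH"))
```

The rule trace for {kpolo} with {LH} is the sequence shown in Fig. 7.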
7.2 Future Work 
Currently, we are looking at describing the Semitic stem using moraic
[McCarthy and Prince 1990a] and affixational [McCarthy 1992] analyses
of Semitic stems. Another area of interest is to look at the formal
properties of the formalism and of the AFSM.
References 
[Beesley 1991] K. Beesley. 'Computer Analysis of Arabic Morphology.'
B. Comrie and M. Eid (eds.) Perspectives on Arabic Linguistics III.
[Bird and Ellison 1992] S. Bird and T. Ellison. One Level Phonology.
Edinburgh Research Papers in Cognitive Science, No. EUCCS/RP-51
(updated version 1993).
[Black et al. 1987] A. Black, G. Ritchie, S. Pulman, G. Russell.
'Formalisms for Morphographemic Description.' EACL-3.
[Goldsmith 1976] J. Goldsmith. Autosegmental Phonology. Doctoral
dissertation, MIT. Published later as Autosegmental and Metrical
Phonology (Oxford 1990).
[Harris 1941] Z. Harris. 'Linguistic Structure of Hebrew.' Journal of
the American Oriental Society: 61.
[Hopcroft and Ullman 1979] J. Hopcroft and J. Ullman. Introduction to
Automata Theory, Languages, and Computation. (Addison-Wesley).
[Kataja and Koskenniemi 1988] L. Kataja and K. Koskenniemi. 'Finite
State Description of Semitic Morphology.' COLING-88.
[Kay 1987] M. Kay. 'Nonconcatenative Finite-State Morphology.' ACL
Proceedings, 3rd European Meeting.
[Kornai 1991] A. Kornai. Formal Phonology. Ph.D. thesis, Stanford
University.
[Koskenniemi 1983] K. Koskenniemi. Two Level Morphology. Ph.D. thesis,
University of Helsinki.
[Lavie et al. 1988] A. Lavie, A. Itai, U. Ornan. 'On the Applicability
of Two Level Morphology to the Inflection of Hebrew Verbs.' Proceedings
of ALLC III.
[McCarthy 1981] J. J. McCarthy. 'A Prosodic Theory of Nonconcatenative
Morphology.' LI 12.
[McCarthy 1986] J. J. McCarthy. 'OCP effects: gemination and
antigemination' LI 17.
[McCarthy and Prince 1990a] J. J. McCarthy and A. S. Prince. 'Prosodic
Morphology and Templatic Morphology.' In M. Eid and J. McCarthy (eds.)
Perspectives on Arabic Linguistics II.
[McCarthy 1992] J. J. McCarthy. 'Template Form in Prosodic Morphology.'
(To appear in the proceedings of the Formal Linguistics Society of
Mid-America III.)
[Nida 1949] E. Nida. Morphology: The Descriptive Analysis of Words.
(University of Michigan Press.)
[Pulman and Hepple 1993] S. Pulman and M. Hepple. 'A feature-based
formalism for two-level phonology: a description and implementation.'
Computer Speech and Language 7.
[Ritchie 1992] G. Ritchie. 'Languages Generated by Two-level
Morphological Rules.' CL 18.
[Ruessink 1989] H. Ruessink. 'Two Level Formalism.' Utrecht Working
Papers in NLP, No. 5.
[Sproat 1992] R. Sproat. Morphology and Computation. (Cambridge, Mass.:
MIT)
[Wiebe 1992] B. Wiebe. Modelling Autosegmental Phonology with
Multi-Tape Finite State Transducers. M.Sc. Thesis. Simon Fraser
University.
