Finiteostate Description of Semitic Morphology: 
A Case Study of Ancient Akkadian 
Laura KATAJA 
University oftIelsinki 
Depart~ne~t~ of Asian 
and Afiicaa Studies 
Halli~uskatu 11 
8F- 00i00 Helsinki 
Finland 
Kimmo KOSKENNIEMI 
University of Helsinki 
Research Unit for 
Computational Linguistics 
Hallituskatu 11 
SF-00100 Helsinki 
Finland 
A~st~ac.t: Thi~ paper discusses the problems of descrip- 
tio~ a~d c,m~putatlonal implementation of phonology and 
~no~'pholo\[~y in Semitic languages, using Ancient 
Akkadian as m~ example. Phonological and morphophono~ 
logical va~ iations are described using standard finite-state 
two..level morphological rules. Interdigitation, prefixation 
ax~.d s~tffixation are described by t~sing an intersection of 
~w~ lexicons which effectively defines lexical representa- 
tions of wo~'ds, 
~o lntrod'trcticm 
Word-.fir:mat\]on in Semitic languges poses several 
challengeu to computational morphology. One obvious 
difficulty is its nonconcatenative nature ie. the fact that 
inflection :is not just adding prefixes and suffixes, but also 
i~tcludes interdigitation where the phonological sequence 
3ymbolizh~g a verbal root is interrupted by individual and 
short sequences of phonemes denoting various derivational 
and inflectional stems. \]fn addition to this, there are 
~xumerou~ phonological and raorphophonological processes 
of a more conventional character. 
Two-level phonology assumes a framework for word- 
formation ~vhere there is an underlying lexical representa- 
tion of the word-form and a surface representation which 
are related to each other with two=level rules \[Kosken- 
nien~ J983\]. These rules compare the representations 
directly a~ld they operate in parallel The lexicon compo- 
rxent deth.~.es what lexical representations are permissible 
and how they correspond to sequences of morphemes, see 
figure 1. 
MORPHEMES 
1 I 
t exicon Component 
I 
I 
LEXiCAL ~EPRESENTATI ON 
I 
I 
TwoAevel Rules 
I I 
SI.IRIFACE IiEPRESENTATION 
Fig. 1 
_nL~ paper desc~ibes a fairly comprehensive two-level 
~le 3ysh~a~ ~br phonological and morphophonological 
alter~atio~ in Akkadian word inflection and regular 
ve~obal derivation. The rule component proves to be similar 
i~ two-level rule systems ~br other la~guages. 
Interdigitation entails more requirements for the lexicon 
which defines feasible lexical representations and relates 
them to underlying morphemes..The task for the lexicon 
component is more or less universal, even if some 
languages can do with simpler lexicons while others 
require more sophisticated structures. 
This paper discusses a solution which involves using two 
separate lexicons, one for word roots, and the other for 
prefixes, flexional elements and suffixes. Entries for roots 
leave flexlonal elements unspecified and vice versa. The 
intersection of these two lexicons effectively defines lexical 
representations of word-forms. 
2. Morphotactic structure of word-forms. 
Akkadian verbs have the following overall pattern: 
\[pers.\] \[root & flection\] \[gender & numb.\] \[opt. subjunctive etc.\] \[opt. obj.\] 
An example of a full fledged verbal form would be '(that) 
they caught him': 
lexicalrepresentation: I X t a B A T - u \ - n I - sh u 
surface representation: i x x a b t u u n i sh u 
A dash '-' denotes morpheme boundary, and backslash 'V a 
morphophoneme for vowel lengthening. The above word- 
form is divided into its parts according to the pattern as 
follows: 
person 1 
root X ... B ... A T 
flection ... t a ...... 
gender & number u \ 
subjunctive n 1 
object sh u 
Capital letters are used in order to distinguish radical 
consonants and vowels fi'om segments in other morphs. 
Thus, the root & flection part is XtaBAT where capital 
letters are components of the root, with lower case letters 
representing flectional elements. 
Nouns, in turn, have an overall structure : 
\[stem\] \[case & number\[ \[opt. possessive\] 
An example of a maximal nominal word-form is their 
kings ~. 
lexicalrepresentation: Sh a R \ - a \ n t - sh u n U 
surface representation: sh a r r a a n i sh u n u 
313 
This can be readily decomposed into its parts as follows: 
stem Sh a It \ 
case&number a \ n i 
possessive sit u n u 
3. Overall structure of morphs 
Verbal roots have an overall pattern of three radical 
consonants and one vowel c ... C ... v c where flectional 
elements may occur in the two intervening slots marked 
with "..." 
Flectional elements have a pattern consisting of two 
parts to fill the corresponding two gaps in the verbal root. 
The overall pattern is roughly ...(((c)C)v)...(v or \)... 
There is at most one verbal prefix and it indicates person 
(and partly modus). Its overalll pattern is (C)y. 
There are at most three verbal suffixes attached to the 
stem. The first suffix indicates gender and number (and 
partly person). They have the form v\ or they are empty. 
The second suffix indicates either the subjunctive (u, 
empty, or n-l) or the ventive (am or aim). The third suffix 
denotes the object or the dative case and conforms to a 
pattern c v( c u(\ c v)) 
Nominal stems are given as derived complete stems 
containing three radical consonants which can be 
identified, but no attempt has been made to generate them 
from plain radical consonants and flectional elements 
because stems are idiosyncratic and better described as 
lexicalized whole units. 
Nominal suffixes indicate gender, number and case. 
Gender is part of the stem for nouns whereas adjectives 
have an explicit feminine suffix (a)t (the masculine has no 
marking). Number and case are represented by port- 
manteau morphs. After these endings there may be a 
possessive ending according to one of two patterns: v k or 
c v (c v). 
3. Phonological Description 
Akkadian, like many other Semitic languages, has a 
considerable number of phonological and morphophono- 
logical processes. This paper describes a fairly complete 
and tested system of some 30 rules written in two-level 
formalism and compiled with the TWOL rule compiler 
\[Karttunen, Koskenniemi and Kaplan, 1987\]. A number of 
examples is given below accompanied by rules that 
correspond to the processes. In each example the lexical 
representation is given (in bold face) above the surface 
representation (in normal face). 
There are several assimilations word internally and at 
morpheme boundaries, eg. an N in the root is assimilated 
to the immediately following consonant, eg. 7re cut (past 
tense) ': 
I~KIS 
ikkis 
which corresponds to the rule: 
"assimilation of N" 
N:F <=> :F ; where F in Consonants ; 
Futhermore, 'he said ~: 
IZtaKAR izzakar 
"assimilation of dontals" 
t:F <=> :F ; whore F in Dontals ; 
and %e trusted hint (something) ': 
i P Q I D - sh u 
i p q i s s u 
"suffix assimilation of t" 
t:s <=> -: ~h: ; 
"suffix assimilation of sit" 
sh:s <=> :s --: ; 
Some alternations caused by laryngeals: 
'lord' 
B a \ E 1 u 
b e e / u 
"umlaut" 
a:e => E: :* : 
• * E: ; 
'he enters ' 
IE aRkUB 
i r r u b 
"elision of a" 
a:O <=> : Vowel Laryngeal: : CollsonaHt \: 
Examples of deletion of short vowels: 
'good ' 
oa~lQu 
dam qu 
Examples of vowel contractions. 
'they said to me' 
i Q B I 3 u \ -- n i m -. n i i 
q b u u rl i n n i 
"Vowel contraction" 
Vowel:O <:> (La:) :Vo (La:) :Vo ; 
~she is) clean' ZaKUJ-at 
z aka at 
Examples of morphological alternation of root vowels: 
'he decides' 
'decide!' 
Some analogical forms: 
'he enters' 
i P aR\OS 
i p a r r a s 
Pv ROS 
\[1 tl F fl S 
i E a R \ J B 
i r r u b 
'they (fern.) donate' 
i Q a I \ AI 5h at \ 
i q i sh sh a a 
Lexieol;,~ an ~. oft(:rt understood as lists of ex}£rles or as 
s~mm kind of ~rce str~xetm'es havi~g branches with letters 
as their ~.abel~ (tries)..A tree is, of course, a speciM case of a 
i):~:dte-.sta~ tranu:ition diagram or a finite-.sta~e automaton. 
Specificall.:~, i;rc, es have no loops or cycles. 'rite obvious 
generaliza~io:o, of' lexicons would, \[,hen, be to use transition 
diagr~m~ b~,%ead of trees. An entry tbr a verbal stem 
~decide' as a regular expression could be: 
);2* ," )iS"' ~ 2',)* 0 .5 Nz* 
where: >',~ denotes the alphabet for prcfixes~ flectional 
elemenrl;~. ~tn.d sut_~XeSo Correspondingly, an entry for a 
pr(,sc:~t te~ ~c ba:;ie stem (G stem) could be: 
where ~;,~ cv~m~e~ th.e alphabet fbr radical consonants and 
vowels, Ir~i;erseetions of such root entries and flectional 
elemer~ts l,~ve e~actly the lexical representations of verbal 
stereo. (Tit(,. number of diftbrcnt entries needed for 
fiectioxmt, par~s is i~ the order ot"10o) 
The infleetim~al parl; of the lexico~t could be expressed as 
a e(mcatenation of the prefix, flexion and the suffix sub- 
lexicons. The intersection of this a*~d the root lexicon 
eoxttains ~!\[ tbasible lexical representations (which was 
the task c)f the lexicon component). This intersection need 
J.tot be carried out in advance because the process of 
recognition can peribrm simultaneous searches in these 
i,wo c(anpone~.ts aJ~d sinmlatc the intersection. The result 
of an actual intersection would be inconvenient because of 
its size (roughly, the product of the sizes of its eom.ponents). 
(Th.ere is D.O operational implementation of this part of the 
system yet, although facilities to build it are avaUable3 
5° Combinations of Morphemes 
The otructure of \[cxicou that was sketched above greatly 
ovcrgenera:;e~ because many combinations of prefixes, 
fle('tional e)emcnts and suffixes are not valid. Restrictions 
are needed tbr the cooccurrence of these morphemes. One 
ohvious way to cope with such combinatorics is to use 
unificatim~-b~sed fbatures as in I)-PATR \[Karttunen 
1986\]. lh_~ification features have the additional benefit of 
also providing effective morphosy~d;actic features for 
¢~ordZorn'm It seems that the ability of using negation and 
disjunctiozx in unification would simplify the description. 
In the following we assume these to be available. 
1,\]fthctive restrictions for prefixes could be eg.: 
u (rmtPers2 siugular hotFemin ) or 
( permn3 plural not~omrn ) 
i (per'sere3 siHgular masculine) or 
( person3 plural uotComm ) 
where ¢omm refers to a gender which is used in some forms 
to cover both feminine and masculine (feminine, masculine 
and Comm are mutually exclusive). 
Descriptions tbr suffixes could be eg.: 
a a notPersl plural notMasc 
i 1 person2 singular feminine 
u u person3 plural masculine 
null morph notPers2SgFem or 
person1 plural comm 
The templates can be defined in a straight forward man- 
ner to rcsalt in combinations eg.: 
U... aa 
tl ,.. o u 
o ,.. 
person3 plural feminine 
person3 plural masculine 
person1 singular masculine, or 
person3 singular masculine 
The combinatorics of Akkadian prefixes and suffixes 
seems to be fairly complicated, but a feature calculus 
seems to be sutIicient for handling it so that it lets only 
valid combinations through and gives correct morpho- 
syntactic features to word-forms. (This part of the work is 
.%ill in progress.) 

References 

Karttunen, Lauri (1986) "D-PATR: A Development Envi- 
ronment for Unification-Based Grammars", Proceed- 
ings of COLING '86, Bonn. 

Karttunen, L., Koskenniemi, K., Kaplan, R. (1987) "A 
Compiler for Two-level Phonological Rules". In Tools 
for Morphological Analysis, Center for the Study of 
Language and Information, Report No. CSLL87-108. 

Kay, Martin, \[an unpublished paper on Finite-State 
Aproach to Arabian Morphology at a Symposium on 
Finite-State Phonology at CSLI in July 1985.\] 

Koskenniemi, Kirmno (1983) Two-Level Morphology: A 
General Computational Method for Word-Form Recog- 
nition and Production. University of Helsinki, Depart- 
ment of General Linguistics, Publicatiuons, No. 11. 

Koskenniemi, Kimmo (1986) "Compilation of Automata 
from Morphological Two-level rules", Papers from the 
Fifth Scandinavian Conference of Computational 
Linguistics. Helsinki, December 11-12, 1985. Depart- 
ment of General Linguistics, Publications, No. 15. 
