Compiling a Partition-Based Two-Level Formalism 
Edmund Grimley-Evans* 
University of Cambridge 
(St John's College) 
Computer Laboratory 
Cambridge CB2 3QG, UK 
Edmund. Grimley-Evans@cl. cam. ac. uk 
George Anton Kiraz t 
University of Cambridge 
(St John's College) 
Computer Laboratory 
Cambridge CB2 3QG, UK 
George. Kiraz@cl. cam. ac. uk 
Stephen G. Pulman 
University of Cambridge 
Cointmter Laboratory 
Cambridge CB2 3QG, UK 
and SRI International, Cambridge 
sgpOcam, sri. com 
Abstract 
This paper describes an algorithm for the 
compilation of a two (or more) level or- 
thographic or phonological rule notation 
into finite state transducers. The no- 
tation is an alternative to the standard 
one deriving from Koskenniemi's work: 
it is believed to have some practical de- 
scriptive advantages, and is quite widely 
used, but has a different interpretation. 
Etficient interpreters exist for the nota- 
tion, but until now it has not been clear 
how to compile to equivalent automata 
in a transparent way. The present paper 
shows how to do this, using some of the 
conceptual tools provided by Kaplan and 
Kay's regular relations calculus. 
1 Introduction 
Two-level formalisins based on that introduced 
by (Koskenniemi, 1983) (see also (Ritchie et al., 
1992) and (Kaplan and Kay, 1994)) are widely 
used in practical NLP systems, and are deservedly 
regarded as something of a standard. However, 
there is at least one serious rival two-level notation 
in existence, developed in response to practical 
difficulties encountered in writing large-scale mor- 
phological descriptions using Koskenniemi's nota- 
tion. Tile formalism was first introduced in (Black 
et al., 1987), was adapted by (Ruessink, 1989), 
and an extended version of it was proposed for use 
in the European Commission's ALEP language 
engineering platform (Pulman, 1991). A flmther 
extension to the formalisln was described in (Pul- 
man and Hepple, 1993). 
The alternative partition tbrmalism was mo- 
tivated by several perceived practical disadvan- 
*Supported by SERC studentship no. 92313384. 
tSupported by a Benefactors' Studentship from St 
John's College. 
rages to Koskenniemi's notation. These are de- 
tailed more fully in (Black et al., 1987, pp. 13-15), 
and in (Ritchie et al., 1992, pp. 181-9). In brief: 
(1) Koskennienli rules are not easily interpretable 
(by tile grammarian) locally, for the interpretation 
of 'feasible pairs' depends on other rules in the 
set. (2) There are frequently interactions between 
rules: whenever the lexieal/surface pair affected 
by a rule A appears in tile context of another rule 
B, the grammarian must check that its appearance 
in rule B will not conflict with the requirements of 
rule A. (3) Contexts may conflict: the same lexical 
character may obligatorily have multiple realisa- 
tions in different contexts, but it may be impossi- 
ble to state the contexts in ways that do not block 
a desired application. (4) Restriction to single 
character changes: whenever a change affecting 
more than one adjacent character occurs, multi- 
ple rules nmst be written. At best this prompts 
tile interaction problem, and at worst can require 
the rules to be forInulated with under-restrictive 
contexts to avoid mutual blocking. (5) There is 
no mechanism for relating particular rules to spe- 
cific classes of morpheme. This has to be achieved 
indirectly by introducing special abstract trigger- 
ing characters in lexical representations. This is 
clumsy, and sometimes descriptively inadequate 
('h'ost, 1990). 
Some of these problems can be alleviated by 
the use of a rule compiler that detects conflicts 
such as that described in (Karttunen and Beesley, 
1992). Others could be overcome by simple exten- 
sions to the tbrmalism. But several of these prob- 
lems arise from the interpretation of Koskenniemi 
rules: each rule corresponds to a transducer, and 
the two-level description of a language consists of 
the intersection of these transducers. Thus some- 
how or other it must be arranged that every rule 
accepts every two-level correspondence. We refer 
1;o this class of formalisms as 'parallel': every rule, 
in effect, is applied ill parallel at each point in the 
input. 
454 
The partition tbrmalism coImists of two types 
of rules (defined in more detail beh)w) which en- 
force optional or obligatory changes. Tl~e notion 
of well-formedness is defined via the notion of a 
'partition' of a sequence of lexical/surface corre- 
spondences, informally, a partition is a valid anal- 
ysis if (i) every element of the t)artition is licensed 
by an optional rule, and (ii) no element of the 
partition violates an obligatory rule. 
We have tbund that this formalism has some 
practical adwmtages: (1) The rules are relatively 
independent ot: each other. (2) Their interpreta- 
tion is more familiar for linguists: each rule copes 
with a single correspondence: in general you don't 
have to worry about all other rules having to t)e 
(:ompatible with it. (3) Multiple character changes 
art permitted (with some restrictions discussed 
below). (4) A category or term associated with 
each rule is requi,'e(t to uni(y with the affected 
morpheme, allowing for morI)ho-synta(:tic etfects 
to be cleanly described. (5) There ix a simple and 
etfMent direct interpreter for tt,e rule forrnalism. 
Tile partition formalism has been implemented 
in the European Commission's ALEP system tbr 
natural language engineering, distributed to over 
30 sites. Descriptions of 9 EU languages arc 
t)eing develot)e(1. A version has also be, en im- 
plemented within SI{.I's Core l,anguage Engine 
(Carl;er, 1995) and has been used to develot) de- 
scriptions of English, French, Spanish, Polish, 
Swedish, and Korean morphology. An N-level ex- 
tension of the formalism has also been developed 
by (Kiraz, 1994; Kiraz, 1996b) arrd used to de.-. 
scribe t;he morphology of Syria(: and other Semitic 
languages, arrd by (Bowden an(t Kiraz, 1995) for 
error dete(',tion in noncon(:atenative strings. This 
1)m.'tition-l)ased two-level formalism is thus a seri- 
ous riwll to the standard Koskcnniemi notation. 
lIowever, until now, the Koskenniemi notation 
has had one clear advantage in that it was clear 
how t;o compile it into transducers, with all the 
consequent gains in etliciency and portability and 
with |;ire ability t;o construct lexical transducers 
as in (Karttunen, 1.994). This paper sets out to 
remedy (ha|; defect by descril)ing a comtfilation 
algorithm for the I)artition-bas('d two-level nora- 
lion. 
2 Definition of the Formalism 
2.1 Formal Definition 
We use n tapes, where tim first N tapes are 
\]exical and the remaining M are surface, n -- 
N q M. In practi(:e, M :: 1. We write Ei 
for the alphabet of sylnbols used on tape i, and 
E :: (Er U {el) x ... x (E,~ U {c}), so that E* is 
the set of string-tuples representing possible con- 
tents of the n tapes. A proI)er subset of regular 
n-relations have the property that they are ex- 
pressible as the Cartesian product of n regular 
languages, H. = 1~1 × ... x l~n; we call such re= 
lations 'orthogonal'. (W('. present our detinitions 
along tire lines of (Kat)lan and Kay, 1994)). 
We use two regulm" ot)erators: Intro and Sub. 
IntrosL denotes the set of strings in L into which 
elements of S may be arbitrarily inserted, and 
SUBA,,~L denotes the set of strings in L ill which 
substrings that are in /3 may be replaced by 
strings from A. Both operators map regular lan- 
guages into regular languages, because they can 
be, t:haract(!rise(1 by regular relations: over tim al-. 
phabct E, Intros. = (Idz tO ({el × S))*, SubA,,~ =- 
(Id>\] tO (/3 x A))*, wtiere IdL = {(,% a') I .s' 6 L}, 
the identity relation over L. 
There are two kinds of two-level rules. The con- 
text restriction, or optional, rules, consist of a left 
context 1, a centre c, and a right context r. Surface 
coercion, or obligatory, rules require the centre to 
be split into lexical cl and surface c, compolmnts. 
Detinltion 2.1 A N:M context restrietion 
(CR) rule is a triple. (/,,c,r) where l,c,r are 
'orthogonal' regular relations of the form l :: 
I l X ... X ln~ (: = Cl X ... X (:~ ?' - ?'1 X ... X '1"~,. \[-1 
Definition 2.2 A N:M surface coercion (SC) 
rule ix a quadruple (/,c/,c~,r) where l and r 
are 'orthogonal' regular relations of tile form l = 
l I x ... x 1.n~ ?" : ~"r x ... x l'n, &lid Cl an(t c s 
are 'orthogonal' regular relations restricting only 
the lexical and surface tapes, respectively, of the 
\[:or'rH C l 7£ C I X ... x G N X >~N \[ \[ x ... X ~N-} M and 
,% = E~' x ... x ).\]~ x CN+:j x ... x (W+M. \[\] 
We usually use the following notation tbr rules: 
LLC l,I.;x RI,C ~>\[¢-\[¢> 
i,SC S J}{.1.' \]{.SC 
where 
I,LC (lel't l(,xi,:al corlt,,~t) = (~,..., 1N> 
LEX (lexical form) = <q,..., oN) 
RLC (right lexical context)= (r,,... ,rN) 
LSC (left surface context) :: <IN+r,... ,1N+M) 
SUll\]" (surfac, c forlll) == <c NI.I,...,CN+M> 
II,S(? (right surt'~(:e conl;cxt) = (rN+l,..., rN+M) 
1. ( Sm "e in t)racticc all the left conl;(;xts I.i start 
with E~ a.ud all the right contexts ?'i end with L*, 
we omit wril;ing it and assume it by default. The 
operators are: ~ for CII. rule, s, *{= for S(\] rules and 
4> for coInposite rules. 
A prot)osed morphologit:at analysis 1 ) is an ,~- 
tuI)le of strings, and th('. rules are intert)reted as 
applying ~o a section of this analysis in conl;ext: 
455 
P = l}l~,t~ (n-way concatenation of a left con- 
text, centre, aim right context). Formally: 
Defiinition 2.3 A C19, rule (1, c, r) contextually 
allows (1}, Pc, P,.) iff P~ E l, P,. E r and P~ G c. 
\[\] 
Definition 2.4 An SC rule (l, cl, c,., r) coer- 
cively disallows (Pt, P~,Pr) iff 1} G l, P,. E r, 
P,. Ecl and P,~ ¢ c~. \[\] 
Definition 2.5 A N:M two-level grammar is 
a pair (1~_~, \]~,<=), where f~ is a set of N:M con- 
text restriction rules and I~¢ is a set of N:M sur- 
face coercion rules. \[\] 
Definition 2.6 A two-level grmnmar (R~, R~=) 
aecepts the string-tuple l', partitioned as 
Pt,...,Pa:, iff P = PIt~...Pj,, (n-way concate- 
nation) and (1) for each i there is a CR rule 
A E II, such that A contextually allows 
• ). (I 1 ...I ,-1, Pi, Pi+l ...Pk) and (2) there are no i < j 
such that there is an SC rule/3 E /~¢ such that B 
coercively disallows U~ ...P~-I, P~..,Pj-~, 15...Pk). 
There are some alternatives to condition (2): 
(2 0 there is no i sudl that there is at, SC 
rule B E R¢ such that B coerciw,qy disallows 
"D 3. (t:L...I,.--t,15,Pi+I...Pk): this is (2) with the re- 
striction j = i + 1; since SC rules can only ap- 
ply to the partitions P/, epenthetic rules such as 
(~*@,~i),e X E.~,Z~ X (t,@,k)S*) ('insert an a 
between lexical and surface ks') can not be en- 
forced: the rule would disallow adjacent (k,k)s 
only if they were separated by an empty parti- 
tion: ...(k, k), e, (k, k)... would be disallowed, but 
...(k, k), (k, k}... would be accepted. 
(2ii) there is no i such that there is an SC 
rule B E Re such that, B coercively disallows 
(P1...Pi-:,, Pi, Pi+,...P~) or B coercively disallows 
(Pt...Pi-I, Pi...P~): this is (2) with the restriction 
j = i + 1 or j = i; this allows epenthetic rules 
to be used but rnay in certain cases be counteriI> 
tuitive for the user when insertion rules are used. 
For example, the rule (E* (g, g), u x E~, E~ x v, E*) 
('change 'u to v aft;re' a g') would not disallow a 
string-tuple partitioned as ...(.(I, g), (e, c), (u, u)... - 
assmning some CR rule allows (e, e). 
Earlier versions of the partition fbrmalism could 
not (in practice) cope with multiple lexical char- 
actors in SC rules, see (Carter, 1995, §4.1). This 
is not tit(; case here. 
The tbllowing rules illustrate the formalism: 
V B - * => 
RI: V b * 
B - B - * => 
R2: b b * 
c d ¢> R3: 
c b d 
R1 and R2 illustrate the iterative application of 
rules on strings: they sanction the lexical-surface 
strings (VBBB,Vbbb), where the second (B,b) 
pair serves as the centre of the first application 
of R2 and as the left context of the second ap- 
plication of the same rule. R,a is an cpenthetic 
rule which also demonstrates centres of unequal 
length. (We assume that <V,V), (c,c) and (d,d) 
are sanctioned by other identity rules.) 
The conditions in Definitions 2.1 and 2.2 that 
restrict the regular relations in the rules to be- 
ing 'orthogonal' are required in order for the fi- 
nal hmguagc t;o be regular, because Definition 2.6 
involves an implicit intersection of rule contexts, 
and we know that the intersection of regular rela- 
tions is not in general regular. 
2.2 Regular Expressions for Compilation 
'Ib compile a two-level grammar into an automa- 
ton we use a calculus of regular languages. We 
first use the standard technique of converting reg- 
ular mrelations into same-length regular relations 
by padding them with a space symbol 0. Unlike 
arbitrary regulm' n-relations, same-length regular 
relations m'e closed under intersection and comple- 
mentation, be.cause a theorem tells us that they 
correspond to regular languages over (e-free) n- 
tuplcs of symbols (Kaplan and Kay, 1994, p. 342). 
A proposed morphological analysis P = P1 ...P~: 
can be represented as a sanle-length string-tuple 
co l3lwt~2w...wlSt, w, where \[~ E E* is Vi converted 
to a same-length string-tuple by padding with 
0s, and w = (wl,...,w~,), whe.re the {w~} are 
new symbols to indicate the partition boundaries, 
w~ ¢ ~ v {0}. 
Since in a partitioned string-tuple accepted by 
the grammar (R=>, R~=) each Pi E e for some CR 
rule (l, c,r) ERa, we can make this representa- 
tion unique by defining a canonical way of convert- 
ing each such possible centre (2 into a same-length 
string-tuple 6'. A simple way of doing this is to 
pad with 0s at, the right making each string as long 
as the longest string in C: if C - (Pl, ...,pn), 
(;' = (>0", ...,p,,0*) n >~* - z*(0, ..., 0) (1) 
However, since we know tit(; set of possible pro._ 
titions - it is U{c \] ~l,r(l,(-,'r} E 1{:,}- we can 
reduce the number of elements of E in use, and 
hence silnplify the calculations, by inserting the 0s 
in a more flexible manIter: e.g., if C -- (ab, b}, let 
O = (ab, Ob) rather than (? : (ab, b0): assuming 
another rule. requires us to use (b, b} anyway, we 
only haw; to add (a, 0) rather than (a, b} and (b, 0). 
456 
The 1)reprocessor could use simI)le heuristics to 
make such decisions. In any case, the padding of 
t)ossibl(, t)artitions (:arries over to the (:entres c of 
ca r,les: it" (l,,-,,-) e = {0 I C c c}. 
Itencetbrth let 7c be l;he set; of elements of E that 
appear in seine 0-padded rule centre. 
The contexts of all rules and the lexical and 
surface Celltres of SC rules Inust l)e converted into 
same-length regular n-relations |)y inserting 0s at 
a\]l 1)ossible positions on each tape independently: 
if a; - 2; 1 x ... x xnt 
x °= (Introto}xt × ... × Intro{o}xn)Fire* (2) 
Note the difference between this insertion of 
0 everywhere, denoted x °, and the canonical 
padding L'. Both r('quire the 'orthogonality' condi- 
tion in ord('r for the intersection with 7r* to yield 
a regular language: inserting Os into (a, b}* at 
all possibh; l)ositions on each tape iIulependently 
would give a non-regular relation, for examt)le. 
Now we derive a formula ibr the set of O-padded 
;rod partitioned analysis st;rings accepted by the 
grammar (/~,=>, 17,¢_): The set of O-pa(l(ted centres 
of context; restriction rules ix given by: 
u = I c,,,..(L,c,,.) (s) 
th,re we assume that these centres are disjoint 
(Vc, d ~ .l).c : d V c f\] d = 0), because in prac- 
tice each c in a singleton set,, however tiler(; is an 
alternative deriw~tion that does not require this. 
We proceed subtraetively, starting as an initial 
approximation with an art)itrary concatenation of 
the possible l)artitions, i.e. the (:entres of Cl/, rules: 
co(Dee)* (4) 
From this we wish to subtract tim set of strings 
containing a t)artition that is not allowed by any 
CR rule: We introduce a new placeholder symbol 
T, r ~ 7c O {co}, to represent the centre of a rule, 
so the set of possihle contexts for a given centre 
G D is given by: 
\[.J z%, .° (s) 
(/,~,r)~ll , 
So the set, of contexts in wlfich the centre c may 
not, al)t)ear is the comlflement of this: 
rC*Tre* -- U I('T"'° (6) 
(t,,Lr)<1~ 
Now we can introduce t;he partition sel)arator co 
throughout, then substitute the centre itself, w&o, 
for its placeholder T in order t() derive an expres- 
sion for the set of partitioned strings in which an 
instan('e of the centre c al)l)ears ill a context in 
which it, is 'not allowed: \[o denotes comt)osition \] 
(7) 
If we subtract a term like this tbr each ~ 6 D 
Dorn our initial approximation (eq. 4), then we 
have all ext)ression for tile set of strings allowed 
by the CR rules of tile gralnlnar: 
C D 
(l,~,r)C51i, > 
\]t remains to enforce the sm'fime coercion rules 
/~-. For a given SC ruh; (1, ct, Cs, r) 6 /~,<:, a, first 
al)llroxinmtion I;o tim set of strings in which this 
rule is violated is given by: 
Intro{w} (/t/co(c'/' -- Cs°)cor 0) (.9) 
Here (r,'(~) - c~) is the set of strings that match 
the lexical centre but, do 11ol, match the surface 
centre. For part (2) of Definition 2.6 to apply this 
must equal the concatemttion of 0 or more adja- 
cent partitions, hence it has on each side of it, the 
partition separator co, and the operator Intro iil- 
troduces additional partition separators into tile 
contexts and the centre. The only case not yet 
{:overed is where dm centre matches 0 a(\[jacent 
partitions (i = j in part (2) of Definition 2.6). 
This can be dealt with by prefixing witll the sub- 
stitution operator Sub~o,o0w, so the set of strings in 
which one of the SC rules is violated is: 
U Sub~o,~o~o o Intro{~} (l°co(cp, - c(:)cor °) 
( I,,~, ,c., ,.,. )(. lt<~ 
We subtract; this too fl'om our aporoxin, a~l(ur{ 
(eq. 8) in order to arrive at a formula for the set 
of 0-padded and partitione(l strings that are ac- 
(:epted Iiy the grammar: 
& = co(Dw)*- \[_J Sub~<~ o 
~GI) 
(/,d,r)6 IL > 
" " U Subw,ww o 
Intro{~} (/°co(el '-- c\[~)co, '°) (1\]) 
Finally, we can replace l;he partition separator 
co anti the st)ace sylnbol 0/)y e to convert So into 
a regular (but no longer same-length) relation S 
that maps t)etween lexical and surface representa- 
tions, as in (Kaphm and Kay, 1994, p. 368). 
3 Algorithm and Illustration 
This section goes through the compilation of the 
samI)le grammar in section 2.1 step by step. 
457 
3.1 Preprocessing 
Preprocessing involves making all expressions of 
equal-length. Let, E1 = {V,B,c,d,0} and E~. = 
{V,b,c,d,0} be the lexical and surface alphabets, 
respectively. We pad all centres with O's (eq. 1), 
then compute the set of 0-padded centres (eq. 3), 
D = {(B,b), (0,b), {V,V), (c,c), (d,d)} (12) 
We also compute contexts (eq. 2). Uninstantiated 
contexts become 
Intro{o}(E~) x Intro{o}(E~) (13) 
The right context of R3, for instance, becomes 
Intro{o}(dS~) x Intro{o}(dE~) (\]4) 
3.2 Compilation into Automata 
The algorithm consists of three phases: (1) con- 
structing a FSA which accepts the centres, (2) ap- 
plying CR rules, and (3) \[brcing SC constraints. 
The first approximation to the grammar (eq. 4) 
produces FSA1 which accepts all centres. 
D 
FSA1 
Phase 2 deals with CR rules. We have two cen- 
tres to process: (B,b) (R1 ~ R2) and (0,b) (R3). 
For each centre, we compute the set of invalid con- 
texts in which the centre occurs (eq. 7). Then we 
subtract this from FSA1 (eq. 8), yielding FSA2. 
<d,d> 
<d,d> "" 
FSA2 
The third phase deals with SC rules: here the 
portion of R3. Firstly, we compute the set of 
strings in which R3 is violated (eq. 10). Secondly, 
we subtract the result from FSA2 (eq. 11), re- 
sulting in an automaton which only differs from 
FSA2 in that the edge from q5 to qo is deleted. 
4 Comparison with Previous 
Compilations 
This section points out the differences in compil- 
ing two-level rules in Koskenniemi's formalism on 
one hand, and the one presented here on the other. 
4.1 Overlapping Contexts 
One of the most important requirements of two- 
level rules is allowing the multiple applications 
of a rule on the same string. It is this require- 
ment which makes the compilation procedures in 
the Koskemfiemi formalism - described in (Ka- 
plan and Kay, 1994) - inconvenient. 'The multi- 
ple application of a given rule', they state, 'will 
turn out to be the major source of difficulty in 
expressing rewriting rules in terms of regular re- 
lations and finite-state transducers' (p. 346). The 
same difficulty applies to two-level rules. 
Consider R1 and R2 (§2.1), and D = 
{(V,V>, <B,b)}. (Kaplan and Kay, 1994) express 
CR rules by the relation, 1 
Restrict(c, l, r) : 7r*l c~r* N ~c*c rlr* (15) 
This expression 'does not allow for the possibil- 
ity that the context substring of one application 
might overlap with the centre and context por- 
tions of a preceding one' (p. 371). They resolve 
this by using auxiliary symbols: (1) They intro- 
duce left and right context brackets, <k and >k, 
for each context pair lk - rk of a specific centre 
which take the place of the contexts. (2) Then 
they ensure that each <k:<k only occurs if its 
context Ik has occurred, and each >k:>k only oc- 
curs if followed by its context rk. The automaton 
which results after compiling the two rules is: 
V:V V:V >k:>k >~:>~ >~:>~ ~:<, 
V:V 
B:b 
V:V 
>1:>1 
.'2:>2 
Removing all auxiliary symbols results in: 
B:b 
V:V 
Our algorithm produces this machine directly. 
Compiling Koskenniemi's formalism is compli- 
cated by its interpretation: rules apply to the en- 
tire input. A partition rule is concerned only 
with the part of the input that matches its centre. 
1This expression is an expansion of Restrict in 
(Kaplan and Kay, 1994, p. 371). 
458 
4.2 Conditional Compilation 
Compiling epenthetic rules in the Koskenniemi 
formalism requires special means; hence, the algo- 
rithm is conditional on the type of tim rule (Ka- 
plan and Kay, 1994, p. 374). This peculiarity, in 
the Koskenniemi formalism, is due to the dual in- 
terpretation of the 0 symbol in the parallel formal- 
isin: it is a genuine symbol in the alphabet, yet it 
acts as the empty string e in two-level ext)ressions. 
Note that it is the duty of the user to insert such 
symbols as appropriate (Karttunen and Beesley, 
1992). 
This duality does not hohl in the pm'tition 
formalism. The user can express lexical-surface 
pairs of unequal lengths. It is the duty of the rule 
compiler to ensure that all expressions m'e of equal 
length prior to compilation. With CR rules, this 
is done by padding zeros. With SC rules, howew;r, 
the Intro operator accomplishes this task. There 
is a subtle, but important, (lifl~rence here. 
Consider rule R3 (§2.1). The 0-padded centre 
of the CR portion becomes (0,b). The SC portion, 
however, is computed by the expression 
Insert{0}(() x Insert{0l(b) (16) 
yielding automaton (a): 
1 
<0,0> Any <o,o> A~,, 
a b 
If the centre of the SC portion had been padded 
with 0's, the centre wouht have been 
Insert{0}(0 ) x Insert{0}(/,) (17) 
yielding the undesired automaton (b). Both are 
similar except that state qo is final in the former. 
Taking (a) as the centre, eq. 10 includes (cd,cd); 
hence, eq. 11 excludes it. The compilation of our 
rules is not conditional; it is general enough to 
cope with all sorts of rules, epenthetic or not. 
5 Conclusion and Future Work 
This paper showed how to compile the partition 
formalism into N-tape automata. Apart from in- 
creased efficiency and portability of impleinenta- 
lions, this result also enables us to more easily 
relate this formalism to others in the field, using 
the finite-state calculus to describe the relations 
implemented by the rule compiler. 
A small-scale prototype of the algorithm has 
been implemented in Prolog. The rule compiler 
mak(;s use of a finite-state calculus library which 
allows the user to compile regular expressions into 
automata. The regular expression language in- 
cludes standard operators in addition to the op- 
erators defined here. The system has been tested 
with a number of hypothetical rule sets (to test 
the integrity of the algorithm) and linguistically 
motivated morphological grammars which make 
use of multiple tapes. Compiling realistic descrip- 
tions would need a more efficient implementation 
in a more suitable language such as C/C++. 
\]Alture work includes an extension to simulate 
a restricted torm of unification between categories 
associated with rules and morphemes. 
References 
Black, A., Ritchie, G., Pulman, S., and Russell, G. 
(1987). F()rmalisms for morphographemic descrip- 
tion. In EACL-87, pp. 11 8. 
Bowden, T. and Kiraz, G. (1995). A mor- 
phographe.mic model for error correction in noncon- 
catenative strings. In ACL-95, pp. 24-30. 
Carter, D. (1995). Rapid development of morpholog- 
ical descriptions for full language processing sys- 
tems. in EACL-95, pp. 202-9. 
Kaplan, R. and Kay, M. (1994). Regular models of 
phonological rule systems. Computational Linguis- 
tics, 20(3):331 78. 
Karttunen, L. (1994). Constructing lexical transduc- 
ers. In COLING-9~, pp. 406 411. 
Karttunen, L. and Beesley, K. (1992). Two-Level Rule 
Compiler. Palo Alto Resem'ch Center, Xerox Cot- 
poration. 
Kiraz, G. (1994). Multi-tape two-level morphology: 
a case study in Semitic non-linear morphology. In 
COLING-9\]~, pp. 18{}-6. 
Kiraz, G. (1996b). Computational Approach to Non- 
Linear Morphology. PhD thesis, University of Cam- 
bridge. 
Koskenniemi, K. (1983). Two-Level Morphology. PhD 
thesis, University of Helsinki. 
Puhnan, S. (1991). Two level morphology. In Alshawi 
et. al, ET6/I Rule Formalism and Virtual Machine 
Design Study, chapter 5. CEC, Luxembourg. 
Pulman, S. and Hepple, M. (1993). A feat{{re-based 
formalism for two-level phonology: a description 
and implementation. Computer Speech and Lan- 
guage, 7:333 58. 
Ritchic, G., Black, A., I{ussell, G., and Puhnan, 
S. (1992). Computational Morphology: Practical 
Mechanisms for the English Lexicon. MIT Press, 
Cambridge Mass. 
Ruessink, H. (1989). Two level formalisms. Technical 
Report 5, Utrecht Working Papers in NLP. 
Trost, H. (1990). The application of two-level mor- 
phology to non-concatenative German morphology. 
In Karlgren, H., editor, COLING-90, pages 371-6. 
459 
