Parallel Replacement in Finite State Calculus 
André Kempe and Lauri Karttunen
Rank Xerox Research Centre - Grenoble Laboratory 
6, chemin de Maupertuis -- 38240 Meylan - France 
{kempe,karttunen}@xerox.fr http://www.xerox.fr/grenoble/mltt
Abstract 
This paper extends the calculus of regular ex- 
pressions with new types of replacement ex- 
pressions that enhance the expressiveness of 
the simple replace operator defined in Kart- 
tunen (1995). Parallel replacement allows 
multiple replacements to apply simultaneously 
to the same input without interfering with 
each other. We also allow a replacement to 
be constrained by any number of alternative 
contexts. With these enhancements, the gen- 
eral replacement expressions are more versa- 
tile than two-level rules for the description of 
complex morphological alternations. 
1 Introduction 
A replacement expression specifies that a given 
symbol or a sequence of symbols should be replaced 
by another one in a certain context or contexts. 
Phonological rewrite-rules (Kaplan and Kay, 1994), 
two-level rules (Koskenniemi 1983), syntactic
disambiguation rules (Karlsson et al 1994, Koskenniemi,
Tapanainen, and Voutilainen 1992), and
part-of-speech assignment rules (Brill 1992, Roche 
and Schabes 1995) are examples of replacement in 
context of finite-state grammars. 
Kaplan and Kay (1994) describe a general 
method representing a replacement procedure as 
finite-state transduction. Karttunen (1995) takes a 
somewhat simpler approach by introducing to the 
calculus of regular expressions a replacement operator
that is defined just in terms of the other regular
expression operators. We follow here the latter ap- 
proach. 
In the regular expression calculus, the replace- 
ment operator, ->, is similar to crossproduct, in 
that a replacement expression describes a relation
between two simple regular languages. Consequently,
regular expressions can be conveniently
combined with other kinds of operations, such as
composition and union, to form complex expressions.
A replacement relation consists of pairs of strings 
that are related to one another in the manner 
sketched below: 
x  u_i^x  y  u_i^y  z     upper string     [1]
x  l_i^x  y  l_i^y  z     lower string
We use u_i^x and u_i^y to represent instances of Ui (with
i ∈ [1, n]) and l_i^x and l_i^y to represent instances of Li.
The upper string contains zero or more instances of 
Ui, possibly interspersed with other material (de- 
noted here by x, y, and z). In the corresponding 
lower string the sections corresponding to Ui are in- 
stances of Li, and the intervening material remains 
the same (Karttunen, 1995). 
The -> operator makes the replacement obliga- 
tory, (->) makes it optional. For the sake of com- 
pleteness, we also define the inverse operators, <- 
and (<-), and the bidirectional variants, <-> and (<->). 
We have incorporated the new replacement ex- 
pressions into our implementation of the finite- 
state calculus (Kempe and Karttunen, 1995). 
Thus, we can construct transducers directly from 
replacement expressions as part of the general calculus,
without invoking any special rule compiler.
1.1 Simple regular expressions 
The table below describes the types of regular ex- 
pressions and special symbols that are used to de- 
fine the replacement operators. 
(A)        option, [ A | 0 ]     [2]
A*         Kleene star
A+         Kleene plus
A/B        ignore (A possibly interspersed with strings from B)
~A         complement (negation)
$A         contains (at least one) A
A B        concatenation
A | B      union
A & B      intersection
A - B      relative complement (minus)
A .x. B    crossproduct (Cartesian product)
A .o. B    composition
0 or [ ]   epsilon (the empty string)
[. .]      affects empty string replacement (see 2.2)
?          any symbol
?*         the universal ("sigma-star") language
           (contains all possible strings of any length
           including the empty string)
.#.        string beginning or end (see 2.1)
Note that expressions that contain the crossproduct
(.x.) or the composition (.o.) operator
describe regular relations rather than regular
languages. A regular relation is a mapping from
one regular language to another one. Regular languages
correspond to simple finite-state automata;
regular relations are modelled by finite-state transducers.
In the relation A .x. B, we call the first member,
A, the upper language and the second member,
B, the lower language. This choice of words
is motivated by the linguistic tradition of writing
the result of a rule application underneath
the original form. In a cascade of compositions,
R1 .o. R2 ... .o. Rn, which models a linguistic
derivation by rewrite-rules, the upper side of the
first relation, R1, contains the "underlying lexical
form", while the lower side of the last relation, Rn,
contains the resulting "surface form".
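The upper/lower terminology and the cascade of compositions can be illustrated with a minimal sketch in which finite relations are modelled as Python sets of (upper, lower) string pairs. The helper names `crossproduct` and `compose` are assumptions made for this illustration; they are not part of the finite-state calculus implementation, which works on transducers rather than on enumerated pairs.

```python
from itertools import product


def crossproduct(upper, lower):
    """A .x. B for finite languages: pair every upper string with every lower string."""
    return {(u, l) for u, l in product(upper, lower)}


def compose(r1, r2):
    """R1 .o. R2: (u, l) is in the result when some middle string m
    has (u, m) in R1 and (m, l) in R2."""
    return {(u, l) for (u, m) in r1 for (m2, l) in r2 if m == m2}


# The upper side of the first relation is the "underlying" form,
# the lower side of the last relation is the "surface" form.
r1 = crossproduct({"a"}, {"b", "c"})
r2 = {("b", "x"), ("c", "y"), ("d", "z")}
cascade = compose(r1, r2)
print(sorted(cascade))
```

The pair ("d", "z") of the second relation does not contribute, because no lower-side string of the first relation matches its upper side.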
We recognize two kinds of symbols: simple symbols
(a, b, c, etc.) and fst pairs (a:b, y:z, etc.).
An fst pair a:b can be thought of as the crossproduct
of a and b, the minimal relation consisting of a
(the upper symbol) and b (the lower symbol).
2 Parallel Replacement 
Conditional parallel replacement denotes a relation
which maps a set of n expressions Ui (i ∈ [1, n]) in
the upper language into a set of corresponding n
expressions Li in the lower language if, and only if,
they occur between a left and a right context (li,
ri).
{ U1 -> L1 || l1 _ r1 } , ... ,     [3]
... , { Un -> Ln || ln _ rn }
Unconditional parallel replacement denotes a
similar relation where the replacement is not constrained
by contexts.
Conditional parallel replacement corresponds to
what Kaplan and Kay (1994) call "batch rules",
where a set of rules (replacements) is collected together
in a batch and performed in parallel, at the
same time, in a way that all of them work on the
same input, i.e. no replacement applies to the output of
another replacement.
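The difference between a batch of rules applied in parallel and the same rules applied one after another can be sketched for unconditional single-symbol replacements. The function names below are hypothetical and the sketch works on plain Python strings, not on transducers.

```python
def apply_parallel(rules, text):
    """All rules read the same input: each symbol is rewritten at most once."""
    return "".join(rules.get(ch, ch) for ch in text)


def apply_sequential(rules, text):
    """Rules are applied one after another, so a later rule sees the
    output of an earlier one."""
    for upper, lower in rules.items():
        text = text.replace(upper, lower)
    return text


rules = {"a": "b", "b": "c"}  # a -> b , b -> c
parallel = apply_parallel(rules, "ab")
sequential = apply_sequential(rules, "ab")
print(parallel, sequential)
```

In the parallel case the b produced from a is not rewritten again, whereas the sequential application feeds it to the second rule.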
2.1 Examples 
Regular expressions based on [3] can be abbreviated
if some of the UPPER-LOWER pairs, and/or
some of the LEFT-RIGHT pairs, are equivalent. The
complex expression:
{ a -> b , b -> c || x _ y } ;     [4]
which contains multiple replacements in one left and
right context, can be written in a more elementary
way as two parallel replacements:
{ a -> b || x _ y } , { b -> c || x _ y } ;     [5]
Figure 1: Transducer encoding [4] and [5] (Every arc
with more than one label actually stands for a set of
arcs with one label each.)
Figure 1 shows the state diagram of a transducer
resulting from [4] or [5]. The transducer
maps the string xaxayby to xaxbyby following the 
path 0-1-2-1-3-0-0-0 and the string xbybyxa to 
xcybyxa following the path 0-1-3-0-0-0-1-2. 
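The two mappings just described can be reproduced with a small simulation. It is only a sketch under simplifying assumptions: every UPPER, LOWER, LEFT and RIGHT is a single symbol, contexts are read from the input string, and the function name `parallel_replace` is hypothetical. It does not build the transducer of Figure 1.

```python
def parallel_replace(rules, text):
    """rules: list of (upper, lower, left, right) single-symbol tuples.
    Contexts are checked on the input (upper) string only, so no rule
    ever sees the output of another rule."""
    out = []
    for i, ch in enumerate(text):
        left = text[i - 1] if i > 0 else None
        right = text[i + 1] if i + 1 < len(text) else None
        for upper, lower, l, r in rules:
            if ch == upper and left == l and right == r:
                out.append(lower)
                break
        else:
            out.append(ch)  # no rule applies: the symbol is kept
    return "".join(out)


# { a -> b , b -> c || x _ y }
rules = [("a", "b", "x", "y"), ("b", "c", "x", "y")]
print(parallel_replace(rules, "xaxayby"))
print(parallel_replace(rules, "xbybyxa"))
```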
The complex expression
{ a -> b , b -> c || x _ y , v _ w } ,     [6]
{ a -> c || p _ q } ;
contains five single parallel replacements:
{ a -> b || x _ y } ,     [7]
{ a -> b || v _ w } ,
{ b -> c || x _ y } ,
{ b -> c || v _ w } ,
{ a -> c || p _ q } ;
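The expansion of an abbreviated expression into single replacements amounts to pairing every UPPER-LOWER pair with every LEFT-RIGHT pair of the same group. A minimal sketch, assuming a hypothetical tuple encoding of the expression (this is not the data structure of the actual implementation):

```python
from itertools import product


def expand(groups):
    """groups: list of (replacements, contexts), where replacements is a
    list of (upper, lower) pairs and contexts a list of (left, right) pairs.
    Returns one (upper, lower, left, right) tuple per single replacement."""
    singles = []
    for replacements, contexts in groups:
        for (upper, lower), (left, right) in product(replacements, contexts):
            singles.append((upper, lower, left, right))
    return singles


# { a -> b , b -> c || x _ y , v _ w } , { a -> c || p _ q }
expression = [
    ([("a", "b"), ("b", "c")], [("x", "y"), ("v", "w")]),
    ([("a", "c")], [("p", "q")]),
]
singles = expand(expression)
for single in singles:
    print(single)
```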
Contexts can be unspecified as in
{ a -> b || x _ y , v _ , _ w } ;     [8]
where a is replaced by b only when occurring between
x and y, or after v, or before w.
An unspecified context is equivalent to ?*, the
universal (sigma-star) language. Similarly, a specified
context, such as x _ y, is actually interpreted
as ?* x _ y ?*, that is, implicitly extending the
context to infinity on both sides of the replacement.
This is a useful convention, but we also need to be
able to refer explicitly to the beginning or the end
of a string. For this purpose, we introduce a special
symbol, .#. (Kaplan and Kay, 1994, p. 349).
In the example
{ a -> b || .#. _ , v _ ? ? .#. } ;     [9]
a is replaced by b only when it is at the beginning
of a string or between v and the two final symbols
of a string (footnote 1).
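For readers more familiar with conventional pattern matching, the alternative contexts and the boundary symbol can be approximated with lookaround assertions of Python's `re` module. This is an analogy under the assumption of single-symbol replacements, not the construction used in this paper: `\A` and `\Z` play the role of .#., a lookbehind stands for a left context and a lookahead for a right context. Because `re.sub` matches against the original string, the contexts are read from the input side.

```python
import re

# { a -> b || x _ y , v _ , _ w }
RULE_ALTERNATIVE_CONTEXTS = re.compile(r"(?<=x)a(?=y)|(?<=v)a|a(?=w)")

# { a -> b || .#. _ , v _ ? ? .#. }
RULE_BOUNDARY_CONTEXTS = re.compile(r"\Aa|(?<=v)a(?=..\Z)", re.DOTALL)


def rewrite(pattern, text):
    """Replace every a that stands in one of the listed contexts by b."""
    return pattern.sub("b", text)


print(rewrite(RULE_ALTERNATIVE_CONTEXTS, "xayvaaw"))
print(rewrite(RULE_BOUNDARY_CONTEXTS, "avaxy"))
print(rewrite(RULE_BOUNDARY_CONTEXTS, "xvaxyz"))
```

In the last call the a is followed by three symbols rather than two before the end of the string, so it is left unchanged.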
2.2 Replacement of the Empty String
The language described by the UPPER part of a
replacement expression (footnote 2)
UPPER -> LOWER || LEFT _ RIGHT     [10]
can contain the empty string ε. In this case, every
string that is in the upper-side language of the relation
is mapped to an infinite set of strings in the
lower-side language, as the upper-side string can be
considered as a concatenation of empty and non-empty
substrings, with ε at any position and in
any number. E.g.
a* -> x || _ ;     [11]
maps the string bb to the infinite set of strings bb,
xbb, xbxb, xbxbx, xxbb, etc., since the language
described by a* contains ε, and the string bb can
be considered as a result of any one of the concatenations
b⌢b, ε⌢b⌢b, ε⌢b⌢ε⌢b, ε⌢b⌢ε⌢b⌢ε,
ε⌢ε⌢b⌢b, etc.
For many practical purposes it is convenient to
construct a version of empty-string replacement
that allows only one application between any two
adjacent symbols (Karttunen, 1995). In order not
to confuse the notation by a non-standard interpretation
of the notion of empty string, we introduce a
special pair of brackets, [. .], placed around the
upper side of a replacement expression that presupposes
a strict alternation of empty substrings and
non-empty substrings of exactly one symbol:
ε x ε y ε z ε ...     [12]
1 Note that .#. denotes the beginning or the end of a
string depending on whether it occurs in the left or the right
context.
2 We describe this topic only for uni-directional replacement
from the upper to the lower side of a regular relation,
but analogous statements can be made for all other types of
replacement mentioned in section 3.
In applying this to the above example, we obtain
[. a* .] -> x || _ ;     [13]
that maps the string bb only to xbxbx since bb is
here considered exclusively as a result of the concatenation
ε⌢b⌢ε⌢b⌢ε.
If contexts are specified (unlike in the
above example) then they are taken into account.
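The contrast between the two readings of the empty string can be sketched for an input such as bb, which contains no a, so that only the empty string in a* matters. Both function names are hypothetical. Since the unrestricted reading yields an infinite set, the sketch truncates it with an assumed bound `max_copies` on the number of insertions per position.

```python
from itertools import product


def single_epsilon_replace(lower, text):
    """[. .] -> lower: exactly one empty string before, between and
    after the symbols of text."""
    return lower + lower.join(text) + lower


def bounded_epsilon_replace(lower, text, max_copies):
    """[ ] -> lower, truncated: 0..max_copies insertions at each of the
    len(text) + 1 positions (the full relation has no such bound)."""
    results = set()
    for counts in product(range(max_copies + 1), repeat=len(text) + 1):
        pieces = [lower * counts[0]]
        for ch, k in zip(text, counts[1:]):
            pieces.append(ch + lower * k)
        results.add("".join(pieces))
    return results


print(single_epsilon_replace("x", "bb"))
sample = bounded_epsilon_replace("x", "bb", 2)
print(sorted(sample)[:5])
```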
2.3 The Algorithm 
2.3.1 Auxiliary Brackets 
The replacement of one substring by another one 
inside a context, requires the introduction of aux- 
iliary symbols (e.g. brackets). Kaplan and Kay 
(1994) motivate this step. 
If we used an expression like
li [Ui .x. Li] ri     [14]
to map a particular Ui (i ∈ [1, n]) to Li when occurring
between a left and a right context, li and ri,
then every li and ri would map the substring adjacent
to Ui.
However, this approach is impossible for the fol- 
lowing reason (Kaplan and Kay, 1994): In an ex- 
ample like 
{ a -> b || x _ x } ;     [15]
where we expect xaxax to be replaced by xbxbx, 
the middle x serves as a context for both a's. A 
relation described by \[14\] could not accomplish this. 
The middle x would be mapped either by an ri or 
by an li but not by both at the same time. That is 
why only one a could be replaced and we would get 
two alternative lower strings, xbxax and xaxbx. 
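The same effect can be observed with conventional regular-expression substitution in Python. The comparison is only an analogy: a pattern that consumes its contexts behaves like [14], while lookaround assertions inspect the contexts without mapping them, which is the behaviour the auxiliary brackets are meant to achieve.

```python
import re

# Contexts are part of the match: the middle x is consumed by the first
# match and is no longer available as left context of the second a.
consuming = re.sub(r"xax", "xbx", "xaxax")

# Contexts are only inspected: the middle x serves both replacements.
lookaround = re.sub(r"(?<=x)a(?=x)", "b", "xaxax")

print(consuming, lookaround)
```

`re.sub` scans from left to right and therefore yields only one of the two alternative lower strings in the consuming case.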
Therefore, we have to use the contexts, li and ri, 
without mapping them. For this purpose we intro- 
duce auxiliary brackets <i after every left context 
li and >i before every right context ri. The re- 
placement maps those brackets without looking at 
the actual contexts. 
We need separate brackets for empty and non-empty
UPPER. If we used the same bracket for both,
this would mean an overlap of the substrings to
replace in an example like x >1 <1 a >1. Here we
might have to replace >1 <1 and <1 a >1, where <1
is part of both substrings. Because of this overlap,
we could not replace both substrings in parallel, i.e.
at the same time. To make the two replacements
sequentially is also impossible in either order, for
reasons explained in detail in (Kempe and Karttunen,
1995).
A regular relation describing replacement in con- 
text (and a transducer that represents it), is defined 
by the composition of a set of "simpler" auxiliary 
relations. Context brackets occur only in intermediate
relations and are not present in the final result.
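The role of the auxiliary brackets can be sketched procedurally for a single replacement with literal contexts. This is a simplification under stated assumptions: the functions below are hypothetical, they place the brackets directly instead of inserting them freely and filtering them by constraints, and the input is assumed not to contain the characters < and > itself.

```python
def mark_contexts(text, left, right):
    """Put '<' after every occurrence of left and '>' before every
    occurrence of right, without consuming the contexts."""
    out = []
    for i in range(len(text) + 1):
        if text.startswith(right, i):
            out.append(">")   # a right context begins at position i
        if text.endswith(left, 0, i):
            out.append("<")   # a left context ends at position i
        if i < len(text):
            out.append(text[i])
    return "".join(out)


def replace_marked(marked, upper, lower):
    """Map the bracketed UPPER to the bracketed LOWER."""
    return marked.replace("<" + upper + ">", "<" + lower + ">")


def remove_brackets(marked):
    """Eliminate all auxiliary brackets."""
    return marked.replace("<", "").replace(">", "")


# { a -> b || x _ x } applied to xaxax
marked = mark_contexts("xaxax", "x", "x")
replaced = replace_marked(marked, "a", "b")
print(marked, replaced, remove_brackets(replaced))
```

The middle x ends up between a right bracket and a left bracket, so it can close the context of the first a and open the context of the second one.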
2.3.2 Preparatory Steps 
Before the replacement we make the following three
transformations:
(1) Complex regular expressions like [4] are
transformed into elementary ones like [5], where every
single replacement consists of only one UPPER,
one LOWER, one LEFT and one RIGHT expression.
E.g.
{ [. (a) .] -> b || x _ y } ,
{ [ ] -> c , e -> f || v _ w } ;     [16]
would be expanded to
{ [. (a) .] -> b || x _ y } ,
{ [ ] -> c || v _ w } ,     [17]
{ e -> f || v _ w } ;
(2) Since we have to use different types of brackets
for the replacement of empty and non-empty
UPPER (cf. 2.3.1), we split the set of parallel replacements
into two groups, one containing only
replacements with empty UPPER and the other one
only with non-empty UPPER. If an UPPER contains
the empty string but is not identical with it, the
replacement will be added to both groups but with
a different UPPER. E.g. [17] would be split into
{ a -> b || x _ y } ,
{ e -> f || v _ w } ;     [18]
the group of non-empty UPPER and
{ [. .] -> b || x _ y } ,
{ [ ] -> c || v _ w } ;     [19]
the group of empty UPPER.
(3) All empty UPPER of type [ ] are transformed
into type [. .] and the corresponding
LOWER are replaced by their Kleene star function.
E.g. [19] would be transformed into
{ [. .] -> b || x _ y } ,
{ [. .] -> c* || v _ w } ;     [20]
The following algorithm of conditional parallel
replacement will consider all empty UPPER as being
of type [. .], i.e. as not being adjacent to another
empty string.
2.3.3 The Replacement itself 
Apart from the previously explained symbols, we
will make use of the following symbols in the next
regular expressions:     [21]
<allE     [ <1E | ... | <mE ], union of all left brackets
          for empty UPPER.
>allE     [ >1E | ... | >mE ], union of all right brackets
          for empty UPPER.
><allE    [ <allE | >allE ]
<allNE    [ <1 | ... | <n ], union of all left brackets for
          non-empty UPPER.
>allNE    [ >1 | ... | >n ], union of all right brackets for
          non-empty UPPER.
><allNE   [ <allNE | >allNE ]
<all      [ <allE | <allNE ]
>all      [ >allE | >allNE ]
./        Ignore-inside operator.
          Example: abc./x = [abc/x] - [x ?*] - [?* x],
          inside the string abc, i.e. between a and b
          and between b and c, all x will be ignored any number of times.
We compose the conditional parallel replacement 
of the six auxiliary relations described by Kaplan 
and Kay (1994) and Karttunen (1995) which are: 
(1) InsertBrackets \[22\] 
(2) ConstrainBrackets 
(3) LeftContext 
(4) RightContext 
(5) Replace 
(6) RemoveBrackets 
The composition of these relations in the above
order defines the upward-oriented replacement.
The resulting transducer maps UPPER inside an input
string to LOWER when UPPER is between LEFT
and RIGHT in the input context, leaving everything
else unchanged. Other variants of the replacement
operator will be defined later.
For every single replacement { Ui -> Li || li _
ri } we introduce a separate pair of brackets <i
and >i with i ∈ [1E...mE] if UPPER is identical
with the empty string and i ∈ [1...n] if UPPER does
not contain the empty string. A left bracket <i
indicates the end of a complete left context. A right
bracket >i marks the beginning of a complete right
context.
We define the component relations in the following
way. Note that UPPER, LOWER, LEFT and
RIGHT (Ui, Li, li and ri) stand for regular expressions
of any complexity but restricted to denote
regular languages. Consequently, they are represented
by networks that contain no fst pairs.
(1) InsertBrackets 
[ ] <- ><all     [23]
The relation inserts instances of all brackets on
the lower side (everywhere and in any number and
order).
(2) ConstrainBrackets
~$[ >allE [ >allNE ] ]     [24]
& ~$[ <allE [ >all ] ]
& ~$[ <allNE [ <allE | >all ] ]
The constraint does not apply to single brackets
but to their types and allows them to be only in
the following order:
>allNE* >allE* <allE* <allNE*     [25]
The composition of the steps (1) and (2) invokes
this constraint, which is necessary for the following
reasons:
If we allowed sequences like <3 U3 <1 >3 U1 >1
we would have an overlap of the two substrings
<3 U3 >3 and <1 U1 >1 which have to be replaced.
Here, either U1 or U3 could be replaced but not
both at the same time.
If we permitted sequences like >1E <2 <1E U2 >2
we would also have an overlap of the two replacements
which means we could either replace
<2 U2 >2 or >1E <1E but not both.
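The ordering constraint on adjacent brackets can be checked mechanically. The sketch below assumes a hypothetical string encoding of brackets such as ">1", ">2E", "<1E" or "<3" (a trailing E marks a bracket for empty UPPER) and tests whether a run of adjacent brackets has the form: right brackets for non-empty UPPER, then right brackets for empty UPPER, then left brackets for empty UPPER, then left brackets for non-empty UPPER.

```python
import re

# One-letter code per bracket type, in the only admissible order.
TYPE_CODES = {
    (">", False): "a",  # right bracket, non-empty UPPER
    (">", True): "b",   # right bracket, empty UPPER
    ("<", True): "c",   # left bracket, empty UPPER
    ("<", False): "d",  # left bracket, non-empty UPPER
}


def bracket_type(bracket):
    """Classify a bracket by its side and by whether it belongs to an
    empty UPPER (index ending in E)."""
    return TYPE_CODES[(bracket[0], bracket.endswith("E"))]


def order_is_valid(brackets):
    """True if the run of adjacent brackets respects the type order."""
    codes = "".join(bracket_type(b) for b in brackets)
    return re.fullmatch(r"a*b*c*d*", codes) is not None


print(order_is_valid([">2", ">1", ">1E", "<1E", "<2", "<1"]))
print(order_is_valid(["<1", ">3"]))
print(order_is_valid([">1E", "<2", "<1E"]))
```

The second and third runs correspond to the two overlapping configurations discussed above and are rejected.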
(3) LeftContext
λ1E & ... & λmE & λ1 & ... & λn     [26]
for all i ∈ [1E...mE, 1...n], λi =
~$[ ~[li./><all] (><all - <i)* <i ]
& ~$[ [li./><all] (><all - <i)* ~<i ]
The constraint forces every instance of a left
bracket <i to be immediately preceded by the corresponding
left context li and every instance of li to
be immediately followed by <i, ignoring all brackets
that are different from <i inbetween, and all
brackets (<i included) inside li (./). We separately
make the constraints λi for every <i and li and then
intersect them in order to get the constraint for all
left brackets and contexts.
(4) RightContext
ρ1E & ... & ρmE & ρ1 & ... & ρn     [27]
for all i ∈ [1E...mE, 1...n], ρi =
~$[ >i (><all - >i)* ~[ri./><all] ]
& ~$[ ~>i (><all - >i)* [ri./><all] ]
The constraint relates instances of right brackets
>i and of right contexts ri, and is the mirror image
of step (3). We derive it from the left context
constraint by reversing every right context ri, before
making the single constraints λi (not ρi) and
reversing again the result after having intersected
all λi.
(5) Replace
[ N R ]* N     [28]
The relation maps every bracketed UPPER,
<i Ui >i for non-empty UPPER and >i <i for empty
UPPER, to the corresponding bracketed LOWER,
<i Li >i, leaving everything else unchanged.
The term N in [28] means a string that does not
contain any bracketed UPPER:
N = N1E & ... & NmE & N1 & ... & Nn     [29]
A particular bracketed empty UPPER >i <i is excluded
from the corresponding Ni (i ∈ [1E, mE])
by
Ni = ~$[ >i [ ><allE - >i - <i ]* <i ]     [30]
and a bracketed non-empty UPPER <i Ui >i is excluded
from the corresponding Ni (i ∈ [1, n]) by
Ni = ~$[ <i [ <allNE - <i ]*     [31]
Ui./><all [ >allNE - >i ]* >i ]
The term R in expression [28] abbreviates a relation
that maps any bracketed UPPER to the corresponding
bracketed LOWER. It is the union of all
single Ri relations mapping all occurrences of one
Ui (empty and non-empty) to the corresponding
Li:
R = R1E | ... | RmE | R1 | ... | Rn     [32]
The replacement Ri of non-empty UPPER
Ui (i ∈ [1, n]) is performed by:
<i [ [Ui./><all] .x. [Li./><all] ] >i     [33]
To illustrate this: Suppose we have a set of replacements
containing among others
a -> b || x _ y ;     [34]
This particular replacement is done by mapping inside
an input string every substring that looks like
(underlined part, here <1a>1)     [35]
...x >2>1>1E<1E<2 <1a>1 >2>1E<1E<1<2 y...
using the brackets <1 and >1 to a substring (underlined
part, here <1b>1)     [36]
...x >2>1>1E<1E<2 <1b>1 >2>1E<1E<1<2 y...
The replacement Ri of empty UPPER Ui
(i ∈ [1E, mE]) is performed by:
[ 0 .x. [ [ ><allE - <i ] | <allNE ] ]*     [37]
[ >i .x. <i ] [ 0 .x. [Li./><all] ] [ <i .x. >i ]
[ 0 .x. [ [ ><allE - >i ] | >allNE ] ]*
In the following example we replace the empty
U2E by L2E. Suppose we have in total one replacement
of non-empty UPPER and two of empty UPPER,
one of which is
[. .] -> b || x _ y ;     [38]
This replacement is done by mapping inside a
string every substring that looks like (underlined
part, here >2E<2E)
...x >1>1E >2E<2E <1E<1 y...     [39]
using the brackets >2E<2E into a substring (underlined
part, here the three middle lines)
...x >1>1E
[ >2E | >1E | <1E | <1 ]*     [40]
<2E b >2E
[ >1 | >1E | <1E | <2E ]*
<1E<1 y...
The occurrence of exactly one bracket pair >iE
and <iE between a left and a right context actually
corresponds to the definition of a (single) empty
string expressed by [. .] (cf. sec. 2.2).
The brackets [ >2E | >1E | <1E | <1 ] and
[ >1 | >1E | <1E | <2E ] in [40] are inserted on the
lower side any number of times (including zero), i.e.
they exist optionally, which makes them present if
checking for the left or right context requires them,
and absent if they are not allowed in this place.
This set of brackets does not contain those ones 
used for the replacement, >i<i, because if we later 
check for them we do not want this check to be al- 
ways satisfied but only when the specified contexts 
are present, in order to be able to confirm or to 
cancel the replacement a posteriori. 
This set of optionally inserted brackets equally
does not contain those which potentially could be
used for the replacement of adjacent non-empty
strings, i.e. >allNE on the left and <allNE on the
right side of the expression. Otherwise, checking
later for the legitimacy of the adjacent replacements
would no longer be possible.
(6) RemoveBrackets 
><all -> [ ]     [41]
The relation eliminates from the lower-side lan- 
guage all brackets that appear on the upper side. 
3 Variants of Replacement 
3.1 Application of context constraints 
We distinguish four ways in which context can constrain
the replacement. The difference between them is
where the left and the right contexts are expected,
on the upper or on the lower side of the relation, i.e.
LEFT and RIGHT contexts can be checked before or
after the replacement.
We obtain these four different applications of
context constraints (denoted by ||, //, \\ and
\/) by varying the order of the auxiliary relations
(steps (3) to (5)) described in section 2.3.3
(cf. [22]):
(a) Upward-oriented
{ U1 -> L1 || l1 _ r1 } , ... ,     [42]
... , { Un -> Ln || ln _ rn }
...LeftContext .o. RightContext .o. Replace...
(b) Right-oriented
{ U1 -> L1 // l1 _ r1 } , ...     [43]
...RightContext .o. Replace .o. LeftContext...
(c) Left-oriented
{ U1 -> L1 \\ l1 _ r1 } , ...     [44]
...LeftContext .o. Replace .o. RightContext...
(d) Downward-oriented
{ U1 -> L1 \/ l1 _ r1 } , ...     [45]
...Replace .o. LeftContext .o. RightContext...
The versions (a) to (c) roughly correspond to
the three alternative interpretations of phonolog- 
ical rewrite rules discussed in Kaplan and Kay 
(1994). The upward-oriented version corresponds 
to the simultaneous rule application; the right- and 
left-oriented versions can model rightward or left- 
ward iterating processes, such as vowel harmony 
and assimilation. 
In the downward-oriented replacement the oper- 
ation is constrained by the lower (left and right) 
context. Here the Ui get mapped to the corresponding
Li just in case they end up between li
and ri in the output string.
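The difference between checking a context on the upper side and checking it on the lower side can be made concrete with a procedural sketch. It assumes a single rule with single-symbol UPPER, LOWER and LEFT and no right context, and the two function names are hypothetical. It simulates the upward-oriented and the right-oriented reading by scanning the string from left to right and does not compose the auxiliary relations.

```python
def upward(text, upper, lower, left):
    """Upward-oriented: the left context is read from the input
    (upper) string."""
    out = []
    for i, ch in enumerate(text):
        hit = ch == upper and i > 0 and text[i - 1] == left
        out.append(lower if hit else ch)
    return "".join(out)


def right_oriented(text, upper, lower, left):
    """Right-oriented: the left context is read from the output
    (lower) string produced so far."""
    out = []
    for ch in text:
        hit = ch == upper and bool(out) and out[-1] == left
        out.append(lower if hit else ch)
    return "".join(out)


# a -> b with left context b, applied to baaa
print(upward("baaa", "a", "b", "b"))
print(right_oriented("baaa", "a", "b", "b"))
```

In the right-oriented case each newly produced b licenses the next replacement, which is the kind of rightward iterating behaviour mentioned above for processes such as vowel harmony.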
3.2 Inverse, bidirectional and optional 
replacement 
Replacement as described above, ->, maps every
Ui on the upper side unambiguously to the corresponding
Li on the lower side but not vice versa.
An Li on the lower side gets mapped to Li or Ui on
the upper side.
The inverse replacement, <-, maps unambigu- 
ously from the lower to the upper side only. The 
bidirectional replacement, <->, is unambiguous in 
both directions. 
Replacements of all of these three types (direc- 
tions) can be optional, (->) (<-) (<->), i.e. they 
are either made or not. We define such a relation
by changing N (the part not containing any bracketed
UPPER) in expression [28] into ?* that accepts
every substring:
[ ?* R ]* ?*     [46]
Here an Ui is either mapped by the corresponding
Ri contained in R (cf. [32]) and therefore replaced
by Li, or it is mapped by ?* and not replaced.
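The effect of optionality on the set of lower strings can be enumerated for a simple case. The sketch assumes single-symbol UPPER, LOWER, LEFT and RIGHT with contexts read from the input string, and the function name is hypothetical: every site where the replacement could apply is independently either replaced or left unchanged.

```python
from itertools import product


def optional_replace(text, upper, lower, left, right):
    """(->): return all lower strings obtained by replacing any subset
    of the eligible occurrences of upper."""
    sites = [
        i for i, ch in enumerate(text)
        if ch == upper and 0 < i < len(text) - 1
        and text[i - 1] == left and text[i + 1] == right
    ]
    results = set()
    for choice in product([False, True], repeat=len(sites)):
        chars = list(text)
        for i, replace_here in zip(sites, choice):
            if replace_here:
                chars[i] = lower
        results.add("".join(chars))
    return results


# a (->) b || x _ y applied to xayxay
outputs = optional_replace("xayxay", "a", "b", "x", "y")
print(sorted(outputs))
```

With two eligible sites there are four lower strings, one of which is the unchanged input; the obligatory operator would keep only the string in which both sites are replaced.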
4 A Practical Application 
In this section we illustrate the usefulness of the 
replace operator using a practical example. 
We show how a lexicon of French verbs ending in
-ir, inflected in the present tense subjunctive mood,
can be derived from a lexicon containing the corre- 
sponding present indicative forms. We assume here 
that irregular verbs are encoded separately. 
It is often proposed that the present subjunctive
of -ir verbs be derived, for the most basic case, from
a stem in -iss- (e.g.: finir/finiss) rather than from
a more general root (e.g.: fin(i)) because once this
stem is assumed, the subjunctive ending itself becomes
completely regular:
(that I finish)        (that I run)
que je finiss-e        que je cour-e
que tu finiss-es       que tu cour-es
qu'ils finiss-ent      qu'ils cour-ent
The algorithm we propose here is straightforward:
We first derive the present subjunctive stem
from the third person plural present indicative
(e.g.: finiss, cour), then append the suffix corresponding
to the given person and number.
The first step can be described as follows: 
define LETTER :     [47]
  a | b | c | d | ... ;
define TAG :     [48]
  SubjP | ... | SG | ... | P3 | ... | Verb | ... ;
define StemRegular :     [49]
  [ [..] <-> IndP PL P3 Verb || LETTER _ TAG ]
  .o.
  [ LexInd TAG+ ]
  .o.
  [ e n t <-> SUFF || _ TAG ] ;
The first transducer in [49] inserts the tags of the
third person plural present indicative between the
word and the tags of the actually required subjunctive
form. The second transducer in [49], which is an
indicative lexicon of -ir verbs concatenated with a
sequence of at least one tag, provides the indicative
form and keeps the initial subjunctive tags.
The last transducer in [49] replaces the suffix -ent
by the symbol SUFF. E.g.:
finir ................... SubjP PL P2 Verb
finir IndP PL P3 Verb SubjP PL P2 Verb
finissent ............... SubjP PL P2 Verb
finiss SUFF ............. SubjP PL P2 Verb
To append the appropriate suffix to the subjunctive
stem, we use the following transducer which
maps the symbol SUFF to a suffix and deletes all
tags:     [50]
define Suffix :
  [ { SUFF -> e || _ TAG* SG [P1|P3] } ,
    { SUFF -> e s || _ TAG* SG P2 } ,
    { SUFF -> i o n s || _ TAG* PL P1 } ,
    { SUFF -> i e z || _ TAG* PL P2 } ,
    { SUFF -> e n t || _ TAG* PL P3 } ]
  .o.
  [ TAG -> [ ] ] ;
The complete generation of subjunctive forms can 
be described by the composition: 
define LexSubjP :     [51]
StemRegular .o. Suffix ; 
The resulting (single) transducer LexSubjP rep- 
resents a lexicon of present subjunctive forms of 
French verbs ending in -ir. It maps the infinitive of
those verbs followed by a sequence of subjunctive 
tags, to the corresponding inflected surface form 
and vice versa. 
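The two steps of the derivation can be mimicked with ordinary string operations. The sketch below is not the transducer LexSubjP: the dictionary `LEX_IND_PL_P3` is a hypothetical two-entry stand-in for the indicative lexicon LexInd, the function name `subjunctive` is an assumption, and the mapping is computed in one direction only, whereas the transducer also maps surface forms back to tagged infinitives.

```python
# Suffix selected by number and person, as in the Suffix transducer.
SUFFIXES = {
    ("SG", "P1"): "e", ("SG", "P2"): "es", ("SG", "P3"): "e",
    ("PL", "P1"): "ions", ("PL", "P2"): "iez", ("PL", "P3"): "ent",
}

# Hypothetical stand-in for LexInd: infinitive mapped to the third
# person plural present indicative form.
LEX_IND_PL_P3 = {"finir": "finissent", "courir": "courent"}


def subjunctive(infinitive, number, person):
    """Derive a present subjunctive form from the third person plural
    present indicative: strip -ent, then append the suffix."""
    indicative = LEX_IND_PL_P3[infinitive]
    if not indicative.endswith("ent"):
        raise ValueError("expected a form ending in -ent: " + indicative)
    stem = indicative[: -len("ent")]
    return stem + SUFFIXES[(number, person)]


print(subjunctive("finir", "PL", "P2"))
print(subjunctive("courir", "SG", "P1"))
```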
All intermediate transducers mentioned in this
section will contribute to this final transducer but
will themselves disappear.
The regular expressions in this section could also 
be written in the two-level formalism (Koskenniemi,
1983). However, some of them can be expressed
more conveniently in the above way, especially
when the replace operator is used.
E.g., the first line of [49], written above as:
[..] <-> IndP PL P3 Verb || LETTER _ TAG     [52]
would have to be expressed in the two-level formalism
by four rules:
0:IndP <=> LETTER _ (:PL) (:P3) (:Verb) TAG ;     [53]
0:PL <=> LETTER (:IndP) _ (:P3) (:Verb) TAG ;
0:P3 <=> LETTER (:IndP) (:PL) _ (:Verb) TAG ;
0:Verb <=> LETTER (:IndP) (:PL) (:P3) _ TAG ;
Here, the difficulty comes not only from the large
number of rules we would have to write in the above
example, but also from the fact that writing one of
these rules requires having in mind all the others,
to avoid inconsistencies between them.
Acknowledgements 
This work builds on the research by Ronald Kaplan 
and Martin Kay on the finite-state calculus and the 
implementation of phonological rewrite rules (1994). 
Many thanks to our colleagues at PARC and RXRC
Grenoble who helped us in whatever respect, particularly
to Annie Zaenen, Jean-Pierre Chanod, Marc
Dymetman, Kenneth Beesley and Anne Schiller for
helpful discussion on different topics, and to Irene
Maxwell for correcting the paper.
References 
Brill, Eric (1992). A Simple Rule-Based Part of Speech
Tagger. Proc. 3rd conference on Applied Natural
Language Processing. Trento, Italy, pp. 152-155.
Kaplan, Ronald M., and Kay, Martin (1981). Phonological
Rules and Finite-State Transducers. Annual
Meeting of the Linguistic Society of America. New
York.
Kaplan, Ronald M. and Kay, Martin (1994). Regular
Models of Phonological Rule Systems. Computational
Linguistics. 20:3, pp. 331-378.
Karlsson, Fred, Voutilainen, Atro, Heikkilä, Juha,
and Anttila, Arto (1994). Constraint Grammar:
a Language-Independent System for Parsing Unrestricted
Text. Mouton de Gruyter, Berlin.
Karttunen, Lauri (1995). The Replace Operator. Proc.
ACL-95. Cambridge, MA, USA. cmp-lg/9504032
Kempe, André and Karttunen, Lauri (1995). The Parallel
Replacement Operation in Finite State Calculus.
Technical Report MLTT-021. Rank Xerox Research
Centre, Grenoble Laboratory. Dec 21, 1995.
http://www.xerox.fr/grenoble/mltt/reports/home.html
Koskenniemi, Kimmo (1983). Two-Level Morphology:
A General Computational Model for Word-Form
Recognition and Production. Dept. of General Linguistics.
University of Helsinki.
Koskenniemi, Kimmo (1990). Finite-State Parsing and
Disambiguation. Proc. Coling-90. Helsinki, Finland.
Koskenniemi, Kimmo, Tapanainen, Pasi, and Voutilainen,
Atro (1992). Compiling and using finite-state
syntactic rules. Proc. Coling-92. Nantes, France.
Roche, Emmanuel and Schabes, Yves (1995). Deterministic
Part-of-Speech Tagging with Finite-State
Transducers. Computational Linguistics. 21, 2, pp.
227-53.
Voutilainen, Atro (1994). Three Studies of Grammar-Based
Surface Parsing of Unrestricted English Text.
The University of Helsinki.
