An Efficient Compiler for Weighted Rewrite Rules 
Mehryar Mohri 
AT&T Research 
600 Mountain Avenue 
Murray Hill, 07974 NJ 
mohri@research, att. com 
Richard Sproat 
Bell Laboratories 
700 Mountain Avenue 
Murray Hill, 07974 NJ 
rws@bell-labs, com 
Abstract 
Context-dependent rewrite rules are used 
in many areas of natural language and 
speech processing. Work in computa- 
tional phonology has demonstrated that, 
given certain conditions, such rewrite 
rules can be represented as finite-state 
transducers (FSTs). We describe a new 
algorithm for compiling rewrite rules into 
FSTs. We show the algorithm to be sim- 
pler and more efficient than existing al- 
gorithms. Further, many of our appli- 
cations demand the ability to compile 
weighted rules into weighted FSTs, trans- 
ducers generalized by providing transi- 
tions with weights. We have extended 
the algorithm to allow for this. 
1. Motivation 
Rewrite rules are used in many areas of natural 
language and speech processing, including syntax, 
morphology, and phonology 1. In interesting ap- 
plications, the number of rules can be very large. 
It is then crucial to give a representation of these 
rules that leads to efficient programs. 
Finite-state transducers provide just such a 
compact representation (Mohri, 1994). They are 
used in various areas of natural language and 
speech processing because their increased compu- 
tational power enables one to build very large ma- 
chines to model interestingly complex linguistic 
phenomena. They also allow algebraic operations 
such as union, composition, and projection which 
are very useful in practice (Berstel, 1979; Eilen- 
berg, 1974 1976). And, as originally shown by 
Johnson (1972), rewrite rules can be modeled as 
1 Parallel rewrite rules also have interesting applica- 
tions in biology. In addition to their formal language 
theory interest, systems such as those of Aristid Lin- 
denmayer provide rich mathematical models for bio- 
logical development (Rozenberg and Sa\]omaa, 1980). 
231 
finite-state transducers, under the condition that 
no rule be allowed to apply any more than a finite 
number of times to its own output. 
Kaplan and Kay (1994), or equivalently Kart- 
tunen (1995), provide an algorithm for compiling 
rewrite rules into finite-state transducers, under 
the condition that they do not rewrite their non- 
contextual part 2. We here present a new algorithm 
for compiling such rewrite rules which is both sim- 
pler to understand and implement, and computa- 
tionally more efficient. Clarity is important since, 
as pointed out by Kaplan and Kay (1994), the rep- 
resentation of rewrite rules by finite-state trans- 
ducers involves many subtleties. Time and space 
efficiency of the compilation are also crucial. Us- 
ing naive algorithms can be very time consuming 
and lead to very large machines (Liberman, 1994). 
In some applications such as those related 
to speech processing, one needs to use weighted 
rewrite rules, namely rewrite rules to which 
weights are associated. These weights are then 
used at the final stage of applications to output the 
most probable analysis. Weighted rewrite rules 
can be compiled into weighted finite-state trans- 
ducers, namely transducers generalized by pro- 
viding transitions with a weighted output, under 
the same context condition. These transducers 
are very useful in speech processing (Pereira et 
al., 1994). We briefly describe how we have aug- 
mented our algorithm to handle the compilation 
of weighted rules into weighted finite-state trans- 
ducers. 
In order to set the stage for our own contribu- 
tion, we start by reviewing salient aspects of the 
Kaplan and Kay algorithm. 
2The genera\] question of the decidability of the 
halting problem even for one-rule semi-Thue systems 
is still open. Robert McNaughton (1994) has recently 
made a positive conjecture about the class of the rules 
without self overlap. 
Prologue o 
I d( Obligatory( ¢ , <i , >)) 
Id(Rightcontezt(p, <, >)) 
Replace 
Id(Leftcontezt(A, <, >)) 
Prologue -i 
= Id(Z~< 0 <i ¢°< > B~,< 0) o 
= Id((  0 > p>0Z 0- > > p>0  0- > o 
= \[Id(~*<,>, o)Opt(Id(<a)¢°<c>c × ¢°c>cId(>a))\]* o 
-" Id((~0A<0 - ~0 < < Z~0 f\] ~0A<0 - ~0 < < ~0)>) o 
Figure 1: Compilation of obligatory left-to-right rules, using the KK algorithm. 
(1) 
2. The KK Algorithm 
The rewrite rules we consider here have the fol- 
lowing general form: 
¢ --, p (2) 
Such rules can be interpreted in the following way: 
¢ is to be replaced by ¢ whenever it is preceded 
by A and followed by p. Thus, A and p represent 
the left and right contexts of application of the 
rules. In general, ¢, ¢, A and p are all regular 
expressions over the alphabet of the rules. Several 
types of rules can be considered depending on their 
being obligatory or optional, and on their direction 
of application, from left to right, right to left or 
simultaneous application. 
Consider an obligatory rewrite rule of the form 
¢ --+ ¢/A p, which we will assume applies left to 
right across the input string. Compilation of this 
rule in the algorithm of Kaplan and Kay (1994) 
(KK for short) involves composing together six 
transducers, see Figure 1. 
We use the notations of KK. In particular, 
denotes the alphabet, < denotes the set of context 
labeled brackets {<a, <i, <c}, > the set {>a, >i, 
>c}, and 0 an additional character representing 
deleted material. Subscript symbols of an expres- 
sion are symbols which are allowed to freely ap- 
pear anywhere in the strings represented by that 
expression. Given a regular expression r, Id(r) is 
the identity transducer obtained from an automa- 
ton A representing r by adding output labels to A 
identical to its input labels. 
The first transducer, Prologue, freely intro- 
duces labeled brackets from the set {<a, <i, 
<~, >a, >i, >~} which are used by left and 
right context transducers. The last transducer, 
Prologue -i, erases all such brackets. 
In such a short space, we can of course not 
hope to do justice to the KK algorithm, and the 
reader who is not familiar with it is urged to con- 
sult their paper. However, one point that we do 
need to stress is the following: while the con- 
struction of Prologue, Prologue -i and Replace 
232 
is fairly direct, construction of the other transduc- 
ers is more complex, with each being derived via 
the application of several levels of regular oper- 
ations from the original expressions in the rules. 
This clearly appears from the explicit expressions 
we have indicated for the transducers. The con- 
struction of the three other transducers involves 
many operations including: two intersections of 
automata, two distinct subtractions, and nine 
complementations. Each subtraction involves an 
intersection and a complementation algorithm 3. 
So, in the whole, four intersections and eleven 
complementations need to be performed. 
Intersection and complementation are classi- 
cal automata algorithms (Aho et al., 1974; Aho 
et al., 1986). The complexity of intersection is 
quadratic. But the classical complementation al- 
gorithm requires the input automaton to be de- 
terministic. Thus, each of these 11 operations re- 
quires first the determinization of the input. Such 
operations can be very costly in the case of the 
automata involved in the KK algorithm 4. 
In the following section we briefly describe a 
new algorithm for compiling rewrite rules. For rea- 
sons of space, we concentrate here on the com- 
pilation of left-to-right obligatory rewrite rules. 
However, our methods extend straightforwardly to 
other modes of application (optional, right-to-left, 
simultaneous, batch), or kinds of rules (two-level 
rules) discussed by Kaplan and Kay (1994). 
3A subtraction can of course also be performed di- 
rectly by combining the two steps of intersection and 
complementation, but the corresponding algorithm 
has exactly the same cost as the total cost of the two 
operations performed consecutively. 
4 One could hope to find a more efficient way of de- 
termining the complement of an automaton that would 
not require determinization. However, this problem 
is PSPACE-complete. Indeed, the regular expression 
non-universality problem is a subproblem of comple- 
mentation known to be PSPACE-complete (Garey and 
Johnson, 1979, page 174), (Stockmeyer and Meyer, 
1973). This problem also known as the emptiness 
of complement problem has been extensively studied 
(Aho et al., 1974, page 410-419). 
3. New Algorithm 
3.1. Overview 
In contrast to the KK algorithm which introduces 
"brackets everywhere only to restrict their occur- 
rence subsequently, our algorithm introduces con- 
text symbols just when and where they are needed. 
Furthermore, the number of intermediate trans- 
ducers necessary in the construction of the rules 
is smaller than in the KK algorithm, and each of 
the transducers can be constructed more directly 
and efficiently from the primitive expressions of 
the rule, ~, ~, A, p. 
A transducer corresponding to the left-to- 
right obligatory rule ¢ --* ¢/A p can be ob- 
tained by composition of five transducers: 
r o f o replace o 11 o 12 (3) 
1. The transducer r introduces in a string a 
marker > before every instance of p. For rea- 
sons that will become clear we will notate this 
as Z* p --~ E* > p. 
2. The transducer f introduces markers <1 and 
<2 before each instance of ~ that is followed 
by >: u u {>})'{<1, <2 
}5 >. In other words, this transducer/harks 
just those ~b that occur before p. 
3. The replacement transducer replace replaces 
~b with ~ in the context <1 ~b >, simultane- 
ously deleting > in all positions (Figure 2). 
Since >, <1, and <2 need to be ignored when 
determining an occurrence of ~b, there are 
loops over the transitions >: c, <1: ¢, <~: c 
at all states of ¢, or equivalently of the states 
of the cross product transducer ¢ × ~. 
4. The transducer 11 admits only those strings 
in which occurrences of <1 are preceded 
by A and deletes <l at such occurrences: 
5. The transducer 12 admits only those strings 
in which occurrences of <2 are not preceded 
by A and deletes <~ at such occurrences: 2*X <2-~ ~*~. 
Clearly the composition of these transducers leads 
to the desired result. The construction of the 
transducer replace is straightforward. In the fol- 
lowing, we show that the construction of the other 
four transducers is also very simple, and that it 
only requires the determinization of 3 automata 
and additional work linear (time and space) in the 
size of the determinized automata. 
3.2. Markers 
Markers of TYPE 1 
Let us start by considering the problem of con- 
structing what we shall call a TYPE I transducer, 
233 
Figure 2: Replacement transducer replace in the 
obligatory left-to-right case. 
which inserts a marker after all prefixes of a string 
that match a particular regular expression. Given 
a regular expression fl defined on the alphabet E, 
one can construct, using classical algorithms (Aho 
et al., 1986), a deterministic automaton a repre- 
senting E*fl. As with the KK algorithm, one can 
obtain from a a transducer X = Id(a) simply by 
assigning to each transition the same output label 
as the input label. We can easily transform X into 
a new transducer r such that it inserts an arbi- 
trary marker ~ after each occurrence of a pattern 
described by ~. To do so, we make final the non- 
final states of X and for any final state q of X we 
create a new state q~, a copy of q. Thus, q' has 
the same transitions as q, and qP is a final state. 
We then make q non-final, remove the transitions 
leaving q and add a transition from q to q' with 
input label the empty word c, and output ~. Fig- 
ures 3 and 4 illustrate the transformation of X into 
T. 
a:a cic 
Figure 3: Final state q of X with entering and 
leaving transitions. 
ata ctc 
Figure 4: States and transitions of r obtained by 
modifications of those of X. 
Proposition 1 Let ~ be a deterministic automa- 
ton representing E*/3, then the transducer r ob- 
tained as described above is a transducer post- 
marking occurrences of fl in a string ofF* by #. 
Proof. The proof is based on the observa- 
tion that a deterministic automaton representing 
E*/~ is necessarily complete 5. Notice that non- 
deterministic automata representing ~*j3 are not 
necessarily complete. Let q be a state of a and let 
u E ~* be a string reaching q6. Let v be a string 
described by the regular expression ft. Then, for 
any a E ~, uav is in ~*~. Hence, uav is accepted 
by the automaton a, and, since ~ is deterministic, 
there exists a transition labeled with a leaving q. 
Thus, one can read any string u E E* using the 
automaton a. Since by definition of a, the state 
reached when reading a prefix u ~ of u is final iff 
u ~ E ~*~, by construction, the transducer r in- 
serts the symbol # after the prefix u ~ iff u ~ ends 
with a pattern of ft. This ends the proof of the 
proposition, t3 
Markers of TYPE 2 
In some cases, one wishes to check that any 
occurrence of # in a string s is preceded (or fol- 
lowed) by an occurrence of a pattern of 8. We 
shall say that the corresponding transducers are 
of TYPE 2. They play the role of a filter. Here 
again, they can be defined from a deterministic au- 
tomaton representing E*B. Figure 5 illustrates the 
modifications to make from the automaton of fig- 
ure 3. The symbols # should only appear at final 
states and must be erased. The loop # : e added 
at final states of Id(c~) is enough for that purpose. 
All states of the transducer are then made final 
since any string conforming to this restriction is 
acceptable: cf. the transducer !1 for A above. 
#:E 
Figure 5: Filter transducer, TYPE 2. 
5An automaton A is complete iff at any state q and 
for any element a of the alphabet ~ there exists at least 
one transition leaving q labeled with a. In the case of 
deterministic automata, the transition is unique. 
6We assume all states of a accessible. This is true 
if a is obtained by determinization. 
234 
Markers of TYPE 3 
In other cases, one wishes to check the reverse 
constraint, that is that occurrences of # in the 
string s are not preceded (or followed) by any oc- 
currence of a pattern of ft. The transformation 
then simply consists of adding a loop at each non- 
final state of Id(a), and of making all states final. 
Thus, a state such as that of figure 6 is trans- 
a:a c:c 
Figure 6: Non-final state q of a. 
formed into that of figure 5. We shall say that the 
corresponding transducer is of TYPE 3: cf. the 
transducer 12 for ~. 
The construction of these transducers (TYPE 
1-3) can be generalized in various ways. In par- 
ticular: 
• One can add several alternative markers 
{#1,'", #k} after each occurrence of a pat- 
tern of 8 in a string. The result is then an 
automaton with transitions labeled with, for 
instance, ~1,'" ", ~k after each pattern of fl: 
cf. transducer f for ¢ above. 
• Instead of inserting a symbol, one can delete 
a symbol which would be necessarily present 
after each occurrence of a pattern of 8. 
For any regular expression a, de- 
fine M arker( a, type, deletions, insertions) as the 
transducer of type type constructed as previously 
described from a deterministic automaton repre- 
senting a, insertions and deletions being, respec- 
tively, the set of insertions and deletions the trans- 
ducer makes. 
Proposition 2 For any regular expression 
a, Marker(a, type, deletions, insertions) can be 
constructed from a deterministic automaton rep- 
resenting a in linear time and space with respect 
to the size of this automaton. 
Proof. We proved in the previous proposition that 
the modifications do indeed lead to the desired 
transducer for TYPE 1. The proof for other cases 
is similar. That the construction is linear in space 
is clear since at most one additional transition and 
state is created for final or non-final states 7. The 
overall time complexity of the construction is lin- 
ear, since the construction of ld(a) is linear in the 
~For TYPE 2 and TYPE 3, no state is added but only 
a transition per final or non-final state. 
r = \[reverse(Marker(E*reverse(p), 1, {>},0))\] 
f = \[reverse(Marker((~ U {>})*reverse(C> >), 1, {<1, <u},0))\] 
11 = \[Marker(N*)L 2,0, {<1})\]<~:<2 
12 = \[Marker($*A,3,@, {<2})\] 
Figure 7: Expressions of the r, f, ll, and 12 using Marker. 
(4) 
(5) 
(6) 
(7) 
number of transitions of a and that other modifi- 
cations consisting of adding new states and transi- 
tions and making states final or not are also linear. 
D 
We just showed that Marker(a,type, de- 
letions, insertions) can be constructed in a very 
efficient way. Figure 7 gives the expressions of the 
four transducers r, f, ll, and 12 using Marker. 
Thus, these transducers can be constructed 
very efficiently from deterministic automata repre- 
senting s ~*reverse(p), (~ O {>})* reverse(t> >), 
and E*,~. The construction of r and f requires 
two reverse operations. This is because these two 
transducers insert material before p or ¢. 
4. Extension to Weighted Rules 
In many applications, in particular in areas re- 
lated to speech, one wishes not only to give all 
possible analyses of some input, but also to give 
some measure of how likely each of the analyses is. 
One can then generalize replacements by consid- 
ering extended regular expressions, namely, using 
the terminology of formal language theory, ratio- 
nal power series (Berstel and Reutenauer, 1988; 
Salomaa and Soittola, 1978). 
The rational power series we consider here are 
functions mapping ~* to ~+ U {oo) which can be 
described by regular expressions over the alphabet 
(T~+ U {co}) x ~. S = (4a)(2b)*(3b) is an example 
of rational power series. It defines a function in 
the following way: it associates a non-null num- 
ber only with the strings recognized by the regu- 
lar expression ab*b. This number is obtained by 
adding the coefficients involved in the recognition 
of the string. The value associated with abbb, for 
instance, is (S, abbb) = 4 + 2 + 2 + 3 = 11. 
In general, such extended regular expressions 
can be redundant. Some strings can be matched 
SAs in the KK algorithm we denote by ¢> the set 
of the strings described by ¢ containing possibly oc- 
currences of > at any position. In the same way, sub- 
scripts such as >:> for a transducer r indicate that 
loops by >:> are added at all states of r. We de- 
note by reverse(a) the regular expression describing 
exactly the reverse strings of a if a is a regular expres- 
sion, or the reverse transducer of a if a is a transducer. 
235 
in different ways with distinct coefficients. The 
value associated with those strings is then the min- 
imum of all possible results. S' = (2a)(3b)(4b) + 
(5a)(3b*) matches abb with the different weights 
2+3+4 -- 9 and 5+3+3 = 11. The mini- 
mum of the two is the value associated with abb: 
(S', abb) = 9. Non-negative numbers in the defi- 
nition of these power series are often interpreted 
as the negative logarithm of probabilities. This 
explains our choice of the operations: addition of 
the weights along the string recognition and min, 
since we are only interested in that result which 
has the highest probability 9. 
Rewrite rules can be generalized by letting ¢ 
be a rational power series. The result of the ap- 
plication of a generalized rule to a string is then 
a set of weighted strings which can be represented 
by a weighted automaton. Consider for instance 
the following rule, which states that an abstract 
nasal, denoted N, is rewritten as m in the context 
of a following labial: 
Y ---* m/__\[+labial\] (8) 
z Now suppose that this is only probabilistically 
true, and that while ninety percent of the time 
N does indeed become m in this environment, 
about ten percent of the time in real speech it be- 
comes n. Converting from probabilities to weights, 
one would say that N becomes m with weight 
a = -log(0.9), and n with weight fl = -log(0.1), 
in the stated environment. One could represent 
this by the following rule: 
N --* am + fin/__\[+labial\] (9) 
We define Weighted finite-state transducers as 
transducers such that in addition to input and out- 
put labels, each transition is labeled with a weight. 
The result of the application of a weighted 
transducer to a string, or more generally to an 
automaton is a weighted automaton. The corre- 
sponding operation is similar to the unweighted 
case. However, the weight of the transducer 
and those of the string or automaton need to 
be combined too, here added, during composition 
(Pereira et al., 1994). 
9Using the terminology of the theory of languages, 
the functions we consider here are power series de- 
fined on the tropical semiring (7~+U{oo}, min, +, (x), 0) 
(Kuich and Salomaa, 1986). 
\"_/Too 
p:p/O 
N:N/O~ 
N:m/a N:n/l~ 
:0~ N:N/O 
Figure 8: Transducer representing the rule 9. 
We have generalized the composition opera- 
tion to the weighted case by introducing this com- 
bination of weights. The algorithm we described 
in the previous sections can then also be used to 
compile weighted rewrite rules. 
As an example, the obligatory rule 9 can be 
represented by the weighted transducer of Fig- 
ure 8 10. The following theorem extends to the 
weighted case the assertion proved by Kaplan and 
Kay (1994). 
Theorem 1 A weighted rewrite rule of the type 
defined above that does not rewrite its non- 
contextual part can be represented by a weighted 
finite-state transducer. 
Proof. The construction we described in the pre- 
vious section also provides a constructive proof 
of this theorem in the unweighted case. In case 
¢ is a power series, one simply needs to use in 
that construction a weighted finite-state trans- 
ducer representing ¢. By definition of composition 
of weighted transducers, or multiplication of power 
series, the weights are then used in a way consis- 
tent with the definition of the weighted context- 
dependent rules, o 
5. Experiments 
In order to compare the performance of the M- 
gorithm presented here with KK, we timed both 
algorithms on the compilation of individual rules 
taken from the following set (k • \[0, 10\]): 
a --* b~ c ~ (10) 
a --* b~ c k (11) 
1°We here use the symbol ~ to denote all letters 
different from b, rn, n, p, and N. 
236 
In other words we tested twenty two rules where 
the left context or the right context is varied in 
length from zero to ten occurrences of c. For our 
experiments, we used the alphabet of a realistic 
application, the text analyzer for the Bell Labora- 
tories German text-to-speech system consisting of 
194 labels. All tests were run on a Silicon Graph- 
ics IRIS Indigo 4000, 100 MhZ IP20 Processor, 
128 Mbytes RAM, running IRIX 5.2. Figure 9 
shows the relative performance of the two algo- 
rithms for the left context: apparently the per- 
formance of both algorithms is roughly linear in 
the length of the left context, but KK has a worse 
constant, due to the larger number of operations 
involved. Figure 10 shows the equivalent data for 
the right context. At first glance the data looks 
similar to that for the left context, until one no- 
tices that in Figure 10 we have plotted the time on 
a log scale: the KK algorithm is hyperexponential. 
What is the reason for this performance degra- 
dation in the right context? The culprits turn 
out to be the two intersectands in the expression 
of Rightcontext(p, <, >) in Figure 1. Consider 
for example the righthand intersectand, namely 
~0 > P>0~0- > ~0, which is the complement 
of ~0 > P>0~0- > ~0- As previously in- 
dicated, the complementation Mgorithm. requires 
determinization, and the determinization of au- 
tomata representing expressions of the form ~*a, 
where c~ is a regular expression, is often very ex- 
pensive, specially when the expression a is already 
complex, as in this case. 
Figure 11 plots the behavior of determiniza- 
tion on the expression Z~0 > P>0Z~0- > ~0 
for each of the rules in the set a ~ b/__c k, 
(k e \[0, 10\]). On the horizontal axis is the num- 
ber of arcs of the non-deterministic input machine, 
and on the vertical axis the log of the number of 
arcs of the deterministic machine, i.e. the ma- 
chine result of the determinization algorithm with- 
out using any minimization. The perfect linearity 
indicates an exponential time and space behav- 
ior, and this in turn explains the observed differ- 
ence in performance. In contrast, the construction 
of the right context machine in our algorithm in- 
volves only the single determinization of the au- 
tomaton representing ~*p, and thus is much less 
expensive. The comparison just discussed involves 
a rather artificiM ruleset, but the differences in 
performance that we have highlighted show up in 
real applications. Consider two sets of pronun- 
ciation rules from the Bell Laboratories German 
text-to-speech system: the size of the alphabet for 
this ruleset is 194, as noted above. The first rule- 
set, consisting of pronunciation rules for the ortho- 
graphic vowel <5> contains twelve rules, and the 
second ruleset, which deals with the orthographic 
q, 
o 
11 
/" -----nl / 
.../'/" 
/ 
m/'= 
i1~11 j 
0 0 0 `~'-" 0 / 0 ~ 0 
0__0/0 0 0 
I I I I 
2 4 6 $ 
L=tr~lm 011.~t Comxt 
Figure 9: Compilation times for rules of the form 
a ~ b/ck , (k E \[0, 10\]). 
9" 
o 
o 
./ / 
I: / N,* a~,~t.vn I / 
o/" /./ 
ii/11 
~0~0~0 ".''" 0 
i J i i i 
2 4 6 e 10 
Figure 10: Compilation times for rules of the form 
a ~ b/ c k, (k E \[0, 10\]). 
vowel <a> contains twenty five rules. In the ac- 
tual application of the rule compiler to these rules, 
one compiles the individual rules in each ruleset 
one by one, and composes them together in the 
order written, compacts them after each composi- 
tion, and derives a single transducer for each set. 
When done off-line, these operations of compo- 
Table 1: Comparison in a real example. 
I Rulesll KK II New I 
time space time space 
(s) states arcs (s) states arcs 
<5> 62 412 50,475 47 394 47,491 
<a> 284 1,939 215,721 240 1,927 213,408 
sition and compaction dominate the time corre- 
sponding to the construction of the transducer for 
each individual rule. The difference between the 
two algorithms appears still clearly for these two 
sets of rules. Table 1 shows for each algorithm 
the times in seconds for the overall construction, 
and the number of states and arcs of the output 
transducers. 
6. Conclusion 
We briefly described a new algorithm for compiling 
context-dependent rewrite rules into finite-state 
transducers. Several additional methods can be 
used to make this algorithm even more efficient. 
The automata determinizations needed for 
this algorithm are of a specific type. They repre- 
237 
sent expressions of the type ~*¢ where ¢ is a reg- 
ular expression. Given a deterministic automaton 
representing ¢, such determinizations can be per- 
formed in a more efficient way using failure func- 
tions (Mohri, 1995). Moreover, the corresponding 
determinization is independent of ~ which can be 
very large in some applications. It only depends 
on the alphabet of the automaton representing ¢. 
One can devise an on-the-fly implementation 
of the composition algorithm leading to the final 
transducer representing a rule. Only the neces- 
sary part of the intermediate transducers is then 
expanded for a given input (Pereira et al., 1994). 
The resulting transducer representing a rule 
is often subsequentiable or p-subsequentiable. It 
can then be determinized and minimized (Mohri, 
1994). This both makes the use of the transducer 
time efficient and reduces its size. 
We also indicated an extension of the theory 
of rule-compilation to the case of weighted rules, 
which compile into weighted finite-state transduc- 
ers. Many algorithms used in the finite-state the- 
ory and in their applications to natural language 
processing can be extended in the same way. 
To date the main serious application of this 
compiler has been to developing text-analyzers 
for text-to-speech systems at Bell Laboratories 
(Sproat, 1996): partial to more-or-less complete 
analyzers have been built for Spanish, Italian, 
French, Romanian, German, Russian, Mandarin 
and Japanese. However, we hope to also be able to 
use the compiler in serious applications in speech 
2 - 
co 
! 
O / 
/° /° 
/ /° 
/ / 
/ / 
I I t 
SOO 1;10 1120 
II S~S in I:bsr S 
Figure 11: Number of arcs in the non- 
deterministic automaton r representing PS = 
$ $ E~0 > P>0E>0- > E>0 versus the log of the num- 
ber of arcs in the automaton obtained by deter- 
minization of r. 
recognition in the future. 
Acknowledgements 
We wish to thank several colleagues of AT&T/_Bell 
Labs, in particular Fernando Pereira and Michael 
Riley for stimulating discussions about this work 
and Bernd MSbius for providing the German pro- 
nunciation rules cited herein. 

References 
Alfred V. Aho, John E. Hopcroft, and Jeffrey D. 
Ullman. 1974. The design and analysis of 
computer algorithms. Addison Wesley: Read- 
ing, MA. 
Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. 
1986. Compilers, Principles, Techniques and 
Tools. Addison Wesley: Reading, MA. 
Jean Berstel and Christophe Reutenauer. 
1988. Rational Series and Their Languages. 
Springer-Verlag: Berlin-New York. 
Jean Berstel. 1979. Transductions and Context- 
Free Languages. Teubner Studienbucher: 
Stuttgart. 
Samuel Eilenberg. 1974-1976. Automata, Lan- 
guages and Machines, volume A-B. Academic 
Press. 
Michael R. Garey and David S. Johnson. 1979. 
Computers and Intractability. Freeman and 
Company, New York. 
C. Douglas Johnson. 1972. Formal Aspects of 
Phonological Description. Mouton, Mouton, 
The Hague. 
Ronald M. Kaplan and Martin Kay. 1994. Regu- 
lar models of phonological rule systems. Com- 
putational Linguistics, 20(3). 
Lauri Karttunen. 1995. The replace operator. In 
33 rd Meeting of the Association for Compu- 
tational Linguistics (ACL 95), Proceedings of 
the Conference, MIT, Cambridge, Massachus- 
setts. ACL. 
Wener Kuich and Arto Salomaa. 1986. Semir- 
ings, Automata, Languages. Springer-Verlag: 
Berlin-New York. 
Mark Liberman. 1994. Commentary on kaplan 
and kay. Computational Linguistics, 20(3). 
Robert McNaughton. 1994. The uniform halt- 
ing problem for one-rule semi-thue systems. 
Technical Report 94-18, Department of Com- 
puter Science, Rensselaer Polytechnic Insti- 
tute, Troy, New York. 
Mehryar Mohri. 1994. Compact representations 
by finite-state transducers. In 32 nd Meeting of 
the Association for Computational Linguistics 
(ACL 94), Proceedings of the Conference, Las 
Cruces, New Mexico. ACL. 
Mehryar Mohri. 1995. Matching patterns of an 
automaton. Lecture Notes in Computer Sci- 
ence, 937. 
Fernando C. N. Pereira, Michael Riley, and 
Richard Sproat. 1994. Weighted rational 
transductions and their application to human 
language processing. In ARPA Workshop on 
Human Language Technology. Advanced Re- 
search Projects Agency. 
Grzegorz Rozenberg and Arto Salomaa. 1980. 
The Mathematical Theory of L Systems. Aca- 
demic Press, New York. 
Arto Salomaa and Matti Soittola. 1978. 
Automata- Theoretic Aspects of Formal Power 
Series. Springer-Verlag: Berlin-New York. 
Richard Sproat. 1996. Multilingual text analy- 
sis for text-to-speech synthesis. In Proceed- 
ings of the ECAI-96 Workshop on Extended 
Finite State Models of Language, Budapest, 
Hungary. European Conference on Artificial 
Intelligence. 
L. J. Stockmeyer and A. R. Meyer. 1973. Word 
problems requiring exponential time. In Pro- 
ceedings of the 5 th Annual ACM Sympo- 
sium on Theory of Computing. Association for 
Computing Machinery, New York, 1-9. 
