Disambiguation of Finite-State Transducers
N. Smaili and P. Cardinal and G. Boulianne and P. Dumouchel
Centre de Recherche Informatique de Montr eal.
fnsmaili, pcardinal, gboulian, Pierre.Dumouchelg@crim.ca
Abstract
The objective of this work is to
disambiguate transducers which have
the following form: T = R D and to
be able to apply the determinization
algorithm described in (Mohri, 1997).
Our approach to disambiguating
T = R D consists  rst of computing
the composition T and thereafter to
disambiguate the transducer T. We
will give an important consequence of
this result that allows us to compose
any number of transducers R with
the transducer D, in contrast to the
previous approach which consisted in
 rst disambiguating transducers D
and R to produce respectively D0 and
R0, then computing T0 = R0 D0 where
T0 is unambiguous. We will present
results in the case of a transducer
D representing a dictionary and R
representing phonological rules.
Keywords: ambiguity, determinis-
tic, dictionary, transducer.
1 Introduction
The task of speech recognition can be
decomposed into several steps, where
each step is represented by a  nite-
state transducer (Mohri et al., 1998).
The search space of the recognizer is
de ned by the composition of trans-
ducers T = A C R D M. Trans-
ducer A converts a sequence of obser-
vations O to a sequence of context-
dependent phones.
Transducer C converts a sequence
of context-dependent phones to a
sequence of context-independent
phones. Transducer R is a mapping
from phones to phones which imple-
ments phonological rules. Transducer
D is the pronunciations dictionary.
It converts a sequence of context-
independent phones to a sequence of
words. Transducer M represents a
language model: it converts sequences
of words into sequences of words, while
restricting the possible sequences or
assigning a score to the sequences.
The speech recognition problem con-
sists of  nding the path of least cost
in transducer O  T, where O is a
sequence of acoustic observations.
The pronunciations dictionary rep-
resenting the mapping from pronun-
ciations to words can show an inher-
ent ambiguity: a sequence of phones
can correspond to more than one word,
so we cannot apply the transducer de-
terminization algorithm (an operation
which reduces the redundancy, search
time and possibly space). This prob-
lem is usually handled by adding spe-
cial symbols to the dictionary to re-
move the ambiguity in order to be
able to apply the determinization al-
gorithm (Koskenniemi, 1990). Never-
theless, when we compose the dictio-
nary with the phonological rules, we
must take into account special sym-
bols. This complicates the construc-
tion of transducers representing these
rules and leads to size explosion. It
would be simpler to compose the rules
with the dictionary, then remove the
ambiguity in the result and then apply
the determinization algorithm.
2 Notations and
de nitions
Formally, a weighted transducer over a
semiring K = (K; ; ; 0; 1) is de ned
as a 6-tuple T = (Q;I; 1; 2;E;F)
where Q is a  nite set of states, I  
Q is a  nite set of initial states,  1 is
the input alphabet,  2 is the output
alphabet, E is a  nite set of transitions
and F  Q is a  nite set of  nal states.
A transition is an element of Q  1 
 2 Q K.
Transitions are of the form
t = (p(t);i(t);o(t);n(t);w(t));t2E
where p(t) denotes the transition’s
origin state, i(t) its input label, o(t)
its output label, n(t) the transition’s
destination state and w(t) 2K is the
weight of t. The tropical semiring
de ned as (R+ [1;min;+;1;0) is
commonly used in speech recogni-
tion, but our results are applicable
to the case of general semirings as well.
A path  = t1   tn of T is an ele-
ment of E verifying
n(ti 1) = p(ti) for 2 i n.
We can easily extend the functions p
and n to those paths:
p( ) = p(t1); (1)
n( ) = n(tn): (2)
We denote by P(r;s) the set of paths
whose origin is state r and whose des-
tination is state s. We can also extend
the function P to the sets R Q and
S Q:
P(R;S) = S
r2R; s2S
P(r;s)
We can extend the functions i and o to
the paths by taking the concatenations
of the input and output symbols:
i( ) = i(t1)   i(tn); (3)
o( ) = o(t1)   o(tn): (4)
De nition 1 (unambiguous trans-
ducer, (Berstel, 1979))
A transducer T is said to be unam-
biguous if for each w 2   1, there
exists at most one path  in T such
that i( ) = w.
De nition 2 (ambiguous paths)
Two paths  and  are ambiguous if
 6=  and i( ) = i( ).
Remark 1 : To remove the ambiguity
between two paths  and  , it su ces
to modify i( ) by changing the  rst in-
put label of the path  . This is done
by introducing an auxiliary symbol such
that: i( )6= i( ).
Figure 1a shows an ambiguous
transducer. It is ambiguous since
for the input string \s e [z]", there
are two paths representing the out-
put strings fces;sesg. In this  gure,
\eps" stands for epsilon or null symbol.
To disambiguate a transducer, we
 rst group the ambiguous paths; we
then remove the ambiguity in each
group by adding auxiliary labels as
shown in Figure 1b. Unfortunately, it
is infeasible to enumerate all the paths
in a cyclic transducer. However, in
(Smaili, 2001) it is shown that cyclic
transducers of the type studied in this
work can be disambiguated by trans-
forming to a corresponding acyclic sub-
transducer such that T0  T. This
(a)
0
1s:ses
3s:ces
5a:amis
7
k:cadeau
2E:eps
4E:eps
6m:eps
8a:eps
10
[z]:eps
#:#
[z]:eps
i:eps
9d:eps
o:eps
(b)
0
1s:ses
3s-2:ces
5a:amis
7
k:cadeau
2E:eps
4E:eps
6m:eps
8a:eps
10
[z]:eps
#:#
[z]:eps
i:eps
9d:eps
o:eps
Figure 1: (a) Ambiguous transducer
(b) Disambiguated transducer
fundamental property is described in
detail in section 2.1. Accordingly, we
apply the appropriate transformation
to the input transducer.
2.1 Fundamental Property
We are interested in the transducer
T = (Q; I;  ;  ; E; F) with  =
 0 ] 1 verifying the following prop-
erty:
Any cycle in T contains at least a
transition t such that i(t)2 1:
We denote by E0 and E1 the follow-
ing sets: E0 = ft 2 E : i(t) 2  0g
and E1 =ft2E : i(t) 2 1g: Notice
that E = E0]E1:
We can give a characterization of the
ambiguous paths verifying the funda-
mental property. Before, let’s make the
following remark:
Remark 2 Any path  in T has the
following form:
 = f0  0 f1  1    n 1 fn  n
with  i 2 E+0 ; fi 2 E+1 for 1  i  
n; f0 2 E 1 and  0 2 E 0 if n 1:
If n = 0 then  = f0  0:
Proposition 1 (characterization of
ambiguous paths)
Let  and  be two paths such that:
 = f0  0 f1  1    n 1 fn  n and
 = g0  0 g1  1    k 1 gk  k.
 and  are ambiguous if and only if
8
<
:
k = n
 i and  i are ambiguous (0 i n):
fi and gi are ambiguous (0 i n):
We will assume that the  rst transi-
tion’s path belongs to E0, i.e. f0 =  :
Recall that if we want to avoid cy-
cles, we just have to remove from T
all transitions t 2 E1. According to
Proposition 1, ambiguity needs to be
removed only in paths that use tran-
sitions t 2 E0, namely the path  i
that performs the decomposition given
in Remark 2. Disambiguation consists
only of introducing auxiliary labels in
the ambiguous paths. We denote by
Asrc the set of origin states of transi-
tions belonging to E1 and by Adst the
set of destination states of transitions
belonging to E2.
Asrc =fp(t) : t 2 E1g
Adst =fn(t) : t 2 E1g
According to Proposition 1 and what
precedes, it would be equivalent and
simpler to disambiguate an acyclic
transducer obtained from T in which
we have removed all E1 transitions.
Therefore, we introduce the operator
 : fTing  ! fToutg which accom-
plishes this construction.
Let T = (Q;I; 1; 2;E;F). Then
 (T) = (Q;I1; 1; 2;ET;F1) where:
1. I1 = I[Adst[fig, with i62Q.
2. F1 = F[Asrc[ffg, with f62Q.
3. ET = EnE1[f(i; q;  ;  ;0); q 2
I1g [f(q; f;  ;  ;0); q 2F1g.
The third condition insures the connec-
tivity of  (T) if T is itself connected.
It su ces to disambiguate the acyclic
transducer  (T), then reinsert the
transitions of E1 in  (T). The set of
paths in  (T) is then P(I1, F1).
2.2 Algorithm
Input:
T = (Q; i; X; Y; E; F) is an
ambiguous transducer verifying the
fundamental property.
Output:
T1 = (Q; i; X[X1; Y; ET; F) is an
unambiguous transducer, X1 is the set
of auxiliary symbols.
1. Tacyclic   (T).
2. Path set of paths of Tacyclic.
3. Disambiguate the set Path (creat-
ing the set X1).
4. T0  build the unambiguous
transducer which has unambigu-
ous paths.
5. T1    1(T0) (consists of rein-
serting in T0 the transitions of T
which where removed).
6. return T1
Now, we will study an important
class of transducers verifying the fun-
damental property. This class is ob-
tained by doing the composition of a
transducer D verifying the fundamen-
tal property with a transducer R. The
composition of two transducers is an
e cient algebraic operation for build-
ing more complex transducers. We
give a brief de nition of composition
and the fundamental theorem that in-
sures the invariance of the fundamental
property by composition.
3 Composition
The transducer T created by the com-
position of two transducers R and D,
denoted T = R D, performs the map-
ping of word x to word z if and only
if R maps x to y and D maps y to z.
The weight of the resulting word is the
 -product of the weights of y and z
(Pereira and Riley, 1997).
De nition 3 (Transitions) Let t =
(q, a, b, q1, w1) and e = (r, b, c, r1,
w2) be two transitions. We de ne the
composition t with e by:
t  e = ((q, r), a, c, (q1, r1), w1 w2).
Note that, in order to make the com-
position possible, we must have o(t) =
i(e).
De nition 4 (Composition)
Let R = (QR;IR;X;Y;ER;FR)
and S = (QS;IS;Y;Z;ES;FS) be
two transducers. The composi-
tion of R with S is a transducer
R S = (Q;Q;X;Z;E;F) de ned by:
1. i = (iR;iS),
2. Q = QR QS,
3. F = FR FS,
4. E =feR eS : eR 2ER; eS 2ESg.
Let D = (QD;ID;Y;Z;ED;FD) be a
transducer verifying the fundamental
property. We can write Y = Y0 ]Y1
where Y0 = fi(t) : t 2 E0g and
Y1 =fi(t) : t2E1g:
Theorem 1 (Fundamental) Let
R = (QR;IR;X;Y;ER;FR) verifying
the following condition:
(C) 8t2ER; o(t)2Y1 )i(t)2Y1:
Then the transducer T = R D veri es
the fundamental property.
Proof:
Let X1 = fi(t) : t 2 ER and o(t) 2
Y1g Y1 and X0 = XnX1. We will
prove that any path in T contains at
least a transition t such that i(t)2X1.
Let  be a cycle in T. Then, there
exists two cycles  R and  D in R and
in D respectively such that  =  R  
 D. The paths  R and  D have the
following form:
 D = g1   gn,
with gi 2ED for 1 i n;
 R = f1   fn,
with fi 2ER for 1 i n;
 =  R   D = (f1  g1)   (fn  gn).
There is an index k such that i(gk) 2
Y1 since D veri es the fundamental
property. We also necessarily have
i(gk) = o(fk) . According to condi-
tion (C) of Theorem 1, we deduce that
i(fk)2Y1. Knowing that fk 2ER, we
deduce that i(fk)2X1, which implies
i(fk gk) = i(fk) 2X1:
3.1 Consequence
The restriction to the case X = Y
allows us to build a large class of
transducers verifying the fundamental
property. In fact, if two transducers
R = (QR;IR;Y;Y;ER;FR) and S =
(QS;IS;Y;Y;ES;FS) verify the condi-
tion (C) of Theorem 1, then S R ver-
i es the condition (C), associativity of
 implies:
S (R D) = (S R) D.
Suppose that we have m transducers
Ri ( 1  i  m ) verifying the con-
dition (C) of Theorem 1 and that we
want to reduce the size of the trans-
ducer:
Tm = Rm Rm 1   R1 D.
To this end, we proceed as follows: we
add the auxiliary symbols to disam-
biguate the transducer; then we apply
determinization and  nally we remove
the auxiliary labels. These three oper-
ations are denoted by  .
Ti =
  (D) if i = 0:
 (Ri  (Ti 1)) if i 1:
The size of transducer Tm can also
be reduced by computing:
Tm =  (Rm Rm 1   R1 D).
The old approach:
T0m = R0m R0m 1   R01 D0:
has several disadvantages. The size of
R0i for 1  i  m increases consid-
erably since the auxiliary labels intro-
duced in each transducer have to be
taken into account in all others. This
fact limits the number of transducers
that can be composed with D.
4 Application and Results
We will now apply our algorithm to
transducers involved in speech recog-
nition. Transducer D represents the
pronunciation dictionary and possesses
the fundamental property. The set of
transitions of D is de ned as
E = E0]f(f;#;x;0;w)g
where f is the unique  nal state ofD, 0
is the unique initial state of D, x is any
symbol and # is a symbol represent-
ing the end of a word. All transitions
t 2 E0 are such that i(t) 6= #. Any
path  in E 0 is acyclic. The transducer
R representing a phonological rule is
constructed to ful ll condition (C) of
the fundamental theorem. The trans-
ducer D represents a French dictionary
with 20000 words and their pronuncia-
tions. The transducer R represents the
phonological rule that handles liaison
in the French language. This liaison,
which is represented by a phoneme ap-
pearing at the end of some words, must
be removed when the next word be-
gins with a consonant since the liaison
phoneme is never pronounced in that
case. However, if the next word begins
with a vowel, the liaison phoneme may
or may not be pronounced and thus be-
comes optional.
0
p:p
#:#
1
eps:[x]
2
[x]:[x]
p:p
#:#
v:v
#:#
Figure 2: Transducer used to handle
the optional liaison rule.
Figure 2 shows the transducer that
handles this rule. In the  gure, p
denotes all phonemes, v the vowels
and [x] the liaison phonemes.
Table 1 shows the results of our al-
gorithm using the dictionary and the
phonological rule previously described.
Transducer States Transitions
D 115941 136001
 (D) 17607 42140
R D 115943 151434
 (R D) 17955 50769
R  (D) 17611 53209
 (R  (D)) 17587 49620
Table 1: Size reduction on a French
dictionary
As we can see in Table 1, the opera-
tor  produces a smaller transducer in
all the cases considered here.
5 Conclusion and future
work
We have been able to disambiguate
an important class of cyclic and am-
biguous transducers, which allows us
to apply the determinization algorithm
(Mohri, 1997); and then to reduce the
size of those transducers. With our
new approach, we do not have to take
into account the number of transduc-
ers Ri and their auxiliary labels as was
the case with the approach used be-
fore. Thus, new transducers Ri such
as phonological rules can be easily in-
serted in the chain.
The major disadvantage of our ap-
proach is that disambiguating a trans-
ducer increases its size systematically.
Our future work will consist of develop-
ing a more e ective algorithm for dis-
ambiguating an acyclic transducer.

References

J. Berstel. 1979. Transductions and
Context-Free Languages. Teubner
Studienbucher, Stuttgart, Germany.

G. Boulianne, J. Brousseau, P. Ouel-
let, and P. Dumouchel. 2000. French
large vocabulary recognition with
cross-word phonology transducers.
In Proceedings ICASSP 2000, June.
Istanbul, Turkey.

S. Eilenberg. 1974-1976. Automata,
Language and Machines, volume A-
B. Academic Press, New York.

R. Kaplan and M. Kay. 1994. Reg-
ular models of phonological rule
systems. Computational linguistics,
20(3):331{378.

K. Koskenniemi. 1990. Finite state
parsing and disambiguation. In Pro-
ceedings of the 13th International
Conference on Computational Lin-
guistics (COLING '90), volume 2.
Helsinki, Finland.

M. Mohri, M. Riley, D. Hindle,
A. Ljolje, and F. Pereira. 1998.
Full expansion of context-dependent
networks in large vocabulary
speech recognition. In Proceedings
of the International Conference
on Acoustics, Speech, and Signal
Proceesing(ICASSP 98). Seattle,
Washington.

M. Mohri. 1997. Finite-state trans-
ducers in language and speech pro-
cessing. Computational linguistics,
23(2).

F. Pereira and M. Riley, 1997.
Speech recognition by composition
of weighted  nite automata. Em-
manuel Roche and Yves Schabes,
Cambridge, Massachusetts, a brad-
ford book, the mit press edition.

Nasser Smaili. 2001.
D esambigu  sation de transduc-
teurs en reconnaissance de la parole.
Universit e du Qu ebec  a Montr eal.
