Stochastic Finite-State models for Spoken Language Machine 
': anslation 
Srinivas Bangalore Giuseppe Riccardi 
AT&T Labs - Research 
180 Park Avenue 
Florham Park, NJ 07932 
{srini, ds'.p3}©research, att. tom 
Abstract 
Stochastic finite-state models are efficiently learn- 
able from data, effective for decoding and are asso- 
ciated with a calculus for composing models which 
allows for tight integration of constraints frora var- 
ious levels of  processing. In this paper, 
we present a method for stochastic finite-state ma- 
chine translation that is trained automatically from 
pairs of source and target utterances. We use this 
method to develop models for English-Japanese and 
Japanese-English translation. We have embedded 
the Japanese-English translation system in a call 
routing task of unconstrained speech utterances. We 
evaluate the efficacy of the translation system :in the 
context of this application. 
1 Introduction 
Finite state models have been extensively applied 
to many aspects of  processing including, 
speech recognition (Pereira and Riley, 1997; Riccardi 
et al., 1996), phonology (Kaplan and Kay, 1994), 
morphology (Koskenniemi, 1984), chunking (Abney, 
1991; Srinivas, 1997) and parsing (Roche, 1.999). 
Finite-state models are attractive mechanisms for 
 processing since they are (a) efficiently 
learnable from data (b) generally effective for de- 
coding (c) associated with a calculus for composing 
models which allows for straightforward integration 
of constraints from various levels of  pro- 
cessing. I 
In this paper, we develop stochastic finite-state 
models (SFSM) for statistical machine transla- 
tion (SMT) and explore the performance limits of 
such models in the context of translation in limited 
domains. We are also interested in these models 
since they allow for a tight integration with a speech 
recognizer for speech-to-speech translation. In par- 
ticular we are interested in one-pass decoding and 
translation of speech as opposed to the more preva- 
lent approach of translation of speech lattices. 
The problem of machine translation can be viewed 
as consisting of two phases: (a) lexical choice phase 
1 Furthermore, software implementing the finite-stal;e cal- 
culus is available for research purposes. 
where appropriate target  lexical items are 
chosen for each source  lexical item and (b) 
reordering phase where the chosen target  
lexical items are reordered to produce a meaning- 
ful target  string. In our approach, we will 
represent these two phases using stochastic finite- 
state models which can be composed together to 
result in a single stochastic finite-state model for 
SMT. Thus our method can be viewed as a direct 
translation approach of transducing strings of the 
source  to strings of the target . 
There are other approaches to statistical machine 
translation where translation is achieved through 
transduction of source  structure to tar- 
get  structure (Alshawi et al., 1998b; Wu, 
1997). There are also large international multi-site 
projects such as VERBMOBIL (Verbmobil, 2000) 
and CSTAR (Woszczyna et al., 1998; Lavie et al., 
1999) that are involved in speech-to-speech trans- 
lation in limited domains. The systems developed 
in these projects employ various techniques ranging 
from example-based to interlingua-based translation 
methods for translation between English, French, 
German, Italian, Japanese, and Korean. 
Finite-state models for SMT have been previ- 
ously suggested in the literature (Vilar et al., 1999; 
Knight and A1-Onaizan, 1998). In (Vilar et al., 
1999), a deterministic transducer is used to imple- 
ment an English-Spanish speech translation system. 
In (Knight and A1-Onaizan, 1998), finite-state ma- 
chine translation is based on (Brown et al., 1993) 
and is used for decoding the target  string. 
However, no experimental results are reported using 
this approach. 
• Our approach differs from the previous approaches 
in both the lexical choice and the reordering phases. 
Unlike the previous approaches, the lexical choice 
phase in our approach is decomposed into phrase- 
level and sentence-level translation models. The 
phrase-level translation is learned based on joint en- 
tropy reduction of the source and target s 
and a variable length n-gram model (VNSA) (Ric- 
cardi et al., 1995; Riccardi et al., 1996) is learrmd 
for the sentence-level translation. For the construc- 
52 
tion of the bilingual lexicon needed for lexical choice, 
we use the alignment algorithm presented in (A1- 
shawl et al., 1998b) which takes advantage of hi- 
erarchical decomposition of strings and thus per- 
forms a structure-based alignment. In the previ- 
ous approaches, a bilingual lexicon is constructed 
using a string-based alignment. Another difference 
between our approach and the previous approaches 
is in the reordering of the target  lexical 
items. In (Knight and A1-Onaizan, 1998), an FSM 
that represents all strings resulting from the per- 
mutations of the lexical items produced by lexical 
choice is constructed and the most likely translation 
is retrieved using a target  model. In (Vilar 
et al., 1999), the lexical items are associated with 
markers that allow for reconstruction of the target 
 string. Our reordering step is similar to 
that proposed in (Knight and A1-Onalzan, 1998) but 
does not incur the expense of creating a permutation 
lattice. We use a phrase-based VNSA target lan- 
guage model to retrieve the most likely translation 
from the lattice. 
In addition, we have used the resulting finite- 
state translation method to implement an English- 
Japanese speech and text translation system and 
a Japanese-English text translation system. We 
present evaluation results for these systems and dis- 
cuss their limitations. We also evaluate the efficacy 
of this translation model in the context of a telecom 
application such as call routing. 
The layout of the paper is as follows. In Section 2 
we discuss the architecture of the finite-state trans- 
lation system. We discuss the algorithm for learning 
lexical and phrasal translation in Section 3. The de- 
tails of the translation model are presented in Sec- 
tion 4 and our method for reordering the output 
is presented in Section 5. In Section 6 we discuss 
the call classification application and present moti- 
vations for embedding translation in such an applica- 
tion. In Section 6.1 we present the experiments and 
evaluation results for the various translation systems 
on text input. 
2 Stochastic Machine Translation 
In machine translation, the objective is to map a 
source symbol sequence Ws = wx,...,WNs (wi E 
Ls) into a target sequence WT = xl,..., XNT (Xi E 
LT). The statistical machine translation approach 
is based on the noisy channel paradigm and the 
Maximum-A-Posteriori decoding algorithm (Brown 
et al., 1993). The sequence Ws is thought as a noisy 
version of WT and the best guess I)d~ is then com- 
puted as 
^ 
W~ = argmax P(WT{Ws) wT 
= argmax P(WslWT)P(WT) (1) wT 
In (Brown et al., 1993) they propose a method for 
maximizing P(WTIWs) by estimating P(WT) and 
P(WsIWT) and solving the problem in equation 1. 
Our approach to statistical machine translation dif- 
fers from the model proposed in (Brown et al., 1993) 
in that: 
• We compute the joint model P(Ws, WT) from 
the bi corpus to account for the direct 
mapping of the source sentence Ws into the tar- 
get sentence I?VT that is ordered according to the 
• source  word order. The target string 
IfV~ is then chosen from all possible reorderings 2 
of 
I?VT = argmax P(Ws, WT) (2) WT 
\[TV~ = arg max P(I~VT I AT) (3) 
WTE~W T 
where AT is the target  model and AWT 
are the different reorderings of WT. 
• We decompose the translation problem into 
local (phrase-level) and global (sentence-level) 
source-target string transduction. 
• We automatically learn stochastic automata 
and transducers to perform the sentence-level 
and phrase-level translation. 
As shown in Figure 1, the stochastic machine 
translation system consists of two phases, the lexical 
choice phase and the reordering phase. In the next 
sections we describe the finite-state machine com- 
ponents and the operation cascade that implements 
this translation algorithm. 
3 Acquiring Lexical Translations 
In the problem of speech recognition the alignment 
between the words and their acoustics is relatively 
straightforward since the words appear in the same 
order as their corresponding acoustic events. In con- 
trast, in machine" translation, the linear order of 
words in the source , in general is not main- 
tained in the target . 
The first stage in the process of bilingual phrase 
acquisition is obtaining an alignment function that 
given a pair of source and target  sentences, 
maps source  word subsequences into target 
 word subsequences. For this purpose, we 
use the alignment algorithm described in (Alshawi et 
2 Note that computing the exact set of all possible reorder- 
ings is computationally expensive. In Section 5 we discuss 
an approximation for the set of all possible reorderings that 
serves for our application. 
53 
max P(Ws,W r ) 
WT 
Reorder 
Figure 1: A block diagram of the stochastic machine translation system 
English: I need to make a collect call 
Japanese: ~l~ ~lzP b ~--Jt~ 
Alignment: 1 5 0 3 0 2 4 
English: A T and T calling card 
Japanese: ~ 4 ~ -~ -- 7" Y F 
Alignment: 123456 
English: I'd like to charge this to my home phone 
Japanese: ~./J2 ~ $J,¢~ ~69 ~C. -~-~--~ 
Alignment: 170620345 
Table 1: Example bitexts and with alignment information 
al., 1998a). The result of the alignment procedure 
is shown in Table 1.3 
Although the search for bilingual phrases of length 
more than two words can be incorporated in a 
straight-forward manner in the alignment module, 
we find that doing so is computationally prohibitive. 
We first transform the output of the alignment 
into a representation conducive for further manip- 
ulation. We call this a bi TB. A string 
R E TB is represented as follows: 
R = Wl-Zl, W2_Z2,... , WN-ZN (4) 
an example alignment and the source-word-ordered 
bi strings corresponding to the alignment 
shown in Table 1. 
Having transformed the alignment for each sen- 
tence pair into a bi string (source word- 
ordered or target word-ordered), we proceed to seg- 
ment the corpus into bilingual phrases which can be 
acquired from the corpus TB by minimizing the joint 
entropy H(Ls, LT) ~ -1/M log P(TB). The proba- 
bility P(Ws, WT) = P(R) is computed in the same 
way as n-gram model: 
where wl E LsUe, zi E LTUe, e is the empty 
string and wi_zi is the symbol pair (colons are the 
delimiters) drawn from the source and target lan- 
guage. 
A string in a bi corpus consists of se- 
quences of tokens where each token (wi-xi) is repre- 
sented with two components: a source word (\]possi- 
bly an empty word) as the first component and the 
target word (possibly an empty word) that is the 
translation of the source word as the second com- 
ponent. Note that the tokens of a bi could 
be either ordered according to the word order of the 
source  or ordered according to the word 
order of the target . Thus an alignment 
of a pair of source and target  sentences 
will result in two bi strings. Table 2 shows 
3The Japanese string was translated and segmented so 
that a token boundary in Japanese corresponds to some token 
boundary in English. 
P(R) = Ill 
i (5) 
Using the phrase segmented corpus, we construct 
a phrase-based variable n-gram translation model as 
discussed in the following section. 
4 Learning Phrase-based Variable 
N-gram Translation Models 
Our approach to stochastic  modeling is 
based on the Variable Ngram Stochastic Automaton 
(VNSA) representation and learning algorithms 
introduced in (Riccardi et al., 1995; Pdccardi et al., 
1996). A VNSA is a non-deterministic Stochastic 
Finite-State Machine (SFSM) that allows for pars- 
ing any possible sequence of words drawn from a 
given vocabulary 12. In its simplest implementation 
the state q in the VNSA encapsulates the lexical 
(word sequence) history of a word sequence. Each 
54 
I_i./Ji need_~,~5 9 ~ ~ to_%EPS% make_~--It, ~" a_%EPS% collect_ ~ I/2 b call_hi l~ 
I'd_$L~2c like_ L 1",: ~ ~'~'~ to_%EPS% charge_-~-'v -- ~ this_,._ ~ ~ to_%EPS% my_*.L~ home..~ phone_~-~b= 
A_m4 T_-~ 4 -- and_T:/V T_~ ~ -- calling_K-- ~) Z/Y" card_2-- V 
Table 2: Bi strings resulting from alignments shown in Table 1. 
(%EPS% represents the null symbol c). 
state recognizes a symbol wi E lZU {e}, where e is the 
empty string. The probability of going from state qi 
to qj (and recognizing the symbol associated to qj) 
is given by the state transition probability, P(qj \[qi). 
Stochastic finite-state machines represent m a 
compact way the probability distribution over all 
possible word sequences. The probability of a word 
sequence W can be associated to a state sequence 
~Jw = ql,..., qj and to the probability P(~Jw)" For 
a non-deterministic finite-state machine the prob- 
ability of W is then given by P(W) = ~j P((Jw). 
Moreover, by appropriately defining the state space 
to incorporate lexical and extra-lexical information, 
the VNSA formalism can generate a wide class of 
probability distribution (i.e., standard word n-gram, 
class-based, phrase-based, etc.) (Riccardi et al., 
1996; Riccardi et al., 1997; Riccardi and Bangalore, 
1998). In Fig. 2, we plot a fragment of a VNSA 
trained with word classes and phrases. State 0 is 
the initial state and final states are double circled. 
The e transition from state 0 to state 1 carries 
the membership probability P(C), where the class 
C contains the two elements {collect, calling 
card}. The c transition from state 4 to state 6 
is a back-off transition to a lower order n-gram 
probability. State 2 carries the information about 
the phrase calling card. The state transition 
function, the transition probabilities and state 
space are learned via the self-organizing algorithms 
presented in (Riccardi et al., 1996). 
4.1 Extending VNSAs to Stochastic 
Transducers 
Given the monolingual corpus T, the VNSA learning 
algorithm provides an automaton that recognizes an 
input string W (W E yY) and computes P(W) ¢ 0 
for each W. Learning VNSAs from the bilingual cor- 
pus TB leads to the notion of stochastic transducers 
rST. Stochastic transducers rST : Ls × LT ~ \[0, 1\] 
map the string Ws E Ls into WT E LT and assign 
a probability to the transduction Ws ~--~ WT. In 
our case, the VNSA's model will estimate P(Ws ~-~.~" 
WT) : P(Ws, WT) and the symbol pair wi : xi 
will be associated to each transducer state q with 
input label wi and output label xl. The model 
rST provides a sentence-level transduction from Ws 
into WT. The integrated sentence and phrase-level 
transduction is then trained directly on the phrase- 
segmented corpus 7~ described in section 3. 
5 Reordering the output 
The stochastic transducers TST takes as input a sen- 
tence Ws and outputs a set of candidate strings in 
the target  with source  word or- 
der. Recall that the one-to-many mapping comes 
from the non-determinism of VST. The maximiza- 
tion step in equation 2 is carried out with Viterbi al- 
gorithm over the hypothesized strings in LT and I~VT 
is selected. The last step to complete the translation 
process is to apply the monolingual target  
model A T to re-order the sentence I?VT to produce 
^ 
W~. The re-order operation is crucial especially 
in the case the bi phrases in 7~ are not 
sorted in the target . For the re-ordering 
operation, the exact approach would be to search 
through all possible permutations of the words in 
ITVT and select the most likely. However, that op- 
eration is computationally very expensive. To over- 
come this problem, we approximate the set of the 
permutations with the word lattice AWT represent- 
ing (xl I x2 I ... XN) N, where xi are the words in 
ITVT. The most likely string ~V~ in the word lattice 
is then decoded as follows: 
^ 
W~ = argmax(~T o ~WT) 
= arg max P(~VT I)~T) 
(6) 
Where o is the composition operation defined for 
weighted finite-state machines (Pereira and Riley, 
1997). The complete operation cascade for the ma- 
chine translation process is shown in Figure 3. 
6 Embedding Translation in an 
Application 
In this section, we describe an application in 
which we have embedded our translation model and 
present some of the motivations for doing so. The 
application that we are interested in is a call type 
classification task called How May I Help You (Gorin 
et al., 1997). The goal is to sufficiently understand 
55 
collect/0.5 
ycs/O.8 ~ ~'~._~~ 
calYl 
please/1 < 
Figure 2: Example of a Variiable Ngram Stochastic Automaton (VNSA). 
........................................................ 
zsr 2,- 
m xe(v ,w ) \] w" 
Reorder 
Figure 3: The Machine Translation architecture 
caller's responses to the open-ended prompt How 
May I Help You? and route such a call based on the 
meaning of the response. Thus we aim at extracting 
a relatively small number of semantic actions from 
the utterances of a very large set of users who are not 
trained to the system's capabilities and limitations. 
The first utterance of each transaction has been 
transcribed and marked with a call-type by label- 
ers. There are 14 call-types plus a class other for 
the complement class. In particular, we focused our 
study on the classification of the caller's first utter- 
ance in these dialogs. The spoken sentences vary 
widely in duration, with a distribution distinctively 
skewed around a mean value of 5.3 seconds corre- 
sponding to 19 words per utterance. Some examples 
of the first utterances are given below: 
• Yes ma'am where is area code two zero 
one? 
• I'm tryn'a call and I can't get i~ to 
go through I wondered if you could try 
it for me please? 
• Hello 
We trained a classifer on the training Set of En- 
glish sentences each of which was annotated with a 
call type. The classifier searches for phrases that are 
strongly associated with one of the call types (Gorin 
et al., 1997) and in the test phase the classifier ex- 
tracts these phrases from the output of the speech 
recognizer and classifies the user utterance. '\]?his is 
how the system works when the user speaks English. 
However, if the user does not speak the  
that the classifier is trained on, English, in our 
case, the system is unusable. We propose to solve 
this problem by translating the user's utterance, 
Japanese, in our case, to English. This extends the 
usability of the system to new user groups. 
An alternate approach could be to retrain the 
classifier on Japanese text. However, this approach 
would result in replicating the system for each pos- 
sible input , a very expensive proposition 
considering, in general, that the system could have 
sophisticated natural  understanding and 
dialog components which would have to be repli- 
cated also. 
6.1 Experiments and Evaluation 
In this section, we discuss issues concerning evalu- 
ation of the translation system. The data for the 
experiments reported in this section were obtained 
from the customer side of operator-customer con- 
versations, with the customer-caxe application de- 
scribed above and detailed in (Riccardi and Gorin, 
January 2000; Gorin et al., 1997). Each of the cus- 
tomer's utterance transcriptions were then manually 
translated into Japanese. A total of 15,457 English- 
Japanese sentence pairs was split into 12,204 train- 
ing sentence pairs and 3,253 test sentence pairs. 
The objective of this experiment is to measure 
the performance of a translation system in the con- 
text of an application. In an automated call router 
there axe two important performance measures. The 
first is the probability of false rejection, where a 
call is falsely rejected. Since such calls would be 
transferred to a human agent, this corresponds to 
a missed opportunity for automation. The second 
56 
measure is the probability of correct classification. 
Errors in this dimension lead to misinterpretations 
that must be resolved by a dialog manager (Abella 
and Gorin, 1997). 
Using our approach described in the previous 
sections, we have trained a unigram, bigram and 
trigram VNSA based translation models with and 
without phrases. Table 3 shows lexical choice (bag- 
of-tokens) accuracy for these different translation 
models measured in terms of recall, precision and 
F-measure. 
In order to measure the effectiveness of our trans- 
lation models for this task we classify Japanese ut- 
terances based on their English translations. Fig- 
ure 4 plots the false rejection rate against the correct 
classification rate of the classifier on the English gen- 
erated by three different Japanese to English trans- 
lation models for the set of Japanese test sentences. 
The figure also shows the performance of the classi- 
fier using the correct English text as input. 
There are a few interesting observations to be 
made from the Figure 4. Firstly, the task per- 
formance on the text data is asymptotically simi- 
lar to the task performance on the translation out- 
put. In other words, the system performance is not 
significantly affected by the translation process; a 
Japanese transcription would most often be associ- 
ated with the same call type after translation as if 
the original were English. This result is particu- 
larly interesting inspite of the impoverished reorder- 
ing phase of the target  words. We believe 
that this result is due to the nature of the application 
where the classifier is mostly relying on the existence 
of certain key words and phrases, not necessarily in 
any particular order. 
The task performance improved from the 
unigram-based translation model to phrase unigram- 
based translation model corresponding to the im- 
provement in the lexical choice accuracy in Table 3. 
Also, at higher false rejection rates, the task perfor- 
mance is better for trigram-based translation model 
than the phrase trigram-based translation model 
since the precision of lexical choice is better than 
that of the phrase trigram-based model as shown in 
Table 3. This difference narrows at lower false rejec- 
tion rate. 
We are currently working on evaluating the 
translation system in an application independent 
method and developing improved models of reorder- 
ing needed for better translation system. 
7 Conclusion 
We have presented an architecture for speech trans- 
lation in limited domains based on the simple ma- 
chinery of stochastic finite-state transducers. We 
have implemented stochastic finite-state models for 
English-Japanese and Japanese-English translation 
in limited domains. These models have been trained 
automatically from source-target utterance pairs. 
We have evaluated the effectiveness of such a transla- 
tion model in the context of a call-type classification 
task. 

References 
A. Abella and A. L. Gorin. 1997. Generating se- 
mantically consistent inputs to a dialog man- 
ager. In Proceedings of European Conference on 
Speech Communication and Technology, pages 
1879-1882. 
Steven Abney. 1991. Parsing by chunks. In Robert 
Berwick, Steven Abney, and Carol Tenny, editors, 
Principle-based parsing. Kluwer Academic Pub- 
lishers. 
H.'Alshawi, S. Bangalore, and S. Douglas. 1998a. 
Learning Phrase-based Head Transduction Mod- 
els for Translation of Spoken Utterances. In The 
fifth International Conference on Spoken Lan- 
guage Processing (ICSLP98), Sydney. 
Hiyan Alshawi, Srinivas Bangalore, and Shona Dou- 
glas. 1998b. Automatic acquisition of hierarchi- 
cal transduction models for machine translation. 
In Proceedings of the 36 th Annual Meeting of the 
Association for Computational Linguistics, Mon- 
treal, Canada. 
P. Brown, S.D. Pietra, V.D. Pietra, and R. Mer- 
cer. 1993. The Mathematics of Machine Transla- 
tion: Parameter Estimation. Computational Lin- 
guistics, 16(2):263-312. 
E. Giachin. 1995. Phrase Bigrams for Continuous 
Speech Recognition. In Proceedings of ICASSP, 
pages 225-228, Detroit. 
A. L. Gorin, G. Riccardi, and J. H Wright. 1997. 
How May I Help You? Speech Communication, 
23:113-127. 
R.M. Kaplan and M. Kay. 1994. Regular models 
of phonological rule systems. Computational Lin- 
guistics, 20(3):331-378. 
Kevin Knight and Y. A1-Onaizan. 1998. Transla- 
tion with finite-state devices. In Machine trans- 
lation and the information soup, Langhorne, PA, 
October. 
K. K. KoskenniemL 1984. Two-level morphology: a 
general computation model for word-form recogni- 
tion and production. Ph.D. thesis, University of 
Helsinki. 
Alon Lavie, Lori Levin, Monika Woszczyna, Donna 
Gates, Marsal Gavalda, , and Alex Waibel. 
1999. The janus-iii translation system: Speech- 
to-speech translation in multiple domains. In 
Proceedings of CSTAR Workshop, Schwetzingen, 
Germany, September. 
Fernando C.N. Pereira and Michael D. Riley. 1997. 
Speech recognition by composition of weighted 
finite automata. In E. Roche and Schabes Y., 
editors, Finite State Devices for Natural Lan- 
guage Processing. MIT Press, Cambridge, Mas- 
sachusetts. 
G. Riccardi and S. Bangalore. 1998. Automatic ac- 
quisition of phrase grammars for stochastic lan- 
guage modeling. In Proceedings of A CL Workshop 
on Very Large Corpora, pages 188-196, Montreal. 
G. Riccardi and A.L. Gorin. January, 2000. 
Stochastic Language Adaptation over Time and 
State in Natural Spoken Dialogue Systems. IEEE 
Transactions on Speech and Audio, pages 3--10. 
G. Riccardi, E. Bocchieri, and It. Pieraccini. 1995. 
Non deterministic stochastic  models for 
speech recognition. In Proceedings of ICASSP, 
pages 247-250, Detroit. 
G. Riccardi, R. Pieraccini, and E. Bocchieri. 1996. 
Stochastic Automata for Language Modeling. 
Computer Speech and Language, 10(4):265-293. 
G. Riccardi, A. L. Gorin, A. Ljolje, and M. Riley. 
1997. A spoken  system for automated 
call routing. In Proceedings of ICASSP, pages 
1143-1146, Munich. 
K. Ries, F.D. BuO, and T. Wang. 1995. Improved 
Language Modeling by Unsupervised Acquisition 
of Structure. In Proceedings of ICASSP, pages 
193-196, Detroit. 
Emmanuel Roche. 1999. Finite state transducers: 
parsing free and frozen sentences. In Andr~ Ko- 
rnai, editor, Eztened Finite State Models of Lan- 
guage. Cambridge University Press. 
B. Srinivas. 1997. Complexity of Lexical Descrip- 
tions and its .Relevance to Partial Parsing. Ph.D. 
thesis, University of Pennsylvania, Philadelphia, 
PA, August. 
Verbmobil. 2000. Verbmobil Web page. 
http://verbmobil.dfki.de/. 
J. Vilar, V.M. Jim~nez, J. Amengual, A. Castel- 
lanos, D. Llorens, and E. VidM. 1999. Text 
and speech translation by means of subsequential 
transducers. In Andr~ Kornai, editor, Extened 
Finite State Models of Language. Cambridge Uni- 
versity Press. 
Monika Woszczyna, Matthew Broadhead, Donna 
Gates, Marsal Gavalda, Alon Lavie, Lori Levin, 
and Alex Waibel. 1998. A modular approach to 
spoken  translation for large domains. In 
Proceedings of AMTA-98, Langhorne, Pennsylva- 
nia, October. 
Dekai Wu. 1997. Stochastic Inversion Transduction 
Grammars and Bilingual Parsing of Parallel Cor- 
pora. Computational Linguistics, 23(3):377-404. 
