A DP based Search Using Monotone
Alignments in Statistical Translation

C. Tillmann, S. Vogel, H. Ney, A. Zubiaga
Lehrstuhl für Informatik VI, RWTH Aachen
D-52056 Aachen, Germany
{tillmann,ney}@informatik.rwth-aachen.de
Abstract 
In this paper, we describe a Dynamic Pro- 
gramming (DP) based search algorithm 
for statistical translation and present ex- 
perimental results. The statistical trans- 
lation uses two sources of information: a 
translation model and a language mod- 
el. The language model used is a stan- 
dard bigram model. For the transla- 
tion model, the alignment probabilities are
made dependent on the differences in the 
alignment positions rather than on the 
absolute positions. Thus, the approach 
amounts to a first-order Hidden Markov 
model (HMM) as they are used successful- 
ly in speech recognition for the time align- 
ment problem. Under the assumption that 
the alignment is monotone with respect to 
the word order in both languages, an ef- 
ficient search strategy for translation can 
be formulated. The details of the search 
algorithm are described. Experiments on 
the EuTrans corpus produced a word error 
rate of 5.1%.
1 Overview: The Statistical 
Approach to Translation 
The goal is the translation of a text given in some
source language into a target language. We are given
a source ('French') string $f_1^J = f_1...f_j...f_J$, which
is to be translated into a target ('English') string
$e_1^I = e_1...e_i...e_I$. Among all possible target strings,
we will choose the one with the highest probability,
which is given by Bayes' decision rule (Brown et al.,
1993):

$$\hat{e}_1^I = \arg\max_{e_1^I} \{Pr(e_1^I|f_1^J)\}
            = \arg\max_{e_1^I} \{Pr(e_1^I) \cdot Pr(f_1^J|e_1^I)\}$$

$Pr(e_1^I)$ is the language model of the target language,
whereas $Pr(f_1^J|e_1^I)$ is the string translation model.
The argmax operation denotes the search problem. 
In this paper, we address 
• the problem of introducing structures into the
probabilistic dependencies in order to model
the string translation probability $Pr(f_1^J|e_1^I)$.
• the search procedure, i.e. an algorithm to per- 
form the argmax operation in an efficient way. 
• transformation steps for both the source and 
the target languages in order to improve the 
translation process. 
The transformations are very much dependent on 
the language pair and the specific translation task 
and are therefore discussed in the context of the task 
description. We have to keep in mind that in the 
search procedure both the language and the transla- 
tion model are applied after the text transformation 
steps. However, to keep the notation simple we will 
not make this explicit distinction in the subsequent 
exposition. The overall architecture of the statistical 
translation approach is summarized in Figure 1. 
2 Alignment Models
A key issue in modeling the string translation prob-
ability $Pr(f_1^J|e_1^I)$ is the question of how we define
the correspondence between the words of the target
sentence and the words of the source sentence. In
typical cases, we can assume a sort of pairwise de-
pendence by considering all word pairs $(f_j, e_i)$ for
a given sentence pair $[f_1^J; e_1^I]$. We further constrain
this model by assigning each source word to exact-
ly one target word. Models describing these types
of dependencies are referred to as alignment models
(Brown et al., 1993), (Dagan et al., 1993), (Kay &
Röscheisen, 1993), (Fung & Church, 1994), (Vogel
et al., 1996).
In this section, we introduce a monotone HMM
based alignment and an associated DP based search
algorithm for translation. Another approach to sta-
tistical machine translation using DP was presented
in (Wu, 1996). The notational convention will be as
follows. We use the symbol $Pr(\cdot)$ to denote general
Figure 1: Architecture of the translation approach
based on Bayes decision rule. (Source language text
is first transformed, then passed to the global search,
which maximizes $Pr(e_1^I) \cdot Pr(f_1^J|e_1^I)$ over $e_1^I$ using the
lexicon, alignment and language models; an inverse
transformation yields the target language text.)
probability distributions with (nearly) no specific as-
sumptions. In contrast, for model-based probability
distributions, we use the generic symbol $p(\cdot)$.
2.1 Alignment with HMM 
When aligning the words in parallel texts (for 
Indo-European language pairs like Spanish-English,
German-English, Italian-German, ...), we typically
observe a strong localization effect. Figure 2 illus-
trates this effect for the language pair Spanish-to-
English. In many cases, although not always, there
is an even stronger restriction: the difference in the
position index is smaller than 3 and the alignment
is essentially monotone. To be more precise, the
sentences can be partitioned into a small number
of segments, within each of which the alignment is
monotone with respect to word order in both lan-
guages.
To describe these word-by-word alignments, we
introduce the mapping $j \to a_j$, which assigns a po-
sition $j$ (with source word $f_j$) to the position $i = a_j$
(with target word $e_i$). The concept of these align-
ments is similar to the ones introduced by (Brown
et al., 1993), but we will use another type of de-
pendence in the probability distributions. Looking
at such alignments produced by a human expert, it
is evident that the mathematical model should try
to capture the strong dependence of $a_j$ on the pre-
ceding alignment $a_{j-1}$. Therefore the probability of
alignment $a_j$ for position $j$ should have a dependence
on the previous alignment position $a_{j-1}$:

$$p(a_j | a_{j-1})$$
A similar approach has been chosen by (Dagan et
al., 1993) and (Vogel et al., 1996). Thus the problem
formulation is similar to that of the time alignment
problem in speech recognition, where the so-called
Hidden Markov models have been successfully used
for a long time (Jelinek, 1976). Using the same basic
principles, we can rewrite the probability by intro-
ducing the 'hidden' alignments $a_1^J := a_1...a_j...a_J$ for
a sentence pair $[f_1^J; e_1^I]$:

$$Pr(f_1^J|e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J} Pr(f_j, a_j | f_1^{j-1}, a_1^{j-1}, e_1^I)$$
To avoid any confusion with the term 'hidden' in
comparison with speech recognition, we observe that
the model states as such (representing words) are not
hidden but the actual alignments, i.e. the sequence
of position index pairs $(j, i = a_j)$.
So far there has been no basic restriction of the
approach. We now assume a first-order dependence
on the alignments $a_j$ only:

$$Pr(f_j, a_j | f_1^{j-1}, a_1^{j-1}, e_1^I) = p(f_j, a_j | a_{j-1}, e_1^I)
                                        = p(a_j|a_{j-1}) \cdot p(f_j|e_{a_j}),$$

where, in addition, we have assumed that the lexicon
probability $p(f|e)$ depends only on $a_j$ and not on
$a_{j-1}$.
To reduce the number of alignment parameters,
we assume that the HMM alignment probabilities
$p(i|i')$ depend only on the jump width $(i - i')$. The
monotony condition can then be formulated as:

$$p(i|i') = 0 \quad \text{for} \quad i \neq i', i'+1, i'+2.$$
This monotony requirement limits the applicabili-
ty of our approach. However, by performing simple
word reorderings, it is possible to approach this re-
quirement (see Section 4.2). Additional countermea-
sures will be discussed later. Figure 3 gives an illus-
tration of the possible alignments for the monotone
hidden Markov model. To draw the analogy with
speech recognition, we have to identify the states
(along the vertical axis) with the positions $i$ of the
target words $e_i$ and the time (along the horizontal
axis) with the positions $j$ of the source words $f_j$.
2.2 Training 
To train the alignment and the lexicon model, we 
use the maximum likelihood criterion in the so-called 
maximum approximation, i.e. the likelihood criteri- 
on covers only the most likely alignment rather than 
the set of all alignments: 
$$Pr(f_1^J|e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j|a_{j-1}, I) \cdot p(f_j|e_{a_j}) \right]
               \cong \max_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j|a_{j-1}, I) \cdot p(f_j|e_{a_j}) \right]$$
Figure 2: Word alignments for Spanish-English sentence pairs.
o*" 
Z 
r.~ © 
L5 iv, 
< F~ 
I I I I \[ I 
1 2 3 4 5 6 
SOURCE POSITION 
Figure 3: Illustrat ion of alignments for the nlonotone 
HMM. 
To find the optimal alignment, we use dynamic
programming, for which we have the following typical
recursion formula:

$$Q(i, j) = p(f_j|e_i) \cdot \max_{i'} \left[ p(i|i') \cdot Q(i', j-1) \right]$$

Here, $Q(i, j)$ is a sort of partial probability as in time
alignment for speech recognition (Jelinek, 1976). As
a result, the training procedure amounts to a se-
quence of iterations, each of which consists of two
steps:

• position alignment: Given the model parame-
ters, determine the most likely position align-
ment.

• parameter estimation: Given the position align-
ment, i.e. going along the alignment paths for
all sentence pairs, perform maximum likelihood
estimation of the model parameters; for model-
free distributions, these estimates result in rel-
ative frequencies.
The IBM model 1 (Brown et al., 1993) is used to find 
an initial estimate of the translation probabilities. 
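
To make the position alignment step concrete, the following
is a minimal sketch of the Viterbi-style alignment search in
Python. It is illustrative only: the tables lex_prob (for
$p(f|e)$) and jump_prob (for the jump-width probabilities
$p(i|i-\delta)$) are assumed to be plain dictionaries, the
dependence on the target length $I$ is dropped, and zero
probabilities are floored with a small constant; none of these
names come from the original implementation.

    import math

    def viterbi_alignment(f_words, e_words, lex_prob, jump_prob):
        # Most likely monotone alignment a_1...a_J via the DP recursion
        # Q(i, j) = p(f_j|e_i) * max_d [ p(i|i-d) * Q(i-d, j-1) ], d in {0,1,2}.
        J, I = len(f_words), len(e_words)
        NEG = float("-inf")
        Q = [[NEG] * I for _ in range(J)]     # Q[j][i]: best partial log-prob
        back = [[0] * I for _ in range(J)]    # backpointer to previous position i'
        for i in range(I):                    # simplified start condition
            Q[0][i] = math.log(lex_prob.get((f_words[0], e_words[i]), 1e-10))
        for j in range(1, J):
            for i in range(I):
                for d in (0, 1, 2):           # monotony: jump width 0, 1 or 2
                    if i - d < 0 or Q[j - 1][i - d] == NEG:
                        continue
                    s = (Q[j - 1][i - d]
                         + math.log(jump_prob.get(d, 1e-10))
                         + math.log(lex_prob.get((f_words[j], e_words[i]), 1e-10)))
                    if s > Q[j][i]:
                        Q[j][i], back[j][i] = s, i - d
        i = max(range(I), key=lambda k: Q[J - 1][k])   # best end position
        alignment = [0] * J                            # traceback along backpointers
        for j in range(J - 1, -1, -1):
            alignment[j] = i
            i = back[j][i]
        return alignment

In the parameter estimation step, the counts of lexicon pairs
and jump widths collected along these paths would then be
renormalized into relative frequencies.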
3 Search Algorithm for Translation 
For the translation operation, we use a bigram lan-
guage model, which is given in terms of the con-
ditional probability of observing word $e_i$ given the
predecessor word $e_{i-1}$:

$$p(e_i|e_{i-1})$$
Using the conditional probability of the bigram lan-
guage model, we have the overall search criterion in
the maximum approximation:

$$\max_{e_1^I} \left\{ \prod_{i=1}^{I} p(e_i|e_{i-1}) \cdot \max_{a_1^J} \prod_{j=1}^{J} \left[ p(a_j|a_{j-1}) \cdot p(f_j|e_{a_j}) \right] \right\}$$
Here and in the following, we omit a special treat-
ment of the start and end conditions like $j = 1$ or
$j = J$ in order to simplify the presentation and avoid
confusing details. Having the above criterion in
mind, we try to associate the language model prob-
abilities with the alignments $j \to i = a_j$. To this
purpose, we exploit the monotony property of our
alignment model, which allows only transitions from
$a_{j-1}$ to $a_j$ if the difference $\delta = a_j - a_{j-1}$ is 0, 1 or 2.
We define a modified probability $p_\delta(e|e')$ for the lan-
guage model depending on the alignment difference
$\delta$. We consider each of the three cases $\delta = 0, 1, 2$
separately:
• $\delta = 0$ (horizontal transition = alignment repe-
tition): This case corresponds to a target word
with two or more aligned source words and
therefore requires $e = e'$ so that there is no
contribution from the language model:

$$p_{\delta=0}(e|e') = \begin{cases} 1 & \text{for } e = e' \\ 0 & \text{for } e \neq e' \end{cases}$$
• $\delta = 1$ (forward transition = regular alignment):
This case is the regular one, and we can use
directly the probability of the bigram language
model:

$$p_{\delta=1}(e|e') = p(e|e')$$
• $\delta = 2$ (skip transition = non-aligned word):
This case corresponds to skipping a word, i.e.
there is a word in the target string with no
aligned word in the source string. We have to
find the highest probability of placing a non-
aligned word $\tilde{e}$ between a predecessor word $e'$
and a successor word $e$. Thus we optimize the
following product over the non-aligned word $\tilde{e}$:

$$p_{\delta=2}(e|e') = \max_{\tilde{e}} \left[ p(e|\tilde{e}) \cdot p(\tilde{e}|e') \right]$$

This maximization is done beforehand and the
result is stored in a table, as sketched below.
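
Because the maximization over the non-aligned word does not
depend on the source sentence, this table can be computed once
from the bigram model before search. A minimal sketch, assuming
the bigram model is stored as a nested dictionary with
bigram[e_prev][e] approximating $p(e|e')$; the names are
illustrative:

    def build_skip_table(bigram, vocab):
        # p_{delta=2}(e|e') = max over non-aligned g of p(e|g) * p(g|e')
        skip = {}
        for e_prev in vocab:
            row = bigram.get(e_prev, {})
            for e in vocab:
                best = 0.0
                for g in vocab:               # candidate non-aligned target word
                    p = bigram.get(g, {}).get(e, 0.0) * row.get(g, 0.0)
                    if p > best:
                        best = p
                skip[(e_prev, e)] = best
        return skip

The straightforward computation is cubic in the vocabulary
size, which is why it pays to do it beforehand rather than
inside the search.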
Using this modified probability $p_\delta(e|e')$, we can
rewrite the overall search criterion:

$$\max_{a_1^J,\, e_{a_1}, \ldots, e_{a_J}} \prod_{j=1}^{J} \left[ p(a_j|a_{j-1}) \cdot p_\delta(e_{a_j}|e_{a_{j-1}}) \cdot p(f_j|e_{a_j}) \right]$$

The problem now is to find the unknown mapping:

$$j \to (a_j, e_{a_j})$$

which defines a path through a network with a uni-
form trellis structure. For this trellis, we can still
use Figure 3.
Table 1: DP based search algorithm for the monotone translation model.

  input: source string $f_1...f_j...f_J$
  initialization
  for each position $j = 1, 2, ..., J$ in source sentence do
    for each position $i = 1, 2, ..., I_{max}$ in target sentence do
      for each target word $e$ do
        $Q(i, j, e) = p(f_j|e) \cdot \max_{\delta, e'} \{p(i|i-\delta) \cdot p_\delta(e|e') \cdot Q(i-\delta, j-1, e')\}$
  traceback:
  - find best end hypothesis: $\max_{i, e} Q(i, J, e)$
  - recover optimal word sequence
However, in each position $i$ along the vertical axis,
we have to allow all possible words $e$ of the target
vocabulary. Due to the monotony of
our alignment model and the bigram language mod-
el, we have only first-order type dependencies such
that the local probabilities (or costs when using the
negative logarithms of the probabilities) depend on-
ly on the arcs (or transitions) in the lattice. Each
possible index triple $(i, j, e)$ defines a grid point in
the lattice, and we have the following set of possi-
ble transitions from one grid point to another grid
point:

$$\delta \in \{0, 1, 2\}: \quad (i-\delta, j-1, e') \to (i, j, e)$$

Each of these transitions is assigned a local proba-
bility:

$$p(i|i-\delta) \cdot p_\delta(e|e') \cdot p(f_j|e)$$
Using this formulation of the search task, we can
now use the method of dynamic programming (DP)
to find the best path through the lattice. To this
purpose, we introduce the auxiliary quantity:

$Q(i, j, e)$: probability of the best partial path
which ends in the grid point $(i, j, e)$.

Since we have only first-order dependencies in our
model, it is easy to see that the auxiliary quantity
must satisfy the following DP recursion equation:

$$Q(i, j, e) = p(f_j|e) \cdot
\max_{\delta} \left\{ p(i|i-\delta) \cdot \max_{e'} p_\delta(e|e') \cdot Q(i-\delta, j-1, e') \right\}.$$

To explicitly construct the unknown word sequence
$\hat{e}_1^I$, it is convenient to make use of so-called back-
pointers which store for each grid point $(i, j, e)$ the
best predecessor grid point (Ney et al., 1992).
The DP equation is evaluated recursively to find
the best partial path to each grid point $(i, j, e)$. The
resulting algorithm is depicted in Table 1. The com-
plexity of the algorithm is $J \cdot I_{max} \cdot E^2$, where $E$ is
the size of the target language vocabulary and $I_{max}$
is the maximum length of the target sentence con-
sidered. It is possible to reduce this computational
complexity by using so-called pruning methods (Ney
et al., 1992); due to space limitations, they are not
discussed here.
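
A minimal sketch of the algorithm of Table 1 in Python. It
assumes the model tables from the sketches above (lex_prob,
jump_prob, bigram, and the precomputed skip table); start and
end conditions are simplified, pruning is omitted, and the
non-aligned word implicitly maximized over in the skip case is
not re-inserted into the output. All names are illustrative,
not taken from the original implementation.

    import math

    def logp(x):
        return math.log(x) if x > 0 else float("-inf")

    def lm_prob(d, e_prev, e, bigram, skip):
        # modified language model probability p_delta(e|e')
        if d == 0:                                 # alignment repetition
            return 1.0 if e == e_prev else 0.0
        if d == 1:                                 # regular bigram transition
            return bigram.get(e_prev, {}).get(e, 0.0)
        return skip.get((e_prev, e), 0.0)          # skip over a non-aligned word

    def dp_search(f_words, e_vocab, lex_prob, jump_prob, bigram, skip, i_max):
        # Q(i, j, e) = p(f_j|e) * max_{d,e'} p(i|i-d) * p_d(e|e') * Q(i-d, j-1, e')
        # complexity: J * i_max * E^2
        J = len(f_words)
        Q = [{} for _ in range(J)]                 # Q[j][(i, e)] = best log-prob
        back = [{} for _ in range(J)]              # backpointers for traceback
        for e in e_vocab:                          # j = 0, simplified start
            Q[0][(0, e)] = logp(lex_prob.get((f_words[0], e), 0.0))
        for j in range(1, J):
            for i in range(min(2 * j + 1, i_max)): # positions reachable monotonically
                for e in e_vocab:
                    emit = logp(lex_prob.get((f_words[j], e), 0.0))
                    for d in (0, 1, 2):
                        if i - d < 0:
                            continue
                        trans = logp(jump_prob.get(d, 0.0))
                        for ep in e_vocab:
                            prev = Q[j - 1].get((i - d, ep))
                            if prev is None:
                                continue
                            s = prev + trans + logp(lm_prob(d, ep, e, bigram, skip)) + emit
                            if s > Q[j].get((i, e), float("-inf")):
                                Q[j][(i, e)] = s
                                back[j][(i, e)] = (i - d, ep)
        state = max(Q[J - 1], key=Q[J - 1].get)    # best end hypothesis
        states = []
        for j in range(J - 1, -1, -1):             # recover optimal grid points
            states.append(state)
            state = back[j].get(state)
        out = []
        for i, e in reversed(states):              # one output word per target position
            if not out or out[-1][0] != i:         # collapse delta = 0 repetitions
                out.append((i, e))
        return [e for _, e in out]

A full implementation would additionally apply the pruning
methods mentioned above to avoid visiting all $E^2$ word pairs
at every grid point.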
4 Experimental Results 
4.1 The Task and the Corpus 
The search algorithm proposed in this paper was
tested on a subtask of the "Traveler Task" (Vidal,
1997). The general domain of the task comprises
typical situations a visitor to a foreign country is
faced with. The chosen subtask corresponds to a sce-
nario of the human-to-human communication situ-
ations at the registration desk in a hotel (see Table
4).
The corpus was generated in a semi-automatic
way. On the basis of examples from traveller book-
lets, a probabilistic grammar for different language
pairs has been constructed, from which a large cor-
pus of sentence pairs was generated. The vocabulary
consisted of 692 Spanish and 518 English words (in-
cluding punctuation marks). For the experiments, a
training corpus of 80,000 sentence pairs with 628,117
Spanish and 684,777 English words was used. In ad-
dition, a test corpus with 2,730 sentence pairs differ-
ent from the training sentence pairs was construct-
ed. This test corpus contained 28,642 Spanish and
24,927 English words. For the English sentences,
we used a bigram language model whose perplexity
on the test corpus varied between 4.7 for the orig-
inal text and 3.5 when all transformation steps as
described below had been applied.
Table 2: Effect of the transformation steps on the
vocabulary sizes in both languages.

  Transformation Step           Spanish  English
  Original (with punctuation)     692      518
  + Categorization                416      227
  + 'por_favor'                   417       -
  + Word Splitting                374       -
  + Word Joining                   -       237
  + Word Reordering                -        -
4.2 Text Transformations

The purpose of the text transformations is to make
the two languages resemble each other as closely as
possible with respect to sentence length and word or-
der. In addition, the size of both vocabularies is re-
duced by exploiting evident regularities; e.g. proper
names and numbers are replaced by category mark-
ers. We used different preprocessing steps, which
were applied consecutively:
• Original Corpus: Punctuation marks are
treated like regular words.

• Categorization: Some particular words or
word groups are replaced by word categories.
Seven non-overlapping categories are used:
three categories for names (surnames, male and
female names), two categories for numbers (reg-
ular numbers and room numbers) and two cat-
egories for date and time of day.

• Treatment of 'por favor': The word 'por
favor' is always moved to the end of the
sentence and replaced by the one-word token
'por_favor'.

• Word Splitting: In Spanish, the personal
pronouns (in subject case and in object case)
can be part of the inflected verb form. To coun-
teract this phenomenon, we split the verb into
a verb part and a pronoun part, such as 'darnos'
→ 'dar _nos' and 'pienso' → '_yo pienso'.

• Word Joining: Phrases in the English lan-
guage such as 'Would you mind doing ...' and
'I would like you to do ...' are difficult to han-
dle by our alignment model. Therefore, we
apply some word joining, such as 'would you
mind' → 'would_you_mind' and 'would like' →
'would_like'.

• Word Reordering: This step is applied to
the Spanish text to take into account cases like
the position of the adjective in noun-adjective
phrases and the position of object pronouns,
e.g. 'habitación doble' → 'doble habitación'.
By this reordering, our assumption about the
monotony of the alignment model is more often
satisfied.
The effect of these transformation steps on the sizes
of both vocabularies is shown in Table 2. In addi-
tion to all preprocessing steps, we removed the punc-
tuation marks before translation and resubstituted
them by rule into the target sentence.
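
To give a flavor of these transformations, a few of the steps
can be sketched as simple string rewrites. The rules below are
hypothetical toy examples in the spirit of the steps described
above, not the actual rule sets used for the experiments:

    import re

    def preprocess_spanish(sentence):
        # illustrative Spanish-side steps: categorization, 'por favor'
        # movement, verb/pronoun splitting, noun-adjective reordering
        s = sentence
        s = re.sub(r"\b\d+\b", "NUMBER", s)                    # categorization (toy rule)
        if "por favor" in s:                                   # move token to sentence end
            s = s.replace("por favor", "").strip(" ,.") + " por_favor ."
        s = s.replace("darnos", "dar _nos")                    # word splitting examples
        s = s.replace("pienso", "_yo pienso")
        s = s.replace("habitación doble", "doble habitación")  # word reordering
        return re.sub(r"\s+", " ", s).strip()

    def preprocess_english(sentence):
        # illustrative English-side word joining
        s = sentence.replace("would you mind", "would_you_mind")
        return s.replace("would like", "would_like")

Each step is invertible by rule, which is what allows the
transformations on the target side to be undone after
translation.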
4.3 Translation Results 
For each of the transformation steps described 
above, all probability models were trained anew, i.e.
the lexicon probabilities $p(f|e)$, the alignment prob-
abilities $p(i|i-\delta)$ and the bigram language proba-
bilities $p(e|e')$. To produce the translated sentence
in normal language, the transformation steps in the 
target language were inverted. 
The translation results are summarized in Table 
3. As an automatic and easy-to-use measure of the
translation errors, the Levenshtein distance between
the automatic translation and the reference transla-
tion was calculated. Errors are reported at the word
level and at the sentence level:

• word level: insertions (INS), deletions (DEL),
and total number of word errors (WER).

• sentence level: a sentence is counted as correct
only if it is identical to the reference sentence.
Admittedly, this is not a perfect measure. In par- 
ticular, the effect of word ordering is not taken into 
account appropriately. Actually, the figures for sen-
tence error rate are overly pessimistic; many sen-
tences are acceptable and semantically correct trans-
lations (see the example translations in Table 4).
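
The word-level figures are obtained from a standard
Levenshtein (edit-distance) alignment over words. A minimal
sketch of how such counts can be computed, illustrative only
and not the evaluation code used for these experiments:

    def word_error_counts(hyp, ref):
        # Levenshtein alignment of hypothesis and reference word lists;
        # WER = (INS + DEL + SUB) / len(ref)
        H, R = len(hyp), len(ref)
        D = [[0] * (R + 1) for _ in range(H + 1)]   # D[i][j]: cost hyp[:i] vs ref[:j]
        for i in range(1, H + 1):
            D[i][0] = i
        for j in range(1, R + 1):
            D[0][j] = j
        for i in range(1, H + 1):
            for j in range(1, R + 1):
                sub = D[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])
                D[i][j] = min(sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
        ins = dele = subs = 0                       # backtrace to split error types
        i, j = H, R
        while i > 0 or j > 0:
            if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
                subs += hyp[i - 1] != ref[j - 1]
                i, j = i - 1, j - 1
            elif i > 0 and D[i][j] == D[i - 1][j] + 1:
                ins += 1; i -= 1                    # extra hypothesis word: insertion
            else:
                dele += 1; j -= 1                   # missing reference word: deletion
        return ins, dele, subs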
Table 3: Word error rates (INS/DEL, WER) and
sentence error rates (SER) for different transforma-
tion steps.

                         Translation Errors [%]
  Transformation Step    INS/DEL   WER    SER
  Original Corpus        4.3/11.2  21.2   85.5
  + Categorization       2.5/9.6   16.1   81.0
  + 'por_favor'          2.6/8.3   14.3   75.6
  + Word Splitting       2.5/7.4   12.3   65.4
  + Word Joining         1.3/4.9    7.3   44.6
  + Word Reordering      0.9/3.4    5.1   30.1
As can be seen in Table 3, the translation er-
rors can be reduced systematically by applying all
transformation steps. The word error rate is re-
duced from 21.2% to 5.1%; the sentence error rate
is reduced from 85.5% to 30.1%. The two most im-
portant transformation steps are categorization and
word joining. What is striking is the large fraction
of deletion errors. These deletion errors are often
caused by the omission of word groups like 'for me
please' and 'could you'. Table 4 shows some example
translations (for the best translation results). It can
be seen that the semantic meaning of the sentence in
the source language may be preserved even if there
are three word errors according to our performance
criterion. To study the dependence on the amount
of training data, we also performed a training with
only 5,000 sentences out of the training corpus. For
this training condition, the word error rate went up
only slightly, namely from 5.1% (for 80,000 training
sentences) to 5.3% (for 5,000 training sentences).
To study the effect of the language model, we test-
ed a zerogram, a unigram and a bigram language
model using the standard set of 80,000 training sen-
tences. The results are shown in Table 5.
Table 4: Examples from the EuTrans task: O= original sentence, R= reference translation, A= automatic
translation.

O: He hecho la reserva de una habitación con televisión y teléfono a nombre del señor Morales.
R: I have made a reservation for a room with TV and telephone for Mr. Morales.
A: I have made a reservation for a room with TV and telephone for Mr. Morales.

O: Súbanme las maletas a mi habitación, por favor.
R: Send up my suitcases to my room, please.
A: Send up my suitcases to my room, please.

O: Por favor, querría que nos diese las llaves de la habitación.
R: I would like you to give us the keys to the room, please.
A: I would like you to give us the keys to the room, please.

O: Por favor, me pide mi taxi para la habitación tres veintidós?
R: Could you ask for my taxi for room number three two two for me, please?
A: Could you ask for my taxi for room number three two two, please?

O: Por favor, reservamos dos habitaciones dobles con cuarto de baño.
R: We booked two double rooms with a bathroom.
A: We booked two double rooms with a bathroom, please.

O: Quisiera que nos despertaran mañana a las dos y cuarto, por favor.
R: I would like you to wake us up tomorrow at a quarter past two, please.
A: I want you to wake us up tomorrow at a quarter past two, please.

O: Repáseme la cuenta de la habitación ochocientos veintiuno.
R: Could you check the bill for room number eight two one for me, please?
A: Check the bill for room number eight two one.
The WER decreases from 31.1% for the zerogram model
to 5.1% for the bigram model.

The results presented here can be compared with
the results obtained by the finite-state transducer
approach described in (Vidal, 1996; Vidal, 1997),
where the same training and test conditions were
used. However, the only preprocessing step was cat-
egorization. In that work, a WER of 7.1% was ob-
tained as opposed to 5.1% presented in this paper.
For smaller amounts of training data (say 5,000 sen-
tence pairs), the advantage of the DP based search
seems to be even larger.
Table 5: Language model perplexity (PP), word er-
ror rates (INS/DEL, WER) and sentence error rates
(SER) for different language models.

                          Translation Errors [%]
  Language Model   PP     INS/DEL   WER    SER
  Zerogram        237.0   0.6/18.6  31.1   98.1
  Unigram          74.4   0.9/12.4  20.4   94.8
  Bigram            4.1   0.9/3.4    5.1   30.1
4.4 Effect of the Word Reordering

In more general cases and applications, there will
always be sentence pairs with word alignments for
which the monotony constraint is not satisfied. How-
ever, even then, the monotony constraint is satisfied
locally for the lion's share of all word alignments in
such sentences. Therefore, we expect to extend the
approach presented by the following methods:

• more systematic approaches to local and global
word reorderings that try to produce the same
word order in both languages.

• a multi-level approach that allows a small (say
4) number of large forward and backward tran-
sitions. Within each level, the monotone align-
ment model can still be applied, and only when
moving from one level to the next, we have to
handle the problem of different word orders.
To show the usefulness of global word reorder-
ing, we changed the word order of some sentences
by hand. Table 6 shows the effect of the global re-
ordering for two sentences. In the first example, we
changed the order of two groups of consecutive words
and placed an additional copy of the Spanish word
'cuesta' into the source sentence. In the second
example, the personal pronoun 'me' was placed at
the end of the source sentence. In both cases, we
obtained a correct translation.
5 Conclusion 
In this paper, we have presented an HMM based ap-
proach to handling word alignments and an associat-
ed search algorithm for automatic translation. The
characteristic feature of this approach is to make the
alignment probabilities explicitly dependent on the
alignment position of the previous word and to as-
sume a monotony constraint for the word order in
both languages. Due to this monotony constraint,
we are able to apply an efficient DP based search al-
gorithm. We have tested the model successfully on
the EuTrans traveller task, a limited domain task
with a vocabulary of 200 to 500 words.
Table 6: Effect of the global word reordering: O= original sentence, R= reference translation, A= automatic
translation, O'= original sentence reordered, A'= automatic translation after reordering.

O: Cuánto cuesta una habitación doble para cinco noches incluyendo servicio de habitaciones ?
R: How much does a double room including room service cost for five nights ?
A: How much does a double room including room service ?
O': Cuánto cuesta una habitación doble incluyendo servicio de habitaciones cuesta para cinco noches ?
A': How much does a double room including room service cost for five nights ?

O: Explique _me la factura de la habitación tres dos cuatro.
R: Explain the bill for room number three two four for me.
A: Explain the bill for room number three two four.
O': Explique la factura de la habitación tres dos cuatro _me.
A': Explain the bill for room number three two four for me.
The resulting word error rate was only 5.1%. To
mitigate the monotony constraint, we plan to reorder
the words in the source sentences to produce the same
word order in both languages.
Acknowledgement

This work has been supported partly by the Ger-
man Federal Ministry of Education, Science, Re-
search and Technology under the contract number
01 IV 601 A (Verbmobil) and by the European Com-
munity under the ESPRIT project number 20268
(EuTrans).

References 
A. L. Berger, P. F. Brown, S. A. Della Pietra, V. J.
Della Pietra, J. R. Gillett, J. D. Lafferty, R. L.
Mercer, H. Printz, and L. Ures. 1994. "The Can-
dide System for Machine Translation". In Proc. of
ARPA Human Language Technology Workshop,
pp. 152-157, Plainsboro, NJ. Morgan Kaufmann
Publishers, San Mateo, CA, March.

P. F. Brown, V. J. Della Pietra, S. A. Della Pietra,
and R. L. Mercer. 1993. "The Mathematics of
Statistical Machine Translation: Parameter Esti-
mation". Computational Linguistics, Vol. 19, No.
2, pp. 263-311.

I. Dagan, K. W. Church, and W. A. Gale. 1993.
"Robust Bilingual Word Alignment for Machine
Aided Translation". In Proc. of the Workshop on
Very Large Corpora, pp. 1-8, Columbus, OH.

P. Fung and K. W. Church. 1994. "K-vec: A New
Approach for Aligning Parallel Texts". In Proc. of
the 15th Int. Conf. on Computational Linguistics,
pp. 1096-1102, Kyoto.

F. Jelinek. 1976. "Speech Recognition by Statistical
Methods". Proc. of the IEEE, Vol. 64, pp. 532-
556, April.

M. Kay and M. Röscheisen. 1993. "Text-
Translation Alignment". Computational Linguis-
tics, Vol. 19, No. 2, pp. 121-142.

H. Ney, D. Mergel, A. Noll, and A. Paeseler. 1992.
"Data Driven Search Organization for Continuous
Speech Recognition". IEEE Trans. on Signal Pro-
cessing, Vol. SP-40, No. 2, pp. 272-281, February.

E. Vidal. 1996. "Final Report of Esprit Research
Project 20268 (EuTrans): Example-Based Under-
standing and Translation Systems". Universidad
Politécnica de Valencia, Instituto Tecnológico de
Informática, October.

E. Vidal. 1997. "Finite-State Speech-to-Speech
Translation". In Proc. of the Int. Conf. on Acous-
tics, Speech and Signal Processing, Munich, April.

S. Vogel, H. Ney, and C. Tillmann. 1996. "HMM
Based Word Alignment in Statistical Transla-
tion". In Proc. of the 16th Int. Conf. on Com-
putational Linguistics, pp. 836-841, Copenhagen,
August.

D. Wu. 1996. "A Polynomial-Time Algorithm for
Statistical Machine Translation". In Proc. of the
34th Annual Conf. of the Association for Compu-
tational Linguistics, pp. 152-158, Santa Cruz, CA,
June.
