Word Re-ordering and DP-based Search in Statistical Machine 
Translation 
Christoph Tillmann and Hermann Ney 
Lehrstuhl fiir Informatik VI, Computer Science Department 
RWTH Aachen - University of Technology 
D-52056 Aachen, Germany 
{t illmann, ney}@inf ormat ik. rwth-aachen, de 
Abstract 
In this paper, we describe a search procedure for sta- 
tistical machine translation (MT) based on dynmnic 
programming (DP). Starting from a DP-based solu- 
tion to the traveling salesman problem, we present 
a novel technique to restrict the possible word re- 
ordering between source and target  in or- 
der to achieve an efficient search algorithm. A search 
restriction especially useful for tile translation di- 
rection from German to English is presented. The 
experimental tests are carried out on the Verbmo- 
bil task (Germm>English, 8000-word vocabulary), 
which is a limited-domain spoken- task. 
1 Introduction 
The goal of machine translation is tile translation 
of a text given in some source  into a tar- 
gel: . We are given a source string fJ = 
fl ...fj.-.f.l of length J, wlfich is to be translated into 
a target string c\[ = cl ...ei...el of length I. Among 
all possible target strings, we will choose the string 
with the highest probability: 
dI = argmax {Pr(e{lff)} q 
-- argmax {Pr(el).Pr(fflel) } (1) 
The argmax operation denotes the search problem, 
i.e. the generation of the output sentence in the tar- 
get . Pr(c~) is the  model of tim 
target , whereas Pr(fi'le*l) is the transla- 
tion model. Our approach uses word-to-word depen- 
dencies between source and target words. The model 
is often further restricted so that each source word 
is assigned to exactly one target word (Brown et al., 
1993; Ney et al., 2000). These alignment models 
are similar to the concept of hidden Markov models 
(HMM) in speech recognition. The alignment map- 
ping is j --+ i = aj from source position j to target 
position i = aj. The use of this alignment model 
raises major problems if a source word has to be 
aligned to several target words, e.g. when translat- 
ing German compound nouns. A siuLple extension 
will be used to handle this problem. 
In Section 2, we briefly review our approach to sta- 
tistical machine translation. In Section 3, we in- 
troduce our novel concept to word re-ordering and 
a DP-based search, which is especially suitable for 
the translation direction fl'om German to English. 
This approach is compared to another re-ordering 
scheme presented in (Berger et al., 1996). In Sec- 
tion 4, we present the t)erformance measures used 
and give translation results on the Verbmobil task. 
2 Basic Approach 
In this section, we briefly review our translation ap- 
proach. In Eq. (1), Pr(C~l) is the  model, 
which is a trigrain  model in this case. For 
the translation model Pr(fille{), we go on the as- 
sunlption that each source word is aligned to ex- 
actly one target word. The alignment model uses 
two kinds of parameters: alignment probabilities 
p(ajlaj_l, I, J), where the probability of alignment 
aj for position j deI)ends on tile previous alignment 
position aj-i (Ney et al., 2000) and lexicon proba.- 
bilities p(\]~le~j). When aligning the words in par- 
allel texts (tbr  pairs like Spanish-English, 
French-Englisll, Italian-German,...), we typically ob- 
serve a strong localization effect. In many cases, 
there is an even stronger restriction: over large por- 
tions of tile source string, the alignment is monotone. 
2.1 Inverted Alignments 
To explicitly handle the word re-ordering between 
words in source and target , we use the con- 
cept of the so-called inverted aligmnents as given in 
(Ney et al., 2000). An inverted alignment is defined 
as follows: 
inverted alignment: i -+ j = bi. 
Target positions i are mapped to source positions bi. 
What is important and is not expressed by the nota- 
tion is the so-called coverage constraint: each source 
position j should be 'hit' exactly once by the path 
of the inverted aligmnent b~ = bL...bi...bi. Using the 
inverted alignments in the maximum approximation, 
850 
we obtain as search criterion: 
/fl i-I sn} . v(,Zl5 • 
ell ki=l 
' }} • max II z,.5. v(f,,,le*)\] --- 
hi i=1 
= max V(JII)-max 
I i I el 'hi i=1 
"p(bi\[bi-l,-\[,'\])'P(fbilCi)\] }}, 
where the two products over i have l)een merged into 
) i-1 a single product ()vet i. I (cilei_~) is tim trigram 
 model probability. The inverted alignment 
probability p(bi\[bi-l, I, .1) and the lexicon probabil- 
ity p(J'~,~ led are obtained by relative fl'equency es- 
timal;es frosll the Viterbi alignment path after the 
final training iteration. The details are given in 
(Och art(1 Ney, 2000). The sentence length prob- 
ability p(J\[1) is omitted without any loss in per- 
tbrmance. For the inverted alignment probability 
p(bi\[bi-~, I, J), we drop the dependence on the tar- 
get sentence length I. 
2.2 Word Joining 
The baseline alignment model does not pernfit that a 
source word is aligned to two or more target words, 
e.g. %r the translation direction from German to 
\]?,nglish, the German (:Oml)ound IIOlStl 'Zahnarztter- 
rain' causes ira>blares, because it must be translated 
by the two target words dcntist'.s appoi'ntmcnt. We 
use a solution to this 1)roblenl similar to the one 
presented in (()ell el; al., 1999), where target words 
are joined during training. The word joining is (lotto 
on the basis of a likelihood criterion. An extended 
lexicon model is defined, and its likelihood is com- 
pared to a baseline lexicon model, which takes only 
single-word dependencies into aecollnt. E.g. when 
'Zahnarzttermin' is aligned to dentist'.s, the extended 
lexicon model might learn that 'Zahnarzttcrmin' ac- 
tually has to be aligned to both dentist's and ap- 
pointment. In the following, we assmne that this 
word joining has been carried out. 
"I DP Algorithm for Statistical 
Machine Translation 
in order to handle the necessary word re-ordering as 
&n optimization problem within our dynmnic pro- 
gramming approach, we describe a solution to the 
traveling salesnmn problem (TSP) which is based 
on dynamic programming (Held, Karp, 1962). The 
traveling salesman problem is an oi)timization prob- 
lem which is defined as follows: given are a set of 
May 
of 
fourth 
the 
oll 
you 
visit 
not 
can 
colleague 
my 
case 
this 
In 
O O O O O O O O O O ~_....O 
O O O O O O O O O /O~ O O O 
O O O O O O O O O/O O O O 
O O O O O O O O ? O O O O 
O O O O O O O O// O O O O O 
O O O O O O O//O O O O O O 
o o o o o o 
oo Oo Oo Oo oO oO oO o o o ° 
?_... o.... o ...o._...o.....? 
O O O O O/~ O O O O O O O 
O O O9/7.O O O O O O O O O 
O Of O O O O O O O O O O 
 oO oOoOOoOOoOO 
I I I I 1 I I I I I I 1 I 
I d F k rn K S a v M n b 
. i a a e o i m i ~ i e 
e / n i / e e i C s 
s I n n I r h u 
e e t t C m 
g e h 
e I/ a 
n 
Figm'e 1: Re-ordering for the Gerlnan verbgroul). 
cities S = Sl ,---, s, and tbr each pair of cities si, sj 
the cost dij > 0 for traveling flom city s: to city 
.sj. We arc.' looking for the shortest tour visiting 
all cities exactly once while starting and ending in 
city sl. A straightforward way to find the short- 
est tour is by trying all possible permutations of the 
n cities. The resulting algorithm has a complexity 
of O(n!). Itowever, dynamic progrmnming can be 
used to find tile shortest tour in exponential time, 
namely in O(n 2-2'~), using the algorithm by Ileld and 
Karp. The approach recursively evahlates a quantity 
Q(C,j), where C is the set; of already visited cities 
and sj is the last visited city. Subsets C of increas- 
ing cardinality c are processed. The algorithm works 
due to the fact that not all permutations of cities 
have to be considered explicitly. For a given partial 
hypothesis (C, j), the order in which the cities in (2 
have beast visited cast be ignored (except j), only the 
score for the best path reaching j has to be stored. 
This algorithm can be applied to statistical machine 
translation. Using the concept of inverted align- 
ments, we explicitly take care of the coverage con- 
straint by introducing a coverage set C of source sen- 
tence positions that have been already processed. 
The advantage is that we can recombine search hy- 
potheses by dynmnic programming. The cities of 
the traveling salesman probleIn correspond to source 
851 
Table 1: DP algorithm tbr statistical machine translation. 
input: source string fl..-fj...f.l 
initialization 
for each cardinality c = 1, 2,. - •, J do 
tbr each pair (C,j), where j C C and ICl = c do 
tbr each tat'get word e E E 
max 
5,e t! 
Q~,(e,C,j) =p(fjle) {p(jlj',J).p(5) .pa(ele',e" ) -O~,,(e',C\ {j},j')} 
jlEC\{j} 
words fj in the input string of length J. For the 
final translation each source position is considered 
exactly once. Subsets of partial hypotheses with 
coverage sets C of increasing cardinality c are pro- 
cessed. For a trigrmn  model, the partial 
hyl)otheses are of tile form (c', c,(2,j), e', c are the 
last two target words, C is a coverage set for tile al- 
ready covered source positions and j is the last posi- 
tion visited. Each distance in the traveling salesman 
problem now corresponds to the negative logarithm 
of tile product of the translation, alignment and lan- 
guage model probabilities. The following auxiliary 
quantity is defined: 
Qc~((?,~ C~j) :.~. probability of tile best partial 
hypothesis (c~, b~), where 
C = {bt:\]k = 1,--.,i}, bi = j, 
(2 i ~- C aILd Ci_ 1 ~ e I. 
The type of alignment we have considered so far re- 
quires the stone length tbr source and target sen- 
tence, i.e. I = J. Evidently, this is an unrealistic 
assumption, therefore we extend the concept of in- 
verted alignments as follows: Wtmn adding a new 
position to the coverage set C, we might generate ei- 
ther 5 = 0 or a = 1 new target words. For 5 = 1, a 
new target  word is generated using the tri- 
gram  model p(e\[e', e"). For 5 = 0, no new 
target word is generated, while an additional source 
sentence position is covered. A modified  
inodel probability pa(e\[c', c") is defined as follows: 
1.0 ifa=0 
Pa(ele"e") = p(e\]e',e") ifa = 1 
We associate a distribution p(5) with the two cases 
5 = 0 and 5 = 1 and set p(5 = 1) = 0.7. 
The above auxiliary quantity satisfies tile following 
recursive DP equation: 
Qe,(c,C,j) = 
4. mein 5. Kollege 
~~ ,Q ~Verb_ .~ Fin:l~ 
--_ JJ 
1. In 7.nicht 9. Sie 
2. diesem 8. besuehen 10. am 
3. Fall 11. vierten 6. kann 
12. Mai 13.. 
Figure 2: Order in which source positions are visited 
tbr the example given in Fig.1. 
P(fJIe) "max {P(JlJ','/)'P(5)" ~,c I! 
i' ec\ {j} 
• pa( le', e")- Qe,, c \ {j}, j') }. 
The DP equation is evaluated recursively for each 
hypothesis (e',e,C,j). TILe resulting algorithm is 
depicted in Table 1. The complexity of the algorithm 
is O(E a • j2.2.1), where E is the size of the target 
 vocabulary. 
3.1 Word Re-Ordering with Verbgroup 
Restrictions: Quasi-monotone Search 
The above search space is still too large to allow 
the translation of a inedium length input sentence. 
On the other hand, only very restricted re-orderings 
are necessary, e.g. for the translation direction fi'om 
852 
Table 2: Coverage set hyllothesis extensions for the IBM re-ordering. 
Predecessor eovera~e, set \[ I Successor coverage set 
({1,...,,,,} \ ,l') ({1,...,,,,} ,0 
({\],-- .,,,t} \ {l,/1 } ,l') -- ({1,''', 'L~, } \ {/l} ,/) \ ,l') ({1,.--,,,,,} \ ,0 
({I,...,,,- J} \ ,l') ({1,--.,,,,} \ ,m) 
Oerlnan to English the monotonicity constraint is 
violated mainly with respect; to the German verb- 
group. In German, the verbgroui) usually consists 
of a left and a right verbal brace., whereas in En- 
glish the words of the verbgroul) usually tbrm a se- 
quence of consecutive words. Our new al)t)roach, 
which is ('alle.d quasi-monotone search, proce.sses 
the source sentence monotonically, while explicitly 
taking into account the positions of the (-lel'lnan 
verbgroup. 
A typical situation is shown in Figure \]. When 
translating the sentence monotonically fl'om left to 
right, the translation of the German finite verb 
'kmm', which is the left verbal brace in this case, 
is postponed mttil the German noun phrase 'mein 
t(ollege' is translated, which is the subject of the 
sentence. The.n, the. German infinitive 'besuclmn' 
and the negation particle 'nicht;' are translated. The 
trails\]at\oil of erie position in the source sentence 
nmy be postponed ti)r up to L = 3 source positions, 
and the translation of u I) to two source positions 
link be anticittated for at most 1~ = l0 source l)osi- 
tions. To formalize the attl)roach, we introduce four 
verbgroup stat;es S: 
• hfitial (Z): A contiguous, initial block <)f s<mrce 
l)ositions is covered. 
• Skitlped (K;): The translation of up to one word 
may be l)OStl)oned . 
• Verl> (V): The translation of Ul> (;o two words 
may be antMl)ated. 
, Final (Y:): The rest of the sentence is pro- 
cessed monotonically taking accoullt of tit(; al: 
ready covered positions. 
\Vhile processing the source sentence monotonically, 
the iifitial state Z is entere.d whenever there are no 
mmovered positions to the left of the rightmost cov- 
ered position. The sequence of states needed to 
carry out the. word re-ordering example in Fig. 1 
is given in Fig. 2. The 13 positions of the source 
sentence are processed in the order shown. A posi- 
tion is presented by the word at that position. Using 
these states, we define partial hypothesis extensions, 
which are of the following type: 
(S',C \ {j},j') --9 (S,C,j), 
Not only the cove.rage set C and the posit\oilS j,j', 
but also the verbgroup states S, S' are taken into ac- 
count. ~1~) be short, we omit the target words c, e' in 
the tbrinulation of the search hypotheses. There are 
13 types of extensions needed to describe the verb- 
group re-ordering. The details are given in (Till- 
mann, 2000). For each extension a new position is 
added to the coverage set. Covering the first lul- 
covered position in the source sentence, we use the 
 model probatfility p(e\[$, $). IIere, $ is the 
sentence boundary symbol, which is thought to be at; 
position 0 in the target sentence. Tile search starts 
in the hyl)othesis (Z, {~}, 0). {~} denotes the empty 
set, where, no source sentence t)osition is covered. 
The following recursive equation is evaluated: 
= (2) {gj\[j', 
3). p(5). ( ld, e"). p(fjlc) • ntax ~,c II 
• max Oc,(c',S',C \ {j},j')??. (st,/) ) 
(S t ,C\tj},ff)~(£,,C,j) 
j'cc\u} 
The search ends in the hypotheses (Z, {1,.-., d}, j). 
{1,..., d} de.notes a coverage se.t including all posi- 
tions from the starting 1)osition I to position J and 
j C {d- L,-.-, .\]}. The final score is obtaiiled from: 
,nax P($l", c'). G' (c, Z, { 1,..-, d}, .it, 
c,c I 
jc{a :.,...,a} 
where p($lc, c/) denotes the trigram  model, 
which predicts the sentence boundary $ at tim end 
of the target sentence. The complexity of the quasi- 
monotone search is O(\]'J a-,l- (\[~2-t-L-1~)). The proof 
is given ill (Tilhnann, 2000). 
3.2 Re-ordering with IBM Style 
Restrictions 
We compare our new api)roach with tim word re- 
ordering used in the IBM translation approach 
(Berger et al., 1996). A detailed descrit)tion of tile 
search procedure used is given in this patent. Source 
sentence words are aligned with hypothesized target 
sentence words, where the choice of a new source 
word, which has not been aligned with a target word 
yet, is restricted I . A procedural definitioll to restrict 
l In the approach described in (Berger et al., 1996), a mor- 
phological analysis is carried out and word morphenles rather 
thin, full-form words are used during the search, ltere, we 
process only flfll-forin words within the trmmlation proce.durc. 
853 
the number of pernmtations carried out for the word 
re-ordering is given. During the search process, a 
partial hyt)othesis is extended by choosing a source 
sentence position, which has not been aligned with a 
target sentence t)osition yet. Only one of the first n 
positions which are not already aligned in a partial 
hyt)othesis may be chosen, where n is set to 4. Tile 
restriction can be expressed in terms of the nmn- 
ber of uncovered source sentence positions to the 
left of the rightmost position m in the coverage set. 
This munber must be less than or equal to n - 1. 
Otherwise for the predecessor search hyt)othesis, we 
wonld have chosen a position that would not have 
been among the first n uncovered t)ositions. 
Ignoring the identity of the target  words 
e and c', the possible partial hypothesis extensions 
due to the IBM restrictions are shown in Table 2. 
In general, m, l, l' ~k {/1,12,/3} and in line umber 3 
and 4, l' must be chosen not to violate the above 
re-ordering restriction. Note that in line 4 the last; 
visited position for tile successor hypothesis must 
be m. Otherwise, there will be four uncovered po- 
sitions tbr the t)redecessor hypothesis violating the 
restriction. A dynamic programming recursion sin> 
ilar to the one in Eq. 2 is evaluated. In this case, we 
have no finite-state restrictions for the search space. 
Tile search stm'ts in hyi)othesis ({0}, 0) and ends in 
the hyt)otheses ({1,...,J},j), with j C {1,..-,d}. 
This approach leads to a search procedure with com- 
plexity O(E a . j4). The proof is given in (Tilhnann, 
2000). 
4 Experimental Results 
4.1 The Task and the Corpus 
We have tested tim translation system Oil the Verb- 
mobil task (Wahlster 1993). The Verbmobil task is 
an appointment scheduling task. Two subjects are 
each given a calendar and they are asked to schedule 
a meeting. The translation direction is from Ger- 
man to English. A summary of the corpus used in 
the experiments is given in Table 3. The perplexity 
for the trigrmn  model used is 26.5. Al- 
though the ultimate goal of tile Verbmobil project 
is the translation of spoken , the input used 
for the translation experinmnts reported on in this 
paper is the (more or less) correct orthographic tran- 
scription of the spoken sentences. Thus, the effects 
of spontaneous speech are t)resent in the corpus, e.g. 
the syntactic structure of tile sentence is rather less 
restricted, however the effect of sl)eech recognition 
errors is not covered. 
For the experiments, we use a simt)le pret)rocessing 
step. German city names are replaced by category 
markers. The translation search is carried out with 
tlm category markers and tlm city names are resub- 
stituted into the target sentence as a postt)rocessing 
step. 
Table 3: Training and test; conditions for the Verb- 
mobil task (*number of words without punctuation 
marks). 
\[German English 
Training: Selltences 
Words 
Words* 
Vocabulary Size 
Singletons 
Test-147: Sentences 
Words 
Perplexity 
58073 
519523 549921 
418979 453632 
7939 4648 
3454 1699 
147 
1968 2173 
- 26.5 
Table 4: Multi-reference word error rate (roWER) 
and subjective sentence error rate (SSER) for three 
different search t)rocedures. 
Search CPU time 
Method \[sec\] 
MonS 0.9 
QmS 10.6 
IbnlS 28.6 
roWER SSER \[yo\] \[y0\] 
42.0 30.5 
34.4 23.8 
38.2 26.2 
4.2 Performance Measures 
The following two error criteria are used ill our ex- 
t)erinmnts: 
• roWER: multi-reference WER: 
We use the Levenshtein distance between tile 
automatic translation and several reference 
translations as a measure of tile translation er- 
rors. On average, 6 reference translations per 
automatic translation are availal)le. Tile Lev- 
enshtein distance between the automatic trans- 
lation and each of tile reference translations is 
comt)uted, and the minimum Levenshtein dis- 
tance is taken. This measure has the advantage 
of being completely automatic. 
• SSER: subjective sentence error rate: 
For a more detailed analysis, tile translations 
are judged i)y a tminan test 1)erson. For the er-- 
ror counts, a range from 0.0 to 1.0 is used. An 
error count of 0.0 is assigned to a perfect trans- 
lation, and an error count of 1.0 is assigned to 
a semantically and syntactically wrong transb> 
tion. 
4.3 Translation Experiments 
For tile translation experiments, Eq. 2 is recursively 
evahlated. We apply a beam search concet)t as in 
st)eech recognition. However there is no global prun- 
ing. Search hypotheses are i)rocessed separately ac- 
cording to their coverage set d. The best scored 
854 
hyi)othesis tbr each coverage set is comlmted: 
Om,,,,,,(c) = c,c',6",j 
The hyl)othesis (d, e, $, C, j) is t)1'1151(;(l if: 
Q~,(e,S,C,j) < to.O,~cam(C), 
where to is a threshold to control the mmlber of sur- 
viving hypotheses. Additionally, for a given coverage 
set, at most 250 different hypotheses are kept dur- 
ing the search process, and the number of difl'erent 
words to |)e hyl)othesized by a source word is lim- 
ited. For each source word f, the list of its possible 
translations c is sorte(1 according to p(.flc) • p.,,.,,i(c), 
where Puui(e) is the unigrmn probability of the En- 
glish word c. It is sufficient to consi(ter only the best 
50 words. 
We show translation results for three at)l)roaches: 
tile monotone search (MonS): where no word re- 
ordering is allowed (Tillmann, 1997), the quasi- 
monotone search (QmS) as 1)resented in this palser 
amt the IBM style (IbmS) search as described in 
Section 3.2. 
TMsle 4: shows translation results tbr the three ap- 
I)roaches. The eomlsuting time is given in terms of 
CPU time per sentence (on a 450-MIlz l?entimn-III- 
PC). Itere, the printing threshold to = 10.0 is used. 
q_5:anslation errors are reported in terms of multi- 
reference word error rate (roWER) and subjective 
s(mtenee error rate (SSER). The monotone search 
tserforms worst in terms of both elTror rates 5IsWI~;I~. 
mid SSEIL The (;OSlll)lstislg time is low, sitlce 51o 5e- 
ordering is (:arried o551,. '\]'he quasi-inonotone search 
i)e1foI'551s t)est, in ter551s of l)oth error rates roWER 
and SSh;R. Additionally, it; works about 3 times as 
fast as the II3M style sem:eh. For our demonstra- 
tion system, we typically use the pruning threshold 
to = 5.0 to speed Ul) the search by a factor 5 while 
allowing for a sm¢fll degradation in translation accu- 
racy. 
The effect of the pruning threshold to is shown in 
Table 5. The coml)uting time, the number of search 
errors, and the mull;i-reference WEll, (roWER) are 
shown as a flmction of to. The negative logarithm 
of to is reporte(t. The translation scores for the hy- 
i)otheses generated with different threshohl values 
to are compared to the translation scores obtained 
with a conservatively large threshold to = 10.0. For 
each test series, we count tile mlml)er of sentences 
whose score is worse than the corresponding score of 
the tent; series with the conserw~tively large thresh- 
old to = 10.0, and this mm:ber is reported as the 
number of search errors, l)epending on the thresh- 
old to, the search algorithm may miss the globally 
of)timal path which typically results in additional 
translation errors. Decreasing the threshold results 
in higher mWEll due to additional search errors. 
Table 5: Effect 
of sere'e\ errors (147 sentences). 
Search to I CPU time #search 
Method \[ \[sec\] error 
QmS 0.0 0.07 108 
1.0 0.13 85 
2.5 0.35 44 
5.0 1.92 4 
10.0 10.6 0 
lbmS 0.0 0.14 108 
1.0 0.3 84 
2.5 0.8 45 
5.0 4.99 7 
10.0 28.52 0 
of the beam threshold on the nmnber 
roWER 
\[%1 
42.6 
37.8 
36.6 
34.6 
34.5 
43.4 
39.5 
39.1 
38.3 
38.2 
Table 6 shows example translations obtained by the 
three, difli;rent appro~mhes. Again, the monotone 
search performs worst. In the second and third 
translation examples, the 1bins word re-ordering 
performs worse than the QmS word re-ordering, 
since it; can not take l)roperly into at:count the word 
re-ordering (115(; to the, German verbgroul). Tile 
German finite verbs 'l)in' (second exmnple) and 
'kSnnten' (third exmnt)le) are too far away from the 
t)ersonal pronouns 'ich' and 'Sic' (6 respectively 5 
source sentence positions). In the last example, the 
less restrictive IbmS word re-ordering leads to a bet- 
ter translation, although the QmS translation is still 
aceeptabh'. 
5 Conclusion 
in this pal)er, we have presented a new, efficient 
DP-based search procedure for statistical machine 
translation. The approach assumes that, the word re- 
ordering is restricted to a few positions in the source 
sentence. The approach has been successfiflly tested 
on the 8000-word Verbmobil task. l'hture exten- 
sions of the system might include: 1) An e×tended 
translation model, where we use. more context to pre- 
(lict a source word. 2) An trot)roved  model, 
which takes into at:count syntactic structure, e.g. to 
ensure that a l)roper English verbgrout) in generated. 
3) A tight coupling with the speech recognizer out- 
lint. 
Acknowledgements 
This work has been supported as part of tile Verb- 
mobil project (contract number 01 IV 601 A) by 
the Certain5 Federal Ministry of Education, Science, 
Research and Technology and as part of the Eutrans 
project (ESPRIT project number 30268) by the Eu- 
rot)can Comimmity. 
855 
Table 6: Example Translations for the Verbmobil task. 
Input: Ja, wunderbar. KSnnen wir macllen. 
MonS: Yes, wonderflfl. Can we do . 
QmS: Yes, wonderflfl. We can do that . 
IbInS: Yes, wonderflfl. We can do that . 
Input: Das ist zu knapt) , weil ich ab dem dritten in Kaiserslautern bin . Genaugenommen imr atn dritten . 
Wie wSre es denn mn ghln Samstag , dem zehnten Februar ? 
MonS: That is too tight, becmlse I froln the third in Kaiserslautern. hi fact only on the third . 
How about 5hm Saturday, the tenth of February "? 
QmS: That is too tight, because Imn froln tim third in Kaiserslautern. In fact only on the third . 
Ahm how about Saturday, February the tenth ? 
IbmS: That is too tight , froln the third because I will be in Kaiserslautern. In fact only on the third. 
Atlln how about Saturday, February the tenth ? 
Input: Wenn Sie dann noch den siel)zehnten kSnnten , wSre das toll , ja. 
MonS: If you then also the seventeenth could, would be the great , yes . 
QmS: If you could then also the seventeenth , tlmt would be great, yes . 
1binS: Then if you could even take seventeenth , that would be great, , yes . 
Illtmt: .Ja, das kommt mir sehr gelegen. Machen wir es dann am besten so. 
MonS: Yes, that suits me perfectly . Do we should best like that . 
QmS: Yes , that suits me fine . We do it like that then best; . 
IbmS: Yes , that suits me fine . We should best do it like tlmt . 

References 

A. L. Berger, P. F. Brown, S. A. Della Pietra, 
V. J. Della Pietra, J. R. Gillett, A. S. Kehler, 
R. L. Mercer. 1996. Language Translation appa- 
ratus and method of using context-based transla- 
tion models. United States Patent, Patent Num- 
ber 5510981, April. 

P. F. Brown, V. J. Della Pietra, S. A. Della Pietra, 
and R. L. Mercer. 1993. The Mathematics of Sta- 
tistical Machine Translation: Parameter Estima- 
tion. Computational Linguistics, vol. 19, no. 2, 
pp. 263-311. 

M. Held, R. M. Karp. 1962. A Dynmnic Progrmn- 
ruing Approach to Sequencing Problems. &SIAM, 
vol. 10, no. 1, pp. 196-210. 

H. Ney, S. Niessen, F. J. Och, H. Sawaf, C. Tilhnmm, 
S. Vogel. 2000. Algorittuns for Statistical %'ansla- 
tion of Spoken Language. IEEE Transactions on 
Speech and Audio Processing, vol. 8, no. 1, pp. 24- 
36. 

F. J. Och, C. Tilhnann, H. Ney. 1999. hnprovcd 
Alignment Models for Statistical Machine ~IYans- 
lation. In Proc. of the ,loint SIGDAT Conference 
on Empirical Methods in Natural Language Pro- 
ccssing and Very Large Corpora (EMNLP99), pp. 
20-28, University of Maryland, College Park, MD, 
USA, June. 

F. J. Och and Ney, H. 2000. A comparison of align- 
ment models for statistical machine translation, hi 
Proceedings of COLING 2000: The 18th Interna- 
tional Conference on Computational Linguistics, 
Saarbr/ieken, Germany, July-August. 

C. Tilltnann. 2000. Complexity of the Dif 
ferent Word Re-ordering Approaches. The 
document can be found under the URL 
http ://www-i6. Informat ik. RWTg-Aachen. de/C 
olleagues/tilli/. Aachen University of' Tech- 
nology, Aachen, Germany, June. 

C. Tilhnann, S. Vogel, H. Ney and A. Zubiaga. 1997. 
A DP based Search Using Monotone Alignments 
in Statistical Translation. In Proc. of the 35th An- 
nual Conf. of the Association for" Computational 
Linguistics, pp. 289 296, Madrid, Spain, July. 

W. Wahlster. 1993. Verbmol)ih Translation of Face- 
to-Face Dialogs. MT Summit IV, pp. 127-135, 
Kobe, Japan. 
