A DP based Search Algorithm for Statistical Machine Translation
S. Nießen, S. Vogel, H. Ney, and C. Tillmann
Lehrstuhl für Informatik VI
RWTH Aachen - University of Technology
D-52056 Aachen, Germany
Email: niessen@informatik.rwth-aachen.de
Abstract 
We introduce a novel search algorithm for statistical machine translation based on dynamic programming (DP). During the search process two statistical knowledge sources are combined: a translation model and a bigram language model. This search algorithm expands hypotheses along the positions of the target string while guaranteeing progressive coverage of the words in the source string. We present experimental results on the Verbmobil task.
1 Introduction 
In this paper, we address the problem of finding the most probable target language representation of a given source language string. In our approach, we use a DP based search algorithm which sequentially visits the target string positions while progressively considering the source string words.
The organization of the paper is as follows. After reviewing the statistical approach to machine translation, we first describe the statistical knowledge sources used during the search process. We then present our DP based search algorithm in detail. Finally, experimental results for a bilingual corpus are reported.
1.1 Statistical Machine Translation 
In statistical machine translation, the goal of the search strategy can be formulated as follows: We are given a source language ('French') string $f_1^J = f_1 \ldots f_J$, which is to be translated into a target language ('English') string $e_1^I = e_1 \ldots e_I$ with the unknown length $I$. Every English string is considered as a possible translation for the input string. If we assign a probability $\Pr(e_1^I | f_1^J)$ to each pair of strings $(e_1^I, f_1^J)$, then we have to choose the length $I_{opt}$ and the English string $e_1^{I_{opt}}$ that maximize $\Pr(e_1^I | f_1^J)$ for a given French string $f_1^J$. According to Bayes' decision rule, $I_{opt}$ and $e_1^{I_{opt}}$ can be found by

$$(I_{opt}, e_1^{I_{opt}}) = \operatorname*{argmax}_{I, e_1^I} \{\Pr(e_1^I | f_1^J)\} = \operatorname*{argmax}_{I, e_1^I} \{\Pr(e_1^I) \cdot \Pr(f_1^J | e_1^I)\}. \quad (1)$$

$\Pr(e_1^I)$ is the English language model, whereas $\Pr(f_1^J | e_1^I)$ is the string translation model.
The overall architecture of the statistical translation approach is summarized in Fig. 1. In this figure, we already anticipate the fact that we will transform the source strings in a certain manner and that we will countermand these transformations on the produced output strings. This aspect is explained in more detail in Section 3.
[Figure 1: Architecture of the translation approach based on Bayes' decision rule. The figure shows the processing chain Source Language Text → Transformation → Global Search (maximize $\Pr(e_1^I) \cdot \Pr(f_1^J|e_1^I)$ over $e_1^I$, drawing on the lexicon model and the language model) → Transformation → Target Language Text.]
The task of statistical machine translation can be subdivided into two fields:

1. the field of modelling, which introduces structures into the probabilistic dependencies and provides methods for estimating the parameters of the models from bilingual corpora;

2. the field of decoding, i.e. finding a search algorithm which performs the argmax operation in Eq. (1) as efficiently as possible.
1.2 Alignment with Mixture Distribution 
Several papers have discussed the first issue, especially the problem of word alignments for bilingual corpora (Brown et al., 1993), (Dagan et al., 1993), (Kay and Röscheisen, 1993), (Fung and Church, 1994), (Vogel et al., 1996).
In our search procedure, we use a mixture-based alignment model that slightly differs from the model introduced as Model 2 in (Brown et al., 1993). It is based on a decomposition of the joint probability for $f_1^J$ into a product of the probabilities for each word $f_j$:

$$\Pr(f_1^J | e_1^I) = p(J|I) \cdot \prod_{j=1}^{J} p(f_j | e_1^I), \quad (2)$$

where the lengths of the strings are regarded as random variables and modelled by the distribution $p(J|I)$. Now we assume a sort of pairwise interaction between the French word $f_j$ and each English word $e_i$ in $e_1^I$. These dependencies are captured in the form of a mixture distribution:

$$p(f_j | e_1^I) = \sum_{i=1}^{I} p(i|j, J, I) \cdot p(f_j | e_i). \quad (3)$$

Inserting this into (2), we get

$$\Pr(f_1^J | e_1^I) = p(J|I) \prod_{j=1}^{J} \sum_{i=1}^{I} p(i|j, J, I) \cdot p(f_j | e_i) \quad (4)$$

with the following components: the sentence length probability $p(J|I)$, the mixture alignment probability $p(i|j, J, I)$ and the translation probability $p(f|e)$.
So far, the model allows all English words in the target string to contribute to the translation of a French word. This is expressed by the sum over $i$ in Eq. (4). It is reasonable to assume that for each source string position $j$ one position $i$ in the target string dominates this sum. This conforms with the experience that in most cases a clear word-to-word correspondence between a string and its translation exists. As a consequence, we use the so-called maximum approximation: At each point, only the best choice of $i$ is considered for the alignment path:

$$\Pr(f_1^J | e_1^I) = p(J|I) \prod_{j=1}^{J} \max_{i \in [1,I]} p(i|j, J, I) \cdot p(f_j | e_i). \quad (5)$$

We can now formulate the criterion to be maximized by a search algorithm:

$$\max_{I} \left[ p(J|I) \cdot \max_{e_1^I} \left\{ \Pr(e_1^I) \cdot \prod_{j=1}^{J} \max_{i \in [1,I]} p(i|j, J, I) \cdot p(f_j | e_i) \right\} \right]. \quad (6)$$
Because of the problem of data sparseness, we use a parametric model for the alignment probabilities. It assumes that the distance of the positions relative to the diagonal of the $(j, i)$ plane is the dominating factor:

$$p(i|j, J, I) = \frac{r\left(i - \frac{jI}{J}\right)}{\sum_{i'=1}^{I} r\left(i' - \frac{jI}{J}\right)}. \quad (7)$$

As described in (Brown et al., 1993), the EM algorithm can be used to estimate the parameters of the model.
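To make the interplay of Eqs. (4), (5) and (7) concrete, the following minimal Python sketch scores a sentence pair under the mixture alignment model. All names (align_prob, string_score) and the distance kernel r are our illustrative choices, not part of the paper; in practice the parameters would be estimated with the EM algorithm as described above.

```python
import math

def align_prob(i, j, J, I, r):
    """Parametric alignment probability p(i|j,J,I) of Eq. (7):
    the distance to the diagonal of the (j,i) plane dominates."""
    return r(i - j * I / J) / sum(r(k - j * I / J) for k in range(1, I + 1))

def string_score(f, e, lex, len_prob, r, use_max=True):
    """Pr(f_1^J | e_1^I): full mixture sum (Eq. 4) or the
    maximum approximation (Eq. 5)."""
    J, I = len(f), len(e)
    score = len_prob(J, I)
    for j in range(1, J + 1):
        contribs = [align_prob(i, j, J, I, r) * lex(f[j - 1], e[i - 1])
                    for i in range(1, I + 1)]
        score *= max(contribs) if use_max else sum(contribs)
    return score

# Toy usage with assumed model callables:
r = lambda d: math.exp(-abs(d))                       # distance kernel
lex = lambda f_w, e_w: 0.9 if (f_w, e_w) in {("ja", "yes"), ("danke", "thanks")} else 0.1
len_prob = lambda J, I: 1.0 if J == I else 0.5
print(string_score(["ja", "danke"], ["yes", "thanks"], lex, len_prob, r))
```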
1.3 Search in Statistical Machine Translation
In the last few years, there have been a number of papers considering the problem of finding an efficient search procedure (Wu, 1996), (Tillmann et al., 1997a), (Tillmann et al., 1997b), (Wang and Waibel, 1997). All of these approaches use a bigram language model, because bigram models are quite simple and easy to use and have proven their predictive power in stochastic language processing, especially speech recognition. Assuming a bigram language model, we would like to re-formulate Eq. (6) in the following way:

$$\max_{I} \left[ p(J|I) \cdot \max_{e_1^I} \left\{ \prod_{j=1}^{J} \max_{i \in [1,I]} p(e_i|e_{i-1}) \cdot p(i|j, J, I) \cdot p(f_j|e_i) \right\} \right]. \quad (8)$$
Any search algorithm aiming to perform the maximum operations in Eq. (8) has to guarantee that the predecessor word $e_{i-1}$ can be determined at the time when a certain word $e_i$ at position $i$ in the target string is under consideration. Different solutions to this problem have been studied.
(Tillmann et al., 1997b) and (Tillmann et al., 1997a) propose a search procedure based on dynamic programming that examines the source string sequentially. Although it is very efficient in terms of translation speed, it suffers from the drawback of being dependent on the so-called monotonicity constraint: The alignment paths are assumed to be monotone. Hence, the word at position $i-1$ in the target sentence can be determined when the algorithm produces $e_i$. This approximation corresponds to the assumption of a fundamental similarity of the sentence structures in both languages. In (Tillmann et al., 1997b), text transformations in the source language are used to adapt the word ordering in the source strings to the target language grammar.
(Wang and Waibel, 1997) describe an algorithm based on A*-search. Here, hypotheses are extended by adding a word to the end of the target string while considering the source string words in any order. The underlying translation model is Model 2 from (Brown et al., 1993).
(Wu, 1996) formulates a DP search for stochastic bracketing transduction grammars. The bigram language model is integrated into the algorithm at the point where two partial parse trees are combined.
2 DP Search 
2.1 The Inverted Alignment Model 
For our search method, we chose an algorithm which is based on dynamic programming. Compared to an A*-based algorithm, dynamic programming has the fundamental advantage that solutions of subproblems are stored and can then be re-used in later stages of the search process. However, for the optimization criterion considered here, dynamic programming is only suboptimal because the decomposition into independent subproblems is only approximately possible: to prevent the search time of a search algorithm from increasing exponentially with the string lengths and vocabulary sizes, local decisions have to be made at an earlier stage of the optimization process that might turn out to be suboptimal in a later stage but cannot be altered then. As a consequence, the global optimum might be missed in some cases.
The search algorithm we present here combines the advantages of dynamic programming with a search organization along the positions of the target string, which allows the integration of the bigram in a very natural way without restricting the alignment paths to the class of monotone alignments.
The alignment model as described above is defined as a function that assigns exactly one target word to each source word. We introduce a new interpretation of the alignment model: Each position $i$ in $e_1^I$ is assigned a position $b_i = j$ in $f_1^J$. Fig. 2 illustrates the possible transitions in this inverted model.

At each position $i$ of $e_1^I$, each word of the target language vocabulary can be inserted. In addition, the fertility $l$ must be chosen: A position $i$ and the word $e_i$ at this position are considered to correspond to a sequence of words $f_{j-l+1}^{j}$ in $f_1^J$. In most cases, the optimal fertility is 1. It is also possible that a word $e_i$ has fertility 0, which means that there is no directly corresponding word in the source string. We call this a skip, because the position $i$ is skipped in the alignment path.
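The coverage bookkeeping implied by this inverted interpretation can be sketched as follows; the representation (one $(j, l)$ pair per target position, with $l = 0$ for a skip) is our illustration, not notation from the paper.

```python
def covered_positions(inverted):
    """inverted: one (j, l) pair per target position i, meaning e_i is
    assigned the source positions j-l+1 .. j; l = 0 encodes a skip.
    Returns the covered source positions, or None if a position would
    be covered twice (violating the original criterion (6))."""
    covered = set()
    for j, l in inverted:
        for j2 in range(j - l + 1, j + 1):  # empty range when l == 0
            if j2 in covered:
                return None
            covered.add(j2)
    return covered

# Target positions 1..3: e_1 covers f_1, e_2 is skipped, e_3 covers f_2 f_3.
print(covered_positions([(1, 1), (0, 0), (3, 2)]))  # {1, 2, 3}
```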
Using a bigram language model, Eq. (9) specifies the modified search criterion for our algorithm. Here as above, we assume the maximum approximation to be valid.
[Figure 2: Transitions in the inverted model (position in source string vs. position in target string).]
$$\max_{I} \left[ p(J|I) \cdot \max_{e_1^I} \prod_{i=1}^{I} \left[ p(e_i|e_{i-1}) \cdot \max_{j,l} \prod_{j'=j-l+1}^{j} p(i|j', J, I) \cdot p(f_{j'}|e_i) \right] \right] \quad (9)$$

For better legibility, we regard the second product in Eq. (9) to be equal to 1 if $l = 0$. It should be stressed that the pair $(I, e_1^I)$ optimizing Eq. (9) is not guaranteed to be also optimal in terms of the original criterion (6).
2.2 Basic Problem: Position Coverage 
A closer look at Eq. (9) reveals the most important problem of the search organization along the target string positions: It is not guaranteed that all the words in the source string are considered. In other words, we have to force the algorithm to cover all input string positions. Different strategies to solve this problem are possible: For example, we can introduce a reward for covering a position which has not yet been covered. Or a penalty can be imposed for each position without correspondence in the target string.
In preliminary experiments, we found that the most promising method to satisfy the position coverage constraint is the introduction of an additional parameter into the recursion formula for DP. In the following, we will explain this method in detail.
2.3 Recursion Formula for DP 
In the DP formalism, the search process is described recursively. Assuming a total length $I$ of the target string, $Q_I(c, i, j, e)$ is the probability of the best partial path ending in the coordinates $i$ in $e_1^I$ and $j$ in $f_1^J$, if the last word $e_i$ is $e$ and if $c$ positions in the source string have been covered.
This quantity is defined recursively. Leaving a word $e_i$ without any assignment (skip) is the easiest case:

$$Q_I^{S}(c, i, j, e) = \max_{e'} \left\{ p(e|e') \cdot Q_I(c, i-1, j, e') \right\}.$$

Note that it is not necessary to maximize over the predecessor positions $j'$: This maximization is subsumed by the maximization over the positions on the next level, as can easily be proved.
In the original criterion (6), each position $j$ in the source string is aligned to exactly one target string position $i$. Hence, if $i$ is assigned to $l$ subsequent positions in $f_1^J$, we want to verify that none of these positions has already been covered: We define a control function $v$ which returns 1 if the above constraint is satisfied and 0 otherwise. Then we can write:

$$Q_I^{\neg S}(c, i, j, e) = \max_{l>0} \left[ \prod_{j'=j-l+1}^{j} \left\{ p(i|j', J, I) \cdot p(f_{j'}|e) \right\} \cdot \max_{e'} \left\{ p(e|e') \cdot \max_{j'} \left[ Q_I(c-l, i-1, j', e') \cdot v(c, l, j', j, e') \right] \right\} \right].$$
We now have to find the maximum:

$$Q_I(c, i, j, e) = \max \left\{ Q_I^{S}(c, i, j, e),\; Q_I^{\neg S}(c, i, j, e) \right\}.$$
The decisions made during the dynamic programming process (choices of $l$, $j'$ and $e'$) are stored for recovering the whole translation hypothesis.
The best translation hypothesis can be found by optimizing the target string length $I$ and requiring the number of covered positions to be equal to the source string length $J$:

$$\max_{I} \left\{ p(J|I) \cdot \max_{j,e} Q_I(J, I, j, e) \right\}. \quad (10)$$
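The recursion can be summarized in the following Python sketch for one fixed target length $I$. It is meant to mirror the equations above, not to be an efficient decoder: the control function $v$ is not modelled (the counter c only tracks how many positions are covered, not which ones), none of the accelerations of Section 2.4 is applied, and the model callables (bigram, lex, align) and the candidate vocabulary are assumed given.

```python
def dp_search(f, I, vocab, bigram, lex, align):
    """Sketch of the recursion over gridpoints (c, i, j, e); Q and back
    correspond to Q_I(c, i, j, e) and its backpointers in the text."""
    J = len(f)
    BOS = "<s>"                       # assumed sentence-start token
    words = list(vocab) + [BOS]
    Q, back = {}, {}

    def q(c, i, j, e):                # Q_I(c, i, j, e); 0.0 if unreachable
        if i == 0:
            return 1.0 if c == 0 and e == BOS else 0.0
        return Q.get((c, i, j, e), 0.0)

    for i in range(1, I + 1):
        for c in range(J + 1):
            for j in range(1, J + 1):
                for e in vocab:
                    # skip case: e_i covers nothing; j is inherited, so no
                    # maximization over predecessor positions is needed.
                    best, arg = max(
                        ((bigram(e, e2) * q(c, i - 1, j, e2), (0, j, e2))
                         for e2 in words), key=lambda t: t[0])
                    # non-skip case: e_i covers the l positions j-l+1 .. j.
                    for l in range(1, min(j, c) + 1):
                        emit = 1.0
                        for j2 in range(j - l + 1, j + 1):
                            emit *= align(i, j2, J, I) * lex(f[j2 - 1], e)
                        for e2 in words:
                            for j2 in range(1, J + 1):
                                s = emit * bigram(e, e2) * q(c - l, i - 1, j2, e2)
                                if s > best:
                                    best, arg = s, (l, j2, e2)
                    if best > 0.0:
                        Q[(c, i, j, e)], back[(c, i, j, e)] = best, arg

    # Eq. (10) for this fixed I: require all J source positions covered.
    return max(((q(J, I, j, e), (j, e))
                for j in range(1, J + 1) for e in vocab),
               key=lambda t: t[0]), back
```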
2.4 Acceleration Techniques 
The time complexity of the translation method as described above is

$$O(I_{max}^2 \cdot J^3 \cdot |E|^2),$$

where $|E|$ is the size of the target language vocabulary $E$. Some refinements of this algorithm have been implemented to increase the translation speed.
1. We can expect the progression of the source string coverage to be roughly proportional to the progression of the translation procedure along the target string. So it is legitimate to define a minimal and maximal coverage for each level $i$:

$$c_{min}(i) = \left\lceil \frac{iJ}{I} \right\rceil - r, \qquad c_{max}(i) = \left\lceil \frac{iJ}{I} \right\rceil + r,$$

where $r$ is a constant integer number. In preliminary experiments we found that we could set $r$ to 3 without any loss in translation accuracy. This reduces the time complexity by a factor $J$. (This windowing, the length estimation of item 2 and the thresholds of item 3 are sketched in code after this list.)
2. Optimizing the target string length as formulated in Eq. (10) requires the dynamic programming procedure to start all over again for each $I$. If we assume the dependence of the alignment probabilities $p(i|j, J, I)$ on $I$ to be negligible, we can renormalize them by using an estimated target string length $\hat{I}$ and use $p(i|j, J, \hat{I})$. Now we can produce one translation $e_1^i$ at each level $i = I$ without restarting the whole process:

$$\max_{j,e} Q_{\hat{I}}(J, I, j, e). \quad (11)$$

For $\hat{I}$ we choose $\hat{I} = \hat{I}(J) = J \cdot \frac{\mu_I}{\mu_J}$, where $\mu_I$ and $\mu_J$ denote the average lengths of the target and source strings, respectively.
This approximation is partly undone by what we call rescoring: For each translation hypothesis $e_1^I$ with length $I$, we compute the "true" score $\tilde{Q}(I)$ by searching the best inverted alignment given $e_1^I$ and $f_1^J$ and evaluating the probabilities along this alignment. Hence, we finally find the best translation via Eq. (12):

$$\max_{I} \left\{ p(J|I) \cdot \tilde{Q}(I) \right\}. \quad (12)$$

The time complexity for this additional step is negligible, since there is no optimization over the English words, which is the dominant factor in the overall time complexity

$$O(I_{max} \cdot J^2 \cdot |E|^2).$$
3. We introduced two thresholds:

$\theta_L$: If $e'$ is the predecessor word of $e$ and $e$ is not aligned to the source string ("skip"), then $p(e|e')$ must be higher than $\theta_L$.

$\theta_T$: A word $e$ can only be associated with a source language word $f$ if $p(f|e)$ is higher than $\theta_T$.

This restricts the optimization over the target language vocabulary to a relatively small set of candidate words. The resulting time complexity is

$$O(I_{max} \cdot J^2 \cdot |E|).$$
4. When searching for the best partial path to a gridpoint $G = (c, i, j, e)$, we can sort the arcs leading to $G$ in a specific manner that allows us to stop the computation whenever it becomes clear that no better partial path to $G$ exists. The effect of this measure depends on the quality of the used models; in preliminary experiments we observed a speed-up factor of about 3.5.
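The first three accelerations can be summarized in a short sketch. All function names are ours, the model callables (lex, bigram) are assumptions, and the exact rounding in the coverage window is our reading (only the half-width $r$ is fixed above).

```python
import math

def coverage_window(i, I, J, r=3):
    """Acceleration 1: admissible coverage counts at target level i,
    assuming coverage grows roughly proportionally to i/I. The ceiling
    rounding is an assumption; only the half-width r comes from the text."""
    center = math.ceil(i * J / I)
    return max(0, center - r), min(J, center + r)

def estimate_target_length(J, mu_I, mu_J):
    """Acceleration 2: estimated length I^ = I^(J) = J * mu_I / mu_J,
    with the mean target and source lengths taken from the training
    corpus; rounding to an integer is our choice."""
    return max(1, round(J * mu_I / mu_J))

def candidate_words(f_j, e_prev, vocab, lex, bigram, theta_T, theta_L):
    """Acceleration 3: restrict the optimization over the vocabulary.
    Aligned candidates e need p(f_j|e) > theta_T; skip candidates e
    need p(e|e_prev) > theta_L."""
    aligned = [e for e in vocab if lex(f_j, e) > theta_T]
    skips = [e for e in vocab if bigram(e, e_prev) > theta_L]
    return aligned, skips
```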
3 Experiments 
The search algorithm suggested in this paper was tested on the Verbmobil Corpus. The results of preliminary tests on a small automatically generated corpus (Amengual et al., 1996) were quite promising and encouraged us to apply our search algorithm to a more realistic task.
The Verbmobil Corpus consists of spontaneously spoken dialogs in the domain of appointment scheduling (Wahlster, 1993). German source sentences are translated into English. In Table 1, the characteristics of the training and test sets are summarized. The vocabularies include category labels for dates, proper names, numbers, times, names of places and spellings. The model parameters were trained on 16 296 sentence pairs, where names etc. had been replaced by the appropriate labels.
Table 1: Training and test conditions of the Verbmobil task.

  Words in Vocabulary
    German                  4 498
    English                 2 958
  Number of Sentences
    in Training Corpus     16 296
    in Test Corpus            150

We used a bigram language model for the English language. Its perplexity on the corpus of transformed sample translations (i.e. after labelling) was 13.8.
In preliminary evaluations, optimal values for the thresholds $\theta_L$ and $\theta_T$ had been determined and kept fixed during the experiments.
As an automatic and easy-to-use measure of the translation performance, the Levenshtein distance between the produced translations and the sample translations was calculated. The translation results are summarized in Table 2.
Table 2: Word error rates on the Verbmobil Corpus: insertions (INS), deletions (DEL) and total rate of word errors (WER) before (BL) and after (AL) rule-based translation of the labels.

          Error Rates (%)
          INS    DEL    WER
  BL      7.3    18.4   45.0
  AL      7.6    17.3   39.6
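For reference, a standard word-level Levenshtein (edit distance) computation of the kind underlying Table 2; this is textbook DP, not code from the paper, and it reports only the total error rate, not the INS/DEL split.

```python
def word_error_rate(hyp, ref):
    """Levenshtein distance on the word level, normalized by the
    reference length (the usual WER definition)."""
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i                   # deletions from the hypothesis
    for j in range(len(r) + 1):
        d[0][j] = j                   # insertions into the hypothesis
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            sub = d[i - 1][j - 1] + (h[i - 1] != r[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(h)][len(r)] / len(r)
```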
Given the vocabulary sizes, it becomes quite obvious that the lexicon probabilities $p(f|e)$ cannot be trained sufficiently on only 16 296 sentence pairs. The fact that about 40% of the words in the lexicon are seen only once in training illustrates this. To improve the lexicon probabilities, we interpolated them with lexicon probabilities $p_M(f|e)$ manually created from a German-English dictionary:

$$p_M(f|e) = \begin{cases} \frac{1}{N_e} & \text{if } (e, f) \text{ is in the dictionary} \\ 0 & \text{otherwise,} \end{cases}$$

where $N_e$ is the number of German words listed as translations of the English word $e$. The two lexica were combined by linear interpolation with the interpolation parameter $\lambda$. For our first experiments, we set $\lambda$ to 0.5.
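A small sketch of this interpolation; the dictionary is assumed to map each English word to the set of German words listed as its translations, and p_trained stands for the corpus-trained lexicon.

```python
def interpolated_lexicon(p_trained, dictionary, lam=0.5):
    """Linear interpolation of the trained lexicon with the
    dictionary-based one: p(f|e) = lam*p_trained(f|e) + (1-lam)*p_M(f|e),
    where p_M(f|e) = 1/N_e if (e, f) is listed and 0 otherwise."""
    def p(f, e):
        translations = dictionary.get(e, set())
        p_m = 1.0 / len(translations) if f in translations else 0.0
        return lam * p_trained(f, e) + (1.0 - lam) * p_m
    return p
```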
The test corpus consisted of 150 sentences, for which sample translations exist. The labels were translated separately: First, the test sentences were preprocessed in order to replace words or groups of words by the correct category label. Then, our search algorithm translated the transformed sentences. In the last step, a simple rule-based algorithm replaced the category labels by the translations of the original words.
(Tillmann et al., 1997a) report a word error rate 
of 51.8% on similar data. 
Although the Levenshtein distance has the great advantage of being automatically computable, we have to keep in mind that it depends fundamentally on the choice of the sample translation. For example, each of the expressions "thanks", "thank you" and "thank you very much" is a legitimate translation of the German "danke schön", but when calculating the Levenshtein distance to a sample translation, at least two of them will produce word errors. The more words the vocabulary contains, the more important the problem of synonyms becomes.
This is why we also asked five experts to classify independently the produced translations into three categories, the same as in (Wang and Waibel, 1997):

Correct translations are grammatical and convey the same meaning as the input.

Acceptable translations convey the same meaning but with small grammatical mistakes, or they convey most but not the entire meaning of the input.

Incorrect translations are ungrammatical, or convey little meaningful information, or the information is different from the input.
Examples for each category are given in Table 3. Table 4 shows the statistics of the translation performance. When different judgements existed for one sentence, the majority vote was accepted. For the calculation of the subjective sentence error rate (SSER), translations from the second category counted as "half-correct".
Table 3: Examples of Correct (C), Acceptable (A), and Incorrect (I) translations on Verbmobil. The source language is German and the target language is English.

C
  Input:  Ah neunter März bin ich in Köln.
  Output: I am in Cologne on the ninth of March.
  Input:  Habe ich mir notiert.
  Output: I have noted that.

A
  Input:  Samstag und Februar sind gut, aber der siebzehnte wäre besser.
  Output: Saturday and February are quite but better the seventeenth.
  Input:  Ich könnte erst eigentlich jetzt wieder dann November vorschlagen. Ab zweiten November.
  Output: I could actually coming back November then. Suggest beginning the second of November.

I
  Input:  Ja, also mit Dienstag und mittwochs und so hätte ich Zeit, aber Montag kommen wir hier nicht weg aus Kiel.
  Output: Yes, and including on Tuesday and Wednesday as well, I have time on Monday but we will come to be away from Kiel.
  Input:  Dann fahren wir da los.
  Output: We go out.
Table 4: Subjective evaluation of the translation performance on Verbmobil: number of sentences evaluated as Correct (C), Acceptable (A) or Incorrect (I). For the total percentage of non-correct translations (SSER), the "acceptable" translations are counted as half-errors.

  Total   Correct   Acceptable   Incorrect   SSER
  150     61        45           44          44.3%
When evaluating the performance of a statistical machine translator, we would like to distinguish errors due to the weakness of the underlying models from search errors, occurring whenever the search algorithm misses a translation hypothesis with a higher score. Unfortunately, we can never be sure that a search error does not occur, because we do not know whether or not there is another string with an even higher score than the produced output. Nevertheless, it is quite interesting to compare the score of the algorithm's output and the score of the sample translation in those cases in which the output is not correct (i.e. it is classified as "acceptable" or "incorrect").
The original value to be maximized by the search algorithm (see Eq. (6)) is the score as defined by the underlying models and described by Eq. (13):

$$\Pr(e_1^I) \cdot p(J|I) \cdot \prod_{j=1}^{J} \max_{i \in [1,I]} \left[ p(i|j, J, I) \cdot p(f_j|e_i) \right]. \quad (13)$$
We calculated this score for the sample translations as well as for the automatically generated translations. Table 5 shows the result of the comparison. In most cases, the incorrect outputs have higher scores than the sample translations, which leads to the conclusion that the improvement of the models (a stronger language model for the target language, a better translation model and especially more training data) will have a strong impact on the quality of the produced translations. The other cases, i.e. those in which the models prefer the sample translations to the produced output, might be due to the difference between the original search criterion (6) and the criterion (9), which is the basis of our search algorithm. The approximation made by the introduction of the parameters $\theta_T$ and $\theta_L$ is an additional reason for search errors.
Table 5: Comparison: score of the reference translation e and the translator output e' for "acceptable" translations (A) and "incorrect" translations (I). For the total number of non-correct translations (T), the "acceptable" translations are counted as half-errors.

                          A     I     T      %
  Total number            45    44    66.5   100.0
  Score(e) ≥ Score(e')    11    13    18.5    27.8
  Score(e) < Score(e')    34    31    48.0    72.2
As far as we know, only two recent papers have dealt with the decoding problem for machine translation systems that use translation models based on hidden alignments without a monotonicity constraint: (Berger et al., 1994) and (Wang and Waibel, 1997). The former uses data sets that differ significantly from the Verbmobil task, and hence the reported results cannot be compared to ours. The latter presents experiments carried out on a corpus comparable to our test data in terms of vocabulary sizes, domain and number of test sentences. The authors report a subjective sentence error rate which is in the same range as ours. An exact comparison is only possible if exactly the same training and testing data are used and if all the details of the search algorithms are considered.
4 Conclusion and Future Work 
In this paper, we have presented a new search algorithm for statistical machine translation. First experiments prove its applicability to realistic and complex tasks such as spontaneously spoken dialogs.
Several improvements to our algorithm are planned, the most important one being the implementation of pruning methods (Ney et al., 1992). Pruning methods have already been used successfully in machine translation (Tillmann et al., 1997a). The first question to be answered in this context is how to make two different hypotheses $H_1$ and $H_2$ comparable: Even if they cover the same number of source string words, they might cover different words, especially words that are not equally difficult to translate, which corresponds to higher or lower translation probability estimates. To cope with this problem, we will introduce a heuristic for the estimation of the cost of translating the remaining source words. This is similar to the heuristics in A*-search.
(Vogel et al., 1996) report better perplexity results on the Verbmobil Corpus with their HMM-based alignment model in comparison to Model 2 of (Brown et al., 1993). For such a model, however, the new interpretation of the alignments becomes essential: We cannot adopt the estimates for the alignment probabilities $p(i|i', I)$. Instead, we have to re-calculate them as inverted alignments. This will provide estimates for the probabilities $p(j|j', J)$. The most important advantage of the HMM-based alignment models for our approach is the fact that they do not depend on the unknown target string length $I$.
Acknowledgement. This work was partly supported by the German Federal Ministry of Education, Science, Research and Technology under Contract Number 01 IV 601 A (Verbmobil).

References 
J. C. Amengual, J. M. Benedi, A. Castaño, A. Marzal, F. Prat, E. Vidal, J. M. Vilar, C. Delogu, A. di Carlo, H. Ney, and S. Vogel. 1996. Example-Based Understanding and Translation Systems (EuTrans): Final Report, Part I. Deliverable of ESPRIT project No. 20268, October.

A. L. Berger, P. F. Brown, J. Cocke, S. A. Della Pietra, V. J. Della Pietra, J. R. Gillett, J. D. Lafferty, R. L. Mercer, H. Printz, and L. Ures. 1994. The Candide System for Machine Translation. In Proc. ARPA Human Language Technology Workshop, Plainsboro, NJ, pages 152-157. Morgan Kaufmann Publ., March.

P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. 1993. Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263-311.

I. Dagan, K. W. Church, and W. A. Gale. 1993. Robust Bilingual Word Alignment for Machine Aided Translation. In Proceedings of the Workshop on Very Large Corpora, Columbus, Ohio, pages 1-8.

P. Fung and K. W. Church. 1994. K-vec: A New Approach for Aligning Parallel Texts. In Proceedings of the 15th International Conference on Computational Linguistics, Kyoto, Japan, pages 1096-1102.

M. Kay and M. Röscheisen. 1993. Text-Translation Alignment. Computational Linguistics, 19(1):121-142.

H. Ney, D. Mergel, A. Noll, and A. Paeseler. 1992. Data Driven Search Organization for Continuous Speech Recognition. IEEE Transactions on Signal Processing, 40(2):272-281, February.

C. Tillmann, S. Vogel, H. Ney, H. Sawaf, and A. Zubiaga. 1997a. Accelerated DP based Search for Statistical Translation. In Proceedings of the 5th European Conference on Speech Communication and Technology, Rhodes, Greece, pages 2667-2670, September.

C. Tillmann, S. Vogel, H. Ney, and A. Zubiaga. 1997b. A DP-Based Search using Monotone Alignments in Statistical Translation. In Proceedings of the ACL/EACL '97, Madrid, Spain, pages 289-296, July.

S. Vogel, H. Ney, and C. Tillmann. 1996. HMM-Based Word Alignment in Statistical Translation. In Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen, Denmark, pages 836-841, August.

W. Wahlster. 1993. Verbmobil: Translation of Face-to-Face Dialogs. In Proceedings of the MT Summit IV, pages 127-135, Kobe, Japan.

Ye-Yi Wang and A. Waibel. 1997. Decoding Algorithm in Statistical Translation. In Proceedings of the ACL/EACL '97, Madrid, Spain, pages 366-372, July.

D. Wu. 1996. A Polynomial-Time Algorithm for Statistical Machine Translation. In Proceedings of the 34th Annual Conference of the Association for Computational Linguistics, Santa Cruz, CA, pages 152-158, June.
