An Algorithmic Framework for the Decoding Problem in
Statistical Machine Translation
Raghavendra Udupa U Tanveer A Faruquie
IBM India Research Lab
Block-1A, IIT, Hauz Khas
New Delhi - 110 016
India
{uraghave, ftanveer}@in.ibm.com
Hemanta K Maji
Dept. of Computer Science
and Engineering, IIT Kanpur
Kanpur - 208 016
India
hkmaji@iitk.ac.in
Abstract
The decoding problem in Statistical Ma-
chine Translation (SMT) is a computation-
ally hard combinatorial optimization prob-
lem. In this paper, we propose a new al-
gorithmic framework for solving the decod-
ing problem and demonstrate its utility. In
the new algorithmic framework, the decod-
ing problem can be solved both exactly and
approximately. The key idea behind the
framework is the modeling of the decod-
ing problem as one that involves alternat-
ing maximization of two relatively simpler
subproblems. We show how the subprob-
lems can be solved efficiently and how their
solutions can be combined to arrive at a so-
lution for the decoding problem. A fam-
ily of provably fast decoding algorithms can
be derived from the basic techniques under-
lying the framework and we present a few
illustrations. Our first algorithm is a prov-
ably linear time search algorithm. We use
this algorithm as a subroutine in the other
algorithms. We believe that decoding algo-
rithms derived from our framework can be
of practical significance.
1 Introduction
Decoding is one of the three fundamental prob-
lems in classical SMT (translation model and
language model being the other two) as pro-
posed by IBM in the early 1990’s (Brown et al.,
1993). In the decoding problem we are given the
language and translation models and a source
language sentence and are asked to find the
most probable translation for the sentence. De-
coding is a discrete optimization problem whose
search space is prohibitively large. The chal-
lenge is, therefore, in devising a scheme to ef-
ficiently search the solution space for the solu-
tion.
Decoding is known to belong to a class of com-
putational problems popularly known as NP-
hard problems (Knight, 1999). NP-hard prob-
lems are known to be computationally hard and
have eluded polynomial time algorithms (Garey
and Johnson, 1979). The first algorithms for
the decoding problem were based on what is
known among the speech recognition commu-
nity as stack-based search (Jelinek, 1969). The
original IBM solution to the decoding prob-
lem employed a restricted stack-based search
(Berger et al., 1996). This idea was further ex-
plored by Wang and Waibel (Wang and Waibel,
1997) who developed a faster stack-based search
algorithm. In perhaps the first work on the
computational complexity of Decoding, Kevin
Knight showed that the problem is closely re-
lated to the more famous Traveling Salesman
problem (TSP). Independently, Christoph Till-
man adapted the Held-Karp dynamic program-
ming algorithm for TSP (Held and Karp, 1962)
to Decoding (Tillman, 2001). The original Held-
Karp algorithm for TSP is an exponential time
dynamic programming algorithm and Tillman’s
adaptation to Decoding has a prohibitive complexity of
$O(l^3 m^2 2^m) \approx O(m^5 2^m)$ (where m
and l are the lengths of the source and tar-
get sentences respectively). Tillman and Ney
showed how to improve the complexity of the
Held-Karp algorithm for restricted word reordering
and gave an $O(l^3 m^4) \approx O(m^7)$ algorithm
for French-English translation (Tillman
and Ney, 2000). An optimal decoder based on
the well-known A∗ heuristic was implemented
and benchmarked by Och et al. (2001). Since
an optimal solution cannot be computed for practical
problem instances in a reasonable amount
of time, much recent work has focused on
finding good quality suboptimal solutions. An $O(m^6)$
greedy search algorithm was developed (Germann
et al., 2003), and its complexity was later reduced
to $O(m^2)$ (Germann, 2003).
In this paper, we propose an algorithmic
framework for solving the decoding problem and
show that several efficient decoding algorithms
can be derived from the techniques developed in
the framework. We model the search problem
as an alternating search problem. The search,
therefore, alternates between two subproblems,
both of which are much easier to solve in prac-
tice. By breaking the decoding problem into
two simpler search problems, we are able to pro-
vide handles for solving the problem efficiently.
The solutions of the subproblems can be com-
bined easily to arrive at a solution for the orig-
inal problem. The first subproblem fixes an
alignment and seeks the best translation with
that alignment. Starting with an initial align-
ment between the source sentence and its trans-
lation, the second subproblem asks for an im-
proved alignment. We show that both of these
problems are easy to solve and provide efficient
solutions for them. In an iterative search for a
local optimal solution, we alternate between the
two algorithms and refine our solution.
The algorithmic framework provides handles
for solving the decoding problem at several lev-
els of complexity. At one extreme, the frame-
work yields an algorithm for solving the decod-
ing problem optimally. At the other extreme, it
yields a provably linear time algorithm for find-
ing suboptimal solutions to the problem. We
show that the algorithmic handles provided by
our framework can be employed to develop a
very fast decoding algorithm which finds good
quality translations. Our fast suboptimal search
algorithms can translate sentences that are 50
words long in about 5 seconds on a simple com-
puting facility.
The rest of the paper is devoted to the devel-
opment and discussion of our framework. We
start with a mathematical formulation of the
decoding problem (Section 2). We then develop
the alternating search paradigm and use it to
develop several decoding algorithms (Section 3).
We then give a linear time algorithm for
FIXED ALIGNMENT DECODING (Section 4).
Finally, we demonstrate the practical utility of our
algorithms with results from our initial
experiments (Section 5).
2 Decoding
The decoding problem in SMT is one of finding
the most probable translation ˆe in the target
language of a given source language sentence f
in accordance with the Fundamental Equation
of SMT (Brown et al., 1993):
$\hat{e} = \arg\max_{e} \Pr(f|e)\Pr(e)$. (1)
In the remainder of this paper we will refer
to the search problem specified by Equation 1
as STRICT DECODING.
Rewriting the translation model Pr(f|e) as
$\sum_{a} \Pr(f,a|e)$, where a denotes an alignment
between the source sentence and the target sentence,
the problem can be restated as:
$\hat{e} = \arg\max_{e} \sum_{a} \Pr(f,a|e)\Pr(e)$. (2)
Even when the translation model Pr(f|e) is
as simple as IBM Model 1 and the language
model Pr(e) is a bigram language model, the
decoding problem is NP-hard (Knight, 1999).
Unless P = NP, there is no hope of an efficient
algorithm for the decoding problem. Since the
Fundamental Equation of SMT does not yield
an easy handle to design a solution (exact or
even an approximate one) for the problem, most
researchers have instead worked on solving the
following relatively simpler problem (Germann
et al., 2003):
$(\hat{e},\hat{a}) = \arg\max_{(e,a)} \Pr(f,a|e)\Pr(e)$. (3)
We call the search problem specified
by Equation 3 RELAXED DECODING.
Note that RELAXED DECODING relaxes
STRICT DECODING to a joint optimization
problem. The search in RELAXED DECODING
is for a pair (ˆe,ˆa). While RELAXED DECODING
is simpler than STRICT DECODING, it is also,
unfortunately, NP-hard even for IBM Model
1 and a bigram language model. Therefore, all
practical solutions to RELAXED DECODING
have focused on finding suboptimal solutions.
The challenge is in devising fast search strate-
gies to find good suboptimal solutions. Table 1
lists the combinatorial optimization problems
in the domain of decoding.
In the remainder of the paper, m and l denote
the length of the source language sentence and
its translation respectively.
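The difference between Equations 2 and 3 can be made concrete on a toy instance. The following is a minimal sketch, assuming IBM Model 1 as the translation model and a caller-supplied language model; the function names, the dictionary layout of the translation table t and the default epsilon are illustrative assumptions, not part of the original system. It enumerates all alignments by brute force, so it is only meant for very short sentences.

```python
import itertools

def model1_joint(f, e, a, t, epsilon=1.0):
    """Pr(f, a | e) under IBM Model 1; a[j] is in 0..l, where 0 is the NULL word."""
    l = len(e)
    words = ["NULL"] + list(e)
    p = epsilon / (l + 1) ** len(f)
    for j, fj in enumerate(f):
        # t maps (source word, target word) to a translation probability
        p *= t.get((fj, words[a[j]]), 0.0)
    return p

def strict_and_relaxed(f, e, t, lm):
    """Return (sum over a, max over a) of Pr(f, a | e) Pr(e) by brute force.

    The first value is the STRICT DECODING objective for this e,
    the second is the RELAXED DECODING objective for this e.
    """
    l = len(e)
    joint = [model1_joint(f, e, a, t)
             for a in itertools.product(range(l + 1), repeat=len(f))]
    return lm(e) * sum(joint), lm(e) * max(joint)
```

For any fixed e the relaxed objective is a lower bound on the strict one, since a maximum over alignments can never exceed the sum over them.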
3 Framework for Decoding
We begin with a couple of useful observations
about the decoding problem. Although decep-
tively simple, these observations are very cru-
cial for developing our framework. They are
the source for algorithmic handles for breaking
the decoding problem into two relatively easier
search problems. The first of these observations
concerns solving the problem when
we know in advance the mapping between the
source and target sentences. This leads to the
development of an extremely simple algorithm
for decoding when the alignment is known (or
can be guessed). Our second observation is on
finding a better alignment between the source
and target sentences starting with an initial
(possibly suboptimal) alignment. The insights
provided by the two observations are employed
in building a powerful algorithmic framework.

Problem                     Search
STRICT DECODING             $\hat{e} = \arg\max_{e} \Pr(f|e)\Pr(e)$
RELAXED DECODING            $(\hat{e},\hat{a}) = \arg\max_{(e,a)} \Pr(f,a|e)\Pr(e)$
FIXED ALIGNMENT DECODING    $\hat{e} = \arg\max_{e} \Pr(f,\tilde{a}|e)\Pr(e)$
VITERBI ALIGNMENT           $\hat{a} = \arg\max_{a} \Pr(f,a|\tilde{e})$
Table 1: Combinatorial Search Problems in Decoding
3.1 Handles for attacking the Decoding
Problem
Our goal is to arrive at algorithmic handles
for attacking RELAXED DECODING. In this section,
we make a couple of useful observations and
develop algorithmic handles from the insight
provided by them. The first of the two observa-
tions is:
Observation 1 For a given target length l and
a given alignment ˜a that maps source words to
target positions, it is easy to compute the opti-
mal target sentence ˆe.
$\hat{e} = \arg\max_{e} \Pr(f,\tilde{a}|e)\Pr(e)$. (4)
We call the search problem specified by
Equation 4 FIXED ALIGNMENT DECODING.
Observation 1 says that once the
target sentence length and the source to target
mapping are fixed, the optimal target sentence
(with the specified target length and
alignment) can be computed efficiently. As
we will show later, the optimal solution for
FIXED ALIGNMENT DECODING can be com-
puted in O(m) time for IBM models 1-5 using
dynamic programming. As we can always guess
an alignment (as is the case with many decoding
algorithms in the literature), the above observa-
tion provides an algorithmic handle for finding
suboptimal solutions for RELAXED DECODING.
Our second observation is on computing the
optimal alignment between the source sentence
and the target sentence.
Observation 2 For a given target sentence ˜e,
it is easy to compute the optimal alignment ˆa
that maps the source words to the target words.
$\hat{a} = \arg\max_{a} \Pr(f,a|\tilde{e})$. (5)
It is easy to determine the optimal (Viterbi)
alignment between the source sentence and its
translation. In fact, for IBM models 1 and 2,
the Viterbi alignment can be computed using a
straightforward algorithm in O(ml) time. For
higher models, an approximate Viterbi alignment
can be computed by an iterative
procedure called local search. In each iteration
of local search, we look in the neighborhood
of the current best alignment for a better
alignment (Brown et al., 1993). The first iteration
can start with any arbitrary alignment (say
the Viterbi alignment of Model 2). It is possible
to implement one iteration of local search in
O(ml) time. Typically, the number of iterations
is bounded in practice by O(m), and therefore,
local search takes $O(m^2 l)$ time.
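For IBM Model 1 the O(ml) bound is easy to see: Pr(f,a|e) factorizes over source positions, so each source word can pick its best target position independently. The following is a minimal sketch under that assumption; the function name and the dictionary layout of the translation table t are illustrative, not taken from the original implementation.

```python
def viterbi_alignment_model1(f, e, t):
    """Return a with a[j] = argmax_i t(f_j | e_i) for i in 0..l (0 = NULL).

    Each of the m source words scans the l + 1 target positions once,
    which gives O(ml) time. Ties go to the smallest position.
    """
    words = ["NULL"] + list(e)
    return [
        max(range(len(words)), key=lambda i: t.get((fj, words[i]), 0.0))
        for fj in f
    ]
```

For Model 2 the same scan applies with each term multiplied by the corresponding alignment probability; for higher models this independence no longer holds, which is why local search is used instead.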
Our framework is not strictly dependent on
the computation of an optimal alignment. Any
alignment which is better than the current
alignment is good enough for it to work. It is
straightforward to find one such alignment using
restricted swaps and moves in O(m) time.
In the remainder of this paper, we use the term
Viterbi to denote any linear time algorithm for
computing an improved alignment between the
source sentence and its translation.
3.2 Illustrative Algorithms
In this section, we show how the handles pro-
vided by the above two observations can be em-
ployed to solve RELAXED DECODING. The two
handles are in some sense complementary to
each other. When the alignment is known, we
can efficiently determine the optimal translation
with that alignment. On the other hand, when
the translation is known, we can efficiently de-
termine a better alignment. Therefore, we can
use one to improve the other. We begin with the
following simple linear time decoding algorithm
which is based on the first observation.
Algorithm NaiveDecode
Input: Source language sentence f of length m > 0.
Optional Inputs: Target sentence length l,
alignment ˜a between the source words and target positions.
Output: Target language sentence ˆe of length l.
1. If l is not specified, let l = m.
2. If an alignment is not specified, guess some alignment ˜a.
3. Compute the optimal translation ˆe by solving
FIXED ALIGNMENT DECODING,
i.e., ˆe = argmax_e Pr(f,˜a|e)Pr(e).
4. return ˆe.
When the length of the translation is not
specified, NaiveDecode assumes that the trans-
lation is of the same length as the source sen-
tence. If an alignment that maps the source
words to target positions is not specified, the
algorithm guesses an alignment ˜a (˜a can be the
trivial alignment that maps the source word fj
to target position j, that is, ˜aj = j, or can
be guessed more intelligently). It then com-
putes the optimal translation for the source
sentence f, with the length of the target sen-
tence and the alignment between the source and
the target sentences kept fixed to l and ˜a re-
spectively, by maximizing Pr(f,˜a|e)Pr(e). As
FIXED ALIGNMENT DECODING can be solved
in O(m) time, NaiveDecode takes only O(m)
time.
The value of NaiveDecode lies not in itself per
se, but in its instrumental role in designing
superior algorithms. The power of NaiveDecode
can be demonstrated with the following optimal
algorithm for RELAXED DECODING.
Algorithm NaiveOptimalDecode
Input: Source language sentence f of length m > 0.
Output: Target language sentence ˆe of length
l, m/2 ≤ l ≤ 2m.
1. Let ˆe = null and ˆa = null.
2. For each l = m/2,...,2m do
3. For each alignment a between the source
words and the target positions do
(a) Let e = NaiveDecode(f,l,a).
(b) If Pr(f,e,a) > Pr(f,ˆe,ˆa) then
i. ˆe = e
ii. ˆa = a.
4. return (ˆe,ˆa).
NaiveOptimalDecode considers various tar-
get lengths and all possible alignments be-
tween the source words and the target posi-
tions. For each target length l and alignment
a it employs NaiveDecode to find the best solution.
There are $(l+1)^m$ candidate alignments
for a target length l and O(m) candidate
target lengths. Therefore, NaiveOptimalDecode
explores $\Theta(m(l+1)^m)$ alignments.
For each of these candidate alignments, it
makes a call to NaiveDecode. The time complexity
of NaiveOptimalDecode is, therefore,
$O(m^2 (l+1)^m)$. Although an exponential time
algorithm, it can compute the optimal solution
for RELAXED DECODING.
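To get a feel for the size of this search space, the alignments can be enumerated directly for small m. The following is a minimal sketch; the helper names are illustrative, and rounding m/2 down (with a floor of 1) is an assumption, since the text does not say how non-integer bounds are handled.

```python
import itertools

def all_alignments(m, l):
    """Iterate over every alignment of m source words to target positions 0..l (0 = NULL)."""
    return itertools.product(range(l + 1), repeat=m)

def search_space_size(m):
    """Number of (length, alignment) pairs examined for l = m/2, ..., 2m."""
    return sum((l + 1) ** m for l in range(max(1, m // 2), 2 * m + 1))
```

Even a modest source sentence makes exhaustive enumeration impractical: search_space_size(10) already sums terms as large as 21 to the power 10.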
With NaiveDecode and NaiveOptimalDecode
we have demonstrated the power of the algo-
rithmic handle provided by Observation 1. It
is important to note that these two algorithms
are at the two extremities of the spectrum.
NaiveDecode is a linear time decoding algorithm
that computes a suboptimal solution for RE-
LAXED DECODING while NaiveOptimalDecode
is an exponential time algorithm that computes
the optimal solution. What we want are algo-
rithms that are close to NaiveDecode in com-
plexity and to NaiveOptimalDecode in qual-
ity. It is possible to reduce the complexity of
NaiveOptimalDecode significantly by carefully
reducing the number of alignments that are examined.
Instead of examining all $\Theta(m(l+1)^m)$
alignments, if we examine only a small number,
say g(m), of alignments in NaiveOptimalDecode,
we can find a solution in O(mg(m)) time.
In the next section, we show how to restrict
the search to only a small number of promis-
ing alignments.
3.3 Alternating Maximization
We now show how to use the two algorithmic
handles to come up with a fast search paradigm.
We alternate between searching the best trans-
lation given an alignment and searching the
best alignment given a translation. Since the
two subproblems are complementary, they can
be used to improve the solution computed by
one another by alternating between the two
problems.
Algorithm AlternatingSearch
Input: Source language sentence f of length m > 0.
Output: Target language sentence e(o) of
length l (m/2 ≤ l ≤ 2m).
1. Let e(o) = null and a(o) = null.
2. For each l = m/2,...,2m do
(a) Let e = null and a = null.
(b) While there is improvement in solution do
i. Let e = NaiveDecode(f,l,a).
ii. Let a = Viterbi(f,e).
(c) If Pr(f,e,a) > Pr(f,e(o),a(o)) then
i. e(o) = e
ii. a(o) = a.
3. return e(o).
AlternatingSearch searches for a good trans-
lation by varying the length of the tar-
get sentence. For a sentence length l,
the algorithm finds a translation of length
l and then iteratively improves the trans-
lation. In each iteration it solves two
subproblems: FIXED ALIGNMENT DECODING
and VITERBI ALIGNMENT. The inputs to each
iteration are the source sentence f, the target
sentence length l, and an alignment a between
the source and target sentences. AlternatingSearch
first finds a better translation e for
f by solving FIXED ALIGNMENT DECODING.
For this purpose it employs NaiveDecode. Having
computed e, the algorithm computes a better
alignment between e and f by solving
VITERBI ALIGNMENT using the Viterbi algorithm.
The new alignment thus found is used by the al-
gorithm in the subsequent iteration. At the end
of each iteration the algorithm checks whether
it has made progress. The algorithm returns the
best translation of the source f across a range
of target sentence lengths.
The analysis of AlternatingSearch is compli-
cated by the fact that the number of iterations
(see step 2.b) depends on the input. It is rea-
sonable to assume that the length of the source
sentence (m) is an upper bound on the number
of iterations. In practice, however, the number
of iterations is typically O(1). There are 3m/2
candidate sentence lengths for the translation
(l varies from m/2 to 2m) and both NaiveDe-
code and Viterbi are O(m). Therefore, the time
complexity of AlternatingSearch is $O(m^2)$ when the number of
iterations per target length is O(1).
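The control flow of AlternatingSearch can be sketched independently of any particular model by passing the two subproblem solvers in as functions. This is a minimal sketch, not the original C++ implementation: the callable signatures, the use of a log score, the floor on m/2 and the cap of m iterations per target length (taken from the assumption in the analysis above) are all illustrative choices.

```python
def alternating_search(f, naive_decode, viterbi, score):
    """Alternate between FIXED ALIGNMENT DECODING and VITERBI ALIGNMENT.

    naive_decode(f, l, a) -> e   best translation of length l for alignment a
                                 (a is None on the first call, so it must guess)
    viterbi(f, e) -> a           an alignment at least as good as the current one
    score(f, e, a) -> float      log of Pr(f, a | e) Pr(e)
    """
    m = len(f)
    best_score, best_e, best_a = float("-inf"), None, None
    for l in range(max(1, m // 2), 2 * m + 1):
        a, e_l, score_l = None, None, float("-inf")
        for _ in range(max(1, m)):          # assumed cap on the number of iterations
            e = naive_decode(f, l, a)
            new_a = viterbi(f, e)
            new_score = score(f, e, new_a)
            if new_score <= score_l:        # no improvement: stop for this length
                break
            a, e_l, score_l = new_a, e, new_score
        if score_l > best_score:            # keep the best solution across lengths
            best_score, best_e, best_a = score_l, e_l, a
    return best_e, best_a, best_score
```

With O(m) target lengths, a bounded number of iterations per length and linear time solvers, the loop structure makes the quadratic bound visible.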
4 A Linear Time Algorithm for
FIXED ALIGNMENT DECODING
A key component of all our algorithms is
a linear time algorithm for the problem
FIXED ALIGNMENT DECODING. Recall that in
FIXED ALIGNMENT DECODING, we are given
the target length l and a mapping ˜a from source
words to target positions. The goal is then to
find the optimal translation with ˜a as the align-
ment. In this section, we give a dynamic pro-
gramming based solution to this problem. Our
solution is based on a new formulation of IBM
translation models. We begin our discussion
with a few technical definitions.
Alignment ˜a maps each of the source words
f_j, j = 1,...,m, to a target position in the range
[0,...,l]. Define a mapping ψ from [0,...,l] to
subsets of {1,...,m} as follows:
$\psi(i) = \{ j : j \in \{1,\ldots,m\} \wedge \tilde{a}_j = i \} \quad \forall i = 0,\ldots,l.$
ψ(i) is the set of source positions which are
mapped to the target location i by the align-
ment ˜a and the fertility of the target position i
is φi = |ψ(i)|.
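Both ψ and the fertilities can be computed in a single pass over the alignment. The following is a minimal sketch; the function name and the list-based representation (a[j-1] holds the target position of source word j) are illustrative assumptions.

```python
def psi_and_fertility(a, l):
    """Invert an alignment: psi[i] lists the source positions mapped to target position i.

    a[j-1] is the target position (0..l) of source word j, for j = 1..m;
    position 0 is the NULL word. Returns (psi, phi) with phi[i] = len(psi[i]).
    """
    psi = [[] for _ in range(l + 1)]
    for j, i in enumerate(a, start=1):
        psi[i].append(j)
    phi = [len(s) for s in psi]
    return psi, phi
```

For example, the alignment [1, 1, 0, 3] with l = 3 maps source words 1 and 2 to target position 1, source word 3 to NULL and source word 4 to position 3, leaving position 2 with fertility zero.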
We can rewrite each of the IBM models
Pr(f,˜a|e) as follows:
$\Pr(f,\tilde{a}|e) = \xi \prod_{i=1}^{l} T_i D_i N_i.$
Table 2 shows the breaking of Pr(f,˜a|e) into
the constituents Ti,Di and Ni. As a conse-
quence, we can write Pr(f,˜a|e)Pr(e) as:
$\Pr(f,\tilde{a}|e)\Pr(e) = \xi \lambda \prod_{i=1}^{l} T_i D_i N_i L_i$
where $L_i = \mathrm{trigram}(e_i | e_{i-2}, e_{i-1})$ and λ is the
trigram probability of the boundary word.
The above reformulation of the optimiza-
tion function of the decoding problem allows
us to employ Dynamic Programming for solv-
ing FIXED ALIGNMENT DECODING efficiently.
Note that each word e_i has only a constant number
of candidates in the vocabulary. Therefore,
the set of words e1,...,el that maximizes the
LHS of the above optimization function can be
found in O(m) time using the standard Dy-
namic Programming algorithm (Cormen et al.,
2001).
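The dynamic program can be sketched as a Viterbi-style pass over target positions. This is a minimal sketch under simplifying assumptions, not the original implementation: it uses the Model 1 factors only (so D_i = N_i = 1 and the constant ξ is dropped), it replaces the trigram language model with a bigram so that the state is a single previous word, and it floors zero probabilities at 1e-12 to keep logarithms finite. The names candidates, t and bigram are hypothetical interfaces.

```python
import math

def fixed_alignment_decode(f, a, l, candidates, t, bigram):
    """Best target sentence of length l for a fixed alignment a (Model 1, bigram LM).

    f: source words; a[j] in 0..l is the target position of f[j] (0 = NULL);
    candidates(i): short list of candidate words for target position i (1-based);
    t[(f_word, e_word)]: translation probability; bigram(prev, word): LM probability.
    """
    psi = [[] for _ in range(l + 1)]
    for j, i in enumerate(a):
        psi[i].append(j)

    def log_t(i, w):
        # log T_i: product of t(f_k | e_i) over source words aligned to position i
        return sum(math.log(max(t.get((f[k], w), 0.0), 1e-12)) for k in psi[i])

    layers = []                      # layers[i-1][w] = (best log score, previous word)
    prev_layer = {"<s>": (0.0, None)}
    for i in range(1, l + 1):
        layer = {}
        for w in candidates(i):
            local = log_t(i, w)
            layer[w] = max(
                (s + math.log(max(bigram(p, w), 1e-12)) + local, p)
                for p, (s, _) in prev_layer.items()
            )
        layers.append(layer)
        prev_layer = layer

    # Follow back-pointers from the best final word to recover the sentence.
    w = max(prev_layer, key=lambda x: prev_layer[x][0])
    out = []
    for layer in reversed(layers):
        out.append(w)
        w = layer[w][1]
    return out[::-1]
```

With a constant number of candidates per position, each of the l layers costs constant work plus the time to score the source words aligned to it, so the whole pass is linear in m + l. A trigram model would use pairs of words as states, which squares the constant but leaves the linear bound intact.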
Model   ξ                                                   T_i                                    D_i                                      N_i
1       $\epsilon(m|l) / (l+1)^m$                           $\prod_{k \in \psi(i)} t(f_k|e_i)$     1                                        1
2       $\epsilon(m|l)$                                     $\prod_{k \in \psi(i)} t(f_k|e_i)$     $\prod_{k \in \psi(i)} a(i|k,m,l)$       1
3       $n(\phi_0|m)\, p_0^{m-2\phi_0}\, p_1^{\phi_0}$      $\prod_{k \in \psi(i)} t(f_k|e_i)$     $\prod_{k \in \psi(i)} d(k|i,m,l)$       $\phi_i!\, n(\phi_i|e_i)$
Table 2: Pr(f,˜a|e) for IBM Models

5 Experiments and Results
In this section we describe our experimental
setup and present the initial results. Our goal
was not only to evaluate the performance of our
algorithms on real data, but also to evaluate
how easy it is to code the algorithm and whether
a straightforward implementation of the algo-
rithm with no parameter tuning can give satis-
factory results.
We implemented the algorithms in C++ and
conducted the experiments on an IBM RS-6000
dual processor machine with 1 GB of RAM. We
built a French-English translation model (IBM
Model 3) by training over a corpus of 100 K sentence
pairs from the Hansard corpus. The translation
direction was from French to English. We
built an English language model by training
over a corpus consisting of about 800 million
words. We divided the test sentences into sev-
eral classes based on their length. Each length
class consisted of 300 test French sentences.
We implemented four algorithms: 1.1 (NaiveDecode),
1.2 (AlternatingSearch with l restricted
to m), 2.1 (NaiveDecode with l varying from
m/2 to 2m) and 2.2 (AlternatingSearch). In
order to compare the performance of the al-
gorithms proposed in this paper with a previ-
ous decoding algorithm, we also implemented
the dynamic programming based algorithm by
(Tillman, 2001). For each of the algorithms, we
computed the following:
1. Average time taken for translation for
each length class.
2. NIST score of the translations for each
length class.
3. Average value of the optimization
function for the translations for each
length class.
The results of the experiments are summarized
in Plots 1, 2 and 3 (Figures 1, 2 and 3). In all the plots, the
length class is denoted by the x-axis. 11-20 indicates
the class with sentences of length between
11 and 20 words. 51- indicates the group
of sentences with sentence length 51 or more.
Plot 1 shows the average time taken by the al-
gorithms for translating the sentences in each
length class. Time is shown in seconds on a log
scale. Plot 2 shows the NIST score of the trans-
lations for each length class while Plot 3 shows
the average log score of the translations (negative log
of Pr(f,a|e)Pr(e)), again for each length class.
It can be seen from Plot 1 that all of our al-
gorithms are indeed very fast in practice. They
are, in fact, an order of magnitude faster than the Held-Karp
algorithm. Our algorithms are able to trans-
late even long sentences (50+ words) in a few
seconds.
Plot 3 shows that the log scores of the translations
computed by our algorithms are very
close to those computed by the Held-Karp algorithm.
Plot 2 compares the NIST scores obtained
with each of the algorithms. Among the
four algorithms based on our framework, Algorithm
2.2 gives the best NIST scores, as expected.
Although the log scores of our algorithms
are comparable to those of the Held-Karp
algorithm, our NIST scores are lower. It
should be noted that the mathematical quantity
that our algorithms try to optimize is the
log score. Plot 3 shows that our algorithms are
quite good at finding solutions with good scores.
Figure 1: Average decoding time (x-axis: sentence length class; y-axis: time in seconds, log scale; curves: algorithms 1.1, 1.2, 2.1, 2.2 and H-K)
Figure 2: NIST scores (x-axis: sentence length class; y-axis: NIST score; curves: algorithms 1.1, 1.2, 2.1, 2.2 and H-K)
Figure 3: Log score (x-axis: sentence length class; y-axis: log score; curves: algorithms 1.1, 1.2, 2.1, 2.2 and H-K)

6 Conclusions
The algorithmic framework developed in this
paper is powerful as it yields several decoding
algorithms. At one end of the spectrum is a
provably linear time algorithm for computing
a suboptimal solution and at the other end is
an exponential time algorithm for computing
the optimal solution. We have also shown that
alternating maximization can be employed to
come up with an $O(m^2)$ decoding algorithm. Two
questions in this connection are:
1. Is it possible to reduce the complexity
of AlternatingSearch to O(m)?
2. Instead of exploring each alignment
separately, is it possible to explore a
bunch of alignments in one shot?
Answers to these questions will result in faster
and more efficient decoding algorithms.
7 Acknowledgements
We are grateful to Raghu Krishnapuram for his
insightful comments on an earlier draft of this
paper and Pasumarti Kamesam for his help dur-
ing the course of this work.

References

A. Berger, P. Brown, S. Della Pietra, V. Della
Pietra, A. Kehler, and R. Mercer. 1996. Lan-
guage translation apparatus and method us-
ing context-based translation models. United
States Patent 5,510,981.

P. Brown, S. Della Pietra, V. Della Pietra,
and R. Mercer. 1993. The mathematics of
machine translation: Parameter estimation.
Computational Linguistics, 19(2):263–311.

T. H. Cormen, C. E. Leiserson, R. L. Rivest,
and C. Stein. 2001. Introduction to Algorithms.
The MIT Press, Cambridge.

M. R. Garey and D. S. Johnson. 1979. Computers
and Intractability: A Guide to the Theory of
NP-Completeness. W. H. Freeman and Company, New York.

U. Germann, M. Jahr, D. Marcu, and K. Ya-
mada. 2003. Fast decoding and optimal de-
coding for machine translation. Artificial In-
telligence.

Ulrich Germann. 2003. Greedy decoding for
statistical machine translation in almost lin-
ear time. In Proceedings of HLT-NAACL
2003. Edmonton, Canada.

M. Held and R. Karp. 1962. A dynamic pro-
gramming approach to sequencing problems.
J. SIAM, 10(1):196–210.

F. Jelinek. 1969. A fast sequential decoding algorithm
using a stack. IBM Journal of Research
and Development, 13:675–685.

Kevin Knight. 1999. Decoding complexity in
word-replacement translation models. Com-
putational Linguistics, 25(4).

F. Och, N. Ueffing, and H. Ney. 2001. An efficient
A* search algorithm for statistical machine
translation. In Proceedings of the ACL
2001 Workshop on Data-Driven Methods in
Machine Translation, pages 55–62. Toulouse,
France.

C. Tillman and H. Ney. 2000. Word reordering
and DP-based search in statistical machine
translation. In Proceedings of the 18th COLING,
pages 850–856. Saarbrücken, Germany.

Christoph Tillman. 2001. Word re-ordering
and dynamic programming based search
algorithm for statistical machine transla-
tion. Ph.D. Thesis, University of Technology
Aachen, pages 42–45.

R. Udupa and T. Faruquie. 2004. An English-Hindi
statistical machine translation system.
In Proceedings of the 1st IJCNLP, pages 626–
632. Sanya, Hainan Island, China.

Y. Wang and A. Waibel. 1997. Decoding al-
gorithm in statistical machine translation. In
Proceedings of the 35th ACL, pages 366–372.
Madrid, Spain.
