Word to word alignment strategies
J¨org Tiedemann
Department of Linguistics and Philology
Uppsala University
Uppsala/Sweden
joerg@stp.ling.uu.se
Abstract
Word alignment is a challenging task aim-
ing at the identification of translational re-
lations between words and multi-word units
in parallel corpora. Many alignment strate-
gies are based on links between single words.
Different strategies can be used to find the
optimal word alignment using such one-to-
one word links including relations between
multi-word units. In this paper seven algo-
rithms are compared using a word alignment
approach based on association clues and
an English-Swedish bitext together with a
handcrafted reference alignment used for
evaluation.
1 Introduction
Word alignment is the task of identifying trans-
lational relations between words in parallel cor-
pora with the aim of re-using them in natu-
ral language processing. Typical applications
that make use of word alignment techniques
are machine translation and multi-lingual lex-
icography. Several approaches have been pro-
posed for the automatic alignment of words and
phrases using statistical techniques and align-
ment heuristics, e.g. (Brown et al., 1993; Vogel
et al., 1996; Garc´ıa-Varea et al., 2002; Ahren-
berg et al., 1998; Tiedemann, 1999; Tufis and
Barbu, 2002; Melamed, 2000). Word align-
ment usually includes links between so-called
multi-word units (MWUs) in cases where lex-
ical items cannot be split into separated words
with appropriate translations in another lan-
guage. See for example the alignment between
an English sentence and a Swedish sentence il-
lustrated in figure 1. There are MWUs in both
languages aligned to corresponding translations
in the other language. The Swedish compound
“mittplatsen” corresponds to three words in En-
glish (“the middle seat”) and the English verb
“dislike” is translated into a Swedish particle
verb “tycker om” (English: like) that has been
negated using “inte”. Most approaches model
Jag tar mittplatsen, vilket jag inte tycker om, men det gör mig inte så mycket.
I take the middle seat, which I dislike, but  I am not really put out.
Figure 1: A word alignment example from Saul
Bellow “To Jerusalem and back: a personal ac-
count” (Bellow, 1976) and its Swedish transla-
tion (Bellow, 1977) (the Bellow corpus).
word alignment as links between words in the
source language and words in the target lan-
guage as indicated by the arrows in figure 1.
However, in cases like the English expression
“I am not really put out” which corresponds to
the Swedish expression “det g¨or mig inte s˚a my-
cket” there is no proper way of connecting single
words with each other in order to express this
relation. In some approaches such relations are
constructed in form of an exhaustive set of links
between all word pairs included in both expres-
sions (Melamed, 1998; Mihalcea and Pedersen,
2003). In other approaches complex expressions
are identified in a pre-processing step in order
to handle them as complex units in the same
manner as single words in alignment (Smadja
et al., 1996; Ahrenberg et al., 1998; Tiedemann,
1999).
The one-to-one word linking approach seems
to be very limited. However, single word links
can be combined in order to describe links be-
tween multi-word units as illustrated in figure
1. In this paper we investigate different align-
ment strategies using this approach1. For this
we apply clue alignment introduced in the next
section.
2 Word alignment with clues
The clue alignment approach has been pre-
sented in (Tiedemann, 2003). Alignment clues
represent probabilistic indications of associa-
1A similar study on statistical alignment models is
included in (Och and Ney, 2003).
tions between lexical items collected from dif-
ferent sources. Declarative clues can be taken
from linguistic resources such as bilingual dic-
tionaries. They may also include pre-defined
relations between lexical items based on cer-
tain features such as parts of speech. Estimated
clues are derived from the parallel data using,
for example, measures of co-occurrence (e.g. the
Dice coefficient (Smadja et al., 1996)), statisti-
cal alignment models (e.g. IBM models from
statistical machine translation (Brown et al.,
1993)), or string similarity measures (e.g. the
longest common sub-sequence ratio (Melamed,
1995)). They can also be learned from pre-
viously aligned training data using linguistic
and contextual features associated with aligned
items. Relations between certain word classes
with respect to the translational association of
words belonging to these classes is one example
of such clues that can be learned from aligned
training data. In our experiments, for example,
we will use clues that indicate relations between
lexical items based on their part-of-speech tags
and their positions in the sentence relative to
each other. They are learned from automati-
cally word-aligned training data.
The clue alignment approach implements a
way of combining association indicators on a
word-to-word level. The combination of clues
results in a two-dimensional clue matrix. The
values in this matrix express the collected evi-
dence of an association between word pairs in
bitext segments taken from a parallel corpus.
Word alignment is then the task of identifying
the best links according to the associations indi-
cated in the clue matrix. Several strategies for
such an alignment are discussed in the following
section.
3 Alignment strategies
A clue matrix summarizes information from var-
ious sources that can be used for the identifica-
tion of translation relations. However, there is
no obvious way to utilize this information for
word alignment as we explicitly include multi-
word units (MWUs) in our approach. The clue
matrix in figure 2 has been obtained for a bi-
text segment from our English-Swedish test cor-
pus (the Bellow corpus) using a set of weighted
declarative and estimated clues.
There are many ways of “clustering” words
together and there is no obvious maximization
procedure for finding the alignment optimum
when MWUs are involved. The alignment pro-
ingen visar s¨arskilt mycket t˚alamod
no 29 0 0 1 9
one 16 2 1 1 13
is 1 13 1 2 0
very 0 2 18 17 1
patient 2 1 4 12 6
Figure 2: A clue matrix (all values in %).
cedure depends very much on the definition of
an optimal alignment. The best alignment for
our example would probably be the set of the
following links:
links =
braceleftBigg no one ingen
is patient visar t˚alamod
very s¨arskilt mycket
bracerightBigg
A typical procedure for automatic word align-
ment is to start with one-to-one word links.
Links that have common source or target lan-
guage words are called overlapping links. Sets
of overlapping links, which do not overlap with
any other link outside the set, are called link
clusters (LC). Aligning words one by one of-
ten produces overlaps and in this way implic-
itly creates aligned multi-word-units as part of
link clusters. A general word-to-word align-
ment L for a given bitext segment with N
source language words (s1s2...sN) and M tar-
get language words (t1t2...tM) can be formally
described as a set of links L = {L1,L2,...,Lx}
with Lx = [sx1,tx2],x1 ∈ {1..N},x2 ∈ {1..M}.
This general definition allows varying num-
bers of links (0 ≤ x ≤ N ∗ M) within possible
alignments L. It is not straightforward how to
find the optimal alignment as L may include
different numbers of links.
3.1 Directional alignment models
One word-to-word alignment approach is to as-
sume a directional word alignment model simi-
lar to the models in statistical machine trans-
lation. The directional alignment model as-
sumes that there is at most one link for each
source language word. Using alignment clues,
this can be expressed as the following optimiza-
tion problem: ˆLD = argmaxLD producttextNn=1 C(LDn )
where LD = {LD1 ,LD2 ,..,LDN} is a set of links
LDn =
bracketleftBig
sn,taDn
bracketrightBig
with aDn ∈ {1..M} and C(LDn )
is the combined clue value for the linked items
sn and taDn . In other words, word alignment
is the search for the best link for each source
language word. Directional models do not al-
low multiple links from one item to several tar-
get items. However, target items can be linked
to multiple source language words as they can
be aligned to the same target language word.
The direction of alignment can easily be re-
versed, which leads to the inverse directional
alignment: ˆLI = argmaxLI producttextMm=1 C(LIm) with
links LIm =
bracketleftBig
saIm,tm
bracketrightBig
and aIm ∈ {1..N}. In the
inverse directional alignment, source language
words can be linked to multiple words but not
the other way around. The following figure il-
lustrates directional alignment models applied
to the example in figure 2:
ˆLD =



no ingen
one ingen
is visar
very s¨arskilt
patient mycket



LCD =



no one ingen
is visar
very s¨arskilt
patient mycket



Using the inverse directional alignment strategy
we would obtain the following links:
ˆLI =



no ingen
is visar
very s¨arskilt
very mycket
one t˚alamod



LCI =



no ingen
is visar
very s¨arskilt mycket
one t˚alamod



3.2 Combined directional alignment
Directional link sets can be combined in several
ways. The union of link sets (ˆL∪ = ˆLD ∪ ˆLI)
usually causes many overlaps and, hence, very
large link clusters. On the other hand, an inter-
section of link sets (ˆL∩ = ˆLD ∩ ˆLI) removes all
overlaps and leaves only highly confident one-to-
one word links behind. Using the same example
from above we obtain the following alignments:
ˆL∪ =





no ingen
one ingen
one t˚alamod
is visar
very s¨arskilt
very mycket
patient mycket





LC∪ =
braceleftBigg no one ingen t˚alamod
is visar
very patient s¨arskilt mycket
bracerightBigg
The intersection of links produces the following
sets:
ˆL∩ =
braceleftBigg no ingen
is visar
very s¨arskilt
bracerightBigg
LC∩ = ˆL∩
The union and the intersection of links do not
produce satisfactory results as seen in the ex-
ample. Another alignment strategy is a refined
combination of link sets (ˆLR = {ˆLD ∩ ˆLI} ∪
{LR1 ,...,LRr }) as suggested by (Och and Ney,
2000b). In this approach, the intersection of
links is iteratively extended by additional links
LRr which pass one of the following two con-
straints:
• A new link is accepted if both items in the
link are not yet aligned.
• Mapped on a two-dimensional bitext space,
the new link is either vertically or horizon-
tally adjacent to an existing link and the
new link does not cause any link to be adja-
cent to other links in both dimensions (hor-
izontally and vertically).
Applying this approach to the example, we get:
ˆLR =





no ingen
is visar
very s¨arskilt
very mycket
one ingen
patient t˚alamod





LCR =



no one ingen
is visar
very s¨arskilt mycket
patient t˚alamod



3.3 Competitive linking
Another alignment approach is the compet-
itive linking approach proposed by Melamed
(Melamed, 1996). In this approach, one as-
sumes that there are only one-to-one word
links. The alignment is done in a greedy “best-
first” search manner where links with the high-
est association scores are aligned first, and the
aligned items are then immediately removed
from the search space. This process is re-
peated until no more links can be found. In
this way, the optimal alignment (ˆLC) for non-
overlapping one-to-one links is found. The num-
ber of possible links in an alignment is reduced
to min(N,M). Using competitive linking with
our example we yield:
ˆLC =



no ingen
very s¨arskilt
is visar
one t˚alamod
patient mycket



LCC = ˆLC
3.4 Constrained best-first alignment
Another iterative alignment approach has been
proposed in (Tiedemann, 2003). In this ap-
proach, the link LBx = [sx1,tx2] with the high-
est score in the clue matrix ˆC(sx1,tx2) =
maxsi,tj(C(si,tj)) is added to the set of link
clusters if it fulfills certain constraints. The top
score is removed from the matrix (i.e. set to
zero) and the link search is repeated until no
more links can be found. This is basically a
constrained best-first search. Several constraints
are possible. In (Tiedemann, 2003) an adja-
cency check is suggested, i.e. overlapping links
are accepted only if they are adjacent to other
links in one and only one existing link cluster.
Non-overlapping links are always accepted (i.e.
a non-overlapping link creates a new link clus-
ter). Other possible constraints are clue value
thresholds, thresholds for clue score differences
between adjacent links, or syntactic constraints
(e.g. that link clusters may not cross phrase
boundaries). Using a best-first search strategy
with the adjacency constraint we obtain the fol-
lowing alignment:
ˆLB =





no ingen
very s¨arskilt
very mycket
one ingen
is visar
patient mycket
patient t˚alamod





LCB =
braceleftBigg no one ingen
is visar
very patient s¨arskilt mycket t˚alamod
bracerightBigg
3.5 Summary
None of the alignment approaches described
above produces the preferred reference align-
ment in our example using the given clue ma-
trix. However, simple iterative procedures come
very close to the reference and produce ac-
ceptable alignments even for multi-word units,
which is promising for an automatic clue align-
ment system. Directional alignment models de-
pend very much on the relation between the
source and the target language. One direc-
tion usually works better than the other, e.g.
an alignment from English to Swedish is bet-
ter than Swedish to English because in En-
glish terms and concepts are often split into
several words whereas Swedish tends to con-
tain many compositional compounds. Symmet-
ric approaches to word alignment are certainly
more appropriate for general alignment systems
than directional ones.
4 Evaluation methodology
Word alignment quality is usually measured in
terms of precision and recall. Often, previously
created gold standards are used as reference data
in order to simplify automatic tests of alignment
attempts. Gold standards can be re-used for ad-
ditional test runs which is important when ex-
amining different parameter settings. However,
recall and precision derived from information re-
trieval have to be adjusted for the task of word
alignment. The main difficulty with these mea-
sures in connection with word alignment arises
with links between MWUs that cause partially
correct alignments. It is not straightforward
how to judge such links in order to compute
precision and recall. In order to account for
partiality we use a slightly modified version of
the partiality score Q proposed in (Ahrenberg
et al., 2000)2:
Qprecisionx = |alg
xsrc ∩corrxsrc|+|algxtrg ∩corrxtrg|
|algxsrc|+|algxtrg|
Qrecallx = |alg
xsrc ∩corrxsrc|+|algxtrg ∩corrxtrg|
|corrxsrc|+|corrxtrg|
The set of algxsrc includes all source language
words of all proposed links if at least one of
them is partially correct with respect to the ref-
erence link x from the gold standard. Similarly,
algxtrg refers to all the proposed target language
words. corrxsrc and corrxtrg refer to the sets of
source and target language words in link x of
the gold standard. Using the partiality value
Q, we can define the recall and precision met-
rics as follows:
Rmwu =
summationtextX
x=1 Qrecallx
|correct| ,Pmwu =
summationtextX
x=1 Qprecisionx
|aligned|
A balanced F-score can be used to combine
both, precision and recall:
Fmwu = (2∗Pmwu ∗Rmwu)/(Pmwu +Rmwu).
2Qx ≡ 0 for incorrect links for both, precision and
recall.
 65
 70
 75
 80
 85
 90
 60  65  70  75  80  85  90
recall
precision
different search strategies
dice+pp
giza
giza+pp
dice+pp
giza
giza+pp
F=80%
F=70%
competitive
union
intersection
refined
best-first
directional
inverse
Figure 3: Different alignment search strategies. Clue alignment settings: dice+pp, giza, and
giza+pp. Alignment strategies: directional (LD), inverse directional (LI), union (L∪), intersec-
tion (L∩), refined (LR), competitive linking (LC), and constrained best-first (LB).
Alternative measures for the evaluation of
one-to-one word links have been proposed in
(Och and Ney, 2000a; Och and Ney, 2003).
However, these measures require completely
aligned bitext segments as reference data. Our
gold standards include random samples from
the corpus instead (Ahrenberg et al., 2000).
Furthermore, we do not split MWU links as pro-
posed by (Och and Ney, 2000a). Therefore, the
measures proposed above are a natural choice
for our evaluations.
5 Experiments
Several alignment search strategies have been
discussed in the previous sections. Our clue
aligner implements these strategies in order to
test their impact on the alignment performance.
In the experiments we used one of our English-
Swedish bitext from the PLUG corpus (S˚agvall
Hein, 2002), the novel “To Jerusalem and back:
A personal account” by Saul Bellow. This cor-
pus is fairly small (about 170,000 words) and
therefore well suited for extensive studies of
alignment parameters. For evaluation, a gold
standard of 468 manually aligned links is used
(Merkel et al., 2002). It includes 122 links with
MWUs either on the source or on the target
side (= 26% of the gold standard). 109 links
contain source language MWUs, 59 links target
language MWUs, and 46 links MWUs in both
languages. 10 links are null links, i.e. a link
of one word to an empty string. Three differ-
ent clue types are used for the alignment: the
Dice coefficient (dice), lexical translation proba-
bilities derived from statistical translation mod-
els (giza) using the GIZA++ toolbox (Och and
Ney, 2003), and, finally, POS/relative-word-
position-clues learned from previous alignments
(pp). Alignment strategies are compared on the
basis of three different settings: dice+pp, giza,
and giza+pp. In figure 3, the alignment results
are shown for the three clue settings using dif-
ferent search strategies as discussed earlier.
5.1 Discussion
Figure 3 illustrates the relation between pre-
cision and recall when applying different algo-
rithms. As expected, the intersection of direc-
tional alignment strategies yields the highest
precision at the expense of recall, which is gen-
erally lower than for the other approaches. Con-
trary to the intersection, the union of directional
links produces alignments with the highest re-
call values but lower precision than all other
search algorithms. Too many (partially) incor-
rect MWUs are included in the union of direc-
tional links. The intersection on the other hand
includes only one-to-one word links that tend
to be correct. However, many links are missed
in this strategy evident in the low recall val-
giza+pp non-MWU MWU-links (122 in total) English MWU Swedish MWU both
P R correct partial P R P R P R P R
directional 82.66 88.73 19 77 59.23 67.10 58.04 69.84 68.93 70.16 68.85 77.52
inverse 81.93 87.28 5 50 64.39 59.74 64.68 57.87 67.62 72.57 69.21 71.78
union 80.13 91.62 21 88 55.61 73.24 55.57 73.29 63.40 81.74 65.50 84.25
intersection 91.67 85.84 0 98 74.35 52.44 73.56 53.44 83.04 60.31 83.33 64.89
competitive 88.44 88.44 0 105 64.66 58.59 66.11 59.71 72.13 67.94 77.66 73.23
refined 84.61 88.15 28 78 65.91 72.61 65.53 72.70 74.37 80.56 75.86 83.03
best-first 85.07 89.02 28 79 66.40 73.40 66.23 73.77 75.38 81.35 77.52 84.48
Table 1: Evaluations of different link types for the setting giza+pp.
ues. Directional alignment strategies generally
yield lower F-values than other refined symmet-
ric alignment strategies. Their implementation
is straightforward but the results are highly de-
pendent on the language pair under consider-
ation. The differences between the two align-
ment directions in our example are surprisingly
inconsistent. Using the giza clues both align-
ment results are very close in terms of preci-
sion and recall whereas a larger difference can
be observed using the other two clue settings
when applying different directional alignment
strategies. Competitive linking is somewhat
in between the intersection approach and the
two symmetric approaches, “best-first” and “re-
fined”. This could also be expected as competi-
tive linking only allows non-overlapping one-to-
one word links. The refined bi-directional align-
ment approach and the constrained best-first
approach are almost identical in our examples
with a more or less balanced relation between
precision and recall. One advantage of the best-
first approach is the possibility of incorporating
different constraints that suit the current task.
The adjacency check is just one of the possi-
ble constraints. For example, syntactic criteria
could be applied in order to force linked items
to be complete according to existing syntactic
markup. Non-contiguous elements could also be
identified using the same approach simply by
removing the adjacency constraint. However,
this seems to increase the noise significantly ac-
cording to experiments not shown in this paper.
Further investigations on optimizing alignment
constraints for certain tasks have to be done in
the future. Focusing on MWUs, the numbers in
table 1 show a clear picture about the difficul-
ties of all approaches to find correct MWU links.
Symmetric alignment strategies like refined and
best-first produce in general the best results for
MWU links. However, the main portion of such
links is only partially correct even for these ap-
proaches. Using our partiality measure, the
intersection of directional alignments still pro-
duces the highest precision values when consid-
ering MWU links only even though no MWUs
are included in these alignments at all. The best
results among MWU links are achieved for the
ones including MWUs in both languages. How-
ever, these results are still significantly lower
than for single-word links (non-MWU).
6 Conclusions
According to our results different alignment
strategies can be chosen to suit particular
needs. Concluding from the experiments, re-
strictive methods like the intersection of direc-
tional alignments or competitive linking should
be chosen if results with high precision are re-
quired (which are mostly found among one-
to-one word links). This is, for example, the
case in automatic extraction of bilingual lexi-
cons where noise should be avoided as much as
possible. A strong disadvantage of these ap-
proaches is that they do not include MWUs at
all. Other strategies should be chosen for appli-
cations, which require a comprehensive cover-
age as, for example, machine translation. Sym-
metric approaches such as the refined combi-
nation of directional alignments and the con-
strained best-first alignment strategy yield the
highest overall performance. They produce the
best balance between precision and recall and
the highest scores in terms of F-values.

References

Lars Ahrenberg, Magnus Merkel, and Mikael
Andersson. 1998. A simple hybrid aligner for
generating lexical correspondences in parallel
texts. In Christian Boitet and Pete White-
lock, editors, Proceedings of the 36th Annual
Meeting of the Association for Computational
Linguistics and the 17th International Con-
ference on Computational Linguistics, pages
29–35, Montreal, Canada.

Lars Ahrenberg, Magnus Merkel, Anna S˚agvall
Hein, and J¨org Tiedemann. 2000. Evaluation
of word alignment systems. In Proceedings
of the 2nd International Conference on Lan-
guage Resources and Evaluation, volume III,
pages 1255–1261, Athens, Greece.

Saul Bellow. 1976. From Jerusalem and back:
a personal account. The Viking Press, New
York, USA.

Saul Bellow. 1977. Jerusalem tur och retur.
Bonniers, Stockholm. Translation of Caj
Lundgren.

Peter F. Brown, Stephen A. Della Pietra, Vin-
cent J. Della Pietra, and Robert L. Mercer.
1993. The mathematics of statistcal machine
translation: Parameter estimation. Compu-
tational Linguistics, 19(2):263–311, June.

Ismael Garc´ıa-Varea, Franz Josef Och, Her-
mann Ney, and Francisco Casacuberta. 2002.
Improving alignment quality in statistical
machine translation using context-dependent
maximum entropy models. In Proceedings of
the 19th International Conference on Compu-
tational Linguistics, pages 1051–1054, Taipei,
Taiwan, August.

I. Dan Melamed. 1995. Automatic evaluation
and uniform filter cascades for inducing n-
best translation lexicons. In David Yarovsky
and Kenneth Church, editors, Proceedings of
the 3rd Workshop on Very Large Corpora,
pages 184–198, Boston, MA. Association for
Computational Linguistics.

I. Dan Melamed. 1996. Automatic construction
of clean broad-coverage lexicons. In Proceed-
ings of the 2nd Conference the Association for
Machine Translation in the Americas, pages
125–134, Montreal, Canada.

I. Dan Melamed. 1998. Annotation style guide
for the Blinker project, version 1.0. IRCS
Technical Report 98-06, University of Penn-
sylvania, Philadelphia, PA.

I. Dan Melamed. 2000. Models of transla-
tional equivalence among words. Computa-
tional Linguistics, 26(2):221–249, June.

Magnus Merkel, Mikael Andersson, and Lars
Ahrenberg. 2002. The PLUG link annotator
- interactive construction of data from par-
allel corpora. In Lars Borin, editor, Parallel
Corpora, Parallel Worlds. Rodopi, Amster-
dam, New York. Proceedings of the Sympo-
sium on Parallel Corpora, Department of Lin-
guistics, Uppsala University, Sweden,1999.

Rada Mihalcea and Ted Pedersen. 2003. An
evaluation exercise for word alignment. In
Workshop on Building and Using Parallel
Texts: Data Driven Machine Translation and
Beyond, pages 1–10, Edmonton, Canada,
May.

Franz-Josef Och and Hermann Ney. 2000a.
A comparison of alignment models for sta-
tistical machine translation. In Proceed-
ings of the 18th International Conference on
Computational Linguistics, pages 1086–1090,
Saarbr¨ucken, Germany, July.

Franz Josef Och and Hermann Ney. 2000b. Im-
proved statistical alignment models. In Proc.
of the 38th Annual Meeting of the Associ-
ation for Computational Linguistics, pages
440–447.

Franz Josef Och and Hermann Ney. 2003. A
systematic comparison of various statistical
alignment models. Computational Linguis-
tics, 29(1):19–51.

Anna S˚agvall Hein. 2002. The PLUG project:
Parallel corpora in Link¨oping, Uppsala, and
G¨oteborg: Aims and achievements. In Lars
Borin, editor, Parallel Corpora, Parallel
Worlds. Rodopi, Amsterdam, New York.

Proceedings of the Symposium on Parallel
Corpora, Department of Linguistics, Uppsala
University, Sweden,1999.

Frank A. Smadja, Kathleen R. McKeown, and
Vasileios Hatzivassiloglou. 1996. Translating
collocations for bilingual lexicons: A statis-
tical approach. Computational Linguistics,
22(1), pages 1–38.

J¨org Tiedemann. 1999. Word alignment -
step by step. In Proceedings of the 12th
Nordic Conference on Computational Lin-
guistics, pages 216–227, University of Trond-
heim, Norway.

J¨org Tiedemann. 2003. Combining clues for
word alignment. In Proceedings of the 10th
Conference of the European Chapter of the
Association for Computational Linguistics
(EACL), pages 339–346, Budapest, Hungary,
April.

Dan Tufis and Ana-Maria Barbu. 2002. Lexical
token alignment: Experiments, results and
applications. In Proceedings from The 3rd
International Conference on Language Re-
sources and Evaluation, pages 458–465, Las
Palmas, Spain.

Stephan Vogel, Hermann Ney, and Christoph
Tillmann. 1996. HMM-based word alignment
in statistical translation. In Proceedings of
the 16th International Confernece on Compu-
tational Linguistics, pages 836–841, Copen-
hagen, Denmark.
