HMM-Based Word Alignment in Statistical Translation 
Stephan Vogel Hermann Ney Christoph Tillmann 
Lehrstuhl ffir Informatik V, RWTH Aachen 
D-52056 Aachen, Germany 
{vogel, ney, t illmann}@inf ormat ik. rwth-aachen, de 
Abstract 
In this paper, we describe a new model 
for word alignment in statistical trans- 
lation and present experimental results. 
The idea of the model is to make the 
alignment probabilities dependent on the 
differences in the alignment positions 
rather than on the absolute positions. 
To achieve this goal, the approach us- 
es a first-order Hidden Markov model 
(HMM) for the word alignment problem 
as they are used successfully in speech 
recognition for the time alignment prob- 
lem. The difference to the time align- 
ment HMM is that there is no monotony 
constraint for the possible word order- 
ings. We describe the details of the mod- 
el and test the model on several bilingual 
corpora. 
1 Introduction 
In this paper, we address the problem of word 
alignments for a bilingual corpus. In the recent 
years, there have been a number of papers con- 
sidering this or similar problems: (Brown et al., 
1990), (Dagan et al., 1993), (Kay et al., 1993), 
(Fung et al., 1993). 
In our approach, we use a first-order Hidden 
Markov model (HMM) (aelinek, 1976), which is 
similar, but not identical to those used in speech 
recognition. The key component of this approach 
is to make the alignment probabilities dependent 
not on the absolute position of the word align- 
ment, but on its relative position; i.e. we consider 
the differences in the index of the word positions 
rather than the index itself. 
The organization of the paper is as follows. 
After reviewing the statistical approach to ma- 
chine translation, we first describe the convention- 
al model (mixture model). We then present our 
first-order HMM approach in lull detail. Finally 
we present some experimental results and compare 
our model with the conventional model. 
2 Review: Translation Model 
The goal is the translation of a text given in some 
language F into a target language E. For conve- 
nience, we choose for the following exposition as 
language pair French and English, i.e. we are giv- 
en a French string f~ = fx ...fj...fJ, which is to be 
translated into an English string e / = el...ei...cl. 
Among all possible English strings, we will choose 
the one with the highest probability which is given 
by Bayes' decision rule: 
a{ = argmax{P,.(c{lAa)} q 
= argmax {Pr(ejt) .l'r(f le\[)} el ~ 
Pr(e{) is the language model of the target lan- 
guage, whereas Pr(fJle{) is the string translation 
model. The argmax operation denotes the search 
problem. In this paper, we address the problem 
of introducing structures into the probabilistic de- 
pendencies in order to model the string translation 
probability Pr(f~ le{). 
3 Alignment Models 
A key issne in modeling the string translation 
probability Pr(J'~le I) is the question of how we 
define the correspondence between the words of 
the English sentence and the words of the French 
sentence. In typical cases, we can assume a sort of 
pairwise dependence by considering all word pairs (fj, ei) 
for a given sentence pair I.-/1\[~'J', elqlj' We fur- 
ther constrain this model by assigning each French 
word to exactly one English word. Models describ- 
ing these types of dependencies are referred to as alignment models. 
In this section, we describe two models for word 
alignrnent in detail: 
,. a mixture-based alignment model, which was 
introduced in (Brown et al., 1990); 
• an HMM-based alignment model. 
In this paper, we address the question of how to 
define specific models for the alignment probabil- 
ities. The notational convention will be as fol- 
lows. We use the symbol Pr(.) to denote general 
836 
probability distributions with (nearly) no Sl)eeitic 
asSUml)tions. In contrast, for modcl-t)ased prol)-- 
ability distributions, we use the generic symbol 
v(.). 
3.1 Alignment with Mixture Distri|mtion 
Here, we describe the mixture-based alignment 
model in a fornmlation which is different fronl 
the original formulation ill (Brown el, a\[., 1990). 
We will ,is(: this model as reference tbr the IIMM- 
based alignments to lie 1)resented later. 
The model is based on a decomposition of the 
joint probability \[br ,l'~ into a product over the 
probabilities for each word J): 
a 
j=l 
wheFe~ fo\[' norll-la\]iz;i, tion 17(~/SOllS~ the 8elltC\]\[ce 
length probability p(J\] l) has been included. The 
next step now is to assutne a sort O\['l,airwise inter- 
act, ion between tim French word fj an(l each, F,n- 
glish word ci, i = 1, ...l. These dep('ndencies are 
captured in the lbrm of a rnixtnre distritmtion: 
1 
p(J)le{) = ~_.p(i, fjlc I) 
i=1 
I 
= ~_~p(ilj, l).p(fjle~) 
i=1 
Putting everything together, we have the following 
mixture-based ntodel: 
J l 
r,'(fi!l~I) = p(JIO ' H ~_~ \[~,(ilJ, l). ~,(j)led\] (1) 
j=l i=t 
with the following ingredients: 
• sentence length prob~d)ility: P(Jll); 
• mixture alignment probability: p(ilj , I); 
• translation probM)ility: p(f\[e). 
Assuming a tmifornl ~flignment prol)ability 
1 
.p(ilj, 1) = 7 
we arrive at the lh'st model proposed t)y (Brown 
et al., 1990). This model will be referred to as 
IB M 1 model. 
To train the translation probabilities p(J'fc), we 
use a bilingual (;orpus consisting of sentence pairs 
\[:/';4"1 : ', .,s Using the ,,laxin,ul , like- 
lihood criterion, we ol)tain the following iterative L a 
equation (Brown et al., 1990): 
/)(fie) = ~- will, 
$' 
A(f,e) = ~ 2 ~5(f,J).~) }~ a(e,e~.~) 
For unilbrm alignment probabilities, it can be 
shown (Brown et al., 1990), that there is only one 
optinnnn and therefore the I,',M algorithm (Baum, 
1!)72) always tinds the global optimum. 
For mixture alignment model with nonunilbrm 
alignment probabilities (subsequently referred to 
as IBM2 model), there ~tre to() many alignrnent 
parameters Pill j, I) to be estimated for smMl col 
pora. Therefore, a specific model tbr tile Mign- 
ment in:obabilities is used: 
r(i-j~-) (~) p(ilj , 1) = l . I 
Ei':l "( it --" J J-) 
This model assumes that the position distance rel- 
ative to the diagonal line of the (j, i) plane is the 
dominating factor (see Fig. 1). 'lb train this mod- 
el, we use the ,naximutn likelihood criterion in the 
so-called ulaximmn al)proximation, i.e. the likeli- 
hood criterion covers only tile most lik(-.ly align: 
inch, rather than the set of all alignm(,nts: 
d 
P,'(f(I,:I) ~ II ~"IU HO, ~)v(J} I,:~)\] (a) 
j=l 
In training, this criterion amounts to a sequence 
of iterations, each of which consists of two steps: 
* posilion alignmcnl: (riven the model parame- 
ters, deLerlniim the mosL likely position align- 
\]lient. 
• paramc, lcr cstimalion: Given the position 
alignment, i.e. goiug along the alignment 
paths for all sentence pairs, perform maxi- 
tnulu likelihood estimation of the model pa- 
rameters; for model-De(' distributions, these 
estimates result in relative frequencies. 
l)ue to the natnre of tile nfixture tnod(:l, there 
is no interaction between djacent word positions. 
Theretbre, the optimal position i for each posi- 
tion j can be determined in(lependently of the 
neighbouring positions. Thus l.he resulting train- 
ing procedure is straightforward. 
a.2 Alignment with HMM 
We now propose all HMM-based alignment model. 
'\['he motivation is that typicMly we have a strong 
localization effect in aligning the words in parallel 
texts (for language pairs fi:om \]ndoeuropean lan- 
guages): the words are not distrilmted arbitrarily 
over the senteuce \])ositions, but tend to form clus- 
ters. Fig. 1 illustrates this effect for the language 
pair German- 15'nglish. 
Each word of the German sentence is assigned 
to a word of the English sentence. The alignments 
have a strong tendency to preserve the local neigh- 
borhood when going from the one langnage to the 
other language. In mm,y cases, although not al~ 
ways, there is an even stronger restriction: the 
differeuce in the position index is smMler than 3. 
837 
DAYS 
BOTH 
ON 
EIGHT 
AT 
IT 
MAKE 
CAN 
WE 
IF 
THINK 
I 
WELL 
+ + + + + + + + +j~ + + 
+ + + + + + + ~J ~+ + 
+++++++/+÷+.-. 
+ + + + + + +/+ + + + + 
+ + + + + ~x~ + + + + + 
+ + + + +/+ D + + + + + 
++ ++~+++++++ 
+ + + _~ + + + + + + + 
+ + +~ + + ++ ++++ 
+ +jg + + + + + + + + + 
+~ +++ +++++++ 
g + + ++ + ++ + + + + 
z 
aa 
Figure 1: Word alignment for a German- English 
sentence pair. 
To describe these word-by-word aligmnents, we 
introduce the mapping j ---+ aj, which assigns a 
word fj in position j to a word el in position 
{ = aj. The concept of these alignments is similar 
to the ones introduced by (Brown et al., 1990), 
but we wilt use another type of dependence in the 
probability distributions. Looking at such align- 
ments produced by a hmnan expert, it is evident 
that the mathematical model should try to cap- 
ture the strong dependence of aj on the previous 
aligmnent. Therefore the probability of alignment 
aj for position j should have a dependence on the 
previous alignment aj _ 1 : 
p(ajiaj_l,i) , 
where we have inchided the conditioning on the 
total length \[ of the English sentence for normal- 
ization reasons. A sinfilar approach has been cho- 
sen by (Da.gan et al., 1993). Thus the problem 
formulation is similar to that of the time align- 
ment problem in speech recognition, where the 
so-called IIidden Markov models have been suc- 
cessfully used for a long time (Jelinek, 1976). Us- 
ing the same basic principles, we can rewrite the 
probability by introducing the 'hidden' alignments 
af := al...aj...aa for a sentence pair If,a; e{\]: 
Pr(f~al es) = ~_,Vr(fal, aT\[ eI't, 
a7 
,1 
= ~ 1-IP"(k,"stfT-',"{ -*,e/) 
a I j=l 
So far there has been no basic restriction of the 
approach. We now assume a first-order depen- 
dence on the alignments aj only: 
Vr(fj,aslf{ -~, J-* a I , el) 
where, in addition, we have assmned that tile 
translation probability del)ends only oil aj and not 
oil aj-:l. Putting everything together, we have the 
ibllowing llMM-based model: 
a 
Pr(f:i'le{) = ~ I-I \[p(ajlaj -', l).p(Y)lea,)\] (4) 
af J=, 
with the following ingredients: 
• IlMM alignment probability: p(i\]i', I) or 
p(aj laj_l, I); 
• translation probabflity: p(f\]e). 
In addition, we assume that the t{MM align- 
ment probabilities p(i\[i', \[) depend only on the 
jump width (i - i'). Using a set of non-negative 
parameters {s(i- i')}, we can write the IIMM 
alignment probabilities in the form: 
4i- i') (5) 
p(ili', i) = E' s(1 - i') 
1=1 
This form ensures that for each word position 
i', i' = 1, ..., I, the ItMM alignment probabilities 
satisfy the normMization constraint. 
Note the similarity between Equations (2) and 
(5). The mixtm;e model can be interpreted as a 
zeroth-order model in contrast to the first-order 
tlMM model. 
As with the IBM2 model, we use again the max- 
imum approximation: 
J 
Pr(fiSle~) "~ max\]--\[ \[p(asl<*j-1, z)p(fjl<~,)\] (6) a' / .ll. j,,, 
j=l 
In this case, the task of finding the optimal 
alignment is more involved than in the case of the 
mixture model (lBM2). Thereibre, we have to re- 
sort to dynainic programming for which we have 
the following typical reeursion formula: 
Q(i, j) = p(fj lel) ,nvax \[p(ili', 1). Q(i', j - 1)\] i =l,.,,I 
Here, Q(i, j) is a sort of partial probability as 
in time alignment for speech recognition (Jelinek, 197@. 
4 Experimental Results 
4.1 The Task and the Corpus 
The models were tested on several tasks: 
• the Avalanche Bulletins published by the 
Swiss Federal Institute for Snow and 
Avalanche Research (SHSAR) in Davos, 
Switzerland and made awtilable by the Eu- 
p "q I ropean Corpus Initiative (I,CI/MCI, 1994); 
• the Verbmobil Corpus consisting of sponta- 
neously spoken dialogs in the domain of ap- 
pointment scheduling (Wahlster, 1993); 
838 
,, the EuTrans C, orpus which contains tyl)ical 
phrases from the tourists and t.ravel docnain. 
(EuTrans, 1996). 
'l'able \] gives the details on the size of tit<; cor- 
pora a, ud t;\]t<'it' vocal>ulary. It shottld I>e noted 
that in a.ll thes(; three ca.ses the ratio el' vocal)t,- 
\]ary size a.ml numl)er of running words is not very 
faw)rable. 
Tall)le, I: (,orpol :L 
(,o~pt s l,angua.ge Words Voc. Size 
AvalancJte 
\] A\[ \[rails 
Verlmlobil 
Frolt ch 
(~('~ l lall 
Spanish 
I,;nglish 
(le 11 an 
English 
62849 
,\]4805 
--1:77@- 
15888 
150279 
25,\] 27 
1993 
2265 
2008 
t 63(} 
dO 17 
2`\]/13 
For several years 1)et;weeu 83 and !)2, the 
Avalanche Bulletins are awdlabte for I>oth Get- 
ntan and I!'ren(;\]l. The following is a tyl)ical sen-- 
t<;nce t>air fS;onl the <;or:IreS: 
Bei zu('.rst recht holnm, Sl)~i.tev tM'eren 'l'em- 
l)eraJ, uren sind vou Samsta.g his 1)ienstag tnof 
gett auf <l<'~t; All>ennor(ls<'.ite un</ am All>en-. 
ha.uptkanml oberhalb 2000 m 60 his 80 cm 
Neuschnee gel'aJlen. 
l)ar des temp&'atures d' abord dlevdes, puis 
plus basses, 60 h 8(1 cm de neige sent tombs 
de samedi h. mardi matin sur le versant herd 
el; la eft're des Alpes au-dessus de 2000 l\[1. 
An exa,nq)le fi'om the Vet%mobil corpus is given 
in Figure 1. 
4.2 Training and ILesults 
l,;ach of the three COrlJora. were ttsed to train 1)oth 
alignnmnt models, the mixture-I>ased alignment 
model in Eq.(1) and the llMM-base<l a.lignntent 
mod('l in Eq.(d). ltere, we will consider the ex- 
p<'.rimenta.l tesl;s on tit<'. Avalanche corpus in more 
detail. The traii, ing procedure consiste(l of the 
following steps: 
• , Initialization training: IBMI model trahted 
for t0 iterations of the i';M algorithm. 
,, l{,efinement traiuiug: The translation pcoba- 
1)ilities Dotn the initialization training wet'(; 
use+d to initialize both the IBM2 model and 
the I I M M-based nligntnent mo<t<'+l 
IBM2 Model: 5 iteratious using Lit(" max- 
ilnum a.I)proximatiolt (Eq+(3)) 
IIMM Model: 5 iterations usiug lle max-. 
imum al)l)roximation (Fq.(6)) 
'l'h(, resulting perl>h:'~xity (inverse g<~olu(;l.ric av- 
era,ge of the likelihoods) for the dilferent lno(lels 
ave given iu tim Tal>\[es 2 and 3 for the Awdanehe 
<:<)rims. In adclitiou t;o the total i>erl>lexity, whi<'.h 
is the' globa.l optimization criterion, the tables al- 
so show the perplexities of the translation prob- 
abilities and of the alignment probabilities. The 
last line in Table 2 gives the perplexity measures 
wh(m a.lJplying the rtlaxilnun| approximation and 
COml>uting the perph'~xity in t;\]lis approximation. 
These values are equal to the ones after initializing 
the IBM2 and HMM models, as they should be. 
From Ta,ble 3, we can see. that the mixture align- 
ment gives slightly better perplexity values for the 
translation l)roba.1)ilities, whereas the IIMM mod- 
el produces a smaller perplexity for the alignment 
l>rohal)ilities. In the calculatiot, of the, perplexi- 
ties, th<' seld;en(;e length probal)ility was not in= 
eluded. 
Tahle 2: IBM I: Translation, a, ligmnent and total 
pert)h'~xil.y as a. fimction of' the iteration. 
Iteration Tra,nslatiotl. Alignrnent Total 
0 
1 
2 
9 
10 
99.36 
3.72 
2.67 
t.87 
1.86 
20.07 
20.07 
20.07 
20.07 
20.07 
1994.00 
7/1.57 
53.62 
37.55 
37.36 
Max. 3.88 20.07 77.!)5 
'l'able 3:'1 rans\] ~+tion, aligmn en t and totaJ perplex- 
ity as a function of the itcra.tion for the IBM2 (A) 
and the IIMM model (13) 
Iter. Tratmlat;i(m 
A 0 
l 
2 
3 
,\] 
5 
1~ 0 
1 
3 
4 
5 
A ligniN.elJ t 
3.88- 20.07 
3.17 10.82 
3.25 10.15 
3.22 10.10 
3.20 \] 0.06 
3.18 10.05 
3.88 20.07 
3.37 7.99 
3.46 6.17 
;{./17 5.90 
"Ld6 5.85 
3.`\]5 5.8,\] 
'l'otal 
77.95 
34.27 
33.03 
32.48 
32.18 
32.00 
77.95 
26.98 
2t.36 
20.48 
20.2/1 
20.18 
Anoth<2r inl;crc:sting question is whether the 
IIMM alignntent model helps in finding good and 
sharply fo('usscd word+to-word (-orres\]Jondences. 
As an (;xamf,1o, Table 4 gives a COmlm+rison of 
the translatioJ~ probabilities p(fl e) bctweett the 
mixture and the IIMM alignnw+nt model For the 
(,e, u +l word Alpensiidhang. The counts of the 
words a.re given in brackets. The, re is virLually no 
,:lilfc~rc~nce between the translation l.al>les for the 
two nn)dels (1BM2 and IIMM). But+ itt general, 
the tl M M model seems to giw'. slightly better re- 
suits in the cases of (;, ttna t COml+olmd words like 
Alpcus'iidha'n,(I vcrsant sud des Alpes which re- 
quire \['u,tction words in the trattslation. 
839 
Table 4: Alpens/idhang. 
IBM1 Alpes (684) 0.171 
des (1968) 0.035 
le (1419) 0.039 
sud (416) 0.427 
sur (769) 0.040 
versant (431) 0.284 
IBM2 Alpes (684) 0.276 
sud (41.6) 0.371 
versant (431) 0.356 
HMM Alpes (684) 0.284 
des (1968) 0.028 
sud (416) 0.354 
versant (431) 0.333 
This is a result of the smoother position align- 
ments produced by the HMM model. A pro- 
nounced example is given in Figure 2. 'She prob- 
lem of the absolute position alignment can he 
demonstrated at the positions (a) and (c): both 
Schneebretlgefahr und Schneeverfrachtungen have 
a high probability on neige. The IBM2 models 
chooses the position near the diagonal, as this 
is the one with the higher probability. Again, 
Schneebrettgefahr generates de which explains the 
wrong alignment near the diagonal in (c). 
However, this strength of the HMM model can 
also be a weakness as in the case of est developpe 
ist ... entstanden (see (b) in Figure 2. The 
required two large jumps are correctly found by 
the mixture model, but not by the HMM mod- 
el. These cases suggest an extention to the HMM 
model. In general, there are only a small number 
of big jumps in the position alignments in a given 
sentence pair. Therefore a model could be useful 
that distinguishes between local and big jumps. 
The models have also been tested on the Verb- 
mobil Translation Corpus as well as on a small 
Corpus used in the EuTrans project. The sen- 
tences in the EuTrans corpus are in general 
short phrases with simple grammatical structures. 
However, the training corpus is very small and the 
produced alignments are generally of poor quality. 
There is no marked difference for the two align- 
ment models. 
Table 5: Perplexity results 
(b) Verbmobil Corpus. 
for (a) EuTrans and 
Model Iter. Transl. Align. Total 
IBM1 10 2.610 6.233 16.267 
IBM2 5 2.443 4.003 9.781 
HMM 5 2.461 3.934 9.686 
IBM1 10 4.373 10.674 46.672 
IBM2 5 4.696 6.538 30.706 
ItMM 5 4.859 5.452 26.495 
The Verbmobil Corpus consists of spontaneous- 
ly spoken dialogs in the domain of appointment 
scheduling. The assumption that every word in 
the source language is aligned to a word in the 
target language breaks down for many sentence 
pairs, resulting in poor alignment. This in turn 
affects the quality of the translation probabilities. 
Several extensions to the current IIMM based 
model could be used to tackle these problems: 
* The results presented here did not use the 
concept of the empty word. For the HMM- 
based model this, however, requires a second- 
order rather than a first-order model. 
. We could allow for multi-word phrases in 
both languages. 
• In addition to the absolute or relative align- 
ment positions, the alignment probabilities 
can be assumed to depend on part of speech 
tags or on the words themselves. (confer 
model 4 in (Brown et al., 1990)). 
5 Conclusion 
In this paper, we have presented an itMM-based 
approach for rnodelling word aligmnents in par- 
allel texts. The characteristic feature of this ap- 
proach is to make the alignment probabilities ex- 
plicitly dependent on the alignment position of the 
previous word. We have tested the model suc- 
cessfully on real data. The HMM-based approach 
produces translation probabilities comparable to 
the mixture alignment model. When looking at 
the position alignments those generated by the 
ItMM model are in general much smoother. This 
could be especially helpful for languages such as 
German, where compound words are matched to 
several words in the source language. On the oth- 
er hand, large jumps due to different word order- 
ings in the two languages are successfully modeled. 
We are presently studying and testing a nmltilevel 
HMM model that allows only a small number of 
large jumps. The ultimate test of the different 
alignment and translation models can only be car- 
ried out in the framework of a fully operational 
translation system. 
6 Acknowledgement 
This research was partly supported by the (\]er- 
man Federal Ministery of Education, Science, t{e- 
search and Technology under the Contract Num- 
ber 01 IV 601 A (Verbmobil) and under the Esprit 
Research Project 20268 'EuTrans). 
References 
L. E. Baum. 11972. An inequality and associat- 
ed maximization technique in statistical esti- 
mation of probabilistic functions of a Markov 
process. Inequalities, 3:1-8. 
840 
ENTSTANDEN 
SCHNEEBRETTGEFAHI~ 
LOKALE 
ERHEBLICHE 
EINE 
M 
2O00 
OBERHALB 
SCHNEEVERFRACHTUNGEN 
DURCH 
IST 
GOTTHARDGEBIET 
IM 
UND 
WALLIS 
IM 
ENTSTANDEN 
SCHNEEBRETTGEFAHR 
LOKALE 
ERHEBLICHE 
EINE 
M 
2000 
OBERHALB 
SCHNEEVERFRACHTUNGEN 
DURCH 
IST 
GOTTHARDGEBIET 
IM 
UND 
WALLIS 
+++ + ++ ++++ + 
+ + + + + + + + ++ + 
+ + + + + + ++ ++ + 
+ + + + + + + + + + + 
+ + + +++ ++++ + 
++ +++++++++ 
+ + + ++ + ++ +~ + 
+ + + + + + + + +/~ + 
+ + + + + + + + +t44 + ++++++++;/;/# 
+++++++j,+° 
+ + -I- ~ + + + 
+ + ~ + + + + + + + 
+ +~ + + + + + + + + 
.~+ ++ + +++++ 
O ++++ + ++++ + 
++ ++ ++ ++ + ++ +++++ ++I 
+ + 
++++ + 
+ + + 
+ + - + + Mixture 
+ j + + + + + + + + + 
+ , + + + + + + + + + 
+ + + + + + + + -t- + + + + 
+ + + + +(b)+ + + + + + + + + + + 
+ + ++ ++ +++++ + +++ + + + + 
+ + ++ +++++++ ++ ++ + ++ + 
+ + ++ +++++++ ++ ++ + + ++ 
+ +++ +++++ +++++++ ++ + 
+ + + + + + 4-+ + + + + + + + + + ++ + + + ++ + + + + +~ 
++++++++++++++~+~++++++++~ 
+ + + + + + + + + + + + + +/l~-Q-g + + + ~ + + + + + + + +~ 
+ 
++++++++1/+++++++++++1/++++_12/,,++ 
+++++++++/++++++++++++y+++/--'+++HMM 
+ + + + + + + + I + + + + + + + + + + + + O--I~'-g-g + + + + + 
++++~++++++++++++++++++++ 
+ + + ~-t~t~-~ + + + + + + + + + + + + + + + + + + + + + + 
+ + ~/~ + + + + + + + + + + + + + ÷ ÷ + + ÷ ÷ + + + + + + + 
+~J+++++++++++++++++++++++++++ 
+~++++++++++++++++++++++++++++ 
++++ +++ +++++ +++ ++++++++ ++ + ++ + 
° o 
A 
Figure 2: Alignments generated by the IBM2 and the HMM model. 
Peter F. Brown, Vincent J. Della Piet, ra, 
Stephen A. I)ella Pietra, and Robert 1, Mercer. 
\]993. The Mat, hema.tics of Statistical Machine 
'lS:unslalfion: Parameter Estimatiom (\]omputa- 
tional Linguistics, 19(2):26")--31 1. 
hlo Dagan, Ken Ctmreh, and William A. Gale. 
1993. l{,obust 13ilingual Word Alignment, for 
Machine Aided 'l'rm~sl~ttion l'rocecdings of the 
Workshop on Very Largc Corpora, C, oluml)us, 
Ohio, 1-8 
ECI/MC\[: The European Corpus Initiative Mul-. 
tilingual Corpus 1. \[!)94. Association for Com- 
pul;ational binguistics. 
EuTrans. 'l'he I)etinidon of a M'I' '\['ask. 'l)eh- 
nieal Report, I,~f\]'rans Project 1996(I,'orth- 
conuni,g), l)epto, de Sistemas Informaticos y 
Computacion (DSIC), Universidad Politecnica 
de Valencia. 
Pascale I"ung, and Kenneth Ward Church. 11994. 
K-vet: A flew N)proach \[br aligning parMlel 
texts. Proceedings of COLING 94, 1096-ll02, 
Kyoto, Japan. 
Frederik Jelinek. 1976. Speech Recognition by 
Statistical Met;hods. Proceedings of the \[l~l£1'\], 
Vol. 64,532-556, April 11976. 
Martin Kay, and Martin RSscheisen. 1993. Text- 
'l}anslation Alignment. Computational Lin- 
guistics, 19(1):121-142 
Wolfgang Wahlster. t993. Verbmobil: 'l?ransla- 
tion of Face-to-Face Dialogs. Proceedings of the 
MT' Summit IV, \]27-135, Kobe, Japan. 
841 
