Improving Statistical Word Alignment with a Rule-Based Machine 
Translation System 
WU Hua, WANG Haifeng  
Toshiba (China) Research & Development Center 
5/F., Tower W2, Oriental Plaza, No.1, East Chang An Ave., Dong Cheng District 
Beijing, China, 100738 
{wuhua, wanghaifeng}@rdc.toshiba.com.cn 
 
Abstract 
The main problems of statistical word alignment 
lie in the facts that source words can only be 
aligned to one target word, and that the inappro-
priate target word is selected because of data 
sparseness problem. This paper proposes an ap-
proach to improve statistical word alignment 
with a rule-based translation system. This ap-
proach first uses IBM statistical translation 
model to perform alignment in both directions 
(source to target and target to source), and then 
uses the translation information in the rule-based 
machine translation system to improve the statis-
tical word alignment. The improved alignments 
allow the word(s) in the source language to be 
aligned to one or more words in the target lan-
guage. Experimental results show a significant 
improvement in precision and recall of word 
alignment. 
1 Introduction 
                                                          
Bilingual word alignment is first introduced as an 
intermediate result in statistical machine transla-
tion (SMT) (Brown et al. 1993). Besides being 
used in SMT, it is also used in translation lexicon 
building (Melamed 1996), transfer rule learning 
(Menezes and Richardson 2001), example-based 
machine translation (Somers 1999), etc. In previ-
ous alignment methods, some researches mod-
eled the alignments as hidden parameters in a 
statistical translation model (Brown et al. 1993; 
Och and Ney 2000) or directly modeled them 
given the sentence pairs (Cherry and Lin 2003). 
Some researchers used similarity and association 
measures to build alignment links (Ahrenberg et 
al. 1998; Tufis and Barbu 2002). In addition, Wu 
(1997) used a stochastic inversion transduction 
grammar to simultaneously parse the sentence 
pairs to get the word or phrase alignments. 
Generally speaking, there are four cases in 
word alignment: word to word alignment, word 
to multi-word alignment, multi-word to word 
alignment, and multi-word to multi-word align-
ment. One of the most difficult tasks in word 
alignment is to find out the alignments that in-
clude multi-word units. For example, the statisti-
cal word alignment in IBM translation models 
(Brown et al. 1993) can only handle word to 
word and multi-word to word alignments.  
Some studies have been made to tackle this 
problem. Och and Ney (2000) performed transla-
tion in both directions (source to target and target 
to source) to extend word alignments. Their re-
sults showed that this method improved precision 
without loss of recall in English to German align-
ments. However, if the same unit is aligned to 
two different target units, this method is unlikely 
to make a selection. Some researchers used 
preprocessing steps to identity multi-word units 
for word alignment (Ahrenberg et al. 1998; 
Tiedemann 1999; Melamed 2000). The methods 
obtained multi-word candidates based on con-
tinuous N-gram statistics. The main limitation of 
these methods is that they cannot handle sepa-
rated phrases and multi-word units in low fre-
quencies. 
In order to handle all of the four cases in word 
alignment, our approach uses both the alignment 
information in statistical translation models and 
translation information in a rule-based machine 
translation system. It includes three steps. (1) A 
statistical translation model is employed to per-
form word alignment in two directions
1
 (English 
to Chinese, Chinese to English). (2) A rule-based 
English to Chinese translation system is em-
ployed to obtain Chinese translations for each 
English word or phrase in the source language. (3) 
The translation information in step (2) is used to 
improve the word alignment results in step (1).  
A critical reader may pose the question “why 
 
1
 We use English-Chinese word alignment as a case study.  
not use a translation dictionary to improve statis-
tical word alignment?” Compared with a transla-
tion dictionary, the advantages of a rule-based 
machine translation system lie in two aspects: (1) 
It can recognize the multi-word units, particularly 
separated phrases, in the source language. Thus, 
our method is able to handle the multi-word 
alignments with higher accuracy, which will be 
described in our experiments. (2) It can perform 
word sense disambiguation and select appropriate 
translations while a translation dictionary can 
only list all translations for each word or phrase. 
Experimental results show that our approach im-
proves word alignments in both precision and 
recall as compared with the state-of-the-art tech-
nologies. 
2 
                                                          
Statistical Word Alignment 
Statistical translation models (Brown, et al. 1993) 
only allow word to word and multi-word to word 
alignments. Thus, some multi-word units cannot 
be correctly aligned. In order to tackle this prob-
lem, we perform translation in two directions 
(English to Chinese and Chinese to English) as 
described in Och and Ney (2000). The GIZA++ 
toolkit is used to perform statistical alignment. 
Thus, for each sentence pair, we can get two 
alignment results. We use  and  to represent 
the alignment sets with English as the source lan-
guage and Chinese as the target language or vice 
versa. For alignment links in both sets, we use i 
for English words and j for Chinese words. 
1
S
2
S
}0  },{|),{(
1
≥==
jjjj
aaAjAS  
}0  },{|),{(
2
≥==
iiii
aaAAiS  
 Where, represents the index posi-
tion of the source word aligned to the target word 
in position x. For example, if a Chinese word in 
position j is connected to an English word in po-
sition i, then . If a Chinese word in position 
j is connected to English words in positions i  
and , then .
),( jixa
x
=
ia
j
=
,{
21
iiA
j
=
)1( >k
1
2
i }
2
  We call an element in 
the alignment set an alignment link. If the link 
includes a word that has no translation, we call it 
a null link. If k words have null links, we 
treat them as k different null links, not just one 
link. 
 
2
 In the following of this paper, we will use the position 
number of a word to refer to the word.   
Based on  and , we obtain their intersec-
tion set, union set and subtraction set.  
1
S
2
S
Intersection: 
21
SSS ∩=  
Union: 
21
SSP ∪=  
Subtraction: S−= PF 
Thus, the subtraction set contains two differ-
ent alignment links for each English word.  
3 Rule-Based Translation System 
We use the translation information in a rule-
based English-Chinese translation system
3
 to im-
prove the statistical word alignment result. This 
translation system includes three modules: source 
language parser, source to target language trans-
fer module, and target language generator.  
From the transfer phase, we get Chinese trans-
lation candidates for each English word. This 
information can be considered as another word 
alignment result, which is denoted as 
)},{(
3 k
CkS = . C  the set including the trans-
lation candidates for the k-th 
 
English word or 
phrase. The difference between S  and the 
common alignment set is that each English word 
or phrase in S  has one or more translation can-
didates. A translation example for the English 
sentence “He is used to pipe smoking.” is shown 
in Table 1.  
k
 is
3
3
English Words Chinese Translations 
He 他 
is used to 习惯 
pipe 烟斗，烟筒 
smoking 吸，吸烟 
Table 1. Translation Example 
From Table 1, it can be seen that (1) the trans-
lation system can recognize English phrases (e.g. 
is used to); (2) the system can provide one or 
more translations for each source word or phrase; 
(3) the translation system can perform word se-
lection or word sense disambiguation. For exam-
ple, the word “pipe” has several meanings such 
as “tube”, “tube used for smoking” and “wind 
instrument”. The system selects “tube used for 
smoking” and translates it into Chinese words 
“烟斗” and “烟筒”. The recognized translation 
                                                           
3
 This system is developed based on the Toshiba English-
Japanese translation system (Amano et al. 1989). It achieves 
above-average performance as compared with the English-
Chinese translation systems available in the market. 
candidates will be used to improve statistical 
word alignment in the next section.  
4 
4.1 
Word Alignment Improvement 
As described in Section 2, we have two align-
ment sets for each sentence pair, from which we 
obtain the intersection set S and the subtraction 
set . We will improve the word alignments in S 
and  with the translation candidates produced 
by the rule-based machine translation system. In 
the following sections, we will first describe how 
to calculate monolingual word similarity used in 
our algorithm. Then we will describe the algo-
rithm used to improve word alignment results.  
F
F
Word Similarity Calculation 
This section describes the method for monolin-
gual word similarity calculation. This method 
calculates word similarity by using a bilingual 
dictionary, which is first introduced by Wu and 
Zhou (2003). The basic assumptions of this 
method are that the translations of a word can 
express its meanings and that two words are simi-
lar in meanings if they have mutual translations. 
Given a Chinese word, we get its translations 
with a Chinese-English bilingual dictionary. The 
translations of a word are used to construct its 
feature vector. The similarity of two words is 
estimated through their feature vectors with the 
cosine measure as shown in (Wu and Zhou 2003). 
If there are a Chinese word or phrase w  and a 
Chinese word set Z , the word similarity between 
them is calculated as shown in Equation (1). 
))',((),(
'
wwsimMaxZwsim
Zw∈
=  
(1)
4.2 Alignment Improvement Algorithm 
As the word alignment links in the intersection 
set are more reliable than those in the subtraction 
set, we adopt two different strategies for the 
alignments in the intersection set S and the sub-
traction set . For alignments in S, we will mod-
ify them when they are inconsistent with the 
translation information in S . For alignments in 
, we classify them into two cases and make se-
lection between two different alignment links or 
modify them into a new link. 
F
3
F
In the intersection set S, there are only word 
to word alignment links, which include no multi-
word units. The main alignment error type in this 
set is that some words should be combined into 
one phrase and aligned to the same word(s) in the 
target sentence. For example, for the sentence 
pair in Figure 1, “used” is aligned to the Chinese 
word “习惯”, and “is” and “to” have null links in 
. But in the translation set , “is used to" is a 
phrase. Thus, we combine the three alignment 
links into a new link. The words “is”, “used” and 
“ to” are all aligned to the Chinese word “习惯”, 
denoted as (is used to, 习惯). Figure 2 describes 
the algorithm employed to improve the word 
alignment in the intersection set S. 
S
3
S
)j
 ph
k
,
3
S
 
 
Figure 1. Multi-Word Alignment Example 
Input: Intersection set S, Translation set , 
3
S
            Final word alignment set WA  
For each alignment link（ in , do:  ,i S
(1) If all of the following three conditions are 
satisfied, add the new alignment link 
WA ph
k
∉）（w,  to WA . 
a) There is an element（, and 
the English word i is a constituent of the 
phrase  . 
3
) SC
k
∈
k
ph
b) The other words in the phrase ph  also 
have alignment links in S .  
k
c) For each word s in ph , we get 
k
}),|{ St(stT ∈= and combine
4
 all words 
in T into a phrase w , and the similar-
ity
1
),( δ>
k
Cwsim .  
(2) Otherwise, add（ to WA .  ), ji
Output: Word alignment set WA  
Figure 2. Algorithm for the Intersection Set 
In the subtraction set, there are two different 
links for each English word. Thus, we need to 
select one link or to modify the links according to 
the translation information in .  
For each English word i in the subtraction set, 
there are two cases:  
                                                          
4
 We define an operation “combine” on a set consisting of 
position numbers of words. We first sort the position num-
bers in the set ascendly and then regard them as a phrase. 
For example, there is a set {{2,3}, 1, 4}, the result after 
applying the combine operation is (1, 2, 3, 4). 
Case 1: In , there is a word to word alignment 
link（. In , there is a word to word or 
word to multi-word alignment link
1
S
1
S ), ji ∈
2
S
 
2
), SAi
i
∈（
5
.  
Case 2: In , there is a multi-word to word 
alignment link ( . In S , there 
is a word to word or word to multi-word align-
ment link（. 
1
S
, A
i
jj
AiSjA ∈∈ &),
1
2
) S∈
2
 i
For Case 1, we first examine the translation 
set . If there is an element（, we cal-
culate the Chinese word similarity between j in 
 and  with Equation (1) shown in 
Section 4.1. We also combine the words in A  
) into a phrase and get the word simi-
larity between this new phrase and C . The align-
ment link with a higher similarity score is 
selected and added to WA .  
3
S
)j ∈
), A
i
 
3
), SCi
i
∈
i
1
, Si（
( Si ∈（
i
C
i
2
Input: Alignment sets S  and  
1 2
S
Translation unit（  
3
), SCph
kk
∈
(1) For each sub-sequence
6
 s of , get the 
sets and 
 
k
ph
}
1
)∈,(|{
111
StstT =
})
22
St ∈,(|{
22
stT =
(2) Combine words in T  and T  into phrases 
 and  respectively. 
1 2
1
w
2
w
(3) Obtain the word similarities 
 and .  ),Csim(wws
k11
= ),Csim(wws
k22
=
(4) Add a new alignment link to WA  
according to the following steps. 
a) If ws and 
21
ws>
11
δ>ws , add （ 
to WA ;  
),
1
wph
k
b) If ws  and 
12
ws>
12
δ>ws , add（ 
to WA ;  
 ),
2
wph
k
c) If 
121
δ>= wsws
),
2
w
k
, add （ or 
 to WA  randomly. 
),
1
wph
k
ph（
Output: Updated alignment set WA  
Figure 3. Multi-Word to Multi-Word Align-
ment Algorithm 
If, in S , there is an element（ and i 
is a constituent of , the English word i of the 
alignment links in both S  and  should be 
3
 ),
kk
Cph
2
S
k
ph
1
combined with other words to form phrases. In 
this case, we modify the alignment links into a 
multi-word to multi-word alignment link. The 
algorithm is described in Figure 3. 
                                                           
5
 （),
i
Ai represents both the word to word and word to 
multi-word alignment links. 
6
 If a phrase consists of three words w , the sub-
sequences of this phrase are w . 
321
ww
221
,, www
3321
,, www
For example, given a sentence pair in Figure 4,  
in S , the word “whipped” is aligned to “突然” 
and “out” is aligned to “抽出”. In S , the word 
“whipped” is aligned to both “突然” and “抽出” 
and “out” has a null link. In , “whipped out” is 
a phrase and translated into “迅速抽出". And the 
word similarity between “突然抽出” and “迅速抽
出” is larger than the threshold 
1
2
1
3
S
δ . Thus, we 
combine the aligned target words in the Chinese 
sentence into “突然抽出”. The final alignment 
link should be (whipped out, 突然 抽出). 
 
Figure 4. Multi-Word to Multi-Word Alignment 
Example 
For Case 2, we first examine S  to see 
whether there is an element（. If true, 
we combine the words in  (（) into a 
word or phrase and calculate the similarity be-
tween this new word or phrase and C  in the 
same way as in Case 1. If the similarity is higher 
than a threshold 
3
3
S
2
S∈
i
 ), Ci
i
∈
), Ai
ii
A
1
δ , we add the alignment link 
 into WA .  ),
i
Ai（
If there is an element（ and i is a 
constituent of ph , we combine the English 
words in A  ( ) into a phrase. If it is 
the same as the phrase  and 
 
3
), SCph
kk
∈
1
k
ph (
k
, jA
jj
)( S∈
1
), δ>
k
Cjsim , 
we add (  into WA . Otherwise, we use the 
multi-word to multi-word alignment algorithm in 
Figure 3 to modify the links.  
),A
j
j
After applying the above two strategies, there 
are still some words not aligned. For each sen-
tence pair, we use E and C to denote the sets of 
the source words and the target words that are not 
aligned, respectively. For each source word in E, 
we construct a link with each target word in C. 
We use L },|),{( CjEiji ∈∈=  to denote the 
alignment candidates. For each candidate in L, 
we look it up in the translation set S . If there is 
an element 
3
3
), SCi
i
∈（ and 
2
)( , δ>j
i
Csim , we 
add the link into the set WA . 
|
C
||
C
C
S
S∩
S
C
C
=
|
|
5 Experiments 
5.1 
5.2 
Training and Testing Set 
We did experiments on a sentence aligned Eng-
lish-Chinese bilingual corpus in general domains. 
There are about 320,000 bilingual sentence pairs 
in the corpus, from which, we randomly select 
1,000 sentence pairs as testing data. The remain-
der is used as training data.  
The Chinese sentences in both the training set 
and the testing set are automatically segmented 
into words. The segmentation errors in the testing 
set are post-corrected. The testing set is manually 
annotated. It has totally 8,651 alignment links 
including 2,149 null links. Among them, 866 
alignment links include multi-word units, which 
accounts for about 10% of the total links. 
Experimental Results 
There are several different evaluation methods 
for word alignment (Ahrenberg et al. 2000). In 
our evaluation, we use evaluation metrics similar 
to those in Och and Ney (2000). However, we do 
not classify alignment links into sure links and 
possible links. We consider each alignment as a 
sure link.  
If we use S  to indicate the alignments iden-
tified by the proposed methods and S  to denote 
the reference alignments, the precision, recall and 
f-measure are calculated as described in Equation 
(2), (3) and (4). According to the definition of the 
alignment error rate (AER) in Och and Ney 
(2000), AER can be calculated with Equation (5).  
G
C
|S|
SS|
G
G
∩
=precision      
 
(2)
|S|
 |SS|
C
CG
∩
=recall   
 
(3)
||
||*2
G
G
S
S
fmeasure
+
=  
(4)
fmeasure
SS
S
AER
G
G
−
+
∩
−= 1
|||
|*2
1
(5)
In this paper, we give two different alignment 
results in Table 2 and Table 3. Table 2 presents 
alignment results that include null links. Table 3 
presents alignment results that exclude null links. 
The precision and recall in the tables are obtained 
to ensure the smallest AER for each method. 
 Precision Recall AER 
Ours 0.8531 0.7057 0.2276
Dic 0.8265 0.6873 0.2495 
IBM E-C 0.7121 0.6812 0.3064 
IBM C-E 0.6759 0.7209 0.3023 
IBM Inter 0.8756 0.5516 0.3233 
IBM Refined 0.7046 0.6532 0.3235 
Table 2. Alignment Results Including Null Links 
 Precision Recall AER 
Ours 0.8827 0.7583 0.1842
Dic 0.8558 0.7317 0.2111 
IBM E-C 0.7304 0.7136 0.2781 
IBM C-E 0.6998 0.6725 0.3141 
IBM Inter 0.9392 0.5513 0.3052 
IBM refined 0.8152 0.6926 0.2505 
Table 3. Alignment Results Excluding Null Links 
In the above tables, the row “Ours” presents 
the result of our approach. The results are ob-
tained by setting the word similarity thresholds to 
1.0
1
＝δ  and 5.0
2
＝δ . The Chinese-English dic-
tionary used to calculate the word similarity has 
66,696 entries. Each entry has two English trans-
lations on average. The row “Dic” shows the re-
sult of the approach that uses a bilingual 
dictionary instead of the rule-based machine 
translation system to improve statistical word 
alignment. The dictionary used in this method is 
the same translation dictionary used in the rule-
based machine translation system. It includes 
57,684 English words and each English word has 
about two Chinese translations on average. The 
rows “IBM E-C” and “IBM C-E” show the re-
sults obtained by IBM Model-4 when treating 
English as the source and Chinese as the target or 
vice versa. The row “IBM Inter” shows results 
obtained by taking the intersection of the align-
ments produced by “IBM E-C” and “IBM C-E”. 
The row “IBM Refined” shows the results by 
refining the results of “IBM Inter” as described in 
Och and Ney (2000). 
Generally, the results excluding null links are 
better than those including null links. This indi-
cates that it is difficult to judge whether a word 
has counterparts in another language. It is be-
cause the translations of some source words can 
be omitted. Both the rule-based translation sys-
tem and the bilingual dictionary provide no such 
information.  
It can be also seen that our approach performs 
the best among others in both cases. Our ap-
proach achieves a relative error rate reduction of 
26% and 25% when compared with “IBM E-C” 
and “IBM C-E” respectively
7
. Although the pre-
cision of our method is lower than that of the 
“IBM Inter” method, it achieves much higher 
recall, resulting in a 30% relative error rate re-
duction. Compared with the “IBM refined” 
method, our method also achieves a relative error 
rate reduction of 30%. In addition, our method is 
better than the “Dic” method, achieving a relative 
error rate reduction of 8.8%.  
In order to provide the detailed word align-
ment information, we classify word alignment 
results in Table 3 into two classes. The first class 
includes the alignment links that have no multi-
word units. The second class includes at least one 
multi-word unit in each alignment link. The de-
tailed information is shown in Table 4 and Table 
5. In Table 5, we do not include the method “In-
ter” because it has no multi-word alignment links.  
 Precision Recall AER 
Ours 0.9213 0.8269 0.1284
Dic 0.8898 0.8215 0.1457 
IBM E-C 0.8202 0.7972 0.1916 
IBM C-E 0.8200 0.7406 0.2217 
IBM Inter 0.9392 0.6360 0.2416 
IBM Refined 0.8920 0.7196 0.2034 
Table 4. Single Word Alignment Results 
 Precision Recall AER 
Ours 0.5123 0.3118 0.6124
Dic 0.3585 0.1478 0.7907 
IBM E-C 0.1682 0.1697 0.8311 
IBM C-E 0.1718 0.2298 0.8034 
IBM Refined 0.2105 0.2910 0.7557 
Table 5. Multi-Word Alignment Results  
All of the methods perform better on single 
word alignment than on multi-word alignment. In 
Table 4, the precision of our method is close to 
the “IBM Inter” approach, and the recall of our 
method is much higher, achieving a 47% relative 
error rate reduction. Our method also achieves a 
37% relative error rate reduction over the “IBM 
Refined” method.  Compared with the “Dic” 
method, our approach achieves much higher pre-
cision without loss of recall, resulting in a 12% 
relative error rate reduction. 
                                                           
6 Discussion 
7
 The error rate reductions in this paragraph are obtained 
from Table 2. The error rate reductions in Table 3 are 
omitted.  
Our method also achieves much better results 
on multi-word alignment than other methods. 
However, our method only obtains one third of 
the correct alignment links. It indicates that it is 
the hardest to align the multi-word units. 
Readers may pose the question “why the rule-
based translation system performs better on word 
alignment than the translation dictionary?” For 
single word alignment, the rule-based translation 
system can perform word sense disambiguation, 
and select the appropriate Chinese words as 
translation. On the contrary, the dictionary can 
only list all translations. Thus, the alignment pre-
cision of our method is higher than that of the 
dictionary method. Figure 5 shows alignment 
precision and recall values under different simi-
larity values for single word alignment including 
null links. From the figure, it can be seen that our 
method consistently achieves higher precisions as 
compared with the dictionary method. The t-
score value (t=10.37, p=0.05) shows the im-
provement is statistically significant.   
Figure 5. Recall-Precision Curves  
For multi-word alignment links, the translation 
system also outperforms the translation diction-
ary. The result is shown in Table 5 in Section 5.2. 
This is because (1) the translation system can 
automatically recognize English phrases with 
higher accuracy than the translation dictionary; (2) 
The translation system can detect separated 
phrases while the dictionary cannot. For example, 
for the sentence pairs in Figure 6, the solid link 
lines describe the alignment result of the rule-
base translation system while dashed lines indi-
cate the alignment result of the translation dic-
tionary. In example (1), the phrase “be going to” 
indicates the tense not the phrase “go to” as the 
dictionary shows. In example (2), our method 
detects the separated phrase “turn … on” while 
the dictionary does not. Thus, the dictionary 
method produces the wrong alignment link. 
 
Figure 6. Alignment Comparison Examples 
7 Conclusion and Future Work 
This paper proposes an approach to improve sta-
tistical word alignment results by using a rule-
based translation system. Our contribution is that, 
given a rule-based translation system that pro-
vides appropriate translation candidates for each 
source word or phrase, we select appropriate 
alignment links among statistical word alignment 
results or modify them into new links. Especially, 
with such a translation system, we can identify 
both the continuous and separated phrases in the 
source language and improve the multi-word 
alignment results. Experimental results indicate 
that our approach can achieve a precision of 85% 
and a recall of 71% for word alignment including 
null links in general domains. This result signifi-
cantly outperforms those of the methods that use 
a bilingual dictionary to improve word alignment, 
and that only use statistical translation models. 
Our future work mainly includes three tasks. 
First, we will further improve multi-word align-
ment results by using other technologies in natu-
ral language processing. For example, we can use 
named entity recognition and transliteration tech-
nologies to improve person name alignment. Sec-
ond, we will extract translation rules from the 
improved word alignment results and apply them 
back to our rule-based machine translation sys-
tem. Third, we will further analyze the effect of 
the translation system on the alignment results. 

References 

Lars Ahrenberg, Magnus Merkel, and Mikael Anders-
son 1998. A Simple Hybrid Aligner for Generating 
Lexical Correspondences in Parallel Texts. In Proc. 
of the 36th Annual Meeting of the Association for 
Computational Linguistics and the 17th Int. Conf. 
on Computational Linguistics, pp. 29-35. 

Lars Ahrenberg, Magnus Merkel, Anna Sagvall Hein 
and Jorg Tiedemann 2000. Evaluation of word 
alignment systems. In Proc. of the Second Int. Conf. 
on Linguistic Resources and Evaluation, pp. 1255-
1261. 

ShinYa Amano, Hideki Hirakawa, Hiroyasu Nogami, 
and Akira Kumano 1989. Toshiba Machine Trans-
lation System. Future Computing Systems, 
2(3):227-246. 

Peter F. Brown, Stephen A. Della Pietra, Vincent J. 
Della Pietra and Robert L. Mercer 1993. The 
Mathematics of Statistical Machine Translation: 
Parameter Estimation. Computational Linguistics, 
19(2):263-311. 

Colin Cherry and Dekang Lin 2003. A Probability 
Model to Improve Word Alignment. In Proc. of the 
41st Annual Meeting of the Association for Com-
putational Linguistics, pp. 88-95. 

I. Dan Melamed 1996. Automatic Construction of 
Clean Broad-Coverage Translation Lexicons. In 
Proc. of the 2nd Conf. of the Association for Ma-
chine Translation in the Americas, pp. 125-134. 

I. Dan Melamed 2000. Word-to-Word Models of 
Translational Equivalence among Words. Compu-
tational Linguistics, 26(2): 221-249. 

Arul Menezes and Stephan D. Richardson 2001. A 
Best-first Alignment Algorithm for Automatic Ex-
traction of Transfer Mappings from Bilingual Cor-
pora. In Proc. of the ACL 2001 Workshop on Data-
Driven Methods in Machine Translation, pp. 39-46. 

Franz Josef Och and Hermann Ney 2000. Improved 
Statistical Alignment Models. In Proc.of the 38th 
Annual Meeting of the Association for Computa-
tional Linguistics, pp. 440-447. 

Harold Somers 1999. Review Article: Example-Based 
Machine Translation. Machine Translation 14:113-
157. 

Jorg Tiedemann 1999. Word Alignment – Step by Step. 
In Proc. of the 12th Nordic Conf. on Computational 
Linguistics, pp. 216-227. 

Dan Tufis and Ana Maria Barbu. 2002. Lexical Token 
Alignment: Experiments, Results and Application. 
In Proc. of the Third Int. Conf. on Language Re-
sources and Evaluation, pp. 458-465. 

Dekai Wu 1997. Stochastic Inversion Transduction 
Grammars and Bilingual Parsing of Parallel Cor-
pora. Computational Linguistics, 23(3):377-403. 

Hua Wu and Ming Zhou 2003. Optimizing Synonym 
Extraction Using Monolingual and Bilingual Re-
sources. In Proc. of the 2nd Int. Workshop on Para-
phrasing, pp. 72-79. 
