Acquisition of English-Chinese Transliterated Word Pairs from Parallel-
Aligned Texts using a Statistical Machine Transliteration Model 
Chun-Jen Lee
1, 2
 
1
 Telecommunication Labs. 
Chunghwa Telecom Co., Ltd. 
Chungli, Taiwan, R.O.C. 
cjlee@cht.com.tw 
Jason S. Chang
2
 
2
 Department of Computer Science 
National Tsing Hua University 
Hsinchu, Taiwan, R.O.C. 
jschang@cs.nthu.edu.tw 
  
Abstract 
This paper presents a framework for extract-
ing English and Chinese transliterated word 
pairs from parallel texts. The approach is 
based on the statistical machine transliteration 
model to exploit the phonetic similarities be-
tween English words and corresponding Chi-
nese transliterations. For a given proper noun 
in English, the proposed method extracts the 
corresponding transliterated word from the 
aligned text in Chinese. Under the proposed 
approach, the parameters of the model are 
automatically learned from a bilingual proper 
name list. Experimental results show that the 
average rates of word and character precision 
are 86.0% and 94.4%, respectively. The rates 
can be further improved with the addition of 
simple linguistic processing. 
1 Introduction 
Automatic bilingual lexicon construction based on bi-
lingual corpora has become an important first step for 
many studies and applications of natural language proc-
essing (NLP), such as machine translation (MT), cross-
language information retrieval (CLIR), and bilingual 
text alignment. As noted in Tsuji (2002), many previous 
methods (Dagan et al., 1993; Kupiec, 1993; Wu and Xia, 
1994; Melamed, 1996; Smadja et al., 1996) deal with 
this problem based on frequency of words appearing in 
the corpora, which can not be effectively applied to low-
frequency words, such as transliterated words. These 
transliterated words are often domain-specific and cre-
ated frequently. Many of them are not found in existing 
bilingual dictionaries. Thus, it is difficult to handle 
transliteration only via simple dictionary lookup. For 
CLIR, the accuracy of transliteration highly affects the 
performance of retrieval. 
In this paper, we present a framework of acquisition 
for English and Chinese transliterated word pairs based 
on the proposed statistical machine transliteration model. 
Recently, much research has been done on machine 
transliteration for many language pairs, such as Eng-
lish/Arabic (Al-Onaizan and Knight, 2002), Eng-
lish/Chinese (Chen et al., 1998; Lin and Chen, 2002; 
Wan and Verspoor, 1998), English/Japanese (Knight 
and Graehl, 1998), and English/Korean (Lee and Choi, 
1997; Oh and Choi, 2002). Most previous approaches to 
machine transliteration have focused on the use of a 
pronunciation dictionary for converting source words 
into phonetic symbols, a manually assigned scoring 
matrix for measuring phonetic similarities between 
source and target words, or a method based on heuristic 
rules for source-to-target word transliteration. However, 
words with unknown pronunciations may cause prob-
lems for transliteration. In addition, using either a lan-
guage-dependent penalty function to measure the 
similarity between bilingual word pairs, or handcrafted 
heuristic mapping rules for transliteration may lead to 
problems when porting to other language pairs.  
The proposed method in this paper requires no con-
version of source words into phonetic symbols. The 
model is trained automatically on a bilingual proper 
name list via unsupervised learning. 
The remainder of the paper is organized as follows: 
Section 2 gives an overview of machine transliteration 
and describes the proposed model. Section 3 describes 
how to apply the model for extraction of transliterated 
target words from parallel texts. Experimental setup and 
quantitative assessment of performance are presented in 
Section 4. Concluding remarks are made in Section 5. 
2 Statistical Machine Transliteration 
Model 
2.1 Overview of the Noisy Channel Model  
Machine transliteration can be regarded as a noisy 
channel, as illustrated in Figure 1. Briefly, the language 
model generates a source word E and the transliteration 
model converts the word E to a target transliteration C. 
Then, the channel decoder is used to find the word Ê 
that is the most likely to the word E that gives rise to the 
transliteration C. 
Language
Model
P(E)
Transli-
Teration
Model
P(C|E)
Channel
Decoder
argmax
E
EC
P(E|C)
E
ˆ
 
Figure 1. The noisy channel model in ma-
chine transliteration. 
 
Under the noisy channel model, the back-
transliteration problem is to find out the most probable 
word E, given transliteration C. Letting P(E) be the 
probability of a word E, then for a given transliteration 
C, the back-transliteration probability of a word E can 
be written as P(E|C). By Bayes’ rule, the transliteration 
problem can be written as follows: 
.
)(
)|()(
maxarg)|(maxarg
ˆ
CP
ECPEP
CEPE
EE
==
   (1) 
Since P(C) is constant for the given C, we can rewrite 
Eq. (1) as follows: 
),|()(maxarg
ˆ
ECPEPE
E
=
     
(2) 
The first term, P(E), in Eq. (2) is the language model, 
the probability of E. The second term, P(C|E), in Eq. (2) 
is the transliteration model, the probability of the trans-
literation C conditioned on E. 
Below, we assume that E is written in English, while 
C is written in Chinese. Since Chinese and English are 
not in the same language family, there is no simple or 
direct way of mapping and comparison. One feasible 
solution is to adopt a Chinese romanization system
1
 to 
represent the pronunciation of each Chinese character. 
Among the many romanization systems for Chinese, 
Wade-Giles and Pinyin are the most widely used. The 
Wade-Giles system is commonly used in Taiwan today 
and has traditionally been popular among Western 
scholars. For this reason, we use the Wade-Giles system 
to romanize Chinese characters. However, the proposed 
approach is equally applicable to other romanization 
systems.  
The language model gives the prior probability P(E) 
which can be modeled using maximum likelihood esti-
mation. As for the transliteration model P(C|E), we can 
approximate it using the transliteration unit (TU), which 
is a decomposition of E and C. TU is defined as a se-
                                                           
1
 Ref. sites: “http://www.romanization.com/index.html” and 
“http://www.edepot.com/taoroman.html”. 
quence of characters transliterated as a base unit. For 
English, a TU can be a monograph, a digraph, or a tri-
graph (Wells, 2001). For Chinese, a TU can be a sylla-
ble initial, a syllable final, or a syllable (Chao, 1968) 
represented by corresponding romanized characters. To 
illustrate how this approach works, take the example of 
an English name, “Smith”, which can be segmented into 
four TUs and aligned with the romanized transliteration. 
Assuming that the word is segmented into “S-m-i-th”, 
then a possible alignment with the Chinese translitera-
tion “史密斯 (Shihmissu)” is depicted in Figure 2.  
S             m           i       th
Shih      m         i            ssu
史密斯
 
Figure 2. TU alignment between English and 
Chinese romanized character sequences. 
 
2.2 Formal Description: Statistical Translitera-
tion Model (STM) 
A word E with l characters and a romanized word C 
with n characters are denoted by 
l
e
1
 and 
n
c
1
, respec-
tively. Assume that the number of aligned TUs for (E, 
C) is N, and let 
},...,,{
21 N
mmmM =
 be an alignment 
candidate, where m
j
 is the match type of the j-th TU. 
The match type is defined as a pair of TU lengths for the 
two languages. For instance, in the case of (Smith, 
Shihmissu), N is 4, and M is {1-4, 1-1, 1-1, 2-3}. We 
write E and C as follows:  
,
,...,,
,...,,
2111
2111





===
===
N
Nn
N
Nl
vvvvcC
uuuueE
     
(3) 
where u
i
 and v
j
 are the i-th TU of E and the j-th TU of 
C, respectively. 
Then the probability of C given E, P(C|E), is formulated 
as follows: 
).|(),|()|,()|( EMPEMCPEMCPECP
MM
∑∑
==
 (4) 
To reduce computational complexity, one alternative 
approach is to modify the summation criterion in Eq. (4) 
into maximization. Therefore, we can approximate 
P(C|E) as follows: 
)|(),|(max)|( EMPEMCPECP
M
≈
 
).(),|(max MPEMCP
M
≈
     (5) 
We approximate 
)(),|( MPEMCP
 as follows: 
),...,,()|()(),|(
2111 N
NN
mmmPuvPMPEMCP =
 
).()|(
1
iii
N
i
mPuvP
=
∏≈
    (6) 
Therefore, we have 
().)(log)|(logmax)|(log
1
∑
=
+≈
N
i
iii
M
mPuvPECP
   
(7) 
Let ),( jiS  be the maximum accumulated log prob-
ability between the first i characters of E and the first j 
characters of C. Then, ),()|(log nlSECP = , the maxi-
mum accumulated log probability among all possible 
alignment paths of E with length l and C with length n, 
can be computed using a dynamic programming (DP) 
strategy, as shown in the following: 
 
Step 1 (Initialization): 
0)0,0( =S
       
(8) 
Step 2 (Recursion): 
    .0   ,0                                       
),(log)|(log                  
),(max),(
,
njli
khPecP
kjhiSjiS
i
hi
j
kj
kh
≤≤≤≤
++
−−=
−−
   
(9) 
Step 3 (Termination): 
 ),(log)|(log                  
),(max),(
,
khPecP
knhlSnlS
l
hl
n
kn
kh
++
−−=
−−
 (10) 
 
where ),( khP  is defined as the probability of the match 
type “h-k”. 
 
2.3 Estimation of Model Parameters 
To describe the iterative procedure for re-estimation of 
probabilities of )|(
ij
uvP  and )(
i
mP , we first define 
the following functions: 
 
),(
ji
vucount
 = the number of occurrences of 
aligned pair u
i
 and v
i
 in the training 
set. 
 
)(
i
ucount
 = the number of occurrences of u
i
 in 
the training set. 
 
),( khcount  = the total number of occurrences 
of u
i
 with length h aligned with v
j
 
with length k in the training set. 
 
Therefore, the translation probability )|(
ij
uvP  can be 
approximated as follows: 
.
)(
),(
)|(
i
ji
ij
ucount
vucount
uvP =
  
 (11) 
The probability of the match type, ),( khP , can be es-
timated as follows: 
.
),(
),(
),(
∑∑
=
ij
jicount
khcount
khP
  
 (12) 
For the reason that 
),(
ji
vucount
 is unknown in the 
beginning , a reasonable initial estimate of the parame-
ters of the translation model is to constrain the TU 
alignments of a word pair (E, C) within a position dis-
tance δ  (Lee and Choi, 1997). Assume that 
1−+
=
hp
pi
eu  
and 
1−+
=
kq
qj
cv , and ),(
ji
vud
δ
 is the allowable posi-
tion distance within δ  for the aligned pair (u
i
, v
i
). 
),(
ji
vud
δ
 is defined as follows: 
, 
 
)1(
)1(
             ,
 ),(







<
×−+
−−+
<
×
−
=
δ
δ
δ
n
lkq
hp
and
n
lq
p
vud
ji
 (13) 
where l and n are the length of the source word E and 
the target word C, respectively. 
To accelerate the convergence of EM training and 
reduce the noisy TU aligned pairs (u
i
, v
j
), we restrict the 
combination of TU pairs to limited patterns. Consonant 
TU pairs only with same or similar phonemes are al-
lowed to be matched together. An English consonant is 
also allowed to matching with a Chinese syllable begin-
ning with same or similar phonemes. An English semi-
vowel TU can either be matched with a Chinese 
consonant or a vowel with same or similar phonemes, or 
be matched with a Chinese syllable beginning with 
same or similar phonemes. 
As for the probability of match type, ),( khP , it is 
set to uniform distribution in the initialization phase, 
shown as follows: 
,
1
),(
T
khP =
     (14) 
where T is the total number of match types allowed. 
Based on the Expectation Maximization (EM) algo-
rithm (Dempster et al., 1977) with Viterbi decoding 
(Forney, 1973), the iterative parameter estimation pro-
cedure is described as follows:  
 
Step 1 (Initialization):  
Use Eq. (13) to generate likely TU alignment 
pairs. Calculate the initial model parameters, 
)|(
ij
uvP  and ),( khP , using Eq. (11) and Eq. 
(12). 
Step 2 (Expection):  
Based on current model parameters, find the 
best Viterbi path for each E and C word pair in 
the training set. 
Step 3 (Maximization):  
Based on all the TU alignment pairs obtained 
from Step 2, calculate the new model parame-
ters using Eqs. (11) and (12). Replace the 
model parameters with the new model parame-
ters. If it reaches a stopping criterion or a pre-
defined iteration numbers, then stop the 
training procedure. Otherwise, go back to Step 
2. 
3 Extraction of Transliteration from Par-
allel Text 
The task of machine transliteration is useful for many 
NLP applications, and one interesting related problem is 
how to find the corresponding transliteration for a given 
source word in a parallel corpus. We will describe how 
to apply the proposed model for such a task. 
For that purpose, a sentence alignment procedure is 
applied first to align parallel texts at the sentence level. 
Then, we use a tagger to identify proper nouns in the 
source text. After that, the model is applied to isolate the 
transliteration in the target text.  In general, the pro-
posed transliteration model could be further augmented 
with linguistic processing, which will be described in 
more details in the next subsection. The overall process 
is summarized in Figure 3. 
Bilingual corpus
Sentence alignment
Source 
sentence
Target 
sentence
Proper names: 
Word extraction
Source word
Proper names: 
Source & Target words
Linguistic
processing
Transli-
terator 
Prepro-
cessing
 
Figure 3. The overall process for the extrac-
tion of transliteration from parallel text. 
 
An excerpt from the magazine Scientific American 
(Cibelli et al., 2002) is illustrated as follows: 
 
Source sentence:  
“Rudolf Jaenisch, a cloning expert at the 
Whitehead Institute for Biomedical Re-
search at the Massachusetts Institute of 
Technology, concurred:” 
Target sentence: 
“麻省 理工學院懷海德 生物醫學研究院的複製
專家傑尼西 說： ” 
 
In the above excerpt, three English proper nouns “Jae-
nisch”, “Whitehead”, and “Massachusetts” are identi-
fied by a tagger. Utilizing Eqs. (7) and the DP approach 
formulated by Eqs. (8)-(10), we found the target word 
“huaihaite ( 懷海德 )” most likely corresponding to 
“Whitehead”. In order to retrieve the transliteration for a 
given proper noun, we need to keep track of the optimal 
TU decoding sequence associated with the given Chi-
nese term for each word pair under the proposed 
method. The aligned TUs can be easily obtained via 
backtracking the best Viterbi path (Manning and 
Schutze, 1999). For the example mentioned above, the 
alignments of the TU matching pairs via the Viterbi 
backtracking path are illustrated in Figure 4. 
Match Type            TU Pair
:
0 - 1 ,    -- y
0 - 1 ,    -- u
0 - 1 ,    -- a
0 - 1 ,    -- n
2 - 2 , Wh -- hu
1 - 1 , i     -- a
1 - 0 , t     --
1 - 1 , e    -- i
1 - 1 , h    -- h
0 -1 , --a
2 - 1 , ea  -- i
1 - 2 , d   -- te
0 -1 , --s
0 -1 , --h
0 -1 , --e
0 -1 , --n
0 -1 , --g
:
懷
海
德
院
生
 
Figure 4. The alignments of the TU matching pairs 
via the Viterbi backtracking path. 
 
3.1 Linguistic Processing 
Some language-dependent knowledge can be integrated 
to further improve the performance, especially when we 
focus on specific language pairs. 
 
Linguistic Processing Rule 1 (R1): 
Some source words have both transliteration and trans-
lation, which are equally acceptable and can be used 
interchangeably. For example, the source word “Eng-
land” is translated into “英國  (Yingkou)” and transliter-
ated into “英格蘭  (Yingkolan)”, respectively, as shown 
in Figure 5. Since the proposed model is designed spe-
cifically for transliteration, such cases may cause prob-
lems. One way to overcome this limitation is to handle 
those cases by using a list of commonly used proper 
names and translations. 
England vs. 英國
The Spanish Armada sailed to England in 
1588.
西班牙無敵艦隊於一五八八年出征英國 。
England vs.英格蘭
England is the only country coterminous 
with Wales.
英格蘭 是唯一與威爾斯毗連的國家。
 
Figure 5. Examples of mixed usages of 
translation and transliteration. 
 
Linguistic Processing Rule 2 (R2): 
From error analysis of the aligned results of the training 
set, the proposed approach suffers from the fluid TUs, 
such as “t”, “d”, “tt”, “dd”, “te”, and “de”. Sometimes 
they are omitted in transliteration, and sometimes they 
are transliterated as a Chinese character. For instance, 
“d” is usually transliterated into “特 ”, “得 ”, or “德 ” 
corresponding to Chinese TU of “te”. The English TU 
“d” is transliterated as “德 ” in (Clifford, 克利福德 ), but 
left out in (Radford, 雷德福 ). In the example shown in 
Figure 6, “David (大衛 )” is mistakenly matched up with 
“大衛的 ”.  
(A boy by the name of David.)
名叫大衛 的 一個男孩。
…… Ta Wei Te ………
……… David .
 
Figure 6. Example of the transliterated 
word extraction for “David”. 
 
However, that problem caused by fluid TUs 
can be partly overcome by adding more linguistic 
constraints in the post-processing phase. We calcu-
late the Chinese character distributions of proper 
nouns from a bilingual proper name list. A small 
set of Chinese characters is often used for translit-
eration. Therefore, it is possible to improve the 
performance by pruning extra tailing characters, 
which do not belong to the transliterated character 
set, from the transliteration candidates. For in-
stance, the probability of “的 , 去 , 說 , 是 , 有 ” being 
used in transliteration is very low. So correct trans-
literation “ 大衛 ” for the source word “David” 
could be extracted by removing the character “的 ”.  
3.2 Working Flow by Integrating Linguistic and 
Statistical Information 
Combining the linguistic processing and transliteration 
model, we present the algorithm for transliteration ex-
traction as follows: 
 
Step 1: Look up the translation list as stated in 
R1. If the translation of a source word 
appears in both the entry of the transla-
tion list and the aligned target sentence 
(or paragraph), then pick the translation 
as the target word. Otherwise, go to Step 
2. 
Step 2: Pass the source word and its aligned tar-
get sentence (or paragraph) through the 
proposed model to extract the target 
word.  
Step 3: Apply linguistic processing R2 to re-
move superfluous tailing characters in 
the target word.  
 
After the above processing, the performance of 
source-target word extraction is significantly improved 
over the previous experiment.  
4 Experiments 
In this section, we focus on the setup of experiments 
and performance evaluation for the proposed model. 
4.1 Experimental Setup 
The corpus T0 for training consists of 2,430 pairs of 
English names together with their Chinese translitera-
tions. Two experiments are conducted. In the first ex-
periment, we analyze the convergence characteristics of 
this model training based on a similarity-based frame-
work (Chen et al., 1998; Lin and Chen, 2002). A valida-
tion set T1, consisting of 150 unseen person name pairs, 
was collected from Sinorama Magazine (Sinorama, 
2002). For each transliterated word in T1, a set of 1,557 
proper names is used as potential answers. In the second 
experiment, a parallel corpus T2 was prepared to evalu-
ate the performance of proposed methods. T2 consists of 
500 bilingual examples from the English-Chinese ver-
sion of the Longman Dictionary of Contempory English 
(LDOCE) (Proctor, 1988). 
4.2 Evaluation Metric 
In the first experiment, a set of source words was com-
pared with a given target word, and then was ranked by 
similarity scores. The source word with the highest 
similarity score is chosen as the answer to the back-
transliteration problem. The performance is evaluated 
by rates of the Average Rank (AR) and the Average 
Reciprocal Rank (ARR) following Voorhees and Tice 
(2000). 
∑
=
=
N
i
iR
N
AR
1
)(
1
   
 (15) 
∑
=
=
N
i
iR
N
ARR
1
)(
1
1
   
 (16) 
where N is the number of testing data, and R(i) is the 
rank of the i-th testing data. Higher values of ARR indi-
cate better performance.  
 
0
0.5
1
1.5
2
2.5
3
3.5
12345
Iteration number
Rate of AR
0.72
0.74
0.76
0.78
0.8
0.82
0.84
Rate of ARR
AR
ARR
 
Figure 7. Performance at each iteration on 
the validation set T1. 
 
In Figure 7, we show the rates of AR and ARR for the 
validation set T1 by varying the number of iterations of 
the EM training algorithm from 1 to 6. We note that the 
rates become saturated at the 2nd iteration, which indi-
cates the efficiency of the proposed training approach.  
As for the second experiment, performance on the 
extraction of transliterations is evaluated based on pre-
cision and recall rates on the word and character level. 
Since we consider exact one proper name in the source 
language and one transliteration in the target language at 
a time. The word recall rates are same as word precision 
rates: 
=)(WP Precision Word
 
.
 wordscorrect of number
 wordsextractedcorrectly  of number
 (17) 
 
The character level recall and precision are as follows: 
=)( CPprecision Character
 
,
characters correct of number
characters extractedcorrectly  of number
 (18) 
=)(CR Recall Character
 
.
characters correct of number
characters extractedcorrectly  of number
 (19) 
 
For the purpose of easier evaluation, T2 was de-
signed to contain exact one proper name in the source 
language and one transliteration in the target language 
for each bilingual example. Therefore, if more than one 
proper name occurs in a bilingual example, we separate 
them into several testing examples. We also separate a 
compound proper name in one example into individual 
names to form multiple examples. For example, in the 
first case, two proper names “Tchaikovsky” and “Stra-
vinsky” were found in the testing sample “Tchaikovsky 
and Stravinsky each wrote several famous ballets”. In 
the second case, a compound proper name “Cyril 
Tourneur” was found in “No one knows who wrote that 
play, but it is usually ascribed to Cyril Tourneur”. How-
ever, in the third case, “New York” is transliterated as a 
whole Chinese word “紐約 ”, so it can not be separated 
into two words. Therefore, the testing data for the above 
examples will be semi-automatically constructed. For 
simplicity, we considered each proper name in the 
source sentence in turn and determined its correspond-
ing transliteration independently. Table 1 shows some 
examples of the testing set T2. 
 
Table 1. Part of bilingual examples of the test-
ing set T2. 
 
In the experiment of transliterated word extraction, 
the proposed method achieves on average 86.0% word 
accuracy rate, 94.4% character precision rate, and 
96.3% character recall rate, as shown in row 1 of Table 
2. The performance can be further improved with a sim-
ple statistical and linguistic processing, as shown in 
Table 2. 
 
Methods WP CP CR 
Baseline 86.0% 94.4% 96.3% 
Baseline+R1 88.6% 95.4% 97.7% 
Baseline+R2 90.8% 97.4% 95.9% 
Baseline+R1+R2 94.2% 98.3% 97.7% 
Table 2. The experimental results of transliter-
ated word extraction for T2. 
 
In the baseline model, we find that there are some 
errors caused by translations which are not strictly trans-
literated; and there are some source words transferred 
into target words by means of transliteration and transla-
tion mutually. Therefore, R1 can be viewed as the pre-
processing to extract transliterated words. Some errors 
are further eliminated by R2 which considers the usage 
of the transliterated characters in the target language. In 
this experiment, we use a transliterated character set of 
735 Chinese characters. 
5 Conclusion 
In this paper, we describe a framework to deal with the 
problem of acquiring English-Chinese bilingual translit-
erated word pairs from parallel-aligned texts. An unsu-
pervised learning approach to the proposed machine 
transliteration model is also presented. The proposed 
approach automatically learned the parameters of the 
model from a bilingual proper name list. It is not re-
stricted to the availability of pronunciation dictionary in 
the source language. From the experimental results, it 
indicates that our methods achieve excellent perform-
ance. With the statistical-based characteristic of the pro-
posed model, we plan to extend the experiments to bi-
directional transliteration and other different corpora.  

References 
Yaser Al-Onaizan and Kevin Knight. 2002. Translating 
named entities using monolingual and bilingual re-
sources. In Proceedings of the 40th Annual Meeting 
of the Association for Computational Linguistics 
(ACL), pages 400-408. 
Hsin-Hsi Chen, Sheng-Jie Huang, Yung-Wei Ding, and 
Shih-Chung Tsai. 1998. Proper name translation in 
cross-language information retrieval. In Proceedings 
of 17th COLING and 36th ACL, pages 232-236. 
Yuen Ren Chao. 1968. A Grammar of spoken Chinese. 
Berkeley, University of California Press.  
Dagan, I., Church, K. W., and Gale, W. A. 1993. Robust 
bilingual word alignment for machine aided transla-
tion. In Proceedings of the Workshop on Very Large 
Corpora: Academic and Industrial Perspectives, 
pages 1-8, Columbus Ohio. 
Jose B. Cibelli, Robert P. Lanza, Michael D. West, and 
Carol Ezzell. 2002. What Clones? Scientific Ameri-
can, Inc., New York, January. 
http://www.sciam.com. 
A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. 
Maximum likelihood from incomplete data via the 
EM algorithm. Journal of the Royal Statistical Soci-
ety, 39(1):1-38. 
G. D. Forney. 1973. The Viterbi algorithm. Proceedings 
of IEEE, 61:268-278, March. 
Kevin Knight and Jonathan Graehl. 1998. Machine 
transliteration. Computational Linguistics, 
24(4):599-612. 
Julian Kupiec. 1993. An algorithm for finding noun 
phrase correspondences in bilingual corpora. In Pro-
ceedings of the 40th Annual Conference of the 
Association for Computational Linguistics (ACL), 
pages 17-22, Columbus, Ohio. 
Jae Sung Lee and Key-Sun Choi. 1997. A statistical 
method to generate various foreign word translitera-
tions in multilingual information retrieval system. In 
Proceedings of the 2nd International Workshop on 
Information Retrieval with Asian Languages 
(IRAL'97), pages 123-128, Tsukuba, Japan. 
Wei-Hao Lin and Hsin-Hsi Chen. 2002. Backward 
transliteration by learning phonetic similarity. In 
CoNLL-2002, Sixth Conference on Natural Lan-
guage Learning, Taipei, Taiwan.  
Christopher D. Manning and Hinrich Schutze. 1999. 
Foundations of Statistical Natural Language Proc-
essing, MIT Press; 1st edition. 
I Dan Melamed. 1996. Automatic construction of clean 
broad coverage translation lexicons. In Proceedings 
of the 2nd Conference of the Association for Ma-
chine Translation in the Americas (AMTA'96), 
Montreal, Canada. 
Jong-Hoon Oh and Key-Sun Choi. 2002. An English-
Korean transliteration model using pronunciation 
and contextual rules. In Proceedings of the 19th In-
ternational Conference on Computational Linguis-
tics (COLING), Taipei, Taiwan. 
P. Proctor, 1988. Longman English-Chinese Dictionary 
of Contemporary English, Longman Group (Far 
East) Ltd., Hong Kong. 
Sinorama. 2002. Sinorama Magazine. 
http://www.greatman.com.tw/sinorama.htm. 
Bonnie Glover Stalls and Kevin Knight. 1998. Translat-
ing names and technical terms in Arabic text. In 
Proceedings of the COLING/ACL Workshop on 
Computational Approaches to Semitic Languages. 
Frank Z. Smadja, Kathleen McKeown, and Vasileios 
Hatzivassiloglou. 1996. Translating collocations for 
bilingual lexicons: a statistical approach. Computa-
tional Linguistics, 22(1):1-38. 
Keita Tsuji. 2002. Automatic extraction of translational 
Japanese-KATAKANA and English word pairs 
from bilingual corpora. International Journal of 
Computer Processing of Oriental Languages, 
15(3):261-279. 
Ellen M. Voorhees and Dawn M. Tice. 2000. The trec-8 
question answering track report. In English Text Re-
trieval Conference (TREC-8). 
Stephen Wan and Cornelia Maria Verspoor. 1998. 
Automatic English-Chinese name transliteration for 
development of multilingual resources. In Proceed-
ings of 17th COLING and 36th ACL, pages 1352-
1356. 
J. C. Wells. 2001. Longman Pronunciation Dictionary 
(New Edition), Addison Wesley Longman, Inc. 
Dekai Wu and Xuanyin Xia. 1994. Learning an English-
Chinese lexicon from a parallel corpus. In Proceed-
ings of the First Conference of the Association for 
Machine Translation in the Americas, pages 206–
213. 
