Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT/EMNLP), pages 435–442, Vancouver, October 2005. c©2005 Association for Computational Linguistics
Cluster-specific Named Entity Transliteration 
 
 
Fei Huang 
 
School of Computer Science 
Carnegie Mellon University, Pittsburgh, PA 15213 
fhuang@cs.cmu.edu 
 
Abstract 
Existing named entity (NE) transliteration 
approaches often exploit a general model to 
transliterate NEs, regardless of their origins. 
As a result, both a Chinese name and a 
French name (assuming it is already trans-
lated into Chinese) will be translated into 
English using the same model, which often 
leads to unsatisfactory performance. In this 
paper we propose a cluster-specific NE 
transliteration framework. We group name 
origins into a smaller number of clusters, 
then train transliteration and language mod-
els for each cluster under a statistical ma-
chine translation framework. Given a source 
NE, we first select appropriate models by 
classifying it into the most likely cluster, 
then we transliterate this NE with the corre-
sponding models. We also propose a phrase-
based name transliteration model, which ef-
fectively combines context information for 
transliteration. Our experiments showed 
substantial improvement on the translitera-
tion accuracy over a state-of-the-art baseline 
system, significantly reducing the 
transliteration character error rate from 
50.29% to 12.84%. 
1 Introduction 
Named Entity (NE) translation and transliteration 
are very important to many multilingual natural 
language processing tasks, such as machine trans-
lation, crosslingual information retrieval and ques-
tion answering. Although some frequently 
occurring NEs can be reliably translated using in-
formation from existing bilingual dictionaries and 
parallel or monolingual corpora (Al-Onaizan and 
Knight, 2002; Huang and Vogel, 2002; Lee and 
Chang, 2003), less frequently occurring NEs, espe-
cially new names, still rely on machine translitera-
tion to generate their translations. 
NE machine transliteration generates a phoneti-
cally similar equivalent in the target language for a 
source NE, and transliteration patterns highly de-
pend on the name’s origin, e.g., the country or the 
language family this name is from. For example, 
when transliterating names
1
 from Chinese into 
English, as shown in the following example, the 
same Chinese character “金” is transliterated into 
different English letters according to the origin of 
each person. 
金人庆 --- Jin  Renqing (China) 
金大中 --- Kim Dae-jung (Korea) 
丁马 路德 金 --- Martin Luther King (USA) 
金丸信 --- Kanemaru Shin (Japan) 
何塞 华金 布伦纳 --- Jose Joa quin Brunner (Chile) 
Several approaches have been proposed for 
name transliteration. (Knight and Graehl, 1997) 
proposed a generative transliteration model to 
transliterate foreign names in Japanese back to 
English using finite state transducers. (Stalls and 
Knight, 1998) expanded that model to Arabic-
English transliteration. (Meng et al. 2001) devel-
oped an English-Chinese NE transliteration tech-
nique using pronunciation lexicon and phonetic 
mapping rules. (Virga and Khudanpur, 2003) ap-
plied statistical machine translation models to 
“translate” English names into Chinese characters 
for Mandarin spoken document retrieval. All these 
approaches exploit a general model for NE trans-
literation, where source names from different ori-
gins or language families are transliterated into the 
target language with the same rules or probability 
distributions, which fails to capture their different 
                                                 
1
 Assuming foreign names are already transliterated into Chi-
nese. 
435
transliteration patterns. Alternatively, (Qu and Gre-
fenstette, 2004) applied language identification of 
name origins to select language-specific translit-
erations when back-transliterating Japanese names 
from English to Japanese. However, they only 
classified names into three origins: Chinese, Japa-
nese and English, and they used the Unihan data-
base to obtain the mapping between kenji 
characters and romanji representations.  
Ideally, to explicitly model these transliteration 
differences we should construct a transliteration 
model and a language model for each origin. How-
ever, some origins lack enough name translation 
pairs for reliable model training. In this paper we 
propose a cluster-specific NE transliteration 
framework. Considering that several origins from 
the same language family may share similar trans-
literation patterns, we group these origins into one 
cluster, and build cluster-specific transliteration 
and language models.  
Starting from a list of bilingual NE translation 
pairs with labeled origins, we group closely related 
origins into clusters according to their language 
and transliteration model perplexities. We train 
cluster-specific language and transliteration models 
with merged name translation pairs. Given a source 
name, we first select appropriate models by classi-
fying it into the most likely cluster, then we trans-
literate the source name with the corresponding 
models under the statistical machine translation 
framework. This cluster-specific transliteration 
framework greatly improves the transliteration per-
formance over a general transliteration model. Fur-
ther more, we propose a phrase-based 
transliteration model, which effectively combines 
context information for name transliteration and 
achieves significant improvements over the tradi-
tional character-based transliteration model.  
The rest of the paper is organized as following: 
in section 2 we introduce the NE clustering and 
classification schemes, and we discuss the phrase-
based NE transliteration in section 3. Experiment 
settings and results are given in section 4, which is 
followed by our conclusion. 
2 Name Clustering and Classification 
Provided with a list of bilingual name translation 
pairs whose origins are already labeled, we want to 
find the origin clusters where closely related ori-
gins (countries sharing similar languages or cul-
tural heritages) are grouped together.  
We define the similarity measure between two 
clusters as their LM and TM perplexities. Let 
)},{(
iii
EFS = denote a set of name translation 
pairs from origini , from which model
i
θ is trained: 
),,(
)(it)()( ieici
PPP=θ . Here and are N-gram 
character language models (LM) for source and 
target languages, and is a character translation 
model trained based on IBM translation model 1 
(Brown et.al. 1993). The distance between origin i  
and origin 
)(ic
P
)(ie
P
)(it
P
j  can be symmetrically defined as: 
)|(log
||
1
)|(log
||
1
),(
ij
j
ji
i
SP
S
SP
S
jid θθ −−=
, 
where, assuming name pairs are generated inde-
pendently, 
)]|()(
)|()(log[)|(
)()(
||
1
)()(
t
i
t
ijt
t
ije
S
t
t
i
t
ijt
t
ijcji
EFPEP
FEPFPSP
i
∑
=
+∝θ
 
We calculate the pair-wise distances among 
these origins, and cluster them with group-average 
agglomerative clustering. The distance between 
clusters and is defined as the average dis-
tance between all origin pairs in each cluster. This 
clustering algorithm initially sets each origin as a 
single cluster, then recursively merges the closest 
cluster pair into one cluster until an optimal num-
ber of clusters is formed.  
i
C
j
C
Among all possible cluster configurations, we 
select the optimal cluster number based on the 
model perplexity. Given a held-out data set L, a list 
of name translation pairs from different origins, the 
probability of generating L from a cluster configu-
ration 
ω
Θ is the product of generating each name 
pair from its most likely origin cluster: 
∏
∏
=
Θ∈
=
Θ∈
=
=Θ
||
1
)()(
||
1
)()()(max
)()|,(max)|(
L
t
j
t
je
t
jcj
L
t
jj
tt
j
PEPFP
PEFPLP
θ
θθ
ω
ω
ω
 
We calculate the language model perplexity: 
||/1
)|(log
||
1
)|(2),(
L
LP
L
LPLpp
−
Θ−
Θ==Θ
ωω
ω
, 
and select the model configuration with the small-
est perplexity. We clustered 56K Chinese-English 
name translation pairs from 112 origins, and evalu-
ate the perplexities of different models (number of  
436
 Figure 1. Perplexity value of LMs with different 
number of clusters  
Afghanistan, Algeria, Egypt, Iran, Iraq, 
Jordan, Kuwait, Pakistan, Palestine, 
clusters) with regard to a held-out 3K name pairs. 
As shown in Figure 1, the perplexity curve reaches 
its minimum when . This indicates that the 
optimal cluster number is 45. 
45=n
Table 1 lists some typical origin clusters. One 
may notice that countries speaking languages from 
the same family are often grouped together. These 
countries are either geographically adjacent or his-
torically affiliated. For example, in the English 
cluster, the Netherlands (Dutch) seems an abnor-
mality. In the clustering process it was first 
grouped with the South Africa, which was colo-
nized by the Dutch and the English in the seven-
teenth century. This cluster was further grouped 
into the English-speaking cluster. Finally, some 
origins cannot be merged with any other clusters 
because they have very unique names and transla-
tion patterns, such as China and Japan, thus they 
are kept as single origin clusters.  
For name transliteration task, given a source 
name F we want to classify it into the most likely 
cluster, so that the appropriate cluster-specific 
model can be selected for transliteration. Not 
knowing F’s translation E, we cannot apply the 
translation model and the target language model 
for name origin classification. Instead we train a 
Bayesian classifier based on N-gram source char-
acter language models, and assign the name to the 
cluster with the highest LM probability. Assuming 
a source name is composed of a sequence of source 
characters: . We want to find the 
cluster such that  
},...,,{
21 l
fffF =
*
j
                        (1) 
)()(maxarg
)|()(maxarg
)|(maxarg
)(
*
FPP
FPP
FPj
jcjj
jjj
jj
θ
θθ
θ
=
=
=
where )(
j
P θ is the prior probability of cluster j, 
estimated based on its distribution in all the train-
ing data, and is the probability of generat-
ing this source name based on cluster
)(
)(
FP
jc
j ’s character 
language model. 
3 Phrase-Based Name Transliteration  
Statistical NE transliteration is similar to the statis-
tical machine translation in that an NE translation 
pair can be considered as a parallel sentence pair, 
where “words” are characters in source and target 
languages. Due to the nature of name translitera-
tion, decoding is mostly monotone.  
 
Arabic 
Saudi Arabia, Sudan, Syria, Tunisia, 
Yemen, … 
Spanish- 
Portuguese 
Angola, Argentina, Bolivia, Brazil, 
Chile, Colombia, Cuba, Ecuador, Mex-
ico, Peru, Portugal, Spain, Venezuela, 
… 
English 
Australia, Canada, Netherlands, New 
Zealand, South Africa, UK, USA, … 
Russian Belarus, Kazakhstan, Russia, Ukraine 
East Euro-
pean 
Bosnia and Herzegovina, Croatia, 
Yugoslavia 
French  
(African) 
Benin, Burkina Faso, Cameroon, Cen-
tral African Republic, Congo, Gabon, 
Ivory Coast 
German Austria, Germany, Switzerland 
French Belgium, France, Haiti 
Korean North Korea, South Korea 
Danish- 
Swedish 
Denmark, Norway, Sweden 
Single Clus-
ters 
China 
Japan 
Indonesia 
Israel 
…… 
Table 1 Typical name clusters (n=45) 
437
NE transliteration process can be formalized as: 
)()|(maxarg)|(maxarg* EPEFPFEPE
EE
==  
where 
*
E is the most likely transliteration for the 
source NE F, P(F|E) is the transliteration model 
and P(E) is the character-based target language 
model. We train a transliteration model and a lan-
guage model for each cluster, using the name 
translation pairs from that cluster. 
3.1 Transliteration Model 
A transliteration model provides a conditional 
probability distribution of target candidates for a 
given source transliteration unit: a single character 
or a character sequence, i.e., “phrase”. Given 
enough name translation pairs as training data, we 
can select appropriate source transliteration units, 
identify their target candidates from a character 
alignment path within each name pair, and estimate 
their transliteration probabilities based on their co-
occurrence frequency.  
A naive choice of source transliteration unit is a 
single character. However, single characters lack 
contextual information, and their combinations 
may generate too many unlikely candidates. Moti-
vated by the success of phrase-based machine 
translation approaches (Wu 1997, Och 1999, 
Marcu and Wong 2002 and Vogel et. al., 2003), we 
select transliteration units which are long enough 
to capture contextual information while flexible 
enough to compose new names with other units. 
We discover such source transliteration phrases 
based on a character collocation likelihood ratio 
test (Manning and Schutze 1999). This test accepts 
or rejects a null hypothesis that the occurrence of 
one character is independent of the other, , by 
calculating the likelihood ratio between the inde-
pendent ( ) and dependent ( ) hypotheses: 
1
f
2
f
0
H
1
H
),,(log),,(log
),,(log),,(log
)(
)(
loglog
211221112
1122112
1
0
pcNccLpccL
pcNccLpccL
HL
HL
−−−−
−−+=
=λ
 
L is the likelihood of getting the observed character 
counts under each hypothesis. Assuming the char-
acter occurrence frequency follows a binomial dis-
tribution,        
knk
xx
k
n
xnkL
−
−
⎟
⎟
⎠
⎞
⎜
⎜
⎝
⎛
= )1(),,(
, 
1221
,, ccc  are the frequencies of , and , 
and 
1
f
2
f
21
^ ff
N is the total number of characters. and 
are defined as: 
1
, pp
2
p
N
c
p
2
=
,     
2
12
1
c
c
p =
,     
1
122
2
cN
cc
p
−
−
=
. 
We calculate the likelihood ratio for any adja-
cent source character pairs, and select those pairs 
whose ratios are higher than a predefined threshold.  
Adjacent character bigrams with one character 
overlap can be recursively concatenated to form 
longer source transliteration phrases. All these 
phrases and single characters are combined to con-
struct a cluster-specific phrase segmentation vo-
cabulary list, T. For each name pair in that cluster, 
we  
1. Segment the Chinese character sequence 
into a source transliteration phrase se-
quence based on maximum string match-
ing using T; 
2. Convert Chinese characters into their ro-
manization form, pinyin, then align the 
pinyin with English letters via phonetic 
string matching, as described in (Huang et. 
al., 2003); 
3. Identify the initial phrase alignment path 
based on the character alignment path; 
4. Apply a beam search around the initial 
phrase alignment path, searching for the 
optimal alignment which minimizes the 
overall phrase alignment cost, defined as: 
∑
∈
=
Aa
aiA
i
i
efDA ),(minarg*
. 
Here is the i th source phrase in F, is its tar-
get candidate under alignment A. Their alignment 
cost D is defined as the linear interpolation of the 
phonetic transliteration cost log  and semantic 
translation cost log : 
i
f
i
a
e
trl
P
trans
P
)|(log)1()|(log),( fePfePefD
transtrl
λλ −+=
, 
where is the 
trl
P product of the letter transliteration 
probabilities over aligned pinyin-English letter 
pairs, 
trans
P is the phrase translation probability  
calculated from word translation probabilities, 
where a “word” refers to a Chinese character or a 
English letter. More details about these costs are 
described in (Huang et. al., 2003). λ  is a cluster-
438
specific interpolation weight, reflecting the relative 
contributions of the transliteration cost and the 
translation cost. For example, most Latin language 
names are often phonetically translated into Chi-
nese, thus the transliteration cost is usually the 
dominant feature. However, Japanese names are 
often semantically translated when they contain 
characters borrowed from Chinese, therefore the 
translation cost is more important for the Japanese 
model (λ =0 in this case). We empirically select 
the interpolation weight for each cluster, based on 
their transliteration performance on held-out name 
pairs, and the combined model with optimal inter-
polation weights achieves the best overall perform-
ance. 
We estimate the phrase transliteration probabil-
ity according to their normalized alignment fre-
quencies. We also include frequent sub-name 
translations (first, middle and last names) in the 
transliteration dictionary. Table 2 shows some 
typical transliteration units (characters or phrases) 
from three clusters. They are mostly names or sub-
names capturing cluster-specific transliteration 
patterns. It also illustrates that in different clusters 
the same character has different transliteration 
candidates with different probabilities, which justi-
fies the cluster-specific transliteration modeling. 
 
        穆罕默德 
  mohamed 
阿卜杜勒 
abdul 
艾哈迈德 
ahmed 
Arabic 
尤 : yo (0.27)  y(0.19)  you(0.14)… 
约翰 
john 
威廉 
william 
彼得 
peter 
English 
尤 : u(0.25)  you(0.38)  joo(0.16)… 
弗拉基米尔 
vladimir 
伊万诺夫 
ivanov 
-耶维奇 
-yevich 
Russian 
尤：  yu(0.49)  y(0.08)  iu(0.07)… 
Table 2. Transliteration units examples from three 
name clusters. 
3.2 Language model and decoding 
For each cluster we train a target character lan-
guage model from target NEs. We use the N-gram 
models with standard smoothing techniques. 
During monotone decoding, a source NE is 
segmented into a sequence of transliteration units, 
and each source unit is associated with a set of tar-
get candidate translations with corresponding prob-
abilities. A transliteration lattice is constructed to 
generate all transliteration hypotheses, among 
which the one with the minimum transliteration 
and language model costs is selected as the final 
hypothesis.   
4 Experiment Results 
We selected 62K Chinese-English person name 
translation pairs for experiments. These origin-
labeled NE translation pairs are from the name 
entity translation lists provided by the LDC
2
 
(including the who’swho (china) and who’swho 
(international) lists), and devided into three parts: 
system training (90%), development (5%) and 
testing (5%). In the development and test data, 
names from each cluster followed the same 
distribution as in the training data. 
4.1 NE Classification Evaluation 
We evaluated the source name classification ac-
curacy, because classification errors will lead to 
incorrect model selection, and result in bad 
transliteration performance in the next step. We 
trained 45 cluster-specific N-gram source character 
language models, and classified each source name 
into the most likely cluster according to formula 1. 
We evaluated the classification accuracy on a held-
out test set with 3K NE pairs. We also experi-
mented with different N values. Table 3 shows the 
classification accuracy, where the 3-gram model 
achieves the highest classification accuracy. A de-
tailed analysis indicates that some classification 
errors are due to the inherent uncertainty of some 
names, e. g, “骆家辉  (Gary Locke)”, a Chinese 
American, was classified as a Chinese name based 
on its source characters while his origin was la-
beled as USA. 
 
Table 3. Source name origin classification accura-
cies
                                                 
2
 http://www.ldc.upenn.edu 
N=2 N=3 N=4 N=5 N=6 N=7 
83.62 84.88 84.00 84.04 83.94 83.94
439
4.2 NE Transliteration Evaluation 
We first evaluated transliteration results for each 
cluster, then evaluated the overall results on the 
whole test set, where a name was transliterated 
using the cluster-specific model in which it was 
classified. The evaluation metrics are:  
• Top1 accuracy (Top1), the percentage 
that the top1 hypothesis is correct, i.e., 
the same as the reference translation; 
• Top 5 accuracy (Top5), the percentage 
that the reference translation appears in 
the generated top 5 hypotheses; 
• Character error rate (CER), the percent-
age of incorrect characters (inserted, de-
leted and substituted English letters) 
when the top 1 hypothesis is aligned to 
the reference translation. 
Our baseline system was a character-based 
general  transliteration model, where 56K NE pairs 
from all clusters were merged to train a general 
transliteration model and a language model 
(CharGen). We compare it with a character-based 
cluster-specific model (CharCls) and a phrase-
based cluster-specific model (PhraCls). The CERs 
of several typical clusters are shown in Table 4. 
Because more than half of the training name 
pairs are from Latin language clusters, the general 
transliteration and language models adopted the 
Latin name transliteration patterns. As a result, it 
obtained reasonable performance (20-30% CERs) 
on  Latin language names, such as Spanish, 
English and French names, but strikingly high 
(over 70%) CERs on oriental language names such 
as Chinese and Japanese names, even though the 
Chinese cluster has the most training data.  
When applying the character-based cluster-
specific models, transliteration CERs consistently 
decreased for all clusters (ranging from 6.13% 
relative reduction for the English cluster to 97% 
for the Chinese cluster). As expected, the oriental 
language names obtained the most significant error 
reduction because the cluster-specific models were 
able to represent their unique transliteration 
patterns. When we applied the phrased-based 
transliteration models, CERs were further reduced 
by 23% ~ 51% for most clusters, because the 
context information were encapsulated in the 
transliteration phrases. An exception was the 
Chinese cluster, where names were often translated 
according to the pinyin of single characters, thus 
phrase-based transliteration slightly decreased the 
performance.  
The transliteration performance of different 
clusters varied a lot. The Chinese cluster achieved 
96.09% top 1 accuracy and 1.69% CER with the 
character-based model, and other clusters had 
CERs ranging from 7% to 30%. This was partly 
because of the lack of training data (e.g, for the 
Japanese cluster), and partly because of unique 
transliteration patterns of different languages. We  
try to measure this difference using the average 
number of translations per source phrase 
(AvgTrans), as shown in Table 4. This feature 
reflected the transliteration pattern regularity, and 
seemed linearly correlated with the CERs. For 
example, compared with the English cluster, 
Russian names have more regular translation 
patterns, and its CER is only 1/3 of the English 
cluster, even with only half size of training data.  
In Table 5 we compared translation examples 
from the baseline system (CharGen), the phrase-
based cluster-specific system (PhraCls) and a 
online machine translation system, the BabelFish
3
. 
The CharGen system transliterated every name in 
the Latin romanization way, regardless of each 
name’s original language. The BabelFish system 
inappropriately translated source characters based 
on their semantic meanings, and the results were 
difficult to understand. The PhraCls model 
captured cluster-specific contextual information, 
and achieved the best results. 
We evaluated three models’ performances on all 
the test data, and showed the result in Table 6. The 
CharGen model performed rather poorly 
transliterating oriental names, and the overall CER 
was around 50%. This result was comparable to 
other state-of-the-art statistical name transliteration 
systems (Virga and Khudanpur, 2003). The 
CharCls model significantly improved the top1 
and top 5 transliteration accuracies from 3.78% to 
51.08%, and from 5.84% to 56.50%, respectively.  
Consistently, the CER was also reduced from 
50.29% to 14.00%. Phrase-based transliteration 
further increased the top 1 accuracy by 9.3%, top 5 
accuracy by 10.7%, and reduced the CER by 8%, 
relatively. All these improvements were 
statistically significant. 
                                                 
3
 http://babelfish.altavista.com/ 
440
  
 
 
 
Table 4. Cluster-specific transliteration comparison 
 
 
 
 
Table 5. Transliteration examples from some typical clusters 
 
 
 
 
Cluster 
Training 
data size 
CharGen 
(CER) 
CharCls 
(CER) 
PhraCls 
(CER) 
AvgTrans 
Arabic 8336 22.88 18.93 14.47 4.58 
Chinese 27093 76.45 1.69 1.71 3.43 
English 8778 31.12 29.21 17.27 5.02 
French 2328 27.66 18.81 9.07 3.51 
Japanese 2161 86.94 38.65 29.60 7.57
 
Russian 4407 29.17 9.62 6.55 3.64 
Spanish 8267 18.87 15.99 10.33 3.61 
Cluster Source  Reference CharGen PhraCls BabelFish 
Arabic 
纳吉  萨布里
艾哈迈德  
Nagui Sabri 
Ahmed 
Naji Saburi 
Ahamed 
Naji Sabri  
Ahmed 
In natrium 吉
萨 cloth    
Aihamaide 
Chinese 范志伦  Fan Zhilun Van Tylen Fan zhilun Fan Zhilun 
English 
罗伯特        
斯特德沃德  
Robert    
Steadward 
Robert   
Stdwad 
Robert       
Sterdeward 
Robert Stead 
Warder 
French 
让 -吕克       
科雷捷  
Jean-luc    
Cretier 
Jean-luk  
Crete 
Jean-luc    
Cretier 
Let - Lu Keke 
lei Jie 
Japanese 小林隆治  
Kobayashi 
Ryoji 
Felinonge 
Kobayashi 
Takaji 
Xiaolin pros-
perous gov-
erns 
Russian 
弗拉基米尔  
萨姆索诺夫  
Vladimir  
Samsonov 
Frakimir  
Samsonof 
Vladimir  
Samsonov 
弗拉基 mil 
sum rope 
Knoff 
Spanish 
道夫鲁       
卡多索 
Rodolfo     
Cardoso 
Rudouf      
Cardoso 
Rodolfo 
Cadozo 
Rudolph card 
multi- ropes 
441
Model Top1 (%) Top5 (%) CER (%) 
CharGen 3.78±0.69 5.84±0.88 50.29±1.21 
CharCls 51.08±0.84 56.50±0.87 14.00±0.34 
PhraCls 56.00±0.84 62.66±0.91 12.84±0.41 
Table 6 Transliteration result comparison 
5 Conclusion 
We have proposed a cluster-specific NE translit-
eration framework. This framework effectively 
modeled the transliteration differences of source 
names from different origins, and has demon-
strated substantial improvement over the baseline 
general model. Additionally, phrase-based translit-
eration further improved the transliteration per-
formance by a significant margin. 
 
References  
Y. Al-Onaizan and K. Knight. 2002. Translating 
named entities using monolingual and bilingual 
resources. In Proceedings of the ACL-2002, 
pp400-408, Philadelphia, PA, July, 2002. 
F. Huang and S. Vogel. 2002. Improved Named 
Entity Translation and Bilingual Named Entity 
Extraction, Proceedings of the ICMI-2002. 
Pittsburgh, PA, October 2002 
F. Huang, S. Vogel and A. Waibel. 2003. Auto-
matic Extraction of Named Entity Translingual 
Equivalence Based on Multi-feature Cost Mini-
mization.  Proceedings of the ACL-2003, Work-
shop on Multilingual and Mixed Language 
Named Entity Recognition. Sapporo, Japan. 
K. Knight and J. Graehl. 1997. Machine Translit-
eration. Proceedings of the ACL-1997. pp.128-
135, Somerset, New Jersey. 
C. J. Lee and J. S. Chang. 2003. Acquisition of 
English-Chinese Transliterated Word Pairs from 
Parallel-Aligned Texts using a Statistical Ma-
chine Transliteration Model. HLT-NAACL 2003 
Workshop: Building and Using Parallel Texts: 
Data Driven Machine Translation and Beyond. 
pp96-103, Edmonton, Alberta, Canada. 
C. D. Manning and H. Schütze. 1999. Foundations 
of Statistical Natural Language Processing. MIT 
Press. Boston MA. 
H. Meng, W. K. Lo, B. Chen and K. Tang. 2001. 
Generating Phonetic Cognates to Handle Named 
Entities in English-Chinese Cross-Language 
Spoken Document Retrieval. Proceedings of the 
ASRU-2001, Trento, Italy, December.2001 
D. Marcu and W. Wong. A Phrase-Based, Joint 
Probability Model for Statistical Machine Trans-
lation. Proceedings of EMNLP-2002, Philadel-
phia, PA, 2002 
F. J. Och, C. Tillmann, and H. Ney. Improved 
Alignment Models for Statistical Machine 
Translation. pp. 20-28; Proc. of the Joint Conf. 
of Empirical Methods in Natural Language 
Processing and Very Large Corpora; University 
of Maryland, College Park, MD, June 1999. 
Y. Qu, and G. Grefenstette. Finding Ideographic 
Representations of Japanese Names Written in 
Latin Script via Language Identification and 
Corpus Validation. ACL 2004: 183-190 
P. Virga and S. Khudanpur. 2003. Transliteration 
of Proper Names in Cross-Lingual Information 
Retrieval. Proceedings of the ACL-2003 Work-
shop on Multi-lingual Named Entity Recognition 
Japan. July 2003. 
S. Vogel, Y. Zhang, F. Huang, A. Tribble, A. 
Venogupal, B. Zhao and A. Waibel. The CMU 
Statistical Translation System, Proceedings of 
MT Summit IX New Orleans, LA, USA, Sep-
tember 2003 
D. Wu. Stochastic Inversion Transduction Gram-
mars and Bilingual Parsing of Parallel Corpora. 
Computational Linguistics 23(3):377-404, Sep-
tember 1997. 
442
