Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 81–88,
Sydney, July 2006. c©2006 Association for Computational Linguistics
A High-Accurate Chinese-English NE Backward Translation System 
Combining Both Lexical Information and Web Statistics 
 
 
Conrad Chen Hsin-Hsi Chen    
Department of Computer Science and Information Engineering, National 
Taiwan University, Taipei, Taiwan 
drchen@nlg.csie.ntu.edu.tw hhchen@csie.ntu.edu.tw 
 
  
 
Abstract 
Named entity translation is indispensable 
in cross language information retrieval 
nowadays. We propose an approach of 
combining lexical information, web sta-
tistics, and inverse search based on 
Google to backward translate a Chinese 
named entity (NE) into English. Our sys-
tem achieves a high Top-1 accuracy of 
87.6%, which is a relatively good per-
formance reported in this area until pre-
sent. 
1 Introduction 
Translation of named entities (NE) attracts much 
attention due to its practical applications in 
World Wide Web. The most challenging issue 
behind is: the genres of NEs are various, NEs are 
open vocabulary and their translations are very 
flexible. 
Some previous approaches use phonetic simi-
larity to identify corresponding transliterations, 
i.e., translation by phonetic values (Lin and Chen, 
2002; Lee and Chang, 2003). Some approaches 
combine lexical (phonetic and meaning) and se-
mantic information to find corresponding transla-
tion of NEs in bilingual corpora (Feng et al., 
2004; Huang et al., 2004; Lam et al., 2004). 
These studies focus on the alignment of NEs in 
parallel or comparable corpora.  That is called 
“close-ended” NE translation. 
In “open-ended” NE translation, an arbitrary 
NE is given, and we want to find its correspond-
ing translations. Most previous approaches ex-
ploit web search engine to help find translating 
candidates on the Internet. Al-Onaizan and 
Knight (2003) adopt language models to generate 
possible candidates first, and then verify these 
candidates by web statistics. They achieve a Top-
1 accuracy of about 72.6% with Arabic-to-
English translation. Lu et al. (2004) use statistics 
of anchor texts in web search result to identify 
translation and obtain a Top-1 accuracy of about 
63.6% in translating English out-of-vocabulary 
(OOV) words into Traditional Chinese. Zhang et 
al. (2005) use query expansion to retrieve candi-
dates and then use lexical information, frequen-
cies, and distances to find the correct translation. 
They achieve a Top-1 accuracy of 81.0% and 
claim that they outperform state-of-the-art OOV 
translation techniques then. 
In this paper, we propose a three-step ap-
proach based on Google to deal with open-ended 
Chinese-to-English translation. Our system inte-
grates various features which have been used by 
previous approaches in a novel way.  We observe 
that most foreign Chinese NEs would have their 
corresponding English translations appearing in 
their returned snippets by Google. Therefore we 
combine lexical information and web statistics to 
find corresponding translations of given Chinese 
foreign NEs in returned snippets. A highly effec-
tive verification process, inverse search, is then 
adopted and raises the performance in a signifi-
cant degree. Our approach achieves an overall 
Top-1 accuracy of 87.6% and a relatively high 
Top-4 accurracy of 94.7%.   
2 Background 
Translating NEs, which is different from translat-
ing common words, is an “asymmetric” transla-
tion. Translations of an NE in various languages 
can be organized as a tree according to the rela-
tions of translation language pairs, as shown in 
Figure 1. The root of the translating tree is the 
NE in its original language, i.e., initially de-
81
nominated. We call the translation of an NE 
along the tree downward as a “forward transla-
tion”. On the contrary, “backward translation” is 
to translate an NE along the tree upward. 
 
Figure 1. Translating tree of “Cien años soledad”. 
Generally speaking, forward translation is eas-
ier than backward translation. On the one hand, 
there is no unique answer to forward translation. 
Many alternative ways can be adopted to forward 
translate an NE from one language to another. 
For example, “Jordan” can be translated into “喬
丹  (Qiao-Dan)”, “喬登  (Qiao-Deng)”, “約旦  
(Yue-Dan)”, and so on. On the other hand, there 
is generally one unique corresponding term in 
backward translation, especially when the target 
language is the root of the translating tree.  
In addition, when the original NE appears in 
documents in the target language in forward 
translation, it often comes together with a corre-
sponding translation in the target language 
(Cheng et al., 2004). That makes forward transla-
tion less challenging. In this paper, we focus our 
study on Chinese-English backward translation, 
i.e., the original language of NE and the target 
language in translation is English, and the source 
language to be translated is Chinese.  
There are two important issues shown below 
to deal with backward translation of NEs or 
OOV words.  
• Where to find the corresponding translation? 
• How to identify the correct translation?  
NEs seldom appear in multi-lingual or even 
mono-lingual dictionaries, i.e., they are OOV or 
unknown words. For unknown words, where can 
we find its corresponding translation? A bilin-
gual corpus might be a possible solution. How-
ever, NEs appear in a vast context and bilingual 
corpora available can only cover a small propor-
tion. Most text resources are monolingual. Can 
we find translations of NEs in monolingual cor-
pora? While mentioning a translated name during 
writing, sometimes we would annotate it with its 
original name in the original foreign language, 
especially when the name is less commonly 
known. But how often would it happen? With 
our testing data, which would be introduced in 
Section 4, over 97% of translated NEs would 
have its original NE appearing in the first 100 
returned snippets by Google. Figure 2 shows 
several snippets returned by Google which con-
tains the original NE of the given foreign NE.  
Figure 2. Several Traditional Chinese snippets of 
“老人與海 ” returned by Google which contains 
the translation “The Old Man and the Sea”. 
When translations can be found in snippets, 
the next work would be identifying which name 
is the correct translation of NEs. First we should 
know how NEs would be translated. The com-
monest case is translating by phonetic values, or 
so-called transliteration. Most personal names 
and location names are transliterated. NEs may 
also be translated by meaning. It is the way in 
which most titles and nicknames and some or-
ganization names would be translated. Another 
common case is translating by phonetic values 
for some parts and by meaning for the others. For 
example, “Sears Tower” is translated into “西爾
斯 (Xi-Er-Si) 大 廈 (tower)” in Chinese. NEs 
would sometimes be translated by semantics or 
contents of the entity it indicates, especially with 
movies. Table 1 summarizes the possible trans-
lating ways of NEs. From the above discussion, 
we may use similarities in phonetic values, 
meanings of constituent words, semantics, and so 
CEPS 思博網 -- 文章書目 ;-1  
篇名 , 《 老人與海 》的象徵手法及作者的人生哲學 . 並列篇
名 , Symbolic Means of the Author "The Old Man and the 
Sea" ... 摘要 , 以象徵分析的方法對 《 老人與海 》中老人 、  
海、大魚等元素的象徵涵義進行了探索和解讀 ，分析了海明
威在小說中闡述的主題 ： “ ... 
www.ceps.com.tw/ec/ecjnlarticleView.aspx?jnlcattype=1& 
jnlptype=4&jnltype=29&jnliid=1370&i... - 26k - 頁庫存檔  - 類
似網頁  
 
.:JSDVD Mall:. 世界名著 -老人與海   
世界名著 -老人與海  · 太陽馬戲團 -夢幻人生 (DTS) · 紐約放電
俏姐妹  · 懷舊電影系列 16-秋決  · 艾瑪  · 奪命訓練班  · 新好男
孩 -電視演唱會  · 神鬼認證 -特別版  ... 世界名著 -老人與海 . The 
Old Man and The Sea. 4715320115018, 我們提供的付款方
式  ... 
mall.jsdvd.com/product_info.php?products_id=3198 - 48k - 補
充資料  - 頁庫存檔  - 類似網頁   
82
on to identify corresponding translations. Besides 
these linguistic features, non-linguistic features 
such as statistical information may also help use 
well. We would discuss how to combine these 
features to identify corresponding translation in 
detail in the next section.  
3 Chinese-to-English NE Translation 
As we have mentioned in the last section, we 
could find most English translations in Chinese 
web page snippets. We thus base our system on 
web search engine: retrieving candidates from 
returned snippets, combining both linguistic and 
statistical information to find the correct transla-
tion. Our system can be split into three steps: 
candidate retrieving, candidate evaluating, and 
candidate verifying. An overview of our system 
is given in Figure 3.  
 
Figure 3. An Overview of the System. 
In the first step, the NE to be translated, GN, 
is sent to Google to retrieve traditional Chinese 
web pages, and a simple English NE recognition 
method and several preprocessing procedures 
are applied to obtain possible candidates from 
returned snippets. In the second step, four fea-
tures (i.e., phonetic values, word senses, recur-
rences, and relative positions) are exploited to 
give these candidates a score. In the last step, the 
candidates with higher scores are sent to Google 
again. Recurrence information and relative posi-
tions concerning with the candidate to be veri-
fied of GN in returned snippets are counted 
along with the scores to decide the final ranking 
of candidates. These three steps will be detailed 
in the following subsections. 
3.1 Retrieving Candidates 
Before we can identify possible candidates, we 
must retrieve them first. In the returned tradi-
tional Chinese snippets by Google, there are still 
many English fragments. Therefore, the first 
task our system would do is to separate these 
English fragments into NEs and non-NEs. We 
propose a simple method to recognize possible 
NEs. All fragments conforming to the following 
properties would be recognized as NEs: 
• The first and the last word of the fragment 
are numerals or capitalized. 
• There are no three or more consequent low-
ercase words in the fragment. 
• The whole fragment is within one sentence. 
After retrieving possible NEs in returned snip-
pets, there are still some works to do to make a 
Translating Way Description Examples 
Translating by Pho-
netic Values 
The translation would have a similar 
pronunciation to its original NE. 
 “New York” and “紐約 (pronounced as Niu-
Yue)” 
Translating by Mean-
ing 
The translation would have a similar or a 
related meaning to its original NE. 
“ 紅 (red) 樓 (chamber) 夢 (dream)” and “The 
Dream of the Red Chamber” 
Translating by Pho-
netic Values for Some 
Parts and by Meaning 
for the Others 
The entire NE is supposed to be trans-
lated by its meaning and the name parts 
are transliterated. 
 “Uncle Tom’s Cabin” and “湯姆 (pronounced 
as Tang-Mu)叔叔的 (uncle’s)小屋 (cabin)” 
Translating by Both 
Phonetic Values and 
Meaning 
The translation would have both a similar 
pronunciation and a similar meaning to 
its original NE. 
“New Yorker” and “紐約 (pronounced as Niu-
Yue)客 (people, pronounced as Ke)” 
Translating NEs by 
Heterography 
The NE is translated by these hetero-
graphic words in neighboring languages. 
“ 橫濱 ” and “Yokohama”, “鈴木一朗 ” and 
“Ichiro Suzuki” 
Translating by Se-
mantic or Content 
The NE is translated by its semantic or 
the content of the entity it refers to. 
“The Mask” and “摩登 (modern)大 (great)聖
(saint)” 
Parallel Names NE is initially denominated as more than 
one name or in more than one language. 
“孫中山 (Sun Zhong-Shan)” and “Sun Yat-Sen” 
Table 1. Possible translating ways of NEs. 
83
finer candidate list for verification. First, there 
might be many different forms for a same NE. 
For example, “Mr. & Mrs. Smith” may also ap-
pear in the form of “Mr. and Mrs. Smith”, “Mr. 
And Mrs. Smith”, and so on. To deal with these 
aliasing forms, we transform all different forms 
into a standard form for the later ranking and 
identification. The standard form follows the 
following rules: 
• All letters are transformed into upper cases. 
• Words consist “’”s are split. 
• Symbols are rewritten into words. 
For example, all forms of “Mr. & Mrs. Smith” 
would be transformed into “MR. AND MRS. 
SMITH”. 
The second work we should complete before 
ranking is filtering useless substrings. An NE 
may comprise many single words. These com-
ponent words may all be capitalized and thus all 
substrings of this NE would be fetched as candi-
dates of our translation work. Therefore, sub-
strings which always appear with a same preced-
ing and following word are discarded here, since 
they would have a zero recurrence score in the 
next step, which would be detailed in the next 
subsection. 
3.2 Evaluating Candidates 
After candidate retrieving, we would obtain a 
sequence of m candidates, C1, C2, …, Cm. An 
integrated evaluating model is introduced to ex-
ploit four features (phonetic values, word senses, 
recurrences, and relative positions) to score 
these m candidates, as the following equation 
suggests: 
),(),(
),(
GNCLScoreGNCSScore
GNCScore
ii
i
⋅
=  
LScore(Ci,GN) combines phonetic values and 
word senses to evaluate the lexical similarity 
between Ci and GN. SScore(Ci,GN) concerns 
both recurrences information and relative posi-
tions to evaluate the statistical relationship be-
tween Ci and GN. These two scores are then 
combined to obtain Score(Ci,GN). How to esti-
mate LScore(Cn, GN) and SScore(Cn, GN) would 
be discussed in detail in the following subsec-
tions. 
3.2.1 Lexical Similarity 
The lexical similarity concerns both phonetic 
values and word senses. An NE may consist of 
many single words. These component words 
may be translated either by phonetic values or 
by word senses. Given a translation pair, we 
could split them into fragments which could be 
bipartite matched according to their translation 
relationships, as Figure 4 shows.  
 
Figure 4. The translation relationships of “湯姆
叔叔的小屋 ”. 
To identify the lexical similarity between two 
NEs, we could estimate the similarity scores be-
tween the matched fragment pairs first, and then 
sum them up as a total score. We postulate that 
the matching with the highest score is the correct 
matching. Therefore the problem becomes a 
weighted bipartite matching problem, i.e., given 
the similarity scores between any fragment pairs, 
to find the bipartite matching with the highest 
score. In this way, our next problem is how to 
estimate the similarity scores between fragments.  
We treat an English single word as a fragment 
unit, i.e., each English single word corresponds 
to one fragment. An English candidate Ci con-
sisting of n single words would be split into n 
fragment units, Ci1, Ci2, …, Cin. We define a Chi-
nese fragment unit that it could comprise one to 
four characters and may overlap each other. A 
fragment unit of GN can be written as GNab, 
which denotes the ath to bth characters of GN, 
and b - a < 4. The linguistic similarity score be-
tween two fragments is:  
)},(),,({
),(
ijabijab
ijab
CGNWSSimCGNPVSimMax
CGNLSim =  
Where PVSim() estimates the similarity in pho-
netic values while WSSim() estimate it in word 
senses.  
square6 Phonetic Value 
In this paper, we adopt a simple but novel 
method to estimate the similarity in phonetic 
values. Unlike many approaches, we don’t in-
troduce an intermediate phonetic alphabet sys-
tem for comparison. We first transform the Chi-
nese fragments into possible English strings, and 
then estimate the similarity between transformed 
strings and English candidates in surface strings, 
as Figure 5 shows. However, similar pronuncia-
tions does not equal to similar surface strings. 
Two quite dissimilar strings may have very simi-
lar pronunciations. Therefore, we take this strat-
84
egy: generate all possible transformations, and 
regard the one with the highest similarity as the 
English candidate. 
 
Figure 5. Phonetic similarity estimation of our 
system. 
Edit distances are usually used to estimate the 
surface similarity between strings. However, the 
typical edit distance does not completely satisfy 
the requirement in the context of translation 
identification. In translation, vowels are an unre-
liable feature. There are many variations in pro-
nunciation of vowels, and the combinations of 
vowels are numerous. Different combinations of 
vowels may have a same phonetic value, how-
ever, same combinations may pronounce totally 
differently. The worst of all, human often arbi-
trarily determine the pronunciation of unfamiliar 
vowel combinations in translation. For these rea-
sons, we adopt the strategy that vowels can be 
ignored in transformation. That is to say when it 
is hard to determine which vowel combination 
should be generated from given Chinese frag-
ments, we can only transform the more certain 
part of consonants. Thus during the calculation 
of edit distances, the insertion of vowels would 
not be calculated into edit distances. Finally, the 
modified edit distance between two strings A 
and B is defined as follow: 

 ==

=





+−−
+−
+−
=
=
=
→
→
→
→
→
→
else
BAiftsRep
consonantaisBif
vowlaisBiftIns
tsReptsED
tsED
tInstsED
tsED
ssED
ttED
ts
t
t
BA
BA
BA
BA
BA
BA
,1
,0),(
,1
,0)(
),()1,1(
,1),1(
),()1,(
min),(
)0,(
),0(
 
The modified edit distances are then transformed 
to similarity scores: 
)}(),(max{
))(),((1),(
BLenALen
BLenALenEDBAPVSim BA→−=  
Len() denotes the length of the string. In the 
above equation, the similarity scores are ranged 
from 0 to 1. 
We build the fixed transformation table manu-
ally. All possible transformations from Chinese 
transliterating characters to corresponding Eng-
lish strings are built. If we cannot precisely indi-
cate which vowel combination should be trans-
formed, or there are too many possible combina-
tions, we ignores vowels. Then we use a training 
set of 3,000 transliteration names to examine 
possible omissions due to human ignorance.  
square6 Word Senses 
More or less similar to the estimation of pho-
netic similarity, we do not use an intermediate 
representation of meanings to estimate word 
sense similarity. We treat the English transla-
tions in the C-E bilingual dictionary (reference 
removed for blind review) directly as the word 
senses of their corresponding Chinese word en-
tries. We adopt a simple 0-or-1 estimation of 
word sense similarity between two strings A and 
B, as the following equation suggests: 





=
dictionary in the
  ofon  translatia is  if ,1
dictionary in the    
  ofon  translatianot  is  if,0
),( AB
AB
BAWSSim  
All the Chinese foreign names appearing in test 
data is removed from the dictionary. 
From the above equations we could derive 
that LSim() of fragment pairs is also ranged from 
0 to 1. Candidates to be evaluated may comprise 
different number of component words, and this 
would result the different scoring base of the 
weighted bipartite matching. We should normal-
ize the result scores of bipartite matching. As a 
result, the following equation is applied: 










+−⋅
=
∑
∑
GN
abCGNLSim
C
CGNLSim
GNCLScore
ijab
ijab
CGN ijab
i
CGN ijab
i
in  characters of # Total
 )1(),(
,in   wordsof # Total
),(
min
),(
 and  pairs matched all
 and  pairs matched all  
3.2.2 Statistical Similarity 
Two pieces of information are concerned to-
gether to estimate the statistical similarity: recur-
rences and relative positions. A candidate Ci 
might appear l times in the returned snippets, as 
Ci,1, Ci,2, …, Ci,l. For each Ci,k, we find the dis-
85
tance between it and the nearest GN in the re-
turned snippets, and then compute the relative 
position scores as the following equation: 
  14/),(
1),(
,
, +=
ki
ki CGNDistanceGNCRP
 
In other words, if the candidate is adjacent to the 
given NE, it would have a relative position score 
of 1. Relative position scores of all Ci,k would be 
summed up to obtain the primitive statistical 
score: 
PSS(Ci, GN) = ∑ k RP(Cn,k, GN) 
As we mentioned before, since the impreci-
sion of NE recognition, most substrings of NEs 
would also be recognized as candidates. This 
would result a problem. There are often typos in 
the information provided on the Internet. If some 
component word of an NE is misspelled, the 
substrings constituted by the rest words would 
have a higher statistical score than the correct 
NE. To prevent such kind of situations, we in-
troduce entropy of the context of the candidate. 
If a candidate has a more varied context, it is 
more possible to be an independent term instead 
of a substring of other terms. Entropy provides 
such a property: if the possible cases are more 
varied, there is higher entropy, and vice versa. 
Entropy function here concerns the possible 
cases of the most adjacent word at both ends of 
the candidate, as the following equation suggests: 


⋅−
=
=
∑ i iCT irNPTir
i
NCNCTNCNCT
CEntropy
else ,/log/
1context  possible of #  while,1
) ofContext (
 
Where NCTr and NCi denote the appearing times 
of the rth context CTr and the candidate Ci in the 
returned snippets respectively, and NPTi denotes 
the total number of different cases of the context 
of Ci. Since we want to normalize the entropy to 
0~1, we take NPTi as the base of the logarithm 
function. 
While concerning context combinations, only 
capitalized English word is discriminated. All 
other words would be viewed as one sort 
“OTHER”. For example, assuming the context 
of “David” comprises three times of (Craig, 
OTHER), three times of (OTHER, Stern), and 
six times of (OTHER, OTHER), then: 
946.0)126log126123log123123log123(
)David"" ofContext (
333 =⋅+⋅+−
=Entropy  
Next we use Entropy(Context of Ci) to weight 
the primitive score PSS(Ci, GN) to obtain the 
final statistical score.: 
)() ofContext (
)(
,GNCPSSCEntropy
,GNCSScore
ii
i
⋅
=  
3.3 Verifying Candidates 
In evaluating candidate, we concern only the 
appearing frequencies of candidates when the 
NE to be translated is presented. In the other 
direction, we should also concern the appearing 
frequencies of the NE to be translated when the 
candidate is presented to prevent common words 
getting an improper high score in evaluation. We 
perform the inverse search approach for this 
sake. Like the evaluation of statistical scores in 
the last step, candidates are sent to Google to 
retrieve Traditional Chinese snippets, and the 
same equation of SScore() is computed concern-
ing the candidate. However, since there are too 
many candidates, we cannot perform this proc-
ess on all candidates. Therefore, an elimination 
mechanism is adopted to select candidates for 
verification. The elimination mechanism works 
as follows: 
1. Send the Top-3 candidates into Google for 
verification. 
2. Count SScore(GN, Ci). (Notice that the or-
der of the parameter is reversed.) Re-weight 
Score(Ci, GN) by multiplying SScore(GN, 
Ci) 
3. Re-rank candidates 
4. After re-ranking, if new candidates become 
the Top-3 ones, redo the first step. Other-
wise end this process. 
The candidates have been verified would be re-
corded to prevent duplicate re-weighting and 
unnecessary verification.  
There is one problem in verification we 
should concern. Since we only consider recur-
rence information in both directions, but not co-
occurrence information, this would result some 
problem when dealing rarely used translations. 
For example, “Peter Pan” can be translated into 
“彼得潘 ” or “彼德潘 ” (both pronounced as Bi-
De-Pan) in Chinese, but most people would use 
the former translation. Thus if we send “Peter 
Pan” to verification when translating “彼德潘 ”, 
we would get a very low score.  
To deal with this situation, we adopt the strat-
egy of disbelieving verification in some situa-
86
tions. If all candidates have scores lower than 
the threshold, we presume that the given NE is a 
rarely used translation. In this situation, we use 
only Score(Cn, GN) estimated by  the evaluation 
step to rank its candidates, without multiplying 
SScore(GN, Ci) of the inverse search. The 
threshold is set to 1.5 by heuristic, since we con-
sider that a commonly used translation is sup-
posed to have their SScore() larger than 1 in both 
directions.   
4 Experiments 
To evaluate the performance of our system, 15 
common users are invited to provide 100 foreign 
NEs per user. These users are asked to simulate 
a scenario of using web search machine to per-
form cross-lingual information retrieval. The 
proportion of different types of NEs is roughly 
conformed to the real distribution, except for 
creation titles. We gathers a larger proportion of 
creation titles than other types of NEs, since the 
ways of translating creation titles is less regular 
and we may use them to test how much help 
could the web statistics provide. 
After removing duplicate entries provided by 
users, finally we obtain 1,119 nouns. Among 
them 7 are not NEs, 65 are originated from Ori-
ental languages (Chinese, Japanese, and Korean), 
and the rest 1,047 foreign NEs are our main ex-
perimental subjects. Among these 1,047 names 
there are 455 personal names, 264 location 
names, 117 organization names, 196 creation 
titles, and 15 other types of NEs.  
Table 2 and Figure 5 show the performance of 
the system with different types of NEs. We 
could observe that the translating performance is 
best with location names. It is within our expec-
tation, since location names are one of the most 
limited NE types. Human usually provide loca-
tion names in a very limited range, and thus 
there are less location names having ambiguous 
translations and less rare location names in the 
test data. Besides, because most location names 
are purely transliterated, it can give us some 
clues about the performance of our phonetic 
model.  
Our system performs worst with creation titles. 
One reason is that the naming and translating 
style of creation titles are less formulated. Many 
titles are not translated by lexical information, 
but by semantic information or else. For exam-
ple, “Mr. & Mrs. Smith” is translated into “史密
斯任務 (Smiths’ Mission)” by the content of the 
creation it denotes. Another reason is that many 
titles are not originated from English, such as “le 
Nozze di Figaro”. It results the C-E bilingual 
dictionary cannot be used in recognizing word 
sense similarity. A more serious problem with 
titles is that titles generally consist of more sin-
gle words than other types of NEs. Therefore, in 
the returned snippets by Google, the correct 
translation is often cut off. It would results a 
great bias in estimating statistical scores.  
Table 3 compares the result of different fea-
ture combinations. It considers only foreign NEs 
in the test data. From the result we could con-
clude that both statistical and lexical features are 
helpful for translation finding, while the inverse 
search are the key of our system to achieve a 
good performance. 
60%
65%
70%
75%
80%
85%
90%
95%
100%
1 5 9 13 17 21 25 29
Ranking
Recall at TOP N
PER
LOC
ORG
Title
Other
Oriental
Non-NE
 
Figure 5. Curve of recall versus ranking. 
Top-1 Top-2 Top-4 Top-M  Total 
Num Recall Num Recall Num Recall Num Recall 
PER 455 408 89.7% 430 94.5% 436 95.8% 443 97.3% 
LOC 264 242 91.7% 252 95.5% 253 95.8% 264 100.0% 
ORG 117 98 83.8% 106 90.6% 108 92.3% 114 97.4% 
TITLE 196 151 77.0% 168 85.7% 181 92.3% 189 96.4% 
Other 15 10 66.7% 13 86.7% 14 93.3% 15 100.0% 
All NE 1047 909 87.6% 969 92.6% 992 94.7% 1025 97.9% 
Oriental 65 47 72.3% 52 80.0% 55 84.6% 60 92.3% 
Non-NE 7 6 85.7% 6 85.7% 6 85.7% 7 100.0% 
Overall 1119 962 86.0% 1027 91.8% 1053 94.1% 1092 97.6% 
Table 2. Experiment results of our system with different NE types. 
 
87
Top-1 Top-2 Top-4  
Num Recall Num Recall Num Recall 
SScore 540 51.6% 745 71.2% 887 84.7% 
LScore 721 68.9% 789 75.4% 844 80.6% 
SScore + LScore 837 79.9% 916 87.5% 953 91.0% 
+ Inverse Search 909 87.6% 969 92.6% 992 94.7% 
Table 3. Experiment results of our system with different feature combinations. 
 
From the result we could also find that our 
system has a high recall of 94.7% while consid-
ering top 4 candidates. If we only count in the 
given NEs with their correct translation appear-
ing in the returned snippets, the recall would go 
to 96.8%. This achievement may be not yet good 
enough for computer-driven applications, but it 
is certainly a good performance for user querying. 
5 Conclusion 
In this study we combine several relatively sim-
ple implementations of approaches that have 
been proposed in the previous studies and obtain 
a very good performance. We find that the Inter-
net is a quite good source for discovering NE 
translations. Using snippets returned by Google 
we can efficiently reduce the number of the pos-
sible candidates and acquire much useful infor-
mation to verify these candidates. Since the 
number of candidates is generally less than proc-
essing with unaligned corpus, simple models can 
performs filtering quite well and the over-fitting 
problem is thus prevented. 
From the failure cases of our system, (see Ap-
pendix A) we could observe that the performance 
of this integrated approach could still be boosted 
by more sophisticated models, more extensive 
dictionaries, and more delicate training mecha-
nisms. For example, performing stemming or 
adopting a more extensive dictionary might en-
hance the accuracy of estimating word sense 
similarity; the statistic formula can be replaced 
by more formal measures such as co-occurrences 
or mutual information to make a more precise 
assessment of statistical relationship. These tasks 
would be our future works in developing a more 
accurate and efficient NE translation system.  
Reference 
Al-Onaizan, Yaser and Kevin Knight. 2002. Translat-
ing Named Entities Using Monolingual and Bilin-
gual Resources. ACL 2002: 400-408. 
Cheng, Pu-Jen, J.W. Teng, R.C. Chen, J.H. Wang, 
W.H. Lu, and L.F. Chien. Translating unknown 
queries with web corpora for cross-language in-
formation retrieval. SIGIR 2004: 146-153. 
Feng, Donghui, Lv Y., and Zhou M. 2004. A New 
Approach for English-Chinese Named Entity 
Alignment. EMNLP 2004: 372-379. 
Huang, Fei, Stephan Vogel, and Alex Waibel. 2003. 
Improving Named Entity Translation Combining 
Phonetic and Semantic Similarities. HLT-NAACL 
2004: 281-288. 
Lam, Wai, Ruizhang Huang, and Pik-Shan Cheung. 
2004. Learning phonetic similarity for matching 
named entity translations and mining new transla-
tions. SIGIR 2004: 289-296. 
Lee, Chun-Jen and Jason S. Chang. 2003. Acquisition 
of. English-Chinese Transliterated Word Pairs 
from Parallel-Aligned Texts. HLT-NAACL 2003. 
Workshop on Data Driven MT: 96-103. 
Lin, Wei-Hao and Hsin-Hsi Chen. 2002. Backward 
Machine Transliteration by Learning Phonetic 
Similarity. Proceedings of CoNLL-2002: 139-145. 
Lu, Wen-Hsiang, Lee-Feng Chien, and Hsi-Jian Lee. 
2004. Anchor Text Mining for Translation of Web 
Queries: A Transitive Translation Approach. ACM 
Transactions on Information Systems 22(2): 242-
269. 
Zhang, Ying, Fei Huang, and Stephan Vogel. 2005. 
Mining translations of OOV terms from the web 
through cross-lingual query expansion. SIGIR 
2005: 669-670. 
Zhang, Ying and Phil Vines. 2004. Using the web for 
automated translation extraction in cross-language 
information retrieval. SIGIR 2004: 162-169.  
Appendix A. Some Failure Cases of Our 
System 
GN Top 1  Correct Translation Rank 
海珊  CBS SADDAM HUSSEIN 2 
紐澤西  JERSEY NEW JERSEY 2 
天方夜譚  ONLINE ARABIAN NIGHTS 2 
勞斯萊斯  ROYCE ROLLS ROYCE 2 
朱利斯厄文  NBA JULIUS ERVING 2 
艾薇兒  LAVIGNE AVRIL LAVIGNE 2 
羅琳  JK JK. ROWLING 2 
塞爾蒂克  RICKY DAVIS CELTICS 8 
印象日出  MONET IMPRESSION SUNRISE 9 
蘇聯  TUPOLEV TU USSR 33 
梅德維登科  NBA MEDVENDENKO N/A 
命運交響曲  TOS SYMPHONY NO. 5 N/A 
愛的教育  AROUND03 CUORE N/A 
民主黨  JACK LAYTON DEMOCRATIC PARTY N/A 
88
