An English to Korean Transliteration Model 
of Extended Markov Window 
SungYoung Jung SungLim Hong* 
Information Technology Lab. 
LG Electronics Institute of Technology 
Seoul, Korea 
e-mail : syjung, pack}@lg-elite.com 
Eunok Pack 
Abstract 
Automatic transliteration problem 1s to 
transcribe foreign words in one's own alphabet. 
Machine generated transliteration can be useful 
in various applications such as indexing in an 
information retrieval system and pronunciation 
synthesis in a text-to-speech system. In this 
paper we present a model for statistical English- 
to-Korean transliteration that generates 
transliteration candidates with probability. The 
model is designed to utilize various information 
sources by extending a conventional Markov 
window. Also, an efficient and accurate method 
for alignment and syllabification of 
pronunciation units is described. The 
experimental results show a recall of 0.939 for 
trained words and 0.875 for untrained words 
when the best 10 candidates are considered. 
Introduction 
As the amount of international communication 
increases, more foreign words arc flooding into 
the Korean . Especially in the area of 
comlmter and information science, it has been 
reported that 29.4% of index terms are 
transliterated fiom or directly written in English 
in the case of a balanced corpus, KT-SET \[18\]. 
The transliteration of l'oreign words is 
indispensable in Korean  processing. 
In information retrieval, a simple method of 
processing foreign words is via query term 
translation based on a synonym dictionary of 
foreign words and their target transliteration. It is 
necessary to automate the construction process of 
a synonym dictionary since its maintenance 
requires continuous efforts for ever-incoming 
foreign words. Another area to which 
transliteration can be applied is a text-to-speech 
system where orthographic words are transcribed 
into phonetic symbols, in such applications, 
maximum likelihood \[15\], decision tree \[1\], 
neural network \[10\] or weighted finited-state 
acceptor \[19\] has been used for finding the best 
fit. 
English-to-Korean transliteration problem is that 
of generating an appropriate Korean word given 
an English word. In general, there can be various 
possible transliterations in Korean which 
correspond to a single English word. It is 
COlnmon that the newly imported foreign word is 
transliterated into several possible candidate 
words based on pronunciation, out of which only 
a few survive in competition over a period of 
time. In tiffs respect, a statistical approach makes 
sense where multiple transliteration variations 
exist for one word, generating candidates in 
probable order. 
In this paper, we present a statistical method to 
transliterate English words in Korean alphabet to 
generate various candidates. In the next section, 
we describe a phonetic mapping table 
construction. In Section 2, we describe alignment 
and syllabification methods, and in Section 3, 
mathematical formulation for a statistical model 
is presented. Section 4 provides experimental 
results, and finally, we state our conclusions. 
~' Present a&h'ess: Sevice Engineering Team, Chollia, Service Development Division, DACOM Cotporation, Seoul, 
Korea (E-mail : syrup913@chollia,.net) 
383 
1 Phonetic mapping table construction 
First of all, we generate a mapping between 
English and Korean phonetic unit pairs (Table 6). 
In doing so, we use pronunciation sylnbols for 
English words (Table 5) as defined in the Oxford 
computer-usable dictionary \[12\]. The English 
and Korean phonetic unit can be a consonant, a 
vowel or some composite of them so as to make 
transliteration mapping unique and accurate. The 
orthography for foreign word transliteration to 
Korean provides a siml?le mapping from English 
to Korean phonetic units. But in reality, there are 
a lot of transliteration cases that do not follow 
the orthography. Table 6-1 has been constructed 
by examining a significant amount of corpus so 
that we can cover as many cases as possible. 
Table 6-2 shows complex cases where a 
combination of two or more English phonelnes 
are mapped to multiple candidates of a 
composite Korean phonetic unit. This phonetic 
mapping table is carefully constructed so as to 
produce a unique candidate in syllabification and 
aligmnent in the training stage. When a given 
English pronunciation can be syllabificated into 
serveral milts or a single composite unit, we 
adopt a heuristic that only the composite unit 
consisting of longer phonetic units is considered. 
For example, the English phonetic unit "u @" can 
be mapped to a Korean phonetic unit ,,@o\] 
\[u@\]" or "w°q \[w@\]" even though the 
colnposition of each unit mapping of "u" and 
"@" can result in other composite mappings 
such as "-°r-°\] \[juG\]", ,,~o\] \[wI@\]", ,,o_o\] 
\[wuja\]", etc. This composite phonetic unit 
mapping is also useful for statistical tagging 
since composite units provide more accurate 
statistical information when they are well 
devised. 
2 Alignment and syllabification 
method 
The alignment and syllabification process is 
critical for probabilistic tagging as it is closely 
linked to computational complexity. There can 
be combinatorial explosion of state sequences 
because potential syllables may overlap the same 
letter sequences. A statistical approach called, 
Forward-Backward parameter estimation 
algorithm, is used by Sharman in phonetic 
transcription problem \[2\]. But a statistical 
approach for syllabification requires expensive 
computatioual resources and a large amount of 
training corpus. Moreover, it often results in 
many improper candidates. In this paper, we 
propose a simple heuristic alignment and 
syllabification method that is fast and efficient. 
The maiu principle in separating phonetic units is 
to manage a phonetic unit of English and that of 
Korean to be mapped in a unique way. For 
example, the pronunciation notation "@R" of the 
suffix "-er" in "computer" is mapped to ,,cq 
\[@R\]" in Korean. In this case, the complex 
pronunciation "@R" is treated as one phonetic 
unit. There are many such examples in complex 
vowels, as in "we" to "~-\]\] \[we\]", "'jo" to ,,.~o 
\[jo TM j, etc. It is essential to come up with a 
phonetic unit mapping table that can reduce the 
time complexity of a tagger and also contribute 
to accurate transliteration results. Table 6 shows 
the examples of phonetic units and their mapping 
to Korean. 
The alignment process in training consists of two 
stages. The first is consonant alignment which 
identifies corresponding consonant pairs by 
scanning the English phonetic unit mad Korean 
notation. The second is vowel alignment which 
separates corresponding vowel pairs within the 
consonant alignment results of stage 1. Figure 1 
shows an aligmnent example in training. The 
aligned and syllabificated units are used to 
extract statistical inforination from the training 
corpus. The alignment process always produces 
one result. This is possible because of the 
predefined English to Korean phonetic unit 
mapping in Table 6. 
Input: 
English pronunciation 
and Korean notation 
First stage: 
consonant alignment 
Second stage: 
vowel alignment 
k@mput@R 
=1 t ~ ~z-II -E ¢ 
k @ lm/pu/t @ R/ 
~l -\] I~ lsr.'lFIt4 "l I ¢ 
kl @lmlplultl @ RI 
<Figure l>AIignment example for training data input. 
'/' mark: a segmentation position by a consonant 
'\]' mark: a segmentation position by a vowel. 
384 
Figure 1. shows an example of syllabification 
and alignment. To take the English word 
"computer" as an exalnple, the English 
pronunciation notation "k@mpu@R" is retrieved 
froln the Oxford dictionary, in the first stage, it 
is segmented in flont of the consonants "k", "m", 
"p" and "t" which are aligned with the 
corresponding Korean consonants "=1 \[k\]", "rJ 
\[m\]", "~ \[p\]" and "E It\]". In the second stage, 
it is segmented in flont of the vowels "@", "u" 
and "@R" and aligned with the corresponding 
Korean vowels "-\] \[@R\]", "-lT \[ju\]" and 
"-\] \[@R\]". The composite vowel "@R" is not 
divided into two simple vowels "@" and "R" 
since it is aligned to Korean "-\] \[@R\]" in 
accordance with entry in Table 6-2. When it is 
possible to syllabificate in more than one ways, 
only the longest phonetic unit is selected so that 
an alignment always ends up being unique 
during the training process. 
After the training stage, an input English word 
must be syllabificated automatically so that it 
can be transliterated by our tagger. During this 
stage, all possible syllabil'ication candidates are 
geuerated and are given as inputs to the 
statistical tagger so that the proper Korean 
notation can be found. 
3 Statistical transliteration model 
A probabilistic tagger finds the most probable set 
of Korean notation candidates fl'om the possible 
syllabificated results of English pronunciation 
notation. Let \[7, 8, 9\] proposed a statistical 
transliteration model based on the statistical 
translation model-I by Brown \[2\] that uses only 
a simple information source of a word pair. 
Various kinds of information sources are 
involved in the English to Korean transliteration 
problem. But it is not easy to systematically 
exploit various information sources by extending 
the Markov window in a statistical model. The 
tagging model proposed in this paper exploits 
not only simple pronunciation unit-to-unit 
mapping froul English to Korean, but also more 
complex contextual information of multiple units 
mapping. In what follows, we explain how the 
contextual information is represented as 
conditional probabilities. 
An English word E's pronunciation S is fouud in 
a phonetic dictionary. Suppose that S can be 
segmented into a sequence of syllabificated units 
sis 2...s. where s~ is an English phonetic unit 
as in Table 6. Also suppose thatKis a Korean 
word, where lq is the i-th phonetic unit of K. 
S.~- SIS 2 "" Sn~ 
K = k 1 k2... k,, 
(l) 
Let us say P(E, K) is the probability that an 
English word E is transliterated to a Korean 
word K. What we have to find is K where P(E, 
K) is lnaximized given E. This probability can be 
approximated by substituting the English word E 
with its prontmciation S. Thus, the following 
formula holds. 
arg max P(E, K) 
K 
arg max P(S, K) 
K 
= arg max P(K I S)P(S) 
K 
(2) 
where P(S) is called  model and I'(K\]S) 
is called translation, model. P(S) is not constant 
given a fixed input word because there can be a 
number of syllabification candidates. 
In detwmining k~, four neighborhood variables 
are taken into account, while conventional 
tagging models use only two neighborhood 
wuiables. The extended Markov window of 
infolnlation source is defined as in Figure 2. It 
also shows a conventional Markov window using 
a dashed line. Mathematical fornmlation for 
Markov window extension is not an easy 
probleln since extended window aggravates data 
sparseness. We will explain our solution in the 
next step. 
Korean notation ikj 1 Icj 
.................... :.~..i 
Extended Markov window "...~.. 
Conventional Markov window 
<Figure 2> Extended Markov window of information 
source t"01" k i 
385 
Now, the translation lnodel, P(K\[S) in equation 
(2) can be approximated by Markov assumption 
as follows. 
p(/<71s) ~ l~p(k, I k,_,,s.,.s,s,+,) (3) 
i 
Equation(3) still has data sparseness problem in 
gathering information directly from the training 
corpus. So we expand it using Markov chain in 
order to replace the conditional probability term 
in (3) with more fragmented probability terms. 
P(k,k,,_\],%,s;,~;+O 
~k,_,s;_,,s;,%) 
_ p(k,. ,)p(~ I ~_,)~(.~,. I ~_,k,.)p(,~; I ~_,k,,~._,)z%., I k,. y,_lk,.,)) 
m Z~k,. ,)Z~.% I k,. ,)P(s, I k,. ,.%)P(.% I k,. lS~_vs~) 
~P(~ Ik, ,)P("'-' j-k'-lk') P(,,. I ~,,._,) ~,~.+, I ~,,.) 
- P(.s,_, I ki_l) ~.':1.':-,) r(.,,+, I.,,) 
(4) 
In Equation (4), there are two kinds of 
approximations in probability terms. First, 
P(s~ \]kjs~ ~) and P(sj \]sj_~) are substituted for 
p(s~lkj_4k~s~_l) and P(s, I k, iSi_l)' respectively. 
This approxilnation is based on our heuristic that 
kj_~and s~ ~ provide somewhat redundant 
information in deterlnining s,. Secondly, 
P(s,~ Ik~s~) and P(s,~ Is~) are substituted for 
p(si+ I I ki_lsi_lkisi) and P(s, I I k,_,s,_,s,). 
respectively, based on a heuristic that k,_ts~_ ~ is 
farther off than k, sj. and is redundant. Equation 
(4) can be reduced to Equation (5) because 
l'(s,, I k,,k,) of (4) is equivalent to ,'(k, Is, ,k, ,) 
P(Si_ , \[ ki_l) l'(k~ I k, ~) 
mathematically. 
P(ki I K_,s,_,,~',s,+,) 
P(k~ l s.,k~_,)P(s, I k, s,_,)P(s,+, I k,s,) 
P(si I s,_,)P(s,+, Is,) 
(5) 
The  model we use is a bigram  
model (Brown et al. \[61) 
p(s) ~ H p(s, I .~',-,) (6) 
i 
Now, our statistical tagging model can be 
formulated as Equation (7) when the translation 
model (5) and the  model (6) are applied 
to the transliteration model (2) 
.'. argmaxP(S, K) 
K 
rnr P(k, l s.,k,_,)P(s, I k, s.OP(s., I k~s,) = arglnaxl 1 
x , P(si+, j si) 
(7) 
English Pron: o o o ~isi-1 @_ @si+l 
Korean Notatio% e o @ "N~ 0 0 0 
<Figure 3> Statistical information source used in the extended Markov window 
(1)\]'(k,l.s.,_\]k, ,) (2) l'(s, lk,s,_,) (3)P(ss+ ,\]kr% ) 
Figure 3 pictorially summarizes the final 
information sources that our statistical tagger 
utilizes. It can be thought of as a generalized case 
of prevalent Part-of-Speech tagging model. 
When P(kj I s,-lk~-i) is approximated as 
P(kiJki_\]), P(siJkisi_,) as P(sjJkj). and 
P(s,~ \[k~s,) as P(s,+ I \]s,). then Equation (7) is 
reduced to a conventional bigram tagging model 
(Eq. 8). that is a base model of Brown model-1 
\[2\]. Charniak \[4\], Merialdo \[11\] and Lee \[7, 8. 
9\]. 
arg max P(S. K) --_ arg max I~ P(ki I k.,)P(s~ I ki) (8) K K i 
Equation (7) is the final tagging model proposed 
in this paper. We use a back-off strategy \[10, 1 1\] 
as follows, because our tagging model may have 
a data sparseness problem. 
P(k, \] s,_,k,_l).~ P(k, I k,,), if Count(s,_,ki_l) = 0 
.,'(s, I k, s, ,) .~ p(.~, I k,). if Counz(k,s,_,) = 0 
p(si+ 1 \]kisi) w, F(s/+ 1 Is/), if Coltrlt(kisi) = 0 
Each probability term in equation (7) is obtained 
fi'om the training data. The statistical tagger 
modeled here uses Viterbi algorithm \[12\] for its 
search to get N-Best candidates. 
386 
4. Experiniental restllts 
For the evahlation we constructed a training 
corpus of 8368 English-Korean word pairs. One 
I{nglish word can have one or more Korean 
transliteration entries in the corpus. 90% of the 
corpus is used as trainirig data and 10% of the 
corpus as test data. For nlore objective 
experiment owthiation, we estinmted word-level 
accuracy based Oil exact strillg lllatch OVOI1 
though many other papers are based oil lexical- 
level distance to the correct word. We adopted a 
recall measure based on wordqevel accuracy. 
Recall 111easure is the average nuulber of 
generated corrool words divided by the iohil 
word COtlllt el: prepared correct allswer set given 
{111 input word (Eq. 9). Precision ll\]e{isul'e is tile 
average number o1' relrieved correct words 
divided by the number of generated candidates 
(Eq. 10). 
Recall = cottnl(<generaled, correcl wos'dv) (9) 
count(possible, correct._ worUs) 
Precision= c(,zmt(x, enerated con'oct_: word, v) (10) 
comet(generated_ words) 
For words not found in the pronunciation 
dictionary, a transcription ailtonlata is used Io 
lransfornl the English alphabet to the Korean 
alphabet, k transcription aulonlata can be helpful 
bocatlsO it ilsos alphabetic illfornlalioll thai otir 
statistical tag;ger does ilot llso. \]'tie atltom;_ila 
produces one result aiid ailachos it at the end of 
N-best results of the statistical tagger. This 
automata has about 500 lranscriplion rules, based 
oil previous, current, and lleXt coulext window 
and production alphabet. 
I 
0.9 
0.8 
0.7 
0.6 
O 0.t~ 
¢1) 0.4 
0.3 
0.2 
0.1 
0 ................... ~t ~t'candidates 
<Figure 4> Reoall value for each number 
of candidates 
All experimental results are estimated by 10-fold 
cross validation for more accurate results. Table 
1 shows the estimated recall wlhles for the 10- 
best results generated by the tagger and for tile 
case when transcription automata used as well. 
Figure 4 shows recall wihies given a number of 
candidates. 
Pure suilistical lagger (cq. 7) 
Transcription autonlala used as 
well 
Trained Test 
0.925 0.850 
0.939 0.875 
<Table 1 > esthnaied recall result on 10-best results 
We estimated recall values in tile same 
environment for conventional tagging model (Eq. 
8) in order to compare accuracy improvement by 
lhe I;xicndod Markov window model (Eq. 7) 
without transcription atltonlata (Table 2). 
I~xicnded Markov 
window model (Eq. 7) 
Conventional tagging 
model (Eq. 8) 
Trained \] untrained 
0.925 0.850 
0.878 0.796 
<Table 2> comparison of recall wilucs on 10-best 
results el' lhc proposed Extended Markov model with 
conventional taggillg inede\] 
It cannot be COmlmred directly with tilt results of 
other models since other rchttcd works arc on a 
different donmin and they adopt different 
evahlalion lllOaStlros such as lexical-levol 
accuracy \[1, 7, 10\]. There is a model that is ill the 
same doumin, i.e., l~nglish-to-Korean 
transliteration \[1, 8, 9\], but it adopts a lexical- 
level accuracy nloasule \[17, 9\], or a subjective 
evahlation measure such as human judgment \[8, 
9\]. Table 3 shows the comparison with Lee's 
model which adopted the average wthle of 
trained and untrained results, even though the 
average of trained and untrairiod results makes no 
sense since a registered dictionary lookup 
method can make all experiments oil trained data 
100% accurate. Accuracy measure on untrained 
data should be tile measure for comparison 
among different experiments for fairness. 
Extended Markov 
window model 
l,ce - hybrid \[171 
Trained 
0.962 
Untrained 
0.888 
Averagc:(train+untraincd ) / 2 
0.471 
387 
<Table 3> Comparison with Lee's model, word based 
recall measure on 20-best results. Lee's result of 
single frequency coverage \[ 17\] is the same as a recall 
measure in this paper. 
Lee's model is a fully statistical approach even 
in pronunciation unit alignment and 
syllabification that may cause inaccurate results, 
while we use a heuristic approach in 
pronunciation unit alignment. Another 
significant difference is that Lee's model uses 
only conventional information sources such as a 
bigram while our model use various information 
sources fi'om extended Markov window. Lee's 
model transliterates using English alphabet 
directly without pronounciation dictionary so 
that it can be better for unknown words or proper 
I1oun. 
Trained Untrained 
Extended Markov 64.0% 54.9% 
window 
transcription (no training) 36.4% 
automata 
MBRtalk \[16\] (100%) 43% 
Neural Net \[ 10\] 37.7% 26.2% 
WFSA \[ 19\] ? 64% 
Lee's - modified Average:( train + untrained ) / 2 
Direct \[9\] 38.1% 
<Table 4> comparison with other models - word 
accuracy (precision) of 1-best candidate. 
Table 4 shows the comparison with MBRtalk 
\[ 16\], Neural Network \[ i 0\], Weighted finite-state 
acceptor (WFSA) \[19\] and direct transliteration 
\[9\] even though they are based on different 
problem domains. MBRtalk and Neural Network 
models are based on English word's 
pronunciation generation. WFSA is for English- 
to-Japanese transliteration. Experiment for 
training data in MBRtalk makes no sense, since 
it finds the most similar word in a database that 
stores all the training data; thus the result would 
always produce the exact answer. The results 
show that the model we propose indicates the 
good performance. 
Conclusion 
We have proposed a statistical English-to- 
Korean transliteration model that exploits 
various information sources. This model is a 
generalized model from a conventional statistical 
tagging model by extending Markov window 
with some inathelnatical approximation 
techniques. An alignment and syllabification 
method instead of a statistical method is 
proposed for accurate and fast operation. The 
experimental results show that the model 
proposed in this paper demonstrates significant 
improvement in its performance. 
Phnnelie svmhnl examr~le Phone|ie ~vmhnl e×nmnle 
i b e a d S shhe d 
I bid Z b e i,Q& 
e be_ed tS etch 
& bad dZ e _d~g~_ 
A bard p tk b d g 
0 (zero) cod m n f v s z 
O (cap O) cord r I w h j 
U good ei day_ 
u food @ U go. 
V bud a l 
3 b izd a U c o w 
@ about ol by_ 
N sinE& I@ beer 
T t_hhin e @ b a re 
D t h.en U @ to u r 
<Table 5: l?ronunciation symbol to Ascii mapping 
\[3\]> 
English Korean unit Korean unit 
phonetic unil (orthography) (extra) 
I 
W 
@ 
O 
E .X E I 
r- 
K~ 2 2 
(o) 
ot 
2,A EE :K:: E 
77 
X ~A ;~X 
E .Z. :;K 
£ol ~ootl oF o4 
o\[ 
2 
oH 011 YJ o~ o -?7- 
-?-r~o 
-£E- (fa ~-F) ol _9_ 
0tl °41 ol 04 o oi- 
o 
<Table 6-1: Examples of English to Korean phonetic 
unit mapping (simple cases)> 
388 
English 
Phonetic unil 
I@ 
@U 
U@ 
@R 
AI 
el 
wa 
w3 
we 
w@ 
w@U 
ja 
jA 
Korean units Korean units 
(orthography) (extra) 
olot 
o 
-~ot q 
ol-ol 
oqol 
ql 
-S~l 
q 
o:1 
ol- 
olG ol ° oI o ~r 
Oo O 
ot 
ol 
o\[otl 
(-q-) -S-o~ q 
q 
-'q- q ql 
o ot ~ olot o~ olo4 
oF 
<Table 6-2: Examl)les of English to Korean phonctic 
unit mapping (complex cases)> 

References 

\[1\] Bahl, L. R. and et al. (1991) Decisiolt Trees for 
Phonological Rules itt Continuous Speech. IEEE 
ICASSP, pp. 125-138. 

\[12\] Brown, P. F. and et al. (1990) The Mathematics of 
Statistical Machi~te Trat~slation: Parameter 
Estimatio~t. Computational I~inguistics, 
Computational Linguistics, V. 19, No. 2, June. 

\[3\] Brown, P. F. and et al. (1992) Class-Based ~z- 
gram Models of Natural Language. Association 
for Compulational Linguistics, V. 18, No. 4. 

\[4\] Charniak, E. (1993) Statistical Models in Natural- 
Language Processing. MIT Press. 

\[15\] Jung, S. Y. and el al. (1996) Markov mttdomfield 
based English Part-of Speech taggi~tg system. 
International conference of Computational 
Linguistics (COLING). 

116\] Katz, S. (1987) Estimation of probabilities from 
,~pame data for the  model component of 
a .weech recognizer. IEEE Transactions on ASSP, 
34(3), pp400-401. 

\[7\]Lce, J. S. and K. S. Choi (1997) Automatic 
Foreign Word Transliteratiott Model ./'or 
h!forrnation Retrieval. 4 '~' Korean Information 
management conference proceeding, Seoul, Korea, 
pp 17 -24. 

\[18\] Lee, J. S. and K. S. Choi. (1997)A Statistical 
Method to Generate Various Foreign Word 
Transliterations in Multilingual lnfonnatioll 
Retrieval System. Proceedings of the 2 '"t 
International Workshop on \]nforlnation Retrieval 
with Asian s (IRAL'97), pp123-128, 
Tsukuba-shi, Japan. 

\[19\] Lee, J. S. and K. S. Choi. (1998) Ettglish to 
Korean Statistical Transliteration for htformation 
Retrieval. Colnputer Processing of Oriental 
s. 

\[10\] Lucas, S.M. and R.I. Damper. (1991) Syntactic 
neural networks for bi-directional text-phottetics 
trattslation. Talking Machines, North Holland, 
pp 127-141. 

\[11\] Merialdo, B. (1994) Tagging E~tglish Text with 
a Pmbabilistic Model Association for 
Computational Linguistics, V. 20, No. 2, pp155- 
171. 

\[ 12\] Roger, M. (1992) A Description of a computer- 
usable dictionao, file based ott the Oaford 
adva~med lean~er's dictionao~ of current Ettglish. 
l)epartment of computer science, University of 
London. 

\[13\] Rosen fold, R. (1994) Adaptive Statistical 
Language Modelling: A Maximum E~ttropy 
Approach. Technical report, CMU-CS-94-138, 
Carnegie Mellon University. 

\[14\] Salton, G. (1988) Automatic Text Processing. 
Addison Wesley. 

1115\] Sharman, R. A. (1994) Syllable-based Phmtetic 
7)'anscription by Maximum Likelihood Methods. 
Computational Linguistics, pp. 1279. 

1116\] Slanfill, C. (1986) 7bward Memoo,-Based 
Reasoning. Communication of the ACM, vol. 29, 
no. 12, pp 1213-1228, December. 

\[117\] Viterbi, A. J. (1967) Error Bounds for 
Com,olutional Codes attd al~ A©,mptotically 
Optimum Decoding AIL, orithm. IEEE Transactions 
on Information Theory, v. IT-13, no. 2, pp 260- 
269, April. 

1118\] Kim, J., Y. Kim and S. Kim (1994) 
Developmem of the Data collection (KTSET) for 
Korean In\[ormation Retrieval Studies. The 6"' 
Hangul and Korean Inforlnation Processing, pp. 
378-385. 

\[119\] Knight, Kevin and Jonathan Graehl (1997) 
Machine Transliteration, Proc. 35"' Mtg, Assoc. 
for Computational Linguistics 
