Bilingual Knowledge Acquisition from Korean-English 
Parallel Corpus Using Alignment Method 
( Korean-English Alignment at Word and Phrase Level ) 
Jung H. Shin and Young S. Han* and Key-Sun Choi 
Department of Computer Science 
Korean Advanced Institute of Science and Technology 
Taejon, 305-701, Korea 
*Department of Computer Science 
Suwon University 
Kyungki, 445-743, Korea 
email: jhshin@stissbs.kordic.re.kr
Abstract 
This paper suggests a method to align a Korean-English
parallel corpus. The structural dissimilarity between Korean
and Indo-European languages requires more flexible measures
to evaluate the alignment candidates between the bilingual
units than those used to handle pairs of Indo-European
languages. The flexible measure is intended to capture the
dependency between bilingual items that can occur in
different units according to different ordering rules. The
proposed method for Korean-English alignment takes phrases
as the alignment unit, which is a departure from the existing
methods that take words as the unit. Phrasal alignment avoids
the problem of mismatched alignment units and alleviates the
problem of ordering mismatch. The parameters are estimated
using the EM algorithm. The proposed alignment algorithm is
based on dynamic programming. In the experiments carried out
on 253,000 English words and their Korean translations, the
proposed method achieved 68.7% accuracy at the phrase level
and 89.2% accuracy with the bilingual dictionary induced from
the alignment. The result of the alignment may lead to richer
bilingual data than can be derived from word-level alignments
alone.
1 Introduction 
Studies on parallel corpora consisting of multilingual texts
are often guided by the purpose of obtaining linguistic
resources such as bilingual dictionaries, bilingual grammars
(Wu 1995) and translation examples. Parallel texts have proved
to be useful not only in the development of statistical
machine translation (Brown et al. 1993) but also in other
applications such as word sense disambiguation (Brown et al.
1991) and bilingual lexicography (Klavans and Tzoukermann
1990). As parallel corpora become more and more accessible,
much research based on bilingual corpora that was once
considered impractical is now encouraged.

Left (English words):
  The, House, is, gradually, disintegrating, with, age
Right (Korean words):
  ku ("The")
  cip-un ("house"-nominal case)
  seywel-i ("time"-nominal case)
  hullekam-ey ("fly"-adverbial case)
  ttal-a ("follow"-subordinative case)
  cemchacek-ulo ("gradually"-adverbial case)
  pwungkwiha-y ("disintegrate"-subordinate case)
  ka-ko ("go"-subordinative case)
  iss-ta ("is"-final ending)
Figure 1: An example of typical Korean-English
alignment.
Alignment as a study of parallel corpora refers to the process
of establishing the correspondences between matching elements
in a parallel corpus. Alignment methods tend to approach the
problem differently according to the alignment units the
methods adopt. Of the various alignment options, the alignment
of word units computes a sequence of the matching pairs of
words in a parallel corpus.
Figure 1 shows the aligned results of a parallel corpus that
was originally paired at the sentence level. In figure 1, the
right-hand side of the pair-wise alignment shows the
corresponding Korean words. Described in the parentheses on
the right of each Korean word are the corresponding English
meaning and the syntactic function of the word.
The existing methods for the alignment of Indo-European
language pairs such as English and French take words as
aligning units and restrict the correspondences between words
to be one of the functional mappings (one-to-one, one-to-many)
(Brown et al. 1993, Smadja 1992). These methods made extensive
use of the position information of words in matching pairs of
sentences, which turned out useful (Brown et al. 1993). The
structural similarity in word order and units between English
and French must be one of the major factors in the success of
the methods.

Figure 2: Overview of the proposed alignment
method.
The alignment of pairs of structurally dissimilar languages
such as Korean and English requires a different strategy to
compensate for the lack of structural information such as word
order and to handle the difference of alignment units.
An early attempt to align Asian and Indo-European language
pairs is found in the work by Wu and Xia (1994). Their result
is promising, demonstrating high accuracy in learning a
bilingual lexicon between English and Chinese for frequently
used words without the consideration of word order. The
Chinese-English alignment consists of segmenting an input
Chinese sentence and aligning the segmented sentence with the
candidate English sentence. The generation of segments to be
aligned is an additional problem to the decision of aligning
units before the alignment takes place. Wu and Xia (1994) used
a bilingual dictionary to segment the sentence, but the
selection of segment candidates is hard to make with reliable
accuracy. Bilingual dictionaries are not always available, and
take considerable resources to build.
The method we suggest integrates the procedures to solve the
two critical problems, deciding aligning units and aligning
the candidates of different word orders, and accomplishes the
alignment without using any dictionary.
The proposed alignment method assumes a preprocessing step
before iterative applications of the alignment step, as is
illustrated in figure 2. Part-of-speech tagging is done before
the actual alignment so that the word-phrases (a spacing unit
in Korean) may be decomposed into proper words and functional
morphemes and the Korean and English words may be assigned
appropriate tags.
The proposed alignment is done first for phrase pairs and then
for word pairs, which eventually induces the bilingual
dictionary. The alignment method is realized through the
reestimation of its probabilistic parameters from the aligned
sentences. In particular, the parameters account for the
cooccurrence probabilities of bilingual word pairs and phrase
pairs. The repetitive application of the alignment and
reestimation leads to a convergent stationary state where the
training stops.
In the following section, our proposed method for aligning
Korean-English sentences is described and the parameter
reestimation algorithm is explained. Section 2.6 summarizes
the results of experiments and the conclusion is given in
section 3.
2 Korean/English Alignment 
Model 
2.1 English/French alignment model
To define p(f|e), the probability of the French sentence f
given the English sentence e, Brown et al. (1991) adopted the
translation model in which each word in e acts independently
to produce the words in f. When a typical alignment is denoted
by a, the probability of f given e can be written as the sum
over all possible alignments (Brown et al. 1991):

    p(f|e) = \sum_{a} p(f, a|e)    (1)
Given an alignment a between e and f, Brown et al. (1991)
have shown that one can estimate p(f, a|e) as the product of
the following three terms (Berger et al. 1995).

    p(f, a|e) = \prod_{i=1}^{|f|} p(n(e_{a_i})|e_{a_i}) \prod_{i=1}^{|f|} p(f_i|e_{a_i}) \, d(f, a|e)    (2)
In the above equation, p(n|e) denotes the probability that the
English word e generates n French words and p(f|e) denotes the
probability that the English word e generates the French word
f. d(f, a|e) represents the distortion probability, which
describes how the words are reordered in the French output.
In the above methods, only one English word is related to one
or n French words. The distortion probabilities are defined on
positional relations such as the absolute or relative
positions of matching words.
2.2 Characteristics of Korean/English
alignment
Unlike the case of English-French alignment, Korean and
English have different word units to be aligned, for an
English sentence consists of words whereas a Korean sentence
consists of word-phrases (compound words). Typically a
word-phrase is composed of one or more content words and
postpositional function words.

Table 1: The result of manual analysis about
matching unit
Korean words    English words    Ratio
1               1                33.8%
2               1                28.1%
3               1                9.7%
1               2                7.3%
etc.            etc.             21.1%
A Korean word is usually a smaller unit than an English word
and a word-phrase is larger than an English word. For this
reason the exact match found in the English-French pair is
hard to establish for the case of Korean-English (Shin et al.
1995). Consequently word-to-word or word-to-word-phrase
alignment between Korean and English will suffer from unit
mismatch and low accuracy. The complication of unit mismatch
often implies the need for non-functional alignment such as
many-to-many mapping. Non-functional mapping may also occur in
the English-French case, but with much less frequency.
Table 1 shows the degree of mismatch between English words and
Korean words as analyzed by our automatic POS tagger and
morphological analyzer. When we checked 200 randomly selected
sentence pairs by hand, only 33.8% of all pairs had one-to-one
correspondences between English words and Korean words.
2.3 Korean to English Alignment 
In this section, we propose a Korean to English alignment
method that aligns at both the word and phrase levels at the
same time. First, we introduce the method for word-to-word
alignment, and then extend it to include phrase-to-phrase
alignment.
By definition, a phrase in this paper refers to a linguistic
unit of more general structure than is usually understood by
terms such as noun phrase and adverb phrase. A phrase is any
arbitrary sequence of adjacent words in a sentence.
2.3.1 Base Method (using only 
word-to-word correspondences) 
In the development of our method, we follow the basic idea of
statistical translation proposed by Brown et al. (1993). To
every pair of sentences e and k, we assign a value p(e|k), the
probability that a translator will produce e as the
translation of k, where e is a sequence of English words and k
is a sequence of Korean words.

    p(e|k) = \prod_{j=1}^{n} \sum_{i=0}^{m} p(e_j|k_i)    (3)
In equation 3, n and m are the numbers of words in the English
sentence e and its corresponding Korean sentence k
respectively. e_j and k_i are the aligning units between the
English sentence e and the Korean sentence k: e_j represents
the j-th word in the English sentence and k_i represents the
i-th word in the Korean sentence. For example, in Figure 1 the
English word "the" is e_1 and the Korean word "ku" is k_1.
2.3.2 Proposed Method (Extended 
Method) 
The base method of word level alignment is extended with
phrase-level alignment that overcomes the difference of
matching unit and provides more opportunity for the extraction
of richer linguistic information such as a phrasal-level
bilingual dictionary. To cope with the data sparseness problem
caused by considering all possible phrases, we represent
phrases by the tag sequences of their component words.
If an English sentence e and its Korean translation k are
partitioned into sequences of phrases p_e and p_k, taken from
all possible pairs of sequences S = s(e, k), we can write
p(e|k) as in equation 5, where p_e and p_k are phrase
sequences and a(p_k, p_e) denotes all possible alignments
between p_e and p_k.

    p(e|k) = \sum_{\langle p_k, p_e \rangle \in S} p(p_e|p_k)    (4)
           = \sum_{\langle p_k, p_e \rangle \in S} \sum_{a(p_k, p_e)} p(p_e, a|p_k)    (5)
If we represent the phrase-to-phrase correspondences using the
tag sequences of phrases and the words composing the phrases,
equation 5 can be rewritten as equation 6, letting a phrase
match be represented by the tag sequences of phrases as well
as by words. In equation 6, k_j^{p_k} is the j-th phrase of
p_k, and t(k_j^{p_k}) denotes the tag sequence of the words
composing phrase k_j^{p_k}. |p_e| is the number of phrases in
a phrase sequence p_e.

    p(p_e|p_k)
      = \sum_{a_1=0}^{|p_k|} \cdots \sum_{a_{|p_e|}=0}^{|p_k|} \prod_{i=1}^{|p_e|} p(t(e_i^{p_e})|t(k_{a_i}^{p_k})) \, p(e_i^{p_e}|k_{a_i}^{p_k})
      = \prod_{i=1}^{|p_e|} \sum_{j=0}^{|p_k|} p(t(e_i^{p_e})|t(k_j^{p_k})) \, p(e_i^{p_e}|k_j^{p_k})    (6)
The likelihood of all alignable cases within a bilingual
phrase pair is defined as in equation 7, where |e_i^{p_e}| is
the number of words in a phrase e_i^{p_e} and e_{i,k}^{p_e}
denotes the k-th word in the phrase e_i^{p_e}.

    p(e_i^{p_e}|k_j^{p_k}) = \prod_{k=1}^{|e_i^{p_e}|} \sum_{l=1}^{|k_j^{p_k}|} p(e_{i,k}^{p_e}|k_{j,l}^{p_k})    (7)

Figure 3: An example of Korean-English alignment
at phrase level.
Figure 3 shows how the problem of word unit mismatch can be
dealt with in the phrase level alignment.
In the example, p_e = (The house) (is gradually
disintegrating) (with age), and e_1^{p_e} = (The house),
e_{1,1}^{p_e} = The, t(e_1^{p_e}) = (determiner noun),
k_1^{p_k} = (ku cip-un), k_{1,1}^{p_k} = ku, respectively.
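As an illustration of equations 6 and 7, the following minimal Python sketch scores one English phrase against one Korean phrase. It is not the authors' implementation: the function names, the `floor` smoothing value, the tag labels, and all probability values are hypothetical and chosen only for the example.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical parameter tables (not from the paper):
# word_prob[(english_word, korean_word)] -> p(e|k)
# tag_prob[(english_tags, korean_tags)]  -> p(t(e)|t(k))
WordTable = Dict[Tuple[str, str], float]
TagTable = Dict[Tuple[Tuple[str, ...], Tuple[str, ...]], float]


def phrase_word_likelihood(e_phrase: Sequence[str], k_phrase: Sequence[str],
                           word_prob: WordTable, floor: float = 1e-9) -> float:
    """Equation 7: product over English words of the sum over Korean words."""
    likelihood = 1.0
    for e_word in e_phrase:
        likelihood *= sum(word_prob.get((e_word, k_word), floor)
                          for k_word in k_phrase)
    return likelihood


def phrase_pair_score(e_phrase: Sequence[str], e_tags: Sequence[str],
                      k_phrase: Sequence[str], k_tags: Sequence[str],
                      word_prob: WordTable, tag_prob: TagTable,
                      floor: float = 1e-9) -> float:
    """One term of equation 6: p(t(e)|t(k)) * p(e|k) for a phrase pair."""
    tag_term = tag_prob.get((tuple(e_tags), tuple(k_tags)), floor)
    return tag_term * phrase_word_likelihood(e_phrase, k_phrase, word_prob, floor)


# Toy values for the phrase pair (The house) / (ku cip-un) of Figure 3.
word_prob = {("the", "ku"): 0.8, ("the", "cip-un"): 0.1,
             ("house", "ku"): 0.1, ("house", "cip-un"): 0.7}
tag_prob = {(("determiner", "noun"), ("determiner", "noun")): 0.5}

score = phrase_pair_score(("the", "house"), ("determiner", "noun"),
                          ("ku", "cip-un"), ("determiner", "noun"),
                          word_prob, tag_prob)
```

With these toy values the word term is (0.8 + 0.1) x (0.1 + 0.7) and the tag term is 0.5, so the tag sequence acts as a coarse prior that remains informative even when a specific word pair is rare.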
2.4 Parameter reestimation
With the constraint that the sum over all alignments should be
1, the reestimation algorithm can be derived to give equation
8 for the word translation probability and equation 10 for the
phrase correspondence probability. This process, when applied
repeatedly, must give a locally optimal estimation of the
parameters following the principle of the EM algorithm (Brown
et al. 1993)(Dempster et al. 1977).
p(e|k)_{\langle condition \rangle} denotes p(e|k) restricted
to the alignment candidates that satisfy the condition. For
calculating p(e|k), only a constant number t of alignment
cases need to be considered in the proposed alignment
algorithm, because most alignment candidates have such low
probability that they may be ignored.
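    p(e|k) = \frac{\sum_{(\mathbf{e},\mathbf{k}) \in corpus} c(e|k; \mathbf{e},\mathbf{k})}{\sum_{e} \sum_{(\mathbf{e},\mathbf{k}) \in corpus} c(e|k; \mathbf{e},\mathbf{k})}    (8)

Here the numerator is the expected number of e given k, and
the bold \mathbf{e}, \mathbf{k} range over the sentence pairs
of the corpus.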
Let us call the expected number of times that k matches with e
in the corresponding sentences \mathbf{k} and \mathbf{e} the
count of e given k. Using the notation c(e|k), the
reestimation formula of p(e|k) can be induced as equation 8
using the EM method.

    c(e|k; \mathbf{e},\mathbf{k}) = \frac{p(\mathbf{e}|\mathbf{k})_{\langle e = e_j, k = k_i \rangle}}{p(\mathbf{e}|\mathbf{k})}    (9)
When we denote by c(t_e|t_k) the expected number of times that
a tag sequence of an English phrase corresponds to a tag
sequence of a Korean phrase, as in equation 11, the
reestimation formula of p(t_e|t_k) is given as in equation 10.

    p(t_e|t_k) = \frac{\text{expected number of } t_e \text{ given } t_k}{\text{total expected number of } t_e \text{ given } t_k}
               = \frac{\sum_{(\mathbf{e},\mathbf{k}) \in corpus} c(t_e|t_k; \mathbf{e},\mathbf{k})}{\sum_{t_e} \sum_{(\mathbf{e},\mathbf{k}) \in corpus} c(t_e|t_k; \mathbf{e},\mathbf{k})}    (10)

    c(t_e|t_k; \mathbf{e},\mathbf{k}) = \frac{p(\mathbf{e}|\mathbf{k})_{\langle t_e = t(e_i^{p_e}), t_k = t(k_j^{p_k}) \rangle}}{p(\mathbf{e}|\mathbf{k})}    (11)

For the extended method of phrase alignment, the base model is
an intermediate stage for the estimation of word-to-word
probabilities. The phrase-to-phrase probabilities are
reestimated upon the initial values of the word-to-word
probabilities.
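The reestimation of the word translation probabilities in the base model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it relies on the fact that, for the product-of-sums form of equation 3, the ratio in equation 9 reduces to p(e_j|k_i) divided by the sum of p(e_j|k_i') over the Korean words of the sentence. The function name `em_word_probs`, the `NULL` token standing for the empty Korean word k_0, the uniform initialisation, and the toy corpus are all hypothetical.

```python
from collections import defaultdict

NULL = "NULL"  # hypothetical token for the empty Korean word k_0


def em_word_probs(corpus, iterations=10):
    """Estimate p(e|k) from (english_words, korean_words) sentence pairs."""
    e_vocab = {e for es, _ in corpus for e in es}
    uniform = 1.0 / len(e_vocab)
    prob = defaultdict(lambda: uniform)      # uniform start (an assumption)
    for _ in range(iterations):
        count = defaultdict(float)           # expected count c(e|k), equation 9
        total = defaultdict(float)           # sum over e of c(e|k)
        for es, ks in corpus:
            ks = [NULL] + list(ks)
            for e in es:
                norm = sum(prob[(e, k)] for k in ks)
                for k in ks:
                    c = prob[(e, k)] / norm  # posterior that e is matched to k
                    count[(e, k)] += c
                    total[k] += c
        # Equation 8: normalise the expected counts for each Korean word.
        prob = defaultdict(float, {(e, k): c / total[k]
                                   for (e, k), c in count.items()})
    return prob


# Toy corpus with romanised Korean words (illustrative only).
toy_corpus = [
    (["the", "house"], ["ku", "cip"]),
    (["the", "book"], ["ku", "chayk"]),
    (["a", "house"], ["han", "cip"]),
]
toy_prob = em_word_probs(toy_corpus)
```

On the toy corpus, "house" cooccurs with "cip" in both sentences that contain "cip", so its estimated probability given "cip" ends up higher than that of "the" or "a".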
2.5 Alignment algorithm 
The alignment process of generating Korean phrases and
selecting their matching phrases in English can be formulated
around the principle of dynamic programming. The probability
value defined in equations 6 and 7 is used to compute the
matching probability of k_{i,a} and e_{j,b}. e_{j,b} stands
for the phrase composed of b words from the j-th word in a
sentence. \theta_i is used to keep the selected phrase
sequence up to the i-th word and \phi_i denotes its score. N
and M are the numbers of words of the Korean sentence and the
English sentence respectively. The constant value L is defined
as the maximum number of words of which a phrase consists.
Initialization
    \phi_0 = 0
Recursion
    \phi_i = \max_{1 \le j \le M, 1 \le a \le L, 1 \le b \le L} [\phi_{i-a} + \log p(k_{i,a}, e_{j,b})],  1 \le i \le N
    \theta_i = (j, a, b)
             = \arg\max_{1 \le j \le M, 1 \le a \le L, 1 \le b \le L} [\phi_{i-a} + \log p(k_{i,a}, e_{j,b})]
Path backtracking
    optimal path = (\theta_{h_1}, ..., \theta_{h_m}, ..., \theta_N)
    h_{m-1} = h_m - a(\theta_{h_m}), where
    a(\theta_{h_m}) is a in \theta_{h_m} = (j, a, b)
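The recursion and backtracking above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The names `align_phrases` and `log_p` and the toy probability table are hypothetical; a Korean phrase k_{i,a} is assumed to be the a words ending at word i (so that the score of the prefix is \phi_{i-a}), and an English phrase e_{j,b} the b words starting at word j.

```python
import math


def align_phrases(korean, english, log_p, max_len=3):
    """Segment the Korean sentence into phrases and pick an English phrase for each.

    log_p(k_phrase, e_phrase) returns the log matching probability of two
    phrases given as tuples of words. Returns (best score, list of phrase pairs).
    """
    n_k, n_e = len(korean), len(english)
    phi = [0.0] + [-math.inf] * n_k          # phi[i]: best score for the first i Korean words
    theta = [None] * (n_k + 1)               # theta[i] = (j, a, b) chosen at position i
    for i in range(1, n_k + 1):
        for a in range(1, min(max_len, i) + 1):
            k_phrase = tuple(korean[i - a:i])
            for j in range(n_e):
                for b in range(1, min(max_len, n_e - j) + 1):
                    e_phrase = tuple(english[j:j + b])
                    score = phi[i - a] + log_p(k_phrase, e_phrase)
                    if score > phi[i]:
                        phi[i] = score
                        theta[i] = (j, a, b)
    # Path backtracking: step back by the length a of each selected Korean phrase.
    path, i = [], n_k
    while i > 0:
        j, a, b = theta[i]
        path.append((tuple(korean[i - a:i]), tuple(english[j:j + b])))
        i -= a
    path.reverse()
    return phi[n_k], path


# Toy matching probabilities (illustrative values only).
toy_table = {
    (("ku", "cip-un"), ("the", "house")): 0.9,
    (("iss-ta",), ("is",)): 0.8,
}


def toy_log_p(k_phrase, e_phrase):
    return math.log(toy_table.get((k_phrase, e_phrase), 1e-6))


best_score, best_path = align_phrases(
    ["ku", "cip-un", "iss-ta"], ["the", "house", "is"], toy_log_p)
```

The four nested loops give the O(L^2 MN) running time. Nothing in this sketch prevents two Korean phrases from selecting the same English phrase, because each decision is scored independently of the earlier ones.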
Although the alignment algorithm described above, with
complexity O(L^2 MN), is simple and efficient, it has a
limitation caused by the assumption of dynamic programming.
Dynamic programming in the context of alignment assumes that
the previous selections do not interfere with the future
decisions. The alignment decision, however, may depend on the
previous matches to the extent that the results from dynamic
programming may not be sufficiently accurate. One popular
solution is to maintain the upper T best cases instead of just
one, as follows, where max-t denotes the t-th max candidate.
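    \phi_i(t) = \max\text{-}t_{1 \le t' \le T, 1 \le j \le M, 1 \le a \le L, 1 \le b \le L} [\phi_{i-a}(t') + \log p(k_{i,a}, e_{j,b})]
    \theta_i(t) = (j, a, b)
                = \arg\max\text{-}t_{1 \le t' \le T, 1 \le j \le M, 1 \le a \le L, 1 \le b \le L} [\phi_{i-a}(t') + \log p(k_{i,a}, e_{j,b})]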
As a result, the running complexity of the pro- 
posed algorithm becomes O(TL2MN). Taking T 
and L as constants, the order of complexity be- 
comes O(MN). 
As another method to relax the problem of decision dependency
on the previous matches, a preemptive scheme to find the max
matching of phrase k_{i,a} is adopted. In the preemptive
alignment, the previous selection can be rematched with a
better selection found by a later decision.
In the following algorithm, \omega(k_{i,a}, n) denotes the
e_{j,b} which has the n-th highest matching value with Korean
phrase k_{i,a} among all possible matching phrases, and
\nu(k_{i,a}, n) carries the weight for the matching.
\theta_{j,b} indicates the Korean phrase matched with e_{j,b}
in the current status and \upsilon_{j,b} denotes their
matching weight. A matching established in a previous stage
can be changed when another matching with a higher matching
weight is identified in this algorithm.
Initialization
    \omega(k_{i,a}, n) = (j, b)
    \theta_{j,b} = 0, (1 \le j \le M, 1 \le b \le L)
    \nu(k_{i,a}, n) = p(k_{i,a}, e_{j,b})
Preemptive selection
    n = 0
    (j, b) = \omega(k_{i,a}, n)
    repeat
        if (\nu(k_{i,a}, n) > \upsilon_{j,b})
            \upsilon_{j,b} <- \nu(k_{i,a}, n)
            k'_{i,a} = \theta_{j,b}, \theta_{j,b} <- k_{i,a}, k_{i,a} <- k'_{i,a}
        else
            n = n + 1, (j, b) = \omega(k_{i,a}, n)
    until \theta_{j,b} is 0
Table 2: The content of training corpus (English:
words, Korean: word-phrases)
Source English Korean 
middle-school textbook 46,400 34,800 
high-school textbook 153,300 106,400 
other books 54,400 37,100 
total 254,100 178,300 
Although the proposed algorithm cannot cover all possible
alignment cases, it produces reasonably accurate alignment
results efficiently, as is demonstrated in the following
section.
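One possible reading of the preemptive selection in this section is sketched below. It is an interpretation, not the authors' code: each Korean phrase tries its candidate English phrases from best to worst, takes over a phrase that is currently held with a lower weight, and the displaced Korean phrase is put back to try its next candidate. The function name `preemptive_match` and the toy candidate lists are hypothetical.

```python
from collections import deque


def preemptive_match(candidates):
    """Match Korean phrases to English phrases with preemption.

    candidates maps each Korean phrase to a list of (english_phrase, weight)
    sorted by decreasing weight. Returns english_phrase -> (korean_phrase, weight).
    """
    holder = {}                              # current owner and weight of each English phrase
    rank = {k: 0 for k in candidates}        # index n of the next candidate to try
    queue = deque(candidates)
    while queue:
        k = queue.popleft()
        while rank[k] < len(candidates[k]):
            e, w = candidates[k][rank[k]]
            if e not in holder:              # free English phrase: take it
                holder[e] = (k, w)
                break
            old_k, old_w = holder[e]
            if w > old_w:                    # better match found later: preempt
                holder[e] = (k, w)
                rank[old_k] += 1             # displaced phrase moves to its next candidate
                queue.append(old_k)
                break
            rank[k] += 1                     # phrase held by a stronger match: try the next one
    return holder


# Toy candidate lists (illustrative values only).
toy_candidates = {
    "k1": [("e1", 0.6), ("e2", 0.3)],
    "k2": [("e1", 0.8), ("e2", 0.1)],
}
toy_matching = preemptive_match(toy_candidates)
```

In the toy example "k1" first takes "e1", is then displaced by "k2" (weight 0.8 against 0.6), and falls back to "e2". A Korean phrase that exhausts its candidate list stays unmatched, and the loop terminates because every rank only increases.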
2.6 Experiments
The total training corpus for our experiments consists of
254,100 English words and 178,300 Korean word-phrases. The
content of the training corpus is summarized in table 2.
An HMM Part-of-Speech tagger is used to tag words before
alignment. An accurate HMM designed by the authors for Korean
sentences, taking into account the fact that a Korean sentence
is a sequence of word-phrases, is used (Shin et al. 1995). The
Penn Treebank POS tagset, which is composed of 48 tags, and a
Korean tagset of 52 tags are used in the tagging. The errors
generated by morphological analysis and tagging cause many of
the alignment errors.
To avoid the noise due to insufficient bilingual sentences, we
adopted two significance filtering methods that were
introduced by Wu and Xia (1994). First, only the Korean
sentences consisting of words with more than 5 occurrences in
the corpus are considered in the experiment. Second, we
selected the English words that account for the top 0.80 of
the translation probability density given a Korean word.
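The two filters can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the second filter is read here as keeping the most probable English words until their cumulative probability reaches 0.80, and the function names, the toy corpus, and the toy distribution are hypothetical.

```python
from collections import Counter


def filter_sentences(corpus, min_count=5):
    """Keep sentence pairs whose Korean words all occur more than min_count times."""
    freq = Counter(k for _, ks in corpus for k in ks)
    return [(es, ks) for es, ks in corpus
            if all(freq[k] > min_count for k in ks)]


def top_mass_translations(dist, mass=0.80):
    """Keep the most probable English words covering `mass` of p(e|k).

    dist maps english_word -> p(e|k) for a single Korean word.
    """
    kept, cumulative = [], 0.0
    for e, p in sorted(dist.items(), key=lambda item: item[1], reverse=True):
        if cumulative >= mass:
            break
        kept.append(e)
        cumulative += p
    return kept


# Toy data (illustrative values only).
toy_pairs = [(["the"], ["ku"])] * 6 + [(["rare"], ["huykwi"])]
toy_kept_pairs = filter_sentences(toy_pairs)
toy_kept_words = top_mass_translations(
    {"clever": 0.6, "smart": 0.25, "bright": 0.1, "would": 0.05})
```

In the toy distribution the first two words already cover 0.85 of the probability mass, so the low-probability tail, which is where spurious translations tend to collect, is dropped.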
When we selected 200 sentence pairs randomly and manually
tested the aligned results, we obtained 68.7% precision at the
phrase level and 89.2% precision for the bilingual dictionary
induced from the alignment. Tables 3 and 4 illustrate the
bilingual knowledge acquired from the aligned results. The
information in table 4 is the unique product of phrase-level
alignment.
3 Conclusion 
With the alignment of Korean-English sentences, the most
serious problem, one that is seldom found in Indo-European
language pairs, is how to overcome the differences of word
unit and word order. The proposed method is an extension of
word level alignment and solves the problems of word unit
mismatch and word order through phrase level alignment. We
have also described several alternatives of alignment and
parameter estimation.
Table 3: Examples of result for word translation
probability.
Korean word    English word    probability
yengli         clever          0.616331
yengli         smart           0.238197
yengli         cleverness      0.145472
kion           degrees         0.279992
kion           temperatures    0.248706
kion           centigrade      0.131713
kion           increase        0.130894
kion           would           0.108766
Table 4: Examples of phrase-level bilingual
dictionary results
Korean phrase                      English phrase
wuli uy chwuswu kamsacel           our thanksgiving day
ey kwansim i iss                   interested in
maywu kupkyek ha ko tto wuihem     very fast and dangerous
moscianh key wuihem                dangerous as anything else
It produces a more accurate bilingual dictionary than the
method using only word correspondence information. Moreover,
we can extract phrase-level information from the results of
phrase level alignment. Also in the proposed method, the whole
process of generating phrase units and finding matching
phrases is done mechanically without human intervention. One
negative aspect is that the method requires a large amount of
training corpus for the saturated estimation of the model,
though larger data will increase the accuracy of the
performance.
The proposed method may well be applied to other language
pairs of similar structures as well as dissimilar ones. Since
the results from the method are richer with linguistic
information, other applications such as machine translation
and multilingual information retrieval are promising research
areas.
References
Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della
Pietra. 1995. A Maximum Entropy Approach to Natural
Language Processing. Computational Linguistics,
22(1):39-73.
A.P. Dempster, N.M. Laird, and D.B. Rubin. 1977. Maximum
likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society, B39:1-38.
Dekai Wu, Xuanyin Xia. 1994. Learning an English-Chinese
lexicon from a parallel corpus. In Proceedings of
AMTA-94, 206-213. Columbia.
Dekai Wu. 1995. Grammarless extraction of phrasal
translation examples from parallel corpora. In
Proceedings of the Sixth International Conference on
Theoretical and Methodological Issues in Machine
Translation, 354-372. Leuven, Belgium.
Frank A. Smadja. 1992. How to compile a bilingual
collocational lexicon automatically. In AAAI-92 Workshop
on Statistically-Based NLP Techniques, 65-71, San Jose,
CA.
Judith Klavans, Evelyne Tzoukermann. 1990. The bicord
system. In Proceedings of COLING-90, 174-179. Helsinki,
Finland.
Jung H. Shin, Young S. Han, Young C. Park, Key-Sun Choi.
1995. A HMM Part-of-Speech Tagger for Korean with
wordphrasal Relations. In Proceedings of Recent Advances
in Natural Language Processing.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della
Pietra, Robert L. Mercer. 1991. Word Sense disambiguation
using statistical methods. In Proceedings of 29th Annual
Meeting of ACL, Berkeley, CA.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della
Pietra, Robert L. Mercer. 1993. The Mathematics of
Statistical Machine Translation: Parameter Estimation.
Computational Linguistics, 19(2):263-311.
