Aligning More Words with High Precision for Small Bilingual Corpora 
Sur-Jin Ker 
Department of Computer Science 
National Tsing Hua University 
Hsinchu, Taiwan, ROC 30043 
jschang@cs.nthu.edu.tw
Jason J. S. Chang 
Department of Computer Science 
National Tsing Hua University
Hsinchu, Taiwan, ROC 30043 
jschang@cs.nthu.edu.tw 
Abstract 
In this paper, we propose an algorithm for
aligning words with their translations in a
bilingual corpus. Conventional algorithms are
based on word-by-word models, which require
bilingual data with hundreds of thousands of
sentences for training. With a word-based
approach, less frequent words or words with
diverse translations generally do not have
statistically significant evidence for confident
alignment. Consequently, incomplete or
incorrect alignments occur. Our algorithm
attempts to handle the problem using class-based
rules which are automatically acquired from
bilingual materials such as a bilingual corpus or
a machine readable dictionary. The procedure for
acquiring these rules is also described. We found
that the algorithm can align over 80% of word
pairs while maintaining a comparably high
precision rate, even when a small corpus was
used in training. The algorithm also has the
advantage of producing a tagged corpus for
word sense disambiguation.
1. Introduction 
Brown et al. (1990) initiated much of the recent
interest in bilingual corpora. They advocated applying
a statistical approach to machine translation (SMT).
The SMT approach can be understood as a word-by-word
model consisting of two submodels: a language
model for generating a source text segment ST and a
translation model for translating ST to a target text
segment TT. They recommended using an aligned
bilingual corpus to estimate the parameters of the
translation probability, Pr(ST | TT), in the translation
model. The resolution of alignment can vary from low
to high: section, paragraph, sentence, phrase, and
word (Gale and Church 1993; Matsumoto et al. 1993).
In addition to machine translation, many
applications for aligned corpora have been proposed,
including bilingual lexicography (Gale and Church
1991, Smadja 1992, Daille, Gaussier and Lange 1994),
and word-sense disambiguation (Gale, Church and
Yarowsky 1992, Chen and Chang 1994).
In the context of statistical machine translation,
Brown et al. (1993) presented a series of five models
for Pr(ST | TT). The first two models have been used
in research on word alignment. Model 1 assumes that
Pr(ST | TT) depends only on the lexical translation
probability t(s | t), i.e., the probability of the i-th word
s in ST producing the j-th word t in TT as its translation.
The pair of words (s, t) is called a connection. Model
2 enhances Model 1 by considering the dependence of
Pr(ST | TT) on the distortion probability, d(i | j, l, m),
where l and m are the numbers of words in ST and TT,
respectively.
Using an EM algorithm for Model 2, Brown et al.
(1990) reported that the model produced seventeen
acceptable translations for twenty-six test
sentences. However, the degree of success in word
alignment was not reported.
Dagan, Church and Gale (1993) proposed directly
aligning words without the preprocessing phase of
sentence alignment. Under this proposal, a rough
character-by-character alignment is first performed.
From this rough character alignment, words are
aligned using an EM algorithm for Model 2 in a
fashion quite similar to the method presented by
Brown et al. Instead of d(i | j, l, m), a smaller set of
offset probabilities, o(i - i'), was used, where the i'-th
word of ST was connected to the j-th word of TT in the
rough alignment. This algorithm was evaluated on a noisy
English-French technical document. The authors
claimed that 60.5% of 65,000 words in the document
were correctly aligned. For 84% of the words, the
offset from correct alignment was at most 3.
Motivated by the need to reduce the memory
requirement and to ensure robustness in the estimation
of probabilities, Gale and Church (1991) proposed an
alternative algorithm in which probabilities are not
estimated and stored for all word pairs. Instead, only
strongly associated word pairs are found and stored.
This is achieved by applying the φ² test, a χ²-like
statistic. The extracted word pairs are used to match
words in ST and TT. The algorithm works from left to
right in ST, using a dynamic programming procedure to
maximize Pr(ST | TT). The probability t(s | t) is
approximated as a function of fan-in, the number of
matches (s', t) for all s' ∈ ST, while the distortion
d(i | j, l, m) is approximated as a probability function,
Pr(match | j' - j), of the slope, j' - j, where (i', j') are
the positions of the nearest connection to the left of s.
The authors claim that when a relevant threshold is set,
the algorithm can recommend connections for 61% of
the words in 800 sentence pairs. Approximately 95%
of the suggested connections are correct.
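As a concrete illustration of the association test mentioned
above, the following minimal sketch computes φ² from a 2x2
contingency table of aligned sentence pairs, using the usual
definition (ad - bc)² / ((a+b)(a+c)(b+d)(c+d)). The function
name and the argument layout are illustrative assumptions, not
taken from any of the cited implementations.

```python
def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """phi^2 association score for a source word s and a target word t.

    a: aligned sentence pairs containing both s and t
    b: pairs containing s but not t
    c: pairs containing t but not s
    d: pairs containing neither
    """
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return (a * d - b * c) ** 2 / denom


# Two words that always co-occur score 1; independent words score 0.
perfect = phi_squared(10, 0, 0, 10)
independent = phi_squared(5, 5, 5, 5)
```

The score lies between 0 and 1, so a single threshold can be
used to keep only strongly associated word pairs.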
In this paper, we propose a word-alignment
algorithm based on classes derived from sense-related
categories in existing thesauri. We refer to this
algorithm as SenseAlign. The proposed algorithm
relies on an automatic procedure to acquire class-based
rules for alignment. It does not employ word-by-word
translation probabilities; nor does it use a
lengthy iterative EM algorithm for converging to such
probabilities. Results obtained from the algorithm
demonstrate that classification based on existing
thesauri is very effective in broadening coverage while
maintaining high precision. When trained with a
corpus only one-tenth the size of the corpus used in
Gale and Church (1991), the algorithm aligns over
80% of word pairs with comparable precision (93%).
Moreover, since the rules are based on sense
distinctions, word sense ambiguity can be resolved in
favor of the senses corresponding to the rules applied
in the alignment process.
The rest of this paper is organized as follows. In
the next section, we describe SenseAlign and discuss
its main components. Examples of its output are
provided in Section 3. All examples and their
translations are taken from the Longman English-Chinese
Dictionary of Contemporary English (Procter
1988, LecDOCE, henceforth). Section 4 summarizes
the results of inside and outside tests. In Section 5, we
compare SenseAlign to several other approaches that
have been proposed in the computational linguistics
literature. Finally, Section 6 summarizes the paper.
2. The Word Alignment Algorithm 
2.1 Preliminary details. SenseAlign is a class-based
word alignment system that utilizes both existing and
acquired lexical knowledge. The system contains the
following components and distinctive features.
A. A greedy algorithm for aligning words. The
algorithm is a greedy decision procedure for
selecting preferred connections. The evaluation is
based on composite scores of various factors:
applicability, specificity, fan-out, relative distortion
probabilities, and evidence from bilingual
dictionaries.
B. Lexical preprocessing. Morphological analysis,
part-of-speech tagging, and idiom identification are
performed for the two languages involved. In
addition, certain morpho-syntactic analyses are
performed to handle structures that are specific
to only one of the two languages involved. By
doing so, the sentences are brought closer to each
other in the number of words.
C. Two thesauri for classifying words (McArthur
1992; Mei et al. 1993). Classification allows a
word to align with a target word using the
collective translation tendency of words in the
same class. Class-based rules have far fewer
parameters, are easier to acquire, and can be
applied more broadly.
D. Two different ways of learning class-based
rules. The class-based rules can be acquired either
from bilingual materials such as example sentences
and their translations, or from definition sentences
for senses in a machine readable dictionary.
E. Similarity between connection target and
dictionary translations. In 40% of the correct
connections, the target of the connection and the
dictionary translation have at least one Chinese
character in common. To exploit this thesaury
effect¹ in translation, we include similarity between
target and dictionary translation as one of the
factors.
F. Relative distortion. The translation process tends
to preserve contiguous syntactic structures. The
target position in a connection depends highly on
that of adjacent connections. Therefore, parameters
in a model of distortion based on absolute position
are highly redundant. Replacing probabilities of the
form d(i | j, l, m) with relative distortion is a feasible
alternative. By relative distortion, rd, for the
connection (s, t) of the i-th source word and the j-th
target word, we mean (j - j') - (i - i'), where the i'-th
word, s', in the same syntactic structure as s, is
connected to the j'-th word, t', in TT.
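The definition of relative distortion in item F can be written
out directly. The sketch below is only an illustration: the
function name is hypothetical, and the choice of anchor
connection (i', j') is left to the caller, since how the anchor
is picked within a syntactic structure is not spelled out here.

```python
def relative_distortion(i: int, j: int, i_anchor: int, j_anchor: int) -> int:
    """rd = (j - j') - (i - i') for a candidate connecting source word i
    to target word j, measured against an anchor connection (i', j')."""
    return (j - j_anchor) - (i - i_anchor)


# Sentence pair (1e, 1c) of Section 3, with words numbered from 1 and a
# dummy connection (0, 0) placed to the left of both sentences:
#   English: I(1) caught(2) a(3) fish(4) yesterday(5)
#   Chinese: zuotian(1) wuo(2) budao(3) yitiao(4) yu(5)
rd_yesterday = relative_distortion(5, 1, 0, 0)  # yesterday -> zuotian
rd_fish = relative_distortion(4, 5, 0, 0)       # fish -> yu
```

Under these assumptions the two candidates get rd values of -4
and 1. Their magnitudes agree with the rd entries given for the
same candidates in the first iteration of Table 4, which suggests
that the table reports the absolute value.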
2.2. Acquisition of alignment rules. Class-based
alignment rules can be acquired from a bilingual
corpus. Table 1 presents the ten rules with the
highest applicability acquired from the example
sentences and their translations in LecDOCE.
Alternatively, we can acquire rules from the bilingual
definition text for senses in a bilingual dictionary.
The definition sentences are disambiguated using a
sense division based on thesauri for the two languages
involved. Each sense is assigned codes from the two
thesauri according to its definition in both languages.
See Table 2 for examples of sense definitions and
acquired rules.
¹ From one aspect, those words sharing common
characters can be considered as synonyms that would
appear in a thesaurus. Fujii and Croft (1993) pointed
out that this thesaury effect of Kanji in Japanese helps
broaden the query favorably for character-based
information retrieval of Japanese documents.

2.3 Evaluation of connection candidates.
Connection candidates can be evaluated using various
factors of confidence. The probabilities of having a
correct connection as functions of these factors are
estimated empirically to reflect their relative
contribution to the total confidence of a connection
candidate. Table 3 lists the empirical probabilities of
various factors.
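The per-factor probabilities can be read as simple lookups. The
sketch below encodes the fan-out, relative distortion, and
similarity rows of Table 3; the function names are hypothetical,
and taking the absolute value of rd is an assumption. How the
factors are combined into a composite score is not specified in
this section, so the sketch stops at the individual lookups.

```python
def fan_out_prob(f: int) -> float:
    """Empirical probability of a correct connection given fan-out f."""
    if f <= 1:
        return 0.85
    if f == 2:
        return 0.61
    if f == 3:
        return 0.44
    return 0.42


def rd_prob(rd: int) -> float:
    """Empirical probability given relative distortion (sign ignored,
    which is an assumption of this sketch)."""
    table = {0: 0.26, 1: 0.11, 2: 0.07}
    return table.get(abs(rd), 0.04)


def sim_prob(sim: float) -> float:
    """Empirical probability given similarity to a dictionary translation."""
    if sim >= 1.0:
        return 0.94
    if sim >= 0.66:
        return 0.42
    if sim >= 0.2:
        return 0.35
    return 0.12
```

For example, a candidate with fan-out 1, rd = 1, and sim = 0.75
would receive the factor probabilities 0.85, 0.11, and 0.42.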
2.4. Alignment algorithm. Our algorithm for word
alignment is a decision procedure for selecting the
preferred connection from a list of candidates. The
initial list of selected connections contains two dummy
connections. This establishes the initial anchor points
for calculating relative distortion. The highest scored
candidate is selected and added to the list of solutions.
The newly added connection serves as an additional
anchor for a more accurate estimation of relative
distortion. The connection candidates that are
inconsistent with the selected connection are removed
from the list. Subsequently, the rest of the candidates
are re-evaluated. Figure 1 presents the
SenseAlign algorithm.
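The selection loop described above can be sketched as follows.
This is a simplified illustration, not the SenseAlign
implementation: the scoring function is supplied by the caller,
"inconsistent" is assumed to mean sharing a source or target word
with the selected connection, and the loop stops when no
candidates remain. All names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Connection = Tuple[int, int]  # (source position i, target position j)


def greedy_align(
    candidates: Iterable[Connection],
    score: Callable[[Connection, List[Connection]], float],
    n_source: int,
    n_target: int,
) -> List[Connection]:
    # Two dummy connections serve as the initial anchor points.
    selected: List[Connection] = [(0, 0), (n_source + 1, n_target + 1)]
    remaining = sorted(set(candidates))
    while remaining:
        # Re-score every surviving candidate against the current anchors.
        best = max(remaining, key=lambda c: score(c, selected))
        selected.append(best)
        # Assumed consistency test: one connection per source/target word.
        remaining = [c for c in remaining
                     if c[0] != best[0] and c[1] != best[1]]
    return sorted(selected[2:])  # drop the two dummy connections


# Toy example with fixed scores that ignore the anchors.
toy = {(1, 2): 0.9, (1, 1): 0.5, (2, 1): 0.8, (2, 2): 0.4}
alignment = greedy_align(toy, lambda c, anchors: toy[c], 2, 2)
```

Because the score function receives the current list of selected
connections, a distortion-aware score can use the newly added
connection as an anchor in the next iteration.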
3. Example of running SenseAlign. 
To illustrate how SenseAlign works, consider the pair
of sentences (1e, 1c).
(1e) I caught a fish yesterday.
(1c) Zuotian wuo budao yitiao yu.
     yesterday I catch one fish.
Table 4 shows the connections that are considered in
each iteration of the SenseAlign algorithm. Various
factors used to evaluate connections are also given.
Table 5 lists the connections in the final alignment.
4. Experiments with SenseAlign 
In this section, we present the experimental results of
an implementation of SenseAlign and related
algorithms. Approximately 25,000 bilingual example
sentences from LecDOCE are used as the
training data. The training data were used
primarily to acquire rules by a greedy learner and to
determine empirically the probability functions of
various factors. The algorithm's performance was then
tested on two sets of data, inside and outside. The inside
test consists of fifty sentence pairs from LecDOCE as
input. The outside test consists of 416 sentence pairs
from a book on English sentence patterns containing a
comprehensive set of fifty-five typical sentence
patterns. However, the words in this outside test are
somewhat more common and, thereby, easier to align.
This is evident from the slightly higher hit rate based
on simple dictionary lookup.
The first experiment is designed to demonstrate the
effectiveness of a naive algorithm (DictAlign) based
on a bilingual dictionary. According to our results,
although DictAlign produces high precision alignment,
the coverage for both test sets is below 20%.
However, if the thesaury effect is exploited, the
coverage can be increased nearly threefold to about
40%, at the expense of a decrease of around 10% in
precision.
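To make the idea of exploiting shared characters concrete, the
sketch below scores a candidate target word against a list of
dictionary translations with a Dice coefficient over characters.
This particular measure is an assumption chosen for illustration;
the similarity function actually used by DictAlign and SenseAlign
is not defined in this section, and the function names and the
example words are hypothetical.

```python
from typing import Iterable


def char_similarity(target: str, translation: str) -> float:
    """Dice coefficient over the sets of characters in two words."""
    a, b = set(target), set(translation)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def best_dictionary_sim(target: str, translations: Iterable[str]) -> float:
    """Highest similarity between the target and any dictionary translation."""
    return max((char_similarity(target, t) for t in translations), default=0.0)


# Illustrative values: an exact match, and two words sharing one character.
exact = best_dictionary_sim("银行", ["银行", "河岸"])
partial = best_dictionary_sim("岸边", ["银行", "河岸"])
```

Lowering the threshold on such a score admits more connections,
which mirrors the trade-off between coverage and precision
reported for DictAlign.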
Table 1. Ten rules with the highest applicability
        #    Rule           Gloss for classes
  1    642   Ma001, Hj63    moving / come and go
  2    459   Jh210, Di19    jobs, trade / work
  3    440   Md108, Be21    trams / car
  4    418   Lg202, Eb28    new / new, fresh
  5    367   Da003, Bn01    building, house / building
  6    362   Gc060, Hi16    speaking / introduce
  7    349   Fc050, Ed03    qualities / good, bad
  8    310   Lh226, Tl18    nicotine / time
  9    303   Ca002, Ab04    man and woman / baby
 10    302   Fb020, Gb09    liking, loving / like, love
Table 2. Rules acquired from bilingual definitions for 12
senses of "bank" in LDOCE.
Sense   Definition and rule
1.n.1   land along the side of a river, lake, etc.
        Rule: Ld099, Be03
1.n.2   earth which is heaped up in a field or garden,
        often making a border or division.
        Rule: Ld099, Bn12
1.n.3   a mass of snow, clouds, mud, etc.
        Rule: ?b, Bb?3
1.n.4   a slope made at bends in a road or race-track,
        so that they are safer for cars to go round.
        Rule: Ld099, Be0?
1.n.5   = SANDBANK.
        Rule: Ld099, Be02
2.v.1   (of a car or aircraft) to move with one side
        higher than the other, esp. when making a turn.
        Rule: Nj295, ?d02
3.n.1   a row, esp. of OARs in an ancient boat or KEYs
        on a TYPEWRITER.
        Rule: Hb, Dn08
4.n.1   a place in which money is kept and paid out on
        demand, and where related activities go on.
        Rule: Je104, Dm04
4.n.2   (usu. in comb.) a place where something is held
        ready for use, esp. ORGANIC products of human
        origin for medical use.
        Rule: Je104, ??17
4.n.3   (a person who keeps) a supply of money or pieces
        for payment or use in a game of chance.
        Rule: Je104, Dm04
5.v.1   to put or keep (money) in a bank.
        Rule: Je106, Hj40
5.v.2   [esp. with] to keep one's money (esp. in the
        stated bank).
        Rule: Je106, Hj40
Table 3. Factor types with empirical probability
Factor  Condition and probability
Fo      f = 1      f = 2            f = 3             f > 3
Prob    0.85       0.61             0.44              0.42
App     A >= .1    .1 > A >= .01    .01 > A >= .001   .001 > A
Prob    0.95       0.90             0.85              0.43
Spec    S >= 12    12 > S >= 11     11 > S >= 10      10 > S
Prob    0.95       0.85             0.77              0.35
R.D.    rd = 0     rd = 1           rd = 2            rd > 2
Prob    0.26       0.11             0.07              0.04
Sim     sim = 1    1 > sim >= .66   .66 > sim >= .2   sim < .2
Prob    0.94       0.42             0.35              0.12
Table 4. Various factors for connection candidates
Iter  English    POS  Chinese  POS   Rule          Fan-out  Sim   rd  Spec  App
1     yesterday  NR   zuotian  Nd    Lh225, Tq23   1-1      1     4   11.2  0.0097
1     fish       NN   yu       Na    Ab032, Bi14   1-1      0.75  1   15.3  0.0017
1     I          PP   wuo      Nh    Gh280, Na02   1-1      1     1   0     0
1     I          PP   wuo      Nh    Gh280, Na05   1-1      1     1   0     0
1     fish       NN   yu       Na    Af100, Bi14   1-1      0.75  1   0     0
1     fish       NN   yu       Na    Ah120, Bi14   1-1      0.75  1   0     0
1     fish       NN   yu       Na    Ea017, Bi14   1-1      0.75  1   0     0
1     fish       NN   yu       Na    Eb031, Bi14   1-1      0.75  1   0     0
1     a          AT   yitiao   Nc    Nd098, Qa04   1-1      0.5   1   0     0
1     yesterday  NR   yu       Na    Lh225, Bi14   1-1      0     0   0     0
1     caught     VB   budao    V+Di  De098, Hm05   1-1      0     1   0     0
1     fish       NN   zuotian  Nd    Af100, Tq23   1-1      0     3   0     0
1     fish       NN   zuotian  Nd    Ah120, Tq23   1-1      0     3   0     0
1     fish       NN   zuotian  Nd    Ea017, Tq23   1-1      0     3   0     0
1     fish       NN   zuotian  Nd    Eb031, Tq23   1-1      0     3   0     0
1     fish       NN   zuotian  Nd    Ab032, Tq23   1-1      0     3   0     0
2     fish       NN   yu       Na    Ab032, Bi14   1-1      0.75  ?   15.3  0.0017
2     I          PP   wuo      Nh    Gh280, Na02   1-1      1     ?   0     0
2     I          PP   wuo      Nh    Gh280, Na05   1-1      1     ?   0     0
2     fish       NN   yu       Na    Af100, Bi14   1-1      0.75  ?   0     0
2     fish       NN   yu       Na    Ah120, Bi14   1-1      0.75  ?   0     0
2     fish       NN   yu       Na    Ea017, Bi14   1-1      0.75  ?   0     0
2     fish       NN   yu       Na    Eb031, Bi14   1-1      0.75  ?   0     0
2     a          AT   yitiao   Nc    Nd098, Qa04   1-1      0.5   ?   0     0
2     caught     VB   budao    V+Di  De098, Hm05   1-1      0     ?   0     0
3     I          PP   wuo      Nh    Gh280, Na02   1-1      1     0   0     0
3     I          PP   wuo      Nh    Gh280, Na05   1-1      1     0   0     0
3     a          AT   yitiao   Nc    Nd098, Qa04   1-1      0.5   0   0     0
3     caught     VB   budao    V+Di  De098, Hm05   1-1      0     0   0     0
4     a          AT   yitiao   Nc    Nd098, Qa04   1-1      0.5   0   0     0
4     caught     VB   budao    V+Di  De098, Hm05   1-1      0     0   0     0
5     caught     VB   budao    V+Di  De098, Hm05   1-1      0     0   0     0
In our second experiment, we use SenseAlign as
described above for word alignment, except that no
bilingual dictionary is used. In our third experiment,
we use the full SenseAlign to align the testing data.
Table 6 indicates that acquired lexical information
and existing lexical information such as a
bilingual dictionary can supplement each other to
produce optimum alignment results. The generality of
the approach is evident from the fact that the coverage
and precision for the outside test are comparable with
those of the inside test.
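The precision figures in Table 6 are consistent with the number
of hits divided by the number of matched connections. The short
check below recomputes them from the table; the dictionary
layout and names are illustrative only.

```python
# (test set, method) -> (no. matched, no. hit, reported precision in %)
TABLE6 = {
    ("inside", "DictAlign, sim = 1.0"): (59, 56, 94.9),
    ("inside", "DictAlign, sim > 0.67"): (113, 100, 88.5),
    ("inside", "DictAlign, sim > 0.5"): (151, 124, 82.1),
    ("inside", "SenseAlign without sim"): (237, 213, 89.9),
    ("inside", "Full SenseAlign"): (314, 293, 93.3),
    ("outside", "DictAlign, sim = 1.0"): (499, 486, 97.4),
    ("outside", "DictAlign, sim > 0.67"): (970, 865, 89.2),
    ("outside", "DictAlign, sim > 0.5"): (1221, 1046, 85.7),
    ("outside", "SenseAlign without sim"): (1913, 1721, 90.0),
    ("outside", "Full SenseAlign"): (2424, 2265, 93.4),
}


def precision(matched: int, hits: int) -> float:
    """Percentage of suggested connections that are correct."""
    return 100.0 * hits / matched


# Rows whose recomputed precision differs from the reported value.
mismatches = [
    key for key, (matched, hits, reported) in TABLE6.items()
    if round(precision(matched, hits), 1) != reported
]
```

With the values above, every reported precision is reproduced to
one decimal place, so the list of mismatches is empty.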
5. Discussions 
5.1 Machine-readable lexical resources vs. corpora
We believe the proposed algorithm addresses the
problem of the knowledge engineering bottleneck by
using both corpora and machine readable lexical
resources such as dictionaries and thesauri. The
corpora provide us with training and testing materials,
so that empirical knowledge can be derived and
evaluated objectively. The thesauri provide a
classification that can be utilized to generalize the
empirical knowledge gleaned from corpora.
SenseAlign achieves a degree of generality, since a
word pair can be accurately aligned even when it
occurs rarely or only once in the corpus. This kind of
generality is unattainable by statistically trained
word-based models. Class-based models obviously offer
the advantages of smaller storage requirements and higher
system efficiency. Such advantages do have their costs,
for class-based models may be over-generalized and
miss word-specific rules. However, work on class-based
systems has indicated that the advantages
outweigh the disadvantages.
5.2 Mutual information and frequency. Gale and
Church (1991) show a near-miss example where φ², a
χ²-like statistic, works better than mutual information
for selecting strongly associated word pairs to use in
word alignment. In their study, they contend that the
χ²-like statistic works better because it uses
co-nonoccurrence and the number of sentences where
one word occurs while the other does not, which are
often larger, more stable, and more indicative than the
co-occurrence used in mutual information.
The above-cited work's discussions of the χ²-like
statistic and the fan-in factor provide a valuable
reference for this work. In our attempt to improve on
the low coverage of word-based approaches, we use
simple filtering according to fan-out in the acquisition
of class-based rules, in order to maximize both
coverage and precision. The rules that provide the
most instances of plausible connections are selected.
This contrasts with approaches based on word-specific
statistics, where the strongly associated word pairs
selected may not have a strong presence in the data.
This generally corresponds to the results from recent
work on a variety of tasks such as terminology
extraction and structural disambiguation. Daille,
Gaussier and Lange (1994) demonstrated that simple
criteria related to frequency coupled with a linguistic
filter work better than mutual information for
terminology extraction. Recent work involving
structural disambiguation (Brill and Resnik 1994) also
indicated that statistics related to frequency
outperform mutual information and the φ² statistic.
6. Concluding remarks 
This paper has presented an algorithm capable of
identifying words and their translations in a bilingual
corpus. It is effective for specific linguistic reasons.
The significant majority of words in bilingual
sentences have diverging translations; those
translations are not often found in a bilingual
dictionary. However, those deviations are largely
limited within the classes defined by thesauri.
Therefore, by using a class-based approach, the
problem's complexity can be reduced in the sense that
fewer candidates need to be considered with
a greater likelihood of finding the correct translation.
In general, a slight amount of precision can
apparently be expended to gain a substantial increase
in applicability. Our results suggest that mixed
strategies can yield a broad coverage and high
precision word alignment and sense tagging system
which can produce richer information for MT and
NLP tasks such as word sense disambiguation. The
word sense information can provide a certain degree
of generality which is lacking in most statistical
procedures. The performance of the algorithm discussed
here can certainly be improved by enhancing the
various components of the algorithm, e.g.,
morphological analyses, bilingual dictionary,
monolingual thesauri, and rule acquisition. However,
this work has presented a workable core for
processing bilingual corpora. The proposed algorithm
can produce effective word-alignment results with
1. Read a pair of English-Chinese sentences.
2. Two dummies are placed to the left of the first
   word and to the right of the last word of the
   source sentence. Similarly, two dummies are added
   to the target sentence. The left dummies in the
   source and target sentences align with each
   other. Similarly, the right dummies align with
   each other. This establishes anchor points for
   calculating the relative distortion score.
3. Perform the part-of-speech tagging and
   analysis for sentences in both languages.
4. Look up the words in LEXICON and CILIN to
   determine the classes consistent with the
   part-of-speech analyses.
5. Follow the procedure in Section 2.3 to
   calculate a composite probability for each
   connection candidate according to fan-out,
   applicability, specificity of alignment rules,
   relative distortion, and dictionary evidence.
6. The highest scored candidate is selected and
   added to the list of alignments.
7. The connection candidates that are
   inconsistent with the selected connection are
   removed from the candidate list.
8. The rest of the candidates are evaluated again
   according to the new list of connections.
9. The procedure iterates until all words in the
   source sentence are aligned.
Figure 1. Alignment Algorithm of SenseAlign
Table 5. The final alignment
English    English   Chinese   Chinese
Word       Code      Word      Code
I          Gh280     wuo       Na05
caught     De098     bu-dao    Hm05
a          Nd098     yi-tiao   Qa04
fish       Ab032     yu        Bi14
yesterday  Lh225     zuotian   Tq23
Table 6. Experimental Results
Inside Test
                          No. Matched   # Hit   Coverage   Precision
DictAlign with sim = 1.0           59      56      15.3%       94.9%
DictAlign with sim > 0.67         113     100      29.4%       88.5%
DictAlign with sim > 0.5          151     124      39.2%       82.1%
SenseAlign without sim            237     213      61.7%       89.9%
Full SenseAlign                   314     293      81.8%       93.3%
Outside Test
                          No. Matched   # Hit   Coverage   Precision
DictAlign with sim = 1.0          499     486      16.8%       97.4%
DictAlign with sim > 0.67         970     865      32.7%       89.2%
DictAlign with sim > 0.5         1221    1046      41.1%       85.7%
SenseAlign without sim           1913    1721      66.8%       90.0%
Full SenseAlign                  2424    2265      84.7%       93.4%
sense tagging which can provide a basis for such NLP
tasks as word sense disambiguation (Chen and Chang
1994) and PP attachment (Chang and Chen 1995).
While this paper has specifically addressed only
English-Chinese corpora, the linguistic issues that
motivated the algorithm are quite general and are to a
great degree language independent. If this is the case,
the algorithm presented here should be adaptable
to other language pairs. The prospects for Japanese, in
particular, seem highly promising. There has been some
work on alignment of English-Japanese texts using
both dictionaries and statistics (Utsuro, Ikeda,
Yamane, Matsumoto and Nagao 1994).
Acknowledgments 
The authors would like to thank the National Science
Council of the Republic of China for financial support
of this manuscript under Contract No. NSC 84-102-1211.
We thank Zebra Corporation and Longman Group for
providing the machine readable dictionary.
Special thanks are due to Mathis H. C. Chen for the work
of preprocessing the MRD. Thanks are also due to
Keh-Yih Su for many helpful comments on an early
draft of this paper.
References 
1. Brill, Eric and P. Resnik, (1994). A Rule-based
Approach to Prepositional Phrase Attachment, In
Proceedings of the 15th International Conference
on Computational Linguistics, 1198-1205, Kyoto,
Japan.
2. Brown, P., J. Cocke, S. Della Pietra, V. Della
Pietra, F. Jelinek, J. Lafferty, R. Mercer, and P.
Roossin, (1990). A Statistical Approach to Machine
Translation, Computational Linguistics, 16:2, page
79-85.
3. Brown, P., S. Della Pietra, V. Della Pietra, and R.
Mercer, (1993). The Mathematics of Statistical
Machine Translation: Parameter Estimation,
Computational Linguistics, Vol. 19, No. 2, page
263-311.
4. Chang, J. S. and M. H. C. Chen, (1995). Structure
Ambiguity and Conceptual Information Retrieval,
In Proceedings of Pacific Asia Conference on
Language, Information and Computation, page 16-23.
5. Chen, J. N. and J. S. Chang, (1994). Towards
Generality and Modularity in Statistical Word
Sense Disambiguation, In Proceedings of Pacific
Asia Conference on Formal and Computational
Linguistics, page 45-48.
6. Dagan, Ido, K. W. Church and W. A. Gale, (1993).
Robust Bilingual Word Alignment for Machine
Aided Translation, In Proceedings of the Workshop
on Very Large Corpora: Academic and Industrial
Perspectives, page 1-8.
7. Daille, B., E. Gaussier and J.-M. Lange, (1994).
Towards automatic extraction of monolingual and
bilingual terminology, In Proceedings of the
International Conference on Computational
Linguistics, 515-521.
8. Fujii, Hideo and W. Bruce Croft, (1993). A
Comparison of Indexing Techniques for Japanese
Text Retrieval, In Proceedings of the 16th
International ACM SIGIR Conference on Research
and Development in Information Retrieval, page
237-246.
9. Gale, W. and K. Church, (1993). A Program for
Aligning Sentences in Bilingual Corpora,
Computational Linguistics, 19(1), page 75-102.
10. Gale, W. A. and K. W. Church, (1991). Identifying
Word Correspondences in Parallel Texts, In
Proceedings of the Fourth DARPA Speech and
Natural Language Workshop, page 152-157,
Pacific Grove, CA., February.
11. Gale, W. A., K. W. Church, and David Yarowsky,
(1992). Using bilingual materials to develop word
sense disambiguation methods, In Proceedings of
the Fourth International Conference on
Theoretical and Methodological Issues in Machine
Translation, 101-112, Montreal, Canada.
12. Kay, Martin and Martin Roscheisen, (1993). Text-
Translation Alignment, Computational Linguistics,
Vol. 19, No. 1, page 121-142.
13. Longman, (1993). Longman English-Chinese
Dictionary of Contemporary English, Published by
Longman Group (Far East) Ltd., Hong Kong.
14. Matsumoto, Y. et al. (1993). Structural Matching
of Parallel Texts, In Proceedings of the 31st
Annual Meeting of the Association for
Computational Linguistics, page 1-30, Ohio, USA.
15. McArthur, T. (1992). Longman Lexicon of
Contemporary English, Published by Longman
Group (Far East) Ltd., Hong Kong.
16. Mei, J. J. et al., (1993). Tongyici Cilin (Word
Forest of Synonyms), Tong Hua Publishing, Taipei,
(traditional Chinese edition of a simplified Chinese
edition published in 1984).
17. Procter, Paul, (1988). Longman English-Chinese
Dictionary of Contemporary English, Longman
Group (Far East), Hong Kong.
18. Utsuro, T., H. Ikeda, M. Yamane, M. Matsumoto,
and M. Nagao, (1994). Bilingual text matching
using bilingual dictionary and statistics, In
Proceedings of the 15th International Conference
on Computational Linguistics, page 1076-1083,
Kyoto, Japan.
