Extraction of Lexical Translations from Non-Aligned Corpora 
Kumiko TANAKA 
Faculty of Engineering 
The University of Tokyo 
7-3-1 Hongo, Bunkyo-ku 
Tokyo 113 JAPAN 
kumiko@ipl.t.u-tokyo.ac.jp
Hideya IWASAKI* 
Educational Computer Centre 
The University of Tokyo 
2-11-16 Yayoi, Bunkyo-ku 
Tokyo 113 JAPAN 
iwasaki@rds.ecc.u-tokyo.ac.jp
Abstract 
A method for extracting lexical translations from non-aligned corpora is proposed to cope with the unavailability of large aligned corpora. The assumption that "translations of two co-occurring words in a source language also co-occur in the target language" is adopted and represented in a stochastic matrix formulation. The translation matrix provides the co-occurrence information translated from the source into the target language. This translated co-occurrence information should resemble that of the target language itself when the ambiguity of the translational relation is resolved. An algorithm to obtain the best translation matrix is introduced. Some experiments were performed to evaluate the effectiveness of the ambiguity resolution and the refinement of the dictionary.
1 Introduction 
Alignment of corpora is now being actively studied to support example-based automatic translation and dictionary refinement. For the latter, the maximum likelihood method is applied to a roughly aligned corpus to obtain lexical translations. One of the problems of this method is that it needs a large amount of aligned corpus for training (Brown, 1993).
When such a corpus exists, a qualified dictionary is also likely to exist, because one must have been created and used when the source-language corpus was translated by hand to make the aligned corpus. In such a case there is little need to improve the dictionary. On the other hand, when no large aligned corpus exists but two independent corpora do, for example, corpora of two 'not so international' languages or corpora in a constrained domain, the low-quality dictionaries need to be improved.
*Author's current address: Department of Computer Science, Tokyo University of Agriculture and Technology, 2-24-16 Naka-machi, Koganei, Tokyo 184 JAPAN.
To make a new dictionary between two uncommon languages, it is often necessary to derive it from two published dictionaries, one between the source language and an international language, the other between the international language and the target language. The problem in this process is to eliminate the irrelevant translations introduced by words with ambiguous meanings (Tanaka, 1994).
This can be thought of as choosing the translations from several candidates without an aligned corpus. Note that using an aligned corpus of insufficient size leads to the same situation.
We therefore propose a method to extract lexical translations using two corpora, one in the source and one in the target language, that are not aligned. Our method is proposed as an extension of the framework for choosing a translation according to the context. Thus, one of the merits of our research is that two problems, looking for the translation according to the global context and according to the local context, are handled within the same framework.
2 Assumption and Ambiguity Resolution
The source language is denoted as $L_A$ and the target language as $L_B$. English and Japanese have been adopted as $L_A$ and $L_B$, respectively. Matrix $A$ is defined with its $(i,j)$-th element as the value representing the co-occurrence between two words $a_i$ and $a_j$ in $L_A$, with a similar definition for $B$. $A$ and $B$ are symmetric matrices. The numbers of words in $L_A$ and $L_B$ are denoted as $N_A$ and $N_B$. The $(i,j)$-th element of matrix $X$ is denoted as $X_{ij}$.
The cited Japanese examples are listed in the Appendix with their transliterations and first meanings. The cited English examples are written in this font.
2.1 Formalization 
We assume that the translations of two co-occurring words in a source language also co-occur in the target language. For example, doctor and nurse co-occur in English, and their translations 医師 and 看護婦 also co-occur in Japanese.
[Figure 1: Calculation of $T^tAT$: $T$ maps $a_u$ to $b_k$ and $a_v$ to $b_l$, and the translated matrix $T^tAT$ is compared with $B$.]
Rapp (1995) verified this assumption between English and German. He showed that the two matrices $A$ and $B$ resemble each other when $a_i$ corresponds to $b_i$ for all $i$. His research thus made the additional assumption that English words and German words correspond one-to-one.
We introduce the translation matrix $T$ from $A$ to $B$, because a word generally corresponds to several words rather than to exactly one. The $(i,j)$-th element of $T$ is defined as the conditional probability $p(b_j \mid a_i)$, the translational probability of $b_j$ given $a_i$. $T$ is thus an $N_A \times N_B$ stochastic matrix: the elements in each row sum to 1.0.
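The co-occurrence $A_{uv}$ in $L_A$ can be translated into $L_B$ using both $p(b_k \mid a_u)$ and $p(b_l \mid a_v)$:
$$\sum_{u}\sum_{v} p(b_k \mid a_u)\, A_{uv}\, p(b_l \mid a_v) \qquad (1)$$
Evaluating (1) for every pair $(k, l)$, that is, for every element $B_{kl}$, it can be rewritten in a simple matrix formulation as follows:
$$T^{t}AT \qquad (2)$$
Note that the resulting matrix is also symmetric.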
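To check that (2) is simply (1) written for all $k$ and $l$ at once, the minimal sketch below evaluates both forms on a small made-up example. The matrices and the helper names (`translate_elementwise`, `translate_matrix`) are hypothetical; `T[u][k]` stores $p(b_k \mid a_u)$ as defined above.

```python
# Formula (1) element by element versus the matrix form (2).
# T[u][k] holds p(b_k | a_u); A is a symmetric N_A x N_A matrix.

def translate_elementwise(A, T):
    """Formula (1): sum over u, v of p(b_k|a_u) * A[u][v] * p(b_l|a_v)."""
    n_a, n_b = len(T), len(T[0])
    return [[sum(T[u][k] * A[u][v] * T[v][l]
                 for u in range(n_a) for v in range(n_a))
             for l in range(n_b)]
            for k in range(n_b)]


def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def translate_matrix(A, T):
    """Formula (2): T^t A T."""
    return matmul(transpose(T), matmul(A, T))


# Hypothetical toy data: 3 source words, 2 target words; rows of T sum to 1.0.
A = [[0.0, 3.0, 1.0],
     [3.0, 0.0, 2.0],
     [1.0, 2.0, 0.0]]
T = [[0.5, 0.5],
     [1.0, 0.0],
     [0.0, 1.0]]

X1 = translate_elementwise(A, T)
X2 = translate_matrix(A, T)
print(X2)  # [[3.0, 4.0], [4.0, 1.0]]
```

Both forms give the same symmetric $N_B \times N_B$ matrix; the matrix form is what the later gradient formula is written in.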
Returning to the example of doctor given in this section, its translation is 医師 but not 博士, because 看護婦, the translation of the co-occurring word nurse, co-occurs with 医師 but not with 博士. Thus, our assumption serves to resolve ambiguity.
This fact indicates that the translated co-occurrence matrix $T^tAT$ should resemble $B$ (Figure 1). Defining $|X - Y|$ as a certain distance between matrices $X$ and $Y$, ambiguity resolution is possible by simply obtaining the $T$ which minimizes the following formula when $A$ and $B$ are known:
$$F(T) = |T^{t}AT - B| \qquad (3)$$
Note that the above formulation assumes that the co-occurrence in $L_A$ can be transformed congruently into $L_B$. Thus, $T$ gives the pattern matching of two structures formed by co-occurrence relations (Section 4.2).
2.2 The Choice of Co-occurrence Measure and Matrix Distance
There are many alternatives for measuring the co-occurrence between two words $x$ and $y$ (Church, 1990; Dunning, 1993). With $\mathrm{freq}(x)$ as the count of $x$ in the entire text, $\mathrm{freq}(x, y)$ as the number of appearances of both $x$ and $y$ within a window of a fixed number of words, and $N$ as the number of words in the text concerned, we adopt the following mutual information:
$$\frac{N\,\mathrm{freq}(a_i, a_j)}{\mathrm{freq}(a_i)\,\mathrm{freq}(a_j)} \qquad (4)$$
Rapp argues that $\mathrm{freq}(a_i, a_j)^2 / (\mathrm{freq}(a_i)\,\mathrm{freq}(a_j))$ is more sensitive than the above. Formula (4), however, is adopted because its statistical properties have already been studied (Church, 1990).
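The sketch below computes Formula (4) from a list of words. How $\mathrm{freq}(x, y)$ is counted inside a window is our assumption: here it counts position pairs $i < j$ with $j - i$ smaller than the window size, so both words fall inside one window, and pairs of identical words are skipped. The function name `cooccurrence_values` is hypothetical.

```python
from collections import Counter


def cooccurrence_values(words, window):
    """Sketch of Formula (4): N * freq(x, y) / (freq(x) * freq(y)).

    Assumption: freq(x, y) counts position pairs i < j with j - i < window,
    i.e. both words fall inside one window of `window` consecutive words.
    Pairs of identical words are skipped.
    """
    n = len(words)
    freq = Counter(words)
    pair_freq = Counter()
    for i in range(n):
        for j in range(i + 1, min(i + window, n)):
            if words[i] != words[j]:
                pair_freq[frozenset((words[i], words[j]))] += 1
    values = {}
    for pair, count in pair_freq.items():
        x, y = tuple(pair)
        values[pair] = n * count / (freq[x] * freq[y])
    return values


# The local context of Section 3.1, already reduced to canonical forms.
values = cooccurrence_values(["doctor", "nurse", "patient"], window=11)
print(values[frozenset(("doctor", "nurse"))])  # 3.0
```

With three words each occurring once, every pair gets $3 \times 1 / (1 \times 1) = 3.0$, the value used for the local matrix $A$ in Section 3.1.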
Rapp normalized matrices $A$ and $B$. We, however, do not normalize them, because the values given by Formula (4) are already normalized by $N$.¹
The distance between matrices must also be chosen. Rapp used the sum of the absolute differences of the elements. Since we require a distance that is easy to handle analytically in order to obtain $T$ as in Section 4.1, the following definition was chosen:
$$|X - Y| = \sum_{i,j} (X_{ij} - Y_{ij})^2 \qquad (5)$$
3 Local Ambiguity Resolution 
Note that elements with value 0.0 in a matrix are denoted by "-" in the following discussion.
3.1 Example of doctor 
Suppose that doctor occurs in the local context "The doctor nursed the patient." We want to disambiguate doctor as meaning a medical doctor, not a Ph.D. Since doctor co-occurs with nurse and patient, nurse with doctor and patient, and so on, the matrix $A$ can be defined by Formula (4) as follows²:
doctor nurse patient 
doctor - 3.0 3.0 
nurse 3.0 - 3.0 
patient 3.0 3.0 - 
For $T$, only the ambiguity of doctor is considered here for simplicity, not that of nurse or patient, giving $T$ as follows:
          医師    看護婦   患者    博士    大学
doctor    T11     -        -       T14     -
nurse     -       1.0      -       -       -
patient   -       -        1.0     -       -
Note that 看護婦 is a co-occurring word of 医師. Here we are interested in whether $T_{11} = 1.0$ (doctor → 医師) or $T_{14} = 1.0$ (doctor → 博士); the correct answer is clearly $T_{11} = 1.0$.
¹ When we normalized $A$ and $B$ and applied the incremental calculation described in Section 4, $T$ empirically oscillated and did not converge, because $N_A$ and $N_B$ can differ drastically.
² The value 3.0 refers to $N_A$, which is calculated as $(N_A \times 1)/(1 \times 1) = N_A$, where 1 is the frequency of each occurrence. Here $N_A$ is 3, the three words doctor, nurse and patient. The quality of $A$ is poor from a statistical point of view (Church, 1990). What local ambiguity resolution needs is only the information about which words co-occur; the co-occurrence values are not that important when forming $A$. Although there are other ways to form $A$, for example, simply setting all the elements concerned to 1.0, this definition was used because the local and global problems can then be handled within exactly the same framework.
$B$ is obtained globally from the corpus in $L_B$. Suppose that $B$ for the words in question is given for simplicity as follows:
          医師    看護婦   患者    博士    大学
医師      -       10.0     50.0    -       -
看護婦    10.0    2.0      8.0     -       -
患者      50.0    8.0      -       -       -
博士      -       -        -       3.0     15.0
大学      -       -        -       15.0    3.0
We tentatively set $T_{11} = 1.0$, so that doctor corresponds to 医師, and calculated $T^tAT$, giving the following result with $F(T) = 5038$:
          医師    看護婦   患者    博士    大学
医師      -       3.0      3.0     -       -
看護婦    3.0     -        3.0     -       -
患者      3.0     3.0      -       -       -
博士      -       -        -       -       -
大学      -       -        -       -       -
Next, we set $T_{14} = 1.0$, so that doctor corresponds to 博士. $T^tAT$ gave the following result with $F(T) = 5758$:
          医師    看護婦   患者    博士    大学
医師      -       -        -       -       -
看護婦    -       -        3.0     3.0     -
患者      -       3.0      -       3.0     -
博士      -       3.0      3.0     -       -
大学      -       -        -       -       -
These two results indicate that the $T$ with $T_{11} = 1.0$ (doctor → 医師) brings $T^tAT$ closer to $B$ than the $T$ with $T_{14} = 1.0$ (doctor → 博士). Therefore the translation of doctor is determined to be 医師.
The algorithm to choose the translation from 
several candidates reflecting the local context is 
summarized as follows: 
1. Create a local A. 
2. For each candidate, make a T that assumes the candidate to be the translation, and calculate the distance F(T).
3. Choose the T with the minimum F(T). 
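The three steps above can be sketched in pure Python on the data of Section 3.1. This is a minimal illustration, not the authors' implementation; the helper names are hypothetical, and returning `None` when all scores tie mirrors the "unresolved" rule used in the experiments of Section 5.1.

```python
def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def distance(X, Y):
    """Formula (5): sum of squared element-wise differences."""
    return sum((x - y) ** 2 for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def F(A, B, T):
    """Formula (3) with the distance of Formula (5)."""
    return distance(matmul(transpose(T), matmul(A, T)), B)


def choose_translation(A, B, T_base, row, candidates):
    """Try each candidate column as the sole translation of `row`.

    Returns (best_column, scores); best_column is None when all scores
    are equal, which Section 5.1 treats as unresolved.
    """
    scores = {}
    for c in candidates:
        T = [list(r) for r in T_base]
        T[row] = [1.0 if j == c else 0.0 for j in range(len(T_base[row]))]
        scores[c] = F(A, B, T)
    if len(set(scores.values())) == 1:
        return None, scores
    return min(scores, key=scores.get), scores


# Section 3.1: A over doctor, nurse, patient; B over 医師, 看護婦, 患者, 博士, 大学.
A = [[0.0, 3.0, 3.0], [3.0, 0.0, 3.0], [3.0, 3.0, 0.0]]
B = [[0.0, 10.0, 50.0, 0.0, 0.0],
     [10.0, 2.0, 8.0, 0.0, 0.0],
     [50.0, 8.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 3.0, 15.0],
     [0.0, 0.0, 0.0, 15.0, 3.0]]
T_base = [[0.0] * 5,                    # doctor row, filled in per candidate
          [0.0, 1.0, 0.0, 0.0, 0.0],    # nurse -> 看護婦
          [0.0, 0.0, 1.0, 0.0, 0.0]]    # patient -> 患者
best, scores = choose_translation(A, B, T_base, row=0, candidates=[0, 3])
print(best, scores)  # 0 {0: 5038.0, 3: 5758.0}
```

The two scores reproduce $F(T) = 5038$ for doctor → 医師 and $F(T) = 5758$ for doctor → 博士, so column 0 (医師) is chosen.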
3.2 Related Work 
Dagan (1994) proposed a method to choose a translation according to the local context. The significance of this work is that the ambiguity is not resolved within $L_A$, as was traditionally studied, but within $L_B$, which is the same standpoint as ours. The word to be translated ($a_u$) and its related word in the phrasal structure ($a_v$; for example, the object of a verb) were translated into $L_B$ ($b_k$ and $b_l$, respectively) using an electronic dictionary. The co-occurrence frequency within $L_B$ was measured, and $p(b_k, b_l \mid a_u, a_v)$ was estimated as follows:
$$p(b_k, b_l \mid a_u, a_v) = \frac{\mathrm{freq}(b_k, b_l)}{\sum_{i,j} \mathrm{freq}(b_i, b_j)} \qquad (6)$$
where $b_i$ and $b_j$ range over the candidate translations of $a_u$ and $a_v$. Dagan chose the $b_k$ with the largest $p(b_k, b_l \mid a_u, a_v)$ as the translation after statistically testing its reliability.
The difference from our method is that he estimated the translational probability of pairs (the word and its co-occurring word), whereas our framework reduces the translational probability of pairs to that of single words. Thus, our method can also be applied to obtain global translations, as explained in the following section.
4 Global Extraction of Translations
The extraction of global lexical translations is formulated using the same framework as ambiguity resolution in the local context. The difference is that $A$ is formed globally from the corpus in $L_A$. For the local context, the number of possible translations is small enough that each case can be tested one after another to find the best $T$. Unfortunately, the same method cannot be applied to obtain global translations, because the number of combinations of possible translations explodes. Hence, we propose a method to update $T$ incrementally.
4.1 Steepest Descent Method 
$T$ is not a square matrix, and the number of equations obtained from $T^tAT = B$ is not always equal to the number of variables $T_{ij}$, so the equation cannot always be solved directly. We therefore try to obtain the best $T$ by the Steepest Descent Method (SDM), minimizing Formula (3). $T$ is incrementally updated from $T_n$ to $T_{n+1}$ by:
$$T_{n+1} = T_n + dT \qquad (7)$$
where $dT$ is calculated, with $ds$ being a small step length, as:
$$dT_{ij} = -\frac{\partial F}{\partial T_{ij}}\, ds \qquad (8)$$
With the distance of Formula (5), the result can be represented as follows:
$$dT = -4AT(T^{t}AT - B)\, ds \qquad (9)$$
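The step from (8) to (9) can be filled in as follows, assuming the squared distance of Formula (5) and symmetric $A$ and $B$. Write $M = T^{t}AT - B$, which is then also symmetric:
$$F(T) = \sum_{k,l} M_{kl}^2 = \operatorname{tr}(M^{t}M), \qquad dF = 2\operatorname{tr}(M\,dM), \qquad dM = dT^{t}AT + T^{t}A\,dT$$
By the cyclic property of the trace and the symmetry of $A$ and $M$, both $\operatorname{tr}(M\,dT^{t}AT)$ and $\operatorname{tr}(M\,T^{t}A\,dT)$ equal $\operatorname{tr}\big((ATM)^{t}\,dT\big)$, so
$$dF = 4\operatorname{tr}\big((ATM)^{t}\,dT\big) \quad\Longrightarrow\quad \frac{\partial F}{\partial T} = 4AT(T^{t}AT - B)$$
Substituting this gradient into (8) gives (9).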
The constraint on $T$ that each row must sum to 1.0 can be reflected in the calculation using Lagrange's method of undetermined multipliers.
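A minimal pure-Python sketch of the update (7) to (9), run on the data of Section 3.1. Two parts are our assumptions: the row-sum constraint is applied by subtracting, in each row, the mean gradient over the free entries (for a single step with a linear equality constraint this is what a Lagrange multiplier yields), and negative entries are clipped to zero and the row renormalised, a safeguard the text does not describe. Following Section 4.2, entries that are zero in $T_0$ are kept at zero. The names `sdm_step` and `mask`, the step length, and the iteration count are illustrative.

```python
def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def sdm_step(A, B, T, mask, ds):
    """One step T_{n+1} = T_n + dT with dT = -4 A T (T^t A T - B) ds."""
    AT = matmul(A, T)
    X = matmul(transpose(T), AT)                              # T^t A T
    M = [[x - b for x, b in zip(rx, rb)] for rx, rb in zip(X, B)]
    G = [[4.0 * g for g in row] for row in matmul(AT, M)]     # dF/dT, Formula (9)
    new_T = []
    for i, row in enumerate(T):
        free = [j for j, ok in enumerate(mask[i]) if ok]
        # Row-sum constraint: removing the mean gradient over the free
        # entries keeps sum_j dT_ij = 0, so the row still sums to 1.0.
        mean = sum(G[i][j] for j in free) / len(free) if free else 0.0
        new_row = [row[j] - ds * (G[i][j] - mean) if mask[i][j] else 0.0
                   for j in range(len(row))]
        # Assumption (not in the text): clip negatives and renormalise.
        new_row = [max(0.0, v) for v in new_row]
        total = sum(new_row)
        new_T.append([v / total for v in new_row] if total > 0 else list(row))
    return new_T


# Data of Section 3.1: A over doctor, nurse, patient;
# B over 医師, 看護婦, 患者, 博士, 大学.
A = [[0.0, 3.0, 3.0], [3.0, 0.0, 3.0], [3.0, 3.0, 0.0]]
B = [[0.0, 10.0, 50.0, 0.0, 0.0],
     [10.0, 2.0, 8.0, 0.0, 0.0],
     [50.0, 8.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 3.0, 15.0],
     [0.0, 0.0, 0.0, 15.0, 3.0]]
# T_0: doctor split evenly between 医師 and 博士.
T = [[0.5, 0.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.0]]
mask = [[v != 0.0 for v in row] for row in T]   # zeros of T_0 stay zero
for _ in range(200):
    T = sdm_step(A, B, T, mask, ds=1e-4)
print([round(v, 3) for v in T[0]])  # [1.0, 0.0, 0.0, 0.0, 0.0]
```

Starting from doctor split evenly between 医師 and 博士, the probability mass moves entirely to 医師, in line with the exhaustive comparison of Section 3.1. The step length `ds = 1e-4` was picked for this toy data only; larger data would need its own choice and a convergence test such as the one on $F(T)$ used in Section 5.2.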
4.2 Characteristics of Our Method 
If words are regarded as nodes, and relations such as co-occurrences and translations as branches, then matrices $A$, $B$ and $T$ represent graphs. Suppose that $A$ and $B$ are exactly the same graph, as in Figure 2, which also shows their representation matrices.
The best $T$ is obviously as follows:
T =   -     -     -     1.0
      -     -     1.0   -
      -     1.0   -     -
      1.0   -     -     -
This means that $a_1$, $a_2$, $a_3$, $a_4$ correspond to $b_4$, $b_3$, $b_2$, $b_1$, respectively. It also indicates that $a_1$ does not correspond to $b_3$, $b_2$, or $b_1$, which is exactly the disambiguation. In terms of linear algebra, the calculation $T^tAT$ is a so-called "congruent transformation." $T$ provides the pattern matching of the two graphs given by $A$ and $B$.
[Figure 2: Graphs of matrices A and B: the graph of A (nodes a1 to a4) and the identical graph of B (nodes b1 to b4), with branch weights p, q, r and s, together with their representation matrices.]
[Figure 3: Another graph of matrix B: nodes b1 to b8, written as a block matrix containing the graph of A twice.]
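The congruent transformation can be checked numerically. In this sketch the weights standing in for p, q, r, s and the shape of the graph of $A$ are hypothetical, since the exact matrices of Figure 2 are not reproduced here; $B$ is built as $A$ with its nodes relabelled so that $a_1, a_2, a_3, a_4$ become $b_4, b_3, b_2, b_1$.

```python
def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def F(A, B, T):
    """Formula (3) with the squared distance of Formula (5)."""
    X = matmul(transpose(T), matmul(A, T))
    return sum((x - b) ** 2 for rx, rb in zip(X, B) for x, b in zip(rx, rb))


p, q, r, s = 1.0, 2.0, 3.0, 4.0       # hypothetical branch weights
A = [[0.0, p, r, s],
     [p, 0.0, q, 0.0],
     [r, q, 0.0, 0.0],
     [s, 0.0, 0.0, 0.0]]
T_best = [[0.0, 0.0, 0.0, 1.0],       # a1 -> b4
          [0.0, 0.0, 1.0, 0.0],       # a2 -> b3
          [0.0, 1.0, 0.0, 0.0],       # a3 -> b2
          [1.0, 0.0, 0.0, 0.0]]       # a4 -> b1
B = matmul(transpose(T_best), matmul(A, T_best))   # same graph, relabelled nodes
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(F(A, B, T_best), F(A, B, identity))  # 0.0 40.0
```

With the permutation matrix the congruent transformation reproduces $B$ exactly ($F = 0$), while the naive one-to-one mapping $a_i \to b_i$ leaves a positive distance.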
Next, suppose that $A$ is defined as above and $B$ is written as a block matrix, as shown in Figure 3, containing the same graph as $A$ twice. $T$ will clearly be $T = \frac{1}{2}(E\;\, E)$, with $E$ being the unit matrix of size 4. The point is that our algorithm has a limit for ambiguity resolution, especially when several resembling graphs are interconnected: for example, the ambiguity of $a_1$ cannot be resolved between $b_1$ and $b_5$.
On the other hand, as shown in (Brown, 1993), methods using aligned corpora do not have this limit. Brown's method starts with every English word corresponding to all French words, and only a few French words remain as translations in the result. This difference shows a weak point of our method compared with Brown's.
Our method, assuming that the two graphs can be linearly transformed into each other, only tries to match the two graphs in $L_A$ and $L_B$ without an aligned corpus, so some hints for obtaining the correct correspondences, that is, some compensation for the lack of an aligned corpus, are needed. For example, when the value of the $(i,j)$-th element is zero in $T_0$, the value of the same element can be kept at zero during the SDM.
4.3 Related Work 
Some research using aligned corpora points out problems with corpus size and noise, which lead to insufficient accuracy in the extracted translations.
Fung (1995) asserts that the translation of a word or phrase might not exist even in the aligned corpus. She extracted noun translations from a noisy aligned corpus: first, a number of obvious translations were statistically extracted; then the uncertain translations were found using their co-occurrence with the obvious ones.
[Table 1: Local Ambiguity Resolution Power, broken down by part of speech (noun, verb, adjective, adverb), with the number of unresolved words; 49 words were unresolved in total.]
Utsuro (1994) claimed that there is a need to extract lexical translations even from a small aligned corpus, and proposed using an electronic dictionary as an aid. First, a certain number of candidates are found. If a candidate in $L_B$ co-occurs with another word found in the electronic dictionary, its probability of being the translation is adjusted upward.
The idea common to the two approaches, the use of lexical co-occurrence within $L_B$, was also introduced by Dagan (1994).
5 Experiments 
Two experiments, local and global, were performed by choosing the Japanese translations for English words. The corpora adopted are the 30M Wall Street Journal corpus and 33M of political and economic articles from the Asahi Newspaper. These were morphologically analyzed³ to extract nouns, verbs, adjectives and adverbs in their canonical forms. Co-occurrences were counted using an 11-word window. $A$ and $B$ were created as described in Section 2.1. Elements below certain thresholds were set to 0.0. The initial bilingual dictionary used was Edict (Breen, 1995), a public word-to-word dictionary.
5.1 Local Ambiguity Resolution 
We randomly extracted 11 successive words from the corpus. If the 6th (center) word was ambiguous and satisfied the following three conditions, the method explained in Section 3.1 was applied to disambiguate it: its translation could be subjectively judged from the context; the translations exist in Edict; and Edict contains candidates other than the translation.
The calculated choice was the candidate that exhibited the minimum $F(T)$. If all the scores were the same, the word was judged unresolved. When the translations we judged subjectively contained the calculated choice, the result was counted as correct, otherwise as wrong. The experiment was continued until the method had been applied to 200 different words.
³ PC-KIMMO and JUMAN were used.
[Figure 4: A graph of doctor: the English words doctor, nurse, hospital, patient, hurt, university, professor, paper, research and scissors, with co-occurrence values attached to the branches.]
[Figure 5: The corresponding Japanese graph, including the candidates for doctor such as 博士, with co-occurrence values attached to the branches.]
Table 1 shows the results. The applicability, the rate of words that were not unresolved, was 75.5% ((124+27)/200). The correctness (precision), the rate of correct choices among the words not unresolved, was 82.1% (124/(124+27)).
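The two rates above follow directly from the counts; a short sketch (the function name `local_rates` is hypothetical) makes the definitions explicit.

```python
def local_rates(correct, wrong, total):
    """Applicability = not-unresolved / total; precision = correct / not-unresolved."""
    resolved = correct + wrong          # words that were not unresolved
    return resolved / total, correct / resolved


applicability, precision = local_rates(correct=124, wrong=27, total=200)
print(f"{applicability:.1%} {precision:.1%}")  # 75.5% 82.1%
```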
The general trends found are as follows:
• Translations reflect the trends in the corpus. For example, for doctor, 医師 was calculated to be the best choice. Although 医者 was also a candidate meaning medical doctor, it was dropped, because 医者 is a rather uncommon usage in the corpus.
• Most words with two obviously different meanings were resolved correctly.
The applicability depends on the window size: the window should be large enough to pin down the meaning of the word in question. The smaller the window, the lower the applicability is expected to be. However, even if the window is made wider, the rate should eventually reach a certain limit.
5.2 Global Extraction of Translations 
Example of doctor 
Figure 4 shows a small graph concerning doctor. The values attached to the branches represent co-occurrences. Figure 5 shows the corresponding graph in Japanese. We initially defined $A$ and $B$ from these graphs, and $T_0$ with each English word corresponding one-to-one to a Japanese word (with value 1.0), except that three ambiguous words have the following correspondences:
doctor → 医師 (0.333), 博士 (0.333), 医者 (0.334)
patient → 我慢する (0.5), 患者 (0.5)
paper → 論文 (0.5), 紙 (0.5)
SDM was applied to $T_0$, and its convergence was judged by the first 5 digits of $F(T)$. Convergence needed 3400 iterations. The result $T_{3400}$ is as follows:
doctor → 医師 (0.502), 博士 (0.498), 医者 (0.0)
patient → 我慢する (0.0), 患者 (1.0)
paper → 論文 (0.989), 紙 (0.011)
[Figure 6: A graph of medical doctor: the part of Figure 4 containing doctor, nurse, hospital, patient and hurt.]
[Figure 7: A graph of Ph.D.: the part of Figure 4 containing doctor, university, professor, paper, research and scissors.]
The wrong translation doctor → 医者 was dropped.
Next, we removed from Figure 4 the portion of the graph corresponding to the meaning Ph.D. (leaving Figure 6), so that the context was restricted to medical doctor. This time the result was:
doctor → 医師 (1.0), 博士 (0.0), 医者 (0.0)
patient → 我慢する (0.0), 患者 (1.0)
Then we removed from Figure 4 the portion of the graph corresponding to the meaning medical doctor (leaving Figure 7), so that the context was restricted to Ph.D., giving the result:
doctor → 医師 (0.0), 博士 (1.0), 医者 (0.0)
paper → 論文 (0.996), 紙 (0.004)
These three small experiments show that the translation for doctor reflects the context represented by the source graph in $L_A$.
Minor Analysis of 378 words 
The ideal experiment would be to calculate $T$ for the entire dictionary and measure how much the obtained translations reflect the corpus context, but this is difficult in terms of both calculation time and the judgment of context reflection. Hence we intentionally added irrelevant translations to Edict to see whether our method drops them.
The irrelevant translations were chosen randomly so that their number equals the number of translations originally in Edict. This was performed for all English words in Edict. $A$ was formed so that all the words involved are reachable within 2 co-occurrence branches of the test word. $B$ was created from all the translations of the words involved in $A$. The test words to which SDM was applied were selected by the following conditions: a test word has more than one candidate in Edict (i.e., it is ambiguous), and all of its co-occurrence values are greater than a certain threshold.
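Selecting the words "reachable within 2 co-occurrence branches" is a breadth-first search over the co-occurrence graph. The sketch below assumes the graph is given as adjacency sets; the function name `words_within` and the small graph (loosely modelled on the nodes of Figure 4, with made-up branches) are hypothetical.

```python
from collections import deque


def words_within(graph, start, max_branches=2):
    """Words reachable from `start` within `max_branches` co-occurrence branches."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if depth[word] == max_branches:
            continue                      # do not expand beyond the limit
        for nxt in graph.get(word, ()):
            if nxt not in depth:
                depth[nxt] = depth[word] + 1
                queue.append(nxt)
    return set(depth)


# Hypothetical symmetric co-occurrence graph.
graph = {
    "doctor": {"nurse", "patient", "professor", "university"},
    "nurse": {"doctor", "hospital"},
    "hospital": {"nurse", "patient"},
    "patient": {"doctor", "hospital", "hurt"},
    "hurt": {"patient"},
    "professor": {"doctor", "university", "paper"},
    "university": {"doctor", "professor", "research"},
    "paper": {"professor", "scissors"},
    "research": {"university"},
    "scissors": {"paper"},
}
print(sorted(words_within(graph, "doctor")))
```

Here scissors is three branches away from doctor and is therefore left out of $A$; the rows of $B$ would then be the Edict translations of the remaining words.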
If the candidates are separated through the calculation into the following three categories, those which gain value, those which lose value, and those whose values do not change, then we define the word in question as applicable. The following rates were calculated for CDIW (correctly dropped irrelevant words, i.e., the irrelevant words added as noise and correctly dropped by the method) for each applicable test word:
• The ratio of the number of CDIW to the number of dropped words (correctness, i.e., precision).
• The ratio of the number of CDIW to the number of irrelevant words (coverage).
The results are listed in Table 2.
Table 2: Dropped Irrelevant Translations
threshold   applicability   correctness   coverage
50.0        68.3%           84.7%         35.2%
30.0        84.7%           84.6%         41.9%
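For a single test word, the two ratios can be written with set operations. The sketch below uses hypothetical placeholder words; how the per-word values are aggregated into Table 2 is not specified here, so it stops at one word.

```python
def cdiw_rates(dropped, irrelevant):
    """Correctness and coverage for one test word.

    dropped:    translations whose value the method reduced to zero
    irrelevant: translations added to Edict as noise
    """
    cdiw = dropped & irrelevant          # correctly dropped irrelevant words
    correctness = len(cdiw) / len(dropped) if dropped else 0.0
    coverage = len(cdiw) / len(irrelevant) if irrelevant else 0.0
    return correctness, coverage


# Hypothetical outcome: four noise translations added, three words dropped.
irrelevant = {"noise1", "noise2", "noise3", "noise4"}
dropped = {"noise1", "noise2", "original1"}
print(cdiw_rates(dropped, irrelevant))  # (0.6666666666666666, 0.5)
```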
The applicability and coverage depend on the threshold: the lower the threshold, the higher the two rates, because more co-occurrence information is obtained. The threshold is a trade-off against calculation time.
About 15% (100-84.6) of the dropped words were original translations contained in Edict, i.e., incorrectly dropped. These did not match the context, similar to the case of doctor → 医者 shown in Section 5.1.
6 Conclusions 
Lexical translations were extracted from non-aligned corpora. The assumption that "translations of two co-occurring words in a source language also co-occur in the target language" was introduced and represented in a stochastic matrix formulation. The translation matrix provides the co-occurrence information translated from the source into the target language. This translated co-occurrence information should resemble that of the target language when the ambiguity of the translational relation is resolved. This condition was used to obtain the best translation matrix.
The proposed framework, aimed at ambiguity resolution, serves to obtain lexical translations globally from non-aligned corpora, just as it serves to choose a translation according to the local context. An algorithm for obtaining the best translation matrix was presented, based on the Steepest Descent Method, which is well known in the field of non-linear programming.
Two experiments were performed to examine the power of local ambiguity resolution and of dictionary refinement. The former showed a precision of 82.1% with an applicability of 75.5%. In the latter, irrelevant translations were intentionally added to the dictionary to examine whether the relevant ones would be retained. It was found that 84.7% of the dropped words were indeed irrelevant ones.
An important future task is to decrease the computational complexity. The method is applicable to matrix calculation with the size of an entire dictionary, but this is unrealistic at this stage. We must also increase the rate of ambiguity resolution. The corpus is regarded as unstructured data in this paper; the ambiguity might be resolved more effectively by introducing phrasal structure.
Acknowledgment 
We thank Dr. Koiti Hasida for useful discussions. Our experiments are supported by Dr. Kyoji Umemura's corpus data. We express our gratitude to Mr. Breen for providing Edict for our experiments.
References 
James W. Breen (1995). Edict, Freeware Japanese/English Dictionary.
Peter F. Brown et al. (1993). The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, vol. 19(2), pp. 263-311.
Kenneth W. Church and Patrick Hanks (1990). Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics, vol. 16(1), pp. 22-29.
Ido Dagan and Alon Itai (1994). Word Sense Disambiguation Using a Second Language Monolingual Corpus. Computational Linguistics, vol. 20(4), pp. 563-596.
Ted Dunning (1993). Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, vol. 19(1), pp. 61-74.
Pascale Fung (1995). A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora. Proceedings of ACL '95, pp. 236-243.
Reinhard Rapp (1995). Identifying Word Translations in Non-Parallel Texts. Proceedings of ACL '95, pp. 321-322.
Kumiko Tanaka and Violaine Prince (1995). Amélioration Automatique Incrémentale de Dictionnaires Bilingues Utilisant un Corpus Monolingue. Conference Internationale d'AUPELF '95.
Kumiko Tanaka and Kyoji Umemura (1994). Construction of a Bilingual Dictionary Intermediated by a Third Language. Proceedings of the International Conference for Computational Linguistics '94, pp. 293-393.
Takehito Utsuro et al. (1994). Bilingual Text Matching Using Bilingual Dictionary and Statistics. Proceedings of the International Conference for Computational Linguistics '94, pp. 1076-1082.
Appendix 
Japanese    Transliteration    First meaning
医者        isha               medical doctor
医師        ishi               medical doctor
博士        hakase             Ph.D.
看護婦      kangohu            nurse
看護する    kangosuru          to nurse
患者        kanja              patient
痛い        itai               hurt
大学        daigaku            university
論文        ronbun             paper as articles
教授        kyouju             professor
我慢する    gamansuru          be patient
紙          kami               paper to write on
はさみ      hasami             scissors
