Extracting Word Correspondences from Bilingual Corpora 
Based on Word Co-occurrence Information 
Hiroyuki Kaji and Toshiko Aizono 
Central Research Laboratory, Hitachi Ltd. 
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185, Japan 
{ kaji, aizono }@crl.hitachi.co.jp 
ABSTRACT 
A new method has been developed for extracting word 
correspondences from a bilingual corpus. First, the 
co-occurrence information for each word in both 
languages is extracted from the corpus. Then, the 
correlations between the co-occurrence features of the 
words are calculated pairwise with the assistance of a 
basic word bilingual dictionary. Finally, the pairs of 
words with the highest correlations are output 
selectively. This method is applicable to rather small, 
unaligned corpora; it can extract correspondences 
between compound words as well as simple words. An 
experiment using bilingual patent-specification corpora 
achieved 28% recall and 76% precision; this 
demonstrates that the method effectively reduces the cost 
of bilingual dictionary augmentation. 
1 Introduction 
Bilingual dictionaries are essential components of 
machine translation systems. One of the major problems 
with bilingual dictionaries is that they are expensive to 
build, since a huge number of terms are used in a variety 
of fields. Computer support is thus needed to reduce the 
cost of dictionary building. 
With the growing volume of text available in 
electronic form, a number of methods have been proposed 
for extracting word correspondences from bilingual 
corpora automatically. These methods can be divided 
into those taking a statistical approach (Gale & Church 
1991a; Kupiec 1993; Dagan et al. 1993; Inoue & Nogaito 
1993; Fung 1995) and those taking a linguistic approach 
(Yamamoto & Sakamoto 1993; Kumano & Hirakawa 
1994; Ishimoto & Nagao 1994). The statistical approach 
utilizes the occurrence frequencies and locations of words 
in a parallel corpus to calculate the pairwise correlations 
between the words in the two languages. The linguistic 
approach primarily extracts correspondences between 
compound words by consulting a bilingual dictionary of 
simple words. 
These proposed methods for extracting word 
correspondences from bilingual corpora have the 
following drawbacks. First, most of them assume that 
the input corpora are aligned sentence by sentence, which 
considerably limits their applicability. Although a 
number of automatic sentence alignment methods have 
been proposed (Brown et al. 1991; Gale & Church 1991b; 
Kay & Roscheisen 1993; Chen 1993), they are not very 
reliable for real, noisy bilingual texts. Second, the 
statistical methods usually require a very large corpus as 
their input, and such a corpus is not easy to obtain. 
Third, the linguistic methods are restricted to 
extracting correspondences between compound words. 
We have developed an extraction method that is free 
from the above drawbacks. In Sec. 2 we describe the 
basic idea of our method and give an overview. In Sec. 3 
we describe the technical details, and in Sec. 4 we 
describe an experiment using patent-specification texts. 
In Sec. 5 we comment on the effectiveness of the 
proposed method and discuss directions for 
improvement. 
2 Overview of Proposed Method 
The observation underlying our proposed method is as 
follows. In a bilingual corpus, a pair of words 
corresponding to each other generally appears in the 
same contexts, although these are expressed in two different 
languages. If we calculate the pairwise correlations 
between the contexts in which the words occur, a 
corresponding pair of words will show a high correlation. 
Although one occurrence of a word may not give a 
sufficient context to characterize the word, accumulating 
all the contexts in which the word occurs throughout the 
text allows the word to be distinguished from the other 
words in the same language text. 
Figure 1 shows how two words are associated through 
their contexts, each expressed in its respective language. 
We use the set of words co-occurring with word w, which 
we refer to as the co-occurrence set of w, to concisely 
represent the accumulated contexts characterizing the 
word. To associate two co-occurrence sets whose 
elements are words in different languages, we consult a 
bilingual dictionary and extract the possible word 
correspondences between them. The point is that even if 
the pair of words to be associated is missing from the 
bilingual dictionary, their co-occurrence sets can be 
associated through the bilingual dictionary. Of course, 
some of the correspondences between the co-occurrence 
sets may also be missing from the bilingual dictionary. 
Nevertheless, the co-occurrence sets can still be 
associated, owing to the other correspondences between 
them that are contained in the bilingual dictionary. 
[Fig. 1 Associating words through contexts. A Japanese 
passage and its English counterpart ("... the two inputs to 
the address comparator coincide with each other ...", "... a 
lock identification number register, an identification 
number comparator and an AND gate ...") yield the 
co-occurrence set of the Japanese word and the 
co-occurrence set of 'comparator', which are linked 
through the dictionary.] 
[Fig. 2 Method for extracting word correspondences. The 
Japanese text and the English text each pass through 
sentence segmentation (set of sentences), morphological 
analysis (set of words for each sentence), and 
co-occurrence data extraction (co-occurrence set for each 
word). These sets and the bilingual dictionary feed the 
calculation of correlation (correlation for each pair of 
Japanese and English words), followed by the selection of 
highly correlated pairs of words, which outputs pairs of 
Japanese and English words.] 
Our proposed method (Fig. 2) is based on the above 
idea. While the examples shown here are for Japanese 
and English, the method is applicable to any pair of 
languages. The method is divided into three parts: 
Japanese text processing, English text processing, and 
bilingual processing. The Japanese text processing is 
composed of sentence segmentation, morphological 
analysis, and co-occurrence data extraction. It extracts a 
co-occurrence set for each word from a Japanese text. 
Likewise, the English text processing extracts a 
co-occurrence set for each word from an English text. The 
bilingual processing then calculates the pairwise 
correlations between the co-occurrence sets for Japanese 
words and those for English words, and selects the pairs 
of words with the highest correlations. 
3 Technical Details 
3.1 Extraction of words from text 
Natural language texts are composed of two types of 
words: content words and function words. The target of 
extraction can usually be restricted to the 
correspondences between content words, which are 
both far more numerous and more straightforward than 
those involving function words. Additionally, function words are 
useless as elements of co-occurrence sets, since they do 
not indicate specific contexts. Therefore, we extract only 
the content words from the texts in both languages. 
The content words are divided into simple words and 
compound words. The former are extracted by dictionary 
lookup and morphological analysis. To extract the 
latter, we describe a set of rules or patterns. So far, 
we have only addressed nominal compounds (simple noun 
phrases), whose patterns are given below. Here, N, A, 
and NP stand for noun, adjective, and simple noun phrase, 
respectively. N+ stands for a string of one or more Ns. 
• Japanese nominal compounds: NP := N N+ 
• English nominal compounds: NP := N N+ | A N+ 
The nominal compounds are extracted from the 
morphological analysis results by pattern matching. 
Here, an NP included in a larger NP is rejected, since only 
self-contained NPs qualify as nominal compounds. One 
exception is an English NP starting with a noun that is 
included in an NP starting with an adjective, because the 
case of an adjective modifying a nominal compound is 
just as likely as the case of an adjective being a part of a 
nominal compound. 
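The pattern matching described above could be sketched in Python roughly as follows. This is an illustration only, not the implementation used in the experiment: the function name nominal_compounds, the tag labels "N" and "A", the sep parameter, and the input format (a list of (word, tag) pairs produced by morphological analysis) are all assumptions made for the example.

```python
def nominal_compounds(tagged, adjective_head=False, sep=" "):
    """Extract maximal nominal compounds from (word, tag) pairs.

    Japanese pattern: NP := N N+            (adjective_head=False)
    English pattern:  NP := N N+ | A N+     (adjective_head=True)
    The tags "N" (noun) and "A" (adjective) are assumed labels.
    """
    found = []
    i, n = 0, len(tagged)
    while i < n:
        tag = tagged[i][1]
        if tag != "N" and not (adjective_head and tag == "A"):
            i += 1
            continue
        # Extend over the run of nouns that follows position i.
        j = i + 1
        while j < n and tagged[j][1] == "N":
            j += 1
        words = [w for w, _ in tagged[i:j]]
        following_nouns = len(words) - 1
        if tag == "A":
            if following_nouns >= 1:      # A N+
                found.append(sep.join(words))
            if following_nouns >= 2:      # exception: embedded N N+ is kept too
                found.append(sep.join(words[1:]))
        elif following_nouns >= 1:        # N N+
            found.append(sep.join(words))
        i = j                             # skip the whole run: only maximal NPs
    return found


english = [("the", "D"), ("new", "A"), ("address", "N"), ("comparator", "N"),
           ("checks", "V"), ("lock", "N"), ("identification", "N"),
           ("number", "N"), ("register", "N"), ("and", "C"), ("gate", "N")]
print(nominal_compounds(english, adjective_head=True))
```

Because the scan jumps past each noun run, a shorter NP inside a longer one is never emitted, except for the noun-initial NP inside an adjective-initial NP.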
3.2 Extraction of co-occurrence data 
Definitions of 'co-occurrence' include syntactic 
co-occurrence, co-occurrence in a k-word window, 
co-occurrence in a sentence, and co-occurrence in a 
document. We use co-occurrence in a sentence, in which a 
pair of words occurring within the same sentence is 
regarded as a co-occurrence. While co-occurrence in a 
k-word window may produce better results when a 
sentence in one language corresponds to a sequence of 
two or more shorter sentences in the other language, it is 
difficult to determine an appropriate value of k because 
word order differs considerably between Japanese and 
English. 
The relations between a compound word and its 
constituent words are not, strictly speaking, 
co-occurrence relations. Moreover, treating them in 
the same manner as co-occurrence relations would 
cause some confusion. Suppose that compound word w is 
composed of two simple words, w' and w". If we included 
both w' and w" in the co-occurrence set of w, and vice 
versa, the differences between the co-occurrence set of w 
and those of w' and w" would decrease. Therefore, we 
exclude the constituent words from the co-occurrence set 
of a compound word and vice versa. 
As mentioned in Section 2, the co-occurrence sets of a 
word are accumulated. This is not a mere union operation, 
but a union operation accompanied by frequency 
counting. The resultant co-occurrence set is expressed as 
$$C(w) = \{ w_i/f_i \mid i = 1, \ldots, n \},$$
which shows that word $w_i$ co-occurs with word $w$ $f_i$ times. 
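As a rough illustration of this accumulation, the sketch below builds frequency-counted co-occurrence sets with collections.Counter. The function name cooccurrence_sets, the constituents mapping, and the choice to count one co-occurrence per sentence in which both words appear are assumptions for the example; the text does not specify how repeated occurrences within one sentence are counted.

```python
from collections import Counter, defaultdict


def cooccurrence_sets(sentences, constituents=None):
    """Accumulate C(w) = {w_i / f_i} over all sentences.

    sentences:    list of lists of content words (simple and compound)
    constituents: dict mapping a compound word to the set of its
                  constituent simple words (hypothetical input format)
    Assumption: a pair is counted once per sentence containing both words.
    """
    constituents = constituents or {}
    sets = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence)
        for w in words:
            for v in words:
                if v == w:
                    continue
                # Compound/constituent relations are not co-occurrences.
                if v in constituents.get(w, ()) or w in constituents.get(v, ()):
                    continue
                sets[w][v] += 1
    return sets


sentences = [
    ["address comparator", "address", "comparator", "input"],
    ["comparator", "gate"],
    ["comparator", "input"],
]
sets = cooccurrence_sets(sentences, {"address comparator": {"address", "comparator"}})
print(dict(sets["comparator"]))
```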
3.3 Calculation of correlations between words 
We define correlation R(jw, ew) between Japanese word 
jw and English word ew as follows. 
$$R(jw, ew) = \frac{|C(jw) \cap C(ew)|}{|C(jw)| + |C(ew)| - |C(jw) \cap C(ew)|}.$$
Here, $C(jw) = \{ jw_i/f_i \mid i = 1, \ldots, m \}$ and 
$C(ew) = \{ ew_j/g_j \mid j = 1, \ldots, n \}$ are the co-occurrence sets of jw and ew, 
respectively. $C(jw) \cap C(ew) = \{ (jw_i, ew_j)/h_{ij} \mid i = 1, \ldots, m;\ j = 1, \ldots, n \}$ 
is the intersection of C(jw) and C(ew), 
whose elements are pairs of a Japanese word and an 
English word with their frequency. $|\cdot|$ means the sum 
of frequencies of all elements. 
Generating intersection $C(jw) \cap C(ew)$ from C(jw) and 
C(ew) is not easy because the procedure of pairing $jw_i$ ($\in C(jw)$) 
and $ew_j$ ($\in C(ew)$) is nondeterministic. A pair of 
words cannot be determined independently of the other 
possible pairs. To reduce processing time, we calculate 
$|C(jw) \cap C(ew)|$ approximately, as illustrated in Fig. 
3. For example, the English-based approximate 
calculation is done as follows. First, Japanese 
co-occurrence set C(jw) is transformed into pseudo 
co-occurrence set $C_p(jw)$ by consulting bilingual 
dictionary D, which is a set of pairs of words: 
$$C_p(jw) = \{ ew_j/f'_j \mid j = 1, \ldots, n \}, \quad \text{where } f'_j = \sum_{jw_i \in C(jw),\ (jw_i, ew_j) \in D} f_i.$$
The intersection of pseudo co-occurrence set $C_p(jw)$ and 
English co-occurrence set C(ew) is then generated: 
$$C_p(jw) \cap C(ew) = \{ ew_j/\min\{f'_j, g_j\} \mid j = 1, \ldots, n \}.$$
Finally, $|C_p(jw) \cap C(ew)|$ is calculated as the 
approximate value of $|C(jw) \cap C(ew)|$: 
$$|C_p(jw) \cap C(ew)| = \sum_j \min\{f'_j, g_j\}.$$
This approximate calculation is likely to result in an 
overestimated correlation when there is ambiguity in 
pairing $jw_i$ ($\in C(jw)$) and $ew_j$ ($\in C(ew)$), as occurs in 
Fig. 3(a). Figure 3(a) shows that the number of elements 
in the intersection exceeds that in the Japanese 
co-occurrence set. The English-based and Japanese-based 
approximate calculations therefore do not always 
coincide with each other. While selecting the minimum 
of the two approximate values is safer, it does not 
guarantee a precise value. Since ambiguity in associating 
co-occurrence sets does not occur too often, and 
considering the need for efficiency, we execute either of 
the two approximate calculations rather than make a 
precise calculation. 
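The approximate calculation can be sketched in Python as below. The sketch assumes that the dictionary is held as a mapping from a word to the set of its translations and that co-occurrence sets are Counter objects; the function names and the toy data (including the Japanese words) are hypothetical and chosen only to illustrate the formulas.

```python
from collections import Counter


def approx_intersection(c_from, c_to, translations):
    """Approximate |C(from) ∩ C(to)| as the sum of min{f'_j, g_j}.

    c_from is projected through the dictionary into a pseudo
    co-occurrence set in the other language, then intersected with c_to.
    """
    pseudo = Counter()
    for word, freq in c_from.items():
        for target in translations.get(word, ()):
            pseudo[target] += freq
    return sum(min(f, c_to[t]) for t, f in pseudo.items() if t in c_to)


def correlation(c_jw, c_ew, j2e):
    """R(jw, ew) using the English-based approximation of the intersection."""
    inter = approx_intersection(c_jw, c_ew, j2e)
    denom = sum(c_jw.values()) + sum(c_ew.values()) - inter
    return inter / denom if denom else 0.0


# Toy data (hypothetical).
c_jw = Counter({"入力": 3, "一致": 2, "番号": 1})
c_ew = Counter({"input": 2, "coincide": 2, "gate": 1})
j2e = {"入力": {"input"}, "一致": {"coincide", "match"}, "番号": {"number"}}
print(correlation(c_jw, c_ew, j2e))  # intersection 4, so 4 / (6 + 5 - 4)

# Ambiguous pairing: one Japanese word with two translations present.
amb_j = Counter({"問題": 1})
amb_e = Counter({"issue": 1, "problem": 1})
print(approx_intersection(amb_j, amb_e, {"問題": {"issue", "problem"}}))   # English-based
print(approx_intersection(amb_e, amb_j, {"issue": {"問題"}, "problem": {"問題"}}))  # Japanese-based
```

In the ambiguous toy case the English-based value exceeds the size of the Japanese co-occurrence set while the Japanese-based value does not, which is the kind of disagreement between the two approximations described above.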
To increase the reliability of the correlation values, we 
remove the useless words from the co-occurrence sets 
before calculating the correlations. A useless Japanese 
word is jw such that $\{ ew \mid (jw, ew) \in D \} \cap \{ ew \mid ew \in T_E \} = \emptyset$ 
($T_E$ is the input English text), and a useless English 
word is ew such that $\{ jw \mid (jw, ew) \in D \} \cap \{ jw \mid jw \in T_J \} = \emptyset$ 
($T_J$ is the input Japanese text). These words do not 
contribute to the word-pair correlations. 
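A minimal sketch of this pruning step is given below, assuming the dictionary is a mapping from a Japanese word to a set of English translations and that the vocabulary of each input text is available as a set. The names invert and prune and the sample words are hypothetical.

```python
from collections import Counter


def invert(j2e):
    """Build the English-to-Japanese view of a Japanese-to-English dictionary."""
    e2j = {}
    for jw, ews in j2e.items():
        for ew in ews:
            e2j.setdefault(ew, set()).add(jw)
    return e2j


def prune(cooc, translations, other_vocab):
    """Drop words none of whose dictionary translations occur in the other text."""
    return Counter({w: f for w, f in cooc.items()
                    if translations.get(w, set()) & other_vocab})


# Toy data (hypothetical).
j2e = {"入力": {"input"}, "装置": {"device", "apparatus"}}
english_vocab = {"input", "gate"}
japanese_vocab = {"入力", "装置", "特許"}

c_jw = Counter({"入力": 3, "装置": 2, "特許": 1})
c_ew = Counter({"input": 2, "gate": 1})
print(prune(c_jw, j2e, english_vocab))           # keeps only 入力
print(prune(c_ew, invert(j2e), japanese_vocab))  # keeps only input
```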
[Fig. 3 Approximate calculation of correlation. 
(a) English-based approximate calculation; 
(b) Japanese-based approximate calculation. Each panel 
links the co-occurrence set of a Japanese word to the 
co-occurrence set of an English word through dictionary 
pairs whose English sides include 'issue' and 'problem'.] 
3.4 Selection of pairs of words with high correlation 
The absolute values of the correlations are not significant 
because they are sensitive to the numbers of words in the 
co-occurrence sets, which vary considerably from word to 
word. However, their relative values are significant when 
either a Japanese or an English word is fixed. We adopt the 
strategy of selecting the mutually best-matched pairs 
having no highly probable competitors. We call (jw, 
ew), a pair of a Japanese word and an English word, a 
mutually best-matched pair when 
$$R(jw, ew) > R(jw, ew') \text{ for any } ew' (\neq ew) \text{ and}$$
$$R(jw, ew) > R(jw', ew) \text{ for any } jw' (\neq jw).$$
When, for a mutually best-matched pair (jw, ew), there 
exists either ew' such that 
$$R(jw, ew') > a \cdot R(jw, ew) \text{ and } (jw, ew') \in D$$
or jw' such that 
$$R(jw', ew) > a \cdot R(jw, ew) \text{ and } (jw', ew) \in D,$$
we call (jw, ew') or (jw', ew) a highly probable 
competitor. Here, a is a predetermined constant ($0 < a \le 1$), 
and D is the bilingual dictionary. 
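The selection rule might be sketched as follows. The sketch makes several assumptions that are not stated in the text: correlations are supplied as a dictionary keyed by (jw, ew), pairs absent from that dictionary have correlation zero (so only pairs with a positive value can be strictly best), and the dictionary condition on a competitor is read as membership in D. The function name select_pairs and the sample values are hypothetical.

```python
def select_pairs(R, D, a):
    """Return mutually best-matched pairs that have no highly probable competitor.

    R: dict mapping (jw, ew) to a correlation value; missing pairs count as 0
    D: set of (jw, ew) pairs in the bilingual dictionary
    a: threshold constant for competitors
    """
    rows, cols = {}, {}
    for (jw, ew), r in R.items():
        rows.setdefault(jw, {})[ew] = r
        cols.setdefault(ew, {})[jw] = r

    selected = []
    for (jw, ew), r in R.items():
        if r <= 0:
            continue  # cannot strictly beat the implicit zero entries
        row_others = [(e, v) for e, v in rows[jw].items() if e != ew]
        col_others = [(j, v) for j, v in cols[ew].items() if j != jw]
        mutually_best = (all(r > v for _, v in row_others)
                         and all(r > v for _, v in col_others))
        if not mutually_best:
            continue
        has_competitor = (
            any(v > a * r and (jw, e) in D for e, v in row_others)
            or any(v > a * r and (j, ew) in D for j, v in col_others))
        if not has_competitor:
            selected.append((jw, ew))
    return selected


# Toy correlations (hypothetical values).
R = {("比較器", "comparator"): 0.6, ("比較器", "register"): 0.2,
     ("レジスタ", "register"): 0.5, ("レジスタ", "comparator"): 0.1,
     ("ゲート", "gate"): 0.4, ("ゲート", "port"): 0.3}
D = {("ゲート", "port")}
print(select_pairs(R, D, a=0.5))
```

In the toy data, (ゲート, gate) is mutually best matched but is rejected because the dictionary pair (ゲート, port) has a correlation above a times its own.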
3.5 Feedback of extracted pairs of words 
Obviously, the performance of the proposed method 
depends upon the coverage of the bilingual dictionary 
over the corpus. The coverage is the proportion of the 
word correspondences in the corpus that are already 
contained in the bilingual dictionary. Generally 
speaking, the wider the coverage, the more reliable the 
correlation values. Accordingly, the feedback of 
extracted pairs will probably improve performance, even 
though some of them are erroneous. In Fig. 2, the 
feedback is represented by a dotted line. 
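One way to picture the feedback is as a loop that adds the extracted pairs to the dictionary and runs the extraction again. The sketch below is only an assumption about how such a loop could be organized: extract stands for any function that maps a dictionary (a set of word pairs) to a set of extracted pairs, and both names are hypothetical.

```python
def extract_with_feedback(extract, dictionary, rounds=1):
    """Run extraction, then repeat it with the extracted pairs fed back.

    extract:    callable taking a set of (jw, ew) pairs (the dictionary)
                and returning an iterable of extracted (jw, ew) pairs
    dictionary: initial set of (jw, ew) pairs
    rounds:     number of feedback rounds after the first extraction
    """
    pairs = set(extract(dictionary))
    for _ in range(rounds):
        # Feed the extracted pairs back, even though some may be erroneous.
        pairs = set(extract(dictionary | pairs))
    return pairs


# Toy extractor (hypothetical): a second pair becomes extractable only
# once the first pair is available in the dictionary.
def toy_extract(d):
    found = {("jw1", "ew1")}
    if ("jw1", "ew1") in d:
        found.add(("jw2", "ew2"))
    return found


print(sorted(extract_with_feedback(toy_extract, {("jw0", "ew0")}, rounds=1)))
```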
4 Experiment and Results 
We implemented our proposed method on a workstation 
and carried out an experiment using patent-specification 
documents in Japanese and English and a bilingual 
dictionary for a machine translation system. The 
dictionary contains approximately 60,000 Japanese 
entry words, each having several English translations. 
The quantitative profile of the sample patent documents 
is shown in Table 1(a). 
We executed the word correspondence extraction 
program for each document. Parameter a in the 
selection of pairs of words was set to 0. This 
means that the output pairs were limited as much as 
possible. Results both before and after feedback were 
obtained to evaluate the effect of feedback. The extracted 
pairs of words were divided into two groups: those 
already contained in the bilingual dictionary and 
those not yet contained in the bilingual 
dictionary.1) The former are insignificant from the 
practical point of view. However, they are significant in 
evaluating the effectiveness of the proposed correlation 
measure because the dictionary information regarding a 
particular pair of words does not contribute to the 
correlation between the pair itself. Accordingly, we 
evaluated two cases: Case A, in which the already known pairs of 
words are included, and Case B, in which the already known pairs 
of words are excluded. 
1) We neglected the reference numbers peculiar to 
the patent documents because their correspondences 
are trivial. The underlined numerals (504) in the following 
pair of sentences are an example of a reference 
number: "... 504 ..." / "... the two inputs to address comparator 
504 coincide with ...". 

Table 1 Experimental profile and results. 

(a) Profile of sample patent documents 
Document #                                         I      II     III      IV       V   Total
Japanese text
  Number of content words* [a]                 1,322   2,089   8,023   3,846   2,449  17,729
  Number of sentences [b]                         90     120     686     230     178   1,304
  Average sentence length [a]/[b]               14.7    17.4    11.7    16.7    13.8    13.6
  Number of content words** [c]                  202     273     719     392     524   2,110
  (Number of content words whose
   translations are unknown) [d]                (39)    (43)   (146)    (51)    (97)   (376)
  Number of candidate compound words [e]          62      97     395     251     288   1,093
English text
  Number of content words* [a']                1,463   2,055   9,561   4,326   2,872  20,277
  Number of sentences [b']                        94     143     704     236     178   1,355
  Average sentence length [a']/[b']             15.6    14.4    13.6    18.3    16.1    15.0
* one count per occurrence   ** one count per word 

(b) Results of Case A 
Document #                                         I      II     III      IV       V   Total
Before feedback
  Number of pairs of words extracted [f1]         78     123     366     247     203   1,017
  Number of correct pairs extracted [g1]          69     115     322     212     172     890
  Pseudo-recall [g1]/([c]+[e])                 0.261   0.311   0.289   0.330   0.212   0.278
  Precision [g1]/[f1]                          0.885   0.935   0.880   0.858   0.847   0.875
After feedback
  Number of pairs of words extracted [f2]         83     135     400     257     231   1,106
  Number of correct pairs extracted [g2]          75     125     355     220     198     973
  Pseudo-recall [g2]/([c]+[e])                 0.284   0.338   0.319   0.342   0.244   0.304
  Precision [g2]/[f2]                          0.904   0.926   0.888   0.856   0.857   0.880

(c) Results of Case B 
Document #                                         I      II     III      IV       V   Total
Before feedback
  Number of pairs of words extracted [h1]         31      53     190     131     100     505
  Number of correct pairs extracted [i1]          22      45     146      96      69     378
  Pseudo-recall [i1]/([d]+[e])                 0.218   0.321   0.270   0.318   0.179   0.257
  Precision [i1]/[h1]                          0.710   0.849   0.768   0.733   0.690   0.749
After feedback
  Number of pairs of words extracted [h2]         31      60     202     140     111     544
  Number of correct pairs extracted [i2]          23      50     157     103      78     411
  Pseudo-recall [i2]/([d]+[e])                 0.228   0.357   0.290   0.341   0.203   0.280
  Precision [i2]/[h2]                          0.742   0.833   0.777   0.736   0.703   0.756

A good way to evaluate word correspondence 
extraction methods is to measure their recall and 
precision. These measures are defined as follows. The 
recall is the proportion of all word correspondences in a 
bilingual corpus that are actually extracted. The 
precision is the proportion of extracted word 
correspondences that are actually correct. While the 
precision is rather easy to calculate, the recall is difficult 
to calculate because it is a time-consuming task to 
manually identify all the word correspondences in the 
bilingual corpus. Therefore, instead of calculating the 
recall according to its definition, we make a rough 
estimation using the ratio of the number of correct pairs 
of words extracted to the number of words in either the 
Japanese or English text. We call this the pseudo-recall. 
The pseudo-recall gives a lower bound on the recall 
since a word in the Japanese text does not always have a 
straightforward counterpart in the English text, and vice 
versa. 
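As a small worked example of these two measures, the sketch below recomputes the Case B totals after feedback from Table 1, using the column formulas given there ([i2]/([d]+[e]) for pseudo-recall and [i2]/[h2] for precision). The function names are hypothetical; the numbers are the Total column of Table 1.

```python
def pseudo_recall(correct_pairs, candidate_words):
    """Correct extracted pairs divided by the number of candidate words."""
    return correct_pairs / candidate_words


def precision(correct_pairs, extracted_pairs):
    """Correct extracted pairs divided by all extracted pairs."""
    return correct_pairs / extracted_pairs


# Totals from Table 1 (Case B, after feedback).
d, e = 376, 1093      # [d] unknown-translation words, [e] candidate compounds
h2, i2 = 544, 411     # [h2] extracted pairs, [i2] correct pairs

print(round(pseudo_recall(i2, d + e), 3))  # 0.28
print(round(precision(i2, h2), 3))         # 0.756
```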
Tables 1(b) and (c) show the pseudo-recall and the 
precision in Cases A and B, respectively. In Case A, the 
pseudo-recall and precision before feedback were 27.8% 
and 87.5% respectively, and those after feedback were 
30.4% and 88.0%. In Case B, the pseudo-recall and 
precision before feedback were 25.7% and 74.9% 
respectively, and those after feedback were 28.0% and 
75.6%. 

Table 2 Examples of extracted word correspondences. 
Type      Example (Japanese, English)
( S, S )  ( ..., pumping )  ( ..., subsequently )
( S, C )  ( ..., liquid level )  ( ..., thin film )
( C, S )  ( ..., vaporizer )  ( ..., connector )
( C, C )  ( ..., gas supplier )  ( ..., radio frequency heating )
S: simple word, C: compound word 

The experiment confirmed that the proposed method 
can extract not only compound word correspondences but 
also simple word correspondences from a small corpus. 
Examples of word correspondences extracted from a 
patent document are shown in Table 2. The comparison 
of results before and after feedback supported the 
effectiveness of using feedback. That is, feedback 
increases recall while preserving precision. We also 
ascertained that repeating the feedback one more time did 
not result in significant improvement. 
5 Discussion 
The experiment shows that the proposed method is 
effective in reducing the cost of bilingual dictionary 
augmentation. The recall of the method is not high. 
Furthermore, it cannot extract more than one 
correspondence for a word. Still, the method is effective 
because it can extract correspondences from a small corpus. 
Bilingual documents should be handled separately. Even if a 
corresponding pair of words fails to be extracted from 
one bilingual document, it may be extracted from another 
bilingual document, where it occurs more frequently. 
The following are directions for further improvement. 
(1) Refinement of nominal compound extraction 
procedure: 
The simplified procedure described in Sec. 3.1 often 
causes omission (a nominal compound is not extracted) 
and noise (an inappropriate word string is extracted). 
These are major causes of errors in word correspondence 
extraction; refining the nominal compound extraction 
procedure will considerably improve recall and precision. 
(2) Use of symbol/numeral correspondences: 
In the present implementation, the correspondences of 
symbols and numerals are not used in calculating the 
correlation because the bilingual dictionary does not 
contain them. However, they have the potential of 
increasing the reliability of the correlation values. A 
character-string-matching routine to identify the 
correspondences of symbols/numerals should thus be 
added to the correlation calculation module. 
(3) Use of the constituent word information of compound 
words: 
The key idea of our method is to associate a pair of words 
through their co-occurrence information with the 
assistance of a bilingual dictionary. In contrast, that of 
the previous linguistic methods is to associate a pair of 
compound words through their constituent word 
information with the assistance of a bilingual 
dictionary. These two are not incompatible. Combining 
them would surely increase the recall and precision for 
compound word correspondences. 
6 Conclusion 
We have developed a new method for extracting word 
correspondences from bilingual corpora. The essence of 
the method is to calculate correlations between words 
based on their co-occurrence information with the 
assistance of a basic word bilingual dictionary. This 
method is applicable to rather small, unaligned corpora; 
it can extract correspondences not only between simple 
words but also between compound words. In an 
experiment with patent corpora, 28.0% pseudo-recall and 
75.6% precision were achieved. 
Acknowledgments: We would like to thank Dr. 
Michiharu Nakamura, Dr. Testuo Yokoyama and Dr. 
Hiromichi Fujisawa for their constant support and 
encouragement. 
References 
Brown, P. F., et al. 1991. Aligning Sentences in Parallel 
Corpora. Proc. of the 29th Annual Meeting of the ACL, 
pp. 169-176. 
Chen, S. F. 1993. Aligning Sentences in Bilingual 
Corpora Using Lexical Information. Proc. of the 31st 
Annual Meeting of the ACL, pp. 9-16. 
Dagan, I., et al. 1993. Robust Bilingual Word Alignment 
for Machine Aided Translation. Proc. of Workshop on 
Very Large Corpora, pp. 1-8. 
Fung, P. 1995. A Pattern Matching Method for Finding 
Noun and Proper Noun Translations from Noisy Parallel 
Corpora. Proc. of the 33rd Annual Meeting of the ACL, 
pp. 236-243. 
Gale, W. A. and K. W. Church. 1991a. Identifying Word 
Correspondences in Parallel Texts. Proc. of the 4th 
DARPA Speech and Natural Language Workshop, pp. 
152-157. 
Gale, W. A. and K. W. Church. 1991b. A Program for 
Aligning Sentences in Bilingual Corpora. Proc. of the 
29th Annual Meeting of the ACL, pp. 177-184. 
Inoue, N. and I. Nogaito. 1993. Automatic Construction 
of the Japanese-English Dictionary from Bilingual 
Text. Technical Report of IEICE, NLC93-39 (in 
Japanese). 
Ishimoto, H. and M. Nagao. 1994. Automatic 
Construction of a Bilingual Dictionary of Technical 
Terms from Parallel Texts. Technical Report of IPSJ, 
NL-102-11 (in Japanese). 
Kay, M. and M. Roscheisen. 1993. Text-Translation 
Alignment. Computational Linguistics, Vol. 19, No. 1, 
pp. 121-142. 
Kumano, A. and H. Hirakawa. 1994. Building an MT 
Dictionary from Parallel Texts Based on Linguistic and 
Statistical Information. Proc. of COLING'94, pp. 
76-81. 
Kupiec, J. 1993. An Algorithm for Finding Noun Phrase 
Correspondences in Bilingual Corpora. Proc. of the 
31st Annual Meeting of the ACL, pp. 17-22. 
Yamamoto, Y. and M. Sakamoto. 1993. Extraction of 
Technical Term Bilingual Dictionary from Bilingual 
Corpus. Technical Report of IPSJ, NL-94-12 (in 
Japanese). 
