Measuring the Similarity between Compound Nouns
in Different Languages Using Non-Parallel Corpora
Takaaki TANAKA
NTT Communication Science Laboratories
2-4 Hikari-dai, Seika-cho, Soraku-gun,
Kyoto, 619-0237, JAPAN
takaaki@cslab.kecl.ntt.co.jp
Abstract
This paper presents a method that measures the similarity between
compound nouns in different languages to locate translation equivalents
from corpora. The method uses information from unrelated corpora in
different languages that do not have to be parallel. This means that
many corpora can be used. The method compares the contexts of target
compound nouns and translation candidates at the word or semantic
attribute level. In this paper, we show how this measuring method can
be applied to select the best English translation candidate for
Japanese compound nouns in more than 70% of the cases.
1 Introduction
Many electronic documents in various languages are distributed via the
Internet, CD-ROM, and other computer media. Cross-lingual natural
language processing such as machine translation (MT) and cross-lingual
information retrieval (CLIR) is becoming more important.
When we read or write documents in a foreign language, we need more
knowledge than what is provided in an ordinary dictionary, such as
terminology, words relevant to current affairs, etc. Such expressions
can be made up of multiple words, and there are almost infinite
possible variations. Therefore, it is quite difficult to add them and
their translations to a dictionary.
Many approaches have tried to acquire translation equivalents
automatically from parallel corpora (Dagan and Itai, 1994; Fung, 1995).
In parallel corpora, effective features that have obvious correlations
between these corpora can be used, e.g., similarity of position and
frequency of words.
However, we cannot always get enough parallel corpora to extract the
desired information. We propose a method of measuring the similarity to
acquire compound noun translations by corpus information, which is not
restricted to parallel corpora. Co-occurrence information is obtained
from the corpora as the context in which target nouns appear. In a
specific domain (e.g., financial news), a target word and its
translation are often used in a similar context. For example, in a
financial newspaper, price competition may appear with products
(electric appliances, clothes, and foods), stores and companies more
often than with nations and public facilities.
2 Extraction of Translations from
Non-Parallel Corpora
In parallel corpora, positions and frequencies of translation
equivalents are correlated; therefore, when we try to find translation
equivalents from parallel corpora, this information provides valuable
clues. On the other hand, in non-parallel corpora, positions and
frequencies of words cannot be directly compared. Fung assumed that
co-occurring words of translation equivalents are similar, and compared
distributions of the co-occurring words to acquire Chinese-English
translations from comparable corpora (Fung, 1997). This method
generates co-occurring word vectors for target words, and judges the
pair of words whose similarity is high to be translation equivalents.
Rapp made German and English association word vectors and calculated
the similarity of these vectors to find translations (Rapp, 1999).
K. Tanaka and Iwasaki (1996) also assumed the resemblance between
co-occurring words in a source language and those in a target language,
and performed experiments to find irrelevant translations intentionally
added to a dictionary.
In fact, finding translation equivalents from non-parallel corpora is a
very difficult problem, so it is not practical to acquire all kinds of
translations in the corpora. Most technical terms are composed of known
words, and we must collect these words to translate them correctly
because new terms can be infinitely created by combining several words.
We focus on translations of compound nouns here. First, we collect the
translation candidates of a target compound, and then measure the
similarity between them to choose an appropriate candidate.
In many cases, translation pairs of compound nouns in different
languages have corresponding component words, and these can be used as
strong clues to finding the translations (Tanaka and Matsuo, 1999).
However, these clues are sometimes insufficient for determining which
is the best translation for a target compound noun when two or more
candidates exist. For example, 営業利益 eigyo rieki, which means
earnings before interest and taxes, can be paired with operating
profits or business interest. Both pairs have common components, and we
cannot judge which pair is better using only this information. A
reasonable way to discriminate their meanings and usages is to see the
context in which the compound words appear. In the following examples,
we can judge that operating profits is a numerical value and business
interest is a kind of group.
• ... its fourth-quarter operating profit will fall short of
  expectations ...
• ... the powerful coalition of business interests is pumping money
  into advertisements ...
Thus contextual information helps us discriminate words' categories. We
use the distribution of co-occurring words to compare the contexts.
This paper describes a method of measuring semantic similarity between
compound nouns in different languages to acquire compound noun
translations from non-parallel corpora. We choose Japanese and English
as the language pair. The English translation candidates of a Japanese
compound cJ that are tested for similarity can be collected by the
method proposed by T. Tanaka and Matsuo (1999). That method is
summarized as follows, except that the third stage, which measures the
similarity, is introduced in this paper.
1. Collect English candidate translation equivalents CE from a corpus
by part-of-speech (POS) patterns.
2. Make the translation candidate set TE by extracting from CE the
compounds whose component words are related to the components of cJ in
Japanese.
3. Select a suitable translation cE of cJ from TE by measuring the
similarity between cJ and each element of TE.
In the first stage, this method collects target candidates CE by
extracting all units that are described by a set of POS templates. For
example, candidate translations of Japanese compound nouns may be
English noun-noun, adjective-noun, noun-of-noun, etc. T. Tanaka and
Matsuo (2001) reported that 60% of Japanese compound nouns in a
terminological dictionary are of the noun-noun or noun-suffix type and
55% of the English ones are noun-noun or adjective-noun. Next, it
selects the compound nouns whose component words correspond to those of
the original-language compound cJ, and makes a set of translation
candidates TE. The component words are connected by bilingual
dictionaries and thesauri. For example, if cJ is 営業利益 eigyo rieki,
the elements of TE are {business interest, operating profits, business
gain}.
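The second stage can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the toy dictionary, the function name, and the position-by-position matching of components are assumptions made only for illustration (a noun-of-noun candidate, for instance, would need a different alignment).

```python
from typing import Dict, List, Set

# Hypothetical toy dictionary: Japanese component -> related English words.
# In the paper this link comes from bilingual dictionaries and thesauri.
COMPONENT_DICT: Dict[str, Set[str]] = {
    "eigyo": {"business", "operating"},
    "rieki": {"interest", "profits", "gain"},
}


def select_candidates(cj: List[str],
                      ce: List[List[str]],
                      dictionary: Dict[str, Set[str]]) -> List[List[str]]:
    """Keep the compounds in CE whose i-th word is related to the i-th
    component of cJ (position-wise matching is an assumption)."""
    selected = []
    for compound in ce:
        if len(compound) != len(cj):
            continue
        if all(word in dictionary.get(component, set())
               for component, word in zip(cj, compound)):
            selected.append(compound)
    return selected


# CE: candidates extracted by POS patterns (toy data).
CE = [["business", "interest"], ["operating", "profits"],
      ["business", "gain"], ["price", "competition"]]
TE = select_candidates(["eigyo", "rieki"], CE, COMPONENT_DICT)
```

With this toy input, TE contains the three candidates listed above, and price competition is filtered out because neither component is linked to eigyo or rieki.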
The original method selects the most frequent candidate as the best
one; however, this can be improved by using contextual information.
Therefore we introduce the last stage: the proposed method calculates
the similarity between the original compound noun cJ and its
translation candidates by comparing the contexts, and chooses the most
plausible translation cE.
In this paper, we describe the method of selecting the best translation
using contextual information.
3 Similarity between two compounds
Even in non-parallel corpora, translation equivalents are often used in
similar contexts. Figure 1 shows parts of financial newspaper articles
whose contents are unrelated to each other. In the Japanese article,
価格競争 kakaku-kyousou appears with 激化 gekika "intensify", 営業 eigyo
"business", 利益 rieki "profit", 予想 yosou "prospect", etc. Its
translation price competition is used with similar or relevant words
(brutal, business, profitless, etc.), although the article is not
related to the Japanese one at all.
We use the similarity of co-occurring words of target compounds in
different languages to measure the similarity of the compounds. Since
co-occurring words in different languages cannot be directly compared,
a bilingual dictionary is used as a bridge across the corpora. Some
other co-occurring words have similar meanings or are related to the
same concept, e.g., 利益 "profit" and margin. These can be correlated
by a thesaurus.

[Japanese article]
(In particular, price competition of overseas travel has become
intense. The operating profits are likely to show a five hundred
million yen deficit, although they were expected to show a surplus at
first.)

Price competition has become so brutal in a wide array of businesses
(cellular phones, disk drives, personal computers) that some companies
are stuck in a kind of profitless prosperity, selling loads of product
at puny margins.

Figure 1: Examples of newspaper articles

Co-occurring word    価格競争 kakaku kyoso   price competition   price control
(English gloss)      (1030)                  (158)               (100)
intensify            223                     1                   0
bid                  174                     4                   0
severe               86                      2                   0
rival                21                      5                   1
cheap                8                       1                   0

Table 1: Common co-occurring words

The words frequently co-occurring with 価格競争 are listed in Table 1.
Its translation price competition has more co-occurring words related
to these words than the irrelevant word price control. The more words
can be related to each other in a pair of compounds, the more similar
their meanings.
4 Context representation
In order to denote the features of each compound noun, we use the
context in the corpora. In this paper, context means information about
words that co-occur with the compound noun in the same sentence.
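As a rough illustration of this sentence-level notion of context, the following sketch counts the words that share a sentence with a target compound. Tokenized input and the function name are assumptions of the sketch; the paper does not describe its own tooling.

```python
from collections import Counter
from typing import Iterable, List


def cooccurrence_counts(sentences: Iterable[List[str]],
                        compound: List[str]) -> Counter:
    """Count words that appear in the same sentence as the compound."""
    counts: Counter = Counter()
    n = len(compound)
    for tokens in sentences:
        # Start positions where the compound occurs in this sentence.
        starts = [i for i in range(len(tokens) - n + 1)
                  if tokens[i:i + n] == compound]
        if not starts:
            continue
        # Token positions covered by the compound itself are excluded.
        covered = {i + k for i in starts for k in range(n)}
        counts.update(tok for i, tok in enumerate(tokens)
                      if i not in covered)
    return counts


sentences = [
    ["fierce", "price", "competition", "by", "exporters"],
    ["price", "control", "was", "lifted"],
    ["price", "competition", "cut", "margins"],
]
counts = cooccurrence_counts(sentences, ["price", "competition"])
```

Only the first and third toy sentences contribute, since the second does not contain the compound.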
4.1 Word co-occurrence
Basically, the co-occurring words of the target
word are used to represent its features. Such
co-occurring words are divided into two groups
in terms of syntactic dependence, and are dis-
tinguished in comparing contexts.
1. Words that have syntactic dependence on the target word
(subject-predicate, predicate-object, modification, etc.).
• ... fierce price competition by exporters ...
• ... price competition was intensifying in this three months ...
2. Words that are syntactically independent of the target word.
• ... intense price competition caused margins to shrink ...

Japanese           English
CN N               CN (prep) N
CN の (no) N       CN V
CN が (ga) V       CN (be) Adj
CN を (o) V        N CN
CN に (ni) V       Adj CN
CN が (ga) Adj     Ving CN

CN: a target compound, N: noun, V: verb, Adj: adjective

Figure 2: Templates for syntactic dependence (part)
The words classified into the first class represent the direct features
of the target word: attribute, function, action, etc. We cannot
distinguish the role using only POS since it varies: attributes are not
always represented by adjectives nor actions by verbs (compare intense
price competition with price competition is intensifying this month).
On the other hand, the words in the second class have indirect
relations, e.g., association, with the target word. This type of word
has more variation in the strength of the relation and includes noise;
therefore, such words are distinguished from the words in the first
class.
For simplicity of processing, words that have dependent relations are
detected by word sequence templates, as shown in Figure 2. Kilgarriff
and Tugwell collect pairs of words that have syntactic relations, e.g.,
subject-of, modifier-modifiee, etc., using finite-state techniques
(Kilgarriff and Tugwell, 2001). The templates shown in Figure 2 are
simplified versions for pattern matching. Therefore, the templates
cannot detect all the dependent words; however, they can retrieve
frequent and dependent words that are relevant to a target compound.
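A minimal sketch of such template matching for the English templates in Figure 2 is given below. The tag names ("N", "V", "Adj", "Prep", "Be", "Ving"), the input format, and the function name are assumptions of this sketch, and only the word immediately adjacent to the compound is inspected.

```python
from typing import List, Tuple

# (word, tag) pairs; the tag set is hypothetical and chosen for this sketch.
Tagged = List[Tuple[str, str]]


def dependent_words(tagged: Tagged, start: int, end: int) -> List[str]:
    """Return words matched by the English templates around the target
    compound CN = tagged[start:end]."""
    found = []
    # Left context: N CN, Adj CN, Ving CN
    if start > 0 and tagged[start - 1][1] in {"N", "Adj", "Ving"}:
        found.append(tagged[start - 1][0])
    # Right context: CN V, CN (prep) N, CN (be) Adj
    if end < len(tagged):
        word, tag = tagged[end]
        if tag == "V":
            found.append(word)
        elif tag in {"Prep", "Be"} and end + 1 < len(tagged):
            next_word, next_tag = tagged[end + 1]
            if (tag == "Prep" and next_tag == "N") or \
               (tag == "Be" and next_tag == "Adj"):
                found.append(next_word)
    return found


ex1 = [("fierce", "Adj"), ("price", "N"), ("competition", "N"),
       ("by", "Prep"), ("exporters", "N")]
ex2 = [("price", "N"), ("competition", "N"), ("is", "Be"), ("severe", "Adj")]
deps1 = dependent_words(ex1, 1, 3)
deps2 = dependent_words(ex2, 0, 2)
```

Like the templates themselves, this kind of matcher misses dependent words that are not adjacent to the compound, which is the trade-off accepted above for simplicity.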
4.2 Semantic co-occurrence
Since finding the exact translations of co-occurring words from
unrelated corpora is harder than from parallel corpora, we also compare
the contexts at a more abstract level. In the example of "price
competition", 電話 denwa "telephone" corresponds to a fax in terms of
communications equipment, as well as to its exact translation,
"telephone".
We employ semantic attributes from Nihongo Goi-Taikei - A Japanese
Lexicon (Ikehara et al., 1997) to abstract words. Goi-Taikei originated
from a Japanese analysis dictionary for the Japanese-English MT system
ALT-J/E (Ikehara et al., 1991). This lexicon has about 2,700 semantic
attributes in a hierarchical structure (maximum 12 levels), and these
attributes are attached to three hundred thousand Japanese words. In
order to abstract English words, the bilingual dictionary for ALT-J/E
was used. This dictionary has the same semantic attributes as
Goi-Taikei for pairs of Japanese and English words. We use 397
attributes in the upper 5 levels to ignore slight differences between
lower nodes. If a word has two or more semantic attributes, one
attribute is selected for the word as follows.
1. For each set of co-occurring words, sum up
the frequency for all attributes that are at-
tached to the words.
2. For each word, the most frequent attribute
is chosen. As a result each word has a
unique attribute.
3. Sum up the frequency for an attribute of
each word.
In the following example, each word has one or more semantic attributes
at first. The number of words that have each attribute is counted:
three for [374], and one each for [494] and [437]. As the attribute
[374] appears more frequently than [494] among all words in the corpus,
[374] is selected for "bank".
bank : [374: enterprise/company],[494: embankment]
store : [374: enterprise/company]
hotel : [437: lodging facilities],[374: enterprise/company]
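The three selection steps can be sketched as follows, using the bank/store/hotel example. The function names and the tie-breaking rule (the first listed attribute wins) are assumptions of this sketch; each word is given frequency 1 to mirror the counts above.

```python
from collections import Counter
from typing import Dict, List


def select_attributes(word_attrs: Dict[str, List[int]],
                      word_freq: Dict[str, int]) -> Dict[str, int]:
    """Steps 1 and 2: pick one attribute per word."""
    # Step 1: sum frequencies over all attributes attached to the words.
    totals: Counter = Counter()
    for word, attrs in word_attrs.items():
        for attr in attrs:
            totals[attr] += word_freq[word]
    # Step 2: for each word, keep its most frequent attribute
    # (ties go to the first listed attribute; this is an assumption).
    return {word: max(attrs, key=lambda a: totals[a])
            for word, attrs in word_attrs.items()}


def attribute_frequencies(chosen: Dict[str, int],
                          word_freq: Dict[str, int]) -> Counter:
    """Step 3: sum word frequencies per selected attribute."""
    freq: Counter = Counter()
    for word, attr in chosen.items():
        freq[attr] += word_freq[word]
    return freq


word_attrs = {"bank": [374, 494], "store": [374], "hotel": [437, 374]}
word_freq = {"bank": 1, "store": 1, "hotel": 1}
chosen = select_attributes(word_attrs, word_freq)
attr_freq = attribute_frequencies(chosen, word_freq)
```

Here "bank" and "hotel" both resolve to [374: enterprise/company], so the resulting attribute frequency for [374] is 3.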
4.3 Context vector
A simple representation of context is a set of co-occurring words for a
target word. As the strength of relevance between a target compound
noun t and its co-occurring word r, the feature value of r, μw(t, r),
is defined by the log likelihood ratio (Dunning, 1993)¹ as follows.
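\[
\mu_w(t,r) =
\begin{cases}
L(t,r) & : f(t,r) \neq 0 \\
0 & : f(t,r) = 0
\end{cases}
\tag{1}
\]

\[
\begin{aligned}
L(t,r) &= \sum_{i,j \in \{1,2\}} k_{ij} \log \frac{k_{ij} N}{C_i R_j} \\
       &= k_{11} \log \frac{k_{11} N}{C_1 R_1}
        + k_{12} \log \frac{k_{12} N}{C_1 R_2}
        + k_{21} \log \frac{k_{21} N}{C_2 R_1}
        + k_{22} \log \frac{k_{22} N}{C_2 R_2}
\end{aligned}
\tag{2}
\]

\[
\begin{aligned}
k_{11} &= f(t,r) \\
k_{12} &= f(t) - k_{11} \\
k_{21} &= f(r) - k_{11} \\
k_{22} &= N - k_{11} - k_{12} - k_{21}
\end{aligned}
\tag{3}
\]

\[
C_1 = k_{11} + k_{12}, \quad
C_2 = k_{21} + k_{22}, \quad
R_1 = k_{11} + k_{21}, \quad
R_2 = k_{12} + k_{22}
\]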
where f(t) and f(r) are the frequencies of compound noun t and
co-occurring word r, respectively, f(t, r) is the co-occurrence
frequency of t and r, and N is the total frequency of all words in a
corpus.
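Equations (1) to (3) can be checked numerically with a short sketch. The function names are hypothetical, and treating a term with k_ij = 0 as contributing zero is an assumption of the sketch (the usual 0 log 0 = 0 convention), since the text does not spell this out.

```python
import math


def log_likelihood(f_t: int, f_r: int, f_tr: int, n: int) -> float:
    """L(t, r) of equation (2), built from the counts of equation (3)."""
    k = {(1, 1): f_tr, (1, 2): f_t - f_tr, (2, 1): f_r - f_tr}
    k[(2, 2)] = n - k[(1, 1)] - k[(1, 2)] - k[(2, 1)]
    c = {1: k[(1, 1)] + k[(1, 2)], 2: k[(2, 1)] + k[(2, 2)]}
    r = {1: k[(1, 1)] + k[(2, 1)], 2: k[(1, 2)] + k[(2, 2)]}
    total = 0.0
    for (i, j), kij in k.items():
        if kij > 0:  # assumption: a zero count contributes nothing
            total += kij * math.log(kij * n / (c[i] * r[j]))
    return total


def mu_w(f_t: int, f_r: int, f_tr: int, n: int) -> float:
    """Feature value of equation (1)."""
    return log_likelihood(f_t, f_r, f_tr, n) if f_tr != 0 else 0.0


# Toy counts: t and r each occur 10 times among N = 100 words.
independent = log_likelihood(10, 10, 1, 100)   # co-occur exactly as chance predicts
associated = log_likelihood(10, 10, 8, 100)    # co-occur far more often than chance
```

When the observed co-occurrence equals what independence predicts, every ratio inside the logarithm is 1 and L(t, r) is zero; the value grows as the counts depart from independence.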
The context of a target compound t can be represented by the following
vector (context word vector 1, cw1), whose elements are the feature
values of t and its co-occurring words r_i.

\[
\mathbf{cw}_1(t) = (\mu_w(t,r_1), \ldots, \mu_w(t,r_n))
\tag{4}
\]

Note that the order of the elements is common to all vectors of the
same language. Moreover, translation matrix T, described in K. Tanaka
and Iwasaki (1996), can convert a vector to another vector whose
elements are aligned in the same order as that of the other language
(T cw). The element t_ij of T denotes the conditional probability that
a word r_i in a source language is translated into another word r_j in
a target language.
We discriminate between words that have syntactic dependence and those
that do not because the strengths of relations are different, as
mentioned in Section 4.1. In order to intensify the value of dependent
words, f(t, r) in equation (3) is replaced with the following f'(t, r)
using the weight w determined by the frequency of dependence.

\[
f'(t,r) = w \, f(t,r)
\tag{5}
\]

\[
w = 1 + \frac{f_d(t,r)}{f(t,r)} \times \mathrm{const}
\tag{6}
\]

Here, f_d(t, r) is the frequency of word r that has dependency on t.
The constant is determined experimentally, and later evaluation is done
with const = 2. We define a modified vector (context word vector 2,
cw2), which is a version of cw1 computed with f'(t, r).

¹ This formula is the faster version proposed by Ted Dunning in 1997.
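The weighting of equations (5) and (6) amounts to a few lines of arithmetic, sketched here with hypothetical function names and const = 2 as in the evaluation. Returning zero when f(t, r) = 0 is an assumption added to avoid a division by zero.

```python
def dependency_weight(f_tr: float, fd_tr: float, const: float = 2.0) -> float:
    """Weight w of equation (6)."""
    return 1.0 + (fd_tr / f_tr) * const


def weighted_cooccurrence(f_tr: float, fd_tr: float,
                          const: float = 2.0) -> float:
    """f'(t, r) of equation (5); zero when t and r never co-occur."""
    if f_tr == 0:
        return 0.0
    return dependency_weight(f_tr, fd_tr, const) * f_tr


no_dependency = weighted_cooccurrence(4, 0)    # w = 1
half_dependent = weighted_cooccurrence(4, 2)   # w = 2
all_dependent = weighted_cooccurrence(4, 4)    # w = 3
```

With const = 2, a word that is always in a dependent relation with t has its co-occurrence frequency tripled, while a word that never is keeps its original frequency.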
Similarly, another context vector is also defined for the semantic
attributes to which co-occurring words belong, by using the following
feature value μa instead of μw (context attribute vector, ca). L_a in
equation (8) is the semantic attribute version of L in equation (2):
f(t, r) and f(r) are replaced with f(t, a) and f(a), respectively,
where a indicates an attribute of a word.

\[
\mathbf{ca}(t) = (\mu_a(t,a_1), \ldots, \mu_a(t,a_m))
\tag{7}
\]

\[
\mu_a(t,a) =
\begin{cases}
L_a(t,a) & : f(t,a) \neq 0 \\
0 & : f(t,a) = 0
\end{cases}
\tag{8}
\]
5 Comparison of context
As described in Section 3, the contexts of a compound noun and its
translation are often alike in the corpora of a similar domain. Table 2
shows a comparison of co-occurrence words and semantic attributes of
three compound nouns: 営業利益 eigyo rieki, its translation operating
profit, and an irrelevant word, business interest. Each item
corresponds to an element of context vector cw or ca, and the words in
the same row are connected by a dictionary. The high-μw words in the
class of "independent words" include words associated indirectly with a
target word, e.g., 見込み mikomi (expectation) and 配当 haito (share).
Some of these words are valid clues for connecting the two contexts;
however, others are not very important. On the other hand, words in the
class of "dependent words" are directly related to a target word, e.g.,
increase of operating profits, estimate operating profits. The variety
of these words is limited in comparison to the "independent words"
class, whereas they can often be effective clues.
More of the co-occurring words of operating profits that mark high μw
are linked to those of 営業利益 than is the case for business interest.
As for semantic attributes, operating profit shares more upper
attributes with 営業利益 than business interest does.
Each row gives the value (μw for words, μa for attributes) of one
co-occurring item of 営業利益, followed by the linked items of operating
profit and/or business interest, in that order, with their values.

[independent words]
3478    profit 117
654     slash 16.3         reduce 7.8
508     expectation 137
455     rationalize 46.8
363     design 11.2
353     share 130
[dependent words]
1866    increase 49.7      increase 6.3
727     decline 5.9        diminish 14.1
709     estimate 51.2      estimate 3.6
422     division 49.2      division 7.1
321     contrast 3.2       compete 9.4
266     connect 4.9        link 5.4
[semantic attributes] (*)
[2262]  8531    131    11
[1867]  7290    321    93
[2694]  4936    13     19
[1168]  3855    83
[1395]  3229    695    110
[1920]  1730    810    428
(*) [2262: increase/decrease], [1867: transaction],
[2694: period/term], [1168: economic system],
[1395: consideration], [1920: labor]

Table 2: Comparison of co-occurrence words and semantic attributes

The similarity S_w(t_s, t_t) between compound nouns t_s in the source
language and t_t in the target language is defined by context word
vectors and translation matrix T as follows.

\[
S_w(t_s,t_t) = T\,\mathbf{cw}(t_s) \cdot \mathbf{cw}(t_t)
\tag{9}
\]

Similarly, the semantic attribute based similarity S_a(t_s, t_t) is
defined as follows.

\[
S_a(t_s,t_t) = \mathbf{ca}(t_s) \cdot \mathbf{ca}(t_t)
\tag{10}
\]
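A minimal sketch of equation (9) with sparse vectors follows. The dictionary-of-dictionaries layout of T, the function names, and all numeric values are hypothetical; the sketch also assumes that T maps a source word r_i to target words r_j with probability t_ij, so that the translated vector has target-language dimensions.

```python
from typing import Dict

Vector = Dict[str, float]
TranslationMatrix = Dict[str, Dict[str, float]]  # source word -> {target word: t_ij}


def translate(cw_source: Vector, t: TranslationMatrix) -> Vector:
    """Map a source-language context vector onto target-language words."""
    out: Vector = {}
    for src_word, value in cw_source.items():
        for tgt_word, prob in t.get(src_word, {}).items():
            out[tgt_word] = out.get(tgt_word, 0.0) + prob * value
    return out


def inner_product(u: Vector, v: Vector) -> float:
    """Inner product over the dimensions the two vectors share."""
    return sum(value * v[key] for key, value in u.items() if key in v)


def similarity_w(cw_s: Vector, cw_t: Vector, t: TranslationMatrix) -> float:
    """S_w of equation (9)."""
    return inner_product(translate(cw_s, t), cw_t)


# Toy values only; they do not come from the experiments.
cw_japanese = {"rieki": 2.0, "mikomi": 1.0}
T = {"rieki": {"profit": 0.5, "gain": 0.5}, "mikomi": {"expectation": 1.0}}
cw_english = {"profit": 4.0, "expectation": 3.0}
score = similarity_w(cw_japanese, cw_english, T)
```

The attribute-based similarity of equation (10) needs no translation step, because both languages share the same attribute inventory; it corresponds to calling inner_product directly on two attribute vectors.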
6 Evaluation
In order to evaluate this method, an experiment on the selection of
English translations for a Japanese compound noun is conducted. We use
two Japanese corpora, Nihon Keizai Shimbun CD-ROM 1994 (NIK, 1.7
million sentences) and Mainichi Shimbun CD-ROM 1995 (MAI, 2.4 million
sentences), and two English corpora, The Wall Street Journal 1996 (WSJ,
1.2 million sentences) and Reuters Corpus 1996 (REU, 1.9 million
sentences²; Reuters, 2000), as contextual information. Two of them,
NIK and WSJ, are financial newspapers, and the rest are general
newspapers and news archives. All combinations of Japanese and English
corpora are examined to reduce the bias of the combinations.

² Only part of the corpus is used because of a file size limitation in
the database management system in which the corpora are stored.
First, 400 Japanese noun-noun type compounds cJ that appear frequently
in NIK (more than 15 times) are randomly chosen. Next, the translation
candidates TE for each cJ are collected from the English corpus WSJ as
described in Section 2. The bilingual dictionary for the MT system
ALT-J/E, Goi-Taikei and a terminological dictionary (containing about
105,000 economic and other terms) are used to connect component words.
As a result, translation candidates are collected for 393 Japanese
compound nouns; no candidates are extracted for the remaining 7. Note
that we link component words widely in collecting the translation
candidates because components in different languages do not always have
direct translations, but do have similar meanings. For instance, for
the economic term 設備投資 setsubi toushi and its translation capital
investment, while 投資 toushi means investment, 設備 setsubi, which
means equipment or facility, is not a direct translation of capital.
Goi-Taikei and the terminological dictionary are employed to link such
similar component words.
Each Japanese compound has a maximum of 5 candidates (3 on average). We
judge the adequacy of the chosen candidates by referring to articles
and terminological dictionaries. More than 70% of the Japanese
compounds have only one clearly correct candidate and many incorrect
ones (e.g., securities company and paper company for 証券会社 shouken
gaisha). The others have two or more acceptable translations.
Moreover, if all of the translation candidates of compound cJ are
correct (45 Japanese compounds) or all are incorrect (86 Japanese
compounds), cJ and its translation candidates are removed from the test
set. For each cJ in the remainder of the test set (262 Japanese
compound nouns, set 1), the translation cE that is judged the most
similar to cJ is chosen by measuring the similarity between the
compounds. Set 1 is divided into two groups by the frequency of the
Japanese word, set 1H (more than 100 times) and set 1L (less than 100
times), to examine the effect of frequency. In addition, the subset of
set 1 (135 Japanese compound nouns, set 2) whose members also appear
more than 15 times in MAI is extracted, since set 1 includes compounds
that do not appear frequently in MAI. On the other hand, the candidate
that appears the most frequently in the English corpus can be selected
as the best translation of cJ. This simple procedure is the baseline
that is compared to the proposed method.

Set   Corpora    word1 (cw1)   word2 (cw2)   attr (ca)   freq (WSJ)
1H    NIK-WSJ    73.4          74.2          66.4        65.6
1L    NIK-WSJ    53.3          53.3          43.0        46.7
1     NIK-WSJ    63.0          63.4          54.2        55.7
2     NIK-WSJ    71.1          72.6          65.9        64.4
2     NIK-REU    71.9          71.9          66.7        -
2     MAI-WSJ    58.5          58.5          63.7        -
2     MAI-REU    57.0          56.3          65.2        -

Table 3: Precision (%) of selecting translation candidates
Table 3 shows the results of selecting the appropriate English
translations for the Japanese compounds when each pair of corpora is
used. The column "freq (WSJ)" is the result of choosing the most
frequent candidates in WSJ. The methods based on word context reach
higher precision than this baseline in set 1, which suggests that word
context vectors (cw) can efficiently describe the context of the target
compounds. Context word vector 2 matches or exceeds context word vector
1 in every row except MAI-REU. However, the effect of considering
syntactic dependency is minimal in this experiment.
The precision of the word context vectors in both MAI-WSJ and MAI-REU
is low. The main reason is that many Japanese compounds in the test set
appear less frequently in MAI than in NIK, since the compounds frequent
in NIK are chosen for the set (the average frequency in NIK is 417, but
that in MAI is 75). Therefore, fewer common co-occurrence words are
found between MAI and the English corpora than between NIK and the
English corpora. For instance, 25 Japanese compounds share no
co-occurrence words with their translation candidates in MAI-WSJ, while
only one shares none in NIK-WSJ. In spite of this handicap, the method
based on semantic context (ca) keeps relatively high precision for
MAI-WSJ and MAI-REU (63.7% and 65.2%). This result suggests that an
abstraction of words can compensate for the lack of word information to
a certain extent.
The proposed method based on word context (cw) surpasses the baseline
method in precision in measuring the similarity between relatively
frequent words. Our method can be used for compiling dictionaries or
for machine translation.
Table 4 shows examples of translation candidates and their similarity
scores. The mark "+" indicates correct translations. Some hyponyms and
hypernyms or antonyms cannot be distinguished by this method, because
these words often have similar co-occurring words. As shown in Table 5,
using the example of power company and energy company, the co-occurring
words are very similar; therefore, their context vectors cannot assist
in discriminating these words. This problem cannot be resolved by this
method alone. However, there is still room for improvement by combining
other information, e.g., the similarity between components.

Japanese      English candidates        Sw1 × 10^-2
技術移転      + technology transfer     24433
                technology share        20849
                policy move             9173
為替レート    + exchange rate           530509
                market rate             111712
                bill rate               46417
電力会社        energy company          730323
              + power company           441790

Table 4: Examples of translation candidates

Each row lists the English words linked to one co-occurring word of
電力会社, for power company and/or energy company, in that order.

power       power
electric    electric
blackout
remain      remain
charge      charge
storage

Table 5: Similar co-occurring words of hypernyms and hyponyms
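The final selection step simply takes the candidate with the highest similarity score. The sketch below applies it to two of the score sets printed in Table 4; the function name is hypothetical.

```python
from typing import Dict


def select_translation(scores: Dict[str, float]) -> str:
    """Return the candidate with the highest similarity score."""
    return max(scores, key=lambda candidate: scores[candidate])


# Scores copied from Table 4 (Sw1 x 10^-2).
first_compound = {"technology transfer": 24433,
                  "technology share": 20849,
                  "policy move": 9173}
third_compound = {"energy company": 730323,
                  "power company": 441790}

best_first = select_translation(first_compound)
best_third = select_translation(third_compound)
```

For the first compound the correct candidate, technology transfer, is selected. For the third, energy company outranks the correct candidate power company, which is the hypernym/hyponym confusion discussed above.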
7 Conclusion
We proposed a method that measures the similarity between compound
nouns in different languages by contextual information from
non-parallel corpora. The method effectively selects translation
candidates although it uses unrelated corpora in different languages.
It measures the similarity between relatively frequent words using
context word vectors. When few common co-occurrence words are
available, context attribute vectors compensate for the lack of
information. For future work, we will investigate ways of integrating
the method and other information, e.g., similarity of components, to
improve precision.
Acknowledgements
This research was supported in part by the Research Collaboration
between NTT Communication Science Laboratories, Nippon Telegraph and
Telephone Corporation and CSLI, Stanford University. The author would
like to thank Timothy Baldwin of CSLI and Francis Bond of NTT for their
valuable comments.

References

Ido Dagan and Alon Itai. 1994. Word sense disambiguation using a second
language monolingual corpus. Computational Linguistics, 20(4):563-596.

Ted Dunning. 1993. Accurate methods for the statistics of surprise and
coincidence. Computational Linguistics, 19(1):61-74.

Pascale Fung. 1995. A pattern method for finding noun and proper noun
translation from noisy parallel corpora. In Proc. of the 33rd Annual
Meeting of the Association for Computational Linguistics.

Pascale Fung. 1997. Finding terminology translations from non-parallel
corpora. In Proc. of the 5th Annual Workshop on Very Large Corpora,
pages 192-202.

Satoru Ikehara, Satoshi Shirai, Akio Yokoo, and Hiromi Nakaiwa. 1991.
Toward an MT system without pre-editing - effects of new methods in
ALT-J/E -. In Proc. of the 3rd Machine Translation Summit, pages
101-106.

Satoru Ikehara, Masahiro Miyazaki, Satoshi Shirai, Akio Yokoo, Hiromi
Nakaiwa, Kentaro Ogura, Yoshifumi Ooyama, and Yoshihiko Hayashi. 1997.
Nihongo Goi-Taikei - A Japanese Lexicon. Iwanami Shoten.

Adam Kilgarriff and David Tugwell. 2001. WASP-bench: an MT
lexicographers' workstation supporting state-of-art lexical
disambiguation. In Proc. of the 8th Machine Translation Summit.

Reinhard Rapp. 1999. Automatic identification of word translations from
unrelated English and German corpora. In Proc. of the 37th Annual
Meeting of the Association for Computational Linguistics, pages 1-17.

Reuters. 2000. Reuters corpus (1996-1997).

Kumiko Tanaka and Hideya Iwasaki. 1996. Extraction of lexical
translations from non-aligned corpora. In Proc. of the 16th
International Conference on Computational Linguistics, pages 580-585.

Takaaki Tanaka and Yoshihiro Matsuo. 1999. Extraction of translation
equivalents from non-parallel corpora. In Proc. of the 8th
International Conference on Theoretical and Methodological Issues in
Machine Translation, pages 109-119.

Takaaki Tanaka and Yoshihiro Matsuo. 2001. Extraction of compound noun
translations from non-parallel corpora (in Japanese). In Trans. of the
Institute of Electronics, Information and Communication Engineers,
84-D-II(12):2605-2614.
