Sense Classification of Verbal Polysemy 
Based on Bilingual Class/Class Association*
Takehito Utsuro 
Graduate School of Information Science, Nara Institute of Science and Technology 
8916-5, Takayama-cho, Ikoma-shi, Nara, 630-01, JAPAN 
utsuro@is.aist-nara.ac.jp 
Abstract 
In the field of statistical analysis of natural language data, the
measure of word/class association has proved to be quite useful for
discovering a meaningful sense cluster at an arbitrary level of the
thesaurus. In this paper, we apply this idea to the sense classification
of Japanese verbal polysemy in case frame acquisition from
Japanese-English parallel corpora. Measures of bilingual class/class
association and bilingual class/frame association are introduced and
used for discovering sense clusters in the sense distribution of English
predicates and Japanese case element nouns. In a small experiment, 93.3%
of the discovered clusters are correct in that none of them contains
examples of more than one hand-classified sense.
1 Introduction 
In corpus-based NLP, acquisition of lexical knowl- 
edge has become one of the major research topics. 
Among several research topics in this field, acqui- 
sition from parallel corpora is quite attractive (e.g. 
Dagan et al. (1991)). The reason is that paral- 
lel sentences are useful for resolving both syn- 
tactic and lexical ambiguities in the monolingual 
sentences. Especially if the two languages have 
different syntactic structures and word meanings 
(such as English and Japanese), this approach 
has proved to be most effective in disambigua- 
tion (Matsumoto et al., 1993; Utsuro et al., 1993). 
Utsuro et al. (1993) proposed a method for acquiring surface case frames
of Japanese verbs from Japanese-English parallel corpora. In this
method, translated English verbs and case labels are used to classify
senses of Japanese polysemous verbs. Clues to sense classification are
found using English verbs and case labels, as well as the sense
distribution of the Japanese case element nouns. Then, a human
instructor judges whether the clues are correct. One of the major
disadvantages of the method is that its use of English information and
of the sense distribution of Japanese case element nouns is restricted.
Only the surface forms of English verbs and case labels are used; the
sense distribution of English verbs is not. Also, the threshold for
deciding a distinction in the sense distribution of Japanese case
element nouns is fixed in advance at a single level of the Japanese
thesaurus. As a result, the human instructor is frequently asked to
judge the correctness of the clues.

*The author would like to thank Prof. Yuji MATSUMOTO for his valuable
comments on this research. This work is partly supported by the Grants
from the Ministry of Education, Science, and Culture, Japan, 07780326.

In the field of statistical analysis of natural 
language data, it is common to use measures 
of lexical association, such as the information- 
theoretic measure of mutual information, to ex- 
tract useful relationships between words (e.g. 
Church and Hanks (1990)). Lexical association 
has its limits, however, since often either the 
data is insufficient to provide reliable word/word 
correspondences, or the task requires more abstraction than word/word
correspondences permit. Thus, Resnik (1992) proposed a useful measure of
word/class association by generalizing the information-theoretic measure
of word/word association. The proposed measure addresses the limitations
of lexical association by facilitating the statistical discovery of
facts involving word classes rather than individual words.
We find the word/class association measure of Resnik (1992) quite
attractive, since it makes it possible to discover a meaningful sense
cluster at an arbitrary level of the thesaurus. We thus expect that the
restrictions of the previous method of Utsuro et al. (1993) can be
overcome by employing the idea of the measure of word/class association.
In this paper, we describe how this idea can be applied to the sense
classification of Japanese verbal polysemy in case frame acquisition
from Japanese-English parallel corpora. First, the sense distribution of
English predicates and Japanese case element nouns is represented using
monolingual English and Japanese thesauri, respectively (sections 2 and
3). Then, a measure of the association between classes of English
predicates and classes of Japanese case element nouns, i.e., a measure
of bilingual class/class association, is introduced and extended into a
measure of bilingual class/frame association (section 4).
Using these measures, sense clusters are discovered in the sense
distribution of English predicates and Japanese case element nouns.
Finally, examples of a Japanese polysemous verb collected from
Japanese-English parallel corpora are divided into disjoint clusters
according to those discovered sense clusters (section 5). The results of
a small experiment are presented and the proposed measure is evaluated
(section 6).
2 Bilingual Surface Case Structure 
In the framework of verbal case frame acquisition from parallel corpora,
bilingually matched surface case structures (Matsumoto et al., 1993) are
collected and surface case frames of Japanese verbs are acquired from
the collection. In this paper, each bilingually matched surface case
structure is called a bilingual surface case structure, and represented
as a feature structure:

$$
\left[\begin{array}{ll}
pred: & v_J \\
sem_E: & SEM_E \\
p_1: & \left[\begin{array}{ll} pred: & n_{J1} \\ sem: & SEM_{J1} \end{array}\right] \\
\vdots & \\
p_n: & \left[\begin{array}{ll} pred: & n_{Jn} \\ sem: & SEM_{Jn} \end{array}\right]
\end{array}\right]
$$

$v_J$ indicates the verb in the Japanese sentence, $p_1, \ldots, p_n$
denote the Japanese case markers, and $n_{J1}, \ldots, n_{Jn}$ denote
the Japanese case element nouns. When a Japanese noun $n_{Ji}$ has
several senses, it may appear in several leaf classes in the Japanese
thesaurus. Thus, $SEM_{Ji}$ is represented as a set of those classes,
and is referred to as a semantic label. $SEM_E$ is a semantic label of
the corresponding English predicate, i.e., a set of classes in the
English thesaurus:

$$SEM_E = \{c_{E1}, \ldots, c_{Ek}\}, \quad SEM_{Ji} = \{c_{J1}, \ldots, c_{Jl}\}$$

$c_{E1}, \ldots, c_{Ek}$ and $c_{J1}, \ldots, c_{Jl}$ indicate the
classes in the English and Japanese thesauri, respectively.
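By structurally matching the Japanese-English parallel sentences in
Example 1, the following bilingual surface case structure is obtained:
Example 1
J: Watashi-ha uwagi-wo kagi-ni kaketa.
   I-TOP      coat-ACC hook-on hung
E: I hung my coat on the hook.

$$
\left[\begin{array}{ll}
pred: & kakeru \\
sem_E: & \{c_{h1}, \ldots, c_{h4}\} \\
ha: & \left[\begin{array}{ll} pred: & watashi \\ sem: & \{c_w\} \end{array}\right] \\
wo: & \left[\begin{array}{ll} pred: & uwagi \\ sem: & \{c_u\} \end{array}\right] \\
ni: & \left[\begin{array}{ll} pred: & kagi \\ sem: & \{c_{k1}, \ldots, c_{k4}\} \end{array}\right]
\end{array}\right]
$$
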
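As an illustration only, the feature structure of Example 1 can be held in a small Python data structure. This is a minimal sketch, not the representation used in the paper: the class names `CaseElement` and `BilingualSurfaceCaseStructure`, the lemma `kakeru` (assumed dictionary form of kaketa), and the class identifiers such as `hang-1` are hypothetical placeholders. Only the number of senses per word follows this section.

```python
from dataclasses import dataclass


@dataclass
class CaseElement:
    pred: str        # Japanese case element noun n_J
    sem: frozenset   # semantic label SEM_J: leaf classes in the Japanese thesaurus


@dataclass
class BilingualSurfaceCaseStructure:
    pred: str         # Japanese verb v_J
    sem_e: frozenset  # semantic label SEM_E of the English predicate
    cases: dict       # case marker p_i -> CaseElement


# Example 1; every class identifier below is a made-up placeholder.
example1 = BilingualSurfaceCaseStructure(
    pred="kakeru",
    sem_e=frozenset({"hang-1", "hang-2", "hang-3", "hang-4"}),
    cases={
        "ha": CaseElement("watashi", frozenset({"watashi-1"})),
        "wo": CaseElement("uwagi", frozenset({"uwagi-1"})),
        "ni": CaseElement("kagi", frozenset({"kagi-1", "kagi-2", "kagi-3", "kagi-4"})),
    },
)
```

Keeping each semantic label as a set preserves the ambiguity of the word, so the choice of sense can be deferred until association scores are available.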
We use Roget's Thesaurus (Roget, 1911) as the English thesaurus and
'Bunrui Goi Hyou' (BGH) (NLRI, 1993) as the Japanese thesaurus. In
Roget's Thesaurus, the verb "hang" has four senses. In BGH, the nouns
"watashi" and "uwagi" have only one sense each, and "kagi" has four
senses.
3 Monolingual Thesaurus 
A thesaurus is regarded as a tree in which each node represents a class.
We introduce $\preceq$ as the superordinate-subordinate relation of
classes. In general, $c_1 \preceq c_2$ means that $c_1$ is subordinate
to $c_2$. We define $\preceq$ so that a semantic label
$SEM = \{c_1, \ldots, c_n\}$ is subordinate to each class $c_i$:

$$\forall c \in SEM, \quad SEM \preceq c$$

When searching for classes which give the maximum association score
(section 5), this definition makes it possible to calculate the
association score for all the senses in a semantic label and to find the
senses which give the maximum association score.¹
BGH has a six-layered abstraction hierarchy; more than 60,000 Japanese
words are assigned at the leaves, and its nominal part contains about
45,000 words.² Roget's Thesaurus has a seven-layered abstraction
hierarchy and over 100,000 words are allocated at the leaves.³ In
Roget's Thesaurus, sense classification is preferred to part of speech
distinction. Thus, a noun and a verb which have similar senses are
assigned similar classes in the thesaurus.
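The subordinate relation on classes and on semantic labels can be sketched as follows. This is a toy illustration under stated assumptions: the tree `PARENT` is invented (only `products` and `abstract-relations` are class names mentioned for BGH), the relation is treated as reflexive and transitive, and a label is taken to be subordinate to a class when at least one of its senses lies at or below that class.

```python
# Toy thesaurus: class -> parent class (None marks the root). Hypothetical data.
PARENT = {
    "root": None,
    "products": "root",
    "tools": "products",
    "hook": "tools",
    "abstract-relations": "root",
    "shape": "abstract-relations",
    "hook-shape": "shape",
}


def ancestors(c, parent=PARENT):
    """Return c followed by all of its superordinate classes up to the root."""
    chain = []
    while c is not None:
        chain.append(c)
        c = parent[c]
    return chain


def class_subordinate(c1, c2, parent=PARENT):
    """c1 is subordinate to c2: c1 equals c2 or lies below it in the tree."""
    return c2 in ancestors(c1, parent)


def label_subordinate(sem, c, parent=PARENT):
    """A semantic label is subordinate to c if some sense in it is."""
    return any(class_subordinate(leaf, c, parent) for leaf in sem)


# A noun with two senses satisfies classes on both branches of the tree.
kagi_like = {"hook", "hook-shape"}
```

With this reading, an ambiguous noun contributes to the counts of every class above any of its senses, which is what allows the score maximization to pick the sense.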
¹ This process corresponds to sense disambiguation by maximizing the
association score.
² Five classes are allocated at the next level from the root node:
abstract-relations, agents-of-human-activities, human-activities,
products, and natural-objects-and-natural-phenomena.
³ At the next level from the root node, it has six classes:
abstract-relations, space, matter, intellect, volition, and affections.

4 Class-based Association Score
4.1 Word/Class Association Score
The measure of word/class association of Resnik (1992) can be
illustrated by the problem of finding the prototypical object classes
for verbs. Let $\mathcal{V}$ and $\mathcal{N}$ be the sets of all verbs
and nouns, respectively. Given a verb $v \in \mathcal{V}$ and a noun
class $c \subseteq \mathcal{N}$, the joint probability of $v$ and $c$ is
estimated as

$$\Pr(v, c) = \frac{\sum_{n \in c} count(v, n)}{\sum_{v' \in \mathcal{V}} \sum_{n' \in \mathcal{N}} count(v', n')}$$

The association score $A(v, c)$ of a verb $v$ and a noun class $c$ is
defined as

$$A(v, c) \equiv \Pr(c \mid v) \log \frac{\Pr(v, c)}{\Pr(v)\Pr(c)} = \Pr(c \mid v)\, I(v; c)$$

The association score takes the mutual information between the verb and
a noun class, and scales it according to the likelihood that a member of
the class will actually appear as the object of the verb. The first
factor, the conditional probability, measures the generality of the
association, while the second factor, the mutual information, measures
the co-occurrence of the association.
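A minimal sketch of the word/class association score follows. It makes several assumptions: verb-object co-occurrence counts are given as a dictionary, a noun class is a flat set of nouns, the logarithm is base 2 (the text does not state the base), and a pair that never co-occurs is scored 0. The function name and the toy counts are hypothetical.

```python
import math
from collections import Counter


def word_class_association(counts, verb, noun_class):
    """A(v, c) = Pr(c|v) * log2(Pr(v, c) / (Pr(v) * Pr(c)))."""
    total = sum(counts.values())
    n_v = sum(n for (v, _), n in counts.items() if v == verb)
    n_c = sum(n for (_, noun), n in counts.items() if noun in noun_class)
    n_vc = sum(n for (v, noun), n in counts.items()
               if v == verb and noun in noun_class)
    if n_vc == 0:
        return 0.0  # assumed convention for unseen verb/class pairs
    p_v, p_c, p_vc = n_v / total, n_c / total, n_vc / total
    return (p_vc / p_v) * math.log2(p_vc / (p_v * p_c))


# Toy verb-object counts (invented for illustration).
counts = Counter({("drink", "wine"): 3, ("drink", "water"): 1, ("read", "book"): 4})
beverage = {"wine", "water"}
score_class = word_class_association(counts, "drink", beverage)  # 1.0
score_word = word_class_association(counts, "drink", {"wine"})   # 0.75
```

In this toy data the class `beverage` scores higher than the single noun `wine` because it covers all objects of the verb, which is the generality effect of the first factor.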
4.2 Bilingual Class/Class Association 
Score 
We now apply the word/class association score to the task of measuring
the association of classes of English predicates and Japanese case
element nouns in the collection of bilingual surface case structures.
First, we assume that for any polysemous Japanese verb $v_J$, there
exists a case marker $p$ which is most effective for sense
classification of $v_J$. Given the collection of bilingual surface case
structures for $v_J$, we introduce the bilingual class/class association
score for measuring the association of a class $c_E$ of English
predicates and a class $c_J$ of Japanese case element nouns for a case
marker $p$.
Let $Eg(v_J, p)$ be the set of bilingual surface case structures
collected from the Japanese-English parallel corpora, each element of
which has a Japanese verb $v_J$ and a Japanese case marker $p$. Among
the elements of $Eg(v_J, p)$, let $Eg(v_J, p, c_E)$ be the set of those
whose semantic label $SEM_E$ of the English predicate satisfies the
class $c_E$, i.e., $SEM_E \preceq c_E$, and $Eg(v_J, p/c_J)$ be the set
of those whose semantic label $SEM_J$ of the Japanese case element noun
for the case marker $p$ satisfies the class $c_J$, i.e.,
$SEM_J \preceq c_J$. Let $Eg(v_J, c_E, p/c_J)$ be the intersection of
$Eg(v_J, p, c_E)$ and $Eg(v_J, p/c_J)$. Then, the conditional
probabilities $\Pr(c_E \mid v_J, p)$, $\Pr(c_J \mid v_J, p)$, and
$\Pr(c_E, c_J \mid v_J, p)$ are defined as the ratios of the numbers of
the elements of those sets:

$$\Pr(c_E \mid v_J, p) = \frac{|Eg(v_J, p, c_E)|}{|Eg(v_J, p)|}$$

$$\Pr(c_J \mid v_J, p) = \frac{|Eg(v_J, p/c_J)|}{|Eg(v_J, p)|}$$

$$\Pr(c_E, c_J \mid v_J, p) = \frac{|Eg(v_J, c_E, p/c_J)|}{|Eg(v_J, p)|}$$

Then, given $v_J$ and $p$, the association score
$A(c_E, c_J \mid v_J, p)$ of $c_E$ and $c_J$ is defined as

$$A(c_E, c_J \mid v_J, p) = \Pr(c_E, c_J \mid v_J, p) \log \frac{\Pr(c_E, c_J \mid v_J, p)}{\Pr(c_E \mid v_J, p)\Pr(c_J \mid v_J, p)}$$
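The counting behind the bilingual class/class association score can be sketched as below. The representation is an assumption made for brevity: each example in $Eg(v_J, p)$ is reduced to a pair of sets, namely the English classes its predicate label is subordinate to and the Japanese classes its noun under $p$ is subordinate to, so membership tests replace thesaurus lookups. The logarithm base (2), the zero score for empty intersections, and the toy classes are also assumptions.

```python
import math


def class_class_association(examples, c_e, c_j):
    """A(c_E, c_J | v_J, p) computed over the examples of Eg(v_J, p)."""
    n = len(examples)
    n_e = sum(1 for e_cls, _ in examples if c_e in e_cls)
    n_j = sum(1 for _, j_cls in examples if c_j in j_cls)
    n_ej = sum(1 for e_cls, j_cls in examples if c_e in e_cls and c_j in j_cls)
    if n_ej == 0:
        return 0.0  # assumed convention when no example satisfies both classes
    p_e, p_j, p_ej = n_e / n, n_j / n, n_ej / n
    return p_ej * math.log2(p_ej / (p_e * p_j))


# Four toy examples: (English classes, Japanese classes of the noun under p).
examples = [
    ({"buy", "Purchase"}, {"hon", "products"}),
    ({"buy", "Purchase"}, {"e", "products"}),
    ({"incur"}, {"urami", "feelings"}),
    ({"incur"}, {"hankan", "feelings"}),
]
general = class_class_association(examples, "Purchase", "products")  # 0.5
specific = class_class_association(examples, "buy", "hon")           # 0.25
```

Because the first factor is the joint probability, the more general pair wins here even though both pairs have the same mutual information.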
This definition is slightly different from that of the word/class
association score in that it only needs the set $Eg(v_J, p)$ for a
Japanese verb $v_J$ and a Japanese case marker $p$, not the whole
Japanese-English parallel corpora. This is because our task is to
discover a strong association of an English class and a Japanese class
in $Eg(v_J, p)$, rather than in the whole Japanese-English parallel
corpora. Besides, as the first factor, which measures the generality of
the association, we use $\Pr(c_E, c_J \mid v_J, p)$ instead of
$\Pr(c_J \mid v_J, p, c_E)$ or $\Pr(c_E \mid v_J, p/c_J)$ below:⁴

$$\Pr(c_J \mid v_J, p, c_E) = \frac{|Eg(v_J, c_E, p/c_J)|}{|Eg(v_J, p, c_E)|}$$

$$\Pr(c_E \mid v_J, p/c_J) = \frac{|Eg(v_J, c_E, p/c_J)|}{|Eg(v_J, p/c_J)|}$$
4.3 Bilingual Class/Frame Association 
Score 
In the previous section, we assumed that for any polysemous Japanese
verb $v_J$, there exists a case marker $p$ which is most effective for
sense classification of $v_J$. However, it can happen that a combination
of more than one case marker characterizes a sense of the polysemous
verb $v_J$. Even if there exists exactly one case marker which is most
effective for sense classification, it is necessary to select that case
marker automatically by some measure. For example, it is desirable to
discover automatically that, for the task of sense classification of
verbal polysemy, subject nouns are usually most effective for
intransitive verbs, while object nouns are usually most effective for
transitive verbs.
This section generalizes the previous definition of the bilingual
class/class association score, and introduces the bilingual class/frame
association score. In the new definition, we consider every possible set
of pairs of a Japanese case marker $p$ and a Japanese noun class $c_J$,
instead of predetermining the most effective case marker. The bilingual
class/frame association score measures the association of an English
class $c_E$ and a set of pairs of a Japanese case marker $p$ and a
Japanese noun class $c_J$ marked by $p$. By searching for a large
association score, it becomes possible to find any combination of case
markers which characterizes a sense of the polysemous verb $v_J$.
4.3.1 Japanese Case-Class Frame
First, we introduce a data structure which represents a set of pairs of
a Japanese case marker $p$ and a Japanese noun class $c_J$ marked by
$p$, and call it a Japanese case-class frame. A Japanese case-class
frame can be represented as a feature structure:

$$
\left[\begin{array}{ll}
p_1: & c_{J1} \\
\vdots & \\
p_m: & c_{Jm}
\end{array}\right]
$$

⁴ $\Pr(c_J \mid v_J, p, c_E)$ and $\Pr(c_E \mid v_J, p/c_J)$ are too
large in lower parts of the thesaurus, since we focus on examples which
have a Japanese verb $v_J$ and a Japanese case marker $p$. When we used
the average of $\Pr(c_J \mid v_J, p, c_E)$ and
$\Pr(c_E \mid v_J, p/c_J)$ instead of $\Pr(c_E, c_J \mid v_J, p)$ in the
experiment of section 6, most discovered clusters consisted of only one
example.

4.3.2 Subsumption Relation
Next, we introduce the subsumption relation $\preceq_f$ between a
bilingual surface case structure $e$ and a Japanese case-class frame
$f_J$:

$e \preceq_f f_J$ iff for each case marker $p$ in $f_J$ and its noun
class $c_J$, there exists the same case marker $p$ in $e$ and its
semantic label $SEM_J$ is subordinate to $c_J$, i.e.,
$SEM_J \preceq c_J$.

This definition can be easily extended into a subsumption relation
between Japanese case-class frames.
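The subsumption test can be sketched in a few lines. As a simplifying assumption, a case-class frame is a dictionary from case marker to noun class, and a bilingual surface case structure is reduced to a dictionary from case marker to the set of all Japanese classes its noun is subordinate to. The function name and the class names (`humans`, `clothes`, `tools`, `products`) are hypothetical.

```python
def subsumed_by(example_cases, frame):
    """True if every (case marker, class) pair of the frame is matched in e."""
    return all(
        p in example_cases and c_j in example_cases[p]
        for p, c_j in frame.items()
    )


# Example 1 with made-up class sets for each case element noun.
e = {
    "ha": {"watashi", "humans"},
    "wo": {"uwagi", "clothes", "products"},
    "ni": {"kagi", "tools", "products"},
}

one_case = {"wo": "products"}                    # frame with a single case
two_cases = {"wo": "clothes", "ni": "products"}  # frame combining two cases
missing_case = {"ga": "products"}                # case marker absent from e
wrong_class = {"wo": "humans"}                   # class not above the noun
```

A frame may mention fewer case markers than the example contains; only the markers listed in the frame are checked, which is what lets frames of different sizes compete in the search.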
4.3.3 Bilingual Class/Frame Association Score
Let $Eg(v_J)$ be the set of bilingual surface case structures collected
from the Japanese-English parallel corpora, each element of which has a
Japanese verb $v_J$. Among the elements $e$ of $Eg(v_J)$, let
$Eg(v_J, c_E)$ be the set of those whose semantic label $SEM_E$ of the
English predicate satisfies the class $c_E$, i.e., $SEM_E \preceq c_E$,
and $Eg(v_J, f_J)$ be the set of those which satisfy the Japanese
case-class frame $f_J$, i.e., $e \preceq_f f_J$. Let
$Eg(v_J, c_E, f_J)$ be the intersection of $Eg(v_J, c_E)$ and
$Eg(v_J, f_J)$. Then, the conditional probabilities
$\Pr(c_E \mid v_J)$, $\Pr(f_J \mid v_J)$, and $\Pr(c_E, f_J \mid v_J)$
are defined as the ratios of the numbers of the elements of those sets:

$$\Pr(c_E \mid v_J) = \frac{|Eg(v_J, c_E)|}{|Eg(v_J)|}$$

$$\Pr(f_J \mid v_J) = \frac{|Eg(v_J, f_J)|}{|Eg(v_J)|}$$

$$\Pr(c_E, f_J \mid v_J) = \frac{|Eg(v_J, c_E, f_J)|}{|Eg(v_J)|}$$

Then, given $v_J$, the association score $A(c_E, f_J \mid v_J)$ of
$c_E$ and $f_J$ is defined as

$$A(c_E, f_J \mid v_J) = \Pr(c_E, f_J \mid v_J) \log \frac{\Pr(c_E, f_J \mid v_J)}{\Pr(c_E \mid v_J)\Pr(f_J \mid v_J)}$$

As in the case of the bilingual class/class association score, this
definition only needs the set $Eg(v_J)$ for a Japanese verb $v_J$, not
the whole Japanese-English parallel corpora.
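The class/frame score can be sketched in the same counting style. The assumptions are: each example is a pair of (set of English classes, dictionary from case marker to set of Japanese classes), a frame is a dictionary from case marker to class, the logarithm is base 2, and empty intersections score 0. All names and data are hypothetical.

```python
import math


def satisfies(frame, cases):
    """Frame subsumption test on one example's case dictionary."""
    return all(p in cases and c in cases[p] for p, c in frame.items())


def class_frame_association(examples, c_e, frame):
    """A(c_E, f_J | v_J) computed over the examples of Eg(v_J)."""
    n = len(examples)
    in_e = [c_e in e_cls for e_cls, _ in examples]
    in_f = [satisfies(frame, cases) for _, cases in examples]
    n_e, n_f = sum(in_e), sum(in_f)
    n_ef = sum(1 for a, b in zip(in_e, in_f) if a and b)
    if n_ef == 0:
        return 0.0  # assumed convention for an empty intersection
    # Pr(c_E, f_J) * log2(Pr(c_E, f_J) / (Pr(c_E) * Pr(f_J))), simplified.
    return (n_ef / n) * math.log2(n_ef * n / (n_e * n_f))


# 75 toy examples: 74 share one sense, one example stands alone.
examples = [({"buy"}, {"wo": {"goods"}})] * 74 + [({"win"}, {"wo": {"favor"}})]
singleton = class_frame_association(examples, "win", {"wo": "favor"})
```

Under base-2 logarithms, a pair that covers exactly one of 75 examples and nothing else scores $\log_2(75)/75 \approx 0.083$. This coincides with the value 0.083 that recurs for kau in Table 1, which is consistent with, but does not confirm, a base-2 logarithm in the experiment.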
5 Sense Classification of Verbal 
Polysemy 
This section explains how to classify the elements of the set $Eg(v_J)$
of bilingual surface case structures according to the sense of the
polysemous verb $v_J$, with the bilingual class/frame association score
defined in the previous section. In this classification process, pairs
of an English class $c_E$ and a Japanese case-class frame $f_J$ which
give a large association score $A(c_E, f_J \mid v_J)$ are searched for.
It is desirable that the set $Eg(v_J)$ be divided into disjoint subsets
by the discovered pairs of $c_E$ and $f_J$. The classification process
proceeds according to the following steps:
1. First, the index $i$ and the set of examples $Eg$ are initialized as
   $i \leftarrow 1$ and $Eg \leftarrow Eg(v_J)$.
2. For the $i$-th iteration, let $c_E$ and $f_J$ be a pair of an English
   class and a Japanese case-class frame which satisfy the following
   constraint for all the pairs of $c_{Ej}$ and $f_{Jj}$
   ($1 \le j \le i-1$): $c_E$ is neither subordinate nor superordinate
   to $c_{Ej}$ (i.e., $c_E \not\preceq c_{Ej}$ and
   $c_{Ej} \not\preceq c_E$), or $f_J$ is neither subordinate nor
   superordinate to $f_{Jj}$ (i.e., $f_J \not\preceq_f f_{Jj}$ and
   $f_{Jj} \not\preceq_f f_J$). Then, among those pairs of $c_E$ and
   $f_J$, search for a pair $c_{Ei}$ and $f_{Ji}$ which gives the
   maximum association score $\max_{c_E, f_J} A(c_E, f_J \mid v_J)$,⁵
   and collect the elements of $Eg$ which satisfy the restrictions of
   $c_{Ei}$ and $f_{Ji}$ into the set $Eg(v_J, c_{Ei}, f_{Ji})$.
3. Subtract the set $Eg(v_J, c_{Ei}, f_{Ji})$ from $Eg$ as
   $Eg \leftarrow Eg - Eg(v_J, c_{Ei}, f_{Ji})$. If $Eg \neq \emptyset$,
   then increment the index $i$ as $i \leftarrow i + 1$ and go to
   step 2. Otherwise, set the number $k$ of the subsets as
   $k \leftarrow i$ and terminate the classification process.
As the result of this classification process, the set $Eg(v_J)$ is
divided into disjoint subsets $Eg(v_J, c_{E1}, f_{J1}), \ldots,
Eg(v_J, c_{Ek}, f_{Jk})$.⁶ For example, if a Japanese polysemous verb
$v_J$ has both intransitive and transitive senses, pairs with the
subject case like $\langle c_{E1}, [subj: c_{J1}] \rangle, \ldots,
\langle c_{Ek'}, [subj: c_{Jk'}] \rangle$ will be discovered for
intransitive senses, while pairs with the object case like
$\langle c_{Ek'+1}, [obj: c_{Jk'+1}] \rangle, \ldots,
\langle c_{Ek}, [obj: c_{Jk}] \rangle$ will be discovered for transitive
senses.
Given the set $Eg(v_J)$, the number of iterations of the association
score calculation is $O(|Eg(v_J)|)$.⁷ Since the classification process
can be regarded as sorting the calculated association scores, its
computational complexity can be $O(|Eg(v_J)| \log |Eg(v_J)|)$ if an
efficient sorting algorithm such as quicksort is employed.
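The three steps above can be sketched as a greedy loop over candidate pairs in descending score order. This is a simplified reading, not the author's implementation: candidate pairs and their scores (computed on the whole set) are passed in rather than enumerated from the thesauri, the helper predicates `covers`, `comparable_e`, and `comparable_f` are hypothetical, pairs that cover no remaining example are skipped (an added assumption), and the toy data use flat classes so that two classes are comparable only when equal.

```python
def classify(examples, scored_pairs, covers, comparable_e, comparable_f):
    """Greedily partition examples by (c_E, f_J) pairs, best score first.

    scored_pairs: list of (score, c_e, frame).
    covers(example, c_e, frame): True if the example satisfies both restrictions.
    comparable_e / comparable_f: True if two classes / frames are equal or in a
    subordinate-superordinate relation.
    """
    remaining = list(range(len(examples)))
    chosen, clusters = [], []
    for _, c_e, frame in sorted(scored_pairs, key=lambda t: -t[0]):
        if not remaining:
            break
        # Step 2 constraint: reject a pair only when BOTH its class and its
        # frame are comparable with those of an already selected pair.
        if any(comparable_e(c_e, ce0) and comparable_f(frame, f0)
               for ce0, f0 in chosen):
            continue
        members = [i for i in remaining if covers(examples[i], c_e, frame)]
        if not members:
            continue  # assumption: skip pairs that collect nothing new
        chosen.append((c_e, frame))
        clusters.append(((c_e, frame), members))
        remaining = [i for i in remaining if i not in members]  # step 3
    return clusters, remaining


def covers(example, c_e, frame):
    e_cls, cases = example
    return e_cls == c_e and all(cases.get(p) == c for p, c in frame.items())


def same(a, b):
    return a == b


examples = [
    ("buy", {"wo": "goods"}),
    ("buy", {"wo": "goods"}),
    ("incur", {"wo": "feelings"}),
]
# Scores follow the class/frame formula with base-2 logs, rounded.
pairs = [(0.39, "buy", {"wo": "goods"}), (0.53, "incur", {"wo": "feelings"})]
clusters, remaining = classify(examples, pairs, covers, same, same)
```

Sorting the candidates once and scanning them in order mirrors the remark that the process can be regarded as sorting the calculated scores.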
⁵ The association score $A(c_E, f_J \mid v_J)$ is calculated from the
whole set $Eg(v_J)$, not $Eg$.
⁶ Although the classification process itself guarantees the disjointness
of $Eg(v_J, c_{E1}, f_{J1}), \ldots, Eg(v_J, c_{Ek}, f_{Jk})$, the
subordinate-superordinate constraint of $c_E$ and $f_J$ in step 2 also
guarantees the disjointness of the example sets which satisfy the
restrictions of the pairs $\langle c_{E1}, f_{J1} \rangle, \ldots,
\langle c_{Ek}, f_{Jk} \rangle$.
⁷ Let $l_J$, $d_J$, and $d_E$ be the maximum number of Japanese cases in
a bilingual surface case structure and the depths of the Japanese and
English thesauri, respectively. Then, given a bilingual surface case
structure $e$, the number of Japanese case-class frames $f_J$ which are
superordinate to $e$ (i.e., $e \preceq_f f_J$) is less than
$2^{l_J} \times d_J^{l_J}$, and the number of possible pairs of $c_E$
and $f_J$ is less than $2^{l_J} \times d_J^{l_J} \times d_E$, which is
constant.

6 Experiment and Evaluation
This section gives the results of a small experiment. As a
Japanese-English parallel corpus, we use a corpus of about 40,000
translation examples extracted from a machine readable Japanese-English
dictionary (Shimizu and Narita, 1979).

Table 1: Sense Classification of kau

Cluster  English Predicate Class (c_E) /                                    Number   Association
No.      Japanese wo(ACC) Case Noun Class (c_J)                             of Egs.  Score
         (Level in the Thesaurus and Example Word)
1        buy(Leaf) / 131(Level3, hon(book))                                          0.048
2        buy(Leaf) / 13220(Level5, e(picture))                                       0.018
3        Purchase(Leaf-1, buy, pay) / 14(Level2, Products)                           0.149
4        treat oneself to(Leaf) / 14650-6-80(Leaf, gaisha(foreign car))              0.070
5        treat oneself to(Leaf) / 14280-3-10(Leaf, yubiwa(ring))                     0.070
6        purchase(Leaf) / 11720-3-10(Leaf, dis…)                                     0.083
7        bring(Leaf) / 14010-4-40(Leaf, m…)                                          0.062
8        get(Leaf) / 14570-1-10(Leaf, omocha(toy))                                   0.070
9        incur(Leaf) / 130(Level3, urami(enmity))                                    0.185
10       Motive(Leaf-1, rouse) / 13020-5-50(Leaf, hankan(antipathy))        3        0.169
11       disgust(Leaf) / 13010-1-50(Leaf, hinshuku(displeasure))            1        0.083
12       appreciate(Leaf) / 13040-6-30(Leaf, doryoku(effort))               1        0.083
13       get an opinion of(Leaf) / 12040-1-50(Leaf, otoko(person))          1        0.083
14       use(Leaf) / 13421-6-50(Leaf, shuwan(ability))                               0.083
15       win(Leaf) / 13010-6-200(Leaf, kanshin(favor))                               0.083
Total                                                                       75
(The Hand-Classif. column and the blank Number of Egs. cells are not
legible in the source, which also lists the counts 8 and 46 without a
recoverable row.)

Table 2: Examples of Intransitive/Transitive Distinction

Japanese  English Predicate Class (c_E) / Japanese Case-Class Frame (f_J)                    Number   Association
Verb      (Level in the Thesaurus and Example Word)                                          of Egs.  Score
haru      expensive(Leaf) / ga(NOM):ne(price)(Leaf)                                          3        0.299
          Special Sensation(Leaf-3, freeze) / ga(NOM):15130-11-10(Leaf, koori(ice))          3        0.237
          Acts(Leaf-2, persist, stick to) / wo(ACC):13040(Level5, goujou(obstinacy))         7        0.459
hiku      Decrease(Leaf-1, subside) / ga(NOM):151(Level3, kouzui(floods))                             0.109
          Results of Reasoning(Leaf-2, catch, have) / wo(ACC):15860-11(Level6, kaze(cold))            0.421
hiraku    Intellect(Level1, open) / ga(NOM):14(Products)(Level2, to(door))                   12       0.339
          hold(Leaf) / wo(ACC):13510-1(Level6, kaigou(meeting))                              3        0.114
kanau     Completion(Leaf-1, …) / …(Level4, negai(…))                                                 0.460
          Quantity(Leaf-3, equal) / ni(DAT):12000-3-10(Leaf, kare(he))                                0.504
(Blank Number of Egs. cells are not legible in the source, which also
lists the count 26 without a recoverable row.)

6.1 Example of kau 
First, we show the result of classifying 75 examples (represented as
bilingual surface case structures) of the Japanese polysemous verb kau.
As the result of searching for pairs of an English class and a Japanese
case-class frame with a large association score, the wo case (the
accusative case) is preferred as the most effective case for sense
classification. Fifteen pairs of an English class and a Japanese
case-class frame are found and the set of the 75 examples is divided
into 15 disjoint clusters (Table 1). Each cluster is represented as a
pair of the class $c_E$ of the English predicates and the class $c_J$ of
the Japanese case element nouns of the wo case, along with the level of
the class in the thesaurus and an example word. English classes are
taken from Roget's Thesaurus and Japanese classes from BGH.⁸ In both
thesauri, leaf classes correspond to one word.
For the evaluation of the results, we hand-classified the 15 clusters
into four groups, each of which corresponds to only one sense of kau.⁹
Most hand-classified groups for kau consist of more than one cluster
found by maximizing the association score. However, these clusters are
correct in that none of them contains examples of more than one
hand-classified sense of kau.

⁸ The classes of BGH are represented as numerical codes, in which each
digit denotes the choice of the branch in the thesaurus. The classes
starting with '11', '12', '13', '14', and '15' are subordinate to
abstract-relations, agents-of-human-activities, human-activities,
products, and natural-objects-and-natural-phenomena, respectively.
⁹ The criterion of this hand-classification is taken from the existing
Japanese dictionaries for human use and the hand-compiled Japanese case
frame dictionary IPAL (IPA, 1987).

6.2 Examples of Intransitive/Transitive Distinction
For four Japanese verbs haru, hiku, hiraku, and kanau, Table 2 shows
examples of classifying intransitive/transitive senses by the proposed
sense classification method. Clusters of intransitive senses are
discovered with the Japanese case-class frames which contain the ga case
(the nominative case), while those of transitive senses are discovered
with the Japanese case-class frames which contain the wo case (the
accusative case) and the ni case (the dative case).
6.3 Evaluation
For 9 verbs, we made an experiment on sense classification of verbal
polysemy. We compared the result with the hand-classification and
checked whether each cluster contained examples of only one
hand-classified sense (Table 3). In the table, 'Cl.' and 'Eg.' indicate
the numbers of clusters and examples, respectively. The column 'One
Sense Cluster' means that each cluster contains examples of only one
hand-classified sense, and the sub-columns 'Cl.' and 'Eg.' list the
number of such clusters and the sum of examples contained in such
clusters, respectively. We evaluated the accuracy of the method as the
rate of the number of examples contained in one-sense clusters, as in
the 'Eg.' sub-column. The method achieved 100% accuracy for four of the
9 verbs, and 93.3% on average. The column 'Total Cl./Hand-Classif.'
indicates the ratio of the total number of clusters to the number of
hand-classified senses, corresponding to the average number of clusters
into which one hand-classified sense is divided. Its average, median,
and standard deviation are 2.46, 1.80, and 1.06, respectively.

Table 3: Evaluation of Sense Classification

No.  Japanese Verb           Total        One Sense Cluster    Hand-     Total Cl./
                             Cl.   Eg.    Cl.   Eg.            Classif.  Hand-Classif.
1    agaru(…)                                                            2.41
2    ageru(raise)                                              18        3.00
3    aku(open, iv)                                             8         1.50
4    haru(spread, iv/tv)                                       11        1.73
5    hiku(subside, pull)                                       2…        1.74
6    hiraku(open, iv/tv)     15    54           50 (92.6%)     10        1.50
7    kakeru(hang)                                              25        1.80
8    kanau(…, conform to)                                      3         4.67
9    kau(buy)                15    75     15    75 (100%)      4         3.75
     Average                                    93.3%                    2.46
(Blank cells and the Japanese Case-Class Frame column are not legible in
the source, apart from the unplaced fragments ga(NOM)/wo(ACC), wo(ACC),
and ni(DAT)/wo(ACC). The kau row and the Average row are filled in from
sections 6.1 and 6.3.)

The result of the experiment indicated that the proposed sense
classification method has achieved almost pure classification, while the
result seems a little finer than the hand-classification. This is mainly
caused by the fact that clusters which correspond to the same
hand-classified sense are separately located in the human-made
thesaurus, and it is not easy to find exactly one representative class
in the thesaurus (Utsuro, 1995). It is necessary to further merge the
clusters so that exactly one cluster corresponds to one hand-classified
sense.
7 Conclusion
This paper proposed a bilingual class-based method for sense
classification of verbal polysemy, which is based on the maximization of
the bilingual class/frame association score. It achieved fairly high
accuracy, although it is necessary to further merge the clusters so that
exactly one cluster corresponds to one hand-classified sense. We are
planning to make experiments on sense classification without bilingual
information to evaluate the effectiveness of such bilingual information.
References 
K. W. Church and P. Hanks. 1990. Word association norms, mutual
information, and lexicography. Computational Linguistics, 16(1):22-29.
I. Dagan, A. Itai, and U. Schwall. 1991. Two languages are more
informative than one. In Proc. of the 29th Annual Meeting of ACL,
pages 130-137.
IPA, (Information-technology Promotion Agency, Japan). 1987. IPA Lexicon
of the Japanese Language … (in Japanese).
Y. Matsumoto, H. Ishimoto, and T. Utsuro. 1993. Structural matching of
parallel texts. In Proc. of the 31st Annual Meeting of ACL, pages 23-30.
NLRI, (National Language Research Institute). 1993. Word List by
Semantic Principles. Syuei Syuppan. (in Japanese).
P. Resnik. 1992. WordNet and distributional analysis: A class-based
approach to lexical discovery. In Proc. of the AAAI-92 Workshop on
Statistically-Based Natural Language Programming Techniques,
pages 48-56.
S. R. Roget. 1911. Roget's Thesaurus. Crowell Co.
M. Shimizu and S. Narita, editors. 1979. Japanese-English Dictionary.
Kodansha Gakujutsu Bunko.
T. Utsuro, Y. Matsumoto, and M. Nagao. 1993. Verbal case frame
acquisition from bilingual corpora. In Proc. of the 13th IJCAI,
pages 1150-1156.
T. Utsuro. 1995. Class-based sense classification of verbal polysemy in
case frame acquisition from parallel corpora. In Proc. of the 3rd
Natural Language Processing Pacific Rim Symposium, pages 671-677.
