Proceedings of EACL '99 
Word Sense Disambiguation in Untagged Text based on Term 
Weight Learning 
Fumiyo Fukumoto and Yoshimi Suzukit 
Department of Computer Science and Media Engineering, 
Yamanashi University 
4-3-11 Takeda, Kofu 400-8511 Japan 
{fukumoto@skye.esb, ysuzuki@windermere.alpsl.esit }.yamanashi.ac.jp 
Abstract 
This paper describes unsupervised learn- 
ing algorithm for disambiguating verbal 
word senses using term weight learning. 
In our method, collocations which char- 
acterise every sense are extracted using 
similarity-based estimation. For the re- 
sults, term weight learning is performed. 
Parameters of term weighting are then 
estimated so as to maximise the colloca- 
tions which characterise every sense and 
minimise the other collocations. The re- 
suits of experiment demonstrate the ef- 
fectiveness of the method. 
1 Introduction 
One of the major approaches to disambiguate 
word senses is supervised learning (Gale et al., 
1992), (Yarowsky, 1992), (Bruce and Janyce, 
1994), (Miller et al., 1994), (Niwa and Nitta, 
1994), (Luk, 1995), (Ng and Lee, 1996), (Wilks 
and Stevenson, 1998). However, a major obstacle 
impedes the acquisition of lexical knowledge from 
corpora, i.e. the difficulties of manually sense- 
tagging a training corpus, since this limits the ap- 
plicability of many approaches to domains where 
this hard to acquire knowledge is already avail- 
able. 
This paper describes unsupervised learning al- 
gorithm for disambiguating verbal word senses us- 
ing term weight learning. In our approach, an 
overlapping clustering algorithm based on Mutual 
information-based (Mu) term weight learning be- 
tween a verb and a noun is applied to a set of 
verbs. It is preferable that Mu is not low (Mu(x,y) 
_> 3) for a reliable statistical analysis (Church et 
al., 1991). However, this suffers from the problem 
of data sparseness, i.e. the co-occurrences which 
are used to represent every distinct senses does 
not appear in the test data. To attack this prob- 
lem, for a low Mu value, we distinguish between 
unobserved co-occurrences that are likely to oc- 
cur in a new corpus and those that are not, by 
using similarity-based estimation between two co- 
occurrences of words. For the results, term weight 
learning is performed. Parameters of term weight- 
ing are then estimated so as to maximise the col- 
locations which characterise every sense and min- 
imise the other collocations. 
In the following sections, we first define a pol- 
ysemy from the viewpoint of clustering, then de- 
scribe how to extract collocations using similarity- 
based estimation. Next, we present a clustering 
method and a method for verbal word sense dis- 
ambiguation using the result of clustering. Fi- 
nally, we report on an experiment in order to show 
the effect of the method. 
2 Polysemy in Context 
Most previous corpus-based WSD algorithms are 
based on the fact that semantically similar words 
appear in a similar context. Semantically sim- 
ilar verbs, for example, co-occur with the same 
nouns. The following sentences from the Wall 
Street Journal show polysemous usages of take. 
(sl) Coke has typically taken a minority 
stake in such ventures. 
(sl') Guber and pepers tried to buy a stake 
in mgm in 1988. 
(s2) That process of sorting out specifies is 
likely to take time. 
(s2') We spent a lot of time and money in 
building our group of stations. 
Let us consider a two-dimensional Euclidean space 
spanned by the two axes, each associated with 
stake and time, and in which take is assigned a 
vector whose value of the i-th dimension is the 
value of Mu between the verb and the noun as- 
signed to the i-th axis. Take co-occurs with the 
two nouns, while buy and spend co-occur only 
with one of the two nouns. Therefore, the dis- 
tances between take and these two verbs are large 
209 
Proceedings of EACL '99 
and the synonymy of take with them disappears• 
stake AL>buy 
takel ~- o~ take 
pend 
time 
Figure 1: The decomposition of the verb take 
In order to capture the synonymy of take with 
the two verbs correctly, one has to decompose the 
vector assigned to take into two component vec- 
tors, takel and take2, each of which corresponds 
to one of the two distinct usages of take (in Figure 
1). (we call them hypothetical verbs in the follow- 
ing). The decomposition of a vector into a set of 
its component vectors requires a proper decom- 
position of the context in which the word occurs. 
Furthermore, in a general situation, a polysemous 
verb co-occurs with a large group of nouns and 
one has to divide the group of nouns into a set of 
subgroups, each of which correctly characterises 
the context for a specific sense of the polysemous 
word. Therefore, the algorithm has to be able to 
determine when the context of a word should be 
divided and how. 
The approach proposed in this paper explic- 
itly introduces new entities, i.e. hypothetical verbs 
when an entity is judged polysemous and asso- 
ciates them with contexts which are sub-contexts 
of the context of the original entity• Our algorithm 
has two basic operations, splitting and lumping• 
Splitting means to divide a polysemous verb into 
two hypothetical verbs and lumping means to com- 
bine two hypothetical verbs to make one verb out 
of them (Fukumoto and Tsujii, 1994). 
3 Extraction of Collocations 
Given a set of verbs, vl, v2,--., v,~, the algorithm 
produces a set of semantic clusters, which are or- 
dered in the ascending order of their semantic de- 
viation values• Semantic deviation is a measure 
of the deviation of the set in an n-dimensional 
Euclidean space, where n is the number of nouns 
which co-occur with the verbs• 
In our algorithm, if vi is non-polysemous, it be- 
longs to at least one of the resultant semantic clus- 
ters. If it is polysemous, the algorithm splits it 
into several hypothetical verbs and each of them 
belongs to at least one of the clusters• Table 1 
summarises the sample result from the set {close, 
open, end}. 
Table 1: Distinct senses of the verb 'close' 
Vi n Mu(vi ,n) 
closel 
(open) 
close2 
(end) 
account 
banking 
acquisition 
book 
bottle 
announcement 
connection 
conversation 
period 
practice 
2.116 
2.026 
1.072 
4.427 
3.650 
1.692 
2.745 
4.890 
1.876 
2.564 
In Table 1, subsets 'open' and 'end' correspond to 
the distinct senses of'close'. Mu(vi,n) is the value 
of mutual information between a verb and a noun. 
If a polysemous verb is followed by a noun which 
belongs to a set of the nouns, the meaning of the 
verb within the sentence can be determined ac- 
cordingly, because a set of the nouns characterises 
one of the possible senses of the verb. 
The basic assumption of our approach is that 
a polysemous verb could not be recognised cor- 
rectly if collocations which represent every dis- 
tinct senses of a polysemous verb were not 
weighted correctly. In particular, for a low Mu 
value, we have to distinguish between those unob- 
served co-occurrences that are likely to occur in a 
new corpus and those that are not. We extracted 
these collocations which represent every distinct 
senses of a polysemous verb using similarity-based 
estimation. Let (wv, nq) and (w~i , nq) be two dif- 
ferent co-occurrence pairs. We say that wv and 
nq are semantically related if w~i and nq are se- 
mantically related and (wp, nq) and (w~i , nq) are 
semantically similar (Dagan et al., 1993). Us- 
ing the estimation, collocations are extracted and 
term weight learning is performed. Parameters 
of term weighting are then estimated so as to 
maximise the collocations which characterise ev- 
ery sense and minimise the other collocations. 
Let v be two senses, wp and wl, but not be 
judged correctly. Let N_Setl be a set of nouns 
which co-occur with both v and wp, but do not co- 
occur with wl. Let also N.Set2 be a set of nouns 
which co-occur with both v and wl, but do not 
co-occur with wp, and N-Set3 be a set of nouns 
which co-occur with v, wp and wl. Extraction 
of collocations using similarity-based estimation 
210 
Proceedings of EACL '99 
begin 
(a) for all nq E N_Sett - N_Set3 such that Mu(wp,nq) < 3 
t Extract wpi (1 < i < s) such that Mu(w~i, nq) > 3. Here, s is the number of verbs which 
co-occur with nq 
for all w;i 
if w~i exists such that Sim(wp,w'pi ) > 0 
(a-l) then parameters of Mu of(wp,nq) and (v,rtq) are set to a (1 < a) 
(a-2) else parameters of Mu of (wp,nq) and (V,nq) are set to ~ (0 </3 < 1) 
end_if 
end_for 
end_for 
(b) for all n, E g_Set3 such that Mu(wp,rt,) >_ 3 and Mu(wt,n,) > 3 
t Extract wp~ (1 < i < t) such that Mu(w~, ~) > 3. Here, t is the number of verbs which 
co-occur with n, 
for all w~i 
if w;, exists such that Sirn(wp,w'pl ) > 0 and Sirn(wt,w;i ) > 0 
then parameters of Mu of (v,n.), (wp,n.) and (wl,n.) are set to/3 (0 < /3 < 1) 
end_if 
end_for 
end_for 
end 
Figure 2: Extraction of collocations 
is shown in Figure 2 t 
In Figure 2, (a-l) is the procedure to extract 
collocations which were not weighted correctly 
and (a-2) and (b) are the procedures to extract 
other words which were not weighted correctly. 
Sim(vi, v~) in Figure 2 is the similarity value ofvl 
and v~ which is measured by the inner product of 
their normalised vectors, and is shown in formula 
(1). 
v i × ~)~ 
vi = (v~:,..-,v~) 
(1) 
{ Mu(vi,nj) ifMu(vi,nj) >_ 3 
vii = 0 otherwise (2) 
In formula (1), k is the number of nouns which 
co-occur with vi. vii is the Mu value between vl 
and nj. 
We recall that wp and nq are semantically re- 
lated if w~i and nq are semantically related and 
(wv,n q) and (w'pi,nq) are semantically similar. (a) 
' and nq are se- in Figure 2, we represent wpi 
mantically related when Mu(w~i,nq) >__ 3. Also, 
(wv,nq) and (w'pi,nq) are semantically similar if 
t For wt, we can replace wp with wt, nq 6 N_Sett - 
N_Sets with nq E N_Set, - N.Sets, and Sim(wp, w'pl) 
> 0 with Sirn(wt, w'pi) > O. 
Sim(wp, w~i ) > 0. In (a)of Figure 2, for example, 
when (wp,nq) is judged to be a collocation which 
represents every distinct senses, we set Mu values 
of (wp,nq) and (v,nq) to a x Mu(wp,nq) and a x 
Mu(v,r%), 1 < a. On the other hand, when nq 
is judged not to be a collocation which represents 
every distinct senses, we set Mu values of these 
co-occurrence pairs to fl x Mu(wp,nq) and /3 x 
Mu(v,nq), 0 < j3 < 1 2 
4 Clustering a Set of Verbs 
Given a set of verbs, VG = {vl, -- -, vm}, the algo- 
rithm produces a set of semantic clusters, which 
are sorted in ascending order of their semantic de- 
viation. The deviation value of VG, Dev(VG) is 
shown in formula (3). 
Dev(VG) 
1 E(vo ~)2 
191(~*m+7) ~=: j__: 
(3) 
/3 and 7 are ob- 
tained by least square estimation 3 . vii is the 
1 m Mu value between v{ and n i. ~ = ~-~i=lvij 
In the experiment, we set increment value of a 
and decrease value of/3 to 0.001. 
3 Using Wall Street Journal, we obtained 13 = 0.964 
and 7 = -0.495. 
211 
Proceedings of EACL '99 
is the j-th value of the centre of gravity. \[ 0 \[ = 
1 n m 2 ~i~j=l(~i vii) is the length of the centre of 
gravity. In formula (3), a set with a smaller value 
is considered semantically less deviant. 
Figure 3 shows the flow of the clustering algo- 
rithm. As shown in '(' in Figure 3, the func- 
tion Make-Inltial-Cluster-Set applies to VG 
and produces all possible pairs of verbs with 
their semantic deviation values. The result is a 
list of pairs called the ICS (Initial Cluster Set). 
The CCS (Created Cluster Set) shows the clus- 
ters which have been created so far. The func- 
tion Make-Temporary-Cluster-Set retrieves 
the clusters from the CCS which contain one of 
the verbs of Seti. The results (Set~3) are passed to 
the function Reeognition-of-Polysemy, which 
determines whether or not a verb is polysemous. 
Let v be an element included in both Seti and 
Set 3. To determine whether v has two senses wp, 
where wp is an element of Seti, and wl, where wl 
is an element of Set3, we make two clusters, as 
shown in (4) and their merged cluster, as shown 
in (5). 
{vl, wp}, {v=, wl,---, (4) 
{v, wp,---, (5) 
Here, v and wp are verbs and wl, • • -, w,~ are verbs 
or hypothetical verbs, wl, "-', wp, -.-, w,~ in (5) 
satisfy Dev(v, wi) < Dev(v,wj) (1 < i _< j < n). 
vl and v2 in (4) are new hypothetical verbs which 
correspond to two distinct senses of v. 
If v is a polysemy, but is not recognised cor- 
rectly, then Extraction-of-Collocations shown 
in Figure 2 is applied. In Extraction-of- 
Collocations, for (4) and (5), a and /3 are es- 
timated so as to satisfy (6) and (7). 
D,v(,.,,,~,,)_< O~v(,,~,,,-.-,~,,,,.-,,=n) (6) 
Dev(v2,w,,...,w,~) < Oev(v,w,,...,wp,..,,w,~) (7) 
The whole process is repeated until the newly ob- 
tained cluster, Setx, contains all the verbs in the 
input or the ICS is exhausted. 
5 Word Sense Disambiguation 
We used the result of our clustering analysis, 
which consists of pairs of collocations of a distinct 
sense of a polysemous verb and a noun. 
Let v has senses vl, v2, "--, v,~. The sense 
of a polysemous verb v is vi (1 < i < m) if 
t ~- Ej Mu(vi,ni) is largest among Ej Mu(vl,nj), 
• .. and Et~ Mu(v,~,nj). Here, t is the number of 
nouns which co-occur with v within the five-word 
distance. 
6 Experiment 
This section describes an experiment conducted 
to evaluate the performance of our method. 
6.1 Data 
The data we have used is 1989 Wall Street Jour- 
nal (WSJ) in ACL/DCI CD-ROM which consists 
of 2,878,688 occurrences of part-of-speech tagged 
words (Brill, 1992). The inflected forms of the 
same nouns and verbs are treated as single units. 
For example, 'book' and 'books' are treated as sin- 
gle units. We obtained 5,940,193 word pairs in a 
window size of 5 words, 2,743,974 different word 
pairs. From these, we selected collocations of a 
verb and a noun. 
As a test data, we used 40 sets of verbs. We 
selected at most four senses for each verb, the best 
sense, from among the set of the Collins dictionary 
and thesaurus (McLeod, 1987), is determined by 
a human judge. 
6.2 Results 
The results of the experiment are shown in Table 
2, Table 3 and Table 4. 
In Table 2, 3 and 4, every polysemous verb has 
two, three and four senses, respectively. Column 
1 in Table 2, 3 and 4 shows the test data. The 
verb v is a polysemous verb and the remains show 
these senses. For example, 'cause' of (1) in Table 
2 has two senses, 'effect' and 'produce'. 'Sentence' 
shows the number of sentences of occurrences of 
a polysemous verb, and column 4 shows their dis- 
tributions. 'v' shows the number of polysemous 
verbs in the data. W in Table 2 shows the num- 
ber of nouns which co-occur with wp and wl. v 
n W shows the number of nouns which co-occur 
with both v and W. In a similar way, W in Table 
3 and 4 shows the number of nouns which co-occur 
with wp ~ w2 and wp ~ w3, respectively. 'Correct' 
shows the performance of our method. 'Total' in 
the bottom of Table 4 shows the performance of 
40 sets of verbs. 
Table 2 shows when polysemous verbs have two 
senses, the percentage attained at 80.0%. When 
polysemous verbs have three and four senses, the 
percentage was 77.7% and 76.4%, respectively. 
This shows that there is no striking difference 
among them. Column 8 and 9 in Table 2, 3 and 
4 show the results of collocations which were ex- 
tracted by our method. 
212 
Proceedings of EACL '99 
begin 
ICS := Make-Initial-Cluster-Set(VG) 
vo = {v~ l i = 1,..., m} Its = {sal,---, Set.,,,,;-,, } 
where Setp = {vi, vj} and Setq = {vk,vt} E ICS (1 ~ p < q < m) satisfy Dev(vi, vj) < Dev(vk,vt 
for i:= 1 to ~ do 
if CCS = ¢ 
then Set 7 := Set~ i.e. Seti is stored in CCS as a newly obtained cluster 
else if Set a E CCS exists such that SeQ C Seth 
then Seti is removed from ICS and Set 7 := ¢ 
else if 
for all Seth E CCS do 
if Setl fq Set,, = ¢ 
then Set 7 := Seti i.e. Seti is stored in CCS as a newly obtained cluster 
end_if 
end_for 
else Setz := Make-Temporary-Cluster-Set( Set~,CCS) 
( Set~ := Seth E CCS such that Seti M Seta ~£ ¢ 
Set 7 := Recognltion-of-Polysemy( Seti,Set~ ) 
if Set 7 was not recognised correctly 
then for v, wp and wl, do 
Extractlon-of- C oUo cations. 
end..for 
i:=1 
end_if 
end_.if 
end_if 
end_if 
if Set 7 = VG 
then exit from the for_loop ; 
end_if 
end_.for 
end 
Figure 3: Flow of the algorithm 
Mu < 3 shows the number of nouns which satisfy 
Mu(wp,n) < 3 or Mu(wt,n) <3. 'Correct' shows 
the total number of collocations which could be 
estimated correctly. Table 2 ~ 4 show that the 
frequency of v is proportional to that of v M W. 
As a result, the larger the number of v M W is, 
the higher the percentage of correctness of collo- 
cations is. 
7 Related Work 
Unsupervised learning approaches, i.e. to de- 
termine the class membership of each object to 
be classified in a sample without using sense- 
tagged training examples of correct classifications, 
is considered to have an advantage over supervised 
learning algorithms, as it does not require costly 
hand-tagged training data. 
Schiitze and Zernik's methods avoid tagging 
each occurrence in the training corpus. Their 
methods associate each sense of a polysemous 
word with a set of its co-occurring words (Schutze, 
1992), (Zernik, 1991). Ifa word has several senses, 
then the word is associated with several different 
sets of co-occurring words, each of which corre- 
sponds to one of the senses of the word. The 
weakness of Schiitze and Zernik's method, how- 
ever, is that it solely relies on human intuition for 
identifying different senses of a word, i.e. the hu- 
man editor has to determine, by her/his intuition, 
how many senses a word has, and then identify 
the sets of co-occurring words that correspond to 
the different senses. 
213 
Proceedings of EACL '99 
Table 2: The result of disambiguation experiment(two senses) 
(6) 
\[__ 
122 
"-~cause~ e~'ect ~ 
  require a-~ 
"-Telose, open, 
~ rrect(~ 
"-'(fall, decline, win} \] 278 
"-~feel, think, sense T T 280 
{hit, attack, strike} I 250 
{leave, remain, go} \[ 183 
gcty t ~Ol 
accomplish, operate'}-- 216 
--{occur, happen, ~ 
--{order, request, arrange-'~"~ 240 
"-~ass, adopt, ~ 
274 
-'~roduce, create, gro'~~"--""2~ 
--~ush, attack, pull~ 
-~s~ve, 
223 
"-{ship, put, send} 
{stop, end, move} 
{add, append, total} 
{keep, maintain, protect} 
Total 
215(77.3 
181(72.4 
160(87.4 
349(92.3) 
~-~ Correct(%)\] 
83(77.0) 
113(86.2) 
I 169(87.5) J 
Yarowsky used an unsupervised learning pro- 
cedure to perform noun WSD (Yarowsky, 1995). 
This algorithm requires a small number of training 
examples to serve as a seed. The result shows that 
the average percentage attained was 96.1% for 12 
nouns when the training data was a 460 million 
word corpus, although Yarowsky uses only nouns 
and does not discuss distinguishing more than two 
senses of a word. 
A more recent unsupervised approach is de- 
scribed in (Pedersen and Bruce, 1997). They 
presented three unsupervised learning algorithms 
that distinguish the sense of an ambiguous word in 
untagged text, i.e. McQuitty's similarity analysis, 
Ward's minimum-variance method and the EM al- 
gorithm. These algorithms assign each instance 
of an ambiguous word to a known sense definition 
based solely on the values of automatically iden- 
tifiable features in text. Their methods are per- 
haps the most similar to our present work. They 
reported that disambiguating nouns is more suc- 
cessful rather than adjectives or verbs and the best 
result of verbs was McQuitty's method (71.8%), 
although they only tested 13 ambiguous words 
(of these, there are only 4 verbs). Furthermore, 
each has at most three senses. In future, we will 
compare our method with their methods using the 
data we used in our experiment. 
8 Conclusion 
In this study, we proposed a method for disam- 
biguating verbal word senses using term weight 
learning based on similarity-based estimation. 
The results showed that when polysemous verbs 
have two, three and four senses, the average per- 
centage attained at 80.0%, 77.7% and 76.4%, re- 
spectively. Our method assumes that nouns which 
co-occur with a polysemous verb is disambiguated 
in advance. In future, we will extend our method 
to cope with this problem and also apply our 
214 
Proceedings of EACL '99 
Nunl 
(21) 
(22) 
(23) 
(24) 
(2s) 
(26) 
(27) 
(28) 
(29) 
(30) 
Table 3: The result of disambiguation experiment(three senses) 
{catch, acquire, grab, watch} 
{complete, end, develop, fill} 
{gain, win, get, increase} 
{grow, increase, develop become} 
{operate, run, act, control} 
{rise, increase, appear, grow} 
{see, look, know, feel} 
{want, desire, search, lack} 
{lead, cause, guide, precede} 
{carry, bring, capture, behave} 
Total (3 senses) 
Sentence w__w__w__w__w__w__~ v v N HI Correct(%) Mu < 3 Correct(%) 
240 120(50.0) 447 432 180(75.0) 124 99(79.9) 21(9.0) 
199(41.0) 
365 107(29.3) 727 450 280(76.7) 240 193(80.4) 
242(66.3) 
16(4.4) 
334 47(14.0) 527 467 270(80.8) 187 152(81.4) 228(68.2) 
59(17.8) 
310 68(21.9) 903 651 241(77.7) 372 305(82.0) 132(42.5) 
11o(35.6) 
232 76(32.7) 812 651 187(80.6) 311 255(82.3) 
83(35.7) 
73(31.6) 
276 51(18.4) 711 414 198(71.7) 372 294(79.1) 
137(49.6) 88(32.0) 
318 128(40.2) 1,785 934 263(82.7) 497 414(83.4) 
162(50.9) 28(8.9~ 
267 66(24.7) 590 470 208(77.9) 198 159(80.8) 
53t19.8) 148(55.5) 
183 139(75.9) 548 456 138(75.4) 274 221(80.9) 
38(20.7) 6(3.4) 
186 142(76.3) 474 440 142(76.3) 207 167(80.7) 
39(20.9) 5(2.8) 
2,711 1,573(56.5) 2,107(77.7) 
method to not only a verb but also a noun and 
an adjective sense disambiguation to evaluate our 
method. 
Acknowledgments 
The authors would like to thank the reviewers 
for their valuable comments. This work was sup- 
ported by the Grant-in-aid for the Japan Society 
for the Promotion of Science(JSPS). 
References 
E. Brill. 1992. A simple rule-based part of speech 
tagger. In Proc. of the 3rd Conference on Ap- 
plied Natural Language Processing, pages 152- 
155. 
R. Bruce and W. Janyce. 1994. Word-sense dis- 
ambiguation using decomposable models. In 
Proc. of the 32nd Annual Meeting, pages 139- 
145. 
K. W. Church, W. Gale, P. Hanks, and D. Hindte. 
1991. Using statistics in lexical analysis. In 
Lezical acquisition: Ezploiting on-line resources 
to build a lezicon, pages 115-164. (Zernik Uri 
(ed.)), London, Lawrence Erlbaum Associates. 
I. Dagan, P. Fernando, and L. Lilian. 1993. Con- 
textual word similarity and estimation from 
sparse data. In Proc. of the 31th Annual Meet- 
ing of the ACL, pages 164-171. 
F. Fukumoto and J. Tsujii. 1994. Automatic 
recognition of verbal polysemy. In Proc. of the 
15th COLING, Kyoto, Japan, pages 762-768. 
W. K. Gale, K. W. Church, and D. Yarowsky. 
1992. A method for disambiguating word senses 
in a large corpus. In Computers and the Hu- 
manities, volume 26, pages 415-439. 
A. K. Luk. 1995. Statistical sense disambiguation 
with relatively small corpora using dictionary 
definitions. In Proc. of the 335t Annual Meeting 
of ACL, pages 181-188. 
W. T. McLeod. 1987. The new collins dictionary 
and thesaurus in one volume. London, Harper- 
Collins Publishers. 
G. Miller, C. Martin, L. Shari, L. Claudia, and 
R. G. Thomas. 1994. Using a semantic concor- 
dance for sense identification. In Proc. of the 
ARPA Workshop on Human Language Technol- 
ogy, pages 240-243. 
H. T. Ng and H. B. Lee. 1996. Integrating mul- 
tiple knowledge sources to disambiguate word 
215 
Proceedings of EACL '99 
Table 4: The result of disambiguation experiment(four senses) 
Num {v, wp, wl, w~, wa} 
(31) {develop, create, grow, improve, 187 
expand} 
(32) {face, confront, cover, lie, turn} 222 
(33) {get, become, lose, understand, 302 
catch} 
(34) {go, come, become, run, fit} 
(35) {make, create, do, get, behave} 227 
(36) {show, appear, inform, prove, 227 
expi'ess} 
(37) {take, buy, obtain, spend, bring} 246 
Sentence wp(%) v v N W Correct(%) Mu < 3 Correct(%) 
w~(%) 
117(62.5) 922 597 155(82.8) 253 218(86.1) 
34118.1 ) 412.1) 
32(17.3) 
54(24.3) 859 567 184(82.8) 178 154(86.5) 
103(46.3) 12(s.4) 
53(24.0} 
88(29.1) 762 513 229(75.8) 424 365(86.2) 98(~2.4) 
34(11.21 82(27.3) 
217 101(46.5) 732 435 145(66.8) 374 302(80.9) 
66(30.4) 
36(16.5) 
14(6.6) 
123(54.1) 783 555 178(78.4) 435 370(85.2) 28(12.3) 
58(25.5) 18(8.1) 
121(53.3) 996 560 181(79.7) 258 214(83.2) 
16(7.0) 
40(17.6) 50(22.1) 
20(8.1) 2,742 1,244 i79(72.7) 829 677(81.6) 123(5o.o) 
42(17.o} 
6i(24.9) 
7(4.81 727 459 111(76.5) 394 300(76.2) 53(36.5) 
2(1.5) 83(57.2) 
2(1.1) 746 491 151(74.0) 341 272(79.7) 
81(39.7} 8614~.1 } 
35(17.1) 
78(48.1) 798 533 123(75.9) 143 119(83.2) 13(8.o) 
43(26.5) ~8(17.4) 
(as) 
(39) 
(40) 
{hold, keep, carry, reserve, 145 
accept } 
{raise, lift, increase, create, 204 
Collect} 
{draw, attract, pull, close, 162 
write} 
Total (4 senses) 
I Tot al 
2,139 11636(76.4) 
\[ 9,706\[ \[ \[ 7,572(75.6) II I I 
sense: An examplar-based approach. In Proc. 
of the 34th Annual Meeting of ACL, pages 40- 
47. 
Y. Niwa and Y. Nitta. 1994. Co-occurrence vec- 
tors from corpora vs. distance vectors from dic- 
tionaries. In Proc. of 15th COLING, Kyoto, 
Japan, pages 304-309. 
T. Pedersen and R. Bruce. 1997. Distinguishing 
word senses in untagged text. In Proc. of the 
2nd Conference on Empirical Methods in Natu- 
ral Language Processing, pages 197-207. 
H. Schutze. 1992. Dimensions of meaning. In 
Proc. of Supercomputing, pages 787-796. 
Y. Wilks and M. Stevenson. 1998. Word sense dis- 
ambiguation using optimised combinations of 
knowledge sources. In Proe. of the COLING- 
ACL'98, pages 1398-1402. 
D. Yarowsky. 1992. Word sense disambiguation 
using statistical models of roget's categories 
trained on large corpora. In Proc. of the l$th 
COLING, pages 454--460. 
D. Yarowsky. 1995. Unsupervised word sense dis- 
ambiguation rivaling supervised methods. In 
Proc. of the 33rd Annual Meeting of the ACL, 
pages 189-196. 
U. Zernik. 1991. Trainl vs. train2: Tagging 
word senses in corpus. In Lexical acquisi- 
tion: Exploiting on-line resources to build a lex- 
icon, pages 91-112. Uri Zernik(Ed.), London, 
Lawrence Erlbaum Associates. 
216 
