Word Sense Disambiguation Based on Structured Semantic Space* 
Ji Donghong Huang Changning 
Department of Computer Science 
Tsinghua University 
Beijing, 100084, P. R. China 
Email: jdh@s1000e.cs.tsinghua.edu.cn 
hcn@mail.tsinghua.edu.cn 
Abstract 
In this paper, we propose a framework, structured 
semantic space, as a foundation for word sense 
disambiguation tasks, and present a strategy to 
identify the correct sense of a word in some 
context based on the space. The semantic space is 
a set of multidimensional real-valued vectors, 
which formally describe the contexts of words. 
Instead of locating all word senses in the space, 
we only make use of mono-sense words to 
outline it. We design a merging procedure to 
establish the dendrogram structure of the space 
and give a heuristic algorithm to find the nodes 
(sense clusters) corresponding to sets of 
similar senses in the dendrogram. Given a word 
in a particular context, the context activates 
some clusters in the dendrogram, based on its 
similarity to the contexts of the words in the 
clusters; the correct sense of the word can then 
be determined by comparing its definitions with 
those of the words in the clusters. 
1. Introduction 
Word sense disambiguation has long been one of 
the major concerns in natural language processing 
(e.g., Bruce et al., 1994; Choueka et al., 1985; 
Gale et al., 1993; McRoy, 1992; Yarowsky 1992, 
1994, 1995). Its aim is to identify the correct 
sense of a word in a particular context, among all of 
its senses defined in a dictionary or a thesaurus. 
Undoubtedly, effective disambiguation techniques 
are of great use in many natural language processing 
tasks, e.g., machine translation and information 
retrieval (Allen, 1995; Ng and Lee, 1996; Resnik, 
1995). 
Previous strategies for word sense 
disambiguation mainly fall into two categories: 
statistics-based methods and exemplar-based methods. 
Statistics-based methods often require large-scale 
corpora (e.g., Hirst, 1987; Luk, 1995), sense-tagged 
or not, monolingual or aligned bilingual, as training 
data to specify significant clues for each word sense. 
These methods generally suffer from the problem of 
data sparseness. Moreover, huge corpora, especially 
sense-tagged or aligned ones, are not generally 
available in all domains for all languages. 
Exemplar-based methods make use of typical 
contexts (exemplars) of a word sense, e.g., verb-noun 
collocations or adjective-noun collocations, 
and identify the correct sense of a word in a 
particular context by comparing the context with the 
exemplars (Ng and Lee, 1996). Recently, some 
learning techniques have been applied to 
cumulatively acquire exemplars from large corpora 
(Yarowsky, 1994, 1995). But ideal resources from 
which to learn exemplars are not generally available 
for every language. Moreover, the effectiveness of 
these methods in disambiguating words in large-scale 
corpora into fine-grained sense distinctions needs to 
be further investigated (Ng and Lee, 1996). 
* The work is supported by National Science Foundation of China. 
A common assumption held by both 
approaches is that neighboring words provide strong 
and consistent clues for the correct sense of a target 
word in some context. In this paper, we hold 
the same assumption, but start from a different point. 
We see the senses of all words in a particular 
language as forming a space, which we call 
semantic space. For any word of the language, each 
of its senses is regarded as a point in the space. So 
the task of disambiguating a word in a particular 
context is to locate an appropriate point in the space 
based on the context. 
Since word senses are generally 
suggested by their distributional contexts, we model 
senses with their contexts. In this paper, we 
formalize the contexts as multidimensional 
real-valued vectors, so the semantic space can be 
seen as a vector space. A similar idea of 
representing contexts with vectors has been 
proposed by Schuetze (1993), but his work 
focuses on the contexts of words, while we 
are concerned with the contexts of word senses. Furthermore, 
his formulation of contexts is based on word 
frequencies, while we formalize them with semantic 
codes given in a thesaurus and their salience with 
respect to senses. 
It seems that we should first have a large-scale 
sense-tagged corpus in order to build semantic space, 
but establishing such a corpus is obviously too time- 
consuming. To simplify it, we only try to outline the 
semantic space by locating the mono-sense words in 
the space, rather than build it completely by spotting 
all word senses in the space. 
Since we do not try to specify all word 
senses in the semantic space, for a word in a 
particular context, it may be the case that we cannot 
directly spot its correct sense in the space, because 
the space may not contain the sense at all. But we 
can locate some senses in the space which are 
similar to it according to their contexts, and based 
on their definitions given in a dictionary, we can 
determine the correct sense of the word in the 
context. 
In our implementation, we first build the 
semantic space based on the contexts of the mono- 
sense words, and structure the senses in the space as 
a dendrogram, which we call structured semantic 
space. Then we make use of a heuristic method to 
determine some nodes in the dendrogram which 
correspond to sets of similar senses, which we 
call sense clusters. Finally, given a target word in a 
particular context, some clusters in the dendrogram 
are activated by the context, and we make 
use of the definitions of the target word and the 
words¹ in the clusters to determine its correct sense 
in the context. 
The remainder of the paper is organized as 
follows: Section 2 defines the notion of semantic 
space, and discusses how to outline it by establishing 
the context vectors for mono-sense words. Section 3 
examines the structure of the semantic space, and 
introduces algorithms to merge the senses into a 
dendrogram and specify the nodes in it which 
correspond to sets of similar senses. Section 4 
discusses the disambiguation procedure based on the 
contexts. Section 5 describes some experiments and 
their results. Section 6 presents some conclusions 
and discusses future work. 
2. Semantic Space 
In general, a word may have several senses and may 
appear in several different kinds of contexts. From an 
empirical point of view, we suppose that each sense 
of a word corresponds to a particular kind of 
context in which it appears, and that the similarity 
between word senses can be measured by their 
corresponding contexts. For a particular language, we 
regard its semantic space as the set of all word 
senses of the language, with the similarity relation 
between them. 
Since word senses are in accordance with 
their contexts, we use the contexts to model word 
senses. Due to the unavailability of a large-scale 
sense-tagged corpus, we try to outline the semantic 
space by only taking into consideration the mono-sense 
words, instead of locating all word senses in 
the space. 
1 Because the senses in the semantic space are of mono-sense 
words, we don't distinguish "words" from "senses" strictly 
here. 
In order to formally represent word senses, we 
formalize the notion of context as multidimensional 
real-valued vectors. For any word, we first annotate 
its neighboring words within a certain distance in the 
corpus with all of their semantic codes in a 
thesaurus, then make use of these codes 
and their salience with respect to the word to 
formalize its contexts. Suppose w is a mono-sense 
word, and there are n occurrences of the word in a 
corpus, i.e., $w_1, w_2, \ldots, w_n$; (1) lists their 
neighboring words within a distance of d words. 
(1) 
$$\begin{array}{lcl}
a_{1,-d},\ a_{1,-(d-1)},\ \ldots,\ a_{1,-1} & w_1 & a_{1,1},\ a_{1,2},\ \ldots,\ a_{1,d} \\
a_{2,-d},\ a_{2,-(d-1)},\ \ldots,\ a_{2,-1} & w_2 & a_{2,1},\ a_{2,2},\ \ldots,\ a_{2,d} \\
\vdots & \vdots & \vdots \\
a_{n,-d},\ a_{n,-(d-1)},\ \ldots,\ a_{n,-1} & w_n & a_{n,1},\ a_{n,2},\ \ldots,\ a_{n,d}
\end{array}$$
Suppose $C_T$ is the set of all the semantic codes 
defined in a thesaurus. For any occurrence $w_i$, 
$1 \le i \le n$, let $NC_i$ be the set of all the semantic 
codes of its neighboring words which are given in the 
thesaurus. For any $c \in C_T$, we define its salience 
with respect to w, denoted as Sal(c, w), as (2). 
(2) $Sal(c, w) = \dfrac{|\{w_i \mid c \in NC_i\}|}{n}$ 
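As an illustration of (2), the following Python sketch computes the salience of every code for one word and collects the values into a single vector. It assumes that each occurrence is already given as its set of neighboring codes $NC_i$; the function name `context_vector` and the toy codes A, B, C are hypothetical and chosen only for this example.

```python
from collections import Counter


def context_vector(occurrence_codes, all_codes):
    """Salience of each thesaurus code with respect to one word.

    occurrence_codes: list of sets, one set NC_i per occurrence of the word.
    all_codes: ordered list of the thesaurus codes C_T.
    """
    n = len(occurrence_codes)
    if n == 0:
        raise ValueError("the word must occur at least once")
    counts = Counter()
    for nc in occurrence_codes:
        # each occurrence counts at most once per code
        counts.update(set(nc))
    # Sal(c, w) = (number of occurrences whose NC_i contains c) / n
    return [counts[c] / n for c in all_codes]


# Toy data (assumed): three codes, four occurrences of one word.
codes = ["A", "B", "C"]
occurrences = [{"A", "B"}, {"A"}, {"B", "C"}, {"A"}]
vector = context_vector(occurrences, codes)
```

With this toy input, code A is present around three of the four occurrences, B around two and C around one.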
So we can build a context vector for w as (3), 
denoted as $cv_w$, whose dimension is $|C_T|$. 
(3) $cv_w = \langle Sal(c_1, w), Sal(c_2, w), \ldots, Sal(c_k, w) \rangle$ 
where $k = |C_T|$. 
When building the semantic space for the Chinese 
language, we make use of the following resources: i) 
Xiandai Hanyu Cidian (1978), a Modern Chinese 
Dictionary, ii) Tongyici Cilin (Mei et al., 1983), a 
Chinese thesaurus, iii) a Chinese corpus consisting 
of 80 million Chinese characters. In the Chinese 
dictionary, 37,824 words have only one sense, 
among which only 27,034 words occur in the corpus. 
We select the 15,000 most frequent mono-sense words 
in the corpus to build the semantic space for 
Chinese. In the Chinese thesaurus, the words are 
divided into 12 major classes, 94 medium classes 
and 1428 minor classes, and each class 
is given a semantic code. We select the semantic 
codes for the minor classes to formalize the contexts 
of the words, so $k = |C_T| = 1428$. 
3. Structure of Semantic Space 
Due to the similarity/dissimilarity relation between 
word senses, those in the semantic space cannot be 
distributed in a uniform way. We suppose that the 
senses form some clusters, and the senses in each 
cluster are similar to each other. In order to find 
the clusters, we first construct a dendrogram of 
the senses based on their similarity, then make use 
of a heuristic strategy to select some appropriate 
nodes in the dendrogram which most likely 
correspond to the clusters. 
Since word senses occur in accordance with 
their contexts, we measure their 
similarity/dissimilarity by their contexts. For any two 
senses $s_1, s_2 \in S$, let $cv_1 = (x_1\ x_2\ \ldots\ x_k)$ and 
$cv_2 = (y_1\ y_2\ \ldots\ y_k)$ be 
their context vectors respectively. We define the 
distance between $s_1$ and $s_2$, denoted as $dis(s_1, s_2)$, 
based on the cosine of the angle between the two 
vectors. 
(4) $dis(s_1, s_2) = 1 - \cos(cv_1, cv_2)$ ² 
Obviously, $dis(s_1, s_2)$ is a normalized coefficient: since 
no component of a context vector is negative, its 
value ranges from 0 to 1. 
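A minimal Python sketch of the distance in (4) is given below. The function name `cosine_distance` is hypothetical, and returning 1.0 for an all-zero vector is an assumption of this sketch, since the text does not say how an empty context should be handled.

```python
import math


def cosine_distance(u, v):
    """1 - cos(u, v) for two equally long real-valued vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a vector with no active code is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Vectors sharing no code are at distance 1; parallel vectors at distance 0.
far = cosine_distance([1.0, 0.0], [0.0, 1.0])
near = cosine_distance([1.0, 2.0], [2.0, 4.0])
```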
Suppose S is the set of the mono-sense words in the 
semantic space. For any sense $s_i \in S$, we create a 
preliminary node $d_i$, and let $|d_i| = 1$, which denotes 
the number of senses related to $d_i$. Let D be 
the set of all preliminary nodes; the following is the 
algorithm to construct the dendrogram. 
2 $\cos(cv_1, cv_2) = \dfrac{\sum_{1 \le i \le k} x_i y_i}{\sqrt{\sum_{1 \le i \le k} x_i^2 \, \sum_{1 \le i \le k} y_i^2}}$ 
[Figure: a subtree with six leaves, /aishang/, /beishang/, /beitong/, 
/beiai/, /shangbei/ and /aitong/; its internal nodes carry the 
weights 0.043, 0.058, 0.067, 0.078 and 0.105.] 
Fig. 1 A subtree of the dendrogram for Chinese mono-sense words 
Algorithm 1. 
Procedure Den-construct(D) 
begin 
i) select d1 and d2 among all in D, 
whose distance is the smallest; 
ii) merge d1 and d2 into a new node d, 
and let |d| = |d1| + |d2|; 
iii) remove d1 and d2 from D, and put 
d into D; 
iv) compute the context vector of d 
based on the vectors of d1 and d2 ³; 
v) go to i) until there is only one 
node; 
end; 
Obviously, the algorithm is a bottom-up merging 
procedure. In each step, the two closest nodes are 
selected and merged into a new one. In the (n-1)th step, 
where n is the number of word senses in S, a final 
node is produced. The complexity of the algorithm 
is $O(n^3)$ when implementing it directly, but can be 
reduced to $O(n^2)$ by sorting the distances between 
all nodes in each previous step. 
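The merging procedure can be sketched in Python as follows. This is the direct cubic-time variant, not the faster one with sorted distances; the `Node` class, the function names and the three toy vectors are assumptions made for the example. The merged vector is the size-weighted average of the two merged vectors.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    vector: List[float]
    size: int = 1                  # |d|: number of senses under the node
    weight: float = 0.0            # distance between the two merged sub-nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None    # set only on preliminary (leaf) nodes


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def build_dendrogram(leaves):
    """Repeatedly merge the two closest nodes until one node remains."""
    nodes = list(leaves)
    if not nodes:
        raise ValueError("at least one preliminary node is required")
    while len(nodes) > 1:
        best = None
        # step i): find the closest pair by exhaustive comparison
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = cosine_distance(nodes[i].vector, nodes[j].vector)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = nodes[i], nodes[j]
        size = a.size + b.size
        # step iv): size-weighted average of the two context vectors
        vector = [(a.size * x + b.size * y) / size
                  for x, y in zip(a.vector, b.vector)]
        merged = Node(vector, size, d, a, b)
        # steps ii) and iii): replace the pair by the merged node
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
        nodes.append(merged)
    return nodes[0]


# Toy vectors (assumed): "a" and "b" are close, "c" is far from both.
leaves = [Node([1.0, 0.0], label="a"),
          Node([0.9, 0.1], label="b"),
          Node([0.0, 1.0], label="c")]
root = build_dendrogram(leaves)
```

In this toy run the nodes for "a" and "b" are merged first, and the resulting node is then merged with "c" at a much larger distance.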
Fig. 1 is a sub-tree of the dendrogram we build 
for Chinese. It contains six mono-sense words, 
whose English correspondences are sad, sorrowful, 
etc. In the sub-tree, we mark each non-preliminary 
node with the distance between the two merged sub- 
nodes, which we also refer to as the weight of the 
node. 
It can be proved that the distances between the 
merged nodes in earlier merging steps are smaller 
than those in later merging steps⁴. According to the 
similarity/dissimilarity relation between the senses, 
there should exist a level across the dendrogram 
such that the weights of the nodes above it are 
bigger, while the weights of the nodes below it are 
smaller; in other words, the ratio between the mean 
weight of the nodes above the level and that of the 
nodes below the level is the biggest. Furthermore we 
suppose that the nodes immediately below the level 
correspond to the clusters of similar senses. So, in 
order to find the sense clusters, we only need to 
determine the level. 
3 We call $(z_1\ z_2\ \ldots\ z_k)$ the context vector of d, where for all i, 
$1 \le i \le k$, $z_i = (|d_1| \cdot x_i + |d_2| \cdot y_i) / |d|$, with $x_i$ and $y_i$ the 
components of the context vectors of d1 and d2. 
4 This can be seen from the fact that the context vector of d is 
a linear composition of the vectors of d1 and d2. 
Unfortunately, the complexity of determining 
such a level is exponential in the number of edges in the 
dendrogram, which shows that the problem is 
hard. So we adopt a heuristic strategy to determine 
an optimal level. 
Suppose T is the dendrogram and sub_T is a 
sub-tree of T which takes the same root as T. We also 
use T and sub_T to denote the sets of 
non-preliminary nodes in T and in sub_T respectively. 
For any $d \in T$, let Wei(d) be the weight of the node d. 
We define an objective function as (5): 
(5) $Obj(sub\_T) = \dfrac{\sum_{d \in sub\_T} Wei(d) \,/\, |sub\_T|}{\sum_{d \in T - sub\_T} Wei(d) \,/\, |T - sub\_T|}$ 
where the numerator is the mean weight of the 
nodes in sub_T, while the denominator is the mean 
weight of the nodes in T-sub_T. 
In order to specify the sense clusters, we only 
need to determine a sub-tree of T which makes (5) 
get its biggest value. We adopt a depth-first search 
strategy to determine the sub-tree. Suppose v0 is the 
root of T; for any $v \in T$, we use v.L and v.R to denote 
its two sub-nodes. Let Tc be the set of all the nodes 
corresponding to the sense clusters; we can get Tc 
by calling Clustering(v0, NIL⁵), the following 
procedure. 
Algorithm 2 
Clustering(v, sub_T) 
begin 
sub_T ← sub_T + {v}; 
/* add node v to the subtree */ 
if Obj(sub_T+v.L) > Obj(sub_T) 
then Clustering(v.L, sub_T) 
/* v.L is not a sense cluster */ 
else Tc ← Tc ∪ {v.L}; 
/* v.L is a sense cluster */ 
if Obj(sub_T+v.R) > Obj(sub_T) 
then Clustering(v.R, sub_T) 
/* v.R is not a sense cluster */ 
else Tc ← Tc ∪ {v.R}; 
/* v.R is a sense cluster */ 
end; 
5 NIL is the initial value of sub_T, which indicates that 
the tree includes no nodes. 
The algorithm is a depth-first search 
procedure. Its complexity is O(n), where n is the 
number of the leaf nodes in the dendrogram, i.e., the 
number of the mono-sense words in the semantic 
space. 
When building the dendrogram for the Chinese 
semantic space, we found 726 sense clusters in the 
space. The distribution of the senses in the clusters 
is shown in Table 1. 
Number of senses    Number of clusters 
[1, 10)             92 
[10, 20)            157 
[20, 30)            297 
[30, 40)            176 
[40, 58)            4 
All:                726 
Table 1. The distribution of senses in the clusters 
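The objective function (5) and the depth-first selection of Algorithm 2 can be sketched in Python as below. Several points are assumptions of this sketch rather than statements of the paper: sub_T is treated as one shared set that grows during the recursion, a leaf child is always taken as a cluster of its own, the objective is taken as minus infinity when either node set is empty, and every internal node is assumed to have exactly two children. All names and the toy weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes fit in sets
class Node:
    weight: float = 0.0            # Wei(d); unused for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None    # set only on leaves

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def internal_nodes(root: Node) -> List[Node]:
    """Collect the non-preliminary nodes, i.e. the set written T in the text."""
    found, stack = [], [root]
    while stack:
        v = stack.pop()
        if not v.is_leaf():
            found.append(v)
            stack.extend((v.left, v.right))
    return found


def objective(sub_t: Set[Node], all_t: List[Node]) -> float:
    """Mean weight inside sub_t divided by mean weight of the other nodes."""
    rest = [d for d in all_t if d not in sub_t]
    if not sub_t or not rest:
        return float("-inf")       # assumption: undefined ratio never wins
    mean_sub = sum(d.weight for d in sub_t) / len(sub_t)
    mean_rest = sum(d.weight for d in rest) / len(rest)
    return float("inf") if mean_rest == 0.0 else mean_sub / mean_rest


def find_clusters(root: Node) -> List[Node]:
    if root.is_leaf():
        return [root]
    all_t = internal_nodes(root)
    sub_t: Set[Node] = set()
    clusters: List[Node] = []

    def clustering(v: Node) -> None:
        sub_t.add(v)               # add node v to the subtree
        for child in (v.left, v.right):
            grows = (not child.is_leaf() and
                     objective(sub_t | {child}, all_t) > objective(sub_t, all_t))
            if grows:
                clustering(child)  # child is not a sense cluster
            else:
                clusters.append(child)  # child is a sense cluster

    clustering(root)
    return clusters


def leaf_labels(node: Node) -> List[str]:
    if node.is_leaf():
        return [node.label]
    return leaf_labels(node.left) + leaf_labels(node.right)


def leaf(name: str) -> Node:
    return Node(label=name)


# Toy dendrogram (assumed weights): two loose upper merges, two tight lower ones.
tight = Node(0.1, leaf("a"), leaf("b"))
loose = Node(0.8, tight, leaf("c"))
other = Node(0.1, leaf("d"), leaf("e"))
root = Node(0.9, loose, other)
clusters = [leaf_labels(c) for c in find_clusters(root)]
```

Because this sketch recomputes both means on every test, it does more work than the O(n) bound stated for Algorithm 2; keeping running sums of the weights inside and outside sub_T would avoid that.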
4. Disambiguation Procedure 
Given a word in some context, we suppose that 
some clusters in the space are activated by the 
context, which reflects the fact that the contexts of 
the clusters are similar to the given context. But 
the given context may contain much noise, so there 
may be some activated clusters in which the senses 
are not similar to the correct sense of the word in 
the given context. Since the given 
context suggests the correct sense of the word, 
there should be clusters, among all activated ones, in 
which the senses are similar to the correct sense. 
To find these clusters, we make use of the 
definitions of the words in the Modern Chinese 
Dictionary, and determine the correct sense of the 
word in the context by measuring the similarity 
between their definitions. 
4.1 Activation 
Given a word w in some context, we consider the 
context as consisting of n words to the left of the 
word, i.e., $w_{-n}, w_{-(n-1)}, \ldots, w_{-1}$, and n words to the 
right of the word, i.e., $w_1, w_2, \ldots, w_n$. We make use of 
the semantic codes given in the Chinese thesaurus to 
create a context vector to formally model the 
context. Suppose $NC_w$ is the set of all semantic 
codes of the words in the context; then 
$cv_w = \langle x_1, x_2, \ldots, x_k \rangle$, where $x_i = 1$ if $c_i \in NC_w$, 
and $x_i = 0$ otherwise. 
Fig. 2 The distribution of $dis_1(clu_w, w)$ (X axis: distance, 
from 0 to 1; Y axis: percent). 
For any cluster clu in the space, let $cv_{clu}$ be its 
context vector. We also define its distance from w 
based on the cosine of the angle between their 
context vectors, as (6). 
(6) $dis_1(clu, w) = 1 - \cos(cv_{clu}, cv_w)$ 
We say clu is activated if $dis_1(clu, w) \le d_1$, where $d_1$ 
is a threshold. Here we do not define the activated 
cluster as the one which makes $dis_1(clu, w)$ smallest, 
because the context may contain much 
noise, and the senses in the cluster which makes 
$dis_1(clu, w)$ smallest may not be similar to the 
correct sense of the word in the context. 
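The activation test can be sketched in Python as follows, assuming that the cluster context vectors are already available and that a cluster counts as activated when its distance does not exceed the threshold. The function names, the cluster names and the toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def binary_context_vector(context_codes, all_codes):
    """x_i = 1 if code c_i occurs among the context words, else 0."""
    return [1.0 if c in context_codes else 0.0 for c in all_codes]


def activated_clusters(context_codes, cluster_vectors, all_codes, threshold):
    """Names of all clusters whose distance from the context is within the threshold."""
    cv_w = binary_context_vector(context_codes, all_codes)
    return [name for name, cv_clu in cluster_vectors.items()
            if cosine_distance(cv_clu, cv_w) <= threshold]


# Toy data (assumed): three codes and two clusters.
codes = ["A", "B", "C"]
cluster_vectors = {"clu_1": [0.8, 0.6, 0.0], "clu_2": [0.0, 0.1, 0.9]}
active = activated_clusters({"A", "B"}, cluster_vectors, codes, threshold=0.378)
```

Returning every cluster within the threshold, rather than only the nearest one, mirrors the point made above that the nearest cluster may be an effect of noise in the context.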
To estimate a reasonable value for $d_1$, we can 
compute the distance between the context vector of 
each mono-sense word occurrence in the corpus and 
the context vector of the cluster containing the word, 
then select a reasonable value for $d_1$ based on these 
distances as the threshold. Suppose CLU is the set 
of all sense clusters in the space, and O is the set of all 
occurrences of the mono-sense words in the corpus. 
For any $w \in O$, let $clu_w$ be the sense cluster containing 
its sense in the space; we compute the distances 
$dis_1(clu_w, w)$ for all $w \in O$. It should be the case that 
most values of $dis_1(clu_w, w)$ are smaller than a 
threshold, but some are bigger, even close to 1. 
This is because most contexts in which the 
mono-sense words occur contain words that are meaningful 
for the senses, while other contexts contain much 
noise, and few or even none of the words in them 
are meaningful for the senses. 
When estimating the parameter $d_1$ for the 
Chinese semantic space, we let n=5, i.e., we only 
take 5 words to the left and to the right of a word as its 
context. Fig. 2 shows the distribution of the 
values of $dis_1(clu_w, w)$, where the X axis denotes the 
distance, and the Y axis denotes the percent of the 
distances whose values are smaller than $x \in [0, 1]$ 
among all distances. We produce a function f(x) to 
model the distribution based on commonly used 
smoothing tools and locate its inflection point by 
setting f''(x)=0. Finally we get x=0.378, and let it be 
the threshold $d_1$. 
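One possible way to locate such an inflection point numerically, assuming the smoothed curve f is available as equally spaced samples, is to look for the sign change of its discrete second difference. The sketch below is only an illustration: the function name and the logistic test curve are assumptions, and it does not reproduce the smoothing tools used here.

```python
import math


def inflection_point(xs, ys):
    """First x where the second difference of ys turns from positive to non-positive.

    xs must be equally spaced; ys are the sampled values of the smoothed curve.
    Returns None if no such sign change exists.
    """
    if len(xs) != len(ys) or len(xs) < 4:
        raise ValueError("need at least four equally spaced samples")
    h = xs[1] - xs[0]
    # second[k] approximates h^2 * f''(xs[k + 1])
    second = [ys[i + 1] - 2.0 * ys[i] + ys[i - 1] for i in range(1, len(ys) - 1)]
    for k in range(len(second) - 1):
        a, b = second[k], second[k + 1]
        if a > 0.0 >= b:
            # linear interpolation of the zero crossing between two grid points
            return xs[k + 1] + h * a / (a - b)
    return None


# Assumed test curve: a logistic function whose inflection point is at x = 0.4.
xs = [i * 0.05 for i in range(21)]
ys = [1.0 / (1.0 + math.exp(-20.0 * (x - 0.4))) for x in xs]
threshold = inflection_point(xs, ys)
```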
4.2 Definition-Based Disambiguation 
Given a word w in some context c, suppose $CLU_w$ is 
the set of all the clusters in the semantic space 
activated by the context. The problem is to determine 
the correct sense of the word in the context, among 
all of its senses defined in the Modern Chinese 
Dictionary. 
The activation of the clusters in $CLU_w$ by the 
context c shows that c is similar to the 
contexts of the clusters in $CLU_w$, so there should be 
at least one cluster in $CLU_w$ in which the senses are 
similar to the correct sense of w in c. On the other 
hand, since the senses in a cluster are similar in 
meaning, their definitions in the dictionary should 
contain similar words, which can be characterized as 
holding the same semantic codes in the thesaurus. 
So the definitions of all the words in the clusters 
contain strong and meaningful information about the 
correct sense of the word in the context. 
We first construct two definition vectors to 
model the definitions of all the words in a cluster 
and the definitions of w based on the semantic codes 
of the definition words⁶, then determine the sense of 
w in the context by measuring the similarity between 
each definition of w and the definitions of all the 
words in a cluster. 
For any $clu \in CLU_w$, suppose $clu = \{w_i \mid 1 \le i \le n\}$, 
let $C_{w_i}$ be the set of all semantic codes of all the 
words in $w_i$'s definition, and let $C_T$ be defined as 
above, i.e., the set of all the semantic codes in the 
thesaurus. For any $c \in C_T$, we define its salience with 
respect to clu, denoted as sal(c, clu), as (7). 
(7) $sal(c, clu) = \dfrac{|\{w_i \mid c \in C_{w_i}\}|}{n}$ 
We call (8) the definition vector of clu, denoted as $dv_{clu}$. 
(8) $dv_{clu} = \langle sal(c_1, clu), sal(c_2, clu), \ldots, sal(c_k, clu) \rangle$ 
Suppose $S_w$ is the set of w's senses defined in the 
dictionary. For any sense $s \in S_w$, let $C_s$ be the set of all 
the semantic codes of its definition words. We call 
$dv_s = \langle x_1, x_2, \ldots, x_k \rangle$ the definition vector of s, 
where for all i, $x_i = 1$ if $c_i \in C_s$, and $x_i = 0$ otherwise. 
6 The words in the definitions are called definition words. 
We define the distance between an activated 
cluster in the semantic space and a sense of a word 
as (9), again in terms of the cosine of the angle 
between their definition vectors. 
(9) $dis_2(clu, s) = 1 - \cos(dv_{clu}, dv_s)$ 
Intuitively the distance can be seen as a measure 
of the similarity between the definitions of the 
words in the cluster and each definition of the word. 
Compared with the distance defined in (6), this 
distance is to measure the similarity between 
definitions, while the distance in (6) is to measure 
the similarity between contexts. 
Thus it is reasonable to select as the correct sense 
in the context the sense $s^*$ for which there 
exists $clu^* \in CLU_w$ such that $dis_2(clu^*, s^*)$ takes the 
smallest value (10) over all $clu \in CLU_w$ and $s \in S_w$. 
(10) $\min_{clu \in CLU_w,\ s \in S_w} dis_2(clu, s)$ 
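Putting (7) to (10) together, the selection of the sense can be sketched in Python as below. Each cluster is assumed to be given as a list of code sets, one per member word, and each sense as the code set of its definition; all names and the toy data are hypothetical.

```python
import math


def cosine_distance(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def cluster_definition_vector(word_def_codes, all_codes):
    """Share of cluster words whose definition carries each code, as in (7) and (8)."""
    n = len(word_def_codes)
    if n == 0:
        raise ValueError("a cluster must contain at least one word")
    return [sum(1 for codes in word_def_codes if c in codes) / n
            for c in all_codes]


def sense_definition_vector(sense_codes, all_codes):
    """Binary definition vector of one sense."""
    return [1.0 if c in sense_codes else 0.0 for c in all_codes]


def disambiguate(senses, activated, all_codes):
    """Return (distance, sense, cluster) for the closest sense and cluster pair.

    senses: dict mapping a sense name to the code set of its definition.
    activated: dict mapping a cluster name to a list of code sets, one per word.
    """
    best = None
    for clu_name, word_def_codes in activated.items():
        dv_clu = cluster_definition_vector(word_def_codes, all_codes)
        for sense, codes in senses.items():
            d = cosine_distance(dv_clu, sense_definition_vector(codes, all_codes))
            if best is None or d < best[0]:
                best = (d, sense, clu_name)
    return best


# Toy data (assumed): four codes, two activated clusters, two candidate senses.
codes = ["A", "B", "C", "D"]
activated = {"clu_1": [{"A", "B"}, {"A"}], "clu_2": [{"C"}, {"C", "D"}]}
senses = {"s1": {"A"}, "s2": {"D"}}
best = disambiguate(senses, activated, codes)
```

If no cluster is activated, the function returns None, which a caller would have to treat as "no decision".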
5. Experiments and Results 
In order to evaluate the application of the Chinese 
semantic space to WSD tasks, we make use of 
another Chinese lexical resource, i.e., Xiandai 
Hanyu Cihai (Zhang et al., 1994), a Chinese 
collocation dictionary. The sense distinctions in the 
dictionary are the same as those in the Modern 
Chinese Dictionary, and for each sense in the 
collocation dictionary, some words are listed as its 
collocations. We see these collocations as the 
contexts of the word senses, and evaluate our 
algorithm automatically. We randomly select 40 
ambiguous words contained in the dictionary, and 
there are altogether 1240 words listed as their 
collocations. Table 2 lists the distribution of the 
number of the sense clusters activated by these 
collocation words. 
Table 3 lists the distribution of the smallest 
distances between the word senses and the activated 
clusters, and the accuracy of the disambiguation. 
From Table 3, we can see that smaller distances 
between the senses and the activated clusters mean 
higher accuracy of disambiguation. 
Number of activated clusters    Number of collocations 
1                               420 
2                               380 
3                               250 
4                               100 
≥5                              90 
All:                            1240 
Table 2. Collocation words and the number 
of activated clusters 
Distance area    Percent(%)    Accuracy(%) 
[0.0, 0.2)       27.3          94.2 
[0.2, 0.4)       58.2          90.5 
[0.4, 0.6)       9.6           40.5 
[0.6, 1.0)       4.9           10.4 
Table 3. Distribution of distances and 
disambiguation accuracy 
In another experiment, we examine the 
ambiguous Chinese word 单薄 (/danbo/⁷). It has 
two senses: one is wearing few clothes, the 
other is thin and weak. We randomly select 100 
occurrences of the word in the corpus, and 
run our algorithm on each of them. As a 
result, 66 occurrences are tagged with the second 
sense (6 occurrences wrongly tagged), and the 
other 34 are tagged with the first sense (2 occurrences 
wrongly tagged). The overall accuracy is 92%. To 
examine the reasonableness of the result, we 
again build four context vectors based on 
semantic codes to represent the contexts of four 
groups of the occurrences: 
cv1: the context of the 60 occurrences 
correctly tagged with the second sense; 
cv2: the context of the 6 occurrences wrongly 
tagged with the second sense; 
cv3: the context of the 32 occurrences 
correctly tagged with the first sense; 
cv4: the context of the 2 occurrences wrongly 
tagged with the first sense. 
The distances between these vectors are listed in 
Table 4: 
       cv1      cv2      cv3      cv4 
cv1             0.364    0.914    0.825 
cv2    0.364             0.941    0.876 
cv3    0.914    0.941             0.320 
cv4    0.825    0.876    0.320 
Table 4. The distances between the contexts of 
the four groups 
7 The Pinyin of the word. 
From Table 4, we find that both the distance 
between cv1 and cv4 and that between cv2 and cv3 
are very high, which reflects the fact that they are 
not similar to each other. This shows that 
one main reason for tagging errors is that the 
considered contexts of the words contain too little 
meaningful information for determining the correct 
senses. 
In a third experiment, we run our algorithm 
on 100 occurrences of the ambiguous word 编辑 
(/bianji/). It also has two senses: one is editor, the 
other is to edit. We find that the tagging accuracy is very 
low. To explore the reason for the errors, we 
compute the distances between its definitions and 
those of the words in the activated clusters, and find 
that the smallest distances fall in [0.34, 0.87]. This 
shows that another main reason for the 
tagging errors is the sparseness of the clusters in the 
space. 
6. Conclusions and Future Work 
In this paper, we propose a formal resource of 
language, structured semantic space, as a 
foundation for word sense disambiguation tasks. For 
a word in some context, the context can activate 
some sense clusters in the semantic space, due to its 
similarity with the contexts of the senses in the 
clusters, and the correct sense of the word can be 
determined by comparing its definitions and those of 
the words in the clusters. 
Structured semantic space can be seen as a 
general model to deal with WSD problems, because 
it does not rely on any language-specific knowledge 
at all. For a language, we can first make use of its 
mono-sense words to outline its semantic space, and 
produce a dendrogram according to their similarity; 
then word sense disambiguation can be carried out 
based on the dendrogram and the definitions of the 
words given in a dictionary. 
As can be seen, an ideal structured semantic 
space should be homogeneous, i.e., the clusters in it 
should be well-distributed, neither too dense nor too 
sparse. If it is too dense, there may be too many 
clusters activated by a context. On the contrary, if it 
is too sparse, there may be no clusters activated by a 
context; even if there are any, it may be the case that 
the senses in the clusters are not similar to the 
correct sense of the target word. So future work 
includes how to evaluate the homogeneity of the 
semantic space, how to locate the non-homogeneous 
areas in the space, and how to make them 
homogeneous. 
Obviously, the disambiguation accuracy will be 
reduced if a cluster contains few words, because 
with few words in the cluster its definition vector 
cannot reliably reveal the similar words 
included in their definitions. But it seems to be 
impossible to ensure that every cluster contains 
enough words, with only mono-sense words taken 
into consideration when building the semantic space. 
In order to make the clusters contain more words, we 
must make use of ambiguous words. So future work 
includes how to add ambiguous words into clusters 
based on their contexts. 
Another problem concerns the length of the 
contexts to be considered. With longer contexts 
taken into consideration, there may be too many 
clusters activated. But if we consider shorter 
contexts, the meaningful information for word sense 
disambiguation may be lost. So future work also 
includes how to decide on an appropriate 
length of the contexts to be considered, while still 
capturing the meaningful information carried by the 
words outside the considered contexts. 

References 
J. Allen. 1995. Natural Language Understanding. 
The Benjamin/Cummings Publishing Company, 
Inc. 
Rebecca Bruce and Janyce Wiebe. 1994. Word sense 
disambiguation using decomposable models. In 
Proceedings of the 32nd Annual Meeting of the 
Association for Computational Linguistics, Las 
Cruces, New Mexico. 
Y. Choueka and S. Lusignan. 1985. Disambiguation 
by short contexts. Computers and the 
Humanities, 19:147-157. 
William Gale, Kenneth Ward Church, and David 
Yarowsky. 1992. Estimating upper and lower 
bounds on the performance of word-sense 
disambiguation programs. In Proceedings of the 
30th Annual Meeting of the Association for 
Computational Linguistics, Newark, Delaware. 
Graeme Hirst. 1987. Semantic Interpretation and 
the Resolution of Ambiguity. Cambridge 
University Press, Cambridge. 
Alpha K. Luk. 1995. Statistical sense disambiguation 
with relatively small corpora using dictionary 
definitions. In Proceedings of the 33rd Annual 
Meeting of the Association for Computational 
Linguistics, Cambridge, Massachusetts. 
Susan W. McRoy. 1992. Using multiple knowledge 
sources for word sense disambiguation. 
Computational Linguistics, 18(1): 1-30. 
J. J. Mei et al. 1983. TongYiCi CiLin (A Chinese 
Thesaurus), Shanghai Cishu press, Shanghai. 
Hwee Tou Ng and Hian Beng Lee. 1996. Integrating 
multiple knowledge sources to disambiguating 
word sense: an exemplar-based approach. In 
Proceedings of the 34th Annual Meeting of the 
Association for Computational Linguistics. 
P. Resnik. 1995. Disambiguating noun groupings 
with respect to WordNet senses. In Proceedings 
of the 3rd Workshop on Very Large Corpora, MIT, 
USA, 54-68. 
H. Schütze. 1993. Part-of-speech induction from 
scratch. In Proceedings of the 31st Annual 
Meeting of the Association for Computational 
Linguistics, Columbus, OH. 
Xiandai Hanyu Cidian (a Modern Chinese 
Dictionary). 1978. Shangwu Press, Beijing (in 
Chinese). 
David Yarowsky. 1992. Word sense disambiguation 
using statistical models of Roget's categories 
trained on large corpora. In Proceedings of 
COLING '92, Nantes, France, 454-460. 
David Yarowsky. 1994. Decision lists for lexical 
ambiguity resolution: Application to accent 
restoration in Spanish and French. In 
Proceedings of the 32nd Annual Meeting of the 
Association for Computational Linguistics, Las 
Cruces, New Mexico. 
David Yarowsky. 1995. Unsupervised word sense 
disambiguation rivaling supervised methods. In 
Proceedings of the 33rd Annual Meeting of the 
Association for Computational Linguistics, 
Cambridge, Massachusetts. 
Zhang et al. 1994. Xiandai Hanyu Cihai. (a Chinese 
Collocation Dictionary), Renmin Zhongguo 
Press (in Chinese). 
