Integrating Collocation Features in Chinese Word Sense 
Disambiguation 
Wanyin Li  
Department of Computing 
The Hong Kong Polytechnic 
University 
Hong Hom, Kowloon, HK 
cswyli@comp.polyu.e
du.hk 
Qin Lu 
Department of Computing 
The Hong Kong Polytechnic 
University 
Hong Hom, Kowloon, HK 
csqinlu@comp.polyu.e
du.hk 
Wenjie Li  
Department of Computing 
The Hong Kong Polytechnic 
University 
Hong Hom, Kowloon, HK 
cswjli@comp.polyu.ed
u.hk 
 
Abstract 
The selection of features is critical in pro-
viding discriminative information for clas-
sifiers in Word Sense Disambiguation 
(WSD). Uninformative features will de-
grade the performance of classifiers. Based 
on the strong evidence that an ambiguous 
word expresses a unique sense in a given 
collocation, this paper reports our experi-
ments on automatic WSD using collocation 
as local features based on the corpus ex-
tracted from People’s Daily News (PDN) 
as well as the standard SENSEVAL-3 data 
set. Using the Naïve Bayes classifier as our 
core algorithm, we have implemented a 
classifier using a feature set combining 
both local collocation features and topical 
features. The average precision on the 
PDN corpus has 3.2% improvement com-
pared to 81.5% of the baseline system 
where collocation features are not consid-
ered. For the SENSEVAL-3 data, we have 
reached the precision rate of 37.6% by in-
tegrating collocation features into 
contextual features, to achieve 37% im-
provement  over  26.7% of precision in the 
baseline system. Our experiments have 
shown that collocation features can be used 
to reduce the size of human tagged corpus. 
1 Introduction 
WSD tries to resolve lexical ambiguity which 
refers to the fact that a word may have multiple 
meanings such as the word “walk” in  “Walk or 
Bike to school” and “BBC Education Walk 
Through Time”, or the Chinese word  “
� ” in  
“
� p ”(“local government”) and “�3��
,X
� ”(“He is also partly right”). WSD tries to 
automatically assign an appropriate sense to an 
occurrence of a word in a given context.  
Various approaches have been proposed to deal 
with the word sense disambiguation problem 
including rule-based approaches, knowledge or 
dictionary based approaches, corpus-based ap-
proaches, and hybrid approaches. Among these 
approaches, the supervised corpus-based ap-
proach had been applied and discussed by many 
researches ([2-8]). According to [1], the corpus-
based supervised machine learning methods are 
the most successful approaches to WSD where 
contextual features have been used mainly to 
distinguish ambiguous words in these methods. 
However, word occurrences in the context are 
too diverse to capture the right pattern, which 
means that the dimension of contextual words 
will be very large when all words in the training 
samples are used for WSD [14]. Certain 
uninformative features will weaken the dis-
criminative power of a classifier resulting in a 
lower precision rate. To narrow down the con-
text, we propose to use collocations as contex-
tual information as defined in Section 3.1.2. It is 
generally understood that the sense of an am-
biguous word is unique in a given collocation 
[19]. For example, “�>� ” means “burden” but 
not “baggage” when it appears in the collocation 
“���>� ” (“ burden of thought”). 
In this paper, we apply a classifier to combine 
the local features of collocations which contain 
the target word with other contextual features to 
discriminate the ambiguous words. The intuition 
is that when the target context captures a collo-
cation, the influence of other dimensions of
87
contextual words can be reduced or even ig-
nored. For example, in the expression “$��$
&�!�Z
��x ” (“terrorists burned down the 
gene laboratory”), the influence of contextual 
word “
�� ” (“gene”) should be reduced to work 
on the target word “�$ ” because “$��$ ” is 
a collocation whereas “�$ ” and “
�� ” are not 
collocations even though they do co-occur. Our 
intention is not to generally replace contextual 
information by collocation only. Rather, we 
would like to use collocation as an additional 
feature in WSD. We still make use of other  con-
textual features because of the following reasons. 
Firstly, contextual information is proven to be 
effective for WSD in the previous research 
works. Secondly, collocations may be independ-
ent on the training corpus and a sentence in con-
sideration may not contain any collocation. 
Thirdly, to fix the tie case such as $��$

��#A�
 (terrorists’ gene checking
),  
�$
 means human
 when presented in 
the collocation $��$
, but particle
 
in the collocation �$
�� ”.  The primary 
purpose of using collocation in WSD is to im-
prove precision rate without any sacrifices in 
recall rate. We also want to investigate whether 
the use of collocation as an additional feature 
can reduce the size of hand tagged sense corpus. 
 The rest of this paper is organized as follows. 
Section 2 summarizes the existing Word Sense 
Disambiguation techniques based on annotated 
corpora. Section 3 describes the classifier and 
the features in our proposed WSD approach. 
Section 4 describes the experiments and the 
analysis of our results. Section 5 is the conclu-
sion. 
2 Related Work 
Automating word sense disambiguation tasks 
based on annotated corpora have been proposed. 
Examples of supervised learning methods for 
WSD appear in [2-4], [7-8]. The learning algo-
rithms applied including: decision tree, decision-
list [15], neural networks [7], naïve Bayesian 
learning ([5],[11]) and maximum entropy [10]. 
Among these leaning methods, the most impor-
tant issue is what features will be used to con-
struct the classifier. It is common in WSD to use 
contextual information that can be found in the 
neighborhood of the ambiguous word in training 
data ([6], [16-18]). It is generally true that when 
words are used in the same sense, they have 
similar context and co-occurrence information 
[13]. It is also generally true that the nearby con-
text words of an ambiguous word give more ef-
fective patterns and features values than those 
far from it [12]. The existing methods consider 
features selection for context representation in-
cluding both local and topic features where local 
features refer to the information pertained only 
to the given context and topical features are sta-
tistically obtained from a training corpus. Most 
of the recent works for English corpus including 
[7] and [8], which combine both local and topi-
cal information in order to improve their per-
formance. An interesting study on feature 
selection for Chinese [10] has considered topical 
features as well as local collocational, syntactic, 
and semantic features using the maximum en-
tropy model. In Dang’s [10] work, collocational 
features refer to the local PoS information and 
bi-gram co-occurrences of words within 2 posi-
tions of the ambiguous word. A useful result 
from this work based on (about one million 
words) the tagged People’s Daily News shows 
that adding more features from richer levels of 
linguistic information such as PoS tagging 
yielded no significant improvement (less than 
1%) over using only the bi-gram co-occurrences 
information. Another similar study for Chinese 
[11] is based on the Naive Bayes classifier 
model which has taken into consideration PoS 
with position information and bi-gram templates 
in the local context. The system has a reported 
60.40% in both precision and recall based on the 
SENSEVAL-3 Chinese training data. Even 
though in both approaches, statistically signifi-
cant bi-gram co-occurrence information is used, 
they are not necessarily true collocations.  For 
example, in the express “`� 
` ,�?�  � 	�

 �4�2� �$  ,X #| �� ”, the bi-grams in 
their system are (`� 
`  ,  
` ,�?�  ,  ,�?�  
�  ,   � 	�
  ,  	�
 �4�2�  ,  �4�2�  ,X   
 ,X #|  ,  #| ��   Some bi-grams such as 
 #| ��   may have higher frequency but 
may introduce noise when considering it as fea-
tures in disambiguating the sense “human|� ” 
and “symbol|0�	� ” like in the example case of 
“"�$#|�� ”. In our system, we do not rely 
on co-occurrence information. Instead, we util-
ize true collocation information (�4�2� , �$ ) 
which fall in the window size of (-5, +5) as fea-
88
tures and the sense of “human|� ” can be de-
cided clearly using this features. The collocation 
information is a pre-prepared collocation list 
obtained from a collocation extraction system 
and verified with syntactic and semantic meth-
ods ([21], [24]).    
Yarowsky [9] used the one sense per collocation 
property as an essential ingredient for an unsu-
pervised Word-Sense Disambiguation algorithm 
to perform bootstrapping algorithm on a more 
general high-recall disambiguation. A few re-
cent research works have begun to pay attention 
to collocation features on WSD. Domminic [19] 
used three different methods called bilingual 
method, collocation method and UMLS (Unified 
Medical Language System) relation based 
method to disambiguate unsupervised English 
and German medical documents. As expected, 
the collocation method achieved a good preci-
sion around 79% in English and 82% in German 
but a very low recall which is 3% in English and 
1% in German. The low recall is due to the na-
ture of UMLS where many collocations would 
almost never occur in natural text.  To avoid this 
problem, we combine the contextual features in 
the target context with the pre-prepared colloca-
tions list to build our classifier.  
3 The Classifier With Topical Contex-
tual and Local Collocation Features 
3.1 The Feature Set 
As stated early, an important issue is what fea-
tures will be used to construct the classifier in 
WSD. Early researches have proven that using 
lexical statistical information, such as bi-gram 
co-occurrences was sufficient to produce close 
to the best results [10] for Chinese WSD. In-
stead of including bi-gram features as part of 
discrimination features, in our system, we con-
sider both topical contextual features as well as 
local collocation features. These features are 
extracted form the 60MB human sense-tagged 
People’s Daily News with segmentation infor-
mation.  
3.1.1 Topical Contextual Features 
Niu [11] proved in his experiments that Naïve 
Bayes classifier achieved best disambiguation 
accuracy with small topical context window size 
(< 10 words).  We follow their method and set 
the contextual window size as 10 in our system.  
Each of the Chinese words except the stop 
words inside the window range will be consid-
ered as one topical feature. Their frequencies are 
calculated over the entire corpus with respect to 
each sense of an ambiguous word w.  The sense 
definitions are obtained from HowNet. 
3.1.2 Local Collocation Features 
We chose collocations as the local features. A 
collocation is a recurrent and conventional fixed 
expression of words which holds syntactic and 
semantic relations [21]. Collocations can be 
classified as fully fixed collocations, fixed col-
locations, strong collocations and loose colloca-
tions. Fixed collocations means the appearance 
of one word implies the co-occurrence of an-
other one such as “	Z	��>� ” (“burden of his-
tory”), while strong collocations allows very 
limited substitution of the components, for ex-
ample, “
� L6� ” (“local college”), or ” 
� �
: ” (“local university”). The sense of ambiguous 
words can be uniquely determined in these two 
types of collocations, therefore are the colloca-
tions applied in our system. The sources of the 
collocations will be explained in Section 4.1. 
In both Niu [11] and Dang’s [10] work, topical 
features as well as the so called collocational 
features were used. However, as discussed in 
Section 2, they both used bi-gram co-
occurrences as the additional local features. 
However, bi-gram co-occurrences only indicate 
statistical significance which may not actually 
satisfy the conceptual definition of collocations. 
Thus instead of using co-occurrences of bi-
grams, we take the true bi-gram collocations 
extracted from our system and use this data to 
compare with bi-gram co-occurrences to test the 
usefulness of collocation for WSD. The local 
features in our system make use of the colloca-
tions using the template (wi, w) within a window 
size of ten (where i = ± 5). For example, “p
F�K�  
`  
�  p  Ax ” (“Government 
departments and local government commanded 
that”) fits the bi-gram collocation template (w, 
w1) with the value of (
�  p ). During the 
training and the testing processes, the counting 
of frequency value of the collocation feature will 
be increased by 1 if a collocation containing the 
ambiguous word occurs in a sentence. To have a 
good analysis on collocation features, we have 
also developed an algorithm using lonely 
adjacent bi-gram as locals features(named Sys-
89
adjacent bi-gram as locals features(named Sys-
tem A)  and another using collocation as local 
features(named System B). 
3.2 The Collocation Classifier 
We consider all the features in the features set F 
= Ft ∪Fl = {f1, f2,  … , fm } as independent, where 
Ft stands for the topical contextual features set, 
and Fl stands for the local collocation features 
set. For an ambiguous word w with n senses, let 
Sw = {ws1, ws2,  … , wsn } be the sense set. For 
the contextual features, we directly apply the 
Naïve Bayes algorithm using Add-Lambda 
Smoothing to handle unknown words: 
 
)|(log)(log)(1 sij
Ff
sisi wfpwpwscore
tj
∑
∈
+=   
(1) 
For each sense siw of an ambiguous word w:
 )( )()( wfreqwfreqwp sisi =                       (2) 
For each contextual feature fj respects to each 
sense siw of w : 
),(
),()|(
si
Ff
t
sij
sij wffreq
wffreqwfp
tt
∑
∈
=   (3) 
To integrate the local collocation feature fj ) Fl  
with respect to each sense siw  of w, we use the 
follows formula: 
)()()( 21 sisisi wscorewscorewscore •+= α  (4) 
 
where α is tuned from experiments (Section 4.5), 
score1( siw ) refers the score of the topical con-
textual features based on formula (1) and 
score2( siw ) refers the score of collocation fea-
tures with respect to the sense sjw  of w defined 
below. 
∑
∈
=
lj Ff
sjjsi wfwscore )|()(2 δ           (5) 
where δ(fj| sjw ) = 1 for fj ) Fl if the collocation 
occurs in the local context. Otherwise this term 
is set as 0. 
Finally, we choose the right skw so that 
)(maxarg sks wscores
k
=        (6) 
4 Experimental Results 
We have designed a set of experiments to com-
pare the classifier with and without the colloca-
tion features. In system A, the classifier is built 
with local bi-gram features and topical contex-
tual features. The classifier in system B is con-
structed from combining the local collocation 
features with topical features. 
4.1 Preparation the Data Set 
We have selected 20 ambiguous words from 
nouns and verbs with the sense number as 4 in 
average. The sense definition is taken from 
HowNet [22]. To show the effect of the algo-
rithm, we try to choose words with high degree 
of ambiguity, high frequency of use [23], and 
high frequency of constructing collocations. The 
selection of these 20 words is not completely 
random although within each criterion class we 
do try to pick word randomly. 
Based on the 20 words, we extracted 28,000 
sentences from the 60 MB People’s Daily News 
with segmentation information as our train-
ing/test set which is then manually sense-tagged.  
The collocation list is constructed from a 
combination of a digital collocation dictionary, a 
return result from a collocation automatic ex-
traction system [21], and a hand collection from 
the People’s Daily News. As we stated early, the 
sense of ambiguous words in the fixed colloca-
tions and strong collocations can be decided 
uniquely although they are not unique in loose 
collocations. For example, the ambiguous word 
“M6,� ” in the collocation “�,XM6,� ” may 
have both the sense of “appearance|�?� ” or 
“reputation|	�� ”. Therefore, when labeling the 
sense of collocations, we filter out the ones 
which cannot uniquely determine the sense of 
ambiguous words inside. However, this does not 
mean that loose collocations have no contribu-
tion in WSD classification. We simply reduce its 
weight when combining it with the contextual 
features compared with the fixed and strong col-
locations. The sense and collocation distribution 
over the 20 words on the training examples can 
be found in Table 1. 
Table 1. Sense and Collocation Distribution of the 20 tar-
get words in the training corpus 
Am. 
W 
T# S1 
co# 
S2 
co# 
S3 
co# 
S4 
co# 
S5 
co# 
S6 
co# 
90
�1u  31 1  1 30 10 NA    
�L=   499 479  324 18  0 0 0 NA  
4~�   944 908  129 1  1 17 10 18  0 0 NA 
/�c   409 3  2 389  171 17 0 NA   
�>�   110 3  0 101 36 6  9 NA   
4�J   41 3  0 37  6 1  0 NA   
G2�   4885 26  0 34  0 72  0 4492 1356 261 1 NA 
2�/2   3508 7  0 48  4 3194 1448 259 194 NA  
y?�   348 312  117 22 11 14  4 NA   
#|   4438 3983 721 33  10 123  37 153 123 102 23 44 5 
y
   1987 1712 723 274 10 NA    
�   83 36  14 47  4 00 NA   
/�z   995 168  108 827 513 NA    
M6,�   31 11  3 20  11 NA    

�    2725 227 1772 498 49 102 424 1898 201 NA  
^�   592 1  0 208 63 367 124 16 1 NA  
C!   1155 756  571 399 135 NA    
�n   2792 691  98 1765 113 336  29 0 NA  

�   2460 82  63 36 11 1231 474 877 103 NA  
��   125 11  0 64  0 15  3 32  4 3  0 NA 
T#: total number of sentences contain the ambiguous word 
s1- s6: sense no; co#: number of collocations in each sense 
4.2 The Effect of Collocation Features 
We recorded 6 trials with average precision over 
six-fold validation for each word. Their average 
precision for the six trials in the system A, and B 
can be found in Table 2 and Table 3. From Ta-
ble 3, regarding to precision, there are 16 words 
have improved and 4 words remained the same 
in the system B. The results from the both sys-
tem confirmed that collocation features do im-
prove the precision. Note that 4 words have the 
same precision in the two systems, which fall 
into two cases. In the first case, it can be seen 
that these words already have very high preci-
sion in the system A (over 93%) which means 
that one sense dominates all other senses. In this 
case, the additional collation information is not 
necessary. In fact, when we checked the inter-
mediate outputs, the score of the candidate 
senses of the ambiguous words contained in the 
collocations get improved. Even though, it 
would not change the result. Secondly, no collo-
cation appeared in the sentences which are 
tagged incorrectly in the system A. This is con-
firmed when we check the error files. For exam-
ple, the word “G&}” with the sense as “�!�” 
(“closeness”) appeared in 4492 examples over 
the total 4885 examples (91.9%). In the mean 
time, 99% of collocation in its collocation list 
has the same sense of “�!�” (“closeness”). 
Only one collocation “G&} ” has the sense of 
“�
” (“power”). Therefore, the collocation 
features improved the score of sense “�!�” 
which is already the highest one based on the 
contextual features.  
As can be seen from Table 3, the collocation 
features work well for the sparse data. For ex-
ample, the word “�1u ” in the training corpus 
has only one example with the sense “
q” (“hu-
man”), the other 30 examples all have the sense 
“1u)� ” (“management”). Under this situation, 
the topical contextual features failed to identify 
the right sense for the only appearance of the 
sense “
q” (“human”) in the training instance 
“�,�h,X2�N��P`��1u���
, …”. How-
ever, it can be correctly identified in the system 
B because the appearance of the collocation “�
1u���
, ”. 
To well show the effect of collocations on 
the accuracy of classifier for the task of WSD, 
we also tested both systems on SENSEVAL-3 
data set, and the result is recorded in the Table 4. 
From the difference in the relative improvement 
of both data sets, we can see that collocation 
features work well when the statistical model is 
not sufficiently built up such as from a small 
corpus like SENSEVAL-3. Actually, in this case, 
the training examples appear in the corpus only 
once or twice so that the parameters for such 
sparse training examples may not be accurate to 
forecast the test examples, which convinces us 
that collocation features are effective on han-
dling sparse training data even for unknown 
words. Fig. 1 shows the precision comparison in 
the system A, and B on SENVESAL-3. 
Table 2.  Average Precision (5/6 training, 1/6 test) of 
system A on People’s Daily News 
Amb. 
W T1 T2 T3 T4 T5 T6 
Ave. 
Prec. 
�1u   1.00 1.00 1.00 1.00 1.00 .83 .972 
�L=   .90 .97 1.00 1.00 .97 .98 .972 
4~�   .97 .96 .96 .92 .98 .96 .958 
91
/�c   .94 .94 .97 .92 .97 .97 .951 
�>�   1.00 1.00 .77 .94 .88 1.00 .932 
4�J   .83 1.00 1.00 1.00 .83 .90 .927 
G2�   .93 .95 .91 .92 .92 .92 .925 
2�/2   .93 .94 .89 .91 .89 .90 .91 
y?�   .94 .93 .86 .93 .89 .87 .903 
#|   .83 .94 .89 .90 .88 .94 .897 
y
   .86 .88 .92 .84 .82 .87 .865 
�   .92 .84 .92 .76 .84 .72 .833 
/�z   .84 .83 .88 .82 .88 .71 .827 
M6,�   .80 .60 .80 .20 1.00 1.00 .733 

�    .68 .72 .67 .77 .70 .68 .703 

�   .51 .67 .47 .60 .68 .59 .586 
�n   .70 .63 .66 .64 .64 .64 .652 
^�  .57 .74 .55 .64 .72 .67 .648 
C!   .65 .58 .66 .64 .54 .47 .58 
��   .55 .50 .45 .45 .45 .64 .507 
Total Average Precision 0.815 
Table 3. Average Precision (5/6 training, 1/6 test) of 
system B on People’s Daily News 
Amb. 
W T1 T2 T3 T4 T5 T6 
Ave. 
Prec. 
�1u  1.00 1.00 1.00 1.00 1.00 1.00 1.00 
�L=   .90 .97 1.00 1.00 .97 .98 .970 
4~�   .96 .98 .97 .96 .98 .96 .968 
/�c   .94 .94 .97 .94 .97 .98 .957 
�>�   1.00 1.00 .77 .94 .88 1.00 .931 
4�J   .83 1.00 1.00 1.00 .83 .90 .927 
G2�   .93 .95 .91 .92 .92 .92 .925 
2�/2   .92 .95 .92 .92 .91 .91 .922 
y?�   .94 .94 .86 .93 .91 .87 .908 
#|   .80 .95 .89 .93 .89 .94 .902 
y
   .87 .88 .92 .84 .83 .91 .875 
�   .84 1.00 .92 .76 .84 .77 .855 
/�z   .88 .86 .89 .84 .90 .74 .852 
M6,�   1.00 .80 .80 .20 1.00 1.00 .800 

�    .69 .72 .68 .79 .75 .72 .725 
^�   .69 .76 .73 .74 .82 .79 .755 
C!   .58 .59 .70 .67 .64 .59 .628 
�n   .68 .67 .66 .63 .65 .63 .653 

�   .65 .68 .71 .61 .70 .69 .673 
��   .60 .55 .54 .54 .54 .64 .568 
Total Average Precision 0.840 
Table 4.  Average Precision of System A & B on 
SENSEVAL-3 Data Set 
Amb. 
Word 
Total 
S 
Ave. Prec. in 
Sys A 
Ave. Prec. 
in Sys B 
�$   48 .207 .290 
��  20 .742 .742 
CD  49 .165 .325 

$  25 .325 .325 
#|   36 .260 .373 
-�0J   30 .167 .267 
"u�   30 .192 .392 
�$   36 .635 .635 
C�  57 .238 .275 

�   36 .327 .385 
^�   31 .100 .322 
J�  40 .358 .442 
CK9   40 .308 .308 
�  76 .110 .123 
0S  28 .308 .475 
0U�  30 .500 .667 
�  42 .165 .260 
5�  57 .037 .422 
��   28 .833 .103 
Total Ave. 
Precision .276 .376 
Fig. 1. The precision comparison in system A, and B based 
on SENSEVAL-3 
 
4.3 The Effect of Collocations on the Size 
of Training Corpus Needed 
Hwee [21] stated that a large-scale, human 
sense-tagged corpus is critical for a supervised 
learning approach to achieve broad coverage 
and high accuracy WSD. He conducted a thor-
ough study on the effect of training examples on 
the accuracy of supervised corpus based WSD. 
As the result showed, WSD accuracy continues 
to climb as the number of training examples in-
creases. Similarly, we have tested the system A, 
and B with the different size of training corpus 
based on the PDN corpus we prepared. Our ex-
periment results shown in Fig 2 follow the same 
fact.  The purpose we did the testing is that we 
hope to disclose the effect of collocations on the 
size of training corpus needed. From Fig 2, we 
can see by using the collocation features, the 
precision of the system B has increased slower 
along with the growth of training examples than 
the precision of the system A.  The result is rea-
sonable because with collocation feature, the 
statistical contextual information over the entire 
corpus becomes side effect. Actually, as can be 
seen from Fig. 2, after using collocation features 
92
in the system B, even we use 1/6 corpus as train-
ing, the precision is still higher than we use 5/6 
train corpus in the system A. 
Fig. 2. The precision variation respect to the size of   train-
ing corpus in system A, and B based on PDN corpus 
 
4.4 Investigation of Sense Distribution on 
the Effect of Collocation Features 
To investigate the sense distribution on the ef-
fect of collocation features, we selected the am-
biguous words with the number of sense varied 
from 2 to 6. In each level of the sense number, 
the words are selected randomly. Table 5 shows 
the effect of sense distribution on the effect of 
collocation features. From the table, we can see 
that the collocation features work well when the 
sense distribution is even for a particular am-
biguous word under which case the classifier 
may get confused. 
Table 5.  The Effect of Sense Distribution on the Effect of 
collocation Features 
Amb. 
word 
Prec. 
Wihtout 
coll 
Prec. 
With  
coll 
Sense 
# 
Sense 
Distri. 
�1u   .972 1 2 97% * 
�L=   .97 .97 4 96% * 
4~�   .957 .968 5 96% * 
/�c   .951 .957 3 95% * 
�>�   .931 .931 3 92% * 
4�J   .927 .927 3 90% * 
G2�   .925 .925 5 92% * 
2�/2   .915 .922 4 91% * 
y?�   .903 .908 3 90% * 
#|   .902 .902 6 90% * 
y
   .865 .875 2 86% o 
�   .833 .855 3 ^ 
/�z   .823 .852 2 83% o 
M6,�   .733 .8 2 ^ 

�    .706 .725 4 ^ 
^�  .65 .653 4 ^ 
C!   .618 .755 4 ^ 
�n   .582 .628 2 ^ 

�   .563 .673 4 ^ 
��   .507 .568 5 ^ 
     *: over 90% samples fall in one dominate sense 
     ^: Even distribution over all senses  
     o: 83% to 86% samples fall in one dominate sense 
4.5 The Test of α 
We have conducted a set of experiments based 
on both the PDN corpus and SENSEVLA-3 data 
to set the best value of α for the formula (4) de-
scribed in Section 3.2. The best start value of α 
is tested based on the precision rate which is 
shown in Fig. 3. It is shown from the experiment 
that α takes the start value of 0.5 for both cor-
puses.  
Fig. 3. The best value of α vs the precision rate 
 
5 Conclusion and the Future Work 
This paper reports a corpus-based Word Sense 
Disambiguation approach for Chinese word us-
ing local collocation features and topical contex-
tual features. Compared with the base-line 
systems in which a Naïve Bayes classifier is 
constructed by combining the contextual fea-
tures with the bi-gram features, the new system 
achieves 3% precision improvement in average 
in Peoples’ Daily News corpus and 10% im-
provement in SENSEVAL-3 data set. Actually, 
it works very well when disambiguating the 
sense with sparse distribution over the entire 
corpus under which the statistic calculation 
prone to identify it incorrectly. In the same time, 
because disambiguating using collocation fea-
93
tures does not need statistical calculation, it 
makes contribution to reduce the size of human 
tagged corpus needed which is critical and time 
consuming in corpus based approach.  
Because different types of collocations may 
play different roles in classifying the sense of an 
ambiguous word, we hope to extend this work 
by integrating collocations with different weight 
based on their types in the future, which may 
need a pre-processing job to categorize the col-
locations automatically. 
6 Acknowledgements 
We would like to present our thanks to the IR 
Laboratory in HIT University of China for shar-
ing their sense number definition automatically 
extracted from HowNet with us. 
References 
1. Hwee Tou Ng, Bin Wang, Yee Seng Chan. Exploiting 
Parallel Texts for Word Sense Disambiguation. ACL-03 
(2003) 
2. Black E.: An experiment in computational discrimina-
tion of English word senses. IBM Journal of Research 
and Development, v.32, n.2, (1988) 185-194 
3. Gale, W. A., Church, K. W. and Yarowsky, D.: A 
method for disambiguating word senses in a large cor-
pus. Computers and the Humanities, v.26, (1993) 415-
439 
4. Leacock, C., Towell, G. and Voorhees, E. M.: Corpus-
based statistical sense resolution. In Proceedings of the 
ARPA Human Languages Technology Workshop (1993) 
5. Leacock, C., Chodorow, M., & Miller G. A..Using Cor-
pus Statistics and WordNet Relations for Sense Identifi-
cation. Computational Linguistics, 24:1, (1998) 147–
165  
6. Schütze, H.: Automatic word sense discrimination. 
Computational Linguistics, v.24, n.1, (1998) 97-124 
7. Towell, G. and Voorhees, E. M.: Disambiguating highly 
ambiguous words. Computational Linguistics, v.24, n.1, 
(1998) 125-146 
 8. Yarowsky, D.: Decision lists for lexical ambiguity reso-
lution: Application to accent restoration in Spanish and 
French. In Proceedings of the Annual Meeting of the 
Association for Computational Linguistics, (1994) 88-
95 
9. Yarowsky, D.: Unsupervised word sense disambiguation 
rivaling supervised methods. In Proceedings of the An-
nual Meeting of the Association for Computational Lin-
guistics, (1995)189-196 
 10. Dang, H. T., Chia, C. Y., Palmer M., & Chiou, F.D., 
Simple Features for Chinese Word Sense Disambigua-
tion. In Proc. of COLING (2002) 
11. Zheng-Yu Niu, Dong-Hong Ji, Chew Lim Tan, Opti-
mizing Feature Set for Chinese Word Sense Disam-
biguation. To appear in Proceedings of the 3rd 
International Workshop on the Evaluation of Systems 
for the Semantic Analysis of Text (SENSEVAL-3). 
Barcelona, Spain (2004) 
12. Chen, Jen Nan and Jason S. Chang, A Concept-based 
Adaptive Approach to Word SenseDisambiguation, 
Proceedings of 36th Annual Meeting of the Association 
for Computational Linguistics and 17th International 
Conference on Computational linguistics. 
COLING/ACL-98 (1998) 237-243 
13.  Rigau, G., J. Atserias and E. Agirre, Combining Unsu-
pervised Lexical Knowledge Methods for Word Sense 
Disambiguation, Proceedings of joint 35th Annual 
Meeting of the Association for Computational Linguis-
tics and 8th Conference of the European Chapter of the 
Association for Computational Linguistics 
(ACL/EACL’97), Madrid, Spain (1997) 
14. Jong-Hoon Oh, and Key-Sun Choi, C02-1098.: Word 
Sense Disambiguation using Static and Dynamic Sense 
Vectors. COLING (2002) 
15. Yarowsky, D., Hierarchical Decision Lists for Word 
Sense Disambiguation. Computers and the Humanities, 
34(1-2), (2000) 179–186 
16. Agirre, E. and G. Rigau (1996) Word Sense Disam-
biguation using Conceptual Density, Proceedings of 
16th International Conference on Computational Lin-
guistics. Copenhagen, Denmark, COLING (1996) 
17. Escudero, G., L. Màrquez and G. Rigau, Boosting Ap-
plied to Word Sense Disambiguation. Proceedings of 
the 11th European Conference on Machine Learning 
(ECML 2000) Barcelona, Spain. 2000. Lecture Notes in 
Artificial Intelligence 1810. R. L. de Mántaras and E. 
Plaza (Eds.). Springer Verlag (2000) 
18. Gruber, T. R., Subject-Dependent Co-occurrence and 
Word Sense Disambiguation. Proceedings of 29th An-
nual Meeting of the Association for Computational Lin-
guistics (1991) 
19. Dominic Widdows, Stanley Peters, Scott Cederberg, 
Chiu-Ki Chan, Diana Steffen, Paul Buitelaar, Unsuper-
vised Monolingual and Bilingual Word-Sense Disam-
biguation of Medical Documents using UMLS. 
Appeared in Natural Language Processing in Biomedi-
cine,. ACL 2003 Workshop, Sapporo, Japan (2003) 9–
16 
20. Hwee Tou Ng., Getting serious about word sense dis-
ambiguation. In Proceedings of the ACL SIGLEX 
Workshop on Tagging Text with Lexical Seman-
tics:Why, What, and How? (1997) 1–7 
21. Ruifeng Xu , Qin Lu, and Yin Li, An automatic Chi-
nese Collocation Extraction Algorithm Based On Lexi-
cal Statistics. In Proceedings of the NLPKE Workshop 
(2003) 
22.  D. Dong and Q. Dong, HowNet. 
   http://www.keenage.com, (1991) 
23.  Chih-Hao Tsai, 
 http://technology.chtsai.org/wordlist/, (1995-2004) 
24. Q. Lu, Y. Li, and R. F. Xu, Improving Xtract for Chi-
nese Collocation Extraction.  Proceedings of IEEE In-
ternational Conference on Natural Language Processing 
and Knowledge Engineering, Beijing (2003) 
 
 
94
