Using Semantic Similarity to Acquire Cooccurrence Restrictions from Corpora 
Antonio Sanfilippo 
SHARP Laboratories of Europe 
Oxford Science Park 
Oxford OX4 4GA, UK 
antonio@sharp.co.uk
Abstract 
We describe a method for acquiring semantic 
cooccurrence restrictions for tuples of syntacti- 
cally related words (e.g. verb-object pairs) from 
text corpora automatically. This method uses the 
notion of semantic similarity to assign a sense 
from a dictionary database (e.g. WordNet) to 
ambiguous words occurring in a syntactic de- 
pendency. Semantic similarity is also used to 
merge disambiguated word tuples into classes of 
cooccurrence restrictions. This encoding makes 
it possible to reduce subsequent disambiguation 
events to simple table lookups. 
1 Introduction 
Although the assessment of semantic similarity using a 
dictionary database as knowledge source has been rec- 
ognized as providing significant cues for word clustering 
(Resnik 1995b) and the determination of lexical cohe- 
sion (Morris & Hirst, 1991), its relevance for word dis- 
ambiguation in running text remains relatively unex- 
plored. The goal of this paper is to investigate ways in 
which semantic similarity can be used to address the 
disambiguation of syntactic collocates with specific 
reference to the automatic acquisition of semantic cooc- 
currence restrictions from text corpora. 
A variety of methods have been proposed to rate 
words for semantic similarity with reference to an ex- 
isting word sense bank. In Rada et al. (1989), semantic 
similarity is evaluated as the shortest path connecting the 
word senses being compared in a hierarchically struc- 
tured thesaurus. Kozima & Furugori (1993) measure
conceptual distance by spreading activation on a seman- 
tic network derived from LDOCE. Resnik (1995a) de- 
fines the semantic similarity between two words as the 
entropy value of the most informative concept subsum- 
ing the two words in a hierarchically structured thesau- 
rus. A comparative assessment of these methods falls 
outside the scope of this paper as the approach to disam- 
biguation we propose is in principle compatible with 
virtually any treatment of semantic similarity. Rather, 
our objective is to show that given a reliable calculation 
of semantic similarity, good results can be obtained in 
the disambiguation of words in context. In the work 
described here, Resnik's approach was used. 
Following Resnik, semantic similarity is assessed with 
reference to the WordNet lexical database (Miller, 1990) 
where word senses are hierarchically structured. For 
example, (all senses of) the nouns clerk and salesperson 
in WordNet are connected to the first sense of the nouns 
employee, worker, person so as to indicate that clerk and 
salesperson are a kind of employee which is a kind of 
worker which in turn is a kind of person. In this case, the 
semantic similarity between the words clerk and sales- 
person would correspond to the entropy value of em- 
ployee which is the most informative (i.e. most specific) 
concept shared by the two words. Illustrative extracts of 
WordNet with specific reference to the examples used 
throughout the paper are provided in table 1. 
The information content (or entropy) of a concept c ---
which in WordNet corresponds to a set of synonyms such as
fire_v_4, dismiss_v_4, terminate_v_4, sack_v_2 --- is for-
mally defined as -log p(c) (Abramson, 1963:6-13). The
probability of a concept c is obtained for each choice of
text corpus or corpora collection K by dividing the fre-
quency of c in K by the total number of words W ob-
served in K which have the same part of speech p as the
word senses in c:
(1) prob(cp) = freq(cp) / Wp
The frequency of a concept is calculated by counting 
the occurrences of all words which are potential in- 
stances of (i.e. subsumed by) the concept. These include 
words which have the same orthography and part of 
speech as the synonyms defining the concept as well as 
the concept's superordinates. Each time a word Wp is
encountered in K, the count of each concept cp sub-
suming Wp (in any of its senses) is increased by one:
(2) freq(cp) = Σ count(Wp), summed over all Wp
    such that cp ∈ {x | sub(x, Wp)}
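As an illustration, the counting scheme in (1) and (2) can be sketched in Python over a toy taxonomy; the concept names, hierarchy, and counts below are invented for the example rather than drawn from WordNet:

```python
from math import log2

# Toy taxonomy: each concept maps to its parent (None = root).
# Illustrative fragment only; real counts would come from WordNet.
parents = {"person": None, "worker": "person", "employee": "worker",
           "clerk": "employee", "salesperson": "employee"}

def subsumers(concept):
    """All concepts subsuming `concept`, including itself."""
    while concept is not None:
        yield concept
        concept = parents[concept]

def concept_freqs(word_counts):
    """Equation (2): each occurrence of a word increments every
    concept that subsumes (any sense of) that word."""
    freqs = {c: 0 for c in parents}
    for word, n in word_counts.items():
        for c in subsumers(word):
            freqs[c] += n
    return freqs

def info_content(freqs, concept, total):
    """Equation (1) plus entropy: -log2(freq(c) / W)."""
    return -log2(freqs[concept] / total)

counts = {"clerk": 10, "salesperson": 5, "worker": 25}
freqs = concept_freqs(counts)
total = sum(counts.values())   # W: all observed nouns
```

Note that counts propagate upward, so a general concept like person accumulates the counts of everything below it and receives a correspondingly low information content.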
The semantic similarity between two words W1p, W2p
is expressed as the entropy value of the most informative
concept cp which subsumes both W1p and W2p, as
shown in (3).
(3) sim(W1p, W2p) = max [- log p(cp)],
    for cp ∈ {x | sub(x, W1p) ∧ sub(x, W2p)}
The specific senses of W1p, W2p under which semantic
similarity holds are determined with respect to the sub-
sumption relation linking cp with W1p, W2p. Suppose for
example that in calculating the semantic similarity of
the two verbs fire, dismiss using the WordNet lexical
database we find that the most informative subsuming
concept is represented by the synonym set containing the
word sense remove_v_2. We will then know that the
senses for fire, dismiss under which the similarity holds
are fire_v_4 and dismiss_v_4 as these are the only in-
stances of the verbs fire and dismiss subsumed by re-
move_v_2 in the WordNet hierarchy.
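Equation (3), together with the sense identification it licenses, can be sketched as follows; the sense inventory and concept probabilities below are invented for the example (in a real system they would be derived from WordNet counts as in (1)-(2)):

```python
from math import log2

# Toy sense inventory: word -> {sense: set of subsuming concepts}.
# Concept probabilities are made up for illustration.
senses = {
    "fire":    {"fire_v_4": {"remove_v_2", "verb"},
                "fire_v_2": {"shoot_v_1", "verb"}},
    "dismiss": {"dismiss_v_4": {"remove_v_2", "verb"}},
}
prob = {"remove_v_2": 0.01, "shoot_v_1": 0.02, "verb": 1.0}

def most_informative_subsumer(w1, w2):
    """Equation (3): maximise -log p(c) over shared subsumers,
    and record which senses the winning concept links."""
    best = (0.0, None, None, None)
    for s1, sub1 in senses[w1].items():
        for s2, sub2 in senses[w2].items():
            for c in sub1 & sub2:
                ic = -log2(prob[c])
                if ic > best[0]:
                    best = (ic, c, s1, s2)
    return best

score, concept, s1, s2 = most_informative_subsumer("fire", "dismiss")
```

Because the most specific shared concept here is remove_v_2, the senses returned are precisely the ones it subsumes, mirroring the fire_v_4 / dismiss_v_4 resolution in the text.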
We propose to use semantic similarity to disambiguate 
syntactic collocates and to merge disambiguated collo- 
cates into classes of cooccurrence restrictions. Disam- 
biguation of syntactic collocates results from intersect- 
ing pairs consisting of (i) a cluster containing all senses 
of a word collocate W1 having appropriate syntactic 
usage, and (ii) a cluster of semantically similar word
senses related to W1 by the same syntactic dependency, 
e.g.: 
(4) IN: < {fire_v_2/3/4/6/7/8},
        {clerk_n_1/2, employee_n_1} >
      < {fire_v_2/3/4/6/7/8},
        {gun_n_1, rocket_n_1} >
      < {hire_v_3, recruit_v_2},
        {clerk_n_1/2} >
      < {dismiss_v_4, fire_v_4},
        {clerk_n_1/2} >
    OUT: < {fire_v_4}, {clerk_n_1/2} >
The results of distinct disambiguation events are merged 
into pairs of semantically compatible word clusters using 
the notion of semantic similarity. 
2 Extraction of Syntactic Word Collocates from Corpora
First, all instances of the syntactic dependency pairs 
under consideration (e.g. verb-object, verb-subject, ad- 
jective-noun) are extracted from a collection of text 
corpora using a parser. In performing this task, only the 
most important words (e.g. heads of immediate constitu- 
ents) are chosen. The chosen words are also lemmatized. 
For example, the extraction of verb-object collocates
from a text fragment such as have certainly hired the
best financial analysts in the area would yield the pair
< hire, analyst >.
The extracted pairs are sorted according to the syntac- 
tic dependency involved (e.g. verb-object). All pairs 
which involve the same dependency and share one word 
collocate are then merged. Each new pair consists of a 
unique associating word and a set of associated words 
containing all "statistically relevant" words (see below) 
which are related to the associating word by the same 
syntactic dependency, e.g. 
(5) IN: < fire_v, gun_n >
      < fire_v, rocket_n >
      < fire_v, employee_n >
      < fire_v, clerk_n >
      < fire_v, hymn_n >
      < fire_v, rate_n >
    OUT: < fire_v,
        {gun_n, rocket_n, employee_n, clerk_n} >
    IN: < fire_v, employee_n >
      < dismiss_v, employee_n >
      < hire_v, employee_n >
      < recruit_v, employee_n >
      < attract_v, employee_n >
      < be_v, employee_n >
      < make_v, employee_n >
      < affect_v, employee_n >
    OUT: < {fire_v, dismiss_v, hire_v, recruit_v},
        employee_n >
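The merging step in (5) is a straightforward grouping operation; a Python sketch (the function and its keyword argument are illustrative, not part of the paper's implementation):

```python
from collections import defaultdict

def merge_pairs(pairs, key_on="head"):
    """Merge dependency pairs sharing one collocate into
    <associating word, set of associated words> entries."""
    merged = defaultdict(set)
    for head, dep in pairs:
        if key_on == "head":
            merged[head].add(dep)
        else:
            merged[dep].add(head)
    return dict(merged)

vo_pairs = [("fire_v", "gun_n"), ("fire_v", "rocket_n"),
            ("fire_v", "employee_n"), ("fire_v", "clerk_n"),
            ("hire_v", "employee_n"), ("recruit_v", "employee_n")]

by_verb = merge_pairs(vo_pairs, key_on="head")    # verb as AING
by_object = merge_pairs(vo_pairs, key_on="dep")   # object as AING
```

The same extracted pairs thus yield both orientations of (5), with either the head or the dependent acting as the associating word.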
The statistical relevance of associated words is de- 
fined with reference to their conditional probability. For 
example, consider the equations in (6) where the nu- 
meric values express the (conditional) probability of 
occurrence in some corpus for each verb in (5) given the 
noun employee. 
(6) prob(fire_v | employee_n) = .3
    prob(dismiss_v | employee_n) = .28
    prob(hire_v | employee_n) = .33
    prob(recruit_v | employee_n) = .22
    prob(attract_v | employee_n) = .02
    prob(be_v | employee_n) = .002
    prob(make_v | employee_n) = .005
    prob(affect_v | employee_n) = .01
These conditional probabilities are obtained by dividing 
the number of occurrences of the verb with employee by 
the total number of occurrences of the verb with refer- 
ence to the text corpus under consideration, as indicated 
in (7). 
(7) prob(W1 | W2) = count(W1, W2) / count(W1)
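The ratio in (7) can be computed directly from a list of extracted verb-object tokens; the tiny hand-made corpus below is invented for illustration:

```python
from collections import Counter

def cond_probs(pairs, noun):
    """Equation (7): count(V, N) / count(V) for each verb V
    occurring with direct object N."""
    verb_total = Counter(v for v, _ in pairs)
    with_noun = Counter(v for v, n in pairs if n == noun)
    return {v: with_noun[v] / verb_total[v] for v in with_noun}

# Tiny hand-made corpus of verb-object tokens (illustrative).
tokens = ([("fire_v", "employee_n")] * 3 + [("fire_v", "gun_n")] * 7 +
          [("hire_v", "employee_n")] + [("hire_v", "consultant_n")])

probs = cond_probs(tokens, "employee_n")
```

Here fire occurs 10 times overall, 3 of them with employee, giving .3 in the style of (6).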
Inclusion in the set of statistically relevant associated words
is established with reference to a threshold T1 which can be
either selected manually or determined automatically as the
most ubiquitous probability value for each choice of associ-
ating word. For example, the threshold T1 for the selection
of verbs taking the noun employee as direct object with
reference to the conditional probabilities in (6) can be
calculated as follows. First, all probabilities in (6) are
distributed over a ten-bin template, where each bin is to
receive progressively larger values starting from a fixed
lowest point greater than 0, e.g.:
83 
y rom_~._ 02 - _L. _L__3. 
To 1 2 3 4 
3 
Values -- -- 28 33 
I i 22 
4\[ 5 I 6 7 ~.98 - 
~F~"-T' -7 " " 
Then one of the values from the bin containing most 
elements (e.g. the lowest) is chosen as the threshold. 
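The printed bin table is not fully recoverable here, so the exact procedure is open to interpretation. One reading that reproduces the outcome of the example (the four frequent verbs survive; attract, be, make, and affect do not) treats the most populated bin as the mass of unexceptional low values and keeps everything above it. A Python sketch under that assumption:

```python
def relevance_threshold(values, bins=10):
    """Distribute probabilities over `bins` equal-width bins
    spanning the observed range and return the top value of the
    most populated bin; values above it count as statistically
    relevant. One possible reading of the paper's procedure."""
    values = list(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0    # guard: all values equal
    binned = [[] for _ in range(bins)]
    for v in values:
        binned[min(int((v - lo) / width), bins - 1)].append(v)
    fullest = max(binned, key=len)
    return max(fullest)

probs = {"fire_v": .3, "dismiss_v": .28, "hire_v": .33,
         "recruit_v": .22, "attract_v": .02, "be_v": .002,
         "make_v": .005, "affect_v": .01}
t1 = relevance_threshold(probs.values())
relevant = {w for w, p in probs.items() if p > t1}
```

With the values in (6), the four incidental probabilities pile up in the lowest bin and the threshold falls at .02, which matches the intended selection.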
The exclusion of collocates which are not statistically 
relevant in the sense specified above makes it possible to 
avoid interference from collocations which do not pro- 
vide sufficiently specific exemplifications of word us- 
age. 
3 Word Clustering and Sense Expansion 
Each pair of syntactic collocates at this stage consists of 
either 
• an associating head word (AING) and a set of de-
pendent associated words (AED), e.g.
< AING: fire_v,
  AED: {gun_n, rocket_n, employee_n, clerk_n} >
• or an associating dependent word (AING) and a set
of associated head words (AED), e.g.
< AED: {fire_v, dismiss_v, hire_v, recruit_v},
  AING: employee_n >
The next step consists in partitioning the set of associ- 
ated words into clusters of semantically congruent word 
senses. This is done in three stages. 
1. Form all possible unique word pairs with non-identical
members out of each associated word set, e.g.
IN: {fire, dismiss, hire, recruit}
OUT: {fire-dismiss, fire-hire, fire-recruit,
      dismiss-hire, dismiss-recruit,
      hire-recruit}
IN: {gun, rocket, employee, clerk}
OUT: {gun-rocket, gun-employee,
      gun-clerk, rocket-employee,
      rocket-clerk, employee-clerk}
2. Find the semantic similarity (expressed as a numeric
value) for each such pair, specifying the senses with
respect to which the similarity holds (if any), e.g.
IN: {fire-dismiss, fire-hire, fire-recruit,
     dismiss-hire, dismiss-recruit, hire-recruit}
OUT: {sim(fire_v_4, dismiss_v_4) = 6.124,
      sim(fire, hire) = 0,
      sim(fire, recruit) = 0,
      sim(dismiss, hire) = 0,
      sim(dismiss, recruit) = 0,
      sim(hire_v_3, recruit_v_2) = 3.307}
IN: {gun-rocket, gun-employee,
     gun-clerk, rocket-employee,
     rocket-clerk, employee-clerk}
OUT: {sim(gun_n_1, rocket_n_1) = 5.008,
      sim(gun_n_1/2/3, employee_n_1) = 1.415,
      sim(gun_n_1/2/3, clerk_n_1/2) = 1.415,
      sim(rocket_n_3, employee_n_1) = 2.255,
      sim(rocket_n_3, clerk_n_1/2) = 2.255,
      sim(employee_n_1, clerk_n_1/2) = 4.144}
The assessment of semantic similarity and the ensuing 
word sense specification are carried out using Resnik's 
approach (see section 1). 
3. Fix the threshold for membership into clusters of se- 
mantically congruent word senses (either manually or by 
calculation of the most ubiquitous semantic similarity 
value) and generate such clusters. For example, assum- 
ing a threshold value of 3, we will have: 
IN: {sim(fire_v_4, dismiss_v_4) = 6.124,
     sim(fire, hire) = 0,
     sim(fire, recruit) = 0,
     sim(dismiss, hire) = 0,
     sim(dismiss, recruit) = 0,
     sim(hire_v_3, recruit_v_2) = 3.307}
OUT: {fire_v_4, dismiss_v_4}
     {hire_v_3, recruit_v_2}
IN: {sim(gun_n_1, rocket_n_1) = 5.008,
     sim(gun_n_1/2/3, employee_n_1) = 1.415,
     sim(gun_n_1/2/3, clerk_n_1/2) = 1.415,
     sim(rocket_n_3, employee_n_1) = 2.255,
     sim(rocket_n_3, clerk_n_1/2) = 2.255,
     sim(employee_n_1, clerk_n_1/2) = 4.144}
OUT: {clerk_n_1/2, employee_n_1}
     {gun_n_1, rocket_n_1}
Once associated words have been partitioned into se- 
mantically congruent clusters, new sets of collocations 
are generated as shown in (8) by 
• pairing each cluster of semantically congruent asso- 
ciated words with its associating word, and 
• expanding the associating word into all of its possi- 
ble senses. 
At this stage, all word senses which are syntactically 
incompatible with the original input words are removed. 
For example, the intransitive verb senses fire_v_1 and
fire_v_5 (see table 1) are eliminated since the occurrence
of fire in the input collocation which we are seeking to 
disambiguate relates to the transitive use of the verb. 
Note that the noun employee has only one sense in 
WordNet (see table 1); therefore, employee has a single 
expansion when used as an associating word. 
(8) IN: < AED: { {hire_v_3, recruit_v_2},
              {dismiss_v_4, fire_v_4} },
         AING: employee_n >
    OUT: < {hire_v_3, recruit_v_2}, {employee_n_1} >
       < {dismiss_v_4, fire_v_4}, {employee_n_1} >
    IN: < AING: fire_v,
        AED: { {clerk_n_1, clerk_n_2, employee_n_1},
               {gun_n_1, rocket_n_1} } >
    OUT: < {fire_v_2/3/4/6/7/8},
         {clerk_n_1/2, employee_n_1} >
       < {fire_v_2/3/4/6/7/8},
         {gun_n_1, rocket_n_1} >
4 Disambiguating the "Associating" 
Word and Merging Disambiguated 
Collocations 
The disambiguation of the associating word is performed 
by intersecting correspondent subsets across pairs of the 
newly generated collocations. In the case of verb-object
pairs, for example, the subsets of these new sets con- 
taining verbs are intersected and likewise the subsets 
containing objects are intersected. The output comprises 
a new set which is non-empty if the two sets have one or 
more common members in both the verb and object sub- 
sets. For the specific example of newly expanded collo- 
cations given in (8), there is only one pairwise intersec- 
tion producing a non-empty result, as shown in (9).
(9) IN: < {fire_v_2/3/4/6/7/8},
        {clerk_n_1/2, employee_n_1} >
      < {dismiss_v_4, fire_v_4},
        {employee_n_1} >
    OUT: < {fire_v_4}, {employee_n_1} >
All other pairwise intersections are empty as there are no 
verbs and objects common to both sets of each pairwise 
combination. 
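The pairwise intersection described here can be sketched in a few lines of Python; the expanded collocations below abridge (8), and the function name is an assumption for illustration:

```python
def intersect_collocations(pairs):
    """Disambiguate the associating word by intersecting verb
    subsets and object subsets across all pairs of expanded
    collocations; keep only results non-empty on both sides."""
    out = []
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            verbs = pairs[i][0] & pairs[j][0]
            objs = pairs[i][1] & pairs[j][1]
            if verbs and objs:
                out.append((verbs, objs))
    return out

# (verb senses, object senses) tuples abridged from example (8).
expanded = [
    ({"fire_v_2", "fire_v_3", "fire_v_4", "fire_v_6"},
     {"clerk_n_1", "clerk_n_2", "employee_n_1"}),
    ({"dismiss_v_4", "fire_v_4"}, {"employee_n_1"}),
    ({"fire_v_2", "fire_v_3", "fire_v_4", "fire_v_6"},
     {"gun_n_1", "rocket_n_1"}),
]
resolved = intersect_collocations(expanded)
```

Only the first two collocations share members on both the verb side (fire_v_4) and the object side (employee_n_1), reproducing (9).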
The result of distinct disambiguation events can be 
merged into pairs of semantically compatible word 
clusters using the notion of semantic similarity. For 
example, the verbs and nouns of all the input pairs in 
(10) are closely related in meaning and can therefore be 
merged into a single pair. 
(10) IN: < fire_v_4, employee_n_1 >
       < dismiss_v_4, clerk_n_1 >
       < give_the_axe_v_1, salesclerk_n_1 >
       < sack_v_2, shop_clerk_n_1 >
       < terminate_v_4, clerk_n_2 >
     OUT: < {fire_v_4, dismiss_v_4, sack_v_2,
           give_the_axe_v_1, terminate_v_4},
         {clerk_n_1, employee_n_1, clerk_n_2,
           salesclerk_n_1, shop_clerk_n_1} >
5 Storing Results 
Pairs of semantically congruent word sense clusters such 
as the one shown in the output of (10) are stored as 
cooccurrence restrictions so that future disambiguation 
events involving any head-dependent word sense pair in 
them can be reduced to simple table lookups. 
The storage procedure is structured in three phases. 
First, each cluster of word senses in each pair is assigned 
a unique code consisting of an id number and the syn- 
tactic dependency involved: 
(11) < {102_VO, fire_v_4, dismiss_v_4, sack_v_2,
        give_the_axe_v_1, send_away_v_2,
        force_out_v_2, terminate_v_4},
       {102_OV, clerk_n_1/2, employee_n_1,
        salesclerk_n_1, shop_clerk_n_1} >
     < {103_VO, lease_v_4, rent_v_3, hire_v_3,
        charter_v_3, engage_v_6, take_v_22,
        recruit_v_2},
       {102_OV, clerk_n_1/2, employee_n_1,
        salesclerk_n_1, shop_clerk_n_1} >
     < {104_VO, shoot_v_3, fire_v_1, ...},
       {104_OV, gun_n_1, rocket_n_1, ...} >
Then, the cluster codes in each pair are stored in a cooc-
currence restriction table:

  102_VO , 102_OV
  103_VO , 102_OV
  104_VO , 104_OV
Finally, each word sense is stored along with its associ-
ated cluster code(s):

  fire_v_4      102_VO
  dismiss_v_4   102_VO
  clerk_n_1/2   102_OV
  employee_n_1  102_OV
  hire_v_3      103_VO
  recruit_v_2   103_VO
  shoot_v_3     104_VO
  fire_v_1      104_VO
  gun_n_1       104_OV
  rocket_n_1    104_OV
The disambiguation of a pair of syntactically related
words such as the pair < fire_v, employee_n > can be car-
ried out by
• retrieving all the cluster codes for each word in the
pair and creating all possible pairwise combinations, e.g.
IN: < fire_v, employee_n >
OUT: < 102_VO, 102_OV >
     < 104_VO, 102_OV >
• eliminating code pairs which are not in the table of
cooccurrence restrictions for cluster codes, e.g.
IN: < 102_VO, 102_OV >
    < 104_VO, 102_OV >
OUT: < 102_VO, 102_OV >
• using the resolved cluster code pairs to retrieve the
appropriate senses of the input words from previ-
ously stored pairs of word senses and cluster codes
such as those in the table above, e.g.
IN: < [fire_v, 102_VO],
      [employee_n, 102_OV] >
OUT: < fire_v_4, employee_n_1 >
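This table-lookup procedure can be sketched in Python; the three data structures below correspond to the stored tables of section 5, with contents abridged and therefore illustrative rather than exhaustive:

```python
# Illustrative versions of the three stored tables.
allowed = {("102_VO", "102_OV"), ("103_VO", "102_OV"),
           ("104_VO", "104_OV")}               # cooccurrence table
codes = {"fire_v": ["102_VO", "104_VO"],       # word -> cluster codes
         "employee_n": ["102_OV"],
         "gun_n": ["104_OV"]}
sense_of = {("fire_v", "102_VO"): "fire_v_4",  # (word, code) -> sense
            ("fire_v", "104_VO"): "fire_v_1",
            ("employee_n", "102_OV"): "employee_n_1",
            ("gun_n", "104_OV"): "gun_n_1"}

def lookup_disambiguate(w1, w2):
    """Disambiguate a head-dependent pair by filtering all code
    combinations through the cooccurrence restriction table."""
    return [(sense_of[w1, c1], sense_of[w2, c2])
            for c1 in codes[w1] for c2 in codes[w2]
            if (c1, c2) in allowed]

result = lookup_disambiguate("fire_v", "employee_n")
```

Disambiguation thus reduces to set membership tests; the same call with gun_n as object would instead resolve fire to its shooting sense.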
By repeating the acquisition process described in sec- 
tions 2-4 for collections of appropriately selected source 
corpora, the acquired cooccurrence restrictions can be 
parameterized for sublanguage specific domains. This 
augmentation can be made by storing each word sense 
and associated cluster code with a sublanguage specifi- 
cation and a percentage descriptor indicating the relative 
frequency of the word sense with reference to the clus- 
ter code in the specified sublanguage, e.g. 
  fire_v_4  102_VO  Business  65%
  fire_v_4  102_VO  Crime     25%
  fire_v_1  104_VO  Business   5%
  fire_v_1  104_VO  Crime     70%
6 Statistically Inconspicuous Collocates 
Because only statistically relevant collocations are cho- 
sen to drive the disambiguation process (see section 2), 
it follows that no cooccurrence restrictions will be ac- 
quired for a variety of word pairs. This, for example, 
might be the case with verb-object pairs such as < fire_v,
hand_n > where the noun is a somewhat atypical object.
This problem can be addressed by using the cooccur- 
rence restrictions already acquired to classify statisti- 
cally inconspicuous collocates, as shown below with 
reference to the verb object pair < firev, hand n >. 
Find all verb-object cooccurrence restrictions con- 
taining the verbfire, which as shown in the previous 
section are 
< 102_VO, 102_OV > 
< 104 VO, 104_OV • 
• Retrieve all members of the direct object collocate 
class, e.g. 
102_OV -> clerk_n_1/2, employee_n_1
104_OV -> gun_n_1, rocket_n_1
• Cluster the statistically inconspicuous collocate with all
members of the direct object collocate class. This will
provide one or more sense classifications for the statisti-
cally inconspicuous collocate. In the present case, the
WordNet senses 2 and 9 (glossed as "farm labourer"
and "crew member" respectively) are given when
hand_n clusters with clerk_n_1/2 and employee_n_1,
e.g.
IN: {hand_n, clerk_n_1/2, employee_n_1,
     gun_n_1, rocket_n_1}
OUT: {hand_n_2/9, clerk_n_1/2, employee_n_1}
     {gun_n_1, rocket_n_1}
• Associate the disambiguated statistically incon-
spicuous collocate with the same code as the word
senses with which it has been clustered, e.g.

  hand_n_2  102_OV
  hand_n_9  102_OV
This will make it possible to choose senses 2 and 9 
for hand in contexts where hand occurs as the direct 
object of verbs such as fire, as explained in the previous
section.
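The classification of an inconspicuous collocate can be sketched as follows; the similarity scores below are invented for the example (a real system would use the WordNet-based measure of section 1), and the function name is an assumption:

```python
def classify_inconspicuous(new_word, classes, sim, threshold):
    """Assign the new collocate the code of any collocate class
    containing a member it is sufficiently similar to."""
    return {code for code, members in classes.items()
            if any(sim(new_word, m) >= threshold for m in members)}

# Invented similarity scores standing in for the WordNet measure.
toy_sim = {("hand_n", "clerk_n"): 4.1, ("hand_n", "employee_n"): 4.4,
           ("hand_n", "gun_n"): 0.9, ("hand_n", "rocket_n"): 0.7}
sim = lambda a, b: toy_sim.get((a, b), 0.0)

classes = {"102_OV": ["clerk_n", "employee_n"],
           "104_OV": ["gun_n", "rocket_n"]}
new_codes = classify_inconspicuous("hand_n", classes, sim, threshold=3)
```

Since hand clusters only with the clerk/employee class, it inherits that class's code and none other.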
7 Preliminary Results and Future Work 
A prototype of the system described was partially im- 
plemented to test the effectiveness of the disambiguation 
method. The prototype comprises: 
• a component performing semantic similarity judge-
ments for word pairs using WordNet (this is an im-
plementation of Resnik's approach);
• a component which turns sets of word pairs rated for
semantic similarity into clusters of semantically con-
gruent word senses; and
• a component which performs the disambiguation of
syntactic collocates in the manner described in sec-
tion 4.
The current functionality provides the means to disam- 
biguate a pair of words <W1 W2> standing in a given 
syntactic relation Dep given a list of words related to W1 
by Dep, a list of words related to W2 by Dep, and a se- 
mantic similarity threshold for word clustering, as shown 
in (12). 
In order to provide an indication of how well the sys- 
tem performs, a few examples are presented in (12). As 
can be confirmed with reference to the WordNet entries 
in table 1, these preliminary results are encouraging as 
they show a reasonable resolution of ambiguities. A 
more thorough evaluation is currently being carried out. 
(12) IN: < fire_v-[employee_n, clerk_n, gun_n, pistol_n],
          [fire_v, dismiss_v, hire_v, recruit_v]-employee_n, 3 >
     OUT: < fire_v_4 employee_n_1 >
     IN: < fire_v-[employee_n, clerk_n, gun_n, pistol_n],
          [fire_v, shoot_v, pop_v, discharge_v]-gun_n, 3 >
     OUT: < fire_v_1 gun_n_1 >
     IN: < wear_v-[suit_n, garment_n, clothes_n, uniform_n],
          [wear_v, have_on_v, record_v, file_v]-suit_n, 3 >
     OUT: < wear_v_1/9 suit_n_1 >
     IN: < file_v-[suit_n, proceedings_n, lawsuit_n,
          litigation_n],
          [wear_v, have_on_v, record_v, file_v]-suit_n, 3 >
     OUT: < file_v_1/5 suit_n_2 >
Note that disambiguation can yield multiple senses, as 
shown with reference to the resolution of the verbs file 
and wear in the third and fourth examples shown in (12). 
Multiple disambiguation results typically occur when 
some of the senses given for a word in the source dic- 
tionary database are close in meaning. For example, both 
sense 1 and 9 of wear relate to an eventuality of 
"clothing oneself". Multiple word sense resolutions can 
be ranked with reference to the semantic similarity 
scores used in clustering word senses during disam- 
biguation. The basic idea is that the word sense resolu- 
tion contained in the word cluster which has highest 
semantic similarity scores provides the best disambigua- 
tion hypothesis. For example, specific word senses for 
the verb-object pair < wear suit > in the third example of 
(12) above are given by the disambiguated word tuples 
in (13) which arise from intersecting pairs consisting of 
all senses of an associating word and a semantically 
congruent cluster of its associated words, as described 
in section 4. 
(13) { < {have_on_v_1, wear_v_1},
         {clothes_n_1, garment_n_1, suit_n_1,
          uniform_n_1} >
       < {file_v_2, wear_v_9},
         {clothes_n_1, garment_n_1, suit_n_1,
          uniform_n_1} > }
Taking into account the scores shown in (14), the best 
word sense candidate for the verb wear in the context 
wear suit would be wear_v_1. In this case, the semantic
similarity scores for the second cluster (i.e. the nouns) 
do not matter as there is only one such cluster. 
(14) sim(have_on_v_1, wear_v_1) = 6.291
     sim(file_v_2, wear_v_9) = 3.309
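Ranking the competing resolutions by their cluster scores is then a simple maximisation; a Python sketch using the figures from (13)-(14), with the candidate representation an assumption for illustration:

```python
def best_resolution(candidates):
    """Rank alternative disambiguation tuples by the semantic
    similarity score of the verb cluster they came from and
    return the highest-scoring one."""
    return max(candidates, key=lambda c: c["score"])

# Scores from the wear/suit example; cluster contents abridged.
candidates = [
    {"verbs": {"have_on_v_1", "wear_v_1"}, "score": 6.291},
    {"verbs": {"file_v_2", "wear_v_9"}, "score": 3.309},
]
best = best_resolution(candidates)
```

The cluster containing wear_v_1 wins, matching the hypothesis that the highest-scoring cluster yields the best sense resolution.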
Preliminary results suggest that the present treatment 
of disambiguation can achieve good results with small 
quantities of input data. For example, as few as four 
input collocations may suffice to provide acceptable 
results, e.g. 
(15) IN: < fire_v-[employee_n, clerk_n],
          [fire_v, dismiss_v]-employee_n, 3 >
     OUT: < fire_v_4 employee_n_1 >
     IN: < wear_v-[suit_n, clothes_n],
          [wear_v, have_on_v]-suit_n, 3 >
     OUT: < wear_v_1 suit_n_1 >
This is because word clustering --- which is the decisive 
step in disambiguation --- is carried out using a measure 
of semantic similarity which is essentially induced from 
the hyponymic links of a semantic word net. As long as 
the collocations chosen as input data generate some 
word clusters, there is a good chance for disambiguation. 
The reduction of input data requirements offers a sig- 
nificant advantage compared with methods such as those 
presented in Brown et al. (1991), Gale et al. (1992), 
Yarowsky (1995), and Karov & Edelman (1996) where
strong reliance on statistical techniques for the calcula- 
tion of word and context similarity commands large 
source corpora. Such advantage can be particularly ap- 
preciated with reference to the acquisition of cooccur- 
rence restrictions for those sublanguage domains where 
large corpora are not available. 
Ironically, the major advantage of the approach pro- 
posed --- namely, a reliance on structured semantic word 
nets as the main knowledge source for assessing seman- 
tic similarity --- is also its major drawback. Semantically 
structured lexical databases, especially those which are 
tuned to specific sublanguage domains, are currently not 
easily available and expensive to build manually. How- 
ever, advances in the area of automatic thesaurus dis- 
covery (Grefenstette, 1994) as well as progress in the 
area of automatic merger of machine readable diction- 
aries (Sanfilippo & Poznanski, 1992; Chang & Chen, 
1997) indicate that availability of the lexical resources 
needed may gradually improve in the future. In addition, 
ongoing research on rating conceptual distance from 
unstructured synonym sets (Sanfilippo, 1997) may soon 
provide an effective way of adapting any commercially 
available thesaurus to the task of word clustering, thus 
increasing considerably the range of lexical databases 
used as knowledge sources in the assessment of semantic 
similarity. 
Acknowledgements 
This research was carried out within the SPARKLE 
project (LE-12111). I am indebted to Geert Adriaens, 
Simon Berry, Ted Briscoe, Ian Johnson, Victor Poznan- 
ski, Karen Sparck Jones, Ralf Steinberger and Yorick
Wilks for valuable feedback. 

References 
N. Abramson. 1963. Information Theory and Coding. 
McGraw-Hill, NY. 
P. Brown, S. Della Pietra, V. Della Pietra & R. Mercer. 1991.
Word sense disambiguation using statistical methods. In
Proceedings of ACL, pp. 264-270.
J. Chang and J. Chen. 1997. Topical Clustering of MRD
Senses based on Information Retrieval Techniques. Ms. 
Dept. of Computer Science, National Tsing Hua University, 
Taiwan. 
W. Gale, K. Church & D. Yarowsky. 1992. A method for 
disambiguating word senses in a large corpus. Computers 
and the Humanities, 26:415-439. 
G. Grefenstette. 1994. Explorations in Automatic Thesaurus
Discovery. Kluwer Academic Publishers, Boston. 
Y. Karov & S. Edelman. 1996. Learning similarity-based
word sense disambiguation from sparse data. Available as
cmp-lg paper No. 9605009.
H. Kozima & T. Furugori. 1993. Similarity between Words 
Computed by Spreading Activation on an English Diction- 
ary. In Proceedings of EACL. 
G. Miller. 1990. Five Papers on WordNet. Special issue of
the International Journal of Lexicography, 3 (4). 
J. Morris & G. Hirst. 1991. Lexical Cohesion Computed 
by Thesaural Relations as an Indicator of the Structure of 
Text. Computational Linguistics, 17:21-48.
R. Rada, M. Hafedh, E. Bicknell and M. Blettner. 1989. 
Development and application of a metric on semantic nets. 
IEEE Transactions on Systems, Man, and Cybernetics,
19(1):17-30. 
P. Resnik. 1995a. Using information content to evaluate 
semantic similarity in a taxonomy. In Proceedings of
IJCAI.
P. Resnik. 1995b. Disambiguating noun groupings with
respect to WordNet Senses. In Proceedings of 3rd Work- 
shop on Very Large Corpora. Association for Computa- 
tional Linguistics. 
A. Sanfilippo. 1997. Rating conceptual distance using ex- 
tended synonym sets. Ms. SHARP Lab. of Europe, Oxford. 
A. Sanfilippo and V. Poznanski. 1992. The Acquisition of 
Lexical Knowledge from Combined Machine-Readable 
Dictionary Sources. In Proceedings of the 3rd Conference 
on Applied Natural Language Processing, Trento.
D. Yarowsky. 1995. Unsupervised Word Sense Disam- 
biguation Rivaling Supervised Methods. In Proceedings of 
the 33rd Annual Meeting of the ACL, pp. 189-96. 
