WORD SENSE AMBIGUATION: CLUSTERING RELATED SENSES 
William B. Dolan 
Microsoft Research 
billdol@microsoft.com 
Abstract 
This paper describes a heuristic approach to 
automatically identifying which senses of a machine- 
readable dictionary (MRD) headword are 
semantically related versus those which correspond 
to fundamentally different senses of the word. The 
inclusion of this information in a lexical database 
profoundly alters the nature of sense disambiguation: 
the appropriate "sense" of a polysemous word may 
now correspond to some set of related senses. Our 
technique offers benefits both for on-line semantic 
processing and for the challenging task of mapping 
word senses across multiple MRDs in creating a 
merged lexical database. 
1. Introduction 
The problem of word sense disambiguation is one 
which has received increased attention in recent 
work on Natural Language Processing (NLP) and 
Information Retrieval (IR). Given an occurrence of a 
polysemous word in running text, the task as it is 
generally formulated involves examining a set of 
senses, defined by a MRD or hand-constructed 
lexicon, and examining contextual cues to discover 
which of these is the intended one. This paper 
considers a problem with the standard approach to 
handling polysemy, arguing that in many cases this 
kind of "forced-choice" approach to disambiguation 
leads to arbitrary decisions which have negative 
consequences for NLP systems. In particular, we 
show that a great deal of potentially useful 
information about a word's meaning may be missed 
if the task involves isolating a single "correct" sense. 
We describe an approach to the construction of an 
MRD-derived lexical database that helps overcome 
some of these difficulties. 
We begin by reviewing two difficulties with the 
standard approach, then go on to describe our approach to 
solving these difficulties in creating a large MRD-derived 
lexical database. Our method might be 
termed "ambiguation", because it involves blurring 
the boundaries between closely related word senses. 
After describing the algorithm which accomplishes 
this task, we go on to briefly discuss its results. 
Finally, we describe the implications this work 
has for the task of merging multiple MRDs. 
The arbitrariness of sense divisions 
The division of word meanings into distinct 
dictionary senses and entries is frequently arbitrary 
(Atkins and Levin, 1988; Atkins, 1991), as a 
comparison of any two dictionaries quickly makes 
clear. For example, consider the verb "mo(u)lt", 
whose single sense in the American Heritage 
Dictionary, Third Edition (AHD3) corresponds to 
two senses in Longman's Dictionary of 
Contemporary English (LDOCE): 2 
AHD3 
"... part or all of a coat or ... 
covering, such as feathers, cuticle or skin" 
LDOCE 
(1) "(of a bird) to lose or throw off (feathers) at the 
season when new feathers grow" 
(2) "(of an animal, esp. a dog or cat) to lose or throw 
off (hair or fur)" 
The arbitrary nature of such divisions is 
compounded by the fact that dictionaries typically 
provide no information about how the different 
senses of a polysemous headword might be related. 
Examination of dictionary entries shows that these 
interrelationships are often highly complex, 
encompassing senses which differ only in some 
slight shade of meaning, those which are historically 
but not synchronically related, those which are 
linked through some more or less opaque process of 
metaphor or metonymy, and finally, those which 
appear to be completely unrelated. 
A typical case is the entry for the noun "crank", 
which includes one "apparatus" sense (1) and two 
"person" senses (2) and (3). Nothing in this entry 
indicates that (2) and (3) are more closely related to 
one another than either is to (1). 
1 I would like to extend my thanks to Robert Dale and Lisa Braden-Harder, as well as the members of the 
Microsoft NLP group: George Heidorn, Karen Jensen, Joseph Pentheroudakis, Diana Peterson, Steve Richardson, 
and Lucy Vanderwende. 
2 All examples are from LDOCE, except as noted. 
(1) "an apparatus for changing movement in a 
straight line into circular movement..." 
(2) "a person with strange, odd, or peculiar ideas" 
(3) "a nasty bad-tempered person" 
Using MRDs for Sense Disambiguation 
Atkins (1991) argues that dictionary-derived 
lexical databases will be capable of supporting high-quality 
NLP only if they contain highly detailed 
taxonomic descriptions of the interrelationships 
among word senses. These relationships are often 
systematic (see Atkins, 1991), and it is possible to 
imagine strategies for automatically or at least semi-automatically 
identifying them. One such proposal is 
due to Chodorow (1990), who notes 10 recurring 
types of inter-sense relationships in Webster's 7th, 
including PROCESS/RESULT, FOOD/PLANT, and 
CONTAINER/VOLUME, and suggests that some 
instances of these relationships might be 
automatically identified. Ideally, such a strategy might 
allow the automated construction of lexical databases 
which explicitly characterize how individual senses 
of a headword are related, with these 
interrelationships described by a fixed, general set of 
semantic associations which hold between words 
throughout the lexicon. 
In practice, however, attempts to automatically 
identify systematic polysemy in MRDs will capture 
only a small subset of the cases in which word senses 
overlap semantically. Often, distinctions among a 
word's senses are so fine or so idiosyncratic that they 
simply cannot be characterized in a general way. For 
instance, while the two LDOCE senses of "moult" 
are closely related, the fine distinction they reflect 
between "bird" and "animal" behavior is not one 
which recurs systematically throughout the English 
lexicon. 
In short, the task of identifying and attaching a 
meaningful label to each of the links among related 
word senses in a large lexical database is a daunting 
one, and one that will ultimately require a great deal 
of hand-coding. Perhaps for these reasons, we know 
of no large-scale attempts to automatically create 
labeled links among senses of polysemous words. 
Moreover, it is not clear that attaching a 
meaningful label to the relationship between two 
semantically related senses of a word will necessarily 
aid in performing NLP tasks. Krovetz and Croft 
(1992) suggest just the opposite, claiming that in 
many cases, dictionary entries for polysemous words 
encode fine-grain semantic distinctions that are 
unlikely to be of practical value for specific 
applications. Our experience suggests a similar 
conclusion. Consider, for instance, the following pair 
of senses for the noun "stalk": 
(1) "the main upright part of a plant (not a tree)" 
(Ex: "a beanstalk") 
(2) "a long narrow part of a plant supporting one or 
more leaves, fruits, or flowers; stem" 
The differences between these two senses are 
subtle enough that for many tasks, including sense 
disambiguation in running text, the two are likely to 
be indistinguishable from one another. In a sentence 
like "The stalks remained in the farmer's field long 
after summer", for instance, the choice of some 
particular sense of "stalk" as the "correct" one will be 
essentially arbitrary. 
Sense Disambiguation versus Information Loss 
Sense disambiguation algorithms are frequently 
faced with multiple "correct" choices, a situation 
which increases their odds of choosing a reasonable 
sense, but which also has hidden negative 
consequences for semantic processing. First of all, 
the task of discriminating between two or more 
extremely similar senses can waste processing 
resources while providing no obvious benefit. 
However, there are more problematic effects of 
combining a lexicon which makes unnecessarily fine 
distinctions between word senses with a 
disambiguation algorithm which sets up the artificial 
task of choosing a single "correct" sense for a word. 
The problem is that this strategy means that the 
amount of semantic information retrieved for a word 
will always be limited to just that which is available 
in some individual sense, and valuable background 
information about a word's meaning may be ignored. 
In the case of "stalk", for instance, choosing the first 
sense will mean losing the fact that "stalks" are 
"stems", that they are "narrow", and that they 
"support leaves, fruits, or flowers". Choosing the 
second sense, on the other hand, will mean losing 
the fact that stalks are upright, that one example of a 
stalk is a "beanstalk", and that the main upright part 
of a "tree" cannot be called a "stalk". 
Human dictionary users never encounter this 
problem. The reason is that instead of treating the 
entry for a word like "stalk" as a pair of entirely 
discrete senses, a human looking this word up would 
typically arrive at a more abstract notion of its 
meaning, one which encompasses information from 
both senses. How can we reformulate the problem of 
sense disambiguation in a computational context so 
that semantic processing can do a better job of 
mimicking the human user? Our solution involves 
encoding in our LDOCE-derived lexical database 
information about how a word's senses overlap 
semantically. 
2. Identifying Semantically Similar Senses 
The remainder of the paper describes a heuristic-based 
algorithm which automatically determines 
which senses of a given LDOCE headword are 
closely related to one another vs. those which appear 
to represent fundamentally different senses of the 
word. While no attempt is made to explicitly identify 
the nature of these links, our program has the 
advantage of generality: no hand-coding is required, 
and the techniques we describe can thus be applied 
(with some modification) to on-line dictionaries 
other than LDOCE. This work has an important 
effect on the formulation of the sense disambiguation 
task: by encoding information of this kind in our 
LDOCE-derived lexical database, we can now permit 
the sense disambiguation component of our system to 
return a merged representation of the semantic 
information contained in multiple senses of a word 
like "stalk". Making available more background 
information about a word's meaning increases the 
likelihood of correctly interpreting sentences which 
contain this word. 
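The idea of returning a merged representation can be illustrated with a minimal sketch. It assumes each sense is stored as a mapping from attribute names to sets of values; the attribute names other than Hypernym, and the hand-entered values for "stalk", are hypothetical and are not taken from our lexical database.

```python
from collections import defaultdict

# Hypothetical records for the two LDOCE noun senses of "stalk".
# Only "Hypernym" is a relation type named in the paper; the other
# attribute names and all values are invented for illustration.
STALK_SENSES = {
    1: {"Hypernym": {"part"}, "PartOf": {"plant"}, "Example": {"beanstalk"}},
    2: {"Hypernym": {"part"}, "PartOf": {"plant"}, "Synonym": {"stem"},
        "Supports": {"leaf", "fruit", "flower"}},
}

def merge_senses(senses, cluster):
    """Union the attribute values of every sense id listed in `cluster`."""
    merged = defaultdict(set)
    for sense_id in cluster:
        for attribute, values in senses[sense_id].items():
            merged[attribute] |= values
    return dict(merged)

# Treat senses 1 and 2 as one cluster and pool what each contributes.
merged_stalk = merge_senses(STALK_SENSES, cluster=[1, 2])
```

Under these assumptions the merged record keeps both the "beanstalk" example from the first sense and the "stem" synonym from the second, which is the information that a forced choice between the two senses would discard.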
Our method involves performing an exhaustive 
set of pairwise comparisons of the different senses of 
a polysemous word with one another, with the aim of 
discovering which pairs show a higher degree of 
semantic similarity. Comparisons are not limited by 
part of speech; for example, noun and verb senses 
are compared to one another. A variety of types of 
information about a sense's meaning are exploited by 
this comparison step, including: 
• LDOCE Syntactic Subcategorization Codes 
• LDOCE Boxcodes 
The program uses a taxonomic classification of 
these codes based on Bruce and Guthrie (1992) to 
allow partial matches between senses with non- 
identical but related Boxcodes. In addition, certain 
Boxcode specifications (e.g., [plant]) match against 
sets of keywords in definition strings (e.g., {plant, 
soil}). 
• LDOCE Domain Codes 
A taxonomic classification of the 124 Domain 
codes like that in Slator (1988) is used to identify 
cases in which two senses have similar but non- 
identical codes. As with the Boxcodes, certain 
Domain specifications (e.g., [BB, "baseball"]) match 
against sets of keywords in definition strings (e.g., 
{baseball, ball, sports}). 
• Features Abstracted from LDOCE Definitions 
A number of binary features, including [locative] 
and [human], have been automatically assigned to 
LDOCE senses, based on syntactic and lexical 
properties of their definitions. Matches between 
these features increase the likelihood that two senses 
are semantically related. 
• Semantic Relations 
The most important source of evidence about the 
interrelationships among senses has been 
automatically derived from LDOCE definition 
sentences. The program consults a lexical database 
which contains approximately 150,000 semantic 
associations between word senses, the result of 
automatically parsing the definition text of each 
noun and verb sense in LDOCE and then applying a 
set of heuristic rules which automatically attempt to 
identify any systematic semantic relationships 
holding between a headword and the (base forms of) 
words used to define it (Jensen & Binot, 1987; 
Montemagni and Vanderwende, 1992). 
Approximately 25 types of semantic relations are 
currently identified, including Hypernym (genus 
term), Location, Manner, Purpose, HasPart, 
TypicalSubject, and Possessor. Finally, each of 
these links is automatically sense-disambiguated. 
The resulting associations are modeled as labeled 
edges in a directed cyclic graph whose nodes 
correspond to individual word senses (Dolan et al., 
1993; Pentheroudakis and Vanderwende, 1993). 
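A graph of labeled edges between sense nodes can be sketched as follows. This is only an illustration of the data structure, not the representation used in our system: the class name, the method names, and the "word#pos+number" sense identifiers are all assumptions, and the two edges are entered by hand from the "stalk" definitions.

```python
from collections import defaultdict

class SenseGraph:
    """Directed graph: nodes are word senses, edges carry a relation label."""

    def __init__(self):
        # source sense -> relation label -> set of target senses
        self._edges = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Record one labeled edge from `source` to `target`."""
        self._edges[source][relation].add(target)

    def values(self, source, relation):
        """Return the targets linked to `source` by `relation` (empty set if none)."""
        return set(self._edges.get(source, {}).get(relation, set()))

graph = SenseGraph()
# Hypothetical sense identifiers; both "stalk" senses are defined as a "part".
graph.add("stalk#n1", "Hypernym", "part#n1")
graph.add("stalk#n2", "Hypernym", "part#n1")

# Values shared by the two senses for the same relation type.
shared_hypernyms = (graph.values("stalk#n1", "Hypernym")
                    & graph.values("stalk#n2", "Hypernym"))
```

Because edges are directed and nothing prevents a path from returning to its starting node, the same structure can hold the circular definitions that dictionaries contain.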
Matching two senses involves comparing any 
values which have been identified for each of the 
semantic relation types. One of the most important 
comparisons is of Hypernyms, which have been 
identified for the vast majority of noun and verb 
senses. An exact Hypernym match generally signals 
a close semantic relationship between two senses, as 
in the following senses of the noun "cat": 
(1) "... with soft fur and sharp teeth and 
claws (nails), often kept as a pet ..." 
(2) "... related to this, 
such as the lion or tiger ..." 
Comparisons are not limited to Hypernyms, of 
course: in comparing two senses, the program 
attempts to identify shared values for each of the 
different semantic attributes present in a word's 
lexical representation. For instance, in each of the 
following verb senses of "crawl", the word "slowly" 
has been automatically identified as the value of a 
Manner attribute. 
(1) "to move slowly with the body close to the 
ground or floor or on the hands and knees" 
(2) "to go very slowly" 
Each time an identical value is found for a given 
semantic attribute, the algorithm increments the 
correlation score for that pair of senses. If no exact 
match is found, the program checks whether the 
values for this attribute in the two senses have a 
hypernym or hyponym in common. The following 
senses of the noun "insect", for example, are linked 
through the Hypernyms "creature" and "animal": 
(1) "a small creature with no bones and a hard outer 
covering..." 
(2) "a very small animal that creeps along the 
ground, such as a spider or worm" 
According to the network implicit in LDOCE, 
"creature" is a hyponym of "animal", while "animal" 
is a hyponym of "creature". (For discussion of this 
kind of circularity in dictionary definitions, see 
Calzolari, 1977.) 
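The two-step comparison of attribute values (exact match first, then a hypernym or hyponym link) might be sketched as below. The weights, the function names, and the toy hypernym table are assumptions made for illustration; in particular, treating a direct hypernym link between the two values as a match is one reading of "in common", not a statement of the program's actual rule.

```python
# Toy hypernym links mirroring the circular "creature"/"animal" pair.
HYPERNYMS = {
    "creature": {"animal"},
    "animal": {"creature"},
    "spider": {"animal"},
}

def hyponyms_of(word):
    """Words that list `word` among their hypernyms."""
    return {w for w, parents in HYPERNYMS.items() if word in parents}

def neighbourhood(word):
    """The word itself plus its direct hypernyms and hyponyms."""
    return {word} | HYPERNYMS.get(word, set()) | hyponyms_of(word)

def compare_values(values_a, values_b, exact_weight=2, linked_weight=1):
    """Score one attribute: exact matches first, hypernym/hyponym link as fallback."""
    exact = values_a & values_b
    if exact:
        return exact_weight * len(exact)
    for a in values_a:
        for b in values_b:
            if neighbourhood(a) & neighbourhood(b):
                return linked_weight
    return 0

insect_score = compare_values({"creature"}, {"animal"})  # linked, not identical
crawl_score = compare_values({"slowly"}, {"slowly"})     # identical Manner value
```

Giving an exact match more weight than a linked one is a design choice of this sketch; it simply reflects the observation that an exact Hypernym match generally signals a closer relationship.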
In addition to such straightforward comparisons, 
a number of "scrambled" comparisons are attempted. 
For instance, any value for the IngredientOf attribute 
is automatically compared to the Hypernym value(s) 
for each of the other senses. This comparison reflects the 
fact that many nouns are both the name for a 
substance and for something which is made from 
that substance. An example of this is the noun 
"coffee": in one sense, "coffee" is IngredientOf a 
"drink", while in another sense "drink" has been identified 
as its Hypernym. 
(1) "a brown powder made by crushing coffee beans, 
used for making drinks" 
(2) "(a cupful of) a hot brown drink made by adding 
hot water and/or milk to this powder" 
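A "scrambled" comparison of this kind can be sketched as a check across two different attributes, applied in both directions. The function name, the default attribute pairing, and the hand-entered values for "coffee" are assumptions for illustration only.

```python
def scrambled_matches(sense_a, sense_b, pairs=(("IngredientOf", "Hypernym"),)):
    """Count values shared across *different* attributes of two senses.

    Each (x, y) in `pairs` is tried in both directions: x of A against
    y of B, and x of B against y of A.
    """
    hits = 0
    for attr_x, attr_y in pairs:
        hits += len(sense_a.get(attr_x, set()) & sense_b.get(attr_y, set()))
        hits += len(sense_b.get(attr_x, set()) & sense_a.get(attr_y, set()))
    return hits

# Hand-entered values read off the two definitions of "coffee".
coffee_1 = {"Hypernym": {"powder"}, "IngredientOf": {"drink"}}
coffee_2 = {"Hypernym": {"drink"}}

coffee_hits = scrambled_matches(coffee_1, coffee_2)
```

Trying both directions keeps the result independent of the order in which the two senses are passed, which matters if the scores are to fill a symmetrical matrix.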
3. Discussion and Evaluation 
The sense clustering program was run over the 
set of 33,000 single word noun definitions and 12,000 
single word verb definitions in LDOCE (45,000 
total) in a process that took approximately 20 hours 
on a 486/50 PC. Given a set of senses for a 
polysemous word such as "crank", the result of the 
exhaustive pairwise comparisons performed by the 
program is a (symmetrical) matrix of correlation 
scores: 
[Matrix of pairwise correlation scores among the verb senses (1a), (1b) and the noun senses (1), (2), (3) of "crank"; the individual scores are not legible in the source.] 
Since our comparisons are heuristic in nature, the 
relative rankings of the pairwise comparisons for a 
polysemous word's senses are the relevant measure 
of semantic similarity, rather than any absolute 
threshold. In the case of "crank", Clustering has 
correctly indicated a high correlation between the 
two "human" senses of the noun "crank", and a high 
correlation between the two verbal subsenses and the 
"apparatus" noun sense. Moreover, the two "human" 
noun senses are not semantically correlated with any 
of the three "apparatus" senses. 
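The construction of a symmetrical score matrix and the use of relative rankings can be sketched as follows. The scores below are invented purely for illustration and are not the values our program produced for "crank"; the function names and sense labels are likewise assumptions.

```python
from itertools import combinations

def correlation_matrix(sense_ids, score):
    """Fill a symmetric matrix by scoring every unordered pair of senses once."""
    matrix = {a: {b: 0 for b in sense_ids} for a in sense_ids}
    for a, b in combinations(sense_ids, 2):
        matrix[a][b] = matrix[b][a] = score(a, b)
    return matrix

def ranked_pairs(matrix):
    """Return (score, sense, sense) triples, most correlated pair first."""
    ids = list(matrix)
    pairs = [(matrix[a][b], a, b) for a, b in combinations(ids, 2)]
    return sorted(pairs, reverse=True)

# Invented scores: two verb subsenses, one "apparatus" noun sense (n1),
# and two "person" noun senses (n2, n3). Unlisted pairs score 0.
TOY_SCORES = {
    frozenset({"v1a", "v1b"}): 40, frozenset({"v1a", "n1"}): 30,
    frozenset({"v1b", "n1"}): 28, frozenset({"n2", "n3"}): 35,
}
senses = ["v1a", "v1b", "n1", "n2", "n3"]
matrix = correlation_matrix(senses, lambda a, b: TOY_SCORES.get(frozenset({a, b}), 0))
top = ranked_pairs(matrix)[:4]
```

Reading the pairs off in rank order, rather than cutting at a fixed score, is what allows the comparison to remain meaningful even though the underlying scores are heuristic.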
Negative scores are also common, reflecting 
certain kinds of incompatibilities between senses 
(e.g., one sense is [+animate] while the other is 
[-animate]). As a rule of thumb, however, it is much 
easier to identify commonalities between senses than 
to identify definite mismatches. 
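One way in which binary features could contribute both positive and negative evidence is sketched below. The function name and the weights are assumptions; the sketch also assumes that a feature missing from one sense is simply unknown and therefore contributes nothing, in keeping with the observation that definite mismatches are harder to establish than commonalities.

```python
def feature_score(features_a, features_b, match=1, clash=-2):
    """Compare binary features: reward agreement, penalise opposite values.

    Features present in only one of the two senses are ignored.
    """
    score = 0
    for name in features_a.keys() & features_b.keys():
        score += match if features_a[name] == features_b[name] else clash
    return score

person_sense = {"animate": True, "human": True}
apparatus_sense = {"animate": False}

clash_score = feature_score(person_sense, apparatus_sense)
agree_score = feature_score(person_sense, {"animate": True, "human": True})
```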
Zero Derivation 
One of the most useful products of clustering is 
the identification of many cases of zero-derived 
noun/verb pairs. For instance, the comparison of the 
various senses of the word "cook" shows the verb 
sense "to prepare (food) for eating..." to be highly 
correlated with the noun sense "a person who 
prepares and cooks food". This kind of cross-classification, 
which dictionaries generally fail to 
provide, has interesting implications for normalizing 
the semantics of superficially very different 
sentences. For example, a concept which is 
expressed verbally in one sentence can now be 
related to the same general concept expressed 
nominally in another, even if LDOCE does not 
explicitly link the definitions for the two parts of 
speech. (Pentheroudakis & Vanderwende (1993) 
describe a general approach to identifying semantic 
links among morphologically-related words.) 
Metaphor 
Interestingly, the fact that many conventional 
metaphors are lexicalized in dictionary definitions 
can lead to difficulties with our strategy of 
comparing different definitions to one another. 3 
Consider the following senses of the noun "mouth": 
(1) "the opening on the face through which an 
animal or human being may take food..." 
(2) "an opening, entrance, or way out" 
(Ex: "mouth of a cave") 
In considering these two senses, Clustering 
returned a correlation score of 26, suggesting a 
reasonably close semantic relationship between 
them. From one perspective this is simply wrong: a 
human or animal "mouth" is fundamentally different 
from a cave "mouth", and we would like our MRD-derived 
lexicon to indicate this fact. Once the 
obvious metaphorical association between these two 
senses of "mouth" is noted, however, the reason for 
the clustering program's result becomes clear: both 
senses are defined as kinds of "openings". The case 
for treating the two senses as semantically similar is 
strengthened by other evidence: one sense of 
"entrance" (which is the Hypernym of the second 
sense) has "opening" as its own Hypernym: "a gate, 
door, or other opening by which one enters". 
Such metaphorical associations between word 
senses add a considerable degree of complexity to 
disambiguation and other kinds of reasoning 
processes that operate by identifying semantic 
relationships between different words. More work 
aimed at identifying the systematic nature of such 
3 This same problem crops up in any task which 
involves comparing different words/senses, including 
disambiguation of running text. 
relationships will be required before metaphor-based 
confusions of the kind described above can be 
automatically resolved. 
4. Conclusions and Future Work 
Interestingly, the machinery used to identify 
common semantic threads among a polysemous 
word's senses was originally constructed with 
another purpose in mind -- namely, disambiguating 
LDOCE genus terms. As it turned out, exactly the 
same set of tests used to compare a word sense to the 
set of possible senses of its Hypernym proved useful 
in comparing the different senses of a single word. 
While the current instantiation of the Clustering 
program relies partially on information which is 
idiosyncratic to LDOCE (e.g., Domain codes), most 
of the information it uses for inter-sense comparisons 
has been extracted from the text of their definitions. 
For this reason, the techniques we have described 
here can be readily applied to other MRDs. 
In addition, we plan to experiment with 
augmenting the results of our sense clustering with 
statistics derived from running a sense 
disambiguation program over a large free-text 
corpus. In particular, we are interested in 
discovering whether the "hard cases" encountered by 
this sense disambiguation program (i.e., those cases 
in which the program consistently has difficulty in 
choosing among two or more competing senses) 
correlate with cases of significant semantic overlap 
among senses. If this hypothesis is borne out, then 
information about which senses are difficult to 
distinguish in free text can be used to help us 
establish the taxonomic relationships among the 
different senses of a polysemous word. 
Finally, the work we have described here has 
important implications for the task of merging 
multiple MRDs into a single lexical database. This 
task is greatly complicated by arbitrary sense 
divisions encountered in different dictionaries (see 
Atkins and Levin, 1988; Byrd, 1989). Consider the 
verb "mo(u)lt" again: since the single AHD3 sense 
for this word subsumes both LDOCE senses, no 
obvious mapping strategy is available. Should the 
AHD3 sense be mapped into just one of the LDOCE 
senses? Each of them? Or should the AHD3 sense be 
left separate, resulting in a merged lexical entry with 
three separate entries? As more sources of 
information about word meanings are folded in, this 
last strategy can only increase the complexity of 
semantic processing, since it will become more and 
more difficult to determine which of an ever-larger 
set of semantically-related senses is the appropriate 
one in a given context. Clustering offers a simple 
way to begin to approach this problem. By pooling 
and clustering senses for words from both LDOCE 
and AHD3, we can provide a rough indication of the 
semantic interconnections between the two entries. 
As our techniques for automatically extracting 
semantic information from the text of definition and 
example sentences gradually improve, we expect our 
ability to automatically identify semantic overlaps 
and differences to improve as well. 
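Pooling senses from two dictionaries and ranking the cross-dictionary pairs might look like the sketch below. It is an illustration under stated assumptions, not a description of an implemented merger: the function name is hypothetical, and the scores for "mo(u)lt" are invented stand-ins for whatever the pairwise comparison would return.

```python
def cross_dictionary_links(entries, score):
    """Pool senses from several dictionaries; return cross-dictionary pairs, best first."""
    pooled = [(source, sense_id) for source, ids in entries.items() for sense_id in ids]
    links = []
    for i, a in enumerate(pooled):
        for b in pooled[i + 1:]:
            if a[0] != b[0]:  # only link senses that come from different dictionaries
                links.append((score(a, b), a, b))
    return sorted(links, reverse=True)

# One AHD3 sense and two LDOCE senses of "mo(u)lt".
ENTRIES = {"AHD3": [1], "LDOCE": [1, 2]}
# Invented scores, keyed by (sense, sense) in pooled order.
TOY = {(("AHD3", 1), ("LDOCE", 1)): 30, (("AHD3", 1), ("LDOCE", 2)): 27}

links = cross_dictionary_links(ENTRIES, lambda a, b: TOY.get((a, b), 0))
```

With scores like these, the single AHD3 sense would be linked to both LDOCE senses rather than forced onto one of them, which is the rough indication of interconnection described above.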
5. References 
Atkins, Beryl (1991) "Building a Lexicon: the 
contribution of lexicography", International 
Journal of Lexicography 3:167-204. 
Atkins, B. and B. Levin (1988) "Admitting 
Impediments", Proceedings of the Fourth Annual 
Conference of the UW Centre for the New OED, 
Oxford. 
Bruce, R. and L. Guthrie (1992) "Genus 
Disambiguation: a Study in Weighted Preference", 
In Proceedings of COLING92, pp. 1187-1191. 
Byrd, R. (1989) "Discovering Relationships among 
Word Senses", Proceedings of the Fifth Annual 
Conference of the UW Centre for the New OED, 
Waterloo, Canada. 
Calzolari, N. (1977) "An Empirical Approach to 
Circularity in Dictionary Definitions", in Cahiers 
de Lexicologie 31: 118-128. 
Chodorow, M. (1990) "Making Sense of Word 
Senses: detecting and analyzing systematic 
polysemy in noun definitions", CUNY, 
unpublished ms. 
Dolan, W., L. Vanderwende, and S. Richardson 
(1993). "Automatically Derived Structured 
Knowledge Bases from On-line Dictionaries", 
Proceedings of the First Conference of the Pacific 
Association for Computational Linguistics, April 
21-24, Simon Fraser University, Vancouver, 
Canada. 
Jensen, K. and J.-L. Binot (1987). "Disambiguating 
prepositional phrase attachments by using on-line 
dictionary definitions". Computational Linguistics 
13: 3-4, pp. 251-260. 
Krovetz, R. and B. Croft (1992). "Lexical Ambiguity 
and Information Retrieval", ACM Transactions on 
Information Systems, 10: 2, pp. 115-141. 
Montemagni, S. and L. Vanderwende (1992). 
"Structural patterns vs. string patterns for 
extracting semantic information from 
dictionaries." In Proceedings of COLING92, pp. 
546-552. 
Pentheroudakis, J. and L. Vanderwende (1993) 
"Automatically Identifying Morphological 
Relations in Machine Readable Dictionaries", in 
Proceedings of the Ninth Annual Conf. of the UW 
Centre for the New OED and Text Research, 
Oxford, England, pp. 114-131. 
Slator, B. (1988) "Constructing Contextually 
Organized Lexical Semantic Knowledge-Bases", 
Proceedings of the Third Annual Rocky Mountain 
Conference on Artificial Intelligence (RMCAI-88), 
Denver, CO, pp. 142-148. 
