Word-Sense Disambiguation 
Using Statistical Models of Roget's Categories 
Trained on Large Corpora 
David Yarowsky 
AT&T Bell Laboratories 
600 Mountain Avenue 
Murray Hill, NJ 07974
yarowsky@research.att.com 
Abstract 
This paper describes a program that disambiguates
English word senses in unrestricted text using statistical 
models of the major Roget's Thesaurus categories. 
Roget's categories serve as approximations of conceptual 
classes. The categories listed for a word in Roget's index
tend to correspond to sense distinctions; thus selecting 
the most likely category provides a useful level of sense 
disambiguation. The selection of categories is
accomplished by identifying and weighting words that 
are indicative of each category when seen in context, 
using a Bayesian theoretical framework. 
Other statistical approaches have required special corpora 
or hand-labeled training examples for much of the 
lexicon. Our use of class models overcomes this 
knowledge acquisition bottleneck, enabling training on 
unrestricted monolingual text without human
intervention. Applied to the 10 million word Grolier's 
Encyclopedia, the system correctly disambiguated 92% 
of the instances of 12 polysemous words that have been 
previously studied in the literature. 
1. Problem Formulation 
This paper presents an approach to word sense 
disambiguation that uses classes of words to derive 
models useful for disambiguating individual words in
context. "Sense" is not a well defined concept; it has 
been based on subjective and often subtle distinctions in 
topic, register, dialect, collocation, part of speech and 
valency. For the purposes of this study, we will define the 
senses of a word as the categories listed for that word in 
Roget's International Thesaurus (Fourth Edition -
Chapman, 1977).¹ Sense disambiguation will constitute
selecting the listed category which is most probable given
the surrounding context. This may appear to be a
particularly crude approximation, but as shown in the
example below and in the table of results, it is
surprisingly successful.

1. Note that this edition of Roget's Thesaurus is much more extensive
than the 1911 version, though somewhat more difficult to obtain in
electronic form. One could use other concept hierarchies, such
as WordNet (Miller, 1990) or the LDOCE subject codes (Slator,
1991). All that is necessary is a set of semantic categories and a list
of the words in each category.
Input                                                        Output
Treadmills attached to cranes were used to lift heavy        TOOLS
for supplying power for cranes, hoists, and lifts.           TOOLS
above this height, a tower crane is often used .SB Th        TOOLS
elaborate courtship rituals cranes build a nest of vegetati  ANIMAL
are more closely related to cranes and rails .SB They ran    ANIMAL
low trees .PP At least five crane species are in danger of   ANIMAL
Not only do the Roget categories succeed in partitioning 
the major senses, but the sense tags they provide as 
output are far more mnemonic than a dictionary 
numbering such as "crane 1.2". Should such a 
dictionary sense number be desired as output, section 5 
will outline how a linkage between Roget categories and 
dictionary definitions can be made. 
We will also focus on sense distinctions within a given 
part of speech. Distinctions between parts of speech
should be based on local syntactic evidence. We use a
stochastic part-of-speech tagger (Church, 1989) for this 
purpose, run as a preprocessor. 
2. Proposed Method
The strategy proposed here is based on the following
three observations: 1) Different conceptual classes of
words, such as ANIMALS or MACHINES, tend to appear in
recognizably different contexts. 2) Different word senses 
tend to belong to different conceptual classes (crane can 
be an ANIMAL or a MACHINE). 3) If one can build a 
context discriminator for the conceptual classes, one has 
effectively built a context discriminator for the word 
senses that are members of those classes. Furthermore, 
the context indicators for a Roget category (e.g. gear, 
piston and engine for the category TOOLS/MACHINERY) 
will also tend to be context indicators for the members of 
that category (such as the machinery sense of crane). 
ACTES DE COLING-92, NANTES, 23-28 AOUT 1992 454 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
We attempt to identify, weight and utilize these indicative
words as follows. For each of the 1042 Roget Categories:
1. Collect contexts which are representative of the 
Roget category 
2. Identify salient words in the collective context and 
determine weights for each word, and 
3. Use the resulting weights to predict the appropriate 
category for a polysemous word occurring in novel 
text. 
2.1 Step 1: Collect Contexts which are 
Representative of the Roget category 
The goal of this step is to collect a set of words that are
typically found in the context of a Roget category. To do
this, we extract concordances of 100 surrounding words
for each occurrence of each member of the category in
the corpus. Below is a sample set of partial concordances
for words in the category TOOLS/MACHINERY (348). The
complete set contains 30,924 lines, selected from the
particular training corpus used in this study, the 10
million word, June 1991 electronic version of Grolier's
Encyclopedia.
        CARVING .SB The gutter  adz has a concave blade for form
   uipment such as a hydraulic  shovel capable of lifting 26 cubic
     on .SB Resembling a power  shovel mounted on a floating hul
   uipment, valves for nuclear  generators, oil-refinery turbines
   00 BC, flint-edged wooden    sickles were used to gather wild
 1-penetrating carbide-tipped   drills forced manufacturers to fi
   lt heightens the colors .SB  Drills live in the forests of equa
   traditional ABC method and   drill were unchanged, and dissa
  nter of rotation .PP A tower  crane is an assembly of fabricat
    rshy areas .SB The crowned  crane, however, occasionally
For optimal training, the concordance set should only
include references to the given category. But in practice
it will unavoidably include spurious examples since many
of the words are polysemous (such as drill and crane in
lines 7, 8, and 10 above).
While the level of noise introduced through polysemy is
substantial, it can usually be tolerated because the
spurious senses are distributed through the 1041 other
categories, whereas the signal is concentrated in just one.
Only if several words had secondary senses in the same
category would context typical for the other category
appear significant in this context.
However, if one of these spurious senses was frequent
and dominated the set of examples, the situation could be
disastrous. An attempt is made to weight the
concordance data to minimize this effect and to make the
sample representative of all tools and machinery, not just
the more common ones. If a word such as drill occurs k
times in the corpus, all words in the context of drill
contribute weight 1/k to frequency sums.
Despite its flaws, this weighted matrix will serve as a
representative, albeit noisy, sample of the typical context
of TOOLS/MACHINERY in Grolier's encyclopedia.
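The 1/k weighting of Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name collect_weighted_context, the whitespace tokenization and the toy corpus are all hypothetical, and the real system works on lemmas with a window of 50 words on each side.

```python
from collections import Counter, defaultdict

def collect_weighted_context(tokens, category_members, half_window=50):
    """Accumulate weighted context-word counts for one Roget category.

    A member word that occurs k times in the corpus contributes weight 1/k
    per context word, so frequent members such as 'drill' do not dominate.
    """
    # Record the positions of every category member in the corpus.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in category_members:
            positions[tok].append(i)

    weighted = Counter()
    for member, idxs in positions.items():
        k = len(idxs)  # corpus frequency of this member
        for i in idxs:
            lo = max(0, i - half_window)
            hi = min(len(tokens), i + half_window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the member word itself
                    weighted[tokens[j]] += 1.0 / k
    return weighted

# Toy usage (hypothetical data, tiny window for readability).
corpus = "the drill cut wood then the drill broke and the crane lifted steel".split()
counts = collect_weighted_context(corpus, {"drill", "crane"}, half_window=2)
```

In the toy corpus, drill occurs twice, so each of its context words receives 0.5 per occurrence, while the single occurrence of crane contributes a full 1.0 to each of its neighbours.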
2.2 Step 2: Identify salient words in the
collective context, and weight appropriately
Intuitively, a salient word² is one which appears
significantly more often in the context of a category than
at other points in the corpus, and hence is a better than
average indicator for the category. We formalize this
with a mutual-information-like estimate:
Pr(w|RCat) / Pr(w), the probability of a word (w)
appearing in the context of a Roget category divided by
its overall probability in the corpus.
It is important to exercise some care in estimating
Pr(w|RCat). In principle, one could simply count the
number of times that w appears in the collective context.
However, this estimate, which is known as the maximum
likelihood estimate (MLE), can be unreliable, especially
when w does not appear very often in the collective
context. We have smoothed the local estimates of
Pr(w|RCat) with global estimates of Pr(w) to obtain a
more reliable estimate. Estimates obtained from the local
context are subject to measurement errors whereas
estimates obtained from the global context are subject to
being irrelevant. By interpolating between the two, we
attempt to find a compromise between the two sources of
error. This procedure is based on recent work pioneered
by William Gale, and is explained in detail in another
paper (Gale, Church and Yarowsky, 1992). Space does
not permit a complete description here.
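Since the smoothing procedure itself is deferred to Gale, Church and Yarowsky (1992), the sketch below substitutes plain linear interpolation with a fixed mixing weight. The parameter lam, the function name salience_weights, the toy counts and the use of the natural logarithm are all assumptions made for illustration; the sketch also assumes every context word has a corpus count.

```python
import math

def salience_weights(context_counts, corpus_counts, lam=0.7):
    """Return log(Pr(w|RCat) / Pr(w)) for each word in the category context.

    Pr(w|RCat) is a linear interpolation of the local (context) estimate
    and the global (corpus) estimate. The fixed weight `lam` is an
    illustrative stand-in, not the paper's smoothing procedure.
    """
    context_total = sum(context_counts.values())
    corpus_total = sum(corpus_counts.values())
    weights = {}
    for w, c in context_counts.items():
        p_global = corpus_counts[w] / corpus_total   # Pr(w)
        p_local = c / context_total                  # MLE of Pr(w|RCat)
        p_smoothed = lam * p_local + (1.0 - lam) * p_global
        weights[w] = math.log(p_smoothed / p_global)
    return weights

# Toy usage with hypothetical counts.
context = {"gear": 30.0, "the": 70.0}
corpus = {"gear": 100, "the": 9900}
w = salience_weights(context, corpus)
```

With these toy counts, gear is far more frequent in the category context than in the corpus and receives a positive weight, while the receives a slightly negative one.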
Below are salient words for Roget categories 348 and
414. Those selected are the most important to the
models, where importance is defined as the product of
salience and local frequency. That is to say, important
words are distinctive and frequent.
The numbers in parentheses are the log of the salience
(log Pr(w|RCat) / Pr(w)), which we will henceforth
refer to as the word's weight in the statistical model of
the category.
2. For illustrative simplicity, we will refer to words in context. In
practice, all operations are actually performed on the lemmas of the
words (eat/V = eat, eats, eating, ate, eaten), and inflectional distinctions
are ignored. While this achieves more concentrated and better
estimated statistics, it throws away useful information which may be
exploited in future work.
ANIMAL,INSECT (Category 414):
species (2.3), family (1.7), bird (2.6), fish (2.4),
breed (2.2), cm (2.2), animal (1.7), tail (2.7), egg (2.2),
wild (2.6), common (1.3), coat (2.5), female (2.0),
inhabit (2.2), eat (2.2), nest (2.5) ...
TOOLS/MACHINERY (Category 348):
tool (3.1), machine (2.7), engine (2.6), blade (3.8),
cut (2.6), saw (5.1), lever (4.1), pump (3.5), device
(2.2), gear (3.5), knife (3.8), wheel (2.8), shaft (3.3),
wood (2.0), tooth (2.5), piston (3.6) ...
Notice that these are not a list of members of the 
category; they are the words which are likely to co-occur 
with the members of the category. The complete list for 
TOOLS/MACHINERY includes a broad set of relations, such
as meronomy (blade, engine, gear, wheel, shaft, tooth,
piston and cylinder), typical functions of machines (cut,
rotate, move, turn, pull), typical objects of those actions
(wood, metal), as well as typical modifiers for machines 
(electric, mechanical, pneumatic). The list for a category 
typically contains over 3000 words, and is far richer than 
can be derived from a dictionary definition. 
2.3 Step 3: Use the resulting weights to predict 
the appropriate category for a word in novel 
text 
When any of the salient words derived in step 2 appear in 
the context of an ambiguous word, there is evidence that 
the word belongs to the indicated category. If several 
such words appear, the evidence is compounded. Using 
Bayes' rule, we sum their weights, over all words in
context, and determine the category for which the sum is
greatest.³

$$\operatorname*{ARGMAX}_{RCat} \sum_{w \text{ in context}} \log \frac{\Pr(w \mid RCat) \times \Pr(RCat)}{\Pr(w)}$$

The context is defined to extend 50 words to the left and 
50 words to the right of the polysemous word. This range 
was shown by Gale, Church and Yarowsky (1992) to be 
useful for this type of broad topic classification, in 
contrast to the relatively narrow (±3-6 word) window
used in previous studies (e.g. Black, 1988). The
maximization over RCats is constrained to consider only
those categories under which the polysemous word is
listed, generally on the order of a half dozen or so.⁴

3. The reader may have noticed that the Pr(w) factor can be omitted
since it will not change the results of the maximization. It is
included here for expository convenience so that it is possible to
compare results across words with very different probabilities. The
factor also becomes important when an incomplete set of indicators
is stored because of computational space constraints. Currently we
assume a uniform prior probability for each Roget category
(Pr(RCat)), i.e. sense classification is based exclusively on
contextual information, independent of the underlying probability of
a given Roget category appearing at any point in the corpus.
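Step 3 can be sketched as a restricted argmax over summed weights. The names classify and context_window and the toy weights below are hypothetical; treating words without a stored weight as contributing zero, and dropping the uniform prior from the comparison, are simplifying assumptions of this sketch.

```python
def classify(context_words, candidate_categories, weights):
    """Pick the candidate category with the largest summed weight.

    `weights[cat][w]` holds log(Pr(w|RCat) / Pr(w)). Words with no stored
    weight contribute 0. A uniform prior Pr(RCat) is assumed, so it does
    not affect the comparison and is omitted.
    """
    scores = {
        cat: sum(weights[cat].get(w, 0.0) for w in context_words)
        for cat in candidate_categories
    }
    best = max(scores, key=scores.get)
    return best, scores

def context_window(tokens, position, half_window=50):
    """Words within `half_window` positions of the target, excluding it."""
    lo = max(0, position - half_window)
    return tokens[lo:position] + tokens[position + 1:position + 1 + half_window]

# Toy usage: hypothetical weights, not values from the trained models.
toy_weights = {
    "TOOLS/MACHINERY": {"lift": 2.0, "heavy": 1.0},
    "ANIMAL,INSECT": {"nest": 2.5, "species": 2.3},
}
tokens = "the crane was used to lift heavy loads".split()
ctx = context_window(tokens, tokens.index("crane"))
label, scores = classify(ctx, ["TOOLS/MACHINERY", "ANIMAL,INSECT"], toy_weights)
```

Passing only the categories listed for the word in the thesaurus index as candidate_categories corresponds to the restricted search described above; passing all categories corresponds to the unrestricted mode.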
For example, the word crane appears 74 times in Grolier's;
36 occurrences refer to the animal sense and 38 refer to
the heavy machinery sense. The system correctly
classified all but one of the machinery senses, yielding
99% overall accuracy. The one misclassified case had a
low score for all models, indicating a lack of confidence 
in any classification. 
It is useful to look at one example in some more detail. 
Consider the following instance of crane and its context 
of ±10 words:⁵
lift water and to grind grain .PP Treadmills attached 
to cranes were used to lift heavy objects from Roman 
times, 
The table below shows the strongest indicators identified 
for the two categories in the sentence above. The model 
weights, as noted above, are equivalent to
log Pr(w|RCat) / Pr(w). Several indicators were
found for the TOOLS/MACHINE class. There is very little
evidence for the ANIMAL sense of crane, with the possible
exception of water. The preponderance of evidence
favors the former classification, which happens to be
correct. The difference between the two total scores
indicates strong confidence in the answer.
TOOLS/MACH.    Weight    ANIMAL,INSECT    Weight
lift           2.44      water            0.76
lift           2.44
grain          1.68
used           1.32
heavy          1.28
Treadmills     1.16
attached       0.58
grind          0.29
water          0.11
TOTAL          11.30     TOTAL            0.76
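As a check on the arithmetic, the totals can be recomputed from the individual weights in the table. The sketch below only re-adds the published numbers; the variable names are illustrative. Note that lift occurs twice in the context and is therefore counted twice.

```python
# Weights read from the table above; "lift" occurs twice in the context.
tools = [("lift", 2.44), ("lift", 2.44), ("grain", 1.68), ("used", 1.32),
         ("heavy", 1.28), ("Treadmills", 1.16), ("attached", 0.58),
         ("grind", 0.29), ("water", 0.11)]
animal = [("water", 0.76)]

# Round to two decimals to avoid floating-point noise in the sums.
tools_total = round(sum(wt for _, wt in tools), 2)
animal_total = round(sum(wt for _, wt in animal), 2)
margin = round(tools_total - animal_total, 2)
```

The recomputed totals agree with the table (11.30 against 0.76), a margin of 10.54 in favour of the machinery sense.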
4. Although it is often useful to restrict the search in this way, the
restriction does sometimes lead to trouble, especially when there are
gaps in the thesaurus. For example, the category AMUSEMENT (#
876) lists a number of card playing terms, but for some reason, the
word suit is not included in this list. As it happens, the Grolier's
Encyclopedia contains 54 instances of the card-playing sense of suit,
all of which are mislabeled if the search is limited to just those
categories of suit that are listed in Roget's. However, if we open up
the search to consider all 1042 categories, then we find that all 54
instances of suit are correctly labeled as AMUSEMENT, and moreover,
the score is large in all 54 instances, indicating great confidence in
the assignment. It is possible that the unrestricted search mode
might be a good way to attempt to fill in omissions in the thesaurus.
In any case, when suit is added to the AMUSEMENT category, overall
accuracy improves from 68% to 92%.
5. This narrower window is used for illustrative simplicity.
TABLE 1
Sense                Roget Category     N      Corr.

STAR (Hirst, 1987: N/A)
Space Object         UNIVERSE           1422   96%
Celebrity            ENTERTAINER        222    95%
Star Shaped Object   INSIGNIA           56     82%
                                        1700   96%

MOLE (Hirst, 1987: N/A *)
Quantity             CHEMICALS          95     98%
Mammal               ANIMAL,INSECT      46     100%
Skin Blemish         DISEASE            13     100%
Digging Machine      SUPPORT            4      100%
                                        160    99%

GALLEY (Lesk, 1986: 50-70% overall)
Ancient Ship         SHIP,BOAT          35     97%
Printer's Tray       PRINTING           5      100%
Ship's Kitchen       COOKING            2      50%
                                        42     95%

CONE (Lesk, 1986: 50-70% overall *)
Part of Tree         PLANT              71     99%
Shape of Object      ANGULARITY         89     61%
Part of Eye          VISION             13     69%
                                        173    77%

BASS (Hearst, 1991: 100%; Speech Synthesis)
Musical Senses       MUSIC              158    99%
Fish                 ANIMAL,INSECT      69     100%
                                        227    99%

BOW (Clear, 1989: < 67%; Speech Synthesis)
Weapon               ARMS               59     92%
Front of Ship        SHIP,BOAT          34     94%
Violin Part          MUSICAL INSTR      30     100%
Ribbon               ORNAMENTATION      4      25%
Bend in Object       CONVEXITY          2      50%
Lowering Head        RESPECT            0      -
                                        129    91%

TASTE (Clear, 1989: < 65%)
Preference           PARTICULARITY      228    93%
Flavor               SENSATION          80     93%
                                        308    93%

INTEREST (Black, 1988: 72%; Zernik, 1990: > 70%)
Curiosity            REASONING          359    88%
Advantage            INJUSTICE          163    34%
Financial            DEBT               59     90%
Share                PROPERTY           21     38%
                                        602    72%

ISSUE (Zernik, 1990: < 70%)
Topic                POLITICS           831    94%
Periodical           BOOKS,PERIODI      28     89%
Stock                SECURITIES         9      100%
                                        868    94%

DUTY (Gale et al., 1992: 96%)
Obligation           DUTY               347    96%
Tax                  PRICE,FEE          52     96%
                                        399    96%

SENTENCE (Gale et al., 1992: 90% *)
Punishment           LEGAL ACTION       128    99%
Set of Words         GRAMMAR            213    98%
                                        341    98%

SLUG (Hirst, 1987: N/A *)
Animal               ANIMAL,INSECT      24     100%
Type Strip           PRINTING           8      100%
Mass Unit            WEIGHT             3      100%
Fake Coin            MONEY              2      50%
Metallurgy           IMPULSE,IMPACT     1      100%
Bullet               ARMS               1      100%
                                        39     97%
Notes: 
1) N refers to the total number of each sense observed in
the test corpus. Corr. indicates the percentage of those
tagged correctly. 
2) Because there is no independent ground truth to indicate
which is the "correct" Roget category for a given word, 
the decision is a subjective judgement made by a single 
human judge, in this case the author. 
3) As previously noted, the Roget index is incomplete. In
four cases, identified by *, one missing category has been
added to the list of possibilities for a word. These
omissions in the lexicon have been identified as outlined in
Footnote 4. Without these additions, overall system 
performance would decrease by 5%. 
4) Uses which an English speaker may consider a single 
sense are often realized by several Roget categories. For 
the purposes of succinct representation, such categories 
have been merged, and the name of the dominant category
used in the table. As of this writing, the process has not 
been fully automated. 
For many applications such as speech synthesis and 
assignment to an established dictionary sense number or 
possible French translations, this merging of Roget classes 
is not necessary. 
The primary criterion for success is that words are 
partitioned into pure sense clusters. Words having a 
different sense from the majority sense of a partition are 
graded as errors. 
5) Examples with the annotation 'speech synthesis' have
multiple pronunciations corresponding to sense
distinctions. Their disambiguation is important in speech
processing. 
6) All results are based on 100% recall. 
3. Evaluation 
The algorithm described above was applied to 12 
polysemous words previously discussed in the sense 
disambiguation literature. Table 1 (above) shows
the system's performance. Authors who have discussed
these words are listed in parentheses, along with the
reported accuracy of their systems. Direct comparisons
of performance between researchers are difficult,
compounded by variances in corpora and grading criteria; 
using the same words is an attempt to minimize these 
differences. 
Regrettably, most authors have reported their results in 
qualitative terms. The exceptions include Zernik (1990)
who cited "recall and precision of over 70%" for one
word (interest) and observed that results for other words,
including issue, were "less positive." Clear (1989)
reported results for two words (65% and 67%),
apparently at 85% recall. Lesk (1986) claimed overall
"50-70%" accuracies, although it is unclear under which 
parameters and constraints. In a 5 word test set, Black 
(1988) observed 75% mean accuracy using his optimal 
method on high entropy, 4-way sense distinctions. Hearst 
(1991) achieved 84% on simpler 2-way distinctions, 
editing out additional senses from the test set. Gale, 
Church and Yarowsky (1992) reported 92% accuracy, 
also on 2-way distinctions. 
Our current work compares favorably with these results,
with 92% accuracy on a mean 3-way sense distinction.⁶
The performance is especially promising given that no 
hand tagging or special corpora were required in training, 
unlike all other systems considered. 
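The paper does not state how the 92% figure is aggregated. One reading that appears consistent with it is the unweighted mean of the per-word accuracies in Table 1, sketched below; the dictionary layout and variable names are illustrative, and the sense counts are simply the number of sense rows listed for each word.

```python
# Per-word accuracy (%) and number of listed senses, read from Table 1.
table1 = {
    "star": (96, 3), "mole": (99, 4), "galley": (95, 3), "cone": (77, 3),
    "bass": (99, 2), "bow": (91, 6), "taste": (93, 2), "interest": (72, 4),
    "issue": (94, 3), "duty": (96, 2), "sentence": (98, 2), "slug": (97, 6),
}

# Unweighted means over the 12 test words.
mean_accuracy = sum(acc for acc, _ in table1.values()) / len(table1)
mean_senses = sum(n for _, n in table1.values()) / len(table1)
```

Under this reading the mean accuracy is 92.25% and the mean number of senses per word is about 3.3, in line with the "92% accuracy on a mean 3-way sense distinction" quoted above.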
4. Limitations of the Method 
The procedure described here is based on broad context 
models. It performs best on words with senses which can 
be distinguished by their broad context. These are most 
typically concrete nouns. Performance is weaker on the 
following: 
Topic Independent Distinctions: One of the reasons that 
interest is disambiguated poorly is that it can appear in 
almost any context. While its "curiosity" sense is often 
indicated by the presence of an academic subject or 
hobby, the "advantage" sense (to be in one's interests)
has few topic constraints. Distinguishing between two
such abstractions is difficult.⁷ However, the financial
sense of interest is readily identifiable, and can be
distinguished from the non-financial uses with 92%
accuracy. Other distinctions between topic independent
and topic constrained senses appear successful as well
(e.g. taste, issue, duty and sentence).

6. This result is a fair measure of performance on words used in
previous studies, and may be useful for comparison across systems.
However, as words previously discussed in the literature may not be
representative of typical English polysemy, mean performance on a
completely random set of words should differ.
7. Black (1988) has noted that this distinction for interest is strongly
correlated with the plurality of the word, a feature we currently don't utilize.

Minor Sense Distinctions within a Category: Distinctions 
between the medicinal and narcotic senses of drug are not
captured by the system because they both belong to the 
same Roget category (REMEDY). Similar problems occur 
with the musical senses of bass. Roget's Thesaurus 
offers a rich sub-hierarchy within each category, 
however. Future implementations will likely use this 
information, which is currently ignored. 
Verbs: Verbs have not been considered in this particular 
study, and it appears that they may benefit from more 
local models of their typical arguments. The unmodified 
system does seem to perform well on verbs which show 
clear topic distinctions such as fire. Its weapon, engine,
furnace, employee, imagination and pottery senses have 
been disambiguated with 85% accuracy. 
Pre-Nominal Modifiers: The disambiguation of pre- 
nominal modifiers (adjectives and compound nominals) 
is heavily dependent on the noun modified, and much less 
so on distant context. While class-based Bayesian 
discrimination may be useful here as well, the optimal 
window size is much narrower. 
Idioms: These broad context, topic-based discriminators 
are also less successful in dealing with a word like hand, 
which is usually found in fixed expressions such as on the
other hand and close at hand. These fixed expressions 
have more function than content, and therefore, they do 
not lend themselves to a method that depends on 
differences in content. The situation is far from hopeless, 
as many idioms are listed directly in Roget's Thesaurus 
and can be associated with a category through simple 
table lookup. Other research, such as Smadja and
McKeown (1990), has shown more general ways of
identifying and handling these fixed expressions and 
collocations. 
Given the broad set of issues involved in sense 
disambiguation, it is reasonable to use several specialized 
tools in cooperation. We already handle part of speech
distinctions through other methods; an efficient idiom 
recognizer would be an appropriate addition as well. 
5. Linking Roget Categories with other 
Sense Representations 
The Roget category names tend to be highly mnemonic 
and may well suffice as sense tags. However, one may 
want to link the Roget tags with an established reference 
such as the sense numbers one finds in a dictionary. We 
accomplish this by applying the models described above 
to the text of the definitions in a dictionary, creating a 
table of correspondences between Roget categories and 
sense numbers. Results for the word crane are illustrated 
below for two dictionaries: (1) COBUILD (Sinclair, 
1987), and (2) Collins English Dictionary, First Edition 
(CED1) (Hanks, 1979). 
RCAT      Sense #      Definition
(1) COBUILD
TOOLS     crane 1.1    a machine with a long movable
ANIMAL    crane 1.2    large bird with a long neck and
(2) CED1
ANIMAL    crane 1      any large long-necked long-leg
ANIMAL    crane 2      any similar bird, such as a her
TOOLS     crane 3      a device for lifting and moving
TOOLS     crane 4      a large trolley carrying a boom
It may also be possible to link Roget category tags with 
"natural" sense tags, such as translations in a foreign 
language. We use a word-aligned parallel bilingual 
corpus such as the French-English Canadian Hansards for 
this purpose. For example, consider the polysemous 
word duty which can be translated into French as devoir
or droit, depending on the sense (obligation or tax,
respectively). When the Grolier-trained models are
applied to the English side of the Hansards, the words
tagged PRICE,FEE most commonly aligned with the
French words droits (256), droit (96) and douane (67).
Words labeled DUTY (the Roget category for Obligation)
most frequently aligned with devoir (205). These 
correlations may have useful implications for machine 
translation and bilingual lexicography. 
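The tallying behind such correspondences can be sketched as below. The function name tally_translations and the toy pairs are hypothetical; producing the (tag, translation) pairs requires a word-aligned corpus and the trained category models, both of which are outside this sketch.

```python
from collections import Counter, defaultdict

def tally_translations(tagged_alignments):
    """Count aligned French words for each Roget tag.

    `tagged_alignments` is an iterable of (roget_tag, french_word) pairs,
    one per tagged English token.
    """
    table = defaultdict(Counter)
    for tag, french in tagged_alignments:
        table[tag][french] += 1
    return table

# Toy usage with hypothetical pairs (not the Hansard counts).
pairs = [("PRICE,FEE", "droits"), ("PRICE,FEE", "droits"),
         ("PRICE,FEE", "douane"), ("DUTY", "devoir")]
table = tally_translations(pairs)
top_fee = table["PRICE,FEE"].most_common(1)[0]
```

Sorting each tag's counter by frequency yields the kind of ranked translation list reported above for PRICE,FEE and DUTY.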
6. Other Sense Disambiguation Methods: 
The Knowledge Acquisition Bottleneck 
Word sense disambiguation is a long-standing problem in 
computational linguistics (Kaplan, 1950 ; Yngve, 1955; 
Bar-Hillel, 1960), with important implications for a 
variety of practical applications including speech 
synthesis, information retrieval, and machine translation. 
Most approaches may be characterized by the following
generalizations: 1) They tend to focus on the search for
sets of word-specific features or indicators (typically
words in context) which can disambiguate the senses of a
word. 2) Efforts to acquire these indicators have faced a
knowledge acquisition bottleneck, characterized by either
substantial human involvement for each word, and/or
incomplete vocabulary coverage.
The AI community has enjoyed some success hand- 
coding detailed "word experts" (Small and Rieger, 1982; 
Hirst, 1987), but this labor intensive process has severely
limited coverage beyond small vocabularies. 
Others such as Lesk (1986), Walker (1987), Veronis and 
Ide (1990), and Guthrie et al. (1991) have turned to 
machine readable dictionaries (MRD's) in an effort to 
achieve broad vocabulary coverage. MRD's have the 
useful property that some indicative words for each sense 
are directly available in numbered definitions and 
examples. However, definitions are often too short to
provide an adequate set of indicators, and those words 
which are found lack significance weights to identify 
which are crucial and which are merely chaff. 
Dictionaries provide well structured but incomplete 
information. 
Recently, many have turned to text corpora to broaden 
the range and volume of available examples. Unlike 
dictionaries, however, raw corpora do not indicate which 
sense of a word occurs at a given instance. Several 
researchers (Kelly and Stone, 1975; Black, 1988) have 
overcome this through hand tagging of training examples, 
and were able to discover useful discriminatory patterns 
from the partitioned contexts. This also has proved labor 
intensive. Others (Weiss, 1973; Zernik, 1990; Hearst,
1991) have attempted to partially automate the hand- 
tagging process through bootstrapping. Yet this has still 
required significant human intervention for each word in 
the vocabulary. 
Brown et al. (1991), Dagan (1991), and Gale et al. (1992)
have looked to parallel bilingual corpora to further 
automate training set acquisition. By identifying word 
correspondences in a bilingual text such as the Canadian 
Parliamentary Proceedings (Hansards), the translations 
found for each English word may serve as sense tags. For
example, the senses of sentence may be identified 
through their correspondence in the French to phrase 
(grammatical sentence) or peine (legal sentence). While 
this method has been used successfully on a portion of 
the vocabulary, its coverage is also limited. Currently 
available bilingual corpora lack size or diversity: over 
half of the words considered in this study either never
appear in the Hansards or lack examples of secondary 
senses. More fundamentally, many words are mutually 
ambiguous across languages. French would be of little 
use in disambiguating the word interest, as all major 
senses translate as intérêt. More promising is a
non-Indo-European language such as Japanese, which should avoid
such mutual ambiguity for etymological reasons. Until 
more diverse, large bilingual corpora become available, 
the coverage of these methods will remain limited. 
Each of these approaches has faced a fundamental
obstacle: word sense is an abstract concept that is not
identified in natural text. Hence any system which hopes
to acquire discriminators for specific senses of a word 
will need to isolate samples of those senses. While this 
process has been partially automated, it appears to require 
substantial human intervention to handle an unrestricted 
vocabulary. 
7. Conclusion 
This paper has described an approach to word sense 
disambiguation using statistical models of word classes. 
This method overcomes the knowledge acquisition 
bottleneck faced by word-specific sense discriminators. 
By entirely circumventing the issue of polysemy 
resolution in training material acquisition, the system has 
acquired an extensive set of sense discriminators from 
unrestricted monolingual texts without human
intervention. Class models also offer the additional 
advantages of smaller model storage requirements and 
increased implementation efficiency due to reduced 
dimensionality. Also, they can correctly identify a word 
sense which occurs rarely or only once in the corpus -- 
performance unattainable by statistically trained word- 
specific models. These advances are not without cost, as 
class-based models have diluted discriminating power 
and may not capture highly indicative collocations 
specific to only one word. Despite the inherent 
handicaps, the system performs better than several 
previous approaches, based on a direct comparison of 
results for the same words. 
8. Acknowledgements 
Special thanks are due to Ken Church and Barbara Grosz 
for their invaluable help in restructuring this paper, and to 
Bill Gale for the theoretical foundations on which this
work rests. The author is also grateful to Marti Hearst,
Fernando Pereira, Donald Hindle, Richard Sproat, and
Michael Riley for their comments and suggestions. 

References 

Bar-Hillel (1960), "Automatic Translation of Languages," in Advances
in Computers, Donald Booth and R. E. Meagher, eds., Academic, New
York.

Black, Ezra (1988), "An Experiment in Computational Discrimination
of English Word Senses," IBM Journal of Research and Development,
v 32, pp 185-194.

Brown, Peter, Stephen Della Pietra, Vincent Della Pietra, and Robert
Mercer (1991), "Word Sense Disambiguation using Statistical
Methods," Proceedings of the 29th Annual Meeting of the Association
for Computational Linguistics, pp 264-270.

Brown, Peter, Vincent Della Pietra, Peter deSouza, and Robert Mercer
(1990), "Class-based n-gram Models of Natural Language,"
Proceedings of the IBM Natural Language ITL, Paris, France, pp 283-298.

Chapman, Robert (1977), Roget's International Thesaurus (Fourth
Edition), Harper and Row, New York.

Choueka, Yaacov, and Serge Lusignan (1985), "Disambiguation by
Short Contexts," Computers and the Humanities, v 19, pp 147-158.

Church, Kenneth (1989), "A Stochastic Parts Program and Noun Phrase
Parser for Unrestricted Text," Proceedings, IEEE International
Conference on Acoustics, Speech and Signal Processing, Glasgow.

Clear, Jeremy (1989), "An Experiment in Automatic Word Sense
Identification," Internal Document, Oxford University Press, Oxford.

Cottrell, Garrison (1989), A Connectionist Approach to Word Sense
Disambiguation, Pitman, London.

Dagan, Ido, Alon Itai, and Ulrike Schwall (1991), "Two Languages are
more Informative than One," Proceedings of the 29th Annual Meeting
of the Association for Computational Linguistics, pp 130-137.

Gale, William, Kenneth Church, and David Yarowsky (1992),
"Discrimination Decisions for 100,000-Dimensional Spaces," AT&T
Statistical Research Report No. 103.

Gale, William, Kenneth Church, and David Yarowsky (1992), "A
Method for Disambiguating Word Senses in a Large Corpus," to appear
in Computers and Humanities.

Granger, Richard (1977), "FOUL-UP: A program that figures out
meanings of words from context," IJCAI-77, pp 172-178.

Guthrie, J., L. Guthrie, Y. Wilks, and H. Aidinejad (1991), "Subject-
Dependent Co-occurrence and Word Sense Disambiguation,"
Proceedings of the 29th Annual Meeting of the Association for
Computational Linguistics, pp 146-152.

Hanks, Patrick (ed.) (1979), Collins English Dictionary, Collins,
London and Glasgow.

Hearst, Marti (1991), "Noun Homograph Disambiguation Using Local
Context in Large Text Corpora," Using Corpora, University of
Waterloo, Waterloo, Ontario.

Hirst, Graeme (1987), Semantic Interpretation and the Resolution of
Ambiguity, Cambridge University Press, Cambridge.

Kaplan, Abraham (1950), "An Experimental Study of Ambiguity in
Context," cited in Mechanical Translation, v. 1, nos. 1-3.

Kelly, Edward, and Phillip Stone (1975), Computer Recognition of
English Word Senses, North-Holland, Amsterdam.

Lesk, Michael (1986), "Automatic Sense Disambiguation: How to tell a
Pine Cone from an Ice Cream Cone," Proceeding of the 1986 SIGDOC
Conference, Association for Computing Machinery, New York.

Miller, George (1990), "WordNet: An On-line Lexical Database,"
International Journal of Lexicography, 4(3), 1990. (Special Issue).

Mosteller, Frederick, and David Wallace (1964), Inference and Disputed
Authorship: The Federalist, Addison-Wesley, Reading, Massachusetts.

Salton, G. (1989), Automatic Text Processing, Addison-Wesley
Publishing Co.

Sinclair, J., Hanks, P., Fox, G., Moon, R., Stock, P. et al. (eds.) (1987),
Collins Cobuild English Language Dictionary, Collins, London and
Glasgow.

Slator, Brian (1991), "Using Context for Sense Preference," in Zernik
(ed.) Lexical Acquisition: Exploiting On-Line Resources to Build a
Lexicon, Lawrence Erlbaum, Hillsdale, NJ.

Smadja, F. and K. McKeown (1990), "Automatically Extracting and
Representing Collocations for Language Generation," Proceedings of
the 28th Annual Meeting of the Association for Computational
Linguistics.

Small, S. and C. Rieger (1982), "Parsing and Comprehending with
Word Experts (A Theory and its Realization)," in Strategies for Natural
Language Processing, W. Lehnert and M. Ringle, eds., Lawrence
Erlbaum Associates, Hillsdale, NJ.

Walker, Donald (1987), "Knowledge Resource Tools for Accessing
Large Text Files," in Machine Translation: Theoretical and
Methodological Issues, Sergei Nirenburg, ed., Cambridge University
Press, Cambridge, England.

Weiss, Stephen (1973), "Learning to Disambiguate," Information
Storage and Retrieval, v. 9, pp 33-41.

Veronis, Jean and Nancy Ide (1990), "Word Sense Disambiguation with
Very Large Neural Networks Extracted from Machine Readable
Dictionaries," in Proceedings COLING-90, pp 389-394.

Yngve, Victor (1955), "Syntax and the Problem of Multiple Meaning,"
in Machine Translation of Languages, William Locke and Donald
Booth, eds., Wiley, New York.

Zernik, Uri (1990), "Tagging Word Senses in a Corpus: The Needle in
the Haystack Revisited," in Text-Based Intelligent Systems: Current
Research in Text Analysis, Information Extraction, and Retrieval, P.S.
Jacobs, ed., GE Research & Development Center, Schenectady, New
York.
