A LEXICON OF DISTRIBUTED NOUN REPRESENTATIONS 
CONSTRUCTED BY TAXONOMIC TRAVERSAL 
Richard F.E. Sutcliffe¹, Donie O'Sullivan, Fergus Meharg
Department of Computer Science and Information Systems
University of Limerick, Ireland 
1 INTRODUCTION
In order to construct systems which can process natural language in a
sophisticated fashion it is highly desirable to be able to represent
linguistic meanings in a computationally tractable fashion. One approach
to the problem of capturing meanings at the lexical level is to use a
form of distributed representation where each word meaning is converted
into a point in an n-dimensional space (Sutcliffe, 1992a). Such
representations can capture a wide variety of word meanings within the
same formalism. In addition they can be used within distributed
representations for capturing higher level information such as that
expressed by sentences (Sutcliffe, 1991a). Moreover, they can be scaled
to suit a particular tradeoff of specificity and memory usage
(Sutcliffe, 1991b). Finally, distributed representations can be
processed conveniently by vector processing methods or connectionist
algorithms and can be used either as part of a symbolic system
(Sutcliffe, 1992b) or within a connectionist architecture (Sutcliffe,
1988). In previous work we have shown how such representations can be
constructed automatically by the method of taxonomic traversal, using
the Merriam Webster Compact Electronic dictionary (Sutcliffe, 1993) and
the Irish-Irish An Foclóir Beag (Sutcliffe, McElligott and Ó Néill,
1993). However, our efforts so far have been limited by our parsing
technology to lexicons of a few thousand words. We describe here how we
can generate a lexical entry for any of the 71,000 nouns² in the
Princeton WordNet (Beckwith, Fellbaum, Gross and Miller, 1992), and the
initial tests we have conducted on the representations.
Our method is closely related to other work which exploits the taxonomic
nature of dictionary definitions (Amsler, 1980; Hiedorn, Byrd and
Chodorow, 1986; Vossen, 1990; Guthrie, Slator, Wilks and Bruce, 1990;
Nutter, Fox and Evens, 1990). In addition there have already been some
very interesting approaches to the construction of distributed semantic
representations either from dictionaries (Wilks et al., 1990) or from
corpora (Schuetze, 1993).
¹ This research was supported in part by the European Union under
contract LRE-62030 and by the National Software Directorate of Ireland.
We are indebted to Tony Molloy, Redmond O'Brien and Gemma Ryan for
their help with this work.
² This figure includes hyphenated terms, compound nouns and proper
names.
2 EXTRACTING FEATURE REPRESENTATIONS
The object of our work is to produce for each noun sense in a lexicon a
semantic representation consisting of a set of feature-centrality pairs.
The features are semantic attributes each of which says something about
the concept being defined. The centrality associated with each feature
is a real number which indicates how strongly the feature contributes to
the meaning of the concept. The use of centralities allows us to
distinguish between important and less important features in a semantic
representation. By scaling the centralities in a particular noun-sense
representation so that the sum of their squares is one we can use the
dot product operation to compute the semantic similarity of a pair of
concepts. A word compared to itself always scores one while a word
compared to another word always scores less than or equal to one. This
is equivalent to saying that each word representation is a vector of
length one in an n-dimensional space, where n is the number of features
which are used in the lexicon as a whole.
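
The normalisation and dot-product comparison described above can be sketched as follows. This is only an illustrative sketch in Python, not the implementation used in this work; the function names `normalise` and `similarity` and the toy feature values are assumptions introduced for the example.

```python
import math

def normalise(rep):
    """Scale centralities so that the sum of their squares is one."""
    length = math.sqrt(sum(c * c for c in rep.values()))
    if length == 0:
        return dict(rep)  # nothing to scale in an empty representation
    return {feature: c / length for feature, c in rep.items()}

def similarity(a, b):
    """Dot product of two sparse feature-centrality representations."""
    return sum(c * b[feature] for feature, c in a.items() if feature in b)

# Hypothetical toy representations (feature -> centrality); these values
# are invented for illustration and are not taken from the lexicon.
mastiff = normalise({"large": 1.0, "dog": 1.0, "mammal": 0.9})
terrier = normalise({"small": 1.0, "dog": 1.0, "mammal": 0.9})

self_score = similarity(mastiff, mastiff)   # unit vector against itself
cross_score = similarity(mastiff, terrier)  # shares 'dog' and 'mammal' only
print(self_score, cross_score)
```

Storing each representation as a sparse mapping means that only the features two words share contribute to the score, so the size of the whole feature space n never has to be materialised.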
Our algorithm for constructing the representations is based on two
well-known observations. Firstly, a word definition in a dictionary
provides attribute information about the concept ('a mastiff is a LARGE
dog'). Secondly, a word definition also provides taxonomic information
about the concept ('a mastiff is a large DOG'). We use the former to
derive attributes for our representation, and the latter to obtain other
definitions higher up in the taxonomy from which further attributes can
be obtained. In assigning centralities to features, we use the same
value for each attribute added at a particular level in the taxonomic
hierarchy, and we reduce the value used as we move up to higher levels.
This corresponds to the intuition that a feature which is derived from a
definition which is close to the word of interest in the taxonomy
contributes more to its meaning than one which is derived from a more
distant definition.
The Princeton WordNet is very suitable for use in implementing our
extraction algorithm because taxonomic links are represented explicitly
by pointers. In most machine-readable dictionaries (MRDs) such links
have to be deduced by syntactic and semantic analysis of sense
definitions. Nouns in WordNet are organised around synsets. Each synset
may include a list of synonyms, pointers to hyponym and hypernym
synsets, and a gloss corresponding to a conventional dictionary
definition.
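
As a rough picture of the information a synset carries, the sketch below models one as a small Python record. The class name `Synset`, its field names and the helper `link` are assumptions made for illustration; they do not reflect the actual WordNet file format or any particular WordNet library.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Synset:
    """Hypothetical in-memory view of a WordNet noun synset."""
    synonyms: List[str]
    gloss: str = ""  # some synsets carry no gloss
    hypernyms: List["Synset"] = field(default_factory=list)
    hyponyms: List["Synset"] = field(default_factory=list)

def link(child, parent):
    """Record the taxonomic pointer in both directions."""
    child.hypernyms.append(parent)
    parent.hyponyms.append(child)

# Two synsets from the 'terrier' hierarchy, with glosses as given by WordNet.
hunting_dog = Synset(["hunting dog"], "a dog used in hunting game")
terrier = Synset(
    ["terrier"],
    "any of several usu. small short-bodied breeds originally trained "
    "to hunt animals living underground",
)
link(terrier, hunting_dog)
```

Because the hypernym pointer is stored explicitly, moving up the taxonomy is a simple lookup (`terrier.hypernyms[0]`) rather than a parse of the definition text.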
Figure 1. Synset Hierarchy for the word 'terrier' derived from Princeton WordNet.
1 sense of terrier 
Sense 1
terrier -- 
(any of several usu. small short-bodied breeds originally trained 
to hunt animals living underground) 
=> hunting dog -- 
(a dog used in hunting game) 
=> domestic dog, pooch, Canis familiaris -- 
(domesticated mammal prob. descended from the common wolf; occurs in 
many breeds) 
=> dog 
=> canine, canid -- 
(any of various fissiped mammals with nonretractile claws and 
typically long muzzles) 
=> carnivore -- 
(terrestrial or aquatic flesh-eating mammal; terrestrial carnivores 
have four or five clawed digits on each limb) 
=> placental mammal, eutherian, eutherian mammal 
=> mammal -- 
(any warm-blooded vertebrate that nourish their young with milk 
and having the skin more or less covered with hair; young are 
born alive except for the small subclass of monotremes) 
=> vertebrate, craniate -- 
(animals having a bony or cartilagenous skeleton with a 
segmented spinal column and a large brain enclosed in a skull 
or cranium) 
=> chordate 
=> animal, animate being, beast, brute, creature, fauna -- 
(a living organism characterized by voluntary movement) 
=> life form, organism, being, living thing -- 
(any living entity) 
=> entity -- 
(something having concrete existence; living or nonliving) 
Table 1. Twenty words in five categories
cars        dogs        flowers     trees       people
chariot     pug         pansy       larch       bruiser
motorbike   terrier     daffodil    pine        patriarch
jeep        lapdog      tulip       oak         siren
moped       chihuahua   rose        sycamore    rake
Table 2. Lexical Representation Summary
No of words                  20
Total number of features    249
Average number of features   39
Minimum                      17
Maximum                      76
The extraction algorithm starts with the synset corresponding to the
word-sense for which we wish to create a lexical entry. The gloss is
tokenised, function words are removed and the remaining content words
are converted to their root inflection. All such words are considered to
be features of the word-sense, and are given a centrality of 1.0. We
then chain upwards using a hypernymic link (if any)³. At the next level
up, features are extracted from the hypernym's gloss, using a centrality
of 0.9. The process is repeated, reducing the centrality by 0.1 at each
level, until either the top of the hierarchy is reached or the
centrality falls to zero. Finally, the representation, consisting of a
set of feature-centrality pairs, is normalised.
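
The traversal just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the system itself: the stop list `STOP_WORDS` is invented, conversion of words to their root inflection is omitted, the hypernym chain is supplied as a ready-made list of glosses, and a feature met at more than one level keeps the centrality of the level nearest the word (a choice the text does not specify).

```python
import math

# Assumed, deliberately tiny stop list; the actual function-word list is not given.
STOP_WORDS = {"of", "to", "the", "and", "or", "from"}

def tokenise(gloss):
    """Lower-case a gloss and split it into alphabetic (possibly hyphenated) tokens."""
    cleaned = "".join(ch if ch.isalpha() or ch == "-" else " " for ch in gloss.lower())
    return [tok for tok in cleaned.split() if tok.strip("-")]

def extract(glosses, start=1.0, step=0.1):
    """Build a normalised feature-centrality mapping.

    `glosses` holds the gloss of the word-sense first, then the glosses met
    while following hypernym links upwards (use "" for a synset with no gloss).
    """
    rep = {}
    for level, gloss in enumerate(glosses):
        centrality = round(start - step * level, 10)
        if centrality <= 0:
            break  # centrality has fallen to zero
        for word in tokenise(gloss):
            if word not in STOP_WORDS:
                rep.setdefault(word, centrality)  # keep the nearest level's value
    length = math.sqrt(sum(c * c for c in rep.values()))
    if length == 0:
        return rep
    return {feature: c / length for feature, c in rep.items()}

# The first two levels of the 'terrier' hierarchy shown in Figure 1.
glosses = [
    "any of several usu. small short-bodied breeds originally trained "
    "to hunt animals living underground",
    "a dog used in hunting game",
]
rep = extract(glosses)
print(sorted(rep.items(), key=lambda item: -item[1])[:5])
```

After normalisation the absolute centralities shrink, but the ratio between levels is preserved: a feature from the hypernym's gloss carries 0.9 times the weight of a feature from the word's own gloss.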
3 RESULTS 
The algorithm described above has been implemented and can be used to
construct a lexical entry for any of the nouns in the WordNet database.
Figure 1 shows the synset hypernym hierarchy for the word 'terrier' in
WordNet. Figure 2 shows the semantic representation derived by the
algorithm for this word. We present here some preliminary experiments
which attempt to measure the performance of the lexicon. Four words were
chosen from each of five categories of noun which we label cars, dogs,
flowers, trees and people. These are shown in Table 1. Table 2 shows a
summary of the characteristics of the word representations in the set of
twenty words. Pairs of categories were chosen, cars-dogs, flowers-trees
and so on, each containing eight words. A series of eight-by-eight
tables was then computed, showing the dot product of each word with
every other word in the category pair.

Table 3 shows the results for the cars-dogs matrix. There are several
points to note about this table. Firstly, the match of one car word with
another is high, ranging between 0.58 and 1.0 with an average of 0.8.
This shows that the lexicon has captured the similarity between the car
concepts. Secondly, the match of one dog word with another is also high,
ranging between 0.63 and 1.0 with an average of 0.76, for the same
reason. Thirdly, the match of a car word with a dog word is low, ranging
between 0.05 and 0.17 with an average of 0.1. This is because cars and
dogs are not closely linked semantically.

Table 4 shows results for the flowers-trees matrix. Flowers and trees
are much more closely related semantically than cars and dogs, and this
is reflected in the results. Flower words match with tree words in a
range of 0.30 to 0.67 with an average of 0.4, much higher than for cars
and dogs. The match of flowers with flowers or trees with trees
continues to be high.

Finally, Table 5 shows the people-dogs matrix. Note here that the match
of people with themselves is lower than that of dogs with themselves
(average 0.63 rather than average 0.76). This is because the people
words are in fact a rather disparate set. Note in particular that
'bruiser' against 'rake' is the best match while 'bruiser' against
'patriarch' is the worst. This matches one's intuitions about these
concepts: patriarchs are "good" while 'bruisers' and 'rakes' are not.

³ At present, we only choose the first such link if there are several.
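
The category averages quoted for the cars-dogs matrix can be rechecked directly from the entries of Table 3. The short Python sketch below does this; the helper name `block_average` is an assumption for illustration. Recomputing in this way suggests that the reported within-category averages include the self-matches of 1.00 on the diagonal.

```python
# Dot products copied from Table 3; rows and columns follow the same order:
# chariot, motorbike, jeep, moped, pug, terrier, lapdog, chihuahua.
table3 = [
    [1.00, 0.74, 0.58, 0.73, 0.13, 0.17, 0.14, 0.09],
    [0.74, 1.00, 0.69, 1.00, 0.11, 0.11, 0.11, 0.06],
    [0.58, 0.69, 1.00, 0.68, 0.08, 0.09, 0.09, 0.05],
    [0.73, 1.00, 0.68, 1.00, 0.10, 0.10, 0.11, 0.05],
    [0.13, 0.11, 0.08, 0.10, 1.00, 0.68, 0.65, 0.69],
    [0.17, 0.11, 0.09, 0.10, 0.68, 1.00, 0.63, 0.72],
    [0.14, 0.11, 0.09, 0.11, 0.65, 0.63, 1.00, 0.67],
    [0.09, 0.06, 0.05, 0.05, 0.69, 0.72, 0.67, 1.00],
]

def block_average(matrix, rows, cols):
    """Mean of the sub-matrix selected by the given row and column indices."""
    values = [matrix[r][c] for r in rows for c in cols]
    return sum(values) / len(values)

cars, dogs = range(0, 4), range(4, 8)
cars_cars = block_average(table3, cars, cars)  # within-category, diagonal included
dogs_dogs = block_average(table3, dogs, dogs)
cars_dogs = block_average(table3, cars, dogs)  # across categories
print(cars_cars, dogs_dogs, cars_dogs)
```

The same helper applies unchanged to the flowers-trees and people-dogs matrices, since each is laid out as two four-word blocks.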
4 CONCLUSIONS 
We have presented a simple algorithm which allows a set of distributed
lexical semantic representations to be constructed from nouns in the
Princeton WordNet. The preliminary results indicate that the method
works: words from the same category match strongly, while words from
unrelated categories match weakly. The main reason for this is the
explicit taxonomic information in WordNet which has to be inferred in
other dictionaries. Incorrect taxonomic information seriously degrades
the performance of this kind of method. On the other hand errors in
individual features are not so harmful as they have no knock-on effects.
However, we are engaged in eliminating errors in word sense and
syntactic category which are the principal sources of inaccuracy in the
method. In addition we are working on objective methods for measuring
the performance of the lexicon on a large scale.
5 REFERENCES
Amsler, R.A. (1984). Machine-Readable Dictionaries. Annual Review of
Information Science and Technology (ARIST), 19, 161-209.
Beckwith, R., Fellbaum, C., Gross, D., & Miller, G.A. (1992). WordNet:
A Lexical Database Organised on Psycholinguistic Principles. In U.
Zernik (Ed.) Using On-line Resources to Build a Lexicon. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Evens, M. (1989). Computer-Readable Dictionaries. Annual Review of
Information Science and Technology (ARIST), 24, 85-117.
Guthrie, L., Slator, B.M., Wilks, Y. and Bruce, R. (1990). Is there
Content in Empty Heads? In Proceedings of the 13th International
Conference on Computational Linguistics (COLING-90), Helsinki, Finland,
3, 138-143.
Hiedorn, Byrd and Chodorow (1986).
Lesk, M. (1986). Automated Word Sense Disambiguation using
Machine-Readable Dictionaries:
Table 3. Cars vs. Dogs
             chariot motorbike      jeep     moped       pug   terrier    lapdog chihuahua
chariot         1.00      0.74      0.58      0.73      0.13      0.17      0.14      0.09
motorbike       0.74      1.00      0.69      1.00      0.11      0.11      0.11      0.06
jeep            0.58      0.69      1.00      0.68      0.08      0.09      0.09      0.05
moped           0.73      1.00      0.68      1.00      0.10      0.10      0.11      0.05
pug             0.13      0.11      0.08      0.10      1.00      0.68      0.65      0.69
terrier         0.17      0.11      0.09      0.10      0.68      1.00      0.63      0.72
lapdog          0.14      0.11      0.09      0.11      0.65      0.63      1.00      0.67
chihuahua       0.09      0.06      0.05      0.05      0.69      0.72      0.67      1.00
Table 4. Flowers vs. Trees
               pansy  daffodil     tulip      rose     larch      pine       oak  sycamore
pansy           1.00      0.32      0.36      0.49      0.37      0.32      0.37      0.28
daffodil        0.32      1.00      0.70      0.37      0.38      0.33      0.37      0.39
tulip           0.36      0.70      1.00      0.41      0.39      0.33      0.37      0.30
rose            0.49      0.37      0.41      1.00      0.56      0.58      0.67      0.44
larch           0.37      0.38      0.39      0.56      1.00      0.83      0.74      0.64
pine            0.32      0.33      0.33      0.58      0.83      1.00      0.83      0.62
oak             0.37      0.37      0.37      0.67      0.74      0.83      1.00      0.60
sycamore        0.28      0.39      0.30      0.44      0.64      0.62      0.60      1.00
Table 5. People vs. Dogs
             bruiser patriarch     siren      rake       pug   terrier    lapdog chihuahua
bruiser         1.00      0.40      0.52      0.63      0.12      0.15      0.13      0.08
patriarch       0.40      1.00      0.40      0.55      0.15      0.18      0.16      0.17
siren           0.52      0.40      1.00      0.50      0.14      0.17      0.14      0.09
rake            0.63      0.55      0.50      1.00      0.12      0.15      0.13      0.08
pug             0.12      0.15      0.14      0.12      1.00      0.68      0.65      0.69
terrier         0.15      0.18      0.17      0.15      0.68      1.00      0.63      0.72
lapdog          0.13      0.16      0.14      0.13      0.65      0.63      1.00      0.67
chihuahua       0.08      0.17      0.09      0.08      0.69      0.72      0.67      1.00
repn( terrier, '(any of several usu. small short-bodied breeds originally trained to
hunt animals living underground)', [[any, 0.19], [several, 0.19], [small, 0.19],
[breed, 0.19], [originally, 0.19], [trained, 0.19], [hunt, 0.19], [animal, 0.19],
[living, 0.19], [underground, 0.19], [a, 0.17], [dog, 0.17], [used, 0.17], [in, 0.17],
[hunting, 0.17], [game, 0.17], [domesticated, 0.15], [mammal, 0.15], [descend, 0.15],
[common, 0.15], [wolf, 0.15], [occur, 0.15], [many, 0.15], [various, 0.11],
[fissiped, 0.11], [with, 0.11], [nonretractile, 0.11], [claw, 0.11], [typically, 0.11],
[long, 0.11], [muzzle, 0.11], [terrestrial, 0.096], [aquatic, 0.096],
['flesh-eating', 0.096], [carnivore, 0.096], [have, 0.096], [four, 0.096],
[five, 0.096], [clawed, 0.096], [digit, 0.096], [on, 0.096], [each, 0.096],
[limb, 0.096], ['warm-blooded', 0.057], [vertebrate, 0.057], [nourish, 0.057],
[young, 0.057], [milk, 0.057], [skin, 0.057], [more, 0.057], [less, 0.057],
[covered, 0.057], [hair, 0.057], [are, 0.057], [born, 0.057], [alive, 0.057],
[except, 0.057], [monotreme, 0.057], [bony, 0.038], [skeleton, 0.038],
[segment, 0.038], [spinal, 0.038], [column, 0.038], [large, 0.038], [brain, 0.038],
[enclosed, 0.038], [skull, 0.038], [cranium, 0.038], [organism, 0.01],
[characterized, 0.01], [voluntary, 0.01], [movement, 0.01], [entity, 0.01],
[concrete, 0.01], [existence, 0.01], [nonliving, 0.01]] ).
Figure 2. The semantic representation for 'terrier' produced by the algorithm. 
How to Tell a Pine Cone from an Ice Cream Cone. Proceedings of the 1986
SIGDOC Conference.
Ide, N.M. and Veronis, J. (1990). Very Large Neural Networks for Word
Sense Disambiguation. Proceedings of the European Conference on
Artificial Intelligence, ECAI'90, Stockholm, August 1990.
Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D. and Miller, K.
(1992). Introduction to WordNet: An On-line Lexical Database.
Manuscript.
Nutter, J.T., Fox, E.A. and Evens, M.W. (1990). Building a Lexicon from
Machine Readable Dictionaries for Improved Information Retrieval.
Literary and Linguistic Computing, 5, 2, 129-138.
Schuetze, H. (1993). Translation by Confusion. In Proceedings of the
AAAI-93 Spring Symposium Series: Building Lexicons for Machine
Translation, March 23-25, 1993, Stanford University, CA.
Sharkey, N.E., Day, P.A. and Sharkey, A.J.C. (1991). A Connectionist
Machine Tractable Dictionary: The Very Idea. In C.-M. Guo (Ed.) Machine
Readable Dictionaries. Norwood, NJ: Ablex.
Sutcliffe, R.F.E. (1988). A Parallel Distributed Processing Approach to
the Representation of Knowledge for Natural Language Understanding.
Unpublished doctoral thesis, University of Essex, UK.
Sutcliffe, R.F.E. (1991a). Distributed Representations in a Text Based
Information Retrieval System: A New Way of Using the Vector Space Model.
In Proceedings of the Fourteenth Annual International ACM/SIGIR
Conference on Research and Development in Information Retrieval,
Chicago, IL, October 13-16, 1991. New York, NY: ACM Press, pp. 123-132.
Sutcliffe, R.F.E. (1991b). Distributed Subsymbolic Representations for
Natural Language: How Many Features Do You Need? Proceedings of the 3rd
Irish Conference on Artificial Intelligence and Cognitive Science, 20-21
September 1990, University of Ulster at Jordanstown, Northern Ireland.
Berlin, FRG, Heidelberg, FRG, New York, NY: Springer Verlag.
Sutcliffe, R.F.E. (1992a). Representing Meaning using Microfeatures. In
R. Reilly and N.E. Sharkey (eds) Connectionist Approaches to Natural
Language Processing. Englewood Cliffs, NJ: Lawrence Erlbaum Associates.
Sutcliffe, R.F.E. (1992b). PELICAN: A Prototype Information Retrieval
System using Distributed Propositional Representations. To Appear in H.
Sorensen (Ed.) Proceedings of AICS-91 - The Fourth Irish Conference on
Artificial Intelligence and Cognitive Science, University College Cork,
19-20 September 1991. London, UK, Berlin, FRG, Heidelberg, FRG, New
York, NY: Springer-Verlag.
Sutcliffe, R.F.E. (1993). Constructing Distributed Semantic Lexical
Representations using a Machine Readable Dictionary. In K.T. Ryan and
R.F.E. Sutcliffe (Eds.) Proceedings of AICS-92 - The Fifth Irish
Conference on Artificial Intelligence and Cognitive Science, University
of Limerick, 10-11 September 1992. London, UK, Berlin, FRG, Heidelberg,
FRG, New York, NY: Springer-Verlag.
Sutcliffe, R.F.E., McElligott, A., & O'Neill, G. (1993). Using
Distributed Patterns as Language Independent Lexical Representations. In
Proceedings of the AAAI-93 Spring Symposium Series: Building Lexicons
for Machine Translation, March 23-25, 1993, Stanford University, CA.
Vossen, P. (1990). The End of the Chain: Where Does Decomposition of
Lexical Knowledge Lead us Eventually? Esprit BRA-3030 ACQUILEX WP 010.
English Department, University of Amsterdam, The Netherlands.
Wilks, Y., Fass, D., Guo, C.-M., Macdonald, J., Plate, T. and Slator, B.
(1990). Providing Machine Tractable Dictionary Tools. Machine
Translation, 5, 99-154.
Table 6. Word-Word Summary
Cars-Cars average        0.80
Cars-Dogs average        0.10
Dogs-Dogs average        0.76
Flowers-Flowers average  0.58
Trees-Flowers average    0.40
Trees-Trees average      0.78
People-People average    0.63
Dogs-Dogs average        0.76
People-Dogs average      0.14
