<?xml version="1.0" standalone="yes"?>
<Paper uid="C94-2132">
  <Title>A LEXICON OF DISTRIBUTED NOUN REPRESENTATIONS CONSTRUCTED BY TAXONOMIC TRAVERSAL</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
A LEXICON OF DISTRIBUTED NOUN REPRESENTATIONS
CONSTRUCTED BY TAXONOMIC TRAVERSAL
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
1 INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> In order to construct systems wlfich can pro('css natural language in a sophisticated fashion it is highly desirable to be able to rel)resent linguistic meanings in a comlmtationally tractable fashion. One a,ppro~ch to the problem of capturing meanings ~tt the lcxi&lt;:al level is to use a form of distributed representation where each word meaning is converted into a point in an n-dimenskmal space (Sutcliffc, 1992a). Such rel)resentations can capture a wide variety of word meanings within the same forlnalism. In addition they can be used within distributed representations \[br capturing higher level information such as that expressed I)y sentences (Sutcliffc, 1991a). Moreover, they can be scaled to suit a particular tradeoil&amp;quot; of speciticity and memory usage (Sutclilfi~, 1991b). Hnally, distributed representations can be processed conwmiently by vector processing methods or connectionist algorithms and can be used either as part of a symbolic system (Sutclitl~, 1992b) or within a eonnectionist architecture (Sutcliffe, 1988). In previous work we have shown how such representations can be constructed automatically by the method of taxonomic tr~Lversal, using tire Merriam Webster Compact Electronic die-tionary (Sutcliffe, 1993) ~md the Irish-Irish An Focldir Beag (Sutcliffe, McElligott and O Ndill, 1993). Ilow ever our efforts so far have I)een limited by our parsing technology to lexicons of a few thousand words. We describe here how we can gel|er;M,e a lexical entry for any of the 71,000 nouns 2 in the Princeton WordNet (Beckwith, Fetlbaum, (\]ross mM Miller, 1992), and the initial tests we have conducted on the representations. null Our method is closely related to other work which exploits the taxonomic nature of dictionary detinitions (Amsler, 1980; Iliedorn, Byrd and (~hodorow, t986; Vossen, 1990; Guthrie, Slator, Wilks and I/ruce, 1990; Nutter, Fox and Evens, 1990). In addition there.</Paragraph>
    <Paragraph position="1"> have already been some very interesting al)l)roaehes to the construction of distributed semantic representations either from dicl, ionaries (Wilks el, el., t990) or fl'om corpora (Schuetze, 1993).</Paragraph>
    <Paragraph position="2"> 1 This research was support~ed in l)art by the I!\]m'Ol)Can Union under (:ontract~ 1,1{E-6203(1 and by the National Software \])irectoratc of h'chmd. Wc are indebted to Tony Molloy, HedmolM O'Brien and Gcmn~a ltyan i~:n' t.heir help with this work.</Paragraph>
    <Paragraph position="3"> 2'l'his figure includes hyphenat~cd t.erms, COml)ound nouns ~klld Ill'Opel&amp;quot; l\[&amp;tl\[ll~s,</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="828" type="metho">
    <SectionTitle>
2 EXTRACTING FEATURE REPRESENTATIONS
</SectionTitle>
    <Paragraph position="0"> 'l'he object of our work is to produce for each n()un sense in a lexicon a semantic reprcscntation consist.</Paragraph>
    <Paragraph position="1"> ing of a set of tim.lure-centrality p~firs. The lhatures are semantic attributes each of which says something about the concept being defined. 'Phe centrality associated with each feature is ;u real mmd)er which in-dicatcs how strongly the feature contributes to the meaning of the concept. The use of centralities allows us to distinguish I)etwecn important and less impor taut ti;atures in a semantic rel)resentation. By scaling the centralities in a particular noun-sense representation so that tire stun of their squares is one we can use the (lot product ol)eration to compute the sen,an tic simila.rity of a pair of coneel)ts. A word COlnpared to itself always scores OlLe while a word compared to another word is always less than or equal to one. This is equivalent to saying that each word representation is a vector of length one in an n-dimensional space, where n is the nmnber of features which are used in the lexicon as a whole.</Paragraph>
    <Paragraph position="2"> Our algorithm for constructing the representations is based on two well-known observations. Firstly, a word definition in a dictionary provides attribute in-I'ormation about the COlmept ('a ilia.still' is a LAR(31'; dog'). Secondly a word delinition also provides tax(&gt; nomic information about the concept ('a mastiff is i~ large DOG'). We use the former to derive attributes for our representation, and the latter to ol)tain other definitions higher up in the taxonomy from which fur-ther attril)utes can be obtained. In assigning central ities to %aturcs, we use the same value for each attribute added at a particuli~r level in the taxonomic hierarchy, and we reduce the value used as we move u 1) to higher levels. This corresponds I,o the intuition that a feature which is derived from a delinitioa which is close to the word of interest in the taxonomy con tri/)utes more to its meaning than one which is derived from a more distanu detinition.</Paragraph>
    <Paragraph position="3"> The Princeton WordNet is very suitable lbr use in iml)lementing our extraction algorithm because taxo nomic links are represented explicitly by pointers. In most Ml{l)s such links have to be deduced by synta(&gt; tic and semantic analysis of sense detinitions. Nouns in WoMNet are organised around synsels. I!;ach synset may inelmle a list of synonyms, pointers to hyponym and hypernym synsets, and a gloss con'esponding to a conventional dictionary definition.</Paragraph>
    <Paragraph position="4">  (any of several usu. small short-bodied breeds originally trained to hunt animals living underground) =&gt; hunting dog -(a dog used in hunting game) =&gt; domestic dog, pooch, Canis familiaris -(domesticated mammal prob. descended from the common wolf; occurs in</Paragraph>
    <Paragraph position="6"> (any of various fissiped mammals with nonretractile claws and typically long muzzles) =&gt; carnivore -(terrestrial or aquatic flesh-eating mammal; terrestrial carnivores have four or five clawed digits on each limb) =&gt; placental mammal, eutherian, eutherian mammal =&gt; mammal -(any warm-blooded vertebrate that nourish their young with milk and having the skin more or less covered with hair; young are born alive except for the small subclass of monotremes) =&gt; vertebrate, craniate -(animals having a bony or cartilagenous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium) =&gt; chordate =&gt; animal, animate being, beast, brute, creature, fauna -(a living organism characterized by voluntary movement) =&gt; life form, organism, being, living thing -(any living entity) =&gt; entity -(something having concrete existence; living or nonliving)  ears clogs flowers trees l)eol)le\] chariot l)ug t)ansy larch brniser J inotorhike terrier dalfodil pine patriarch jcep lapdog t.ulil) oak siren lnoped chillllahlla l'Oge sycal nol'o rake  The extraction algorithm starts with the synset corresponding to the word-sense for which we wish to create a lexieal entry. 'l?he gloss is tokenised, function words are removed and tire relnaining content words are converted to their root railer(ion. All such words are considered to be real,ares of the word-sense, and are given a centrality of 1.0. We then chain at)war(Is using a hypernymic link (if any) 3. At the. next level up, features are extracted from the hypernym's gloss, nsing a centrality of 0.9. The process is repeated, reducing the centrality by 0.1 at each level, until either the top of tire hierarchy is reached or the centrality falls to zero. Finally, the rel)resentat.ion , consisting of a set of feature-centrality l)airs, is normalised.</Paragraph>
  </Section>
class="xml-element"></Paper>