<?xml version="1.0" standalone="yes"?> <Paper uid="E93-1028"> <Title>Similarity between Words Computed by Spreading Activation on an English Dictionary</Title> <Section position="5" start_page="232" end_page="234" type="metho"> <SectionTitle> 3 Paradigme: A Field for Measuring </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="232" end_page="232" type="sub_section"> <SectionTitle> Similarity </SectionTitle> <Paragraph position="0"> We analyse word meaning in terms of the semantic space defined by a semantic network, called Paradigme. Paradigme is systematically constructed from Glossème, a subset of an English dictionary.</Paragraph> </Section> <Section position="2" start_page="232" end_page="233" type="sub_section"> <SectionTitle> 3.1 Glossème -- A Closed Subsystem of English </SectionTitle> <Paragraph position="0"> A dictionary is a closed paraphrasing system of natural language. Each of its headwords is defined by a phrase composed of the headwords and their derivations. Viewed as a whole, a dictionary looks like a tangled network of words.</Paragraph> <Paragraph position="1"> We adopted the Longman Dictionary of Contemporary English (LDOCE) \[1987\] as such a closed system of English. LDOCE has the unique feature that each of its 56,000 headwords is defined using the words in the Longman Defining Vocabulary (hereafter, LDV) and their derivations. LDV consists of 2,851 words (as headwords in LDOCE) based on a survey of restricted vocabulary \[West, 1953\].</Paragraph> <Paragraph position="2"> We made a reduced version of LDOCE, called Glossème. Glossème has every entry of LDOCE whose headword is included in LDV. Thus, LDV is defined by Glossème, and Glossème is composed of LDV. Glossème is a closed subsystem of English.</Paragraph> <Paragraph position="3"> Glossème has 2,851 entries that consist of 101,861 words (35.73 words/entry on average).
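The construction of Glossème -- restricting LDOCE to the entries whose headwords belong to LDV, so that the result is closed under definition -- can be sketched as follows. This is a minimal sketch on a hypothetical toy dictionary; the real construction uses the 56,000-entry LDOCE and the 2,851-word LDV.

```python
# Sketch: build a Glossème-like closed subsystem from a toy dictionary.
# The toy entries and vocabulary below are illustrative, not from LDOCE.
ldoce = {
    "red":   ["of", "the", "colour", "of", "blood", "or", "fire"],
    "blood": ["red", "liquid", "in", "the", "body"],
    "wine":  ["red", "alcoholic", "drink"],
}
ldv = {"red", "blood", "of", "the", "colour", "or", "fire",
       "liquid", "in", "body"}

# Keep only the entries whose headword is in the defining vocabulary.
glosseme = {hw: defn for hw, defn in ldoce.items() if hw in ldv}

# Closure check: every word used in a definition must itself be in LDV.
closed = all(w in ldv for defn in glosseme.values() for w in defn)

print(sorted(glosseme))  # ['blood', 'red']
print(closed)            # True
```

Here "wine" is dropped because it is not an LDV headword, and the remaining entries define each other only through LDV words, mirroring how Glossème is a closed paraphrasing subsystem of English.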
An item of Glossème has a headword, a word-class, and one or more units corresponding to the numbered definitions in the entry of LDOCE. Each unit has one head-part and several det-parts. The head-part is the first phrase in the definition, which describes the broader meaning of the headword. The det-parts restrict the meaning of the head-part. (See Figure 2.)</Paragraph> <Paragraph position="5"> \[Figure 2: the entry for red in LDOCE -- "red /red/ adj -dd- 1 of the colour of blood or fire: a red rose/dress | We painted the door red. 2 (of human hair) of a bright brownish orange or copper colour 3 (of the human skin) pink, usu. for a short time: I turned red with embarrassment. 4 (of wine) of a dark pink to dark purple colour" -- and its representation in Glossème as units of head-parts and det-parts, e.g. ((of the colour) (of blood or fire)).\]</Paragraph> <Paragraph position="7"/> </Section> <Section position="3" start_page="233" end_page="234" type="sub_section"> <SectionTitle> 3.2 Paradigme -- A Semantic Network </SectionTitle> <Paragraph position="0"> We then translated Glossème into a semantic network, Paradigme. Each entry in Glossème is mapped onto a node in Paradigme. Paradigme has 2,851 nodes and 295,914 unnamed links between the nodes (103.79 links/node on average). Figure 3 shows a sample node red_1. Each node consists of a headword, a word-class, an activity-value, and two sets of links: a référant and a référé.</Paragraph> <Paragraph position="1"> The référant of a node consists of several subréférants corresponding to the units of Glossème. As shown in Figures 2 and 3, a morphological analysis maps the word brownish in the second unit onto a link to the node brown_1, and the word colour onto two links to colour_1 (adjective) and colour_2 (noun).</Paragraph> <Paragraph position="2"> The référé of a node p records the nodes referring to p.
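The node and link structure just described can be sketched as follows. This is a minimal illustration: the field names and raw weights are hypothetical stand-ins, and the actual thickness computation from word frequencies in Glossème is described in the paper's Appendix A; the only property reproduced here is that link thicknesses are normalized to sum to 1 within each set of links.

```python
# Sketch of a Paradigme-like node: a headword with weighted links per
# definition unit (subréférant) and a reverse link set (référé).
def normalize(weights):
    """Scale raw link weights so that the thicknesses sum to 1."""
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}

# Hypothetical raw weights for the first unit of red_1
# ("of the colour of blood or fire").
subreferant = normalize({"colour_1": 2.0, "blood_1": 1.0, "fire_1": 1.0})

# Hypothetical référé: nodes whose definitions point back at red_1.
refere = normalize({"apple_1": 3.0, "wine_1": 1.0})

print(subreferant["colour_1"])  # 0.5
```

After normalization each subréférant and each référé is a small probability-like distribution over neighbouring nodes, which is what makes the later spreading-activation sums well behaved.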
For example, the référé of red_1 is a set of links to nodes (e.g. apple_1) that have a link to red_1 in their référants. The référé provides information about the extension of red_1, not the intension shown in the référant.</Paragraph> <Paragraph position="3"> Each link has a thickness tk, which is computed from the frequency of the word wk in Glossème and other information, and is normalized so that Σ tk = 1 in each subréférant or référé. Each subréférant also has a thickness (for example, 0.333333 in the first subréférant of red_1), which is computed from the order of the units and represents the significance of the definitions. Appendix A describes the structure of Paradigme in detail.</Paragraph> </Section> </Section> <Section position="6" start_page="234" end_page="236" type="metho"> <SectionTitle> 4 Computing Similarity between Words </SectionTitle> <Paragraph position="0"> Similarity between words is computed by spreading activation on Paradigme. Each of its nodes can hold activity, which moves through the links. Each node computes its activity value vi(T+1) at time T+1 as follows:</Paragraph> <Paragraph position="2"> vi(T+1) = φ( Ri(T), R'i(T), ei(T) ), where Ri(T) and R'i(T) are the sums of the weighted activity (at time T) of the nodes referred to in the référant and référé respectively, and ei(T) is the activity given from outside (at time T); to 'activate a node' is to let ei(T) > 0. The output function φ sums up the three activity values in appropriate proportion and limits the output value to \[0,1\]. Appendix B gives the details of the spreading activation.</Paragraph> <Section position="1" start_page="234" end_page="235" type="sub_section"> <SectionTitle> 4.1 Measuring Similarity </SectionTitle> <Paragraph position="0"> Activating a node for a certain period of time causes the activity to spread over Paradigme and produce an activated pattern on it. The activated pattern approximately reaches equilibrium after 10 steps, though it never reaches the actual equilibrium.
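The update rule above can be sketched on a tiny network as follows. The mixing proportions inside the output function are assumptions made for illustration (the paper specifies the real output function in its Appendix B); the sketch only reproduces the overall shape: activity from the référant, the référé, and outside is combined, clipped to [0, 1], and iterated for about 10 steps.

```python
# Sketch of the spreading-activation update on a tiny 3-node network.
# The 0.5/0.3/0.2 proportions in phi are illustrative assumptions.
def phi(referant_sum, refere_sum, external):
    """Combine the three activity sources and clip to [0, 1]."""
    v = 0.5 * referant_sum + 0.3 * refere_sum + 0.2 * external
    return max(0.0, min(1.0, v))

# links[i]: référant links of node i (thicknesses sum to 1 per node)
links = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}
# back[i]: référé links of node i, likewise normalized
back = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}

v = {0: 0.0, 1: 0.0, 2: 0.0}
for t in range(10):                 # ~10 steps approximate equilibrium
    e = {0: 1.0, 1: 0.0, 2: 0.0}    # keep activating node 0
    v = {i: phi(sum(w * v[j] for j, w in links[i].items()),
                sum(w * v[j] for j, w in back[i].items()),
                e[i])
         for i in v}
print(v)
```

After ten steps the activity is highest at the activated node and decays with graph distance, which is the "activated pattern" the paper reads similarity values from.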
The pattern thus produced represents the meaning of the node, or of the words related to the node by morphological analysis 1.</Paragraph> <Paragraph position="1"> The activated pattern, produced from a word w, suggests the similarity between w and any headword in LDV. The similarity σ(w, w') ∈ \[0, 1\] is computed in the following way. (See also Figure 4.) 1. Reset the activity of all nodes in Paradigme.</Paragraph> <Paragraph position="2"> 2. Activate w with strength s(w) for 10 steps, where s(w) is the significance of the word w.</Paragraph> <Paragraph position="3"> Then, an activated pattern P(w) is produced on Paradigme.</Paragraph> <Paragraph position="4"> 3. Observe a(P(w), w') -- the activity value of the node w' in P(w).</Paragraph> <Paragraph position="5"> Then, σ(w, w') is s(w')·a(P(w), w').</Paragraph> <Paragraph position="6"> The word significance s(w) ∈ \[0, 1\] is defined as the normalized information of the word w in the corpus \[West, 1953\]. For example, the word red appears 2,308 times in the 5,487,056-word corpus, and the word and appears 106,064 times. So, s(red) and s(and) are computed as follows:</Paragraph> <Paragraph position="8"> s(red) = log(5,487,056 / 2,308) / log 5,487,056 = 0.500955 , s(and) = log(5,487,056 / 106,064) / log 5,487,056 ≈ 0.254 . We estimated the significance of the words excluded from the word list \[West, 1953\] at the average significance of their word classes. This interpolation virtually enlarged West's 5,000,000-word corpus.</Paragraph> <Paragraph position="9"> For example, let us consider the similarity between red and orange. First, we produce an activated pattern P(red) on Paradigme. (See Figure 5.) In this case, both of the nodes red_1 (adjective) and red_2 (noun) are activated with strength s(red) = 0.500955. Next, we compute s(orange) = 0.676253, and observe a(P(red), orange) = 0.390774. Then, the similarity between red and orange is obtained as follows:</Paragraph> <Paragraph position="11"> σ(red, orange) = s(orange) · a(P(red), orange) = 0.676253 × 0.390774 = 0.264262 . 1 The morphological analysis maps all the words derived by the 48 affixes in LDV onto their root forms (i.e.
headwords of LDOCE).</Paragraph> </Section> <Section position="2" start_page="235" end_page="235" type="sub_section"> <SectionTitle> 4.2 Examples of Similarity between Words </SectionTitle> <Paragraph position="0"> The procedure described above can compute the similarity σ(w, w') between any two words w, w' in LDV and their derivations. Computer programs for this procedure -- spreading activation (in C), morphological analysis and others (in Common Lisp) -- can compute σ(w, w') within 2.5 seconds on a workstation (SPARCstation 2).</Paragraph> <Paragraph position="1"> The similarity σ between words works as an indicator of lexical cohesion. The following examples illustrate that σ increases with the strength of the semantic relation between two words.</Paragraph> <Paragraph position="2"> The similarity σ also increases with the co-occurrence tendency of words, for example:</Paragraph> <Paragraph position="4"> Note that σ(w, w') has direction (from w to w'), so that σ(w, w') may not be equal to σ(w', w): σ(films, theatre) = 0.178988 , σ(theatre, films) = 0.068927 .</Paragraph> <Paragraph position="6"> Meaningful words should have higher similarity; meaningless words (especially function words) should have lower similarity. The similarity σ(w, w') increases with the significance s(w) and s(w') that represent the meaningfulness of w and w':</Paragraph> <Paragraph position="8"> Note that the reflective similarity σ(w, w) also depends on the significance s(w), so that σ(w, w) < 1:</Paragraph> <Paragraph position="10"/> </Section> <Section position="3" start_page="235" end_page="236" type="sub_section"> <SectionTitle> 4.3 Similarity of Extra Words </SectionTitle> <Paragraph position="0"> The similarity of words in LDV and their derivations is measured directly on Paradigme; the similarity of extra words is measured indirectly on Paradigme by treating an extra word as a word list W = {w1, ..., wn} of its definition in LDOCE. (Note that each wi ∈ W is included in LDV or their derivations.)
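Both the direct similarity and the extra-word similarity rely on the word significance s(w). A minimal sketch of that quantity, assuming the normalization s(w) = log(N / f(w)) / log N for a word with frequency f(w) in the N-word corpus (this exact form is inferred from "normalized information", but it reproduces the paper's s(red) = 0.500955):

```python
import math

# Sketch: word significance as normalized information in West's corpus.
# Assumed form: s(w) = log(N / f(w)) / log(N), in [0, 1].
def significance(freq, corpus_size=5487056):
    """Normalized information of a word occurring freq times."""
    return math.log(corpus_size / freq) / math.log(corpus_size)

s_red = significance(2308)    # red: 2,308 occurrences
s_and = significance(106064)  # and: 106,064 occurrences

print(round(s_red, 3))  # 0.501
print(round(s_and, 3))
```

Frequent function words such as and carry little information and thus get a low significance, which is why they contribute little to σ in the examples below.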
The similarity between the word lists W, W' is defined as follows. (See also Figure 6.)</Paragraph> <Paragraph position="2"> σ(W, W') = φ( Σ s(w'i) · a(P(W), w'i) ) ,</Paragraph> <Paragraph position="4"> \[Figure 6: the similarity of extra words computed as the similarity between word lists. Figure 7: the activated pattern produced from the word list {red, alcoholic, drink}.\]</Paragraph> <Paragraph position="5"> where P(W) is the activated pattern produced from W by activating each wi ∈ W with strength s(wi)²/Σ s(wk) for 10 steps, and φ is an output function which limits the value to \[0,1\].</Paragraph> <Paragraph position="6"> As shown in Figure 7, bottle_1 and wine_1 have high activity in the pattern produced from the phrase &quot;red alcoholic drink&quot;. So, we may say that the overlapped pattern implies &quot;a bottle of wine&quot;.</Paragraph> <Paragraph position="7"> For example, the similarity between linguistics and stylistics, both of which are extra words, is computed as follows: σ(linguistics, stylistics) = σ({the, study, of, language, in, general, and, of, particular, languages, and, their, structure, and, grammar, and, history}, {the, study, of, style, in, written, or, spoken, language}) = 0.140089 .</Paragraph> <Paragraph position="8"> Obviously, both σ(W, w) and σ(w, W), where W is an extra word and w is not, are also computable. Therefore, we can compute the similarity between any two headwords in LDOCE and their derivations.</Paragraph> <Paragraph position="10"> This section shows the application of the similarity between words to text analysis -- measuring similarity between texts, and measuring text coherence.</Paragraph> </Section> <Section position="4" start_page="236" end_page="236" type="sub_section"> <SectionTitle> 5.1 Measuring Similarity between Texts </SectionTitle> <Paragraph position="0"> Suppose a text is a word list without syntactic structure.
Then, the similarity σ(X, X') between two texts X, X' can be computed as the similarity of extra words described above.</Paragraph> <Paragraph position="1"> The following examples suggest that the similarity between texts indicates the strength of the coherence relation between them: σ(&quot;I have a hammer.&quot;, &quot;Take some nails.&quot;) = 0.100611 , σ(&quot;I have a hammer.&quot;, &quot;Take some apples.&quot;) = 0.005295 , σ(&quot;I have a pen.&quot;, &quot;Where is ink?&quot;) = 0.113140 , σ(&quot;I have a pen.&quot;, &quot;Where do you live?&quot;) = 0.007676 . It is worth noting that meaningless iteration of words (especially of function words) has little influence on the text similarity: σ(&quot;It is a dog.&quot;, &quot;That must be your dog.&quot;) = 0.252536 , σ(&quot;It is a dog.&quot;, &quot;It is a log.&quot;) = 0.053261 .</Paragraph> <Paragraph position="2"> The text similarity provides a semantic space for text retrieval -- to recall the most similar text among {X'1, ..., X'n} to the given text X. Once the activated pattern P(X) of the text X is produced on Paradigme, we can compute and compare the similarities σ(X, X'1), ..., σ(X, X'n) immediately. (See Figure 8.)</Paragraph> </Section> <Section position="5" start_page="236" end_page="236" type="sub_section"> <SectionTitle> 5.2 Measuring Text Coherence </SectionTitle> <Paragraph position="0"> Let us consider the reflective similarity σ(X, X) of a text X, and use the notation c(X) for σ(X, X).</Paragraph> <Paragraph position="1"> Then, c(X) can be computed as follows: c(X) = φ( Σ s(wi) · a(P(X), wi) ) , summing over wi ∈ X.</Paragraph> <Paragraph position="2"> The activated pattern P(X), as shown in Figure 7, represents the average meaning of the words wi ∈ X. So, c(X) represents the cohesiveness of X -- the semantic closeness of the words w ∈ X, or the semantic compactness of X. (It is also closely related to distortion in clustering.)
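The cohesiveness measure can be sketched as follows, given a precomputed activated pattern. Everything numeric here is a hypothetical stand-in -- the pattern and significance values are invented for illustration, not real Paradigme output -- but the computation mirrors c(X) as the significance-weighted activity of X's own words in P(X), clipped to [0, 1].

```python
# Sketch of c(X) = phi(sum of s(w) * a(P(X), w) over words w in X),
# with a precomputed activated pattern standing in for P(X).
def phi(x):
    """Output function: clip the summed activity to [0, 1]."""
    return max(0.0, min(1.0, x))

def cohesiveness(text_words, pattern, s):
    """c(X): significance-weighted activity of X's own words in P(X)."""
    return phi(sum(s[w] * pattern.get(w, 0.0) for w in text_words))

s = {"dog": 0.7, "cat": 0.7, "the": 0.1}          # hypothetical s(w)
pattern = {"dog": 0.6, "cat": 0.5, "the": 0.2}    # hypothetical P(X)

c_animals = cohesiveness(["dog", "cat"], pattern, s)
c_function = cohesiveness(["the", "the"], pattern, s)

print(round(c_animals, 2))   # 0.77
print(round(c_function, 2))  # 0.04
```

Because function words have low significance, repeating them barely raises c(X), matching the observation above that meaningless iteration has little influence.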
The following examples suggest that c(X) indicates the strength of coherence of X: c (&quot;She opened the world with her typewriter. Her work was typing.</Paragraph> <Paragraph position="3"> But she did not type quickly.&quot;) = 0.502510 (coherent), c (&quot;Put on your clothes at once.</Paragraph> <Paragraph position="4"> I can not walk ten miles.</Paragraph> <Paragraph position="5"> There is no one here but me.&quot;) = 0.250840 (incoherent).</Paragraph> <Paragraph position="6"> However, a cohesive text can be incoherent; the following example shows the cohesiveness of an incoherent text -- three sentences randomly selected from LDOCE: c (&quot;I saw a lion.</Paragraph> <Paragraph position="7"> A lion belongs to the cat family.</Paragraph> <Paragraph position="8"> My family keeps a pet.&quot;) = 0.560172 (incoherent, but cohesive).</Paragraph> <Paragraph position="9"> Thus, c(X) cannot capture all the aspects of text coherence. This is because c(X) is based only on the lexical cohesion of the words in X.</Paragraph> </Section> </Section> <Section position="7" start_page="236" end_page="237" type="metho"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> The structure of Paradigme represents the knowledge system of English, and an activated state produced on it represents word meaning.
This section discusses the nature of the structure and states of Paradigme, and also the nature of the similarity computed on it.</Paragraph> <Section position="1" start_page="236" end_page="237" type="sub_section"> <SectionTitle> 6.1 Paradigme and Semantic Space </SectionTitle> <Paragraph position="0"> The set of all the possible activated patterns produced on Paradigme can be considered as a semantic space where each state is represented as a point.</Paragraph> <Paragraph position="1"> The semantic space is a 2,851-dimensional hypercube; each of its edges corresponds to a word in LDV.</Paragraph> <Paragraph position="2"> LDV is selected according to the following information: the word frequency in written English, and the range of contexts in which each word appears.</Paragraph> <Paragraph position="3"> So, LDV has the potential to cover all the concepts commonly found in the world.</Paragraph> <Paragraph position="4"> This implies the completeness of LDV as the dimensions of the semantic space. Osgood's semantic differential procedure \[1952\] used 50 adjective dimensions; our semantic measurement uses 2,851 dimensions with completeness and objectivity.</Paragraph> <Paragraph position="5"> Our method can be applied to construct a semantic network from an ordinary dictionary whose defining vocabulary is not restricted. Such a network, however, is too large to spread activity over. Paradigme is a small and complete network for measuring the similarity.</Paragraph> </Section> <Section position="2" start_page="237" end_page="237" type="sub_section"> <SectionTitle> 6.2 Connotation and Extension of Words </SectionTitle> <Paragraph position="0"> The proposed similarity is based only on the denotational and intensional definitions in the dictionary LDOCE. The lack of connotational and extensional knowledge causes some unexpected results when measuring the similarity.
For example, consider the following similarity: σ(tree, leaf) = 0.008693 .</Paragraph> <Paragraph position="1"> This is due to the nature of dictionary definitions -- they indicate only sufficient conditions of the headword. For example, the definition of tree in LDOCE tells nothing about leaves: tree n 1 a tall plant with a wooden trunk and branches, that lives for many years 2 a bush or other plant with a treelike form 3 a drawing with a branching form, esp. as used for showing family relationships However, the definition is followed by pictures of leafy trees, providing readers with connotational and extensional stereotypes of trees.</Paragraph> </Section> <Section position="3" start_page="237" end_page="237" type="sub_section"> <SectionTitle> 6.3 Paradigmatic and Syntagmatic Similarity </SectionTitle> <Paragraph position="0"> In the proposed method, the definitions in LDOCE are treated as word lists, though they are phrases with syntactic structures. Let us consider the following definition of lift: lift v 1 to bring from a lower to a higher level; raise 2 (of movable parts) to be able to be lifted 3 ... Anyone can imagine that something is moving upward. But such a movement cannot be represented in the activated pattern produced from the phrase. The meaning of a phrase, sentence, or text should be represented as a pattern changing in time, though what we need is a static and paradigmatic relation. This paradox also arises in measuring the similarity between texts and the text coherence. As we have seen in Section 5, there is a difference between the similarity of texts and the similarity of word lists, and also between the coherence of a text and the cohesiveness of a word list.</Paragraph> <Paragraph position="1"> However, so far as the similarity between words is concerned, we assume that activated patterns on Paradigme will approximate the meaning of words, just as a still picture can express a story.</Paragraph> </Section> </Section> </Paper>