<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1007"> <Title>Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation *</Title> <Section position="4" start_page="48" end_page="50" type="metho"> <SectionTitle> 2 Heuristics for Genus Sense </SectionTitle> <Paragraph position="0"> Disambiguation As the methods described in this paper have been developed for being applied in a combined way, each one must be seen as a container of some part of the knowledge (or heuristic) needed to disambiguate the correct hypernym sense. Not all the heuristics are suitable to be applied to all definitions. For combining the heuristics, each heuristic assigns each candidate hypernym sense a normalized weight, i.e. a real number ranging from 0 to 1 (after a scaling process, where maximum score is assigned 1, c.f. section 2.9).</Paragraph> <Paragraph position="1"> The heuristics applied range from the simplest (e.g.</Paragraph> <Paragraph position="2"> heuristic 1, 2, 3 and 4) to the most informed ones (e.g. heuristics 5, 6, 7 and 8), and use information present in the entries under study (e.g. heuristics 1, 2, 3 and 4) or extracted from the whole dictionary as a unique lexical knowledge resource (e.g. heuristics 5 and 6) or combining lexical knowledge from several heterogeneous lexical resources (e.g. heuristic 7 and 8).</Paragraph> <Section position="1" start_page="48" end_page="48" type="sub_section"> <SectionTitle> 2.1 Heuristic 1: Monosemous Genus Term </SectionTitle> <Paragraph position="0"> This heuristic is applied when the genus term is monosemous. As there is only one hypernym sense candidate, the hyponym sense is attached to it. Only 12% of noun dictionary senses have monosemous genus terms in DGILE, whereas the smaller LPPL reaches 40%.</Paragraph> </Section> <Section position="2" start_page="48" end_page="49" type="sub_section"> <SectionTitle> 2.2 Heuristic 2: Entry Sense Ordering </SectionTitle> <Paragraph position="0"> This heuristic assumes that senses are ordered in an entry by frequency of usage. That is, the most used and important senses are placed in the entry before less frequent or less important ones. This heuristic provides the maximum score to the first sense of the hypernym candidates and decreasing scores to the others.</Paragraph> </Section> <Section position="3" start_page="49" end_page="49" type="sub_section"> <SectionTitle> 2.3 Heuristic 3: Explicit Semantic Domain </SectionTitle> <Paragraph position="0"> This heuristic assigns the maximum score to the hypernym sense which has the same semantic domain tag as the hyponym. This heuristic is of limited application: LPPL lacks semantic tags, and less than 10% of the definitions in DGILE are marked with one of the 96 different semantic domain tags (e.g.</Paragraph> <Paragraph position="1"> med. for medicine, or def. for law, etc.).</Paragraph> </Section> <Section position="4" start_page="49" end_page="49" type="sub_section"> <SectionTitle> 2.4 Heuristic 4: Word Matching </SectionTitle> <Paragraph position="0"> This heuristic trusts that related concepts will be expressed using the same content words. Given two definitions - that of the hyponym and that of one candidate hypernym - this heuristic computes the total amount of content words shared (including headwords). Due to the morphological productivity of Spanish and French, we have considered different variants of this heuristic. 
<Section position="5" start_page="49" end_page="49" type="sub_section"> <SectionTitle> 2.5 Heuristic 5: Simple Cooccurrence </SectionTitle> <Paragraph position="0"> This heuristic uses cooccurrence data collected from the whole dictionary (see section 4.1 for more details). Given a hyponym definition (O) and a set of candidate hypernym definitions, this method selects the candidate hypernym definition (E) which returns the maximum score given by formula (1): $SC(O, E) = \sum_{w_i \in O \wedge w_j \in E} cw(w_i, w_j)$ (1) The cooccurrence weight (cw) between two words can be given by Cooccurrence Frequency, Mutual Information (Church and Hanks, 1990) or Association Ratio (Resnik, 1992). We tested them using different context window sizes. Best results were obtained in both dictionaries using the Association Ratio. In DGILE a window size of 7 proved the most suitable, whereas in LPPL whole definitions were used.</Paragraph> </Section> <Section position="6" start_page="49" end_page="49" type="sub_section"> <SectionTitle> 2.6 Heuristic 6: Cooccurrence Vectors </SectionTitle> <Paragraph position="0"> This heuristic is based on the method presented in (Wilks et al., 1993), which also uses cooccurrence data collected from the whole dictionary (cf. section 4.1). Given a hyponym definition (O) and a set of candidate hypernym definitions, this method selects the candidate hypernym (E) which returns the maximum score following formula (2): $CV(O, E) = sim(V_O, V_E)$ (2) The similarity (sim) between two definitions can be measured by the dot product, the cosine function or the Euclidean distance between two vectors ($V_O$ and $V_E$) which represent the contexts of the words present in the respective definitions, following formula (3):</Paragraph> <Paragraph position="1"> $V_{Def} = \sum_{w_i \in Def} civ(w_i)$ (3) </Paragraph> <Paragraph position="2"> The vector for a definition ($V_{Def}$) is computed by adding the cooccurrence information vectors of the words in the definition ($civ(w_i)$). The cooccurrence information vector for a word is collected from the whole dictionary using Cooccurrence Frequency, Mutual Information or Association Ratio. The best combination varies from one dictionary to the other: whereas the dot product, Association Ratio, and window size 7 proved best for DGILE, the cosine, Mutual Information and whole definitions were preferred for LPPL.</Paragraph> </Section>
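The two cooccurrence heuristics can be sketched as follows, assuming the weight table cw and the per-word cooccurrence information vectors civ have already been collected from the whole dictionary as in section 4.1; the dict-based data structures and the choice of cosine as default are our assumptions.

```python
# Sketch of Heuristic 5 (Simple Cooccurrence) and Heuristic 6
# (Cooccurrence Vectors) over precomputed cooccurrence data.
from collections import defaultdict
from math import sqrt

def simple_cooccurrence(hypo_words, hyper_words, cw):
    """Formula (1): sum the cooccurrence weights cw (e.g. Association
    Ratio values) over all pairs of words from the two definitions."""
    return sum(cw.get((wi, wj), 0.0) for wi in hypo_words for wj in hyper_words)

def definition_vector(words, civ):
    """Formula (3): add up the cooccurrence information vectors civ(w)
    of the words in a definition; civ maps each word to a dict from
    context words to weights."""
    vec = defaultdict(float)
    for w in words:
        for ctx, weight in civ.get(w, {}).items():
            vec[ctx] += weight
    return vec

def cosine(v1, v2):
    """One of the three similarity options named for formula (2)
    (dot product, cosine, Euclidean distance)."""
    dot = sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def cooccurrence_vectors(hypo_words, hyper_words, civ, sim=cosine):
    """Formula (2): similarity between the two definition vectors."""
    return sim(definition_vector(hypo_words, civ),
               definition_vector(hyper_words, civ))
```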
<Section position="7" start_page="49" end_page="49" type="sub_section"> <SectionTitle> 2.7 Heuristic 7: Semantic Vectors </SectionTitle> <Paragraph position="0"> Because both LPPL and DGILE are poorly coded semantically, we decided to enrich the dictionaries by automatically assigning a semantic tag to each dictionary sense (see section 4.2 for more details). Instead of assigning only one tag, we can attach to each dictionary sense a vector with weights for each of the 25 semantic tags we considered (which correspond to the 25 lexicographer files of WordNet (Miller, 1990)). In this case, given a hyponym (O) and a set of possible hypernyms, we select the candidate hypernym (E) which yields the maximum similarity among semantic vectors: $SV(O, E) = sim(V_O, V_E)$ (4) where sim can be the dot product, cosine or Euclidean distance, as before. Each dictionary sense has been semantically tagged with a vector of semantic weights following formula (5):</Paragraph> <Paragraph position="1"> $V_{Def} = \sum_{w_i \in Def} swv(w_i)$ (5) </Paragraph> <Paragraph position="2"> The salient word vector (swv) for a word contains a saliency weight (Yarowsky, 1992) for each of the 25 semantic tags of WordNet. Again, the best method differs from one dictionary to the other: each one prefers the method used in the previous section.</Paragraph> </Section> <Section position="8" start_page="49" end_page="50" type="sub_section"> <SectionTitle> 2.8 Heuristic 8: Conceptual Distance </SectionTitle> <Paragraph position="0"> Conceptual distance provides a basis for determining closeness in meaning among words, taking a structured hierarchical net as reference. The conceptual distance between two concepts is essentially the length of the shortest path that connects them in the hierarchy. In order to apply conceptual distance, WordNet was chosen as the hierarchical knowledge base, and bilingual dictionaries were used to link Spanish and French words to the English concepts.</Paragraph> <Paragraph position="1"> Given a hyponym definition (O) and a set of candidate hypernym definitions, this heuristic chooses the hypernym definition (E) which is closest according to the following formula: $CD(O, E) = dist(headword_O, genus_E)$ (6) That is, conceptual distance is measured between the headword of the hyponym definition and the genus of the candidate hypernym definitions, using formula (7), cf. (Agirre et al., 1994). To compute the distance between any two words ($w_1$, $w_2$), all the corresponding concepts in WordNet ($c_{1i}$, $c_{2j}$) are searched via a bilingual dictionary, and the minimum, over each possible combination of $c_{1i}$ and $c_{2j}$, of the sum for each concept in the path between them is returned, as shown below:</Paragraph> <Paragraph position="2"> $dist(w_1, w_2) = \min_{c_{1i}, c_{2j}} \sum_{c_k \in path(c_{1i}, c_{2j})} \frac{1}{depth(c_k)}$ (7) </Paragraph> <Paragraph position="3"> Formulas (6) and (7) proved the most suitable among several other possibilities for this task, including variants which included full definitions in (6) or which used other conceptual distance formulas, cf. (Agirre and Rigau, 1996).</Paragraph> </Section>
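A minimal sketch of this distance computation follows. It uses NLTK's WordNet interface as a stand-in for the paper's WordNet-plus-bilingual-dictionary setup, a hypothetical translations() lookup in place of the bilingual dictionary, and plain shortest-path length in place of the depth-weighted sum of formula (7).

```python
# Sketch of Heuristic 8 (Conceptual Distance) over WordNet.
from nltk.corpus import wordnet as wn

def conceptual_distance(word1, word2, translations):
    """Distance between two Spanish/French words through WordNet:
    the minimum over every pair of concepts (c1, c2) that their
    English translations can denote. Plain shortest-path length
    stands in here for the weighted sum of formula (7)."""
    best = float("inf")
    for t1 in translations(word1):          # e.g. vino -> ["wine"]
        for t2 in translations(word2):
            for c1 in wn.synsets(t1, pos=wn.NOUN):
                for c2 in wn.synsets(t2, pos=wn.NOUN):
                    d = c1.shortest_path_distance(c2)
                    if d is not None and d < best:
                        best = d
    return best
```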
<Section position="9" start_page="50" end_page="50" type="sub_section"> <SectionTitle> 2.9 Combining the heuristics: Summing </SectionTitle> <Paragraph position="0"> As outlined at the beginning of this section, the way to combine all the heuristics into one single decision is simple. The weights each heuristic assigns to the rival senses of one genus are normalized to the interval between 1 (best weight) and 0. Formula (8) shows the normalized value a given heuristic gives to sense E of the genus, according to the weight the heuristic assigned to sense E and the maximum weight over all the senses $E_i$ of the genus: $vote(O, E) = \frac{weight(O, E)}{\max_{E_i} weight(O, E_i)}$ (8)</Paragraph> <Paragraph position="1"> The values thus collected from each heuristic are added up for each competing sense. The order in which the heuristics are applied is of no relevance at all.</Paragraph> </Section> </Section>
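A minimal sketch of this voting scheme, assuming each heuristic has produced a dict mapping candidate hypernym senses to its raw weights (the list-of-dicts interface is our assumption).

```python
# Sketch of the normalize-and-sum combination of section 2.9.
from collections import defaultdict

def combine(heuristic_scores):
    """Formula (8): scale each heuristic's weights so that its best
    sense gets 1, then add the votes over all heuristics and return
    the winning sense."""
    totals = defaultdict(float)
    for scores in heuristic_scores:
        top = max(scores.values(), default=0.0)
        if top <= 0:
            continue  # this heuristic abstains on this genus
        for sense, weight in scores.items():
            totals[sense] += weight / top
    return max(totals, key=totals.get) if totals else None
```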
<Section position="5" start_page="50" end_page="51" type="metho"> <SectionTitle> 3 Evaluation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="50" end_page="50" type="sub_section"> <SectionTitle> 3.1 Test Set </SectionTitle> <Paragraph position="0"> In order to test the performance of each heuristic and of their combination, we selected two test sets at random (one per dictionary): 391 noun senses for DGILE and 115 noun senses for LPPL, which give confidence rates of 95% and 91% respectively. From these samples, we retained only those for which the automatic selection process had selected the correct genus (more than 97% in both dictionaries). Both test sets were disambiguated by hand; where necessary, multiple correct senses were allowed in both dictionaries. Table 2 shows the data for the test sets.</Paragraph> </Section> <Section position="2" start_page="50" end_page="51" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> Table 3 summarizes the results for polysemous genus terms.</Paragraph> <Paragraph position="1"> In general, the results obtained by each heuristic seem poor, but they are always above the random-choice baseline (also shown in tables 3 and 4). The best heuristic according to recall in both dictionaries is sense ordering (heuristic 2). For the rest, the difference in size between the dictionaries could explain why the cooccurrence-based heuristics (5 and 6) are the best for DGILE and the worst for LPPL.</Paragraph> <Paragraph position="2"> Semantic distance gives the best precision for LPPL, but chooses an average of 1.25 senses for each genus. With the combination of the heuristics (Sum) we obtained an improvement over sense ordering (heuristic 2) of 9% (from 70% to 79%) in DGILE, and of 7% (from 66% to 73%) in LPPL, maintaining in both cases a coverage of 100%. Including monosemous genus terms in the results (cf. table 4), the sum correctly disambiguates 83% of the genus terms in DGILE (8% improvement over sense ordering) and 82% of the genus terms in LPPL (4% improvement).</Paragraph> <Paragraph position="3"> Note that we are adding the results of eight different heuristics with eight different performances, improving on the individual performance of each one. In order to test the contribution of each heuristic to the total knowledge, we tested the sum of all the heuristics, eliminating one of them in turn. The results are given in table 5.</Paragraph> <Paragraph position="4"> (Gale et al., 1993) estimate that any sense-identification system that does not give the correct sense of polysemous words more than 75% of the time would not be worth serious consideration. As table 5 shows, this is not the case for our system. For instance, in DGILE heuristic 8 has the worst performance (see table 4, precision 57%), but it makes the second largest contribution (see table 5: precision decreases from 83% to 77% when it is eliminated). That is, even heuristics with poor individual performance can contribute knowledge that the other heuristics do not provide.</Paragraph> </Section> <Section position="3" start_page="51" end_page="51" type="sub_section"> <SectionTitle> 3.3 Evaluation </SectionTitle> <Paragraph position="0"> The difference in performance between the two dictionaries shows that the quality and size of the resources is a key issue. Apparently the task of disambiguating LPPL is easier: less polysemy, more monosemous genus terms and higher precision of the sense ordering heuristic. However, the heuristics that depend only on the size of the data (5, 6) perform poorly on LPPL, while they are powerful methods for DGILE.</Paragraph> <Paragraph position="1"> The results show that the combination of heuristics is useful even if the performance of some of them is low. The combination performs better than the isolated heuristics, and makes it possible to disambiguate all the genus terms of the test set with a success rate of 83% in DGILE and 82% in LPPL.</Paragraph> <Paragraph position="2"> All the heuristics except heuristic 3 can readily be applied to any other dictionary. Minimal parameter adjustment (window size, cooccurrence weight formula and vector similarity function) should be done to fit the characteristics of the dictionary, but according to our results this does not significantly alter the results after combining the heuristics.</Paragraph> </Section> </Section> <Section position="6" start_page="51" end_page="52" type="metho"> <SectionTitle> 4 Derived Lexical Knowledge Resources </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="51" end_page="52" type="sub_section"> <SectionTitle> 4.1 Cooccurrence Data </SectionTitle> <Paragraph position="0"> Following (Wilks et al., 1993), two words cooccur if they appear in the same definition (word order in definitions is not taken into account). For instance, for DGILE, a lexicon of 300,062 cooccurrence pairs among 40,193 word forms was derived (stop words were not taken into account). Table 6 shows the first eleven words, out of the 360 which cooccur with vino (wine), ordered by Association Ratio; from left to right, the Association Ratio and the number of occurrences are shown.</Paragraph> <Paragraph position="1"> The lexicon (or machine-tractable dictionary, MTD) thus produced from the dictionary is used by heuristics 5 and 6.</Paragraph> </Section> <Section position="2" start_page="52" end_page="52" type="sub_section"> <SectionTitle> 4.2 Multilingual Data </SectionTitle> <Paragraph position="0"> Heuristics 7 and 8 need external knowledge not present in the dictionaries themselves. This knowledge is composed of semantic field tags and hierarchical structures, both of which were extracted from WordNet. In order to do this, the gap between our working languages and English was filled with two bilingual dictionaries. For this purpose, we derived a list of links for each word in Spanish and French as follows.</Paragraph> <Paragraph position="1"> First, each Spanish or French word was looked up in the bilingual dictionary, and its English translations were found. For each translation, WordNet yielded its senses, in the form of WordNet concepts (synsets). The pair made of the original word and each of the concepts linked to it was included in a file, thus producing an MTD with links between Spanish or French words and WordNet concepts. Obviously some of these links are not correct, as a translation given by the bilingual dictionary is not necessarily intended in all of its WordNet senses. The heuristics using these MTDs are aware of this.</Paragraph> <Paragraph position="2"> For instance, when accessing the semantic fields for vin (French) we get a unique translation, wine, which has two senses in WordNet: <wine, vino> as a beverage, and <wine, wine-coloured> as a kind of color. In this example two links would be produced: (vin, <wine, vino>) and (vin, <wine, wine-coloured>). These links allow us to get two possible semantic fields for vin (noun.food, file 13, and noun.attribute, file 7) and the whole structure of the WordNet hierarchy for each of the concepts.</Paragraph> </Section> </Section>
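A sketch of this link-building step follows, again using NLTK's WordNet interface as a stand-in for the paper's setup; bilingual_lookup() is a hypothetical interface to the Spanish/French-English bilingual dictionary.

```python
# Sketch of the word-to-synset link construction of section 4.2.
from nltk.corpus import wordnet as wn

def build_links(word, bilingual_lookup):
    """Pair the source word with every WordNet synset of every English
    translation, e.g. vin -> (vin, wine.n.01), (vin, wine.n.02)."""
    links = []
    for translation in bilingual_lookup(word):  # e.g. vin -> ["wine"]
        for synset in wn.synsets(translation, pos=wn.NOUN):
            links.append((word, synset))
    return links

# Each link then gives access to a semantic field (the synset's
# lexicographer file, e.g. noun.food, via synset.lexname() in NLTK)
# and to the WordNet hierarchy (synset.hypernym_paths()).
```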
<Section position="7" start_page="52" end_page="53" type="metho"> <SectionTitle> 5 Comparison with Previous Work </SectionTitle> <Paragraph position="0"> Several approaches have been proposed for attaching the correct sense (from a set of prescribed ones) to a word in context. Some of them have been fully tested on real-size texts, e.g. statistical methods (Yarowsky, 1992), (Yarowsky, 1994), (Miller and Teibel, 1991), knowledge-based methods (Sussna, 1993), (Agirre and Rigau, 1996), or mixed methods (Richardson et al., 1994), (Resnik, 1995). The performance of WSD is reaching a high level, although usually only small sets of words with clear sense distinctions are selected for disambiguation (e.g. (Yarowsky, 1995) reports a success rate of 96% disambiguating twelve words with two clear sense distinctions each).</Paragraph> <Paragraph position="1"> This paper has presented a general technique for WSD which is a combination of statistical and knowledge-based methods, and which has been applied to disambiguate all the genus terms in two dictionaries.</Paragraph> <Paragraph position="2"> Although this latter task could be seen as easier than general WSD (see footnote 4), genus terms are usually frequent and general words with high ambiguity (see footnote 5). While the average number of senses per noun in DGILE is 1.8, the average number of senses per noun genus is 2.75 (1.30 and 2.29 respectively for LPPL). Furthermore, it is not possible to apply the powerful "one sense per discourse" property (Yarowsky, 1995) because there is no discourse in dictionaries.</Paragraph> <Paragraph position="3"> WSD is a very difficult task even for humans (see footnote 6), but semiautomatic techniques to disambiguate genus terms have been broadly used (Amsler, 1981), (Vossen and Serail, 1990), (Ageno et al., 1992), (Artola, 1993), and some attempts at automatic genus disambiguation have been made using the semantic codes of the dictionary (Bruce et al., 1992) or using cooccurrence data extracted from the dictionary itself (Wilks et al., 1993).</Paragraph> <Paragraph position="4"> Selecting the correct sense for LDOCE genus terms, (Bruce et al., 1992) report a success rate of 80% (90% after hand coding of ten genus terms). This impressive rate is achieved using the intrinsic characteristics of LDOCE. Furthermore, using only the implicit information contained in the dictionary definitions of LDOCE, (Cowie et al., 1992) report a success rate of 47% at the sense level. (Wilks et al., 1993) report a success rate of 45% disambiguating the word bank (thirteen senses in LDOCE) using a technique similar to heuristic 6. In our case, combining informed heuristics and without explicit semantic tags, the success rates are 83% and 82% overall, and 95% and 75% for two-way ambiguous genus terms (DGILE and LPPL data, respectively). Moreover, 93% and 92% of the time the correct solution is among the first and second proposed solutions.</Paragraph> <Paragraph position="5"> Footnote 4: In contrast to other sense distinctions, dictionary word senses frequently differ in subtle distinctions (only some of which have to do with meaning (Gale et al., 1993)), producing a large set of closely related dictionary senses (Jacobs, 1991).</Paragraph> <Paragraph position="6"> Footnote 5: However, in dictionary definitions the headword and the genus term have to be the same part of speech.</Paragraph> <Paragraph position="7"> Footnote 6: (Wilks et al., 1993), disambiguating 197 occurrences of the word bank in LDOCE, say it "was not an easy task, as some of the usages of bank did not seem to fit any of the definitions very well". Also (Miller et al., 1994), semantically tagging SemCor by hand, measure an error rate around 10% for polysemous words.</Paragraph> </Section> </Paper>