<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-0853">
  <Title>A Gloss-centered Algorithm for Disambiguation</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
from WordNet
</SectionTitle>
    <Paragraph position="0"> We call these descriptions descriptive glosses.</Paragraph>
    <Paragraph position="1"> For word-senses picked up from WordNet, the WordNet glosses are the descriptive glosses. WordNet glosses also contain example usages of the word; we have excluded these examples from the descriptive glosses. For other word-senses, the descriptions could come from glossaries (like glossaries of software terms), encyclopedias (for names of people, places, events, pacts, etc.), world fact books, abbreviation lists, etc. Example glosses picked up from the above sources are listed below.</Paragraph>
    <Paragraph position="2"> descriptive-gloss for &amp;quot;piccolo&amp;quot;: an instrument of the woodwind family. Most of these instruments were once made of wood, and because they are played by blowing with air or wind, they are called woodwind.</Paragraph>
    <Paragraph position="3">  WordNet words picked from glossaries</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Association for Computational Linguistics
</SectionTitle>
      <Paragraph position="0"> SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain, July 2004</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 Hypernymy glosses
</SectionTitle>
      <Paragraph position="0"> The gloss for a particular sense of a word could also describe the hierarchical categories it belongs to.</Paragraph>
      <Paragraph position="1"> For instance, given the hierarchical categorization of the first noun sense of the word &amp;quot;Vesuvius&amp;quot;, we describe its hypernymy-gloss as the collection of all nodes on its hypernymy path up to the root, viz. &amp;quot;entity&amp;quot;. Hypernymy gloss for the first noun sense of &amp;quot;Vesuvius&amp;quot;: ⟨volcano⟩, ⟨mountain, mount⟩, ⟨natural elevation, elevation⟩, ⟨geological formation, formation⟩, ..., ⟨entity⟩.</Paragraph>
      <Paragraph position="3"/>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
Hypernymy gloss of Vesuvius (noun)
</SectionTitle>
      <Paragraph position="0"> Whereas descriptive-glosses can be derived even for word-senses not present in WordNet, hypernymy-glosses require classifying word-senses as nodes in an ontological structure, like the hypernymy hierarchy of WordNet. Such a structure is not easy to procure for words not present in WordNet.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.3 Hyper-Desc(h) glosses
</SectionTitle>
      <Paragraph position="0"> This category of gloss was developed for each word-sense by concatenating the descriptive gloss of the word-sense with the descriptive glosses of its hypernyms, all the way up to height h. Hyper-Desc(∞) denotes concatenating descriptive glosses all the way up to the root.</Paragraph>
    </Section>
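As a rough illustration, the Hyper-Desc(h) construction can be sketched over a toy hypernym chain (the glosses, node names, and the `hyper_desc` helper below are illustrative stand-ins, not the paper's actual data or code; the paper builds these glosses from WordNet):

```python
# Toy gloss and hypernym data; a real system would read these from WordNet.
TOY_GLOSS = {
    "volcano": "a mountain formed by volcanic material",
    "mountain": "a land mass that projects well above its surroundings",
    "natural elevation": "a raised or elevated geological formation",
    "entity": "that which is perceived to have its own existence",
}
TOY_HYPERNYM = {  # child -> parent; the root ("entity") has no parent
    "volcano": "mountain",
    "mountain": "natural elevation",
    "natural elevation": "entity",
}

def hyper_desc(sense: str, h: float) -> str:
    """Concatenate the descriptive gloss of `sense` with the glosses of
    its hypernyms up to height h; h = float('inf') goes to the root."""
    parts = [TOY_GLOSS[sense]]
    node, height = sense, 0
    while height < h and node in TOY_HYPERNYM:
        node = TOY_HYPERNYM[node]
        height += 1
        parts.append(TOY_GLOSS[node])
    return " ".join(parts)
```

Holo-Desc(h), described next, would be the same loop walking holonym links instead of hypernym links.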
    <Section position="5" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.4 Holo-Desc(h) glosses
</SectionTitle>
      <Paragraph position="0"> The specification of these glosses is the same as that of Hyper-Desc(h) glosses, except that holonyms are considered instead of hypernyms.</Paragraph>
      <Paragraph position="1"> Handling Named Entities One possible solution, and the one we actually resort to, is to find the named-entity tag for a token (if one exists) and then map the tag to a node in WordNet. For example, the token &amp;quot;President Musharraf&amp;quot; is not present in WordNet. But this token can be tagged as a PERSON, and PERSON can be mapped to a node in WordNet, viz.</Paragraph>
      <Paragraph position="2"> the first noun sense of &amp;quot;person&amp;quot; (person#n#1). Similarly, a date token such as &amp;quot;December 2003&amp;quot; has a DATE named-entity tag. DATE can be translated to the seventh noun sense of the word &amp;quot;date&amp;quot; (date#n#7) in WordNet. Thus, the glosses of named entities, which do not have entries in WordNet, can be derived from their named-entity tags. This information is valuable for disambiguating the surrounding words.</Paragraph>
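A minimal sketch of this tag-to-node mapping, assuming a tiny hand-built table (the function name and the extra ORG lookup are hypothetical; the two sense keys PERSON -> person#n#1 and DATE -> date#n#7 follow the text):

```python
from typing import Optional

# Named-entity tag -> WordNet sense key, as described in the text.
NE_TAG_TO_SENSE = {
    "PERSON": "person#n#1",
    "DATE": "date#n#7",
}

def sense_for_unknown_token(ne_tag: str) -> Optional[str]:
    """Return a WordNet sense key for an out-of-vocabulary token,
    derived from its named-entity tag (None if the tag is unmapped)."""
    return NE_TAG_TO_SENSE.get(ne_tag)
```

The gloss of the mapped sense then serves as the gloss of the unknown token.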
      <Paragraph position="3"> For the Senseval task, we have built our own named-entity tagger that uses gazetteers and context-sensitive grammar rules for tagging named entities. Context of a word The context of the word to be disambiguated (the target word) can be constructed in several possible ways.</Paragraph>
      <Paragraph position="4">  1. The passage in which the target word lies can be tokenized, and the set of tokens is considered the context for that word.</Paragraph>
      <Paragraph position="5"> 2. In addition to tokenizing the passage as described above, each token is also subjected to stemming using the Porter stemming algorithm (Porter, 1980). The corresponding set of stemmed tokens forms the context. This option is abbreviated as ST in table ??.</Paragraph>
      <Paragraph position="6"> 3. The passage can be part-of-speech tagged. In the case of SemCor and Extended WordNet, the part-of-speech tags have already been assigned manually. In the absence of manual POS tags, we use the QTag part-of-speech tagger (Manson, 1980). Each part-of-speech tagged word is then expanded to the concatenation of the glosses of all its word-senses. The collection of all tokens in the expansions of all words in the passage, put together, forms the context for the target word. In table ??, this option is abbreviated as FG.</Paragraph>
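The first two context options above can be sketched as follows. The `crude_stem` helper is an illustrative stand-in for the Porter stemmer, not the real algorithm, and the tokenizer is a deliberately simple assumption:

```python
import re

def tokenize(passage: str) -> list:
    """Option 1: the bag of tokens of the passage is the context."""
    return re.findall(r"[a-z]+", passage.lower())

def crude_stem(token: str) -> str:
    """Stand-in for the Porter stemmer (Porter, 1980): strip a few
    common suffixes. Illustrative only, not the real algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def stemmed_context(passage: str) -> set:
    """Option 2 (ST): the set of stemmed tokens forms the context."""
    return {crude_stem(t) for t in tokenize(passage)}
```

Option 3 (FG) would additionally replace each context word by the concatenated glosses of all its senses before tokenizing.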
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Similarity metrics
</SectionTitle>
    <Paragraph position="0"> Another parameter of the algorithm is the similarity metric used to measure the similarity between the context of a word and the gloss of each of its senses.</Paragraph>
    <Paragraph position="1"> The similarity between two sets of tokens is found by constructing vectors of counts from the two sets and computing the similarity between those vectors.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Cosine similarity
</SectionTitle>
      <Paragraph position="0"> One standard metric of similarity, as used in information retrieval, is the cosine similarity. We find the cosine similarity between the term frequency-inverse gloss frequency (tfigf) vectors of the two sets. The inverse gloss frequency (igf) of a token is the inverse of the number of glosses which contain that token; it captures the &amp;quot;commonness&amp;quot; of that particular token.</Paragraph>
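A small sketch of tfigf-weighted cosine similarity under these definitions (data layout and helper names are our own assumptions; glosses are represented as token lists):

```python
import math
from collections import Counter

def igf(token, glosses):
    """Inverse gloss frequency: the inverse of the number of glosses
    containing the token (its "commonness")."""
    n = sum(1 for g in glosses if token in g)
    return 1.0 / n if n else 0.0

def tfigf_vector(tokens, glosses):
    """Term frequency times inverse gloss frequency for each token."""
    tf = Counter(tokens)
    return {t: tf[t] * igf(t, glosses) for t in tf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A token appearing in every gloss gets a small igf and contributes little to the score, which is the downweighting effect described below.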
      <Paragraph position="1"> There have been fancier definitions of similarity in the literature (Lin, 1998), which involve information-theoretic measures of similarity between word-senses based on the hypernymy path and DAG structure of WordNet. These methods are heavily dependent on frequencies of synsets in a sense-tagged corpus. The idea is that two word-senses are highly related if their subsuming synsets are highly information bearing, or in other words, have high information content. Information content is computed from a sense-tagged corpus: word-senses with high frequencies of occurrence have low information content. This brings in the problem of data sparsity, because sense-tagged corpora are very scarce and small, and their coverage of synsets is poor as well. Hence there is the danger of making the similarity measure biased toward the sense-tagged corpus.</Paragraph>
      <Paragraph position="2"> Also, these methods are very slow and CPU intensive, since finding similarity between two word-senses at run time involves traversing the WordNet graph, in the direction of hypernymy links, up to the least common ancestor.</Paragraph>
      <Paragraph position="3"> On the other hand, a cosine similarity on tfigf vectors built from hypernymy-glosses gives a low similarity value between word-senses whose hypernymy-glosses overlap only in very frequently occurring synsets, relative to the synsets which are not common to their glosses. This is because igf implicitly captures the information content of a synset: the higher the igf, the higher the information content of the synset. The purpose served by a sense-tagged corpus is cumulatively served by the collection of hypernymy glosses of all the WordNet synsets. This method is also more reliable, since the igf values come from WordNet, which is very exhaustive, unlike sense-tagged corpora (like SemCor), which have bias and data sparsity in terms of which words occur in the corpus and which sense is picked for a word. (The reader might want to note some work which illustrates that words can inherently have multiple senses in a given context.)</Paragraph>
      <Paragraph position="4"> The cosine similarity on tfigf vectors built from descriptive glosses is very much like the similarity found between document and query vectors, since the tokens in descriptive glosses are regular words.</Paragraph>
      <Paragraph position="5"> Cosine similarity is intuitively the most useful similarity measure on descriptive glosses, since the cosine similarity of tfigf vectors automatically downweights stop words and other non-informative words like &amp;quot;the&amp;quot;.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Jaccard similarity
</SectionTitle>
      <Paragraph position="0"> Another metric of similarity is the Jaccard similarity. The Jaccard similarity between two sets of tokens (glosses) is computed as |A ∩ B| / |A ∪ B|.</Paragraph>
      <Paragraph position="2"> Here A and B are the two glosses.</Paragraph>
      <Paragraph position="3"> Jaccard similarity is appealing only if the glosses used are hypernymy-glosses.</Paragraph>
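The formula above is a direct set computation; a one-function sketch (set-valued inputs assumed):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two glosses
    represented as sets of tokens (0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```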
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.3 Asymmetric measures of similarity
</SectionTitle>
      <Paragraph position="0"> The above two were symmetric measures of similarity. A third, asymmetric, similarity measure is one that takes a value of 0 if the intersection of the glosses of two word-senses is not equal to the gloss of one of the word-senses. Otherwise, the similarity is equal to one of the cosine or Jaccard similarity measures. This means that there are actually two asymmetric similarity measures: one due to Jaccard and the other due to cosine.</Paragraph>
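Under that description, the asymmetric measure is a containment gate in front of a base measure. A sketch with Jaccard as the base (the helper is repeated so the block is self-contained; names are ours):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| (repeated for self-containment)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def asymmetric_sim(a, b, base=jaccard):
    """Asymmetric measure from the text: 0 unless one gloss contains the
    other (i.e. the intersection equals one of the two glosses);
    otherwise fall back to the base measure (Jaccard or cosine)."""
    inter = a & b
    if inter != a and inter != b:
        return 0.0
    return base(a, b)
```

Passing a cosine function as `base` yields the second asymmetric measure.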
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Main Algorithm
</SectionTitle>
    <Paragraph position="0"> For each word, a set of content words in its surroundings was found, and the similarity of this set with the gloss of each sense of the word was measured. The cosine similarity measure was used for all the experiments. The senses were then ordered by decreasing score. The word-sense with the highest similarity was picked as the most appropriate sense. The following were the parameters used in the sense-ranking algorithm.</Paragraph>
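The ranking loop might look like the following sketch (senses and glosses are illustrative, and plain set overlap stands in for the tfigf cosine used in the experiments):

```python
def rank_senses(context, sense_glosses):
    """Order senses by decreasing similarity between the context set and
    each sense's gloss set; set overlap stands in for cosine here."""
    def score(sense):
        gloss = sense_glosses[sense]
        union = context | gloss
        return len(context & gloss) / len(union) if union else 0.0
    return sorted(sense_glosses, key=score, reverse=True)
```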
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Parameters
</SectionTitle>
      <Paragraph position="0"> 1. GlossType : The type of gloss being used in the algorithm. It can be any one of the four outlined in section 2.</Paragraph>
      <Paragraph position="1"> 2. Similarity measure: The cosine similarity measure was used in all the experiments.</Paragraph>
      <Paragraph position="2"> 3. Stemming : Sometimes the words in the context are related semantically to the gloss of the ambiguous word, but they may not be in the same morphological form. For example, suppose that the context contains the word Christian but the gloss of the word contains the word Christ. The base form of both words is Christ, but since they are not in the same morphological form, they will not be treated as common words during intersection. Stemming may prove useful in this case, because after stemming both will give the same base form.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4. FullContextExpansion : This parameter
</SectionTitle>
    <Paragraph position="0"> determines whether or not the words in the context should be expanded to their glosses. This feature expands the context massively: if set to true, the gloss of each sense of each context word will be included in the context.</Paragraph>
    <Paragraph position="1"> 5. Context size : The context size can be one or two sentences, or one or two paragraphs, and so on.</Paragraph>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experimental Results
</SectionTitle>
    <Paragraph position="0"> The algorithms were evaluated against SemCor and were also used in the Senseval-3 competition. We present the results in this section.</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5.1 Results for Semcor
</SectionTitle>
    <Paragraph position="0"> For preliminary experiments, we chose the SemCor 1.7 corpus. It has been manually tagged using WordNet 1.7 glosses. The baseline algorithm for sense-tagging SemCor picked a sense for each word, uniformly at random, as its correct sense. This gave us a precision of 42.5% for nouns and 23.2% for verbs. Tables 2, 3, 4 and 5 report precision for WSD on SemCor using our algorithm with different parameter settings. We see that the algorithm certainly makes a difference over the baseline.</Paragraph>
    <Paragraph position="1"> PrRank1 and PrRank2 (precision at ranks 1 and 2, respectively) denote the percentage of cases where the highest-scoring sense is the correct sense, or one of the two highest-scoring senses is the correct sense, respectively. Our recall measures were the same as precision because every word was assigned a sense tag. In the event of a lack of any evidence for any sense tag, the first WordNet sense (the most frequent sense) was picked.</Paragraph>
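PrRank1 and PrRank2 as defined above reduce to a precision-at-rank-k computation; a sketch with our own variable names:

```python
def precision_at_rank(ranked, gold, k):
    """PrRank-k: fraction of instances whose correct sense appears among
    the top k scoring senses. ranked[i] is the ordered sense list for
    instance i; gold[i] is its correct sense."""
    hits = sum(1 for senses, g in zip(ranked, gold) if g in senses[:k])
    return hits / len(gold) if gold else 0.0
```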
    <Paragraph position="2"> Also note that acronyms in table 1 have been employed for parameters in the subsequent tables.</Paragraph>
  </Section>
</Paper>