<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1020">
  <Title>A Method for Word Sense Disambiguation of Unrestricted Text</Title>
  <Section position="6" start_page="155" end_page="156" type="evalu">
    <SectionTitle>
5 Evaluation and comparison with other methods
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="155" end_page="155" type="sub_section">
      <SectionTitle>
5.1 Tests against SemCor
</SectionTitle>
      <Paragraph position="0"> The method was tested on 384 pairs selected from the first two tagged files of SemCor 1.6 (file br-a01, br-a02). From these, there are 200 verb-noun pairs, 127 adjective-noun pairs and 57 adverb-verb pairs.</Paragraph>
      <Paragraph position="1"> In Table 3, we present a summary of the results. top 1 top 2 top 3 top 4  pairs using both algorithms.</Paragraph>
      <Paragraph position="2"> Table 3 shows the results obtained using both algorithms; for nouns and verbs, these results are improved with respect to those shown in Table 1, where only the first algorithm was applied. The results for adjectives and adverbs are the same in both these tables; this is because the second algorithm is not used with adjectives and adverbs, as words having this part of speech are not structured in hierarchies in WordNet, but in clusters; the small size of the clusters limits the applicability of the second algorithm.</Paragraph>
      <Paragraph position="3"> Discussion of results When evaluating these results, one should take into consideration that: 1. Using the glosses as a base for calculating the conceptual density has the advantage of eliminating the use of a large corpus. But a disadvantage that comes from the use of glosses is that they are not part-of-speech tagged, like some corpora are (i.e. Treebank). For this reason, when determining the nouns from the verb glosses, an error rate is introduced, as some verbs (like make, have, go, do) are lexically ambiguous having a noun representation in Word-Net as well. We believe that future work on part-of-speech tagging the glosses of WordNet will improve our results.</Paragraph>
      <Paragraph position="4"> 2. The determination of senses in SemCor was done of course within a larger context, the context of sentence and discourse. By working only with a pair of words we do not take advantage of such a broader context. For example, when disambiguating the pair protect court our method picked the court meaning &amp;quot;a room in which a law court sits&amp;quot; which seems reasonable given only two words, whereas SemCor gives the court meaning &amp;quot;an assembly to conduct judicial business&amp;quot; which results from the sentence context (this was our second choice). In the next section we extend our method to more than two words disambiguated at the same time.</Paragraph>
    </Section>
    <Section position="2" start_page="155" end_page="156" type="sub_section">
      <SectionTitle>
5.2 Comparison with other methods
</SectionTitle>
      <Paragraph position="0"> As indicated in (Resnik and Yarowsky, 1997), it is difficult to compare the WSD methods, as long as distinctions reside in the approach considered (MRD based methods, supervised or unsupervised statistical methods), and in the words that are disambiguated. A method that disambiguates unrestricted nouns, verbs, adverbs and adjectives in texts is presented in (Stetina et al., 1998); it attempts to exploit sentential and discourse contexts and is based on the idea of semantic distance between words, and lexical relations. It uses WordNet and it was tested on SemCor.</Paragraph>
      <Paragraph position="1"> Table 4 presents the accuracy obtained by other WSD methods. The baseline of this comparison is considered to be the simplest method for WSD, in which each word is tagged with its most common sense, i.e. the first sense as defined in WordNet.</Paragraph>
      <Paragraph position="2">  ods.</Paragraph>
      <Paragraph position="3"> As it can be seen from this table, (Stetina et al., 1998) reported an average accuracy of 85.7% for nouns, 63.9% for verbs, 83.6% for adjectives and 86.5% for adverbs, slightly less than our results. Moreover, for applications such as information retrieval we can use more than one sense combination; if we take the top 2 ranked combinations our average accuracy is 91.5% (from Table 3).</Paragraph>
      <Paragraph position="4"> Other methods that were reported in the lit- null erature disambiguate either one part of speech word (i.e. nouns), or in the case of purely statistical methods focus on very limited number of words. Some of the best results were reported in (Yarowsky, 1995) who uses a large training corpus. For the noun drug Yarowsky obtains 91.4% correct performance and when considering the restriction &amp;quot;one sense per discourse&amp;quot; the accuracy increases to 93.9%, result represented in the third column in Table 4.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="156" end_page="157" type="evalu">
    <SectionTitle>
6 Extensions
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="156" end_page="156" type="sub_section">
      <SectionTitle>
6.1 Noun-noun and verb-verb pairs
</SectionTitle>
      <Paragraph position="0"> The method presented here can be applied in a similar way to determine the conceptual density within noun-noun pairs, or verb-verb pairs (in these cases, the NEAR operator should be used for the first step of this algorithm).</Paragraph>
    </Section>
    <Section position="2" start_page="156" end_page="157" type="sub_section">
      <SectionTitle>
6.2 Larger window size
</SectionTitle>
      <Paragraph position="0"> We have extended the disambiguation method to more than two words co-occurrences. Consider for example: The bombs caused damage but no injuries. The senses specified in SemCor, are: la. bomb(#1~3) cause(#1//2) damage(#1~5) iujury ( #1/4 ) For each word X, we considered all possible combinations with the other words Y from the sentence, two at a time. The conceptual density C was computed for the combinations X -Y as a summation of the conceptual densities between the sense i of the word X and all the senses of the words Y. The results are shown in the tables below where the conceptual density calculated for the sense #i of word X is presented in the column denoted by C#i:  By selecting the largest values for the conceptual density, the words are tagged with their senses as follows: lb. bomb(#1/3) cause(#1/2) damage(#1~5) iuju, (#e/4)  Note that the senses for word injury differ from la. to lb.; the one determined by our method (#2/4) is described in WordNet as &amp;quot;an accident that results in physical damage or hurt&amp;quot; (hypernym: accident), and the sense provided in SemCor (#1/4) is defined as &amp;quot;any physical damage'(hypernym: health problem).</Paragraph>
      <Paragraph position="1"> This is a typical example of a mismatch caused by the fine granularity of senses in Word-Net which translates into a human judgment that is not a clear cut. We think that the sense selection provided by our method is justified, as both damage and injury are objects of the same verb cause; the relatedness of damage(#1/5) and injury(#2/~) is larger, as both are of the same class noun.event as opposed to injury(#1~4) which is of class noun.state.</Paragraph>
      <Paragraph position="2"> Some other randomly selected examples considered were: 2a. The te,~orists(#l/1) bombed(#l/S) the embassies(#1~1).</Paragraph>
      <Paragraph position="4"> where sentences 2a, 3a and 4a are extracted from SemCor, with the associated senses for each word, and sentences 2b, 3b and 4b show the verbs and the nouns tagged with their senses by our method. The only discrepancy is for the  word broke and perhaps this is due to the large number of its senses. The other word with a large number of senses explode was tagged correctly, which was encouraging.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>