<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-1003">
  <Title>Topical Clustering of MRD Senses Based on Information Retrieval Techniques</Title>
  <Section position="4" start_page="78" end_page="82" type="intro">
    <SectionTitle>
5. Discussion
</SectionTitle>
    <Paragraph position="0"> In this section, we thoroughly analyze the experimental results, in particular the cases for which TopSense fails. These cases reveal the strengths and limitations of TopSense and hint at possible improvements to the algorithm. We also point out several uses of the topical clusters.</Paragraph>
    <Section position="1" start_page="78" end_page="79" type="sub_section">
      <SectionTitle>
5.1 Failure of the TopSense Algorithm
</SectionTitle>
      <Paragraph position="0"> Failure of TopSense can be attributed to a number of factors, including vagueness of definitions, inappropriate definition lengths (too short or too long), metaphoric or metonymic senses, and deictic references. Table 14 shows some examples of the failed cases. For instance, the sense interest.1.n.3 (a readiness to give attention) is too vague and short for correct clustering to occur. On the other hand, long definitions that include too many non-essential differentiae also give rise to erroneous clustering. We notice that the definitions of such senses have been radically changed and made more specific in the third edition of the LDOCE. The reason behind the changes may be that these sense definitions are also difficult for humans to grasp.</Paragraph>
      <Paragraph position="1">  of office, rank, honour, etc.) in the Dg class (Clothes and personal belongings). On the other hand, the metonymic meaning, Nb (Chance) of another star sense (a heavenly body regarded as determining one's fate) comes out second to the &quot;primary&quot; sense, La (Heavenly body). By considering cue phrases such as regarded as or as a mark of, we might be able to handle metaphoric and metonymic senses more successfully.</Paragraph>
      <Paragraph position="2"> Krovetz (1992) observes that the LDOCE indicates explicit sense shifts via the deictic reference, which is a link to the previous sense created by such terms as this, these, that, those, its, itself, such a, and such an. The author identifies many systematic sense shifts indicated by such references, including Substance/Product (lemon, tree or fruit), Substance/Color (jade, amber), Object/Shape (pyramid), Animal/Food (chicken), Count-noun/Mass-noun (blasphemy), Language/People (Spanish), Animal/Skin-fur (crocodile), and Music/Dance (waltz). Such shifts indicated through a deictic reference are so pervasive in the MRD that they show up more than once in our small 20-word test set. For instance, the LDOCE sense issue.1.n.2 (an example of this) indicates a Count-noun/Mass-noun shift from its previous sense issue.1.n.1 (the act of coming out) through the deictic reference of this. Since these specific patterns of definition are not taken into consideration in TopSense, the algorithm often fails in such cases. Further work must be undertaken to cope with direct and deictic references, so that such definitions can be appropriately clustered.</Paragraph>
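      <Paragraph position="3"> The deictic cue terms listed above lend themselves to a simple mechanical check. The following is only a minimal sketch and is not part of TopSense: the function name find_deictic_cues and the regular-expression matching are assumptions made for illustration, and only the list of cue terms is taken from the text.

```python
import re

# Cue terms that the text above attributes to Krovetz (1992).
# Longer cues come first so that "such an" is not reported as "such a".
DEICTIC_CUES = ["such an", "such a", "this", "these", "that", "those", "its", "itself"]

# Hypothetical helper pattern: whole-word, case-insensitive match of any cue.
_CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in DEICTIC_CUES) + r")\b",
    re.IGNORECASE,
)


def find_deictic_cues(definition):
    """Return the deictic cue terms found in a definition, in order of appearance."""
    return [match.group(1).lower() for match in _CUE_PATTERN.finditer(definition)]
```

Applied to the definition of issue.1.n.2 (an example of this), the function reports the cue this, whereas the definition of issue.1.n.1 (the act of coming out) yields no cue. Such a check only flags candidates: a term like that also occurs as a relative pronoun, so a flagged definition would still need to be linked to its preceding sense by some further step.</Paragraph>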
    </Section>
    <Section position="2" start_page="79" end_page="81" type="sub_section">
      <SectionTitle>
5.2 Clustered Definitions and Examples as a Knowledge Source for WSD
</SectionTitle>
      <Paragraph position="0"> Many studies have shown that MRD definitions and example sentences are a good knowledge source for WSD. As described in the introduction, Lesk (1986) shows that defining words are especially effective for disambiguating senses strongly associated with specific collocations, such as cone in ice-cream cone and pine cone.</Paragraph>
      <Paragraph position="1">
interest - an activity, subject, etc., which one gives time and attention to
table - also multiplication table; a list which young children repeat to learn what number results when a number from 1 to 12 is multiplied by any of the numbers from 1 to 12
star - a heavenly body regarded as determining one's fate
suit - a set (of armour)
interest - a readiness to give attention
issue - the act of coming out
issue - something which comes or is given out
space - a quantity or bit of this for a particular purpose
issue - an example of this
</Paragraph>
      <Paragraph position="2"> Wilks et al. (1990) call the defining words in the LDOCE definition semantic primitives (SP) and suggest that a semantic network constructed on the strength of co-occurrence of SPs in definitions can be useful for a variety of NLP tasks, ranging from WSD to machine translation to message understanding. Along the same lines, Luk (1995) terms SP the definition-based concept (DBC) and proposes using DBC co-occurrence (DBCC) trained on a large corpus to disambiguate word senses. However, the effectiveness of SPs or DBCs in representing a word sense and its indicative context is hampered by ambiguity and data sparseness. For instance, earth, one of the SPs in bank.1.n.2, is ambiguous (either the planet Earth or soil), thus possibly leading to problems in WSD. Although these SPs are drawn from a small, controlled vocabulary in most MRDs, it is nevertheless difficult to find SPs of a polysemous sense overlapping the SPs of its context. For instance, consider the problem of disambiguating the word bank in the context of an LDOCE example, He sat down and rested on a mossy bank in the woods. When working on the level of the SPs of an individual MRD sense, we are hard pressed to find a match between the SPs of the intended sense: SP(bank.1.n.2) = {earth, heap, field, garden, make, border, division} and the SPs of its context:</Paragraph>
      <Paragraph position="4"> soil, surface}, = {material, trunk, branch, tree, cut, dry, form, burn, paper, furniture}, = {place, tree, grow, small, forest}.</Paragraph>
      <!-- Computational Linguistics Volume 24, Number 1 -->
      <Paragraph position="5"> The clusters of MRD senses produced by TopSense give us an advantage in this respect. By matching the context against the clustered semantic primitives (CSP) of the related senses, we have a better chance of a match. For instance, the following CSPs of the relevant bank senses contain more words and are therefore more likely to recur in the SPs of contextual words:</Paragraph>
      <Paragraph position="7"> = {land, side, river, stream, lake, earth, heap, field, make, border, division, slope, bend, road, race-track, safe, car, go round, sandbank} If data sparseness still gets in the way, as in the case of this example, one can go one step further and adopt a class-based approach. Under such an approach, the SPs of the context are matched against the SPs of a class of senses related to the polysemous sense in question. To this end, we can make use of the topical clusters of MRD senses produced by TopSense. By taking the collective defining terms of all the senses in a topical cluster, we obtain the virtual document of SPs described in Section 4.1. To cope with the problem caused by ambiguous SPs, it is a good idea to weight terms according to tf and idf, as in the TopSense algorithm. Under such a class-based approach, we will be matching the contextual information against the unweighted or weighted terms in a class relevant to the intended sense. For instance, to resolve the sense of bank in the above example to the Ld sense, we look for a match of contextual information with</Paragraph>
      <Paragraph position="9"> = {land, side, river, stream, lake, earth, heap, field, make, border, division, slope, bend, road, race-track, safe, car, go round, sandbank, large, area, land, thick, cover, tree, bush, grow, wild, plant,</Paragraph>
      <Paragraph position="11"> lake (27.88), earth (25.76), tree (21.87) ... } Notice that for this example, the relevant VD is now large enough to overlap the contextual information; the term tree appears in SP(wood.1.n.1) as well as in the relevant document VLd. Although the relevant VLd is very large, it contains mostly words that are nevertheless consistently related to geography.</Paragraph>
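      <Paragraph position="12"> The contrast between sense-level and class-based matching can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the authors' implementation: the function names overlap and weighted_match and the variable names are hypothetical, the context set contains only the two context SP sets that are fully legible above, and the weight table holds only the three term weights quoted in the text (the remaining terms of the virtual document are elided there and are not reconstructed here).

```python
# SPs of the intended sense bank.1.n.2, as listed in the text.
SP_BANK = {"earth", "heap", "field", "garden", "make", "border", "division"}

# Union of the two fully legible context SP sets given in the text.
CONTEXT_SPS = {
    "material", "trunk", "branch", "tree", "cut", "dry", "form",
    "burn", "paper", "furniture", "place", "grow", "small", "forest",
}

# Only the three weighted terms quoted for the Ld virtual document;
# this is a deliberately partial table, not the full document.
VD_LD_WEIGHTS = {"lake": 27.88, "earth": 25.76, "tree": 21.87}


def overlap(context_terms, sense_terms):
    """Sense-level matching: the terms shared by the context and one sense."""
    return set(context_terms).intersection(sense_terms)


def weighted_match(context_terms, term_weights):
    """Class-based matching: sum the weights of class terms found in the context."""
    return sum(weight for term, weight in term_weights.items() if term in context_terms)
```

With these inputs, overlap(CONTEXT_SPS, SP_BANK) is empty, which mirrors the sparseness problem described above, while weighted_match(CONTEXT_SPS, VD_LD_WEIGHTS) is 21.87 because tree occurs both in the context and in the class terms. Choosing among several candidate classes would amount to computing this score for each class and taking the largest, assuming comparable weighting across classes.</Paragraph>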
    </Section>
    <Section position="3" start_page="81" end_page="82" type="sub_section">
      <SectionTitle>
5.3 Systematic Sense Shift
</SectionTitle>
      <Paragraph position="0"> Ostler and Atkins (1991) contend that there is strong evidence to suggest that a large part of word sense ambiguity is not arbitrary but follows regular patterns. Moreover, gaps frequently arise in dictionaries and thesauri in specifying this kind of polysemy.</Paragraph>
      <Paragraph position="1"> Encoding regularity of the extended usage of a sense makes it possible to resolve word sense ambiguity for word entries that are underspecified in this respect. This so-called virtual polysemy can be illustrated through some examples. For instance, many verbs for moving and action, such as move and strike, can be used polysemously in the sense of emotion. Chodorow, Byrd, and Heidorn (1985) observe that many instances of intersense relations can be found in W7 that are not idiosyncratic, but rather exist among senses of many words. Those relations include Process/Result, Food/Plant, and Container/Volume. Virtual polysemy and recurring intersense relations are closely related to polymorphic senses that can support coercion in semantic typing under Pustejovsky's (1991) theory of the generative lexicon.</Paragraph>
      <!-- Chen and Chang, Topical Clustering -->
      <Paragraph position="2"> Dolan (1994) maintains the position that intersense relations are mostly idiosyncratic, thereby making it difficult to characterize them in a general way so as to identify them. The author cites the example of two senses of to moult, one a bird behavior and the other an animal behavior, to stress that polysemy primarily reflects fine distinctions that do not recur systematically throughout the English lexicon. However, our experimental results indicate that (a) it is exactly senses with fine distinctions that are merged together and (b) there is a greater concentration of recurring intersense relations emerging from condensed senses. For instance, the distinction between the bird and animal behavior of moulting would be eliminated, since both are likely to be clustered and labeled as Ha (Making things) by TopSense. Relations among senses in the same topical clusters are mostly systematic. Many of those relations are reflected in the cross-reference information in the LLOCE. For instance, the LLOCE lists the following cross-references for the topic of Eb (Food):  abovementioned Food/Plant relation. Indeed, words involved in such intersense relations are frequently underspecified. For instance, chicken is listed under both topic Eb and topic Ad, while duck is listed under Ad but not Eb.</Paragraph>
      <Paragraph position="3"> By characterizing some 200 cross-references in LLOCE, most systematic sense shifts can be easily identified among the senses across topical clusters. The topical clusters of MRD senses, coupled with the topical description of sense-shift knowledge, can support and realize automatic sense extension, as advocated in Pustejovsky and Bouillon (1994), and prevent a proliferation of senses in the semantic lexicon. For instance, the sense of duck in the Ad cluster can be coerced into an Eb sense, in some context, based on the knowledge of a systematic sense shift from Ad (Birds) to Eb (Food).</Paragraph>
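      <Paragraph position="4"> The duck example can be sketched as a lookup over topic-level shift rules. This is a hypothetical illustration, not a description of an implemented system: the names SENSE_SHIFTS, LISTED_TOPICS, and coercible_topics are invented here, and the two small tables contain only the facts stated in the text (the shift from Ad to Eb, and the topics under which chicken and duck are listed).

```python
# Topic-level shift rules; only the Ad (Birds) -> Eb (Food) shift named in the text.
SENSE_SHIFTS = {"Ad": ["Eb"]}

# Topics under which each word is listed, as reported in the text.
LISTED_TOPICS = {"chicken": ["Ad", "Eb"], "duck": ["Ad"]}


def coercible_topics(word, listed=LISTED_TOPICS, shifts=SENSE_SHIFTS):
    """Return topics reachable by one systematic shift but not listed for the word."""
    known = listed.get(word, [])
    reachable = []
    for topic in known:
        for target in shifts.get(topic, []):
            # Keep only targets that are new for this word and not yet collected.
            if target not in known and target not in reachable:
                reachable.append(target)
    return reachable
```

For duck the function returns the Eb topic, the underspecified Food sense that the shift rule supplies, while for chicken it returns nothing because both topics are already listed. Keeping the rules at the level of topics rather than individual words is what allows one rule to cover every word in a cluster.</Paragraph>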
    </Section>
  </Section>
</Paper>