<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1702">
  <Title>Class Based Sense Definition Model for Word Sense Tagging and Disambiguation</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Word sense disambiguation has been an important research area for over 50 years. WSD is crucial for many applications, including machine translation, information retrieval, part of speech tagging, etc.</Paragraph>
    <Paragraph position="1"> Ide and Veronis (1998) pointed out the two major problems of WSD: sense tagging and data sparseness. On one hand, tagged data are very difficult to come by, since sense tagging is considerably more difficult than other forms of linguistic annotation.</Paragraph>
    <Paragraph position="2"> On the other hand, although the data sparseness is a common problem, it is especially severe for WSD. The problems were attacked in various ways.</Paragraph>
    <Paragraph position="3"> Yarowsky (1992) showed a class-based approach under which a very large untagged corpus and thesaurus can be used effectively for unsupervised training for noun homograph disambiguation.</Paragraph>
    <Paragraph position="4"> However, the method does not offer a method that explicitly produces sense tagged data for any given sense inventory. Li and Huang (1999) described a similar unsupervised approach for Chinese text based on a Chinese thesaurus. As noted in Merialdo (1994), even minimal hand tagging improved on the results of unsupervised methods. Yarowsky (1995) showed that the learning strategy of bootstrapping from small tagged data led to results rivaling supervised training methods. Li and Li (2002) extended the approach by using corpora in two languages to bootstrap the learning process.</Paragraph>
    <Paragraph position="5"> They showed bilingual bootstrapping is even more effective. The bootstrapping approach is limited by lack of a systematic procedure of preparing seed data for any word in a given sense inventory. The approach also suffers from errors propagating from one iteration into the next. Li and Huang Another alternative involves using a parallel corpus as a surrogate for tagged data. Gale, Church and Yarowsky (1992) exploited the so-called one sense per translation constraint for WSD. They reported high precision rates of a WSD system for two-way disambiguation of six English nouns based on their translations in an English-French Parallel corpus. However, when working with a particular sense inventory, there is no obvious way to know whether the one sense per translation constraint holds or how to determine the relevant translations automatically.</Paragraph>
    <Paragraph position="6"> Diab and Resnik (2002) extended the translation-based learning strategy with a weakened constraint that many instances of a word in a parallel corpus often correspond to lexically varied but semantically consistent translations. They proposed to group those translations into a target set, which can be automatically tagged with correct senses based on the hypernym hierarchy of WordNet.</Paragraph>
    <Paragraph position="7"> Diab and Resnik's work represents a departure from previous unsupervised approaches in that no seed data is needed and explicit tagged data are produced for a given sense inventory (WordNet in their case). The system trained on the tagged data was shown to be on a par with the best &amp;quot;supervised training&amp;quot; systems in SENSEVAL-2 competition.</Paragraph>
    <Paragraph position="8"> However, Diab and Resnik's method is only applicable to nominal WordNet senses. Moreover, the method is seriously hampered by noise and semantic inconsistency in a target set. Worse still, it is not always possible to rely on the hypernym hierarchy for tagging a target set. For instance, the relevant senses of the target set of {serve, tee off} for the Chinese counterpart [faqiu] do not have a common hypernym: Sense 15 serve - (put the ball into play; as in games like tennis) null move - (have a turn; make one's move in a game) Sense 1 Tee off - (strike a golf ball from a tee at the start of a game) null play - (participating in game or sports) null compete - (compete for something) This paper describes a new WSD approach to simultaneously attack the problems of tagging and data sparseness. The approach assumes the availability of a parallel corpus of text written in E (the first language, L1 + ) and C (the second language, L2), an L1 to L2 bilingual machine readable dictionary M, and a L1 thesaurus T. A so-called Mutually Assured Resolution of Sense Algorithm (MARS) and Class Based Sense Definition Model (CBSDM) are proposed to identify the word senses in I for each word in a semantic class of words L in T. Unlike Diab and Resnik, we do not apply the MARS algorithm directly to target sets to avoid the noisy words therein. The derived classes senses and their relevant glosses in L1 and L2 make it possible to build Class Based Sense Definition and Translation Models (CBSDM and CBSTM), which subsequently can be applied to assign sense tags to words in a parallel corpus.</Paragraph>
    <Paragraph position="9"> The main idea is to exploit the defining L1 and L2 words in the glosses to resolve the sense ambi+ null This has nothing to do with the direction of translation and is not to be confused with the native and second language distinction made in the literature of Teaching English As a Second Language (TESL) and Computer Assisted Language Learning.</Paragraph>
    <Paragraph position="10"> guity. For instance, for the class containing &amp;quot;serve&amp;quot; and &amp;quot;tee off,&amp;quot; the approach exploits common defining words, including &amp;quot;ball&amp;quot; and &amp;quot;game&amp;quot; in two relevant serve-15 and tee off-1 to assign the correct senses to &amp;quot;serve&amp;quot; and &amp;quot;tee off.&amp;quot; The character bigram [faqiu] in an English-Chinese MRD: serve v 10 [I[?]; T1] to begin play by striking (the ball) to the opponent (LDOCE E-C p.</Paragraph>
    <Paragraph position="11"> 1300), would make it possible to align and sense tag &amp;quot;serve&amp;quot; or &amp;quot;tee off&amp;quot; in a parallel corpus such as the bilingual citations in Example 1: (1C) (1E) drink a capful before teeing off at each hole. (Source: Sinorama, 1999, Nov. Issue, p.15, Who Played the First Stroke?).</Paragraph>
    <Paragraph position="12"> That effectively attaches semantic information to bilingual citations and turns a parallel corpus into a Bilingual Semantic Concordance (BSC). The BSC enables us to simultaneously attack two critical WSD problems of sense tagging difficulties and data sparseness, thus provides an effective approach to WSD. BSC also embodies a projection of the sense inventory from L1 onto L2, thus creates a new sense inventory and semantic concordance for L2. If I is based on WordNet for English, it is then possible to obtain an L2 WordNet. There are many additional applications of BSC, including bilingual lexicography, cross language information retrieval, and computer assisted language learning.</Paragraph>
    <Paragraph position="13"> The remainder of the paper is organized as follows: Sections 2 and 3 lay out the approach and describe the MARS and SWAT algorithms. Section 4 describes experiments and evaluation. Section 5 contains discussion and we conclude in Section 6.</Paragraph>
    <Paragraph position="14"> 2. Class Based Sense Definition Model We will first illustrate our approach with an example. A formal treatment of the approach will follow in Section 2.2.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.1 An example
</SectionTitle>
      <Paragraph position="0"> To make full use of existing machine readable dictionaries and thesauri, some kind of linkage and integration is necessary (Knight and Luk, 1994).</Paragraph>
      <Paragraph position="1"> Therefore, we are interested in linking thesaurus classes and MRD senses: Given a thesaurus class S, it is important that the relevant senses for each word w in S is determined in a MRD-based sense inventory I. We will show such linkage is useful for WSD and is feasible, based solely on the words of the glosses in I. For instance, given the following set of word (N060) in Longman Lexicon of Contemporary English (McArthur 1992): L = {difficult, hard, stiff, tough, arduous, awkward}. Although those words are highly ambiguous, the juxtaposition immediately brings to mind the relevant senses. Specifically for the sense inventory of LDOCE E-C, the relevant senses for L are as follows: Therefore, we have the intended senses, S</Paragraph>
      <Paragraph position="3"> It is reasonable to assume each sense in I is accompanied by a sense definition written in the same language (L1). We use D(S) to denote the glosses of S. Therefore we have D(S) = &amp;quot;not easy; hard to do, make, understand, etc.; difficult to do or understand; difficult to do; difficult to do; not easy; demanding effort; needing much effort; difficult; not well made for use; difficult to use; causing difficulty;&amp;quot; The intuition of bringing out the intended senses of semantically related words can be formalized by Class Based Sense Definition Model (CBSDM), which is a micro language model generating D(S), the glosses of S in I. For simplicity, we assume an unigram language model P(d) that generates the content words d in the glosses of S.</Paragraph>
      <Paragraph position="4"> Therefore, we have D(S) = &amp;quot;easy hard do make understand difficult do understand difficult do difficult do easy demanding effort needing much effort difficult well made use difficult use causing difficulty&amp;quot; If we have the relevant senses, it is a simple matter of counting to estimate P(d). Conversely, with P(d) available to us, we can pick the relevant sense of S in I which is most likely generated by P(d). The problem of learning the model P(d) lend itself nicely to an iterative relaxation method such as the Expectation and Maximization Algorithm (Dempster, Laird, Rubin, 1977).</Paragraph>
      <Paragraph position="5"> Initially, we assume all senses of S word in I is equally likely and use all the defining words therein to estimate P(d) regardless of whether they are relevant. For LDOCE senses, initial estimate of the relevant glosses is as follows: D(S) = &amp;quot;easy hard do make understand people unfriendly quarrelling pleased ... firm stiff broken pressed bent difficult do understand forceful needing using force body mind ...bent painful moving moved ... strong weakened suffer uncomfortable conditions cut worn broken ...needing effort difficult lacking skill moving body parts body CLUMSY made use difficult use causing difficulty&amp;quot; null  hard, stiff, tough, arduous, awkward} based on the relevant and irrelevant LDOCE senses, n = 6.</Paragraph>
      <Paragraph position="6">  As evident from Table 1, the initial estimates of P(d) are quite close to the true probability distribution (based on the relevant senses only). The three top ranking defining words &amp;quot;difficult,&amp;quot; &amp;quot;effort,&amp;quot; and &amp;quot;understand&amp;quot; appear in glosses of relevant senses, and not in irrelevant senses. Admittedly, there are still some noisy, irrelevant words such as &amp;quot;bent&amp;quot; and &amp;quot;broken.&amp;quot; But they do not figure prominently in the model from the start and will fade out graduately with successive iterations of re-estimation. We estimate the probability of a particular sense s being in S by P(D(s)), the probability of its gloss under P(d). For intance, we have</Paragraph>
      <Paragraph position="8"> On the other hand, we re-estimate the probability P(d) of a defining word d under CBSDM by how often d appears in a sense s and P(s). P(d) is positively prepositional to the frequency of d in D(s) and to the value of P(s). Under that re-estimation scheme, the defining words in relevant senses will figure more prominently in CBSDM, leading to more accurate estimation for probability of s being in S. For instance, in the first round, &amp;quot;difficult&amp;quot; in the gloss of hard-2 will weigh twice more than &amp;quot;firm&amp;quot; in the gloss of irrelevant hard-1, leading to relatively higher unigram probability for &amp;quot;difficult.&amp;quot; That in turn makes hard-2 even more probable than hard-1. See Table 2.</Paragraph>
      <Paragraph position="9">  Often the senses in I are accompanied with glosses written in a second language (L2); exclusively (as in a simple bilingual word list) or additionally (as in LDOCE E-C). Either way, the words in L2 glosses can be incorporated into D(s) and P(d). For instance, the character unigrams and/or overlapping bigrams in the Mandarin glosses of S in LDOCE E-C and their appearance counts and probability are shown in Table 3.</Paragraph>
      <Paragraph position="10">  {difficult-1, hard-2, stiff-6, tough-4, arduous-1, awkward-2} in LDOCE*.</Paragraph>
      <Paragraph position="11"> We call the part of CBSDM that are involved with words written in L2, Class Based Sense Translation Model. CBSTM trained on a thesaurus and a bilingual MRD can be exploited to align words and translation counter part as well as to assign word sense in a parallel corpus. For instance, given a pair of aligned sentences in a parallel corpus: null (2E) A scholar close to Needham analyses the reasons that he was able to achieve this huge work as being due to a combination of factors that would be hard to find in any other person.</Paragraph>
      <Paragraph position="12"> (Source: 1990, Dec Issue Page 24, Giving Justice Back to China --Dr. Joseph Needham and the History of Science and Civilisation in China) It is possible to apply CBSTM to obtain the following pair of translation equivalent, ( [nan], &amp;quot;hard&amp;quot;) and, at the same time, determine the intended sense. For instance, we can label the cita- null After we have done this for all pairs of word and translation counterpart, we would in effect establish a Bilingual Semantic Concordance (BSC).</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
2.2 The Model
</SectionTitle>
      <Paragraph position="0"> We assume that there is a Class Based Sense Definition Model, which can be viewed as a language model that generates the glosses for a class of senses S. Assume that we are given L, the words of S but not explicitly the intended senses S. In addition, we are given a sense inventory I in the form of an MRD with the regular glosses, which are written in L1 and/or L2. We are concerned with two problems: (1) Unsupervised training of M, CBSDM for S; (2) Determining S by identifying a relevant sense in I, if existing, for each word in L.</Paragraph>
      <Paragraph position="1"> Those two problems can be solved based on Maximum Likelihood Principle: Finding M and S such that M generates the glosses of S with maximum probability. For that, we utilize the Expectation and Maximization Algorithm to derive M and  (2) Sense inventory I.</Paragraph>
      <Paragraph position="2"> Output: (1) Senses S from I for words in L; (2) CBSTM M from L1 to L2.</Paragraph>
      <Paragraph position="3"> 1. Initially, we assume that each of the senses w</Paragraph>
      <Paragraph position="5"> the number of senses in I for the word w</Paragraph>
      <Paragraph position="7"> 7. Estimate and output CBSTM for L,  where c is a unigram or overlapping bigram in L2 and t i,j is the L2 gloss of w i,j.</Paragraph>
      <Paragraph position="8"> Note that the purpose of Step 2 is to estimate how likely a word will appear in the definition of S based on the definining word for the senses, w i,j and relevant probability P(w i,j  |i,L). This likelihood of the word d being used to define senses in questions is subsequently used to re-estimate P(w i,j  |i,L), the likelihood of the jth sense,</Paragraph>
      <Paragraph position="10"> being in the intended senses of L.</Paragraph>
      <Paragraph position="11"> 3. Application to Word Sense Tagging Armed with the Class Based Sense Translation Model, we can attack the word alignment and sense tagging problems simultaneously. Each word in a pair of aligned sentences in a parallel corpus will be considered and assigned a counterpart translation and intended sense in the given context through the proposed algorithm below: Simutaneous Word Alignment and Tagging Algorithm (SWAT) Align and sense tag words in a give sentence and translation. null Input: (1) Pair of sentences (E, C);  (2) Word w, POS p in question; (3) Sense Inventory I; (4) CBSTM, P(c|L).</Paragraph>
      <Paragraph position="12"> Output: (1) Translation c of w in C; (2) Intended sense s for w.</Paragraph>
      <Paragraph position="13"> 1. Perform part of speech tagging on E; 2. Proceed if w with part of speech p is found in the results of tagging E; 3. For all classes L to which (w, p) belongs and all</Paragraph>
      <Paragraph position="15"> where LINK(x, y) means x and y are two word aligned based on Competitive Linking Align- null ment 4. Output c* as the translation; 5. Output the sense of w in L* as the intended sense.  To make sense tagging more precise, it is advisable to place constraint on the translation counterpart c of w. SWAT considers only those translations c that has been linked with w based the Competitive</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML