<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1037">
  <Title>Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Constructing the Senses and Concepts
</SectionTitle>
    <Paragraph position="0"> Building the structure of the model is crucial for our task. Choosing the dimensionality of the hidden variables by selecting the number of senses and concepts, as well as taking advantage of prior knowledge to impose constraints, are very important aspects of building the structure.</Paragraph>
    <Paragraph position="1"> If certain words are not possible for a given sense, or certain senses are not possible for a given concept, their corresponding parameters should be 0.</Paragraph>
    <Paragraph position="2"> For instance, for all words a10 a1 that do not belong to a sense a40 a1 , the corresponding parameter a46a69a71a73a72a36a74a75a76a72 would be permanently set to 0. Only the remaining parameters need to be modeled explicitly.</Paragraph>
    <Paragraph position="3"> While model selection is an extremely dif cult problem in general, an important and interesting option is the use of world knowledge. Semantic hierarchies for some languages have been built. We should be able to make use of these known taxonomies in constructing our model. We make heavy use of the WordNet ontology to assign structure to both our models, as we discuss in the following subsections. There are two major tasks in building the structure determining the possible sense labels for each word, both English and Spanish, and constructing the concepts, which involves choosing the number of concepts and the probable senses for each concept.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 Building the Sense Model
</SectionTitle>
      <Paragraph position="0"> Each word in WordNet can belong to multiple synsets in the hierarchy, which are its possible senses. In both of our models, we directly use the WordNet senses as the English sense labels. All WordNet senses for which a word has been observed in the corpus form our set of English sense labels. The Sense Model holds that the sense labels for the two domains are the same. So we must use the same WordNet labels for the Spanish words as well. We include a Spanish word a10 a3 for a sense a40 if a10 a3 is the translation of any English word a10 a1 in a40 .</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Building the Concept Model
</SectionTitle>
      <Paragraph position="0"> Unlike the Sense Model, the Concept Model does not constrain the Spanish senses to be the same as the English ones. So the two major tasks in building the Concept Model are constructing the Spanish senses and then clustering the English and Spanish senses to build the concepts.</Paragraph>
      <Paragraph position="1">  For each Spanish word a10a77a3 , we have its set of English translations a8a11a10a78a1a14a13a16a15a16a17a16a17a16a17a11a15a19a10a12a1a21a61a79a22 . One possibility is to group Spanish words looking at their translations.</Paragraph>
      <Paragraph position="2"> However, a more robust approach is to consider the relevant English senses for a10 a3 . Each English translation for a10 a3 has its set of English sense labels a80 a71a73a72a68a81 drawn from WordNet. So the relevant English sense labels for a10 a3 may be de ned as a80 a71a83a82 a34a85a84a78a86 a80 a71a73a72 a81 . We call this the English sense map or a5a50a87a24a88a18a89 for a10 a3 . We use the a5a50a87a24a88a18a89 s to de ne the Spanish senses.</Paragraph>
      <Paragraph position="3"> We may imagine each Spanish word to come from one or more Spanish senses. If each word has a single sense, then we add a Spanish sense a40 a3 for each a5a42a87a90a88a18a89 and all Spanish words that share that a5a50a87a24a88a18a89 belong to that sense. Otherwise, the a5a42a87a90a88a11a89 s have to be split into frequently occurring subgroups.</Paragraph>
      <Paragraph position="4"> Frequently co-occurring subsets of a5a50a87a24a88a18a89 s can dene more re ned Spanish senses. We identify these subsets by looking at pairs of a5a50a87a90a88a11a89 s and computing their intersections. An intersection is considered to be a Spanish sense if it occurs for a signi cant number of pairs of a5a50a87a24a88a18a89 s. We consider both ways of building Spanish senses. In either case, a constructed Spanish sense a40 a3 comes with its relevant set a8a11a40 a1 a81 a22 of English senses, which we denote as a5a50a87a90a88a11a89 a28 a40a19a3 a30 .</Paragraph>
      <Paragraph position="5"> Once we have the Spanish senses, we cluster them to form concepts. We use the a5a42a87a90a88a18a89 corresponding to each Spanish sense to de ne a measure of similarity for a pair of Spanish senses. There are many options to choose from here. We use a simple measure that counts the number of common items in the two a5a42a87a90a88a11a89 s.1 The similarity measure is now used to cluster the Spanish senses a40 a3 . Since this measure is not transitive, it does not directly de ne equivalence classes over a40a54a3 . Instead, we get a similarity graph where the vertices are the Spanish senses and we add an edge between two senses if their similarity is above a threshold. We now pick each connected component from this graph as a cluster of similar Spanish senses.</Paragraph>
      <Paragraph position="6"> 1Another option would be to use a measure of similarity for English senses, proposed in Resnik (1995) for two synsets in a concept hierarchy like WordNet. Our initial results with this measure were not favorable.</Paragraph>
      <Paragraph position="7"> Now we build the concepts from the Spanish sense clusters. We recall that a concept is de ned by a set of English senses and a set of Spanish senses that are related. Each cluster represents a concept.</Paragraph>
      <Paragraph position="8"> A particular concept is formed by the set of Spanish senses in the cluster and the English senses relevant for them. The relevant English senses for any Spanish sense is given by its a5a50a87a90a88a11a89 . Therefore, the union of the a5a42a87a90a88a11a89 s of all the Spanish senses in the cluster forms the set of English senses for each concept.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Learning the Model Parameters
</SectionTitle>
    <Paragraph position="0"> Once the model is built, we use the popular EM algorithm (Dempster et al., 1977) for hidden variables to learn the parameters for both models. The algorithm repeatedly iterates over two steps. The rst step maximizes the expected log-likelihood of the joint probability of the observed data with the current parameter settings a46 a47 . The next step then re-estimates the values of the parameters of the model.</Paragraph>
    <Paragraph position="1"> Below we summarize the re-estimation steps for each model.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 EM for the Sense Model
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> follow similarly.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Initialization of Model Probabilities
</SectionTitle>
      <Paragraph position="0"> Since the EM algorithm performs gradient ascent as it iteratively improves the log-likelihood, it is prone to getting caught in local maxima, and selection of the initial conditions is crucial for the learning procedure. Instead of opting for a uniform or random initialization of the probabilities, we make use of prior knowledge about the English words and senses available from WordNet. Word-Net provides occurrence frequencies for each synset in the SemCor Corpus that may be normalized to derive probabilities a27 a71a113a112 a28 a40 a1a36a30 for each English sense a40a62a1 . For the Sense Model, these probabilities form the initial priors over the senses, while all English (and Spanish) words belonging to a sense are initially assumed to be equally likely. However, initialization of the Concept Model using the same knowledge is trickier. We would like each English sense a40 a1 to have a27 a86 a112 a86 a75 a28 a40 a1a31a30a2a34a114a27 a71a113a112 a28 a40 a1a36a30 . But the fact that each sense belongs to multiple concepts and the constraint a98 a75 a72a54a115a69a116 a27a29a28 a40 a1a69a49 a66 a30a117a34 a92 makes the solution non-trivial. Instead, we settle for a compromise. We set a27 a86 a112 a86 a75 a28 a40 a1a69a49 a66 a30a118a34a119a27 a71a113a112 a28 a40 a1a36a30 and</Paragraph>
      <Paragraph position="2"> takes care of the sum constraints. For a Spanish sense, we set a27a29a28 a40a60a3 a30a97a34 a98 a75a76a72 a115 a3a21a121a123a122a125a124a69a126 a75a76a82a21a127 a27 a71a113a112 a28 a40a62a1 a30 . Once we have the Spanish sense probabilities, we follow the same procedure for setting a27a29a28 a40 a3a50a49 a66 a30 for each concept. All the Spanish and English words for a sense are set to be equally likely, as in the Sense Model.</Paragraph>
      <Paragraph position="3"> It turned out in our experiments on real data that this initialization makes a signi cant difference in model performance.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Experimental Evaluation
</SectionTitle>
    <Paragraph position="0"> Both the models are generative probabilistic models learned from parallel corpora and are expected to t the training and subsequent test data. A good t should be re ected in good prediction accuracy over a test set. The prediction task of interest is the sense of an English word when its translation is provided.</Paragraph>
    <Paragraph position="1"> We estimate the prediction accuracy and recall of our models on Senseval data.2 In addition, the Concept Model learns a sense structure for the Spanish 2Accuracy is the ratio of the number of correct predictions and the number of attempted predictions. Recall is the ratio of the number of correct predictions and the size of the test set. language. While it is hard to objectively evaluate the quality of such a structure, we present some interesting concepts that are learned as an indication of the potential of our approach.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Evaluation with Senseval Data
</SectionTitle>
      <Paragraph position="0"> In our experiments with real data, we make use of the parallel corpora constructed by Diab and Resnik (2002) for evaluation purposes. We chose to work on these corpora in order to permit a direct comparison with their results. The sense-tagged portion of the English corpus is comprised of the English all-words section of the SENSEVAL-2 test data. The remainder of this corpus is constructed by adding the Brown Corpus, the SENSEVAL-1 corpus, the SENSEVAL-2 English Lexical Sample test, trial and training corpora and the Wall Street Journal sections 18-24 from the Penn Treebank. This English corpus is translated into Spanish using two commercially available MT systems: Globalink Pro 6.4 and Systran Professional Premium. The GIZA++ implementation of the IBM statistical MT models was used to derive the most-likely word-level alignments, and these de ne the English/Spanish word co-occurrences. To take into account variability of translation, we combine the translations from the two systems for each English word, following in the footsteps of Diab and Resnik (2002). For our experiments, we focus only on nouns, of which there are 875 occurrences in our tagged data. The sense tags for the English domain are derived from the WordNet 1.7 inventory. After pruning stopwords, we end up with 16,186 English words, 31,862 Spanish words and 2,385,574 instances of 41,850 distinct  As can be seen from the following table, both our models clearly outperform Diab (2003), which is an improvement over Diab and Resnik (2002), in both accuracy and recall, while the Concept Model does signi cantly better than the Sense Model with fewer parameters. The comparison is restricted to the same subset of the test data. For our best results, the Sense Model has 20,361 senses, while the Concept Model has 20,361 English senses, 11,961 Spanish senses and 7,366 concepts. The Concept Model results are for the version that allows multiple senses for a Spanish word. Results for the  single-sense model are similar.</Paragraph>
      <Paragraph position="1"> In Figure 3, we compare the prediction accuracy and recall against those of the 21 Senseval-2 English All Words participants and that of Diab (2003), when restricted to the same set of noun instances from the gold standard. It can be seen that our models outperform all the unsupervised approaches in recall and many supervised ones as well. No unsupervised approach is better in both accuracy and recall. It needs to be kept in mind that we take into account only bilingual data for our predictions, and not monolingual features like context of the word as most other WSD approaches do.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Semantic Grouping of Spanish Senses
</SectionTitle>
      <Paragraph position="0"> Table 2 shows some interesting examples of different Spanish senses for discovered concepts.3 The context of most concepts, like the ones shown, can be easily understood. For example, the rst concept is about government actions and the second deals with murder and accidental deaths. The penultimate concept is interesting because it deals with different kinds of association and involves three different senses containing the word conexi*on. The other words in two of these senses suggest that they are about union and relation respectively. The third probably involves the link sense of connection.</Paragraph>
      <Paragraph position="1"> Conciseness of the concepts depends on the similarity threshold that is selected. Some may bring together loosely-related topics, which can be separated by a higher threshold.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Model Analysis
</SectionTitle>
    <Paragraph position="0"> In this section, we back up our experimental results with an in-depth analysis of the performance of our two models.</Paragraph>
    <Paragraph position="1"> Our Sense Model was motivated by Diab and Resnik (2002) but the avors of the two are quite 3Some English words are found to occur in the Spanish Senses. This is because the machine translation system used to create the Spanish document left certain words untranslated. different. The most important distinction is that the Sense Model is a probabilistic generative model for parallel corpora, where interaction between different words stemming from the same sense comes into play, even if the words are not related through translations, and this interdependence of the senses through common words plays a role in sense disambiguation. null We started off with our discussions on semantic ambiguity with the intuition that identi cation of semantic concepts in the corpus that relate multiple senses should help disambiguate senses. The Sense Model falls short of this target since it only brings together a single sense from each language.</Paragraph>
    <Paragraph position="2"> We will now revisit the motivating example from Section 2 and see how concepts help in disambiguation by grouping multiple related senses together. For the Sense Model, a27a29a28a129a128a83a130a60a131a31a132a70a131a31a133a73a134a68a135a137a136a69a133a63a49 a40 a1a62a138a31a30a140a139 a27a29a28a129a128a83a130a60a131a31a132a70a131a31a133a73a134a68a135a137a136a69a133a63a49 a40 a1a60a13a14a30 since it is the only word that a40 a1a21a138 can generate. However, this difference is compensated for by the higher prior probability a27a29a28 a40 a1a14a13a14a30 , which is strengthened by both the translation pairs.</Paragraph>
    <Paragraph position="3"> Since the probability of joint occurrence is given by the product a27a51a28 a40 a30a21a27a29a28 a10 a1a70a49 a40 a30a21a27a29a28 a10 a3a50a49 a40 a30 for any sense a40 , the model does not develop a clear preference for any of the two senses.</Paragraph>
    <Paragraph position="4"> The critical difference in the Concept Model can be appreciated directly from the corresponding joint probability a27a29a28 a66 a30a21a27a29a28 a40 a1a70a49 a66 a30a21a27a29a28 a10 a1a79a49 a40 a1a31a30a21a27a29a28 a40 a3a50a49 a66 a30a21a27a29a28 a10 a3a50a49 a40 a3a54a30 , where a66 is the relevant concept in the model.</Paragraph>
    <Paragraph position="5"> The preference for a particular instantiation in the model is dependent not on the prior a27a29a28 a40a14a1 a30 over a sense, but on the sense conditional a27a51a28 a40 a1a50a49 a66 a30 . In our example, since a141 bar, obstrucci*ona139 can be generated only through concept a66a16a142a69a143 , a27a29a28 a40 a1a14a13a42a49 a66a16a142a69a143 a30 is the only English sense conditional boosted by it.</Paragraph>
    <Paragraph position="6"> a141 prevention, prevenci*on a139 is generated through a different concept a66a31a144 a92a70a92a11a145 , where the higher conditional a27a29a28a129a128a113a130a19a131a16a132a69a131a31a133a83a134a68a135a68a136a69a133a63a49 a40 a1a21a138a31a30 gradually strengthens one of the possible instantiations for it, and the other one becomes increasingly unlikely as the iterations progress. The inference is that only one sense of prevention is possible in the context of the parallel corpus. The key factor in this disambiguation was that two senses of prevention separated out in two different concepts.</Paragraph>
    <Paragraph position="7"> The other signi cant difference between the models is in the constraints on the parameters and the effect that they have on sense disambiguation. In the Sense Model, a98 a75 a27a29a28 a40 a30a117a34 a92 , while in the Concept Model, a98 a75 a72a54a115a69a116 a27a29a28 a40 a1a69a49 a66 a30a63a34 a92 separately for each concept a66 . Now for two relevant senses for an English word, a slight difference in their priors will tend to get ironed out when normalized over the en- null actos accidente accidentes supremas muertes(deaths) decisi*on decisiones casualty gobernando gobernante matar(to kill) matanzas(slaughter) muertes-le gubernamentales slaying gobernaci*on gobierno-proporciona derramamiento-de-sangre (spilling-of-blood) prohibir prohibiendo prohibitivo prohibitiva cachiporra(bludgeon) obligar(force) obligando(forcing) gubernamental gobiernos asesinato(murder) asesinatos linterna-el*ectrica linterna(lantern) man*ia craze faros-autom*ovil(headlight) culto(cult) cultos proto-senility linternas-portuarias(harbor-light) delirio delirium antorcha(torch) antorchas antorchas-pino-nudo rabias(fury) rabia farfulla(do hastily) oportunidad oportunidades diferenciaci*on ocasi*on ocasiones distinci*on distinciones riesgo(risk) riesgos peligro(danger) especializaci*on destino sino(fate) maestr*ia (mastery) fortuna suerte(fate) peculiaridades particularidades peculiaridades-inglesas probabilidad probabilidades especialidad especialidades diablo(devil) diablos modelo parang*on dickens ideal ideales heller santo(saint) santos san lucifer satan satan*as idol idols *idolo deslumbra(dazzle) dios god dioses cromo(chromium) divinidad divinity meteoro meteoros meteor meteoros-blue inmortal(immortal) inmortales meteorito meteoritos teolog*ia teolog pedregosos(rocky) deidad deity deidades variaci*on variaciones minutos minuto discordancia desacuerdo(discord) discordancias momento momentos un-momento desviaci*on(deviation) desviaciones desviaciones-normales minutos momentos momento segundos discrepancia discrepancias fugaces( eeting) variaci*on diferencia instante momento disensi*on pesta neo(blink) gui na(wink) pesta nean adhesi*on adherencia ataduras(tying) pasillo(corridor) enlace(connection) ataduras aisle atadura ataduras pasarela(footbridge) conexi*on conexiones hall vest*ibulos conexi*on une(to unite) pasaje(passage) relaci*on conexi*on callej*on(alley) callejas-ciegas (blind alley) callejones-ocultos implicaci*on (complicity) envolvimiento tire set of senses for the corpus. In contrast, if these two senses belong to the same concept in the Concept Model, the difference in the sense conditionals will be highlighted since the normalization occurs over a very small set of senses the senses for only that concept, which in the best possible scenario will contain only the two contending senses, as in concept a66 a92a70a92a11a145 of our example.</Paragraph>
    <Paragraph position="8"> As can be seen from Table 1, the Concept Model not only outperforms the Sense Model, it does so with signi cantly fewer parameters. This may be counter-intuitive since Concept Model involves an extra concept variable. However, the dissociation of Spanish and English senses can signi cantly reduce the parameter space. Imagine two Spanish words that are associated with ten English senses and accordingly each of them has a probability for belonging to each of these ten senses. Aided with a concept variable, it is possible to model the same relationship by creating a separate Spanish sense that contains these two words and relating this Spanish sense with the ten English senses through a concept variable. Thus these words now need to belong to only one sense as opposed to ten. Of course, now there are new transition probabilities for each of the eleven senses from the new concept node. The exact reduction in the parameter space will depend on the frequent subsets discovered for the a5a50a87a24a88a18a89 s of the Spanish words. Longer and more frequent subsets will lead to larger reductions. It must also be borne in mind that this reduction comes with the independence assumptions made in the Concept Model.</Paragraph>
  </Section>
class="xml-element"></Paper>