<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1037"> <Title>Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Probabilistic Models for Parallel Corpora </SectionTitle> <Paragraph position="0"> We motivate the use of a probabilistic model by illustrating that disambiguation using translations is possible even when a word has a unique translation. For example, according to WordNet, the word prevention has two senses in English, which may be abbreviated as hindrance (the act of hindering or obstruction) and control (by prevention, e.g. the control of a disease). It has a single translation in our corpus, prevención. The first English sense, hindrance, is also shared by other words in the corpus, such as bar, and all of these other words are observed to be translated in Spanish as obstrucción. In addition, none of these other words translates to prevención. So it is not unreasonable to suppose that the intended sense of prevention when translated as prevención is different from that of bar; the intended sense is therefore most likely control. At the very heart of this reasoning are probabilistic analysis and independence assumptions: we assume that senses and words have certain occurrence probabilities and that the choice of the word can be made independently once the sense has been decided. This is the flavor that we look to add to modeling parallel documents for sense disambiguation. We formally describe the two generative models that use these ideas in Subsections 2.2 and 2.3.</Paragraph> <Paragraph position="1"> Figure 1: the a) Sense Model and the b) Concept Model</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Notation </SectionTitle> <Paragraph position="0"> Throughout, we use uppercase letters to denote random variables and lowercase letters to denote specific instances of the random variables. A translation pair is $(W_e, W_s)$, where the subscripts $e$ and $s$ indicate the primary language (English) and the secondary language (Spanish), $W_e \in \{w_e^1, \ldots, w_e^{n_e}\}$ and $W_s \in \{w_s^1, \ldots, w_s^{n_s}\}$. We use the shorthand $P(w_e)$ for $P(W_e = w_e)$.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 The Sense Model </SectionTitle> <Paragraph position="0"> The Sense Model makes the assumption, inspired by ideas in Diab and Resnik (2002) and Bengio and Kermorvant (2003), that the English word $W_e$ and the Spanish word $W_s$ in a translation pair share the same precise sense. In other words, the sets of sense labels for the words in the two languages are the same and may be collapsed into one set of senses that is responsible for both English and Spanish words, and the single latent variable in the model is the sense label $T \in \{t^1, \ldots, t^m\}$ for both words $W_e$ and $W_s$. We also make the assumption that the words in both languages are conditionally independent given the sense label.
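As an aside (not part of the original paper), the toy Python sketch below spells out how these two assumptions, a shared sense label and conditional independence of the two words given it, reproduce the prevention/prevención reasoning above. All probability values and the helper sense_posterior are invented purely for illustration; in the model they correspond to the generative parameters introduced next.

```python
# Toy illustration of the Sense Model assumptions (all numbers invented).
prior = {"hindrance": 0.5, "control": 0.5}              # P(t): sense priors
p_en = {                                                 # P(w_e | t)
    "hindrance": {"prevention": 0.2, "bar": 0.8},
    "control":   {"prevention": 1.0, "bar": 0.0},
}
p_es = {                                                 # P(w_s | t)
    "hindrance": {"prevencion": 0.1, "obstruccion": 0.9},
    "control":   {"prevencion": 1.0, "obstruccion": 0.0},
}

def sense_posterior(w_e, w_s):
    """P(t | w_e, w_s) proportional to P(t) P(w_e | t) P(w_s | t),
    using the conditional-independence assumption of the Sense Model."""
    scores = {t: prior[t] * p_en[t].get(w_e, 0.0) * p_es[t].get(w_s, 0.0)
              for t in prior}
    z = sum(scores.values()) or 1.0
    return {t: s / z for t, s in scores.items()}

print(sense_posterior("prevention", "prevencion"))
# {'hindrance': ~0.02, 'control': ~0.98}: the shared translation favors "control".
```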
The generative parameters $\theta$ for the model are the prior probability $P(t)$ of each sense $t$ and the conditional probabilities $P(w_e \mid t)$ and $P(w_s \mid t)$ of each word $w_e$ and $w_s$ in the two languages given the sense. The generation of a translation pair by this model may be viewed as a two-step process that first selects a sense according to the priors on the senses and then selects a word from each language using the conditional probabilities for that sense. This may be imagined as a factoring of the joint distribution:</Paragraph> <Paragraph position="1"> $P(W_e, W_s, T) = P(T)\, P(W_e \mid T)\, P(W_s \mid T)$ </Paragraph> <Paragraph position="2"> Note that in the absence of labeled training data, two of the random variables, $W_e$ and $W_s$, are observed, while the sense variable $T$ is not. However, we can derive the possible values for our sense labels from WordNet, which gives us the possible senses for each English word $W_e$. The Sense Model is shown in Figure 1(a).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 The Concept Model </SectionTitle> <Paragraph position="0"> The assumption of a one-to-one association between sense labels made in the Sense Model may be too simplistic to hold for arbitrary languages. In particular, it does not take into account that translation is from sentence to sentence (with a shared meaning), while the data we are modeling are aligned single-word translations $(W_e, W_s)$, in which the intended meaning of $W_e$ does not always match perfectly with the intended meaning of $W_s$. Generally, a set of related senses in one language may be translated by one of several related senses in the other.</Paragraph> <Paragraph position="1"> This many-to-many mapping is captured in our alternative model using a second-level hidden variable called a concept. Thus we have three hidden variables in the Concept Model: the English sense $T_e$, the Spanish sense $T_s$ and the concept $C$, where $T_e \in \{t_e^1, \ldots, t_e^{m_e}\}$, $T_s \in \{t_s^1, \ldots, t_s^{m_s}\}$ and $C \in \{c^1, \ldots, c^k\}$.</Paragraph> <Paragraph position="3"> We make the assumption that the senses $T_e$ and $T_s$ are independent of each other given the shared concept $C$. The generative parameters $\theta$ in the model are the prior probabilities $P(c)$ over the concepts, the conditional probabilities $P(t_e \mid c)$ and $P(t_s \mid c)$ for the English and Spanish senses given the concept, and the conditional probabilities $P(w_e \mid t_e)$ and $P(w_s \mid t_s)$ for the words $w_e$ and $w_s$ in each language given their senses. We can now imagine the generative process of a translation pair by the Concept Model as first selecting a concept according to the priors, then a sense for each language given the concept, and finally a word for each sense using the conditional probabilities of the words. As in Bengio and Kermorvant (2003), this generative procedure may be captured by factoring the joint distribution using the conditional independence assumptions as $P(W_e, W_s, T_e, T_s, C) = P(C)\, P(T_e \mid C)\, P(T_s \mid C)\, P(W_e \mid T_e)\, P(W_s \mid T_s)$.</Paragraph> </Section> </Section> </Paper>
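Supplementary illustration (not part of the paper): the minimal sketch below writes out the two generative processes of Subsections 2.2 and 2.3 as sampling procedures, directly mirroring the factorizations $P(T)\,P(W_e \mid T)\,P(W_s \mid T)$ and $P(C)\,P(T_e \mid C)\,P(T_s \mid C)\,P(W_e \mid T_e)\,P(W_s \mid T_s)$. The function names and the dictionary-based parameterization are assumptions of this sketch; in the paper, the corresponding quantities are the generative parameters $\theta$, which are estimated from unlabeled aligned data rather than specified by hand.

```python
import random

def sample_categorical(dist):
    """Draw one key from a {value: probability} dictionary."""
    r, acc = random.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r <= acc:
            return value
    return value  # guard against floating-point round-off

def generate_sense_model(p_t, p_we_t, p_ws_t):
    """Sense Model: P(W_e, W_s, T) = P(T) P(W_e|T) P(W_s|T)."""
    t = sample_categorical(p_t)              # pick a shared sense
    w_e = sample_categorical(p_we_t[t])      # English word given the sense
    w_s = sample_categorical(p_ws_t[t])      # Spanish word given the sense
    return w_e, w_s, t

def generate_concept_model(p_c, p_te_c, p_ts_c, p_we_te, p_ws_ts):
    """Concept Model:
    P(W_e, W_s, T_e, T_s, C) = P(C) P(T_e|C) P(T_s|C) P(W_e|T_e) P(W_s|T_s)."""
    c = sample_categorical(p_c)              # pick a concept
    t_e = sample_categorical(p_te_c[c])      # English sense given the concept
    t_s = sample_categorical(p_ts_c[c])      # Spanish sense given the concept
    w_e = sample_categorical(p_we_te[t_e])   # English word given its sense
    w_s = sample_categorical(p_ws_ts[t_s])   # Spanish word given its sense
    return w_e, w_s, t_e, t_s, c
```

Sampling repeatedly from either function yields synthetic translation pairs whose statistics follow whatever distributions are supplied, which is the sense in which both models "generate" the aligned data.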