<?xml version="1.0" standalone="yes"?> <Paper uid="J00-2004"> <Title>Models of Translational Equivalence among Words</Title> <Section position="4" start_page="223" end_page="224" type="intro"> <SectionTitle> AEA </SectionTitle> <Paragraph position="0"> The second stage of decomposition takes us from bags of words to the words that they contain. The following bag pair generation process illustrates how a word-to-word translation model can be embedded in a bag-to-bag translation model for languages L1 and L2: From each concept C_i, 1 ≤ i ≤ l, generate a pair of word sequences (u_i, v_i) from L1* × L2*, according to the distribution trans(u, v | C_i), to lexicalize the concept in the two languages. 5 Some concepts are not lexicalized in some languages, so one of u_i and v_i may be empty.</Paragraph> <Paragraph position="1"> A pair of bags containing m and n nonempty word sequences can be generated by a process where l is anywhere between max(m, n) and m + n.</Paragraph> <Paragraph position="2"> For notational convenience, the elements of the two bags can be labeled so that B1 = {u_1, ..., u_m} and B2 = {v_1, ..., v_n}, where some of the u's and v's may be empty. The elements of an assignment, then, are pairs of bag element labels: A = {(i_1, j_1), ..., (i_l, j_l)}, where each i ranges over {u_1, ..., u_m}, each j ranges over {v_1, ..., v_n}, 3 Assignments are different from the alignments of Brown, Della Pietra, Della Pietra, and Mercer (1993) in that assignments can range over pairs of arbitrary labels, not necessarily sequence position indexes. Also, unlike alignments, assignments must be one-to-one.</Paragraph> <Paragraph position="3"> 4 The exact nature of the bag size distribution is immaterial for the present purposes. 5 Since they are put into bags, u_i and v_i could just as well be bags instead of sequences. I make them sequences only to be consistent with more sophisticated models that account for noncompositional compounds (e.g.
Melamed, to appear, Chapter 8).</Paragraph> <Paragraph position="4"> each i is distinct, and each j is distinct. The label pairs in a given assignment can be generated in any order, so there are l! ways to generate an assignment of size l. 6 It follows that the probability of generating a pair of bags (B1, B2) with a particular assignment A of size l is Pr(B1, A, B2 | l, 𝒞, trans) = Pr(l) · l! · ∏_{(i,j) ∈ A} Σ_{C ∈ 𝒞} Pr(C) trans(u_i, v_j | C). (10) The above equation holds regardless of how we represent concepts. There are many plausible representations, such as pairs of trees from synchronous tree-adjoining grammars (Abeillé et al. 1990; Shieber 1994; Candito 1998), lexical conceptual structures (Dorr 1992), and WordNet synsets (Fellbaum 1998; Vossen 1998). Of course, for a representation to be used, a method must exist for estimating its distribution in data. A useful representation will reduce the entropy of the trans distribution, which is conditioned on the concept distribution as shown in Equation 10. This topic is beyond the scope of this article, however; I mention it only to show how the models presented here may be used as building blocks for models that are more psycholinguistically sophisticated.</Paragraph> <Paragraph position="5"> To make the translation model estimation methods presented here as general as possible, I shall assume a totally uninformative concept representation: the trans distribution itself. In other words, I shall assume that each different pair of word sequence types is deterministically generated from a different concept, so that trans(u_i, v_j | C) is zero for all concepts except one. Now, a bag-to-bag translation model can be fully specified by the distributions of l and trans.</Paragraph> <Paragraph position="7"> The probability distribution trans(u, v) is a word-to-word translation model. Unlike the models proposed by Brown et al.
(1993b), this model is symmetric, because both word bags are generated together from a joint probability distribution. Brown and his colleagues' models, reviewed in Section 4.3, generate one half of the bitext given the other half, so they are represented by conditional probability distributions. A sequence-to-sequence translation model can be obtained from a word-to-word translation model by combining Equation 11 with order information as in Equation 8.</Paragraph> </Section></Paper>
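The bag-pair probability in Equation (10) can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it adopts the "uninformative concept" assumption from the text, under which the sum over concepts collapses to a single trans(u_i, v_j) term, so the probability of a bag pair with a particular assignment A of size l is Pr(l) · l! · ∏ trans(u_i, v_j). The toy trans entries and the Poisson bag-size distribution are hypothetical (the paper's footnote 4 leaves Pr(l) unspecified).

```python
import math

# Hypothetical joint word-to-word translation distribution trans(u, v),
# standing in for the degenerate concept representation: each word pair
# is generated deterministically from its own concept.
TRANS = {
    ("the", "la"): 0.3,
    ("blue", "bleue"): 0.1,
    ("house", "maison"): 0.2,
    ("", "voila"): 0.05,  # a concept lexicalized in only one language
}

def pr_bag_size(l, lam=2.0):
    """Stand-in bag size distribution Pr(l); the paper leaves its exact
    form open (footnote 4), so a Poisson is assumed here for illustration."""
    return math.exp(-lam) * lam ** l / math.factorial(l)

def pr_bags_with_assignment(assignment):
    """Equation (10) under the uninformative concept representation:
    Pr(B1, A, B2 | l, trans) = Pr(l) * l! * prod over (i,j) of trans(u_i, v_j).
    The l! factor counts the orders in which the label pairs of a
    one-to-one assignment of size l can be generated."""
    l = len(assignment)
    prob = pr_bag_size(l) * math.factorial(l)
    for u, v in assignment:
        prob *= TRANS.get((u, v), 0.0)
    return prob

# One particular one-to-one assignment of size l = 3 between the bags
# B1 = {the, blue, house} and B2 = {la, bleue, maison}:
A = [("the", "la"), ("blue", "bleue"), ("house", "maison")]
p = pr_bags_with_assignment(A)
```

Marginalizing this quantity over all one-to-one assignments and bag sizes would give the bag-to-bag probability referred to as Equation 11 in the text.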