<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1050"> <Title>Towards a Unified Approach to Memory- and Statistical-Based Machine Translation</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The IBM Model 4 </SectionTitle> <Paragraph position="0"> For the work described in this paper we used a modified version of the statistical machine translation tool developed in the context of the 1999 Johns Hopkins Summer Workshop (Al-Onaizan et al., 1999), which implements IBM translation model 4 (Brown et al., 1993).</Paragraph> <Paragraph position="1"> IBM model 4 revolves around the notion of word alignment over a pair of sentences (see Figure 2). The word alignment is a graphical representation of a hypothetical stochastic process by which a source string e = e_1 ... e_l is converted into a target string f = f_1 ... f_m. The probability of an alignment a and a target sentence f, given a source sentence e, is</Paragraph> <Paragraph position="2"> P(a, f \mid e) = \prod_{i=1}^{l} n(\phi_i \mid e_i) \times \prod_{i=0}^{l} \prod_{k=1}^{\phi_i} t(\tau_{ik} \mid e_i) \times \prod_{i,k} d(\cdot) \times \binom{m - \phi_0}{\phi_0}\, p_1^{\phi_0} (1 - p_1)^{m - 2\phi_0} </Paragraph> <Paragraph position="3"> where the factors delineated by the multiplication symbols correspond to hypothetical steps in the following generative process. Each English word e_i is assigned a fertility φ_i with probability n(φ_i | e_i); the fertility is the number of French words into which e_i is going to be translated.</Paragraph> <Paragraph position="4"> Each English word e_i is then translated with probability t(τ_ik | e_i) into a French word τ_ik, where k ranges from 1 to φ_i (the fertility of e_i). For example, the English word &quot;no&quot; in Figure 2 is a word of fertility 2 that is translated into &quot;aucun&quot; and &quot;ne&quot;. The rest of the factors denote distortion probabilities (d), which capture the probability that words change their position when translated from one language into another, the probability of the φ_0 French words that are generated from an invisible English NULL element (p_1), etc. See (Brown et al., 1993) or (Germann et al., 2001) for a detailed discussion of this translation model and a description of its parameters.</Paragraph>
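As a concrete illustration of the factorization above, the sketch below scores one candidate alignment as a product of fertility, lexical translation, distortion, and NULL-generation terms. It is a simplified stand-in rather than the exact parameterization of Brown et al. (1993): fertility_prob, translation_prob, distortion_prob, and p1 are hypothetical toy tables, and the flat distortion term glosses over Model 4's class-conditioned distortion.

```python
import math


def model4_style_score(e_words, f_words, alignment,
                       fertility_prob, translation_prob, distortion_prob, p1):
    """Score one candidate alignment as a product of fertility, lexical
    translation, distortion, and NULL-generation factors.

    e_words     : English words, with e_words[0] reserved for the NULL word
    f_words     : French words
    alignment   : alignment[j] = index of the English word that generated
                  f_words[j] (0 means the NULL word)
    fertility_prob[e][phi], translation_prob[e][f], distortion_prob[(i, j)]
    and p1 are toy parameter tables standing in for the real Model 4 tables.
    """
    log_p = 0.0
    fertilities = [alignment.count(i) for i in range(len(e_words))]

    # Fertility factors n(phi_i | e_i) for every real English word.
    for i in range(1, len(e_words)):
        log_p += math.log(fertility_prob[e_words[i]][fertilities[i]])

    # Lexical translation factors t(tau_ik | e_i), including NULL translations.
    for j, f in enumerate(f_words):
        log_p += math.log(translation_prob[e_words[alignment[j]]][f])

    # Distortion factors: a flat table keyed on (source position, target
    # position); the real model conditions on word classes and on previously
    # placed words instead.
    for j, i in enumerate(alignment):
        if i != 0:
            log_p += math.log(distortion_prob.get((i, j), 1e-6))

    # NULL-generation factor for the phi_0 French words produced by NULL.
    phi0, m = fertilities[0], len(f_words)
    if m - phi0 >= phi0:
        log_p += math.log(math.comb(m - phi0, phi0))
    log_p += phi0 * math.log(p1) + (m - 2 * phi0) * math.log(1.0 - p1)

    return math.exp(log_p)
```

For the sentence pair in Figure 2, e_words would be ['NULL', 'there', 'is', 'no', 'one', 'union', 'involved', '.'], f_words the tokens of "aucun syndicat particulier ne est en cause .", and alignment [3, 5, 4, 3, 2, 6, 6, 0].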
</Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Building a statistical translation memory </SectionTitle> <Paragraph position="0"> Companies that specialize in producing high-quality human translations of documentation and news often rely on translation memory (TMEM) tools to increase their productivity (Sprung, 2000). Building a high-quality TMEM is an expensive process that requires many person-years of work. Since we are not in the fortunate position of having access to an existing TMEM, we decided to build one automatically.</Paragraph> <Paragraph position="1"> We trained IBM translation model 4 on 500,000 English-French sentence pairs from the Hansard corpus. We then used the Viterbi alignment of each sentence pair, i.e., the alignment of highest probability, to extract tuples of the form ⟨e_i, ..., e_{i+k} ; f_j, ..., f_{j+l} ; a_j, ..., a_{j+l}⟩, where e_i, ..., e_{i+k} represents a contiguous English phrase, f_j, ..., f_{j+l} represents a contiguous French phrase, and a_j, ..., a_{j+l} represents the Viterbi alignment between the two phrases. We selected only &quot;contiguous&quot; alignments, i.e., alignments in which the words in the English phrase generated only words in the French phrase and each word in the French phrase was generated either by the NULL word or by a word from the English phrase.</Paragraph> <Paragraph position="2"> We extracted only tuples in which the English and French phrases contained at least two words.</Paragraph> <Paragraph position="3"> For example, in the Viterbi alignment of the two sentences in Figure 2, which was produced automatically by IBM model 4, &quot;there&quot; and &quot;.&quot; are words of fertility 0, NULL generates the French lexeme &quot;.&quot;, &quot;is&quot; generates &quot;est&quot;, &quot;no&quot; generates &quot;aucun&quot; and &quot;ne&quot;, and so on. From this alignment we extracted the six tuples shown in Table 1, because they were the only ones that satisfied all the conditions mentioned above. For example, the pair ⟨no one ; aucun syndicat particulier ne⟩ does not occur in the translation memory because the French word &quot;syndicat&quot; is generated by the word &quot;union&quot;, which does not occur in the English phrase &quot;no one&quot;.</Paragraph>
Table 1: English | French | Alignment
one union | syndicat particulier | one → particulier ; union → syndicat
no one union | aucun syndicat particulier ne | no → aucun, ne ; one → particulier ; union → syndicat
is no one union | aucun syndicat particulier ne est | is → est ; no → aucun, ne ; one → particulier ; union → syndicat
there is no one union | aucun syndicat particulier ne est | is → est ; no → aucun, ne ; one → particulier ; union → syndicat
is no one union involved | aucun syndicat particulier ne est en cause | is → est ; no → aucun, ne ; one → particulier ; union → syndicat ; involved → en cause
there is no one union involved | aucun syndicat particulier ne est en cause | is → est ; no → aucun, ne ; one → particulier ; union → syndicat ; involved → en cause
there is no one union involved . | aucun syndicat particulier ne est en cause . | is → est ; no → aucun, ne ; one → particulier ; union → syndicat ; involved → en cause ; NULL → .
<Paragraph position="4"> By extracting all tuples of the form ⟨e ; f ; a⟩ from the training corpus, we ended up with many duplicates and with French phrases that were paired with multiple English translations. We chose for each French phrase only one possible English translation equivalent. We tried out two distinct methods for choosing a translation equivalent, thus constructing two different probabilistic TMEMs: the Frequency-based Translation MEMory (FTMEM) was created by associating with each French phrase the English equivalent that occurred most often in the collection of phrases that we extracted; the Probability-based Translation MEMory (PTMEM) was created by associating with each French phrase the English equivalent that corresponded to the alignment of highest probability.</Paragraph> <Paragraph position="5"> In contrast to other TMEMs, our TMEMs explicitly encode not only the mutual translation pairs but also their corresponding word-level alignments, which are derived according to a certain translation model (in our case, IBM model 4). The mutual translations can be anywhere from two words long to complete sentences. Both methods yielded translation memories that contained around 11.8 million word-aligned translation pairs.</Paragraph>
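The extraction and the two selection rules described above might look roughly as follows. This is a minimal sketch under stated assumptions: the alignment is encoded as one English index per French word (0 for NULL), the French side of each tuple is taken to be the minimal span covering the aligned words, and alignment_score is a placeholder for the model-4 probability of a sentence pair's Viterbi alignment; none of these names come from the authors' implementation.

```python
from collections import Counter, defaultdict


def extract_tuples(e_words, f_words, alignment):
    """Enumerate contiguous phrase pairs from one word-aligned sentence pair.

    e_words   : English words, with e_words[0] reserved for the NULL word
    f_words   : French words
    alignment : alignment[j] = index of the English word that generated
                f_words[j] (0 means the NULL word)
    Yields (english_phrase, french_phrase) strings that satisfy the
    contiguity and length conditions described in the text.
    """
    for i1 in range(1, len(e_words)):
        for i2 in range(i1 + 1, len(e_words)):    # at least two English words
            span = set(range(i1, i2 + 1))
            f_pos = [j for j, a in enumerate(alignment) if a in span]
            if len(f_pos) < 2:                    # at least two French words
                continue
            j1, j2 = min(f_pos), max(f_pos)
            # every French word inside the span must come from the English
            # phrase or from NULL
            if all(alignment[j] == 0 or alignment[j] in span
                   for j in range(j1, j2 + 1)):
                yield (" ".join(e_words[i1:i2 + 1]),
                       " ".join(f_words[j1:j2 + 1]))


def build_tmems(aligned_corpus, alignment_score):
    """Keep one English equivalent per French phrase: the most frequent one
    (FTMEM) or the one seen in the highest-scoring alignment (PTMEM)."""
    counts = defaultdict(Counter)   # French phrase -> Counter of English phrases
    best = {}                       # French phrase -> (score, English phrase)
    for e_words, f_words, alignment in aligned_corpus:
        score = alignment_score(e_words, f_words, alignment)
        for eng, fr in extract_tuples(e_words, f_words, alignment):
            counts[fr][eng] += 1
            if fr not in best or score > best[fr][0]:
                best[fr] = (score, eng)
    ftmem = {fr: c.most_common(1)[0][0] for fr, c in counts.items()}
    ptmem = {fr: eng for fr, (_, eng) in best.items()}
    return ftmem, ptmem
```

Applied to the Figure 2 alignment, this enumeration recovers the tuples of Table 1 along with a few extra pairs involving the fertility-zero period; the exact treatment of such boundary words (for instance the NULL-generated final "." in the last row of Table 1) is one of the details this sketch leaves open.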
<Paragraph position="6"> Due to efficiency considerations and memory limitations -- the software we wrote loads a complete TMEM into memory -- we used in our experiments only a fraction of each TMEM, namely the entries whose phrases were at most 10 words long. This yielded a working FTMEM of 4.1 million and a working PTMEM of 5.7 million phrase translation pairs aligned at the word level using IBM statistical model 4.</Paragraph> <Paragraph position="7"> To evaluate the quality of both TMEMs, we randomly extracted 200 phrase pairs from each TMEM. These phrases were judged by a bilingual speaker as perfect translations if she could imagine contexts in which the aligned phrases could be mutual translations of each other; as almost perfect translations if the aligned phrases were mutual translations of each other and one phrase contained one single word with no equivalent in the other language (for example, &quot;final , le secrétaire de&quot; and &quot;final act , the secretary of&quot; were labeled as almost perfect because the English word &quot;act&quot; has no French equivalent); and as incorrect translations otherwise. The results of the evaluation are shown in Table 2. A visual inspection of the phrases in our TMEMs and the judgments made by the evaluator suggest that many of the translations labeled as incorrect make sense when assessed in a larger context. For example, &quot;autres régions de le pays que&quot; and &quot;other parts of Canada than&quot; were judged as incorrect. However, when considered in a context in which it is clear that &quot;Canada&quot; and &quot;pays&quot; corefer, it would be reasonable to assume that the translation is correct.</Paragraph>
<Paragraph position="8"> Table 3 shows a few examples of phrases from our FTMEM and their corresponding correctness judgments.</Paragraph>
Table 3: English | French | Judgment
, but I cannot say | , mais je ne puis dire | correct
how did this all come about ? | comment est-ce arrivée ? | correct
but , I humbly believe | mais , à mon humble avis | correct
final act , the secretary of | final , le secrétaire de | almost correct
other parts of Canada than | autres régions de le pays que | incorrect
what is the total amount accumulated | a combien se élève la | incorrect
that party present this | ce parti présent aujourd'hui | incorrect
the aircraft company to present further studies | de autre études | incorrect
<Paragraph position="9"> Although our evaluation is extremely conservative, we decided nevertheless to stick with it, as it adequately reflects constraints specific to high-standard translation environments in which TMEMs are built manually and constantly checked for quality by specialized teams (Sprung, 2000).</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Statistical decoding using both a statistical TMEM and a statistical translation model </SectionTitle> <Paragraph position="0"> The results in Table 2 show that about 70% of the entries in our translation memory are correct or almost correct (very easy to fix). It is, though, an empirical question to what extent such TMEMs can be used to improve the performance of current translation systems. To determine this, we modified an existing decoding algorithm so that it can exploit information specific both to a statistical translation model and to a statistical TMEM.</Paragraph> <Paragraph position="1"> The decoding algorithm that we use is a greedy one -- see (Germann et al., 2001) for details. The decoder first guesses an English translation for the French sentence given as input and then attempts to improve it by greedily exploring alternative translations from the immediate translation space. We modified the greedy decoder described by Germann et al. (2001) so that it attempts to find good translations starting from two distinct points in the space of possible translations: one point corresponds to a word-for-word &quot;gloss&quot; of the French input; the other point corresponds to a translation that resembles most closely translations stored in the TMEM.</Paragraph> <Paragraph position="2"> As discussed by Germann et al. (2001), the word-for-word gloss is constructed by aligning each French word f_j with its most likely English translation e_{f_j}, i.e., e_{f_j} = argmax_e t(e | f_j). For example, in translating the French sentence &quot;Bien entendu , il parle de une belle victoire .&quot;, the greedy decoder initially assumes that a good translation of it is &quot;Well heard , it talking a beautiful victory&quot; because the best translation of &quot;bien&quot; is &quot;well&quot;, the best translation of &quot;entendu&quot; is &quot;heard&quot;, and so on. A word-for-word gloss results (at best) in English words written in French word order.</Paragraph> <Paragraph position="3"> The translation that resembles most closely translations stored in the TMEM is constructed by deriving a &quot;cover&quot; for the input sentence using phrases from the TMEM. The derivation attempts to cover as much of the input sentence as possible with translation pairs from the TMEM, using the longest phrases in the TMEM. The words in the input that are not part of any phrase extracted from the TMEM are glossed. For example, this approach may start the translation process from the phrase &quot;well , he is talking a beautiful victory&quot; if the TMEM contains the pairs ⟨well , ; bien entendu ,⟩ and ⟨he is talking ; il parle⟩ but no pair with the French phrase &quot;belle victoire&quot;.</Paragraph>
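The two starting points can be sketched as below. This is a hedged illustration, not the decoder of Germann et al. (2001): the gloss follows the argmax rule quoted above, while the left-to-right longest-match cover is only one plausible reading of the derivation just described; inverse_t (a reverse lookup approximating t(e | f)) and tmem (a French-phrase-to-English-phrase dictionary such as the FTMEM or PTMEM sketched earlier) are assumed data structures.

```python
def gloss_seed(f_words, inverse_t):
    """Word-for-word gloss: pick for each French word its most likely
    English translation, e_f = argmax_e t(e | f), keeping French order.
    inverse_t[f] maps candidate English words to t(e | f)."""
    return [max(inverse_t[f], key=inverse_t[f].get) for f in f_words]


def tmem_cover_seed(f_words, tmem, inverse_t):
    """Cover the French input left to right with the longest TMEM phrases
    available; words not covered by any TMEM phrase are glossed.
    tmem maps a French phrase (string) to its stored English equivalent."""
    seed, j = [], 0
    while j < len(f_words):
        match = None
        # try the longest span first; TMEM phrases are at least two words
        for k in range(len(f_words), j + 1, -1):
            phrase = " ".join(f_words[j:k])
            if phrase in tmem:
                match = (tmem[phrase].split(), k)
                break
        if match:
            seed.extend(match[0])
            j = match[1]
        else:
            seed.append(gloss_seed([f_words[j]], inverse_t)[0])
            j += 1
    return seed
```

Given the pairs ⟨well , ; bien entendu ,⟩ and ⟨he is talking ; il parle⟩, tmem_cover_seed would produce a starting point much like "well , he is talking a beautiful victory" from the example above, with the uncovered words glossed.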
<Paragraph position="4"> If the input sentence is found &quot;as is&quot; in the translation memory, its translation is simply returned and there is no further processing. Otherwise, once an initial alignment is created, the greedy decoder tries to improve it, i.e., it tries to find an alignment (and implicitly a translation) of higher probability by locally modifying the initial alignment. The decoder attempts to find alignments and translations of higher probability by employing a set of simple operations, such as changing the translation of one or two words in the alignment under consideration, inserting into or deleting from the alignment words of fertility zero, and swapping words or segments.</Paragraph> <Paragraph position="5"> In a stepwise fashion, starting from the initial gloss or initial cover, the greedy decoder iterates exhaustively over all alignments that are one such simple operation away from the alignment under consideration. At every step, the decoder chooses the alignment of highest probability, until the probability of the current alignment can no longer be improved.</Paragraph> </Section> </Paper>