<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0823">
  <Title>Statistical Machine Translation of Euparl Data by using Bilingual N-grams</Title>
  <Section position="3" start_page="0" end_page="133" type="metho">
    <SectionTitle>
2 Bilingual N-gram Translation Model
</SectionTitle>
    <Paragraph position="0"> As already mentioned, the translation model used here is based on bilingual n-grams. It actually constitutes a language model of bilingual units which are referred to as tuples (de Gispert and Mari~no, 2002). This model approximates the joint probability between source and target languages by using 3grams as it is described in the following equation:</Paragraph>
    <Paragraph position="2"> where t refers to target, s to source and (t,s)</Paragraph>
    <Paragraph position="4"> tuple of a given bilingual sentence pair.</Paragraph>
    <Paragraph position="5"> Tuples are extracted from a word-to-word aligned corpus according to the following two constraints: first, tuple extraction should produce a monotonic segmentation of bilingual sentence pairs; and second, the produced segmentation is maximal in the sense that no smaller tuples can be extracted without violating the previous constraint (Crego et al., 2004). According to this, tuple extraction provides a unique segmentation for a given bilingual sentence pair alignment. Figure 1 illustrates this idea with a simple example.</Paragraph>
    <Paragraph position="6">  We would like to achieve perfect translations NULL quisieramos lograr traducciones perfectas</Paragraph>
    <Paragraph position="8"> aligned sentence pair.</Paragraph>
    <Paragraph position="9"> Two important issues regarding this translation model must be mentioned. First, when extracting tuples, some words always appear embedded into tuples containing two or more words, so no translation probability for an independent occurrence of such words exists. To overcome this problem, the tuple 3-gram model is enhanced by incorporating 1-gram translation probabilities for all the embedded words (de Gispert et al., 2004).</Paragraph>
    <Paragraph position="10"> Second, some words linked to NULL end up producing tuples with NULL source sides. This cannot be allowed since no NULL is expected to occur in a translation input. This problem is solved by preprocessing alignments before tuple extraction such that any target word that is linked to NULL is attached to either its precedent or its following word.</Paragraph>
  </Section>
  <Section position="4" start_page="133" end_page="134" type="metho">
    <SectionTitle>
3 SMT Procedure Description
</SectionTitle>
    <Paragraph position="0"> This section describes the procedure followed for preprocessing the data, training the models and optimizing the translation system parameters.</Paragraph>
    <Section position="1" start_page="133" end_page="133" type="sub_section">
      <SectionTitle>
3.1 Preprocessing and Alignment
</SectionTitle>
      <Paragraph position="0"> The Euparl data provided for this shared task (Euparl, 2003) was preprocessed for eliminating all sentence pairs with a word ratio larger than 2.4.Asa result of this preprocessing, the number of sentences in each training set was slightly reduced. However, no significant reduction was produced.</Paragraph>
      <Paragraph position="1"> In the case of French, a re-tokenizing procedure was performed in which all apostrophes appearing alone were attached to their corresponding words.</Paragraph>
      <Paragraph position="2"> For example, pairs of tokens such as l'and qu ' were reduced to single tokens such as l' and qu'.</Paragraph>
      <Paragraph position="3"> Once the training data was preprocessed, a word-to-word alignment was performed in both directions, source-to-target and target-to-source, by using GIZA++ (Och and Ney, 2000). As an approximation to the most probable alignment, the Viterbi alignment was considered. Then, the intersection and union of alignment sets in both directions were computed for each training set.</Paragraph>
    </Section>
    <Section position="2" start_page="133" end_page="134" type="sub_section">
      <SectionTitle>
3.2 Feature Function Computation
</SectionTitle>
      <Paragraph position="0"> The considered translation system implements a total of five feature functions. The first of these models is the tuple 3-gram model, which was already described in section 2. Tuples for the translation model were extracted from the union set of alignments as shown in Figure 1. Once tuples had been extracted, the tuple vocabulary was pruned by using histogram pruning. The same pruning parameter, which was actually estimated for Spanish-English, was used for the other three language pairs. After pruning, the tuple 3-gram model was trained by using the SRI Language Modeling toolkit (Stolcke, 2002). Finally, the obtained model was enhanced by incorporating 1-gram probabilities for the embedded word tuples, which were extracted from the intersection set of alignments.</Paragraph>
      <Paragraph position="1"> Table 1 presents the total number of running words, distinct tokens and tuples, for each of the four  The second feature function considered was a target language model. This feature actually consisted of a word 3-gram model, which was trained from the target side of the bilingual corpus by using the SRI Language Modeling toolkit.</Paragraph>
      <Paragraph position="2"> The third feature function was given by a word penalty model. This function introduces a sentence length penalization in order to compensate the sys- null tem preference for short output sentences. More specifically, the penalization factor was given by the total number of words contained in the translation hypothesis.</Paragraph>
      <Paragraph position="3"> Finally, the fourth and fifth feature functions corresponded to two lexicon models based on IBM Model 1 lexical parameters p(t|s) (Brown et al., 1993). These lexicon models were calculated for each tuple according to the following equation:</Paragraph>
      <Paragraph position="5"> words in the source and target sides of tuple (t,s) n , being J and I the corresponding total number words in each side of it.</Paragraph>
      <Paragraph position="6"> The forward lexicon model uses IBM Model 1 parameters obtained from source-to-target alignments, while the backward lexicon model uses parameters obtained from target-to-source alignments.</Paragraph>
    </Section>
    <Section position="3" start_page="134" end_page="134" type="sub_section">
      <SectionTitle>
3.3 Decoding and Optimization
</SectionTitle>
      <Paragraph position="0"> The search engine for this translation system was developed by Crego et al. (2005). It implements a beam-search strategy based on dynamic programming and takes into account all the five feature functions described above simultaneously. It also allows for three different pruning methods: threshold pruning, histogram pruning, and hypothesis recombination. For all the results presented in this work the decoder's monotonic search modality was used.</Paragraph>
      <Paragraph position="1"> An optimization tool, which is based on a simplex method (Press et al., 2002), was developed and used for computing log-linear weights for each of the feature functions described above. This algorithm adjusts the log-linear weights so that BLEU (Papineni et al., 2002) is maximized over a given development set. One optimization for each language pair was performed by using the 2000-sentence development sets made available for the shared task.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="134" end_page="135" type="metho">
    <SectionTitle>
4 Shared Task Results
</SectionTitle>
    <Paragraph position="0"> Table 2 presents the BLEU scores obtained for the shared task test data. Each test set consisted of 2000 sentences. The computed BLEU scores were case  As can be seen from Table 2 the best ranked translations were those obtained for French, followed by Spanish, German and Finnish. A big difference is observed between the best and the worst results. Differences can be observed from translation outputs too. Consider, for example, the following segments taken from one of the test sentences: es-en: We know very well that the present Treaties are not enough and that , in the future , it will be necessary to develop a structure better and different for the European Union... fr-en: We know very well that the Treaties in their current are not enough and that it will be necessary for the future to develop a structure more effective and different for the Union... de-en: We very much aware that the relevant treaties are inadequate and , in future to another , more efficient structure for the European Union that must be developed...</Paragraph>
    <Paragraph position="1"> fi-en: We know full well that the current Treaties are not sufficient and that , in the future , it is necessary to develop the Union better and a different structure...</Paragraph>
    <Paragraph position="2"> It is evident from these translation outputs that translation quality decreases when moving from Spanish and French to German and Finnish. A detailed observation of translation outputs reveals that there are basically two problems related to this degradation in quality. The first has to do with reordering, which seems to be affecting Finnish and, specially, German translations.</Paragraph>
    <Paragraph position="3"> The second problem has to do with vocabulary. It is well known that large vocabularies produce data sparseness problems (Koehn, 2002). As can be confirmed from Tables 1 and 2, translation quality decreases as vocabulary size increases. However, it is not clear yet, in which degree such degradation is due to monotonic decoding and/or vocabulary size.</Paragraph>
    <Paragraph position="4"> Finally, we also evaluated how much the full feature function system differs from the baseline tuple 3-gram model alone. In this way, BLEU scores were computed for translation outputs obtained for the baseline system and the full system. Since the English reference for the test set was not available, we computed translations and BLEU scores over de- null velopment sets. Table 3 presents the results for both the full system and the baseline.</Paragraph>
    <Paragraph position="5">  From Table 3, it is evident that the four additional feature functions produce important improvements in translation quality.</Paragraph>
  </Section>
class="xml-element"></Paper>