<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1066">
  <Title>Clause Restructuring for Statistical Machine Translation</Title>
  <Section position="4" start_page="531" end_page="534" type="metho">
    <SectionTitle>
2 Background
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="531" end_page="532" type="sub_section">
      <SectionTitle>
2.1 Previous Work
</SectionTitle>
      <Paragraph position="0"> The original work on statistical machine translation was carried out by researchers at IBM (Brown et al., 1993). More recently, phrase-based models (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003) have been proposed as a highly successful alternative to the IBM models. Phrase-based models generalize the original IBM models by allowing multiple words in one language to correspond to multiple words in another language. For example, we might have a translation entry specifying that I will in English is a likely translation for Ich werde in German.</Paragraph>
      <Paragraph position="1"> In this paper we use the phrase-based system of (Koehn et al., 2003) as our underlying model.</Paragraph>
      <Paragraph position="2"> This approach first uses the original IBM models to derive word-to-word alignments in the corpus of example translations. Heuristics are then used to grow these alignments to encompass phrase-to-phrase pairs. The end result of the training process is a lexicon of phrase-to-phrase pairs, with associated costs or probabilities. In translation with the system, a beam search method with left-to-right search is used to find a high scoring translation for an input sentence. At each stage of the search, one or more English words are added to the hypothesized string, and one or more consecutive German words are &amp;quot;absorbed&amp;quot; (i.e., marked as having already been translated--note that each word is absorbed at most once). Each step of this kind has a number of costs: for example, the log probability of the phrase-to-phrase correspondence involved, the log probability from a language model, and some &amp;quot;distortion&amp;quot; score indicating how likely it is for the proposed words in the English string to be aligned to the corresponding position in the German string.</Paragraph>
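The decoding-step costs described above can be sketched as a simple sum of log-score components. This is a minimal illustration: the linear distortion penalty, the weight, and the probabilities are invented for the example, and are not the exact parameterization of the Koehn et al. (2003) system.

```python
import math

def step_cost(phrase_logprob, lm_logprob, skip_length, distortion_weight=-1.0):
    """Score one decoding step as a sum of log-cost components: the
    phrase-to-phrase translation log probability, the language model
    log probability of the added English words, and a distortion score
    penalizing the number of German words skipped over.  The linear
    penalty and the weight are illustrative assumptions."""
    return phrase_logprob + lm_logprob + distortion_weight * skip_length

# Example: absorbing "aushaendigen" as "pass on" while skipping the
# four words "Ihnen die entsprechenden Anmerkungen" (invented numbers).
cost = step_cost(phrase_logprob=math.log(0.4), lm_logprob=-2.0, skip_length=4)
```

In a real decoder each component would itself be a weighted feature score, with weights tuned on held-out data.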
      <Paragraph position="3">  A number of researchers (Alshawi, 1996; Wu, 1997; Yamada and Knight, 2001; Gildea, 2003; Melamed, 2004; Graehl and Knight, 2004; Galley et al., 2004) have proposed models where the translation process involves syntactic representations of the source and/or target languages. One class of approaches makes use of &amp;quot;bitext&amp;quot; grammars which simultaneously parse both the source and target languages. Another class of approaches makes use of syntactic information in the target language alone, effectively transforming the translation problem into a parsing problem. Note that these models have radically different structures and parameterizations from phrase-based models for SMT. As yet, these systems have not shown significant gains in accuracy in comparison to phrase-based systems.</Paragraph>
      <Paragraph position="4"> Reranking methods have also been proposed as a method for using syntactic information (Koehn and Knight, 2003; Och et al., 2004; Shen et al., 2004). In these approaches a baseline system is used to generate n-best output. Syntactic features are then used in a second model that reranks the n-best lists, in an attempt to improve over the baseline approach.</Paragraph>
      <Paragraph position="5"> (Koehn and Knight, 2003) apply a reranking approach to the sub-task of noun-phrase translation.</Paragraph>
      <Paragraph position="6"> (Och et al., 2004; Shen et al., 2004) describe the use of syntactic features in reranking the output of a full translation system, but the syntactic features give very small gains: for example the majority of the gain in performance in the experiments in (Och et al., 2004) was due to the addition of IBM Model 1 translation probabilities, a non-syntactic feature.</Paragraph>
      <Paragraph position="7"> An alternative use of syntactic information is to employ an existing statistical parsing model as a language model within an SMT system. See (Charniak et al., 2003) for an approach of this form, which shows improvements in accuracy over a baseline system.</Paragraph>
      <Paragraph position="8">  Our approach involves a preprocessing step, where sentences in the language being translated are modified before being passed to an existing phrase-based translation system. A number of other researchers (Berger et al., 1996; Niessen and Ney, 2004; Xia and McCord, 2004) have described previous work on preprocessing methods. (Berger et al., 1996) describe an approach that targets translation of French phrases of the form NOUN de NOUN (e.g., conflit d'intérêt). This was a relatively limited study, concentrating on this one syntactic phenomenon which involves relatively local transformations (a parser was not required in this study). (Niessen and Ney, 2004) describe a method that combines morphologically-split verbs in German, and also reorders questions in English and German.</Paragraph>
      <Paragraph position="9"> Our method goes beyond this approach in several respects, for example considering phenomena such as declarative (non-question) clauses, subordinate clauses, negation, and so on.</Paragraph>
      <Paragraph position="10"> (Xia and McCord, 2004) describe an approach for translation from French to English, where reordering rules are acquired automatically. The reordering rules in their approach operate at the level of context-free rules in the parse tree. Our method differs from that of (Xia and McCord, 2004) in a couple of important respects. First, we are considering German, which arguably has more challenging word order phenomena than French. German has relatively free word order, in contrast to both English and French: for example, there is considerable flexibility in terms of which phrases can appear in the first position in a clause. Second, Xia and McCord's (2004) use of reordering rules stated at the context-free level differs from ours. As one example, in our approach we use a single transformation that moves an infinitival verb to the first position in a verb phrase. Xia and McCord's approach would require learning of a different rule transformation for every production of the form VP =&gt; .... In practice the German parser that we are using creates relatively &amp;quot;flat&amp;quot; structures at the VP and clause levels, leading to a huge number of context-free rules (the flatness is one consequence of the relatively free word order seen within VP's and clauses in German). There are clearly some advantages to learning reordering rules automatically, as in Xia and McCord's approach. However, we note that our approach involves a handful of linguistically-motivated transformations and achieves comparable improvements (albeit on a different language pair) to Xia and McCord's method, which in contrast involves over 56,000 transformations.</Paragraph>
    </Section>
    <Section position="2" start_page="532" end_page="533" type="sub_section">
      <SectionTitle>
2.2 German Clause Structure
</SectionTitle>
      <Paragraph position="0"> In this section we give a brief description of the syntactic structure of German clauses. The characteristics we describe motivate the reordering rules described later in the paper.</Paragraph>
      <Paragraph position="1"> Figure 1 gives an example parse tree for a German sentence. This sentence contains two clauses:

Clause 1: Ich/I werde/will Ihnen/to you die/the entsprechenden/corresponding Anmerkungen/comments aushaendigen/pass on

Clause 2: damit/so that Sie/you das/them eventuell/perhaps bei/in der/the Abstimmung/vote uebernehmen/adopt koennen/can

These two clauses illustrate a number of syntactic phenomena in German which lead to quite different word order from English: Position of finite verbs. In Clause 1, which is a matrix clause, the finite verb werde is in the second position in the clause. Finite verbs appear rigidly in second position in matrix clauses. In contrast, in subordinate clauses, such as Clause 2, the finite verb comes last in the clause. For example, note that koennen is a finite verb which is the final element of Clause 2.</Paragraph>
      <Paragraph position="2"> Position of infinitival verbs. In German, infinitival verbs are final within their associated verb  phrase. For example, returning to Figure 1, notice that aushaendigen is the last element in its verb phrase, and that uebernehmen is the final element of its verb phrase in the figure.</Paragraph>
      <Paragraph position="3"> Relatively flexible word ordering. German has substantially freer word order than English. In particular, note that while the verb comes second in matrix clauses, essentially any element can be in the first position. For example, in Clause 1, while the subject Ich is seen in the first position, potentially any of the other constituents (e.g., Ihnen) could also appear in this position. Note that this often leads to the subject following the finite verb, something which happens very rarely in English.</Paragraph>
      <Paragraph position="4"> There are many other phenomena which lead to differing word order between German and English. Two others that we focus on in this paper are negation (the differing placement of items such as not in English and nicht in German), and also verb-particle constructions. We describe our treatment of these phenomena later in this paper.</Paragraph>
    </Section>
    <Section position="3" start_page="533" end_page="534" type="sub_section">
      <SectionTitle>
2.3 Reordering with Phrase-Based SMT
</SectionTitle>
      <Paragraph position="0"> We have seen in the last section that German syntax has several characteristics that lead to significantly different word order from that of English. We now describe how these characteristics can lead to difficulties for phrase-based translation systems when applied to German to English translation.</Paragraph>
      <Paragraph position="1"> Typically, reordering models in phrase-based systems are based solely on movement distance. In particular, at each point in decoding a &amp;quot;cost&amp;quot; is associated with skipping over 1 or more German words.</Paragraph>
      <Paragraph position="2"> For example, assume that in translating Ich werde Ihnen die entsprechenden Anmerkungen aushaendigen.</Paragraph>
      <Paragraph position="3"> we have reached a state where &amp;quot;Ich&amp;quot; and &amp;quot;werde&amp;quot; have been translated into &amp;quot;I will&amp;quot; in English. A potential decoding decision at this point is to add the phrase &amp;quot;pass on&amp;quot; to the English hypothesis, at the same time absorbing &amp;quot;aushaendigen&amp;quot; from the German string. The cost of this decoding step will involve a number of factors, including a cost of skipping over a phrase of length 4 (i.e., Ihnen die entsprechenden Anmerkungen) in the German string.</Paragraph>
      <Paragraph position="4"> The ability to penalise &amp;quot;skips&amp;quot; of this type, and the potential to model multi-word phrases, are essentially the main strategies that the phrase-based system is able to employ when modeling differing word-order across different languages. In practice, when training the parameters of an SMT system, for example using the discriminative methods of (Och, 2003), the cost for skips of this kind is typically set to a very high value. In experiments with the system of (Koehn et al., 2003) we have found that in practice a large number of complete translations are completely monotonic (i.e., have 0 skips), suggesting that the system has difficulty learning exactly what points in the translation should allow reordering. In summary, phrase-based systems have relatively limited potential to model word-order differences between different languages.</Paragraph>
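The distance-based penalty described above can be made concrete with a small sketch. The helper below is hypothetical (real systems fold this distance into the weighted model score), but it reproduces the skip count from the example in the text.

```python
def skip_length(last_translated_end, next_start):
    """Number of source-side words jumped over when the next absorbed
    German phrase starts at position next_start, given that the last
    absorbed phrase ended at position last_translated_end.  A monotonic
    step (next_start == last_translated_end + 1) has skip length 0."""
    return abs(next_start - (last_translated_end + 1))

# "Ich werde" (positions 0-1) already absorbed; absorbing
# "aushaendigen" (position 6) skips the 4 words
# "Ihnen die entsprechenden Anmerkungen" (positions 2-5).
```

A completely monotonic translation is then one whose every decoding step has skip length 0.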
      <Paragraph position="5"> The reordering stage described in this paper attempts to modify the source language (e.g., German) in such a way that its word order is very similar to that seen in the target language (e.g., English). In an ideal approach, the resulting translation problem that is passed on to the phrase-based system will be solvable using a completely monotonic translation, without any skips, and without requiring extremely long phrases to be translated (for example a phrasal translation corresponding to Ihnen die entsprechenden Anmerkungen aushaendigen).</Paragraph>
      <Paragraph position="6"> Note that an additional benefit of the reordering phase is that it may bring together groups of words in German which have a natural correspondence to phrases in English, but were unseen or rare in the original German text. For example, in the previous example, we might derive a correspondence between werde aushaendigen and will pass on that was not possible before reordering. Another example concerns verb-particle constructions: for example, in Wir machen die Tuer auf, the words machen and auf form a verb-particle construction.</Paragraph>
      <Paragraph position="7"> The reordering stage moves auf to precede machen, allowing a phrasal entry specifying that &amp;quot;auf machen&amp;quot; is translated as to open in English. Without the reordering, the particle can be arbitrarily far from the verb that it modifies, and there is a danger in this example of translating machen as to make, the natural translation when no particle is present.</Paragraph>
      <Paragraph position="8">  Original sentence: Ich werde Ihnen die entsprechenden Anmerkungen aushaendigen, damit Sie das eventuell bei der Abstimmung uebernehmen koennen. (I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can.)

Reordered sentence: Ich werde aushaendigen Ihnen die entsprechenden Anmerkungen, damit Sie koennen uebernehmen das eventuell bei der Abstimmung.</Paragraph>
      <Paragraph position="9"> (I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote.)</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="534" end_page="534" type="metho">
    <SectionTitle>
3 Clause Restructuring
</SectionTitle>
    <Paragraph position="0"> We now describe the method we use for reordering German sentences. As a first step in the reordering process, we parse the sentence using the parser described in (Dubey and Keller, 2003). The second step is to apply a sequence of rules that reorder the German sentence depending on the parse tree structure. See Figure 2 for an example German sentence before and after the reordering step.</Paragraph>
    <Paragraph position="1"> In the reordering phase, each of the following six restructuring steps was applied to a German parse tree, in sequence (see also Table 1 for examples of the reordering steps): [1] Verb initial In any verb phrase (i.e., phrase with label VP-...) find the head of the phrase (i.e., the child with label -HD) and move it into the initial position within the verb phrase. For example, in the parse tree in Figure 1, aushaendigen would be moved to precede Ihnen in the first verb phrase (VP-OC), and uebernehmen would be moved to precede das in the second VP-OC. [2] Verb second For any subordinate clause (i.e., phrase with label S-...) with a complementizer KOUS, PREL, PWS or PWAV, find the head of the clause, and move it to directly follow the complementizer.</Paragraph>
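Rule [1] can be sketched as a small tree transformation. The `Node` class and the rule implementation below are written for illustration only (they are not the authors' code); the labels follow the style of the treebank tags used in the paper, e.g. "VP-OC" and "VVINF-HD".

```python
class Node:
    """Toy parse-tree node (illustrative only)."""
    def __init__(self, label, children=None, word=None):
        self.label = label
        self.children = children or []
        self.word = word  # set on leaf nodes only

def verb_initial(node):
    """Rule [1] (Verb initial): in every verb phrase (label VP-...),
    move the head child (label ...-HD) to the initial position.
    Applied recursively to the whole tree."""
    for child in node.children:
        verb_initial(child)
    if node.label.startswith("VP"):
        heads = [c for c in node.children if c.label.endswith("-HD")]
        if heads:
            node.children.remove(heads[0])
            node.children.insert(0, heads[0])
```

Applied to the VP-OC from Figure 1, the head aushaendigen would be moved in front of Ihnen, mirroring the description above.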
    <Paragraph position="2"> For example, in the subordinate clause in Figure 1, the head of the clause koennen would be moved to follow the complementizer damit, giving the following structure:</Paragraph>
  </Section>
  <Section position="6" start_page="534" end_page="534" type="metho">
    <SectionTitle>
S-MO KOUS-CP damit
VMFIN-HD koennen
PPER-SB Sie
VP-OC VVINF-HD uebernehmen
PDS-OA das
ADJD-MO eventuell
PP-MO APPR-DA bei
ART-DA der
NN-NK Abstimmung
[3] Move Subject For any clause (i.e., phrase with
</SectionTitle>
    <Paragraph position="0"> label S...), move the subject to directly precede the head. We define the subject to be the left-most child of the clause with label ...-SB or PPER-EP, and the head to be the leftmost child with label ...-HD.</Paragraph>
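A sketch of rule [3] on a flat list of (label, word) children. This is illustrative only: the real rule operates on full parse trees, and the list representation is a simplifying assumption.

```python
def move_subject(children):
    """Rule [3] (Move Subject): move the left-most child labeled ...-SB
    (or PPER-EP) to directly precede the left-most ...-HD child.
    For brevity this operates on a flat list of (label, word) pairs."""
    labels = [label for label, _ in children]
    subj = next((i for i, l in enumerate(labels)
                 if l.endswith("-SB") or l == "PPER-EP"), None)
    head = next((i for i, l in enumerate(labels) if l.endswith("-HD")), None)
    if subj is None or head is None:
        return children
    item = children.pop(subj)
    if subj < head:
        head -= 1  # popping the subject shifted the head one slot left
    children.insert(head, item)
    return children
```

On the subordinate clause from Figure 1 (after rule [2]), this moves Sie to precede koennen, giving the damit Sie koennen ordering shown below.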
    <Paragraph position="1"> For example, in the subordinate clause in Figure 1, the subject Sie would be moved to precede koennen, giving the following structure:</Paragraph>
  </Section>
  <Section position="7" start_page="534" end_page="535" type="metho">
    <SectionTitle>
S-MO KOUS-CP damit
PPER-SB Sie
VMFIN-HD koennen
VP-OC VVINF-HD uebernehmen
PDS-OA das
ADJD-MO eventuell
PP-MO APPR-DA bei
ART-DA der
NN-NK Abstimmung
</SectionTitle>
    <Paragraph position="0"> [4] Particles In verb particle constructions, move the particle to immediately precede the verb. More specifically, if a finite verb (i.e., verb tagged as VVFIN) and a particle (i.e., word tagged as PTKVZ) are found in the same clause, move the particle to precede the verb.</Paragraph>
    <Paragraph position="1"> As one example, a clause may contain both a verb (fordern) as well as a particle (auf). [5] Infinitives In some cases, infinitival verbs are still not in the correct position after transformations [1]-[4]. For this reason we add a second step that involves infinitives. First, we remove all internal VP nodes within the parse tree. Second, for any clause (i.e., phrase labeled S...), if the clause dominates both a finite and infinitival verb, and there is an argument (i.e., a subject, or an object) between the two verbs, then the infinitive is moved to directly follow the finite verb.</Paragraph>
    <Paragraph position="2"> As an example, a clause may contain an infinitival (einreichen) that is separated from a finite verb konnten by the direct object es. [6] Negation The final step moves negative particles. If a clause dominates both a finite and infinitival verb, as well as a negative particle (i.e., a word tagged as PTKNEG), then the negative particle is moved to directly follow the finite verb.</Paragraph>
    <Paragraph position="3"> As an example, the previous clause now has the negative particle nicht moved to directly follow the finite verb konnten.</Paragraph>
  </Section>
  <Section position="8" start_page="535" end_page="537" type="metho">
    <SectionTitle>
4 Experiments
</SectionTitle>
    <Paragraph position="0"> This section describes experiments with the reordering approach. Our baseline is the phrase-based MT system of (Koehn et al., 2003). We trained this system on the Europarl corpus, which consists of 751,088 sentence pairs with 15,256,792 German words and 16,052,269 English words. Translation performance is measured on a 2000 sentence test set from a different part of the Europarl corpus, with average sentence length of 28 words.</Paragraph>
      <Paragraph position="1"> We use BLEU scores (Papineni et al., 2002) to measure translation accuracy. (Table 2 caption: agreement between annotators on 100 translation judgements. R gives counts corresponding to translations where an annotator preferred the re-ordered system; B signifies that the annotator preferred the baseline system; E means an annotator judged the two systems to give equal quality translations.)</Paragraph>
      <Paragraph position="2"> We applied our reordering method to both the training and test data, and retrained the system on the reordered training data. The BLEU score for the new system was 26.8%, an improvement from 25.2% BLEU for the baseline system.</Paragraph>
    <Section position="1" start_page="536" end_page="536" type="sub_section">
      <SectionTitle>
4.1 Human Translation Judgements
</SectionTitle>
      <Paragraph position="0"> We also used human judgements of translation quality to evaluate the effectiveness of the reordering rules. We randomly selected 100 sentences from the test corpus where the English reference translation was between 10 and 20 words in length (see footnote 1). For each of these 100 translations, we presented the two annotators with three translations: the reference (human) translation, the output from the baseline system, and the output from the system with reordering. No indication was given as to which system was the baseline system, and the order in which the baseline and reordered translations were presented was chosen at random on each example, to prevent ordering effects in the annotators' judgements. For each example, we asked each of the annotators to make one of two choices: 1) an indication that one translation was an improvement over the other; or 2) an indication that the translations were of equal quality.</Paragraph>
      <Paragraph position="1"> Annotator 1 judged 40 translations to be improved by the reordered model; 40 translations to be of equal quality; and 20 translations to be worse under the reordered model. Annotator 2 judged 44 translations to be improved by the reordered model; 37 translations to be of equal quality; and 19 translations to be worse under the reordered model. Table 2 gives figures indicating agreement rates between the annotators. (Footnote 1: We chose these shorter sentences for human evaluation because in general they include a single clause, which makes human judgements relatively straightforward.)</Paragraph>
      <Paragraph position="2"> Note that if we only consider preferences where both annotators were in agreement (and consider all disagreements to fall into the &amp;quot;equal&amp;quot; category), then 33 translations improved under the reordering system, and 13 translations became worse. Figure 3 shows a random selection of the translations where annotator 1 judged the reordered model to give an improvement; Figure 4 shows examples where the baseline system was preferred by annotator 1. We include these examples to give a qualitative impression of the differences between the baseline and reordered systems. Our (no doubt subjective) impression is that the cases in Figure 3 are more clear-cut instances of translation improvements, but we leave the reader to make his/her own judgement on this point.</Paragraph>
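The agreement-filtered counts described above (counting a preference only when both annotators agree, and treating any disagreement as "equal") can be computed as follows. The label lists in the example are hypothetical, not the actual judgements from the study.

```python
def joint_preferences(ann1, ann2):
    """Count judgements where two annotators agree; any disagreement is
    treated as 'equal'.  Labels: 'R' = reordered preferred,
    'B' = baseline preferred, 'E' = equal quality."""
    counts = {"R": 0, "B": 0, "E": 0}
    for a, b in zip(ann1, ann2):
        counts[a if a == b else "E"] += 1
    return counts
```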
    </Section>
    <Section position="2" start_page="536" end_page="537" type="sub_section">
      <SectionTitle>
4.2 Statistical Significance
</SectionTitle>
      <Paragraph position="0"> We now describe statistical significance tests for our results. We believe that applying significance tests to Bleu scores is a subtle issue; for this reason we go into some detail in this section.</Paragraph>
      <Paragraph position="1"> We used the sign test (e.g., see page 166 of (Lehmann, 1986)) to test the statistical significance of our results. For a source sentence s, the sign test requires a function ψ(s) that is defined as follows: ψ(s) = +1 if the reordered system produces a better translation for s than the baseline system; ψ(s) = −1 if the baseline system produces a better translation for s than the reordered system; and ψ(s) = 0 if the two systems produce equal quality translations on s.</Paragraph>
      <Paragraph position="2"> We assume that sentences s are drawn from some underlying distribution P(s), and that the test set consists of independently, identically distributed (IID) sentences from this distribution. We can define the following probabilities: π+ = Pr(ψ(s) = +1) and π− = Pr(ψ(s) = −1), where the probability is taken with respect to the distribution P(s).</Paragraph>
      <Paragraph position="4"> The sign test has the null hypothesis H0: π+ ≤ π− and the alternative hypothesis H1: π+ &gt; π−. Given a sample of n test points {s_1, ..., s_n}, the sign test depends on calculation of the following counts: c+ = |{i : ψ(s_i) = +1}|, c− = |{i : ψ(s_i) = −1}|, and c0 = |{i : ψ(s_i) = 0}|, where |A| is the cardinality of the set A.</Paragraph>
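Given the counts c+ and c−, a one-sided sign test can be computed exactly from the binomial distribution. Ties (c0) are discarded, which is one standard treatment; this is an illustrative implementation, not the authors' code.

```python
from math import comb

def sign_test_p_value(c_plus, c_minus):
    """One-sided sign test.  Under the null hypothesis (improvements
    are no more likely than regressions), ties are dropped and each of
    the n = c_plus + c_minus non-tied sentences is an improvement with
    probability 1/2; the p-value is P(X >= c_plus) for
    X ~ Binomial(n, 0.5)."""
    n = c_plus + c_minus
    return sum(comb(n, k) for k in range(c_plus, n + 1)) / 2 ** n
```

For example, 9 improvements against 1 regression gives a p-value of 11/1024, roughly 0.011.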
      <Paragraph position="5"> We now come to the definition of ψ(s) -- how should we judge whether a translation from one system is better or worse than the translation from another system? A critical problem with Bleu scores is that they are a function of an entire test corpus and do not give translation scores for single sentences.</Paragraph>
      <Paragraph position="6"> Ideally we would have some measure Q_r(s) of the quality of the translation of sentence s under the reordered system, and a corresponding function Q_b(s) that measures the quality of the baseline translation. We could then define ψ(s) = +1 if Q_r(s) &gt; Q_b(s), ψ(s) = −1 if Q_r(s) &lt; Q_b(s), and ψ(s) = 0 otherwise. Unfortunately, Bleu scores do not give per-sentence measures Q_r(s) and Q_b(s), and thus do not allow a definition of ψ(s) in this way. In general the lack of per-sentence scores makes it challenging to apply significance tests to Bleu scores.2 To get around this problem, we make the following approximation. For any test sentence s_i, we calculate ψ(s_i) as follows. First, we define B to be the Bleu score for the test corpus when translated by the baseline model. Next, we define B_i to be the Bleu score when all sentences other than s_i are translated by the baseline model, and where s_i itself is translated by the reordered model. We then define ψ(s_i) = +1 if B_i &gt; B, ψ(s_i) = −1 if B_i &lt; B, and ψ(s_i) = 0 if B_i = B. Strictly speaking, this definition of ψ is not valid, as it depends on the entire set of sample points s_1, ..., s_n rather than s_i alone. However, we believe it is a reasonable approximation to an ideal (Footnote 2: The lack of per-sentence scores means that it is not possible to apply standard statistical tests such as the sign test or the t-test (which would test the hypothesis E[Q_r(s)] = E[Q_b(s)], where E[.] denotes the expectation under P(s)). Note that previous work (Koehn, 2004; Zhang and Vogel, 2004) has suggested the use of bootstrap tests (Efron and Tibshirani, 1993) for the calculation of confidence intervals for Bleu scores. (Koehn, 2004) gives empirical evidence that these give accurate estimates for Bleu statistics. However, correctness of the bootstrap method relies on some technical properties of the statistic (e.g., Bleu scores) being used (e.g., see (Wasserman, 2004), theorem 8.3); (Koehn, 2004; Zhang and Vogel, 2004) do not discuss whether Bleu scores meet any such criteria, which makes us uncertain of their correctness when applied to Bleu scores.)</Paragraph>
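The leave-one-in approximation of ψ(s_i) can be sketched as follows. Since Bleu itself is lengthy to implement, the sketch substitutes a toy corpus-level unigram-precision score as a stand-in metric; only the sentence-swapping logic mirrors the approximation described above.

```python
def corpus_score(hyps, refs):
    """Stand-in for a corpus-level metric such as Bleu: unigram
    precision computed over the whole test set (toy metric, used only
    to illustrate the swapping scheme)."""
    match = total = 0
    for hyp, ref in zip(hyps, refs):
        ref_words = ref.split()
        words = hyp.split()
        match += sum(1 for w in words if w in ref_words)
        total += len(words)
    return match / total if total else 0.0

def psi(i, base_hyps, reord_hyps, refs):
    """psi(s_i) under the approximation in the text: compare the corpus
    score of the all-baseline output with the score obtained when only
    sentence i is replaced by the reordered system's output."""
    B = corpus_score(base_hyps, refs)
    swapped = list(base_hyps)
    swapped[i] = reord_hyps[i]
    B_i = corpus_score(swapped, refs)
    return (B_i > B) - (B_i < B)  # +1, -1, or 0
```

As the text notes, this ψ depends on the whole test set through the corpus-level score, so it is an approximation rather than a true per-sentence measure.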
      <Paragraph position="7"> function ψ(s) that indicates whether the translations have improved or not under the reordered system. Given this definition of ψ(s), we found that 52.9% of all test sentences had improved translations under the reordered system, 36.4% of all sentences had worse translations, and 10.75% of all sentences had the same quality as before. If our definition of ψ(s) was correct, these values for c+ and c− would be significant at the level p &lt; 0.01.</Paragraph>
      <Paragraph position="10"> We can also calculate confidence intervals for the results. Define π to be the probability that the reordered system improves on the baseline system, given that the two systems do not have equal performance. The relative frequency estimate of π is c+/(c+ + c−).</Paragraph>
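The relative frequency estimate and a confidence interval for π can be computed as below. The normal approximation to the binomial is an assumption made for this sketch, and the counts in the example are invented.

```python
from math import sqrt

def proportion_ci(c_plus, c_minus, z=1.96):
    """Relative frequency estimate of pi = P(improvement | not equal),
    with a normal-approximation 95% confidence interval
    (z = 1.96 for 95% coverage)."""
    n = c_plus + c_minus
    p_hat = c_plus / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, (p_hat - half, p_hat + half)
```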
    </Section>
  </Section>
</Paper>