<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1066">
  <Title>Clause Restructuring for Statistical Machine Translation</Title>
  <Section position="3" start_page="0" end_page="531" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent research on statistical machine translation (SMT) has led to the development of phrase-based systems (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003). These methods go beyond the original IBM machine translation models (Brown et al., 1993) by allowing multi-word units (&quot;phrases&quot;) in one language to be translated directly into phrases in another language. A number of empirical evaluations have suggested that phrase-based systems currently represent the state of the art in statistical machine translation.</Paragraph>
    <Paragraph position="1"> In spite of their success, a key limitation of phrase-based systems is that they make little or no direct use of syntactic information. It appears likely that syntactic information will be crucial in accurately modeling many phenomena during translation, for example, systematic differences between the word order of different languages. For this reason, there is currently a great deal of interest in methods that incorporate syntactic information within statistical machine translation systems (e.g., see (Alshawi, 1996; Wu, 1997; Yamada and Knight, 2001; Gildea, 2003; Melamed, 2004; Graehl and Knight, 2004; Och et al., 2004; Xia and McCord, 2004)).</Paragraph>
    <Paragraph position="2"> In this paper we describe a simple, direct method for incorporating syntactic information into phrase-based SMT systems, which we will show leads to significant improvements in translation accuracy. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the resulting parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than the original string.</Paragraph>
    <Paragraph position="3"> Finally, we apply a phrase-based system to the re-ordered string to give a translation into the target language.</Paragraph>
    <Paragraph position="4"> We describe experiments involving machine translation from German to English. As an illustrative example of our method, consider the following German sentence, together with a &quot;translation&quot; into English that follows the original word order: Original sentence: Ich werde Ihnen die entsprechenden Anmerkungen aushaendigen, damit Sie das eventuell bei der Abstimmung uebernehmen koennen.</Paragraph>
    <Paragraph position="5"> English translation: I will to you the corresponding comments pass on, so that you them perhaps in the vote adopt can. The German word order in this case is substantially different from the word order that would be seen in English. As we will show later in this paper, translations of sentences of this type pose difficulties for phrase-based systems. In our approach we reorder the constituents in a parse of the German sentence to give the following word order, which is much closer to the target English word order (words which have been &quot;moved&quot; are underlined): Reordered sentence: Ich werde aushaendigen Ihnen die entsprechenden Anmerkungen, damit Sie koennen uebernehmen das eventuell bei der Abstimmung.</Paragraph>
    <Paragraph position="6"> English translation: I will pass on to you the corresponding comments, so that you can adopt them perhaps in the vote. We applied our approach to translation from German to English in the Europarl corpus. Source language sentences are reordered in test data, and also in training data that is used by the underlying phrase-based system. The method improves the Bleu score from 25.2% to 26.8% (a statistically significant improvement), using a phrase-based system (Koehn et al., 2003) which has been shown in the past to be a highly competitive SMT system.</Paragraph>
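    <Paragraph position="7"> The reordering in the German example above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the clause is a hand-written flat list of labelled constituents rather than the output of a parser, the labels (SUBJ, VFIN, VINF, OBJ, ADV, PP, COMP) are hypothetical, and the single rule (place the verbs, finite verb first, directly after the subject) is a toy stand-in, not the set of transformations used in our experiments.

```python
# Toy sketch only: the labels and the single rule are illustrative assumptions,
# not the transformations applied to real parse trees.
VERB_ORDER = {"VFIN": 0, "VINF": 1}  # finite verb precedes the infinitive


def reorder_clause(clause):
    """Place verbal constituents (finite verb first) right after the subject."""
    verbs = sorted(
        [c for c in clause if c[0] in VERB_ORDER],
        key=lambda c: VERB_ORDER[c[0]],
    )
    rest = [c for c in clause if c[0] not in VERB_ORDER]
    subj = next((i for i, c in enumerate(rest) if c[0] == "SUBJ"), None)
    if subj is None:
        return list(clause)  # no subject found: leave the clause unchanged
    return rest[: subj + 1] + verbs + rest[subj + 1 :]


def surface(clause):
    """Flatten a clause back into a surface string."""
    return " ".join(w for _, words in clause for w in words)


# Hand-written analyses of the two clauses of the example sentence.
main_clause = [
    ("SUBJ", ["Ich"]),
    ("VFIN", ["werde"]),
    ("OBJ", ["Ihnen"]),
    ("OBJ", ["die", "entsprechenden", "Anmerkungen"]),
    ("VINF", ["aushaendigen"]),
]
sub_clause = [
    ("COMP", ["damit"]),
    ("SUBJ", ["Sie"]),
    ("OBJ", ["das"]),
    ("ADV", ["eventuell"]),
    ("PP", ["bei", "der", "Abstimmung"]),
    ("VINF", ["uebernehmen"]),
    ("VFIN", ["koennen"]),
]

print(surface(reorder_clause(main_clause)))
print(surface(reorder_clause(sub_clause)))
```

On these two hand-built clauses the sketch reproduces the word order of the reordered sentence shown above; in the full method the reordered string would then be passed to the phrase-based system.</Paragraph>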
  </Section>
</Paper>