<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-3026">
  <Title>Multi-Engine Machine Translation Guided by Explicit Word Matching</Title>
  <Section position="3" start_page="0" end_page="101" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> A variety of paradigms for machine translation (MT) have been developed over the years, ranging from statistical systems that learn mappings between words and phrases in the source language and their translations in the target language, to Interlingua-based systems that perform deep semantic analysis. Each approach and system has its own advantages and disadvantages. While statistical systems provide broad coverage with little manpower, the quality of corpus-based systems rarely reaches that of knowledge-based systems.</Paragraph>
    <Paragraph position="1"> With such a wide range of approaches to machine translation, it would be beneficial to have an effective framework for combining these systems into an MT system that carries many of the advantages of the individual systems and suffers from few of their disadvantages. Attempts at combining the output of different systems have proved useful in other areas of language technologies, such as the ROVER approach for speech recognition (Fiscus 1997). Several approaches to multi-engine machine translation have been proposed over the past decade. The Pangloss system and work by several other researchers attempted to combine lattices from many different MT systems (Frederking and Nirenburg 1994; Frederking et al. 1997; Tidhar &amp; Kussner 2000; Lavie, Probst et al. 2004).</Paragraph>
    <Paragraph position="2"> These systems have two drawbacks: they require cooperation from all the component systems to produce compatible lattices, and they face the hard research problem of standardizing the confidence scores that come from the individual engines. Bangalore et al. (2001) used string alignments between the different translations to train a finite state machine to produce a consensus translation. The alignment algorithm described in that work, which allows only insertions, deletions and substitutions, does not accurately capture long-range phrase movement.</Paragraph>
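    The limitation of an alignment restricted to insertions, deletions and substitutions can be illustrated with a minimal sketch. The function below is a generic word-level Levenshtein distance, not the algorithm of Bangalore et al.; the function name and the example sentences are assumptions made for illustration only.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance: insertions, deletions, substitutions only."""
    a, b = hyp.split(), ref.split()
    # prev[j] is the distance between the words of a seen so far and the first j words of b
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical sentences: the first pair contains exactly the same words,
# with one word moved from the front of the sentence to the end.
same_words_moved = word_edit_distance("yesterday the minister arrived",
                                      "the minister arrived yesterday")
one_word_changed = word_edit_distance("the minister arrived yesterday",
                                      "the minister came yesterday")
print(same_words_moved, one_word_changed)
```

    Under these assumptions the reordered pair is charged one deletion plus one insertion, so it looks further apart than the pair with a genuinely different word, and the moved word is never aligned to its counterpart. An alignment that can match words regardless of position avoids this.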
    <Paragraph position="3"> In this paper, we propose a new way of combining the translations of multiple MT systems based on a more versatile word alignment algorithm. A &quot;decoding&quot; algorithm then uses these alignments, together with confidence estimates for the various engines and a trigram language model, to score and rank a collection of sentence hypotheses that are synthetic combinations of words from the various original engines. The highest scoring sentence hypothesis is selected as the final output of our system. We experimentally tested the new approach by combining the translations of three Arabic-to-English translation systems. Translation quality is scored using the METEOR MT evaluation metric (Lavie, Sagae et al. 2004). Our experiments demonstrate that our new MEMT system achieves a substantial improvement over all of the original systems, and also outperforms an &quot;oracle&quot; capable of selecting the best of the original systems on a sentence-by-sentence basis.</Paragraph>
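    The sentence-by-sentence oracle mentioned above can be sketched as follows. This is a hedged illustration, assuming each engine has one quality score per sentence; the engine names, function names and score values are invented toy data, not results from the paper, and a plain average stands in for whatever aggregation the evaluation metric uses.

```python
def oracle_average(scores_by_system):
    """Average score when the best engine is picked separately for each sentence."""
    per_sentence = zip(*scores_by_system.values())
    best = [max(sentence_scores) for sentence_scores in per_sentence]
    return sum(best) / len(best)


def system_average(scores):
    """Average score of a single engine over all sentences."""
    return sum(scores) / len(scores)


# Toy values invented for illustration; not results from the paper.
toy_scores = {
    "engine_a": [0.50, 0.25, 0.75],
    "engine_b": [0.25, 0.75, 0.50],
}

oracle = oracle_average(toy_scores)
best_single = max(system_average(s) for s in toy_scores.values())
print(oracle, best_single)
```

    Such an oracle is an upper bound for any method that only selects among the original outputs. A combined system can exceed it only by producing sentences that are better than every individual engine's output for that sentence, which is what synthetic combinations of words make possible.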
    <Paragraph position="4"> The remainder of this paper is organized as follows. In section 2 we describe the algorithm for generating multi-engine synthetic translations.</Paragraph>
    <Paragraph position="5"> Section 3 describes the experimental setup used to evaluate our approach, and section 4 presents the results of the evaluation. Our conclusions and directions for future work are presented in section 5.</Paragraph>
  </Section>
</Paper>