<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2158">
  <Title>A DP based Search Algorithm for Statistical Machine Translation</Title>
  <Section position="2" start_page="0" end_page="961" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> In this paper, we address the problem of finding the most probable target language representation of a given source language string. In our approach, we use a DP based search algorithm which sequentially visits the target string positions while progressively considering the source string words.</Paragraph>
    <Paragraph position="1"> The organization of the paper is as follows. After reviewing the statistical approach to machine translation, we first describe the statistical knowledge sources used during the search process. We then present our DP based search algorithm in detail. Finally, experimental results for a bilingual corpus are reported.</Paragraph>
    <Section position="1" start_page="0" end_page="960" type="sub_section">
      <SectionTitle>
1.1 Statistical Machine Translation
</SectionTitle>
      <Paragraph position="0"> In statistical machine translation, the goal of the search strategy can be formulated as follows: We are given a source language ('French') string f_1^J = f_1 ... f_J, which is to be translated into a target language ('English') string e_1^I = e_1 ... e_I with the unknown length I. Every English string is considered as a possible translation for the input string. If we assign a probability Pr(e_1^I | f_1^J) to each pair of strings (e_1^I, f_1^J), then we have to choose the length I_opt and the English string e_1^{I_opt} that maximize Pr(e_1^I | f_1^J) for a given French string f_1^J. According to Bayes' decision rule, I_opt and e_1^{I_opt} can be found by
(I_opt, e_1^{I_opt}) = argmax_{I, e_1^I} { Pr(e_1^I) * Pr(f_1^J | e_1^I) }.   (1)</Paragraph>
      <Paragraph position="2"> Pr(e_1^I) is the English language model, whereas Pr(f_1^J | e_1^I) is the string translation model.</Paragraph>
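The Bayes decomposition above can be sketched in a few lines of code: the decoder scores every candidate with the language model term plus the translation model term in log space and keeps the argmax. All strings and probability values below are invented toy data for illustration only.

```python
import math

def best_translation(source, candidates, lm_logprob, tm_logprob):
    """Return the candidate e_1^I maximizing Pr(e_1^I) * Pr(f_1^J | e_1^I).

    Working in log space avoids numerical underflow for long strings."""
    return max(candidates,
               key=lambda e: lm_logprob(e) + tm_logprob(source, e))

# Toy model tables (all values invented for illustration):
lm = {"the house": math.log(0.4), "house the": math.log(0.01)}
tm = {("das haus", "the house"): math.log(0.5),
      ("das haus", "house the"): math.log(0.5)}

e_best = best_translation("das haus",
                          ["the house", "house the"],
                          lambda e: lm[e],
                          lambda f, e: tm[(f, e)])
```

With equal translation scores, the language model term decides between the two word orders.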
      <Paragraph position="3"> The overall architecture of the statistical translation approach is summarized in Fig. 1. In this figure, we already anticipate the fact that we will transform the source strings in a certain manner and that we will undo these transformations on the produced output strings. This aspect is explained in more detail in Section 3.</Paragraph>
      <Paragraph position="4"> Figure 1: Overall architecture of the statistical translation approach based on Bayes' decision rule.</Paragraph>
      <Paragraph position="5"> The task of statistical machine translation can be subdivided into two fields: 1. the field of modelling, which introduces structures into the probabilistic dependencies and provides methods for estimating the parameters of the models from bilingual corpora; 2. the field of decoding, i.e. finding a search algorithm which performs the argmax operation in Eq. (1) as efficiently as possible.</Paragraph>
    </Section>
    <Section position="2" start_page="960" end_page="960" type="sub_section">
      <SectionTitle>
1.2 Alignment with Mixture Distribution
</SectionTitle>
      <Paragraph position="0"> Several papers have discussed the first issue, especially the problem of word alignments for bilingual corpora (Brown et al., 1993), (Dagan et al., 1993), (Kay and Röscheisen, 1993), (Fung and Church, 1994), (Vogel et al., 1996).</Paragraph>
      <Paragraph position="1"> In our search procedure, we use a mixture-based alignment model that slightly differs from the model introduced as Model 2 in (Brown et al., 1993). It is based on a decomposition of the joint probability for f_1^J into a product of the probabilities for each word f_j:
Pr(f_1^J | e_1^I) = p(J|I) * prod_{j=1..J} p(f_j | e_1^I),   (2)</Paragraph>
      <Paragraph position="3"> where the lengths of the strings are regarded as random variables and modelled by the distribution p(J|I). Now we assume a sort of pairwise interaction between the French word f_j and each English word e_i in e_1^I. These dependencies are captured in the form of a mixture distribution:
p(f_j | e_1^I) = sum_{i=1..I} p(i|j, J, I) * p(f_j | e_i).   (3)</Paragraph>
      <Paragraph position="5"> Inserting this into (2), we get
Pr(f_1^J | e_1^I) = p(J|I) * prod_{j=1..J} sum_{i=1..I} p(i|j, J, I) * p(f_j | e_i),   (4)</Paragraph>
      <Paragraph position="7"> with the following components: the sentence length probability p(J|I), the mixture alignment probability p(i|j, J, I) and the translation probability p(f|e). So far, the model allows all English words in the target string to contribute to the translation of a French word. This is expressed by the sum over i in Eq. (4). It is reasonable to assume that for each source string position j one position i in the target string dominates this sum. This conforms with the experience that in most cases a clear word-to-word correspondence between a string and its translation exists. As a consequence, we use the so-called maximum approximation: At each point, only the best choice of i is considered for the alignment path:
Pr(f_1^J | e_1^I) = p(J|I) * prod_{j=1..J} max_{i} [ p(i|j, J, I) * p(f_j | e_i) ].   (5)</Paragraph>
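The maximum-approximation score can be sketched directly from the three model components named above. The sketch below takes the alignment, translation, and length models as callables; the toy probability tables in the usage example are invented for illustration.

```python
import math

def translation_logprob(f_words, e_words, p_align, p_trans, p_len):
    """Score log Pr(f_1^J | e_1^I) under the maximum approximation:
    p(J|I) * prod_j max_i [ p(i|j,J,I) * p(f_j|e_i) ]."""
    J, I = len(f_words), len(e_words)
    logp = math.log(p_len(J, I))
    for j, f in enumerate(f_words, start=1):
        # Only the best target position i contributes for each source position j.
        best = max(p_align(i, j, J, I) * p_trans(f, e)
                   for i, e in enumerate(e_words, start=1))
        logp += math.log(best)
    return logp
```

Replacing `max` with `sum` inside the loop would recover the full mixture of Eq. (4).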
      <Paragraph position="9"> We can now formulate the criterion to be maximized by a search algorithm:
max_{I, e_1^I} { Pr(e_1^I) * p(J|I) * prod_{j=1..J} max_{i} [ p(i|j, J, I) * p(f_j | e_i) ] }.   (6)</Paragraph>
      <Paragraph position="11"> Because of the problem of data sparseness, we use a parametric model for the alignment probabilities.</Paragraph>
      <Paragraph position="12"> It assumes that the distance of the positions relative to the diagonal of the (j, i) plane is the dominating factor:
p(i|j, J, I) = r(i - j*I/J) / sum_{i'=1..I} r(i' - j*I/J).   (7)</Paragraph>
      <Paragraph position="14"> As described in (Brown et al., 1993), the EM algorithm can be used to estimate the parameters of the model.</Paragraph>
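A minimal sketch of such a diagonal-distance alignment model is given below. The Laplacian-shaped kernel r is an assumed placeholder chosen only so the example is runnable; in the paper's setting the parameters of r are estimated with the EM algorithm.

```python
import math

def alignment_prob(i, j, J, I, r=lambda d: math.exp(-abs(d))):
    """Parametric mixture alignment probability:
    p(i|j,J,I) = r(i - j*I/J) / sum_{i'} r(i' - j*I/J).

    r() below is an assumed placeholder kernel peaked at the diagonal;
    the real distribution would be estimated from data via EM."""
    dist = lambda i_: i_ - j * I / J  # signed distance to the (j, i) diagonal
    norm = sum(r(dist(ip)) for ip in range(1, I + 1))
    return r(dist(i)) / norm
```

By construction the probabilities sum to one over i, and the mass concentrates near i = j*I/J, i.e. near the diagonal of the alignment plane.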
    </Section>
    <Section position="3" start_page="960" end_page="961" type="sub_section">
      <SectionTitle>
1.3 Search in Statistical Machine
Translation
</SectionTitle>
      <Paragraph position="0"> In the last few years, there have been a number of papers considering the problem of finding an efficient search procedure (Wu, 1996), (Tillmann et al., 1997a), (Tillmann et al., 1997b), (Wang and Waibel, 1997). All of these approaches use a bigram language model, because bigram models are quite simple and easy to use and have proven their predictive power in stochastic language processing, especially speech recognition. Assuming a bigram language model, we would like to re-formulate Eq. (6) in the following way:
max_{I, e_1^I} { p(J|I) * prod_{i=1..I} p(e_i | e_{i-1}) * prod_{j=1..J} max_{i} [ p(i|j, J, I) * p(f_j | e_i) ] }.   (8)</Paragraph>
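The bigram language model contribution to this criterion, prod_i p(e_i | e_{i-1}), is straightforward to compute once the predecessor of each word is known; the sentence-start token '<s>' and the toy table in the test are assumptions for illustration.

```python
import math

def bigram_logprob(e_words, p_bigram):
    """Log of the bigram language model term prod_i p(e_i | e_{i-1}).

    '<s>' is an assumed sentence-start token conditioning the first word."""
    logp = 0.0
    prev = "<s>"
    for e in e_words:
        logp += math.log(p_bigram[(prev, e)])
        prev = e
    return logp
```

The search difficulty discussed next comes precisely from this term: scoring e_i requires knowing e_{i-1}.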
      <Paragraph position="2"> Any search algorithm intending to perform the maximum operations in Eq. (8) has to guarantee that the predecessor word e_{i-1} can be determined at the time when a certain word e_i at position i in the target string is under consideration. Different solutions to this problem have been studied.</Paragraph>
      <Paragraph position="3"> (Tillmann et al., 1997b) and (Tillmann et al., 1997a) propose a search procedure based on dynamic programming that examines the source string sequentially. Although it is very efficient in terms of translation speed, it suffers from the drawback of being dependent on the so-called monotonicity constraint: The alignment paths are assumed to be monotone. Hence, the word at position i - 1 in the target sentence can be determined when the algorithm produces e_i. This approximation corresponds to the assumption of the fundamental similarity of the sentence structures in both languages. In (Tillmann et al., 1997b) text transformations in the source language are used to adapt the word ordering in the source strings to the target language grammar.</Paragraph>
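The monotone DP idea can be illustrated with a deliberately simplified sketch, not the cited algorithm itself: the source string is processed left to right, and each source word is either aligned to the current last target word or to a newly appended one, so the predecessor e_{i-1} is always known when e_i is scored. Length and alignment probabilities are omitted for brevity, and all model tables are toy placeholders.

```python
import math

def monotone_dp_search(f_words, vocab, p_trans, p_bigram):
    """Simplified monotone DP search sketch (not the exact cited algorithm).

    Hypotheses are indexed by their last target word, so the bigram
    predecessor is always available when a new word is scored."""
    # Each hypothesis: last target word -> (log score, target string so far).
    hyps = {"<s>": (0.0, ())}
    for f in f_words:
        new_hyps = {}
        for last, (score, target) in hyps.items():
            # Option 1: align f to the current last target word.
            if target:
                s1 = score + math.log(p_trans[(f, last)])
                if last not in new_hyps or s1 > new_hyps[last][0]:
                    new_hyps[last] = (s1, target)
            # Option 2: append a new target word e and align f to it.
            for e in vocab:
                s2 = (score + math.log(p_bigram[(last, e)])
                      + math.log(p_trans[(f, e)]))
                if e not in new_hyps or s2 > new_hyps[e][0]:
                    new_hyps[e] = (s2, target + (e,))
        hyps = new_hyps
    return max(hyps.values())[1]
```

Because hypotheses only ever extend the target string at its end, the alignment path is monotone by construction, mirroring the constraint described above.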
      <Paragraph position="4"> (Wang and Waibel, 1997) describe an algorithm based on A*-search. Here, hypotheses are extended by adding a word to the end of the target string while considering the source string words in any order. The underlying translation model is Model 2 from (Brown et al., 1993).</Paragraph>
      <Paragraph position="5"> (Wu, 1996) formulates a DP search for stochastic bracketing transduction grammars. The bigram language model is integrated into the algorithm at the point, where two partial parse trees are combined.</Paragraph>
    </Section>
  </Section>
</Paper>