<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3123">
  <Title>Constraining the Phrase-Based, Joint Probability Statistical Translation Model</Title>
  <Section position="3" start_page="154" end_page="154" type="metho">
    <SectionTitle>
2 Translation Models
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="154" end_page="154" type="sub_section">
      <SectionTitle>
2.1 Standard Phrase-based Model
</SectionTitle>
      <Paragraph position="0"> Most phrase-based translation models (Och, 2003; Koehn et al., 2003; Vogel et al., 2003) rely on a pre-existing set of word-based alignments from which they induce their parameters. In this project we use the model described by Koehn et al. (2003) which extracts its phrase alignments from a corpus that has been word aligned. From now on we refer to this phrase-based translation model as the standard model. The standard model decomposes the foreign input sentence F into a sequence of I phrases f1,...,fI. Each foreign phrase fi is translated to an English phrase ei using the probability distribution th(fi|ei). English phrases may be reordered using a relative distortion probability.</Paragraph>
      <Paragraph position="1"> This model performs no search for optimal phrase pairs. Instead, it extracts phrase pairs (fi,ei) in the following manner. First, it uses the IBM Models to learn the most likely word-level Viterbi alignments for English to Foreign and Foreign to English. It then uses a heuristic to reconcile the two alignments, starting from the points of high confidence in the intersection of the two Viterbi alignments and growing towards the points in the union. Points from the union are selected if they are adjacent to points from the intersection and their words are previously unaligned.</Paragraph>
      <Paragraph position="2"> Phrases are then extracted by selecting phrase pairs which are 'consistent' with the symmetrized alignment, which means that all words within the source language phrase are only aligned to the wordsofthetargetlanguagephraseandviceversa.</Paragraph>
      <Paragraph position="3"> Finally the phrase translation probability distribution is estimated using the relative frequencies of the extracted phrase pairs.</Paragraph>
      <Paragraph position="4"> This approach to phrase extraction means that phrasal alignments are locked into the symmetrized alignment. This is problematic because the symmetrization process will grow an alignment based on arbitrary decisions about adjacent words and because word alignments inadequately represent the real dependencies between translations. null</Paragraph>
    </Section>
    <Section position="2" start_page="154" end_page="154" type="sub_section">
      <SectionTitle>
2.2 Joint Probability Model
</SectionTitle>
      <Paragraph position="0"> The joint model (Marcu and Wong, 2002), does not rely on a pre-existing set of word-level alignments. Like the IBM Models, it uses EM to align and estimate the probabilities for sub-sentential units in a parallel corpus. Unlike the IBM Models, it does not constrain the alignments to being single words.</Paragraph>
      <Paragraph position="1"> The joint model creates phrases from words and commonly occurring sequences of words. A concept, ci, is defined as a pair of aligned phrases &lt; ei,fi &gt;. A set of concepts which completely covers the sentence pair is denoted by C. Phrases are restricted to being sequences of words which occur above a certain frequency in the corpus.</Paragraph>
      <Paragraph position="2"> Commonly occurring phrases are more likely to lead to the creation of useful phrase pairs, and without this restriction the search space would be much larger.</Paragraph>
      <Paragraph position="3"> The probability of a sentence and its translation is the sum of all possible alignments C, each of which is defined as the product of the probability of all individual concepts:</Paragraph>
      <Paragraph position="5"> The model is trained by initializing the translation table using Stirling numbers of the second kind to efficiently estimate p(&lt; ei,fi &gt;) by calculating the proportion of alignments which contain p(&lt; ei,fi &gt;) compared to the total number of alignments in the sentence (Marcu and Wong, 2002). EM is then performed by first discovering an initial phrasal alignments using a greedy algorithm similar to the competitive linking algorithm (Melamed, 1997). The highest probability phrase pairs are iteratively selected until all phrases are are linked. Then hill-climbing is performed by searching once for each iteration for all merges, splits, moves and swaps that improve the probability of the initial phrasal alignment. Fractional counts are collected for all alignments visited.</Paragraph>
      <Paragraph position="6"> Training the IBM models is computationally challenging, but the joint model is much more demanding. Considering all possible segmentations of phrases and all their possible alignments vastly increases the number of possible alignments that can be formed between two sentences. This number is exponential with relation to the length of the shorter sentence.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="154" end_page="155" type="metho">
    <SectionTitle>
3 Constraining the Joint Model
</SectionTitle>
    <Paragraph position="0"> The joint model requires a strategy for restricting the search for phrasal alignments to areas of the alignment space which contain most of the probabilitymass. Weproposeamethodwhichexamines  phrase pairs that are consistent with a set of high confidence word alignments defined for the sentence. The set of alignments are taken from the intersection of the bi-directional Viterbi alignments. This strategy for extracting phrase pairs is similar to that of the standard phrase-based model and the definition of 'consistent' is the same. However, the constrained joint model does not lock thesearchintoaheuristicallyderivedsymmetrized alignment. Joint model phrases must also occur above a certain frequency in the corpus to be considered. null The constraints on the model are binding during the initialization phase of training. During EM, inconsistent phrase pairs are given a small, non-zero probability and are thus not considered unless unaligned words remain after linking together high probability phrase pairs. All words must be aligned, there is no NULL alignment like in the IBM models.</Paragraph>
    <Paragraph position="1"> By using the IBM Models to constrain the joint model, we are searching areas in the phrasal alignment space where both models overlap. We combinetheadvantageofpriorknowledgeaboutlikely null word alignments with the ability to perform a probabilistic search around them.</Paragraph>
  </Section>
class="xml-element"></Paper>