<?xml version="1.0" standalone="yes"?> <Paper uid="N03-1019"> <Title>A Weighted Finite State Transducer Implementation of the Alignment Template Model for Statistical Machine Translation</Title> <Section position="6" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Translation and Alignment Experiments </SectionTitle> <Paragraph position="0"> We now evaluate this implementation of the alignment template translation model.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Building the Alignment Template Library </SectionTitle> <Paragraph position="0"> To create the template library, we follow the procedure reported in Och (2002). We first obtain word alignments of the bitext using IBM-4 translation models trained in each translation direction (IBM-4 F and IBM-4 E), and then form the union of these alignments (IBM-4 F ∪ E).</Paragraph> <Paragraph position="1"> We extract the library of alignment templates from the bitext alignment using the phrase-extract algorithm reported in Och (2002). This procedure identifies the alignment templates that are consistent with a given source phrase. We do not use word classes in the experiments reported here; templates are therefore specified by phrases rather than by class sequences. For a given pair of source and target phrases, we retain only the alignment matrix that occurs most frequently in the training corpus. This is consistent with the intended application of these templates to translation and alignment under the maximum likelihood criterion: in the current formulation, only one alignment survives in any application of the models, so there is no reason to retain any of the less frequently occurring alignments. We estimate the probability of a template given its source phrase by the relative frequency of that phrasal translation in the bitext alignments. 
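The library-building procedure just described (keep only the most frequent alignment matrix for each phrase pair, and estimate the template probability by relative frequency) can be sketched as follows. This is our own minimal illustration, not the original implementation; the function name, data structures, and toy phrase pairs are all assumptions.

```python
from collections import Counter, defaultdict

def build_template_library(aligned_phrase_pairs):
    """Sketch of template-library estimation from phrase-extract output.

    aligned_phrase_pairs: iterable of (source_phrase, target_phrase, alignment)
    triples harvested from bitext alignments. For each (source, target) pair we
    keep only the most frequent alignment matrix, and estimate the probability
    of the template given its source phrase by relative frequency.
    """
    pair_counts = Counter()               # (src, tgt) -> occurrence count
    align_counts = defaultdict(Counter)   # (src, tgt) -> alignment -> count
    src_counts = Counter()                # src -> occurrence count

    for src, tgt, align in aligned_phrase_pairs:
        pair_counts[(src, tgt)] += 1
        align_counts[(src, tgt)][align] += 1
        src_counts[src] += 1

    library = {}
    for (src, tgt), n in pair_counts.items():
        # Retain only the most frequent alignment matrix for this phrase pair.
        best_align, _ = align_counts[(src, tgt)].most_common(1)[0]
        # Relative-frequency estimate of P(template | source phrase).
        prob = n / src_counts[src]
        library[(src, tgt)] = (best_align, prob)
    return library

# Toy data: phrases are strings, alignment matrices are frozensets of (i, j) links.
pairs = [
    ("la maison", "the house", frozenset({(0, 0), (1, 1)})),
    ("la maison", "the house", frozenset({(0, 0), (1, 1)})),
    ("la maison", "the home", frozenset({(0, 0), (1, 1)})),
]
lib = build_template_library(pairs)
print(lib[("la maison", "the house")][1])  # relative frequency 2/3
```

In this sketch, filtering the library (by maximum source-phrase length or a probability floor, as described below) would be a final pass over `library`.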
To restrict the memory requirements of the model, we extract only the templates whose source phrases contain at most a fixed number of words. Furthermore, we retain only those templates whose probability given some source phrase exceeds a fixed threshold.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Bitext Word Alignment </SectionTitle> <Paragraph position="0"> We present results on the French-to-English Hansards translation task (Och and Ney, 2000). We measured alignment performance using the precision, recall, and Alignment Error Rate (AER) metrics (Och and Ney, 2000).</Paragraph> <Paragraph position="1"> Our training set is a subset of the Canadian Hansards French-English sentence pairs (Och and Ney, 2000); our template library was extracted from the word alignments of this bitext.</Paragraph> <Paragraph position="2"> Our test set consists of 500 unseen French sentences from the Hansards for which both reference translations and word alignments are available (Och and Ney, 2000). We present results for the ATTM in Table 1, where we distinguish word alignments produced by templates from the template library from those produced by the templates introduced for alignment in Section 3.1. For comparison, we also align the bitext using IBM-4 translation models.</Paragraph> <Paragraph position="3"> Table 1: French-to-English Hansards Alignment Task.</Paragraph> <Paragraph position="4"> We first observe that the complete set of word alignments generated by the ATTM (ATTM-C) scores relatively poorly. 
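The precision, recall, and AER figures discussed here follow the definitions of Och and Ney (2000), which score a hypothesized alignment against sure links S and possible links P. A minimal sketch, with made-up alignment links:

```python
def alignment_scores(hyp, sure, possible):
    """Alignment precision, recall, and AER (Och and Ney, 2000).

    hyp:      set of hypothesized (i, j) alignment links
    sure:     set of sure reference links S
    possible: set of possible reference links P, with S contained in P

    precision = |A and P| / |A|
    recall    = |A and S| / |S|
    AER       = 1 - (|A and S| + |A and P|) / (|A| + |S|)
    """
    a_p = len(hyp & possible)
    a_s = len(hyp & sure)
    precision = a_p / len(hyp)
    recall = a_s / len(sure)
    aer = 1.0 - (a_s + a_p) / (len(hyp) + len(sure))
    return precision, recall, aer

# Made-up example links, purely for illustration.
sure = {(0, 0), (1, 1)}
possible = sure | {(2, 1)}
hyp = {(0, 0), (1, 1), (2, 2)}
p, r, aer = alignment_scores(hyp, sure, possible)
print(p, r, aer)  # precision 2/3, recall 1.0, AER 0.2
```

Under these definitions, a system can reach high precision while recall stays low, which is exactly the ATTM-A pattern discussed below.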
However, when we consider only those word alignments generated by actual alignment templates (ATTM-A), discarding the alignments generated by the dummy templates introduced in Section 3.1, we obtain very high alignment precision. This implies that the word alignments within the templates are very accurate.</Paragraph> <Paragraph position="5"> However, the poor performance under the recall measure suggests that the alignment template library has relatively poor coverage of the phrases in the alignment test set.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Translation and Lattice Quality </SectionTitle> <Paragraph position="0"> We next measured the translation performance of the ATTM on the same test set. Translation performance was measured using the BLEU (Papineni et al., 2001) and NIST MT-eval (Doddington, 2002) metrics, and Word Error Rate (WER). The target language model was a trigram language model with modified Kneser-Ney smoothing, trained on the English side of the bitext using the SRILM toolkit (Stolcke, 2002). The performance of the model is reported in Table 2. For comparison, we also report the performance of the IBM-4 translation model trained on the same corpus; the IBM Model-4 translations were obtained using the ReWrite decoder (Marcu and Germann, 2002).</Paragraph> <Paragraph position="1"> Table 2: French-to-English Hansards Translation Task.</Paragraph> <Paragraph position="2"> The results in Table 2 show that the alignment template model outperforms IBM Model 4 under all three metrics. This verifies that the WFST implementation of the ATTM obtains performance that compares favorably with other well-known research tools.</Paragraph> <Paragraph position="3"> We generate N-best lists from each translation lattice and show the variation of their oracle-best BLEU scores in Table 3. 
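Oracle-best selection over an N-best list, as used for Table 3, can be sketched as follows. The sentence-level BLEU here is our own simplified, add-one-smoothed, single-reference variant, not the NIST evaluation implementation, and all names are our own:

```python
import math
from collections import Counter

def sentence_bleu(hyp, ref, max_n=4):
    """Simplified smoothed sentence-level BLEU for a single reference.

    hyp, ref: lists of tokens. N-gram precisions up to max_n are combined
    by a geometric mean with a brevity penalty; add-one smoothing keeps
    zero-match orders from collapsing the score to zero.
    """
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(sum(h.values()), 1)
        precisions.append((match + 1) / (total + 1))     # add-one smoothing
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def oracle_best(nbest, ref):
    """Pick the hypothesis with the highest sentence BLEU from an N-best list."""
    return max(nbest, key=lambda hyp: sentence_bleu(hyp, ref))

ref = "the house is small".split()
nbest = [
    "the home is small".split(),
    "the house is small".split(),
    "house small".split(),
]
print(oracle_best(nbest, ref))  # the exact-match hypothesis wins
```

Because the maximum is taken over the list, the oracle-best score can only grow (or stay flat) as more hypotheses are added, which is the trend reported in Table 3.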
We observe that the oracle-best BLEU score of the lists generated by the ATTM increases with the size of the N-best list. We can therefore expect to rescore these lattices with more sophisticated models and achieve improvements in translation quality.</Paragraph> </Section> </Section> </Paper>