<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1006">
  <Title>Improved Word Alignment Using a Symmetric Lexicon Model</Title>
  <Section position="6" start_page="0" end_page="4" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Evaluation Criteria
</SectionTitle>
      <Paragraph position="0"> We use the same evaluation criterion as described in (Och and Ney, 2000). The generated word alignment is compared to a reference alignment produced by human experts. The annotation scheme explicitly takes the ambiguity of word alignment into account. There are two kinds of alignments: sure alignments (S), which are used for unambiguous alignments, and possible alignments (P), which are used for alignments that might or might not exist. The P relation is used especially to align words within idiomatic expressions, free translations, and missing function words. It is guaranteed that the sure alignments are a subset of the possible alignments (S ⊆ P). The obtained reference alignment may contain many-to-one and one-to-many relationships.</Paragraph>
      <Paragraph position="1"> The quality of an alignment A is computed as appropriately redefined precision and recall measures. Additionally, we use the alignment error rate (AER), which is derived from the well-known F-measure.</Paragraph>
      <Paragraph position="3"> With these definitions a recall error can only occur if a S(ure) alignment is not found and a precision error can only occur if a found alignment is not even P(ossible).</Paragraph>
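The measures can be sketched in a few lines of Python, assuming the standard definitions from (Och and Ney, 2000): recall = |A ∩ S| / |S|, precision = |A ∩ P| / |A|, and AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The function name, the link representation as (source position, target position) pairs, and the toy data are illustrative assumptions, not part of the paper.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source position, target position); representation is an assumption


def alignment_scores(a: Set[Link], s: Set[Link], p: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) of hypothesis links a
    against sure links s and possible links p."""
    if not s.issubset(p):
        raise ValueError("sure links must be a subset of possible links")
    if not a or not s:
        raise ValueError("hypothesis and sure link sets must be non-empty")
    a_and_s = len(a.intersection(s))  # found links that are sure
    a_and_p = len(a.intersection(p))  # found links that are at least possible
    precision = a_and_p / len(a)
    recall = a_and_s / len(s)
    aer = 1.0 - (a_and_s + a_and_p) / (len(a) + len(s))
    return precision, recall, aer


# Toy example with made-up links (not taken from the paper).
sure = {(0, 0), (1, 1)}
possible = sure.union({(2, 1), (2, 2)})
hypothesis = {(0, 0), (2, 2), (3, 3)}
precision, recall, aer = alignment_scores(hypothesis, sure, possible)
```

In the toy example, the link (3, 3) is not even possible and therefore costs precision, while the missing sure link (1, 1) costs recall.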
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Experimental Setup
</SectionTitle>
      <Paragraph position="0"> We evaluated the presented lexicon symmetrization methods on the Verbmobil and the Canadian Hansards task. The German-English Verbmobil task (Wahlster, 2000) is a speech translation task in the domain of appointment scheduling, travel planning and hotel reservation. The French-English Canadian Hansards task consists of the debates in the Canadian Parliament.</Paragraph>
      <Paragraph position="1"> The corpus statistics are shown in Table 1 and Table 2. The number of running words and the vocabularies are based on full-form words including punctuation marks. As in (Och and Ney, 2003), the first 100 sentences of the test corpus are used as a development corpus to optimize model parameters that are not trained via the EM algorithm, e.g. the discounting parameter for lexicon smoothing. The remaining part of the test corpus is used to evaluate the models.</Paragraph>
      <Paragraph position="2"> We use the same training schemes (model sequences) as presented in (Och and Ney, 2003). As we use the same training and testing conditions as (Och and Ney, 2003), we will refer to the results presented in that article as the baseline results. In (Och and Ney, 2003), the alignment quality of statistical models is compared to alternative approaches, e.g. using the Dice coefficient or the competitive linking algorithm. The statistical approach showed the best performance and therefore we report only the results for the statistical systems.</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="4" type="sub_section">
      <SectionTitle>
5.3 Lexicon Symmetrization
</SectionTitle>
      <Paragraph position="0"> In Table 3 and Table 4, we present the following experiments performed for both the Verbmobil and the Canadian Hansards task: + Base: the system taken from (Och and Ney, 2003) that we use as baseline system.</Paragraph>
      <Paragraph position="1"> + Lin.: symmetrized lexicon using a linear interpolation of the lexicon counts after each training iteration as described in Section 3.1.</Paragraph>
      <Paragraph position="2"> + Log.: symmetrized lexicon using a log-linear interpolation of the lexicon counts after each training iteration as described in Section 3.2.</Paragraph>
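The two interpolation variants listed above can be illustrated roughly as follows. This is only a sketch: the exact formulas are given in Sections 3.1 and 3.2, which are not reproduced here, so the weighted arithmetic mean (linear) and weighted geometric mean (log-linear) below are assumed forms. The weight alpha, the function names, and the toy counts are hypothetical.

```python
from typing import Dict, Tuple

# (source word, target word) -> expected count collected in one training direction
Counts = Dict[Tuple[str, str], float]


def interpolate_linear(n_st: Counts, n_ts: Counts, alpha: float = 0.5) -> Counts:
    """Weighted arithmetic mean of the two count tables (assumed form)."""
    pairs = set(n_st).union(n_ts)
    return {k: alpha * n_st.get(k, 0.0) + (1.0 - alpha) * n_ts.get(k, 0.0) for k in pairs}


def interpolate_loglinear(n_st: Counts, n_ts: Counts, alpha: float = 0.5) -> Counts:
    """Weighted geometric mean of the two count tables (assumed form)."""
    pairs = set(n_st).union(n_ts)
    return {k: n_st.get(k, 0.0) ** alpha * n_ts.get(k, 0.0) ** (1.0 - alpha) for k in pairs}


# Toy counts (made up): one word pair is seen in only one direction.
counts_st = {("haus", "house"): 4.0, ("haus", "home"): 1.0}
counts_ts = {("haus", "house"): 9.0}
lin = interpolate_linear(counts_st, counts_ts)
log = interpolate_loglinear(counts_st, counts_ts)
```

Under these assumed forms, a word pair that receives a count in only one direction keeps part of its mass with the linear variant but drops to zero with the log-linear variant.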
      <Paragraph position="3"> Table 3: [...] performance for the Verbmobil task (S→T: source-to-target direction, T→S: target-to-source direction; all numbers in percent).</Paragraph>
      <Paragraph position="4"> Figure 1: [...] methods as a function of the training corpus size for the Verbmobil task (source-to-target direction).
 In Table 3, we compare both interpolation variants for the Verbmobil task to (Och and Ney, 2003). We observe notable improvements in the alignment error rate using the linear interpolation. For the translation direction from German to English (S→T), a relative improvement of about 25% is achieved, from an alignment error rate of 5.7% for the baseline system to 4.3% using the linear interpolation. With the loglinear interpolation, we observe a substantial reduction of the alignment error rate as well. The two symmetrization methods improve both precision and recall of the resulting Viterbi alignment in both translation directions for the Verbmobil task. The improvements with the linear interpolation are statistically significant at the 99% level for both translation directions. For the loglinear interpolation, the improvement in the target-to-source translation direction is statistically significant at the 99% level. The statistical significance tests were performed using bootstrap resampling.</Paragraph>
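The text does not spell out how the bootstrap test was set up. One common variant is a paired bootstrap over test sentences, sketched below under that assumption. The per-sentence statistics tuple, the function names, the number of resamples, and the toy data are all hypothetical.

```python
import random
from typing import Sequence, Tuple

# Per-sentence counts: (|A ∩ S|, |A ∩ P|, |A|, |S|)
Stats = Tuple[int, int, int, int]


def corpus_aer(stats: Sequence[Stats]) -> float:
    """AER computed from counts summed over all given sentences."""
    a_s = sum(x[0] for x in stats)
    a_p = sum(x[1] for x in stats)
    a = sum(x[2] for x in stats)
    s = sum(x[3] for x in stats)
    return 1.0 - (a_s + a_p) / (a + s)


def bootstrap_win_rate(base: Sequence[Stats], new: Sequence[Stats],
                       samples: int = 1000, seed: int = 0) -> float:
    """Fraction of resampled test sets on which `new` has a strictly lower AER than `base`."""
    if len(base) != len(new):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(base)
    wins = 0
    for _ in range(samples):
        # Draw n sentence indices with replacement; use the same draw for both systems.
        idx = [rng.randrange(n) for _ in range(n)]
        if corpus_aer([base[i] for i in idx]) > corpus_aer([new[i] for i in idx]):
            wins += 1
    return wins / samples


# Toy statistics for three sentences (made up, not from the paper).
base_stats = [(1, 1, 2, 2), (2, 2, 3, 2), (1, 2, 3, 2)]
new_stats = [(2, 2, 2, 2), (2, 2, 2, 2), (2, 3, 3, 2)]
rate = bootstrap_win_rate(base_stats, new_stats, samples=200)
```

Under this scheme, a win rate of at least 0.99 would correspond to an improvement that is significant at the 99% level.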
      <Paragraph position="5"> We also performed experiments on sub-corpora of different sizes. For the Verbmobil task, the results are illustrated in Figure 1. We observe that both symmetrization variants result in improvements for all corpus sizes.</Paragraph>
      <Paragraph position="6"> With increasing training corpus size the performance of the linear interpolation becomes superior to the performance of the loglinear interpolation.</Paragraph>
      <Paragraph position="7"> In Table 4, we compare the symmetrization methods with the baseline system for the Canadian Hansards task. Here, the loglinear interpolation performs best. We achieve a relative improvement over the baseline of more than 30% for both translation directions. For instance, the alignment error rate for the translation direction from French to English (S→T) improves from 12.6% for the baseline system to 8.6% for the symmetrized system with loglinear interpolation. Again, the two symmetrization methods improve both precision and recall of the Viterbi alignment. For the Canadian Hansards task, all the improvements of the alignment error rate are statistically significant at the 99% level.</Paragraph>
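The relative improvements quoted in this section follow from the reported alignment error rates. A small check, with a hypothetical helper name:

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline value."""
    return (baseline - system) / baseline


# AER values as reported in the text (percent).
verbmobil = relative_improvement(5.7, 4.3)   # German to English, linear interpolation
hansards = relative_improvement(12.6, 8.6)   # French to English, loglinear interpolation
```

These evaluate to roughly 0.246 and 0.317, consistent with the stated "about 25%" and "more than 30%".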
    </Section>
    <Section position="4" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
5.4 Generalized Alignments
</SectionTitle>
      <Paragraph position="0"> In (Och and Ney, 2003), generalized alignments are used, i.e. the final Viterbi alignments of both translation directions are combined using a heuristic. Experimentally, the best heuristic for the Canadian Hansards task is the intersection. For the Verbmobil task, the refined method of (Och and Ney, 2003) is used. The results are summarized in Table 5.</Paragraph>
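As a rough illustration of combining the two directional Viterbi alignments, the sketch below shows the intersection heuristic and, for contrast, the union. The refined method of (Och and Ney, 2003) is more involved and is not shown. The link representation, the function names, and the toy links are assumptions made for the example.

```python
from typing import Set, Tuple

Link = Tuple[int, int]


def to_source_target(links_ts: Set[Link]) -> Set[Link]:
    """Flip (target, source) pairs so both directions use the (source, target) orientation."""
    return {(j, i) for (i, j) in links_ts}


def combine_intersection(st: Set[Link], ts: Set[Link]) -> Set[Link]:
    """Keep only links proposed by both directional alignments."""
    return st.intersection(to_source_target(ts))


def combine_union(st: Set[Link], ts: Set[Link]) -> Set[Link]:
    """Keep every link proposed by either directional alignment."""
    return st.union(to_source_target(ts))


# Toy links (made up): st holds (source, target) pairs, ts holds (target, source) pairs.
st_links = {(0, 0), (1, 1), (2, 1)}
ts_links = {(0, 0), (1, 1), (2, 3)}
both = combine_intersection(st_links, ts_links)
either = combine_union(st_links, ts_links)
```

The intersection keeps fewer links and so tends to favor precision, while the union tends to favor recall.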
      <Paragraph position="1"> We see that both the linear and the loglinear lexicon symmetrization methods yield an improvement with respect to the alignment error rate. For the Verbmobil task, the improvement with the loglinear interpolation is statistically significant at the 99% level. For the Canadian Hansards task, both lexicon symmetrization methods result in statistically significant improvements at the 95% level. Additionally, we observe that precision and recall are more balanced for the symmetrized lexicon variants, especially for the Canadian Hansards task.
Table 6: [...] probabilities on the alignment performance for the Verbmobil task (S→T: source-to-target direction, smooth English; T→S: target-to-source direction, smooth German; all numbers in percent).
        S→T               T→S
        Pre.  Rec.  AER   Pre.  Rec.  AER
Base    93.5  95.3  5.7   91.4  88.7  9.9
smooth  94.8  94.8  5.2   93.4  88.2  9.1</Paragraph>
    </Section>
    <Section position="5" start_page="4" end_page="4" type="sub_section">
      <SectionTitle>
5.5 Lexicon Smoothing
</SectionTitle>
      <Paragraph position="0"> In Table 6, we present the results for the lexicon smoothing as described in Section 4 on the Verbmobil corpus. As expected, a notable improvement in the AER is reached if the lexicon smoothing is performed for German (i.e. for the target-to-source direction), because many full-form words with the same base form are present in this language. These improvements are statistically significant at the 95% level.</Paragraph>
    </Section>
  </Section>
</Paper>