<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1021">
  <Title>Minimum Error Rate Training in Statistical Machine Translation</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
9 Conclusions
</SectionTitle>
    <Paragraph position="0"> We presented alternative training criteria for log-linear statistical machine translation models which are directly related to translation quality: an unsmoothed error count and a smoothed error count on a development corpus. For the unsmoothed error count, we presented a new line optimization algorithm which can efficiently find the optimal solution along a line. We showed that this approach obtains significantly better results than using the MMI training criterion (with our method to define pseudoreferences) and that optimizing error rate as part of the training criterion helps to obtain better error rate on unseen test data. As a result, we expect that actual 'true' translation quality is improved, as previous work has shown that for some evaluation criteria there is a correlation with human subjective evaluation of fluency and adequacy (Papineni et al., 2001; Doddington, 2002). However, the different evaluation criteria yield quite different results on our Chinese-English translation task and therefore we expect that not all of them correlate equally well to human translation quality.</Paragraph>
    <Paragraph position="1"> The following important questions should be answered in the future: a45 How many parameters can be reliably estimated using unsmoothed minimum error rate criteria using a given development corpus size? We expect that directly optimizing error rate for many more parameters would lead to serious overfitting problems. Is it possible to optimize more parameters using the smoothed error rate criterion? a45 Which error rate should be optimized during training? This relates to the important question of which automatic evaluation measure is optimally correlated to human assessment of translation quality.</Paragraph>
    <Paragraph position="2"> Note, that this approach can be applied to any evaluation criterion. Hence, if an improved automatic evaluation criterion is developed that has an even better correlation with human judgments than BLEU and NIST, we can plug this alternative criterion directly into the training procedure and optimize the model parameters for it. This means that improved translation evaluation measures lead directly to improved machine translation quality. Of course, the approach presented here places a high demand on the fidelity of the measure being optimized. It might happen that by directly optimizing an error measure in the way described above, weaknesses in the measure might be exploited that could yield better scores without improved translation quality. Hence, this approach poses new challenges for developers of automatic evaluation criteria. null Many tasks in natural language processing, for instance summarization, have evaluation criteria that go beyond simply counting the number of wrong system decisions and the framework presented here might yield improved systems for these tasks as well.</Paragraph>
  </Section>
</Paper>