<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1021">
<Title>Minimum Error Rate Training in Statistical Machine Translation</Title>
<Section position="8" start_page="0" end_page="0" type="relat">
<SectionTitle>8 Related Work</SectionTitle>
<Paragraph position="0">The use of log-linear models for statistical machine translation was suggested by Papineni et al. (1997) and Och and Ney (2002).</Paragraph>
<Paragraph position="1">Minimum classification error training and the use of a smoothed error count are common in the pattern recognition and speech recognition community (Duda and Hart, 1973; Juang et al., 1995; Schlüter and Ney, 2001).</Paragraph>
<Paragraph position="2">Paciorek and Rosenfeld (2000) use minimum classification error training to optimize the parameters of a whole-sentence maximum entropy language model.</Paragraph>
<Paragraph position="3">A technically very different approach with a similar goal is the minimum Bayes risk approach, which uses a decision rule that is optimal with respect to an application-specific risk/loss function and will normally differ from Eq. 3. The loss function is either identical or closely related to the final evaluation criterion. In contrast to the approach presented in this paper, the training criterion and the statistical models remain unchanged in the minimum Bayes risk approach. In the field of natural language processing, this approach has been applied, for example, to parsing (Goodman, 1996) and word alignment (Kumar and Byrne, 2002).</Paragraph>
</Section>
</Paper>