<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1047">
  <Title>Using a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper addresses the challenging problem of eliminating unsatisfactory outputs from machine translation (MT) systems, which are subsystems of a speech-to-speech machine translation (S2SMT) system. The permissible range of translation quality by MT/S2SMT systems depends on the user. Some users permit only perfect translations, while other users permit even translations with flawed grammar.</Paragraph>
    <Paragraph position="1"> Unsatisfactory MT outputs are those whose translation quality is worse than the level the user can permit. In this paper, the authors intend to eliminate unsatisfactory outputs by using confidence measures for MT outputs. The confidence measures[1] indicate how perfect/satisfactory the MT outputs are. In the discipline of MT, confidence measures for MT outputs have rarely been investigated.</Paragraph>
    <Paragraph position="2"> [1] These confidence measures are a kind of automatic evaluator such as mWER (Niessen et al., 2000) and BLEU (Papineni et al., 2001). While mWER and BLEU cannot be used online, these confidence measures can. This is because the former are based on reference translations, while the latter are not. Acknowledgment: This research was supported in part by the Ministry of Public Management, Home Affairs, Posts and Telecommunications, Japan.</Paragraph>
    <Paragraph position="3"> The few existing confidence measures include the rank-sum-based confidence measure (RSCM) for statistical machine translation (SMT) systems, C_rank in (Ueffing et al., 2003). The basic idea of this confidence measure is to roughly calculate the word posterior probability by using ranks of MT outputs in an N-best list from an SMT system.</Paragraph>
    <Paragraph position="4"> In non-parametric statistical testing, the ranks of numerical values are commonly used instead of the numerical values themselves. In the case of the existing RSCM, the ranks of the probabilities of MT outputs in the N-best list were used instead of the probabilities of the outputs themselves. The existing RSCM scores each word in an MT output by summing the complemented ranks of candidates in the N-best list that contain the same word in a Levenshtein-aligned position (Levenshtein, 1966). When the confidence values of all words in the MT output are larger than a fixed threshold, the MT output is judged as correct/perfect. Otherwise, the output is judged as incorrect/imperfect.</Paragraph>
    <Paragraph position="5"> The existing RSCM does not always work well on types of MT systems other than SMT systems. [Figure 1 caption, beginning lost in extraction: ... different types of Japanese-to-English (J2E) MT systems: D3, HPAT, and SAT. The existing RSCM tried to accept perfect MT outputs (grade A in Section 4) and to reject imperfect MT outputs (grades B, C, and D in Section 4).]</Paragraph>
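The word scoring and thresholding described above can be sketched as follows. This is a minimal illustration, not the implementation of (Ueffing et al., 2003): the function names are hypothetical, the normalization by the total of the complemented ranks is an assumption, and Python's difflib matching is used as a stand-in for the Levenshtein alignment.

```python
from difflib import SequenceMatcher


def aligned_positions(hyp, cand):
    """Indices of words in hyp matched to an identical word in cand.

    Stand-in for Levenshtein alignment (assumption): difflib matching blocks.
    """
    matcher = SequenceMatcher(None, hyp, cand, autojunk=False)
    matched = set()
    for block in matcher.get_matching_blocks():
        matched.update(range(block.a, block.a + block.size))
    return matched


def rank_sum_confidence(nbest):
    """Score each word of the top output nbest[0].

    nbest is a non-empty list of token lists, best candidate first.
    """
    n = len(nbest)
    hyp = nbest[0]
    weights = [n - rank for rank in range(n)]  # complemented ranks: N, N-1, ..., 1
    total = sum(weights)  # normalization constant (assumption)
    scores = [0.0] * len(hyp)
    for weight, cand in zip(weights, nbest):
        for i in aligned_positions(hyp, cand):
            scores[i] += weight
    return [s / total for s in scores]


def accept(nbest, threshold):
    """Judge the top output as perfect only if every word exceeds the threshold."""
    return all(score > threshold for score in rank_sum_confidence(nbest))
```

With a three-candidate list, a word that appears in an aligned position in every candidate receives the full score of 1.0, while a word supported only by lower-ranked candidates receives less.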
    <Paragraph position="6"> Figure 1 shows the differences among the performances, indicated by the Receiver Operating Characteristics (ROC) curve (Section 4.1), of the existing RSCM on each of three MT systems (Section 4.2.1): D3, HPAT, and SAT (Doi and Sumita, 2003; Imamura et al., 2003; Watanabe et al., 2003). Only SAT is an SMT system; the others are not. The ideal ROC curve is a square (0,1), (1,1), (1,0); thus, the closer the curve is to a square, the better the performance of the RSCM is. The performances of the existing RSCM on the non-SMT systems, D3 and HPAT, are much worse than that on the SMT system, SAT.</Paragraph>
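One way to trace such a curve is sketched below. The axes are an assumption, since they are defined only in Section 4.1: here each point pairs the fraction of imperfect outputs that are rejected with the fraction of perfect outputs that are accepted, which makes a perfect measure pass through (1,1). The function name and inputs are hypothetical, and both classes are assumed to be present in the data.

```python
def roc_points(confidences, is_perfect, thresholds):
    """Return one (correct rejection rate, correct acceptance rate) pair per threshold.

    confidences: one sentence-level confidence per MT output (assumption).
    is_perfect: matching list of booleans from human grading.
    """
    n_good = sum(1 for p in is_perfect if p)
    n_bad = len(is_perfect) - n_good
    pairs = list(zip(confidences, is_perfect))
    points = []
    for t in thresholds:
        # An output is accepted when its confidence exceeds the threshold.
        accepted_good = sum(1 for c, p in pairs if p and c > t)
        rejected_bad = sum(1 for c, p in pairs if not p and not c > t)
        points.append((rejected_bad / n_bad, accepted_good / n_good))
    return points
```

Sweeping the threshold from low to high moves the point from accepting everything to rejecting everything; a measure that separates the two classes completely touches the corner (1,1) in between.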
    <Paragraph position="7"> The performance of the existing RSCM depends on the goodness/density of MT outputs in the N-best list from the system. However, the system's N-best list does not always give a good approximation of the total summation of the probability of all candidate translations given the source sentence/utterance. The N-best list is expected to approximate the total summation as closely as possible. This paper proposes a method that eliminates unsatisfactory top outputs by using an alternative RSCM based on a mixture of N-best lists from multiple MT systems (Figure 2). The elimination system is intended to be used in the selector architecture, as in (Akiba et al., 2002). The total translation quality of the selector architecture proved to be better than the translation quality of each element MT system. The final output from the selection system is the best among the satisfactory top[2] outputs from the elimination system. In the case of Figure 2, the selection system can receive zero to three top MT outputs. When the selection system receives fewer than two top MT outputs, it merely passes on a null output or the one top MT output.</Paragraph>
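The interplay between elimination and selection can be sketched as below. This is an illustration under assumptions: `is_satisfactory` stands for the confidence-based elimination step and `quality` for whatever criterion the selector of (Akiba et al., 2002) applies; both are hypothetical callables supplied by the caller.

```python
def select_output(top_outputs, is_satisfactory, quality):
    """Eliminate unsatisfactory top outputs, then pick the best survivor.

    top_outputs: the top MT output of each element system.
    is_satisfactory: hypothetical predicate for the elimination step.
    quality: hypothetical scoring function used by the selector.
    """
    survivors = [out for out in top_outputs if is_satisfactory(out)]
    if not survivors:
        return None  # null output: every top output was eliminated
    # With a single survivor, max() simply passes it through.
    return max(survivors, key=quality)
```

With three element systems, the selector receives zero to three survivors, and the zero and one cases need no comparison at all.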
    <Paragraph position="8"> The proposed RSCM differs from the existing RSCM in its N-best list. The proposed RSCM receives an M-best list from each element MT system. Next, it sorts the mixture of the MT outputs in all M-best lists in the order of the average product (Section 3.2) of the scores of a language model and a translation model (Akiba et al., 2002). This sorted mixture is used instead of the system's N-best list in the existing RSCM.</Paragraph>
    <Paragraph position="9"> [2] To distinguish it from the best output from the selection system, the MT output in the first place in each N-best list (e.g., N-best list a in Figure 2) is referred to as the top MT output.</Paragraph>
    <Paragraph position="10"> To experimentally evaluate the proposed RSCM, the authors applied the proposed RSCM and the existing RSCM to a test set of the Basic Travel Expression Corpus (Takezawa et al., 2002). The proposed RSCM proved to work better than the existing RSCM on the non-SMT systems and to work as well as the existing RSCM on the SMT system.</Paragraph>
    <Paragraph position="11"> The next section outlines the existing RSCM.</Paragraph>
    <Paragraph position="12"> Section 3 proposes our RSCM. Experimental results are shown and discussed in Section 4. Finally, our conclusions are presented in Section 5.</Paragraph>
  </Section>
</Paper>