File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/03/w03-1612_intro.xml
Size: 4,237 bytes
Last Modified: 2025-10-06 14:02:00
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1612"> <Title>Paraphrasing Rules for Automatic Evaluation of Translation into Japanese</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Evaluating the output of natural language processing applications is important for both users and developers. Tasks such as sentential parsing, morphological analysis and named entity recognition are easy to evaluate automatically because the &quot;right answer&quot; can be defined deterministically under a specific grammar or assumed criterion.</Paragraph> <Paragraph position="1"> The evaluation of machine translation is not so straightforward, since there are infinitely many ways to express similar meanings and one cannot enumerate the right answers exhaustively. Despite this, automatic translation evaluation is of practical importance because manual evaluation is laborious and human judgments tend to be arbitrary. Automatic evaluation is more reliable than human evaluation because it scores the same translations consistently. BLEU (Papineni et al., 2002b) is one of the methods for automatic evaluation of translation quality. It uses the ratio of co-occurring n-grams between a translation and single or multiple reference sentences. High correlation is reported between the BLEU score and human evaluations for translations from Arabic, Chinese, French, and Spanish to English (Papineni et al., 2002a).</Paragraph> <Paragraph position="2"> This paper investigates how to apply BLEU to the evaluation of English-to-Japanese translation. The main goal of this paper is to design a reliable method of evaluation for translations from another language to Japanese (henceforth we call this Japanese translation evaluation). 
There are some difficulties in adapting BLEU to Japanese: BLEU uses n-grams of words, so words in a sentence are assumed to be separated by spaces, while Japanese does not use spaces between words. Moreover, Japanese has more variation in writing styles than English. A major difference between the two languages is that Japanese has polite forms expressed by inflections or auxiliary verbs. If the style of the translations is not the same as that of the reference sentences, the evaluation score becomes low even though the translations are accurate in meaning and grammar. To solve these problems, we apply paraphrasing rules to the reference sentences so that the differences in writing styles do not affect the evaluation score.</Paragraph> <Paragraph position="3"> Another goal is derived from this application of paraphrasing: to define a &quot;good paraphrase&quot;. Here paraphrasing means rewriting sentences without changing their semantics. Several methods of paraphrasing have been studied. Some of them aim at the preprocessing of machine translation (Mitamura and Nyberg, 2001; Takahashi et al., 2001).</Paragraph> <Paragraph position="4"> They use paraphrasing to transform the input sentences so that the language-transferring routines can handle them easily. Another application of paraphrasing is to canonicalize many expressions that have the same semantics, supporting information retrieval or question answering (Zukerman and Raskutti, 2002; Torisawa, 2002). The paraphrasing techniques in these studies are considered useful, but they are difficult to evaluate.</Paragraph> <Paragraph position="5"> Machine translation evaluation requires methods to judge whether two sentences have the same meaning even when they are syntactically different.</Paragraph> <Paragraph position="6"> Therefore, if a set of paraphrasing rules contributes to more reliable translation evaluation, it can be said to be &quot;good&quot; paraphrasing. 
Thus, the study in this paper also presents a new paradigm for evaluating paraphrases.</Paragraph> <Paragraph position="7"> Section 2 gives an overview of the BLEU metric. Section 3 presents the proposed method of Japanese translation evaluation, and its performance is evaluated in Section 4. Based on the experimental results, Section 5 discusses qualitative and quantitative features of paraphrasing.</Paragraph> </Section> </Paper>