File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/w02-1604_intro.xml
Size: 4,778 bytes
Last Modified: 2025-10-06 14:01:34
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-1604"> <Title>English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations</Title> <Section position="3" start_page="2" end_page="5" type="intro"> <SectionTitle> 3 Evaluation </SectionTitle> <Paragraph position="0"> In May 2002, we compared output of the MSR-MT English-Japanese system with a commercially available desktop MT system.</Paragraph> <Paragraph position="1"> Toshiba's The Honyaku Office v2.0 desktop MT system was selected for this purpose. The Honyaku is a trademark of the Toshiba Corporation. Another desktop system was also considered for evaluation; however, comparative evaluation with that system indicated that the Toshiba system performed marginally, though not significantly, better on our technical documentation.</Paragraph> <Section position="1" start_page="2" end_page="5" type="sub_section"> <SectionTitle> Training Data Translation Output </SectionTitle> <Paragraph position="0"> This URL provides access to public folders.</Paragraph> <Paragraph position="1"> This computer provides access to the internet.</Paragraph> <Paragraph position="2"> The Japanese sentences, which had been translated by human translators, were taken as reference sentences (and were assumed to be correct translations). The English sentences were then translated by the two MT systems into Japanese for blind evaluation performed by seven outside vendors unfamiliar with either system's characteristics. No attempt was made to constrain or modify the English input sentences on the basis of length or other characteristics. Both systems provided a translation for each sentence.</Paragraph> <Paragraph position="3"> For each of the Japanese reference sentences, evaluators were asked to select which translation was closer to the reference sentence. A value of +1 was assigned if the evaluator considered MSR-MT output sentence better and [?]1 if they considered the comparison system better. If two translated sentences were considered equally good or bad in comparison 250 sentences were originally selected for evaluation; 12 were later discarded when it was discovered by evaluators that the Japanese reference sentences (not the input sentences) were defective owing to the presence of junk characters (mojibake) and other deficiencies.</Paragraph> <Paragraph position="4"> In MSR-MT, Mindnet coverage is sufficiently complete with respect to the domain that an untranslated sentence normally represents a complete failure to parse the input, typically owing to excessive length.</Paragraph> <Paragraph position="5"> to the reference, a value of 0 was assigned. On this metric, MSR-MT scored slightly worse than the comparison system rating of [?]0.015. At a two-way confidence measure of +/[?]0.16, the difference between the systems is statistically insignificant. By contrast, an earlier evaluation conducted in October 2001 yielded a score of [?]0.34 vis-a-vis the comparison system.</Paragraph> <Paragraph position="6"> In addition, the evaluators were asked to rate the translation quality on an absolute scale of 1 through 4, according to the following criteria: 1. Unacceptable: Absolutely not comprehensible and/or little or no information transferred accurately.</Paragraph> <Paragraph position="7"> 2. Possibly Acceptable: Possibly comprehensible (given enough context and/or time to work it out); some information transferred accurately.</Paragraph> <Paragraph position="8"> 3. 
<Paragraph position="5"> In addition, the evaluators were asked to rate the translation quality on an absolute scale of 1 through 4, according to the following criteria:</Paragraph> <Paragraph position="6"> 1. Unacceptable: Absolutely not comprehensible and/or little or no information transferred accurately.</Paragraph> <Paragraph position="7"> 2. Possibly Acceptable: Possibly comprehensible (given enough context and/or time to work it out); some information transferred accurately.</Paragraph> <Paragraph position="8"> 3. Acceptable: Not perfect, but definitely comprehensible, and with accurate transfer of all important information.</Paragraph> <Paragraph position="9"> 4. Ideal: Not necessarily a perfect translation, but grammatically correct, and with all information accurately transferred.</Paragraph> <Paragraph position="10"> On this absolute scale, neither system performed exceptionally well: MSR-MT scored an average of 2.25, as opposed to 2.32 for the comparison system. Again, the difference between the two is statistically insignificant. It should be added that the comparison presented here is not ideal, since MSR-MT was trained principally on technical manual sentences, while the comparison system was not specifically tuned to this corpus. Accordingly, the results of the evaluation need to be interpreted narrowly, as demonstrating that: • A viable example-based English-Japanese MT system can be developed that applies general-purpose alignment rules to semantic representations; and • Given general-purpose grammars, a representation of what the sentence means, and suitable learning techniques, it is possible to achieve, within a given domain and a relatively short time frame, results comparable to those of a mature commercial product.</Paragraph> </Section> </Section> </Paper>