<?xml version="1.0" standalone="yes"?>
<Paper uid="I05-2014">
  <Title>BLEU in characters: towards automatic MT evaluation in languages without word delimiters</Title>
  <Section position="3" start_page="0" end_page="79" type="intro">
    <SectionTitle>
2 Overview
2.1 The word segmentation problem
</SectionTitle>
    <Paragraph position="0"> As statistical machine translation systems basically rely on the notion of words through their lexicon models (BROWN et al., 1993), they are usually capable of outputting sentences already segmented into words when they translate into languages like Chinese or Japanese. But this is not necessarily the case with commercial systems.</Paragraph>
    <Paragraph position="1"> For instance, Systran4 does not output segmented texts when it translates into Chinese or Japanese.</Paragraph>
    <Paragraph position="2"> As such, comparing systems that translate into languages where words are not an immediate given in unprocessed texts, is still hindered by the human evaluation bottleneck. To compare the performance of different systems, segmentation has to be performed beforehand.</Paragraph>
    <Paragraph position="3">  One can always apply standard word segmentation tools (for instance, The Peking University Segmenter for Chinese (DUAN et al., 2003) or ChaSen for Japanese (MATSUMOTO et al., 1999)), and then apply objective MT evaluation methods. However, the scores obtained would be biased by the error rates of the segmentation tools on MT outputs5. Indeed, MT outputs still differ from standard texts, and their segmentation may lead to a different performance. Consequently, it is difficult to directly and fairly compare scores obtained for a system outputting non-segmented sentences with scores obtained for a system delivering sentences already segmented into words.</Paragraph>
    <Section position="1" start_page="79" end_page="79" type="sub_section">
      <SectionTitle>
2.2 BLEU in characters
</SectionTitle>
      <Paragraph position="0"> Notwithstanding the previous issue, it is undeniable that methods like BLEU or NIST have been adopted by the MT community as they measure complementary characteristics of translations: namely fluency and adequacy (AKIBA et al., 2004, p. 7). Although far from being perfect, they definitely are automatic, fast, and cheap. For all these reasons, one cannot easily ask the MT community to give up their practical know-how related to such measures. It is preferable to state an equivalence with well established measures than to merely look for some correlation with human scores, which would indeed amount to propose yet another new evaluation method.</Paragraph>
      <Paragraph position="1"> Characters are always an immediate given in any electronic text of any language, which is not necessarily the case for words. Based on this observation, this study shows the effect of shifting from the level of words to the level of characters, i.e., of performing all computations in characters instead of words. According to what was said above, the purpose is not to look for any correlation with human scores, but to establish an equivalence between BLEU scores obtained in two ways: on characters and on words.</Paragraph>
      <Paragraph position="2"> Intuitively a high correlation should exist. The contrary would be surprising. However, the equivalence has yet to be determined, along with the corresponding numbers of characters and words for which the best correlation is obtained.</Paragraph>
      <Paragraph position="3"> 5Such error rates are around 5% to 10% for standard texts. An evaluation of the segmentation tool is in fact required. on MT outputs alone.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>