<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0318">
  <Title>Input Sentence Splitting and Translating</Title>
  <Section position="2" start_page="0" end_page="3" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> To achieve translation technology that is adequate for speech translation, the possibilities of several corpus-based approaches are being investigated. Among these methods, DP-match Driven transDucer (D3) has been proposed as an Example-Based Machine Translation (EBMT) method. When D3 is applied to Japanese-to-English translation in a travel conversation domain, it can achieve high translation quality (Sumita, 2001 and 2002). On the other hand, the method is sensitive to the long sentence problem, where longer input sentences make it more difficult for a machine translation (MT) system to produce a good translation. To overcome this problem, the technique of splitting an input sentence and translating the split sentences appears promising. (Strictly speaking, the input is not necessarily a sentence but an utterance that may include several sentences. In this paper, we use the term sentence without strictly defining it, to simplify the discussion.)</Paragraph>
    <Paragraph position="1"> Previous studies related to this approach can be roughly classified into two types: one splits sentences before translation, and the other splits them in the parsing phase of translation. We call the former pre-process-splitting, the latter parse-time-splitting, and translation with any kind of splitting split-and-translate. In previous research on pre-process-splitting, such as Takezawa (1999), many methods have been based on word-sequence characteristics. Some of these efforts have achieved high recall and precision against the correct splitting positions. Despite such high performance, from the viewpoint of translation, MT systems are not always able to translate the split sentences well.</Paragraph>
    <Paragraph position="2"> In some research on parse-time-splitting, such as Furuse (1998 and 2001), sentences are split based on parse trees under construction. Partly constructed trees are combined and translated, and a sentence is split according to the sub-trees. The split sentences can be translated because the internal parsing mechanism guarantees their fitness. However, the parse-time-splitting technique cannot be applied to MT systems that use no parse tree, such as D3 and statistical MT, or can be applied to them only as pre-process-splitting by using an external parsing system.</Paragraph>
    <Paragraph position="3"> In this paper, we propose another split-and-translate technique in which splitting and translation act in harmony. This technique does not depend on any particular MT method and can therefore be applied to D3. To demonstrate its effect on translation quality, we evaluate our proposed split-and-translate method and, for comparison, a pre-process-splitting technique. For convenience, we refer to the two split-and-translate methods in our experiments as follows.</Paragraph>
    <Paragraph position="4"> method-T: Our proposed method based on partial Translation results, described in section 2.</Paragraph>
    <Paragraph position="5"> method-N: A pre-process-splitting method based on N-grams, which splits an input sentence before translation, described in section 3.</Paragraph>
    <Paragraph position="6"> The following sections describe the two methods, the</Paragraph>
    <Paragraph position="8"> that the methods are applied to, and experiments.</Paragraph>
  </Section>
</Paper>