<?xml version="1.0" standalone="yes"?> <Paper uid="W03-0311"> <Title>Retrieving Meaning-equivalent Sentences for Example-based Rough Translation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Speech-to-speech translation (S2ST) technologies consist of speech recognition, machine translation (MT), and speech synthesis (Waibel, 1996; Wahlster, 2000; Yamamoto, 2000). The MT component receives text produced by a speech recognizer. The nature of speech makes translation difficult, since spoken style differs from written style and utterances are sometimes ungrammatical (Lazzari, 2002). Therefore, rule-based MT translates speech less accurately than it translates written-style text.</Paragraph> <Paragraph position="1"> Example-based MT (EBMT) is one of the corpus-based machine translation methods. It retrieves examples similar to the input and adjusts their translations to obtain the output (Nagao, 1981). EBMT is a promising method for S2ST in that it translates ungrammatical sentences robustly and requires far less manual work than rule-based MT.</Paragraph> <Paragraph position="2"> However, there are two problems in applying EBMT to S2ST. One is that translation accuracy drops drastically as input sentences become longer. As sentence length increases, the number of similar sentences that can be retrieved decreases greatly. This often results in no output when translating long sentences. The other problem arises from the differences in style between input sentences and the example corpus. It is difficult to acquire a large volume of natural speech data, since collecting it requires much time and cost. Therefore, we cannot avoid using a corpus of written-style text, whose style differs from that of natural speech. 
This style difference makes retrieval of similar sentences difficult and degrades the performance of EBMT.</Paragraph> <Paragraph position="3"> This paper proposes a method of retrieving sentences whose meaning is equivalent to that of the input sentence to overcome these two problems. A meaning-equivalent sentence is a sentence that carries the main meaning of an input sentence even though it lacks some unimportant information.</Paragraph> <Paragraph position="4"> Such a sentence can be retrieved more easily than a similar sentence, and its translation is useful enough in S2ST. We call this translation strategy example-based &quot;rough translation.&quot; Retrieval of meaning-equivalent sentences is based on content words, modality, and tense. This provides robustness against long inputs and against differences in style between the input and the example corpus. This advantage distinguishes our method from other translation methods.</Paragraph> <Paragraph position="5"> We describe the difficulties in S2ST in Section 2. Then, we describe our purpose, features for retrieval, and retrieval method for meaning-equivalent sentences in Section 3. We report an experiment comparing our method with two other methods in Section 4. The experiment demonstrates the robustness of our method to input length and to style differences between inputs and the example corpus.</Paragraph> </Section> </Paper>