File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/02/w02-0716_intro.xml
Size: 2,458 bytes
Last Modified: 2025-10-06 14:01:36
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0716"> <Title>Towards a Speech-to-Speech Machine Translation Quality Metric</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> When we consider speech-to-speech (S2S) translation systems, several abstract models are possible.</Paragraph> <Paragraph position="1"> In Model 1 (Figure 1) we treat the entire software system as a &quot;black box,&quot; just recognizing that the input is a source language utterance (SLU) and the output is a target language utterance (TLU). In Model 2 (Figure 2) we break the previous black box into several traditional components, reflecting typical language processing modules. The source language utterance is transformed to a source language text (SLT) by an automatic speech recognition (ASR) system. The SLT is then translated by a machine translation (MT) system to a target language text (TLT), which is in turn converted to the target language utterance by a text-to-speech system. Model 3 (Figure 3) illustrates how the source language text and MT component may be replaced by a natural language generation (NLG) system, given a rich enough semantic representation. Other models are certainly possible, depending upon how the various processing tasks are subdivided.</Paragraph> <Paragraph position="2"> Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems, Philadelphia, July 2002, pp. 117-120. Association for Computational Linguistics.</Paragraph> <Paragraph position="3"> Regardless of how many levels and components there are in a given implementation, different metrics could be applied around any input-output pair of interest to help drive quality improvements. In Model 2 above, for example, we could have a metric around each processing module; that is, one metric for the mapping from SLU to SLT, another for SLT to TLT, and a third for TLT to TLU.
Each metric would be used to study the effectiveness of the system module of interest.</Paragraph> <Paragraph position="4"> However, since the only guaranteed input-output pair regardless of the particular combination of technologies used would be SLU to TLU, and since all systems can be abstracted into Model 1 above, let us focus on an abstract metric which we will call the utterance-to-utterance (U2U) Metric. What do we require of our abstract U2U metric?</Paragraph> </Section> </Paper>