<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0706"> <Title>Architectures for speech-to-speech translation using finite-state models</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 3 Experiments and results </SectionTitle> <Paragraph position="0"> Three sets of speech-to-speech translation prototypes have been implemented for Spanish to English and for Italian to English. In all of them, the application was the translation of queries, requests and complaints made by telephone to the front desk of a hotel. Three tasks of different degrees of difficulty have been considered.</Paragraph> <Paragraph position="1"> In the first one (EUTRANS-0), Spanish-to-English translation systems were learned from a large, well-controlled training corpus: about 170k different pairs (≈ 2M running words), with a lexicon of about 700 words. In the second one (EUTRANS-I), also from Spanish to English, the systems were learned from a random subset of 10k pairs (≈ 100k running words) of the previous corpus; this was established as a more realistic training corpus for the kind of application considered. In the third and most difficult one, from Italian to English (EUTRANS-II), the systems were learned from a small training corpus obtained from the transcription of a spontaneous speech corpus: about 3k pairs (≈ 60k running words), with a lexicon of about 2,500 words.</Paragraph> <Paragraph position="2"> For the serial architecture, speech decoding was performed in a conventional way, using the same acoustic models as in the integrated architecture and trigram models of the source language. 
For the integrated architecture, the speech decoding of an utterance is a by-product of the translation process (the sequence of source words associated with the optimal sequence of transitions that produces the sequence of target words).</Paragraph> <Paragraph position="3"> The acoustic models of phone units were trained with the HTK Toolkit (Woodland, 1997). For the EUTRANS-0 and EUTRANS-I prototypes, a training speech corpus of 57,000 Spanish running words was used, while the EUTRANS-II Italian acoustic models were trained on another corpus of 52,000 running words. Performance was assessed on the basis of 336 Spanish sentences in the case of EUTRANS-0 and EUTRANS-I, and 278 Italian sentences in the case of EUTRANS-II. In all cases, the test sentences (as well as the corresponding speakers) were different from those appearing in the training data.</Paragraph> <Paragraph position="4"> For the easiest task, EUTRANS-0 (a well-controlled task and a large training set), the best result was achieved with an integrated architecture and an SFST obtained with the OMEGA learning technique. A translation word error rate of 7.6% was achieved, while the corresponding source-language speech decoding word error rate was 8.4%. 
Although these figures may seem strange (and they certainly would be in the case of a serial architecture), they are in fact consistent with the fact that, in this task (corpus), the target language exhibits a significantly lower perplexity than the source language.</Paragraph> <Paragraph position="5"> For the second, harder task, EUTRANS-I (a well-controlled task but a smaller training set), the best result was achieved with an integrated architecture and an SFST obtained with the MGTI learning technique (a speech decoding word error rate of 10.5% and a translation word error rate of 12.6%).</Paragraph> <Paragraph position="6"> For the most difficult task, EUTRANS-II (a spontaneous-speech task and a small training set), the best result was achieved with a serial architecture and an SFST obtained with the MGTI learning technique (a speech decoding word error rate of 22.1% and a translation word error rate of 37.9%).</Paragraph> </Section></Paper>