<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1042">
  <Title>A SPEECH TO SPEECH TRANSLATION SYSTEM BUILT FROM STANDARD COMPONENTS</Title>
  <Section position="7" start_page="221" end_page="221" type="evalu">
    <SectionTitle>
5. RESULTS OF SYSTEM EVALUATION
</SectionTitle>
    <Paragraph position="0"> In this final section we present evaluation results for the current version of the system running on data previously unseen by the developers. There is so far little consensus on how to evaluate spoken language translation systems; for instance, no evaluation figures on unseen material are cited for the systems described in [17] and [14]. We present the results below partly in an attempt to stimulate discussion on this topic.</Paragraph>
    <Paragraph position="1"> The sentences of lengths 1 to 12 words from the Fall 1992 test set (633 of 1000 sentences) were processed through the system from speech signal to target language text output, and the translations produced were evaluated by a panel fluent in both languages. Points were awarded for meaning preservation, grammaticality of the output, naturalness of the output, and preservation of the style of the original, and a translation had to be classified as acceptable on all four counts to be regarded as acceptable in general. Judgements were also elicited for intermediate results, in particular whether a speech hypothesis could be judged as a valid variant of the reference sentence in the context of the translation task, and whether the semantic analysis sent to the transfer stage was correct. The criteria used to determine whether a speech hypothesis was a valid variant of the reference were strict, typical differences being substitution of "all the" for plural "the", "what's" for "what is", or "I want" for "I'd like".</Paragraph>
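The all-four-criteria rule described above can be sketched as follows. This is a minimal illustration and not the authors' scoring procedure: the criterion names, the dictionary layout, and the example judgement are all assumptions introduced here.

```python
# Hypothetical criterion names, paraphrasing the four counts listed in the text.
CRITERIA = ("meaning", "grammaticality", "naturalness", "style")


def is_acceptable(judgement):
    """Return True only if every one of the four criteria was judged acceptable."""
    return all(judgement[name] for name in CRITERIA)


# Invented example: a faithful, natural translation that is ungrammatical.
example = {
    "meaning": True,
    "grammaticality": False,
    "naturalness": True,
    "style": True,
}
print(is_acceptable(example))  # False
```

Under this rule a single failed criterion is enough to reject a translation, which is one reason the overall acceptance figure can sit well below the rate for any individual criterion.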
    <Paragraph position="2"> The results were as follows. For 1-best recognition, 62.4% of the hypotheses were equal to or valid variants of the reference, and 55.3% were valid and also within grammatical coverage. For 5-best recognition, the corresponding figures were 78.2% and 69.0%. Selecting the acoustically highest-ranked hypothesis that was inside grammatical coverage yielded an acceptable choice in 61.1% of the examples; a scoring scheme that chose the best hypothesis using a weighted combination of the acoustic and linguistic scores did slightly better, increasing the proportion to 63.0%. Of the examples, 54% received a most preferred semantic analysis that was judged correct, 45.3% received a translation, and 41.8% received an acceptable translation. The corresponding error rates for each component are shown in Table 2.</Paragraph>
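As a rough check on how the figures above relate, the sketch below treats the 5-best pipeline as a funnel and computes, for each stage, the share of examples that reached the stage but did not pass it. This is one assumed reading of a per-component error rate and is not necessarily how Table 2 was computed. The stage labels are our own, and the calculation assumes that the successes at each stage are a subset of the successes at the previous stage.

```python
# Cumulative success rates (percent of all test examples) quoted in the text
# for the 5-best configuration with weighted hypothesis selection.
# The stage labels are illustrative, not the paper's terminology.
STAGES = [
    ("valid in-coverage hypothesis in 5-best", 69.0),
    ("acceptable hypothesis selected", 63.0),
    ("correct semantic analysis", 54.0),
    ("translation produced", 45.3),
    ("acceptable translation", 41.8),
]


def conditional_error_rates(stages, start=100.0):
    """Percent of examples lost at each stage, relative to those reaching it.

    Assumes the successes of each stage are a subset of the previous stage's.
    """
    rates = []
    reached = start
    for label, passed in stages:
        rates.append((label, 100.0 * (1.0 - passed / reached)))
        reached = passed
    return rates


for label, rate in conditional_error_rates(STAGES):
    print(f"{label}: {rate:.1f}% lost")
```

Dividing by the number of examples that reached a stage, and not by the full test set, keeps a late component from being penalised for failures that occurred upstream.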
  </Section>
</Paper>