<?xml version="1.0" standalone="yes"?>
<Paper uid="W00-0203">
  <Title>Evaluation of a Practical Interlingua for Task-Oriented Dialogue</Title>
  <Section position="3" start_page="2698" end_page="2698" type="intro">
    <SectionTitle>
[Figure 6 row labels: alternating Transcription and Recognition rows]
</SectionTitle>
    <Paragraph position="0"> score of perfect is assigned if, in addition to the previous criteria, the translation is fluent in the target language. A score of bad is assigned if the target language sentence is incomprehensible or some element of meaning has been added, deleted, or changed. The evaluation procedure is described in detail in \[GLL+96\]. In Figure 6, acceptable is the sum of perfect and ok scores, s Figure 6 shows the results of the intra-site and inter-site evaluations. The first row grades the speech recognition output against a human-produced transcript of what was said. This gives us a ceiling for how well we could do if translation were perfect, given speech recognition errors. Rows 2 through 7 show the results of the intra-site evaluation. All analyzers and generators were written at CMU, and the results were graded by CMU researchers. (The German results are a lower than the English and Japanese results because a shorter time was spent on grammar development.) Rows 8 and 9 report on CMU's intra~site evaluation of English-German transla~ Sin another paper (\[LBL+00\]), we describe a task-based evaluation which focuses on success of communicative goals and how long it takes to achieve them. tion (the same system as in Rows 6 and 7), but the results were graded by researchers at IRST.</Paragraph>
    <Paragraph position="1"> Comparing Rows 6 and 7 with Rows 8 and 9, we can check that CMU and IRST graders were using roughly the same grading criteria: a difference of up to ten percent among graders is normal in our experience. Rows 10 and 11 show the results of the inter-site English-Italian evaluation. The CMU English analyzer produced IF representations which were sent to IRST and were fed into IRST's Italian generator. The results were graded by IRST researchers.</Paragraph>
    <Paragraph position="2"> Conclusions drawn from the inter-site evaluation: Since the inter-site evaluation results are comparable to the intra-site results, we conclude that researchers at IRST and CMU are using IF at least as consistently as researchers within CMU.</Paragraph>
    <Section position="1" start_page="2698" end_page="2698" type="sub_section">
      <SectionTitle>
Future Plans
</SectionTitle>
      <Paragraph position="0"> In the next phase of C-STAR, we will cover descriptive sentences (e.g., The castle was built in the thirteenth century and someone was imprisoned in the tower) as well as task-oriented sentences. Descriptive sentences will be represented  in a more traditional frame-based interlingua focusing on lexical meaning and grammatical features of the sentences. We are working on disambiguating literal from task-oriented meanings in context. For example That's great could be an acceptance (like I'll take it) (task oriented) or could just express appreciation. Sentences may also contain a combination of task oriented (e.g., Can you tell me) and descriptive (how long the castle has been standing) components.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>