<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1040">
  <Title>Automatic Detection of Poor Speech Recognition at the Dialogue Level</Title>
  <Section position="2" start_page="0" end_page="309" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Builders of spoken dialogue systems face a number of fundamental design choices that strongly influence both performance and user satisfaction. Examples include choices between user, system, or mixed initiative, and between explicit and implicit confirmation of user commands. An ideal system wouldn't make such choices a priori, but rather would adapt to the circumstances at hand. For instance, a system detecting that a user is repeatedly uncertain about what to say might move from user to system initiative, and a system detecting that speech recognition performance is poor might switch to a dialogUe strategy with more explicit prompting, an explicit confirmation mode, or keyboard input mode. Any of these adaptations might have been appropriate in dialogue D1 from the Annie system (Kamm et al., 1998), shown in Figure 1.</Paragraph>
    <Paragraph position="1"> In order to improve performance through such adaptation, a system must first be able to identify, in real time, salient properties of an ongoing dialogue that call for some useful change in system strategy.</Paragraph>
    <Paragraph position="2"> In other words, adaptive systems should try to automatically identify actionable properties of ongoing dialogues.</Paragraph>
    <Paragraph position="3"> Previous work has shown that speech recognition performance is an important predictor of user satisfaction, and that changes in dialogue behavior impact speech recognition performance (Walker et al., 1998b; Litman et al., 1998; Kamm et al., 1998).</Paragraph>
    <Paragraph position="4"> Therefore, in this work, we focus on the task of automatically detecting poor speech recognition performance in several spoken dialogue systems developed at AT&amp;T Labs. Rather than hand-crafting rules that classify speech recognition performance in an ongoing dialogue, we take a machine learning approach. We begin with a collection of system logs from actual dialogues that were labeled by humans as having had &amp;quot;good&amp;quot; or &amp;quot;bad&amp;quot; speech recognition (the training set). We then apply standard machine learning algorithms to this training set in the hope of discovering, in a principled manner, classifiers that can automatically detect poor speech recognition during novel dialogues.</Paragraph>
    <Paragraph position="5"> In order to train such classifiers, we must provide them with a number of &amp;quot;features&amp;quot; of dialogues derived from the system logs that might allow the system to automatically identify poor recognition performance. In addition to identifying features that provide the best quantitative solutions, we are also interested in comparing the performance of classifiers derived solely from acoustic features or from &amp;quot;high-level&amp;quot; dialogue features, and from combinations of these and other feature types. Note that we are free to invent as many features as we like, as long as they can be computed in real time from the raw system logs.</Paragraph>
    <Paragraph position="6"> Since the dialogue systems we examine use automatic speech recognition (ASR), one obvious feature available in the system log is a per-utterance score from the speech recognizer representing its &amp;quot;confidence&amp;quot; in its interpretation of the user's utterance (Zeljkovic, 1996). For dialogue D1, the recognizer's output and the associated confidence scores</Paragraph>
  </Section>
class="xml-element"></Paper>