<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1013">
  <Title>Spoken Dialogue Management Using Probabilistic Reasoning</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The development of automatic speech recognition has made possible more natural human-computer interaction. Speech recognition and speech understanding, however, are not yet at the point where a computer can reliably extract the intended meaning from every human utterance.</Paragraph>
    <Paragraph position="1"> Human speech can be both noisy and ambiguous, and many real-world systems must also be speaker-independent. Regardless of these difficulties, any system that manages human-machine dialogues must be able to perform reliably even with noisy and stochastic speech input.</Paragraph>
    <Paragraph position="2"> Recent research in dialogue management has shown that Markov Decision Processes (MDPs) can be useful for generating effective dialogue strategies (Young, 1990; Levin et al., 1998); the system is modelled as a set of states that represent the dialogue as a whole, and a set of actions corresponding to speech productions from the system. The goal is to maximise the reward obtained for fulfilling a user's request. However, the correct way to represent the state of the dialogue is still an open problem (Singh et al., 1999). A common solution is to restrict the system to a single goal.</Paragraph>
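The MDP formulation above can be made concrete with a small sketch. This is not the model from any of the cited systems; the states, actions, transition probabilities, and rewards below are invented for illustration of how value iteration yields a dialogue strategy that maximises expected reward.

```python
import numpy as np

# Toy dialogue MDP (illustrative only): states encode how close the system
# is to booking a flight; actions are system speech productions.
S = ["start", "have_info", "booked"]          # hypothetical states
A = ["ask", "book"]                           # hypothetical actions

# T[a][s, s']: transition probabilities under each action (assumed values).
T = {
    "ask":  np.array([[0.2, 0.8, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]]),
    "book": np.array([[0.9, 0.1, 0.0],        # booking too early usually fails
                      [0.0, 0.1, 0.9],
                      [0.0, 0.0, 1.0]]),
}
# R[a][s]: immediate reward for taking action a in state s (assumed values).
R = {
    "ask":  np.array([-1.0, -1.0, 0.0]),      # each question has a small cost
    "book": np.array([-5.0, 10.0, 0.0]),      # reward for fulfilling the request
}

def value_iteration(gamma=0.95, iters=200):
    """Compute state values and a greedy policy for the toy dialogue MDP."""
    V = np.zeros(len(S))
    for _ in range(iters):
        Q = np.array([R[a] + gamma * T[a] @ V for a in A])  # Bellman backup
        V = Q.max(axis=0)
    policy = [A[i] for i in Q.argmax(axis=0)]
    return V, policy

V, policy = value_iteration()
# The optimal strategy asks for information first, then books:
print(dict(zip(S, policy)))
```

Under these assumed rewards the policy gathers information before attempting the booking, which is the kind of trade-off the reward structure is meant to encode.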
    <Paragraph position="3"> For example, in booking a flight in an automated travel agent system, the system state is described in terms of how close the agent is to being able to book the flight.</Paragraph>
    <Paragraph position="4"> Such systems suffer from a principal problem. A conventional MDP-based dialogue manager must know the current state of the system at all times, and therefore the state has to be wholly contained in the system representation. These systems perform well under certain conditions, but not all. For example, MDPs have been used successfully for such tasks as retrieving e-mail or making travel arrangements (Walker et al., 1998; Levin et al., 1998) over the phone, task domains that are generally low in both noise and ambiguity. However, the issue of reliability in the face of noise is a major concern for our application. Our dialogue manager was developed for a mobile robot application that has knowledge from several domains, and must interact with many people over time. For speaker-independent systems and systems that must act in a noisy environment, the user's actions and intentions cannot always be used to infer the dialogue state; it may not be possible to reliably and completely determine the state of the dialogue following each utterance.</Paragraph>
    <Paragraph position="5"> The poor reliability of the audio signal on a mobile robot, coupled with the expectations of natural interaction that people have with more anthropomorphic interfaces, increases the demands placed on the dialogue manager.</Paragraph>
    <Paragraph position="6"> Most existing dialogue systems do not model confidences on the recognition accuracy of human utterances, and therefore do not account for the reliability of speech recognition when applying a dialogue strategy. Some systems do use the log-likelihood values for speech utterances; however, these values are only thresholded to indicate whether the utterance needs to be confirmed (Niimi and Kobayashi, 1996; Singh et al., 1999). An important concept lying at the heart of this issue is that of observability - the ultimate goal of a dialogue system is to satisfy a user request; however, what the user really wants is at best partially observable.</Paragraph>
    <Paragraph position="7"> We handle the problem of partial observability by inverting the conventional notion of state in a dialogue. The world is viewed as partially unobservable - the underlying state is the intention of the user with respect to the dialogue task. The only observations about the user's state are the speech utterances given by the speech recognition system, from which some knowledge about the current state can be inferred. By accepting the partial observability of the world, the dialogue problem becomes one that is addressed by Partially Observable Markov Decision Processes (POMDPs) (Sondik, 1971). Finding an optimal policy for a given POMDP model corresponds to defining an optimal dialogue strategy. Optimality is attained within the context of a set of rewards that define the relative value of taking various actions. We will show that conventional MDP solutions are insufficient, and that a more robust methodology is required. Note that in the limit of perfect sensing, the POMDP policy will be equivalent to an MDP policy. What the POMDP policy offers is an ability to compensate appropriately for better or worse sensing. As the speech recognition degrades, the POMDP policy acquires reward more slowly, but makes fewer mistakes and blind guesses compared to a conventional MDP policy.</Paragraph>
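The inverted view of state can be sketched as a Bayes filter over user intentions. The intentions, recognizer outputs, and observation probabilities below are assumed for illustration (a static user goal, so no transition step); they are not the paper's actual model.

```python
import numpy as np

# Hidden state: the user's intention. Observation: the recognizer's output.
intentions = ["want_weather", "want_time"]        # hypothetical user goals
utterances = ["heard_weather", "heard_time"]      # hypothetical recognizer outputs

# O[s, z]: probability of hearing utterance z given true intention s.
# Off-diagonal mass models speech-recognition noise (assumed 20% error rate).
O = np.array([[0.8, 0.2],
              [0.2, 0.8]])

def belief_update(belief, z):
    """Bayes update of the belief over intentions after observing utterance z."""
    posterior = belief * O[:, utterances.index(z)]
    return posterior / posterior.sum()            # renormalize

b = np.array([0.5, 0.5])                  # uniform prior over intentions
b = belief_update(b, "heard_weather")     # belief shifts toward want_weather
print(b)
```

A single noisy observation only shifts the belief; it never collapses it to certainty, which is exactly the partial observability the POMDP formulation is built around.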
    <Paragraph position="8"> There are several POMDP algorithms that may be the natural choice for policy generation (Sondik, 1971; Monahan, 1982; Parr and Russell, 1995; Cassandra et al., 1997; Kaelbling et al., 1998; Thrun, 1999). However, solving real world dialogue scenarios is computationally intractable for full-blown POMDP solvers, as the complexity is doubly exponential in the number of states. We therefore will use an algorithm for finding approximate solutions to POMDP-style problems and apply it to dialogue management.</Paragraph>
    <Paragraph position="9"> This algorithm, the Augmented MDP, was developed for mobile robot navigation (Roy and Thrun, 1999), and operates by augmenting the state description with a compression of the current belief state. By representing the belief state succinctly with its entropy, belief-space planning can be approximated without the expected complexity.</Paragraph>
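The belief compression at the core of the Augmented MDP can be sketched as follows. The bucketing scheme and its granularity are assumptions for illustration; the essential idea from the text is that the full belief vector is summarized by its most likely state plus its entropy.

```python
import numpy as np

def entropy(belief):
    """Shannon entropy (bits) of a belief distribution; 0*log(0) taken as 0."""
    b = np.asarray(belief, dtype=float)
    b = b[b > 0]
    return float(-(b * np.log2(b)).sum())

def augmented_state(belief, n_buckets=4):
    """Compress a belief to (most likely state, discretized entropy).

    The entropy bucket count is a hypothetical discretization, not a value
    from the paper; planning then runs over this small augmented state space
    instead of the continuous belief simplex.
    """
    b = np.asarray(belief, dtype=float)
    h_max = np.log2(len(b))            # entropy of the uniform belief
    bucket = min(int(n_buckets * entropy(b) / h_max), n_buckets - 1)
    return int(b.argmax()), bucket

# A peaked belief and a uniform belief land in different augmented states,
# so a policy over augmented states can act more cautiously (e.g. confirm)
# when uncertainty about the user's intention is high.
print(augmented_state([0.97, 0.01, 0.01, 0.01]))   # low-entropy bucket
print(augmented_state([0.25, 0.25, 0.25, 0.25]))   # highest-entropy bucket
```

The design point is that entropy is a scalar regardless of how many underlying states there are, so the augmented state space grows linearly, not exponentially, with the problem size.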
    <Paragraph position="10"> In the first section of this paper, we develop the model of dialogue interaction. This model allows for a more natural description of dialogue problems, and in particular allows for intuitive handling of noisy and ambiguous dialogues. Few existing dialogue systems can handle ambiguous input, typically relying on natural language processing to resolve semantic ambiguities (Aust and Ney, 1998). Secondly, we present a description of an example problem domain, and finally we present experimental results comparing the performance of the POMDP (approximated by the Augmented MDP) to conventional MDP dialogue strategies.</Paragraph>
  </Section>
</Paper>