<?xml version="1.0" standalone="yes"?>
<Paper uid="H05-1029">
  <Title>Error Handling in the RavenClaw Dialog Management Framework</Title>
  <Section position="4" start_page="225" end_page="227" type="metho">
    <SectionTitle>
2 RavenClaw Dialog Management
</SectionTitle>
    <Paragraph position="0"> We begin with a brief overview of the RavenClaw dialog management framework, as it provides the larger context for the error handling architecture.</Paragraph>
    <Paragraph position="1"> RavenClaw is a dialog management framework for task-oriented spoken dialog systems. To date, it has been used to construct a large number of systems spanning multiple domains and interaction types (Bohus and Rudnicky, 2003): information access (RoomLine, the Let's Go Bus Information System), guidance through procedures (LARRI), command-and-control (TeamTalk), taskable agents (Vera). Together with these systems, RavenClaw provides the larger context as well as a test-bed for evaluating the proposed error handling architecture. More generally, RavenClaw provides a robust basis for research in various other aspects of dialog management, such as learning at the task and discourse levels, multi-participant dialog, timing and turn-taking, etc.</Paragraph>
    <Paragraph position="2"> A key characteristic of the RavenClaw framework is the separation it enforces between the domain-specific and domain-independent aspects of dialog control. The domain-specific dialog control logic is described by a Dialog Task Specification,  essentially a hierarchical dialog plan provided by the system author. A fixed, domain-independent Dialog Engine manages the conversation by executing the given Dialog Task Specification. In the process, the Dialog Engine also contributes a set of domain-independent conversational skills, such as error handling (discussed extensively in Section 4), timing and turn-taking, etc. The system authoring effort is therefore minimized and focused entirely on the domain-specific aspects of dialog control.</Paragraph>
    <Section position="1" start_page="226" end_page="226" type="sub_section">
      <SectionTitle>
2.1 The Dialog Task Specification
</SectionTitle>
      <Paragraph position="0"> A Dialog Task Specification consists of a tree of dialog agents, where each agent manages a sub-part of the interaction. Figure 1 illustrates a portion of the dialog task specification from RoomLine, a spoken dialog system which can assist users in making conference room reservations. The root node subsumes several children: Welcome, which produces an introductory prompt, GetQuery which obtains the time and room constraints from the user, DoQuery which performs the database query, and DiscussResults which handles the follow-up negotiation dialog. Going one level deeper in the tree, GetQuery contains GetDate which requests the date for the reservation, GetStartTime and GetEnd-Time which request the times, and so on. This type of hierarchical task representation has a number of advantages: it scales up gracefully, it can be dynamically extended at runtime, and it implicitly captures a notion of context in dialog.</Paragraph>
      <Paragraph position="1"> The agents located at the leaves of the tree are called basic dialog agents, and each of them implements an atomic dialog action (dialog move).</Paragraph>
      <Paragraph position="2"> There are four types of basic dialog agents: Inform - conveys information to the user (e.g. Welcome), Request - asks a question and expects an answer (e.g. GetDate), Expect - expects information without explicitly asking for it, and EXecute - implements a domain specific operation (e.g. DoQuery). The agents located at non-terminal positions in the tree are called dialog agencies (e.g. RoomLine, GetQuery). Their role is to plan for and control the execution of their sub-agents. For each agent in the tree, the system author may specify preconditions, completion criteria, effects and triggers; various other functional aspects of the dialog agents (e.g.</Paragraph>
      <Paragraph position="3"> state-specific language models for request-agents, help-prompts) are controlled through parameters.</Paragraph>
      <Paragraph position="4"> The information the system acquires and manipulates in conversation is captured in concepts, associated with various agents in the tree (e.g. date, start_time). Each concept maintains a history of previous values, information about current candidate hypotheses and their associated confidence scores, information about when the concept was last updated, as well as an extended set of flags which describe whether or not the concept has been conveyed to the user, whether or not the concept has been grounded, etc. This rich representation provides the necessary support for concept-level error handling.</Paragraph>
    </Section>
    <Section position="2" start_page="226" end_page="227" type="sub_section">
      <SectionTitle>
Dialog Stack
Dialog Engine
Dialog Task
Specification
Expectation Agenda
</SectionTitle>
      <Paragraph position="0"> start_time: [start_time] [time] date: [date] start_time: [start_time] [time] end_time: [end_time] [time] date: [date] start_time: [start_time] [time] end_time: [end_time] [time] location: [location] network: [with_network]-&gt;true,</Paragraph>
    </Section>
    <Section position="3" start_page="227" end_page="227" type="sub_section">
      <SectionTitle>
2.2 The Dialog Engine
</SectionTitle>
      <Paragraph position="0"> The Dialog Engine is the core domain-independent component which manages the interaction by executing a given Dialog Task Specification. The control algorithms are centered on two data-structures: a dialog stack, which captures the dialog structure at runtime, and an expectation agenda, which captures the system's expectations for the user input at each turn in the dialog. The dialog is controlled by interleaving Execution Phases with Input Phases.</Paragraph>
      <Paragraph position="1"> During an Execution Phase, dialog agents from the tree are placed on, and executed from the dialog stack. At the beginning of the dialog, the root agent is placed on the stack. Subsequently, the engine repeatedly takes the agent on the top of the stack and executes it. When dialog agencies are executed, they typically schedule one of their sub-agents for execution by placing it on the stack. The dialog stack will therefore track the nested structure of the dialog at runtime. Ultimately, the execution of the basic dialog agents on the leaves of the tree generates the system's responses and actions.</Paragraph>
      <Paragraph position="2"> During an Input Phase, the system assembles the expectation agenda, which captures what the system expects to hear from the user in a given turn. The agenda subsequently mediates the transfer of semantic information from the user's input into the various concepts in the task tree. For the interested reader, these mechanisms are described in more detail in (Bohus and Rudnicky, 2003) Additionally, the Dialog Engine automatically provides a number of conversational strategies, such as the ability to handle various requests for help, repeating the last utterance, suspending and resuming the dialog, starting over, reestablishing the context, etc. These strategies are implemented as library dialog agencies. Their corresponding sub-trees are automatically added to the Dialog Task Specification provided by the system author (e.g. the Start-Over agency in Figure 1.) The automatic availability of these strategies lessens development efforts and ensures a certain uniformity of behavior both within and across tasks.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="227" end_page="230" type="metho">
    <SectionTitle>
3 The Error Handling Architecture
</SectionTitle>
    <Paragraph position="0"> The error handling architecture in the RavenClaw dialog management framework subsumes two main components: (1) a set of error handling strategies (e.g. explicit and implicit confirmation, asking the user to repeat, etc.) and (2) an error handling process which engages these strategies.</Paragraph>
    <Paragraph position="1"> The error handling strategies are implemented as library dialog agents. The decision process which engages these strategies is part of the Dialog Engine. This design, in which both the strategies and the decision process are decoupled from the dialog task, as well as from each other, provides a number of advantages. First, it ensures that the error handling mechanisms are reusable across different dialog systems. Second, the approach guarantees a certain uniformity and consistency in error handling behaviors both within and across systems. Third, as new error handling strategies are developed, they can be easily plugged into any existing system. Last, but not least, the approach significantly lessens the system authoring effort by allowing developers to focus exclusively on describing the dialog control logic.</Paragraph>
    <Paragraph position="2"> The responsibility for handling potential understanding errors1 is delegated to the Error Handling Process which runs in the Dialog Engine (see Figure 2). At each system turn, this process collects evidence and makes a decision with respect to engaging any of the error handling strategies. When necessary, it will insert an error handling strategy on the dialog stack (e.g. the ExplicitConfirm (start_time) strategy in Figure 2), thus modifying on-the-fly the task originally specified by the system author. The strategy executes and, once completed, it is removed from the stack and the dialog resumes from where it was left off.</Paragraph>
    <Paragraph position="3"> 1 Note that the proposed framework aims to handle understanding errors. The corresponding strategies are generic and can be applied in any domain. Treatment of domain or task-specific errors (e.g. database access error, etc) still needs to be implemented as part of the dialog task specification.</Paragraph>
    <Section position="1" start_page="228" end_page="228" type="sub_section">
      <SectionTitle>
3.1 Error Handling Strategies
</SectionTitle>
      <Paragraph position="0"> The error handling strategies can be divided into two groups: strategies for handling potential misunderstandings and strategies for handling nonunderstandings. null For handling potential misunderstandings, three strategies are currently available: Explicit Confirmation, Implicit Confirmation and Rejection. For non-understandings, a larger number of error recovery strategies are currently available: AskRepeat - the system asks the user to repeat; AskRephrase - the system asks the user to rephrase; Reprompt - the system repeats the previous prompt; DetailedReprompt - the system repeats a more verbose version of the previous prompt, Notify - the system simply notifies the user that a non-understanding has occurred; Yield - the system remains silent, and thus implicitly notifies the user that a non-understanding has occurred; MoveOn - the system tries to advance the task by giving up on the current question and moving on with an alternative dialog plan (note that this strategy is only available at certain points in the dialog); YouCanSay - the system gives an example of what the user could say at this point in the dialog; FullHelp - the system provides a longer help message which includes an explanation of the current state of the system, as well as what the user could say at this point. An in-depth analysis of these strategies and their relative tradeoffs is available in (Bohus and Rudnicky, 2005a). Several sample dialogs illustrating these strategies are available on-line (RoomLine, 2003).</Paragraph>
    </Section>
    <Section position="2" start_page="228" end_page="230" type="sub_section">
      <SectionTitle>
3.2 Error Handling Process
</SectionTitle>
      <Paragraph position="0"> The error handling decision process is implemented in a distributed fashion, as a collection of local decision processes. The Dialog Engine automatically associates a local error handling process with each concept, and with each request agent in the dialog task tree, as illustrated in Figure 3. The error handling processes running on individual concepts are in charge of recovering from misunderstandings on those concepts. The error handling processes running on individual request agents are in charge or recovering from non-understandings on the corresponding requests.</Paragraph>
      <Paragraph position="1"> At every system turn, each concept- and request-agent error handling process computes and forwards its decision to a gating mechanism, which queues up the actions (if necessary) and executes them one at a time. For instance, in the example in Figure 3, the error handling decision process for the start_time concept decides to engage an explicit confirmation on that concept, while the other decision processes do not take any action. In this case the gating mechanism creates a new instance of an explicit confirmation agency, passes it the pointer to the concept to be confirmed (start_time), and places it on the dialog stack. On completion, the strategy updates the confidence score of the confirmed hypothesis in light of the user response, and the dialog resumes from where it was left off.</Paragraph>
      <Paragraph position="2"> The specific implementation of the local decision processes constitutes an active research issue. Currently, they are modeled as Markov Decision Processes (MDP). The error handling processes running on individual concepts (concept-MDPs in  underlying hidden states: correct, incorrect and empty. The belief state is constructed at each time step from the confidence score of the top-hypothesis for the concept. For instance, if the top hypothesis for the start_time concept is 10 a.m. with confidence 0.76, then the belief state for the POMDP corresponding to this concept is: {P(correct)=0.76, P(incorrect)=0.24, P(empty)=0}.</Paragraph>
      <Paragraph position="3"> The action-space for these models contains the three error recovery strategies for handling potential misunderstandings, and no-action. The third ingredient in the model is the policy. A policy defines which action the system should take in each state, and is indirectly described by specifying the utility of each strategy in each state. Currently, a number of predefined policies (e.g. alwaysexplicit-confirm, pessimistic, and optimistic) are available in the framework. Alternatively, system authors can specify and use their own policies.</Paragraph>
      <Paragraph position="4"> The error handling processes running on request agents (request-MDPs in Figure 3) are in charge of handling non-understandings on those requests. Currently, two types of models are available for this purpose. The simplest model has three states: non-understanding, understanding and inactive. A second model also includes information about the number of consecutive non-understandings that have already happened. In the future, we plan to identify more features which carry useful information about the likelihood of success of individual recovery strategies and use them to create more complex models. The action-space is defined by the set of non-understanding recovery strategies presented in the previous subsection, and noaction. Similar to the concept-MDPs, a number of default policies are available; alternatively, system authors can specify their own policy for engaging the strategies.</Paragraph>
      <Paragraph position="5"> While the MDP implementation allows us to encode various expert-designed policies, our ultimate goal is to learn such policies from collected data using reinforcement learning. Reinforcement learning has been previously used to derive dialog control policies in systems operating with small tasks (Scheffler and Young, 2002; Singh et al, 2000). The approaches proposed to date suffer however from one important shortcoming, which has so far prevented their use in large, practical spoken dialog systems. The problem is lack of scalability: the size of the state space grows very fast with the size of the dialog task, and this renders the approach unfeasible in complex domains.</Paragraph>
      <Paragraph position="6"> A second important limitation of reinforcement learning techniques proposed to date is that the learned policies cannot be reused across tasks. For each new system, a new MDP has to be constructed, new data has to be collected, and a new training phase is necessary. This requires a significant amount of expertise and effort from the system author.</Paragraph>
      <Paragraph position="7"> We believe that the error handling architecture we have described addresses these issues in several ways. The central idea behind the distributed nature of the approach is to keep the learning problem tractable by leveraging independence relationships between different parts of the dialog. First, the state and action-spaces can be maintained relatively small since we are only focusing on making error handling decisions (as opposed to other dialog control decisions). A more complex task translates into a larger number of MDP instantiations rather than a more complex model structure. Second, both the model structure and parameters (i.e. the transition probabilities) can be tied across models: for instance the MDP for grounding the start_time concept can be identical to the one for grounding the end_time concept; all models for grounding Yes/No concepts could be tied together, etc. Model tying has the potential to greatly improve scalability since data is polled together and the total number of model parameters to be learned grows sub-linearly with the size of the task. Third, since the individual MDPs are decoupled from the actual system task, the policies learned in a particular system can potentially be reused in other systems (e.g. we expect that grounding yes/no concepts functions similarly at different locations in the dialog, and across domains). Last but not least, the approach can easily accommodate dynamic task generation. In traditional reinforcement learning approaches the state and action-spaces of the underlying MDP are task-specific. The task therefore has to be fixed, known in advance: for instance the slots that the system queries the user about (in a slot-filling system) are fixed. In contrast, in the RavenClaw architecture, the dialog task tree (e.g. the dialog plan) can be dynamically expanded at runtime with new questions and concepts, and the corresponding request- and concept-MDPs are automatically created by the Dialog Engine. null</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>