<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2219">
  <Title>Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email</Title>
  <Section position="7" start_page="1349" end_page="1350" type="concl">
    <SectionTitle>
5 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper illustrates a novel technique by which an agent can learn to choose an optimal dialogue strategy. We illustrate our technique with ELVIS, an agent that supports access to email by phone, with strategies for initiative, and for reading and summarizing messages. We show that ELVIS can learn that the System-Initiative strategy has higher utility than the Mixed-Initiative strategy, that Read-First is the best read strategy, and that Summarize-System is the best summary strategy.</Paragraph>
    <Paragraph position="1"> Here, our method was illustrated by evaluating strategies for managing initiative and for message presentation. However there are numerous dialogue strategies that an agent might use, e.g. to gather information, handle errors, or manage the dialogue interaction (Chu-Carroll and Carberry, 1995; Danieli and Gerbino, 1995; Hovy, 1993; McKeown, 1985; Moore and Paris, 1989). Previous work in natural language generation has proposed heuristics to determine an agent's choice of dialogue strategy, based on factors such as discourse focus, medium, style, and the content of previous explanations (McKeown, 1985; Moore and Paris, 1989; Maybury, 1991; Hovy, 1993). It should be possible to test experimentally whether an agent can automatically learn these heuristics since the methodology we propose is general, and could be applied to any dialogue strategy choice that an agent might make.</Paragraph>
    <Paragraph position="2"> Previous work has also proposed that an agent's choice of dialogue strategy can be treated as a stochastic optimization problem (Walker, 1993; Biermann and Long, 1996; Levin and Pieraccini, 1997). However, to our knowledge, these methods have not previously been applied to interactions with real users. The lack of an appropriate performance function has been a critical methodological limitation.</Paragraph>
    <Paragraph position="3"> We use the PARADISE framework (Walker et al., 1997) to derive an empirically motivated performance function, that combines both subjective user preferences and objective system performance measures into a single function. It would have been impossible to predict a prior{ which dialogue factors influence the usability of a dialogue agent, and to what degree. Our performance equation shows that both dialogue quality and efficiency measures contribute to agent performance, but that dialogue quality measures have a greater influence. Furthermore, in contrast to assuming an a priori model, we use the dialogues from real user-system interactions to provide realistic estimates of M/~, the state transition model used by the learning algorithm. It is impossible to predict a priori the transition frequencies, given the imperfect nature of spoken language understanding, and the unpredictability of user be- null havior.</Paragraph>
    <Paragraph position="4"> The use of this method introduces several open issues. First, the results of the learning algorithm are dependent on the representation of the state space. In many reinforcement learning problems (e.g. backgammon), the state space is pre-defined.</Paragraph>
    <Paragraph position="5"> In spoken dialogue systems, the system designers construct the state space and decide what state variables need to be monitored. Our initial results suggest that the state representation that the agent uses to interact with the user may not be the optimal state representation for learning. See (Fromer, 1998).</Paragraph>
    <Paragraph position="6"> Second, in advance of actually running learning experiments, it is not clear how much experience an agent will need to determine which strategy is better. Figure 1 shows that it took no more than 50 dialogue samples for the algorithm to show the differences in convergence trends when learning about initiative strategies. However, it appears that more data is needed to learn to distinguish between the summarization strategies. Third, our experimental data is based on short-term interactions with novice users, but we might expect that users of an email agent would engage in many interactions with the same agent, and that preferences for agent interaction strategies could change over time with user expertise. This means that the performance function might change over time. Finally, the learning algorithm that we report here is an off-line algorithm, i.e. the agent collects a set of dialogues and then decides on an optimal strategy as a result. In contrast, it should be possible for the agent to learn on-line, during the course of a dialogue, if the performance function could be automatically calculated (or approximated). We are exploring these issues in on-going work.</Paragraph>
  </Section>
class="xml-element"></Paper>