<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2219">
  <Title>Learning Optimal Dialogue Strategies: A Case Study of a Spoken Dialogue Agent for Email</Title>
  <Section position="3" start_page="1345" end_page="1345" type="metho">
    <SectionTitle>
2 Method for Learning to Optimize Dialogue Strategy Selection
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1345" end_page="1345" type="sub_section">
      <Paragraph position="0"> Our method for learning to optimize dialogue strategy selection combines the application of PARADISE to empirical data (Walker et al., 1997), with algorithms for learning optimal strategy choices.</Paragraph>
      <Paragraph position="1"> PARADISE provides an empirical method for deriving a performance function that calculates over-all agent performance as a linear combination of a number of simpler metrics. Our learning method consists of the following sequence of steps:  * Implement a spoken dialogue agent for a particular domain.</Paragraph>
      <Paragraph position="2"> * Implement multiple dialogue strategies and design the agent so that strategies are selected randomly or under experimenter control.</Paragraph>
      <Paragraph position="3"> * Define a set of dialogue tasks for the domain, and their information exchange requirements. Represent these tasks as attribute-value matrices to facilitate calculating task success.</Paragraph>
      <Paragraph position="4"> * Collect experimental dialogues in which a number of human users converse with the agent to do the tasks.</Paragraph>
      <Paragraph position="5"> * For each experimental dialogue: - Log the history of the state-strategy choices for each dialogue. Use this to estimate a state transition model.</Paragraph>
      <Paragraph position="6"> - Log a range of quantitative and qualitative cost measures for each dialogue, either automatically or with hand-tagging.</Paragraph>
      <Paragraph position="7"> - Collect user satisfaction reports for each dialogue. null * Use multivariate linear regression with user satis null faction as the dependent variable and task success and the cost measures as independent variables to determine a performance equation.</Paragraph>
      <Paragraph position="8"> * Apply the derived performance equation to each dialogue to determine the utility of the final state of the dialogue.</Paragraph>
      <Paragraph position="9"> * Use reinforcement learning to propagate the utility of the final state back to states Si where strategy choices were made to determine which action maximizes U(Si).</Paragraph>
      <Paragraph position="10"> These steps consist of those for deriving a performance function (Section 3), and for using the derived performance function as feedback to the agent with a learning algorithm (Section 4).</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1345" end_page="1348" type="metho">
    <SectionTitle>
3 Using PARADISE to Derive a Performance Function
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1345" end_page="1346" type="sub_section">
      <SectionTitle>
3.1 ELVIS Spoken Dialogue System
</SectionTitle>
      <Paragraph position="0"> ELVIS is implemented using a general-purpose platform for spoken dialogue agents (Kamm et al., 1997). The platform consists of a speech recognizer that supports barge-in so that the user can interrupt the agent when it is speaking. It also provides an audio server for both voice recordings and text-to-speech ('I~I'S), an interface between the computer running ELVIS and the telephone network, a module for application specific functions, and modules for specifying the application grammars and the dialogue manager. Our experiments are based on modifications to the dialogue manager as described below. null The dialogue manager is based on a state machine. Each state specifies transitions to other states and the conditions that license these transitions, as well as a grammar for what the user can say.</Paragraph>
      <Paragraph position="1"> State definitions also include the specification of agent prompts in terms of templates, with variables that are instantiated each time the state is entered.</Paragraph>
      <Paragraph position="2"> Prompts include: (1) an initial prompt, which the agent says upon entering the state (this may include a response to the user's current request); (2) a help prompt which the agent says if the user says help; (3) multiple rejection prompts which the agent says if the speech recognizer confidence is too low to continue without more user input; (4) multiple time-out prompts which the agent produces if the user doesn't say anything.</Paragraph>
      <Paragraph position="3"> Each of these specifications is affected by the agent' s dialogue strategy. An agent' s dialogue strategy is implemented as a combination of the prompts  that are played to the user and the state transitions that the agent makes in response to the user' s utterance. In particular, alternative prompts can be specified for all types of prompts (initial, help, rejection and timeout) to provide alternate dialogue strategies in each state. We implemented alternate strategies for reading messages, for initiative and for summarization. null The alternate strategies for reading messages are Read-First, Read-Summarize-Only and Read-Choice-Prompt as in dialogues D1, D2, and D3.</Paragraph>
      <Paragraph position="4"> The alternate strategies for initiative are a system-initiative (SI) strategy and a mixed-initiative (MI) strategy. In the SI strategy, the system maintains control of the dialogue and the user must specify one query parameter at a time, as in D4.</Paragraph>
      <Paragraph position="5">  (D4) A: Hi, Elvis here. You have 5 new and 0 unread messages in your inbox. Say Read, Summarize, or say Help for more options.</Paragraph>
      <Paragraph position="6"> U: Read.</Paragraph>
      <Paragraph position="7"> A: Select by Sender, Subject or in order of arrival? U: Sender, A: Which Sender? U: Kim.</Paragraph>
      <Paragraph position="8"> A: In the messages from Kim, there's 1 message about &amp;quot;Interviewing Antonio&amp;quot;. ......</Paragraph>
      <Paragraph position="9"> In the MI strategy, the system never tells the user what to do, and the user can take control of the dialogue and specify exactly what they want to do in a single utterance, as in D5.</Paragraph>
      <Paragraph position="10"> (D5) A: Hi, Elvis here. I' ve got your mail.</Paragraph>
      <Paragraph position="11"> U: Read me my messages from Kim.</Paragraph>
      <Paragraph position="12">  We also implemented three alternate summarization strategies as responses to a user request to Summarize my messages. The basis for these alternates is ELVIS's decision about which message attributes to mention in the summary. The Summarize-Choice strategy in D6 asks the user to specify which of the relevant attributes to summarize by.</Paragraph>
      <Paragraph position="13">  (D6) A: Summarize by subject, by sender, or both? U: Subject.</Paragraph>
      <Paragraph position="14"> A: In your top level inbox, there's 1 message about &amp;quot;Lunch&amp;quot; 1 message about &amp;quot;Interviewing Antonio&amp;quot; .1 message about &amp;quot;Call Me Tomorrow,&amp;quot; 1 message about &amp;quot;Evaluation Group Meeting,&amp;quot; and 1 message about &amp;quot;Agent Personality:' The Summarize-Both strategy in D7 uses all attributes in the summary: (DT) A: In your top level inbox, from Kim, there's 1  message about &amp;quot;Lunch:' From Michael, there's 1 message about &amp;quot;Evaluation group meeting.&amp;quot; From Noah, there's 1 message about &amp;quot;Call Me Tomorrow&amp;quot; and 1 message about &amp;quot;Interviewing Antonio?' And from Owen, there's 1 message about &amp;quot;Agent Personality?' In the Summarize-System strategy in D8, ELVIS summarizes by subject or by sender based on the current context. For instance, if the user is in the top level inbox, ELVIS will summarize by sender, but if the user is situated in a folder containing messages from Noah, ELVIS will summarize by subject, as a summary by sender would provide no new information. null (D8) A: In your top level inbox, there's 1 message from Kim, 2 messages from Noah, 1 message from Michael, and 1 message from Owen.</Paragraph>
      <Paragraph position="15"> Transitions between states are driven by the user' s conversational behavior, such as whether s/he says anything and what s/he says, the semantic interpretation of the user' s utterances, and the settings of the agent' s dialogue strategy parameters.</Paragraph>
    </Section>
    <Section position="2" start_page="1346" end_page="1347" type="sub_section">
      <SectionTitle>
3.2 Experimental Design
</SectionTitle>
      <Paragraph position="0"> Experimental dialogues were collected via two experiments in which users (AT&amp;T summer interns and MIT graduate students) interacted with ELVIS to complete three representative application tasks that required them to access email messages in three different email inboxes. In the second experiment, users participated in a tutorial dialogue before doing the three tasks. The first experiment varied initiative strategies and the second experiment varied the presentation strategies for reading messages and summarizing folders. In order to have adequate data for learning, the agent must explore the space of strategy combinations and collect enough samples of each combination. In the second experiment, we parameterized the agent so that each user interacted with three different versions of ELVIS, one for each task. These experiments resulted in a corpus of 108 dialogues testing the initiative strategies, and a corpus of 124 dialogues testing the presentation strategies. null Each of the three tasks were performed in sequence, and each task consisted of two scenarios.</Paragraph>
      <Paragraph position="1"> Following PARADISE, the agent and the user had to exchange information about criteria for selecting messages and information within the message body in each scenario. Scenario 1.1 is typical.</Paragraph>
      <Paragraph position="2"> * 1.1: You are working at home in the morning and plan to go directly to a meeting when you go into  work. Kim said she would send you a message telling you where and when the meeting is. Find out the Meeting Time and the Meeting Place.</Paragraph>
      <Paragraph position="3"> Scenario 1.1 is represented in terms of the attribute value matrix (AVM) in Table 1. Successful completion of a scenario requires that all attribute-values must be exchanged (Walker et al., 1997). The AVM representation for all six scenarios is similar to Table 1, and is independent of ELVIS's dialogue strategy.</Paragraph>
    </Section>
    <Section position="3" start_page="1347" end_page="1347" type="sub_section">
      <SectionTitle>
3.3 Data Collection
</SectionTitle>
      <Paragraph position="0"> Three different methods are used to collect the measures for applying the PARADISE framework and the data for learning: (1) All of the dialogues are recorded; (2) The dialogue manager logs the agent's dialogue behavior and a number of other measures discussed below; (3) Users fill out web page forms after each task (task success and user satisfaction measures). Measures are in boldface below.</Paragraph>
      <Paragraph position="1"> The dialogue recordings are used to transcribe the user's utterances to derive performance measures for speech recognition, to check the timing of the interaction, to check whether users barged in on agent utterances (Barge In), and to calculate the elapsed time of the interaction (ET).</Paragraph>
      <Paragraph position="2"> For each state, the system logs which dialogue strategy the agent selects. In addition, the number of timeout prompts (Timeout Prompts), Recognizer Rejections, and the times the user said Help (Help Requests) are logged. The number of System Turns and the number of User Turns are calculated on the basis of this data. In addition, the recognition result for the user's utterance is extracted from the recognizer and logged. The transcriptions are used in combination with the logged recognition result to calculate a concept accuracy measure for each utteranceJ Mean concept accu- null contains two concepts, the read function, and the sender:kim selection criterion. If the system understood only that the user said Read, concept accuracy would be .5.</Paragraph>
      <Paragraph position="3"> used as a Mean Recognition Score MRS for the dialogue. null The web page forms are the basis for calculating Task Success and User Satisfaction measures. Users reported their perceptions as to whether they had completed the task (Comp), 2 and filled in an AVM with the information that they had acquired from the agent, e.g. the values for Email.attl and Email.att2 in Table 1. The AVM matrix supports calculating Task Success objectively by using the Kappa statistic to compare the information in the AVM that the users filled in with an AVM key such as that in Table 1 (Walker et al., 1997).</Paragraph>
      <Paragraph position="4"> In order to calculate User Satisfaction, users were asked to evaluate the agent's performance with a user satisfaction survey. The data from the survey resulted in user satisfaction values that range from 0 to 33. See (Walker et al., 1998) for more details.</Paragraph>
    </Section>
    <Section position="4" start_page="1347" end_page="1347" type="sub_section">
      <SectionTitle>
3.4 Deriving a Performance Function
</SectionTitle>
      <Paragraph position="0"> Overall, the results showed that users could successfully complete the tasks with all versions of ELVIS.</Paragraph>
      <Paragraph position="1"> Most users completed each task in about 5 minutes and average ,~ over all subjects and tasks was .82.</Paragraph>
      <Paragraph position="2"> However, there were differences between strategies; as an example see Table 2.</Paragraph>
      <Paragraph position="3"> [Table 2: Performance Measure Means for Initiative Strategies]</Paragraph>
      <Paragraph position="0"> PARADISE provides a way to calculate dialogue agent performance as a linear combination of a number of simpler metrics that can be directly measured such as those in Table 2. Performance for any (sub)dialogue D is defined by the following equation: null</Paragraph>
      <Paragraph position="2"> where o~ is a weight on n, ci are the cost functions, which are weighted by wl, and .Af is a Z score normalization function (Walker et al., 1997; Cohen, 1995). The Z score normalization function ensures that, when the weights c~ and wi are solved for, that the magnitude of the weights reflect the magnitude of the contribution of that factor to performance.</Paragraph>
      <Paragraph position="3"> The performance function is derived through multivariate linear regression with User Satisfaction as the dependent variable and all the other measures as independent variables (Walker et al., 1997). See Table 2. In the ELVIS data, an initial regression over the measures in Table 2 suggests that Comp, MRS and ET are the only significant contributors to User Satisfaction. A second regression including only these factors results in the following equation:</Paragraph>
      <Paragraph position="5"> with Comp (t=2.58, p =.01), MRS (t =5.75, p =.0001) and ET (t=-l.8, p=.07) significant predictors, accounting for 38% of the variance in R-Squared (F (3,104)=21.2, p &lt;.0001). The magnitude of the coefficients in this equation demonstrates the performance of the speech recognizer (MRS) is the most important predictor, followed by users' perception of Task Success (Comp) and efficiency (ET). In the next section, we show how to use this derived performance equation to compute the utility of the final state of the dialogue.</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="1348" end_page="1348" type="metho">
    <SectionTitle>
4 Applying Q-learning to ELVIS Experimental Data
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="1348" end_page="1348" type="sub_section">
      <Paragraph position="0"> The basic idea is to apply the performance function to the measures logged for each dialogue Di, thereby replacing a range of measures with a single performance value 19i. Given the performance values Pi, any of a number of automatic learning algorithms can be used to determine which sequence of action choices (dialogue strategies) maximize utility, by using/~ as the utility for the final state of the dialogue Di. Possible algorithms include Genetic Algorithms, Q-learning, TD-Leaming, and Adaptive Dynamic Programming (Russell and Norvig, 1995). Here we use Q-learning to illustrate the method (Watkins, 1989). See (Fromer, 1998) for experiments using alternative algorithms.</Paragraph>
      <Paragraph position="1"> The utility of doing action a in state Si, U(a, Si) (its Q-value), can be calculated terms of the utility of a successor state S i, by obeying .the following recursive equation:</Paragraph>
      <Paragraph position="3"> where R(Si) is a reward associated with being in state Si, a is a strategy from a finite set of strategies A that are admissable in state Si, and M~j is the probability of reaching state Sj if strategy a is selected in state Si.</Paragraph>
      <Paragraph position="4"> In the experiments reported here, the reward associated with each state, R(SI), is zero. 3 In addition, since reliable a priori prediction of a user action in a particular state is not possible (for example the user may say Help or the speech recognizer may fail to understand the user), the state transition model M/~ is estimated from the logged state-strategy history for the dialogue.</Paragraph>
      <Paragraph position="5"> The utility values can be estimated to within a desired threshold using Value Iteration, which updates the estimate of U(a, Si), based on updated utility estimates for neighboring states, so that the equation above becomes: Un+l(a, Sd = R(Sd + ~ M~ m/xUn(a',Sj)  where Un(a, Si) is the utility estimate for doing a in state Si after n iterations. Value Iteration stops when the difference between Un(a, Si) and Un+l (a, Si) is below a threshold, and utility values have been associated with states where strategy selections were made. After experimenting with various threshholds, we used a threshold of 5% of the performance range of the dialogues.</Paragraph>
      <Paragraph position="6"> The result of applying Q-learning to ELVIS data for the initiative strategies is illustrated in Figure 1. The figure plots utility estimates for SI and MI over time. It is clear that the SI strategy is better because it has a higher utility: at the end of 108 training sessions (dialogues), the utility of SI is estimated at .249 and the utility of MI is estimated at -0.174.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="1348" end_page="1349" type="metho">
    <SectionTitle>
Table 3: Strategy Utilities (Type, Strategy, Utility) after 124 Training Sessions
</SectionTitle>
    <Paragraph position="0"> after 124 Training Sessions The SI and MI strategies affect the whole dialogue; the presentation strategies apply locally and a See (Fromer, 1998) for experiments in which local rewards are nonzero.</Paragraph>
    <Paragraph position="1">  can be actived in different states of the dialogue. We examined the variation in a strategy' s utility at each phase of the task, by representing the task as having three phases: no scenarios completed, one scenario completed and both scenarios completed. Table 3 reports utilities for the use of a strategy after one scenario was completed. The policy implied by the utilities at other phases of the task are the same. See (Fromer, 1998) for more detail.</Paragraph>
    <Paragraph position="2"> The Read-First strategy in D1 has the best performance of the read strategies. This strategy takes the initiative to read a message, which might result in messages being read that the user wasn't interested in. However since the user can barge-in on system utterances, perhaps little is lost by taking the initiative to start reading a message. After 124 training sessions, the best summarize strategy is Summarize-System, which automatically selects which attributes to summarize by, and so does not incur the cost of asking the user to specify these attributes. However, the utilities for the Summarize-Choice strategy have not completely converged after 124 trials.</Paragraph>
  </Section>
class="xml-element"></Paper>