<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1073">
  <Title>Automatic Optimization of Dialogue Management</Title>
  <Section position="7" start_page="507" end_page="507" type="concl">
    <SectionTitle>
6 Discussion
</SectionTitle>
    <Paragraph position="0"> This paper presents a practical methodology for applying RL to optimizing dialogue strategies in spoken dialogue systems, and shows empirically that the method improves performance over the EIC strategy in NJFun. A companion paper (Singh et al., 2000) shows that the learned strategy is not only better than EIC, but also better than other fixed choices proposed in the literature. Our results demonstrate that the application of RL allows one to empirically optimize a system's dialogue strategy by searching through a much larger search space than can be explored with more traditional lnethods (i.e. empirically testing several versions of a systent).</Paragraph>
    <Paragraph position="1"> RL has been appled to dialogue systems in previous work, but our approach ditlhrs from previous work in several respects. Biermann and Long (1996) did not test RL in an implemented system, and the experiments of Levin et 31. (2000) utilized a simulated user model. Walker et al. (1998)'s methodology is similar to that used here, in testing RL with an imt)lelnented system with human users. However that work only explored strategy choices at 13 states in the dialogue, which conceivably could have been explored with more traditional methods (~ts compared to the 42 choice states explored here).</Paragraph>
    <Paragraph position="2"> We also note that our learned strategy made dialogue decisions based on ASR confidence in conjunction with other features, mid alto varied initiative and confirmation decisions at a finer grain than previous work; as such, our learned strategy is not; a standard strategy investigated in the dialogue systeln literature. For example, we would not have predicted the complex and interesting back-off strategy with respect to initiative when reasking for an attribute. null To see how our method scales, we are al)plying RL to dialogue systems for eustolner care and tbr travel planning, which are more complex task-oriented domains. As fllture work, we wish to understand the aforementioned results on the subjective reward measures, explore the potential difference between optimizing tbr expert users and novices, automate the choice of state space for dialogue systems, ilwestigate the use of a learned reward function (Walker et al., 1998), and explore the use of more informative non-terminal rewards.</Paragraph>
  </Section>
class="xml-element"></Paper>