<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-2044">
<Title>Evolving optimal inspectable strategies for spoken dialogue systems</Title>
<Section position="5" start_page="174" end_page="175" type="evalu">
<SectionTitle> 4 Experimental Results </SectionTitle>
<Paragraph position="0"> Table 1 lists the total reward (payoff) averaged over the 10 cross-validated test trials for each experiment, expressed as a percentage of the maximum payoff.</Paragraph>
[Table 1 columns: Exp. | Training/Test Users | Payoff (%)]
<Paragraph position="1"> In these experiments, the maximum payoff represents the shortest possible successful dialogue. For example, the maximum payoff for Experiment 1 is 195,000: 100,000 for filling the slots, plus 100,000 for greeting the user at the start of the dialogue, minus 5,000 for the minimum number of turns (five) taken to complete the dialogue successfully; a worked sketch of this arithmetic follows the section text. The average payoff for the 10 trials trained on simulated user A and tested on user B was 193,877, approximately 99.4% of the maximum possible. In light of these results and the stochastic user responses, we suggest that these evolved strategies would compare favourably with any hand-coded strategies.</Paragraph>
<Paragraph position="2"> It is instructive to compare the rate of convergence for different strategies. Figure 1 shows the average payoff for the 100,000 dialogues trained with simulated user A in Experiments 3 and 4. It shows that Experiment 3 approached the optimal policy after approximately 20,000 dialogues, whereas Experiment 4 converged after approximately 5,000 dialogues. This is encouraging because it suggests that XCS remains focused on finding the shortest successful dialogue even when the number of available actions increases.</Paragraph>
[Figure 1: average payoff during training in Experiments 3 and 4 (simulated user A).]
<Paragraph position="3"> Finally, we look at how to represent an optimal strategy. From the logs of the test dialogues we extracted the state-action rules (classifiers) that were executed. For example, in Experiment 4, the optimal strategy is represented by 17 classifiers. By comparison, a purely RL-based strategy would define an optimal action for every theoretically possible state (i.e., 128). In this example, the evolutionary approach has reduced the number of rules from 128 to 17 (a reduction of 87%) and is therefore much more easily inspectable. In fact, the size of the optimal strategy can be reduced further by selecting the most general classifier for each action (Table 2); an illustrative sketch of this generalisation also follows the section text. These rules are sufficient since they cover the 60 states that could actually occur while following the optimal strategy.</Paragraph>
</Section>
</Paper>
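The payoff arithmetic above can be made concrete with a small sketch. This is a minimal illustration, not the authors' code: the 100,000 rewards for slot-filling and for the greeting are taken from the text, while the per-turn penalty of 1,000 is an assumption inferred from the stated 5,000 deduction over five turns.

    # Minimal sketch (not the authors' code) of the payoff scheme described above.
    # The 100,000 rewards come from the text; the 1,000 per-turn penalty is an
    # assumption inferred from the 5,000 deduction over five turns.

    SLOT_FILL_REWARD = 100_000   # all slots filled
    GREETING_REWARD = 100_000    # user greeted at the start of the dialogue
    TURN_PENALTY = 1_000         # assumed cost per turn (5,000 / 5 turns)

    def payoff(slots_filled, greeted, num_turns):
        """Total reward for one dialogue under the scheme sketched above."""
        total = 0
        if slots_filled:
            total += SLOT_FILL_REWARD
        if greeted:
            total += GREETING_REWARD
        return total - TURN_PENALTY * num_turns

    # Shortest successful dialogue in Experiment 1: five turns, both goals met.
    assert payoff(True, True, 5) == 195_000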
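The compression from 128 theoretically possible states to 17 classifiers, and further to one most-general classifier per action, rests on the ternary conditions that XCS evolves, in which a '#' (don't care) bit matches either value. The sketch below is purely illustrative: the rule strings and action names are hypothetical rather than the classifiers reported in Table 2, and the seven-bit state encoding is an assumption based only on the 2^7 = 128 state count mentioned above.

    # Illustrative sketch only: the rule strings and action names below are
    # hypothetical, not the classifiers reported in Table 2. XCS-style conditions
    # are ternary strings in which '#' (don't care) matches either bit value, so
    # a few general rules can cover many of the 2**7 = 128 theoretically possible
    # states (the seven-bit encoding is an assumption based on that count).

    def matches(condition, state):
        """A condition matches a state if every non-'#' position agrees."""
        return all(c == '#' or c == s for c, s in zip(condition, state))

    def coverage(condition):
        """Number of distinct binary states the condition matches."""
        return 2 ** condition.count('#')

    # Two hypothetical classifiers over a seven-bit state.
    rules = {
        "1######": "ask_next_slot",   # very general: fires whenever bit 0 is set
        "1111111": "close_dialogue",  # fully specific: fires in exactly one state
    }

    for condition, action in rules.items():
        print(f"{condition} -> {action}: covers {coverage(condition)} of 128 states")

    assert matches("1######", "1010101") and not matches("1111111", "1010101")

Selecting, for each action, the single most general matching rule is what reduces the inspectable strategy further, since one highly general condition can stand in for many more specific classifiers that prescribe the same action.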