<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1301"> <Title>Adaptive Help for Speech Dialogue Systems Based on Learning and Forgetting of Speech Commands</Title> <Section position="4" start_page="1" end_page="1" type="metho"> <SectionTitle> 2 Learning of Commands </SectionTitle> <Paragraph position="0"> In this section, we determine which function most adequately describes learning in our environment.</Paragraph> <Paragraph position="1"> Two mathematical functions can be found in the literature. These functions help to predict the time necessary to complete a task after several trials.</Paragraph> <Paragraph position="2"> One model was suggested by Newell and Rosenbloom (1981) and describes learning with a power law. Heathcote et al. (2002) instead suggest using an exponential law.</Paragraph> <Paragraph position="4"> In both equations, T represents the time to solve a task, B is the time needed for the first trial of a task, N stands for the number of trials and a is the learning rate parameter, a measure of the learning speed. The parameter a has to be determined empirically. We conducted memory tests to determine which of the two functions best describes the learning curve for our specific environment.</Paragraph> <Section position="1" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.1 Test Design for Learning Experiments </SectionTitle> <Paragraph position="0"> The test group consisted of seven persons. The subjects' ages ranged from 26 to 43 years. Five of the subjects had no experience with an SDS, and two had very little. Novice users were needed because we wanted to observe only novice learning behaviour.
The tests lasted about one hour and were conducted in a BMW, driving a predefined route with moderate traffic.</Paragraph> <Paragraph position="1"> Each subject had to learn a given set of ten tasks with differing levels of complexity (see table 1).</Paragraph> <Paragraph position="2"> Complexity is measured by the minimum number of dialogue steps necessary to solve a task. The tasks were not named directly but paraphrased, so that the actual command was not mentioned and could not influence the learning process. No help was allowed except the options function. The subjects received the tasks one by one and had to search for the corresponding speech command in the options. After a task in the test set was completed, the next task was presented. The procedure was repeated until all commands had been memorized.</Paragraph> <Paragraph position="3"> For each trial, we measured the time span from SDS activation until the correct speech command was spoken. The time spans were normalized by dividing them by the minimum number of steps necessary to solve the task.</Paragraph> </Section> <Section position="2" start_page="1" end_page="1" type="sub_section"> <SectionTitle> 2.2 Results </SectionTitle> <Paragraph position="0"> In general, learning takes place very fast in the beginning; with an increasing number of trials, the learning curve flattens and approaches an asymptote. The asymptote at Tmin = 2 s defines the maximum expert level, meaning that a task cannot be completed any faster.</Paragraph> <Paragraph position="1"> The resulting learning curve is shown in Fig. 3. In order to determine whether equation (1) or (2) describes this curve more exactly, we used a chi-squared goodness-of-fit test (Rasch et al., 2004). The closer χ² is to zero, the less the observed values (fo) differ from the estimated values (fe).</Paragraph> <Paragraph position="3"> According to Fig. 
2, the power law has a minimum (χ²min = 0.42) with a learning rate parameter of a = 1.31. The exponential law has its minimum (χ²min = 2.72) with a = 0.41. This means that the values of the exponential law differ more from the observed values than those of the power law.</Paragraph> <Paragraph position="4"> Therefore, we use the power law (see Fig. 3(a)) to describe learning in our environment.</Paragraph> </Section> </Section> <Section position="5" start_page="1" end_page="3" type="metho"> <SectionTitle> 3 Forgetting of Commands </SectionTitle> <Paragraph position="0"> The second factor influencing our algorithm for the calculation of options is forgetting. If a command has not been used for a long period of time, we can assume that it will be forgotten. In this section, we determine how long commands are remembered and deduce the function that most adequately describes the process of forgetting in our environment. Rubin and Wenzel (1996) compared 105 mathematical models of forgetting to several previously published retention studies. The results showed that there is no generally applicable mathematical model, but a few models fit a large number of studies. The most adequate models are based on a logarithmic function, an exponential function, a power function and a square root function. [Table 1 (excerpt): Task 1: Listen to a radio station with a specific frequency. Fig. 2: (a) χ² for the power law, (b) χ² for the exponential law.]</Paragraph> <Paragraph position="2"> The variable u represents the initial number of learned items. The period of time is represented by t, while d defines the decline parameter of the forgetting curve. 
In order to determine the best forgetting curve for SDS interactions, we conducted tests in which the participants' memory skills were monitored.</Paragraph> <Section position="1" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 3.1 Test Design for Forgetting Experiments </SectionTitle> <Paragraph position="0"> The second experiment consisted of two phases, learning and forgetting. In a first step, ten subjects learned a set of two function blocks, each consisting of ten speech commands (see table (2)). The learning phase took place in a BMW. The tasks and the corresponding commands were noted on a handout. The participants had to read the tasks and utter the speech commands. When all 20 tasks were completed, this step was repeated until all SDS commands could be freely reproduced. These 20 commands formed the basis for our retention tests.</Paragraph> <Paragraph position="1"> Our aim was to determine how fast forgetting took place, so we conducted several memory tests over a time span of 50 days. The tests were conducted in a laboratory environment and were intended to imitate the situation in a car when the driver wants to perform a task (e.g. listen to the radio) via the SDS. Because we wanted to avoid any influence on the participants' verbal memory, the intentions were not presented verbally or in written form but as iconic representations (see Fig. 4). Each icon represented an intention, and the corresponding speech command had to be spoken.</Paragraph> <Paragraph position="2"> fluence the results. As a measure for forgetting, we used the number of commands recalled correctly after a certain period of time.</Paragraph> </Section> <Section position="2" start_page="3" end_page="3" type="sub_section"> <SectionTitle> 3.2 Results </SectionTitle> <Paragraph position="0"> The observed forgetting curve can be seen in Fig.</Paragraph> <Paragraph position="1"> 6(a). 
In order to determine whether equation (4), (5), (6) or (7) best fits our findings, we used the chi-squared goodness-of-fit test (cf. section 2.2).</Paragraph> <Paragraph position="2"> The minimum χ² values for the functions are shown in table (3). Because the exponential function (see Fig.</Paragraph> <Paragraph position="3"> 6(b)) delivers the smallest χ², we use equation (5) for our further studies.</Paragraph> <Paragraph position="4"> Concerning forgetting in general, we can deduce that once the speech commands have been learned, forgetting takes place faster in the beginning. With increasing time, the forgetting curve flattens and tends towards zero. Our findings show that after 50 days about 75% of the original number of speech commands have been forgotten. Based on the exponential function, we estimate that complete forgetting will take place after approximately 100 days.</Paragraph> </Section> </Section> <Section position="6" start_page="3" end_page="6" type="metho"> <SectionTitle> 4 Providing Adaptive Help </SectionTitle> <Paragraph position="0"> As discussed in previous work, several adaptive components can be included in dialogue systems, e.g. user adaption (Hassel and Hagen, 2005), content adaption, situation adaption and task adaption (Libuda and Kraiss, 2003). We concentrate on user and content adaption and build a user model.</Paragraph> <Paragraph position="1"> According to Fischer (2001), the user's knowledge about complex systems can be divided into several parts (see Fig. 7): well known and regularly used concepts (F1), vaguely known and occasionally used concepts (F2) and concepts the user believes to exist in the system (F3). F represents the complete functionality of the system. The basic idea behind the adaptive help system is to use information about the driver's behaviour with the SDS to provide help only on topics he is less familiar with. 
Thus the help system focuses on F2, F3 within F and finally the complete functionality F.</Paragraph> <Paragraph position="2"> For every driver an individual profile is generated, containing information about usage frequency and counters for every function. Several methods can be used to identify the driver, e.g.</Paragraph> <Paragraph position="3"> a personal ID card, a fingerprint system or face recognition (Heckner, 2005). We do not focus further on driver identification in our prototype.</Paragraph> <Section position="1" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 4.1 Defining an Expert User </SectionTitle> <Paragraph position="0"> In section 2 we observed that in our environment, the time to learn speech commands follows a power law, depending on the number of trials (N), the duration of the first interaction (B) and the learning rate parameter (a). If we transform equation (1), we can determine the number of trials needed to execute a function in a given time T.</Paragraph> <Paragraph position="2"> If we substitute T with the minimal time Tmin an expert needs to execute a function (Tmin = 2 s, cf.</Paragraph> <Paragraph position="3"> section 2.2), we can estimate the number of trials necessary for a novice user to become an expert. The only variable is the duration B, which has to be measured for every function at its first usage.</Paragraph> <Paragraph position="4"> Additionally, we use two stereotypes (novice and expert) to classify a user concerning his general experience with the SDS. 
According to Hassel (2006), we can deduce a user's experience by monitoring his behaviour while using the SDS.</Paragraph> <Paragraph position="5"> The following parameters are used to calculate an additional user model: help requests h (user asked for general information about the system), options requests o (user asked for the currently available speech commands), timeouts t (the ASR did not get any acoustic signal), onset time ot (user needed more than 3 s to start answering) and barge-in b (user starts speech input during the system's speech output). The parameters are noted in a vector $\overrightarrow{UM}$.</Paragraph> <Paragraph position="6"> The parameters are weighted differently by a weight vector $\overrightarrow{UM}_w$, because each parameter is a different indicator of the user's experience.</Paragraph> <Paragraph position="8"> The final user model is calculated as the scalar product $\overrightarrow{UM} \cdot \overrightarrow{UM}_w$. If the resulting value is above a predefined threshold, the user is categorized as a novice and a more explicit dialogue strategy is applied, e.g. the dialogues contain more examples. If the user model delivers a value below the threshold, the user is categorized as an expert and an implicit dialogue strategy is applied.</Paragraph> </Section> <Section position="2" start_page="5" end_page="6" type="sub_section"> <SectionTitle> 4.2 Knowledge Modeling Algorithm </SectionTitle> <Paragraph position="0"> Our findings from the learning experiments can be used to create an algorithm for the presentation of the context-specific SDS help. For this purpose, the option commands of every context are split into several help layers (see Fig. 8, in which the items are divided into three help layers). Each layer contains a maximum of four option commands in order to reduce the driver's mental load (Wirth, 2002). Each item has a counter, marking the position within the layers. The initial order is based on our experience with the usage frequency by novice users. The first layer contains simple and frequently used commands, e.g. 
dial number or choose radio station. Complex or infrequent commands are put into the lower layers. Every usage of a function is logged by the system and a counter i is increased by 1 (see equation 10).</Paragraph> <Paragraph position="1"> Besides the direct usage of commands, we also take transfer knowledge into account. There are several similar commands, e.g. the selection of entries in different lists such as the phone book, the address book or the CD changer playlists. Additionally, there are several commands with the same parameters, e.g. radio on/off, traffic program on/off etc. All similar speech commands were clustered into functional families. If a user is familiar with one command in a family, we assume that the other functions can be used or learned faster. Thus, we introduced a value s that increases the counters of all commands within the functional family. The value of s depends on the experience level of the user.</Paragraph> <Paragraph position="2"> $i_{\mathrm{new}} = \begin{cases} i_{\mathrm{old}} + 1 \quad \text{(direct usage)} \\ i_{\mathrm{old}} + s \quad \text{(similar command)} \end{cases}$ (10) In order to determine the value of s, we conducted a small test series in which six novice users were told to learn ten SDS commands from different functional families. Once they were familiar with the set of commands, they had to perform ten tasks requiring similar commands. The subjects were not allowed to use any help and had to derive the necessary speech commands from their prior knowledge of the SDS. Results showed that approximately 90% of the tasks could be completed by deducing the necessary speech commands from the previously learned commands. Transferring these results to our algorithm, we assume that once a user is an expert on one speech command of a functional family, the other commands can be derived very well. Thus we set $s_{\mathrm{expert}} = 0.9$ for expert users and estimate that for novice users the value should be $s_{\mathrm{novice}} = 0.6$. 
These values have to be validated in further studies.</Paragraph> <Paragraph position="3"> Every usage of a speech command increases its counter and the counters of the similar commands. These values can be compared to the value of N resulting from equation (8). N defines a threshold that marks a command as known or unknown. If a driver uses a command more often than the corresponding threshold (i > N), we assume that the user has learned it and thus does not need help on this command. It can be shifted into the lowest layer and the other commands move up into the upper layers (see Fig. 9).</Paragraph> <Paragraph position="4"> If a command is not used for a long period of time (cf. section 3.2), the counter of this command steadily declines until the item's initial counter value is reached. The decline itself is based on the results of our forgetting experiments (cf. section 3.2) and the behaviour of the counter is described by equation (5). [Figure 9: Item A had an initial counter of i = 1 and was presented in layer 1; after it has been used 15 times (i > N), it is shifted into layer 3 and the counter has a new value i = 16.]</Paragraph> </Section> </Section> </Paper>