<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1012"> <Title>User Expertise Modelling and Adaptivity in a Speech-based E-mail System</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 System functionality </SectionTitle> <Paragraph position="0"> AthosMail is an interactive speech-based e-mail system being developed for mobile telephone use in the project DUMAS (Jokinen and Gamback, 2004). The research goal is to investigate adaptivity in spoken dialogue systems in order to enable users to interact with speech-based systems in a more flexible and natural way. The practical goal of AthosMail is to give visually impaired users the option of checking their email by voice commands, and to let sighted users access their email using a mobile phone.</Paragraph> <Paragraph position="1"> The functionality of the test prototype is rather simple, comprising three main functions: navigation in the mailbox, reading of messages, and deletion of messages. For ease of navigation, AthosMail makes use of automatic classification of messages by sender, subject, topic, or other relevant criteria; the criterion is initially chosen by the system. The classification provides different &quot;views&quot; of the mailbox contents, and the user can move from one view to the next, e.g. from Paul's messages to Maria's messages, with commands like &quot;next&quot;, &quot;previous&quot; or &quot;first view&quot;, and so on. Within a particular view, the user may navigate from one message to another in a similar fashion, saying &quot;next&quot;, &quot;fourth message&quot; or &quot;last message&quot;, and so on. Reading messages is straightforward: the user may say &quot;read (the message)&quot; when the message in question has been selected, or refer to another message by saying, for example, &quot;read the third message&quot;. 
Deletion is handled in the same way, with some room for referring expressions.</Paragraph> <Paragraph position="2"> The user has the option of asking the system to repeat its previous utterance.</Paragraph> <Paragraph position="3"> The system asks for confirmation when the user's command has consequences more serious than wasted time (e.g. from reading the wrong message), namely quitting and the deletion of messages. AthosMail may also ask for clarification if the speech recognition is deemed unreliable, but otherwise the user has the initiative.</Paragraph> <Paragraph position="4"> The purpose of the AthosMail user model is to provide flexibility and variation in the system utterances. The system monitors the user's actions both in general and specifically for each possible system act. Since the user may master some parts of the system functionality while being unfamiliar with other commands, the system can provide responses tailored to the user's familiarity with individual acts.</Paragraph> <Paragraph position="5"> The user model produces recommendations for the dialogue manager on how the system should respond depending on the assumed competence levels of the user. The user model consists of different subcomponents, such as Message Prioritizing, Message Categorization and User Preference components (Jokinen et al., 2004). The Cooperativity Model utilizes two parameters, explicitness and dialogue control (i.e. initiative), and the combination of their values then guides utterance generation. The former is an estimate of the user's competence level, and is described in the following sections.</Paragraph> <Paragraph position="6"> 3 User expertise modelling in AthosMail AthosMail uses a three-level user expertise scale to encode varied skill levels of the users. 
The common assumption of only two classes, experts and novices, seems too simple a model, as it does not take into account the fact that the user's expertise level increases gradually, and many users consider themselves neither novices nor experts but something in between. Moreover, the users may be experienced with the system selectively: they may use some commands more often than others, and thus their skill levels are not uniform across the system functionality.</Paragraph> <Paragraph position="7"> A more fine-grained description of competence and expertise can also be presented. For instance, Dreyfus and Dreyfus (1986), in their studies of whether it is possible to build systems that behave like a human expert, distinguish five levels in skill acquisition: Novice, Advanced beginner, Competent, Proficient, and Expert. In practical dialogue systems, however, it is difficult to maintain subtle user models, and it is also difficult to define observable facts that would allow fine-grained competence levels to be distinguished in rather simple application tasks.</Paragraph> <Paragraph position="8"> We have thus ended up with a compromise, and designed three levels of user expertise in our model: novice, competent, and expert. These levels are reflected in the system responses, which can vary from explicit to concise utterances depending on how much extra information the system is to give to the user in one go.</Paragraph> <Paragraph position="9"> As mentioned above, one of the goals of the Cooperativity model is to facilitate more natural interaction by allowing the system to adapt its utterances according to the perceived expertise level. On the other hand, we also want to validate and assess the usability of the three-level model of user expertise. While not entering into discussions about the limits of rule-based thinking (e.g. 
in order to model intuitive decision making of the experts according to the Dreyfus model), we want to study whether the designed system responses, adapted according to the assumed user skill levels, can provide useful assistance to the user in interactive situations where she is still uncertain about how to use the system.</Paragraph> <Paragraph position="10"> Since the user can always ask for help explicitly, our main goal is not to study the decrease in the user's help requests as she becomes more used to the system, but rather to design the system responses so that they reflect the skill level that the system assumes the user is at, and to get a better understanding of whether the expertise levels and their reflection in the system responses are valid, so as to provide the best assistance for the user.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Dialogue act specific explicitness </SectionTitle> <Paragraph position="0"> The user expertise model utilized in AthosMail consists of a collection of first-order parameters aimed at observing telltale signals of the user's skill level, and a set of second-order parameters (dialogue act specific explicitness DASEX, and dialogue control CTL) that reflect what has been concluded from the first-order parameters. Most first-order parameters are tuned to spot inconsistencies between new information and the current user model (see below). If there is evidence that the user is actually more experienced than previously thought, the user expertise model is updated to reflect this. The process can naturally proceed in the other direction as well, if the user model has been too fast in concluding that the user has advanced to a higher level of expertise. The second-order parameters affect the system behaviour directly. 
There is a separate experience value for each system function, which enables the system to behave appropriately even if the user is very experienced in using one function but has never used another.</Paragraph> <Paragraph position="1"> The higher the value, the less experienced the user; the less experienced the user, the more explicit the manner of expression and the more additional advice is incorporated in the system utterances.</Paragraph> <Paragraph position="2"> The values are called DASEX, short for Dialogue Act Specific Explicitness, and their value range corresponds to the user expertise as follows: 1 = expert, 2 = competent, 3 = novice.</Paragraph> <Paragraph position="3"> The model comprises an online component and an offline component. The former is responsible for observing runtime events and calculating DASEX recommendations on the fly, whereas the latter makes long-term observations and, based on these, calculates default DASEX values to be used at the beginning of the next session. The offline component is, so to speak, rather conservative; it operates on statistical event distributions instead of individual parameter values and tends to round off the extremes, trying to catch the overall learning curve behind the local variations. The components work separately. At the beginning of a new session, the current offline model of the user's skill level is copied onto the online component and used as the basis for producing the DASEX recommendations, while at the end of each session, the offline component calculates the new default level on the basis of the events that occurred.</Paragraph> <Paragraph position="4"> Figure 1 provides an illustration of the relationships between the parameters. In the next section we describe them in detail.</Paragraph> <Paragraph position="5"> The online component can be seen as an extension of the ideas proposed by Yankelovich (1996) and Chu-Carroll (2000). 
The relative weights of the parameters are those used in our user tests, partly based on those of Krahmer et al. (1999). They will be fine-tuned according to our results.</Paragraph> <Paragraph position="6"> Figure 1: The functional relationships between the offline and online parameters used to calculate the DASEX values.</Paragraph> <Paragraph position="7"> DASEX (dialogue act specific explicitness): The value is modified during sessions. Value: DDASEX (see offline parameters) modified by SDAI, HLP, TIM, and INT as specified in the respective parameter definitions.</Paragraph> <Paragraph position="8"> SDAI (system dialogue act invoked): A set of parameters (one for each system dialogue act) that track whether a particular dialogue act has been invoked during the previous round. If SDAI = 'yes', then DASEX -1. This means that when a particular system dialogue move has been instantiated, its explicitness value is decreased, and the move will therefore be presented in a less explicit form the next time it is instantiated during the same session.</Paragraph> <Paragraph position="9"> HLP (the occurrence of a help request by the user): The system incorporates a separate help function; this parameter is only used to notify the offline side about the frequency of help requests.</Paragraph> <Paragraph position="10"> TIM (the occurrence of a timeout on the user's turn): If TIM = 'yes', then DASEX +1. This refers to speech recognizer timeouts.</Paragraph> <Paragraph position="11"> INT (occurrence of a user interruption during system turn): Can be either a barge-in or an interruption by telephone keys. If INT = 'yes', then</Paragraph> <Paragraph position="13"> DDASEX (default dialogue act specific explicitness): Every system dialogue act has its own default explicitness value invoked at the beginning of a session. Value: (DASE + GEX) / 2.</Paragraph> <Paragraph position="14"> GEX (general expertise): A general indicator of user expertise. 
Value: (NSES + OHLP + OTIM) / 3.</Paragraph> <Paragraph position="15"> DASE (dialogue act specific experience): This value is based on the number of sessions during which the system dialogue act has been invoked. There is a separate DASE value for every system dialogue act.</Paragraph> <Paragraph position="16"> Number of sessions and the corresponding DASE value:</Paragraph> <Paragraph position="18"> more than 7 sessions: DASE = 1. NSES (number of sessions): Based on the total number of sessions the user has used the system. Number of sessions and the corresponding NSES value:</Paragraph> <Paragraph position="20"> more than 7 sessions: NSES = 1. OHLP (occurrence of help requests): This parameter tracks whether the user has requested system help during the last 1 or 3 sessions. The HLP parameter is logged by the online component. HLP occurred during the last session: OHLP = 3; during the last 3 sessions: OHLP = 2; otherwise: OHLP = 1. OTIM (occurrence of timeouts): This parameter tracks whether a timeout has occurred during the last 1 or 3 sessions. The TIM parameter is logged by the online component.</Paragraph> <Paragraph position="21"> TIM occurred during the last session: OTIM = 3; during the last 3 sessions: OTIM = 2; otherwise: OTIM = 1.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 DASEX-dependent surface forms </SectionTitle> <Paragraph position="0"> Each system utterance type has three different surface realizations corresponding to the three DASEX values. The explicitness of a system utterance can thus take one of three values (1 = taciturn, 2 = normal, 3 = explicit); the higher the value, the more additional information the surface realization will include (cf. Jokinen and Wilcock, 2001). The value is used for choosing between the surface realizations, which are generated by the presentation components as natural language utterances. 
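As a minimal sketch of how the parameters of Section 3.1 could feed this choice, the following Python fragment computes a default explicitness value, applies the online updates, and picks a surface realization. Several points are assumptions rather than parts of the system described here: all function and variable names are hypothetical, the two Value formulas are read as averages, the result is rounded half up and clamped to the range 1 to 3, the INT parameter is left out, and the three realizations are illustrated by splitting the DASEX = 3 prompt of Example 1 into a core plus two added elements.

```python
# Hypothetical sketch; names, rounding and clamping are assumptions.

def occurrence_level(sessions_ago):
    """OHLP / OTIM value from how many sessions ago the event last occurred.

    1 means the last session; None means the event has not been observed.
    """
    if sessions_ago == 1:
        return 3
    if sessions_ago in (2, 3):
        return 2
    return 1


def general_expertise(nses, ohlp, otim):
    """GEX, read here as the mean of NSES, OHLP and OTIM (assumption)."""
    return (nses + ohlp + otim) / 3


def default_dasex(dase, gex):
    """DDASEX, read here as the mean of DASE and GEX (assumption)."""
    return (dase + gex) / 2


def online_dasex(ddasex, sdai, tim):
    """Apply the SDAI and TIM rules, then round half up and clamp to 1..3.

    INT is left out; rounding and clamping are assumptions.
    """
    value = ddasex
    if sdai:
        value -= 1  # the act was already invoked in the previous round
    if tim:
        value += 1  # a timeout occurred on the user's turn
    return max(1, min(3, int(value + 0.5)))


# Assumed split of the DASEX = 3 error prompt into a core and two additions.
ERROR_PROMPT_ELEMENTS = [
    "I'm sorry, I didn't understand.",
    "Please speak clearly, but do not over-articulate, "
    "and speak only after the beep.",
    "To hear examples of what you can say to the system, say 'what now'.",
]


def realize(elements, dasex):
    """Keep the first `dasex` elements: 1 = core only, 3 = everything."""
    return " ".join(elements[:dasex])


# A user with little experience who asked for help in the last session.
gex = general_expertise(nses=3, ohlp=occurrence_level(1),
                        otim=occurrence_level(None))
level = online_dasex(default_dasex(dase=3, gex=gex), sdai=False, tim=False)
print(realize(ERROR_PROMPT_ELEMENTS, level))
```

Storing the realizations as prefixes of one element list is only one possible design; it mirrors the idea that the most concise form is a core that the more explicit levels extend.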
The following two examples have been translated from their original Finnish forms.</Paragraph> <Paragraph position="1"> Example 1: A speech recognition error (the ASR score was too low).</Paragraph> <Paragraph position="3"> speak clearly, but do not over-articulate, and speak only after the beep.</Paragraph> <Paragraph position="4"> DASEX = 3: I'm sorry, I didn't understand. Please speak clearly, but do not over-articulate, and speak only after the beep. To hear examples of what you can say to the system, say 'what now'.</Paragraph> <Paragraph position="5"> Example 2: Basic information about a message that the user has chosen from a listing of messages from a particular sender.</Paragraph> <Paragraph position="7"> file&quot;. Say 'tell me more', if you want more details.</Paragraph> <Paragraph position="8"> DASEX = 3: First message, about &quot;reply: sample file&quot;. Say 'read', if you want to hear the messages, or 'tell me more', if you want to hear a summary and the send date and length of the message.</Paragraph> <Paragraph position="9"> These examples show the basic idea behind the DASEX effect on surface generation. In the first example, the novice user is given additional information about how to avoid ASR problems, while the expert user is only given the error message. In the second example, the expert user gets only the basic information about the message, whereas the novice user is also given some possible commands for continuing. A full interaction with AthosMail is given in Appendix 1.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Evaluation of AthosMail </SectionTitle> <Paragraph position="0"> Within the DUMAS project, we are in the process of conducting exhaustive user studies with the prototype AthosMail system that incorporates the user expertise model described above. 
We have already conducted a preliminary qualitative expert evaluation, the goal of which was to provide insights into the design of system utterances so as to appropriately reflect the three user expertise levels, as well as a first set of user evaluations in which four tasks were carried out over two consecutive days.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Adaptation and system utterances </SectionTitle> <Paragraph position="0"> For the expert evaluation, we interviewed five interactive systems experts (two women and three men). They all had earlier experience in interactive systems and interface design, but were unfamiliar with the current system and with interactive email systems in general. Each interview included three walkthroughs of the system, one for a novice, one for a competent, and one for an expert user. The experts were asked to comment on the naturalness and appropriateness of each system utterance, and to provide any other comments they might have on adaptation and adaptive systems.</Paragraph> <Paragraph position="1"> All interviewees agreed on one major theme, namely that the system should be as friendly and reassuring as possible towards novices. Dialogue systems can be intimidating to new users, and many people are so afraid of making mistakes that they give up after the first communication failure, regardless of what caused it. Graphical user interfaces differ from speech interfaces in this respect, because there is always something salient to observe as long as the system is running at all.</Paragraph> <Paragraph position="2"> Four of the five experts agreed that in an error situation the system should always signal to the user that the machine is to blame, but that there are things the user can do in case she wants to help the system in the task. 
The system should acknowledge its shortcomings &quot;humbly&quot; and make sure that the user doesn't get feelings of guilt - all problems are due to imperfect design. E.g., the responses in Example 1 were viewed as accusing the user of not being able to act in the correct way.</Paragraph> <Paragraph position="3"> We have since moved towards forms like &quot;I may have misheard&quot;, where the system appears responsible for the miscommunication. This can pave the way when the user is taking the first wary steps in getting acquainted with the system.</Paragraph> <Paragraph position="4"> Novice users also need error messages that do not bother the user with technical matters that concern only the designers. For instance, a novice user doesn't need information about error codes or characteristics of the speech recognizer; when ASR errors occur, the system can simply talk about not hearing correctly; a reference to a piece of equipment that does the job - namely, the speech recognizer - is unnecessary and the user should not be burdened with it.</Paragraph> <Paragraph position="5"> Experienced users, on the other hand, wish to hear only the essentials. All our interviewees agreed that at the highest skill level, the system prompts should be as terse as possible, to the point of being blunt. Politeness words like &quot;I'm sorry&quot; are not necessary at this level, because the expert's attitude towards the system is pragmatic: they see it as a tool, know its limitations, and &quot;rudeness&quot; on the part of the system doesn't scare or annoy them anymore. However, it is not clear how the change in politeness when migrating from novice to expert levels actually affects the user's perception of the system; the transition should at least be gradual and not too fast. There may also be cultural differences regarding certain politeness rules.</Paragraph> <Paragraph position="6"> The virtues of adaptivity are still a matter of debate. 
One of the experts expressed serious doubt over the usability of any kind of automatic adaptivity and maintained that the user should decide whether she wants the system to adapt at a given moment or not. In the related field of tutoring systems, Kay (2001) has argued for giving the user control over adaptation. Whatever the case, it is clear that badly designed adaptivity is confusing to the user, and a novice user in particular may feel disoriented if faced with prompts where nothing seems to stay the same. It is essential that the system is consistent in its use of concepts and manner of speech.</Paragraph> <Paragraph position="7"> In AthosMail, the expert level (DASEX=1 for all dialogue acts) acts as the core around which the other two expertise levels are built. While the core remains essentially unchanged, further information elements are added after it. In practice, when the perceived user expertise rises, the system simply removes information elements that have become unnecessary from the end of the utterance, without touching the core. This should contribute to a feeling of consistency and dependability. On the other hand, Paris (1988) argued that the user's expertise level affects not only the amount but also the kind of information given to the user. It will prove interesting to reconcile these views in a more general kind of user expertise modeling.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Adaptation and user errors </SectionTitle> <Paragraph position="0"> The user evaluation of AthosMail consisted of four tasks that were performed on two consecutive days. The 26 test users, aged 20-62, thus produced four separate dialogues each and a total of 104 dialogues. 
They had no previous experience with speech-based dialogue systems, and to familiarize themselves with synthesized speech and speech recognizers, they had a short training session with another speech application at the beginning of the first test session. An outline of AthosMail functionality was presented to the users, and they were allowed to keep it when interacting with the system. At the end of each of the four tests, the users were asked to assess how familiar they were with the system functionality and how confident they felt about using it. They were also asked to assess whether the system gave too little information about its functionality, too much, or the right amount. The results are reported in Jokinen et al. (2004). We also identified four error types, as a point of comparison for the user expertise model.</Paragraph> </Section> </Section></Paper>