<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1024"> <Title>Keeping the initiative: an empirically-motivated approach to predicting user-initiated dialogue contributions in HCI</Title> <Section position="4" start_page="185" end_page="186" type="metho"> <SectionTitle> 2 A targeted comparison of HHC and HCI dialogues </SectionTitle> <Paragraph position="0"> In order to ascertain the extent to which techniques of recipient design established on the basis of human-human natural interaction can be transferred to HCI, we investigated comparable task-oriented dialogues that varied according to whether the users believed that they were interacting with another human or with an artificial agent. The data for our investigation were taken from three German corpora collected in the mid-1990s within a toy plane building scenario used for a range of experiments in the German Collaborative Research Centre Situated Artificial Communicators (SFB 360) at the University of Bielefeld (Sagerer et al., 1994). In these experiments, one participant is the 'constructor', who actually builds the model plane; the other participant is the 'instructor', who provides instructions for the constructor. The corpora differ in that the constructor in the HHC setting was another human interlocutor; in the other scenario, the participants were seated in front of a computer but were informed that they were actually talking to an automatic speech processing system (HCI).1 In all cases, there was no visual contact between constructor and instructor.</Paragraph> <Paragraph position="1"> Previous work on human-human task-oriented dialogues going back to, for example, Grosz (1982), has shown that dialogue structure commonly follows task structure. 
Moreover, it is well known that human-human interaction employs a variety of dialogue structuring mechanisms, ranging from meta-talk to discourse markers, and that some of these can usefully be employed for automatic analysis (Marcu, 2000).</Paragraph> <Paragraph position="2"> If dialogue with artificial agents were then to be structured as it is with human interlocutors, there would be many useful linguistic surface cues available for guiding interpretation. And, indeed, a common way of designing dialogue structure in HCI is to have it follow the structure of the task, since this defines the types of actions necessary and their sequencing.</Paragraph> <Paragraph position="3"> 1 In fact, the interlocutors were always humans, as the artificial agent in the HCI conditions was simulated employing standard Wizard-of-Oz methods allowing tighter control of the linguistic responses received by the user.</Paragraph> <Paragraph position="4"> Previous studies have not, however, addressed the issue of dialogue structure in HCI systematically, although a decrease in framing signals has been noted by Hitzenberger and Womser-Hacker (1995), indicating either that the discourse structure is marked less often or that there is less structure to be marked. A more precise characterisation of how task-structure is used or expressed in HCI situations is then critical for further design. For our analysis here, we focused on properties of the overall dialogue structure and how this is signalled via linguistic cues. Our results show that there are in fact significant differences between HCI and HHC and that it is not possible simply to transpose results from one situation to the other.</Paragraph> <Paragraph position="5"> The structuring devices of the human-to-human construction dialogues can be described as follows. The instructors first inform their communication partners about the general goal of the construction. 
Subsequently, and as would be expected for a task-oriented dialogue from previous studies, the discourse structure is hierarchical. At the top level, there is discussion of the assembly of the whole toy airplane, which is divided into individual functional parts, such as the wings or the wheels. The individual constructional steps then usually comprise a request to identify one or more parts and a request to combine them. Each step is generally acknowledged by the communication partner, and the successful combination of the parts as a larger structure is signalled as well. All the human-to-human dialogues were similar in these respects. This discourse structure is shown graphically in the outer box of Figure 1.</Paragraph> <Paragraph position="6"> Instructors mark changes between phases with signals of attention, often the constructor's first name, and discourse particles or speech routines that mark the beginning of a new phase such as also [so] or jetzt geht's los [now].</Paragraph> <Paragraph position="7"> Table 1: Percentage of speakers making no, single or frequent use of a particular structuring strategy. HCI: N=40; HHC: N=22. All differences are highly significant (ANOVA p &lt; 0.005).
usage | goal (HHC) | goal (HCI) | discourse marker (HHC) | discourse marker (HCI) | explicit marking (HHC) | explicit marking (HCI)
none | 27.3 | 100 | 0 | 52.5 | 13.6 | 52.5
single | 40.9 | 0 | 9.1 | 25.0 | 54.5 | 27.5
frequent | 31.8 | 0 | 90.9 | 22.5 | 31.8 | 20.0
</Paragraph> <Paragraph position="8"> This structuring function of discourse markers has been shown in several studies and so can be assumed to be quite usual for human-human interaction (Swerts, 1998). Furthermore, individual constructional steps are explicitly marked by means of als erstes, dann [first of all, then] or der erste Schritt [the first step]. 
In addition to the marking of the construction phases, we also find marking of the different activities, such as description of the main goal versus description of the main architecture, or different phases that arise through the addressing of different addressees, such as asides to the experimenters.</Paragraph> <Paragraph position="9"> Speakers in dialogues directed at human interlocutors are therefore attending to the following three aspects of discourse structure: * marking the beginning of the task-oriented phase of the dialogue; * marking the individual constructional steps; * providing orientations for the hearer as to the goals and subgoals of the communication.</Paragraph> <Paragraph position="10"> When we turn to the HCI condition, however, we find a very different picture, indicating that a straightforward tuning of dialogue structure for an artificial agent on the basis of the HHC condition will not produce an effective system.</Paragraph> <Paragraph position="11"> These dialogues generally start as the HHC dialogues do, i.e., with a signal for getting the communication partner's attention, but then diverge by giving very low-level instructions, such as to find a particular kind of component, often even before the system has itself given any feedback. Since this behaviour is divorced from any possible feedback or input produced by the artificial system, it can only be adopted because of the speaker's initial assumptions about the computer. When this strategy is successful, the speaker continues to use it in following turns. Instructors in the HCI condition do not then attempt to give a general orientation to their hearer. This is true of all the human-computer dialogues in the corpus. Moreover, the dialogue phases of the HCI dialogues do not correspond to the assembly of an identifiable part of the airplane, such as a wing, the wheels, or the propeller, but to much smaller units that consist of successfully identifying and combining some parts. 
The divergent dialogue structure of the HCI condition is shown graphically in the inner dashed box of Figure 1.</Paragraph> <Paragraph position="12"> These differences between the experimental conditions are quantified in Table 1, which shows for each condition the frequencies of occurrence for the use of general orienting goal instructions, describing what task the constructor/instructor is about to address, the use of discourse markers, and the use of explicit signals of changes in task phase. These differences show (a) that users are engaging in recipient design with respect to their partner in these comparable situations and (b) that the linguistic cues available for structuring an interpretation of the dialogue in the HCI case are considerably impoverished. This can itself obviously lead to problems given the difficulty of the interpretation task.</Paragraph> </Section> <Section position="5" start_page="186" end_page="188" type="metho"> <SectionTitle> 3 Interpretation of the observed differences in terms of recipient design </SectionTitle> <Paragraph position="0"> Examining the results of the previous section more closely, we find signs that the concept of the communication partner to which participants were orienting was not the same for all participants. Some speakers believed structural marking also to be useful in the HCI situation, for example. In this section, we turn to a more exact consideration of the reasons for these differences and show that directly employing the mechanisms of recipient design developed by Schegloff (1972) is a beneficial strategy. The full range of variation observed, including intra-corpus variation that space precluded us describing in detail above, is seen to arise from a single common mechanism. 
Furthermore, we show that precisely the same mechanism leads to a predictive account of user-initiated clarificatory dialogues.</Paragraph> <Paragraph position="1"> The starting point for the discussion is the conversation analytic notion of the insertion sequence. An insertion sequence is a subdialogue inserted between the first and second parts of an adjacency pair. Insertion sequences are problematic for artificial agents precisely because they are places where the user takes the initiative and demands information from the system. Clarificatory subdialogues are regularly of this kind. Schegloff (1972) analyses the kinds of discourse contents that may constitute insertion sequences in human-to-human conversations involving spatial reference. His results imply a strong connection between recipient design and discourse structure. This means that we can describe the kind of local sequential organisation problematic for mixed-initiative dialogue interpretation on the basis of more general principles. Insertion sequences have been found to address the following kinds of dialogue work: Location Analysis: Speakers check upon spatial information regarding the communication partners, such as where they are when on a mobile phone, which may lead to an insertion sequence and is also responsible for one of the most common types of utterances when beginning a conversation by mobile phone: i.e., &quot;I'm just on the bus/train/tram&quot;.</Paragraph> <Paragraph position="2"> Membership Analysis: Speakers check upon information about the recipient because the communication partner's knowledge may render some formulations more relevant than others. As a 'member' of a particular class of people, such as the class of locals, or of the class of those who have visited the place before, the addressee may be expected to know some landmarks that the speaker may use for spatial description. Membership groups may also include differentiation according to capabilities (e.g., perceptual) of the interlocutors. 
Topic or Activity Analysis: Speakers attend to which aspects of the location addressed are relevant for the given topic and activity. They have a number of choices at their disposal among which they can select: geographical descriptions, e.g. 2903 Main Street, descriptions with relation to members, e.g. John's place, descriptions by means of landmarks, or place names.</Paragraph> <Paragraph position="3"> These three kinds of interactional activity each give rise to potential insertion sequences; that is, they serve as the functional motivation for particular clarificatory subdialogues being explored rather than others. In the HCI situation, however, one of them stands out. The task of membership analysis is extremely challenging for a user faced with an unknown artificial agent. There is little basis for assigning group membership; indeed, there are not even grounds for knowing which kind of groups would be applicable, due to lack of experience with artificial communication partners. Since membership analysis constitutes a pre-requisite for the formulation of instructions, recipient design can be expected to be an essential force both for the discourse structure and for the motivation of particular types of clarification questions in HCI. We tested this prediction by means of a further empirical study involving a scenario in which the users' task was to instruct a robot to measure the distance between two objects out of a set of seven. These objects differed only in their spatial position. The users had an overview of the robot and the objects to be referred to and typed their instructions into a notebook. The relevant objects were pointed at by the instructor of the experiments. The users were not given any information about the system and so were explicitly faced with a considerable problem of membership analysis, making the need for clarification dialogues particularly obvious. 
The results of the study confirmed the predicted effect and, moreover, provide a classification of clarification question types. Thus, the particular kinds of analysis found to initiate insertion sequences in HHC situations are clearly active in HCI clarification questions as well.</Paragraph> <Paragraph position="4"> 21 subjects from varied professions and with different experience with artificial systems participated in the study. The robot's output was generated by a simple script that displayed answers in a fixed order after a particular 'processing' time.</Paragraph> <Paragraph position="5"> The dialogues were all, therefore, absolutely comparable regarding the robot's linguistic material; moreover, the users' instructions had no impact on the robot's linguistic behaviour. The robot, a Pioneer 2, did not move, but the participants were told that it could measure distances and that they were connected to the robot's dialogue processing system by means of a wireless LAN connection. The robot's output was either &quot;error&quot; (or later in the dialogues a natural language variant) or a distance in centimeters. This forced users to reformulate their dialogue contributions, an effective methodology for obtaining users' hypotheses about the functioning and capabilities of a system (Fischer, 2003). In our terms, this leads directly to an explicit exploration of a user's membership analysis. As expected in a joint attention scenario, very limited location analysis occurred. Topic analysis is also restricted; spatial formulations were chosen on the basis of what users believed to be 'most understandable' for the robot, which also leads back to the task of membership analysis.</Paragraph> <Paragraph position="6"> In contrast, there were many cases of membership analysis. 
There was clearly great uncertainty about the robot's prerequisites for carrying out the spatial task and this was explicitly specified in the users' varied formulations. A simple example is given in Figure 2.</Paragraph> <Paragraph position="7"> The complete list of types of questions related to membership analysis and which digress from the task instructions in our corpus is given in Table 2. Each of these instances of membership analysis constitutes a clarification question that would initiate an off-topic subdialogue if the robot had reacted to it.</Paragraph> </Section> <Section position="6" start_page="188" end_page="190" type="metho"> <SectionTitle> 4 Consequences for system design </SectionTitle> <Paragraph position="0"> So far our empirical studies have shown that there are particular kinds of interactional problems that will regularly trigger user-initiated clarification subdialogues. These might appear off-topic or out of place but when understood in terms of the membership and topic/activity analysis, it becomes clear that all such contributions are, in a very strong sense, 'predictable'. These results can, and arguably should,2 be exploited in the following ways. One is to extend dialogue system design to be able to meet these contingently relevant contributions whenever they occur. That is, we adapt the dialogue manager, lexical database, etc. 2 Doran et al. (2001) demonstrate a negative relationship between the number of initiative attempts and their success rate.</Paragraph> <Paragraph position="1"> so that precisely these apparently out-of-domain topics are covered. A second strategy is to determine discourse conditions that can be used to alert the dialogue system to the likely occurrence or absence of these kinds of clarificatory subdialogues (see below). Third, we can design explicit strategies for interaction that will reduce the likelihood that a user will employ them: for example, by providing information about the agent's capabilities, etc. 
as listed in Table 2 in advance by means of system-initiated assertions. That is, we can guide, or shape, to use the terminology introduced by Zoltan-Ford (1991), the users' linguistic behaviour. A combination of these three capabilities promises to improve the overall quality of a dialogue system and forms the basis for a significant part of our current research.</Paragraph> <Paragraph position="2"> We have already ascertained empirically discourse conditions that support the second strategy above, and these follow again directly from the basic notions of recipient design and membership analysis. If a user already has a strong membership analysis in place--for example, due to preconceptions concerning the abilities (or, more commonly, lack of abilities) of the artificial agent--then this influences the design of that user's utterances throughout the dialogue. As a consequence, we have been able to define distinctive linguistic profiles that lead to the identification of distinct user groups that differ reliably in their dialogue strategies, particularly in their initiation of subdialogues. In the human-robot dialogues just considered, for example, we found that eight out of 21 users did not employ any clarification questions at all and an additional four users asked only a single clarification question. Providing these users with additional information about the robot's capabilities is of limited utility because these users found ways to deal with the situation without asking clarification questions. The second group of participants consisted of nine users; this group used many questions that would have led into potentially problematic clarification dialogues if the system had been real. 
For these users, the presentation of additional information on the robot's capabilities would be very useful.</Paragraph> <Paragraph position="3"> It proved possible to distinguish the members of these two groups reliably simply by attending to their initial dialogue contributions. This is where their pre-interaction membership analysis was most clearly expressed. In the human-robot dialogues investigated, there is no initial utterance from the robot; the user has to initiate the interaction. Two principally different types of first utterance were apparent: whereas one group of users begins the interaction with task-instructions, a second group begins the dialogue by means of a greeting, an appeal for help, or a question with regard to the capabilities of the system. These two different ways of approaching the system had systematic consequences for the dialogue structure.
Table 2:
domain | example (translation)
perception | VP7-3 [do you see the cups?]
readiness | VP4-25 [Are you ready for another task?]
functional capabilities | VP19-11 [what can you do?]
linguistic capabilities | VP18-7 [Or do you only know mugs?]
cognitive capabilities | VP20-15 [do you know where is left and right of you?]
Table 3: ... single, or frequent clarification questions depending on first utterance. N = 21; average number of clarification questions for task-oriented group: 1.17 clarification questions per dialogue; average number for 'greeting' group: 3.2; significance by t-test p &lt; 0.01.</Paragraph> <Paragraph position="4"> The dependent variable investigated is the number of utterances that initiate clarification subdialogues. The results of the analysis show that those who greet the robot or interact with it other than by issuing commands initiate clarificatory subdialogues significantly more often than those who start with an instruction (cf. Table 3). 
Thus, user modelling on the basis of the first utterance in these dialogues can be used to predict much of users' linguistic behaviour with respect to the initiation of clarification dialogues. Note that for this type of user modelling no previous information about the user is necessary and group assignment can be carried out unobtrusively by means of simple key word spotting on the first utterance.</Paragraph> <Paragraph position="5"> Whereas the avoidance of clarificatory user-initiated subdialogues is clearly a benefit, we can also use the results of our empirical investigations to motivate improvements in the other areas of interactive work undertaken by speakers. In particular, topic and activity analysis can become problematic when the decompositions adopted by a user are either insufficient to structure dialogue appropriately for interpretation or, worse, are incompatible with the domain models maintained by the artificial agent. In the latter case, communication will either fail or invoke rechecking of membership categories to find a basis for understanding (e.g., 'do you know what cups are?'). Thus, what can be seen on the part of a user as reducing the complexity of a task can in fact be removing information vital for the artificial agent to effect successful interpretation.</Paragraph> <Paragraph position="6"> The results of a user's topic and activity analysis make themselves felt in the divergent dialogue structures observed. As shown above in Figure 1, the structure of the dialogues is thus much flatter than the one found in the corresponding HHC dialogues, such that goal description and marking of subtasks is missing, and the only structure results from the division into selection and combination of parts. In our second study, precisely the same effects are observed. 
The task of measuring distances between objects is often decomposed into 'simpler' subtasks; for example, the complexity of the task is reduced by achieving reference to each of the objects first before the robot is requested to measure the distance between them.</Paragraph> <Paragraph position="7"> This potential mismatch between user and system can also be identified on the basis of the interaction. Proceeding directly to issuing low-level instructions rather than providing background general goal information is a clear linguistically recognisable cue that a nonaligned topic/activity analysis has been adopted. A successful dialogue system can therefore rely on this dialogue transition as providing an indication of problems to come, which can again be avoided in advance by explicit system-initiated assertions of information.</Paragraph> <Paragraph position="8"> Our main focus in this paper has been on setting out and motivating some generic principles for dialogue system design. These principles could find diverse computational instantiations and it has not been our aim to argue for any one instantiation rather than another. However, to conclude, we summarise briefly the approach that we are adopting to incorporating these mechanisms within our own dialogue system (Ross et al., 2005).</Paragraph> <Paragraph position="9"> Our system augments an information-state based approach with a distinguished vocabulary of discourse transitions between states. We attach 'conceptualisation-conditions' to these transitions which serve to post discourse goals whose particular function is to head off user-initiated clarification. The presence of a greeting is one such condition; the immediate transition to basic-level instructions is another. Recognition and production of instructions is aided by treating the semantic types that occur ('cups', 'measure', 'move', etc.) as elements of a domain ontology. 
The diverse topic/activity analyses then correspond to the specification of the granularity and decomposition of activated domain ontologies. Similarly, location analyses correspond to common sense geographies, which we model in terms similar to those of ontologies now being developed for Geographic Information Systems (Fonseca et al., 2002).</Paragraph> <Paragraph position="10"> The specification of conceptualisation-conditions triggered by discourse transitions and classifications of the topic/activity analysis given by the semantic types provided in user utterances represents a direct transfer of the implicit strategies found in conversation analyses to the design of our dialogue system. For example, in our case many simple clarifications like 'do you see the cups?,' 'how many cups do you see?' as well as 'what can you do?' are prevented by providing information in advance on what the robot can perceive to those users that use greetings.</Paragraph> <Paragraph position="11"> Similarly, during a scene description where the system has the initiative, the opportunity is taken to introduce terms for the objects it perceives as well as appropriate ways of describing the scene, e.g., by means of 'There are two groups of cups.</Paragraph> <Paragraph position="12"> What do you want me to do?' a range of otherwise necessary clarificatory questions is avoided. Even in the case of failure, users will not doubt those capabilities of the system that it has displayed itself, due to alignment processes also observable in human-to-human dialogical interaction (Pickering and Garrod, 2004). After a successful interaction, users expect the system to be able to process parallel instructions because they reliably expect the system to behave consistently (Fischer and Batliner, 2000).</Paragraph> </Section> </Paper>