<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1306"> <Title>Multidimensional Dialogue Management</Title> <Section position="4" start_page="37" end_page="38" type="metho"> <SectionTitle> 2 The DIT dialogue act taxonomy </SectionTitle> <Paragraph position="0"> Based on studies of a variety of dialogues from several dialogue corpora, a dialogue act taxonomy was developed consisting of a number of dimensions, reflecting the idea that during a dialogue, several aspects of the communication need to be attended to by the dialogue participants (Bunt, 2006). Even within single utterances, several aspects are dealt with at the same time, i.e., in general, utterances are multifunctional. The multidimensional organisation of the taxonomy supports this multifunctionality in that it allows several dialogue acts to be performed in each utterance, at most one from each dimension. The 11 dimensions of the taxonomy are listed below, with brief descriptions and/or specific dialogue act types in that dimension. For convenience, the dimensions are further grouped into so-called layers. At the top level are two layers: one for dialogue control acts and one coinciding with the task-domain dimension. Dialogue control is further divided into 3 layers: Feedback (2 dimensions), Interaction Management (7 dimensions), and a layer coinciding with the Social Obligations Management dimension.</Paragraph> <Paragraph position="1"> * Dialogue Control
- Feedback
1. Auto-Feedback: acts dealing with the speaker's processing of the addressee's utterances; contains positive and negative feedback acts on the levels of perception, interpretation, evaluation, and execution;
2. Allo-Feedback: acts dealing with the addressee's processing of the speaker's previous utterances (as viewed by the speaker); contains positive and negative feedback-giving acts and feedback elicitation acts, both on the levels of perception, interpretation, evaluation, and execution;
- Interaction Management
3.
Turn Management: turn accepting, giving, grabbing, keeping;
4. Time Management: stalling, pausing;
5. Dialogue Structuring: opening, preclosing, closing, dialogue act announcement;
6. Partner Processing Management: completion, correct-misspeaking;
7. Own Processing Management: error signalling, retraction, self-correction;
8. Contact Management: contact check, contact indication;
9. Topic Management: topic introduction, closing, shift, shift announcement;
10. Social Obligations Management: salutation, self-introduction, gratitude, apology, valediction;
11. Task/domain: acts that concern the specific underlying task and/or domain.</Paragraph> <Paragraph position="2"> Formally, a dialogue act in DIT consists of a Semantic Content and a Communicative Function, the latter specifying how the information state of the addressee is to be updated with the former. A dialogue act in a particular dimension may have either a dimension-specific communicative function, or a General-Purpose communicative function with a content type (type of semantic content) in that dimension. The general-purpose communicative functions are hierarchically organised into the branches of Information Transfer and Action Discussion functions, Information Transfer consisting of information-seeking (e.g., WH-QUESTION, YN-QUESTION, CHECK) and information-providing functions (e.g., INFORM, WH-ANSWER, YN-ANSWER, CONFIRM), and Action Discussion consisting of commissives (e.g., OFFER, PROMISE, ACCEPT-REQUEST) and directives (e.g., INSTRUCT, REQUEST, DECLINE-OFFER). The taxonomy is currently being evaluated in annotation experiments, involving several annotators and several dialogue corpora. Measuring inter-annotator agreement will give an indication of the usability of the taxonomy and annotation scheme.
A first analysis has resulted in promising scores (Geertzen and Bunt, 2006).</Paragraph> </Section> <Section position="5" start_page="38" end_page="39" type="metho"> <SectionTitle> 3 The DIT context model </SectionTitle> <Paragraph position="0"> The Information State according to DIT is represented by a Context Model, containing all information considered relevant for interpreting user utterances (in terms of dialogue acts) and generating system dialogue acts (leading to system utterances). The contents of the context model are therefore very closely related to the dialogue act taxonomy; in (Bunt and Keizer, 2005) it is argued that the context model serves as a formal semantics for dialogue annotation, such an annotation being a kind of underspecified semantic representation. In combination with additional general conceptual considerations, the context model has evolved into a five-component structure:
1. Linguistic Context: linguistic information about the utterances produced in the dialogue so far (a kind of 'extended dialogue history'); information about planned system dialogue acts (a 'dialogue future');
2. Semantic Context: contains current information about the task/domain, including assumptions about the dialogue partner's information;
3. Cognitive Context: the current processing states of both participants (on the levels of perception, interpretation, evaluation, and task execution), as viewed by the speaker;
4. Physical and Perceptual Context: the perceptible aspects of the communication process and the task/domain;
5. Social Context: current communicative pressures.</Paragraph> <Paragraph position="1"> In Figure 1, a feature structure representation of the context model is given, in which the five components have been specified in further detail. This specification forms the basis for the dialogue manager being implemented in the PARADIME project.
The Linguistic Context contains features for storing dialogue acts performed in the dialogue so far: user utts and system utts, having lists of dialogue act representations as values. It also has features for information about topics and conversational structure: topic struct and conv state respectively. Finally, there are two features that are related to the actual generation of system dialogue acts: candidate dial acts stores the dialogue acts generated by the dialogue act agents, and dial acts pres stores combined dialogue acts for presentation as system output; in Section 4, this will be discussed in more detail.</Paragraph> <Paragraph position="2"> The specification of the Semantic Context is determined by the character of the task-domain. In Section 4.1, the task-domain of interactive question-answering on encyclopedic medical information will be discussed, and from that, the specification of the Semantic Context for this purpose will be derived. The Cognitive Context is specified by means of two features, representing the processing states of the system (own proc state) and the user (partner proc state). Both features indicate whether or not a processing problem was encountered, and if so, on which level of processing this happened. The Physical and Perceptual Context is considered not to be relevant for the current system functionality.
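As an illustration, the feature structure just described might be rendered as follows (a minimal Python sketch; the class and attribute names are ours, simplified from Figure 1, and not those of the actual PARADIME implementation):

```python
from dataclasses import dataclass, field
from typing import Optional

# The processing levels distinguished in DIT: perception, interpretation,
# evaluation, and (task) execution.
LEVELS = ("perception", "interpretation", "evaluation", "execution")

@dataclass
class LinguisticContext:
    user_utts: list = field(default_factory=list)    # dialogue acts recognised so far
    system_utts: list = field(default_factory=list)  # dialogue acts produced so far
    topic_struct: list = field(default_factory=list)
    conv_state: str = "opening"
    candidate_dial_acts: list = field(default_factory=list)  # written by the agents
    dial_acts_pres: list = field(default_factory=list)       # combined acts for output

@dataclass
class CognitiveContext:
    # None means no processing problem; otherwise one of LEVELS.
    own_proc_state: Optional[str] = None
    partner_proc_state: Optional[str] = None

@dataclass
class SocialContext:
    # e.g. {"grt": True}: a reactive pressure to respond to a greeting.
    reactive_pressures: dict = field(default_factory=dict)
    interactive_pressures: dict = field(default_factory=dict)

@dataclass
class ContextModel:
    linguistic: LinguisticContext = field(default_factory=LinguisticContext)
    semantic: dict = field(default_factory=dict)  # task-specific; see Section 4.1
    cognitive: CognitiveContext = field(default_factory=CognitiveContext)
    social: SocialContext = field(default_factory=SocialContext)
```

The Physical and Perceptual Context is omitted here, mirroring its treatment in the current system.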
The Social Context is specified in terms of reactive and interactive pressures; the corresponding features indicate whether or not a pressure exists and if so, for which social obligations management act it is a pressure (e.g., reactive pressures: grt indicates a pressure for the system to respond to a greeting).</Paragraph> </Section> <Section position="6" start_page="39" end_page="42" type="metho"> <SectionTitle> 4 Dialogue Act Agents </SectionTitle> <Paragraph position="0"> Having discussed the dialogue act taxonomy and context model in DIT, we can now move on to the dialogue management approach that is also closely connected to these concepts. Having 11 dimensions of dialogue acts that each attend to a different aspect of communication, the generation of (system) dialogue acts should also happen along those 11 dimensions. As a dialogue act in a dimension can be selected independently of the other dimensions, we propose to divide the generation process over 11 Dialogue Act Agents operating in parallel on the information state of the system, each agent dedicated to generating dialogue acts from one particular dimension.</Paragraph> <Paragraph position="1"> All of the dialogue act agents continuously monitor the context model and, if appropriate, try to generate candidate dialogue acts from their associated dimension.
This process of monitoring and act generation is modelled through a triggering mechanism: if the information state satisfies the agent's triggering conditions, i.e., if there is a motivation for generating a dialogue act from a particular dimension, the corresponding agent gets triggered and tries to generate such a dialogue act.</Paragraph> <Paragraph position="2"> For example, the Auto-Feedback Agent gets triggered if a processing problem is recorded in the Own Processing State of the Cognitive Context.</Paragraph> <Paragraph position="3"> The agent then tries to generate a negative auto-feedback act in order to solve the processing problem (e.g., &quot;Could you repeat that please?&quot; or &quot;Did you say 'five'?&quot;). The Auto-Feedback Agent may also be triggered if it has reason to believe that the user is not certain that the system has understood a previous utterance, or simply if it has not given any explicit positive feedback for some time. In these cases of triggering, the agent tries to generate a positive auto-feedback act.</Paragraph> <Paragraph position="4"> Hence the dialogue management process involves 11 dialogue act agents that operate in parallel on the context model. The dialogue acts generated by these agents are kept in the linguistic context as candidates. The selection of dialogue acts from different dimensions may happen independently, but for their order of performance and their combination, the relative importance of the dimensions at the given point in the dialogue has to be taken into account.</Paragraph> <Paragraph position="5"> An additional Evaluation Agent monitors the list of candidates and decides which of them can be combined into a multifunctional system utterance for generation, and when.
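The triggering mechanism just described can be sketched as follows (an illustrative Python sketch under our own naming; the paper does not specify the implementation interface):

```python
class DialogueActAgent:
    """Base class: one agent per DIT dimension."""
    dimension = None

    def triggered(self, context):
        """True iff the information state satisfies the triggering conditions."""
        raise NotImplementedError

    def generate(self, context):
        """Try to produce a candidate dialogue act; may return None."""
        raise NotImplementedError

class AutoFeedbackAgent(DialogueActAgent):
    dimension = "auto-feedback"

    def triggered(self, context):
        # Fires when a processing problem is recorded in the Own
        # Processing State of the Cognitive Context.
        return context["cognitive"]["own_proc_state"] is not None

    def generate(self, context):
        level = context["cognitive"]["own_proc_state"]
        return {"dimension": self.dimension,
                "function": "negative-auto-feedback",
                "level": level}

def run_agents(agents, context):
    """Each triggered agent writes a candidate act into the Linguistic Context."""
    for agent in agents:
        if agent.triggered(context):
            act = agent.generate(context)
            if act is not None:
                context["linguistic"]["candidate_dial_acts"].append(act)

# Example: a recorded perception problem triggers the Auto-Feedback Agent.
context = {"cognitive": {"own_proc_state": "perception"},
           "linguistic": {"candidate_dial_acts": []}}
run_agents([AutoFeedbackAgent()], context)
```

The Evaluation Agent would then inspect `candidate_dial_acts` to decide which candidates to combine into a system utterance, and when.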
Some of the dialogue act candidates may have higher priority and should be generated at once, some may be stored for possible generation in later system turns, and some will already be implicitly performed through the performance of other candidate acts.</Paragraph> <Section position="1" start_page="40" end_page="42" type="sub_section"> <SectionTitle> 4.1 A dialogue manager for interactive QA </SectionTitle> <Paragraph position="0"> The current implementation of the PARADIME dialogue manager is integrated in an interactive question-answering (QA) system, as developed in the IMIX multiproject. The task-domain at hand concerns encyclopedic information in the medical domain, in particular RSI (Repetitive Strain Injury). The system consists of several input analysis modules (ASR, syntactic analysis in terms of dependency trees, and shallow semantic tagging), three different QA modules that take self-contained domain questions and return answers retrieved from several electronic documents with text data in the medical domain, and a presentation module that takes the output from the dialogue manager, possibly combining several QA answers to be presented, into a multimodal system utterance.</Paragraph> <Paragraph position="1"> The dialogue management module provides support for more interactive, coherent dialogues, in which problems can be solved about both communication and question-answering processes.
In interaction with the user, the system should play the role of an Information Search Assistant (ISA).</Paragraph> <Paragraph position="2"> This HCI metaphor posits that the dialogue system is not an expert on the domain, but merely assists the user in formulating questions about the domain that will lead to QA answers from the QA modules satisfying the user's information need (Akker et al., 2005).</Paragraph> <Paragraph position="3"> In the context model for this dialogue manager, as represented by the feature structure in Figure 1, the Semantic Context has been further specified according to this underlying task. It contains a state variable for keeping track of the question-answering process (the feature task progress with values to distinguish between the states of composing a self-contained question to send to the QA modules, waiting for the QA results in case a QA question has been sent, evaluating the QA results, and discussing the results with the user). Also, the Semantic Context keeps a record of the user's information need, by means of a list user info needs of 'information need' specifications in terms of semantic descriptions of domain questions and whether or not these info-needs have been satisfied. For the first version of the dialogue manager we have defined a limited system functionality, and following from that a simplified version of the dialogue act taxonomy. This simplification means for example that Social Obligations Management (SOM) and the various dimensions in the Interaction Management (IM) layer have been merged into one dimension, following the observation that utterances with a SOM function very often also have a function in the IM layer, especially in human-computer dialogue; see (Bunt, 2000b). Also, several general-purpose communicative functions have been clustered into single types.
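The task-specific Semantic Context described above might be represented as follows (a Python sketch; the type names are ours, and the four states paraphrase the task progress values named in the text):

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskProgress(Enum):
    # The four QA-process states tracked by the task progress feature.
    COMPOSING = "composing a self-contained question"
    WAITING = "waiting for QA results"
    EVALUATING = "evaluating QA results"
    DISCUSSING = "discussing the results with the user"

@dataclass
class InfoNeed:
    # The paper uses semantic descriptions of domain questions;
    # a plain string stands in for that here.
    question: str
    satisfied: bool = False

@dataclass
class SemanticContext:
    task_progress: TaskProgress = TaskProgress.COMPOSING
    user_info_needs: list = field(default_factory=list)

# Example: record a new information need and advance the QA process.
ctx = SemanticContext()
ctx.user_info_needs.append(InfoNeed("what is RSI?"))
ctx.task_progress = TaskProgress.WAITING
```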
Table 1 lists the dialogue acts that the dialogue act recogniser is able to identify from user utterances.</Paragraph> <Paragraph position="4"> Task-domain acts generated by the dialogue manager, generally answers to questions about the domain, consist of a general-purpose function (either a WH-ANSWER or UNC-WH-ANSWER, the latter reflecting that the speaker is uncertain about the information provided) with a semantic content containing the answers obtained from QA.</Paragraph> <Paragraph position="6"> The above considerations have resulted in a dialogue manager containing 4 dialogue act agents that operate on a slightly simplified version of the context model as specified in Figure 1: a Task-Oriented (TO) Agent, an Auto-Feedback (AUF) Agent, an Allo-Feedback (ALF) Agent, and an Interaction Management and Social Obligations Management (IM-SOM) Agent. In addition, a (currently very simple) Evaluation Agent takes care of merging candidate dialogue acts for output presentation.</Paragraph> <Paragraph position="7"> In Appendices A.1 and A.2, two example dialogues with the IMIX demonstrator system are given, showing system responses based on candidate dialogue acts from several dialogue act agents. The ISA metaphor is reflected in the system behaviour especially in the way in which QA results are presented to the user. In system utterances S2 and S3 in Appendix A.1, for example, the answer derived from the retrieved QA results is isolated from the first part of the system utterance, showing that the system has a neutral attitude concerning that answer.</Paragraph> <Paragraph position="8"> The TO-Agent is dedicated to the generation of task-specific dialogue acts, which in practice involves ANSWER dialogue acts intended to satisfy the user's information need about the (medical) domain as indicated through his/her domain questions. The agent is triggered if a new information need is recorded in the Semantic Context.
Once it has been triggered, the agent sends a request to the QA modules to come up with answers to the question asked, and evaluates the returned results. This evaluation is based on the number of answers received and the confidence scores of the answers; the confidence scores are also part of the output of the QA modules. If the QA did not find any answers or if the answers produced had confidence scores that were all below some lower threshold, the TO-Agent will not generate a dialogue act, but write an execution problem in the Own Processing State of the Cognitive Context (which causes the Auto-Feedback Agent to be triggered, see Section 4.1.2; an example can be found in the dialogue in Appendix A.2). Otherwise, the TO-Agent tries to make a selection from the QA answers to be presented to the user. If this selection would end up containing very many answers, again, an execution problem is written in the Cognitive Context (the question might have been too general to be answerable). Otherwise, the selection will be included in an answer dialogue act, either a WH-ANSWER, or an UNC-WH-ANSWER (uncertain wh-answer) in case the confidence scores are below some upper threshold. System utterances S1 and S2 in the example dialogue in Appendix A.1 illustrate this variation. The selection is narrowed down further if there is a subselection of answers with confidences that are significantly higher than those of the other answers in the selection.</Paragraph> <Paragraph position="9"> The AUF-Agent is dedicated to the generation of auto-feedback dialogue acts. It currently produces negative auto-feedback acts on the levels of interpretation (&quot;I didn't understand what you said&quot;), evaluation (&quot;I do not know what to do with this&quot;) and execution (&quot;I could not find any answers to your question&quot;). It may also decide to occasionally give positive feedback to the user.
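The TO-Agent's evaluation of QA results, described above, can be sketched as follows (an illustrative Python sketch; the concrete threshold values and names are our assumptions, since the paper only says "some lower/upper threshold"):

```python
# Illustrative thresholds; the paper does not give concrete values.
LOWER, UPPER = 0.2, 0.8
MAX_ANSWERS = 5  # beyond this, the question is treated as too general

def evaluate_qa_results(answers, context):
    """answers: list of (text, confidence) pairs returned by the QA modules.

    Returns an answer dialogue act, or None after recording an
    execution problem in the Own Processing State of the Cognitive
    Context (which would trigger the Auto-Feedback Agent).
    """
    viable = [(t, c) for t, c in answers if c >= LOWER]
    if not viable or len(viable) > MAX_ANSWERS:
        context["cognitive"]["own_proc_state"] = "execution"
        return None
    # Narrow the selection to answers whose confidence stands out, if any.
    best = max(c for _, c in viable)
    selection = [(t, c) for t, c in viable if c >= best - 0.1]
    function = "WH-ANSWER" if best >= UPPER else "UNC-WH-ANSWER"
    return {"dimension": "task", "function": function,
            "content": [t for t, _ in selection]}
```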
In the future, we would also like this agent to be able to generate articulate feedback acts, for example with the purpose of resolving reference resolution problems, as in:
U: what is RSI?
S: RSI (repetitive strain injury) is a pain or discomfort caused by small repetitive movements or tensions.</Paragraph> <Paragraph position="9">
U: how can it be prevented?
S: do you mean 'RSI' or 'pain'?
The ALF-Agent is dedicated to the generation of allo-feedback dialogue acts. For example, it may generate a feedback-elicitation act if it has reason to believe that the user might not be satisfied with an answer (&quot;Was this an answer to your question?&quot;).</Paragraph> <Paragraph position="10"> The IM-SOM Agent is dedicated to the generation of social obligations management acts, possibly also functioning as dialogue structuring acts (opening resp. closing a dialogue through a greeting resp. valediction act). It gets triggered if communicative pressures are recorded in the Social Context. Currently it only responds to reactive pressures as caused by initiative greetings and goodbyes. The example dialogues in Appendices A.1 and A.2 illustrate this type of social behaviour.</Paragraph> </Section> <Section position="2" start_page="42" end_page="42" type="sub_section"> <SectionTitle> 4.1.5 Multi-agent Architecture of the Dialogue Manager </SectionTitle> <Paragraph position="0"> In Figure 2, a schematic overview of the multi-agent dialogue manager is given. It shows the context model with four components (for now, the Physical and Perceptual Context is considered to be of minor importance and is therefore ignored), a set of dialogue act agents, and an Evaluation Agent. The dialogue act agents each monitor the context model and may be triggered if certain conditions are satisfied. The TO-Agent may also write to the Cognitive Context (particularly in case of execution problems).
All agents may construct a dialogue act and write it in the candidates list in the Linguistic Context. The Evaluation Agent monitors this candidates list and selects one or more dialogue acts from it for presentation as system output. A control module may then take this combination of dialogue acts for presentation at any time and send it to the presentation module to produce a system utterance.</Paragraph> <Paragraph position="1"> With this initial design of a multi-agent dialogue manager, the system is able to support multifunctional output. The beginning of the example dialogue in Appendix A.1 illustrates multifunctionality, both in input interpretation and output generation. The system has recognised two dialogue acts in processing U1 (a conventional opening and a domain question), and S1 is generated on the basis of two candidate dialogue acts generated by different dialogue act agents: the IM-SOM Agent (which generated the react-greeting act) and the TO-Agent (which generated the answer act).</Paragraph> </Section> </Section></Paper>