<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0226"> <Title>Bridging the Gap Between Dialogue Management and Dialogue Models</Title> <Section position="3" start_page="0" end_page="5" type="metho"> <SectionTitle> 2 The Gap </SectionTitle> <Paragraph position="0"> Why do most working SDSs make little use of dialogue models in their dialogue management? In other words, why is there a gap between dialogue management and dialogue models? To make this clear, we first distinguish between dialogue models and dialogue management models, or equivalently, between dialogue modeling and dialogue management modeling. The goal of dialogue modeling is to develop general theories of (cooperative task-oriented) dialogues, to uncover the universals in dialogues and, if appropriate, to provide dialogue management with theoretical support. It takes an analyzer's point of view. The goal of dialogue management modeling, by contrast, is to integrate a dialogue model with a task model in some specific domain in order to &quot;develop algorithms and procedures to support a computer's participation in a cooperative dialogue&quot; (Cohen, 1998, p.204). It takes the viewpoint of a dialogue system designer.</Paragraph> <Paragraph position="1"> Next, we briefly review the main approaches to dialogue modeling and dialogue management, and then point out the causes behind the gap.</Paragraph> <Section position="1" start_page="1" end_page="2" type="sub_section"> <SectionTitle> 2.1 Dialogue Models </SectionTitle> <Paragraph position="0"> There are mainly two approaches to dialogue modeling: pattern-based and plan-based.</Paragraph> <Paragraph position="1"> The distinction between dialogue models and dialogue management models is close to the one Cohen (1998) draws for dialogue modeling. He distinguishes &quot;two related, but at times conflicting, research goals ... often adopted by researchers of dialogue&quot;. Roughly speaking, one is theoretical and the other is practical.</Paragraph> <Paragraph position="2"> Cohen (1998) gives a more detailed discussion of dialogue modeling, on which we draw heavily below. He mentions &quot;three approaches to modeling dialogue - dialogue grammars, plan-based models of dialogue, and joint action theories of dialogue&quot;. We treat joint action theories as a further development of the original plan-based approach, so his latter two correspond to our plan-based approach in general.</Paragraph> <Paragraph position="3"> The pattern-based approach models recurrent interaction patterns or regularities in dialogues at the illocutionary force level of speech acts (Austin, 1962; Searle, 1969) in terms of dialogue grammars (Sinclair and Coulthard, 1975), dialogue/conversational games (Carlson, 1983; Kowtko et al., 1992; Mann, 2001), or adjacency pairs (Sacks et al., 1974). It benefits greatly from the insights of discourse analysis (Sinclair and Coulthard, 1975; Coulthard, 1992; Brown and Yule, 1983) and conversation analysis (Levinson, 1983).</Paragraph>
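<Paragraph> As an illustration of the pattern-based view, an interaction pattern can be written down as a small set of admissible move sequences and checked mechanically. The sketch below is only a toy rendering of the idea of a dialogue grammar or conversational game; the move labels and patterns are invented for illustration and are not taken from the works cited above.

# Toy dialogue grammar in the spirit of conversational games: an
# exchange is an initiating move, a responding move, and an optional
# acknowledgement. Move labels and patterns are illustrative only.
EXCHANGE_PATTERNS = [
    ('REQUEST', 'ANSWER'),
    ('REQUEST', 'ANSWER', 'ACK'),
    ('OFFER', 'ACCEPT'),
    ('OFFER', 'REJECT'),
]

def is_well_formed_exchange(moves):
    # True if the move sequence matches one of the listed patterns.
    return tuple(moves) in set(EXCHANGE_PATTERNS)

print(is_well_formed_exchange(['REQUEST', 'ANSWER', 'ACK']))  # True
print(is_well_formed_exchange(['ANSWER', 'REQUEST']))         # False
</Paragraph>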
<Paragraph position="4"> The plan-based approach relates the speech acts performed in utterances to plans and complex mental states (Cohen and Perrault, 1979; Allen and Perrault, 1980; Lochbaum et al., 2000) and uses AI planning techniques (Fikes and Nilsson, 1971). Later developments of plan-based dialogue models include multilevel plan extensions (Litman and Allen, 1987; Litman and Allen, 1990; Carberry, 1990; Lambert and Carberry, 1991), theories of joint action (Cohen and Levesque, 1991), and SharedPlan (Grosz and Sidner, 1990; Grosz and Kraus, 1996).</Paragraph> <Paragraph position="5"> The pattern-based dialogue model describes what happens in dialogues at the speech act level and cares little about why. The plan-based dialogue model explains why agents act in dialogues, but at the expense of complex representation and reasoning. In other words, the former is shallow and descriptive while the latter is deep and explanatory. Hulstijn (2000) argues for the complementary aspects of the two approaches and claims that &quot;dialogue games are recipes for joint action&quot;.</Paragraph> <Paragraph position="6"> On the one hand, our target tasks belong to the class of simple services, such as information-seeking and simple transactions, which are relatively well-structured and well-defined and not too complex for pattern-based dialogue models. On the other hand, there are significant problems in using plan-based models in practical SDSs - those of &quot;knowledge representation, knowledge engineering, computational complexity, and noisy input&quot; (Allen et al., 2000). We therefore choose the pattern-based rather than the plan-based dialogue model as our theoretical basis for practical dialogue management at present.</Paragraph> </Section> <Section position="2" start_page="2" end_page="3" type="sub_section"> <SectionTitle> 2.2 Dialogue Management Models </SectionTitle> <Paragraph position="0"> We view dialogue management as an organic combination of a dialogue model with a task model in some specific domain. Its basic functionalities include interpretation in context, generation in context, task management, interaction management, choice of dialogue strategies, and context management. All of them require contextual (linguistic and/or world) knowledge.</Paragraph> <Paragraph position="1"> According to how the task model and the dialogue model are used, approaches to dialogue management can be classified into the four categories in Table 1.</Paragraph> <Paragraph position="2"> DITI, or graph-based: both the dialogue model and the task model are implicit. Dialogue is controlled via finite state transitions (McTear, 1998). Topic flow is predetermined. This approach is neither flexible nor natural, but it is simple and efficient. It is suitable for simple and well-structured tasks similar to automated services over ATMs or telephones with DTMF input.</Paragraph> <Paragraph position="3"> DITE, or frame-based: there is no explicit dialogue model, but the task is explicitly represented as a frame or a form (Goddeau et al., 1996), a task description table (Lin et al., 1998), a topic forest (Wu et al., 2000), an agenda (Xu and Rudnicky, 2000), etc. Both system and user may take the initiative, and topic flow is not predetermined. It is more flexible than DITI, but still far from natural and friendly, since it makes no explicit use of dialogue models. Most working SDSs adopt this style of dialogue management.</Paragraph>
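<Paragraph> To make the frame-based style concrete, here is a minimal slot-filling sketch. The slot names, prompts, and control policy are invented for illustration and are not taken from the systems cited above; a real DITE system would add confirmation, error handling, and access to a backend.

# Minimal frame-based (DITE) dialogue manager sketch: the task is a
# frame of slots; the system asks for whichever obligatory slot is
# still empty, and the user may fill several slots in one turn.
frame = {'route': None, 'start_time': None, 'num_tourists': None}
obligatory = ['route']
prompts = {
    'route': 'Which route or sight spot are you interested in?',
    'start_time': 'When would you like to start?',
    'num_tourists': 'How many people will travel?',
}

def next_system_act(frame):
    # Ask for the first missing obligatory slot, otherwise answer.
    for slot in obligatory:
        if frame[slot] is None:
            return ('ask', slot, prompts[slot])
    return ('inform', None, 'Here is the information you asked for.')

def update(frame, filled_slots):
    # filled_slots: slot/value pairs extracted from the user turn.
    for slot, value in filled_slots.items():
        if slot in frame:
            frame[slot] = value

update(frame, {'route': 'Fragrant Hills', 'num_tourists': 3})
print(next_system_act(frame))  # the obligatory slot is filled, so inform
</Paragraph>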
<Paragraph position="4"> DETI: there is no practical dialogue management using such a combination of task model and dialogue model. For a more comprehensive discussion of dialogue management (and SDSs), see (McTear, 2002). He identifies two aspects of dialogue control (i.e., dialogue management) - initiative and flow of dialogue - and three strategies for dialogue control - finite-state-based, frame-based, and agent-based. The first two are similar to DITI and DITE respectively, while the third is a collection of other approaches, among them the plan-based one, which are at present hardly applicable to practical dialogue management. Our classification is clearer in this respect.</Paragraph> <Paragraph position="5"> DETE: both the dialogue model and the task model are explicit. This type of dialogue management shares the advantages of the frame-based one. At the same time, it has the potential to allow more natural interactions, depending on the dialogue model used. This is what we are after here.</Paragraph> </Section> <Section position="3" start_page="3" end_page="5" type="sub_section"> <SectionTitle> 2.3 The Causes behind the Gap </SectionTitle> <Paragraph position="0"> From the analysis above we can see that the surface gap between (DITE) dialogue management in most working SDSs and (pattern-based) dialogue models is mainly due to a deeper one, namely the gap between dialogue models and the underlying tasks.</Paragraph> <Paragraph position="1"> There is another important cause - the interaction patterns are described at the level of the speech act or dialogue act.</Paragraph> <Paragraph position="2"> To link dialogue acts to utterances, three problems must be addressed at the same time: (i) a dialogue act classification scheme and its reliability in coding a corpus (Carletta et al., 1997; Allen and Core, 1997; Traum, 1999); (ii) the choice of features/cues that can support automatic dialogue act identification, including lexical, syntactic, prosodic, collocational, and discourse cues; (iii) a model that correlates dialogue acts with those features. (We extend Webber's (2001) idea by splitting feature choice out.)</Paragraph> <Paragraph position="3"> Some of these problems are discussed in (Jurafsky et al., 1998; Stolcke et al., 2000; Jurafsky and Martin, 2000; Jurafsky, 2002). Empirical work on dialogue act classification and recognition did not begin until dialogue corpora (like Map Task, Verbmobil, TRAINS, and our NLPR-TI) became available, and whether dialogue act recognition can be successfully applied to practical dialogue management remains to be seen. So we choose a higher-level construct (UT-3, see section 3.1.3) to describe interaction patterns instead. We are by no means denying the important role the dialogue act plays in dialogue modeling; rather, we try to incorporate higher-level knowledge into dialogue modeling.</Paragraph> <Paragraph position="4"> Following Jurafsky (2002), we adopt the term dialogue act, which captures the illocutionary force or communicative function of a speech act. Though there are arguments in (Levinson, 1983) and elsewhere against using dialogue acts to model dialogues, and there are indeed unresolved problems in linking dialogue acts to utterances, it will be our choice for the time being.</Paragraph>
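<Paragraph> The three problems listed above (a tag scheme, cue features, and a model relating them) can be pictured with a deliberately simplified sketch. The tag set, cue words, and weights below are invented for illustration and are not taken from the cited coding schemes or recognizers; real systems use far richer features and statistical models.

# Toy cue-based dialogue act scorer: each act accumulates evidence
# from lexical cues found in the utterance, and the highest-scoring
# act is returned. Tags, cues, and weights are illustrative only.
CUES = {
    'question':  {'what': 1.0, 'when': 1.0, 'how': 1.0, '?': 2.0},
    'statement': {'is': 0.5, 'will': 0.5, 'costs': 0.5},
    'thanks':    {'thanks': 2.0, 'thank': 2.0},
}

def classify_dialogue_act(utterance):
    tokens = utterance.lower().replace('?', ' ? ').split()
    scores = {}
    for act, cues in CUES.items():
        scores[act] = sum(cues.get(tok, 0.0) for tok in tokens)
    return max(scores, key=scores.get)

print(classify_dialogue_act('When does the tour start?'))  # question
</Paragraph>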
</Section> </Section> <Section position="4" start_page="5" end_page="6" type="metho"> <SectionTitle> 3 The Bridge - GDM </SectionTitle> <Paragraph position="0"> To bridge the gap and address its causes, we propose a generic dialogue model (GDM) for task-oriented dialogues, which consists of five ranks of discourse units and three levels of dialogue dynamics. It captures two important aspects of task-oriented dialogue - interaction patterns at the low level and the underlying task at the high level.</Paragraph> <Section position="1" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 3.1 Discourse Units </SectionTitle> <Paragraph position="0"> We distinguish five ranks of discourse units in describing task-oriented dialogues: dialogue, phase, transaction, utterance group, and utterance.</Paragraph> <Paragraph position="1"> The overall organization of a typical task-oriented dialogue can be divided into three phases, namely an opening phase, a closing phase, and between them a problem-solving phase, which can be subdivided into transactions depending on how the underlying task is divided into subtasks. Each subtask corresponds to a transaction. If a task is atomic, there will be only one transaction in the problem-solving phase, as in the task of tourism information-seeking. In performing a subtask (or the task itself, if atomic), some interaction patterns recur. We call these interaction patterns utterance groups (or groups, for short); they are also called exchanges or conversational games (see section 2.1). The unit at this level involves a complex grounding process towards common ground or mutual knowledge (Clark and Schaefer, 1989; Clark, 1996; Traum, 1994).</Paragraph> <Paragraph position="2"> The elementary unit in our model is the utterance.</Paragraph> <Paragraph position="3"> Every utterance either initiates a new group or continues or ends an old one. Usually it is what a speaker utters in his/her turn (for simplification, overlaps are not considered here), but there are some turns with two or more utterances. These multi-utterance turns usually end an old group with their first utterance and begin a new one with their last utterance. Similar observations are found in the Verbmobil corpus (Alexandersson and Reithinger, 1997). Each utterance can be analyzed at three levels and assigned a type at each level (utterance type, UT): UT-1 sentence type or mood, i.e., declarative, imperative, or interrogative (including yes-no question (ynq), wh-question (whq), alternative question (atq), and disjunctive question (djq)), which can be identified using surface lexical and prosodic features.</Paragraph> <Paragraph position="4"> UT-2 dialogue act, see section 2.3.</Paragraph> <Paragraph position="5"> UT-3 a more general communicative function relative to a group, drawn from a small set including initiative (I), response/reply (R), feedback (F), acknowledgement (A) (typical in information-seeking dialogues), and others. It can be identified from UT-1, semantic content (or utterance topic), and the preceding UT-3s. It is at this level that interaction patterns are more obvious. What is more, UT-3 can be recognized without UT-2 (the dialogue act) and yet contribute to dialogue act recognition.</Paragraph>
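<Paragraph> A minimal data-structure rendering of the five ranks, with the three utterance-type levels attached to the elementary unit, may help fix ideas. It is only a sketch of the hierarchy described above; the field names are ours and carry no theoretical weight.

# Five ranks of discourse units in GDM, with UT-1/UT-2/UT-3 attached
# to the utterance. A sketch, not an implementation.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Utterance:
    text: str
    ut1: Optional[str] = None   # sentence type: decl, imp, ynq, whq, atq, djq
    ut2: Optional[str] = None   # dialogue act
    ut3: Optional[str] = None   # I, R, F, A, ...

@dataclass
class UtteranceGroup:           # one exchange / conversational game
    utterances: List[Utterance] = field(default_factory=list)

@dataclass
class Transaction:              # corresponds to one subtask
    groups: List[UtteranceGroup] = field(default_factory=list)

@dataclass
class Phase:                    # opening, problem-solving, or closing
    name: str
    transactions: List[Transaction] = field(default_factory=list)

@dataclass
class Dialogue:
    phases: List[Phase] = field(default_factory=list)
</Paragraph>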
</Section> <Section position="2" start_page="5" end_page="5" type="sub_section"> <SectionTitle> 3.2 Dialogue Dynamics </SectionTitle> <Paragraph position="0"> By dialogue dynamics we mean the dynamic process within dialogues, i.e., how dialogues flow from one partner's utterance to another's, all the way to the closing. This process includes intra-utterance dynamics (micro-dynamics) and inter-utterance dynamics. Inter-utterance dynamics is further divided into intra-group dynamics (meso-dynamics) and inter-group dynamics (macro-dynamics).</Paragraph> <Paragraph position="1"> Micro-dynamics deals with how discourse phenomena (like anaphora, ellipsis, etc.) within one utterance are decoded (interpretation) or encoded (generation) in discourse context, and how utterance-level intention (the dialogue act) is recognized using lexical, prosodic, and other cues together with discourse structure (see section 2.3). Discourse phenomena carry much discourse-level context information, and they contribute in part to the naturalness and coherence of human-human dialogues. But it is very difficult for computers to make full use of them, either in interpretation or in generation; few present SDSs implement them, though much effort has been put into the study of computational models of discourse phenomena (see (Webber, 2001) for an overview and the references therein for further details).</Paragraph> <Paragraph position="2"> Meso-dynamics explains the utterance-to-utterance moves within one group, which exhibit recurrent interaction patterns. Our corpus study shows that those patterns in information-seeking dialogues are closely related to two factors - initiative and the direction of information flow between user and server (see section 4.1).</Paragraph> <Paragraph position="3"> Macro-dynamics describes inter-group moves, which may take place intra-transactionally within one subtask or inter-transactionally between subtasks. Inter-group moves are subject to the underlying task. It is more difficult to give an account of them than of intra-group moves, because they reflect the process by which a problem is solved; such an account depends on how tasks are represented and reasoned about. We may gain some hints from the study of general problem solving in AI (Bonet and Geffner, 2001).</Paragraph>
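<Paragraph> One way to picture meso-dynamics is as admissible UT-3 transitions within a group. The transition table below is only a guess at a typical information-seeking exchange, offered for illustration; it is not the set of patterns established from the corpus in section 4.

# Sketch of intra-group (meso-dynamic) moves as transitions over UT-3
# labels: I (initiative), R (response), F (feedback), A (acknowledgement).
# The transition table is illustrative.
TRANSITIONS = {
    'START': {'I'},
    'I': {'R'},
    'R': {'F', 'A', 'END'},
    'F': {'R', 'A'},
    'A': {'END'},
}

def is_admissible(ut3_sequence):
    # Walk the sequence and check every step against the table; the
    # group is complete if the final label may be followed by END.
    state = 'START'
    for label in ut3_sequence:
        if label not in TRANSITIONS.get(state, set()):
            return False
        state = label
    return 'END' in TRANSITIONS.get(state, set())

print(is_admissible(['I', 'R', 'A']))  # True under this toy table
print(is_admissible(['R', 'I']))       # False
</Paragraph>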
</Section> <Section position="3" start_page="5" end_page="6" type="sub_section"> <SectionTitle> 3.3 Discussion </SectionTitle> <Paragraph position="0"> GDM as proposed above is a DETE dialogue management framework with fine-grained patterns. We discuss related work and its implications for dialogue management below.</Paragraph> <Paragraph position="1"> Different discourse units are used by different researchers in studying discourse. In (Sinclair and Coulthard, 1975), five ranks of units are used to analyze classroom interactions: lesson, transaction, exchange, move, and act. The first four roughly correspond to our dialogue, transaction, group, and utterance. We add the unit phase and omit act, which is a sub-utterance unit. In (Alexandersson and Reithinger, 1997), four ranks of units are used to analyze meeting-scheduling dialogues: dialogue, phase, turn, and dialogue act. The turn is a natural unit that appears in dialogues, but is it a basic unit? In the theory of conversation acts (Traum and Hinkelman, 1992; Traum, 1994), four units are used to analyze TRAINS (freight-scheduling) dialogues: multiple discourse units (argumentation acts), discourse units (core speech acts), utterance units (grounding acts), and sub-utterance units (turn-taking acts). Their units differ considerably from ours, partly because they pay more attention to grounding.</Paragraph> <Paragraph position="2"> In GDM the structure of discourse is accounted for from two aspects: local structure is reflected in utterance groups and shaped by meso-dynamics; global structure is determined by the underlying task and shaped by macro-dynamics. In view of GDM, this is evident for task-oriented dialogues.</Paragraph> <Paragraph position="3"> In most working SDSs, dialogue strategies are handcrafted by system developers. Recently there have been efforts to apply machine learning approaches to the acquisition of dialogue strategies (Walker, 2000; Levin et al., 2000). We hope to find out what strategies are used in human-human dialogue and how they could be applied to human-computer dialogue. We first refine the concept of dialogue strategies. From the viewpoint of GDM, the strategies a dialogue agent may choose can also be classified into three levels: Micro-level strategies: how to realize information structure, anaphora, ellipsis, and the like in utterances; Meso-level strategies: what to say given the current group status, so as to complete the ongoing group in a more friendly way; Macro-level strategies: how to choose the discourse topic given the current task status, so as to complete the underlying task more efficiently.</Paragraph> <Paragraph position="4"> Grosz and Sidner (1986) proposed a tripartite discourse model consisting of attentional state, intentional structure, and linguistic structure. It is influential and covers both dialogue and text. But their intentional structure fails to capture the distinction between global-level and local-level structure, and their discourse unit - the discourse segment - is used without noting that there are different ranks of discourse units in dialogues. This is partly because they looked more at the similarities between dialogue and text than at the differences between them. Dialogue and text, as two types of discourse, share much in common, but they also differ in important ways.</Paragraph> <Paragraph position="5"> Since dialogue management is closely related to the dialogue model and to the underlying task and domain, the complexity of dialogue management can be decomposed into three parts: the complexity of the dialogue model, the complexity of the task, and the complexity of the domain. The complexity of the dialogue model is affected by what kinds of initiative and dialogue phenomena are allowed. The task complexity is affected by the number of possible actions and by whether the task is well-structured and well-defined. The domain complexity is affected by the domain entities and their relations and by the volume of information. The three are not independent but intertwined.</Paragraph> </Section> </Section> <Section position="5" start_page="6" end_page="11" type="metho"> <SectionTitle> 4 Utterance Groups in GDM-IS </SectionTitle> <Paragraph position="0"> We now apply GDM to information-seeking dialogues (GDM-IS) and search for interaction patterns in the NLPR-TI corpus. We first try to classify and segment utterance groups. This is a preliminary step toward group pattern recognition; details of the recognition process and results will be given in (Xu, 2002).</Paragraph> <Section position="1" start_page="6" end_page="7" type="sub_section"> <SectionTitle> 4.1 Group Classification </SectionTitle> <Paragraph position="0"> Group patterns are recurrent, but how many are there? Is their number limited? In the information-seeking dialogues of our NLPR-TI corpus (see section 4.2.1), we find four basic groups with some variations.</Paragraph> <Paragraph position="1"> The recurrent patterns, according to our observation, can be classified into one of the four types in Table 2 along two dimensions - initiative and the direction of information flow (determined using world knowledge in the domain).</Paragraph> <Paragraph position="2"> Direction of information flow: In information-seeking dialogues there are two directions of information flow, one from user to server (U→S) and the other from server to user (S→U). In the tourism domain, the former includes the intended route (or sight spot, or a rough area; obligatory), the intended start time, and the number of tourists (optional); the latter includes start time, duration, vehicle, price, accommodation, meals, schedule, and more. The server must know the user's intended route before providing the user with other information.</Paragraph> <Paragraph position="3"> Initiative: In GDM, an initiative always starts a new utterance group. It is one of an utterance's general communicative functions relative to a group, together with reply, feedback, and acknowledgement, as mentioned in section 3.1.3. Regarding one group topic there are user initiatives (UI) and server initiatives (SI). Group patterns depend heavily on who initiates the group regarding a specific topic, due to the role asymmetry of the dialogue partners.</Paragraph>
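<Paragraph> The two dimensions just described span a two-by-two space of basic group types (cf. Table 2). The sketch below simply enumerates the four combinations; the glosses are our own reading of typical cases in this domain, not definitions taken from the paper's Table 2.

# Four basic group types along the two dimensions: who initiates the
# group (UI vs. SI) and which way the information flows (U to S vs.
# S to U). The glosses are illustrative.
BASIC_GROUP_TYPES = {
    ('UI', 'U_to_S'): 'user volunteers task information, e.g. the intended route',
    ('SI', 'U_to_S'): 'server asks for task information the user must supply',
    ('UI', 'S_to_U'): 'user asks and the server supplies information, e.g. the price',
    ('SI', 'S_to_U'): 'server offers information without being asked',
}

def classify_group(initiator, info_flow):
    return BASIC_GROUP_TYPES.get((initiator, info_flow), 'unknown')

print(classify_group('UI', 'S_to_U'))
</Paragraph>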
<Paragraph position="4"> Though most groups are covered by the above basic patterns, there are some exceptions that are more complex. They are usually embedded groups. When one partner signals non-understanding or non-hearing, or a normal group is suspended, one or two more utterances are inserted, either to repeat the previous utterance or to resume the suspended group. The embedded groups may also be precondition groups, which occur when some obligatory information is missing before the salient issue can be addressed; once the missing information is provided, the outer group continues. Complex groups can also occur when one partner lists more than one item or performs some repair.</Paragraph> </Section> <Section position="2" start_page="7" end_page="11" type="sub_section"> <SectionTitle> 4.2 Group Segmentation </SectionTitle> <Paragraph position="0"> Given the above group classification, how can groups be recognized? We have to segment and classify groups and determine the UT-3 of every utterance within them. This is a hard problem; only the experiment on group segmentation is reported in this paper.</Paragraph> <Paragraph position="1"> We note that there are task initiative and dialogue initiative (Chu-Carroll and Brown, 1998), and local initiative and global initiative (Rich and Sidner, 1998). Our notion of initiative within a group is more task-related and global. For a comprehensive discussion of mixed-initiative interaction, see (Haller and McRoy, 1998, 1999).</Paragraph> <Paragraph position="2"> To segment a dialogue into groups is first of all to determine the beginning of each group, i.e., to determine whether an utterance is an initiative or not. (Multi-utterance turns are manually segmented beforehand for simplification.)</Paragraph> <Paragraph position="3"> We use the NLPR-TI corpus (Xu et al., 1999) in the experiment. It consists of 60 spontaneous human-human telephone dialogues (about 5.5 hours) on tourism information-seeking. There are 2716 turns in total (1346 by users and 1370 by servers). The average length of a user turn is about 7 Chinese characters and that of a server turn about 9. The transcripts of the first 20 dialogues are used for the current group segmentation.</Paragraph> <Paragraph position="4"> A subject was given the basic ideas about GDM and utterance groups in GDM-IS and segmented two dialogues under an expert's guidance before starting the work.</Paragraph> <Paragraph position="5"> To test the reliability of group segmentation within GDM-IS, we calculate the kappa coefficient (Carletta, 1996; Carletta et al., 1997; Flammia, 1998) to measure pairwise agreement between the subject and the expert.</Paragraph>
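<Paragraph> For reference, the kappa statistic used here is, in one standard two-coder formulation (Cohen's kappa), kappa = (P(A) - P(E)) / (1 - P(E)), where P(A) is the observed agreement between the coders and P(E) the agreement expected by chance; the works cited above use closely related definitions. A minimal self-contained sketch of the computation for binary initiative/non-initiative labels, with invented label sequences:

# Two-coder kappa for a binary segmentation decision
# (initiative vs. non-initiative at each utterance).
from collections import Counter

def kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    p_agree = sum(1 for x, y in zip(labels_a, labels_b) if x == y) / n
    # chance agreement from each coder's marginal label distribution
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    p_chance = sum((count_a[l] / n) * (count_b[l] / n) for l in labels)
    return (p_agree - p_chance) / (1.0 - p_chance)

coder1 = ['I', 'N', 'N', 'I', 'N', 'N', 'I', 'N']
coder2 = ['I', 'N', 'N', 'I', 'N', 'I', 'I', 'N']
print(round(kappa(coder1, coder2), 3))  # 0.75 for these toy sequences
</Paragraph>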
<Paragraph position="6"> The two coders segmented the first 20 dialogues (845 utterances in total), and the kappa value they reached shows a high reliability. Using the expert's segmentation as the reference, we also measure the subject's segmentation with information retrieval metrics - precision (P), recall (R), and F-measure (see Table 3 for the results).</Paragraph> <Paragraph position="7"> Three simple algorithms, listed in Figure 1, are used to perform the same task on the 20 dialogues. The input is a semantic tag sequence produced by a statistical parser (Deng et al., 2000). That we adopt such deep features in discourse segmentation is mainly due to our target application, dialogue management; this makes our approach different from others that use surface features, such as (Passonneau and Litman, 1997).</Paragraph> <Paragraph position="8"> I. Using topic only for segmentation: if the topic is new, then UT-3 = initiative; else UT-3 = non-initiative. II. Using UT-1 only for segmentation: if UT-1 is an interrogative, then UT-3 = initiative; else UT-3 = non-initiative. III. Using both for segmentation: if the topic is new and UT-1 is an interrogative, then UT-3 = initiative; else UT-3 = non-initiative.</Paragraph> <Paragraph position="9"> Given the semantic tag sequence of an utterance, we determine its topic and its UT-1 (we are most interested in the interrogatives: ynq, whq, atq, and djq). We presume that the topic of an utterance is the last one among its candidate tags; this seems problematic but holds for most utterances according to our observation, and how to determine the topic of an utterance needs further study. Since the parser parses with a certain error rate, there will be some wrong semantic tags, which lead to errors in assigning UT-1 and topic. We then use the three simple algorithms to segment the groups in the 20 dialogues. Their performance (also using the expert's segmentation as the reference) is given in Table 3.</Paragraph> </Section> <Section position="3" start_page="11" end_page="11" type="sub_section"> <SectionTitle> 4.3 Discussion </SectionTitle> <Paragraph position="0"> Table 3 shows the results of group segmentation, both manual and automated. Though none of the three algorithms outperforms the human subject, they all show that topic change and an interrogative UT-1 are acceptable and indeed good indicators of the beginning of an utterance group, especially when topic and UT-1 are the only information sources and when the discourse markers (Schiffrin, 1987) of spontaneous speech are unavailable in the current deep analysis.</Paragraph> <Paragraph position="1"> There is no obvious performance difference among the three algorithms in segmenting dialogues into groups. The performance of algorithm I might be improved if the noise introduced by the parser and by our simple topic identification procedure were removed; this implies that topic change is a potentially better indicator of the beginning of new groups. The result using UT-1 only is the worst, partly because not all groups begin with interrogatives and interrogatives do not always occur at the beginning of a group. When both topic and UT-1 are used, the performance changes little, though seemingly more constraints are applied; this is possibly because topic change and interrogative UT-1 overlap a lot.</Paragraph> </Section> </Section> </Paper>