<?xml version="1.0" standalone="yes"?> <Paper uid="E91-1041"> <Title>A DIALOGUE MANAGER USING INITIATIVE-RESPONSE UNITS AND DISTRIBUTED CONTROL</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> A DIALOGUE MANAGER USING INITIATIVE-RESPONSE UNITS AND DISTRIBUTED CONTROL </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> Email: ARJ@IDA.LIU.SE Abstract </SectionTitle> <Paragraph position="0"> This paper describes a system for managing dialogue in a natural language interface. The proposed approach uses a dialogue manager as the overall control mechanism. The dialogue manager accesses domain-independent resources for interpretation, generation and background system access. It also uses information from domain-dependent knowledge sources, which are customized for various applications.</Paragraph> <Paragraph position="1"> Instead of using complex plan-based reasoning, the dialogue manager uses information about possible interaction structures and information from the specific dialogue situation to manage the dialogue. This is motivated by the analysis of a series of experiments where users interacted with a simulated natural language interface. The dialogue manager integrates information about segment types and moves into a hierarchical dialogue tree. The dialogue tree is accessed through a scoreboard which uses exchangeable access functions. The control is distributed and the dialogue is directed by action plans in the nodes of the dialogue tree.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> To achieve true cooperation, a natural language interface must be able to participate in a coherent dialogue with the user.
A common, generally applicable approach is to use plan inference as a basis for reasoning about the intentions of the user, as proposed by, for instance, Allen & Perrault (1980), Litman (1986), Carberry (1989) and Pollack (1986). However, these approaches are not computationally efficient.</Paragraph> <Paragraph position="1"> Reichman (1985) describes a discourse grammar based on the assumption that a conversation can be described using conventionalized discourse rules. Gilbert, Buckland, Frolich, Jirotka & Luff (1990) use interaction rules in their menu-based advisory system. Our approach is similar to those of Reichman and Gilbert et al. In a series of experiments (Dahlbäck & Jönsson, 1989; Jönsson & Dahlbäck, 1988) we studied dialogue behaviour in an information-seeking interaction between a human and a computer using a simulated natural language interface (NLI). One important result was that the users followed a rather straightforward information-searching strategy which could be well described using conventionalized rules.</Paragraph> <Paragraph position="2"> Reichman uses surface linguistic phenomena for recognizing how speakers structure the discourse. We found, however, very little use of surface linguistic cues in our dialogues. In our corpus users normally initiate a request for information, which is followed by an answer from the system. Sometimes the request needs clarification before the answer can be given as a response to the initial question (this is illustrated in sections 4 and 5). Optionally, the user can interrupt the original question and start a new initiative-response unit, but this also follows the goals of information seeking. Thus, we adopt a strategy in which we employ the notion of adjacency pairs (Schegloff & Sacks, 1973; see also Levinson, 1983: 303). In our approach the dialogue is planned and utterances are interpreted in terms of speech acts.
The speech acts are determined on the basis of structural information in the utterance and in the immediate context.</Paragraph> <Paragraph position="3"> Further, we found in our experiments that different configurations of the background system (e.g. database, consultation) and of the task to solve (e.g. information retrieval, configuration) require different mechanisms for handling dialogue in an NLI (Jönsson, 1990). Therefore, one major design criterion is that the system should be easy to adapt (customize) to a new application.</Paragraph> <Paragraph position="4"> The natural language interface described in this paper is constructed on the assumption that different applications have different sublanguages (Grishman & Kittredge, 1987), i.e. subsets of a natural language. A sublanguage is defined not only by a grammar and lexicon, but also by interaction behaviour, i.e. factors such as how the user and system handle clarifications, who takes the initiative, what is cooperative in a certain application, what the user categories are, and so on.</Paragraph> <Paragraph position="5"> The dialogue manager operates as the central controller in the NLI (Ahrenberg, Dahlbäck & Jönsson, 1990).</Paragraph> <Paragraph position="6"></Paragraph> <Paragraph position="7"> It passes information encoded in directed acyclic graphs (dags) between different modules for parsing, generation, etc. This paper, however, only describes the dialogue manager's role in the control of the dialogue. I assume that the dags correctly describe the full meaning of the user's input. For a discussion of the interpretation of user input in this system, see Ahrenberg (1988).
The dialogue manager is implemented in CommonLisp but is currently not completely integrated with the other modules of the system.</Paragraph> <Paragraph position="9"/> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The dialogue manager </SectionTitle> <Paragraph position="0"> The dialogue manager (DM) is the kernel of the natural language interface (see figure 1). It directs the dialogue, assists the instantiator and deep generator, and communicates with the background system. The DM can be viewed as a controller of resources and knowledge sources.</Paragraph> <Paragraph position="1"> The resources in our system are a chart parser (Wirén, 1988), an instantiator which links the linguistic object descriptions to objects in the universe of discourse (Ahrenberg, 1989), a translator which translates the instantiated structures into a form suitable for accessing the background system (initially we use only a relational database system), and finally a deep and a surface generator for generating a system utterance. These resources are domain-independent processes accessing various knowledge sources.</Paragraph> <Paragraph position="2"> The knowledge sources are domain-dependent, implemented in the same knowledge base system, and can be modified for each new application. We use a lexicon for general and domain-specific vocabulary and a grammar with knowledge of syntactic constructions and their semantic impact. Furthermore, we use descriptions of dialogue objects, i.e. segments and moves and their associated information (section 3), and domain object descriptions, which contain relations between the concepts used to describe objects in the background system and constraints on them.</Paragraph> <Paragraph position="3"> The need for domain object information in a natural language database interface has been argued by, for instance, Copestake & Sparck Jones (1990) and McCoy & Cheng (1988).
The domain objects are primarily used by the instantiator and deep generator, but the translator, parser and surface generator can also use this information. For a discussion of domain objects in this system, see Ahrenberg, Jönsson & Dahlbäck (1990).</Paragraph> <Paragraph position="4"> Each input or output from the resources passes via the dialogue manager (DM).</Paragraph> <Paragraph position="5"> A typical segment begins with an input from the user that is sent to the DM, which passes it to the parser. The parser sends its result to the DM, which passes it to the instantiator, where it is enhanced with referential information. This is sent to the translator, which accesses the background system and, if the access succeeds, informs the DM. The DM forwards the information to the deep generator, where an enhanced description is created; this is sent to the surface generator, and finally a response is given from the DM to the user. This has the advantage that the DM always has control over what happens in the system. Thus, if one module does not succeed with its task, the DM directs the recovery. For instance, if the translator cannot access the database due to lack of information from the user, the DM receives information from the translator that there is information missing, and then in turn calls the deep and surface generators to produce a suitable message to the user. The DM then waits for input to provide to the parser and instantiator. Finally, the DM tries to integrate the new information with the previous information.
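The routing described above, in which every hand-off between modules passes through the DM and the DM directs recovery when the translator reports missing information, can be sketched in Python. This is a minimal illustration under stated assumptions, not the system's CommonLisp implementation: the names DialogueManager, TranslatorResult, toy_parse, toy_translate and toy_surface are hypothetical, the modules are toy stand-ins, and a flat dictionary stands in for a dag.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TranslatorResult:
    answer: Optional[str] = None
    missing: Optional[str] = None  # parameter the user still has to supply


@dataclass
class DialogueManager:
    # Hypothetical stand-ins for the resources described in the text.
    parse: Callable[[str], dict]
    instantiate: Callable[[dict], dict]
    translate: Callable[[dict], TranslatorResult]
    deep_generate: Callable[[dict], dict]
    surface_generate: Callable[[dict], str]
    trace: list = field(default_factory=list)  # every hand-off goes through the DM
    pending: Optional[dict] = None  # request waiting for clarification

    def _call(self, name: str, module: Callable, arg):
        self.trace.append(name)
        return module(arg)

    def handle_user_input(self, text: str) -> str:
        dag = self._call("parse", self.parse, text)
        dag = self._call("instantiate", self.instantiate, dag)
        if self.pending is not None:
            # Integrate the clarification with the previous request.
            dag = {**self.pending, **dag}
            self.pending = None
        result = self._call("translate", self.translate, dag)
        if result.missing is not None:
            # The DM, not the translator, directs the recovery.
            self.pending = dag
            content = {"clarify": result.missing}
        else:
            content = {"answer": result.answer}
        description = self._call("deep_generate", self.deep_generate, content)
        return self._call("surface_generate", self.surface_generate, description)


def toy_parse(text: str) -> dict:
    # "key=value" tokens become attributes of a flat stand-in for a dag.
    return dict(tok.split("=", 1) for tok in text.split() if "=" in tok)


def toy_translate(dag: dict) -> TranslatorResult:
    if "boot" not in dag:
        return TranslatorResult(missing="boot")
    return TranslatorResult(answer=f"cars with a {dag['boot']} boot")


def toy_surface(description: dict) -> str:
    if "clarify" in description:
        return f"Please specify: {description['clarify']}"
    return description["answer"]


dm = DialogueManager(toy_parse, lambda d: d, toy_translate, lambda c: c, toy_surface)
print(dm.handle_user_input("show cars"))   # Please specify: boot
print(dm.handle_user_input("boot=large"))  # cars with a large boot
```

Because the toy translator never calls the generators itself, both the decision to ask the user for the missing parameter and the merging of the reply into the pending request stay with the DM.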
Internally, the dialogue manager maintains three dynamic structures for monitoring the dialogue: the dialogue tree (section 4), where the dialogue history is kept; action plans (section 5) for controlling the dialogue; and finally a scoreboard (section 6), which constitutes the interface between the dialogue tree and other modules of the system.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Dialogue objects </SectionTitle> <Paragraph position="0"> Dialogue objects play a central role in this architecture. A dialogue object consists of two components. One is a process description of a prototypical use of the dialogue object; this is described below. The other contains static information about speaker, hearer, type, topic, context and different types of descriptors for salient objects, e.g. the focused object, potential focused objects and optionally the current set. The current set records which subset of the database is currently in use. We found in our database dialogues that the user often restricts the possible candidates in a database search. For an example, consider the dialogue fragment in example 1, an English translation of a dialogue from our corpus of Swedish dialogues collected in Wizard-of-Oz simulations (the dialogue is continued in section 4). First the user specifies a set of cars in utterance U8>, presented by the system in S9>. This set is, however, too large; therefore in utterance U10> it is reduced. In the sequence of utterances U12> to S15>, the current set consists of the cars presented in U11>. The current set does not have to be explicit as in example 1; instead, it can be described by constraints. For instance, in a travel database the user may be interested in a trip to the Greek islands, which restricts the search in the database to the Greek islands for a large part of the ensuing dialogue.</Paragraph> <Paragraph position="1"> The communication is hierarchically structured using three different categories of dialogue objects. There are various proposals as to the number of levels needed.
The system developed by Polanyi & Scha (1984) uses five different levels to hierarchically structure a dialogue and LOKI (Wachtel, 1986) uses four. In LOKI the levels are: conversation, dialogue, exchange and move. When analysing our dialogues we found no reliable criteria for dividing a dialogue into a set of exchanges. Therefore we only use three different dialogue object types: dialogue, initiative-response unit (IR) and move.</Paragraph> <Paragraph position="2"> Dialogue, in our notation, is similar to conversation in LOKI, while IR-units resemble exchanges. IR-units are recursive and, unlike LOKI, we allow arbitrary embedding of IR-units.</Paragraph> <Paragraph position="3"> The smallest unit handled by our dialogue manager is the move. An utterance can consist of more than one move and is thus regarded as a sequence of moves. A move object is used for describing information about a move. Moves are categorized according to the type of illocutionary act and topic. Some typical move types are: Question (Q), Assertion (AS), Answer (A) and Directive (DI). Topic describes which knowledge source to consult: the background system, i.e. solving a task (T), the ongoing dialogue (D) or the organisation of the background system (S). For brevity, when we refer to a move with its associated topic, the move type is subscripted with the topic, e.g. Q_T.</Paragraph> <Paragraph position="4"> Normally an exchange of information begins with an initiative followed by a response (IR). The initiative can come from the system or the user. A typical IR-unit in a question-answer database application is a task-related question followed by a successful answer, Q_T/A_T.
Other typical IR-units are: Q_S/A_S for a clarification request from the user, Q_T/AS_S when the requested information is not in the database, and Q_D/A_D for questions about the ongoing dialogue.</Paragraph> <Paragraph position="5"> The dialogue manager uses a dialogue tree (section 4) as its control structure. The root node is of type Dialogue (the D-node) and controls the overall interaction. When an IR-unit is finished, it returns control to the D-node.</Paragraph> <Paragraph position="6"> The D-node creates an instance of a new IR-unit with information about initiator and responder. It also copies relevant information about salient objects and attributes from the previous IR-unit to the new one.</Paragraph> <Paragraph position="7"> Our simulations show that users prefer coherence in the dialogue.</Paragraph> <Paragraph position="8"> Thus, we use the heuristic that information which is not explicitly changed is carried over from one IR-unit to the next.</Paragraph> <Paragraph position="9"> As stated above, an instance of a dialogue object has one component describing static information about initiator, responder, salient objects etc., and another describing the process, i.e. the actions performed when executing the object. We call this a plan, although, following Pollack (1990), we could call it a recipe-for-action. Figure 2 shows a template description of an IR-unit used in a database information-seeking application.</Paragraph> <Paragraph position="11"> The static component forms the context in which the processes are executed. The attributes are updated with new values during the execution of the action plan. For instance, a user IR-unit, i.e. an IR-unit which waits for a user initiative to be interpreted, has no value for the Initiative and Response slots until the initiative has been interpreted.
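A dialogue object pairs a static component with a plan. The Python sketch below models an IR-unit under stated assumptions: the slot names follow the text (Topic, Initiative, Response, CurrentRequest, the focused object and the current set), but the dataclass layout, the helper new_ir_unit and the exact plans are hypothetical. Only the actions create-move, access and up are named in the text; check-response is an invented name for validating the answer to a clarification request. new_ir_unit mimics the D-node carrying over salient objects that were not explicitly changed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical plans, listed in execution order. Only create-move, access
# and up are named in the text; check-response is an illustrative name.
USER_PLAN = [("create-move", "user"), ("access",), ("create-move", "system"), ("up",)]
SYSTEM_PLAN = [("create-move", "system"), ("create-move", "user"), ("check-response",), ("up",)]


@dataclass
class IRUnit:
    # Static component: the context in which the plan is executed.
    initiator: str
    responder: str
    topic: Optional[str] = None            # "T", "D" or "S"
    initiative: Optional[str] = None       # e.g. "Q_T"; empty until interpreted
    response: Optional[str] = None         # e.g. "A_T"
    focused_object: Optional[str] = None
    current_set: Optional[frozenset] = None
    current_request: Optional[dict] = None
    # Process component: the plan ("recipe-for-action") for this unit.
    plan: list = field(default_factory=list)

    @property
    def label(self) -> str:
        # Printed as in the text, e.g. "Q_T/A_T"; "?" marks an empty slot.
        return f"{self.initiative or '?'}/{self.response or '?'}"


def new_ir_unit(previous: Optional[IRUnit], initiator: str, responder: str) -> IRUnit:
    """Create the next IR-unit under the D-node.

    Salient objects are copied from the previous unit (they were not explicitly
    changed); the initiative, response and request start out empty.
    """
    plan = list(USER_PLAN if initiator == "user" else SYSTEM_PLAN)
    unit = IRUnit(initiator, responder, plan=plan)
    if previous is not None:
        unit.focused_object = previous.focused_object
        unit.current_set = previous.current_set
    return unit


first = new_ir_unit(None, "user", "system")
print(first.label)  # ?/?
first.initiative, first.response, first.topic = "Q_T", "A_T", "T"
first.current_set = frozenset({"car-1", "car-2"})
second = new_ir_unit(first, "user", "system")
print(first.label, second.label)  # Q_T/A_T ?/?
print(second.current_set == first.current_set)  # True
```

Keeping the plan as data next to the static slots is what lets the same general actions behave differently per unit, since each action can read slots such as Topic from the unit it runs in.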
How these slots are filled is discussed further in section 4.</Paragraph> <Paragraph position="12"> The process component of the IR-unit is divided into two different plan descriptions, one for a system-initiated segment and another for a user-initiated segment. However, as can be seen in figure 2, they use the same general actions for creating moves, acting and traversing the tree (up). The actions behave differently depending on the static description; for instance, the action (access) uses the value of the slot Topic to determine which knowledge source to consult. Information about the values of attributes describing the request for information is found in the dag structure delivered by the instantiator, which the dialogue manager passes to the translator. The slot CurrentRequest contains the request formed by the translator and is used for clarifications.</Paragraph> <Paragraph position="13"> In database applications the system behaves as a user-directed interface. It initiates an IR-unit only for clarification requests, which are needed when 1) difficulties arise when interpreting the utterance, 2) difficulties arise when accessing the database, e.g. when the user needs to provide a parameter for correct access (see S17> in example 2 below), or 3) difficulties arise in the presentation of the result from the database access. The action to take after a clarification request is first to check the validity of the response and then to propagate the information to the node which initiated the clarification.</Paragraph> <Paragraph position="14"> In other applications, e.g. tutoring or consultation systems, the behaviour need not be user-directed. Instead it may be system-directed or mixed-initiative. In our approach this is achieved by customizing the dialogue objects (section 7).</Paragraph> <Paragraph position="15"> For move-units there are two different process descriptions, one for user moves and one for system moves.
The user move has the plan ((parse) (instantiate) (up)) and the system move has the plan ((deep-generate) (surface-generate) (up)).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 The dialogue tree </SectionTitle> <Paragraph position="0"> The dialogue tree represents the dialogue as it develops in the interaction. Information about salient objects is represented in the dialogue tree and is used by the instantiator and deep generator. The dialogue manager updates the dialogue tree for each new move.</Paragraph> <Paragraph position="1"> An important feature of the dialogue manager is distributed control. Every node in the tree is responsible for its own correctness. For instance, the plan for a task-related question-answer unit, Q_T/A_T, contains no repair strategies for information missing from a request to the background system. If the database access fails due to lack of information, the translator signals this to the DM, which creates an instance of an IR-unit for a clarification request and inserts it into the Q_T/A_T unit. The plan for the clarification request then generates a move explaining the missing information and creates a user move waiting for the user input. This has the advantage that the plans are very simple, as they only have local scope (cf. sections 3 and 6). Furthermore, the plans are more generally applicable. The tree is built bottom-up but with a top-down prediction from the context. This is illustrated by the dialogue in example 2, which generates a dialogue tree with clarifications on two levels. Initially the D-node creates an instance of an IR-node and inserts it into the tree, i.e. creates links between the IR-node and the D-node. The IR-node creates an instance of a user move.</Paragraph> <Paragraph position="2"> The move node parses and instantiates U16> successfully as an AS_T and then integrates it into the tree.
Information from the move node is then also available at the IR-node, whose type can be determined as AS_T/A_T. When the database is accessed from this node, the translator finds that there is a need for clarification, in this case concerning the use of the word large in connection with a boot. This creates a plan which first prompts the user with a question, S17>, and then waits for the user to give an answer. Here the user does not answer but instead expresses a request for clarification, U18>. This is shown in part 1) of figure 3 as the clarification IR-unit Q_S/A_S. That U18> constitutes a clarification request and not an answer to S17> is decided after the creation of the user move from U18>. When the DM receives the interpretation from the instantiator, it does not satisfy the expectation of an answer, so the DM has to instantiate a new IR-unit for a clarification request, which is connected to the previously created clarification IR-unit (Q_T/A_T).</Paragraph> <Paragraph position="3"> Utterance U18> in the context of the Q_T/A_T IR-unit indicates that the user needs some information about the background system, and it is thus interpreted as Q_S. This information is supplied in S19>. For the next utterance, U20>, a new user move is created which is integrated into the tree as an answer to the original clarification request. This information is propagated up to the first node, AS_T/A_T, which can now form an answer to the first question, S21>; see part 2) in figure 3. The next step (not shown in figure 3) is to generate a new IR-unit under D, which will generate a new user move, and the system is ready for further user input.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 The action plan </SectionTitle> <Paragraph position="0"> The plan describing a prototypical use of an object is pushed onto a stack called the action plan.
In accordance with our distributed design, each node maintains its own stack (see figure 5). The overall control strategy is that the stack top is popped and executed. Complex plans, as when the query to the database needs clarification, are handled with the same control mechanism. The dialogue manager then updates the action plan of the current node with an action for creating an instance of a clarification request dialogue object and another action to integrate new information. The DM pops the stack of the current node and executes that action. When this new exchange is completed, the result is integrated into the node which initiated the clarification.</Paragraph> <Paragraph position="1"> Again, consider the dialogue tree in figure 3. Part 1) in figure 4 shows the stack for the node AS_T/A_T before processing U16>, i.e. before the move node that parses and instantiates the move is created. Popping the action (create-move user) results in the creation of a move node which is ready to interpret a user input. The move node has a plan of its own: ((parse) (instantiate) (up)). When U16> is interpreted in the move node, AS_T in figure 3, the move node ends with the action (up), which tries to find a corresponding parent. In this case it succeeds with the IR-unit from which the move node was created, and the dialogue is controlled from this node, now AS_T/A_T. The stack top is now (access), which in this case uses the topic T, i.e. a database access. However, the database access does not succeed.</Paragraph> <Paragraph position="2"> Therefore a call for clarification, an action for later integrating the new information into the old request, and a new call to (access) are placed on the stack. This is seen in part 2) of figure 4. The action (access) has different repair strategies for the different clarification request types described above.
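The per-node stacks and the repair step can be simulated in a few lines of Python. This is a sketch under assumptions rather than the paper's implementation: ToyDM, Node, the incoming slot and the scripted user inputs are hypothetical, a flat dictionary stands in for CurrentRequest, and actions the sketch does not model (instantiate, deep-generate, surface-generate) are treated as no-ops. What it does show is the control regime described above: each node pops and executes only its own stack, (up) hands control to the parent, and a failed (access) pushes a clarification, an integration step and a retried access onto the current node.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

# Move plans as given in the text.
USER_MOVE_PLAN = [("parse",), ("instantiate",), ("up",)]
SYSTEM_MOVE_PLAN = [("deep-generate",), ("surface-generate",), ("up",)]


@dataclass
class Node:
    label: str
    parent: Optional["Node"] = None
    is_move: bool = False
    stack: list = field(default_factory=list)     # action plan; last item is the top
    request: dict = field(default_factory=dict)   # stand-in for CurrentRequest
    content: dict = field(default_factory=dict)   # what a move node produced
    incoming: dict = field(default_factory=dict)  # result of a finished clarification

    def push_plan(self, plan):
        self.stack.extend(reversed(plan))  # the plan's first action ends up on top


class ToyDM:
    def __init__(self, user_inputs, required=("boot",)):
        self.inputs = deque(user_inputs)  # scripted user turns
        self.required = required          # parameters the toy database needs
        self.log = []

    def create_move(self, node, speaker):
        move = Node(f"{speaker}-move", parent=node, is_move=True)
        move.push_plan(USER_MOVE_PLAN if speaker == "user" else SYSTEM_MOVE_PLAN)
        return move  # control passes to the new move node

    def parse(self, node):
        text = self.inputs.popleft()
        node.content = dict(tok.split("=", 1) for tok in text.split() if "=" in tok)
        return node

    def up(self, node):
        parent = node.parent
        if parent is not None:
            if node.is_move:
                parent.request.update(node.content)  # move info becomes available above
            else:
                parent.incoming = dict(node.request)  # kept until (integrate)
        return parent  # the parent regains control

    def access(self, node):
        missing = [p for p in self.required if p not in node.request]
        if missing:
            # Local repair: clarify, integrate the answer, then retry the access.
            node.push_plan([("clarify", missing[0]), ("integrate",), ("access",)])
        else:
            node.content["answer"] = dict(node.request)
        return node

    def clarify(self, node, param):
        ir = Node(f"clarify-{param}", parent=node)
        ir.push_plan([("create-move", "system"), ("create-move", "user"), ("up",)])
        return ir

    def integrate(self, node):
        node.request.update(node.incoming)
        node.incoming = {}
        return node

    def run(self, node):
        current = node
        while current is not None and current.stack:
            action, *args = current.stack.pop()
            self.log.append(f"{current.label}: {action}")
            handler = getattr(self, action.replace("-", "_"), None)
            # Actions without a handler are no-ops in this sketch.
            current = handler(current, *args) if handler else current


top = Node("IR")
top.push_plan([("create-move", "user"), ("access",), ("create-move", "system"), ("up",)])
dm = ToyDM(["show cars", "boot=large"])
dm.run(top)
print(top.content["answer"])  # {'boot': 'large'}
```

No node in the sketch knows how its children repair a failure: the IR node only sees the clarification's result arrive in its incoming slot before (integrate) and the retried (access) run.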
Similar repair strategies apply to all actions.</Paragraph> <Paragraph position="3"> The stack top is an action which creates a known IR-unit asking for a database access parameter. This action then creates the Q_T/A_T node in figure 3. This node has its own action plan stack, from which processing is controlled. This node is also responsible for the correctness of the answer given by the user, which in this case results in a new clarification request. This does not affect the node AS_T/A_T; instead, the clarifications are processed, eventually control is returned to the node AS_T/A_T, and the new information is integrated into its old request, stored in CurrentRequest.</Paragraph> <Paragraph position="4"> The two clarification nodes in figure 3, Q_T/A_T and Q_S/A_S, behave in a similar fashion.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Scoreboard </SectionTitle> <Paragraph position="0"> Controlling the dialogue is only one of the responsibilities of the dialogue manager. It is also responsible for monitoring the dialogue. Information about salient objects is represented in the dialogue tree and is accessed through a scoreboard (figure 5). The scoreboard is the interface between the dialogue manager and the other modules in the NLI.</Paragraph> <Paragraph position="1"> The attributes of the scoreboard take their values from the tree via pointers or via retrieve functions which search the dialogue tree. The lexicon and grammar are written with references to the attributes on the scoreboard and therefore are not involved in traversing the dialogue tree.</Paragraph> <Paragraph position="2"> Furthermore, the retrieve functions can be altered, allowing the search for a referent to an anaphoric expression to be application-dependent.
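The scoreboard's exchangeable retrieve functions can be illustrated with a small Python sketch. The names Scoreboard, TreeNode, nearest_focus and local_focus are hypothetical, and only one attribute (a referent for anaphora) is modelled; attributes that the text says are filled via pointers would simply be stored directly. The point is that a module such as the grammar only asks the scoreboard for an attribute, so changing the search strategy means swapping one function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TreeNode:
    focused_object: Optional[str] = None
    parent: Optional["TreeNode"] = None


def nearest_focus(node: Optional[TreeNode]) -> Optional[str]:
    # Search upwards from the current node for the closest focused object.
    while node is not None:
        if node.focused_object is not None:
            return node.focused_object
        node = node.parent
    return None


def local_focus(node: Optional[TreeNode]) -> Optional[str]:
    # A stricter alternative: only the current node is considered.
    return node.focused_object if node is not None else None


class Scoreboard:
    """Interface between the dialogue tree and other modules."""

    def __init__(self, current: TreeNode):
        self.current = current  # pointer into the dialogue tree
        self.retrieve: Dict[str, Callable[[Optional[TreeNode]], Optional[str]]] = {
            "referent": nearest_focus,
        }

    def set_retrieve(self, attribute: str, fn) -> None:
        # Exchange the access function, e.g. when customizing for an application.
        self.retrieve[attribute] = fn

    def __getitem__(self, attribute: str):
        return self.retrieve[attribute](self.current)


root = TreeNode(focused_object="car-1")
board = Scoreboard(TreeNode(parent=root))
print(board["referent"])  # car-1
board.set_retrieve("referent", local_focus)
print(board["referent"])  # None
```

A grammar rule written against board["referent"] is unaffected by the swap; only the function registered for that attribute changes.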
Consequently, when an application requires a change in dialogue style, we need only update the retrieve function connected to an element on the scoreboard, not the grammar or lexicon.</Paragraph> </Section> </Paper>