Synchronization in an Asynchronous Agent-based Architecture for Dialogue Systems

2 TRIPS Architecture

As mentioned above, the TRIPS system (Allen et al., 2000; Allen et al., 2001a; Allen et al., 2001b) is built on an agent-based architecture. [Footnote 1: Further details of the TRIPS dialogue system can be found at our website.] Unlike many systems, however, the flow of information within TRIPS is not pipelined. The architecture and information flow between components is shown in Figure 1. In TRIPS, information flows between the three general areas of interpretation, behavior, and generation.

Each TRIPS component is implemented as a separate process. Information is shared by passing KQML (Finin et al., 1997) messages through a central hub, the Facilitator, which supports message logging and syntax checking as well as broadcast and selective broadcast between components.

We first discuss the individual system components and their functions. We then describe the flow of information through the system and illustrate it with an example.

2.1 System Components

Figure 1 shows the various components in the TRIPS system. Components are divided among three main categories: Interpretation, Behavior, and Generation. As shown in the figure, some components straddle categories, meaning they represent state and provide services necessary for both sorts of processing.

The Interpretation Manager (IM) interprets user input coming from the various modality processors as it arises. It interacts with Reference to resolve referring expressions and with the Task Manager (TM) to perform plan and intention recognition as part of the interpretation process. It broadcasts recognized speech acts and their interpretation as collaborative problem solving actions (see below), and incrementally updates the Discourse Context (DC).

The Behavioral Agent (BA) is in some sense the autonomous "heart" of the agent. It plans system behavior based on its own goals and obligations, the user's utterances and actions, and changes in the world state. Actions that require task- and domain-dependent processing are performed by the Task Manager. Actions that involve communication and collaboration with the user are sent to the Generation Manager (GM) in the form of communicative acts.

The GM coordinates planning the specific content of utterances and display updates and producing the results. Its behavior is driven by discourse obligations (from the DC) and the directives it receives from the BA.
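Since these components communicate only through the Facilitator, it is worth picturing how such a hub works. The following is a minimal sketch of selective broadcast; the class names, the register/send API, and the message format are invented here for illustration and are not the actual TRIPS Facilitator or a real KQML implementation.

    from collections import defaultdict

    class Facilitator:
        """Central hub sketch: logs traffic and routes each message only to
        the components that registered for its type (selective broadcast)."""
        def __init__(self):
            self.subscribers = defaultdict(list)  # message type -> components
            self.log = []                          # simple message log

        def register(self, component, msg_type):
            self.subscribers[msg_type].append(component)

        def send(self, sender, msg_type, content):
            self.log.append((sender, msg_type, content))
            for component in self.subscribers[msg_type]:
                component.receive(sender, msg_type, content)

    class Component:
        def __init__(self, name):
            self.name = name
        def receive(self, sender, msg_type, content):
            print(f"{self.name} received {msg_type} from {sender}: {content}")

    # Example wiring: the TM and BA both register for "system-understood".
    hub = Facilitator()
    tm, ba = Component("TM"), Component("BA")
    hub.register(tm, "system-understood")
    hub.register(ba, "system-understood")
    hub.send("IM", "system-understood", "(initiate (adopt objective ...))")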
Collaborative Problem Solving Model

The three main components (IM, BA, GM) communicate using messages based on a collaborative problem solving model of dialogue (Allen et al., 2002; Blaylock, 2002). We model dialogue as collaboration between agents which are planning and acting. Together, collaborating agents (i.e., dialogue partners) build and execute plans, deciding on such things as objectives, recipes, resources, situations (facts about the world), and so forth. These are called collaborative problem solving objects, and they are operated on by collaborative problem solving acts such as identify (present as a possibility), evaluate, adopt, and others. Thus, together, two agents may decide to adopt a certain objective, or identify a recipe to use for an objective. The agreed-upon beliefs, objectives, recipes, and so forth constitute the collaborative problem solving state.

Of course, because the agents are autonomous, no agent can single-handedly change the collaborative problem solving (CPS) state. Interaction acts are actions that a single agent performs to attempt to change the CPS state. The interaction acts are initiate, continue, complete, and reject. Initiate proposes a new change to the CPS state. Continue adds new information to the proposal, and complete simply accepts the proposal (bringing about the change) without adding additional information. Of course, proposals can be rejected at any time, causing them to fail.

As an example, the utterance "Let's save the heart-attack victim in Pittsford" in an emergency planning domain would be interpreted as two interaction acts: (initiate (identify objective (rescue person1))) and (initiate (adopt objective (rescue person1))). Here the user is proposing that they consider rescuing person1 as a possible objective to pursue. He is also proposing that they adopt it as an objective to plan for. [Footnote 2: Here two interaction acts are posited because of the ability of the system to react to each separately, for example completing the first but rejecting the second. Consider the possible response "No, not right now." (accept this as a possible objective, but reject adopting it right now), versus "The 911 center in Pittsford is handling that, we don't have to worry about it." (reject this as even a possible objective and reject adopting it). The scope of this paper precludes us from giving more detail about multiple interaction acts.]

Interaction acts are recognized (via intention recognition) from speech acts. Interaction acts and speech acts differ in several ways. First, speech acts describe a linguistic level of interaction (ask, tell, etc.), whereas interaction acts deal with a problem solving level (adopting objectives, evaluating recipes, and so forth). Also, as shown above, a single speech act may correspond to many interaction acts.
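To make this representation concrete, the two interaction acts above can be built as nested terms. The sketch below uses Python tuples as our own illustrative encoding; the system's actual message syntax is not shown in this paper.

    def initiate(cps_act):
        """A single agent proposes a change to the CPS state."""
        return ("initiate", cps_act)

    def identify(obj_type, obj):
        """Present a CPS object as a possibility."""
        return ("identify", obj_type, obj)

    def adopt(obj_type, obj):
        """Propose committing to a CPS object, e.g. an objective."""
        return ("adopt", obj_type, obj)

    rescue = ("rescue", "person1")

    # "Let's save the heart-attack victim in Pittsford" as two interaction acts:
    acts = [
        initiate(identify("objective", rescue)),  # consider it as a possibility
        initiate(adopt("objective", rescue)),     # and propose adopting it
    ]
    print(acts)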
2.2 Information Flow in the System

There are several paths along which information asynchronously flows through the system. We discuss information flow at the levels of problem solving, discourse, and grounding. The section that follows then gives an example of how this proceeds.

The problem solving level describes the actual underlying task or purposes of the dialogue and is based on interaction acts. We first describe the problem solving information flow when the user makes an utterance. We then discuss the case where the system takes initiative and how this results in an utterance by the system.

User Utterance

Following the diagram in Figure 1, when a user makes an utterance, it goes through the Speech Recognizer to the Parser, which then outputs a list of speech acts (which cover the input) to the Interpretation Manager (IM). The IM then sends the speech acts to Reference for resolution.

The IM then sends these speech act hypotheses to the Task Manager (TM), which computes the corresponding interaction acts for each, as well as a confidence score that each hypothesis is the correct interpretation.

Based on this, the IM then chooses the best interpretation and broadcasts the chosen CPS act(s) in a "system understood" message. [Footnote 3: This is a selective broadcast to the components which have registered for such messages.] The TM receives this message and updates to the new collaborative problem solving state which this interpretation entails. The Behavioral Agent (BA) receives the broadcast and decides if it wants to form any intentions for action based on the interaction act.

Assuming the BA decides to act on the user's utterance, it sends execution and reasoning requests to the TM, which passes them on to the appropriate back-end components and returns the results to the BA.

The BA then forms an interaction act based on this result and sends it to the GM to be communicated to the user. The GM then generates text and/or graphical updates based on the interaction act and utters/presents them to the user.

In most pipelined flow-of-information systems, the only flow of information is at this problem solving level. In TRIPS, however, there are other paths of information flow.

System Initiative

TRIPS is also capable of taking initiative. As we stated above, this initiative originates in the BA and can come from one of three areas: user utterances, private system objectives, or exogenous events. If the system, say because of an exogenous event, decides to take initiative and communicate with the user, it sends an interaction act to the GM. The GM then, following the same path as above, outputs content to the user.

The discourse level describes information which is not directly related to the task at hand, but rather is linguistic in nature. [Footnote 4: Although it works in a conceptually similar way, the current system does not handle discourse level information flow quite so cleanly as is presented here. We intend to clean things up and move to this exact model in the near future.] This information is represented as salience information (for Reference) and discourse obligations (Traum and Allen, 1994).

When the user makes an utterance, the input passes (as detailed above) through the Speech Recognizer to the Parser, and then to the IM, which calls Reference to do resolution. Based on this reference-resolved form, the IM computes any discourse obligations which the utterance entails (e.g., if the utterance was a question, to address or answer it, and also to acknowledge that it heard the question).

At this point, the IM broadcasts a "system heard" message, which includes incurred discourse obligations and changes in salience. Upon receipt of this message, Discourse Context updates its discourse obligations and Reference updates its salience information.
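The obligation computation just described might look roughly like the following. The rule set is a hypothetical fragment for illustration only; the paper does not specify the IM's actual rules.

    def discourse_obligations(speech_act_type):
        """Hypothetical fragment: map a (reference-resolved) speech act to
        the discourse obligations it incurs."""
        obligations = ["acknowledge"]  # at minimum, acknowledge the utterance
        if speech_act_type in ("ask-yn", "ask-wh"):
            obligations.append("address-or-answer")
        elif speech_act_type == "request":
            obligations.append("address-request")
        return obligations

    # A question incurs both an acknowledgment and an answering obligation.
    print(discourse_obligations("ask-yn"))  # ['acknowledge', 'address-or-answer']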
The GM learns of new discourse obligations from the Discourse Context and begins to try to fulfill them, regardless of whether or not it has heard from the BA about the problem solving side of things. There are some obligations it will be unable to fulfill without knowledge of what is happening at the problem solving level (answering or addressing a question, for example). Other obligations, however, can be fulfilled without problem solving knowledge (an acknowledgment, for example), in which case the GM produces content to fulfill the discourse obligation.

If the GM receives interaction acts and discourse obligations simultaneously, it must produce content which fulfills both problem solving and discourse needs. Usually, these interaction acts and discourse obligations are directed towards the same objective: for example, an obligation to address or answer a question, and an interaction act of identifying a situation (communicating the answer to the user). However, because the system has the ability to take initiative, the interaction acts and discourse obligations may be disparate: for example, an obligation to address or answer a question and an interaction act to identify and adopt a new pressing objective. In this case, the GM must plan content to fulfill the acts and obligations as best it can, for example by apologizing for not answering the question and then informing the user. Through this method, the GM maintains dialogue coherence even though the BA is autonomous.

The last level of information flow is at the level that we loosely call grounding (Clark and Schaefer, 1989; Traum, 1994). In TRIPS, acts and obligations are not accomplished and contexts are not updated unless the user has heard and/or understood the system's utterance. Upon receiving a new utterance, the IM first determines if it contains evidence of the user having heard and understood the system's previous utterance. [Footnote 7: Hearing and understanding are not currently recognized separately in the system. For future work, we would like to extend the system to handle them separately (e.g., the case of the user having heard but not understood).]

If the user heard and understood, the IM broadcasts a "user heard" message which contains both salience information from the previous system utterance as well as the discourse obligations the system utterance fulfilled. This message can be used by Reference to update salience information and by Discourse Context to discharge fulfilled discourse obligations.

It is important that these contexts not be updated until the system knows that the user heard its last utterance. If the user, for example, walks away as the system speaks, the system's discourse obligations will still not be fulfilled, and salience information will not change.
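One way to picture this gating is a Discourse Context that holds obligations as pending and discharges them only when a "user heard" message arrives. This is a minimal sketch with invented method and message names, not the actual TRIPS component:

    class DiscourseContext:
        """Sketch: discourse obligations are discharged only on evidence
        that the user heard the system's utterance."""
        def __init__(self):
            self.obligations = set()

        def on_system_heard(self, incurred):
            # Obligations incurred by the user's latest utterance.
            self.obligations |= set(incurred)

        def on_user_heard(self, fulfilled):
            # Only now do we know the user heard the system's utterance,
            # so the obligations it fulfilled can be discharged.
            self.obligations -= set(fulfilled)

    dc = DiscourseContext()
    dc.on_system_heard({"answer-question-12"})
    # If the user walks away, no "user heard" message ever arrives and the
    # obligation simply stays pending:
    print(dc.obligations)  # {'answer-question-12'}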
The GM receives the "user heard" message and also knows which interaction act(s) the system utterance was presenting. It then broadcasts a "user understood" message, which causes the TM to update the collaborative problem solving state, and the BA to release any goals and intentions fulfilled by the interaction act(s).

Again, it is important that these context updates do not occur until the system has evidence that the user understood its last utterance (for reasons similar to those discussed above).

This handling of grounding frees the system from the assumption that the user always hears and understands each utterance.

2.3 An Example

We use here an example from our TRIPS Medication Advisor domain (Ferguson et al., 2002). The Medication Advisor is a project carried out in conjunction with the Center for Future Health at the University of Rochester. The system is designed to help people (especially the elderly) understand and manage their prescription medications.

With the huge growth in the number of pharmaceutical therapies, patients tend to end up taking a combination of several different drugs, each of which has its own characteristics and requirements. For example, each drug needs to be taken at a certain rate: once a day, every four hours, as needed, and so on. Some drugs need to be taken on an empty stomach, others with milk, others before or after meals, and so on. Overwhelmed by this large set of complex interactions, many patients simply do not (or cannot) comply with their prescribed drug regimen (Claxton et al., 2001).

The TRIPS Medication Advisor is designed to help alleviate this problem by giving patients easy and accessible prescription information and management in their own home.

For our example, we assume that a dialogue between the system and user is in progress and a number of other topics have been addressed. At this point in the conversation, the system has just uttered "Thanks, I'll try that" and now the user utters the following:

User: "Can I take an aspirin?"

We trace information flow first at the grounding level, then at the discourse level, and finally at the problem solving level. This information flow is illustrated in Figure 2.

Grounding Level

The utterance goes through the Speech Recognizer and Parser to the IM. As illustrated in Figure 2a, based on the utterance, the IM recognizes that the user heard and understood the system's last utterance, so it sends a "user heard" message, which causes the Discourse Context to update discourse obligations and Reference to update salience based on the system's last utterance.

The GM receives the "user heard" message and sends the corresponding "user understood" message, containing the interaction act(s) motivating the system's last utterance. Upon receiving this message, the TM updates the collaborative problem solving state, and the BA updates its intentions and goals.
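For this exchange, the two grounding messages might carry payloads along the following lines. This rendering as Python dictionaries, and all of the field names and values, are ours; the paper does not give the actual message syntax.

    # Illustrative payloads only (field names and values are invented).
    user_heard = {
        "type": "user-heard",
        "salience-updates": ["referents-of-last-system-utterance"],
        "obligations-fulfilled": ["respond-to-suggestion"],
    }

    user_understood = {
        "type": "user-understood",
        # The interaction act(s) the system's last utterance was presenting:
        "interaction-acts": [("initiate", ("identify", "situation", "advice"))],
    }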
Meanwhile ... things have been happening at the discourse level.

Discourse Level

After the IM sends the "user heard" message, as shown in Figure 2b, it sends Reference a request to resolve references within the user's utterance. It then recognizes that the user has asked a question, which gives the system the discourse obligations of answering (or addressing) the question, as well as acknowledging the question. The IM then sends a "system heard" message, which causes Reference to update salience and Discourse Context to store the newly-incurred discourse obligations.

The GM receives the new discourse obligations, but has not yet received anything from the BA about problem solving (see below). Without knowledge of what is happening in problem solving, the GM is unable to fulfill the discourse obligation to answer (or address) the question. However, it is able to fulfill the obligation of acknowledging the question, so, after a certain delay with no response from the BA, the GM plans content to produce an acknowledgment, which causes the avatar to graphically show that it is thinking, and also causes the system to utter the following:

System: "Hang on."

Meanwhile ... things have been happening at the problem solving level as well.

Problem Solving Level

After it sends the "system heard" message, as shown in Figure 2c, the IM computes possible speech acts for the input. In this case, there are two: a yes-no question about the ability to take aspirin, and a request to evaluate the action of taking aspirin.

These are sent to the TM for intention recognition. The first case (the yes-no question) does not seem to fit the task model well and receives a low score. (The system prefers interpretations in which the user wants information for a reason and not just for the sake of knowing something.) The second speech act is recognized as an initiate of an evaluation of the action of taking aspirin (i.e., the user wants to evaluate this action with the system). This hypothesis receives a higher score.

The IM chooses the second interpretation and broadcasts a "system understood" message that announces this interpretation. The TM receives this message and updates its collaborative problem solving state to reflect that the user performed this interaction act. The BA receives the message and, as shown in Figure 2d, decides to adopt the intention of doing the evaluation and reporting the result to the user.

It sends an evaluation request for the action of the user taking an aspirin to the TM, which queries the back-end components (the user knowledge base and the medication knowledge base) about which prescriptions the user has and whether any of them interact with aspirin.
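This back-end check can be pictured with a toy knowledge base. The data and function below are ours for illustration and are not the Medication Advisor's actual knowledge bases or API:

    # Toy stand-ins for the user and medication knowledge bases.
    USER_PRESCRIPTIONS = {"user1": ["Celebrex"]}
    DRUG_INTERACTIONS = {("Celebrex", "aspirin"), ("warfarin", "aspirin")}

    def evaluate_taking(user, drug):
        """Evaluate the action of `user` taking `drug`: flag any of the
        user's prescriptions known to interact with it."""
        conflicts = [p for p in USER_PRESCRIPTIONS.get(user, [])
                     if (p, drug) in DRUG_INTERACTIONS]
        return ("bad-idea", conflicts) if conflicts else ("ok", [])

    print(evaluate_taking("user1", "aspirin"))  # ('bad-idea', ['Celebrex'])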
The back-end components report that the user has a prescription for Celebrex, and that Celebrex interacts with aspirin. The TM then reports to the BA that the action is a bad idea. The BA then formulates an interaction act reflecting these facts and sends it to the GM. The GM then produces the following utterance, which performs the interaction act as well as fulfilling the discourse obligation of responding to the question.

System: "No, you are taking Celebrex and Celebrex interacts with aspirin."

3 Synchronization

The architecture above is somewhat idealized, in that we have not yet given the details of how the components know which context to interpret messages in, and how to ensure that messages get to components in the right order. We first illustrate these problems with a few examples. We then discuss the solution we have implemented.

3.1 Examples of Synchronization Problems

One of the problems that faces most distributed systems is that there is no shared state between the agents. The first problem with the architecture described in Section 2 is the lack of context in which to interpret messages. This is well illustrated by the interpret request from the IM to the TM.

As discussed above, the IM sends its candidate speech acts to the TM, which performs intention recognition and assigns a score. The problem is: in which context should the TM interpret utterances? It cannot simply change its collaborative problem solving state each time it performs intention recognition, since it may get multiple requests from the IM, only one of which gets chosen as the official "interpretation" of the system.

We have stated that the TM updates its context each time it receives a "system understood" or "user understood" message. This brings up, however, the second problem of our distributed system. Because all components are operating asynchronously (including the user, we may add), it is impossible to guarantee that messages will arrive at a component in the desired order. This is because "desired order" is a purely pragmatic assessment. Even with a centralized Facilitator through which all messages must pass, the only guarantee is that messages from a particular component to a particular component will arrive in order; i.e., if component A sends component B three messages, they will get there in the order that component A sent them. However, if components A and C each send component B a message, we cannot say which will arrive at component B first.
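This ordering property, FIFO per sender but nondeterministic interleaving across senders, is easy to see in a toy simulation (ours, not TRIPS code):

    import queue
    import random
    import threading
    import time

    inbox = queue.Queue()  # component B's inbox

    def sender(name, messages):
        for m in messages:
            time.sleep(random.random() / 100)  # nondeterministic scheduling
            inbox.put((name, m))

    # A's messages stay in order relative to each other, as do C's, but the
    # interleaving of A's and C's messages varies from run to run.
    a = threading.Thread(target=sender, args=("A", ["a1", "a2", "a3"]))
    c = threading.Thread(target=sender, args=("C", ["c1", "c2"]))
    a.start(); c.start(); a.join(); c.join()

    while not inbox.empty():
        print(inbox.get())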
What this means is that the "current" context of the IM may be very different from that of the TM. Consider the case where the system has just made an utterance and the user is responding. As we described above, the first thing the IM does is check for hearing and understanding and send off a "user heard" message. The GM, when it receives this message, sends the corresponding "user understood" message, which causes the TM to update to a context containing the system's utterance. In the meantime, the IM is still assuming the context of the system's last utterance as it does interpretation, and it sends off interpret requests to the TM. Now, if the TM receives an interpret request from the IM before it receives the "user understood" message from the GM, it will try to interpret the input in the context of the user's last utterance (as if the user had made two utterances in a row, without the system saying anything in between). This situation will give erroneous results and must be avoided.

3.2 Synchronization Solution

The solution to these problems is, of course, synchronization: causing components to wait at certain stages to make sure they are in the same context. It is interesting to note that these synchronization points are highly related to a theory of grounding and common ground.

To solve the first problem listed above (lack of context), we have components append context assumptions to the end of each message. Thus, instead of the IM sending the TM a request to interpret B, it sends the TM a request to interpret B in the context of having understood A. Likewise, instead of the IM requesting that Reference resolve D, it requests that Reference resolve D having heard C. Having messages explicitly contain context assumptions allows components to interpret messages in the correct context.

With this model, context now becomes discrete, incrementing with every "chunk" of common ground. [Footnote 10: For now we treat each utterance as a single "chunk". We are interested, however, in moving to more fine-grained models of dialogue. We believe that our current architecture will still be useful as we move to a finer-grained model.] These common ground updates correspond exactly to the "heard" and "understood" messages we described above. Thus, in order to perform a certain task (reference resolution, intention recognition, etc.), a component must know in which common ground context it must be done.

The solution to the second problem (message ordering) follows from explicitly listing context assumptions. If a component receives a message tagged with a context for which the component has not yet received an update notice (the "heard" or "understood" message), the component simply defers processing of the message until it has received the corresponding update message and can update its context. This ensures that, although messages are not guaranteed to arrive in the right order, they will be processed in the right context. This provides the necessary synchronization and allows the asynchronous system components to work together in a coherent manner.
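Putting the two ideas together, a receiving component can be sketched as follows: every message carries the common-ground context it assumes, and a message that is ahead of the component's own context is deferred until the corresponding "heard" or "understood" update arrives. The class below is a minimal sketch with invented names, treating contexts as integer chunk counters as in the discrete model above; it is not the TRIPS implementation.

    class SynchronizedComponent:
        """Sketch: process each message only in the context it assumes."""
        def __init__(self):
            self.context = 0    # latest common-ground chunk incorporated
            self.deferred = []  # messages waiting for a context update

        def on_context_update(self, new_context):
            # A "heard"/"understood" message advances the common ground.
            self.context = new_context
            ready = [m for m in self.deferred if m["assumes"] <= self.context]
            self.deferred = [m for m in self.deferred if m["assumes"] > self.context]
            for msg in ready:
                self.process(msg)

        def on_message(self, msg):
            if msg["assumes"] <= self.context:
                self.process(msg)          # we already share this context
            else:
                self.deferred.append(msg)  # defer until we catch up

        def process(self, msg):
            print("processing", msg["body"], "in context", msg["assumes"])

    tm = SynchronizedComponent()
    # An interpret request assuming the system's utterance was understood
    # (context 1) arrives before the "user understood" update:
    tm.on_message({"assumes": 1, "body": "interpret B"})
    tm.on_context_update(1)  # the deferred request is now processed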
4 Discussion

We believe that, in general, this has several ramifications for any agent-based, non-pipelined flow-of-information architecture:

1. Agents which are queried about more than one hypothesis must keep state for all hypotheses until one is chosen.

2. Agents cannot assume shared context. Because both the system components and the user are acting asynchronously, it is impossible in general for any agent to know what context another agent is currently in.

3. Agents must be able to defer working on input. This feature allows them to wait for synchronization if they receive a message to be interpreted in a context they have not yet reached.

Asynchronous agent-based architectures allow dialogue systems to interact with users in a much richer and more natural way. Unfortunately, the cost of moving to a truly distributed system is the need to deal with synchronization. Fortunately, for dialogue systems, models of grounding provide a suitable and intuitive basis for system synchronization.