<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2131"> <Title>An Architecture for Dialogue Management, Context Tracking, and Pragmatic Adaptation in Spoken Dialogue Systems</Title> <Section position="9" start_page="797" end_page="799" type="concl"> <SectionTitle> 4 Implementations of the Architecture </SectionTitle> <Paragraph position="0"> We have implemented two spoken dialogue systems using the architecture presented. The first is a telephone-based interface to a simulated employee Time Reporting System (TRS), as might be used at a large corporation. We then ported the system to a spoken interface to a battlefield simulation (Modular Semi-Automated Forces, or ModSAF).</Paragraph> <Paragraph position="1"> In our implementation of this architecture, each component is a unique agent which may reside on its own platform and communicate over a network. The middleware our agents use to communicate is the Open Agent Architecture (OAA) (Moran et al. 1997) from SRI. The OAA's flexibility allowed us to easily hook up modules and experiment with the division of labor between the three discourse components we are studying. We treat the Dialogue Manager as a special OAA agent that insists on being called frequently so that it can monitor the progress of communicative events through the system. null</Paragraph> <Section position="1" start_page="798" end_page="798" type="sub_section"> <SectionTitle> 4.1 The Time Reporting System (TRS) </SectionTitle> <Paragraph position="0"> The architecture components in our TRS system are listed in Table 5, along with their specific implementations used. Each implemented module included a thin OAA agent layer, allowing it to communicate via the OAA.</Paragraph> <Paragraph position="1"> Components not in our focus (shaded in gray) are either commercial or simulated software. For Context Tracking, we use an algorithm based on (LuperFoy 1992). For Dialogue Management, we developed a simple agent able to control a system-initiated dialogue, as well as handle non-linguistic events from the back-end. The third discourse component, Pragmatic Adaptation, awaits future research, and was simulated for this system.</Paragraph> <Paragraph position="2"> Figure 2 presents a sample TRS dialogue.</Paragraph> <Paragraph position="3"> System: Welcome. What is your employee number? User: 12345 System: What is your password? User: 54321 System: How can I help you? User: What's the first charge number? System: 123GK498J User: What's the name of that task? System: Project X User: Charge 6 hours to it today for me. System: 6 hours has been charged to Project X. When the user logs in, the back-end system brings up a non-linguistic event--the list of tasks, with associated charge numbers, which belong to the user. The Dialogue Manager receives this and passes it to the Context Tracker. The Context Tracker is then able to resolve the first charge number, as well as subsequent dependent references such as that task, it, and today. null</Paragraph> </Section> <Section position="2" start_page="798" end_page="799" type="sub_section"> <SectionTitle> 4.2 The ModSAF Interface </SectionTitle> <Paragraph position="0"> We ported the TRS demo to a simulated battlefield back-end called ModSAF. We used the same components with the exception of the speech recognizer and the back-end interface.</Paragraph> <Paragraph position="1"> The Dialogue Manager was improved over the TRS demo in several ways. 
<Section position="2" start_page="798" end_page="799" type="sub_section">
<SectionTitle> 4.2 The ModSAF Interface </SectionTitle>
<Paragraph position="0"> We ported the TRS demo to a simulated battlefield back-end called ModSAF. We used the same components, with the exception of the speech recognizer and the back-end interface.</Paragraph>
<Paragraph position="1"> The Dialogue Manager was improved over the TRS demo in several ways. First, we added the capability for the Dialogue Manager to dynamically inform the speech recognizer of what input to expect, i.e., which language model to use. The Dialogue Manager could also add words to the speech recognizer's vocabulary on the fly. We chose Nuance (from Nuance Communications) as our speech recognition component specifically because it supports such run-time updates. Figure 3 presents a sample ModSAF dialogue.</Paragraph>
<Paragraph position="2"> Note that only the user speaks.</Paragraph>
<Paragraph position="3"> When the user asks to create an entity, the Dialogue Manager detects the beginning of a subdialogue and instructs the speech recognizer to restrict its expected grammar to that of entity creation (name and location). Later, the back-end (ModSAF) sends the Dialogue Manager a non-linguistic event, in which a different platoon (created by another player in the simulation) appears. This event includes a name for the new platoon; the Dialogue Manager passes the name to the speech recognizer so that it can be recognized later. In addition, the event is passed to the Context Tracker, so that it can later resolve the reference "that new platoon".</Paragraph>
<Paragraph position="4"> To illustrate some advantages of our architecture, we briefly mention what we needed to change to port from TRS to ModSAF. First, the Context Tracker needed no change at all: operating on linguistic principles, it is domain-independent. LuperFoy's framework does provide for a layer connected to a knowledge source for external context; this layer would need to change when changing domains. The Dialogue Manager also required little change to its core code, adding only the ability to influence the speech recognizer. The Pragmatic Adaptation Module, being dependent on the domain of the back-end, is where most changes are needed when switching domains.</Paragraph>
<Paragraph position="5">
Conclusion
We have presented a modular, flexible architecture for spoken dialogue systems which separates discourse processing into three component tasks with three corresponding software modules: Dialogue Management, Context Tracking, and Pragmatic Adaptation. We discussed the roles of these components in a complex, near-future scenario portraying a variety of dialogue types. We closed by describing implementations of these dialogues using the architecture presented, including development and porting of the first two discourse components.</Paragraph>
<Paragraph position="6"> The architecture itself is derived from a standard blackboard control structure. This is appropriate for our current dialogue processing research in two ways. First, it does not require prior full enumeration of all possible subroutine firing sequences. Rather, the possibilities emerge from local decisions made by modules that communicate with the blackboard, depositing data on and consuming data from it. Second, as we learn categories of dialogue segment types, we can move away from the fully decentralized control structure to one in which the central Dialogue Manager, as a blackboard module with special status, assumes increasing decision power over processing flow for dialogue segment types with which it is familiar. The intended contribution of this work is thus in the generic definition of standard dialogue functions such as dynamic troubleshooting (repair), context updating, anaphora resolution, and translation of natural language interpretations into the functional interface languages of back-end systems.</Paragraph>
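As a concrete illustration of this blackboard control style, the sketch below shows modules that deposit data on and consume data from a shared blackboard, with the Dialogue Manager given special status by being notified of every posting. It is a minimal, hypothetical Python rendering: the names Blackboard, post, subscribe, observe, and DialogueManager are our own and do not correspond to the OAA API or to the implemented systems.

```python
# Minimal blackboard sketch (hypothetical names, not the OAA API): modules
# subscribe to the data types they consume and post the results they produce;
# the Dialogue Manager, as a module with special status, observes every
# posting so it can monitor processing flow.
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Blackboard:
    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self.observers: List[Callable[[str, Any], None]] = []   # see everything

    def subscribe(self, kind: str, handler: Callable[[Any], None]) -> None:
        self.subscribers[kind].append(handler)

    def observe(self, handler: Callable[[str, Any], None]) -> None:
        self.observers.append(handler)

    def post(self, kind: str, data: Any) -> None:
        """Deposit data; observers and consumers of this kind react locally."""
        for handler in self.observers:
            handler(kind, data)
        for handler in self.subscribers[kind]:
            handler(data)


class DialogueManager:
    """Special-status module: monitors every communicative event."""
    def __init__(self, board: Blackboard) -> None:
        board.observe(self.monitor)
        self.board = board

    def monitor(self, kind: str, data: Any) -> None:
        print(f"[DM] saw {kind}: {data!r}")


if __name__ == "__main__":
    board = Blackboard()
    DialogueManager(board)
    # A recognizer module would post recognized text; a context tracker
    # consumes it, without either module calling the other directly.
    board.subscribe("recognized_text",
                    lambda text: print(f"[ContextTracker] resolving: {text!r}"))
    board.post("recognized_text", "charge 6 hours to it today")
```

As the paragraph above notes, once particular dialogue segment types become familiar, such a Dialogue Manager could take on more of the routing decisions itself rather than leaving them entirely to local module choices.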
<Paragraph position="7"> Future work includes investigating the issues raised when a human is engaged in more than one of our scenario dialogues concurrently. For example, how does one speech-enabled dialogue system among many determine when it is being addressed by the user? And how can the system judge whether the current utterance is human-computer, i.e., to be fully interpreted and acted upon by the system, as opposed to human-human, i.e., simply to be recorded, transcribed, or translated without interpretation?</Paragraph>
</Section>
</Section>
</Paper>