File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/02/w02-1715_abstr.xml

Size: 5,489 bytes

Last Modified: 2025-10-06 13:42:41

<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1715">
  <Title>SALT: An XML Application for Web-based Multimodal Dialog Management</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> This paper describes the Speech Application Language Tags, or SALT, an XML-based spoken dialog standard for multimodal or speech-only applications. A key premise in SALT design is that a speech-enabled user interface shares many of its design principles and computational requirements with the graphical user interface (GUI). As a result, it is logical to introduce into speech the object-oriented, event-driven model that is known to be flexible and powerful enough to meet the requirements of sophisticated GUIs. By reusing this rich infrastructure, dialog designers are relieved of having to develop the underlying computing infrastructure and can focus on the core user interface design issues rather than on the computer and software engineering details. The paper focuses on the Web-based distributed computing environment and elaborates on how SALT can be used to implement multimodal dialog systems.</Paragraph>
    <Paragraph position="1"> How advanced dialog effects (e.g., cross-modality reference resolution, implicit confirmation, multimedia synchronization) can be realized in SALT is also discussed.</Paragraph>
    <Paragraph position="2"> Introduction. A multimodal interface allows a human user to interact with the computer using more than one input method. A GUI, for example, is multimodal because a user can interact with the computer using a keyboard, stylus, or pointing device. The GUI is an immensely successful concept, notably demonstrated by the World Wide Web. Although the relevant technologies for the Internet had long existed, it was not until the adoption of the GUI for the Web that we witnessed a surge in its usage and rapid improvements in Web applications.</Paragraph>
    <Paragraph position="3"> GUI applications have to address the issues commonly encountered in a goal-oriented dialog system. In other words, GUI applications can be viewed as conducting a dialog with their users in an iconic language. For example, it is very common for an application and its human user to undergo many exchanges before a task is completed. The application therefore must manage the interaction history in order to properly infer the user's intention. The interaction style is mostly system-initiative because the user often has to follow the prescribed interaction flow, where allowable branches are visualized as graphical icons. Many applications have introduced mixed-initiative features such as type-in help or a search box.</Paragraph>
    <Paragraph position="4"> However, user-initiated digressions are often recoverable only if they are anticipated by the application designers. The plan-based dialog theory (Sadek et al 1997, Allen 1995, Cohen et al 1989) suggests that, in order for mixed-initiative dialog to function properly, the computer and the user should be collaborating partners that actively assist each other in planning the dialog flow. An application will be perceived as hard to use if the flow logic is obscure or unnatural to the user and, similarly, the user will feel frustrated if the methods to express intents are too limited. It is widely believed that spoken language can improve the user interface because it provides the user with a natural and less restrictive way to express intents and receive feedback.</Paragraph>
    <Paragraph position="5"> The Speech Application Language Tags (SALT 2002) is a proposed standard for implementing spoken language interfaces. The core of SALT is a collection of objects that enable a software program to listen, speak, and communicate with other components residing on the underlying platform (e.g., discourse manager, other input modalities, telephone interface). Like their predecessors in the Microsoft Speech Application Programming Interface (SAPI), SALT objects are programming language independent. As a result, SALT objects can be embedded into an HTML or any XML document as the spoken language interface (Wang 2000). Introducing speech capabilities to the Web is not new (Aron 1991, Ly et al 1993, Lau et al 1997). However, it is the foremost design goal of SALT that advanced dialog management techniques (Seneff et al 1998, Rudnicky et al 1999, Lin et al 1999, Wang 1998) can be realized in a straightforward manner in SALT.</Paragraph>
    <Paragraph position="6"> The rest of the paper is organized as follows. In Sec. 1, we first review the dialog architecture on which the SALT design is based. It is argued that advanced spoken dialog models can be realized using the Web infrastructure. Specifically, various stages of dialog goals can be modeled as Web pages that the user will navigate through.</Paragraph>
    <Paragraph position="7"> Considerations in flexible dialog designs have direct implications for the XML document structures. How SALT implements these document structures is outlined. In Sec. 2, the XML objects providing spoken language understanding and speech synthesis are described. These objects are designed using the event-driven architecture so that they can be included in the GUI environment for multimodal interactions. Finally, in Sec. 3, we describe how SALT, which is based on XML, utilizes the extensibility of XML to allow new extensions without losing document portability.</Paragraph>
  </Section>
</Paper>