<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0811">
  <Title>MULTIPLATFORM Testbed: An Integration Platform for Multimodal Dialog Systems</Title>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 High-level Interfaces for Dialog System
Components
</SectionTitle>
    <Paragraph position="0"> Instead of using programming interfaces, the interaction between distributed components within the testbed framework is based on the exchange of structured data through messages. The communication platform can transfer arbitrary content, but careful design of the information flow and accurate specification of the content formats are essential elements of our approach.</Paragraph>
    <Paragraph position="1"> Agent communication languages like KQML (Finin et al., 1994) and FIPA ACL (Pitt and Mamdani, 1999) are not a natural choice in our context, since large-scale dialog systems typically mix knowledge-based components with conventional data-processing modules. A further aspect relates to the pool architecture, which relies not on unspecific point-to-point communication but on a clear modularization of the data links. The specification of the content format for each pool defines the common language that dialog system components use to interoperate.</Paragraph>
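    <Paragraph> As a purely illustrative sketch of this pool-based information flow (not the actual MULTIPLATFORM API; the class, the pool name, and the message content are invented), the following Python fragment models named data pools to which components publish structured messages and from which subscribed components consume them:
from collections import defaultdict, deque

class PoolBus:
    """Toy stand-in for a pool-based communication layer (illustrative only)."""

    def __init__(self):
        self.pools = defaultdict(deque)        # pool name -> queued messages
        self.subscribers = defaultdict(list)   # pool name -> registered callbacks

    def subscribe(self, pool, callback):
        # A component registers interest in one specific data pool.
        self.subscribers[pool].append(callback)

    def publish(self, pool, xml_message):
        # A component writes a structured message to a named pool;
        # every subscriber of that pool receives it.
        self.pools[pool].append(xml_message)
        for callback in self.subscribers[pool]:
            callback(xml_message)

bus = PoolBus()
# Hypothetical pool name; real pool names and content formats are fixed
# by the interface specification of the dialog system.
bus.subscribe("speech.recognition.hypotheses",
              lambda msg: print("consumer received:", msg))
bus.publish("speech.recognition.hypotheses",
            "XML content conforming to the pool's content format")
    </Paragraph>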
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.1 XML-based Data Interfaces
</SectionTitle>
      <Paragraph position="0"> Over the last few years, the Extensible Markup Language (XML) has become the premier choice for the flexible definition of application-specific data formats for information exchange. XML technology, which is based on standardized specifications, progresses rapidly and offers an enormous spectrum of useful techniques and tools.</Paragraph>
      <Paragraph position="1"> XML-based languages define an external notation for the representation of structured data and simplify the interchange of complex data between separate applications. All such languages share the basic XML syntax, which defines whether an arbitrary XML structure is well-formed, and they are built upon fundamental concepts like elements and attributes. A specific markup language needs to define the structure of the data by imposing constraints on the valid use of selected elements and attributes. This means that the language serves to encode semantic aspects of the data into syntactic restrictions.</Paragraph>
      <Paragraph position="2"> Various approaches have been developed for the formal specification of XML-based languages. The most prominent formalism is the document type definition (DTD).</Paragraph>
      <Paragraph position="3"> A DTD basically defines for each allowed element all allowed attributes and possibly the acceptable attribute values as well as the nesting and occurrences of each element. The DTD approach, however, is increasingly being superseded by XML Schema. Compared with the older DTD mechanism, a schema definition (XSD) offers two main advantages: the schema itself is also specified in XML notation, and the formalism is far more expressive as it enables more detailed restrictions on valid data structures. This includes in particular the description of element contents and not only the element structure. As a schema specification can provide a well-organized type structure, it also helps to better document the details of the data format definition. A human-friendly presentation of the communication interfaces is an important aid during system development.</Paragraph>
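      <Paragraph> As a concrete illustration of how such a schema specification can be put to work during development, the following Python sketch uses the third-party lxml library to validate an incoming document against an XSD before a component processes it; the file names are placeholders:
from lxml import etree

# Load a schema definition (XSD); the file name is a placeholder.
schema = etree.XMLSchema(etree.parse("interface.xsd"))

# Parse an incoming message and check it against the schema.
document = etree.parse("incoming_message.xml")
if schema.validate(document):
    print("message conforms to the interface specification")
else:
    # The error log explains which structural restriction was violated.
    print(schema.error_log)
      </Paragraph>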
      <Paragraph position="4"> It should be noted that the design of an XML language for the external representation of complex data constitutes a non-trivial task. Our experience is that design decisions have to be made carefully. For example, it is better to minimize the use of attributes. They are limited to unstructured data and may occur at most once within a single element. Preferring elements over attributes better supports the evolution of a specification since the content model of an element can easily be redefined to be structured and the maximum number of occurrences can simply be increased to more than one. A further principle for a well-designed XML language requires that the element structure reflects all details of the inherent structure of the represented data, i.e. textual content for an element should be restricted to well-defined elementary types. Another important guideline is to apply strict naming rules so that it becomes easier to grasp the intended meaning of specific XML structures.</Paragraph>
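      <Paragraph> To illustrate the element-over-attribute guideline with invented element and attribute names, the sketch below encodes the same information twice using Python's standard ElementTree API; only the element-based variant can later be given nested structure or repeated occurrences without breaking the design:
import xml.etree.ElementTree as ET

# Attribute-based encoding: each value must stay an unstructured string
# and can occur at most once per element.
attr_style = ET.Element("presentationTask", id="t1", modality="graphics")

# Element-based encoding: each piece of information is its own element,
# so its content model can later be refined and it can be repeated.
elem_style = ET.Element("presentationTask")
ET.SubElement(elem_style, "id").text = "t1"
ET.SubElement(elem_style, "modality").text = "graphics"
ET.SubElement(elem_style, "modality").text = "speech"  # a second occurrence

print(ET.tostring(attr_style, encoding="unicode"))
print(ET.tostring(elem_style, encoding="unicode"))
      </Paragraph>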
      <Paragraph position="5"> From the point of view of component development, standard XML software can be used to parse and generate the transferred content structures. The DOM API makes the data available as a generic tree structure--the document object model--in terms of elements and attributes. Another interesting option is to employ XSLT stylesheets to flexibly transform between the external XML format used for communication and a given internal markup language of the specific component. The use of XSLT makes it easier to adapt a component to interface modifications and simplifies its re-use in another dialog system. Instead of working on basic XML structures like elements and attributes, XML data binding can be used for a direct mapping between program-internal data structures and application-specific XML markup. In this approach, the language specification in the form of a DTD or an XML Schema is exploited to automatically generate a corresponding object model in a given programming language.</Paragraph>
      <Paragraph> [Figure caption fragment: the intention lattice represents the interpretation result for a multimodal user input that can be stated as: &amp;quot;I would like to know more about this [a0 ].&amp;quot;]</Paragraph>
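      <Paragraph> The XSLT option mentioned above can be sketched as follows, again with the third-party lxml library and placeholder file names: an incoming document in the external interface format is transformed into the component's internal markup before further processing.
from lxml import etree

# Stylesheet that maps the external interface format onto the
# component-internal markup; the file name is a placeholder.
transform = etree.XSLT(etree.parse("external_to_internal.xsl"))

external_doc = etree.parse("pool_message.xml")
internal_doc = transform(external_doc)

# The result can now be handled by the component's own logic,
# or serialized again for inspection.
print(str(internal_doc))
      </Paragraph>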
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
3.2 Multimodal Markup Language
</SectionTitle>
      <Paragraph position="0"> In the context of the SMARTKOM project (see section 4.2) we have developed M3L (Multimodal Markup Language) as a complete XML language that covers all data interfaces within this complex multimodal dialog system. Instead of using several quite different XML languages for the various data pools, we aimed at an integrated and coherent language specification, which includes all sub-structures that may occur on the different pools. In order to make the specification process manageable and to provide a thematic organization, the M3L language definition has been decomposed into about 40 schema specifications.</Paragraph>
      <Paragraph position="1"> Figure 2 shows an excerpt from a typical M3L expression. The basic data flow from user input to system output continuously adds further processing results so that the representational structure is refined step by step. Intentionally, M3L has not been devised as a generic knowledge representation language, which would require an inference engine in every single component so that the exchanged structures can be interpreted adequately. Instead, very specific element structures are used to convey meaning on the syntactic level. Obviously, not all relevant semantic aspects can be covered on the syntax level using a formalism like DTD or XSD. This means that it is impossible to exclude all kinds of meaningless data from the language definition, and the design of an interface specification will always be a sort of compromise. Conceptual taxonomies provide the foundation for the representation of domain knowledge as it is required within a dialog system to enable a natural conversation in the given application scenario. In order to exchange instantiated knowledge structures between different system components, they need to be encoded in M3L. Instead of relying on a manual reproduction of the underlying terminological knowledge within the M3L definition, we decided to automate that task. Our tool OIL2XSD (Gurevych et al., 2003) transforms an ontology written in OIL (Fensel et al., 2001) into an M3L-compatible XML Schema definition. The resulting schema specification captures the hierarchical structure and a significant part of the semantics of the ontology. For example, in Figure 2 the representation of the event structure inside the intention lattice originates from the ontology. The main advantage of this approach is that the structural knowledge available on the semantic level is consistently mapped to the communication interfaces, and M3L can easily be updated as the ontology evolves.</Paragraph>
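      <Paragraph> The actual OIL2XSD transformation is not reproduced here; the following sketch merely illustrates the underlying idea under the simplifying assumption that the ontology is available as a plain class-to-superclass dictionary: every ontology class becomes a named complex type, and subclassing is mapped to type derivation by extension. The class names and the helper function are invented for the example.
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def hierarchy_to_schema(subclass_of):
    """Map a {class: superclass} dictionary to a bare-bones XML Schema."""
    schema = ET.Element(f"{{{XS}}}schema")
    for cls, parent in subclass_of.items():
        ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=cls)
        if parent is not None:
            # Subclassing in the ontology becomes derivation by extension.
            content = ET.SubElement(ctype, f"{{{XS}}}complexContent")
            ET.SubElement(content, f"{{{XS}}}extension", base=parent)
    return schema

# Invented toy hierarchy in the spirit of the event structures mentioned above.
toy_ontology = {"Event": None, "InformationSearch": "Event"}
print(ET.tostring(hierarchy_to_schema(toy_ontology), encoding="unicode"))
      </Paragraph>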
      <Paragraph position="2"> In addition to the language specification itself, a specific M3L API has been developed, which offers a light-weight programming interface to simplify the processing of such XML structures within the implementation of a component. Customized testbed utilities like tailored XSLT stylesheets for the generic data viewer as well as several other tools are provided for easier evaluation of M3L-based processing results.</Paragraph>
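      <Paragraph> The M3L API itself is not shown here; as a rough, purely hypothetical analogue, a light-weight wrapper over Python's standard ElementTree module could hide raw element handling behind a few path-based accessors, for example:
import xml.etree.ElementTree as ET

class M3LDocument:
    """Hypothetical convenience wrapper; not the actual M3L API."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def text(self, path):
        # Character content of the first element matching an ElementPath
        # expression, or None if no such element exists.
        node = self.root.find(path)
        return node.text if node is not None else None

    def all_texts(self, path):
        # Character content of every matching element.
        return [node.text for node in self.root.findall(path)]
      </Paragraph>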
    </Section>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Sample Applications
</SectionTitle>
    <Paragraph position="0"> Our framework and the MULTIPLATFORM testbed have been employed to realize various natural language and multimodal dialog systems. In addition to the research prototypes mentioned here, MULTIPLATFORM has also been used as an integration platform for in-house projects of industrial partners and for our own commercial projects.</Paragraph>
    <Paragraph position="1"> The first incarnation of MULTIPLATFORM arose from the VERBMOBIL project, where the initial system architecture, which relied on a multi-agent approach with point-to-point communication, did not prove to be scalable (Klüter et al., 2000). The testbed has been enhanced in the context of the SMARTKOM project and was recently adapted for the COMIC system. As described in the previous sections, the decisive improvement of the current MULTIPLATFORM testbed is, besides a more robust implementation, a generalized architecture framework for multimodal dialog systems and the use of XML-based data interfaces as exemplified by the Multimodal Markup Language M3L.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 VERBMOBIL
</SectionTitle>
      <Paragraph position="0"> VERBMOBIL (Wahlster, 2000) is a speaker-independent and bidirectional speech-to-speech translation system that aims to provide users in mobile situations with simultaneous dialog interpretation services for restricted topics. The system handles dialogs in three business-oriented domains--appointment scheduling, travel planning, and remote PC maintenance--and provides context-sensitive translations between three languages (German, English, Japanese).</Paragraph>
      <Paragraph position="1"> VERBMOBIL follows a hybrid approach that incorporates both deep and shallow processing schemes. A peculiarity of the architecture is its multi-engine approach. Five concurrent translation engines, based on statistical translation, case-based translation, substring-based translation, dialog-act based translation, and semantic transfer, compete to provide complete or partial translation results.</Paragraph>
      <Paragraph position="2"> The final choice of the translation result is made by a statistical selection module on the basis of the confidence measures provided by the translation paths.</Paragraph>
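      <Paragraph> A hedged sketch of such a selection step is given below, with invented engine names and confidence values; the simple maximum rule here is only a placeholder for the statistical selection described above.
# Each competing engine delivers a (possibly partial) translation together
# with a confidence score for the current input segment (invented values).
candidates = [
    {"engine": "statistical", "translation": "...", "confidence": 0.71},
    {"engine": "case-based", "translation": "...", "confidence": 0.64},
    {"engine": "semantic-transfer", "translation": "...", "confidence": 0.82},
]

# Simplest possible selection rule: pick the candidate whose engine
# reports the highest confidence for this segment.
best = max(candidates, key=lambda c: c["confidence"])
print(best["engine"], "->", best["translation"])
      </Paragraph>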
      <Paragraph position="3"> In addition to a stationary prototype for face-to-face dialogs, another instance has been realized to offer translation services via telephone (Kirchmann et al., 2000).</Paragraph>
      <Paragraph position="4"> The final VERBMOBIL demonstrator consists of about 70 distributed software components that work together to recognize spoken input, analyze and translate it, and finally utter the translation. These modules are embedded into an earlier version of the MULTIPLATFORM testbed using almost 200 data pools--replacing several thousand point-to-point connections--to interconnect the components.</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 SMARTKOM
</SectionTitle>
      <Paragraph position="0"> SMARTKOM is a multimodal dialog system that combines speech, gesture, and facial expressions for both user input and system output (Wahlster et al., 2001). The system aims to provide an anthropomorphic and affective user interface through its personification of an interface agent. The interaction metaphor is based on the so-called situated, delegation-oriented dialog paradigm. The basic idea is that the user delegates a task to a virtual communication assistant, which is visualized as a life-like character. The interface agent recognizes the user's intentions and goals, asks the user for feedback if necessary, accesses the various services on behalf of the user, and presents the results in an adequate manner.</Paragraph>
      <Paragraph> [Figure caption fragment: Smartakus, the SMARTKOM life-like character, is shown in the lower left corner.]</Paragraph>
      <Paragraph position="2"> The current version of the MULTIPLATFORM testbed, including M3L, is used as the integration platform for SMARTKOM. The overall system architecture includes about 40 different components. As shown in Figure 3, the SMARTKOM project addresses three different application scenarios.</Paragraph>
      <Paragraph position="3"> SMARTKOM PUBLIC realizes an advanced multimodal information and communication kiosk for airports, train stations, or other public places. It supports users seeking information about movie programs, offers reservation facilities, and provides personalized communication services using telephone, fax, or electronic mail.</Paragraph>
      <Paragraph position="4"> SMARTKOM HOME serves as a multimodal portal to information services. Using a portable webpad, the user is able to utilize the system as an electronic program guide or to easily control consumer electronics devices like a TV set or a VCR. Similar to the kiosk application, the user may also use communication services at home.</Paragraph>
      <Paragraph position="5"> In the context of SMARTKOM HOME two different interaction modes are supported and the user is able to easily switch between them. In lean-forward mode coordinated speech and gesture input can be used for multimodal interaction with the system. Lean-backward mode instead is constrained to verbal communication.</Paragraph>
      <Paragraph position="6"> SMARTKOM MOBILE uses a PDA as a front end, which can be added to a car navigation system or is carried by a pedestrian. This application scenario comprises services like integrated trip planning and incremental route guidance through a city via GPS and GSM,</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 COMIC
</SectionTitle>
      <Paragraph position="0"> COMIC (Conversational Multimodal Interaction with Computers) is a recent research project that focuses on computer-based mechanisms of interaction in cooperative work. One specific sample application for COMIC is a design tool for bathrooms with an enhanced multimodal interface. The main goal of the experimental work is to show that advanced multimodal interaction can make such a tool usable for non-experts as well.</Paragraph>
      <Paragraph position="1"> The realization of the integrated COMIC demonstrator is based on the MULTIPLATFORM testbed. Figure 4 displays the control interface of the multimodal dialog system. On the input side, speech and handwriting in combination with 3-dimensional pen-based gestures can be employed by the user. On the output side, a dynamic avatar with synthesized facial, head, and eye movements is combined with task-related graphical and textual information. In addition to multiple input and output channels, there are components that combine the inputs--taking into account paralinguistic information like intonation and hesitations--and interpret them in the context of the dialog, plan the application-specific actions to be taken, and finally split the output information over the available channels.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> GCSI, the Galaxy Communicator software infrastructure (Seneff et al., 1999), is an open source architecture for the realization of natural language dialog systems. It can be described as a distributed, message-based, client-server architecture, which has been optimized for constructing spoken dialog systems. The key component in this framework is a central hub, which mediates the interaction among various servers that realize different dialog system components. The central hub not only handles all communication among the server modules but is also responsible for maintaining the flow of control that determines the processing within the integrated dialog system. To achieve this, the hub is able to interpret scripts encoded in a special-purpose, run-time executable programming language.</Paragraph>
    <Paragraph position="1"> The GCSI architecture is fundamentally different from our approach. Within the MULTIPLATFORM testbed there exists no centralized controller component which could become a potential bottleneck for more complex dialog systems.</Paragraph>
    <Paragraph position="2"> OAA, the Open Agent Architecture (Martin et al., 1999), is a framework for integrating a community of heterogeneous software agents in a distributed environment. All communication and cooperation between the different agents is achieved via messages expressed in ICL, a logic-based declarative language capable of representing natural language expressions. Similar to the GCSI architecture, a sort of centralized processing unit is required to control the behavior of the integrated system. So-called facilitator agents reason about the agent interactions necessary for handling a given complex ICL expression, i.e. the facilitator coordinates the activities of agents for the purpose of achieving higher-level, complex problem-solving objectives. Sample applications built with the OAA framework also incorporated techniques to use multiple input modalities. The user can point, speak, draw, handwrite, or even use a standard graphical user interface in order to communicate with a collection of agents.</Paragraph>
    <Paragraph position="3"> RAGS (Cahill et al., 2000) does not address the entire architecture of dialog systems and multimodal interaction. The RAGS approach, which stands for Reference Architecture for Generation Systems, focuses instead on natural language generation systems and aims to produce an architectural specification and model for the development of new applications in this area. RAGS is based on the well-known three-stage pipeline model for natural language generation which distinguishes between content determination, sentence planning, and linguistic realization. The main component of the RAGS architecture is a data model, in the form of a set of declarative linguistic representations which cover the various levels of representation that have to be taken into account within the generation process. XML-based notations for the data model can be used in order to exchange RAGS representations between distributed components. The reference architecture is open regarding the technical interconnection of the different components of a generation system. One specifically supported solution is the use of a single centralized data repository.</Paragraph>
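    <Paragraph> To make the three-stage pipeline model concrete, the toy sketch below chains the stages named above; the function names, intermediate data, and example content are invented for illustration and do not reflect the actual RAGS data model.
def determine_content(goal):
    # Content determination: decide which facts to communicate (toy data).
    return [("destination", "Heidelberg"), ("departure", "7 pm")]

def plan_sentences(facts):
    # Sentence planning: group facts into sentence-sized messages.
    return [dict(facts)]

def realize(sentence_plans):
    # Linguistic realization: map each sentence plan to a surface string.
    return [f"The train to {p['destination']} leaves at {p['departure']}."
            for p in sentence_plans]

for sentence in realize(plan_sentences(determine_content("inform_user"))):
    print(sentence)
    </Paragraph>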
  </Section>
</Paper>