<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-4015">
  <Title>First.Last@dfki.de</Title>
  <Section position="5" start_page="0" end_page="57" type="metho">
    <SectionTitle>
3 Architecture
</SectionTitle>
    <Paragraph position="0"> Our system architecture follows the classical approach (Bunt et al., 2005) of a pipelined architecture with multimodal interpretation (fusion) and fission modules encapsulating the dialogue manager.</Paragraph>
    <Paragraph position="1"> [Example dialogue: U: Show me the Beatles albums. S: I have these four Beatles albums. [shows a list of album names] U: Which songs are on this one? [selects the Red Album] S: The Red Album contains these songs. [shows a list of the songs] U: Play the third one. S: [music plays]]</Paragraph>
    <Paragraph position="2"> Fig. 2 shows the modules and their interaction: modality-specific recognizers and analyzers provide semantically interpreted input to the multimodal fusion module, which interprets it in the context of the other modalities and the current dialogue context. The dialogue manager decides on the next system move based on its model of the tasks as collaborative problem solving, the current context, and the results of calls to the MP3 database. The turn planning module then determines an appropriate message to the user by planning the content, distributing it over the available output modalities, and finally coordinating and synchronizing the output. Modality-specific output modules generate spoken output and graphical display updates. All modules interact with the extended information state, which stores all context information.</Paragraph>
    <Paragraph position="3"> Many tasks in the SAMMIE system are modeled by a plan-based approach. Discourse modeling, interpretation management, dialogue management, linguistic planning, and turn planning are all based on the production rule system PATE, short for (P)roduction rule system based on (A)ctivation and (T)yped feature structure (E)lements (Pfleger, 2004). It is based on some concepts of the ACT-R 4.0 system, in particular the goal-oriented application of production rules, the activation of working memory elements, and the weighting of production rules. In processing typed feature structures, PATE provides two operations that both integrate data and are suitable for condition matching in production rule systems: a slightly extended version of general unification, and the discourse-oriented operation overlay (Alexandersson and Becker, 2001).</Paragraph>
  </Section>
  <Section position="6" start_page="57" end_page="57" type="metho">
    <SectionTitle>
4 Related Work and Novel Aspects
</SectionTitle>
    <Paragraph position="0"> Many dialogue systems deployed today follow a state-based approach that explicitly models the full (finite) set of dialogue states and all possible transitions between them. The VoiceXML standard is a prominent example of this approach. It has two drawbacks. First, it is not very flexible and typically allows only so-called system-controlled dialogues, in which the user is restricted to choosing their input from provided menu-like lists and answering specific questions.</Paragraph>
    <Paragraph position="1"> The user is never in control of the dialogue. For restricted tasks with a clear structure, such an approach is often sufficient and has been applied successfully. Second, building such applications requires a fully specified model of all possible states and transitions, making larger applications expensive to build and difficult to test.</Paragraph>
    <Paragraph position="2"> In SAMMIE we adopt an approach that models the interaction on an abstract level as collaborative problem solving and adds application specific knowledge on the possible tasks, available resources and known recipes for achieving the goals.</Paragraph>
    <Paragraph position="3"> In addition, all relevant context information is administered in an Extended Information State.</Paragraph>
    <Paragraph position="4"> This is an extension of the Information State Update approach (Traum and Larsson, 2003) to the multi-modal setting.</Paragraph>
    <Paragraph position="5"> Novel aspects in turn planning and realization include the comprehensive modeling in a single, OWL-based ontology and an extended range of context-sensitive variation, including system alignment to the user on multiple levels.</Paragraph>
  </Section>
  <Section position="7" start_page="57" end_page="59" type="metho">
    <SectionTitle>
5 Flexible Multi-modal Interaction
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="57" end_page="58" type="sub_section">
      <SectionTitle>
5.1 Extended Information State
</SectionTitle>
      <Paragraph position="0"> The information state of a multimodal system needs to contain a representation of contextual information about discourse, but also a representation of modality-specific information and user-specific information which can be used to plan system output suited to a given context. The overall information state (IS) of the SAMMIE system is shown in Fig. 3.</Paragraph>
      <Paragraph position="1"> The contextual information partition of the IS represents the multimodal discourse context. It contains a record of the latest user utterance and preceding discourse history representing in a uniform way the salient discourse entities introduced in the different modalities. We adopt the three-tiered multimodal context representation used in the SmartKom system (Pfleger et al., 2003). The contents of the task partition are explained in the next section.</Paragraph>
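      <Paragraph position="2"> A minimal sketch of the partitions just described might look as follows; all field names are illustrative assumptions, since the actual structure is the one given in Fig. 3 of the paper:

```python
# Illustrative sketch of the SAMMIE information state partitions:
# a contextual partition with the latest user utterance and a uniform
# discourse history across modalities, plus a task partition (Sec. 5.2).
# Field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class DiscourseEntity:
    referent: str
    modality: str  # entities may be introduced by speech or graphics

@dataclass
class ContextualInfo:
    last_user_utterance: str = ""
    # salient entities from all modalities, represented uniformly
    discourse_history: list[DiscourseEntity] = field(default_factory=list)

@dataclass
class InformationState:
    contextual: ContextualInfo = field(default_factory=ContextualInfo)
    task: dict = field(default_factory=dict)  # CPS state, see Sec. 5.2

is_ = InformationState()
is_.contextual.last_user_utterance = "Which songs are on this one?"
is_.contextual.discourse_history.append(
    DiscourseEntity(referent="Red Album", modality="graphics"))
```

Keeping entities from both modalities in one history is what lets fusion resolve "this one" against an item the user selected on the display.</Paragraph>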
    </Section>
    <Section position="2" start_page="58" end_page="58" type="sub_section">
      <SectionTitle>
5.2 Collaborative Problem Solving
</SectionTitle>
      <Paragraph position="0"> Our dialogue manager is based on an agent-based model which views dialogue as collaborative problem-solving (CPS) (Blaylock and Allen, 2005). The basic building blocks of the formal CPS model are problem-solving (PS) objects, which we represent as typed feature structures. PS object types form a single-inheritance hierarchy. In our CPS model, we define types for the upper level of an ontology of PS objects, which we term abstract PS objects. There are six abstract PS objects in our model from which all other domain-specific PS objects inherit: objective, recipe, constraint, evaluation, situation, and resource. These are used to model problem-solving at a domain-independent level and are taken as arguments by all update operators of the dialogue manager which implement conversation acts (Blaylock and Allen, 2005).</Paragraph>
      <Paragraph position="1"> The model is then specialized to a domain by inheriting and instantiating domain-specific types and instances of the PS objects.</Paragraph>
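      <Paragraph position="2"> The single-inheritance hierarchy and its domain specialization can be sketched as below; the domain classes (Song, PlaySong) are invented examples, not types taken from the paper:

```python
# Sketch of the PS object type hierarchy: six abstract PS object types
# from which all domain-specific types inherit. Update operators that
# implement conversation acts take the abstract types as arguments, so
# they work for any domain specialization.

class PSObject: ...          # root of the PS object type hierarchy

# The six abstract PS object types of the CPS model:
class Objective(PSObject): ...
class Recipe(PSObject): ...
class Constraint(PSObject): ...
class Evaluation(PSObject): ...
class Situation(PSObject): ...
class Resource(PSObject): ...

# Domain specialization: hypothetical MP3-domain types.
class Song(Resource):
    def __init__(self, title):
        self.title = title

class PlaySong(Objective):
    def __init__(self, song):
        self.song = song

# A domain-independent update operator only requires the abstract type.
def identify_objective(objective: Objective) -> str:
    assert isinstance(objective, Objective)
    return type(objective).__name__

obj = PlaySong(Song("Yesterday"))
assert identify_objective(obj) == "PlaySong"
```

Because the dialogue manager's operators are written against the six abstract types, moving to a new domain only requires supplying new subclasses, not new dialogue logic.</Paragraph>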
    </Section>
    <Section position="3" start_page="58" end_page="58" type="sub_section">
      <SectionTitle>
5.3 Adaptive Turn Planning
</SectionTitle>
      <Paragraph position="0"> The fission component comprises detailed content planning, media allocation, and coordination and synchronization. Turn planning takes a set of CPS-specific conversational acts generated by the dialogue manager and maps them to modality-specific communicative acts.</Paragraph>
      <Paragraph position="1"> Information on how content should be distributed over the available modalities (speech or graphics) is obtained from Pastis, a module which stores discourse-specific information. Pastis provides information about (i) the modality on which the user is currently focused, derived from the current discourse context; (ii) the user's current cognitive load when system interaction becomes a secondary task (e.g., system interaction while driving); and (iii) the user's expertise, which is represented as a state variable. Pastis also contains information about factors that influence the preparation of output rendering for a modality, such as the language currently in use (German or English) or the display capabilities (e.g., the maximum number of objects displayable within a table). Together with the part of the information state embedded in the dialogue manager, the information stored by Pastis forms the Extended Information State of the SAMMIE system (Fig. 3).</Paragraph>
      <Paragraph position="2"> Planning is then executed through a set of production rules that determine which kind of information should be presented through which of the available modalities. The rule set is divided into two subsets, domain-specific and domain-independent rules, which together form the system's multi-modal plan library.</Paragraph>
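      <Paragraph position="3"> The flavor of such modality-allocation rules can be sketched as below; the conditions, thresholds, and act names are invented for illustration and do not reproduce the actual rule set:

```python
# Hedged sketch of modality allocation in turn planning: a conversational
# act plus the Pastis context factors (cognitive load, focused modality,
# display limits) is mapped to modality-specific communicative acts.
# All rule conditions and thresholds here are hypothetical.

MAX_DISPLAYABLE = 10  # e.g., maximum rows displayable in a table

def allocate(act, n_items, cognitive_load, focused_modality):
    """Return the list of modality-specific acts for one system turn."""
    # Under high cognitive load (e.g., driving), prefer speech-only output.
    if cognitive_load == "high":
        return [("speech", act)]
    # Lists go to the display if it can show them and the user is
    # focused on graphics, with a coordinated spoken summary.
    if (act == "present-list" and n_items <= MAX_DISPLAYABLE
            and focused_modality == "graphics"):
        return [("graphics", act), ("speech", "verbal-summary")]
    # Default: realize the act in speech.
    return [("speech", act)]

# While driving, speech only, regardless of list length:
assert allocate("present-list", 4, "high", "graphics") == \
    [("speech", "present-list")]
# Low load, short list, graphics focus: display plus spoken summary.
assert allocate("present-list", 4, "low", "graphics") == \
    [("graphics", "present-list"), ("speech", "verbal-summary")]
```

In the real system these decisions are production rules over the Extended Information State rather than a single function, but the inputs and outputs are of this kind.</Paragraph>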
    </Section>
    <Section position="4" start_page="58" end_page="59" type="sub_section">
      <SectionTitle>
5.4 Spoken Natural Language Output
Generation
</SectionTitle>
      <Paragraph position="0"> Our goal is to produce output that varies in the surface realization form and is adapted to the context. A template-based module has been developed and is sufficient for classes of system output that do not need fine-tuned context-driven variation. Our template-based generator can also deliver alternative realizations, e.g., alternative syntactic constructions, referring expressions, or lexical items. It is implemented by a set of straightforward sentence planning rules in the PATE system to build the templates, and a set of XSLT transformations to yield the output strings. Output in German and English is produced by accessing different dictionaries in a uniform way.</Paragraph>
      <Paragraph position="1"> In order to facilitate incremental development of the whole system, our template-based module has full coverage with respect to the classes of system output that are needed. In parallel, we are experimenting with a linguistically more powerful grammar-based generator using OpenCCG, an open-source natural language processing environment (Baldridge and Kruijff, 2003). This allows for more fine-grained and controlled choices between linguistic expressions in order to achieve contextually appropriate output.</Paragraph>
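      <Paragraph position="2"> A minimal sketch of the template-based approach is given below; the templates and lexicon entries are invented for illustration, and the real module builds its templates with PATE sentence-planning rules and XSLT rather than plain string formatting:

```python
# Sketch of a template-based generator that delivers alternative surface
# realizations for the same content and produces German or English output
# by accessing different dictionaries in a uniform way. Templates and
# lexicon entries are hypothetical.
import random

LEXICON = {
    "en": {"contains": "contains", "these_songs": "these songs"},
    "de": {"contains": "enthält", "these_songs": "diese Lieder"},
}

# Alternative templates for one message ("album X contains these songs"):
TEMPLATES = [
    "{album} {contains} {these_songs}.",
    "Here are {these_songs} on {album}.",
]

def realize(album, lang="en", rng=random):
    """Pick one of the alternative realizations and fill it in."""
    template = rng.choice(TEMPLATES)
    return template.format(album=album, **LEXICON[lang])

out = realize("The Red Album", lang="en", rng=random.Random(0))
assert "The Red Album" in out
```

Keeping the choice among templates separate from the per-language dictionaries is what allows the same planning rules to serve both output languages.</Paragraph>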
    </Section>
    <Section position="5" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
5.5 Modeling with an Ontology
</SectionTitle>
      <Paragraph position="0"> We use a full model in OWL as the knowledge representation format in the dialogue manager, turn planner and sentence planner. This model includes the entities, properties and relations of the MP3 domain, including the player, database and playlists. All possible tasks that the user may perform are also modeled explicitly. This task model is user-centered and not simply a model of the application's API. The OWL-based model is transformed automatically to the internal format used in the PATE rule interpreter.</Paragraph>
      <Paragraph position="1"> We use multiple inheritance to model different views of concepts and the corresponding presentation possibilities; e.g., a song is a browsable-object as well as a media-object and thus allows for very different presentations, depending on context. This gives PATE an efficient and elegant way to create more generic presentation planning rules.</Paragraph>
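      <Paragraph position="2"> The song example can be sketched with ordinary multiple inheritance; class and method names below are illustrative, not taken from the ontology:

```python
# Sketch of the multiple-inheritance modeling: a Song is both a
# browsable-object (it can appear in lists on the display) and a
# media-object (it can be played), so generic presentation rules
# written for either view apply to it without song-specific rules.

class BrowsableObject:
    def list_entry(self):            # rendering for a browsing list
        return f"[item] {self.label()}"

class MediaObject:
    def play_action(self):           # rendering as playable media
        return f"play({self.label()})"

class Song(BrowsableObject, MediaObject):
    def __init__(self, title):
        self.title = title
    def label(self):
        return self.title

s = Song("Yesterday")
# A generic presentation rule for browsable objects applies...
assert s.list_entry() == "[item] Yesterday"
# ...and so does one written for media objects.
assert s.play_action() == "play(Yesterday)"
```

Each new concept only declares which views it participates in, and the existing generic presentation rules cover it.</Paragraph>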
    </Section>
  </Section>
</Paper>