<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-3015">
  <Title>Clavius: Bi-Directional Parsing for Generic Multimodal Interaction</Title>
  <Section position="3" start_page="0" end_page="85" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Since the seminal work of Bolt (1980), the methods applied to multimodal interaction (MMI) have diverged towards irreconcilable approaches retrofitted to models not specifically amenable to the problem. For example, the representational differences between neural networks, decision trees, and finite-state machines (Johnston and Bangalore, 2000) have limited the adoption of results obtained with these models, and the typical reliance on whole unimodal sentences defeats one of the main advantages of MMI: the ability to constrain the search using cross-modal information as early as possible.</Paragraph>
    <Paragraph position="1"> CLAVIUS is the result of an effort to combine sensing technologies for several modality types, chiefly speech and video-tracked gestures, within the immersive virtual environment (Boussemart, 2004) shown in Figure 1. Its purpose is to comprehend multimodal phrases such as "put this ↘ here ↘.", where ↘ denotes a pointing gesture, in either command-based or dialogue interaction.</Paragraph>
    <Paragraph position="2"> CLAVIUS provides a new, flexible, and trainable bi-directional parsing algorithm over multi-dimensional input spaces, and produces modality-independent semantic interpretations at low computational cost.</Paragraph>
    <Section position="1" start_page="0" end_page="85" type="sub_section">
      <SectionTitle>
1.1 Graphical Models and Unification
</SectionTitle>
      <Paragraph position="0"> Unification grammars on typed directed acyclic graphs have been explored previously in MMI, but typically extend existing mechanisms not designed for multi-dimensional input. For example, both Holzapfel et al. (2004) and Johnston (1998) essentially adapt Earley's chart parser by representing edges as sets of references to terminal input elements, which are unified as new edges are added to the agenda. In practice, this has led to systems that analyze every possible subset of the input, resulting in a combinatorial explosion that grows further once cross-sentential phenomena such as anaphora, and the effects of noise and uncertainty on speech and gesture tracking, are taken into account. We will later show the extent to which CLAVIUS reduces the size of the search space.</Paragraph>
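To make this growth concrete, the sketch below assumes, purely for illustration, that every non-empty subset of the input elements yields one candidate edge, as when edges are sets of references to terminal input elements. With n input elements there are then 2^n - 1 candidates. The function name `candidate_edges` and the example input are hypothetical and are not taken from the cited systems.

```python
from itertools import combinations


def candidate_edges(inputs):
    """Enumerate every non-empty subset of the input elements.

    Simplifying assumption: each subset stands for one candidate chart
    edge, i.e. an edge represented as a set of references to terminal
    input elements.
    """
    edges = []
    for size in range(1, len(inputs) + 1):
        edges.extend(frozenset(c) for c in combinations(inputs, size))
    return edges


# Hypothetical input: three spoken words and two pointing gestures.
inputs = ["put", "this", "here", "gesture_1", "gesture_2"]
print(len(candidate_edges(inputs)))  # 31, i.e. 2**5 - 1

# The count roughly doubles with every additional input element.
for n in (5, 10, 15, 20):
    print(n, 2**n - 1)
```

Real systems prune many of these subsets through unification failures, so the figure is an upper bound rather than a measured cost.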
      <Paragraph position="1"> Directed graphs conveniently represent both syntactic and semantic structure, and all partial parses in CLAVIUS, including terminal-level input, are represented as graphs. Few restrictions apply, except that arcs labelled CAT and TIME must exist to represent the grammar category and the time spanned by the parse, respectively (see footnote 1). Similarly, all grammar rules, G_i : LHS ← RHS_1 RHS_2 ... RHS_r, are graphical structures, as exemplified in Figure 2.</Paragraph>
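A minimal sketch of graph unification, assuming feature structures are encoded as nested Python dicts whose keys are arc labels. It simplifies a true DAG to a tree (no shared, re-entrant nodes) and treats TIME as an atomic (start, end) pair. The helper names `unify` and `has_required_arcs` are hypothetical and do not describe CLAVIUS's actual interface.

```python
from typing import Any, Optional

# Arcs every partial parse must carry, per the text above.
REQUIRED_ARCS = ("CAT", "TIME")


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures encoded as nested dicts.

    Simplification (assumption): structures are trees, so the shared
    (re-entrant) nodes of a true DAG are not modelled. Atomic values
    unify only if they are equal. Returns None on failure.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for label, value in b.items():
            if label in result:
                merged = unify(result[label], value)
                if merged is None:
                    return None  # conflicting values under the same arc
                result[label] = merged
            else:
                result[label] = value
        return result
    return a if a == b else None


def has_required_arcs(node: dict) -> bool:
    """Check that a partial parse carries both CAT and TIME arcs."""
    return all(arc in node for arc in REQUIRED_ARCS)


# Hypothetical terminal-level graph for the spoken word "this";
# TIME is assumed to be a (start, end) pair in seconds.
word = {"CAT": "pronoun", "TIME": (1.2, 1.5), "SEM": {"TYPE": "object"}}
constraint = {"CAT": "pronoun", "SEM": {"NUM": "sg"}}

print(unify(word, constraint))
print(unify(word, {"CAT": "verb"}))  # None: CAT values clash
print(has_required_arcs(word))       # True
```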
    </Section>
    <Section position="2" start_page="85" end_page="85" type="sub_section">
      <SectionTitle>
1.2 Multimodal Bi-Directional Parsing
</SectionTitle>
      <Paragraph position="0"> Our parsing strategy combines bottom-up and top-down approaches, but differs from other approaches to bi-directional chart parsing (Rocio, 1998) in several key respects, discussed below.</Paragraph>
      <Paragraph position="1"> A defining characteristic of our approach is that edges are selected asynchronously by two concurrent processing threads, rather than serially in a two-stage process. In this way, we can distribute processing across multiple machines, or dynamically alter the priority given to each thread. More generally, this yields a more dynamic process in which neither thread can dominate the other. By contrast, in typical bi-directional chart parsing the top-down component is activated only when the bottom-up component has no more legal expansions (Ageno, 2000).</Paragraph>
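The asynchronous selection scheme can be sketched, under heavy simplification, with two Python threads that pull edges from one shared, thread-safe agenda. The function `run_parser_threads` and the thread names are hypothetical. The sketch shows only that neither thread waits for the other to finish a stage; it omits the actual bottom-up and top-down expansion.

```python
import queue
import threading


def run_parser_threads(edges, workers=("bottom-up", "top-down")):
    """Let two threads select edges concurrently from one shared agenda.

    Sketch only: 'processing' an edge just records which thread selected
    it. Real bottom-up and top-down expansion (which would push new
    edges onto the agenda) is omitted, so threads stop once it is empty.
    """
    agenda = queue.Queue()  # thread-safe FIFO shared by both threads
    for edge in edges:
        agenda.put(edge)

    selected = {name: [] for name in workers}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                edge = agenda.get_nowait()
            except queue.Empty:
                return  # nothing left to select
            with lock:
                selected[name].append(edge)

    threads = [threading.Thread(target=worker, args=(name,)) for name in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return selected


result = run_parser_threads(range(1000))
print({name: len(picked) for name, picked in result.items()})
```

Which thread selects a given edge varies between runs. The sketch guarantees only that each edge is selected exactly once.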
      <Paragraph position="2"> Although evidence suggests that deictic gestures overlap or follow the corresponding spoken pronominals 85-93% of the time (Kettebekov et al., 2002), we must allow for all possible permutations of multi-dimensional input, for example both "put ↘ this ↘ here." and "put this ↘ here ↘.". We therefore take the unconventional approach of placing no mandatory ordering constraints on constituents; hence the rule G_abc : A ← B C parses the input "C B". We show how regular temporal ordering can easily be maintained in §3.5. (Footnote 1: Usually this timespan corresponds to the real-time occurrence of a speech or gestural event, but the actual semantics are left to the application designer.)</Paragraph>
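Placing no ordering constraints amounts to matching a rule's right-hand side as a multiset of categories. The sketch below assumes constituents are dicts with CAT and TIME arcs. `matches_unordered` is hypothetical, and `in_rhs_order` is an illustrative optional ordering check, not the mechanism the paper describes in §3.5.

```python
from collections import Counter


def matches_unordered(rhs, constituents):
    """True if the constituents' categories equal the rule's RHS as a
    multiset, so that no ordering constraint is imposed."""
    return Counter(rhs) == Counter(c["CAT"] for c in constituents)


def in_rhs_order(rhs, constituents):
    """Hypothetical optional check: each constituent must start no
    earlier than the one matched to the preceding RHS symbol ends.
    Assumes each category occurs at most once in the RHS."""
    by_cat = {c["CAT"]: c for c in constituents}
    spans = [by_cat[cat]["TIME"] for cat in rhs]
    return all(nxt[0] >= prev[1] for prev, nxt in zip(spans, spans[1:]))


# Rule G_abc: A ← B C, with hypothetical (start, end) TIME spans.
rhs = ["B", "C"]
c_then_b = [{"CAT": "C", "TIME": (0.0, 1.0)}, {"CAT": "B", "TIME": (1.0, 2.0)}]
b_then_c = [{"CAT": "B", "TIME": (0.0, 1.0)}, {"CAT": "C", "TIME": (1.0, 2.0)}]

print(matches_unordered(rhs, c_then_b))  # True: "C B" parses
print(in_rhs_order(rhs, c_then_b))       # False: order differs from RHS
print(in_rhs_order(rhs, b_then_c))       # True
```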
    </Section>
    <Section position="3" start_page="85" end_page="85" type="sub_section">
      <SectionTitle>
1.2.3 Partial Qualification
</SectionTitle>
      <Paragraph position="0"> Whereas existing bi-directional chart parsers maintain fully-qualified edges by incrementally adding adjacent input words to the agenda, CLAVIUS can construct parses that instantiate only a subset of their constituents, so G_abc also parses the input "B", for example. The repercussions are discussed in §3.4 and §4.</Paragraph>
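Partial qualification can be illustrated by relaxing an order-free match to a sub-multiset match and reporting which right-hand-side symbols remain uninstantiated. This is a hypothetical sketch (`partial_match` is not from the paper) and ignores how CLAVIUS scores or later completes such partial parses.

```python
from collections import Counter


def partial_match(rhs, categories):
    """Match a rule against a possibly incomplete set of constituents.

    Returns the RHS symbols still uninstantiated (sorted) if the given
    categories form a non-empty sub-multiset of the RHS, else None.
    Order is ignored, as in the unordered-constituent scheme.
    """
    need = Counter(rhs)
    have = Counter(categories)
    if not have or any(have[cat] > need[cat] for cat in have):
        return None
    return sorted((need - have).elements())


rhs = ["B", "C"]  # rule G_abc: A ← B C
print(partial_match(rhs, ["B"]))       # ['C']: "B" alone is a partial parse
print(partial_match(rhs, ["C", "B"]))  # []: fully qualified
print(partial_match(rhs, ["D"]))       # None: D is not in the RHS
```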
    </Section>
  </Section>
</Paper>