<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1004">
  <Title>A Salience-Based Approach to Gesture-Speech Alignment</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>
8 Conclusions
</SectionTitle>
    <Paragraph position="0"> This work represents one of the first efforts at aligning gesture and speech on a corpus of natural multimodal communication. Using greedy optimization and only a minimum of linguistic processing, we significantly out-perform a competitive baseline, which has actually been implemented in existing multimodal user interfaces. Our approach is shown to be robust to spoken English, even with a high level of disfluency. By blending some of the benefits of empirical and knowledge-based approaches, our system can learn from a large corpus of data, but degrades gracefully when limited data is available.</Paragraph>
    <Paragraph position="1"> Obviously, alignment is only one small component of a comprehensive system for recognizing and understanding multimodal communication. Putting aside the issue of gesture recognition, there is still the problem of deriving semantic information from aligned speech-gesture units. The solutions to this problem will likely have to be specially tailored to the application domain. While our evaluation indicates that our approach achieves what appears to be a high level of accuracy, the true test will be whether our system can actually support semantic information extraction from multimodal data. Only the construction of such a comprehensive end-to-end system will reveal whether the algorithm and features that we have chosen are sufficient, or whether a more sophisticated approach is required.</Paragraph>
  </Section>
class="xml-element"></Paper>