<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1004">
  <Title>A Salience-Based Approach to Gesture-Speech Alignment</Title>
  <Section position="9" start_page="0" end_page="0" type="concl">
    <SectionTitle>8 Conclusions</SectionTitle>
    <Paragraph position="0">This work represents one of the first efforts at aligning gesture and speech on a corpus of natural multimodal communication. Using greedy optimization and only a minimum of linguistic processing, we significantly outperform a competitive baseline that has actually been implemented in existing multimodal user interfaces. Our approach is shown to be robust to spoken English, even with a high level of disfluency. By blending some of the benefits of empirical and knowledge-based approaches, our system can learn from a large corpus of data, yet degrades gracefully when only limited data is available.</Paragraph>
    <Paragraph position="1">Alignment is, of course, only one component of a comprehensive system for recognizing and understanding multimodal communication. Putting aside the issue of gesture recognition, there remains the problem of deriving semantic information from aligned speech-gesture units, and the solutions to this problem will likely have to be tailored to the application domain. While our evaluation indicates that our approach achieves a high level of accuracy, the true test will be whether our system can actually support semantic information extraction from multimodal data. Only the construction of such a comprehensive end-to-end system will reveal whether the algorithm and features we have chosen are sufficient, or whether a more sophisticated approach is required.</Paragraph>
  </Section>
</Paper>