<?xml version="1.0" standalone="yes"?> <Paper uid="P04-1001"> <Title>Optimization in Multimodal Interpretation</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Multimodal systems provide a natural and effective way for users to interact with computers through multiple modalities such as speech, gesture, and gaze (Oviatt, 1996). Since the first appearance of the &quot;Put-That-There&quot; system (Bolt, 1980), a variety of multimodal systems have emerged, from early systems that combine speech, pointing (Neal et al., 1991), and gaze (Koons et al., 1993), to systems that integrate speech with pen input (e.g., drawn graphics) (Cohen et al., 1996; Wahlster, 1998; Wu et al., 1999), and systems that engage users in intelligent conversation (Cassell et al., 1999; Stent et al., 1999; Gustafson et al., 2000; Chai et al., 2002; Johnston et al., 2002).</Paragraph> <Paragraph position="1"> One important aspect of building multimodal systems is multimodal interpretation, the process of identifying the meanings of user inputs.</Paragraph> <Paragraph position="2"> In a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback).</Paragraph> <Paragraph position="3"> These dependencies form a rich set of constraints of various kinds (e.g., semantic, temporal, and contextual). A correct interpretation can be attained only by considering these constraints simultaneously. In this process, two issues are important: first, a mechanism that combines information from various sources to form an overall interpretation; and second, a mechanism that selects the best interpretation from among all possible alternatives given a set of constraints. 
The first issue concerns the fusion aspect, which has been well studied in earlier work, for example, through unification-based approaches (Johnston, 1998) or finite-state approaches (Johnston and Bangalore, 2000). This paper focuses on the second issue: optimization.</Paragraph> <Paragraph position="4"> As in natural language interpretation, there is strong evidence that competition and ranking of constraints are important for achieving an optimal interpretation in multimodal language processing.</Paragraph> <Paragraph position="5"> We have developed a graph-based optimization approach for interpreting multimodal references.</Paragraph> <Paragraph position="6"> This approach achieves an optimal interpretation by simultaneously applying semantic, temporal, and contextual constraints. A preliminary evaluation indicates the effectiveness of this approach, particularly for complex user inputs that involve multiple referring expressions in a speech utterance and multiple gestures. In this paper, we first describe the necessity of optimization in multimodal interpretation, then present our graph-based optimization approach and discuss how it addresses the key principles of Optimality Theory as used in natural language interpretation (Prince and Smolensky, 1993).</Paragraph> </Section> </Paper>