File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/n04-4011_intro.xml
Size: 2,048 bytes
Last Modified: 2025-10-06 14:02:18
<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4011"> <Title>Performance Evaluation and Error Analysis for Multimodal Reference Resolution in a Conversation System</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Multimodal systems enable users to interact with computers through multiple modalities such as speech, gesture, and gaze (Bolt, 1980; Cassell et al., 1999; Cohen et al., 1996; Chai et al., 2002; Johnston et al., 2002). One important aspect of building multimodal systems is enabling the system to understand the meaning of multimodal user inputs. A key element of this understanding process is reference resolution, the process of finding the most appropriate referents for referring expressions. Many approaches to resolving multimodal references have been developed, ranging from a focus space model (Neal et al., 1998), a centering framework (Zancanaro et al., 1997), and contextual factors (Huls et al., 1995) to more recent approaches based on unification (Johnston, 1998), finite state machines (Johnston and Bangalore, 2000), and context-based rules (Kehler, 2000).</Paragraph> <Paragraph position="1"> Given the substantial work in this area, it is important to evaluate the state of the art, understand the limitations, and identify directions for future improvement. (* This work was supported by grant IIS-0347548 from the ...) We conducted a series of user studies to evaluate the capability of reference resolution in a multimodal conversation system. In particular, this paper examines two important aspects: (1) algorithm requirements for handling a variety of references, and (2) technology requirements for achieving good real-time performance. In the following sections, we first give a brief description of our system.
Then we analyze the main error sources during real-time human-machine interaction and discuss the key strategies for designing robust reference resolution algorithms.</Paragraph> </Section> </Paper>