File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/04/c04-1062_intro.xml
Size: 4,465 bytes
Last Modified: 2025-10-06 14:02:05
<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1062">
<Title>Multi-Human Dialogue Understanding for Assisting Artifact-Producing Meetings</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Recently, much attention has been focused on the domain of multi-person meeting understanding. Meeting dialogue presents a wide range of challenges, including continuous multi-speaker automatic speech recognition (ASR), 2D whiteboard gesture and handwriting recognition, 3D body and eye tracking, and multi-modal multi-human dialogue management and understanding. A significant amount of research has gone toward understanding the problems facing the collection, organization, and visualization of meeting data (Moore, 2002; Waibel et al., 2001), and meeting corpora such as the ICSI Meeting Corpus (Janin et al., 2003) are being made available. Research in the multimodal meeting domain has since blossomed, including ongoing work from projects such as AMI and M4 and efforts from several institutions.</Paragraph>
<Paragraph position="1"> Previous work on automatic meeting understanding has mostly focused on surface-level recognition, such as speech segmentation, for obvious reasons: understanding free multi-human speech at any level is an extremely difficult problem for which even the best current performance is poor. In addition, the primary focus for applications has been on off-line tools such as post-meeting multimodal information browsing.</Paragraph>
<Paragraph position="2"> In parallel with such efforts, we are applying dialogue-management techniques in an attempt to understand and monitor meeting dialogues as they occur, and to supplement multimodal meeting records with information about the structure and purpose of the meeting.</Paragraph>
<Paragraph position="3"> Our efforts are focused on assisting artifact-producing meetings, i.e. meetings for which the intended outcome is a tangible product such as a project management plan or a budget. The dialogue-understanding system helps to create and manipulate the artifact, delivering a final product at the end of the meeting. The state of the artifact, in turn, forms part of the dialogue context under which future utterances are interpreted, and it serves a number of useful roles in the dialogue-understanding process:
* The dialogue manager employs generic dialogue moves with plug-in points to be defined by specific artifact types, e.g. project plan or budget (a sketch of this plug-in idea appears below);
* The artifact state helps resolve ambiguity by providing evidence for multimodal fusion and by constraining topic recognition;
* The artifact type can be used to bias ASR language models;
* The constructed artifact provides an interface for a meeting browser that supports directed queries about discussion that took place in the meeting, e.g. "Why did we decide on that date?"
In addition, we focus our attention on handling ambiguities that arise at many levels, including during automatic speech recognition, multimodal communication, and artifact manipulation. The present dialogue manager uses several techniques to do this, including the maintenance of multiple dialogue-move hypotheses, fusion with multimodal gestures, and the incorporation of artifact-specific plug-ins.</Paragraph>
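<Paragraph> To make the ambiguity-handling strategy just described concrete, the following is a minimal Python sketch of how multiple dialogue-move hypotheses might be maintained and rescored against gesture evidence and the current artifact state. It is an illustration under assumed names (DialogueMoveHypothesis, rescore, the slot and gesture fields), not the system's actual implementation.

# Minimal sketch (hypothetical names) of keeping multiple dialogue-move
# hypotheses alive and rescoring them with multimodal and artifact evidence.
from dataclasses import dataclass

@dataclass
class DialogueMoveHypothesis:
    move_type: str   # e.g. "add-task", "set-deadline", "query"
    slots: dict      # slot fillers extracted from the utterance
    score: float     # confidence from ASR / interpretation

def rescore(hypotheses, gesture, artifact_state):
    """Combine recognition confidence with gesture and artifact evidence.

    A hypothesis consistent with a concurrent whiteboard gesture (e.g.
    pointing at a task node) or with the current artifact state (the
    referenced object actually exists) is boosted; others are left alone.
    """
    rescored = []
    for h in hypotheses:
        score = h.score
        if gesture is not None and gesture.get("target") == h.slots.get("object"):
            score *= 1.5   # gesture agrees with this reading
        if h.slots.get("object") in artifact_state.get("objects", {}):
            score *= 1.2   # referent exists in the artifact
        rescored.append(DialogueMoveHypothesis(h.move_type, h.slots, score))
    return sorted(rescored, key=lambda h: h.score, reverse=True)

# Example: two competing readings of an ambiguous utterance.
hyps = [
    DialogueMoveHypothesis("set-deadline", {"object": "task-3", "date": "May 12"}, 0.40),
    DialogueMoveHypothesis("add-task", {"object": "task-12"}, 0.45),
]
best = rescore(hyps, gesture={"target": "task-3"},
               artifact_state={"objects": {"task-3": {}}})[0]
print(best.move_type)   # -> "set-deadline"

The point of the sketch is only that hypotheses are never pruned to one reading prematurely; evidence from other modalities and from the artifact itself is allowed to reorder them.
</Paragraph>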
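<Paragraph> Similarly, the generic-dialogue-move-with-plug-in-points design mentioned in the first bullet above could look roughly like the following sketch, in which the dialogue manager sees only an abstract artifact interface and each artifact type (project plan, budget, ...) supplies the concrete operations. The class names and interface here are assumptions for illustration, not the toolkit's actual API.

# Hypothetical sketch of the "generic dialogue move + artifact plug-in" idea.
from abc import ABC, abstractmethod

class ArtifactPlugin(ABC):
    """Plug-in point: one subclass per artifact type produced by a meeting."""

    @abstractmethod
    def apply_move(self, move_type: str, slots: dict) -> str:
        """Update the artifact for a recognized dialogue move; return a summary."""

    @abstractmethod
    def vocabulary(self) -> list:
        """Domain terms that could be used to bias ASR language models."""

class ProjectPlanPlugin(ArtifactPlugin):
    def __init__(self):
        self.tasks = {}

    def apply_move(self, move_type, slots):
        if move_type == "add-task":
            self.tasks[slots["name"]] = {"deadline": slots.get("deadline")}
            return f"added task {slots['name']}"
        if move_type == "set-deadline":
            self.tasks[slots["name"]]["deadline"] = slots["deadline"]
            return f"deadline of {slots['name']} set to {slots['deadline']}"
        return "move not handled by this artifact type"

    def vocabulary(self):
        return ["task", "milestone", "deadline"] + list(self.tasks)

# The generic dialogue manager simply delegates to whichever plug-in is loaded.
plugin = ProjectPlanPlugin()
print(plugin.apply_move("add-task", {"name": "draft budget", "deadline": "Friday"}))

Keeping artifact-specific operations behind such a plug-in boundary is what would let the same dialogue manager drive project plans, budgets, or other artifact types without change.
</Paragraph>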
<Paragraph position="4"> The software architecture we use for managing multi-human dialogue is an enhancement of a dialogue-management toolkit previously used at CSLI in a range of applications, including command-and-control of autonomous systems (Lemon et al., 2002) and intelligent tutoring (Clark et al., 2002). In this paper, we detail the dialogue-management components (Section 3), which support a larger project involving multiple collaborating institutions (Section 2) to build a multimodal meeting-understanding system capable of integrating speech, drawing and writing on a whiteboard, and physical gesture recognition.</Paragraph>
<Paragraph position="5"> We also describe our toolkit for on-line and off-line meeting browsing (Section 4), which allows a meeting participant, observer, or developer to visually and interactively answer questions about the history of a meeting, the processes performed to understand it, and the causal relationships between dialogue and artifact manipulation.</Paragraph>
</Section>
</Paper>