<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2136">
  <Title>Confirmation in Multimodal Systems</Title>
  <Section position="3" start_page="823" end_page="824" type="intro">
    <SectionTitle>
1 MOTIVATION
</SectionTitle>
    <Paragraph position="0"> Historically, multimodal systems have either not confirmed input \[18-22\] or confirmed only the primary modality of such systems: speech. This is reasonable, considering the evolution of multimodal systems from their speech-based roots. Observations of QuickSet prototypes last year, however, showed that simply confirming the results of speech recognition was often problematic: users had the expectation that whenever a command was confirmed, it would be executed. We observed that confirming speech prior to multimodal integration led to three possible cases where this expectation might not be met: ambiguous gestures, nonmeaningful speech, and delayed confirmation.</Paragraph>
    <Paragraph position="1"> The first problem with speech-only confirmation was that the gesture recognizer produced results that were often ambiguous. For example, recognition of the ink in Figure 1 could result in confusion. The arc (left) in the figure provides some semantic content, but it may be incomplete. The user may have been selecting something, or she may have been creating an area, line, or route. On the other hand, the circle-like gesture (middle) might not be designating an area or specifying a selection; it might be indicating a circuitous route or line. Without more information from other modalities, it is difficult to guess the intentions behind these gestures or to determine which interpretation is correct. Some gestures can be assumed to be fully specified by themselves (at right, an editor's mark meaning &amp;quot;cut&amp;quot;). However, most rely on complementary input for complete interpretation. If the gesture recognizer misinterprets the gesture, failure will not occur until integration: the speech hypothesis might not combine with any of the gesture hypotheses. Also, earlier versions of our speech recognition agent were limited to a single recognition hypothesis, one that might not even be syntactically correct, in which case integration would always fail.</Paragraph>
    <Paragraph position="2"> Finally, the confirmation act itself could delay the arrival of speech into the process of multimodal integration. If the user chose to correct the speech recognition output or to delay confirmation for any other reason, integration itself could fail due to sensitivity in the multimodal architecture.</Paragraph>
    <Paragraph position="3"> In all three cases, users were asked to confirm a command that could not be executed. An important lesson learned from these observations is that when confirming a command, users think they are giving approval; thus, they expect that the command can be executed without hindrance. Due to these early observations, we wished to determine whether delaying confirmation until after modalities have combined would enhance the human-computer dialogue in multimodal systems. Therefore, we hypothesize that late-stage confirmations will lead to three improvements in dialogue. First, because late-stage systems can be designed to present only feasible commands for confirmation, blended inputs that fail to produce a feasible command can be immediately flagged as a non-understanding and presented to the user as such, rather than as a possible command. Second, because of multimodal disambiguation, misunderstandings can be reduced, and therefore the number of conversational turns required to reach mutual understanding can be reduced as well. Finally, a reduction in turns combined with a reduction in time spent will lead to reducing the &amp;quot;collaborative effort&amp;quot; in the dialogue. To examine our hypotheses, we designed an experiment using QuickSet to determine if late-stage confirmations enhance human-computer conversational performance.</Paragraph>
  </Section>
</Paper>