<?xml version="1.0" standalone="yes"?>
<Paper uid="E06-3001">
  <Title>What's There to Talk About? A Multi-Modal Model of Referring Behavior in the Presence of Shared Visual Information</Title>
  <Section position="3" start_page="7" end_page="8" type="intro">
    <SectionTitle>
2 Motivation
</SectionTitle>
    <Paragraph position="0"> There are several motivating factors for developing a computational model of referring behavior in shared visual contexts. First, a model of referring behavior that integrates a component of shared visual information can be used to increase the robustness of interactive agents that converse with humans in real-world situated environments. Second, such a model can be applied to the development of a range of technologies to support distributed group collaboration and mediated communication. Finally, such a model can be used to provide a deeper theoretical understanding of how humans make use of various forms of shared visual information in their every-day communication.</Paragraph>
    <Paragraph position="1"> The development of an integrated multi-modal model of referring behavior can improve the performance of state-of-the-art computational models of communication currently used to support conversational interactions with an intelligent agent (Allen et al., 2005; Devault et al., 2005; Gorniak &amp; Roy, 2004). Many of these models rely on discourse state and prior linguistic contributions to successfully resolve references in a given utterance. However, recent technological advances have created opportunities for human-human and human-agent interactions in a wide variety of contexts that include visual objects of interest. Such systems may benefit from a data-driven model of how collaborative pairs adapt their language in the presence (or absence) of shared visual information. A successful computational model of referring behavior in the presence of visual information could enable agents to emulate many elements of more natural and realistic human conversational behavior.</Paragraph>
    <Paragraph position="2"> A computational model may also make valuable contributions to research in the area of computer-mediated communication. Video-mediated communication systems, shared media spaces, and collaborative virtual environments are technologies developed to support joint activities between geographically distributed groups.</Paragraph>
    <Paragraph position="3"> However, the visual information provided in each of these technologies can vary drastically.</Paragraph>
    <Paragraph position="4"> The shared field of view can vary, views may be misaligned between speaking partners, and delays of the sort generated by network congestion may unintentionally disrupt critical information required for successful communication (Brennan, 2005; Gergle et al., 2004). Our proposed model could be used along with a detailed task analysis to inform the design and development of such technologies. For instance, the model could inform designers about the times when particular visual elements need to be made more salient in order to support effective communication. A computational model that can account for visual salience and understand its impact on conversational coherence could inform the construction of shared displays or dynamically restructure the environment as the discourse unfolds.</Paragraph>
    <Paragraph position="5"> A final motivation for this work is to further our theoretical understanding of the role shared visual information plays during communication.</Paragraph>
    <Paragraph position="6"> A number of behavioral studies have demonstrated the need for a more detailed theoretical understanding of human referring behavior in the presence of shared visual information. They suggest that shared visual information of the task objects and surrounding workspace can significantly impact collaborative task performance and communication efficiency in task-oriented interactions (Kraut et al., 2003; Monk &amp; Watts, 2000; Nardi et al., 1993; Whittaker, 2003). For example, viewing a partner's actions facilitates monitoring of comprehension and enables efficient object reference (Daly-Jones et al., 1998), changing the amount of available visual information impacts information gathering and recovery from ambiguous help requests (Karsenty, 1999), and varying the field of view that a remote helper has of a co-worker's environment influences performance and shapes communication patterns in directed physical tasks (Fussell et al., 2003).</Paragraph>
    <Paragraph position="7"> Having a computational description of these processes can provide insight into why they occur, can expose implicit and possibly inadequate simplifying assumptions underlying existing  theoretical models, and can serve as a guide for future empirical research.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML