File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/05/w05-0505_intro.xml
Size: 2,691 bytes
Last Modified: 2025-10-06 14:03:14
<?xml version="1.0" standalone="yes"?> <Paper uid="W05-0505"> <Title>A Connectionist Model of Language-Scene Interaction</Title> <Section position="2" start_page="0" end_page="36" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> People learn language within the context of the surrounding world, and use it to refer to objects in that world, as well as relationships among those objects (e.g., Gleitman, 1990). Recent research in the visual worlds paradigm, wherein participants' gazes in a scene while listening to an utterance are monitored, has yielded a number of insights into the time course of sentence comprehension. The careful manipulation of information sources in this experimental setting has begun to reveal important characteristics of comprehension such as incrementality and anticipation. For example, people's attention to objects in a scene closely tracks their mention in a spoken sentence (Tanenhaus et al., 1995), and world and linguistic knowledge seem to be factors that facilitate object identification (Altmann and Kamide, 1999; Kamide et al., 2003). More recently, Knoeferle et al. (2005) have shown that when scenes include depicted events, such visual information helps to establish important relations between the entities, such as role relations.</Paragraph> <Paragraph position="1"> Models of sentence comprehension to date, however, continue to focus on modelling reading behavior. No model, to our knowledge, attempts to account for the use of immediate (non-linguistic) context. In this paper we present results from two simulations using a Simple Recurrent Network (SRN; Elman, 1990) modified to integrate input from a scene with the characteristic incremental processing of such networks in order to model people's ability to adaptively use the contextual information in visual scenes to more rapidly interpret and disambiguate a sentence. In the modelling of five visual worlds experiments reported here, accurate sentence interpretation hinges on proper case-role assignment to sentence referents. In particular, modelling is focussed on the following aspects of sentence processing: * anticipation of upcoming arguments and their roles in a sentence * adaptive use of the visual scene as context for a spoken utterance * influence of depicted events on developing interpretation null * multiple/conflicting information sources and their relative importance on whether the hare is the subject or object of the sentence, as well as the thematic role structure of the verb. These gaze fixations reveal that people use linguistic and world knowledge to anticipate upcoming arguments.</Paragraph> </Section> class="xml-element"></Paper>