<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0610">
<Title>Conversational Robots: Building Blocks for Grounding Word Meaning</Title>
<Section position="3" start_page="0" end_page="0" type="intro">
<SectionTitle>
2 Background
</SectionTitle>
<Paragraph position="0"> Although robots, speech recognizers, and speech synthesizers can easily be connected in shallow ways, the results are limited to canned behavior. The proper integration of language in a robot highlights deep theoretical issues that touch on virtually all aspects of artificial intelligence (and cognitive science), including perception, action, memory, and planning. Along with other researchers, we use the term grounding to refer to the problem of anchoring the meaning of words and utterances in non-linguistic representations that the language user comes to know through some combination of evolutionary and lifetime learning.</Paragraph>
<Paragraph position="1"> A natural approach is to connect words to perceptual classifiers so that the appearance of an object, event, or relation in the environment can instantiate a corresponding word in the robot. This basic idea has been applied in many speech-controlled robots over the years (Brown et al., 1992; McGuire et al., 2002; Crangle and Suppes, 1994).</Paragraph>
<Paragraph position="2"> Detailed models have been suggested for the sensory-motor representations underlying color (Lammens, 1994) and spatial relations (Regier, 1996; Regier and Carlson, 2001). Models for grounding verbs include grounding their meanings in the perception of actions (Siskind, 2001) and in motor control programs (Bailey, 1997; Narayanan, 1997). Object shape is clearly important when connecting language to the world, but it remains a challenging problem in computational models of language grounding. Landau and Jackendoff provide a detailed analysis of visual shape features that play a role in language (Landau and Jackendoff, 1993).</Paragraph>
<Paragraph position="3"> In natural conversation, people speak and gesture to coordinate joint actions (Clark, 1996). Speakers and listeners use various aspects of their physical environment to encode and decode utterance meanings. Communication partners are aware of each other's gestures and foci of attention and integrate these sources of information into the conversational process. Motivated by these factors, recent work on social robots has explored mechanisms that provide visual awareness of human partners' gaze and other facial cues relevant for interaction (Breazeal, 2003; Scassellati, 2002).</Paragraph>
</Section>
</Paper>