<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1016"> <Title>Non-Verbal Cues for Discourse Structure</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2. Background </SectionTitle>
<Paragraph position="0"> Only recently have computational linguists begun to examine the association of nonverbal behaviors and language. In this section we review research by non-computational linguists and discuss how this research has been employed to formulate algorithms for natural language generation or understanding.</Paragraph>
<Paragraph position="1"> About three-quarters of all clauses in descriptive discourse are accompanied by gestures [17], and within those clauses, the most effortful part of a gesture tends to co-occur with, or just before, the phonologically most prominent syllable of the accompanying speech [13]. When speech is ambiguous, or produced in a noisy environment, listeners have been shown to rely on gestural cues [22], and the higher the noise-to-signal ratio, the more facilitation gesture provides. Even when gestural content overlaps with speech (reported to be the case in roughly 50% of utterances in descriptive discourse), gesture often emphasizes information that is also focused pragmatically by mechanisms such as prosody in speech. In fact, the semantic and pragmatic compatibility of the gesture-speech relationship recalls the interaction of words and graphics in multimodal presentations [11].</Paragraph>
<Paragraph position="2"> On the basis of results such as these, several researchers have built animated embodied conversational agents that ally synthesized speech with animated hand gestures. For example, Lester et al. [15] generate deictic gestures and choose referring expressions as a function of the potential ambiguity and proximity of the objects referred to. The pedagogical agent of Rickel and Johnson [19] produces a deictic gesture at the beginning of explanations about objects. Andre et al. [1] generate pointing gestures as a sub-action of the rhetorical action of labeling, which is in turn a sub-action of elaborating. Cassell and Stone [3] generate either speech, gesture, or a combination of the two as a function of the information structure status and surprise value of the discourse entity.</Paragraph>
<Paragraph position="3"> Head and eye movements have also been examined in the context of discourse and conversation.</Paragraph>
<Paragraph position="4"> Looking away from one's interlocutor has been correlated with the beginning of turns; from the speaker's point of view, this look away may prevent an overload of visual and linguistic information. During the execution phase of an utterance, on the other hand, speakers look more often at listeners. Head nods and eyebrow raises are correlated with emphasized linguistic items, such as words accompanied by pitch accents [7]. Some eye movements occur primarily at the ends of utterances and at grammatical boundaries, and appear to function as synchronization signals: one may request a response from a listener by looking at the listener, and suppress the listener's response by looking away. Likewise, in order to offer the floor, a speaker may gaze at the listener at the end of the utterance, and a listener who wants the floor may look at, and slightly up at, the speaker [10]. It should be noted, however, that turn taking only partially accounts for eye gaze behavior in discourse. A better explanation integrates turn taking with the information structure of the propositional content of an utterance [5]. Specifically, the beginnings of themes are frequently accompanied by a look away from the hearer, and the beginnings of rhemes by a look toward the hearer. When these categories are co-temporaneous with turn construction, they are strongly predictive of gaze behavior.</Paragraph>
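For concreteness, the gaze rule just described can be read as a simple decision procedure. The following Python fragment is a minimal sketch of that reading; the names Unit and gaze_actions and the behavior labels are our own illustrative choices, and the mapping is a simplified interpretation of [5], not the implementation of any system cited here.

# Illustrative sketch of the gaze rule described above (after [5]):
# look away at the onset of a theme, look toward the hearer at the
# onset of a rheme; when these onsets coincide with turn boundaries,
# the predicted gaze behavior is reinforced. All names are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class Unit:
    kind: str          # "theme" or "rheme"
    begins_turn: bool  # unit starts a new speaking turn
    ends_turn: bool    # unit ends the current turn

def gaze_actions(units: List[Unit]) -> List[str]:
    """Map information-structure units to gaze behaviors."""
    actions = []
    for u in units:
        if u.kind == "theme":
            # Theme onsets co-occur with looking away; a co-occurring
            # turn beginning makes the look-away strongly predicted.
            actions.append("look_away_strong" if u.begins_turn else "look_away")
        else:  # rheme
            # Rheme onsets co-occur with looking toward the hearer; a
            # co-occurring turn end (offering the floor) reinforces it.
            actions.append("look_toward_strong" if u.ends_turn else "look_toward")
    return actions

# Example: a two-unit utterance that both begins and yields a turn.
print(gaze_actions([Unit("theme", True, False), Unit("rheme", False, True)]))
# -> ['look_away_strong', 'look_toward_strong']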
<Paragraph position="5"> Results such as these have led researchers to generate eye gaze and head movements in animated embodied conversational agents.</Paragraph>
<Paragraph position="6"> Takeuchi and Nagao [21], for example, generate gaze and head-nod behaviors in a &quot;talking head.&quot; Cassell et al. [2] generate eye gaze and head nods as a function of turn-taking behavior, head turns just before an utterance, and eyebrow raises as a function of emphasis.</Paragraph>
<Paragraph position="7"> To our knowledge, research on posture shifts and other gross body movements has not been used in the design or implementation of computational systems. In fact, although a number of conversational analysts and ethnomethodologists have described posture shifts in conversation, their studies have been qualitative in nature and difficult to reformulate as the basis of algorithms for the generation of language and posture. Nevertheless, researchers in the non-computational fields have discussed posture shifts extensively. Kendon [13] reports a hierarchy in the organization of movement such that the smaller limbs, such as the fingers and hands, engage in more frequent movements, while the trunk and lower limbs change position relatively rarely.</Paragraph>
<Paragraph position="8"> A number of researchers have noted that changes in physical distance during interaction seem to accompany changes in the topic or in the social relationship between speakers. For example, Condon and Ogston [9] suggest that, in a speaking individual, changes in the more slowly moving body parts occur at the boundaries of the larger units in the flow of speech. Scheflen (1973) likewise reports that posture shifts and other general body movements appear to mark the points of change between one major unit of communicative activity and another. Blom &amp; Gumperz (1972) identify posture changes and changes in the spatial relationship between two speakers as indicators of what they term &quot;situational shifts&quot;: momentary changes in the mutual rights and obligations between speakers, accompanied by shifts in language style. Erickson (1975) concludes that proxemic shifts seem to be markers of 'important' segments: in his analysis of college counseling interviews, proxemic shifts occurred more frequently than any other coded indicator of segment changes, and were therefore the best predictor of new segments in the data.</Paragraph>
<Paragraph position="9"> Unfortunately, none of these studies provides statistics, and their analyses rely on intuitive definitions of &quot;discourse segment&quot; or &quot;major shift.&quot; For this reason, we carried out our own empirical study.</Paragraph> </Section> </Paper>