<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0319">
<Title>Lexical, Prosodic, and Syntactic Cues for Dialog Acts</Title>
<Section position="2" start_page="0" end_page="115" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> The structure of a discourse is reflected in many aspects of its linguistic realization. These include 'cue phrases', words like now and well which can indicate discourse structure, as well as other lexical, prosodic, or syntactic 'discourse markers'. Multi-party dialog contains a particular kind of discourse structure, the dialog act, related to the speech acts of Searle (1969), the conversational moves of Carletta et al. (1997), and the adjacency pair-parts of Schegloff (1968) and Sacks et al. (1974) (see also e.g. Allen and Core (1997); Nagata and Morimoto (1994)). Like other types of structure, the dialog act sequence of a conversation is also reflected in its lexical, prosodic, and syntactic realization.</Paragraph>
<Paragraph position="1"> This paper presents a preliminary investigation into the realization of a particular class of dialog acts which play an essential structuring role in dialog, the backchannels or acknowledgement tokens.</Paragraph>
<Paragraph position="2"> We discuss the importance of words like yeah as cue-phrases for dialog structure, the role of prosodic knowledge, and the constrained syntactic realization of certain dialog acts.</Paragraph>
<Paragraph position="3"> This is part of a larger project on automatically detecting discourse structure for speech recognition and understanding tasks, originally part of the 1997 Summer Workshop on Innovative Techniques in LVCSR at Johns Hopkins. See Jurafsky et al.</Paragraph>
<Paragraph position="4"> (1997a) for a summary of the project and its relation to previous attempts to build stochastic models of dialog structure (e.g. Reithinger et al. (1996), Suhm and Waibel (1994), Taylor et al. (1998) and many others), Shriberg et al.
(1998) for more details on the automatic use of prosodic features, Stolcke et al. (1998) for details on the machine learning architecture of the project, and Jurafsky et al. (1997a) on the applications to automatic speech recognition.</Paragraph>
<Paragraph position="5"> In this paper we focus on the realization of five particular dialog acts which are subsumed by or related to backchannel acts, utterances which give discourse-structuring feedback to the speaker. Four (continuers, assessments, incipient speakership, and to some extent agreements) are subtypes of backchannels. These four and the fifth type (yes-answers) overlap strongly in their lexical realization; many or all of them are realized with words like yeah, okay, uh-huh, or mm-hmm. Distinguishing true markers of agreements or factual answers from mere continuers is essential in understanding a dialog or modeling its structure. Knowing whether a speaker is trying to take the floor (incipient speakership) or merely passively following along (continuers) is essential for predictive models of speakers and dialog.</Paragraph>
<Paragraph position="6"> Do you have to have any special training &lt;Laughter&gt;, &lt;Throat_clearing&gt; Yes.</Paragraph>
<Paragraph position="7"> Well, it's been nice talking to you.</Paragraph>
<Paragraph position="8"> But, uh, yeah Well, how old are you? No, Oh, okay.</Paragraph>
<Paragraph position="9"> I don't know if I'm making any sense So you can afford to get a house? Well give me a break, you know.</Paragraph>
<Paragraph position="10"> Backchannel-Question: Is that right? The SWBD-DAMSL dialog act tagset (Jurafsky et al., 1997b) was adapted from the DAMSL tagset (Core and Allen, 1997), and consists of approximately 60 labels in orthogonal dimensions (so labels from different dimensions could be combined).
Seven CU-Boulder linguistics graduate students labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al., 1992) of human-to-human telephone conversations with these tags, resulting in 220 unique tags for the 205,000 SWBD utterances.</Paragraph>
<Paragraph position="11"> The SWBD conversations had already been hand-segmented into utterances by the Linguistic Data Consortium (Meteer et al., 1995; an utterance roughly corresponds to a sentence). Each utterance received exactly one of these 220 tags. For practical reasons, the first labeling pass was done only from text transcriptions without listening to the speech. The average conversation consisted of 144 turns and 271 utterances, and took 28 minutes to label. The labeling agreement was 84% (κ = .80; Carletta, 1996). The resulting 220 tags included many which were extremely rare, making statistical analysis impossible. We thus clustered the 220 tags into 42 final tags. The 18 most frequent of these 42 tags are shown in Table 1. In the rest of this section we give longer examples of the 4 types which play a role in the rest of the paper.</Paragraph>
<Paragraph position="12"> A continuer is a short utterance which plays discourse-structuring roles like indicating that the other speaker should go on talking (Jefferson, 1984; Schegloff, 1982; Yngve, 1970). Because continuers are the most common kind of backchannel, our group and others have used the term 'backchannel' as a shorthand for 'continuer-backchannels'. For clarity in this paper we will use the term continuer, in order to avoid any ambiguity with the larger class of utterances which give discourse-structuring feedback to the speaker. Table 2 shows examples of continuers in the context of a Switchboard conversation.
Jefferson (1984) (see also Jefferson (1993)) noted that continuers vary along the dimension of incipient speakership; continuers which acknowledge that the other speaker still has the floor reflect 'passive recipiency', and those which indicate an intention to take the floor reflect 'preparedness to shift from recipiency to speakership'. She noted that tokens of passive recipiency are often realized as mm-hmm, while tokens of incipient speakership are often realized as yeah, or sometimes as yes. The example in Table 2 is one of passive recipiency. Table 3 shows an example of a continuer that marks incipient speakership. In our original coding, these were not labeled differently (tokens of passive recipiency and incipient speakership were both marked as 'backchannels'). Afterwards, we took all continuers which the speaker followed by further talk and coded them as incipient speakership. (Footnote 1: This simple coding unfortunately misses more complex cases of incipiency, such as the speaker's next turns beginning [...]) [...] answer category, which includes any sort of answers to questions, yes-answer includes yes, yeah, yep, uh-huh, and such other variations on yes, when they are acting as an answer to a Yes-No-Question.</Paragraph>
<Paragraph position="13"> The various agreements (accept, reject, partial accept, etc.) all mark the degree to which the speaker accepts some previous proposal, plan, opinion, or statement. Because SWBD consists of free conversation and not task-oriented dialog, the majority of our tokens were agree/accepts, which for convenience we will refer to as agreements. These are used to indicate the speaker's agreement with a statement or opinion expressed by another speaker, or the acceptance of a proposal. Table 5 shows an example.</Paragraph>
</Section>
</Paper>