<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2318">
  <Title>Prosodic Cues to Discourse Segment Boundaries in Human-Computer Dialogue</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Contemporary theories of discourse, both computational and descriptive, postulate a tree-structured hierarchical model of discourse. These structures may be viewed as corresponding to &quot;intentional&quot; structure of discourse segment purposes in the view of (Grosz and Sidner, 1986), to plan and subplan structure directly in the view of (Allen and Litman, 1990), to nucleus and satellite rhetorical relations in the Rhetorical Structure Theory of (Mann and Thompson, 1987), or to information structures as in (Traum and Hinkelman, 1992). Despite this diversity of views on the sources of structural organization, these theories agree on the decomposition of discourse into segments and subsegments in a hierarchical structure.</Paragraph>
    <Paragraph position="1"> Discourse segments help to establish the domain of interpretation for referents or anaphors (Grosz, 1977). Discourse segmentation can also provide guidance for summarization or retrieval by identifying the topical structure of extended text spans. As a result, an understanding of the mechanisms that signal discourse structure is highly desirable.</Paragraph>
    <Paragraph position="2"> While substantial work has been done on identifying and automatically recognizing the textual and prosodic correlates of discourse structure in monologue, comparable cues for dialogue or multi-party conversation, and in particular human-computer dialogue, remain less studied. In this paper, we explore prosodic cues to discourse segmentation in human-computer dialogue.</Paragraph>
    <Paragraph position="3"> Using data from 60 hours of interactions with a voice-only conversational spoken language system, we identify pitch and intensity features that signal segment boundaries. Specifically, based on 473 pairs of segment-final and segment-initiating utterances, we find significant increases for segment-initial utterances in maximum and average pitch and average intensity, with significantly lower minimum pitch for segment-final utterances. These results suggest that even in the artificial environment of human-computer dialogue, prosodic cues robustly signal discourse segment structure, comparably to the contrastive uses of pitch and amplitude identified in natural monologues.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
1.1 Overview
</SectionTitle>
      <Paragraph position="0"> We begin with a discussion of related work on discourse segmentation and dialogue act identification in monologue and dialogue, primarily in the human-human case.</Paragraph>
      <Paragraph position="1"> Then we introduce the system and data collection process that produced the human-computer discourse segment change materials for the current analysis. We describe the acoustic analyses performed and the features chosen for comparison. Then we identify the prosodic cues that distinguish discourse segment boundaries and discuss the relation to previously identified cues for other discourse types. Finally, we conclude and present some future work.</Paragraph>
    </Section>
  </Section>
</Paper>