<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3407">
  <Title>Topic Segmentation of Dialogue</Title>
  <Section position="3" start_page="0" end_page="42" type="intro">
    <SectionTitle>
2 Defining Topic
</SectionTitle>
    <Paragraph position="0"> In the most general sense, the challenge of topic segmentation can be construed as the task of finding locations in the discourse where the focus shifts from one topic to another. Thus, it is not possible to address topic segmentation of dialogue without first addressing the question of what a &quot;topic&quot; is. We began with the goal of adopting a definition of topic that meets three criteria. First, it should be reproducible by human annotators. Second, it should not rely heavily on domain-specific knowledge or knowledge of the task structure. Finally, it should be grounded in generally accepted principles of discourse structure.</Paragraph>
    <Paragraph position="1"> The last point addresses a subtle, but important, criterion necessary to adequately serve downstream applications using our dialogue segmentation. Topic analysis of dialogue concerns itself mainly with thematic content. However, boundaries should be placed in locations that are natural turning points in the discourse. Shifts in topic should be readily recognizable from surface characteristics of the language.</Paragraph>
    <Paragraph position="2"> With these goals in mind, we adopted a definition of &quot;topic&quot; that builds upon Passonneau and Litman's seminal work on segmentation of monologue (Passonneau and Litman, 1993). They found that human annotators can successfully accomplish a flat monologue segmentation using an informal notion of speaker intention.</Paragraph>
    <Paragraph position="3"> Dialogue is inherently hierarchical in structure.</Paragraph>
    <Paragraph position="4"> However, a flat segmentation model is an adequate approximation. Passonneau and Litman's pilot studies confirmed previously published results (Rotondo, 1984) that human annotators cannot reliably agree on a hierarchical segmentation of monologue. Using a stack-based hierarchical model of discourse, Flammia (1998) found that 90% of all information-bearing dialogue turns referred to the discourse purpose at the top of the stack.</Paragraph>
    <Paragraph position="5"> We adopt a flat model of topic segmentation based on discourse segment purpose, where a shift in topic corresponds to a shift in purpose that is acknowledged and acted upon by both conversational participants. We place topic boundaries on contributions that introduce a speaker's intention to shift the purpose of the discourse, while ignoring expressed intentions to shift discourse purposes that are not taken up by the other participant. We adopt the dialogue contribution as the basic unit of analysis, refraining from placing topic boundaries within a contribution. This decision is analogous to Hearst's (Hearst, 1994, 1997) decision to shift the TextTiling-induced boundaries to their nearest reference paragraph boundary.</Paragraph>
    <Paragraph position="6"> We evaluated the reproducibility of our notion of topic segment boundaries by assessing inter-coder reliability over 10% of the corpus (see Section 5.1). Three annotators were given a 10-page coding manual with an explanation of our informal definition of shared discourse segment purpose as well as examples of segmented dialogues. Pairwise inter-coder agreement was above 0.7 for all pairs of annotators.</Paragraph>
  </Section>
</Paper>