File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/w04-2906_abstr.xml

Size: 1,595 bytes

Last Modified: 2025-10-06 13:44:06

<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-2906">
  <Title>Assessing Prosodic and Text Features for Segmentation of Mandarin Broadcast News</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done in automatic segmentation of other languages using both text and acoustic information. In this paper, we focus on exploiting both textual and prosodic features for topic segmentation of Mandarin Chinese. As a tone language, Mandarin presents special challenges for applicability of intonation-based techniques, since the pitch contour is also used to establish lexical identity. However, intonational cues such as reduction in pitch and intensity at topic boundaries and increase in duration and pause still provide significant contrasts in Mandarin Chinese. We first build a decision tree classifier that, based only on prosodic information, achieves boundary classification accuracy of 89-95.8% on a large standard test set. We then contrast these results with a simple text similarity-based classification scheme. Finally, we build a merged classifier, finding the best effectiveness for systems integrating text and prosodic cues.</Paragraph>
  </Section>
</Paper>