<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2906"> <Title>Assessing Prosodic and Text Features for Segmentation of Mandarin Broadcast News</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Natural spoken discourse is composed of a sequence of utterances, not independently generated or randomly strung together, but rather organized according to basic structural principles. This structure in turn guides the interpretation of individual utterances and the discourse as a whole. Formal written discourse signals a hierarchical, tree-based discourse structure explicitly by the division of the text into chapters, sections, paragraphs, and sentences. This structure, in turn, identifies domains for interpretation; many systems for anaphora resolution rely on some notion of locality (Grosz and Sidner, 1986).</Paragraph> <Paragraph position="1"> Similarly, this structure represents topical organization, and thus would be useful in information retrieval to select documents where the primary sections are on-topic, and, for summarization, to select information covering the different aspects of the topic.</Paragraph> <Paragraph position="2"> Unfortunately, spoken discourse does not include the orthographic conventions that signal structural organization in written discourse. Instead, one must infer the hierarchical structure of spoken discourse from other cues.</Paragraph> <Paragraph position="3"> Prior research (Nakatani et al., 1995; Swerts, 1997) has shown that human labelers can more sharply, consistently, and confidently identify discourse structure in a word-level transcription when an original audio recording is available than they can on the basis of the transcribed text alone. This finding indicates that substantial additional information about the structure of the discourse is encoded in the acoustic-prosodic features of the utterance. 
Given the often errorful transcriptions available for large speech corpora, we choose to focus here on fully exploiting the prosodic cues to discourse structure present in the original speech in addition to possibly noisy textual cues. We then compare the effectiveness of purely prosodic classification to text-based and mixed text- and prosody-based classification.</Paragraph> <Paragraph position="4"> In the current set of experiments, we concentrate on sequential segmentation of news broadcasts into individual stories. This level of segmentation can be most reliably performed by human labelers and thus can be considered most robust, and segmented data sets are publicly available. Furthermore, we consider the relative effectiveness of prosodic-based, text-based, and mixed cue-based segmentation for Mandarin Chinese, to assess the relative utility of the cues for a tone language. Not only is the use of prosodic cues for topic segmentation much less well studied in general than is the use of text cues, but the use of prosodic cues has been largely limited to English and other European languages.</Paragraph> </Section> </Paper>