<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-1115">
  <Title>Combining Prosodic and Text Features for Segmentation of Mandarin Broadcast News</Title>
  <Section position="10" start_page="0" end_page="0" type="concl">
    <SectionTitle>
8 Conclusion and Future Work
</SectionTitle>
    <Paragraph position="0"> We have demonstrated the utility of prosody-only, text-only, and mixed text-prosody features for automatic topic segmentation of Mandarin Chinese. In particular, we have shown that intonational prosodic features, specifically pitch, intensity, pause, and duration, can be applied to the identification of topic boundaries in a tone language. We find highly significant decreases in pitch and intensity at topic-final positions, and significant increases in word duration. Furthermore, these features, in both local and contextualized form, provide the basis for an effective decision tree classifier of boundary positions that uses only prosodic features, with no term similarity or cue phrase information.</Paragraph>
    <Paragraph position="1"> We observe similar effectiveness for all feature sets when all features are available, with slightly better classification accuracy for the text and hybrid text-prosody approaches. We further observe that the prosody-only and hybrid feature sets are much less sensitive to the absence of individual features, and, in particular, to the absence of silence features, as pitch and intensity provide comparably sharp cues to the position of topic boundaries. These findings indicate that prosodic features are robust cues to topic boundaries, both with and without textual cues.</Paragraph>
    <Paragraph position="2"> Finally, we demonstrate the joint utility of the different feature sets: prosodic, textual, and silence.</Paragraph>
    <Paragraph position="3"> The use of a simple voting mechanism exploits the different contributions of each of the feature-set-specific classifiers in conjunction with the integrated classifier. This final combination yields a substantial reduction in the false alarm rate and a reduction in the overall error rate, with only a small increase in the miss rate. Further tuning of the relative miss and false alarm rates is certainly possible, but should be tied to a specific task application.</Paragraph>
    <Paragraph position="4"> There is still substantial work to be done. We would like to integrate speaker identification for normalization and speaker change detection. We also plan to explore the integration of text and prosodic features for the identification of more fine-grained sub-topic structure, to provide more focused units for information retrieval, summarization, and anaphora resolution. Finally, we plan to explore the interaction of prosodic and textual features with cues from other modalities, such as gaze and gesture, for robust segmentation of varied multi-modal data.</Paragraph>
  </Section>
</Paper>