<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4018">
<Title>Improving Automatic Sentence Boundary Detection with Confusion Networks</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> The output of most current automatic speech recognition systems is an unstructured sequence of words. Additional information such as sentence boundaries and speaker labels is useful for improving readability and can provide structure relevant to subsequent language processing, including parsing, topic segmentation, and summarization.</Paragraph>
<Paragraph position="1"> In this study, we focus on identifying sentence boundaries using word-based and prosodic cues, and in particular we develop a method that leverages the additional information available from multiple recognizer hypotheses.</Paragraph>
<Paragraph position="2"> Multiple hypotheses are helpful because the single best recognizer output still contains many errors, even for state-of-the-art systems. For conversational telephone speech (CTS), word error rates range from 20% to 30%, and for broadcast news (BN) they range from 10% to 15%. These errors limit the effectiveness of sentence boundary prediction because they introduce incorrect words into the word stream. Sentence boundary detection error rates for a baseline system increased by 50% relative for CTS when moving from the reference to the automatic speech condition, while for BN error rates increased by about 20% relative (Liu et al., 2003). Including additional recognizer hypotheses allows alternative word choices to inform sentence boundary prediction.</Paragraph>
<Paragraph position="3"> To integrate the information from the different alternatives, we first predict sentence boundaries in each hypothesized word sequence, using an HMM structure that integrates prosodic features, modeled with a decision tree, and hidden event language modeling. To facilitate merging the predictions from multiple hypotheses, we represent each hypothesis as a confusion network, with confidences for the sentence boundary predictions from a baseline system. The final prediction is a combination of the predictions from the individual hypotheses, each weighted by the recognizer posterior for that hypothesis.</Paragraph>
<Paragraph position="4"> Our methods build on related work in sentence boundary detection and confusion networks, described in Section 2, and on the baseline system and task domain reviewed in Section 3. Our approach integrates prediction on multiple recognizer hypotheses using confusion networks, as outlined in Section 4. Experimental results are detailed in Section 5, and the main conclusions of this work are summarized in Section 6.</Paragraph>
</Section>
</Paper>
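The combination step described in the penultimate paragraph can be illustrated with a minimal sketch. It assumes each hypothesis has already been aligned to a shared confusion network and carries per-slot sentence boundary probabilities plus a recognizer posterior; the function name, data layout, and decision threshold below are illustrative assumptions, not the authors' implementation.

# Minimal sketch of posterior-weighted combination of sentence boundary
# predictions from multiple recognizer hypotheses (hypothetical layout:
# hypotheses already aligned to the same confusion network slots).
from typing import List, Tuple


def combine_boundary_predictions(
    hypotheses: List[Tuple[float, List[float]]],
    threshold: float = 0.5,
) -> List[bool]:
    """Each hypothesis is a (posterior, slot_probs) pair, where slot_probs[i]
    is that hypothesis's probability of a sentence boundary at slot i.
    Posteriors are normalized and used to weight the per-hypothesis
    probabilities; slots whose combined probability exceeds the threshold
    are marked as sentence boundaries."""
    if not hypotheses:
        return []
    total = sum(post for post, _ in hypotheses)
    n_slots = len(hypotheses[0][1])
    combined = [0.0] * n_slots
    for posterior, slot_probs in hypotheses:
        weight = posterior / total
        for i, p in enumerate(slot_probs):
            combined[i] += weight * p
    return [p > threshold for p in combined]


# Example: three hypotheses with recognizer posteriors 0.6, 0.3, 0.1.
hyps = [
    (0.6, [0.1, 0.8, 0.2]),
    (0.3, [0.2, 0.6, 0.3]),
    (0.1, [0.1, 0.9, 0.1]),
]
print(combine_boundary_predictions(hyps))  # -> [False, True, False]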