<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4040">
  <Title>A Lexically-Driven Algorithm for Disfluency Detection</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Disfluencies in human speech are widespread and cause problems for both downstream processing and human readability of speech transcripts. Recent human studies (Jones et al., 2003) have examined the effect of disfluencies on the readability of speech transcripts. These results suggest that the &amp;quot;cleaning&amp;quot; of text by removing disfluent words can increase the speed at which readers can process text. Recent work on detecting edits for use in parsing of speech transcripts (Core and Schubert, 1999), (Charniak and Johnson, 2001) has shown an improvement in the parser error rate by modeling disfluencies.</Paragraph>
    <Paragraph position="1"> Many researchers investigating disfluency detection have focused on the use of prosodic cues, as opposed to lexical features (Nakatani and Hirschberg, 1994). There are different approaches to detecting disfluencies. In one approach, one can first try to locate evidence of a general disfluency, e.g., using prosodic features or language model discontinuations. These locations are called interruption points (IPs). Following this, it is generally sufficient to look in the nearby vicinity of the IP to find the dis- null fluent words. The most successful approaches so far combine the detection of IPs using prosodic features and language modeling techniques (Liu et al., 2003), (Shriberg et al., 2001), (Stolcke et al., 1998).</Paragraph>
    <Paragraph position="2"> Our work is based on the premise that the vast majority of disfluencies can be detected using primarily lexical features--specifically the words themselves and part-of-speech (POS) labels--without the use of extensive prosodic cues. Lexical modeling of disfluencies with only minimal acoustic cues has been shown to be successful in the past using strongly statistical techniques (Heeman and Allen, 1999). We shall discuss our algorithm and compare it to two other algorithms that make extensive use of acoustic features. Our algorithm performs comparably on most of the tasks assigned and in some cases outperforms systems that used both prosodic and lexical features.</Paragraph>
    <Paragraph position="3"> We discuss the task definition in Section 2. In Section 3 we describe our Transformation-Based Learning (TBL) algorithm and its associated features. Section 4 presents results for our system and two other systems that make heavy use of prosodic features to detect disfluencies. We then discuss the errors made by our system, in Section 5, and discuss our conclusions and future work in Section 6.</Paragraph>
  </Section>
class="xml-element"></Paper>