<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1005">
  <Title>A TAG-based noisy channel model of speech repairs</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Most spontaneous speech contains disfluencies such as partial words, filled pauses (e.g., &quot;uh&quot;, &quot;um&quot;, &quot;huh&quot;), explicit editing terms (e.g., &quot;I mean&quot;), parenthetical asides and repairs. Of these, repairs pose particularly difficult problems for parsing and related NLP tasks. This paper presents an explicit generative model of speech repairs and shows how it can eliminate this kind of disfluency.</Paragraph>
    <Paragraph position="1"> While speech repairs have been studied by psycholinguists for some time, as far as we know this is the first time a probabilistic model of speech repairs based on a model of syntactic structure has been described in the literature.</Paragraph>
    <Paragraph position="2"> Probabilistic models have the advantage over other kinds of models that they can in principle be integrated with other probabilistic models to produce a combined model that uses all available evidence to select the globally optimal analysis. Shriberg and Stolcke (1998) studied the location and distribution of repairs in the Switchboard corpus, but did not propose an actual model of repairs. Heeman and Allen (1999) describe a noisy channel model of speech repairs, but leave &quot;extending the model to incorporate higher level syntactic ... processing&quot; to future work. The previous work most closely related to the current work is Charniak and Johnson (2001), who used a boosted decision stub classifier to classify words as edited or not on a word by word basis, but did not identify or assign a probability to a repair as a whole.</Paragraph>
    <Paragraph position="3"> There are two innovations in this paper.</Paragraph>
    <Paragraph position="4"> First, we demonstrate that using a syntactic parser-based language model (Charniak, 2001) instead of bi/trigram language models significantly improves the accuracy of repair detection and correction. Second, we show how Tree Adjoining Grammars (TAGs) can be used to provide a precise formal description and probabilistic model of the crossed dependencies occurring in speech repairs.</Paragraph>
    <Paragraph position="5"> The rest of this paper is structured as follows. The next section describes the noisy channel model of speech repairs and the section after that explains how it can be applied to detect and repair speech repairs. Section 4 evaluates this model on the Penn 3 disfluency-tagged Switchboard corpus, and Section 5 concludes and discusses future work.</Paragraph>
  </Section>
</Paper>