File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/93/p93-1007_intro.xml

Size: 10,522 bytes

Last Modified: 2025-10-06 14:05:27

<?xml version="1.0" standalone="yes"?>
<Paper uid="P93-1007">
  <Title>A SPEECH-FIRST MODEL FOR REPAIR DETECTION AND CORRECTION</Title>
  <Section position="2" start_page="0" end_page="46" type="intro">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> Interpreting fully natural speech is an important goal for spoken language understanding systems. However, while corpus studies have shown that about 10% of spontaneous utterances contain self-corrections, or REPAIRS, little is known about the extent to which cues in the speech signal may facilitate repair processing. We identify several cues based on acoustic and prosodic analysis of repairs in a corpus of spontaneous speech, and propose methods for exploiting these cues to detect and correct repairs. We test our acoustic-prosodic cues with other lexical cues to repair identification and find that precision rates of 89-93% and recall of 78-83% can be achieved, depending upon the cues employed, from a prosodically labeled corpus.</Paragraph>
    <Paragraph position="1"> Introduction
Disfluencies in spontaneous speech pose serious problems for spoken language systems. First, a speaker may produce a partial word or FRAGMENT, a string of phonemes that does not form the complete intended word. Some fragments may coincidentally match words actually in the lexicon, such as fly in Example (1); others will be identified with the acoustically closest item(s) in the lexicon, as in Example (2).1
(1) What is the earliest fli- flight from Washington to Atlanta leaving on Wednesday September fourth?
(2) Actual string: What is the fare fro- on American Airlines fourteen forty three
Recognized string: With fare four American Airlines fourteen forty three
Even if all words in a disfluent segment are correctly recognized, failure to detect a disfluency may lead to interpretation errors during subsequent processing, as in Example (3).</Paragraph>
    <Paragraph position="2"> 1 The presence of a word fragment in examples is indicated by the diacritic '-'. Self-corrected portions of the utterance appear in boldface. All examples in this paper are drawn from the ATIS corpus described below. Recognition output shown in Example (2) is from the system described in (Lee et al., 1990).</Paragraph>
    <Paragraph position="3"> (3) ... Delta leaving Boston seventeen twenty one arriving Fort Worth twenty two twenty one forty...</Paragraph>
    <Paragraph position="4"> Here, 'twenty two twenty one forty' must be interpreted as a flight arrival time; the system must somehow choose among '21:40', '22:21', and '22:40'.</Paragraph>
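The ambiguity in Example (3) can be made concrete with a small sketch. It assumes the number words have already been grouped into the values 22, 21 and 40, and that a repair leaves exactly two of them standing in their original order; the function name candidate_times is hypothetical and this is an illustration only, not the method proposed in this paper.

```python
from itertools import combinations


def candidate_times(values: list[int]) -> list[str]:
    """Return every hour:minute reading that keeps two values in their spoken order."""
    candidates = []
    for hour, minute in combinations(values, 2):
        # Keep only pairs that form a valid 24-hour clock time.
        if hour in range(24) and minute in range(60):
            candidates.append(f"{hour:02d}:{minute:02d}")
    return candidates


# 'twenty two twenty one forty', grouped (by assumption) into three numbers.
print(candidate_times([22, 21, 40]))  # ['22:21', '22:40', '21:40']
```

All three readings are valid clock times, so the word string alone cannot decide among them; some further cue is needed to locate the repair.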
    <Paragraph position="5"> Although studies of large speech corpora have found that approximately 10% of spontaneous utterances contain disfluencies involving self-correction, or REPAIRS (Hindle, 1983; Shriberg et al., 1992), little is known about how to integrate repair processing with real-time speech recognition. In particular, the speech signal itself has been relatively unexplored as a source of processing cues for the detection and correction of repairs. In this paper, we present results from a study of the acoustic and prosodic characteristics of 334 repair utterances, containing 368 repair instances, from the ARPA Air Travel Information System (ATIS) database.</Paragraph>
    <Paragraph position="6"> Our results are interpreted within our &quot;speech-first&quot; framework for investigating repairs, the REPAIR INTERVAL MODEL (RIM). RIM builds upon Labov (1966) and Hindle (1983) by conceptually extending the EDIT SIGNAL HYPOTHESIS -- that repairs are acoustically or phonetically marked at the point of interruption of fluent speech. After describing acoustic and prosodic characteristics of the repair instances in our corpus, we use these and other lexical cues to test the utility of our &quot;speech-first&quot; approach to repair identification on a prosodically labeled corpus.</Paragraph>
    <Paragraph position="7">  While self-correction has long been a topic of psycholinguistic study, computational work in this area has been sparse. Early work in computational linguistics treated repairs as one type of ill-formed input and proposed solutions based upon extensions to existing text parsing techniques such as augmented transition networks (ATNs), network-based semantic grammars, case frame grammars, pattern matching and deterministic parsers.</Paragraph>
    <Paragraph position="8"> Recently, Shriberg et al. (1992) and Bear et al. (1992) have proposed a two-stage method for processing repairs. In the first stage, lexical pattern matching rules operating on orthographic transcriptions would be used to retrieve candidate repair utterances. In the second, syntactic, semantic, and acoustic information would filter true repairs from false positives found by the pattern matcher. Results of testing the first stage of this model, the lexical pattern matcher, are reported in (Bear et al., 1992): 309 of 406 utterances containing 'nontrivial' repairs in their 10,718-utterance corpus were correctly identified, while 191 fluent utterances were incorrectly identified as containing repairs. This represents recall of 76% with precision of 62%.</Paragraph>
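The recall and precision figures follow directly from the counts reported above. The short check below reproduces the arithmetic; the function and variable names are illustrative assumptions, and only the three counts come from the text.

```python
def precision_recall(true_positives: int, false_positives: int, total_actual: int) -> tuple[float, float]:
    """Compute precision and recall from raw detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / total_actual
    return precision, recall


# 309 repair utterances found, 191 fluent utterances wrongly flagged, 406 repair utterances in total.
precision, recall = precision_recall(309, 191, 406)
print(f"recall = {recall:.0%}, precision = {precision:.0%}")  # recall = 76%, precision = 62%
```

Recall is 309/406 and precision is 309/(309 + 191) = 309/500.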
    <Paragraph position="9"> Of the repairs correctly identified, the appropriate correction was found for 57%. Repair candidates were filtered and corrected by deleting a portion of the utterance based on the pattern matched, and then checking the syntactic and semantic acceptability of the corrected version using the syntactic and semantic components of the Gemini NLP system. Bear et al. (1992) also speculate that acoustic information might be used to filter out false positives for candidates matching two of their lexical patterns -- repetitions of single words and cases of single inserted words -- but do not report such experimentation.</Paragraph>
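To illustrate why a lexical pattern such as single-word repetition both retrieves repairs and produces false positives, here is a minimal sketch of such a matcher. It is not the pattern matcher of Bear et al. (1992); the function name and both test utterances are hypothetical, and the input is assumed to be a correct orthographic transcription.

```python
def repeated_word_candidates(utterance: str) -> list[tuple[int, str]]:
    """Return (position, word) for each word immediately followed by an identical word."""
    words = utterance.lower().split()
    return [
        (index, word)
        for index, (word, following) in enumerate(zip(words, words[1:]))
        if word == following
    ]


# A disfluent utterance: both repetitions are plausible repairs.
print(repeated_word_candidates("show me the the flights to to boston"))  # [(2, 'the'), (5, 'to')]

# A fluent utterance: the repeated word is grammatical, so this match is a false positive.
print(repeated_word_candidates("i think that that flight is full"))  # [(2, 'that')]
```

The second call shows the kind of fluent match that a later filtering stage would have to reject.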
    <Paragraph position="10"> This work promotes the important idea that automatic repair processing can be made more robust by integrating knowledge from multiple sources. Such integration is a desirable long-term goal. However, the working assumption that correct transcriptions will be available from speech recognizers is problematic, since current recognition systems rely primarily upon language models and lexicons derived from fluent speech to decide among competing acoustic hypotheses. These systems usually treat disfluencies in training and recognition as noise; moreover, they have no way of modeling word fragments, even though these occur in the majority of repairs. We term such approaches that rely on accurate transcription to identify repair candidates &quot;text-first&quot;.</Paragraph>
    <Paragraph position="11"> Text-first approaches have explored the potential contributions of lexical and grammatical information to automatic repair processing, but have largely left open the question of whether there exist acoustic and prosodic cues for repairs in general, rather than potential acoustic-prosodic filters for particular pattern subclasses. Our investigation of repairs addresses the problem of identifying such general acoustic-prosodic cues to repairs, and so we term our approach &quot;speech-first&quot;. Finding such cues to repairs would provide early detection of repairs in recognition, permitting early pruning of the hypothesis space.</Paragraph>
    <Paragraph position="12"> One proposal for repair processing that lends itself to both incremental processing and the integration of speech cues into repair detection is that of Hindle (1983), who defines a typology of repairs and associated correction strategies in terms of extensions to a deterministic parser. For Hindle, repairs can be (1) full sentence restarts, in which an entire utterance is reinitiated; (2) constituent repairs, in which one syntactic constituent (or part thereof) is replaced by another; or (3) surface level repairs, in which identical strings appear adjacent to each other. A hypothesized acoustic-phonetic edit signal, &quot;a markedly abrupt cut-off of the speech signal&quot; (Hindle, 1983, p. 123), is assumed to mark the interruption of fluent speech (cf. (Labov, 1966)). This signal is treated as a special lexical item in the parser input stream that triggers certain correction strategies depending on the parser configuration. Thus, in Hindle's system, repair detection is decoupled from repair correction, which requires only that the location of the interruption is stored in the parser state.</Paragraph>
    <Paragraph position="13"> Importantly, Hindle's system allows for nonsurface-based corrections and sequential application of correction rules (Hindle, 1983, p. 123). In contrast, simple surface deletion correction strategies cannot readily handle either repairs in which one syntactic constituent is replaced by an entirely different one, as in Example (4), or sequences of overlapping repairs, as in Example (5).</Paragraph>
    <Paragraph position="14"> (4) I'd like to a flight from Washington to Denver...</Paragraph>
    <Paragraph position="15"> (5) I'd like to book a reser- are there f- is there a first class fare for the flight that departs at six forty p.m.</Paragraph>
    <Paragraph position="16"> Hindle's methods achieved a success rate of 97% on a transcribed corpus of approximately 1,500 sentences in which the edit signal was orthographically represented and lexical and syntactic category assignments hand-corrected, indicating that, in theory, the edit signal can be computationally exploited for both repair detection and correction. Our &quot;speech-first&quot; investigation of repairs is aimed at determining the extent to which repair processing algorithms can rely on the edit signal hypothesis in practice.</Paragraph>
    <Paragraph position="17"> The Repair Interval Model To support our investigation of acoustic-prosodic cues to repair detection, we propose a &quot;speech-first&quot; model of repairs, the REPAIR INTERVAL MODEL (RIM). RIM divides the repair event into three consecutive temporal intervals and identifies time points within those intervals that are computationally critical. A full repair comprises three intervals, the REPARANDUM INTERVAL, the DISFLUENCY INTERVAL, and the REPAIR INTERVAL.</Paragraph>
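One way to picture the three consecutive intervals is as a small data structure. The sketch below is only an illustration under stated assumptions: the class names, field names, the use of seconds as the time unit, and the example timings are all hypothetical and are not taken from the RIM definition or from the ATIS data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # assumed unit: seconds from utterance onset
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass(frozen=True)
class RepairEvent:
    reparandum: Interval  # material to be repaired
    disfluency: Interval  # span between the reparandum and the repair
    repair: Interval      # corrected material

    def is_consecutive(self) -> bool:
        """True when each interval begins exactly where the previous one ends."""
        return (
            self.reparandum.end == self.disfluency.start
            and self.disfluency.end == self.repair.start
        )


# Hypothetical timings chosen only to exercise the structure.
event = RepairEvent(
    reparandum=Interval(1.0, 1.25),
    disfluency=Interval(1.25, 1.5),
    repair=Interval(1.5, 2.25),
)
print(event.is_consecutive(), event.disfluency.duration)  # True 0.25
```

Keeping the interval boundaries explicit makes it straightforward to attach acoustic or prosodic measurements to specific time points within each interval.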
    <Paragraph position="18"> Following Levelt (1983), we identify the REPARANDUM as the lexical material which is to be repaired. The end of the reparandum coincides with the termination of the fluent portion of the utterance, which we term the</Paragraph>
  </Section>
</Paper>