<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1001">
<Title>Effective Utterance Classification with Unsupervised Phonotactic Models</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> A major bottleneck in building data-driven speech processing applications is the need to manually transcribe training utterances into words. The resulting corpus of transcribed word strings is then used to train application-specific language models for speech recognition, and in some cases also to train the natural language components of the application. Some of these speech processing applications make use of utterance classification, for example when assigning a call destination to naturally spoken user utterances (Gorin et al., 1997; Carpenter and Chu-Carroll, 1998), or as an initial step in converting speech to actions in spoken interfaces (Alshawi and Douglas, 2001).</Paragraph>
<Paragraph position="1"> In this paper we present an approach to utterance classification that avoids the manual effort of transcribing training utterances into word strings. Instead, only the desired utterance class needs to be associated with each sample utterance. The method combines automatic training of application-specific phonotactic models with token-sequence classifiers. The accuracy of this phone-string utterance classification method turns out to be surprisingly close to what can be achieved by conventional methods involving word-trigram language models that require manual transcription. To quantify this, we present empirical accuracy results from three different call-routing applications, comparing our method with conventional utterance classification using word-trigram recognition.</Paragraph>
<Paragraph position="2"> Previous work at AT&T on utterance classification without words used information-theoretic metrics to discover &quot;acoustic morphemes&quot; from untranscribed utterances paired with routing destinations (Gorin et al., 1999; Levit et al., 2001; Petrovska-Delacretaz et al., 2000).</Paragraph>
<Paragraph position="3"> However, that approach has so far proved impractical: the major obstacle to practical utility was the low run-time detection rate of acoustic morphemes discovered during training. This led to a high false rejection rate (between 40% and 50% for 1-best recognition output) when a word-based classification algorithm (the one described by Wright et al. (1997)) was applied to the detected sequence of acoustic morphemes.</Paragraph>
<Paragraph position="4"> More generally, previous work using phone-string (or phone-lattice) recognition has concentrated on tasks involving retrieval of audio or video (Jones et al., 1996; Foote et al., 1997; Ng and Zue, 1998; Choi et al., 1999).</Paragraph>
<Paragraph position="5"> In those tasks, the performance of phone-based systems was not comparable to the accuracy obtainable from word-based systems; rather, the rationale was to avoid the difficulty of building wide-coverage statistical language models for the wide range of subject matter that a typical retrieval system, such as a system for retrieving news clips, needs to cover.
In the work presented here, the task is somewhat different: the system can automatically learn to identify and act on relatively short phone subsequences that are specific to the speech in a limited domain of discourse, resulting in task accuracy comparable to that of word-based methods.</Paragraph>
<Paragraph position="6"> In section 2 we describe the utterance classification method. Section 3 describes the experimental setup and the data sets used in the experiments. Section 4 presents the main comparison of the performance of the method against a &quot;conventional&quot; approach using manual transcription and word-based models. Section 5 gives some concluding remarks.</Paragraph>
</Section>
</Paper>
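As a rough illustration of the phone-string classification idea described above, the following minimal Python sketch trains a naive Bayes classifier over phone n-grams extracted from recognized phone strings. This is not the authors' system: the routing labels, smoothing constant, and toy phone sequences are invented for illustration, and the phonotactic recognizer that would produce the phone strings is assumed to exist upstream.

# Minimal sketch (not from the paper): naive Bayes over phone n-grams.
# Classes, smoothing, and example phone sequences are illustrative assumptions.
from collections import Counter, defaultdict
import math

def phone_ngrams(phones, n=2):
    """Return all n-grams (as tuples) of a recognized phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

class PhoneNgramClassifier:
    def __init__(self, n=2, alpha=0.5):
        self.n, self.alpha = n, alpha
        self.ngram_counts = defaultdict(Counter)  # class -> n-gram counts
        self.class_totals = Counter()             # class -> total n-grams seen
        self.class_priors = Counter()             # class -> utterance count
        self.vocab = set()

    def train(self, labelled_utterances):
        """labelled_utterances: iterable of (phone_sequence, class_label)."""
        for phones, label in labelled_utterances:
            grams = phone_ngrams(phones, self.n)
            self.ngram_counts[label].update(grams)
            self.class_totals[label] += len(grams)
            self.class_priors[label] += 1
            self.vocab.update(grams)

    def classify(self, phones):
        """Return the most likely class for a recognized phone sequence."""
        total_utts = sum(self.class_priors.values())
        v = len(self.vocab) or 1
        best, best_score = None, float("-inf")
        for label in self.class_priors:
            # log prior + smoothed log likelihood of each observed n-gram
            score = math.log(self.class_priors[label] / total_utts)
            for gram in phone_ngrams(phones, self.n):
                count = self.ngram_counts[label][gram]
                score += math.log((count + self.alpha) /
                                  (self.class_totals[label] + self.alpha * v))
            if score > best_score:
                best, best_score = label, score
        return best

# Toy usage with made-up phone strings and call-routing labels:
data = [("k ao l ih ng ax b aw t m ay b ih l".split(), "billing"),
        ("ay n iy d t ax p ey m ay b ih l".split(), "billing"),
        ("k ae n s el m ay s er v ih s".split(), "cancel")]
clf = PhoneNgramClassifier(n=2)
clf.train(data)
print(clf.classify("p ey m ay b ih l".split()))

The sketch only shows the "token sequence classifier" half of the pipeline; in the paper's setting the phone strings would come from a recognizer whose phonotactic language model is itself trained automatically on the untranscribed application data.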