<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3009">
  <Title>Using Higher-level Linguistic Knowledge for Speech Recognition Error Correction in a Spoken Q/A Dialog</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> New application environments such as telephone-based retrieval, car navigation systems, and mobile information retrieval often require a speech interface to process user queries conveniently. In these environments, keyboard input is inconvenient or sometimes impossible because of the limited space on mobile devices and the instability involved in operating them.</Paragraph>
    <Paragraph position="1"> However, because of the low recognition rate of current speech recognition systems, the performance of speech applications such as speech-driven information retrieval (IR), question answering (QA), and speech dialogue systems is very low. A serially connected spoken QA system, built from a QA system that achieves 76% performance on text input and an automatic speech recognition (ASR) component operating at a 30% word error rate (WER), reached only 7% (Harabagiu et al., 2002). Harabagiu et al. (2002) expose several fundamental flaws of this simple combination of ASR and QA, including the importance of named entity information and the inadequacies of current speech recognition technology based on n-gram language models.</Paragraph>
    <Paragraph position="2"> The major problem in speech-driven IR and QA is the performance degradation caused by recognition errors in ASR systems. Erroneously recognized spoken queries lower the precision and recall of IR and QA systems. Some authors have investigated the relation between ASR errors and IR precision (Barnett et al., 1997; Crestani, 2000). They evaluated the effectiveness of IR systems at various error rates using 35 TREC queries.</Paragraph>
    <Paragraph position="3"> Their studies show that IR precision drops quickly as the word error rate (WER) increases. Another group investigated the performance of spoken queries on the NTCIR collections (Fujii et al., 2002A). They evaluated a variety of speakers and calculated the error rate with respect to query terms, that is, the keywords used for retrieval. They showed that the WER of query terms was generally higher than that of general words, irrespective of the speaker. In other words, the content words that determine IR and QA performance were more difficult to recognize than ordinary words. They therefore introduced a method to improve the precision of speech-driven IR by proposing a new type of IR system tightly integrated with a speech input interface (Fujii et al., 2002B). In their system, the document collection is used to adapt the language model of the ASR, which lowers the word error rate.</Paragraph>
    <Paragraph position="4"> For this reason, appropriate adaptation techniques, such as post error correction, are required to overcome speech recognition errors. ASR error correction can serve as a domain adaptation technique for improving recognition accuracy, and the primary advantage of the error correction approach is its independence from the specific speech recognizer. If the speech recognizer can be regarded as a black box, we can perform robust and flexible domain adaptation through the post error correction process. Figure 1 shows the paradigm of this post error correction approach.</Paragraph>
    <Paragraph position="5"> One approach to post error correction, a straightforward and intuitive way to robustly handle many kinds of recognition errors, is the rule-based approach (Kaki et al., 1998). Kaki et al. (1998) collected many lexical error patterns that occurred in a Japanese speech translation system. They could correct any type of error by matching the strings in the transcription against the lexical error patterns in the database. However, their approach has the disadvantage that correction is feasible only for the trained (or collected) lexical error patterns. Another approach is based on a statistical method that uses the probabilistic information of words in a spoken dialogue situation and language models adapted to the application domain (Ringger and Allen, 1996). Ringger and Allen (1996) applied the noisy channel model to the correction of speech recognition errors. They simplified a statistical machine translation (MT) model known as the IBM model (Brown et al., 1990) and tried to construct a general post-processor that can correct errors generated by any speech recognizer. The model consists of two parts: a channel model, which accounts for errors made by the ASR, and a language model, which accounts for the likelihood of a sequence of words being uttered. They trained both the channel model and the language model on transcriptions from the TRAINS-95 dialogue system, a train travel planning system (Allen et al., 1996). Here, the channel model gives the distribution with which an original word may be recognized as an erroneous word. They use the probability of mistakenly recognized words, co-occurrence information extracted from the words and their neighboring words, and tagged word bigrams, all of which are lexical clues in error strings.</Paragraph>
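    The two-part noisy channel model described above is commonly written as follows. This is a sketch in standard noisy channel notation, not necessarily the exact formulation of Ringger and Allen (1996); the symbols O (the recognizer output), W (a candidate original word sequence), and \hat{W} (the corrected sequence) are assumed here for illustration.

```latex
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid O)
        = \operatorname*{arg\,max}_{W}
          \underbrace{P(O \mid W)}_{\text{channel model}} \;
          \underbrace{P(W)}_{\text{language model}}
```

    The denominator P(O) from Bayes' rule is dropped because it does not depend on W.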
    <Paragraph position="6"> Such approaches based on lexical information have shown some success, but they still have major drawbacks. Because they depend directly on lexical items, the performance of such systems depends on the size and quality of the speech recognition results or on the database of collected error strings. The collected error patterns are useful but insufficient, because collecting them is expensive; in many cases, these systems therefore fail to recover the original strings from the lexically specific error patterns. Also, because they are sensitive to the error patterns, they occasionally misidentify a correct word as an error.</Paragraph>
    <Paragraph position="7"> We propose an improved and more robust semantic-oriented error correction approach, which can be integrated into previous, fragile lexical-based approaches.</Paragraph>
    <Paragraph position="8"> In our approach, in addition to lexical information, we use high-level syntactic and semantic information about the words in a speech transcription. We obtain semantic information from knowledge bases such as general thesauri and a special domain dictionary that we construct ourselves to contain knowledge specific to the domain of the target application.</Paragraph>
    <Paragraph position="9"> In the next section, we first describe a general noisy channel model for ASR error correction and discuss some of its problems. We then introduce our improved channel model, designed especially for the Korean language, in section 3. We also propose a new high-level error correction model using syntactic and semantic knowledge in section 4. We demonstrate the feasibility of our approach through experiments in section 5 and draw conclusions in section 6.</Paragraph>
  </Section>
</Paper>