<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-4022">
  <Title>Context-based Speech Recognition Error Detection and Correction</Title>
  <Section position="4" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
3 Results and Discussion
</SectionTitle>
    <Paragraph position="0"> We carried out an initial evaluation of our system using three specific query words that were featured in a large number of news stories in the training corpus: "Iraq," "Abbas," and "Lynch" (from Jessica Lynch, an American soldier during the war in Iraq). The 39 files in the test corpus were annotated to indicate all the locations of recognition errors involving these three spoken words.</Paragraph>
    <Paragraph position="1"> In addition, errors that are morphological variants of the query word, such as "Iraqi" and "Iraq's," were annotated but not included in the evaluation results; in the context of information retrieval, these morphological variants can easily be addressed using common techniques such as stemming.</Paragraph>
    <Paragraph position="2"> The query word "Lynch" turned out to be an uninteresting case for our approach: it was misrecognized only 4 times in the test corpus, each time as the morphological variant "lynched." Nevertheless, the context matching test worked well, as three of the top-ranked context words were the very relevant "Jessica," "private," and "rescue." The detection and correction results for the word "Abbas" were also very encouraging, although the small sample size makes it difficult to draw significant conclusions. In our test corpus, there were n=10 examples in which "Abbas" was misrecognized. Our method detected 8 candidates, 7 of which were actually misrecognitions of "Abbas," for a recall of 70% and a precision of 88% (window size w=10, minimum context c=2). Corrections included "a bus," "a bass," and "a boss," and the false positive was the word "about," which is phonetically very similar.</Paragraph>
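The "Abbas" figures above follow directly from the reported detection counts; as a minimal sketch of the arithmetic (the counts come from the text, the function name is ours):

```python
# Minimal sketch: recall/precision for error detection, using the "Abbas"
# counts reported above (10 true misrecognitions, 8 detected, 7 correct).
def detection_metrics(true_errors: int, detected: int, correct: int):
    """Return (recall, precision) for one detection run."""
    recall = correct / true_errors   # fraction of real errors that were found
    precision = correct / detected   # fraction of detections that were real
    return recall, precision

recall, precision = detection_metrics(true_errors=10, detected=8, correct=7)
print(f"recall={recall:.0%}, precision={precision:.0%}")  # recall=70%, precision=88%
```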
    <Paragraph position="3"> The query term "Iraq" proved to be the most fruitful, due to its prevalence throughout the corpus.</Paragraph>
    <Paragraph position="4"> There were a total of 142 cases in which "Iraq" was misrecognized. Examples of common errors were "rock," "a rock," "your rocks," "warren rock" (war in Iraq), "her rock," "any rocket" (in Iraq), and "a rack." Table 2 shows the final results for the query term "Iraq" for the 39 ASR output test files, for a range of minimum required context words c and the most successful window size (w=14).</Paragraph>
    <Paragraph position="5"> Table 2: Hypothesized errors detected and corrected, false positives, recall, and precision for the query term "Iraq" with w=14 and a range of minimum context words c (n=142).</Paragraph>
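Errors such as "a rock" for "Iraq," like the false positive "about" for "Abbas," are phonetically close to the query word. The paper's actual phonetic distance metric is not specified in this section, so the following stand-in scores surface-string similarity with difflib after stripping spaces and case, a rough proxy only:

```python
# Rough stand-in for phonetic distance pruning (the paper's actual metric is
# not given in this section): compare space-stripped, lowercased strings.
import difflib

def rough_similarity(query: str, hypothesis: str) -> float:
    """Score in [0, 1]; higher means the hypothesis is closer in surface form."""
    a = query.lower().replace(" ", "")
    b = hypothesis.lower().replace(" ", "")
    return difflib.SequenceMatcher(None, a, b).ratio()

for hyp in ["a rack", "a rock", "warren rock"]:
    print(hyp, round(rough_similarity("Iraq", hyp), 2))
```

A real system would compare phoneme sequences from a pronunciation dictionary rather than spellings, which this proxy deliberately glosses over.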
    <Paragraph position="6"> Although we cannot draw conclusions about the general applicability of this approach until we carry out further experiments with more test cases, the preliminary detection and correction results indicate that it is possible to achieve very high precision with reasonable recall for certain window sizes and numbers of context words. Table 3 shows recall and precision values for some of the most effective combinations of window size w and minimum context words c, which return few false positives and many accurate corrections.</Paragraph>
    <Paragraph position="7"> Table 3: Recall and precision for the most effective combinations of window size and minimum context values.</Paragraph>
    <Paragraph position="8"> The work we describe in this paper is complementary to ASR algorithmic improvements, in that we treat error detection and correction as a post-processing step that can be applied to the output of any ASR system and can be adapted to incremental improvements in the systems.</Paragraph>
    <Paragraph position="9"> This form of post-processing also allows us to take advantage of long-range contextual features that are not available during ASR decoding itself. Post-processing also enables large-scale data analysis that models the types of systematic errors that ASR systems make. All the steps in our approach (co-occurrence analysis, context matching, and phonetic distance pruning) are unsupervised methods that can be run automatically over large quantities of data.</Paragraph>
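The context matching step can be sketched concretely: a hypothesized error is kept as a candidate only if enough of the query's top co-occurring context words fall inside a window around it. The parameters w and c follow the paper; the transcript and context-word list below are illustrative, not the paper's learned data:

```python
# Sketch of the context-matching test: keep a hypothesized misrecognition
# only if at least c of the query's top co-occurring context words appear
# within w tokens of it. The transcript and context set are invented examples.
def context_match(tokens, index, context_words, w=10, c=2):
    """True if >= c context words occur within w tokens of position index."""
    lo, hi = max(0, index - w), min(len(tokens), index + w + 1)
    window = set(tokens[lo:index]) | set(tokens[index + 1:hi])
    return len(window & set(context_words)) >= c

transcript = "u s troops pushed toward baghdad as saddam vowed a rock would resist".split()
# "rock" at position 10 is a hypothesized misrecognition of "Iraq"
print(context_match(transcript, 10, {"baghdad", "saddam", "troops", "war"}, w=10, c=2))
# -> True (three context words in the window, which meets the c=2 threshold)
```

Raising c trades recall for precision, which matches the trend the tables above report.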
    <Paragraph position="10"> The results in this paper are promising but are obviously very preliminary. We are in the process of evaluating the work on a much larger set of query words. We should emphasize that the goal of this work is not to produce a significant improvement in the overall word error rate of a particular corpus of ASR output, although we believe that such an improvement is possible using similar contextual analysis. Instead, the focus of the work is to improve the specific aspects of the ASR output that may adversely affect a user-centered task like information retrieval. While we have not formally evaluated the impact of our error detection and correction on retrieval performance, there is an obvious benefit to correcting misrecognitions of the specific query term that a user is seeking.</Paragraph>
  </Section>
</Paper>