<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4022"> <Title>Context-based Speech Recognition Error Detection and Correction</Title> <Section position="2" start_page="0" end_page="0" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Spoken language sources, such as news broadcasts, meetings, and telephone conversations, are becoming a very common data source for user-centered tasks such as information retrieval, question answering, and summarization.</Paragraph> <Paragraph position="1"> Automatic speech recognition (ASR) systems, which can rapidly produce a transcript of spoken audio, are consequently becoming an essential part of the information flow. However, ASR systems often generate transcripts with many word errors, which can adversely affect the performance of systems designed to assist users in managing large quantities of natural language data. Retrieving documents or passages relevant to a user query is significantly easier when the words in the query are contained in the document; when a query word is misrecognized by the ASR system, retrieval accuracy declines.</Paragraph> <Paragraph position="2"> For example, if a user is searching for spoken documents related to &quot;Iraq,&quot; and the spoken word &quot;Iraq&quot; is consistently misrecognized, the user will not be able to locate many of the desired documents.</Paragraph> <Paragraph position="3"> In this work we introduce a novel unsupervised approach to detecting and correcting misrecognized query words in a document collection. Our approach takes advantage of two important patterns in the appearance of ASR errors. First, specific words in a large corpus tend to co-occur frequently with certain other context words, and misrecognitions of those specific words will also tend to co-occur with the same context words. Second, many ASR errors are phonetically similar to the actual spoken words. 
Our approach exploits both patterns: it seeks output words that are phonetically similar to a query word and that occur in a context more likely to indicate the query word. For example, &quot;Iraq&quot; and &quot;a rock&quot; are phonetically very similar but generally occur in different contexts.</Paragraph> <Paragraph position="4"> Our ASR error detection and correction is carried out in three steps that are separate from the speech recognition itself. We first analyze a large corpus of output from a given ASR system to compile co-occurrence statistics for each word in the system's vocabulary. This analysis yields a set of context words likely to occur with each vocabulary word. Next, given a target word, such as a query word entered into an information retrieval system, we identify regions in the search corpus containing a large number of the expected context words for the query word. Finally, within those regions, we detect words that are unlikely to occur with the context words and that are phonetically similar to the query word.</Paragraph> </Section></Paper>
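<!--
Editorial sketch, not part of the original paper. The three steps described in the last paragraph could be prototyped roughly as follows. Everything here is an assumption made for illustration: the function names, the window sizes, the thresholds, the toy corpora, and above all the similarity measure, which uses difflib spelling similarity as a crude stand-in for a real phonetic comparison. The source does not specify any of these details.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def cooccurrence_counts(docs, window=3):
    """Step 1: count how often each word appears within `window` tokens of another."""
    counts = defaultdict(Counter)
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def context_words(counts, word, top_n=3):
    """The top_n words seen most often near `word` (hypothetical cutoff)."""
    return {w for w, _ in counts.get(word, Counter()).most_common(top_n)}


def candidate_regions(tokens, context, size=6, min_hits=2):
    """Step 2: sliding windows that contain at least `min_hits` context words."""
    regions = []
    for start in range(max(1, len(tokens) - size + 1)):
        region = tokens[start:start + size]
        if sum(1 for t in region if t in context) >= min_hits:
            regions.append(region)
    return regions


def similarity(a, b):
    """ASSUMPTION: spelling similarity as a rough proxy for phonetic similarity."""
    return SequenceMatcher(None, a, b).ratio()


def detect_misrecognitions(tokens, query, counts, threshold=0.6):
    """Step 3: words in candidate regions that never co-occurred with the
    context words in the statistics corpus and that resemble the query."""
    context = context_words(counts, query)
    flagged = set()
    for region in candidate_regions(tokens, context):
        for word in region:
            if word == query or word in context:
                continue
            seen_with_context = sum(counts.get(word, Counter())[c] for c in context)
            if seen_with_context == 0 and similarity(word, query) >= threshold:
                flagged.add(word)
    return flagged


# Toy data, invented for this sketch only.
stats_corpus = [
    "iran capital tehran government".split(),
    "tehran iran government capital".split(),
    "rough mountain terrain hiking".split(),
    "hiking terrain rough mountain".split(),
]
search_doc = "the iran government in terrain announced capital plans".split()

counts = cooccurrence_counts(stats_corpus)
context = context_words(counts, "tehran")
flagged = detect_misrecognitions(search_doc, "tehran", counts)
print(flagged)  # {'terrain'}
```

In this toy run, "terrain" is flagged because it sits among context words expected for "tehran", was never observed with those context words in the statistics corpus, and is similar to the query under the stand-in measure. A real system would replace `similarity` with a comparison over phoneme sequences.
-->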