<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1107">
<Title>A Method for Correcting Errors in Speech Recognition Using the Statistical Features of Character Co-occurrence</Title>
<Section position="3" start_page="0" end_page="656" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> In spite of the increased performance of speech recognition systems, the output still contains many errors. For language processing such as machine translation, it is extremely difficult to deal with such errors.</Paragraph>
<Paragraph position="1"> In integrating recognition and translation into a speech translation system, the development of the following processes is therefore important: (1) detection of errors in speech recognition results; (2) sorting of speech recognition results by means of error detection; (3) provision of feedback to the recognition process and/or prompting the user to speak again; (4) correction of errors; etc.</Paragraph>
<Paragraph position="2"> For this purpose, a number of methods have been proposed. One method is to translate correct parts extracted from speech recognition results by using the semantic distance between words calculated with an example-based approach (Wakita et al., 97). Another</Paragraph>
<Section position="1" start_page="0" end_page="653" type="sub_section">
<SectionTitle> 2.1 Error-Pattern-Correction (EPC) </SectionTitle>
<Paragraph position="0"> When examining errors in speech recognition, we find that errors occur in regular patterns rather than at random.</Paragraph>
<Paragraph position="1"> EPC uses such error patterns for correction. We refer to such a pattern as an Error-Pattern.</Paragraph>
<Paragraph position="2"> An Error-Pattern is made up of two strings. One is the string including errors, and the other is the corresponding correct string (the former is referred to as the Error-Part and the latter as the Correct-Part). These parts are extracted from the speech recognition results and the corresponding actual utterances, and are then stored in a database (referred to as the Error-Pattern-Database). In EPC, a correction is made by substituting the Correct-Part for the Error-Part whenever the Error-Part is detected in a recognition result (see Figure 2-1). Table 2-1 shows some Error-Pattern examples.</Paragraph>
</Section>
Table 2-1: Examples of Error-Patterns (columns: Correct-Part, Error-Part)
<Section position="2" start_page="653" end_page="653" type="sub_section">
<SectionTitle> 2.1.1 Extraction of Error-Patterns </SectionTitle>
<Paragraph position="0"> The Error-Pattern-Database is prepared mechanically from pairs of parts taken from the speech recognition results and the corresponding actual utterances. The examples below show candidates grouped according to a correct part and its corresponding erroneous parts.</Paragraph>
<Paragraph position="2"> EPC is a simple and effective method because it detects and corrects errors only by pattern-matching.</Paragraph>
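To make the pattern-matching substitution concrete, here is a minimal Python sketch of the EPC correction step. It is illustrative only: the representation of the Error-Pattern-Database as a dictionary mapping Error-Parts to Correct-Parts, and the function name apply_epc, are assumptions rather than the authors' implementation. Preferring longer Error-Parts mirrors the Condition of Inclusion-1 described below.

    # Illustrative sketch of the EPC substitution step (not the authors' code);
    # the Error-Pattern-Database is assumed to be a dict mapping Error-Parts to
    # Correct-Parts.
    def apply_epc(recognition_result, error_pattern_db):
        """Replace every Error-Part found in the result with its Correct-Part."""
        corrected = recognition_result
        # Try longer Error-Parts first, so a longer, more specific pattern takes
        # precedence over a shorter pattern it contains.
        for error_part in sorted(error_pattern_db, key=len, reverse=True):
            if error_part in corrected:
                corrected = corrected.replace(error_part, error_pattern_db[error_part])
        return corrected

    # Toy usage with romanized placeholders instead of the paper's Japanese strings.
    db = {"speach": "speech", "recognishon": "recognition"}
    print(apply_epc("speach recognishon output", db))  # -> "speech recognition output"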
<Paragraph position="3"> The unrestricted use of Error-Patterns, however, may produce wrong corrections. A careful selection of Error-Patterns is therefore necessary. In this method, several selection conditions are applied in order, as described below. Candidates passing all of the conditions are employed as Error-Patterns.</Paragraph>
<Paragraph position="4"> Condition of High Frequency: Candidates whose frequency is not less than a given threshold (2 in the experiment) are selected, in order to collect errors that occur frequently in recognition results.</Paragraph>
<Paragraph position="5"> Condition of Non-Side Effect: This step excludes any candidate whose Error-Part appears in the actual utterances, to prevent the Error-Part from matching a section of a correct utterance.</Paragraph>
<Paragraph position="6"> Condition of Inclusion-1: Because a longer Error-Part gives more accurate matching, this step selects Error-Patterns whose Error-Parts are as long as possible. For two arbitrary candidates, when one Error-Part includes the other and their frequencies are equal, the candidate whose Error-Part includes the other is accepted. Condition of Inclusion-2: If some Error-Parts derived from different utterances share a common part, this common part is suitable for an Error-Pattern.</Paragraph>
<Paragraph position="7"> Therefore, in this step an Error-Pattern with an Error-Part as short as possible is selected. For two arbitrary candidates, when one Error-Part includes the other and their frequencies differ, the included candidate is accepted.</Paragraph>
</Section>
<Section position="3" start_page="653" end_page="654" type="sub_section">
<SectionTitle> 2.2 Similar-String-Correction (SSC) </SectionTitle>
<Paragraph position="0"> In an erroneous Japanese sentence, the correct expressions can frequently be estimated from the run of characters before and after the erroneous sections. This means that we are involuntarily applying a portion of a regular expression to an erroneous section.</Paragraph>
<Paragraph position="1"> Instead of this portion of a regular expression, SSC uses a collection of strings drawn from the corpus (we refer to this collection as the String-Database). As shown in the block diagram in Figure 2-2, the correction is performed through the following steps: the first step is error detection; the next step is the retrieval, from the String-Database, of the string that is most similar to the string including errors (the former string is referred to as the Similar-String, and the latter as the Error-String); finally, the correction is made using the difference between these two strings.</Paragraph>
<Paragraph position="2"> The procedure for correction varies slightly depending on the position of the detected error in an utterance: the top, the middle, or the tail. Here we explain the case of the middle.</Paragraph>
<Paragraph position="3"> Step 1: Estimate an erroneous section (referred to as an error-block) with an error detection method [1]. If there is no error-block, the procedure is terminated.</Paragraph>
<Paragraph position="4"> Depending on the position of the error-block, the procedure branches in the following way. If P1 is less than T (T = 4 in the experiment), go to the step for the top. If L - P2 + 1 is less than T, go to the step for the tail. In all other cases, go to the step for the middle. Here, P1 and P2 denote the start and end positions of the error-block, and L denotes the length of the input string.</Paragraph>
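The position-based branching in Step 1 can be summarized in a few lines. The sketch below is an assumption-laden illustration, not the paper's code: it takes 1-based positions, and the tail condition L - P2 + 1 less than T is a reconstruction of the garbled original.

    # Illustrative sketch of the Step 1 branching; 1-based positions assumed, and
    # the tail condition (L - P2 + 1 < T) is a reconstruction, not a quoted rule.
    T = 4  # position threshold used in the paper's experiment

    def classify_error_block(p1, p2, length):
        """Return 'top', 'tail', or 'middle' for an error-block spanning
        positions p1..p2 in an input string of the given length."""
        if p1 < T:
            return "top"
        if length - p2 + 1 < T:
            return "tail"
        return "middle"

    print(classify_error_block(2, 5, 20))    # top
    print(classify_error_block(18, 19, 20))  # tail
    print(classify_error_block(8, 11, 20))   # middle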
<Paragraph position="5"> Step 2: Take out of the input string the string (the Error-String) that comprises the error-block and the M (5 in the experiment) characters before and after it. Using this Error-String as a query key, retrieve from the String-Database a string (the Similar-String) satisfying the following conditions: it must be located in the middle of an utterance, it must have the highest value of S, and S must be not less than a given threshold (0.6 in the experiment). Here, S is defined as S = (L - N) / L, where L is the length of the Similar-String and N is the minimum number of character insertions, deletions, or substitutions necessary to transform the Error-String into the Similar-String (a code sketch of this score is given after the description of Step 3). If there is no such Similar-String, go back to Step 1, leaving this error-block uncorrected.</Paragraph>
<Paragraph position="6"> Step 3: If the two strings (denoted A and B), each consisting of the K (2 in the experiment) characters immediately before and after the error-block in the Error-String, are found in the Similar-String, take out the string (denoted C) between A and B in the Similar-String. If A and B are not found, go back to Step 1, leaving this error-block uncorrected. Substitute string C, as the correct string, for the string between A and B in the Error-String (see Figure 2-3).</Paragraph>
<Paragraph position="7"> [1] For detecting errors in Japanese sentences, a method using the probability of character sequences has been reported to be fairly effective (Araki et al., 93). In a preliminary experiment, its precision and recall were over 80% and over 70%, respectively.</Paragraph>
</Section>
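As noted in Step 2, the similarity score S = (L - N) / L can be computed with a standard edit-distance routine. The Python sketch below is illustrative only: the function names, the list-based String-Database, and the omission of the positional (top/middle/tail) filtering are assumptions, not part of the paper.

    # Illustrative sketch of the Step 2 score S = (L - N) / L, where L is the
    # length of a candidate Similar-String and N is the edit distance between
    # the Error-String and that candidate.
    def edit_distance(a, b):
        """Minimum number of character insertions, deletions, or substitutions
        needed to transform string a into string b (Levenshtein distance)."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            cur = [i]
            for j, cb in enumerate(b, start=1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    def similarity(error_string, candidate):
        """S = (L - N) / L as defined in Step 2."""
        if not candidate:
            return 0.0
        return (len(candidate) - edit_distance(error_string, candidate)) / len(candidate)

    def retrieve_similar_string(error_string, string_db, threshold=0.6):
        """Return the highest-scoring candidate reaching the threshold, or None;
        the requirement that it lie in the middle of an utterance is omitted."""
        best = max(string_db, key=lambda s: similarity(error_string, s), default=None)
        if best is not None and similarity(error_string, best) >= threshold:
            return best
        return None

    # Toy usage with romanized placeholders instead of Japanese strings.
    db = ["could you book", "would you like"]
    print(retrieve_similar_string("could yu bok", db))  # -> "could you book"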
3. Evaluation
<Section position="4" start_page="654" end_page="654" type="sub_section">
<SectionTitle> 3.1 Data Conditions for the Experiments </SectionTitle>
<Paragraph position="0"> Results of Speech Recognition: We used 4806 recognition results containing errors, taken from the output of speech recognition experiments (Masataki et al., 96; Shimizu et al., 96) using the ATR spoken language database (Morimoto et al., 94) on travel arrangements. The characteristics of these results are shown in Table 3-1. The breakdown of the 4806 results is as follows: 4321 results were used for the preparation of Error-Patterns, and the other 495 results were used for the evaluation.</Paragraph>
<Paragraph position="1"> Preparation of Error-Patterns: For the frequency of occurrence, we employed a threshold of not less than 2, and thereby obtained 629 Error-Patterns from the 4321 speech recognition results.</Paragraph>
<Paragraph position="2"> Preparation of the String-Database: We prepared the String-Database using data sets of the ATR spoken language database different from the above-mentioned 4806 results. We employed 3 as the threshold value for the frequency of occurrence and 10 as the length of a string, thereby obtaining 16655 strings.</Paragraph>
</Section>
<Section position="5" start_page="654" end_page="656" type="sub_section">
<SectionTitle> 3.2 Two Factors for Evaluation </SectionTitle>
<Paragraph position="0"> We evaluated the following two factors before and after correction: (1) the number of errors, and (2) the effectiveness of the method for understanding the recognition results.</Paragraph>
<Paragraph position="1"> To confirm the effectiveness, the recognition results were evaluated by two native Japanese speakers. They assigned one of five levels, A-E, to each recognition result before and after correction, by comparing it with the corresponding actual utterance. We employed the overall results of the stricter of the two evaluators.</Paragraph>
<Paragraph position="2"> (A) Not lacking in the meaning of the actual utterance, and with perfect expression.</Paragraph>
<Paragraph position="3"> (B) Not lacking in meaning, but with slightly awkward expression.</Paragraph>
<Paragraph position="4"> (C) Slightly lacking in meaning.</Paragraph>
<Paragraph position="5"> (D) Considerably lacking in meaning.</Paragraph>
<Paragraph position="6"> (E) Impossible to understand, and impossible to imagine the actual utterance.</Paragraph>
4. Results and Discussion
4.1 Decrease in the Number of Errors
(The values inside parentheses are the rates of decrease.)
<Paragraph position="7"> In EPC+SSC, the rate of decrease was 8.5%, and a decrease was obtained for all types of errors. In SSC, the number of deletion errors increased by 3.9%. The reason is that in SSC, correction by deleting part of a substitution error frequently caused new deletion errors, as shown in the example below. From the standpoint of correction this might be regarded as a mistaken correction, but it improves understanding of the results by deleting noise and makes the results more usable for machine translation. It therefore practically refines the speech recognition results.</Paragraph>
</Section>
<Section position="6" start_page="656" end_page="656" type="sub_section">
<SectionTitle> 4.2 Improvement of Understandability </SectionTitle>
<Paragraph position="0"> Table 4-2 shows the number of changes in the evaluated level.</Paragraph>
<Paragraph position="1"> The rate of improvement after correction was 7%. There were also many cases whose level improved through the recovery of content words; for example, the word "cash" was recovered in one result and the word "guide" in another. These results confirm that our method is effective in improving the understandability of the recognition results.</Paragraph>
<Paragraph position="2"> On the other hand, there were four cases in which the level went down. Three of these were caused by the misdetection of errors in the SSC procedure. The remaining case occurred in the EPC procedure: the Error-Pattern used in this case could not be excluded by the Condition of Non-Side Effect because its Error-Part was not included in the corpus of actual utterances.</Paragraph>
<Paragraph position="3"> Table 4-2: The number of changes in the evaluated level before and after correction.</Paragraph>
<Paragraph position="4"> The recognition results whose level improved after correction mostly contained no more than 7 errors, where the number of errors in a recognition result is the minimum number of character insertions, deletions, or substitutions necessary to transform it into the corresponding actual utterance. The reason is that when there are many errors, corrections fail more often because they are prevented by other surrounding errors.</Paragraph>
<Paragraph position="5"> In addition, when only a few successful corrections have been made, they have little influence on the overall understandability.</Paragraph>
<Paragraph position="6"> These results show that the proposed method is more applicable to recognition results containing only a few errors than to those containing many.</Paragraph>
</Section>
</Section>
</Paper>