<?xml version="1.0" standalone="yes"?> <Paper uid="H90-1036"> <Title>A Rapid Match Algorithm for Continuous Speech Recognition</Title> <Section position="3" start_page="170" end_page="171" type="metho"> <SectionTitle> 3. Some Results on the </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="170" end_page="171" type="sub_section"> <SectionTitle> Mammography Task </SectionTitle> <Paragraph position="0"> The recognition task used during the development of our rapid match algorithm has an 842-word vocabulary drawn from mammography reports, many of which are transcriptions of radiologists' oral dictations. Since many of the words in the vocabulary can be pronounced in more than one way, we have added 181 extra pronunciations, so the effective vocabulary size is 1023. With this vocabulary size, we have found that running our continuous speech recognizer with rapid match speeds up the system by a factor of 5 to 10 relative to running without it.</Paragraph> <Paragraph position="1"> One obvious way of assessing the quality of the algorithm is simply to run the continuous speech recognizer with and without rapid match and observe how many errors its use introduces (or removes). On a test set of 1000 sentences spoken by one person, totaling 8571 words, the word error rate was 3.7% when running with rapid match (returning a list of around 40 words per frame) and also 3.7% when running without it. In this particular experiment the total number of errors happened to be exactly the same, although the actual errors were somewhat different.
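The comparison above rests on the word error rate. This is the minimum number of substitutions, deletions and insertions needed to align the recognized word string with the reference, divided by the number of reference words. The paper does not describe its scoring program, so the Python below is only a minimal sketch of the standard edit-distance computation.

The function names and the short sentences in it are hypothetical illustrations, not data from the mammography test set. The example also shows how two runs can make the same number of errors while making different ones.

```python
def word_errors(ref, hyp):
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def word_error_rate(refs, hyps):
    """Total word errors over a test set divided by total reference words."""
    errors = sum(word_errors(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


# Hypothetical reference sentences and invented recognizer outputs,
# not taken from the paper's test set.
refs = ["no suspicious masses are seen", "the breasts are dense"]
with_rm = ["no suspicious mass are seen", "the breasts are dense"]
without_rm = ["no suspicious masses are seen", "the breast are dense"]

# Same error count (1 of 9 reference words) for both, but different errors.
print(word_error_rate(refs, with_rm), word_error_rate(refs, without_rm))
```

A rate computed this way cannot tell which words went wrong, which is why a finer-grained tool is useful.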
This strategy for assessing performance is rather global, however, and it proved useful to have an assessment tool that provides more detail on where rapid match makes mistakes.</Paragraph> <Paragraph position="2"> By running the recognizer in a mode where it knows the correct transcription of each utterance, it is possible to compute a reasonable segmentation of each utterance; that is, we can compute the frame in which each word of the transcription most likely begins. Then we can ask this question for each word in each utterance: what is the rank of the correct word's score among the candidates returned by the rapid match module in the frame in which the word begins (according to the segmenter)? At Dragon, an interactive program has been written to study this question; with it, the investigator can easily detect words that have bad or inadequate rapid match models. Its basic function is to display and record the words that rapid match passes through to the recognizer in given frames. Table 1 shows the percentage of the time that the rapid match algorithm passes through the correct word in the correct frame, as a function of the list size requested by the recognizer. Beyond a list size of 40, there are diminishing returns.</Paragraph> <Paragraph position="3"> [Table 1 caption fragment: ... which the word begins, based on 700 tokens.] One important feature of the algorithm that the evaluation program highlights is that even if a word is not passed through in the &quot;ideal&quot; frame (ideal from the Hidden Markov Model's point of view), it is very likely to be passed through in a nearby frame. Because of the flexibility of dynamic programming, the inflexibility of our linear-segmentation-based rapid matcher proves less of a disadvantage than one might have guessed.
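The per-word evaluation described above can be sketched in Python as follows. This is not Dragon's interactive program, whose interface the paper does not describe. It is a minimal illustration that rests on three assumptions:

- The rapid matcher's output is modeled as a hypothetical dict from frame index to a best-first list of candidate words.
- The segmenter's output is modeled as (word, start_frame) pairs.
- A `tolerance` parameter is added for illustration. It also accepts a hit in any frame at most that many frames from the segmenter's start frame.

All names and data are made up.

```python
# Minimal sketch, not Dragon's tool. Assumed (hypothetical) inputs:
#   candidates_by_frame: frame index -> rapid match words, best first
#   tokens: (word, start_frame) pairs from the segmenter


def rank_in_frame(candidates_by_frame, word, frame):
    """1-based rank of `word` in the rapid match list for `frame`, or None."""
    cands = candidates_by_frame.get(frame, [])
    return cands.index(word) + 1 if word in cands else None


def passed_through(candidates_by_frame, word, frame, list_size, tolerance=0):
    """True if `word` is among the top `list_size` candidates in `frame`
    or in any frame at most `tolerance` frames away from it."""
    for f in range(frame - tolerance, frame + tolerance + 1):
        if word in candidates_by_frame.get(f, [])[:list_size]:
            return True
    return False


def pass_rate(tokens, candidates_by_frame, list_size, tolerance=0):
    """Fraction of tokens whose word rapid match passes through."""
    hits = sum(passed_through(candidates_by_frame, w, f, list_size, tolerance)
               for w, f in tokens)
    return hits / len(tokens)


# Toy, made-up data for illustration only.
candidates_by_frame = {
    10: ["no", "know", "now"],
    14: ["masses", "mass", "master"],
    20: ["seen", "scene", "seem"],
    21: ["scene", "seen", "seam"],
}
tokens = [("no", 10), ("masses", 15), ("seen", 21)]

print(rank_in_frame(candidates_by_frame, "seen", 21))  # 2
for size in (1, 2):
    print(size,
          pass_rate(tokens, candidates_by_frame, size),
          pass_rate(tokens, candidates_by_frame, size, tolerance=1))
```

With list size 1 and no tolerance, only one of the three toy tokens is passed through in its segmenter frame. A one-frame tolerance recovers the other two, a miniature version of the nearby-frame effect.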
This observation is reinforced by the fact that, even though the correct word does not appear in the rapid match list in the correct frame 4% of the time, running with the rapid match procedure (in a sufficiently conservative mode) produces no degradation in overall recognition performance relative to running without it. Consider the many cases where rapid match fails to pass back the word in the correct frame but does pass it through in a nearby frame. Inspecting a spectrogram of such an utterance typically shows that the segmentation is hard to do by eye, and that the segmenter has made a choice that is reasonable for itself but not for the rapid matcher. In those cases the recognizer very often gets the word correct.</Paragraph> </Section> </Section> </Paper>