File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/93/h93-1016_concl.xml
Size: 2,152 bytes
Last Modified: 2025-10-06 13:57:03
<?xml version="1.0" standalone="yes"?> <Paper uid="H93-1016"> <Title>An Overview of the SPHINX-II Speech Recognition System</Title> <Section position="8" start_page="84" end_page="85" type="concl"> <SectionTitle> 7. SUMMARY </SectionTitle> <Paragraph position="0"> Our contributions in SPHINX-II include improved feature representations, multiple-codebook semi-continuous hidden Markov models, between-word senones, multi-pass search algorithms, and unified acoustic and language modeling. The key to our success is our data-driven unified optimization approach. This paper characterized our contributions by percent error rate reduction on the 5000-word WSJ task, for which we reduced the word error rate from 20% to 5% in the past year [2].</Paragraph> <Paragraph position="1"> Although we have made dramatic progress, there remains a large gap between commercial applications and laboratory systems. One problem is the large number of out-of-vocabulary (OOV) words in real dictation applications. Even for a 20000-word dictation system, on average more than 25% of the utterances in a test set contain OOV words. Even if we exclude those utterances containing OOV words, the error rate is still more than 9% for the 20000-word task due to the limitations of current technology. Other problems are illustrated by the November 1992 DARPA stress test evaluation, where the testing data comprise both spontaneous speech with many OOV words and speech recorded using several different microphones. Even though we augmented our system with more than 20,000 utterances in the training set and a noise normalization component [1], our augmented system only reduced the error rate of our 20000-word baseline result from 12.8% to 12.4%, and the error rate for the stress test was even worse when compared with the baseline (18.0% vs. 12.4%).</Paragraph> <Paragraph position="2"> To summarize, our current word error rates under different testing conditions are listed in Table 1.
We can see from this The authors would like to express their gratitude to Raj Reddy and other members of the CMU speech group for their help.</Paragraph> </Section> </Paper>