<?xml version="1.0" standalone="yes"?> <Paper uid="H92-1097"> <Title>ROBUST CONTINUOUS SPEECH RECOGNITION</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> ROBUST CONTINUOUS SPEECH RECOGNITION </SectionTitle> <Paragraph position="0"> PIs: John Makhoul and Richard Schwartz (makhoul@bbn.com, schwartz@bbn.com), BBN Systems and Technologies, 10 Moulton St., Cambridge, MA 02138</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> OBJECTIVES </SectionTitle> <Paragraph position="0"> The primary objective of this basic research program is to develop robust methods and models for speaker-independent acoustic recognition of spontaneously produced, continuous speech. The work has focused on developing accurate and detailed models of phonemes and their coarticulation for the purpose of large-vocabulary continuous speech recognition. Important goals of this work are to achieve the highest possible word recognition accuracy in continuous speech and to develop methods for the rapid adaptation of phonetic models to the voice of a new speaker.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> RECENT RESULTS </SectionTitle> <Paragraph position="0"> * Ported our BYBLOS speech recognition software to run on Silicon Graphics Inc. workstations, in addition to Sun workstations. In the process, we consolidated our programs and modularized them to make them more easily portable to other sites and applications.</Paragraph> <Paragraph position="1"> * Developed methods for modeling spontaneous speech effects in the Airline Travel Information System (ATIS) domain.</Paragraph> <Paragraph position="2"> * Developed a novel method for silence modeling in spontaneous speech, especially to deal with the problem of missing silences in the transcriptions.
The method works iteratively by hypothesizing silences everywhere in the grammar, performing recognition, correcting the transcriptions based on the recognition, and then retraining. * Developed a trigram language model for the ATIS domain. This was possible because sufficient training data was available. Because trigram grammars can require a large amount of computation, we achieved large savings by simply rescoring an N-best list that was produced using a bigram grammar.</Paragraph> <Paragraph position="3"> In the most recent speech recognition test on the ATIS domain, using data collected from five different sites, our BYBLOS system achieved a 9.6% average word error rate over all utterances. This performance was the best among all sites tested.</Paragraph> <Paragraph position="4"> The N-best paradigm allows the simple and modular integration of many knowledge sources. Recently, we have combined our BYBLOS HMM system with another system at Boston University based on Stochastic Segment Models.</Paragraph> <Paragraph position="5"> The hybrid system improved speech recognition performance over our state-of-the-art HMM system.</Paragraph> <Paragraph position="6"> A hybrid system consisting of our BYBLOS system and another phonetic modeling technique using neural networks, called Segmental Neural Networks, has also improved performance over our HMM system, reducing the error rate by 25%.</Paragraph> </Section> <Section position="4" start_page="0" end_page="464" type="metho"> <SectionTitle> PLANS FOR THE COMING YEAR </SectionTitle> <Paragraph position="0"> For the coming year, we plan to continue our work on improving speech recognition performance on spontaneous speech in the ATIS domain. In addition, we plan to start work on the new Wall Street Journal continuous speech recognition corpus.
Our research into improved modeling will include the exploration of new acoustic features and methods for rapid speaker adaptation.</Paragraph> </Section> </Paper>
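<!--
A minimal sketch, not taken from the paper, of the N-best rescoring idea described under RECENT RESULTS: a cheaper first pass yields an N-best list with acoustic scores, and a trigram model then re-ranks only those hypotheses instead of driving the full search. The function names, the add-one smoothing, the `lm_weight` parameter, and the toy data are all assumptions made for illustration; the source does not specify how BYBLOS combines or smooths its scores.

```python
import math
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their two-word contexts from tokenized sentences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx, len(vocab)

def trigram_logprob(words, model):
    """Add-one smoothed trigram log probability (a toy smoothing choice)."""
    tri, ctx, vocab_size = model
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        num = tri[tuple(padded[i - 2:i + 1])] + 1
        den = ctx[tuple(padded[i - 2:i])] + vocab_size
        total += math.log(num / den)
    return total

def rescore_nbest(nbest, model, lm_weight=1.0):
    """Re-rank (words, acoustic_logprob) pairs by acoustic + weighted trigram score."""
    scored = [(ac + lm_weight * trigram_logprob(words, model), words)
              for words, ac in nbest]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [words for _, words in scored]

# Hypothetical training data and first-pass N-best list (log-domain acoustic scores).
model = train_trigram([["show", "me", "flights", "to", "boston"]] * 3)
nbest = [
    (["show", "me", "fights", "to", "boston"], -10.0),
    (["show", "me", "flights", "to", "boston"], -10.5),
]
best = rescore_nbest(nbest, model)[0]
```

In this toy setup the first-pass ranking prefers the hypothesis containing an unseen word, and the trigram score moves the hypothesis seen in training to the top. Only the N listed hypotheses are scored by the trigram model, which is where the computational saving comes from.
-->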