
<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1035">
  <Title>Phoneme-in-Context Modeling for Dragon's Continuous Speech Recognizer</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> The nature of the training process for a speech-recognition system changes radically once the size of the vocabulary becomes larger than the number of words for which a user is willing to provide training tokens. Below this threshold, it is reasonable to make an independent model for each word in the vocabulary. Such a model, based on data from that word and no others, can in principle capture all the acoustic-phonetic subtleties of the word, even though the phonetic spelling of the word is not even used in constructing the model.</Paragraph>
    <Paragraph position="1"> For continuous speech recognition, the quantity of data required for complete training grows much more rapidly than vocabulary. In the simple case of a recognizer for three-digit strings, for example, each digit should at a minimum be trained in initial, medial, and final position, while for optimum performance all digit triples should be included in the training data.</Paragraph>
    <Paragraph position="2"> The approach to training at Dragon Systems has been to regard the recognition task as all of natural English, whether isolated words or connected speech. We have developed a training database from which we have constructed recognition models for a 30,000 word isolated-word recognizer and for two different 1000-word connected speech tasks. All these recognition models are based on the same set of &amp;quot;phonemes in context.&amp;quot;</Paragraph>
  </Section>
</Paper>