<?xml version="1.0" standalone="yes"?>
<Paper uid="H90-1035">
  <Title>Phoneme-in-Context Modeling for Dragon's Continuous Speech Recognizer</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> For large-vocabulary continuous speech recognition, the goal of training is to model phonemes with enough precision so that from the models one could reconstruct a sequence of acoustic parameters that accurately represents the spectral characteristics of any naturally-occurring sentence, including all coarticulation effects that arise either between phonemes in a word or across word boundaries. The aim at Dragon Systems is to collect and process enough training data to accomplish this goal for all of natural spoken English rather than for any one restricted task.</Paragraph>
    <Paragraph position="1"> The basic unit that must be trained is the &quot;phoneme in context&quot; (PIC), a sequence of three phonemes accompanied by a code for prepausal lengthening. At present, syllable and word boundaries are ignored in defining PICs.</Paragraph>
    <Paragraph position="2"> More than 16,000 training tokens, half isolated words and half short phrases, were phonemically labeled by a semi-automatic procedure using hidden Markov models. To model a phoneme in a specific context, a weighted average is constructed from training data involving the desired context and acoustically similar contexts.</Paragraph>
    <Paragraph position="3"> For use in HMM continuous-speech recognition, each PIC is converted to a Markov model that is a concatenation of one to six node models. No phoneme, in all its contexts, requires more than 64 distinct nodes, and the total number of node models (&quot;phonemic segments&quot;) required to construct all PICs is only slightly more than 2000. As a result, the entire set of PICs can be adapted to a new speaker on the basis of a couple of thousand isolated words or a few hundred sentences of connected speech.</Paragraph>
    <Paragraph position="4"> The advantage of this approach to training is that it is not task-specific. From a single training database, Dragon Systems has constructed models for use in a 30,000-word isolated-word recognizer, for connected digits, and for two different thousand-word continuous-speech tasks.</Paragraph>
  </Section>
</Paper>