<?xml version="1.0" standalone="yes"?> <Paper uid="H90-1035"> <Title>Phoneme-in-Context Modeling for Dragon's Continuous Speech Recognizer</Title> <Section position="6" start_page="167" end_page="168" type="concl"> <SectionTitle> 1. The DragonDictate isolated-word recognition system </SectionTitle> <Paragraph position="0"> uses 25,000 word models based on PICs and phonemic segments, built from the same database of training utterances that is used for connected speech. Recognition performance for two diverse texts, a short story by Hemingway and a newspaper article on parallel processing, was 83% correct on the first 500 words. After adaptation on 1500 words, performance rose to 89% correct for the speaker who recorded the training database. For two other speakers, performance without adaptation was dismal (45% for a male speaker, 18% for a female speaker), but it rose after adaptation on 2500 words to 87% for the male speaker and 85% for the female.</Paragraph> <Paragraph position="1"> 2. For connected digit recognition, the error rate on five-digit strings was less than half a percent for each of three different speakers after adaptation. Less than 0.2% of the training database consists of digit strings.</Paragraph> <Paragraph position="2"> 3. For the mammography task used in testing the real-time implementation of continuous-speech recognition [2] (842 words, 1023 distinct pronunciations), recognition was tested on a set of 1000 sentences which had not been used either in selecting training utterances or in determining which PICs should be modeled. Several hundred of the PICs in this test data did not occur in any of the &quot;practice&quot; sentences that had been used for training; these PICs were modeled only by generic PICs in which an average was taken over all left and right contexts. About 15% of the training database consists of short phrases extracted from the 3000 practice sentences. 
On this task, whose perplexity is about 66, 96.6% of words were recognized correctly.</Paragraph> <Paragraph position="3"> Performance was slightly better on the &quot;practice&quot; sentences that had been used to construct the set of PICs to be modeled, sentences for which no generic PICs were required.</Paragraph> <Paragraph position="4"> Preliminary results indicate that after several hundred sentences of adaptation, performance close to this level can be achieved for other speakers.</Paragraph> <Paragraph position="5"> 4. As a test of performance on a connected-speech task which was not so heavily used in constructing the training database, recognition was carried out on the 600 training sentences of the Resource Management task using the word-pair grammar. This task has a perplexity of about 60, comparable to that of the mammography task. PICs were built from the same training database as described above, in which about 5% of the tokens are phrases based on the resource management vocabulary. Recognition performance was 97.3% correct on a per-word basis. For this task, as for the mammography &quot;practice&quot; sentences, all PICs had been modeled, so that no generic PICs were required.</Paragraph> </Section> </Paper>