<?xml version="1.0" standalone="yes"?> <Paper uid="N04-4007"> <Title>Advances in Children's Speech Recognition within an Interactive Literacy Tutor</Title> <Section position="9" start_page="0" end_page="0" type="concl"> <SectionTitle> 7 Practical Real-Time Implementation </SectionTitle>
<Paragraph position="0"> The research systems described in Sects. 5 and 6 do not operate in real time, since they require multiple adaptation passes over the data. To address this issue, we have implemented a real-time system that operates on small pipelined audio segments (250 ms on average).</Paragraph>
<Paragraph position="1"> When evaluated on the read-aloud task (Sect. 5), the initial baseline system achieves an error rate of 19.5%.</Paragraph>
<Paragraph position="2"> This system has a real-time factor of 0.56 on a 2.4 GHz Intel Pentium 4 PC with 512 MB of RAM. When the proposed methods are integrated, the error rate is reduced from 19.5% to 12.7% (compare with the 10.7% error rate of the research system in Table 1(D)). The revised system, which incorporates dynamic language modeling, operates 35% faster than the single-language-model method while also reducing the variance in the real-time factor across processed chunks of audio. Further gains are possible by incorporating adaptation incrementally. For example, Table 3(C) shows a real-time system that incorporates incremental unsupervised maximum likelihood linear regression (MLLR) adaptation of the Gaussian means. This final real-time system adapts both language and acoustic model parameters simultaneously during system use. The system is now being refined for deployment in classrooms within the CLT project. We were able to further improve the system after the submission deadline: the current WER on the story read-aloud task improved to 7.6%, while a WER of 32.2% was achieved on the summary recognition task.
These improvements are due to the inclusion of a breath model and the additional use of audio data from 103 second graders for more accurate acoustic modeling.</Paragraph> </Section> </Paper>
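The MLLR mean adaptation referenced in this section applies an affine transform to each Gaussian mean, mu_hat = A mu + b, conventionally written as W xi with the extended mean vector xi = [1, mu^T]^T. A minimal NumPy sketch of applying such a transform is below; the function name is hypothetical, and the maximum-likelihood estimation of W from adaptation data (the core of MLLR, done incrementally in the system described above) is omitted.

```python
import numpy as np

def mllr_adapt_means(means, W):
    """Apply an MLLR mean transform to a bank of Gaussian means.

    means: (N, d) array, one mean vector per Gaussian.
    W:     (d, d+1) transform, W = [b | A], so mu_hat = A @ mu + b.
    Returns the (N, d) array of adapted means mu_hat = W @ [1, mu].
    """
    # Extended means xi = [1, mu], shape (N, d+1).
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T

# Illustrative use: an identity transform (zero bias, A = I)
# leaves the means unchanged.
means = np.array([[1.0, 2.0], [3.0, 4.0]])
W_identity = np.hstack([np.zeros((2, 1)), np.eye(2)])
adapted = mllr_adapt_means(means, W_identity)
```

In practice a small number of regression classes share one W each, so a few matrices adapt thousands of Gaussians from limited unsupervised data.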