<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1093">
  <Title>EVALUATION AND ANALYSIS OF AUDITORY MODEL FRONT ENDS FOR ROBUST SPEECH RECOGNITION PROGRAM SUMMARY*</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROGRAM GOALS
</SectionTitle>
    <Paragraph position="0"> The purpose of this work is to integrate a number of auditory model front ends into a high-performance HMM recognizer, to test and evaluate these front ends on noisy speech, and to analyze the results in order to develop a more robust front end which may combine features of a number of the current auditory model-based systems.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
BACKGROUND
</SectionTitle>
    <Paragraph position="0"> This project was motivated by the need for improved speech recognition in noise, and by the expectation that auditory model front ends could make recognition more robust to noise, microphone variation, and speaking style.</Paragraph>
    <Paragraph position="1"> The project has focussed on implementing, evaluating, and comparing three promising auditory front ends: (1) the mean-rate and synchrony outputs of S. Seneff's auditory model; (2) the ensemble interval histogram (EIH) model developed by O. Ghitza; and (3) the IMELDA model due to M. Hunt. Additional comparisons have been carried out between baseline systems using mel-cepstra derived from filterbank and LPC analysis.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT ACCOMPLISHMENTS
</SectionTitle>
    <Paragraph position="0"> The three auditory models (Seneff, EIH and IMELDA) have been compared extensively among themselves and with a mel-cepstrum front end for HMM isolated-word recognition on the TI-105 isolated-word corpus. Conditions tested have included additive white noise, additive speech babble noise, and spectral variability due to microphone placement, channel, and acoustic recording environment. The best results from the auditory models were shown to provide small but consistent improvement over mel-cepstrum under conditions of high noise and spectral variability. These small improvements may not warrant the added complexity of the auditory models.</Paragraph>
    <Paragraph position="2"> Additional comparisons between mel-filterbank (MFB) and LPC-based cepstrum front ends were conducted, showing significant advantages for MFB in noise; the gain in moving from LPC to MFB was greater than the gain in moving from MFB to any of the auditory models. Most recently, selected CSR experiments have been performed on the Resource Management task, comparing auditory models to MFB. These results were confirmed at other sites.</Paragraph>
    <Paragraph position="3"> As yet, no improvements have been achieved with the auditory models.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
*THIS WORK WAS SPONSORED BY THE DEFENSE ADVANCED RESEARCH PROJECTS AGENCY. THE VIEWS EXPRESSED ARE THOSE OF THE AUTHOR AND DO NOT REFLECT THE OFFICIAL POLICY OR POSITION OF THE U.S. GOVERNMENT.
</SectionTitle>
  </Section>
  <Section position="6" start_page="0" end_page="399" type="metho">
    <SectionTitle>
PLANS
</SectionTitle>
    <Paragraph position="0"> Plans include: (1) further investigation of dimensionality reduction using principal components and linear discriminant analysis, and (2) completion of the CSR Resource Management tests on the auditory models.</Paragraph>
  </Section>
</Paper>