<?xml version="1.0" standalone="yes"?>
<Paper uid="H93-1109">
  <Title>HIGH PERFORMANCE SPEECH RECOGNITION USING CONSISTENCY MODELING Vassilios Digalakis</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
HIGH PERFORMANCE SPEECH RECOGNITION
USING CONSISTENCY MODELING
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PROJECT GOALS
</SectionTitle>
    <Paragraph position="0"> The primary goal of this project is to develop acoustic modeling techniques that advance the state of the art in speech recognition, focusing on those techniques that relax the hidden Markov model's improper independence assumptions.</Paragraph>
    <Paragraph position="1"> Such techniques should improve robustness to systematic variations such as microphone, channel, and speaker by conditioning each state's acoustic output distributions on long-term measurements, and should improve general acoustic calibration by removing improper short-term (e.g., frame-to-frame) independence assumptions.</Paragraph>
    <Paragraph position="2"> To perform this work, certain infrastructure needs to be developed. This includes a state-of-the-art baseline recognition system for the development task (ARPA's Wall Street Journal task); search techniques that allow experiments with computationally expensive techniques to have reasonable turnaround times; and modular software that enables rapid prototyping of new algorithms.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RECENT RESULTS
</SectionTitle>
    <Paragraph position="0"> * We have built a software library that implements the components of an HMM recognition system dealing with the observation distributions. The functional interface is designed to enable fast integration of new acoustic modeling techniques. * We introduced a new search strategy, called Progressive Search, that uses simpler and faster systems to constrain the search space of computationally expensive systems in an iterative fashion. By using the word graphs created during the initial recognition pass as grammars in subsequent recognition passes, we have reduced the recognition time of systems that use more complex acoustic models and higher-order language models by more than an order of magnitude.</Paragraph>
    <Paragraph position="1"> * We developed a less-traditional, continuous output distribution system where different allophones of the same phone share the same sets of Gaussians, but different Gaussians are used for different phones. Our phonetically-tied mixture system achieved a 16% reduction in error rate over a typical tied-mixture system.</Paragraph>
    <Paragraph position="2"> * We found that differences among the pronunciation dictionaries, and the corresponding phone sets, used by the various sites in the last CSR evaluations can account for performance differences on the order of 10-15%.</Paragraph>
    <Paragraph position="3"> * We developed new algorithms for local consistency by modeling the correlation between spectral features at neighboring time frames. This acoustic correlation is used to improve the accuracy of the acoustic model by conditioning the state output probabilities on the previous frame's observations.</Paragraph>
    <Paragraph position="4"> * We have achieved a 31% reduction in error rate over our November evaluation system on the 5K, non-verbalized-punctuation development set. The improvement is the combined effect of the phonetically-tied mixtures, the improved pronunciation dictionaries, and the replacement of RASTA filtering with cepstral-mean removal on a sentence basis.</Paragraph>
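    <Paragraph position="5"> The shared-Gaussian idea in the results above can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussians and mixture weights that are specific to each allophone; neither detail is stated in this summary, and the names (codebooks, weights, allophone_log_likelihood) are hypothetical rather than taken from the project's software library.

```python
import math
from typing import Dict, List, Tuple

# One diagonal-covariance Gaussian, stored as (means, variances).
Gaussian = Tuple[List[float], List[float]]


def log_gaussian(x: List[float], gaussian: Gaussian) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    means, variances = gaussian
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def allophone_log_likelihood(
    x: List[float],
    phone: str,
    allophone: str,
    codebooks: Dict[str, List[Gaussian]],
    weights: Dict[Tuple[str, str], List[float]],
) -> float:
    """Log of sum_k w_k * N(x; mu_k, var_k) over the phone's shared Gaussians."""
    components = codebooks[phone]          # shared by every allophone of this phone
    mixture = weights[(phone, allophone)]  # assumption: weights are per allophone
    terms = [
        math.log(w) + log_gaussian(x, g)
        for w, g in zip(mixture, components)
        if w > 0.0
    ]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


if __name__ == "__main__":
    # Toy one-dimensional codebook for a single phone with two Gaussians.
    codebooks = {"aa": [([0.0], [1.0]), ([2.0], [1.0])]}
    weights = {("aa", "left"): [0.9, 0.1], ("aa", "right"): [0.2, 0.8]}
    for context in ("left", "right"):
        score = allophone_log_likelihood([0.0], "aa", context, codebooks, weights)
        print(context, score)
```

In this sketch a tied-mixture system would correspond to a single codebook shared by all phones, and an untied system to one codebook per allophone; the phonetically-tied case sits between the two.</Paragraph>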
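    <Paragraph position="6"> Cepstral-mean removal on a sentence basis, mentioned in the last result above, can be sketched as follows. This is a generic illustration of the technique, not the project's implementation: the feature layout (one list of cepstral coefficients per frame) and the function name remove_cepstral_mean are assumptions made for the example.

```python
from typing import List

Frame = List[float]  # one cepstral feature vector (hypothetical layout)


def remove_cepstral_mean(utterance: List[Frame]) -> List[Frame]:
    """Subtract the per-coefficient mean, computed over one sentence only."""
    if not utterance:
        return []
    num_frames = len(utterance)
    dim = len(utterance[0])
    # Mean of each cepstral coefficient across all frames of this sentence.
    mean = [sum(frame[d] for frame in utterance) / num_frames for d in range(dim)]
    # A fixed linear channel adds a constant offset in the cepstral domain,
    # so subtracting the sentence mean cancels that offset.
    return [[frame[d] - mean[d] for d in range(dim)] for frame in utterance]


if __name__ == "__main__":
    clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
    # Hypothetical channel effect: the same offset added to every frame.
    shifted = [[c0 + 5.0, c1 - 1.5] for c0, c1 in clean]
    print(remove_cepstral_mean(clean))
    print(remove_cepstral_mean(shifted))
```

Both calls produce the same normalized frames, which is the property that makes per-sentence mean removal useful against fixed microphone and channel differences.</Paragraph>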
  </Section>
  <Section position="4" start_page="0" end_page="415" type="metho">
    <SectionTitle>
PLANS FOR THE COMING YEAR
</SectionTitle>
    <Paragraph position="0"> * Continue exploring trade-offs in parameter tying for continuous-distribution acoustic models. We will sample other points beyond tied-mixture, phonetically-tied mixture, and untied Gaussian-mixture systems. * Explore techniques for modeling the global consistencies of speaker and channel effects across the speech acoustic models.</Paragraph>
    <Paragraph position="1"> * Continue to develop search techniques that allow us both to run experiments with computationally burdensome techniques and to implement these systems as real-time demonstrations.</Paragraph>
  </Section>
</Paper>