<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-2038">
  <Title>Large-Vocabulary Speaker-Independent Continuous Speech Recognition with Semi-Continuous Hidden Markov Models</Title>
  <Section position="5" start_page="277" end_page="278" type="evalu">
    <SectionTitle>
3. EXPERIMENTAL EVALUATION
3.1. Analysis Conditions
</SectionTitle>
    <Paragraph position="0"> For both training and evaluation, the standard Sphinx front-end consists of 12th order bilinear transformed LPC cepstrum \[12\].</Paragraph>
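The bilinear transform mentioned above warps the frequency axis through a first-order all-pass mapping controlled by a single parameter. A minimal sketch of that frequency mapping (the warping value 0.6 is illustrative, not the paper's setting):

```python
import math

def bilinear_warp(omega, alpha):
    """Frequency mapping of the first-order all-pass (bilinear) transform:
    omega + 2*atan(alpha*sin(omega) / (1 - alpha*cos(omega))), |alpha| < 1."""
    return omega + 2.0 * math.atan(
        alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega)))

# With alpha > 0 the low-frequency region is expanded, giving the
# mel-like resolution that a warped-cepstrum front-end relies on.
warped = bilinear_warp(0.3, 0.6)
```

The endpoints 0 and pi map to themselves, so only the interior of the band is redistributed.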
    <Paragraph position="1"> The complete database consists of 4358 training sentences from 105 speakers (june-train) and 300 test sentences from 12 speakers.</Paragraph>
    <Paragraph position="2"> The vocabulary of the Resource Management database is 991 words. There is also an official word-pair recognition grammar,</Paragraph>
    <Paragraph position="3"> which is just a list of allowable word pairs without probabilities, used for the purpose of reducing the recognition perplexity to about 60.</Paragraph>
    <Paragraph position="4"> 3.2. Experimental Results Using Bilinear Transformed Cepstrum Discrete HMMs and continuous mixture HMMs based on 200 generalized triphones are first evaluated as benchmarks. The discrete HMM is the same as in Sphinx except that only 200 generalized triphones are used \[12\].</Paragraph>
    <Paragraph position="5"> In the continuous mixture HMM implemented here, the cepstrum, difference cepstrum, normalized energy, and difference energy are packed into one vector. This is similar to the one-codebook implementation of the discrete HMM \[12\]. Each continuous output probability consists of 4 diagonal Gaussian probability density functions as in Eq. (2).</Paragraph>
    <Paragraph position="6"> To obtain reliable initial models for the continuous mixture HMM, the Viterbi alignment with the discrete HMM is used to phonetically segment and label training speech.</Paragraph>
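An output density built from diagonal-covariance Gaussian components, like the 4-component densities described above, evaluates as a weighted sum of per-dimension Gaussians. A minimal sketch (the weights, means, and variances below are illustrative, not trained values):

```python
import math

def diag_gauss(x, mean, var):
    """Density of a diagonal-covariance Gaussian at vector x."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return p

def mixture_density(x, weights, means, variances):
    """b(x) = sum_m c_m * N(x; mu_m, diag(sigma2_m))."""
    return sum(w * diag_gauss(x, m, v)
               for w, m, v in zip(weights, means, variances))

weights = [0.25, 0.25, 0.25, 0.25]  # 4 components, as in the paper
means = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
variances = [[1.0, 1.0]] * 4
density = mixture_density([0.5, 0.5], weights, means, variances)
```

Because each component factorizes over dimensions, correlated features (such as bilinear transformed cepstra) violate the model's core assumption, which is exactly the issue the paper diagnoses below.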
    <Paragraph position="7"> These labeled segments are then clustered by using the k-means clustering algorithm to obtain initial means and diagonal covariances. The forward-backward algorithm is used iteratively for the monophone models, which are then used as initial models for the generalized triphone models. Though the continuous mixture HMM was reported to perform significantly better than the discrete HMM \[15\], in the experiments conducted here it is significantly worse than the discrete HMM. Why this paradox? One explanation is that multiple codebooks are used in the discrete HMM, so the VQ errors for the discrete HMM are not so serious here. Another reason may be that the diagonal covariance assumption is not appropriate for the bilinear transformed LPC cepstrum, since many coefficients are strongly correlated after the transformation. Indeed, observation of the average covariance matrix for the bilinear transformed LPC cepstrum shows that the values of the off-diagonal components are generally quite large.</Paragraph>
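The initialization step described above, clustering labeled frames and taking per-cluster means and diagonal variances as initial Gaussian parameters, can be sketched as follows (the frame data, cluster count, and variance floor are illustrative, not the paper's configuration):

```python
import random

def kmeans_diag_init(frames, k, iters=20, seed=0):
    """Cluster frames with plain k-means, then return per-cluster means and
    diagonal variances as initial Gaussian parameters."""
    rng = random.Random(seed)
    centers = rng.sample(frames, k)
    dim = len(frames[0])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            # Assign each frame to its nearest center (squared Euclidean).
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(f, centers[c])))
            clusters[j].append(f)
        centers = [
            [sum(f[d] for f in cl) / len(cl) for d in range(dim)]
            if cl else centers[j]  # keep a stale center if its cluster emptied
            for j, cl in enumerate(clusters)
        ]
    variances = [
        [max(sum((f[d] - centers[j][d]) ** 2 for f in cl) / max(len(cl), 1),
             1e-3)  # variance floor for robustness
         for d in range(dim)]
        for j, cl in enumerate(clusters)
    ]
    return centers, variances

# Two well-separated toy "phone" clusters of 2-D frames.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
init_means, init_vars = kmeans_diag_init(frames, k=2)
```

In the paper's pipeline these means and diagonal covariances seed the forward-backward reestimation; the variance floor is a common safeguard, not something the paper specifies.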
    <Paragraph position="8"> For the semi-continuous model, multiple codebooks are used instead of packing different feature parameters into one vector. The initial model for the SCHMM comes directly from the discrete HMM, with the VQ variance obtained from k-means clustering for each codeword.</Paragraph>
    <Paragraph position="9"> In computing the semi-continuous output probability density function, only the M (M = 4 here) most significant codewords are used for subsequent processing. Under the same analysis condition, the percent correct (correct word percentage) and word accuracy (percent correct minus percent insertions) results of the discrete HMM, the continuous mixture HMM, and the SCHMM are shown in Table 1.</Paragraph>
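The truncation described above, keeping only the M most significant codewords, can be sketched as combining the discrete output distribution with the codeword densities for the current frame. This is a simplified illustration (the normalization of the truncated densities into posterior-like weights is one common choice, not necessarily the paper's exact formulation):

```python
def schmm_output_prob(frame_densities, discrete_probs, top_m=4):
    """Semi-continuous output probability for one state:
    b_j(x) ~= sum over the top-M codewords of w_k(x) * b_j(k),
    where frame_densities[k] = f(x | v_k) for the current frame and
    discrete_probs[k] = b_j(k) from the discrete HMM."""
    top = sorted(range(len(frame_densities)),
                 key=lambda k: frame_densities[k], reverse=True)[:top_m]
    total = sum(frame_densities[k] for k in top)
    # Normalize the truncated densities so they act as posterior weights.
    return sum(frame_densities[k] / total * discrete_probs[k] for k in top)
```

With top_m equal to 1 this collapses to the discrete HMM's output probability for the best codeword, which matches the paper's observation that the top-1 SCHMM behaves much like the discrete model.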
    <Paragraph position="10">  From Table 1, it can be observed that the SCHMM with top 4 codewords works better than both the discrete and continuous mixture HMM. The SCHMM with top 1 codeword actually works worse than the discrete HMM, which indicates that the diagonal Gaussian assumption may be inappropriate here. Though bilinear transformed cepstral coefficients could not be well modeled by the diagonal Gaussian assumption (which was proven by the poor performance of the continuous mixture HMM and the SCHMM with top 1 codeword), the SCHMM with top 4 codewords works modestly better than the discrete HMM. The improvement may primarily come from the smoothing effect of the SCHMM, i.e., the robustness of multiple codewords and multiple codebooks in the semi-continuous output probability representation,</Paragraph>
    <Paragraph position="11"> albeit 200 generalized triphone models are relatively well trained in comparison to the standard Sphinx version \[12\], where 1000 generalized triphone models are used.</Paragraph>
    <Paragraph position="12"> 3.3. Experimental Results Using Less Correlated Data If the diagonal Gaussian covariance is used, each dimension in the speech vector should be uncorrelated. In practice, this can be partially satisfied by using less correlated features as the acoustic observation representation. One way to reduce correlation is principal component projection. In the implementation here, the projection matrix is computed by first pooling together the bilinear transformed cepstrum of the whole training set, and then computing the eigenvectors of that pooled covariance matrix. Unfortunately, only insignificant improvements are obtained based on such a projection \[7\]. This is because the covariance for each codeword is quite different, and such a projection only makes the average covariance diagonal, which is inadequate. As bilinear transformed cepstral coefficients could not be well modeled by diagonal Gaussian probability density functions, experiments without bilinear transformation are conducted. The 18th order cepstrum is used here for the SCHMM because of the less correlated characteristics of the cepstrum. With 4358 training sentences (june-train), test results of 300 sentences (june-test) are shown in Table 2. Here, the recognition accuracy of the SCHMM is significantly improved in comparison with the discrete HMM, and the error reduction is over 29%. Even when the SCHMM with top one codeword is used, it is still better than the discrete HMM (85.5% vs. 83.8%).</Paragraph>
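The projection described above, eigenvectors of a pooled covariance matrix, can be sketched with power iteration for the leading principal component. This is a pure-Python illustration on invented 2-D data, not the paper's implementation:

```python
def pooled_covariance(vectors):
    """Covariance matrix of a list of equal-length vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / n
             for j in range(dim)] for i in range(dim)]

def top_eigenvector(cov, iters=100):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    dim = len(cov)
    vec = [1.0] * dim
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    return vec

# Perfectly correlated toy data: the principal axis is the diagonal.
vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
pc1 = top_eigenvector(pooled_covariance(vectors))
```

Projecting onto such eigenvectors diagonalizes only the average covariance, which is exactly why, as the paper notes, per-codeword covariances can remain far from diagonal after the projection.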
    <Paragraph position="13"> Use of multiple codewords (top 4 and top 6) in the semi-continuous output probability density function greatly improves the word accuracy (from 85.5% to 88.6%). Further increase of the codewords used in the semi-continuous output probability density functions shows no improvement in word accuracy, but substantial growth of computational complexity. From Table 2, it can be seen that the SCHMM with top four codewords is adequate (88.5%).</Paragraph>
    <Paragraph position="14"> In contrast, when bilinear transformed data was used, the error reduction is less than 10% in comparison to the discrete HMM, and the SCHMM with top one codeword is actually slightly worse than the discrete HMM. This strongly indicates that an appropriate feature is very important if a continuous probability density function, especially the diagonal covariance assumption, is used.</Paragraph>
    <Paragraph position="15"> If the assumption is inappropriate, maximum likelihood estimation will only maximize the wrong assumption. Although more than 29% error reduction has been achieved for 18th order LPC analysis using the diagonal covariance assumption, the best results with the discrete HMM (bilinear transformed cepstrum, 88.3%) and the SCHMM (18th order cepstrum, 88.6%) are about the same.</Paragraph>
    <Paragraph position="16"> This suggests that bilinear transformation is helpful for recognition, but produces correlated coefficients, which are inappropriate for the diagonal Gaussian assumption. It can be expected that with the full covariance SCHMM and bilinear transformed cepstral data, better recognition accuracy can be obtained.</Paragraph>
  </Section>
</Paper>