File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/89/h89-2038_metho.xml
Size: 11,226 bytes
Last Modified: 2025-10-06 14:12:20
<?xml version="1.0" standalone="yes"?> <Paper uid="H89-2038"> <Title>Large-Vocabulary Speaker-Independent Continuous Speech Recognition with Semi-Continuous Hidden Markov Models</Title> <Section position="4" start_page="0" end_page="277" type="metho"> <SectionTitle>2. SEMI-CONTINUOUS HIDDEN MARKOV MODELS</SectionTitle> <Paragraph position="0"> 2.1. Discrete HMMs and Continuous HMMs

We consider an N-state Markov chain with state transition matrix A = [a_{ij}], i, j = 1, 2, ..., N, where a_{ij} denotes the transition probability from state i to state j, together with either a discrete output probability distribution b_j(O_k) or a continuous output probability density function b_j(x) associated with each state j of the unobservable Markov chain. Here O_k represents a discrete observation symbol (usually a VQ index), and x represents a continuous observation (usually a speech frame vector) of K-dimensional random vectors.</Paragraph> <Paragraph position="1"> With the discrete HMM there are L discrete output symbols from an L-level VQ, and the output probability is modeled with discrete probability distributions over these symbols. Let O be the observed sequence, O = O_{k_1} O_{k_2} ... O_{k_T}, observed over T samples, where O_{k_t} denotes the VQ codeword k_t observed at time t. The observation probability of such a sequence, \Pr(O \mid \lambda), can be expressed as:

    \Pr(O \mid \lambda) = \sum_{S} \pi_{s_1} b_{s_1}(O_{k_1}) \prod_{t=2}^{T} a_{s_{t-1} s_t} b_{s_t}(O_{k_t}),    (1)

where S is a particular state sequence, S = (s_1 s_2 ... s_T), s_t \in \{1, 2, ..., N\}, and the summation is taken over all possible state sequences S of the given model \lambda, which is represented by (\pi, A, B): \pi is the initial state probability vector, A is the state transition matrix, and B is the output probability distribution matrix. In the discrete HMM, the classification of O_k from x in the VQ may not be accurate.</Paragraph> <Paragraph position="2"> If the observation to be decoded is not vector quantized, then the probability density function f(X \mid \lambda) of producing an observation of continuous vector sequences given the model \lambda is computed, instead of the probability of generating a discrete observation symbol, \Pr(O \mid \lambda). Here X is a sequence of continuous acoustic vectors x, X = x_1 x_2 ... x_T. The principal advantage of using the continuous HMM is the ability to model speech parameters directly, without involving VQ. However, the continuous HMM requires considerably longer training and recognition times, especially when a mixture of several Gaussian probability density components is used. In the continuous Gaussian (M-component) mixture HMM [11], the output probability density of state j, b_j(x), can be represented as

    b_j(x) = \sum_{k=1}^{M} c_{jk} N(x, \mu_{jk}, \Sigma_{jk}),    (2)

where N(x, \mu, \Sigma) denotes a multi-dimensional Gaussian density function with mean vector \mu and covariance matrix \Sigma, and c_{jk} is the weighting coefficient for the kth Gaussian component. With such a mixture, any arbitrary distribution can be approximately modeled, provided the mixture is large enough.</Paragraph>
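To make Eq. (2) concrete, here is a minimal Python sketch (ours, not from the original paper; numpy-based, and all function names are illustrative) that evaluates the Gaussian mixture output density of one state:

    import numpy as np

    def gaussian_density(x, mean, cov):
        """Multi-dimensional Gaussian N(x, mu, Sigma) used in Eq. (2)."""
        k = len(mean)
        diff = x - mean
        norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
        return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

    def mixture_output_density(x, weights, means, covs):
        """Eq. (2): b_j(x) = sum_k c_jk * N(x, mu_jk, Sigma_jk)."""
        return sum(c * gaussian_density(x, mu, sigma)
                   for c, mu, sigma in zip(weights, means, covs))

Note that in a continuous mixture HMM every state j carries its own weights, means, and covariances; this per-state parameter and computation cost is exactly what the SCHMM below is designed to reduce.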
<Section position="1" start_page="276" end_page="277" type="sub_section"> <SectionTitle>2.2. Semi-Continuous Hidden Markov Models</SectionTitle> <Paragraph position="0"> In the discrete HMM, the discrete probability distributions are sufficiently powerful to model any random events with a reasonable number of parameters. The major problem with the discrete output probability is that the VQ operation partitions the acoustic space into separate regions according to some distortion measure. This introduces errors, as the partition operations may destroy the original signal structure. An improvement is to model the VQ codebook as a family of Gaussian density functions such that the distributions overlap, rather than being disjoint. Each codeword of the codebook can then be represented by one of the Gaussian probability density functions and may be used together with others to model the acoustic event. The use of a parametric family of finite mixture densities (a mixture-density VQ) can then be closely combined with the HMM methodology. From the continuous mixture HMM point of view, the output probability in the continuous mixture HMM is shared among the Gaussian probability density functions of the VQ. This can reduce the number of free parameters to be estimated as well as the computational complexity. From the discrete HMM point of view, the partition of the VQ is unnecessary, and is replaced by the overlapping mixture density modeling, which can effectively minimize the VQ errors.</Paragraph> <Paragraph position="1"> The procedure, known as the EM algorithm [3], is a specialization, to the mixture density context, of a general algorithm for obtaining maximum likelihood estimates. It was defined earlier by Baum [1] in a similar way and has been widely used in HMM-based speech recognition methods. Thus, the VQ problems and the HMM modeling problems can be unified under the same probabilistic framework to obtain an optimized VQ/HMM combination, which forms the foundation of the SCHMM.</Paragraph> <Paragraph position="2"> Provided that each codeword of the VQ codebook is represented by a Gaussian density function, for a given state s_j of the HMM, the probability density function that s_j produces a vector x can be written as:

    f(x \mid s_j) = \sum_{k=1}^{L} f(x \mid O_k, s_j) \, b_j(O_k),    (3)

where L denotes the VQ codebook level. For the sake of simplicity, the output probability density functions conditioned on the codewords can be assumed to be independent of the Markov state s_j. Eq. (3) can then be written as:

    f(x \mid s_j) = \sum_{k=1}^{L} f(x \mid O_k) \, b_j(O_k).    (4)

This equation is the key to semi-continuous hidden Markov modeling. Given the VQ codebook index O_k, the probability density function f(x \mid O_k) can be estimated with the EM algorithm [17] or with maximum likelihood clustering. It can also be obtained directly from the HMM parameter estimation, as explained later. Using Eq. (4) to represent the semi-continuous output probability density, it is possible to combine the codebook distortion characteristics with the parameters of the discrete HMM under a unified probabilistic framework. Here, each discrete output probability is weighted by the continuous conditional Gaussian probability density function derived from the VQ. If these continuous VQ density functions are considered as the continuous output probability density functions of a continuous mixture HMM, this also resembles an L-mixture continuous HMM in which all the continuous output probability density functions are shared in the VQ codebook; the discrete output probabilities in state j, b_j(O_k), become the weighting coefficients for the mixture components.</Paragraph>
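As an illustration of Eq. (4) (again our sketch, not the authors' code; it reuses gaussian_density from the previous sketch), note that all states share one codebook of L Gaussian densities, and each state contributes only its discrete weights b_j(O_k):

    import numpy as np

    def schmm_output_density(x, b_j, codebook_means, codebook_covs):
        """Eq. (4): f(x|s_j) = sum_k f(x|O_k) * b_j(O_k).

        b_j is the discrete output distribution of state j over the L
        codewords, reinterpreted as mixture weights for the shared
        codebook densities f(x|O_k)."""
        f_x_given_codeword = np.array(
            [gaussian_density(x, mu, sigma)
             for mu, sigma in zip(codebook_means, codebook_covs)])
        return float(b_j @ f_x_given_codeword)

Because the densities f(x|O_k) do not depend on j, they can be computed once per frame and reused for every state, which is the source of the savings over the per-state mixtures of Eq. (2).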
<Paragraph position="3"> In the implementation of the SCHMM [8], Eq. (4) can be replaced by finding the M most significant values of f(x \mid O_k) (with M from one to six, the algorithm converges well in practice) over all possible codebook indices O_k, which can be easily obtained in the VQ procedure. This can significantly reduce the computational load of the subsequent output probability computation, since M is of lower order than L. Experimental results show that this performs well in speech recognition [8], and results in an L-mixture continuous HMM with a computational complexity significantly lower than that of the continuous mixture HMM.</Paragraph> <Paragraph position="4"> 2.3. Re-estimation Formulas for the SCHMM

If the b_j(O_k) are considered as the weighting coefficients of the different mixture output probability density functions in the continuous mixture HMM, the re-estimation algorithm for the weighting coefficients can be extended to re-estimate the b_j(O_k) of the SCHMM [11]. The re-estimation formulations can be more readily computed by defining a forward partial probability \alpha_t(i) and a backward partial probability \beta_t(i) for any time t and state i as:

    \alpha_1(i) = \pi_i f(x_1 \mid s_i), \qquad \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \Big] f(x_{t+1} \mid s_j);    (5)

    \beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij} f(x_{t+1} \mid s_j) \beta_{t+1}(j).    (6)

</Paragraph> <Paragraph position="5"> The intermediate probabilities \chi_t(i,j,k), \gamma_t(i,j), \gamma_t(i), and \zeta_t(j,k) can be defined as follows for efficient re-estimation of the model parameters:

    \chi_t(i,j,k) = \alpha_t(i) \, a_{ij} \, f(x_{t+1} \mid O_k) \, b_j(O_k) \, \beta_{t+1}(j) / \Pr(X \mid \lambda),
    \gamma_t(i,j) = \sum_{k=1}^{L} \chi_t(i,j,k),
    \gamma_t(i)   = \sum_{j=1}^{N} \gamma_t(i,j) = \alpha_t(i) \beta_t(i) / \Pr(X \mid \lambda),
    \zeta_t(j,k)  = \sum_{i=1}^{N} \chi_{t-1}(i,j,k).    (7)

All these intermediate probabilities can be computed from \alpha_t(i) and \beta_t(i). Using Eqs. (5) and (6), the re-estimation equations for \pi_i, a_{ij}, and b_j(O_k) are:

    \bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \sum_{t=1}^{T-1} \gamma_t(i,j) \Big/ \sum_{t=1}^{T-1} \gamma_t(i);    (8)

    \bar{b}_j(O_k) = \sum_{t=1}^{T} \zeta_t(j,k) \Big/ \sum_{t=1}^{T} \sum_{k'=1}^{L} \zeta_t(j,k'), \qquad 1 \le j \le N, \; 1 \le k \le L.    (9)

</Paragraph> <Paragraph position="6"> The means and covariances of the Gaussian probability density functions can also be re-estimated, updating the VQ codebook separately, with Eqs. (5) and (6). The feedback from the HMM estimation results to the VQ codebook implies that the VQ codebook is optimized based on HMM likelihood maximization rather than on minimizing the total distortion errors over the set of training data. Although re-estimation of the means and covariances of different models involves inter-dependencies, the different density functions being re-estimated are strongly correlated. To re-estimate the parameters of the VQ codebook, i.e. the means \mu_j and covariance matrices \Sigma_j of codebook index j, it is not difficult to extend the continuous mixture HMM re-estimation algorithm with a modified Q-function. In general, it can be written as:

    Q(\lambda, \bar{\lambda}) = \sum_{S} \sum_{K} \Pr(S, K \mid X, \lambda) \log \Pr(X, S, K \mid \bar{\lambda}),

where \lambda denotes the HMM used, S ranges over state sequences, and K over codeword sequences. Maximizing Q with respect to \bar{\lambda} yields:

    \hat{\mu}_k = \sum_{\lambda} \sum_{t} \sum_{j} \zeta_t^{\lambda}(j,k) \, x_t \Big/ \sum_{\lambda} \sum_{t} \sum_{j} \zeta_t^{\lambda}(j,k);    (10)

    \hat{\Sigma}_k = \sum_{\lambda} \sum_{t} \sum_{j} \zeta_t^{\lambda}(j,k) (x_t - \hat{\mu}_k)(x_t - \hat{\mu}_k)^T \Big/ \sum_{\lambda} \sum_{t} \sum_{j} \zeta_t^{\lambda}(j,k).    (11)

In Eqs. (10) and (11), the re-estimation of the means and covariance matrices in the output probability density functions of the SCHMM is tied across all the HMM models, which is similar to the approach with tied transition probabilities inside the model [10]. From Eqs. (10) and (11), it can be observed that they are merely a special form of the EM algorithm for the parameter estimation of mixture density functions [17], closely welded into the HMM re-estimation equations.</Paragraph>
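The top-M pruning of Eq. (4) described earlier, combined with the forward recursion of Eq. (5), might be sketched as follows (an illustrative reconstruction under our own naming, reusing gaussian_density from the first sketch; M = 4 is an arbitrary choice within the one-to-six range quoted above):

    import numpy as np

    def top_m_codewords(x, codebook_means, codebook_covs, m):
        """Return the indices and densities of the M most significant
        values of f(x|O_k), obtained as a by-product of the VQ step."""
        dens = np.array([gaussian_density(x, mu, sigma)
                         for mu, sigma in zip(codebook_means, codebook_covs)])
        idx = np.argsort(dens)[-m:]   # M best-matching codewords
        return idx, dens[idx]

    def forward_pass(X, pi, A, B, codebook_means, codebook_covs, m=4):
        """Forward recursion of Eq. (5) with the top-M approximation of
        Eq. (4). B[j, k] holds the discrete output probability b_j(O_k)."""
        T, N = len(X), len(pi)
        alpha = np.zeros((T, N))
        for t, x in enumerate(X):
            idx, dens = top_m_codewords(x, codebook_means, codebook_covs, m)
            f = B[:, idx] @ dens          # f(x_t|s_j), truncated to M terms
            alpha[t] = pi * f if t == 0 else (alpha[t - 1] @ A) * f
        return alpha

Since M is much smaller than L, each frame touches only M of the shared densities, which is where the reported reduction in output probability computation comes from.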
<Paragraph position="7"> When multiple codebooks are used, each codebook represents a set of different speech parameters. One way to combine these multiple output observations is to assume that they are independent, so that the output probability is computed as the product of the output probabilities of the individual codebooks. It has been shown that performance using multiple codebooks can be substantially improved [13]. In the semi-continuous HMM, the semi-continuous output probability of multiple codebooks can likewise be computed as the product of the semi-continuous output probabilities of the individual codebooks, each of the form of Eq. (4) and thus each consisting of L-mixture continuous density functions. In other words, the semi-continuous output probability can be modified as:

    f(x \mid s_j) = \prod_{c} \sum_{k=1}^{L} f(x^c \mid O_k^c) \, b_j^c(O_k^c),    (12)

where c denotes the codebook used. The re-estimation algorithm for the multiple-codebook-based HMM can be extended by computing the intermediate probabilities of Eq. (7) for each codeword of each codebook c in combination with the probabilities of the remaining codebooks [7].</Paragraph>
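A sketch of the multi-codebook combination of Eq. (12) under the stated independence assumption (illustrative only; it reuses schmm_output_density from the earlier sketch, and the argument layout is our own):

    def multi_codebook_density(x_parts, b_state, codebooks):
        """Eq. (12): product over codebooks c of
        sum_k f(x^c|O_k^c) * b_j^c(O_k^c), for one state j.

        x_parts:   one feature vector per codebook (e.g. cepstrum and
                   differential cepstrum as separate parameter sets)
        b_state:   per-codebook discrete output distributions b_j^c
        codebooks: per-codebook (means, covs) of the shared densities"""
        prob = 1.0
        for x_c, b_c, (means, covs) in zip(x_parts, b_state, codebooks):
            prob *= schmm_output_density(x_c, b_c, means, covs)
        return prob

</Section> </Section> </Paper>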