Large-Vocabulary Speaker-Independent Continuous Speech Recognition 
with Semi-Continuous Hidden Markov Models
X.D. Huang*, H.W. Hon, and K.F. Lee 
ABSTRACT 
A semi-continuous hidden Markov model based on multiple vector quantization codebooks is used here for large-vocabulary speaker-independent continuous speech recognition. In the techniques employed here, the semi-continuous output probability density function for each codebook is represented by a combination of the corresponding discrete output probabilities of the hidden Markov model and the continuous Gaussian density functions of each individual codebook. Parameters of the vector quantization codebook and the hidden Markov model are mutually optimized to achieve an optimal model/codebook combination under a unified probabilistic framework. Another advantage of this approach is the enhanced robustness of the semi-continuous output probability through the combination of multiple codewords and multiple codebooks. For 1000-word speaker-independent continuous speech recognition using a word-pair grammar, the recognition error rate of the semi-continuous hidden Markov model was reduced by more than 29% and 41% in comparison to the discrete and continuous mixture hidden Markov models, respectively.
1. INTRODUCTION
In the discrete hidden Markov model (HMM), vector quantization (VQ) produces the closest codeword from the codebook for each acoustic observation. This mapping from continuous acoustic space to quantized discrete space may cause serious quantization errors for subsequent hidden Markov modeling. To reduce VQ errors, various smoothing techniques have been proposed for VQ and subsequent hidden Markov modeling [9, 12]. A distinctive technique is multiple VQ codebook hidden Markov modeling, which has been shown to offer improved speech recognition accuracy [5, 12]. In the multiple VQ codebook approach, VQ distortion can be significantly reduced by partitioning the parameters into separate codebooks. Another disadvantage of the discrete HMM is that the VQ codebook and the discrete HMM are separately modeled, which may not be an optimal combination for pattern classification [8]. The discrete HMM uses discrete output probability distributions to model various acoustic events, which are inherently superior to the continuous mixture HMM with a mixture of a small number of probability density functions, since the discrete distributions can model events of any shape provided enough training data exist.
On the other hand, the continuous mixture HMM models the acoustic observation directly using estimated continuous probability density functions without VQ, and has been shown to improve the recognition accuracy in comparison to the discrete HMM [15]. For speaker-independent speech recognition, a mixture of a large number of probability density functions [14, 16] or a large number of states in the single-mixture case [4] is generally required to model the characteristics of different speakers. However, a mixture of a large number of probability density functions considerably increases not only the computational complexity, but also the number of free parameters that must be reliably estimated. In addition, the continuous mixture HMM has to be used with care, as continuous probability density functions make more assumptions than the discrete HMM, especially when the diagonal covariance Gaussian probability density is used for simplicity [15]. To obtain better recognition accuracy, acoustic parameters must be well chosen according to the assumptions of the continuous probability density functions used.
* and University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN, UK
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
USA
The semi-continuous hidden Markov model (SCHMM) has been proposed to extend the discrete HMM by replacing discrete output probability distributions with a combination of the original discrete output probability distributions and continuous probability density functions of a Gaussian codebook [6]. In the SCHMM, each VQ codeword is regarded as a Gaussian probability density. Intuitively, from the discrete HMM point of view, the SCHMM tries to smooth the discrete output probabilities with multiple codeword candidates in the VQ procedure. From the continuous mixture HMM point of view, the SCHMM ties all the continuous output probability densities across each individual HMM to form a shared Gaussian codebook, i.e. a mixture of Gaussian probability densities. With the SCHMM, the codebook and HMM can be jointly re-estimated to achieve an optimal codebook/model combination in the sense of the maximum likelihood criterion. Such tying can also substantially reduce the number of free parameters and the computational complexity in comparison to the continuous mixture HMM, while reasonably maintaining the modeling power of a mixture of a large number of probability density functions. The SCHMM has been shown to offer improved recognition accuracy in several speech recognition experiments [6, 8, 14, 2].
In this study, the SCHMM is applied to Sphinx, a speaker-independent continuous speech recognition system. Sphinx uses multiple VQ codebooks for each acoustic observation [12]. To apply the SCHMM to Sphinx, the SCHMM algorithm must be modified to accommodate multiple codebooks and multiple codeword combinations. For SCHMM re-estimation, a modified unified re-estimation algorithm for multiple VQ codebooks and hidden Markov models is proposed in this paper. The applicability of the SCHMM to speaker-independent continuous speech is explored based on 200 generalized triphone models [12]. In the 1000-word speaker-independent continuous speech recognition task using a word-pair grammar, the error rate was reduced by more than 29% and 41% in comparison to the corresponding discrete HMM and continuous mixture HMM, respectively.
2. SEMI-CONTINUOUS HIDDEN MARKOV MODELS
2.1. Discrete HMMs and Continuous HMMs 
An $N$-state Markov chain with state transition matrix $A = [a_{ij}]$, $i, j = 1, 2, \ldots, N$, where $a_{ij}$ denotes the transition probability from state $i$ to state $j$, and a discrete output probability distribution, $b_j(O_k)$, or continuous output probability density function, $b_j(x)$, associated with each state $j$ of the unobservable Markov chain, is considered here. Here $O_k$ represents discrete observation symbols (usually VQ indices), and $x$ represents continuous observations (usually speech frame vectors) of $K$-dimensional random vectors. With the discrete HMM, there are $L$ discrete output symbols from an $L$-level VQ, and the output probability is modeled with discrete probability distributions of these discrete symbols. Let $O$ be the observed sequence, $O = O_{k_1} O_{k_2} \cdots O_{k_T}$, observed over $T$ samples. Here $O_{k_i}$ denotes the VQ codeword $k_i$ observed at time $i$. The observation probability of such an observed sequence, $\Pr(O \mid \lambda)$, can be expressed as:
$$ \Pr(O \mid \lambda) = \sum_{S} \pi_{s_0} \prod_{t=1}^{T} a_{s_{t-1} s_t} \, b_{s_t}(O_{k_t}) \qquad (1) $$
where $S$ is a particular state sequence, $S = (s_0, s_1, \ldots, s_T)$, $s_t \in \{1, 2, \ldots, N\}$, and the summation is taken over all of the possible state sequences $S$ of the given model $\lambda$, which is represented by $(\pi, A, B)$, where $\pi$ is the initial state probability vector, $A$ is the state transition matrix, and $B$ is the output probability distribution matrix. In the discrete HMM, classification of $O_{k_t}$ from $x_t$ in the VQ may not be accurate.
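As a small illustration of the observation probability in Eq. (1), the following Python sketch evaluates it twice: once by summing over every state sequence, as the equation is written, and once with the standard forward recursion. It assumes that the chain starts in a state $s_0$ drawn from $\pi$ and that each transition into $s_t$ emits the symbol at time $t$; the two-state model and the numbers in `pi`, `A`, `B`, and `obs` are hypothetical and chosen only for illustration.

```python
from itertools import product


def brute_force_prob(pi, A, B, obs):
    """Sum over all state sequences (s_0, ..., s_T), as written in Eq. (1)."""
    n = len(pi)
    total = 0.0
    for states in product(range(n), repeat=len(obs) + 1):
        p = pi[states[0]]
        for t, o in enumerate(obs, start=1):
            p *= A[states[t - 1]][states[t]] * B[states[t]][o]
        total += p
    return total


def forward_prob(pi, A, B, obs):
    """Same quantity via the forward recursion: O(N^2 T) instead of O(N^T)."""
    n = len(pi)
    alpha = list(pi)  # alpha_0(i) = pi_i, before any symbol is emitted
    for o in obs:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


# Hypothetical 2-state model with a 3-level VQ (L = 3).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 2, 1]  # VQ indices k_1, k_2, k_3

p_brute = brute_force_prob(pi, A, B, obs)
p_fwd = forward_prob(pi, A, B, obs)
```

Both functions return the same value up to rounding; the recursion is what makes the computation practical for realistic sequence lengths.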
If the observation to be decoded is not vector quantized, then the probability density function, $f(X \mid \lambda)$, of producing an observation of continuous vector sequences given the model $\lambda$ would be computed, instead of the probability of generating a discrete observation symbol, $\Pr(O \mid \lambda)$. Here $X$ is a sequence of continuous acoustic vectors $x$, $X = x_1 x_2 \cdots x_T$. The principal advantage of using the continuous HMM is the ability to directly model speech parameters without involving VQ. However, the continuous HMM requires considerably longer training and recognition times, especially when a mixture of several Gaussian probability density components is used. In the continuous Gaussian ($M$-component) mixture HMM [11], the output probability density of state $j$, $b_j(x)$, can be represented as
$$ b_j(x) = \sum_{k=1}^{M} c_{jk} \, N(x, \mu_{jk}, \Sigma_{jk}) \qquad (2) $$
where $N(x, \mu, \Sigma)$ denotes a multi-dimensional Gaussian density function of mean vector $\mu$ and covariance matrix $\Sigma$, and $c_{jk}$ is a weighting coefficient for the $k$th Gaussian component. With such a mixture, any arbitrary distribution can be approximately modeled, provided the mixture is large enough.
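The mixture density described above can be sketched in a few lines of Python. This is a minimal illustration that assumes diagonal covariance matrices (stored as per-dimension variances), which is the simplification discussed elsewhere in this paper; the function names and the example parameters are hypothetical.

```python
import math


def diag_gaussian(x, mean, var):
    """Gaussian density with diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def mixture_density(x, weights, means, variances):
    """Weighted sum of Gaussian components: sum_k c_k * N(x, mu_k, Sigma_k)."""
    return sum(c * diag_gaussian(x, m, v)
               for c, m, v in zip(weights, means, variances))


# Hypothetical 2-component mixture over 2-dimensional frames.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
density = mixture_density([2.5, 0.8], weights, means, variances)
```

Each additional component adds a mean vector, a variance vector, and a weight per state, which is the parameter growth the text refers to.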
2.2. Semi-Continuous Hidden Markov Models
In the discrete HMM, the discrete probability distributions are sufficiently powerful to model any random events with a reasonable number of parameters. The major problem with the discrete output probability is that the VQ operation partitions the acoustic space into separate regions according to some distortion measure. This introduces errors, as the partition operations may destroy the original signal structure. An improvement is to model the VQ codebook as a family of Gaussian density functions such that the distributions overlap, rather than being disjoint. Each codeword of the codebook can then be represented by one of the Gaussian probability density functions and may be used together with others to model the acoustic event. The use of a parametric family of finite mixture densities (a mixture density VQ) can then be closely combined with the HMM methodology. From the continuous mixture HMM point of view, the output probability in the continuous mixture HMM is shared among the Gaussian probability density functions of the VQ. This can reduce the number of free parameters to be estimated as well as the computational complexity. From the discrete HMM point of view, the partition of the VQ is unnecessary, and is replaced by mixture density modeling with overlap, which can effectively minimize the VQ errors. The procedure for estimating such mixture densities, known as the EM algorithm [3], is a specialization, to the mixture density context, of a general algorithm for obtaining maximum likelihood estimates. This has been defined earlier by Baum [1] in a similar way and has been widely used in HMM-based speech recognition methods. Thus, the VQ problems and HMM modeling problems can be unified under the same probabilistic framework to obtain an optimized VQ/HMM combination, which forms the foundation of the SCHMM.
Provided that each codeword of the VQ codebook is represented by a Gaussian density function, for a given state $s_t$ of the HMM, the probability density function that $s_t$ produces a vector $x$ can then be written as:
$$ b_{s_t}(x) = f(x \mid s_t) = \sum_{j=1}^{L} f(x \mid O_j, s_t) \Pr(O_j \mid s_t) \qquad (3) $$
where $L$ denotes the VQ codebook level. For the sake of simplicity, the output probability density function conditioned on the codewords can be assumed to be independent of the Markov states $s_t$. (3) can then be written as:
$$ f(x \mid s_t) = \sum_{j=1}^{L} f(x \mid O_j) \Pr(O_j \mid s_t) = \sum_{j=1}^{L} f(x \mid O_j) \, b_{s_t}(O_j) \qquad (4) $$
This equation is the key to semi-continuous hidden Markov modeling. Given the VQ codebook index $O_j$, the probability density function $f(x \mid O_j)$ can be estimated with the EM algorithm [17], or maximum likelihood clustering. It can also be obtained from the HMM parameter estimation directly, as explained later. Using (4) to represent the semi-continuous output probability density, it is possible to combine the codebook distortion characteristics with the parameters of the discrete HMM under a unified probabilistic framework. Here, each discrete output probability is weighted by the continuous conditional Gaussian probability density function derived from VQ. If these continuous VQ density functions are considered as the continuous output probability density functions in the continuous mixture HMM, this also resembles the $L$-mixture continuous HMM with all the continuous output probability density functions shared with each other in the VQ codebook. Here the discrete output probability in state $i$, $b_i(O_j)$, becomes the weighting coefficient for the mixture components.
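The semi-continuous output density can be illustrated with a short Python sketch. It assumes a one-dimensional feature and a shared codebook of three Gaussians, one per codeword; the codebook values and the two sets of discrete output probabilities (`b_state_a`, `b_state_b`) are hypothetical. The point is that both states reuse the same Gaussians and differ only in their weights.

```python
import math


def gaussian_1d(x, mean, var):
    """One-dimensional Gaussian density f(x | O_j) for a single codeword."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def semi_continuous_density(x, codebook, b_i):
    """sum_j f(x | O_j) * b_i(O_j), with one shared Gaussian per codeword."""
    return sum(gaussian_1d(x, mean, var) * weight
               for (mean, var), weight in zip(codebook, b_i))


# Shared codebook: (mean, variance) for each of L = 3 codewords (hypothetical).
codebook = [(-2.0, 1.0), (0.0, 1.0), (2.0, 1.0)]

# Discrete output probabilities b_i(O_j) of two different states (hypothetical).
b_state_a = [0.80, 0.15, 0.05]
b_state_b = [0.05, 0.15, 0.80]

x = -1.7
density_a = semi_continuous_density(x, codebook, b_state_a)
density_b = semi_continuous_density(x, codebook, b_state_b)
```

A frame near the first codeword scores higher under the state that puts most of its discrete probability on that codeword, yet neighbouring codewords still contribute, which is the smoothing effect described in the text.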
In implementation of the SCHMM [8], Eq. (4) can be replaced by finding the $M$ most significant values of $f(x \mid O_j)$ (with $M$ being one to six, the algorithm converges well in practice) over all possible codebook indices $O_j$, which can be easily obtained in the VQ procedure. This can significantly reduce the computational load for subsequent output probability computation, since $M$ is of lower order than $L$. Experimental results show this to perform well in speech recognition [8], and it results in an $L$-mixture continuous HMM with a computational complexity significantly lower than that of the continuous mixture HMM.
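The top-$M$ approximation can be sketched as follows in Python. The sketch assumes the codeword densities $f(x \mid O_j)$ for the current frame have already been computed; the numbers in `densities` and `b_i` are hypothetical.

```python
import heapq


def top_m_density(densities, b_i, m):
    """Approximate the full sum using only the m codewords with largest f(x | O_j)."""
    best = heapq.nlargest(m, range(len(densities)), key=densities.__getitem__)
    return sum(densities[j] * b_i[j] for j in best)


# Hypothetical f(x | O_j) values for one frame and b_i(O_j) for one state (L = 5).
densities = [0.30, 0.002, 0.12, 0.0001, 0.05]
b_i = [0.4, 0.1, 0.3, 0.1, 0.1]

full = top_m_density(densities, b_i, len(densities))  # all L codewords
top2 = top_m_density(densities, b_i, 2)
top1 = top_m_density(densities, b_i, 1)
```

Because the codeword ranking depends only on the frame and not on the state, it can be computed once per frame and reused for every state, which is where the saving comes from.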
2.3. Re-estimation Formulas for the SCHMM
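If the $b_i(O_j)$ are considered as the weighting coefficients of different mixture output probability density functions in the continuous mixture HMM, the re-estimation algorithm for the weighting coefficients can be extended to re-estimate $b_i(O_j)$ of the SCHMM [11]. The re-estimation formulations can be more readily computed by defining a forward partial probability, $\alpha_t(i)$, and a backward partial probability, $\beta_t(i)$, for any time $t$ and state $i$ as:
$$ \alpha_t(i) = \Pr(x_1, x_2, \ldots, x_t, s_t = i \mid \lambda) \qquad (5a) $$
$$ \beta_t(i) = \Pr(x_{t+1}, x_{t+2}, \ldots, x_T \mid s_t = i, \lambda) \qquad (5b) $$
The intermediate probabilities, $\chi_t(i,j,k)$, $\gamma_t(i,j)$, $\gamma_t(i)$, $\zeta_t(i,k)$, and $\zeta_t(k)$, can be defined as follows for efficient re-estimation of the model parameters:
$$ \begin{aligned} \chi_t(i,j,k) &= \Pr(s_t = i, s_{t+1} = j, O_k \mid X, \lambda) \\ \gamma_t(i,j) &= \Pr(s_t = i, s_{t+1} = j \mid X, \lambda) \\ \gamma_t(i) &= \Pr(s_t = i \mid X, \lambda) \\ \zeta_t(i,k) &= \Pr(s_t = i, O_k \mid X, \lambda) \\ \zeta_t(k) &= \Pr(O_k \mid X, \lambda) \end{aligned} \qquad (6) $$
All these intermediate probabilities can be represented by $\chi_t(i,j,k)$. Using Eqs. (5) and (6), the re-estimation equations for $\pi_i$, $a_{ij}$, and $b_i(O_j)$ can be written as:
$$ \bar{\pi}_i = \gamma_1(i), \qquad 1 \le i \le N; \qquad (7) $$
$$ \bar{a}_{ij} = \frac{\sum_{t=1}^{T} \gamma_t(i,j)}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad 1 \le i, j \le N; \qquad (8) $$
$$ \bar{b}_i(O_j) = \frac{\sum_{t=1}^{T} \zeta_t(i,j)}{\sum_{t=1}^{T} \gamma_t(i)}, \qquad 1 \le i \le N; \; 1 \le j \le L. \qquad (9) $$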
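A compact Python sketch of one re-estimation pass for the weights $b_i(O_j)$ is given below. It is an illustration under stated assumptions rather than the implementation used in the experiments: it uses the common convention in which the first state is drawn from $\pi$ and emits $x_1$ (the indexing in the text may differ), it handles a single training sequence, it takes the codeword densities $f(x_t \mid O_j)$ as given in `dens`, and it uses no scaling, so it is only suitable for short sequences. All numeric values are hypothetical.

```python
def forward_backward(pi, A, out):
    """out[t][i] = b_i(x_t). Returns alpha, beta and Pr(X | lambda)."""
    T, N = len(out), len(pi)
    alpha = [[0.0] * N for _ in range(T)]
    beta = [[1.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * out[0][i] for i in range(N)]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * out[t][j]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * out[t + 1][j] * beta[t + 1][j] for j in range(N))
    return alpha, beta, sum(alpha[T - 1])


def reestimate_weights(pi, A, b, dens):
    """One update of b[i][j] = b_i(O_j), given dens[t][j] = f(x_t | O_j)."""
    T, N, L = len(dens), len(pi), len(b[0])
    # Semi-continuous output density of every state for every frame.
    out = [[sum(dens[t][j] * b[i][j] for j in range(L)) for i in range(N)]
           for t in range(T)]
    alpha, beta, px = forward_backward(pi, A, out)
    new_b = [[0.0] * L for _ in range(N)]
    for i in range(N):
        gamma_sum = 0.0
        for t in range(T):
            gamma = alpha[t][i] * beta[t][i] / px  # gamma_t(i)
            gamma_sum += gamma
            for j in range(L):
                # zeta_t(i, j): state posterior times codeword posterior in state i.
                new_b[i][j] += gamma * dens[t][j] * b[i][j] / out[t][i]
        new_b[i] = [v / gamma_sum for v in new_b[i]]
    return new_b, px


# Hypothetical 2-state model, 2 codewords, 4 frames.
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
b0 = [[0.5, 0.5], [0.5, 0.5]]
dens = [[0.40, 0.01], [0.30, 0.02], [0.01, 0.50], [0.02, 0.35]]

b1, likelihood0 = reestimate_weights(pi, A, b0, dens)
b2, likelihood1 = reestimate_weights(pi, A, b1, dens)
```

Each returned row of weights sums to one, and the likelihood reported on the second call (computed with the updated weights) is not lower than on the first, as expected from an EM-style update.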
The means and covariances of the Gaussian probability density functions can also be re-estimated to update the VQ codebook separately with Eqs. (5) and (6). The feedback from the HMM estimation results to the VQ codebook implies that the VQ codebook is optimized based on the HMM likelihood maximization rather than minimizing the total distortion errors from the set of training data. Although re-estimation of means and covariances of different models will involve inter-dependencies, the different density functions which are re-estimated are strongly correlated. To re-estimate the parameters of the VQ codebook, i.e. the means, $\mu_j$, and covariance matrices, $\Sigma_j$, of the codebook index $j$, it is not difficult to extend the continuous mixture HMM re-estimation algorithm with a modified Q function. In general, it can be written as:
$$ \bar{\mu}_j = \frac{\sum_{p} \left[ \sum_{t=1}^{T} \zeta_t(j) \, x_t \right]}{\sum_{p} \left[ \sum_{t=1}^{T} \zeta_t(j) \right]}, \qquad 1 \le j \le L; \qquad (10) $$
and
$$ \bar{\Sigma}_j = \frac{\sum_{p} \left[ \sum_{t=1}^{T} \zeta_t(j) \, (x_t - \bar{\mu}_j)(x_t - \bar{\mu}_j)^{\top} \right]}{\sum_{p} \left[ \sum_{t=1}^{T} \zeta_t(j) \right]}, \qquad 1 \le j \le L; \qquad (11) $$
where $p$ denotes the HMM used, and expressions in $[\;]$ are variables of model $p$. In Eqs. (10) and (11), the re-estimation of the means and covariance matrices in the output probability density function of the SCHMM is tied across all the HMM models, which is similar to the approach with tied transition probability inside the model [10]. From Eqs. (10) and (11), it can be observed that they are merely a special form of the EM algorithm for the parameter estimation of mixture density functions [17], which are closely welded into the HMM re-estimation equations.
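The codebook update described above amounts to a posterior-weighted mean and covariance. The Python sketch below illustrates this for a single codeword $j$; it assumes that the frames from all models have been pooled into one list and that the posterior weights $\zeta_t(j)$ are already available. The function name and the data are hypothetical.

```python
def reestimate_codeword(frames, zeta_j):
    """Weighted mean and full covariance of one codeword.

    frames: K-dimensional vectors x_t pooled over all models.
    zeta_j: posterior weight zeta_t(j) of codeword j for each frame.
    """
    total = sum(zeta_j)
    K = len(frames[0])
    mean = [sum(w * x[k] for w, x in zip(zeta_j, frames)) / total
            for k in range(K)]
    cov = [[sum(w * (x[a] - mean[a]) * (x[b] - mean[b])
                for w, x in zip(zeta_j, frames)) / total
            for b in range(K)]
           for a in range(K)]
    return mean, cov


# Hypothetical 2-dimensional frames and posterior weights.
frames = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mean_all, cov_all = reestimate_codeword(frames, [1.0, 1.0, 1.0, 1.0])
mean_half, cov_half = reestimate_codeword(frames, [1.0, 1.0, 0.0, 0.0])
```

With uniform weights this reduces to the ordinary sample mean and covariance; frames with zero posterior weight simply drop out of the estimate.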
When multiple codebooks are used, each codebook represents a set of different speech parameters. One way to combine these multiple output observations is to assume that they are independent, and the output probability is computed as the product of the output probabilities of each codebook. It has been shown that performance using multiple codebooks can be substantially improved [13]. In the semi-continuous HMM, the semi-continuous output probability of multiple codebooks can also be computed as the product of the semi-continuous output probability for each codebook as in Eq. (4), which consists of $L$-mixture continuous density functions. In other words, the semi-continuous output probability could be modified as:
$$ b_i(x) = \prod_{c} \sum_{j=1}^{L} f(x \mid O_j^c) \, b_i^c(O_j^c) \qquad (12) $$
where $c$ denotes the codebook used. The re-estimation algorithm for the multiple codebook based HMM could be extended if Eq. (6) is computed for each codeword of each codebook $c$ in combination with the probabilities of the remaining codebooks [7].
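The product over codebooks can be illustrated with a few lines of Python. The sketch assumes the per-codebook codeword densities for the current frame are given and that the codebooks are treated as independent, as stated above; the values below are hypothetical.

```python
def multi_codebook_density(dens_per_codebook, b_per_codebook):
    """Product over codebooks c of sum_j f(x | O_j^c) * b_i^c(O_j^c)."""
    prob = 1.0
    for dens, b in zip(dens_per_codebook, b_per_codebook):
        prob *= sum(f * w for f, w in zip(dens, b))
    return prob


# Two hypothetical codebooks with two codewords each, for one state and one frame.
dens_per_codebook = [[0.2, 0.1], [0.5, 0.3]]
b_per_codebook = [[0.5, 0.5], [0.4, 0.6]]
combined = multi_codebook_density(dens_per_codebook, b_per_codebook)
```

In a practical system the product would normally be accumulated as a sum of logarithms to avoid underflow; that detail is omitted here for brevity.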
3. EXPERIMENTAL EVALUATION 
3.1. Analysis Conditions
For both training and evaluation, the standard Sphinx front-end consists of the 12th order bilinear transformed LPC cepstrum [12]. The complete database consists of 4358 training sentences from 105 speakers (june-train) and 300 test sentences from 12 speakers. The vocabulary of the Resource Management database is 991 words. There is also an official word-pair recognition grammar, which is just a list of allowable word pairs without probabilities, for the purpose of reducing the recognition perplexity to about 60.
3.2. Experimental Results Using Bilinear Transformed Cepstrum 
Discrete HMMs and continuous mixture HMMs based on 200 generalized triphones are first evaluated as benchmarks. The discrete HMM is the same as in Sphinx, except that only 200 generalized triphones are used [12].
In the continuous mixture HMM implemented here, the cepstrum, difference cepstrum, normalized energy, and difference energy are packed into one vector. This is similar to the one-codebook implementation of the discrete HMM [12]. Each continuous output probability consists of 4 diagonal Gaussian probability density functions as in Eq. (2). To obtain reliable initial models for the continuous mixture HMM, the Viterbi alignment with the discrete HMM is used to phonetically segment and label the training speech. These labeled segments are then clustered by using the k-means clustering algorithm to obtain initial means and diagonal covariances. The forward-backward algorithm is used iteratively for the monophone models, which are then used as initial models for the generalized triphone models. Though the continuous mixture HMM was reported to significantly better the performance of the discrete HMM [15], for the experiments conducted here it is significantly worse than the discrete HMM. Why this paradox? One explanation is that multiple codebooks are used in the discrete HMM, therefore the VQ errors for the discrete HMM are not so serious here. Another reason may be that the diagonal covariance assumption is not appropriate for the bilinear transformed LPC cepstrum, since many coefficients are strongly correlated after the transformation. Indeed, observation of the average covariance matrix for the bilinear transformed LPC cepstrum shows that values of off-diagonal components are generally quite large.
For the semi-continuous model, multiple codebooks are used instead of packing different feature parameters into one vector. The initial model for the SCHMM comes directly from the discrete HMM, with the VQ variance obtained from k-means clustering for each codeword. In computing the semi-continuous output probability density function, only the $M$ (1 or 4 here) most significant codewords are used for subsequent processing. Under the same analysis conditions, the percent correct (correct word percentage) and word accuracy (percent correct - percent insertion) results of the discrete HMM, the continuous mixture HMM, and the SCHMM are shown in Table 1.
Table 1
Average recognition accuracy

types                     percent correct (word accuracy)
Discrete HMM              89.5% (88.0%)
Continuous Mixture HMM    84.2% (81.3%)
SCHMM + top1              87.2% (84.0%)
SCHMM + top4              90.8% (89.1%)
From Table 1, it can be observed that the SCHMM with the top 4 codewords works better than both the discrete and continuous mixture HMM. The SCHMM with the top 1 codeword actually works worse than the discrete HMM, which indicates that the diagonal Gaussian assumption may be inappropriate here. Though bilinear transformed cepstral coefficients could not be well modeled by the diagonal Gaussian assumption (as indicated by the poor performance of the continuous mixture HMM and the SCHMM with the top 1 codeword), the SCHMM with the top 4 codewords works modestly better than the discrete HMM. The improvement may primarily come from the smoothing effect of the SCHMM, i.e. the robustness of multiple codewords and multiple codebooks in the semi-continuous output probability representation, albeit the 200 generalized triphone models are relatively well trained in comparison to the standard Sphinx version [12], where 1000 generalized triphone models are used.
3.3. Experimental Results Using Less Correlated Data 
If the diagonal Gaussian covariance is used, the dimensions of the speech vector should be uncorrelated. In practice, this can be partially satisfied by using less correlated features as the acoustic observation representation. One way to reduce correlation is principal component projection. In the implementation here, the projection matrix is computed by first pooling together the bilinear transformed cepstrum of all the training sentences, and then computing the eigenvectors of that pooled covariance matrix. Unfortunately, only insignificant improvements are obtained based on such a projection [7]. This is because the covariance for each codeword is quite different, and such a projection only makes the average covariance diagonal, which is inadequate.
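The limitation of a single pooled projection can be seen in a toy two-dimensional example. The Python sketch below is an illustration only: it uses the closed-form rotation that diagonalizes a 2x2 covariance matrix instead of a general eigenvector routine, and the two point clusters (standing in for the frames of two codewords) are hypothetical.

```python
import math


def covariance(points):
    """Population (sxx, syy, sxy) of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, syy, sxy


def principal_rotation(points):
    """Rotation angle of the eigenvector basis of the 2x2 covariance."""
    sxx, syy, sxy = covariance(points)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def project(points, theta):
    """Express the points in the rotated (principal component) axes."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x + s * y, -s * x + c * y) for x, y in points]


# Two hypothetical codeword clusters with different internal correlation.
cluster_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
cluster_b = [(10.0, 1.0), (11.0, 0.0), (12.0, -1.0)]
pooled = cluster_a + cluster_b

theta = principal_rotation(pooled)
pooled_cross = covariance(project(pooled, theta))[2]
cluster_cross = covariance(project(cluster_a, theta))[2]
```

After the projection the pooled cross-covariance vanishes, while the cross-covariance inside `cluster_a` remains clearly non-zero, so a diagonal Gaussian per codeword is still a poor fit.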
As bilinear transformed cepstral coefficients could not be well modeled by the diagonal Gaussian probability density function, experiments without bilinear transformation are conducted. The 18th order cepstrum is used here for the SCHMM because of the less correlated characteristics of the cepstrum. With 4358 training sentences (june-train), test results of 300 sentences (june-test) are listed in Table 2.
Table 2
Average accuracy of 18th order cepstrum

types            percent correct (word accuracy)
Discrete HMM     86.3% (83.8%)
SCHMM + top1     86.?% (85.5%)
SCHMM + top2     89.8% (87.6%)
SCHMM + top4     89.3% (88.5%)
SCHMM + top6     89.6% (88.6%)
SCHMM + top8     89.3% (88.2%)
Here, the recognition accuracy of the SCHMM is significantly improved in comparison with the discrete HMM, and the error reduction is over 29%. Even when the SCHMM with only the top one codeword is used, it is still better than the discrete HMM (85.5% vs. 83.8%). Use of multiple codewords (top4 and top6) in the semi-continuous output probability density function greatly improves the word accuracy (from 85.5% to 88.6%). Further increase of the number of codewords used in the semi-continuous output probability density functions shows no improvement in word accuracy, but substantial growth of computational complexity. From Table 2, it can be seen that the SCHMM with the top four codewords is adequate (88.5%). In contrast, when bilinear transformed data was used, the error reduction is less than 10% in comparison to the discrete HMM, and the SCHMM with the top one codeword is actually slightly worse than the discrete HMM. This strongly indicates that appropriate features are very important if a continuous probability density function, especially with the diagonal covariance assumption, is used. If the assumption is inappropriate, maximum likelihood estimation will only maximize the wrong assumption. Although more than 29% error reduction has been achieved for 18th order LPC analysis using the diagonal covariance assumption, the best results with the discrete HMM (bilinear transformed cepstrum, 88.3%) and the SCHMM (18th order cepstrum, 88.6%) are about the same. This suggests that bilinear transformation is helpful for recognition, but produces correlated coefficients, which are inappropriate for the diagonal Gaussian assumption. It can be expected that with the full covariance SCHMM and bilinear transformed cepstral data, better recognition accuracy can be obtained.
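The error reductions quoted in the text follow from the word accuracies by simple arithmetic, sketched below in Python. The accuracy values are the ones read from Tables 1 and 2; the function names are introduced here only for illustration.

```python
def word_accuracy(percent_correct, percent_insertion):
    """Word accuracy as defined in the text: percent correct minus percent insertion."""
    return percent_correct - percent_insertion


def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Relative reduction (in percent) of the word error rate."""
    base_err = 100.0 - baseline_accuracy
    new_err = 100.0 - new_accuracy
    return 100.0 * (base_err - new_err) / base_err


# 18th order cepstrum: discrete HMM (83.8%) vs. best SCHMM (88.6%).
vs_discrete = relative_error_reduction(83.8, 88.6)

# Bilinear transformed cepstrum: continuous mixture HMM (81.3%) vs. SCHMM top4 (89.1%).
vs_continuous = relative_error_reduction(81.3, 89.1)

# Bilinear transformed cepstrum: discrete HMM (88.0%) vs. SCHMM top4 (89.1%).
vs_discrete_bilinear = relative_error_reduction(88.0, 89.1)
```

These give roughly 29.6%, 41.7%, and 9.2%, matching the "more than 29%", "41%", and "less than 10%" figures in the text.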
4. CONCLUSIONS 
Semi-continuous hidden Markov models based on multiple vector quantization codebooks take advantage of both the discrete HMM and the continuous HMM. With the SCHMM, it is possible to model a mixture of a large number of probability density functions with a limited amount of training data and computational complexity. Robustness is enhanced by using multiple codewords and multiple codebooks for the semi-continuous output probability representation. In addition, the VQ codebook itself can be adjusted together with the HMM parameters in order to obtain the optimum maximum likelihood of the HMM. The applicability of the continuous mixture HMM or the SCHMM relies on appropriately chosen acoustic parameters and assumptions of the continuous probability density function. Acoustic features must be well represented if diagonal covariance is applied to the Gaussian probability density function. This is strongly indicated by the experimental results based on the bilinear transformed cepstrum and the cepstrum. With bilinear transformation, high frequency components are compressed in comparison to low frequency components [2.3]. Such a transformation converts the linear frequency axis into a mel-scale-like one. The discrete HMM can be substantially improved by bilinear transformation. However, bilinear transformation introduces strong correlations, which are inappropriate for the diagonal Gaussian assumption. Using the cepstrum without bilinear transformation, the diagonal SCHMM can be substantially improved in comparison to the discrete HMM.
All experiments conducted here were based on only 200 generalized triphones. As smoothing can play a more important role in less well trained models, more improvement can be expected for 1000 generalized triphones (where the word accuracy for the discrete HMM is 91% with bilinear transformed data). In addition, removal of the diagonal covariance assumption by use of full covariance can be expected to further improve recognition accuracy [1]. Regarding use of full covariance, the SCHMM has a distinctive advantage. Since Gaussian probability density functions are tied to the VQ codebook, by choosing the $M$ most significant codewords, computational complexity can be several orders lower than that of the conventional continuous mixture HMM while maintaining the modeling power of large mixture components.
Experimental results have clearly demonstrated that the SCHMM offers improved recognition accuracy in comparison to both the discrete HMM and the continuous mixture HMM in speaker-independent continuous speech recognition. We conclude that the SCHMM is indeed a powerful technique for modeling non-stationary stochastic processes with multi-modal probabilistic functions of Markov chains.
ACKNOWLEDGEMENTS 
We would like to thank Professor Mervyn Jack and Professor Raj Reddy for their help and insight shared in this research.
REFERENCES 
[1] Baum, L.E. et al., "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains", Ann. Math. Stat., Vol. 41, pp. 164-171, 1970.
[2] Bellegarda, J. and Nahamoo, D., "Tied mixture continuous parameter models for large vocabulary isolated speech recognition", ICASSP 89, Glasgow, Scotland, 1989.
[3] Dempster, A.P., Laird, N.M., and Rubin, D.B., "Maximum-likelihood from incomplete data via the EM algorithm", J. Royal Statist. Soc., Ser. B (methodological), Vol. 39, pp. 1-38, 1977.
[4] Doddington, G.R., "Phonetically sensitive discriminants for improved speech recognition", ICASSP 89, Glasgow, Scotland, 1989.
[5] Gupta, V.N., Lennig, M., and Mermelstein, P., "Integration of Acoustic Information in a Large Vocabulary Word Recognizer", ICASSP 87, pp. 697-700, 1987.
[6] Huang, X.D. and Jack, M.A., "Hidden Markov modelling of speech based on a semi-continuous model", Electronics Letters, Vol. 24, pp. 6-7.
[7] Huang, X.D., Hon, H.W., and Lee, K.F., "Multiple codebook semi-continuous hidden Markov models for speaker-independent continuous speech recognition", CMU Technical Report CMU-CS-89-136, C. Science, 1989.
[8] Huang, X.D. and Jack, M.A., "Semi-continuous hidden Markov models for speech recognition", Computer Speech and Language, Vol. 3, 1989.
[9] Jelinek, F., "Continuous speech recognition by statistical methods", Proceedings of IEEE, Vol. 64, pp. 532-556, 1976.
[10] Jelinek, F. and Mercer, R.L., "Interpolated estimation of Markov source parameters from sparse data", The workshop on pattern recognition in practice, Amsterdam, 1980.
[11] Juang, B.H., "Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chain", AT&T Technical Journal, Vol. 64, pp. 1235-1249, 1985.
[12] Lee, K.F., "Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system", Ph.D. thesis, C. Science, CMU, 1988.
[13] Lee, K.F., Hon, H.W., and Reddy, R., "The SPHINX speech recognition system", ICASSP 89, Glasgow, Scotland, 1989.
[14] Paul, D., "The Lincoln continuous speech recognition system: recent developments and results", DARPA 1989 Feb. meeting, 1989.
[15] Rabiner, L.R. et al., "Recognition of isolated digits using hidden Markov models with continuous mixture densities", AT&T Technical Journal, Vol. 64, pp. 1211-1234, 1985.
[16] Rabiner, L.R., Wilpon, J.G., and Soong, F.K., "High Performance Connected Digit Recognition Using Hidden Markov Models", ICASSP 88, NY, 1988.
[17] Redner, R.A. and Walker, H.F., "Mixture densities, maximum likelihood and the EM algorithm", SIAM Review, Vol. 26, pp. 195-239, 1984.
[18] Shikano, K., "Evaluation of LPC spectral matching measures for phonetic unit recognition", CMU Technical Report CMU-CS-86-108, C. Science, 1986.
