<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1049">
<Title>A DYNAMICAL SYSTEM APPROACH TO CONTINUOUS SPEECH RECOGNITION</Title>
<Section position="8" start_page="254" end_page="254" type="evalu">
<SectionTitle>EXPERIMENTAL RESULTS</SectionTitle>
<Paragraph position="0"> We have implemented a system based on our correlation invariance assumption and performed phone classification experiments on the TIMIT database [5]. We used Mel-warped cepstra and their derivatives, together with the derivative of log power. The number of different distributions (time-invariant regions) for each segment model was 5. We used 61 phonetic models, but in counting errors we folded homophones together and effectively used the reduced CMU/MIT 39-symbol set. The measurement-noise variance was common across all phone models and was not reestimated after the first iteration. In experiments with class-dependent measurement noise we observed a decrease in performance, which we attribute to "over-training": a first-order Gauss-Markov structure can adequately model the training data because of the small length of the time-invariant regions in the model. In addition, the observed feature vectors were centered around a class-dependent mean.</Paragraph>
[Figure 2 caption fragment: "... of iterations and log-likelihood ratio of each iteration relative to the convergent value for the training data."]
<Paragraph position="2"> Duration probabilities as well as a priori class probabilities were also used in these experiments. The training set consisted of 317 speakers (2536 sentences), and our algorithms were evaluated on a separate test set of 12 speakers (96 sentences).</Paragraph>
<Paragraph position="3"> The effectiveness of the training algorithm is shown in Figure 2, where we present the normalized log-likelihood of the training data and the classification rate on the test data versus the number of iterations. We used 10 cepstra for this experiment, and the initial parameters of the models were uniform across all classes, except for the class-dependent means. We can see the fast initial convergence of the EM algorithm; the best performance is achieved after only 4 iterations.</Paragraph>
<Paragraph position="4"> In Figure 3 we show the classification rates for no correlation modeling (independent frames), the Gauss-Markov model, and the dynamical-system model for different numbers of input features. We also include in the same plot the classification rates when the derivatives of the cepstra are included in the feature set, so that some form of correlation modeling is included in the independent-frame model. We can see that the proposed model clearly outperforms the independent-frame model. Furthermore, the importance of incorporating observation noise in the model can be seen by comparing the performance of the new model with that of the earlier Gauss-Markov one.</Paragraph>
[Figure 3 caption fragment: "... modeling and numbers of cepstral coefficients ..."]
</Section>
</Paper>
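For readers who want to reproduce a front end in this spirit, the following is a minimal sketch, assuming librosa's MFCCs as a modern stand-in for the paper's Mel-warped cepstra; the file name, sample rate, and analysis parameters are our assumptions, not taken from the paper.

```python
import numpy as np
import librosa

# Rough approximation of the paper's front end: 10 Mel-warped cepstra,
# their time derivatives, and the derivative of log power.  librosa's
# MFCCs stand in for the paper's Mel-warped cepstra; all analysis
# parameters here are assumptions.
y, sr = librosa.load("sample.wav", sr=16000)             # TIMIT audio is 16 kHz
cep = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=10)        # (10, T) cepstra
d_cep = librosa.feature.delta(cep)                       # derivatives of cepstra
log_power = np.log(librosa.feature.rms(y=y) ** 2 + 1e-10)  # (1, T) log power
d_log_power = librosa.feature.delta(log_power)           # derivative of log power
features = np.vstack([cep, d_cep, d_log_power])          # (21, T) feature vectors
```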
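The error-counting convention above (folding the 61 TIMIT labels into the reduced CMU/MIT 39-symbol set before scoring) can be made concrete with a small sketch. The mapping shown is only a representative subset of the standard folding, and `FOLD_MAP`, `fold`, and `classification_rate` are hypothetical names introduced here.

```python
# Sketch: fold TIMIT's 61 phone labels into the reduced CMU/MIT
# 39-symbol set before counting classification errors.  FOLD_MAP is
# only a representative subset of the standard folding; the full
# table covers all 61 labels.
FOLD_MAP = {
    "ao": "aa", "ax": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n",
    "nx": "n", "eng": "ng", "zh": "sh", "ux": "uw",
    # closures and silence-like labels fold to a single class
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil",
    "dcl": "sil", "gcl": "sil", "pau": "sil", "epi": "sil", "h#": "sil",
}

def fold(label: str) -> str:
    """Map a 61-set phone label to its 39-set equivalent."""
    return FOLD_MAP.get(label, label)

def classification_rate(ref_labels, hyp_labels) -> float:
    """Fraction of segments whose folded hypothesis matches the folded reference."""
    correct = sum(fold(r) == fold(h) for r, h in zip(ref_labels, hyp_labels))
    return correct / len(ref_labels)
```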
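The bookkeeping behind Figure 2, as described in the text, amounts to running EM while tracking the normalized training log-likelihood and the test classification rate, keeping the best-scoring parameters. A generic sketch follows; `em_step`, `log_likelihood`, and `classify_rate` are caller-supplied stand-ins for the paper's reestimation and classification routines, and the tolerance is an assumption.

```python
def train_em(params, train_data, test_data,
             em_step, log_likelihood, classify_rate,
             max_iters=10, tol=1e-4):
    """Run EM, monitoring convergence as in Figure 2.

    em_step, log_likelihood, and classify_rate are supplied by the
    caller; this function only does the iteration bookkeeping.
    """
    best_rate, best_params = -1.0, params
    prev_ll = None
    for _ in range(max_iters):
        params = em_step(params, train_data)              # E-step + M-step
        ll = log_likelihood(params, train_data) / len(train_data)
        rate = classify_rate(params, test_data)           # monitored, as in Fig. 2
        if rate > best_rate:                              # paper reports the best
            best_rate, best_params = rate, params         # rate after ~4 iterations
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break                                         # likelihood has converged
        prev_ll = ll
    return best_params, best_rate
```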
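The comparison between the Gauss-Markov model and the dynamical-system model hinges on the measurement-noise term. As a sketch of how observation noise enters a segment score, here is a generic Kalman-filter (innovations-form) log-likelihood for a linear dynamical system with a class-dependent output mean; it omits the paper's time-invariant-region structure, and all function and parameter names are ours. Conceptually, shrinking the measurement-noise covariance R toward zero recovers a Gauss-Markov-style model with no observation noise.

```python
import numpy as np

def lds_loglik(y, F, Q, H, R, mu, x0, P0):
    """Log-likelihood of observations y (T x d) under a linear dynamical system
        x[t] = F x[t-1] + w[t],       w ~ N(0, Q)
        y[t] = H x[t] + mu + v[t],    v ~ N(0, R)   (measurement noise)
    computed with a standard Kalman filter in innovations form."""
    x, P = x0, P0
    ll, d = 0.0, y.shape[1]
    for t in range(y.shape[0]):
        # predict the state forward one frame
        x = F @ x
        P = F @ P @ F.T + Q
        # innovation (prediction error) and its covariance
        e = y[t] - (H @ x + mu)
        S = H @ P @ H.T + R
        ll += -0.5 * (d * np.log(2 * np.pi)
                      + np.linalg.slogdet(S)[1]
                      + e @ np.linalg.solve(S, e))
        # measurement update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ e
        P = P - K @ H @ P
    return ll

def classify(y, models):
    """Pick the phone model (name -> (F, Q, H, R, mu, x0, P0) tuple)
    with the highest segment log-likelihood."""
    return max(models, key=lambda m: lds_loglik(y, *models[m]))
```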