<?xml version="1.0" standalone="yes"?>
<Paper uid="H94-1015">
<Title>SPEECH RECOGNITION USING A STOCHASTIC LANGUAGE MODEL INTEGRATING LOCAL AND GLOBAL CONSTRAINTS</Title>
<Section position="6" start_page="90" end_page="90" type="evalu">
<SectionTitle> 4. EXPERIMENTS </SectionTitle>
<Paragraph position="0"/>
<Section position="1" start_page="90" end_page="90" type="sub_section">
<SectionTitle> 4.1. Estimation of Language Model Parameters </SectionTitle>
<Paragraph position="0"> An 11,000-sentence text database of Japanese conversations concerning conference registration was used to train the language models.</Paragraph>
<Paragraph position="1"> This database is manually labeled with part-of-speech tags. Each word was classified as a function word or a content word according to its part of speech. The vocabulary contains 5,389 words (5,041 content words and 348 function words); words with the same spelling but different pronunciations, or with different parts of speech, were counted as different words.</Paragraph>
<Paragraph position="2"> The probability values in the language models were estimated by the maximum likelihood method and then smoothed with the deleted interpolation method [9]. To cope with the unknown-word problem, 'zero-gram' probabilities (a uniform distribution) were also included in the interpolation. In model II, this interpolation was applied separately to the probabilities of the local constraints (P_L) and those of the global constraints (P_G).</Paragraph>
<Paragraph position="3"> In the calculation of perplexity for model II, using the values obtained by equation (10) does not give the correct perplexity, because</Paragraph>
<Paragraph position="4"> \sum_{w_i} P(w_i | w_{i-1}, c_{i-1}, f_{i-1}) = 1 </Paragraph>
<Paragraph position="5"> does not hold due to the approximation. The values of P(w_i | w_{i-1}, c_{i-1}, f_{i-1}) were therefore normalized so as to satisfy this equation; the normalization was done by simply multiplying by a constant found for each combination of (w_{i-1}, c_{i-1}, f_{i-1}). It was omitted in the recognition experiments for computational reasons.</Paragraph>
<Paragraph position="6"> The beam width for recognition was fixed at 6,000 in all cases. Weighting values for the acoustic and linguistic scores were determined by preliminary experiments; common weights were used for all models.</Paragraph>
</Section>
<Section position="2" start_page="90" end_page="90" type="sub_section">
<SectionTitle> 4.2. Experimental Conditions </SectionTitle>
<Paragraph position="0"> Speaker-dependent continuous speech recognition experiments were carried out under the conditions shown in Table 2. The domain of the recognition task is the same as that of the training data, but the text of the test speech data was not included in the training data. Context-independent continuous mixture HMMs were used as the acoustic models.</Paragraph>
</Section>
</Section>
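The following is an illustrative sketch, not the authors' code, of the two estimation steps described in Section 4.1: interpolation of maximum-likelihood estimates with a uniform 'zero-gram' floor, and the per-context renormalization applied when computing the perplexity of model II. The function approx_prob and the interpolation weights (lambdas) are hypothetical placeholders; in particular, approx_prob stands in for the value given by the paper's equation (10), whose exact form is not reproduced here.

def interpolate(ml_probs, vocab_size, lambdas):
    """Interpolation-style smoothing of maximum-likelihood estimates.

    ml_probs   -- ML estimates for one word, ordered from the highest-order
                  model down to the unigram
    vocab_size -- vocabulary size; 1/vocab_size is the 'zero-gram' (uniform)
                  term used to cope with unknown words
    lambdas    -- interpolation weights, one per term, summing to one; the
                  deleted interpolation method would estimate these on
                  held-out data, which is not shown here
    """
    terms = ml_probs + [1.0 / vocab_size]
    assert len(lambdas) == len(terms) and abs(sum(lambdas) - 1.0) < 1e-9
    return sum(lam * p for lam, p in zip(lambdas, terms))

def renormalize(approx_prob, vocabulary, context):
    """Per-context renormalization for the model-II perplexity calculation.

    approx_prob(w, context) stands for the approximate value of
    P(w | w_prev, c_prev, f_prev); because of the approximation these values
    need not sum to one over w, so each is multiplied by the same constant
    1/Z found for its context.
    """
    z = sum(approx_prob(w, context) for w in vocabulary)
    return {w: approx_prob(w, context) / z for w in vocabulary}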
<Section position="7" start_page="90" end_page="90" type="evalu">
<SectionTitle> 4.3. Results </SectionTitle>
<Paragraph position="0"> The results are shown in Table 4. The proposed models give lower perplexities than the bigram model, although not as low as the trigram model, and this is reflected in the speech recognition accuracy. The perplexity of model II is higher than that of model I, which we attribute to the approximation used to derive model II; the smallness of the increase, however, supports the validity of the assumptions described in 2.3.</Paragraph>
<Paragraph position="1"> Although the perplexity and recognition rate are improved compared with the bigram model, the gain is modest. This may be due to a lack of training data or to a mismatch between the training and test data, especially since the difference in performance between the bigram model and the trigram model is also small.</Paragraph>
<Paragraph position="2"> However, the fact that the performance of the proposed model II lies almost halfway between the bigram and the trigram shows that the proposed model can capture linguistic constraints effectively with a comparatively small number of parameters. Its performance could be improved by extending it to use trigram probabilities for the local or global constraints.</Paragraph>
</Section>
<Section position="8" start_page="90" end_page="90" type="evalu">
<SectionTitle> 5. DISCUSSIONS </SectionTitle>
<Paragraph position="0"> In an attempt to capture the global constraints, we took note of the role of function words as case markers and used their N-gram probabilities to extract the syntactic constraints. We also used the N-gram probabilities of the content words to extract the semantic constraints.</Paragraph>
<Paragraph position="1"> One advantage of this approach is that its computational cost is low compared with previous works [3, 4, 5, 6]. Furthermore, since the syntactic constraints are considered to be less dependent on the domain than the semantic ones, the function word N-grams could be trained on a large task-independent database and combined with content word N-grams trained on a smaller task-dependent database.</Paragraph>
<Paragraph position="2"> One disadvantage of our approach is that labels indicating whether each word is a function word or a content word are required in the training data. We think automatic labeling would not be very difficult if the words only have to be classified into these two categories, because the set of function words can be regarded as a closed class.</Paragraph>
<Paragraph position="3"> Another problem is the generality of the approach, especially its applicability to other languages. English, for example, has a different sentence structure and a different way of marking cases, although relationships between content words are still expected to exist. We think a similar approach could also be useful for other languages, but some modification may be needed.</Paragraph>
</Section>
</Paper>
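As a rough illustration of the combination suggested in the discussion above, and not part of the original paper, the sketch below trains a function-word bigram on one corpus and a content-word bigram on another, then looks up whichever table matches the class of the predicted word. The corpora, the function-word list, the interpolation weight lam, and all identifiers are made up for illustration; the smoothing is a crude uniform floor standing in for deleted interpolation.

from collections import Counter

def train_class_bigram(sentences, keep):
    # Count bigrams over the sub-sequence of words in each sentence that
    # satisfy `keep` (e.g. only function words, or only content words).
    bigrams, histories = Counter(), Counter()
    for sentence in sentences:
        kept = [w for w in sentence if keep(w)]
        histories.update(kept[:-1])
        bigrams.update(zip(kept, kept[1:]))
    return bigrams, histories

def class_bigram_prob(bigrams, histories, prev, word, vocab_size, lam=0.8):
    # ML bigram estimate interpolated with a uniform floor (a stand-in for
    # the deleted-interpolation smoothing used in the paper).
    ml = bigrams[(prev, word)] / histories[prev] if histories[prev] else 0.0
    return lam * ml + (1.0 - lam) / vocab_size

# Hypothetical usage: function-word statistics from a large task-independent
# corpus, content-word statistics from a smaller task-dependent corpus.
FUNCTION_WORDS = {"wa", "ga", "o", "ni", "no"}   # assumed closed class of particles
is_function = lambda w: w in FUNCTION_WORDS
is_content = lambda w: w not in FUNCTION_WORDS

general_corpus = [["kaigi", "ni", "sanka", "o", "kibou", "shimasu"]]  # made-up data
task_corpus = [["touroku", "youshi", "o", "okutte", "kudasai"]]       # made-up data

f_bi, f_hist = train_class_bigram(general_corpus, is_function)
c_bi, c_hist = train_class_bigram(task_corpus, is_content)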