<?xml version="1.0" standalone="yes"?> <Paper uid="H94-1014"> <Title>Language Modeling with Sentence-Level Mixtures</Title> <Section position="6" start_page="84" end_page="85" type="concl"> <SectionTitle> 3.2. Results </SectionTitle> <Paragraph position="0"> The results reported in Table 1 compare three different language models in terms of perplexity and recognition performance: a simple trigram, and five- and eight-component mixtures. The mixture models reduce the perplexity only by a small amount, but there is a reduction in word-error with the five-component mixture model. We hypothesize that there is not enough training data to effectively use more mixture components.</Paragraph> <Paragraph position="1"> for different acoustic model (AM) and language model (LM) knowledge sources (KSs).</Paragraph> <Paragraph position="2"> a conceptually simple variation of statistical n-gram models. The model is actually a two-level mixture model, with separate mixture weights at the n-gram and sentence levels. Training involves either automatic clustering or heuristic rule, s to determine the initial topic-dependent models, and an iterative algorithm for estimating mixture-weights at the different levels. Recognition experiments on the WSJ task showed a significant improvement in the accuracy for the BU-SSM recognition system.</Paragraph> <Paragraph position="3"> This work can be extended in several ways. First, time limitations ,did not permit us to explore the use of the complete EM algorithm for estimating mixture components and weights jointly, ~xid we hope to investigate that approach in the future. In addition, it may be useful to consider other metrics for automatic topic clustering, such as a word count weighted by inverse document frequencies or a multinomial distribution assumption with a likelihood clustering criterion. Of course, it would also be interesting to see ff further performance gains could be achieved with more clusters. Much more could also be done in the area of robust parameter estimation. For example, one could use an n-gram part-of-speech sequence model as the base for all component models and topic-dependent word likelihoods given the part-of-speech label, a natural extension of \[16\].</Paragraph> <Paragraph position="4"> Dynamic language model adaptation, which makes use of the previous document history to tune the language model to that particular topic, can easily fit into the mixture model framework in two ways. First, the sentence-level mixture weights can be adapted according to the likelihood of the respective mixture components in the previous utterance, as in \[8\] for n-gram level mixture weights. Second, the dynamic n-gram cache model \[I, 9\] can easily be incorporated into the mixture language model. However, in the mixture model, it is possible to have component-dependent cache models, where each component cache would be updated after each sentence according to the likelihood of that component given the recognized word string. Trigger models \[2, 3\] could also be component dependenL The simple static mixture language model can also be useful in applications other than continuous speech transcription. For example, topic -dependent models could be used for topic spotting. In addition, as mentioned earlier, the notion of topic need not be related to subject area, it can be related to speaking style or speaker goal. In the ATIS task, for example, the goal of the speaker (e.g. 
<Paragraph position="4"> Dynamic language model adaptation, which makes use of the previous document history to tune the language model to that particular topic, can easily fit into the mixture model framework in two ways. First, the sentence-level mixture weights can be adapted according to the likelihood of the respective mixture components in the previous utterance, as in [8] for n-gram-level mixture weights. Second, the dynamic n-gram cache model [1, 9] can easily be incorporated into the mixture language model. However, in the mixture model, it is possible to have component-dependent cache models, where each component cache would be updated after each sentence according to the likelihood of that component given the recognized word string. Trigger models [2, 3] could also be component dependent. The simple static mixture language model can also be useful in applications other than continuous speech transcription. For example, topic-dependent models could be used for topic spotting. In addition, as mentioned earlier, the notion of topic need not be related to subject area; it can be related to speaking style or speaker goal. In the ATIS task, for example, the goal of the speaker (e.g. flight information request, response clarification, error correction) is likely to be reflected in the language of the utterance. Representing this structure explicitly has the double benefit of improving recognition performance and providing information for a dialog model.</Paragraph>
<Paragraph position="5"> From a cursory look at our recognition errors from the recent WSJ benchmark tests, it is clear that topic-dependent models will not be enough to dramatically reduce word error rate.</Paragraph>
<Paragraph position="6"> Out-of-vocabulary words and function words also represent a major source of errors. However, an important advantage of this framework is that it is a simple extension of existing language modeling techniques that can easily be integrated with other language modeling advances.</Paragraph>
</Section>
</Paper>