<?xml version="1.0" standalone="yes"?>
<Paper uid="H91-1057">
  <Title>A Dynamic Language Model for Speech Recognition</Title>
  <Section position="2" start_page="0" end_page="293" type="intro">
    <SectionTitle>
INTRODUCTION
</SectionTitle>
    <Paragraph position="0"> A language model is used in speech recognition systems and automatic translation systems to improve the performance of such systems. A trigram language model \[1\], whose parameters are estimated from a large corpus of text (greater than a few million words), has been used successfully in both applications. The trigram language model has a probability distribution for the next word conditioned on the previous two words. This static distribution is obtained a.s an average over many documents. Yet we know that sev-I Now at Rutgers University, N J, work performed while visiting IBM eral words are bursty by nature, i.e., one expects the word &amp;quot;language&amp;quot; to occur in this paper at a significantly higher rate than the average frequency estimated from a large collection of text. To capture the &amp;quot;dynamic&amp;quot; nature of the trigram probabilities in a particular document, we present a &amp;quot;cache;' trigram language model (CTLM) that uses a window of the n most recent words to determine the probability distribution of the next word.</Paragraph>
    <Paragraph position="1"> The idea of using a window of the recent history to adjust the LM probabilities was proposed in \[2, 3\]. In \[2\] tile dynamic component adjusted the conditional probability, p,(w,+l \] g,+l), of the next word, wn+l, given a predicted part-oPspeech (POS), g,~+l, in a tripart-of-speech language model. Each POS had a separate cache where the frequencies of all the words that occurred with a POS determine the dynamic component of the language model. As a word is observed it is tagged and the appropriate POS cache is updated.</Paragraph>
    <Paragraph position="2"> At least 5 occurrences per cache are required before activating it. P~eliminary results for a couple POS caches indicate that with appropriate smoothing the perplexity, for example, on NN-words is decreased by a factor of 2.5 with a cache-based conditional word probability given the POS category instead of a static probability model.</Paragraph>
    <Paragraph position="3"> In \[3\], the dynamic model uses two bigram language models, p,(w,.+l \] w,, D), where D=I for words w,+l that have occurred in the cache window and D=0 for words that have occurred in the cache window. For cache sizes from 128 to 4096, the reported res,lts indicate an improvement in the average rank of tile correct word predicted by the model by 7% to  17% over the static model assuming one knows if the next word is in the cache or not.</Paragraph>
    <Paragraph position="4"> In this paper, we will present a new cache language model and compare its performance to a tri-gram language model. In Section 2, we present our proposed dynamic component and some results comparing static and dynamic trigram language models using perplexity. In Section 3, we present our method for incorporating the dynamic language model in an isolated 20,000 word speech recognizer and its effect on recognition performance.</Paragraph>
  </Section>
</Paper>