<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1073">
<Title>Distribution-Based Pruning of Backoff Language Models</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0">Statistical language modelling (SLM) has been successfully applied to many domains, such as speech recognition (Jelinek, 1990), information retrieval (Miller et al., 1999), and spoken language understanding (Zue, 1995). In particular, the n-gram language model (LM) has been demonstrated to be highly effective for these domains. An n-gram LM estimates the probability of a word given the previous words,</Paragraph>
<Paragraph position="1">$P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$.</Paragraph>
<Paragraph position="2">In applying an SLM, it is usually the case that more training data will improve the language model. However, as the training data size increases, the LM size increases as well, which can lead to models that are too large for practical use.</Paragraph>
<Paragraph position="3">To deal with this problem, count cutoff (Jelinek, 1990) is widely used to prune language models. The cutoff method deletes from the LM those n-grams that occur infrequently in the training data. It assumes that if an n-gram is infrequent in the training data, it is also infrequent in the testing data. But in the real world, training data rarely matches testing data perfectly, so the count cutoff method is not optimal.</Paragraph>
<Paragraph position="4">In this paper, we propose a distribution-based cutoff method. This approach estimates whether an n-gram is "likely to be infrequent in testing data". To determine this likelihood, we divide the training data into partitions and use a cross-validation-like approach. Experiments show that this method performs 7-9% better (in word perplexity reduction) than conventional cutoff methods.</Paragraph>
<Paragraph position="5">In section 2, we discuss prior SLM research, including the backoff bigram LM, perplexity, and related work on LM pruning methods. In section 3, we propose a new criterion for LM pruning based on n-gram distribution, and discuss in detail how to estimate the distribution. In section 4, we compare our method with count cutoff and present experimental results in terms of perplexity. Finally, we present our conclusions in section 5.</Paragraph>
</Section>
</Paper>
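The following is a minimal illustrative sketch, not the authors' implementation. It contrasts conventional count-cutoff pruning with a much-simplified stand-in for the distribution-based idea sketched in the introduction: partition the training data and keep an n-gram only if it appears in enough partitions, as a rough proxy for "unlikely to be infrequent in testing data". All function names, thresholds, and the specific keep/prune rule here are hypothetical; the paper's actual criterion is developed in section 3.

```python
# Illustrative sketch (not the paper's method): count cutoff vs. a simplified
# distribution-based criterion estimated from partitions of the training data.
from collections import Counter
from typing import Iterable, List, Set, Tuple

Bigram = Tuple[str, str]


def bigram_counts(tokens: Iterable[str]) -> Counter:
    """Count bigrams in a token sequence."""
    tokens = list(tokens)
    return Counter(zip(tokens, tokens[1:]))


def count_cutoff(counts: Counter, cutoff: int = 1) -> Set[Bigram]:
    """Conventional count cutoff: keep only bigrams seen more than `cutoff` times."""
    return {ng for ng, c in counts.items() if c > cutoff}


def distribution_based_cutoff(partitions: List[List[str]],
                              min_partitions: int = 2) -> Set[Bigram]:
    """Cross-validation-like stand-in: keep a bigram only if it occurs in at
    least `min_partitions` of the training-data partitions, i.e. its counts
    are spread across the data rather than concentrated in one part."""
    partition_counts = [bigram_counts(p) for p in partitions]
    all_bigrams = set().union(*partition_counts) if partition_counts else set()
    kept = set()
    for ng in all_bigrams:
        support = sum(1 for pc in partition_counts if pc[ng] > 0)
        if support >= min_partitions:
            kept.add(ng)
    return kept


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat slept on the mat".split()
    counts = bigram_counts(corpus)
    print("count cutoff keeps:", sorted(count_cutoff(counts, cutoff=1)))

    # Split the corpus into two partitions to mimic the cross-validation idea.
    half = len(corpus) // 2
    parts = [corpus[:half], corpus[half:]]
    print("distribution-based keeps:", sorted(distribution_based_cutoff(parts)))
```

The point of the contrast is only that the two criteria can disagree: a bigram with a moderately high count concentrated in one partition survives the count cutoff but not the partition-support rule, while a low-count bigram spread evenly across partitions can survive the latter.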