<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1024">
<Title>Exploring Asymmetric Clustering for Statistical Language Modeling</Title>
<Section position="4" start_page="1" end_page="1" type="relat">
<SectionTitle> 2 Related Work </SectionTitle>
<Paragraph position="0"> A large amount of previous research on clustering has focused on how to find the best clusters [Brown et al., 1992; Kneser and Ney, 1993; Yamamoto and Sagisaka, 1999; Ueberla, 1996; Pereira et al., 1993; Bellegarda et al., 1996; Bai et al., 1998]. Only small differences have been observed, however, in the performance of the different techniques for constructing clusters. In this study, we focus our research on a novel technique for using clusters: the ACM, in which different clusters are used for predicted and conditional words, respectively.</Paragraph>
<Paragraph position="1"> The discussion of the ACM in this paper extends several earlier studies. The first similar cluster model was presented by Goodman and Gao [2000], in which clustering techniques were combined with Stolcke's [1998] pruning to reduce language model (LM) size effectively. Goodman [2001] and Gao et al. [2001] give detailed descriptions of the asymmetric clustering algorithm; however, the impact of asymmetric clustering on the performance of the resulting cluster model was not studied empirically there. Gao et al. [2001] also presented a fairly thorough empirical study of clustering techniques for Asian language modeling.</Paragraph>
<Paragraph position="2"> Unfortunately, all of the above work studied the ACM without applying it to an application; thus only perplexity results were presented. The first real application of the ACM was a simplified bigram ACM used in a Chinese text input system [Gao et al., 2002]. However, quite a few techniques (including clustering) were integrated to construct that Chinese language modeling system, and the contribution of the ACM alone was not fully investigated.</Paragraph>
<Paragraph position="3"> Finally, one more point is worth mentioning. Most previously reported language modeling improvements required significantly more space than word trigram models [Rosenfeld, 2000]. Their practical value is therefore questionable, since all realistic applications have memory constraints. In this paper, our goal is to achieve a better tradeoff between LM performance (perplexity and character error rate, CER) and model size. Thus, whenever we compare the performance of different models (e.g., the ACM vs. the word trigram model), Stolcke's pruning is employed to bring the compared models to similar sizes.</Paragraph>
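<Paragraph position="4"> To make the asymmetry concrete, the following is a minimal sketch of the kind of factorization the ACM uses; the notation is ours, and the exact parameterization in Goodman [2001] and Gao et al. [2001] may differ:
\[
P(w_i \mid w_{i-2}\, w_{i-1}) \approx P\big(C_P(w_i) \mid C_C(w_{i-2})\, C_C(w_{i-1})\big) \cdot P\big(w_i \mid C_C(w_{i-2})\, C_C(w_{i-1})\, C_P(w_i)\big),
\]
where $C_C$ maps words in the conditional (history) positions to clusters and $C_P$ maps the predicted word to a cluster. A symmetric cluster model constrains $C_P = C_C$; the ACM drops this constraint, so the two maps, and even their numbers of clusters, can be chosen independently.</Paragraph>
</Section>
</Paper>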