<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1024">
<Title>Exploring Asymmetric Clustering for Statistical Language Modeling</Title>
<Section position="3" start_page="1" end_page="1" type="intro">
<SectionTitle>
1 Introduction
</SectionTitle>
<Paragraph position="0"> The n-gram model has been widely applied in applications such as speech recognition, machine translation, and Asian language text input [Jelinek, 1990; Brown et al., 1990; Gao et al., 2002]. It is a stochastic model that predicts the next word (the predicted word) given the previous n-1 words (the conditional words) in a word sequence.</Paragraph>
<Paragraph position="1"> The cluster n-gram model is a variant of the word n-gram model in which similar words are grouped into the same cluster. Clustering has been shown to be an effective way of dealing with the data sparseness problem and of reducing the memory requirements of realistic applications. Recent research [Yamamoto et al., 2001] shows that using different clusters for predicted and conditional words can lead to cluster models that are superior to classical cluster models, which use the same clusters for both [Brown et al., 1992]. This is the basis of the asymmetric cluster model (ACM), which will be formally defined and empirically studied in this paper.</Paragraph>
<Paragraph position="2"> Although similar models have been used in previous studies [Goodman and Gao, 2000; Yamamoto et al., 2001], several issues have not been fully investigated: (1) an effective methodology for constructing the ACM, (2) a thorough comparative study of the ACM against classical cluster models and word models in a realistic application, and (3) an analysis of why the ACM is superior.</Paragraph>
<Paragraph position="3"> The goal of this study is to address these three issues. We first present a formal definition of the ACM; we then describe in detail the methodology for constructing it, including (1) an asymmetric clustering algorithm in which different metrics are used for clustering the predicted and conditional words, and (2) a method for model parameter optimization that finds the optimal number of clusters for each clustering. We evaluate the ACM on a real application, Japanese Kana-Kanji conversion, which converts phonetic Kana strings into proper Japanese orthography. Performance is measured in terms of character error rate (CER). Our results show substantial improvements of the ACM over classical cluster models and word n-gram models at the same model size. Our analysis shows that the high performance of the ACM comes from better structure and better smoothing, both of which lie in the asymmetry of the model.</Paragraph>
<Paragraph position="4"> This paper is organized as follows: Section 1 introduces our research topic, and Section 2 reviews related work. Section 3 defines the ACM and describes the method of model construction in detail. Section 4 first introduces the Japanese Kana-Kanji conversion task and then presents our main experiments and a discussion of our findings.</Paragraph>
<Paragraph position="5"> Finally, conclusions are presented in Section 5.</Paragraph>
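<Paragraph position="6"> To make the models sketched in this introduction concrete, a few illustrative formulas follow; the notation is ours and anticipates, but does not replace, the formal definitions of Section 3. The word n-gram model approximates the probability of a word sequence as $P(w_1 \ldots w_T) \approx \prod_{i=1}^{T} P(w_i \mid w_{i-n+1} \ldots w_{i-1})$, where $w_i$ is the predicted word and $w_{i-n+1} \ldots w_{i-1}$ are the conditional words.</Paragraph>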
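<Paragraph position="7"> Similarly, in the bigram case a classical cluster model in the style of [Brown et al., 1992] factors the word probability through a single clustering $C$: $P(w_i \mid w_{i-1}) \approx P(C(w_i) \mid C(w_{i-1})) \times P(w_i \mid C(w_i))$. An asymmetric variant instead maintains two clusterings, a predictive clustering $C_P$ for the predicted word and a conditional clustering $C_C$ for the conditional word, for example $P(w_i \mid w_{i-1}) \approx P(C_P(w_i) \mid C_C(w_{i-1})) \times P(w_i \mid C_P(w_i))$, so that each position can be clustered with the metric, and the number of clusters, best suited to it. This is only one possible factorization; the ACM used in our experiments is defined precisely in Section 3.</Paragraph>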
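<Paragraph position="8"> Finally, for reference, character error rate is conventionally computed as the character-level edit distance between the converter's output and the reference string, divided by the length of the reference. The Python sketch below illustrates this convention; it is ours, not the evaluation code used in our experiments.
def cer(hypothesis: str, reference: str) -> float:
    # Character error rate under the standard convention:
    # Levenshtein distance divided by reference length.
    m, n = len(hypothesis), len(reference)
    prev = list(range(n + 1))  # row of distances for the empty hypothesis prefix
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            substitution = prev[j - 1] + (hypothesis[i - 1] != reference[j - 1])
            curr[j] = min(prev[j] + 1,      # deletion
                          curr[j - 1] + 1,  # insertion
                          substitution)     # substitution (or match)
        prev = curr
    return prev[n] / max(n, 1)
</Paragraph>
</Section>
</Paper>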