<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1710">
<Title>Modeling of Long Distance Context Dependency in Chinese</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle>1 Introduction</SectionTitle>
<Paragraph position="0"> Language modeling is the attempt to characterize, capture and exploit the regularities and constraints in natural language. Among various language modeling approaches, ngram modeling has been widely used in many applications, such as speech recognition and machine translation (Katz 1987; Jelinek 1989; Gale and Church 1990; Brown et al. 1992; Yang et al. 1996; Bai et al. 1998; Zhou et al. 1999; Rosenfeld 2000; Gao et al. 2002). Although ngram modeling is simple in nature and easy to use, it has obvious deficiencies. For instance, ngram modeling can only capture the short distance context dependency within an N-word window, where currently the largest practical N for natural language is three.</Paragraph>
<Paragraph position="1"> Meanwhile, there always exist many preferred relationships between words. For example, two highly associated Chinese word pairs are Bu Jin/Er Qie (&quot;not only/but also&quot;) and Yi Sheng/Hu Shi (&quot;doctor/nurse&quot;). Psychological experiments in Meyer et al. (1975) indicated that a human's reaction to a highly associated word pair is stronger and faster than the reaction to a poorly associated pair. Such preference information is very useful for natural language processing (Church et al. 1990; Hindle et al. 1993; Rosenfeld 1994; Zhou et al. 1998; Zhou et al. 1999). Obviously, the preference relationships between words can extend from a short distance to a long distance. While traditional ngram modeling can capture the short distance context dependency, the long distance context dependency should also be exploited properly.</Paragraph>
<Paragraph position="2"> The purpose of this paper is to propose a new MI-Ngram modeling approach that captures context dependency over both short and long distances. Experiments show that the new MI-Ngram model achieves significantly lower perplexity than the traditional ngram model. Moreover, evaluation on Chinese word segmentation shows that the new approach significantly reduces the error rate.</Paragraph>
<Paragraph position="3"> This paper is organized as follows. In section 2, we describe the traditional ngram modeling approach and discuss its main properties. In section 3, we propose the new MI-Ngram modeling approach to capture context dependency over both short and long distances. In section 4, we evaluate the MI-Ngram modeling approach and its application to Chinese word segmentation. Finally, we summarize the paper in section 5.</Paragraph>
<Paragraph position="4"> The ngram conditional probability \(P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})\) can be estimated by maximum likelihood estimation (MLE):
\[ P(w_i \mid w_{i-n+1}^{i-1}) = \frac{C(w_{i-n+1}^{i})}{C(w_{i-n+1}^{i-1})}, \]
where \(C(\cdot)\) represents the number of times the sequence occurs in the training data. In practice, due to the data sparseness problem, smoothing techniques such as linear interpolation (Jelinek 1989; Chen and Goodman 1999) and back-off modeling (Katz 1987) are applied.</Paragraph>
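<Paragraph position="5"> For concreteness, linear interpolation combines maximum likelihood estimates of decreasing order into one smoothed estimate. A standard trigram formulation (a sketch for reference; the interpolation weights \(\lambda_j\) are illustrative, not taken from this paper) is
\[ P_{\mathrm{interp}}(w_i \mid w_{i-2}, w_{i-1}) = \lambda_3 P_{\mathrm{ML}}(w_i \mid w_{i-2}, w_{i-1}) + \lambda_2 P_{\mathrm{ML}}(w_i \mid w_{i-1}) + \lambda_1 P_{\mathrm{ML}}(w_i), \]
with \(\lambda_1 + \lambda_2 + \lambda_3 = 1\) and \(\lambda_j \ge 0\). Back-off modeling, by contrast, falls back to the lower-order estimate only when the higher-order ngram is unseen.</Paragraph>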
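<Paragraph position="6"> The following Python sketch shows how such estimates can be computed in practice. It is an illustration only, not the authors' implementation: the toy corpus, the sentence-boundary padding, and the interpolation weights are all assumptions.
```python
from collections import defaultdict

def train_counts(corpus, n=3):
    """Count all k-gram occurrences (k = 1..n) in a tokenized corpus."""
    counts = defaultdict(int)
    for sentence in corpus:
        # Pad with pseudo-tokens so initial and final words get full histories.
        tokens = ["<s>"] * (n - 1) + sentence + ["</s>"]
        for i in range(len(tokens)):
            for k in range(1, n + 1):
                if i + k <= len(tokens):
                    counts[tuple(tokens[i:i + k])] += 1
    return counts

def p_ml(counts, word, history):
    """MLE: P(word | history) = C(history + word) / C(history)."""
    history = tuple(history)
    if history:
        denom = counts[history]
    else:
        # Unigram case: normalize by the total number of tokens.
        denom = sum(c for ng, c in counts.items() if len(ng) == 1)
    num = counts[history + (word,)]
    return num / denom if denom else 0.0

def p_interp(counts, word, history, lambdas=(0.2, 0.3, 0.5)):
    """Linearly interpolated trigram estimate; the weights are illustrative."""
    l1, l2, l3 = lambdas
    return (l3 * p_ml(counts, word, history[-2:])   # trigram term
            + l2 * p_ml(counts, word, history[-1:])  # bigram term
            + l1 * p_ml(counts, word, []))           # unigram term

# Toy usage: two tiny "sentences" echoing the doctor/nurse example.
corpus = [["the", "doctor", "met", "the", "nurse"],
          ["the", "nurse", "met", "the", "doctor"]]
counts = train_counts(corpus)
print(p_interp(counts, "nurse", ["met", "the"]))
```
</Paragraph>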
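<Paragraph position="7"> Finally, a note on the association measure behind the word-pair preferences discussed above: the MI-Ngram model of section 3 presumably builds on mutual information (hence its name). The standard pointwise mutual information of a word pair \((x, y)\), given here for reference only (the paper's exact definition appears in section 3), is
\[ \mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}, \]
which is large for highly associated pairs such as Yi Sheng/Hu Shi (&quot;doctor/nurse&quot;) and near zero for independent pairs.</Paragraph>
</Section>
</Paper>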