<?xml version="1.0" standalone="yes"?>
<Paper uid="E95-1008">
<Title>Collocation Map for Overcoming Data Sparseness</Title>
<Section position="2" start_page="0" end_page="53" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> In statistical language processing, n-grams are basic to many probabilistic models, including Hidden Markov models, that work on the limited dependency of linguistic events. In this regard, Bayesian models (Bayesian networks, belief networks, and inference diagrams, to name a few) are not very different from HMMs. Bayesian models capture the conditional independence among probabilistic variables and can compute the conditional distribution of the variables, which is known as probabilistic inference. The pure n-gram statistic, however, is somewhat crude in that it can do nothing about unobserved events, and its approximation of infrequent events can be unreliable.</Paragraph>
<Paragraph position="1"> In this paper we show, by way of extensive experiments, that a Bayesian method that can likewise be composed from bigrams overcomes the data sparseness problem inherent in frequency counting methods. According to the empirical results, the Collocation map, a Bayesian model over lexical variables, induced graceful approximations for unobserved and infrequent events.</Paragraph>
<Paragraph position="2"> Two methods are known for dealing with the data sparseness problem: smoothing and class-based methods (Dagan 1992). Smoothing methods (Church and Gale 1991) readjust the distribution of word occurrence frequencies obtained from sample texts and verify the adjusted distribution on held-out texts. As Dagan (1992) pointed out, however, the values produced by smoothing methods closely agree with the probability of a bigram consisting of two independent words.</Paragraph>
<Paragraph position="3"> Class-based methods (Pereira et al. 1993) approximate the likelihood of an unobserved word on the basis of similar words. Dagan et al. (1992) proposed a non-hierarchical class-based method.</Paragraph>
<Paragraph position="4"> The two approaches report limited successes of a purely experimental nature, because both rest on strong assumptions. In the case of smoothing methods, the frequency readjustment is somewhat arbitrary and will not be good for heavily dependent bigrams. As for class-based methods, the notion of similar words differs from method to method, and associating probabilistic dependency with the similarity (class) of words is too strong an assumption in general.</Paragraph>
<Paragraph position="5"> The Collocation map, first suggested in (Han 1993), is a sigmoid belief network with words as probabilistic variables. Sigmoid belief networks were studied extensively by Neal (1992) and admit an efficient inference algorithm. Unlike inference in other Bayesian models, inference on a sigmoid belief network is not NP-hard, and inference methods based on reducing the network and on sampling are discussed in (Han 1995). Bayesian models constructed from local dependencies provide a formal approximation among the variables, so using the Collocation map requires no strong assumption or intuition to justify the associations among words that the map produces. The results of inference on the Collocation map are probabilities over any combination of words represented in the map, which is not found in other models. One significant shortcoming of Bayesian models lies in the heavy cost of inference. Our implementation of the Collocation map includes 988 nodes and takes two to three minutes to compute an association between words.</Paragraph>
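To make the contrast concrete, here is a minimal sketch of the two kinds of estimator being compared: a naive bigram conditional probability, which assigns zero to any unobserved pair, and inference in a tiny sigmoid belief network of the kind the Collocation map is built on. The vocabulary, topology, weights, and biases below are invented for illustration only; the actual map is induced from bigram data and uses the inference methods of (Han 1995) rather than plain forward sampling.

import math
import random

def bigram_cond(bigram_counts, unigram_counts, w1, w2):
    # Naive bigram statistic P(w2 | w1) from raw counts; any pair
    # never seen in the sample gets probability zero.
    if unigram_counts.get(w1, 0) == 0:
        return 0.0
    return bigram_counts.get((w1, w2), 0) / unigram_counts[w1]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# A toy sigmoid belief network in the spirit of Neal (1992): each
# binary word variable turns on with probability sigmoid(bias + sum
# of weighted parent states). Topology and parameters are hypothetical.
ORDER = ["stock", "market", "price", "share"]   # topological order
PARENTS = {
    "stock":  [],
    "market": [("stock", 2.0)],
    "price":  [("stock", 1.5), ("market", 1.0)],
    "share":  [("stock", 1.8)],
}
BIAS = {"stock": -1.0, "market": -2.0, "price": -2.0, "share": -2.5}

def sample_state(clamped):
    # Ancestral sampling with the evidence words clamped to 1.
    state = {}
    for v in ORDER:
        if v in clamped:
            state[v] = 1
        else:
            act = BIAS[v] + sum(w * state[p] for p, w in PARENTS[v])
            state[v] = 1 if random.random() < sigmoid(act) else 0
    return state

def network_cond(w2, w1, n=10000):
    # Monte Carlo estimate of P(w2 = 1 | w1 = 1); unbiased here
    # because w1 is a root of the toy network.
    return sum(sample_state({w1})[w2] for _ in range(n)) / n

unigrams = {"stock": 50, "market": 30, "price": 12, "share": 8}
bigrams = {("stock", "market"): 20, ("stock", "price"): 6}

# ("stock", "share") was never observed: the bigram statistic can say
# nothing, while the network still yields a graded, nonzero estimate.
print(bigram_cond(bigrams, unigrams, "stock", "share"))   # 0.0
print(network_cond("share", "stock"))                     # roughly 0.33

Note that clamping works here only because the conditioning word is a root of the toy network; conditioning on non-root variables requires the network-reduction and sampling methods cited above.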
<Paragraph position="6"> The purpose of the experiments is to find out how gracefully the Collocation map deals with unobserved cooccurrences in comparison with a naive bigram statistic. In the next section, the Collocation map is reviewed following the definition in (Han 1993). In Section 3, mutual information and conditional probabilities computed using bigrams and the Collocation map are compared. Section 4 concludes the paper by summarizing the good and bad points of the Collocation map and the other methods.</Paragraph>
</Section>
</Paper>