<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0417">
<Title>Training a Naive Bayes Classifier via the EM Algorithm with a Class Distribution Constraint</Title>
<Section position="2" start_page="0" end_page="0" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> Many of the tasks in natural language processing can be addressed as classification problems. State-of-the-art machine learning techniques, including Support Vector Machines (Vapnik, 1995), AdaBoost (Schapire and Singer, 2000), and Maximum Entropy Models (Ratnaparkhi, 1998; Berger et al., 1996), provide high-performance classifiers if one has abundant correctly labeled examples.</Paragraph>
<Paragraph position="1"> However, annotating a large set of examples generally requires a huge amount of human labor and time. This annotation cost is one of the major obstacles to applying machine learning techniques to real-world NLP applications. Recently, learning methods known as minimally supervised or unsupervised learning, which can make use of unlabeled data, have received much attention. Since collecting unlabeled data is generally much easier than annotating data, such techniques have the potential to solve the problem of annotation cost. These approaches include a naive Bayes classifier combined with the EM algorithm (Dempster et al., 1977; Nigam et al., 2000; Pedersen and Bruce, 1998), co-training (Blum and Mitchell, 1998; Collins and Singer, 1999; Nigam and Ghani, 2000), and Transductive Support Vector Machines (Joachims, 1999). These algorithms have been applied to tasks including text classification and word sense disambiguation, and their effectiveness has been demonstrated to some extent.</Paragraph>
<Paragraph position="2"> Combining a naive Bayes classifier with the EM algorithm is a promising minimally supervised approach because its computational cost is low (linear in the size of the unlabeled data) and, unlike co-training, it does not require the features to be split into two independent sets. However, the use of unlabeled data via the basic EM algorithm does not always improve classification performance. In fact, it often causes severe performance degradation, resulting in poor classification accuracy on average. To alleviate this problem, we introduce a class distribution constraint into the iteration process of the EM algorithm. This constraint keeps the class distribution of the unlabeled data consistent with the class distribution estimated from the labeled data, preventing the EM algorithm from converging to an undesirable state.</Paragraph>
<Paragraph position="3"> In order to assess the effectiveness of the proposed method, we applied it to the problem of semantic disambiguation using local context features. Experiments were conducted with 26 confusion sets and a large number of unlabeled examples collected from a corpus of one hundred million words.</Paragraph>
<Paragraph position="4"> This paper is organized as follows. Section 2 briefly reviews the naive Bayes classifier and the EM algorithm as means of using unlabeled data. Section 3 presents the idea of using a class distribution constraint and how to impose this constraint on the learning process. Section 4 describes the problem of confusion set disambiguation and the features used in the experiments. Experimental results are presented in Section 5. Related work is discussed in Section 6. Section 7 offers some concluding remarks.</Paragraph>
</Section>
</Paper>
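The following is a minimal sketch, in Python with NumPy, of the semi-supervised procedure outlined in the introduction: a multinomial naive Bayes classifier is first trained on the labeled data, and EM then alternates between soft-labeling the unlabeled data (E-step) and re-estimating the model (M-step), with the soft labels rescaled so that their aggregate class distribution stays consistent with the distribution estimated from the labeled data. The function names and the particular rescaling rule are illustrative assumptions, not the paper's exact formulation (the actual mechanism is described in Section 3).

# Minimal sketch: semi-supervised multinomial naive Bayes trained with EM,
# plus a class distribution constraint on the unlabeled data.
# Names and the rescaling rule below are illustrative assumptions.
import numpy as np

def m_step(X, post, alpha=1.0):
    # Re-estimate NB parameters from (soft) class-membership weights.
    # X: (n_docs, n_feats) count matrix; post: (n_docs, n_classes) weights.
    prior = post.sum(axis=0) + alpha
    log_prior = np.log(prior / prior.sum())
    feat_counts = post.T @ X + alpha                      # (n_classes, n_feats)
    log_cond = np.log(feat_counts / feat_counts.sum(axis=1, keepdims=True))
    return log_prior, log_cond

def e_step(X, log_prior, log_cond):
    # Posterior P(class | document) under the current model.
    log_joint = X @ log_cond.T + log_prior
    log_joint -= log_joint.max(axis=1, keepdims=True)     # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def apply_class_distribution_constraint(post, target):
    # Rescale soft labels so their aggregate class mass matches the
    # class distribution estimated from the labeled data.
    current = post.mean(axis=0) + 1e-12
    adjusted = post * (target / current)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

def nb_em_with_cdc(X_lab, y_lab, X_unl, n_classes, n_iter=10):
    y_onehot = np.eye(n_classes)[y_lab]
    target = y_onehot.mean(axis=0)                        # class distribution of labeled data
    log_prior, log_cond = m_step(X_lab, y_onehot)         # initial model from labeled data only
    for _ in range(n_iter):
        post_unl = e_step(X_unl, log_prior, log_cond)                     # E-step
        post_unl = apply_class_distribution_constraint(post_unl, target)  # constraint
        X_all = np.vstack([X_lab, X_unl])
        post_all = np.vstack([y_onehot, post_unl])
        log_prior, log_cond = m_step(X_all, post_all)                     # M-step
    return log_prior, log_cond

In this sketch the constraint is a simple per-class rescaling of the E-step posteriors followed by renormalization, which prevents the unlabeled data's soft class mass from drifting away from the labeled class distribution across EM iterations.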