<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-0417">
<Title>Training a Naive Bayes Classifier via the EM Algorithm with a Class Distribution Constraint</Title>
<Section position="6" start_page="75" end_page="75" type="concl">
<SectionTitle>7 Conclusion</SectionTitle>
<Paragraph position="0">The naive Bayes classifier can be combined with the well-established EM algorithm to exploit unlabeled data. However, the use of unlabeled data sometimes causes a disastrous degradation of classification performance.</Paragraph>
<Paragraph position="1">In this paper, we introduced a class distribution constraint into the iteration process of the EM algorithm.</Paragraph>
<Paragraph position="2">This constraint keeps the class distribution of the unlabeled data consistent with the true class distribution estimated from the labeled data, preventing the EM algorithm from converging to an undesirable state.</Paragraph>
<Paragraph position="3">Experimental results using 26 confusion sets and a large amount of unlabeled data showed that combining the EM algorithm with the proposed constraint consistently reduced the average classification error rate when the amount of labeled data was small. The results also showed that unlabeled data is especially advantageous when labeled data is scarce (up to about one hundred labeled examples).</Paragraph>
<Section position="1" start_page="75" end_page="75" type="sub_section">
<SectionTitle>7.1 Future Work</SectionTitle>
<Paragraph position="0">In this paper, we empirically demonstrated that the class distribution constraint reduces the chance of undesirable convergence of the EM algorithm. A theoretical justification of the constraint remains to be established in future work.</Paragraph>
</Section>
</Section>
</Paper>
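
To make the constraint concrete, below is a minimal sketch of semi-supervised naive Bayes training via EM with a class distribution constraint. It assumes multinomial count features, and the rescaling step is one plausible realization of the constraint described above, not necessarily the paper's exact procedure; all names (em_nb_with_cdc, R_unlab, alpha, etc.) are illustrative.

    import numpy as np

    def em_nb_with_cdc(X_lab, y_lab, X_unlab, n_classes, n_iters=20, alpha=1.0):
        """Semi-supervised multinomial naive Bayes trained via EM.

        After each E-step, posteriors on the unlabeled data are rescaled so
        that their aggregate class distribution matches the class prior
        estimated from the labeled data (the class distribution constraint).
        """
        # Class prior from labeled data: the "true" distribution the
        # constraint enforces. It is held fixed across EM iterations.
        prior = np.bincount(y_lab, minlength=n_classes) + alpha
        prior = prior / prior.sum()

        # Hard labels become one-hot responsibilities; unlabeled documents
        # start from the labeled-data prior.
        R_lab = np.eye(n_classes)[y_lab]
        R_unlab = np.tile(prior, (X_unlab.shape[0], 1))

        X_all = np.vstack([X_lab, X_unlab])
        for _ in range(n_iters):
            # M-step: re-estimate per-class word probabilities from the
            # responsibilities of all documents (Laplace-smoothed).
            R_all = np.vstack([R_lab, R_unlab])
            counts = R_all.T @ X_all + alpha          # (classes, features)
            log_theta = np.log(counts / counts.sum(axis=1, keepdims=True))

            # E-step on unlabeled data only; labeled responsibilities
            # stay fixed at their one-hot values.
            log_post = X_unlab @ log_theta.T + np.log(prior)
            log_post -= log_post.max(axis=1, keepdims=True)
            R_unlab = np.exp(log_post)
            R_unlab /= R_unlab.sum(axis=1, keepdims=True)

            # Class distribution constraint: rescale posteriors so the
            # average class mass over the unlabeled pool matches the
            # labeled-data prior, then renormalize each row.
            current = R_unlab.mean(axis=0) + 1e-12
            R_unlab *= prior / current
            R_unlab /= R_unlab.sum(axis=1, keepdims=True)

        return log_theta, prior

Note that a single multiplicative rescaling followed by row renormalization only approximately matches the target distribution; repeating the two steps until convergence (Sinkhorn-style) would enforce the constraint more tightly. The key point the sketch illustrates is that the constraint acts inside each EM iteration, pulling the unlabeled posteriors back toward the class distribution estimated from the labeled data.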