<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-3024">
  <Title>A New Feature Selection Score for Multinomial Naive Bayes Text Classification Based on KL-Divergence</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> By interpreting Naive Bayes in an information-theoretic framework, we derive a new scoring method for feature selection in text classification, based on the KL-divergence between training documents and their classes. Our experiments show that it outperforms mutual information, which was one of the best-performing methods in previous studies (Yang and Pedersen, 1997). The KL-divergence based scores are especially effective for smaller categories, but additional experiments are certainly required. In order to keep the computational cost low, we use an approximation instead of the exact KL-divergence. Assessing the error introduced by this approximation is a topic for future work.</Paragraph>
  </Section>
</Paper>