<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0908">
  <Title>Text Classification by Bootstrapping with Keywords, EM and Shrinkage</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> When applying text classification to complex tasks, it is tedious and expensive to hand-label the large amounts of training data necessary for good performance.</Paragraph>
    <Paragraph position="1"> This paper presents an alternative approach to text classification that requires no labeled documents; instead, it uses a small set of keywords per class, a class hierarchy, and a large quantity of easily obtained unlabeled documents. The keywords are used to assign approximate labels to the unlabeled documents by term matching. These preliminary labels become the starting point for a bootstrapping process that learns a naive Bayes classifier using Expectation-Maximization and hierarchical shrinkage. When classifying a complex data set of computer science research papers into a 70-leaf topic hierarchy, the keywords alone provide 45% accuracy. The classifier learned by bootstrapping reaches 66% accuracy, a level close to human agreement.</Paragraph>
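A minimal Python sketch of the keyword-then-EM bootstrapping loop summarized above, under stated assumptions: whitespace tokenization, a flat set of classes (the hierarchical shrinkage step is omitted), Laplace smoothing, hard keyword labels to seed the first M-step, and a fixed number of EM iterations. The function names keyword_labels, train_nb, classify, and bootstrap are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def keyword_labels(docs, keywords):
    """Term matching: label each doc with the class whose keywords occur most.

    Returns None for documents that match no keyword (they stay unlabeled).
    Assumption: ties go to the class listed first in `keywords`.
    """
    labels = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        scores = {c: sum(counts[k] for k in kws) for c, kws in keywords.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] else None)
    return labels


def train_nb(docs, posteriors, classes, vocab):
    """M-step: multinomial naive Bayes from (possibly soft) class posteriors."""
    prior = {c: 1.0 for c in classes}          # Laplace-smoothed class counts
    word = {c: Counter() for c in classes}     # expected word counts per class
    for doc, post in zip(docs, posteriors):
        counts = Counter(doc.lower().split())
        for c, p in post.items():
            prior[c] += p
            for w, n in counts.items():
                word[c][w] += p * n
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(word[c].values()) + len(vocab)
        log_word[c] = {w: math.log((word[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_word


def classify(doc, log_prior, log_word):
    """E-step for one doc: normalized class posteriors (log-sum-exp for stability)."""
    counts = Counter(doc.lower().split())
    scores = {
        c: log_prior[c] + sum(n * log_word[c].get(w, 0.0) for w, n in counts.items())
        for c in log_prior
    }
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def bootstrap(docs, keywords, iterations=5):
    """Seed with keyword labels, then alternate E- and M-steps over all docs."""
    classes = list(keywords)
    vocab = {w for d in docs for w in d.lower().split()}
    seeds = [(d, {lab: 1.0}) for d, lab in zip(docs, keyword_labels(docs, keywords))
             if lab is not None]
    model = train_nb([d for d, _ in seeds], [p for _, p in seeds], classes, vocab)
    for _ in range(iterations):
        posteriors = [classify(d, *model) for d in docs]    # E-step
        model = train_nb(docs, posteriors, classes, vocab)  # M-step
    return model
```

Seeding only from keyword-matched documents and then re-estimating over the whole unlabeled pool is what lets the classifier learn words that never appear in the keyword lists.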
  </Section>
</Paper>