File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/99/w99-0908_concl.xml

Size: 1,340 bytes

Last Modified: 2025-10-06 13:58:36

<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0908">
  <Title>Text Classification by Bootstrapping with Keywords, EM and Shrinkage</Title>
  <Section position="7" start_page="56" end_page="56" type="concl">
    <SectionTitle>
6 Conclusions and Future Work
</SectionTitle>
    <Paragraph position="0"> This paper has considered building a text classifier without labeled training documents. In their place, our bootstrapping algorithm uses a large pool of unlabeled documents and class-specific knowledge in the form of a few keywords per class and a class hierarchy. The bootstrapping algorithm combines Expectation-Maximization and hierarchical shrinkage to correct and complete the preliminary labeling provided by keyword matching. Experimental results show that the bootstrapping algorithm can reach accuracies close to human agreement.</Paragraph>
    <Paragraph position="1"> In future work we plan to refine our probabilistic model to allow for documents to be placed in interior hierarchy nodes, documents to have multiple class assignments, and classes to be modeled with multiple mixture components. We are also investigating principled methods of re-weighting the word features for &quot;semi-supervised&quot; clustering that will provide better discriminative training with unlabeled data.</Paragraph>
  </Section>
</Paper>