<?xml version="1.0" standalone="yes"?> <Paper uid="W04-2409"> <Title>A Comparison of Manual and Automatic Constructions of Category Hierarchy for Classifying Large Corpora</Title> <Section position="8" start_page="3" end_page="3" type="concl"> <SectionTitle> 6 Conclusions </SectionTitle> <Paragraph position="0"> We proposed a method for generating a category hierarchy in order to improve text classification performance.</Paragraph> <Paragraph position="1"> We used k-means and a loss function derived from NB classifiers. With large training samples, we found small advantages in F-score for the automatically generated hierarchy compared with both a flat (non-hierarchical) baseline and a manually constructed hierarchy. We have also shown that the benefit of our method is significant when fewer training samples are available. Future work includes (i) extracting features that discriminate between categories within the same cluster that have a low F-score, (ii) using other machine learning techniques to obtain further gains in efficiency when dealing with a large collection of data, (iii) comparing the method with other techniques such as hierarchical agglomerative clustering and 'X-means' (Pelleg and Moore, 2000), and (iv) developing an evaluation method for comparing manual and automatic construction of hierarchies, to learn more about the strengths and weaknesses of the two methods of classifying documents.</Paragraph> </Section> </Paper>