<?xml version="1.0" standalone="yes"?>
<Paper uid="N06-1061">
  <Title>Language Model-Based Document Clustering Using Random Walks</Title>
  <Section position="6" start_page="483" end_page="484" type="evalu">
    <SectionTitle>
4.3.3 Results
</SectionTitle>
    <Paragraph position="0"> The output of a hierarchical clustering algorithm is a tree whose leaves are the documents and whose internal nodes each record a cluster merging operation. Each subtree therefore represents a cluster. We assume that each class of documents in the corpus forms a cluster subtree at some point during the construction of the tree. To evaluate the cluster tree, we use the F-measure proposed in (Larsen and Aone, 1999). The F-measure for a class $c_i$ in the corpus and a subtree $s_j$ is defined as $$F(c_i, s_j) = \frac{2 R(c_i, s_j) P(c_i, s_j)}{R(c_i, s_j) + P(c_i, s_j)}$$</Paragraph>
    <Paragraph position="2"> where $R(c_i, s_j)$ and $P(c_i, s_j)$ are the recall and the precision of $s_j$ with respect to the class $c_i$. Let $S$ be the set of subtrees in the output cluster tree, and $C$ be the set of classes. The F-measure of the entire tree is the weighted average of the maximum F-measures of all the classes: $$F(C, S) = \frac{1}{n} \sum_{c \in C} n_c \max_{s \in S} F(c, s)$$ where $n$ is the total number of documents and $n_c$ is the number of documents that belong to class $c$.</Paragraph>
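    As an illustration only, the tree-level F-measure described above can be sketched in Python. The sketch assumes the cluster tree is given as nested pairs (a leaf is a document id, an internal node is a pair of subtrees) and that gold classes are given as a mapping from document id to class label. It also assumes that single-document leaves count as subtrees. The function names and the input representation are hypothetical and are not taken from the paper.

```python
from __future__ import annotations


def subtree_clusters(tree) -> list[frozenset[str]]:
    """Collect the leaf set of every node in the tree (leaves included)."""
    clusters: list[frozenset[str]] = []

    def visit(node) -> frozenset[str]:
        if isinstance(node, str):
            # A leaf is a single document.
            leaves = frozenset([node])
        else:
            # An internal node records the merge of two subtrees.
            left, right = node
            leaves = visit(left) | visit(right)
        clusters.append(leaves)
        return leaves

    visit(tree)
    return clusters


def f_measure(cls: frozenset[str], cluster: frozenset[str]) -> float:
    """Harmonic mean of recall and precision of one cluster for one class."""
    overlap = len(cls.intersection(cluster))
    if overlap == 0:
        return 0.0
    recall = overlap / len(cls)
    precision = overlap / len(cluster)
    return 2 * recall * precision / (recall + precision)


def tree_f_measure(tree, labels: dict[str, str]) -> float:
    """Class-size-weighted average of each class's best subtree F-measure."""
    classes: dict[str, set[str]] = {}
    for doc, label in labels.items():
        classes.setdefault(label, set()).add(doc)

    clusters = subtree_clusters(tree)
    total = 0.0
    for members in classes.values():
        best = max(f_measure(frozenset(members), s) for s in clusters)
        total += len(members) * best
    return total / len(labels)


# Hypothetical toy input: five documents, two gold classes.
toy_tree = ((("d1", "d2"), "d3"), ("d4", "d5"))
toy_labels = {"d1": "A", "d2": "A", "d3": "A", "d4": "B", "d5": "B"}
print(tree_f_measure(toy_tree, toy_labels))
```

    In this toy input every class coincides exactly with one subtree, so each class reaches its maximum F-measure and the tree is scored as a perfect clustering. Moving a document into the wrong subtree lowers the score for the classes affected.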
    <Paragraph position="3"> We ran all three algorithms on both corpora. Unlike k-means, the hierarchical algorithms we used are deterministic. Table 4 summarizes our results. An immediate observation is that average-link clustering performs much better than the other two algorithms regardless of the data set or the document representation, which is consistent with earlier research (Zhao and Karypis, 2002). The highest result (shown in boldface) for each algorithm and corpus was achieved by using generation vectors. However, unlike in the k-means experiments, tf·idf was able to outperform $\mathrm{gen}^{1}$ and $\mathrm{gen}^{2}$ in one or two cases. $\mathrm{gen}^{2}$ yielded the best result instead of $\mathrm{gen}^{3}$ in one of the six cases.</Paragraph>
  </Section>
</Paper>