<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1045">
  <Title>A Method of Cluster-Based Indexing of Textual Data</Title>
  <Section position="6" start_page="1" end_page="1" type="concl">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> In this paper, we reported a method of generating overlapping micro-clusters in which documents, terms, and other related elements of text-based information are grouped together.</Paragraph>
    <Paragraph position="1"> Compared with existing text categorization methods, the distinctive feature of the proposed micro-clustering method is that documents on cluster borders are readily viewed and examined. In addition, the terms in a cluster can be further used to digest the descriptions of the clustered documents. Such properties of micro-clustering may be particularly important when the system actually interacts with its users.</Paragraph>
    <Paragraph position="2"> For comparison purposes, we have used only the conventional documents-and-terms feature space in our experiments. However, the proposed micro-clustering framework can be applied more flexibly to other cases as well. For example, we have also generated clusters using the co-occurrences of triples of documents, terms, and authors. Although the text categorization performance was not much different (2,584 correct judgments out of 2,639; the precision improved slightly), we can confirm that many of the highly ranked clusters contain documents produced by the same group of authors, emphasizing the characteristics of such generated clusters.</Paragraph>
    <Paragraph position="3"> Future issues include: (i) enhancing the probabilistic models by considering other discounting techniques from linguistic studies; (ii) developing a strategy for initiating clusters by combining different attribute sets, such as documents or authors; and (iii) establishing a method of evaluating overlapping clusters. We are also looking into the possibility of applying the proposed framework to Web document clustering problems.</Paragraph>
  </Section>
</Paper>