<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1045">
  <Title>A Method of Cluster-Based Indexing of Textual Data</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> This paper presents a view of indexing as a process of generating many small, mutually overlapping clusters. Individual clusters, referred to as micro-clusters in this paper, contain multiple subsets of associated elements, such as documents, terms, authors, keywords, and other related attribute sets. For example, a cluster in Figure 1 represents 'a set of documents written by a specific community of authors related to a subject represented by a set of terms'.</Paragraph>
    <Paragraph position="1"> Our motivations for considering such clusters are that (i) the universal properties of text-based information spaces, namely large scale, sparseness, and local redundancy (Joachims, 2001), may be better manipulated by focusing on only limited sub-regions of the space; and also that (ii) the multiple viewpoints of information contents, which a conventional retrieval system provides, can be better utilized by considering not only the relations between 'documents' and 'terms' but also associations between other attributes such as 'authors' within the same unified framework.</Paragraph>
    <Paragraph position="2"> Based on this background, this paper presents a framework of micro-clustering, within which we adopt a probabilistic formulation of co-occurrences of textual elements. For simplicity, we focus primarily on the co-occurrences between 'documents' and 'terms' in our explanation, but the presented framework is directly applicable to more general cases with more than two attributes.</Paragraph>
    <Paragraph position="3"> [Figure legend] ST: a subset of terms; SD: a subset of documents; SA: a subset of authors. [Caption truncated in source: '... spaces.']</Paragraph>
  </Section>
</Paper>