<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-4006">
  <Title>QCS: A Tool for Querying, Clustering, and Summarizing Documents</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Querying, Clustering, Summarizing
</SectionTitle>
    <Paragraph position="0"> QCS employs a vector space model (Salton et al., 1975) to represent a set of documents. Choices for term weighting currently include the following. Local: term frequency, log, binary. Global: none, normal, idf, idf2, entropy. Normalization: none, normalized. Detailed descriptions of these weighting factors, as well as strategies for using them, are presented by Dumais (1991) and Kolda and O'Leary (1998).</Paragraph>
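The three-part weighting scheme above (local factor, global factor, normalization) can be illustrated with a minimal Python sketch. This is not the QCS implementation. The function names are hypothetical, the formulas are common textbook forms (log taken as log(1 + f), idf as log(n / df)) rather than the exact definitions in the cited papers, and only a subset of the listed options is covered.

```python
import math
from collections import Counter


def local_weight(freq, scheme):
    """Local weight of a term inside one document (assumed textbook forms)."""
    if scheme == "tf":
        return float(freq)
    if scheme == "log":
        return math.log(1.0 + freq)
    if scheme == "binary":
        return 1.0 if freq > 0 else 0.0
    raise ValueError("unknown local scheme: " + scheme)


def weight_documents(docs, local="tf", use_idf=False, normalize=False):
    """Turn whitespace-tokenized documents into sparse weighted term vectors.

    Each vector is a dict mapping term to weight, where
    weight = local factor * global factor, optionally scaled to unit length.
    """
    counts = [Counter(doc.split()) for doc in docs]
    n_docs = len(counts)
    # Document frequency: the number of documents containing each term.
    doc_freq = Counter(term for c in counts for term in c)

    vectors = []
    for c in counts:
        vec = {}
        for term, freq in c.items():
            # Global factor: idf (assumed form) or none.
            g = math.log(n_docs / doc_freq[term]) if use_idf else 1.0
            vec[term] = local_weight(freq, local) * g
        if normalize:
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0:
                vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors
```

Calling `weight_documents(["a a b", "b c"], local="log", use_idf=True, normalize=True)` would combine one choice from each of the three categories. A term that occurs in every document receives an idf of zero under this assumed form.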
    <Paragraph position="1"> The current computational methods used for retrieving a set of documents that best match a query, clustering a set of documents by topic, and creating a summary of multiple documents are as follows. Querying: Latent Semantic Indexing (LSI). Clustering: spherical k-means. Summarization: a hidden Markov model (HMM) and pivoted QR. Detailed descriptions of these methods are presented in Deerwester et al. (1990), Dhillon and Modha (2001), and Schlesinger et al. (2002), respectively.</Paragraph>
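As an illustration of the clustering step only, the following is a minimal Python sketch of spherical k-means on dense vectors: documents are scaled to unit length, each is assigned to the centroid with the highest cosine similarity, and each centroid is the normalized sum of its members. It is not the QCS code. The function names, the fixed iteration cap, and the choice to seed the centroids with the first k documents are assumptions made for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(u, v):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster vectors by cosine similarity; returns (labels, centroids)."""
    points = [normalize(v) for v in vectors]
    # Assumption: seed centroids with the first k points to keep the sketch deterministic.
    centroids = [list(p) for p in points[:k]]
    previous = None
    labels = []
    for _ in range(iterations):
        # Assignment step: pick the centroid with the largest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        if labels == previous:
            break  # assignments stopped changing
        # Update step: each centroid is the normalized sum of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
        previous = labels
    return labels, centroids
```

In practice the input vectors would be the weighted term vectors of the retrieved documents, and a random or otherwise more careful initialization would usually replace the first-k seeding.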
    <Paragraph position="2"> The interface to QCS (see Figure 1) consists of a collection of Java™ servlets that format input to and output from QCS via dynamic HTML documents. This approach allows all of the computation and formatting to take place on a Java™ server, with the only requirement on users' systems being an HTML-enabled browser application (e.g., Netscape® 7.0).</Paragraph>
  </Section>
</Paper>