<?xml version="1.0" standalone="yes"?> <Paper uid="N03-4006"> <Title>QCS: A Tool for Querying, Clustering, and Summarizing Documents</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Querying, Clustering, Summarizing </SectionTitle> <Paragraph position="0"> QCS employs a vector space model (Salton et al., 1975) to represent a set of documents. Choices for the term weighting currently include the following. Local: term frequency, log, binary. Global: none, normal, idf, idf2, entropy. Normalization: none, normalized. Detailed descriptions of each of these weighting factors, as well as strategies for using them, are presented by Dumais (1991) and Kolda and O'Leary (1998).</Paragraph> <Paragraph position="1"> The current computational methods used for retrieving a set of documents that best match a query, clustering a set of documents by topic, and creating a summary of multiple documents are as follows. Querying: Latent Semantic Indexing (LSI). Clustering: spherical k-means. Summarization: a hidden Markov model (HMM) and pivoted QR. Detailed descriptions of these methods are presented in Deerwester et al. (1990), Dhillon and Modha (2001), and Schlesinger et al. (2002), respectively.</Paragraph> <Paragraph position="2"> The interface to QCS (see Figure 1) consists of a collection of Java™ servlets which format input to and output from QCS via dynamic HTML documents. This approach allows all of the computation and formatting to take place on a Java™ server, with the only requirement on the users' systems being an HTML-enabled browser application (e.g., Netscape® 7.0).</Paragraph> </Section> </Paper>
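<!--
The term weighting described in Section 2 can be illustrated with a minimal sketch. It assumes one common reading of the options listed there: local "log" weight taken as log(1 + tf), global "idf" weight taken as log(n / df), and "normalized" taken as scaling each document vector to unit length. These formulas, and the function names build_vectors, weight, cosine, and rank, are assumptions made for illustration and are not taken from QCS. Documents are ranked by plain cosine similarity in term space; the rank reduction that LSI adds is omitted here.

```python
import math
from collections import Counter


def weight(tokens, idf):
    """Weight one token list: local log(1 + tf) times global idf, then unit length."""
    tf = Counter(tokens)
    # Terms unseen in the collection have no idf value and are skipped.
    vec = {t: math.log(1 + c) * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return {}
    return {t: w / norm for t, w in vec.items()}


def build_vectors(docs):
    """Return (vectors, idf) for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Assumed idf form: log(n / df); a term found in every document gets weight 0.
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = [weight(tokens, idf) for tokens in docs]
    return vectors, idf


def cosine(u, v):
    """Dot product of two sparse unit vectors, which equals their cosine."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def rank(query_tokens, vectors, idf):
    """Return document indices ordered by decreasing cosine similarity to the query."""
    q = weight(query_tokens, idf)
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Ties are broken by document index so the ordering is deterministic.
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    docs = [
        ["query", "cluster", "summary"],
        ["markov", "model", "summary"],
        ["cluster", "kmeans", "cluster"],
    ]
    vectors, idf = build_vectors(docs)
    print(rank(["markov"], vectors, idf))
```

Because every vector has unit length, the same dot product could also serve as the similarity measure in spherical k-means, which assigns each document to the centroid with the largest cosine.
-->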