<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-2009">
  <Title>Cross-dataset Clustering: Revealing Corresponding Themes Across Multiple Corpora</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> We present a method for identifying corresponding themes across several corpora that focus on related but distinct domains. We approach this task through simultaneous clustering of keyword sets extracted from the analyzed corpora. Our algorithm extends the information-bottleneck soft clustering method to a setting consisting of several datasets. Experimentation with topical corpora reveals similar aspects of three distinct religions. The results are evaluated by comparison with clusters constructed manually by an expert.</Paragraph>
  </Section>
</Paper>