<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1068">
  <Title>Filtering Speaker-Specific Words from Electronic Discussions</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Conclusion
</SectionTitle>
    <Paragraph position="0"> In this paper, we have identified features of electronic discussions that influence clustering performance, and presented a filtering mechanism that removes adverse influences. The effect of our filtering mechanism was evaluated by means of two experiments: coarse-level clustering and simple information retrieval. Our results show that filtering out the signature words of dominant speakers has a positive effect on clustering and retrieval performance.</Paragraph>
    <Paragraph position="1"> Although these experiments were performed at a coarser level of granularity than that of our target domain, our results indicate that filtering signature words is a promising pre-processing step for clustering electronic discussions.</Paragraph>
    <Paragraph position="2"> From a more qualitative perspective, we clearly saw the benefit of the filtering mechanism in the example in Section 3.3 (Tables 2 and 3): when a generation component is used to describe the contents of clusters, the inclusion of author-specific words is uninformative and even confusing.</Paragraph>
    <Paragraph position="3"> Our approach to filtering is general in the sense that we do not target specific parts of electronic discussions (e.g. the last few lines of a posting) for filtering. We have experimented with a more naive approach that removes all web and email addresses from a posting (they account for a significant portion of a signature). However, this simple heuristic yielded only a small improvement in clustering performance. More importantly, it clearly does not generalise to deal with the problem of identifying and removing author-specific terminology.</Paragraph>
  </Section>
</Paper>