<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1030">
  <Title>First Story Detection using a Composite Document Representation.</Title>
  <Section position="11" start_page="3" end_page="3" type="concl">
    <SectionTitle>
8. CONCLUSIONS
</SectionTitle>
    <Paragraph position="0"> A variety of techniques for data fusion have been proposed in the IR literature. Results from data fusion research suggest that significant improvements in system effectiveness can be obtained by combining multiple sources of evidence of relevance, such as document representations, query formulations, and search strategies.</Paragraph>
    <Paragraph position="1"> Recent editions of WordNet contain information on the probability of use of a word based on polysemy. WordNet researchers noted a direct relationship between a word's frequency of occurrence and the number of distinct meanings it has. This frequency value could also be used in the 'cleaning' process.</Paragraph>
    <Paragraph position="2"> In this paper we investigated the impact on FSD performance of using a composite document representation in this TDT task. Our results showed that a marginal increase in system effectiveness could be achieved when lexical chain representations were used in conjunction with free text representations. In particular, we saw that the miss rate of our FSD system, LexDetect, decreased with little or no impact on the false alarm rate of the system. When a weighted combination of evidence was used on the same system, this improvement was even more apparent. From these results we deduced that using our chain word representation as stronger evidence in the classification process could lead to improved performance. Based on Ng and Kantor's dissimilarity criteria for successful data fusion, we attributed the success of our composite document representation to the fact that a chain word classifier is sufficiently dissimilar to a simple 'bag of words' classifier to contribute additional evidence to a combination experiment involving both representations. In future experiments, we expect an even greater improvement in FSD effectiveness as we continue to refine our lexical chain representation.</Paragraph>
  </Section>
</Paper>