File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/04/n04-1040_abstr.xml

Size: 1,457 bytes

Last Modified: 2025-10-06 13:43:29

<?xml version="1.0" standalone="yes"?>
<Paper uid="N04-1040">
  <Title>Multiple Similarity Measures and Source-Pair Information in Story Link Detection</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> State-of-the-art story link detection systems, that is, systems that determine whether two stories are linked (about the same event), are usually based on the cosine similarity between the two stories. This paper presents a method for improving the performance of a link detection system by combining a variety of similarity measures with source-pair specific statistical information. We investigated the utility of several similarity measures, including cosine, Hellinger, Tanimoto, and clarity, both alone and in combination.</Paragraph>
    <Paragraph position="1"> We also compared several machine learning techniques for combining the different types of information. The techniques investigated were SVMs, voting, and decision trees, each of which makes use of similarity and statistical information differently. Our experimental results indicate that the combination of similarity measures and source-pair specific statistical information using an SVM provides the largest improvement in estimating whether two stories are linked; the resulting system was the best-performing link detection system at TDT-2002.</Paragraph>
  </Section>
</Paper>
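
To make the similarity measures named in the abstract concrete, here is a minimal Python sketch of three of them (cosine, Hellinger, Tanimoto) over raw term counts. It is an illustration under stated assumptions, not the paper's implementation. The function names, the whitespace tokenizer, and the unweighted counts are all hypothetical, and the Hellinger form used here (the sum of sqrt(p*q) over length-normalized term distributions) is one common definition that may differ from the one the authors used. Clarity is omitted because it needs a background language model that the abstract does not describe.

```python
import math
from collections import Counter


def term_counts(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return Counter(text.lower().split())


def _dot(a, b):
    # Dot product over the terms shared by both stories.
    return sum(a[t] * b[t] for t in a.keys() & b.keys())


def cosine(a, b):
    # Dot product divided by the product of the Euclidean norms.
    norm = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    return _dot(a, b) / norm if norm else 0.0


def hellinger(a, b):
    # Assumed form: sum of sqrt(p * q) over normalized term distributions.
    total_a, total_b = sum(a.values()), sum(b.values())
    if not total_a or not total_b:
        return 0.0
    return sum(
        math.sqrt((a[t] / total_a) * (b[t] / total_b))
        for t in a.keys() & b.keys()
    )


def tanimoto(a, b):
    # Dot product divided by (|a|^2 + |b|^2 - dot product).
    dot = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - dot
    return dot / denom if denom else 0.0


story_1 = term_counts("storm hits coast storm warning issued")
story_2 = term_counts("coast storm warning lifted")
scores = {f.__name__: f(story_1, story_2) for f in (cosine, hellinger, tanimoto)}
print(scores)
```

Each function returns a score in [0, 1] for non-negative counts, so the three scores for a story pair could be collected into a feature vector for a combiner such as the SVM, voting, or decision-tree approaches the abstract mentions.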