<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-3803">
  <Title>Graph-Based Text Representation for Novelty Detection</Title>
  <Section position="6" start_page="21" end_page="22" type="evalu">
    <SectionTitle>
5 Results
</SectionTitle>
    <Paragraph position="0"> To establish a baseline, we used a simple bag-of-words approach with KL divergence as the classification feature. Employing the protocol described above, i.e. training the classifier on the 2003 data set and optimizing the parameters on two folds of the training data, we achieved a surprisingly high average F-measure of 0.618 on the 2004 data. This result would have tied for third place with the UMass system in the 2004 competition.</Paragraph>
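The KL divergence feature over bag-of-words distributions can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the additive smoothing scheme (`alpha`) and the direction of the divergence (sentence against the history of previously seen text) are assumptions.

```python
import math
from collections import Counter

def kl_novelty(sentence_tokens, history_tokens, vocab, alpha=1.0):
    """Smoothed KL divergence D(sentence || history) over bag-of-words
    distributions. A larger divergence suggests the sentence contains
    more novel content relative to the history.
    (Illustrative sketch; the smoothing scheme is an assumption.)"""
    s = Counter(sentence_tokens)
    h = Counter(history_tokens)
    v = len(vocab)
    s_total = sum(s.values()) + alpha * v
    h_total = sum(h.values()) + alpha * v
    kl = 0.0
    for w in vocab:
        p = (s[w] + alpha) / s_total  # smoothed sentence probability
        q = (h[w] + alpha) / h_total  # smoothed history probability
        kl += p * math.log(p / q)
    return kl
```

A threshold (or a classifier) on this score then separates novel from non-novel sentences.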
    <Paragraph position="1"> In the tables below, KL refers to the KL divergence feature, TR to the TextRank based features and SG to the simple graph based features.</Paragraph>
    <Paragraph position="2"> Given that the feature sets we investigate may capture orthogonal properties, we were also interested in combinations of the three feature sets. For the graph-based features, we determined on the training set that results were optimal at a &amp;quot;window size&amp;quot; of 6, i.e. when graph edges are produced only if the distance between terms is six tokens or less. All results are tabulated in Table 1, with the best results boldfaced.</Paragraph>
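The windowed edge-construction rule can be sketched as below. This is a minimal sketch of the co-occurrence criterion only; the paper's full graph representation (edge weights, directionality, etc.) is not reproduced here.

```python
def build_term_graph(tokens, window=6):
    """Build an undirected term graph: connect two terms with an edge
    if they co-occur within `window` tokens of each other. Window size
    6 is the value found optimal on the training set.
    (Sketch of the co-occurrence criterion only.)"""
    edges = set()
    for i, u in enumerate(tokens):
        # only look ahead up to `window` positions from token i
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if u != v:
                edges.add(tuple(sorted((u, v))))
    return edges
```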
    <Paragraph position="3"> We used the McNemar test to determine pairwise statistical significance levels between the novelty classifiers based on different feature sets. (We could not use the Wilcoxon rank test for our results, since we only had binary classification results for each sentence, as opposed to individual class-probability scores.)</Paragraph>
    <Paragraph position="4"> The two (boldfaced) best results from Table 1 are significantly different from the baseline at 0.999 confidence.</Paragraph>
    <Paragraph position="5"> Individual sentence-level classifications from the official 2004 runs were not available to us, so we were not able to test for statistical significance of our results versus the TREC results.</Paragraph>
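The McNemar test used above compares two classifiers on the same sentences by counting only the cases where they disagree. A minimal sketch, assuming the common chi-square form with continuity correction (the paper does not specify which variant was used):

```python
def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test statistic (chi-square form with continuity
    correction) for two classifiers evaluated on the same items.
    Only discordant pairs contribute:
      b = items classifier A gets right and B gets wrong
      c = items classifier A gets wrong and B gets right"""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0  # no disagreements: no evidence of a difference
    return (abs(b - c) - 1) ** 2 / (b + c)
```

The statistic is compared against the chi-square distribution with one degree of freedom; the 0.999 confidence level in the text corresponds to a critical value of about 10.83.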
  </Section>
</Paper>