<?xml version="1.0" standalone="yes"?>
<Paper uid="P03-1004">
  <Title>Fast Methods for Kernel-based Text Analysis</Title>
  <Section position="7" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
5.4 Results
</SectionTitle>
    <Paragraph position="0"> Tables 2, 3 and 4 show the execution time, accuracy, and |Ω| (the size of the extracted subsets) as σ is varied from 0.01 to 0.0005.</Paragraph>
    <Paragraph position="1"> The PKI is about 2 to 12 times faster than the PKB. In JDP, the improvement is significant. This is because B, the average of h(i) over all items i ∈ F, is relatively small in JDP. The improvement depends strongly on the sparsity of the given support examples.</Paragraph>
    <Paragraph position="2"> The improvements of the PKE are larger than those of the PKI. The PKE runs 30 to 300 times faster than the PKB when we set an appropriate σ (e.g., σ = 0.005 for EBC and JWS, σ = 0.0005 for JDP). With these settings, the final accuracy on the test data is preserved.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.5 Frequency-based Pruning
</SectionTitle>
      <Paragraph position="0"> The PKE with a Cubic Kernel tends to make Ω large (e.g., |Ω| = 2.32 million for JWS, |Ω| = 8.26 million for JDP).</Paragraph>
      <Paragraph position="1"> To reduce the size of Ω, we ran simple frequency-based pruning experiments. Our extension simply sets a prior threshold ξ (= 1, 2, 3, 4, ...) and erases all subsets that occur in fewer than ξ support examples. The frequencies can likewise be computed with the PrefixSpan algorithm. Tables 5 and 6 show the results of frequency-based pruning when we fix σ = 0.005 for JWS and σ = 0.0005 for JDP.</Paragraph>
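      <Paragraph> The pruning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it counts subset frequencies by brute-force enumeration rather than with PrefixSpan, and the names count_subset_frequencies, prune_by_frequency, xi (the frequency threshold) and max_size, as well as the toy weights, are hypothetical. Limiting subsets to size 3 is an assumption meant to mirror a cubic kernel.

```python
from collections import Counter
from itertools import combinations


def count_subset_frequencies(support_examples, max_size=3):
    """Count how many support examples contain each feature subset.

    Brute-force stand-in for PrefixSpan; max_size=3 is an assumption
    intended to match a cubic kernel.
    """
    freq = Counter()
    for example in support_examples:
        items = sorted(set(example))
        for size in range(1, max_size + 1):
            for subset in combinations(items, size):
                freq[subset] += 1
    return freq


def prune_by_frequency(weights, freq, xi):
    """Keep only the subsets that occur in at least xi support examples."""
    return {s: w for s, w in weights.items() if freq[s] >= xi}


# Toy data (hypothetical): three support examples and four weighted subsets.
support_examples = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
weights = {("a",): 0.5, ("a", "b"): -0.2, ("a", "d"): 0.9, ("a", "b", "c"): 0.1}

freq = count_subset_frequencies(support_examples)
pruned = prune_by_frequency(weights, freq, xi=2)
print(pruned)  # {('a',): 0.5, ('a', 'b'): -0.2}
```

In this toy run the subset ("a", "d") has the largest weight but occurs in only one support example, so it is removed once xi is set to 2.</Paragraph>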
      <Paragraph position="2"> In JDP, we can reduce the set Ω to about one third of its original size. This reduction gives us not only a slight speed increase but also an improvement in accuracy (89.29% → 89.34%). Frequency-based pruning allows us to remove subsets that have large weight and small frequency. Such subsets may be generated from errors or special outliers in the training examples, which sometimes cause overfitting in training.</Paragraph>
      <Paragraph position="3"> In JWS, frequency-based pruning does not work well. Although we can reduce the size of Ω by half, the accuracy is also reduced (97.94% → 97.83%). It implies that, in JWS, features  (Note: In EBC, to handle K-class problems, we use pairwise classification: we build K × (K − 1)/2 classifiers covering all pairs of classes, and the final class decision is made by majority voting. The values in this column are averages over all pairwise classifiers.)</Paragraph>
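      <Paragraph> The pairwise scheme in the note above can be sketched as follows. This is an illustrative sketch only: pairwise_vote and toy_decide are hypothetical names, and toy_decide is a stand-in for a trained binary classifier, not the classifiers used in the experiments.

```python
from collections import Counter
from itertools import combinations


def pairwise_vote(classes, decide, x):
    """Run one binary decision per pair of classes; return the majority winner."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    return votes.most_common(1)[0][0]


def toy_decide(a, b, x):
    # Hypothetical stand-in for the binary classifier trained on the pair (a, b):
    # it picks whichever numeric class label lies closer to x.
    return min((a, b), key=lambda c: abs(c - x))


classes = [0, 1, 2, 3]
num_pairs = len(list(combinations(classes, 2)))
print(num_pairs)                                # 6 pairs for K = 4
print(pairwise_vote(classes, toy_decide, 2.2))  # 2
```

With K = 4 classes there are six pairwise decisions, and class 2 wins three of them for the input 2.2.</Paragraph>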
    </Section>
  </Section>
</Paper>