<?xml version="1.0" standalone="yes"?> <Paper uid="W03-1027"> <Title>Virtual Examples for Text Classification with Support Vector Machines</Title> <Section position="8" start_page="3" end_page="6" type="evalu"> <SectionTitle> 5.4 Results </SectionTitle> <Paragraph position="0"> First, we carried out experiments using GenerateByDeletion and GenerateByAddition separately to create virtual examples, where one virtual example was created per Support Vector. We did not generate virtual examples from non-support vectors. We set the parameter t to 0.05 for both GenerateByDeletion and GenerateByAddition in all the experiments. (We first tried t = 0.01, 0.05, and 0.10 with GenerateByDeletion on the 9603 size training set; the value t = 0.05 yielded the best micro-average F-measure on the test set, and we used the same value for GenerateByAddition.)</Paragraph> <Paragraph position="1"> To build an SVM with virtual examples we use the following steps (sketched in code below):</Paragraph> <Paragraph position="2"> 1. Train an SVM.</Paragraph> <Paragraph position="3"> 2. Extract the Support Vectors.</Paragraph> <Paragraph position="4"> 3. Generate virtual examples from the Support Vectors.</Paragraph> <Paragraph position="5"> 4. Train another SVM using both the original labeled examples and the virtual examples.</Paragraph> <Paragraph position="6"> We evaluated the performance of the two methods as a function of the training set size. We created subsamples by selecting randomly from the 9603 size training set, preparing seven sizes: 9603, 4802, 2401, 1200, 600, 300, and 150.</Paragraph> <Paragraph position="7"> Micro-average F-measures of the two methods are shown in Table 3. Both methods give better performance than the original SVM, and the smaller the training set, the larger the gain. For the 9603 size training set, the gain of GenerateByDeletion is 0.75 (= 90.17 - 89.42), while for the 150 size set the gain is 6.88 (= 60.16 - 53.28). These results suggest that the smaller training sets do not contain examples varied enough to determine an accurate decision boundary, and therefore the effect of virtual examples is larger for smaller training sets. It is reasonable to conclude that GenerateByDeletion and GenerateByAddition generated good virtual examples for the task and that this led to the performance gain.</Paragraph> <Paragraph position="8"> After we found that these two simple methods for generating virtual support vectors were effective, we examined a combined method that uses both GenerateByDeletion and GenerateByAddition, so that two virtual examples are generated per Support Vector. The performance of the combined method is also shown in Table 3. Its gain is larger than that of either GenerateByDeletion or GenerateByAddition alone.</Paragraph> <Paragraph position="9"> Furthermore, we carried out another experiment with a combined method that creates two virtual examples with GenerateByDeletion and two with GenerateByAddition, i.e., four virtual examples per Support Vector. The performance of that setting is shown in Table 3. The best result is achieved by this combined method that creates four virtual examples per Support Vector. (Since we selected the samples randomly, some of the smaller training sets for low-frequency categories may have had few or even zero positive examples; for such training sets the F-measures cannot be computed because the precisions are undefined.)</Paragraph>
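<Paragraph position="10"> The following Python sketch illustrates the four steps above. It is not the paper's code: the helper names (fit_svm, support_vectors_of, same_class_pool) and the feature-level behaviour assumed for GenerateByDeletion and GenerateByAddition (drop each active feature with probability t; switch on, with probability t, features observed in other documents of the same class) are our assumptions for illustration only.

import random


def generate_by_deletion(features, t=0.05, rng=random):
    # Assumed reading: drop each active binary feature with probability t.
    return frozenset(f for f in features if rng.random() > t)


def generate_by_addition(features, same_class_pool, t=0.05, rng=random):
    # Assumed reading: switch on, with probability t, features observed
    # in other training documents of the same class.
    added = frozenset(f for f in same_class_pool if t >= rng.random())
    return frozenset(features) | added


def train_with_virtual_examples(train, fit_svm, support_vectors_of,
                                same_class_pool, t=0.05, n_del=1, n_add=1):
    # Step 1: train an SVM on the original labeled examples.
    model = fit_svm(train)
    # Step 2: extract the Support Vectors (no virtual examples are
    # generated from non-support vectors).
    svs = support_vectors_of(model, train)
    # Step 3: generate virtual examples from the Support Vectors.
    # n_del = n_add = 1 gives the combined method with two virtual
    # examples per SV; n_del = n_add = 2 gives the four-VSV setting.
    virtual = []
    for x, y in svs:
        virtual += [(generate_by_deletion(x, t), y) for _ in range(n_del)]
        virtual += [(generate_by_addition(x, same_class_pool[y], t), y)
                    for _ in range(n_add)]
    # Step 4: train another SVM on the original plus virtual examples.
    return fit_svm(train + virtual)
</Paragraph>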
<Paragraph position="11"> For the rest of this section, we limit our discussion to the comparison of the original SVM and the SVM with four virtual examples per SV (SVM with 4 VSVs). The learning curves of the original SVM and SVM with 4 VSVs are shown in Figures 3 and 4. It is clear that SVM with 4 VSVs outperforms the original SVM considerably in terms of both micro-average and macro-average F-measure: SVM with 4 VSVs achieves a given level of performance with roughly half of the labeled examples that the original SVM requires. One might suppose that the improvement in F-measure is obtained simply because the recall is greatly improved while the error rate increases. We therefore plot the changes of the error rate over the 32990 tests (3299 tests for each of the 10 categories) in Figure 5; SVM with 4 VSVs still outperforms the original SVM significantly. (We performed the significance test called the &quot;p-test&quot; in (Yang and Liu, 1999), requiring significance at the 0.05 level. Although the improvement of the error rate at the 9603 size training set is not statistically significant, in all the other cases the improvement is significant.)</Paragraph> <Paragraph position="12"> The performance changes for each of the 10 categories are shown in Tables 4 and 5. SVM with 4 VSVs is better than the original SVM for almost all the categories and all the sizes, except for &quot;interest&quot; and &quot;wheat&quot; at the 9603 size training set. For low-frequency categories such as &quot;ship&quot;, &quot;wheat&quot;, and &quot;corn&quot;, the classifiers of the original SVM perform poorly: there are many cases where they never output 'positive', i.e., the recall is zero. This suggests that the original SVM fails to find a good hyperplane because of the imbalanced training sets, which contain very few positive examples. In contrast, SVM with 4 VSVs yields better results in such harder cases.</Paragraph>
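<Paragraph position="13"> As a companion to the results above, the sketch below spells out the standard definitions of the micro- and macro-averaged F-measures used in this section (it is ours, not the paper's, and the counts it takes are hypothetical): micro-averaging pools the decision counts over all categories before computing F, while macro-averaging averages the per-category F-measures. It also shows the case noted earlier in which F cannot be computed because precision is undefined.

def f_measure(tp, fp, fn):
    # Precision is undefined when the classifier never outputs
    # 'positive' (tp + fp == 0); recall is undefined when there are
    # no positive test examples (tp + fn == 0).
    if tp + fp == 0 or tp + fn == 0:
        return None
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro(counts):
    # counts: {category: (tp, fp, fn)}, hypothetical per-category
    # decision counts over the test set.
    per_cat = [f_measure(*c) for c in counts.values()]
    defined = [f for f in per_cat if f is not None]
    # Macro-average: unweighted mean of the defined per-category Fs.
    macro = sum(defined) / len(defined)
    # Micro-average: pool tp/fp/fn over all categories, then compute F.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f_measure(tp, fp, fn)
    return micro, macro
</Paragraph> </Section> </Paper>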