<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0620">
  <Title>Learning Discourse Relations with Active Data Selection</Title>
  <Section position="7" start_page="162" end_page="1000" type="evalu">
    <SectionTitle>
4 Evaluation
</SectionTitle>
    <Paragraph position="0"> To evaluate our method, we carried out experiments, using a corpus of news articles from a Japanese economics daily (Nihon-Keizai-Shimbun-Sha, 1995). The corpus had 477 articles, randomly selected from issues that were published durilig the year. Each sentence in the articles was tagged with one of the discourse relations at the subclass level (i.e. CONSEQUEN-TIAL, ANTITHESIS, etc.). However, in evaluation experiments, we translated a subclass relation into a corresponding major class relation (SE-QUENCE/ELABORATION) for reasons discussed earlier. Furthermore , we explicitly asked coders not to tag a paragraph initial sentence for a discourse relation, for we found that coders rarely agree on their :classifications. Paragraph-initial sentences were dropped ffrom the evaluation corpus. This had left us with 5221 sentences, of which 56% are labeled as SEQUENCE and 44% ELABORATION.</Paragraph>
    <Paragraph position="1"> To find out effects of the committee-based sampling method (CBS), we ran the C4.5 (Release 5) decision tree algorithm with CBS turned on and off (Quinlan, 1993) and measured the performance by the 10-fold cross validation, in which the corpus is divided evenly into 10 blocks of data and 9 blocks are used for training and the remaining one block is held out for testing. On each validation fold, CBS starts with a set of about 512 samples from the set of training blocks and sequentially examines samples from the rest of the training set for possible labeling. If a sample is selected, then a decision tree will be trained on the sample together with the data acquired so far, and tested on the held-out data. Performance scores (error rates) are averaged over 10 folds to give a summary figure for a particular learning strategy. Throughout the experiments, we assume that k = 10 and g = 1, i.e., 10 committee members and the entropy gain of 1. Figure 1 shows the result of using CBS for a decision tree.</Paragraph>
    <Paragraph position="2"> Though the performance fluctuates erratically, we see a general tendency that the CBS method fares better than a decision tree classifier alone.</Paragraph>
    <Paragraph position="3"> In fact differences between C4.5/CBS and C4.5 alone proved statistically significant (t = 7.06, df = 90, p &lt; .01).</Paragraph>
    <Paragraph position="4"> While there seems to be a tendency for performance to improve with an increase in the amount of training data, either with or without CBS, it is apparent that an increase in the training data has non-linear effects on performance, which makes an interesting contrast with probabilistic classifiers like HMM, whose performance improves linearly as the training data grow. The reason has to do with the structural complexity of the decision tree model: it is possible that small changes in the INFO value lead to  summary figure, i.e. the average of figures obtained for a given x in 10-fold cross validation trials. The x-axis represents the amount of training data, and the y-axis the error rate. The error rate is the proportion of the misclassified instances to the total number of instances.</Paragraph>
    <Paragraph position="6"> a drastic restructuring of a decision tree. In the face of this, we made a small change to the way CBS works. The idea, which we call a sampling with error feedback, is to remove harmful examples from the training data and only use those with positive effects on performance. It forces the sampling mechanism to return to status quo ante when it finds that an example selected degrades performance. More precisely, this would be put as follows: f St U {e}, if E(CSU{e}) &lt; E(C s~) S +l \[ St otherwise St is a training set at time t. C s denotes a classifter built from the training set S. E(C s) is an error rate of a classifier C s. Thus if there is an increase or no reduction in the error rate after adding an example e to the training set, a classifter goes back to the state before the change. As Figure 2 shows, the error feedback produced a drastic reduction in the error rate. At 900, the committee-based method with the error feedback reduced the error rate by as much as 23%. Figure 3 compares performance of three sampling methods, random sampling, the committee-based sampling with 100 bootstrap replicates (i.e., K = 100) and that with 500 bootstrap replicates. In the random sampling method, a sample is selected randomly from the data and added to the training data. Figure 4 compares a random sampling approach with CBS with 500 bootstrap replicates. Both used the error feedback mechanism. Differences, though they seem small, turned out to be statistically significant (t = 4.51, df = 90, p &lt; .01), which demonstrates the significance of C4.5/CBS approach. Furthermore, Figure 5 demonstrates that the number of bootstrap replicates affects performance (t = 8.87, df = 90, p &lt; .01). CBS with 500 bootstraps performs consistently better than that with 100 bootstrap replicates. This might mean that in the current setup, 100 replicates are not enough to simulate the true distribution of P(M I S).</Paragraph>
    <Paragraph position="7"> Note that CBS with 500 replicates achieves the error rate of 33.40 with only 1008 training samples, which amount to one fourth of the training data C4.5 alone required to reach 44.64. While a direct comparison with other learning schemes in discourse such as a transformation method (Samuel et al., 1998) is not feasible, if Samuel et al. (1998)'s approach is indeed comparable to C5.0, as discussed in Samuel et al. (1998), then the present method might be able to reduce the  strapped CBS with 100 replicates (CBS100-EF), and bootstrapped CBS with 500 replicates (CBS500-EF),: all with the error feedback on.</Paragraph>
  </Section>
class="xml-element"></Paper>