<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1024">
<Title>Japanese Zero Pronoun Resolution based on Ranking Rules and Machine Learning</Title>
<Section position="5" start_page="0" end_page="0" type="evalu">
<SectionTitle> 3 Results </SectionTitle>
<Paragraph position="0"> We conducted leave-one(-article)-out experiments.</Paragraph>
<Paragraph position="1"> For each article, the 29 other articles were used for training. Table 1 compares the scores of the above methods. 'First' picks the first candidate under a given lexicographical ordering. The acronym 'vrads' stands for the lexicographical ordering of the corresponding features, each letter naming one feature.</Paragraph>
<Paragraph position="3"> A method based only on the SVM selects the candidate whose SVM output value is the best, provided that the value is positive. Consequently, it is independent of the ordering (unless two or more candidates have the best value). 'Svm1' uses the ordinary SVM (Vapnik, 1995), while 'svm2' uses a modified SVM for unbalanced data (Morik et al., 1999), which gives a large penalty to misclassification of a minority (= positive) example; the penalty ratio is set to the number of negative examples/number of positive examples. In general, svm2 accepts more candidates than svm1. According to this table, svm1 is too severe: it excludes good candidates as well as bad ones. We also tried the maximum entropy model (mem) and C4.5, but they were also too severe.</Paragraph>
<Paragraph position="4"> When we use an SVM, we have to choose a good kernel for better performance. Here, we used the linear kernel (K(x1, x2) = x1 · x2) because it was the best according to our preliminary experiments.</Paragraph>
<Paragraph position="5"> We set maxDi at 3 because it gave the best results.</Paragraph>
<Paragraph position="6"> The table also shows Seki's scores for reference, but it is not fair to compare our scores with Seki's directly because our data is slightly different from Seki's. The number of zeros in the general articles of our data is 347, while Seki resolved 355 detected zeros in (Seki et al., 2002a) and 404 in (Seki et al., 2002b). The number of zeros in our editorial articles is 514, while (Seki et al., 2002a) resolved 498 detected zeros. In order to overcome the data sparseness, Seki et al. used statistics obtained from unannotated corpora. Without these statistics, their scores degraded by about 5 points. We have not conducted experiments that use unannotated corpora; this task is our future work.</Paragraph>
<Paragraph position="10"> As we expected, candidates with good Vi, Ag, and Sa values show good performance. Without SVMs, 'vrads' is the best for the general articles in the table. It is interesting that such a simple ordering gives better performance than SVMs. However, the combination of 'vrads' and 'svm2' (= vrads+svm2) gives even better results. In general, '*+svm2' is better than 'first' and '*+svm1' for any ordering *. Among the SVM-based methods, 'davrs+svm2' gave the best result for the editorial articles. Editorial articles sometimes use anthropomorphism (e.g., "The report says ...") that violates semantic constraints; therefore, 'vrads' does not work well for such cases.</Paragraph>
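To make the comparison in Table 1 concrete, the sketch below shows one plausible reading of how a lexicographical ordering and an SVM acceptor can be combined. It is an illustration under stated assumptions, not the authors' implementation: candidates are represented as dictionaries of feature values, the feature keys and sort direction are hypothetical, and svm_accepts stands for whichever classifier (svm1 or svm2) is used to accept candidates.

```python
# Minimal sketch of 'first' vs. 'ordering+svm'.  Feature names, sort
# direction, and the fallback behaviour are assumptions for illustration.

def rank(candidates, order_keys):
    """Sort candidates lexicographically by the given feature keys
    (smaller value = better is assumed here)."""
    return sorted(candidates, key=lambda c: tuple(c[k] for k in order_keys))

def first(candidates, order_keys):
    """'first': simply take the top candidate of the ordering."""
    return rank(candidates, order_keys)[0]

def ordering_plus_svm(candidates, order_keys, svm_accepts):
    """'ordering+svm': take the highest-ranked candidate that the SVM
    accepts; fall back to the top-ranked candidate if none is accepted
    (the fallback itself is an assumption)."""
    ranked = rank(candidates, order_keys)
    for cand in ranked:
        if svm_accepts(cand):
            return cand
    return ranked[0]
```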
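The 'svm2' variant can be approximated with any SVM package that supports per-class misclassification costs. The snippet below is a rough sketch using scikit-learn (my choice of toolkit, not the software used in the paper); it sets the positive-class penalty to the ratio of negative to positive training examples, in the spirit of the cost factor described above, and assumes labels of +1/-1.

```python
from sklearn.svm import SVC

def train_svm2(X, y):
    """Linear-kernel SVM with a heavier penalty on minority (positive)
    mistakes: positive-class weight = #negative / #positive.
    Labels are assumed to be +1 (positive) and -1 (negative)."""
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    clf = SVC(kernel="linear", class_weight={1: n_neg / n_pos, -1: 1.0})
    clf.fit(X, y)
    return clf
```

An acceptor such as svm_accepts in the previous sketch could then be written as lambda cand: clf.decision_function([to_vector(cand)])[0] > 0, where to_vector is a hypothetical feature encoder.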
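Table 2, discussed next, lists per-feature weights. Because the kernel is linear, the decision function collapses to a single weight vector, so these weights can be read directly from a trained model. A minimal sketch, assuming a scikit-learn-style classifier and a hypothetical list of feature names in the same order as the feature vector:

```python
def feature_weights(clf, feature_names):
    """For a linear SVM, f(x) = sum_i alpha_i y_i (x_i . x) + b equals
    w . x + b with w = sum_i alpha_i y_i x_i; coef_ exposes that w."""
    weights = clf.coef_.ravel()  # one weight per feature
    return sorted(zip(feature_names, weights),
                  key=lambda pair: abs(pair[1]), reverse=True)
```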
<Paragraph position="11"> Table 2 shows the weights of the above features determined by svm2 for one fold of the leave-one-out experiment of 'vrads+svm2.' Since the kernel is linear, the weights can be obtained by rewriting the decision function f(x) = Σ_i α_i y_i (x_i · x) + b as w · x + b, where w = Σ_i α_i y_i x_i. According to the table, property-sharing (Ag), semantic violation (Vi), candidate's particle (CP), and distance (Di) are very important features. Our new features Parallel, Unfinished, and Intra also obtained relatively large weights. The weights of the semantic categories 'suggestions' and 'report' reflect the fact that some articles use anthropomorphism. These weights will be useful for designing better heuristic rules. Notably, Unfinished's weight almost cancels Relative's weight. Previous work has tried to find useful features by feature elimination. Since features are not completely independent, removing a heavily weighted feature does not necessarily degrade the system's performance. Hence, feature elimination is more reliable for reducing the number of features.</Paragraph>
<Paragraph position="12"> However, feature elimination takes a long time. On the other hand, feature weights can give rough guidance. As noted above, our new features (Parallel, Unfinished, and Intra) obtained relatively large weights, which implies their importance. When we eliminated these three features, vrads+svm2's score for the editorial articles dropped by 4 points. Therefore, the combination of these three features is useful.</Paragraph>
<Paragraph position="13"> Recently, Iida et al. (2003a) proposed an SVM-based tournament model that compares two candidates and selects the better one. We would like to compare their method with ours, or combine the two.</Paragraph>
<Paragraph position="14"> For further improvement, we have to make the morphological analyzer and the dependency analyzer more reliable, because they make many mistakes when processing complex sentences.</Paragraph>
<Paragraph position="15"> SVMs have often been criticized as being too slow. However, the above data were small enough for state-of-the-art SVM programs. The number of examples in each training set was about 5,000-6,100, and each training phase took only 5-18 seconds on a 2.4-GHz Pentium 4 machine.</Paragraph>
</Section>
</Paper>