File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/02/w02-0809_evalu.xml
Size: 2,720 bytes
Last Modified: 2025-10-06 13:58:53
<?xml version="1.0" standalone="yes"?> <Paper uid="W02-0809"> <Title>Dutch Word Sense Disambiguation: Optimizing the Localness of Context</Title> <Section position="5" start_page="0" end_page="0" type="evalu"> <SectionTitle> 4 Results </SectionTitle> <Paragraph position="0"> The top line of Table 1 shows the mean score of all the word experts together on the test set. The score of the word experts on the test set, 84.8%, is considerably higher than the baseline score of 77.2%. These are the results of the word experts only; the second row also includes the best-guess outputs for the lower-frequency words, which lowers the system's performance slightly.</Paragraph> <Paragraph position="1"> [Table 1 column headers: test selection, #words, baseline, system] We can also calculate the score on all the words in the test set, including the unambiguous words, to give an impression of the overall performance. The unambiguous words are given a score of 100%. It might be useful for a disambiguation system to tag unambiguous words with their lemma, but this kind of tagging is not of interest in our task. The third row of Table 1 shows the results on all words in the test set.</Paragraph> <Paragraph position="2"> The best context and parameter settings, determined by cross-validation for each word expert on the training set, are assumed to be the best settings for the test material as well; this is a fundamental assumption of parameter cross-validation. As a post-hoc analysis, we checked the validity of this assumption. We partitioned the exhaustive matrix of experiments by the value of each tested parameter in turn, measuring the accuracy on test material while holding that parameter constant at each of its values. This means, for example, that we split the matrix of 1000 experiments per word expert into 500 experiments without the use of MVDM and 500 experiments with MVDM. 
Two test scores are computed: for each word expert, the best parameter settings are selected separately from the first 500 and from the second 500 experiments, and each selection is applied to the test material. In other words, all parameters are optimized except MVDM, which is held constant (on or off). We performed this post-hoc test for all parameters. As it turned out, in six cases keeping the parameter constant led to performance equal to or (slightly) better than the cross-validated 84.8%. Table 2 lists the six constant parameter settings. These results indicate that the parameter setting estimation by cross-validation suffers, albeit slightly, from overfitting on the training material.</Paragraph> <Paragraph position="3"> their accuracy on test material that, held constant, equal or outperform the cross-validated test score (top).</Paragraph> </Section> </Paper>
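<!--
The post-hoc check described in the Results section can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the Experiment record, the parameter names "mvdm" and "k", the unweighted mean over word experts, and all numbers in TOY_EXPERTS are assumptions invented for the example. For each word expert the sketch picks the setting with the best cross-validation accuracy, either from the full matrix or from the part of the matrix where one parameter is held constant, and then reports the test accuracy of that setting.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class Experiment:
    """One cell of the experiment matrix for a single word expert (hypothetical record)."""
    settings: Dict[str, object]  # hypothetical parameter names and values
    cv_accuracy: float           # cross-validation accuracy on training material
    test_accuracy: float         # accuracy of the same settings on test material


Constraint = Optional[Tuple[str, object]]


def select_test_accuracy(experiments: List[Experiment],
                         constraint: Constraint = None) -> float:
    """Return the test accuracy of the setting with the best cross-validation accuracy.

    If a constraint (parameter name, value) is given, only experiments in which
    that parameter has that value are considered.
    """
    if constraint is not None:
        name, value = constraint
        experiments = [e for e in experiments if e.settings[name] == value]
    best = max(experiments, key=lambda e: e.cv_accuracy)
    return best.test_accuracy


def mean_test_accuracy(experts: Dict[str, List[Experiment]],
                       constraint: Constraint = None) -> float:
    """Unweighted mean over word experts (an assumption; the paper may weight by tokens)."""
    return mean(select_test_accuracy(exps, constraint) for exps in experts.values())


# Toy data, invented for illustration only: two word experts, four experiments each.
TOY_EXPERTS: Dict[str, List[Experiment]] = {
    "expert_a": [
        Experiment({"mvdm": False, "k": 1}, 0.80, 0.82),
        Experiment({"mvdm": False, "k": 3}, 0.78, 0.79),
        Experiment({"mvdm": True, "k": 1}, 0.83, 0.80),
        Experiment({"mvdm": True, "k": 3}, 0.79, 0.81),
    ],
    "expert_b": [
        Experiment({"mvdm": False, "k": 1}, 0.70, 0.71),
        Experiment({"mvdm": False, "k": 3}, 0.74, 0.75),
        Experiment({"mvdm": True, "k": 1}, 0.72, 0.70),
        Experiment({"mvdm": True, "k": 3}, 0.73, 0.72),
    ],
}

if __name__ == "__main__":
    print("cross-validated:", mean_test_accuracy(TOY_EXPERTS))
    print("MVDM held off:  ", mean_test_accuracy(TOY_EXPERTS, ("mvdm", False)))
    print("MVDM held on:   ", mean_test_accuracy(TOY_EXPERTS, ("mvdm", True)))
```

In the toy data, holding MVDM off yields a higher mean test accuracy than unconstrained selection, which mirrors the kind of slight overfitting to the training material that the section reports.
-->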