<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0809">
  <Title>Dutch Word Sense Disambiguation: Optimizing the Localness of Context</Title>
  <Section position="6" start_page="0" end_page="0" type="concl">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> In this paper we reported on a refined version of MBWSD-D, a memory-based WSD system for Dutch. Compared to an earlier version, built on the data made available for the SENSEVAL-2 competition, we manually corrected the annotations in the data, and on the corrected data we additionally cross-validated the amount of local context, which in previous work had been held arbitrarily constant at three neighbouring words to the left and right plus their POS tags (Hendrickx and van den Bosch, 2002; Hoste et al., 2002b). We also left out the keyword features used in those studies, since those studies showed that they did not contribute to accuracy on test material. Our cross-validation experiments led to a score of 84.8% on test material. Because these experiments were run on a cleaned version of the data, the results described so far cannot be compared directly with those reported in (Hendrickx and van den Bosch, 2002), which were obtained on the previous version of the data and with different parameter optimizations. In those experiments, an optimized memory-based classifier trained only on a local context of three neighbouring words to the left and right achieved a score of 84.2% on the word-expert words in the test set.</Paragraph>
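The window-size selection described above can be illustrated with a minimal Python sketch. It is not the MBWSD-D implementation. A plain 1-nearest-neighbour classifier with an overlap (mismatch-count) distance stands in for the memory-based learner, and leave-one-out stands in for the paper's cross-validation procedure. The function names, the candidate window sizes and the toy Dutch examples for the ambiguous word "bank" are all hypothetical. Each instance is described by the words and POS tags of the n tokens to the left and right of the ambiguous word, and the window size with the highest leave-one-out accuracy is kept.

```python
"""Hypothetical sketch: choose the local-context window size by leave-one-out."""

PAD = ("_", "_")  # filler (word, POS) for positions beyond the sentence edges


def window_features(tagged, i, n):
    """Return the words, then the POS tags, of the n tokens left and right of position i."""
    padded = [PAD] * n + list(tagged) + [PAD] * n
    # In the padded list the target token sits at index i + n.
    context = padded[i:i + n] + padded[i + n + 1:i + 2 * n + 1]
    return tuple(w for w, _ in context) + tuple(p for _, p in context)


def overlap_distance(x, y):
    """Number of feature positions on which two instances differ."""
    return sum(a != b for a, b in zip(x, y))


def loo_accuracy(examples, n):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier for window size n."""
    instances = [(window_features(s, i, n), sense) for s, i, sense in examples]
    correct = 0
    for k, (x, gold) in enumerate(instances):
        memory = instances[:k] + instances[k + 1:]
        # min() returns the first stored instance among equally distant ones.
        _, predicted = min(memory, key=lambda inst: overlap_distance(inst[0], x))
        correct += predicted == gold
    return correct / len(instances)


def select_window(examples, candidates=(1, 2, 3, 4)):
    """Return the window size with the best leave-one-out accuracy, plus all scores."""
    scores = {n: loo_accuracy(examples, n) for n in candidates}
    # max() keeps the earliest candidate on ties, i.e. the smallest window here.
    best = max(candidates, key=lambda n: scores[n])
    return best, scores


# Hypothetical toy data: (POS-tagged sentence, index of "bank", sense label).
EXAMPLES = [
    ([("hij", "Pron"), ("zit", "V"), ("op", "Prep"), ("de", "Art"), ("bank", "N")], 4, "furniture"),
    ([("zij", "Pron"), ("ligt", "V"), ("op", "Prep"), ("de", "Art"), ("bank", "N")], 4, "furniture"),
    ([("de", "Art"), ("bank", "N"), ("staat", "V"), ("in", "Prep"), ("de", "Art"), ("kamer", "N")], 1, "furniture"),
    ([("hij", "Pron"), ("werkt", "V"), ("bij", "Prep"), ("de", "Art"), ("bank", "N")], 4, "financial"),
    ([("zij", "Pron"), ("belt", "V"), ("de", "Art"), ("bank", "N"), ("over", "Prep"), ("geld", "N")], 3, "financial"),
    ([("de", "Art"), ("bank", "N"), ("verhoogt", "V"), ("de", "Art"), ("rente", "N")], 1, "financial"),
]

if __name__ == "__main__":
    best, scores = select_window(EXAMPLES)
    print("selected window size:", best, scores)
```

Holding the window fixed, as in the earlier work, corresponds to calling `loo_accuracy` for a single n instead of letting `select_window` compare several candidates.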
    <Paragraph position="1"> To compare results on the old and the new version of the data, we ran an experiment on the new data using the same cross-validation procedure as in (Hendrickx and van den Bosch, 2002). This led to a score of 84.3% on the test set, against 84.2% on the old data, which shows that cleaning the data did not yield significantly better results.</Paragraph>
    <Paragraph position="2"> Additional post-hoc analyses show that when local context is not cross-validated but held constant at two neighbouring words to the left and right, an accuracy of 85.0% can be obtained, slightly above the 84.8% reached with cross-validated context. This suggests that the cross-validation method slightly overfitted its estimates to the training material, which is also reflected in the higher cross-validated optimal accuracy on held-out training material (87.3%).</Paragraph>
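The overfitting argument rests on two differences between reported accuracies. As a minimal illustration only, the Python snippet restates that arithmetic with the figures given in this section; the variable names are hypothetical.

```python
# Accuracies (in %) reported in this section.
held_out_cv = 87.3     # cross-validated optimal accuracy on held-out training material
test_cv = 84.8         # test accuracy with cross-validated local context
test_fixed_two = 85.0  # test accuracy with context fixed at two words left and right

# How far the cross-validation estimate exceeds the test result it led to.
optimism = round(held_out_cv - test_cv, 1)      # 2.5
# How much test accuracy the fixed two-word context gains over the selected setting.
shortfall = round(test_fixed_two - test_cv, 1)  # 0.2

print(f"estimate exceeds test score by {optimism} points")
print(f"fixed two-word context beats the selected setting by {shortfall} points")
```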
  </Section>
</Paper>