<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-0809">
  <Title>Dutch Word Sense Disambiguation: Optimizing the Localness of Context</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Solving lexical ambiguity, or word sense disambiguation (WSD), is an important task in Natural Language Processing systems (Kilgarriff and Palmer, 2000). Much like syntactic word-class disambiguation, it is not an end in itself, but rather a sub-task of other natural language processing tasks. The problem is far from solved, and research and competition in the development of WSD systems in isolation remain worthwhile, preferably on many different languages and genres.</Paragraph>
    <Paragraph position="1"> This paper describes a refinement of an existing all-words WSD system for Dutch (Hoste et al., 2002b) that is an ensemble of word experts, each specialised in disambiguating the senses for one particular ambiguous wordform. Each word expert has a memory-based classification kernel. The system was developed on the basis of Dutch WSD data made available for the SENSEVAL-2 competition.</Paragraph>
    <Paragraph position="2"> The data, a collection of 102 children's books for the age range of 4 to 12, is annotated according to a non-hierarchical sense inventory that is based on a children's dictionary (for a detailed description of the data, cf. (Hendrickx and van den Bosch, 2002)).</Paragraph>
    <Paragraph position="3"> Since SENSEVAL-2, both the data and the system have been refined. The data has been cleaned by hand to remove annotation errors. Subsequently, cross-validation experiments were performed to optimize the amount of local context around the ambiguous word, which had been set to an arbitrary constant in previous studies (Veenstra et al., 2000; Hendrickx and van den Bosch, 2002; Hoste et al., 2002a). Cross-validation focused on local context as opposed to non-local context (e.g. keyword features), since a post-SENSEVAL-2 study described in (Hoste et al., 2002b) indicated that for the Dutch data, WSD on local context, i.e. the three words immediately to the left and to the right of the ambiguous word, yielded the best performance among all variants tested. Local context alone proved to be better than keyword vector representations of the wider textual context, and better than classifier combination schemes.</Paragraph>
    <!-- Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, Philadelphia, July 2002, pp. 61-66. Association for Computational Linguistics. -->
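The notion of local context used above can be illustrated with a minimal Python sketch. It only shows how a window of neighbouring words might be extracted for one ambiguous word at several candidate widths; it is not the system's actual feature extractor. The function name local_context, the padding symbol "_" and the toy Dutch sentence are assumptions made for illustration, and any further features used by the word experts are not covered.

```python
def local_context(tokens, index, width=3, pad="_"):
    """Return the `width` words left and right of tokens[index].

    Positions that fall outside the sentence are filled with `pad`
    (a hypothetical padding symbol, not taken from the paper).
    """
    # Pad both ends so that every window has exactly 2 * width slots.
    padded = [pad] * width + list(tokens) + [pad] * width
    centre = index + width
    left = padded[centre - width:centre]
    right = padded[centre + 1:centre + 1 + width]
    return left + right


if __name__ == "__main__":
    # Toy sentence; the ambiguous word is assumed to be "zit" at position 2.
    sentence = "de kat zit op de mat".split()
    # Candidate window widths that a cross-validation run could compare.
    for width in (1, 2, 3):
        print(width, local_context(sentence, 2, width=width))
```

In a cross-validation setting of the kind described above, each word expert would be trained and evaluated once per candidate width, and the width with the best held-out accuracy would be kept for that expert.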
    <Paragraph position="4"> The paper is structured as follows. First, in Section 2 we briefly review the Dutch WSD system and the data it is based on. Section 3 describes the new cross-validation experiments that focus on optimising the amount of local context per word expert.</Paragraph>
    <Paragraph position="5"> Section 4 discusses the new results and puts them in the perspective of related studies.</Paragraph>
  </Section>
</Paper>