<?xml version="1.0" standalone="yes"?>
<Paper uid="W02-1006">
  <Title>An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Natural language is inherently ambiguous. A word can have multiple meanings (or senses). Given an occurrence of a word w in a natural language text, the task of word sense disambiguation (WSD) is to determine the correct sense of w in that context.</Paragraph>
    <Paragraph position="1"> WSD is a fundamental problem in natural language processing. For example, effective WSD is crucial for high-quality machine translation.</Paragraph>
    <Paragraph position="2"> One could envisage building a WSD system using handcrafted rules or knowledge obtained from linguists. Such an approach would be highly labor-intensive, with questionable scalability. Another approach uses a dictionary or thesaurus to perform WSD.</Paragraph>
    <Paragraph position="3"> In this paper, we focus on a corpus-based, supervised learning approach. In this approach, to disambiguate a word w, we first collect training texts in which instances of w occur. Each occurrence of w is manually tagged with the correct sense. We then train a WSD classifier on these sample texts, so that the trained classifier can assign the sense of w in a new context.</Paragraph>
    <Paragraph position="4"> Two WSD evaluation exercises, SENSEVAL-1 (Kilgarriff and Palmer, 2000) and SENSEVAL-2 (Edmonds and Cotton, 2001), were conducted in 1998 and 2001, respectively. The lexical sample task in both SENSEVALs evaluates WSD systems on disambiguating a subset of nouns, verbs, and adjectives for which manually sense-tagged training data have been collected.</Paragraph>
    <Paragraph position="5"> In this paper, we conduct a systematic evaluation of the various knowledge sources and supervised learning algorithms on the English lexical sample data sets of both SENSEVALs.</Paragraph>
  </Section>
</Paper>