File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/98/j98-1002_abstr.xml
Size: 4,543 bytes
Last Modified: 2025-10-06 13:49:09
<?xml version="1.0" standalone="yes"?>
<Paper uid="J98-1002">
<Title>Similarity-based Word Sense Disambiguation</Title>
<Section position="2" start_page="0" end_page="0" type="abstr">
<SectionTitle> 1. Introduction </SectionTitle>
<Paragraph position="0"> Word sense disambiguation (WSD) is the problem of assigning a sense to an ambiguous word, using its context. We assume that different senses of a word correspond to different entries in its dictionary definition. For example, suit has two senses listed in a dictionary: 'an action in court' and 'suit of clothes.' Given the sentence The union's lawyers are reviewing the suit, we would like the system to decide automatically that suit is used there in its court-related sense (we assume that the part of speech of the polysemous word is known).</Paragraph>
<Paragraph position="1"> In recent years, text corpora have been the main source of information for learning automatic WSD (see, for example, Gale, Church, and Yarowsky [1992]). A typical corpus-based algorithm constructs a training set from all contexts of a polysemous word W in the corpus, and uses it to learn a classifier that maps instances of W (each supplied with its context) into the senses. Because learning requires that the examples in the training set be partitioned into the different senses, and because sense information is not explicitly available in the corpus, this approach depends critically on manual sense tagging, a laborious and time-consuming process that has to be repeated for every word, in every language, and, more likely than not, for every topic of discourse or source of information.</Paragraph>
<Paragraph position="2"> The need for tagged examples creates a problem referred to in previous work as the knowledge acquisition bottleneck: training a disambiguator for W requires that the examples in the corpus be partitioned into senses, which, in turn, requires a fully operational disambiguator. The method we propose circumvents this problem by automatically tagging the training set examples for W using other examples that do not contain W but do contain related words extracted from its dictionary definition. For instance, in the training set for suit, we would use, in addition to the contexts of suit, all the contexts of court and of clothes in the corpus, because court and clothes appear in the machine-readable dictionary (MRD) entry of suit that defines its two senses. Note that, unlike the contexts of suit, which may discuss either court action or clothing, the contexts of court are not likely to be especially related to clothing, and, similarly, those of clothes will normally have little to do with lawsuits. We will use this observation to tag the original contexts of suit.</Paragraph>
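The training-set construction just described can be sketched in a few lines of code. The snippet below is a minimal illustration, not the authors' implementation: the toy corpus, the sense labels, and the helper names (contexts_of, build_training_set) are assumptions made for the example. It simply collects, for each sense of suit, the contexts of the defining words (court, clothes) as automatically tagged examples, while the contexts of suit itself are left untagged for now.

```python
from collections import defaultdict

# Toy corpus: one context (sentence) per entry. Purely illustrative.
CORPUS = [
    "The union's lawyers are reviewing the suit",
    "The court dismissed the suit after a short hearing",
    "He bought a new suit and matching clothes for the wedding",
    "The judge adjourned the court until Monday",
    "She folded the clothes and put them in the closet",
]

# Words taken from the MRD entry of "suit", one list per sense (toy data).
SENSE_DEFINITIONS = {
    "court-action": ["court"],
    "clothing": ["clothes"],
}

def contexts_of(word, corpus):
    """All sentences that contain the given word."""
    return [s for s in corpus if word in s.lower().split()]

def build_training_set(target, sense_definitions, corpus):
    """Automatically tagged examples for `target`: every context of a
    definition word that does not itself contain `target` is labelled
    with the sense that word defines."""
    tagged = defaultdict(list)
    for sense, def_words in sense_definitions.items():
        for w in def_words:
            for sentence in contexts_of(w, corpus):
                if target not in sentence.lower().split():
                    tagged[sense].append(sentence)
    # The contexts of the target word remain untagged; they are
    # labelled later by comparison with the tagged sets.
    return tagged, contexts_of(target, corpus)

tagged, untagged = build_training_set("suit", SENSE_DEFINITIONS, CORPUS)
print(dict(tagged))
print(untagged)
```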
<Paragraph position="3"> Another problem that affects corpus-based WSD methods is the sparseness of data: these methods typically rely on the statistics of co-occurrences of words, while many of the possible co-occurrences are not observed even in a very large corpus (Church and Mercer 1993). We address this problem in several ways. First, instead of tallying word statistics from the examples for each sense (which may be unreliable when the examples are few), we collect sentence-level statistics, representing each sentence by the set of features it contains (for more on features, see Section 4.2). Second, we define a similarity measure on the feature space, which allows us to pool the statistics of similar features. Third, in addition to the examples of the polysemous word W in the corpus, we also learn from the examples of all the words in the dictionary definition of W. In our experiments, this resulted in a training set that could be up to 20 times larger than the set of original examples.</Paragraph>
<Paragraph position="4"> The rest of this paper is organized as follows. Section 2 describes the approach we have developed. In Section 3, we report the results of tests we have conducted on the Treebank-2 corpus. Section 4 concludes with a discussion of related methods and a summary. Proofs and other details of our scheme can be found in the appendix.</Paragraph>
</Section>
</Paper>
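To make the three-part strategy listed in the introduction concrete, here is a minimal, self-contained sketch of the sentence-level representation and of tagging a new context by similarity to the automatically tagged examples. The feature extractor and the Jaccard overlap used here are simplified stand-ins chosen for the example; the paper develops its own feature set (Section 4.2) and its own similarity measure.

```python
def features(sentence):
    """Represent a sentence by the set of features it contains.
    A real system would use lemmas, POS filtering, etc.; as a stand-in
    we lowercase and drop very short tokens."""
    return {w for w in sentence.lower().split() if len(w) > 3}

def similarity(feats_a, feats_b):
    """Jaccard overlap between two feature sets (a simplified stand-in
    for the similarity measure developed in the paper)."""
    if not feats_a or not feats_b:
        return 0.0
    return len(feats_a & feats_b) / len(feats_a | feats_b)

def tag_by_similarity(context, tagged):
    """Assign the sense whose automatically tagged examples are, on
    average, most similar to the new context."""
    feats = features(context)
    scores = {
        sense: sum(similarity(feats, features(ex)) for ex in examples) / len(examples)
        for sense, examples in tagged.items()
        if examples
    }
    return max(scores, key=scores.get)

# Automatically tagged examples, e.g. contexts of "court" and "clothes"
# collected as in the earlier sketch (toy data).
tagged = {
    "court-action": ["Their lawyers filed papers with the court last week"],
    "clothing": ["She folded the clothes and put them in the closet"],
}
print(tag_by_similarity("The union's lawyers are reviewing the suit", tagged))
# -> court-action
```

Because the tagged sets are built from contexts of the definition words rather than from manually labelled contexts of suit, this pipeline needs no hand-tagged training data; pooling evidence over similar sentences (and, in the paper, over similar features) compensates for the sparseness of exact co-occurrence counts.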