<?xml version="1.0" standalone="yes"?> <Paper uid="J01-3001"> <Title>The Interaction of Knowledge Sources in Word Sense Disambiguation</Title> <Section position="2" start_page="0" end_page="322" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> Word sense disambiguation (WSD) is a problem long recognised in computational linguistics (Yngve 1955), and there has recently been a resurgence of interest, including a special issue of this journal devoted to the topic (Ide and Véronis 1998). Despite this, there is still considerable diversity in the methods employed by researchers, as well as in the definition of the problems to be tackled. The SENSEVAL evaluation framework (Kilgarriff 1998) was a DARPA-style competition designed to bring some conformity to the field of WSD, although it has yet to achieve that aim completely. The main sources of divergence are the choice of computational paradigm, the proportion of text words disambiguated, the granularity of the meanings assigned to them, and the knowledge sources used. We will discuss each in turn.</Paragraph> <Paragraph position="1"> Resnik and Yarowsky (1997) noted that, for the most part, part-of-speech tagging is tackled using the noisy channel model, although transformation rules and grammaticostatistical methods have also had some success. There has been far less consensus as to the best approach to WSD. Currently, machine learning methods (Yarowsky 1995; Rigau, Atserias, and Agirre 1997) and combinations of classifiers (McRoy 1992) are popular. This paper reports a WSD system employing elements of both approaches.</Paragraph> <Paragraph position="2"> Another source of difference in approach is the proportion of the vocabulary disambiguated.
Some researchers have concentrated on WSD systems whose results are based on a limited number of words: for example, Yarowsky (1995) and Schütze (1992) quoted results for 12 words, while a second group, including Leacock, Towell, and Voorhees (1993) and Bruce and Wiebe (1994), gave results for just one, namely interest. But limiting the vocabulary on which a system is evaluated can have two serious drawbacks. First, the words used were not chosen by frequency-based sampling techniques, so there is no way of knowing whether or not they are special cases, a point emphasised by Kilgarriff (1997). Second, there is no guarantee that the techniques employed will be applicable when a larger vocabulary is tackled. * Department of Computer Science, 211 Regent Court, Portobello Street, Sheffield S1 4DP, UK. © 2001 Association for Computational Linguistics. Computational Linguistics Volume 27, Number 3.</Paragraph> <Paragraph position="3"> However, it is likely that mark-up for a restricted vocabulary can be carried out more rapidly, since the annotator has to learn the possible senses of fewer words.</Paragraph> <Paragraph position="4"> Among the researchers mentioned above, one must distinguish between, on the one hand, supervised approaches whose performance is inherently limited to the words on which they are evaluated, because of limited training data, and, on the other hand, approaches whose unsupervised learning methodology is applied to only a small number of words for evaluation but which could in principle be used to tag all content words in a text. Others, such as Harley and Glennon (1997) and ourselves (Wilks and Stevenson 1998a, 1998b; Stevenson and Wilks 1999), have concentrated on approaches that disambiguate all content words. In addition to avoiding the problems inherent in restricted-vocabulary systems, wide-coverage systems are more likely to be useful for NLP applications, as discussed by Wilks et al.
(1990).</Paragraph> <Paragraph position="5"> A third difference concerns the granularity of the WSD attempted, which can be illustrated in terms of the two levels of semantic distinction found in many dictionaries: homograph and sense (see Section 3.1). Like Cowie, Guthrie, and Guthrie (1992), we shall give results at both levels, but it is worth pointing out that the targets of, say, work using translation equivalents (e.g., Brown et al. 1991; Gale, Church, and Yarowsky 1992c; and see Section 2.3) and Roget categories (Yarowsky 1992; Masterman 1957) correspond broadly to the coarser, homograph-level distinctions.</Paragraph> <Paragraph position="6"> In this paper we attempt to show that the high levels of performance more typical of systems trained on many instances of a restricted vocabulary can also be obtained by large-vocabulary systems, and that the best results are obtained by optimizing a combination of types of lexical knowledge (see Section 2).</Paragraph> <Section position="1" start_page="0" end_page="322" type="sub_section"> <SectionTitle> 1.1 Lexical Knowledge and WSD </SectionTitle> <Paragraph position="0"> Syntactic, semantic, and pragmatic information are all potentially useful for WSD, as can be demonstrated by considering the following sentences: (1) John did not feel well.</Paragraph> <Paragraph position="1"> (2) John tripped near the well.</Paragraph> <Paragraph position="2"> (3) The bat slept.</Paragraph> <Paragraph position="3"> (4) He bought a bat from the sports shop.</Paragraph> <Paragraph position="4"> The first two sentences contain the ambiguous word well: it is an adjective in (1), where it is used in its &quot;state of health&quot; sense, and a noun in (2), meaning &quot;water supply&quot;.
Since the two usages are different parts of speech, they can be disambiguated by this syntactic property.</Paragraph> <Paragraph position="5"> Sentence (3) contains the word bat, whose nominal readings are ambiguous between the &quot;creature&quot; and &quot;sports equipment&quot; meanings. Part-of-speech information cannot disambiguate these senses, since both are nominal usages. However, this sentence can be disambiguated using semantic information, such as preference restrictions: the verb sleep prefers an animate subject, and only the &quot;creature&quot; sense of bat is animate. So Sentence (3) can be disambiguated by its semantic behaviour but not by its syntax.</Paragraph> <Paragraph position="6"> Stevenson and Wilks Interaction of Knowledge Sources in WSD A preference restriction will not disambiguate Sentence (4), since the direct object preference will be at least as general as physical object, and any restriction on the direct object slot of the verb buy would cover both senses. The sentence can, however, be disambiguated on pragmatic grounds, because it is far more likely that sports equipment will be bought in a sports shop. Thus pragmatic information can be used to disambiguate bat to its &quot;sports equipment&quot; sense.</Paragraph> <Paragraph position="7"> Each of these knowledge sources has been used for WSD, and in Section 3 we describe a method that performs coarse-grained disambiguation using part-of-speech information. Wilks (1975) describes a system that performs WSD using semantic information in the form of preference restrictions. Lesk (1986) also used semantic information for WSD, in the form of textual definitions from dictionaries. 
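The definition-overlap idea behind Lesk (1986) can be illustrated with a minimal Python sketch. It implements the commonly used simplified variant, which scores each sense by the number of content words its definition shares with the sentence, rather than reproducing Lesk's original procedure. The two glosses for bat, the small stop list, and the function names overlap_scores and simplified_lesk are all hypothetical, invented for this example rather than taken from any dictionary or system.

```python
import re

# Hypothetical glosses for two senses of "bat", invented for this sketch;
# they are not taken from LDOCE, WordNet, or any other dictionary.
GLOSSES = {
    "bat/creature": "a small flying animal like a mouse that sleeps during the day",
    "bat/equipment": "a wooden implement used in sports to hit a ball, sold in a sports shop",
}

# A tiny, hypothetical stop list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "from", "he", "that", "like", "used", "during"}


def content_words(text: str) -> set[str]:
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlap_scores(sentence: str, glosses: dict[str, str]) -> dict[str, int]:
    """Count the content words each sense's gloss shares with the sentence."""
    context = content_words(sentence)
    return {
        sense: len(content_words(gloss).intersection(context))
        for sense, gloss in glosses.items()
    }


def simplified_lesk(sentence: str, glosses: dict[str, str]) -> str:
    """Pick the sense with the largest overlap; max() keeps the first-listed sense on a tie."""
    scores = overlap_scores(sentence, glosses)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(simplified_lesk("He bought a bat from the sports shop.", GLOSSES))  # bat/equipment
    print(overlap_scores("The bat slept.", GLOSSES))  # {'bat/creature': 0, 'bat/equipment': 0}
```

On Sentence (4) the sports equipment sense wins on the shared words sports and shop. On Sentence (3) both senses score zero, because slept does not match sleeps as a string, so the choice falls to the tie-break; surface overlap of this kind does not capture the preference-restriction evidence described above.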
Pragmatic information was used by Yarowsky (1992), whose approach relied upon statistical models of categories from Roget's Thesaurus (Chapman 1977), a resource that had been used in much earlier approaches to WSD, such as Masterman (1957).</Paragraph> <Paragraph position="8"> The remainder of this paper is organised as follows: Section 2 reviews some systems that have combined knowledge sources for WSD. In Section 3 we discuss the relationship between semantic disambiguation and part-of-speech tagging, reporting an experiment that quantifies the connection. A general WSD system is presented in Section 4. In Section 5 we explain the strategy used to evaluate this system, and in Section 6 we report the results.</Paragraph> </Section> </Section> </Paper>