<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0213">
  <Title>A Perspective on Word Sense Disambiguation Methods and Their Evaluation</Title>
  <Section position="3" start_page="0" end_page="80" type="intro">
    <SectionTitle>
2 Observations
</SectionTitle>
    <Paragraph position="0"> Observation 1. Evaluation of word sense disambiguation systems is not yet standardized.</Paragraph>
    <Paragraph position="1"> Evaluation of many natural language processing tasks including part-of-speech tagging and parsing has become fairly standardized, with most reported studies using common training and testing resources such as the Brown Corpus and Penn Treebank. Performance measures include a fairly well recognized suite of metrics including crossing brackets and precision/recall of non-terminal label placement. Several researchers (including Charniak, Collins and Magerman) have facilitated contrastive evaluation of their parsers by even training and testing on identical segments of the Treebank. Government funding agencies have accelerated this process, and even the task of anaphora resolution has achieved an evaluation standard under the MUC-6 program.</Paragraph>
    <Paragraph position="2"> In contrast, most previous work in word sense disambiguation has tended to use different sets of polysemous words, different corpora and different evaluation metrics. Some clusters of studies have used common test suites, most notably the 2094-word Hne data of Leacock et al. (1993), shared by Lehman (1994) and Mooney (1996) and evaluated on the system of Gale, Church and Yarowsky (1992). Also, researchers have tended to keep their evaluation data and procedures somewhat standard across their own studies for internally consistent comparison. Nevertheless, there are nearly as many test suites as there are researchers in this field.</Paragraph>
    <Paragraph position="3"> Observation 2. The potential for WSD varies by task. As Wilks and Stevenson (1996) emphasize, disambiguating word senses is not an end in itself, but rather an intermediate capability that is believed -- but not yet proven -- to improve natural language applications. It would appear, however, that different major applications of language differ in their potential to make use of successful word sense information. In information retrieval, even perfect word sense information may be of only limited utility, largely owing to the implicit disambiguation that takes place when,multiple words within a query match multiple words within a document (Krovetz and Croft, 1992). In speech recognition, sense information is potentially most relevant in the form of word equivalence classes for smoothing in language models, but smoothing based on equivalence classes of contexts (e.g. (Bahl et al., 1983; Katz, 1987)) has a far better track record than smoothing based On classes of words (e.g. (Brown et al., 1992)).</Paragraph>
    <Paragraph position="4"> The potential for using word senses in machine translation seems rather more promising. At the level of monolingual lexical information useful for high quality machine translation, for example, there  is good reason to associate information about syntactic realizations of verb meanings with verb senses rather than verb tokens (Don' and Jones, 1996a; 1996b). And of course unlike machine translation or speech recognition, the human process followed in completing the task takes exp\]\]icit account of word senses, in that translators make use of correspondences in bilingual dictionaries organized according to word senses.</Paragraph>
    <Paragraph position="5"> Observation 3. Adequately large sense-tagged data sets are difficult to obtain. Availability of data is a significant factor contributing to recent advances in part-of-speech tagging, parsing, etc. For the most successful approaches to such problems, correctly annotated data are crucial for training learning-based algorithms. Regardless of whether or not learning is involved, the prev~illng evaluation methodology requires correct test sets in order to rigorously assess the quality of algorithms and compare their performance.</Paragraph>
    <Paragraph position="6"> Unfortunately, of the few sense-annotated corpora currently available, virtually all are tagged collections of a single ambiguous word such as line or tank. The only broad-coverage annotation of all the words in a subcorpus is the WordNet semantic concordance (Miller et ai., 1994). This represents a very important contribution to the field, providing the first large-scale, balanced data set for the study of the distributional properties of polysemy in English.</Paragraph>
    <Paragraph position="7"> However, its utility as a tr~inlng and evaluation resource for supervised sense taggers is currently somewhat limited by its token-by-token sequential tagging methodology, yielding too few tagged instances of the large majority of polysemous words (typically fewer than 10 each), rather than providing much larger training/testing sets for a selected subset of the vocabulary. In addition, sequential ~nnotation forces annotators to repeatedly refamiliarize themselves with the sense inventories of each word, slowing ~nnotation speed and lowering intra- and inter-annotator agreement rates. Nevertheless, the Word-Net semantic hierarchy itself is a central training resource for a variety of sense disambiguation algorithms and the existence of a corpus tagged in this sense inventory is a very useful complementary resource, even if small.</Paragraph>
    <Paragraph position="8"> The other major potential source of sense-tagged data comes from parallel aligned bilingual corpora.</Paragraph>
    <Paragraph position="9"> Here, translation distinctions can provide a practical correlate to sense distinctions, as when instances of the English word duty translated to the French words devoir and droit correspond to the mono-lingual sense distinction between dUty/OBLIGATION and duty/TAX. Current offerings of parallel bilingual corpora are limited, but as their availability and diversity increase they offer the possibility of limitless '~agged&amp;quot; training data without the need for manual annotation.</Paragraph>
    <Paragraph position="10"> Given the data requirements for supervised learning algorithms and the current paucity of such data, we believe that unsupervised and minimally supervised methods offer the primary near-term hope for broad-coverage sense tagging. However, we see strong future potential for supervised algorithms using many types of aligned bilingual corpora for many types of sense distinctions.</Paragraph>
    <Paragraph position="11"> Observation 4. The field has narrowed down approaches, but only a little. In the area of part-of-speech tagging, the noisy channel model dominates (e.g. (Bald and Mercer, 1976; Jelinek, 1985; Church, 1988)), with transformational role-based methods (Brill, 1993) and grammatico-statistical hybrids (e.g. (Tapanainen and Voutilainen, 1994)) also having a presence. Regardless of which of these approaches one takes, there seems to be consensus on what makes part-of-speech tagging successful:  fully using only tag-level models without lexical sensitivities besides the priors.</Paragraph>
    <Paragraph position="12"> * Standard annotated corpora of adequate size have long been available.</Paragraph>
    <Paragraph position="13">  In contrast, approaches to WSD attempt to take advantage of many different sources of information (e.g. see (McRoy, 1992; Ng and Lee, 1996; Bruce and Wiebe, 1994)); it seems possible to obtain benefit from sources ranging from local collocational clues (Yarowsky, 1993) to membership in semantically or topically related word classes (Y=arowsky, 1992; Resnik, 1993) to consistency of word usages within a discourse (Gale et al., 1992); and disambignation seems highly lexically sensitive, in effect requiring specialized disamhignators for each polysemous word.</Paragraph>
  </Section>
class="xml-element"></Paper>