<?xml version="1.0" standalone="yes"?>
<Paper uid="P96-1006">
  <Title>Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach</Title>
  <Section position="7" start_page="499" end_page="499" type="relat">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> There is now a large body of past work on WSD.</Paragraph>
    <Paragraph position="1"> Early work on WSD, such as (Kelly and Stone, 1975; Hirst, 1987) used hand-coding of knowledge to perform WSD. The knowledge acquisition process is laborious. In contrast, LEXAS learns from tagged sentences, without human engineering of complex rules.</Paragraph>
    <Paragraph position="2"> The recent emphasis on corpus based NLP has resulted in much work on WSD of unconstrained real-world texts. One line of research focuses on the use of the knowledge contained in a machine-readable dictionary to perform WSD, such as (Wilks et al., 1990; Luk, 1995). In contrast, LEXAS uses supervised learning from tagged sentences, which is also the approach taken by most recent work on WSD, including (Bruce and Wiebe, 1994; Miller et al., 1994; Leacock et al., 1993; Yarowsky, 1994; Yarowsky, 1993; Yarowsky, 1992).</Paragraph>
    <Paragraph position="3"> The work of (Miller et al., 1994; Leacock et al., 1993; Yarowsky, 1992) used only the unordered set of surrounding words to perform WSD, and they used statistical classifiers, neural networks, or IR-based techniques. The work of (Bruce and Wiebe, 1994) used parts of speech (POS) and morphological form, in addition to surrounding words. However, the POS used are abbreviated POS, and only in a window of -b2 words. No local collocation knowledge is used. A probabilistic classifier is used in (Bruce and Wiebe, 1994).</Paragraph>
    <Paragraph position="4"> That local collocation knowledge provides important clues to WSD is pointed out in (Yarowsky, 1993), although it was demonstrated only on performing binary (or very coarse) sense disambiguation. The work of (Yarowsky, 1994) is perhaps the most similar to our present work. However, his work used decision list to perform classification, in which only the single best disambiguating evidence that matched a target context is used. In contrast, we used exemplar-based learning, where the contributions of all features are summed up and taken into account in coming up with a classification. We also include verb-object syntactic relation as a feature, which is not used in (Yarowsky, 1994). Although the work of (Yarowsky, i994) can be applied to WSD, the results reported in (Yarowsky, 1994) only dealt with accent restoration, which is a much simpler problem. It is unclear how Yarowsky's method will fare on WSD of a common test data set like the one we used, nor has his method been tested on a large data set with highly ambiguous words tagged with the refined senses of WORDNET.</Paragraph>
    <Paragraph position="5"> The work of (Miller et al., 1994) is the only prior work we know of which attempted to evaluate WSD on a large data set and using the refined sense distinction of WORDNET. However, their results show no improvement (in fact a slight degradation in performance) when using surrounding words to perform WSD as compared to the most frequent heuristic.</Paragraph>
    <Paragraph position="6"> They attributed this to insufficient training data in SEMCOm In contrast, we adopt a different strategy of collecting the training data set. Instead of tagging every word in a running text, as is done in SEMCOR, we only concentrate on the set of 191 most frequently occurring and most ambiguous words, and collected large enough training data for these words only. This strategy yields better results, as indicated by a better performance of LEXAS compared with the most frequent heuristic on this set of words.</Paragraph>
    <Paragraph position="7"> Most recently, Yarowsky used an unsupervised learning procedure to perform WSD (Yarowsky, 1995), although this is only tested on disambiguating words into binary, coarse sense distinction. The effectiveness of unsupervised learning on disambiguating words into the refined sense distinction of WoRBNET needs to be further investigated. The work of (McRoy, 1992) pointed out that a diverse set of knowledge sources are important to achieve WSD, but no quantitative evaluation was given on the relative importance of each knowledge source.</Paragraph>
    <Paragraph position="8"> No previous work has reported any such evaluation either. The work of (Cardie, 1993) used a case-based approach that simultaneously learns part of speech, word sense, and concept activation knowledge, although the method is only tested on domain-specific texts with domain-specific word senses.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML