<?xml version="1.0" standalone="yes"?>
<Paper uid="W99-0903">
  <Title>Dual Distributional Verb Sense Disambiguation with Small Corpora and Machine Readable Dictionaries*</Title>
  <Section position="6" start_page="118" end_page="118" type="relat">
    <SectionTitle>
5 Related Work
</SectionTitle>
    <Paragraph position="0"> Using MRDs for word sense disambiguation was popularized by (Lesk, 1986). Several researchers subsequently continued and improved this line of work(Guthrie, 1991; Krovetz, 1989; Veronis, 1990; Wilks, 1997). Unlike the information in a corpus, the information in the dictionary definitions is presorted into senses. However, the dictionary definitions alone do not contain enough information to allow reliable disambiguation. Recently, many works combined a MRD and a corpus for word sense disambiguation(Karov, 1998; Luk, 1995; Ng, 1996; Yarowsky,1995). In (Yarowsky,1995), the definition words were used as initial sense indicators, automatically tagging the target word examples containing them. These tagged examples were then used as seed examples in a bootstrapping process. In (Luk, 1995), using the dictionary definition, co-occurrence data of concepts, rather than words, is collected from a relatively small corpus to tackle the data sparseness problem. In (Karov, 1998), all the corpus examples of the dictionary definition words, instead of those word alone were used as sense indicators. In comparison, we suggest to combine the MRD definition words and usage examples as the sense indicators.</Paragraph>
    <Paragraph position="1"> Because the MRD's usage examples can be used as the sense-tagged instances, the sense indicators extracted from them are very useful for word sense disambiguation. And this yield much more sensepresorted training information.</Paragraph>
    <Paragraph position="2"> The problem of data sparseness, which is common for much corpus-based work, is especially severe for work in WSD. Traditional attempts to tackle the problem of data sparseness include the class-based approaches and similarity-based approaches. The class-based approaches(Brown, 1992; Luk, 1995; Pereira, 1993; Resnik, 1992) attempt to obtain the best estimates by combining observations of classes of words considered to belong to a common category. These methods answer in part the problem of data sparseness and eliminate the need for pretagged data. However, there is some information loss with these methods because the hypothesis that all words in the same class behave in a similar fashion is too strong. In the similarity-based approaches(Dagan, 1997; Karov, 1998), rather than a class, each word is modeled by its own set of similar words derived from statistical data extracted from corpora. However, deriving these sets of similar words requires a substantial amount of statistical data and thus these approaches require relatively large corpora.</Paragraph>
    <Paragraph position="3"> (Karov, 1998) proposed an extension to similarity-based methods by means of an iterative process at the learning stage with small corpus. Our system is similar to (Karov, 1998) with respect to similarity measure, which allows it to extract high-order contextual relationship. However, we attempt to concern a polysemous word's all senses in the training corpus, rather than restricting the word's sense set within binary senses and this allows our system to be more practical.</Paragraph>
  </Section>
class="xml-element"></Paper>