<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0715">
  <Title>Semi-automatic Induction of Systematic Polysemy from WordNet</Title>
  <Section position="2" start_page="0" end_page="108" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> WordNet (Miller, 1990) has been used as a general resource of broad-coverage lexical information in many Natural Language Processing (NLP) tasks, including sense tagging, text summarization, and machine translation. However, like other large-scale knowledge bases and machine-readable dictionaries (MRDs), WordNet contains massive ambiguity and redundancy. In particular, since WordNet senses are more fine-grained than those of most other MRDs such as LDOCE (Procter, 1978), each word entry is more ambiguous. For example, WordNet 1.6 (released December 1997) lists the following 9 senses for the verb write:
1. write, compose, pen, indite - produce a literary work
2. write - communicate or express by writing
3. publish, write - have (one's written work) issued for publication
4. write, drop a line - communicate (with) in writing
5. write - communicate by letter
6. compose, write - write music
7. write - mark or trace on a surface
8. write - record data on a computer
9. spell, write - write or name the letters
These fine sense distinctions may not be desired in some applications. Consequently, any system that incorporates WordNet without customization must cope with this redundancy, and may need to control the ambiguity in order to keep the computation tractable.</Paragraph>
    <Paragraph position="1"> Although the redundancy in WordNet could be a drawback, it also makes WordNet an ideal resource for a broad-coverage, domain-independent semantic lexicon based on underspecified semantic classes (Buitelaar, 1997, 1998). An underspecified semantic class is an abstract semantic type that encodes systematic polysemy (also called regular polysemy (Apresjan, 1973); see footnote 1): a set of word senses that are related in systematic and predictable ways (e.g., the INSTITUTION and BUILDING meanings of the word school).</Paragraph>
    <Paragraph position="2"> These related word senses are grouped together and assigned an abstract semantic class that generalizes the relation. This way, we need not distinguish or disambiguate word senses that span several semantic "axes"; instead, we can regard an underspecified class as a multi-dimensional semantic entity. The abstract class is underspecified because it does not commit to any one of its member senses. In building a lexicon based on such underspecified semantic classes, redundancy in WordNet is a desirable property, since it minimizes the amount of information lost through abstraction. Also, since WordNet sense entries are drawn from a general but wide range of domains, systematic polysemy can be extracted from the dictionary itself rather than from a sense-tagged corpus, so data sparseness problems become less significant. The resulting lexicon can thus effectively compact the redundancy and ambiguity in WordNet along two dimensions: abstraction and systematic polysemy.</Paragraph>
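    <Paragraph> To make the notion of an underspecified class concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the method of this paper. Senses are reduced to hypothetical coarse labels (INSTITUTION, BUILDING, and so on), and the names LEXICON, UnderspecifiedClass, and assign are illustrative. A word is taken to instantiate a class when it has a sense for every member of that class, so school falls under INSTITUTION/BUILDING while the homonymous bank does not.

```python
from dataclasses import dataclass

# Toy lexicon (hypothetical): each word maps to coarse labels for its senses.
# The labels are illustrative and are not actual WordNet synsets.
LEXICON = {
    "school": frozenset({"INSTITUTION", "BUILDING"}),
    "bank": frozenset({"FINANCIAL_INSTITUTION", "SLOPING_LAND"}),
}


@dataclass(frozen=True)
class UnderspecifiedClass:
    """Abstract class standing for a set of systematically related senses."""

    name: str
    members: frozenset

    def covers(self, word_senses: frozenset) -> bool:
        # A word instantiates the class only if it has a sense for every member.
        return self.members.issubset(word_senses)


INSTITUTION_BUILDING = UnderspecifiedClass(
    "INSTITUTION/BUILDING", frozenset({"INSTITUTION", "BUILDING"})
)


def assign(lexicon: dict, classes: list) -> dict:
    """Map each word to the names of the underspecified classes it instantiates."""
    return {
        word: [c.name for c in classes if c.covers(senses)]
        for word, senses in lexicon.items()
    }


print(assign(LEXICON, [INSTITUTION_BUILDING]))
# prints {'school': ['INSTITUTION/BUILDING'], 'bank': []}
```

Because a lexicon entry stores only the class name, a downstream component can keep both readings of school open instead of choosing one.</Paragraph>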
    <Paragraph position="3"> The use of underspecified semantic classes is one of the underspecification techniques investigated in recent years (van Deemter and Peters, 1996). Footnote 1: Note that systematic polysemy should be contrasted with homonymy, which refers to words that have more than one unrelated sense (e.g., the FINANCIAL_INSTITUTION and SLOPING_LAND meanings of the word bank).</Paragraph>
    <Paragraph position="5"> This underspecified class has several advantages. First, it can compactly represent the ambiguity that arises from multiple related senses.</Paragraph>
    <Paragraph position="6"> Thus it is more expressive and computationally efficient than single-sense representations. Second, it can facilitate abductive inference through the systematicity between senses: given a word with n related senses, identifying one sense in a context can imply up to all n senses, some of which may be only implicit in the context. In addition, when two systematically polysemous words are used together, their combination enables even more powerful inferences through a complex matching between the two sets of systematic relations. Finally, a domain-independent, broad-coverage lexicon defined by such abstract underspecified classes can be used as a background lexicon in domain-specific reasoning tasks such as Information Extraction (Kilgarriff, 1997), as a general semantic lexicon for parsing, and in many other NLP tasks that require contextual inference.</Paragraph>
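    <Paragraph> The single-word case of this inference can be sketched as follows. This is a hedged illustration, assuming a hypothetical table SYSTEMATIC_GROUPS that lists, for each word, its group of systematically related senses; the function implied_senses is likewise illustrative, and the sketch does not attempt the matching between two polysemous words.

```python
# Hypothetical table: each word maps to its group of systematically
# related senses (labels are illustrative, not WordNet synsets).
SYSTEMATIC_GROUPS = {
    "school": frozenset({"INSTITUTION", "BUILDING"}),
}


def implied_senses(word: str, observed_sense: str) -> set:
    """Return the senses licensed once `observed_sense` is identified.

    If the observed sense belongs to the word's systematic group, every
    member sense is implied (possibly only implicitly in the context);
    otherwise only the observed sense itself is returned.
    """
    group = SYSTEMATIC_GROUPS.get(word, frozenset())
    if observed_sense in group:
        return set(group)
    return {observed_sense}


# "The school hired a new principal" identifies INSTITUTION explicitly;
# BUILDING then remains available as an implicit reading.
print(sorted(implied_senses("school", "INSTITUTION")))
# prints ['BUILDING', 'INSTITUTION']
```

A word outside any systematic group, such as bank in its SLOPING_LAND sense, yields only the observed sense.</Paragraph>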
    <Paragraph position="7"> However, automatic acquisition of systematic polysemy has proven difficult; in fact, most previous work in lexical semantics has acquired it manually (Buitelaar, 1997, 1998). In this paper, we present a semi-automatic method of inducing underspecified semantic classes from WordNet verbs and nouns. The method first applies a statistical analysis to obtain a rough approximation of the sense dependencies found in WordNet. Incorrect dependencies are then filtered out manually. Although the approach is not fully automated, it provides a principled way of acquiring systematic polysemy from a large-scale lexical resource, and it greatly reduces the manual effort previously required. Furthermore, manual intervention allows the results to reflect prior knowledge about WordNet that the statistical analysis does not assume. To assess the usefulness of the induced semantic classes for contextual inference in real-world texts, we extract predicate-argument structures from the Brown corpus and observe the occurrences of such classes.</Paragraph>
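    <Paragraph> The two-stage procedure (statistical analysis followed by manual filtering) might look roughly like the sketch below. The statistic is an assumption: this paragraph does not specify it, so the sketch substitutes plain counts of how often a pair of sense classes co-occurs within one word, with a minimum-frequency cutoff. The class labels, the toy data, and the functions candidate_dependencies and manual_filter are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy data (hypothetical): word -> coarse classes of its senses.
# Labels are illustrative; they are not taken from WordNet itself.
WORD_CLASSES = {
    "school": {"GROUP", "ARTIFACT"},
    "college": {"GROUP", "ARTIFACT"},
    "church": {"GROUP", "ARTIFACT"},
    "chicken": {"ANIMAL", "FOOD"},
    "lamb": {"ANIMAL", "FOOD"},
    "bat": {"ANIMAL", "ARTIFACT"},
    "seal": {"ANIMAL", "ARTIFACT"},
}


def candidate_dependencies(word_classes, min_count=2):
    """Stage 1 (assumed statistic): count class pairs co-occurring in one word.

    Returns (pair, count) tuples seen in at least `min_count` words,
    most frequent first.
    """
    counts = Counter()
    for classes in word_classes.values():
        # Sort so that each unordered pair has a single canonical key.
        for pair in combinations(sorted(classes), 2):
            counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


def manual_filter(candidates, rejected):
    """Stage 2: drop pairs that a human judged to be spurious dependencies."""
    return [(pair, n) for pair, n in candidates if pair not in rejected]


candidates = candidate_dependencies(WORD_CLASSES)
kept = manual_filter(candidates, rejected={("ANIMAL", "ARTIFACT")})
print(kept)
# prints [(('ARTIFACT', 'GROUP'), 3), (('ANIMAL', 'FOOD'), 2)]
```

Raising min_count shortens the list a human must review at the cost of missing rarer patterns. Here the rejected pair (ANIMAL, ARTIFACT) stands in for coincidental homonymy such as bat or seal, while ANIMAL/FOOD (chicken, lamb) survives as a plausible systematic relation.</Paragraph>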
  </Section>
</Paper>