<?xml version="1.0" standalone="yes"?>
<Paper uid="W98-0710">
  <Title>Aligning WordNet with Additional Lexical Resources</Title>
  <Section position="2" start_page="0" end_page="73" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Lexical resources used in natural language processing (NLP) have evolved from handcrafted lexical entries to machine-readable lexical databases and large corpora that allow statistical manipulation.</Paragraph>
    <Paragraph position="1"> The availability of electronic versions of linguistic resources was a major step forward. Among these resources we find conventional dictionaries as well as thesauri.</Paragraph>
    <Paragraph position="2"> However, it often does not suffice to depend on any single resource, either because it does not contain all the required information or because the information is not organised in a way suitable for the purpose. Merging different resources is therefore necessary. Calzolari's (1988) Italian lexical database, Knight and Luk's (1994) PANGLOSS ontology, and Klavans and Tzoukermann's (1995) bilingual lexicon are some responses to this need.</Paragraph>
    <Paragraph position="3"> Many attempts have also been made to transform the implicit information in dictionary definitions into explicit knowledge bases for computational purposes (Amsler, 1981; Calzolari, 1984; Chodorow et al., 1985; Markowitz et al., 1986; Klavans et al., 1990; Vossen and Copestake, 1993). Nonetheless, dictionaries are also infamous for their nonstandardised sense granularity, and the taxonomies obtained from definitions are inevitably ad hoc. It would therefore be useful to integrate such information from dictionaries with some existing, and widely exploited, classification such as the system in Roget's Thesaurus (Roget, 1852), which has remained intact for years.</Paragraph>
    <Paragraph position="4"> We can see at least the following ways in which an integration of lexical resources would be useful in NLP: * Most NLP functions, notably word sense disambiguation (WSD), need to draw information from a variety of resources and cannot sufficiently rely on any single resource.</Paragraph>
    <Paragraph position="5"> * When different systems of sense tagging are used in different studies, we need a common ground for comparison. Knowing where one sense in one resource stands in another would enable better evaluation.</Paragraph>
    <Paragraph position="6"> * In attempting integration, we can discover how one resource differs from another and thus identify their individual limitations. This can guide improvement of the resources.</Paragraph>
    <Paragraph position="7"> An approach to the integration problem is offered by WordNet. WordNet is designed to enable conceptual search (Miller et al., 1993), and therefore it should provide a way of linking word-level senses such as those in dictionaries with semantic classes such as those in thesauri. However, one important question is whether WordNet, a psycholinguistically-based resource, will work the same way as conventional linguistic resources do.</Paragraph>
    <Paragraph position="8"> We can divide this question into two parts. First, we are interested in how similar the sense discrimination is in WordNet and in a conventional dictionary. Second, WordNet has a classificatory structure, but its principle of classification is somewhat different from that of a thesaurus. As a result, terms which are close in a thesaurus, thus allowing contextual sense disambiguation, may be found further apart in the WordNet taxonomy, which may therefore not be informative enough. For example, &quot;car&quot; and &quot;driver&quot; are located in two different branches in the WordNet hierarchy and the only way to relate them is through the top node &quot;entity&quot;. This fails to uncover the conceptual closeness of the two words as Roget's Thesaurus does, where they are placed in adjacent semantic classes (&quot;Land travel&quot; and &quot;Traveller&quot; respectively).</Paragraph>
    <Paragraph position="10"> Nevertheless, we believe that there must be some relation between the classes in WordNet and those in a thesaurus, which provides some means for us to make an association between them.</Paragraph>
    <Paragraph position="11"> We have therefore proposed an algorithm to link up the three kinds of resources, namely a conventional dictionary, WordNet and a thesaurus. This is made possible by using the WordNet taxonomic hierarchy as the backbone, because traversing the hierarchy gives many kinds of linking possibility. The resulting integrated information structure should then serve the following functions: * enhancing the lexical information in a dictionary with the taxonomic hierarchy in WordNet, and vice versa; * complementing the taxonomic hierarchy in WordNet with the semantic classification in a thesaurus, and vice versa. We have carried out an experiment, using the algorithm, to map senses in a dictionary to those in WordNet, and those in WordNet to the classes in a thesaurus. Our aim has been (i) to assess the plausibility of the algorithm, and (ii) to explore how the various resources differ from one another. The results suggest that mappings are in general successful (i.e. when links can be made, they are appropriate) while failures mostly arise from the inadequacy of individual resources. Based on these findings, we have also proposed some ways to overcome such inadequacies. The algorithm is described in the next section.</Paragraph>
    <Paragraph position="12"> The test materials and the design are given in Section 3. The results are presented in Section 4. They are analysed and discussed in Section 5, where we also suggest some ways to apply them.</Paragraph>
  </Section>
</Paper>