<?xml version="1.0" standalone="yes"?> <Paper uid="P05-1016"> <Title>Inducing Ontological Co-occurrence Vectors</Title> <Section position="2" start_page="0" end_page="125" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Despite considerable effort, there is still no commonly accepted semantic corpus, semantic framework, or notation, nor even agreement on precisely which aspects of semantics are most useful (if any). We believe that one important reason for this rather startling fact is the absence of truly wide-coverage semantic resources.</Paragraph> <Paragraph position="1"> Recognizing this, some recent work on wide-coverage term banks, like WordNet (Miller 1990) and CYC (Lenat 1995), and annotated corpora, like FrameNet (Baker et al. 1998), Propbank (Kingsbury et al. 2002) and Nombank (Meyers et al. 2004), seeks to address the problem. But manual efforts such as these suffer from two drawbacks: they are difficult to tailor to new domains, and they have internal inconsistencies that can make automating the acquisition process difficult.</Paragraph> <Paragraph position="2"> In this work, we introduce a general framework for inducing co-occurrence feature vectors for nodes in a WordNet-like ontology. We believe that this framework will be useful for a variety of applications, including adding semantic information to existing semantic term banks by disambiguating lexical-semantic resources. Ontologizing semantic resources. Recently, researchers have applied text- and web-mining algorithms to automatically create lexical-semantic resources like similarity lists (Lin 1998), semantic lexicons (Riloff and Shepherd 1997), hyponymy lists (Shinzato and Torisawa 2004; Pantel and Ravichandran 2004), part-whole lists (Girju et al. 2003), and verb relation graphs (Chklovski and Pantel 2004). However, none of these resources has been directly linked into an ontological framework. 
For example, in VERBOCEAN (Chklovski and Pantel 2004), we find the verb relation &quot;to surpass is-stronger-than to hit&quot;, but it is not specified that this relation applies to the achieving sense of hit.</Paragraph> <Paragraph position="3"> We define ontologizing a lexical-semantic resource as the task of sense-disambiguating the resource. This problem is different from, but not orthogonal to, word-sense disambiguation. If we could disambiguate large collections of text with high accuracy, then current methods for building lexical-semantic resources could easily be applied to ontologize them by treating each word's senses as separate words. Our method does not require the disambiguation of text. Instead, it relies on the principle of distributional similarity and on the observation that polysemous words that are similar in one sense are dissimilar in their other senses.</Paragraph> <Paragraph position="4"> Given the enriched ontologies produced by our method, we believe that ontologizing lexical-semantic resources will be feasible. For example, consider the verb relation &quot;to surpass is-stronger-than to hit&quot; from above. To disambiguate the verb hit, we can look at all other verbs that to surpass is stronger than (for example, in VERBOCEAN, &quot;to surpass is-stronger-than to overtake&quot; and &quot;to surpass is-stronger-than to equal&quot;). We can then compare the lexical co-occurrence vectors of overtake and equal with the ontological feature vectors of the senses of hit (which are induced by our framework). The sense whose feature vector is most similar is selected.</Paragraph> <Paragraph position="5"> It remains to be seen in future work how well this approach performs on ontologizing various semantic resources. In this paper, we focus on the general framework for inducing the ontological co-occurrence vectors, and we apply it to the task of linking new terms into the ontology.</Paragraph> </Section> </Paper>