File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/p06-1100_intro.xml
Size: 4,088 bytes
Last Modified: 2025-10-06 14:03:36
<?xml version="1.0" standalone="yes"?> <Paper uid="P06-1100"> <Title>Ontologizing Semantic Relations</Title> <Section position="4" start_page="0" end_page="793" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> NLP researchers have developed many algorithms for mining knowledge from text and the Web, including facts (Etzioni et al. 2005), semantic lexicons (Riloff and Shepherd 1997), concept lists (Lin and Pantel 2002), and word similarity lists (Hindle 1990).</Paragraph> <Paragraph position="1"> Many recent efforts have also focused on extracting binary semantic relations between entities, such as entailments (Szpektor et al. 2004), is-a (Ravichandran and Hovy 2002), part-of (Girju et al. 2003), and other relations.</Paragraph> <Paragraph position="2"> The output of most of these systems is flat lists of lexical semantic knowledge such as &quot;Italy is-a country&quot; and &quot;orange similar-to blue&quot;. However, using this knowledge beyond simple keyword matching, for example in inferences, requires it to be linked into formal semantic repositories such as ontologies or term banks like WordNet (Fellbaum 1998).</Paragraph> <Paragraph position="3"> Pantel (2005) defined the task of ontologizing a lexical semantic resource as linking its terms to the concepts in a WordNet-like hierarchy. For example, &quot;orange similar-to blue&quot; ontologizes in WordNet to &quot;orange#2 similar-to blue#1&quot; and &quot;orange#2 similar-to blue#2&quot;. In his framework, Pantel proposed a method of inducing ontological co-occurrence vectors which are subsequently used to ontologize unknown terms into WordNet with 74% accuracy.</Paragraph> <Paragraph position="4"> In this paper, we take the next step and explore two algorithms for ontologizing binary semantic relations into WordNet, and we present empirical results on the task of attaching part-of and causation relations. 
Formally, given an instance (x, r, y) of a binary relation r between terms x and y, the ontologizing task is to identify the WordNet senses of x and y where r holds. For example, the instance (proton, PART-OF, element) ontologizes into WordNet as (proton#1, PART-OF, element#2).</Paragraph> <Paragraph position="5"> The first algorithm that we explore, called the anchoring approach, was suggested as a promising avenue of future work by Pantel (2005). This bottom-up algorithm is based on the intuition that x can be disambiguated by retrieving the set of terms that occur in the same relation r with y and then finding the senses of x that are most similar to this set. The assumption is that terms occurring in the same relation will tend to have similar meanings. In this paper, we propose a measure of similarity to capture this intuition.</Paragraph> <Paragraph position="6"> In contrast to anchoring, our second algorithm, called the clustering approach, takes a top-down view. Given a relation r, suppose that we are given every conceptual instance of r, i.e., instances of r in the upper ontology like (particles#1, PART-OF, substances#1). An instance (x, r, y) can then be ontologized easily by finding the senses of x and y that are subsumed by ancestors linked by a conceptual instance of r. For example, the instance (proton, PART-OF, element) ontologizes to (proton#1, PART-OF, element#2) since proton#1 is subsumed by particles and element#2 is subsumed by substances. The problem then is to automatically infer the set of conceptual instances. (Footnote: The ontological co-occurrence vector of a concept consists of all lexical co-occurrences with the concept in a corpus.)</Paragraph> <Paragraph position="7"> In this paper, we develop a clustering algorithm for generalizing a set of relation instances to conceptual instances by looking up the WordNet hypernymy hierarchy for common ancestors, as specific as possible, that subsume as many instances as possible. 
An instance is then attached to its senses that are subsumed by the highest scoring conceptual instances.</Paragraph> </Section> </Paper>
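Illustrative sketch (not part of the paper): the sense-attachment step of the clustering approach described in the introduction can be pictured with a small Python example. It assumes the conceptual instances are already given, whereas the paper infers them automatically by clustering. The hierarchy, the sense inventory, and the names HYPERNYM, SENSES, ancestors, and ontologize are hypothetical stand-ins for WordNet, and the entries for element#1, component#1, and entity#1 are invented for illustration only.

```python
# Toy hypernym hierarchy: each sense maps to its direct hypernym.
# Hypothetical data for illustration; this is NOT real WordNet content.
HYPERNYM = {
    "proton#1": "particles#1",
    "particles#1": "entity#1",
    "element#1": "component#1",   # invented distractor sense
    "element#2": "substances#1",
    "substances#1": "entity#1",
}

# Hypothetical sense inventory for the two terms in the running example.
SENSES = {
    "proton": ["proton#1"],
    "element": ["element#1", "element#2"],
}


def ancestors(sense):
    """Return the sense itself followed by all of its hypernyms."""
    chain = []
    while sense is not None:
        chain.append(sense)
        sense = HYPERNYM.get(sense)  # None once no hypernym is recorded
    return chain


def ontologize(instance, conceptual_instances):
    """Return sense pairs of (x, r, y) subsumed by some conceptual instance of r."""
    x, r, y = instance
    attached = []
    for sense_x in SENSES.get(x, []):
        for sense_y in SENSES.get(y, []):
            for concept_x, concept_r, concept_y in conceptual_instances:
                if (concept_r == r
                        and concept_x in ancestors(sense_x)
                        and concept_y in ancestors(sense_y)):
                    attached.append((sense_x, r, sense_y))
                    break  # one subsuming conceptual instance is enough
    return attached


conceptual = [("particles#1", "PART-OF", "substances#1")]
result = ontologize(("proton", "PART-OF", "element"), conceptual)
print(result)
```

Under these assumptions, only element#2 has substances#1 among its ancestors, so the instance (proton, PART-OF, element) attaches to (proton#1, PART-OF, element#2), matching the example in the text. The sketch omits the scoring of conceptual instances that the paper uses to choose among competing attachments.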