<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-1100">
  <Title>Ontologizing Semantic Relations</Title>
  <Section position="5" start_page="793" end_page="793" type="metho">
    <SectionTitle>
2 Relevant Work
</SectionTitle>
    <Paragraph position="0"> Several researchers have worked on ontologizing semantic resources. Most recently, Pantel (2005) developed a method to propagate lexical co-occurrence vectors to WordNet synsets, forming ontological co-occurrence vectors. Adopting an extension of the distributional hypothesis (Harris 1985), the method uses these co-occurrence vectors to compute synset-to-synset and term-to-synset similarities. An unknown term is then attached to the WordNet synset whose co-occurrence vector is most similar to the term's co-occurrence vector. Though the author suggests a method for attaching more complex lexical structures like binary semantic relations, the paper focuses only on attaching terms.</Paragraph>
    <Paragraph position="1"> Basili (2000) proposed an unsupervised method to infer semantic classes (WordNet synsets) for terms in domain-specific verb relations. These relations, such as (x, EXPAND, y), are first automatically learnt from a corpus. The semantic classes of x and y are then inferred using conceptual density (Agirre and Rigau 1996), a WordNet-based measure applied to all instantiations of x and y in the corpus. Semantic classes represent possible common generalizations of the verb arguments. At the end of the process, a set of syntactic-semantic patterns is available for each verb, such as:</Paragraph>
    <Paragraph position="3"> The method is successful on specific relations with few instances (such as domain verb relations) while its value on generic and frequent relations, such as part-of, was untested.</Paragraph>
    <Paragraph position="4"> Girju et al. (2003) presented a highly supervised machine learning algorithm to infer semantic constraints on part-of relations, such as (object#1, PART-OF, social_event#1). These constraints are then used as selectional restrictions in harvesting part-of instances from ambiguous lexical patterns, like &quot;X of Y&quot;. The approach achieves high precision and recall but, as the authors acknowledge, it requires substantial human effort during the training phase.</Paragraph>
    <Paragraph position="5"> Others have also made significant additions to WordNet. For example, in eXtended WordNet (Harabagiu et al. 1999), the glosses in WordNet are enriched by disambiguating the nouns, verbs, adverbs, and adjectives with synsets. Another work has enriched WordNet synsets with topically related words extracted from the Web (Agirre et al. 2001). Finally, the general task of word sense disambiguation (Gale et al. 1991) is relevant, since its goal is to ontologize each term in a passage into a WordNet-like sense inventory. If we had a large collection of sense-tagged text, then our mining algorithms could directly discover WordNet attachment points at harvest time. However, since there are few high-precision sense-tagged corpora, methods are required to ontologize semantic resources without fully disambiguating text.</Paragraph>
  </Section>
  <Section position="6" start_page="793" end_page="795" type="metho">
    <SectionTitle>
3 Ontologizing Semantic Relations
</SectionTitle>
    <Paragraph position="0"> Given an instance (x, r, y) of a binary relation r between terms x and y, the ontologizing task is to identify the senses of x and y where r holds. In this paper, we focus on WordNet 2.0 senses, though any similar term bank would apply.</Paragraph>
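    <Paragraph position="2"> Let S_x and S_y be the sets of all WordNet senses of x and y. A sense pair, s_xy, is defined as any pair of senses of x and y: s_xy = (s_x, s_y), with s_x in S_x and s_y in S_y. To attach an instance (x, r, y) into WordNet, one must: * Disambiguate x and y, that is, find the subsets S'_x of S_x and S'_y of S_y for which the relation r holds; and * Instantiate the relation in WordNet, using the synsets corresponding to all correct permutations between the senses in S'_x and S'_y.</Paragraph>
    <Paragraph position="4"> We denote this set of attachment points as S'_xy.</Paragraph>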
    <Paragraph position="6"> is empty, no attachments are produced.</Paragraph>
    <Paragraph position="7"> For example, the instance (study, PART-OF, report) is ontologized into WordNet through the</Paragraph>
    <Paragraph position="9"> ={report#1}. The final attachment points S'</Paragraph>
    <Paragraph position="11"> Unlike common algorithms for word sense disambiguation, here it is important to take into consideration the semantic dependency between the two terms x and y. For example, an entity that is part-of a study has to be some kind of information. This knowledge about mutual selectional preference (the preferred semantic class that fills a certain relation role, as x or y) can be exploited to ontologize the instance.</Paragraph>
    <Paragraph position="12"> In the following sections, we propose two algorithms for ontologizing binary semantic relations.</Paragraph>
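    <Paragraph> The space of candidate attachments for an instance can be illustrated with a minimal Python sketch. It assumes a hand-written toy sense inventory rather than real WordNet data; the names SENSES and sense_pairs are hypothetical and introduced only for this illustration.

```python
from itertools import product

# Hypothetical toy sense inventory (not real WordNet data): term -> sense labels.
SENSES = {
    "study": ["study#1", "study#2", "survey#1"],
    "report": ["report#1", "report#2"],
}


def sense_pairs(x, y, senses=SENSES):
    """Return every (sense of x, sense of y) pair; empty if a term is unknown."""
    return list(product(senses.get(x, []), senses.get(y, [])))


pairs = sense_pairs("study", "report")
print(len(pairs))  # number of senses of x times number of senses of y
```

 Each returned pair is a candidate attachment point; the two algorithms proposed in this paper differ in how they select among such candidates.</Paragraph>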
    <Section position="1" start_page="794" end_page="794" type="sub_section">
      <SectionTitle>
3.1 Method 1: Anchor Approach
</SectionTitle>
      <Paragraph position="0"> Given an instance (x, r, y), this approach fixes the term y, called the anchor, and then disambiguates x by looking at all other terms that occur in the relation r with y. Based on the principle of distributional similarity (Harris 1985), the algorithm assumes that the words that occur in the same relation r with y will be more similar to the correct sense(s) of x than the incorrect ones. After disambiguating x, the process is then inverted with x as the anchor to disambiguate y.</Paragraph>
      <Paragraph position="1"> In the first step, y is fixed and the algorithm retrieves the set of all other terms X' that occur in an instance (x', r, y), x' ∈ X'. For example, given the instance (reflections, PART-OF, book), and a resource containing the following relations: (false allegations, PART-OF, book), (stories, PART-OF, book), (expert analysis, PART-OF, book), and (conclusions, PART-OF, book), the resulting set X' would be: {allegations, stories, analysis, conclusions}.</Paragraph>
      <Paragraph position="2"> All possible permutations, S xx' , between the senses of x and the senses of each term in X',  where the distance d(s</Paragraph>
      <Paragraph position="4"> ) is the length of the shortest path connecting the two synsets in the hypernymy hierarchy of WordNet, and f(s x' ) is the number of times sense s x' occurs in any of the instances of X'. Note that if no connection between two synsets exists, then r(s</Paragraph>
      <Paragraph position="6"> The overall sense score for each sense s  Finally, the algorithm inverts the process by setting x as the anchor and computes r(s_y) for each sense of y. All possible permutations of senses are computed and scored by averaging r(s_x) and r(s_y). Permutations scoring higher than a threshold t are selected as the attachment points in WordNet. We experimentally set t = 0.02.</Paragraph>
      <Paragraph position="8"> For semantic relations between complex terms, like (expert analysis, PART-OF, book), only the head noun of each term is recorded, like &quot;analysis&quot;. As future work, we plan to use the whole term if it is present in WordNet.</Paragraph>
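      <Paragraph> The anchor procedure can be sketched in Python under stated assumptions. The toy hypernymy links and sense inventory below are invented for illustration, and the pair score f / (d + 1), with a score of 0 for unconnected synsets, is an assumed stand-in that merely decays with path length and grows with sense frequency; it should not be read as the exact scoring function of the method.

```python
from collections import Counter, deque

# Hypothetical toy hypernymy links (child -> parents); not real WordNet data.
HYPERNYMS = {
    "reflection#1": ["thinking#1"],
    "reflection#2": ["physical_phenomenon#1"],
    "analysis#1": ["thinking#1"],
    "conclusion#1": ["thinking#1"],
    "story#1": ["message#1"],
}
SENSES = {
    "reflections": ["reflection#1", "reflection#2"],
    "analysis": ["analysis#1"],
    "conclusions": ["conclusion#1"],
    "stories": ["story#1"],
}


def distance(a, b):
    """Shortest path length between two synsets over hypernymy links, or None."""
    edges = {}
    for child, parents in HYPERNYMS.items():
        for parent in parents:
            edges.setdefault(child, set()).add(parent)
            edges.setdefault(parent, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def score_senses(x, co_terms):
    """Score each sense of x against the senses of the anchor's other terms."""
    freq = Counter(s for t in co_terms for s in SENSES[t])  # f(s_x')
    scores = {}
    for s_x in SENSES[x]:
        total = 0.0
        for s_other, f in freq.items():
            d = distance(s_x, s_other)
            # Assumed pair score: f / (d + 1), and 0 when no path exists.
            if d is not None:
                total += f / (d + 1)
        scores[s_x] = total
    return scores


scores = score_senses("reflections", ["analysis", "conclusions", "stories"])
print(max(scores, key=scores.get))
```

 In this toy setting the sense of reflections that shares a hypernym with analysis and conclusions receives the highest score. Inverting the roles of x and y and averaging the two sense scores would give the permutation score that is compared against the threshold.</Paragraph>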
    </Section>
    <Section position="2" start_page="794" end_page="795" type="sub_section">
      <SectionTitle>
3.2 Method 2: Clustering Approach
</SectionTitle>
      <Paragraph position="0"> The main idea of the clustering approach is to leverage the lexical behaviors of the two terms in an instance as a whole. The assumption is that the general meaning of the relation is derived from the combination of the two terms.</Paragraph>
      <Paragraph position="1"> The algorithm is divided into two main phases.</Paragraph>
      <Paragraph position="2"> In the first phase, semantic clusters are built using the WordNet senses of all instances. A semantic cluster is defined by the set of instances that have a common semantic generalization. We call the pair of WordNet synsets that represents this generalization the conceptual instance of the semantic cluster. For example, the following two part-of instances: (second section, PART-OF, Los Angeles-area news) and (Sandag study, PART-OF, report) are in a common cluster represented by the conceptual instance [writing#2, PART-OF, message#2], since writing#2 is a hypernym of both section and study, and message#2 is a hypernym of both news and report.</Paragraph>
      <Paragraph position="3"> In the second phase, the algorithm attaches an instance into WordNet by using WordNet distance metrics and frequency scores to select the best cluster for each instance. A good cluster is one that: * achieves a good trade-off between generality and specificity; and * disambiguates among the senses of x and y using the other instances' senses as support. For example, given the instance (second section, PART-OF, Los Angeles-area news) and the following conceptual instances:</Paragraph>
      <Paragraph position="5"> the first conceptual instance should be scored highest since it is neither too generic nor too specific and is supported by the instance (Sandag study, PART-OF, report), i.e., the conceptual instance subsumes both instances. (Again, here, we use the syntactic head of each term for generalization since we assume that it drives the meaning of the term itself.) The second and the third conceptual instances should be scored lower since they are too generic, while the last two should be scored lower since the senses of section and news are not supported by other instances. The system then outputs, for each instance, the set of sense pairs that are subsumed by the highest scoring conceptual instance. In the previous example:</Paragraph>
      <Paragraph position="8"> are selected, as they are subsumed by [writing#2, PART-OF, message#2]. These sense pairs are then retained as attachment points into WordNet.</Paragraph>
      <Paragraph position="9"> Below, we describe each phase in more detail.</Paragraph>
      <Paragraph position="10">  Given an instance (x, r, y), all sense pair permu-</Paragraph>
      <Paragraph position="12"> is the number of hypernymy links needed to go from s</Paragraph>
      <Paragraph position="14"> ranges from [0, 1] and is highest when little generalization is needed.</Paragraph>
      <Paragraph position="15"> For example, the instance (Sandag study, PART-OF, report) produces 70 sense pairs since study has 10 senses and report has 7 senses. Assuming t = 1, the instance sense (survey#1, PART-OF, report#1) has the following set of candidate conceptual instances:</Paragraph>
      <Paragraph position="17"> Finally, each candidate conceptual instance c forms a cluster of all instances (x, r, y) that have some sense pair s</Paragraph>
      <Paragraph position="19"> as hyponyms of c. Note also that candidate conceptual instances may be subsumed by other candidate conceptual instances. Let G_c refer to the set of all candidate conceptual instances subsumed by candidate conceptual instance c.</Paragraph>
      <Paragraph position="20"> Intuitively, better candidate conceptual instances are those that subsume both many instances and other candidate conceptual instances, but at the same time that have the least distance from subsumed instances. We capture this intuition with the following score of c:</Paragraph>
      <Paragraph position="22"> is the set of instances subsumed by c.</Paragraph>
      <Paragraph position="23"> We experimented with different variations of this score and found that it is important to put more weight on the distance between subsumed conceptual instances than the actual number of subsumed instances. Without the log terms, the highest scoring conceptual instances are too generic (i.e., they are too high up in the ontology).  In this phase, we utilize the conceptual instances of the previous phase to attach each instance (x, r, y) into WordNet.</Paragraph>
      <Paragraph position="24"> At the end of Phase 1, an instance can be clustered in different conceptual instances. In order to select an attachment, the algorithm selects the sense pair of x and y that is subsumed by the highest scoring candidate conceptual instance. It and all other sense pairs that are subsumed by this conceptual instance are then retained as the final attachment points.</Paragraph>
      <Paragraph position="25"> As a side effect, a final set of conceptual instances is obtained by deleting from each candidate those instances that are subsumed by a higher scoring conceptual instance. Remaining conceptual instances are then re-scored using score(c). The final set of conceptual instances thus contains unambiguous sense pairs.</Paragraph>
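      <Paragraph> The subsumption test and the Phase 2 selection can be sketched as follows. This is a simplified illustration: the hypernymy links form a toy single-parent hierarchy rather than WordNet, the function names are hypothetical, and the scores passed to attach are made-up placeholders standing in for score(c).

```python
from itertools import product

# Hypothetical toy hypernymy links (child -> parent); not real WordNet data.
PARENT = {
    "section#1": "writing#2",
    "survey#1": "writing#2",
    "news#1": "message#2",
    "report#1": "message#2",
    "writing#2": "entity#1",
    "message#2": "entity#1",
    "section#4": "entity#1",
}
SENSES = {
    "section": ["section#1", "section#4"],
    "study": ["survey#1"],
    "news": ["news#1"],
    "report": ["report#1"],
}


def ancestors(sense):
    """Return the sense itself plus all of its hypernyms up to the root."""
    chain = [sense]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def subsumed_pairs(instance, concept):
    """Sense pairs of (x, y) whose senses both fall under the concept pair."""
    (x, y), (c_x, c_y) = instance, concept
    return [
        (s_x, s_y)
        for s_x, s_y in product(SENSES[x], SENSES[y])
        if c_x in ancestors(s_x) and c_y in ancestors(s_y)
    ]


def attach(instance, scored_concepts):
    """Phase 2 sketch: keep the pairs under the best-scoring subsuming concept."""
    ranked = sorted(scored_concepts.items(), key=lambda kv: kv[1], reverse=True)
    for concept, _score in ranked:
        pairs = subsumed_pairs(instance, concept)
        if pairs:
            return concept, pairs
    return None, []


# Scores are made-up placeholders standing in for score(c).
concepts = {("writing#2", "message#2"): 0.9, ("entity#1", "entity#1"): 0.1}
best, pairs = attach(("section", "news"), concepts)
print(best, pairs)
```

 Because only the single best-scoring conceptual instance is consulted, sense pairs that fall under a different, lower-scoring conceptual instance are never returned for that instance.</Paragraph>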
    </Section>
  </Section>
  <Section position="7" start_page="795" end_page="797" type="metho">
    <SectionTitle>
4 Experimental Results
</SectionTitle>
    <Paragraph position="0"> In this section we provide an empirical evaluation of our two algorithms.</Paragraph>
    <Section position="1" start_page="795" end_page="796" type="sub_section">
      <SectionTitle>
4.1 Experimental Setup
</SectionTitle>
      <Paragraph position="0"> Researchers have developed many algorithms for harvesting semantic relations from corpora and the Web. For the purposes of this paper, we may choose any one of them and manually validate its mined relations. We choose Espresso, a general-purpose, broad, and accurate corpus harvesting algorithm requiring minimal supervision. Adopting a bootstrapping approach, Espresso takes as input a few seed instances of a particular relation and iteratively learns surface patterns to extract more instances. (Reference suppressed: the paper introducing Espresso has also been submitted to COLING/ACL 2006.)</Paragraph>
    </Section>
    <Section position="2" start_page="796" end_page="796" type="sub_section">
      <SectionTitle>
Test Sets
</SectionTitle>
      <Paragraph position="0"> We experiment with two relations: part-of and causation. The causation relation occurs when an entity produces an effect or is responsible for events or results, for example (virus, CAUSE, influenza) and (burning fuel, CAUSE, pollution). We manually built five seed relation instances for each relation and applied Espresso to a dataset consisting of a sample of articles from the Aquaint (TREC-9) newswire text collection. The sample consists of 55.7 million words extracted from the Los Angeles Times data files. Espresso extracted 1,468 part-of instances and 1,129 causation instances. We manually validated the output and randomly selected 200 correct relation instances of each relation for ontologizing into WordNet 2.0.</Paragraph>
    </Section>
    <Section position="3" start_page="796" end_page="796" type="sub_section">
      <SectionTitle>
Gold Standard
</SectionTitle>
      <Paragraph position="0"> We manually built a gold standard of all correct attachments of the test sets in WordNet. For each relation instance (x, r, y), two human annotators selected from all sense permutations of x and y the correct attachment points in WordNet. For example, for (synthetic material, PART-OF, filter), the judges selected the following attachment points: (synthetic material#1, PART-OF, filter#1) and (synthetic material#1, PART-OF, filter#2). The kappa statistic (Siegel and Castellan Jr. 1988) on the two relations together was K = 0.73.</Paragraph>
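      <Paragraph> For readers unfamiliar with chance-corrected agreement, the following minimal sketch computes a kappa value for two annotators. It uses Cohen's formulation, which may differ in detail from the variant described by Siegel and Castellan, and it assumes that each candidate sense pair is treated as one binary decision; the judgments shown are invented.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Invented judgments: 1 = sense pair accepted as an attachment, 0 = rejected.
judge_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
judge_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohen_kappa(judge_a, judge_b), 2))
```

 A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance.</Paragraph>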
    </Section>
    <Section position="4" start_page="796" end_page="796" type="sub_section">
      <SectionTitle>
Systems
</SectionTitle>
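      <Paragraph position="0"> The following three systems are evaluated: * BL: the baseline system that attaches each relation instance to the first (most common) WordNet sense of both terms; * AN: the anchor approach described in Section 3.1; and * CL: the clustering approach described in Section 3.2.</Paragraph>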
    </Section>
    <Section position="5" start_page="796" end_page="796" type="sub_section">
      <SectionTitle>
4.2 Precision, Recall and F-score
</SectionTitle>
      <Paragraph position="0"> For both the part-of and causation relations, we apply the three systems described above and compare their attachment performance using precision, recall, and F-score. Using the manually built gold standard, the precision of a system on a given relation instance is measured as the percentage of the system's attachments that are correct, and recall is measured as the percentage of the correct (gold standard) attachments that the system retrieves. Overall system precision and recall are then computed by averaging the precision and recall of each relation instance.</Paragraph>
      <Paragraph position="1"> Table 1 and Table 2 report the results on the part-of and causation relations. We experimentally set the CL generalization parameter t to 5 and the t parameter for AN to 0.02.</Paragraph>
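      <Paragraph> The per-instance averaging can be sketched as follows. Two choices in this sketch are assumptions not stated in the text: an instance for which the system proposes no attachment is given precision 0, and the F-score is taken as the harmonic mean of the averaged precision and recall. The system output and the second gold entry are toy data.

```python
def evaluate(system, gold):
    """Average per-instance precision and recall over all relation instances.

    Both arguments map an instance to a set of attachment points.
    """
    precisions, recalls = [], []
    for instance, gold_points in gold.items():
        predicted = system.get(instance, set())
        correct = len(predicted.intersection(gold_points))
        # Assumption: an instance with no predicted attachments gets precision 0.
        precisions.append(correct / len(predicted) if predicted else 0.0)
        recalls.append(correct / len(gold_points) if gold_points else 0.0)
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    # Assumption: F-score is the harmonic mean of the averaged P and R.
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy data for illustration only.
gold = {
    ("synthetic material", "PART-OF", "filter"): {
        ("synthetic material#1", "filter#1"),
        ("synthetic material#1", "filter#2"),
    },
    ("toy_x", "PART-OF", "toy_y"): {("toy_x#1", "toy_y#1")},
}
system = {
    ("synthetic material", "PART-OF", "filter"): {
        ("synthetic material#1", "filter#1"),
    },
    ("toy_x", "PART-OF", "toy_y"): {("toy_x#1", "toy_y#1"), ("toy_x#2", "toy_y#2")},
}
p, r, f = evaluate(system, gold)
print(p, r, f)
```

 Averaging per instance gives every relation instance the same weight, regardless of how many senses its terms have.</Paragraph>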
    </Section>
    <Section position="6" start_page="796" end_page="797" type="sub_section">
      <SectionTitle>
4.3 Discussion
</SectionTitle>
      <Paragraph position="0"> For both relations, CL and AN outperform the baseline in overall F-score. For part-of, Table 1 shows that CL outperforms BL by 13.6% in F-score and AN by 9.4%. For causation, Table 2 shows that AN outperforms BL by 4.4% on F-score and CL by 0.6%.</Paragraph>
      <Paragraph position="1"> The good results of the CL method on the part-of relation suggest that instances of this relation are particularly amenable to clustering. The generality of the part-of relation in fact allows the creation of fairly natural clusters, corresponding to different sub-types of part-of, such as those proposed by Winston (1983). The causation relation, however, being more difficult to define at a semantic level (Girju 2003), is harder to cluster and thus to disambiguate.</Paragraph>
      <Paragraph position="2"> Both CL and AN have better recall than BL, but precision results vary with CL beating BL only on the part-of relation. Overall, the system performances suggest that ontologizing semantic relations into WordNet is in general not easy.</Paragraph>
      <Paragraph position="3"> The better results of CL and AN with respect to BL suggest that the use of comparative semantic analysis among corpus instances is a good way to carry out disambiguation. Yet, the BL method shows surprisingly good results. This indicates that even a simple method based on word sense usage in language can be valuable.</Paragraph>
      <Paragraph position="4"> An interesting avenue of future work is to better combine these two different views in a single system.</Paragraph>
      <Paragraph position="5"> The low recall results for CL are mostly attributable to the fact that in Phase 2 only the best scoring cluster is retained for each instance. This means that instances with multiple senses that do not have a common generalization are not captured. For example, the part-of instance (wings, PART-OF, chicken) should cluster in both [body_part#1, PART-OF, animal#1] and [body_part#1, PART-OF, food#2], but only the best scoring one is retained.</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="797" end_page="798" type="metho">
    <SectionTitle>
5 Conceptual Instances: Other Uses
</SectionTitle>
    <Paragraph position="0"> Our clustering approach from Section 3.2 is enabled by learning conceptual instances - relations between mid-level ontological concepts. Beyond the ontologizing task, conceptual instances may be useful for several other tasks. In this section, we discuss some of these opportunities and present small qualitative evaluations.</Paragraph>
    <Paragraph position="1"> Conceptual instances represent common semantic generalizations of a particular relation.</Paragraph>
    <Paragraph position="2"> For example, below are two possible conceptual instances for the part-of relation: [person#1, PART-OF, organization#1] [act#1, PART-OF, plan#1] The first conceptual instance in the example subsumes all the part-of instances in which one or more persons are part of an organization, such as: (president Brown, PART-OF, executive council)  Below, we present three possible ways of exploiting these conceptual instances.</Paragraph>
    <Paragraph position="3"> Support to Relation Extraction Tools. Conceptual instances may be used to support relation extraction algorithms such as Espresso. Most minimally supervised harvesting algorithms do not exploit generic patterns, i.e., those patterns with high recall but low precision, since they cannot separate correct and incorrect relation instances. For example, the pattern &quot;X of Y&quot; extracts many correct relation instances like &quot;wheel of the car&quot; but also many incorrect ones like &quot;house of representatives&quot;.</Paragraph>
    <Paragraph position="4"> Girju et al. (2003) described a highly supervised algorithm for learning semantic constraints on generic patterns, leading to a very significant increase in system recall without deteriorating precision. Conceptual instances can be used to automatically learn such semantic constraints by acting as a filter for generic patterns, retaining only those instances that are subsumed by high scoring conceptual instances. Effectively, conceptual instances are used as selectional restrictions for the relation. For example, our system discards the following incorrect instances: (week, CAUSE, coalition) and (demeanor, CAUSE, vacuum), as they are both part of the very low scoring conceptual instance [abstraction#6, CAUSE, state#1]. Ontology Learning from Text. Each conceptual instance can be viewed as a formal specification of the relation at hand. For example, Winston (1983) manually identified six sub-types of the part-of relation: member-collection, component-integral object, portion-mass, stuff-object, feature-activity and place-area. Such classifications are useful in applications and tasks where a semantically rich organization of knowledge is required. Conceptual instances can be viewed as an automatic derivation of such a classification based on corpus usage. Moreover, conceptual instances can be used to improve the ontology learning process itself.</Paragraph>
    <Paragraph position="5"> For example, our clustering approach can be seen as an inductive step producing conceptual instances that are then used in a deductive step to learn new instances. An algorithm could iterate this induction/deduction cycle until no new relation instances or conceptual instances can be inferred.</Paragraph>
    <Paragraph position="6"> Word Sense Disambiguation. Word sense disambiguation systems can exploit the selectional restrictions identified by conceptual instances to disambiguate ambiguous terms occurring in particular contexts. For example, given the sentence: &quot;the board is composed by members of different countries&quot; and a harvesting algorithm that extracts the part-of relation (members, PART-OF, board), the system could infer the correct senses for board and members by looking at their closest conceptual instance. In our system, we would infer the attachment (member#1, PART-OF, board#1) since it is part of the highest scoring conceptual instance [person#1, PART-OF, organization#1].</Paragraph>
    <Section position="1" start_page="798" end_page="798" type="sub_section">
      <SectionTitle>
5.1 Qualitative Evaluation
</SectionTitle>
      <Paragraph position="0"> Table 3 and Table 4 list samples of the highest ranking conceptual instances obtained by our system for the part-of and causation relations.</Paragraph>
      <Paragraph position="1"> Below we provide a small evaluation to verify: * the correctness of the conceptual instances.</Paragraph>
      <Paragraph position="2"> Incorrect conceptual instances such as [attribute#2, CAUSE, state#4], discovered by our system, can impede WSD and extraction tools where precise selectional restrictions are needed; and * the accuracy of the conceptual instances.</Paragraph>
      <Paragraph position="3"> Sometimes, an instance is incorrectly attached to a correct conceptual instance. For example, the instance (air mass, PART-OF, cold front) is incorrectly clustered in [group#1, PART-OF, multitude#3] since mass and front have senses that are descendants of group#1 and multitude#3, respectively. However, these are not the correct senses of mass and front for which the part-of relation holds.</Paragraph>
      <Paragraph position="4"> For evaluating correctness, we manually verify how many correct conceptual instances are produced by Phase 2 of the clustering approach described in Section 3.2. The claim is that a correct conceptual instance is one for which the relation holds for all possible subsumed senses. For example, the conceptual instance [group#1, PART-OF, multitude#3] is correct, as the relation holds for every semantic subsumption of the two senses. An example of an incorrect conceptual instance is [state#4, CAUSE, abstraction#6] since it subsumes the incorrect instance (audience, CAUSE, new context). A manual evaluation of the highest scoring 200 conceptual instances, generated on our test sets described in Section 4.1, showed 82% correctness for the part-of relation and 86% for causation.</Paragraph>
      <Paragraph position="5"> For estimating the overall clustering accuracy, we evaluated the number of correctly clustered instances in each conceptual instance. For example, the instance (business people, PART-OF, committee) is correctly clustered in [multitude#3, PART-OF, group#1] and the instance (law, PART-OF, constitutional pitfalls) is incorrectly clustered in [group#1, PART-OF, artifact#1]. We estimated the overall accuracy by manually judging the instances attached to 10 randomly sampled conceptual instances. The accuracy for part-of is 84% and for causation it is 76.6%.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="798" end_page="798" type="metho">
    <SectionTitle>
6 Conclusions
</SectionTitle>
    <Paragraph position="0"> In this paper, we proposed two algorithms for automatically ontologizing binary semantic relations into WordNet: an anchoring approach and a clustering approach. Experiments on the part-of and causation relations showed promising results. Both algorithms outperformed the baseline on F-score. Our best results were on the part-of relation where the clustering approach achieved 13.6% higher F-score than the baseline.</Paragraph>
    <Paragraph position="1"> The induction of conceptual instances has opened the way for many avenues of future work. We intend to pursue the ideas presented in Section 5 for using conceptual instances to: i) support knowledge acquisition tools by learning semantic constraints on extraction patterns; ii) support ontology learning from text; and iii) improve word sense disambiguation through selectional restrictions. Also, we will try different similarity score functions for both the clustering and the anchor approaches, such as those surveyed in Corley and Mihalcea (2005).</Paragraph>
  </Section>
</Paper>