File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/intro/06/e06-1003_intro.xml
Size: 4,973 bytes
Last Modified: 2025-10-06 14:03:18
<?xml version="1.0" standalone="yes"?> <Paper uid="E06-1003"> <Title>Weakly Supervised Approaches for Ontology Population</Title> <Section position="2" start_page="0" end_page="17" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Automatic Ontology Population (OP) from texts has recently emerged as a new field of application for knowledge acquisition techniques (see, among others, (Buitelaar et al., 2005)). Although there is no univocally accepted definition of the OP task, a useful approximation has been suggested (Bontcheva and Cunningham, 2003) as Ontology Driven Information Extraction, where, in place of a template to be filled, the goal of the task is the extraction and classification of instances of concepts and relations defined in an Ontology. The task has been approached from a variety of similar perspectives, including term clustering (e.g. (Lin, 1998a) and (Almuhareb and Poesio, 2004)) and term categorization (e.g. (Avancini et al., 2003)).</Paragraph> <Paragraph position="1"> A rather different task is Ontology Learning (OL), where new concepts and relations are supposed to be acquired, with the consequence of changing the definition of the Ontology itself (see, for instance, (Velardi et al., 2005)).</Paragraph> <Paragraph position="2"> In this paper OP is defined in the following scenario. Given a set of terms T = {t1, t2, ..., tn}, a document collection D, in which the terms in T are supposed to appear, and a set of predefined classes C = {c1, c2, ..., cm} denoting concepts in an Ontology, each term ti has to be assigned to the proper class in C. For the purposes of the experiments presented in this paper we assume that (i) classes in C are mutually disjoint and (ii) each term is assigned to just one class.</Paragraph> <Paragraph position="3"> As we have defined it, OP shows a strong similarity with Named Entity Recognition and Classification (NERC).
However, a major difference is that in NERC each occurrence of a recognized term has to be classified separately, while in OP it is the term, independently of the context in which it appears, that has to be classified.</Paragraph> <Paragraph position="4"> While Information Extraction, and NERC in particular, have been addressed predominantly by means of supervised approaches, Ontology Population is typically attacked in an unsupervised way. As many authors have pointed out (e.g. (Cimiano and Völker, 2005)), the main motivation is the fact that in OP the set of classes is usually larger and more fine-grained than in NERC (where the typical set includes Person, Location, Organization, GPE, and a Miscellanea class for all other kinds of entities). In addition, by definition, the set of classes in C changes as a new ontology is considered, making the creation of annotated data almost impossible in practice.</Paragraph> <Paragraph position="5"> In line with the demand for weakly supervised approaches to OP, we propose a method, called Class-Example, which learns a classification model from a set of classified terms, exploiting lexico-syntactic features. Unlike most approaches, which consider pairwise similarity between terms ((Cimiano and Völker, 2005); (Lin, 1998a)), the Class-Example method considers the similarity between a term ti and a set of training examples which represent a certain class.
This results in a great number of class features and opens the possibility of exploiting more statistical data, such as the frequency of appearance of a class feature in different training terms.</Paragraph> <Paragraph position="6"> To show the effectiveness of the Class-Example approach, we compared it against two different approaches: (i) a Class-Pattern unsupervised approach, in the style of (Hearst, 1998); (ii) an unsupervised approach that considers the word of the class as a pivot word for acquiring relevant contexts for the class (we refer to this method as Class-Word). Results of the comparison show that the Class-Example method significantly outperforms the other two methods, making it appealing even considering the need for supervision. Although the Class-Example method we propose is applicable in general, in this paper we show its usefulness when applied to terms denoting Named Entities. The motivation behind this choice is the practical value of Named Entity classifications, as, for instance, in applications such as Question Answering and Information Extraction.</Paragraph> <Paragraph position="7"> Moreover, some Named Entity classes, including names of writers, athletes and organizations, change dynamically over time, which makes it impossible to capture them in a static Ontology.</Paragraph> <Paragraph position="8"> The rest of the paper is structured as follows.</Paragraph> <Paragraph position="9"> Section 2 describes the state-of-the-art methods in Ontology Population. Section 3 presents the three approaches to the task we have compared. Section</Paragraph> </Section> </Paper>