<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-0501">
  <Title>Enriching a formal ontology with a thesaurus: an application in the cultural heritage domain</Title>
  <Section position="9" start_page="10" end_page="12" type="concl">
    <SectionTitle>
5 Related work
</SectionTitle>
    <Paragraph position="0"> This paper presented a method to automatically annotate the gloses of a thesaurus, the AT, with the properties (conceptual relations) of a core ontology, the CIDOC-CRM. Several methods for ontology population and semantic annotation described in literature (e.g. (Thelen and Riloff, 202; Califf and Mooney, 204; Cimiano et al. 205; Valarakos et al. 204) use regular expressions to identify named entities, i.e. concept instances. Other methods extract  In AT the hyperny relation is already available, since T is a thesaurus, not a glosary. However we developed regular expresions also for hypernym extraction from definitions. For sake of space this is not discused in this paper, however the remarkable result (wrt analogous evaluations in literature) is that in 34% of the cases the automaticaly extracted hypernym is the same as in AT, and in 26% of the cases, either the extracted hypernym is more general than the one defined in AT, or the contrary, patterns (Snow et al. 205; Morin and Jaquemin 204) or supervised clustering techniques (Kashyap et al. 203).</Paragraph>
    <Paragraph position="1"> In our work, we automatically learn formal concepts, not simply instances or taxonomies (e.g. the graphs of Figure 3) compliant with the semantics of a well-established core ontology, the CIDOC. The method is unsupervised, in the sense that it does not need manual annotation of a significant fragment of text. However, it relies on a set of manually writen regular expresions, based on lexical, part-of-speech, and semantic constraints. The structure of regular expressions is rather more complex than in similar works using regular expressions, especially for the use of automatically verified semantic constraints.</Paragraph>
    <Paragraph position="2"> This complexity is indeed necessary to identify non-trivial relations in an unconstrained text and without training. The isue is however how much this method generalizes to other domains: * A first problem is the availability of lexical and semantic resources used by the algorithm. The most critical requirement of the method is the availability of sound domain core ontologies, which hopefuly wil be produced by other web communities stimulated by the recent success of CIDOC CRM. On the other side, in absence of an agreed conceptual reference model, no large scale anotation is posible at al. As for the other resources used by our algorithm, glosaries, thesaura and gazetteers are widely available in &amp;quot;mature&amp;quot; domains. If not, we developed a methodology, described in (Navigli and Velardi, 205b), to automatically create a glosary in novel domains (e.g. enterprise interoperability), extracting definition sentences from domain-relevant documents and authoritative web sites.</Paragraph>
    <Paragraph position="3"> * The second problem is about the generality of regular expressions. Clearly, the relation checkers that we defined are tuned on the CIDOC properties. This however is consistent with our target: in specific domains users are interested to identify specific relations, not general purpose.</Paragraph>
    <Paragraph position="4"> Certain relevant application domains -like cultural heritage, e-commerce, or tourismare those that dictate specifications for real-world applications of NLP techniques.</Paragraph>
    <Paragraph position="5"> However, several CIDOC properties are rather general (especially locative and wrt the AT hierarchy.</Paragraph>
    <Paragraph position="6">  temporal relations) therefore some relation checkers easily apply to other domains, as demonstrated by the experiment on automatic annotation of historical archives in Table 4. Furthermore, the method used to verify semantic constraints is fuly general, since it is based on WordNet and a generalpurpose, untrained semantic disambiguation algorithm, SSI.</Paragraph>
    <Paragraph position="7"> * Finally, the authors believe with some degree of convincement that automatic patternlearning methods often require non-trivial human effort just like manual methods (because of the need of annotated data, careful parameter setting, etc.), and furthermore they are unable to combine in a non-trivial way different types of features (e.g. lexical, syntactic, semantic). To make an example, a recent work on learning hypernymy patterns (Morin and Jacquemin, 204) provides the ful list of learned patterns. The complexity of these patterns is certainly lower than the regular expression structures used in this work, and many of them are rather intuitive.</Paragraph>
    <Paragraph position="8"> In the literature the tasks on which automatic methods have been tested are rather constrained, and do not convincingly demonstrate the superiority of automatic with respect to manually defined patterns. For example, in Senseval-3 (automated labeling of semantic roles  ), participating systems are requested to identify semantic roles in a sentence fragment for which the &amp;quot;frame semantics&amp;quot; is given, therefore the posible semantic relations to be identified are quite limited.</Paragraph>
    <Paragraph position="9"> However, we believe that our method can be automated to some degree (for example, machine learning methods can be used to botstrap the syntactic patterns, and to learn semantic constraints), a research line we are currently exploring.</Paragraph>
  </Section>
class="xml-element"></Paper>