<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1903">
  <Title>Ontology-based linguistic annotation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Linguistic annotation is crucial for the development and evaluation of natural language processing (NLP) tools. In particular, machine-learning-based approaches to part-of-speech tagging, word sense disambiguation, information extraction, or anaphora resolution (to name just a few) rely on corpora annotated with the corresponding phenomena for training and testing. In this paper, we argue that linguistic annotation can to some extent be considered a special case of semantic annotation with regard to an ontology. Part-of-speech (POS) annotation, for example, can be seen as the task of choosing the appropriate tag for a word from an ontology of word categories (compare, for example, the Penn Treebank POS tagset as described in (Marcus et al., 1993)). The annotation of word senses, as used by machine-learning-based word sense disambiguation (WSD) tools, corresponds to the task of selecting the correct semantic class or concept for a word from an underlying ontology such as WordNet (Resnik, 1997). Annotation by template filling, as used to train machine-learning-based information extraction (IE) systems such as (Ciravegna, 2001), can be seen as the task of finding and marking all the attributes of a given ontological concept in a text. An ontological concept in this sense can be a launching event, a management succession event, or a person, together with attributes such as name, affiliation, position, etc. The annotation of anaphoric or bridging relations is the task of identifying the semantic relation between two linguistic expressions representing a certain ontological concept.</Paragraph>
    <Paragraph position="1"> Most linguistic annotation tools make use of schemas specifying what can actually be annotated. These schemas can in fact be understood as a formal representation of the conceptualization underlying the annotation task. Ontologies are formal specifications of a conceptualization (Gruber, 1993), so it seems straightforward to formalize annotation schemes as ontologies and to make use of semantic annotation tools such as OntoMat (Handschuh et al., 2001) for the purpose of linguistic annotation.</Paragraph>
    <Paragraph position="2"> The structure of this paper is as follows: Section 2 presents the ontology-based framework for linguistic annotation, and Section 3 shows how the framework can be applied to the annotation of anaphoric relations. Section 4 presents CREAM, a semantic annotation framework for the Semantic Web, as well as its concrete implementation, OntoMat. Finally, Section 5 discusses related work, and Section 6 concludes the paper.</Paragraph>
  </Section>
</Paper>