<?xml version="1.0" standalone="yes"?>
<Paper uid="W97-0211">
  <Title>Towards a Bootstrapping Framework for Corpus Semantic Tagging</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Lexical Acquisition (LA) processes rely strongly on basic assumptions embodied by the source information and training examples. Several approaches to LA rely on some form of declarative description of the source data: bracketed or POS-tagged corpora are just two examples. Many authors claim that class-based methods are more robust against data sparseness problems (Dagan, 1994), (Pereira, 1993), (Brown et al., 1992). Other works (Basili et al., 1993a, 1996) demonstrated that a variety of lexical acquisition methods over small corpora are viable whenever a domain-specific semantic bias is available: using high-level semantic classes (rather than simple words) increases the robustness of probability-driven methods, which are usually affected by coverage and data sparseness problems. Furthermore, domain-specific semantic classes add expressivity to the underlying statistical acquisition model (Basili et al., 1996): this saves the knowledge engineer from having to deal with mysterious scores with no linguistic flavor. Semantic data possess an explanatory power that is truly required in specific knowledge domains.</Paragraph>
    <Paragraph position="1"> [Footnote 1: This work has been partially supported by the Esprit LRE project n. 2110 - ECRAN.] Modeling semantic information is much more corpus and domain dependent than POS or syntactic tagging. Bracketed corpora are core components of an underlying grammatical knowledge to which the results of different inductive methods equivalently refer. Such equivalence is no longer valid for semantic tagging when corpora (as well as the underlying domains) change. In order to design tagging capabilities at a semantic level, it is more important to design adaptation capabilities to process a given corpus in a domain-driven fashion. Tagging is a dynamic process that aims to produce core semantic information to support several induction processes over the same domain.</Paragraph>
    <Paragraph position="2"> Availability of source information to support any tagging activity is problematic: general-purpose sources (e.g. MRDs and static Lexical Knowledge Bases) may be too generic and worsen the induction quality, while domain-specific sources are usually absent. Although semantic information is crucial to the induction of most lexical knowledge, accessing it is often impossible. As gold standards are fairly questionable, it is necessary to rely on sources that are as systematic as possible and to adapt their descriptions to the underlying corpus. The widespread diffusion of WordNet, as well as its large scale, has motivated several recent studies to use it as a common source and adapt it to the purposes of the target LA task.</Paragraph>
    <Paragraph position="3"> In this framework, we consider tagging as a process carried out in two phases: (1) selection of the semantic tag system specific to the domain (tuning WordNet); (2) use of the specific classification to tag the corpus in vivo.</Paragraph>
  </Section>
</Paper>