<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1112">
  <Title>Syntactic features for high precision Word Sense Disambiguation</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1. Introduction
</SectionTitle>
    <Paragraph position="0"> Supervised learning has become the most successful paradigm for Word Sense Disambiguation (WSD). Algorithms of this kind follow a two-step process:  1. Choosing a representation, as a set of features, for the context in which the target word occurs.  2. Applying a Machine Learning (ML) algorithm to train on the extracted features and tag the target word in the test examples.</Paragraph>
    <Paragraph position="1"> Current WSD systems attain high performance for coarse word sense differences (two or three senses) if enough training material is available. In contrast, the performance for finer-grained sense differences (e.g. WordNet senses as used in Senseval 2 (Preiss &amp; Yarowsky, 2001)) is far from application needs. Nevertheless, recent work (Agirre and Martinez, 2001a) shows that it is possible to exploit the precision-coverage trade-off and build a high precision WSD system that tags a limited number of target words with a predefined precision.</Paragraph>
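    The precision-coverage trade-off mentioned above can be illustrated with a minimal sketch. It assumes that the tagger returns a confidence score with each decision and that the system abstains whenever the score falls below a threshold; the function name, the threshold parameter and the toy data are hypothetical and are not taken from the cited work.

```python
def precision_coverage(predictions, gold, threshold):
    """Return (precision, coverage) when low-confidence decisions are dropped.

    predictions: list of (sense, confidence) pairs, one per test example.
    gold: list of correct senses, aligned with predictions.
    threshold: hypothetical cut-off; examples scored below it are left untagged.
    """
    # Keep only the examples the system is confident enough to tag.
    tagged = [(sense, g) for (sense, conf), g in zip(predictions, gold)
              if conf >= threshold]
    # Coverage: fraction of all examples that received a tag.
    coverage = len(tagged) / len(gold) if gold else 0.0
    # Precision: fraction of the tagged examples that are correct.
    correct = sum(1 for sense, g in tagged if sense == g)
    precision = correct / len(tagged) if tagged else 0.0
    return precision, coverage


# Toy data: two confident decisions and two uncertain ones.
toy_predictions = [("a", 0.9), ("b", 0.8), ("a", 0.4), ("b", 0.3)]
toy_gold = ["a", "b", "b", "b"]
for t in (0.0, 0.5):
    print(t, precision_coverage(toy_predictions, toy_gold, t))
```

    Raising the threshold tags fewer examples, and precision on the tagged subset can rise as a result. A system that targets a predefined precision would then pick the threshold on held-out data.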
    <Paragraph position="2"> This paper explores the contribution of a broad set of syntactically motivated features, ranging from the presence of complements and adjuncts and the detection of subcategorization frames up to grammatical relations instantiated with specific words. The performance of the syntactic features is measured both in isolation and in combination with a basic set of local and topical features (as defined in the literature), using two ML algorithms: Decision Lists (Dlist) and AdaBoost (Boost). Dlist does not attempt to combine features, i.e. it takes only the strongest feature, whereas Boost tries combinations of features and also uses negative evidence, i.e. the absence of features.</Paragraph>
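    The behaviour described for Dlist, where only the strongest matching feature decides, can be sketched as follows. This is a minimal illustration, assuming smoothed log-odds weights in the style commonly used for decision lists; the smoothing constant, the feature names and the toy examples are hypothetical, and the paper's own weighting may differ.

```python
import math
from collections import Counter, defaultdict


def train_dlist(examples, alpha=0.1):
    """Build rules (weight, feature, sense), sorted strongest first.

    examples: list of (features, sense) pairs.
    alpha: hypothetical smoothing constant that keeps weights finite.
    """
    counts = defaultdict(Counter)  # feature -> sense -> frequency
    for features, sense in examples:
        for feature in set(features):
            counts[feature][sense] += 1
    rules = []
    for feature, by_sense in counts.items():
        total = sum(by_sense.values())
        for sense, n in by_sense.items():
            # Smoothed log-odds of this sense against all other senses.
            weight = math.log((n + alpha) / (total - n + alpha))
            rules.append((weight, feature, sense))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def classify_dlist(rules, features):
    """Return (sense, weight) of the first, i.e. strongest, matching rule."""
    present = set(features)
    for weight, feature, sense in rules:
        if feature in present:
            return sense, weight
    return None, 0.0  # no known feature: leave the example untagged


# Toy training data for the noun "bass" (feature names are invented).
toy_examples = [
    (["obj_of:play", "word:guitar"], "music"),
    (["obj_of:play", "word:band"], "music"),
    (["obj_of:catch", "word:river"], "fish"),
    (["obj_of:eat", "word:river"], "fish"),
    (["word:river", "word:boat"], "fish"),
]
toy_rules = train_dlist(toy_examples)
print(classify_dlist(toy_rules, ["obj_of:play", "word:river"]))
```

    In the last call the two features point to different senses, and the decision follows the single rule with the highest weight; the weaker feature is ignored rather than combined. A boosting learner would instead weigh several features together.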
    <Paragraph position="3"> The role of syntactic features in a high-precision WSD system based on the precision-coverage trade-off is also investigated. The paper is structured as follows. Section 2 reviews the features previously used in the literature. Section 3 defines a basic feature set based on the preceding review. Section 4 presents the syntactic features as defined in our work, alongside the parser used. Section 5 presents the two ML algorithms, as well as the strategies for the precision-coverage trade-off. Section 6 describes the experimental setting and the results. Finally, Section 7 draws conclusions and outlines further work.</Paragraph>
  </Section>
</Paper>