<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2608">
<Title>Syntagmatic Kernels: a Word Sense Disambiguation Case Study</Title>
<Section position="2" start_page="0" end_page="57" type="intro">
<SectionTitle> 1 Introduction </SectionTitle>
<Paragraph position="0"> In computational linguistics, it is common to deal with sequences: words are sequences of letters, and syntagmatic relations are established by sequences of words. Sequences are analyzed to measure morphological similarity, to detect multiwords, to represent syntagmatic relations, and so on. Hence, modeling syntagmatic relations is crucial for a wide variety of NLP tasks, such as Named Entity Recognition (Gliozzo et al., 2005a) and Word Sense Disambiguation (WSD) (Strapparava et al., 2004). In general, the strategy adopted to model syntagmatic relations is to provide bigrams and trigrams of collocated words as features to describe local contexts (Yarowsky, 1994), with each word regarded as a separate instance to classify. For instance, occurrences of a given class of named entities (such as names of persons) can be discriminated in texts by recognizing word patterns in their local contexts.</Paragraph>
<Paragraph position="1"> For example, the token Rossi, whenever it is preceded by the token Prof., often represents the name of a person. Another task that can benefit from modeling this kind of relation is WSD. To resolve ambiguity, it is necessary to analyze syntagmatic relations in the local context of the word to be disambiguated. In this paper we propose a kernel function that can be used to model such relations, the Syntagmatic Kernel, and we apply it to two (English and Italian) lexical-sample WSD tasks of the Senseval-3 competition (Mihalcea and Edmonds, 2004).</Paragraph>
<Paragraph position="2"> In a lexical-sample WSD task, training data are provided as a set of texts, in which for each text a given target word is manually annotated with a sense from a predetermined set of possibilities. To model syntagmatic relations, the typical supervised learning framework adopts bigrams and trigrams in a local context as features. The main drawback of this approach is that non-contiguous or shifted collocations cannot be identified, decreasing the generalization power of the learning algorithm. For example, suppose that the verb to score has to be disambiguated in the sentence &quot;Ronaldo scored the goal&quot;, and that the sense-tagged example &quot;the football player scores#1 the first goal&quot; is provided for training. A traditional feature mapping would extract the bigram w+1 w+2: the goal to represent the former, and the bigram w+1 w+2: the first to index the latter. Evidently such features will not match, leading the algorithm to a misclassification.</Paragraph>
<Paragraph position="3"> In the present paper we propose the Syntagmatic Kernel as an attempt to solve this problem. The Syntagmatic Kernel is based on a Gap-Weighted Subsequences Kernel (Shawe-Taylor and Cristianini, 2004). In the spirit of Kernel Methods, this kernel is able to compare sequences directly in the input space, avoiding any explicit feature mapping. To perform this operation, it counts how many times a (non-contiguous) subsequence of symbols u of length n occurs in the input string s, and penalizes non-contiguous occurrences according to the number of gaps they contain. To define our Syntagmatic Kernel, we adapted the generic definition of the Sequence Kernels to the problem of recognizing collocations in local word contexts.</Paragraph>
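To make the computation concrete, the following is a minimal sketch, not code from the paper, of the gap-weighted subsequences kernel applied to word sequences, as the Syntagmatic Kernel requires; it follows the standard recursion in (Shawe-Taylor and Cristianini, 2004), and the function names and the decay value lambda = 0.5 are illustrative. Formally, K_n(s,t) = sum over all subsequences u of length n of sum over index sequences i, j with u = s[i] = t[j] of lambda^(l(i)+l(j)), where l(i) is the length of the span covered by the index sequence i.

```python
# A minimal sketch of a gap-weighted subsequences kernel over word
# sequences (standard recursion from Shawe-Taylor and Cristianini, 2004).
# Names and the decay value are illustrative, not taken from the paper.
from functools import lru_cache

def gap_weighted_kernel(s, t, n, lam=0.5):
    """K_n(s, t): weighted count of (possibly non-contiguous) word
    subsequences of length n common to s and t; each occurrence is
    weighted by lam raised to the length of the span it covers, so
    gapped matches are penalized."""

    @lru_cache(maxsize=None)
    def k_aux(p, i, j):
        # K'_p over the prefixes s[:i] and t[:j]: the weight of a
        # partially matched subsequence of length p, carrying the gap
        # penalty up to the end of both prefixes.
        if p == 0:
            return 1.0
        if min(i, j) < p:
            return 0.0
        total = lam * k_aux(p, i - 1, j)  # skip s[i-1], opening a gap
        for q in range(1, j + 1):
            if t[q - 1] == s[i - 1]:      # exact word match only here
                total += k_aux(p - 1, i - 1, q - 1) * lam ** (j - q + 2)
        return total

    @lru_cache(maxsize=None)
    def k(i, j):
        # K_n over the prefixes s[:i] and t[:j].
        if min(i, j) < n:
            return 0.0
        total = k(i - 1, j)
        for q in range(1, j + 1):
            if t[q - 1] == s[i - 1]:
                total += k_aux(n - 1, i - 1, q - 1) * lam ** 2
        return total

    return k(len(s), len(t))

# The mismatch discussed above disappears: the right contexts "the goal"
# and "the first goal" share no bigram of adjacent words, but they do
# share the gapped subsequence (the, goal), with spans 2 and 3:
print(gap_weighted_kernel(["the", "goal"],
                          ["the", "first", "goal"], n=2))
# -> 0.5**2 * 0.5**3 = 0.03125
```

The equality test t[q - 1] == s[i - 1] is also the natural place to plug in the soft-matching schema introduced below: relaxing it to a term-similarity test would let, for instance, Ronaldo count as a match for football player.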
<Paragraph position="4"> In the above definition of the Syntagmatic Kernel, only exact word matches contribute to the similarity. One shortcoming of this approach is that (near-)synonyms are never considered similar, which greatly reduces the generalization power of the learning algorithm and means it requires a huge amount of data to converge to an accurate prediction. To solve this problem we provided external lexical knowledge to the supervised learning algorithm, in order to define a &quot;soft-matching&quot; schema for the kernel function. For example, if we consider the terms Ronaldo and football player as equivalent, the sentence &quot;The football player scored the first goal&quot; is equivalent to the sentence &quot;Ronaldo scored the first goal&quot;, providing strong evidence to disambiguate the occurrence of the verb in the latter.</Paragraph>
<Paragraph position="5"> We propose two alternative soft-matching criteria exploiting two different knowledge sources: (i) hand-made resources and (ii) unsupervised term similarity measures. The first approach performs a soft-matching among all words that are synonyms in WordNet, while the second exploits domain relations, acquired from unlabeled data, for the same purpose.</Paragraph>
<Paragraph position="6"> Our experiments, performed on two standard WSD benchmarks, show the superiority of the Syntagmatic Kernel with respect to a classical flat vector representation of bigrams and trigrams.</Paragraph>
<Paragraph position="7"> The paper is structured as follows. Section 2 introduces the Sequence Kernels. In Section 3 the Syntagmatic Kernel is defined. Section 4 explains how soft-matching can be exploited by the Collocation Kernel, describing two alternative criteria: WordNet Synonymy and Domain Proximity. Section 5 gives a brief sketch of the complete WSD system, composed of a combination of different kernels dealing with syntagmatic and paradigmatic aspects. Section 6 evaluates the Syntagmatic Kernel, and finally Section 7 concludes the paper.</Paragraph>
</Section>
</Paper>