Syntagmatic Kernels:
a Word Sense Disambiguation Case Study
Claudio Giuliano and Alfio Gliozzo and Carlo Strapparava
ITC-irst, Istituto per la Ricerca Scientifica e Tecnologica
I-38050, Trento, ITALY
{giuliano,gliozzo,strappa}@itc.it
Abstract
In this paper we present a family of ker-
nel functions, named Syntagmatic Ker-
nels, which can be used to model syn-
tagmatic relations. Syntagmatic relations
hold among words that are typically collo-
cated in a sequential order, and thus they
can be acquired by analyzing word se-
quences. In particular, Syntagmatic Ker-
nels are defined by applying a Word Se-
quence Kernel to the local contexts of the
words to be analyzed. In addition, this
approach allows us to define a semi su-
pervised learning schema where external
lexical knowledge is plugged into the su-
pervised learning process. Lexical knowl-
edge is acquired from both unlabeled data
and hand-made lexical resources, such as
WordNet. We evaluated the syntagmatic
kernel on two standard Word Sense Dis-
ambiguation tasks (i.e. English and Ital-
ian lexical-sample tasks of Senseval-3),
where the syntagmatic information plays
a crucial role. We compared the Syntag-
matic Kernel with the standard approach,
showing promising improvements in per-
formance.
1 Introduction
In computational linguistics, it is usual to deal with
sequences: words are sequences of letters and syn-
tagmatic relations are established by sequences of
words. Sequences are analyzed to measure morpho-
logical similarity, to detect multiwords, to represent
syntagmatic relations, and so on. Hence modeling
syntagmatic relations is crucial for a wide variety
of NLP tasks, such as Named Entity Recognition
(Gliozzo etal., 2005a) and WordSenseDisambigua-
tion (WSD) (Strapparava et al., 2004).
In general, the strategy adopted to model syntag-
matic relations is to provide bigrams and trigrams of
collocated words as features to describe local con-
texts (Yarowsky, 1994), and each word is regarded
as a different instance to classify. For instance, oc-
currences of a given class of named entities (such
as names of persons) can be discriminated in texts
by recognizing word patterns in their local contexts.
For example the token Rossi, whenever is preceded
by the token Prof., often represents the name of a
person. Another task that can benefit from modeling
this kind of relations is WSD. To solve ambiguity it
is necessary to analyze syntagmatic relations in the
local context of the word to be disambiguated. In
this paper we propose a kernel function that can be
used to model such relations, the Syntagmatic Ker-
nel, and we apply it to two (English and Italian)
lexical-sample WSD tasks of the Senseval-3 com-
petition (Mihalcea and Edmonds, 2004).
In a lexical-sample WSD task, training data are
provided as a set of texts, in which for each text
a given target word is manually annotated with a
sense from a predetermined set of possibilities. To
model syntagmatic relations, the typical supervised
learning framework adopts as features bigrams and
trigrams in a local context. The main drawback of
this approach is that non contiguous or shifted col-
57
locations cannot be identified, decreasing the gener-
alization power of the learning algorithm. For ex-
ample, suppose that the verb to score has to be dis-
ambiguated into the sentence “Ronaldo scored the
goal”, and that the sense tagged example “the foot-
ball player scores#1the first goal” isprovided for
training. A traditional feature mapping would ex-
tract the bigram w+1 w+2:the goal to represent the
former, and the bigram w+1 w+2:the first to index
the latter. Evidently such features will not match,
leading the algorithm to a misclassification.
In the present paper we propose the Syntagmatic
Kernel as an attempt to solve this problem. The
Syntagmatic Kernel is based on a Gap-Weighted
Subsequences Kernel (Shawe-Taylor and Cristian-
ini, 2004). In the spirit of Kernel Methods, this
kernel is able to compare sequences directly in the
input space, avoiding any explicit feature mapping.
To perform this operation, it counts how many times
a (non-contiguous) subsequence of symbols u of
length n occurs in the input string s, and penalizes
non-contiguous occurrences according to the num-
ber of the contained gaps. To define our Syntag-
matic Kernel, we adapted the generic definition of
the Sequence Kernels to the problem of recognizing
collocations in local word contexts.
In the above definition of Syntagmatic Kernel,
only exact word-matches contribute to the similar-
ity. One shortcoming of this approach is that (near-
)synonyms will never be considered similar, lead-
ing to a very low generalization power of the learn-
ing algorithm, that requires a huge amount of data
to converge to an accurate prediction. To solve this
problem we provided external lexical knowledge to
the supervised learning algorithm, in order to define
a “soft-matching” schema for the kernel function.
For example, if we consider as equivalent the terms
Ronaldo and football player, the proposition “The
football player scored the first goal” is equivalent to
the sentence “Ronaldo scored the first goal”, pro-
viding a strong evidence to disambiguate the latter
occurrence of the verb.
We propose two alternative soft-matching criteria
exploiting twodifferent knowledge sources: (i)hand
made resources and (ii) unsupervised term similar-
ity measures. The first approach performs a soft-
matching among allthose synonyms words in Word-
Net, while the second exploits domain relations, ac-
quired from unlabeled data, for the same purpose.
Our experiments, performed on two standard
WSD benchmarks, show the superiority of the Syn-
tagmatic Kernel with respect to aclassical flatvector
representation of bigrams and trigrams.
The paper is structured as follows. Section 2 in-
troduces the Sequence Kernels. In Section 3 the
Syntagmatic Kernel is defined. Section 4 explains
how soft-matching can be exploited by the Collo-
cation Kernel, describing two alternative criteria:
WordNet Synonymy and Domain Proximity. Sec-
tion 5 gives a brief sketch of the complete WSD
system, composed by the combination of different
kernels, dealing with syntagmatic and paradigmatic
aspects. Section 6evaluates theSyntagmatic Kernel,
and finally Section 7 concludes the paper.
2 Sequence Kernels
The basic idea behind kernel methods is to embed
the data into a suitable feature space F via a map-
ping function φ : X → F, and then use a linear al-
gorithm for discovering nonlinear patterns. Instead
of using the explicit mapping φ, we can use a kernel
function K : X ×X → R, that corresponds to the
inner product in a feature space which is, in general,
different from the input space.
Kernel methods allow us to build a modular sys-
tem, as the kernel function acts as an interface be-
tween the data and the learning algorithm. Thus
the kernel function becomes the only domain spe-
cific module of the system, while the learning algo-
rithm is a general purpose component. Potentially
any kernel function can work with any kernel-based
algorithm. In our system weuse Support Vector Ma-
chines (Cristianini and Shawe-Taylor, 2000).
Sequence Kernels (or String Kernels) are a fam-
ily of kernel functions developed to compute the
inner product among images of strings in high-
dimensional feature space using dynamic program-
ming techniques (Shawe-Taylor and Cristianini,
2004). The Gap-Weighted Subsequences Kernel is
the most general Sequence Kernel. Roughly speak-
ing, it compares two strings by means of the num-
ber of contiguous and non-contiguous substrings of
a given length they have in common. Non contigu-
ous occurrences are penalized according to the num-
ber of gaps they contain.
58
Formally, let Σ be an alphabet of |Σ| symbols,
and s = s1s2 ...s|s| a finite sequence over Σ (i.e.
si ∈ Σ,1 lessorequalslant i lessorequalslant |s|). Let i = [i1,i2,...,in], with
1 lessorequalslant i1 < i2 < ... < in lessorequalslant |s|, be a subset of the
indices in s: we will denote as s[i] ∈ Σn the sub-
sequence si1si2 ...sin. Note that s[i] does not nec-
essarily form a contiguous subsequence of s. For
example, if s is the sequence “Ronaldo scored the
goal” and i = [2,4], then s[i] is “scored goal”. The
length spanned by s[i] in s is l(i) = in − i1 + 1.
The feature space associated with the Gap-Weighted
Subsequences Kernel of length n is indexed by I =
Σn, with the embedding given by
φnu(s) =
X
i:u=s[i]
λl(i),u ∈ Σn, (1)
where λ ∈]0,1] is the decay factor used to penalize
non-contiguous subsequences1. The associate ker-
nel is defined as
Kn(s,t) = 〈φn(s),φn(t)〉 =
X
u∈Σn
φnu(s)φnu(t). (2)
An explicit computation of Equation 2 is unfea-
sible even for small values of n. To evaluate more
efficiently Kn, weusetherecursive formulation pro-
posed in (Lodhi et al., 2002; Saunders et al., 2002;
Cancedda et al., 2003) based on a dynamic program-
ming implementation. It is reported in the following
equations:
Kprime0(s,t) = 1,∀s,t, (3)
Kprimei(s,t) = 0,if min(|s|,|t|) < i, (4)
Kprimeprimei (s,t) = 0,if min(|s|,|t|) < i, (5)
Kprimeprimei (sx,ty) =
(
λKprimeprimei (sx,t), if x negationslash= y;
λKprimeprimei (sx,t) + λ2Kprimei−1(s,t), otherwise.
(6)
Kprimei(sx,t) = λKprimei(s,t) + Kprimeprimei (sx,t), (7)
Kn(s,t) = 0,if min(|s|,|t|) < n, (8)
Kn(sx,t) = Kn(s,t) +
X
j:tj=x
λ2Kprimen−1(s,t[1 : j − 1]),
(9)
Kprimen and Kprimeprimen are auxiliary functions with a sim-
ilar definition as Kn used to facilitate the compu-
tation. Based on all definitions above, Kn can be
1Notice that by choosing λ = 1 sparse subsequences are
not penalized. On the other hand, the kernel does not take into
account sparse subsequences with λ → 0.
computed in O(n|s||t|). Using the above recursive
definition, it turns out that computing all kernel val-
ues for subsequences of lengths up to n is not signif-
icantly more costly than computing the kernel for n
only.
In the rest of the paper we will use the normalised
version of the kernel (Equation 10) to keep the val-
ues comparable for different values of n and to be
independent from the length of the sequences.
ˆK(s,t) = K(s,t)p
K(s,s)K(t,t). (10)
3 The Syntagmatic Kernel
As stated in Section 1, syntagmatic relations hold
among words arranged in a particular temporal or-
der, hence they can be modeled by Sequence Ker-
nels. The Syntagmatic Kernel is defined as a linear
combination of Gap-Weighted Subsequences Ker-
nels thatoperate atwordandPoStaglevel. Inpartic-
ular, following the approach proposed by Cancedda
et al. (2003), it is possible to adapt sequence kernels
to operate at word level by instancing the alphabet Σ
with the vocabulary V = {w1,w2,...,wk}. More-
over, we restricted the generic definition of the Gap-
Weighted Subsequences Kernel to recognize collo-
cations in the local context of a specified word. The
resulting kernel, called n-gram Collocation Kernel
(KnColl), operates on sequences of lemmata around a
specified word l0 (i.e.l−3,l−2,l−1,l0,l+1,l+2,l+3).
Thisformulation allowsustoestimate thenumber of
common (sparse) subsequences of lemmata (i.e. col-
locations) between twoexamples, inorder tocapture
syntagmatic similarity.
Analogously, we defined the PoS Kernel (KnPoS)
to operate on sequences of PoS tags p−3, p−2, p−1,
p0, p+1, p+2, p+3, where p0 is the PoS tag of l0.
The Collocation Kernel and the PoS Kernel are
defined by Equations 11 and 12, respectively.
KColl(s,t) =
nsummationdisplay
l=1
KlColl(s,t) (11)
and
KPoS(s,t) =
nsummationdisplay
l=1
KlPoS(s,t). (12)
Both kernels depend on the parameter n, the length
of the non-contiguous subsequences, and λ, the de-
59
cay factor. For example, K2Coll allows us to repre-
sent all (sparse) bi-grams in the local context of a
word.
Finally, the Syntagmatic Kernel is defined as
KSynt(s,t) = KColl(s,t) + KPoS(s,t). (13)
We will show that in WSD, the Syntagmatic Ker-
nel is more effective than standard bigrams and tri-
grams of lemmata and PoS tags typically used as
features.
4 Soft-Matching Criteria
In the definition of the Syntagmatic Kernel only ex-
act word matches contribute to the similarity. To
overcome this problem, we further extended the def-
inition of the Gap-Weigthed Subsequences Kernel
given in Section 2 to allow soft-matching between
words. In order to develop soft-matching criteria,
we follow the idea that two words can be substi-
tuted preserving the meaning of the whole sentence
if they are paradigmatically related (e.g. synomyns,
hyponyms or domain related words). If the meaning
of the proposition as a whole is preserved, the mean-
ing of the lexical constituents of the sentence will
necessarily remain unchanged too, providing a vi-
able criterion todefineasoft-matching schema. This
can be implemented by “plugging” external paradig-
matic information into the Collocation kernel.
Following the approach proposed by (Shawe-
Taylor and Cristianini, 2004), the soft-matching
Gap-Weighted Subsequences Kernel is now calcu-
lated recursively using Equations 3 to 5, 7 and 8,
replacing Equation 6 by the equation:
Kprimeprimei (sx,ty) = λKprimeprimei (sx,t) + λ2axyKprimei−1(s,t),∀x,y, (14)
and modifying Equation 9 to:
Kn(sx,t) = Kn(s,t) +
|t|X
j
λ2axtjKprimen−1(s,t[1 : j −1]).
(15)
where axy are entries in a similarity matrix A be-
tween symbols (words). In order to ensure that the
resulting kernel is valid, A must be positive semi-
definite.
In the following subsections, we describe two al-
ternative soft-matching criteria based on WordNet
Synonymy and Domain Proximity. In both cases, to
show that the similarity matrices are a positive semi-
definite we use the following result:
Proposition 1 A matrix A is positive semi-definite
if and only if A = BTB for some real matrix B.
The proof is given in (Shawe-Taylor and Cristianini,
2004).
4.1 WordNet Synonymy
The first solution we have experimented exploits a
lexical resource representing paradigmatic relations
among terms, i.e. WordNet. In particular, we used
WordNet-1.7.1 for English and the Italian part of
MultiWordNet2.
In order to find a similarity matrix between terms,
we defined a vector space where terms are repre-
sented by the WordNet synsets in which such terms
appear. Hence, we can view a term as vector in
which each dimension is associated with one synset.
The term-by-synset matrix S is then the matrix
whose rows are indexed by the synsets. The en-
try xij of S is 1 if the synset sj contains the term
wi, and 0 otherwise. The term-by-synset matrix S
gives rise to the similarity matrix A = SST be-
tween terms. Since A can be rewritten as A =
(ST)TST = BTB, it follows directly by Proposi-
tion 1 that it is positive semi-definite.
It is straightforward to extend the soft-matching
criterion to include hyponym relation, but we
achieved worse results. In the evaluation section we
will not report such results.
4.2 Domain Proximity
The approach described above requires a large scale
lexical resource. Unfortunately, formanylanguages,
such a resource is not available. Another possibility
for implementing soft-matching is introducing the
notion of Semantic Domains.
Semantic Domains are groups of strongly
paradigmatically related words, and can be acquired
automatically from corpora in a totally unsuper-
vised way (Gliozzo, 2005). Our proposal is to ex-
ploit a Domain Proximity relation to define a soft-
matching criterion on the basis of an unsupervised
similarity metric defined in a Domain Space. The
Domain Space can be determined once a Domain
2http://multiwordnet.itc.it
60
Model (DM) is available. This solution is evidently
cheaper, because large collections of unlabeled texts
can be easily found for every language.
A DM is represented by a k ×kprime rectangular ma-
trix D, containing the domain relevance for each
term with respect to each domain, as illustrated in
Table 1. DMs can be acquired from texts by exploit-
MEDICINE COMPUTER SCIENCE
HIV 1 0
AIDS 1 0
virus 0.5 0.5
laptop 0 1
Table 1: Example of Domain Model.
ing a lexical coherence assumption (Gliozzo, 2005).
Tothisaim, TermClustering algorithms canbeused:
a different domain is defined for each cluster, and
thedegree ofassociation between termsandclusters,
estimated by the unsupervised learning algorithm,
provides a domain relevance function. As a clus-
tering technique we exploit Latent Semantic Analy-
sis (LSA), following the methodology described in
(Gliozzo et al., 2005b). This operation is done off-
line, and can be efficiently performed on large cor-
pora.
LSA is performed by means of SVD of the term-
by-document matrixTrepresenting the corpus. The
SVDalgorithm can be exploited to acquire adomain
matrix D from a large corpus in a totally unsuper-
vised way. SVD decomposes the term-by-document
matrix T into three matrices T = VΣkUT where
Σk is the diagonal k × k matrix containing the k
singular values of T. D = VΣkprime where kprime lessmuch k.
Once a DM has been defined by the matrixD, the
Domain Space is a kprime dimensional space, in which
both texts and terms are represented by means of
Domain Vectors (DVs), i.e. vectors representing the
domain relevances among the linguistic object and
each domain. The DV vectorwprimei for the term wi ∈ V is the
ith row of D, where V = {w1,w2,...,wk} is the
vocabulary of the corpus.
The term-by-domain matrix D gives rise to the
term-by-term similarity matrix A = DDT among
terms. It follows from Proposition 1 that A is posi-
tive semi-definite.
5 Kernel Combination for WSD
To improve the performance of a WSD system, it
is possible to combine different kernels. Indeed,
we followed this approach in the participation to
Senseval-3 competition, reaching the state-of-the-
art in many lexical-sample tasks (Strapparava et al.,
2004). While this paper is focused on Syntagmatic
Kernels, in this section we would like to spend some
words on another important component for a com-
plete WSD system: the Domain Kernel, used to
model domain relations.
Syntagmatic information alone is not sufficient to
define a full kernel for WSD. In fact, in (Magnini
et al., 2002), it has been claimed that knowing the
domain of the text in which the word is located is a
crucial information for WSD. For example the (do-
main) polysemy among the COMPUTER SCIENCE
and the MEDICINE senses of the word virus can
be solved by simply considering the domain of the
context in which it is located.
This fundamental aspect of lexical polysemy can
be modeled by defining a kernel function to esti-
mate the domain similarity among the contexts of
the words to be disambiguated, namely the Domain
Kernel. The Domain Kernel measures the similarity
among the topics (domains) of two texts, so to cap-
ture domain aspects of sense distinction. It is a vari-
ation of the Latent Semantic Kernel (Shawe-Taylor
and Cristianini, 2004), in which a DM is exploited
to define an explicit mapping D : Rk → Rkprime from
the Vector Space Model (Salton and McGill, 1983)
into the Domain Space (see Section 4), defined by
the following mapping:
D(vectortj) = vectortj(IIDFD) = vectortprimej (16)
where IIDF is a k × k diagonal matrix such that
iIDFi,i = IDF(wi), vectortj is represented as a row vector,
and IDF(wi) isthe Inverse Document Frequency of
wi. The Domain Kernel is then defined by:
KD(ti,tj) = 〈D(ti),D(tj)〉radicalbig〈D(t
j),D(tj)〉〈D(ti),D(ti)〉
(17)
The final system for WSD results from a com-
bination of kernels that deal with syntagmatic and
paradigmatic aspects (i.e. PoS, collocations, bag of
words, domains), according to the following kernel
61
combination schema:
KC(xi,xj) =
nsummationdisplay
l=1
Kl(xi,xj)radicalbig
Kl(xj,xj)Kl(xi,xi) (18)
6 Evaluation
In this section we evaluate the Syntagmatic Kernel,
showing that it improves over the standard feature
extraction technique based on bigrams and trigrams
of words and PoS tags.
6.1 Experimental settings
We conducted the experiments on two lexical sam-
ple tasks (English and Italian) of the Senseval-3
competition (Mihalcea and Edmonds, 2004). In
lexical-sample WSD, after selecting some target
words, training data is provided as a set of texts.
For each text a given target word is manually anno-
tated with a sense from a predetermined set of pos-
sibilities. Table 2 describes the tasks by reporting
the number of words to be disambiguated, the mean
polysemy, and the dimension of training, test and
unlabeled corpora. Note that the organizers of the
English task did not provide any unlabeled material.
So for English we used a domain model built from
the training partition of the task (obviously skipping
the sense annotation), while for Italian we acquired
the DM from the unlabeled corpus made available
by the organizers.
#w pol # train # test # unlab
English 57 6.47 7860 3944 7860
Italian 45 6.30 5145 2439 74788
Table 2: Dataset descriptions.
6.2 Performance of the Syntagmatic Kernel
Table 3 shows the performance of the Syntagmatic
Kernel on both data sets. As baseline, we report
the result of a standard approach consisting on ex-
plicit bigrams and trigrams of words and PoS tags
around the words to be disambiguated (Yarowsky,
1994). The results show that the Syntagmatic Ker-
nel outperforms the baseline in any configuration
(hard/soft-matching). The soft-matching criteria
further improve the classification performance. It
is interesting to note that the Domain Proximity
methodology obtained better results than WordNet
Standard approach
English Italian
Bigrams and trigrams 67.3 51.0
Syntagmatic Kernel
Hard matching 67.7 51.9
Soft matching (WordNet) 67.3 51.3
Soft matching (Domain proximity) 68.5 54.0
Table 3: Performance (F1) of the Syntagmatic Ker-
nel.
Synonymy. The different results observed between
Italian and English using the Domain Proximity
soft-matching criterion are probably due to the small
size of the unlabeled English corpus.
In these experiments, the parameters n and λ are
optimized by cross-validation. For KnColl, we ob-
tained the best results with n = 2 and λ = 0.5. For
KnPoS, n = 3 and λ → 0. The domain cardinality kprime
was set to 50.
Finally, the global performance (F1) of the full
WSD system (see Section 5) on English and Italian
lexical sample tasks is 73.3 for English and 61.3 for
Italian. To our knowledge, these figures represent
the current state-of-the-art on these tasks.
7 Conclusion and Future Work
In this paper we presented the Syntagmatic Kernels,
i.e. a set of kernel functions that can be used to
model syntagmatic relations for a wide variety of
Natural Language Processing tasks. In addition, we
proposed twosoft-matching criteria forthesequence
analysis, which can be easily modeled by relax-
ing the constraints in a Gap-Weighted Subsequences
Kernel applied to local contexts of the word to be
analyzed. Experiments, performed on two lexical
sample Word Sense Disambiguation benchmarks,
show that our approach further improves the stan-
dard techniques usually adopted to deal with syntag-
matic relations. In addition, the Domain Proximity
soft-matching criterion allows us to define a semi-
supervised learning schema, improving the overall
results.
For the future, we plan to exploit the Syntagmatic
Kernel for a wide variety of Natural Language Pro-
cessing tasks, such as Entity Recognition and Re-
lation Extraction. In addition we are applying the
soft matching criteria here defined to Tree Kernels,
62
in order to take into account lexical variability in
parse trees. Finally, we are going to further improve
the soft-matching criteria here proposed by explor-
ing the use of entailment criteria for substitutability.
Acknowledgments
The authors were partially supported by the Onto-
Text Project, funded by the Autonomous Province
of Trento under the FUP-2004 research program.

References
N. Cancedda, E. Gaussier, C. Goutte, and J.M. Renders.
2003. Word-sequence kernels. Journal of Machine
Learning Research, 32(6):1059–1082.
N. Cristianini and J. Shawe-Taylor. 2000. An introduc-
tion to Support Vector Machines. Cambridge Univer-
sity Press.
A. Gliozzo, C. Giuliano, and R. Rinaldi. 2005a. Instance
filtering for entity recognition. ACM SIGKDD Explo-
rations,special IssueonNaturalLanguageProcessing
and Text Mining, 7(1):11–18,June.
A. Gliozzo, C. Giuliano, and C. Strapparava. 2005b. Do-
main kernels for word sense disambiguation. In Pro-
ceedingsof the 43rd annualmeetingof the Association
for Computational Linguistics (ACL-05), pages 403–
410, Ann Arbor, Michigan, June.
A. Gliozzo. 2005. Semantic Domains in Computa-
tional Linguistics. Ph.D. thesis, ITC-irst/University of
Trento.
H. Lodhi, J. Shawe-Taylor, N. Cristianini, and
C. Watkins. 2002. Text classification using string
kernels. Journal of Machine Learning Research,
2(3):419–444.
B. Magnini, C. Strapparava, G. Pezzulo, and A. Gliozzo.
2002. The role of domain information in word
sense disambiguation. Natural Language Engineer-
ing, 8(4):359–373.
R. Mihalcea and P. Edmonds, editors. 2004. Proceed-
ings of SENSEVAL-3: Third International Workshop
on the Evaluation of Systems for the Semantic Analy-
sis of Text, Barcelona, Spain, July.
G. Salton and M.H. McGill. 1983. Introduction to mod-
ern information retrieval. McGraw-Hill, New York.
C. Saunders, H. Tschach, and J. Shawe-Taylor. 2002.
Syllables and other string kernel extensions. In Pro-
ceedingsof19th InternationalConferenceonMachine
Learning (ICML02).
J. Shawe-Taylor and N. Cristianini. 2004. Kernel Meth-
ods for Pattern Analysis. Cambridge University Press.
C. Strapparava, C. Giuliano, and A. Gliozzo. 2004. Pat-
tern abstractionand term similarity for word sense dis-
ambiguation: IRST at Senseval-3. In Proceedings of
SENSEVAL-3: Third International Workshop on the
Evaluation of Systems for the Semantic Analysis of
Text, Barcelona, Spain, July.
D. Yarowsky. 1994. Decision lists for lexical ambiguity
resolution: Application to accent restoration in span-
ish and french. In Proceedings of the 32nd Annual
Meeting of the ACL, pages 88–95, Las Cruces, New
Mexico.
