The University of Alicante systems at SENSEVAL-3⁄
Sonia V´azquez, Rafael Romero
Armando Su´arez and Andr´es Montoyo
Dpt. of Software and Computing Systems
Universidad de Alicante, Spain
fsvazquez,romerog@dlsi.ua.es
farmando,montoyog@dlsi.ua.es
Iulia Nica and Antonia Mart´ı y
Dpt. of General Linguistics
Universidad de Barcelona, Spain
iulia@clic.fil.ub.es
amarti@ub.edu
Abstract
The DLSI-UA team is currently working on sev-
eral word sense disambiguation approaches, both
supervised and unsupervised. These approaches are
based on different ways to use both annotated and
unannotated data, and several resources generated
from or exploiting WordNet (Miller et al., 1993),
WordNet Domains, EuroWordNet (EWN) and addi-
tional corpora. This paper presents a view of differ-
ent system results for Word Sense Disambiguation
in different tasks of SENSEVAL-3.
1 Introduction
Word Sense Disambiguation (WSD) is an open re-
search field in Natural Language Processing (NLP).
The task of WSD consists in assigning the correct
sense to words in a particular context using an elec-
tronic dictionary as the source of words definitions.
This is a difficult problem that is receiving a great
deal of attention from the research community.
At the Second International Workshop on
Evaluating Word Sense Disambiguation Systems,
SENSEVAL-2, several supervised and unsupervised
systems took part. The more successful systems re-
lay on corpus-based and supervised learning meth-
ods. At SENSEVAL-2 the average level of accu-
racy achieved rounded 70%, which is insufficient
for such other NLP tasks as information retrieval,
machine translation, or question answering.
The DLSI-UA systems were applied to three
SENSEVAL-3 tasks: English all-words, English lex-
ical sample and Spanish Lexical Sample. Our sys-
tems use both corpus-based and knowledge-based
approaches: Maximum Entropy(ME) (Lau et al.,
1993; Berger et al., 1996; Ratnaparkhi, 1998) is
a corpus-based and supervised method based on
linguistic features; ME is the core of a bootstrap-
ping algorithm that we call re-training inspired
⁄ This paper has been partially supported by the Spanish
Government (CICyT) under project number TIC-2003-7180
and the Valencia Government (OCyT) under project number
CTIDIB-2002-151
by co-training (Blum and Mitchell, 1998); Rele-
vant Domains (RD) (Montoyo et al., 2003) is a
resource built from WordNet Domains (Magnini
and Cavaglia, 2000) that is used in an unsuper-
vised method that assigns domain and sense la-
bels; Specification Marks(SP) (Montoyo and Palo-
mar, 2000) exploits the relations between synsets
stored in WordNet (Miller et al., 1993) and does not
need any training corpora; Commutative Test (CT)
(Nica et al., 2003), based on the Sense Discrimi-
nators device derived from EWN (Vossen, 1998),
disambiguates nouns inside their syntactic patterns,
with the help of information extracted from raw cor-
pus.
A resume of which methods and how were used
in which SENSEVAL-3 tasks is shown in Table 1.
DLSI-UA Method Combined
Systems Results
ALL-NOSU RD No
LS-ENG-SU Re-t No
LS-ENG-
NOSU
RD No
LS-SPA-SU ME+Re-t No
LS-SPA-NOSU SM + ME Nouns: SM
Verbs and adj.: ME
LS-SPA- Pattern-Nica Nouns: SM
PATTERN + ME Verbs and adj.: ME
Table 1: DLSI-UA Systems at SENSEVAL-3
Most of these methods are relatively new and our
goal when participating at SENSEVAL-3 is to evalu-
ate for the first time such approaches. At the mo-
ment of writing this paper we can conclude that
these are promising contributions in order to im-
prove current WSD systems.
In the following section each method is described
briefly. Then, details of how the SENSEVAL-3 train
and testing data were processed are shown. Next,
the scores obtained by each system are explained.
Finally, some conclusions and future work are pre-
sented.
                                             Association for Computational Linguistics
                        for the Semantic Analysis of Text, Barcelona, Spain, July 2004
                 SENSEVAL-3: Third International Workshop on the Evaluation of Systems
2 Methods and Algorithms
In this section we describe the set of methods and
techniques that we used to build the four systems
that had participated in SENSEVAL-3.
2.1 Re-training and Maximum Entropy
In this section, we describe our bootstrapping
method, which we call re-training. Our method
is derived from the co-training method. Our re-
training system is based on two different views of
the data (as is also the case for co-training), de-
fined using several groups of features from those de-
scribed in Figure 1, with several filters that ensure a
high confidence sense labelling.
† the target word itself
† lemmas of content-words at positions §1, §2, §3
† words at positions §1, §2,
† words at positions §1, §2, §3
† content-words at positions §1, §2, §3
† POS-tags of words at positions §1, §2, §3
† lemmas of collocations at positions (¡2;¡1),
(¡1;+1), (+1;+2)
† collocations at positions (¡2;¡1), (¡1;+1),
(+1;+2)
† lemmas of nouns at any position in context, occur-
ring at least m% times with a sense
† grammatical relation of the target word
† the word that the target word depends on
† the verb that the target word depends on
† the target word belongs to a multi-word, as identi-
fied by the parser
† ANPA codes (Spanish only)
† IPTC codes (Spanish only)
Figure 1: Features Used for the Supervised Learn-
ing
These two views consist of two weak ME learn-
ers, based on different sets of linguistic features,
for every possible sense of a target word. We de-
cided to use ME as the core of our bootstrapping
method because it has shown to be competitive in
WSD when compared to other machine learning ap-
proaches (Su´arez and Palomar, 2002; M`arquez et
al., 2003).
The main difference with respect co-training is
that the two views are used in parallel in order to
get a consensus of what label to assign to a particu-
lar context. Additional filters will ultimately deter-
mine which contexts will then be added to the next
training cycle.
Re-training performs several binary partial train-
ings with positive and negative examples for each
sense. These classifications must be merged in a
unique label for such contexts with enough evidence
of being successfully classified. This “evidence” re-
lies on values of probability assigned by the ME
module to positive and negative labels, and the fact
that the unlabeled example is classified as positive
for a unique sense only. The set of new labeled ex-
amples feeds the training corpora of the next itera-
tion with positive and negative examples. The stop-
ping criteria is a certain number of iterations or the
failure to obtain new examples from the unlabeled
corpus.
2.2 Specification Marks
Specification Marks is an unsupervised WSD
method over nouns. Its context is the group of words
that co-occur with the word to be disambiguated in
the sentence and their relationship to the noun to
be disambiguated. The disambiguation is resolved
with the use of the WordNet lexical knowledge base.
The underlying hypothesis of the method we
present here is that the higher the similarity between
two words, the larger the amount of information
shared by two concepts. In this case, the informa-
tion commonly shared by two concepts is indicated
by the most specific concept that subsumes them
both in the taxonomy.
The input for the WSD module is a group of
nouns W =fw1;w2;:::;wng in a context. Each
word wi is sought in WordNet, each having an asso-
ciated set of possible senses Si =fSi1;Si2;:::;Sing,
and each sense having a set of concepts in the IS-A
taxonomy (hypernymy/hyponymy relations). First,
the common concept to all the senses of the words
that form the context is gathered. This concept is
marked by the initial specification mark (ISM). If
this initial specification mark does not resolve the
ambiguity of the word, we then descend through
the WordNet hierarchy, from one level to another,
assigning new specification marks. The number of
concepts contained within the subhierarchy is then
counted for each specification mark. The sense that
corresponds to the specification mark with the high-
est number of words is the one chosen as the sense
disambiguated within the given context
We define six heuristics for our system: Heuris-
tic of Hypernym, Heuristic of Definition, Heuristic
of Common Specification Mark, Heuristic of Gloss
Hypernym, Heuristic of Hyponym and Heuristic of
Gloss Hyponym.
2.3 Relevant Domains
This is an unsupervised WSD method based on the
WordNet Domains lexical resource (Magnini and
Cavaglia, 2000). The underlying working hypoth-
esis is that domain labels, such as ARCHITEC-
TURE, SPORT and MEDICINE provide a natural
way to establish semantic relations between word
senses, that can be used during the disambiguation
process. This resource has already been used on
Word Sense Disambiguation (Magnini and Strappa-
rava, 2000), but it has not made use of glosses infor-
mation. So our approach make use of a new lexical
resource obtained from glosses information named
Relevant Domains.
First step is to obtain the Relevant Domains re-
source from WordNet glosses. For this task is nec-
essary a previous part-of-speech tagging of Word-
Net glosses (each gloss has associated a domain la-
bel). So we extract all nouns, verbs, adjectives and
adverbs from glosses and assign them their associ-
ated domain label. With this information and using
the Association Ratio formula (w=word,D=domain
label), in (1), we obtain the Relevant Domains re-
source.
AR(w;D) = Pr(wjD)log2Pr(wjD)Pr(w) (1)
The final result is for each word, a set of domain
labels sorted by Association Ratio, for example,
for word plant” its Relevant Domains are: genetics
0.177515, ecology 0.050065, botany 0.038544 . . . .
Once obtained Relevant Domains the disam-
biguation process is carried out. We obtain from
the text source the context words that co-occur with
the word to be disambiguated (context could be
a sentence or a window of words). We obtain a
context vector from Relevant Domains and context
words (in case of repeated domain labels, they are
weighted). Furthermore we need a sense vector ob-
tained in the same way as context vector from words
of glosses of each word sense. We select the cor-
rect sense using the cosine measure between con-
text vector and sense vectors. So the selected sense
is that for which the cosine with the context vector
is closer to one.
2.4 Pattern-Nica
This is an unsupervised method only for Spanish
nouns exploiting both EuroWordNet and corpus.
In this method we adopt a different approach to
WSD: the occurrence to be disambiguated is con-
sidered not separately, but integrated into a syn-
tactic pattern, and its disambiguation is carried
out in relation to this pattern. A syntactic pat-
tern is a triplet X-R-Y, formed by two lexical con-
tent units X and Y and an eventual relational el-
ement R, which corresponds to a syntactic rela-
tion between X and Y. Examples: [X=canal-noun
R=de-preposition Y=televisi´on-noun], [X=pasaje-
noun R=Ø Y=a´ereo-adjective]. The strategy is
based on the hypothesis that syntactic patterns in
which an ambiguous occurrence participates have
decisive influence on its meaning. We also assume
that inside a syntactic pattern a word will tend to
have the same sense: the ”quasi one sense per syn-
tactic pattern” hypothesis. The method works as fol-
lows:
Step 1, the identification of the syntactic patterns
of the ambiguous occurrence;
Step 2, the extraction of information related to it:
from corpus and from the sentential context;
Step 3, the application of the WSD algorithm on
the different information previously obtained;
Step 4, the final sense assignment by combining
the partial sense proposals from step 3.
For step 1, we POS-tag the test sentence and ex-
tract the sequences that correspond to previously de-
fined combinations of POS tags. We only kept the
patterns with frequency 5 or superior.
In step 2, we use a search corpus previously POS-
tagged. For every syntactic pattern of the ambigu-
ous occurrence X, we obtain from corpus two sets of
words: the substitutes of X into the pattern (S1) and
the nouns that co-occur with the pattern in any sen-
tence from the corpus (S2), In both cases, we keep
only the element with frequency 5 or superior.
We perform step 3 by means of the heuristics de-
fined by the Commutative Test (CT) algorithm ap-
plied on each set from 2. The algorithm is related
to the Sense Discriminators (SD) lexical device, an
adaptation of the Spanish WordNet, consisting in a
set of sense discriminators for every sense of a given
noun in WordNet. The Commutative Test algorithm
lays on the hypothesis that if an ambiguous occur-
rence can be substituted in a syntactic pattern by a
sense discriminator, then it can have the sense cor-
responding to that sense discriminator.
For step 4, we first obtain a sense assignment in
relation with each syntactic pattern, by intersecting
the sense proposals from the two heuristics corre-
sponding to a pattern; then we choose the most fre-
quent sense between those proposed by the differ-
ent syntactic patterns; finally, if there are more final
proposed senses, we choose the most frequent sense
on the base of sense numbers in WordNet.
The method we propose for nouns requires only a
large corpus, a minimal preprocessing phase (POS-
tagging) and very little grammatical knowledge, so
it can easily be adapted to other languages. Sense
assignment is performed exploiting information ex-
tracted from corpus, thus we make an intensive use
of sense untagged corpora for the disambiguation
process.
3 Tasks Processing
At this point we explain for each task the systems
processing. The results of each system are shown in
Table2:
DLSI-UA Systems Precision Recall
LS-SPA-SU 84% 84%
LS-ENG-SU 82% 32%
ALL-NOSU 34% 28%
LS-ENG-NOSU 32% 20%
LS-SPA-NOSU 62% 62%
LS-SPA-PATTERN 84% 47%
Table 2: Results at SENSEVAL-3
3.1 DLSI-UA-LS-SPA-SU
Our system, based on re-training and maximum en-
tropy methods, processed both sense labelled and
unlabelled Spanish Lexical Sample data in three
consecutive steps:
Step 1, analyzing the train corpus: words which
most frequent sense is under 70% were selected.
For each one of these words, each feature was used
in a 3-fold cross-validation in order to determine the
best set of features for re-training.
Step 2, feeding training corpora: for these se-
lected words, based on the results of the previous
step, each training corpus was enriched with new
examples from the unlabelled data using re-training.
Step 3, classifying the test data: for the selected
words, re-training was used again to obtain a first set
of answers with, a priori, a label with a high level of
confidence; the remaining contexts that re-training
could not classify were processed with the ME sys-
tem using a unique set of features for all words.
The lemmatization and POS information supplied
into the SENSEVAL-3 Spanish data were the infor-
mation used for defining the features of the system.
0ur system obtained an accuracy of 0.84 for the
Spanish lexical sample task. Unfortunately, a shal-
low analysis of the answers revealed that the UA.5
system performed slightly worse than if only the ba-
sic ME system were used1. This fact means that the
new examples extracted from the unlabelled data in-
troduced too much noise into the classifiers. Be-
cause this anomalous behavior was present only on
some words, a complete study of such new exam-
ples must be done. Probably, the number of itera-
tions done by re-training over unlabelled data were
too low and the enrichment of the training corpora
not large enough.
1The ME system, without using re-training, has not com-
peted at SENSEVAL-3: our own scoring of these set of answers
reported an accuracy of 0.856
3.2 DLSI-UA-LS-ENG-SU
In the English Lexical Sample task our system goal
was to prove that the re-training method ensures a
high level of precision.
By means of a 3-fold cross-validation of the train
data, the features were ordered from higher to lower
precision. Based on this information, four execu-
tions of re-training over the test data were done with
different selections of features for the two views of
the method. Each execution feed the learning cor-
pora of the next one with new examples, those that
re-training considered as the most probably correct.
For this system Minipar parser (Lin, 1998)was
used to properly add syntactic information to the
training and testing data.
Almost 40% of the test contexts were la-
belled by our system, obtaining these scores (for
“fine-grained” and “coarse-grained”, respectively):
0.782/0.828 precision and 0.310/0.329 recall. In our
opinion, such results must be interpreted as very
positive because the re-training method is able to
satisfy a high level of precision if the parameters of
the system are correctly set.
3.3 DLSI-UA-ALL-NOSU and
DLSI-UA-LS-ENG-NOSU
In the English All Words and English Lexical Sam-
ple tasks RD system was performed with informa-
tion obtained from Relevant Domains resource us-
ing for the disambiguation process all the 165 do-
main labels.
For All Words task we used as input information
all nouns, verbs, adjectives and adverbs present in
a 100 words window around the word to be disam-
biguated. So our system obtained a 34% of preci-
sion and a reduced recall around 28%.
For Lexical Sample task we used all nouns, verbs,
adjectives and adverbs present in the context of each
instance obtaining around 32% precision.
We obtained a reduced precision due to we use all
the domains label hierarchy. In some experiments
realized on SENSEVAL-2 data, our system obtained
a more high precision when grouping domains into
the first three levels. Therefore we expect with re-
ducing the number of domains labels, an improve-
ment on precision.
3.4 DLSI-UA-LS-SPA-NOSU
We used a combined system for Spanish Lexical
Sample task, using the SM method for disambiguat-
ing nouns and the ME method for disambiguating
verbs and adjectives. We obtained around 62% pre-
cision and a 62% recall.
3.5 DLSI-UA-LS-SPA-PATTERN
Our goal when participating in this task was to
demonstrate that the applying of syntactic patterns
to WSD maintains high levels of precision.
In this task we used also a combined system for
Spanish Lexical Sample task, using Pattern-Nica
method for disambiguating nouns and ME method
for disambiguating verbs and adjectives. We ob-
tained around 84% precision and a 47% recall.
4 Conclusions
The supervised systems for the English and Span-
ish lexical sample tasks are very competitive. Al-
though the processing of the train and test data was
different for each task, both systems rely on re-
training, a bootstrapping method, that uses a max-
imum entropy-based WSD module.
The results for the English task prove that re-
training is capable of maintaining a high level of
precision. Nevertheless, for the Spanish task, al-
though the scores achieved were excellent, the sys-
tem must be redesigned in order to improve the clas-
sifiers.
The re-training method is a proposal that we are
trying to incorporate into text retrieval and ques-
tion answering systems that could take advantage of
sense disambiguation of a subset of words.
The unsupervised systems presented here does
not appear to be sufficient for a stand-alone WSD
solution. Wether these methods can be combined
with other supervised methods to improve their re-
sults requires further investigation.

References
Adam L. Berger, Stephen A. Della Pietra, and Vin-
cent J. Della Pietra. 1996. A maximum entropy
approach to natural language processing. Com-
putational Linguistics, 22(1):39–71.
Avrim Blum and Tom Mitchell. 1998. Combining
labeled and unlabeled data with co-training. In
Proceedings of the 11th Annual Conference on
Computational Learning Theory, pages 92–100,
Madison, Wisconsin, July. ACM Press.
R. Lau, R. Rosenfeld, and S. Roukos. 1993.
Adaptative statistical language modeling using
the maximum entropy principle. In Proceedings
of the Human Language Technology Workshop,
ARPA.
Dekang Lin. 1998. Dependency-based evaluation
of minipar. In Proceedings of the Workshop on
the Evaluation of Parsing Systems, First Inter-
national Conference on Language Resources and
Evaluation, Granada, Spain.
Bernardo Magnini and Gabriela Cavaglia. 2000.
Integrating Subject Field Codes into WordNet. In
M. Gavrilidou, G. Crayannis, S. Markantonatu,
S. Piperidis, and G. Stainhaouer, editors, Pro-
ceedings of LREC-2000, Second International
Conference on Language Resources and Evalu-
ation, pages 1413–1418, Athens, Greece.
Bernardo Magnini and C. Strapparava. 2000. Ex-
periments in Word Domain Disambiguation for
Parallel Texts. In Proceedings of the ACL Work-
shop on Word Senses and Multilinguality, Hong
Kong, China.
Llu´ıs M`arquez, Fco. Javier Raya, John Car-
roll, Diana McCarthy, Eneko Agirre, David
Mart´ınez, Carlo Strapparava, and Alfio
Gliozzo. 2003. Experiment A: several all-words
WSD systems for English. Technical Report
WP6.2, MEANING project (IST-2001-34460),
http://www.lsi.upc.es/»nlp/meaning/meaning.html.
George A. Miller, Richard Beckwith, Christiane
Fellbaum, Derek Gross, and Katherine J. Miller.
1993. Five Papers on WordNet. Special Issue of
the International journal of lexicography, 3(4).
Andr´es Montoyo and Manuel Palomar. 2000. Word
Sense Disambiguation with Specification Marks
in Unrestricted Texts. In Proceedings of 11th In-
ternational Workshop on Database and Expert
Systems Applications (DEXA 2000), pages 103–
107, Greenwich, London, UK, September. IEEE
Computer Society.
Andr´es Montoyo, Sonia V´azquez, and German
Rigau. 2003. M´etodo de desambiguaci´on l´exica
basada en el recurso l´exico Dominios Rele-
vantes. Procesamiento del Lenguaje Natural, 30,
september.
Iulia Nica, Antonia Mart´ı, and Andr´es Mon-
toyo. 2003. Colaboraci´on entre informaci´on
paradigm´atica y sintagm´atica en la desam-
biguaci´on sem´antica autom´atica. XIX Congreso
de la SEPLN 2003.
Adwait Ratnaparkhi. 1998. Maximum Entropy
Models for Natural Language Ambiguity Resolu-
tion. Ph.D. thesis, University of Pennsylvania.
Armando Su´arez and Manuel Palomar. 2002.
A maximum entropy-based word sense disam-
biguation system. In Hsin-Hsi Chen and Chin-
Yew Lin, editors, Proceedings of the 19th In-
ternational Conference on Computational Lin-
guistics, pages 960–966, Taipei, Taiwan, August.
COLING 2002.
Piek Vossen. 1998. EuroWordNet: Building a Mul-
tilingual Database with WordNets for European
Languages. The ELRA Newsletter, 3(1).
