Unsupervised Domain Relevance Estimation
for Word Sense Disambiguation
Al o Gliozzo and Bernardo Magnini and Carlo Strapparava
ITC-irst, Istituto per la Ricerca Scienti ca e Tecnologica, I-38050 Trento, ITALY
fgliozzo, magnini, strappag@itc.it
Abstract
This paper presents Domain Relevance Estima-
tion (DRE), a fully unsupervised text categorization
technique based on the statistical estimation of the
relevance of a text with respect to a certain cate-
gory. We use a pre-de ned set of categories (we
call them domains) which have been previously as-
sociated to WORDNET word senses. Given a cer-
tain domain, DRE distinguishes between relevant
and non-relevant texts by means of a Gaussian Mix-
ture model that describes the frequency distribution
of domain words inside a large-scale corpus. Then,
an Expectation Maximization algorithm computes
the parameters that maximize the likelihood of the
model on the empirical data.
The correct identi cation of the domain of the
text is a crucial point for Domain Driven Dis-
ambiguation, an unsupervised Word Sense Disam-
biguation (WSD) methodology that makes use of
only domain information. Therefore, DRE has been
exploited and evaluated in the context of a WSD
task. Results are comparable to those of state-of-
the-art unsupervised WSD systems and show that
DRE provides an important contribution.
1 Introduction
A fundamental issue in text processing and under-
standing is the ability to detect the topic (i.e. the do-
main) of a text or of a portion of it. Indeed, domain
detection allows a number of useful simpli cations
in text processing applications, such as, for instance,
in Word Sense Disambiguation (WSD).
In this paper we introduce Domain Relevance Es-
timation (DRE) a fully unsupervised technique for
domain detection. Roughly speaking, DRE can be
viewed as a text categorization (TC) problem (Se-
bastiani, 2002), even if we do not approach the
problem in the standard supervised setting requir-
ing category labeled training data. In fact, recently,
unsupervised approaches to TC have received more
and more attention in the literature (see for example
(Ko and Seo, 2000).
We assume a pre-de ned set of categories, each
de ned by means of a list of related terms. We
call such categories domains and we consider them
as a set of general topics (e.g. SPORT, MEDICINE,
POLITICS) that cover the main disciplines and ar-
eas of human activity. For each domain, the list
of related words is extracted from WORDNET DO-
MAINS (Magnini and Cavagli a, 2000), an extension
of WORDNET in which synsets are annotated with
domain labels. We have identi ed about 40 domains
(out of 200 present in WORDNET DOMAINS) and
we will use them for experiments throughout the pa-
per (see Table 1).
DRE focuses on the problem of estimating a de-
gree of relatedness of a certain text with respect to
the domains in WORDNET DOMAINS.
The basic idea underlying DRE is to combine the
knowledge in WORDNET DOMAINS and a proba-
bilistic framework which makes use of a large-scale
corpus to induce domain frequency distributions.
Speci cally, given a certain domain, DRE considers
frequency scores for both relevant and non-relevant
texts (i.e. texts which introduce noise) and represent
them by means of a Gaussian Mixture model. Then,
an Expectation Maximization algorithm computes
the parameters that maximize the likelihood of the
empirical data.
DRE methodology originated from the effort to
improve the performance of Domain Driven Dis-
ambiguation (DDD) system (Magnini et al., 2002).
DDD is an unsupervised WSD methodology that
makes use of only domain information. DDD as-
signes the right sense of a word in its context com-
paring the domain of the context to the domain of
each sense of the word. This methodology exploits
WORDNET DOMAINS information to estimate both
Domain #Syn Domain #Syn Domain #Syn
Factotum 36820 Biology 21281 Earth 4637
Psychology 3405 Architecture 3394 Medicine 3271
Economy 3039 Alimentation 2998 Administration 2975
Chemistry 2472 Transport 2443 Art 2365
Physics 2225 Sport 2105 Religion 2055
Linguistics 1771 Military 1491 Law 1340
History 1264 Industry 1103 Politics 1033
Play 1009 Anthropology 963 Fashion 937
Mathematics 861 Literature 822 Engineering 746
Sociology 679 Commerce 637 Pedagogy 612
Publishing 532 Tourism 511 Computer Science 509
Telecommunication 493 Astronomy 477 Philosophy 381
Agriculture 334 Sexuality 272 Body Care 185
Artisanship 149 Archaeology 141 Veterinary 92
Astrology 90
Table 1: Domain distribution over WORDNET synsets.
the domain of the textual context and the domain of
the senses of the word to disambiguate. The former
operation is intrinsically an unsupervised TC task,
and the category set used has to be the same used
for representing the domain of word senses.
Since DRE makes use of a  xed set of target cat-
egories (i.e. domains) and since a document col-
lection annotated with such categories is not avail-
able, evaluating the performance of the approach is
a problem in itself. We have decided to perform an
indirect evaluation using the DDD system, where
unsupervised TC plays a crucial role.
The paper is structured as follows. Section 2
introduces WORDNET DOMAINS, the lexical re-
source that provides the underlying knowledge to
the DRE technique. In Section 3 the problem of es-
timating domain relevance for a text is introduced.
In particular, Section 4 brie y sketchs the WSD sys-
tem used for evaluation. Finally, Section 5 describes
a number of evaluation experiments we have carried
out.
2 Domains, WORDNET and Texts
DRE heavily relies on domain information as its
main knowledge source. Domains show interesting
properties both from a lexical and a textual point of
view. Among these properties there are: (i) lexi-
cal coherence, since part of the lexicon of a text is
composed of words belonging to the same domain;
(ii) polysemy reduction, because the potential am-
biguity of terms is sensibly lower if the domain of
the text is speci ed; and (iii) lexical identi ability
of text’s domain, because it is always possible to as-
sign one or more domains to a given text by consid-
ering term distributions in a bag-of-words approach.
Experimental evidences of these properties are re-
ported in (Magnini et al., 2002).
In this section we describe WORDNET DO-
MAINS1 (Magnini and Cavagli a, 2000), a lexical re-
source that attempts a systematization of relevant
aspects in domain organization and representation.
WORDNET DOMAINS is an extension of WORD-
NET (version 1.6) (Fellbaum, 1998), in which each
synset is annotated with one or more domain la-
bels, selected from a hierarchically organized set of
about two hundred labels. In particular, issues con-
cerning the  completeness of the domain set, the
 balancing among domains and the  granularity 
of domain distinctions, have been addressed. The
domain set used in WORDNET DOMAINS has been
extracted from the Dewey Decimal Classi cation
(Comaroni et al., 1989), and a mapping between the
two taxonomies has been computed in order to en-
sure completeness. Table 2 shows how the senses
for a word (i.e. the noun bank) have been associated
to domain label; the last column reports the number
of occurrences of each sense in Semcor2.
Domain labeling is complementary to informa-
tion already present in WORDNET. First of all,
a domain may include synsets of different syn-
tactic categories: for instance MEDICINE groups
together senses from nouns, such as doctor#1
and hospital#1, and from verbs, such as
operate#7. Second, a domain may include
senses from different WORDNET sub-hierarchies
(i.e. deriving from different  unique beginners or
from different  lexicographer  les ). For example,
SPORT contains senses such as athlete#1, deriv-
ing from life form#1, game equipment#1
from physical object#1, sport#1
1WORDNET DOMAINS is freely available at
http://wndomains.itc.it
2SemCor is a portion of the Brown corpus in which words
are annotated with WORDNET senses.
Sense Synset and Gloss Domains Semcor frequencies
#1 depository  nancial institution, bank, banking con-
cern, banking company (a  nancial institution. . . )
ECONOMY 20
#2 bank (sloping land. . . ) GEOGRAPHY, GEOLOGY 14
#3 bank (a supply or stock held in reserve. . . ) ECONOMY -
#4 bank, bank building (a building. . . ) ARCHITECTURE, ECONOMY -
#5 bank (an arrangement of similar objects...) FACTOTUM 1
#6 savings bank, coin bank, money box, bank (a con-
tainer. . . )
ECONOMY -
#7 bank (a long ridge or pile. . . ) GEOGRAPHY, GEOLOGY 2
#8 bank (the funds held by a gambling house. . . ) ECONOMY, PLAY
#9 bank, cant, camber (a slope in the turn of a road. . . ) ARCHITECTURE -
#10 bank (a  ight maneuver. . . ) TRANSPORT -
Table 2: WORDNET senses and domains for the word  bank .
from act#2, and playing field#1 from
location#1.
Domains may group senses of the same word
into thematic clusters, which has the important side-
effect of reducing the level of ambiguity when we
are disambiguating to a domain. Table 2 shows
an example. The word  bank has ten differ-
ent senses in WORDNET 1.6: three of them (i.e.
bank#1, bank#3 and bank#6) can be grouped
under the ECONOMY domain, while bank#2 and
bank#7 both belong to GEOGRAPHY and GEOL-
OGY. Grouping related senses is an emerging topic
in WSD (see, for instance (Palmer et al., 2001)).
Finally, there are WORDNET synsets that do not
belong to a speci c domain, but rather appear in
texts associated with any domain. For this reason,
a FACTOTUM label has been created that basically
includes generic synsets, which appear frequently
in different contexts. Thus the FACTOTUM domain
can be thought of as a  placeholder for all other
domains.
3 Domain Relevance Estimation for Texts
The basic idea of domain relevance estimation for
texts is to exploit lexical coherence inside texts.
From the domain point of view lexical coherence
is equivalent to domain coherence, i.e. the fact that
a great part of the lexicon inside a text belongs to
the same domain.
From this observation follows that a simple
heuristic to approach this problem is counting the
occurrences of domain words for every domain in-
side the text: the higher the percentage of domain
words for a certain domain, the more relevant the
domain will be for the text. In order to perform this
operation the WORDNET DOMAINS information is
exploited, and each word is assigned a weighted list
of domains considering the domain annotation of
its synsets. In addition, we would like to estimate
the domain of the text locally. Local estimation
of domain relevance is very important in order to
take into account domain shifts inside the text. The
methodology used to estimate domain frequency is
described in subsection 3.1.
Unfortunately the simple local frequency count
is not a good domain relevance measure for sev-
eral reasons. The most signi cant one is that very
frequent words have, in general, many senses be-
longing to different domains. When words are used
in texts, ambiguity tends to disappear, but it is not
possible to assume knowing their actual sense (i.e.
the sense in which they are used in the context) in
advance, especially in a WSD framework. The sim-
ple frequency count is then inadequate for relevance
estimation: irrelevant senses of ambiguous words
contribute to augment the  nal score of irrelevant
domains, introducing noise. The level of noise is
different for different domains because of their dif-
ferent sizes and possible differences in the ambigu-
ity level of their vocabularies.
In subsection 3.2 we propose a solution for that
problem, namely the Gaussian Mixture (GM) ap-
proach. This constitutes an unsupervised way to es-
timate how to differentiate relevant domain infor-
mation in texts from noise, because it requires only
a large-scale corpus to estimate parameters in an
Expectation Maximization (EM) framework. Using
the estimated parameters it is possible to describe
the distributions of both relevant and non-relevant
texts, converting the DRE problem into the problem
of estimating the probability of each domain given
its frequency score in the text, in analogy to the
bayesian classi cation framework. Details about
the EM algorithm for GM model are provided in
subsection 3.3.
3.1 Domain Frequency Score
Let t 2 T , be a text in a corpus T composed by a list
of words wt1;::: ;wtq. Let D = fD1;D2;:::;Ddg be
the set of domains used. For each domain Dk the
domain  frequency score is computed in a window
of c words around wtj. The domain frequency score
is de ned by formula (1).
F(Dk; t; j) =
j+cX
i=j c
Rword(Dk; wti)G(i; j; ( c2)2) (1)
where the weight factor G(x; ; 2) is the density
of the normal distribution with mean  and standard
deviation  at point x and Rword(D;w) is a function
that return the relevance of a domain D for a word
w (see formula 3). In the rest of the paper we use the
notation F(Dk;t) to refer to F(Dk;t;m), where m
is the integer part of q=2 (i.e. the  central point of
the text - q is the text length).
Here below we see that the information contained
in WORDNET DOMAINS can be used to estimate
Rword(Dk;w), i.e. domain relevance for the word
w, which is derived from the domain relevance of
the synsets in which w appears.
As far as synsets are concerned, domain informa-
tion is represented by the function Dom : S )
P(D)3 that returns, for each synset s 2 S, where
S is the set of synsets in WORDNET DOMAINS, the
set of the domains associated to it. Formula (2) de-
 nes the domain relevance estimation function (re-
member that d is the cardinality of D):
Rsyn(D; s) =
8<
:
1=jDom(s)j : if D 2 Dom(s)
1=d : if Dom(s) = fFACTOTUMg
0 : otherwise
(2)
Intuitively, Rsyn(D;s) can be perceived as an es-
timated prior for the probability of the domain given
the concept, as expressed by the WORDNET DO-
MAINS annotation. Under these settings FACTO-
TUM (generic) concepts have uniform and low rel-
evance values for each domain while domain con-
cepts have high relevance values for a particular do-
main.
The de nition of domain relevance for a word is
derived directly from the one given for concepts. In-
tuitively a domain D is relevant for a word w if D
is relevant for one or more senses c of w. More
formally let V = fw1;w2;:::wjV jg be the vocab-
ulary, let senses(w) = fsjs 2 S;s is a sense of
wg (e.g. any synset in WORDNET containing the
word w). The domain relevance function for a word
R : D  V ) [0;1] is de ned as follows:
Rword(Di; w) = 1jsenses(w)j
X
s2senses(w)
Rsyn(Di; s) (3)
3P(D) denotes the power set of D
3.2 The Gaussian Mixture Algorithm
As explained at the beginning of this section, the
simple local frequency count expressed by formula
(1) is not a good domain relevance measure.
In order to discriminate between noise and rel-
evant information, a supervised framework is typ-
ically used and signi cance levels for frequency
counts are estimated from labeled training data. Un-
fortunately this is not our case, since no domain
labeled text corpora are available. In this section
we propose a solution for that problem, namely the
Gaussian Mixture approach, that constitutes an un-
supervised way to estimate how to differentiate rel-
evant domain information in texts from noise. The
Gaussian Mixture approach consists of a parameter
estimation technique based on statistics of word dis-
tribution in a large-scale corpus.
The underlying assumption of the Gaussian Mix-
ture approach is that frequency scores for a cer-
tain domain are obtained from an underlying mix-
ture of relevant and non-relevant texts, and that the
scores for relevant texts are signi cantly higher than
scores obtained for the non-relevant ones. In the
corpus these scores are distributed according to two
distinct components. The domain frequency distri-
bution which corresponds to relevant texts has the
higher value expectation, while the one pertaining to
non relevant texts has the lower expectation. Figure
1 describes the probability density function (PDF)
for domain frequency scores of the SPORT domain
estimated on the BNC corpus4 (BNC-Consortium,
2000) using formula (1). The  empirical PDF,
describing the distribution of frequency scores eval-
uated on the corpus, is represented by the continu-
ous line.
From the graph it is possible to see that the empir-
ical PDF can be decomposed into the sum of two
distributions, D = SPORT and D =  non-SPORT .
Most of the probability is concentrated on the left,
describing the distribution for the majority of non
relevant texts; the smaller distribution on the right
is assumed to be the distribution of frequency scores
for the minority of relevant texts.
Thus, the distribution on the left describes the
noise present in frequency estimation counts, which
is produced by the impact of polysemous words
and of occasional occurrences of terms belonging
to SPORT in non-relevant texts. The goal of the
technique is to estimate parameters describing the
distribution of the noise along texts, in order to as-
4The British National Corpus is a very large (over 100 mil-
lion words) corpus of modern English, both spoken and written.
0
50
100
150
200
0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04
Density
Non-relevant
Relevant
F(D, t)
density function
Figure 1: Gaussian mixture for D = SPORT
sociate high relevance values only to relevant fre-
quency scores (i.e. frequency scores that are not re-
lated to noise). It is reasonable to assume that such
noise is normally distributed because it can be de-
scribed by a binomial distribution in which the prob-
ability of the positive event is very low and the num-
ber of events is very high. On the other hand, the
distribution on the right is the one describing typical
frequency values for relevant texts. This distribution
is also assumed to be normal.
A probabilistic interpretation permits the evalu-
ation of the relevance value R(D;t;j) of a certain
domain D for a new text t in a position j only by
considering the domain frequency F(D;t;j). The
relevance value is de ned as the conditional prob-
ability P(DjF(D;t;j)). Using Bayes theorem we
estimate this probability by equation (4).
R(D; t; j) = P(DjF(D; t; j)) = (4)
= P(F(D; t; j)jD)P(D)P(F(D; t; j)jD)P(D) + P(F(D; t; j)jD)P(D)
where P(F(D;t;j)jD) is the value of the PDF
describing D calculated in the point F(D;t;j),
P(F(D;t;j)jD) is the value of the PDF describ-
ing D, P(D) is the area of the distribution describ-
ing D and P(D) is the area of the distribution for
D.
In order to estimate the parameters describing the
PDF of D and D the Expectation Maximization
(EM) algorithm for the Gaussian Mixture Model
(Redner and Walker, 1984) is exploited. Assuming
to model the empirical distribution of domain fre-
quencies using a Gaussian mixture of two compo-
nents, the estimated parameters can be used to eval-
uate domain relevance by equation (4).
3.3 The EM Algorithm for the GM model
In this section some details about the algorithm for
parameter estimation are reported.
It is well known that a Gaussian mixture (GM)
allows to represent every smooth PDF as a linear
combination of normal distributions of the type in
formula 5
p(xj ) =
mX
j=1
ajG(x; j; j) (5)
with
aj  0 and
mX
j=1
aj = 1 (6)
and
G(x; ; ) = 1p2  e (x  )
2
2 2 (7)
and  = ha1; 1; 1;::: ;am; m; mi is a pa-
rameter list describing the gaussian mixture. The
number of components required by the Gaussian
Mixture algorithm for domain relevance estimation
is m = 2.
Each component j is univocally determined by its
weight aj, its mean  j and its variance  j. Weights
represent also the areas of each component, i.e. its
total probability.
The Gaussian Mixture algorithm for domain rele-
vance estimation exploits a Gaussian Mixture to ap-
proximate the empirical PDF of domain frequency
scores. The goal of the Gaussian Mixture algorithm
is to  nd the GM that maximize the likelihood on
the empirical data, where the likelihood function is
evaluated by formula (8).
L(T ;D; ) =
Y
t2T
p(F(D;t)j ) (8)
More formally, the EM algorithm for GM models
explores the space of parameters in order to  nd the
set of parameters  such that the maximum likeli-
hood criterion (see formula 9) is satis ed.
 D = argmax
 0
L(T ;D; 0) (9)
This condition ensures that the obtained model
 ts the original data as much as possible. Estima-
tion of parameters is the only information required
in order to evaluate domain relevance for texts us-
ing the Gaussian Mixture algorithm. The Expecta-
tion Maximization Algorithm for Gaussian Mixture
Models (Redner and Walker, 1984) allows to ef -
ciently perform this operation.
The strategy followed by the EM algorithm is
to start from a random set of parameters  0, that
has a certain initial likelihood value L0, and then
iteratively change them in order to augment like-
lihood at each step. To this aim the EM algo-
rithm exploits a growth transformation of the like-
lihood function  ( ) =  0 such that L(T ;D; ) 6
L(T ;D; 0). Applying iteratively this transforma-
tion starting from  0 a sequence of parameters is
produced, until the likelihood function achieve a
stable value (i.e. Li+1  Li 6  ). In our settings
the transformation function  is de ned by the fol-
lowing set of equations, in which all the parameters
have to be solved together.
 ( ) =  (ha1; 1; 1;a2; 2; 2i) (10)
= ha01; 01; 01;a02; 02; 02i
a0j = 1jT j
jT jX
k=1
ajG(F(D;tk); j; j)
p(F(D;tk); ) (11)
 0j =
PjT j
k=1 F(D;tk)  
ajG(F(D;tk); j; j)
p(F(D;tk); )P
jT j
k=1
ajG(F(D;tk); j; j)
p(F(D;tk); )
(12)
 0j =
PjT j
k=1 (F(D;tk)   
0j)2  aiG(F(D;tk); i; i)
p(F(D;tk); )P
jT j
k=1
ajG(F(D;tk); j; j)
p(F(D;tk); )
(13)
As said before, in order to estimate distribu-
tion parameters the British National Corpus (BNC-
Consortium, 2000) was used. Domain frequency
scores have been evaluated on the central position
of each text (using equation 1, with c = 50).
In conclusion, the EM algorithm was used to es-
timate parameters to describe distributions for rele-
vant and non-relevant texts. This learning method
is totally unsupervised. Estimated parameters has
been used to estimate relevance values by formula
(4).
4 Domain Driven Disambiguation
DRE originates to improve the performance of Do-
main Driven Disambiguation (DDD). In this sec-
tion, a brief overview of DDD is given. DDD is a
WSD methodology that only makes use of domain
information. Originally developed to test the role of
domain information for WSD, the system is capable
to achieve a good precision disambiguation. Its re-
sults are affected by a low recall, motivated by the
fact that domain information is suf cient to disam-
biguate only  domain words . The disambiguation
process is done comparing the domain of the con-
text and the domains of each sense of the lemma to
disambiguate. The selected sense is the one whose
domain is relevant for the context5.
In order to represent domain information we in-
troduced the notion of Domain Vectors (DV), that
are data structures that collect domain information.
These vectors are de ned in a multidimensional
space, in which each domain represents a dimen-
sion of the space. We distinguish between two kinds
of DVs: (i) synset vectors, which represent the rel-
evance of a synset with respect to each considered
domain and (ii) text vectors, which represent the rel-
evance of a portion of text with respect to each do-
main in the considered set.
More formally let D = fD1;D2;:::;Ddg be the
set of domains, the domain vector ~s for a synset s
is de ned as hR(D1;s);R(D2;s);::: ;R(Dd;s)i
where R(Di;s) is evaluated using equation
(2). In analogy the domain vector ~tj for
a text t in a given position j is de ned as
hR(D1;t;j);R(D2;t;j);::: ;R(Dd;t;j)i where
R(Di;t;j) is evaluated using equation (4).
The DDD methodology is performed basically in
three steps:
1. Compute ~t for the context t of the word w to be disam-
biguated
2. Compute ^s = argmaxs2Senses(w)score(s; w; t) where
score(s;w; t) = P(sjw)  sim(~s;~t)P
s2Senses(w) P(sjw)  sim(~s;~t)
3. if score(^s; w; t) > k (where k 2 [0; 1] is a con dence
threshold) select sense ^s, else do not provide any answer
The similarity metric used is the cosine vector
similarity, which takes into account only the direc-
tion of the vector (i.e. the information regarding the
domain).
P(sjw) describes the prior probability of sense
s for word w, and depends on the distribution of
the sense annotations in the corpus. It is esti-
mated by statistics from a sense tagged corpus (we
used SemCor)6 or considering the sense order in
5Recent works in WSD demonstrate that an automatic es-
timation of domain relevance for texts can be pro table used
to disambiguate words in their contexts. For example, (Escud-
ero et al., 2001) used domain relevance extraction techniques
to extract features for a supervised WSD algorithm presented
at the Senseval-2 competion, improving the system accuracy of
about 4 points for nouns, 1 point for verbs and 2 points for ad-
jectives, con rming the original intuition that domain informa-
tion is very useful to disambiguate  domain words , i.e. words
which are strongly related to the domain of the text.
6Admittedly, this may be regarded as a supervised compo-
nent of the generally unsupervised system. Yet, we considered
this component as legitimate within an unsupervised frame-
WORDNET, which roughly corresponds to sense
frequency order, when no example of the word
to disambiguate are contained in SemCor. In the
former case the estimation of P(sjw) is based on
smoothed statistics from the corpus (P(sjw) =
occ(s;w)+ 
occ(w)+jsenses(w)j  , where  is a smoothing fac-
tor empirically determined). In the latter case
P(sjw) can be estimated in an unsupervised way
considering the order of senses in WORDNET
(P(sjw) = 2(jsenses(w)j sensenumber(s;w)+1)jsenses(w)j(jsenses(w)j+1) where
sensenumber(s;w) returns the position of sense
s of word w in the sense list for w provided by
WORDNET.
5 Evaluation in a WSD task
We used the WSD framework to perform an evalu-
ation of the DRE technique by itself.
As explained in Section 1 Domain Relevance Es-
timation is not a common Text Categorization task.
In the standard framework of TC, categories are
learned form examples, that are used also for test.
In our case information in WORDNET DOMAINS is
used to discriminate, and a test set, i.e. a corpus of
texts categorized using the domain of WORDNET
DOMAINS, is not available. To evaluate the accu-
racy of the domain relevance estimation technique
described above is thus necessary to perform an in-
direct evaluation.
We evaluated the DDD algorithm described in
Section 4 using the dataset of the Senseval-2 all-
words task (Senseval-2, 2001; Preiss and Yarowsky,
2002). In order to estimate domain vectors for the
contexts of the words to disambiguate we used the
DRE methodology described in Section 3. Varying
the con dence threshold k, as described in Section
4, it is possible to change the tradeoff between preci-
sion and recall. The obtained precision-recall curve
of the system is reported in Figure 2.
In addition we evaluated separately the perfor-
mance on nouns and verbs, suspecting that nouns
are more  domain oriented than verbs. The effec-
tiveness of DDD to disambiguate domain words is
con rmed by results reported in Figure 3, in which
the precision recall curve is reported separately for
both nouns and verbs. The performances obtained
for nouns are sensibly higher than the one obtained
for verbs, con rming the claim that domain infor-
mation is crucial to disambiguate domain words.
In Figure 2 we also compare the results ob-
tained by the DDD system that make use of the
DRE technique described in Section 3 with the re-
work since it relies on a general resource (SemCor) that does
not correspond to the test data (Senseval all-words task).
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6
Precision
Recall
DDD new
DDD old
Figure 2: Performances of the system for all POS
0.4
0.45
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
Precision
Recall
Nouns
Verbs
Figure 3: Performances of the system for Nouns and
Verbs
sults obtained by the DDD system presented at the
Senseval-2 competition described in (Magnini et al.,
2002), that is based on the same DDD methodol-
ogy and exploit a DRE technique that consists ba-
sically on the simply domain frequency scores de-
scribed in subsection 3.1 (we refer to this system
using the expression old-DDD, in contrast to the ex-
pression new-DDD that refers to the implementation
described in this paper).
Old-DDD obtained 75% precision and 35% re-
call on the of cial evaluation at the Senseval-2 En-
glish all words task. At 35% of recall the new-DDD
achieves a precision of 79%, improving precision
by 4 points with respect to old-DDD. At 75% pre-
cision the recall of new-DDD is 40%. In both cases
the new domain relevance estimation technique im-
proves the performance of the DDD methodology,
demonstrating the usefulness of the DRE technique
proposed in this paper.
6 Conclusions and Future Works
Domain Relevance Estimation, an unsupervised TC
technique, has been proposed and evaluated in-
side the Domain Driven Disambiguation frame-
work, showing a signi cant improvement on the
overall system performances. This technique also
allows a clear probabilistic interpretation providing
an operative de nition of the concept of domain rel-
evance. During the learning phase annotated re-
sources are not required, allowing a low cost imple-
mentation. The portability of the technique to other
languages is allowed by the usage of synset-aligned
wordnets, being domain annotation language inde-
pendent.
As far as the evaluation of DRE is concerned, for
the moment we have tested its usefulness in the con-
text of a WSD task, but we are going deeper, con-
sidering a pure TC framework.
Acknowledgements
We would like to thank Ido Dagan and Marcello
Federico for many useful discussions and sugges-
tions.

References
BNC-Consortium. 2000. British national corpus, http://www.hcu.ox.ac.uk/BNC/.
J. P. Comaroni, J. Beall, W. E. Matthews, and G. R. New, editors. 1989. Dewey Decimal Classi cation and Relative Index. Forest Press, Albany, New York, 20th edition.
G. Escudero, L. M arquez, and G. Rigau. 2001. Using lazy boosting for word sense disambiguation. In Proc. of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation System, pages 71 74, Toulose, France, July.
C. Fellbaum. 1998. WordNet. An Electronic Lexical Database. The MIT Press.
Y. Ko and J. Seo. 2000. Automatic text categorization by unsupervised learning. In Proceedings of COLING-00, the 18th International Conference on Computational Linguistics, Saarbrcurrency1ucken, Germany.
B. Magnini and G. Cavagli a. 2000. Integrating subject  eld codes into WordNet. In Proceedings of LREC-2000, Second International Conference on Language Resources and Evaluation, Athens, Greece, June.
B. Magnini, C. Strapparava, G. Pezzulo, and A. Gliozzo. 2002. The role of domain information in word sense disambiguation. Natural Language Engineering, 8(4):359 373.
M. Palmer, C. Fellbaum, S. Cotton, L. Delfs, and H.T. Dang. 2001. English tasks: All-words and verb lexical sample. In Proceedings of SENSEVAL-2, Second International Workshop on Evaluating Word Sense Disambiguation Systems, Toulouse, France, July.
J. Preiss and D. Yarowsky, editors. 2002. Proceedings of SENSEVAL-2: Second International Workshop on Evaluating Word Sense Disambiguation Systems, Toulouse, France.
R. Redner and H. Walker. 1984. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195 239, April.
F. Sebastiani. 2002. Machine learning in automated text categorization. ACM Computing Surveys, 34(1):1 47.
Senseval-2. 2001. http://www.senseval.org.
