Automatic Adaptation of WordNet to Sublanguages and to 
Computational Tasks 
Roberto Basili(+) Alessandro Cucchiarelli (*) Carlo Consoli (+) 
Maria Teresa Pazienza (+) Paola Velardi (~) 
(+) Univer,fita' di Roma Tor Vergata (ITALY) 
(*) Universita' di Ancona (ITALY) 
(#) Universita' di Roma, La Sapienza (ITALY) 
Abstract 
Semantically tagging a corpus is useful for many 
intermediate NLP tasks such as: acquisition of 
word argument structures in sublanguages, ac- 
quisition of syntactic disambiguation cues, ter- 
minology learning, etc. Semantic categories al- 
low the generalization of observed word pat- 
terns, and facilitate the discovery of irecurrent 
sublanguage phenomena and selectional rules of 
various types. Yet, as opposed to POS tags in 
morphology, there is no consensus in literature 
about the type and granularity of the category 
inventory. In addition, most available on-line 
taxonomies, as WordNet, are over ambiguous 
and, at the same time, may not include many 
domain-dependent senses of words. In this pa- 
per we describe a method to adapt a general 
purpose taxonomy to an application sub\[an- 
guage: flint, we prune branches of the Wordnet 
hierarchy that are too " fine grained" for the do- 
main: then. a statistical model of classes is built 
from corpus contexts to sort the different classi- 
fications or assign a classification to known and 
unknown words, respectively. 
1 Introduction 
Lexical learning methods based on the use of 
semantic categories are faced with the problem 
of overambiguity and entangled structures of 
Thesaura and dictionaries. WordNet and Ro- 
get's Thesaura were not initially conceived, de- 
spite their success among researchers in lexi- 
cal statistics, as tools for automatic language 
processing. The purpose was rather to pro- 
vide the linguists with a very refined, general 
purpose, linguistically motivated source of tax- 
onomic knowledge. As a consequence, in most 
on-fine Thesaura words are extremely ambigu- 
ous. with very subtle distinctions among senses. 
High ambiguity, entangled nodes, and asymme- 
try have already been emphasized in (Hearst 
and Shutze, 1993) as being an obstacle to the 
effective use of on-line Thesaura in corpus lin- 
guistics. In most cases, the noise introduced 
by overambiguity almost overrides the positive 
effect of semantic clustering. For example, in 
(BriIl and Resnik, 1994) clustering PP heads ac- 
cording to WordNet synsets produced only a \[% 
improvement in a PP disambiguation task. with 
respect to the non-clustered method. A subse- 
quent paper (Resnik. 1997) reports of a 40% 
precision in a sense disambiguation task. al- 
ways based on generalization through WordNet 
synsets. Context-based sense clisambiguation 
becomes a prohibitive task on a wide-scale basis, 
because when words in the context of unambigu- 
ous word are replaced by their s.vnsets, there 
is a multiplication of possible contexts, rather 
than a generalization. \[n (Agirre and Rigau. 
1996) a method called Conceptual Distance is 
proposed to reduce this problem, but the re- 
ported performance in disambiguation still does 
not reach 50%. On the other hand, (Dolan. 
1994) and (Krovetz and Croft. 1992) claim that 
fine-grained semantic distinctions are unlikely 
to be of practical value for many applications. 
Our experience supports this claim: often, what 
matters is to be able to distinguish among con- 
trastive (Pustejowsky. 1995) ambiguities of the 
bank_river bank_organisation flavor. The prob- 
lem however is that the notion of"coutrastive" 
is domain-dependent. Depending upon the sub- 
language (e.g. medicine, finance, computers. 
etc.) and upon the specific NLP application 
(e.g. Information Extraction, Dialogue etc.) a 
given semantic label may be too general or too 
specific for the task at hand. For example, the 
word line has 27 senses in WordNet. many of 
which draw subtle distinctions e.g. line of ~cork 
(sense 26) and line of products (sense \[9). In aa 
80 
I 
I 
I 
I 
I 
I 
I 
I 
I 
i 
I 
I 
I 
I 
I 
I 
I 
I 
application aimed at extracting information on 
new products in an economic domain, we would 
be interested in identi~-ing occurrences of such 
senses, but perhaps all the other senses could 
be clustered in one or two categories, lbr exam- 
ple Artifact, grouping senses such as: telephone- 
line, railway and cable, and Abstraction, group- 
ing senses such as series, conformity and indica- 
tion. Vice versa, if the sublanguage is technical 
handbooks in computer science, we would like 
to distinguish the cable and the string of words 
senses (7 and 5, respectively), while any other 
distinction may not have any practical interest. 
The research described in this paper is aimed 
at providing some principled, and algorithmic, 
methods to tune a general purpose taxonomy to 
specific sublanguages and domains. 
In this paper, we propose a method by which 
we select a set of core semantic nodes in the 
WordNet taxonomy that "optimally" describe 
the semantics of a sublanguage, according to 
a scoring function defined as a linear combi- 
nation of general and corpus-dependent perfor- 
mance factors. The selected categories are used 
to prune WordNet branches that appear, ac- 
cording to our scoring function, less pertinent to 
the given sublanguage, thus reducing the initial 
ambiguity. Then, we learn from the application 
corpus a statistical model of the core categories 
and use this model to further tune the initial 
taxonomy. Tuning implies two actions: 
The first is to attempt a reclassification 
of relevant word:; in the corpus that are 
not covered bv the selected categories, 
i.e.. words belonging exclusively to pruned 
branches. Often. :hese words have domain- 
dependent .,;enses that are not captured in 
the initial WordNet classification (,e.g. the 
software sense of release in a software hand- 
books sublanguage). The decision to as- 
sign an unclassified word to one of the se- 
lected categories is based on a strong de- 
tected similarity between the contexts in 
which the word o.:curs, and the statistical 
model of the core categories. 
The second iis to further reduce the ambigu- 
itv of words that :;till have a high ambigu- 
ity, with respect to the other word.s in the 
corpus. For example, the word stock in a fi- 
nancial domain still preserved the gunstock 
81 
sense, because instrumentality was one of 
the selected core categories for the domain. 
The expectation of this sense ,nay be low- 
ered, as before, by comparing the typical 
contexts of stock with the acquired model 
of instrumentality. 
In the next sections, we first describe the al- 
gorithm for selecting core categories. Then, we 
describe the method for redistributing relevant 
words among the nodes of the pruned hierarchy. 
Finally, we discuss an evaluation experiment. 
2 Selection of core categories from 
WordNet 
The first step of our method is to select from 
WordNet an inventory of core categories that 
appear particularly appropriate for the domain. 
and prune all the hierarchy branches that does 
not belong to such core categories. This choice 
is performed as follows: 
Creation of alternative sets of balanced 
categories 
First, an iterative method is used to create alter- 
native sets of balanced categories, using infor- 
mation on words and word frequencies in the ap- 
plication corpus. Sets of categories have an in- 
creasing level of generality. The set-generation 
algorithm is an iterative application of the algo- 
rithm proposed in (Hearst and Sht, tze. 1993) for 
creating WordNet categories of a fixed ~tverage 
size. \[n short t. the algorithm works as follows: 
Let C be a set of WordNet svnsets s. iV the set 
of different words (nouns) in the corpus. P(C) 
the number of words ill W that are instances 
of C. weighted by their frequency in the cor- 
pus, UB and LB the upper and lower bound 
for P(C). At each iteration step i. a new synset 
s is added to the current category set C~. iff 
the weight of s lies within the current bound- 
aries, that is. P(s) <_ UBi and P(s) >_ LBi. 
If P(s) >_ UBi s is replaced in Ci by its de- 
scendants, for which the same constraints are 
verified. If P(s) < LBi . s is added to a list of 
"small;' categories SCT(C'i). \[n fact. when re- 
placing an overpopulated category by its sons. 
it may well be the case that some of its sons are 
under populated. 
I The procedure new_cat\[S) is almost the same as in 
(Hearst and Shutze, 1993). For sake of brevity, the algo- 
rithm is not explained in much details here. 
I 
I 
I 
I 
I 
I 
I 
Scoring Alternative Sets of Categories 
Second, a scoring function is applied to alter- 
native sets to identify the core set. The core 
set is modeled as the linear function of four 
performance factors: generality, coverage of the 
domain, average ambiguity, and discrimination 
power. For a formal definition of these four mea- 
sures, see (Cucchiarelli and Velardi, 1997). We 
provide here an intuitive description of these 
factors: 
Generality (G): In principle, we would like to 
represent the semantics of the domain using the 
highest possible level of generalization. A small 
number of categories allows a compact repre- 
sentation of the semantic knowledge base, and 
renders word sense disambiguation more sim- 
ple. On the other side, over general categories 
fail to capture important distinctions. The Gen- 
erality is a gaussian measure that mediates be- 
tween over generality and overambiguity. 
Coverage (CO) This is a measure of the cov- 
erage that a given category set C'i has over the 
words in the corpus. The algorithm for balanced 
category selection does not allow a full coverage 
of the words in the domain: given a selected 
pair < UB, LB >. it may well be the case that 
several words are not assigned to any category, 
because when branching from an overpopulated 
category to its descendants, some of the descen- 
dants may be under populated. Each iterative 
step that creates a C, also creates a set of un- 
der populated categories SCT(Ci). Clearly, a 
"good" selection of Ci is one that minimizes this 
problem (and has therefore a "high" coverage). 
Discrimination Power (DP): A certain se- 
lection of categories may not allow a full dis- 
crimination of the lowest-level senses for a word 
(leaves-synsets hereafter). For example, if psy- 
chological_feature is one of the core categories, 
and if we choose to tag a corpus only with core 
categories, it would be impossible to discrimi- 
nate between the business-target and business- 
concern senses. Though nothing can be said 
about the practical importance of discriminat- 
ing between such two synsets, in general a good 
choice of Ci is one that allows as much as possi- 
ble the discrimination between low level senses 
of ambiguous words. 
Average Ambiguity (A) : Each choice of Ci 
in general reduces the initial ambiguity of the 
corpus. In part. because there are leaves-synsets 
that converge into a single category of the set. 
in part because there are leaves-synsets of a 
word that do not reach any of these categories. 
Though in general we don't know if. by cutting 
out a node. we are removing a set of senses inter- 
esting (or not) for the domain, still in principle 
82 
a good choice of categories is one that reduces as 
much as possible the initial ambiguity. The cu- 
mulative scoring function for a set of categories 
Ci is defined as the linear combination of the 
performance parameters described above: 
Sco~( C'i ) = aG(C~) + 3C.'0(C~) + 
1 +,(DP(C~) +6A(G) (t) 
Estimation of model parameters and re- 
finements 
An interpolation method is adopted to estimate 
the parameters of the model against a reference. 
correctly tagged, corpus (SemCor, tile WordNet 
semantic concordance). The performance of al- 
ternative inventories of core categories is evalu- 
ated in terms of effective reduction of overam- 
biguity. This measure is a combination of the 
system precision at pruning out spurious (for 
the domain) senses, and the global reduction of 
ambiguity. Notice that we are not measuring 
the precision of sense disambiguation in con- 
texts, but simply the precision at reducing a- 
priori the set of possible senses for a word. in a 
given domain. 
The method above is weakly supervised: the 
parameters estimated have been used without 
re-estimation to capture core categories in other 
domains such as Natural Science and a UNIX 
manual. Details on portability of this choice are 
in (Cucchiarelli and Velardi. forthcoming 1998). 
In the different experiments, the best per- 
forming choice of core categories is the one 
with an upper population of 62.000 words 
(frequency weighted). This corresponds to the 
following list of 14 categories: 
num.x:a.t=14 t=61 UB=62000 LB=24800 N=2000 k----IO00 h=O.40 
person, individuM, ~omeone. mortal, human, soul 
in~trument.~lity, instrumentation 
~ttr|bute 
wr|t ten.~omm u nicatlon, wrltten-l~ngu~ge 
message, content. ~ubject-m~tter. 3ubst.xnce 
measuL'e, quantity, amount, quantum 
~ction 
:Lctivity 
group..~ction 
organiz~.tton 
p~ychologlealMe~t ure 
poJLeJsioa 
.It ~,te 
|OC&ttOn 
This selection of core categories is measured 
to have the following performance: 
Precision: 77.6~. 
Reduction of Ambiguity: 37~ 
I 
I 
I 
I 
I 
I 
.I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
Coverage: - ~¢' 
In (Cucchiarelli and Velardi, forthcoming 
1998) a method is proposed to automatically 
increase the coverage of the core set with an 
additional set of categories, selected from the 
set of under populated categories SCT(Ci) (see 
step 1 of the algorithm). With the extension: 
subt~nce, m&~ter 
event 
gathering, assemblage 
phenomenon 
sgructure,const ruction 
na~urxl.object 
creation 
the following performance is obtained: 
Precision: 78,9% 
Reduction of Ambiguity: 26% 
Coverage: 93% 
With some manual refinement of the ex- 
tended set , the precision rises to over 80%. 
Obtaining a higher precision is difficult because, 
neither SemCor nor WordNet can be considered 
a golden standard. In a recent workshop on se- 
mantic texts tagging (TAGWS 1997), the diffi- 
culty of providing comprehensible guidelines for 
semantic annotators in order to avoid disagree- 
ment and inconsistencies was highlighted. On 
the other side. there are many redundancies and 
some inconsistencies in WordNet that makes the 
task of (manual) classification very complex. To 
make an example, one of the detected classi- 
ficatiou errors in our Wall Street Journal ex- 
periment was the selection of two possible core 
senses for the word market: : organization and 
act\[city. Vice versa, in the economic fragment 
of SemCor. market is consistently classifies as 
socio-economic-class, which happens not to be 
a descendent of any of these two categories. 
Our intuition when observing the specific exam- 
ples was more in agreement with the automatic 
classification than with SemCor. Our feeling 
was that the selected core categories could, in 
many cases, represent a good model of clas- 
.~ification for words that remained unclassified 
with respect to the "not pruned" WordNet. or 
appeared misclassified in our evaluation experi- 
illeut. 
in the next section we describe an method to 
verify" this hypothesis and. at the same time, to 
further tune WordNet to a domain. 
83 
3 Redistribution of words among 
core categories 
The purpose of the method described hereafter 
is twofold: 
• The first is to attempt a reclassification of 
words that are not classified, or appeared as 
misclassified, with respect to the "'original" 
WordNet. 
• The second is to further reduce the ambi- 
guity of words that are still very ambigu- 
ous with respect to the "pruned" Word- 
Net. The general idea is that ambiguity 
of words is reduced in a specific domain, 
and enumeration of all their senses is un- 
necessary. Second, some words function as 
sense primers for others. Third, raw con- 
texts of words provide a significant bundle 
of information to guide disambiguation. 
To verify this hypothesis systematically we 
need to acquire from the corpus a contextual 
model of the core categories, and then verify 
to what extent certain "interesting" words (for 
example, unclassified words) adhere to the con- 
textual model of one of such categories. 
Our method, inspired by (Yarowsky, t992), 
works as follows (see (Basil\[ et al, 1997} for de- 
tails}: 
• Step 1. Select the most typical words in 
each core category: 
Step 2. Acquire the collective contexts of 
these words and use them ms a (distribu- 
tional) description of each category: 
Step 3. Use tile distributional descrip- 
tions to evaluate the (corpus-dependent) 
membership of each word to the different 
categories. 
Step l is carried out detecting tile more sig- 
nificant (and less ambiguous) words in any of 
the core classes : these sets are called the kernel 
of the corresponding class. Rather than train- 
ing the classifier on all the nouns in tile learning 
corpus ,as in (Yarowsky. \[992). we select only a 
subset of protot!lpical words for each category. 
We call these words w the salient words of a 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
category C'. We define the typicality T~(C') of 
w in C, as: 
Nw,c _ -- (2) 
I'V w 
where: 
,V,, is the total number of synsets of a word w, i.e. 
all the WordNet synonymy sets including w. 
.V,o.c is the number of synsets of w that belong to 
the semantic category C, i.e. synsets indexed with 
C in WordNet. 
The typicality depends only on WordNet. A 
typical noun for a category C is one that is ei- 
ther non ambiguously assigned to C in Word- 
Net, or that has most of its senses (synsets) in 
C. 
The synonymy S~, of w in C, i.e. the degree 
of synonymy showed by words other than w in 
the synsets of the class C in which w appears. 
is modeled by the following ratio: 
s,,,(c)_ o ,c (3) O~ 
where: 
O,. is the number of words in the corpus that appear 
in at least one of the synsets of w. 
O,:.c is the number of words in the corpus appearing 
ill at least one of the synsets of w, that belong to C. 
"\['lie synonymy depends both on WordNet 
and on the corpus. A noun with a high de- 
gree of synonymy in C is one with a high num- 
ber of synonyms in the corpus, with reference 
to a specific sense (synset) belonging to C. 
Salient nouns for C are frequent, typical, and 
with a high synonymy in C. The salient words 
w. for a semantic category C, are thus identified 
maximizing the following function, that we call 
SPo l'{~ : 
= 0.% × × S,.(C') (4) 
where O.4~, are the absolute occurrences of w 
in the corpus. The value of Score depends both 
on the corpus and on '~,brdNet. O.4~, depends 
obviously on the corpus. 
The kernel of a category kernel(C), is the set 
of salient words w with a "high" 5core~(C). In 
,\[,able I some kernel words for the class gather- 
ing.as.~emblage are reported. 
Step 2 uses the kernel words to build (as 
in (Yarowsky. i992)) a probabilistic model of a 
84 
Table 1: Some kernel elements for class 
gathering, assemblage 
Score Word Score Word 
17: 
0.68835 executive 0.11108 business 
0.55539 senate 0.11108 household 
0.33828 public 0.10014 council 
0.28485 court 0.08920 school 
0.23815 family 0.08864 session 
0.20869 commune 0.08780 form 
0.14839 press 0.08667 town 
0.11907 vote 0.07868 staff 
class: this model is based on the distribution 
of class relevance of the surrounding terms in 
typical contexts. 
In Step 3 a word is assigned to one. or more, 
classes according to the contexts in which it 
appears. Many contexts may enforce the se- 
lection of a given class, or multiple classifica- 
tions are possible when different contexts sug- 
gest independent classes. For a given word 
w, and for each category C, we evaluate the 
following function, that we call Domain Sense 
(DSense(w, C)): 
where 
1 DSense(w.C) = -~= ~ Y(k.C') (5) 
k 
Y(k'.C) = ~ Pr(..'.(') × Pr(C) 
,L,'6~: 
(6) 
where k's are tile contexts of w. and a" is a 
generic word in k. 
In (6), Pr(C)is the (not uniform) probability of 
a class C, given by the ratio between the num- 
ber of collective contexts for (7' 2 and the total 
number of collective contexts. 
4 Discussion of the experiment 
In this section we describe some preliminary re- 
suits of an experiment conducted on tile Wall 
Street Journal. We used 21 categories including 
\[4 core categories plus 7 additional categories 
obtained with automatic extension of tile best 
core set (see section 2). \[n experiment I. we 
selected the 6 most frequent unclassified words 
in the corpus, and attempted a reclassification 
"those collected around the kernel words of C 
according to the contextual description of the 
21 categories. In experiment 2, we selected the 
6 most frequent and still very ambiguous (ac- 
cording to the pruned WordNet) words, and 
attempted a reduction of ambiguity. For each 
word w and each category C, we compute the 
DSense(w, C) and then select only those senses 
that exhibit a membership value higher than the 
average membership of kernel words of C. The 
assignment of a word to a category is performed 
regardless of the current classification of w in 
the pruned WordNet. 
The following Table 2 summarizes the results 
of experiment 1: 
Table 2: Selected categories for some unclassi- 
fied words 
Word/freq Selected cate~;ories 
w~.11/447 g~.t herin g, w rift e n .x:o m m u nic~.tio n,o r &a.nizatio n 
pert tagon/183 gat h e rin g,\[oc ~,tio n.o rga.niza.tio n 
people/g73 g~thering 
xirport/$9 co nst ruction,loc~tion 
congress/456 ~;a.therin~,person 
Table 3 reports on experiment 2. \[n column 
3. selected categories are reported in decreasing 
order of class membership evidence. 
Ia Table 2, notice the apparently "strange" 
classification of wall. The problem is that, in 
the current version of our system, proper nouns 
are not correctly detected (this problem will 
Table 3: Selected and Initial WN categories for 
some very ambiguous words 
W ~,rd, Ircq 
,hare/347?~ 
price/2132 
b~nk/t3t3 
b,t:oneJ$/l "~63 
bo.nd/l.366 
I Inlllal W 31 categortcs 
wr|tte n-co rn rn unit ~tlon 
poJsesslon 
rroup-},ctlon 
activity 
mstrumertta.lity 
w rltten-com rnu nlc&tlon 
posres.~ion 
per.~on 
n&turll~bject 
In~t r u menta.lit~¢ 
rne~saKe,content 
po..se~ion 
.~ttrlbute 
orga, nlz3tion 
poJJe~Jion 
instrumentality 
rtatural.Oblect 
groq p.-~ctlon 
org~,nlz~tion 
g~,thering 
p~ych, feature 
activity 
po~Je$$ion 
~ttribute 
phenomenon 
mitrurnentllitv 
Selected cat cgorier, 
w r it t e n ..co m m urt tc.~.t,o rt 
posseJ~ion 
grou p..xction 
w rltt en.xomrn un|c~tlort 
possession 
persort 
ines~a$te,con tertt 
possesJion 
~ttrlbute 
or&~niz~tion 
possession 
group_~ctlort 
or ganiz~.tlon 
g~therm& 
cre~tlort 
po$$e$~|orl 
,1.t t rib u te 
85 
be fixed shortly) since in the Wall Street Jour- 
nal there is no special syntactic tag for proper 
names. Erroneously, several proper names, such 
as Wall Street, Wall Street Journal, Bush, Delta. 
Apple, etc. were initially classified as common 
nouns, therefore causing some noise in the data 
that we need now to eliminate 3. 
The word wall is in fact part of the com- 
plex nominals Wall Street and Wall Street 
Journal, and it is very interesting that. 
based on the context, the system classifies 
it correctly in the three categories: gather- 
ing, written_communication, organization Notice 
that the category: "gathering, assemblage" has 
somehow an unintuitive label, but in the WSJ 
domain this class includes rather uniform words, 
most of which refer to political organizations, as 
shown in Table 1. 
In Table 3. it is shown that often some reduc- 
tion of ambiguity is possible. However. some 
spurious senses survive, for example, the pro- 
genitor (person) sense of stock. \[t is very im- 
portant that. in all tile analyzed cases, the se- 
lected classes are a subset of the initial Word- 
Net classes: remember that the assignment of 
a word to a category is performed only on the 
basis of its computed membership to that cat- 
egory. There is one example of additional de- 
tected sense (not included in the pruned Word- 
Net), i.e. the sense creation for the word hood. 
Typical (for the domain) words in this class are: 
plan. yeld. software, magazine, jottroal, is.~,e. 
etc. therefore, the creation sense seems appro- 
priate. 
Clearly. we need to perform a better (in 
the large} experimentation, but the first results 
seem encouraging. A large scale experiment re- 
quires, besides a better tuning of the statistical 
parameters and fixing some obvious bug (e.g. 
the identification of proper nouns}, the prepara- 
tion of a test set in which the correct classifica- 
tion of a large nurnber of words is verified man- 
ually in the actual corpus contexts. Finally, ex- 
periments should be extended to domains other 
than WordNet. We already experimented the 
algorithm for core category selection on a UNIX 
corpus and on a small Natural Science corpus. 
but again, extending the complete experiment 
For example, the additional category n,Oftrrtl.x, bject 
was created because of rhe high frequency of spurious 
nouns as apple, delta, b lsh. etc. 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
I 
to other corpora is not trivial for the required 
intensive linguistic and statistical corpus pro- 
cessiug. 

References 
(Agirre and Rigau, 1996) E. Agirre and G. Rigau, 
Word'Sense Disambiguation using Conceptual 
Density, proc. of COLING 1996 

Basili, Della Rocca, Pazienza, 1997) R. Basili, 
M.Della Rocca, M.T. Pazienza, Towards a 
Bootstrapping Fr'amework for Corpus Semantic 
Tagging, in (TAGWS 1997) 

Basili et al. 1995b.) Basili R., M. Della Rocca, 
M.T. Pazienza, P. Velardi. "Contexts and cat- 
egories: tuning a general purpose verb clas- 
sification to sublanguages". Proceeding of 
RANLP95, Tzigov Chark, Bulgaria, 1995. 

Brill and Resnik, /994) E. Brill and P. Resnik, A 
transformation-based approach to prepositional 
phrase attachment disambiguation, proc. of 
COLING 1994 

(Chen and Chen, 1996) K. Chen and C. Chen, 
A rule-based and MT-oriented Approach to 
Prepositional Phrase Attachment, proc. of 
COLING 1996 

(Cucchiarelli and Velardi. forthcoming 1998) Cuc- 
chiarelli A., Velardi P. "Finding a Domain- 
Appropriate Sense Inventory for Semantically 
Tagging a Corpus" Int. Journal of Natural Lan- 
guage Engineering, in press. 1998 

(Cucchiarelli and Velardi, 1997) Cucchiarelli A., Ve- 
lardi P. "Automatic Selection of Class Labels 
from a Thesaurus for an Effective Semantic Tag- 
ging of Corpora", 6th Conf. on Applied Nat- 
ural Language Processing, ANLP97. Washing- 
toll. April t-3 1997 

(Dolau. 1994) W. Dolan. Word Sense Ambiguation: 
Clustering Related Senses. Proc. of Coling 1994 

(Felibaum. \[997) C. Fellbaum. "' Analysis of a hand- 
tagging task" in (TAGWS 1997). 

(Hearst and Schuetze. 1993) M. Hearst and H. 
Schuetze. Customizing a Lexicon to Better 
Suite a Computational Task, ACL SIGLEX. 
Workshop on Lexical Acquisition from Text, 
Columbus. Ohio, USA, 1993. 

(Yarowsky. D. 1992), "Word-Sense disambiguation 
using statistical models of Roget's categories 
trained on large corpora". Nantes: Proceedings 
of COLING 92. 

(Krovetz and Croft, I992) R. Krovetz and B. Croft, 
Lexical Ambiguity and Information Retrieval, 
in AC'M trans, on Information Systems, tO:2, 
1992 

(Gale et al. 1992) Gale. W. K. Church and D. 
Yarowsky. One sense per discourse, in proc. of 
the DARPA speech and and Natural Language 
workshop, Harriman, NY, February 1992 

(Pustejovsky, 1995) J. Pustejovsky, The generative 
Lexicon, MIT Press, 1995 

(Resnik, 1995) P. Resnik. Disambiguating Noun 
Groupings with respect to Wordnet Senses, 
proc. of 3rd Workshop on Very Large Corpora, 
1995 

(Resnik, 1997) P. Resnik, Selectional reference and 
Sense disambiguation, in TAGWS97 

(TAGWS 1997) Proceedings of the workshop "Tag- 
ging Text with Lexical Semantics: Why, What, 
and How?", published by ACL, 4-5 April 1997, 
Whashington, USA 
