An Iterative Approach to Estimating Frequencies over a Semantic 
Hierarchy 
Stephen Clark and David Weir 
School of Cognitive and Computing Sciences 
University of Sussex 
Brighton, BN1 9HQ, UK 
{stephecl, davidw}@cogs.susx.ac.uk
Abstract 
This paper is concerned with using a se- 
mantic hierarchy to estimate the frequency 
with which a word sense appears as a given 
argument of a verb, assuming the data is 
not sense disambiguated. The standard ap- 
proach is to split the count for any noun ap- 
pearing in the data equally among the al- 
ternative senses of the noun. This can lead 
to inaccurate estimates. We describe a re- 
estimation process which uses the accumu- 
lated counts of hypernyms of the alternative 
senses in order to redistribute the count. In 
order to choose a hypernym for each alternative
sense, we employ a novel technique
which uses a $\chi^2$ test to measure the homogeneity
of sets of concepts in the hierarchy.
1 Introduction 
Knowledge of the constraints a verb places 
on the semantic types of its arguments (var- 
iously called selectional restrictions, selec- 
tional preferences, selectional constraints) is 
of use in many areas of natural language pro- 
cessing, particularly structural disambigua- 
tion. Recent treatments of selectional re- 
strictions have been probabilistic in nature 
(Resnik, 1993), (Li and Abe, 1998), (Ribas, 
1995), (McCarthy, 1997), and estimation 
of the relevant probabilities has required 
corpus-based counts of the number of times 
word senses, or concepts, appear in the dif- 
ferent argument positions of verbs. A dif- 
ficulty arises due to the absence of a large 
volume of sense disambiguated data, as the 
counts have to be estimated from the nouns 
which appear in the corpus, most of which 
will have more than one sense. The tech- 
niques in Resnik (1993), Li and Abe (1998) 
and Ribas (1995) simply distribute the count 
equally among the alternative senses of a 
noun. Abney and Light (1998) have at- 
tempted to obtain selectional preferences us- 
ing the Expectation Maximization algorithm 
by encoding WordNet as a hidden Markov 
model and using a modified form of the 
forward-backward algorithm to estimate the 
parameters. 
The approach proposed in this paper is to 
use a re-estimation process which relies on 
counts being passed up a semantic hierar- 
chy, from the senses of nouns appearing in 
the data. We make use of the semantic hier- 
archy in WordNet (Fellbaum, 1998), which 
consists of word senses, or concepts,¹ related by the 'is-a' or 'is-a-kind-of' relation. If c' is a kind of c, then c is a hypernym of c', and c' a hyponym of c. Counts for any concept are transmitted up the hierarchy to all of the concept's hypernyms. Thus if eat chicken appears in the corpus, the count is transmitted up to <meat>, <food>, and all the other hypernyms of that sense of chicken.² The problem is how to distinguish the correct sense of chicken in this case from incorrect senses such as <wimp>.³ We utilise the fact that whilst splitting the count equally can lead to inaccurate estimates, counts do tend to accumulate in the right places. Thus counts will appear under <food>, for the object of eat, but not under <person>, indicating that the object position of eat is more strongly associated with the set of concepts dominated by <food> than with the set of concepts dominated by <person>. By choosing a hypernym for each alternative sense of chicken and comparing how strongly the sets dominated by these hypernyms associate with eat, we can give more count in subsequent iterations to the food sense of chicken than to the wimp sense.

A problem arises because these two senses of chicken each have a number of hypernyms, so which two should be compared? The chosen hypernyms have to be high enough in the hierarchy for adequate counts to have accumulated, but not so high that the alternative senses cannot be distinguished. For example, a hypernym of the food sense of chicken is <poultry>, and a hypernym of the wimp sense is <weakling>. However, these concepts may not be high enough in the hierarchy for the accumulated counts to indicate that eat is much more strongly associated with the set of concepts dominated by <poultry> than with the set dominated by <weakling>. At the other extreme, if we were to choose <entity>, which is high in the hierarchy, as the hypernym of both senses, then clearly we would have no way of distinguishing between the two senses.

We have developed a technique, using a $\chi^2$ test, for choosing a suitable hypernym for each alternative sense. The technique is based on the observation that a chosen hypernym is too high in the hierarchy if the set consisting of the children of the hypernym is not sufficiently homogeneous with respect to the given verb and argument position. Using the previous example, <entity> is too high to represent either sense of chicken because the children of <entity> are not all associated in the same way with eat. The set consisting of the children of <meat>, however, is homogeneous with respect to the object position of eat, and so <meat> is not too high a level of representation. The measure of homogeneity we use is detailed in Section 5.

¹ We use the words sense and concept interchangeably to refer to a node in the semantic hierarchy.
² We use italics when referring to words, and angled brackets when referring to concepts or senses. This notation does not always pick out a concept uniquely, but the particular concept being referred to should be clear from the context.
³ The example used here is adapted from McCarthy (1997). There are in fact four senses of chicken in WordNet 1.6, but for ease of exposition we consider only two. The hypernyms of the food sense are <poultry>, <bird>, <meat>, <foodstuff>, <food>, <substance>, <object>, <entity>. The hypernyms of the wimp sense are <weakling>, <person>, <life_form>, <entity>.
2 The Input Data and 
Semantic Hierarchy 
The input data used to estimate frequencies 
and probabilities over the semantic hierarchy 
has been obtained from the shallow parser 
described in Briscoe and Carroll (1997). The 
data consists of a multiset of 'co-occurrence 
triples', each triple consisting of a noun 
lemma, verb lemma, and argument position. 
We refer to the data as follows: let the universe
of verbs, argument positions and nouns
that can appear in the input data be denoted
$\mathcal{V} = \{v_1, \ldots, v_{k_{\mathcal{V}}}\}$, $\mathcal{R} = \{r_1, \ldots, r_{k_{\mathcal{R}}}\}$ and
$\mathcal{N} = \{n_1, \ldots, n_{k_{\mathcal{N}}}\}$, respectively. Note that
in our treatment of selectional restrictions, 
we do not attempt to distinguish between 
alternative senses of verbs. We also assume 
that each instance of a noun in the data 
refers to one, and only one, concept. 
We use the noun hypernym taxonomy of 
WordNet, version 1.6, as our semantic hierarchy.⁴ Let $\mathcal{C} = \{c_1, \ldots, c_{k_{\mathcal{C}}}\}$ be the set of concepts in WordNet. There are approximately 66,000 different concepts. A concept is represented in WordNet by a 'synonym set' (or 'synset'), which is a set of synonymous words which can be used to denote that concept. For example, the concept 'nut', as in a crazy person, is represented by the following synset: {crackpot, crank, nut, nutcase, fruitcake, screwball}. Let $\mathrm{syn}(c) \subseteq \mathcal{N}$ be the synset for the concept c, and let $\mathrm{cn}(n) = \{\, c \mid n \in \mathrm{syn}(c) \,\}$ be the set of concepts that can be denoted by the noun n. The fact that some nouns are ambiguous means that the synsets are not necessarily disjoint.
⁴ There are other taxonomies in WordNet, but we only use the noun taxonomy. Hence, from now on, when we talk of concepts in WordNet, we mean concepts in the noun taxonomy only.
The hierarchy has the structure of a directed acyclic graph,⁵ with the $\mathrm{isa} \subseteq \mathcal{C} \times \mathcal{C}$ relation connecting nodes in the graph, where $(c', c) \in \mathrm{isa}$ implies c' is a kind of c. Let $\mathrm{isa}^* \subseteq \mathcal{C} \times \mathcal{C}$ be the transitive, reflexive closure of isa; and let

$$\overline{c} = \{\, c' \mid (c', c) \in \mathrm{isa}^* \,\}$$

be the set consisting of the concept c and all of its hyponyms. The set $\overline{\text{<food>}}$ contains all the concepts which are kinds of food, including <food>.
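The definitions of cn(n) and $\overline{c}$ can be illustrated with a minimal Python sketch. The hierarchy fragment and synsets below are a simplified toy assumption (the concept names `chicken_food` and `chicken_wimp`, and the shortened hypernym chains, are hypothetical and are not WordNet's actual entries); the functions `cn` and `hyponym_set` are illustrative names only.

```python
from collections import defaultdict

# Toy is-a relation: child concept -> list of parent concepts.
# Hypothetical, shortened chains; the real hierarchy is WordNet 1.6.
ISA = {
    "chicken_food": ["poultry"],
    "poultry": ["meat"],
    "meat": ["food"],
    "food": ["entity"],
    "chicken_wimp": ["weakling"],
    "weakling": ["person"],
    "person": ["entity"],
}

# Toy synsets: concept -> set of nouns that can denote it (syn(c)).
SYN = {
    "chicken_food": {"chicken"},
    "chicken_wimp": {"chicken", "wimp"},
    "food": {"food"},
    "entity": {"entity", "something"},
}


def cn(noun, syn=SYN):
    """cn(n): the set of concepts whose synset contains the noun."""
    return {c for c, words in syn.items() if noun in words}


def hyponym_set(c, isa=ISA):
    """c-bar: c together with all of its hyponyms (reflexive, transitive closure)."""
    children = defaultdict(set)
    for child, parents in isa.items():
        for parent in parents:
            children[parent].add(child)
    result, stack = set(), [c]
    while stack:
        node = stack.pop()
        if node not in result:
            result.add(node)
            stack.extend(children[node])
    return result
```

In this toy fragment, `cn("chicken")` contains both senses of chicken, while `hyponym_set("food")` contains only the food sense, which is what allows counts accumulated under <food> to separate the two senses.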
Note that words in our data can appear 
in synsets anywhere in the hierarchy. Even 
concepts such as <entity>, which appear 
near the root of the hierarchy, have synsets 
containing words which may appear in the 
data. The synset for <entity> is {entity, 
something}, and the words entity and something
may well appear in the argument positions
of verbs in the corpus. Furthermore,
for a concept c, we distinguish between the
set of words that can be used to denote c
(the synset of c), and the set of words that
can be used to denote concepts in $\overline{c}$.⁶
3 The Measure of Association 
We measure the association between argument positions of verbs and sets of concepts using the association norm (Abe and Li, 1996).⁷ For $C \subseteq \mathcal{C}$, $v \in \mathcal{V}$ and $r \in \mathcal{R}$, the association norm is defined as follows:

$$A(C, v, r) = \frac{p(C \mid v, r)}{p(C \mid r)}$$

For example, the association between the object position of eat and the set of concepts denoting kinds of food is expressed as follows: $A(\overline{\text{<food>}}, \mathit{eat}, \mathit{object})$. Note that, for $C \subseteq \mathcal{C}$, $p(C \mid v, r)$ is just the probability of the disjunction of the concepts in C; that is,

$$p(C \mid v, r) = \sum_{c \in C} p(c \mid v, r)$$

In order to see how $p(c \mid v, r)$ relates to the input data, note that given a concept c, verb v and argument position r, a noun can be generated according to the distribution $p(n \mid c, v, r)$, where

$$\sum_{n \in \mathrm{syn}(c)} p(n \mid c, v, r) = 1$$

Now we have a model for the input data:

$$p(n, v, r) = p(v, r)\, p(n \mid v, r) = p(v, r) \sum_{c \in \mathrm{cn}(n)} p(c \mid v, r)\, p(n \mid c, v, r)$$

Note that for $c \notin \mathrm{cn}(n)$, $p(n \mid c, v, r) = 0$.

⁵ The number of nodes in the graph with more than one parent is only around one percent of the total.
⁶ Note that Resnik (1993) uses rather non-standard terminology by referring to this second set as the synsets of c.
⁷ This work restricts itself to verbs, but can be extended to other kinds of predicates that take nouns as arguments, such as adjectives.
The association norm (and similar measures
such as the mutual information score)
has been criticised (Dunning, 1993) because
these scores can be greatly over-estimated 
when frequency counts are low. This prob- 
lem is overcome to some extent in the scheme 
presented below since, generally speaking, 
we only calculate the association norms for 
concepts that have accumulated a significant 
count. 
The association norm can be estimated
using maximum likelihood estimates of the
probabilities as follows.

$$\hat{A}(\overline{c}, v, r) = \frac{\hat{p}(\overline{c} \mid v, r)}{\hat{p}(\overline{c} \mid r)}$$
4 Estimating Frequencies 
Let freq(c, v,r), for a particular c, v and r, 
be the number of (n, v, r) triples in the data 
in which n is being used to denote c, and 
let freq(v, r) be the number of times verb v 
appears with something in position r in the 
data; then the relevant maximum likelihood 
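estimates, for $c \in \mathcal{C}$, $v \in \mathcal{V}$, $r \in \mathcal{R}$, are as follows.

$$\hat{p}(\overline{c} \mid v, r) = \frac{\mathrm{freq}(\overline{c}, v, r)}{\mathrm{freq}(v, r)} = \frac{\sum_{c' \in \overline{c}} \mathrm{freq}(c', v, r)}{\mathrm{freq}(v, r)}$$

$$\hat{p}(\overline{c} \mid r) = \frac{\sum_{v \in \mathcal{V}} \mathrm{freq}(\overline{c}, v, r)}{\sum_{v \in \mathcal{V}} \mathrm{freq}(v, r)} = \frac{\sum_{v \in \mathcal{V}} \sum_{c' \in \overline{c}} \mathrm{freq}(c', v, r)}{\sum_{v \in \mathcal{V}} \mathrm{freq}(v, r)}$$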
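As an illustration, the maximum likelihood estimate of the association norm can be computed directly from a table of counts. The sketch below is a minimal one under stated assumptions: the function name `association_norm` and the toy counts are hypothetical, and freq(v, r) is obtained by summing the per-concept counts, which relies on the assumption made in Section 2 that every noun instance denotes exactly one concept.

```python
def association_norm(freq, concept_set, verb, role):
    """Estimate A(C, v, r) = p(C | v, r) / p(C | r) from counts.

    freq maps (concept, verb, role) -> count (counts may be fractional).
    Raises ZeroDivisionError if the verb, role or concept set is unseen.
    """
    # freq(C, v, r): counts of concepts in C with this verb and role
    joint = sum(n for (c, v, r), n in freq.items()
                if c in concept_set and v == verb and r == role)
    # freq(v, r): all counts for this verb and role
    verb_total = sum(n for (c, v, r), n in freq.items()
                     if v == verb and r == role)
    # Sum over all verbs of freq(C, v, r) and of freq(v, r)
    set_total = sum(n for (c, v, r), n in freq.items()
                    if c in concept_set and r == role)
    role_total = sum(n for (c, v, r), n in freq.items() if r == role)

    p_c_given_vr = joint / verb_total
    p_c_given_r = set_total / role_total
    return p_c_given_vr / p_c_given_r


# Hypothetical counts for two concepts and two verbs.
TOY_FREQ = {
    ("food", "eat", "obj"): 8.0,
    ("person", "eat", "obj"): 2.0,
    ("food", "see", "obj"): 10.0,
    ("person", "see", "obj"): 30.0,
}
food_norm = association_norm(TOY_FREQ, {"food"}, "eat", "obj")
person_norm = association_norm(TOY_FREQ, {"person"}, "eat", "obj")
```

With these made-up counts the norm for food as the object of eat is above 1 and the norm for person is below 1, the pattern that the re-estimation procedure exploits.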
Since we do not have sense disambiguated 
data, we cannot obtain freq(c, v, r) by sim- 
ply counting senses. The standard approach 
is to estimate freq(c, v, r) by distributing 
the count for each noun n in syn(c) evenly 
among all senses of the noun as follows: 
$$\mathrm{freq}(c, v, r) = \sum_{n \in \mathrm{syn}(c)} \frac{\mathrm{freq}(n, v, r)}{|\mathrm{cn}(n)|}$$

where freq(n, v, r) is the number of times the
triple (n, v, r) appears in the data, and
$|\mathrm{cn}(n)|$ is the cardinality of cn(n).
Although this approach can give inaccu- 
rate estimates, the counts given to the incor- 
rect senses will disperse randomly through- 
out the hierarchy as noise, and by accu- 
mulating counts up the hierarchy we will 
tend to gather counts from the correct senses 
of related words (Yarowsky, 1992; Resnik, 
1993). To see why, consider two instances 
of possible triples in the data, drink wine 
and drink water. (This example is adapted 
from Resnik (1993).) The word water is a 
member of seven synsets in WordNet 1.6, 
and wine is a member of two synsets. Thus 
each sense of water will be incremented by 
0.14 counts, and each sense of wine will be 
incremented by 0.5 counts. Now although 
the incorrect senses of these words will re- 
ceive counts, those concepts in the hierarchy 
which dominate more than one of the senses, 
such as <beverage>, will accumulate more 
substantial counts. 
However, although counts tend to accu- 
mulate in the right places, counts can be 
greatly underestimated. In the previous ex- 
ample, freq(<beverage>,drink, object) is in- 
cremented by only 0.64 counts from the two 
data instances, rather than the correct value 
of 2. 
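The arithmetic of the drink wine / drink water example can be reproduced with a short sketch of the even-split scheme. The sense identifiers (`water_1`, `wine_1`, and so on) are hypothetical placeholders, and the choice of which sense of each noun falls under <beverage> is an assumption made purely for illustration.

```python
from collections import defaultdict


def even_split(triples, cn):
    """Distribute each observed (noun, verb, role) count evenly over the noun's senses."""
    counts = defaultdict(float)
    for noun, verb, role in triples:
        senses = cn[noun]
        for sense in senses:
            counts[(sense, verb, role)] += 1.0 / len(senses)
    return counts


# water has seven senses and wine has two (WordNet 1.6, as stated in the text).
CN = {
    "water": ["water_%d" % i for i in range(1, 8)],
    "wine": ["wine_1", "wine_2"],
}
# Assumed: exactly one sense of each noun is dominated by <beverage>.
BEVERAGE_HYPONYMS = {"water_1", "wine_1"}

counts = even_split([("water", "drink", "obj"), ("wine", "drink", "obj")], CN)
beverage_total = sum(n for (c, v, r), n in counts.items()
                     if c in BEVERAGE_HYPONYMS)
```

Here `beverage_total` is 1/7 + 1/2, about 0.64, although both observed triples involve a beverage; the remaining 1.36 counts are spread over the other senses.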
The approach explored here is to use 
the accumulated counts in the following re- 
estimation procedure. Given some verb v 
and position r, for each concept c we have 
the following initial estimate, in which the 
counts for a noun are distributed evenly 
among all of its senses: 
$$\widehat{\mathrm{freq}}^{0}(c, v, r) = \sum_{n \in \mathrm{syn}(c)} \frac{\mathrm{freq}(n, v, r)}{|\mathrm{cn}(n)|}$$
Given the assumption that counts from 
the related senses of words that can fill po- 
sition r of verb v will accumulate at hyper- 
nyms of c, let top(c, v, r) be the hypernym 
of c (or possibly c itself) that most accu- 
rately represents this set of related senses. In 
other words, top(c, v, r) will be an approxi- 
mation of the set of concepts related to c that 
fill position r of verb v. Rather than split- 
ting the counts for a noun n evenly among 
each of its senses $c \in \mathrm{cn}(n)$, we distribute
the counts for n on the basis of the accumulated
counts at top(c, v, r) for each $c \in \mathrm{cn}(n)$.
In the next section we discuss a method for 
finding top(c, v, r), but first we complete the 
description of how the re-estimation process 
uses the accumulated counts at top(c, v, r). 
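Given a concept c, verb v and position r, in the following formula we use [c, v, r] to denote the set of concepts $\overline{\mathrm{top}(c, v, r)}$. The re-estimated frequency $\widehat{\mathrm{freq}}^{m+1}(c, v, r)$ is given as follows.

$$\widehat{\mathrm{freq}}^{m+1}(c, v, r) = \sum_{n \in \mathrm{syn}(c)} \mathrm{freq}(n, v, r)\, \frac{\hat{A}^{m}([c, v, r], v, r)}{\sum_{c' \in \mathrm{cn}(n)} \hat{A}^{m}([c', v, r], v, r)}$$

Note that only nouns n in syn(c) contribute to the count for c. The count freq(n, v, r) is split among all concepts in cn(n) according to the ratio

$$\frac{\hat{A}^{m}([c, v, r], v, r)}{\sum_{c' \in \mathrm{cn}(n)} \hat{A}^{m}([c', v, r], v, r)}$$

For a set of concepts C,

$$\hat{A}^{m}(C, v, r) = \frac{\hat{p}^{m}(C \mid v, r)}{\hat{p}^{m}(C \mid r)}$$

where

$$\hat{p}^{m}(C \mid v, r) = \frac{\widehat{\mathrm{freq}}^{m}(C, v, r)}{\mathrm{freq}(v, r)}$$

$$\hat{p}^{m}(C \mid r) = \frac{\sum_{v \in \mathcal{V}} \widehat{\mathrm{freq}}^{m}(C, v, r)}{\sum_{v \in \mathcal{V}} \mathrm{freq}(v, r)}$$

$$\widehat{\mathrm{freq}}^{m}(C, v, r) = \sum_{c \in C} \widehat{\mathrm{freq}}^{m}(c, v, r)$$

c             (a)           (b)            (c)
<milk>        0.0 (0.6)     9.0 (8.4)      9.0
<meal>        8.5 (5.6)     78.0 (80.9)    86.5
<course>      1.3 (1.7)     24.7 (24.3)    26.0
<dish>        5.3 (5.7)     82.3 (81.9)    87.6
<delicacy>    0.3 (1.8)     27.4 (25.9)    27.7
Total         15.4          221.4          236.8

Columns: (a) $\widehat{\mathrm{freq}}^{0}(\overline{c}, \mathit{eat}, \mathit{obj})$; (b) $\widehat{\mathrm{freq}}^{0}(\overline{c}, \mathit{obj}) - \widehat{\mathrm{freq}}^{0}(\overline{c}, \mathit{eat}, \mathit{obj})$; (c) $\widehat{\mathrm{freq}}^{0}(\overline{c}, \mathit{obj}) = \sum_{v \in \mathcal{V}} \widehat{\mathrm{freq}}^{0}(\overline{c}, v, \mathit{obj})$.
Table 1: Contingency table for children of <nutriment>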
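A single re-estimation step can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function name `reestimate` is hypothetical, the association norms $\hat{A}^{m}([c, v, r], v, r)$ are assumed to have been computed already and are passed in as a dictionary, and the even-split fallback for the case where every norm is zero is an assumption of the sketch that the text does not discuss.

```python
from collections import defaultdict


def reestimate(noun_freq, cn, assoc):
    """One iteration of the re-estimation formula.

    noun_freq: (noun, verb, role) -> observed count freq(n, v, r)
    cn:        noun -> list of concepts the noun can denote
    assoc:     (concept, verb, role) -> association norm of top(c, v, r)'s set
    """
    new_freq = defaultdict(float)
    # Looping over nouns and their senses is equivalent to summing over
    # n in syn(c) for each c, since c is in cn(n) exactly when n is in syn(c).
    for (noun, verb, role), count in noun_freq.items():
        weights = {c: assoc[(c, verb, role)] for c in cn[noun]}
        total = sum(weights.values())
        for c, weight in weights.items():
            # Assumed fallback: split evenly if all norms are zero.
            share = weight / total if total > 0 else 1.0 / len(weights)
            new_freq[(c, verb, role)] += count * share
    return new_freq


# Hypothetical inputs: ten occurrences of "eat chicken" and made-up norms.
noun_freq = {("chicken", "eat", "obj"): 10}
senses = {"chicken": ["chicken_food", "chicken_wimp"]}
norms = {("chicken_food", "eat", "obj"): 3.0,
         ("chicken_wimp", "eat", "obj"): 1.0}
result = reestimate(noun_freq, senses, norms)
```

With these made-up norms the food sense receives 7.5 of the 10 counts and the wimp sense 2.5, instead of 5 each under the even split; the total count for the noun is preserved.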
5 Determining top(c,v,r) 
The technique for calculating top(c, v, r) is based on the assumption that a hypernym c' of c is too high in the hierarchy to be top(c, v, r) if the children of c' are not sufficiently homogeneous with respect to v and r. A set of concepts, C, is taken to be homogeneous with respect to a given $v \in \mathcal{V}$ and $r \in \mathcal{R}$, if $p(v \mid \overline{c}, r)$ has a similar value for each $c \in C$. Note that this is equivalent to comparing association norms since

$$p(v \mid \overline{c}, r) = \frac{p(\overline{c} \mid v, r)}{p(\overline{c} \mid r)}\, p(v \mid r) = A(\overline{c}, v, r)\, p(v \mid r)$$

and, as we are considering homogeneity for a given verb and argument position, $p(v \mid r)$ is a constant.
To determine whether a set of concepts
is homogeneous, we apply a $\chi^2$ test to a
contingency table of frequency counts. Table 1
shows frequencies for the children of
<nutriment> in the object position of eat,
and the figures in brackets are the expected
values, based on the marginal totals in the
table.
Notice that we use the $\widehat{\mathrm{freq}}^{0}$ counts in the
table. A more precise method, that we intend
to explore, would involve creating a new
table for each $\widehat{\mathrm{freq}}^{m}$, m > 0, and recalculating
top(c, v, r) after each iteration. A more
significant problem with this approach is that
by considering $p(v \mid \overline{c}, r)$, we are not taking
into account the possibility that some concepts
are associated with more verbs than
others. In further work, we plan to consider
alternative ways of comparing levels of association.
The null hypothesis of the test is that 
$p(v \mid \overline{c}, r)$ is the same for each c in the table.
For example, in Table 1 the null hypothesis
is that for every concept c that is a child of
<nutriment>, the probability of some concept
$c' \in \overline{c}$ being eaten, given that it is the
object of some verb, is the same. For the 
experiments described in Section 6, we used 
0.05 as the level of significance. Further work 
will investigate the effect that different lev- 
els of significance have on the estimated fre- 
quencies. 
(v, c)                Hypernyms of c
(eat, <hotdog>)       <sandwich> <snack_food> ... <NUTRIMENT> <food> <substance> <entity>
(drink, <coffee>)     <BEVERAGE> <food> <substance> <entity>
(see, <movie>)        <SHOW> <communication> <social_relation> <relation> <abstraction>
(hear, <speaker>)     <communicator> <person> <life_form> <ENTITY>
(kiss, <Socrates>)    <philosopher> <intellect> <person> <LIFE_FORM> <entity>
Table 2: How log-likelihood $\chi^2$ chooses top(c, v, r)

The $\chi^2$ statistic corresponding to Table 1 is 4.8. We use the log-likelihood $\chi^2$ statistic, rather than Pearson's $\chi^2$ statistic, as this is thought to be more appropriate when the counts in the contingency table are low (Dunning, 1993).⁸ For a significance level of 0.05, with 4 degrees of freedom, the critical value is 9.49 (Howell, 1997). Thus in this case, the null hypothesis (that the children of <nutriment> are homogeneous with respect to eat) would not be rejected.
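The test on Table 1 can be sketched in a few lines of Python. The function below uses the usual form of the log-likelihood statistic, $G^2 = 2 \sum O \ln(O/E)$, with expected values derived from the marginal totals; the function name is hypothetical, and treating a zero cell as contributing nothing is an assumption of the sketch. The observed counts are the rounded figures printed in Table 1, and the critical value 9.49 is the one quoted in the text.

```python
import math


def log_likelihood_chi2(table):
    """Log-likelihood chi-squared (G^2) for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if observed > 0:  # a zero cell contributes nothing (limit of x ln x)
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Observed counts from Table 1: (object of eat, object of any other verb).
TABLE_1 = [
    [0.0, 9.0],    # <milk>
    [8.5, 78.0],   # <meal>
    [1.3, 24.7],   # <course>
    [5.3, 82.3],   # <dish>
    [0.3, 27.4],   # <delicacy>
]
CRITICAL_VALUE = 9.49  # 0.05 level, 4 degrees of freedom (from the text)

g2 = log_likelihood_chi2(TABLE_1)
homogeneous = g2 < CRITICAL_VALUE
```

Because the table entries are rounded to one decimal place, the value computed here should be close to, but need not exactly match, the 4.8 reported above; in either case it is well below the critical value, so the children of <nutriment> are treated as homogeneous.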
Given a verb v and position r, we com- 
pute top(c,v,r) for each c by determining 
the homogeneity of the children of the hy- 
pernyms of c. Initially, we let top(c, v, r) be 
the concept c itself. We work from c up the 
hierarchy reassigning top(c, v, r) to be suc- 
cessive hypernyms of c until we reach a hy- 
pernym whose children are not sufficiently 
homogeneous. In situations where a concept 
has more than one parent, we consider the 
parent which results in the lowest $\chi^2$ value
as this indicates the highest level of homogeneity.
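The upward search described above can be sketched as a simple loop. Everything named here is hypothetical: `find_top`, the precomputed statistics and critical values, and the toy chain of hypernyms, which shortens the hierarchy and does not reproduce WordNet's actual structure. Only the figures 4.8 and 9.49 for <nutriment> come from the text; the other numbers are made up for illustration.

```python
def find_top(concept, parents, chi2_of_children, critical_value_for):
    """Climb from `concept` until the next hypernym's children are not homogeneous.

    parents:            concept -> list of its parent concepts
    chi2_of_children:   hypernym -> chi-squared statistic for its children
    critical_value_for: hypernym -> critical value for that table
    """
    top = concept
    while True:
        candidates = parents.get(top, [])
        if not candidates:
            return top  # reached the root of the hierarchy
        # With several parents, consider the one with the lowest statistic.
        best = min(candidates, key=chi2_of_children)
        if chi2_of_children(best) > critical_value_for(best):
            return top  # children of `best` are not sufficiently homogeneous
        top = best


# Toy, shortened chain above <hotdog>; values other than <nutriment>'s are invented.
PARENTS = {
    "hotdog": ["sandwich"],
    "sandwich": ["snack_food"],
    "snack_food": ["nutriment"],
    "nutriment": ["food"],
    "food": ["substance"],
    "substance": ["entity"],
}
CHI2 = {"sandwich": 1.2, "snack_food": 2.5, "nutriment": 4.8,
        "food": 25.0, "substance": 40.0, "entity": 90.0}
CRITICAL = {"sandwich": 5.99, "snack_food": 7.81, "nutriment": 9.49,
            "food": 11.07, "substance": 12.59, "entity": 14.07}

top_hotdog = find_top("hotdog", PARENTS, CHI2.__getitem__, CRITICAL.__getitem__)
```

In this toy setting the search stops at `nutriment`, because the children of `food` fail the homogeneity test, which mirrors the first row of Table 2.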
6 Experimental Results 
In order to evaluate the re-estimation procedure, we took triples from approximately two million words of parsed text from the BNC corpus using the shallow parser developed by Briscoe and Carroll (1997). For this work we only considered triples for which r = obj. Table 2 shows some examples of how the log-likelihood $\chi^2$ test chooses top(c, v, r) for various $v \in \mathcal{V}$ and $c \in \mathcal{C}$.⁹ In giving the list of hypernyms the selected concept top(c, v, obj) is shown in upper case.

⁸ Low counts tend to occur in the table when the test is being applied to a set of concepts near the foot of the hierarchy. A further extension of this work will be to use Fisher's exact test for the tables with particularly low counts.
Table 3 shows how frequency estimates change, during the re-estimation process, for various $v \in \mathcal{V}$, $c \in \mathcal{C}$, and r = obj. The figures in Table 3 show that the estimates appear to be converging after around 10 iterations. The first column gives the frequency estimates using the technique of splitting the count equally among alternative senses of a noun appearing in the data. The figures for eat and drink suggest that these initial estimates can greatly underestimate the true counts (and can also overestimate them in cases where the argument strongly violates the selectional preferences of the verb, such as eat <location>).

The final column gives an upper bound on the re-estimated frequencies. It shows how many nouns in the data, in the object position of the given verb, could possibly be denoting one of the concepts in $\overline{c}$, for each v and $\overline{c}$ in the table. For example, 95 is the number of times a noun which could possibly be denoting a concept dominated by <food> appeared in the object position of eat. Since eat selects so strongly for its object, we would expect $\mathrm{freq}(\overline{\text{<food>}}, \mathit{eat}, \mathit{obj})$ (i.e., the true figure) to be close to 95. Similarly, since drink selects so strongly for its object, we would expect $\mathrm{freq}(\overline{\text{<beverage>}}, \mathit{drink}, \mathit{obj})$ to be close to 26. We would also expect $\mathrm{freq}(\overline{\text{<location>}}, \mathit{eat}, \mathit{obj})$ to be close to 0. As can be seen from Table 3, our estimates converge quite closely to these values.

                             $\widehat{\mathrm{freq}}^{m}(\overline{c}, v, \mathit{obj})$
(v, $\overline{c}$)                   m=0      m=1      m=5      m=10     Limit
(eat, <food>)                60.8     85.0     89.6     89.8     95
(drink, <beverage>)          10.5     22.7     23.5     23.4     26
(eat, <location>)            2.0      1.2      1.1      1.1      6
(see, <obj>)                 237.1    235.7    240.2    240.3    568
(hear, <person>)             90.8     85.5     85.5     85.5     130
(enjoy, <amusement>)         2.9      3.1      3.3      3.3      5
(measure, <abstraction>)     19.1     21.7     23.3     23.4     31
Table 3: Example of re-estimated frequencies

⁹ Notice that <hotdog> is classified at the <nutriment> level rather than <food>. This is presumably due to the fact that beverage is classed as a food, making the set of concepts $\overline{\text{<food>}}$ heterogeneous with respect to the object position of eat.
It is noticeable that the frequency counts 
for weakly selecting verbs do not change as 
much as for strongly selecting verbs. Thus, 
the benefit we achieve compared to the stan- 
dard approach of distributing counts evenly 
is reduced in these cases. In order to investi- 
gate the extent to which our technique may 
be helping, for each triple in the data we 
calculated how the distribution of the count 
changed due to our re-estimation technique. 
We estimated the extent to which the distribution had changed by calculating the percentage increase in the count for the most favoured sense after 10 iterations. Table 4 shows the results we obtained. The proportions given in the second column are of the triples in the data containing nouns with more than one sense.¹⁰ We can see from the table that for 43% of the triples our technique is having little effect, but for 23% the count is at least doubled.

¹⁰ 17% of the data involved nouns with only one sense in WordNet.
7 Conclusions 
We have shown that the standard technique 
for estimating frequencies over a semantic 
hierarchy can lead to inaccurate estimates. 
We have described a re-estimation proce- 
dure which uses an existing measure of se- 
lectional preference and which employs a 
novel way of selecting a hypernym of a con- 
cept. Our experiments indicate that the re- 
estimation procedure gives more accurate es- 
timates than the standard technique, par- 
ticularly for strongly selecting verbs. This 
could prove particularly useful when using 
selectional restrictions, for example in struc- 
tural disambiguation. 
8 Acknowledgements 
This work is supported by UK EPSRC 
project GR/K97400 and by an EPSRC Re- 
search Studentship to the first author. We 
would like to thank John Carroll for supply- 
ing and providing help with the data, and 
Diana McCarthy, Gerald Gazdar, Bill Keller 
and the anonymous reviewers for their help- 
ful comments. 
Percentage Increase    Proportion of data
0-10                   43%
10-50                  18%
50-100                 16%
100 or more            23%
Table 4: How the distribution of counts changes

References 

Naoki Abe and Hang Li. 1996. Learning 
Word Association Norms using Tree Cut 
Pair Models. In Proceedings of the Thir- 
teenth International Conference on Ma- 
chine Learning. 

Steve Abney and Marc Light. 1998. Hid- 
ing a Semantic Class Hierarchy in a 
Markov Model. Unpublished. 

Ted Briscoe and John Carroll. 1997. Au- 
tomatic extraction of subcategorization 
from corpora. In Proceedings of the 5th 
ACL Conference on Applied Natural Language
Processing, pages 356-363, Washington, DC.

Ted Dunning. 1993. Accurate Methods for 
the Statistics of Surprise and Coincidence. 
Computational Linguistics, 19(1):61-74. 

Christiane Fellbaum, editor. 1998. WordNet:
An Electronic Lexical Database. The MIT
Press. 

D. Howell. 1997. Statistical Methods for 
Psychology: 4th ed. Duxbury Press.

Hang Li and Naoki Abe. 1998. Generalizing
Case Frames using a Thesaurus and
the MDL Principle. Computational Lin- 
guistics, 24(2):217-244. 

Diana McCarthy. 1997. Word sense dis- 
ambiguation for acquisition of selectional 
preferences. In Proceedings of the ACL/EACL 97
Workshop Automatic Information Extraction
and Building of Lexical Semantic Resources
for NLP Applications, pages 52-61, Madrid, Spain.

Philip Resnik. 1993. Selection and Informa- 
tion: A Class-Based Approach to Lexical 
Relationships. Ph.D. thesis, University of 
Pennsylvania. 

Francesc Ribas. 1995. On Learning More 
Appropriate Selectional Restrictions. In 
Proceedings of the Seventh Conference of 
the European Chapter of the Association 
for Computational Linguistics, Dublin, 
Ireland. 

David Yarowsky. 1992. Word-sense disam- 
biguation using statistical models of Ro- 
get's categories trained on large corpora. 
In Proceedings of COLING-92, pages 454-460. 
