A CLASS-BASED APPROACH TO LEXICAL DISCOVERY 
Philip Resnik* 
Department of Computer and Information Science, University of Pennsylvania 
Philadelphia, Pennsylvania 19104, USA 
Internet: vesnik@linc.cis.upenn.edu 
1 Introduction 
In this paper I propose a generalization of lexical 
association techniques that is intended to facilitate 
statistical discovery of facts involving word classes 
rather than individual words. Although defining as- 
sociation measures over classes (as sets of words) is 
straightforward in theory, making direct use of such 
a definition is impractical because there are simply 
too many classes to consider. Rather than consid- 
ering all possible classes, I propose constraining the 
set of possible word classes by using a broad-coverage 
lexical/conceptual hierarchy \[Miller, 1990\]. 
2 Word/Word Relationships 
Mutual information is an information-theoretic mea- 
sure of association frequently used with natural lan- 
guage data to gauge the "relatedness" between two 
words z and y. It is defined as follows: 
• Pr(z, y) I(x;y) = log 
r (1) Pr(z)Pr(y) 
As an example of its use, consider Itindle's \[1990\] 
application of mutual information to the discovery 
of predicate argument relations. Hindle investigates 
word co-occurrences as mediated by syntactic struc- 
ture. A six-million-word sample of Associated Press 
news stories was parsed in order to construct a collec- 
tion of subject/verb/object instances. On the basis 
of these data, Hindle calculates a co-occurrence score 
(an estimate of mutual information) for verb/object 
pairs and verb/subject pairs. Table 1 shows some of 
the verb/object pairs for the verb drink that occurred 
more than once, ranked by co-occurrence score, "in 
effect giving the answer to the question 'what can you 
drink?' " \[Hindle, 1990\], p. 270. 
Word/word relationships have proven useful, but 
are not appropriate for all applications. For example, 
*This work was supported by the following grants: Alto 
DAAL 03-89-C-0031, DARPA N00014-90-J-1863, NSF IRI 90- 
16592, Ben Franklin 91S.3078C-1. I am indebted to Eric 
Brill, Henry Gleitman, Lila Gleitman, Aravind Joshi, Chris- 
tine Nakatani, and Michael Niv for helpful discussions, and to 
George Miller and colleagues for making WordNet available. 
Co-occurrence score \[ verb \[ object 
11.75 drink tea 
11.75 drink Pepsi 
11.75 drink champagne 
10.53 drink liquid 
10.20 drink beer 
9.34 drink wine 
Table 1: High-scoring verb/object pairs for 
drink (part of Hindle 1990, Table 2). 
the selectional preferences of a verb constitute a re- 
lationship between a verb and a class of nouns rather 
than an individual noun. 
3 Word/Class Relationships 
3.1 A Measure of Association 
In this section, I propose a method for discovering 
class-based relationships in text corpora on the ba- 
sis of mutual information, using for illustration the 
problem of finding "prototypical" object classes for 
verbs. 
Let V = {vl,v~,...,vz} andAf = {nl,n2,...,nm} 
be the sets of verbs and nouns in a vocabulary, and 
C = {clc C_ Af} the set of noun classes; that is, the 
power set of A f. Since the relationship being inves- 
tigated holds between verbs and classes of their ob- 
jects, the elementary events of interest are members 
of V x C. The joint probability of a verb and a class 
is estimated as 
rtEc Pr(v,c) 
E E (2) 
u'EV n~EJV " 
Given v E V, c E C, define the association score 
Pr( , c) A(v,c) ~ Pr(cl~ )log Pr(~)Pr(c) (3) 
= Pr(clv)I(v; c). (4) 
The association score takes the mutual information 
between the verb and a class, and scales it according 
327 
to the likelihood that a member of that class will 
actually appear as the object of the verb. 1 
3.2 Coherent Classes 
A search among a verb's object nouns requires at 
most I.A/" I computations of the association score, and 
can thus be done exhaustively. An exhaustive search 
among object classes is impractical, however, since 
the number of classes is exponential. Clearly some 
way to constrain the search is needed. I propose re- 
stricting the search by imposing a requirement of co- 
herence upon the classes to be considered. For ex- 
ample, among possible classes of objects for open, 
the class {closet, locker, store} is more coherent than 
{closet, locker, discourse} on intuitive grounds: ev- 
ery noun in the former class describes a repository 
of some kind, whereas the latter class has no such 
obvious interpretation. 
The WordNet lexical database \[Miller, 1990\] pro- 
vides one way to structure the space of noun classes, 
in order to make the search computationally feasi- 
ble. WordNet is a lexical/conceptual database con- 
structed on psycholinguistic principles by George 
Miller and colleagues at Princeton University. Al- 
though I cannot judge how well WordNet fares with 
regard to its psycholinguistic aims, its noun taxon- 
omy appears to have many of the qualities needed 
if it is to provide basic taxonomic knowledge for the 
purpose of corpus-based research in English, includ- 
ing broad coverage and multiple word senses. 
Given the WordNet noun hierarchy, the definition 
of "coherent class" adopted here is straightforward. 
Let words(w) be the set of nouns associated with a 
WordNet class w. 2 
Definition. A noun class e • C is coher- 
ent iff there is a WordNet class w such 
that words(w) N A/" = c. 
I A(v,c) l verb \[ object class \[ 
3.58 2.05 I drink drink \] /beverage' \[beverage .... \]~) { 
(intoxicant, \[alcohol .... J 
Table 2: Object classes for drink 
4 Preliminary Results 
An experiment was performed in order to discover the 
"prototypical" object classes for a set of 115 common 
English verbs. The counts of equation (2) were cal- 
culated by collecting a sample of verb/object pairs 
from the Brown corpus. 4 Direct objects were iden- 
tified using a set of heuristics to extract only the 
surface object of the verb. Verb inflections were 
mapped down to the base form and plural nouns 
mapped down to singular. 5 For example, the sen- 
tence John ate two shiny red apples would yield the 
pair (eat, apple). The sentence These are the apples 
that John ate would not provide a pair for eat, since 
apple does not appear as its surface object. 
Given each verb, v, the "prototypical" object class 
was found by conducting a best-first search upwards 
in the WordNet noun hierarchy, starting with Word- 
Net classes containing members that appeared as ob- 
jects of the verb. Each WordNet class w consid- 
ered was evaluated by calculating A(v, {n E Afln E 
words(w)}). Classes having too low a count (fewer 
than five occurrences with the verb) were excluded 
from consideration. 
The results of this experiment are encouraging. 
Table 2 shows the object classes discovered for the 
verb drink (compare to Table 1), and Table 3 the 
highest-scoring object classes for several other verbs. 
Recall from the definition in Section 3.2 that each 
WordNet class w in the tables appears as an ab- 
breviation for {n • A/'ln • words(w)}; for example, 
(intoxicant, \[alcohol .... \]) appears as an abbrevi- 
ation for {whisky, cognac, wine, beer}. 
As a consequence of this definition, noun classes 
that are "too small" or "too large" to be coherent are 
excluded, and the problem of search through an ex- 
ponentially large space of classes is reduced to search 
within the WordNet hierarchy. 3 
1 Scaling mutual information in this fashion is often done; see, e.g., \[l:tosenfeld and Huang, 1992\]. 
2Strictly speaking, WordNet as described by \[Miller, 1990\] does not have classes, but rather lexical groupings 
called synonym sets. By "WordNet class" I mean a pair (word, synonym-set ). 
ZA related possibility being investigated independently by Paul Kogut (personal communication) is assign to each noun 
and verb a vector of feature/value pairs based upon the word's classification in the WordNet hierarchy, and to classify nouns 
on the basis of their feature-value correspondences. 
5 Acquisition of Verb Properties 
More work is needed to improve the performance of 
the technique proposed here. At the same time, the 
ability to approximate a lexical/conceptual classifica- 
tion of nouns opens up a number of possible applica- 
tions in lexical acquisition. What such applications 
have in common is the use of lexical associations as 
a window into semantic relationships. The technique 
described in this paper provides a new, hierarchical 
4The version of the Brown corpus used was the tagged cor- pus found as part of the Penn Treebank. 
5Nouns outside the scope of WordNet that were tagged as 
proper names were mapped to the token pname, a subclass of classes (someone, \[person\] ) and (location, \[location\] ). 
328 
I A(v,c) I verb I object class 
1.94 ask 
0.16 call 
2.39 climb 
3.64 cook 
0.27 draw 
3.58 drink 
1.76 eat 
0.30 lose 
1.28 play 
2.48 pour 
1.03 pull 
1.23 push 
1.18 read 
2.69 sing 
(quest ion, \[question .... \] } 
someone, \[person .... \] } 
stair, \[step .... \] I I 
repast, \[repast .... \] ) 
cord, \[cord .... \] } 
(beverage, \[beverage .... \] } 
<nutrient, \[food .... \] } 
<sensory-faculty, \[sense .... \] } 
(part, \[daaracter .... \]) 
<liquid, \[liquid .... \] } 
(cover, \[coverin~ .... l} 
(button, \[button .... \] 
<writt en-mat eriai, \[writ in~ .... \] } 
(xusic, \[~ic .... \]) 
Table 3: Some "prototypical" object classes 
source of semantic knowledge for statistical applica- 
tions. This section briefly discusses one area where 
this kind of knowledge might be exploited. 
Diathesis alternations are variations in the way 
that a verb syntactically expresses its arguments 
\[Levin, 1989\]. For example, l(a,b) shows an in- 
stance of the indefinite object alternation, and 2(a,b) 
shows an instance of the causative/inchoative alter- 
nation. 
1 a. John ate lunch. 
b. John ate. 
2 a. John opened the door. 
b. The door opened. 
Such phenomena are of particular interest in the 
study of how children learn the semantic and syn- 
tactic properties of verbs, because they stand at the 
border of syntax and lexical semantics. There are nu- 
merous possible explanations for why verbs fall into 
particular classes of alternations, ranging from shared 
semantic properties of verbs within a class, to prag- 
matic factors, to "lexieal idiosyncracy." 
Statistical techniques like the one described in this 
paper may be useful in investigating relationships be- 
tween verbs and their arguments, with the goal of 
contributing data to the study of diathesis alterna- 
tions, and, ideally, in constructing a computational 
model of verb acquisition. For example, in the experi- 
ment described in Section 4, the verbs participating in 
"implicit object" alternations 6 appear to have higher 
association scores with their "prototypical" object 
classes than verbs for which implicit objects are dis- 
allowed. Preliminary results, in fact, show a statis- 
tically significant difference between the two groups. 
eThe indefinite object alternation \[Levin, 1989\] and the specified object alternation \[Cote, 1992\]. 
Might such shared information-theoretic properties of 
verbs play a role in their acquisition, in the same way 
that shared semantic properties might? 
On a related topic, Grim_shaw has recently sug- 
gested that the syntactic bootstrapping hypothe- 
sis for verb acquisition \[Gleitman, 1991\] be ex- 
tended in such a way that alternations such as the 
causative/inchoative alternation (e.g. 2(a,b)) are 
learned using class information about the observed 
subjects and objects of the verb, in addition to sub- 
categorization information. 7 I hope to extend the 
work on verb/object associations described here to 
other arguments of the verb in order to explore this 
suggestion. 
6 Conclusions 
The technique proposed here provides a way to study 
statistical associations beyond the level of individ- 
ual words, using a broad-coverage lexical/conceptual 
hierarchy to structure the space of possible noun 
classes. Preliminary results, on the task of discover- 
ing "prototypical" object classes for a set of common 
English verbs, appear encouraging, and applications 
in the study of verb argument structure are appar- 
ent. In addition, assuming that the WordNet hier- 
archy (or some similar knowledge base) proves ap- 
propriately broad and consistent, the approach pro- 
posed here may provide a model for importing basic 
taxonomic knowledge into other corpus-based inves- 
tigations, ranging from computational lexicography 
to statistical language modelling. 

References 
\[Cote, 1992\] Sharon Cote. Discourse functions of two 
types of null objects in English. Presented at the 66th 
Annual Meeting of the Linguistic Society of America, 
Philadelphia, PA, January 1992. 
\[Gleitman, 1991\] Lila Gleitman. The structural sources 
of verb meanings. Language Acquisition, 1, 1991. 
\[Hindle, 1990\] Donald Hindle. Noun classification from 
predicate-argument structures. In Proceedings of the 
~Sth Annual Meeting of the ACL, 1990. 
\[Levin, 1989\] Beth Levin. Towards a lexical organization 
of English verbs. Technical report, Dept. of Linguistics, 
Northwestern University, November 1989. 
\[Miller, 1990\] George Miller. Wordnet: An on-line lexical 
database. International Journal o\] Lexicography, 4(3), 
1990. (Special Issue). 
\[Rosenfeld and Huang, 1992\] Ronald Rosenfeld and Xue- 
dong Huang. Improvements in stochastic language 
modelling. In Mitch Marcus, editor, Fifth DARPA 
Workshop on Speech and Natural Language, February 
1992. Arden House Conference Center, Harriman, NY. 
