Supervised Learning of Lexical Semantic Verb Classes 
Using Frequency Distributions 
Suzanne Stevenson 
Rutgers Umverslty 
suzanne©cs rutgers edu 
Paola Merlo 
Umverslty of Geneva 
merlo©lettres unlge ch 
Natalia Kariaeva 
Rutgers Umverslty 
karlaeva@rcl rutgers edu 
Kamin Whitehouse 
Rutgers Umverslty 
kamlnw©rcl rutgers edu 
Abstract 
Vve zeport a number of computatmnal ex- 
periments m supervised learning whose 
goal Is to automatmally classify a set of 
verbs into lexmal semanUc classes, based 
on frequency dlstnbutmn approxlmatmns 
of grammatical features extracted from a 
very large annotated corpus DlstnbuUons 
of five syntactic features that approximate 
tranmUvlty alternatmns and thematic role 
assignments are sufficient to reduce error 
rate by 56% over chance We conclude 
that corpus data is a usable repository of 
verb class mformatmn, and that corpus- 
driven extraction of grammaUcal features 
Is a promising methodology for automatm 
lexmal acqum,Uon 
1 Introduction 
Recent years have witnessed a shift in grammar de- 
velopment methodology, from crafting large gram- 
mars, to annotation of corpora Correspondingly, 
there has been a change from developing rule-based 
parsers to developing statmUcal methods for reduc- 
ing grammatmal knowledge from annotated corpus 
data The shift has mostly occurred because build- 
mg w~de-coverage grammars is ume-consummg, er- 
ror prone, and difficult The same can be said for 
crafting the rich lexlcal representatmns that are a 
central component of hngmstlc knowledge, and re- 
search m automaUc lexmal acquisition has sought 
to address this ((Doff and Jones, 1996, Dorr, 1997), 
among others) Yet there have been few attempts to 
learn fine-grained lexical classifications from the sta- 
tlsUcal analysis of dlstnbutmnal data, analogously 
to the induction of syntacUc knowledge (though see, 
e g, (Brent, 1993, Klavans and Chodorow, 1992, 
Resmk, 1992)) In this paper, we propose such a~ 
approach for the automaUc classfficauon of ~erbs 
into lexlcal semantic classes l 
We can express the Issues raised by this apploach 
as follows 
1 Whmh hngulstlc dlstmcUons among \[exlcsl 
classes can we expect to find m a corpus ~ 
2 How easily can we extract the frequency distri- 
butions that approximate the relevant hngmstlc 
properttes? 
3 Which frequency dlstnbuUons work best to dis- 
tinguish the verb classes ~ 
In exploring these quesUons, we focus on verb clas- 
slficaUon for several reasons Verbs are very impor- 
tant sources of knowledge in many language engi- 
neering tasks, and the relationships among verbs ap- 
pear to play a major role m the orgamzatmn and use 
of this knowledge Knowledge about verb classes is 
crucml for lex,cal acqmsltton m support of language 
generation and machine translatmn (Dolt, 1997) and 
document cl~sfficatmn (Klavans and Kan, 1998), 
yet manual classfficauon of large numbers of verbs is 
a difficult and resource intensive task (Levm, 1993 
Miller et al , 1990, Dang et al, 1998) 
To address these issues, we suggest that one can 
tram an automatic classffier for verbs on the basts of 
staUstmal approxlmaUons to verb dlatheses We use 
dlatheses--alternatmns m the expression of the ar- 
guments of the verb--following Levm and Dorr, for 
two reasons Fnst, verb dlatheses are syntacuc cues 
1 We are aware that a dlstnbutmnal approach rests on 
one strong assumptmn regarding the nature of the repre- 
sentatmns under study semantic notmns and syntacuc 
notmns are correlated, at least m part This assurapuon 
is under debate (Bnscoe and Copestake, 1995, Levm, 
1993, Dorr and Jones, 1996, Dorr, 1997), but we adopt 
~t here without further dlscussmn 
15 ' • 
to semantic classes, hence they can be more easily 
captured by corpus-based techniques Second, using 
verb d~atheses reduces no,se There ~s a certain con- 
sensus (Bnscoe and Copestake, 1995, Pustejovsky, 
1995, Palmer, 1999) that verb dmtheses are regular 
sense extensmns Hence focussing on thin type of 
classfficatmn allows one to abstract from the prob- 
lem of word sense dmamb,guatmn and treat remdual 
d~fferences m word senses as no~se m the classffica- 
tmn task 
We present an m-depth case study, m which we 
apply machine learning techmques to automaUcally 
classify a set of verbs based on d~stnbutmns of gram- 
maucal indicators of dmtheses, extracted from a 
very large corpus We look at three very mterest- 
mg classes of verbs unergaUves, unaccusauves, and 
obJect-drop verbs (Levm, 1993) These are Interest- 
mg classes because they all parUcapate m the trans~- 
uvlty alternatmn, and they are minimal parrs - that 
as, a small number of well-defined dmtmctmns d~ffer- 
entmte their trans,tlve/mtranmUve behavmr Thus, 
we expect the differences m their dmtnbuttons to be 
small, entailing a fine-grained dlscr,mmaUon task 
that prowdes a challenging testbed for automatic 
classfficatmn 
The specffic theoretical questmn we mvesUgate ~s 
whether the factors underlying the verb class dm- 
tmctmns are reflected m the statmttcal dmtnbutmns 
of lex~cal features related to dmtheses presented by 
the md,v~dual verbs m the corpus In doing th~s, we 
address the questmns above by determining what are 
the lexmal features that could d~stmgmsh the behav- 
tor of the classes of verbs w~th respect to the relevant 
dmtheses, ~hmh of those features can be gleaned 
from the corpus, and which of those, once the sta- 
Ustmal dmtnbutmns are available, can be used suc- 
cessfully by an automatic classifier 
In m~ttal work (Stevenson and Merlo, 1999), ~e 
found that hngmstlcally motivated features that d~s- 
tmgmsh the verb classes can be extracted from an 
annotated, and m one case parsed, corpus These 
features are sufficient to almost halve the error 
rate compared to chance (45% reductmn) m auto- 
maUc verb classtficaUon, suggesting that d~stnbu- 
Uonal data prowdes knowledge useful to the class~- 
ficaUon of verbs The focus of our original stud~ 
was tho demonstration m prmctple of l~a.nmg verb 
classes from frequency d~stnbutmns ofsyntactm fea- 
tures, and an analysm of the relaUve contrtbutmn of 
the various features to learmng Th~s paper turns 
to the nnportant next steps of rephcatmg our find- 
rags using other training methods and learning al- 
gorithms, and analyzing the performance on each of 
tbe three classes of verbs This more detailed anal- 
ys~s of accuracy within each class m turn leads to 
the development of a new dlstrtbutmnal feature m- 
tended to improve dlscnmmabthty among t~o of the 
classes The addltmn of the ne~ feature successfully 
reduces the error rate of out mltml results m classl- 
ficatmn by 19%, for a 56% overall reductmn m error 
rate compared to chance 
2 Determining the Features 
In this sectmn, we present mouvatmn for the mttml 
features that we mvesUgated m terms of their role 
m learmng the verb classes We first present the 
hngmstlcally den~ed features then turn to e~tdence 
from experimental psychohngutstlcs to e\tend the 
set of potentially relevant features 
2.1 Features of the Velb Classes 
The three verb classes under mvesugatmn - unerga- 
Uves, unaccusaUves, and object-drop -differ m the 
properties of their translttve/mtranslhve a\[terna- 
Uons, which are exemphfied below 
UnergaUve 
(la) The horse raced past the barn 
(lb) The jockey raced the horse past the barn 
Wnaccusatave 
(2a) The butter melted m the pan 
(2b) The cook melted the butter m the pan 
ObJect-drop 
(3a) The boy washed the hall 
(3b) The boy washed 
The sentences m (1) use an unergatwe velb. ,accd 
Unergatlves are mttansluve actmn verbs whose tran- 
sttlve form is the causattve counterpart of the m- 
transluve form Thus, the subject of the intransi- 
tive (la) becomes the object of the translh~e (lb) 
(Brousseau and Rltter 1991, Hale and ke~set 1993 
Levm and Rappaport Ho~,av, 1995) The sentences 
m (2) use an unaccusaUve verb, melted Lnac- 
cusatlves are intransitive change of state ~et bs (2a) 
hke unergauves, the translu~e counterpart for the.,e 
verbs ts also causative (2b) The sentence~ m (3) 
use an object-dtop verb washed, the~e ','elt:,~ haxe a 
non-causaU~e tran'~ltl~,e/intransltl~,,e al\[eln¢ltton in 
~ hlch the object is sm~pl~ opttonal 
Both unergauves and unaccusatl~es \[la~e a 
causattve trans~u~e form, but differ m the semanuc 
roles that they assign to the paructpants m the e~ent 
described In an mtranstUve unetgaUve, the ',ubject 
ts an 4.gent Ithe doer of the e~ent), and m an Intran- 
sitive unaccusaUve, the subject ts a Theme (~ome- 
thing affected by the e~ent) The role assignments to 
the corresponding semanuc arguments of the ttan- 
s~u~e forms--I e, the dnect objects--a~e the ~ame 
16 
with the addition of a Causal Agent (the causer of 
the event) as subject in both cases Object-drop 
verbs simply assign Agent to the subject and Theme 
to the optional object 
We expect the differing semantic role assignments 
of the verb classes to be reflected m their syntac- 
tic behavior, and consequently in the distributional 
data we collect from a corpus The three classes can 
be characterized by their occurrence in two alter- 
nations the transittve/mtrans~tive alternation and 
the causative alternation Unergatives are distin- 
guished from the other classes m being rare in the 
transitive form (see (Stevenson and Merlo, 1997) for 
an explanation of this fact) Both unergatives and 
unaccusatives are dlstmgmshed from obJect-drop m 
being causative in their transitive form, and sun- 
darly we expect this to be reflected in amount of 
detectable causative use Furthermore, since the 
caus&tlve is a transitive use, and the transitive use of 
unergatlves is expected to be rare, causativity should 
primarily distinguish unaccusatlves from object- 
drops In conclusion, we expect the defining features 
of the verb classes--the intransitive/transitive and 
causative ~lternatlons--to lead to distributional dif- 
ferences m the observed usages of the verbs in these 
alternations 
2 2 Psychollngmst~cally Relevant Features 
The verbs under study not only differ in their 
thematic properties, they also differ in their pro- 
cessmg properties Because these verbs can occur 
both in a trans~tive and an intransitive form, they 
have been particularly studied in the context of the 
mare verb/reduced relative (MV/I:tR) ambiguity il- 
lustrated below (Bever, 1970) 
The horse raced past the barn fell 
The verb ~aced can be interpreted as either a past 
tense main verb, or as a past participle w~thm a re- 
duced relative clause (l e , the horse \[that was\] raced 
past the barn) Because fell is the main verb, the le- 
duced relative lnterpretatmn of raced is required for 
a coherent analysis of the complete sentence But 
the main verb interpretation of raced is so strongly 
preferred that people experience great difficulty at 
the verb fell, unable to integrate it with the inter- 
pretation that has been developed to that point 
However, the reduced relative interpretation is not 
difficult for all verbs, as in the follo~mg example 
The boy washed in the tub was angry 
The difference in ease of interpreting the lesolu- 
tions of this ambiguity has been shown to be sen- 
sitive to both frequency differentials (MacDonald 
1994, Trueswell, 1996) and to verb class d~stmctmns 
(Stevenson and Merlo, 1997, Flhp et al , 1999) 
Consider the features that d~stmguish the t~o res- 
olutions of the M\,/RR ambiguity 
MV The horse raced past the barn quickly 
RR The horse raced past the barn fell 
In the main verb resolution, the ambiguous ~erb 
raced is used in its intransitive form, while in the re- 
duced relative, it is used in its transitive, causative 
form These features correspond directly to the 
defining alternations of the three verb classes un- 
der study (intransitive/transitive, causative) ~,ddl- 
tionally, we see that other related features to these 
usages serve to distinguish the two resolutions of the 
ambiguity The mare verb form Is active and a mare 
verb part-of-speech (labeled as VBD by automatic 
POS taggers), by contrast, the reduced relative foim 
is passive and a past partic~ple (tagged as \ BN) 
Since these features (active/passive and VBD/VBN) 
are related to the intransitive/transitive alteination, 
we expect them to also exhibit d~stributloaal differ- 
ences among the verb classes Specifically, ~e expect 
the unergatives to yield a higher proportion of act~ e 
and "vBD usage, since, as noted above, the transitive 
use of unergatwes is rare 
3 Frequency Distributions of the 
Features 
We assume that currently available large cotpoLa 
are a reasonable approximation to language (Pul- 
lum, 1996) Using a combined corpus of 65-mllhon 
words, we measured the relative frequenc) distribu- 
tions of the four linguistic features (VBD/~ BN ac- 
tive/passive, Intransitive/transitive, causative/non- 
causative) over a sample of verbs from the three lex- 
tcal semantic classes 
3 1 Materials 
~e chose a set of 20 verbs from each class based pll- 
maidy on the classfficatlon of verbs m (Le~ m 1993) 
(see Appendl~ ~) The uneigatlves ale maanei oI 
motion verbs The unaccusatl~es ale ~erbs of~haage 
of state The object-drop verbs are unspecified ob- 
ject alternation verbs The ~e~bs ~ere sele~Led flora 
Lenin's classes based on their absolute fiequenc} 
Ful thermore, they do not generally sho~ ma~l~ e de- 
paitures from the intended verb sense m the cotpu~ 
(Though note that there are only 19 unaccu~atlxes 
because ,zpped, ~hlch ~as initially counted m the 
unaccusatives, was then excluded from the aaal~- 
sis as It occurred mostly in a different usage m the 
corpus, as a velb plus paltlcle ) Most of the vetb~ 
can occur m the transitive and in the passive Each 
~erb presents the ~ame folm m the simple pa~t and 
m the past palticlple In order to smlphf~ the ~ouat- 
17 
mg procedure, we made the assumptron that counts 
on this single verb form would approximate the dis- 
tribution of the features across all forms of the verb 
Most counts were performed on the tagged versron 
of the Brown Corpus and on the portion of the Wall 
Street Journal distmbuted by the ACL/DCI (years 
1987, 1988, 1989), a combined corpus m excess of 
65 mdhon words, with the exceptmn of causatrv- 
lty which was counted only for the 1988 year of the 
WSJ, a corpus of 29 million words 
3 2 Method 
We counted the occurrences of each verb token in 
a transrtlve or mt~ansltr~e use (INTR), m an active 
or passive use (ACT), rn a past pamcrple or smaple 
past use (VBD), and in a causative or non-causative 
use (CAUS) More precrsely, features were counted 
as follows 
INTR a verb occurrence was counted as transrtlve 
if rmmediately followed by a nominal group, else rt 
was counted as mtransitrve 
ACT mare verbs (tagged VBD) were counted as 
actrve, participles (tagged V BN) counted as actrve ff 
the closest preceding auxiliary was have, as passive 
ff the closest preceding auxiliary was be 
VBD occurrences tagged VBD were simple past, 
VBN were past participle 
(Each of the above three counts was normalized 
over all occurrences of the verb, yielding a single 
relative frequency measure for each verb for that fea- 
ture ) 
CAUS The causative feature was approximated by 
the followmg steps Frrst, for each verb, all cooc- 
currmg subjects and objects were extracted from 
a parsed corpus (Colhns, 1997) Then the propor- 
tmn of overlap between the two multrsets of nouns 
was calculated, meant to capture the causative al- 
ternation, ~here the subject of the mtransrtrve can 
occur as the object of the trans~trve Vve define 
overlap as the largest multiset of elements belong- 
mg to both the subjects and the object multisets, 
eg {a,a,a,b}(3 {a} = {a,a,a} The proportron 
is the ratio between the o~erlap and the sum of the 
subject and object multrsets (For example, for the 
rumple sets above, the ratio would be 3/5 or 60 ) 
All ra~ and normahzed corpus data ale a~adable 
from the authors, and more detarl concerning data 
collectron can be found m (Stevenson and Merto, 
1999) 
4 Experiments in Verb Classification 
The frequency drstnbutrons of the verb alternatmn 
features yield a vector for each verb that represents 
the relative frequency values for the verb on each 
drmensron, the set of 59 vectors constrtute the data 
for our machine learmng experiments 
Template \[verb, VBD, ACT, INTR, CADS, class\] 
Example \[opened, 79, 91, 31, 16, unacc\] 
Our goal was to determine whether automatm clas- 
sfficatlon techniques could determine the class of a 
verb from the distributional propertms represented 
m this vector 
In related work (Stevenson and Merlo, 1999) ~e 
describe initial unsupervised and supervised lealnmg 
experiments on this data, and discuss the contllbu- 
tlon of the four different features (the frequenc.~ dis- 
tributions) to accurac~ m verb classfficatlon In thzs 
paper, we extend the work in several ~ays Fu~t, ~e 
report further analysis of rephcauons of our mmal 
supervised learning results Next, we demonstrate 
srmdar performance using different training methods 
and learning algorithms, mdmatmg that the perfor- 
mance rs Independent of the particular learning ap- 
proach Furthermore, these addrtronal e~penments 
allow us to evaluate the performance separately on 
each of the three verb classes Finally, based on tins 
evaluation, we suggest a new feature to better drs- 
tmgmsh the thematic propertms of the classes, and 
present experimental results showing that its use rm- 
proves our original accuracy rate 
4.1 Initml Experiments 
Imtial experiments were carried out using a decrsron 
tree induction algorithm, the C5 0 system avadable 
from http//www rulequest corn/ (Qumlan, 1992), 
to automatmally create a classfficatron program flora 
a training set of verb vectois with known classffica- 
tron 2 In our earhei experiments ~e ran \[0-fold 
cross-vahdatrons repeated 10 times hele ~e repeat 
the ctoss-vahdatrons 50 tmles, and the numbeis te- 
polted are averages over all the tuns 3 
Table 1 shows the results of our experiments on 
the four features we counted m the corpora (x BD 
ACT, INTR, CAUS), as well as all three-feature subsets 
of those four The basehne (chance) performance m 
th~s task rs 33 8%, since thele are 59 ~ectors and 
~The s~stem generates both declsmn trees aml rule 
sets for use m classfficatmn Since the d~fferencc m pet- 
formance between the t~o zs ne~er s~gmficant ~xe repoKt 
here Jab the results using the extracted rules The rules 
provide a confidence level foz each classfficatmn ~ hmh 
Is unavailable with the decmon tree data structure 
3A 10-fold cross-vahdatmn means that the s~stem 
randomly d~vldes the data into 10 parts, and runs 10 
t~mes on a different 90%-tralmng-data/10%-test_data 
spht, ymldmg an average accuracy and standard enor 
Th~s procedure is then repeated for 50 different random 
dlvlsmns of the_ data and accurac3 and standard eIror 
are agam averaged across the 50 runs 
18 
Features Acc% SE% 
VBD ACT INTR CAUS 63 7 0 6 
VBD INTR CAUS 62 7 0 6 
ACT INTR CAUS 59 9 0 5 
VBD ACT CAUS 56 8 0 5 
VBD ACT INTR 54 5 0 5 
f, 
Table 1 Percentage Accuracy (Acc%) and Standard 
Error (SE%) of C5 0 (33 8% baselrne) 
3 possible classes (That is, assigning one of the 
two most common classes--of 20 verbs each--to all 
cases would ymld 20 out of 59 correct, or 33 8% ) As 
seen m the table, classrficatmn based on the four fea- 
tures performs at 63 7%, or 30% over chance The 
true mean of the sample cross-vahdatlons lies wd, hm 
plus or remus two standard errors of the reported 
mean (dr=49, t=2 01, p< 05) In all cases, the range 
is plus or mmus I0 or 12, yreldmg a very nat- 
row predrcted accuracy range Furthermore, we per- 
formed t-tests comparing the results of the 50 cross- 
vahdatmns for each of the different feature subsets 
All pairs were srgmficantly different (p< 05) except 
for the results using all four features (first row m the 
table) and those excluding ACT (second row m the 
table) We conclude that all features except ACT 
contribute posrtlvely to classrficatmn performance, 
and that ACT does not degrade performance In our 
rephcatrons, then, we focus on all four features 
4 2 Rephcatmn with Different Training and 
Learning Methods 
There are conceptual and practical reasons for in- 
vestigating the performance of other training ap- 
proaches and learning algorithms applied to our verb 
distribution data Conceptually, it is desrrable to 
know whether a particular learning algorithm or 
training techmque affects the level of performance 
Practically, drfferent methods enable us to evalu- 
ate more easily the performance of the classification 
method within each verb class (When we run re- 
peated cross-validations with t keg.C5 0..system, we 
don't have access to the accuracy rage for each class, 
the system only outputs an overall mean error rate ) 
To preview, we find that the different training and 
learning methods we tried all, gave similar perfor- 
mance to our original results, and m addltron al- 
lowed us to evaluate the accuracy wlthrn each verb 
class 
In one set of experiments, we used the same C5 0 
system, but employed a training and testing method- 
ology that used a single hold-out case We held 
out a single verb vector, trained on the remaining 
,58 cases, then tested the resulting classffier on the 
II Classes 
\[\[ All Classes 
I Unergatv~e 
Unaccusatwe 
I ObjectDrop 
Percent ~ccuracy 
61 0 
75 0 
57 9 
50 0 
Table 2 Percentage Accuracy of C5 0 With Single 
Hold-Out Training 
single hold-out case, and recorded the collect and 
assigned classes for that verb Tius was then ze- 
peated for each of the 59 verbs This approach ~ raids 
both an overall accuracy rate (when the results are 
averaged across all 59 trials), as well as pio~ldmg 
the data necessary for determining accuracy fol each 
verb class (because we have the classification of each 
verb when It is the test case) The results ale pre- 
sented m Table 2 The overall accuracy IS a little less 
than that achieved with the 10-fold cross-validation 
methodology (61 0% versus 63 7%) However, we 
can see clearly now that the unergatlve verbs ate 
dassffied with much greater accuracy (75%), Mule 
the unaccusatwe and obJect-drop verbs are classified 
with much lower accuracy (57 9% and 50% respec- 
trvely) The distributional features we have appear 
to be much better at dmtmgmshmg unergatwes than 
unaccusatlve or obJect-drop verbs 
To test thrs drrectly under our original tiammg 
assumptrons, we ran two different experiments, u~- 
mg 10-fold cross-vahdation repeated 10 time~ The 
first experiment tested the abdit:~ of the classifier to 
distinguish between unergatlves and the other t~o 
verb types, wrthout having to distinguish bct~een 
the latter two The data included the 20 unerga- 
rive ,,erbs and a random sample of 10 unaccusatave 
and 10 obJect-drop verbs, 10 different random ~am- 
pies were selected to form 10 such data sets In 
these data sets, the ~erbs were labeled as unerga- 
tire or "of;her" The baseline (chance) classzficatmn 
accuracy for this data is 50%, the mean accmac~ 
achmved across all data sets was 78 5% (standard el- 
lot 0 8%), a srzable improvement o~er chance The 
second expeim~ent ~as intended to det, etmme ho~ 
well the classifier can dlstmgm~h.unaccusatl~e from 
object-drop verbs The data consisted of one ~et 
that included all the unaccusative and object-drop 
verbs, with no unergatives Because there ate only 
i9 unaceusauve verbs, the basehne accuracy late is 
51% (20/39), here the classifier achieved an accuracy 
only slightly above chance, at 58 3% (standard elror 
1 8%) These results, summarized in Table 3 clearly 
confirm the higher accuracy of classifying uneigatlvo 
verbs with the current feature set 
This pattern of results ~as repeated under a ~oi3 
19 
Classes Acc% SE% \[I 
Unergatlve vs Other 78 5 0 8 \] 
Unaccusatlve vs ObjectDrop 58 3 1 8 I 
Table 3 Percentage Accuracy (Ace%) and Standard 
Error (SE%) of C5 0 (50-51% baseline) 
II Cl es \[PCA%\[ FMP% II 
\[l All Classes \[ 65 0 1 63 9 II 
Unergatlve 85 0 71 7 
Unaccusative 60 0 55 0 
ObjectDrop 50 0 65 0 
Table 4 Percentage Accuracy of PCA (PCA%) and 
Feature Map (FMP%) Neural Networks 
different type of learning algorithm as well We per- 
formed a set of neural network experiments, using 
NeuroSolutlons 3 0 (see http//www nd corn), and 
report here on the networks that achieve the best 
performance on our data These are principal com- 
ponents analysis and automatic feature map net- 
works, which are essentially feed-forward percep- 
trons with pre-processmg units that transform the 
existing features rata a more useful format In our 
tests, both methods performed best overall when 
there were no hidden layer units, and the networks 
were trained for 1000 epochs The mean accuracy 
rates of 10-fold cross-validations with these param- 
eter settings are summarized in Table 4 Again, the 
overall percentage accuracy is in the low sixties, with 
better performance on the unergattves than on the 
other two verb classes, the difference was particu- 
larly striking with the PCA networks This overall 
pattern doesn't change with further training, in fact, 
training up to 10,000 epochs resulted in very low 
accuracy (of 45%) for either unaccusatives, object- 
drops, or both 
To summarize, following a different training ap- 
proach with C5 0 (the single hold-out method), and 
applying very different learning approaches (two 
kinds of neural networks), resulted in mmllai o~er- 
all performance to our original C5 0 results This 
indicates that the accurac3 achieved is at lea.st 
somewhat independent of specific learning or train- 
Ing techniques Moreover, these different methods, 
along with experiments directly testing unergative 
versus unaccusatlve/object-drop classification, allow 
us to examine more closely where the resulting clas- 
sifters have the most serious problems In all cases, 
the accuracy is best for unergattves, and the accu- 
racy of unaccusatives, object-drops, or both, is de- 
graded If this performance is indeed a reliable mdi- 
Classes 
Unerg vs Unacc 
Uaerg vs ObjDrop 
Unacc vs ObjDrop 
I vBo I AcT I INTR I CAI'S II 
ns ns ** * 
*** p< 001 
** p< 01 
* p_< 05 
as non-significant 
Table 5 Significance Levels of T-Tests Comparing 
Feature Values Between Verb Classes 
cation of the inherent dtscnmmabd~ty of tile dastn- 
butlonal data, then we must examine more closely 
the properties of the data itself to understand (and 
potentially improve) the performance 
4 3 Dsscrlmmatmg Unaccusative and 
ObJect-Drop Verbs 
To understand why the data discriminates unerga- 
ttves reasonably well, but not unaccusatlves and 
object-drops, we need to directly test the discnm- 
inabilityof the features across the classes We do so 
by using t-tests to compare the values of the differ- 
ent features--VBD, ACT, INTR., CAUS--for unergattve 
and unaccusattve verbs, unergatlve and object-drop 
verbs, and unaccusatlve and object-drop verbs In 
each case, the t-test is giving the likelihood that the 
two sets of values--e g, the VBD feature values for 
unergatives and for unaccusatives--are dra~n from 
different populations Table 5 shows that all sets of 
features are significantly different for unergatlve and 
unaccusattve verbs, and for unergattve and object- 
drop verbs Ho~ever, only INTR. and CAUS ate slg- 
mficantly different for unaccusattve and object-dtop 
verbs, indicating that we need additional featules 
that have different values across these two classes 
In Section 2 1, we noted the differing semantic role 
asmgnments for the verb classes, and hypothesized 
that these differences would affect the expression of 
syntactic features that ate countable in a corpus 
For example, the c ~bs feature approximates sen\]an- 
tic role reformation b.~ encoding the oxerlap beh~een 
nouns that can occur m the ~ubject and object po- 
sitions of a cau~ative xetb Here x~e suggest another 
feature, that of ammacy of subject, that is intended 
to distinguish nouns that receive an Agent role flora 
those that receive a Theme role Recall that object- 
drop verbs assign Agent to their subject in both the 
transitive and intransitive alternations, while unac- 
cusattves assign Agent to their subject only in the 
transitive, and Theme m the intransitive We expect 
then that object-drop verbs will occur more often 
with an animate subject Note again that ~e are 
20 
II Features \[Acc% SE% II 
I VBD ACT INTR CAUS I 63 7 0 6 \] 
VBD ACT INTR CAUS PRO 70 7 0 4 
Table 6 Percentage Accuracy (Acc%) and Standard 
Error (SE%) of C5 0, W~th and W~thout New PRO 
Feature, All Verb Classes (33 8% basehne) 
making use of frequency dmtnbutmns--the clatm ~s 
not that only Agents can be ammate, but rather that 
nouns that receive the Agent role will more often be 
ammate than nouns that receive the Theme role 
A problem w~th a feature hke ammacy ~s that ~t 
requires etther manual determmatmn of the antmacy 
of extracted subjects, or reference to an on-hne re- 
source such as WordNet for determining ammacy 
To approximate ammacy w~th a feature that can be 
extracted automatically, and w~thout reference to a 
resource external to the corpus, we instead count 
pronouns (other than ~t) m subject positron The 
assumptmn ~s that the words I, we, you, she, he, 
and they most often refer to ammate ent~tms The 
values for the new feature, P~.O, were determined 
by automatmally extracting all subject/verb tuples 
including our 59 examples verbs (from the WSJ88 
parsed corpus), and computing the ratm of occur- 
rences of pronouns to all subjects 
We again apply t-tests to our new data to deter- 
mine whether the sets of PRo values d~ffer across 
the verb classes Interestingly, we find that the Prto 
values for unaccusat~ve verbs (the only class to as- 
s~gn Theme role to the sub tect m one of tts alterna- 
tmns) are s~gmficantly dtffe~ent from those for both 
unergatlve and object-drop verbs (p< 05) More- 
over, the PRo values for unergat~ve and object-drop 
verbs (whose subjects are Agents m bo~h alterna- 
tmns) are not s~gmficantly d~fferent Th~s pattern 
confirms the abd~ty of the feature to capture the 
thematm d~stmctmn between unaccusat~ve verbs and 
the other two classes 
Table 6 shows the result of applying C5 0 (10-fold 
eross-vahdatmn repeated 50 t~mes) to the three-x~ay 
classfficatmn task using the PRo feature m conjunc- 
tmn w~th the four previous features ~.ccuracy ran- 
proves to over 70%, a teductmn m the error rate of 
almost 20% due to th~s single nex~ feature Mote- 
over, classifying the unaccusat~ve an2 object-drop 
verbs using the new feature m conjunctmn w~th the 
prevmus four leads to accuracy of over 68% (com- 
pared to 58% w~thout PRo) We conclude that this 
feature ~s ~mportant in d~stmgmshlng unaccusat~ve 
and object-drop verbs, and hkely contributes to the 
tmprovement m the three-way classtficatton because 
of th~s Future work wdl examine the performance 
w~thm the verb classes of th~s new set of features to 
see whether accuracy has also tmproved for unerga- 
tire verbs 
5 Conclusions 
In thin paper, we have presented an m-depth case 
study, m whmh we investigate varmus machine learn- 
mg techmques to automatically classify a set of 
verbs, based on dlstnbutmnal features extracted 
from a very large corpus Results show that a small 
number of hngmstlcally motivated grammatical fea- 
tures are sufficmnt to reduce the error rate by mote 
than 50% over chance, acluevmg a 70% acctuacy 
rate m a three-way classfficatmn task Tins leads 
us to conclude that corpus data is a usable reposi- 
tory of verb class mformatmn On one hand ~e ob- 
serve that semantlc propemes of verb classes (such 
as causatlvlty, or ammacy of subject) may be use- 
fully approximated through countable syntactic fea- 
tures Even with some noise, lexmal propertms are 
reflected m the corpus robustly enough to positively 
contribute m classlficatmn On the other hand, how- 
ever, we remark that deep hngumtm analysis cannot 
be ehmmated--m our approach, it is embedded m 
the selection of the features to count We also think 
that using hngumtlcally motivated features makes 
the approach very effective and easdy scalable we 
report a 56% reductmn m error rate, w~th only five 
features that are relatwely straightforward to count 
Acknowledgements 
This research was partly sponsored by the S~ lss Na- 
tmnal Scmnce Foundatmn, under fello~slup 8210- 
46569 to Paola Merlo, by the US Natmnal Scmnce 
Foundatmn, under grants #9702331 and #9818322 
to $uzanne Stevenson, and by the Infotmatton Sci- 
ences Councd of Rutgers Umverslty ~,~,e thank 
Martha Palmer for getting us started on tlus ~ork 
and Mmhael Colhns for gwmg us access to the out- 
put of his parser We gratefully acknowledge the 
help of Ixlva Dickinson, ~ho calculated no~mahza- 
tmns of the corpus data 
Appendix A 
The une~gatx~es are manner of morton ~erbs jumptd 
rushed, malched, leaped floated, laced, huslwd uan- 
dered, vaulted, paraded, galloped, gl,ded, hzked hopped 
jogged, scooted, ncurlzed, ~kzpped, hptoed, trotted 
The unaccusau~es are verbs of change of state 
opened, exploded, flooded, dzs~olved, cracked, hardened 
bozled, melted, .fractured, ,ol,dzfied, collapsed cooled 
folded, w~dened, changed, clealed, dzwded, ~,mmered 
stabdzzed 
The object-dlop verbs are unspecffied object altel- 
natron verbs played, painted, k,cked, carved, reaped, 
washed, danced, yelled, typed, kmtted bolrowed mhet- 
21 
tted, organtzed, rented, sketched, cleaned, packed, stud- 
ted, swallowed, called 

References 
Thomas G Bever 1970 The cogmtwe basis for hngms- 
tlc structure In J R Hayes, e&tor, Cognttson and 
the Development of Language John Wdey, New York 

Michael Brent 1993 From grammar to le~con Un- 
supervmed learmng of \[ex~cal syntax Computational 
Linguistics, 19(2) 243-262 

Edward Bnscoe and Ann Copestake 1995 Lex~cal rules 
m the TDFS framework Techmcal report, Acquflex- 
I I Working" Papers 

Anne-Marm Brousseau and Ehzabeth R~tter 1991 A 
non-umfied analysis of agent~ve verbs In West Coast 
Conference on Formal Lmgutstzcs, number 20, pages 
53-64 

M~chael John Colhns 1997 Three generaUve, lexa- 
cahsed models for statistical parsmg In Proc of the 
~5th Annual Meeting of the ACL, pages 16-23 

Hoa Trang Dang, Kann K~pper, Martha Palmer, and 
Joseph Rosenzwe~g 1998 Investtgatmg regular sense 
extenmons based on mteresecttve Levm classes In 
Proc of the 361h Annual Meeting of the ACL and 
the 171h \[nternatwnal Conference on Computatwnal 
L,ngu,st,cs (COLING-A CL '98), pages 293-299, Mon- 
treal, Canada Umvers~t6 de Montreal 

Bonme Dorr and Doug Jones 1996 Role of word sense 
d~samb~guatmn m lexacal acqms~tmn Predmtmg se- 
mantics from syntactic cues In Proc of the 161h In- 
ternattonal Conference on Computat*onal Lmgutsttcs, 
pages 322-327, Copenhagen, Denmark COLING

Bonnie Dorr 1997 Large-scale chctmnary constructmn 
for foreign language tutonng and mterhngual machine 
translatmn Machine Translatton, 12 1-55 

Hana Fd~p M~chael Tanenhaus, Greg Carlson, Paul AI- 
lopenna, and Joshua Blatt 1999 Reduced rela- 
tives judged hard require constraint-based analyses 
In P Merlo and S Stevenson, echtors, Sentence Pro- 
cessmg and the Lextcon Formal, Computational, and 
Ezpertmental Perspectives, John Benjamms, Holland 

Ken Hale and Jay Keyser 1993 On argument struc- 
ture and the lexacal representatmn of s:~ ntact~c rela- 
tmns In K Hale and J Keyser, editors, The t',ew 
from Budding ~0, pages 53-110 MIT Press 

Juchth L Ixlavans and Martin Chodorow 1992 De- 
grees of stat~vlty The lexacal representatmn of verb 
aspect In Proceedmg~ of the Fourteenth International 
Conference on Computahonal Lmgmst,cs 

Juchth Ixlavans and Mm-Yen Kan 1998 Role of ~erbs 
m document analysis In Proc of the 361h Annual 
Meeting of the ACL and the 171h \[nternatzonal Con- 
ference on Computational Lmgutsttcs ( C O L L'v G- 4 C L 
'98), pages 680-686, Montreal, Canada Umvers~te de 
Montreal 

Beth Levm and/Vlalka Rappapti(t'Hovav 1995 (Jnac- 
cusatwlty MIT Press, Cambridge, MA 

Beth Le~m 1993 Enghsh Verb Clas~e~ and 4lterna- 
twns Chacago Umvers~ty Press, Chicago, IL 

Maryellen C MacDonald 1994 Probablhstlc con- 
stramts and syntactic amblgtuty resolution Language 
and Cognltzve Processes, 9(2) 157-201 

Paola Merlo and Suzanne Stevenson 1998 What gram- 
mars tell us about corpora the case of reduced rela- 
tive clauses In P1oceedmgs of the Slzth Workshop on 
Very Large Corpora, pages 134-142, Montreal, CA 

George Miller, R Beckw~th, C Fellbaum, D Gross, and 
Ix I~hller 1990 Fwe papers on Wordnet Techmcal 
report, Cogmtzve Scmnce Lab, Princeton Ual~erstt~ 

Martha Palmer 1999 Coasmtent criteria for sense dis- 
tmctmns Computmg \]or the Hamamttes 

Fernando Perelra, Naftah Tlshby, and Ldhan Lee 1993 
Dlstrabutmnal clustering of enghsh words \[n Proc of 
the 31th 4nnual Meeting of the 4CL, pages 183-190 

Fernando Perexra, Ido Dagan, and Lalhan Lee 1997 
Slmdanty-based methods for word sense dlsamblgua- 
tmn In Proc of the 35th Annual Meeting of the 
4 CL and the 8th Conf of the E 4 CL (A CL/EA CL '97) 
pages 56 -63 

Geoffrey K Pullum 1996 Learnabthty, hyperlearn- 
rag, and the poverty of the sttmulus In Jan John- 
son, Matthew L Jute, and Jen L Moxley, editors, 
~nd Annual Meeting of the Berkeley Lmgutstzcs So- 
ctety General Sesston and Parasesswn on the Role of 
Learnabdzty m Grammatzcal Theory, pages 498-513, 
Berkeley, Cahforma Berkeley Linguistics Socmty 

James Pustejovsky 1995 The Generatwe Lexicon MIT 
Press 

J Ross Qumlan 1992 C$ 5 Programs fo~ Machine 
Learning Series m Machme Learning Morgan Ixauf- 
mann, San Mateo, C 4. 

Phdlp Resnik 1992 Vv'ordnet and dmtnbutmnal anal- 
ysis a class-based approach to lex~cal dlsco~er~ 
In 4 44I Workshop m Statz~tzcally-ba~ed NLP Tech- 
ntqu_e~, pages 56-64 

Doug Roland and Dan Juxafsk:~ 1998 How ~ed~ subcat- 
egonzatmn fiequencms are affected b~ corpu~ choice 
In Proc of the ~6th 4nnual I\[eetmg of the 4CL, \Ion- 
treal, CA 

Suzanne Stevenson and Paola \lerlo 1997 Lexlcal 
structure and parsing comple~lt~ Language and ('og- 
mtwe Proce~e~, 12(2/3) 3t9-399 

Suzarme Stevenson and Paola Merlo 1999 4.utomauc 
verb classfficatton using distmbutmns of grammatical 
features In Proc of the 9th Conference of the Eu- 
ropean Chapter of the A CL, Bergen, Norway, pages 
45-52 

John Trueswell 1996 The role of lexlcal frequency 
m syntacuc amblgmty resolutmn J of Memory and 
Language, 35 566-585 
