Automatic Lexical Acquisition Based on Statistical Distributions* 
Suzanne Stevenson 
Deparl;ment of Coml)uter Science 
Uldvcrsity of Toronto 
6 King's College l(oa,d 
'.l.'oronto, ON Ca, lla, d~t M5S 311115 
suzanne~cs, toronto, edu 
Abstract 
We a, ui;omatically cla,ssi(y verbs into lexica,1 se- 
mantic classes, 1)ased on distributions of indices- 
tots of verb a.lterna.tions, extra.cCed froln a very 
la.rge a.nnota.ted corpus. We ~ddress a. prol)lem 
which is pa.rticuhtrly difficult 1)eca.use the verl) 
classes, a.lthough sema.ntica.lly different, show sim- 
ila.r surface syntactic 1)eha.vior, Five gra.m,na.tica.1 
fea.l;tlres ~u:e su\[\[icient to reduce error i:ate by more 
tha.n 50% over cha.nc(,: we a.chieve almost 70% 
a.ceura.cy in a 1;ask whose baseline perl'ornmn(:e is 
34%, and whose exl)ert-I)ased Ul)l)er bound we ('aJ- 
culated a.t 86.5%. We conclude l;ha.1; corl)us-driven 
exl;racl;ion of gramma.1;ical \['eaJ;ures is a. promising 
lnethodology for line-grained verb classilica.tion. 
1 Introduction 
\])eta.ileal hfforma.tion a.I)out verbs is critical to a. 
broad ra.nge of NI,I ) and lit 1;asks, yet; ils mau 
us.1 (lel;('rmina.tion for la.rge numl)ers o\[' verl)s is 
difficult aml resource intensive. I{.esea.rch o,I tim 
a.ul,()matic, a(-quisil;ion o\[' verb-I)ased k,owl(~(Ig(, 
has succeded in gleaning sylH.a.('l;ic l)rol)erties o\[' 
verl)s such as sul)ca.tegoriza.tion frames from el> 
line resources (I}rent, \]9!)3; lh'iscoe a.nd C,a.rroll, 
1997; \])err, 1997; Ma.nning, \]993), ll,ecently, 
researchers have investiga£ed statistica.l corpus- 
ba.sed methods for lexica.l sema.ntic classitica.tion 
from synta.ctic prol)erties of verl) usage (Aone a.nd 
McKee, \]996; l,a.pa.ta, and Brew, \]999; Schulte im 
Wa.lde, :1998; Stevenson a.nd Merle. 1999; Steve.n- 
son et a l., \] 999; McCarthy, 2000). 
C, orl)us-based al)pro~mhes to lexica.l sema.ntic 
classitic~tion in pa,rticular ha.ve dra.wn on Levin's 
hypothesis (I,evin, 1993) that verbs can be classi- 
lied according to the dia.thesis aJterna.tions (a.lter- 
nations in the syntactic expressions o\[" a.rguments) 
ill which they f)articil)a.te l'or exa.mple, whether a. 
* This research was partly sponsored 1)y US NSI" grants 
#9702331 and #9818322, Swiss NSI" Mlowshlp 8210- 
d65(;9, Information Sciences (.Ollll(il of Hurters University 
and IllCS, U. of Peimsylwmia. 'l'his research was con- 
ducted wldle the tirst author was at llutgers University. 
Paola Merle 
LAT\],- \])cpa.rtment of l,inguisCics 
Univcrsil;y of (leneva 
2 rue de C'~mdolle 
121\] Gent;re d -. Suisse 
merZo©lettres, un±ge, ch 
verb occurs in the dative/prepositiona.l phrase al- 
terna.tion in l';nglish. One diagnostic for dis.thesis 
a.lternations is the sulx'a.tegorization aJternatives 
of a. verb. ltowew~,r, some classes exhibit the same 
subca.tegoriza.tkm possibilities but differ in their 
a.rgument structures, i.e. tim content el' the the- 
real;it roles assigned to the arguments of l;he verb. 
rFhis gyl)e of situation consl;itutes a. pa.rticula.rly 
difficult case R)r corpus-based classification meth- 
ods. 
In this paper, we apply corpus-based lexica.l 
a.cquisition methodology 1;o distinguish classes of 
verbs which allow the same subca.tegoriza£ions, 
but differ in tlteJna.tic roles. We first a ssu me tha.t 
one ca.n a.ut;oma.tiea.lly restrict l;he choice o\[' ('lasses 
to those that; paxl;icil)a.1;e in the relewu~t subcate- 
gorizations (c\['. (l,a.p~ta. and Brew, \]999)). Our 
prOl)OSa.1 is lhen to use st.a.tistics ovel: di~Ll;hesis 
a lterna.nl,s as a, wa.y to \['urther distinguish those 
verl)s wldch allow 1;he same sul)ca.tegoriza.tions; 
achievi,g fine-grained cla.ssifica.tion within that 
S('l,. ()UI' work \['O(tllSeS oll determining tile 1)esl se- 
ma.nl, ic class for a verl) lgpc - the set of usages o1" 
a. verl) across a. document or corpus rather t;ha.n 
fl)r a single verb I, okc n in {~ single local context. 
In this way, we c~u/ exploit the broad beha.vior o1' 
the verb a.cross 1;he corl)uS to determine its most 
likely class overall. 
We investiga.te the proposed a.l)l)rOaCh in an in- 
del)th case study o\[' the three major classes of o1> 
tiona.lly inlra,nsitive w, rl)s in English: ullergative, 
unaccusa,tive, and ol)ject-drop. More specifically, 
according to l,evin's classificaJ;ion (l,evin, 1993), 
the unerga.tives are ma.nner of motion verbs, such 
as jump and march; the una.ccusa.tives are verl)s 
of cha.nge of state,, such a.s open and explode; the 
object-drop verbs a.re unexpressed object a.lCerna.- 
Lion verl)s, such as played a.nd painted. These 
classes a.ll supl)ort 1)oth tr~msitive and intra.nsi- 
1,ire sul)cal,egoriza.tions, I)ut a.re distinguished by 
the pal;tern of thema.tic role assignments i,o sub- 
jecC a.nd object position. We a.utomatica.lly cla.s- 
si(y these verbs on the basis of sta.tistical a,p- 
815 
proxilnations to syntactic indicators of the under- 
lying argunlent structures, using numerical fea- 
tures collected from a large syntactically anno- 
tated (tagged or parsed) corpus. We apply ma- 
chine learning techniques to determine whether 
the fi'equency distribntions of the features, in- 
dividually or in combination, support automatic 
classification of the verbs. To preview our re- 
sults, we demonstrate that combining only five 
numerical indicators is sufficient to reduce the er- 
ror rate in this classification task by more than 
50% over chance. Specifically, we achieve ahnost 
7(1% accuracy in a task whose baseline (chance) 
per\[brmance is 34%, and whose expert-based up- 
per bound is calculated at 86.5%. We conclude 
that a distribution-based method for lexical se- 
mantic verb classification is a promising avenue of 
research. 
2 The Argument Structures 
Our approach rests on tile hypothesis that, even in 
cases where verb classes cannot be distinguished 
by subcategorizations, the frequency distributions 
of syntactic indicators can hold clues to the under- 
lying thematic role differences. We start here then 
with a description of the subca.tegorizations and 
thematic role assignments for each of l.he three 
verb classes under investigation. 
As optionally intransitive verbs, each of 
the three classes participates in the transi- 
tive/intransitive Mternation: 
Uuergative 
(la) The horse raced past the barn. 
(1 b) The jockey raced the horse past tile barn. 
Unaccnsative 
(2a) The butter melted in the pan. 
(2b) The cook melted the butter in the pan. 
Object-drop 
(3a) The boy washed the hall. 
(3b) The boy washed. 
Unergatives are intransitive action verbs, as in (1), 
whose transitive form can be the causative coun- 
terpart of the intransitive form. In the causative 
use, the semantic argument that appears as the 
subject of the intransitive, as in (la), surfaces 
as the object of the transitive, as in (lb) (Ilale 
and Keyser, 1993). Unaccusatives are intransitive 
change of state verbs, as in (2a); the transitive 
counterpart for these verbs exhibits the causative 
alternation, as in (2b). Object-drop verbs, as in 
(3), have a non-causative transitive/intransitive 
alternation, in which the object is simply optional. 
Subj of 
Classes Trans 
Unergative Causal Agent 
Unaccusative Causal Agent 
Object-drop Agent 
Obj of Subj of' 
rPrans Intrans 
Agent Agent 
'£heme rI'heme 
Theme Agent 
Table 1: Summary of Thematic Alternations. 
Each class is distinguished by the content of tile 
thematic roles assigned by tile verb. For object- 
drop verbs, tile subject is all Agent and the op- 
tional object is a Theme, yielding tile thematic 
assignments (Agent, Tlmme) and (Agent) for the 
transitive and intransitive alternants respectively. 
Unergatives and uuaccusatives differ \['1"o111 object- 
drop verbs in participating in the causative alter- 
nation, and also differ from each other in their core 
thematic argument. In an intransitive unerga- 
live, the subject is an Agent, and in an intran- 
sitive unaccusative, the subject is a Theme. In 
the causative transitive form of each, this core se- 
mantic argument is expressed as the direct object, 
with the addition of a Causal Agent (the causer of 
the action) as subject in bol;h cases. The thematic 
roles assigned, and their mapping to syntactic po- 
sition, are summarized in Ta.ble 1. 
3 The features for Classification 
The key to any automatic classification task is to 
determine a set of' useful fea.tures for discriminat- 
ing the itenls to be classitied. In what follows, we 
refer to the cohnnns of Table 1 to explain \]tow we 
expect the thematic distinctions to yield distri- 
butional features whose frequencies discriminate 
among the classes ~t hand. 
Considering column one of Table 1, only 
unergative and unaccusa.tive verbs assign ~ Causal 
Agent to the subject of the transitive. We hy- 
1)othesize that the causative construction is lin- 
guistically more complex than the simple argu- 
ment optionality of object-drop verbs (Stevenson 
and Merlo, 1.997). We expect then that object- 
drop verbs will be more fi:equent in the transi- 
tive than the other two classes. Furthernmre, the 
object of an unergative verb receives the Agent 
role (see the second column of Table 1.), a linguis- 
tically marked transitive construction (Stevenson 
and Merlo, 1997). We therefore expect unerga- 
tives to be quite rare in the transitive, leading to 
a three-way distinction in transitive usage among 
the three classes. 
Second, due to the causative alternation of 
816 
'l'able 2: The l.i'ea.Cures and 'l'heir F;xpected 13ehavior 
TrmlsitiviLy Unaccusativcs and unergativcs have ~, causative transitive, hence lower transitive use. Fur- 
l;hc'rlnorc, unerga.tivcs ha.re a.n agent.ire object, hence very low transitive use. 
Pa.ssivc Voice Passive implies transitive use, hence correlated with transitive feature. 
VBN Tag Passive implies past pa.rt;iciple use (VBN), hence correlated with transitive (and passive). 
Causativity ()l)ject-drop verbs do not have a. causal agent, hence low "ca.usative" use. Unergatives are 
rare in the transitive, hence low cmlsative use. 
Animacy Unaccusatives have a Theme subject in the intransitive, hence lower use of animal, e subjects. 
unergatives and nnaccusatives, the l, hematic role 
of the subjec~ of the intransitiw,~ is identical to 
that of the objecl of the transitiw;, as shown in 
columns two and three of Table 1. C, iven the 
identity of thematic role mal)ped to subject and 
object positions, we expect to observe the sa.me 
noun occurring at times a.s subject of the verb, 
and at other times as object of the verb. In con- 
trast, for object-tirol) verbs, Cite thenm.tic role o\[' 
the sul)ject o17 the intransitive is identical to l;ha{, 
of the sul)ject of the transitive, not the object of 
the transitive. Thus, we expect that it will be less 
common for the same noun to occur in subject and 
object position of the same object-drop verb. We 
hypothesize that this pattern of thematic role as- 
signments will be retlected in difl'erential amount 
of u~'~age across the classes of the same nouns as 
subjects and ol)jects for a given verb. Further- 
more, since the causative is a transitive use, a.nd 
the 1,ra.nsitive use of u nerga.gives is oxpocl;ed to be 
rare., this overlap o(' subjects and ob.iects should 
primarily distinguish unaccusatives (predicted to 
have high overlap of subjects and objects) from 
the other two classes. 
Finally, considering columns one and three of 
Tal)le 1, we note that unergative and objecl;-drop 
verbs assign all agentive role to their subject in 
both the transitive and intra.nsitive, while unac- 
cusatives assign an agentive role to their subject 
only in the tr~msil, ive. Under the assutnpLion that 
the intransitive use of' unaccusatives is not rare, 1 
we then expect thai, unaccusatives will occur less 
often overall with an agentive subject than the 
other two verb classes. On the flu:ther assump- 
tion that Agents tend to be animate entities more 
so than Themes, we expect that unaccusatives 
will occur less freqnently with an animate subject 
compared to unergative and object-drop verbs. 
Note the importance of our use of frequency dis- 
tributions: the claim is not that only Agents can 
~This assumpl, ion is based on the linguistic conlplexity 
of the causative, and borne out in our corpus analysis. 
be animate, but rather that nouns that receive an 
Agent role will more often be animate than nouns 
that receive a Theme ,'ole. 
The above interactions between thematic roles 
and the syntactic expressions of arguments thus 
lead to three features whose distrit)utional proper- 
ties appear promising for distinguishing the verb 
classes: transitivity, causativity, and animaey of 
subject. We also investigate two additionM syn- 
tactic l'ea.l, ures, the passive voice and tile past pa.r- 
ticiple POS tag (VI3N). These features are related 
to the transitive/intransitive Mternal;ion, since a 
passive use implies a transitive use of the verb, 
and the nse of passive in turn implies the use of 
the past participle. Our hyl)ol;hesis is that these 
five features will exhibit distributional differences 
in the observed usages of the verbs, which can be 
used for classifica.tion. The features and their ex- 
pected relevance are summarized in '13ble 2. 
4 Da~a Collection and Analysis 
We chose a set of 20 verbs from each of three 
classes. The complete list of verbs is reported in 
Appendix A. Recall that our goal is to achieve a 
fine-grained classification of verbs that exhibit the 
same subcategorization frames; thus, tile verbs 
were chosen because they do not generally show 
massive del)artures from the intended verb sense 
(and usage) in the corpus. 2 In order to simplify 
tile counting procedure, we included only tile reg- 
ular ("-ed") simple past/past participle form of 
tile verb, assuming that this would approximate 
the distribution of tile features across all forms of 
the verb. Finally, as far as we were able given 
the preceding constraints, we selected verbs that 
could occur in the transitive and in the passive. 
We counted the occurrences of each verb token 
in a transitive or intransitive use (3'RANS), ill a 
2~l~hough note that there are only 19 unaccusatives be- 
cause ripped was excluded fl'om the analysis as it occurred 
mostly in a very different use (ripped off) in the corpus 
from the intended diange of state usage. 
817 
passive or active use (PASS), in a past participle 
or simple past use (VBN), in a causative or non- 
causative use (tAgS), and with an animate subject 
or not (ANIM), as described below. The first three 
counts (TRANS, I'ASS~ VBN) were performed on 
the LDC's 65-million word tagged ACL/DCI cor- 
pus (Brown, and Wall Street Journal 1987-1989). 
The last two counts (CAUS and ANIM) were per- 
formed on a 29-million word parsed corpus (\gall 
Street Journal 1988, provided by Michael Collins 
(Collins, 1997)). The features were counted as 
follows: 
TaANS: The closest noun following a verb was 
considered a potential object. A verb immedi- 
ately \[bllowed by a potential object was counted 
as transitive, otherwise as intransitive. 
pass: A token tagged VBD (the tag for simple 
past) was counted as active. A token tagged VBN 
(the tag for past participle) was counted as active 
if the closest preceding auxiliary was have, and as 
passive if the closest preceding auxiliary was be. 
VBN: The counts tbr VBN/VBI) were based on 
the POS label in the tagged corl)us. 
Each of the above counts was normalized over 
all occurrences of tim "-ed" form of the verb, yield- 
ing a single relative fi:equency measure \['or each 
verb for that feature. 
tags: For each verl) token, the subject and ob- 
ject (it' there was one) were extracted from the 
parsed corpus, and the proportion of overlap be- 
tween subject and object nouns across all tokens 
of a verb was calculated. 
ANIM: To approximate animacy without refer- 
ence to a resource external to the corpus (such 
as WordNet), we count pronouns (other than it) 
in subject position (cf. (Aone and McKee, 1996)). 
The aSSUlnption is that the words I, we, you, .~'tze, 
he, and theft most often refer to animate entities. 
We automatically extracted all subject/verb tu- 
ples, and computed the ratio of occurrences of 
pronoun subjects to all subjects for each verb. 
The aggregate means by class resulting from the 
counts above are shown in Table 3. The distri- 
butions of each feature are indeed roughly as ex- 
pected according to the description in Section 3. 
Unergatives show a very low relative fi'equency 
of the TRANS feature, followed by unaccusatives, 
then object-drop verbs. Unaccusative verbs show 
a high frequency of the CAUS feature and a low 
frequency of the ANIM feature compared to the 
other classes. Although expected to be a redun- 
dant indicator of transitivity, pass and VBN do 
Ta.ble 3: Aggregated Relative Frequency Data \['or 
tile Five Features. E = unergatives, A = unac- 
cusatives, O = object-drol)s. 
Class 
E 
A 
O 
N MEAN I~ELATIVE FREQUENCY 
TR, ANS PASS VBN CAUS ANIM 
20 0.23 0.07 0.21 0.00 0.25 
19 0.40 0.33 0.65 0.12 0.07 
20 0.62 0.31 0.65 0.04 0.15 
not distinguish t)etween unaccusative and object- 
drop verbs, indicating that their distributions are 
sensitive to factors we have not yet investigated, a 
5 Experiments in Classification 
The frequency distributions of our features yield 
a vector for each verb that represents the relative 
frequency wdues for the verb on eacln dimension: 
\[verb, TRANS, PASS, VBN~ CAUS, ANIM, class\] 
Example: \[opened, .69, .09, .21, .16, .36, unaec\] 
\¥e use the resulting 59 vectors to train an au- 
tomatic classifier to determine, given a verb that 
exhibits transitive~intransitive sttbcategorization 
frames, which of the three major lexical semantic 
classes of English optionally intransitive verbs it 
belongs to. Note that the baseline (chance) per- 
Ibrmance in this task is 33.9%, since there are 59 
vectors and 3 possible classes, with the most coin- 
men class having 20 verbs. 
We used the C5.0 machine learning system 
(tnttp://www.rulequest.com), a newer version of 
C4.5 (Quinlan, 1992), which generates decision 
trees and corresponding rule sets from a training 
set of known classifications. We found little to no 
difference in performance between the trees and 
rule sets, and report only the rule set results. \¥e 
report here on experiments using a single hold- 
out training and testing methodology. In this ap- 
proach, we hold out a single verb vector as the 
test case, and train the system on the remaining 
58 cases. We then test the resulting classifier on 
tile single hold-out case, recording tile assigned 
class for that verb. This is then repeated for each 
of the 59 verbs. This technique has the benefit 
of yielding both an overall accuracy rate (when 
the results are averaged across all 59 trials), as 
well as providing tile data necessary tbr determin- 
ing accuracy for each verb class (because we have 
the classification of each verb when it is the test 
case). This allows us to evaluate tile contribution 
aThese observations have been confirmed by t-test.s be- 
tween feature values for each pair of classes. 
818 
%d)le <1:: Percent Accuracy of Verb Clas- 
sifica,l,ion Task Using \],'eatures in Combina- 
tion. T=TllANS; \])=PASS; \/=VBN; C=CAUS; 
An=ANIM. E=unergatives, A=unaccusatives, 
O =:ol)ject-drops 
Percent Accuracy by Class 
All l E I a I 0 \],'ca.I,1llJes 
1. '.I'P VCAn 69.5 85.0 
2. P V C An 64:.4: 80.0 
3. TV C An 71.2 80.0 
4:. 51' P C An 61.0 65.0 
5. TPVAn 62.7 70.0 
6. 51'PV C 61.0 80.0 
63.2 
dT.d 
73.7 
68.4 
63.2 
42.1 
60.0 
65.0 
60.0 
50.0 
55.0 
60.0 
of individual feal:ures with respect to their effect 
on the perfornlance of individual classes. 
We performed experiments on the \['ull sel, of fea- 
tures, as well a.s each subsel, of fea.l,ures wil,h a. sin- 
gle f~ture remow;d, as reported in Table d. Con- 
sider l;he first column in the ta.ble. The first line 
shows that the overall ~ccuracy for all live features 
is 69.5%, a reduction in tile error ra.te of more 
than 50% above the baseline. The removal o\[" the 
PASS lea.lure appears to improve performance (row 
3 of Ta.ble 4). However, it should be noted that 
this increase in performance results h:oln a single 
additionaJ verb being classified correctly. The re- 
ma.ining rows show thal no feal,ure is superflous 
or hm'mfld as l,he removal of any I'ealure has a. 5 
8% negative elfect on l)erR)rmance. Coral)arable. 
accuracies have been demonsl;rated vsing a more 
thorough cross-validation methodology a.nd using 
reel;hods that are, in principle, better a,t taking 
adva.nl,age of correlated lea,lures (Stevenson and 
Merle, 1999; Stevenson el. al., 1999). 
q'he single hold-out prol,ocol provides new data, 
fbr analysing the performalme on individual verbs 
and classes. The class-by-class accuracies a.re 
shown in the remaining columns of Ta.ble 4. \¥e 
can see clearly thal, using all five features, l,he 
unergatives are classified with much greater ac- 
curacy (85%) than l,he UlmCCUsatives and object- 
drop verbs (63.2% and 60.0% respectively), as 
shown in the first row. The rema.ining rows show 
that this l)al,tern generally holds \['or l,he subsel,s 
of features as well, with tire excel)lion of line d. 
\¥hile ful,ure work on our verb classificalion 
task will need lo focus on deterlnining features 
thal bel,ter discriminate unaccusative a.nd object- 
drop verbs, we can ah:eady exclude an explana- 
tion of the resull,s based simply on l,he wwbs' or 
tile classes' frequency. Unergatives have tile low- 
est average (log) frequency (1.3), but are the best 
classified, while unaccusatives and object-drops 
are comparable (a.verage log fi'equency = 2). If we 
group verbs by frequency, the proportion of errors 
to lhe total number of verbs remains fairly simi- 
lar (freq 1:7 errors/23 verbs; fi:eq. 2:6 errors/24 
verbs; freq. 3:4 errors/10 verbs). The only verb 
of frequency 0 is correctly classified, while lhe only 
one with log frequency 4 is not . In sum, we do 
not find that more frequent classes or verbs are 
more accurately classitied. 
lmlmrtantly, the experiments also enable us to 
see whether the fealures indeed contribute to dis- 
criminating the classes in the manner predicted in 
Seclion 3. The single hold-out results allow us to 
do 1;his, by comparing the individual class labels 
assigned using the full sol, of five features (TIIANS, 
PASS, VBN, CAUS, ANIM) to the class labels as- 
signed using each size four subset of features. This 
comparison indicates 1;he changes in class labels 
l,hat we can a.l,tribul,e to l,he added feature in going 
fi'om a size four subset to the full set of features. 
(The individual class labels supporling our a.naly- 
sis below a.re a.vailable from the authors.) \¥e con- 
cent;rate on tile three main features: CAUS, ANIM, 
TRANS. \¥e filial thai, the behaviour of lhese fea- 
l,ures generaJly does conform to our predicl;ions. 
We expected that TRANS would help make a. three- 
way distinction among the verb classes. While 
unergatives are ah:eady accurately classified with- 
Ollt TRANS, inspection of lhe change in class la.- 
bels reveals that the addition of TRANS tO tire sel; 
improves performance on unaccusatives by help- 
ing to distinguish 1;hem from object-drol)s, llow- 
ever, in this case, we also observe a loss in pre- 
cision of unerga.lives, since some object-drops are 
now classitied a.s unergatives. Moreover, we ex- 
pected CAUS and ANIM tO be parl,icularly helpfid 
in identi\['ying unaccus~l,ives, and this is also borne 
out in our analysis of individual la.bels. We note 
that the increased accuracy from CAUS is primar- 
ily due to bel,ter disl,inguishing unergatives from 
unaccusatives, and l,he increased accura.cy from 
AN1M is primarily due go better distinguishing un- 
accusatives from objecl,-drops. \¥e conclude tha.t 
the feal,ures we have devised are successful in clas- 
siting optionally 1,ra.nsil;ive verbs because they ca.p- 
lure predicted difl'erences in underlying argument 
structlrre. 4 
4 Matters are more cmnplex with the other two features 
and we arc still interpreting tile results. Our prediction 
819 
Table 5: Pair-wise Agreement (Calculated t)3' the 
Kappa Statistics) of Three Experts (El, E2, h;3) 
Compared to a Gold Standard (Levin) and to the 
Classifier (Prog). Numbers in parentheses are per- 
centage of verbs on which judges agree. 
PltOG F.I. E2 E3 
El .36 (59) 
E2 .50 (68) .59 (75) 
E3 .49 (66) .53 (70) .66 (77) 
LEWN .54 (69.5) .56 (71.) .80 (86.5) .74 (83) 
6 Comparison to Experts 
In order to evaluate the performance of the al- 
gorithm in practice, we need to compare it to the 
accuracy of classification performed by an expert, 
which gives a realistic upper bound for the task. 
In (Merle and Stevenson, 2000) we report the re- 
suits of an experiment that measures experts per- 
tbrmance and agreement on a classification task 
very similar: to the program we have described 
here. The results summarised in Table 5 illus- 
trate the performance of the progra, m. On the 
one hand, the algorithm does not perform at ex- 
pert level, as indicated by the fact that, for all ex- 
perts, the lowest agreement score is with the pro- 
gram. On the other: hand, the accuracy achieved 
by the program of 69.5% is only 1.5% less than 
one of the human experts in comparison to tire 
gold standard. In fact, if we take the best per- 
formance achieved by an expert in this task 
86.5%--as the maximum achievable accuracy in 
classification, our algorithm then reduces the er- 
ror rate over; chance by approximately 68%, a very 
respectable result. 
7 Discussion 
The work here contributes both to general and 
technical issues in automatic lexical acquisition. 
Firstly, our results confirm the primary role of 
argument structure in verb classification. Our ex- 
perimental focus is particularly clear in this re- 
gard because we deM with verbs that are ~Illilli- 
was that VBN and PASS would behave similarly to TRANS. 
In fact, PASS is at best unhelpful in classification. VBN 
does appear to make the expected I.hree-way distinction. 
The change ill class labels shows that the improvement in 
performance with VBN results from better distinguishing 
unergatives fi'om object-drops, and object-drops from un- 
accusatives. The latter is surprising, since analysis of the 
data found that the VnN feature values are statistically in- 
distinc~ for the object-drop and unaccusative classes as a 
whole. 
mal pairs" with respect to argument structure. 13y 
classif~ying verbs that show the same subcatego- 
rizations into different classes, we are able to elim- 
inate one of the confounds in classification work 
created by the fact that subcategorization and ar- 
gument structure :M'e largely co-variant. We can 
infer that the accuracy in our classification is due 
to argument structure information, a.s subcatego- 
rization is the same for: all verbs, confirming that 
the con, tent of thematic roles is crucial for clas- 
sification. Secondly, our results further support 
the assumption that thematic differences such as 
these are apparent not only in differences in sub- 
categorization frames, but also in differences in 
(;heir frequencies. We thus join the many recent 
results that all seem to converge in SUl)porting 
the view that the relation between lexical syntax 
and semantics can be usefully exploited (Aone and 
McKee, 1996; l)orr, 1997; Dorr and Jones, 1996; 
Lapata and Brew, 1999; Schulte im Walde, 1998; 
Siegel, 1998), especially in a statistical franmwork. 
Finally, we observe that this information is de- 
tectable in a corpus and can be learned automat- 
ically. Thus we view corpora, especially if an- 
notated wil;h currently available tools, a.s useful 
repositories of implicit grammars. 
Technically, our N)proach extends existing 
corpus-based learning techniques 1;o a more com- 
plex lea.ruing problem, in severaJ dimensions. Our 
statistical apl)roach , which does not require ex- 
plicit negative examples, extends ai)l)roaehes that 
encode l~evin's alternations directly, as symbolic 
properties of a verb (Dorr et al., 1995; l)orr and 
Jones, 1996; l)orr, 1997). We also extend work 
using surface indicators to approximate underly- 
ing properties. (Oishi and Matsumoto, 1997) use 
case marking particles to approximate graimnat- 
ical functions, such as subject and object. We 
improve on this approach by learning argument 
structure properties, which, unlike grammatical 
functions, are not marked lnorphologically. Oth- 
ers have tackled the problem of lexical semantic 
classification, as we have, but using only snbeate- 
gorization frequencies as input data (Lapata and 
Brew, 1.999; Sehulte im Walde, 1998). By con- 
trast, we explicitly address the definition of fea- 
tures that can tap directly into thematic role dif- 
ferences that are not reflected in "subcategoriza- 
tion distinctions. Finally, when learning of the- 
matic role assignment has been the explicit goal, 
the text has been semantically annotated (Web- 
ster and Marcus, 1989), or external semantic re- 
820 
sources ha.re I)een consulted (Ache and McI(ee, 
19!)6). We extend these results by showing that 
them;~tic informa,tion can 1)e inducexl from corpus 
COtllltS. 
The exl)erimental results show that our method 
is l)owerful, and suited to Cite classitica.l;i(m of lex- 
ica.1 items. However, we have not yet addressed 
the problem of verbs that can h;~ve multiple clas- 
sifications. We think tha.t many eases of am- 
1)iguous classification of verb types can 1)e ad- 
dressed with the notion of intersective sets in- 
troduced by (Da.ng et al., 71998). This is an im- 
t)ortant concept tha,t l)rOl)OSes tha,t "i'egula, r" a.m- 
biguity in classifica.tion -i.e., sets of v(;rbs that 
ha.ve the same multi-way classitications a~ccording 
to (l,evin, 1993) can be captured with a. liner- 
grained notion of lexical semantic classes. I~x- 
tending our work to exploit this idea. requires 
only to define the classes a.pl)ropriately; the ba- 
sic a.1)t)roac\]l will remain the same. When we turn 
to consider ambiguity, we must a.lso address the 
l)roblem l;ha.t individual insta.nces of verl)s may 
come from diffel:ent classes. In future research we 
t)lan to extend our method to the ('\]a.ssificagion of 
a.mbiguous tokens, by experimenting with a. func- 
tics that combines severaJ sources of information: 
a bias tbr the verb type (using the cross-corpus 
sta.l;istics we collect), as well as \[~a.tures o\[" the us- 
age of the insta.nce being classiiiod (cf. (l,apa.ta 
a,n<l I~rew, t999; Siegel, 199,q)). 
Appendix A 
Unergatives: floated, galloped, glided, hiked, hopped, hur- 
ried, jogged , jumped, leaped, marched, paraded, raced, 
rushcd, seootcd, scurricd, skipped, tiptoed, t~vttcd, va,dtcd, 
wandered. Unaccusativcs: boiled, changcd, cleared, 
collapsed, cooled, cracked, dissolved, divided, exploded, 
flooded, .folded, fractured, hardcned, melted, opcncd, sim- 
mcred, solidified, stabilized, widened. Object-drops: bof 
rowed, eallcd, earvcd, clca~cd, danced, inheritcd, kiekcd, 
knittcd, or aniscd, packcd, paintcd, playcd, reaped, rcnted, 
skclehe.d, studied, swallowed, typed, washcd, ycllcd. 

References 

Cllinatsu Aolm and Dot@as Mcl(ec. 1990. Acquiring 
predicate-argument mapping information in multilingual 
texts. In Branimir l\]oguraev and James l)ustetjovsky, ed- 
itors, Cou)us \])rocessinq.\[or l, cxieal Acq~tisition, pages 
191-202. MIq' Press. 

Michac'\] lh'ent. 1993. l"rom grammar to lexicon: UnSUl)er- 
vised learning o\[ lexical syntax. Compatational Linguis- 
tics, 1912):243 262. 

'lk:d Briscoe and ./ohn Cm'roll. 1997. Automatic extraction 
of subcategorization from corpora. In IS"ocs of the I"~Hh 
ANLP Co,@rence, pages 356-363. 

Michael John Collins. 1997. Three generative, lexicallsed 
models for statistical pro'sing. In lS"ocs of ACL '97, pages 
16-23. Madrid, Spain. 

Hot Trang Dang, Karin Kipper, Mm'tha l)almer, and 
.loseph l{osenzweig. 1998. Investigating regular sense 
exl, ensions 1oased on intersective 1,evin classes. \[n 
Procs of COI, ING-ACL '98, pages 293 299, Montreal, 
Canada. 

BOltnie 1)orr attd l)(n,g Jones. 1996. Role of word sense 
disambiguatlon in lexical acquisition: lhedieting seman- 
tics from syntactic cues. In 15"oc. of UOL1NG'96, pages 
3;22 327, Col)enhage,l, l)enmark. 

Boluiie l)orr, Joe t~larlliall, and Amy VVeinberg. 1995. 
\]~l:Oln syntactic encodillgs to thematic ro|es: l}uilding 
lexical entries for interlingual MT. Journal o\[ Machine 
!l'ran.¢lation, 9(3):71- -100. 

\]\]onnie l)orr. 1997. I,m-ge-scale dictionary construction 
for foreign  tutoring and inter\]ingual machine. 
translation. Machine Translation, 12:1 55. 

K('.n l\[ale and Jay \](eyser. 1993. On argument structure 
and the lexical representation of syntactic relations. In 
I(. \]tale and .\]. l(eyser, editors, The View fl'om lJuilding 
20, pages 53-110. MIT Press. 

Maria Lapata and Chris Brew. 11999. Using subcategoriza- 
tion t,o resolve verb class ambiguity. In Frocs of Joint 
,5'IGDAT Con\[erence on Empirical Mett, ods in Natural 
Langaage, College Park, M\]). 

Beth Lcvin. 1993. English Verb Classes and Alternations. 
University of Chicago Press, Chicago, I\],. 

Christopher 1). Manning. :1993. Automatic acquisition of 
a large subcategorlzation dictionary l>om corpora. In 
l)rocs of A UL'93, pages 235 -242. Ohio State University. 

Diana McCarthy. 2000. Using semantic l)referenee Lo idcn- 
tit'y verb participation in role switchitlg alternations. In 
15"oes of NAA C1,-2000, Seattle, Washington. 

l)aola Merlo and Sllzantle Stevenson. 2000:-F, stablishing 
the upper-bound and inter-judge agremcnt in a verb 
classification task. In l)rocs of LI~EC-2000, pages 1659 
1G64. Athens, Greece. 

Akira Oishi and Yuji Matsumoto. 1997. l)etecting the or- 
ganization of semantic subclasses of Japanese verbs. In- 
ternational Journal of Corpus Linguistics, 2(1):65 89. 

,/. \]toss Quinlan. 1992. (?4.5 : l~rograms .for Maehi,ze 
Learning. Morgan l(aufmamt, San Mateo, CA. 

Sabine Sehulte im Walde. 1998. Automatic semanl.ic clas- 
sification of verbs according to their alternation be- 
haviour. AIMS l\]eport 4(3), IMS, UniversitSt Stuttgart. 

I",eic V. Siegel. 1998. Linguistic Indicators .for L(mouaffc 
Undcrslandin 9 l)h.I), thesis, I)ept. of Comput(w Sci- 
ence, (Johm|l)ia University. 

Suzalllle SteveHsoll alld l)aola Merlo. 1997. l~exica\] st.ru(:- 
turc and \])ro(:essing complexity. Lan uagc and (7o\[pzilivc 
\])rocc:,~se.,% 12(1-2):3.'19 399. 

Suzanne Stevenson and \])aola Mer\]o. 1999. Verb classifi- 
cation using distributions of gt'ammatical features. In 
l)rocs of 1¢,4 CL'99. Bergen, Norway. 

Suzmme Stevenson, l)aola Merlo, Natalia Kariaeva, and 
l(amin Whitehouse. 1999. Supervised learning of lexical 
semantic verb classes using fl'equency distributions. \[n 
I)rocs of Si.qLex '99, College Park, Maryland. 

Mort Webster and Mirth Mm'cus. 1989. Automatic ac- 
quisition of the lexical semanl, ics of verbs fl'om sentence 
frames. In Procs of A CL'89, pages 177-184, Vancouver, 
Canada. 
