Chlstering Verbs Semantically 
According to their Alternation Behaviour 
Sabine Schulte im Walde 
Institut f/it Maschinelle S1)rachvera,rlw.itung 
UnivcrsitSA: Stuttgart 
Azenbergstral~e 12, 70174 Stuttgart, ~ermm W 
schult e@ims, uni-stuttgart, de 
Abstract 
Verbs were clustered seInantically on the basis of 
their alternation behaviom:, as characterised I)y their 
syut;actic sul)caI;e.gorisation franms extrael:ed from 
lllllXillllllll proba.bili(;y parses of a robu,st sI;atisl;ical 
pa.rser, aald eOml~leted by assigning \'VordNe(; classes 
as se\]ecl, ional preferences (;o the fl:ame arl~uments. 
The clustering was achieved (a) iteratively by mea- 
su):ing the lelal;ive enl;rol)y b(,tween (;he verbs' l)rOl)- 
ability dis(;ribut, ions ()vet' the. franle (,yl)eS, and (1)) 
l)y ul;ilising a lateni; (:lass m/a.lysis t)ased on the joint 
frequencies of verbs and flmne. (,yl)eS. 
1 Introduction 
This paper eml)irieally investiga(;es the proposition 
that ve, rl)s can 1)e seman(;ically classilied according 
to their synl;a(:(;ic alterna.tion 1)ehaviour (:()n(:ernint,; 
subca.(;(%orisation frames ;rod their seleetional i)ref- 
er(;nces tbr the, arguments within the frames. The 
idea is l(;lal;ed Ix) (lx'.vin, 1993) who de.lined verl> 
classes on the basis of verl) atterl)al;ion beh~aqour. 
For exmnl)le , (;he seman(;ic (:lass of l&h, icle Names 
coni, ains verbs like lmlloo'n, bicycle, ca.'n, oe, skate, ski 
which agree in (;he prol)erties (1)-(4) below. 
(1) 1NTltAN.qlTIVI.; IJSI,:, possibly followed by ;t 
path: 
a. 'lPhey skated. 
1). They skated a\]()ng Llle canal/over t:}le 
l>ri(lge. 
(2) INI)UCI'H) ACTION AI:rEIuqA'rIoN (~ome 
verl>s): 
a. Ite skated Penny around the rink. 
(causing the aetiolt nanled 1)y the verb; 
tyl)ieal eausee is m~ alfimal;e volitional 
entity) 
b. Penny skated a.round the )'inl¢. 
(3) LO(3A'rivI,; PI{.1,;I'OSITION \])l{Ol > A~:r),m~A- 
TION (some verbs): 
a. They skated along the canals. 
b. They skated the canals. 
(~) I{.I,;S 1.11 "IWI'I VI,; IqlIIASE: 
Pem~y skated her skate blades bhmt 
(an XI ) describing (,he sl:a~e achieved by lhe 
referent of (;l)(~ nora) l)hrase as a resu\]l; of the 
acl,iou mmw.,.l by the verb) 
Levin's work rel)resenCs th(~ basis foc a tani,;e <)f ,e- 
cent invesl;ig;d,ions veril)qng (\])orr :m(1 .lone.% \] 996), 
evaluating (SI;e.\,ensoll aud Merl% 199! 0 or utJi;::. 
ing (\]mpata., 1999) the proposed elas.qitlcation-,~ 
well as transferring i(, 1.o other lanw, ta.ge',~ tha:l }~;u- 
glish (Jones et al., 7199.\[)~ 
Generally, the definition of a verl/s selllaIii.i(: (:\];/:~.q 
can be considered as part of its lexic.al entry, next io 
idiosyncratic intinmation: the Sellla,lll;ic CI;ts~; gell- 
eralises as a. l.ype definition over ~t range o\[ s.yn- 
l,acl.ic nnd s(mmnl:ic \])tOl)ertie:; , Lo .qUl)por(; Nai.u.':d 
l~a.nguat,;e Processinp; in V;/\['\].()lI~4 .~II'(~;/S like. h',xicor;c~t- 
phy (llapl)ai>ort l\]ov'av and I,evin, t99~;), wo){I :;(:it~:e 
disaml)igual:i(m (l)orr aml .\]oneea, ?l!)!)i;), or ()'.~+- 
nlelll; cla,~silh:al.ion (Klavm~.~; and Kaa, 199,~0o 
I al;l;enlpLed l;o aul;omaLica\]ly ch,sicr verb>~ h~i;o 
selna.lti, ie elas.'-;c~; Oll l.be b;.tsi'A Of i,}\]e vel\]),-; ~ ~ll{;(~P,;I- 
tion I)ehavioltr. 'l)lm iulm 1; into i:he at/i;(ii\[Nti;it; i I J(!!!(: 
tion pr()(:e,~q:4 w;m characi:eri:;ed 1)y (J~e \;erl)',4' di>:l.~;Im- 
(,ion eve\]" ,qyJil,;tc{,\](; ~;ul>ca.l;egori>;;tl;i(m ' .:~; I! ~1. U. 'A ~!\]K- 
Lrael;ed from )u:txinnlm pvol>alfil}t 5' (Vil.crlfi) !:, ~c~e:-', 
()f ~/ l'O})ll,ql; 'r;I;t, ll;i'At,\]C~!\] l);/l':;¢;!'~ ;ll!(t c:()i(l\]~i(;i,,.,d }.,\- ;t!;- 
siffltill~ \Vor(\[)~,:;(; (:l:is',',(z:; ;~.~; ;;(~\]('~(:{;i.o~l,lJ l~i:eI'ei'~!; ,:c:; {:o 
the frame a.rg~m~e**i;:',. 'Fbe, clmd:eti~i~ a~;~:', ;,el,}: ,4d 
(a) iteratively 1)y me;mminig Lhe relai,ive ,;~{,~,')p3 '.m- 
l,ween t,he verbs' probability dist):it)c,i,io,~:~ ; ,,,(:, the 
frame tyl)e, % m~d (I)) l) 3, uLi/i'.-;ing a. Ld:e,t cD;;;.i c,:tal.- 
3'sis ba,q(:d on (,l)e joi .i; J'requen(:ies of verl>~s a:,>d ~,;::,1~'~¢'~ 
tyl)eS. U,~;itu~ Ii,evin':; \,c)'b elas,',ific:d;\]o~) ;',.~'~ c,,ai~;~;ion 
basi,q, 61% of the \,ert)~; were classified (;o1;(;<5;13 ~>!;o 
~;emm~tic cla:;5;e; l>y met,l:,ut (a), ;rod 540/0 b3 :~.~i.!..~d 
(1,). 
Section 2 de:;cribe',; (;l)e three. :de,!,:~ h~ i!w, ~:~i.,')-- 
marie aC(luisit;ion of ,qen)an(;i< verb cla::;sc~;; i.l)e ~ .,h> 
ation takes l)\]a(:e, in ,~eel;ioJl 2{, ;lAd .~;oel.ioil d (tJ:,,.l~>;~;{~; 
(,he re:mll:s. 
747 
2 Automatic Acquisition of 
Semantic Verb Classes 
Tile first step was the induction of purely syntac- 
tic subcategorisation fi'ames for verbs from the het- 
erogeneous British National CoTpus (BNC). I used 
the robust statistical head-entity parser as described 
in (Carroll and Rooth, 1998) which utilises an En- 
glish context-free grammar and a lexicalised prob- 
ability model to produce parse forests, and ex- 
tracted the maximum probability (Viterbi) parses, 
for a total of 5.5 million sentences. The trees were 
mapped to subcategorisation frame tokens consist- 
ing of a inain verb and its argmnents. Each syntac- 
tic category was accompanied by the lexical head, 
the pret)ositional phrase by the lexical prepositional 
head plus the head noun of the subordinated noun 
phrase. Proper names were accompanied by the 
identifier pn. The head information in the frames 
was lemmatised. For example, the sentence Sam- 
rout handled the plaudits during the awards cere- 
mony would be represented by the frame token 
handle subj*pn*sammut obj*plaudit pp*during*ceremony. 
To generalise over the verbs' usage of subcategori- 
sation frames, I defined as 88 frame types the most 
frequent frames which appeared at least 2,000 times 
in total in the BNC sentence parses, disregarding 
the lexical head information. On the basis of the 
frame types I collected information about the joint 
frequencies of the verbs in the BNC and the subcat- 
egorisation frame types they appeared with. These 
frequency counts then represented the syntactic de- 
scription of the verbs. 
Tim next step was to refine the subcategorisation 
frame types by a preferential ordering on conceptual 
classes for the argument slots in the fl'ames. The 
basis I could use for the selectional preferences was 
provided by the lexical heads ill the fi'anm tokens. 
For example, the nouns appearing in the direct ob- 
ject slot of the transitive frame for the verb drink 
included coffee, milk, beer, indicating a conceptual 
class like beverage tbr this argument slot. 
I followed (Resnik, 1993)/(Resnik, 1997) who de- 
fined selectional preference as the amount of infor- 
mation a verb provides about its semantic argument 
classes. He utilised the WordNet taxonomy (Beck- 
with et al., 1991) for a probabilistic model captur- 
ing the co-occurrence behaviour of verbs and con- 
ceptual classes, where the conceptual classes were 
identified by WordNet synsets, sets of synonymous 
nouns within a semantic hierarchy. Referring to the 
above example, the three nouns coffee, milk, beer 
are in three different synsets -since they are not 
synonyms-, but are all subordinated to the synset 
{beverage, drink, potable}. The goal in this example 
would therefore be to determine the relevant synset 
as the most selectionally preferred synset for the di- 
rect object slot of the verb drink. 
Redefined fbr iny usage, the selectional preference 
of a verb v tbr a certain semantic class c within a 
subcategorisation franm slot s was deternfined by 
the association ass between verb and semantic class: 
=des Pl, C, lV~pOg ~ (5) 
with the probabilities estimated by maxinmnl likeli- 
hood: f(v,, 
P(C*lVs) - f(vs) (6) 
p(Cs) = f(c.,) _ f(cs) (7) 
f(c's) /(8) 
and the following interpretation: 
1. f(v,, c,): number of times a semantic class ap- 
peared in a fi'ame slot of a verb's fi'ame type 
2. f(v,): frequency of a verb regarding a specific 
fi'ame type, i.e. the joint Dequency of verb and 
frame type 
3. f(Cs): numl)er of times a semantic class ap- 
peared in a fi'ame slot of a frame type disre- 
garding tim verb 
4. ~¢'c,~'**,,s f(c'~) equals f(s), the frequency of the 
argument slot within a certain frame type, since 
summing over all possible classes within a sub- 
categorisation fl'ame slot equals the lmlnber of 
tinms the slot; appeared 
5. f(s): uulnber of times the franle type appeared, 
since the frequency of a. frame type equals the 
frequency of that frame with a certain slot 
marked 
The fi'equencies of a semantic class concerning an 
argument slot, of a frame type (dependent or inde- 
pendent of a verb) were calculated by all approach 
slightly difl'erent to Resnik's, originally proposed 
by (Ribas, 1994)/(Ribas, 1995). For each noun ap- 
pearing in a certain argument position its fi'equency 
was divided by the nmnber of senses the noun was 
assigned by the WordNet hierarchy, t to take account 
of the uncertainty about the sense of the noun. The 
fi'action was allocated to each conceptual class in the 
hierarchy to which the noun belonged and accumu- 
lated upwards until a top node was reached. Tile 
result was a numerical distribution over the Word- 
Net classes: 
/(noun) (8) s(c,/-- E 
1For example, when considering the noun coffee isolated 
from its context, we do not know whether we are talking about 
the beverage coffee, the plant coffee or a coffee bean. Thero.- 
fore, a third of the frequency of the noun was assigned to each 
of the three classes. 
748 
I restricted tlm possible (:onceptual classes within 
1;he fl'ames' argmnent slots to 23 Wor(tNet nodes, 2 1;o 
facilitate generalisation and comI)arison of the verbs' 
seleetional preference behaviour. 
On the basis of the inforlnation al)out subcategori- 
sation frame types and their arguments' concet)tual 
classes I clustered 153 verbs from Levin's classitica- 
(;ion. I chose (i) some l)olysemous verbs to investi- 
gate how this l)henomenon could be handled 1)y the 
clustering algorithms, and (ii) high and low frequent 
verbs to see the intluence of frequency on th(; al- 
gorithms: the 1~3 verbs had 226 verb senses which 
belonged to 30 different semantic classes. D)ur of the 
verbs were low-Dequeney verbs with a total corpus 
frequency below 100. 
To cluster the verbs I applied two different al- 
gorithms, and each algorithm clustered the verl)s 
bot, h (h) according to only the syntactic informa- 
tion about tlm subcategorisation frames, and (B) 
according to the intbrmation at)out the subcategori- 
sation ti'ames including their selectional 1)referelmes. 
,. lterative clustering based on a dcfinition 
by (Ilugh, es, 109/,): 
In the l)eginning, each vert) represent;ed a single- 
ton cluster. Iteratively, the distances between 
tim clusters were lneasure(l and the closest chls- 
ters merged togel;her. 
For the rel)resentation of the. verbs, each verl) 
v was assigned a distribution over the ditfere.nt 
tyl)es of subcategorisatioll fl'anms i, according 
1;o the. maximum likelihood estimate, of (k) the. 
verb apl)earing with the frame tyl)e: 
f(v,/,) f(,,,) (9) 
with f(v,t) the joint fi'equency of verb and 
frmne type, and f(v) the fl'e(tuency of the verb, 
and (B) the verb appearing with the frame tyt)e 
mid a selectionally t)refe.rred (:lass coml)ination 
C for the m'gmnent t)osil;ions .s in t: 
i,(~,, ely ) =,,ef p(tl v) * J,(Clv , t) (10) 
with p(/,lv) defined as in equation (9), and 
p(C\]v, t) =&/ Ec:6,:l,,.~, \[Isct a.s.s'(v.~, c') (11) 
which intuitively estimates the probability of a 
certain class combination by comparing its as- 
sociation value with the sum over all possible 
class combinations, concerning the respective 
verb and frame. 
2I chose l.he 11 tel) level nodes of the 11 WordNet l,ierar- 
chies as conceptual classes. 'Phe top level node Entity seemed 
too general as concel)tual class, so it was replaced by its 13 
sulml'dinal, ed synsets. 
Starting out with each verb representing a sin- 
gleton cluster, I iteratively determined the two 
closest chlsters by applying tim information- 
theoretic measure relative cutropy :~ (Kulll)ack 
mid Leibler, 1951) to comi)are the distributions. 
The nearest clusters were merged into one clus- 
ter, and their distributions were merged 1)y cal- 
culating a weighted average. Based on test runs 
I defined lleuristics about how many elusl, eriug 
iteral;ions were pertbrmed. In addition, i lira- 
ire(1 the maximum mnuber of verbs within one 
(:luster to four elements because otherwise the. 
verbs showed the tendency to cluster together 
in a few large clusters only; so after the over- 
all clustering process was finished, each cluster 
with more tlmn four members initialised a fllr- 
ther clustering pass on itself. 
Unsupervised latent, class aualysis as described 
in (l~ooth, 1998), based on the cxpcetation- 
'maximisation al.qorithm: 
The algorithm identified categori(:al types 
among indirect, ly observed multinomial distri- 
butions 1) 3, apl)lying the EM-algorithm (\])elnp- 
steret al., 1977) to maximise the joint prol)a- 
bility of (h) t;he verb and frmne tyl)e: p(v, t), 
and (B) the verl) and frame type considering 
the selectional I)referenees: p(v, t, C). 
\]TUl)Ut to the algorithm were absolute, frequen- 
cies of the verl)s at)l)earing with the sul)categori- 
sation frames. Test runs showed that 80 clusters 
modelled the semantic verl) classes best. To 1)e 
able to comI)a.re the analysis wit;h the iterative 
clustering al)proach , I also limited tim numb(~r 
of verbs wit;hin a (:lus|;er 1;o four considering 
that; generally all verbs ai)l)ear within each (:lus- 
l;er when using this apl)roach , the verbs wil;h l:he 
highest l)rol)abilities where chosen. 
D)r version (h) the frequencies were provide.d 
by the joint frequencies of verbs and frame 
tyI)es, for version (B) I used the association 
va.lues of the verbs with tile frame tyl)eS con- 
sidering seleetional preferences, as described 1)y 
equation (10). 
The unsupervised algorithm then classified joint 
events of verbs and subeategoris~tion frmncs 
with 200 iterations of the EM-algorithm into 80 
clusters r, based on the iteratively estimated 
vahles 
v(v, 0 = v, l,) = 
T T (12) 
aConcerning the two typical prol)lems one has with this 
measure, (i) zero frequencies were smoothed 1)y adding 0.5 to 
all frequencies, and (ii) since the measure is not symmetric, 
the resl)ective smaller vahm was used as distance. 
749 
Information 
SFs 
SFs + Pretls 
Clusters Verbs 
Total Correct Total Correct Recall Precision 
31 20 90 55 36% 61% 
30 14 81 31 20% 38% 
Figure 1: Evaluation based on Iterative Clustering 
hfformation Clusters 
~lbtal Correct 
SFs 80 36 
SFs ~1-Prefs 80 22 
Verbs(Senses) 
Total Precision 
107(159) 
153(226) 
Correct Recall 
58(9O) 38(4O)% 
47(56) 31(25)0/o 
,54(57)% 
31(25)% 
Figure 2: Evaluation based on Latent Classes 
I,(,,, t, c) = v, t, c) = Cl ) 
T T 
(13) 
for versions (h) and (B), respectively. 
3 Evaluation 
The evaluation of the resulting clusters was based 
on Levin's classification. Figures 1 and 2 present the 
success of the two clustering algorithms, considering 
tim two difl'erent informational versions (/~) and (B). 
They contain the total mnnber of clusters the algo- 
rithms had formed (clusters containing between two 
and four verbs in the iterative algorithm, and the 
fixed immber of 80 clusters in the l&l;ent (:lass rarely- 
sis), the prol)ortion of correct clusters (non-singleton 
clusters which were subsets of a Levin (:lass, for ex- 
ample the cluster conl;aining the verl)s need, like, 
,want, desire is a subset of the Levin (:lass Desire), 
and the numl)er of verbs wMlin those clusters. In 
figure, 2 the nulnl)er of verbs in brackets rethrs to the 
respective number of Lheir senses, since a verb could 
be clustered several times according to its senses. 
For examl)le, the verl) want could t)e meml)er of the 
(:lasses Desire and Declaration. 
Recall was define(l by the I)ercentage of verbs 
(verb senses) within the correct clusters compared 
to the total munber of verbs (verb senses) to be clus- 
tered: 
I,,e,'bs ......... , ,.,,.,, ..... I 
?*C'C = 153 
(Iv ,.b .......... , ........... l) 
. 226 
and precision was defined by the percentage of verbs 
(verb senses) apl)earing in the correct clusters com- 
pared to the numl)er of verbs (verb senses) apl)earing 
in any cluster: 
\[ ve.rbs..o,..~,.,t ~.t,~t,~,.~ \[ wee = Ive,.r,s,,, ~,,.,,~,., I 
(iv-+ .................. , ........... I) 
Concerning t)recision, the assignntent of verbs into 
semantic classes was most successfifl when using the 
il;erative distance clustering method; 61% of all verbs 
were clustered into correct classes. Clustering the 
verbs into latent classes was with 54% less success- 
tiff. With both clustering methods the results be- 
came worse when adding information about the se- 
lectional preferences tbr the arguments in the sub- 
categorisation fl'ames. 
A baseline ext)eriment was performed in order to 
determine how hard the task of verb clustering was: 
each verb was randomly assigned another verb as 
"closest neighbour", which resulted in only 5% el the, 
verl)s being paired with a verb Don1 the same Levin 
(:lass. Performing the same experiment by assign- 
ing the closest neighbour on the basis of moasm'ing 
the relative entropy between two verbs' distributions 
over subcategorisation fl'ames resulted in 61% of the 
verbs pointing to a verb flom the same Levin class. 
4 Discussion d 
The classitications of both clustering approaches il- 
lustrate the close relationship between alternation 
behaviour and semantic classes, lYor exalnple, the 
common preferences of verbs (see the tlve most 
probable frames) ill the iteratively crea.ted Desire 
(:lass were towards a sul)ject followed by an infini- 
tival phrase (subj :to). Alternatively a l;ransitive 
subj :obj flame was used, partly followed by an ad- 
ditional infinitival phrase indicated by to: s 
4For a more detailed discussion see tile original 
work (Schulte im Walde, 1998). 
Note that the (wrongly chosen) intransitive fl:ame is listed 
as well. This is {Ill('. t,o underlying sentences containing an NP 
ellipsis, parsing mistakes and Dame extraction. 
750 
Verl) 
need 
desire 
Frame l)rol)ability 
subj:to 0.38 
subj:ol)j (I.32 
subj 0.10 
subj:obj:to 0.05 
subj:obj:pp.for 0.02 
sul)j:to 0.34 
subj:ol)j 0.34 
subj 0.14 
sul)j:obj:adv 0.(14 
sub.i:obj:obj 0.03 
subj:to 0.53 
subj :obj 0.15 
subj 0.11 
sul)j :ol)j :to (1.10 
subj :to:adv 0.02 
subj :obj 0.25 
subj 0.24 
sul)j:to 0.20 
sul~j:obj:to 0.(17 
sul)j:sent (I.02 
Adding ilfformation about the selectional prefer- 
enees of the verbs' argmnents hell)s to gel; a deeper 
idea about their lexical semantics. D:)r exalnple, 
mar~,'n, er of Motion verbs 1)referably appeared with 
a subject only, sometimes with a following adverl). 
The subject was an inanimate ol)ject, for move it 
might also be a part (such as a body part like fin_ 
ger) or a grout), roll and fly alternatively used the 
transitive frmne type subj :obj, preferal)ly with a 
living entity as subject, followed by an inanimate 
ob.iecl;: 
roll 
fly 
Fl'itllle 
sub.i (l'hysObject) 
subj (l'hysObject):adv 
subj(Agent):obj (lq~ysObject) 
subj (IJ fel,'orm) :ol)j ( lq C,'sObject) 
subj(Agent):obj (lhu't) 
subj(l'hysOI)ject) 
subj (l'hysOI)jcct):adv 
sub.i(Lifel,'orm) :obj (l'hysObjcct) 
subj (l,illaForm) :pp.to (1A fel"or n 0 
subj(Lifeleorm):l)p.to (Agent) 
sul).i(l 'hysObject) 
subj (l)hysOl~ject):adv 
sul)j (1 're'i,) 
sul~j(Groul)):adv 
subj(Part) :adv 
l'rob~dfility 
0.24 
0.10 
0.07 
0.07 
0.05 
0.3d 
0.12 
0.(17 
0.05 
0.0,1 
0.20 
0.11 
0.09 
0.0,1 
(1.0,1 
Parallel examples created by the latent class analy- 
sis present the clusters with the most probable verbs 
and frmnes, according to cluster membershi I) (first 
column). The dot indicates whether the verb-fi'mne 
combination was seen in the data, the mmtber next 
to the verb frame gives the probability of the verb- 
frmne combination. 
Some verbs of Telling were clustered mainly accord- 
ing to their similar transitive use combined with an 
infiifitival phrase: 
~°_ g o'° g 
Clusi;er d o cb o 
o = 
,, o 
o 9. 
(}.17 advise • • • • 
0.12 teltch • • • • 
0.12 instruct • • • • 
The verl)s of Aspect alternate between a subject 
only, realised by an action, an inanimate subject fol- 
lowed by an infinitiwfl phrase, and a living subject 
followed by a gerund: 
g', ~ g g 
ClHster o d o o 
© 
b0 
< ;~ ;5 < 
0.3,1 start • • • • 
0.19 finish • • • 
0.18 stop • • • 
0.16 begin • • 
Both approaches established a relationship be- 
tween alternation behaviour and semantic class by 
only considering information about the syntactic us- 
age of the subcategorisation Dames. The refinement 
by the frames' selectional preferences allowed fllrther 
demarcations by the identifying (:onceptual restric- 
tions on tile use of the frames. 
Since tim latent class analysis is a soft; clustering 
method, it additionally distinguishes between the 
dith;rent verbs' senses and the resl)ective uses of 
subcategorisation Dames. For example, the verb 
play was clustered with meet 1)ecause of tile com- 
mon strong tendency towards a transitive ti"ame il- 
lustrating a gen(;ral meeting, and it, was clustered 
with figh, t t)eemlse of their colnmon preference for 
an intransitive fi'ame together with a prepositional 
phrase headed 1)y against, illustrating a more aggres- 
six'(; me.eting like a fight: 
Cluster 
0.49 meet 
0.20 l)lay 
Cluster 
I~ g g g 
5 5 o o 
bO 
~L 
0.22 fight • • • • 
0.20 play • • • • 
An extensive investigation of tile linguistic relia- 
bility of the clustered verbs and frames showed that 
l;he character(sing usages could be under\](ned by cor- 
pus data, for example the above cited transitive use 
751 
of the verb fly concerning the subj : obj frame type 
with a living subject and ml inanimate object can be 
illustrated by the BNC-sentence In March the man- 
ufacturer's test pilot flew the aircraft for its annual 
inspection check flight. The clusters were therefore 
created on a reliable linguistic basis representing (a 
selective part of) the verbs' properties. 
Comparing the two informational versions, however, 
showed that refining the fralnes with selectional pref- 
erences points to a problem caused by data sparse- 
ness in the verb description. Investigating the au- 
tomatically created distribution of the verbs over 
the enriched fl'ame types revealed that, for exam- 
ple, even the high fl'equent, alternating verb move 
contains 97% (smoothed) zeroes within its distribu- 
tion. In accordance with this fiuding even subtle 
similarities, e.g. the sole fact that two verbs have 
non-zero wflues for certain fl'ame types, highly cor- 
relates the two verbs. For example, a semantic clus- 
ter contained the two verbs promise and love, be- 
cause both have non-zero attribute values for the 
subj :to frame, demmlding an agent for the subject 
slot; in their alternation behaviour (including selec- 
tional preferences) the two verbs differ, however, so 
they should not be packed into one cluster. A possi- 
ble suggestion to handle the problem of data sparse- 
ness could be to formulate the conceptual class types 
in a way which ensures an increased data potential 
for each type. 
Concerning the polysemy of verbs, the (hard) iter- 
ative distance clustering failed to model verb senses; 
a polysemous verb was either not at all assigned to 
any cluster, or assigned to a cluster describing one 
of the verb's senses. The (soft) latent (:lass analy- 
sis was able to filter the multiple senses and assign 
them to distinct (:lusters, but tended to split senses. 
Low-frequency verbs presented another problem, be- 
cause the verbs' distributions contained mostly ze- 
roes. They were assigned to clusters nearly ran- 
domly. 
An investigation of selected WordNet concep- 
tual classes revealed that the selectional preferences 
within the subcategorisation frames were donfinated 
by a few WordNet classes, mainly LifeForm and 
Agent. The demarcation between these two con- 
cepts was not obvious when referring to actually ap- 
pearing nouns within the frames, since both contain 
a large number of common subordinated nouns. In 
contrast, some WordNet classes were not chosen at 
all, e.g. Unit or Anticipation. Since the WordNet 
hierarchy in general had turned out to define intu- 
itively correct seleetional preferences, an improved 
classification utilised for my conceptual classifica- 
tion should be substituted by finer synsets, i.e. one 
should consider using a different cut through the 
WordNet hierarchy. 
5 Conclusion 
I proposed two algorithms for automatically classi- 
f~,ing verbs semantically, based on their alternation 
behaviour. Taking Levin's classification as a stan- 
dard for 153 manually chosen verbs with 226 verb 
senses and their assignment into 30 semantic classes, 
the iterative distance clustering succeeded for 61% 
of the verbs considering the syntactic usage of the 
fl'ames only, and for 38% when adding information 
about the frmne arguments' selectional preferences. 
The latent class analysis succeeded for 54% and 31%, 
respectively. 
An investigation of the resulting clusters showed 
that the assignment of the verbs was actually based 
on their shared linguistic properties: the verbs in 
a cluster presented common alternation behaviour, 
refined by adding selectional preferences to the syn- 
tactic description of the subcategorisation frmnes. 
It is impressive that as little lexical idiosyncratic 
verb information as the syntactic use of subcategori- 
sation fl'ames like subj : to or subj : pp. against suf- 
fices as a basis for a semantic class distinction to- 
wards Levin's narrow classification system including 
fine concepts as Desire or Manner of Motion. The 
potential is partly characterised by specific frames, 
but in the majority of cases by successflflly com- 
bining the frames in order to define the syntactic 
alternation, hnproving the definition and demarca- 
tion of conceptual classes should provide further po- 
tential concerning the inclusion of selectional prefer- 
ences into the syntactic description. 

References 

Richard Beckwith, Christiane Fellbaum, Derek 
Gross, and George A. Miller. 1991. Wordnet: A 
Lexical Database Organized on Psycholinguistic 
Principles. In Uri Zernik, editor, Lcxical Acqui- 
sition - Exploiting On-Line Resources to Bnild a 
Lczicon, chapter 9, pages 211 232. Lawrence Erl- 
baron Associates, Hillsdale - New Jersey. 

Glenn Carroll and Mats Rooth. 1998. Valence In- 
duction with a Head-Lexicalized PCFG. In Pro- 
ceedings of the 3rd Confcrcncc on Empirical Meth- 
ods in Natu~nl Language Processing, Granada, 
Spain. 

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. 
Maximum Likelihood from Incomplete Data via 
the EM algorithm. Journal of the Royal Statistical 
Society, 39(B):1-38. 

Bonnie J. Dorr and Doug Jones. 1996. Role of Word 
Sense Dismnbiguation in Lexical Acquisition: Pre- 
dicting Semantics from Syntactic Cues. In Pro- 
ceedings of the 16th International Conference on 
Comp'utational Linguistics, Copenhagen. 

John Hughes. 1994. Automatically Acquiring Clas- 
sification of Words. Ph.D. thesis, University of 
Leeds, School of Computer Studies. 

Douglas A. Jones, Robert C. Berwick, Franklin 
Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki No- 
mura, Anand Radhakrislman, Ulri('h Sauerlan(1, 
and Brian Ulicny. 1994. Verb (,'lasses and Al- 
ternations ill Bangla, German, English, and Ko- 
rean. Technical l{el)ort MIT AI MEMO 1517, 
Massachusetts Institute of Technology. 

Judith L. Kla.vans and Min-Yen Kan. 1998. The 
Role of Verbs in DOeulnent Analysis. In Pwceed- 
ings of thc 17th Intcrnational Co~@rcncc on Com- 
putational Linguistics, Montreal, Canada. 

S. Kullback and R. A. Leibler. 1951. On Infl)rmation 
and Sufficiency. Annals of Mathematical Statis- 
tics, 22:79-86. 

Maria Lapata. 1999. Acquiring Lcxical Generaliza- 
tions from Corpora: A Case Study for Diathe- 
sis Alternations. In Proceedings of the 37th An- 
nual Mccting of the Association for Computa- 
tional Linguistics, pages 397 404:. 

Beth Levin. 1993. English Verb Classes and Al- 
ternations. The University of Chi(:ago Press, 
Chicago, 1st edition. 

Malka Rat)i)al)ort Hovav and Beth Levin. 1998. 
Building Verb Meanings. In M. Butt and 
W. Geuder, editors, Lcxical and Compositional 
Factors, pages 97-134. CSLI Publications, Stan- 
ford, CA. 

Philip Resnik. 1993. Selection and Information: 
A Class-Based AppTvach to Lexical Relationsh, ips. 
Ph.D. thesis, University of Pennsylvania. 

Philip Resnik. 1997. Selectional Preference and 
Sense Disambiguation. In Proceedings of the ACL 
SIGLEX Workshop on ~hflginfl ~::ct with, Lcxical 
Semantics: Wh, y, Wh, at, and llow? 

l~5"ancesc Ribas. 1994. An Experiment on Learn- 
ing Appropriate SelectionM Restrictions fi'om a 
Parsed Corpus. In Procecdings of the 15th Inter- 
national Conference on Computational Linguis- 
tics, pages 769 774. 

Francesc Ribas. 1995. On Learning Mot'e Appropri- 
ate Selcctional Restrictions. In Pwcccdings of the 
7th Conference of the Eurot)ean Chaptcr of the As- 
sociation for Computational Linguistics, Dublin, 
Ireland. 

Mats Rooth. 1998. Two-Dimensional Clusters in 
Grammatical Relations. In Inducing Lexicons 
with th, c EM Algorithm, AIMS Report 4(3). Insti- 
tut ffir Maschinelle Si)raehverarl)eitung, Univer- 
sitgt Stuttgart. 

Sabine Schulte im Walde. 1998. Automatic Se- 
nmntic Classification of Verbs According to Their 
Alternation Behaviour. Master's thesis, Institut 
ffir Maschinelle Sprachverarbeitung, UniversitSt 
Stuttgart. 

Suzamm Stevenson and Paola Merlo. 1999. Auto- 
Inatic Verb Classification Using Distributions of 
Grammatical Features. hi P~vcccdings of the 9th 
Conference of thc European Chaptcr of the Associ- 
ation for Computational Linguistics, pages 45-52. 
