Constructing Verb Semantic Classes for French: Methods and 
Evaluation 
Patrick Saint-Dizier 
IRIT-CNRS 
118 route de Narbonne F-31062 Toulouse Cedex France
stdizier@irit.fr
Abstract 
In this paper, we study a reformulation, 
which is better adapted to NLP, of the 
alternation system developed for English 
by B. Levin. We have studied a set of 
1700 verbs from which we explain how 
verb semantic classes can be built in a 
systematic way. The quality of the results
w.r.t. semantic classifications such
as WordNet is then evaluated.
1 Aims 
Predicative forms are complex to describe; it is 
indeed necessary to describe in detail their syntactic
behavior, the different meanings they may
convey, preferably at different levels of granularity
(e.g. argument structure, thematic grids, conceptual
representations (Jackendoff 90)), and the
relations between syntactic forms and meaning(s) 
(Levin 93), (Williams 94). It is also important to 
hierarchically organize these predicative forms so 
that syntactic and semantic descriptions are min- 
imal and coherent. 
Our work focusses on verbs and is primarily
based on B. Levin's work (Levin 93) for English,
where she shows that the syntactic behavior of
verbs is in large part predictable from some aspects
of their semantics. By syntactic behavior,
she means the way arguments are syntactically
realized with respect to the predicate. This includes
the description of the basic distribution of
the arguments, the description of the other positions
they may occupy (e.g. ergative and passive
forms) and when they can be conjoined or deleted.
These descriptions are called alternations. This
work results in the creation of organized verb semantic
classes, mainly based on their syntactic behavior
(alternations may also include specific semantic
restrictions). It is an extremely useful and
detailed study of the syntactico-semantic relations 
between a predicate and its arguments. 
We show here how the alternation system can
be reformulated in a more NLP-oriented way,
and develop for French a set of syntactic descriptions,
called contexts, which share many similarities
with alternations, and propose principles
that help define their form and contents. Next,
we show how verb semantic classes can be constructed
in a systematic way and evaluate them
w.r.t. WordNet-like classifications. The implicit
semantics conveyed by contexts is also analysed.
The work presented here is applied to French, but 
can be transposed to other languages. 
2 The context system 
2.1 General approach and motivations 
We have reformulated Beth Levin's notion of alternation
into a more declarative one: the notion
of context. A context is a frame where the category
and some additional syntactic information are used to
describe a precise form and position that the arguments
of a verb may have in a sentence. Verb classes
are then formed from verbs having similar sets of 
contexts. 
Very briefly, compared to the alternation system,
our approach avoids having to define a basic
form from which alternations are produced and
having to explain the relation between a basic
and an alternated form. Moreover, it spares us
from accounting for the changes in meaning induced
by alternations (e.g. by the adjunction of a
preposition).
Defining contexts has led us to formulate a few
principles:
• contexts should be of general purpose; this
means that exceptional forms should be
avoided, only non-ambiguous and easy-to-use
forms are acceptable, and theory-neutral descriptions
should be used,
• contexts should minimally overlap,
• they must only describe lexical properties;
the scope of a context is usually a proposition,
• as little semantic data as possible should be
used, otherwise the classification will also be
based on semantic criteria,
• the exact level of granularity of a context 
should be defined by feedback and retro- 
evaluation on verbs, 
• consider generalizing two contexts into one, if 
their discriminatory power is low. 
These principles allow us to partly automate the
determination of those contexts which can be associated
with a given verb (for example by corpus
inspection). However, there will always remain
quite a lot of manual work to check and improve 
the results, in spite of some promising research in 
this direction (Dorr et al. 95a). 
As shall be seen below, the context system 
(which is not really a new concept) provides us 
with a very powerful tool for specifying and or- 
ganizing the syntax and the semantics of verbs. 
Our contribution at this level is the way a context 
is defined, at what level of generality, with what 
formal means, and the way contexts are used to 
form verb classes. 
From a methodological point of view, contexts 
for French have been defined from a transposi- 
tion of some English alternations (about 1/3 of 
our contexts), from French syntactic descriptions, 
among which (Gross 75), from corpora and from 
our own intuitions of language. Context cover- 
age has then been validated on corpora to ensure 
that we cover most of the syntactic behaviors of 
arguments w.r.t. predicates.
2.2 Description of contexts 
Contexts and the detailed criteria used to define 
them are presented in (Saint-Dizier 95). A context 
is a set of 'extended' distribution frames: 
1. a set: a cluster of syntactic forms which must
all be valid for a given verb-sense. A verb ac- 
cepts a certain context if it accepts all the 
distributions the context is composed of. A 
distribution is a list of syntactic constructions 
(NPs, PPs and sentences); this list is ordered 
and corresponds to the way these construc- 
tions are linearly realized in the surface form 
as arguments or modifiers. 
2. 'extended': syntactic category distributions
are expressed as a Typed Feature Structure
(written in Login (Aït-Kaci and Nasr 86)).
We have identified several types of constraints:
• Local constraints on arguments or on the
verb: thematic roles (including those defined
in (Pugeault et al. 94), from (Dowty 89, 91)),
the verb subcategorization frame, the arity of
the verb, and a few commonly-admitted selectional
restrictions.
• Introduction of syntactic forms: coordination 
of arguments, introduction of reflexive pro- 
nouns and of a few modifiers. 
• Relations between arguments: thematic
grids, modifier-modifiee relation between arguments
(e.g. noun complements), and
expression of essential semantic relations: 
container-containee, and part-whole of vari- 
ous types. 
Our descriptions are more declarative than alternations.
However, it is clear that this formalism
allows us to introduce some forms of constraints
between basic forms (via constraints on the verb)
and the form being described. Similarly, the use
of clusters of descriptions permits us to relate two 
forms. 
We have defined 70 contexts, including 'basic'
contexts (corresponding to 'direct' realizations of
argument structures) and non-basic ones. We
have grouped the non-basic ones according to
some similarities into 17 subclasses. We have a
total of 23 basic contexts (of general purpose)
and 47 non-basic ones (there are 89 alternations
in English). Non-basic contexts include the description
of: middle reflexives, passives, inchoatives,
place-subject inversion, introduction of the
semi-auxiliary faire, support verbs with nominalization
of the predicate (e.g. crier - pousser un
cri), various forms of argument deletion, preposition
change, reciprocals, body-part reformulations,
means-instrument raising, reflexives, argument
'des-incorporation', perspective change,
there insertion, etc.
For example, the famous English
spray/load alternation, which also exists in
French, is described as follows:
context([dist(111,                          % context ID is 111
    verb([]),                               % no constraint on verb
    phrases([
      xp(syntax=>syn(cat=>n)),              % distribution
      xp(syntax=>syn(cat=>n)),
      xp(syntax=>syn(cat=>p,
                     type-prep=>[sur,dans]),
         semantics=>sem(thematic=>[[loc]]))]),
    constraints([]),
    ex([je,pulverise,la,peinture,sur,le,mur])),
                                            % I spray paint on the wall
  dist(111, verb([]),
    phrases([xp(syntax=>syn(cat=>n)),
      xp(syntax=>syn(cat=>n),
         semantics=>sem(thematic=>[[loc]])),
      xp(syntax=>syn(cat=>p, type-prep=>[de]),
         semantics=>sem(thematic=>[[tg]],
                        sem-type=>t_sem(semp=>substance)))]),
    constraints([]),
    ex([je,pulverise,le,mur,de,peinture]))]).
                                            % I spray the wall with paint
(tg : general theme and loc : localization). 
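The acceptance rule of Section 2.2 (a verb accepts a context only if it accepts all the distributions the context is composed of) can be sketched in Python as follows. This is a minimal illustration, not the Login encoding used in this work: the names `Phrase`, `Context` and `accepts`, and the reduction of the typed feature structures to three fields, are assumptions made for the example, which is loosely modelled on the spray/load sentences.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

@dataclass(frozen=True)
class Phrase:
    """Simplified stand-in for an xp(...) term (hypothetical reduction)."""
    cat: str                               # 'n' for NP, 'p' for PP
    preps: FrozenSet[str] = frozenset()    # admissible prepositions, if any
    role: Optional[str] = None             # thematic role, e.g. 'loc' or 'tg'

Distribution = Tuple[Phrase, ...]

@dataclass(frozen=True)
class Context:
    ident: int
    distributions: Tuple[Distribution, ...]

def accepts(attested: FrozenSet[Distribution], context: Context) -> bool:
    """A verb-sense accepts a context iff it accepts every distribution in it."""
    return all(d in attested for d in context.distributions)

# Two frames modelled on: je pulverise la peinture sur le mur /
# je pulverise le mur de peinture.
NP = Phrase("n")
on_wall = (NP, NP, Phrase("p", frozenset({"sur", "dans"}), "loc"))
with_paint = (NP, Phrase("n", role="loc"), Phrase("p", frozenset({"de"}), "tg"))
CONTEXT_111 = Context(111, (on_wall, with_paint))

# Hypothetical verb-senses: one attested in both frames, one in the first only.
pulveriser = frozenset({on_wall, with_paint})
only_first_frame = frozenset({on_wall})

print(accepts(pulveriser, CONTEXT_111))        # True
print(accepts(only_first_frame, CONTEXT_111))  # False
```

Treating a context as an all-or-nothing cluster is what distinguishes it from a plain subcategorization frame: a verb attested in only one of the two frames does not receive the context.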
3 Construction of verb classes 
3.1 Typology of the verb sample 
The experiment presented here has been carried out
on a set of 1700 common verbs which are the most
frequently used in French. Our aim is to classify
3000 to 4000 verbs. The size of the sample con- 
sidered so far is however sufficiently large to allow 
us to draw significant and precise conclusions. 
It should be noticed that contexts are associated 
with a given word-sense, not with all the senses
of a verb. Each sense of a polysemous verb is 
associated with a different set of contexts. The 
description of a verb is the following: 
verb([verb], arity, [basic context number],
     [thematic grid], [prepositions],
     [list of contexts]).
verb([admirer], 3, [20], [ae,tib,src], [pour],
     [50,51,61,102,150,171,180]).
(ae = effective agent, tib = incremental bene- 
ficiary theme, src = source). Contexts have been 
associated with verbs on the basis of a number
of linguistic analyses of French (e.g. (Gross 75)), 
of already existing lexicons, and from corpora in- 
spection and our own intuitions. 
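For readers who prefer a record-like view, the verb description above can be mirrored by a small Python type. This is only a sketch: the class name `VerbEntry` and its field names are assumptions, and reading the final list as the contexts accepted in addition to the basic one is also an assumption. The values are those of the admirer entry.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class VerbEntry:
    lemma: str
    arity: int
    basic_context: int
    thematic_grid: Tuple[str, ...]
    prepositions: Tuple[str, ...]
    contexts: FrozenSet[int]

    def all_contexts(self) -> FrozenSet[int]:
        """Basic context together with the other contexts of this verb-sense."""
        return self.contexts | {self.basic_context}

admirer = VerbEntry(
    lemma="admirer", arity=3, basic_context=20,
    thematic_grid=("ae", "tib", "src"), prepositions=("pour",),
    contexts=frozenset({50, 51, 61, 102, 150, 171, 180}),
)

print(len(admirer.all_contexts()))  # 8
```

Storing the contexts as a set rather than a list makes the comparison between two verb-senses independent of the order in which contexts were recorded.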
3.2 A simple verb classification 
We have carried out a simple classification where
a verb class contains all the verbs which accept 
exactly the same set of contexts. This is not the 
classification method adopted by Beth Levin: her 
verb classes are constructed from subsets of alter- 
nations, intuitively selected, which are sufficiently 
selective to allow for the characterization of a set 
of semantically related verbs. Exceptions are allowed
in order to effectively gather all the verbs
which are intuitively semantically related. Her
classification method, based on a large number of
linguistic analyses involving some subtle semantic
criteria (e.g. intentionality), can only be carried
out manually and is therefore not suited to our
approach.
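The strict classification described above amounts to grouping verb-senses by their exact set of contexts. A minimal Python sketch follows; the function name `build_vs_classes` is hypothetical and the context numbers in the toy data are invented for the example.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

def build_vs_classes(
    verbs: Iterable[Tuple[str, Iterable[int]]]
) -> Dict[FrozenSet[int], List[str]]:
    """Group verb-senses that accept exactly the same set of contexts."""
    classes: Dict[FrozenSet[int], List[str]] = defaultdict(list)
    for lemma, contexts in verbs:
        classes[frozenset(contexts)].append(lemma)
    return dict(classes)

# Toy data: context numbers are invented for the example.
sample = [
    ("charger", [10, 111, 130]),
    ("pulveriser", [111, 10, 130]),   # same set, listed in a different order
    ("admirer", [20, 50, 51]),
]

vs_classes = build_vs_classes(sample)
for contexts, members in vs_classes.items():
    print(sorted(contexts), members)
# [10, 111, 130] ['charger', 'pulveriser']
# [20, 50, 51] ['admirer']
```

Because the key is the whole set, a single additional context is enough to place a verb in a class of its own.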
We obtain a total of 953 classes. We get a large
number of classes with just one element (about
77%); this is not surprising, however, since contexts
can be combined in a large number of ways.
56% of the verbs appear in classes with at least 2 
elements, and 33% of them are in classes with at 
least 5 elements. 
This number of classes is quite large compared
to Beth Levin's results (about 200 classes). However,
our classes have been constructed on a strict
equivalence class basis, without any exceptions,
and all the contexts have been taken into account.
We have an average of 1.8 verbs per class. A similar
result was also obtained by (Gross 75), on a
different basis (including morphology) and with
more criteria (about 200).
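The figures reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; since the share of one-element classes is stated as "about 77%", the derived values are approximate.

```python
n_verbs = 1700
n_classes = 953
singleton_share = 0.77          # "about 77%" of the classes have one element

avg_size = n_verbs / n_classes
# One-element classes contain one verb each, so this is also a verb count.
singletons = round(singleton_share * n_classes)
# Share of verbs left for classes with at least 2 elements.
in_larger_classes = (n_verbs - singletons) / n_verbs

print(f"{avg_size:.2f}")            # 1.78
print(singletons)                   # 734
print(f"{in_larger_classes:.1%}")   # 56.8%
```

The results agree with the reported average of 1.8 verbs per class and with the 56% of verbs found in classes of at least 2 elements.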
A very informal study of the progression of the 
number of classes tends to indicate that the in- 
crease of the number of new classes is not linear, 
but progressively decreases. It seems that beyond 
2500 verbs almost no new verb class should be 
created, defining about 1100 to 1200 classes. But
this is clearly too many.
3.3 Evaluation of the semantic
relatedness of verb semantic classes
The overall quality of the verb classes is studied
in detail in (Saint-Dizier 95). With the same
set of verb-senses, we have carried out a classification
similar to the classification proposed in
WordNet. Besides the main categories presented
in (Fellbaum 93), we have added two classes: aspectual
verbs and verbs expressing causality. We
have then subdivided these main categories according
to different types of properties or constraints,
following as much as possible those defined
in WordNet. In our current classification, we
consider 198 hierarchically organized classification
criteria, which are instances of the is-a (or troponymy)
relation; the depth of the decomposition is 3
(Saint-Dizier 96). We therefore get 198 verb classes
(called WN classes) for levels 1 to 3. For example,
a three-level decomposition is: movement
verbs (level 1), directed motion, local motion, etc.
(level 2) and upward motion, downward motion,
etc. (level 3).
If we now compare the degree of overlap between
the classes (with at least 2 elements) formed
above from syntactic contexts (called VS classes)
and those of WN, we get the following results:
WN level   (1)    (2)    overlap VS/WN
1          17     120    54%
2          75     41     47%
3          106    18     32%
(1): number of WN classes, (2): average size of
a WN class at this level.
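This section does not spell out how the overlap rate is computed. One possible reading, assumed here purely for illustration, scores each VS class by the share of its verbs that fall in its best-matching WN class, and averages this score over the VS classes with at least 2 elements. The function name `overlap_rate` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def overlap_rate(vs_classes: List[List[str]], wn_class_of: Dict[str, str]) -> float:
    """Mean share of verbs in a VS class that fall in its most frequent WN class.

    Assumed measure (majority-class purity); VS classes with fewer than
    2 elements are ignored, as in the comparison above.
    """
    scores = []
    for members in vs_classes:
        if len(members) < 2:
            continue
        counts = Counter(wn_class_of[v] for v in members)
        scores.append(counts.most_common(1)[0][1] / len(members))
    return sum(scores) / len(scores) if scores else 0.0

# Toy data, invented for the example.
wn = {"monter": "motion", "descendre": "motion", "crier": "sound",
      "hurler": "sound", "donner": "possession"}
vs = [["monter", "descendre"], ["crier", "hurler", "donner"], ["admirer"]]

print(round(overlap_rate(vs, wn), 3))  # 0.833
```

Under this reading, the rate can be computed separately for each WN level by supplying the level 1, level 2 or level 3 class of each verb in `wn_class_of`.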
Classes where verbs are associated with at least
5 contexts are of a much better quality (semantic
relatedness with WN classes above 64%) than
those under 5. The best classes contain an average
of 4 to 7 verbs; larger classes (above 10 elements)
are often of a lower quality or may contain
several subsets of semantically related verbs: in
a large number of classes with more than 8 elements
we found 2 or 3 subsets of classes of WN.
These classes are often formed from a small number
of contexts (1 to 3), which explains their low
semantic relatedness rate.
Globally, these results are not very good. If we
want to explore in more depth the cooperation
between syntax and semantics, and if we want to
be able to construct verb semantic classes on a
rigorous basis, it is necessary to develop methods
that improve the quality of VS classes (considering
that syntactic criteria are the most 'rigorous'
ones a priori). The first approach, which is the
simplest, is to make the classification more flexible
by allowing exceptions: a verb in a class may have
one more or one less context than the norm of the
class. This approach, however, gives very poor results,
with an overlap VS/WN rate below 35%.
To improve that rate, exceptions should depend 
on the VS class, but this is extremely subjective 
and hard to carry out. The second type of solution
consists in analyzing the implicit semantics conveyed
by contexts and in forming classes from sets of
contexts, on the basis of their implicit semantics.
Then all the verbs accepting every context of an a priori
given set will belong to the same VS
class, even if they accept many other contexts.
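The second solution replaces the equality test on context sets by an inclusion test. A minimal Python sketch, in which the function name `verbs_for_context_set` is hypothetical and the context numbers are invented for the example:

```python
from typing import Dict, FrozenSet, List

def verbs_for_context_set(
    required: FrozenSet[int], verb_contexts: Dict[str, FrozenSet[int]]
) -> List[str]:
    """Verbs accepting every context in `required`, whatever else they accept."""
    return sorted(v for v, ctx in verb_contexts.items() if required <= ctx)

# Toy data: context numbers are invented for the example.
lexicon = {
    "donner":  frozenset({10, 40, 41, 90}),
    "offrir":  frozenset({10, 40, 41}),
    "prendre": frozenset({10, 40}),
}

print(verbs_for_context_set(frozenset({40, 41}), lexicon))  # ['donner', 'offrir']
```

With the strict classification, donner and offrir would fall in two different classes because of the extra context of donner; with the inclusion test they are gathered by the set {40, 41}.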
4 Analysis of the semantics 
conveyed by contexts 
Some contexts are quite general and are not re- 
lated to precise semantic notions, while others 
convey clearly identifiable meaning components. 
First, there are contexts which convey very pre- 
cise meaning components, which are not taken 
into account, for various reasons, in WordNet clas- 
sifications. For example, the context of the form 
'pousser + nominalization of verb' is associated 
with verbs of sound emission: painful sounds for 
humans and any sound for animals; verbs which 
accept the 'dans/en-de preposition change' convey 
an idea of putting something into something else 
(bourrer le tuyau de papier, bourrer le papier dans
le tuyau).
Next, a second type of context conveys meaning 
components which can directly be associated with 
WN criteria. We have carried out a detailed anal- 
ysis of the correlations between WN criteria and 
contexts. There are 19 non-basic contexts (out of 
47), which can very clearly be associated with 1 
or 2 WN criteria. For example, context 91, (je fais
atterrir l'avion ('I make land the plane')), is at 90%
associated with verbs of body care. Context 151, 
(alternation 2.13.4 in Beth Levin: Les grimaces de 
Jean terrifient Sophie), is associated at a rate of 
60% with psychological verbs. This is studied in 
detail in (Saint-Dizier 96). 
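The association rates quoted above can be read as the share of the verbs accepting a given context that fall under a given WN criterion. This reading is an assumption, since the computation is not detailed here. Under that assumption, a minimal Python sketch is as follows; the function name `association_rate` is hypothetical and the toy data do not reproduce the reported figures.

```python
from typing import Dict, FrozenSet

def association_rate(
    context: int, criterion: str,
    verb_contexts: Dict[str, FrozenSet[int]], wn_class_of: Dict[str, str],
) -> float:
    """Share of the verbs accepting `context` that fall under `criterion`."""
    accepting = [v for v, ctx in verb_contexts.items() if context in ctx]
    if not accepting:
        return 0.0
    hits = sum(1 for v in accepting if wn_class_of.get(v) == criterion)
    return hits / len(accepting)

# Toy data, invented for the example (not the figures reported above).
contexts = {
    "terrifier": frozenset({151, 20}),
    "amuser":    frozenset({151}),
    "casser":    frozenset({151, 30}),
    "manger":    frozenset({20}),
}
wn = {"terrifier": "psychological", "amuser": "psychological",
      "casser": "change", "manger": "consumption"}

print(round(association_rate(151, "psychological", contexts, wn), 2))  # 0.67
```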
5 Perspectives 
The semantic characterization of contexts should 
allow us to construct verb semantic classes on a 
stronger basis, and with a clear method. We have 
carried out preliminary experiments on transfer 
of possession verbs which confirm this hypothe- 
sis. Besides these results, it is of much interest 
to study how WN and VS classification systems 
can cooperate and can contribute to defining the 
syntax and the semantics of verbs, in a quite comprehensive
and fine-grained way. It should be
noted that we consider the syntax-based approach
(VS) to be the most stable and the most formal
approach; it should therefore be the central
element of our classification strategy. WN criteria
are extremely useful, but they remain nevertheless 
somewhat intuitive and less connected to language 
realizations. 
Our ultimate goal, from this perspective, is to
associate with families of verb classes, verb classes
and possibly individual verbs, hierarchically organized
semantic representations, in the form of
partially instantiated LCS-based semantic representations
(a successful experiment in this direction
has been carried out for English by (Voss
and Dorr 95), and also by ourselves on verbs of
transfer of possession) and ontological knowledge.
Acknowledgements I thank Bonnie Dorr, 
Martha Palmer, Beth Levin, Doug Jones and 
Palmira Marrafa for discussions that helped improve
this research. Many thanks also to Alda
Mart who carried out parts of the syntactic de- 
scriptions of verbs. 
References 
Aït-Kaci, H., Nasr, R., LOGIN: A Logic Programming
Language with Built-in Inheritance,
Journal of Logic Programming, vol. 3, pp 185-215,
1986.
Dorr, B., Garman, J., Weinberg, A., From Syntactic
Encodings to Thematic Roles: Building Lexical
Entries for Interlingual MT, Machine Translation,
9-3, Kluwer Academic, 1995.
Dowty, D., On the Semantic Content of the Notion
of Thematic Role, in G. Chierchia, B. Partee,
R. Turner (eds), Properties, Types and Meaning,
Kluwer, 1989.
Dowty, D., Thematic Proto-roles and Argument
Selection, Language, vol. 67-3, 1991.
Fellbaum, C., English Verbs as Semantic Net,
Journal of Lexicography, 1993.
Gross, M., Méthodes en syntaxe, Masson, Paris,
1975.
Jackendoff, R., Semantic Structures, MIT
Press, 1990.
Levin, B., English Verb Classes and Alternations:
A Preliminary Investigation, Chicago Univ.
Press, 1993.
Pinker, S., Learnability and Cognition, MIT
Press, 1993.
Pugeault, F., Saint-Dizier, P., Monteil, M.G.,
Knowledge Extraction from Texts: a method
for extracting predicate-argument structures from
texts, in proc. Coling 94, Kyoto, 1994.
Saint-Dizier, P., Verb Semantic Classes in
French, IRIT research report, December 1995 (revised
and extended May 1996).
Saint-Dizier, P., Semantic verb classes based on
'alternations' and on WordNet-like semantic criteria:
a powerful convergence, in proc. workshop
on Predicative Forms, univ. of Toulouse, August
1996.
Voss, C., Dorr, B., Toward a Lexicalized Grammar
for Interlinguas, Machine Translation, 9-4,
Kluwer Academic, 1995.
Williams, E., Thematic Structure in Syntax,
Linguistic Inquiry monograph no. 23, MIT Press,
1994.
