Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, pages 10–17,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
Verb subcategorization kernels for automatic semantic labeling
Alessandro Moschitti and Roberto Basili
Department of Computer Science
University of Rome ”Tor Vergata”
Rome, Italy
{moschitti,basili}@info.uniroma2.it
Abstract
Recently, many researches in natural lan-
guage learning have considered the repre-
sentation of complex linguistic phenom-
ena by means of structural kernels. In
particular, tree kernels have been used to
represent verbal subcategorization frame
(SCF) information for predicate argument
classification. As the SCF is a relevant
clue to learn the relation between syn-
tax and semantic, the classification algo-
rithm accuracy was remarkable enhanced.
In this article, we extend such work by
studying the impact of the SCF tree kernel
on both PropBank and FrameNet seman-
tic roles. The experiments with Support
Vector Machines (SVMs) confirm a strong
link between the SCF and the semantics of
the verbal predicates as well as the bene-
fit of using kernels in diverse and complex
test conditions, e.g. classification of un-
seen verbs.
1 Introduction
Some theories of verb meaning are based on syn-
tactic properties, e.g. the alternations of verb argu-
ments (Levin, 1993). In turn, Verb Subcategoriza-
tion Frame (SCF) characterizes different syntactic
alternations, thus, it plays a central role in the link-
ing theory between verb semantics and their syntac-
tic structures.
Figure 1 shows the parse tree for the sentence
"John rented a room in Boston" along
Figure 1: A predicate argument structure in a parse-tree rep-
resentation.
with the semantic shallow information embodied by
the verbal predicate to rent and its three arguments:
Arg0, Arg1 and ArgM. The SCF of such verb, i.e.
NP-PP, provides a synthesis of the predicate argu-
ment structure.
Currently, the systems which aim to derive se-
mantic shallow information from texts recognize the
SCF of a target verb and represent it as a flat feature
(e.g. (Xue and Palmer, 2004; Pradhan et al., 2004))
in the learning algorithm. To achieve this goal, a lex-
icon which describes the SCFs for each verb, is re-
quired. Such a resource is difficult to find especially
for specific domains, thus, several methods to auto-
matically extract SCF have been proposed (Korho-
nen, 2003). In (Moschitti, 2004), an alternative to
the SCF extraction was proposed, i.e. the SCF ker-
nel (SK). The subcategorization frame of verbs was
implicitly represented by means of the syntactic sub-
trees which include the predicate with its arguments.
The similarity between such syntactic structures was
evaluated by means of convolution kernels.
Convolution kernels are machine learning ap-
proaches which aim to describe structured data in
10
terms of its substructures. The similarity between
two structures is carried out by kernel functions
which determine the number of common substruc-
tures without evaluating the overall substructure
space. Thus, if we associate two SCFs with two
subtrees, we can measure their similarity with such
functions applied to the two trees. This approach
determines a more syntactically motivated verb par-
tition than the traditional method based on flat SCF
representations (e.g. the NP-PP of Figure 1). The
subtrees associated with SCF group the verbs which
have similar syntactic realizations, in turn, accord-
ing to Levin’s theories, this would suggest that they
are semantically related.
A preliminary study on the benefit of such ker-
nels was measured on the classification accuracy of
semantic arguments in (Moschitti, 2004). In such
work, the improvement on the PropBank arguments
(Kingsbury and Palmer, 2002) classification sug-
gests that SK adds information to the prediction
of semantic structures. On the contrary, the perfor-
mance decrease on the FrameNet data classification
shows the limit of such approach, i.e. when the syn-
tactic structures are shared among several semantic
roles SK seems to be useless.
In this article, we use Support Vector Machines
(SVMs) to deeply analyze the role of SK in the au-
tomatic predicate argument classification. The ma-
jor novelty of the article relates to the extensive ex-
perimentation carried out on the PropBank (Kings-
bury and Palmer, 2002) and FrameNet (Fillmore,
1982) corpora with diverse levels of task complex-
ity, e.g. test instances of unseen predicates (typi-
cal of free-text processing). The results show that:
(1) once a structural representation of a linguistic
object, e.g. SCF, is available we can use convolu-
tion kernels to study its connections with another
linguistic phenomenon, e.g. the semantic predicate
arguments. (2) The tree kernels automatically derive
the features (structures) which support also a sort of
back-off estimation in case of unseen verbs. (3) The
structural features are in general robust in all testing
conditions.
The remainder of this article is organized as fol-
lows: Section 2 defines the Predicate Argument Ex-
traction problem and the standard solution to solve
it. In Section 3 we present our kernels whereas
in Section 4 we show comparative results among
SVMs using standard features and the proposed ker-
nels. Finally, Section 5 summarizes the conclusions.
2 Parsing of Semantic Roles and Semantic
Arguments
There are two main resources that relate to predicate
argument structures: PropBank (PB) and FrameNet
(FN). PB is a 300,000 word corpus annotated with
predicative information on top of the Penn Treebank
2 Wall Street Journal texts. For any given pred-
icate, the expected arguments are labeled sequen-
tially from Arg 0 to Arg 9, ArgA and ArgM. The
Figure 1 shows an example of the PB predicate an-
notation. Predicates in PB are only embodied by
verbs whereas most of the times Arg 0 is the subject,
Arg 1 is the direct object and ArgM may indicate lo-
cations, as in our example.
FrameNet also describes predicate/argument
structures but for this purpose it uses richer se-
mantic structures called frames. These latter are
schematic representations of situations involving
various participants, properties and roles, in which
a word may be typically used. Frame elements or
semantic roles are arguments of target words that
can be verbs or nouns or adjectives. In FrameNet,
the argument names are local to the target frames.
For example, assuming that attach is the target word
and Attaching is the target frame, a typical sentence
annotation is the following.
[Agent They] attachTgt [Item themselves]
[Connector with their mouthparts] and then
release a digestive enzyme secretion which
eats into the skin.
Several machine learning approaches for argu-
ment identification and classification have been de-
veloped, e.g. (Gildea and Jurasfky, 2002; Gildea and
Palmer, ; Gildea and Hockenmaier, 2003; Pradhan et
al., 2004). Their common characteristic is the adop-
tion of feature spaces that model predicate-argument
structures in a flat feature representation. In the next
section we present the common parse tree-based ap-
proach to this problem.
2.1 Predicate Argument Extraction
Given a sentence in natural language, all the predi-
cates associated with the verbs have to be identified
11
along with their arguments. This problem can be
divided into two subtasks: (a) the detection of the
target argument boundaries, i.e. all its compound-
ing words, and (b) the classification of the argument
type, e.g. Arg0 or ArgM in PropBank or Agent and
Goal in FrameNet.
The standard approach to learn both the detection
and the classification of predicate arguments is sum-
marized by the following steps:
1. Given a sentence from the training-set, gener-
ate a full syntactic parse-tree;
2. let P and A be the set of predicates and the
set of parse-tree nodes (i.e. the potential argu-
ments), respectively;
3. for each pair <p,a>∈P ×A:
• extract the feature representation set, Fp,a;
• if the subtree rooted in a covers exactly
the words of one argument of p, put Fp,a
in T+ (positive examples), otherwise put
it in T− (negative examples).
For instance, in Figure 1, for each combination of
the predicate rent with the nodes N, S, VP, V, NP,
PP, D or IN the instances Frent,a are generated. In
case the node a exactly covers ”Paul”, ”a room” or
”in Boston”, it will be a positive instance otherwise
it will be a negative one, e.g. Frent,IN.
The T+ and T− sets can be re-organized as posi-
tive T+argi and negative T−argi examples for each argu-
ment i. In this way, an individual ONE-vs-ALL clas-
sifier for each argument i can be trained. We adopted
this solution as it is simple and effective (Pradhan et
al., 2004). In the classification phase, given a sen-
tence of the test-set, all its Fp,a are generated and
classified by each individual classifier Ci. As a final
decision, we select the argument associated with the
maximum value among the scores provided by the
individual classifiers.
2.2 Standard feature space
The discovery of relevant features is, as usual, a
complex task, nevertheless, there is a common con-
sensus on the basic features that should be adopted.
These standard features, firstly proposed in (Gildea
and Jurasfky, 2002), refer to a flat information de-
rived from parse trees, i.e. Phrase Type, Predicate
Word, Head Word, Governing Category, Position
and Voice. For example, the Phrase Type indicates
the syntactic type of the phrase labeled as a predi-
cate argument, e.g. NP for Arg1 in Figure 1. The
Parse Tree Path contains the path in the parse tree
between the predicate and the argument phrase, ex-
pressed as a sequence of non-terminal labels linked
by direction (up or down) symbols, e.g. V ↑ VP ↓
NP for Arg1 in Figure 1. The Predicate Word is the
surface form of the verbal predicate, e.g. rent for all
arguments.
In the next section we describe the SVM approach
and the basic kernel theory for the predicate argu-
ment classification.
2.3 Learning with Support Vector Machines
Given a vector space in Rfracturn and a set of positive and
negative points, SVMs classify vectors according to
a separating hyperplane, H(vectorx) = vectorw × vectorx + b = 0,
where vectorw ∈ Rfracturn and b ∈ Rfractur are learned by applying
the Structural Risk Minimization principle (Vapnik,
1995).
To apply the SVM algorithm to Predicate Argu-
ment Classification, we need a function φ : F →Rfracturn
to map our features space F = {f1,..,f|F|} and our
predicate/argument pair representation, Fp,a = Fz,
into Rfracturn, such that:
Fz → φ(Fz) = (φ1(Fz),..,φn(Fz))
From the kernel theory we have that:
H(vectorx) =
parenleftBig summationdisplay
i=1..l
αivectorxi
parenrightBig
·vectorx+b =
summationdisplay
i=1..l
αivectorxi ·vectorx+b = summationdisplay
i=1..l
αiφ(Fi)·φ(Fz)+b.
where, Fi ∀i ∈ {1,..,l} are the training instances
and the product KT(Fi,Fz) =<φ(Fi) · φ(Fz)> is
the kernel function associated with the mapping φ.
The simplest mapping that we can apply is
φ(Fz) = vectorz = (z1,...,zn) where zi = 1 if fi ∈ Fz
and zi = 0 otherwise, i.e. the characteristic vector
of the set Fz with respect to F. If we choose the
scalar product as a kernel function we obtain the lin-
ear kernel KL(Fx,Fz) = vectorx·vectorz.
Another function that has shown high ac-
curacy for the predicate argument classification
(Pradhan et al., 2004) is the polynomial kernel:
12
S
NP VP
VP VPCC
VB NP
tookDTNN
thebook
and VB NP
readPRP$NN
itstitle
PRP
John
S
NP VP
VP
VB NP
took
S
NP VP
VP
VB NP
read
Arg. 0 Arg. 0
Arg. 1 Arg. 1
Sentence Parse-Tree Ftook Fread
Figure 2: Subcategorization frame structure for two predicate
argument structures.
  
  
  
  
  
S 
NP VP 
VP 
VB NP 
took S NP VP 
VP 
VB NP 
took 
VP 
VB NP 
S 
NP VP 
VP 
VB NP S 
NP VP 
VP 
VB 
took 
VP 
VP 
VB NP 
took 
VP 
VP 
VB NP 
VP 
VP 
Figure 3: All 10 valid fragments of the SCFS associated with
the arguments of Ftook of Figure 2.
KPoly(Fx,Fz) = (c+vectorx·vectorz)d, where c is a constant
and d is the degree of the polynom.
The interesting property is that we do not need to
evaluate the φ function to compute the above vector;
only the K(vectorx,vectorz) values are required. This allows
us to define efficient classifiers in a huge (possible
infinite) feature set, provided that the kernel is pro-
cessed in an efficient way. In the next section, we
introduce the convolution kernel that we used to rep-
resent subcategorization structures.
3 Subcategorization Frame Kernel (SK)
The convolution kernel that we have experimented
was devised in (Moschitti, 2004) and is character-
ized by two aspects: the semantic space of the sub-
categorization structures and the kernel function that
measure their similarities.
3.1 Subcategorization Frame Structure (SCFS)
We consider the predicate argument structures an-
notated in PropBank or FrameNet as our semantic
space. As we assume that semantic structures are
correlated to syntactic structures, we used a ker-
nel that selects semantic information according to
the syntactic structure of a predicate. The subparse
tree which describes the subcategorization frame of
the target verbal predicate defines the target Sub-
categorization Frame Structure (SCFS). For exam-
ple, Figure 2 shows the parse tree of the sentence
"John took the book and read its title" to-
gether with two SCFS structures, Ftook and Fread
associated with the two predicates took and read, re-
spectively. Note that SCFS includes also the external
argument (i.e. the subject) although some linguistic
theories do not consider it being part of the SCFs.
Once the semantic representation is defined, we
need to design a tree kernel function to estimate the
similarity between our objects.
3.2 The tree kernel function
The main idea of tree kernels is to model a
K(T1,T2) function which computes the number of
the common substructures between two trees T1 and
T2. For example, Figure 3 shows all the fragments
of the argument structure Ftook (see Figure 2) which
will be matched against the fragment of another
SCFS.
Given the set of fragments {f1,f2,..} = F ex-
tracted from all SCFSs of the training set, we define
the indicator function Ii(n) which is equal 1 if the
target fi is rooted at node n and 0 otherwise. It fol-
lows that:
K(T1,T2) = summationdisplay
n1∈NT1
summationdisplay
n2∈NT2
∆(n1,n2) (1)
where NT1 and NT2 are the sets of the T1’s
and T2’s nodes, respectively and ∆(n1,n2) =summationtext
|F|
i=1 Ii(n1)Ii(n2). This latter is equal to the num-
ber of common fragments rooted in the n1 and n2
nodes. We can compute ∆ as follows:
1. if the productions at n1 and n2 are different
then ∆(n1,n2) = 0;
2. if the productions at n1 and n2 are the same,
and n1 and n2 have only leaf children (i.e. they
are pre-terminals symbols) then ∆(n1,n2) =
1;
3. if the productions at n1 and n2 are the same,
and n1 and n2 are not pre-terminals then
∆(n1,n2) =
nc(n1)productdisplay
j=1
(1+∆(cjn1,cjn2)) (2)
where σ ∈ {0,1}, nc(n1) is the number of the chil-
dren of n1 and cjn is the j-th child of the node n.
13
Note that, as the productions are the same nc(n1) =
nc(n2).
The above kernel has the drawback of assigning
higher weights to larger structures1. To overcome
this problem we can scale the relative importance of
the tree fragments using a parameter λ in the con-
ditions 2 and 3 as follows: ∆(nx,nz) = λ and
∆(nx,nz) = λproducttextnc(nx)j=1 (σ +∆(cjn1,cjn2)).
The set of fragments that belongs to SCFs are
derived by human annotators according to seman-
tic considerations, thus they generate a semantic
subcategorization frame kernel (SK). We also
note that SK estimates the similarity between
two SCFSs by counting the number of fragments
that are in common. For example, in Figure 2,
KT(φ(Ftook),φ(Fread)) is quite high (i.e. 6 out 10
substructures) as the two verbs have the same syn-
tactic realization.
In other words the fragments encode semantic in-
formation which is measured by SK. This provides
the argument classifiers with important clues about
the possible set of arguments suited for a target ver-
bal predicate. To support this hypothesis the next
section presents the experiments on the predicate ar-
gument type of FrameNet and ProbBank.
4 The Experiments
A clustering algorithm which uses SK would group
together verbs that show a similar syntactic struc-
ture. To study the properties of such clusters we ex-
perimented SK in combination with the traditional
kernel used for the predicate argument classification.
As the polynomial kernel with degree 3 was shown
to be the most accurate for the argument classifica-
tion (Pradhan et al., 2004; Moschitti, 2004) we use
it to build two kernel combinations:
• Poly + SK: KPoly|KPoly| + γ KT|KT|, i.e. the sum be-
tween the normalized polynomial kernel (see
Section 2.3) and the normalized SK2.
• Poly × SK: KPoly×KT|KPoly|×|KT|, i.e. the normal-
ized product between the polynomial kernel
1With a similar aim and to have a similarity score between 0
and 1, we also apply the normalization in the kernel space, i.e.
Kprime(T1,T2) = K(T1,T2)√K(T
1,T1)×K(T2,T2)
.
2To normalize a kernel K(vectorx,vectorz) we can divide it byradicalbig
K(vectorx,vectorx)×K(vectorz,vectorz).
and SK.
For the experiments we adopted two corpora
PropBank (PB) and FrameNet (FN). PB, avail-
able at www.cis.upenn.edu/∼ace, is used along
with the Penn TreeBank 2 (www.cis.upenn.edu
/∼treebank) (Marcus et al., 1993). This corpus
contains about 53,700 sentences and a fixed split be-
tween training and testing which has been used in
other researches, e.g. (Pradhan et al., 2004; Gildea
and Palmer, ). In this split, Sections from 02 to 21
are used for training, section 23 for testing and sec-
tions 1 and 22 as development set. We considered all
12 arguments from Arg0 to Arg9, ArgA and ArgM for
a total of 123,918 and 7,426 arguments in the train-
ing and test sets, respectively. It is worth noting that
in the experiments we used the gold standard parsing
from the Penn TreeBank, thus our kernel structures
are derived with high precision.
The second corpus was obtained by extract-
ing from FrameNet (www.icsi.berkeley.edu/
∼framenet/) all 24,558 sentences from 40 frames
of the Senseval 3 (http://www.senseval.org) Au-
tomatic Labeling of Semantic Role task. We con-
sidered 18 of the most frequent roles for a total of
37,948 arguments3. Only verbs are selected to be
predicates in our evaluations. Moreover, as there is
no fixed split between training and testing, we ran-
domly selected 30% of the sentences for testing and
30% for validation-set, respectively. Both training
and testing sentences were processed using Collins’
parser (Collins, 1997) to generate parse-tree auto-
matically. This means that our shallow semantic
parser for FrameNet is fully automated.
4.1 The Classification set-up
The evaluations were carried out with the SVM-
light-TK software (Moschitti, 2004) available at
http://ai-nlp.info.uniroma2.it/moschitti/
which encodes the tree kernels in the SVM-light
software (Joachims, 1999).
The classification performance was measured us-
ing the F1 measure4 for the individual arguments
and the accuracy for the final multi-class classifier.
This latter choice allows us to compare the results
3We mapped together roles having the same name
4F1 assigns equal importance to Precision P and Recall R,
i.e. F1 = 2P×RP+R .
14
with previous literature works, e.g. (Gildea and
Jurasfky, 2002; Pradhan et al., 2004; Gildea and
Palmer, ).
For the evaluation of SVMs, we used the default
regularization parameter (e.g., C = 1 for normal-
ized kernels) and we tried a few cost-factor values
(i.e., j ∈ {1,2,3,5,7,10,100}) to adjust the rate
between Precision and Recall. We chose the pa-
rameters by evaluating the SVMs using the KPoly
kernel (degree = 3) over the validation-set. Both λ
(see Section 3.2) and γ parameters were evaluated
in a similar way by maximizing the performance of
SVM using Poly+SK. We found that the best values
were 0.4 and 0.3, respectively.
4.2 Comparative results
To study the impact of the subcategorization frame
kernel we experimented the three models Poly,
Poly + SK and Poly × SK on different training
conditions.
First, we run the above models using all the verbal
predicates available in the training and test sets. Ta-
bles 1 and 2 report the F1 measure and the global
accuracy for PB and FN, respectively. Column 2
shows the accuracy of Poly (90.5%) which is sub-
stantially equal to the accuracy obtained in (Prad-
han et al., 2004) on the same training and test sets
with the same SVM model. Columns 3 and 4
show that the kernel combinations Poly + SK and
Poly × SK remarkably improve Poly accuracy,
i.e. 2.7% (93.2% vs. 90.5%) whereas on FN only
Poly+SK produces a small accuracy increase, i.e.
0.7% (86.2% vs. 85.5%).
This outcome is lower since the FN classification
requires dealing with a higher variability of its se-
mantic roles. For example, in ProbBank most of the
time, the PB Arg0 and Arg1 corresponds to the log-
ical subject and logical direct object, respectively.
On the contrary, the FN Cause and Agent roles are
often both associated with the logical subject and
share similar syntactic realizations, making SCFS
less effective to distinguish between them. More-
over, the training data available for FrameNet is
smaller than that used for PropBank, thus, the tree
kernel may not have enough examples to generalize,
correctly.
Second, we carried out other experiments using
a subset of the total verbs for training and another
Args All Verbs Disjoint Verbs
Poly +SK ×SK Poly +SK ×SK
Arg0 90.8 94.6 94.7 86.8 90.9 91.1
Arg1 91.1 92.9 94.1 81.7 86.8 88.3
Arg2 80.0 77.4 82.0 49.9 49.5 47.6
Arg3 57.9 56.2 56.4 20.3 22.9 20.6
Arg4 70.5 69.6 71.1 0 0 0
ArgM 95.4 96.1 96.3 90.3 93.4 93.7
Acc. 90.5 92.4 93.2 82.1 86.3 86.9
Table 1: Kernel accuracies on PropBank.
Role All Verbs Disjoint Verbs
Poly +SK ×SK Poly +SK ×SK
agent 91.7 94.4 94.0 82.5 84.8 84.7
cause 57.4 60.6 56.4 29.1 28.1 26.9
degree 77.1 77.2 60.9 40.6 44.6 22.6
depict. 85.8 86.2 85.9 73.6 74.0 71.2
instrum. 67.1 69.1 64.6 13.3 13.0 12.8
manner 80.5 79.7 77.7 74.8 74.3 72.3
Acc. 85.5 86.2 85.0 72.8 74.6 74.2
Table 2: Kernel accuracies on 18 FrameNet semantic roles.
disjoint subset for testing. In these conditions, the
impact of SK is amplified: on PB, SK×Poly out-
performs Poly by 4.8% (86.9% vs. 82.1%), whereas,
on FN, SK increases Poly of about 2%, i.e. 74.6%
vs. 72.8%. These results suggest that (a) when test-
set verbs are not observed during training, the clas-
sification task is harder, e.g. 82.1% vs. 90.5% on
PB and (b) the syntactic structures of the verbs, i.e.
the SCFSs, allow the SVMs to better generalize on
unseen verbs.
To verify that the kernel representation is supe-
rior to the traditional representation we carried out
an experiment using a flat feature representation of
the SCFs, i.e. we used the syntactic frame feature
described (Xue and Palmer, 2004) in place of SK.
The result as well as other literature findings, e.g.
(Pradhan et al., 2004) show an improvement on PB
of about 0.7% only. Evidently flat features cannot
derive the same information of a convolution kernel.
Finally, to study how the verb complexity impacts
on the usefulness of SK, we carried out additional
experiments with different verb sets. One dimension
of complexity is the frequency of the verbs in the
target corpus. Infrequent verbs are associated with
predicate argument structures poorly represented in
the training set thus they are more difficult to clas-
sify. Another dimension of the verb complexity is
the number of different SCFs that they show in dif-
ferent contexts. Intuitively, the higher is the number
15
(a) PropBank Multi-classifier
0.83
0.85
0.87
0.89
0.91
0.93
0.95
0.97
1-5 6-10 11-15 16-30 31-60 61-100 101-
300
301-
400
401-
600
601-
700
701-
1000
>1000
Verb Frequency
Ac
cu
ra
cy
Poly
Poly+SK
(b) Arg0 Classifier
0.85
0.87
0.89
0.91
0.93
0.95
0.97
1-5 6-10 11-15 16-30 31-60 61-100 101-300 301-400 401-600 601-700 701-
1000
>1000
Verb Frequency
F1
Poly
Poly+SK
(c) PropBank Multi-classifier
0.80
0.82
0.84
0.86
0.88
0.90
0.92
0.94
1 2-3 4-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 >50
# SCF Type
Ac
cu
ra
cy
Poly
Poly+SK
(d) Arg0 Classifier
0.80
0.82
0.84
0.86
0.88
0.90
0.92
0.94
1 2-3 4-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 >50
# SCF Type 
F1
Poly
Poly+SK
(e) FrameNet Multi-classifier
0.60
0.65
0.70
0.75
0.80
0.85
0.90
1 2-3 4-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 >50
# SCF Type
Ac
cu
ra
cy
Poly
Poly+SK
(f) Agent Classifier
0.65
0.70
0.75
0.80
0.85
0.90
0.95
1.00
1 2-3 4-5 6-10 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50 >50
# SCF Type 
F1
Poly
Poly+SK
Figure 4: The impact of SCF on the classification accuracy of the semantic arguments and semantic roles according to the verb
complexity.
of verb’s SCF types the more difficult is the classifi-
cation of its arguments.
Figure 4.a, reports the accuracy along with the
trend line plot of Poly and SK + Poly according
to subsets of different verb frequency. For example,
the label 1-5 refers to the class of verbal predicates
whose frequency ranges from 1 to 5. The associated
accuracy is evaluated on the portions of the training
and test-sets which contain only the verbs in such
class. We note that SK improves Poly for any verb
frequency. Such improvement decreases when the
frequency becomes very high, i.e. when there are
many training instances that can suggest the correct
classification. A similar behavior is shown in Figure
4.b where the F1 measure for Arg0 of PB is reported.
Figures 4.c and 4.d illustrate the accuracy and the
F1 measure for all arguments and Arg0 of PB ac-
cording to the number of SCF types, respectively.
We observe that the Semantic Kernel does not pro-
duce any improvement on the verbs which are syn-
tactically expressed by only one type of SCF. As the
number of SCF types increases (> 1) Poly + SK
outperforms Poly for any verb class, i.e. when the
verb is enough complex SK always produces use-
ful information independently of the number of the
training set instances. On the one hand, a high num-
ber of verb instances reduces the complexity of the
classification task. On the other hand, as the num-
ber of verb type increases the complexity of the task
increases as well.
A similar behavior can be noted on the FN data
(Figure 4.e) even if the not so strict correlation be-
tween syntax and semantics prevents SK to produce
high improvements. Figure 4.f shows the impact of
SK on the Agent role. We note that, the F1 increases
more than the global accuracy (Figure 4.e) as the
Agent most of the time corresponds to Arg0. This is
confirmed by the Table 2 which shows an improve-
ment for the Agent of up to 2% when SK is used
along with the polynomial kernel.
5 Conclusive Remarks
In this article, we used Support Vector Machines
(SVMs) to deeply analyze the role of the subcat-
egorization frame kernel (SK) in the automatic
predicate argument classification of PropBank and
16
FrameNet corpora. To study the SK’s verb clas-
sification properties we have combined it with the
polynomial kernel on standard flat features.
We run the SVMs on diverse levels of task com-
plexity. The results show that: (1) in general SK
remarkably improves the classification accuracy. (2)
When there are no training instances of the test-
set verbs the improvement of SK is almost double.
This suggests that tree kernels automatically derive
features which support also a sort of back-off esti-
mation in case of unseen verbs. (3) In all complexity
conditions the structural features are in general very
robust, maintaining a high improvement over the ba-
sic accuracy. (4) The semantic role classification in
FrameNet is affected with more noisy data as it is
based on the output of a statistical parser. As a con-
sequence the improvement is lower. Anyway, the
systematic superiority of SK suggests that it is less
sensible to parsing errors than other models. This
opens promising direction for a more weakly super-
vised application of the statistical semantic tagging
supported by SK.
In summary, the extensive experimentation has
shown that the SK provides information robust with
respect to the complexity of the task, i.e. verbs with
richer syntactic structures and sparse training data.
An important observation on the use of tree ker-
nels has been pointed out in (Cumby and Roth,
2003). Both computational efficiency and classifi-
cation accuracy can often be superior if we select
the most informative tree fragments and carry out
the learning in the feature space. Nevertheless, the
case studied in this paper is well suited for using ker-
nels as: (1) it is difficult to guess which fragment
from an SCF should be retained and which should
be discarded, (2) it may be the case that all frag-
ments are useful as SCFs are small structures and all
theirs substructures may serve as different back-off
levels and (3) small structures do not heavily penal-
ize efficiency.
Future research may be addressed to (a) the use
of SK kernel to explicitly generate verb clusters and
(b) the use of convolution kernels to study other lin-
guistic phenomena: we can use tree kernels to in-
vestigate which syntactic features are suited for an
unknown phenomenon.
Acknowledgement
We would like to thank the anonymous reviewers for
their competence and commitment showed in the re-
view of this paper.

References
Michael Collins. 1997. Three generative, lexicalized
models for statistical parsing. In Proceedings of the
ACL’97,Somerset, New Jersey.
Chad Cumby and Dan Roth. 2003. Kernel methods for
relational learning. In Proceedings of ICML’03, Wash-
ington, DC, USA.
Charles J. Fillmore. 1982. Frame semantics. In Linguis-
tics in the Morning Calm, pages 111–137.
Daniel Gildea and Julia Hockenmaier. 2003. Identifying
semantic roles using combinatory categorial grammar.
In Proceedings of EMNLP’03, Sapporo, Japan.
Daniel Gildea and Daniel Jurasfky. 2002. Automatic la-
beling of semantic roles. Computational Linguistic,
28(3):496–530.
Daniel Gildea and Martha Palmer. The necessity of pars-
ing for predicate argument recognition. In Proceed-
ings of ACL’02.
T. Joachims. 1999. Making large-scale SVM learning
practical. In B. Sch¨olkopf, C. Burges, and A. Smola,
editors, Advances in Kernel Methods - Support Vector
Learning.
Paul Kingsbury and Martha Palmer. 2002. From Tree-
bank to PropBank. In Proceedings of LREC’02.
Anna Korhonen. 2003. Subcategorization Acquisi-
tion. Ph.D. thesis, Techical Report UCAM-CL-TR-
530. Computer Laboratory, University of Cambridge.
Beth Levin. 1993. English Verb Classes and Alterna-
tions A Preliminary Investigation. Chicago: Univer-
sity of Chicago Press.
M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz.
1993. Building a large annotated corpus of en-
glish: The Penn Treebank. Computational Linguistics,
19:313–330.
Alessandro Moschitti. 2004. A study on convolution
kernels for shallow semantic parsing. In proceedings
ACL’04, Barcelona, Spain.
Sameer Pradhan, Kadri Hacioglu, Valeri Krugler, Wayne
Ward, James H. Martin, and Daniel Jurafsky. 2005.
Support vector learning for semantic argument classi-
fication. to appear in the Machine Learning Journal.
V. Vapnik. 1995. The Nature of Statistical Learning The-
ory. Springer.
Nianwen Xue and Martha Palmer. 2004. Calibrating
features for semantic role labeling. In Dekang Lin
and Dekai Wu, editors, Proceedings of EMNLP’04,
Barcelona, Spain, July.
