A Study on Convolution Kernels for Shallow Semantic Parsing
Alessandro Moschitti
University of Texas at Dallas
Human Language Technology Research Institute
Richardson, TX 75083-0688, USA
alessandro.moschitti@utdallas.edu
Abstract
In this paper we have designed and experi-
mented novel convolution kernels for automatic
classi cation of predicate arguments. Their
main property is the ability to process struc-
tured representations. Support Vector Ma-
chines (SVMs), using a combination of such ker-
nels and the  at feature kernel, classify Prop-
Bank predicate arguments with accuracy higher
than the current argument classi cation state-
of-the-art.
Additionally, experiments on FrameNet data
have shown that SVMs are appealing for the
classi cation of semantic roles even if the pro-
posed kernels do not produce any improvement.
1 Introduction
Several linguistic theories, e.g. (Jackendo ,
1990) claim that semantic information in nat-
ural language texts is connected to syntactic
structures. Hence, to deal with natural lan-
guage semantics, the learning algorithm should
be able to represent and process structured
data. The classical solution adopted for such
tasks is to convert syntax structures into  at
feature representations which are suitable for a
given learning model. The main drawback is
that structures may not be properly represented
by  at features.
In particular, these problems a ect the pro-
cessing of predicate argument structures an-
notated in PropBank (Kingsbury and Palmer,
2002) or FrameNet (Fillmore, 1982). Figure
1 shows an example of a predicate annotation
in PropBank for the sentence: "Paul gives a
lecture in Rome". A predicate may be a verb
or a noun or an adjective and most of the time
Arg 0 is the logical subject, Arg 1 is the logical
object and ArgM may indicate locations, as in
our example.
FrameNet also describes predicate/argument
structures but for this purpose it uses richer
semantic structures called frames. These lat-
ter are schematic representations of situations
involving various participants, properties and
roles in which a word may be typically used.
Frame elements or semantic roles are arguments
of predicates called target words. In FrameNet,
the argument names are local to a particular
frame.
 
Predicate 
Arg. 0 
Arg. M 
S 
N 
NP 
D N 
VP 
V Paul 
in 
gives 
a lecture 
PP 
IN N 
Rome 
Arg. 1 
Figure 1: A predicate argument structure in a
parse-tree representation.
Several machine learning approaches for argu-
ment identi cation and classi cation have been
developed (Gildea and Jurasfky, 2002; Gildea
and Palmer, 2002; Surdeanu et al., 2003; Ha-
cioglu et al., 2003). Their common characteris-
tic is the adoption of feature spaces that model
predicate-argument structures in a  at repre-
sentation. On the contrary, convolution kernels
aim to capture structural information in term
of sub-structures, providing a viable alternative
to  at features.
In this paper, we select portions of syntactic
trees, which include predicate/argument salient
sub-structures, to de ne convolution kernels for
the task of predicate argument classi cation. In
particular, our kernels aim to (a) represent the
relation between predicate and one of its argu-
ments and (b) to capture the overall argument
structure of the target predicate. Additionally,
we de ne novel kernels as combinations of the
above two with the polynomial kernel of stan-
dard  at features.
Experiments on Support Vector Machines us-
ing the above kernels show an improvement
of the state-of-the-art for PropBank argument
classi cation. On the contrary, FrameNet se-
mantic parsing seems to not take advantage of
the structural information provided by our ker-
nels.
The remainder of this paper is organized as
follows: Section 2 de nes the Predicate Argu-
ment Extraction problem and the standard so-
lution to solve it. In Section 3 we present our
kernels whereas in Section 4 we show compar-
ative results among SVMs using standard fea-
tures and the proposed kernels. Finally, Section
5 summarizes the conclusions.
2 Predicate Argument Extraction: a
standard approach
Given a sentence in natural language and the
target predicates, all arguments have to be rec-
ognized. This problem can be divided into two
subtasks: (a) the detection of the argument
boundaries, i.e. all its compounding words and
(b) the classi cation of the argument type, e.g.
Arg0 or ArgM in PropBank or Agent and Goal
in FrameNet.
The standard approach to learn both detec-
tion and classi cation of predicate arguments
is summarized by the following steps:
1. Given a sentence from the training-set gene-
rate a full syntactic parse-tree;
2. let P and A be the set of predicates and
the set of parse-tree nodes (i.e. the potential
arguments), respectively;
3. for each pair <p;a> 2 P  A:
 extract the feature representation set, Fp;a;
 if the subtree rooted in a covers exactly the
words of one argument of p, put Fp;a in T+
(positive examples), otherwise put it in T  
(negative examples).
For example, in Figure 1, for each combina-
tion of the predicate give with the nodes N, S,
VP, V, NP, PP, D or IN the instances F"give";a are
generated. In case the node a exactly covers
Paul, a lecture or in Rome, it will be a positive
instance otherwise it will be a negative one, e.g.
F"give";"IN".
To learn the argument classi ers the T + set
can be re-organized as positive T +argi and neg-
ative T argi examples for each argument i. In
this way, an individual ONE-vs-ALL classi er
for each argument i can be trained. We adopted
this solution as it is simple and e ective (Ha-
cioglu et al., 2003). In the classi cation phase,
given a sentence of the test-set, all its Fp;a
are generated and classi ed by each individ-
ual classi er. As a  nal decision, we select the
argument associated with the maximum value
among the scores provided by the SVMs, i.e.
argmaxi2S Ci, where S is the target set of ar-
guments.
- Phrase Type: This feature indicates the syntactic type
of the phrase labeled as a predicate argument, e.g. NP
for Arg1.
- Parse Tree Path: This feature contains the path in
the parse tree between the predicate and the argument
phrase, expressed as a sequence of nonterminal labels
linked by direction (up or down) symbols, e.g. V " VP
# NP for Arg1.
- Position: Indicates if the constituent, i.e. the potential
argument, appears before or after the predicate in the
sentence, e.g. after for Arg1 and before for Arg0.
- Voice: This feature distinguishes between active or pas-
sive voice for the predicate phrase, e.g. active for every
argument.
- Head Word: This feature contains the headword of the
evaluated phrase. Case and morphological information
are preserved, e.g. lecture for Arg1.
- Governing Category indicates if an NP is dominated by
a sentence phrase or by a verb phrase, e.g. the NP asso-
ciated with Arg1 is dominated by a VP.
- Predicate Word: This feature consists of two compo-
nents: (1) the word itself, e.g. gives for all arguments;
and (2) the lemma which represents the verb normalized
to lower case and in nitive form, e.g. give for all argu-
ments.
Table 1: Standard features extracted from the
parse-tree in Figure 1.
2.1 Standard feature space
The discovery of relevant features is, as usual, a
complex task, nevertheless, there is a common
consensus on the basic features that should be
adopted. These standard features,  rstly pro-
posed in (Gildea and Jurasfky, 2002), refer to
a  at information derived from parse trees, i.e.
Phrase Type, Predicate Word, Head Word, Gov-
erning Category, Position and Voice. Table 1
presents the standard features and exempli es
how they are extracted from the parse tree in
Figure 1.
For example, the Parse Tree Path feature rep-
resents the path in the parse-tree between a
predicate node and one of its argument nodes.
It is expressed as a sequence of nonterminal la-
bels linked by direction symbols (up or down),
e.g. in Figure 1, V"VP#NP is the path between
the predicate to give and the argument 1, a lec-
ture. Two pairs <p1;a1> and <p2;a2> have
two di erent Path features even if the paths dif-
fer only for a node in the parse-tree. This pre-
 
S
N
NP 
D N 
VP 
V Paul 
in 
delivers 
a    talk 
PP 
IN   NP 
jj 
Fdeliver, Arg0 
 formal 
 N 
      style 
Arg. 0 
a) S
N
NP 
D N 
VP 
V Paul 
in 
delivers 
a    talk 
PP 
IN   NP 
jj 
 formal 
 N 
      style 
Fdeliver, Arg1 b) S
N
NP 
D N 
VP 
V Paul 
in 
delivers 
a    talk 
PP 
IN   NP 
jj 
 formal 
 N 
      style Arg. 1 
Fdeliver, ArgM 
 
c) 
Arg. M 
Figure 2: Structured features for Arg0, Arg1 and ArgM.
vents the learning algorithm to generalize well
on unseen data. In order to address this prob-
lem, the next section describes a novel kernel
space for predicate argument classi cation.
2.2 Support Vector Machine approach
Given a vector space in <n and a set of posi-
tive and negative points, SVMs classify vectors
according to a separating hyperplane, H(~x) =
~w  ~x + b = 0, where ~w 2 <n and b 2 < are
learned by applying the Structural Risk Mini-
mization principle (Vapnik, 1995).
To apply the SVM algorithm to Predicate
Argument Classi cation, we need a function
 : F ! <n to map our features space F =
ff1;::;fjFjg and our predicate/argument pair
representation, Fp;a = Fz, into <n, such that:
Fz !  (Fz) = ( 1(Fz);::; n(Fz))
From the kernel theory we have that:
H(~x) =
 X
i=1::l
 i~xi
 
 ~x+b = X
i=1::l
 i~xi  ~x+b =
= X
i=1::l
 i (Fi)   (Fz) + b:
where, Fi 8i 2 f1;::;lg are the training in-
stances and the product K(Fi;Fz) =< (Fi)  
 (Fz)> is the kernel function associated with
the mapping  . The simplest mapping that we
can apply is  (Fz) = ~z = (z1;:::;zn) where
zi = 1 if fi 2 Fz otherwise zi = 0, i.e.
the characteristic vector of the set Fz with re-
spect to F. If we choose as a kernel function
the scalar product we obtain the linear kernel
KL(Fx;Fz) = ~x  ~z.
Another function which is the current state-
of-the-art of predicate argument classi cation is
the polynomial kernel: Kp(Fx;Fz) = (c+~x ~z)d,
where c is a constant and d is the degree of the
polynom.
3 Convolution Kernels for Semantic
Parsing
We propose two di erent convolution kernels
associated with two di erent predicate argu-
ment sub-structures: the  rst includes the tar-
get predicate with one of its arguments. We will
show that it contains almost all the standard
feature information. The second relates to the
sub-categorization frame of verbs. In this case,
the kernel function aims to cluster together ver-
bal predicates which have the same syntactic
realizations. This provides the classi cation al-
gorithm with important clues about the possible
set of arguments suited for the target syntactic
structure.
3.1 Predicate/Argument Feature
(PAF)
We consider the predicate argument structures
annotated in PropBank or FrameNet as our se-
mantic space. The smallest sub-structure which
includes one predicate with only one of its ar-
guments de nes our structural feature. For
example, Figure 2 illustrates the parse-tree of
the sentence "Paul delivers a talk in formal
style". The circled substructures in (a), (b)
and (c) are our semantic objects associated
with the three arguments of the verb to de-
liver, i.e. <deliver, Arg0>, <deliver, Arg1>
and <deliver, ArgM>. Note that each predi-
cate/argument pair is associated with only one
structure, i.e. Fp;a contain only one of the cir-
cled sub-trees. Other important properties are
the followings:
(1) The overall semantic feature space F con-
tains sub-structures composed of syntactic in-
formation embodied by parse-tree dependencies
and semantic information under the form of
predicate/argument annotation.
(2) This solution is e cient as we have to clas-
sify as many nodes as the number of predicate
arguments.
(3) A constituent cannot be part of two di er-
ent arguments of the target predicate, i.e. there
is no overlapping between the words of two ar-
guments. Thus, two semantic structures Fp1;a1
and Fp2;a21, associated with two di erent ar-
1Fp;a was de ned as the set of features of the object
<p; a>. Since in our representations we have only one
S
NP VP
VP VPCC
VBD NP
flushed DT NN
the pan
and VBD NP
buckled PRP$ NN
his belt
PRP
He
Arg0
(flush and buckle)
Arg1 
(flush) Arg1 (buckle)
Predicate 1 Predicate 2
Fflush
Fbuckle
Figure 3: Sub-Categorization Features for two
predicate argument structures.
guments, cannot be included one in the other.
This property is important because a convolu-
tion kernel would not be e ective to distinguish
between an object and its sub-parts.
3.2 Sub-Categorization Feature (SCF)
The above object space aims to capture all
the information between a predicate and one of
its arguments. Its main drawback is that im-
portant structural information related to inter-
argument dependencies is neglected. In or-
der to solve this problem we de ne the Sub-
Categorization Feature (SCF). This is the sub-
parse tree which includes the sub-categorization
frame of the target verbal predicate. For
example, Figure 3 shows the parse tree of
the sentence "He flushed the pan and buckled
his belt". The solid line describes the SCF
of the predicate  ush, i.e. Fflush whereas the
dashed line tailors the SCF of the predicate
buckle, i.e. Fbuckle. Note that SCFs are features
for predicates, (i.e. they describe predicates)
whereas PAF characterizes predicate/argument
pairs.
Once semantic representations are de ned,
we need to design a kernel function to esti-
mate the similarity between our objects. As
suggested in Section 2 we can map them into
vectors in <n and evaluate implicitly the scalar
product among them.
3.3 Predicate/Argument structure
Kernel (PAK)
Given the semantic objects de ned in the previ-
ous section, we design a convolution kernel in a
way similar to the parse-tree kernel proposed
in (Collins and Du y, 2002). We divide our
mapping  in two steps: (1) from the semantic
structure space F (i.e. PAF or SCF objects)
to the set of all their possible sub-structures
element in Fp;a with an abuse of notation we use it to
indicate the objects themselves.
 
 
 
 
 
 
 
 
 
 
NP 
D N 
a   talk 
NP 
D N 
NP 
D N 
a 
D N 
a   talk 
NP 
D N 
NP 
D N 
VP 
V 
delivers 
a    talk 
V 
delivers 
NP 
D N 
VP 
V 
a    talk 
NP 
D N 
VP 
V 
NP 
D N 
VP 
V 
a 
NP 
D
VP 
V 
   talk 
N 
a 
NP 
D N 
VP 
V 
delivers 
   talk 
NP 
D N 
VP 
V 
delivers 
NP 
D N 
VP 
V 
delivers 
NP 
VP 
V 
NP 
VP 
V 
delivers 
  talk 
Figure 4: All 17 valid fragments of the semantic
structure associated with Arg 1 of Figure 2.
F0 = ff01;::;f0jF0jg and (2) from F0 to <jF0j.
An example of features in F0 is given
in Figure 4 where the whole set of frag-
ments, F0deliver;Arg1, of the argument structure
Fdeliver;Arg1, is shown (see also Figure 2).
It is worth noting that the allowed sub-trees
contain the entire (not partial) production rules.
For instance, the sub-tree [NP [D a]] is excluded
from the set of the Figure 4 since only a part of
the production NP ! D N is used in its gener-
ation. However, this constraint does not apply
to the production VP ! V NP PP along with the
fragment [VP [V NP]] as the subtree [VP [PP [...]]]
is not considered part of the semantic structure.
Thus, in step 1, an argument structure Fp;a is
mapped in a fragment set F 0p;a. In step 2, this
latter is mapped into ~x = (x1;::;xjF0j) 2 <jF0j,
where xi is equal to the number of times that
f0i occurs in F0p;a2.
In order to evaluate K( (Fx); (Fz)) without
evaluating the feature vector ~x and ~z we de-
 ne the indicator function Ii(n) = 1 if the sub-
structure i is rooted at node n and 0 otherwise.
It follows that  i(Fx) = Pn2Nx Ii(n), where Nx
is the set of the Fx’s nodes. Therefore, the ker-
nel can be written as:
K( (Fx); (Fz)) =
jF0jX
i=1
( X
nx2Nx
Ii(nx))( X
nz2Nz
Ii(nz))
= X
nx2Nx
X
nz2Nz
X
i
Ii(nx)Ii(nz)
where Nx and Nz are the nodes in Fx and Fz, re-
spectively. In (Collins and Du y, 2002), it has
been shown that Pi Ii(nx)Ii(nz) =  (nx;nz)
can be computed in O(jNxj  jNzj) by the fol-
lowing recursive relation:
(1) if the productions at nx and nz are di erent
then  (nx;nz) = 0;
2A fragment can appear several times in a parse-tree,
thus each fragment occurrence is considered as a di erent
element in F 0p;a.
(2) if the productions at nx and nz are the
same, and nx and nz are pre-terminals then
 (nx;nz) = 1;
(3) if the productions at nx and nz are the same,
and nx and nz are not pre-terminals then
 (nx;nz) =
nc(nx)Y
j=1
(1 +  (ch(nx;j);ch(nz;j)));
where nc(nx) is the number of the children of nx
and ch(n;i) is the i-th child of the node n. Note
that as the productions are the same ch(nx;i) =
ch(nz;i).
This kind of kernel has the drawback of
assigning more weight to larger structures
while the argument type does not strictly
depend on the size of the argument (Moschitti
and Bejan, 2004). To overcome this prob-
lem we can scale the relative importance of
the tree fragments using a parameter  for
the cases (2) and (3), i.e.  (nx;nz) =  and
 (nx;nz) =  Qnc(nx)j=1 (1 +  (ch(nx;j);ch(nz;j)))
respectively.
It is worth noting that even if the above equa-
tions de ne a kernel function similar to the one
proposed in (Collins and Du y, 2002), the sub-
structures on which it operates are di erent
from the parse-tree kernel. For example, Figure
4 shows that structures such as [VP [V] [NP]], [VP
[V delivers ] [NP]] and [VP [V] [NP [DT] [N]]] are
valid features, but these fragments (and many
others) are not generated by a complete produc-
tion, i.e. VP ! V NP PP. As a consequence they
would not be included in the parse-tree kernel
of the sentence.
3.4 Comparison with Standard
Features
In this section we compare standard features
with the kernel based representation in order
to derive useful indications for their use:
First, PAK estimates a similarity between
two argument structures (i.e., PAF or SCF)
by counting the number of sub-structures that
are in common. As an example, the sim-
ilarity between the two structures in Figure
2, F"delivers";Arg0 and F"delivers";Arg1, is equal
to 1 since they have in common only the [V
delivers] substructure. Such low value de-
pends on the fact that di erent arguments tend
to appear in di erent structures.
On the contrary, if two structures di er only
for a few nodes (especially terminals or near
terminal nodes) the similarity remains quite
high. For example, if we change the tense of
the verb to deliver (Figure 2) in delivered, the
[VP [V delivers] [NP]] subtree will be trans-
formed in [VP [VBD delivered] [NP]], where the
NP is unchanged. Thus, the similarity with
the previous structure will be quite high as:
(1) the NP with all sub-parts will be matched
and (2) the small di erence will not highly af-
fect the kernel norm and consequently the  -
nal score. The above property also holds for
the SCF structures. For example, in Figure
3, KPAK ( (Fflush); (Fbuckle)) is quite high as
the two verbs have the same syntactic realiza-
tion of their arguments. In general,  at features
do not possess this conservative property. For
example, the Parse Tree Path is very sensible
to small changes of parse-trees, e.g. two predi-
cates, expressed in di erent tenses, generate two
di erent Path features.
Second, some information contained in the
standard features is embedded in PAF: Phrase
Type, Predicate Word and Head Word explicitly
appear as structure fragments. For example, in
Figure 4 are shown fragments like [NP [DT] [N]] or
[NP [DT a] [N talk]] which explicitly encode the
Phrase Type feature NP for the Arg 1 in Fig-
ure 2.b. The Predicate Word is represented by
the fragment [V delivers] and the Head Word
is encoded in [N talk]. The same is not true for
SCF since it does not contain information about
a speci c argument. SCF, in fact, aims to char-
acterize the predicate with respect to the overall
argument structures rather than a speci c pair
<p;a>.
Third, Governing Category, Position and
Voice features are not explicitly contained in
both PAF and SCF. Nevertheless, SCF may
allow the learning algorithm to detect the ac-
tive/passive form of verbs.
Finally, from the above observations follows
that the PAF representation may be used with
PAK to classify arguments. On the contrary,
SCF lacks important information, thus, alone it
may be used only to classify verbs in syntactic
categories. This suggests that SCF should be
used in conjunction with standard features to
boost their classi cation performance.
4 The Experiments
The aim of our experiments are twofold: On
the one hand, we study if the PAF represen-
tation produces an accuracy higher than stan-
dard features. On the other hand, we study if
SCF can be used to classify verbs according to
their syntactic realization. Both the above aims
can be carried out by combining PAF and SCF
with the standard features. For this purpose
we adopted two ways to combine kernels3: (1)
K = K1  K2 and (2) K =  K1 + K2. The re-
sulting set of kernels used in the experiments is
the following:
 Kpd is the polynomial kernel with degree d
over the standard features.
 KPAF is obtained by using PAK function over
the PAF structures.
 KPAF+P =  KPAFjKPAFj + KpdjK
pdj
, i.e. the sum be-
tween the normalized4 PAF-based kernel and
the normalized polynomial kernel.
 KPAF  P = KPAF  KpdjKPAFj jK
pdj
, i.e. the normalized
product between the PAF-based kernel and the
polynomial kernel.
 KSCF+P =  KSCFjKSCFj + KpdjK
pdj
, i.e. the summa-
tion between the normalized SCF-based kernel
and the normalized polynomial kernel.
 KSCF  P = KSCF KpdjKSCFj jK
pdj
, i.e. the normal-
ized product between SCF-based kernel and the
polynomial kernel.
4.1 Corpora set-up
The above kernels were experimented over two
corpora: PropBank (www.cis.upenn.edu/ ace)
along with Penn TreeBank5 2 (Marcus et al.,
1993) and FrameNet.
PropBank contains about 53,700 sentences
and a  xed split between training and test-
ing which has been used in other researches
e.g., (Gildea and Palmer, 2002; Surdeanu et al.,
2003; Hacioglu et al., 2003). In this split, Sec-
tions from 02 to 21 are used for training, section
23 for testing and sections 1 and 22 as devel-
oping set. We considered all PropBank argu-
ments6 from Arg0 to Arg9, ArgA and ArgM for
a total of 122,774 and 7,359 arguments in train-
ing and testing respectively. It is worth noting
that in the experiments we used the gold stan-
dard parsing from Penn TreeBank, thus our ker-
nel structures are derived with high precision.
For the FrameNet corpus (www.icsi.berkeley
3It can be proven that the resulting kernels still sat-
isfy Mercer’s conditions (Cristianini and Shawe-Taylor,
2000).
4To normalize a kernel K(~x; ~z) we can divide it byp
K(~x; ~x)  K(~z; ~z).
5We point out that we removed from Penn TreeBank
the function tags like SBJ and TMP as parsers usually
are not able to provide this information.
6We noted that only Arg0 to Arg4 and ArgM con-
tain enough training/testing data to a ect the overall
performance.
.edu/ framenet) we extracted all 24,558 sen-
tences from the 40 frames of Senseval 3 task
(www.senseval.org) for the Automatic Labeling
of Semantic Roles. We considered 18 of the
most frequent roles and we mapped together
those having the same name. Only verbs are se-
lected to be predicates in our evaluations. More-
over, as it does not exist a  xed split between
training and testing, we selected randomly 30%
of sentences for testing and 70% for training.
Additionally, 30% of training was used as a
validation-set. The sentences were processed us-
ing Collins’ parser (Collins, 1997) to generate
parse-trees automatically.
4.2 Classi cation set-up
The classi er evaluations were carried out using
the SVM-light software (Joachims, 1999) avail-
able at svmlight.joachims.org with the default
polynomial kernel for standard feature evalu-
ations. To process PAF and SCF, we imple-
mented our own kernels and we used them in-
side SVM-light.
The classi cation performances were evalu-
ated using the f1 measure7 for single arguments
and the accuracy for the  nal multi-class clas-
si er. This latter choice allows us to compare
the results with previous literature works, e.g.
(Gildea and Jurasfky, 2002; Surdeanu et al.,
2003; Hacioglu et al., 2003).
For the evaluation of SVMs, we used the de-
fault regularization parameter (e.g., C = 1 for
normalized kernels) and we tried a few cost-
factor values (i.e., j 2 f0:1;1;2;3;4;5g) to ad-
just the rate between Precision and Recall. We
chose parameters by evaluating SVM using Kp3
kernel over the validation-set. Both  (see Sec-
tion 3.3) and  parameters were evaluated in a
similar way by maximizing the performance of
SVM using KPAF and  KSCFjKSCFj + KpdjK
pdj
respec-
tively. These parameters were adopted also for
all the other kernels.
4.3 Kernel evaluations
To study the impact of our structural kernels we
 rstly derived the maximal accuracy reachable
with standard features along with polynomial
kernels. The multi-class accuracies, for Prop-
Bank and FrameNet using Kpd with d = 1;::;5,
are shown in Figure 5. We note that (a) the
highest performance is reached for d = 3, (b)
for PropBank our maximal accuracy (90.5%)
7f1 assigns equal importance to Precision P and Re-
call R, i.e. f1 = 2P  RP+R .
is substantially equal to the SVM performance
(88%) obtained in (Hacioglu et al., 2003) with
degree 2 and (c) the accuracy on FrameNet
(85.2%) is higher than the best result obtained
in literature, i.e. 82.0% in (Gildea and Palmer,
2002). This di erent outcome is due to a di er-
ent task (we classify di erent roles) and a di er-
ent classi cation algorithm. Moreover, we did
not use the Frame information which is very im-
portant8.
0.82
0.83
0.84
0.85
0.86
0.87
0.88
0.89
0.9
0.91
1 2 3 4 5d
A
c
c
u
r
a
c
y FrameNet
PropBank
Figure 5: Multi-classi er accuracy according to dif-
ferent degrees of the polynomial kernel.
It is worth noting that the di erence between
linear and polynomial kernel is about 3-4 per-
cent points for both PropBank and FrameNet.
This remarkable di erence can be easily ex-
plained by considering the meaning of standard
features. For example, let us restrict the classi-
 cation function CArg0 to the two features Voice
and Position. Without loss of generality we can
assume: (a) Voice=1 if active and 0 if passive,
and (b) Position=1 when the argument is af-
ter the predicate and 0 otherwise. To simplify
the example, we also assume that if an argu-
ment precedes the target predicate it is a sub-
ject, otherwise it is an object9. It follows that
a constituent is Arg0, i.e. CArg0 = 1, if only
one feature at a time is 1, otherwise it is not
an Arg0, i.e. CArg0 = 0. In other words, CArg0
= Position XOR Voice, which is the classical ex-
ample of a non-linear separable function that
becomes separable in a superlinear space (Cris-
tianini and Shawe-Taylor, 2000).
After it was established that the best ker-
nel for standard features is Kp3, we carried out
all the other experiments using it in the kernel
combinations. Table 2 and 3 show the single
class (f1 measure) as well as multi-class classi-
 er (accuracy) performance for PropBank and
FrameNet respectively. Each column of the two
tables refers to a di erent kernel de ned in the
8Preliminary experiments indicate that SVMs can
reach 90% by using the frame feature.
9Indeed, this is true in most part of the cases.
previous section. The overall meaning is dis-
cussed in the following points:
First, PAF alone has good performance, since
in PropBank evaluation it outperforms the lin-
ear kernel (Kp1), 88.7% vs. 86.7% whereas in
FrameNet, it shows a similar performance 79.5%
vs. 82.1% (compare tables with Figure 5). This
suggests that PAF generates the same informa-
tion as the standard features in a linear space.
However, when a degree greater than 1 is used
for standard features, PAF is outperformed10.
Args P3 PAF PAF+P PAF P SCF+P SCF P
Arg0 90.8 88.3 90.6 90.5 94.6 94.7
Arg1 91.1 87.4 89.9 91.2 92.9 94.1
Arg2 80.0 68.5 77.5 74.7 77.4 82.0
Arg3 57.9 56.5 55.6 49.7 56.2 56.4
Arg4 70.5 68.7 71.2 62.7 69.6 71.1
ArgM 95.4 94.1 96.2 96.2 96.1 96.3
Acc. 90.5 88.7 90.2 90.4 92.4 93.2
Table 2: Evaluation of Kernels on PropBank.
Roles P3 PAF PAF+P PAF P SCF+P SCF P
agent 92.0 88.5 91.7 91.3 93.1 93.9
cause 59.7 16.1 41.6 27.7 42.6 57.3
degree 74.9 68.6 71.4 57.8 68.5 60.9
depict. 52.6 29.7 51.0 28.6 46.8 37.6
durat. 45.8 52.1 40.9 29.0 31.8 41.8
goal 85.9 78.6 85.3 82.8 84.0 85.3
instr. 67.9 46.8 62.8 55.8 59.6 64.1
mann. 81.0 81.9 81.2 78.6 77.8 77.8
Acc. 85.2 79.5 84.6 81.6 83.8 84.2
18 roles
Table 3: Evaluation of Kernels on FrameNet se-
mantic roles.
Second, SCF improves the polynomial kernel
(d = 3), i.e. the current state-of-the-art, of
about 3 percent points on PropBank (column
SCF P). This suggests that (a) PAK can mea-
sure the similarity between two SCF structures
and (b) the sub-categorization information pro-
vides e ective clues about the expected argu-
ment type. The interesting consequence is that
SCF together with PAK seems suitable to au-
tomatically cluster di erent verbs that have the
same syntactic realization. We note also that to
fully exploit the SCF information it is necessary
to use a kernel product (K1  K2) combination
rather than the sum (K1 + K2), e.g. column
SCF+P.
Finally, the FrameNet results are completely
di erent. No kernel combinations with both
PAF and SCF produce an improvement. On
10Unfortunately the use of a polynomial kernel on top
the tree fragments to generate the XOR functions seems
not successful.
the contrary, the performance decreases, sug-
gesting that the classi er is confused by this
syntactic information. The main reason for the
di erent outcomes is that PropBank arguments
are di erent from semantic roles as they are
an intermediate level between syntax and se-
mantic, i.e. they are nearer to grammatical
functions. In fact, in PropBank arguments are
annotated consistently with syntactic alterna-
tions (see the Annotation guidelines for Prop-
Bank at www.cis.upenn.edu/ ace). On the con-
trary FrameNet roles represent the  nal seman-
tic product and they are assigned according to
semantic considerations rather than syntactic
aspects. For example, Cause and Agent seman-
tic roles have identical syntactic realizations.
This prevents SCF to distinguish between them.
Another minor reason may be the use of auto-
matic parse-trees to extract PAF and SCF, even
if preliminary experiments on automatic seman-
tic shallow parsing of PropBank have shown no
important di erences versus semantic parsing
which adopts Gold Standard parse-trees.
5 Conclusions
In this paper, we have experimented with
SVMs using the two novel convolution kernels
PAF and SCF which are designed for the se-
mantic structures derived from PropBank and
FrameNet corpora. Moreover, we have com-
bined them with the polynomial kernel of stan-
dard features. The results have shown that:
First, SVMs using the above kernels are ap-
pealing for semantically parsing both corpora.
Second, PAF and SCF can be used to improve
automatic classi cation of PropBank arguments
as they provide clues about the predicate argu-
ment structure of the target verb. For example,
SCF improves (a) the classi cation state-of-the-
art (i.e. the polynomial kernel) of about 3 per-
cent points and (b) the best literature result of
about 5 percent points.
Third, additional work is needed to design
kernels suitable to learn the deep semantic con-
tained in FrameNet as it seems not sensible to
both PAF and SCF information.
Finally, an analysis of SVMs using poly-
nomial kernels over standard features has ex-
plained why they largely outperform linear clas-
si ers based-on standard features.
In the future we plan to design other struc-
tures and combine them with SCF, PAF and
standard features. In this vision the learning
will be carried out on a set of structural features
instead of a set of  at features. Other studies
may relate to the use of SCF to generate verb
clusters.
Acknowledgments
This research has been sponsored by the ARDA
AQUAINT program. In addition, I would like to
thank Professor Sanda Harabagiu for her advice,
Adrian Cosmin Bejan for implementing the feature
extractor and Paul Mor arescu for processing the
FrameNet data. Many thanks to the anonymous re-
viewers for their invaluable suggestions.
References
Michael Collins and Nigel Du y. 2002. New ranking
algorithms for parsing and tagging: Kernels over
discrete structures, and the voted perceptron. In
proceeding of ACL-02.
Michael Collins. 1997. Three generative, lexicalized
models for statistical parsing. In proceedings of
the ACL-97, pages 16{23, Somerset, New Jersey.
Nello Cristianini and John Shawe-Taylor. 2000. An
introduction to Support Vector Machines. Cam-
bridge University Press.
Charles J. Fillmore. 1982. Frame semantics. In Lin-
guistics in the Morning Calm, pages 111{137.
Daniel Gildea and Daniel Jurasfky. 2002. Auto-
matic labeling of semantic roles. Computational
Linguistic.
Daniel Gildea and Martha Palmer. 2002. The neces-
sity of parsing for predicate argument recognition.
In proceedings of ACL-02, Philadelphia, PA.
R. Jackendo . 1990. Semantic Structures, Current
Studies in Linguistics series. Cambridge, Mas-
sachusetts: The MIT Press.
T. Joachims. 1999. Making large-scale SVM learn-
ing practical. In Advances in Kernel Methods -
Support Vector Learning.
Paul Kingsbury and Martha Palmer. 2002. From
treebank to propbank. In proceedings of LREC-
02, Las Palmas, Spain.
M. P. Marcus, B. Santorini, and M. A.
Marcinkiewicz. 1993. Building a large anno-
tated corpus of english: The penn treebank.
Computational Linguistics.
Alessandro Moschitti and Cosmin Adrian Bejan.
2004. A semantic kernel for predicate argu-
ment classi cation. In proceedings of CoNLL-04,
Boston, USA.
Kadri Hacioglu, Sameer Pradhan, Wayne Ward,
James H. Martin, and Daniel Jurafsky. 2003.
Shallow Semantic Parsing Using Support Vector
Machines. TR-CSLR-2003-03, University of Col-
orado.
Mihai Surdeanu, Sanda M. Harabagiu, John
Williams, and John Aarseth. 2003. Using
predicate-argument structures for information ex-
traction. In proceedings of ACL-03, Sapporo,
Japan.
V. Vapnik. 1995. The Nature of Statistical Learning
Theory. Springer-Verlag New York, Inc.
