Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL-X),
pages 61–68, New York City, June 2006. c©2006 Association for Computational Linguistics
Semantic Role Labeling via Tree Kernel Joint Inference
Alessandro Moschitti, Daniele Pighin and Roberto Basili
Department of Computer Science
University of Rome ”Tor Vergata”
00133 Rome, Italy
{moschitti,basili}@info.uniroma2.it
daniele.pighin@gmail.com
Abstract
Recent work on Semantic Role Labeling
(SRL) has shown that to achieve high
accuracy a joint inference on the whole
predicate argument structure should be ap-
plied. In this paper, we used syntactic sub-
trees that span potential argument struc-
tures of the target predicate in tree ker-
nel functions. This allows Support Vec-
tor Machines to discern between correct
and incorrect predicate structures and to
re-rank them based on the joint probabil-
ity of their arguments. Experiments on the
PropBank data show that both classifica-
tion and re-ranking based on tree kernels
can improve SRL systems.
1 Introduction
Recent work on Semantic Role Labeling (SRL)
(Carreras and M`arquez, 2005) has shown that to
achieve high labeling accuracy a joint inference on
the whole predicate argument structure should be
applied. For this purpose, we need to extract fea-
tures from the sentence’s syntactic parse tree that
encodes the target semantic structure. This task is
rather complex since we do not exactly know which
are the syntactic clues that capture the relation be-
tween the predicate and its arguments. For exam-
ple, to detect the interesting context, the modeling
of syntax/semantics-based features should take into
account linguistic aspects like ancestor nodes or se-
mantic dependencies (Toutanova et al., 2004).
A viable approach to generate a large number of
features has been proposed in (Collins and Duffy,
2002), where convolution kernels were used to im-
plicitly define a tree substructure space. The selec-
tion of the relevant structural features was left to the
Voted Perceptron learning algorithm. Such success-
ful experimentation shows that tree kernels are very
promising for automatic feature engineering, espe-
cially when the available knowledge about the phe-
nomenon is limited.
In a similar way, we can model SRL systems with
tree kernels to generate large feature spaces. More
in detail, most SRL systems split the labeling pro-
cess into two different steps: Boundary Detection
(i.e. to determine the text boundaries of predicate
arguments) and Role Classification (i.e. labeling
such arguments with a semantic role, e.g. Arg0 or
Arg1 as defined in (Kingsbury and Palmer, 2002)).
The former relates to the detection of syntactic parse
tree nodes associated with constituents that corre-
spond to arguments, whereas the latter considers the
boundary nodes for the assignment of the suitable
label. Both steps require the design and extraction
of features from parse trees. As capturing the tightly
interdependent relations among a predicate and its
arguments is a complex task, we can apply tree ker-
nels on the subtrees that span the whole predicate
argument structure to generate the feature space of
all the possible subtrees.
In this paper, we apply the traditional bound-
ary (TBC) and role (TRC) classifiers (Pradhan
et al., 2005a), which are based on binary predi-
cate/argument relations, to label all parse tree nodes
corresponding to potential arguments. Then, we ex-
61
tract the subtrees which span the predicate-argument
dependencies of such arguments, i.e. Argument
Spanning Trees (ASTs). These are used in a tree
kernel function to generate all possible substructures
that encode n-ary argument relations, i.e. we carry
out an automatic feature engineering process.
To validate our approach, we experimented with
our model and Support Vector Machines for the clas-
sification of valid and invalid ASTs. The results
show that this classification problem can be learned
with high accuracy. Moreover, we modeled SRL as a
re-ranking task in line with (Toutanova et al., 2005).
The large number of complex features provided by
tree kernels for structured learning allows SVMs to
reach the state-of-the-art accuracy.
The paper is organized as follows: Section 2 intro-
duces the Semantic Role Labeling based on SVMs
and the tree kernel spaces; Section 3 formally de-
fines the ASTs and the algorithm for their classifi-
cation and re-ranking; Section 4 shows the compara-
tive results between our approach and the traditional
one; Section 5 presents the related work; and finally,
Section 6 summarizes the conclusions.
2 Semantic Role Labeling
In the last years, several machine learning ap-
proaches have been developed for automatic role
labeling, e.g. (Gildea and Jurasfky, 2002; Prad-
han et al., 2005a). Their common characteristic is
the adoption of attribute-value representations for
predicate-argument structures. Accordingly, our ba-
sic system is similar to the one proposed in (Pradhan
et al., 2005a) and it is hereby described.
We use a boundary detection classifier (for any
role type) to derive the words compounding an ar-
gument and a multiclassifier to assign the roles (e.g.
Arg0 or ArgM) described in PropBank (Kingsbury
and Palmer, 2002)). To prepare the training data for
both classifiers, we used the following algorithm:
1. Given a sentence from the training-set, generate
a full syntactic parse tree;
2. Let P and A be respectively the set of predicates
and the set of parse-tree nodes (i.e. the potential ar-
guments);
3. For each pair 〈p,a〉∈P ×A:
- extract the feature representation set, Fp,a;
- if the subtree rooted in a covers exactly the
words of one argument of p, put Fp,a in the T+
set (positive examples), otherwise put it in the
T− set (negative examples).
The outputs of the above algorithm are the T+ and
T− sets. These sets can be directly used to train a
boundary classifier (e.g. an SVM). Regarding the
argument type classifier, a binary labeler for a role r
(e.g. an SVM) can be trained on the T+r , i.e. its pos-
itive examples and T−r , i.e. its negative examples,
where T+ = T+r ∪ T−r , according to the ONE-vs-
ALL scheme. The binary classifiers are then used
to build a general role multiclassifier by simply se-
lecting the argument associated with the maximum
among the SVM scores.
Regarding the design of features for predicate-
argument pairs, we can use the attribute-values de-
fined in (Gildea and Jurasfky, 2002) or tree struc-
tures (Moschitti, 2004). Although we focus on
the latter approach, a short description of the for-
mer is still relevant as they are used by TBC and
TRC. They include the Phrase Type, Predicate
Word, Head Word, Governing Category, Position
and Voice features. For example, the Phrase Type
indicates the syntactic type of the phrase labeled as
a predicate argument and the Parse Tree Path con-
tains the path in the parse tree between the predicate
and the argument phrase, expressed as a sequence of
nonterminal labels linked by direction (up or down)
symbols, e.g. V↑VP↓NP.
A viable alternative to manual design of syntac-
tic features is the use of tree-kernel functions. These
implicitly define a feature space based on all possi-
ble tree substructures. Given two trees T1 and T2, in-
stead of representing them with the whole fragment
space, we can apply the kernel function to evaluate
the number of common fragments.
Formally, given a tree fragment space F =
{f1,f2,...,f|F|}, the indicator function Ii(n)
is equal to 1 if the target fi is rooted at
node n and equal to 0 otherwise. A tree-
kernel function over t1 and t2 is Kt(t1,t2) =summationtext
n1∈Nt1
summationtext
n2∈Nt2 ∆(n1,n2), where Nt1 and Nt2
are the sets of the t1’s and t2’s nodes, respectively. In
turn ∆(n1,n2) = summationtext|F|i=1 λl(fi)Ii(n1)Ii(n2), where
0 ≤ λ ≤ 1 and l(fi) is the height of the subtree
fi. Thus λl(fi) assigns a lower weight to larger frag-
62
S
NP VP
PRP
John
VP CC VPVB NP
and
VB NP
took
DTNN
thebook read
PRP$NN
its title
Sentence Parse-Tree
S
NP VP
PRP
John
VPVB NP
took
DTNN
thebook
took{ARG0, ARG1}
S
NP VP
PRP
John
VPVB NP
read
PRP$NN
its title
read{ARG0, ARG1}
Figure 1: A sentence parse tree with two argument spanning trees (ASTs)
ments. When λ = 1, ∆ is equal to the number of
common fragments rooted at nodes n1 and n2. As
described in (Collins and Duffy, 2002), ∆ can be
computed in O(|Nt1|×|Nt2|).
3 Tree kernel-based classification of
Predicate Argument Structures
Traditional semantic role labeling systems extract
features from pairs of nodes corresponding to a
predicate and one of its argument, respectively.
Thus, they focus on only binary relations to make
classification decisions. This information is poorer
than the one expressed by the whole predicate ar-
gument structure. As an alternative we can select
the set of potential arguments (potential argument
nodes) of a predicate and extract features from them.
The number of the candidate argument sets is ex-
ponential, thus we should consider only those cor-
responding to the most probable correct argument
structures.
The usual approach (Toutanova et al., 2005) uses
a traditional boundary classifier (TBC) to select the
set of potential argument nodes. Such set can be as-
sociated with a subtree which in turn can be classi-
fied by means of a tree kernel function. This func-
tion intuitively measures to what extent a given can-
didate subtree is compatible with the subtree of a
correct predicate argument structure. We can use it
to define two different learning problems: (a) the
simple classification of correct and incorrect pred-
icate argument structures and (b) given the best m
structures, we can train a re-ranker algorithm able to
exploit argument inter-dependencies.
3.1 The Argument Spanning Trees (ASTs)
We consider predicate argument structures anno-
tated in PropBank along with the corresponding
TreeBank data as our object space. Given the target
predicate node p and a node subset s = {n1,..,nk}
of the parse tree t, we define as the spanning tree
root r the lowest common ancestor of n1,..,nk and
p. The node set spanning tree (NST) ps is the sub-
tree of t rooted in r from which the nodes that are
neither ancestors nor descendants of any ni or p are
removed.
Since predicate arguments are associated with
tree nodes (i.e. they exactly fit into syntactic
constituents), we can define the Argument Span-
ning Tree (AST) of a predicate argument set,
{p,{a1,..,an}}, as the NST over such nodes,
i.e. p{a1,..,an}. An AST corresponds to the min-
imal subtree whose leaves are all and only the
words compounding the arguments and the predi-
cate. For example, Figure 1 shows the parse tree
of the sentence "John took the book and read
its title". took{Arg0,Arg1} and read{Arg0,Arg1}
are two AST structures associated with the two
predicates took and read, respectively. All the other
possible subtrees, i.e. NSTs, are not valid ASTs
for these two predicates. Note that classifying ps in
AST or NST for each node subset s of t is equiva-
lent to solve the boundary detection problem.
The critical points for the AST classification are:
(1) how to design suitable features for the charac-
terization of valid structures. This requires a careful
linguistic investigation about their significant prop-
erties. (2) How to deal with the exponential number
of NSTs.
The first problem can be addressed by means of
tree kernels over the ASTs. Tree kernel spaces are
an alternative to the manual feature design as the
learning machine, (e.g. SVMs) can select the most
relevant features from a high dimensional space. In
other words, we can use a tree kernel function to
estimate the similarity between two ASTs (see Sec-
63
Figure 2: Two-step boundary classification. a) Sentence tree; b) Two candidate ASTs; c) Extended AST-
Ord labeling
tion 2), hence avoiding to define explicit features.
The second problem can be approached in two
ways:
(1) We can increase the recall of TBC to enlarge the
set of candidate arguments. From such set, we can
extract correct and incorrect argument structures. As
the number of such structures will be rather small,
we can apply the AST classifier to detect the cor-
rect ones.
(2) We can consider the classification probability
provided by TBC and TRC (Pradhan et al., 2005a)
and select the m most probable structures. Then, we
can apply a re-ranking approach based on SVMs and
tree kernels.
The re-ranking approach is the most promising
one as suggested in (Toutanova et al., 2005) but it
does not clearly reveal if tree kernels can be used
to learn the difference between correct or incorrect
argument structures. Thus it is interesting to study
both the above approaches.
3.2 NST Classification
As we cannot classify all possible candidate argu-
ment structures, we apply the AST classifier just to
detect the correct structures from a set of overlap-
ping arguments. Given two nodes n1 and n2 of an
NST, they overlap if either n1 is ancestor of n2 or
vice versa. NSTs that contain overlapping nodes
are not valid ASTs but subtrees of NSTs may be
valid ASTs. Assuming this, we define s as the set
of potential argument nodes and we create two node
sets s1 = s−{n1} and s2 = s−{n2}. By classi-
fying the two new NSTs ps1 and ps2 with the AST
classifier, we can select the correct structures. Of
course, this procedure can be generalized to a set of
overlapping nodes greater than 2. However, consid-
ering that the Precision of TBC is generally high,
the number of overlapping nodes is usually small.
Figure 2 shows a working example of the multi-
stage classifier. In Frame (a), TBC labels as po-
tential arguments (circled nodes) three overlapping
nodes related to Arg1. This leads to two possible
non-overlapping solutions (Frame (b)) but only the
first one is correct. In fact, according to the second
one the propositional phrase ”of the book” would be
incorrectly attached to the verbal predicate, i.e. in
contrast with the parse tree. The AST classifier, ap-
plied to the two NSTs, is expected to detect this
inconsistency and provide the correct output.
3.3 Re-ranking NSTs with Tree Kernels
To implement the re-ranking model, we follow the
approach described in (Toutanova et al., 2005).
First, we use SVMs to implement the boundary
TBC and role TRC local classifiers. As SVMs do
not provide probabilistic output, we use the Platt’s
algorithm (Platt, 2000) and its revised version (Lin
et al., 2003) to trasform scores into probabilities.
Second, we combine TBC and TRC probabil-
ities to obtain the m most likely sequences s of
tree nodes annotated with semantic roles. As argu-
ment constituents of the same verb cannot overlap,
we generate sequences that respect such node con-
straint. We adopt the same algorithm described in
(Toutanova et al., 2005). We start from the leaves
and we select the m sequences that respect the con-
straints and at the same time have the highest joint
probability of TBC and TRC.
Third, we extract the following feature represen-
tation:
(a) The ASTs associated with the predicate argu-
ment structures. To make faster the learning process
and to try to only capture the most relevant features,
we also experimented with a compact version of the
64
AST which is pruned at the level of argument nodes.
(b) Attribute value features (standard features) re-
lated to the whole predicate structure. These include
the features for each arguments (Gildea and Juras-
fky, 2002) and global features like the sequence of
argument labels, e.g. 〈Arg0,Arg1,ArgM〉.
Finally, we prepare the training examples for the
re-ranker considering the m best annotations of each
predicate structure. We use the approach adopted
in (Shen et al., 2003), which generates all possible
pairs from the m examples, i.e. parenleftbigm2parenrightbig pairs. Each pair
is assigned to a positive example if the first mem-
ber of the pair has a higher score than the second
member. The score that we use is the F1 measure
of the annotated structure with respect to the gold
standard. More in detail, given training/testing ex-
amples ei = 〈t1i,t2i,v1i,v2i〉, where t1i and t2i are two
ASTs and v1i and v2i are two feature vectors associ-
ated with two candidate predicate structures s1 and
s2, we define the following kernels:
1) Ktr(e1,e2) = Kt(t11,t12)+Kt(t21,t22)
−Kt(t11,t22)−Kt(t21,t12),
where tji is the j-th AST of the pair ei, Kt is the
tree kernel function defined in Section 2 and i,j ∈
{1,2}.
2) Kpr(e1,e2) = Kp(v11,v12)+Kp(v21,v22)
−Kp(v11,v22)−Kp(v21,v12),
where vji is the j-th feature vector of the pair ei and
Kp is the polynomial kernel applied to such vectors.
The final kernel that we use for re-ranking is the
following:
K(e1,e2) = Ktr(e1,e2)|K
tr(e1,e2)|
+ Kpr(e1,e2)|K
pr(e1,e2)|
Regarding tree kernel feature engineering, the
next section show how we can generate more effec-
tive features given an established kernel function.
3.4 Tree kernel feature engineering
Consider the Frame (b) of Figure 2, it shows two
perfectly identical NSTs, consequently, their frag-
ments will also be equal. This prevents the algorithm
to learn something from such examples. To solve the
problem, we can enrich the NSTs by marking their
argument nodes with a progressive number, starting
from the leftmost argument. For example, in the first
NST of Frame (c), we mark as NP-0 and NP-1 the
first and second argument nodes whereas in the sec-
ond NST we trasform the three argument node la-
bels in NP-0, NP-1 and PP-2. We will refer to the
resulting structure as a AST-Ord (ordinal number).
This simple modification allows the tree kernel to
generate different argument structures for the above
NSTs. For example, from the first NST in Fig-
ure 2.c, the fragments [NP-1 [NP][PP]], [NP
[DT][NN]] and [PP [IN][NP]] are gener-
ated. They do not match anymore with the [NP-0
[NP][PP]], [NP-1 [DT][NN]] and [PP-2
[IN][NP]] fragments generated from the second
NST in Figure 2.c.
Additionally, it should be noted that the semantic
information provided by the role type can remark-
ably help the detection of correct or incorrect predi-
cate argument structures. Thus, we can enrich the ar-
gument node label with the role type, e.g. the NP-0
and NP-1 of the correct AST of Figure 2.c become
NP-Arg0 and NP-Arg1 (not shown in the figure).
We refer to this structure as AST-Arg. Of course,
to apply the AST-Arg classifier, we need that TRC
labels the arguments detected by TBC.
4 The experiments
The experiments were carried out within the set-
ting defined in the CoNLL-2005 Shared Task
(Carreras and M`arquez, 2005). In particular,
we adopted the Charniak parse trees available at
www.lsi.upc.edu/∼srlconll/ along with the of-
ficial performance evaluator.
All the experiments were performed with
the SVM-light-TK software available at
http://ai-nlp.info.uniroma2.it/moschitti/
which encodes ST and SST kernels in SVM-light
(Joachims, 1999). For TBC and TRC, we used the
linear kernel with a regularization parameter (option
-c) equal to 1. A cost factor (option -j) of 10 was
adopted for TBC to have a higher Recall, whereas
for TRC, the cost factor was parameterized accord-
ing to the maximal accuracy of each argument class
on the validation set. For the AST-based classifiers
we used a λ equal to 0.4 (see (Moschitti, 2004)).
65
Section 21 Section 23
AST Class. P. R. F1 P. R. F1
− 69.8 77.9 73.7 62.2 77.1 68.9
Ord 73.7 81.2 77.3 63.7 80.6 71.2
Arg 73.6 84.7 78.7 64.2 82.3 72.1
Table 1: AST, AST-Ord, and AST-Arg perfor-
mance on sections 21 and 23.
4.1 Classification of whole predicate argument
structures
In these experiments, we trained TBC on sections
02-08 whereas, to achieve a very accurate role clas-
sifier, we trained TRC on all sections 02-21. To
train the AST, AST-Ord (AST with ordinal num-
bers in the argument nodes), and AST-Arg (AST
with argument type in the argument nodes) clas-
sifiers, we applied the TBC and TRC over sec-
tions 09-20. Then, we considered all the structures
whose automatic annotation showed at least an ar-
gument overlap. From these, we extracted 30,220
valid ASTs and 28,143 non-valid ASTs, for a total
of 183,642 arguments.
First, we evaluate the accuracy of the AST-based
classifiers by extracting 1,975 ASTs and 2,220 non-
ASTs from Section 21 and the 2,159 ASTs and
3,461 non-ASTs from Section 23. The accuracy
derived on Section 21 is an upperbound for our clas-
sifiers since it is obtained using an ideal syntactic
parser (the Charniak’s parser was trained also on
Section 21) and an ideal role classifier.
Table 1 shows Precision, Recall and F1 mea-
sures of the AST-based classifiers over the above
NSTs. Rows 2, 3 and 4 report the performance of
AST, AST-Ord, and AST-Arg classifiers, respec-
tively. We note that: (a) The impact of parsing ac-
curacy is shown by the gap of about 6% points be-
tween sections 21 and 23. (b) The ordinal number-
ing of arguments (Ord) and the role type informa-
tion (Arg) provide tree kernels with more meaning-
ful fragments since they improve the basic model
of about 4%. (c) The deeper semantic information
generated by the Arg labels provides useful clues to
select correct predicate argument structures since it
improves the Ord model on both sections.
Second, we measured the impact of the AST-
based classifiers on the accuracy of both phases of
semantic role labeling. Table 2 reports the results
on sections 21 and 23. For each of them, Precision,
Recall and F1 of different approaches to bound-
ary identification (bnd) and to the complete task,
i.e. boundary and role classification (bnd+class)
are shown. Such approaches are based on differ-
ent strategies to remove the overlaps, i.e. with the
AST, AST-Ord and AST-Arg classifiers and using
the baseline (RND), i.e. a random selection of non-
overlapping structures. The baseline corresponds to
the system based on TBC and TRC1.
We note that: (a) for any model, the boundary de-
tection F1 on Section 21 is about 10 points higher
than the F1 on Section 23 (e.g. 87.0% vs. 77.9%
for RND). As expected the parse tree quality is very
important to detect argument boundaries. (b) On the
real test (Section 23) the classification introduces la-
beling errors which decrease the accuracy of about
5% (77.9 vs 72.9 for RND). (c) The Ord and Arg
approaches constantly improve the baseline F1 of
about 1%. Such poor impact does not surprise as
the overlapping structures are a small percentage of
the test set, thus the overall improvement cannot be
very high.
Third, the comparison with the CoNLL 2005 re-
sults (Carreras and M`arquez, 2005) can only be
carried out with respect to the whole SRL task
(bnd+class in table 2) since boundary detection ver-
sus role classification is generally not provided in
CoNLL 2005. Moreover, our best global result, i.e.
73.9%, was obtained under two severe experimental
factors: a) the use of just 1/3 of the available train-
ing set, and b) the usage of the linear SVM model
for the TBC classifier, which is much faster than the
polynomial SVMs but also less accurate. However,
we note the promising results of the AST meta-
classifier, which can be used with any of the best
figure CoNLL systems.
Finally, the overall results suggest that the tree
kernel model is robust to parse tree errors since pre-
serves the same improvement across trees derived
with different accuracy, i.e. the semi-automatic trees
of Section 21 and the automatic tree of Section 23.
Moreover, it shows a high accuracy for the classi-
fication of correct and incorrect ASTs. This last
property is quite interesting as the best SRL systems
1We needed to remove the overlaps from the baseline out-
come in order to apply the CoNLL evaluator.
66
(Punyakanok et al., 2005; Toutanova et al., 2005;
Pradhan et al., 2005b) were obtained by exploit-
ing the information on the whole predicate argument
structure.
Next section shows our preliminary experiments
on re-ranking using the AST kernel based approach.
4.2 Re-ranking based on Tree Kernels
In these experiments, we used the output of TBC
and TRC2 to provide an SVM tree kernel with a
ranked list of predicate argument structures. More in
detail, we applied a Viterbi-like algorithm to gener-
ate the 20 most likely annotations for each predicate
structure, according to the joint probabilistic model
of TBC and TRC. We sorted such structures based
on their F1 measure and used them to learn the SVM
re-ranker described in 3.3.
For training, we used Sections 12, 14, 15, 16
and 24, which contain 24,729 predicate structures.
For each of them, we considered the 5 annotations
having the highest F1 score (i.e. 123,674 NSTs)
on the span of the 20 best annotations provided by
Viterbi algorithm. With such structures, we ob-
tained 294,296 pairs used to train the SVM-based
re-ranker. As the number of such structures is very
large the SVM training time was very high. Thus,
we sped up the learning process by using only the
ASTs associated with the core arguments. From the
test sentences (which contain 5,267 structures), we
extracted the 20 best Viterbi annotated structures,
i.e. 102,343 (for a total of 315.531 pairs), which
were used for the following experiments:
First, we selected the best annotation (according
to the F1 provided by the gold standard annotations)
out of the 20 provided by the Viterbi’s algorithm.
The resulting F1 of 88.59% is the upperbound of our
approach.
Second, we selected the top ranked annotation in-
dicated by the Viterbi’s algorithm. This provides our
baseline F1 measure, i.e. 75.91%. Such outcome is
slightly higher than our official CoNLL result (Mos-
chitti et al., 2005) obtained without converting SVM
scores into probabilities.
Third, we applied the SVM re-ranker to select
2With the aim of improving the state-of-the-art, we applied
the polynomial kernel for all basic classifiers, at this time.
We used the models developed during our participation to the
CoNLL 2005 shared task (Moschitti et al., 2005).
the best structures according to the core roles. We
achieved 80.68% which is practically equal to the
result obtained in (Punyakanok et al., 2005; Car-
reras and M`arquez, 2005) for core roles, i.e. 81%.
Their overall F1 which includes all the arguments
was 79.44%. This confirms that the classification of
the non-core roles is more complex than the other
arguments.
Finally, the high computation time of the re-
ranker prevented us to use the larger structures
which include all arguments. The major complexity
issue was the slow training and classification time
of SVMs. The time needed for tree kernel function
was not so problematic as we could use the fast eval-
uation proposed in (Moschitti, 2006). This roughly
reduces the computation time to the one required by
a polynomial kernel. The real burden is therefore the
learning time of SVMs that is quadratic in the num-
ber of training instances. For example, to carry out
the re-ranking experiments required approximately
one month of a 64 bits machine (2.4 GHz and 4Gb
Ram). To solve this problem, we are going to study
the impact on the accuracy of fast learning algo-
rithms such as the Voted Perceptron.
5 Related Work
Recently, many kernels for natural language applica-
tions have been designed. In what follows, we high-
light their difference and properties.
The tree kernel used in this article was proposed
in (Collins and Duffy, 2002) for syntactic parsing
re-ranking. It was experimented with the Voted
Perceptron and was shown to improve the syntac-
tic parsing. In (Cumby and Roth, 2003), a feature
description language was used to extract structural
features from the syntactic shallow parse trees asso-
ciated with named entities. The experiments on the
named entity categorization showed that when the
description language selects an adequate set of tree
fragments the Voted Perceptron algorithm increases
its classification accuracy. The explanation was that
the complete tree fragment set contains many irrel-
evant features and may cause overfitting. In (Pun-
yakanok et al., 2005), a set of different syntactic
parse trees, e.g. the n best trees generated by the
Charniak’s parser, were used to improve the SRL
accuracy. These different sources of syntactic infor-
mation were used to generate a set of different SRL
67
Section 21 Section 23
bnd bnd+class bnd bnd+class
AST Classifier RND AST Classifier RND AST Classifier RND AST Classifier RND
- Ord Arg - Ord Arg - Ord Arg - Ord Arg
P. 87.5 88.3 88.3 86.9 85.5 86.3 86.4 85.0 78.6 79.0 79.3 77.8 73.1 73.5 73.4 72.3
R. 87.3 88.1 88.3 87.1 85.7 86.5 86.8 85.6 78.1 78.4 78.7 77.9 73.8 74.1 74.4 73.6
F1 87.4 88.2 88.3 87.0 85.6 86.4 86.6 85.3 78.3 78.7 79.0 77.9 73.4 73.8 73.9 72.9
Table 2: Semantic Role Labeling performance on automatic trees using AST-based classifiers.
outputs. A joint inference stage was applied to re-
solve the inconsistency of the different outputs. In
(Toutanova et al., 2005), it was observed that there
are strong dependencies among the labels of the se-
mantic argument nodes of a verb. Thus, to approach
the problem, a re-ranking method of role sequences
labeled by a TRC is applied. In (Pradhan et al.,
2005b), some experiments were conducted on SRL
systems trained using different syntactic views.
6 Conclusions
Recent work on Semantic Role Labeling has shown
that to achieve high labeling accuracy a joint in-
ference on the whole predicate argument structure
should be applied. As feature design for such task is
complex, we can take advantage from kernel meth-
ods to model our intuitive knowledge about the n-
ary predicate argument relations.
In this paper we have shown that we can exploit
the properties of tree kernels to engineer syntactic
features for the semantic role labeling task. The ex-
periments suggest that (1) the information related
to the whole predicate argument structure is impor-
tant as it can improve the state-of-the-art and (2)
tree kernels can be used in a joint model to gen-
erate relevant syntactic/semantic features. The real
drawback is the computational complexity of work-
ing with SVMs, thus the design of fast algorithm is
an interesting future work.
Acknowledgments
This research is partially supported by the
PrestoSpace EU Project#: FP6-507336.

References
Xavier Carreras and Llu´ıs M`arquez. 2005. Introduction to the
CoNLL-2005 shared task: Semantic role labeling. In Pro-
ceedings of CoNLL05.
Michael Collins and Nigel Duffy. 2002. New ranking algo-
rithms for parsing and tagging: Kernels over discrete struc-
tures, and the voted perceptron. In ACL02.
Chad Cumby and Dan Roth. 2003. Kernel methods for re-
lational learning. In Proceedings of ICML03, Washington,
DC, USA.
Daniel Gildea and Daniel Jurasfky. 2002. Automatic label-
ing of semantic roles. Computational Linguistic, 28(3):496–
530.
T. Joachims. 1999. Making large-scale SVM learning practical.
In B. Sch¨olkopf, C. Burges, and A. Smola, editors, Advances
in Kernel Methods - Support Vector Learning.
Paul Kingsbury and Martha Palmer. 2002. From Treebank to
PropBank. In Proceedings of LREC’02), Las Palmas, Spain.
H.T. Lin, C.J. Lin, and R.C. Weng. 2003. A note on platt’s
probabilistic outputs for support vector machines. Technical
report, National Taiwan University.
Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin,
and Roberto Basili. 2005. Hierarchical semantic role label-
ing. In Proceedings of CoNLL05 shared task, Ann Arbor
(MI), USA.
Alessandro Moschitti. 2004. A study on convolution kernels
for shallow semantic parsing. In Proceedings of ACL’04,
Barcelona, Spain.
Alessandro Moschitti. 2006. Making tree kernels practical
for natural language learning. In Proceedings of EACL’06,
Trento, Italy.
J. Platt. 2000. Probabilistic outputs for support vector ma-
chines and comparison to regularized likelihood methods.
MIT Press.
Sameer Pradhan, Kadri Hacioglu, Valeri Krugler, Wayne Ward,
James H. Martin, and Daniel Jurafsky. 2005a. Support vec-
tor learning for semantic argument classification. Machine
Learning Journal.
Sameer Pradhan, Wayne Ward, Kadri Hacioglu, James Martin,
and Daniel Jurafsky. 2005b. Semantic role labeling using
different syntactic views. In Proceedings ACL’05.
V. Punyakanok, D. Roth, and W. Yih. 2005. The necessity of
syntactic parsing for semantic role labeling. In Proceedings
of IJCAI 2005.
Libin Shen, Anoop Sarkar, and Aravind Joshi. 2003. Using
ltag based features in parse reranking. In Conference on
EMNLP03, Sapporo, Japan.
Kristina Toutanova, Penka Markova, and Christopher D. Man-
ning. 2004. The leaf projection path view of parse trees:
Exploring string kernels for hpsg parse selection. In In Pro-
ceedings of EMNLP04.
Kristina Toutanova, Aria Haghighi, and Christopher Manning.
2005. Joint learning improves semantic role labeling. In
Proceedings of ACL05.
