Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in NLP, pages 48–56,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
Engineering of Syntactic Features for Shallow Semantic Parsing
Alessandro Moschitti◇, Bonaventura Coppola†‡, Daniele Pighin◇, Roberto Basili◇
◇ DISP - University of Rome “Tor Vergata”, Rome, Italy
{moschitti, pighin, basili}@info.uniroma2.it
† ITC-Irst, ‡ DIT - University of Trento, Povo-Trento, Italy
coppolab@itc.it
Abstract
Recent natural language learning research
has shown that structural kernels can be
effectively used to induce accurate models
of linguistic phenomena.
In this paper, we show that the above properties hold on a novel
task related to predicate argument classification. A tree kernel
for selecting the subtrees which encode argument structures is
applied. Experiments with Support Vector Machines on large data
sets (i.e. the PropBank collection) show that such a kernel
improves the recognition of argument boundaries.
1 Introduction
The design of features for natural language processing tasks is,
in general, a critical problem. The inherent complexity of
linguistic phenomena, often characterized by structured data,
makes it difficult to find effective linear feature
representations for the target learning models.
In many cases, the traditional feature selection
techniques (Kohavi and Sommerfield, 1995) are not
so useful since the critical problem relates to feature
generation rather than selection. For example, the
design of features for a natural language syntactic
parse-tree re-ranking problem (Collins, 2000) can-
not be carried out without a deep knowledge about
automatic syntactic parsing. The modeling of syn-
tactic/semantic based features should take into ac-
count linguistic aspects to detect the interesting con-
text, e.g. the ancestor nodes or the semantic depen-
dencies (Toutanova et al., 2004).
A viable alternative has been proposed in (Collins and Duffy,
2002), where convolution kernels were used to implicitly define a
tree substructure space. The selection of the relevant structural
features was left to the voted perceptron learning algorithm.
Another interesting model for parse re-ranking based on tree
kernels is presented in (Taskar et al., 2004). The good results
show that tree kernels are very promising for automatic feature
engineering, especially when the available knowledge about the
phenomenon is limited.
Along the same line, automatic learning tasks that rely on
syntactic information may take advantage of a tree kernel
approach. One such task is the automatic boundary detection of
predicate arguments of the kind defined in PropBank (Kingsbury and
Palmer, 2002). For this purpose, given a predicate p in a sentence
s, we can define the notion of predicate argument spanning trees
(PASTs) as those syntactic subtrees of s which exactly cover all
and only the arguments of p (see Section 4.1). The set of
non-spanning trees can then be associated with all the remaining
subtrees of s.
An automatic classifier which recognizes the spanning trees can
potentially be used to detect the predicate argument boundaries.
Unfortunately, the application of such a classifier to all
possible sentence subtrees would require an exponential execution
time. As a consequence, we can use it only to decide among a
reduced set of subtrees associated with a corresponding set of
candidate boundaries. Notice that these can be detected by
previous approaches
(e.g. (Pradhan et al., 2004)) in which a traditional
boundary classifier (tbc) labels the parse-tree nodes
as potential arguments (PA). Such classifiers, gen-
erally, are not sensitive to the overall argument struc-
ture. On the contrary, a PAST classifier (pastc) can
consider the overall argument structure encoded in
the associated subtree. This is induced by the PA
subsets.
The feature design for the PAST representation is not simple.
Tree kernels are a viable alternative that allows the learning
algorithm to measure the similarity between two PASTs in terms of
all possible tree substructures.
In this paper, we design and evaluate a boundary classifier for
predicate argument labeling based on two phases: (1) a first
annotation of potential arguments by using a high-recall tbc and
(2) a PAST classification step aiming to select the correct
substructures associated with potential arguments. Both
classifiers are based on Support Vector Machine learning. The
pastc uses the tree kernel function defined in (Collins and Duffy,
2002). The results show that the PAST classification can be
learned with high accuracy (the f-measure is about 89%) and the
impact on the overall boundary detection accuracy is good.
In the remainder of this paper, Section 2 introduces the Semantic
Role Labeling problem along with the boundary detection subtask.
Section 3 defines the SVMs using the linear kernel and the parse
tree kernel for boundary detection. Section 4 describes our
boundary detection algorithm. Section 5 shows the preliminary
comparative results between the traditional and the two-step
boundary detection. Section 6 discusses related work. Finally,
Section 7 summarizes the conclusions.
2 Automated Semantic Role Labeling
One of the largest resources of manually annotated
predicate argument structures has been developed in
the PropBank (PB) project. The PB corpus contains
300,000 words annotated with predicative informa-
tion on top of the Penn Treebank 2 Wall Street Jour-
nal texts. For any given predicate, the expected ar-
guments are labeled sequentially from Arg0 to Arg9,
ArgA and ArgM. Figure 1 shows an example of
the PB predicate annotation of the sentence: John
rented a room in Boston.
Predicates in PB are embodied only by verbs; most of the time,
Arg0 is the subject, Arg1 is the direct object and ArgM indicates
locations, as in our example.
  
  
 
  
  
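[Figure 1 image: parse tree of "John rented a room in Boston",
(S (N John) (VP (V rented) (NP (D a) (N room)) (PP (IN in) (N Boston)))),
with "rented" marked as Predicate, "John" as Arg. 0, "a room" as
Arg. 1 and "in Boston" as Arg. M.]
Figure 1: A predicate argument structure in a parse-tree
representation.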
Several machine learning approaches for automatic predicate
argument extraction have been developed, e.g. (Gildea and
Jurafsky, 2002; Gildea and Palmer, 2002; Gildea and Hockenmaier,
2003; Pradhan et al., 2004). Their common characteristic is the
adoption of feature spaces that model predicate-argument
structures in a flat feature representation. In the next section,
we present the common parse tree-based approach to this problem.
2.1 Predicate Argument Extraction
Given a sentence in natural language, all the predicates
associated with the verbs have to be identified along with their
arguments. This problem is usually divided into two subtasks: (a)
the detection of the target argument boundaries, i.e. the span of
its words in the sentence, and (b) the classification of the
argument type, e.g. Arg0 or ArgM in PropBank or Agent and Goal in
FrameNet.
The standard approach to learn both the detection
and the classification of predicate arguments is sum-
marized by the following steps:
1. Given a sentence from the training set, generate a full
syntactic parse tree;
2. let P and A be the set of predicates and the set of parse-tree
nodes (i.e. the potential arguments), respectively;
3. for each pair $<p,a> \in P \times A$:
• extract the feature representation set, $F_{p,a}$;
• if the subtree rooted in a covers exactly the words of one
argument of p, put $F_{p,a}$ in $T^+$ (positive examples),
otherwise put it in $T^-$ (negative examples).
For instance, in Figure 1, for each combination of the predicate
rent with the nodes N, S, VP, V, NP, PP, D or IN, the instances
$F_{rent,a}$ are generated. If the node a exactly covers "John",
"a room" or "in Boston", it will be a positive instance; otherwise
it will be a negative one, e.g. $F_{rent,IN}$.
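The example-generation steps above can be sketched as follows. This is only an illustration under our own assumptions: trees are nested tuples `(label, child, ...)` with words as plain strings, arguments are given as word-index spans, and the feature dictionary is a hypothetical placeholder for $F_{p,a}$. The names `node_spans` and `build_examples` are not part of any existing tool.

```python
def node_spans(tree, start=0):
    """Return [(label, (start, end)), ...] for every node, root first."""
    label, *children = tree
    pos, found = start, []
    for child in children:
        if isinstance(child, str):          # a word occupies one position
            pos += 1
        else:
            sub = node_spans(child, pos)
            found.extend(sub)
            pos = sub[0][1][1]              # end of the child's own span
    return [(label, (start, pos))] + found


def build_examples(tree, predicate, gold_spans):
    """Split the nodes into T+ (exactly cover an argument) and T- (the rest)."""
    t_plus, t_minus = [], []
    for label, span in node_spans(tree):
        # Hypothetical stand-in for the real feature set F_{p,a}.
        features = {"predicate": predicate, "phrase_type": label, "span": span}
        (t_plus if span in gold_spans else t_minus).append(features)
    return t_plus, t_minus


# The sentence of Figure 1: "John rented a room in Boston".
tree = ("S", ("N", "John"),
        ("VP", ("V", "rented"),
         ("NP", ("D", "a"), ("N", "room")),
         ("PP", ("IN", "in"), ("N", "Boston"))))
gold = {(0, 1), (2, 4), (4, 6)}             # "John", "a room", "in Boston"
t_plus, t_minus = build_examples(tree, "rent", gold)
```

Here the three positive instances come from the N, NP and PP nodes, and the remaining seven nodes (including IN) yield negative instances.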
The $T^+$ and $T^-$ sets are used to train the boundary
classifier. To train the multi-class classifier, $T^+$ can be
reorganized as positive $T^+_{arg_i}$ and negative $T^-_{arg_i}$
examples for each argument i. In this way, an individual
ONE-vs-ALL classifier for each argument i can be trained. We
adopted this solution, following (Pradhan et al., 2004), since it
is simple and effective. In the classification phase, given an
unseen sentence, all its $F_{p,a}$ are generated and classified by
each individual classifier $C_i$. The argument associated with the
maximum among the scores provided by the individual classifiers is
eventually selected.
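A minimal sketch of the ONE-vs-ALL reorganization and of the final selection step is given below. The function names, the toy feature strings and the classifier scores are hypothetical; the scores stand in for the outputs of the individual classifiers $C_i$.

```python
def one_vs_all_sets(t_plus):
    """Reorganize labeled positive examples into per-argument (T+_i, T-_i)."""
    labels = sorted({label for _, label in t_plus})
    sets = {}
    for target in labels:
        positives = [f for f, label in t_plus if label == target]
        negatives = [f for f, label in t_plus if label != target]
        sets[target] = (positives, negatives)
    return sets


def select_argument(scores):
    """Return the argument type whose classifier gives the maximum score."""
    return max(scores, key=scores.get)


# Toy data: (features, argument label) pairs; feature strings are made up.
t_plus = [("f_John", "Arg0"), ("f_a_room", "Arg1"), ("f_in_Boston", "ArgM")]
sets = one_vs_all_sets(t_plus)

# Hypothetical scores produced by C_Arg0, C_Arg1 and C_ArgM for one node.
scores = {"Arg0": -0.3, "Arg1": 1.2, "ArgM": 0.4}
best = select_argument(scores)
```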
2.2 Standard feature space
The discovery of relevant features is, as usual, a complex task.
However, there is a common consensus on the set of basic features.
These standard features, first proposed in (Gildea and Jurafsky,
2002), refer to unstructured information derived from parse trees,
i.e. Phrase Type, Parse Tree Path, Predicate Word, Head Word,
Governing Category, Position and Voice. For example, the Phrase
Type indicates the syntactic type of the phrase labeled as a
predicate argument, e.g. NP for Arg1 in Figure 1. The Parse Tree
Path contains the path in the parse tree between the predicate and
the argument phrase, expressed as a sequence of nonterminal labels
linked by direction (up or down) symbols, e.g. V↑VP↓NP for Arg1 in
Figure 1. The Predicate Word is the surface form of the verbal
predicate, e.g. rent for all arguments.
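As an illustration of the Parse Tree Path feature, the following sketch computes the path between two nodes of the tree in Figure 1. The representation is our own assumption: trees are nested tuples and a node is addressed by its sequence of child indices from the root; `label_at` and `parse_tree_path` are hypothetical helper names.

```python
def label_at(tree, path):
    """Label of the node reached by following child indices from the root."""
    node = tree
    for i in path:
        node = node[1 + i]                  # element 0 of a node is its label
    return node[0]


def parse_tree_path(tree, pred, arg):
    """Labels from the predicate up to the common ancestor, then down to arg."""
    k = 0                                   # depth of the lowest common ancestor
    while k < min(len(pred), len(arg)) and pred[k] == arg[k]:
        k += 1
    up = [label_at(tree, pred[:d]) for d in range(len(pred), k - 1, -1)]
    down = [label_at(tree, arg[:d]) for d in range(k + 1, len(arg) + 1)]
    return "↑".join(up) + "".join("↓" + label for label in down)


tree = ("S", ("N", "John"),
        ("VP", ("V", "rented"),
         ("NP", ("D", "a"), ("N", "room")),
         ("PP", ("IN", "in"), ("N", "Boston"))))

path_arg1 = parse_tree_path(tree, pred=(1, 0), arg=(1, 1))   # V to NP
path_arg0 = parse_tree_path(tree, pred=(1, 0), arg=(0,))     # V to N (John)
```

For Arg1 this yields V↑VP↓NP, the value quoted above; for Arg0 it yields V↑VP↑S↓N.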
In the next section we describe the SVM approach
and the basic kernel theory for the predicate argu-
ment classification.
3 Learning predicate structures via
Support Vector Machines
Given a vector space in $\Re^n$ and a set of positive and negative
points, SVMs classify vectors according to a separating
hyperplane, $H(\vec{x}) = \vec{w} \cdot \vec{x} + b = 0$, where
$\vec{w} \in \Re^n$ and $b \in \Re$ are learned by applying the
Structural Risk Minimization principle (Vapnik, 1995).
To apply the SVM algorithm to Predicate Argument Classification,
we need a function $\phi : F \rightarrow \Re^n$ to map our feature
space $F = \{f_1,..,f_{|F|}\}$ and our predicate/argument pair
representation, $F_{p,a} = F_z$, into $\Re^n$, such that:

$$F_z \rightarrow \phi(F_z) = (\phi_1(F_z),..,\phi_n(F_z))$$
From the kernel theory we have that:

$$H(\vec{x}) = \Big(\sum_{i=1..l} \alpha_i \vec{x}_i\Big) \cdot \vec{x} + b = \sum_{i=1..l} \alpha_i \vec{x}_i \cdot \vec{x} + b = \sum_{i=1..l} \alpha_i \phi(F_i) \cdot \phi(F_z) + b,$$

where $F_i$, $\forall i \in \{1,..,l\}$, are the training instances
and the product $K(F_i,F_z) = \langle \phi(F_i) \cdot \phi(F_z) \rangle$
is the kernel function associated with the mapping $\phi$.
The simplest mapping that we can apply is
$\phi(F_z) = \vec{z} = (z_1,...,z_n)$ where $z_i = 1$ if
$f_i \in F_z$ and $z_i = 0$ otherwise, i.e. the characteristic
vector of the set $F_z$ with respect to F. If we choose the scalar
product as a kernel function we obtain the linear kernel
$K_L(F_x,F_z) = \vec{x} \cdot \vec{z}$.
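The linear kernel over characteristic vectors can be illustrated with a short sketch. The feature strings below are invented for the example and do not come from the actual system; the point is that the scalar product of two characteristic vectors equals the number of features the two sets share.

```python
def characteristic_vector(F_z, F):
    """phi(F_z): z_i = 1 if f_i is in F_z, 0 otherwise."""
    return [1 if f in F_z else 0 for f in F]


def linear_kernel(x, z):
    """Scalar product of two vectors of equal length."""
    return sum(x_i * z_i for x_i, z_i in zip(x, z))


# Hypothetical feature space and two predicate/argument representations.
F = ["PT=NP", "PT=PP", "PW=rent", "POS=after", "POS=before", "VOICE=active"]
F_x = {"PT=NP", "PW=rent", "POS=after", "VOICE=active"}
F_z = {"PT=PP", "PW=rent", "POS=after", "VOICE=active"}

k = linear_kernel(characteristic_vector(F_x, F), characteristic_vector(F_z, F))
```

In this toy case the two sets share three features, so the kernel value is 3, which is also the size of their intersection.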
An interesting property is that we do not need to evaluate the
$\phi$ function to compute the above classification function.
Only the $K(\vec{x},\vec{z})$ values are in fact required. This
allows us to derive efficient classifiers in a huge (possibly
infinite) feature space, provided that the kernel is processed in
an efficient way. This property is also exploited to design
convolution kernels such as those based on tree structures.
3.1 The tree kernel function
The main idea of the tree kernels is the modeling of
a KT(T1,T2) function which computes the number
of common substructures between two trees T1 and
T2.
[Figure 2 image: the parse tree of "John took the book and read
its title" (labeled "Sentence Parse-Tree") together with the two
subtrees took{ARG0, ARG1} and read{ARG0, ARG1}.]
Figure 2: A sentence parse tree with two predicative tree
structures (PASTs)

Given the set of substructures (fragments) $\{f_1,f_2,..\} = F$
extracted from all the trees of the training set, we define the
indicator function $I_i(n)$, which is equal to 1 if the target
$f_i$ is rooted at node n and 0 otherwise. It follows that:

$$K_T(T_1,T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1,n_2) \qquad (1)$$

where $N_{T_1}$ and $N_{T_2}$ are the sets of nodes of $T_1$ and
$T_2$, respectively, and
$\Delta(n_1,n_2) = \sum_{i=1}^{|F|} I_i(n_1) I_i(n_2)$. The latter
is equal to the number of common fragments rooted at the nodes
$n_1$ and $n_2$. We can compute $\Delta$ as follows:
1. if the productions at $n_1$ and $n_2$ are different then
$\Delta(n_1,n_2) = 0$;
2. if the productions at $n_1$ and $n_2$ are the same, and $n_1$
and $n_2$ have only leaf children (i.e. they are pre-terminal
symbols) then $\Delta(n_1,n_2) = 1$;
3. if the productions at $n_1$ and $n_2$ are the same, and $n_1$
and $n_2$ are not pre-terminals then

$$\Delta(n_1,n_2) = \prod_{j=1}^{nc(n_1)} (1 + \Delta(c^j_{n_1}, c^j_{n_2})) \qquad (2)$$

where $nc(n_1)$ is the number of children of $n_1$ and $c^j_n$ is
the j-th child of the node n. Note that, as the productions are
the same, $nc(n_1) = nc(n_2)$.
The above kernel has the drawback of assigning higher weights to
larger structures¹. In order to overcome this problem we scale the
relative importance of the tree fragments by imposing a parameter
$\lambda$ in conditions 2 and 3 as follows:
$\Delta(n_1,n_2) = \lambda$ and
$\Delta(n_1,n_2) = \lambda \prod_{j=1}^{nc(n_1)} (1 + \Delta(c^j_{n_1}, c^j_{n_2}))$.
¹In order to approach this problem and to map similarity scores in
the [0,1] range, a normalization in the kernel space, i.e.
$K'_T(T_1,T_2) = \frac{K_T(T_1,T_2)}{\sqrt{K_T(T_1,T_1) \times K_T(T_2,T_2)}}$,
is always applied.
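The recursive definition of $\Delta$, the decay factor $\lambda$ and the normalization of footnote 1 can be sketched directly. This is a naive illustration, not the SVM-light-TK implementation: trees are assumed to be nested tuples `(label, child, ...)` with words as strings, every node is assumed to have either only word children or only non-terminal children, and all node pairs are enumerated without any optimization.

```python
import math


def production(n):
    """The grammar production at node n (words are tagged to avoid clashes)."""
    return (n[0], tuple(("word", c) if isinstance(c, str) else c[0]
                        for c in n[1:]))


def is_preterminal(n):
    return all(isinstance(c, str) for c in n[1:])


def nodes(t):
    yield t
    for c in t[1:]:
        if not isinstance(c, str):
            yield from nodes(c)


def delta(n1, n2, lam):
    if production(n1) != production(n2):    # condition 1
        return 0.0
    if is_preterminal(n1):                  # condition 2, scaled by lambda
        return lam
    result = lam                            # condition 3, scaled by lambda
    for c1, c2 in zip(n1[1:], n2[1:]):
        result *= 1.0 + delta(c1, c2, lam)
    return result


def tree_kernel(t1, t2, lam=0.4):
    """Eq. 1: sum of delta over all node pairs."""
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))


def normalized_tree_kernel(t1, t2, lam=0.4):
    """Footnote 1: K'(T1,T2) = K(T1,T2) / sqrt(K(T1,T1) * K(T2,T2))."""
    return tree_kernel(t1, t2, lam) / math.sqrt(
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))


t1 = ("NP", ("DT", "the"), ("NN", "book"))
t2 = ("NP", ("DT", "the"), ("NN", "title"))
k_unscaled = tree_kernel(t1, t2, lam=1.0)
k_norm = normalized_tree_kernel(t1, t2, lam=1.0)
```

With $\lambda = 1$ the two toy trees share three fragments, [DT the], [NP [DT][NN]] and [NP [DT the][NN]], so the unnormalized kernel is 3 and the normalized value is 0.5.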
4 Boundary detection via argument
spanning
Section 2 has shown that traditional argument boundary classifiers
rely only on features extracted from the current potential
argument node. In order to take into account the information of a
complete argument structure, the classifier should select a set of
parse-tree nodes and consider them as potential arguments of the
target predicate. The number of all possible subsets is
exponential in the number of the parse-tree nodes of the sentence;
thus, we need to cut the search space. For this purpose, a
traditional boundary classifier can be applied to select the set
of potential arguments PA. The reduced number of PA subsets can be
associated with sentence subtrees, which in turn can be classified
by using tree kernel functions. These measure whether a subtree is
compatible or not with the subtree of a correct predicate argument
structure.
4.1 The Predicate Argument Spanning Trees
(PASTs)
We consider the predicate argument structures annotated in
PropBank along with the corresponding TreeBank data as our object
space. Given the target predicate p in a sentence parse tree T and
a subset $s = \{n_1,..,n_k\}$ of the nodes of T, $N_T$, we define
as the spanning tree root r the lowest common ancestor of
$n_1,..,n_k$. The node spanning tree (NST), $p_s$, is the subtree
rooted in r, from which the nodes that are neither ancestors nor
descendants of any $n_i$ are removed.
[Figure 3 image: (a) the parse tree of "John read the title of the
book" with the potential arguments Arg. 0 and Arg. 1 selected by
tbc; (b) the two NSTs proposed by the overlap resolution, one
labeled "Correct PAST" and one "Incorrect PAST"; (c) the same two
NSTs with the argument nodes marked as NP-0, NP-1 and PP-2.]
Figure 3: Two-step boundary classifier.

Since predicate arguments are associated with tree nodes, we can
define the predicate argument spanning tree (PAST) of a predicate
argument set, $\{a_1,..,a_n\}$, as the NST over such nodes, i.e.
$p_{\{a_1,..,a_n\}}$. A PAST corresponds to the minimal subparse
tree whose leaves are all and only the word sequence composing
the arguments. For example, Figure 2 shows the parse tree of the
sentence "John took the book and read its title".
took{ARG0,ARG1} and read{ARG0,ARG1} are two PAST structures
associated with the two predicates took and read, respectively.
All the other NSTs are not valid PASTs.
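The NST definition can be made concrete with a small sketch over the sentence of Figure 2. The representation is our own assumption: trees are nested tuples and each node is addressed by its child-index path from the root, so that a node is an ancestor of another exactly when its path is a prefix of the other's. Note also that the example adds the predicate node to the node set, because the subtrees drawn in Figure 2 contain the verb; this choice is an assumption of the sketch.

```python
def lowest_common_ancestor(paths):
    """Longest common prefix of the child-index paths."""
    prefix = paths[0]
    for p in paths[1:]:
        k = 0
        while k < min(len(prefix), len(p)) and prefix[k] == p[k]:
            k += 1
        prefix = prefix[:k]
    return prefix


def node_spanning_tree(tree, s):
    """Subtree rooted at the LCA of s, keeping only ancestors/descendants of s."""
    s = [tuple(p) for p in s]
    r = lowest_common_ancestor(s)

    def related(q):
        # q is an ancestor of (or equal to) some n, or a descendant of it
        return any(q == n[:len(q)] or n == q[:len(n)] for n in s)

    def build(node, q):
        if isinstance(node, str):
            return node
        kept = [build(c, q + (i,)) for i, c in enumerate(node[1:])
                if isinstance(c, str) or related(q + (i,))]
        return (node[0], *kept)

    root = tree
    for i in r:
        root = root[1 + i]
    return build(root, r)


tree = ("S", ("NP", ("PRP", "John")),
        ("VP", ("VP", ("VB", "took"), ("NP", ("DT", "the"), ("NN", "book"))),
         ("CC", "and"),
         ("VP", ("VB", "read"), ("NP", ("PRP$", "its"), ("NN", "title")))))

# Arg0 = NP "John", predicate = VB "took", Arg1 = NP "the book".
took_past = node_spanning_tree(tree, [(0,), (1, 0, 0), (1, 0, 1)])
```

The result drops the conjunction and the second verb phrase, leaving the S, NP and nested VP nodes that dominate "John took the book".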
Notice that labeling $p_s$, $\forall s \subseteq N_T$, with a PAST
classifier (pastc) corresponds to solving the boundary problem.
The critical points for the application of this strategy are: (1)
how to design suitable features for the PAST characterization, a
new problem that requires a careful linguistic investigation of
the significant properties of the argument spanning trees, and
(2) how to deal with the exponential number of NSTs.
For the first problem, the use of tree kernels over the PASTs can
be an alternative to manual feature design, as the learning
machine (e.g. SVMs) can select the most relevant features from a
high-dimensional feature space. In other words, we can use Eq. 1
to estimate the similarity between two PASTs without defining
explicit features. The same idea has been successfully applied to
the parse-tree re-ranking task (Taskar et al., 2004; Collins and
Duffy, 2002) and predicate argument classification (Moschitti,
2004).
For the second problem, i.e. the high computational complexity, we
can cut the search space by using a traditional boundary
classifier (tbc), e.g. (Pradhan et al., 2004), which provides a
small set of potential argument nodes. Let PA be the set of nodes
located by tbc as arguments. We may consider the set P of the NSTs
associated with any subset of PA, i.e.
$P = \{p_s : s \subseteq PA\}$. However, the classification of P
may also be computationally problematic, since theoretically there
are $|P| = 2^{|PA|}$ members.
In order to have a very efficient procedure, we applied pastc only
to the PA sets associated with incorrect PASTs. A way to detect
such incorrect NSTs is to look for a node pair
$<n_1,n_2> \in PA \times PA$ of overlapping nodes, i.e. $n_1$ is
an ancestor of $n_2$ or vice versa. After we have detected such
nodes, we create two node sets $PA_1 = PA - \{n_1\}$ and
$PA_2 = PA - \{n_2\}$ and classify them with the pastc to select
the correct set of argument boundaries. This procedure can be
generalized to a set of overlapping nodes O with more than 2
elements, as reported in Appendix 1.
Note that the algorithm selects a maximal set of non-overlapping
nodes, i.e. the first that is generated. Additionally, the worst
case is rather rare, thus the algorithm is very fast on average.
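One possible way to enumerate the candidate node sets is sketched below. It is not the procedure of Appendix 1, which is not reproduced here: it simply applies the pairwise split $PA - \{n_1\}$, $PA - \{n_2\}$ recursively and then keeps the maximal non-overlapping sets, which is our own generalization. Nodes are again assumed to be child-index paths, so two nodes overlap when one path is a prefix of the other.

```python
def overlaps(a, b):
    """True if one node is an ancestor of the other."""
    k = min(len(a), len(b))
    return a != b and a[:k] == b[:k]


def resolve(pa):
    """Recursively split on the first overlapping pair found."""
    pa = frozenset(pa)
    ordered = sorted(pa)
    pair = next(((a, b) for a in ordered for b in ordered
                 if a < b and overlaps(a, b)), None)
    if pair is None:
        return {pa}
    n1, n2 = pair
    return resolve(pa - {n1}) | resolve(pa - {n2})


def candidate_sets(pa):
    """Keep only the maximal non-overlapping subsets of PA."""
    found = resolve(pa)
    return {c for c in found if not any(c < other for other in found)}


# Figure 3(a): NP "John", NP "the title of the book", and inside it
# NP "the title" and PP "of the book".
pa = {(0,), (1, 1), (1, 1, 0), (1, 1, 1)}
solutions = candidate_sets(pa)
```

On the node set of Figure 3(a) this produces the two solutions of Frame (b): one with the large NP as a single argument, and one with the inner NP and PP as separate arguments. The recursion is exponential in the worst case, which matches the remark above that the worst case matters only rarely.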
Figure 3 shows a working example of the multi-stage classifier. In
Frame (a), tbc labels as potential arguments (gray color) three
overlapping nodes (in Arg. 1). The overlap resolution algorithm
proposes two solutions (Frame (b)), of which only one is correct.
In fact, according to the second solution the prepositional phrase
"of the book" would incorrectly be attached to the verbal
predicate, i.e. in contrast with the parse tree. The pastc,
applied to the two NSTs, should detect this inconsistency and
provide the correct output. Note that, during learning, we
generate the non-overlapping structures in the same way to derive
the positive and negative examples.
4.2 Engineering Tree Fragment Features
In Frame (b) of Figure 3, we show one of the possible cases which
pastc should deal with. The critical problem is that the two NSTs
are perfectly identical; thus, it is not possible to discern
between them using only their parse-tree fragments.
The solution to engineer novel features is simply to add the
boundary information provided by the tbc to the NSTs. We mark
with a progressive number the phrase type corresponding to an
argument node, starting from the leftmost argument. For example,
in the first NST of Frame (c), we mark as NP-0 and NP-1 the first
and second argument nodes, whereas in the second NST we have a
hypothesis of three arguments on the NP, NP and PP nodes. We
transform them into NP-0, NP-1 and PP-2.
This simple modification enables the tree kernel to generate
features useful to distinguish between two identical parse trees
associated with different argument structures. For example, for
the first NST the fragments [NP-1 [NP][PP]], [NP [DT][NN]] and
[PP [IN][NP]] are generated. They no longer match the
[NP-0 [NP][PP]], [NP-1 [DT][NN]] and [PP-2 [IN][NP]] fragments of
the second NST.
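The marking step can be illustrated as follows, under the same assumptions used for illustration throughout (nested-tuple trees, nodes addressed by child-index paths). For non-overlapping nodes, sorting the paths gives the left-to-right order, so the progressive number is simply the rank of the path; `mark_arguments` is a hypothetical helper name.

```python
def mark_arguments(tree, arg_paths):
    """Append a progressive index to the label of each argument node."""
    order = {p: i for i, p in enumerate(sorted(arg_paths))}

    def rebuild(node, q):
        if isinstance(node, str):
            return node
        label = f"{node[0]}-{order[q]}" if q in order else node[0]
        return (label, *(rebuild(c, q + (i,)) for i, c in enumerate(node[1:])))

    return rebuild(tree, ())


# The sentence of Figure 3: "John read the title of the book".
nst = ("S", ("NP", "John"),
       ("VP", ("VB", "read"),
        ("NP", ("NP", ("DT", "the"), ("NN", "title")),
         ("PP", ("IN", "of"), ("NP", ("DT", "the"), ("NN", "book"))))))

first = mark_arguments(nst, {(0,), (1, 1)})                 # two arguments
second = mark_arguments(nst, {(0,), (1, 1, 0), (1, 1, 1)})  # three arguments
```

The two hypotheses share the same unmarked tree but receive different labels (NP-1 on the large NP in the first, NP-1 and PP-2 on its children in the second), so a tree kernel applied to the marked trees no longer sees them as identical.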
In order to verify the relevance of our model, the
next section provides empirical evidence about the
effectiveness of our approach.
5 The Experiments
The experiments were carried out with
the SVM-light-TK software available at
http://ai-nlp.info.uniroma2.it/moschitti/
which encodes the tree kernels in the SVM-light
software (Joachims, 1999). For tbc, we used the
linear kernel with a regularization parameter (option
-c) equal to 1 and a cost-factor (option -j) of 10 to
have a higher Recall. For the pastc we used λ = 0.4
(see (Moschitti, 2004)).
As the reference dataset, we used the PropBank corpora available
at www.cis.upenn.edu/~ace, along with the Penn TreeBank 2
(www.cis.upenn.edu/~treebank) (Marcus et al., 1993). This corpus
contains about 53,700 sentences and a fixed split between training
and testing which has been used in other studies, e.g. (Pradhan et
al., 2004; Gildea and Palmer, 2002). We did not include
continuation and co-referring arguments in our experiments.
We used sections 02 to 07 (54,443 argument nodes and 1,343,046
non-argument nodes) to train the traditional boundary classifier
(tbc). Then, we applied it to classify sections 08 to 21 (125,443
argument nodes vs. 3,010,673 non-argument nodes). As a result, we
obtained 2,988 NSTs containing at least one overlapping node pair
out of the total 65,212 predicate structures (according to the tbc
decisions). From the 2,988 overlapping structures we extracted
3,624 positive and 4,461 negative NSTs, which we used to train the
pastc.
The performance was evaluated with the F1 measure² over section
23. This contains 10,406 argument nodes out of 249,879 parse tree
nodes. By applying the tbc classifier we derived 235 overlapping
NSTs, from which we extracted 204 PASTs and 385 incorrect
predicate argument structures. On such test data, the performance
of pastc was very high, i.e. 87.08% in Precision and 89.22% in
Recall.
Using the pastc, we removed from the tbc output the PA nodes that
cause overlaps. To measure the impact on the boundary
identification performance, we compared it with three different
boundary classification baselines:
• tbc: overlaps are ignored and no decision is taken. This
provides an upper bound for the recall as no potential argument is
rejected for later labeling. Notice that, in the presence of
overlapping nodes, the sentence cannot be annotated correctly.
• RND: one among the non-overlapping structures with maximal
number of arguments is randomly selected.
• Heu (heuristic): one of the NSTs which contain the nodes with
the lowest overlapping score is chosen. This score counts the
number of overlapping node pairs in the NST. For example, in
Figure 3(a) we have an NP that overlaps with two nodes, NP and PP;
thus it is assigned a score of 2.
²F1 assigns equal importance to Precision P and Recall R, i.e.
$F_1 = \frac{2P \times R}{P+R}$.

                 tbc                  tbc+RND              tbc+Heu              tbc+pastc
                 P      R      F      P      R      F      P      R      F      P      R      F
All Struct.      92.21  98.76  95.37  93.55  97.31  95.39  92.96  97.32  95.10  94.40  98.42  96.36
Overl. Struct.   98.29  65.8   78.83  74.00  72.27  73.13  68.12  75.23  71.50  89.61  92.68  91.11

Table 1: Two-step boundary classification performance using the
traditional boundary classifier tbc, the random selection of
non-overlapping structures (RND), the heuristic to select the most
suitable non-overlapping node set (Heu) and the predicate argument
spanning tree classifier (pastc).
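The F1 values reported in Table 1 can be recomputed from the corresponding Precision and Recall with the formula of footnote 2. The short sketch below does so for two cells of the table; `f1` is only an illustrative helper, and small differences in the last digit are possible because the published P and R are themselves rounded.

```python
def f1(precision, recall):
    """Harmonic mean of Precision and Recall (footnote 2)."""
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs taken from Table 1.
f1_tbc_all = f1(92.21, 98.76)        # tbc, all structures
f1_pastc_overl = f1(89.61, 92.68)    # tbc+pastc, overlapping structures
```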
The third row of Table 1 shows the results of tbc, tbc + RND,
tbc + Heu and tbc + pastc in columns 2, 3, 4 and 5, respectively.
We note that:
• The tbc F1 is slightly higher than the result obtained in
(Pradhan et al., 2004), i.e. 95.37% vs. 93.8% under the same
training/testing conditions (same PropBank version, same training
and testing split and same machine learning algorithm). This is
explained by the fact that we did not include the continuations
and the co-referring arguments, which are more difficult to
detect.
• Neither RND nor Heu improves the tbc result. This can be
explained by observing that in about 50% of the cases a correct
node is removed.
• When the pastc is used to select the correct node, the F1
increases by about 1 point (96.36 vs. 95.37 in Table 1). This is a
very good result considering that it is hard to improve on the
very high baseline of tbc.
In order to give a fairer evaluation of our approach, we tested
the above classifiers on the overlapping structures only, i.e. we
measured the pastc improvement on all and only the structures that
required its application. This reduced test set contains 642
argument nodes and 15,408 non-argument nodes. The fourth row of
Table 1 reports the classifier performance on this task. We note
that the pastc improves on the other heuristics by about 20%.
6 Related Work
Recently, many kernels for natural language applications have been
designed. In what follows, we highlight their differences and
properties.
The tree kernel used in this article was proposed in (Collins and
Duffy, 2002) for syntactic parse re-ranking. It was tested with
the Voted Perceptron and was shown to improve syntactic parsing.
A refinement of this technique was presented in (Taskar et al.,
2004). The substructures produced by the proposed tree kernel were
bound to local properties of the target parse tree and more
lexical information was added to the overall kernel function.
In (Zelenko et al., 2003), two kernels over syntactic shallow
parser structures were devised for the extraction of linguistic
relations, e.g. person-affiliation. To measure the similarity
between two nodes, the contiguous string kernel and the sparse
string kernel (Lodhi et al., 2000) were used. The former can be
reduced to the contiguous substring kernel whereas the latter can
be transformed into the non-contiguous string kernel. The high
running time complexity, caused by the general form of the
fragments, limited the experiments to a data set of just 200 news
items.
In (Cumby and Roth, 2003), a description language is proposed that
models feature descriptors to generate different feature types.
The descriptors, which are quantified logical propositions, are
instantiated by means of a concept graph which encodes the
structural data. In the case of relation extraction, the concept
graph is associated with a syntactic shallow parse and the
extracted propositional features express fragments of such a
syntactic structure. The experiments over the named entity class
categorization show that when the description language selects an
adequate set of tree fragments the Voted Perceptron algorithm
increases its classification accuracy.
In (Culotta and Sorensen, 2004) a dependency tree kernel is used
to detect the Named Entity classes in natural language texts. The
major novelty was the combination of the contiguous and sparse
kernels with the word kernel. The results show that the contiguous
kernel outperforms the sparse kernel and the bag-of-words.
7 Conclusions
The feature design for new natural language learning tasks is
difficult. We can take advantage of kernel methods to model our
intuitive knowledge about the target linguistic phenomenon. In
this paper we have shown that we can exploit the properties of
tree kernels to engineer syntactic features for the predicate
argument boundary detection task.
Preliminary results on gold standard trees suggest that (1) the
information related to the whole predicate argument structure is
important and (2) tree kernels can be used to generate syntactic
features.
In the future, we would like to use an approach similar to the
PAST classifier on parses provided by different parsing models to
detect boundaries and to classify semantic roles more accurately.
Acknowledgements
We wish to thank Ana-Maria Giuglea for her help in
the design and implementation of the basic Seman-
tic Role Labeling system that we used in the experi-
ments.

References
Michael Collins and Nigel Duffy. 2002. New ranking
algorithms for parsing and tagging: Kernels over dis-
crete structures, and the voted perceptron. In ACL02.
Michael Collins. 2000. Discriminative reranking for nat-
ural language parsing. In Proceedings of ICML 2000.
Aron Culotta and Jeffrey Sorensen. 2004. Dependency
tree kernels for relation extraction. In Proceedings of
the 42nd Meeting of the Association for Computational
Linguistics (ACL’04), Main Volume, pages 423–429,
Barcelona, Spain, July.
Chad Cumby and Dan Roth. 2003. Kernel methods for
relational learning. In Proceedings of the Twentieth
International Conference on Machine Learning (ICML 2003),
Washington, DC, USA.
Daniel Gildea and Julia Hockenmaier. 2003. Identifying
semantic roles using combinatory categorial grammar.
In Proceedings of the 2003 Conference on Empirical
Methods in Natural Language Processing, Sapporo,
Japan.
Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling
of semantic roles. Computational Linguistics,
28(3):496–530.
Daniel Gildea and Martha Palmer. 2002. The neces-
sity of parsing for predicate argument recognition. In
Proceedings of the 40th Annual Conference of the
Association for Computational Linguistics (ACL-02),
Philadelphia, PA, USA.
T. Joachims. 1999. Making large-scale SVM learning
practical. In B. Schölkopf, C. Burges, and A. Smola,
editors, Advances in Kernel Methods - Support Vector
Learning.
Paul Kingsbury and Martha Palmer. 2002. From Tree-
bank to PropBank. In Proceedings of the 3rd Interna-
tional Conference on Language Resources and Evalu-
ation (LREC-2002), Las Palmas, Spain.
Ron Kohavi and Dan Sommerfield. 1995. Feature sub-
set selection using the wrapper model: Overfitting and
dynamic search space topology. In The First Interna-
tional Conference on Knowledge Discovery and Data
Mining, pages 192–197. AAAI Press, Menlo Park,
California, August. Journal version in AIJ.
Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello
Cristianini, and Christopher Watkins. 2000. Text clas-
sification using string kernels. In NIPS, pages 563–
569.
M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz.
1993. Building a large annotated corpus of English:
The Penn Treebank. Computational Linguistics,
19:313–330.
Alessandro Moschitti. 2004. A study on convolution kernels
for shallow semantic parsing. In Proceedings of
the 42nd Conference of the Association for Computational
Linguistics (ACL-2004), Barcelona, Spain.
Sameer Pradhan, Kadri Hacioglu, Valeri Krugler, Wayne
Ward, James H. Martin, and Daniel Jurafsky. 2005.
Support vector learning for semantic argument classi-
fication. to appear in Machine Learning Journal.
Ben Taskar, Dan Klein, Mike Collins, Daphne Koller, and
Christopher Manning. 2004. Max-margin parsing. In
Dekang Lin and Dekai Wu, editors, Proceedings of
EMNLP 2004, pages 1–8, Barcelona, Spain, July. As-
sociation for Computational Linguistics.
Kristina Toutanova, Penka Markova, and Christopher D.
Manning. 2004. The leaf projection path view of
parse trees: Exploring string kernels for HPSG parse
selection. In Proceedings of EMNLP 2004.
V. Vapnik. 1995. The Nature of Statistical Learning The-
ory. Springer.
D. Zelenko, C. Aone, and A. Richardella. 2003. Kernel
methods for relation extraction. Journal of Machine
Learning Research.
