Cascaded Grammatical Relation Assignment 
Sabine Buchholz and Jorn Veenstra and Walter Daelemans 
ILK, Computational Linguistics, Tilburg University 
PO box 90153, 5000 LE Tilburg, The Netherlands 
{buchholz,veenstra,daelemans}@kub.nl
Abstract 
In this paper we discuss cascaded Memory-Based grammatical relation assignment. In the first stages of the cascade, we find chunks of several types (NP, VP, ADJP, ADVP, PP) and label them with their adverbial function (e.g. local, temporal). In the last stage, we assign grammatical relations to pairs of chunks. We studied the effect of adding several levels to this cascaded classifier and found that even the lower-performing chunkers enhanced the performance of the relation finder.
1 Introduction 
When dealing with large amounts of text, finding structure in sentences is often a useful preprocessing step. Traditionally, full parsing is used to find structure in sentences. However, full parsing is a complex task and often provides us with more information than we need. For many tasks, detecting only shallow structures in a sentence in a fast and reliable way is preferable to full parsing. For example, in information retrieval it can be enough to find only simple NPs and VPs in a sentence; for information extraction we might also want to find relations between constituents, for example the subject and object of a verb.
In this paper we discuss some Memory-Based 
(MB) shallow parsing techniques to find labeled 
chunks and grammatical relations in a sentence. 
Several MB modules have been developed in 
previous work, such as: a POS tagger (Daele- 
mans et al., 1996), a chunker (Veenstra, 1998; 
Tjong Kim Sang and Veenstra, 1999) and a 
grammatical relation (GR) assigner (Buchholz, 
1998). The questions we will answer in this pa- 
per are: Can we reuse these modules in a cas- 
cade of classifiers? What is the effect of cascad- 
ing? Will errors at a lower level percolate to 
higher modules? 
Recently, many people have looked at cas- 
caded and/or shallow parsing and GR assign- 
ment. Abney (1991) was one of the first to propose splitting up parsing into several cascades. He suggests first finding the chunks and then the dependencies between these chunks. Grefenstette (1996) describes a cascade of finite-state
transducers, which first finds noun and verb 
groups, then their heads, and finally syntactic 
functions. Brants and Skut (1998) describe a 
partially automated annotation tool which con- 
structs a complete parse of a sentence by recur- 
sively adding levels to the tree. Collins (1997) and Ratnaparkhi (1997) use cascaded processing for
full parsing with good results. Argamon et al. 
(1998) applied Memory-Based Sequence Learn- 
ing (MBSL) to NP chunking and subject/object 
identification. However, their subject and ob- 
ject finders are independent of their chunker 
(i.e. not cascaded). 
Drawing from this previous work we will 
explicitly study the effect of adding steps to 
the grammatical relations assignment cascade. 
Through experiments with cascading several 
classifiers, we will show that even using im- 
perfect classifiers can improve overall perfor- 
mance of the cascaded classifier. We illustrate 
this claim on the task of finding grammati- 
cal relations (e.g. subject, object, locative) to 
verbs in text. The GR assigner uses several 
sources of information step by step such as sev- 
eral types of XP chunks (NP, VP, PP, ADJP 
and ADVP), and adverbial functions assigned 
to these chunks (e.g. temporal, local). Since 
not all of these entities are predicted reliably, the question is whether each source leads to an improvement of the overall GR assignment.
In the rest of this paper we will first briefly de- 
scribe Memory-Based Learning in Section 2. In 
Section 3.1, we discuss the chunking classifiers 
that we later use as steps in the cascade. Sec- 
tion 3.2 describes the basic GR classifier. Sec- 
tion 3.3 presents the architecture and results of 
the cascaded GR assignment experiments. We 
discuss the results in Section 4 and conclude 
with Section 5. 
2 Memory-Based Learning 
Memory-Based Learning (MBL) keeps all train- 
ing data in memory and only abstracts at clas- 
sification time by extrapolating a class from the 
most similar item(s) in memory. In recent work 
Daelemans et al. (1999b) have shown that for 
typical natural language processing tasks, this 
approach is at an advantage because it also 
"remembers" exceptional, low-frequency cases 
which are useful to extrapolate from. More- 
over, automatic feature weighting in the similar- 
ity metric of an MB learner makes the approach 
well-suited for domains with large numbers of 
features from heterogeneous sources, as it em- 
bodies a smoothing-by-similarity method when 
data is sparse (Zavrel and Daelemans, 1997). 
We have used the following MBL algorithms¹:

IB1 : A variant of the k-nearest neighbor (k-NN) algorithm. The distance between a test item and each memory item is defined as the number of features for which they have a different value (overlap metric).

IB1-IG : IB1 with information gain (an information-theoretic notion measuring the reduction of uncertainty about the class to be predicted when knowing the value of a feature) to weight the cost of a feature value mismatch during comparison.

IGTree : In this variant, a decision tree is created with features as tests, and ordered according to the information gain of the features, as a heuristic approximation of the computationally more expensive IB1 variants.
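To make the classification step concrete, the sketch below shows plain IB1 with the overlap metric, plus an information-gain computation that can serve as the per-feature mismatch weights of IB1-IG. This is an illustrative reconstruction under our own simplifications, not the TiMBL implementation used in the experiments.

```python
import math
from collections import Counter

def overlap_distance(a, b, weights=None):
    # Weighted overlap metric: sum the weights of mismatching features.
    # With weights=None this is plain IB1, where every mismatch costs 1.
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def ib1_classify(memory, instance, k=1, weights=None):
    # memory: list of (feature_tuple, class) pairs kept verbatim in memory.
    # Extrapolate the class of `instance` from the k most similar items.
    ranked = sorted(memory, key=lambda m: overlap_distance(m[0], instance, weights))
    votes = Counter(cls for _, cls in ranked[:k])
    return votes.most_common(1)[0][0]

def information_gain(memory, i):
    # Reduction of class entropy when the value of feature i is known;
    # IB1-IG uses one such value per feature as its mismatch weight.
    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())
    labels = [cls for _, cls in memory]
    by_value = {}
    for feats, cls in memory:
        by_value.setdefault(feats[i], []).append(cls)
    return entropy(labels) - sum(
        len(ls) / len(labels) * entropy(ls) for ls in by_value.values())
```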
For more references and information about 
these algorithms we refer to (Daelemans et al., 
1998; Daelemans et al., 1999b). For other 
¹For the experiments described in this paper we have used TiMBL, an MBL software package developed in the ILK group (Daelemans et al., 1998). TiMBL is available from: http://ilk.kub.nl/.
memory-based approaches to parsing, see (Bod, 
1992) and (Sekine, 1998). 
3 Methods and Results 
In this section we describe the stages of the cas- 
cade. The very first stage consists of a Memory- 
Based Part-of-Speech Tagger (MBT) for which 
we refer to (Daelemans et al., 1996). The 
next three stages involve determining bound- 
aries and labels of chunks. Chunks are non- 
recursive, non-overlapping constituent parts of 
sentences (see (Abney, 1991)). First, we simultaneously chunk sentences into NP-, VP-, Prep-, ADJP- and ADVP-chunks. As these
chunks are non-overlapping, no words can be- 
long to more than one chunk, and thus no con- 
flicts can arise. Prep-chunks are the preposi- 
tional part of PPs, thus excluding the nominal 
part. Then we join a Prep-chunk and one -- 
or more coordinated -- NP-chunks into a PP- 
chunk. Finally, we assign adverbial function 
(ADVFUNC) labels (e.g. locative or temporal) 
to all chunks. 
In the last stage of the cascade, we label 
several types of grammatical relations between 
pairs of words in the sentence. 
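As a data-flow summary, the sketch below strings the stages just described together. The function names are ours and purely illustrative: each stage stands for a trained classifier that enriches the representation the next stage consumes.

```python
# Hypothetical glue code for the cascade; each argument is a trained module.
def cascade(sentence, pos_tag, xp_chunk, pp_join, advfunc, gr_assign):
    tokens = pos_tag(sentence)    # stage 1: POS tags (MBT)
    chunks = xp_chunk(tokens)     # stage 2: NP/VP/Prep/ADJP/ADVP chunks
    chunks = pp_join(chunks)      # stage 3: Prep + NP chunk(s) -> PP chunks
    chunks = advfunc(chunks)      # stage 4: adverbial function labels
    return gr_assign(chunks)      # stage 5: grammatical relations to verbs
```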
The data for all our experiments was ex- 
tracted from the Penn Treebank II Wall Street 
Journal (WSJ) corpus (Marcus et al., 1993). 
For all experiments, we used sections 00-19 as 
training material and 20-24 as test material. 
See Section 4 for results on other train/test set 
splittings. 
For evaluation of our results we use the precision and recall measures. Precision is the percentage of predicted chunks/relations that are actually correct; recall is the percentage of correct chunks/relations that are actually found. For convenient comparison of only one value, we also list the Fβ=1 value (van Rijsbergen, 1979): Fβ = ((β² + 1) · prec · rec) / (β² · prec + rec), with β = 1.
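For quick reference, the measure in code (a trivial helper of ours):

```python
def f_beta(precision, recall, beta=1.0):
    # F_beta = (beta^2 + 1) * prec * rec / (beta^2 * prec + rec)
    # (van Rijsbergen, 1979); beta = 1 weighs precision and recall equally.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / (beta ** 2 * precision + recall)
```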
3.1 Chunking 
In the first experiment described in this section, 
the task is to segment the sentence into chunks 
and to assign labels to these chunks. This pro- 
cess of chunking and labeling is carried out by 
assigning a tag to each word in a sentence left- 
to-right. Ramshaw and Marcus (1995) first as- 
signed a chunk tag to each word in the sentence: 
I for inside a chunk, O for outside a chunk, and B for inside a chunk whose preceding word is in another chunk. As we want to find more than one kind of chunk, we have to further differentiate the IOB tags as to which kind of chunk (NP, VP, Prep, ADJP or ADVP) the word is in. With the extended IOB tag set at hand we can tag the sentence:
But/CC [NP the/DT dollar/NN NP]
[ADVP later/RB ADVP]
[VP rebounded/VBD VP] ,/,
[VP finishing/VBG VP]
[ADVP slightly/RB higher/RBR ADVP]
[Prep against/IN Prep] [NP the/DT
yen/NNS NP] [ADJP although/IN ADJP]
[ADJP slightly/RB lower/JJR ADJP]
[Prep against/IN Prep] [NP the/DT
mark/NN NP] ./.
as: 
But/CC_O the/DT_I-NP dollar/NN_I-NP
later/RB_I-ADVP rebounded/VBD_I-VP ,/,_O
finishing/VBG_I-VP slightly/RB_I-ADVP
higher/RBR_I-ADVP against/IN_I-Prep
the/DT_I-NP yen/NNS_I-NP
although/IN_I-ADJP slightly/RB_B-ADJP
lower/JJR_I-ADJP against/IN_I-Prep
the/DT_I-NP mark/NN_I-NP ./._O
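The mapping from bracketed chunks to these tags can be made explicit with a small helper (our own sketch, not from the paper's software). Note that in this IOB variant a B- tag is only needed when a chunk directly follows a chunk of the same type, as with the two ADJP chunks above.

```python
def iob_tags(chunks):
    # chunks: list of (chunk_type_or_None, [words]); None = outside any chunk.
    tags, prev_type = [], None
    for ctype, words in chunks:
        if ctype is None:
            tags.extend("O" for _ in words)
        else:
            # B-XP only when an XP chunk directly follows another XP chunk;
            # otherwise the change of chunk type already marks the boundary.
            tags.append(("B-" if ctype == prev_type else "I-") + ctype)
            tags.extend("I-" + ctype for _ in words[1:])
        prev_type = ctype
    return tags

# e.g. iob_tags([("ADJP", ["although"]), ("ADJP", ["slightly", "lower"])])
# yields ['I-ADJP', 'B-ADJP', 'I-ADJP']
```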
After having found Prep-, NP- and other 
chunks, we collapse Preps and NPs to PPs in
a second step. While the GR assigner finds re- 
lations between VPs and other chunks (cf. Sec- 
tion 3.2), the PP chunker finds relations be- 
tween prepositions and NPs² in a way sim-
ilar to GR assignment (see Section 3.2). In 
the last chunking/labeling step, we assign ad- 
verbial functions to chunks. The classes are 
the adverbial function labels from the treebank: 
LOC (locative), TMP (temporal), DIR (direc- 
tional), PRP (purpose and reason), MNR (man- 
ner), EXT (extension) or "-" for none of the 
former. Table 1 gives an overview of the results
of the chunking-labeling experiments, using the 
following algorithms, determined by validation 
on the train set: IBi-IG for XP-chunking and 
IGTree for PP-chunking and ADVFUNCs as- 
signment. 
3.2 Grammatical Relation Assignment 
In grammatical relation assignment we assign 
a GR to pairs of words in a sentence. In our 
²PPs containing anything other than NPs (e.g. without bringing his wife) are not searched for.
type          precision   recall   Fβ=1
NP chunks        92.5      92.2    92.3
VP chunks        91.9      91.7    91.8
ADJP chunks      68.4      65.0    66.7
ADVP chunks      78.0      77.9    77.9
Prep chunks      95.5      96.7    96.1
PP chunks        91.9      92.2    92.0
ADVFUNCs         78.0      69.5    73.5

Table 1: Results of chunking-labeling experiments. NP-, VP-, ADJP-, ADVP- and Prep-chunks are found simultaneously, but for convenience, precision and recall values are given separately for each type of chunk.
experiments, one of these words is always a verb, 
since this yields the most important GRs. The 
other word is the head of the phrase which is 
annotated with this grammatical relation in the 
treebank. A preposition is the head of a PP, 
a noun of an NP and so on. Defining relations 
to hold between heads means that the algorithm 
can, for example, find a subject relation between 
a noun and a verb without necessarily having to 
make decisions about the precise boundaries of 
the subject NP. 
Suppose we had the POS-tagged sentence 
shown in Figure 1 and we wanted the algorithm 
to decide whether, and if so how, Miller (hence- 
forth: the focus) is related to the first verb or- 
ganized. We then construct an instance for this 
pair of words by extracting a set of feature val- 
ues from the sentence. The instance contains 
information about the verb and the focus: a 
feature for the word form and a feature for the 
POS of both. It also has similar features for the 
local context of the focus. Experiments on the 
training data suggest an optimal context width 
of two elements to the left and one to the right. 
In the present case, elements are words or punc- 
tuation signs. In addition to the lexical and the 
local context information, we include superficial 
information about clause structure: The first 
feature indicates the distance from the verb to 
the focus, counted in elements. A negative dis- 
tance means that the focus is to the left of the 
verb. The second feature contains the number 
of other verbs between the verb and the focus. 
The third feature is the number of intervening 
commas. The features were chosen by manual 
Not/RB surprisingly/RB ,/, Peter/NNP Miller/NNP ,/, who/WP organized/VBD the/DT conference/NN in/IN New/NNP York/NNP ,/, does/VBZ not/RB want/VB to/TO come/VB to/IN Paris/NNP without/IN bringing/VBG his/PRP$ wife/NN .

Figure 1: An example sentence annotated with POS.
Feat. 1-3 | Verb (4-5)     | Context -2 (6-7) | Context -1 (8-9) | Focus (10-11)    | Context +1 (12-13) | Class
-7 0 2    | organized vbd  | - -              | - -              | Not rb           | surprisingly rb    | -
-6 0 2    | organized vbd  | - -              | Not rb           | surprisingly rb  | , ,                | -
-4 0 1    | organized vbd  | surprisingly rb  | , ,              | Peter nnp        | Miller nnp         | -
-3 0 1    | organized vbd  | , ,              | Peter nnp        | Miller nnp       | , ,                | np-sbj
-1 0 0    | organized vbd  | Miller nnp       | , ,              | who wp           | organized vbd      | -

Table 2: The first five instances for the sentence in Figure 1. Features 1-3 are the features for distance and intervening verbs and commas. Features 4 and 5 show the verb and its POS. Features 6-7, 8-9 and 12-13 describe the context words, Features 10-11 the focus word. Empty contexts are indicated by the value "-" for all features.
"feature engineering". Table 2 shows the com- 
plete instance for Miller-organized in row 5, to- 
gether with the other first four instances for the 
sentence. The class is mostly "-", to indicate 
that the word does not have a direct grammati- 
cal relation to organized. Other possible classes 
are those from a list of more than 100 different 
labels found in the treebank. These are combi- 
nations of a syntactic category and zero, one or 
more functions, e.g. NP-SBJ for subject, NP-PRD 
for predicative object, NP for (in)direct object 3, 
PP-LOC for locative PP adjunct, PP-LOC-CLR for 
subcategorised locative PP, etcetera. Accord- 
ing to their information gain values, features are 
ordered with decreasing importance as follows: 
11, 13, 10, 1, 2, 8, 12, 9, 6 , 4, 7, 3 , 5. In- 
tuitively,, this ordering makes sense. The most 
important feature is the POS of the focus, be- 
cause this determines whether it can have a GR 
to a verb at all (15unctuation cannot) and what 
kind of relation is possible. The POS of the fol- 
lowing word is important, because e.g. a noun 
followed by a noun is probably not the head of 
an NP and will therefore not have a direct GR 
to the verb. The word itself may be important 
if it is e.g. a preposition, a pronoun or a clearly 
temporal/local adverb. Features 1 and 2 give 
some indication of the complexity of the struc- 
ture intervening between the focus and the verb. 
³Direct and indirect object NPs have the same label in the treebank annotation. They can be differentiated by their position.
The more complex this structure, the lower the 
probability that the focus and the verb are re- 
lated. Context further away is less important 
than near context. 
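To make the instance layout concrete, here is a sketch that builds the thirteen word-level features of Table 2 for one verb-focus pair. The helper name and the element representation (a list of (word, POS) pairs covering words and punctuation) are ours, for illustration only.

```python
def word_level_instance(elements, verb_i, focus_i):
    # elements: list of (word, pos) pairs; verb_i, focus_i: element indices.
    between = elements[min(verb_i, focus_i) + 1 : max(verb_i, focus_i)]
    dist = focus_i - verb_i                 # negative: focus left of the verb
    n_verbs = sum(pos.startswith("vb") for _, pos in between)
    n_commas = sum(word == "," for word, _ in between)

    def ctx(i):                             # context element, or "-" if absent
        return elements[i] if 0 <= i < len(elements) else ("-", "-")

    return [dist, n_verbs, n_commas,        # features 1-3
            *elements[verb_i],              # features 4-5: verb word, POS
            *ctx(focus_i - 2), *ctx(focus_i - 1),  # features 6-9: left context
            *elements[focus_i],             # features 10-11: focus word, POS
            *ctx(focus_i + 1)]              # features 12-13: right context
```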
To test the effects of the chunking steps from 
Section 3.1 on this task, we will now construct 
instances based on more structured input text, 
like that in Figure 2. This time, the focus is de- 
scribed by five features instead of two, for the 
additional information: which type of chunk it 
is in, what the preposition is if it is in a PP 
chunk, and what the adverbial function is, if 
any. We still have a context of two elements 
left, one right, but elements are now defined to 
be either chunks, or words outside any chunk, 
or punctuation. Each chunk in the context is 
represented by its last word (which is the se- 
mantically most important word in most cases), 
by the POS of the last word, and by the type 
of chunk. The distance feature is adapted to 
the new definition of element, too, and instead 
of counting intervening verbs, we now count in- 
tervening VP chunks. Table 3 shows the first five instances for the sentence in Figure 2. Class value "-" again means "the focus is not directly
related to the verb" (but to some other verb or 
a non-verbal element). According to their in- 
formation gain values, features are ordered in 
decreasing importance as follows: 16, 15, 12, 
14, 11, 2, 1, 19, 10, 9, 13, 18, 6, 17, 8, 4, 7, 3, 
5. Comparing this to the earlier feature order- 
ing, we see that most of the new features are 
[ADVP Not/RB surprisingly/RB ADVP] ,/, [NP Peter/NNP Miller/NNP NP] ,/, [NP who/WP NP] [VP organized/VBD VP] [NP the/DT conference/NN NP] {PP-LOC [Prep in/IN Prep] [NP New/NNP York/NNP NP] PP-LOC} ,/, [VP does/VBZ not/RB want/VB to/TO come/VB VP] {PP-DIR [Prep to/IN Prep] [NP Paris/NNP NP] PP-DIR} [Prep without/IN Prep] [VP bringing/VBG VP] [NP his/PRP$ wife/NN NP] .

Figure 2: An example sentence annotated with POS (after the slash), chunks (with square and curly brackets) and adverbial functions (after the dash).
Feat. 1-3 | Verb (4-5) | Context -2 (6-8)  | Context -1 (9-11) | Focus (12-16: prep word pos cat adv) | Context +1 (17-19) | Class
-5 0 2    | org. vbd   | - - -             | - - -             | - surpris. rb advp -                 | , , -              | -
-3 0 1    | org. vbd   | surpris. rb advp  | , , -             | - Miller nnp np -                    | , , -              | np-sbj
-1 0 0    | org. vbd   | Miller nnp np     | , , -             | - who wp np -                        | org. vbd vp        | -
 1 0 0    | org. vbd   | who wp np         | org. vbd vp       | - conf. nn np -                      | York nnp pp        | np
 2 0 0    | org. vbd   | org. vbd vp       | conf. nn np       | in York nnp pp loc                   | , , -              | -

Table 3: The first five instances for the sentence in Figure 2. Features 1-3 are the features for distance and intervening VPs and commas. Features 4 and 5 show the verb and its POS. Features 6-8, 9-11 and 17-19 describe the context words/chunks, Features 12-16 the focus chunk. Empty contexts are indicated by the value "-" for all features.
very important, thereby justifying their intro- 
duction. Relative to the other "old" features, 
the structural features 1 and 2 have gained im- 
portance, probably because more structure is 
available in the input to represent. 
In principle, we would have to construct one 
instance for each possible pair of a verb and a
focus word in the sentence. However, we re- 
strict instances to those where there is at most 
one other verb/VP chunk between the verb and 
the focus, in case the focus precedes the verb, 
and no other verb in case the verb precedes the 
focus. This restriction allows, for example, for a 
relative clause on the subject (as in our example 
sentence). In the training data, 97.9% of the re- 
lated pairs fulfill this condition (when counting 
VP chunks). Experiments on the training data 
showed that increasing the admitted number of 
intervening VP chunks slightly increases recall, 
at the cost of precision. Having constructed all 
instances from the test data and from a training 
set with the same level of partial structure, we 
first train the IGTree algorithm, and then let it
classify the test instances. Then, for each test 
instance that was classified with a grammatical 
relation, we check whether the same verb-focus- 
pair appears with the same relation in the GR 
list extracted directly from the treebank. This 
gives us the precision of the classifier. Checking 
the treebank list versus the classified list yields 
recall. 
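The evaluation just described amounts to set comparison over verb-focus-relation triples; a minimal sketch (variable names ours):

```python
def evaluate(predicted, treebank):
    # predicted, treebank: sets of (verb, focus, relation) triples.
    hits = predicted & treebank
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(treebank) if treebank else 0.0
    return precision, recall
```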
3.3 Cascaded Experiments 
We have already seen from the example that the 
level of structure in the input text can influence 
the composition of the instances. We are inter- 
ested in the effects of different sorts of partial 
structure in the input data on the classification 
performance of the final classifier. 
Therefore, we ran a series of experiments. 
The classification task was always that of find- 
ing grammatical relations to verbs and perfor- 
mance was always measured by precision and 
recall on those relations (the test set contained 
45825 relations). The amount of structure in 
the input data varied. Table 4 shows the results 
of the experiments. In the first experiment, only 
POS tagged input is used. Then, NP chunks 
are added. Other sorts of chunks are inserted 
at each subsequent step. Finally, the adverbial 
function labels are added. We can see that the 
more structure we add, the better precision and 
recall of the grammatical relations get: preci- 
sion increases from 60.7% to 74.8%, recall from 
41.3% to 67.9%. This in spite of the fact that 
the added information is not always correct, be- 
cause it was predicted for the test material on 
the basis of the training material by the classi- 
fiers described in Section 3.1. As we have seen
in Table 1, especially ADJP and ADVP chunks 
and adverbial function labels did not have very 
high precision and recall. 
4 Discussion 
There are three ways in which two cascaded modules can interact:
• The first module can add information on
which the later module can (partially) base 
its decisions. This is the case between the 
adverbial functions finder and the relations 
finder. The former adds an extra informa- 
tive feature to the instances of the latter 
(Feature 16 in Table 3). Cf. column two of 
Table 4. 
• The first module can restrict the num-
ber of decisions to be made by the sec- 
ond one. This is the case in the combina- 
tion of the chunking steps and the relations 
finder. Without the chunker, the relations 
finder would have to decide for every word, 
whether it is the head of a constituent that 
bears a relation to the verb. With the chunker, the relations finder has to make this decision for fewer words, namely only for those which are the last word in a chunk resp. the preposition of a PP chunk. Practically, this reduction of the number of decisions (which translates into a reduction of instances) can be seen in the third column of Table 4.
• The first module can reduce the number of 
elements used for the instances by count- 
ing one chunk as just one context element. 
We can see the effect in the feature that 
indicates the distance in elements between 
the focus and the verb. The more chunks 
are used, the smaller the average absolute 
distance (see column four of Table 4).
All three effects interact in the cascade we 
describe. The PP chunker reduces the number 
of decisions for the relations finder (instead of 
one instance for the preposition and one for the 
NP chunk, we get only one instance for the PP 
chunk), introduces an extra feature (Feature 12 
in Table 3), and changes the context (instead of 
a preposition and an NP, context may now be 
one PP). 
As we already noted above, precision and re- 
call are monotonically increasing when adding 
more structure. However, we note large differences: NP chunks increase Fβ=1 by more than 10%, and VP chunks add another 6.8%, whereas ADVPs and ADJPs yield hardly any improvement. This may partially be explained by the fact that these chunks are less frequent than the former two. Preps, on the other hand, while hardly reducing the average distance or the number of instances, improve Fβ=1 by nearly 1%. PPs yield another 1.1%. What may come as a surprise is that adverbial functions again increase Fβ=1 by nearly 2%, despite the fact that Fβ=1 for this ADVFUNC assignment step was not very high. This result shows that cascaded modules need not be perfect to be useful.
Up to now, we only looked at the overall results. Table 4 also shows individual Fβ=1 values for four selected common grammatical relations: subject NP, (in)direct object NP, locative PP adjunct and temporal PP adjunct. Note that the steps have different effects on the different relations: Adding NPs increases Fβ=1 by 11.3% for subjects resp. 16.2% for objects, but only 3.9% resp. 3.7% for locatives and temporals. Adverbial functions are more important for the two adjuncts (+6.3% resp. +15%) than for the two complements (+0.2% resp. +0.7%).
Argamon et al. (1998) report Fβ=1 for subject and object identification of respectively 86.5% and 83.0%, compared to 81.8% and 81.0% in this paper. Note however that Argamon et al. (1998) do not identify the head
amon et al. (1998) do not identify the head 
of subjects, subjects in embedded clauses, or 
subjects and objects related to the verb only 
through a trace, which makes their task eas- 
ier. For a detailed comparison of the two meth- 
ods on the same task see (Daelemans et al., 
1999a). That paper also shows that the chunk- 
ing method proposed here performs about as 
well as other methods, and that the influence 
of tagging errors on (NP) chunking is less than 
1%. 
To study the effect of the errors in the lower 
modules other than the tagger, we used "per- 
fect" test data in a last experiment, i.e. data an- 
notated with partial information taken directly 
from the treebank. The results are shown in 
Table 5. We see that later modules suffer from errors of earlier modules (as could be expected): Fβ=1 of PP chunking is 92% but could have
Structure in input     | #Feat. | #Inst. |  Δ  | Prec | Rec  | Fβ=1 | Subj. | Obj. | Loc. | Temp.
words and POS only     |   13   | 350091 | 6.1 | 60.7 | 41.3 | 49.1 | 52.8  | 49.4 | 34.0 | 38.4
+NP chunks             |   17   | 227995 | 4.2 | 65.9 | 55.7 | 60.4 | 64.1  | 75.6 | 37.9 | 42.1
+VP chunks             |   17   | 186364 | 4.5 | 72.1 | 62.9 | 67.2 | 78.6  | 75.6 | 40.8 | 46.8
+ADVP and ADJP chunks  |   17   | 185005 | 4.4 | 72.1 | 63.0 | 67.3 | 78.8  | 75.8 | 40.4 | 46.5
+Prep chunks           |   17   | 184455 | 4.4 | 72.5 | 64.3 | 68.2 | 81.2  | 75.7 | 40.4 | 47.1
+PP chunks             |   18   | 149341 | 3.6 | 73.6 | 65.6 | 69.3 | 81.6  | 80.3 | 40.6 | 48.3
+ADVFUNCs              |   19   | 149341 | 3.6 | 74.8 | 67.9 | 71.2 | 81.8  | 81.0 | 46.9 | 63.3

Table 4: Results of grammatical relation assignment with more and more structure in the test data added by earlier modules in the cascade. Columns show the number of features in the instances, the number of instances constructed from the test input, the average distance Δ between the verb and the focus element, precision, recall and Fβ=1 over all relations, and Fβ=1 over some selected relations (subject, object, locative and temporal).
Experiment                                                      | Precision | Recall | Fβ=1
PP chunking                                                     |   91.9    |  92.2  | 92.0
PP on perfect test data                                         |   98.5    |  97.4  | 97.9
ADVFUNC assignment                                              |   78.0    |  69.5  | 73.5
ADVFUNC on perfect test data                                    |   80.9    |  73.4  | 77.0
GR with all chunks, without ADVFUNC label                       |   73.6    |  65.6  | 69.3
GR with all chunks, without ADVFUNC label on perfect test data  |   80.8    |  73.9  | 77.2
GR with all chunks and ADVFUNC label                            |   74.8    |  67.9  | 71.2
GR with all chunks and ADVFUNC label on perfect test data       |   86.3    |  80.8  | 83.5

Table 5: Comparison of performance of several modules on realistic input (structurally enriched by previous modules in the cascade) vs. on "perfect" input (enriched with partial treebank annotation). For PPs, this means perfect POS tags and chunk labels/boundaries, for ADVFUNC additionally perfect PP chunks, for GR assignment also perfect ADVFUNC labels.
been 97.9% had all previous chunks been correct (+5.9%). For adverbial functions,
the difference is 3.5%. For grammatical rela- 
tion assignment, the last module in the cascade, 
the difference is, not surprisingly, the largest: 
7.9% for chunks only, 12.3% for chunks and AD- 
VFUNCs. The latter percentage shows what 
could maximally be gained by further improving 
the chunker and ADVFUNCs finder. On realistic data, a realistic ADVFUNCs finder improves GR assignment by 1.9%. On perfect data, a per-
fect ADVFUNCs finder increases performance 
by 6.3%. 
5 Conclusion and Future Research 
In this paper we studied cascaded grammatical 
relations assignment. We showed that even the 
use of imperfect modules improves the overall 
result of the cascade. 
In future research we plan to also train 
our classifiers on imperfectly chunked material. 
This enables the classifier to better cope with 
systematic errors in train and test material. We 
expect that especially an improvement of the 
adverbial function assignment will lead to bet- 
ter GR assignment. 
Finally, since cascading proved effective for 
GR assignment we intend to study the effect 
of cascading different types of XP chunkers on 
chunking performance. We might e.g. first find 
ADJP chunks, then use that chunker's output 
as additional input for the NP chunker, then use 
the combined output as input to the VP chunker 
and so on. Other chunker orderings are possible, 
too. Likewise, it might be better to find different grammatical relations one after another, instead of simultaneously.

References 

S. Abney. 1991. Parsing by chunks. In 
Principle-Based Parsing, pages 257-278. 
Kluwer Academic Publishers, Dordrecht. 

S. Argamon, I. Dagan, and Y. Krymolowski. 
1998. A memory-based approach to learning 
shallow natural language patterns. In Proc. 
of the 36th annual meeting of the ACL, pages 67-73, Montreal.

R. Bod. 1992. A computational model of 
language performance: Data oriented pars- 
ing. In Proceedings of the 14th Interna-
tional Conference on Computational Linguis- 
tics, COLING-92, Nantes, France, pages 
855-859. 

Thorsten Brants and Wojciech Skut. 1998. Au-
tomation of treebank annotation. In Proceed- 
ings of the Conference on New Methods in 
Language Processing (NeMLaP-3), Australia. 

Sabine Buchholz. 1998. Distinguishing com- 
plements from adjuncts using memory-based 
learning. In Proceedings of the ESSLLI-98 
Workshop on Automated Acquisition of Syn- 
tax and Parsing, Saarbrücken, Germany.

C.J. van Rijsbergen. 1979. Information Retrieval. Butterworths, London.

M. Collins. 1997. Three generative, lexicalised 
models for statistical parsing. In Proceedings 
of the 35th ACL and the 8th EACL, Madrid, Spain, July.

W. Daelemans, J. Zavrel, P. Berck, and S. Gillis. 
1996. MBT: A memory-based part of speech 
tagger generator. In E. Ejerhed and I. Dagan, 
editors, Proc. of Fourth Workshop on Very 
Large Corpora, pages 14-27. ACL SIGDAT. 

W. Daelemans, J. Zavrel, K. Van der Sloot, and 
A. Van den Bosch. 1998. TiMBL: Tilburg 
Memory Based Learner, version 1.0, reference 
manual. Technical Report ILK-9803, ILK, 
Tilburg University. 

W. Daelemans, S. Buchholz, and J. Veenstra. 
1999a. Memory-based shallow parsing. In 
Proceedings of CoNLL, Bergen, Norway. 

W. Daelemans, A. Van den Bosch, and J. Za- 
vrel. 1999b. Forgetting exceptions is harm- 
ful in language learning. Machine Learning, 
Special issue on Natural Language Learning, 
34:11-41. 

Gregory Grefenstette. 1996. Light parsing as 
finite-state filtering. In Wolfgang Wahlster, 
editor, Workshop on Extended Finite State 
Models of Language, ECAI'96, Budapest, 
Hungary. John Wiley & Sons, Ltd. 

M. Marcus, B. Santorini, and M.A. 
Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank.
Computational Linguistics, 19(2):313-330. 

L.A. Ramshaw and M.P. Marcus. 1995. Text 
chunking using transformation-based learn- 
ing. In Proceedings of the 3rd ACL/SIGDAT 
Workshop on Very Large Corpora, Cam- 
bridge, Massachusetts, USA, pages 82-94. 

A. Ratnaparkhi. 1997. A linear observed time 
statistical parser based on maximum en- 
tropy models. In Proceedings of the Second 
Conference on Empirical Methods in Natural 
Language Processing, EMNLP-2, Providence, 
Rhode Island, pages 1-10. 

Satoshi Sekine. 1998. Corpus-Based Parsing 
and Sublanguage Studies. Ph.D. thesis, New 
York University. 

E. Tjong Kim Sang and J.B. Veenstra. 1999. 
Representing text chunks. In Proceedings of 
the EACL, Bergen, Norway.

J. B. Veenstra. 1998. Fast NP chunking using
memory-based learning techniques. In Pro- 
ceedings of BENELEARN'98, pages 71-78, 
Wageningen, The Netherlands. 

J. Zavrel and W. Daelemans. 1997. Memory- 
based learning: Using similarity for smooth- 
ing. In Proc. of the 35th annual meeting of the ACL, Madrid.
