In: Proceedings of CoNLL-2000 and LLL-2000, pages 127-132, Lisbon, Portugal, 2000. 
Introduction to the CoNLL-2000 Shared Task: Chunking 
Erik F. Tjong Kim Sang 
CNTS - Language Technology Group 
University of Antwerp 
erikt@uia.ua.ac.be
Sabine Buchholz 
ILK, Computational Linguistics 
Tilburg University 
s.buchholz@kub.nl
Abstract 
We describe the CoNLL-2000 shared task: 
dividing text into syntactically related non- 
overlapping groups of words, so-called text 
chunking. We give background information on 
the data sets, present a general overview of the 
systems that have taken part in the shared task 
and briefly discuss their performance. 
1 Introduction 
Text chunking is a useful preprocessing step 
for parsing. There has been a large inter- 
est in recognizing non-overlapping noun phrases 
(Ramshaw and Marcus (1995) and follow-up pa- 
pers) but relatively little has been written about 
identifying phrases of other syntactic categories. 
The CoNLL-2000 shared task attempts to fill 
this gap. 
2 Task description 
Text chunking consists of dividing a text into 
phrases in such a way that syntactically related
words become members of the same phrase.
These phrases are non-overlapping which means 
that one word can only be a member of one 
chunk. Here is an example sentence: 
[NP He ] [VP reckons ] [NP the current
account deficit ] [VP will narrow ]
[PP to ] [NP only £ 1.8 billion ]
[PP in ] [NP September ] .
Chunks have been represented as groups of 
words between square brackets. A tag next to 
the open bracket denotes the type of the chunk. 
As far as we know, there are no annotated cor- 
pora available which contain specific informa- 
tion about dividing sentences into chunks of 
words of arbitrary types. We have chosen to 
work with a corpus with parse information, the 
Wall Street Journal (WSJ) part of the Penn Treebank II
corpus (Marcus et al., 1993), and to extract
chunk information from the parse trees in
this corpus. We will give a global description of
the various chunk types in the next section. 
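
The bracket notation of the example sentence can be read mechanically. The following is a minimal sketch, not part of the shared task software: the function name parse_bracketed and the convention of returning None as the type of tokens outside any chunk are assumptions made for illustration.

```python
import re

# A chunk looks like "[NP the current account deficit ]";
# any other whitespace-delimited token lies outside all chunks.
CHUNK_OR_TOKEN = re.compile(r"\[(\S+)\s+([^\[\]]*?)\s*\]|(\S+)")

def parse_bracketed(text):
    """Return (chunk_type, tokens) pairs; chunk_type is None outside chunks."""
    result = []
    for match in CHUNK_OR_TOKEN.finditer(text):
        if match.group(1) is not None:
            result.append((match.group(1), match.group(2).split()))
        else:
            result.append((None, [match.group(3)]))
    return result

sentence = ("[NP He ] [VP reckons ] [NP the current account deficit ] "
            "[VP will narrow ] [PP to ] [NP only £ 1.8 billion ] "
            "[PP in ] [NP September ] .")
chunks = parse_bracketed(sentence)
```

Because chunks are non-overlapping and not nested, a single left-to-right pass with one regular expression is sufficient here.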
3 Chunk Types 
The chunk types are based on the syntactic cat- 
egory part (i.e. without function tag) of the 
bracket label in the Treebank (cf. Bies (1995) 
p.35). Roughly, a chunk contains everything to 
the left of and including the syntactic head of 
the constituent of the same name. Some Tree- 
bank constituents do not have related chunks. 
The head of S (simple declarative clause) for ex- 
ample is normally thought to be the verb, but 
as the verb is already part of the VP chunk, no 
S chunk exists in our example sentence. 
Besides the head, a chunk also contains pre- 
modifiers (like determiners and adjectives in 
NPs), but no postmodifiers or arguments. This 
is why the PP chunk only contains the preposi- 
tion, and not the argument NP, and the SBAR 
chunk consists of only the complementizer. 
There are several difficulties when converting 
trees into chunks. In the simplest case, a
chunk is just a syntactic constituent without 
any further embedded constituents, like the NPs 
in our examples. In some cases, the chunk con- 
tains only what is left after other chunks have 
been removed from the constituent, cf. "(VP 
loves (NP Mary))" above, or ADJPs and PPs 
below. We will discuss some special cases dur- 
ing the following description of the individual 
chunk types. 
3.1 NP 
Our NP chunks are very similar to the ones of 
Ramshaw and Marcus (1995). Specifically, possessive
NP constructions are split in front of
the possessive marker (e.g. [NP Eastern Airlines ]
[NP ' creditors ]) and the handling of coordinated
NPs follows the Treebank annotators.
However, as Ramshaw and Marcus do not de- 
scribe the details of their conversion algorithm, 
results may differ in difficult cases, e.g. involv- 
ing NAC and NX. 1 
An ADJP constituent inside an NP con- 
stituent becomes part of the NP chunk: 
(NP The (ADJP most volatile) form)
→ [NP the most volatile form ]
3.2 VP 
In the Treebank, verb phrases are highly embed- 
ded; see e.g. the following sentence which con- 
tains four VP constituents. Following Ramshaw 
and Marcus' V-type chunks, this sentence will 
only contain one VP chunk: 
((S (NP-SBJ-3 Mr. Icahn) (VP may 
not (VP want (S (NP-SBJ *-3) (VP to 
(VP sell ...))))) . )) 
→ [NP Mr. Icahn ] [VP may not want
to sell ] ...
It is still possible however to have one VP chunk 
directly follow another: [NP The impression ]
[NP I ] [VP have got ] [VP is ] [NP they ] [VP 'd
love to do ] [PRT away ] [PP with ] [NP it ]. In this
case the two VP constituents did not overlap in 
the Treebank. 
Adverbs/adverbial phrases become part of
the VP chunk (as long as they are in front of 
the main verb): 
(VP could (ADVP very well) (VP 
show ... ))
→ [VP could very well show ] ...
In contrast to Ramshaw and Marcus (1995), 
predicative adjectives of the verb are not part 
of the VP chunk, e.g. in "[NP they ] [VP are ]
[ADJP unhappy ]".
In inverted sentences, the auxiliary verb is not 
part of any verb phrase in the Treebank. Con- 
sequently it does not belong to any VP chunk: 
((S (SINV (CONJP Not only) does
(NP-SBJ-1 your product) (VP have (S
(NP-SBJ *-1) (VP to (VP be (ADJP-PRD
excellent)))))) , but ...
→ [CONJP Not only ] does [NP your
product ] [VP have to be ] [ADJP excellent ]
, but ...
1 E.g. (NP-SBJ (NP Robin Leigh-Pemberton) , (NP
(NAC Bank (PP of (NP England))) governor) ,) which
we convert to [NP Robin Leigh-Pemberton ] , Bank
[PP of ] [NP England ] [NP governor ] whereas Ramshaw
and Marcus state that ' "governor" is not included in
any baseNP chunk'.
3.3 ADVP and ADJP 
ADVP chunks mostly correspond to ADVP constituents
in the Treebank. However, ADVPs inside
ADJPs, or inside VPs if in front of the main
verb, are assimilated into the ADJP or
VP chunk, respectively. On the other hand, ADVPs that
contain an NP make two chunks:
(ADVP-TMP (NP a year) earlier) 
→ [NP a year ] [ADVP earlier ]
ADJPs inside NPs are assimilated into the NP. 
And parallel to ADVPs, ADJPs that contain an 
NP make two chunks: 
(ADJP-PRD (NP 68 years) old) 
→ [NP 68 years ] [ADJP old ]
It would be interesting to see how changing
these decisions (as can be done in the
Treebank-to-chunk conversion script 2) influences
the chunking task.
3.4 PP and SBAR 
Most PP chunks just consist of one word (the 
preposition) with the part-of-speech tag IN. 
This does not mean, though, that finding PP 
chunks is completely trivial. INs can also con- 
stitute an SBAR chunk (see below) and some 
PP chunks contain more than one word. This
is the case with fixed multi-word prepositions
(such as, because of, due to), with prepositions
preceded by a modifier (well above, just
after, even in, particularly among) or with coordinated
prepositions (inside and outside). We
think that PPs behave sufficiently differently
from NPs in a sentence that they should not be grouped
into one class (as Ramshaw and Marcus
did in their N-type chunks), and that, on the
other hand, tagging all NP chunks inside a PP
as I-PP would only confuse the chunker. We
therefore chose not to handle the recognition of
true PPs (prep.+NP) during this first chunking
step.
2 The Treebank-to-chunk conversion script is available
from http://ilk.kub.nl/~sabine/chunklink/
SBAR chunks mostly consist of one word (the
complementizer) with the part-of-speech tag IN, 
but like multi-word prepositions, there are also 
multi-word complementizers: even though, so 
that, just as, even if, as if, only if. 
3.5 CONJP, PRT, INTJ, LST, UCP 
Conjunctions can consist of more than one word 
as well: as well as, instead of, rather than, not 
only, but also. One-word conjunctions (like and,
or) are not annotated as CONJP in the Treebank,
and consequently are not CONJP chunks
in our data.
The Treebank uses the PRT constituent to 
annotate verb particles, and our PRT chunk 
does the same. The only multi-word particle 
is on and off. This chunk type should be easy 
to recognize as it should coincide with the part- 
of-speech tag RP, but through tagging errors it 
is sometimes also assigned IN (preposition) or 
RB (adverb). 
INTJ is an interjection phrase/chunk like no, 
oh, hello, alas, good grief!. It is quite rare. 
The list marker LST is even rarer. Examples
are 1., 2, 3., first, second, a, b, c. It might consist
of two words: the number and the period.
The UCP chunk is reminiscent of the UCP 
(unlike coordinated phrase) constituent in the 
Treebank. Arguably, the conjunction is the 
head of the UCP, so most UCP chunks consist 
of conjunctions like and and or. UCPs are the 
rarest chunks and are probably not very useful 
for other NLP tasks. 
3.6 Tokens outside 
Tokens outside any chunk are mostly punctua- 
tion signs and the conjunctions in ordinary coor- 
dinated phrases. The word not may also be out- 
side of any chunk. This happens in two cases: 
Either not is not inside the VP constituent in 
the Treebank annotation e.g. in 
... (VP have (VP told (NP-1 clients) 
(S (NP-SBJ *-1) not (VP to (VP ship 
(NP anything)))))) 
or not is not followed by another verb (because 
the main verb is a form of to be). As the right 
chunk boundary is defined by the chunk's head, 
i.e. the main verb in this case, not is then in fact
a postmodifier and as such not included in the
chunk: "... [SBAR that ] [NP there ] [VP were ]
n't [NP any major problems ] ."
3.7 Problems 
All chunks were automatically extracted from 
the parsed version of the Treebank, guided by 
the tree structure, the syntactic constituent la- 
bels, the part-of-speech tags and by knowledge 
about which tags can be heads of which con- 
stituents. However, some trees are very complex 
and some annotations are inconsistent. What 
to think about a VP in which the main verb is 
tagged as NN (common noun)? Either we al- 
low NNs as heads of VPs (not very elegant but 
which is what we did) or we have a VP without 
a head. The first solution might also introduce 
errors elsewhere... As Ramshaw and Marcus 
(1995) already noted: "While this automatic 
derivation process introduced a small percent- 
age of errors on its own, it was the only practi- 
cal way both to provide the amount of training 
data required and to allow for fully-automatic 
testing." 
4 Data and Evaluation 
For the CoNLL shared task, we have chosen 
to work with the same sections of the Penn 
Treebank as the widely used data set for base 
noun phrase recognition (Ramshaw and Mar- 
cus, 1995): WSJ sections 15-18 of the Penn 
Treebank as training material and section 20 
as test material 3. The chunks in the data 
were selected to match the descriptions in the 
previous section. An overview of the chunk 
types in the training data can be found in Table 1.
The data sets contain tokens (words and
punctuation marks), information about the lo- 
cation of sentence boundaries and information 
about chunk boundaries. Additionally, a part- 
of-speech (POS) tag was assigned to each token 
by a standard POS tagger (Brill (1994) trained 
on the Penn Treebank). We used these POS 
tags rather than the Treebank ones in order to 
make sure that the performance rates obtained 
for this data are realistic estimates for data for 
which no treebank POS tags are available. 
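
A minimal sketch of how such data could be read is given below. It assumes, for illustration, a plain-text layout with one token per line holding the word, its POS tag and its chunk tag separated by spaces, and an empty line at each sentence boundary. The function name read_sentences and the sample lines (including their POS tags) are hypothetical and not taken from the data set; the chunk tags themselves are described later in this section.

```python
def read_sentences(lines):
    """Group 'word POS chunk-tag' lines into sentences.

    An empty line marks a sentence boundary (assumed layout).
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Sentence boundary: store the finished sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk = line.split()
        current.append((word, pos, chunk))
    if current:
        sentences.append(current)
    return sentences

# Illustrative sample only; not copied from the shared task files.
sample = """He PRP B-NP
reckons VBZ B-VP
. . O

September NNP B-NP
""".splitlines()

sentences = read_sentences(sample)
```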
3 The text chunking data set is available at
http://lcg-www.uia.ac.be/conll2000/chunking/
type                              count     %
NP (noun phrase)                  55081   51%
VP (verb phrase)                  21467   20%
PP (prepositional phrase)         21281   20%
ADVP (adverb phrase)               4227    4%
SBAR (subordinated clause)         2207    2%
ADJP (adjective phrase)            2060    2%
PRT (particles)                     556    1%
CONJP (conjunction phrase)           56    0%
INTJ (interjection)                  31    0%
LST (list marker)                    10    0%
UCP (unlike coordinated phrase)       2    0%
Table 1: Number of chunks per phrase type
in the training data (211727 tokens, 106978
chunks).
In our example sentence in section 2, we have
used brackets for encoding text chunks. In the
data sets we have represented chunks with three
types of tags:
B-X  first word of a chunk of type X
I-X  non-initial word in an X chunk
O    word outside of any chunk
This representation type is based on a repre- 
sentation proposed by Ramshaw and Marcus 
(1995) for noun phrase chunks. The three tag 
groups are sufficient for encoding the chunks in 
the data since these are non-overlapping. Using 
these chunk tags makes it possible to approach 
the chunking task as a word classification task. 
We can use chunk tags for representing our ex- 
ample sentence in the following way: 
He/B-NP reckons/B-VP the/B-NP 
current/I-NP account/I-NP 
deficit/I-NP will/B-VP narrow/I-VP 
to/B-PP only/B-NP #/I-NP 
1.8/I-NP billion/I-NP in/B-PP
September/B-NP ./O 
The output of a chunk recognizer may contain 
inconsistencies in the chunk tags in case a word 
tagged I-X follows a word tagged O or I-Y, with 
X and Y being different. These inconsistencies 
can be resolved by assuming that such I-X tags 
start a new chunk. 
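
The decoding of chunk tags into chunks, including the repair rule for inconsistent I-X tags, can be sketched as follows. This is an illustrative sketch rather than the evaluation software used for the shared task. The function name tags_to_chunks and the (type, start, end) span format with an exclusive end index are assumptions; an I-X tag directly after B-Y with a different type is also treated as a chunk start, which the text above does not spell out.

```python
def tags_to_chunks(tags):
    """Convert a chunk tag sequence to (type, start, end) spans, end exclusive."""
    chunks = []
    current_type, start = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, ctype = "O", None
        else:
            prefix, ctype = tag.split("-", 1)
        # A chunk starts at every B-X, and at an I-X whose type differs from
        # the chunk currently open (this covers I-X after O and after I-Y).
        starts_new = prefix == "B" or (prefix == "I" and ctype != current_type)
        if current_type is not None and (prefix == "O" or starts_new):
            chunks.append((current_type, start, i))
            current_type = None
        if starts_new:
            current_type, start = ctype, i
    if current_type is not None:
        chunks.append((current_type, start, len(tags)))
    return chunks

example = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP",
           "B-PP", "B-NP", "I-NP", "I-NP", "I-NP", "B-PP", "B-NP", "O"]
example_chunks = tags_to_chunks(example)
repaired = tags_to_chunks(["O", "I-NP", "I-VP"])
```

For the example sentence this yields eight chunks, and the inconsistent sequence O, I-NP, I-VP is decoded as one NP chunk followed by one VP chunk.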
The performance on this task is measured 
with three rates. First, the percentage of 
detected phrases that are correct (precision). 
Second, the percentage of phrases in the 
data that were found by the chunker (recall). 
And third, the $F_{\beta=1}$ rate, which is equal to
$(\beta^2 + 1) \cdot \text{precision} \cdot \text{recall} / (\beta^2 \cdot \text{precision} + \text{recall})$
with $\beta = 1$ (van Rijsbergen, 1975). The latter
rate has been used as the target for
optimization 4.
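
The three rates can be computed from two sets of chunks as in the sketch below. It is an illustration under stated assumptions, not the official scoring script: chunks are assumed to be hashable tuples such as (type, start, end), a predicted chunk counts as correct only if an identical tuple occurs in the gold data, and the names f_beta and evaluate are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """F rate as defined above; returns 0.0 when both rates are zero."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator

def evaluate(gold_chunks, predicted_chunks, beta=1.0):
    """Return (precision, recall, F) for two collections of chunk tuples."""
    gold, predicted = set(gold_chunks), set(predicted_chunks)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall, beta)

gold = [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 6)]
predicted = [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 5), ("NP", 5, 6)]
precision, recall, f_score = evaluate(gold, predicted)
```

In this toy case two of the four predicted chunks are correct and two of the three gold chunks are found, so precision is 1/2, recall is 2/3 and the F rate is 4/7. With $\beta = 1$ the formula reduces to the harmonic mean of precision and recall; for instance, a precision of 93.45 and a recall of 93.51 give 93.48 after rounding.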
5 Results 
The eleven systems that have been applied to 
the CoNLL-2000 shared task can be divided into
four groups:
1. Rule-based systems: Vilain and Day; Johansson;
Déjean.
2. Memory-based systems: Veenstra and Van 
den Bosch. 
3. Statistical systems: Pla, Molina and Pri- 
eto; Osborne; Koeling; Zhou, Tey and Su. 
4. Combined systems: Tjong Kim Sang; Van 
Halteren; Kudoh and Matsumoto. 
Vilain and Day (2000) approached the shared 
task in three different ways. The most success- 
ful was an application of the Alembic parser 
which uses transformation-based rules. Johans- 
son (2000) uses context-sensitive and context- 
free rules for transforming part-of-speech (POS) 
tag sequences to chunk tag sequences. Déjean
(2000) has applied the theory refinement sys- 
tem ALLiS to the shared task. In order to ob- 
tain a system which could process XML format- 
ted data while using context information, he 
has used three extra tools. Veenstra and Van 
den Bosch (2000) examined different parameter
settings of a memory-based learning algorithm.
They found that the modified value difference
metric, applied to POS information only,
worked best.
A large number of the systems applied to
the CoNLL-2000 shared task use statistical
methods. Pla, Molina and Prieto (2000) use
a finite-state version of Markov Models. They
started by using POS information only and
obtained a better performance when lexical 
information was used. Zhou, Tey and Su 
(2000) implemented a chunk tagger based on 
HMMs. The initial performance of the tag- 
ger was improved by a post-process correction 
method based on error driven learning and by 
incorporating chunk probabilities generated by
a memory-based learning process. The two
other statistical systems use maximum-entropy
based methods. Osborne (2000) trained Ratnaparkhi's
maximum-entropy POS tagger to output
chunk tags. Koeling (2000) used a standard
maximum-entropy learner for generating
chunk tags from words and POS tags. Both
have tested different feature combinations before
finding an optimal one and their final results
are close to each other.
4 In the literature about related tasks sometimes the
tagging accuracy is mentioned as well. However, since
the relation between tag accuracy and chunk precision
and recall is not very strict, tagging accuracy is not a
good evaluation measure for this task.
test data                     precision  recall   $F_{\beta=1}$
Kudoh and Matsumoto           93.45%     93.51%   93.48
Van Halteren                  93.13%     93.51%   93.32
Tjong Kim Sang                94.04%     91.00%   92.50
Zhou, Tey and Su              91.99%     92.25%   92.12
Déjean                        91.87%     91.31%   92.09
Koeling                       92.08%     91.86%   91.97
Osborne                       91.65%     92.23%   91.94
Veenstra and Van den Bosch    91.05%     92.03%   91.54
Pla, Molina and Prieto        90.63%     89.65%   90.14
Johansson                     86.24%     88.25%   87.23
Vilain and Day                88.82%     82.91%   85.76
baseline                      72.58%     82.14%   77.07
Table 2: Performance of the eleven systems on the test data. The baseline results have been
obtained by selecting the most frequent chunk tag for each part-of-speech tag.
Three systems use system combination. 
Tjong Kim Sang (2000) trained and tested five 
memory-based learning systems to produce dif- 
ferent representations of the chunk tags. A 
combination of the five by majority voting per- 
formed better than the individual parts. Van 
Halteren (2000) used Weighted Probability Dis- 
tribution Voting (WPDV) for combining the 
results of four WPDV chunk taggers and a 
memory-based chunk tagger. Again the com- 
bination outperformed the individual systems. 
Kudoh and Matsumoto (2000) created 231 sup- 
port vector machine classifiers to predict the 
unique pairs of chunk tags. The results of the 
classifiers were combined by a dynamic pro- 
gramming algorithm. 
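
Majority voting over the output of several chunkers can be sketched as follows. This is a simplified illustration and not the combination method of any participating system: it assumes that all systems have already been mapped to the same tag representation and that their tag sequences are aligned token by token. The function name majority_vote and the three toy sequences are hypothetical.

```python
from collections import Counter

def majority_vote(tag_sequences):
    """Pick, for every token position, the tag proposed by most systems."""
    combined = []
    for tags_at_position in zip(*tag_sequences):
        # most_common(1) returns the tag with the highest count; with equal
        # counts the tag that was encountered first is returned.
        combined.append(Counter(tags_at_position).most_common(1)[0][0])
    return combined

system_a = ["B-NP", "I-NP", "O"]
system_b = ["B-NP", "B-VP", "O"]
system_c = ["B-NP", "I-NP", "B-NP"]
voted = majority_vote([system_a, system_b, system_c])
```

Using an odd number of voters, as in the five-system combination mentioned above, reduces the number of ties; the resulting tag sequence may still contain inconsistent I-X tags, which can be resolved as described in section 4.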
The performance of the systems can be found 
in Table 2. A baseline performance was ob- 
tained by selecting the chunk tag most fre- 
quently associated with a POS tag. All systems 
outperform the baseline. The majority of the 
systems reached an $F_{\beta=1}$ score between 91.50
and 92.50. Two approaches performed a lot 
better: the combination system WPDV used by 
Van Halteren and the Support Vector Machines 
used by Kudoh and Matsumoto. 
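
The baseline described above can be sketched in a few lines. The code below is an illustration and not the script that produced the reported baseline scores: the function names, the toy training sentences with their POS tags, and the choice of O for POS tags not seen in training are all assumptions.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Map every POS tag to the chunk tag it most frequently occurs with."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for _word, pos, chunk in sentence:
            counts[pos][chunk] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}

def apply_baseline(model, pos_tags, default="O"):
    """Tag a POS sequence; unseen POS tags get the default tag (assumption)."""
    return [model.get(pos, default) for pos in pos_tags]

# Toy training data for illustration only.
training = [
    [("the", "DT", "B-NP"), ("deficit", "NN", "I-NP"), (".", ".", "O")],
    [("account", "NN", "B-NP"), ("deficit", "NN", "I-NP")],
]
model = train_baseline(training)
predicted_tags = apply_baseline(model, ["DT", "NN", "XX"])
```

Such a tagger ignores both the words and the context, which is why every participating system, all of which use at least some contextual information, outperforms it.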
6 Related Work 
In the early nineties, Abney (1991) proposed 
to approach parsing by starting with finding 
related chunks of words. By then, Church 
(1988) had already reported on recognition 
of base noun phrases with statistical meth- 
ods. Ramshaw and Marcus (1995) approached 
chunking by using a machine learning method. 
Their work has inspired many others to study 
the application of learning methods to noun 
phrase chunking 5. Other chunk types have not 
received the same attention as NP chunks. The 
most complete work is Buchholz et al. (1999), 
which presents results for NP, VP, PP, ADJP 
and ADVP chunks. Veenstra (1999) works with 
NP, VP and PP chunks. Both he and Buchholz 
et al. use data generated by the script that pro- 
duced the CoNLL-2000 shared task data sets. 
Ratnaparkhi (1998) has recognized arbitrary 
chunks as part of a parsing task but did not report
on the chunking performance. Part of the
Sparkle project has concentrated on finding various
sorts of chunks for the different languages
(Carroll et al., 1997).
5 An elaborate overview of the work done on noun
phrase chunking can be found on
http://lcg-www.uia.ac.be/~erikt/research/np-chunking.html
7 Concluding Remarks
We have presented an introduction to the 
CoNLL-2000 shared task: dividing text into 
syntactically related non-overlapping groups of 
words, so-called text chunking. For this task we 
have generated training and test data from the 
Penn Treebank. This data has been processed 
by eleven systems. The best performing system 
was a combination of Support Vector Machines 
submitted by Taku Kudoh and Yuji Matsumoto. 
It obtained an $F_{\beta=1}$ score of 93.48 on this task.
Acknowledgements 
We would like to thank the members of 
the CNTS - Language Technology Group in 
Antwerp, Belgium and the members of the ILK 
group in Tilburg, The Netherlands for valuable 
discussions and comments. Tjong Kim Sang is 
funded by the European TMR network Learn- 
ing Computational Grammars. Buchholz is sup- 
ported by the Netherlands Organization for Sci- 
entific Research (NWO). 

References 
Steven Abney. 1991. Parsing by chunks. In
Principle-Based Parsing. Kluwer Academic Publishers.
Ann Bies, Mark Ferguson, Karen Katz, and Robert
MacIntyre. 1995. Bracket Guidelines for Treebank II
Style Penn Treebank Project. Penn Treebank II cdrom.
Eric Brill. 1994. Some advances in rule-based
part of speech tagging. In Proceedings of the
Twelfth National Conference on Artificial Intelligence
(AAAI-94). Seattle, Washington.
Sabine Buchholz, Jorn Veenstra, and Walter Daelemans.
1999. Cascaded grammatical relation assignment.
In Proceedings of EMNLP/VLC-99.
Association for Computational Linguistics.
John Carroll, Ted Briscoe, Glenn Carroll, Marc
Light, Dethleff Prescher, Mats Rooth, Stefano
Federici, Simonetta Montemagni, Vito Pirrelli,
Irina Prodanof, and Massimo Vanocchi. 1997.
Phrasal Parsing Software. Sparkle Work Package
3, Deliverable D3.2.
Kenneth Ward Church. 1988. A stochastic parts
program and noun phrase parser for unrestricted
text. In Second Conference on Applied Natural
Language Processing. Austin, Texas.
Hervé Déjean. 2000. Learning syntactic structures
with xml. In Proceedings of CoNLL-2000 and
LLL-2000. Lisbon, Portugal.
Christer Johansson. 2000. A context sensitive maximum
likelihood approach to chunking. In Proceedings
of CoNLL-2000 and LLL-2000. Lisbon,
Portugal.
Rob Koeling. 2000. Chunking with maximum entropy
models. In Proceedings of CoNLL-2000 and
LLL-2000. Lisbon, Portugal.
Taku Kudoh and Yuji Matsumoto. 2000. Use of support
vector learning for chunk identification. In
Proceedings of CoNLL-2000 and LLL-2000. Lisbon,
Portugal.
Mitchell P. Marcus, Beatrice Santorini, and
Mary Ann Marcinkiewicz. 1993. Building a large
annotated corpus of english: the penn treebank.
Computational Linguistics, 19(2).
Miles Osborne. 2000. Shallow parsing as part-of-speech
tagging. In Proceedings of CoNLL-2000
and LLL-2000. Lisbon, Portugal.
Ferran Pla, Antonio Molina, and Natividad Prieto.
2000. Improving chunking by means of
lexical-contextual information in statistical language
models. In Proceedings of CoNLL-2000 and
LLL-2000. Lisbon, Portugal.
Lance A. Ramshaw and Mitchell P. Marcus. 1995.
Text chunking using transformation-based learning.
In Proceedings of the Third ACL Workshop
on Very Large Corpora. Association for Computational
Linguistics.
Adwait Ratnaparkhi. 1998. Maximum Entropy
Models for Natural Language Ambiguity Resolution.
PhD thesis Computer and Information Science,
University of Pennsylvania.
Erik F. Tjong Kim Sang. 2000. Text chunking by
system combination. In Proceedings of CoNLL-2000
and LLL-2000. Lisbon, Portugal.
Hans van Halteren. 2000. Chunking with wpdv
models. In Proceedings of CoNLL-2000 and LLL-2000.
Lisbon, Portugal.
C.J. van Rijsbergen. 1975. Information Retrieval.
Butterworths.
Jorn Veenstra and Antal van den Bosch. 2000.
Single-classifier memory-based phrase chunking.
In Proceedings of CoNLL-2000 and LLL-2000.
Lisbon, Portugal.
Jorn Veenstra. 1999. Memory-based text chunking.
In Nikos Fakotakis, editor, Machine learning in
human language technology, workshop at ACAI
99.
Marc Vilain and David Day. 2000. Phrase parsing
with rule sequence processors: an application to
the shared conll task. In Proceedings of CoNLL-2000
and LLL-2000. Lisbon, Portugal.
GuoDong Zhou, Jian Su, and TongGuan Tey. 2000.
Hybrid text chunking. In Proceedings of CoNLL-2000
and LLL-2000. Lisbon, Portugal.
