Introduction to the CoNLL-2002 Shared Task:
Language-Independent Named Entity Recognition
Erik F. Tjong Kim Sang
CNTS - Language Technology Group
University of Antwerp
erikt@uia.ua.ac.be
Abstract
We describe the CoNLL-2002 shared task: language-independent named entity recognition. We give background information on the data sets and the evaluation method, present a general overview of the systems that have taken part in the task and discuss their performance.
1 Introduction
Named entities are phrases that contain the
names of persons, organizations, locations,
times and quantities. Example:
[PER Wolff ] , currently a journalist in
[LOC Argentina ] , played with [PER
Del Bosque ] in the final years of the
seventies in [ORG Real Madrid ] .
This sentence contains four named entities: Wolff and Del Bosque are persons, Argentina is a location and Real Madrid is an organization. The shared task of CoNLL-2002 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. The participants of the shared task have been offered training and test data for two European languages: Spanish and Dutch. They have used the data for developing a named-entity recognition system that includes a machine learning component. The organizers of the shared task were especially interested in approaches that make use of additional nonannotated data for improving their performance.
2 Data and Evaluation
The CoNLL-2002 named entity data consists of six files covering two languages: Spanish and Dutch [1]. Each of the languages has a training file, a development file and a test file. The learning methods will be trained with the training data. The development data can be used for tuning the parameters of the learning methods. When the best parameters are found, the method can be trained on the training data and tested on the test data. The results of the different learning methods on the test sets will be compared in the evaluation of the shared task. The split between development data and test data has been chosen to prevent systems from being tuned to the test data.
All data files contain one word per line with empty lines representing sentence boundaries. Additionally each line contains a tag which states whether the word is inside a named entity or not. The tag also encodes the type of named entity. Here is a part of the example sentence:
Wolff B-PER
, O
currently O
a O
journalist O
in O
Argentina B-LOC
, O
played O
with O
Del B-PER
Bosque I-PER
Words tagged with O are outside of named entities. The B-XXX tag is used for the first word in a named entity of type XXX and I-XXX is used for all other words in named entities of type XXX. The data contains entities of four types: persons (PER), organizations (ORG), locations (LOC) and miscellaneous names (MISC). The tagging scheme is a variant of the IOB scheme originally put forward by Ramshaw and Marcus (1995). We assume that named entities are non-recursive and non-overlapping. In case a named entity is embedded in another named entity, usually only the top level entity will be marked.
[1] The data files are available from http://lcg-www.uia.ac.be/conll2002/ner/
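The file layout and tagging scheme described above can be read with a short program. The following is a minimal sketch, not the code used by the organizers: the function names `read_sentences` and `extract_entities` are hypothetical, the word is assumed to be the first column and the entity tag the last column on each line, and an I-XXX tag that does not continue an entity of the same type is assumed to start a new entity.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]        # (word, entity tag)
Entity = Tuple[int, int, str]  # (start index, end index exclusive, type)


def read_sentences(lines: Iterable[str]) -> List[List[Token]]:
    """Group one-token-per-line data into sentences at empty lines."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        fields = line.split()
        if not fields:                  # empty line marks a sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        if fields[0] == "-DOCSTART-":   # article boundary marker, not a token
            continue
        # Assumption: first column is the word, last column is the entity tag,
        # so lines with an extra part-of-speech column are accepted as well.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(tags: List[str]) -> List[Entity]:
    """Turn a sequence of O / B-XXX / I-XXX tags into entity spans."""
    entities: List[Entity] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel closes a final entity
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and start is not None and label == etype
        if not continues:
            if start is not None:            # close the entity in progress
                entities.append((start, i, etype))
                start, etype = None, None
            if prefix in ("B", "I"):         # assumption: stray I- opens an entity
                start, etype = i, label
    return entities
```

Applied to the example fragment above, `extract_entities` yields three spans: a PER entity covering the first token, a LOC entity covering "Argentina" and a PER entity covering "Del Bosque".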
The Spanish data is a collection of news wire articles made available by the Spanish EFE News Agency. The articles are from May 2000. The annotation was carried out by the TALP Research Center [2] of the Technical University of Catalonia (UPC) and the Center of Language and Computation (CLiC [3]) of the University of Barcelona (UB), and funded by the European Commission through the NAMIC project (IST-1999-12392). The data contains words and entity tags only. The training, development and test data files contain 273037, 54837 and 53049 lines respectively.
The Dutch data consists of four editions of the Belgian newspaper "De Morgen" of 2000 (June 2, July 1, August 1 and September 1). The data was annotated as a part of the Atranos project [4] at the University of Antwerp in Belgium, Europe. The annotator has followed the MITRE and SAIC guidelines for named entity recognition (Chinchor et al., 1999) as closely as possible. The data consists of words, entity tags and part-of-speech tags which have been derived by a Dutch part-of-speech tagger (Daelemans et al., 1996). Additionally the article boundaries in the text have been marked explicitly with lines containing the tag -DOCSTART-. The training, development and test data files contain 218737, 40656 and 74189 lines respectively.
The performance in this task is measured with the $F_{\beta=1}$ rate, which is equal to
$$F_{\beta} = \frac{(\beta^2+1) \cdot \text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$
with $\beta=1$ (van Rijsbergen, 1975). Precision is the percentage of named entities found by the learning system that are correct. Recall is the percentage of named entities present in the corpus that are found by the system. A named entity is correct only if it is an exact match of the corresponding entity in the data file.
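The evaluation measure can be illustrated with a small computation. This is a sketch under stated assumptions rather than the official scoring software: entities are represented as hypothetical tuples of sentence index, start, end and type, and an entity counts as correct only when all four values match, which is one way of expressing the exact-match criterion.

```python
from typing import Set, Tuple

# Assumed representation: (sentence index, start, end exclusive, type)
Entity = Tuple[int, int, int, str]


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """(beta^2 + 1) * precision * recall / (beta^2 * precision + recall)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator


def evaluate(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return precision, recall and F(beta=1) as percentages."""
    correct = len(gold & predicted)   # exact matches of boundaries and type
    precision = 100.0 * correct / len(predicted) if predicted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    return precision, recall, f_beta(precision, recall)
```

For instance, if the data contains four entities and a system proposes three, two of which match exactly, precision is 66.67, recall is 50.00 and the F rate is about 57.14. A proposed entity with a wrong boundary counts against precision and the entity it missed counts against recall.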
[2] http://www.talp.upc.es/
[3] http://clic.fil.ub.es/
[4] http://atranos.esat.kuleuven.ac.be/
3 Results
Twelve systems have participated in this shared task. The results for the test sets for Spanish and Dutch can be found in Table 1. A baseline rate was computed for both sets. It was produced by a system which only identified entities which had a unique class in the training data. If a phrase was part of more than one entity, the system would select the longest one. All systems that participated in the shared task have outperformed the baseline system.
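One possible reading of the baseline description is sketched below. The details are assumptions and not taken from the organizers' implementation: the helper names `build_lexicon` and `tag_sentence` are hypothetical, phrases are matched case-sensitively, and the sentence is scanned left to right, taking the longest known phrase at each position.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Phrase = Tuple[str, ...]


def build_lexicon(train_entities: Iterable[Tuple[Sequence[str], str]]) -> Dict[Phrase, str]:
    """Keep only the phrases that occur with exactly one class in training."""
    classes = defaultdict(set)
    for words, etype in train_entities:
        classes[tuple(words)].add(etype)
    return {phrase: next(iter(types))
            for phrase, types in classes.items() if len(types) == 1}


def tag_sentence(words: List[str], lexicon: Dict[Phrase, str]) -> List[str]:
    """Tag a sentence by longest-match lookup of known entity phrases."""
    tags = ["O"] * len(words)
    max_len = max((len(phrase) for phrase in lexicon), default=0)
    i = 0
    while i < len(words):
        # Try the longest candidate first so that longer entities win.
        for n in range(min(max_len, len(words) - i), 0, -1):
            etype = lexicon.get(tuple(words[i:i + n]))
            if etype is not None:
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                break
        else:
            i += 1   # no known phrase starts at this position
    return tags
```

With a lexicon containing both "Real Madrid" (ORG) and "Madrid" (LOC), the sequence "Real Madrid" is tagged as one organization rather than as a location, and a phrase seen with two different classes in training is never tagged.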
McNamee and Mayfield (2002) have applied support vector machines to the data of the shared task. Their system used many binary features for representing words (almost 9000). They have evaluated different parameter settings of the system and have selected a cascaded approach in which first entity boundaries were predicted and then entity classes (Spanish test set: $F_{\beta=1}$=60.97; Dutch test set: $F_{\beta=1}$=59.52).
Black and Vasilakopoulos (2002) have evaluated two approaches to the shared task. The first was a transformation-based method which generated rules in a single pass rather than in many passes. The second method was a decision tree method. They found that the transformation-based method consistently outperformed the decision trees (Spanish test set: $F_{\beta=1}$=67.49; Dutch test set: $F_{\beta=1}$=56.43).
Tsukamoto, Mitsuishi and Sassano (2002) used a stacked AdaBoost classifier for finding named entities. They found that cascading classifiers helped improve performance. Their final system consisted of a cascade of five learners, each of which performed 10,000 boosting rounds (Spanish test set: $F_{\beta=1}$=71.49; Dutch test set: $F_{\beta=1}$=60.93).
Malouf (2002) tested different models with the shared task data: a statistical baseline model, a Hidden Markov Model and maximum entropy models with different features. The latter proved to perform best. The maximum entropy models benefited from extra features which encoded capitalization information, positional information and information about the current word being part of a person name earlier in the text. However, incorporating a list of person names in the training process did not help (Spanish test set: $F_{\beta=1}$=73.66; Dutch test set: $F_{\beta=1}$=68.08).
Jansche (2002) employed a first-order Markov model as a named entity recognizer. His system used two separate passes, one for extracting entity boundaries and one for classifying entities. He evaluated different features in both subprocesses. The categorization process was trained separately from the extraction process but that did not seem to have harmed overall performance (Spanish test set: $F_{\beta=1}$=73.89; Dutch test set: $F_{\beta=1}$=69.68).
Patrick, Whitelaw and Munro (2002) present SLINERC, a language-independent named entity recognizer. The system uses tries as well as character n-grams for encoding word-internal and contextual information. Additionally, it relies on lists of entities which have been compiled from the training data. The overall system consists of six stages, three regarding entity recognition and three for entity categorization. Stages use the output of previous stages for obtaining an improved performance (Spanish test set: $F_{\beta=1}$=73.92; Dutch test set: $F_{\beta=1}$=71.36).
Tjong Kim Sang (2002) has applied a memory-based learner to the data of the shared task. He used a two-stage processing strategy as well: first identifying entities and then classifying them. Apart from the base classifier, his system made use of three extra techniques for boosting performance: cascading classifiers (stacking), feature selection and system combination. Each of these techniques was shown to be useful (Spanish test set: $F_{\beta=1}$=75.78; Dutch test set: $F_{\beta=1}$=70.67).
Burger, Henderson and Morgan (2002) have evaluated three approaches to finding named entities. They started with a baseline system which consisted of an HMM-based phrase tagger. They gave the tagger access to a list of approximately 250,000 named entities and the performance improved. After this, several smoothed word classes derived from the available data were incorporated into the training process. The system performed better with the derived word lists than with the external named entity lists (Spanish test set: $F_{\beta=1}$=75.78; Dutch test set: $F_{\beta=1}$=72.57).
Cucerzan and Yarowsky (2002) approached the shared task by using word-internal and contextual information stored in character-based tries. Their system obtained good results by using part-of-speech tag information and employing the one sense per discourse principle. The authors expect a performance increase when the system has access to external entity lists but have not presented the results of this in detail (Spanish test set: $F_{\beta=1}$=77.15; Dutch test set: $F_{\beta=1}$=72.31).
Wu, Ngai, Carpuat, Larsen and Yang (2002) have applied AdaBoost.MH to the shared task data and compared the performance with that of a maximum entropy-based named entity tagger. Their system used lexical and part-of-speech information, contextual and word-internal clues, capitalization information, knowledge about entity classes of previous occurrences of words and a small external list of named entity words. The boosting techniques operated on decision stumps, that is, decision trees of depth one. They outperformed the maximum entropy-based named entity tagger (Spanish test set: $F_{\beta=1}$=76.61; Dutch test set: $F_{\beta=1}$=75.36).
Florian (2002) employed three stacked learners for named entity recognition: transformation-based learning for obtaining base-level non-typed named entities, Snow for improving the quality of these entities and the forward-backward algorithm for finding categories for the named entities. The combination of the three algorithms showed a substantially improved performance when compared with a single algorithm and an algorithm pair (Spanish test set: $F_{\beta=1}$=79.05; Dutch test set: $F_{\beta=1}$=74.99).
Carreras, Marquez and Padro (2002) have ap-
proached the shared task by using AdaBoost
applied to xed-depth decision trees. Their sys-
tem used many dierent input features contex-
tual information, word-internal clues, previous
entity classes, part-of-speechtags(Dutchonly)
and external word lists (Spanish only). It pro-
cessed the data in twostages:rst entity recog-
nition and then classication. Their system
obtained the best results in this shared task
for both the Spanish and Dutch test data sets
(Spanish test set: F
=1
=81.39;; Dutch test set:
F
=1
=77.05).
4 Concluding Remarks
We have described the CoNLL-2002 shared task: language-independent named entity recognition. Twelve different systems have been applied to data covering two Western European languages: Spanish and Dutch. A boosted decision tree method obtained the best performance on both data sets (Carreras et al., 2002).
Acknowledgements
Tjong Kim Sang is supported by IWT STWW
as a researcher in the ATRANOS project.

References
William J. Black and Argyrios Vasilakopoulos. 2002. Language-independent named entity classification by modified transformation-based learning and by decision tree induction. In Proceedings of CoNLL-2002. Taipei, Taiwan.
John D. Burger, John C. Henderson, and William T. Morgan. 2002. Statistical named entity recognizer adaptation. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Xavier Carreras, Lluís Màrquez, and Lluís Padró. 2002. Named entity extraction using AdaBoost. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Nancy Chinchor, Erica Brown, Lisa Ferro, and Patty Robinson. 1999. 1999 Named Entity Recognition Task Definition. MITRE and SAIC.
Silviu Cucerzan and David Yarowsky. 2002. Language independent NER using a unified model of internal and contextual evidence. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Walter Daelemans, Jakub Zavrel, Peter Berck, and Steven Gillis. 1996. MBT: A memory-based part of speech tagger-generator. In Proceedings of the Fourth Workshop on Very Large Corpora, pages 14-27. Copenhagen, Denmark.
Radu Florian. 2002. Named entity recognition as a house of cards: Classifier stacking. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Martin Jansche. 2002. Named entity extraction with conditional Markov models and classifiers. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Robert Malouf. 2002. Markov models for language-independent named entity recognition. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Paul McNamee and James Mayfield. 2002. Entity extraction without language-specific resources. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Jon Patrick, Casey Whitelaw, and Robert Munro. 2002. SLINERC: The Sydney language-independent named entity recogniser and classifier. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the Third ACL Workshop on Very Large Corpora, pages 82-94. Cambridge, MA, USA.
Erik F. Tjong Kim Sang. 2002. Memory-based named entity recognition. In Proceedings of CoNLL-2002. Taipei, Taiwan.
Koji Tsukamoto, Yutaka Mitsuishi, and Manabu Sassano. 2002. Learning with multiple stacking for named entity recognition. In Proceedings of CoNLL-2002. Taipei, Taiwan.
C.J. van Rijsbergen. 1975. Information Retrieval. Butterworths.
Dekai Wu, Grace Ngai, Marine Carpuat, Jeppe Larsen, and Yongsheng Yang. 2002. Boosting for named entity recognition. In Proceedings of CoNLL-2002. Taipei, Taiwan.
