Towards User-Adaptive Annotation Guidelines
Stefanie Dipper, Michael G¨otze, Stavros Skopeteas
Dept. of Linguistics
D-14415 Potsdam, Germany
fdipper,goetzeg@ling.uni-potsdam.de
skopetea@rz.uni-potsdam.de
Abstract
In this paper we address the issue of user-
adaptivity for annotation guidelines. We show
that different user groups have different needs
towards these documents, a fact neglected by
most of current annotation guidelines. We pro-
pose a formal specification of the structure of
annotation guidelines, thus suggesting a mini-
mum set of requirements that guidelines should
fulfill. Finally, we sketch the use of these speci-
fications by exemplary applications, resulting in
user-specific guideline representations.
1 Introduction
Linguistic research nowadays makes heavy use of
annotated corpora. The benefit that researchers may
gain from corpora depends to a large extent on doc-
umentation of the annotation. According to Leech’s
maxims, the guidelines that were applied in the an-
notation of the corpus should be accessible to the
user of the corpus (and thus serve as a kind of doc-
umentation), see Leech (1993).1
In this paper, we argue that annotation guidelines,
which are optimized for use by the annotators of the
corpus, often cannot serve as suitable documenta-
tion for users of the annotated corpus. We illustrate
this claim by different types of prototypical corpus
users, who have different needs with respect to doc-
umentation. Extending the proposal by MATE (Dy-
bkjaer et al., 1998), we sketch a preliminary specifi-
cation for annotation guidelines. We then show how
guidelines that are standardized in this way may be
adapted to different user needs and serve both as
guidelines, applied in the annotation process, and
documentation, used by different corpus users.
1“The annotation scheme should be based on guide-
lines which are available to the end user. Most corpora
have a manual which contains full details of the annota-
tion scheme and guidelines issued to the annotators. This
enables the user to understand fully what each instance of
annotation represents without resorting to guesswork, and
to understand in cases of ambiguity why a particular an-
notation decision was made at that point.”, Leech (1993),
cited by http://www.ling.lancs.ac.uk/monkey/
ihe/linguistics/corpus2/2maxims.htm.
This paper grew out of our work in the Son-
derforschungsbereich (SFB, collaborative research
center) on information structure at the University of
Potsdam.2 In the context of this SFB, several indi-
vidual projects collect a large amount of data of di-
verse languages and annotate them on various anno-
tation levels: phonetics/phonology, morpho-syntax,
semantics, and information structure.
Within the SFB, guidelines for the different anno-
tation levels are being created. In order to maximize
the profit of these data, we are developing standard
recommendations on the format and content of the
SFB annotation guidelines. These guidelines ought
to serve the SFB annotators as well as the research
community.
The paper is organized as follows. We first
present different user profiles with different needs
towards annotation guidelines (sec. 2). We then ana-
lyze the form and content of selected existing guide-
lines to some detail (sec. 3) and show that these
guidelines fulfill the user needs only inadequately
(sec. 4). Finally, we sketch a formal specification of
the structure of annotation guidelines and indicate
how XML/XSLT technology can be used to support
user-adaptive annotation guidelines (sec. 5).
2 Guideline Users
Annotation guidelines are used by different types of
users with different requirements. These require-
ments depend on (i) the user’s objectives and (ii)
the user’s background.
2.1 User Objectives
People are interested in annotation guidelines for
different reasons. According to their respective ob-
jectives, we define five user profiles.3
The annotator Annotators assign linguistic fea-
tures to language data, according to criteria and
2http://www.ling.uni-potsdam.de/sfb/
3In a similar way, Carletta and Isard (1999) define three
user types: the coder, the coding consumer, and the coding de-
veloper. These classes, however, refer to users of annotation
workbenches rather than annotation guidelines.
instructions specified in the annotation guidelines.
Important annotation criteria are consistency and
speed.
The corpus explorer The group of corpus explor-
ers encompasses all those who aim at exploiting lin-
guistic data in order to find evidence for or against
linguistic hypotheses. These people need to know
(i) how to find instances of specific phenomena they
are interested in, and (ii) how to interpret the anno-
tations of the phenomena in question.
The language engineer Instead of inspecting the
data “manually”, as the corpus explorer does, the
language engineer applies automatic methods to the
annotated data to process them further. This in-
cludes a variety of tasks, such as statistical evalu-
ations, training and testing of algorithms, and the
extraction of various types of linguistic information.
The guideline explorer The guidelines per se
(i.e., independently of a corpus) are of interest to,
e.g., theoretical linguists who want to know the
principles that underlie the annotation guidelines. In
addition, the guidelines may serve as an example for
authors of other annotation guidelines.
The guideline author The process of writing
guidelines is usually a time-consuming and step-
wise process. Hence, during the process of writ-
ing, the authors themselves make use of their own
guidelines to look up related or similar phenomena
that are already covered therein.
2.2 User Background
A further factor putting constraints on annotation
guidelines is the user’s background. First, (non-)
acquaintance with the language of the corpus is an
important factor: if corpora should be useful also
for people who do not or hardly know the language
of the corpus, annotation guidelines should provide
translations for example sentences and basic infor-
mation about linguistic properties of the object lan-
guage.
Second, (non-)acquaintance with theoretical
analyses of the phenomena has an impact on re-
quirements towards guidelines. People who are ac-
quainted with the linguistic theory that the guide-
lines are based on do not need theoretical introduc-
tions; an example is the Feldertheorie (field theory
of word order) in German, which serves as the basis
of the analyses in the German Verbmobil Treebank
(Stegmann et al., 2000). In addition, people who
know about alternative (competing) analyses of the
phenomena in question may want to know the rea-
sons of the chosen analysis.
3 Form and Content of Guidelines
We consider sample guidelines from different types
of annotation; all sample guidelines are available
via the internet. These guidelines have been cho-
sen to set out the diversity among different lev-
els of linguistic analysis—from morphology to
pragmatics—and among practices established in
different linguists’ communities—from typologists
to language engineers.4
Interlinear morphemic transcription EU-
ROTYP (K¨onig et al., 1993), Leipzig Glossing
Rules (Bickel et al., 2004). These guidelines deal
with the annotation of morpheme boundaries and
morpheme-by-morpheme translation (glossing);
these guidelines have been created by and for
typologists.5
Morphosyntactic annotation Penn Treebank
(POS-tagging guidelines, “POS”) (Santorini,
1995), STTS (Schiller et al., 1999). These guide-
lines have been developed by language engineers
for (semi-)automatic annotation of morphosyntactic
information.6
Syntactic annotation Penn Treebank (bracketing
guidelines, “BG”) (Bies et al., 1995), SPARKLE
(Carroll et al., 1997), VerbMobil, German Treebank
(Stegmann et al., 2000).7
Semantic/pragmatic annotation PropBank
(PropBank Project, 2002), Penn Discourse Tree-
bank (Mitsakaki et al., 2004), DAMSL (Dialog
Act Markup in Several Layers, Allen and Core
(1997)). PropBank and Penn Discourse Treebank
are extensions of the Penn Treebank.
We focus on three aspects of annotation guide-
lines: the components of guideline documents
4The sample guidelines also vary with regard to size (e.g.,
the Leipzig Glossing Rules comprise 9 pages, the Penn Tree-
bank Bracketing Guidelines 317 pages) and status (e.g., the
VerbMobil guidelines are completed, whereas guidelines such
as the Penn Discourse Treebank guidelines are still being de-
veloped).
5We consider only the rules for morphemic transcription
and not the glossing abbreviations in these documents.
6EAGLES provides recommendations for the design of
morphosyntactic tagsets (Leech and Wilson, 1996). Tagsets
represent only a component of annotation guidelines. The
STTS tagset can be viewed as an instantiation of the EAGLES
recommendations.
7A very detailed annotation scheme for syntactic, semantic
and speech annotation is available in book form for the SU-
SANNE corpus (Sampson, 1995). These guidelines are ad-
dressed primarily to the guideline explorer rather than the an-
notator. In this vein, the book provides a detailed discussion
of the annotation principles and theoretical background. We do
not include these guideline in our discussion, since they are not
available electronically.
A B C D E F G H I J K L M N O P
EUROTYP + + + + + +
Leipzig Glossing Rules + + + + + + + +
Penn Treebank (POS) + + + + + + + +
STTS + + + + + + +
Penn Treebank (BG) + + + + + + + + +
VerbMobil Treebank + + + + + + + + +
SPARKLE + + + + + + + +
PropBank + + + + + + + +
Penn Discourse Treebank + + + + + +
DAMSL + + + + + +
Document components:
A general principles
B underlying linguistic theory
C tagset declaration
D related annotation schemes
E tag index
F keyword index
Instruction components:
G keywords
H criteria
I examples
J related instructions
K alternative analyses
Instruction ordering:
L alphabetical tags
M alphabetical keywords
N content-based structure
O default–specific/exceptional
P simple–difficult
Figure 1: Features of the sample guidelines
(sec. 3.1), the components of an annotation instruc-
tion (sec. 3.2), and the ordering of instructions with
respect to each other (sec. 3.3).
3.1 Document Components
The document architecture varies to some extent in
the sample guidelines. In general, however, there is
(i) an introductory part, (ii) the main section, and
(iii) appendices. In the following, we sketch pro-
totypical components of these parts; to a large ex-
tent, these components overlap with the elements
proposed by Dybkjaer et al. (1998). The table in
fig. 1 presents an overview of most of the guide-
line components considered here. The differences
between the guidelines can (partly) be attributed to
the fact that the guidelines address different types of
users.
Introductory part This part comprises basic in-
formation such as the name of the guidelines, the
annotation goal, the type of source data, the anno-
tation markup (e.g., syntactic annotation can be en-
coded by brackets vs. graphs, etc.). In addition, it
addresses general design principles, including gen-
eral annotation conventions (A8), and the underly-
ing linguistic theory and/or statements about theo-
retical problems (B). A general tagset declaration
in the form of an exhaustive list of all admissible
tags plus a short description is often included (C).
Some guidelines refer to related annotation schemes
or standard recommendations like EAGLES (Leech
and Wilson, 1996) (D). Finally, creation notes in-
form about the authors, creation date, status of the
guidelines, etc.
8The letters refer to the table in fig. 1.
Main section This section is always devoted to
the presentation of the actual annotation guidelines,
which we call “(annotation) instructions”. These
will be discussed in detail in sec. 3.2 and 3.3.
Appendices Some guidelines provide tutorials in
the form of exercises for practicing the use of the an-
notation guidelines. Different types of indices (i.e.,
listings of items, e.g. tags, and numbers of all pages
that refer to these items) may be included: alpha-
betical index of the tags (E); thematic indices, e.g.
an index of keywords such as ‘wh-clefts’ (F). In ad-
dition, lists of specific problematic words or con-
structions may be given. Finally, some guidelines
include recommendations for annotation tools and
methods.
3.2 Instruction Components
The core component of annotation guidelines is rep-
resented by the annotation instructions. We first de-
scribe the form and content of an individual instruc-
tion before addressing the question of how the set
of instructions is ordered/structured (sec. 3.3). We
illustrate the description by two annotation instruc-
tions from the Penn Treebank (POS), displayed in
fig. 2.
An individual instruction always refers to one (or
more) tags that represent the information to be an-
notated, e.g., ‘VB’. The instruction usually provides
some sort of keywords (G) for the phenomenon in
question, e.g., ‘verb, base form’ (e.g., headers may
provide such keywords). The guidelines in the sam-
ple include annotation criteria (H) in the form of
a descriptive text (‘This tag subsumes . . . ’) and
some illustrative examples (I) (‘Do/VB it.’). Some-
Verb, base form—VB
This tag subsumes imperatives, infinitives and subjunctives.
EXAMPLES: Imperative: Do/VB it.
Infinitive: You should do/VB it. [. . . ]
Subjunctive: We suggested that he do/VB it.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
VB or VBP
If you are unsure whether a form is subjunctive (VB) or a present tense verb (VBP), replace the subject by a
third person pronoun. If the verb takes an -s ending, then the original form is a present tense verb (VBP); if
not, it is a subjunctive (VB).
EXAMPLE: I recommended that you do/VB it.
(cf. I recommended that he do/*does it.)
Figure 2: Two instructions from the Penn Treebank POS-tagging guidelines (Santorini, 1995, pp. 5, 21)
times, the guidelines also specify how to segment
the source data.
Often, the instructions make reference to other,
closely related instructions (whose annotation crite-
ria are similar to the current criteria) and emphasize
the differences between them (‘If you are unsure
whether . . . ’) (J). Finally, alternative (competing)
analyses may be given (K).9
3.3 Instruction Ordering
Guidelines present annotation instructions in a cer-
tain order. The ordering of instructions is a crucial
aspect of the instructions’ presentation: different or-
dering principles implement different perspectives
to the guidelines and, consequently, serve require-
ments of different groups of users (cf. sec. 4).
The sample guidelines make use of the following
ordering principles:
Alphabetical order of the tags (L) In the sec-
tion on problematic cases, the Penn Treebank (POS)
present the tags and their instructions in an alpha-
betical order (from ‘CC’ to ‘WDT’). (Other guide-
lines make use of this type of ordering in an addi-
tional tag index.)
Alphabetical order of keywords (M) Canonical
cases in the Penn Treebank (POS) are ordered al-
phabetically with respect to keywords (from ‘Adjec-
tive’ to ‘Wh-adverb’).
Content-based structure (N) Instructions are of-
ten presented in thematic units, e.g. all tags encod-
ing nominal features are grouped together. More-
over, complex annotation guidelines are usually or-
ganized in an hierarchical structure, with chapters,
sections, etc., which mirror the complex structure of
9The guidelines considered here at most allude implicitely
to alternative analyses: by giving arguments in favour of the
chosen analaysis.
the described phenomena. For instance, the Verb-
Mobil guidelines contain a chapter about the anno-
tation of phrasal constituents, with sections address-
ing NPs, PPs, etc., and PP subsections addressing
prepositions and circum/postpositions. In DAMSL,
criteria in the form of decision trees guide the anno-
tator through the annotation.
From default to specific/exceptional cases (O)
This is an ordering principle that is usually used
in combination with other principles. For instance,
single sentences are presented before multiple sen-
tences in the guidelines of the Penn Discourse Tree-
bank.
Degree of difficulty (P) Similarly, in combina-
tion with other ordering principles, the guidelines
often proceed from easy to difficult cases. For in-
stance, the Leipzig Glossing Rules first introduce
morphemic transcription of prefixes and suffixes.
Only later are infixes and circumfixes addressed;
these represent a problematic case for interlinear
morphemic translations due to the lack of isomor-
phism between the layer of transcription and the
layer of translation.
Usually, guidelines make use of several ordering
principles, e.g., main instructions are structured ac-
cording to content, (embedded) subinstructions are
ordered from default to specific case, and indices are
ordered alphabetically according to keywords.
4 User Requirements
Current annotation projects usually do not provide
separate guideline documents for different types of
users. Usually, annotation documentation emerges
from the annotating practice, supporting the an-
notator in the annotation task. At the publish-
ing stage, this documentation is often transferred
into a more general document, by adding informa-
tion about annotation conventions, format, methods,
etc.—however, the basic structure of the annotations
instructions remains unaltered. The obvious conse-
quence of this practice is that existing guidelines of-
ten ignore the requirements of certain types of users.
We illustrate different user requirements by some
typical examples. These requirements concern (i)
document components, (ii) instruction components,
and (iii) instruction ordering:
Annotator Typical users are annotators who are
confronted with the guidelines for the first time.
(i) Annotators primarily need a tutorial introduction
and maybe information about the annotation goals.
(ii) They have to learn specific instructions, sup-
ported by didactic examples.
(iii) The appropriate order is from default to excep-
tional or from easy to difficult. Orderings in the
form of decision trees may facilitate the acquisition.
Corpus explorer A further sample user is a re-
searcher who looks for a specific phenomenon in
the corpus:
(i) Corpus explorers need an index of phenomena
(keywords) to look up the tags that encode the phe-
nomenon they are interested in. Moreover, when
inspecting the encoding of this phenomenon, they
might come across other tags they are not yet famil-
iar with. Hence, they also need an index of tags (or
a tagset declaration) to look up the meaning of these
tags.
(ii) They need detailed information about the anno-
tation criteria. Take, for instance, a corpus that is
annotated with respect to information-structural cat-
egories and imagine a corpus explorer who is inter-
ested in topic and focus. Before looking for data,
s/he has to know the exact definitions (criteria) of
topic and focus that have been applied in the anno-
tation.
(iii) The easiest way for the corpus explorer to find
annotation criteria of phenomena and tags is by
means of an alphabetic ordering.
Language engineer Finally, language engineers
may undertake a statistical evaluation of the corpus
data:
(i) They primarily need a tagset declaration, with-
out being interested in any details. In addition, the
circumstances of the annotation are relevant (e.g.,
whether the corpus has been annotated manually,
twice, etc.).
(ii), (iii) Probably, the language engineer would not
need any information about annotation instructions.
Comparing these user requirements and the
guideline features in fig. 1, we see that the guide-
lines are more oriented towards the annotator than
the corpus explorer: Features such as indices (E, F)
are often missing, whereas the predominant instruc-
tion ordering is content-based ordering (N).
5 Towards User-Adaptive Annotation
Guidelines
In what follows, we present a preliminary guideline
specification that allows for generating user-adapted
guideline representations. In the second part, we
illustrate the applicability of the specification.
5.1 Guideline Specification
For the specification of user-adaptive guidelines we
adopt ideas from the MATE Markup Framework
(Dybkjaer et al., 1998), which uses so-called Cod-
ing Modules for the specification and representa-
tion of annotation schemes. Building upon MATE,
we define semi-formal class specifications, Guide-
line modules, which we extend with an Instruction
module for the annotation instructions.10 In con-
trast to MATE, we understand the Guideline module
as an underlying specification from which different
representations can be generated. We sketch how
the guidelines can be encoded by XML, which en-
ables the generation of user-adapted representations
through stylesheet technology (e.g. XSLT).
The Guideline module The guideline module
(see fig. 3) constitutes the basis for the specifica-
tion of annotation guidelines. It includes a subset
of the items in the MATE Coding modules and the
document components introduced and explained in
sec. 3.1. Components that can be derived automati-
cally, such as the tagset declaration and indices, are
not part of the specification, since these can be gen-
erated from the information present in the Instruc-
tion module.
The Instruction module Annotation instructions
are specified in the Instruction module. In fig. 4,
we sketch a preliminary XML representation of the
two instructions in fig. 2. The single elements and
attributes specify the instruction components exem-
plified there. In addition, the instruction for the tag
‘VB’ refers to the second instruction via the ‘re-
lated’ element, marking it as a ‘problematic case’.
The second instruction indeed helps the annotator
to decide between the assignment of two tags, ‘VB’
and ‘VBP’. For both tags, ‘criterion’ elements with
application conditions, the respective annotation ac-
tion, and examples are declared.
10The MATE Markup Framework neither addresses the en-
coding of annotation guidelines nor the issue of user-adaptivity
explicitly.
# Component Example
1 Guideline title Part-of-speech tagging guidelines for the Penn Treebank, 3rd Revision
2 Annotated information Part of speech
3 Type of source data English text
4 General principles ...
(annotation conventions & format)
5 Relation to linguistic theories ...
6 Related annotation schemes Bies et al. (1995):“Bracketing Guidelines for Treebank II Style”, . . .
7 Annotation instructions ! INSTRUCTION MODULE
8 Creation notes: authors, status, etc. Beatrice Santorini, 1995, 3rd revision
Figure 3: The Guideline module
<instruction tags="VB" keywords="verb, base form" difficulty="easy"
id="instr 1">
<text>This tag subsumes imperatives, infinitives and subjunctives</text>
<criterion>
<condition>verb in base form</condition>
<action>label VB</action>
<example comment="imperative">Do/VB it.</example>
<example comment="infinitive">You should do/VB it.</example>
<example comment="subjunctive">We suggested that he do/VB it.</example>
</criterion>
<related type="problematic case" ref="instr 23"/>
</instruction>
......................................................................................................
<instruction tags="VB, VBP" keywords="verb, subjunctive, present tense"
difficulty="medium" id="instr 23">
<text>If you are unsure whether a form is a subjunctive (VB) or a present
tense verb (VBP), replace the subject by a third person pronoun.</text>
<criterion>
<condition>verb does not take an -s ending</condition>
<action>label VB</action>
<example>I recommended that you do/VB it.</example>
<test>I recommend that he do/*does it.</test>
</criterion>
<criterion>
<condition>verb takes an -s ending</condition>
<action>label VBP</action>
</criterion>
</instruction>
Figure 4: Instructions of fig. 2 as specified in the Instruction module
5.2 Application Examples
The exemplary encoding enables the generation of
a number of various types of user-adapted guideline
representations and document components:
– For all user profiles: The ‘tags’ and ‘keywords’
attributes of the instruction elements in fig. 4 al-
low us to automatically generate indices as lists of
tag:page-number pairs (resp. keyword:page-number
pairs) and tagset declarations as tag:keyword pairs.
– For the annotator: The ‘difficulty’ attribute can be
used as a guiding principle for the creation of tu-
torial exercises for the annotator, which might start
with easy annotation examples and develop towards
more difficult instructions. Furthermore, when the
annotator annotates a certain tag, the annotation tool
may display the corresponding ‘text’ element as an
“online help” for the annotator.
– For the guideline author: When the author as-
signs keywords to the instruction s/he is currently
working on, the ‘keywords’ attribute can be used
to point to related instructions (marked by the same
keywords). The formal specification in general can
be used to support the guideline authors, by com-
pleteness and consistency checks.
6 Conclusions
Current guidelines only provide support for a subset
of the potential users. As we have shown in this pa-
per, different user types, such as annotators, corpus
explorers, language engineers, etc., require different
forms of guidelines in order to fulfill their specific
tasks related to an annotated corpus.
To answer these requirements, we propose a gen-
eral guideline structure which serves as the basis for
generation of user-adapted documents. With the use
of XML/XSLT technology, a broad variety of user-
specific applications can be realized.
It is clear that the detailed specification we pro-
pose make high demands on the guideline authors.
However, forcing the authors to fulfill requirements
such as explicitness (as for the declaration of the
exact annotation action), completeness (keywords,
examples for every instruction), etc., will result
in high-quality standardized annotation guidelines,
which we believe will pay off in greater benefit from
the annotated corpora.

References
James Allen and Mark Core. 1997. DAMSL: Di-
alog Act Markup in Several Layers. Draft;
http://www.cs.rochester.edu/
research/cisd/resources/damsl/
RevisedManual/RevisedManual.html.
Balthasar Bickel, Bernard Comrie, and Martin Haspel-
math. 2004. The Leipzig Glossing Rules. Conven-
tions for interlinear morpheme by morpheme glosses.
Max Planck Institute for Evolutionary Anthropol-
ogy and Department of Linguistics, University of
Leipzig; http://www.eva.mpg.de/lingua/
files/morpheme.html.
Ann Bies, Mark Ferguson, Karen Katz, and Robert
MacIntyre. 1995. Bracketing Guidelines for Tree-
bank II Style, Penn Treebank Project. Department
of Computer and Information Science, University
of Pennsylvania; ftp://ftp.cis.upenn.edu/
pub/treebank/doc/manual/.
Jean Carletta and Amy Isard. 1999. The MATE annota-
tion workbench: User requirements. In Proceedings
of the ACL Workshop Towards Standards and Tools
for Discourse Tagging, University of Maryland.
John Carroll, Ted Briscoe, Nicoletta Calzolari, Stefano
Federici, Simonetta Montemagni, Vito Pirrelli, Greg
Grefenstette, Antonio Sanfilippo, Glenn Carroll, and
Mats Rooth. 1997. SPARKLE Work Package 1: Spec-
ification of Phrasal Parsing. Final Report, 1997-TR-
1; http://dienst.iei.pi.cnr.it/.
Laila Dybkjaer, Niels Ole Bernsen, Hans Dybkjaer,
David McKelvie, and Andreas Mengel. 1998. The
MATE Markup Framework. MATE Deliverable D1.2,
http://mate.nis.sdu.dk/information/d12/.
Ekkehard K¨onig, Dik Bakker, ¨Oesten Dahl, Mar-
tin Haspelmath, Maria Koptjevskaja-Tamm, Chris-
tian Lehmann, and Anna Siewierska. 1993. EU-
ROTYP Guidelines. European Science Foundation
Programme in Language Typology. http://
www-uilots.let.uu.nl/ltrc/eurotyp/.
Geoffrey Leech and Andrew Wilson. 1996. EAGLES
recommendations for the morphosyntactic annotation
of corpora. Technical Report EAG-TCWG-MAC/R,
ILC-CNR, Pisa; http://www.ilc.cnr.it/
EAGLES96/annotate/annotate.html.
Geoffrey Leech. 1993. Corpus annotation schemes. Lit-
erary and Linguistic Computing, 8(4):275–281.
Eleni Mitsakaki, Rashmi Prasad, Aravind Joshi,
and Bonnie Weber. 2004. Penn Discourse Tree-
bank: Annotation Tutorial. Institute for Research
in Cognitive Science, University of Pennsylva-
nia; http://www.cis.upenn.edu/˜pdtb/
dltag-webpage-stuff/pdtb-tutorial.pdf.
PropBank Project. 2002. PropBank Annotation Guide-
lines. Version 3; http://www.cis.upenn.
edu/˜ace/propbank-guidelines-feb02.pdf.
Geoffrey Sampson. 1995. English for the computer:
The SUSANNE corpus and analytic scheme. Oxford:
Clarendon Press.
Beatrice Santorini. 1995. Part-of-Speech Tagging
Guidelines for the Penn Treebank Project. 3rd
Revision, 2nd printing; ftp://ftp.cis.upenn.
edu/pub/treebank/doc/tagguide.ps.gz;
Department of Computer and Information Science,
University of Pennsylvania.
Anne Schiller, Simone Teufel, Christine St¨ockert, and
Christine Thielen. 1999. Guidelines f¨ur das Tagging
deutscher Korpora mit STTS. http://www.ims.
uni-stuttgart.de/projekte/corplex/
TagSets/stts-1999.pdf.
Rosmary Stegmann, Heike Telljohann, and Erhard
Hinrichs. 2000. Stylebook for the German Treebank
in VERBMOBIL. Technical Report 239, Verbmobil;
http://verbmobil.dfki.de/cgi-bin/
verbmobil/htbin/decode.cgi/share/
VM-depot/FTP-SERVER/vm-reports/
report-239-00.ps.
