STANDARDISATION EFFORTS ON THE LEVEL OF 
DIALOGUE ACT IN THE MATE PROJECT 
Marion Klein 
DFKI German Research Center for Artificial Intelligence GmbH 
Stuhlsatzenhausweg 3 
66123 Saarbrficken 
Marion.Klein@dfki.de 
http ://www. dfki. de/~mklein 
Abstract 
This paper describes the state of the art of coding 
schemes for dialogue acts and the efforts to estab- 
lish a standard in this field. We present a review 
and comparison of currently available schemes and 
outline the comparison problems we had due to do- 
main, task, and language dependencies of schemes. 
We discuss solution strategies which have in mind 
the reusability of corpora. Reusability is a cru- 
cial point because production and annotation of 
corpora is very time and cost consuming but the 
current broad variety of schemes makes reusability 
of annotated corpora very hard. The work of this 
paper takes place in the framework of the Euro- 
pean Union funded MATE project. MATE aims 
to develop general methodological guidelines for 
the creation, annotation, retrieval and analysis of 
annotated corpora. 
INTRODUCTION 
Over the last years, corpus based approaches have 
gained significant importance in the field of natu- 
ral language processing (NLP). Large corpora for 
many different languages are currently being col- 
lected all over the world, like 
• \[British National Corpus\], 
• \[The TRAINS Spoken Dialogue Corpus\], or 
• \[The Child Language Data\]. 
In order to use this amount of data for training and 
testing purposes of NLP systems, corpora have to 
be annotated in various ways by adding, for exam- 
ple, prosodic, syntactic, or dialogue act informa- 
tion. This annotation assumes an underlying cod- 
ing scheme. The way such schemes are designed 
depends on the task, the domain, and the linguis- 
tic phenomena on which developers focus. The au- 
thor's own style and scientific background also has 
its effects on the scheme. So far, standardisation in 
the field of dialogue acts is missing and reusability 
of annotated corpora in various projects is com- 
plicated. On the other hand reusability is needed 
to reduce the costs of corpus production and an- 
notation time. 
The participating sites of the EU sponsored 
project MATE (Multi level Annotation Tools En- 
gineering) reviewed the world-wide approaches, 
available schemes \[Klein et al.1998\], and tools 
on spoken dialogue annotation \[Isard et ai.1998\]. 
The project builds its own workbench of inte- 
grated tools to support annotation, evaluation, 
statistical analysis and mapping between differ- 
ent formats. MATE also aims to develop a pre- 
liminary form of standard concerning annotation 
schemes on various levels to support the reusabil- 
ity of corpora and schemes. 
In this paper we focus on the level of dia- 
logue acts. We outline the results of the com- 
parison of the reviewed coding schemes based on 
\[Klein et al.1998\] and discuss best practice tech- 
niques for annotation of mass data on the level 
of dialogue acts. These techniques are considered 
as a first step towards a standard on the level of 
dialogue acts. 
COMPARISON OF CODING 
SCHEMES 
Plenty of research has been done in the field of an- 
notation schemes and many schemes for different 
purposes exist. Not all of these schemes can be an- 
notated reliably and are suitable for reuse. In the 
following we state guidelines we have developed for 
selecting most appropriate schemes and represent 
the results of our scheme comparison according to 
these guidelines. 
Firstly, it is important for us that there is a 
coding book provided for a scheme. Without def- 
35 
inition of a tag set, decision trees, and annotation 
examples, a scheme is hard to apply. Also the 
scheme has to show that it is easy to handle which 
means it should have been successfully used by a 
reasonable number of annotators on different lev- 
els of expertise. For reusability reasons, language, 
task, and domain independence is required. Ad- 
ditionally, it is crucial that the scheme has been 
applied to large corpora. The annotation of mass 
data is the best indicator for the usability of a 
scheme. Finally, it was judged positive if schemes 
directly proved their reliability by providing a nu- 
merical evaluation of inter-coder agreement, e.g. 
the ~-value \[Carletta1996\]. 
Information about schemes was collected from 
the world wide web, from recent proceedings and 
through personal contact. We compared 16 dif- 
ferent schemes, developed in the UK, Sweden, the 
US, Japan, the Netherlands, and Germany. Most 
of these schemes were applied to English language 
data. Only three of the reviewed schemes were 
annotated in corpora of more than one language, 
and thus, indicate some language independence. 
A drawback in reusing schemes for different 
purposes is tailoring them to a certain domain or 
task. Nevertheless, most of the ongoing projects in 
corpus annotation look at two-agent, task-oriented 
dialogues, in which the participants collaborate to 
solve some problem. These facts are also reflected 
in the observed schemes which were all designed 
for a certain task and/or used in a specific domain. 
With regard to the evaluation guidelines 
stated above we can positively mention that all 
schemes provide coding books. Also, all schemes 
were applied to corpora of reasonable size (10 K 
- 16 MB data). In 14 cases expertised annotators 
were employed to apply the schemes which leads 
to the assumption that these schemes are rather 
difficult to use. The inter-coder agreement, given 
by 10 of the schemes, shows intermediate to good 
results. 
The comparison of tag sets was performed 
differently with regard to higher and lower or- 
der categories. The definition of higher order 
categories was mainly driven by the linguistic, 
e.g. \[Sacks and Schegloff1973\], and/or philosoph- 
ical theories, e.g. \[Searle1969\], the schemes were 
based on. Whereas definitions and descriptions 
of lower order categories were influenced by the 
underlying task the scheme was designed for, e.g. 
information retrieval, and the domain of the cor- 
pus the scheme was applied to, e.g. conversation 
between children. 
The only higher order aspect that was implic- 
itly or explicitly covered in all schemes was for- 
ward and backward looking functionality. This 
means that a certain dialogue segment is related 
to a previous dialogue part, like a "RESPONSE", 
or to the following dialogue part, like a "CLAIM" 
that forces a reaction from the dialogue partner. 
On the level of lower order tags we could see 
tags 
• with nearly equivalent definitions, e.g. 
- the dialogue act "REQUEST" definition in D. 
Traum's scheme: 
"The speaker aims to get the hearer to per- 
form some action." 
\[Traum1996\] 
compared to 
- the dialogue act "RA" definition in S. Condon 
& C. Cech's scheme: 
"Requests for action function to indicate 
that the speaker would like the hearer(s) 
to do something \[...\]" 
\[Condon and Cech1995\] 
compared to 
- the dialogue act "REQUEST" definition in 
the VERBMOBIL scheme: 
"If you realise that the speaker requests 
some action from the hearer \[...\] you use 
the dialogue act REQUEST" 
\[Alexandersson et a1.1998\]; 
• which broadly seem to cover the same feature 
with slightly different description facettes, e.g. 
- the dialogue act "OPEN-OPTION" definition 
in the DAMSL scheme: 
"It suggests a course of action but puts no 
obligation to the listener." 
\[Allen and Core1997\] 
compared with the examples above; 
• which differ completely from the rest, e.g. 
- the dialogue act "UPDATE" definition in the 
LINLIN scheme: 
"where users provide information to the 
system" 
\[Dahlb~ck and JSnsson1998\] 
-- addressed to human-machine dialogues. 
Especially the last group can be interpreted as 
highly task or domain dependent. 
36 
HOW TO ACHIEVE A 
STANDARD 
There are several possibilities, how standardisa- 
tion on the level of dialogue acts can be achieved. 
One possibility is to develop a single, very gen- 
eral scheme. Our impression is, that such a 
new scheme which has not proven usability is 
not going to be accepted by researchers who 
want to look at a certain phenomena of interest. 
The CHAT scheme used in the CHILDES system 
\[MacWhinney\], for example, distinguishes 67 dif- 
ferent dialogue acts -- it is very unlikely that a 
general scheme would fit all of their requirements 
concerning children's conversation. 
Another possibility is to provide a set of cod- 
ing schemes for several purposes. These already 
existing coding schemes must hold the condition 
that they have proven reliability in mass data an- 
notation. As there cannot exist a scheme for every 
purpose, this approach only serves developers of 
new schemes who want to get an idea how to pro- 
ceed. With regard to the problem of standardisa- 
tion this solution is very unsatisfiable as mapping 
between schemes is often impossible, if schemes 
do not have a common ground, like the SLSA 
scheme that models feedback and own communica- 
tion management \[Nivre et al.1998\], and the AL- 
PARON scheme \[van Vark and de Vreught1996\] 
with the primary objective to analyse the previ- 
ously mentioned dialogues to model information 
transfer. 
The Discourse Resource Initiative (DRI) 
group provided input on a third possibility: De- 
veloping best practice methods for scheme design, 
documentation and annotation. 
Scheme Design 
We can classify the existing schemes in two cat- 
egories: multi-dimensional and single-dimensional 
schemes. 
Multi-dimensional schemes are based on 
the assumption that an utterance covers sev- 
eral different orthogonal aspects, called dimen- 
sions. Each dimension can be labeled. DAMSL 
\[Allen and Core1997\], for instance, is a scheme 
that implements a four dimensional hierarchy. 
These dimensions are tailored to two-agent, task- 
oriented, problem-solving dialogues. Suggested di- 
mensions are 
• Communicative Status which records whether 
an utterance is intelligible and whether it was 
37 
successfully completed, 
• In\]ormation Level which represents the seman- 
tic content of an utterance on an abstract level, 
• Forward Looking Function which describes how 
an utterance constraints the future beliefs and 
actions of the participants, and affects the dis- 
course, and 
• Backward Looking Function which characterizes 
how an utterance relates to the previous dis- 
course. 
Single-dimensional schemes consist of one sin- 
gle list of possible labels. Their labels belong 
basically to what is called Forward and Back- 
ward Looking Functions in DAMSL. Apart from 
DAMSL all observed schemes belong to this cate- 
gory. 
Comparing both categories, the multi-di- 
mensional approach is more linguistically moti- 
vated presenting a clear modeling of theoreti- 
cal distinctions; but annotation experiments have 
shown that it takes more effort to apply them than 
it is for single-dimensional schemes. On the other 
hand, although a single-dimensional scheme is eas- 
ier to annotate it is hard to judge from outside 
what kind of phenomena such a scheme tries to 
model as dimensions are merged -- a major dis- 
advantage if reusability is considered. An example 
for a dialogue act that merges DAMSL's backward 
and forward looking function is the "CHECK" 
move in the Map Task scheme. "A CHECK move 
requests the partner to confirm information that 
the speaker has some reason to believe, but is not 
entirely sure about" \[Carletta et al.1996\]. This re- 
flects the forward looking aspect of such a dia- 
logue act. "However, CHECK moves are almost 
always about some information which the speaker 
has been told" \[Carletta et al.1996\] -- a descrip- 
tion that models the backward looking functional- 
ity of a dialogue act. 
Our suggestion to tackle the problem of what 
kind of scheme is most appropriate, is to use 
single- and multi-dimensional schemes in parallel. 
The developer of a new scheme is asked to think 
precisely what kind of phenomena will be explored 
and what kind of tags are needed for this pur- 
pose. These tags have to be classified with regard 
to the dimension they belong to. The theoretical 
multi-dimensional scheme will then be applied to 
some test corpora. The example annotation shows 
which tags are less used than others and which 
tag combinations often occur together. Based on 
this information the scheme designer can derive a 
flattened single-dimensional version of the multi- 
dimensional scheme. The flattened or merged 
scheme is used for mass data-annotation. A map- 
ping mechanism has to be provided to convert a 
corpus from its surface structure, annotated us- 
ing the single-dimensional scheme to the internal 
structure, annotated using the multi-dimensional 
scheme. The multi-dimensional scheme can easily 
be reused and extended by adding a new dimen- 
sion. Furthermore, the corpus annotated with the 
multi-dimensional scheme is not any longer task 
dependent. 
Scheme Documentation 
Each coding scheme should provide a coding book 
to be applicable. Such a document is needed to 
help other researchers to understand why a tag 
set was designed in the way it is. Therefore the 
introduction part of a coding book should state 
the purpose, i.e. task and domain, the scheme is 
designed for, the kind of information that has been 
labeled with regard to the scheme's purpose, and 
the theory the scheme is based on. 
For detailed information about a tag, a tag set 
definition is required. Following \[Carletta1998\], 
such a definition should be mutual exclusive and 
unambiguous so that the annotator finds it easy to 
classify a dialogue segment as a certain dialogue 
act. Also the definition should be intention-based 
and hence easy to understand and to remember, 
so that the annotator does not have to consult 
the coding book permanently even after using the 
scheme for quite a while. 
We suggest, that a coding book should con- 
tain a decision tree that aims to give an overview 
of all possible tags and how they are related to 
each other. Additionally, the decision tree has to 
be supplemented by rules that help to navigate 
through the tree. For each node in the tree there 
should be a question which states the condition 
that has to be fulfilled in order to go to a lower 
layer in the tree. If no further condition holds, the 
current node (or leaf) in the tree represents the 
most appropriate tag. As an example of a sub- 
tree plus decision rules see Figure 1, taken from 
\[Alexandersson et ai.1998\]. 
Decision CONTROL_DIALOGUE: 
if the segment is used to open a dialogue by greet- 
ing a dialogue partner 
then label with GREET 
else if the segment is used to close a dialogue 
by saying good-bye to a dialogue partner then 
label with BYE. 
else if the segment contains the introduction of 
the speaker, i.e. name, title, associated com- 
pany etc. label with INTRODUCE 
else if the segment is used to perform an action 
of politeness like asking about the partner's 
good health or formulating compliments label 
with POLITENESS_FORMULA 
else if the segment is used to express grati- 
tude towards the dialogue partner label with 
THANK 
else if the segment is used to gain dialogue 
time by thinking aloud or using certain for- 
mulas label with DELIBERATE 
else if the segment is used to signal under- 
standing (i.e. acknowledging intact commu- 
nication) label with BACKCHANNEL 
CONTROL_DIALOGUE 
GREET 
BYE 
INTRODUCE 
/ POLITENESS_FORMULA 
THANK 
\ DELIBERATE 
BACKCHANNEL 
Figure 1 
Examples should complement the scheme's 
description. These examples should present ordi- 
nary but also problematic and more difficult cases 
of annotation. The difficulties should be briefly 
explained. 
Experiences have shown that for new coders 
tag set definitions are most important to get an 
understanding of schemes. Annotation examples 
serve as a starting point to get a feeling for an- 
notation but to manage the annotation task, deci- 
sion trees are used until coders are experienced 
enough to perform annotation without using a 
coding book. This shows how important these 
three components of a coding book are in order to 
38 
J 
give new annotators or other scientists best sup- 
port to understand and apply a coding scheme. 
To interpret the evaluation results of inter- 
coder agreement in the right way, the coding pro- 
cedure that was used for annotation should be 
mentioned. Such a coding procedure covers, for 
example, how segmentation of a corpus is per- 
formed, if multiple tagging is allowed and if so, 
is it unlimited or are there just certain combina- 
tions of tags not allowed, is look ahead permitted, 
etc.. 
For further information on coding procedures 
we want to refer to \[Dybkjmr et al.1998\] and for 
good examples of coding books see, for example, 
\[Carletta et al.1996\], \[Alexandersson et al.1998\], 
or \[Thym~-Gobbel and Levin1998\]. 
Annotation Support 
Another criterion which is important to in- 
crease the effectiveness of annotation is us- 
ing a user-friendly annotation tool. Such a 
tool also guarantees consistency, as typing er- 
rors are avoided and, hence, improves the qual- 
ity of annotated corpora. This issue is ad- 
dressed by the MATE workbench. Other, al- 
ready existing tools are the ALEMBIC Workbench 
by \[The Mitre Corporation\], NB by \[Flammia\], 
or FRINGE used in the FESTIVAL system by 
\[The Centre for Speech Technology Research\]. 
DIALOGUE ACT LEVEL 
REALISATION IN MATE 
The approach in MATE is to reuse the DAMSL 
scheme as an example for an internal multi- 
dimensional scheme and a variant of the SWBD- 
DAMSL scheme \[Jurafsky et al.1997\] as its exam- 
ple flattened surface counterpart. SWBD-DAMSL 
was derived from the original DAMSL scheme us- 
ing the techniques described above. Unfortunately 
some additional tags were added so that an exact 
mapping from one scheme to the other is not pos- 
sible any more. For this reason the MATE SWBD- 
DAMSL variant omits these additional tags. 
MATE uses XML \[The W3Ca\], a widely ac- 
cepted interchange and storage format for struc- 
tured textual data, to represent the schemes and 
the annotated corpora. 
Stylesheets (a subset of XSL \[The W3Cb\]) are 
used as a mapping mechanism between corpora 
annotated with the surface scheme and corpora 
annotated with the internal scheme. 
39 
The choice in MATE to use the W3C (World 
Wide Web Consortium) proposals is because XML 
is the latest, most flexible data exchange format 
currently available and strongly supported by in- 
dustry. XSL supplements XML insofar that it re- 
alises the formatting of an XML document. 
The facilities for dialogue act annotation are 
embedded in the MATE workbench. The work- 
bench is currently being implemented in Java 1.2 
as a platform independent approach. This makes 
the distribution process of the workbench easier 
and supports wide-spreading MATE's ideas of best 
practice in annotation. 
RELATED WORK 
Projects which are related to MATE's aim to de- 
velop a preliminary form of standard concerning 
annotation schemes are the DR/which was started 
as an effort to assemble discourse resources to 
support discourse research and application. The 
goal of this initiative is to develop a standard 
for semantic/pragmatic and discourse features of 
annotated corpora \[Carletta et al.1997\]. Another 
project, LE-EAGLES, also has the goal to pro- 
vide preliminary guidelines for the representa- 
tion or annotation of dialogue resources for lan- 
guage engineering \[Leech et al.1998\]. These guide- 
lines cover the areas of orthographic transcription, 
morpho-syntactic, syntactic, prosodic, and prag- 
matic annotation. LE-EAGLES describes most 
used schemes, markup languages, and systems for 
annotation rather than proposing standards. 
CONCLUSION AND FUTURE 
WORK 
Having reviewed a large amount of currently avail- 
able coding schemes for dialogue acts we pre- 
sented a methodology how to tackle the standard- 
isation problem. We outlined best practice for 
scheme and coding book design which hopefully 
will lead to a better understanding and reusabil- 
ity of schemes and corpora annotated using the 
proposed method. 
Our approach is currently being implemented 
in the MATE workbench and will be tested and 
enhanced in the remaining time of the project. It 
will be applied to the CSELT, Danish Dialogue 
System, CHILDES, MAPTASK and VERBMO- 
BIL corpus to help making it as task, domain and 
language independent as possible. Inadequacies 
during the testing phase which are related to the 
internal scheme we use will be discussed with the 
members of the DRI to further improve the scheme 
or its flattened variant. 
ACKNOWLEDGMENT 
I would like to thank the members of the DRI 
for stimulating and fruitful discussions and sugges- 
tions, and Norbert Reithinger, Jan Alexandersson, 
and Michael Kipp who gave valuable feedback on 
this paper. 
The work described here is part of the Euro- 
pean Union funded MATE LE Telematics Project 
LE4-8370. 

References 
\[Alexandersson et a1.1998\] J. Alexandersson, B. 
Buschbeck-Wolf, T. Fujinami, M. Kipp, 
S. Koch, E. Maier, N. Reithinger, B. Schmitz, 
and M. Siegel. 1998. Dialogue acts in verbmobil- 
2, second edition. Verbmobil Report 226. 
\[Alien and Core1997\] J. Allen and M. Core. 1997. 
Draft of damsl: Dialogue act markup in sev- 
eral layers, http://www.cs.rochester.edu:80\]re 
search/trains/annotation. 
\[British National Corpus\] British National Cor- 
pus. http://info.ox.ac.uk/bnc. 
\[Carletta et al.1996\] J. Carletta, A. Isard, S. Is- 
ard, J. Kowtko, G. Doherty-Sneddon, and 
A. Anderson. 1996. Hcrc dialogue structure cod- 
ing manual, http://www.hcrc.ed.ac.uk/jeanc/. 
\[Carlettaet a1.1997\] J. Carletta, N. Dahlb/ick, 
N. Reithinger, and M. A. Walker. 1997. Stan- 
dards for dialogue coding in natural language 
processing. Dagstuhl Seminar Report. (editors). 
\[Carletta1996\] J. Carletta. 1996. Assessing agree- 
ment on classification tasks: the kappa statis- 
tic. In Computational Linguistics, volume 22(2), 
pages 249--254. 
\[Carletta1998\] J. Carletta. 1998. The history 
of discourse representation. 2nd DRI Meeting, 
Japan. Slides. 
\[Condon and Cech1995\] S. Condon and C. Cech. 
1995. Manual for coding decision-making in- 
teractions, ftp://sls-ftp.lcs.mit.edu/pub/mul 
tiparty/coding_schemes/condon. 
\[Dahlb/ick and JSnsson1998\] 
N. Dahlb/ick and A. JSnsson. 1998. A cod- 
ing manual for the linkceping dialogue model. 
http:/ /www.cs.umd.edu/users/traum/DSD /ar 
ne2.ps. 
\[Dybkjaer et al.1998\] L. Dybkjaer, N.O. Bernsen, 
H. Dybkjaer, D. McKelvie, and A. Mengel. 1998. 
The MATE Markup Framework. MATE Deliv- 
erable D1.2. 
\[Flammia\] G. Flammia. The nb annotation tool. 
http://www.sls.lcs.mit.edu/flammia/Nb.html. 
\[Isard et a1.1998\] A. Isard, D. McKelvie, B. Cap- 
pelli, L. Dybkjaer, S. Evert, A. Fitschen, 
U. Heid, M. Kipp, M. Klein, A. Men- 
gel, and N. Reithinger M. Baun MO ller. 
1998. Specification of coding workbench. 
http: / /www.cogsci.ed.ac.uk / ,~amyi / mate/ repo 
rt.html. 
\[Jurafskyet al.1997\] D. Jurafsky, L. Shriberg, 
and D. Biasca. 1997. Switchboard swbd- 
damsl, shallow-discourse-function annotation. 
http://stripe.Colorado.EDU/"jurafsky/manual 
".angustl.html. 
\[Klein e t al.1998\] M. Klein, N. O. Bernsen, 
S. Davies, L. Dybkjaer, J. Garrido, H. Kasch, 
A. Mengel, V. Pirrelli, M. Poesio, S. Quazza, 
and C. Soria. 1998. Supported coding schemes. 
http://www.dfld.de/mate/d11. 
\[Leech et al.1998\] G. Leech, M. Weisser, and 
A. Wilson. 1998. Draft chapter: Survey and 
guidelines for the represention and annotation 
of dialogue. LE-EAGLES-WP4-3.1. Integrated 
Resources Working Group. 
\[MacWhinney\] B. MacWhinney. The childes 
project: Tools for analysing talk. http://poppy. 
psy.cmu.edu/childes/index.html. 
\[Nivre et al.1998\] J. Nivre, J. Allwood, and 
E. Ahls~n. 1998. Interactive communication 
management: Coding manual. 
\[Sacks and Schegloff1973\] H. Sacks and E. Sche- 
gloff. 1973. Opening up closing. In Semiotica, 
volume 8, pages 289--327. 
\[Searle1969\] J. R. Searle. 1969. Speech Acts. Cam- 
bridge University Press. 
\[The Centre for Speech Technology Research\] 
The Centre for Speech Technology 
Research. The festival speech synthesis system. 
http://www.cstr.ed.ac.uk/projects/festival.html. 
\[The Child Language Data\] The Child Language 
Data. http://sunger2.uia.ac.be/childes. 
\[The Mitre Corporation\] The Mitre Corporation. 
The alembic system, http://www.mitre.org/re 
sources/centers/it/g063/workbech.html. 
\[The TRAINS Spoken Dialogue Corpus\] The 
TRAINS Spoken Dialogue Corpus. http://www 
.cs.rochester.edu/research/speech/cdrom.html. 
\[The W3Ca\] The W3C. a. Extensible markup 
language, http://www.w3.org/TR/REC-xml. 
\[The W3Cb\] The W3C. b. Extensible stylesheet 
language, http://www.w3.org/TR/WD-xsl. 
\[Thym~-Gobbel and Levin1998\] 
A. Thym~-Gobbel and L. Levin. 1998. Speech 
act, dialogue game, and dialogue activity tag- 
ging manual for spanish conversational speech. 
http://www.cnbc.cmu.edu/~gobbel/clarity/ma 
nualintro.html. 
\[Traum1996\] D. Traum. 1996. Coding schemes for 
spoken dialogue structure, ftp://sls-ftp.Lcs.m 
it.edu/pub/multiparty/coding_schemes/traum. 
\[van Vark and de Vreught1996\] R.J. van 
Vark and L.J.M. Rothkrantz J.P.M. de Vreught. 
1996. Analysing ovr dialogue coding scheme 1.0. 
ftp: / /ftp.twi.tudelft.nl/TWI/publications/tech- 
reports/1996/DUT-TWI-96-137.ps.gz. 
