Semi-Automatic Generation of Dialogue Applications
in the GEMINI Project ∗
Stefan W. Hamerich, Volker Schubert, Volker Schless
TEMIC Speech Dialog Systems, Ulm, Germany
{stefan.hamerich|volker.schubert|volker.schless}@temic-sds.com
Ricardo de C´ordoba, Jos´e M. Pardo, Luis F. d’Haro
Grupo de Tecnolog´ıa del Habla, Universidad Polit´ecnica de Madrid, Madrid, Spain
{cordoba|pardo|lfdharo}@die.upm.es
Basilis Kladis, Otilia Kocsis
Knowledge S.A. (LogicDIS group), Patras, Greece
{bkladis|okocsis}@logicdis.gr
Stefan Igel
Forschungsinstitut f¨ur anwendungsorientierte Wissensverarbeitung (FAW), Ulm, Germany
sigel@faw.uni-ulm.de
Abstract
GEMINI (Generic Environment for Multilin-
gual Interactive Natural Interfaces) is an EC
funded research project, which has two main
objectives: First, the development of a flexible
platform able to produce user-friendly interac-
tive multilingual and multi-modal dialogue in-
terfaces to databases with a minimum of hu-
man effort, and, second, the demonstration of
the platform’s efficiency through the develop-
ment of two different applications based on this
platform: EG-Banking, a voice-portal for high-
quality interactions for bank customers, and
CitizenCare, an e-government platform frame-
work for citizen-to-administration interaction
which are available for spoken and web-based
user interaction.
1 Introduction
GEMINI1 exploits experience gained from previous
projects (see e.g. (Ehrlich et al., 1997; Lehtinen et al.,
2000)) and from real-world use of similar systems, to
create a generic platform for the development of user-
friendly, natural, high quality, intuitive, platform in-
dependent and multi-modal interactive interfaces to a
wide area of databases employed by information service
providers.
∗This work was partly supported by the European Com-
mission’s Information Society Technologies Programme under
contract no. IST-2001-32343. The authors are solely responsi-
ble for the contents of this publication.
1Refer to the GEMINI Project Homepage on
www.gemini-project.org for further details.
The main idea of GEMINI is that, given a database, a
description of its structure and how to access the data as
well as a list of the kinds of requests the user may make,
the system should be able to automatically generate the
necessary dialogue scripts to run the service. In a sense,
this is exactly what a human call center agent does when
being trained for the job. Within the project we strive to
get as close as possible to this ideal.
Specifically, the application generation platform of the
GEMINI project contains generic dialogue components
available for adaptation to new services and languages.
Thus, generation of multilingual and multi-modal inter-
faces is achieved by incorporating the lexical and se-
mantic relations of the databases contents, reducing the
development time and facilitating the system’s mainte-
nance and transportability to different applications and
languages. Furthermore, the platform enables a high de-
gree of personalisation (i.e. user modelling, speaker ver-
ification, etc.).
This paper is organised as follows: First we describe
the application generation platform (AGP) of the GEM-
INI project. Afterwards we introduce the two pilot appli-
cations developed with our platform. Next we compare
our approach with other proposals made by different re-
search groups. Finally we conclude our major findings.
2 Application Generation Platform
The main target of the GEMINI project is the develop-
ment of a platform for generating interactive, multilin-
gual and multi-modal dialogue interfaces to databases
with a minimum of cost and human effort. The AGP is
an integrated set of assistants to generate multi-modal di-
alogue applications in a semi-automatic way. Its open
and modular architecture simplifies the adaptability of
applications designed with the AGP to different use cases.
Connecting to a different database, adding a new modal-
ity or changing a scripting language can be achieved by
adding or replacing the appropriate component without
touching the other aspects of dialogue design again.
The AGP consists of assistants, which are tools (partly
with a GUI) producing models. All these models gen-
erated within the AGP are described in GDialogXML
(GEMINI Dialog XML), which is an object-oriented ab-
stract dialogue modelling language. It was created during
GEMINI for use with the AGP. See Figure 1 for an exam-
ple of the GDialogXML syntax. For a detailed descrip-
tion of GDialogXML refer to (Hamerich et al., 2003).
<Var id = "xPersonName">
<xType><Type refr = "String"/></xType>
</Var>
<Var id = "xPersonList">
<xType><Type refr = "List">
<xItemType>
<Type refr = "ObjEmbed"/>
<xClass><Class refr = "Person"/></xClass>
</xItemType></Type>
</xType>
</Var>
Figure 1: Definition of variables in GDialogXML
All models in the AGP may be saved as libraries for
future applications.
As shown in Figure 2 the AGP is not supposed to com-
plete its task without any human interaction. This is be-
cause there will always be different ways for retrieving
specific information. Consequently, the designer of dia-
logue applications has to select the preferred flow of dia-
logue manually by confirming the proposals of the AGP
components. Most of these operations are simply drag
& drop actions between various windows that contain all
relevant fields, which are automatically created from the
previous tools of the platform.
2.1 AGP Architecture
All components of the AGP are integrated into one frame-
work. This eases the use of the platform and enables the
designer to switch back and forward to different tools in
case she or he wants to add or modify certain dialogues.
In Figure 2 the architecture of the AGP is illustrated.
The whole AGP consists of three layers. These layers are
described in more detail in the following sections.
2.1.1 Framework Layer
The framework layer is the first layer of the AGP (refer
to Figure 2). It includes the application description as-
sistant (ADA), the data modelling assistant (DMA), and
the data connector modelling assistant (DCMA). As indi-
	
	 
		




		



	
  
	
 !
"



 

	


	#			 		
$		
%
&	
%'		
Figure 2: Schematic view of the AGP architecture.
cated by the black arrow in the upper left corner of Fig-
ure 2, all assistants are controlled manually.
The designer has to provide the application descrip-
tion, which mainly consists of the modalities for which
the AGP should generate dialogue scripts, the languages
for which the dialogues should be available, the dialogue
strategy for the resulting system, some settings for error
handling and a rough application description containing
the major dialogue steps and their respective slots.
The DMA helps creating the data model, which con-
sists of class descriptions. Also, the attributes and ele-
mentary types of the data are specified here. In this pro-
cess, the GUI guides the designer, and there is the possi-
bility to load libraries of previously created classes.
Furthermore the DCMA helps creating APIs and im-
plementation references for application specific data ac-
cess functions.2 These functions could then be used in
the runtime system without any knowledge of the exist-
ing database.
2The implementation of data access functions has to be done
outside of the AGP context, since special knowledge about the
database itself is needed for this.
2.1.2 Retrievals Layer
The retrievals layer (shown as the second layer in Fig-
ure 2) mainly consists of the retrieval modelling assis-
tant (RMA). This layer is modality and language inde-
pendent, therefore no language or modality specific data
is included here.
The designer uses the RMA to create the abstract dia-
logue flow. It provides a user-friendly interface where the
design process is accelerated. Two main sources of infor-
mation are used to automate the process: the data model
and the data connector. Using the information in the data
model, several dialogues are automatically generated: (1)
candidate dialogues for attributes that the user should be
asked for (we call them ’get information dialogues’) and
(2) another dialogue where that specific attribute is pre-
sented by the system (’say information dialogues’). At
the same time, all procedures from the data connector are
available to the designer, who can drag & drop any of the
dialogues mentioned so far.
In the ideal situation, where a dialogue only depends
on items from the data model, it can be modelled with
just three drag actions: (1) drag & drop a get information
dialogue, (2) drag & drop a call to the database (from the
data connector), and (3) drag & drop a say information
dialogue. All the values exchanged by these three func-
tions are assigned automatically by the assistant, so the
designer just has to press ’Accept’ for all assignments.
When the dialogue depends on data not contained in
the data model (as questions to the user that do not cor-
respond to an object from the data model), the designer
can use a set of four different types of dialogues: dia-
logue based on user input / on a variable / on a sequence
/ on a loop. In all of them, conditional, switch-case and
loop constructs can be inserted. So, the designer has both
automation and a great flexibility in dialogue design.
The resulting output is called generic retrieval model
(GRM), which consists of the modality and language in-
dependent parts of a dialogue, which is mainly the appli-
cation flow. The GRM is modelled in an object-oriented
way using GDialogXML and mainly consists of dialogue
modules. A dialogue module can call other modules
as subdialogues or can jump to another top level mod-
ule. This way, the application flow of dialogues in GDi-
alogXML is modelled.
As indicated by the dashed arrow, it may be necessary
to do some manual fine tuning on the GRM, as the com-
plexity of the RMA depends on the application and may
be rather high and often there exist several ways to im-
plement the application.
2.1.3 Dialogue Layer
The dialogue layer is modality and language dependent
as now the modality extensions from the modality exten-
sion assistant (MEA) are added to the retrieval model.
In the extension files the input and output behaviour of
an application is described for a specific modality. The
current implementation of the AGP supports the genera-
tion of voice (speech modality) and web-based applica-
tions (web modality). For the speech modality the exten-
sions consist of links to grammar and prompt concepts,
which are language and modality independent. For each
language, there is a separate concept file, containing the
wording for the prompts and the names of the grammars
used. Additionally the modality extension consists of
special subdialogues which are specific for one modality
only.
All grammars and prompts of the AGP are handled in
a global library, which eases the quick and easy reuse of
several components.
The GRM is enriched by the modality extensions in
the Linker. The resulting model is called dialogue model,
which is processed by the speech script generator and/or
the web-page script generator depending on the selected
modalities in the application description. For the speech
modality VoiceXML scripts with some additional CGI
scripts are generated. The grammars are taken from the
AGP grammar library or have to be generated with the
MEA. For the web modality a web-page script is gener-
ated out of the dialogue model which enables dynamic
web pages.
For the speech modality, some more tools are relevant,
namely the language modelling tool and the vocabulary
builder.
To have the runtime system ready for use, little effort
has to be spent on manual fine tuning again. For example
the recogniser dependent settings have to be adjusted for
the VoiceXML platform.
2.2 Implementation of the AGP
The initial prototype of the AGP of the GEMINI project
was finished in summer 2003. This version’s architecture
is shown on Figure 2. In spring 2004 an extended and
improved version of the AGP will be implemented. This
version covers additional features like mixed initiative di-
alogues with over-answering, advanced user-modelling,
natural language generation, and language-identification.
As well, multilingual dialogues are possible with this fi-
nal version.
All platform components have been implemented us-
ing Qt. Due to this fact, the AGP is applicable on differ-
ent operating systems.
3 Applications
Two pilot applications have been generated using the
AGP for evaluation and validation. All these applications
are generated in a very user friendly way, taking into ac-
count the automatic multi-modal error handling capabil-
ities of the AGP, refer to (Wang et al., 2003) for more
details about the error handling in GEMINI.
3.1 EG-Banking
The voice banking application called EG-Banking appli-
cation constitutes a voice portal for user-friendly, high-
quality interactions for bank-customers. The main func-
tionality of EG-Banking includes a general information
part (covering credit cards, accounts, loans infos) avail-
able to the public and a transaction part (covering account
flow, account balance, statements, etc.) available to cus-
tomers of the bank only. The multi-lingual application is
accessible via a cellular or fixed network telephone.
A manually refined version of the generated applica-
tion is installed at Egnatia Bank in Greece and is used as
a commercial product for phone banking.
3.2 CitizenCare
CitizenCare is an e-government dialogue system for
citizen-to-administration interaction (via multiple chan-
nels like internet and public terminals), filled with con-
tent for an exemplary community. The main functional-
ity is an interactive authority and information guide, pro-
viding different views like an administrative view, based
on the hierarchical structure of the authorities, and a
concern-oriented view, giving the citizen all the informa-
tion needed to make use of services offered by public ad-
ministration authorities.
4 Comparison to Other Approaches
The GEMINI approach for setting up new dialogue ap-
plications differs in a lot of points from other proposals.
In this section we compare our AGP with other existent
approaches.
Compared with the REWARD system from (Brøndsted
et al., 1998) the GEMINI AGP allows the generation of
dialogues for several modalities. Additionally in GEM-
INI we generate dialogues in standardised description
languages (VoiceXML and XHTML), so we have no need
to develop a special runtime system. As done for the RE-
WARD system, we focused a lot on reusability.
In (Polifroni et al., 2003) a rapid development envi-
ronment for speech dialogues from online resources was
described. The development process there first takes
knowledge from various web applications and composes
a database from it. This is one of the differences to our
approach. Our AGP requires a filled database and allows
the development of speech and web applications from it.
Because of this, we do not need to extract any knowledge,
which makes the GEMINI approach more domain inde-
pendent. Another important difference is, that the speech
dialogue applications generated by the AGP will be im-
plemented in VoiceXML, which allows the generated di-
alogues to be executed with every VoiceXML interpreter.
5 Conclusion and Future Work
In the GEMINI project we aim at the design and imple-
mentation of an application generation platform, which
generates state of the art speech and web applications.
The platform architecture is open to generate multi-
lingual dialogue applications from different databases in
several modalities. We can consider the platform a suc-
cess, as we have streamlined the design process of the
applications thanks the help of our assistants. The use of
standards (e.g. VoiceXML) places us in a good position
in the market of voice applications.
To facilitate the communication between all modules,
an abstract dialogue description language, called GDi-
alogXML, was defined, which is another important result
from the project.
Future work will cover the realisation of the improved
AGP, which will allow multi-lingual applications with
mixed initiative, overanswering, and user modelling. Ad-
ditionally a graphical control flow will be available to the
designer. Furthermore the AGP will be evaluated against
other approaches of dialogue design.

References
T. Brøndsted, B. N. Bai, and J. Ø. Olsen. 1998. The RE-
WARD Service Creation Environment, an Overview.
In Proceedings ICSLP, pages 1175–1178, Sydney,
Australia.
U. Ehrlich, G. Hanrieder, L. Hitzenberger, P. Heis-
terkamp, K. Mecklenburg, and P. Regel-Brietzmann.
1997. ACCeSS - Automated Call Center through
Speech Understanding System. In Proceedings EU-
ROSPEECH, pages 1819–1822, Rhodes, Greece.
S. W. Hamerich, Y.-F. H. Wang, V. Schubert, V. Sch-
less, and S. Igel. 2003. XML-Based Dialogue De-
scriptions in the GEMINI Project. In Proceedings of
the ’Berliner XML-Tage 2003’, pages 404–412, Berlin,
Germany.
G. Lehtinen, S. Safra, M. Gauger, J.-L. Cochard, B. Kas-
par, M. E. Hennecke, J. M. Pardo, R. de C´ordoba,
R. San-Segundo, A. Tsopanoglou, D. Louloudis, and
M. Mantakas. 2000. IDAS: Interactive Directory As-
sistance Service. In Proceedings of the international
Workshop ’Voice Operated Telecom Services’, pages
51–54, Ghent, Belgium. COST 249.
J. Polifroni, G. Chung, and S. Seneff. 2003. To-
wards the Automatic Generation of Mixed-Initiative
Dialogue Systems from Web Content. In Proceedings
EUROSPEECH, pages 193–196, Geneva, Switzerland.
Y.-F. H. Wang, S. W. Hamerich, and V. Schless. 2003.
Multi-Modal and Modality Specific Error Handling in
the GEMINI Project. In Proceedings of the ISCA
Workshop on ’Error Handling in Spoken Dialogue Sys-
tems’, pages 139–144, Chateau d’Oex, Switzerland.
