Some Notes on the Complexity of Dialogues * 
Jan Alexandersson
DFKI GmbH, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany
janal@dfki.de

Paul Heisterkamp
DaimlerChrysler AG, Wilhelm-Runge-Str. 11, D-89081 Ulm, Germany
paul.heisterkamp@DaimlerChrysler.com
Abstract 
The purpose of this paper is twofold. 
First, we describe some complexity 
aspects of spoken dialogue. It is 
shown that, given the internal set- 
ting of our dialogue system, it is im- 
possible to test even a small percent- 
age of the theoretically possible ut- 
terances in a reasonable amount of 
time. An even smaller part of pos- 
sible dialogues can thus be tested. 
Second, an approach for early test- 
ing of the dialogue manager of a dia- 
logue system, without the complete 
system being put together, is de- 
scribed. 
1 Introduction 
On the one hand, it is important for the de- 
velopers of a dialogue system that the system 
is robust (i.e., it does not fail or loop), easy 
to use and is efficient. On the other hand, the 
testing of a dialogue system is cumbersome 
and expensive. Factors like the effectiveness 
and naturalness of the system, as well as ro- 
bustness are problematic to evaluate. While 
test suites for analysis components have been 
around for a while, their counterparts for di- 
alogue managers (henceforth DM) are (to our 
knowledge) nonexistent. Evaluation as such has been the target of a lot of research. Recently, more or less automatic testing and evaluation methods have been proposed (e.g., Eckert et al., 1998; Scheffler and Young, 2000; Lin and Lee, 2000).

* The authors wish to thank Ralf Engel for help with the implementation and Norbert Reithinger, Tilman Becker, Christer Samuelsson and Thorsten Brants for comments on earlier drafts and fruitful discussions.

A special problem for the development and 
testing of a DM is that one often has to 
wait until the whole system (including speech 
recognizer(s) and synthesis, parser/generator 
etc.) has been integrated. Moreover, to test 
the complete system one usually has to put 
people (e.g. the system developers or beta 
testers) in front of the system, feeding it with 
"appropriate input." Using the developers of 
the system as testers has the potential dis- 
advantage that the system will just be tested 
with the type of phenomena or dialogues the 
developer has in mind. (S)he also has knowl- 
edge about the internals of the system and 
this can influence the testing in unpredictable 
ways (Araki and Doshita, 1997). Another im- 
portant factor for the testing of DMs con- 
cerned with spoken input is speech recogni- 
tion errors and their effects on the input. 
As we started this project, the following 
goals and experiences guided us: 
• It is cumbersome to test the DM with 
the complete system at hand. Although 
this testing is necessary, we would like to 
minimize the test effort necessary. 
• We must reach a status of the DM where it is as error free as possible. There must be neither technical bugs in the program itself nor logical bugs; in other words, the DM must not fail on any input.
• People behave weird (Eckert et al., 1995). To us there is no hard borderline between legal moves and non-legal moves in a dialogue. Some moves make more sense than others, but can the user be obliged to say only certain things at a certain point in a conversation? We think not! A dialogue system should be able to react to any input, however weird it might be.
• Speech recognizers make errors. For our dialogue system with a large vocabulary, the recognition rate drops to between 70 and 80% for certain problematic speakers. Consequently, every fourth or fifth word can be wrong. An average user contribution contains 5 words in the application we refer to here (Ehrlich et al., 1997), not including single-word utterances in the calculation.
Thus, every utterance may contain a 
falsely recognized word that may or may 
not be important for parsing or semantic 
construction. 
To overcome some of the problems stated 
above and to find errors as early as possi- 
ble during the course of developing a dialogue 
system, we have developed a validation tool 
- VALDIA - for the automatic testing of the 
DM. The overall goal we had in mind was to 
be able to obtain a status of the DM such that 
it at least does not contain any loops or other 
fatal (trivial) dialogue strategy errors. To be- 
come independent of the completion status 
of the overall system, we decided to peel the 
interfacing components (parser, generator,...) 
away from the DM. We now view the DM as a black box. This black box is then fed with randomly generated input in some interface language, and we observe how the DM reacts to the given input. An important prerequisite is
of course that the interface between the anal- 
ysis component and the DM is defined. 
At this point we would like to emphasize that our dialogue system is not modeled with a "finite state dialogue structure" and an "allowable syntax" for each state as described in (Scheffler and Young, 2000). In our view such a system is simple to test, since the system will just recognize those utterances it is designed to process. In such a scenario one can use the dialogue model for, e.g., enumerating every possible dialogue or generating "coherent" dialogues. Our system, on the other hand, puts no limits on what the user is allowed to say at a certain point in the dialogue, which makes the task of automatic testing non-trivial.
Ideally one would want to perform exhaustive testing of the DM with, say, all possible dialogues, i.e., sequences of user contributions and the respective system reactions. User contributions are supposed to have a maximum length in terms of semantic items. An investigation of the number of possible utterances (in terms of combinations of semantic expressions) and of the resulting possible dialogues showed that for our DM, the testing task is so complex that the universe of possible semantic expressions cannot be tested in a reasonable amount of time (see Section 3).
Looking at the complexity of the task, one is tempted to ask: "Is it possible to exhaustively produce all possible dialogues of a certain length?" Or, maybe more interestingly: "Can we feed the DM with all the generated dialogues?" In (Levin and Pieraccini, 1997)
a sketch of a method to find good dialogue 
strategies was put forward. The authors ar- 
gue that a dialogue system can be modeled 
in terms of a state space, an action set and 
a strategy. They show how one could auto- 
matically find an optimal strategy by feeding 
the system with all possible dialogues, or in 
our terminology sequences of user contribu- 
tions. We took the natural continuation of 
this: to automatically generate user contribu- 
tions or dialogues and feed them to the sys- 
tem, and then let the system find the optimal 
strategy itself. In this paper we explore some 
aspects and limitations of such an approach 
by analyzing the complexity of dialogues. We 
will, for instance, show that even if a dialogue 
manager can process one or ten or even one 
hundred user contribution(s) per second we 
cannot find an optimal strategy based on ex- 
haustive search - the search space is simply 
too large! 
The paper starts with a brief description of the architecture of the DM and the test environment for VALDIA, and a description of its input format. We then discuss the complexity of an utterance, continuing with the complexity of dialogues. Finally, VALDIA is described in more detail, and the paper closes with a discussion of relevant results and papers.

Figure 1: Schematic architecture for our dialogue system.

2 Architecture 
The dialogue system to which we first applied VALDIA (Heisterkamp and McGlashan, 1996; Ehrlich et al., 1997) was designed for answering questions about and/or selling insurance in the domain of car insurance. In case of failure or problems with the dialogue, the system passes the customer to a human operator. The architecture of the system includes an HMM-based speaker-independent speech recognizer, an island parser, DM, generator and synthesizer as depicted in Figure 1. The system also includes a database which is accessed for the retrieval of domain-specific information. It is important for this paper that the speech recognizer is not limited to "allowed user contributions" but outputs a word hypotheses lattice or the best chain, which is processed by an island parser. Thus, the input to the DM might, depending on recognition quality, consist of arbitrary sequences of semantic expressions. A basic requirement is that the DM is not allowed to fail on any of these inputs.
For testing, we peel the interfacing components away from the DM and regard the DM as a black box. It is assumed that we send a piece of input to the DM, which then reacts in a way we can observe (for instance by returning/generating some output). We assume that the DM has no notion of time. This means that to test the DM, we simply have to feed it with input and wait for it to acknowledge this by sending a responsive output request. In looking at the response, however, we have to be sensitive to effects like timeouts (e.g., the DM is "thinking" too long) and/or loops (e.g., the DM outputs the same item all the time). The utterances triggering the actions are not mentioned at all in (Levin and Pieraccini, 1997), but they are very important. In general we do not know which utterance will trigger a certain action when the DM is in a certain state, or whether the DM needs an utterance at all to perform another action. As the exhaustive validation criteria for the DM do not allow us to assume any insight into the DM itself, we simply have to feed it with all possible sequences of utterances.
Our test architecture is shown in Figure 2. We connect to the DM at the same place as the analysis. We also watch the output sent to the generator. Additionally, we watch the process status of the DM; that is, we notice if the DM fails or breaks. In that case we can restart the DM and continue the testing.
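
The black-box procedure described above can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions: the DM is modelled as a hypothetical callable `dm(utterance)` that returns one output string, and the names `probe`, `run_dialogue`, `timeout_s` and `max_repeats` are introduced here and are not part of VALDIA. It flags the three observable failure effects mentioned in the text: crashes, timeouts and loops.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple


def probe(dm: Callable[[str], str], utterance: str,
          timeout_s: float) -> Tuple[str, Optional[str]]:
    """Send one input to the black box and classify what we observe."""
    box: "queue.Queue[Tuple[str, Optional[str]]]" = queue.Queue()

    def call() -> None:
        try:
            box.put(("ok", dm(utterance)))
        except Exception:
            # Any exception inside the DM counts as a crash.
            box.put(("crash", None))

    threading.Thread(target=call, daemon=True).start()
    try:
        return box.get(timeout=timeout_s)
    except queue.Empty:
        # The DM is "thinking" too long.
        return ("timeout", None)


def run_dialogue(dm: Callable[[str], str], utterances: List[str],
                 timeout_s: float = 1.0,
                 max_repeats: int = 3) -> Tuple[str, int]:
    """Feed a sequence of utterances; return (verdict, utterances sent)."""
    last_output: Optional[str] = None
    repeats = 0
    for sent, utterance in enumerate(utterances, start=1):
        status, output = probe(dm, utterance, timeout_s)
        if status != "ok":
            return status, sent
        # Count how often the same output has been emitted in a row.
        repeats = repeats + 1 if output == last_output else 1
        last_output = output
        if repeats >= max_repeats:
            return "loop", sent
    return "ok", len(utterances)


if __name__ == "__main__":
    # A stub DM that always asks for a repetition is reported as looping.
    print(run_dialogue(lambda u: "please repeat", ["a", "b", "c", "d"]))
```

In a real test run the supervising process would restart the DM after a crash or timeout and continue with the next dialogue; that step is omitted here.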
3 Complexity 
Figure 2: Schematic architecture for ValDia.

This section puts forward some notes on the complexity of dialogue. We are aware that the discussion and the results are not necessarily generalizable because they depend on the representation of the input formalism to the DM. However, we were certainly surprised by the results ourselves, and they have consequences for the degree of coverage and testing one can achieve. For our dialogue system the semantic representation formalism is simple. It consists of propositional content represented as sequences of semantic objects, the SIL¹ representation (McGlashan et al., 1994).
Here is one example: "Ein Audi 80 Avant Quattro mit über 100 PS" ("An Audi 80 Station Wagon 4x4 with over 100 hp")

    [[type: car_type,
      [themake: manufacturer,
       value: audi],
      [thetype: type_name,
       value: achtzig],
      [theversion: version_name,
       value: avant],
      [thespecialfeature: feature_name,
       value: quattro],
      def: indef],
     [type: power,
      themeasuretype: ps,
      thevalue: [type: number,
                 cvalue: 125,
                 modus: [rel: above]],
      modus: [rel: with]]]

This representation is motivated by the fact that the analysis component is an island parser (Hanrieder, 1996), and can thus find islands or sequences of semantic objects.

¹ Semantic Interface Language

3.1 The complexity of an utterance 
The basic entity is a semantic object (S), which is an atomic item treated by the DM. The DM knows about (and thus can treat or react to) M different semantic objects. Examples of semantic objects are car_type, power, greeting, bye, integer, and year. We will not pay attention to the fact that a semantic item could be instantiated with, e.g., a street name (in the navigation domain there exist about 42,000 different names of cities in Germany, and Berlin has 11,500 different street names), but we could of course extend the discussion below accordingly, at the cost of higher complexity.
We call a user contribution an utterance. We assume that an utterance U is a (possibly empty) sequence of semantic objects. This can of course be relaxed to sequences or trees in some algebra, but for this discussion it suffices to deal with sequences; as we will see, the complexity is "complex enough" with this assumption. An utterance can consist of at most O semantic objects. In the real system an utterance is a multi-set, but for this discussion we assume it is not: each semantic object can appear at most once. Given the definitions above, we can now compute the number of possible utterances $|U|$. All sequences of a certain length $l$ are

We therefore have

$|U|$
For one of our dialogue models, concerned with car insurance, we have M = 25 and O = 9. That is, there are 25 different semantic objects, and we allow for a maximum of 9 semantic items (arbitrarily chosen by an estimate of breath length) in one utterance:

$$|U| = 1.9 \cdot 10^9$$

Now, if we would like to test whether our DM can treat all utterances or not, we will have to wait quite a while: Suppose our DM can process 10 utterances per second. Then we can process 10 · 60 · 60 = 36000 utterances per hour, 36000 · 24 = 864000 utterances per day, 7 · 864000 = 6048000 per week, or 864000 · 365 = 315360000 utterances per year. To process all possible utterances we would need more than six years!
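
The arithmetic above is easy to reproduce. The following Python sketch takes the reported figure of 1.9 · 10^9 utterances as given (it does not recompute it) and varies the processing rate; the names `SECONDS_PER_YEAR`, `NUM_UTTERANCES` and `years_to_process` are ours, chosen for illustration only.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000 seconds

# |U| as reported in the text for M = 25 and O = 9 (taken as given).
NUM_UTTERANCES = 1.9e9


def years_to_process(num_items: float, items_per_second: float) -> float:
    """Years needed to feed num_items inputs to a DM at a constant rate."""
    return num_items / (items_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    for rate in (1, 10, 100):
        years = years_to_process(NUM_UTTERANCES, rate)
        print(f"{rate:>3} utterances/s: {years:.2f} years")
```

At 10 utterances per second this gives roughly 6.02 years, in line with the "more than six years" stated above; even a DM ten times faster would still need more than seven months.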
Obviously, the current parameters of the system make the number of utterances intractable in realistic settings. Figure 3 shows how different parameter settings affect the cardinality of utterances for different values of M. The (logarithmic) y-axis represents the cardinality of utterances, and the (linear) x-axis the maximal number of semantic items in one utterance. As can be seen, for our DM, we will have to limit, e.g., the number of semantic items to 6 per utterance if we want to test all utterances in one week.
3.2 The complexity of dialogue 
A dialogue can, at least theoretically, consist of a sequence of the same utterance. Many of the dialogues will of course be non-cooperative and very unnatural or, put in other words, not legal. But, as indicated above, it is important to us that the DM does not fail on any input. For the number of all possible dialogues $|D|$ of a certain length $L$, we therefore have:

$$|D| = \underbrace{|U| \cdot |U| \cdots |U|}_{L\ \text{times}} = |U|^L$$

For our scenario 15 user contributions are not unnatural, so for L = 15 and the figures above, we have $|D| \approx 10^{140}$, which will take quite a while to process². Even if we restrict the length of the dialogues to 2, we get $1.9 \cdot 10^9 \cdot 1.9 \cdot 10^9 = 3.6 \cdot 10^{18}$ theoretically possible dialogues and can thus process just a vanishingly small part of them.
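
To get a feeling for these magnitudes, the following Python sketch evaluates |U|^L with exact integer arithmetic. It assumes the rounded value |U| = 1.9 · 10^9 from the text, so its leading digits differ from the exact figure given in the footnote; the function names are ours and purely illustrative.

```python
# Rounded |U| from the text; the exact value is slightly larger.
NUM_UTTERANCES = 1_900_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def num_dialogues(num_utterances: int, length: int) -> int:
    """Number of utterance sequences of exactly `length` contributions."""
    return num_utterances ** length


def years_to_process(count: int, per_second: int = 10) -> float:
    """Years needed to process `count` dialogues at a constant rate."""
    return count / (per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    for length in (2, 15):
        count = num_dialogues(NUM_UTTERANCES, length)
        print(f"L = {length:>2}: {len(str(count))} decimal digits, "
              f"about {years_to_process(count):.1e} years at 10 dialogues/s")
```

Already for L = 2 the run time at 10 dialogues per second exceeds ten billion years, which is why only sampling or targeted testing remains practical.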
3.3 Consequences 
Now, suppose we randomly select some dialogues out of the set of possible ones. If, while testing the DM with them, we encounter a certain number of errors (or even none), it is interesting to be able to say something about how error-free the DM is. For this discussion, it is important that by viewing the DM as a black box, we cannot do anything more than assume the errors to be distributed according to the normal distribution. Moreover, we can only apply this reasoning if we make a large number of observations. The figures below may, depending on the theoretical number of dialogues, not be valid. By using the approximation of the normal distribution we know that if we tested N = 10000 dialogues and received errors in the DM in, say, 250 of the dialogues ($f = 250/10000 = 0.025$), we can say that the proportion of dialogues in which the DM raises an error is (with a degree of confidence of 95%)

$$E = f \pm 1.96 \times \sqrt{\frac{f(1-f)}{N}} = 0.025 \pm 1.96 \times \sqrt{\frac{0.025 \times (1 - 0.025)}{10000}} = 0.025 \pm 0.003,$$

i.e., (2.5 ± 0.3) percent errors.

In case no errors were found we get

$$E = 0 \pm 1.96 \times \sqrt{\frac{0 \times (1 - 0)}{10000}} = 0 \pm 0.$$

² The exact number is 2184671458940261530062771490500044226533497892487295898535523334750977?413049977260703865149482807002256877156526344377571018487670988739143 :-)

Figure 3: Utterance Complexity. (y-axis, logarithmic: complexity of utterances; x-axis, linear: semantic objects per utterance.)

Here we have to use a trick: instead, we suppose we found one error, and thus

$$f = 1/10000 = 0.0001$$

yielding

$$E = 0.0001 \pm 1.96 \times \sqrt{\frac{0.0001 \times (1 - 0.0001)}{10000}} = 1.0 \cdot 10^{-4} \pm 1.96 \cdot 10^{-4}.$$

We can at least say that we are 95% confident that the DM will raise an error in less than

$$1.0 \cdot 10^{-4} + 1.96 \cdot 10^{-4} = 2.96 \cdot 10^{-4}$$

of the cases, i.e., in less than about 0.03% of them.
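
The normal-approximation interval $f \pm 1.96\sqrt{f(1-f)/N}$ used in this section can be checked numerically. The sketch below is an illustration only; the function name `error_rate_interval` and its parameters are ours, and the approximation is subject to the caveats stated above (it needs a large number of observations and degenerates when no error is observed).

```python
import math
from typing import Tuple


def error_rate_interval(errors: int, dialogues: int,
                        z: float = 1.96) -> Tuple[float, float]:
    """Return (f, margin) so that the estimate reads f +/- margin.

    f is the observed error proportion; z = 1.96 corresponds to a
    95% degree of confidence under the normal approximation.
    """
    f = errors / dialogues
    margin = z * math.sqrt(f * (1 - f) / dialogues)
    return f, margin


if __name__ == "__main__":
    for errors in (250, 0, 1):
        f, margin = error_rate_interval(errors, 10000)
        print(f"{errors:>3} errors in 10000 dialogues: "
              f"{f:.4f} +/- {margin:.6f}")
```

With zero observed errors the margin collapses to zero, which is exactly why the text substitutes one hypothetical error before computing an upper bound.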
4 VALDIA - The Implementation 
To allow for intelligent testing, we decided to implement our test tool in the following three parts:
• the core test engine,
• the interface to the DM (implemented in OZ/MOZART³), and
• a graphical editor for the definition of stochastic automata (implemented in Tcl/Tk).

³ The reasons for using OZ are manifold: OZ features threads, multiple platforms (UNIX/LINUX and Windows), unification, a Tcl/Tk library, and finally it comes for free. See http://www.mozart-oz.org

The core test engine uses the definition of stochastic automata to create sequences of semantic expressions to be sent to the DM. It records both the input to and the output from the DM and checks for special messages (e.g., end of dialogue), crashes, loops (the DM emitting the same response all the time), or other events that indicate erroneous behaviour of the DM. It also creates test profiles and checkpoint files to enable interruption and restart of test runs.
The interface handles the connection be- 
tween VALDIA and the DM. It realizes a 
TCP/IP connection to and from the DM. In 
case parallel test runs are made, it can also 
handle different processes. 
The motivation for the stochastic automaton editor and, at the same time, the main feature of VALDIA (see Figure 4) is that it allows for the design of utterances or even dialogues or utterance sequences, and thus for testing specific areas in the space of theoretically possible dialogues. The dialogue system developer can interactively define the automata, using the pointing device to draw the states and the transitions. In each state, it is possible to change the constraints for the definition of a SIL expression. More precisely, we change the probability of the alternatives of (a part of) an expression. The arcs between the states are augmented with probabilities which guide state transitions in a stochastic manner, thus creating certain sequences by preference, without completely excluding others. In Figure 5 the left column contains the basic semantic entities, the middle one the probability, and the right one the number of occurrences for that particular semantic item in each utterance. For the semantic items, the variable parts are linked to another window where their instantiations are described. The constraints are semi-automatically derived from the definition of the interface specification for the DM. The reason for "semi-automatically" and not automatically is that we have had no time to write a generic function for this, but basically the derivation is straightforward. Consequently, we can design interesting utterance sequences according to, e.g., experiences gained during WOZ experiments.

Figure 4: Screen shot of the automata editor

Figure 5: Part of the constraints of an utterance

Finally, by using just one state and no 
constraints, we can, of course, produce com- 
pletely arbitrary utterance sequences. 
During the testing of the dialogue manager we can run the system in two modes. The first, exhaustive mode, generates all sequences of dialogues by enumerating all dialogues. This is based on the enumeration of all possible utterances in each state. The exhaustive mode can be used when we know that the complexity of the automaton (and utterances) is testable; VALDIA can compute the number of dialogues and an upper time limit based on the computational power of the DM. In the second mode, Monte Carlo mode, the utterance generation in each state as well as the change of state is random. In this way we randomly walk the automaton and randomly generate utterance profiles. This has proven useful in the cases where the number of possible dialogues is too large for exhaustive testing.
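
A Monte Carlo walk of this kind can be sketched as follows. This Python fragment is a minimal illustration under our own assumptions, not VALDIA's implementation: the automaton is a plain dictionary of weighted arcs, each state carries constraints of the form semantic object -> (probability, number of occurrences), and each allowed occurrence is drawn independently. The state names, the probabilities and the names `ARCS`, `CONSTRAINTS`, `generate_utterance` and `random_walk` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical automaton: state -> [(next state, transition probability)].
ARCS: Dict[str, List[Tuple[str, float]]] = {
    "greet": [("request", 0.9), ("greet", 0.1)],
    "request": [("request", 0.7), ("close", 0.3)],
    "close": [("close", 1.0)],
}

# Hypothetical constraints per state:
# semantic object -> (probability, number of occurrences).
CONSTRAINTS: Dict[str, Dict[str, Tuple[float, int]]] = {
    "greet": {"greeting": (0.9, 1), "car_type": (0.1, 1)},
    "request": {"car_type": (0.8, 1), "power": (0.5, 1), "year": (0.3, 2)},
    "close": {"bye": (0.95, 1), "greeting": (0.05, 1)},
}


def generate_utterance(constraints: Dict[str, Tuple[float, int]],
                       rng: random.Random) -> List[str]:
    """Draw one (possibly empty) utterance as a list of semantic objects."""
    utterance: List[str] = []
    for item, (prob, occurrences) in constraints.items():
        for _ in range(occurrences):
            if rng.random() < prob:
                utterance.append(item)
    rng.shuffle(utterance)  # the order of the objects is random as well
    return utterance


def random_walk(arcs: Dict[str, List[Tuple[str, float]]],
                constraints: Dict[str, Dict[str, Tuple[float, int]]],
                start: str, length: int,
                rng: random.Random) -> List[List[str]]:
    """Generate one dialogue of `length` user contributions."""
    state = start
    dialogue: List[List[str]] = []
    for _ in range(length):
        dialogue.append(generate_utterance(constraints[state], rng))
        targets, weights = zip(*arcs[state])
        state = rng.choices(targets, weights=weights, k=1)[0]
    return dialogue


if __name__ == "__main__":
    for utterance in random_walk(ARCS, CONSTRAINTS, "greet", 5,
                                 random.Random(0)):
        print(utterance)
```

Because unlikely objects keep a small non-zero probability in every state, preferred sequences dominate without excluding odd combinations, which matches the intention of not restricting the walk to legal moves.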
Notice that we cannot pay any attention to legal moves. VALDIA has (i) no knowledge about what a legal move is, and (ii) no possibility to react to the response from the DM. Therefore "legal moves" and "cooperativeness" are nonexistent concepts here. But this is what we want: People behave weird! Our speech recognizer produces errors! And most important: We have to live with this, and must not fail on any input!
5 First Results 
During the development of VALDIA we have detected several errors in the implementation of our DM. Most of the errors were logical errors of the kind "Now that's a combination of things we didn't cover." For example, the co-occurrence of good_bye and request_repetition in a user utterance led to a goal conflict in the DM that caused it to hang, as did the non-exclusive handling of disjunction in "It's older (or) younger than 5 years", etc.
Additionally, we discovered that the DM in some of the test runs crashed after about 500 (!) dialogues due to erroneous memory handling. This is something one would never detect during normal testing with a full system, but which would show up immediately after delivering the system.
VALDIA produces huge amounts of (huge) 
trace files. Analyzing these is at present a 
pain as big as testing the complete dialogue 
system. Consequently, we will have to de- 
velop functionality for condensing the trace 
information. 
6 Conclusion 
The project VALDIA has produced useful insights into the complexity of dialogue: Spoken dialogue is very complex! Exhaustive testing of a DM is impossible for some scenarios/dialogue models. The results were obtained during the development of a test program for a DM. The purpose of the testing was to be able to integrate into the dialogue system a DM that contained as few errors as possible. We would like to highlight the following points:
• VALDIA has proven its usefulness in that it is able to detect errors in the implementation of a DM before it is integrated into the complete dialogue system. During the testing we encountered, in addition to logical bugs, errors which would never be detected during normal testing with the complete dialogue system.
• By including the automata into VALDIA it is possible to concentrate the testing on "interesting utterance sequences" and, despite the huge universe of theoretically possible dialogues, obtain a status of the DM which for certain tasks is well tested.
• It is simple to adapt VALDIA for the testing of a new DM. Technically, the only thing that has to be changed is the definition/constraints of the utterances. This is at present a semi-automatic process. Conceptually, the automata have to be defined, unless one wants to test in Monte Carlo mode.
• In the current implementation VALDIA uses about 10% of the processing time compared to the DM. Thus VALDIA can control between 5 and 10 instances of the DM, depending on available resources in the net.
• VALDIA is platform independent. At our site, we are using a mixture of different types of computers, both PCs running under Windows/Linux and UNIX machines. Depending on load, we are flexible to utilize any of the free resources for the testing.
We are currently in the process of adapting VALDIA for a new scenario. For this DM, input consists of grammatical structures rather than sets of semantic objects. Since the VALDIA project started, interesting research results have emerged, and there are a lot of things that remain to be done. Amongst those, we will pay attention to at least the following topics:
• The current implementation of VALDIA has no means of reacting to the output from the DM. For intelligent testing this has to be incorporated into the system. Possible future directions are described in (Eckert et al., 1998), (Scheffler and Young, 2000) and (Lin and Lee, 2000). In, e.g., (Eckert et al., 1998) VALDIA is replaced by a simulated user, and the authors describe a statistical method for reacting to system responses.
• We have to develop a tool for semi-automatically analyzing the trace files produced by VALDIA. A possible future feature is to save only the files of those dialogues/utterances which resulted in an error.

References 
Masahiro Araki and Shuji Doshita. 1997. Automatic evaluation environment for spoken dialogue systems. In Elisabeth Maier, Marion Mast, and Susann LuperFoy, editors, Revised papers from the ECAI 96 Workshop in Budapest, Hungary on Dialogue Processing in Spoken Language Systems, Heidelberg, August. Lecture Notes in Artificial Intelligence, Springer-Verlag.

Wieland Eckert, Elmar Nöth, Heinrich Niemann, and Ernst-Günter Schukat-Talamazzini. 1995. Real users behave weird - experiences made collecting large human-machine-dialog corpora. In Paul Dalsgaard, Lars Bo Larsen, Louis Boves, and Ib Thomsen, editors, Proceedings of ESCA Tutorial and Research Workshop on Spoken Dialogue Systems '95, Vigsø, Denmark.

Wieland Eckert, Esther Levin, and Roberto Pieraccini. 1998. Automatic Evaluation of Spoken Dialogue System. Technical report, AT&T. Technical Report Nr. TR98.9.1.

Ute Ehrlich, Gerhard Hanrieder, Ludwig Hitzenberger, Paul Heisterkamp, Klaus Mecklenburg, and Peter Regel-Brietzmann. 1997. Access - automated call center through speech understanding system. In Proceedings of Eurospeech '97, Rhodes.

Gerhard Hanrieder. 1996. Inkrementelles Parsing gesprochener Sprache mit einer linksassoziativen Unifikationsgrammatik. Ph.D. thesis, Universität Erlangen-Nürnberg. http://www.infix.com - ISBN 3-89838-140-4.

Paul Heisterkamp and Scott McGlashan. 1996. Units of Dialogue Management: An Example. In Proceedings of ICSLP-96, Philadelphia, PA, October.

Esther Levin and Roberto Pieraccini. 1997. A stochastic model of computer-human interaction for learning dialogue strategies. In Proceedings of EuroSpeech-97, Rhodes.

Bor-shen Lin and Lin-shan Lee. 2000. Fundamental performance analysis for spoken dialogue system based on a quantitative simulation approach. In Proceedings of ICASSP-2000, Istanbul, Turkey, June 5-9.

Scott McGlashan, Francois Andry, and Gerhard Niedermair. 1994. A Proposal for SIL. Technical report, University of Surrey, CAP SOGETI, and Siemens AG, March. SUNDIAL report.

Konrad Scheffler and Steve Young. 2000. Probabilistic simulation of human-machine dialogues. In Proceedings of ICASSP-2000, Istanbul, Turkey, June 5-9.
