Flexible Speech Act Based Dialogue Management 
Eli Hagen and Fred Popowich 
School of Computing Science 
Simon Fraser University 
Canada V5A 1S6 
{hagen, popowich}@cs.sfu.ca
Abstract 
We present an application independent dialogue engine that reasons on application dependent knowledge sources to calculate predictions about how a dialogue might continue. Predictions are language independent and are translated into language dependent structures for recognition and synthesis. Further, we discuss how the predictions account for different kinds of dialogue, e.g., question-answer or mixed initiative.
1 Introduction 
The computerized spoken information systems (or Spoken Dialogue Systems, SDS) that we will consider in this paper are systems where a computer acts as the operator of some service and interacts with a user in natural language, e.g., switchboard, directory assistance, or ticket service. Before an SDS can provide its information, it needs to acquire data from the user, e.g., customer name and number, birth date, service location, or service date. We call these parameter values. In an SDS they are acquired orally, and speech recognition is used to decode the speech signal into words.
A dialogue manager facilitates the negotiation of parameter values between a user and an SDS. We emphasize keeping our dialogue manager application and language independent, so we factored out the independent information into two components. A dialogue engine calculates predictions for how to continue a dialogue from dependent knowledge sources (e.g., dialogue grammar and history, application description). A pragmatic interpreter maps syntactic/semantic interpretation results onto predictions.
Our predictions are called dialogue primitives; GEN-primitives predict system utterances and REC-primitives predict user utterances. They are language independent, and on both the recognition and the generation side, other modules translate them into language dependent structures. In this paper, we will discuss the kinds of primitives our dialogue manager calculates and how they account for different kinds of dialogue, e.g., question-answer or mixed initiative.

Figure 1: System architecture of our SDS (REC-side: telephone/microphone, speech recognizer, syntactic/semantic interpreter, pragmatic interpreter; GEN-side: dialogue engine with application description and dialogue strategies, response generator). The arcs indicate information flow.
2 Background 
First, we discuss our system architecture and data flow between modules. Second, we present the application description of a movie service, which we will use for the examples in later sections. Third, we present some of our current primitives, and finally, we describe the dialogue engine and how it uses the application description and other sources to calculate dialogue primitives.
2.1 System Architecture 
Our system architecture is presented in Figure 1. 
The dialogue manager takes an application description (Section 2.2) and a set of dialogue strategies (Sections 3 and 4) as input, both provided by the service designer. The application description describes the parameters needed by the service and is necessarily application dependent. The dialogue strategies contain directions for how the dialogue shall proceed in certain situations, for example, whether to ask for confirmation or spelling of a badly recognized parameter value, or whether to generate system- or user-directed dialogue.
The output of our dialogue manager is a bag of abstract, language independent primitives. On the generation side, they encode the next system utterance, and a response generator translates the GEN-primitives into text, which is then synthesized. On the recognition side, the REC-primitives represent the dialogue manager's predictions about the next user utterance. REC-primitives are translated into (recognition) contexts and grammars for speech recognition, and they may activate sub-components of a synsem (syntactic/semantic) grammar. After speech recognition has taken place, the dialogue engine must be told which predictions came true, so the pragmatic interpreter maps the output of the synsem interpreter onto a sub-bag of REC-primitives, which is then returned to the dialogue engine for further processing (Section 2.4).
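To make the data flow concrete, here is a minimal Python sketch of one turn. It is not the system's implementation: the three functions are hypothetical stubs, primitives are plain strings, and the synsem output is assumed to arrive already phrased as candidate primitives, so the pragmatic interpreter reduces to an intersection with the bag of REC-primitives.

```python
from dataclasses import dataclass


@dataclass
class TurnPredictions:
    """What a (hypothetical) dialogue-engine step hands to the other modules."""
    gen: set  # GEN-primitives: content of the next system utterance
    rec: set  # REC-primitives: predicted user responses


def dialogue_engine_step() -> TurnPredictions:
    # Stub: a real engine reasons over the AD, dialogue grammar and history.
    return TurnPredictions(
        gen={"requestValue(film)"},
        rec={"informValue(film)", "rejectRequest(film)"},
    )


def response_generator(gen: set) -> str:
    # Stub: maps GEN-primitives to text that would then be synthesized.
    templates = {"requestValue(film)": "Which film do you want to see?"}
    return " ".join(templates[p] for p in sorted(gen))


def pragmatic_interpreter(synsem_output: set, rec: set) -> set:
    # Return the sub-bag of predictions that came true; readings the
    # engine did not predict are discarded.
    return synsem_output & rec


predictions = dialogue_engine_step()
prompt = response_generator(predictions.gen)
# Assumed synsem output for "The Matrix. At the Ridge.", already phrased
# as candidate primitives.
confirmed = pragmatic_interpreter(
    {"informValue(film)", "informExtraValue(theatre)"}, predictions.rec
)
```

The intersection reflects the point made above: the dialogue engine only learns about readings it predicted. The translation of REC-primitives into recognition contexts and grammars is left out of the sketch.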
2.2 Application Description 
The application description (AD) specifies the tasks that a service can solve and the parameter values needed to solve them. The AD for a movie service is presented in Figure 2. Our representation is an extended version of and-or trees,¹ and in Figure 2, the U-shaped symbols represent and-relations, while the V-shaped symbols represent or-relations. Thus, this movie service can perform three tasks: selling tickets or providing movie or theatre information. If the user wants to buy tickets, the system needs to acquire six parameter values, e.g., the show time, the date, and the name of the film. Date and show time can be acquired in several ways. For example, a date can be a simple date (e.g., "November 17th") or a combination of day of the week and week (e.g., "Wednesday this week").
The nodes keep state information. Open nodes have not yet been negotiated, topic nodes are being negotiated, and closed nodes have been negotiated. The currently active task has status active. The active task and the parameters in each state can be retrieved through the functions activeTask(AD), openParams(AD), closedParams(AD), and topicParams(AD). Status(p) returns the status of parameter p. tasks(AD) and params(AD) return the task and parameter nodes.
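The AD and its query functions can be pictured with a small Python sketch. The class layout, the `kind` labels, and the snake_case names (`active_task`, `open_params`, `closed_params`, `topic_params`, `status`, standing in for activeTask(AD), openParams(AD), closedParams(AD), topicParams(AD), and Status(p)) are our assumptions, and only the theatreInfo branch of Figure 2 is filled in.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    name: str
    kind: str                        # "service", "task" or "param"
    relation: Optional[str] = None   # "and" / "or" over the children
    children: List["Node"] = field(default_factory=list)
    status: str = "open"             # open, topic, closed (a task may be active)

    def walk(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.walk()


def tasks(ad: Node) -> List[Node]:
    return [n for n in ad.walk() if n.kind == "task"]


def params(ad: Node) -> List[Node]:
    return [n for n in ad.walk() if n.kind == "param"]


def active_task(ad: Node) -> Optional[Node]:
    return next((t for t in tasks(ad) if t.status == "active"), None)


def _params_in(ad: Node, state: str) -> List[str]:
    return [p.name for p in params(ad) if p.status == state]


def open_params(ad: Node) -> List[str]:
    return _params_in(ad, "open")


def topic_params(ad: Node) -> List[str]:
    return _params_in(ad, "topic")


def closed_params(ad: Node) -> List[str]:
    return _params_in(ad, "closed")


def status(ad: Node, name: str) -> str:
    return next(n.status for n in ad.walk() if n.name == name)


# Abridged movie-service AD: only the theatreInfo branch is spelled out.
ad = Node("movieService", "service", "or", [
    Node("movieInfo", "task"),
    Node("buyTickets", "task"),
    Node("theatreInfo", "task", "and", [Node("theatre", "param"), Node("city", "param")]),
])

theatre_info = next(t for t in tasks(ad) if t.name == "theatreInfo")
theatre_info.status = "active"             # the task has been negotiated
theatre_info.children[1].status = "topic"  # the engine starts negotiating city
```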
Similar hierarchical domain descriptions have been suggested in (Young et al., 1990) for a naval domain and in (Caminero-Gil et al., 1996) for an e-mail assistance domain. A tree-like organization of the domain is sufficient for the information retrieval domains that we are currently considering. We expect, however, that in future work we will need to switch to a semantic network structure. Moreover, since our future research includes automatic generation of system utterances from our dialogue primitives, we hope to be able to utilize the ontology and domain organization work, which has proven so useful for text generation (Bateman et al., 1994; Bateman et al., 1995), for both dialogue management and text generation.

¹Extensions include has-a relations.

Figure 2: A description of a movie service (tasks movieInfo_t, buyTickets_t, and theatreInfo_t; parameter nodes include noOfTickets_p, namedTime_p, time_p, date_p, weekDay_p, and week_p). Notation: U and V represent and/or-relations. Subscripts t and p denote tasks and parameters.
2.3 Dialogue Primitives 
Following the procedure outlined in Section 2.4, the dialogue manager calculates a bag of primitives for each turn and speaker. Our current collection is motivated by our experience with several domains, e.g., movie service, horoscope service, and directory assistance. The collection is not exhaustive, and we will add primitives as wider dialogue coverage is required.
Notation: A primitive is written primName(p=v, n), where primName is its name; p ∈ params(AD) ∪ {aTask}; aTask is a special parameter whose values ∈ tasks(AD); v is the value of p; and n is an integer denoting the number of times a primitive has been uttered. If v is uninstantiated, it is left out for readability. Unless otherwise stated, p ∈ params(AD).
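The notation can be mirrored by a small immutable record. This Python sketch is an assumption, not the system's data structure; in particular, printing n only when it exceeds 1 is our choice for readability (some dialogues below also write n = 1 explicitly).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Primitive:
    name: str                    # primName, e.g. "requestValue"
    param: str                   # p: an AD parameter or the special "aTask"
    value: Optional[str] = None  # v; None means uninstantiated
    n: int = 1                   # number of times the primitive has been uttered

    def __str__(self) -> str:
        # An uninstantiated v is left out, as in the notation above.
        args = self.param if self.value is None else f"{self.param}={self.value}"
        # Assumption: n is only printed once the primitive is repeated.
        if self.n > 1:
            args += f", {self.n}"
        return f"{self.name}({args})"


# A bag (multiset) of primitives for one turn; frozen dataclasses are hashable.
bag = Counter([
    Primitive("informNegative", "date", "July 3"),
    Primitive("correctValue", "date", "July 4"),
])
```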
2.3.1 GEN-Primitives 
Our current GEN-primitives: 
salutation(p=v): system opens or closes the interaction. p ∈ {hello, goodbye}, v ∈ {morning, day, evening}.
requestValue(p): system requests a value for the parameter p. p ∈ params(AD) ∪ {aTask}.
requestValue(p=v): system asks whether the value v of parameter p is correct. If this form is used, the system has a list of alternative values for p, and v is not a recognition result (e.g., Frankfurt am Main or Frankfurt an der Oder, where Frankfurt is the recognition result).
requestValue(aTask=v), v ∈ tasks(AD) ∪ {repeatPreServiceTask, useService, repeatService}: system requests a value for aTask. If v ∈ {repeatPreServiceTask, useService, repeatService}, the system asks whether the user wants the pre-service task repeated, the service started (the first task after the pre-service task), or a new task started.
requestConfirm(p=v): system asks whether the value v of parameter p is correct; v is a recognition result, p ∈ params(AD) ∪ {aTask}. Ambiguous results not resulting from speech recognition, e.g., Frankfurt am Main vs. Frankfurt an der Oder, would yield multiple requestValue(p=v) primitives.
requestValueABC(p): system requests the spelling 
of the value of parameter p. 
requestParam(p=v): system asks whether the 
value v is a value for parameter p. 
evaluate(p=v): system acknowledges value v of parameter p.
promise(p=v): system promises to attempt to answer the user's request. p ∈ params(AD) ∪ {aTask}, v ∈ {pleaseWait}. Only used after navigate(), requestParam(), or requestAlternatives() if the user has to wait long for a reply.
inform(aTask=v): system informs about the acquired database results. v ∈ activeTask(AD) ∪ {tooMany, zero}. If v = activeTask(AD), there are several answers; if v = tooMany or v = zero, there are either too many answers to be enumerated or zero answers.
inform(aTask=n): system presents the nth answer to the query; n > 0.
informAlternative(p): system informs that there are several possible values for p. p ∈ params(AD) ∪ {aTask}, v ∈ {tooMany, null}. If v = tooMany, there are too many alternatives to be enumerated. v = null means that v is uninstantiated, not that there are zero alternatives.
informAlternative(p=v): system informs that a possible value of p is v. p ∈ params(AD) ∪ {aTask}.
informNegative(p): system informs that the user misrecognized something. p ∈ params(AD) ∪ {aTask}.
informPositive(p): system informs that the user recognized something correctly. p ∈ params(AD) ∪ {aTask}.
withdraw(p): system withdraws from the dialogue for reason p ∈ {error} before it has started negotiations.
withdrawOffer(aTask=v): system withdraws an offer for reason v ∈ {error}.
withdrawPromise(aTask=v): system withdraws a promise for reason v ∈ {error}.
In Section 3, we present several sample instan- 
tiations of the primitives. 
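Several GEN-primitives above restrict v to a closed set. A hypothetical well-formedness check covering only those primitives could look like this in Python; `value_domain`, `is_well_formed`, and the example task set `TASKS` are illustrative names, and every other primitive is treated as having an open-ended v.

```python
from typing import FrozenSet, Optional, Set

SERVICE_STEPS = frozenset({"repeatPreServiceTask", "useService", "repeatService"})


def value_domain(name: str, param: str, tasks: Set[str],
                 active_task: str) -> Optional[FrozenSet[str]]:
    """Closed set of legal v for name(param=v), or None if v is open-ended.

    Covers only the GEN-primitives whose value sets are listed explicitly.
    """
    if name == "salutation":
        return frozenset({"morning", "day", "evening"})
    if name == "promise":
        return frozenset({"pleaseWait"})
    if name in ("withdrawOffer", "withdrawPromise"):
        return frozenset({"error"})
    if name == "requestValue" and param == "aTask":
        return frozenset(tasks) | SERVICE_STEPS
    if name == "inform" and param == "aTask":
        return frozenset({active_task, "tooMany", "zero"})
    return None


def is_well_formed(name: str, param: str, value: str, tasks: Set[str],
                   active_task: str) -> bool:
    if name == "salutation" and param not in ("hello", "goodbye"):
        return False
    if name == "inform" and param == "aTask" and value.isdigit():
        return int(value) > 0  # inform(aTask=n) presents the nth answer, n > 0
    domain = value_domain(name, param, tasks, active_task)
    return domain is None or value in domain


TASKS = {"movieInfo", "buyTickets", "theatreInfo"}
```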
2.3.2 REC-Primitives 
Our current REC-primitives: 
requestParam(p): user asks which parameter the system requested. p ∈ params(AD) ∪ {null}.
requestAlternatives(p): user requests possible values for parameter p.
requestConfirm(aTask=n): user asks the system to confirm an answer that it has given, e.g., "Was the first answer $30?" 0 < n < no. of query results.
informValue(p=v): user provides value v for parameter p. p was requested.²
informExtraValue(p=v): user provides value v for parameter p. p was not requested in the preceding system utterance.
informValueABC(p=v): user spells the value v of parameter p. The spelling is expanded by synsem, and expansions are presented to the dialogue manager.²
informPositive(p=v): user confirms that the value of parameter p is v. p ∈ params(AD) ∪ {aTask}.
informNegative(p=v): user disconfirms that the value of parameter p is v. p ∈ params(AD) ∪ {aTask}.
correctValue(p=v): user corrects a misrecognized value. Often used together with informNegative. For example, "Hamburg, not Homburg."²
informGarbage(p): user says something, but the recognizer and/or synsem could not make sense out of it.
changeValue(p=v): user changes the value of parameter p to v instead of v'.²
repeatValue(p=v): user repeats the value v of parameter p.²
correctParam(p=v): user corrects that v is the value of p, not p'.
disambiguate(p=v): user chooses v as the value of p when presented with a choice between several values for p. p ∈ params(AD) ∪ {aTask}.
²The pragmatic interpreter instantiates v.
rejectValue(p=v): the user has been given a series of alternatives and chooses p=v' (rejecting v). The primitive is combined with disambiguate(p=v').
navigate(aTask=v): user navigates in the query results. v ∈ {forward, backward, repeat, n}, where 0 < n < no. of query results.²
rejectRequest(p=v): user ignores or does not hear the system request. v ∈ {null, didNotHear}.
rejectOffer(aTask=v): user ignores or does not hear the system offer. v ∈ tasks(AD) ∪ {null, didNotHear}.
evaluate(t=v): user evaluates an answer she has received. v ∈ {positive, neutral, negative, cancel}. cancel is used to end the current dialogue after at least one answer has been given and start a new one without calling again.
promise(p): user promises to find a value for p. 
withdrawAccept(aTask=v): user withdraws from the conversation for reason v ∈ {cancel, hangup}. With cancel, the user ends the current dialogue before an answer has been given and starts a new task without calling again.²
withdrawPromise(p=v): user withdraws a promise to provide a value, for reason v ∈ {cancel, hangup}.²
withdrawRequest(p=v): user withdraws a request. p ∈ params(AD) ∪ {forward, backward, repeat, n}.²
null(): returned to the dialogue manager if the user does not say anything and is not expected to say anything, e.g., after a greeting or promise.
In Section 3, we present several sample instan- 
tiations of the primitives. 
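Footnote 2 states that the pragmatic interpreter instantiates v. A minimal Python sketch of that step is shown here, assuming synsem delivers a parameter-to-value mapping and that predictions are (primName, parameter) pairs with v left open; the function name `instantiate` and the extra "price" reading are hypothetical. It deliberately ignores primitives such as informPositive versus informNegative, whose choice depends on more than the recognized value.

```python
from typing import Dict, List, Tuple


def instantiate(predictions: List[Tuple[str, str]],
                synsem: Dict[str, str]) -> List[str]:
    """Fill in v for predicted REC-primitives from a synsem result.

    predictions: (primName, parameter) pairs with v uninstantiated.
    synsem: parameter -> value pairs found in the user utterance.
    Values for parameters without a prediction are silently dropped.
    """
    return [f"{name}({param}={synsem[param]})"
            for name, param in predictions if param in synsem]


# REC-primitives predicted after requestValue(film), over-answering allowed.
predicted = [("informValue", "film"), ("informExtraValue", "theatre")]
result = instantiate(predicted, {"film": "Matrix", "theatre": "Ridge", "price": "12"})
```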
2.4 Dialogue Engine 
The dialogue engine (Hagen, 1999) consists of a 
reasoning engine and several knowledge sources: 
An AD defines an application's data-needs, a di- 
alogue grammar defines how a dialogue may pro- 
ceed at the level of speech acts, and a dialogue 
history is a dynamically growing parse tree of an 
on-going dialogue with respect to the dialogue 
grammar. Other knowledge sources may be re- 
quired, for instance, recognition confidence or dis- 
ambiguation of city names. 
The dialogue engine calculates the next turn by consulting and combining information from the knowledge sources. It consults the dialogue history and the dialogue grammar in order to calculate which speech acts may continue a dialogue. Speech acts have no propositional content, so in the context of the current dialogue history and the state of the application description, they are translated into dialogue primitives, which have content, for example, the name of a parameter and a potential value for this parameter. Here we will walk through an example of how some primitives are calculated in a simple question-answer dialogue.
Example: For our example we will use the AD in Figure 2. Assume that the task has already been negotiated and set to theatre information (i.e., activeTask(AD) = theatreInfo), so the system needs to acquire the name of the theatre and the name of the city. All other nodes in the AD are closed since they are not relevant to this task.
The speech act grammar used in our system is presented in Appendix A, but we will use a trivial grammar for the example. It can account for simple question-answer dialogues where a request from the system (sys) is followed by an inform from the user (usr). The system can respond to the inform with a sub-dialogue:³
  Dialogue(sys) → (request(sys) + Inform(usr))*
  Inform(usr) → inform(usr) + [Dialogue(sys)]
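The continuations this trivial grammar licenses can be written down directly. The following Python function is a hand-coded sketch for this grammar only (not a general parser), and the attachment labels are our own notation for the two places a new request(sys) can hang.

```python
from typing import List, Tuple


def next_atomic_acts(last_act: str) -> List[Tuple[str, str]]:
    """Next atomic speech acts allowed by the trivial grammar

        Dialogue(sys) -> (request(sys) + Inform(usr))*
        Inform(usr)   -> inform(usr) + [Dialogue(sys)]

    Each result is (speech act, attachment point in the dialogue history).
    Ending the dialogue and returning to an enclosing Dialogue(sys) are ignored.
    """
    if last_act == "request(sys)":
        # request(sys) is always followed by its Inform(usr) -> inform(usr).
        return [("inform(usr)", "Inform(usr)")]
    if last_act == "inform(usr)":
        return [
            # Flat: the next request(sys) of the enclosing Dialogue(sys).
            ("request(sys)", "Dialogue(sys)"),
            # Sub-dialogue: the optional Dialogue(sys) under this Inform(usr).
            ("request(sys)", "Inform(usr)+Dialogue(sys)"),
        ]
    raise ValueError(f"not an atomic speech act of this grammar: {last_act}")
```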
The dialogue history reflects all previous negotiations (here: the task theatreInfo):

Dialogue(sys)
  request(sys)      requestValue(task)
  Inform(usr)
    inform(usr)     informValue(task=theatreInfo)

The next turn can be rooted either in the Inform(usr) after the inform(usr) or in the Dialogue(sys) after Inform(usr).
With all the above knowledge sources in place, 
the calculation of the next dialogue turn can start: 
1. The last speech act in the dialogue history gives us a starting point in the grammar, so moving forward from inform(usr), the next atomic speech act is request(sys), either as a flat structure (i.e., request(sys) off Dialogue(sys)) or in a sub-dialogue (i.e., Dialogue(sys)+request(sys) off Inform(usr)).
2. Knowing that the system can request something, the dialogue engine consults the AD for what the system can ask about. The flat structure (request(sys)) represents negotiation of the task, but since we assume that negotiation of the task is complete (i.e., Status(theatreInfo) = active), this speech act is not interpreted into a primitive. Next we consider the sub-dialogue structure. Both children of theatreInfo are open (i.e., they have not been negotiated yet), so the system randomly chooses to pursue city, whose state is changed to topic. The speech act and the parameter are combined into the primitive requestValue(city), i.e., request a value for the parameter city (e.g., "In which city is the theatre?"). We chose to use the sub-dialogue structure instead of the flat structure to represent negotiation of parameter values, since parameter values are subordinate to the task in the sense that the task dictates which parameter values are needed. This is also the case for the real grammar (Appendix A).

³The star (*) means that a dialogue may contain several request(sys) + Inform(usr) sequences. Lowercase speech acts are atomic, while others are complex. The dialogue in square brackets ([]) is optional.
3. The primitive requestValue(city) is added to the dialogue history:

Dialogue(sys)
  request(sys)        requestValue(task)
  Inform(usr)
    inform(usr)       informValue(task=theatreInfo)
    Dialogue(sys)
      request(sys)    requestValue(city)
4. Starting from request(sys), the grammar states that inform(usr) (i.e., Inform(usr)+inform(usr)) is the next speech act in the dialogue. requestValue(city) was the last primitive spoken. Reasoning that a user inform in response to a system requestValue should involve the same parameter as the system's requestValue, the engine combines the information to form the primitive informValue(city), i.e., the user should respond to the system request with a value for the parameter city. Let's assume that the user replied "Hong Kong", so the dialogue history is expanded:

Dialogue(sys)
  request(sys)          requestValue(task)
  Inform(usr)
    inform(usr)         informValue(task=theatreInfo)
    Dialogue(sys)
      request(sys)      requestValue(city)
      Inform(usr)
        inform(usr)     informValue(city=Hong Kong)
5. Starting from inform(usr), the grammar returns request(sys) and Dialogue(sys)+request(sys). Since a recognition result is available from the previous turn, the engine checks its recognition confidence. If it is high, the engine would consider the negotiation of city finished, change its state to closed, and discard Dialogue(sys)+request(sys), since there is nothing to be requested about a closed parameter. It would translate request(sys) into requestValue(theatre), since theatre is the only remaining open parameter.
If confidence is low, the dialogue engine may decide to ask the user to confirm the recognized value. In that case, Dialogue(sys)+request(sys) would be interpreted into requestConfirm(city=Hong Kong). Whether request(sys) would be interpreted or not depends on the dialogue strategies chosen by the service designer (see Sections 3 and 4).

If confidence is extremely low, the dialogue engine may decide to repeat the question. In that case, request(sys) would be interpreted into requestValue(city, 2), while the sub-dialogue structure would be discarded.
6. Any interpretation of the flat structure would result in the following addition to the last Dialogue(sys) in the dialogue history:

Dialogue(sys)
  request(sys)      requestValue(city)
  Inform(usr)
    inform(usr)     informValue(city=Hong Kong)
  request(sys)
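Steps 2 and 5 can be sketched in Python as follows. The confidence thresholds, the dictionary used for parameter states, and picking the first remaining open parameter in step 5 are assumptions; the text only distinguishes high, low, and extremely low confidence, and the sketch leaves the role of the dialogue strategies aside.

```python
import random
from typing import Dict, Optional

# Hypothetical thresholds: the text only distinguishes high, low and
# extremely low recognition confidence.
HIGH, EXTREMELY_LOW = 0.8, 0.3


def interpret_request(structure: str, status: Dict[str, str], task_active: bool,
                      rng: random.Random) -> Optional[str]:
    """Step 2: turn a request(sys) speech act into a primitive, or None."""
    if structure == "flat":
        # The flat request negotiates the task, which may already be settled.
        return None if task_active else "requestValue(aTask)"
    candidates = [p for p, s in status.items() if s == "open"]
    if not candidates:
        return None
    param = rng.choice(candidates)  # pursue an open child at random
    status[param] = "topic"
    return f"requestValue({param})"


def after_inform(param: str, value: str, confidence: float,
                 status: Dict[str, str]) -> Optional[str]:
    """Step 5: choose the next system primitive after informValue(param=value)."""
    if confidence >= HIGH:
        status[param] = "closed"          # negotiation of param is finished
        remaining = [p for p, s in status.items() if s == "open"]
        if not remaining:
            return None
        status[remaining[0]] = "topic"    # flat request about the next parameter
        return f"requestValue({remaining[0]})"
    if confidence > EXTREMELY_LOW:
        # Sub-dialogue reading: ask the user to confirm the recognized value.
        return f"requestConfirm({param}={value})"
    return f"requestValue({param}, 2)"    # repeat the question


# Step 2: the flat request yields nothing; the sub-dialogue picks an open child.
status = {"theatre": "open", "city": "open"}
flat = interpret_request("flat", status, task_active=True, rng=random.Random(0))
first = interpret_request("sub-dialogue", status, task_active=True, rng=random.Random(0))

# Step 5, assuming (as in the text) that city was chosen and "Hong Kong" recognized.
status5 = {"theatre": "open", "city": "topic"}
high = after_inform("city", "Hong Kong", 0.95, status5)
```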
Our example shows how a speech act can result 
in several primitives depending on the context and 
thus how the dialogue manager dynamically reacts 
to external events. Although this brief description 
may not show it, our dialogue manager can handle 
mixed initiative dialogue (Hagen, 1999). In (Ha- 
gen, 1999), we also present our theory of taking, 
keeping, and relinquishing the initiative. 
Heisterkamp and McGlashan (1996) presented an approach that uses a similar division of functionality as we do: task (=application), contextual (=synsem + pragmatic), and pragmatic interpretation (=dialogue engine). They also use abstract parameterized units similar to ours, but they do not use a speech act grammar to calculate the units. Rather, they map contextual functions onto dialogue goals, e.g., the function new_for_system(goalcity:munich) introduces the dialogue goal confirm(goalcity:munich). In terms of our primitives, this could be expressed as requestConfirm() follows informValue(). We choose not to start our modelling at this level since we want to be able to vary what follows informValue(), e.g., requestConfirm(), requestValueABC(), or evaluate().
3 Primitives in Use 
Conceptually, GEN-primitives are calculated first and then a bag of possible responses (REC-primitives). One dialogue primitive corresponds to one information unit or communicative goal, e.g., in an information retrieval setting: providing or requesting one piece of information. Primitives can be used individually or combined to account for more complex dialogue. Whether and how they are combined depends on the dialogue strategies specified by the service designer. In this and the following section, we will examine several such strategies and show how the primitives are combined to achieve them.

Table 1: REC-primitives calculated in response to two GEN-primitives in Dialogue 1.
GEN-primitive                    REC-primitives
requestValue(film)               informValue(film)
                                 rejectRequest(film)
                                 withdrawAccept(aTask=hangup)
                                 withdrawAccept(aTask=cancel)
requestConfirm(theatre=Ridge)    informPositive(theatre=Ridge)
                                 informNegative(theatre=Ridge)
                                 rejectRequest(theatre=Ridge)
                                 withdrawAccept(aTask=hangup)
                                 withdrawAccept(aTask=cancel)
3.1 Question-Answer Dialogue 
In the simplest case, the service designer wants a strict question-answer dialogue:⁴

Dialogue 1: Question-Answer
Sys: "Which film do you want to see?"   requestValue(film)
Usr: "The Matrix. At the Ridge."
Int: informValue(film=Matrix)
Sys: "Which theatre?"   requestValue(theatre)
Usr: "Ridge. R I D G E."
Int: informValue(theatre=Ridge)
Sys: "Did you say The Ridge?"   requestConfirm(theatre=Ridge)
Usr: "Yes. R I D G E."
Int: informPositive(theatre=Ridge)
For this type of dialogue, only the REC-primitives representing direct answers, rejects, and withdrawals are calculated. In Table 1, we present those calculated in response to the first and the third system turn. We see that, after requestValue(film), only informValue(film) is calculated, and the pragmatic interpreter has no chance to detect "At the Ridge" (even if synsem parsed it correctly), since there is no informExtraValue(theatre) available to map it onto. Similarly, after requestConfirm(theatre=Ridge), only informPositive(theatre=Ridge) and informNegative(theatre=Ridge) are available, and "R I D G E" cannot be detected since there is no informValueABC(theatre) primitive present.
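The restriction can be sketched in Python: under a strict question-answer strategy, each GEN-primitive yields only direct answers, rejects, and withdrawals, and the pragmatic interpreter can keep only readings that match a prediction. The function names `qa_predictions` and `map_onto` are hypothetical; the expected sets are the rows of Table 1.

```python
from typing import Optional, Set

WITHDRAWALS = {"withdrawAccept(aTask=hangup)", "withdrawAccept(aTask=cancel)"}


def qa_predictions(gen: str, param: str, value: Optional[str] = None) -> Set[str]:
    """REC-primitives for a strict question-answer strategy:
    direct answers, rejects and withdrawals only."""
    if gen == "requestValue":
        return {f"informValue({param})", f"rejectRequest({param})"} | WITHDRAWALS
    if gen == "requestConfirm":
        pv = f"{param}={value}"
        return {f"informPositive({pv})", f"informNegative({pv})",
                f"rejectRequest({pv})"} | WITHDRAWALS
    raise ValueError(f"no question-answer predictions for {gen}")


def map_onto(readings: Set[str], predicted: Set[str]) -> Set[str]:
    # The pragmatic interpreter can only return predicted primitives.
    return readings & predicted


after_film = qa_predictions("requestValue", "film")
# "The Matrix. At the Ridge.": synsem may parse both parts, but only the
# direct answer has a prediction to map onto.
kept = map_onto({"informValue(film)", "informExtraValue(theatre)"}, after_film)
```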
⁴In the sample dialogues, 'Sys' means system turn, 'Usr' means user turn, and 'Int' means primitives recognized and sent back to the dialogue engine from the pragmatic interpreter.
Table 2: REC-primitives calculated in response to two GEN-primitives in Dialogue 2.
GEN-primitive                    REC-primitives
requestValue(film)               informValue(film)
                                 rejectRequest(film)
                                 informExtraValue(time)
                                 informExtraValue(theatre)
                                 informExtraValue(city)
                                 informExtraValue(noOfTickets)
                                 informExtraValue(date)
                                 withdrawAccept(aTask=v) [a]
requestConfirm(theatre=Ridge)    informPositive(theatre=Ridge)
                                 informNegative(theatre=Ridge)
                                 rejectRequest(theatre=Ridge)
                                 informExtraValue(time)
                                 informExtraValue(city)
                                 informExtraValue(noOfTickets)
                                 informExtraValue(date)
                                 withdrawAccept(aTask=v) [a]
[a] ∀v ∈ {cancel, hangup}
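The extra rows of Table 2 come from allowing over-answering (Section 3.2): every open parameter of the AD contributes an informExtraValue prediction, while the parameter under negotiation (status topic) does not. A minimal Python sketch, assuming parameter states are kept in a dictionary and that film is already closed when the confirmation is asked; `with_over_answering` is a hypothetical name.

```python
from typing import Dict, Set


def with_over_answering(base: Set[str], status: Dict[str, str]) -> Set[str]:
    """Add informExtraValue(p) for every parameter whose status is open."""
    extra = {f"informExtraValue({p})" for p, s in status.items() if s == "open"}
    return set(base) | extra


# Second row of Table 2 (assumed states: film closed, theatre under negotiation).
status = {"film": "closed", "theatre": "topic", "time": "open",
          "city": "open", "noOfTickets": "open", "date": "open"}
base = {"informPositive(theatre=Ridge)", "informNegative(theatre=Ridge)",
        "rejectRequest(theatre=Ridge)",
        "withdrawAccept(aTask=hangup)", "withdrawAccept(aTask=cancel)"}
row = with_over_answering(base, status)
```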
3.2 Over-Answering 
In our experience, users frequently provide more information than explicitly asked for, so a more flexible dialogue strategy would be to allow over-answering, and Dialogue 1 could have developed as follows:

Dialogue 2: Over-Answering
Sys: "Which film do you want to see?"   requestValue(film)
Usr: "Matrix. At the Ridge. R I D G E."
Int: informValue(film=Matrix) + informExtraValue(theatre=The Ridge)
Sys: "Did you say The Ridge?"   requestConfirm(theatre=Ridge)
Usr: "Yes, and I want the late show."
Int: informPositive(theatre=Ridge) + informExtraValue(time=9PM)

In Table 2, we present the REC-primitives calculated in response to the same system turns as in Dialogue 1. In Dialogue 2, only over-answering of requestValue() primitives was allowed, so "R I D G E" could still not be accounted for.
3.3 Complex Mixed Initiative 
Here we consider the most complex dialogue strategy that we can currently offer: the system is able to account for complex mixed initiative dialogue (at least from a dialogue point of view), i.e., the user can request clarifications, over-answer, change values, repeat values, correct values, spell values, and reject requests as she pleases.
Dialogue 3: Complex Mixed Initiative
Sys: "Which film do you want to see?"   requestValue(film)
Usr: "Sorry, did you ask for the time?"
Int: requestParam(time)
Sys: "No. Which film do you want to see?"   informNegative(time) + requestValue(film, 2)
Usr: "Matrix. At the Ridge."
Int: informValue(film=Matrix) + informExtraValue(theatre=The Ridge)
Sys: "Did you say The Ridge?"   requestConfirm(theatre=Ridge, 1)
Usr: "Sorry, I didn't hear that."
Int: rejectRequest(theatre=didNotHear)
Sys: "Did you say The Ridge?"   requestConfirm(theatre=Ridge, 2)
Usr: "Yes, The Ridge. R I D G E."
Int: informPositive(theatre=Ridge) + repeatValue(theatre=Ridge) + informValueABC(theatre=Ridge)
Sys: "Ok. What time?"   evaluate(theatre=Ridge) + requestValue(time)
Usr: "I don't know. What are the alternatives?"
Int: requestAlternatives(time)
Sys: "18:30 or 21:00."   informAlternative(time=18:30) + informAlternative(time=21:00)
Usr: "Ok, two tickets for the late show tomorrow."
Int: evaluate(time=neutral) + informExtraValue(noOfTickets=2) + informValue(time=21:00) + informExtraValue(date=July 4)
Sys: "Did you say two tickets?"   requestConfirm(noOfTickets=2)
Usr: "Yes, but I change to the early show."
Int: informPositive(noOfTickets=2) + changeValue(time=18:30)
In Table 3, we present the REC-primitives cal- 
culated in response to two system utterances. 
3.4 Multi-Functional Turns 
It has been argued that speech act grammars cannot be used to describe dialogue since utterances can be multi-functional or encode more than one speech act; speech act grammars can typically be in only one state at a time, so they cannot capture this phenomenon (Levinson, 1981). In an information retrieval setting, such situations occur, for example, when users disregard the system utterance and provide unrelated information, or when a recognition mistake occurred and the system asks for confirmation. Instead of answering yes or no, users frequently answer with the correct value, which implicitly disconfirms the previous value:
Dialogue 4: Multi-Functional Utterances
Sys: "How many tickets?"   requestValue(noOfTickets)
Usr: "I want tickets for July 4."
Int: rejectRequest(noOfTickets) + informExtraValue(date=July 3)
Sys: "Did you say July 3?"   requestConfirm(date=July 3)
Usr: "Tomorrow!"
Int: informNegative(date=July 3) + correctValue(date=July 4)
In the first utterance, the user both ignores the system utterance and provides some information. In the second one, she negates and corrects the system suggestion with a single word.
Table 3: REC-primitives calculated in response to two GEN-primitives in Dialogue 3.
GEN-primitive                    REC-primitives
requestValue(film)               informValue(film)
                                 informValueABC(film)
                                 requestAlternatives(film)
                                 promise(film)
                                 rejectRequest(film=v) [a]
                                 informGarbage(film)
                                 requestParam(p) [b]
                                 informExtraValue(p) [b]
                                 informValueABC(p) [b]
                                 repeatValue(p) [c]
                                 changeValue(p) [c]
                                 withdrawAccept(aTask=v) [d]
requestConfirm(theatre=Ridge)    informPositive(theatre=Ridge)
                                 repeatValue(theatre=Ridge)
                                 informNegative(theatre=Ridge)
                                 correctValue(theatre)
                                 informValueABC(theatre)
                                 rejectRequest(theatre=v) [a]
                                 informGarbage(theatre)
                                 informExtraValue(p) [b]
                                 informValueABC(p) [b]
                                 repeatValue(p) [c]
                                 changeValue(p) [c]
                                 withdrawAccept(aTask=v) [d]
[a] ∀v ∈ {null, didNotHear}; [b] ∀p ∈ openParams(AD); [c] ∀p ∈ closedParams(AD); [d] ∀v ∈ {cancel, hangup}
Since we are not using the speech act grammar 
directly and instead interpret the speech acts into 
a bag of primitives, we can assign as many prim- 
itives to an utterance as necessary and are not 
bound by the states dictated by a grammar. This 
aspect of our approach becomes even more inter- 
esting when the system combines several primi- 
tives in its utterance (Section 4). 
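To illustrate, here is a hypothetical Python sketch of a pragmatic-interpreter step for replies to requestConfirm(p=v). Its rules (a different value counts as an implicit no and as a correction; other recognized parameters become extra values) are our simplification of Dialogues 4 and 5, not the system's rule set; what it shows is that one utterance maps onto several predicted primitives at once.

```python
from typing import Dict, List, Optional, Set, Tuple


def interpret_reply(param: str, proposed: str, yes_no: Optional[str],
                    synsem: Dict[str, str],
                    predicted: Set[Tuple[str, str]]) -> List[str]:
    """Map one reply to requestConfirm(param=proposed) onto a bag of primitives.

    yes_no: "yes", "no" or None if the user said neither.
    synsem: parameter -> value pairs recognized in the reply.
    predicted: (primName, parameter) pairs the engine predicted for this turn.
    """
    said = synsem.get(param)
    disconfirmed = yes_no == "no" or (said is not None and said != proposed)
    confirmed = not disconfirmed and (yes_no == "yes" or said == proposed)
    candidates: List[Tuple[str, str, str]] = []
    if confirmed:
        candidates.append(("informPositive", param, proposed))
    elif disconfirmed:
        # Giving a different value implicitly disconfirms the proposal.
        candidates.append(("informNegative", param, proposed))
    if said is not None and said != proposed:
        candidates.append(("correctValue", param, said))
    for other, value in synsem.items():
        if other != param:
            candidates.append(("informExtraValue", other, value))
    # Keep every candidate that was predicted: a bag, not a single grammar state.
    return [f"{n}({p}={v})" for n, p, v in candidates if (n, p) in predicted]


predicted = {("informPositive", "date"), ("informNegative", "date"),
             ("correctValue", "date"), ("informExtraValue", "time")}
# Dialogue 4: requestConfirm(date=July 3) answered with "Tomorrow!" (= July 4).
bag = interpret_reply("date", "July 3", None, {"date": "July 4"}, predicted)
```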
4 Dialogue Strategies 
Although the procedure outlined in Section 2.4 only shows how to calculate one primitive per system turn, the approach is, of course, not limited to this. The service designer can decide to employ mixed initiative dialogue strategies for the system utterances as well, for example, requesting or confirming several values at once or implicitly confirming values. The dialogue strategies for system utterances include choosing nodes in the application description, dealing with speech recognition results, and dealing with ambiguous data from other knowledge sources. Here we present a few examples of how the dialogue manager would combine hypotheses (for more information see (Hagen, 2001)).
4.1 Confirmation Strategies 
We illustrate implicit and multiple confirmation, i.e., the system realizes requestValue and requestConfirm or multiple requestConfirm primitives in one utterance:

Dialogue 5: Confirmation Strategies
Sys: "Which showing of The Matrix do you want?"   requestValue(time) + requestConfirm(film=Matrix)
Usr: "(No.) Buena Vista!"
Int: informNegative(film=The Matrix) + correctValue(film=Buena Vista) + rejectRequest(time)
Sys: "Which showing of Buena Vista do you want?"   requestConfirm(film=Buena Vista) + requestValue(time)
Usr: "The late show. Tomorrow."
Int: informPositive(film=Buena Vista) + informValue(time=21:00) + informExtraValue(date=October 8)
Sys: "Did you say 21:00 today?"   requestConfirm(time=21:00) + requestConfirm(date=October 7)
Usr: "No. Tomorrow."
Int: informPositive(time=21:00) + informNegative(date=October 7) + correctValue(date=October 8)
For the first two utterances, the system has a recognition result for the parameter film with a low recognition score. Consequently, it calculates requestConfirm(film=Matrix/Buena Vista). Additionally, there are still open parameter nodes in the AD, so the dialogue engine picks one (either at random or, if the service designer has ordered them, the next one) and calculates a requestValue primitive, here requestValue(time). If the service designer allows implicit confirmation, the two primitives are combined and uttered together in one turn. If the service designer does not allow implicit confirmation, the dialogue engine continues the dialogue with the topic that has already been introduced, i.e., requestConfirm(film=Matrix/Buena Vista).⁵
For its last utterance, the system has two recognition results with a low recognition score, so it calculates a requestConfirm primitive for each of them. If the service designer allows multiple confirmations, they are combined and realized as one utterance. If not, the dialogue engine chooses requestConfirm(time=21:00), since this topic was introduced first. If topics are introduced in the same utterance, it picks one at random.
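The selection logic described in the two paragraphs above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Hypothesis`, `plan_turn`, `implicit_ok` and `multiple_ok` are hypothetical, primitives are plain strings, open AD nodes are assumed to arrive already in the designer's order, and "introduced first" is modelled as a turn counter.

```python
import random
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A recognized parameter value whose recognition score was too low to accept."""
    param: str
    value: str
    introduced_at: int  # hypothetical: turn number in which the topic was introduced


def plan_turn(low_score, open_params, implicit_ok, multiple_ok, rng=random):
    """Return the GEN-primitives realized together in the next system turn.

    low_score:   hypotheses that still need confirmation
    open_params: open AD parameter nodes, assumed to be in the designer's order
    implicit_ok / multiple_ok: stand-ins for the service designer's switches
    """
    turn = []
    if low_score:
        if multiple_ok:
            chosen = list(low_score)  # confirm all low-score values in one utterance
        else:
            first = min(h.introduced_at for h in low_score)
            # The earliest-introduced topic wins; ties (same utterance) are random.
            chosen = [rng.choice([h for h in low_score if h.introduced_at == first])]
        turn += [f"requestConfirm({h.param}={h.value})" for h in chosen]
        if not implicit_ok:
            return turn  # requestValue is not even calculated
    if open_params:
        turn.append(f"requestValue({open_params[0]})")
    return turn


if __name__ == "__main__":
    print(" + ".join(plan_turn([Hypothesis("film", "Matrix", 1)], ["time"], True, False)))
    # requestConfirm(film=Matrix) + requestValue(time)
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible in tests.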
5 This is a conceptual account. In the implementation, the requestValue primitive would not be calculated at all if the service designer does not allow implicit confirmation.
4.2 AD Based Strategies
When requesting parameter values from the user, the system consults the application description for open nodes. If there are several open nodes, the
dialogue manager can decide to keep the initiative 
and produce several primitives, which can be com- 
bined into one turn. If the nodes are joined with 
an or-relation, the text generator would trans- 
late the primitives into an utterance offering al- 
ternative ways of entering the same information. 
For example, "Please tell me the show time or early or late show." (requestValue(time) + requestValue(namedTime)). If the nodes are joined with an and- or a has-a relation, the text generator would translate the primitives into an utterance requesting several different pieces of information. For example, "What is the name of the city and the theatre?" (requestValue(city) + requestValue(theatre)).
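The dependence of the surface form on the AD relation can be illustrated with a toy realizer. This is a sketch only: the `PHRASES` table and `realize_requests` function are hypothetical, the wording is English-only, and in the described architecture a separate language dependent text generator would produce the utterance.

```python
# Hypothetical English phrase for each parameter; in the described system the
# language dependent text generator, not a fixed table, would supply these.
PHRASES = {
    "time": "the show time",
    "namedTime": "early or late show",
    "city": "the city",
    "theatre": "the theatre",
}


def realize_requests(params, relation):
    """Combine requestValue primitives for several open AD nodes into one turn.

    relation is the AD relation joining the nodes: "or" (alternative ways of
    entering the same information) or "and"/"has-a" (different pieces of
    information). Returns the GEN-primitives and a surface string.
    """
    primitives = [f"requestValue({p})" for p in params]
    phrases = [PHRASES[p] for p in params]
    if relation == "or":
        text = "Please tell me " + " or ".join(phrases) + "."
    elif relation in ("and", "has-a"):
        text = "Please tell me " + " and ".join(phrases) + "."
    else:
        raise ValueError(f"unknown AD relation: {relation!r}")
    return primitives, text


if __name__ == "__main__":
    print(realize_requests(["time", "namedTime"], "or")[1])
    # Please tell me the show time or early or late show.
    print(realize_requests(["city", "theatre"], "has-a")[1])
    # Please tell me the city and the theatre.
```

The primitives themselves stay language independent; only the final string would change when another language's generator is plugged in.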
As seen in the application descriptions, there may be several ways of acquiring a particular value, e.g., date and time in Figure 2. If a parameter value is recognized with a low score, the service designer can decide whether the system shall continue processing the original parameter or switch to one of the alternative ones. Thus, after a poor recognition of date, the system can switch strategy and request weekDay and week instead.
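The strategy switch can be sketched as a small decision function. The `ALTERNATIVES` map, the `after_low_score` name and the `switch_strategy` flag are hypothetical stand-ins for the AD and the designer's switch; the text does not say what "continue processing the original parameter" involves, so this sketch assumes it means confirming the low-score value as in Section 4.1.

```python
# Hypothetical AD fragment: parameters that offer an alternative acquisition
# path, mapped to the parameters that replace them (cf. date vs. weekDay + week).
ALTERNATIVES = {"date": ["weekDay", "week"]}


def after_low_score(param, value, switch_strategy, alternatives=ALTERNATIVES):
    """Next GEN-primitives after `param` was recognized as `value` with a low score.

    switch_strategy stands in for the service designer's switch (hypothetical name).
    """
    if switch_strategy and param in alternatives:
        # Abandon the original parameter and ask for the alternative ones.
        return [f"requestValue({p})" for p in alternatives[param]]
    # Otherwise keep processing the original parameter, assumed here to mean confirming it.
    return [f"requestConfirm({param}={value})"]


if __name__ == "__main__":
    print(after_low_score("date", "October 7", switch_strategy=True))
    # ['requestValue(weekDay)', 'requestValue(week)']
```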
Which strategies to follow is decided by the ser- 
vice designer through a set of switches in the dia- 
logue strategies specification file (Figure 1). 
5 Pragmatic Interpreter 
After synsem interpretation, the user utterance 
must be mapped onto dialogue primitives. A bag 
of REC-primitives is calculated for each user utterance, and the pragmatic interpreter must ensure that the utterance is mapped onto primitives in this bag. There is always a mapping: the reject and withdraw primitives are always part of the bag, so in the worst case the user utterance is mapped onto one of these.
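The guarantee that a mapping always exists can be made concrete with a filter over the bag. In this sketch, `RecPrimitive` and `map_onto_bag` are hypothetical names, the bag holds uninstantiated predictions as (type, parameter) pairs, and falling back to reject rather than withdraw is an assumption, since the text does not say which of the two is chosen.

```python
from typing import NamedTuple, Optional


class RecPrimitive(NamedTuple):
    """An instantiated REC-primitive derived from synsem interpretation."""
    kind: str                    # e.g. "informValue", "correctValue", "reject"
    param: Optional[str] = None
    value: Optional[str] = None


def map_onto_bag(candidates, bag):
    """Keep only the candidates whose (kind, param) the bag predicts.

    reject and withdraw are always added to the bag, so every utterance gets
    a mapping; the fallback to reject below is an assumption of this sketch.
    """
    bag = set(bag) | {("reject", None), ("withdraw", None)}
    mapped = [c for c in candidates if (c.kind, c.param) in bag]
    return mapped or [RecPrimitive("reject")]


if __name__ == "__main__":
    bag = {("informValue", "time"), ("informPositive", "film")}
    print(map_onto_bag([RecPrimitive("informValue", "city", "Burnaby")], bag))
    # [RecPrimitive(kind='reject', param=None, value=None)]
```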
Since primitives in their uninstantiated form are 
application independent, we can develop generic 
rules for this mapping. In other words, the rules 
define how the dialogue strategies presented in 
Section 3 are mapped onto primitives and how we 
account for several primitives per utterance. 
A rule has the form: GEN-primitives $\wedge$ user utterance $\Rightarrow$ REC-primitives. In Table 4, we present two rules for implicit confirmation. The first one corresponds to the first sys/usr pair in Dialogue 5. The user responds with a new value $v'_2$ (Buena Vista) for $p_2$ (film) in requestConfirm($p_2=v_2$) and thereby disconfirms $v_2$ (Matrix) and rejects the request for a value for $p_1$ (time) in requestValue($p_1$). The second rule corresponds to the second sys/usr pair in Dialogue 5. The user provides value $v_1$ (the late show) for $p_1$ (time) in requestValue($p_1$) and thus confirms $v_2$ (Buena Vista) in requestConfirm($p_2=v_2$). For instantiation of the primitives, see Dialogue 5. Here, we only present two examples. Similar rules were developed for all our primitives and dialogue strategies (see Hagen, 2001).

Rule 1
  GEN-primitives: requestValue($p_i$), $\forall i: 1 \le i \le max_i$; requestConf.($p_j=v_j$), $\forall j: 1 \le j \le max_j$
  Input: (no) $p_j=v'_j$, $v'_j \ne v_j$, $\forall j: 1 \le j \le k \le max_j$
  REC-primitives: informNeg.($p_j=v_j$), $\forall j: 1 \le j \le k$; correctVal.($p_j=v'_j$), $\forall j: 1 \le j \le k$; informPos.($p_j=v_j$), $\forall j: k < j \le max_j$; rejectRequest($p_i$), $\forall i: 1 \le i \le max_i$
Rule 2
  GEN-primitives: requestValue($p_i$), $\forall i: 1 \le i \le max_i$; requestConf.($p_j=v_j$), $\forall j: 1 \le j \le max_j$
  Input: $p_i=v_i$, $\forall i: 1 \le i \le k \le max_i$
  REC-primitives: informVal.($p_i=v_i$), $\forall i: 1 \le i \le k$; informPos.($p_j=v_j$), $\forall j: 1 \le j \le max_j$; rejectRequest($p_i$), $\forall i: k < i \le max_i$
Table 4: Mapping of user input onto REC-primitives; p=v means that the user provided value v for parameter p.
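The two Table 4 rules translate directly into code. The sketch below is an illustration under assumptions, not the authors' rule engine: `implicit_confirmation_rec` is a hypothetical name, an optional leading "no" is not modelled because it does not change the result of these two rules, and inputs covered by neither rule (for example the extra date value in the second pair of Dialogue 5) simply yield `None`.

```python
def implicit_confirmation_rec(requested, confirmed, user_values):
    """Apply the two Table 4 rules for implicit confirmation.

    requested:   parameters p_i of the requestValue primitives in the system turn
    confirmed:   {p_j: v_j} of the requestConfirm primitives in the same turn
    user_values: {param: value} extracted from the user utterance
    Returns the REC-primitives, or None if neither rule applies.
    """
    corrected = {p: v for p, v in user_values.items()
                 if p in confirmed and v != confirmed[p]}
    answered = {p: v for p, v in user_values.items() if p in requested}

    if corrected and len(corrected) == len(user_values):
        # Rule 1: new values v'_j for some confirmed parameters.
        rec = []
        for p, v in corrected.items():
            rec += [f"informNegative({p}={confirmed[p]})", f"correctValue({p}={v})"]
        rec += [f"informPositive({p}={v})" for p, v in confirmed.items() if p not in corrected]
        rec += [f"rejectRequest({p})" for p in requested]
        return rec

    if answered and len(answered) == len(user_values):
        # Rule 2: values for requested parameters implicitly confirm all v_j.
        rec = [f"informValue({p}={v})" for p, v in answered.items()]
        rec += [f"informPositive({p}={v})" for p, v in confirmed.items()]
        rec += [f"rejectRequest({p})" for p in requested if p not in answered]
        return rec

    return None  # covered by other rules, not sketched here


if __name__ == "__main__":
    print(implicit_confirmation_rec(["time"], {"film": "Matrix"}, {"film": "Buena Vista"}))
    # ['informNegative(film=Matrix)', 'correctValue(film=Buena Vista)', 'rejectRequest(time)']
```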
One reviewer asked whether we can modify the approach such that expectations can be overridden if there is sufficiently good information from the synsem module. The short answer is that we could (re)calculate the primitives as if the service designer had allowed mixed initiative, regardless of the dialogue strategies actually chosen. We think, however, that it is important to give the service designer the right to decide. For example, if she has decided that over-answering is allowed, informExtraValue() primitives are calculated for all parameters whose status is still open, and thus there is nothing to override. If, however, the service designer has decided that over-answering is not allowed, we assume that she had good reasons for doing so, and the dialogue manager will not try to overrule this decision.
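How the over-answering switch shapes the predictions can be sketched as follows. The function `rec_bag` and its argument names are hypothetical, and the responses predicted for each GEN-primitive are a simplified subset chosen for illustration; only the two behaviours stated in the text are taken from the paper: reject and withdraw are always predicted, and informExtraValue is predicted for every still-open parameter when over-answering is allowed.

```python
def rec_bag(gen_turn, open_params, over_answering_ok):
    """Sketch of the uninstantiated REC-primitives expected after a system turn.

    gen_turn lists (kind, param) GEN-primitives; open_params lists AD parameters
    whose status is still open. The per-primitive responses are a simplified,
    hypothetical subset of the full prediction rules.
    """
    bag = {("reject", None), ("withdraw", None)}  # always part of the bag
    for kind, param in gen_turn:
        if kind == "requestValue":
            bag |= {("informValue", param), ("rejectRequest", param)}
        elif kind == "requestConfirm":
            bag |= {("informPositive", param), ("informNegative", param),
                    ("correctValue", param)}
    if over_answering_ok:
        # Over-answering allowed: predict an extra value for every open parameter.
        bag |= {("informExtraValue", p) for p in open_params}
    return bag
```

With the switch off, no informExtraValue prediction exists, so an interpreter that only maps onto the bag can never accept an unsolicited value; this is the designer's decision that the dialogue manager respects.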
6 Conclusion 
We have presented some results from our research 
on spoken dialogue management. We concen- 
trated on how to dynamically calculate a collec- 
tion of predictions for how to continue a dialogue 
(dialogue primitives), how to account for differ- 
ent dialogue strategies and utterances with sev- 
eral communicative goals through combinations of 
primitives, and how to map the user utterances 
onto primitives. The approach has been imple- 
mented and tested in several prototype systems, 
e.g., horoscope, movie, and telephone rate service 
(Feldes et al., 1998). 
Dialogue grammars have previously been used 
to manage dialogue (Bunt, 1989; Bilange, 1991; 
Traum and Hinkelman, 1992; Jönsson, 1993; Mast
et al., 1994; Novick and Sutton, 1994; Chino and 
Tsuboi, 1996), but we are not aware of an ap- 
proach where speech acts are translated into a 
collection of primitives with propositional content. 
Previous grammar approaches use the speech acts 
directly or assume a one-to-one correspondence 
between utterance and speech act. 
Through the natural division of the knowledge into type and content, we have achieved a flexible dialogue manager that adapts to users' behaviour. We can take advantage of the predictive capabilities of speech act grammars and still be able to account for multi-functional utterances.
We have also demonstrated that our approach 
is flexible: 1. the dialogue engine, the pragmatic 
interpreter, the primitives and the algorithm for 
mapping user utterances onto predictions are application and language independent, which makes
it easy to reuse our dialogue manager in new ap- 
plications, and 2. the dialogue manager can easily 
account for several types of dialogue, e.g., strict 
question-answer or mixed initiative. We give the 
service designer the freedom to decide, on a high level, which kind of dialogue she wants, and the dialogue manager combines the basic primitives accordingly.
Future work includes empirical testing to ver- 
ify whether we are calculating appropriate predic- 
tions. Also, several aspects of our dialogue gram- 
mar have not yet been translated into primitives, 
for example, the frequent use of assert in natural dialogue. As wider dialogue coverage is required, we will add primitives accordingly. We are also working on using the primitives as input to a multi-lingual automatic text generation system.
Acknowledgements 
The author thanks the three anonymous reviewers 
for their helpful comments on the first draft of 
this paper. Financial support from the Norwegian 
Research Council, project number 116578/410 is 
greatly appreciated. 

References 
J.A. Bateman, B. Magnini, and F. Rinaldi. 1994. 
The generalized {Italian, German, English} upper model. In Proc. of the ECAI'94 Workshop:
Comparison of Implemented Ontologies, Ams- 
terdam, The Netherlands. 
J.A. Bateman, B. Magnini, and G. Fabris.
1995. The generalized upper model knowledge 
base: Organization and use. In Proc. of the 
Conf. on Knowledge Representation and Shar- 
ing, Twente, The Netherlands. 
E. Bilange. 1991. A task independent oral dialogue model. In Proc. of the Euro. Conf. of the
ACL, pages 83-87. 
H.C. Bunt. 1989. Information dialogues as communicative action in relation to partner modeling and information processing. In M.M. Taylor, F. Neel, and D.G. Bouwhuis, editors, The
Structure of Multimodal Dialogue, pages 47-73. 
North-Holland, Amsterdam. 
J. Caminero-Gil, J. Alvarez-Cercadillo, C. Crespo-Casas, and D. Tapias-Merino. 1996. Data-driven discourse modeling for semantic interpretation. In Proc. of the 1996 Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP'96), pages 401-404.
T. Chino and H. Tsuboi. 1996. A new discourse model for spontaneous spoken dialogue. In Proc. of the 1996 Intl. Conf. on Spoken Language Processing (ICSLP'96), pages 1021-1024.
S. Feldes, G. Fries, E. Hagen, and A. Wirth. 1998. 
A novel service creation environment for speech enabled database access. In Proc. 4th IEEE
Workshop on Interactive Voice Technology for 
Telecommunications Applications (IVTTA '98), 
29-30 Sept. 1998, Torino, Italy. 
E. Hagen. 1999. An approach to mixed initiative spoken information retrieval dialogue.
User Modeling and User-Adapted Interaction, 
9(1/2):167-213. 
E. Hagen. 2001. Mixed Initiative Spoken Dialogue 
Management in Information Systems. Ph.D. 
thesis, School of Computing Science, Simon 
Fraser University, Burnaby, BC, Canada. Jan. 
2001 expected. 
P. Heisterkamp and S. McGlashan. 1996. Units 
of dialogue management. In Proc. of the 1996 
Intl. Conf. on Spoken Language Processing (IC- 
SLP'96). 
A. Jönsson. 1993. A dialogue manager using initiative-response units and distributed control. In Proc. of 6th Euro. Conf. of the ACL,
pages 233-238. 
S.C. Levinson. 1981. Some pre-observations on 
the modelling of dialogue. Discourse Processes, 
4:93-116. 
M. Mast, F. Kummert, U. Ehrlich, G.A. Fink, 
T. Kuhn, H. Niemann, and G. Sagerer. 1994. 
Prosody takes over: Towards a prosodically 
guided dialog system. Speech Communication, 
15(1-2):155-167. 
D.G. Novick and S. Sutton. 1994. An empirical 
model of acknowledgement for spoken-language systems. In Proc. of the 32nd Annual Meeting
of the ACL, pages 96-101. 
S. Sitter and A. Stein. 1992. Modelling the illocutionary aspects of information-seeking dialogues. Information Processing and Management, 28(2):165-180.
D. Traum and E. Hinkelman. 1992. Conversation
acts in task-oriented spoken dialogue. Compu- 
tational Intelligence, 8(3):575-599. 
S. Young, A. Hauptmann, W. Ward, E. Smith, 
and P. Werner. 1990. High-level knowledge 
sources in usable speech recognition systems. 
In A. Waibel and K. Lee, editors, Readings in Speech Recognition, pages 538-549. Morgan Kaufmann, San Mateo, CA.
