The CommandTalk Spoken Dialogue System* 
Amanda Stent, John Dowding 
Jean Mark Gawron, Elizabeth Owen Bratt, and Robert Moore 
SRI International 
333 Ravenswood Avenue 
Menlo Park, CA 94025 
{stent,dowding,gawron,owen,bmoore}@ai.sri.com 
1 Introduction 
CommandTalk (Moore et al., 1997) is a spoken- 
language interface to the ModSAF battlefield 
simulator that allows simulation operators to 
generate and execute military exercises by cre- 
ating forces and control measures, assigning 
missions to forces, and controlling the display 
(Ceranowicz, 1994). CommandTalk consists 
of independent, cooperating agents interacting 
through SRI's Open Agent Architecture (OAA) 
(Martin et al., 1998). This architecture allows 
components to be developed independently, and 
then flexibly and dynamically combined to sup- 
port distributed computation. Most of the 
agents that compose CommandTalk have been 
described elsewhere !for more detail, see (Moore 
et al., 1997)). This paper describes extensions 
to CommandTalk to support spoken dialogue. 
While we make no theoretical claims about the 
nature and structure of dialogue, we are influ- 
enced by the theoretical work of (Grosz and 
Sidner, 1986) and will use terminology from 
that tradition when appropriate. We also follow 
(Chu-Carroll and Brown, 1997) in distinguish- 
ing task initiative and dialogue initiative. 
Section 2 demonstrates the dialogue capabil- 
ities of CommandTalk by way of an extended 
example. Section 3 describes how language 
in CommandTalk is modeled for understanding 
and generation. Section 4 describes the archi- 
tecture of the dialogue manager in detail. Sec- 
tion 5 compares CommandTalk with other spo- 
* This research was supported by the Defense Advanced 
Research Projects Agency under Contract N66001-94-C- 
6046 with the Space and Naval Warfare Systems Cen- 
ter. The views and conclusions contained in this doc- 
ument are those of the authors and should not be in- 
terpreted as necessarily representing the official policies, 
either express or implied, of the Defense Advanced Re- 
search Projects Agency of the U.S. Government. 
ken dialogue systems. 
2 Example Dialogues 
The following examples constitute a single ex- 
tended dialogue illustrating the capabilities of 
the dialogue manager with regard to structured 
dialogue, clarification and correction, changes in 
initiative, integration of speech and gesture, and 
sensitivity to events occurring in the underlying 
simulated world. 1 
Ex. 1-" 
U 1 
S 2 
U 3 
S 4 
U 5 
S 6 
Confirmation 
Create a point named Checkpoint 
1 at 64 53 
® 
Create a CEV at Checkpoint 1 
® 
Create a CEV here < click> 
® I will create CEV at FQ 643 576 
Utterances 1 and 3 illustrate typical success- 
ful interactions between an operator and the 
system. When no exceptional event occurs, 
CommandTalk does not respond verbally. How- 
ever, it does provide an audible tone to indicate 
that it has completed processing. For a suc- 
cessful command, it produces a rising tone, il- 
lustrated by the ® symbol in utterances 2 and 
4. For an unsuccessful command it produces a 
falling tone, illustrated by the ® symbol in ut- 
terances 12 and 14. 2 
1U indicates a user utterance as recognized by the 
system, and S indicates a system utterance. Bracketed 
items in italics are gestures. 
2The rising and falling tones come from the predia- 
logue version of the system. They were added because 
we observed that users did not always notice when the 
system had not understood them correctly, and a tex- 
tual error message alone did not always get the user's 
attention. These tones also perform basic grounding be- 
havior. 
183 
Utterance 6 demonstrates a case where, al- 
though the system successfully completed the 
command, it chose to provide an explicit con- 
firmation. Explicit confirmations can be given 
at any time. In this case, the system chose 
to give the confirmation because it performed 
a nontrivial reference, resolving "here" to the 
map coordinates given by the gesture, FQ 643 
576. Similar situations in which the system 
gives an explicit confirmation are the resolution 
of pronouns and elided, definite or plural noun 
phrases. 
Ex. 2: Correction 
U 7 Put Objective Golf here <click> 
S 8 ® I will locate Objective Golf at FQ 
658 583 
U 9 I meant Objective Alpha 
S 10 ® I will locate Objective Alpha at 
FQ 658 583 
Example 2 contains a correction sub-dialogue. 
Utterance 9 is a correction of utterance 7, and 
is interpreted as though the operator had said 
"Put Objective Alpha here". This illustrates 
two points. First, since utterance 7 was suc- 
cessful, the system undoes its effects (that is, 
deletes Objective Golf) before creating Objec- 
tive Alpha. Second, although the edited utter- 
ance contains the word "here", the gesture that 
was used to resolve that is no longer available. 
The system keeps track of gestural information 
along with linguistic information in its repre- 
sentation of context in order to interpret cor- 
rections. 
Ex. 3: Structured Sub-dialogue 
U 11 CEV advance to Objective Golf 
S 12 ® There are 2 CEVs. Do you mean 
100All or 100A127 
U 13 100All 
S 14 @ There is no point named Ob- 
jective Golf. Which point should 
100All proceed to? 
U 15 I said Objective Alpha 
S 16 ® 100All will proceed to Objective 
Alpha 
Example 3 illustrates a structured discourse 
segment containing two subsegments. Utter- 
ance 11 is uninterpretable for two reasons: the 
reference to "CEV" is ambiguous, and Objec- 
tive Golf does not exist. The first difficulty 
is resolved in discourse segment 12-13, and the 
second in discourse segment 14-16. Notice that 
the operator is not required to answer the ques- 
tion posed by the system in utterance 14, but 
is free to correct the system's misunderstanding 
of utterance 11 even though it is not the im- 
mediately prior utterance. This is true because 
utterance 13 (the most recent utterance) is in- 
terpreted as if the operator had said "100All 
advance to Objective Golf". 
Ex. 4: Required Argument 
U 17 A12 conduct crater breach facing 
north 
S 18 ® Which point should 100A12 con- 
duct crater breach oriented north 
at? 
U 19 <click> 
S 20 ® 100A12 will conduct crater 
breach oriented north at FQ 542 
595 
Example 4 demonstrates a case where, al- 
though there are no errors in the operator's ut- 
terance, the system requires additional informa- 
tion before it can execute the command. Also 
note that the question asked by the system in 
utterance 18 is answered with an isolated ges- 
ture. 
Ex. 5: Delayed Response 
U 21 A13 continue to Checkpoint 1 in a 
column formation. 
S 22 ® There is no A13. Which unit 
should proceed in a column forma- 
tion to Checkpoint 17 
U 23 What units are there? 
S 24 ® There are 2 units: 100All and 
100A12. 
U 25 All proceed to Checkpoint 1. 
S 26 ® 100All will proceed in a column 
formation to Checkpoint 1. 
In example 5, the system asks a question but 
the operator needs to perform some other ac- 
tivity before answering it. The question asked 
by the system in utterance 22 is answered by 
the operator in utterance 25. Due to the in- 
tervening material, the most natural way to an- 
swer the question posed in utterance 22 is with a 
154 
complete sentence, rather than an isolated noun 
phrase. 
Ex. 6: Changes in the Simulated World 
U 27 A12 advance to Objective Bravo. 
S 28 ® There is no Objective Bravo. 
What point should 100A12 ad- 
vance to? 
U 29 Create it at 635 545. 
S 30 ® 
S 31 Should 100A12 proceed to Objec- 
tive Bravo? 
U 32 Yes. 
S 33 ® 100A12 will proceed to Objective 
Bravo. 
Example 6 demonstrates the use of a guard, 
or test to see if a situation holds. In utterance 
27, a presupposition failure occurs, leading to 
the open proposition expressed in utterance 28. 
A guard, associated with the open proposition, 
tests to see if the system can successfully resolve 
"Objective Bravo". Rather than answering the 
question in utterance 28, the operator chooses 
to create Objective Bravo. The system then 
tests the guard, which succeeds because Objec- 
tive Bravo now exists. The system therefore 
takes dialogue initiative by asking the operator 
in utterance 31 if that operator would like to 
carry out the original command. Although, in 
this case, the simulated world changed in direct 
response to a linguistic act, in general the world 
can change for a variety of reasons, including the 
operator's activities on the GUI or the activities 
of other operators. 
3 Language Interpretation and 
Generation 
The language used in CommandTalk is derived 
from a single grammar using Gemini (Dowding 
et al., 1993), a unification-based grammar for- 
malism. This grammar is used to provide all the 
language modeling capabilities of the system, 
including the language model used in the speech 
recognizer, the syntactic and semantic interpre- 
tation of user utterances (Dowding et al., 1994), 
and the generation of system responses (Shieber 
et al., 1990). 
For speech recognition, Gemini uses the Nu- 
ance speech recognizer. Nuance accepts lan- 
guage models written in a Grammar Speci- 
fication Language (GSL) format that allows 
context-free, as well as the more commonly used 
finite-state, models. 3 Using a technique de- 
scribed in (Moore, 1999), we compile a context- 
free covering grammar into GSL format from 
the main Gemini grammar. 
This approach of using a single grammar 
source for both sides of the dialogue has sev- 
eral advantages. First, although there are differ- 
ences between the language used by the system 
and that used by the speaker, there is a large de- 
gree of overlap, and encoding the grammar once 
is efficient. Second, anecdotal evidence suggests 
that the language used by the system influences 
the kind of language that speakers use in re- 
sponse. This gives rise to a consistency problem 
if the language models used for interpretation 
and generation are developed independently. 
The grammar used in CommandTalk contains 
features that allow it to be partitioned into 
a set of independent top-level grammars. For 
instance, CommandTalk contains related, but 
distinct, grammars for each of the four armed 
services (Army, Navy, Air Force, and Marine 
Corps). The top-level grammar currently in use 
by the speech recognizer can be changed dy- 
namically. This feature is used in the dialogue 
manager to change the top-level grammar, de- 
pending on the state of the dialogue. Currently 
in CommandTalk, for each service there are two 
main grammars, one in which the user is free to 
give any top-level command, and another that 
contains everything in the first grammar, plus 
isolated noun phrases of the semantic types that 
can be used as answers to wh-questions, as well 
as answers to yes/no questions. 
3.1 Prosody 
A separate Prosody agent annotates the sys- 
tem's utterances to provide cues to the speech 
synthesizer about how they should be produced. 
It takes as input an utterance to be spoken, 
along with its parse tree and logical form. The 
output is an expression in the Spoken Text 
Markup Language 4 (STML) that annotates the 
locations and lengths of pauses and the loca- 
tions of pitch changes. 
3GSL grammars that are context-free cannot contain 
indirect left-recursion. 
4See http ://www. cstr. ed. ac. uk/proj ect s/ssml. 
html for details. 
185 
3.2 Speech Synthesis 
Speech synthesis is performed by another agent 
that encapsulates the Festival speech synthe- 
sizer. Festival 5 was developed by the Centre 
for Speech Technology Research (CSTR) at the 
University of Edinburgh. Festival was selected 
because it accepts STML commands, is avail- 
able for research, educational, and individual 
use without charge, and is open-source. 
4 Dialogue Manager 
The role of the dialogue manager in Com- 
mandTalk is to manage the representation of 
linguistic context, interpret user utterances 
within that context, plan system responses, 
and set the speech recognition system's lan- 
guage model. The system supports natural, 
structured mixed-initiative dialogue and multi- 
modal interactions. 
When interpreting a new utterance from the 
user, the dialogue manager considers these pos- 
sibilities in order: 
1. Corrections: The utterance is a correction 
of a prior utterance. 
2. Transitions/Responses: The utterance is a 
continuation of the current discourse seg- 
ment. 
3. New Commands/Questions: The utterance 
is initiating a new discourse segment. 
The following sections will describe the data 
structures maintained by the dialogue manager, 
and show how they are affected as the dialogue 
manager processes each of these three types of 
user utterances. 
4.1 Dialogue Stack 
CommandTalk uses a dialogue stack to keep 
track of the current discourse context. The 
dialogue stack attempts to keep track of the 
open discourse segments at each point in the 
dialogue. Each stack frame corresponds to one 
user-system discourse pair, and contains at least 
the following elements: 
• an atomic dialogue state identifier (see Sec- 
tion 4.2) 
5See http://~w, cstr. ed. ac. u.k/projects/ 
festival .htral for full information on Festival. 
• a semantic representation of the user's ut- 
terance(s) 
• a semantic representation of the system's 
response, if any 
• a representation of the background (i.e., 
open proposition) for the anticipated user 
response. 
• focus spaces containing semantic represen- 
tations of the items referred to in each sys- 
tem and user utterance 
a gesture space containing the gestures 
used in the interpretation of each user ut- 
terance 
• an optional guard 
The semantic representation of the system re- 
sponse is related to the background, but there 
are cases where the background may contain 
more information than the response. For ex- 
ample, in utterance 28 the system could have 
simply said "There is no Objective Bravo", and 
omitted the explicit follow-up question. In this 
case, the background may still contain the open 
proposition. 
Unlike in dialogue analyses carried out on 
completed dialogues (Grosz and Sidner, 1986), 
the dialogue manager needs to maintain a stack 
of all open discourse segments at each point in 
an on-going dialogue. When a system allows 
corrections, it can be difficult to determine when 
a user has completed a discourse segment. 
Ex. 7: Consecutive Corrections 
U 34 
S 35 
U 36 
S 37 
U 38 
S 39 
Center on Objective Charlie 
® There is no point named Objec- 
tive Charlie. What point should I 
center on? 
95 65 
® I will center on FQ 950 650 
I said 55 65 
® I will center on FQ 550 650 
In example 7, for instance, when the user an- 
swers the question in utterance 36, the system 
will pop the frame corresponding to utterances 
34-35 off the stack. However, the information in 
that frame is necessary to properly interpret the 
correction in utterance 38. Without some other 
mechanism it would be unsafe to ever pop a 
186 
frame from the stack, and the stack would grow 
indefinitely. Since the dialogue stack represents 
our best guess as to the set of currently open dis- 
course segments, we want to allow the system to 
pop frames from the stack when it believes dis- 
course segments have been closed. We make use 
of another representation, the dialogue trail, to 
let us to recover from these moves if they prove 
to be incorrect. 
The dialogue trail acts as a history of all di- 
alogue stack operations performed. Using the 
trail, we record enough information to be able 
to restore the dialogue stack to any previous 
configuration (each trail entry records one op- 
eration taken, the top of the dialog stack before 
the operation, and the top of the dialog stack 
after). Unlike the stack, the dialogue trail rep- 
resents the entire history of the dialogue, not 
just the set of currently open propositions. The 
fact that the dialogue trail can grow arbitrarily 
long has not proven to be a problem in practice 
since the system typically does not look past the 
top item in the trail. 
4.2 Finite State Machines 
Each stack frame in the dialogue manager con- 
tains a unique dialogue state identifier. These 
states form a collection of finite-state machines 
(FSMs), where each FSM describes the turns 
comprising a particular discourse segment. The 
dialogue stack is reminiscent of a recursive tran- 
sition network, in that the stack records the sys- 
tem's progress through a series of FSMs in par- 
allel. However, in this case, the stack operations 
are not dictated explicitly by the labels on the 
FSMs, but stack push operations correspond to 
the onset of a discourse segment, and stack pop 
operations correspond to the conclusion of a dis- 
course segment. 
Most of the FSMs currently used in Com- 
mandTalk coordinate dialogue initiative. These 
FSMs have a very simple structure of at most 
two states. For instance, there are FSMs rep- 
resenting discourse segments for clarification 
questions (utterances 23-24), reference failures 
(utterances 27-28), corrections (utterances 9- 
10), and guards becoming true (utterances 31- 
33). CommandTalk currently uses 22 such small 
FSMs. Although they each have a very simple 
structure, they compose naturally to support 
more complex dialogues. In these sub-dialogues 
the user retains the task initiative, but the sys- 
tem may temporarily take the dialogue initia- 
tive. This set of FSMs comprises the core dia- 
logue competence of the system. 
In a similar way, more complex FSMs can 
be designed to support more structured dia- 
logues, in which the system may take more of 
the task initiative. The additional structure im- 
posed varies from short 2-3 turn interactions to 
longer "form-filling" dialogues. We currently 
have three such FSMs in CommandTalk: 
The Embark/Debark command has four re- 
quired parameters; a user may have diffi- 
culty expressing them all in a single utter- 
ance. CommandTalk will query the user for 
missing parameters to fill in the structure 
of the command. 
The Infantry Attack command has a num- 
ber of required parameters, a potentially 
unbounded number of optional parameters, 
and some constraints between optional ar- 
guments (e.g., two parameters are each op- 
tional, but if one is specified then the other 
must be also). 
The Nine Line Brief is a strMght-forward 
form-filling command with nine parameters 
that should be provided in a specified or- 
der. 
When the system interprets a new user ut- 
terance that is not a correction, the next alter- 
native is that it is a continuation of the current 
discourse segment. Simple examples of this kind 
of transition occur when the user is answering a 
question posed by the system, or when the user 
has provided the next entry in a form-filling di- 
alogue. Once the transition is recognized, the 
current frame on top of the stack is popped. If 
the next state is not a final state, then a new 
frame is pushed corresponding to the next state. 
If it is a final state, then a new frame is not 
created, indicating the end of the discourse seg- 
ment. 
The last alternative for a new user utterance 
is that it is the onset of a new discourse segment. 
During the course of interpretation of the ut- 
terance, the conditions for entering one or more 
new FSMs may be satisfied by the utterance. 
These conditions may be linguistic, such as pre- 
supposition failures, or can arise from events 
that occur in the simulation, as when a guard 
187 
is tested in example 6. Each potential FSM 
has a corresponding priority (error, warning, or 
good). An FSM of the highest priority will be 
chosen to dictate the system's response. 
One last decision that must be made is 
whether the new discourse segment is a subseg- 
ment of the current segment, or if it should be 
a sibling of that segment. The heuristic that- 
we use is to consider the new segment a subseg- 
ment if the discourse frame on top of the stack 
contains an open proposition (as in utterance 
23). In this case, we push the new frame on the 
stack. Otherwise, we consider the previous seg- 
ment to now be closed (as in utterance 3), and 
we pop the frame corresponding to it prior to 
pushing on the new frame. 
4.3 Mechanisms for Reference 
CommandTalk employs two mechanisms for 
maintaining local context and performing refer- 
ence: a list of salient objects in the simulation, 
and focus spaces of linguistic items used in the 
dialogue. 
Since CommandTalk is controlling a dis- 
tributed simulation, events can occur asyn- 
chronously with the operator's linguistic acts, 
and objects may become available for reference 
independently of the on-going dialogue. For in- 
stance, if an enemy unit suddenly appears on 
the operator's display, that unit is available for 
immediate reference, even if no prior linguistic 
reference to it has been made. The ModSAF 
agent notifies the dialogue manager whenever 
an object is created, modified, or destroyed, and 
these objects are stored in a salience list in or- 
der of recency. The salience list can also be up- 
dated when simulation objects are referred to 
using language. 
The salience list is not part of the dialogue 
stack. It does not reflect attentional state; 
rather, it captures recency and "known" infor- 
mation. 
While the salience list contains only entities 
that directly correspond to objects in the sim- 
ulation, focus spaces contain representations of 
entities realized in linguistic acts, including ob- 
jects not directly represented in the simulation. 
This includes objects that do not exist (yet), 
as in "Objective Bravo" in utterance 28, which 
is referred to with a pronoun in utterance 29, 
and sets of objects introduced by plural noun 
phrases. All items referred to in an utterance 
are stored in a focus space associated with that 
utterance in the stack frame. There is one focus 
space per utterance. 
Focus spaces can be used during the genera- 
tion of pronouns and definite noun phrases. Al- 
though at present CommandTalk does not gen- 
erate pronouns (we choose to err on the side of 
verbosity, to avoid potential confusion due to 
misrecognitions), focus spaces could be used to 
make intelligent decisions about when to use a 
pronoun or a definite reference. In particular, 
while it might be dangerous to generate a pro- 
noun referring to a noun phrase that the user 
has used, it would be appropriate to use a pro- 
noun to refer to a noun phrase that the system 
has used. 
Focus spaces are also used during the inter- 
pretation of responses and corrections. In these 
cases the salience list reflects what is known 
now, not what was known at the time the ut- 
terance being corrected or clarified was made. 
The focus spaces reflect what was known and 
in focus at that earlier time; they track atten- 
tional state. For instance, imagine example 6 
had instead been: 
Ex. 6b: 
U 4O 
S 41 
U 42 
Focusing 
A14 advance there. 
® There is no A14. Which unit 
should advance to Checkpoint 1? 
Create CEV at 635 545 and name 
it A14. 
At the end of utterance 42 the system will 
reinterpret utterance 40, but the most recent 
location in the salience list is FQ 635 545 rather 
than Checkpoint 1. The system uses the focus 
space to determine the referent for "there" at 
the time utterance 40 was originally made. 
In conclusion, CommandTalk's dialogue man- 
ager uses a dialogue stack and trail, refer- 
ence mechanisms, and finite state machines to 
handle a wide range of different kinds of di- 
alogue, including form-filling dialogues, free- 
flowing mixed-initiative dialogues, and dia- 
logues involving multi-modality. 
5 Related Work 
CommandTalk differs from other recent spoken 
language systems in that it is a command and 
control application. It provides a particularly 
188 
interesting environment in which to design spo- 
ken dialogue systems in that it supports dis- 
tributed stochastic simulations, in which one 
operator controls a certain collection of forces 
while other operators simultaneously control 
other allied and/or opposing forces, and unex- 
pected events can occur that require responses 
in real time. Other applications (Litman et al., 
1998; Walker et al., 1998) have been in domains 
that were sufficiently limited (e.g., queries about 
train schedules, or reading email) that the sys- 
tem could presume much about the user's goals, 
and make significant contributions to task ini- 
tiative. However, the high number of possible 
commands available in CommandTalk, and the 
more abstract nature of the user's high-level 
goals (to carry out a simulation of a complex 
military engagement) preclude the system from 
taking significant task initiative in most cases. 
The system most closely related to Com- 
mandTalk in terms of dialogue use is TRIPS 
(Ferguson and Allen, 1998), although there are 
several important differences. In contrast to 
TRIPS, in CommandTalk gestures are fully in- 
corporated into the dialogue state. Also, Com- 
mandTalk provides the same language capabil- 
ities for user and system utterances. 
Unlike other simulation systems, such as 
QuickSet (Cohen et al., 1997), CommandTalk 
has extensive dialogue capabilities. In Quick- 
Set, the user is required to confirm each spoken 
utterance before it is processed by the system 
(McGee et al., 1998). 
Our earlier work on spoken dialogue in the air 
travel planning domain (Bratt et al., 1995) (and 
related systems) interpreted speaker utterances 
in context, but did not support structured dia- 
logues. The technique of using dialogue context 
to control the speech recognition state is similar 
to one used in (Andry, 1992). 
6 Future Work 
We have discussed some aspects of Com- 
mandTalk that make it especially suited to han- 
dle different kinds of interactions. We have 
looked at the use of a dialogue stack, salience 
information, and focus spaces to assist inter- 
pretation and generation. We have seen that 
structured dialogues can be represented by com- 
posing finite-state models. We have briefly dis- 
cussed the advantages of using the same gram- 
mar for all linguistic aspects of the system. It is 
our belief that most of the items discussed could 
easily be transferred to a different domain. 
The most significant difficulty with this work 
is that it has been impossible to perform a for- 
mal evaluation of the system. This is due to 
the difficulty of collecting data in this domain, 
which requires speakers who are both knowl- 
edgeable about the domain and familiar with 
ModSAF. CommandTalk has been used in sim- 
ulations of real military exercises, but those ex- 
ercises have always taken place in classified en- 
vironments where data collection is not permit- 
ted. 
To facilitate such an evaluation, we are cur- 
rently porting the CommandTalk dialogue man- 
ager to the domain of air travel planning. There 
is a large body of existing data in that domain 
(MADCOW, 1992), and speakers familiar with 
the domain are easily available. 
The internal representation of actions in 
CommandTalk is derived from ModSAF. We 
would like to port that to a domain-independent 
representation such as frames or explicit repre- 
sentations of plans. 
Finally, there are interesting options regard- 
ing the finite state model. We are investigating 
other representations for the semantic contents 
of a discourse segment, such as frames or active 
templates. 
7 Acknowledgments 
We would like to thank Andrew Kehler, David 
Israel, Jerry Hobbs, and Sharon Goldwater for 
comments on an earlier version of this paper, 
and we have benefited from the very helpful 
comments from several anonymous reviewers. 

References 
F. Andry. 1992. Static and Dynamic Predic- 
tions: A Method to Improve Speech Under- 
standing in Cooperative Dialogues. In Pro- 
ceedings of the International Conference on 
Spoken Language Processing, Banff, Canada. 
H. Bratt, J.Dowding, and K. Hunicke-Smith. 
1995. The SRI Telephone ATIS System. 
In Proceedings of the Spoken Language Sys- 
terns Technology Workshop, pages 218-220, 
Austin, Texas. 
A. Ceranowicz. 1994. Modular Semi- 
Automated Forces. In J.D. Tew et al., 
editor, Proceedings of the Winter Simulation 
Conference, pages 755-761. 
J. Chu-Carroll and M. Brown. 1997. Tracking 
Initiative in Collaborative Dialogue Interac- 
tions. In Proceedings of the Thirty-Fifth An- 
nual Meeting of the A CL and 8th Conference 
of the European Chapter of the ACL, Madrid, 
Spain. 
P. Cohen, M. Johnston, D. McGee, S. Oviatt, 
J. Pittman, I. Smith, L. Chen, and J. Clow. 
1997. QuickSet: Multimodal Interaction for 
Distributed Applications. In Proceedings of 
the Fifth Annual International Multimodal 
Conference, Seattle, WA. 
J. Dowding, J. Gawron, D. Appelt, L. Cherny, 
R. Moore, and D. Moran. 1993. Gemini: A 
Natural Language System for Spoken Lan- 
guage Understanding. In Proceedings of the 
Thirty-First Annual Meeting of the ACL, 
Columbus, OH. Association for Computa- 
tional Linguistics. 
J. Dowding, R. Moore, F. Andry, and D. Moran. 
1994. Interleaving Syntax and Semantics in 
an Efficient Bottom-Up Parser. In Proceed- 
ings of the Thirty-Second Annual Meeting of 
the A CL, Las Cruces, New Mexico. Associa- 
tion for Computational Linguistics. 
G. Ferguson and J. Allen. 1998. TRIPS: An 
Intelligent Integrated Problem-Solving Assis- 
tant. In Proceedings of the Fifteenth National 
Conference on Artificial Intelligence (AAAI- 
98), Madison, WI. 
B. Grosz and C. Sidner. 1986. Attention, Inten- 
tions, and the Structure of Discourse. Com- 
putational Linguistics, 12(3):175-204. 
D. Litman, S. Pan, and M. Walker. 1998. Eval- 
uating Response Strategies in a Web-Based 
Spoken Dialogue Agent. In Proceedings of 
the 38th Annual Meeting of the Association 
for Computational Linguistics, pages 780- 
786, Montreal, Canada. 
MADCOW. 1992. Multi-Site Data Collection 
for a Spoken Language Corpus. In Proceed- 
ings of the DARPA Speech and Natural Lan- 
guage Workshop, pages 200-203, Harriman, 
New York. 
D. Martin, A. Cheyer, and D. Moran. 1998. 
Building Distributed Software Systems with 
the Open Agent Architecture. In Proceed- 
ings of the Third International Conference on 
the Practical Application of Intelligent Agents 
and Multi-Agent Technology, Blackpool, Lan- 
cashire, UK. The Practical Application Com- 
pany Ltd. 
D. McGee, P. Cohen, and S. Oviatt. 1998. Con- 
firmation in Multimodal Systems. In Proceed- 
ings of the 38th Annual Meeting of the Asso- 
ciation for Computational Linguistics, pages 
823-829, Montreal, Canada. 
R. Moore, J. Dowding, H. Bratt, J. Gawron, 
Y. Gorfu, and A. Cheyer. 1997. Com- 
mandTalk: A Spoken-Language Interface for 
Battlefield Simulations. In Proceedings of the 
Fifth Conference on Applied Natural Lan- 
guage Processing, pages 1-7, Washington, 
DC. Association for Computational Linguis- 
tics. 
R. Moore. 1999. Using Natural Language 
Knowledge Sources in Speech Recognition. In 
Keith Ponting, editor, Speech Pattern Pro- 
cessing. Springer-Verlag. 
S. M. Shieber, G. van Noord, R. Moore, 
and F. Pereira. 1990. A Semantic Head- 
Driven Generation Algorithm for Unification- 
Based Formalisms. Computational Linguis- 
tics, 16(1), March. 
M. Walker, J. Fromer, and S. Narayanan. 
1998. Learning Optimal Dialogue Strategies: 
A Case Study of a Spoken Dialogue Agent 
for Email. In Proceedings of the 38th An- 
nual Meeting of the Association for Compu- 
tational Linguistics, pages 1345-1351, Mon- 
treal, Canada. 
