Lessons Learned in Building 
Spoken Language Collaborative Interface Agents 
Candace L. Sidner 
Carolyn Boettner 
Lotus Development Corporation 
Cambridge, MA 02142 USA 
csidnerlcarolyn_boettner@lotus.com 
Charles Rich 
Mitsubishi Electric Research Laboratory 
Cambridge, MA 02139 USA 
rich@merl.com 
Abstract 
This paper reports on the development of two 
spoken  collaborative interface agents 
built with the Collagen system. It presents 
sample dialogues with the agents working with 
email applications and meeting planning appli- 
cations, and discusses how these applications 
were created. It also discusses limitations and 
benefits of this approach. 
1 Collaborative Agents 
The underlying premise of the Collageff M (for 
Collaborative agent) project is that software 
agents, when they interact with people, should 
be governed by the same principles that govern 
human-to-human collaboration. To determine 
the principles governing human collaboration, 
we have relied on research in computational lin- 
guistics on collaborative discourse, specifically 
within the SharedPlan framework of Grosz and 
Sidner (1986, 1990) (Grosz and Kraus, 1996, 
Lochbaum, 1998). This work has provided us 
with a computationally-specified theory that 
has been empirically validated across a range of 
User Agent 
communicate 
l Application 
Figure 1: Collaborative interface agent paradigm. 
human tasks. We have implemented the algo- 
rithms and information structures of this theory 
in the form of a Java middleware component, 
a collaboration manager called Collagen, which 
software developers can use to implement a col- 
laborative interface agent for any Java applica- 
tion. 
In the collaborative interface agent paradigm, 
illustrated abstractly in Figure 1, a software 
agent is able to both communicate with and 
observe the actions of a user on a shared ap- 
plication interface, and vice versa. The soft- 
ware agent in this paradigm takes an active r01e 
in joint problem solving, including advising the 
user when he gets stuck, suggesting what to do 
next when he gets lost, and taking care of low- 
level details after a high-level decision is made. 
The screenshot in Figure 2 shows how the 
collaborative interface agent paradigm is con- 
cretely realized on a user's display. The large 
window in the background is the shared appli- 
cation, in this case, the Lotus eSuite TM email 
program. The two smaller overlapping windows 
Figure 2: Interface for Collagen email agent. 
in the corners of the screen are the agent's and 
user's home windows, through which they com- 
municate with each other. 
A key benefit of using Collagen to build an in- 
terface agent is that the collaboration manager 
automatically constructs a structured history of 
the user's and agent's activities. This segmented 
interaction history is hierarchically organized 
according to the goal structure of the applica- 
tion tasks. Among other things, this history can 
help re-orient the user when he gets confused 
or after an extended absence. It also supports 
high-level, task-oriented transformations, such 
as returning to an earlier goal. Figure 3 shows 
a sample segmented interaction history for the 
an email interaction. 
To apply Collagen to a particular application, 
the application developer must provide an ab- 
stract model of the tasks for which the appli- 
cation software will be used. This knowledge 
is formalized in a recipe library, which is then 
automatically compiled for use by the interface 
agent. This approach also allows us to easily 
vary an agent's level of initiative from very pas- 
sive to very active, using the same task model. 
For more details on the internal architecture of 
Collagen, see (Rich and Sidner, 1998). 
We have developed prototype interface agents 
using Collagen for several applications, includ- 
ing air travel planning (Rich and Sidner, 1998), 
resource allocation, industrial control, and com- 
mon PC desktop activities. 
2 A Collaborative Email Agent 
The email agent (Gruen et al., 1999) is the first 
Collagen-based agent we have built that sup- 
ports spoken- interaction. Our other 
agents avoided the need for natural  
understanding by presenting the user with a 
dynamically-changing menu of expected utter- 
ances, which was generated from the current 
discourse state according to the predictions of 
the SharedPlan theory. Sample menus are dis- 
played in Figure 2. The email agent, how- 
ever, incorporates a speech and natural lan- 
guage understanding system developed by IBM 
Research, allowing users to collaborate either 
entirely in speech or with a mixture of speech 
and interface actions, such as selecting a mes- 
sage. More recently we have developed the 
Lotus Notes TM meeting planning agent, which 
incorporates speech and sentence level under- 
standing using the Java Speech API, as imple- 
mented by IBM. The JSAPI toolkit provides 
a parser, which we use with a vocabulary and 
grammar we developed for the domain of meet- 
ing planning. The tags produced by the Java 
Speech parser are interpreted with a set of se- 
mantic rules that produce internal structures 
used by the Collagen agent. 
With the email application, the user can read, 
compose and send messages as one typically 
does with email. The Collagen email agent, 
called Daffy, performs actions requested by the 
user with speech and watches user interface ac- 
tions. It can perform a few email actions on its 
own (such as opening and closing windows, and 
filling in the user's signature on email) and can 
also undertake actions that the user requests in 
spoken utterances. In the sample session shown 
in Figure 4, the agent keeps a todo list for the 
user, explains how to accomplish email tasks for 
a user who is new to email, answers user ques- 
tions about what actions were taken in the in- 
teraction and offers suggestions about what to 
do next in the interaction as well as forming {lser 
requests. 
To create the email agent, we built a recipe 
library about email, as required for the Colla- 
gen architecture, of about 55 actions and 32 
recipes for doing those actions; the actions in- 
cluded GUI primitives such as sending a mes- 
sage, and high level actions such as reacting to 
a message. 
3 A Collaborative Agent for 
Planning Meetings 
Our experience with Daffy convinced us that 
collaborative agents would be more useful if 
they not only understood what the user was do- 
ing in the interface but could undertake more 
of the user's sub-goals and thus off-load some 
of the burden from the user. To explore this 
notion, we built Dotty, a Collagen agent, that 
works with a user who is planning a meeting 
with a customer, using Lotus Notes. As the di- 
alogue in Figure 5 demonstrates, Dotty is able 
to take over many of the details of planning the 
meeting. Dotty uses a library that is far smaller 
than Daffy's: 19 actions and 5 recipes. 
This dialogue begins with an overall goals of 
managing sales contacts and several sub-goals, 
: 2 
USER: "Your turn," 
Done checking our to-do list. 
AGENT: "Let's check our to-do list." 
USER: "Ok." 
AGENT: "The following items are on our to-do list." 
AGENT: "Work on email." 
AGENT: "Prepare a presentation about Java Beans." 
AGENT: "That's all," 
USER: "Ok." 
Done working on email. 
AGENT: "Let's work on email." 
Done explaining how to work on email. 
USER: "What should I do?" 
AGENT: "First, you reed a message." 
USER: "Ok." 
AGENT: "Then, you might store, print or delete the message." 
Done reading message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH GROUP .... 
Done explaining how to read a message. 
Done agent selecting message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH GROUP .... 
USER: "Select message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH GROUP .... " 
AGENT: Select message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH GROUP .... 
AGENT: 0pen ~essage from Brian Gorwin <brg@dewey.watson.ibm.eom> about WEEKLY SPEECH GROUP SEMINAR. 
Done using the contents of message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH .... 
AGENT: "Shall we use the contents of message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY ...?" 
Done scheduling meeting with Brian at three P.M. Thursday by scheduling. 
Done agent setting up meeting with Brian. 
USER: "Set up meeting with Brian." 
AGENT: Set up meeting with Brian. 
Done agent scheduling event tentatively at three P.M. Thursday. 
AGENT: "Let's schedule event tentatively at a time." 
USER: "Schedule event tentatively at three P.M. Thursday." 
AGENT: Schedule event tentatively at three P.M. Thursday. 
AGENT: "There is a conflict of meeting with Brian at three P.M. Thursday with 
meeting with Irene Landoz at three P.M. Thursday." 
AGENT: "Let's remember to reschedule meeting with Irene Landoz at three P.M, Thursday." 
USER: "Ok." 
AGENT: Finalize meeting with Brian at three P.M. Thursday. 
AGENT: Close message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH GROUP SEMINAR. 
Done reacting to message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY SPEECH .... 
AGENT: "Shall we react to message from Brian Gorwin <brg@dewey.watson.ibm.com> about WEEKLY ...?" 
Done agent forwarding message from Brian Gorwin <brg@dewey.watson.ibm,com> about WEEKLY ... to Steve Krantz. 
USER: "Forward message from Brian Gorwin <brg@dewey.watson.ibm.com> about ... to Steve Krantz." 
Done user saying identify the recipient as Steve Krantz. 
AGENT: "Who is the recipient?" 
USER: "The recipient is Steve Krantz." 
AGENT: Forward message from Brian Gorwin <brg@dewey.watson.ibm.com> about ... to Steve Krantz. 
Done filling in message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY SPEECH GROUP SEMINAR. 
AGENT: "Shall we fill in message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY ,..?" 
USER: "Ok." 
AGENT: Add text to message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY .... 
Stopped ccing message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY SPEECH .... 
AGENT: "Shall we cc message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY SPEECH GROUP SEMINAR?" 
USER: "No." 
AGENT: "0k, your turn." 
Done sending, canceling or saving this message as a draft. 
Done agent sending message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY SPEECH .... 
USER: "Send message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY SPEECH GROUP .... " 
AGENT: Send message from Candy Sidner <csidner@lotus.com> about Fwd:WEEKLY SPEECH .... 
Done storing, printing or deleting this message. 
AGENT: "Shall we store, print or delete this message?" 
USER: "Did I forward message to Steve Krantz." 
AGENT: "Yes, you did forward message from Brian Gorwin <brg@dewey.watson.ibm,com> about ... to Steve Krantz." 
USER: "Done." 
Done interruption agent checking history. 
USER: "Please check history," 
USER: "Done." 
Figure 3: Sample s¢~gnmnted interaction history for ~m email interactiou. 
3 

AGENT: 
USER: 
AGENT: 
USER: 
AGENT: 
USER: 
AGENT: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
AGENT: 
AGENT: 
USER: 
AGENT: 
AGENT: 
"Let's manage sales contacts." 
"Let's create profile for Miles White from Abbott Laboratories." 
"What is the eemail address?" 
"The emailAddress is mileswhite@abl.com." 
"What is the workPhone?" 
"The workPhone is 8173329280." 
Goto profile for Abbott Laboratories. 
Create profile for Miles White. 
"Do you want to schedule a presentation for Miles White?" 
"Schedule the meeting for December 9 from 2 to 4 pm." 
Make meeting entry for Miles White on December 9. 
"Is the meeting here or at the customer site?" 
"The site is here." 
"Which building?" 
"The building is Rogers Street." 
Locate room 5003 Center in Rogers Street. 
"I'm going to reserve room 5003 Center in Rogers Street." 
"Ok." 
Reserve room 5003 Center in Rogers Street. 
"Do you need any special equipment?" 
"No." 
"Okay." 
"Who will attend the meeting?" 
"Invite Fred Smith and Jerry Johnson." 
Invite Fred Smith and Jerry Johnson. 
"Will the customer need consulting services?" 
"Ok." 
"Identifying the consulting staff." 
"Invite Pam Bridges from Lotus Consulting and Jimmy Barnes from IGS." 
Invite Pam Bridges and Jimmy Barnes. 
"The following people: Pam Bridges have a conflict with the meeting time.." 
"Instead invite Julie Knox." 
Invite Julie Knox instead of Pam Bridges. 
"I will get back to you to confirm this meeting." 
"Ok." 
Send invitations to Fred Smith, Jerry Johnson, Julie Knox, and Jimmy Barnes. 
"Do you want to see company information?" 
"Show me information about Dover Hospital." 
Goto profile for Dover Hospital. 
"You need to plan a presentation for Miles White." 
"Shall we find presentations about a subject?" 
"No." 
"Okay." 
"I'm going to get back to you about meeting." 
"The following people: Jimmy Barnes have a conflict with the meeting time.." 
"This is message from Jimmy Barnes: I have to be in Cleveland on December 9. 
"Invite Jonathan Parks instead." 
Invite Jonathan Parks instead of Jimmy Barnes. 
"I will get back to you to confirm this meeting." 
Jonathan Parks is available." 
Figure 5: Sample session of Collagen agent using Lotus Notes for meeting planning (unsegmented). 
5 
including creating a profile for Miles White 
(which is displayed to the user as a Notes doc- 
ument), scheduling a meeting with Miles White 
(which the agent undertakes by itself using facil- 
ities in Lotus Notes), finding information about 
Dover Hospital (which is displayed as a Notes 
document), and a brief discussion about plan- 
ning a presentation. 
4 Current Limitations 
The spoken interaction of our two Collagen 
agents is limited by the range of utterances 
that the utterance understanding components 
can interpret More significantly, we feel these 
agents are limited in dealing with spoken con- 
versational errors, i.e. errors that arise either 
because the recognition system produces an er- 
ror, or the semantic interpretation is faulty 
(even given the correct choice of words). Er- 
rors resulting from semantic mis-interpretation 
are especially important because often the con- 
tent of the faulty interpretation is something 
that the agent can respond to and does, which 
results in the conversation going awry. In 
such cases we have in mind using the history 
based transformations possible in Collagen (c.f. 
(Rich and Sidner, 1998)) to allow the user to 
turn the conversation back to before where the 
error occurred. 
Whether communicating by speech or menus, 
our agents are limited by their inability to ne- 
gotiate with their human partner. For example, 
whenever one of our agents propose an action 
to perform that the user rejects (as in the email 
conversation in Figure 4, where the agent pro- 
poses filling in the cclist and the user says no), 
the agent currently does not have any strategies 
for responding in the conversation other than to 
accept the rejection and turn the conversation 
back to the user. We are in present exploring 
how to use a set of strategies for negotiation 
of activities and beliefs that we have identified 
from corpora of human-human collaborations. 
Using these strategies in the Collagen system 
will give interface agents a richer set of negoti- 
ation capabilities critical for collaboration. 
Finally, our agents need a better model 
of conversational initiative. We have experi- 
mented in the Collagen system with three initia- 
tive modes, one dominated by the user, one by 
the agent and one that gives each some control 
of the conversation. The dialogues presented in 
this paper are all from agent initiative. None of 
these modes is quite right. The user dominated 
mode is characterized by an agent that only acts 
when specifically directed to or when explicitly 
told to take a turn in the conversation, while the 
agent dominated mode has a very chatty agent 
that constantly offers next possible actions rel- 
evant to the collaboration. We are currently 
investigating additional modes of initiative. 
The collaborative agent paradigm that we 
have implemented has several original features. 
The conversation and collaboration model is 
general and does not require tuning or the im- 
plementation of special dialogue steps for the 
agent to participate. The model tracks the in- 
teraction and treats both the utterances of both 
participants and the GUI level actions as com- 
munications for the discourse; it relates these to 
the actions and recipes for actions. The model 
has facilities for richer interpretation of dis- 
course level phenomena, such as reference and 
anaphora, through the use of the focus stack. 
Finally, when we began this research, we were 
not certain that the Collagen system could be 
used to create agents that would interact with 
users for many different applications. Our expe- 
rience with five different applications indicates 
that the model has the flexibility and richness to 
make human and computer collaboration possi- 
ble in many circumstances. 

References 
B. J. Grosz and S. Kraus. 1996. Collaborative plans 
for complex group action. Artificial Intelligence, 
86(2):269-357, October. 
B. J. Grosz and C. L. Sidner. 1986. Attention, in- 
tentions, and the structure of discourse. Compu- 
tational Linguistics, 12(3):175-204. 
B. J. Grosz and C. L. Sidner. 1990. Plans for dis- 
course. In P. R. Cohen, J. L. Morgan, and M. E. 
Pollack, editors, Intentions and Communication, 
pages 417-444. MIT Press, Cambridge, MA. 
D. Cruen, C. Sidner, C. Boettner, and C. Rich. 
1999. A collaborative assistant for email. In Proc. 
ACM SIGCHI Conference on Human Factors in 
Computing Systems, Austin, TX, May. 
K. E. Lochbaum. 1998. A collaborative planning 
model of intentional structure. Computational 
Linguistics, 24(4), December. 
C. Rich and C. Sidner. 1998. COLLAGEN: A col- 
laboration manager for software interface agents. 
User Modeling and User-Adapted Interaction, 
8(3/4):315-350. 
