Natural Language Generation 
Journeys to Interactive 3D Worlds* 
Invited Talk Extended Abstract 
James C. Lester and William H. Bares 
Charles B, Callaway and Stuart G. Towns 
• Multimedia Laboratory 
Department of Computer Science 
North Carolina State University 
Raleigh, NC 27695 
{lester, whbsres, cbcallaw , sgtowns}~os.ncsu.edu 
http://multimedia.ncsu.edu/imedia/ 
Abstract 
Interactive 3D worlds offer an intriguing testbed for 
the natural language generation community. To com- 
plement interactive 3D worlds' rich visualizations, they 
~equire significant linguistic flexibility and communica- 
tive • power. We explore the major functionalities and 
~rchitectural implications of natural language genera- 
tion for three key classes of interactive 3D worlds: self- 
. ." explaining 3D environments, habitable 3D learning en- 
vironments, and interactive 3D narrative worlds. These 
are illustrated with .empirical investigations underway 
in our laboratory with severalsuch systems. 
• • • . . 
• Introduction 
Natural language generation (NLG) has witnessed great 
strides over the past decade. Our theoretical Underpin- 
nings are firming up, our systems building activities 
are proceeding quickly, and we are beginning to see 
significant empirical results. As a result of this mat- 
uration, the field is now well positioned to attack the 
• "challenges pose'd by a new family of computing envi- 
• ronments: interactive 3D worlds, which continuously 
• •render the activities playing out in rich 3D scenes in 
realtime. Because of these worlds' compelling visual 
properties and their promise of a high degree of mul- 
timodal interactivity, they will soon form the basis for 
applications ranging from learning environments for ed, 
ucation and training to interactive fiction systems for 
•entertainment. 
Interactive 3D worlds offer an intriguing testbed for 
the NLG •community for several reasons. They may 
portray scenes with complicated spatial relationships, 
" * Support for this work was provided by the follow- 
lag •organizations: the National Science Foundation under 
grants CDA-9720395 (Learning and Intelligent Systems Ini- 
tiative) and IRI-9701503 (CAREER Award Program); the 
North Carolina State University Intelli/Cledia Initiative; the 
William S: Kenan Institute for Engineering, Technology and 
Science; and a Corporate gift from Novell, Inc. 
• such as those found in the domain of electricity and 
magnetism in physics. They may include multiple dy- 
namic objects tracing out complex motion paths, such 
as water particles traveling through xylem tissue in Vir- 
tual plants. They might be inhabited by user-directed 
avatars that manipulate objects in the world and lifelike 
agents that will need to coordinate speech, gesture, and 
locomotion as they explain and demonstrate complex 
phenomena. In 3D interactive fiction systems, user- 
directed avatars and lifelike autonomous agents may 
navigate through complex cityscapes and interact with 
users and with one another to create new forms of the- 
ater. 
As the visual complexities of interactive 3D worlds 
grow, they will place increasingly heavy demands on 
the visual channel. To complement their rich visualiza- 
tions, interactive 3D worlds will require the linguistic 
flexibility and •communicative power that only NLG can 
provide. In interactive learning environments, the spa- 
tial complexities and dynamic phenomena that charac- 
terize physical devices must be clearly explained. NLG 
delivered with speech synthesis will need to be care- 
fully coordinated with 3D graphics generation to create 
interactive presentations that are both coherent and in- 
teresting. In a similar fashion, lifelike agents roaming • 
around the same 3D worlds through which users guide 
their avatars will require sophisticated NLG capabili- 
ties, and 3D interacti.ve fiction systems will-benefit con- 
siderably from virtual narrators that are articulate and 
can generate interesting commentary in realtime. 
In this talk, we will •explore the major issues, func- 
tionalities, and architectural implications of •natural 
language generation for interactive 3D worlds. Our dis- 
cussion will examine NLG issues for three interesting 
classes of interactive 3D worlds: 
• Self-Explaining 3D Environments: In response 
to users' questions, Self-explaining environments dy- 
namically generate spoken natural language and 3D 
animated visualizations and produce vivid explana- 
. • • : 
Figure 1: The PHYSVIZ Self-Explaining 3D Environment 
tions of complex phenomena. 
• Habitable 3D Learning Environments: In hab- 
itable learning environments, lifelike pedagogical 
agents generate advice combining speech and gesture 
as users solve problems by guiding avatars through 
3D worlds and manipulating devices housed in the 
worlds. 
• Interactive 3D Narrative Worlds: Virtual narra- 
tors generate fluid descriptions of lifelike characters' 
interaction with one another in response to incremen- 
tal specifications produced by narrative planners and 
interactively-issued user directives. 
To begin mapping out the very large and complex 
space of NLG phenomena in 3D interactive worlds, 
it is informative to examine the issues empirically. 
These issues are being studied in the context of several 
projects currently under development in our laboratory. 
First, self-explaining 3D environments must coordinate 
NLG with 3D graphics generation. These require- 
ments will be discussed with regard to the PHYSVIZ 
(Towns, Callaway, & Lester 1998) and the PLANT- 
WORLD (Bares & Lester 1997) self-explaining 3D en- 
vironments for the domains of physics and plant phys- 
iology, respectively. Second, in habitable 3D learning 
environments, lifelike agents must be able to gener- 
ate clear language that is carefully coordinated with 
agents' gestures and movements as they interact with 
users in problem-solving episodes. We examine these 
issues in the VIRTUAL COMPUTER (Bares el al. 1998; 
Bares, Zettlemoyer, & Lester 1998), a habitable 3D 
learning environment for the domain of introductory 
computer architecture. Third, virtual narrators for 
3D interactive fiction should be able to generate com- 
pelling realtime descriptions of multiple characters' be- 
haviors. These issues are illustrated with examples from 
the COPS~ROBBERS world (Bares, Gr~goire, & Lester 
1998), a 3D interactive fiction testbed. 
In the talk, we discuss current efforts to introduce 
NLG capabilities into these worlds at several levels. 
This includes (1) discourse planning, as provided by the 
KNIGHT explanation planner (Lester & Porter 1997), 
(2) sentence construction, as provided by the the FARE 
sentence planner (Callaway & Lester 1995) and the RE- 
visor clause aggregator (Callaway & Lester 1997), and 
(3) surface generation, as provided by FUF (Elhadad 
1991). Below we briefly summarize the requirements 
and issues of NLG for self-explaining 3D environments, 
habitable 3D learning environments, and interactive 3D 
narrative worlds. These will be discussed in some detail 
in the talk. 
Generation in Self-Explaining 
3D Environments 
As graphics technologies reach ever higher levels of 
sophistication, knowledge-based learning environments 
and intelligent training systems can create increasingly 
Figure 2: The PLANTWORLD Self-Explaining 3D Environment 
effective educational experiences. A critical functional- 
ity required in many such systems is the ability to un- 
ambiguously communicate spatial knowledge. Learning 
environments for the basic sciences frequently focus on 
physical structures and the fundamental forces that act 
on them in the world, and training systems for tech- 
nical domains often revolve around the structure and 
function of complex devices. Explanations of electro- 
magnetism, for example, must effectively communicate 
the complex spatiM relationships governing the direc- 
tions and magnitudes of multiple vectors representing 
currents and electromagnetic fields, many of which are 
orthogonal to one another. 
Because text-only explanations are inadequate for ex- 
pressing complex spatial relationships and describing 
dynamic phenomena, realtime explanation generation 
combining natural language and 3D graphics could con- 
tribute significantly to a broad range of learning envi- 
ronments and training systems. This calls for a com- 
putational model of 3D multimodal explanation gen- 
eration for complex spatial and dynamic phenomena. 
Unfortunately, planning the integrated creation of 3D 
animation and spatial/behavior linguistic utterances in 
realtime requires coordinating the visual presentation 
of 3D objects and generating appropriate referring ex- 
pressions that accurately reflect the relative position, 
orientation, direction, and motion paths of the objects 
presented with respect to the virtual camera's view of 
the scene. 
To address this problem, we are developing the visuo- 
linguistic ezplanation planning framework for gener- 
ating multimodal spatial and behavioral explanations 
combining 3D animation and speech that complement 
one another. Because 3D animation planners require 
spatial knowledge in a geometric form and natural lan- 
guage generators require spatial knowledge in a linguis- 
tic form, a realtime multimodal planner interposed be- 
tween the visual and linguistic components serves as 
a mediator. This framework has been implemented 
in CINESP~.AK, a multimodal generator consisting of 
a media-independent explanation planner, a visuo- 
linguistic mediator, a 3D animation planner, and a real- 
time natural language generator with a speech synthe- 
sizer. Experimentation with CINESPEAK is underway 
in conjunction with self-explaining environments that 
are being designed to produce language of spatial and 
dynamic phenomena: 
• Complex spatial explanations: PHYSVIZ (Towns, 
Callaway, & Lester 1998) is a self-explaining 3D en- 
vironment in the domain of physics that generates 
multimodal explanations of three dimensional elec- 
tromagnetic fields, force, and electric currents in re- 
altime (Figure I). 
• Complex dynamic behavior explanations: PLANT- 
WORbD (Bares & Lester 1997) is a self-explaining 
3D environment in the domain of plant anatomy and 
4 
! 
! 
i| 
!.-- 
Figure 3: The VIRTUAL COMPUTER Habitable 3D Learning Environment 
physiology that generates multimodal explanations of 
dynamic three dimensional physiological phenomena 
such as nutrient transport (Figure 2). 
Generation in Habitable 
3D Learning Environments 
Engaging 3D learning environments in which users 
guide avatars through virtual worlds hold great promise 
for learner-centered education. By enabling users to 
participate in immersive experiences, 3D learning en- 
vironments could help them come to develop accurate 
mental models of highly complex biological, electronic, 
or mechanical systems. In particular, 3D learning envi- 
ronments could permit learners to actively participate 
in the very systems about which they are learning and 
interact with lifelike agents that could effectively com- 
municate the knowledge relevant to the user's task. For 
example, users could study computer architecture in a 
virtual computer where they might be advised by a life- 
like agent about how to help a CPU carry data from 
RAM to the hard disk, or they could study the human 
immune system by helping a T-cell traverse a virtual 
lymph system. Properly designed, 3D learning environ- 
ments that blur the distinction between education and 
entertainment could produce engrossing learning expe- 
riences that are intrinsically motivating and are solidly 
grounded in problem solving. 
Lifelike agents that are to interact with users in 
habitable 3D learning environments should be able to 
generate language that enables them to provide clear 
problem-solving advice. Rather than operating in iso- 
lation, generation decisions must be carefully coordi- 
nated with decisions about gesture, locomotion, and 
eventually prosody. In collaboration with the STEVE 
virtual environments tutor project at USC/ISI (Rickel 
&: Johnson 1998), we have begun to design NLG tech- 
niques for embodied explanation generation in which the 
avatar/agent generates coordinated utterances (deliv- 
ered with a speech synthesizer) and gestural and lo- 
comotive behaviors as it manipulates various devices in 
the world. Embodied explanation generation poses par- 
ticularly interesting challenges in the following areas: 
• Deictic believability: Lifelike agents must be able to 
employ referring expressions and gestures that to- 
gether are both unambiguous and natural (Lester et 
al. 1998). 
• Socially motivated generation: Lifelike agents must 
not only express concepts clearly but also create ut- 
terances that are properly situated in the current 
socio-linguistic context. 
• Embodied discourse planning: Media allocation issues 
must be considered in adjudicating between express- 
ing advice verbally or through agents' demonstrative 
actions. 
Over the past two years, we have constructed a habit- 
able learning environment for the domain of computer 
Figure 4: The COPSe:ROBBERS Interactive 3D Narrative World 
architecture. The VIRTUAL COMPUTER (Bares et al. 
1998; Bares, Zettlemoyer, & Lester 1998) (Figure 3) is a 
habitable 3D learning environment that teaches novices 
the fundamentals of computer architecture and system 
algorithms, e.g., the fetch-execute cycle. To learn the 
basics of computation, users direct an avatar in the form 
of a friendly robot courier as they execute instructions 
and transport data packets to appropriate locations in 
a 3D "town" whose buildings represent the CPU, RAM, 
and hard disk. We are beginning to investigate deictic 
believability, socially motivated generation, and embod- 
ied discourse planning in an lifelike agent that provides 
advice in the VIRTUAL COMPUTER. 
Interactive 3D Narrative Worlds 
While story generation has been an NLG goal that dates 
back more than. a quarter century and text-based inter- 
active fiction systems have been the subject of increas- 
ing attention, it is the prospect of coupling sophisti- 
cated NLG with 3D believable characters that offers 
the potential of achieving interactive fiction generation 
in a visually compelling environment. One can imag- 
ine different genres of 3D interactive fiction, many of 
which will involve a virtual narrator who comments on 
the events unfolding in the world. In much the same 
manner that sports announcers come in two varieties, 
play-by-play and color commentary, virtual narrators 
can provide both a descriptive account of the world's 
activities as well as a running analysis on their signifi- 
cance. To stress test NLG, we adopt three constraints 
on generation for 3D narrative worlds: 1 
• Realtime: World events play out in realtime and can 
be modified by users. Consequently, the relevance of 
utterances is time-bound; generators must construct 
their utterances in realtime and cannot know in ad- 
vance how the actions in the world will play out. 
• Non-interference: Generators cannot themselves en- 
act modifications on objects or characters in the 
world. As a result, they must cope with what they 
are dealt by world simulators and users' actions. 
* Multiple, simultaneous events: Multiple activities oc- 
cur in the world at the same time. Consequently, 
generators must make time-bounded moment-by- 
moment content determination decisions that neces- 
sarily omit mention of many actions. 
We have recently begun to study these issues in 
COPS&ROBBERs (Bares, Gr~goire, & Lester 1998), a 
3D interactive fiction testbed with multiple characters 
interacting with each other in an intricate cityscape. In 
COPS&ROBBERS (Figure 4), three autonomous charac- 
ters, a policeman and two robbers, attempt to capture 
a lost money bag dropped by a careless bank teller. If 
l Efisabeth Andrfi and colleagues at DFKI are addressing 
similar issues in their realtime generator for the ROBOCUP 
competition. 
! 
i 
i 
I 
I 
I 
! 
! 
the policeman finds the money bag first, he dutifully 
returns it to the bank, but if either of the two miscre- 
ants find the unclaimed money, they will scurry off to 
Joe's Bar to spend their new found loot. If the cop 
catches either robber carrying the money, he will im- 
mobilize him and return the money bag to the bank. 
When the narrative begins, the three characters mean- 
der randomly through the town searching for the lost 
money bag. At any time, users may affect the narrative 
by modifying characters' physical abilities such as their 
speed or eyesight. 
Despite the relative simplicity of the testbed, it poses 
significant NLG challenges. Of particular interest are 
problems in the virtual narrator's expressing time se- 
quence relations, concisely describing locations where 
particular events are occurring, and linking characters' 
actions to their intentions. Because events occur si- 
multaneously, tense issues are problematic in accurately 
describing the temporal relations between events in se- 
quential utterances. Especially difficult are generating 
precise disambiguating locative descriptions involving 
relative locations, direction of movement, and proxim- 
ity of characters and structures in the world. Because 
it is often important to identify where a specific ac- 
tion has occurred, generators must be able to formu- 
late locatives that are precise. Frequently, they must 
also be concise, because utterances• that are too verbose 
will require excessive speaking times, causing tile narra- 
tion to miss other important events. Finally, generators 
must be able to communicate about, characters' goals, 
actions, and the relation between the two. For exam- 
ple, if the cop is scurrying toward one of the robbers, 
rather than merely reporting the action, the generator 
should sometimes comment on the causal link between 
the cop's desire to obtain the money bag and and his 
accosting the targeted robber. 
A New Era for NLG 
As a result of both technological and societal develop- 
ments, the advent of a new era for NLG is upon us. 
On the technology front, high-end 3Dgraphics, as well 
as the 3D interactive worlds they will spawn, will make 
significant demands on NLG systems. On the societal 
front, we're beginning to see the rapid convergence of 
the software, telecommunications, and even the enter- 
tainment industries. This will undoubtedly provide sig- 
nificant impetus for integrating NLG into applications 
that could not have even been imagined at the incep- 
tion of the field. With continued progress in theory, 
systems building, and empirical studies, we will be well 
positioned to meet the upcoming challenges. 
Acknowledgements 
Many people have contributed to the projects discussed 
in the talk. The authors would like to thank:• the 
technical members of the the IntelliMedia Initiative's 
3D team including Jo~l Gr~goire, Ben Lee, Dennis Ro- 
driguez, and Luke Zettlemoyer; the IntelliMedia ani- 
mation and modeling team, led by Patrick FitzGerald, 
including Tim Buie, Mike Cuales, Rob Gray, and Alex 
Levy; Bruce Porter for his collaboration on the KNIGHT 
explanation system; Jeff Rickel for his collaboration on 
the pedagogical agents dialogue work; and especially 
Michael Elhadad for creating and generously assisting 
us with FUF for the past five years. 

References 
Bares, W. H., and Lester, J. C. 1997. Realti'me generation 
of customized 3D animated explanations for knowledge- 
based learning environments. In AAAI-97: Proceedings of 
the Fourteenth National Conference on Artificial Intelli- 
gence, 347-354. 
Bares, W.; Zettlemoyer, L.; Rodriguez, D.; and Lester, 
J. 1998. Task-sensitive cinematography interfaces for in- 
teractive 3D learning environments. In Proceedings of the 
Fourth International Conference on Intelligent User Inter- 
faces, 81-88. 
Bares, W.; Gr6goire, J.; and Lester, J. 1998. Realtime 
constraint-based cinematography for complex interactive 
3D worlds. In Proceedings of the Tenth National':Confer- 
ence on Innovative Applications of Artificial Intelligence. 
Bares, W.; Zettlemoyer, L.; and Lester, J. 1998. Habitable 
3D learning environments for situated learning. In Proceed- 
ings of the Fourth International Conference on Intelligent 
Tutoring Systems. Forthcoming. 
Callaway, C., and Lester, J. 1995. Robust natural language 
generation from large-scale knowledge bases. Ill Proceed- 
ings of the Fourth Bar-Ban Symposium on the Foundations 
of Artificial Intelligence, 96-105. 
Callaway, C. B., and Lester, J. C. 1997. Dynamically im- 
proving explanations: A revision-based approach to expla- 
nation generation. In Proceedings of the Fifteenth Interna- 
tional Joint Conference on Artificial Intelligence, 952-58. 
Elhadad, M. 1991. FUF: The universal unifier user manual 
version 5.0. Technical Report CUCS-038-91, Department 
of Computer Science, Columbia University. 
Lester, J. C., and Porter, B.W. 1997. ~)eveloping 
and empirically evaluating robust explanation generators: 
The KNIGHT experiments~ Computational Linguistics 
23(1):65-101. 
Lester, J.; Voerman, J.; Towns, S.; and Callaway, C. 1998. 
Deicfic believability: Coordinating gesture, locomotion, 
and speech in lifelike pedagogical agents. Applied Artificial 
Intelligence. Forthcoming. 
Rickel, J., and Johnson, W. L. 1998. Animated agents 
for procedural training in virtual reality: I Percepti6n, cog- 
nition, and motor control. Applied Artificial Intelligence. 
Forthcoming. 
Towns, S. G.; Callaway, C. B.; and Lester, J. C. 1998. Gen- 
erating coordinated natural language and 3D animations 
for complex spatial explanations. In Proceedings of the Fif- 
teenth National Conference on Artificial Intelligence. 
