PROBLEM SOLVING APPLIED TO LANGUAGE 
GENERATION 
Douglas I~: Appelt 
Stanford University, Stanfo,d, Califorlda 
SR I International 111enlo Park. California 
This research was supported at SRI htternational by the 
Defense Advanced Reseat~ch Projects Agency under contract 
N00039-79-C-0118 ~¢ith the Naval Electronic Systems 
Commaw t The views and conchtsions contained in this 
document are those of the author and should not be interpreted 
as representative of the official policiex either expressed or 
bnplied, of the Defense Advanced Research Projects Agency, or 
the U. S. Goverttment. The author is gratefid to Barbara Grosz, 
Gary ttendrix and Terry Winograd for comments on an earlier 
draa of this paper. 
I. Introduction 
Previous approaches to designing language understanding systems have 
considered language generation to be tile activity of a highly specialized 
linguistic facility that is largely indcpendcnt of other cognitive capabilities. 
All the requisite knowlcdge for gencration is embodicd in a "gcneration 
module" which, with appropriate modifications to the lexicon, is 
transportable bctween different domains and applications. Application 
programs construct "messages" in some internal representation, such as first 
order predicate calculus or scmantic nctworks, and hand them to the 
generation module to be translated into aatoral language. The application 
program decides what to say; the gencration module decides how to say it. 
In contrast with this previous work. this papcr proposes an approach to 
designing a language generation systcm that builds on the view of language 
as action which has cvolvcd from speech act theory (see Austin \[2l and 
Scarle \[11\]). According to this vicw, linguistic actions are actions planncd 
to satisfy particular goals of the spcakcr, similar to other actions like 
moving and looking. Language production is integrated with a spcakcr's 
problcm solving processes. This approach is fi~unded on the hypothesis 
that planning and pcrforming linguistic ,actions is an activity that is not 
substantially different from planning and pcrforming othcr kinds of 
physical actions. The process of pro/lucing an uttcrance involves, planning 
actions to satisfy a numbcr of diffcrent kinds of goals, and then el~cicntly 
coordinating the actions that satisfy these goals. In the resulting 
framework, dlere is no distinction between deciding what to say and 
deciding how to say it. 
This rcsearch has procceded through a simultaneous, intcgrated effort in 
two areas. The first area of re.arch is the thcoretieal problcm of 
identifying the goals and actions that occur in human communication and 
then characterizing them in planning terms. The ~cond is the more 
applied task of developing machine--based planning methods that are 
adequate to form plans based on thc characterization dcveloped as part of 
the work in the first area. The eventual goal is to merge the results of the 
two areas of effort into a planning system that is capable of producing 
English sentences. 
Rather than relying on a specialized generation module, language 
generation is performed by a general problcm-.-solving system that has a 
great deal of knowlcdge about language. A planning system, named K^MI' 
(Knowlcdge and Modalitics Planncr), is currently under development that 
can take a high-lcvel goal-and plan to achieve it through both linguistic 
and non-linguistic actions. Means for satisfying multple goals can be 
integrated into a single utterance. 
Thi.~ paper examines the goals that arise in a dialog, and what actions 
satisfy those goals. It then discusses an example of a sentcnee which 
satisfies several goals simultaneously, and how K^MP will be able to 
produce this and similar utterances. This system represents an extension to 
Cohen's work on planning speech acts \[3\]. However, unlikc Cohen's 
system which plans actions on thc level of informing and requesting, but 
does not actually generate natural language sentences, KAMP applies general 
problcm-solving techniqucs to thc entire language gencration process, 
including the constructiun of the uttcrance. 
1I. GoaLs and Actions used in Task Oriented Dialogues 
The participants in a dialogue have four different major types of goals 
which may be satisfied, either directly or indirectly, through utterances. 
Physical goals, involve the physical state of the world. The physical state 
can only be altered by actions that have physical effects, and so speech acts 
do not serve directly to achieve these goals. But since physical goals give 
rise to other types of goals as subgoals, which may in turn be satisfied by 
speech acts, they are important to a language planning system. Goals that 
bear directly on the utterances themselves are knowledge slate goals. 
discourse goals, and social goalx 
Any goal of a speaker can fit into one of these four categories. However, 
each category has many sob--categories, with the goals in each sub--category 
being satisfied by actions related to but different from those satisfying the 
goals of other sub--categories. Delineating the primary categorizations of 
goals and actions is one objective of this research. 
Knowledge state goals involve changes in tile beliefs and wants held by the 
speaker or the hearer. Thcy may be satisfied by several different kinds of 
actions. Physical actions affect knowledge, since ,as a minimum the agent 
knows he has performed the action. There are also actions that affect only 
knowledge and do not change the state o£ the world -- for example. 
reading, looking and speech acts. Speech acts are a special case of 
knowledge-producing actions because they do not produce knowledge 
directly, like looking at a clock. Instead, the effects of speech acts manifest 
thcmselves through the recognition of intention. The effect of a speech act, 
according to Searle. is that the hearer recognizes the speaker's intention to 
perform the act. The hcarer then knows which spceeh act has been 
performcd, and because of rules governing the communication processes, 
such as the Gricean maxims \[4\]. the hearer makes inferences about thc 
speaker's beliefs. Thcse inferences all affect the heater's own beliefs. 
Discourse goals are goals dial involve maintaining or changing the sthte of 
the discourse. For example, a goal of focusing on a different concept is a 
type of discourse goal \[5, 9, 12\]. The utterance Take John. for instance 
serves to move the participants' focusing from a general subject to a 
specific example. Utterances of this nature seem to be explainable only in 
terms of the effects they have, and not in terms of a formal specification of 
their propositional content 
Concept activation goals are a particular category of discourse goals. These 
are goals of bringing a concept of some object, state, or event into the 
heater's immediate coneiousness so that he understands its role in the 
utterance. Concept activation is a general goal that subsumes different 
kinds of speaker reference. It is a low-level goal that is not considered 
until the later stages of the planning process, but it is interesting because of 
the large number of interactions between it and higher-level goals and the 
large number of options available by which concept activations can be 
performed. 
59 
Social goals also play an important part in the planning of utterances. 
Thc,:e goals are fimdamentally different from other goals in that freqnently 
they are not effeCts to be achieved ~a~ much as constraiots on the possible 
behavior that is acceptable in a given situation. Social goals relate to 
politeness, and arc reflected in the surface form and content of tile 
utterance. However, there is no simple "formula" that one can follow to 
construct polite utterances. Do you know what time it Ls? may ~ a polite 
way to ask the time, but Do you know your phone number? is not very 
polite in most situations, but Could you tell me your phone number? is. 
What is important in this example is the exact propositional content of the 
utterance. People are expected to know phone numbers, but not 
necessarily what time it is. Using an indirect speech act is not a sufficient 
condition for politen¢~. This example illustrates how a social goal can 
mtluence what is said, as well as how it is expressed. 
Quite often the knowledge state goals have been ssragned a special 
priviliged status among all these goals. Conveying a propsition was viewed 
as the primary reason for planning an utterance, and the task of a language 
generator was to somehow construct an utterance that would be appropriate 
in the current context. In contrast, this rosen:oh attempts to take Halliday's 
claim \[7\] seriously in the design of a computer system: 
"We do not. in'fact, first decide what we want to say 
independcndy of the setting a,ld then dress it up in a garb that 
is appropriate to it in the context .... The 'content' is part of 
the total planning that takes place. "lhere is no clear line 
between the "what' and the 'how'..." 
The complexity that arises from the interactions of these different types of 
goals leads to situations where the content of an utterance is dictated by 
the requirement that it tit into the current context. For example, a speaker 
may plan to inform a bearer of a particular fact. Tbc context of the 
discou~ may make it impossible for the speaker to make an abrupt 
transition from the current topic to the topic that includes that proposition, 
To make this transition according to the communicative rules may require 
planning another utterance, Planning this utterance will in turn generate 
other goals of inforoting, concept activation and focusing. The actions used 
to satisfy these goals may affect the planning of the utterance that gave rise 
to the subgoal. In this situation, there is no clear dividing line between 
"what to say" and "how to say it". 
IlL An Integrated Approach to Planning Speech Acts 
A probem--solving system that plans utterances must have lhe ability to 
describe actions at different levels of abstraction, the ability to speCify a 
partial ordering among sequences of actions, and the ability to consider a 
plan globally to discover interactions and constraints among the actions 
already planned. It must have an intelligent method for maintaining 
alternatives, and evaluating them comparatively. Since reasoning about 
belief is very important in planning utterance, the planning system must 
have a knowledge representation that is adequate for representing facts 
about belief, and a deduction system that is capable of using that 
representauon efficiently. I Achieve(P) /' 
KAMI' is a planning system, which is currently beiug implemented, th:K 
builds on the NOAII planning system of Saccrdoti \[10\]. \]t uses a 
possible-worlds semantics approach to reasoning about belief" and the 
effects that various actions have on belief \[8\] and represents actions in a 
data structure called a procedural network. The procedural network consists 
of nt~es representing actions at somc level of abstraction, along with split 
nodes, which specify several parually urdercd sequences of actions that can 
be performed in any order, or perhaps even in parallel, and choice nodes 
which specify alternate actions, any one of which would achieve the goal. 
Figure 1 is an examplc of a simple procedural network that represents the 
following plan: The top--level goal is to achieve P. The downward link 
from that node m the net points to an expansion of actions and subgoals, 
which when performcd or achieved, will make P true in the resulting 
world. The plan consists of a choice betwcen two alternatives. In tile first 
the agent A does actions At and A2. and no commitment has been made to 
the ordering of these two parts of thc plan. After both of those parts havc 
been complctcly planned and executed, thcn action A\] is performed in thc 
r~sulting world. The other alternative is for agent B to perform action A4. 
It is an important feature of KAMP that it can represent actions at several 
levels of abstraction. An INFORM action can be considered as a high level 
action, which is expanded at a lower level of abstraction into concept 
activation and focusing actions. After each expansion to a lower level of 
abstraction, ~.^MP invokes a set of procedures called critics that cxa,ninc 
tile plan globally, considering the interactions bctwccn its parts, resolving 
conflicts, making the best choice among availab;e alternatives, and noticing 
redundant acuons or actions that could bc subsumed by minor alterations 
in another part of the plan. Tile control structure could bc described as a 
loop that makes a plan, expands it. criticizes thc result, and expands it 
again, until thc entirc plan consists of cxccutablc actions. 
The following is an example of the type of problem that KAMP has been 
tested on: A robot namcd Rob and a man namcd John arc in a room that 
is adjacent to a hallway containing a clock. Both Rob and John are 
capable of moving, reading clocks, and talking to each other, and they each 
know that the other is capable of performing these actions. They both 
know that they are in the room, and they both know where tile hallway is. 
Neither Rob nor John knows what time it is. Suppose that Rob knows that 
the clock is in the I'tall, but John does not. Suppose further that John 
wants to know what time it is. and Rob knows he does. Furthermore, Rub 
is helpful, and wants to do what he can to insure that John achieves his 
goal. Rob's planning system must come up with a plan, perhaps involving 
actions by both Rob and John. that will result in John knowing what time 
it is. 
Rob can devise a plan using KAMP that consists of a choice between two 
alternalives, First, if John could find out where the clock is. he could go 
to the clock and read it, and in the resulting state would know the time. 
So. Rob can tell John where the clock is, "asoning that this information is 
sufficient for John to form and execute a plan that would achieve his goal. 
'~" DO(A t At) 
DO(A t A2} 
DO(B, A4) J 
Figu re 1 
A Simple Procedural Network 
Do(A, A3) I 
60 
f 
Actlieve(Oetached(Bracel, Como)) 
I ActtievelLoo.se(Boltl II i j 
Achieve(KnowWhaOs(Aoor. E\]oltl)) 
ciaieve( KnowWhalls( AI)l~r. Loosen(Bolt I .Wfl))) 
chieve(t(nowWhatls L--~ ' Achieve(Has 
.=,.=, 
\[ Acllieve(Know(Ap,r.On(Tat,le.Wrl))) ' ~ Oo(Aoor. Get(Wrl. Tattle;) 
Figure 2 
A Plan to Remove a Bolt 
The second alternative is t'or Rob to movc into the hall and read the clock 
himself, move back into the room. and tcU John the time. 
As of the time of this writing. KAMP has been implemented and tested on 
problems involving the planning of high level speech act descriptions, and 
pcrfonns tasks comparable to the planner implcmcntcd by Cohen. A more 
complete description of this planner, and the motivation for its design can 
be found in \[\],\]. The following example is intended to give the reader a 
feeling for how the planner will prncced in a typical situation involving 
linguistic planning, but is not a description of a currently working system. 
An expert and an apprentice are cooperating in the task of repairing an air 
compressor. The expert is assumed to be a computer system that has 
complete knowledge of all aspects of the task, but has no means of 
manipulating the world except by requesting the apprentice to do things. 
and furnishit~g him or her with the knowledge necdcd to complete the task. 
Figure 2 shows a partially completed procedural network. The node at the 
highest level indicates the planner's top-level goal. which in this case is 
Oo(Ap,r. 
Loosen(Bolt1. Wrll) 
Assume that the apprentice knows that rite part is to be removed, and 
wants to do the removal, but does not know of a procedure \['or doing it. 
This situation would hold if the goal marked with an asterisk in figure 2 
were unsatisfied. The expert must plan an action to inform ri~e apprentice 
of what the desired action is. This goal expands into an INFORM action. 
The expert also beiicv~ that the apprentice does not know where the 
wrench is, and plans another \[NI:ORM action to tell him where it is located. 
The planner tests d~c ACIIIt:,VE goals to see if it bclicves d~at any of them 
arc ,already true in die current state of the world. In the case we arc 
considering Y.AMFS model of the hearer should indicate that he ktlows 
what the bolt is. and what the wrench is, but doesn't know what the action 
is. i.e. that he should use that particular wrench to loosen that bolt, and he 
doesn't know the location of the wrench. \[f informing actions ~e planned 
to satisfy those goals that are not already satisfied; then that part of the 
plan looks like Figure 3. 
Each of the INFORM actions is a high-level action that can be expanded. 
The planner has a set of standard expansions for actions of this type. In 
removing a particular object (BRACEI) from an air compressor, \[t knows 
that this goal can be achieved by the apprentice executing a particular 
unfastening operation involving a specific wrench and a specific bolt, "ll~e 
expert knows that the apprentice can do the action if he knows what the 
objects involved in the cask are. and knows what the action is (i.e. that he 
knows how to do the ,action). This is reflected in the second goal in the 
split path in the procedural network. Since the plan also requires obtaining 
a wrench and using it, a goal is also established that tile apprentice knows 
where the wrench is: hence the goal ^CIllEvE(Know(Apprentice. On(Table. 
Wr\].))). 
NOAII, these actions were written in SOUP code. In this planner, they are 
represented in situation-action rules. The conditional of the rule involves 
tests on the type of action to be performed, the hearer's knowledge, and 
social goals. The action is to select a particular strategy for expanding the 
action. In this case, a rule such as /\[you are expanding an inform of what 
an action involving the hearer as agent is. then use an IMPERATIVE syntactic 
construct to describe the action. The planner then inserts the expansion 
shown in Figure 4 into the plan. 
~ ~Achilve(KnowWhatls(Al~Dr.Lo~m~(Bolt 1 .Wrl ))) 
I DO( E xoer t.lnformval(A 130r.L0osen(Bo~t I ,Wr 1 ))) 
"%~Acilieve( KnowWhatis ~ Achieve(Hgs 
I I 
I 
./ 
J Ac hieve(Kn°w('~ pot 'On(Table'Wr I ))) 
I 
I I O~( Exp.lntor m(A~pr.OnlTahle.Wr Ill I I 
Figure 3 
Planning to Inform 
Do(Agtor. Get(We I)) I 
61 
I Dot ExD,int ormV~d(AnDr,Loosen(BoUl .Wrl ))) I ) 
DolExpert. ,~V( "Loo~n "l) 
Do(Expert, CACT(AgDf. Wfl)) IN~f 
Figure 4 
Expanding the INFORM Act 
This sub-plan is marked by a tag indicating that it is to be realized by an 
Unpcrative. The split specifics which h)wer level acuons arc performed by 
the utterance of the imperative. At some point, a critic will choose an 
ordering for the actions. Without further information the scntcncc could 
be realizcd in any of the following ways, some of which sound strange 
when spoken in islolation: 
Loosen Boltl with Wrl. 
With Wrl loosen BOltl. 
Boltl loosen with Wrl. 
The first sentence above sounds natural in isolation. \]'he other two might 
be chosen if a critic notic~ a need to realize a focnsmg action that has 
been plauncd. For example, the second sentence shiftS thc focus to the 
wrench instead of the bolt` and would be useful in organizing a series of 
instructions around what tools to use. The third would be used in a 
discourse organized around what object to manipulate aexL 
Up to this point` the phmning process ilas been quite :;traighdorward, since 
none of the critics have come into piny. However, since there arc two 
INFORM actions on two branches of the same split, thc COMBINE-CONCEPT- 
ACTIVATION critic is invoked. This critic is invoked whenever a plan 
contains a concept activation on one branch of the split, and an inform of 
some property of the activated object on the other branch. Sometimes the 
planner can combine the two informing actions into one by including the 
property description of one of the intbrmmg actS into the description that 
is being used for the concept activation. 
In this particular example, ~ critic would av.,'~h to the Do(Expe~ 
CACT(Appr.. Wri)) action the copetraint that one of the realizing descriptors 
must be ON(Wri. Table). and the goal that the apprentice knows the 
wrench is on the table is marked as already satisfied. 
Another critic, the REDUNDANT-PATII critic, notices when portions of two 
brances of a split contain identical actions, and collapses the two branches 
into one. This critic, when applied to utterance plans will oRen result in a 
sentence with an and conjunction. The critic is not restricted to apply only 
m linguistic actions, and may apply to other types of actions as well. 
Or.her critics know about acuon subsumption, and what kinds of focusing 
actions can be realized in terms of which linguistic choices. One of these 
action subsumption critics can make a decision about the ordering of the 
concept activations, and can mark discourse goals as pha,. ")ms. in U is 
example, there are no spccific discourse goalS, so it is pussibtc to chose the 
default verb-object°instrument ordering. 
On the next next expansion cycle, the concept activations must be 
expanded into uttcrances. This means planning descriptors for the objects. 
Planning the risht description requires reasoning about what the hearer 
believes about the object` describing it as economically as possible, and 
then adding the additional descriptors recommended by the action 
subsumption critic. The final step is realizing the descriptors in natural 
language. Some descriptors have straightforward realizations ,as lexical 
items. Otbers may require planning a prepositional phrnsc or a relative 
clause. 
IV. Formally dcfi,ing H);guistic actions 
If actions are to be planned by a planning system, thcy must be defined 
formally so they can bc used by the system. This means explicitly stating 
the preconditions and effects of each action. Physical actions havc received 
attention in the literature on planning, but one ~pect of physical actions 
Lhat has been ignored arc thcir cffccts on kuowlcdgc. Moorc \[8\] suggestS 
an approach to formalizing, the km)wicdgc cffccL'; of physEal actions, so \[ 
will not pursue Lhat further at this time. 
A fairly large amount of work has been done on the formal specification of 
speech acts un the level of informing and requesting, etc. Most of this 
work has bccn done by Scaric till, and has been incorporatcd into a 
planning system by Cohen \[3\]. 
Not much has been done to formally specify the actions of focusing and 
concept activation. Sidncr \[12\] has developed a set of formal rules for 
detecting focus movement in a discourse, and has suggested that these rules 
could be translated into an appropriate set of actions that a generation 
system could use. Since there are a number of well defined strategies that 
speakers use to focus on different topics. I suggest that the preconditions 
and effectS of these strategies could be defined precisely and they can bc 
incorporated as operators in a planning systcm. Reichmann \[9J describes a 
number of focusing strategies and the situations in which they are 
applicable. The focusing mechanism is driven by the spcakcr's goal that 
the bearer know what is currently being focused on. Tbis particular type 
of knowledge state goal is satisfied by a varicty of different actions. These 
actions have preconditions which depend on what the current state of the 
discourse is, and what type of shift is taking place. 
Consider the problem of moving the focus back to the previous topic of 
discussion after a brief digression onto a diEerent hut related topic. 
Reichmaon pointS out that several actions arc available. Onc soch action is 
the utterance of "anyway'* which signals a more or tcss expected focus 
~hffL. She claims that the utterance of "but" can achieve a similar effect, 
but is used where the speaker believes that the hearer believes that a 
discu~ion on the current topic will continue, and Lhat presupposition needs 
to be countered. Each of these two actions will be defincd in the planning 
system as operator. The °'but" operator will have as an additional 
precondition that the hearer believes that the speaker's next uttorance will 
be part of the current context. Both operators will hay= the effect that the 
hearer believes that the speaker is focusing on the prcvious topic of 
discussion. 
Other operators that are available includc cxplicity labeled shifts. This 
operator exp. ~ds rata planning an INFORM of a fOCUS shill The previous 
example of Take John. for instance, is an example of such an action. 
The prccLsc logical axiomiuzation of focusing and the prccisc definitions of 
each of these actions is a topic of curre..t research. The point being made 
here is that these focusing actions can bc spccificd formally, One goal of 
this research is to formally describe linguistic actions and other knowledge 
producing actions adequately enough to demonstrate the fcasibility of a 
language plmming system. 
V. Current Status 
The K^MP planner described in this paper is in the early stages of 
implementation. It can solve interesting problems in finding multiple agent 
plans, and plans involving acquiring and using knowlcge. It has not bee. 
applied directly to language yet` but this is the next stcp in research. 
62 
Focusing actions need to be described formally, and critics have to be 
defined precisely and implemented. This work is currendy in progress. 
Although still in its early stages, this approach shows a great deal of 
promise for developing a computer system that is capable of producing 
utterances that approach the richness that is apparent in even the simplest 
human communication. 
REFERENCES 
\[1\] Appelt, Douglas, A Planner for Reasoning about Knowledge mid Belief, 
Proceedings of the First Conference of the American Association for 
Artificial Intelligence, 1980. 
\[2\] Austin, J., How to Do Things with Words, J. O. Urmson (ed.), Oxford 
University Pre~ 1962 
\[3\] Cohen, Philip, On Knowing What to Say: Planning Spech Acts, 
Technical Report #118. University of Toronto. 1.978 
\[4\] Gricc, H. P., Logic and Coversation, in Davidson, cd., The Logic of 
Grammar., Dickenson Publishing Co., Encino, California, \[975. 
\[5\] Grosz, Barbara J., Focusing and Description in Natural Language 
Dialogs, in Elements of Discoursc Understanding: Proccedings of a 
Workshop on Computational Aspects of Linguistic Structure and Discourse 
Setting, A. K. Joshi et al. eds., Cambridge University Press. Cambridge. 
Ealgland. 1980. 
\[6\] Halliday, M. A. K., Language Structure and Language Ftmctiol~ in 
Lyons, cd., Ncw Horizons in Linguistics. 
\[7\] Halliday, M. A. K., Language as Social Semiotic, University Park Press, 
Baltimore, Md., 1978. 
\[8\] Moore. Robert C., Reasoning about Knowledge and Action. Ph.D. 
thesis, Massachusetts Institute of Technology. 1979 
\[9\] Reichman. Rachel. Conversational Coherency. Center for Research in 
Computing Technology Tochnical Rcport TR-17-78. Harvard University. 
1978. 
\[10\] Sacerdod, Earl, A Structure for Plans and Behavior. Elsevier North- 
Holland, Inc.. Amsterdam, The Nedlcriands, 1.977 
\['l_l\] Searte, John, Speech Acts, Cambridge Univcrsiy Press, 1969 
\[12\] Sidner, Candace L. Toward a Computational Theory of Definite 
Anaphora Comprehension in English Discourse. Massichusetts Institute of 
Technology Aritificial Intelligence Laboratory technical note TR-537, 1979. 
63 

