The Power of Words in Message Planning 
Michael Zock 
Language & Cognition 
LIMSI - C.N.R.S., B.P. 133 
91403 Orsay, France 
zock@|imsi.fr 
Abstract: Before engaging in a conversa- 
tion, a message must be planned. While there 
are many ways to perform this task, I believe 
that people do this in the following way, in 
particular if the message is going to be long: 
first an outline is planned (global, or skeleton 
planning), which is then filled in with details 
(local planning, elaboration). Planning pro- 
ceeds thus from general to specific (breadth 
first), that is, sentences are planned incremen- 
tally by gradual refinement of some abstract 
thought rather than in one go (one-shot pro- 
cess) where every element is planned down to 
its last details. 
While global planning is largely lan- 
guage independent, local planning can be lan- 
guage dependent: the dictionary acts a media- 
tor, interfacing language and thought. Given 
the fact that words can be used to specify non 
linguistic thought, there is feedback from the 
lexical to the conceptual component. This 
being so, dictionaries may play a fundamental 
role in guiding and potentially modifying non 
linguistic thought. If my view is correct, this 
could have implications on the design of ge- 
neration architectures: instead of separating 
message planning and realization, viewing the 
process as being strictly sequential, we could 
allow for feedback loops (interleaved process), 
whereby the linguistic component could feed 
back to the conceptual component. 
1 Introduction 
A major step in natural language-generation (NLG) 
consists in choosing content words for expressing the 
planned message. While this sounds self evident, it 
contains at least two assumptions that are easily over- 
looked: (a) thought precedes language; (b) thought is 
entirely encoded or specified before lexicalization 
takes place. My contribution in this paper consists in 
providing evidence for the following three claims: (a) 
thought is underspecified at the onset of lexicaliza- 
tion; (b) language can feed back on thought, i.e. 
words can specify the conceptual component; (c) our 
mental dictionaries are the interface between language 
and thought. 
NLG has often been viewed as a two step process. 
During the first (deep generation) conceptual choices 
are made (content determination, discourse planning), 
during the second (surface generation) linguistic ope- 
rations are performed (word choice, determination of 
syntactic structure). While this kind of decomposition 
has proven useful for practical purposes, ---dividing 
the process into separate components increased the 
control, -- it has also encouraged researchers to build 
into their systems wrong assumptions: content is ge- 
nerally determined in one go (one-shot process), and 
information flow is one-directional, going downwards 
from the conceptual level to the linguistic level. As I 
will show by taking an example from the lexicon, 
both these conclusions are ill-lbunded, as they sug- 
gest that there is no feedback between the different 
components. 
2 A naive view of generation 
NLG can be viewed as the process of finding a lin- 
guistic form for a given conceptual fragment. 
Obviously, there are certain dangers with this view. 
First of all, the order of thought, i.e. the order in 
which conceptual chunks become available, and the 
utterance order, i.e. the order in which words have to 
be uttered in a given language is not necessarily the 
same. Second, words cannot be directly mapped on 
their conceptual coun-terpart, that is, there is no one- 
to-one correspondance between concepts and words: a 
given word may ex-press more than a single concept: 
leave vs. go away; unhappy vs. not happy (for more 
details see Nogier & Zock, 1992). Third, there is no 
feedback from the linguistic to the conceptual 
component. In the remainder of this paper I shall 
concentrate precisely on this latter problem by 
showing the interaction between these two 
components. 
3 How is content planned? 
Suppose we wanted to produce the following sentence: 
<,~clen the old man saw the little boy 
drowning in the river, he went to his canoe 
in order to rescue him.>>. Figure 1 here below 
can be considered as the underlying planning tree. But 
how is such a tree built? 
succ 
EVENT-I EVENT-I 
..... ,1 ...... I 
GOA L DI RECTF.D ACTION 
GOAl, 
/\ I t I 
mall AGE. DROWN GO TO R}~CUE 
I /\ /\ /\ 
old PERSON-I PLACE PERSglN-I MEANS OF |'ERSON-I pERoSON-I 
/\ I I ........./\ I I 
bny SIZE river man can~ BELONG TO man L'~)Y .,,L. I 
mall 
Figure 1 
990 
There are several good reasons, --psychological 
and linguistic,-- to believe that this sentence has not 
been planned from left to right and in one go. 
p~ychological re!sons: The sentence is simply too 
long for a speaker to hold in short-term-memory all 
the information to be conveyed. It is highly unlikely 
that the speaker has all this information available at 
the onset of verbalization. The need of planning, that 
is, the need to look ahead and to plan in general 
terms, increases with sentence length and with the 
ntnnber and type of embeddings (for example, center 
eml~xtded sentences). There is also good evidence in 
the speech error literature for the chtim that people 
plan in abstract terms. False starts or repairs, like 
<< I've turned on the stove switch, I mean the heater 
switch >> suggest that the temperature increasing de- 
vice has been present in the speakers mind, yet at an 
abstract level (see Levelt, 1989; Fromkin 1993). 
Linguistic reasons: as mentionned already, tile otr/er 
of words does not necessarily parallel the order of 
thought. For example, the generation of the first word 
of the sentence here above, --tile temporal adverbial 
"when",-- requires knowledge of the fact that there is 
another event taking place. Yet, this information 
appears fairly late in the sentence. 
Figures 2 and 3 here below illustrate a reasonable 
way to plan such a message. Starting with something 
fairly general like, --there are two temporally related 
events: EVENT-I preceding EVENT-2 (st@p-i),-- the 
speaker expands gradually each element (steps 2-8).1 
Step 1: The are two temporally related events 
EVENT-1 PRECEDING EVENT-2 
Step 2: When PERSON-I saw EVENT-3 
Figure 2 
t 1 will use the following conventions in the figures: text 
with gray background in white box: currently expanded 
element; viileo inverted question mark: not yet fully 
specified element; gray box: processed elements; 
pointer: element to be elaborated. 
My comments under the figures should be read in the 
following way: underlined element: currently obtained 
result; capital letters: variable, hence not yet fully 
specified element. 
Step-2: Since there are two events, the speaker lifts 
to choose. Let's assume he wants to begin with 
EVENT-I. This allows the generation of the first 
element "wheW', and the fact that there is a PERSON 
who saw some EVENT. Both person and event are still 
unspecified elements, hence written in capital letters. 
Having two unspecified elements (PERSON, EVENT-3) 
we have to choose. 
Step-3: Suppose we decided to elaborate PERSON. 
This could yield something like man + ATTRIBUTE, 
meaning, that the person who sees the event is a man 
(a terminal element) whom we want to describe further 
by providing a elmmcterizing attribute. Please uote, 
that I consider both man and the variable ATTRIBUTE 
as sister nodes, hence, in principle there could be a 
linearization problem. Whether predicates (in our case, 
the attribute) can, or are determined before the argu- 
ment they qualify (here man) remains an empirical 
question. The situation seems clearer during lexieali- 
zation where a head noun may constrain an adjective, 
hence the noun has to be generated first (colloeational 
constraint). 
Step-4 :During this step we decide on the attribute, 
the result might be "old". One could object that an 
intermediate step is necessary in order to decide on the 
kind of attribute (size, age, etc,). This is correct, but 
lot reasons of economy (size of the figures), we've 
skipped this step. 
Step-S: During this step we elaborate the node 
EVENT-3 which yields : someone drowns somewhere, 
where the person and the place are still unspecified. 
Again we have to choose which element to elaborate. 
Suppose we started with PERSON-2. 
Step-6: This could yield boy + ATTRIBUTE. Boy 
being a terminal element it needs no fnrther refi- 
nement, but we do still need to specify its attribute. 
Step-7 :If we were to characterize the boy in terms 
of size, we might get "small". 
Step-E: The instanciation of the variable (PLACE) 
might yield river. 
Having completed the description of EVENT-1 we still 
have to specify EVEN'r-2. We will leave this as n 
exercice lbr the motivated reader (ben courage!) 
4 Possible implications 
Let us see some of the advantages of our approach. 
Top-down, left-to-right expansion gives the speaker a 
good control over the whole process, minimizing tile 
danger of forgetting some information because of 
memory overload. Of course, the tree can be built in 
different ways, top-down, bottom-up, or by 
combining both methods. 2 For related appl'oaches on 
2 I believe, that the way how the tree is built depends on 
whether tile speaker has at the onset of message 
planning a clear picture of the object or scene to 
describe, or whether he has to build it from scratch. In 
the first case he could build the tree from top to bottom. 
991 
Step 3: When the man + ATTRIBUTE saw... 
Step 4: When the old man saw... 
Step 5: When the old man saw 
PERSON-2 drowning in PLACE-X, EVENT-2 
Step 6: When the old man saw the ~ + ATTRIBUTE 
drowning in PLACE-X, EVENT-2 
Step 7: When the old man saw the little boy 
drowning in PLACE-X, EVENT-2 
chosen node for elaboration 
Step 8: When the old man saw the little boy 
drowning in the river, EVENT-2 
1 
not yet specified node I ~ currently elaborated node m 
I 
Figure 3 
992 
bidirectional tree growth, see the work on segment 
grammar (de Smedt & Kempen1991), or 7?ee 
Adjoining Grammar (Joshi 1987). 
If my line of reasoning concerning message 
planning is correct, --namely that planning is basi- 
('ally a two-step process where l'irst a skeleton is 
planned (general plan), and then its constituents (spe- 
cific plan),-- then this should have consequences on 
the overall architecture o1' generators, as well as on 
the infornmtion flow (control, process). We shall see 
this in the next section on lexical choice. The basic 
question that arises in this context is the following: 
when do we process what'? This suggests the 
corrolary questions: Is all information pertaining to a 
given module processed in one go (one shot), or am 
objects gradually refined (several passes)? 
For example, 
l. is everything related to meaning processed entire- 
ly and once and for all --- hence messages can 
neither be changed not" be refined --- or 
2. are the objects to be talked about only specified to 
the degree to which they need to be at this stage of 
the process (limited commitment planning)? 
Actually, it is quite easy to find arguments in 
favor of this second hypothesis. Why should we cam 
at an early stage of the process about details, if we 
aren't sure at that point, whether the global message 
will meet the goal defined? Put differently, why cam 
at that stage about the specificity of a word or a relY- 
rent (how great a detail to give in order to ch~u'ac- 
lcrizc an object) if we areffl even sum whether the 
planned message contains all and only the inlor- 
nmtion we wish to convey? In a similar vein, why 
bother about style, spelling and punctuation, and so 
forth, if syntactic structure is likely to he changtxl? 
The point 1 am trying to make is that, people pro- 
bably start by planning things globally, filling this 
plan with details at a later stage (local planning). 
We cycle through the same kind of process but at 
different levels of detail. Having built a global plan 
we Ilesh it out with details as soon as this becomes 
necessary. The speech error literature abounds with 
examples supporting this point of view. Blends, 
substitutions, or speech repairs like 'tie conquered 
Babylon, the great Alexander', seem all to suggest, 
that the speaker plans his message in abstract terms. 
For details see (l,evelt 1989; Fromkin, 1993). 
Genuine examples, as the lbllowing from Maclay 
and Osgood (1959:25) clearly show that the speaker 
has not necessarily everything planned at the onset 
of articulation: kruger stretches of spontancous dis- 
course are full of pauses, stutters and mistakes: 
<< As far as l know, no one yet has done the / in a 
way obvious now and interesting problem of 
Ipausel doing a/ in a sense a structural frequency 
study of the alternative \[pause\] syntactical \[uhl / 
in a given language, say, like English, the 
alternative \[uhl possible structures, and how / 
what their hierarchical Ipausel probability of 
occurrence structure is. >> 
5 Lexical choice as paltem matching 
Having planned the underlying content, let's see how 
the dictkmary may feed back on the conceptual com- 
ponent. The process of finding words for conceptual 
structures can be viewed as lexically nrediated, hence 
indirect structure mapping: conceptual li'agments ~ue 
mapped onto words via the lexicon (see figure 4), the 
latter serving as an interface between thought (con- 
cepts) and language (words). 
This view raises a number of interesting problems: 
(a) What shall we do if not all of the planned 
message can be expressed by the words available at a 
given moment? Should we backtrack (try to lind 
another word); change the underlying meaning (replan 
Figure 4: The lexicon (I,L) as mediator between the 
conceptual level (CL) and word level (WIO. 
the content by adding or deleting some inlbrnmtion); 
or carry over to the next cycle (word or sentence) the 
specific parl that couldn'l be expressed (cany-over 
phenomenon) '? For an atl:cmpt to solve this problem, 
see Nicolov et al. (1996). (b) On what basis do we 
choose among a set of potential candklates? (c) What 
information is available at the onset o1' lexicalization 
(all or only part), i.e. at what moment is the word's 
underlying meaning fully specified? 
In order to answer these questions, let me take an 
example. Suppose we warned to express the fact that 
a person moves on a given surlhce in a given 
direction (see GI in Figure 5 next page, or, the left 
branch of second EVENT of our example in section 3 
on message phmning: ...he went to his canoe...). 3 
Obviously, there is more than one way. According to 
the input specifications we could consider either 'to 
move, to swim, to walk', or 'to run'. All these words 
express the notion of movement. Yet, not all of them 
fit equally well the initial message, and lor quite 
different reasons. While the second cmldidate ('1o 
swim') is simply in contradiction with tmrt of thc 
initial specification (location: ground), the last one 
('to run') expresses more than the initial message 
phnmcd. Now, what cot, kl motivate the choice 
between the two remaining candidates, 'to move' ~u~l 
'to walk"? Both of them m'e subgraphs of the 
utterance graph, that is, both of them express part of 
the message planned. Yet 'to walk' expresses more of 
:~ Figure 5 is to be read counterclockwise. 
993 
GI : Initial Utterance graph 
il 
h. G2 : New Utterance graph 
i::~:ii~:ii ~:i:ii~?:i~i~i~i:~~:'i ~ii~i~iiii~i!i~i~i~i!i{ii!~i~/~i`.i~ii~/ii`~i~.~i!~ii!i~i~i~!~i{i~i ~::~,~iiii~i~i~i~:~:i ~:~:ii~:ii:~i:~{~i:~iii~iiii~!ii ~:;i~:ii:~:~i~i:ii::~i: 
VERB : "to move" • VERII : "to move" Correlation factor : 2 
• VERB : "to swim" 
I P,~.~°,'.'~.°l--<~--I .o~,~.~.:. ~°l 
r-finn ~ 
• VERB : "to swim" Correlation factor : 2 
VERB : "to walk" • VERB : "to walk" Correlation factor : 4 
• VERB : "to run" • VERB : "ta run" Correlation factor : 4 
message (concept) planned 
before the lexicalization phase 
not matched, hence 
unexpressed concept 
Figure 5 
matched, hence 
expressed concept 
concept added after 
the lexicalization phase 
994 
the message graph, hence, all things being equal, it 
will be preferred. 
Actually, the choice of a specific word will depend 
to a large extend on pragmatic factors like speaker's 
goals, time and space constraints, hearer's expertise, 
and so forth. Of course, all these factors can be 
interrelated. If conciseness is what we are looking for, 
then the most specific word ('to walk') is to be 
preterred to the more general term ('to move'). The 
reason is simply economy: more of the message is 
expressed by using the given resources, hence time 
and space is gained. There is a limitation though. The 
words chosen have to match the user's expertise. A 
cooperative speaker uses dense words 4 or technical 
terms ('computer', 'space shuttle') only for people 
whose lexical competency allows them to understand 
their meaning, otherwise he will decompose these 
words. 
Now let me get back to the last candidate ('to 
run'), as it raises an interesting problem. The word's 
underlying meaning is not in contradiction with the 
initial message. Yet it does not express exactly what 
was planned. Actually, it expresses something more: 
the notion of speed. 5 This being so, the question 
arises, whether 'to run' can be considered as a valid 
candidate. I believe that it does, provided that the 
additional information (speed) is consistent with 
some belief about the state of the world, and that the 
speaker considers it worthwhile mentioning, if that is 
so, then we have here evidence for feedback of the 
lexical component to the conceptual component. The 
described situation may seem somehow artificial, yet, 
1 believe it occurs quite often in spontaneous 
discourse, even if we are not aware of it. When we 
get to the point of choosing a word, it is the power 
of the language that drives us to say something that 
was not initially planned. Of course, we are ti'ee not 
to mention the additional piece of information, but 
this is not really the point. The point is, that not all 
information ultimately expressed by a word has been 
available in the speaker's mind at the onset of 
lexicalization. 
6 Discussion 
The approach taken here raises an interesting 
problem. When choosing words we express not only 
a given meaning, but we may end up adding to the 
conceptual structure (message) meanings thin initially 
we had not planned. While there are good reasons to 
believe that at an early stage of processing the 
tmderlying meaning of words is underspecified ',rod 
4 Dense words are generally abstract words like inflation- 
rate, superstition, belief. They carry a lot of informa- 
tion. 
Please note that this example serves only for illustra- 
tive purposes. Actually, 'to run' and 'to move fast on 
the ground" do not refer exactly to the same kind of 
locomotion. For more details, see Nogier & Zock 
(1992). 
abstract, hence language independent, the question 
arises whether the information added during the 
expansion phase is not language specific to a large 
extent. Much of the information to be ~:ted is 
precisely intbrmation required by the lexicon. While 
it is possible that for syntactic and other reasons we 
cannot use or retrieve the word planned, it is less 
obvious why one would want to add at this stage 
information for which we don't have words (see also 
Meteer, 1990). Obviously, subscribing to our 
reasoning is taking a stance with regard to the 
language-thought debate (Whorl, 1956), namely, 
language may drive thought. Surprisingly, this is all 
the more likely as we get close to the surface, that is, 
relatively late in the process. 
7 Conclusion 
In this paper I have tried to give evidence for three 
claims, namely that thought is not completely 
specified at the onset of lexicalization; that there is 
feedback from the lexical to the conceptual 
component; and that the dictionary plays a 
flmdamental role in guiding and potentially modifying 
non-linguistic thought. Hence, it is during the 
lexicalization phase that language can play this 
important role over thought. This is what I meant to 
say, when I wrote in my title: the power of words in 
message planning. 
REFERENCES 
K. de Smedt & G. Kcmpen: Segment grammar: a l'orma- 
lism for Incremental Sentence Generation, In, Paris, 
C. & W. Swartout & W. Mann (Eds.). Natural Lan- 
guage Generation in Artificial Intelligence and Com- 
putational Linguistics, K\[uwer Academic Publishers, 
Boston, 1991 
V. Fromkin: Speech Production. In J. Berko, Gleason & 
N. Bernstein Ratner (eds.) Psycholinguistics. Fort 
Worth, TX: Harcourt, Brace, Jovanovich. 1993 
A. Joshi: Tree Adjoining Grammars and their Relevance 
to Generation, in: Kempen (Ed.). Natural Language 
Generation: New Results in Artificial Intelligence, 
Psychology and Linguistics, Martinus Nijhoff Pbs, 
Dordrecht, 1987 
W. Levelt: Speaking : From Intention to Articulation. 
Cambridge. Mass.: MIT Press; 1989 
M. Meteer: The 'Generation Gap': the problem of expres- 
sibility in text planning., BBN Rep. n ° 7347, 1990. 
J.F. Nogier J.F. & M. Zock: Lexical choice by pattern 
matching. In Knowledge Based Systems, 5 (3), 1992. 
N.Nicolov, C.Mellish and G.Ritchie, Approximate 
Generation from Non-Hierarchical Representations, 
8th International Workshop on Natural Language 
Generation, Herstmonceux Castle, 13-15 June 1996. 
B.Whorf: Language, thought, and reality. Cambridge. 
Mass., MIT Press, 1956 
995 
