A Connectionist Treatment of Grammar 
for Generation: Relying on Emergents 
Nigel Ward 
Computer Science Division 
University of California at Berkeley 
Abstract 
Parallel treatment of syntactic considerations in generation
promises quality and speed. Parallelism should be used not only 
for simultaneous processing of several sub-parts of the output, but 
even within single parts. If both types of parallelism are used with
incremental generation it becomes unnecessary to build up and ma- 
nipulate representations of sentence structure-- the syntactic form 
of the output can be emergent. 
FIG is a structured connectionist generator built in this way.
Constructions and their constituents are represented in the same 
network which encodes world knowledge and lexical knowledge. 
Grammatical output results from synergy among many constructions
simultaneously active at run-time. FIG incorporates new
ways of handling constituency, word order and optional constituents;
and simple ways to avoid the problems of instantiation
and binding. Syntactic knowledge is expressed in a simple, read- 
able form; this representation straightforwardly defines parts of the 
network. 
1 Introduction 
Generation research has not yet fully identified the advan- 
tages offered by parallelism nor the techniques necessary to 
take advantage of it. This is especially true for the syntactic 
aspects of generation. 
This paper presents a way to exploit parallelism for syn- 
tax in generation. The key points are: Syntactic construc- 
tions are encoded in the same knowledge network as words 
and concepts. Many constructions are active in parallel; 
there is synergy, and sometimes competition. The syntactic 
form of the output emerges from interactions among con- 
structions at run-time-- explicit syntactic choice and build- 
ing up of representations of syntactic structure are unneces- 
sary. 
To see that this approach works for syntactically non-trivial
examples, consider that FIG's outputs include: "once
upon a time there lived an old man and an old woman,"
"one day the old man went into the hills to gather wood,"
"a big peach bobbed down towards an old woman from upstream,"
"an old woman gave a peach to an old man,"
"John broke a dish," "John made the cake vanish,"
and "Mary was killed;" and when producing Japanese:
"mukashi mukashi aru tokoro ni ojiisan to obaasan ga
sunde imashita," "aru hi ojiisan wa yama e shibakari ni
ikimashita," "kawakami kara ookii momo ga donburiko
donburako to obaasan e nagarete kimashita," "ojiisan wa
meeri ni momo o agemashita," and "meeri o koroshimashita."
1 Thanks to Daniel Jurafsky, Robert Wilensky, Dekai Wu, and Terry
Regier. This research was sponsored by the Defense Advanced Research
Projects Agency (DoD), monitored by the Space and Naval Warfare Systems
Command under N00039-88-C-0292, and the Office of Naval Research
under contract N00014-89-J-3205. An early version of this paper
appears in the Proceedings of the 12th Cognitive Science Conference,
Erlbaum, 1990.
Section 2 discusses parallelism in syntax and presents the 
basic proposal. Section 3 presents a framework for connec- 
tionist generation, and Section 4 elaborates the proposal in 
this framework. Sections 5 through 8 discuss an implemen- 
tation of these ideas: Section 5 presents a representation for 
grammatical knowledge, Section 6 explains how the pro- 
posal accounts for specific syntactic phenomena, Section 7 
presents an example of the generator in action, and Section 
8 discusses general implementation issues. Section 9 sum- 
marizes. 
2 Parallel Syntax 
This section discusses two types of parallelism for syn- 
tax, proposes that a generator should have both of them, and 
sketches out the advantages of such an approach. 
Natural language generation research traditionally as- 
sumed that syntactic choices are made in a fixed (and gen- 
erally top-down) order. Yet, for incremental generation at 
least, it is clear that a fixed order of decisions is not appro- 
priate. This realization has led to generators which work on 
several parts of the input in parallel, simultaneously build- 
ing several sub-trees. Recent work in this area includes 
(De Smedt 1990) and (Finkler & Neumann 1989). I will 
refer to this type of parallelism as 'part-wise' parallelism. 
A second kind of parallelism involves using several con- 
structions to generate even one part of the output. As far 
as I know, this 'within-part' parallelism has not been pro- 
posed in the generation literature. It has proven useful in lin- 
guistics. In Fillmore's Construction Grammar the syntactic 
structure of sentences is accounted for in terms of 'superim- 
position' of constructions (Fillmore 1989b). It has also been 
used in psycholinguistics, where analysis of speech errors
suggests that even normal speech is the result of competing 
'plans' (Baars 1980). More specifically, (Stemberger 1985) 
suggested that human speakers can be modeled as having 
many 'phrase structure units' being 'partially activated' si- 
multaneously. That is, many syntactic alternatives for ex- 
pressing some piece of meaning are considered in parallel. 
I propose that a generator should exploit both part-wise 
and within-part parallelism. 
Parallel generation is a good idea for several reasons. 1. 
It has been observed that part-wise parallelism is a good 
way to improve the speed of response, especially for incre- 
mental generation. 2. Part-wise parallelism is also useful 
for handling dependencies. It is not always the case that 
one part can be processed without consideration of the way 
the surrounding utterance will turn out. If the various parts 
are generated in parallel then knowledge about the proba- 
ble output for one part is available for consideration when 
building another part. This can lead to better quality. 3. 
Given the possibility of constraints among the various syn- 
tactic choices involved in building an utterance, there is the 
possibility that a 'first choice' will not work out when the 
larger context is considered. This suggests within-part par- 
allelism, so that a generator has available alternative ways 
to realize some information. Given this it can find a set of
choices that satisfies all the dependencies, resulting in consistent
and natural utterances. 4. If a generator is indeed to
consider all the possible dependencies among choices, then 
parallelism becomes necessary to cope with the amount of 
computation necessary. 5. Parallelism is the natural way to 
generate if the input is very complex (Ward 1989a). 
3 The FIG Approach to Generation 
Reduced to bare essentials, a generator's task is to get 
from concepts (what the speaker wants to express) to words 
(what he can say). On this view, the key problem in genera- 
tion is computing the relevance (pertinence) of a particular 
word, given the concepts to express. Syntactic and other 
knowledge mediates this computation of relevance. 
Accordingly FIG is based on word choice -- every other 
consideration is analyzed in terms of how it affects word 
choice. 
FIG is based on a large semantic network. Words are 
nodes in the network; the activation they receive represents
evidence for their relevance. The basic FIG algorithm is: 
1. each node of the input is a source of activation 
2. activation flows through the network 
3. when the network settles, the most highly activated 
word is selected and emitted 
4. activation levels are updated to represent the new cur- 
rent state 
5. steps 2 through 4 repeat until all of the input has been 
conveyed 
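As a rough illustration, the five steps above can be sketched in Python. This is not FIG's code: the two-concept network, the link weights, and the update rule (simply dropping a concept once a word expressing it has been emitted) are assumptions made for the example, and the names `spread`, `generate`, `LINKS` and `EXPRESSES` are hypothetical. The 11 units of initial activation and the cutoff of 9 cycles are the figures reported later in the paper.

```python
# Toy sketch of the five-step loop. The network, the weights, and the
# crude update rule are illustrative assumptions, not FIG's knowledge base.

def spread(links, sources, cycles=9):
    """Steps 1-2: inject source activation and let it flow for a fixed number of cycles."""
    act = dict(sources)
    for _ in range(cycles):
        nxt = dict(sources)  # input nodes keep acting as sources
        for (src, dst), weight in links.items():
            nxt[dst] = nxt.get(dst, 0.0) + weight * act.get(src, 0.0)
        act = nxt
    return act


def generate(links, expresses, input_concepts, unit=11.0):
    """Emit words one at a time until every input concept has been conveyed."""
    remaining = set(input_concepts)
    output = []
    while remaining:  # step 5
        sources = {concept: unit for concept in remaining}  # step 1
        act = spread(links, sources)  # step 2
        best = max(expresses, key=lambda word: act.get(word, 0.0))  # step 3
        if act.get(best, 0.0) <= 0.0:
            break  # no word receives any activation; stop rather than loop forever
        output.append(best)
        remaining.discard(expresses[best])  # step 4, drastically simplified
    return output


# Hypothetical input; the weights are chosen only so that an order is visible.
LINKS = {("old-womanc1", "old-womanw"): 0.9, ("go-c1", "go-w"): 0.5}
EXPRESSES = {"old-womanw": "old-womanc1", "go-w": "go-c1"}
result = generate(LINKS, EXPRESSES, ["go-c1", "old-womanc1"])
```

In this toy the order of emission falls out of the link weights alone; in FIG the constructions described in the following sections supply most of the activation that determines order.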
Thus FIG is an incremental generator. Its network must 
be designed so that, when it settles, the node which is most 
highly activated corresponds to the best next word. This 
paper discusses only the network structures which encode 
syntactic knowledge. 
Elsewhere I argue that FIG points the way to accurate 
and flexible word choice (Ward 1988), producing natural- 
sounding output for machine translation (Ward 1989c), and 
modeling the key aspects of the human language production 
process (Ward 1989a). 
4 Connectionist Syntax: Overview
In FIG constructions and constituents also are represented 
as nodes in the knowledge network. Their activation levels 
represent their current relevance. They interact with other 
nodes by means of activation flow. Any number of construc- 
tions can be simultaneously active. This handles part-wise 
parallelism, competition, and superimposition. 
Syntactic considerations manifest themselves only 
through their effects on the activation levels of words (di- 
rectly or indirectly). An utterance is simply the result of 
successive word choices. FIG does produce grammatical 
sentences, most of the time, but their 'syntactic structure' is 
emergent, a side-effect of expressing the meaning. Thus we 
can say that the syntactic form of utterances is emergent in 
FIG (see footnote 2). This point will be illustrated repeatedly in Section 6.
Mechanisms developed by linguists (and often adopted 
by generation researchers), such as unification, are not di- 
rected to the task of generation (or parsing) so much as to 
the goal of explaining sentence structure. Accounting for 
the structure of sentences may be a worthwhile goal for linguistics,
but building syntactic structures is not necessary
for language generation, as subsequent sections will show. 
The most common metaphor for generation is that of 
making choices among alternatives. For example, a gen- 
erator may choose among words for a concept, among ways 
to syntactically realize a constituent, and among concepts 
to bind to a slot. Given this metaphor, organizing choices 
becomes the key problem in generator design. Attempts 
to build parallel generators while retaining the notion of 
explicit choice run up against problems of sequencing the 
choices or of doing bookkeeping so that the order of choices 
can vary. This appears to be difficult, judging by the general
paucity of published outputs in descriptions of parallel
generators. On the other hand, relying on emergents means
there are no explicit choices to worry about, and thus there
are no problems of ordering or bookkeeping at all (Ward
1989b).
2 Post hoc examination of FIG output might make one think, for example,
'this exhibits the choice of the existential-there construction.' In FIG
there is indeed an inhibit link between the nodes ex-there and subj-pred,
and so when generating the network tends to reach a state where only one of
these is highly activated. The most highly activated construction can have
a strong effect on word choices, which is why the appearance of syntactic
choice arises.
(defp noun-phr
  (constituents
    (np-1 obl article   ((article 1.2)))
    (np-2 opt adjective ((adjective .28)))
    (np-3 obl noun      ((cnoun .47))) ))
Figure 1: Representation of the English Noun-Phrase Construction
(defp go-p
  (constituents
    (gp-1 obl go-w  ((go-w .2)))
    (gp-2 opt epart ((vparticle .6) (directionr .2)))
    (gp-3 opt noun  ((prep-phr .6) (destinationr .2)))
    (gp-4 opt verb  ((purpose-clause .7) (purposer .2))) ))
Figure 2: Representation of the Valence of "Go"
(defp ex-there
  (inhibit subj-pred passive)
  (constituents
    (et-1 obl therew ((therew .5)))
    (et-2 obl verb   ((verb .5)))
    (et-3 obl noun   ((noun .3))) ))
Figure 3: Representation of the Existential "There" Construction
In FIG all types of knowledge are represented uniformly
in the network, and interact freely at run time. FIG not only 
allows this kind of interaction among various considerations 
when generating, it relies on it. It relies on synergy among 
constructions in the same way that Construction Grammar 
does. It relies on synergy between semantic and syntactic 
considerations, as seen below in Section 6.7. It also enables 
interaction among lexical choices and syntactic considera- 
tions. 
5 Knowledge of Syntax 
This section presents FIG's representation of knowledge, 
first presenting it in a declarative form then showing how 
that representation maps into network structures. 
Starting with this section I will be largely describing FIG- 
as-implemented, as of May 1990. This is for the sake of 
concreteness. The theory, however, is intended to apply 
to parallel generators in general. Moreover, the syntactic 
knowledge presented in this section is purely illustrative. I 
do not claim that these represent the facts of English, nor 
the best way to describe them in a grammar. In particular, 
many generalizations are not captured. The examples are 
intended simply to illustrate the representational tools and 
computational mechanisms available in FIG. Many details 
are left unexplained for lack of space. 
Figure 1 shows FIG's definition of noun-phr, represent- 
ing the English noun-phrase construction. This construction 
has three constituents: np-1, np-2, and np-3. np-1 and np-3
are obligatory; np-2 is optional. Glossing over the details
for the moment, the list at the end of each constituent's definition
specifies how to realize the constituent. For example,
np-1, np-2, and np-3 should be realized as an article, adjective,
and noun, respectively.
Figure 2 shows the construction for the case frame of the 
word "go." First comes go-w, for the word "go," which is 
obligatory. Next come (optionally): a verb-particle repre- 
senting direction (as in "go away" or "go back home" or 
"go down to the lake"), a prepositional phrase to express 
the destination, and a purpose clause.
Figure 3 shows the representation of the existential 
"there" construction, as in "there was a poor cobbler." The 
'inhibit' field indicates that this construction is incompati- 
ble with the passive construction and also with subj-pred, 
the construction responsible for the basic SVO ordering of 
English. 
Figure 4 shows knowledge about when and where constructions
are relevant. Briefly, constructions are associated
with words, with concepts, and with other constructions. 
Constructions are associated with the meanings they can 
express. For example, ex-there is listed under the concept 
introductory, representing that this construction is appro- 
priate for introducing some character into the story, and 
purpose-clause is listed as a way to express the purposer 
relation. 
Constructions are associated with words. For example 
go-p is the 'valence' (case frame) of go-w and noun-phr is 
the 'maximal' of cnoun. 
Constructions are also associated with other constructions.
For example, the fourth constituent of go-p subcategorizes
for purpose-clause (Figure 2); and there are negative
associations among incompatible constructions, for example
the 'inhibit' link between ex-there and subj-pred
(Figure 3).
(defw peachw
  (smallcat cnoun) (expresses momoc)
  (grapheme "peach")
  (english (consnt-initial .5)) )
(defs cnoun (bigcat noun .4) (maximals (noun-phr .4)))   ; common-noun
(defw go-w (cat verb) (expresses ikuc)
  (valence (go-p .2))
  (grapheme (inf "go") (past "went") (pastp "gone") (presp "going")) )
(defc introductoryc (properties persistent) (english (ex-there .2) ))
(defr purposer (english (to2w .4) (purpose-clause .1)) (japanese (ni-w .6)))
Figure 4: Some Knowledge Related to Constructions
[Network diagram not reproduced. Nodes shown: in-contextc, noun-phr, np-1, np-2, np-3, adjective, noun, the-w, a-w, peachw, consnt-initial.]
Figure 5: A Fragment of the Network
Figure 5 shows a fragment of FIG's network, where the
numbers on the links are their weights. This is partially
specified by the knowledge shown in the previous figures.
The mapping from s-expressions to network structures is not
quite trivial. For example, the link from noun to peachw
comes from the statements that peachw has 'smallcat' cnoun
and that cnoun has 'bigcat' noun. Similarly, the link from
peachw to noun-phr is inherited by peachw from the 'maximals'
information on cnoun.
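The two derived links just described can be illustrated with a small Python sketch. The dictionary layout and the function `derive_links` are hypothetical stand-ins for FIG's s-expression reader, and reusing the 'bigcat' number as the weight of the category-to-word link is an assumption; the paper does not spell out how that weight is obtained.

```python
# Hypothetical, flattened versions of the peachw and cnoun declarations in Figure 4.
WORDS = {"peachw": {"smallcat": "cnoun"}}
SUBCATS = {"cnoun": {"bigcat": ("noun", 0.4), "maximals": [("noun-phr", 0.4)]}}


def derive_links(words, subcats):
    """Return {(source, target): weight} for links implied by the declarations."""
    links = {}
    for word, decl in words.items():
        subcat = subcats[decl["smallcat"]]
        category, weight = subcat["bigcat"]
        # Category-to-member link, e.g. noun -> peachw (weight is an assumption).
        links[(category, word)] = weight
        # 'Maximal' links are inherited by every word of the subcategory.
        for construction, maximal_weight in subcat["maximals"]:
            links[(word, construction)] = maximal_weight
    return links


derived = derive_links(WORDS, SUBCATS)
```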
6 Various Syntactic Phenomena 
6.1 Constituency 
The links described above suffice to handle constituency. 
Consider for example the fact that common nouns must be
preceded by articles in FIG's subset of English. Suppose
that peachw is activated, perhaps because a peachc concept
is in the input. Activation flows from peachw via noun-phr,
np-1, and article to a-w and the-w.
In this way the relevance of a noun increases the relevance
rating of articles. Provided that other activation levels
are appropriate, this will cause some article to become the
most highly activated word, and thus be selected and emitted.
Note that FIG does not first choose to say a noun, then
decide to say an article; rather these 'decisions' emerge
as activation levels settle.
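To make the path from peachw to the articles concrete, the sketch below multiplies link weights along it. The weights .4 and 1.2 are taken from Figures 4 and 1; the 1.0 for the constituent under the cursor and the .5 for category-to-member links are values the paper mentions elsewhere. Treating the activation that arrives as a simple product of weights along one path is a simplifying assumption (the real network sums over many paths and settles over several cycles), and `path_gain` is a hypothetical helper.

```python
import math

# Link weights along the path discussed in the text (see the caveats above).
LINKS = {
    ("peachw", "noun-phr"): 0.4,
    ("noun-phr", "np-1"): 1.0,
    ("np-1", "article"): 1.2,
    ("article", "a-w"): 0.5,
    ("article", "the-w"): 0.5,
}


def path_gain(links, path):
    """Multiply the link weights along a path given as a list of node names."""
    return math.prod(links[(a, b)] for a, b in zip(path, path[1:]))


gain_the = path_gain(LINKS, ["peachw", "noun-phr", "np-1", "article", "the-w"])
gain_a = path_gain(LINKS, ["peachw", "noun-phr", "np-1", "article", "a-w"])
```

Both articles receive the same share along this path, so under these assumptions which one wins depends on activation from other sources.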
Any node can be mentioned by a constituent, thus con- 
structions can specify: which semantic elements to include 
(metonymies), what order to mention things in, what func- 
tion words to choose, and what inflections to use. 
6.2 Subcategorization 
Consider the problem of specifying where a given con- 
cept should appear and what syntactic form it should take. 
In FIG this is handled by simultaneously activating a con- 
cept node and a syntactic construction or category node. For 
example, the second constituent of go-p specifies that 'the
direction of the going' be expressed as a 'verbal particle.'
Activation will thus flow to an appropriate word node, such as
downw, both via the concept filling the directionr slot and 
via the syntactic category vparticle. Thanks to this sort of 
activation flow FIG tends to select and emit an appropriate 
word in an appropriate form (Ward 1988). Government, for 
example, the way that some verbs govern case markers, is 
handled in the same way. 
6.3 Word Order 
In an incremental connectionist generator, at each time 
the activation level of a word must represent its current rele- 
vance. In particular, words which are currently syntactically 
appropriate must be strongly activated. In FIG the represen- 
tation of the current syntactic state is distributed across the 
constructions. There is no central process which plans or 
manipulates word order; each construction simply operates 
independently. More highly activated constructions send 
out more activation, and so have a greater effect. But in the 
end, FIG just follows the simple rule, 'select and emit the 
most highly activated word.' Thus word order is emergent. 
In FIG the current syntactic state is encoded in construc- 
tions' activation levels and 'cursors.' The cursor of a con- 
struction points to the currently appropriate constituent and 
ensures that it is relatively highly activated. To be spe- 
cific, the cursor gives the location of a 'mask' specifying the 
weights of the links from the construction to constituents. 
The mask specifies a weight of 1.0 for the constituent un- 
der the cursor, and for subsequent constituents a weight pro- 
portional to their closeness to the cursor. (Subsequent con- 
stituents must receive some activation so that there is part- 
wise parallelism.) (For unordered constructions the weights 
on all construction-constituent links are the same.) 
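A possible reading of the mask is sketched below. The text fixes only the weight 1.0 under the cursor and says that later constituents get a weight that depends on their closeness to the cursor; the particular falloff 1/(1 + distance), the zero weight for constituents already passed, and the uniform weight 1.0 for unordered constructions are all assumptions, and `cursor_mask` is a hypothetical name.

```python
def cursor_mask(n_constituents, cursor, ordered=True):
    """Weights on the links from a construction to each of its constituents."""
    if not ordered:
        # Unordered constructions: every constituent link gets the same weight.
        return [1.0] * n_constituents
    mask = []
    for index in range(n_constituents):
        if index < cursor:
            mask.append(0.0)  # assumption: completed constituents get nothing
        else:
            distance = index - cursor
            mask.append(1.0 / (1 + distance))  # 1.0 under the cursor, less further on
    return mask


mask_at_np1 = cursor_mask(3, 0)  # cursor of noun-phr on np-1
mask_at_np2 = cursor_mask(3, 1)  # cursor advanced to np-2
```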
For example, when the cursor of noun-phr points to np- 
1, articles receive a large proportion of the activation of 
noun-phr. Thus, an article is likely to be the most highly 
activated word and therefore selected and emitted. After an 
article is emitted the cursor is advanced to np-2, and so on. 
Advancing cursors is described in Section 6.5. 
In accordance with the intuition that a word is not truly 
appropriate unless it is both syntactically and semantically 
appropriate, the activation level for words is given by the 
product (not the sum) of incoming syntactic and seman- 
tic activation, where 'syntactic activation' is activation re- 
ceived from constituents and syntactic categories. The 
problem with simply summing is that it results in the
network often being in a state where many word-nodes have
nearly equal activation, which makes the behavior oversensitive
to minor changes in link weights.
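The difference between the two combination rules can be seen with a few made-up numbers. The three candidate words and their (syntactic, semantic) activation pairs below are hypothetical; the sketch only shows that summing can leave the candidates nearly tied while the product clearly separates the word that is appropriate on both counts.

```python
# Hypothetical (syntactic, semantic) activation for three candidate words.
CANDIDATES = {
    "word-a": (6.0, 1.0),  # syntactically apt, semantically weak
    "word-b": (1.0, 6.0),  # semantically apt, syntactically weak
    "word-c": (3.5, 3.4),  # reasonably apt on both counts
}

sums = {word: syn + sem for word, (syn, sem) in CANDIDATES.items()}
products = {word: syn * sem for word, (syn, sem) in CANDIDATES.items()}

# Gap between the best and the worst candidate under each rule.
sum_spread = max(sums.values()) - min(sums.values())
product_spread = max(products.values()) - min(products.values())
winner_by_product = max(products, key=products.get)
```

Under summation a small change in any one weight could reorder these candidates; under the product rule word-c wins by a wide margin.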
6.4 Optional Constituents 
When building a noun-phrase a generator should emit an 
adjective if semantically appropriate, otherwise it should ig- 
nore that option and emit a noun next. FIG does this without 
additional mechanism. 
To see this, suppose "the" has been emitted and the cursor 
of noun-phr is on its second constituent, np-2. As a result
adjectives get activation, via np-2, and so to a lesser extent
do nouns via np-3. There are two cases: If the input includes 
a concept linked (indirectly perhaps) to some adjective, that 
adjective will receive activation from it. In this case the ad- 
jective will receive more syntactic activation than any noun 
does, and hence have more total activation, so it will be se- 
lected next. If the input does not include any concept linked 
to an adjective, then a noun will have more activation than 
any adjective (since only the noun receives semantic activa- 
tion also), and so a noun will be selected next. 
Most generators use some syntax-driven procedure to in- 
spect semantics and decide explicitly whether or not to real- 
ize an optional constituent. In FIG, the decision to include 
or to omit an optional constituent (or adjunct) is emergent 
-- if an adjective becomes highly activated it will be chosen,
in the usual fashion, otherwise some other word, most
likely a noun, will be. 
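The two cases can be replayed with a minimal sketch that uses the product rule of Section 6.3. The word nodes bigw and mountainw and all activation values are hypothetical; the only point is that no explicit test for "is there an adjective to say?" appears anywhere.

```python
def next_word(syntactic, semantic):
    """Pick the word with the largest product of syntactic and semantic activation."""
    return max(syntactic, key=lambda word: syntactic[word] * semantic.get(word, 0.0))


# Cursor of noun-phr on np-2: adjectives get the most syntactic activation,
# nouns somewhat less via np-3 (values are hypothetical).
SYNTACTIC = {"bigw": 1.0, "mountainw": 0.5}

# Case 1: the input contains a concept linked to the adjective.
with_adjective = next_word(SYNTACTIC, {"bigw": 5.0, "mountainw": 5.0})
# Case 2: nothing in the input activates any adjective.
without_adjective = next_word(SYNTACTIC, {"mountainw": 5.0})
```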
6.5 Updating Constructions 
Recall that FIG, after selecting and emitting a word, up- 
dates activation levels to represent the new state. There are 
several aspects to this.
The cursors of constructions must advance as constituents 
are completed. The update mechanism can 'skip over' 'opt'
constituents, since, for example, if there are no adjectives,
the cursor of noun-phr should not remain stuck forever at 
the second constituent. More than one construction may be 
updated after a word is output, for example, emitting a noun 
may cause updates to both the prep-phr construction and 
the noun-phr construction. 
Constructions which are 'guiding' the output should be 
scored as more relevant. Therefore the update process 
adds activation to those constructions whose cursors have 
changed and sets temporary lower bounds on their activa- 
tion levels. Thus, even though FIG does not make any syn- 
tactic plans, it tends to form a grammatical continuation of 
whatever it has already output. After the last constituent of 
a construction has been completed, the cursor is reset and 
the lower bound is removed. 
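One way to picture the update step for a single construction is sketched below. The class `Construction`, the function `update`, and the `boost` and `floor` values are hypothetical; matching the emitted word's category against the third element of each constituent description follows the remark in Section 8 that the update process uses those elements, but the details of FIG's actual procedure are not given in the paper.

```python
from dataclasses import dataclass


@dataclass
class Construction:
    name: str
    constituents: list  # (constituent name, 'obl' or 'opt', category) triples
    cursor: int = 0
    activation: float = 0.0
    lower_bound: float = 0.0


def update(construction, emitted_category, boost=5.0, floor=3.0):
    """Advance the cursor past the constituent filled by the emitted word, if any."""
    index = construction.cursor
    while index < len(construction.constituents):
        _, kind, category = construction.constituents[index]
        if category == emitted_category:
            break
        if kind == "obl":
            return False  # an obligatory constituent cannot be skipped
        index += 1  # skip over an optional constituent
    else:
        return False  # ran off the end without a match
    construction.cursor = index + 1
    if construction.cursor == len(construction.constituents):
        # Last constituent completed: reset the cursor, remove the lower bound.
        construction.cursor = 0
        construction.lower_bound = 0.0
    else:
        # The construction is guiding the output: add activation, set a floor.
        construction.activation += boost
        construction.lower_bound = floor
        construction.activation = max(construction.activation, floor)
    return True


noun_phr = Construction("noun-phr", [("np-1", "obl", "article"),
                                     ("np-2", "opt", "adjective"),
                                     ("np-3", "obl", "noun")])
update(noun_phr, "article")  # cursor moves to np-2
update(noun_phr, "noun")     # np-2 is skipped, np-3 completes, cursor resets
```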
Why is a separate update mechanism necessary? Most 
generators simply choose a construction and 'execute' it 
straightforwardly. However, in FIG no construction is ever 
'in control.' For example, one construction may be strongly 
activating a verb, but activation from other constructions 
may 'interfere,' causing an adverbial, for example, to be in- 
terpolated. Therefore constructions need this kind of feed- 
back on what words have been output. 
6.6 No Instantiation or Binding 
It is not obvious that notions of instantiation, binding, embedding,
or recursion are essential for the description of natural
language. Nor are mechanisms for these things essential
for the generation task, I conjecture. This subsection
considers a problem which is usually handled with instanti- 
ation and shows how it can be handled more simply without. 
Consider the problem of generating utterances with mul- 
tiple 'copies,' for example, several noun phrases, or several 
uses of "a". Note that FIG as described so far would have 
problems with this. For example since all words of cate- 
gory cnoun have links to noun-phr, that node might re- 
ceive more activation than appropriate, in cases when sev- 
eral nouns are active. This could result in over-activation of 
articles, and thus premature output of "the," for example. 
In fact FIG uses a special rule for activation received 
across inherited links: the maximum (not the sum) of these 
amounts is used. For example, this rule applies to the 'max- 
imal' links from nouns to noun-phr, thus noun-phr effec- 
tively 'ignores' all but the most highly activated noun. (This 
was not shown in Figure 5.) 
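The effect of the maximum rule can be sketched as follows. The function `incoming`, the set of inherited links, and the three noun activations are hypothetical; the .4 weight is the 'maximals' weight from Figure 4. Ordinary links are summed, while activation arriving over inherited links contributes only its largest single amount.

```python
def incoming(activations, links, target, inherited):
    """Activation arriving at target: sum over ordinary links, max over inherited ones."""
    ordinary_total = 0.0
    inherited_amounts = []
    for (src, dst), weight in links.items():
        if dst != target:
            continue
        amount = weight * activations.get(src, 0.0)
        if (src, dst) in inherited:
            inherited_amounts.append(amount)
        else:
            ordinary_total += amount
    return ordinary_total + max(inherited_amounts, default=0.0)


# Three active nouns, each with an inherited 'maximal' link to noun-phr.
LINKS = {("peachw", "noun-phr"): 0.4,
         ("streamw", "noun-phr"): 0.4,
         ("old-womanw", "noun-phr"): 0.4}
ACTIVATIONS = {"peachw": 10.0, "streamw": 8.0, "old-womanw": 6.0}

with_max_rule = incoming(ACTIVATIONS, LINKS, "noun-phr", inherited=set(LINKS))
with_plain_sum = incoming(ACTIVATIONS, LINKS, "noun-phr", inherited=set())
```

With the rule, noun-phr receives only the contribution of the most active noun; with plain summation it would receive more than twice as much in this example.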
[Figure 6 diagram not reproduced.]
Figure 6: An Input to FIG
[Figure 7 diagram not reproduced. Nodes shown: subj-pred, sp-1, noun, go-w, old-womanc1, np-3, old-womanw, noun-phr, np-1, article, a-w, the-w, in-contextw.]
Figure 7: Selected Paths of Activation Flow Just Before Output of "the"
An earlier version of FIG handled this by actually mak- 
ing copies. For example, it would make a copy of noun- 
phr for each noun-expressible concept, and bind each copy 
to the appropriate concept, and to copies of a-w and the- 
w. This worked but it made the program hard to extend. 
In particular, it was hard to choose weights such that the 
network would behave properly both before and after new 
nodes were instantiated and linked in.
6.7 Low-level Coherence 
Words must stand in the correct relations to their neigh- 
bors. For example, a generator must not produce "the big 
man went to the mountain" when the input calls for "the 
man went to the big mountain". This is the problem of emitting
the right adjective at the right time, or, in other words,
only emitting adjectives that stand in an appropriate relation 
to the head noun. 
Most generators handle this easily with structure- 
mapping or pointer following. For example, a syntax- 
directed generator may, whenever building a noun phrase, 
traverse the 'modified-by' pointer to find the item to turn 
into an adjective. FIG, however, eschews structure manip- 
ulation and pointer following. Like all connectionist ap- 
proaches, therefore, it is potentially subject to problems with 
crosstalk. 
The way to avoid this is to ensure that related concepts become
highly activated together. In the example, bigc should
become activated together with mountainc, not together
with old-manc. Using a more elaborate terminology, this
means that there should be some kind of 'focus of attention'
(Chafe 1980), which successively 'lights up' groups of related
nodes.
This condition is met in FIG, thanks to the links among
the nodes of the input. For example, if mountainc1 is linked
by a sizer link to bigc1, then bigc1 will tend to become
highly activated whenever mountainc1 is. Thus, when old-manc1
is the most highly activated concept-node, bigc1
will only receive energy from it indirectly (via an inverse-agentr
link, a locationr link, and a sizer link) and thus will
not be activated sufficiently to interfere early in the sentence.
7 Example 
This section describes how FIG produces "the old woman
went to a stream to wash clothes." For this example
the input is the set of nodes go-c1, old-womanc1, wash-clothesc1,
streamc1, and pastc, linked together as shown
in Figure 6. (The names of the concepts have been anglicized
for the reader's convenience. Boxes are drawn
around nodes in the input so that they can be easily identified
in subsequent diagrams.)
Initially each node of the input has 11 units of activation. 
After activation flows, before any word is output, the most 
highly activated word node is the-w, primarily for the rea- 
sons shown in Figure 7. Figure 8 shows the activation levels 
of selected nodes. 
After "the" is emitted the update mechanism activates 
noun-phr and advances its cursor to np-2. The most highly 
activated word becomes old-womanw, largely due to acti- 
vation from np-3. 
After "old woman" is emitted noun-phr is reset -- that 
is, the cursor is set back to np-1 and it thereby becomes 
ready to guide production of another noun phrase. Also, 
now the cursor on subj-pred advances to sp-2. As a result 
verbs, in particular go-w, become highly activated. 
---PATTERNS---              ---WORDS---            ---CONCEPTS---
15.6 SUBJ-PRED              29.7 THE-W             19.7 OLD-WOMANC1
     SP-1 sp-2              21.0 A-W               15.0 IKUC1
 7.6 CAUSATIVEP             18.5 OLD-WOMANW        14.0 KAWAC1
     CP-1 cp-2 cp-3         13.3 STREAMW           13.2 SENTAKUC1
 6.6 NOUN-PHR               10.7 RIVERW            11.0 PASTC
     NP-1 np-2 np-3         10.0 GO-W               8.3 VOWEL-INITIAL
 1.8 GO-P                    7.5 WASH-CLOTHESW      6.1 CONSNT-INITIAL
     GP-1 gp-2 gp-3 gp-4     3.9 TO1W               5.8 TOPICC
 1.4 PURPOSE-CLAUSE          3.2 MAKEW             ---OTHER---
     PC-1 pc-2 pc-3          2.9 TOWARDSW          13.4 CAUSER
 0.2 PREP-PHR                2.9 INTOW             10.4 AGENTR
     PP-1 pp-2               2.5 TO2W               6.9 ARTICLE
                             0.4 WITHW              4.5 NOUN
Figure 8: Activation Levels of Selected Nodes Just Before Output of "the"
[Figure 9 diagram not reproduced. Nodes shown: go-p, gp-3, destinationr, prep-phr, pp-1, pp-2, preposition, noun, to1w, streamc1, streamw.]
Figure 9: Selected Paths of Activation Flow Just Before Output of "to"
go-w is selected. Because pastc has more activation than
presentc, infinitivec and so on, go-w is inflected and emitted
as "went" (the inflection mechanism is not described in
this paper). go-p's cursor advances to its second constituent,
thus it activates directional particles, although there is no semantic
input to any such word in this case. to1w becomes
the most highly activated word, primarily for the reasons
shown in Figure 9.
After "to" is emitted, the cursor of prep-phr is advanced.
The key path of activation flow is now from the second constituent
of prep-phr to noun to streamw to noun-phr to
article to a-w. Thus a-w is selected. The inflection mechanism
produces "a" not "an" since consnt-initial is more
highly activated than vowel-initial.
Then the cursor of noun-phr advances and "stream" is 
emitted. After this the cursor of go-p advances to gp-4. 
From this constituent activation flows to purpose-clause, 
and in due course "to" and "wash clothes" are emitted. 
Now that all the nodes of the input are expressed, FIG 
ends, having produced "the old woman went to a stream to 
wash clothes." 
8 About the Implementation 
I have used a connectionist model because it is a good
way to explore interactivity, parallelism, and emergents, not because
of fondness for connectionism-for-its-own-sake.
Thus I have not attempted to develop a distributed con- 
nectionist model. Distributed models do have various ad- 
vantages, such as elegant handling of generalizations and 
the potential for learning. Yet the current state of PDP tech- 
nology does not seem up to building an interactive model of 
a complex task like language generation. I therefore developed
FIG as a structured (localist) connectionist system.
I have also not attempted to make FIG a 'pure' connec- 
tionist model. For example, updating constructions is cur- 
rently done by a special process that goes in and changes 
activation levels and moves the cursor. (This process uses 
the third elements in the constituent descriptions of Figures 
1-3, not previously discussed.) FIG could be made more 
'pure' by doing this connectionistically, perhaps by adding 
new nodes with special properties. But this change would 
not improve FIG's performance, since there seems no need 
for the update process to interact with the other processes. 
A connectionist model of computation allows parallelism 
and emergents, but it certainly does not require them. In- 
deed, other generators built using structured connection- 
ism (Kalita & Shastri 1987; Gasser 1988; Kitano 1989; 
Stolcke 1989) do not appear to exploit parallelism much, nor 
do they exhibit emergent properties. For example, Gasser's 
CHIE relies heavily on winner-take-all subnetworks, which 
cuts down on the amount of effective parallelism. Also, far 
from exploiting emergents, CHIE uses 'neuron firings' to 
model syntactic choices; these happen sequentially and the 
exact order and timing of firings seems crucial. 
Currently FIG has about 350 nodes and 1000 links. Be- 
fore each word choice, activation flows until the network 
settles down, with cutoff after 9 cycles. This takes about .2 
seconds per word on average, simulating parallel activation 
flow on a Symbolics 3670 (1.6 seconds on a Sun 3/140).
The correct operation of FIG depends on having correct 
link weights. I have no theory of weights; indeed, finding
appropriate ones is still largely an empirical process. How- 
ever there are regularities, for example, all 'inhibit' links 
have weight .7, almost all links from syntactic categories 
to their members have weight .5, and so on. Many of the 
weights have a rationale: for example, the link from np-1
to articles has a relatively high weight because articles get 
very little activation from other sources. No single weight 
is meaningful; the way it functions in context is. For exam- 
ple, the exact weight of the link from the first constituent of 
subj-pred to noun is not crucial, as long as the product of 
it and the weight on the agentr relation is appropriate. 
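The point that only the product matters can be illustrated with a small calculation. This sketch assumes that the two weights combine multiplicatively, as the text suggests. The function name path_gain and the numeric values are hypothetical and are not FIG's actual weights.

```python
from math import isclose, prod


def path_gain(weights):
    """Multiply the weights met along a path (assumed combination rule)."""
    return prod(weights)


# Two hypothetical settings, each given as
# (constituent-to-noun link weight, weight on the agentr relation).
setting_a = (0.4, 1.0)
setting_b = (0.5, 0.8)

gain_a = path_gain(setting_a)
gain_b = path_gain(setting_b)

# The individual weights differ, but the products agree, so under this
# assumption the two settings would have the same effect in context.
same_effect = isclose(gain_a, gain_b)
```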
FIG's knowledge is, of course, very limited. Adding new 
concepts, words or constructions is generally straightfor- 
ward; they can be encoded by analogy to similar nodes, and 
usually the same link weights suffice. Occasionally new 
nodes and links interact with other knowledge in the system 
in unforeseen ways, causing other nodes to get too much 
or too little activation. In these cases it is necessary to de- 
bug the network. Sometimes trial-and-error experimenta- 
tion is required, but often the acceptable range of weights 
can be determined by examination. This is a kind of back- 
propagation by hand; it could doubtless be automated. 
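One simple way such a search might be automated is to sweep candidate values for a single weight and keep those that give the target node neither too little nor too much activation. The sketch below is only an illustration: the function name acceptable_weights, the linear activation formula, and every number in it are assumptions, loosely modeled on the earlier example of a node that gets little activation from other sources.

```python
def acceptable_weights(candidates, rival_level, ceiling,
                       other_sources=0.05, source_level=1.0):
    """Return the candidate weights that leave the target node above a
    rival's activation level but below a ceiling (all values hypothetical)."""
    accepted = []
    for weight in candidates:
        # Assumed formula: activation over the tuned link plus a small
        # contribution from all other sources.
        target_level = weight * source_level + other_sources
        if rival_level < target_level < ceiling:
            accepted.append(weight)
    return accepted


candidates = [i / 10 for i in range(11)]        # 0.0, 0.1, ..., 1.0
accepted = acceptable_weights(candidates, rival_level=0.42, ceiling=0.9)
weight_range = (min(accepted), max(accepted)) if accepted else None
```

A real network would need the full activation flow rerun for each candidate weight, since changing one link can affect many nodes.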
9 Summary 
I have proposed a new way to handle syntax for generation.
The proposal relies heavily on parallelism: part-wise
parallelism, competition, and cooperation. Also, syntactic
considerations are used in parallel with lexical and
world knowledge and there is pervasive interaction among 
them. This promises improved output quality without
sacrificing speed, on parallel hardware. The proposal also relies
heavily on emergents -- it does not make syntactic choices 
nor build up representations of syntactic structure. The net- 
work representations of linguistic knowledge affect word 
choice and order directly. 
This work is not traditional linguistics, artificial
intelligence, or connectionism, but uses techniques from all three
fields. I hope this will stimulate further work in empirical 
computational linguistics, modeling human language pro- 
duction, and building useful parallel generation systems. 

References 
Baars, Bernard K. (1980). The Competing Plans Hypoth- 
esis: an heuristic viewpoint on the causes of errors in 
speech. In Hans W. Dechert & Manfred Raupach, edi- 
tors, Temporal Variables in Speech. Mouton. 
Chafe, Wallace L. (1980). The Deployment of Conscious- 
ness in the Production of a Narrative. In Wallace L. Chafe, 
editor, The Pear Stories. Ablex. 
De Smedt, Koenraad J.M.J. (1990). Incremental Sentence
Generation: a computer model of grammatical encoding. 
Technical Report 90-01, Nijmegen Institute for Cognition 
Research and Information Technology. 
Fillmore, Charles (1989a). The Mechanisms of "Construction
Grammar". In Proceedings of the Berkeley Linguistic
Society, volume 15. 
Fillmore, Charles (1989b). On Grammatical Constructions. 
Course notes, UC Berkeley Linguistics Department.
Finkler, Wolfgang & Günter Neumann (1989). POPEL-HOW:
A Distributed Parallel Model for Incremental Natural
Language Production with Feedback. In Proceedings
of the Eleventh International Joint Conference on Artificial
Intelligence. Detroit.
Gasser, Michael (1988). A Connectionist Model of Sentence
Generation in a First and Second Language. Technical
Report UCLA-AI-88-13, Los Angeles.
Kalita, Jugal & Lokendra Shastri (1987). Generation of 
Simple Sentences in English Using the Connectionist 
Model of Computation. In 9th Cognitive Science Conference.
Lawrence Erlbaum Associates.
Kitano, Hiroaki (1989). A Massively Parallel Model of 
Natural Language Generation for Interpreting Telephony: 
Almost Concurrent Processing of Parsing and Genera- 
tion. In Proceedings of the Second European Workshop 
on Natural Language Generation. 
Stemberger, J. P. (1985). An Interactive Activation Model 
of Language Production. In Andrew W. Ellis, edi- 
tor, Progress in the Psychology of Language, Volume 1. 
Lawrence Erlbaum Associates. 
Stolcke, Andreas (1989). Processing Unification-based 
Grammars in a Connectionist Framework. In 11th Cognitive
Science Conference. Lawrence Erlbaum Associates.
Ward, Nigel (1988). Issues in Word Choice. In Proceedings 
12th COLING. Budapest. 
Ward, Nigel (1989a). Capturing Intuitions about Human 
Language Production. In Proceedings, Cognitive Science 
Conference. Lawrence Erlbaum Associates. Ann Arbor. 
Ward, Nigel (1989b). On the Ordering of Decisions in Ma- 
chine Translation. In Proceedings of the Third Annual 
Conference of the Japanese Society for Artificial Intelli- 
gence, Tokyo. 
Ward, Nigel (1989c). Towards Natural Machine Transla- 
tion. In Proceedings of the EIC Workshop on Artificial 
Intelligence, Tokyo. Institute of Electronics, Information, 
and Communication Engineers. Published as Technical 
Research Report AI89-30. 
