Parallel Distributed Processing and Role Assignment Constraints 
.lames L. McClelland 
Carnegie-Mellon University 
My work in natural language processing is based on the premise that it is not in general possible to recover the 
underlying representations of sentences wilhout considering semantic constraints on their possible case structures, it 
seems clear that people use these constraints to do several things: 
To assign constituents to the proper ease roles and attach then to the proper other constituents. 
To assign the appropriate reading to a word or larger constituent when it occurs in context. 
To assign default values to missing constituents. 
To instantiate the concepts referenced by the words in a sentence so that they fit the context. 
I believe that parallel-distributed processing models (i.e., conneelionist models which make use of distributed 
representations) provide the mechanisms that are needed for these lasks. Argument altachments and role assign- 
ments seem to require a consideration of the relative merits of competing possibilities (Marcus, 1980; Bates and 
MacWhinney, 1987; MaeWhinney, 1987), as deles lexical dlsambigualion. Conuectionist models provide a very 
natural substrate for these kinds of competition processes (Cottrell, 1985; Wallz and Pollack, 1985). 
The use of distributed representations also seems well suited to capturing many aspects of the way people 
exploit sen)antic constraints. For choosing between two distinct alternalive inlerprelatlons of a constituent, local and 
dislribnted representations may be approximately equivalent, but distributed represenlalions are much more natural 
from c~pturing contextual shading of the interpretation of a constituent, hi a dislributed representation the pattern 
of aclivation Ihat is most typically activated by a particular word or phrase can lye sublly shaded hy constralnLs 
imposed by context; there is no need to limit the choice of alternative shadings to a pre-speeified set of alternatives 
each represented by a differnt single unit. Similarly, filling in missing argumenls is not a mailer of choosing a partic- 
ular concept, but of fiifirlg in a pattern that specifies what is known about the filler, without necessarily specifying a 
particular specific concept. 
In previous work, Alan Kawamoto and I (McClelland and Kawamoto, 1986) implemented a parallel- 
distributed processing (PDP) model that can use semantic constraints to do the fi~ur things listed at the beginning of 
the article, though it was limited to processing only one clause at a lime. While it would lye possible to use such a 
mechanism clause-by-clause, semantic constrainLs are often required to decide which of several clauses a phrase 
belongs to. For example, in the sentence: 
1) John ate the cake that his mother baked at the picnic. 
we attach "at the picnic" to the main clause (as the place where the cake was ealen), whereas in 
2) John ate the cake that his mother baked in the over). 
we attach "in the over)" to the subordir)ate clause (as the place where Ihe cake was baked). Clearly these attach- 
ments depend on knowing Ihat baking can take place in ovens, not at picnics, arid ealiug can lake place at picnics, 
not in ovens; ! would also claim that the relative merits of both atlachments must be laken into account to get the 
atlaehmenls right. It seems, lhen, that a mechanism is needed lhat can consider Ihe possihilily of altaching a phrase 
to more than one possihle clause. 
This article sketches out a model that aims to achieve mtdli-clause capabilily. The model has not yet been 
fidly implemented, so the paper is quite speculative. 1 lowever, I think the model promises to lake us some disl~ance 
toward a belter underslanding of the interaction of syntactic and case-role analysis. In particular, it suggesLs that 
with the right cnr)neclionist archilecture, the four uses of semantic constraints enumerated above become intrinsic 
characteristics of the language processing machinery. 
I would like to thank Geoff tlinton, George l,akoff, Brian MacWhinney and Mark St. John for discussions of the topic of this paper and/or for 
specific comments on the first draft. Supported hy ONR contract N00014-82-C-0374, NR 667.483. 
75 
Representing structure and content. To begin, let us consider how to represent the structure of a sentence in 
a PDP mechanism. To do this, we make use of the notion that a structural description can be repesented as a set of 
triples. For example the correct role structure of Sentence 2 can be represented with a set of triples such as the fol- 
lowing: 
(P1 AGENT BOY) (Pl ACFION ATE) (P1 PATIENT CAKE) 
(P2 AGENT MOTIIER) (P2 ACFION BAKED) (P2 PATIENT CAKE) 
(P2 I,OCATION OVEN) 
An individual triple can be represented in distributed form by dedicating a set of units to each of its parts; thus we 
can have one set of units for the head of the triple, one for the relation, and one for the tail or slot-filler. Each of 
the three parts of a triple can then be represented in distributed form as a pallern of activation over the units. The 
idea of using this kind of three-part distributed representation was inlroduced by l linton (1981) to represent the con- 
tents of semantic nell; the extension to arbitrary tree structures is due to "Fouretzky and l linton (1985) and 
Touretzky (1986). 
For the fillers, or the tail of a triple, the units stand for useful characterizers that serve to distinguish one filler 
from another, llinton (1981) used the term "microfeatures" for these units; these features need not correspond in 
any simple way to verbalizable primitives. Different slot fillers produce different patterns on these units; and the 
different possible instanliatlons of a filler are likewise captured by differences in the pattern of activation on the 
units. 
For the relations, fire units stand for characteristics of the relation itself Note that this differs from most other 
approaches in treating each role or relation as a distributed pattern. This has several virtues. For one thing, it 
immediately eliminates the problem of specifying a small set of case roles, in the face of the fact that there seem to 
he a very large number of very subtle differences between roles that are in many ways very similar. Further, the use 
of distributed representations allows us to capture both the similarities and differences among case roles. The idea 
has been proposed on independent linguistic grounds, as well. 
For the head of each triple, the units stand for characteristics of the whole in which the filler plays a part. 
Thus the pattern that represents P1 is not some arbitrary pointer as it might be in a Lisp-based rcpresentalion, but is 
rather a Reduced Description of the constituent that it stands for (I linton, McClelland, and Rumelhart, 1986; Lakoff, 
personal communication). In particular, the pattern representing P1 would capture characteristics of the act of eat- 
ing and of the participants in the act. There would he less detail, of course, than in the separate representations of 
these constituents where they occur as separate fillers of the tail slot. 
,~yntaclic and case-role representations. Sentences have both art augmented surface structure representation 
and a case-role representation, in the present model, then, there are two sets of units, one that represents the syn- 
tactic structure triples, and one that represents the case-structure triples. I have already described the general form 
of the case-role triples; the syntactic Iriples would have a similar form, though they would capture primarily syntactic 
relations among the constituents. So, for example, the set of syntactic triples of Senlence 2 would be something like: 
(S1 SUBJ BOY) (S1 VERB SAW) (S1 DOBJ (SAKE) 
(CAKE MODIFIER $2) 
($2 SUBJ MOTIIER) ($2 VERB BAKED) ($2 I)OBJ T=CAKE) 
(SI LOC-PP OVEN) 
There are, correspondingly, two main parts to the model, a syntactic processor and a case-fi'ame processor (See Fig- 
ure !). In this respect, the model is similar to marly conventional parsing schemes (e.g., Marcus, 1980; Kaplan and 
Bresnan, 19821. The microstructure is quite different, however. One of the key I.hings ttlat a PDP microstructure 
buys us is the ability to improve the interaction between these two main components. 
Syntactic processing. The role of the syntactic processor is to take in words as they are encountered in read- 
ing or listening and to produce at its outputs a sequence of patterns, with each pattern capturing one syntactic struc- 
ture triple. ~ In Figure l the syntactic processor is shown in the midst of processing Sentence 2. It has reached the 
1. Note that this means that ~everai words can be packed into the same constituent, and that as the words of a constituent (e.g., "the old grey don- 
key") are encountered the microfeatures of the constituent wil| he gradually specified. Thus the representation of the constituent can gradually build 
up at the output of the syntactic processor. 
76 
~ase - ~o/e Prac.es_~o~ 
F 
/ 
4" 
/ 5y.i:o .'fi  Zv-ipk 
II 
0 M,.0,j 
{., xe_ T i le_ IJ.,'/'x 
IqiAa   
bl tt  
Figure I. A diagram of the model. See text for explanation. 
point where it is processing the words "the cake". The output of at this point should tend to aclivate the pattern 
corresponding to (S1 I)OBJ CAKE) over a sef. of units (the syntactic triple "units) whose role is to display the pattern 
of activation corresponding to the current syntactic triple. Note that these units also receive feedback from the 
case-frame processor; the role of this feedback is to fill in unspecified parts of Ihe syntactic triple, as shall be dis- 
cussed below. The syntactic triple unil.s have connections to units (Ihe case-frame triple units) which serve to 
represent the current case-frame triple. 
The connections between these two sets of units are assumed to be learned through prior pairings of synlactic triples 
and case-frame triples, so that they capture the mutual constraints on case and synlactic role assignmenls. The 
inner workings of the syntaclic processor have yet to be fidly worked out, so for now I leave it as a black box. 
The case-frame processor. Tire role of the case-frame processor is to produce art aclive representation of the 
current case-frame cortstituent, based on the pattern represenling Ihe current synlaclic consliluent on the syntactic 
triple units and on feedback from a set of units called the working memory. Tlte working memory is Ihe slruclure iH 
which the developing case-frame represenlalion of Ihe sentence is held. As conslilucnts are parsed, Ihey are loaded 
into the working memory, by way of a network called an I/O ncl. 2 Within Ihe working memory, individual units 
correspond to combinations of units in the current case-role represenlation. Tbtts, Ihe represenlalion at Ihis level is 
conjunctive, artd is therefore capable of maintaining information about which combinations of case-role units were 
activated togelher in the same case-role triple when the patterns aclivated by several triples are snperimposed in the 
working memory (see llinton et al 1986, for discussion). Of course, early in a parse, Ihe loaded constituents will 
necessarily be incomplete. 
Pattern completion. The working memory provides a persisting representation of the cottstituents already 
parsed. This representation persists as a pattern of activation, so that it can bolh constrain and be constrained by 
new constituents as they are encountered, through interactions with a final set of units, called tile hidden ease-role 
units. These units are called "hidden" because their state is not visible to any olher part of the system; instead they 
2. The l/O net is equivalent to TouretT.ky and llinton's (1985) "pull-out net". Its job is to ensure that the characteristics of only a one of the consti- 
tuents stored in the working memory are interacting with the case-rrame triple unit':,. See Touretzky and tlinton (19~5) for details. 
77 
serve to mediate constraining relations among the units in the working memory. The process works as follows. 
Connections from working memory units to hidden units allow the pattern of activation over the working memory to 
produce a pattern over the hidden units. Connections from the hidden units to the working memory units allow 
these patterns, in turn, to feed activation back to the working memory. This feedback allows the network to com- 
plete and dean-up distorted and incomplete patterns (that is, representations of sentences). The connections in the 
network are acquired through training on on a sample of sentences (see St. John, 1986, for details). The connection 
strengths derived from this training experience allow it to sustain and complete the representations of familiar sen- 
tences; this capability generalizes to novel sentences with similar structure. 
What this model can do. The model I have described should be able to do all of the kinds of things listed at 
the beginning of the paper. Consider, for example, the problem of interpreting the sentence "The boy hit the ball 
with the bat." This requires both assigning the appropriate reading (baseball bat) and the appropriate role (instru- 
ment) to the bat. The syntactic triple for this constituent (S1 with-PP BAT), would tend to activate a pattern over 
the coresponding to a blend of baseball bat and flying bat as the tail of the triple, and a blend of the possible case- 
roles consistent with "with" as the the pattern representing the relalion portion of the triple. These in turn would 
tend to activate units representing the various possible filler-role combinations consistent with this syntactic consti- 
tuent. But since the other constituents of the sentence w,~uld already have been stored in the working memory, the 
completion process would tend to support units standing for the baseball-bat as instrument interpretation more than 
others. Thus, simultaneous role assignment and context sensitive selection of the appropriate reading of an ambigu- 
ous word would be expected to fall out naturally from the operation of the completion process. 
Filling in default values for missing arguments and shading or shaping the representations of vaguely described 
constituents is also a simple by-product of the pattern completion process. Thus, fi)r example, on encountering 
"The man stirred the coffee", the completion process will tend to fill in the paltern for the completion that includes a 
spoon as instrument. Note that the pattern so filled in need not specify a particular specific concept; thus for a sen- 
tence like "The boy wrote his name", we would expect a pattern representing a writing inslrument, but not specify- 
ing if it is a pen or a pencil, to be filled in; unless, of course, the network had had specific experience indicating that 
boys always write their names with one particular instrument or another. A similar process occurs on encountering 
the container in a sentence like "The container held the cola". In such eases the constraints impo~d by other con- 
stituents (the cola) would be expected to si~ape the representation of "container", toward a smallish, hand-holdable, 
non-porous container; Again, this process would not necessarily specify a specific container, just the properties such 
a container could be predicted to have. 
l have not yet said anything about what the model would do with the altachment problem posed by the sen- 
tence "The boy ate the cake that his mother baked in the oven." In this case, we would expect that the syntactic 
processor would pass along a constituent like (S? in-PP OVEN), and that it would be the job of the case-role proces- 
sor to determine its correct attachment. Supposing that the experience the network has been exposed to includes 
mothers (and others) baking cakes (and other things) in ovens, we would expect that the case-role triple (P2 LOC 
OVEN) (where P2 stands for the reduced description of "mother-baked-cake") would already be partially active as 
the syntactic constituent became available. Thus the incoming constituent wm,ld simply reinforce a pattern of 
activation thai. already reflected the correct attachment of oven. 
Current staltt~ of the model. As I previously staled, the model has not yet been hnplcmented, and so one 
can treat the previous section as describing the performance of a machine made out of hopeware. Nevertheless I 
have reason to believe it will work. CMU connectlonists now have considerable experience with representations of 
the kind used in the cnse-fi'ame processor (l'ouretzky & llinlon, 1985; Tourelzky, 1986; l)erlhick, 1986). A 
mechanism quite like the case-frame processor has been implemented by St..Iohn (1986), and it demonstrates 
several of the uses of semantic conslraints that I have been discussing. 
Obviously, though, even if the case-frame processor is successfid l here are many more tasks that lie ahead. 
One crucial one is the development of a cormectionist implemenlation of the synlactic processor. I helievc that we 
are now on the verge of understanding sequential processes in connectionist networks (see Jordan, 1986), and that 
this will soon make it possible to describe a complete connectionist mechanism for language processing that captures 
both the strengths and limitations of human language processing capabilities. 
78 
References 
Bates, E., & MaeWhinney, B. (1987). Competition, variation and language learning: What is not universal in 
language acquisition. In B. MacWhinney (Ed.), Mechanisms of language acquisition, ttillsdale, N J: Erlbat, m. 
Cottrell, G. (1985). A connectionist approach to word sense disambiguation (q'R-154). Rochester, NY: University 
of Rochester, Department of Computer Science. 
Derthick, M. (1986). A connectionist knowledge representation system. Thesis proposal, Carnegie-Mellon Univer- 
sity, Department of Computer Science, PitLsburgh, PA. 
llinton, G. E. (1981). Implementing semantic networks in parallel hardware. In G. E. llinton & J. A. Anderson 
(Eds.), Parallel models of associative memory (pp. 161-188). llillsdale, N J: Erlbaum. 
llinton, G. E., McClelland, J. !.., & Rumelhart, I). E. (1986). Distriboted Reprcsentations. In D. E. Rumelhart, 
J. I.. McClelland, & the Pl)P research group (Eds.), Parallel distributed processing: Explorations in the micros- 
trueture of cognition. Volume I. Cambridge, MA: Bradford Books. 
Jordan, M. I. (1986). Serial order: A parallel distributed processing approach (ICS Rep. No. 8604). University of 
California, San Diego, Institute for Cognitive Science. 
Kaplan, R., & Bresnan, J. (1982). I.exical fimctional grammar: A formal system for grammatical representation. 
In I. Bresnan (Ed.), The mental representation of grammatical relations. Cambridge, MA: Mrl" Press. 
Kawamoto, A. II. (1985). Dynamic processes in the (re)solution of lexieal ambiguity. Unpublished doctoral 
dissertation, Brown University. 
MacWhinney,-B. J. (1987). The competition model. In B. MacWhinney (Ed.), Mechanisms or language acquisi- 
tion. llillsdale, NJ: Erlbaum. 
Marcus, M. P. (1980). A theory of.wntactic recognition for natural language. Camhridge, MA: MIT Press. 
McClelland, J. L. (in press.) How we use what we know in reading: An interactive activalion approach. In M. 
Coltheart (Ed.), Attention and performance XII: The psychology of reading. I.ondon: Erlbaum. 
McClelland, .I.L., & Kawamoto, A. H. (1986). Mechanisms of sentence processing: Assigning roles to consti- 
tuents. In J. L. MeClelland, D. E. Rumelhart, & the PDP research group (Eds.), Parallel distributed processing: 
Explorations in the microstrueture of cognition. Volume 1I. Cambridge, MA: Bradford Books. 
St. John, M. F. (1986). Reconstructive memory for sentences. Working paper, Department of Psychology, 
Carnegie-Mellon University, Pittsburgh, PA. 
Tourelzky, D. S. (1986). BoltzCONS: Reconciling conneclionism wilh lhe recnrsive nalnrc of stacks and trees. 
Proceedings or the Eighth Annual ConFerence of the Cognitive Science Society, Amhersl, MA, 522-530. 
Touretzky, I)., & l linlon, G. E. (1985). Symbols among the neurons: l)elails of'a counectionisl inference architec- 
Lure. Proceedings of the Ninth International Johzt Conference on Art~fi¢'ial Intelligence. 
I 
Waltz, D. 1.., & Pollack, J. B. (1985). Massively parallel parsing. Cognitive Science, 9, 51-74. 
79 
