Hopfield Models as Nondeterministic Finite-State Machines 
Marc F.J. Drossacrs 
Computer Science Department, 
University of Twente, 
P.O Box 217, 7500 AE Enschede, 
The Netherlands, 
email: mdrssrs@cs.utwente.nl. 
Abstract 
Tbe use of neural networks for integrated linguistic 
analysis may be profitable. This paper presents the 
first results of our research on that subject: a Hop- 
field model for syntactical analysis. We construct a 
neural network as an implementation of a bounded 
push-down automaton, which can accept context-free 
languages with limited center-embedding. The net- 
work's behavior can be predicted a priori, so the pre- 
sented theory can be tested. The operation of the 
network as an implementation of the acceptor is prov- 
ably correct. Furthermore we found a solution to the 
problem of spurious states in Hopfield models: we 
use them as dynamically constructed representations 
of sets of states of the implemented acceptor. The 
so-called neural-network aceeptor we propose, is fast 
but large. 
1 Introduction 
Neural networks may be well suited for integrated lin- 
guistic analysis, as Waltz and Pollack \[10\] indicate. 
An integrated linguistic analysis is a parallel compo- 
sition of several analyses, such as syntactical, seman- 
tical, and pragmatic analysis. When integrated, these 
analyses constrain each other interactively, and may 
thus suppress a combinatoric explosion of sentence 
structure and meaning representations. 
This paper presents the first results of our research 
into the use of neural networks for integrated linguis- 
tic analysis: a Hopfield model for syntactical analysis. 
Syntactical analysis in the context of integration with 
other analyses boils down to the decision whether a 
sentence is an element of a language. A parse tree is 
superfluous here as an intermediary representation, 
since it will not be finished before the complete in- 
tegrated analysis is. This fact allows us to deal with 
the problem of a restricted length of the sentences a 
neural par .... n handle, see e.g. \[51,\[7\], a problem 
that could not be elegantly solved, see e.g. \[6\],\[3\]. 
In this paper we propose a formal model that rec- 
ognizes syntactically correct sentences (section 2), a 
tlopfield model onto which we want to map this for- 
mar model (section 3), the parameters that makes the 
network operate as intended (section 4), and a way 
to map the formal model onto the Hopfield model, 
including a correctness result for the latter (section 
5). The theoretically predicted behavior of the so~ 
obtained network has been verified, and a simple ex- 
ample provides the taste of it (section 6). We alas 
consider complexity aspects of the model (section 7). 
Section 8 consists of concluding remarks. 
2 A Bounded Push-Down Au- 
tomaton 
Although it is not an t.~tablished fact, it is as- 
stoned here that natural languages are context-free, 
and consequently that sentences in a natural lan- 
guage can be recognized, by a push-down atttoma- 
ton (PDA). ilowever, we are not interested in mod- 
eling the competence of natural language users, but 
in modeling their performance. The human perfor- 
mance in natural language use is also characterized 
by a very limited degree of center-embedding. In 
terms of PDAs this means that there is n bound on 
the,number of items on the stack of a PI)A for a 
natural language. A bounded push-down automaton 
M = (Q,Y~,I',6, qs, Zo, F) is a PDA that has an up- 
per limit k E ~ on the number of items on its stack, 
i.e. H ~< k for every instantaneous description (ID) 
(q, w, a) of M. The set of stack states of this PDA is 
delined to be: QST =- {a I (qo,w, Zs) P*~t (q,e,¢~)}. 
QsT is finite: IQsT\[ <_ (IFI) ~, therefore we may de- 
fine a nondeterministic finite~state aeceptor (NDA) 
M ~ that has QST ms its set of states. 
The class of PDAs of which we would like to map 
bounded versions onto NDAs is constrained, among 
others to the class of v-free PDAs. By this constraint 
we anticipate the situation that grammars are stored 
in a neural network by self-organization. In that sit- 
nation a neural network will store e-productions only 
if examples of applications of e-productions are re- 
peatedly presented to it. This requires e to have a 
representation in the machine, in which case it fails 
to accommodate its definition. 
Another restriction we would like to introduce is to 
grammars in 2-standard form with a minimal num- 
ber of quadratic productions: productions of the form 
ACrEs DE COL~G-92, NANTES, 23-28 AOOT 1992 I I 3 PROC. OF COLING-92, NA~rEs, AIJG. 23-28, 1992 
A ~ bCD where b is a terminal and C and D are 
variables. Such a grammar can be seen as a min- 
imal extension of a right-linear grammar. Within 
such grammars, quadratic productions provide for the 
center-embedding. Since such grammars have a min- 
imal number of quadratic productions, acceptance by 
a PDA defined for such grammars requires a minimal 
use of (stack) memory, and titus generates a mini- 
real Qs'r. To maintain this minimal use of memory a 
restriction to one-state PDAs that accept by empty 
stack is also required: when a PDA is mapped onto 
an NDA, the information concerning its states is h)st, 
unless it was stored on the stack. 
An e-free, one-state PDA that simulates a context- 
free grammar in 2-standard form with a minimal 
number of quadratic productions (and that accepts 
by empty stack) satisfies all our criteria. For every 
such PDA we can define an NDA, for which we can 
prove \[4\] that it accepts the same language ms the 
PDA does. 
Definition 2.1 
Let M = ({*},Z,I',6,*,Zo,{~) be an e-free bounded 
PDA with hound k. An NDA defined for M is an 
NDA M' = (Q', Y2', 6', Q'o, f") such that: 
Q' = QST, as defined above; 
E' = E; 
6' : QsT x E ~ 2 osT is defined by: 
~'(~l, ~)= {~ I (*,~,~)~M (,, w,~)} 
Q~ : {z0}; and 
F' = {el (empty stack). 
Theorem 2.2 (correctness of the NDA) 
Let M be an e.free one-state PDA with bound k, if 
M' is an NDA a.~ defined in definition 2.1, then M 
accepts a string by empty stack tf and only if M' ac- 
cepts it by accepting state. 
In as far as a natural language is context-free, we 
claim that there is an instance of our aeeeptor that 
recognizes it. 
3 An Input-Driven Sequencing 
Hopfield Model 
In this section a noiseless Hopfield model is proposed 
that is tailored to implement NDAs on. The model is 
based on the associative memory described by Buh- 
mann et al. \[2\] and the theory of delayed synap~s 
from \[8\]. We chose the Itopfield model because of 
its analytical transparency, and its capability of se- 
quence traversing, which agrees well with the sequen- 
tial nature of language use at a pbenomenologica\[ 
level. The Hopfield model proposed is a memory 
for temporal transitions extended with externa|-input 
synapses. Figure 1 shows the architecture involved. 
Ill this network only those neurons are active upon 
which a combined local field operates that transcends 
the threshold. The activity generated by such a lo~ 
cal field is the active overlap of the temporal image of 
past activity provided by so-called temporal synapses, 
and the image of input external activity provided by 
so-called input synapses. By the temporal synapses, 
this activity will later generate another (subthresh- 
old) temporal image, so network activity may be con- 
sidered a transition mechanism that brings the net- 
work from one temporal image to another. Active 
overlaps are unique with high probability if the ac- 
tivity patterns are chosen at random and represent 
low nrean network activity. This uniqueness makes 
tile selectivity of the network very plausible: if an 
external activity pattern is presented that does not 
match the current temporal image, then there will 
not he activity of any significance; tile input is not 
recognized. 
When an NDA is mapped onto this network, 
pairs of NDA-state q and input-symbol x, such that 
6(q, x) y£ {~, are mapped onto activity patterns. Tem- 
poral relations in the network then serve to imple- 
ment NDA transitions. Note that single NDA tran- 
sitions arc mapped onto single network transitions. 
This results in complex representations of the NDA 
states and the input symbols. An NDA state is rap 
resented by all activity patterns that represent a pair 
containing that state, and input patterns are rep- 
resented by a component-wise OR over all activity 
patterns containing that input symbol. A conse- 
quence is that mixed temporal images, the subthresh- 
old analogue of mixture states, are a very natural phe- 
nomenon in this network, because tile temporal image 
of an active overlap comprises at least all activity pat- 
terns representing a successor state. But this is not 
all. Also the network will act as if it implements the 
deterministic equivalent of the NDA, i.e. it will trace 
all paths through state space the input allows for, 
concurrently. The representations of the states of this 
deterministic finite-state automaton (FSA) are dy- 
namically constructed along the way; they are mixed 
temporal images. The concept of a "dynamically con- 
structed representation" is borrowed from Touretzky 
\[9\], who, by the way, argued that they could not exist 
in the current generation of neural networks, such ms 
ltopfield models. 
A time cycle of the network can be described as 
follows: 
1~ The network is allowed to evolve into a stable 
activity pattern that is the active overlap of a 
temporal image of past activity, and the input 
image of external input for a pe~'iod tr (= relax- 
ation time), when an external activity pattern is 
presented to the network; 
2. After some time the network reaches a state of 
stable activity and starts to construct a new tem- 
poral image. It is allowed to do this for a period 
t, (= active time)', 
3. Then the input is removed, and the network 
evolves towards inactivity. This takes again 
about a period t~; 
Acids DE COLING-92, NANTES, 23-28 AOt~T 1992 1 1 4 PRO(:. OF COL\]NG-92, NANTI~S, AUG. 23-28. 1992 
4. Not before a period ta (= delay time) has passed, 
a new input arrives. The new temporal image is 
forwarded by tile slow synapses during a period 
ta +iv, starting when td ends. The slow synapses 
have forgotten the old temporal image while the 
network was in its td. 
The synapses modeled in the network collect the 
incoming activity over a period ~ + tr, and emit the 
time average over again a period ta + tr after hav- 
ing waited a period ta + t~. In the network this 
is modeled by computing a time average over prior 
neuronal activity, and then multiply it by the synop- 
tic efficacy. Tile time average ranges over a period 
(2t~ + la + 31~)/N - (l, + I~)/N. The first argu- 
ment is the total time span in the network, covering 
two active periods and an intervening delay time, in- 
cluding their transition times. The second argument 
is tile current period of network is activity, activity 
that cannot directly interfere with the network's dy- 
na\[lliCS. 
More formally the network can be described as fol- 
lows: 
5'i (5 {0,1}, wherei= 1 .... ,N, 
l 1 ifh~(t)>U 
.')'i(t+l) : 0 if hi(t)< U, 
h~(t) : h~(t)+hl(t), 
N 
j=l 
the temporal transition term, 
t P 
u all - a)N jz~=l"" - a)(~¢~' - a), 
with J. = 0, 
~(t) ~ r O- 0 L ~ Sj(t- t')w(t')dt', where 
{ 7~- ifO<t<r 
w(t) = 0 otherwise , 
h~(t) = x~(s,'(t) - a), 
the external input term, 
The Si are neuronal variables (5",'. is a neuron in an- 
other network), hi is the total input on Si, U is 
a threshold value which is equal for all Si, Jij is 
the synaptic efficacy of the synapse connecting S i to 
Si, and A is the relative magnitude of the synapses. 
The average at time ~ is expressed by ~/(~), where 
r =- (2t. + ta + 3tr)/N and ~ -~ (ta + t~)/N. The 
function w(t) determines over which period activity 
is averaged. The input synapses are nonzero only in 
case i = j. These synapses carry a negative ground 
signal -A'a, which is equivalent to an extra threshold 
generated by the input synapses. The activity pat- 
terns {~'} ({~"} ~ (~\],~ ..... ~N) ) are statist,tally 
independent, and satisfy the same probability distri- 
bution as the patterns in the model of Buhmann et 
al. \[2\]: 
...... - (1- a)b(~), where 
l if x :~ 0 
~(x) ~ 0 otherwise. 
If a ¢ ½ the pattern is biased. For N -~ co, 
I/N ~N=I ~' --* a. The updating p .... is a Monte 
Carlo random walk. 
D II~ I 
Figure 1: 7'he model for N = 3. Usually HopJield 
models consist of very many neurons. The arced ar- 
rows denote temporal synapses. The straiyht alT'ows 
denote input synapses. 
4 Estimation of Parameters 
A number of system parameters need to be related in 
order to make the model work correctly. 
Timing; is fairly important ill this network. Tile 
time the network is active (to) should not exceed tile 
delay time t~. If it does then ta+lr > ta+tr, and since 
no average is computed over a period ta + tr back in 
time, not the fldl time average of the previous activity 
need to be computed, consequently we choose ta < ta. 
The choice for a transition time tr depends on tile 
probability with which one requires the network to 
get in the next stable activity state. This subject will 
be dealt with in section 7. 
In the estimates of A t and the storage capacity be- 
low, an expression of the temporal transition term 
in terms of the overlap parameter m o is used, which 
will be introduced here first. The overlap parame- 
ter rnP(t) measures the overlap of a network state 
{S} ~_ ($1,$2,...,SN) r at time t with stored pat- 
tern {~P}, and is delined by: 
N 1 
~(~' - .)s',(t). mo(t) = ~-N~=I 
m" E i-a, I - a\]. The expression for the temporal 
transition term is: 
N h~(t) 
= ~slj~j(1). 
j=l 
Acaqis OE COLING-92, NANTES, 23-28 ao~r 1992 1 1 5 PROC. Of COLING-92, N^NTgs, AUG. 23-28, 1992 
Assuming that N --* co while p remains fixed this is, 
after expansion of the Jq and ignoring infinitesimal 
terms, approximated by: 
At p N 
h:(t) - a(1-a)N Z(~+l-a) Z(~f-a);j(t) 
t~=l j=l 
t I° A' ~,-.tf¢+ , 
= (1 - a) ,=z'-~l'" - a)t~l"(t), where 
r - O ~o°° m"(t) ~ ~ m"(t- t')w(t')dt'. 
If the temporal image is {~¢~} then h~ is about (N 
co): 
h~(t) = ~,(~+l _a)(r_ O) w(t)dt. 0 
If a number of patterns in a mixture state have 
the same successor, that pattern may be activated. 
To prevent this A ~ will be chosen such that the slow 
synapses do not induce activity in the network au- 
tonomously, not even if all the neurons in the network 
are active. On average, the activity in the network 
is given by the parameter a. The total activity in a 
network is a quantity x such that z = 1/a, so what 
we require is that xh~ < U, i.e. that: 
a~(~+~ - .)(r - 0) w(t)dt < U. aO 
The interesting case is (i "+1 = 1. Since the integral 
is at most O/(r - O) which is the strongest condition 
on the left side, the left expression can be written 
as )d(1 - a)/a. It was earlier demanded that only 
a combined local field can transcend the threshold, 
which implies that external input .~e(1 - a) < U, so 
we can take A ~ < A~a safely. This is small because a 
is small. 
Next a value for the threshold that optimizes stor- 
age capacity is estimated by signal-to-noise ratio anal- 
ysis, following \[1\] and \[2\], for N,p --* oo. Temporal 
effects are neglected because they effect signal and 
noise equally. It is also assumed that external input 
is present, so that the critical noise effects can be 
studied. In this model the external input synapses 
do not add noise, they do not contain any informa- 
tion apart from the magnitude of the incoming signal. 
Now suppose the system is in state {S} = {~}. The 
signal is that part of the input that excites the current 
pattern: 
s = A'(~, ~-a). 
The noise is the part of the input that excites other 
patterns. It can be seen as a random variable with 
zero mean and it is estimated by its variance: ~t v'~-a 
where a = p/N. We want that given the right input 
h i > U, if both the temporal and the external input 
excite Si, and that hi < U if the temporal input does 
not excite Si. This gives signal-to-noise ratios: 
Pt = (,V+At)(1-a)-U, and 
U + ,Va - ,~'(1 - a) PO = 
,v vCg~ 
Recall is optimal in case Po = Pl which is true for a 
threshold: 
Uor,, = At(1-a)+,V(½-a). 
Substituted in either P0 or Pt it results in Pore = 
l This result is the same as obtained by Buh- 
mann et al. \[2\], and they found a storage capacity 
ct c .~ -(ainu) -1 where ac = prna,:/N. The stor- 
age capacity is large for small a, so a will be chosen 
a << 0.5. A last remark concerns an initial input for 
the network. In case there has been no input for the 
network for some time, it does not contain a temporal 
image anymore, and consequently has to be restarted. 
This can be done by preceding a sequence by an extra 
strong first input, a kind of warning signal of magni- 
tude e.g. A" + At. This input creates a memory of 
previous activity and serves as a starting point for 
the temporal sequences stored in the network. 
5 Neural-Network Acceptors 
In this section it is shown how NDAs from definition 
2.1 can be mapped onto networks as described in sec- 
tions 3 and 4. Such networks can only be used for 
cyclic recognition runs. Where "cyclic" indicates that 
both the initial and the accepting state of an NDA 
are mapped onto the same network state. If this were 
not done, the accepting state is not assigned an ac- 
tivity vector, since no transition departs from it, see 
definition 5.2 below. Cyclic recognition in its turn 
can only be correctly done for grammars that gen- 
erate end-of-sentence markers. Any grammar can be 
extended to do this. 
An NDA is related to a network by a parameter 
list. 
Definition 5.1 
Let M = (Q, ,6, Qo, F) be an NDA. A parame- 
ter list defined .for an NDA M is a list of the form 
(a, A ~, A t , N, p, prnaz, ta, td, tr, U), where: 
1. a 6 \[0,1\] c ~t; 
2. t~ < td, with ta,td 6 ~I; 
3. A ~ 6 ~+ where ~+ = {x }x E  Ax > 0}; 
4. 0< A t <,Va; 
5. p = ~q.q'eQ I {x 6 ~Clq' e ~f(q,x)} I x 
I {Y 6 ~, 16(q',y) # 0} I; 
6. Pmax >~ P; 
7. N > (-alna)p ..... ; 
Ac-t~ DE COLING-92, NANTES, 23-28 AO6"r 1992 1 1 6 PROC. OF COLING-92, NANTES, AUG. 23-28. 1992 
8. G: see section 7; 
9. U=M(1-a)+M(½-a). 
Note that there arc an infinite number of parameter 
lists for each NDA. 
The mapping of an NDA onto a network starts by 
mapping basic entities of the NDA onto activity pat- 
terns. Such a pattern is called a code. 
Definition 5.2 
Let M = (Q,E,ti, Qo, F) he an NDA, let t" : 
(a,A e,M, N,p, pma,,ta,ta,tr,U) be a parameter list 
defined according to definition 5.1. The coding \[unc- 
tion c is given by: 
c : QxE~{0,1} N, 
such that for q E Q, and x E :E: 
1. if 6(q, x) :/: 0: 
c(q,x) = {~}, 
where ~i is chosen at random from {0, 1} with 
probability distribution 
r'(~) = a6(~ - I) + (t - a)~(~,), a.d 
2. undefined otherwise. 
The set of codes is then partitioned: into sets of ac- 
tivity patterns corresponding to NDA states, and into 
sets of patterns corresponding to input symbols. 
Definition 5.3 
Let M = (Q,E,6, Qs, I") Ire an NDA, let P = 
(a, M, A t, N, p, PmaJ:, In, td, tr, U) be a parameter list 
defined according to definition 5.1. 
The set Pq of activity patteT~s for q E Q is: 
G = {c(%z) l x e ~}. 
The set P:: of activity patterns for" x E E is: 
P, = {c(q,x) I q E Q}. 
Then a network transition is defined ms a matrix op- 
erator specified by the network's storage prescription, 
and related to NDA transitions using the previously 
defined partition of the set of codes. 
Definition 5.4 
Let M = (Q,~,6, Qo, F) bc an NDA, let P = 
(a, M, A t , N, p, Pma*, ta, t d, G, U) be a parameter list 
defned according to definition 5.1. and let a be an 
N-dimensional vector, with each component a. The 
set Tr of network transitions is: 
At t 
Tr = {d(¥,q,x) \[ d(q',q,x) = a(1 - a)N 
E (c(q',y) - a)(c(q,x) -- a) T A 
c(q',y)eP u, 
qt E ~(q, x)}, 
where each jt is an N x N matrix. 
This suffices to define a neuralonetwork acceptor. 
Definition 5.5 
Let M = (Q,E,6, Qo, I") be an NDA, let P = 
(a, A*, A*, N, p, Pmax, to, td, tr, U) be a parameter list 
defined according to definition 5.1. A neural-network 
acceptor (NNA) defined for an NDA M that takes its 
parameters from P is a quadruple H = (T, f, U, S), 
where: 
1. tim topology T, a list of neurons per layer, is: 
(N); 
2. the activation fimction f is given by: Si = 
1 if E~J/j.~j+A'(S~-a)>U 
(} if EjJej~J+A'('d~-a)<-U' 
3. the update procedure is a Monte Carlo random 
walk. 
4. the synaptic coefficients are given by: 
'It =: E jl (q,.q.~) C Tr, and 
je .: A~I where I is the identity matrix. 
In order to construct activity patterns that can serve 
xs external input x for the network a component-wise 
OR operation is performed over the set P~ as defined 
in definition 5.3. 
Definition 5.6 
7he OR operation over a set t½ of activity patterns 
is specified by: 
1. oa({~v}, {U}) = {OR(~/',~)} ; 
~. oit({¢},..., W'}) :~ 
oft.(K'}, ()lt({~q ..... {~" })) ; ~ud 
a OR(P~) = OR({~ ~} ..... {C'}) if 
• e~ = {W} ..... {~"}}, 
At last a formal definition can be given of a temporal 
image a.s the set of all activity patterns for which 
there is a network input that makes such an activity 
pattern the network's next qua.st-stable state. 
Definition 5.7 
Let M = (Q,E,6, Qc, F) I)e an NDA, let P = 
(a, M, M, N, p, p,.,,, t,, ta, it, U) be a parameter list 
defined according to definition 5.1, and let H = 
(T, f, U,S) be an NNA defined according to defini- 
tion 5.5 that takes its parameters from P. A temporal 
image is a set: 
{c(q, x)\[ input OIt(P~) for H implies 
m ¢(q'~) = 1 -- a}, 
a set Pq, is a temporal image of a quasi-stable stale 
{S} = c(q, x) of H if and only if J~q,,q,,:) is a transition 
of H. 
Now that we have a neural-uetwork acceptor, we triay 
also Wahl to use it to judge the legality of strings 
against a given grammar with it. 
ACTE's DE COLlNG-92, NANTES, 23-28 AO~I' 1992 1 1 7 I'ROC. OV COLING-92, NANTES, AUCl. 23-28, 1992 
Definition 5.8 
Let M = (Q,E,6, Qo, F) be an NDA, let P -- 
(a, M, A t, N, p, Pm~x, is, Gt, G, U) be a parameter list 
defined according to definition 5.1, let H = 
(T, f, U, S) be an NNA defined according to defini- 
tion 5.5 that takes its parameters from P, and let 
ai E E, q E Q0, and q' E F. H is said to accept a 
string w = al " " " an if and only if H evolves through 
a series of temporal images that ends at Pq, if started 
at Pq, if OR(P~),..., OR(Pa,) appears as external 
input for the network. 
Next the correctness of an NNA is to be proven. 
Since an NNA is essentially a stochastic machine, this 
raises some extra problems. What we propose is to 
let various network parameters approach values for 
which the NNA has only exact properties, and prove 
that the uetwork is correct in the limit. Those "vari- 
ous network parameters" are defined below. 
Definition 5.9 
Let M : (Q,E,6, Qo, F) he an NDA, let P = 
(a, M, M, N, p, P,na,,, ta, t,t~ t,., U) be a parameter list 
defined according to definition 5.1. A list of large 
parameters is a parameter list P such that: 
1. a =_ cl/N where cl << N is a constant; 
2. ~ ~ c~N/cl, where c2 is a small constant; 
3. 0 < M < A~a,i.e. 0< M < c2; 
4. Pm,~" -- -{alna)-l N; 
5. N ~oo; 
6. t~lN ~oo. 
The following lemma states that for neural-network 
acceptors that take their parameters from a list of 
large parameters both the probability that the net- 
work reaches the next stable state within relaxation 
time, and the probability that only the patterns that 
are temporally related to the previous activity pat- 
tern will become active, tend to unity. Essentially 
it means that such networks operate exactly as pre- 
scribed by the synapses. Such networks are intrinsi- 
cally correct. 
Lemma 5.10 intrinsical correctness 
Let M = (Q,~2,8, Qo, F) be an NDA, let P = 
(a, A', ~t, N, p, pm,::, ta, ta, tr, U) be a parameter list 
defined according to definitions 5.1 and 5.9, and let 
H = (T, f, U, S) be an NNA defined according to def- 
inition 5.5 that takes its parameters from P, then H 
is such that: 
1. for all neurons Si in H, P(SI is selected) ~ 1 
during network evolution; and 
2. /or all actwity patterns {c} ~ Upu, y ~ QuZ, 
if l~ 7 £ ", then P(~,~ = ~ = 1) ~ O, where 
i=l,...,N. 
Then the correctness of an NNA follows. 
Theorem 5.11 (correctness of the NNA) 
Let M = (Q,Z,6, Q0, F) be an NDA, let P = 
(a, A', A t, N, p, p,,o~, t,,, ta, G, U) be a parameter list 
defined according to definitions 5.1 and 5.9, let H = 
(71, f, U, S) be an NNA defined according to definition 
5.5 that takes its parameters from P, and let w E E +, 
then the probability that M accepts a string w if and 
only if H accepts w, tends to unity. 
The proof of the theorem is given in \[4\]. 
6 Simulation Results 
As an'example we constructed au NNA that accepts 
the language generated by a grammar with produc- 
tions: 
S' ~ theBE, 
B ~ naanSV \[ womanSV \[ babySV \[ 
mauV J womanV I bahyV, 
S ~ theB, 
E~t, 
V ~ saw I cried I comforted. 
It takes its parameters from the llst: 
(0.05,1.5,0.07,800,64,6.68N,5,5,5,1.46). It was tested 
with tile sentence ". tile baby the woman comforted 
cried ." The preceding full stop is a first input that 
awakens tile network. The graph below shows the 
time evolution of the network. 
c(BS,wmm) 
cS~VR.~by) 
©Cave) 
c(SVVE,~) 
c~VVE,~) 
c(V V V r~co~roma) 
c(VVVR~d) 
~(VVV~w) 
,~v Vl~xaD/~ 
~vvE,~w) 
c(S'.~) 
0 i 2 ~ d ~ 6 7 
INPUTS 
Plot of the system dynamics. 
7 Complexity Aspects 
If a neural-network acceptor h~.s to process a sequence 
of n input patterns, it (worst ease) first has to con- 
struct its initial temporal image, when awakened by 
an initial input (that is not considered a part of the 
sequence), and then has to build n further tempo- 
ral images. The time required to process a string of 
AC'rE.S DE COLING-92, NaNTEs, 23-28 Aor~r 1992 1 I 8 PgOC. OF COLING-92, NAWrES. AUG. 23-28, 1992 
length u as a function of the length of the input se- 
quence is thus (T -- O)(n + 1). The constant r al~ 
depends oti t,. which is chosen to let the network sat- 
isfy a certain probability that it reaches the next state 
in relaxation. This probability is given by (1 - .~/)o 
where B = tr/N. The time complexity of the neural- 
network acceptor is O(n). 
The upper limit on the number p of stored tem- 
poral relations between single activity patterns is 
\[ Q 1:~ x I ~3 \[2. The number of neurons in a net- 
work is then cx { Q \]e × \] E 17, where e depends on 
the storage capacity and the chosen (low) probabil- 
ity that selection errors occur. The randomly chosen 
activity patterns overlap, so if a large number of pat- 
terns is active they may constitute, by overlap, other 
unselected activity patterns that will create tlreir own 
causal consequences. This is called a selection er= 
rot. The probability that this can happen can be 
estinrated by l'~,.,,,~(n) -~. 1 - 1'(,'¢, = 0), where the 
latter is: 
(-l - 2np ' 1.-2,,i)~ 
In this expressi .... P ---- ( ..... ,)/v wh .... -_- (~ ), ,: is 
the nmnber of activity patterns stored in the network, 
and m is the number of patterns that were supposed 
to be present in the mixture state. The probability 
q = 1 - p, and ,1 :_~ (aN) is the number of patterns 
that can bc constructed fi'om the aetiw~ neurons in 
the mix. S,, is the mnnber of wrongly selected activ- 
ity patterns for a given n. l),rror(n) decreases with 
increa~ing N if the other parameters remain tixed. 
The space complexity of the network, exprc&sed as 
the nnmber of neurons, and as a function of the num- 
ber of NI)A states is O(\[ Q 17). This is large because 
Q = Qs'r <1 F I ~ for some PDA M. tlowever things 
conld have been worse. Not using mixed temporal 
images to represent FSA states would necessitate the 
use of a mnnber of temporal images of order 2 IQ'I~, 
So compared to a more conventional use of lloptield 
models, this approach yields a redaction of the space 
complexity of the network. 
8 Conclusions 
We proposed an receptor for all context-free lan- 
guages with limited center-embedding, and a suitable 
variant of the Ilopfield mode\]. The formal model was 
implemented on the lloptield model, and a correct= 
uess theorem for the latter was given. Simulation re- 
stilts provided initial corroboration ofonr theory. The 
obtamed neural-network receptor is fast but large. 
Continuation of this research in the near fnture 
consists of the design of an adaptive variant of this 
mode\[, one that learns a grammar from examples in 
an unsupervised fashion. 
Acknowledgements 
1 am very grateful for the indispensable support of 
Mannes Poel and Anton Nijholt in making this paper. 
Special thanks go out to Bert llelthuis who wrote 
tile very beautiful simulation program with which the 
theory was corroborated. 
ACTES DECOL1NG-92, NANri!s, 23-28 Aofn" 1992 1 1 9 I>ROC. OF COLING-92, NANIT~S, AUG. 23-28, 1992 

References 

\[1} 1).J. Amit. Modcltag Brain FunctiOn. Cambridge 
University Press, (~ambridge, 1989. 

\[2\] J. iluhmann, R. Divko, and K. Schulten. Asso- 
ciative memory with high information content. 
Physical Review A, 39(5):2689-2692, 1989. 

\[3\] E. Charniak and E. Santos. A connection|st 
context-free parser which is not context-free, but 
then it is not really connection|st either. In Pro- 
eeedinys 91h Ann. Couf. o\]' the Cognitive Science 
Society, pages 70 77, 1987. 

\[4\] M.F.J. l)rossaers. Neural-network aceeptors. 
"l~chuical Report Memoranda lnformatica 92-36, 
University of Twente, Enschede, 1992. 

\[5\] M.A. Fanty. Context=free parsing in connection- 
|st networks. Technical Report TR 174, Univer- 
sity of Rocbester, Rochester, NY, 1985. 

\[6\] 11. Nakagawa and T. Mort. A parser based on 
connection|st ntodel. In Proceedings oJ: COLING 
'88, pages 454--458, 1988. 

\[7\] B. Sehnan and G. llirst. Parsing as an energy 
minimization problem, in E. Davis, editor, Ge- 
netic Algorithms and Simulated Annealing, Re- 
search Notes in Artzficial lntelligence, pages 141- 
154, Los Altos, Calif., 1987. Morgan Kaufinan 
Publishers. 

\[8\] II. Sompoiinsky and 1. Kanter. Temporal a.s.so- 
elation in asymmetric neural networks. Physical 
Review Letters, 57(22):2861-2864, 1986. 

\[9\] I).S. Touretzky. Connection|sin and composi- 
tional semantics. Technical Report CMU-CS- 
89-147, Carnegie Mellon University, Pittsburgh, 
1989. 

\[1(\]\] I).L. Waltz and J.B. Pollack. Massively parallel 
parsing: A strongly interactive model of natu- 
ral language interpretation. Cognitive Science, 
pages 51- 74, 1985. 
