Analysis of Scene Identification Ability of Associative Memory with 
Pictorial Dictionary 
Tatsuhiko TSUNODA *, I-Iidehiko TANAKA 
Tanaka Hidehiko Laboratory, Departlnent of Electrical Engineering 
Faculty of Engineering, University of Tokyo, 7-3-1 Hongo, BunkyoqCu, Tokyo 113, Japan 
{tsunoda,t an aka} Qmtl. t. u-tokyo, ac.j I) 
Abstract 
Semantic disambiguation depends on a process of 
defining the appropriate knowledge context. Recent 
research directions suggest a eonnectionist approach 
which use dictionaries, but there remain problems of 
scale, analysis, and interpretation. IIere we focus on 
word disambiguation as scene selection, based on the 
Oxford Pictorial English Dictionary. We present a re- 
sults of a spatial-scene identification ability using our 
original associative mcmor~j, We show both theoretical 
and experimental analysis, based on a several different 
measures including information entropy. 
1 Introduction 
'the difficulty of semantic disambiguation in natural 
language processing originates with the complexity of 
defining disambiguating knowledge contexts (Barwise 
J. and Perry J., 1983). These knowledge contexts 
must provide unique interpretations for co-dependent 
words, and help resolve "semantic garden path" se- 
quences. For example, in "John shot some bucks,"a 
unique reading requires semantic agreement on "shot" 
and "bucks," suggesting either a hunting or gambling 
context. The semantic garden path can be illustrated 
by prefixing the above sentence with "John travelled to 
the woods," which might suggest the hunting context, 
but then appending "The illegal csmino was hidden far 
from town," to dramatically change the interpretation 
suggested by the first two sentences. 
The core of the problem is the disciplined and dy- 
namic construction of a disambiguating kvowledge 
context. While it might be possible to write static 
rules which provide disambiguating information in the 
context of complete knowledge, such rulmhased mod- 
els are both time and space inefficient. 
Recognizing these problems, Waltz D.L. and Pollack 
J.B.(1985) and Cottrell G.W.(1989) proposed a f,~sci- 
hating connectionist approach, which uses early ideas 
from semantic networks to resolve semantic ambiguity 
*Supported by the Fellowships of the Japan Society for the 
Promotion of Science for Japanese Junior Scientists 
by dynamic spreading activation. This spreading acti- 
wttion construction of disambiguating context is based 
oil a high density associative cognitive model, but 
still has problems: (1) no automated learning method 
to adaptively construct the model, (2) non-scalable, 
and (3) no method of confirming hypothesized dis- 
ambiguation. Shastri L.(1988) proposes a similar 
structure, which uses st statistical semantic network. 
Sharkey N.E. (1989) has proposed a system for process- 
ing script-based narratives based on combining local 
representation and relaxation techniques with ImrMlel 
distributed learning and mapping mechanisms. Mi- 
ikkulainen's system DISCERN(Miikkulainen R., 1993) 
is also suggestive of adaptive processing, and uses self- 
organizing representation of words and memory de- 
pending on semantics. However, all of these models 
share the problems enumerated above. 
Research directions for improvements suggest the 
use of existing collections of machine-readM~le dictio- 
naries. Ilecently, Nishlklmi M. et al. (1992) has pro- 
posed a new relationship between language acquistion 
and learning based on scene analaysis. Furthermore, 
Bookman L.A.(1993) has proposed a scalable architec- 
ture for integrating ~tqsociative and semantic memory 
using a thesaurus. Based on this idea of using existing 
sources of word meanings, Veronis and Ide (Veronis .1. 
and Ide N.M., 1990; Ide N.M. and Veronis J., 1993) use 
sew~ral dictionaries and to improve the ratio of words 
disambiguated to ambiguous words. 
In addition to ideas for the source of disambiguat- 
ing knowledge, many researchers have incorporated 
some kind of preference heuristics for improving tl,e 
efficiency of determining disambiguating constraints. 
Although these methods are essential for semantic pro- 
cessing they lack any coherent method for (1) evaluat- 
ing performance, and (2) acquiring new disaml)iguat- 
ing knowledge from real-world sensors. 
Of course all of these l)roblems result from the com- 
plexity of defining appropriate disambiguating knowl- 
edge contexts. To help control and reduce this com- 
plexity, Kohonen T.(1984) has suggested the cla.ssifica- 
tion of dlsambiguating information into flmr types: (1) 
spatial contact, (2) tenqmral contact, (3) similarity, 
(4) contrast. Kohonen also emphmsizes the existence 
310 
of a contextual background in which primary percep- 
tions occur, but we clMm that this kind of information 
<:an be expressed in the existing four types. 
The previous approaches noted above can all be 
interpreted as using a complex mixture of the infor- 
mation types proposed by Kohonen. This coml>lex- 
ity makes it very difficult to identify or create a sta- 
ble mo<lel of learning the appropriate <lisan,biguating 
knowledge from the real world. 
Our original contribution here is to propose a lmsie 
method of word disambiguation b~med on spatial scene 
identification, and to provide a detaile<l analysis of its 
performance. The disambiguating knowledge is repre- 
sented in the form of a stochastic ~msociative memory, 
constructed fi-om the ()xford Pictorial English Dicti<>- 
nary (OPED). This l>ietorial dictionary claims to l>ro: 
vide word sense meanings for most ordinary lift.' scenes. 
The process of disambiguation is modelled as <leter- 
mining a unique mapping fi'om ambiguous input wor<ls 
to a particular l>ietorial <lictionary scene as modelle<l in 
the ~msociative menmry. The simple representatiml of 
pietorial knowledge. I)~med (m the OPED makes analy- 
sis simpler, and provides a potentially smooth (:onnee- 
tion to visual sensory data. 
2 Scene Identification 
In order to identify spatial scenes lmsed on inl)ut sen- 
tenees, some kind of information <>f detining each seell(~ 
must exist. As exph'dned in the OPEl), "The dictio- 
nary is edited regarding the depiction of (weryday ob- 
jects and situations, in order to allow greater scope 
for the treatment of these, objects and situatiovs in 
the context of English-speaking countries" \[from l;'of 
ward in OPED\]. Each scene or pictorial entry i~, the 
OPED accompanied by a word list of entries f,'om the 
scene (see next section). This bu,ldle of infi)rmation is 
the basis for organizing our associate memory model. 
2.1 Constraints 
Here we ~msume some constraints on the method of 
representing and using the OPED scenes: 
• Only ordinal livivg scenes (384 scenes in(:lu(ling 
thousands of subseenes) are handled. All scenes 
are hypothesized to be eonstructable by combina- 
tions of these scenes. 
® Most of the words in OPEl) are noun terms ae- 
eoml)anied by adjective terms. In this system, 
spatial-seenes are identified by using only these 
words. No syntactical information is used. 
• Compound words are dec<)mposed into primitiw'. 
words. 
• The associative memory luus the ability to incre- 
mentally learn, but our analysis here uses a tixed 
set of scenes and words. 
.................................. ,Saqu~tlal mymbol Direct Logical 1 \[ r ...... t .... uo 
I \[/0 I 12 I~tmc con ado on 
............ ","~ " " ~____ \[3, Changa fo~:ut "?Z~-£oglcal 
processldgT 
- -UK\]= 1 --°:°r 
Figure l: PI)AI&CD architecture 
Ambiguous Dlsamblguated 
Figure 2: ,qtrueture of OPED an{1 diagram of 
PDAI&CD 
* Morphoh>gical analysis is done by using the elec- 
tronie dictionary of Japatl Electronic Dictionary 
Resear<:h institute (EDR). 
2.2 PDAI&CD and WAVE 
The spatial scene identification system analyzed in this 
paper is one moduh' of a general infi'rence architec- 
ture called l'aralM l)istributed Associatiw." Inference 
and Contradiction /)etection (PDAI&CD)(Tsunoda 
'\['. and 'Fanak;t l\[., 1993), which uses an :msociatiw~. 
memory WAVE('\['sunoda T. an(\[ Tanaka H.) lmsed on 
neural networks and a logical veritieation system. We 
haw~ previously presented itll application of that archi- 
tecture to semantic ¢lisambiguation (Tsunoda T. and 
Tanalat II., 1993). It features a eognitive model of fast 
disambiguation depending on context with bottom-up 
associatiw:, memory together with a nmre precise top- 
(lown feedba(:k process (Fig.l). After one scene is se- 
lected by previously inlmt words, the system can dis- 
ambiguate meaning of following words (as in the right 
side of Fig.2). In the. future, we plan to combine natu- 
ral language proce.ssing with visual image from sensory 
data. Our representation of the spatial data fi'om the 
OPED is considered to be a simplest approximation of 
such visual sensory images. 
311 
Table 1: Examples of semantic disambiguation 
Ex. 
1 
2 
Ambiguous Sentence # Classilied Meaning 
word (Context) scene of word 
ball Billiards 
lead 
(a) 
(a) 
(b) 
Carniwd 
Kitchen 
Atom I 
globe 
dance 
cord 
metal 
2.3 Semantic Disambiguation 
Words in OPED have ditferent meanings correspond- 
ing to their use in ditferent scenes. When a set of am- 
biguous words uniquely determines a scene, we con- 
clude that the words have been successfully disam- 
biguated. We acknowledge that many other processes 
may be involved in general word sense disambiguation, 
but use this scene-selection sense of word sense (lisain- 
biguation from here on. 
We illustrate typical two examples below. The sys- 
tem with OPED and our associative memory can re(:- 
ognize these sentences and classify into each scene in 
the dictionary. Once a scene is identified, it assigns 
each ambiguous words uniquely. We call it semantical 
disambiguation of words here. The correspondances of 
the sentences and each meaning of word is summarized 
in Table.1. 
1. ball 
(a) 
(b) 
Tom shot a white cue ball with a cue. The 
ball hit a red object ball and he thought it's 
lucky if it will ... 
Judy found that she was in a strange 
world. Devils,dominos,pierrots,exotie girls, 
pirates,.•, where am I? 'Oh!', she said to her- 
self, a.s she found she wandered into a ball, 
2. lead : 
(a) It's not sufficient to shield only by the lm- 
thick concrete• The fission experiment re- 
quires additional 10cm-thick blocks of lea<l. 
Fission fragments released by the chain reac- 
tion of... 
(b) He said to his son, "Please pull out the plug 
of the coffee grinder from the wall socket. Be 
careful not to pull by the lea<l. Ituum...here 
I found the kettle."... 
Our system is able to disambiguate each meaning in 
these examples actually. 
3 Representation and Process- 
ing Theory 
~ ::: ......... :...x, ...... ::~ i ~: :~,g ! ::~+ ~zi~;~:~;: iL.'~ i: : ,,',~ 
if:t .......... :~ .:2:?:'~ ">.':"5-% 11711 words, 384 scenes 
wall 0,01\ 
units o.o04~N, 
side 0.008~----~ 
wall O.Ol--~all' ~',:,~ ;~ I 
bookself 07251// 
row 0.7///...: ~--~l • :...' 
: • i. : 
Figure 3: laving room scene and link example on the 
associative memoryWAVE 
Figure 4: Weight of links and category selection 
3.1 Representation of OPED 
The Oxford Pictorial English Dictionary(OPED) h,~s 
very simple form of text and picture (Fig.3). In this 
example, the upper part is a picture of a living room 
scene, and the lower part consists of words of corre- 
sponding parts as follows: 
i wall units 
2 side wall 
3 bookself 
OPP;I) has originally a hierachlcal structure of catego 
rization (as in the left side of Fig.2), but we use the 
middle level of it (shaded part in the figure), which is 
most easily interl)retal~h!. 
To llrovide the associative memory model for l)ro - 
cessing words and selecting scenes, we, encode the 
OPED entries in tile WAVE model ms depicted in 
Fig.3. The weights between scene elements are au- 
tomatically learned during tile constructiou of the as- 
sociative memory. 
3.2 Simplified Model of Associative 
Memory WAVE 
The aim of using m~sociative memory for identifica- 
tion is to select tile most likely scene based on incom- 
plete word data from sentences. Ii and Ci are set to 
be elements of input space SI, scene space So:, respec- 
tively, in an ideal state, the approl)riate scene Ci is 
312 
mfiquely indexed by z~ssociation from a complete input 
vector: Ii A Ci. 
In the typical situation, however, the complete index 
is not provided and we require a way of ranking cam- 
peting scenes by defining a weighted activation value 
which depends on the i)artial inlmt, or set of ambigu- 
ous words, as follows: 
Ci = f(EWiflJ) (1) 
J 
1 
f(x) - (2) 1 + e-~' 
(a) 
where the weight of each compone.nt is given bythe 
conditional probability value 
W~j - P(Cil6 ) (4) 
A maximum-likelihoad scene is selected by a winner- 
take-all network: 
c. = .,filed (5) 
This type of assaeiative meinory has following fea- 
tttres: 
• Unlike correlative models (Amari S. and Maginu 
K., 1988), neither distortion of pattern nor pseudo 
local minimum solutions arise from memorizing 
other patterns. 
• Memory capacity is O(mn) compared to O(n "2) 
of correlative Inodel, where m is average immber 
of wards per scene, and n is the total number af 
possible words. 
• Unlike back-propagation learning algorithms, in- 
cremental learning is l)ossilflc at any time in 
WAVE. 
3.3 Recalling prol)ability and estima- 
tion of required quantity of infof 
mation 
Tile me`asure of scene selectivity is reduced to tile con- 
dition whether given words are unique to the SCelle. If 
all input words are cOlnlnon to l)lura\] scenes, they can 
not determine the original scene uniquely. For exam- 
pie, tile system can not determine whether to choose 
category CA ar CB only by seeing element q}' in Fig.4. 
If 'a' or tile set {a, b} is given, it is able ta select CA. 
Here we estimate the selectivity by the ratio of suc- 
cessfld cases to all of possible cases ,as follaws(n is the 
mlml}er of total elements, k is the number of elements 
related to each scene, aim m is the total number of 
scenes; incomplete information is dellned as a partial 
vector of elements number s (0 < s < k)). 
Tile pral)ability that s elements are shared si,nulta- 
neously by two patterns is 
kCs-t n-kCk.-.s-1 v(,,, k, ~) = (~) 
n Ck 
Ta extend this probal)ility to generalized cases of 
m patterns, we use the munber s of elements of the 
(1)artial) input vector. It can be estimated by counting 
the negative ease where illore thall one pattern shares 
elelllents. 
1'(.,~, k, ,~, ,)0 (r) 
= (~v(,,,<,.)) ..... '-r(,~,k,~-~,,,0 (s) 
m - 2 
= (v, - p~)(~ 7,~I,: "'-~-~) (9) 
q~0 
m--2 
= vo,~ ) (m) 
q=:0 
v,= v(n, <,.), 7,~= v(,, k,,.) 
r::l r=l 
The results using this formula are shawn hi the next 
section. 
3.4 Infornmtion Entropy 
As an alternative method of ewduation of spatial- 
see.he information of aPED, we consider here self- 
information entropy and mntual-informatian entropy 
along with the information theory of Shannon 
C.E.(19,t8). 
* Self-lnformation entropy: 
Fig.5 illustrates a talking scene. Although 
sentences involving many ambiguous wards are 
handed fr<>m the speaker to the listener, the lis- 
tener can disambiguate them with some kind of 
knowkedge common to these people. Conversely, 
the listner can determine scene 1)y the hande<l sen- 
tences. The entropy of scene selection ainbiguity 
is reduced by the interaction. We can define a con- 
cept of self-infarmation (SI) af the spatial-scene 
idetification module as the entropy of ainbiguous 
words or scenes. Assuming equal probalfility to 
the scene selection with no harmed ward, the en- 
tropy of the spatial-scene identitication can be cal- 
cualted. 
S lo -- - E I)( C J ) l"g2 I)( C J ) : log:, 38,1 = 8.59bits 
J 
After the identiticatian, the meaning of eact, word 
can be selected according to each a selection dis- 
tril)ution flmctian updated by the Bayesian rule. 
S.\[1 = CE(C I X ) (11) 
= < -~rj~l,,~l'j~ > (12) 
ji 
r'ji = r(Cj I "i) = P(~'i I @) (13) 
Each P,j is equal to Wij as in Eq.(2). <> repre- 
sents ensemble average over each xl. 
31,3 
sentences 
Listener I 
__L_._ 
Spatial Scene 
common knowledge 
Figure 5: Common knowledge between speaker and lis- 
tener to disambiguate semantics of handed sentences. 
Table 2: Mutual-information of OPEl) 
Scene entropy Mutual-inform. 
Without input 8.59 bits 
1 word input 0.80 bits 7.79 bits 
2 words inl)ut 0.32 bits 0.48 bits 
Mutual-lnformation entropy: 
Mutual-information entropy (MIE) can lye defined 
as the contribution of additional words to identify 
a scene, and consequently, tile selectiveness of the 
target word or scene. In order to select a word 
meauing or scene fi'om the possible space Y, the 
space C of M1 other words are considered in the 
calculation of conditional entropy (CE). Mutual- 
information entrot>y per word is calculated by fol- 
lowing formula: 
MIE(O;O') = CU(C l O ) -CE(CIO' ) 
Here, 0 is a set of previous state parameters, and 
0 ~ is that of next one. Mutual-inforamtion can lye 
interpreted ,as the reduction from a previous con- 
ditional entropy to corresponding updated con- 
ditional entrolyy with additional words. We l)ro - 
vide a theoretical estimation of sclf-informatio,l 
of spatial-scenes with the dictionary in Table 2. 
Tile result suggests that it has the spa.tial-scene 
identification ability with a few words 1)rese,'va - 
tion. It also supl)orts the consequence of a h)gical- 
summation algorithm shown in next section. 
4 Analyses of identification 
module 
Here we propose analyses of OPED and results of theo- 
retical simulations. As formula (9) is expensive(11711! 
times), we use a Monte-Carlo simulation to abstract its 
characteristics. Iteration thne in each case is 1,000. 
* Fig.6 (a) shows a distribution of number of ele- 
ments involved in each scene in OPED. It approx- 
imated a Gaussian distribution and has a average 
# Elemems i m ..... tog(el .... IS? nes per el°merit \] 
",o, ...... :2 o, 
Figure 6: (a) Distribution of number of elements per 
scene and (b) Distribution of number of scenes per 
elements 
wdue of 184.2. This value is used ill the theoreti- 
cal simulations. 
• Fig.6 (b) shows a distribution of number of scenes 
which are related to one element. The region 
where more than 100 scenes are related to one 
word are those for trivial words like 'a', 'the', 'of', 
'that', 'to', 'in', ~and', ~for', 'with', 's'. Although 
we could ignore these words for an actual appli- 
cation, we use them for fairness. 
• Selection probability in the case that partial 
words of scenes arc input to the mssoeiative men> 
cry is illustrated in Fig.7. The recall rate in- 
cre`ases `as the input vector (set of words) becmnes 
more similar to c:omplete vector (set of words) pat- 
tern. Only about tlve words are enough to iden- 
tify each scene at recognition rate of 90 percent. 
Compared to the average, number of 184 words 
ill each scene, this required mlmber is sufficiently 
small. It proves good performance of the `associa- 
tive memory used in this module. 'l~heoretical re- 
suits of a random distribution model is also shown 
in Fig.7. The cause of the discrepancy between 
the experiment and theoryis describe<l latter. The 
dotted line 'EXACT' ill the tlgure is a result ilS- 
ing logical-smnmation. "File crossing point <>f the 
'OPED' line and the 'IgXACT' line. is remarkable. 
Tile former has the adwmtage of expecting with 
relatively high-probMfility (likelihood) using in- 
put words of small number. Though with more 
additional words, the algorithm is deDated by the 
simple logical-sumination. As our architecture 
PDAI&CD uses dual-phase of expectation and 
evaluation, we can get a solution with maximum- 
likelihood slttisfying constraints automatically. 
• Fig.8 shows tile distribution of mnnber of elements 
contributing to identify each scene uniquely. 
• In order to clarify tile discrepancy of tlle experi- 
mental an¢l theoretical results, tile number of ele- 
lnents overlal)lmd ill any two st:ones are connted. 
314 
Recalling ratio 
.o 
1.64 
).4~ 
).21 
" I \[ I I 
5 10 15 20 
Number of elements of partial match 
Figure 7: Recalling prollahility to number of partial 
input elements 
Recalling ratio 
t.( ~ _ i 
3.1 
3.1 
3., 
0.: 
0J 
I 
'\[Tr;h~ 
I I 
5 10 15 20 
Number of elements of partial match 
Figure 8: Distribution of mmfller of partial inlmt ele- 
ments to identify scenes 
As in Fig.9, tit(', number of overlal)ping (,lernents 
in the. the.oretieal e~dculation is very small com- 
pared to the experhr,ents with Of)El). OPfi',D-2 
ill tile figure illustrates the same ,¢alue without 
using trivial words like 'a', 'the', 'of', 'that', 'to', 
'in', 'and', fief', 'with', 's'. But the. existence of 
these words can not explain the whole discrep- 
ancy. This will be deserilled in the next section 
ill more detail. 
* As filrther investigation in order to explain tile 
discrepancy of 'EXACT'(logical-sunnnation) and 
'OPED'(with our associative memory), distrilm- 
tion of weight v~tlues is shown in l,'ig.10. I,~)/';ical- 
surnmation me.thod is achieved by a spe(:ial algo- 
rithm similar to the associative memory. Only tile 
ditferenee is that it uses equal weight value with- 
log(number) 
Figure 9: Distribution of number of elements comnmn 
to two seelles 
4 
2 
1 
0.2 
10g(number) 
6 
5 
0.4 
Distribution of weighl value 
0.G 0.8 1.0 
Vigure 10: Distribution of weight value 
out any wtrianee, llut in practic~tl, the experimen- 
tal result of 'OPED' as ill \]'~ig.10 shows am exis- 
tence of enormous wtriance ill tile distrilmtion of 
weight value. Though tile varimme helps the selec- 
tivity with it few words, it disturhs the expectivity 
with lllOl'e thall l\]lrt!e w()rds eolivers(qy, l\[el'e we 
sumnmrize the interl)ret;ttion of the gaps ~tmonF, 
the theoretical expectation, the rest, It of logic~tl- 
summalion('\]';XAC'.l"), and the system('OPl~,l)'): 
1. l'~xsistem:e of trivial words in most of tile 
seelleS. 
2. Variance of weight distribution. 
3. l)ilference of characteristics hetwee.n algo- 
rithms. 
• Abstracted results are summarized in Tabh.'.3. In 
this table, the number of re.gistered words ill dic- 
tionary itself is ditferent from the nurnber of the 
total words analyzed hy our systern. The diserep- 
alley arises mainly Dora the fact that we analyzed 
emnpound words into simple words (e.g. 'research 
laboratory' to 'research' ~'~ittl 'laboratory'). 
315 
Table 3: Summarized results 
Total ~ of scenes 384 scenes 
Registered # of words 27,500 words 
Total # of words 11,711 words 
Average # of words / scene 184.2 words 
Mm,~ # of words in one scene 478 words 
Required # of words to 5 words 
identify scenes at 90% ratio 
Required # of words to 4 words 
identify scenes at 90% ratio 
by exact match algorithm 
Theoretical estimation of 2 words 
required # of words to 
identify scenes at 90% ratio 
5 Summary 
We analyzed the selectivity of our 384 living scenes 
with many sets of words which are part of 11,711 words 
used in the dictionary OPED. The average munber of 
words in one scene is about 184. The probability of re- 
calling correct scenes with input partial words is difl'er- 
ent from the theoretical simulation of random assign- 
ment constructed with vMues of these parameters. Un- 
like random generation of arbitrary symbols, seman- 
tics of natural language consists of highly-correlated 
meanings of words. Although the theoretical simula- 
tion of the simplified model suggests a rough estima- 
tion of disambiguation requirements we should analyze 
the dictionary itself as in this paper. 
Another suggestive analysis is using Shannon's in- 
formation or entropy, which gives us more accurate. 
information depending on prol)ability of each phe- 
nomenon. It shows how to estimate the amount of 
semantic ambiguity. 
Spatial-scene identification is one of the simplest 
kind of context necessary to disambiguate meaning of 
words an(\[ offer a new method for future integration of 
natural language processing and visual pattern recog- 
nition. 
6 Acknowledgements 
The authors acknowledge Randy Goebel, Nancy 
Ide, Jean Veronis, Hiroaki Kitano, Koiichi IIashida, 
Katashi Nagao and Lawrence A. Bookman for helpful 
discussions and suggestions. Also the authors thank 
Kazuhiro Nala~tdai and Satoshi Murakami for trans- 
formation of the pictorial dictionary into machine- 
readable one. This research is supported by Fellow- 
ships of the Japan Society for the Promotion of Sci- 
ence for Japanese Junior Scientists and Grant-in-Aid 
for Scientific Research on Priority Areas by the Min- 
istry of Educations, Science and Culture, Japan. 
References 
\[1\] Amari S. and Maginu K. (1988). Statistical Neu- 
rodynamics of Associative Memory. Neural Net- 
works, Vol. 1-I, pp.63-73. 
\[2\] Barwise .\]. and Perry J. (1983). Situation and 
Attitudes, MIT-Prcss. 
\[3\] Bookman L.A. (1993). A ScMable Architecture 
for Integrating Associative and Semantic Mem- 
ory. Connection Science, Vol. 5. 
\[4\] Cottrell G.W. (1989). A Connectionist Approach 
to Word Sense Disambiguation, Pitman, Morgan 
I(aufmann Pub. 
\[5\] Ide N.M. and Vcronis J. (1993). Extracting 
Knowledge Bases from Machine-Readal)le Dic- 
tionaries: Have We Wasted Our Time? In KB 
~ KS 93, pp.257-266. 
\[6\] Kohonen T. (1984). Self-Organization and Asso- 
ciative Memory, Springer-Vcrlag. 
\[7\] Miikkulainen R. (1993). Subsymbolic Natural 
Language I)rocessing : An Inteyrated Model of 
Scripts, Lea:icon, and Memory., MIT-Press. 
\[8\] Nishikimi M., Nakashima II. and Matsubara II. 
(1992). Language Acquisition ,'us Learning. In 
Proceedings of COLING-92, pp.707-713. 
\[9\] Shannon C.E. (1948). A Mathematical Theory 
of Communication. Bell System 7~ch. J., Vol.27, 
pp.373-423, 623-656. 
\[10\] Sharkey N.E. (1989). A PDP Learning Approach 
to Naural Language Understanding. In Alexm~- 
der I. Ed., Neural Computing Architectures : 
The Design of Brain-like Machines, MIT-Press, 
pp.92-116. 
\[11\] Shastri L. (1988). Semantic Networks: An Evi- 
dential Formalization and its Connectionist Re- 
alization, Morgan Kauflnann. 
\[12\] Tsunod~t T. and Tanal~t It. (1992). Semantic 
Ambiguity Resolution by Parallel Distrit)uted 
Associative Inference and Contradiction Detec- 
tion. In Proceedings of LICNN-Nagoya93, Vol. I, 
pp.163-166. 
\[131 Tsunoda T. and Tanata't H. (1993). Winner As- 
sociatiw; Voting Engine (WAVE). In Proceedings 
of LlCNN-Beijing92, Vol.3, pp.589-594. 
\[141 Veronis J. and Ide N.M. (1990). Word Sense Dis- 
ambiguation with Very Large Neural Networks 
Extracted from Machine Readable Dictionaries. 
In Proceedings of COLING-90, pp.389-394. 
\[15\] Waltz D.L. and Pollack J.B. (1985). Massively 
Parallel Parsing : A Strongly Interactive Model 
of Natural Language Interpretation. COGNI- 
TIVE SCIENCE, Vol.9, pp.51-74. 
316 
Generation 

