Association-based Natural Language Processing 
with Neural Networks 
KIMURA Kazuhiro SUZUOKA Takashi 
AMANO Sin-ya 
Information Systems Laboratory 
Research and Development Center 
TOSHIBA Corp. 
1 Komukai-Tôsiba-tyô, Saiwai-ku, Kawasaki 210 Japan
kim@isl.rdc.toshiba.co.jp
Abstract 
This paper describes a natural language pro- 
cessing system reinforced by the use of associ- 
ation of words and concepts, implemented as a 
neural network. Combining an associative net- 
work with a conventional system contributes 
to semantic disambiguation in the process of 
interpretation. The model is employed within 
a kana-kanji conversion system and the advan- 
tages over conventional ones are shown. 
1 Introduction 
Currently, most practical applications in nat- 
ural language processing (NLP) have been 
realized via symbolic manipulation engines, 
such as grammar parsers. However, the cur- 
rent trend (and focus of research) is shift- 
ing to consider aspects of semantics and dis- 
course as part of NLP. This can be seen in 
the emergence of new theories of language, 
such as Situation Theory [Barwise 83] and
Discourse Representation Theory [Kamp 84].
While these theories provide an excellent the- 
oretical framework for natural language un- 
derstanding, the practical treatment of con- 
text dependency within the language can also 
be improved by enhancing underlying compo- 
nent technologies, such as knowledge based 
systems. In particular, alternative approaches
to symbolic manipulation, provided by connectionist
models [Rumelhart 86], have emerged.
Connectionist approaches enable the extrac- 
tion of processing knowledge from examples, 
instead of building knowledge bases manually. 
The model described here represents the 
unification of the connectionist approach and 
conventional symbolic manipulation; its most 
valuable feature is the use of word as- 
sociations using neural network technology. 
Word and concept associations appear to
be central in human cognition [Minsky 88].
Therefore, simulating word associations
contributes to semantic disambiguation in the
computational process of interpreting sentences
by placing a strong preference on
expected words (meanings).
The paper describes NLP reinforced by as- 
sociation of concepts and words via a con- 
nectionist network. The model is employed 
within an NLP application system for kana-kanji
conversion¹. Finally, an evaluation of
the system and advantages over conventional 
systems are presented. 
2 A brief overview of 
kana-kanji conversion 
Japanese has several interesting features in
its variety of characters. In particular, the
existence of several thousand kanji (based
on Chinese characters) made typing a hard
task before the invention of kana-kanji
conversion [Amano 79]. It has now become
the standard method of inputting Japanese to
computers. It is also used in word processors
and is familiar to those who are not computer
experts. Its popularity comes from the simplicity
of its operation: the user only types sentences in
the phonetic script of Japanese (kana), and the
kana-kanji converter automatically converts the
kana into meaningful expressions (kanji). The
simplified mechanism of kana-kanji conversion can
be described as two stages of processing:
morphological analysis and homonym selection.
• Morphological Analysis
Kana-inputted sentences (or fragments of
sentences) are morphologically analyzed through
dictionary look-up, using both lexicons and
grammars. There are many ambiguities in
word division due to the agglutinative nature
of Japanese (Japanese has no spaces in text).
Each partitioning of the kana is then further
open to interpretation as several alternative
kanji. The spoken word douki, for example, can
mean motivation, pulsation, synchronization,
or copperware. All of them are spelt
identically in kana (どうき), but have different
kanji characters (動機, 動悸, 同期, 銅器,
respectively). Some kana words have
10 or more possible meanings. Therefore
the stage of Homonym Selection is indispensable
to kana-kanji conversion for the
reduction of homonyms.

¹ Many commercial products use kana-kanji conversion
technology in Japan, including the TOSHIBA
Tosword-series of Japanese word processors.

• Homonym Selection
Semantically preferable homonyms are selected
according to co-occurrence restrictions and
selectional restrictions. The frequency of use
of each word is also taken into account.
Usually, the selection is also reinforced by a
simple context holding mechanism: when
homonyms appear in the previous discourse and
the user chooses one of them, the system
automatically memorizes that word, as in a
cache. Then, when the same homonyms appear
again, the memorized word is selected as the
most preferred candidate and is shown to the user.
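
The context holding mechanism described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the code of any actual converter: the class name `HomonymCache`, the frequency table, and the English glosses standing in for kanji candidates are all hypothetical.

```python
class HomonymCache:
    """Remembers the last word a user picked for each kana reading."""

    def __init__(self, frequency):
        self.frequency = frequency   # word -> usage count (hypothetical data)
        self.last_choice = {}        # kana reading -> most recently chosen word

    def rank(self, reading, candidates):
        # The memorized choice (if any) comes first; the remaining
        # candidates are ordered by descending frequency of use.
        cached = self.last_choice.get(reading)
        return sorted(
            candidates,
            key=lambda w: (w != cached, -self.frequency.get(w, 0)),
        )

    def choose(self, reading, word):
        # Called when the user confirms a candidate.
        self.last_choice[reading] = word


# Hypothetical usage counts for the four readings of "douki".
freq = {"motivation": 50, "synchronization": 30, "pulsation": 10, "copperware": 5}
cache = HomonymCache(freq)
candidates = ["copperware", "synchronization", "motivation", "pulsation"]

before = cache.rank("douki", candidates)   # frequency order only
cache.choose("douki", "synchronization")   # the user picks one candidate
after = cache.rank("douki", candidates)    # the memorized word now leads
```

Because the cache keeps a single remembered word per reading, it cannot tell which earlier choice fits the current context.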
3 Association-based kana- 
kanji conversion 
The above mechanisms are simple and effective
when the kana-kanji converter is regarded as a
typing aid. However, the abundance of homonyms
in Japanese contributes to many of the
ambiguities, and a user is forced to choose the
desired kanji from many candidates. A variety of
techniques are available to reduce homonym
ambiguities; however, these tend to
be limited from a semantic disambiguation
perspective. Using word co-occurrence
restrictions requires collecting a large
amount of co-occurrence phenomena, a
practically impossible task. Using selectional
restrictions requires an appropriate thesaurus,
but defining the conceptual hierarchy is known
to be difficult work [Lenat 89][EDR 90].
Techniques for storing previous kanji selections
(cache) are too simple to disambiguate between
possible previous selections for the same
homonym with respect to the context, or to
handle context switches.

Figure 1: Kana-Kanji Conversion with a Neural Network

To avoid these problems without increasing 
computational costs, we propose the use of the 
associative functionality of neural networks. 
The use of association is a natural extension to 
the conventional context holding mechanism. 
The idea is summarized as follows. There are 
two stages of processing: network generation 
and kana-kanji conversion. 
A network representing the strength of word
association is automatically generated from
real documents. Real documents can be
considered as training data because they are made
of correctly converted kanji. Each node in
the network uniquely corresponds to a word
entry in the dictionary of kana-kanji
conversion. Each node has an activation level.
Each link between nodes is weighted
and represents the strength of association
between words. The network is a Hopfield-type
network [Hopfield 84]; links are bidirectional
and the network is single-layered.
When the user chooses a word from
homonym candidates, a certain value is input
to the node corresponding to the chosen
word and the node is activated. The nodes
connected to the activated node are then
activated in turn. In this manner, the
activation spreads over the network through the
links, and the active part of the network can
be considered as the words associated with that
context. In kana-kanji conversion, the converter
decides the preference order of homonyms in the
given context by comparing the activation levels
of the nodes corresponding to the homonyms. An
example of the method is shown in Figure 1.
Assume the network is already built from
certain documents. A user is inputting a text
whose topic is related to computer hardware.
In the example, words like clock (クロック)
and signal (信号) already appear in the previous
context, so their activation levels are
relatively high. When the word douki (どうき)
is input in kana and the conversion starts,
the activation level of synchronization (同期)
is higher than that of the other candidates due to
its relationship to clock and signal. The input
douki is then correctly converted into
synchronization (同期).
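
The douki example can be traced with a deliberately simplified sketch. It assumes a single spreading step and invented link weights; English glosses stand in for dictionary entries, and the names `WEIGHTS`, `weight` and `spread_once` are hypothetical rather than taken from the system.

```python
# Hypothetical symmetric association weights (values invented for illustration).
WEIGHTS = {
    ("clock", "synchronization"): 0.8,
    ("signal", "synchronization"): 0.6,
    ("heart", "pulsation"): 0.9,
    ("reason", "motivation"): 0.7,
}


def weight(a, b):
    # Links are bidirectional, so look the pair up in either order.
    return WEIGHTS.get((a, b), WEIGHTS.get((b, a), 0.0))


def spread_once(activation, words):
    """One simplified spreading step: each word receives the weighted
    activation of the words that are already active in the context."""
    return {
        w: sum(weight(w, src) * level for src, level in activation.items())
        for w in words
    }


# Words already chosen earlier in a text about computer hardware.
context = {"clock": 1.0, "signal": 0.8}
douki = ["motivation", "pulsation", "synchronization", "copperware"]

levels = spread_once(context, douki)
ranked = sorted(douki, key=lambda w: levels[w], reverse=True)
```

With these assumed weights only synchronization is linked to the active words, so it is ranked first; the other candidates stay available further down the list.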
The advantages of our method are:
• The method enables kanji to be offered
based on a preference related to the current
context. Alternative kanji selections
are not discarded but are just given a
lower context weighting. Should the context
switch, the other possible selections
will obtain a stronger context preference;
this strategy allows the system to
handle context changes.
• Word preferences of a user are reflected in
the network.
• The correctness of the conversion is improved
without high-cost computation
such as semantic or discourse analysis.
4 Implementation 
The system was built on a Toshiba AS-4000
workstation (a Sun4-compatible machine) using
C. The system configuration is shown in
Figure 2.
The left-hand side of the dashed line represents
an off-line network building process. The
right-hand side represents a kana-kanji
conversion process reinforced with a neural
network handler. The network is used by the
neural network handler, and word association
is performed in parallel with kana-kanji
conversion. The kana-kanji converter receives
kana sequences from a user. It searches the
dictionary for lexical and grammatical information
and finally creates a list of possible homonym
candidates. Then the neural network handler
is queried for the activation levels of the
homonyms. After selecting the preferred
homonyms, the converter shows the candidates in
kanji to the user. When the user chooses the
desired one, the chosen word information is sent
to the neural network handler through a homonym
choice interface and the corresponding node is
activated.

Figure 2: System Configuration
(labels: associative network; dictionary handler; lexicons and
grammars; hiragana sequences; kana-kanji converter; activation
levels of neurons; homonym candidates (in kanji); activating
chosen neurons)

The roles and functions of the main components
are described as follows.
• Neural Network Generator
Several real documents are analyzed, and
the network nodes and the weights of links
are decided automatically. The documents
consist of a mixture of kana and
kanji; homonyms for the kanji within the
given context are also provided. The
documents, therefore, can be seen as training
data for the neural network. The analysis
proceeds through the following steps.
1. Analyze the documents morphologically
and convert them into a sequence
of words. Note that particles and
demonstratives are ignored because
they have no distinctive role in word
association.
2. Count the frequency of every pair of
words that co-appear in a paragraph
and memorize it as the strength of
connection. A paragraph is recognized
only by the format information of the
documents.
3. Sum up the strength of connection
for each word pair.
4. Regularize the training data; this
involves removing low occurrences
(noise) and partitioning the frequency
range in order to obtain
a monotonically decreasing (in
frequency) training set.
Although the network data have
only positive links and not all nodes
are connected, non-connected nodes
are assumed to be connected by negative
weights so that the Hopfield
conditions [Hopfield 84] are satisfied.
As described above, the technique used
here is a morphological and statistical
analysis. In effect, this module performs
pattern learning of co-appearing words in a
paragraph.
The idea behind this approach is that
words that appear together in a paragraph
have some sort of associative connection.
By accumulating such pairs, pairs
without such relationships will be
statistically rejected.
From a practical point of view, automated
network generation is indispensable. Since
human word associations differ by individual,
creating a general-purpose associative
network is not realistic. Because
the training data for the network is supposed
to be supplied by users' documents
in our system, an automatic network
generation mechanism is necessary even if the
generated network is somewhat inaccurate.
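
The generation steps for the associative network (drop particles, count co-appearing pairs per paragraph, sum, discard rare pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' generator: it assumes the morphological analysis has already produced word lists, counts each pair once per paragraph, and replaces the regularization step with a plain threshold (the frequency-range partitioning is not reproduced). `STOP_WORDS`, `build_network` and `min_count` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical particles and demonstratives to be ignored (romanized).
STOP_WORDS = {"wa", "ga", "kore"}


def build_network(paragraphs, min_count=2):
    """paragraphs: list of word lists, one list per paragraph.
    Returns {(word_a, word_b): strength} with word_a < word_b."""
    counts = Counter()
    for words in paragraphs:
        # Step 1: keep only content words (duplicates collapse in the set).
        content = sorted({w for w in words if w not in STOP_WORDS})
        # Steps 2-3: count every co-appearing pair and sum over paragraphs.
        counts.update(combinations(content, 2))
    # Step 4 (simplified): treat low-frequency pairs as noise.
    return {pair: c for pair, c in counts.items() if c >= min_count}


paragraphs = [
    ["clock", "wa", "signal", "synchronization"],
    ["clock", "signal"],
    ["heart", "pulsation"],
]
net = build_network(paragraphs)
```

Sorting the words of each paragraph gives every pair a canonical order, which is one simple way to keep the links symmetric.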
• Neural Network Handler 
The role of the module is to recall the 
total patterns of co-appearing words in a 
paragraph from the partial patterns of the 
current paragraph given by a user. 
The output value $O_j$ for each node $j$ is
calculated by the following equations:

$$O_j = f(n_j)$$

$$n_j = (1 - \delta)\, n_j + \delta \Bigl( \sum_i w_{ji} O_i + I_j \Bigr)$$

where
$f$ : a sigmoidal function.
$\delta$ : a real number representing the inertia
of the network ($0 < \delta < 1$).
$n_j$ : input value to node $j$.
$I_j$ : external input value to node $j$.
$w_{ji}$ : weight of a link from node $i$ to node
$j$; $w_{ji} = w_{ij}$, $w_{ii} = 0$.
The external input value $I_j$ takes a certain
positive value when the word corresponding
to node $j$ is chosen by a user; otherwise
it is zero.
Although the module is implemented in software,
it is fast enough to follow the
typing speed of a user.²
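
The update rule of the handler can be illustrated with a short sketch. Several points are assumptions rather than details from the paper: the update is done synchronously for all nodes, the inertia is set to 0.5, unconnected pairs get a hypothetical weight of -0.1 (in the spirit of the negative weights mentioned for the generator), and the loop is dense instead of exploiting the sparseness of the network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def link_weight(weights, j, i, default):
    # Links are symmetric, so accept the pair in either order.
    if (j, i) in weights:
        return weights[(j, i)]
    return weights.get((i, j), default)


def step(n, weights, external, delta=0.5, default_weight=-0.1):
    """One synchronous update of
    n_j = (1 - delta) * n_j + delta * (sum_i w_ji * O_i + I_j)."""
    out = {j: sigmoid(v) for j, v in n.items()}   # O_j = f(n_j)
    new_n = {}
    for j in n:
        total = sum(
            link_weight(weights, j, i, default_weight) * out[i]
            for i in n
            if i != j                              # w_jj = 0
        )
        new_n[j] = (1 - delta) * n[j] + delta * (total + external.get(j, 0.0))
    return new_n


nodes = ["clock", "synchronization", "motivation"]
w = {("clock", "synchronization"): 1.0}            # hypothetical link
n = {name: 0.0 for name in nodes}
for _ in range(5):
    n = step(n, w, {"clock": 2.0})                 # the user chose "clock"
outputs = {j: sigmoid(v) for j, v in n.items()}
```

In this toy run the chosen node ends up most active, its associate synchronization follows, and the unrelated motivation is pushed slightly below its resting level.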
² A certain optimization technique is used with
respect to the sparseness of the network.

• Kana-Kanji Converter
The basic algorithm is almost the same as
the conventional one. The difference is
that homonym candidates are sorted by
the activation levels of the corresponding
nodes in the network, except when local
constraints such as word co-occurrence
restrictions are applicable to the candidates.
The associative information also
affects the preference decision for
grammatical ambiguities.
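
One possible reading of this ordering rule is sketched below. It assumes that a candidate favoured by an applicable local constraint simply precedes all others, which the paper does not spell out; `order_candidates` and `locally_preferred` are hypothetical names, and the activation values are invented.

```python
def order_candidates(candidates, activation, locally_preferred=frozenset()):
    """Sort homonym candidates for display.

    candidates: list of words; activation: {word: activation level};
    locally_preferred: words already favoured by a local constraint such
    as a word co-occurrence restriction (assumed to take precedence).
    """
    return sorted(
        candidates,
        key=lambda w: (w not in locally_preferred, -activation.get(w, 0.0)),
    )


activation = {"synchronization": 0.7, "motivation": 0.4, "pulsation": 0.1}
douki = ["motivation", "pulsation", "synchronization", "copperware"]

by_context = order_candidates(douki, activation)
with_rule = order_candidates(douki, activation, {"pulsation"})
```

Without an applicable constraint the order follows the activation levels alone; with one, the constrained candidate moves to the front and the rest keep their activation order.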
5 Evaluation 
To evaluate the method, we tested the
implemented system by doing kana-kanji
conversion for real documents. The training data
and test data were taken from four types
of documents: business letters, personal
letters, news articles, and technical articles. The
amount of training data and test data was
over 100,000 phrases and 10,000 phrases
respectively, for each type of document. The
measure of conversion accuracy was the
reduction ratio (RR) of the homonym choice
operations of a user. For comparison, we
also evaluated the reduction ratio (RR') of
kana-kanji conversion with a conventional
context holding mechanism.
RR = (A - B)/A
RR' = (A - C)/A
where
A : number of choice operations required when
an untrained kana-kanji converter was used.
B : number of choice operations required when
a NN-trained kana-kanji converter was used.
C : number of choice operations required
when a kana-kanji converter with a conventional
context holding mechanism was used.
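
As a small worked example of the two ratios, the sketch below applies the definitions to made-up counts. The values of A, B and C are hypothetical and chosen only to show the arithmetic; they are not taken from the experiments.

```python
def reduction_ratio(untrained_ops, system_ops):
    """Fraction of homonym choice operations saved, relative to the
    untrained converter: (A - X) / A."""
    if untrained_ops <= 0:
        raise ValueError("untrained_ops must be positive")
    return (untrained_ops - system_ops) / untrained_ops


# Hypothetical counts of choice operations.
A, B, C = 1000, 600, 700
rr = reduction_ratio(A, B)         # NN-trained converter
rr_prime = reduction_ratio(A, C)   # conventional context holding mechanism
```

With these counts RR is 0.4 and RR' is 0.3, i.e. 40% and 30% of the choice operations would be saved.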
The result is shown in Table 1. The advantage
of our method is clear for each type
of document. In particular, the advantage in
the business letter field is notable, because
more than 80% of word processor users write
business letters.

Table 1: Result of the Evaluation

document-type        RR(%)   RR'(%)
business letters     41.8    32.6
personal letters     20.7    12.7
news articles        23.4    12.2
technical articles   45.6    40.7
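
The gap between the two columns of Table 1 can be checked with a few lines. The figures are copied from the table; the variable names are ours, and the difference in percentage points is just one simple way to compare the two methods.

```python
# (RR, RR') in percent, copied from Table 1.
TABLE_1 = {
    "business letters": (41.8, 32.6),
    "personal letters": (20.7, 12.7),
    "news articles": (23.4, 12.2),
    "technical articles": (45.6, 40.7),
}

# Gain of the association-based converter over the conventional
# context holding mechanism, in percentage points.
gains = {
    doc: round(rr - rr_prime, 1) for doc, (rr, rr_prime) in TABLE_1.items()
}
largest = max(gains, key=gains.get)
```

The gain is positive for all four document types, ranging from 4.9 points (technical articles) to 11.2 points (news articles).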
6 Discussion 
Although the result of the conversion test is
satisfactory, word associations by the neural
network are not yet human-like. The following is
a list of improvements that may further enhance
the system:
• Improvements for generating a network
The quality of the network depends on
how well noisy word occurrences are reduced,
from the point of view of association.
Noisy words are inevitable in automatic
generation, but they create unwanted
associations. One approach to reducing noisy
words is to identify those words which
are context independent and remove them
at the network generation stage. The
identification can be based on word
categories and meanings. In most cases,
words representing very abstract concepts
are noisy because they force unwanted
activations in unrelated contexts. Therefore
they should be detected through experiments.
Another problem arises because of the
ambiguity of morphological analysis. Word
extraction from real documents is not always
correct because of the agglutinative nature
of the Japanese language. Another possibility
for network improvement is to consider
syntactic relationships or co-occurrence
relationships when deciding link weights. In
addition, a document generally contains
keywords which play a central role in
association. They could be reflected more
strongly in the network by giving
consideration to technical terms.
• Preference decision in kana-kanji conversion
The addition of associative information
complicates the decision of homonym
preference in kana-kanji conversion. We
already have several means of semantic
disambiguation of homonyms: co-occurrence
restrictions and selectional restrictions.
As building a complete thesaurus is very
difficult, our thesaurus is still not
sufficient to select the correct meaning
(kanji conversion) of a kana-written word.
Selectional restrictions should therefore be
weak constraints in homonym selection. In the
same vein, associative information should be
considered a weak constraint because
associations by neural networks are not always
reliable. Possible conflicts between selectional
restrictions and associative information, added
to the grammatical ambiguities remaining
in the stage of homonym selection, make
kanji selection very complex. The problem
of multiply and weakly constrained
homonyms is one to which we have not
yet found the best solution.
7 Conclusion 
This paper described association-based natural
language processing and its application
to kana-kanji conversion. We showed the
advantages of the method over the conventional one
through experiments. After the improvements
discussed above, we are planning to develop
a neuro-word processor for commercial use.
We are also planning to apply the method to
other fields, including machine translation
and discourse analysis for natural language
interfaces to computers.

References 
[Amano 79] Kawada, T. and Amano, S., "Japanese Word Processor,"
Proc. IJCAI-79, pp. 466-468, 1979.
[Barwise 83] Barwise, J. and Perry, J., "Situations and Attitudes,"
MIT Press, 1983.
[EDR 90] Japan Electronic Dictionary Research Institute,
"Concept Dictionary," Tech. Rep. No.027, 1990.
[Hopfield 84] Hopfield, J., "Neurons with Graded Response Have
Collective Computational Properties Like Those of Two-State
Neurons," Proc. Natl. Acad. Sci. USA 81, pp. 3088-3092, 1984.
[Kamp 84] Kamp, H., "A Theory of Truth and Semantic
Representation," in Groenendijk et al. (eds.), "Truth,
Interpretation and Information," 1984.
[Lenat 89] Lenat, D. and Guha, R., "Building Large
Knowledge-Based Systems: Representation and Inference in
the Cyc Project," Addison-Wesley, 1989.
[Minsky 88] Minsky, M., "The Society of Mind,"
Simon & Schuster Inc., 1988.
[Rumelhart 86] Rumelhart, D., McClelland, J., and the PDP
Research Group, "Parallel Distributed Processing: Explorations
in the Microstructure of Cognition," MIT Press, 1986.
[Waltz 85] Waltz, D. and Pollack, J., "Massively Parallel
Parsing: A Strongly Interactive Model of Natural Language
Interpretation," Cognitive Science, pp. 51-74, 1985.
