STATISTICAL ANALYSIS OF JAPANESE CHARACTERS 
by 
Takushi Tanaka 
The National Language Research Institute 
3-9-14 Nishigaoka Kita-ku, Tokyo 
Summary 
The purpose of this study is to 
analyze the statistical property of 
Japanese characters for computer pro- 
cessing. Sentences in high school text- 
books and newspapers have been investi- 
gated in this study. This paper con- 
tains the following points : the number 
of different words written in each char- 
acter, position of characters in a word, 
relation between word boundaries and 
character strings, relation between 
parts of speech and patterns of charac- 
ter strings, relation between parts of 
speech and each character. 
The results of these investigations 
can be applied to the processing of 
written Japanese for practical purpose. 
i. Introduction 
There are several different aspects 
between English and Japanese in the 
information processing of natural lan- 
guage. The first concerns the number of 
characters. In order to write Japanese 
more than 2,000 characters are used. 
The second concerns the way of writing. 
A Japanese sentence consists of a con- 
tinuous character string without any 
space between words. The third concerns 
word order and other syntactic features. 
Among these aspects, the second and 
third features are closely related to 
the characters. 
Japanese characters consist of 
three kinds. A KANJI(Chinese character) 
is used to write nouns and the principal 
part of a predicate, and expresses the 
concepts contained in the sentence. 
A HIRAGANA (traditional Japanese char- 
acter) is used to write conjunctions, 
adverbs, JODOSHI (mainly expresses many 
modalities of a predicate) and JOSHI 
(post-position, mainly expresses case 
relations). A KATAKANA (traditional 
Japanese character) is used mainly as 
phonetic signs to write foreign words. 
Accordingly, Japanese characters 
are regarded as elements of words, at 
the same time, they function to charac- 
terize the syntactic or semantic classes 
of words and express word boundaries in 
a character string. 
The following Japanese character 
strings, (A) to (D), are the same sen- 
tenCes written by using KANJI to dif- 
ferent degrees. 
(D) is quoted from a high school text- 
book (world history). 
While (A), (B) and (C) are transliterated 
from (D) by computer. 1,2 
(Example of Japanese sentence) 
(A) 
(s) 
(c) 
l~l~),D~t~.s:~ ~s<:. 2.D 6{20t~t~O 
(D) 
--315-- 
(A) is written in KATAKANA (only for 
' ~--D~,~ ') and HIRAGANA (the rests) 
without using KANJI. 
(B) is written in HIRAGANA, KATAKANA 
and 200 KANJI of high frequency in 
Japanese writing. 
(C) is written in HIRAGANA, KATAKANA 
and the so-called educational KANJI 
(996 characters). 
Low graders in elementary school 
tend to write sentences like (A). The 
older they get the more KANJI they learn 
and they begin to write sentences like 
(D) in high school. When we read sen- 
tences like (A), we realize it is very 
difficult to read them, because we can- 
not find word boundaries easily. On the 
other hand, in (B), (C) and (D) we find 
less difficulty in this order. Because 
we can easily find out word boundaries 
by means of KANJI in a character string. 
Boundaries between a HIRAGANA part and a 
KANJI part play a role to indicate word 
boundaries in many cases. We can also 
grasp main concepts in a sentence by 
focusing our attention to the KANJI 
parts of the sentence. 
Therefore, it is very important to 
use HIRAGANA and KANJI appropriately in 
a character string. It is, however, hard 
to say the rules for the appropriate use 
of HIRAGANA and KANJI have been estab- 
lished. Due to the fact, it is necessary 
for us to study more about the actual 
use of Japanese characters. Because, 
explication of rules for the appropriate 
use of the characters is a prerequisite 
for information processing in commonly 
written Japanese. 
2. Outline of Japanese characters 
Fig.l illustrates the rate of total 
characters contained in the high school 
textbooks (9 subjects X 1/20 sampling). 
The data contains 48,096 characters in 
total. 3 HIRAGANA occupies the first 
place accounting for 47.1%. According to 
the result of Nakano's study which will 
be presented here, KANJI takes the first 
place in the newspaper, because they 
have TV-programs and mini advertisement 
which are both written mainly in KANJI. 4 
Fig.2 illustrates the rate of dif- 
ferent characters in the data of text- 
books. The data contains 1,525 different 
characters. KATAKANA and HIRAGANA are 
composed of basic 47 characters respec- 
tively, however the data also contains 
variations like small letters and let- 
ters with special symbols, and both kind 
of KANA exceed 70. Most of HIRAGANA and 
KATAKANA were appeared in the data of 
textbooks. The data contains 1,312 dif- 
ferent KANJI. The more data is investi- 
gated the more KANJI appear, and the 
rate of KANJI increases in the graph. 
'0-9'(1.9X) -- . 
','<3.7~>-.-X ~ 
KAHJI "~k RAGAHA 
(36.3~) ~ ~ (47,1~) 
I 00~( =48096 ) 
Fig.l Rate of total characters 
ALPHAIgET( 2 . 5~. ) I ~- SYMBOLS+( ' , ' , ' . ' ) 
H IRAGANA ~I I < I . 2~ ) 
(4. 6Z ) ~~-- 'O-9' ( 0. 7~, ) 
KATAKAHA 
(5.B~:) " 
KANJI 
(B6.0~) 
100~( =1525 ) 
Fig.2 Rate of different characters 
According to the investigation of Nomura 
3,213 KANJI were found in the newspaper~ 
The largest Japanese KANJI dictionary 
(edited by Morohashi) contains about 
50,000 characters.6 
Fig.3 shows relation between fre- 
quency and order of frequency in every 
kind of characters. From Fig.3 we see 
that a few HIRAGANA have high frequency. 
They play an important role in writing 
grammatical elements in a sentence as 
JOSHI and JODOSHI. 
(Y) 
1000 
100 
10 
Fig. 3 
("HIRAGANA) X : Order l %",.,,, 
Y : Frequency 
"°',-,2 ",..., 
,..,,. 
,. ",. "'. '". ................ :: .... I ( KANJ I ) 
• , ""-.,., • .......... , ......... , .......... 
",, 5 ..... "',.3 "% ".. (Num.) .............. . " .... 
• , "",. ,, ,.-,° 
• ( KATAKANA )'"'"'-:.:.. 
4 
(Alphabet) 
I i i 
25 50 75 X) 
Frequency and Their order 
316 
Fig.4 shows the relation between 
order of frequency and total number of 
characters up to their order. In this 
graph, we see about twelve different 
HIRAGANA occupy 50% of total HIRAGANA. 
About 120 different KANJI occupy 50% of 
total KANJI. 
(Y) 
38888 
28888 
1~B888 
e 
Fig. 4 
i Order 
Total number 
................. (H IRAGANA) 
..'" 
/.." 
(KANJI) I 
°° .................................................. 
"" . ,..°- ...... ~ ....... \[ .. . 3 
• ....~':::::: .................. ;" (KATAKANA) 
e 2~ se 75 lee(x) 
Order and Total up to the order 
3. Number of different words 
written in each character 
As we have more than 50,000 charac- 
ters, it is necessary to decide the 
degree of importace of them. In order 
to decide the degrees two criteria are 
assumed here. One is the frequency of 
the characters. The other one isthe 
number of different words in which the 
same character is used. The similar 
concept has been proposed by A. Tanaka. 7 
In Fig.5, axis X represents the 
frequency of the character as first cri- 
terion. Axis Y represents the number of 
different words in which the same char- 
acter is used. The graph shows the 
distribution of characters in the text- 
books except KANJI. Each character on 
Y=I is used for only one word. For in- 
stance, HIRAGANA ' & ' (o) on Y=i is used 
for only one word (one of case-JOSHI, 
indicating accusative case) exclusively. 
Each character on Y=X is used for a new 
word in every occurrence of the char- 
acter. 
(Y) 
188 
18 
Fig.5 Distribution of characters 
except KANJI 
X : Frequency 
Y : Number of different words 
: HIRAGANA 
-- : KATAKANA 
k : Alphabet 
I : Numeral or Symbol 
~I~ ffs o 
\]\ O,T, ~ V 
c C • n 
m ~ 
> < 
--X 
.~ P ~-~ 
2-- tl 
- ~2~ "~ .~ 
,I, • • • a I 816 £ Ill I ~ I II III I il I ill I 
V'/Lvx ~XRM TPFqdSkt. O/b P H ~ ~ DA 768 59403r2 () 1 
up; r G i e Et f =~ 
• l~ "~ 
-J~ 
..~ 
• \[C 
| ° 
I I I I 
I 18 188 18~8 (x) 
317-- 
1080 
(Y) 
,-_ p 
18 
1000 
(Y) 
180 
18 
Fig.6 Distribution of KANJI ._h ._. 
for daily use "~ .dJ "~H m.~ 
X : Frequency :~:~ ~,.., :.,.~.. _.~'A "H 
Y Number of different words *AT~ \['~':" ~'~'~. ":" ":\[~ i~ ~ 
• ~'.. ~. %% ~.'...,.~,~.~, ..~ 
~¢~ ~:., -~ {:~.'~ .~.¢: :',~-? ,.- .%. . 
'~k. ~.....,.~'4.b.~-'~-r-~-A; -'. ,J' ." . .~'~ 
~. :.- ............... 
~{~ ...................... ~ " 
I ~ I I I I 
1 10 100 IOOO 1000~3 
Fig.7 Distribution of KANJI 
not for daily use 
X : Frequency 
Y : Number of different words 
~V ~ ~'~ , 
I ! ! 
~'.~...*~ "~ 
"i~ 
• Q =- 
Q ~ • • ./lJ L'I~ 
~\] "~..-,~. -. :- 
,~,,~,,.~':.-.: ::..:-. -: : ~.+~"~..~ 
~I~ .............. • ,k,~ 
O 
"6 
overlap of characters 
on the same point 
length of the diagonal 
( 500 / scale ) 
I I 
I 10 100 1000 10000 
(x) 
(x) 
318 
KATAKANA appear near Y=X, because 
KATAKANA are mainly used for writing 
proper nouns of foreign words. The same 
words of such a category do not appear 
frequentry. 
HIRAGANA,' ~ ' (ru),' ~'(i),'~'(shi), 
'~ '(tsu), ' ~ '(ka) and ' < '(ku) are lo- 
calized on the upper right side. These 
are often used for writing some parts of 
inflectional forms of verbs (e.g. ' %~' 
for ' ~ ', ' D' for ' 5~ ~ ', ' ~' for 
' ~'). ' ~)'(i), ' ~'(ka) and ' < ' 
(ku) are also often used for writing 
some parts of inflectional forms of 
adjectives. ' ~ '(no), ' ¢:'(ni), ' %'(o), 
' ~ '(wa), ' a '(to), ' ~'(ga) and '~'(de) 
on the right side are frequently used 
for JOSHI (post-position, expressing- 
case relations or other grammatical re- 
lations). '~ '(ta) on the upper right 
side is often used for JODOSHI of the 
past tense. ' ~ '(na) on the upper right 
side is often used for the initial syl- 
lable of JODOSHI of negative. 
Fig.6 and Fig.7 show the same in- 
vestigation into the KANJI of newspapers 
(the original work was carried out by 
Nomura).5 
Fig.6 shows the distribution of (y) 
the so-called "TOYOKANJI" selected by 
the Japanese government for daily use in 
1946. The upper right area on the graph I~8 
is occupied by the so-called educational % 
KANJI. Each KANJI on Y=i is used only 
for one word (e.g. '~ '(tai) for '~{~' 
(taiho : arrest), ' ~ '(bou) for ' ~' 
(boueki : trade), '~ '(kai) for '~' 
(kikai : machine)). The same as Fig.5, 
characters used for persons' names are 
localized near Y=X. 
Fig.7 shows the distribution of 5~ 
KANJI other than TOYOKANJI. The most of 
characters in upper right part of the 
graph are the ones which are used for 
persons' names or for place names. (e.g. 
'~ ' and ' ~' for '~ '(Eujisaki:person) 
' ~' for '~' (Fukuoka:place). 
4. Position of characters in a word 
For the information processing of 
Japanese sentences, at first, it is im- 
portant to find out word boundaries in a 
continuous character string. If there 
are some characters which always come to 
the initial position or the final posi- 
tion of a word, these characters are 
available to find the boundaries. 
Fig.8 shows the position of charac- 
ters in words. In the data of textbooks, 
there are 399 characters which are used 
for more than 6 kinds of different 
words. The characters on X=i00 always 
come to the initial position of a word. 
The characters on X=0 are never used at 
the initial position. The characters on 
Y=i00 always come to the final position 
of a word. The characters on Y=0 are 
never used at the final position. 
KANJI, represented with dots, spread 
over the area of Y~-X+i00. Namely, the 
value of X+Y are always greater than or 
equal to i00~ In other words, rates of 
the initial position plus final position 
are always greater than or equal to 100%. 
It means that all KANJI have a tendency 
to be used for the initial position or 
the final position or both position (as 
a word of one character) of a word 
(short unit *). Most KANJI on Y = -X+100 
form only words of two KANJI. The tend- 
ency originates in the composition of 
words written by KANJI. This matter will 
be observed in section 6. The group of 
HIRAGANA in the upper right area has a 
tendency to be used for JOSHI. KATAKANA 
represented by '~' appear around the 
under left area on the graph. Words 
written in KATAKANA have relatively long 
length (See section 6). Therefore, the 
rates of the initial position and the 
final position are relativery decreased. 
  _ ;. ". ~.7.% 
~3 -- ---- "¢ :. -- --" 
g • 
- * i: {..," 
• . o 
• " (U~ 
>' ~ I,. ~%'--'" • , "" 
~7 I•' b D %.. ". • • - b ~ D "'" "? y= 
- ~ - ~..: . .. i~ 
~ I~ ~ " "% "" 
q¢ ., 
I I I 
50 IO0 % (x) 
X : Rate of initial position 
Y : Rate of final position 
Fig.8 Position of character in a Word 
* word (long unit) : ~m~ 
(National-language-research-institute) 
word (short unit) : \[\]~ , ~, ~, 
(National,Language,Research,Institute) 
--319 - 
5. Relation between word boundaries 
and character strings 
(Simple Japanese grammar) 
N, JiN2J ~ ... V. (i) 
Ni: Noun 
Ji : Case-JOSHI for N~ 
V : Verb 
A Japanese sentence fundamentally 
belongs to pattern (i). Many nouns (Ni) 
tend to be written in KANJI (See next 
section). All the case-JOSHI are writ- 
ten in HIRAGANA. Stems of verbs are often 
written in KANJI and their inflectional 
parts in HIRAGANA. So both a phrase of 
N~J& and V have such a pattern that the 
initial position is occupied by a KANJI 
and the final position is occupied by a 
HIRAGANA. Therefore, the changing point 
from HIRAGANA to KANJI in a character 
string is always regarded as a word 
boundary. On the other hand, a word 
boundary is not always a changing point 
from HIRAGANA to KANJI. One of the 
exception is Japanese nouns (long unit) 
which are composed of some concatenation 
of nouns (short unit). (See page 5 *) 
Fig.9 shows one of the relations 
between word boundaries and character 
strings. The graph contains 902 KANJI 
(total : 1,546) in the textbooks. The ax- 
is X represents the rate that the chan- 
ging points from HIRAGANA to KANJI cor- 
respond to word boundaries. Each KANJI 
on X=i00 is considered as the initial 
character of a word if it is preceeded 
by a HIRAGANA. The axis Y represents the 
rate that the word boundaries correspond 
to changing points from HIRAGANA to KAN- 
JI. The symbol of '~' represents a KANJI. 
(Y) 
le8 
50 
0 
Fig. 9 
15461 9O2 
,,, ~r__\] ~ 
I I 
e se lee~ (x) 
x : Rate of word boundary 
y : Rate of H-K boundary 
Character string and boundary 
The length of diagonal of '~' is propor- 
tionate to the frequency of the KANJI. 
In the graph, the length of 10% of axis 
is equal to i00 times of the frequency. 
6. Parts of speech and patterns 
of character strings 
In the investigation of newspapers, 
20 parts of speech were assumed. 8 Each 
part of speech has a particular pattern 
of character strings. It is possible to 
decide the part of speech of a word 
based on the knowledge of such patterns 
in computer processing of Japanese sen- 
tences. 
In Fig.10, 'K' in the column of 
pattern represents a KANJI, 'H' repre- 
sents a HIRAGAN~, and 'I' represents a 
KATAKANA. The left side of the bar chart 
shows the rate of total words. The right 
side of the bar chart shows the rate of 
different words. 
Fig.10-(1) shows the pattern of 
common nouns. The left side of the bar 
chart shows that KK-pattern accounts for 
68.0% of total common nouns in the news- 
papers. The right side of the bar chart 
shows that KK-pattern accounts for 68.5% 
of different common nouns in the news- 
papers. 
Fig.10-(2) shows the pattern of 
proper nouns. Most of the proper nouns 
also have KANJI strings. The rest of 
proper nouns have KATAKANA strings ex- 
pressing foreign words. 
Fig.10-(3) shows the pattern of 
verbal nouns which change to verbs with 
succeeding characters ' ~' (se), ' 8' (sa) 
' b'(shi), ' ~' (su), ' ~ '(suru), ' ~ ' 
(sure), '~'(seyo). The verbal nouns 
consist of KK-pattern up to 97.1% of 
total. If KK-pattern and succeeding 
characters ' ~ '(se),' ~ '(sa), ' L '(shi ) 
...are found, such a character string 
can be treated as a form of this kind. 
Fig.10-(4) shows the pattern of 
verbs. The verb of H-pattern is often 
used with preceding verbal nouns. Most 
different verbs have KH-pattern. 
Fig.10-(5) shows the pattern of 
adjective. Most of the adjectives are 
written with KH-pattern or KHH-pattern. 
Fig.10-(6) shows the pattern of 
adverbs. Most of the adverbs are written 
with HHH-pattern or HHHH-pattern. Namely 
they are written in HIRAGANA. 
7. Relation between each character 
and part of speech 
We have assumed patterns of charac- 
ter strings and the patterns are basi- 
cally available for classifing part of 
speech in actual data. However, the pat- 
terns do not provide sufficient criteria 
for the classification. For example, the 
320 
(i) Common noun 
68.8 
19.8 
2.4 
2.3 
7.5 
~l 68.5 I 
8.4 
3.9 
4.1 
15.1 
I 
108~(=288144) 
(2) Proper noun 
L 78.0 
7.7 
6.1 
4.3 
3.4 
8.5 
60 9 ! 
t t 18.3 6.6 
4.9 
4.3 
13.0 
100~,(=46196) 0~ 
(3) Verbal noun i I 
1 .2 I ,~ 
0.6 0.9 
I.I 2.3 
188>.( =5779 ) OP. 
(4) Verb 
26.1 
2~.3 
24.5 
8.5 
7.0 
8.6 
I 
0,7   
4,3 
6.2 
16,3 
14,3 
10,2 
1007.(=:38829) 
(5) Adjective 
4z % ! 
25 2 
O 2 
7' 4 
4 8 
? 9 
L 
lOOP.( =3~48 ) 
(6) Adverb 
31 7 
26 3 
19 1 
? 4 
7' 2 
3 O 
5 3 
e~ 
f 
I 32 2 
J 20 3 
12 8 
12 
6 8 
l 16 7 
23.7 
38.:3 
6.3 
1.6 
6.7 
6.3 
17.1 
180,~(=5044) O~ 
Fig.10 .Pattern 
(pattern) 
1 KK 
2 K 
3 lII 
4 IllI 
5 OTHERS 
I 
IlBI~P.( =9436 ) 
(example) 
~, ~ (language,world) 
,~, )~ (station,person) 
e~. ~e~u (television,hotel 
==--z, xw--~"(news,speed) 
7°~x~,~ (plastics) 
(pattern) 
1 KK 
2 KKK 
3 K 
4 Iili 
5 III 
6 OTHERS 
I 
loam.( =3472 ) 
(example) 
• ~, 51~ (Tokyo,Nippon) 
~, ~ (Chiyoda,Akihabara) 
~. ~ (U.S.A.,England 
79~z, e~(France,Moscow) 
F47, ~Y (Deutsch,TOYOTA 
==-~-~ (New York) 
(pattern) 
J 1 KK 
2 HHHH 
3 III 
4 OTHERS 
I 
188~(=679) 
(example) 
~, ~ (study,success) 
~<9, ~<~@O(amaze,greeting 
U-V. 7°9x (lead,plus) 
~f (shelving) 
(pattern) 
1 H 
2 KH 
3 HH 
4 HHH 
KHH 
6 OTHERS 
J 
188P.(=1427) 
(example) 
t, i~, ~" ('si','sa',su') 
r~l<, {-< (open,write) 
• ~~, ~,¢ (do,say) 
o< ~, ,9)/~ (make,understand) 
~ ~ ~, ~i~ (continue,give) 
& &0~i~ (prepare) 
(example) 
~L~, ~ (many, strong) 
~b~,. }<~ (beautiful,big) 
L~£'~ ,, l:~k~ (cruel,hard) 
%~b~. ~,C <~(merry, tasty) 
t'~b~, (difficult) 
~\]~ (funny) 
(example) 
~,r~ 0. 9- -C I: (fairly,already) 
~ ~{'~. I~ ~/~6(each, almost) 
~tt', 69 (yet,now) 
~ (about) 
~U, ~\]>C (again) 
%7\]~ ~, ~E ( fir st, immediatly) 
~,o@L,~z (simultaneously) 
(pattern) 
1 KH 
2 KHH 
:3 HHH 
4 HHHH 
5 HHHHH 
6 OTHERS 
I 
100~(=251 ) 
(pattern) 
1 HHH 
2 HHHH 
3 HH 
4 K 
5 KH 
6 KHH 
7 OTHERS 
I 
108P.(=253) 
of character string of word 
- 321 
(Y) 
i~ 
(Y) 
i L-:~C' 0 
i A q O 
X : Rate of verb 
-- Y : Frequency 
lit/i*: 
• : .=~ ~ ~ .~T ~;:". "~'=~4..~ . .~ 
'.-,., .'." " ." ~F'i~ -~t .~ :~. "." " -" 
" ~a" "~ -~, ,, ~ "~ '~ 
)tL '1- "..'..-" "%~e:.';~ .~m "-,~ ",~'~ 
"" • • ~ "~ .~fi 
;s, . .** . * • ** . • * * . 't"-:.: -. " "' : ." • ~\] .~I ~ ~I 
~:A.: "." .:... , " " • .. .~ ~ "L4 ;.I ~" ",: ." :" ". 
• " k:" " & ,~ "~ II.~I • 
-.'.¢ ~. • . . . . ~ ~ .~. iS "~ • • .,: ,:..... ,. ... . . ... • . ~x?, % " "~ 'l'}~ 
to/.'.:.."." . .::, ,''{~1.: .... i • "~ , 
%~. : :... ....* • • " I. • "~ ,,...., ,....-. ,-. , .... 
• • ,~° ,~ 
• .. • . : - ". ff(':/'~l 
¢ * 
I I I I 
0 2.~ O0 75 
Fig. ii KANJI for verbs 
same pattern was found among dif- 
ferent parts of speech. In order 
to obtain more accurate results, 
we analyzed relations between each 
character and each part of speech 
in the data of newspaper (restric- 
tion : word-frequency ~ 3). 
In Fig.ll the axis Y repre- 
sents total number of the last 
KANJI in a word. 
e.g. KKHH 
T .last KANJI in a word 
The axis X shows the rate of KANJI 
used for verbs. KANJI on X=I00 are 
used for verbs in the all occur- 
rence of the last KANJI in a word. 
The reliability of axis X increas- 
es according to the value of axis 
Y. In the lower area of the graph, 
the value on axis X seems to be 
discrete because of shortness of 
the data. 
8. Conclusion 
These analyses are preliminary 
I works to make character dictionary 
I@@ ~ (X) having statistical data. We plan 
to use the dictionary for computer 
processing of various written Jap- 
anese. 
References 
.n+ -Z. 
• X 
,U /b 
,~...~. -~ .~ 
.~ 
i~ .~ 
X : Rate of adjective 
Y : Frequency 
~b 
*igC' 
• i,~ 
-2 
I I I I 
~3 25 58 75 
Fig.12 
I 
KANJI for adjectives 

References

\[i\] T.Tanaka, "A similation system 
for transliteration of writing 
form of Japanese",Mathematical 
Linguistics, Vol.ll,No.15,1978 

\[2\] T.Tanaka, "Transliteration of 
Japanese writing", bit,Vol.10, 
No.15, 1978 

\[3\] T.Tanaka, "Statistics of Japa- 
nese characters", Studies in 
computational linguistics, 
Vol. X, (National Language Re- 
search Inst. Report-67), 1980 

\[4\] H.Nakano et al., "An automatic 
processing of the natural lan- 
guage in the word count sys- 
tem", (in this proceeding) COLING 1980

\[5\] M.Nomura et al., "A study of 
Chinese characters in modern 
newspapers", N.L.R. Inst. Re- 
port-56, 1976 

\[6\] T.Morohashi,"DAIKANWA diction- 
ary", Taishu-kan Book Co. 1971 

\[7\] A.Tanaka, "A statistical meas- 
urement on survey of KANJI", 
Studies in computational lin- 
guistics, Vol.VI\[~, (N.L.R.Inst. 
Report-59, 1976 

\[8\] T.Ishiwata, A.Tanaka, H.Nakano 
et al., "Studies on the vocab- 
ulary of modern newspapers", 
Vol.l, Vol.2, N.L.R. Inst. Re- 
port-37,38, 1970,1971 
