DEVELOPMENT OF BASIC PRACTICAL TECHNIQUES FOR JAPANESE LETTER
STRING PROCESSING - AUTOMATIC KEYWORD EXTRACTION AND AUTOMATIC
READING
K. Araki, K. Hinatou, K. Itayama, T. Sahara, Y. Sakagami and
F. Takano
The Japan Information Center of Science and Technology (JICST)
2-5-2 Nagatacho, Chiyoda-ku, Tokyo 100, Japan
Japanese is a peculiar language among the thousands of
languages in the world. There exist only two of the same class:
Japanese and Korean. Japanese is written both in Chinese
characters (ideographs) and in Kana (Katakana and Hiragana,
phonetic symbols), mixed together without any spaces. Moreover,
Chinese characters in Japanese have, in most cases, several
readings and play several roles depending on the context and
letter string characteristics. So for written Japanese, it was
very difficult to segment letter strings, to extract adequate
terms from sentences, and to give them correct readings
automatically. All of these are indispensable for terminology,
automatic reading, automatic indexing, and keyboarding from
on-line terminals; otherwise a keyboard with more than 2,000
characters is necessary.
The authors invented an efficient algorithm and developed
computer programmes and dictionaries that successfully solve
the problems above for the first time in Japan.
The system consists of two subsystems called K-KACS
(Kanji-Kana Automatic Conversion System) and JAKAS (Japanese
Keyword Automatic Selection).
Some Chinese characters act both as suffixes, prefixes or
prepositions and as parts of meaningful words. We
comprehensively collected such characters (about 500) and those
terms in which the characters are included not as affixes or
prepositions but as an important part (about 8,000 words). A
letter string which matches a dictionary term is passed, but a
remaining letter which coincides with the special character
itself is cut.
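
The cutting rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' programme: the function name, the two tiny dictionaries, and the choice to drop the special character at the cut point are all hypothetical.

```python
def cut_at_special_characters(text, special_chars, protected_terms):
    """Cut text at special characters, except inside protected dictionary terms.

    special_chars   -- characters that can act as affixes or prepositions
    protected_terms -- terms in which such a character is an essential part
    Assumption: the special character itself is dropped at the cut point.
    """
    max_len = max((len(t) for t in protected_terms), default=0)
    segments, current, i = [], "", 0
    while i < len(text):
        # Look for the longest protected term starting at position i.
        match = None
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in protected_terms:
                match = text[i:i + n]
                break
        if match is not None:
            current += match              # dictionary term: pass it intact
            i += len(match)
        elif text[i] in special_chars:
            if current:
                segments.append(current)  # special character alone: cut here
            current = ""
            i += 1
        else:
            current += text[i]
            i += 1
    if current:
        segments.append(current)
    return segments


# Hypothetical entries: "的" and "性" as suffix-like characters,
# "性質" as a term in which "性" is an essential part.
print(cut_at_special_characters("化学的性質", {"的", "性"}, {"性質"}))
# -> ['化学', '性質']
```

Without the protected term, the same input would be cut inside "性質", which is the failure the dictionary of about 8,000 words is meant to prevent.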
In the case of a long letter string without such a special
letter, the sentence is cut by those dictionary terms which
are thought to be definite, within a reasonable amount. That is:
    dog liver nucleus            DNase
    indefinite type of word      definite type of word
Equally, among the variety of readings (in some cases
more than 8), some are special and definite and others are
indefinite but obey rules. We collected these special
readings (about 25,000) for about 2,000 Chinese characters and
developed an algorithm and programme to select the correct
reading for each Chinese character with a precision higher than
99.94 %.
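
The two-level idea of special readings first and rule-governed readings otherwise can be sketched as below. This is a simplified illustration: the rules for indefinite readings are not given in the text, so they are reduced here to one assumed default reading per character, and every dictionary entry and name is hypothetical.

```python
# Hypothetical special (definite) reading, keyed by the whole term.
SPECIAL_READINGS = {"大人": "おとな"}
# Hypothetical default readings standing in for the rule-governed cases.
DEFAULT_READINGS = {"大": "だい", "学": "がく", "人": "じん", "口": "こう"}


def read_string(text, special=SPECIAL_READINGS, default=DEFAULT_READINGS):
    """Convert a Kanji-Kana string to Kana, trying special readings first."""
    max_len = max((len(t) for t in special), default=0)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in special:
                out.append(special[chunk])  # special reading takes priority
                i += n
                break
        else:
            # No special entry here: use the default reading, or keep the
            # character unchanged (for example, Kana already in the text).
            out.append(default.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(read_string("大人の人口"))  # -> おとなのじんこう
print(read_string("大学"))        # -> だいがく
```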
As the dictionary is small enough and the logic is simple,
implementation and maintenance are relatively easy and the
speed is very high.
JICST adopted this system for its information file
production and services of more than 400,000 citations per
year, and saves costs.
With the development of these techniques, processing of
Japanese has become able to keep up with that of western
languages. For this work we were awarded the Prize of Learning
of the Japan Association of Information and Documentation in
1980, and have applied for a patent (Japan Patent Kokai Showa
55 (1980) - 102074).
[Figure: Information File of JICST. Sample record 000163
(81/11/05), language code EN, for the article "Sound propagation
in a shallow water region overlying a viscoelastic halfspace."
by COX P, HARVEY P, RENTIS P, SIVAPRASAD K, YILDIZ A and
YILDIZ M (Univ. New Hampshire). Annotations on the record:
original title; translated title; automatically translated;
title (reading); human indexing; up-word posting by thesaurus;
free human indexing; automatic indexing from Japanese title.
The coded field values and Japanese text are not recoverable
from the scan.]
[Figure: second sample record, 000041 (81/11/05), language code
RU, from Izv Vyssh Uchebn Zaved Chern Metall. The Cyrillic
author names and title are not recoverable from the scan.
Annotations on the record: original title; transliterated
title; translated title; title (reading); human indexing;
up-word posting by thesaurus; automatic indexing from Japanese
title.]
