AN AUTOMATIC TRANSLATION SYSTEM OF NON-SEGMENTED KANA SENTENCES INTO KANJI-KANA SENTENCES 
Hiroshi Makino 
Faculty of Engineering Science, Osaka University 
Machikaneyama-cho, Toyonaka, Osaka 560, JAPAN
and Makoto Kizawa 
University of Library and Information Science 
Yatabe-machi, Tsukuba-gun, Ibaraki-ken 305, JAPAN
Summary
This paper presents algorithms for the two main
problems in an automatic Kana-Kanji translation
system, in which input sentences written in Kana are
translated into ordinary Japanese sentences in Kanji
and Kana: the segmentation of non-segmented sentences
into Bunsetsu, and the identification of words among
homonyms. With these algorithms, non-segmented Kana
input sentences were automatically translated into
Kanji-Kana output sentences with a success rate of
96.2 per cent.
Introduction 
In computer processing of Japanese-language
information, input is much more difficult than for
Indo-European languages, because thousands of
different characters from two main classes, Kanji
(ideograms) and Kana (phonograms), are used together
in ordinary written sentences.
Conventional Japanese typewriters are equipped with
at least 2,000 Kanji (Chinese characters) in frequent
daily use. Such a typewriter is difficult to handle,
and its typing speed is much lower than that of an
alphabetic typewriter, because the operator must
search for the characters one by one.
One of the most promising input methods for
overcoming this intrinsic difficulty is the
Kana-Kanji translation system, in which sentences are
input in Kana only, using a regular 44-key keyboard,
and are then translated automatically into regular
Kanji-Kana sentences by the computer.
The automatic translation system consists of two
processes: segmentation and word identification.
The problems in Kana-Kanji translation
Kana-Kanji translation involves two problems:
(a) segmentation of input sentences. 
(b) word identification from homonyms. 
These problems are basic to the processing of
Japanese sentences as linguistic information.
Unlike English sentences, Japanese sentences in Kanji
and Kana have no spaces between words. However, to
make Kana sentences easier for the computer to
process, it would be necessary to put a space as a
segmental symbol between words or other units in
sentences. Therefore, several spacing methods, listed
in Fig. 1 (which includes a non-segmented sentence
for comparison), have already been adopted in
Kana-Kanji translation systems [1-3].
(1) genzai jinrui ha sugure ta me to yubisaki no
kankaku wo mot te iru.
(2) genzai jinrui ha sugure ta me to yubisaki no
kankaku wo mot teiru.
(3) genzai jinruiha sugureta meto yubisakino
kankakuwo motteiru.
(4) genzaijinrui ha sugu reta me to yubisaki no
kankaku wo mot teiru.
(5) genzaijinruihasuguretametoyubisakinokankakuwomotteiru.
(1) segmented between words
(2) segmented between an independent word
and a sequence of dependent words
(3) segmented between Bunsetsu
(4) segmented between Kanji and Kana
(5) non-segmented
Fig. 1 Examples of segmentations in a Japanese
sentence.
However, these pre-editing methods of word or unit
segmentation are not only too laborious for most
Japanese people, who are not accustomed to segmenting
sentences into words, but also error-prone. It is
therefore necessary for a Kana-Kanji translation
system to segment Kana strings into words or other
units automatically.
The number of different syllables in Japanese is much
smaller than in English or in Chinese, while the
number of Kanji is much larger. Consequently, there
are many groups of Kanji words that share the same
pronunciation. This makes word identification more
difficult in Kana-Kanji translation, since there is no
one-to-one correspondence between Kanji and Kana. For
example, the Kana string 'kousen' corresponds to 25
words in an ordinary dictionary, some of which are
shown below.
Example.
Kana              Kanji   meaning
こうせん (kousen)  交戦    a battle
                  抗戦    a resistance
                  鋼船    an iron ship
                  光線    a beam
                  公選    a public election
                  口銭    a commission
                  鉱泉    a mineral spring
The segmentation process 
Bunsetsu 
A Japanese sentence is composed of a sequence of
syntactic units called Bunsetsu, each of which is
pronounced without a pause. A Bunsetsu usually
consists of two parts: an independent part and a
dependent part. The independent part consists of an
independent word or its derivative, and the dependent
part consists of a sequence of dependent words, as
follows:
Bunsetsu = (independent part).(dependent part)
independent part
= [prefix].(independent word).[suffix]
dependent part
= [dependent word]*
independent word = noun/pronoun/adverb/
verb/adjective/verbal adjective/
attributive/conjunction/interjection
dependent word = auxiliary verb/particle or
postposition
Here, brackets indicate optionality, the asterisk
indicates zero or more repetitions, and the slants
indicate alternatives.
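The grammar above can be read as a pattern over part-of-speech tags. The following Python sketch is ours, not the authors' program: it assumes that the morphemes have already been tagged, uses hypothetical tag names, and only checks whether a tag sequence forms a well-formed Bunsetsu.

```python
from typing import List

# Minimal sketch: check a Bunsetsu given as a sequence of part-of-speech tags
# against  Bunsetsu = [prefix].(independent word).[suffix].[dependent word]*
# The tag strings are illustrative labels, not the system's internal codes.

INDEPENDENT = {"noun", "pronoun", "adverb", "verb", "adjective",
               "verbal_adjective", "attributive", "conjunction", "interjection"}
DEPENDENT = {"auxiliary_verb", "particle"}


def is_bunsetsu(tags: List[str]) -> bool:
    """Return True if `tags` matches [prefix] independent [suffix] dependent*."""
    i = 0
    if i < len(tags) and tags[i] == "prefix":          # optional prefix
        i += 1
    if i >= len(tags) or tags[i] not in INDEPENDENT:   # exactly one independent word
        return False
    i += 1
    if i < len(tags) and tags[i] == "suffix":          # optional suffix
        i += 1
    return all(tag in DEPENDENT for tag in tags[i:])   # zero or more dependent words


# V AUX P AUX AUX AUX, the tag pattern of 'ikanakerebanaranakatta'
print(is_bunsetsu(["verb", "auxiliary_verb", "particle",
                   "auxiliary_verb", "auxiliary_verb", "auxiliary_verb"]))  # True
print(is_bunsetsu(["prefix", "noun", "suffix", "particle"]))               # True
print(is_bunsetsu(["particle", "noun"]))                                   # False
```

Because the dependent part may be empty, a bare independent word such as a single noun is also accepted.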
The independent words ('jiritsugo') are divided into
two main groups: inflected words, which consist of
verbs, adjectives and verbal adjectives
('keiyodoshi'), and non-inflected words, which consist
of nouns, pronouns and others. The dependent words, on
the other hand, consist of particles and auxiliary
verbs, the latter of which are inflected.
Within a Bunsetsu, there are grammatical
connectability constraints between each word and the
word that follows it. This is explained using the
example in Fig. 2.
ikanakerebanaranakatta (had to go) 
V AUX P AUX AUX AUX 
V: verb, AUX: auxiliary verb, P: particle
Fig. 2 An example of Bunsetsu
The indefinite form 'ika' of the verb 'iku' can be
followed not only by the inflectional form 'nakere' of
the auxiliary verb 'nai', as in this example, but also
by every other inflectional form of 'nai'. The
particle 'ba', in turn, is preceded by the conditional
form of 'nai'. Thus, connectability is determined by
each inflectional form of the preceding word (if it is
an inflected word) together with the succeeding word.
These connectability features within a Bunsetsu form
the basis of the segmentation of Kana strings
described in the following sections.
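As an illustration, the following Python sketch accepts a Bunsetsu only if every adjacent pair of morphemes appears in a table of connectable (preceding, succeeding) pairs. The morpheme split of Fig. 2, the labels such as 'nai:conditional' (including our treatment of 'nara' as a form of 'naru'), and the few table entries are assumptions covering only this example; the actual system keeps this information in its connection matrix.

```python
from typing import List

# Minimal sketch: a Bunsetsu is accepted only if every adjacent pair of
# morphemes is connectable. Inflected words are labelled "lemma:form".
# The morpheme split and the small pair table are illustrative assumptions.

NAI_FORMS = ["nai:indefinite", "nai:conjunctive", "nai:final", "nai:conditional"]

CONNECTABLE = {
    # the indefinite form of 'iku' ('ika') can be followed by any form of 'nai'
    *(("iku:indefinite", form) for form in NAI_FORMS),
    *(("naru:indefinite", form) for form in NAI_FORMS),
    ("nai:conditional", "ba"),   # 'ba' follows the conditional form 'nakere'
    ("ba", "naru:indefinite"),
    ("nai:conjunctive", "ta"),
}


def connectable_chain(morphemes: List[str]) -> bool:
    """True if each morpheme may grammatically follow the one before it."""
    return all((a, b) in CONNECTABLE for a, b in zip(morphemes, morphemes[1:]))


# ika.nakere.ba.nara.nakat.ta -> 'ikanakerebanaranakatta' (had to go)
fig2 = ["iku:indefinite", "nai:conditional", "ba",
        "naru:indefinite", "nai:conjunctive", "ta"]
print(connectable_chain(fig2))                      # True
print(connectable_chain(["iku:indefinite", "ba"]))  # False
```

A set of pairs is used here only for brevity; a 0/1 matrix indexed by row (preceding form) and column (succeeding word) encodes the same relation.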
The longest string-match method of two Bunsetsu
For segmentation, independent words are first
separated, longest first, by comparing the Kana string
with the vocabulary of a word dictionary; each is
stored with information such as its part of speech
and, where necessary for further morphological
analysis, its inflectional form.
Then, the dependent words in the rest of the string
are recognized using the dependent word list, and the
grammatical connectability between each dependent
word and the word preceding it is examined. This
analysis continues until no succeeding word can be
found in the remaining Kana string. In this way, the
Bunsetsu candidates are extracted from the Kana string
as below.
Example.
souiuzasshiwo ... (a part of a string)
soui ... (noun)
sou.iu ... (adverb.auxiliary verb)
sou ... (verb)
The same analysis is then applied to the rest of the
string following each Bunsetsu candidate.
Consequently, sequences of two consecutive Bunsetsu
candidates are extracted from the Kana string, and the
Bunsetsu is identified so as to maximize the total
length of the two consecutive candidates. This
algorithm fixes only the boundary between the first
two consecutive Bunsetsu: the first Bunsetsu and its
constituents are recognized, whereas the decision
about the succeeding Bunsetsu remains tentative at
this stage.
This procedure, named the longest string-match method
of two Bunsetsu [4], is executed sentence by sentence,
so that eventually the input sentences are converted
into Bunsetsu and the homonyms in each Bunsetsu are
stored. An example is illustrated in Fig. 3.
souiuzasshiwo...
1) souiu zasshiwo...
2) soui...
3) sou iu...
Fig. 3 Segmentation process of Kana strings by the
longest string-match method of two Bunsetsu.
The successive Bunsetsu candidates in 1) and 3) are
compared, since the Kana string following the
candidate in 2) cannot be analyzed. As the total
length of the two analyzed strings in 1) is greater
than that in 3), the segmentation in 1), namely the
Bunsetsu 'souiu', is chosen as the result.
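The decision rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' PL/I code: the hypothetical table `CANDIDATES` stands in for the dictionary look-up and connectability analysis by listing, for each start position, the lengths of the Bunsetsu candidates found there, and it is filled in only to reproduce Fig. 3.

```python
from typing import Dict, List, Optional

# Minimal sketch of the longest string-match method of two Bunsetsu.
# CANDIDATES stands in for dictionary look-up plus connectability analysis:
# it maps a start position to the lengths of the Bunsetsu candidates found
# there. The toy table below is hypothetical and reproduces Fig. 3.

TEXT = "souiuzasshiwo"
CANDIDATES: Dict[int, List[int]] = {
    0: [5, 4, 3],   # 'souiu' (adverb.auxiliary verb), 'soui' (noun), 'sou' (verb)
    3: [2],         # 'iu' after 'sou'
    5: [8],         # 'zasshiwo'
    # nothing starts at position 4 ('uzasshiwo' cannot be analyzed)
}


def next_bunsetsu(text: str, pos: int, candidates: Dict[int, List[int]]) -> Optional[int]:
    """Length of the Bunsetsu at `pos` that maximizes the total length of
    two consecutive candidates; None if no candidate can be used."""
    best_len, best_total = None, -1
    for first in candidates.get(pos, []):
        end = pos + first
        following = candidates.get(end, [])
        if following:
            total = first + max(following)
        elif end == len(text):
            total = first            # the last Bunsetsu of the sentence
        else:
            continue                 # the rest cannot be analyzed, as in 2)
        if total > best_total:       # on ties, the first candidate listed wins
            best_len, best_total = first, total
    return best_len


def segment(text: str, candidates: Dict[int, List[int]]) -> List[str]:
    """Fix one Bunsetsu boundary at a time; the second candidate stays tentative."""
    pos, result = 0, []
    while pos < len(text):
        length = next_bunsetsu(text, pos, candidates)
        if length is None:
            # this is where the processing of unknown words would take over
            raise ValueError(f"no analysis at position {pos}")
        result.append(text[pos:pos + length])
        pos += length
    return result


print(segment(TEXT, CANDIDATES))  # ['souiu', 'zasshiwo']
```

Only the first Bunsetsu of each winning pair is committed; the loop then restarts from its end, so the second candidate is re-examined together with whatever follows it.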
The processing of unknown words
The longest string-match method of two Bunsetsu is
based on the grammatical characteristics of words, and
so it is not applicable to words that are not in the
word dictionary. Hence, an unknown word appearing in a
sentence can be expected to make segmentation
impossible. It is therefore necessary, for
non-segmented sentences, to provide for the processing
of unknown words.
The dependent words are divided into two main groups
by their connectability characteristics. One is the
class, named A, of words preceded by nouns or other
non-inflected words. The other is the class of words
preceded by inflected words; it is further subdivided
into four subclasses, named B, C, D and E, according
to the conjugation of the preceding word: the
indefinite, conjunctive, final and conditional forms,
respectively. The dependent words and their
connectability classes are given in Table 1.
Table 1 Classification of dependent words by
connectability.
word    class    word     class
no      A        ya       A
ni      A        u        B
te      C        nado     A
wo      A        dake     A
ha      A        zu       C
ta      C        demo     A
ga      A        yori     A
da      A        nagara   C
de      A        tara     C
to      A        n'       B
mo      A        tari     C
nai     B        shi      D
masu    C        rashii   A
kara    A        beki     D
desu    A        naku     C
he      A        bakari   A
ka      A        shika    A
ba      E        taru     A
made    A
(A: after nouns or other non-inflected words; B, C, D,
E: after the indefinite, conjunctive, final and
conditional forms of inflected words, respectively)
Now suppose that the search of the word dictionary
fails. The rest of the string, still unsegmented, is
then searched for the words in the dependent word
list above. Suppose a dependent word is found. If the
Kana preceding it can be an inflectional word ending
that the dependent word may follow (the vowels of the
indefinite, conjunctive, final and conditional endings
are '-a', '-i' or '-e', '-u' and '-e', respectively),
the dependent word is recognized, and the Kana string
following it is analyzed morphologically as described
in the preceding section. The dependent word sequences
thus extracted are used for the subsequent
segmentation.
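A minimal Python sketch of this step, assuming romanized Kana and only a subset of Table 1, is given below. Treating class A as unconstrained (an unknown noun may end in any Kana) is our assumption, as are the function name `split_unknown` and the example words such as 'pasokon'.

```python
from typing import Optional, Tuple

# Minimal sketch of segmentation around an unknown word. Kana are
# romanized, and only part of Table 1 is included.
DEPENDENT_CLASS = {
    "no": "A", "ni": "A", "wo": "A", "ha": "A", "ga": "A", "de": "A",
    "te": "C", "ta": "C", "masu": "C", "nai": "B", "beki": "D", "ba": "E",
}
# Vowel(s) the Kana just before a dependent word of each class must end in.
# Class A follows nouns, so it is left unconstrained (an assumption here).
ENDING_VOWELS = {"B": ("a",), "C": ("i", "e"), "D": ("u",), "E": ("e",)}
WORDS_LONGEST_FIRST = sorted(DEPENDENT_CLASS, key=len, reverse=True)


def split_unknown(rest: str) -> Optional[Tuple[str, str, str]]:
    """Find the first dependent word whose preceding Kana fits its class and
    return (unknown part, dependent word, remaining string)."""
    for i in range(1, len(rest)):            # the unknown part is never empty
        for word in WORDS_LONGEST_FIRST:
            if rest.startswith(word, i):
                cls = DEPENDENT_CLASS[word]
                before = rest[:i]
                if cls == "A" or before.endswith(ENDING_VOWELS[cls]):
                    return before, word, rest[i + len(word):]
    return None


print(split_unknown("pasokonwokau"))  # ('pasokon', 'wo', 'kau')
print(split_unknown("tabemasu"))      # ('tabe', 'masu', '')
print(split_unknown("yomuba"))        # None: 'ba' needs a preceding '-e'
```

The remaining string returned as the third element is what would then be passed back to the ordinary morphological analysis.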
The word identification process among homonyms
As mentioned above, some of the words in the input
sentences are identified by grammatical or
morphological analysis. However, many homonyms remain
that generally share the same grammatical
characteristics. Further word identification
therefore requires syntactic and semantic analysis of
the sentence.
The usage dictionary 
The usage dictionary contains information on word
usage, which plays an important role in identifying
words among homonyms.
This information is divided into two groups:
collocational information, such as derivatives,
compound words and idioms, and semantic information,
such as the "semantic patterns" of nouns and verbs.
The case relations of verbs in a sentence are marked
explicitly by particles attached to nouns. Usually,
the particles 'ga', 'wo' and 'ni' indicate the
nominative, objective and dative cases, respectively;
since these case relations are fundamental, they are
called the 'ga' case, 'wo' case and 'ni' case.
Accordingly, the so-called case frame of each verb has
been studied with an emphasis on these particles.
Example.
[watashi] ga aruku     [I] walk
[hon] wo yomu          read [book]
[mono] ni sawaru       touch [thing]
where [x] denotes a semantic feature or semantic
category of x.
One difficulty in this work is the semantic
classification of each word. To avoid this burden, the
semantic category of each word is identified according
to the system of "The Word List by Semantic
Principles", edited by the National Language Research
Institute, in which about 32,600 words are divided
into 798 semantic categories [5].
The particle 'ni' also occurs after locative nouns,
which indicate a location. However, it is empirically
assumed that only one of a locative noun or a dative
noun occurs with each verb in a simple sentence, as
the following example shows:
[hito] ni [ie] ni itta
...said (or went) to the man to the house...
The above sentence is unnatural, which means that the
semantic features of nouns with 'ni' can be derived
from the surface structure of the sentence.
The case frame [6] of each verb is different, and so
the semantic categories of nouns, together with the
standard particles used as semantic "identifiers", are
described in the usage dictionary.
Example.
kaku : [hito] ga [ji] wo [kami] ni [dougu] de
write : HUMAN LETTER PAPER INSTRUMENT
iku : [hito] ga [basho] kara [basho] he
go : HUMAN LOCATION LOCATION
In the above example, the particles 'de', 'kara' and
'he' are entered in the usage dictionary together with
their respective semantic categories.
For adjectives and verbal adjectives, only the
semantic categories of nominative nouns are entered in
the usage dictionary, as follows:
utsukushii : [hana] ga
beautiful : FLOWER
kireida : [hana] ga
pretty : FLOWER
Here, 'kireida' is a Japanese verbal adjective that
corresponds to an adjective in English. In all, we
have investigated the "semantic patterns" of 3,421
inflected words, consisting of verbs, adjectives,
verbal adjectives and verbs conjugated with 'suru',
whose stems ('sahenmeishi') are regarded as nouns in
Japanese. These words were extracted from the
vocabulary frequency table edited by the National
Language Research Institute [7].
Information about nouns, on the other hand, namely
their derivatives formed with prefixes and suffixes,
compound words and idioms, was collected from an
ordinary dictionary [8]. A part of the usage
dictionary is illustrated in Fig. 4.
Fig. 4 A part of the usage dictionary (columns: prefix;
suffix; compound word, idiom; case: ga, wo, ni, others)
The parsing 
After the sentences are segmented into Bunsetsu, the
parsing phase begins; its aim is not to build tree
structures but to extract the syntactic relations
between Bunsetsu or words. The parsing is based on the
Kakariuke relations (roughly, dependency relations)
between Bunsetsu; Kakariuke is a term from traditional
Japanese school grammar.
The Kakariuke relations in a sentence have the
following characteristics:
(1) The final word or inflectional form of a Bunsetsu
determines what kinds of words it can modify,
whereas each independent word determines how it
can be modified.
(2) A Bunsetsu that is a dependent always appears
before its governor in the sentence.
(3) Kakariuke relations between Bunsetsu never cross
each other in a sentence.
For simplicity of parsing, we adopted the following
two assumptions, which hold for most sentences.
(4) A Kakariuke relation is decided by the smallest
distance between a dependent and its probable
governors.
(5) Each Bunsetsu, except the one at the end of a
sentence, is a dependent of exactly one Bunsetsu
appearing after it.
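A minimal Python sketch of parsing under conditions (2) to (5) follows. It is our illustration, not the system's code: dependents are processed from right to left (an ordering we chose), each takes the nearest later Bunsetsu it can modify without crossing a relation already found, and the role rules in `RULES` are hypothetical stand-ins for the final-word and word-class factors described in the next paragraph.

```python
from typing import Callable, List, Optional, Set, Tuple


def parse_kakariuke(n: int, can_modify: Callable[[int, int], bool]) -> List[Optional[int]]:
    """Assign each Bunsetsu i (except the last) the nearest later Bunsetsu j that
    it can modify and whose relation does not cross relations already found."""
    heads: List[Optional[int]] = [None] * n
    for i in range(n - 2, -1, -1):           # right to left; the last one has no governor
        for j in range(i + 1, n):            # nearest candidate first, assumption (4)
            crosses = any(heads[k] is not None and j < heads[k]
                          for k in range(i + 1, j))   # condition (3)
            if not crosses and can_modify(i, j):
                heads[i] = j
                break
    return heads


# "kanojo.ha / watashi.no / musume.desu": (class of independent word, final word)
SENTENCE: List[Tuple[str, str]] = [("noun", "ha"), ("noun", "no"), ("noun", "desu")]
COPULAS = {"desu", "da"}
# hypothetical rules: (final word of dependent, role of governor)
RULES: Set[Tuple[str, str]] = {("ha", "predicate"), ("no", "noun")}


def roles(bunsetsu: Tuple[str, str]) -> Set[str]:
    cls, final = bunsetsu
    result = {cls}
    if cls == "verb" or final in COPULAS:    # noun + copula acts as a predicate
        result.add("predicate")
    return result


def can_modify(i: int, j: int) -> bool:
    return any((SENTENCE[i][1], role) in RULES for role in roles(SENTENCE[j]))


print(parse_kakariuke(len(SENTENCE), can_modify))  # [2, 2, None]
```

Each entry of the result is the index of the governor of that Bunsetsu, so both 'kanojoha' and 'watashino' attach to 'musumedesu'.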
The relations among Bunsetsu are searched for by
taking three factors into account: the five
conditions above, the final word of the dependent, and
the class of the independent word of the governor. The
term noun phrase is used for a Bunsetsu whose
independent part is a noun, and similarly verb phrase
for a Bunsetsu consisting of a verb and its dependent
part. However, a phrase consisting of a noun and
certain auxiliary verbs called copulas ('desu', 'da',
etc.) must be regarded as a predicate in the sentence.
Example
kanojo.ha / watashi.no / musume.desu
(She is my daughter.)
In the above example, a period separates words and a
slant is the segmental symbol between Bunsetsu; both
'kanojoha' and 'watashino' depend on 'musumedesu'. The
Kakariuke relation between 'watashino' and
'musumedesu' is determined by the particle 'no' and
the noun 'musume', whereas the relation between
'kanojoha' and 'musumedesu' is determined by the
particle 'ha' and the auxiliary verb 'desu'.
The pre-processing for word identification
In Japanese, the relations between verbs and nouns are
expressed through particles in the passive voice just
as in the active voice, so that the same syntactic
relation can stand for different semantic relations.
The passive and causative voices are marked explicitly
by attaching the auxiliary verbs 'reru'/'rareru' and
'seru'/'saseru', respectively, to the inflectional
forms of verbs. Accordingly, semantic normalization is
necessary in the cases below.
(i) passive:
N1 ga N2 ni V+reru (or rareru)
→ N2 ga N1 wo V
(ii) causative:
N1 ga N2 ni N3 wo V+seru (or saseru)
→ N2 ga N3 wo V
where N1, N2 and N3 denote nouns and V denotes a
transitive verb. The auxiliary verbs 'reru' and 'seru'
are used with consonant-conjugation verbs (godan
katsuyo doshi), whereas 'rareru' and 'saseru' are used
with vowel-conjugation verbs (ichidan katsuyo doshi).
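The two normalization rules can be sketched in Python as a rewrite of a particle-to-noun mapping. This is a minimal sketch under our own assumptions: the clause has already been reduced to standard particles, the verb is stored separately in its base form, other particles are carried over unchanged, and in the passive rule N1 is mapped to 'wo', the object case of a transitive verb. The function name and example sentences are hypothetical.

```python
from typing import Dict


def normalize_voice(cases: Dict[str, str], voice: str) -> Dict[str, str]:
    """Rewrite the case elements of a passive or causative clause as those of
    the corresponding active clause of a transitive verb V."""
    others = {p: n for p, n in cases.items() if p not in ("ga", "ni", "wo")}
    if voice == "passive":      # N1 ga N2 ni V+reru  ->  N2 ga N1 wo V
        return {**others, "ga": cases["ni"], "wo": cases["ga"]}
    if voice == "causative":    # N1 ga N2 ni N3 wo V+seru  ->  N2 ga N3 wo V
        return {**others, "ga": cases["ni"], "wo": cases["wo"]}
    return dict(cases)          # active voice: nothing to normalize


# kare ga sensei ni shikarareru (he is scolded by the teacher)
print(normalize_voice({"ga": "kare", "ni": "sensei"}, "passive"))
# {'ga': 'sensei', 'wo': 'kare'}
# haha ga kodomo ni hon wo yomaseru (the mother makes the child read a book)
print(normalize_voice({"ga": "haha", "ni": "kodomo", "wo": "hon"}, "causative"))
# {'ga': 'kodomo', 'wo': 'hon'}
```

After normalization, the case elements can be checked against the active-voice case frame of V in the usage dictionary.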
The meaning of an independent part consisting of an
independent word and a suffix is replaced by the
meaning of its suffix. Similarly, the meaning of a
number consisting of a numeral plus a counter is
represented by the meaning of its counter.
Example
[nihon+jin] → [jin]
[100+nin] → [nin]
where 'jin' and 'nin' are a suffix and a counter,
respectively, both meaning "human".
A dependent part composed of two or more dependent
words is replaced by the single dependent word that
represents its case, so that the usage dictionary can
be consulted in the next step.
Example
Tokyo.he.mo itta → Tokyo.he itta
(went to Tokyo, too) (went to Tokyo)
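Both substitutions can be sketched in a few lines of Python. The dictionary fragments, the category names such as "HUMAN" and "COUNTRY", and the set of case particles below are illustrative assumptions, not the contents of the actual suffix, counter or usage dictionaries.

```python
from typing import Dict, List, Optional

# Hypothetical fragments of the suffix/counter dictionaries and of the case
# particles; category names are illustrative, not codes from the Word List.
SUFFIX_OR_COUNTER: Dict[str, str] = {"jin": "HUMAN", "nin": "HUMAN"}
CASE_PARTICLES = {"ga", "wo", "ni", "de", "kara", "he", "made", "yori"}


def head_category(parts: List[str], noun_categories: Dict[str, str]) -> Optional[str]:
    """Semantic category of an independent part: a trailing suffix or counter
    decides it; otherwise the category of the word itself is used."""
    last = parts[-1]
    if len(parts) > 1 and last in SUFFIX_OR_COUNTER:
        return SUFFIX_OR_COUNTER[last]
    return noun_categories.get(last)


def reduce_dependents(dependents: List[str]) -> List[str]:
    """Replace a dependent part by the single word that marks its case, if any."""
    cases = [w for w in dependents if w in CASE_PARTICLES]
    return cases[:1] or dependents


print(head_category(["nihon", "jin"], {"nihon": "COUNTRY"}))  # HUMAN
print(head_category(["100", "nin"], {}))                      # HUMAN
print(reduce_dependents(["he", "mo"]))                        # ['he']
```

A dependent part that contains no case particle, such as a lone 'mo', is left unchanged in this sketch.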
Word selection from homonyms
Word selection from homonyms is performed using both
the collocational information and the case
information of verbs.
Word selection based on noun-to-verb relations
Word selection from homonyms is performed on the basis
of the particle attached to each noun, which has
already been converted into its "standard particle" in
the preprocessing phase. The semantic category of each
homonym (noun) is compared with the corresponding
semantic category code in the usage dictionary, and
the best-matching word is selected. When the homonyms
are verbs, the verb and the nouns that are its case
elements are selected by taking into account the
number of cases found in the sentence. Nouns related
to verbs through the particle 'no' are treated as
nominative nouns. Since a noun followed by a copula
such as 'desu' is assumed to be in a synonymous
relation with the nominative noun, each such pair is
also selected from the homonyms.
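A minimal Python sketch of selecting a noun among homonyms by its case slot, with the most-frequent-word fallback described later in this section, is shown below. The usage-dictionary entry follows the 'kaku' example given earlier; the homonyms of 'kami', their category labels and their frequencies are made-up values, and exact category equality is our simplification of "best-matching".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Homonym:
    kanji: str
    category: str    # semantic category (labels are hypothetical)
    frequency: int   # word frequency (values are made up for this sketch)


# Usage-dictionary entry for 'kaku' (write), following the example given earlier.
USAGE: Dict[str, Dict[str, str]] = {
    "kaku": {"ga": "HUMAN", "wo": "LETTER", "ni": "PAPER", "de": "INSTRUMENT"},
}


def select_noun(homonyms: List[Homonym], particle: str, verb: str) -> Homonym:
    """Prefer homonyms whose category fills the verb's case slot for `particle`;
    if none does, fall back to the most frequent homonym."""
    slot = USAGE.get(verb, {}).get(particle)
    matching = [h for h in homonyms if h.category == slot]
    return max(matching or homonyms, key=lambda h: h.frequency)


kami = [Homonym("神", "DEITY", 120), Homonym("紙", "PAPER", 80), Homonym("髪", "HAIR", 60)]
print(select_noun(kami, "ni", "kaku").kanji)  # 紙  ('kami ni kaku': write on paper)
print(select_noun(kami, "ni", "miru").kanji)  # 神  (no entry: most frequent)
```

With category codes from "The Word List by Semantic Principles", the equality test would be replaced by a graded comparison of codes.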
When a verb modifies a noun, no particle occurs
between them, so the case relation between the verb
and the noun is difficult to estimate; in this case,
the case elements of the verb that have not yet been
identified are tried. In the example below, the word
'hon' is examined to determine whether it is the
nominative or the objective element of the verb
'morau'.
Kare ni moratta hon (the book received from him)
hon ga morau
hon wo morau
Word selection based on noun-to-noun relations 
For a Bunsetsu composed of an independent word with
prefixes and/or suffixes, the derivative is decided
according to the prefixes and suffixes registered in
the usage dictionary.
When consecutive nouns are found, the usage dictionary
is checked for them as a registered compound, and the
registered word is selected if there is one.
Information on idioms is also consulted for nouns and
verbs in a Kakariuke relation, because such words can
be identified through collocational expressions. For a
sequence of two nouns, one of which is a
'sahenmeishi', the semantic relation between the two
nouns is often assumed to be based on a case relation,
because 'sahenmeishi' also have the characteristics of
verbs.
Example
jouhou shori (information processing)
jouhou wo shorisuru (...process information...)
The semantic categories of the alternative nouns read
'jouhou' are compared with the semantic categories of
the case elements of the verb 'shori + suru', and so
情報 (information) is selected from among the
homonyms.
Two nouns connected by a conjunctive particle ('to',
'ya', 'dano', 'nari', etc.) are assumed to belong to
the same or similar semantic categories, so the pair
of nouns whose semantic category codes are closest to
each other is selected. Synonyms and antonyms are
included in the same semantic category, as shown in
the following example.
Example
Sensei to reiju
(absolutism and slavery)
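The pair selection can be sketched in Python by scoring every combination of homonyms with a closeness measure on category codes. Measuring closeness as the length of the shared leading part of two codes is our assumption, based on the hierarchical form of such codes, and the codes attached to the homonyms of 'sensei' and 'reiju' below are made up for illustration.

```python
from itertools import product
from typing import Dict, Tuple


def closeness(a: str, b: str) -> int:
    """Number of leading characters two category codes share (higher = closer)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def select_pair(left: Dict[str, str], right: Dict[str, str]) -> Tuple[str, str]:
    """Choose one homonym from each side so that their category codes are closest."""
    return max(product(left, right), key=lambda p: closeness(left[p[0]], right[p[1]]))


# Homonyms of 'sensei' and 'reiju' with made-up category codes.
sensei = {"専制": "1.3710", "先生": "1.2410", "宣誓": "1.3120"}
reiju = {"隷従": "1.3712"}
print(select_pair(sensei, reiju))  # ('専制', '隷従')
```

When several pairs are equally close, `max` keeps the first one generated, so a frequency ordering of the homonyms would act as the tie-breaker.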
The most frequent word is selected for homonyms left
undetermined by the analysis of word usage.
Implementation
Dictionaries
The dictionaries of this Kana-Kanji translation system
are listed in Table 2 and briefly explained below.
(a) The independent word dictionary
The entries consist of sequential numbers, Kana
indexes, Kanji representations, numbers of Kanji,
inflectional forms, word frequency, semantic
category and information for dictionary search.
This dictionary contains about 8,000 independent
words chosen from "Vocabulary and Chinese
Characters in Ninety Magazines of Today" [7].
(b) Connection matrix
The connectability between preceding and
succeeding words in a Bunsetsu is represented by a
matrix, in which each row corresponds to a
preceding word or one of its conjugated forms and
each column to a succeeding word. Each element
takes the value 1 or 0, where 1 means that the
word of the row can be followed by the word of the
column.
The size of this matrix is 154 × 108.
(c) The table of inflectional word endings
For the analysis of the three classes of inflected
words (verbs, adjectives and verbal adjectives),
their conjugations and the corresponding rows of
the connection matrix are listed, because these
forms occur before dependent words in a Bunsetsu.
(d) The dependent word list
This list consists of the dependent words
(particles and inflectional forms of auxiliary
verbs) and their corresponding rows and columns in
the connection matrix.
(e) The prefix, the suffix and the counter 
dictionaries 
These dictionaries include 47 prefixes, 311
suffixes and 141 counters, respectively, together
with their Kanji representations. In addition, the
suffix and counter dictionaries include their
semantic category codes.
(f) The dependent list for segmentation
This list consists of the words and their classes
given in Table 1.
(g) The usage dictionary
This dictionary has contents such as those in
Fig. 4.
Table 2 List of dictionaries 
(a) The independent word dictionary
(b) The connection matrix 
(c) The table of inflectional endings 
(d) The dependent word list 
(e) The prefix, the suffix and the counter 
dictionaries 
(f) The usage dictionary 
The system 
The automatic Kana-Kanji translation system was
implemented on a FACOM 230-45S with 256 kilobytes of
memory. The program, written in PL/I, consists of 17
subprograms.
Fig. 5 The flow of Kana-Kanji translation: input
sentence; segmentation process (the longest
string-match method of two Bunsetsu; the segmentation
for unknown words); word identification process (the
homonym analysis).
Fig. 6 An example of Kana-Kanji translation process:
input sentence; (I) segmentation process; (II)
parsing; (III) output sentence.
Note: Words are arranged in order of frequency in (I).
Arrowed lines denote the Kakariuke relations between
Bunsetsu.
An input sentence is first segmented into Bunsetsu,
and then the Kana homonyms in each Bunsetsu are
identified, so that the sentence is transformed into a
Kanji-Kana sentence. These processes are executed
alternately within a sentence, as illustrated in
Fig. 5.
An example of the Kana-Kanji translation process is
illustrated in Fig. 6.
In Fig. 6, (I) shows the segmented Bunsetsu and their
homonyms, and (II) shows the Kakariuke relations
between Bunsetsu. On the basis of the relations in
(II), each word is selected from its homonyms by means
of case relations, idioms, compound words and
'sahenmeishi'. As a result, the output sentence in
(III) is obtained.
Experimental Result 
To evaluate the translation performance, 2,592
Bunsetsu in 214 sentences were taken from various
literary works, magazines, articles, etc. The results
of the experiment are shown in Table 3.
Table 3 Experimental result 
segmentation translation 
correct 98.8 % 96.2 % 
error 1.2 % 3.8 % 
Translation errors are classified into segmentation
errors and word selection errors. Segmentation errors
are further divided into errors caused by the longest
string-match method of two Bunsetsu, by unknown words
and by grammatical incompleteness; examples of each
are given as (1), (2) and (3) in Table 4,
respectively.
Errors caused by the longest string-match method of
two Bunsetsu occurred at seven Bunsetsu boundaries in
the data.
Word selection errors, on the other hand, are
apparently due to the use of word frequencies. Their
true cause, however, is the incompleteness of the
homonym analysis, namely: not taking account of the
semantic relation between the nouns in the noun phrase
pattern "noun + 'no' + noun"; not identifying what a
pronoun refers to in context; and not resolving the
ambiguity between case relations and other semantic
relations, such as the adverbial relation of verbs.
Examples of these from the data are given in (4), (5)
and (6) of Table 4, respectively. The Appendix shows
examples of segmented sentences and the corresponding
sentences in Kanji and Kana.
Table 4 Examples of errors (erroneous and correct
results)
* Katakana shows the segmentation based on
dependent words only.
Conclusion 
We have proposed a new approach to two main problems
in automatic Kana-Kanji translation, the segmentation
of sentences into Bunsetsu and homonym analysis, which
are basic linguistic problems. In addition, an
experimental system was constructed to verify its
effectiveness. In the experiments, 96.2 per cent of
all the Bunsetsu in the input sentences were
successfully translated into Kanji where appropriate.
To broaden the applicability of this system, we are
preparing a dictionary of about 30,000 words in daily
use.
The difficulties in Kana-Kanji translation stem from
ambiguities in the utterance itself; overcoming them
will therefore require further studies on sentence
understanding.
Acknowledgements
We would like to thank Mr. Masakazu Okada 
for his cooperation in this work. 
The research described in this paper was partially
supported by the Ministry of Education, Science and
Culture in 1979.
Appendix. 
(1) Kana sentences in automatically segmented Bunsetsu
(2) Output sentences in Kanji and Kana
Note: Underlined words are in error. Katakana denotes
unanalyzed strings.
Output examples (the preamble of the Constitution of
Japan)

References 

[1] I. Aizawa and T. Ebara, "Machine Translation System
of 'Kana' Presentations to 'Kanji-Kana' Mixed
Presentations," NHK Tech. Res., pp. 261-98 (1973).

[2] Y. Matsushita, H. Yamazaki and F. Sato, "Kana
Alphabet to Kanji Converting System," JOHOSHORI,
Vol. 15, No. 1, pp. 2-9 (1974).

[3] H. Makino, M. Kizawa and Y. Katsube, "Transformation
of Kana-input into Kanji-presented Sentence,"
JOHOSHORI, Vol. 18, No. 7, pp. 656-63 (1977).

[4] H. Makino and M. Kizawa, "Automatic Segmentation for
Transformation of Kana into Kanji," Trans. of Inf.
Proc. Society of Japan, Vol. 20, No. 4, pp. 337-45
(1979).

[5] The National Language Research Institute, "The Word
List by Semantic Principles," p. 362, SYUEI SYUPPAN,
Tokyo, Japan (1973).

[6] C. J. Fillmore, "The Case for Case," in Universals in
Linguistic Theory, Holt, Rinehart and Winston, New
York (1968).

[7] The National Language Research Institute, "Vocabulary
and Chinese Characters in Ninety Magazines of Today,"
p. 321, SYUEI SYUPPAN, Tokyo, Japan (1962).

[8] K. Kindaichi (ed.), "SHIN-MEIKAI KOKUGO JITEN,"
SANSEIDO, Tokyo (1971).
