MORPHOLOGICAL ASPECT OF JAPANESE LANGUAGE PROCESSING 
Kosho Shudo, Toshiko Narahara 
Department of Electronics 
Fukuoka University 
Ii, Nanakuma, Nishi-ku 
Fukuoka-shi, 814 Japan 
and Sho Yoshida 
Department of Electronics 
Kyushu University 
6-10-1, Hakozaki, Higashi-ku 
Fukuoka-shi, 812 Japan 
A comprehensive grammatical model produced for 
analyzing the agglutinated structure which 
characterizes the Japanese language is pre- 
sented. This model, which includes extensively 
idiomatic postpositional expressions as termi- 
nals, is quite effective for the development 
of the Japanese language processor receptive 
to a reasonable variety of sentential forms and 
applicable to relatively wide fields. 
Introduction 
The following fundamental problems are still 
latent in most present systems of the natural 
language processing: (i)how to enable the system 
to have a higher quality processing that ren- 
ders the output more feasible; (ii)how to 
broaden the applicable field of the system; and 
(iii)how to allow the system to accept more 
"natural" input sentences, including miscella- 
neous linguistic constructions. In order to 
remedy these problems, we will need not only far 
advanced A.I. researches on the knowledge repre- 
sentation or deduction, but also more elaborate 
studies on the surface structures of natural 
sentences from the engineering viewpoints. 
Among other things, the requirement for the lin- 
guistic approach on the engineering side is 
quite urgent for Japanese language processing, 
since we have no Japanese grammar which is 
extensive and definite enough for solving, espe- 
cially, problem (iii). 
The authors have been developing a Japanese 
language parser for a Japanese-English 
translation system on the following standpoints. 
(1)~de coverage of the input forms; we aim at 
a system which is powerful enough to accept 
with less exceptions the sentential forms 
which appear in the actual, colloquial and 
written texts (e.g. non-pre-edited sentences 
in technical papers). 
(2)Two-phase parsing~ The system first analyzes 
the local expression which is the syntactical 
and semantical unit constituting immediately 
the input sentence, and then analyzes the 
whole sentence by detecting the relationships 
between the units. The first phase, which 
corresponds to the morphological phase in the 
ordinary parser of the European language, is 
designed for analyzing not only the word's 
inflection but the "agglutinated" structure 
characterizing the Japanese language. 
We attach much importance to the first phase 
which has a great influence on the overall 
performance of the system. 
(3)Elaborate preparation for the first phase; In 
the first phase, we adopt an elaborate 
grammatical model that prescribes the 
internal structure of the above-mentioned 
units in detail. The extensive enumeration 
of postpositional expressions carried out 
in the model, among others, is quite 
effective for solving the problem (iii), 
since they determine the syntactical and 
semantical "framework" of the Japanese 
sentence. The inflection of the word can 
also be manipulated almost without exceptions 
in a relatively simple way in this model. 
(4)Matching of the first phase and higher phases; 
Most of the atomic postpositional expressions 
enumerated in the model are idiomatic ones 
which should be treated without decomposing 
into words because of their definite and 
unitary meanings. This fact yields a good 
matching of the first phase and the higher 
semantical phases. 
(5)Disambiguation in the first phase; A certain 
part of the polysemy of the postpositional 
expression can be reduced by the restriction 
for the co-occurence on the neighboring 
positions in the sentence. Our grammar for 
the first phase is designed to carry out 
disambiguation of this type. This is based 
on the idea that the syntactical and 
semantical structure ought to be dis- 
ambiguated as early and as much as possible 
from the viewpoint of the system's total 
efficiency. 
In this paper, the above mentioned grammatical 
model for the first phase of parsing, which may 
be called "pseudo-morphological" phase, is shown 
and the experimental system developed for the 
verification of its validity is outlined. After 
showing some operational examples and the result 
of the experiment, we conclude that our model is 
quite effective for Japanese language processing 
from the standpoints mentioned above. 
Japanese sentence, E-bunsetsu 
The information to be extracted from the input 
sentence by the parser can be generally 
classified into following three types: 
(a)the information of the concept which is 
ordinarily provided by the conceptual word 
(e.g. noun, verb, adjective); 
(b)the information of the relationship between 
concepts; 
(c)the supplementary information such as of 
"tense","aspect","mood",etc. 
Japanese is an agglutinative language and is 
very far from European languages from structural 
viewpoints, i.e. the information of type(b) or 
(c) is ordinarily given by the annex-expression 
agglutinated postpositionally to the conceptual 
expression which gives the information of type 
(a). We call the compound which consists of the 
annex- and conceptual expression E-bunsetsu%. 
The information of type(b) is given as the 
dependency relation, called kakariuke-relation 
between E-bunsetsus. A sentence consists 
immediately of E-bunsetsus positioned in a 
relatively free order except for a few con- 
straints "\['j~. Because of this structural feature, 
we adopt the two-phase approach for the parsing 
of the Japanese sentence: the first phase for 
analyzing each E-bunsetsu; the second, for 
detecting the kakariuke-relational structure of 
the sentence. 
It is apparent that the extensive characteri- 
zation of the E-bunsetsu yields the wide 
coverage of input sentential forms to the system. 
Specifically, the extensive enumeration of the 
annex-expressions will drastically broaden the 
range of acceptable input forms, since they make 
the syntactic and semantic "framework" of the 
sentence. However, the annex- or conceptual 
expression may itself be a compound of atomic 
expressions and is too multiformed to be 
enumerated extensively. 
From these points of view, we have constructed 
a grammatical model for analyzing the E-bunsetsu 
by, first, enumerating extensively atomic 
expressions excepting most of the conceptual 
ones that are quite numerous; secondly, classi- 
fying them by the syntactic and partially 
semantic functions; thirdly, prescribing the 
connectability rules of atomic expressions 
within the E-bunsetsu. 
Atomic Expressions 
The notion of "atomic expression" is the 
extended one of "word" so as to include the 
idiomatic word-string which has the unitary, 
self-supported meaning and the definite 
syntactic function. Though we often encounter 
such idiomatic strings in the sentence of every- 
day use, it has not been clarified exhaustively 
The notion of "bunsetsu" in the conventional 
school grammar is well known as the unit for 
sentence construction. However, the unitary 
local structure in the real sentence used in 
every day life is often too multiformed to be 
analyzed with it. The notion of "E-bunsetsu", 
which is a fully extended version of "bunsetsu", 
was devised from the standpoints mentioned in 
the previous chapter. 
%%When we let a string, EB l EB 2 ''' EB n be a 
sentence~ 'each E-bunsetsucEBi(iSi<n) must depend 
on only one of EBi+i,''°,EB n without passing any 
EBj(i<j) that governs at least one of EB l, ''', 
EBi- I. Moreover, EB n must be predicative. 
how many are needed for building up the 
natural sentence and how they can be used. 
We have singled out the atomic expressions 
extensively excepting most of conceptual ones 
from approximately 12,000 sentences of technical 
papers and text-books of the senior high schools. 
Their rough categorization is sho~1 in the 
following. (The number of the expressions is 
given in parentheses.) 
Annex-expressions 
Atomic annex-expressions are classified into two 
kinds: relational expressions which provide the 
information of type(b); and co-predicative 
expressions which provide the information of 
type(c). 
Relational Expressions(575). While the 
typical example of the relational expression is 
the particle in the conventional grammar, eighty 
percent of the relational expressions are 
idiomatic word-strings. For example, the word- 
string,'ni tsui te' is atomic and relational 
because it has a proper, undividable and self- 
supported meaning equivalent to that of the pre- 
position,'about' in English in such context as 
'Mary ni tsui te hanasu'('talk about Mary'). 
(The original meaning of the verb,'tsuku' is 
almost missing in the context.) 
The atomic annex-expressions can be divided 
roughly into ten categories according to their 
abilities to indicate the kakariuke-relation. 
We denote these categories by RNpi,RNP2,RNp3, 
Rppi,Rpp2 ,Rpp3,RNNi ,RNN2 ,RNN3and RpN. RNp I , 
RNp 2 or RNp 3, for example, is a category of 
expressions which indicate the dependency of the 
nominal E-bunsetsu, N on the predicative E- 
bunsetsu; P. 'hi tsui te' mentioned above is 
included in RNp 1 . 
Co-predicative Expressions(348). The auxil- 
iary verb in the conventional school grammar is 
typically co-predicative but ninety percent of 
the co-predicative expressions singled out are 
idiomatic. For example, the word-string,'ta hou 
ga yoi',which is equivalent to 'had better' in 
English provides the information of the modality. 
These can be divided into seven categories,i.e. 
hnpl ,Anpz ,Anp3 ,Appl ,App2 ,App3 and App~ ac- 
cording to the functions of the connection and 
whether they can inflect or not. Appl, for ex- 
ample, represents a category of inflectable 
expressions each of which yields a predicative 
expression, p by connecting(agglutinating) to a 
predicative expression, p. The atomic expres- 
sion, 'ta hou ga yoi' mentioned above is in App I • 
Conjunctive Expressions(122) 
Besides the traditional conjunction, many con- 
junctive, idiomatic expressions have been 
singled out as atomic ones. For example, the 
string 'sikasi nagara', wich is equivalent to 
'however' in English is conjunctive and atomic. 
The conjuctive expression is not annexational, 
but offers the information of type(b). There 
observed two categories: one, denoted by C I, of 
expressions which can indicate both of the 
relation between two sentences and the relation 
--2 
between two E-bunsetsus; the other ~, denoted by 
C 2, of expressions which indicate exclusively 
the relation between two sentences. 
Suffixal Expressions(403) 
The conceptual expressions are too numerous to 
be enumerated exhaustively. In addition, it is 
difficult in the present state to settle the 
extensive rules for constructing the conceptual 
compound. 
We have singled out only the suffixal constit- 
uents of the conceptual compounds that are used 
very frequently and have definite syntactic and 
semantic functions. These are classified 
roughly into seven categories, i.e. Snpl,Snp 2, 
Spp, Snnl,Snn2,Snn s and Spn, by their functions. 
For example, Snp I , that includes such a string 
as 'de aru' being used quite frequently, is a 
category of expressions each of which consti- 
tutes a predicative conceptual expression,p when 
suffixed to a nominal conceptual expression,n. 
The conceptual compound of quantitative, 
temporal or locational meaning, e.g. '3 zi 15 
hun' ('a quarter past three ~) is sometimes 
exceptionally easy to be decomposed into con- 
stituents. A good many suffixal constituents 
of these compounds are included in Snn I . 
Adverbial Expressions(262) 
The adverbial expressions fall into two cate- 
gories, D 2, for the expression which is always 
used in cooperation with some other specific 
expression and D I, for the rest. For example, 
'kanarazusimo ''' (nai)' ('not necessarily') is 
in D2. 
Adnominal Expressions(165) 
The adnominal expression, such as 'subete no' 
('all') is similar to the adjective except that 
it is uninflective and used always attributively 
being located ahead of the nominal E-bunsetsu to 
be modified. The category of these expressions 
is denoted by T. 
Structure of E-bunsetsu 
The structure of the E-bunsetsu can be charac- 
terized in the form of "transition net", since 
it has no complex embedded structures. Our 
structural characterization is based on pre- 
scribing the connection rules of the atomic 
expressions within an E-bunsetsu. It is shown 
in two stages in this chapter. 
General Structure of E-bunsetsu 
The general structure of E-bunsetsu is shown in 
Fig.l using the above-mentioned categories and 
three traditional ones, Mi,M2 and Y,representing 
for the noun, verbal-noun(i.e, noun called 
initial ) node -~ 
~g 
.~.~t ' i 
{ 
t~ 
6U~', 
< 
~k~< IE~ ~ I~< 
~1.~9 \] : 
~to~ 
Fig. 1. Connection Graph Describing the General Structure of E-bunsetsu 
--3-- 
"sahen-meishi") and yougen(i.e, verb, adjective, 
adjective verb), respectively. In Fig.l, nodes 
represent the categories and arrows denote that 
expressions in starting nodes can be immediately 
followed(agglutinated) by those in ending nodes. 
The E-bunsetsu can be analyzed, though roughly, 
by starting at the initial node and tracing a 
path in the figure. Each node is the acceptable 
node for the E-bunsetsu. The conceptual 
expression corresponds to a path terminating at 
a node located above the dotted line. The syn- 
tactic and semantic function of the E-bunsetsu 
can be estimated by recognizing the terminating 
node in the path. 
Generality of Characterization. In order to 
verify the generality of the characterization 
shown in Fig.l, we have inspected approximately 
1,500 actual sentences in technical papers by 
segmenting each sentence into E-bunsetsus 
applying the above rules. Table 1 shows the 
results of the inspection. From this, it came 
out that our enumeration of annex-expressions 
is almost sufficient and all of the sentences 
inspected can be segmented into E-bunsetsus if 
we newly register and classify the expressions 
missing in the enumeration into existing 
categories. In addition, it turned out that the 
idea of the E-bunsetsu~ which elucidates a 
Table I. Results of Inspection 
number of atomic expressions 
missing in the enumeration: 
annex- 6 
conjunctive 21 
suffixal 0 
adverbial 49 
adnominal 25 
unclassifiable 0 
number of: 
sentences , n 1,532 
bunsetsus , nl 23,432 
E-bunsetsus , n2 20,118 
nl/n 15.3 
n2/n 13.1 
(nl-n2)/nl 0.14 
total appearances of: 
annex-expressions , n3 10,124 
compound annex- 
expressions , n4 1,655 
n4/n3 0.16 
Table 2. Paradigm 
type 
form 
code 
negative-con- adverbial standard adnominal subjunctive 
j ectural form form form form form 
1 2 3 4 5 6 7 8 
imperative stem 
form 
9 A B 
5-vowel, 
I-type 
5-vowel, 
T-type 
5-vowel, 
Q-type 
5-vowel 
type 
1-vowel 
type > 
S-type 
K-type 
W-type 
adj ective-type 
• NA-type 
u NO-type 
> e-type 
T-type 
D-type 
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
A 
B 
C 
D 
a a o i i u u e 
* 
a a o i t u u e 
, 
a a o i q u u e 
a a o i i u u e 
e e yo e e ru ru re 
i e lyo i i uru uru ure 
o o oyo i i uru uru ure 
wa wa o i t u u e 
ku karo ku kat i i kere 
ni na 
ni no 
ni e 
aro a a 
aro at a 
e 
e 
e 
e 
ro 
iro 
oi 
e 
yo 
eyo 
ex.kik 
ex.okor 
ex.sin 
ex.kes 
ex.tozi 
~X.S 
ex.k 
ex.tiga 
ex.yo 
ex.kirei 
ex.hodo 
ex.onazi 
ex. t 
ex.d 
larger structure than a "bunsetsu", is quite 
effective for reducing the load of the second 
phase of the parser because it causes fourteen 
percent decrease of the number of immediate 
constituents of the sentence. Moreover, the 
rate of appearance of the atomic relational 
expressions which are originally compound was 
found to be sixteen percent. These facts assure 
the generality of the characterization to a 
reasonable extent. 
Detailed Structure of E-bunsetsu 
In the course of the development of the natural 
language system, it is a fundamental and crucial 
problem how much the grammatical rule should be 
elaborate or how much the syntactic and semantic 
structure of the sentence should be disambigu- 
ated within the grammatical phase of the 
processing. We think it profitable for 
increasing the total efficiency of the system to 
disambiguate them as much and as early as 
possible. From this point of view/ we try 
to do it in the phase of analyzing the 
E-bunsetsu by refining the characterization of 
Fig.l without destroying its grammatical 
features and generality. 
Inflectional Endings. The word-inflection 
of Japanese language is closely related to the 
agglutination of words. The connection 
represented in Fig.1 by the asterisked arrow 
should be restricted by the inflectional type 
and inflectional form of the preceding expres- 
sion, which is inflectable. 
While subcategorizing the inflectable expres- 
sions by their inflectional types, we gave 
respective expressions in the ending nodes of 
the asterisked arrows a dictionary entry 
denoting what inflectional types and forms it 
can be connected to. The inflectional form is 
known by detecting the ending. Table 2 shows 
the paradigm. The asterisked letter in the 
table is a euphonical one by which the final 
letter of the stem may be replaced. '~' repre- 
sents an empty ending. 
This paradigm (and the experimental system de- 
scribed in the next chapter) is based on a way 
of expressing Japanese characters by English 
letters which is devised from the viewpoints of 
mechanical processing. 
Subcategorization. We subcategorized some of 
the annex-expressions by their detailed aggluti- 
native functions using a formal algorithm ~. 
It should be noted that the homonymous expres- 
sion whose meanings have individual aggluti- 
native functions was categorized duplicatively 
into different categories according to 
respective functions. These expressions' 
Ti.e. to partition the set,E=RNp~RNP2URppiURpp2 
ORpp 3 by the following relation, ~ into 
equivalence classes. 
for Vx,y~E (xRy ~. for VWl,Wz~E ((x*wl+-~ 
Y*Wl) A (w2*x ~+ w2~Y))), where a*b ~. "a can 
be agglutinated by b" 
Table 3. Outline of Subcategorization 
original number of number of 
category expressions subcategories 
relational 
co- 
predicative 
suffixal 
RNPi 153 
RNP2 63 
RNP3 13 
RNNI 1 
RNNz 118 
RNN3 4 
RPN 40 
RPPi 149 
Rpp2 1 
Rpp3 38 
Anpl 37 
Anp2 4 
Anp3 2 
App I 298 
App2 15 
App3 4 
App4 1 
Snnl 288 
Snn2 6 
Snn3 3 
Spn 92 
Snpl ii 
Snp2 1 
Spp 2 
conjunctive 
Di adverbial 
D2 
adnominal T 
noun MI 
M2 
yougen Y 
C1 35 
C2 87 
180 
82 
165 
24 
31 
4 
3 
5* 
2* 
12" 
25 
i0 
4* 
2* 
5 
(14") 
meanings, therefore, can be disambiguated by 
checking the agglutinative structure of the 
E-bunsetsu. 
Suffixal expressions were also subcategorized 
mainly by their semantical functions in order 
to decompose limited types of the conceptual 
compounds in the experimental system. 
The numerical outline of these refinements of 
the categories is given in Table 3. The aster- 
isk in the table implies the subcategorization 
based on the inflectional type. 
Refined Connection Rules. The connection 
rules were refined by using the finally obtained 
categories that amount to 142. The number of 
these rules is approximately 3,600. 
Table 4 shows some examples of the rule and of 
the expression. 
Table 4. Examples of Refined Rules 
succeedable subcategory examples of expressions (their meanings) categories 
R01 
R02 
R19 
R23 
R27 
R36 
R37 
R38 
R55 
R56 
R62 
'ga' (AGENT,OBJ-1,'''),'no' (AGENT,''') 
'wo' (OBJ-I,''') 
'wo motii te' (INSTRUMENT),'ni tui te' (OBJ-i,SITUATION),''' 
'hi tui te' (NUMBER-2"RATE,'''),'atari' (NUMBER-2"RATE,''') 
'he' (DIRECTION), 'made' (DIRECTION), •" • 
"ha' (AGENT \[THEME\], OBJ-i \[THEME\], • • • ) 
'mo' (AGENT \[ADDITION\] ,.'') ,'mo mata' (AGENT \[ADDITION\],''.) ,''' 
'koso' (AGENT \[STRESS\] ,OBJ-i \[STRESS\], ' ' ' ) 
' made' (AGENT \[STRESS-OTHER\], OBJ-i \[STRESS-OTHER\] ," • • ) 
' dake ' (AGENT \[LIMITATION\], • ' • ), • • • 
'made' (AGENT \[T-POINT\] , • • .) , • • • 
R37,R38,R55,.." 
R36~R38,R55,R56, ... 
R36~R38,R55,R56, ... 
R01,R02,R19,R36,... 
R01,R02,Ri9,R36,... 
R01,R02,Ri9,R27,R36,... 
R01,R02,RI9,R36~R38, ... 
Experiment 
Overview 
The Japanese sentence is ordinarily written in 
kana(phonetic) letters and Chinese(ideographic) 
characters without leaving a space between words. 
From the viewpoint of machine-processing, 
however, it is preferable to express clearly the 
units composing the sentence in such a way as to 
leave a space between every word as in English. 
We have no standard way of spacing the units 
though the need for this has been demanded for 
a long time. Supposing tentatively that a 
sentence is written in English letters with a 
space between each E-bunsetsu, we have developed 
an experimental system which decomposes the 
input E-bunsetsu into atomic expressions using 
the refined rules and decides its function. 
The system is overviewed as follows: 
(i) The system consists of five components: a 
program; a dictionary of atomic expressions; 
a table of the connection rules; a paradigm; 
and a table of euphonical rules(not 
mentioned in this paper); 
(2) Each entry expression is given one or more 
triple of the information in the dictionary. 
A triple consists of a code of the (refined) 
category such as A48 or R56, a code of the 
inflectional condition of the connection, 
and a code of the meaning; 
(3) As to the inflectable expression, the 
dictionary includes only its stem; 
(4) E-bunsetsu is decomposed from left to right 
on it by the "longest-match method" and 
all possible analyses are tried in the 
"depth-first" manner; 
(5) The category code such as Mi3 or Y05, of 
the noun or yougen is used in the input and 
dictionary for the actual expression in it. 
Op_erational Examples 
Operational examples follow. The string of 
letters parenthesized in the output description 
is the inflectional ending and '/' denotes the 
boundary between the conceptual expression and 
the annex-expression detected by the system. 
The arrows in the following illustration show 
the string of categories which corresponds to 
a leftmost substring of the input and is 
assured to be successful by both of the connec- 
tion rules of the category level and the inflec- 
tional conditions given in the dictionary. On 
the other hand, the dotted arrow shows that the 
connection is allowed by the rule of the cate- 
gory level but not by the rule of the inflec- 
tional level. 
Example i. 
input = YO60NAKATTATAME (~h,~t:t:~) 
output : 
segmentation = Y06(0)/NA(KAT) T(A) TAME 
categories = Y06 A48 A4C R91 
function = P MODIFYING P 
Without checking the refined rules ( of two 
levels: the category level, and inflectional 
level), the following two decompositions would 
have been obtained. 
Y06 (0)/NA (KAT) T(A) TAME i-i 
Y06, ) A48 > A4C~-~R91 
"'~A60 '~R91 
~A4A 
~A4A 
Y06 (O) NA (KAT) T (A) T (A) ME 1-2 
• • • • • 
Y06. ~A48 ~A4C--~A4C 
"'~ A60 
While the decomposition i-I is successful, 1-2 
was rejected because the auxiliary verb,'ta' is 
prohibited from being connected to the preceding 
auxiliary verb,'ta' by the inflectional rule. 
The triples given in the dictionary to 'tame' 
are as follows: 
{R91; "connectable to adnominal forms 
of all types"; CAUSE.REASON }; 
{R91; "connectable to adnominal forms 
--6 
of verb types"; PURPOSE }; 
{A4A; "connectable to adnominal forms 
of all types"; CAUSE.REASON }; 
{A4A; "connectable to adnominal forms 
of verb types"; PURPOSE }. 
In i-i, since the inflectional type of 'ta' is 
not verbal, the second and fourth triples are 
not acceptable. In addition, the third one is 
unavailable since the ending form of the input 
E-busetsu results to be a stem, and inadequate. 
Finally, only the first one was accepted and 
at the same time the meaning of 'tame' was dis- 
ambiguated. 
Example 2. 
input = YO8SANIMOTODUITESIKA(~ ~ ~Ic~,~l~) 
output : 
segmentation = Y08 SA/NIMOTODUITE SIKA 
categories = Y08 S45 RI9 R42 
function = N MODIFYING P 
Without using the rules, the following three 
kinds of decompositions would have been possible. 
Y08 SA/NIMOTODUITE SIKA 2-1 
: : : : 
Y08--~$45 > Ri9- >R42 
R92 
TO8 SA NIMOTODUITE SI KA 2-2 
: : : : : 
Y08--~S45---~Ri9 R95 
Y08 SA NIMOTODUITE S(I) KA 2-3 
: : : : : 
Y08--~S45 > Ri9 S35 
• he atomic expression, 'si' in 2-2 and 'si' in 
2-3, which are understood as a conjunctive verb, 
and a suffixal expression, respectively, can not 
be connected to 'nimotoduite' 
Example 3. 
input = MI4TEKINANODEHANAI. (~r~I~<,,) 
output : 
segmentation 1 = M|4 TEKI/NANODEHANA(1) 
categories 1 = M\]4 $29 A48 
function 1 = P IN THE SENTENCE-FINAL POSITION 
segmentation 2 = Ml4 TEKI(NA) NO/DEHANA(1) 
categories 2 = Ml4 $29 $47 Al8 
function 2 = P IN THE SENTENCE-FINAL POSITION 
The result was twofold according to two sorts 
of interpretations of 'no':the first one is to 
understand it has no special meaning; the 
second, it is a suffixal variant of the noun, 
'mono' ('thing'). There exist latently following 
eight different decompositions but only 3-1 and 
3-6 were accepted by the rules. 
Mi4 TEKI/NANODEHANA(I) 3-1 
: : : 
M14---~$29 )A48 
Ai8 
MI4 TEKI NANODE HA NA(I) 3-2 
Mi4 TEKI NANOD(E) HA NA(I) 3-3 
Mi4 TEKI(NA) NODEHANA(I) 3-4 
Mi4 TEKI(NA) NODE HA NA(I) 3-5 
Mi4 TEKI(NA) NO/DEHANA(I) 3-6 
: : : : 
M14---~$29. R70 Ai8 
ROi A48 
Mi4 TEKI(NA) NO DE HA NA(I) 3-7 
Mi4 TEKI(NA) NO D(E) HA NA(I) 3-8 
AS for 3-6, it was understood that the atomic 
expression,'no' was not a particle(R70) which 
indicates a kakariuke relation between two 
nominal E-bunsetsus or a particle(R01) of the 
meaning of AGENT, but a suffixal expression(S47) 
which nominalizes the predicative expression. 
Example 4. 
input = M2ODEKINAKUNARUTO(~ ~ ~ <~ ~) 
output : 
segmentation 1 = M20/DEKI() NAKUNAR(U) TO 
categories 1 = M20 A24 A4\] R92 
function 1 = P MODIFYING P 
segmentation 2 = M20/DEKI() NAKUNAR(U) TO 
categories 2 = M20 A24 A4\] R94 
function 2 = P MODIFYING P 
The decomposition was unique but the interpre- 
tation of 'to' was twofold as follows. 
M20 / DEKI () NAKUNAR(U) TO 
M20---~A24 > A41. R03   
R19 
R72 
$90 
In the first interpretation, 'to' is a conjunc- 
tive particle of the meaning,ASSUMPTION, and in 
the second, it is a particle of the meaning, 
QUOTATION. This ambiguity is, therefore, quite 
reasonable. 
Results of Experiments 
We show the results of experiments made for 162 
E-bunsetsus in Table 5 and 6. The average 
number of atomic expressions composing an E- 
Table 5. Ambiguity of Decomposition 
number of decompositions number of E-bunsetsus 
zero 1 
(not decomposable) 
one 158 
two 3 
more than or 
equal to three 0 
--7-- 
Table 6. Ambiguity of Category Sequence 
number of category sequences number of 
per a single decomposition decompositions 
1 145 
2 12 
3 1 
4 3 
5 2 
8 1 
bunsetsu fed to the system has been 4.8. The 
ambiguities of both the decomposition and the 
category sequence have been reduced sufficiently. 
Most of the ambiguities left by the system have 
been quite reasonable in the sense that further 
reductions of them would require more detailed 
information from the outside of the E-bunsetsu. 
In addition, the ambiguities to be left to 
higher phases of parsing for reduction have not 
been reduced by the system. 
As exemplified in Example i., the disambiguation 
of the atomic expression's meaning is carried 
out by selecting the triple of functional 
information given in the dictionary. Nine per- 
cent of the entry expressions are given plural 
triples and then their meanings can be reduced 
by our rules on the bases of its structural 
surroundings in the E-bunsetsu. 
Conclusions 
Extending the domain of input sentential forms 
of the natural language processing system 
enables, in principle, the system to manipulate 
more precice or delicate meanings and to commu- 
nicate with men more naturally. Our grammat- 
ical model presented in this paper is so compre L 
hensive that the local structures of colloquial 
and written sentences actually used in everyday 
life can almost always be analyzed with it. 
It is also elaborate enough to reduce the 
syntactic and semantic ambiguities of the local 
structure. It should be noted that the local 
structure analyzed by our grammar plays a quite 
important role in the Japanese language proces- 
sing because it is not only a larger structure 
which can include idiomatic strings of words 
than a bunsetsu, but also a syntactic and seman- 
tic unit for sentence construction. 
Every atomic expression, which is the smallest 
component of the sentence, has been chosen to 
have undividable and self-supported meaninqs. 
Though we have not mentioned it in detail in 
this paper, we have already settled exten- 
sively the meanings of annex-expressions by 
classifying them. 
--8-- 

REFERENCES 

\[i\] K.Shudo:"On Machine Translation from Japa- 
nese into English for a Technical Field", 
Information Processing in Japan,14 (1974). 

\[2\] K.Shudo,H.Tsurumaru & S.Yoshida:"A Predi- 
cative Part Processing System for Japanese- 
English Machine Translation"(in Japanese), 
the Trans. of the IECE of Japan,J60-D,10 
(1977) --- Abstract in English,E60-D,10 
(1977). 

\[3\] K.Shudo,T.Fujita & S.Yoshida:"On the Proces- 
sing of Annexational Expressions in Japa- 
nese", the Proc. of the 7th International 
Conference on Computational Linguistics 
(COLING 78) (1978). 

\[4\] K.Shudo,T.Narahara & S.Yoshida:"A Structural 
Model of Bunsetsu for Machine Processing of 
Japanese"(in Japanese), the Trans. of the 
IECE of Japan, J62-D, 12 (1979) --- Abstract 
in English, E62-D, 12 (1979). 

\[5\] K.Shudo:"Studies on Machlne Processing of 
Japanese Using a Structural Model of 
Bunsetsu"(in Japanese), the Bulletin of the 
Institute for Advanced Research of Fukuoka 
University, 45 (1980). 
