JAPANESE SENTENCE ANALYSIS SYSTEM ESSAY - EVALUATION 
OF DICTIONARY DERIVED FROM REAL TEXT DATA
K. Shirai, J. Kubota, Y. Hayashi
Department of Electrical Engineering, WASEDA University 
In this paper, we report on an experimental system for
Japanese sentence analysis called ESSAY.
Many Japanese sentence analysis systems, not only phrase
structure analysis systems but also Kakari-Uke analysis
systems, are usually based on syntactic-level rules or
case-grammatical restrictions.
Compared with such systems, our system is unique in
its dictionary. In this dictionary, the functions of language
elements, such as words or auxiliary morphemes, are described,
and these lexical entries are automatically constructed from
an analysis of real text data.
In order to evaluate the usefulness of such a dictionary,
we are accumulating Japanese sentence data and applying
statistical and structural analysis methods to this data.
In the following, we concentrate on two points:
(1) construction of the dictionary
(2) overview of ESSAY
(1) Construction of dictionary
As the initial data, we entered about 2000 sentences from
elementary school texts in Kana letters (the Japanese
syllabary), not in Kanji (Chinese characters).
Japanese is an agglutinative language, so in analyzing
sentences they are usually separated into a number of parts
called Bunsetsu. In entering the text, we also used this unit.
Between these Bunsetsu there are dependency relations called
Kakari-Uke, which can be decided uniquely for any sentence.
When there is a Kakari-Uke relation between words A and B,
we can consider that A modifies B.
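As a purely illustrative sketch (not taken from ESSAY), a Bunsetsu-segmented sentence and its Kakari-Uke relations could be held as a list of head indices. The romanized example sentence, the function name check_kakari_uke, and the two constraints it checks (every Bunsetsu except the last modifies a later one, and relations do not cross) are assumptions; they are commonly made in Japanese dependency analysis but are not stated in this paper.

```python
def check_kakari_uke(heads):
    """Check a hypothetical head list for a Bunsetsu-segmented sentence.

    heads[i] is the index of the Bunsetsu that Bunsetsu i modifies;
    the last Bunsetsu modifies nothing, so its entry is None.
    """
    n = len(heads)
    if n == 0 or heads[-1] is not None:
        return False
    # Assumed constraint 1: each non-final Bunsetsu modifies a later one.
    for i, h in enumerate(heads[:-1]):
        if h is None or not (i < h < n):
            return False
    # Assumed constraint 2: relations do not cross each other.
    for i in range(n - 1):
        for j in range(i + 1, heads[i]):
            if heads[j] > heads[i]:
                return False
    return True


# Hypothetical example: "akai hana-ga niwa-ni saita"
# (red / flower / in the garden / bloomed).
bunsetsu = ["akai", "hana-ga", "niwa-ni", "saita"]
heads = [1, 3, 3, None]      # akai -> hana-ga, hana-ga -> saita, niwa-ni -> saita
valid = check_kakari_uke(heads)
crossing = check_kakari_uke([2, 3, 3, None])  # akai -> niwa-ni crosses hana-ga -> saita
```

Under these assumptions, one head index per Bunsetsu is enough to describe the whole analysis of a sentence.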
We defined the distance between words mainly on the
basis of this Kakari-Uke relation, and then classified the
words into a number of groups using clustering techniques. As
a result, we obtained a base dictionary which can represent
the Kakari-Uke relations between these groups.
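The paper does not give the distance measure or the clustering technique, so the following is only a minimal sketch under stated assumptions: each word is described by counts of the words it modifies, the distance is one minus cosine similarity, and groups are formed by single-link agglomerative clustering with a threshold. The word pairs and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def head_profiles(pairs):
    """Map each modifying word to counts of the words it modifies."""
    profiles = defaultdict(Counter)
    for modifier, head in pairs:
        profiles[modifier][head] += 1
    return profiles


def distance(p, q):
    """Assumed distance: 1 - cosine similarity of two count profiles."""
    dot = sum(p[h] * q[h] for h in p if h in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def cluster(profiles, threshold):
    """Single-link agglomerative clustering, stopping at the threshold."""
    clusters = [{w} for w in sorted(profiles)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical (modifier, modified) pairs taken from Kakari-Uke relations.
pairs = [("inu", "hashiru"), ("neko", "hashiru"),
         ("inu", "naku"), ("neko", "naku"),
         ("hon", "yomu"), ("tegami", "yomu")]
groups = cluster(head_profiles(pairs), threshold=0.5)
```

Here words that modify the same words fall into the same group, which is the kind of grouping a base dictionary of relations between groups would need; any other distance or clustering method could be substituted.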
It is expected that syntax, semantics, and knowledge of
the world can be naturally embedded in this dictionary, and
that this type of lexicon is highly useful in Japanese sentence
analysis.
(2) Overview of ESSAY 
ESSAY (Experimental System of Sentence AnalYsis) parses
Japanese sentences by analyzing the Kakari-Uke relations
between the Bunsetsu in input sentences.
This system is dictionary driven and does not depend on
the usual syntactic and semantic models. Thus it can be used
to evaluate the dictionary described in (1).
The input to this system is a Japanese sentence
segmented into Bunsetsu units, and the output is a labelled
binary tree structure which represents the syntactic
structure of the input sentence.
The algorithm to extract this structure is very simple,
and no special linguistic knowledge is embedded in a
procedural way. The tree structure is decided by
graph-theoretic processing, and the Kakari-Uke relations are
labelled using statistical decision theory.
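The actual ESSAY algorithm is not given here, so the following is only a sketch of one way such a procedure could look. It assumes that each Bunsetsu modifies a later one, that a hypothetical function score(i, j) returns the most probable label and a weight for Bunsetsu i modifying Bunsetsu j (standing in for a dictionary lookup), and that the best tree is the one with the largest total weight, found by dynamic programming over contiguous spans.

```python
from functools import lru_cache


def parse(bunsetsu, score):
    """Return (total_weight, tree) for the best-scoring binary tree.

    score(i, j) is a hypothetical lookup returning (label, weight) for
    Bunsetsu i modifying Bunsetsu j. A tree is either an index (leaf)
    or a tuple (label, left_tree, right_tree).
    """
    if not bunsetsu:
        raise ValueError("empty sentence")

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:
            return 0.0, i
        candidates = []
        for k in range(i, j):
            # The left part ends at k, the right part ends at j,
            # and joining them means Bunsetsu k modifies Bunsetsu j.
            left_weight, left = best(i, k)
            right_weight, right = best(k + 1, j)
            label, weight = score(k, j)
            candidates.append((left_weight + right_weight + weight,
                               (label, left, right)))
        return max(candidates, key=lambda c: c[0])

    return best(0, len(bunsetsu) - 1)


# Hypothetical sentence and weights: "akai hana-ga saita"
# (red / flower / bloomed).
sentence = ["akai", "hana-ga", "saita"]
table = {(0, 1): ("adnominal", 0.9),
         (0, 2): ("adverbial", 0.1),
         (1, 2): ("subject", 0.8)}
total, tree = parse(sentence, lambda i, j: table[(i, j)])
```

All linguistic knowledge in this sketch sits in the table passed to parse, and the procedure itself contains none, which mirrors the declarative, dictionary-driven design described in the text.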
As stated above, this system holds its linguistic
knowledge declaratively, in the form of a dictionary; thus
the structure of the system is simple and highly modular.
Procedural knowledge can nevertheless be easily implemented
if we need it.
By taking this approach, it is possible to construct a
flexible system with a rich ability to adapt to a specified
world. This is one of the merits of our approach in comparison
with usual approaches, which tend to depend on the
researcher's framework.
In this paper, we present several experimental results
which show the validity of our approach.
