File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/abstr/82/c82-2064_abstr.xml
Size: 4,151 bytes
Last Modified: 2025-10-06 13:46:03
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-2064"> <Title>JAPANESE SENTENCE ANALYSIS SYSTEM ESSAY - EVALUATION OF DICTIONARY DERIVED FROM REAL TEXT DATA</Title> <Section position="1" start_page="0" end_page="0" type="abstr"> <SectionTitle> JAPANESE SENTENCE ANALYSIS SYSTEM ESSAY - EVALUATION OF DICTIONARY DERIVED FROM REAL TEXT DATA </SectionTitle> <Paragraph position="0"> K. Shirai, J. Kubota, Y. Hayashi Department of Electrical Engineering, WASEDA University In this paper, we report on an experimental system for Japanese sentence analysis, called ESSAY. Many Japanese sentence analysis systems, not only phrase structure analysis systems but also Kakari-Uke analysis systems, are usually based on rules at the syntactic level or on case-grammatical restrictions.</Paragraph> <Paragraph position="1"> Compared with such systems, our system is unique in its dictionary. In this dictionary, the functions of language elements, such as words or auxiliary morphemes, are described, and these lexical entries are automatically constructed from analysis of real text data. In order to evaluate the usefulness of such a dictionary, we are accumulating Japanese sentence data and applying statistical and structural analysis methods to this data. In the following we concentrate on the next two points.</Paragraph> <Paragraph position="2"> (1) construction of dictionary (2) overview of ESSAY</Paragraph> <Paragraph position="4"> As the initial data we entered about 2000 sentences of elementary school text in Kana letters (Japanese syllabary), not in Kanji (Chinese characters).</Paragraph> <Paragraph position="5"> Japanese is an agglutinative language, so in analyzing sentences they are usually separated into a number of parts called Bunsetsu. In entering the text at this time, we also used this unit. Between these Bunsetsu, there are dependency relations called Kakari-Uke, which can be decided uniquely for any sentence. 
We can consider that when there is a Kakari-Uke relation between words A and B, A modifies B. This time we defined the distance between words mainly based on this Kakari-Uke relation, and then classified them into a number of groups using some clustering techniques. As a result we got a base dictionary which can represent the Kakari-Uke relations between these groups.</Paragraph> <Paragraph position="6"> It is expected that syntax, semantics or knowledge of the world can be naturally embedded in this dictionary and that this type of lexicon is highly useful in the Japanese sentence [...] Japanese sentence by analyzing Kakari-Uke relations between Bunsetsu in input sentences.</Paragraph> <Paragraph position="7"> This system is dictionary-driven and does not depend on the usual syntactic and semantic models. Thus this system can be used for evaluation of the dictionary, which is described in (1).</Paragraph> <Paragraph position="8"> The input to this system is a Japanese sentence segmented into Bunsetsu units, and the output from this system is a labelled binary tree structure, which represents the syntactic structure of the input sentence.</Paragraph> <Paragraph position="9"> The algorithm to extract this structure is very simple, and no special linguistic knowledge is embedded in a procedural way. The decision of the tree structure is based on graph-theoretic processing, and the labelling of Kakari-Uke relations is performed using statistical decision theory. As stated above, this system holds its linguistic knowledge declaratively, in the form of a dictionary; thus the structure of the system is simple and highly modular. But procedural knowledge can be easily implemented if we need it.</Paragraph> <Paragraph position="10"> By taking this approach, it is possible to construct a flexible system with a rich ability to adapt to a specified world. 
This point is one of the merits of our approach in comparison with usual approaches, which tend to depend on the researcher's framework.</Paragraph> <Paragraph position="11"> In this paper, we present several experimental results which show the validity of our approach.</Paragraph> </Section> </Paper>