File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/98/p98-1108_concl.xml
Size: 2,048 bytes
Last Modified: 2025-10-06 13:58:04
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-1108"> <Title>Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese Hideki Kashioka, Yasuhiro Kawata, Yumiko Kinjo,</Title> <Section position="8" start_page="661" end_page="661" type="concl"> <SectionTitle> 7 Conclusion and Discussion </SectionTitle> <Paragraph position="0"> Both results show that the use of character clusters significantly improves both tokenizing and tagging at every stage of the training. Given these results, our model with MI-based character clusters is useful for assigning parts of speech, for finding word boundaries, and for overcoming the unknown word problem.</Paragraph> <Paragraph position="1"> The consistent experimental results obtained from training data with different word boundaries and different tag sets in the Japanese text suggest that the method is generally applicable to corpora constructed for different purposes. We believe that with the appropriate number of adequate questions, the method is transferable to other languages that have word boundaries not indicated in the text.</Paragraph> <Paragraph position="2"> 1 These include common noun, verb, post-position, auxiliary verb, adjective, adverb, etc. The purpose of this tag set is to perform machine translation from Japanese to English, German and Korean.</Paragraph> <Paragraph position="3"> In conclusion, we should note that our method, which does not require a dictionary, has been significantly improved by the character cluster information provided.</Paragraph> <Paragraph position="4"> Our plans for further research include investigating how accuracy correlates with the training data size and the number of questions, as well as exploring methods for factoring information from a &quot;dictionary&quot; into our model. 
Along these lines, a fruitful approach may be to explore methods of coordinating probabilistic decision trees to obtain higher accuracy.</Paragraph> </Section> </Paper>