<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1108">
  <Title>Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese Hideki Kashioka, Yasuhiro Kawata, Yumiko Kinjo,</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
ATR Interpreting Telecommunications Research Laboratories
Abstract
</SectionTitle>
    <Paragraph position="0"> For languages whose character set is very large and whose orthography does not require spacing between words, such as Japanese, tokenizing and part-of-speech tagging are often the difficult parts of any morphological analysis. Practical systems tackle this problem primarily with uncontrolled heuristics. The use of information on character sorts, however, mitigates this difficulty. This paper presents our method of incorporating character clustering based on mutual information into Decision-Tree Dictionary-less morphological analysis. Using natural classes, we have confirmed that our morphological analyzer improves significantly in both tokenizing and tagging Japanese text.</Paragraph>
  </Section>
</Paper>