<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1108">
  <Title>Use of Mutual Information Based Character Clusters in Dictionary-less Morphological Analysis of Japanese Hideki Kashioka, Yasuhiro Kawata, Yumiko Kinjo,</Title>
  <Section position="7" start_page="660" end_page="661" type="evalu">
    <SectionTitle>
6 Experimental Results
</SectionTitle>
    <Paragraph position="0"> We tested our morphological analyzer on two different corpora: a) ATR-travel, which consists of task-oriented dialogue in a travel context, and b) the EDR Corpus (EDR 1996), which consists of rather general written text.</Paragraph>
    <Paragraph position="1"> For each experiment, we used the character clustering based on mutual information (MI). The question sets for the decision trees were prepared separately, with and without questions concerning the character clusters. Evaluations were made against the original tagged corpora, from which both the training and test sentences were taken.</Paragraph>
    <Paragraph position="2"> The analyzer was trained on incrementally enlarged sets of training data, both with and without character clustering. 15 Table 1 shows the results obtained with training sets from ATR-travel. In each cell, the upper figure is the result with the character clusters and the lower figure is the result without them. The actual test set of 4,147 sentences (55,544 words) was taken from the same domain.</Paragraph>
    <Paragraph position="3"> 15 Another 2,231 sentences (28,933 words) in the same domain were used for smoothing.</Paragraph>
    <Paragraph position="4"> The MI-word clusters were constructed according to the domain of the training set. The tag set consisted of 209 part-of-speech tags. 16 For the word model decision tree, three of 69 questions concerned the character clusters; for the tagging model decision tree, three of 63 did. The presence or absence of these questions was the parameter that distinguished the two conditions.</Paragraph>
    <Paragraph position="5"> The analyzer was also trained on the EDR Corpus. The same character clusters as with the conversational corpus were used. The tag set of this corpus consisted of 15 parts of speech. We prepared 45 questions for the word model and 18 for the tagging model; only a couple of them concerned the character clusters. The results are shown in Table 2.</Paragraph>
  </Section>
</Paper>