<?xml version="1.0" standalone="yes"?>
<Paper uid="W04-3230">
  <Title>Applying Conditional Random Fields to Japanese Morphological Analysis</Title>
  <Section position="6" start_page="0" end_page="0" type="evalu">
    <SectionTitle>
4.2 Results
</SectionTitle>
    <Paragraph position="0"> Tables 3 and 4 show the experimental results on KC and RWCP respectively. The three F-scores (seg/top/all) are listed for our CRFs and for the baseline bi-gram HMMs.</Paragraph>
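To make the seg metric concrete, here is a minimal sketch (ours, not the paper's evaluation script) of a token-level F-score computed from word boundaries alone; the top and all scores would additionally require the POS tags to match. The helper names and toy tokens are hypothetical.

```python
def token_f_score(gold_tokens, sys_tokens):
    """Token-level F-score for segmentation ("seg"): a system token counts
    as correct only if both of its character boundaries match the gold."""
    def spans(tokens):
        out, i = [], 0
        for t in tokens:
            out.append((i, i + len(t)))
            i += len(t)
        return set(out)

    gold, sys = spans(gold_tokens), spans(sys_tokens)
    correct = len(gold.intersection(sys))
    precision = correct / len(sys)
    recall = correct / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# hypothetical example: the system merges two gold tokens into one
print(token_f_score(["私", "の", "本"], ["私の", "本"]))
```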
    <Paragraph position="1"> In Table 3 (KC data set), the results of a variant of maximum entropy Markov models (MEMMs) (Uchimoto et al., 2001) and of a rule-based analyzer (JUMAN) are also shown. To make a fair comparison, we use exactly the same data as (Uchimoto et al., 2001).</Paragraph>
    <Paragraph position="2"> In Table 4 (RWCP data set), the result of the extended HMMs (E-HMMs) (Asahara and Matsumoto, 2000), trained and tested with the same corpus, is also shown. E-HMMs are used in the current implementation of ChaSen. Details of E-HMMs are described in Section 4.3.2.</Paragraph>
    <Paragraph position="4"> [Table 2 note: p1/p1' and p2/p2' are the top and sub categories of POS; cf/cf' and ct/ct' are the cform and ctype, respectively; bw/bw' are the base forms of the words w/w'.]</Paragraph>
    <Paragraph position="6"> We directly evaluated the differences between these systems using McNemar's test. Since there is no standard method to assess the significance of F-scores, we convert the outputs into character-based B/I labels and then apply McNemar's paired test to the labeling disagreements. The same evaluation was used in (Sha and Pereira, 2003). The results of McNemar's test suggest that L2-CRFs are significantly better than the other systems, including L1-CRFs. The overall results support the effectiveness of morphological analysis based on CRFs.</Paragraph>
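The B/I conversion and test statistic described above can be sketched as follows; the function names and toy segmentations are ours, and the continuity-corrected chi-square form is one common variant of McNemar's test, not necessarily the exact one used in the paper.

```python
def to_bi_labels(tokens):
    """Convert a segmentation (list of tokens) to character-based B/I labels."""
    labels = []
    for tok in tokens:
        labels.append("B")
        labels.extend("I" * (len(tok) - 1))
    return labels

def mcnemar_chi2(gold, sys_a, sys_b):
    """McNemar statistic over per-character labeling disagreements:
    b = characters A labels correctly but B does not;
    c = characters B labels correctly but A does not."""
    g_lab = to_bi_labels(gold)
    a_lab = to_bi_labels(sys_a)
    b_lab = to_bi_labels(sys_b)
    b = sum(1 for g, x, y in zip(g_lab, a_lab, b_lab) if x == g and y != g)
    c = sum(1 for g, x, y in zip(g_lab, a_lab, b_lab) if x != g and y == g)
    if b + c == 0:
        return 0.0
    # continuity-corrected chi-square with 1 degree of freedom
    return (abs(b - c) - 1) ** 2 / (b + c)

gold  = ["太郎", "は", "花子", "が", "好き", "だ"]
sys_a = ["太郎", "は", "花子", "が", "好き", "だ"]   # perfect output
sys_b = ["太郎", "は", "花", "子", "が", "好きだ"]   # two boundary errors
print(mcnemar_chi2(gold, sys_a, sys_b))
```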
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.3 Discussion
</SectionTitle>
      <Paragraph position="0"> Uchimoto et al. proposed a variant of MEMMs trained with a large number of features (Uchimoto et al., 2001). Although they improved the accuracy for unknown words, their model fails to segment some sentences that are correctly segmented by HMMs or rule-based analyzers.</Paragraph>
      <Paragraph position="1"> Figure 3 illustrates sentences that are incorrectly segmented by Uchimoto's MEMMs. The correct paths are indicated by bold boxes. Uchimoto et al. concluded that these errors were caused by non-standard entries in the lexicon: " x" (romanticist) and "sM" (one's heart) are unusual spellings, normally written as "" and "" respectively. However, we conjecture that these errors are caused by the influence of the length bias. To support this claim, we note that these sentences are correctly segmented by CRFs, HMMs and rule-based analyzers using the same lexicon as (Uchimoto et al., 2001). Because of the length bias, short paths are preferred to long paths; thus the single token " x" or "sM" is likely to be selected over the multiple tokens "/x" or "s M/". Moreover, "" and " x" have exactly the same POS (Noun), so the transition probabilities of these tokens become almost equal. Consequently, there is no choice but to select the short path (the single token) in order to maximize the whole-sentence probability.</Paragraph>
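A toy numeric sketch of the length bias argument above: under local (per-state) normalization as in MEMMs, every outgoing distribution sums to 1, so each extra token multiplies in another factor below 1, and a path with fewer tokens can win even when each of its local decisions is less confident. The probabilities and tokens below are invented for illustration.

```python
def path_prob(local_probs):
    """Product of locally normalized step probabilities along one path."""
    p = 1.0
    for q in local_probs:
        p *= q
    return p

short_path = [0.55]        # one merged token, e.g. an unusual lexicon entry
long_path  = [0.70, 0.70]  # two tokens, each step individually more confident

print(path_prob(short_path))   # the one-token path
print(path_prob(long_path))    # the two-token path loses despite stronger steps
```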
      <Paragraph position="2"> Table 5 summarizes the number of errors in HMMs, CRFs and MEMMs, using the KC data set.</Paragraph>
      <Paragraph position="3"> Two types of errors, l-errors and s-errors, are given in this table: an l-error means that the system outputs a longer token than the correct one, and an s-error means that it outputs a shorter token than the correct one. By the length bias, long tokens are preferred to short tokens, so a larger number of l-errors implies that the result is strongly influenced by the length bias.</Paragraph>
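The l-error/s-error classification can be sketched by aligning system tokens with gold tokens at shared character offsets; this simplified version (ours) only classifies system tokens that start at a gold token boundary.

```python
def count_length_errors(gold, sys):
    """Count l-errors (system token longer than the gold token starting at
    the same offset) and s-errors (shorter), by character span."""
    def spans(tokens):
        out, i = [], 0
        for t in tokens:
            out.append((i, i + len(t)))
            i += len(t)
        return out

    gold_spans = set(spans(gold))
    gold_end_by_start = {s: e for s, e in spans(gold)}
    l_err = s_err = 0
    for s, e in spans(sys):
        if (s, e) in gold_spans:
            continue                     # correct token, not an error
        g_end = gold_end_by_start.get(s)
        if g_end is None:
            continue                     # starts mid-gold-token; skipped here
        if e > g_end:
            l_err += 1
        else:
            s_err += 1
    return l_err, s_err

# hypothetical example: the system merges two gold tokens (one l-error)
print(count_length_errors(["東京", "都", "に"], ["東京都", "に"]))
```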
      <Paragraph position="4"> While the relative rates of l-errors and s-errors are almost the same for HMMs and CRFs, the number of l-errors with MEMMs amounts to 416, which is 70% of the total errors and even larger than that of the naive HMMs (306). This result supports our claim that MEMMs are not well suited to Japanese morphological analysis, where the length bias is unavoidable.</Paragraph>
      <Paragraph position="5">  Asahara et al. extended the original HMMs by 1) position-wise grouping of POS tags, 2) word-level statistics, and 3) smoothing of word- and POS-level statistics (Asahara and Matsumoto, 2000). All of these techniques are designed to capture the hierarchical structure of POS tagsets. For instance, in position-wise grouping, the optimal level of the POS hierarchy is changed according to the context, and the best hierarchy for each context is selected by hand-crafted rules or automatic error-driven procedures. CRFs realize such extensions naturally and straightforwardly: position-wise grouping and word-POS smoothing are simply integrated into the design of the feature functions, and the parameter λk of each feature is set automatically by general maximum likelihood estimation. As shown in Table 2, we can employ a number of templates to capture POS hierarchies. Furthermore, overlapping features (e.g., forms and types of conjugation) can be used, which was not possible in the extended HMMs.</Paragraph>
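The point that grouping and smoothing reduce to feature-function design can be sketched as template expansion: features at several levels of the POS hierarchy fire simultaneously and maximum likelihood weighs them, instead of rules picking one level per context. The field names and tag values below are illustrative, not the paper's actual templates.

```python
def features(token):
    """Emit overlapping features at several granularities for one token."""
    feats = [
        "top_pos=" + token["top_pos"],                       # coarse POS level
        "pos=" + token["top_pos"] + "-" + token["sub_pos"],  # fine POS level
        "word=" + token["surface"],                          # lexicalized
        "word_pos=" + token["surface"] + "/" + token["top_pos"],
    ]
    # overlapping conjugation features, which extended HMMs could not combine
    if token.get("ctype"):
        feats.append("ctype=" + token["ctype"])
    if token.get("cform"):
        feats.append("cform=" + token["cform"])
    return feats

tok = {"surface": "好き", "top_pos": "Noun", "sub_pos": "Adjectival",
       "ctype": None, "cform": None}
print(features(tok))
```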
      <Paragraph position="6">  L2-CRFs perform slightly better than L1-CRFs, which indicates that most of the given features (i.e., overlapping features, POS hierarchies, suffixes/prefixes and character types) are relevant to both datasets. However, the number of active (nonzero) features in L1-CRFs is much smaller (about 1/8 to 1/6) than in L2-CRFs (L2-CRFs: 791,798 (KC) / 580,032 (RWCP) vs. L1-CRFs: 90,163 (KC) / 101,757 (RWCP)). L1-CRFs are therefore worth examining when there are practical constraints (e.g., limits on memory, disk or CPU resources).</Paragraph>
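The sparsity gap has a simple one-dimensional explanation: an L1 penalty soft-thresholds weights to exactly zero, while an L2 penalty only rescales them. The closed forms below are a standard one-dimensional sketch (g stands for the unregularized optimum of a single weight), not the CRF training objective itself.

```python
def l2_opt(g, C):
    # argmin over w of (w - g)**2 / 2 + w**2 / (2 * C): shrunk but nonzero
    return g * C / (C + 1.0)

def l1_opt(g, C):
    # argmin over w of (w - g)**2 / 2 + abs(w) / C: soft thresholding,
    # exactly zero whenever abs(g) is at most 1/C
    t = 1.0 / C
    if g > t:
        return g - t
    if -g > t:
        return g + t
    return 0.0

grads = [0.05, -0.3, 1.2, 0.8, -0.02]  # hypothetical per-feature optima
l2_active = sum(1 for g in grads if l2_opt(g, 1.0) != 0.0)
l1_active = sum(1 for g in grads if l1_opt(g, 1.0) != 0.0)
print(l2_active, l1_active)  # L1 keeps far fewer features active
```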
    </Section>
  </Section>
</Paper>