<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-2004">
  <Title>Jointly Labeling Multiple Sequences: A Factorial HMM Approach</Title>
  <Section position="6" start_page="21" end_page="22" type="evalu">
    <SectionTitle>
5 Experiments
</SectionTitle>
    <Paragraph position="0"> We report two sets of experiments. Experiment 1 compares several FHMMs with cascaded HMMs and demonstrates the benefit of joint labeling. Experiment 2 evaluates the Switching FHMM for various training dataset sizes and shows its robustness against data sparsity. All models are implemented using the Graphical Models Toolkit (GMTK) (Bilmes and Zweig, 2002).</Paragraph>
    <Section position="1" start_page="21" end_page="22" type="sub_section">
      <SectionTitle>
5.1 Exp1: FHMM vs Cascaded HMMs
</SectionTitle>
      <Paragraph position="0"> We compare the four FHMMs of Section 2 to the traditional approach of cascading HMMs in succession, and compare their POS and NP accuracies in Table 2. In this table, the first row &amp;quot;Oracle HMM&amp;quot; is an oracle experiment which shows what NP accuracies can be achieved if perfectly correct POS tags are available in a cascaded approach. The second row &amp;quot;Cascaded HMM&amp;quot; represents the traditional approach of doing POS tagging and NP chunking in succession; i.e. an NP chunker is applied to the output of a POS tagger that is 94.17% accurate. The next four rows show the results of joint labeling using various FHMMs. The final row &amp;quot;DCRF&amp;quot; are  comparable results from Dynamic Conditional Random Fields (Sutton et al., 2004).</Paragraph>
      <Paragraph position="1"> There are several observations: First, it is important to note that FHMM outperforms the cascaded HMM in terms of NP accuracy for all but one model. For instance, FHMM-CT achieves an NP accuracy of 95.93%, significantly higher than both the cascaded HMM (93.90%) and the oracle HMM (94.67%). This confirms our hypothesis that joint labeling helps prevent POS errors from propagating to NP chunking. Second, the fact that several FHMM models achieve NP accuracies higher than the oracle HMM implies that information sharing between POS and NP sequences gives even more benefit than having only perfectly correct POS tags. Thirdly, the fact that the most complex model (FHMM-CT) performs best suggests that it is important to avoid data sparsity problems, as it requires more parameters to be estimated in training.</Paragraph>
      <Paragraph position="2"> Finally, it should be noted that although the DCRF outperforms the FHMM in this experiment, the DCRF uses significantly more word features (e.g.</Paragraph>
      <Paragraph position="3"> capitalization, existence in a list of proper nouns, etc.) and a larger context (previous and next 3 tags), whereas the FHMM considers the word as its sole feature, and the previous tag as its only context. Further work is required to see whether the addition of these features in the FHMM's generative framework will achieve accuracies close to that of DCRF. The take-home message is that, in light of the computational advantages of generative models, the FHMM should not be dismissed as a potential solution for joint labeling. In fact, recent results in the discriminative training of FHMMs (Bach and Jordan, 2005) has shown promising results in speech processing and it is likely that such advanced techniques, among others, may improve the FHMM's performance to state-of-the-art results.</Paragraph>
    </Section>
    <Section position="2" start_page="22" end_page="22" type="sub_section">
      <SectionTitle>
5.2 Exp2: Switching FHMM and Data Sparsity
</SectionTitle>
      <Paragraph position="0"> We now compare the Switching FHMM to the best model of Experiment 1 (FHMM-CT) for varying amounts of training data. The Switching FHMM uses the following a and b mapping. The mapping a = f(z1:t) partitions the space of chunk history z1:t into five equivalence classes based on the two most recent chunk labels:  strictly inside or outside an NP chunk. Class3 and Class4 are situations where the tag is leaving or entering an NP, and Class5 is when the tag transits between consecutive NP chunks. Class-specific tag bi-grams pa(yt|yt[?]1) are trained by dividing the training data according to the mapping. On the other hand, the mapping b = g(y1:t) is not used to ensure a single point of comparison with FHMM-CT; we use FHMM-CT's chunk model p(zt|zt[?]1,yt[?]1) in place of pb(zt|zt[?]1).</Paragraph>
      <Paragraph position="1"> The POS and NP accuracies are plotted in Figures 3 and 4. We report accuracies based on the average of five different random subsets of the training data for datasets of sizes 1000, 3000, 5000, and 7000 sentences. Note that for the Switching FHMM, POS and NP accuracy remains relatively constant despite the reduction in data size. This suggests that a more explicit model for cross sequence interaction is essential especially in the case of insufficient training data. Also, for the very small datasize of 1000, the accuracies for Cascaded HMM are 84% for POS and 70% for NP, suggesting that the general FHMM framework is still beneficial.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>
Download Original XML