
<?xml version="1.0" standalone="yes"?>
<Paper uid="C04-1004">
  <Title>Discriminative Hidden Markov Modeling with Long State Dependence using a kNN Ensemble</Title>
  <Section position="6" start_page="211" end_page="211" type="evalu">
    <SectionTitle>
6. Experimentation
</SectionTitle>
    <Paragraph position="0"> The corpus used in shallow parsing is extracted from the PENN TreeBank (Marcus et al. 1993) of 1 million words (25 sections) by a program provided by Sabine Buchholz from Tilburg University. All the evaluations are 5-fold crossvalidated. For shallow parsing, we use the F-measure to measure the performance. Here, the F-measure is the weighted harmonic mean of the precision (P) and the recall (R):  with =1 (Rijsbergen 1979), where the precision (P) is the percentage of predicted phrase chunks that are actually correct and the recall (R) is the percentage of correct phrase chunks that are actually found.</Paragraph>
    <Paragraph position="1">  b Tables 1, 2 and 3 show the detailed performance of LSD-DHMMs. In this paper, the valid set of pattern entry forms ValidEntry is defined to include those pattern entry forms within a windows of 7 observations(including current, left 3 and right 3 observations) where for to be included in a pattern entry, all or one of the overlapping features in each of  should be included in the same pattern entry while for to be included in a pattern entry, all or one of the overlapping features in each of or should be included in the same pattern entry.</Paragraph>
    <Paragraph position="3"> Table 1 shows the effect of different number of nearest neighbors in the kNN probability estimator and considered previous states in the variable-length mutual information modeling approach of the LSD-DHMM, using only one kNN probability estimator in the ensemble to estimate in the output model. It shows that finding 3 nearest neighbors in the kNN probability estimator performs best. It also shows that further increasing the number of nearest neighbors does not increase or even decrease the performance. This may be due to introduction of noisy neighbors when the number of nearest neighbors increases. Moreover, Table 1 shows that the LSD-DHMM performs best when six previous states is considered in the variable-length mutual information-based modeling approach and further considering more previous states only slightly increase the performance. This suggests that the state dependence exists well beyond traditional ngram modeling (e.g. bigram and trigram) to six previous states and the variable-length mutual information-based modeling approach can capture the long state dependence. In the following experimentation, we will only use the LSD-DHMM with 3 nearest neighbors used in the kNN probability estimator and 6 previous states considered in the variable-length mutual information modeling approach.</Paragraph>
    <Paragraph position="5"> Table 2 shows the effect of different number of kNN probability estimators in the ensemble. It shows that 15 bootstrap replicates are enough for the k-NN ensemble on shallow parsing and increase the F-measure by 0.71 compared with the ensemble of only one kNN probability estimator.</Paragraph>
    <Paragraph position="6"> Table 3 compares the LSD-DHMM with GHMMs and other DHMMs. It shows that all the DHMMs significantly outperform GHMMs due to the modeling of the observation dependence and allowing for non-independent, difficult to enumerate observation features. It also shows that our LSD-DHMM much outperforms other DHMMs due to the modeling of the long state dependence using the variable-length mutual information-based modeling approach in the LSD-DHMM. Moverover, Table 3 shows that noprojection-based DHMMs (i.e. CRF-DHMM, SNoW-DHMM, Backoff-DHMM and LSD-DHMM) outperform projection-based DHMMs. It may be due to alleviation of the label bias problem inherent in the projection-based DHMMs. Finally, Table 2 also compares the kNN ensemble with popular classifier-based approaches, such as SNoW and Maximum Entropy, in estimating the output model of the LSD-DHMM. It shows that the kNN ensemble outperforms these classifier-based approaches. This suggests that the kNN ensemble captures the dependence between the features of the observation sequence more effectively by forcing the model to account for the distribution of those dependencies.</Paragraph>
  </Section>
class="xml-element"></Paper>