<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1030">
  <Title>Mistake-Driven Mixture of Hierarchical Tag Context Trees</Title>
  <Section position="9" start_page="234" end_page="235" type="relat">
    <SectionTitle>
6 Related Work
</SectionTitle>
    <Paragraph position="0"> Although statistical natural language processing has mainly focused on maximum likelihood estimators, Pereira et al. (1995) proposed a mixture approach that predicts the next word using the Context Tree Weighting (CTW) method (Willems et al., 1995).</Paragraph>
    <Paragraph position="1"> The CTW method computes a probability by mixing the subtrees of a single context tree in a Bayesian fashion.</Paragraph>
    <Paragraph position="2"> Although the method is very efficient, it cannot be used to construct hierarchical tag context trees.</Paragraph>
    <Paragraph position="3"> Various kinds of re-sampling techniques have been studied in statistics (Efron, 1979; Efron and Tibshirani, 1993) and machine learning (Breiman, 1996; Hull et al., 1996; Freund and Schapire, 1996a).</Paragraph>
    <Paragraph position="4"> In particular, the mistake-driven mixture algorithm was directly motivated by AdaBoost (Freund and Schapire, 1996a). AdaBoost was designed to construct a high-performance predictor by iteratively calling a weak learning algorithm (one that is only slightly better than random guessing). An empirical study reports that the method greatly improved the performance of decision-tree, k-nearest-neighbor, and other learning methods on relatively simple and sparse data (Freund and Schapire, 1996b). We borrowed the idea of re-sampling to detect exceptional connections and were the first to show that such a re-sampling method is also effective for a practical application that uses a large amount of data.</Paragraph>
    <Paragraph position="5"> The next step is to fill the gap between theory and practice. Most theoretical work on re-sampling assumes i.i.d. (independent and identically distributed) samples. This is not a realistic assumption in part-of-speech tagging and other natural language applications. An interesting direction for future research is to construct a theory that handles Markov processes.</Paragraph>
  </Section>
</Paper>