<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-1668">
  <Title>Competitive generative models with structure learning for NLP classification tasks</Title>
  <Section position="3" start_page="0" end_page="576" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Discriminative models have become the models of choice for NLP tasks, because of their ability to easily incorporate non-independent features and to more directly optimize classification accuracy.</Paragraph>
    <Paragraph position="1"> State of the art models for many NLP tasks are either fully discriminative or trained using discriminative reranking (Collins, 2000). These include models for part-of-speech tagging (Toutanova et al., 2003), semantic-role labeling (Punyakanok et al., 2005; Pradhan et al., 2005b) and Penn Tree-bank parsing (Charniak and Johnson, 2005).</Paragraph>
    <Paragraph position="2"> The superiority of discriminative models has been shown on many tasks when the discriminative and generative models use exactly the same model structure (Klein and Manning, 2002). However, the advantage of the discriminative models can be very slight (Johnson, 2001) and for small training set sizes generative models can be better because they need fewer training samples to converge to the optimal parameter setting (Ng and Jordan, 2002). Additionally, many discriminative models use a generative model as a base model and add discriminative features with reranking (Collins, 2000; Charniak and Johnson, 2005; Roark et al., 2004), or train discriminatively a small set of weights for features which are generatively estimated probabilities (Raina et al., 2004; Och and Ney, 2002). Therefore it is important to study generative models and to find ways of making them better even when they are used only as components of discriminative models.</Paragraph>
    <Paragraph position="3"> Generative models may often perform poorly due to making strong independence assumptions about the joint distribution of features and classes.</Paragraph>
    <Paragraph position="4"> To avoid this problem, generative models for NLP tasks have often been manually designed to achieve an appropriate representation of the joint distribution, such as in the parsing models of (Collins, 1997; Charniak, 2000). This shows that when the generative models have a good model structure, they can perform quite well.</Paragraph>
    <Paragraph position="5"> In this paper, we look differently at comparing generative and discriminative models. We ask the question: given the same set of input features, what is the best a generative model can do if it is allowed to learn an optimal structure for the joint distribution, and what is the best a discriminative model can do if it is also allowed to learn an optimal structure. That is, we do not impose any independence assumptions on the generative or discriminative models and let them learn the best representation of the data they can.</Paragraph>
    <Paragraph position="6"> Structure learning is very efficient for generative models in the form of directed graphical models (Bayesian Networks (Pearl, 1988)), since the optimal parameters for such models can be estimated in closed form. We compare Bayesian Net- null works with structure learning to their closely related discriminative counterpart - conditional log-linear models with structure learning. Our conditional log-linear models can also be seen as Conditional Random Fields (Lafferty et al., 2001), except we do not have a structure on the labels, but want to learn a structure on the features.</Paragraph>
    <Paragraph position="7"> We compare the two kinds of models on two NLP classification tasks - prepositional phrase attachment and semantic role labelling. Our results show that the generative models are competitive with or better than the discriminative models. When a small set of interpolation parameters for the conditional probability tables are fit discriminatively, the resulting hybrid generativediscriminative models perform better than the generative only models and sometimes better than the discriminative models.</Paragraph>
    <Paragraph position="8"> In Section 2, we describe in detail the form of the generative and discriminative models we study and our structure search methodology. In Section 3 we present the results of our empirical study.</Paragraph>
  </Section>
</Paper>