<?xml version="1.0" standalone="yes"?>
<Paper uid="P04-1013">
  <Title>Discriminative Training of a Neural Network Statistical Parser</Title>
  <Section position="7" start_page="0" end_page="0" type="relat">
    <SectionTitle>
7 Related Work
</SectionTitle>
    <Paragraph position="0"> Johnson (2001) investigated similar issues for parsing and tagging. His maximal conditional likelihood estimate for a PCFG takes the same approach as our generative model trained with a discriminative criterion. While he shows a non-significant increase in performance over the standard maximal joint likelihood estimate on a small dataset, he did not have a computationally efficient way to train this model and so was not able to test it on the standard datasets. The other models he investigates conflate changes in the probability models with changes in the training criteria, and the discriminative probability models do worse.</Paragraph>
    <Paragraph position="1"> In the context of part-of-speech tagging, Klein and Manning (2002) argue for the same distinctions made here between discriminative models and discriminative training criteria, and come to the same conclusions. However, their arguments are made in terms of independence assumptions. Our results show that these generalizations also apply to methods which do not rely on independence assumptions.</Paragraph>
    <Paragraph position="2"> While both (Johnson, 2001) and (Klein and Manning, 2002) propose models which use the parameters of the generative model but train to optimize a discriminative criterion, neither proposes training algorithms which are computationally tractable enough to be used for broad-coverage parsing. Our proposed training method succeeds in being both tractable and effective, demonstrating both a significant improvement over the equivalent generative model and state-of-the-art accuracy.</Paragraph>
    <Paragraph position="3"> Collins (2000) and Collins and Duffy (2002) also succeed in finding algorithms for training discriminative models which balance tractability with effectiveness, showing improvements over a generative model. Both these methods are limited to reranking the output of another parser, while our trained parser can be used alone. Neither of these methods uses the parameters of a generative probability model, which might explain our better performance (see table 2).</Paragraph>
  </Section>
</Paper>