<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1034">
  <Title>New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron</Title>
  <Section position="8" start_page="0" end_page="0" type="concl">
    <SectionTitle>
7 Relationship to Previous Work
</SectionTitle>
    <Paragraph position="0"> (Bod 1998) describes quite different parameter estimation and parsing methods for the DOP representation. The methods explicitly deal with the parameters associated with subtrees, with sub-sampling of tree fragments making the computation manageable.</Paragraph>
    <Paragraph position="1"> Even after this, Bod's method is left with a huge grammar: (Bod 2001) describes a grammar with over 5 million sub-structures. The method requires search for the 1,000 most probable derivations under this grammar, using beam search, presumably a challenging computational task given the size of the grammar. In spite of these problems, (Bod 2001) gives excellent results for the method on parsing Wall Street Journal text. The algorithms in this paper have a different flavor, avoiding the need to explicitly deal with feature vectors that track all subtrees, and also avoiding the need to sum over an exponential number of derivations underlying a given tree. (Goodman 1996) gives a polynomial time conversion of a DOP model into an equivalent PCFG whose size is linear in the size of the training set.</Paragraph>
    <Paragraph position="2"> The method uses a similar recursion to the common sub-trees recursion described in this paper. Goodman's method still leaves exact parsing under the model intractable (because of the need to sum over multiple derivations underlying the same tree), but he gives an approximation to finding the most probable tree, which can be computed efficiently.</Paragraph>
    <Paragraph position="3"> From a theoretical point of view, it is difficult to find motivation for the parameter estimation methods used by (Bod 1998) - see (Johnson 2002) for discussion. In contrast, the parameter estimation methods in this paper have a strong theoretical basis (see (Cristianini and Shawe-Taylor 2000) chapter 2 and (Freund &amp; Schapire 1999) for statistical theory underlying the perceptron).</Paragraph>
    <Paragraph position="4"> For related work on the voted perceptron algorithm applied to NLP problems, see (Collins 2002a) and (Collins 2002b). (Collins 2002a) describes experiments on the same named-entity dataset as in this paper, but using explicit features rather than kernels. (Collins 2002b) describes how the voted perceptron can be used to train maximum-entropy style taggers, and also gives a more thorough discussion of the theory behind the perceptron algorithm applied to ranking tasks.</Paragraph>
    <Paragraph position="5"> Acknowledgements Many thanks to Jack Minisi for annotating the named-entity data used in the experiments. Thanks to Rob Schapire and Yoram Singer for many useful discussions.</Paragraph>
  </Section>
</Paper>