<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1034">
  <Title>New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> The perceptron algorithm is one of the oldest algorithms in machine learning, going back to (Rosenblatt 1958). It is an incredibly simple algorithm to implement, and yet it has been shown to be competitive with more recent learning methods such as support vector machines - see (Freund &amp; Schapire 1999) for its application to image classification, for example.</Paragraph>
    <Paragraph position="1"> This paper describes how the perceptron and voted perceptron algorithms can be used for parsing and tagging problems. Crucially, the algorithms can be efficiently applied to exponential sized representations of parse trees, such as the &amp;quot;all subtrees&amp;quot; (DOP) representation described by (Bod 1998), or a representation tracking all sub-fragments of a tagged sentence. It might seem paradoxical to be able to efficiently learn and apply a model with an exponential number of features.1 The key to our algorithms is the 1Although see (Goodman 1996) for an efficient algorithm for the DOP model, which we discuss in section 7 of this paper. &amp;quot;kernel&amp;quot; trick ((Cristianini and Shawe-Taylor 2000) discuss kernel methods at length). We describe how the inner product between feature vectors in these representations can be calculated efficiently using dynamic programming algorithms. This leads to polynomial time2 algorithms for training and applying the perceptron. The kernels we describe are related to the kernels over discrete structures in (Haussler 1999; Lodhi et al. 2001).</Paragraph>
    <Paragraph position="2"> A previous paper (Collins and Duffy 2001) showed improvements over a PCFG in parsing the ATIS task. In this paper we show that the method scales to far more complex domains. In parsing Wall Street Journal text, the method gives a 5.1% relative reduction in error rate over the model of (Collins 1999). In the second domain, detecting named-entity boundaries in web data, we show a 15.6% relative error reduction (an improvement in F-measure from 85.3% to 87.6%) over a state-of-the-art model, a maximum-entropy tagger. This result is derived using a new kernel, for tagged sequences, described in this paper. Both results rely on a new approach that incorporates the log-probability from a baseline model, in addition to the &amp;quot;all-fragments&amp;quot; features.</Paragraph>
  </Section>
class="xml-element"></Paper>