<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-1012">
  <Title>Using LTAG Based Features in Parse Reranking</Title>
  <Section position="2" start_page="0" end_page="0" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Recent work in statistical parsing has explored alternatives to the use of (smoothed) maximum likelihood estimation for parameters of the model. These alternatives are distribution-free (Collins, 2001), providing a discriminative method for resolving parse ambiguity. Discriminative methods provide a ranking between multiple choices for the most plausible parse tree for a sentence, without assuming that a particular distribution or stochastic process generated the alternative parses.</Paragraph>
    <Paragraph position="1"> We would like to thank Michael Collins for providing the original n-best parsed data on which we ran our experiments and the anonymous reviewers for their comments. The second author is partially supported by NSERC, Canada (RGPIN: 264905).</Paragraph>
    <Paragraph position="2"> Discriminative methods permit the use of feature functions that can be used to condition on arbitrary aspects of the input. This exibility makes it possible to incorporate features of various of kinds. Features can be de ned on characters, words, part of speech (POS) tags and context-free grammar (CFG) rules, depending on the application to which the model is applied.</Paragraph>
    <Paragraph position="3"> Features de ned on n-grams from the input are the most commonly used for NLP applications.</Paragraph>
    <Paragraph position="4"> Such n-grams can either be de ned explicitly using some linguistic insight into the problem, or the model can be used to search the entire space of n-gram features using a kernel representation. One example is the use of a polynomial kernel over sequences. However, to use all possible n-gram features typically introduces too many noisy features, which can result in lower accuracy. One way to solve this problem is to use a kernel function that is tailored for particular NLP applications, such as the tree kernel (Collins and Duffy, 2001) for statistical parsing.</Paragraph>
    <Paragraph position="5"> In addition to n-gram features, more complex high-level features are often exploited to obtain higher accuracy, especially when discriminative models are used for statistical parsing. For example, all possible sub-trees can be used as features (Collins and Duffy, 2002; Bod, 2003). However, most of the sub-trees are linguistically meaningless, and are a source of noisy features thus limiting ef ciency and accuracy. An alternative to the use of arbitrary sets of sub-trees is to use the set of elementary trees as de ned in Lexicalized Tree Adjoining Grammar (LTAG) (Joshi and Schabes, 1997).</Paragraph>
    <Paragraph position="6"> LTAG based features not only allow a more limited and a linguistically more valid set of features over sub-trees, they also provide the use of features that use discontinuous sub-trees which are outside the scope of previous tree kernel de nitions using arbitrary sub-trees. In this paper, we use the LTAG based features in the parse reranking problem (Collins, 2000; Collins and Duffy, 2002). We use the Support Vector Machine (SVM) (Vapnik, 1999) based algorithm proposed in (Shen and Joshi, 2003) as the reranker in this paper. We apply the tree kernel to derivation trees of LTAG, and extract features from derivation trees. Both the tree kernel and the linear kernel on the richer feature set are used. Our experiments show that the use of tree kernel on derivation trees makes the notion of a tree kernel more powerful and more applicable.</Paragraph>
  </Section>
class="xml-element"></Paper>