<?xml version="1.0" standalone="yes"?>
<Paper uid="P05-1023">
  <Title>Data-Defined Kernels for Parse Reranking Derived from Probabilistic Models</Title>
  <Section position="8" start_page="186" end_page="187" type="concl">
    <SectionTitle>
7 Conclusions
</SectionTitle>
    <Paragraph position="0"> This paper proposes a method for deriving a kernel for reranking from a probabilistic model, and demonstrates state-of-the-art accuracy when this method is applied to parse reranking. Contrary to most of the previous research on kernel methods in parsing, linguistic knowledge does not have to be expressed through a list of features, but instead can be expressed through the design of a probability model.</Paragraph>
    <Paragraph position="1"> The parameters of this probability model are then trained, so that they reflect what features of trees are relevant to parsing. The kernel is then derived from this trained model in such a way as to maximize its usefulness for reranking.</Paragraph>
    <Paragraph position="2"> We performed experiments on parse reranking using a neural network based statistical parser as both the probabilistic model and the source of the list of candidate parses. We used a modification of the Voted Perceptron algorithm to perform reranking with the kernel. The results were amongst the best current statistical parsers, and only 0.2% worse than the best current parsing methods which use kernels.</Paragraph>
    <Paragraph position="3"> We would expect further improvement if we used different models to derive the kernel and to gener- null ate the candidates, thereby exploiting the advantages of combining multiple models, as do the better performing methods using kernels.</Paragraph>
    <Paragraph position="4"> In recent years, probabilistic models have become commonplace in natural language processing. We believe that this approach to defining kernels would simplify the problem of defining kernels for these tasks, and could be very useful for many of them.</Paragraph>
    <Paragraph position="5"> In particular, maximum entropy models also use a normalized exponential function to estimate probabilities, so all the methods discussed in this paper would be applicable to maximum entropy models.</Paragraph>
    <Paragraph position="6"> This approach would be particularly useful for tasks where there is less data available than in parsing, for which large-margin methods work particularly well.</Paragraph>
  </Section>
class="xml-element"></Paper>