<?xml version="1.0" standalone="yes"?>
<Paper uid="N03-1014">
  <Title>Inducing History Representations for Broad Coverage Statistical Parsing</Title>
  <Section position="8" start_page="0" end_page="0" type="relat">
    <SectionTitle>
8 Related Work
</SectionTitle>
    <Paragraph position="0"> Most previous work on statistical parsing has used a history-based probability model with a hand-crafted set of features to represent the derivation history (Ratnaparkhi, 1999; Collins, 1999; Charniak, 2000). Ratnaparkhi (1999) defines a very general set of features for the histories of a shift-reduce parsing model, but his results are not as good as those of models that use a more linguistically informed set of features with a top-down parsing model (Collins, 1999; Charniak, 2000). Besides the method proposed in this paper, another alternative to choosing a finite set of features is to use kernel methods, which can handle unbounded feature sets, although at a cost in efficiency. Collins and Duffy (2002) define a kernel over parse trees and apply it to re-ranking the output of a parser, but the resulting feature space is restricted by the need to compute the kernel efficiently, and the results are not as good as Collins' previous work on re-ranking with a finite set of features (Collins, 2000).</Paragraph>
    <Paragraph position="1"> Future work could use the history representations induced by our method to define efficiently computable tree kernels. The only other broad coverage neural network parser (Costa et al., 2001) also uses a neural network architecture specifically designed for processing structures. We believe that its poor performance is due to a network design which does not take into account the recency bias discussed in section 4. Ratnaparkhi's parser (1999) can also be considered a form of neural network, but with only a single layer, since it uses a log-linear model to estimate its probabilities. Previous work on applying SSNs to natural language parsing (Henderson, 2000) was not general enough to be applied to the Penn Treebank, so its results cannot be compared directly to this work.</Paragraph>
  </Section>
</Paper>