<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-1506">
  <Title>Better k-best Parsing</Title>
  <Section position="4" start_page="53" end_page="53" type="intro">
    <SectionTitle>
2 Previous Work
</SectionTitle>
    <Paragraph position="0"> As pointed out by Charniak and Johnson (2005), the major difficulty in k-best parsing is dynamic programming.</Paragraph>
    <Paragraph position="1"> The simplest method is to abandon dynamic programming and rely on aggressive pruning to maintain tractability, as is done in (Collins, 2000; Bikel, 2004). But this approach is prohibitively slow and produces rather low-quality k-best lists (see Sec. 5.1.2). Gildea and Jurafsky (2002) described an O(k^2)-overhead extension of the CKY algorithm and reimplemented Collins' Model 1 to obtain k-best parses with an average of 14.9 parses per sentence. Their algorithm turns out to be a special case of our Algorithm 0 (Sec. 4.1), and is also reported to be prohibitively slow.</Paragraph>
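The O(k^2)-overhead idea can be sketched as follows: when combining two subconstituents that each carry a k-best list of costs, enumerate all k^2 pairings and retain only the k cheapest. This is a minimal illustration of the general scheme, not the paper's or Gildea and Jurafsky's actual implementation; the function name and cost encoding are our own.

```python
import heapq

def merge_kbest(left, right, rule_cost, k):
    """Naive k-best combination (the O(k^2)-overhead idea):
    pair every cost in the left child's k-best list with every
    cost in the right child's, add the rule cost, and keep
    only the k cheapest results."""
    candidates = [l + r + rule_cost for l in left for r in right]
    return heapq.nsmallest(k, candidates)

# Two children with 3-best cost lists, rule cost 0.5:
print(merge_kbest([1.0, 2.0, 4.0], [0.5, 1.5, 3.0], 0.5, 3))
# -> [2.0, 3.0, 3.0]
```

The k^2 enumeration at every combination step is exactly the overhead that the paper's later algorithms avoid.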
    <Paragraph position="2"> Since the original design of the algorithm described below, we have become aware of two efforts that are very closely related to ours, one by Jiménez and Marzal (2000) and another done in parallel to ours by Charniak and Johnson (2005). Jiménez and Marzal present an algorithm very similar to our Algorithm 3 (Sec. 4.4), while Charniak and Johnson propose using an algorithm similar to our Algorithm 0, but with multiple passes to improve efficiency. They apply this method to the Charniak (2000) parser to obtain 50-best lists for reranking, yielding an improvement in parsing accuracy.</Paragraph>
    <Paragraph position="3"> Our work differs from Jiménez and Marzal's in three respects. First, we formulate the parsing problem in the more general framework of hypergraphs (Klein and Manning, 2001), making it applicable to a very wide variety of parsing algorithms, whereas Jiménez and Marzal define their algorithm as an extension of CKY, for CFGs in Chomsky Normal Form (CNF) only. This generalization is not only of theoretical importance but also critical in the application to state-of-the-art parsers such as (Collins, 2003) and (Charniak, 2000). In Collins' parsing model, for instance, the rules are dynamically generated and include unary productions, making the grammar very hard to convert to CNF by preprocessing, whereas our algorithms apply directly to these parsers. Second, our Algorithm 3 improves on Jiménez and Marzal's, leading to a slight theoretical and empirical speedup. Third, we have implemented our algorithms on top of state-of-the-art, large-scale statistical parsers/decoders and report extensive experimental results, while Jiménez and Marzal's algorithm was tested only on relatively small grammars.</Paragraph>
    <Paragraph position="4"> On the other hand, our algorithms are more scalable and much more general than the coarse-to-fine approach of Charniak and Johnson. In our experiments, we can obtain 10000-best lists nearly as fast as 1-best parsing, with very modest use of memory. Indeed, Charniak (p.c.) has adopted our Algorithm 3 into his own parser implementation and confirmed our findings.</Paragraph>
    <Paragraph position="5"> In the literature on k shortest-path problems, Minieka (1974) generalized the Floyd algorithm in a way very similar to our Algorithm 0, and Lawler (1977) improved it using an idea similar to, but slightly slower than, the binary-branching case of our Algorithm 1. For hypergraphs, Gallo et al. (1993) study the shortest hyperpath problem and Nielsen et al. (2005) extend it to the k shortest hyperpaths. Our work differs from (Nielsen et al., 2005) in two respects. First, we solve the problem of k-best derivations (i.e., trees), not k-best hyperpaths, although in many cases they coincide (see Sec. 3 for further discussion).</Paragraph>
    <Paragraph position="6"> Second, their work assumes non-negative costs (or probabilities ≤ 1) so that they can apply Dijkstra-like algorithms. Although generative models, being probability-based, do not suffer from this problem, more general models (e.g., log-linear models) may require negative edge costs (McDonald et al., 2005; Taskar et al., 2004).</Paragraph>
    <Paragraph position="7"> Our work, based on the Viterbi algorithm, is still applicable as long as the hypergraph is acyclic, and is used by McDonald et al. (2005) to get the k-best parses.</Paragraph>
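The contrast between the two regimes can be made concrete with a minimal 1-best Viterbi sketch over an acyclic hypergraph: because nodes are finalized in topological order, after all their tail nodes, negative edge costs cause no trouble, whereas a Dijkstra-style priority-queue traversal would. The node names, edge encoding, and function below are our own illustration, not the paper's code.

```python
def viterbi_hypergraph(nodes, edges, leaf_cost):
    """1-best Viterbi over an acyclic hypergraph.
    nodes: node ids in topological order (tails before heads).
    edges: dict mapping head node -> list of (cost, tail_nodes).
    leaf_cost: dict of costs for the leaf (axiom) nodes.
    Each node's best cost is computed only after all of its
    tails are final, so negative edge costs are handled
    correctly (unlike Dijkstra-like algorithms)."""
    best = dict(leaf_cost)
    for v in nodes:
        for cost, tails in edges.get(v, []):
            cand = cost + sum(best[t] for t in tails)
            if v not in best or cand < best[v]:
                best[v] = cand
    return best

# Tiny example: the hyperedge into "c" has a negative cost.
best = viterbi_hypergraph(
    ["a", "b", "c", "d"],
    {"c": [(-1.0, ["a", "b"]), (0.5, ["a"])], "d": [(2.0, ["c"])]},
    {"a": 0.0, "b": 0.0},
)
print(best["c"], best["d"])  # -> -1.0 1.0
```

Extending this to k-best amounts to storing a sorted list of derivation costs at each node instead of a single number, which is where the paper's Algorithms 0 through 3 differ in how lazily they compute those lists.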
  </Section>
</Paper>