<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1040"> <Title>Probabilistic Context-Free Grammar Induction Based on Structural Zeros</Title> <Section position="2" start_page="0" end_page="312" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> There is a very severe speed vs. accuracy tradeoff in stochastic context-free parsing, which can be explained by the grammar factor in the running-time complexity of standard parsing algorithms such as the CYK algorithm (Kasami, 1965; Younger, 1967).</Paragraph> <Paragraph position="1"> That algorithm has complexity O(n^3 |P|), where n is the length in words of the sentence parsed, and |P| is the number of grammar productions. Grammar non-terminals can be split to encode richer dependencies in a stochastic model and improve parsing accuracy. For example, the parent of the left-hand side (LHS) can be annotated onto the label of the LHS category (Johnson, 1998), hence differentiating, for instance, between expansions of a VP with parent S and parent VP. Such annotations, however, tend to substantially increase the number of grammar productions as well as the ambiguity of the grammar, thereby significantly slowing down the parsing algorithm. In the case of bilexical grammars, where categories in binary grammars are annotated with their lexical heads, the grammar factor contributes an additional O(n^2 |V_D|^3) complexity, leading to an overall O(n^5 |V_D|^3) parsing complexity, where |V_D| is the number of delexicalized non-terminals (Eisner, 1997). Even with special modifications to the basic CYK algorithm, such as those presented by Eisner and Satta (1999), improvements to the stochastic model are obtained at the expense of efficiency.</Paragraph> <Paragraph position="2"> In addition to the significant cost in efficiency, increasing the non-terminal set impacts parameter estimation for the stochastic model. 
With more productions, far fewer observations per production are available, and one is left with the hope that a subsequent smoothing technique can effectively deal with this problem, regardless of the number of non-terminals created. Klein and Manning (2003b) showed that, by making certain linguistically motivated node label annotations but avoiding certain other kinds of state splits (mainly lexical annotations), models of relatively high accuracy can be built without resorting to smoothing.</Paragraph> <Paragraph position="3"> The resulting grammars were small enough to allow for exhaustive CYK parsing; even so, parsing speed was significantly impacted by the state splits: the test-set parsing time reported was about 3s for average-length sentences, with a memory usage of 1GB.</Paragraph> <Paragraph position="4"> This paper presents an automatic method for deciding which states to split in order to create concise and accurate unsmoothed probabilistic context-free grammars (PCFGs) for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to the limited size of the sample (sampling zero) or because it is grammatically impossible (structural zero). This helps introduce a relatively small number of new non-terminals with little additional parsing overhead. Experimental results show that, using this method, high accuracies can be achieved with orders of magnitude fewer non-terminals than in typically induced PCFGs, leading to substantial speed-ups in parsing. The approach can further be used in combination with an existing reranker to provide competitive WSJ parsing results.</Paragraph> <Paragraph position="5"> The remainder of the paper is structured as follows. 
Section 2 gives a brief description of PCFG induction from treebanks, including non-terminal label-splitting, factorization, and relative frequency estimation. Section 3 discusses the statistical criteria that we explored to determine structural zeros and thus select non-terminals for the factored PCFG. Finally, Section 4 reports the results of parsing experiments using our exhaustive k-best CYK parser with the concise PCFGs induced from the Penn WSJ treebank (Marcus et al., 1993).</Paragraph> </Section> </Paper>