<?xml version="1.0" standalone="yes"?> <Paper uid="C00-1052"> <Title>Compact non-left-recursive grammars using the selective left-corner transform and factoring*</Title> <Section position="2" start_page="0" end_page="355" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Top-down parsing techniques are attractive because of their simplicity, and can often achieve good performance in practice (Roark and Johnson, 1999).</Paragraph> <Paragraph position="1"> However, with a left-recursive grammar such parsers typically fail to terminate. The left-corner grammar transform converts a left-recursive grammar into a non-left-recursive one: a top-down parser using a left-corner transformed grammar simulates a left-corner parser using the original grammar (Rosenkrantz and Lewis II, 1970; Aho and Ullman, 1972). However, the left-corner transformed grammar can be significantly larger than the original grammar, causing numerous problems. For example, we show below that a probabilistic context-free grammar (PCFG) estimated from left-corner transformed Penn WSJ tree-bank trees exhibits considerably greater sparse data problems than a PCFG estimated in the usual manner, simply because the left-corner transformed grammar contains approximately 20 times more productions. The transform described in this paper produces a grammar approximately the same size as the input grammar, which is not as adversely affected by sparse data.</Paragraph> <Paragraph position="2"> * This research was supported by NSF awards 9720368, 9870676 and 9812169. 
We would like to thank our colleagues in BLLIP (Brown Laboratory for Linguistic Information Processing) and Bob Moore for their helpful comments on this paper.</Paragraph> <Paragraph position="3"> Left-corner transforms are particularly useful because they can preserve annotations on productions (more on this below) and are therefore applicable to more complex grammar formalisms as well as CFGs, a property which other approaches to left-recursion elimination typically lack. For example, they apply to left-recursive unification-based grammars (Matsumoto et al., 1983; Pereira and Shieber, 1987; Johnson, 1998a). Because the emission probability of a PCFG production can be regarded as an annotation on a CFG production, the left-corner transform can produce a CFG with weighted productions which assigns the same probabilities to strings and transformed trees as the original grammar (Abney et al., 1999). However, the transformed grammars can be much larger than the original, which is unacceptable for many applications involving large grammars.</Paragraph> <Paragraph position="4"> The selective left-corner transform reduces the transformed grammar size because only those productions which appear in a left-recursive cycle need be recognized left-corner in order to remove left-recursion. A top-down parser using a grammar produced by the selective left-corner transform simulates a generalized left-corner parser (Demers, 1977; Nijholt, 1980) which recognizes a user-specified subset of the original productions in a left-corner fashion, and the other productions top-down.</Paragraph> <Paragraph position="5"> Although we do not investigate it in this paper, the selective left-corner transform should usually have a smaller search space relative to the standard left-corner transform, all else being equal. 
The partial parses produced during a top-down parse consist of a single connected tree fragment, while the partial parses produced during a left-corner parse generally consist of several disconnected tree fragments. Since these fragments are only weakly related (via the &quot;link&quot; constraint described below), the search for each fragment is relatively independent.</Paragraph> <Paragraph position="6"> This may be responsible for the observation that exhaustive left-corner parsing is less efficient than top-down parsing (Covington, 1994). Informally, because the selective left-corner transform recognizes only a subset of productions in a left-corner fashion, its partial parses contain fewer discontiguous tree fragments and the search may be more efficient.</Paragraph> <Paragraph position="7"> While this paper focuses on reducing grammar size to minimize sparse data problems in PCFG estimation, the modified left-corner transforms described here are generally applicable wherever the original left-corner transform is. For example, the selective left-corner transform can be used in place of the standard left-corner transform in the construction of finite-state approximations (Johnson, 1998a), often reducing the size of the intermediate automata constructed. The selective left-corner transform can be generalized to head-corner parsing (van Noord, 1997), yielding a selective head-corner parser. (This follows from generalizing the selective left-corner transform to Horn clauses.)</Paragraph> <Paragraph position="8"> After this paper was accepted for publication we learnt of Moore (2000), which addresses the issue of grammar size using very similar techniques to those proposed here. The goals of the two papers are slightly different: Moore's approach is designed to reduce the total grammar size (i.e., the sum of the lengths of the productions), while our approach minimizes the number of productions. 
Moore (2000) does not address left-corner tree-transforms, or questions of sparse data and parsing accuracy that are covered in section 3.</Paragraph> </Section> </Paper>