<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1052">
  <Title>Compact non-left-recursive grammars using the selective left-corner transform and factoring*</Title>
  <Section position="3" start_page="355" end_page="358" type="metho">
    <SectionTitle>
2 The selective left-corner and related transforms
</SectionTitle>
    <Paragraph position="0"> This section introduces the selective left-corner transform and two additional factorization transforms which apply to its output. These transforms are used in the experiments described in the following section. As Moore (2000) observes, in general the transforms produce a non-left-recursive output grammar only if the input grammar G does not contain unary cycles, i.e., there is no nonterminal A such that $A \Rightarrow^{+} A$.</Paragraph>
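    <Paragraph> The unary-cycle condition can be checked mechanically before any transform is applied. The following is a minimal sketch, not taken from the paper: it assumes productions are stored as (lhs, rhs) pairs with rhs a tuple of symbols, that there are no epsilon productions, and the function name has_unary_cycle is hypothetical.

```python
def has_unary_cycle(productions, nonterminals):
    """Return True if some nonterminal A satisfies A =>+ A via unary productions.

    Assumes no epsilon productions, so only productions with a single
    nonterminal on the right-hand side can take part in a unary cycle.
    """
    # Unary edges A -> B, where B is a nonterminal.
    edges = {a: set() for a in nonterminals}
    for lhs, rhs in productions:
        if len(rhs) == 1 and rhs[0] in nonterminals:
            edges[lhs].add(rhs[0])
    # Depth-first search from each nonterminal, looking for a path back to it.
    for start in nonterminals:
        stack = list(edges[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(edges[node])
    return False
```

A grammar containing A → B and B → A would be reported as cyclic, while a grammar whose only unary production is B → A would not.
    </Paragraph>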
    <Section position="1" start_page="355" end_page="356" type="sub_section">
      <SectionTitle>
2.1 The selective left-corner transform
</SectionTitle>
      <Paragraph position="0"> The selective left-corner transform takes as input a CFG $G = (V, T, P, S)$ and a set of left-corner productions $L \subseteq P$, which contains no epsilon productions; the non-left-corner productions $P - L$ are called top-down productions. The standard left-corner transform is obtained by setting L to the set of all non-epsilon productions in P. The selective left-corner transform of G with respect to L is the CFG</Paragraph>
      <Paragraph position="2"> and $P_1$ contains all instances of the schemata 1. In these schemata, $D \in V$, $w \in T$, and lower case Greek letters range over $(V \cup T)^{*}$. The D-X are new nonterminals; informally they encode a parse state in which a D is predicted top-down and an X has been found left-corner, so $D\text{-}X \Rightarrow^{*}_{\mathcal{LC}_L(G)} \gamma$ only if $D \Rightarrow^{*} X\gamma$.</Paragraph>
      <Paragraph position="3"> Figure 1: [...] original parse tree correspond to left-corner productions; the corresponding local trees (generated by instances of schema 1c) in the selective left-corner transformed tree are also shown shaded. The local tree colored black is generated by an instance of schema 1b.</Paragraph>
      <Paragraph position="5"> The schemata function as follows. The productions introduced by schema 1a start a left-corner parse of a predicted nonterminal D with its leftmost terminal w, while those introduced by schema 1b start a left-corner parse of D with a left-corner A, which is itself found by the top-down recognition of production $A \rightarrow \alpha \in P - L$. Schema 1c extends the current left-corner B up to a C with the left-corner recognition of production $C \rightarrow B\beta$. Finally, schema 1d matches the top-down prediction with the recognized left-corner category.</Paragraph>
      <Paragraph position="6"> Figure 1 schematically depicts the relationship between a chain of left-corner productions in a parse tree generated by G and the chain of corresponding instances of schema 1c. The left-corner recognition of the chain starts with the recognition of $\alpha$, the right-hand side of a top-down production $A \rightarrow \alpha$, using an instance of schema 1b. The left-branching chain of left-corner productions corresponds to a right-branching chain of instances of schema 1c; the left-corner transform in effect converts left recursion into right recursion. Notice that the top-down predicted category D is passed down this right-recursive chain, effectively multiplying each left-corner production by the possible top-down predicted categories. The right recursion terminates with an instance of schema 1d when the left-corner and top-down categories match.</Paragraph>
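      <Paragraph> The way the four schemata generate productions can be illustrated with a small sketch. It is not the authors' implementation: the schemata are instantiated as the prose above describes them (1a: D → w D-w; 1b: D → α D-A for top-down A → α; 1c: D-B → β D-C for left-corner C → B β; 1d: D-D → ε), with no pruning. The representation of productions as (lhs, rhs-tuple) pairs, the encoding of D-X as a hyphenated string (which assumes symbol names contain no hyphen), and the function name are all assumptions made for illustration.

```python
def selective_lc_transform(nonterminals, terminals, productions, left_corner):
    """Instantiate schemata 1a-1d for every predicted category D, unpruned."""
    top_down = [p for p in productions if p not in left_corner]
    new = set()
    for d in nonterminals:
        # 1a: start a left-corner parse of D with a terminal w.
        for w in terminals:
            new.add((d, (w, f"{d}-{w}")))
        # 1b: start a left-corner parse of D with A, found top-down via A -> alpha.
        for a, alpha in top_down:
            new.add((d, alpha + (f"{d}-{a}",)))
        # 1c: extend the left corner B to C using left-corner production C -> B beta.
        for c, rhs in left_corner:
            b, beta = rhs[0], rhs[1:]
            new.add((f"{d}-{b}", beta + (f"{d}-{c}",)))
        # 1d: the left-corner category matches the prediction.
        new.add((f"{d}-{d}", ()))
    return new
```

For the left-recursive grammar S → S a | b with L containing only S → S a, the sketch yields S → b S-S, S-S → a S-S and S-S → ε (plus two unpruned instances of schema 1a), so the left recursion on S has become right recursion on S-S.
      </Paragraph>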
      <Paragraph position="7"> Figure 2 shows how top-down productions from G are recognized using $\mathcal{LC}_L(G)$. When the selective left-corner transform is followed by a one-step $\epsilon$-removal transform (i.e., composition or partial evaluation of schema 1b with respect to schema 1d (Johnson, 1998a; Abney and Johnson, 1991; Resnik, 1992)), each top-down production from G appears unchanged in the final grammar. Full $\epsilon$-removal yields the grammar given by the schemata below.</Paragraph>
      <Paragraph position="8"> Figure 2: [...] by $\mathcal{LC}_L(G)$ involves a left-corner category A-A, which immediately rewrites to $\epsilon$. One-step $\epsilon$-removal applied to $\mathcal{LC}_L(G)$ produces a grammar in which each top-down production $A \rightarrow \alpha$ corresponds to a production $A \rightarrow \alpha$ in the transformed grammar.</Paragraph>
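      <Paragraph position="9"> $D \rightarrow w\ D\text{-}w$ (2a)
$D \rightarrow w$ where $D \Rightarrow^{*}_{L} w$ (2b)
$D \rightarrow \alpha\ D\text{-}A$ where $A \rightarrow \alpha \in P - L$ (2c)
$D \rightarrow \alpha$ where $D \Rightarrow^{*}_{L} A$, $A \rightarrow \alpha \in P - L$ (2d)
$D\text{-}B \rightarrow \beta\ D\text{-}C$ where $C \rightarrow B\beta \in L$ (2e)
$D\text{-}B \rightarrow \beta$ where $D \Rightarrow^{*}_{L} C$, $C \rightarrow B\beta \in L$ (2f)
Moore (2000) introduces a version of the left-corner transform called $LC_{LR}$, which applies only to productions with left-recursive parent and left child categories. In the context of the other transforms that Moore introduces, it seems to have the same effect in his system as the selective left-corner transform does here.</Paragraph>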
    </Section>
    <Section position="2" start_page="356" end_page="356" type="sub_section">
      <SectionTitle>
2.2 Selective left-corner tree transforms
</SectionTitle>
      <Paragraph position="0"> There is a 1-to-1 correspondence between the parse trees generated by G and $\mathcal{LC}_L(G)$. A tree t is generated by G iff there is a corresponding t' generated by $\mathcal{LC}_L(G)$, where each occurrence of a top-down production in the derivation of t corresponds to exactly one local tree generated by an occurrence of the corresponding instance of schema 1b in the derivation of t', and each occurrence of a left-corner production in t corresponds to exactly one occurrence of the corresponding instance of schema 1c in t'. It is straightforward to define a 1-to-1 tree transform $\mathcal{T}_L$ mapping parse trees of G into parse trees of $\mathcal{LC}_L(G)$ (Johnson, 1998a; Roark and Johnson, 1999). In the empirical evaluation below, we estimate a PCFG from the trees obtained by applying $\mathcal{T}_L$ to the trees in the Penn WSJ tree-bank, and compare it to the PCFG estimated from the original tree-bank trees.</Paragraph>
      <Paragraph position="1"> A stochastic top-down parser using the PCFG estimated from the trees produced by $\mathcal{T}_L$ simulates a stochastic generalized left-corner parser, which is a generalization of a standard stochastic left-corner parser that permits productions to be recognized top-down as well as left-corner (Manning and Carpenter, 1997). Thus investigating the properties of PCFGs estimated from trees transformed with $\mathcal{T}_L$ is an easy way of studying stochastic push-down automata performing generalized left-corner parses.</Paragraph>
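      <Paragraph> Estimating a PCFG from a collection of trees, transformed or not, amounts to relative-frequency counting of local trees. The sketch below illustrates that step only; it does not implement the tree transform itself. Trees are assumed to be nested tuples of the form (label, child, ...) whose leaves are plain strings (here the part-of-speech tags), and the function names are hypothetical.

```python
from collections import Counter

def local_trees(tree):
    """Yield (parent, child-labels) for every local tree; leaves are strings."""
    if isinstance(tree, str):
        return
    label, children = tree[0], tree[1:]
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        yield from local_trees(child)

def estimate_pcfg(trees):
    """Relative-frequency (maximum-likelihood) estimate of production probabilities."""
    counts = Counter(p for t in trees for p in local_trees(t))
    totals = Counter()
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {p: n / totals[p[0]] for p, n in counts.items()}
```

Passing transformed trees instead of the original ones would give the PCFG for the corresponding generalized left-corner parser, under the same assumptions.
      </Paragraph>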
    </Section>
    <Section position="3" start_page="356" end_page="357" type="sub_section">
      <SectionTitle>
2.3 Pruning useless productions
</SectionTitle>
      <Paragraph position="0"> We turn now to the problem of reducing the size of the grammars produced by left-corner transforms.</Paragraph>
      <Paragraph position="1"> Many of the productions generated by schemata 1 are useless, i.e., they never appear in any terminating derivation. While they can be removed by standard methods for deleting useless productions (Hopcroft and Ullman, 1979), the relationship between the parse trees of G and $\mathcal{LC}_L(G)$ depicted in Figure 1 shows how to determine ahead of time the new nonterminals D-X that can appear in useful productions of $\mathcal{LC}_L(G)$. This is known as a link constraint. For (P)CFGs there is a particularly simple link constraint: D-X appears in useful productions of $\mathcal{LC}_L(G)$ only if $\exists\gamma \in (V \cup T)^{*}.\ D \Rightarrow^{*}_{L} X\gamma$. If epsilon removal is applied to the resulting grammar, D-X appears in useful productions only if $\exists\gamma \in (V \cup T)^{+}.\ D \Rightarrow^{*}_{L} X\gamma$. Thus one need only generate instances of the left-corner schemata which satisfy the corresponding link constraints.</Paragraph>
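      <Paragraph> The first link constraint can be computed as a simple fixpoint. The sketch below assumes that $D \Rightarrow^{*}_{L} X\gamma$ means that X is reachable from D by repeatedly rewriting the leftmost symbol with productions from L (zero steps allowed); this reading, the (lhs, rhs-tuple) representation and the function name are assumptions, not definitions from the paper. The stricter constraint for the epsilon-removed grammar is not covered.

```python
def link_pairs(nonterminals, left_corner):
    """Pairs (D, X) such that X is a left corner of D using productions in L.

    Only for these pairs does the new nonterminal D-X need to be generated.
    L is assumed to contain no epsilon productions, so rhs[0] always exists.
    """
    reach = {d: {d} for d in nonterminals}   # zero rewrite steps: D reaches D
    changed = True
    while changed:
        changed = False
        for lhs, rhs in left_corner:
            for d in nonterminals:
                if lhs in reach[d] and rhs[0] not in reach[d]:
                    reach[d].add(rhs[0])
                    changed = True
    return {(d, x) for d in nonterminals for x in reach[d]}
```

With L = {S → NP VP, NP → DT NN, NP → NP PP}, the pair (S, DT) is linked but (S, VP) is not, so S-DT would be generated and S-VP would not.
      </Paragraph>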
      <Paragraph position="2"> Moore (2000) suggests an additional constraint on nonterminals D-X that can appear in useful productions of $\mathcal{LC}_L(G)$: D must either be the start symbol of G or else appear in a production $A \rightarrow \alpha D \beta$ of G, for some $A \in V$, $\alpha \in (V \cup T)^{+}$ and $\beta \in (V \cup T)^{*}$.</Paragraph>
      <Paragraph position="3"> It is easy to see that the productions that Moore's constraint prohibits are useless. There is one nonterminal in the tree-bank grammar investigated below that has this property, namely LST. However, in the tree-bank grammar none of the productions expanding LST are left-recursive (in fact, the first child is always a preterminal), so Moore's constraint does not affect the size of the transformed grammars investigated below.</Paragraph>
      <Paragraph position="4"> While these constraints can dramatically reduce both the number of productions and the size of the parsing search space of the transformed grammar, in general the transformed grammar $\mathcal{LC}_L(G)$ can be quadratically larger than G. There are two causes for the explosion in grammar size. First, $\mathcal{LC}_L(G)$ contains an instance of schema 1b for each top-down production $A \rightarrow \alpha$ and each D such that $\exists\gamma.\ D \Rightarrow^{*}_{L} A\gamma$. Second, $\mathcal{LC}_L(G)$ contains an instance of schema 1c for each left-corner production $C \rightarrow B\beta$ and each D such that $\exists\gamma.\ D \Rightarrow^{*}_{L} C\gamma$. In effect, $\mathcal{LC}_L(G)$ contains one copy of each production for each possible left-corner ancestor. Section 2.5 describes further factorizations of the productions of $\mathcal{LC}_L(G)$ which mitigate these causes.</Paragraph>
    </Section>
    <Section position="4" start_page="357" end_page="357" type="sub_section">
      <SectionTitle>
2.4 Optimal choice of L
</SectionTitle>
      <Paragraph position="0"> Because $\Rightarrow^{*}_{L}$ increases monotonically with $\Rightarrow_{L}$ and hence L, we typically reduce the size of $\mathcal{LC}_L(G)$ by making the left-corner production set L as small as possible. This section shows how to find the unique minimal set of left-corner productions L such that $\mathcal{LC}_L(G)$ is not left-recursive.</Paragraph>
      <Paragraph position="1"> Assume $G = (V, T, P, S)$ is pruned (i.e., P contains no useless productions) and that there is no $A \in V$ such that $A \Rightarrow^{+} A$ (i.e., G does not generate recursive unary branching chains). For reasons of space we also assume that P contains no $\epsilon$-productions, but this approach can be extended to deal with them if desired. A production $A \rightarrow B\beta \in P$ is left-recursive iff $\exists\gamma \in (V \cup T)^{*}.\ B \Rightarrow^{*}_{P} A\gamma$, i.e., P rewrites B into a string beginning with A. Let $L_0$ be the set of left-recursive productions in G. Then we claim (1) that $\mathcal{LC}_{L_0}(G)$ is not left-recursive, and (2) that for all $L \subset L_0$, $\mathcal{LC}_L(G)$ is left-recursive.</Paragraph>
      <Paragraph position="2"> Claim 1 follows from the fact that if $A \Rightarrow^{*}_{L_0} B\gamma$ then $A \Rightarrow^{*}_{P} B\gamma$, and from the constraints in section 2.3 on useful productions of $\mathcal{LC}_{L_0}(G)$. Claim 2 follows from the fact that if $L \subset L_0$ then there is a chain of left-recursive productions that includes a top-down production; a simple induction on the length of the chain shows that $\mathcal{LC}_L(G)$ is left-recursive.</Paragraph>
      <Paragraph position="3"> This result justifies the common practice in natural language left-corner parsing of taking the terminals to be the preterminal part-of-speech tags, rather than the lexical items themselves. (We did not attempt to calculate the size of such a left-corner grammar in the empirical evaluation below, but it would be much larger than any of the grammars described there.) In fact, if the preterminals are distinct from the other nonterminals (as they are in the tree-bank grammars investigated below) then $L_0$ does not include any productions beginning with a preterminal, and $\mathcal{LC}_{L_0}(G)$ contains no instances of schema 1a at all. We now turn our attention to the other schemata of the selective left-corner grammar transform.</Paragraph>
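      <Paragraph> The set of left-recursive productions defined in this section can be computed with the same kind of closure used for link constraints, this time over all of P. The following sketch assumes productions are (lhs, rhs-tuple) pairs with non-empty right-hand sides (no epsilon productions, as assumed above); the function name is hypothetical.

```python
def left_recursive_productions(productions):
    """Return the productions A -> B beta for which B =>* A gamma under P."""
    symbols = {lhs for lhs, _ in productions}
    symbols |= {s for _, rhs in productions for s in rhs}
    # reach[s] holds every symbol that can begin a string derived from s.
    reach = {s: {s} for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for s in symbols:
                if lhs in reach[s] and rhs[0] not in reach[s]:
                    reach[s].add(rhs[0])
                    changed = True
    # A -> B beta is left-recursive iff A is a left corner of B.
    return [(a, rhs) for a, rhs in productions if a in reach[rhs[0]]]
```

For a toy grammar with S → NP VP, NP → NP PP, NP → DT NN, VP → VB NP and PP → IN NP, only NP → NP PP is returned, so it would be the only production treated left-corner.
      </Paragraph>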
    </Section>
    <Section position="5" start_page="357" end_page="358" type="sub_section">
      <SectionTitle>
2.5 Factoring the output of $\mathcal{LC}_L$
</SectionTitle>
      <Paragraph position="0"> This section defines two factorizations of the output of the selective left-corner grammar transform that can dramatically reduce its size. These factorizations are most effective if the number of productions is much larger than the number of nonterminals, as is usually the case with tree-bank grammars.</Paragraph>
      <Paragraph position="1"> The top-down factorization decomposes schema 1b by introducing new nonterminals D', where $D \in V$, that have the same expansions that D does in G. Using the same interpretation for variables as in schemata 1, if $G = (V, T, P, S)$ then</Paragraph>
      <Paragraph position="3"> and $P_{td}$ contains all instances of the schemata 1a, 3a, 3b, 1c and 1d.</Paragraph>
      <Paragraph position="4"> $D \rightarrow A'\ D\text{-}A$ where $A \rightarrow \alpha \in P - L$ (3a)
$A' \rightarrow \alpha$ where $A \rightarrow \alpha \in P - L$ (3b)
Notice that the number of instances of schema 3a is less than the square of the number of nonterminals and that the number of instances of schema 3b is the number of top-down productions; the sum of these numbers is usually much less than the number of instances of schema 1b.</Paragraph>
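      <Paragraph> The size argument for top-down factoring can be made concrete by generating the instances of schema 1b alongside those of schemata 3a and 3b. This is an illustrative sketch without link-constraint pruning; the (lhs, rhs-tuple) representation, the string encodings of D-A and A', and the function name are assumptions.

```python
def top_down_factoring(nonterminals, top_down):
    """Return unpruned instance sets for schema 1b, schema 3a and schema 3b."""
    # 1b: one copy of each top-down production per predicted category D.
    s1b = {(d, alpha + (f"{d}-{a}",)) for d in nonterminals for a, alpha in top_down}
    # 3a: D -> A' D-A, one instance per pair (D, A) rather than per production.
    s3a = {(d, (f"{a}'", f"{d}-{a}")) for d in nonterminals for a, _ in top_down}
    # 3b: A' -> alpha, one instance per top-down production.
    s3b = {(f"{a}'", alpha) for a, alpha in top_down}
    return s1b, s3a, s3b
```

With three nonterminals and five top-down productions spread over two left-hand sides, schema 1b has 15 instances while schemata 3a and 3b together have 6 + 5 = 11; the gap widens as the number of productions per nonterminal grows.
      </Paragraph>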
      <Paragraph position="5"> Top-down factoring plays approximately the same role as "non-left-recursion grouping" (NLRG) does in Moore's (2000) approach. The major difference is that NLRG applies to all productions $A \rightarrow B\beta$ in which B is not left-recursive, i.e., $\neg\exists\gamma.\ B \Rightarrow^{+}_{P} B\gamma$, while in our system top-down factorization applies to those productions for which $\neg\exists\gamma.\ B \Rightarrow^{*}_{P} A\gamma$, i.e., the productions not directly involved in left recursion. The left-corner factorization decomposes schema 1c in a similar way using new nonterminals D\X, where $D \in V$ and $X \in V \cup T$.</Paragraph>
      <Paragraph position="7"> The number of instances of schema 4a is bounded by the number of instances of schema 1c and is typically much smaller, while the number of instances of schema 4b is precisely the number of left-corner productions L.</Paragraph>
      <Paragraph position="8"> Left-corner factoring seems to correspond to one step of Moore's (2000) "left factor" (LF) operation. The left factor operation constructs new nonterminals corresponding to common prefixes of arbitrary length, while left-corner factoring effectively only factors the first nonterminal symbol on the right hand side of left-corner productions. While we have not done experiments, Moore's left factor operation would seem to reduce the total number of symbols in the transformed grammar at the expense of possibly introducing additional productions, while our left-corner factoring reduces the number of productions. These two factorizations can be used together in the obvious way to define a grammar transform $\mathcal{LC}^{(td,lc)}_L$, whose productions are defined by schemata 1a, 3a, 3b, 4a, 4b and 1d. There are corresponding tree transforms, which we refer to as $\mathcal{T}^{(td)}_L$, etc., below. Of course, the pruning constraints described in section 2.3 are applicable with these factorizations, and corresponding invertible tree transforms can be constructed.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="358" end_page="514" type="metho">
    <SectionTitle>
3 Empirical Results
</SectionTitle>
    <Paragraph position="0"> To examine the effect of the transforms outlined above, we experimented with various PCFGs induced from sections 2-21 of a modified Penn WSJ tree-bank as described in Johnson (1998b) (i.e., labels simplified to grammatical categories, ROOT nodes added, empty nodes and vacuous unary branches deleted, and auxiliaries retagged as AUX or AUXG). We ignored lexical items, and treated the part-of-speech tags as terminals. As Bob Moore pointed out to us, the left-corner transform may produce left-recursive grammars if its input grammar contains unary cycles, so we removed them using a transform that Moore suggested. Given an initial set of (non-epsilon) productions P, the transformed grammar contains the following productions, where
This transform can be extended to one on PCFGs which preserves derivation probabilities. In this section, we fix P to be the productions that result after applying this unary cycle removal transformation to the tree-bank productions, and G to be the corresponding grammar.</Paragraph>
    <Paragraph position="1"> Tables 1 and 2 give the sizes of selective left-corner grammar transforms of G for various values of the left-corner set L and factorizations, without and with epsilon-removal respectively. In the tables, $L_0$ is the set of left-recursive productions in P, as defined in section 2.4. N is the set of productions in P whose right-hand sides do not begin with a part-of-speech (POS) tag; because POS tags are distinct from other nonterminals in the tree-bank, N is an easily identified set of productions guaranteed to include $L_0$. The tables also give the sizes of maximum-likelihood PCFGs estimated from the trees resulting from applying the selective left-corner tree transforms $\mathcal{T}$ to the tree-bank, breaking unary cycles as described above. For the parsing experiments below we always deleted empty nodes in the output of these tree transforms; this corresponds to epsilon removal in the grammar transform.</Paragraph>
    <Paragraph position="2"> First, note that $\mathcal{LC}_P(G)$, the result of applying the standard left-corner grammar transform to G, has approximately 20 times the number of productions that G has. However $\mathcal{LC}^{(td,lc)}_{L_0}(G)$, the result of applying the selective left-corner grammar transformation with factorization, has approximately 11.4 times the number of productions that G has. Thus the methods described in this paper can in fact dramatically reduce the size of left-corner transformed grammars. Second, note that $\mathcal{LC}^{(td,lc)}_{N}(G)$ is not much larger than $\mathcal{LC}^{(td,lc)}_{L_0}(G)$. This is because N is not much larger than $L_0$, which in turn is because most pairs of non-POS nonterminals A, B are mutually left-recursive.</Paragraph>
    <Paragraph position="3"> Table 1: [...] and tree transforms after pruning with link constraints without epsilon removal. Columns indicate factorization.</Paragraph>
    <Paragraph position="4"> In the grammar and tree transforms, P is the set of productions in G (i.e., the standard left-corner transform), N is the set of all productions in P which do not begin with a POS tag, and $L_0$ is the set of left-recursive productions.</Paragraph>
    <Paragraph position="5"> Table 2: [...] and tree transforms after pruning with link constraints with epsilon removal, using the same notation as Table 1.</Paragraph>
    <Paragraph position="6"> Turning now to the PCFGs estimated after applying tree transforms, we notice that grammar size does not increase nearly so dramatically. These PCFGs encode a maximum-likelihood estimate of the state transition probabilities for various stochastic generalized left-corner parsers, since a top-down parser using these grammars simulates a generalized left-corner parser. The fact that $\mathcal{LC}_P(G)$ is 17 times larger than the PCFG inferred after applying $\mathcal{T}_P$ to the tree-bank means that most of the possible transitions of a standard stochastic left-corner parser are not observed in the tree-bank training data. The state of a left-corner parser does capture some linguistic generalizations (Manning and Carpenter, 1997; Roark and Johnson, 1999), but one might still expect sparse-data problems. Note that $\mathcal{LC}^{(td,lc)}_{L_0}$ is only 1.4 times larger than $\mathcal{T}^{(td,lc)}_{L_0}$, so we expect less serious sparse data problems with the factored selective left-corner transform.</Paragraph>
    <Paragraph position="7"> We quantify these sparse data problems in two ways using a held-out test corpus, viz., all sentences in section 23 of the tree-bank. First, table 3 lists the number of sentences in the test corpus that fail to receive a parse with the various PCFGs mentioned above. This is a relatively crude measure, but correlates roughly with the ratios of grammar sizes, as expected.</Paragraph>
    <Paragraph position="8"> Table 3: [...] in section 23 that do not receive a parse using various [...] from sections 2-21.</Paragraph>
    <Paragraph position="9"> Table 4: [...] transformed trees of sentences in section 23 that do not appear in the corresponding transformed trees from sections 2-21. (The subscript $\epsilon$ indicates epsilon removal was applied.)</Paragraph>
    <Paragraph position="10"> Second, table 4 lists the number of productions found in the tree-transformed test corpus that do not appear in the correspondingly transformed trees of sections 2-21. What is striking here is that the number of missing productions after either of the transforms $\mathcal{T}^{(td,lc)}_{L_0}$ or $\mathcal{T}^{(td,lc)}_{N}$ is approximately the same as the number of missing productions using the untransformed trees, indicating that the factored selective left-corner transforms cause little or no additional sparse data problem. (The relationship between local trees in the parse trees of G and $\mathcal{LC}_L(G)$ mentioned earlier implies that left-corner tree transformations will not decrease the number of missing productions.)</Paragraph>
    <Paragraph position="11"> We also investigate the accuracy of the maximum-likelihood parses (MLPs) obtained using the PCFGs estimated from the output of the various left-corner tree transforms.[1] We searched for these parses using an exhaustive CKY parser. Because the parse trees of these PCFGs are isomorphic to the derivations of the corresponding stochastic generalized left-corner parsers, we are in fact evaluating different kinds of stochastic generalized left-corner parsers inferred from sections 2-21 of the tree-bank. We used the transform-detransform framework described in Johnson (1998b) to evaluate the parses, i.e., we applied the appropriate inverse tree transform $\mathcal{T}^{-1}$ to detransform the parse trees produced using the PCFG estimated from trees transformed by $\mathcal{T}$. By calculating the labelled precision and recall scores for the detransformed trees in the usual manner, we can systematically compare the parsing accuracy of different kinds of stochastic generalized left-corner parsers.
[1] We did not investigate the grammars produced by the various left-corner grammar transforms. Because a left-corner grammar transform $\mathcal{LC}_L$ preserves production probabilities, the highest scoring parses obtained using the weighted CFG $\mathcal{LC}_L(G)$ should be the highest scoring parses obtained using G transformed by $\mathcal{T}_L$.</Paragraph>
    <Paragraph position="12"> Table 5: [...] estimated using various tree-transforms in a transform-detransform framework using test data from section 23.</Paragraph>
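    <Paragraph> Labelled precision and recall compare the labelled constituent spans of a detransformed parse with those of the gold tree. The sketch below shows one common way to compute them; scoring conventions vary (for example, whether the root node is counted), and the paper does not spell out its exact settings, so the choices here are assumptions. Trees are nested tuples (label, child, ...) whose leaves are strings, and the function names are hypothetical.

```python
from collections import Counter

def labelled_brackets(tree, start=0):
    """Return (brackets, end), where brackets counts (label, start, end) spans."""
    if isinstance(tree, str):
        return Counter(), start + 1          # a leaf covers one position
    brackets, pos = Counter(), start
    for child in tree[1:]:
        sub, pos = labelled_brackets(child, pos)
        brackets += sub
    brackets[(tree[0], start, pos)] += 1     # span of this constituent
    return brackets, pos

def precision_recall(parsed, gold):
    """Labelled precision and recall of one parsed tree against its gold tree."""
    p, _ = labelled_brackets(parsed)
    g, _ = labelled_brackets(gold)
    matched = sum(min(n, g[b]) for b, n in p.items())
    return matched / sum(p.values()), matched / sum(g.values())
```

In this sketch every nonterminal node counts as a bracket, including the root; corpus-level scores would sum the matched, parsed and gold counts over all sentences before dividing.
    </Paragraph>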
    <Paragraph position="13"> Table 5 presents the results of this comparison. As reported previously, the standard left-corner grammar embeds sufficient non-local information in its productions to significantly improve the labelled precision and recall of its MLPs with respect to MLPs of the PCFG estimated from the untransformed trees (Manning and Carpenter, 1997; Roark and Johnson, 1999). Parsing accuracy drops off as grammar size decreases, presumably because smaller PCFGs have fewer adjustable parameters with which to describe this non-local information. There are other kinds of non-local information which can be incorporated into a PCFG using a transform-detransform approach that result in an even greater improvement of parsing accuracy (Johnson, 1998b). Ultimately, however, it seems that a more complex approach incorporating back-off and smoothing is necessary in order to achieve the parsing accuracy achieved by Charniak (1997) and Collins (1997).</Paragraph>
  </Section>
</Paper>