Compact non-left-recursive grammars using the selective 
left-corner transform and factoring* 
Mark Johnson Brian Roark 
Cognitive and Linguistic Sciences, Box 1978
Brown University
Mark_Johnson@Brown.edu Brian_Roark@Brown.edu
Abstract 
The left-corner transform removes left-recursion from (probabilistic)
context-free grammars and unification grammars, permitting simple
top-down parsing techniques to be used. Unfortunately the grammars
produced by the standard left-corner transform are usually much larger
than the original. The selective left-corner transform described in
this paper produces a transformed grammar which simulates left-corner
recognition of a user-specified set of the original productions, and
top-down recognition of the others. Combined with two factorizations,
it produces non-left-recursive grammars that are not much larger than
the original.
1 Introduction 
Top-down parsing techniques are attractive because of their
simplicity, and can often achieve good performance in practice (Roark
and Johnson, 1999). However, with a left-recursive grammar such
parsers typically fail to terminate. The left-corner grammar transform
converts a left-recursive grammar into a non-left-recursive one: a
top-down parser using a left-corner transformed grammar simulates a
left-corner parser using the original grammar (Rosenkrantz and Lewis
II, 1970; Aho and Ullman, 1972). However, the left-corner transformed
grammar can be significantly larger than the original grammar, causing
numerous problems. For example, we show below that a probabilistic
context-free grammar (PCFG) estimated from left-corner transformed
Penn WSJ tree-bank trees exhibits considerably greater sparse data
problems than a PCFG estimated in the usual manner, simply because the
left-corner transformed grammar contains approximately 20 times more
productions. The transform described in this paper produces a grammar
approximately the same size as the input grammar, which is not as
adversely affected by sparse data.
* This research was supported by NSF awards 9720368, 9870676 and
9812169. We would like to thank our colleagues in BLLIP (Brown
Laboratory for Linguistic Information Processing) and Bob Moore for
their helpful comments on this paper.
Left-corner transforms are particularly useful because they can
preserve annotations on productions (more on this below) and are
therefore applicable to more complex grammar formalisms as well as
CFGs; a property which other approaches to left-recursion elimination
typically lack. For example, they apply to left-recursive
unification-based grammars (Matsumoto et al., 1983; Pereira and
Shieber, 1987; Johnson, 1998a). Because the emission probability of a
PCFG production can be regarded as an annotation on a CFG production,
the left-corner transform can produce a CFG with weighted productions
which assigns the same probabilities to strings and transformed trees
as the original grammar (Abney et al., 1999). However, the transformed
grammars can be much larger than the original, which is unacceptable
for many applications involving large grammars.
The selective left-corner transform reduces the transformed grammar
size because only those productions which appear in a left-recursive
cycle need be recognized left-corner in order to remove
left-recursion. A top-down parser using a grammar produced by the
selective left-corner transform simulates a generalized left-corner
parser (Demers, 1977; Nijholt, 1980) which recognizes a user-specified
subset of the original productions in a left-corner fashion, and the
other productions top-down.
Although we do not investigate it in this paper, the selective
left-corner transform should usually have a smaller search space
relative to the standard left-corner transform, all else being equal.
The partial parses produced during a top-down parse consist of a
single connected tree fragment, while the partial parses produced
during a left-corner parse generally consist of several disconnected
tree fragments. Since these fragments are only weakly related (via the
"link" constraint described below), the search for each fragment is
relatively independent. This may be responsible for the observation
that exhaustive left-corner parsing is less efficient than top-down
parsing (Covington, 1994). Informally, because the selective
left-corner transform recognizes only a subset of productions in a
left-corner fashion, its partial parses contain fewer discontiguous
tree fragments and the search may be more efficient.
While this paper focuses on reducing grammar size to minimize sparse
data problems in PCFG estimation, the modified left-corner transforms
described here are generally applicable wherever the original
left-corner transform is. For example, the selective left-corner
transform can be used in place of the standard left-corner transform
in the construction of finite-state approximations (Johnson, 1998a),
often reducing the size of the intermediate automata constructed. The
selective left-corner transform can be generalized to head-corner
parsing (van Noord, 1997), yielding a selective head-corner parser.
(This follows from generalizing the selective left-corner transform to
Horn clauses).
After this paper was accepted for publication we learnt of Moore
(2000), which addresses the issue of grammar size using very similar
techniques to those proposed here. The goals of the two papers are
slightly different: Moore's approach is designed to reduce the total
grammar size (i.e., the sum of the lengths of the productions), while
our approach minimizes the number of productions. Moore (2000) does
not address left-corner tree-transforms, or questions of sparse data
and parsing accuracy that are covered in section 3.
2 The selective left-corner and 
related transforms 
This section introduces the selective left-corner transform and two
additional factorization transforms which apply to its output. These
transforms are used in the experiments described in the following
section. As Moore (2000) observes, in general the transforms produce a
non-left-recursive output grammar only if the input grammar G does not
contain unary cycles, i.e., there is no nonterminal A such that
$A \Rightarrow^+ A$.
2.1 The selective left-corner transform 
The selective left-corner transform takes as input a CFG
$G = (V, T, P, S)$ and a set of left-corner productions
$L \subseteq P$, which contains no epsilon productions; the
non-left-corner productions $P - L$ are called top-down productions.
The standard left-corner transform is obtained by setting L to the set
of all non-epsilon productions in P. The selective left-corner
transform of G with respect to L is the CFG
$\mathcal{LC}_L(G) = (V_1, T, P_1, S)$, where:

$$V_1 = V \cup \{ D\text{-}X : D \in V,\ X \in V \cup T \}$$

and $P_1$ contains all instances of the schemata 1. In these schemata,
$D \in V$, $w \in T$, and lower case greek letters range over
$(V \cup T)^*$. The $D\text{-}X$ are new nonterminals; informally they
encode a parse state in which a D is predicted top-down and an X has
been found left-corner, so
$D\text{-}X \Rightarrow^*_{\mathcal{LC}_L(G)} \gamma$ only if
$D \Rightarrow^*_G X\gamma$.

$$\begin{array}{lll}
D \rightarrow w\; D\text{-}w & & (1a) \\
D \rightarrow \alpha\; D\text{-}A & \text{where } A \rightarrow \alpha \in P - L & (1b) \\
D\text{-}B \rightarrow \beta\; D\text{-}C & \text{where } C \rightarrow B\beta \in L & (1c) \\
D\text{-}D \rightarrow \epsilon & & (1d)
\end{array}$$

Figure 1: Schematic parse trees generated by the original grammar G
and the selective left-corner transformed grammar $\mathcal{LC}_L(G)$.
The shaded local trees in the original parse tree correspond to
left-corner productions; the corresponding local trees (generated by
instances of schema 1c) in the selective left-corner transformed tree
are also shown shaded. The local tree colored black is generated by an
instance of schema 1b.

The schemata function as follows. The productions introduced by
schema 1a start a left-corner parse of a predicted nonterminal D with
its leftmost terminal w, while those introduced by schema 1b start a
left-corner parse of D with a left-corner A, which is itself found by
the top-down recognition of production
$A \rightarrow \alpha \in P - L$. Schema 1c extends the current
left-corner B up to a C with the left-corner recognition of production
$C \rightarrow B\beta$. Finally, schema 1d matches the top-down
prediction with the recognized left-corner category.
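
To make the schemata concrete, the following is a minimal Python sketch that enumerates every instance of schemata 1a-1d for a small grammar. The encoding is an assumption of this sketch rather than part of the presentation above: a production is a pair (lhs, rhs) with rhs a tuple of symbols, a new nonterminal D-X is written as the string "D-X", and the function name and toy grammar are hypothetical. No pruning is applied, so the output still contains useless productions.

```python
def selective_lc_transform(nonterminals, terminals, productions, left_corner):
    """Enumerate all instances of schemata 1a-1d (useless productions are not pruned)."""
    top_down = [p for p in productions if p not in left_corner]   # P - L
    out = set()
    for d in nonterminals:
        for w in terminals:
            out.add((d, (w, f"{d}-{w}")))                    # 1a: D -> w D-w
        for a, alpha in top_down:
            out.add((d, alpha + (f"{d}-{a}",)))              # 1b: D -> alpha D-A
        for c, rhs in left_corner:
            b, beta = rhs[0], rhs[1:]
            out.add((f"{d}-{b}", beta + (f"{d}-{c}",)))      # 1c: D-B -> beta D-C
        out.add((f"{d}-{d}", ()))                            # 1d: D-D -> epsilon
    return out


# Toy grammar (hypothetical): S -> NP v, NP -> NP n, NP -> n
V = ["S", "NP"]
T = ["n", "v"]
P = [("S", ("NP", "v")), ("NP", ("NP", "n")), ("NP", ("n",))]
L = {("NP", ("NP", "n"))}   # only the left-recursive production is recognized left-corner

transformed = selective_lc_transform(V, T, P, L)
for lhs, rhs in sorted(transformed):
    print(lhs, "->", " ".join(rhs) if rhs else "epsilon")
```

In this toy grammar the left-recursive production NP -> NP n gives rise to the right-recursive production S-NP -> n S-NP, which is the conversion of left recursion into right recursion that the schemata are designed to achieve.
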
Figure 1 schematically depicts the relationship between a chain of
left-corner productions in a parse tree generated by G and the chain
of corresponding instances of schema 1c. The left-corner recognition
of the chain starts with the recognition of $\alpha$, the right-hand
side of a top-down production $A \rightarrow \alpha$, using an
instance of schema 1b. The left-branching chain of left-corner
productions corresponds to a right-branching chain of instances of
schema 1c; the left-corner transform in effect converts left recursion
into right recursion. Notice that the top-down predicted category D is
passed down this right-recursive chain, effectively multiplying each
left-corner production by the possible top-down predicted categories.
The right recursion terminates with an instance of schema 1d when the
left-corner and top-down categories match.
Figure 2: The recognition of a top-down production
$A \rightarrow \alpha$ by $\mathcal{LC}_L(G)$ involves a left-corner
category $A\text{-}A$, which immediately rewrites to $\epsilon$.
One-step $\epsilon$-removal applied to $\mathcal{LC}_L(G)$ produces a
grammar in which each top-down production $A \rightarrow \alpha$
corresponds to a production $A \rightarrow \alpha$ in the transformed
grammar.

Figure 2 shows how top-down productions from G are recognized using
$\mathcal{LC}_L(G)$. When the selective left-corner transform is
followed by a one-step $\epsilon$-removal transform (i.e., composition
or partial evaluation of schema 1b with respect to schema 1d (Johnson,
1998a; Abney and Johnson, 1991; Resnik, 1992)), each top-down
production from G appears unchanged in the final grammar. Full
$\epsilon$-removal yields the grammar given by the schemata below.

$$\begin{array}{ll}
D \rightarrow w\; D\text{-}w & \\
D \rightarrow w & \text{where } D \Rightarrow^*_L w \\
D \rightarrow \alpha\; D\text{-}A & \text{where } A \rightarrow \alpha \in P - L \\
D \rightarrow \alpha & \text{where } D \Rightarrow^*_L A,\ A \rightarrow \alpha \in P - L \\
D\text{-}B \rightarrow \beta\; D\text{-}C & \text{where } C \rightarrow B\beta \in L \\
D\text{-}B \rightarrow \beta & \text{where } D \Rightarrow^*_L C,\ C \rightarrow B\beta \in L
\end{array}$$

Moore (2000) introduces a version of the left-corner transform called
$LC_{LR}$, which applies only to productions with left-recursive
parent and left child categories. In the context of the other
transforms that Moore introduces, it seems to have the same effect in
his system as the selective left-corner transform does here.
2.2 Selective left-corner tree transforms
There is a 1-to-1 correspondence between the parse trees generated by
G and $\mathcal{LC}_L(G)$. A tree t is generated by G iff there is a
corresponding t' generated by $\mathcal{LC}_L(G)$, where each
occurrence of a top-down production in the derivation of t corresponds
to exactly one local tree generated by an occurrence of the
corresponding instance of schema 1b in the derivation of t', and each
occurrence of a left-corner production in t corresponds to exactly one
occurrence of the corresponding instance of schema 1c in t'. It is
straightforward to define a 1-to-1 tree transform $\mathcal{T}_L$
mapping parse trees of G into parse trees of $\mathcal{LC}_L(G)$
(Johnson, 1998a; Roark and Johnson, 1999). In the empirical evaluation
below, we estimate a PCFG from the trees obtained by applying
$\mathcal{T}_L$ to the trees in the Penn WSJ tree-bank, and compare it
to the PCFG estimated from the original tree-bank trees. A stochastic
top-down parser using the PCFG estimated from the trees produced by
$\mathcal{T}_L$ simulates a stochastic generalized left-corner parser,
which is a generalization of a standard stochastic left-corner parser
that permits productions to be recognized top-down as well as
left-corner (Manning and Carpenter, 1997). Thus investigating the
properties of PCFGs estimated from trees transformed with
$\mathcal{T}_L$ is an easy way of studying stochastic push-down
automata performing generalized left-corner parses.
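
The correspondence between local trees described above suggests one way to realize the tree transform. The Python sketch below is reconstructed from that correspondence and from schemata 1a-1d; it is not taken from the cited definitions, and the tree encoding (a terminal is a string, any other node is a pair of a label and a list of children), the function names and the example tree are assumptions of this sketch.

```python
def label_of(tree):
    return tree if isinstance(tree, str) else tree[0]


def lc_tree_transform(tree, left_corner):
    """Map a parse tree of G to the corresponding tree of LC_L(G)."""
    if isinstance(tree, str):                 # terminals are left unchanged
        return tree
    d = tree[0]
    # Walk down the left spine while the local tree is a production in L.
    chain, node = [], tree
    while not isinstance(node, str):
        label, children = node
        if (label, tuple(label_of(c) for c in children)) not in left_corner:
            break
        chain.append(node)
        node = children[0]
    # Build the right-branching chain of D-X nodes, innermost node first.
    result = (f"{d}-{d}", [])                                   # schema 1d
    for _, children in chain:                                   # top of spine first
        rest = [lc_tree_transform(c, left_corner) for c in children[1:]]
        result = (f"{d}-{label_of(children[0])}", rest + [result])   # schema 1c
    if isinstance(node, str):
        return (d, [node, result])                              # schema 1a
    kids = [lc_tree_transform(c, left_corner) for c in node[1]]
    return (d, kids + [result])                                 # schema 1b


# Hypothetical example: only NP -> NP PP is recognized left-corner.
L = {("NP", ("NP", "PP"))}
tree = ("S", [("NP", [("NP", ["d", "n"]),
                      ("PP", ["p", ("NP", ["d", "n"])])]),
              ("VP", ["v"])])
transformed_tree = lc_tree_transform(tree, L)
print(transformed_tree)
```

Each local tree of the input yields exactly one local tree built by schema 1b or 1c in the output, and every D-D node rewrites to the empty sequence, so deleting the empty nodes afterwards corresponds to epsilon removal.
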
2.3 Pruning useless productions 
We turn now to the problem of reducing the size of the grammars
produced by left-corner transforms. Many of the productions generated
by schemata 1 are useless, i.e., they never appear in any terminating
derivation. While they can be removed by standard methods for deleting
useless productions (Hopcroft and Ullman, 1979), the relationship
between the parse trees of G and $\mathcal{LC}_L(G)$ depicted in
Figure 1 shows how to determine ahead of time the new nonterminals
$D\text{-}X$ that can appear in useful productions of
$\mathcal{LC}_L(G)$. This is known as a link constraint.
For (P)CFGs there is a particularly simple link constraint:
$D\text{-}X$ appears in useful productions of $\mathcal{LC}_L(G)$ only
if $\exists\gamma \in (V \cup T)^*.\ D \Rightarrow^*_L X\gamma$. If
epsilon removal is applied to the resulting grammar, $D\text{-}X$
appears in useful productions only if
$\exists\gamma \in (V \cup T)^*.\ D \Rightarrow^+_L X\gamma$. Thus one
only need generate instances of the left-corner schemata which satisfy
the corresponding link constraints.
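
The link constraint without epsilon removal can be precomputed as a reachability problem. The Python sketch below assumes that the relation in the constraint is the left-corner relation restricted to productions in L (D is related to X when some production in L with left-hand side D has X as its first right-hand-side symbol), and it computes the reflexive transitive closure of that relation. The function name and the example grammar are hypothetical.

```python
def linked_pairs(nonterminals, left_corner):
    """Return the pairs (D, X) such that D rewrites to X gamma using only productions in L."""
    direct = {}
    for c, rhs in left_corner:
        direct.setdefault(c, set()).add(rhs[0])   # C -> B beta in L links C to B
    linked = set()
    for d in nonterminals:
        seen, stack = {d}, [d]                    # zero steps: D is linked to itself
        while stack:
            for x in direct.get(stack.pop(), ()):
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        linked.update((d, x) for x in seen)
    return linked


# Hypothetical example: NP -> NP PP and NP -> d n are left-corner productions.
V = ["S", "NP", "VP", "PP"]
L = {("NP", ("NP", "PP")), ("NP", ("d", "n"))}
linked = linked_pairs(V, L)
print(sorted(linked))
```

Only nonterminals D-X whose pair (D, X) is in the returned set need to be generated; here that is 5 pairs rather than one for every combination of a nonterminal with a grammar symbol.
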
Moore (2000) suggests an additional constraint on nonterminals
$D\text{-}X$ that can appear in useful productions of
$\mathcal{LC}_L(G)$: D must either be the start symbol of G or else
appear in a production $A \rightarrow \alpha D \beta$ of G, for some
$A \in V$, $\alpha \in (V \cup T)^+$ and $\beta \in (V \cup T)^*$. It
is easy to see that the productions that Moore's constraint prohibits
are useless. There is one nonterminal in the tree-bank grammar
investigated below that has this property, namely LST. However, in the
tree-bank grammar none of the productions expanding LST are
left-recursive (in fact, the first child is always a preterminal), so
Moore's constraint does not affect the size of the transformed
grammars investigated below.
While these constraints can dramatically reduce both the number of
productions and the size of the parsing search space of the
transformed grammar, in general the transformed grammar
$\mathcal{LC}_L(G)$ can be quadratically larger than G. There are two
causes for the explosion in grammar size. First, $\mathcal{LC}_L(G)$
contains an instance of schema 1b for each top-down production
$A \rightarrow \alpha$ and each D such that
$\exists\gamma.\ D \Rightarrow^*_L A\gamma$. Second,
$\mathcal{LC}_L(G)$ contains an instance of schema 1c for each
left-corner production $C \rightarrow B\beta$ and each D such that
$\exists\gamma.\ D \Rightarrow^*_L C\gamma$. In effect,
$\mathcal{LC}_L(G)$ contains one copy of each production for each
possible left-corner ancestor. Section 2.5 describes further
factorizations of the productions of $\mathcal{LC}_L(G)$ which
mitigate these causes.
2.4 Optimal choice of L 
Because $\Rightarrow^*_L$ increases monotonically with $\Rightarrow_L$
and hence L, we typically reduce the size of $\mathcal{LC}_L(G)$ by
making the left-corner production set L as small as possible. This
section shows how to find the unique minimal set of left-corner
productions L such that $\mathcal{LC}_L(G)$ is not left-recursive.
Assume $G = (V, T, P, S)$ is pruned (i.e., P contains no useless
productions) and that there is no $A \in V$ such that
$A \Rightarrow^+ A$ (i.e., G does not generate recursive unary
branching chains). For reasons of space we also assume that P contains
no $\epsilon$-productions, but this approach can be extended to deal
with them if desired. A production $A \rightarrow B\beta \in P$ is
left-recursive iff
$\exists\gamma \in (V \cup T)^*.\ B \Rightarrow^*_P A\gamma$, i.e., P
rewrites B into a string beginning with A. Let $L_0$ be the set of
left-recursive productions in G. Then we claim (1) that
$\mathcal{LC}_{L_0}(G)$ is not left-recursive, and (2) that for all
$L \subset L_0$, $\mathcal{LC}_L(G)$ is left-recursive. Claim 1
follows from the fact that if $A \Rightarrow^*_{L_0} B\gamma$ then
$A \Rightarrow^*_P B\gamma$ and the constraints in section 2.3 on
useful productions of $\mathcal{LC}_{L_0}(G)$. Claim 2 follows from
the fact that if $L \subset L_0$ then there is a chain of
left-recursive productions that includes a top-down production; a
simple induction on the length of the chain shows that
$\mathcal{LC}_L(G)$ is left-recursive.
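
The definition of a left-recursive production translates directly into a reachability test over the left-corner relation of P. The following Python sketch computes the set of left-recursive productions under the same assumptions as the text (no epsilon productions); the function name and the example grammar are hypothetical.

```python
def left_recursive_productions(productions):
    """Return the productions A -> B beta such that B rewrites to a string beginning with A."""
    direct = {}
    for a, rhs in productions:
        direct.setdefault(a, set()).add(rhs[0])   # first right-hand-side symbol of A

    def reaches(start, target):
        # Reflexive transitive closure of the left-corner relation, by depth-first search.
        seen, stack = {start}, [start]
        while stack:
            sym = stack.pop()
            if sym == target:
                return True
            for nxt in direct.get(sym, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {(a, rhs) for a, rhs in productions if reaches(rhs[0], a)}


# Hypothetical grammar with direct (NP, VP) and indirect (A, B) left recursion.
P = [
    ("S", ("NP", "VP")), ("NP", ("NP", "PP")), ("NP", ("d", "n")),
    ("PP", ("p", "NP")), ("VP", ("v", "NP")), ("VP", ("VP", "PP")),
    ("A", ("B", "a")), ("B", ("A", "b")), ("B", ("c",)),
]
L0 = left_recursive_productions(P)
print(sorted(L0))
```

Only four of the nine productions end up in the computed set, so the remaining five can be recognized top-down.
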
This result justifies the common practice in natural language
left-corner parsing of taking the terminals to be the preterminal
part-of-speech tags, rather than the lexical items themselves. (We did
not attempt to calculate the size of such a left-corner grammar in the
empirical evaluation below, but it would be much larger than any of
the grammars described there). In fact, if the preterminals are
distinct from the other nonterminals (as they are in the tree-bank
grammars investigated below) then $L_0$ does not include any
productions beginning with a preterminal, and $\mathcal{LC}_{L_0}(G)$
contains no instances of schema 1a at all. We now turn our attention
to the other schemata of the selective left-corner grammar transform.
2.5 Factoring the output of $\mathcal{LC}_L$
This section defines two factorizations of the output of the selective
left-corner grammar transform that can dramatically reduce its size.
These factorizations are most effective if the number of productions
is much larger than the number of nonterminals, as is usually the case
with tree-bank grammars.
The top-down factorization decomposes schema 1b by introducing new
nonterminals $D'$, where $D \in V$, that have the same expansions that
D does in G. Using the same interpretation for variables as in
schemata 1, if $G = (V, T, P, S)$ then
$\mathcal{LC}_L^{(td)}(G) = (V_{td}, T, P_{td}, S)$, where:

$$V_{td} = V_1 \cup \{ D' : D \in V \}$$

and $P_{td}$ contains all instances of the schemata 1a, 3a, 3b, 1c and
1d.

$$\begin{array}{lll}
D \rightarrow A'\; D\text{-}A & \text{where } A \rightarrow \alpha \in P - L & (3a) \\
A' \rightarrow \alpha & \text{where } A \rightarrow \alpha \in P - L & (3b)
\end{array}$$

Notice that the number of instances of schema 3a is less than the
square of the number of nonterminals and that the number of instances
of schema 3b is the number of top-down productions; the sum of these
numbers is usually much less than the number of instances of schema
1b.
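
The effect of top-down factoring on the number of productions can be seen by enumerating the relevant schema instances for a small grammar. The Python sketch below is illustrative only: it ignores the link constraints, writes D-A and A' as plain strings, and uses hypothetical function names and a hypothetical set of top-down productions.

```python
def schema_1b(nonterminals, top_down):
    """Unfactored: one production D -> alpha D-A per D and per top-down production."""
    return {(d, alpha + (f"{d}-{a}",)) for d in nonterminals for a, alpha in top_down}


def schemata_3a_3b(nonterminals, top_down):
    """Factored: D -> A' D-A per D and per category A (3a), plus A' -> alpha (3b)."""
    heads = {a for a, _ in top_down}
    s3a = {(d, (f"{a}'", f"{d}-{a}")) for d in nonterminals for a in heads}
    s3b = {(f"{a}'", alpha) for a, alpha in top_down}
    return s3a | s3b


# Hypothetical top-down productions P - L.
V = ["S", "NP", "VP"]
top_down = [
    ("S", ("NP", "VP")),
    ("NP", ("d", "n")), ("NP", ("n",)), ("NP", ("d", "a", "n")),
    ("VP", ("v",)), ("VP", ("v", "NP")),
]
unfactored = schema_1b(V, top_down)
factored = schemata_3a_3b(V, top_down)
print(len(unfactored), len(factored))
```

With 3 nonterminals and 6 top-down productions the unfactored schema yields 18 productions, while the factored schemata yield 9 + 6 = 15. The gap widens as the number of productions per category grows, which is the situation described for tree-bank grammars.
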
Top-down factoring plays approximately the same role as
"non-left-recursion grouping" (NLRG) does in Moore's (2000) approach.
The major difference is that NLRG applies to all productions
$A \rightarrow B\beta$ in which B is not left-recursive, i.e.,
$\neg\exists\gamma.\ B \Rightarrow^+ B\gamma$, while in our system
top-down factorization applies to those productions for which
$\neg\exists\gamma.\ B \Rightarrow^*_P A\gamma$, i.e., the productions
not directly involved in left recursion.
The left-corner factorization decomposes schema 1c in a similar way
using new nonterminals $D\backslash X$, where $D \in V$ and
$X \in V \cup T$.
$\mathcal{LC}_L^{(lc)}(G) = (V_{lc}, T, P_{lc}, S)$, where:

$$V_{lc} = V_1 \cup \{ D\backslash X : D \in V,\ X \in V \cup T \}$$

and $P_{lc}$ contains all instances of the schemata 1a, 1b, 4a, 4b and
1d.

$$\begin{array}{lll}
D\text{-}B \rightarrow C\backslash B\; D\text{-}C & \text{where } C \rightarrow B\beta \in L & (4a) \\
C\backslash B \rightarrow \beta & \text{where } C \rightarrow B\beta \in L & (4b)
\end{array}$$

The number of instances of schema 4a is bounded by the number of
instances of schema 1c and is typically much smaller, while the number
of instances of schema 4b is precisely the number of left-corner
productions L.
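
The same kind of count illustrates left-corner factoring. The Python sketch below enumerates instances of schema 1c and of schemata 4a and 4b for a small, hypothetical set of left-corner productions, again without applying link constraints; the string encoding of D-X and of the new nonterminals C\B is an assumption of this sketch.

```python
def schema_1c(nonterminals, left_corner):
    """Unfactored: one production D-B -> beta D-C per D and per production C -> B beta in L."""
    return {(f"{d}-{rhs[0]}", rhs[1:] + (f"{d}-{c}",))
            for d in nonterminals for c, rhs in left_corner}


def schemata_4a_4b(nonterminals, left_corner):
    """Factored: D-B -> C\\B D-C per D and per pair (C, B) (4a), plus C\\B -> beta (4b)."""
    pairs = {(c, rhs[0]) for c, rhs in left_corner}
    s4a = {(f"{d}-{b}", (f"{c}\\{b}", f"{d}-{c}")) for d in nonterminals for c, b in pairs}
    s4b = {(f"{c}\\{rhs[0]}", rhs[1:]) for c, rhs in left_corner}
    return s4a | s4b


# Hypothetical left-corner productions L.
V = ["S", "NP", "VP", "PP"]
L = [
    ("NP", ("NP", "PP")), ("NP", ("NP", "c", "NP")), ("NP", ("NP", "s")),
    ("VP", ("VP", "PP")), ("VP", ("VP", "a")),
]
unfactored = schema_1c(V, L)
factored = schemata_4a_4b(V, L)
print(len(unfactored), len(factored))
```

Here 4 nonterminals and 5 left-corner productions give 20 instances of schema 1c, against 8 instances of schema 4a and 5 of schema 4b.
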
Left-corner factoring seems to correspond to one step of Moore's
(2000) "left factor" (LF) operation. The left factor operation
constructs new nonterminals corresponding to common prefixes of
arbitrary length, while left-corner factoring effectively only factors
the first nonterminal symbol on the right hand side of left-corner
productions. While we have not done experiments, Moore's left factor
operation would seem to reduce the total number of symbols in the
transformed grammar at the expense of possibly introducing additional
productions, while our left-corner factoring reduces the number of
productions.
These two factorizations can be used together in the obvious way to
define a grammar transform $\mathcal{LC}_L^{(td,lc)}$, whose
productions are defined by schemata 1a, 3a, 3b, 4a, 4b and 1d. There
are corresponding tree transforms, which we refer to as
$\mathcal{T}_L^{(td)}$, etc., below. Of course, the pruning
constraints described in section 2.3 are applicable with these
factorizations, and corresponding invertible tree transforms can be
constructed.
3 Empirical Results 
To examine the effect of the transforms outlined above, we
experimented with various PCFGs induced from sections 2-21 of a
modified Penn WSJ tree-bank as described in Johnson (1998b) (i.e.,
labels simplified to grammatical categories, ROOT nodes added, empty
nodes and vacuous unary branches deleted, and auxiliaries retagged as
AUX or AUXG). We ignored lexical items, and treated the part-of-speech
tags as terminals. As Bob Moore pointed out to us, the left-corner
transform may produce left-recursive grammars if its input grammar
contains unary cycles, so we removed them using a transform that Moore
suggested. Given an initial set of (non-epsilon) productions P, the
transformed grammar contains the following productions, where the $A'$
are new non-terminals:

$$\begin{array}{ll}
A \rightarrow \alpha & \text{where } A \rightarrow \alpha \in P,\ A \not\Rightarrow^+ A \\
A \rightarrow D' & \text{where } A \Rightarrow^+ D \Rightarrow^+ A \\
A' \rightarrow \alpha & \text{where } A \rightarrow \alpha \in P,\ A \Rightarrow^+ A,\ \alpha \not\Rightarrow^+ A
\end{array}$$

This transform can be extended to one on PCFGs which preserves
derivation probabilities. In this section, we fix P to be the
productions that result after applying this unary cycle removal
transformation to the tree-bank productions, and G to be the
corresponding grammar.
Tables 1 and 2 give the sizes of selective left-corner grammar
transforms of G for various values of the left-corner set L and
factorizations, without and with epsilon-removal respectively. In the
tables, $L_0$ is the set of left-recursive productions in P, as
defined in section 2.4. N is the set of productions in P whose
right-hand sides do not begin with a part-of-speech (POS) tag; because
POS tags are distinct from other nonterminals in the tree-bank, N is
an easily identified set of productions guaranteed to include $L_0$.
The tables also give the sizes of maximum-likelihood PCFGs estimated
from the trees resulting from applying the selective left-corner tree
transforms $\mathcal{T}$ to the tree-bank, breaking unary cycles as
described above. For the parsing experiments below we always deleted
empty nodes in the output of these tree transforms; this corresponds
to epsilon removal in the grammar transform.
First, note that $\mathcal{LC}_P(G)$, the result of applying the
standard left-corner grammar transform to G, has approximately 20
times the number of productions that G has. However
$\mathcal{LC}_{L_0}^{(td,lc)}(G)$, the result of applying the
selective left-corner grammar transformation with factorization, has
approximately 1.4 times the number of productions that G has. Thus the
methods described in this paper can in fact dramatically reduce the
size of left-corner transformed grammars. Second, note that
$\mathcal{LC}_N^{(td,lc)}(G)$ is not much larger than
$\mathcal{LC}_{L_0}^{(td,lc)}(G)$. This is because N is not much
larger than $L_0$, which in turn is because most pairs of non-POS
nonterminals A, B are mutually left-recursive.

                          none      (td)      (lc)      (td,lc)
$G$                       15,040
$\mathcal{LC}_P$          346,344             30,716
$\mathcal{LC}_N$          345,272   113,616   254,067   22,411
$\mathcal{LC}_{L_0}$      314,555   103,504   232,415   21,364
$\mathcal{T}_P$           20,087              17,146
$\mathcal{T}_N$           19,619    16,349    19,002    15,732
$\mathcal{T}_{L_0}$       18,945    16,126    18,437    15,618

Table 1: Sizes of PCFGs inferred using various grammar and tree
transforms after pruning with link constraints without epsilon
removal. Columns indicate factorization. In the grammar and tree
transforms, P is the set of productions in G (i.e., the standard
left-corner transform), N is the set of all productions in P which do
not begin with a POS tag, and $L_0$ is the set of left-recursive
productions.

                          none      (td)      (lc)      (td,lc)
$\mathcal{LC}_P$          564,430             38,489
$\mathcal{LC}_N$          563,295   176,644   411,986   25,335
$\mathcal{LC}_{L_0}$      505,435   157,899   371,102   23,566
$\mathcal{T}_P$           22,035              17,398
$\mathcal{T}_N$           21,589    16,688    20,696    15,795
$\mathcal{T}_{L_0}$       21,061    16,566    20,168    15,673

Table 2: Sizes of PCFGs inferred using various grammar and tree
transforms after pruning with link constraints with epsilon removal,
using the same notation as Table 1.

Turning now to the PCFGs estimated after applying tree transforms, we
notice that grammar size does not increase nearly so dramatically.
These PCFGs encode a maximum-likelihood estimate of the state
transition probabilities for various stochastic generalized
left-corner parsers, since a top-down parser using these grammars
simulates a generalized left-corner parser. The fact that
$\mathcal{LC}_P(G)$ is 17 times larger than the PCFG inferred after
applying $\mathcal{T}_P$ to the tree-bank means that most of the
possible transitions of a standard stochastic left-corner parser are
not observed in the tree-bank training data. The state of a
left-corner parser does capture some linguistic generalizations
(Manning and Carpenter, 1997; Roark and Johnson, 1999), but one might
still expect sparse-data problems. Note that
$\mathcal{LC}_{L_0}^{(td,lc)}(G)$ is only 1.4 times larger than
$\mathcal{T}_{L_0}^{(td,lc)}$, so we expect less serious sparse data
problems with the factored selective left-corner transform.
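
The size ratios quoted in this section can be recomputed from the production counts reported in Table 1. The short Python sketch below only redoes that arithmetic; the dictionary keys are labels chosen for this sketch.

```python
# Production counts taken from Table 1 (no epsilon removal).
sizes = {
    "G": 15040,
    "LC_P": 346344,          # standard left-corner grammar transform
    "LC_L0_td_lc": 21364,    # factored selective left-corner grammar transform
    "T_P": 20087,            # PCFG estimated from standard left-corner transformed trees
    "T_L0_td_lc": 15618,     # PCFG estimated from factored selective transformed trees
}

ratios = {
    "LC_P / G": sizes["LC_P"] / sizes["G"],
    "LC_L0_td_lc / G": sizes["LC_L0_td_lc"] / sizes["G"],
    "LC_P / T_P": sizes["LC_P"] / sizes["T_P"],
    "LC_L0_td_lc / T_L0_td_lc": sizes["LC_L0_td_lc"] / sizes["T_L0_td_lc"],
}
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

The computed values are roughly 23.0, 1.4, 17.2 and 1.4, in line with the approximate figures given in the text.
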
We quantify these sparse data problems in two ways using a held-out
test corpus, viz., all sentences in section 23 of the tree-bank.
First, table 3 lists the number of sentences in the test corpus that
fail to receive a parse with the various PCFGs mentioned above. This
is a relatively crude measure, but correlates roughly with the ratios
of grammar sizes, as expected.

Transform                     none      (td)      (lc)      (td,lc)
none                          0
$\mathcal{T}_P$               2                   0
$\mathcal{T}_N$               2         0         2
$\mathcal{T}_{L_0}$           0         0         0

Table 3: The number of sentences in section 23 that do not receive a
parse using various grammars estimated from sections 2-21.

Transform                     none      (td)      (lc)      (td,lc)
none                          514
$\mathcal{T}_P$               665                 535
$\mathcal{T}_N$               664       543       639       518
$\mathcal{T}_{L_0}$           640       547       615       522
$\mathcal{T}_{P\epsilon}$     719                 539
$\mathcal{T}_{N\epsilon}$     718       554       685       521
$\mathcal{T}_{L_0\epsilon}$   706       561       666       521

Table 4: The number of productions found in the transformed trees of
sentences in section 23 that do not appear in the corresponding
transformed trees from sections 2-21. (The subscript epsilon indicates
epsilon removal was applied).

Second, table 4 lists the number of productions found in the
tree-transformed test corpus that do not appear in the correspondingly
transformed trees of sections 2-21. What is striking here is that the
number of missing productions after either of the transforms
$\mathcal{T}_{L_0}^{(td,lc)}$ or $\mathcal{T}_N^{(td,lc)}$ is
approximately the same as the number of missing productions using the
untransformed trees, indicating that the factored selective
left-corner transforms cause little or no additional sparse data
problem. (The relationship between local trees in the parse trees of G
and $\mathcal{LC}_L(G)$ mentioned earlier implies that left-corner
tree transformations will not decrease the number of missing
productions).
We also investigate the accuracy of the maximum-likelihood parses
(MLPs) obtained using the PCFGs estimated from the output of the
various left-corner tree transforms.^1 We searched for these parses
using an exhaustive CKY parser. Because the parse trees of these PCFGs
are isomorphic to the derivations of the corresponding stochastic
generalized left-corner parsers, we are in fact evaluating different
kinds of stochastic generalized left-corner parsers inferred from
sections 2-21 of the tree-bank. We used the transform-detransform
framework described in Johnson (1998b) to evaluate the parses, i.e.,
we applied the appropriate inverse tree transform $\mathcal{T}^{-1}$
to detransform the parse trees produced using the PCFG estimated from
trees transformed by $\mathcal{T}$. By calculating the labelled
precision and recall scores for the detransformed trees in the usual
manner, we can systematically compare the parsing accuracy of
different kinds of stochastic generalized left-corner parsers.

^1 We did not investigate the grammars produced by the various
left-corner grammar transforms. Because a left-corner grammar
transform $\mathcal{LC}_L$ preserves production probabilities, the
highest scoring parses obtained using the weighted CFG
$\mathcal{LC}_L(G)$ should be the highest scoring parses obtained
using G transformed by $\mathcal{T}_L$.

Transform                none        (td)        (lc)        (td,lc)
none                     70.8,75.3
$\mathcal{T}_P$          75.8,77.7               74.8,76.9
$\mathcal{T}_N$          75.8,77.6   73.8,75.8   75.5,77.8   72.8,75.4
$\mathcal{T}_{L_0}$      75.8,77.4   73.0,74.7   75.6,77.8   72.9,75.4

Table 5: Labelled recall and precision scores of PCFGs estimated using
various tree-transforms in a transform-detransform framework using
test data from section 23.

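
As a reminder of what labelled precision and recall measure, the Python sketch below scores one hypothesized tree against a gold tree by comparing their labelled constituents. It is a simplified illustration and not the evaluation software used for the experiments: the tree encoding (a terminal is a string, any other node is a pair of a label and a list of children), the function names and the example trees are assumptions, and conventions of standard evaluation tools such as special treatment of punctuation are omitted.

```python
from collections import Counter


def labelled_brackets(tree, start=0, out=None):
    """Collect (label, start, end) for every nonterminal node; return (counter, end)."""
    if out is None:
        out = Counter()
    if isinstance(tree, str):          # a terminal covers exactly one position
        return out, start + 1
    label, children = tree
    end = start
    for child in children:
        _, end = labelled_brackets(child, end, out)
    out[(label, start, end)] += 1
    return out, end


def labelled_precision_recall(gold, test):
    gold_brackets, _ = labelled_brackets(gold)
    test_brackets, _ = labelled_brackets(test)
    matched = sum((gold_brackets & test_brackets).values())   # multiset intersection
    return matched / sum(test_brackets.values()), matched / sum(gold_brackets.values())


# Hypothetical trees over POS-tag terminals d, n, v.
gold = ("S", [("NP", ["d", "n"]), ("VP", ["v", ("NP", ["d", "n"])])])
test = ("S", [("NP", ["d", "n"]), ("VP", [("V'", ["v"]), ("NP", ["d", "n"])])])
precision, recall = labelled_precision_recall(gold, test)
print(precision, recall)
```

The hypothesized tree contains every gold constituent plus one spurious V' node, so recall is 1.0 while precision is 4/5.
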
Table 5 presents the results of this comparison. As reported
previously, the standard left-corner grammar embeds sufficient
non-local information in its productions to significantly improve the
labelled precision and recall of its MLPs with respect to MLPs of the
PCFG estimated from the untransformed trees (Manning and Carpenter,
1997; Roark and Johnson, 1999). Parsing accuracy drops off as grammar
size decreases, presumably because smaller PCFGs have fewer adjustable
parameters with which to describe this non-local information. There
are other kinds of non-local information which can be incorporated
into a PCFG using a transform-detransform approach that result in an
even greater improvement of parsing accuracy (Johnson, 1998b).
Ultimately, however, it seems that a more complex approach
incorporating back-off and smoothing is necessary in order to achieve
the parsing accuracy achieved by Charniak (1997) and Collins (1997).
4 Conclusion 
This paper presented factored selective left-corner grammar
transforms. These transforms preserve the primary benefits of the
left-corner grammar transform (i.e., elimination of left-recursion and
preservation of annotations on productions) while dramatically
ameliorating its principal problems (grammar size and sparse data
problems). This should extend the applicability of left-corner
techniques to situations involving large grammars. We showed how to
identify the minimal set $L_0$ of productions of a grammar that must
be recognized left-corner in order for the transformed grammar not to
be left-recursive. We also proposed two factorizations of the output
of the selective left-corner grammar transform which further reduce
grammar size, and showed that there is only a minor increase in
grammar size when the factored selective left-corner transform is
applied to a large tree-bank grammar. Finally, we exploited the tree
transforms that correspond to these grammar transforms to formulate
and study a class of stochastic generalized left-corner parsers.
This work could be extended in a number of ways. For example, in this
paper we assumed that one would always choose a left-corner production
set that includes the minimal set $L_0$ required to ensure that the
transformed grammar is not left-recursive. However, Roark and Johnson
(1999) report good performance from a stochastically-guided top-down
parser, suggesting that left-recursion is not always fatal. It might
be possible to judiciously choose a left-corner production set smaller
than $L_0$ which eliminates pernicious left-recursion, so that the
remaining left-recursive cycles have such low probability that they
will effectively never be used and a stochastically-guided top-down
parser will never search them.

References 

Stephen Abney and Mark Johnson. 1991. Memory requirements and local
ambiguities of parsing strategies. Journal of Psycholinguistic
Research, 20(3):233-250.

Steven Abney, David McAllester, and Fernando Pereira. 1999. Relating
probabilistic grammars and automata. In Proceedings of the 37th Annual
Meeting of the Association for Computational Linguistics, pages
542-549, San Francisco. Morgan Kaufmann.

Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing,
Translation and Compiling; Volume 1: Parsing. Prentice-Hall, Englewood
Cliffs, New Jersey.

Eugene Charniak. 1997. Statistical parsing with a context-free grammar
and word statistics. In Proceedings of the Fourteenth National
Conference on Artificial Intelligence, Menlo Park. AAAI Press/MIT
Press.

Michael Collins. 1997. Three generative, lexicalised models for
statistical parsing. In The Proceedings of the 35th Annual Meeting of
the Association for Computational Linguistics, San Francisco. Morgan
Kaufmann.

Michael A. Covington. 1994. Natural Language Processing for Prolog
Programmers. Prentice Hall, Englewood Cliffs, New Jersey.

A. Demers. 1977. Generalized left-corner parsing. In Conference Record
of the Fourth ACM Symposium on Principles of Programming Languages,
1977 ACM SIGACT/SIGPLAN, pages 170-182.

John E. Hopcroft and Jeffrey D. Ullman. 1979. Introduction to Automata
Theory, Languages and Computation. Addison-Wesley.

Mark Johnson. 1998a. Finite state approximation of unification
grammars using left-corner grammar transforms. In The Proceedings of
the 36th Annual Conference of the Association for Computational
Linguistics (COLING-ACL), pages 619-623. Morgan Kaufmann.

Mark Johnson. 1998b. PCFG models of linguistic tree representations.
Computational Linguistics, 24(4):613-632.

Christopher D. Manning and Bob Carpenter. 1997. Probabilistic parsing
using left-corner models. In Proceedings of the 5th International
Workshop on Parsing Technologies, pages 147-158, Massachusetts
Institute of Technology.

Yuji Matsumoto, Hozumi Tanaka, Hideki Hirakawa, Hideo Miyoshi, and
Hideki Yasukawa. 1983. BUP: A bottom-up parser embedded in Prolog. New
Generation Computing, 1(2):145-158.

Robert C. Moore. 2000. Removing left recursion from context-free
grammars. In Proceedings of 1st Annual Conference of the North
American Chapter of the Association for Computational Linguistics, San
Francisco. Morgan Kaufmann.

Anton Nijholt. 1980. Context-free Grammars: Covers, Normal Forms, and
Parsing. Springer Verlag, Berlin.

Fernando C.N. Pereira and Stuart M. Shieber. 1987. Prolog and Natural
Language Analysis. Number 10 in CSLI Lecture Notes Series. Chicago
University Press, Chicago.

Philip Resnik. 1992. Left-corner parsing and psychological
plausibility. In The Proceedings of the fifteenth International
Conference on Computational Linguistics, COLING-92, volume 1, pages
191-197.

Brian Roark and Mark Johnson. 1999. Efficient probabilistic top-down
and left-corner parsing. In Proceedings of the 37th Annual Meeting of
the ACL, pages 421-428.

Stanley J. Rosenkrantz and Philip M. Lewis II. 1970. Deterministic
left corner parser. In IEEE Conference Record of the 11th Annual
Symposium on Switching and Automata, pages 139-152.

Gertjan van Noord. 1997. An efficient implementation of the
head-corner parser. Computational Linguistics, 23(3):425-456.
