Efficient Tabular LR Parsing 
Mark-Jan Nederhof 
Faculty of Arts 
University of Groningen 
P.O. Box 716 
9700 AS Groningen 
The Netherlands 
markj an@let, rug. nl 
Giorgio Satta 
Dipartimento di Elettronica ed Informatica 
Universit£ di Padova 
via Gradenigo, 6/A 
1-35131 Padova 
Italy 
satt a@dei, unipd, it 
Abstract 
We give a new treatment of tabular LR 
parsing, which is an alternative to Tomita's 
generalized LR algorithm. The advantage 
is twofold. Firstly, our treatment is con- 
ceptually more attractive because it uses 
simpler concepts, such as grammar trans- 
formations and standard tabulation tech- 
niques also know as chart parsing. Second- 
ly, the static and dynamic complexity of 
parsing, both in space and time, is signifi- 
cantly reduced. 
1 Introduction 
The efficiency of LR(k) parsing techniques (Sippu 
and Soisalon-Soininen, 1990) is very attractive from 
the perspective of natural language processing ap- 
plications. This has stimulated the computational 
linguistics community to develop extensions of these 
techniques to general context-free grammar parsing. 
The best-known example is generalized LR pars- 
ing, also known as Tomita's algorithm, described by 
Tomita (1986) and further investigated by, for ex- 
ample, Tomita (1991) and Nederhof (1994a). Des- 
pite appearances, the graph-structured stacks used 
to describe Tomita's algorithm differ very little from 
parse fables, or in other words, generalized LR pars- 
ing is one of the so called tabular parsing algorithms, 
among which also the CYK algorithm (Harrison, 
1978) and Earley's algorithm (Earley, 1970) can be 
found. (Tabular parsing is also known as chart pars- 
ing.) 
In this paper we investigate the extension of LR 
parsing to general context-free grammars from a 
more general viewpoint: tabular algorithms can of- 
ten be described by the composition of two construc- 
tions. One example is given by Lang (1974) and 
Billot and Lang (1989): the construction of push- 
down automata from grammars and the simulation 
of these automata by means of tabulation yield dif- 
ferent tabular algorithms for different such construc- 
tions. Another example, on which our presentation 
is based, was first suggested by Leermakers (1989): 
a grammar is first transformed and then a standard 
tabular algorithm along with some filtering condi- 
tion is applied using the transformed grammar. In 
our case, the transformation and the subsequent ap- 
plication of the tabular algorithm result in a new 
form of tabular LR parsing. 
Our method is more efficient than Tomita's algo- 
rithm in two respects. First, reduce operations are 
implemented in an efficient way, by splitting them in- 
to several, more primitive, operations (a similar idea 
has been proposed by Kipps (1991) for Tomita's al- 
gorithm). Second, several paths in the computation 
that must be simulated separately by Tomita's algo- 
rithm are collapsed into a single computation path, 
using state minimization techniques. Experiments 
on practical grammars have indicated that there is 
a significant gain in efficiency, with regard to both 
space and time requirements. 
Our grammar transformation produces a so called 
cover for the input grammar, which together with 
the filtering condition fully captures the specifica- 
tion of the method, abstracting away from algorith- 
mic details such as data structures and control flow. 
Since this cover can be easily precomputed, imple- 
menting our LR parser simply amounts to running 
the standard tabular algorithm. This is very attrac- 
tive from an application-oriented perspective, since 
many actual systems for natural language processing 
are based on these kinds of parsing algorithm. 
The remainder of this paper is organized as fol- 
lows. In Section 2 some preliminaries are discussed. 
We review the notion of LR automaton in Section.3 
and introduce the notion of 2LR automaton in Sec- 
tion 4. Then we specify our tabular LR method in 
Section 5, and provide an analysis of the algorithm 
in Section 6. Finally, some empirical results are giv- 
239 
en in Section 7, and further discussion of our method 
is provided in Section 8. 
2 Definitions 
Throughout this paper we use standard formal lan- 
guage notation. We assume that the reader is famil- 
iar with context-free grammar parsing theory (Har- 
rison, 1978). 
A context-free grammar (CFG) is a 4-tuple G = 
(S, N, P, S), where S and N are two finite disjoint 
sets of terminal and nonterminal symbols, respec- 
tively, S E N is the start symbol, and P is a finite 
set of rules. Each rule has the form A ---* a with 
A E N and a E V*, where V denotes N U E. The 
size of G, written I G I, is defined as E(A--*a)EP \[Aot I; 
by I a I we mean the length of a string of symbols a. 
We generally use symbols A,B,C,... to range 
over N, symbols a, b, c,... to range over S, symbols 
X, Y, Z to range over V, symbols ~, 8, 7,... to range 
over V*, and symbols v, w, z,... to range over S*. 
We write e to denote the empty string. 
A CFG is said to be in binary form if ~ E 
{e} U V t.J N 2 for all of its rules A --* c~. (The binary 
form does not limit the (weak) generative capaci- 
ty of context-free grammars (Harrison, 1978).) For 
technicM reasons, we sometimes use the augment- 
ed grammar associated with G, defined as G t = 
(St, N t, pt, St), where St, t> and <1 are fresh sym- 
bols, S t = SU {t>,<l}, N t = NU {S t } and 
pt = p U {S t ~ t>S<~}. 
A pushdown automaton (PDA) is a 5-tuple .4 = 
(Z, Q, T, qi,, q/in), where S, Q and T are finite sets 
of input symbols, stack symbols and transitions, re- 
spectively; qin E Q is the initiM stack symbol and 
q/i, E Q is the finM stack symbol. 1 Each transition 
has the form 61 ~-~ 62, where 61,82 E Q*, 1 < 161 l, 
1 < 1621 < 2, and z = e or z = a. We generally use 
symbols q, r, s,... to range over Q, and the symbol 
6 to range over Q*. 
Consider a fixed input string v E ~*. A config- 
uration of the automaton is a pair (6, w) consisting 
of a stack 6 E Q* and the remaining input w, which 
is a suffix of the input string v. The rightmost sym- 
bol of 6 represents the top of the stack. The initial 
configuration has the form (qi~, v), where the stack 
is formed by the initial stack symbol. The final con- 
figuration has the form (qi, q/i,, e), where the stack 
is formed by the final stack symbol stacked upon the 
initial stack symbol. 
ZWe dispense with the notion of state, traditionally 
incorporated in the definition of PDA. This does not 
affect the power of these devices, since states can be 
encoded within stack symbols and transitions. 
The application of a transition 81 ~-~ 82 is de- 
scribed as follows. If the top-most symbols of the 
stack are 61, then these symbols may be replaced by 
62, provided that either z = e, or z = a and a is the 
first symbol of the remaining input. Furthermore, if 
z = a then a is removed from the remaining input. 
Formally, for a fixed PDA .4 we define the bina- 
ry relation t- on configurations as the least relation 
satisfying (881, w) ~- (662, w) if there is a transition 
61 ~ 62, and (881, aw) t- (682, w) if there is a tran- 
sition 61 a 82. The recognition of a certain input v 
is obtained if starting from the initial configuration 
for that input we can reach the final configuration 
by repeated application of transitions, or, formally, 
if (qin, v) I"* (q~,, aria, e), where t-* denotes the re- 
flexive and transitive closure of b. 
By a computation of a PDA we mean a sequence 
(qi,,v) t- (61,wl) h... t- (6n,wn), n > 0. A PDA is 
called deterministic if for all possible configurations 
at most one transition is applicable. A PDA is said 
to be in binary form if, for all transitions 61 ~L~ 62, 
we have 1611 < 2. 
3 Ll:t automata 
Let G = (S, N, P, S) be a CFG. We recall the no- 
tion of LR automaton, which is a particular kind 
of PDA. We make use of the augmented grammar 
G t = (st, N t, pt, S t) introduced in Section 2. 
Let !LR : {A ~ a • ~ I (A --~ aft) E pt}. 
We introduce the function closure from 2 I~'R to 2 ILR 
and the function goto from 2 ILR × V to 2 l~rt. For 
any q C ILK, closure(q) is the smallest set such that 
(i) q c closure(q); and 
(ii) (B --~ c~ • Aft) e closure(q) and (A ~ 7) e pt 
together imply (A --* • 7) E closure(q). 
We then define 
goto(q, X) = 
{A ---* ~X • fl I (A --* a • Xfl) E closure(q)}. 
We construct a finite set T~Lp ~ as the smallest collec- 
tion of sets satisfying the conditions: 
(i) {S t ~ t>. S<~} E ~'~Ll=t; and 
(ii) for every q E ~T~LR and X E V, we have 
goto(q, X) E 7~LR, provided goto(q, X) ~ 0. 
Two elements from ~Lt~ deserve special attention: 
qm = {S t --+ t> * S<~}, and q/in, which is defined to 
be the unique set in "~Ll:t containing (S t ~ t>S * <~); 
in other words, q/in = goto(q~n, S). 
240 
For A • N, an A-redex is a string qoqlq2"" "qm, 
m _> 0, of elements from T~Lrt, satisfying the follow- 
ing conditions: 
(i) (A ~ a .) • closure(q,,), for some a = 
X1X~. • • • Xm ; and 
(ii) goto(q~_l, Xk) = qk, for 1 < k < m. 
Note that in such an A-redex, (A --~ • X1Xg.... Xm) 
• closure(qo), and (A ~ X1...Xk * Xk+z'"Xm) 
E qk, for 0 < k < m. 
The LR automaton associated with G is now in- 
troduced. 
Definition 1 .ALR = (S, QLR, TLR, qin, q~n), where 
QLR "- ~'~LR, qin = {S t -'* t> • S<~}, qlin = 
goto(qin, S), and TLR contains: 
(i) q ~ q q', for every a • S and q, q~ • ~LR such 
that q' = goto(q, a); 
(ii) q5 ~-L q q', for every A • N, A-redex q~, and 
q' • TiLa such that q~ = goto(q, A). 
Transitions in (i) above are called shift, transitions 
in (ii) are called reduce. 
4 2LR Automata 
The automata .At, rt defined in the previous section 
are deterministic only for a subset of the CFGs, 
called the LR(0) grammars (Sippu and Soisalon- 
Soininen, 1990), and behave nondeterministical- 
ly in the general case. When designing tabular 
methods that simulate nondeterministic computa- 
tions of ~4LR, two main difficulties are encountered: 
• A reduce transition in .ALrt is an elementary op- 
eration that removes from the stack a number 
of elements bounded by the size of the underly- 
ing grammar. Consequently, the time require- 
ment of tabular simulation of .AL~ computa- 
tions can be onerous, for reasons pointed out 
by Sheil (1976) and Kipps (1991). 
• The set 7~Lrt can be exponential in the size of 
the grammar (Johnson, 1991). If in such a case 
the computations of.ALR touch upon each state, 
then time and space requirements of tabular 
simulation are obviously onerous. 
The first issue above is solved here by re- 
casting .ALR in binary form. This is done 
by considering each reduce transition as a se- 
quence of "pop" operations which affect at most 
two stack symbols at a time. (See also 
Lang (1974), Villemonte de la Clergerie (1993) and 
Nederhof (1994a), and for LR parsing specifically 
gipps (1991) and Leermakers (19925).) The follow- 
ing definition introduces this new kind of automaton. 
I ! Definition 2 A~R = (~, QLR' TLR., qin, q1~n), where 
q, LR ----- 7~LR U ILR, qin = {S t "* I> • S<2}, qJin = 
goto(qin, S) and TLR contains: 
(i) q ~ q q,, for every a • S and q, q' • 7~Lrt such 
that q' = goto(q, a); 
(ii) q A. q (A --* a .), for every q • TiLR and (A 
• ) • closure(q); 
(iii) q (A --* aX • ,8) ~ (A ~ a • X,8), for every 
q • ~LR and (A ~ aX . ,8) • q; 
(iv) q (A --* * c~) A, q q', for every q, q' • 7~LR and 
(A ~ ~) • pt such that q' = goto(q, A). 
Transitions in (i) above are again called shift, tran- 
sitions in (ii) are called initiate, those in (iii) are 
called gathering, and transitions in (iv) are called 
goto. The role of a reduce step in .ALR is taken over 
in .A£K by an initiate step, a number of gathering 
steps, and a goto step. Observe that these steps in- 
volve the new stack symbols (A --~ a • ,8) • ILI~ 
that are distinguishable from possible stack symbols 
{A .-* a • ,8} • '/'~LR- 
We now turn to the second above-mentioned prob- 
lem, regarding the size of set 7dgR. The problem 
is in part solved here as follows. The number of 
states in 7~Lrt is considerably reduced by identify- 
ing two states if they become identical after items 
A --~ cr • fl from ILrt have been simplified to only 
the suffix of the right-hand side ,8. This is rem- 
iniscent of techniques of state minimization for fi- 
nite automata (Booth, 1967), as they have been ap- 
plied before to LR parsing, e.g., by Pager (1970) and 
Nederhof and Sarbo (1993). 
Let G t be the augmented grammar associated 
with a CFG G, and let I2LI~ -- {fl I (A ---, a,8) e 
pt}. We define variants of the closure and 9oto func- 
tions from the previous section as follows. For any 
set q C I2Lt~, closurel(q) is the smallest collection of 
sets such that 
(i) q C elosure'(q); and 
(ii) (Aft) e closure' (q) and (A ---* 7) • pt together 
imply (7) • closure'(q). 
Also, we define 
goto'(q, x) = {,8 I (x,8) ~ closure'(q)}. 
We now construct a finite set T~2Lrt as the smallest 
set satisfying the conditions: 
241 
(i) {S<l} 6 7~2LR; and 
(ii) for every q 6 T~2LI:t and X • V, we have 
goto'(q, X) • T~2LR, provided goto'(q, X) # @. 
As stack symbols, we take the elements from I2LR 
and a subset of elements from (V × ~2Lrt): 
Q2LR = {(X,q) I 3q'\[goto'(q',X) = q\]} U I2LR 
In a stack symbol of the form (X, q), the X serves 
to record the grammar symbol that has been recog- 
nized last, cf. the symbols that formerly were found 
immediately before the dots. 
The 2LK automaton associated with G can now 
be introduced. 
Z T ' ' Definition 3 .A2LR ---~ ( , Q2LR, 2LR, qin, qfin), 
where Q LR is as defined above, = (C>, 
q~. = (S, goto'({S.~}, S)), and T2LR contains: 
(i) (X,q) ~ (X,q) (a,q'), for every a • Z and 
(X, q), (a, q') • Q2Lrt such that q' = goto'(q, a); 
(ii) (X,q) ~+ (X,q)(e), for every (X,q) • Q2LR 
such that e • closure'(q); 
(iii) (Z,q)(~) ~ (Zg), for every (X,q) • Q2LR 
and 19 • q; 
(iv) (X,q) (o~) ~ (X,q) (A,q'), for every (X,q), 
(A,q') • Q2LR and (A ---~ c~) • pt such that 
q' = goto'(q, A). 
Note that in the case of a reduce/reduce conflict 
with two grammar rules sharing some suffix in the 
right-hand side, the gathering steps of A2Lrt will 
treat both rules simultaneously, until the parts of 
the right-hand sides are reached where the two rules 
differ. (See Leermakers (1992a) for a similar sharing 
of computation for common suffixes.) 
An interesting fact is that the automaton .A2LR is 
very similar to the automaton .ALR constructed for 
a grammar transformed by the transformation rtwo 
given by Nederhof and Satta (1994). 2 
5 The algorithm 
This section presents a tabular LR parser, which is 
the main result of this paper. The parser is derived 
from the 2LR automata introduced in the previous 
section. Following the general approach presented 
by Leermakers (1989), we simulate computations of 
2For the earliest mention of this transformation, we 
have encountered pointers to Schauerte (1973). Regret- 
tably, we have as yet not been able to get hold of a copy 
of this paper. 
these devices using a tabular method, a grammar 
transformation and a filtering function. 
We make use of a tabular parsing algorithm which 
is basically an asynchronous version of the CYK al- 
gorithm, as presented by Harrison (1978), extended 
to productions of the forms A ---* B and A ~ 
and with a left-to-right filtering condition. The al- 
gorithm uses a parse table consisting in a 0-indexed 
square array U. The indices represent positions in 
the input string. We define Ui to be Uk<i Uk,i. 
Computation of the entries of U is moderated by 
a filtering process. This process makes use of a 
function pred from 2 N to 2 N, specific to a certain 
context-free grammar. We have a certain nontermi- 
nal Ainit which is initially inserted in U0,0 in order 
to start the recognition process. 
We are now ready to give a formal specification of 
the tabular algorithm. 
Algorithm 1 Let G = (~,N,P,S) be a CFG in 
binary form, let pred be a function from 2 N to 2 N, 
let Ai,,t be the distinguished element from N, and 
let v = ala2. "'an 6 ~* be an input string. We 
compute the least (n+ 1) x (n+ 1) table U such that 
Ainit 6 U0,0 and 
(i) A 6 Uj_ 1,j 
if (A ~ aj) 6 P, A 6 pred(Uj_l); 
(ii) A 6 Uj,j 
if (A --+ e) 6 P, A E pred(Uj); 
(iii) A 6 Ui,j 
if B 6 Ui,~, C 6 Uk,j, (A ---. BC) 6 P, A 6 
pred(Ui); 
(iv) A 6 Uij 
if B 6 Uij, (A ~ B) 6 P, A 6 pred(UO. 
The string has been accepted when S 6 U0,,. 
We now specify a grammar transformation, based 
on the definition of .A2LR. 
Definition 4 Let A2LR = (S, Q2LR, T2LR, ' qin,q~,) 
be the 2L1% automaton associated with a CFG G. 
The 2LR cover associated with G is the CFG 
C2I r (G) = ( Q2Lr , P2I rt, where the rules in 
P2LR are given by: 
(i) (a,q') --* a, 
for every (X, q) ~-~ (X, q) (a, q') E T2LR; 
(ii) (e) ~ ¢, 
for every (X, q) ~-* (X, q) (e) 6 T2LR; 
(iii) (X~) ~ (X, q) (~), 
for every (X, q) (~) ~-* (X~) 6 T2LR; 
242 
(iv) (A,q') --, (a), 
for every (X, q) (or) ~-~ (X, q) (A, q') E T2La. 
Observe that there is a direct, one-to-one correspon- 
dence between transitions of.A2La and productions 
of C2LR(G). 
The accompanying function pred is defined as fol- 
lows (q, q', q" range over the stack elements): 
pred(v) = {q I q'q" ~-~ q E T2La} U 
{q \] q' E r, q' ~*q'qET~La} U 
{q I q'Er, q'q"~-~q'qET2La}. 
The above definition implies that only the tabular 
equivalents of the shift, initiate and goto transitions 
are subject to actual filtering; the simulation of the 
gathering transitions does not depend on elements 
in r. 
Finally, the distinguished nonterminal from the 
cover used to initialize the table is qin'l Thus we 
start with (t>, {S<l)) E U0,0. 
The 2LR cover introduces spurious ambiguity: 
where some grammar G would allow a certain num- 
ber of parses to be found for a certain input, the 
grammar C2Lrt(G) in general allows more parses. 
This problem is in part solved by the filtering func- 
tion pred. The remaining spurious ambiguity is 
avoided by a particular way of constructing the parse 
trees, described in what follows. 
After Algorithm 1 has recognized a given in- 
put, the set of all parse trees can be computed as 
tree(q~n, O, n) where the function tree, which deter- 
mines sets of either parse trees or lists of parse trees 
for entries in U, is recursively defined by: 
(i) tree((a, q'), i, j) is the set {a}. This set contains 
a single parse tree Consisting of a single node 
labelled a. 
(ii) tree(e, i, i) is the set {c}. This set consists of an 
empty list of trees. 
(iii) tree(Xl?,i,j) is the union of the sets T. k (x~),i,j, 
where i < k < j, (8) E Uk,j, and there is at 
least one (X, q) E Ui,k and (X~) ---* (X, q) (8) 
in C2La(G), for some q. For each such k, select 
one such q. We define 7:, ~ = {t.ts I t E ( X fl ),i,j 
tree((X,q),i,k) A ts E tree(fl, k,j)}. Each t. ts 
is a list of trees, with head t and tail ts. 
(iv) tree( ( A, q'), i, j) is the union of the sets 
T. a where (~) E Uij is such that ( A,ql ),i,j ' 
(A, q') ---* (c~) in C2La(G). We define T ~ - (a,q'),i,j 
-- {glue(A, ts) l ts E tree(c~,i,j)}. 
The function 
glue constructs a tree from a fresh root node 
labelled A and the trees in list ts as immediate 
subtrees. 
We emphasize that in the third clause above, one 
should not consider more than one q for given k in 
order to prevent spurious ambiguity. (In fact, for 
fixed X, i, k and for different q such that (X, q) E 
Ui,k, tvee((X, q),i, k) yields the exact same set of 
trees.) With this proviso, the degree of ambiguity, 
i.e. the number of parses found by the algorithm for 
any input, is reduced to exactly that of the source 
grammar. 
A practical implementation would construct the 
parse trees on-the-fly, attaching them to the table 
entries, allowing packing and sharing of subtrees (cf. 
the literature on parse forests (Tomita, 1986; Ell- 
lot and Lang, 1989)). Our algorithm actually only 
needs one (packed) subtree for several ( X, q) E Ui,k 
with fixed X,i,k but different q. The resulting 
parse forests would then be optimally compact, con- 
trary to some other LR-based tabular algorithms, as 
pointed out by Rekers (1992), Nederhof (1993) and 
Nederhof (1994b). 
6 Analysis of the algorithm 
In this section, we investigate how the steps per- 
formed by Algorithm 1 (applied to the 2LR cover) 
relate to those performed by .A2LR, for the same in- 
put. 
We define a subrelation ~+ of t -+ as: (6, uw) ~+ 
(66',w) if and only if (6, uw) = (6, zlz2".'zmw) t- 
(88l,z2..-zmw) ~- ... ~ (68re,w) = (86',w), for 
some m > 1, where I~kl > 0 for all k, 1 < k < m. 
Informally, we have (6, uw) ~+ (6~', w) if configura- 
tion (~8', w) can be reached from (6, uw) without the 
bottom-most part 8 of the intermediate stacks being 
affected by any of the transitions; furthermore, at 
least one element is pushed on top of 6. 
The following characterization relates the automa- 
ton .A2Lrt and Algorithm 1 applied to the 2LR cover. 
Symbol q E Q~Lrt is eventually added to Uij if and 
only if for some 6: 
(q;n,al...an) ~-* (di, ai+l...an) ~+ (~q, aj+l...an). 
In words, q is found in entry Ui,j if and only if, at 
input position j, the automaton would push some 
element q on top of some lower-part of the stack 
that remains unaffected while the input from i to j 
is being read. 
The above characterization, whose proof is not re- 
ported here, is the justification for calling the result- 
ing algorithm tabular LR parsing. In particular, for 
a grammar for which .A2Lrt is deterministic, i.e. for 
an LR(0) grammar, the number of steps performed 
243 
by J42LR and the number of steps performed by the 
above algorithm are exactly the same. In the case of 
grammars which are not LR(0), the tabular LR algo- 
rithm is more efficient than for example a backtrack 
realisation of -A2LR. 
For determining the order of the time complex- 
ity of our algorithm, we look at the most expen- 
sive step, which is the computation of an element 
(Xfl) E Ui,j from two elements (X, q) e Ui,k and 
(t3) E Uk,j, through (X, q) (fl) ,--% (Xfl) E T2LR. In 
a straightforward realisation of the algorithm, this 
step can be applied O(IT2LRI" Iv 13) times (once for 
each i, k,j and each transition), each step taking a 
constant amount of time. We conclude that the time 
complexity of our algorithm is O(\[ T2LR\] • IV \[Z). 
As far as space requirements are concerned, each 
set Ui,j or Ui contains at most I O2w.RI elements. 
(One may assume an auxiliary table storing each Ui.) 
This results in a space complexity O(I Q2LRI" Iv 12). 
The entries in the table represent single stack ele- 
ments, as opposed to pairs of stack elements follow- 
ing Lang (1974) and Leermakers (1989). This has 
been investigated before by Nederhof (1994a, p. 25) 
and Villemonte de la Clergerie (1993, p. 155). 
7 Empirical results 
We have performed some experiments with Algo- 
rithm 1 applied to ,A2L R and .A ~ for 4 practical LR, 
context-free grammars. For ,4 ~ LR a cover was used 
analogous to the one in Definition 4; the filtering 
function remains the same. 
The first grammar generates a subset of the pro- 
gramming language ALGOL 68 (van Wijngaarden 
and others, 1975). The second and third grammars 
generate a fragment of Dutch, and are referred to as 
the CORRie grammar (Vosse, 1994) and the Deltra 
grammar (Schoorl and Belder, 1990), respectively. 
These grammars were stripped of their arguments 
in order to convert them into context-free grammars. 
The fourth grammar, referred to as the Alvey gram- 
mar (Carroll, 1993), generates a fragment of English 
and was automatically generated from a unification- 
based grammar. 
The test sentences have been obtained by au- 
tomatic generation from the grammars, using the 
Grammar Workbench (Nederhof and Koster, 1992), 
which uses a random generator to select rules; there- 
fore these sentences do not necessarily represent in- 
put typical of the applications for which the gram- 
mars were written. Table 1 summarizes the test ma- 
terial. 
Our implementation is merely a prototype, which 
means that absolute duration of the parsing process 
G =(Z,N,P,S) 
ALGOL 68 ~ 
CORRie 
Deltra 
Alvey 
Table 1: The test material: the four grammars and 
some of their dimensions, and the average length of 
the test sentences (20 sentences of various length for 
each grammar). 
4 LR A2LR 
G space \] time space \] time 
ALGOL 68 327 375 234 343 
CORRie 7548 28028 5131 22414 
Deltra 11772 94824 6526 70333 
Alvey 599 1147 354 747 
Table 2: Dynamic requirements: average space and 
time per sentence. 
is little indicative of the actual efficiency of more 
sophisticated implementations. Therefore, our mea- 
surements have been restricted to implementation- 
independent quantities, viz. the number of elements 
stored in the parse table and the number of elemen- 
tary steps performed by the algorithm. In a practical 
implementation, such quantities will strongly influ- 
ence the space and time complexity, although they 
do not represent the only determining factors. Fur- 
thermore, all optimizations of the time and space 
efficiency have been left out of consideration. 
Table 2 presents the costs of parsing the test sen- 
tences. The first and third columns give the number 
of entries stored in table U, the second and fourth 
columns give the number of elementary steps that 
were performed. 
An elementary step consists of the derivation of ! 
one element in QLR or Q2LR from one or two other 
elements. The elements that are used in the filter- 
ing process are counted individually. We give an 
example for the case of .A~R. Suppose we derive an 
element q~ E Ui,j from an element (A -. • c~) E Ui,j, 
warranted by two elements ql,q2 E Ui, ql ~ q2, 
through pred, in the presence of ql (A --* • c~) 
ql q' e T~.~ and q2 (A ---* • c~) ~-~ q2 q' E T~R. We 
then count two parsing steps, one for ql and one for 
q2. 
Table 2 shows that there is a significant gain in 
space and time efficiency when moving from ,4~a to 
244 
G 
ALGOL 68 
CORRie 
Deltra 
Alvey 
.A ! LR 
\[T~LR\[ I \[Q\[a\[ \[ \[T~R\[ 
434 1,217 13,844 
600 1,741 22,129 
856 2,785 54,932 
3,712 8,784 1,862,492 
,A2LR 
In2LRI \[ \[O2La\[ \[ IT2Lrd 
109 724 12,387 
185 821 15,569 
260 1,089 37,510 
753 3,065 537,852 
Table 3: Static requirements. 
,A2LR. 
Apart from the dynamic costs of parsing, we have 
also measured some quantities relevant to the con- 
struction and storage of the two types of tabular LR 
parser. These data are given in Table 3. 
We see that the number of states is strongly re- 
duced with regard to traditional LR parsing. In the 
case of the Alvey grammar, moving from \[T~LR \[ to 
\]T~2LR\[ amounts to a reduction to 20.3 %. Whereas 
time- and space-efficient computation of T~LR for this 
grammar is a serious problem, computation of T~2La 
will not be difficult on any modern computer. Al- 
so significant is the reduction from \[T~R \[ to \[T2LR\[, 
especially for the larger grammars. These quanti- 
ties correlate with the amount of storage needed for 
naive representation of the respective automata. 
8 Discussion 
Our treatment of tabular LR parsing has two impor- 
tant advantages over the one by Tomita: 
* It is conceptually simpler, because we make use 
of simple concepts such as a grammar trans- 
formation and the well-understood CYK al- 
gorithm, instead of a complicated mechanism 
working on graph-structured stacks. 
• Our algorithm requires fewer LR states. This 
leads to faster parser generation, to smaller 
parsers, and to reduced time and space com- 
plexity of parsing itself. 
The conceptual simplicity of our formulation of 
tabular LR parsing allows comparison with other 
tabular parsing techniques, such as Earley's algo- 
rithm (Earley, 1970) and tabular left-corner pars- 
ing (Nederhof, 1993), based on implementation- 
independent criteria. This is in contrast to experi- 
ments reported before (e.g. by Shann (1991)), which 
treated tabular LR parsing differently from the other 
techniques. 
The reduced time and space complexities reported 
in the previous section pertain to the tabular real- 
isation of two parsing techniques, expressed by the 
automata A~, R and A2La. The tabular realisation 
of the former automata is very close to a variant of 
Tomita's algorithm by Kipps (1991). The objective 
of our experiments was to show that the automata 
~4~La provide a better basis than .A~a for tabular LR 
parsing with regard to space and time complexity. 
Parsing algorithms that are not based on the 
LR technique have however been left out of con- 
sideration, and so were techniques for unification 
grammars and techniques incorporating finite-state 
processes. 3 
Theoretical considerations (Leermakers, 1989; 
Schabes, 1991; Nederhof, 1994b) have suggested that 
for natural language parsing, LR-based techniques 
may not necessarily be superior to other parsing 
techniques, although convincing empirical data to 
this effect has never been shown. This issue is dif- 
ficult to resolve because so much of the relative ef- 
ficiency of the different parsing techniques depends 
on particular grammars and particular input, as well 
as on particular implementations of the techniques. 
We hope the conceptual framework presented in this 
paper may at least partly alleviate this problem. 
Acknowledgements 
The first author is supported by the Dutch Organiza- 
tion for Scientific Research (NWO), under grant 305- 
00-802. Part of the present research was done while 
the second author was visiting the Center for Lan- 
guage and Speech Processing, Johns Hopkins Uni- 
versity, Baltimore, MD. 
We received kind help from John Carroll, Job 
Honig, Kees Koster, Theo Vosse and Hans de 
Vreught in finding the grammars mentioned in this 
paper. Generous help with locating relevant litera- 
ture was provided by Anton Nijholt, Rockford Ross, 
and Arnd Ruflmann. 
3As remarked before by Nederhof (1993), the algo- 
rithms by Schabes (1991) and Leermakers (1989) are not 
really related to LR parsing, although some notation 
used in these papers suggests otherwise. 
245 

References 
Billot, S. and B. Lang. 1989. The structure of 
shared forests in ambiguous parsing. In 27th An- 
nual Meeting of the ACL, pages 143-151. 
Booth, T.L. 1967. Sequential Machines and Au- 
tomata Theory. Wiley, New York. 
Carroll, J.A. 1993. Practical unification-based pars- 
ing of natural language. Technical Report No. 
314, University of Cambridge, Computer Labora- 
tory, England. PhD thesis. 
Earley, J. 1970. An efficient context-free parsing al- 
gorithm. Communications of the ACM, 13(2):94- 
102. 
Harrison, M.A. 1978. Introduction to Formal Lan- 
guage Theory. Addison-Wesley. 
Johnson, M. 1991. The computational complexi- 
ty of GLR parsing. In Tomita (1991), chapter 3, 
pages 35-42. 
Kipps, J.R. 1991. GLR parsing in time O(n3). In 
Tomita (1991), chapter 4, pages 43-59. 
Lang, B. 1974. Deterministic techniques for ef- 
ficient non-deterministic parsers. In Automata, 
Languages and Programming, 2nd Colloquium, 
LNCS 14, pages 255-269, Saarbrficken. Springer- 
Verlag. 
Leermakers, R. 1989. How to cover a grammar. In 
27th Annual Meeting of the ACL, pages 135-142. 
Leermakers, R. 1992a. A recursive ascent Earley 
parser. Information Processing Letters, 41(2):87- 
91. 
Leermakers, R. 1992b. Recursive ascent parsing: 
from Earley to Marcus. Theoretical Computer 
Science, 104:299-312. 
Nederhof, M.J. 1993. Generalized left-corner pars- 
ing. In Sixth Conference of the European Chapter 
of the ACL, pages 305-314. 
Nederhof, M.J. 1994a. Linguistic Parsing and Pro- 
gram Transformations. Ph.D. thesis, University 
of Nijmegen. 
Nederhof, M.J. 1994b. An optimal tabular parsing 
algorithm. In 32nd Annual Meeting of the ACL, 
pages 117-124. 
Nederhof, M.J. and K. Koster. 1992. A customized 
grammar workbench. In J. Aarts, P. de Haan, 
and N. Oostdijk, editors, English Language Cor- 
pora: Design, Analysis and Exploitation, Papers 
from the thirteenth International Conference on 
English Language Research on Computerized Cor- 
pora, pages 163-179, Nijmegen. Rodopi. 
Nederhof, M.J. and J.J. Sarbo. 1993. Increasing 
the applicability of LR parsing. In Third Interna- 
tional Workshop on Parsing Technologies, pages 
187-201. 
Nederhof, M.J. and G. Satta. 1994. An extended 
theory of head-driven parsing. In 32nd Annual 
Meeting of the ACL, pages 210-217. 
Pager, D. 1970. A solution to an open problem by 
Knuth. Information and Control, 17:462-473. 
Rekers, J. 1992. Parser Generation for Interactive 
Environments. Ph.D. thesis, University of Am- 
sterdam. 
Schabes, Y. 1991. Polynomial time and space shift- 
reduce parsing of arbitrary context-free gram- 
mars. In 29th Annual Meeting of the ACL, pages 
106-113. 
Schauerte, R. 1973. Transformationen von 
LR(k)-grammatiken. Diplomarbeit, Universit~it 
GSttingen, Abteilung Informatik. 
Schoorl, J.J. and S. Belder. 1990. Computation- 
al linguistics at Delft: A status report. Report 
WTM/TT 90-09, Delft University of Technology, 
Applied Linguistics Unit. 
Shann, P. 1991. Experiments with GLR and chart 
parsing. In Tomita (1991), chapter 2, pages 17- 
34. 
Sheil, B.A. 1976. Observations on context-free pars- 
ing. Statistical Methods in Linguistics, pages 71- 
109. 
Sippu, S. and E. Soisalon-Soininen. 1990. Pars- 
ing Theory, Vol. II: LR(k) and LL(k) Parsing. 
Springer-Verlag. 
Tomita, M. 1986. Efficient Parsing for Natural Lan- 
guage. Kluwer Academic Publishers. 
Tomita, M., editor. 1991. Generalized LR Parsing. 
Kluwer Academic Publishers. 
van Wijngaarden, A. et at. 1975. Revised report on 
the algorithmic language ALGOL 68. Acta Infor- 
matica, 5:1-236. 
Villemonte de la Clergerie, E. 1993. Automates 
Piles et Programmation Dynamique -- DyALog: 
Une application h la Programmation en Logique. 
Ph.D. thesis, Universit@ Paris VII. 
Vosse, T.G. 1994. The Word Connection. Ph.D. 
thesis, University of Leiden. 
