How to cover a grammar 
Ren6 Leermakers 
Philips Research Laboratories, P.O. Box 80.000 
5600 JA Eindhoven, The Netherlands 
ABSTRACT 
A novel formalism is presented for Earley-like parsers. 
It accommodates the simulation of non-deterministic 
pushdown automata. In particular, the theory is applied 
to non-deterministlc LRoparsers for RTN grammars. 
1 Introduction 
A major problem of computational linguistics is the 
inefficiency of parsing natural language. The most 
popular parsing method for context-free natural lan- 
guage grammars, is the genera/ context-free parsing 
method of Earley \[1\]. It was noted by Lang \[2\], that 
Earley-like methods can be used for simulating a class 
of non-determlnistic pushdown antomata(NPDA). Re- 
cently, Tondta \[3\] presented an algorithm that simulates 
non-determlnistic LRoparsers, and claimed it to be a fast 
Mgorithm for practical natural language processing sys- 
tems. The purpose of the present paper is threefold: 
1 A novel formalism is presented for Earley-like parsers. 
A key rSle herein is played by the concept of bi- 
linear grammaxs. These are defined as context-free 
grammars, that satisfy the constraint that the right 
hand side of each grammar rule have at most two 
non-terminals. The construction of parse matrices 
• for bilinear grammars can be accomplished in cubic 
time, by an algorithm called C-paxser. It includes 
an elegant way to represent the (possibly infinite) 
set of parse trees. A case in point is the use of 
predict functions, which impose restrictions on the 
parse matrix, if part of it is known. The exact form 
and effectiveness of predict functions depend on the 
bilineax grammar at hand. In order to parse a gen- 
era\] context-free grammar G, a possible strategy 
is to define a cover for G that satisfies the bilin- 
ear grammar constraint, and subsequently parse it 
with C-parser using appropriate predict functions. 
The resulting parsers axe named Earley-like, and 
differ only in the precise description for deriving 
covers, and predict functions. 
2 We present the Lang algorithm by giving a bilin- 
ear grammar corresponding to an NPDA. Em- 
ploying the correct predict functions, the parser 
for this grammar is equivalent to Lang's algo- 
rithm, although it works for a slightly different 
class of NPDA's. We show that simulation of non- 
deterministic LR-parsers can be performed in our 
version of the Lang framework. It follows that 
Earley-like Tomita parsers can handle all context- 
free grammars, including cyclic ones, although 
Tomita suggested differently\[3\]. 
3 The formalism is illustrated by applying it to Recur- 
sire Transition Networka(RTN)\[S\]: Applying the 
techniques of deterministic LR-parsing to gram- 
mars written as RTN's has been the subject of re- 
cent studies \[9,10\]. Using this research, we show 
how to construct efficient non-deterministic LR- 
parsers for RTN's. 
2 C-Parser 
The simplest parser that is applicable to all context-free 
languages, is the well-known Cocke-Younger-Kasa~i 
(CYK) parser. It requires the grammar to be cast in 
Chomsky normal form. The CYK parser constructs, 
for the sentence zl ..zn, a parse matrix T. To each part 
zi+1 ..zj of the input corresponds the matrix element T.j, 
the value of which is a set of non-terminals from which 
one can derive zi+1..zj. The algorithm can easily be 
generalized to work for any grammar, but its complexity 
then increases with the number of non-terminals at the 
right hand side of grammar rules. Bilinear grammars 
have the lowest complexity, disregarding linear gram- 
mars which do not have the generative power of general 
context-free grammars. Below we list the recursion re- 
lation T must satisfy for general bilinear grammars. We 
write the grammar as as a four-tuple (N, E, P, S), where 
N is the set of non-terminals, E the set of terminals, P 
the set of production rules, and S 6 N the start sym- 
bol. We use variables I,J,K,L E N, ~1,~2,~z E E*, 
and i,j, kl..k4 as indices of the matrix T 1 . 
I E ~ij -~ 3J, KEN,i<kt<k2~ks~ka<j(J ~ Tktk~^ 
K E Tkak4 A I "* 81JI~2KI~ A ~a = zi+l ..zkt 
AB2 = Zk3÷1..Zk3 A B3 ~-" 2~k4-~1..Zj) 
^Bt = zi+t..zk~ a ~2 = Zk~..zi) 
The relation can be solved for the diagonal elements T,, 
independently of the input sentence. They are equal to 
the set of non-terminals that derive e in one or more 
1 Throughout the paper we identify a gr~ummar rule \[ --* 
with the boolean expression 'l directly derives ~'. 
135 
steps. Algorithms that construct T for given input, will 
be referred to as C-paxsers. The time needed for con- 
structing T is at most a cubic function of the input 
length ~, while it takes an amount of space that is a 
quadratic function of n. The sentence is successfully 
parsed, if S E Ton. From T, one can simply deduce an 
output grammar O, which represents the set of parse 
trees. Its non-termlnals axe triples < I,i,j >, where I 
is a non-termlnal of the original bilineax grammar, and 
i,j are integers between 0 and n. 
< l,i,# >--. #~ < 3,h,I~2 > fl~ < K,h,/~, > #s = 
I E T,i AI --. #13\[~Kfl3 ^ J G Th~h2 ^K G Tk~k, 
Afll = z~+l..z~ ^fl~ ---- z~+~..z~ Afls = z~+~..z# 
< I, i, j >-- fl~ < 3, h, k~ > ~ - I ~ T~j ^ I -- fl~ 3#2 
^J E Tk~ka A fll ---- zi+~..zk~ A/@2 ---- :gk3.1.1 ..Z i 
< I, i,j >--* fla _= I ~ T~# A I --* fl~ ^ & = zi+~..zj 
The grammar rules of O axe such that they generate only 
the sentence that was parsed. The parse trees according 
to the output grammar are isomorphic to the parse trees 
generated by the original grammar. The latter parse 
trees can be obtained from the former by replacing the 
triple non-terminals by their first element. 
Matrix elements of T are such that their members 
cover part of the input. This does not imply that all 
members axe useful for constructing a possible parse of 
the input as a whole. In fact, many are useless for this 
purpose. Depending on the grammar, knowledge of part 
of T may give restrictions on the possibly useful contents 
of the rest of T. Making use of these restrictions, one 
may get more efficient parsers, with the same function- 
ality. As an example, one has the generalized E~rley 
prediction. It involves functions predlct~ : 2 ~ --* 2N(N 
is the set of non-terminais), such that one can prove 
that the useful contents of the Tj~ axe contained in the 
elements of a matrix @ related to T by 
Soo = S~, 
O,~ ffi predictj_,(~.o O~,) m T,~, if j > O, 
where O c, called the initial prediction, in some constant 
set of non-termln~ls that derive (. It follows that T~$ 
can be calculated from the matrix elements O~t with i < 
k, l ~ j, i.e. the occurrences of T at the right hand side 
of the recurrence relation may be replaced by O. Hence 
0~j, j > 0, can be calculated from the matrix elements 
O~t, with ! < j: 
O~j = predict~_~(~ Os~)~ 
{II~J, xe~t,,<~<~<~<~o<_~(3 ~ 0~^ 
K ~. O~s, , AI -- fl~Jfl~Kfls Afl~ = z,+~..z~ 
Afl~ = z~+z..z~ Aria = z~,+z..z~) 
V3aeN, i<k~<_k~<j( 3 ~ Okxk~ A I "-~ fll 3~ 
Aflx = z~+~..zk~ A fl~ ---- Zk~..z~) 
V(! -- ~ ^ ~ = ~,+~..z,)) 
The algorithm that creates the matrix @ in this way, 
scanning the input from left to right, is called a re- 
stricted C-paxser. The above relation does not deter- 
mine the diagonal elements of ~ uniquely, and a re- 
stricted C-paxser is to find the smal\]est solution. Con- 
cerning the gain of efficiency, it should be noted that 
this is very grammax-dependent. For some grammars, 
restriction of the paxser reduces its complexity, while for 
others predict functions may even be counter-productive \[4\]. 
3 Bilinear covers 
A grammar G is said to be covered by a grammar C(G), 
if the language generated by both grammars is identical, 
and if for each sentence the set of parse trees generated 
by G can be recovered from the set of parse trees gen- 
erated by C(G). The grammar C(G) is called a cover 
for G, and we will be interested in covers that axe hi- 
linear, and can thus be parsed by C-paxser. It is rather 
surprising that at the heart of most parsing algorithms 
for context-free languages lies a method for deriving a 
bilineax cover. 
3.1 Earley's method 
Eaxley's construction of items is a clear example of a con- 
struction of a biHneax cover CE(G) for each context-free 
grammar G. The terminals of CE(G) and G axe iden- 
ticai, the non-terminals of Cz(G) axe the items (dotted 
rnies\[1\]) I~, defined as follows. Let the non-terminal de- 
fined by rule i of grammar G be given by N~, then I~ is 
N~ -- a. fl, with lilt + 1 = k (~, # axe used for sequences 
of terminals and non-terminais). We assume that only 
one rule, rule O, of G rewrites the start symbol S. The 
length of the right-hand side of rule i is given by M~ - 1. 
The rules of C~(G) are derived as follows. 
• Let I~ be an item of the form A --* ~ • B~, and 
hence I~ -l be A --, aB. ~. Then if B is a terminal, 
I~ -I ...* I~B, and if B is non-terminal then I~ -I -- 
I~, for all j such that Nj = B. 
• Initial items of the form N~ --- .or rewrite to e: 
• For each i one has the final rule/~ -- I~. 
In \[4\] a similar construction was given, leading to a 
grammar in canonical two-form for each context-free 
grammar. Among other things it differs from the above 
in the appearance of the final rules, which axe indeed 
superfluous. We have introduced them to make the ex- 
tension to RTN's, in section 4, more immediate. 
The description just given, yields a set of production 
rules consisting of sections P~, that have the following 
structure: 
Pi --- ~-,iI211M' ,'fI#-li -- I~ z'~/} t.,l{I~ ( -- flu {I ° -* I!}, 
where z~/ E U, {/~i) u E. Note that the start symbol of 
the cover is/~0. The construction of parse matrices T by 
C-paxser yields the Eaxley algorithm, without its pre- 
diction part. By restricting the parser by the predicto 
function satisfying 
v,edicto( W) - ( X, - ^ x, t), 
the initial prediction 0¢ being the smallest solution of 
s ° = v, dicto(S u }, 
136 
one obtains a conventional Earley parser (predict~ -~ 
U~. {I~ } for k > 0). The cover is such that usually the J 
predict action speeds up the parser considerably. 
There are many ways to define covers with dotted 
rules as non-terminals. For example, from recent work 
by Kruseman Aretz \[6\], we learn a prescription for a 
bilinear cover for G, which is smaller in size compared to 
C~(G), at the cost of rules with longer right hand sides. 
The prescription is as follows (c~, ~, 7, s are sequences 
of terminals and non-termlnaJs, ~ stands for sequences 
of terminals only, and A, B, C are non-terminals): 
• Let I be an item of the form A --* or. Bs, and K is 
an item B --* */-, then J .--, IK~, where either 
J is item A --* c~B~. C~ and ~: = ~C~, or 
J is item A --* ~B~. and s --- 6. 
• Let I be an item of the form A ---, 6 .Bc~ or A -* 6., 
then I --* 6. 
3.2 Lang grammar 
In a similar fashion the items used by Lang \[2\] in 
his algorithm for non-deterministic pushdown automata 
(NPDA) may be interpreted as non-terminals of a hi- 
linear grammar, which we will call the Lang grammar. 
We adopt restrictions on NPDA's similarly to \[2\], the 
main one being that one or two symbols be pushed on 
the stack in a singie move, and each stark symbol is re- 
moved when it is read. If two symbols &re pushed on 
the sta~k, the bottom one must be identical to the sym- 
bol that is removed in the same transition. Formally we 
write an NPDA as & 7-tuple (Q, E, r, 6, q0, Co, F), where 
Q is the set of state symbols, E the input alphabet, r 
the pnshdown symbols, 6 : Q x (I" tJ {e}) × (E U {¢}) 
--* 2 Qx((~}uru(rxr)) the transition function, qo E Q the 
initial state, ¢0 E 1` the start symbol, and F C_ Q is the 
set of final states. If the automaton is in state p, and ¢~ 
is the top of the stack, and the current symbol on the 
input tape is It, then it may make the following eight 
types of moves: 
if (r, e) E 6(p, e, e): gO to state r 
if (r, e) E 6(p, or, e): pop ~, go to state r 
if (r, 3") ~ 6(p, a, e): pop ~, push 3', go to state r 
if (r, e) ~ 6(p, e, It): shift input tape, go to state r 
if (r, 3') E 6(p, e, It): push 7, shift tape, go to r 
if (r, e) ~ 6(p, c~, It): pop ~, shift tape, go to r 
if (r, 3") ~ 6(p, ¢~, It): pop c~, push % shift tape, go to r 
if (r, 3"or) ~ 6(p, ~, y): push % shift tape, go to r 
We do not allow transitions such that (r, ~r) ~ 6(p, e, e), 
or (r, "yo~) ~ 6(p, ~, e), and assume that the initial state 
can not be reached from other states. 
The non-terminals of the Lang grammar are the start 
symbol 3 and four-tuple entities (Lang's 'items') of the 
form < q, c~,p, ~ >, where p and q axe states, and cr and 
stack symbols. The idea is that iff there exists a com- 
putation that consumes input symbols zi..zj, starting 
at state p with a stack ~0 (the leftmost symbol is the 
top), and ending in state q with stack ~0, and if the 
stack fl(o does not re-occur in intermediate configura~ 
tions, then < q,a,p,~ >---" z~..zj. The rewrite rules 
of the La~g grammar are defined as follows (universal 
quantification over p, q, r, s E Q; ~, ~, 7 E 1`; z E ~, t.J e, 
It E E is understood): 
S -*< p,a, qo,¢0 >-p E F (final rules) 
< r,~,s, 7 >--,< q,~,s, 7 >< p,c~,q,/3 > z ---- 
(,', ~) ~ 6(p, ~, ~) 
< r, 7, q, ~ >--"< P, ct, q, ~ > z 
((,', ~) ~ 6(,,,,, ~, z))V ((,', '0 E 5(p, e, ,~) ^ (~ = 7)) 
< r, 7,P,a >---, It 
((,, ~) ~ 6(p, ~, It))v ((,, ~) ~ ~(p, ~, It)) 
< q0, ~0, g0, ¢0 >--* e (initial rule) 
From each NPDA one may deduce context-free gram- 
mars that generate the same language \[5\]. The above 
construction yields such a grammar in bilinear form. 
It only works for automata, that have transitions like 
we use above. Lang grammars are rather big, in the 
rough form given above. Many of the non-terminals do 
not occur, however, in the derivation of any sentence. 
They can be removed by a standard procedure \[5\]. In 
addition, during parsing, predict functions can be used 
to limit the number of possible contents of parse ma- 
trix elements. The following initial prediction and pre- 
dict functions render the restricted C-parser functionally 
equivalent to Lang's original algorithm, albeit that Lang 
considered & class of NPDA's which is slightly different 
from the class we alluded to above: 
s ° = {< q0,¢0,q0,¢0 >} 
predictk(L) = ~ if k = 0 else 
predic~h(L) -- {< s,~,q,~ > 13,,~ < ¢,~,r, 3" >~ L} 
u{Slk ffi n} (n is sentence length). 
The Tomita parser \[3\] simulates an NPDA, con- 
structed from a context-free grammar via LR-parsing tw 
hies. Within our formalism we can implement this idea, 
and arrive at an Earley-like version of the Tomita parser, 
which is able to handle general context-free grammars, 
including cyclic ones. 
4 Extension to RTN's 
In the preceding section we discussed various ways of 
deriving bilinear covers. Reversely, one may try to dis- 
cover what kinds of grammars are covered by certain 
bllinear grammars. 
A billnear grammar C~(G), generated from a context- 
free grammar by the Earley prescription, has peculiar 
properties. In general, the sections P~ defined above con- 
stitute regular subgrammars, with the ~ as terminals. 
Alternatively, P~ may be seen as a finite state automa- 
ton with states I~. Each rule I~ -l --.//Jz~ corresponds 
to a transition from I~ to I~ -l labeled by z~. This cot- 
respondence between regular grammars and finite state 
137 
automata is in fact a special instance of the correspon- 
dence between Lang bilinear grammars and NPDA's. 
The Pi of the above kind are very restricted finite 
state automata, generating only one string. It is a natu- 
ral step to remove this restriction and study covers that 
are the union of general regular subgrammars. Such a 
grammar will cover a grammar, consisting of rules of 
the form N~ -. ~, where ~ is a regular expression of 
terminals and non-terminals. Such grammars go under 
the names of RTN grammars \[8\], or extended context- 
free grammars \[9\], or regular right part grammars \[10\]. 
Without loss of generality we may restrict the format 
of the fufite state automata, and stipulate that it have 
one initial •tale I~' and one final state/~, and only the 
following type of rules: 
• final rules P, -. I~ 
• rules I I -- .\[~z, where z ~ Um{J°m} U ~, k <> 0 
and j <> M~. 
• the initial rule I/M~ -- (. 
For future reference we define define the set I of non- 
terminals as I = U,${I~}, and its subset/o = U,{/~i }. 
A covering prescription that turns an RTN into a set 
of such subgrammars, reduces to C~ if applied to normal 
context-free grammars, and will be referred to by the 
same name, although in general the above format does 
not determine the cover uniquely. For some example 
definitions of items for RTN's (i.e. the I~), see \[1,9\]. 
5 The CNLR Cover 
A different cover for RTN grammars may be derived 
from the one discussed in the previous section. So 
our starting point is that we have a biline&r grammar 
C£(G), consisting of regular subgrammars. We (approx- 
imately) follow the idea of Tomita, and construct an 
NPDA from an LR(O)-antomaton, whose states are sets 
of items. In our case, the items are the non-terminals 
of C~(G). The full specification of the automaton is ex- 
tracted from \[9\] in a straightforward way. Subsequently, 
the general prescription of chapter 3 yields a bilinear 
grammar. In this way we arrive at what we would like to 
call the canonical non-deterministic LR-parser (CNLR 
parser, for short). 
5.1 LR(0) states 
In order to derive the set Q of LR(0) states, which are 
subset• of I, we first need a few definitions. Let • be an 
element of 2 I, then closure(s) is the smMlest element of 
2 x, such that 
s c closure(s)^ ((~! ~ ~osure(s)^ (xp - xl~)) 
x= ~- ~ aos.re(s)) 
Similarly, the sets gotot(s, z), and goto.j(s, z), where z E 
/o U E, are defined as 
goto~(s, ffi) = closu,e({~'l 
II ~ s ^ (I,* --. I!~) ^ j <> M,}) 
goto~(s, ~) = closure({I?lI, ~' ~ • ^ (Ip - I~'ffi)}) 
The set Q then is the smallest one that satisfies 
aosnre({&~°}) ~ q^ (~ ~ q * 
(gaot(s, =) = O V gotot(s, z) ~ q)^ 
Oao2(,, z) = O v go,o2(s, ~) ~ q)) 
The automaton we look for can be constructed in terms 
of the LR(0) states. In addition to the goto function•, 
we will need the predicate reduce, defined by 
,'edna(s,_:) -- 3,,((~ -- X~') ^Xl' ~ s). 
A point of interest is the possible existence of •tacking 
conflicts\[9\]. These arise if for some s, z both gotol (s, z) 
and goto2(a, x) are not empty. Stacking conflict• cause 
an increase of non-determinism that can always be 
avoided by removing the conflicts. One method for do- 
ing this has been detailed in \[9\], and consist• of the split- 
ting in parts of the right hand side of grammar rule• that 
cause conflicts. Here we need not and will not assume 
anything about the occurrence of stacking conflict•. 
Grammars, of which Earley cover• do not give rise 
to stacking conflicts, form a proper subset of the set 
of extended context-free grammars. It could very well 
be that natural language grammar•, written as RTN's in 
order to produce 'natural' syntax trees, generally belong 
to this subset. For an example, see section 6. 
5.2 The automaton 
To determine the automaton we specify, in addition to 
the set of states Q, the set of stack symbols F ---- QUI°u 
{Co}, the initial state q0 = closure({IoM°}), the final 
states F ffi {slrednce(s, ~)}~ and the transition function 
& 
6(s, -f, y) = {(t, q'f)l "f ~/°A 
(0 = goto~(s, y) ^ q ffi s) v(~ = gotol(s, y) ^ q = +))} 
6(8,-r, ¢) -- {(t, q)l~ E/°h 
((t = gotot (s, "f) Aq = ¢)V ((t = goto2 (s, 7) A q = s))} 
u{(~, ~)l'f ~ q ^ reduce(s, ~)} 
5.3 The grammar 
From the automaton, which is of the type discussed in 
section 3.2, we deduce the bilinear grammar 
S --< s,~,q0,¢0 >= reduce(s,~) 
< t,r,q,~ >---~< s,r,q,/~ > y = t = gotoz(s,y) 
< t, s, s, r >--. y -- t = goto2(•, y) 
< t,#,p,~ >-.< q,~,p,~ > < s,/°, ,q,~ > 
- t = gotol(s,l °) 
< t,s,q,~ >-.< s, I2,q,~ >=- t = goto~(•,l'~,) 
< p,l~,q,~ >--*< s,p,q,# > -- reduce(s,I °) 
< qo, Co, qo,¢o >"* ~, 
where $,t,q,p E Q, r E QU{C0}, ~,/~ E r, y E E. 
A• was mentioned in section 3.2, this grammar can be 
reduced by a standard algorithm to contain only useful 
non-terminals. 
138 
5.3.1 A reduced form 
If the reduction algorithm of \[5\] is performed, it turns 
out that the structure of the above grammar is such that 
useful non-terminals < p, ¢~, q, ~ > satisfy 
a ~Q=~.otfq 
~f~Q=~p=q 
Furthermore, two non-terminals that differ only in their 
fourth tuple-element always derive the same strings of 
terminals. Hence, the fourth element can safely be dis- 
carded, as can the second if it is in Q and the first if 
the second is not in Q. The non-termlnals then become 
pairs < ~, s >, with ~ ~ I' and s ~ Q. For such non- 
terminals, the predict functions, mentioned in section 2, 
must be changed: 
0 ° = {< ~o,~o >} 
pcedia~(L) = 0 if k = 0 else 
predicts(L) = {< ~, ~ > 13~ < s, q >E L} U {Sit = n} 
The grammar gets the general form 
S --*< s, qo >---- reduce(s,/~o) 
< t,q >--*< ~,q > //= t = gotot(s, 9) 
< t, s >--* y ---- t = gotoa(s, y) 
< ~,0 >-< ,,~ >< P,,, > - ~ = ~oto:(,,~) 
< ~,, >-< ~, s >= ~ = ~o~o~(,, ~) 
< ~, q >-<., ~ > __. reau~(s, ~) 
Note that the terminal < q0, q0 > does not appear in 
this grammar, but will appear in the parse matrix be- 
cause of the initial prediction 0 c. Of course, when the 
automaton is fully specified for a particular language, 
the corresponding CNLR grammar can be reduced still 
further, see section 6.4. 
5.3.2 Final form 
Even the grammar in reduced form contains many non- 
terminals that derive the same set of strings. In partic- 
ular, all non-terminals that only differ in their second 
component generate the same language. Thus, the sec- 
ond component only encodes information for the predict 
functions. The redundancy can be removed by the fol- 
lowing means. Define the function ¢ : I' -. 2 Q, such 
that 
~(~r) ---- {s{ < or, s > is a useful non-terminal of the 
above grammar}. 
Then we may simply parse with the 'bare' grammar, the 
non-terminals of which are the automaton stack symbols 
F: 
S --* S ~ reduce(s, ~0) 
t --. sy --t =.gotoz(s,y) 
* -- P, - ~,(~ = goto20, ~,)) 
I~, -. ~ - reduce(s, I°), 
using the predict functions 
0 ° = {qo} 
predicth(L) = ~ if k = 0 else 
preaiah(Z,) = {~1~,(" ~ L^, ~ ~(~))} u {Slk = .}. 
The function ¢ can also be deduced directly from the 
bare grammar, see section 7. 
5.4 Parse trees 
Each parse tree r according to the original grammar can 
be obtained from a corresponding parse tree t according 
to the cover. Each subset of the set of nodes of t is par- 
tially ordered by the relation 'is descendant of'. Now 
consider the set of nodes of t that correspond to non- 
terminals/~. The 'is descendant of' ordering defines a 
projected tree that contains, apart from the terminals, 
only these nodes. The desired parse tree r is now ob- 
tained by replacing in the projected tree, each node 1 ° 
by a node labeled by N~, the left hand side of grammar 
rule i of the original grammar. 
6 Example 
The foregoing was rather technical and we will try to re- 
pair this by showing, very explicitly, how the formalism 
works for a small example grammar. In particular, we 
will for a small RTN grammar, derive the Earley cover 
of section 4, and the two covers of sections 5.3.1 and 
5.3.2. 
6.1 The grammar 
The following is a simple grammar for finite subordinate 
clauses in Dutch. 
$ -* conj NP VP 
VP --* \[NP\] {PP} verb \[S\] 
PP --* prep NP 
NP --* det noun {PP} 
So we have four regular expressions defining No = S, 
N1 ffi V P, N2 = P P, N3 -- N P. 
6.2 The Earley cover 
The above grammar is covered by four regular subgrarn- 
m aA's" 
~0 - z~;I~ - I0~z,°; Zo ~ - I~; Ig - Io`co.j; Io' - 
- x~;g - I~;II - I~Ig;x ~, - I~erb;X~ - x?~;x~ 
- I,*~;P, - ~?,,erb;¢? - z~z°;z~ - 
x~ &; P, - Xb, erb; x~ - 
z** .o.; ~ - It ae*; xt - 
Note that the Mi in this case turn out as M0 = 4, Mz = 
5, M~ = 3, M3 = 4. 
139 
6.3 The automaton 
The construction of section 5.1 yields the following set 
of states: 
qo = {I~}; ql = {I~,I~}; q2 = {~,I\[,I~,~}; 
qa = {I~}; q, -- {IoI }; qs ffi {I~,I$}; q* = {I~,I~}; 
q, = {Xo~, x,=}; qs = {P,,xD;qo = {zL xD; 
qlO = {R};q- = {R}; ¢12 = {xLR} 
The transitions axe grouped into two parts. First we list 
the function goto~: 
goto2(¢0, ~o,=~) = ~; goto=(¢l, det) ffi ¢~; 
go=o.(q~, P.) ffi qs; OO=O~(q2, ~) ffi ~s; 
goto2(,2, verb) ffi q.; goto~(~2, prep) = ~; 
go¢o2(q2, de0 = q~; got~(~, prep) = qs; 
goto~(qs,prep) = qs; goto~(qr, conj) -- ql; 
goto~(qs, det) = qa; goto~(qs,prep) = qs; 
goto2(ql=, prep) "J-- qs 
Likewise, we have the gotot function, which gives the 
non-stacking transitions for our grammar: 
gotol (ql , ~) = q'a; gotol (q,, I~ ) = q,; 
gotol (q~, noun) = q~; gotol (qs, g) ---- qs; 
gotol(qs, verb) = ~,; goto~(qs, ~=) = qs; 
goto, (~, , Po ) = elo; goto, (es, ~) = q. ; 
go,o, (e., ~) = el=; go,o, (q,=, g) = e,, 
The predicate reduce holds for six pairs of states and 
non-terminals: 
redu~O,, Po); redu=O,o, ~); redffi~(q,, ~); 
reduce(q,l , \]~=); reduce(q,, g); reduce(ql=, l~a ) 
6.4 CNLR parser 
Given the automaton, the CNLR grammar follows ac- 
cording to section 5.3. After removal of the useless non- 
terminals we arrive at the following grammar, which is 
of the format of section 5.3.1. 
S ..--,< q4,qo > 
< q~,q >-.< q~,q > noun, where q E \[ql,q~,qs\] 
< qT, q~ >--*< qs,q~ > verb 
< q~,q >-* conj, where q E \[qo, qT\] 
< q~,q >--* det, where q E \[qt,q~,qs\] 
< q?, q2 >--* verb 
< qs,q >"* prep, where q E \[q~,q~,qs,qe,q~\] 
< q~,q >-*< ql,q > </~,q~ >, where q ~ \[qo,qT\] 
< qt,q >--*< q~,q > </~t,q~ >, where q E \[qo, qT\] 
< qs, q2 >'-*< qs,q~ >< I~, qs > 
< qs, q~ >'-'*< qs,q~ ></~,qs > 
< qlo, q2 >"'*< ql', q2 >< ~0, q? > 
< q~,q >-'*< qs,q > < /~,qs >, where q E 
\[q~, ~s, qs, w, q~2\] 
< ql2, q >-"*< qs, q > < ~,q9 >, where q E \[ql,q2,qs\] 
< q12, q >"*< ql2, q > < /~2,q12 >, where q E 
\[~,,q2,qd 
< qs,~ >-*< ~,q2 >, < qs,q2 >-.< ~,q2 > 
< I~o,qv >--.< q4,q7 >, < I~l,q2 >-.< qlo,q2 > 
"</~x,q2 >-'*< qT, q2 > 
< \]~2,q >"~< qll,q >, where q E \[q2,qs,qe,qo, q12\] 
< I°,q >-*< qs, q >, where q E \[qx,q2,qs\] 
</~3,q >'-'~< q12,q >, where q E \[ql,q2,qs\] 
From this grammar, the function ¢ can be deduced. It 
is given by 
~(¢1) ffi ~(q2 ffi ~(q.) = \[¢0, q,\] 
~r(q3) -- ~(qg) --- a(q12) ---- ~(I °) = \[ql, q2, qs\] 
.(q~) = ~(¢s) = #(q,) = ~(q~0) = ~(:) = \[q2\] 
~0s) = ~(q-) = ~(~) = \[q2, q~, q~, q~, q12\] 
~(g) = \[q,l 
Either by stripping the above cover, or by directly de- 
ducing it ~om the automaton, the bare cover can be 
obtained. We list it here for completeness. 
S -* q4, q9 -* q3noun, q? "* qsverb 
ql -* conj, q3 --* det, q7 "* verb 
qs "* prep, q2 "* qlI~3, q4 "* q2\]~z 
qn "* qs~, g12 "-* qs~, q12 --* q12~ 
- qlo, ~ - q,, ~ - qll 
~- q,, ~- q,2, 
Together with the predict functions defined in section 
5.3.2, this grammar should provide an efficient parser 
for our example grammar. 
7 Tadpole Grammars 
The function ~ has been defined, in section 5, via a 
grammar reduction algorithm. In this section we wish to 
show that an alternative method exists, and, moreover, 
that it can be applied to the class of bilinear tadpole 
grammars. This class consists of all bilineax grammars 
without epsilon rules, and with no useless symbols, with 
non-termlnals (the head) preceding terminals (the tail) 
at the right hand side of rules.Thus, rules are of the form 
A -* a6, 
where we use the symbol 6 as a variable over possibly 
empty sequences of terminals, and a denotes a possibly 
empty sequence of at most two non-terminals. Capital 
romu letters are used for non-terminals. Note that a 
CNLR cover is a member of this class of grammars, as 
are all grammars that are in Chomsky normal form. 
First we change the grammar a little bit by adding 
q0 to the set of non-terminals of the grammar, assum- 
ing that it was not there yet. Next, we create a new 
140 
grammar, inspired by the grammar of 5.3.1, with pairs 
< A, C > as non=terminals. The rules of the new gram- 
mar are such that (with implicit universal quantification 
over all variables, as before) 
< A, C >-..~ 6 -- A -.~ 6 
< A,C >.--~< B,C > 6 m__A..-~ B6 
< A,C >-~< B,C >< D,B > 8 =_ A-. BD8 
The start symbol of the new grammar, which can be 
seen as a parametrized version of the tadpole grammar, 
is defined to be < S, qo >. A non-terminal < B, C > is a 
useful one, whence C E ~(B) according to the definition 
of ~, if it occurs in a derivation of the parametrized 
grammar: 
< S, qo >---" ~ < B, C > A, 
where i¢ is an arbitrary sequence of non-terminals, and 
A is a sequence of terminals and non-terminals. Then, 
we conclude that 
q0 E ~(B) -< S, q0 >-.'< B, q0 > A 
C E ~r(B) ^ C <> q0 --- 3A,~(< A,C >--'< B,C >/, 
^ < S, qo >--* " s < C,D >< A,C > A) 
This definition may be rephrased without reference 
to the parametrized grammar. Define, for each non- 
terminal A a set firstnonts(A), such that 
firstnonts(A) --.. {BIA --" BA}. 
The predict set o(A) then is obtainabh as 
• (s) = {Cl3.~,v,,(a ~. firstnonts(A)A 
D -- CA6)} u {qolS E firstnonts(S)}, 
where S is the start symbol. As in section 5.3.2, the 
initial prediction is given by 0= = {q0}. 
8 An LL/LR-automaton 
In order to illustrate the amount of freedom that ex- 
ists for the construction of automata and associated 
parsers, we shall construct a non-deterministic LL/LR- 
automaton and the associated cover, along the lines of 
section 5. 
8.1 The automaton 
We change the goto functions, such that they yield sets 
of states rather that just one state, as follows: 
go=o,(s, z) ---- {dosure({I,~})l 
Zl ~ s ^ (Z~ -- ZI=) A j <> M,} 
goto~O, =) = {ao.ure({z~})lZ, ~' e s A (Z, ~ -- Z,~'=)} 
The set Q is changed accordingly to be the smallest one 
that satisfies 
ctos,,re({Xo"°}) E Q^ (s E q =~ 
(go=o,(s, =) = 0 v goto,(s, =) c q)^ 
(goto2(s, z) m ~ V gotoa(s, z) C q)) 
Every state in this automaton is defined as a set 
clos~re({I~ }) and is, as a consequence, completely char- 
acterized by the one non-terminal I~. The reason for 
calling the above an LL/LR-automaton lies in the fact 
that the states of LR(0) automata for LL(1) grammars 
have exactly this property. The predicate reduce is de- 
fined as in section 5.1. 
8.2 The LL/LR-cover 
The cover associated with the LL/LR-automaton just 
defined, is a simple variant of the cover of section 5.3.2: 
S -- s -ffi reduce(s, I °) 
t -* 8y =-- t E gotox(s,g) 
t -. y - 3,0 ~ ao~oz(s, y)) 
t - sP,, - ~ ~ goto, O,z °) 
t -- I ° = 3,(t E goto2(s, I°)) 
-. s - reduce(s, I°), 
As it is of the tadpole type, the predict mechanism works 
as explained in section 7. 
We just mentioned that each LL/LR-state, and hence 
each non-terminal of the LL/LR-cover, is completely 
characterized by one non-terminal, or 'item', of the 
Earley cover. This correspondence between their non- 
terminals leads to a tight connection between the two 
covers. Indeed, the cover we obtained from the LL/LR- 
automaton can be obtained from the cover of section 
4, by eliminating the e-rules-I~ ~ --~ e. Of course, the 
predict functions associated to both covers differ consid- 
erably, as it axe the non-terminals deriving e, the items 
beginning with a dot, that axe the object of prediction 
in the Earley algorithm, and they axe no longer present 
in the LL/LR-cover. 
9 Efficiency 
We have discussed a number of bilinear covers now, and 
we could add many more. In fact, the space of bilinear 
covers for each context-free grammar, or RTN grammar, 
is huge. The optimal one would be the one that makes 
C-parser spend the least time on the average sentence. 
In general, the least time will be more or less equivalent 
to the smallest content of the parse matrix. Naively, 
this content would be proportional to the size of the 
cover. Under this assumption, the smallest cover would 
be optimal. Note that the number of non-terminals of 
the CNLR cover is equal to the number of states of the 
LR-antomaton plus the number of non-terminals of the 
original grammar. The size of the Earley cover is given 
by the number of items. In worst case situations the size 
of the CNLR cover is an exponential function of the size 
of the original grammar, whereas the size of the Ea~ley 
cover dearly grows linearly with the size of the original 
grammar. For many grammars, however, the number 
of LR(0)-states, may be considerably smaller than the 
number of items. This seems to be the case for the nat- 
ural language grammaxs considered by Tomita\[3\]. His 
141 
data even suggest that the number of LR(0) states is a 
sub-linear function of the original grammar size. Note, 
however, that predict functions may influence the re- 
lation between grammar size and average parse matrix 
content, as some grammars may allow more restrictive 
predict functions then others. Summarizing, it seems 
unlikely, that a single parsing approach would be opti- 
mal for all grammars. A viable goal of research would 
be to find methods for determining the optimal cover 
for a given grammar. Such research should have a solid 
experimental back-bone. 
The matter gets still more complicated when the orig- 
inal grammar is an attribute grammar. Attribute evalu- 
ation may lead to the rejection of certain parse trees that 
are correct for the grammar without attributes. Then 
the ease and efficiency of on-the-fly attribute evalua- 
tion becomes important, in order to stop wrong parses 
as soon as possible. In the Rosetta machine transla- 
tion system \[11,12\], we use an attributed RTN during 
the analysis of sentences. The attribute evaluation is 
bottom-up only, and designed in such a way that the 
grammar is covered by an attributed Earley cover. 
Other points concerning efficiency that we would like 
to discuss, are issues of precomputation. In the con- 
ventional Earley parser, the calculation of the cover is 
done dynamically, while parsing a sentence. However, it 
could just as well be done statically, i.e. before parsing, 
in order to increase parsing performance. For instance, 
set operations can be implemented more efficiently if the 
set elements are known non-terminals, rather than un- 
known items, although this would depend on the choice 
of programming language. The procedure of generating 
bilinear covers from LR-antomata should always be per- 
formed statically, because of the amount of computation 
involved. Tomita has reported \[3\], that for a number of 
grammars, his parsing method turns out to be more efli- 
cient than the Earley ~gorithm. It is not clear, whether 
his results would still hold if the creation of the cover 
for the Earley parser were being done statically. 
Onedmight be inclined to think that if use is made 
of precomputed sets of items, as in LR-parsers, one is 
bound to have a parser that is significantly different from 
and probably faster than Earley's algorithm, which com- 
putes these sets at parse time. The question is much 
more subtle as we showed in this paper. On the one 
hand, non-deterministic LR-parsing comes down to the 
use of certain covers for the grammar at hand, just like 
the Earley algorithm. Reversely, we showed that the 
Earley cover can, with minor modifications, be obtained 
from the LL/LR-automaton, which also uses precom- 
puted sets of items. 
10 Conclusions 
We studied parsing of general context-free languages, by 
splitting the process into two parts. Firstly, the gram- 
mar is turned into bilinear grammar format, and sub- 
sequently a general parser for bilinear grammars is ap- 
plied. Our view on the relation between parsers and 
covers is similar to the work on covers of Nijholt \[7\] for 
grammars that are deterministically parsable. 
We established that the Lung algorithm for simulat- 
ing pushdown automata, hides a prescription for deriv- 
ing bilinear covers from automata that satisfy certain 
constraints. Reversely, the LR-parser construction tech- 
nique has been presented as a way to derive automata 
from certain bilinear grammars. 
We found that the Earley algorithm is intimately re- 
lated to an automaton that simulates non-deterministic 
LL-parsing and, furthermore, that non-deterministic 
LR-automata provide general parsers for context-free 
grammars, with the same complexity as the Earley al- 
gorithm. It should be noted, however, that there are as 
many parsers with this property, as there are ways to 
obtain bilinear covers for a given grammar. 
References 
1 Earley, J. 1970. An Efficient Context-Free Parsing 
Algorithm, Communication8 ACM 13(2):94-102. 
2 Lang, B. 1974. Deterministic Techniques for Efficient 
Non-deterministic Parsers, Springer Lecture Notes 
in Computer Science 14:255-269. 
3 Tomita, M. 1986. Efficient Parsing for Natural Lan- 
guage, Kluwer Academic Publishers. 
4 Graham, S.L., M.A. Harrison and W.L. Ruzzo 1980. 
An improved context-free recognizer, ACM trans. 
actions on Progr. Languages and Systems 2:415- 
462. 
5 Aho, A.V. and J.D. Ullman 1972. The theory of pars- 
ing, translation, and compiling, Prentice Hall Inc. 
Englewood Cliffs N.J. 
6 Kruseman Aretz, F.E.J. 1989. A new approach to 
Earley's parsing algorithm, Science of Computer 
Programming volume 12.. 
T Nijholt, A. 1980. Context-free Grammars: Cov- 
ers, Normal Forms, and Parsing, Springer Lecture 
Notes in Computer Science 93. 
8 Woods, W.A. 1970. Transition network grammars for 
natural language analysis, Commun. ACM 13:591- 
602. 
9 Purdom, P.W. and C.A. Brown 1981. Parsing ex- 
tended LR(k) grammars, Acta \[n\]ormatica 15:115- 
127. 
10 Nagata, I and M. Sama 1986. Generation of Efficient 
LALR Parsers for Regular Right Part Grammars, 
Acta In\]ormatica 23:149-162. 
11 Leermakers, R. and J. Rons 1986. The Transla- 
tion Method of Rosetta, Computers and Transla- 
tion 1:169-183. 
12 Appelo L., C Fellinger and J. Landsbergen 1987. 
Subgrammars, Rule Classes and Control in the 
Rosetta Translation System, Proceedings o/ 3rd 
Conference ACL, European Chapter, Copenhagen 
118-133. 
142 
