A Simple Transformation for Oflqine-Parsable Grammars and its 
Termination Properties 
Marc Dymetman* 
Rank Xerox Research Centre 
6, chemin de Maupcrtuis 
Meylan, 38240 Fra,nce 
dyraetmanOxerox, fr 
Abstract We present, in easily reproducible terms, 
a simple transformation for offline-parsable grammars 
which results in a provably terminating parsing pro- 
gram directly top-down interpretable in Prolog. The 
transformation consists in two steps: (1) removal 
of empty-productions, followed by: (2) left-recursion 
elimination. It is related both to left-corner parsing 
(where the grammar is compiled, rather than inter- 
preted through a parsing program, and with the ad- 
vantage of guaranteed termination in the presence of 
empty productions) and to the Generalized Greibach 
Normal Form for I)CGs (with the advantage of imple- 
mentation simplicity). 
1 Motivation 
Definite clause grammars (DCGs) are one of the sim- 
plest and most widely used unification grammar for- 
malisms. They represent a direct augmentation of 
context-free grammars through the use of (term) uni- 
fication (a fact that tends to be masked by their usual 
presentation based on the programming language Pro- 
log). It is obviously important to ask wether certain 
usual methods and algorithms pertaining to CFGs can 
be adapted to DCGs, and this general question informs 
much of the work concerning I)CGs, as well as more 
complex unification grammar formalisms (to cite only 
a few areas: Earley parsing, LR parsing, left-corner 
parsing, Greibach Norinal l,'orm). 
One essential complication when trying to generalize 
CFG methods to the I)CG domain lies in the fact that, 
whereas the parsing problein for ClOGs is decidable, 
the corresponding problem for DCGs is in general an- 
decidable. This can be shown easily as a consequence 
of the noteworthy fact that any definite clause pro- 
gram can be viewed as a definite clause grammar "on 
the empty string", that is, as a DCG where no termi- 
nals other than \[ \] are allowed on the right-hand sides 
of rules. The ~Itlring-completeness of defnite clanse 
programs therefbre implies the undecidability of the 
parsing problem for this snbclass of DCGs, and a for- 
tiori for DCGs in general. 1 In order to guarantee good 
*Thaalks to Pierre Isabelle and Frangols Perrault for their 
comments, and to C,\[TI (Montreal) for its support during the 
preparation of this paper. 
1 I)CGs on I, he empty string might be dismissed as extreme, 
computationM properties for DCGs, it is then neces- 
sary to impose certain restrictions on their form such 
as o\[fline-parsability (OP), a nomenclature introduced 
by Pereira and Warren \[11\], who define an OP DCG 
as a grammar whose context-free skeleton CFG is not 
infinitely ambiguous, and show that OP DCGs lead to 
a decidable parsing problem. 2 
Our aim in this paper is to propose a simple trans- 
formation lbr an arbitrary OP DCG putting it into 
a form which leads to the completeness of the direct 
top-down interpretation by the standard Prolog inter- 
preter: parsing is guaranteed to enumerate all solutions 
to the parsing problem and terminate. The e.xistence 
of such a transformation is kuown: in \[1, 2\], we have 
recently introduced a "Generalized Greibach Normal 
Form" (GGNF) for DCGs, which leads to termination 
of top-down interpretation in the OP case. lIowever, 
the awdlable presentation of the GGNF transforma- 
tion is rather complex (it involves an algebraic study 
of the fixpoints of certain equational systems repre- 
senting grammars.). Our aim here is to present a re- 
lated, but much simpler, transformation, which from a 
theoretical viewpoint performs somewhat less than the 
GGNF transformation (it; involves some encoding of 
the initial DCG, which the (~GNF does not, and it only 
handles oflline-parsable grammars, while the GGNF is 
defined for arbitrary DCGs), a but in practice is ex- 
tremely easy to implement and displays a comparable 
behavior when parsing with an OP grammar. 
3'he transformation consists of two steps: (1) empty- 
production elimination and (2) left-recursion elimina- 
tion. 
The empty-production elimination Mgorithm is in- 
spired by the nsnal procedure for context-free gram- 
mars. But there are some notable differences, due 
to the fact that removal of empty-productions is in 
general impossible for non-OP I)CGs. The empty- 
but they are in fact at the core of the oflline-parsability concept. 
See note 3. 
2'lThe concept of ofllineA~arsability (under a different name) 
goes back to \[8\], where it is shown to be linguistically relevant. 
aThe GGNF factorizes an arbitrary DCG into two compo- 
nents: a "unit sub-DCG on the empty string", and another paa't 
consisting of rules whose right-hand side starts with a tm'mi- 
nal. The decidability of the DCG depends exclusively on certain 
simple textual properties of the unit sub-DCG. This sub-l)CG 
can be eliminated fl'om the GGNF if and only if the DCG is 
of Illne-parsable. 
1226 
production elimination a(gorithm is guaranteed to ter- 
minate only in the OP ease. 't It produces a I)C(\] 
declaratively equivalent to the. original grammar. 
The left-recursion elimination ~dgorithnt is adapted 
from a transR)rmation proposed in \[4\] in the context 
of a certain formalism ("l,exical Grammars") which 
we presented as a possible basis for bui(ding reversible 
grammars, a The key observation (in slightly different 
terms) was that, in a I)CG, ifa nontermiual g is defined 
(itcrMly by the two rules (the first of which is left- 
reeursive): 
:+\') --+ g(Y), a(v, x). 
,(x) --, ~(x). 
then the replacement of these two rules by the three 
rules (where d_tc is a new nonterminal symbol, which 
represents a kind of "transitive c(osure" of d): 
g(X) -, t(Y), d_tc(r, X). 
,/_re(X, x) -+ \[\]. 
d_tc(X, Z) -, d(X, V), d_tc(r, Z). 
l)reserves the declarative semantics o1' tim grammar, s 
We remarked in \[4\] that this transformation :'is 
closely re(ated to le('t<.orner pa.rsing", but did not give 
details. In a recent paper \[7\], Mark Johnson introduces 
"a left-corner program transR)rmation for natural (an- 
guage parsing", which has some similarity to the abow~ 
transformation, but whic.h is applied to definite clause 
programs, rather than to ()CGs. lie proves that this 
transformation respects deelarative equivalcnee, and 
also shows, using a mode(q;heoretic approach, the close 
connection of his transformation with (eft-corner pars- 
ing \[12, 9, 1()\]. r 
(t 1TlUSt be noted that the left-reeursion elimination 
procedure can 1)e a*pplied to any \])C(~, whether OP or 
not. Even in the case where the grammar is OP, how 
ever, it wil( not (ead to a terminating parsing algorithm 
unless empty l)roductions have been prea(ably elimi- 
nated from the grammar, a l)roblem wlfirh is shared 
by the usual left-corner parser-interpreter. 
4'Fhe fact that the standard (','FG emptyq)roduction elinfio 
nation transformation is always possible is relal.ed to the fact 
that this transformation does not preserve degrees of ambiguity. 
For instance the infinitely ambiguous grammar S ~ \[b\] A, A 
A, A -+ \[ \] is simplified into the grammar S -+ \[b\]. This type 
of simplification is generally impossible in a I)UG. Consider for 
insl ....... tim "g,' ........... "' s( X ) -~ \[ ....... be,'\] a( X ), a( ....... ( X ) ) --+ 
a(x), ~40) -+ \[\]. 
572he xnethod goes back to a transh)rmation used to compile 
oat. certain local cases of left-reeursionli'om I)CGs in the context 
of the Machine Translation prototyl)e CItlTTER \[3\]. 
6A proof of this fact, baaed on a comparison of prootktrees 
for the original and the transformed grammar, is giwm in \[2\]. 
?His paper does not state termination conditions for the 
transformed program. Such ternfination conditions w(mM prob- 
ably involve some generalized notion of o\[ttine-parsability \[6, 5, 
13\]. By contrast, we prove termlnation only for I)CGs which arc 
OP in the original sense of Pereira and Warren, but this ca.se 
SeelllS to llS tO represent llltlch of the core issue, &lid Lo lead to 
some direct exl.ensions. \],'or instance, the I)CG transformation 
proposed here can I)e directly applied to "guided" programs in 
the sense of \[4\]. 
Dae to the space available, we do not give here col 
rectncss proof~ Jbr the algorithms presented, but ez'peet 
to publish them in a tidier version of this paper. These 
algorithms have actually been implemented in a slightly 
extended version, where the*,/ are also used to decide 
whether the grammar proposed for" transformation is 
in fact oJfline-parsable or not. 
2 Empty-production 
elinlination 
(t can be proven that, if I)CG0 is an OP ()CG, the 
tb(lowing transformation, which involves repeated par- 
tial evaluation of rules that rewrite into the empty 
string, terminates after a finite number of steps and 
produces a grammm: I)CG without empty-l)roductions 
which is equivalent to the initial grammar on non- 
eml)ty strings: s 
inimt: an otllineq)ars~tble DC(-II. 
ontImt: a DCG without empty rules equivalent to DC(I I 
on non-empty strings. 
alg, orithm: 
initialize I,IST1 to a list of the rules of D(X;\[, :;el I,IST2 
to the empty fist. 
while there exists ;m empty rule El/: /l(T|,...,Tk)-,\[\] 
in LISTI do: 
Inove F,R to I,IS'I'2. 
ti)r each rule R: B(...) - + ~ in LIST1 such that (~ 
(:ontains an instance of A(...) (including 
new such rules created inside this loop) do: 
tilr each such instance A(SI .... , Sk) unifiable with 
A(TI, ...,7'k) do: 
~q)pend to 13S'l'l ;~ rule IU: ll(...) • ~ d obtained 
from R by removing A(,ql, .... S'k) 
lrom (~ (or by replacing it with \[\] if this was 
the only nonterminal in or), 
nnd by unifying the Ti's with the ,5'i's. 
set I)C(I to LISTI. 
For instance the grammar consisting in the nine rules 
appearing above the separation in lig. 1 is transformed 
into the grammar (see figure): 
~(,s(N P, v t,)) --÷ ,,v(NP), vv(W'). 
,,.p(',~v(~'~, c')) -+ ,,Up), co,,p(c). 
,,(,,.(Vcovte) ) - ~ \[peovte\]. 
vv(vv(~'(.~te~p), c)) ~ \[,te~v\], eo,~,v(c). 
eo,,V(,'.(C, a)) • eo,,,~,(c), ad,~(A). 
,dv(ad,,(l~e,'e))-. \[t,e~4 
a,tv(adv(todav)) -, \[today\], 
~V('P04,~;o~O), C) -- co.,.V(C). 
,~p(,~p(N, nil)) - ~ ,4N). 
eo,~v(c(,'t, A)) -+ ,,,t4A). 
vp(vp(v(*t~,~p), ,,.it)) -~ N~V\]. 
.q.s("V('~V('4V"")), ,,.it), V t')) -. ~V(V V). 
I~When DCG0 is not OP, the transl\]~rlnatiott ~xlay produce 
an infinite lllll\[lh(!l" Of l'Lll(!8, bill a simple extension of the aid(> 
rithm can detect this situation: the transformathm stops and 
the dr;mimer is decl;~red not to be Ot ). 
1227 
3 Left-recursion elimination 
The transformation can be logically divided into two 
steps: (1) an encoding of DCG into a "generic" form 
DCG', and (2) a simple replacement of a certain group 
of left-recursive rules in DCG' by a certain equivalent 
non left-recursive group of rules, yielding a top-down 
interpretable DCG". An example of the transformation 
DCG ----+ DCG' ----+ DCG" is given in fig. 2. 
The encoding is performed by the following algo- 
rithm: 
input: an oittine-parsable DCG without empty rules. 
output: an equivalent "encoding" DCG'. 
algorithm: 
initialize LIST to a list of the rules of DCG. 
initialize DCG' to the list of rules (literally): 
g(X) --~ g(Y), d(Y, X). 
g(x) -, t(x). 
while there exists a rule R of the form 
A(T1 ..... Tk) --, B(S1 ..... Sl) a in LIST do: 
remove R from LIST. 
add to DCG' a rule R': 
d(B(£'l ..... Sl), A(T1 ..... Tk)) --+ ~', 
where c~ ~ is obtained by replacing any C(V1, ..., Vm) 
in a by g(C(V1, ..., Vm)), 
or is set to \[ \] in the case where oe is empty. 
while there exists a rule R of the form 
A(TI ..... Tk) -+ \[terminal\] ~ in LIST do: 
remove R from LIST. 
add to DCG' a rule R': 
t(A(T1 ..... TI~)) -. \[terminal\] #, 
where cJ is obtained by replacing any C(V1, ..., Vm) 
in ~ by g(6'(V1 ..... Vm)), 
or is set to \[ \] in the ease where c~ is empty. 
~fhe procedure is very simple. It involves the cre- 
ation of a generic nonterminal g(X), of arity one, 
which performs a task equivalent to the original nonter- 
minals s(X1,...,Xn),vp(X1,...,Xra),.... The goal 
g(s(X1,..., Xn)), for instance, plays the same role for 
parsing a sentenee as did the goal s(X1,...,Xn) in 
the original grammar. 
Two further generic nonterminals are introduced: 
fiX) accounts for rules whose right-hand side begins 
with a terminal, while d(Y, X) accounts for rules whose 
right-hand side begins with a nonterminal. The ratio- 
nale behind the encoding is best understood fi'mn the 
following examples, where ~ represents rule rewrit- 
ing: 
vp(vp(v(sleep), C)) -, \[sleep\], comp(C) 
g(vp(vp( v( sleep), C) ) ) ~ \[sleep\], g(comp( C) ) 
g(X) -~ \[sleep\], 
({x : ~p(~p(~(sleep), c))}, g(co.~p(c')) !. 
s(s(NP, VP)) --+ np(NP), vp(VP) 
g(s(s(NP, VP))) ---* g(np(NP)), g(vp(VP)) 
g(X) ~ g(y), 
( {X = s(s(NP, VP)), Y = np(NP)}, g(vp(VP)) ; 
The second example illustrates the role played by 
d(Y, X) in the encoding. This nonterminal has the fol- 
lowing interpretation: X is an"immediate" extension 
of Y using the given rule. In other words, Y corre- 
sponds to an "immediate left-corner" of X. 
The left-recnrsion elimination is now performed by 
the following "algorithm" :9 
intmt: a DCG' encoded as above. 
output: an equivalent non left-recursive DCG". 
algorithm: 
initialize DCG" to DCG'. 
in DCG", replace literally the rules: 
g(X) -~ .q(g), d(Y, X). 
g(X) -~ t(X). 
by the rules: 
g(X) ---+ t(Y), d_tc(Y; X). 
d_tc(X, X) --~ \[ \]. 
d_tc(X, Z) --+ d(X, Y), d_tc(Y, Z). 
In this transformation, the new nonterminal d_tc 
plays the role of a kind of transitive closure of d. It can 
be seen that, relative to DCG", for any string w and 
for any ground term z, the fact that .q(z) rewrites 
into w --or, equivalently, that there exists a ground 
term x such that t(x) d_tc(x,z) rewrites into w-- 
is equivalent to the existence of a sequence of ground 
terms x = xl, ..., xa = z and asequence of strings 
wl, ..., wk such that t(xl) rewrites into wi, d(xl, x2) 
rewrites into w;, ..., d(xk-1, xk) rewrites into we, and 
such that w is the string concatenation w = wl ""wk. 
From our previous remark on the meaning of d(Y, X), 
this can be interpreted as saying that "consituent x is 
a left-corner of constituent z", relatively to string w. 
The grammar DCG" can now be compiled in the 
standard way---via the adjunetion of two "differential 
list" arguments---into a Prolog program which can bc 
executed directly. If we started from an oflline-parsable 
grammar DCGO, this program will enumerate all so- 
lutions to the parsing problem and terminate after a 
finite number of steps. 1° 
References 
\[1\] Marc Dymetman. A Generalized Greibach Nor- 
mal Form for Definite Clause Grammars. In Pro- 
ceedings of the 15th International Conference on 
9In practice, this and the preceding algorithm, which are dis- 
sociated here for exposition reasons, are lumped together. 
1°There exist of course DCGs which do not contain empty 
productions and which are not OP. :\['hey are characterizedby the 
existence of cycles of "chain-rules" of the form: al (...) -+ a2 (...). 
..... am-l(...) -+ am(...). , with am = al. But, if we start 
with an OP DCG0, the empty-production elimination algorithm 
cannot produce such a situation. 
1228 
I,ISTI delete LIST2 
s(s(NP, V P) ) --~ np( N P), vp(V P). 
np(np(N, C)) -+ n(N), comp(C). 
,,(,,.(wopl~) ) -. \[v~ovu,\]. 
n(,~(,jo~)) -~ \[\]. 
~V(~V(,,(~t~.e~), c)) -+ \[~v\], eo.~V(C'). 
comp(c(C, A)) -~ cornp(C), adv(A). 
adv(adv(herc)) -+ \[h,.re\]. 
adv(adv(today)) --+ \[today\]. 
np(nV(,4you) ), C) ~ corny(C). 
rip(rip(N, nil)) -+ n( N). 
comp(c(nil, a)) -+ adv(A). 
'oV(,,p(~(~l~V), hi0) -~ \[.~mV\]. 
,,p(,~(,(yo~)), ,~i~) --, \[\]. 
~(,(,~p(,w(,~(vo,O ), ,~iO, v t,) ) ~ v~,(w,), 
n(,4,,o,,)) --+ \[\]. 
comv(niO -~ \[\]. 
,~v(,,p(,,(vo,*) ), ~il) -. \[\]. 
Figure 1: Empty-production elimination. 
Computational Linguistics, volume 1, pages 366 
372, Nantes, l,'rance, .\]uly 1992. 
\[21\[ Marc l)ymetman. Transformatkms de grammaires 
logiques et rdversibilitd en Traduction Autom~,- 
tique. Th#~se d'Etat, 1992. Universitd Joseph 
Fourier (Grenoble 1), Grenoble, France. 
\[3\] Marc Dymetman and Pien'e }sabelle. Reversible 
logic grarnmars for machine translation. \[n Pro- 
ceedings of the ,5'econd International Uo'l@rence 
on 7'heorelical and Methodological Issues in Ma- 
chine Translation of Natural Languages, Pitts- 
burgh, PA, June 1988. Carnegie Mellon Univer- 
sity. 
\[4:\] Marc l)ymetman, Pierre Isabelle, and Frangois 
Perrault. A symmetrical approach to parsing and 
generation. In Proceedings of the 13lh Interna- 
tional Conference on Computational Liuguislics, 
volume 3, pages 90-96, Itelsinki, August 1!)90. 
\[5\] Andrew tlaas. A generalization of the o\[}tine- 
parsable grammars. In Proceedings of the 27lh 
Annual Meeting of the Association for Computa- 
tional Linguislics, pages 237 42, Vancouw~r, BC, 
Canada, June 1989. 
\[6\] Mark Johnson. Attribute-Value Logic and the 
Theory of Grammar. CSL} lecture note No. 16. 
Center for the. Study of I,anguagc and Informw 
tion, Stanford, CA, 1(.)88. 
\[7\] Mark Johnson. A left-corner program transforn,a- 
tion for natural language parsing, (forthcoming). 
\[8\] R. Kaplan and J. lh:esnan. Lexica\] flmctional 
grammar: a R)rmal system for grammatical rep- 
resentation. In Bresnan, editor, The Men*al \]{cp- 
resenialion of Grammatical ltelations, pages 173 - 
281. MIT Press, Cambridge, MA, 1982. 
\[9\] Y. Matsumoto, II. Tanaka, lI. \]firikawa, 
H. Miyoshi, and I\[. Yasukawa. BUP: a bottom- 
up pm:ser embedded in Pro\[og. New Generalion 
Computing, 1(2):145-158, 1983, 
\[10\] Fermmdo C. N. Pereira and Stuart M. Shieber. 
Ibvlog and Natural Language Analysis. CSI,I lec- 
tm:e note No. 10. Center for tim Study of Language 
and Information, Stan}'ord, CA, 1987. 
lilt I:'ernando C. N. l'ercira and \])avid }}. 71). War- 
ren. Parsing as deduction. In Proceedil~gs of the 
211h Annual Meeling of the Association for Com- 
pulalional Linguistics, pages 137-144, MIT, Cam- 
bridge, MA, June 1983. 
\[12\] D. ,I. l{osencrantz and P. M. Lewis. Deterministic 
left-corner parsing. In Eleventh Annual Sym.po- 
sium on Switching and Automata Theor?/, pages 
139 }53. IEEE, 1970. F, xtended Abstract. 
\[13\] Stuart M. Shieber. Constraint-Based Grammar 
Formalisms. MI'I' Press, Cambridge, MA, 1992. 
7229 
I)CG 
s(s( N P, V P) ) ---, np( N P), vp(V P). 
np(np(N, C)) ~ n(N), comp(C). 
,~(,~(people)) -~ \[p~ople\]. 
vp(vp(v(sleep), C) ) ~ \[sleep\], eomp( C). 
cornp(c(C, A)) -~ comp(C), adv(d). 
ad~(~dv(here)) -~ \[here\]. 
adv(adv(today)) ~ \[today\]. 
np(np(n(you) ), C) --~ comp( C). 
np(,~p(N, ,~il)) -~ ,~(N). 
comp(e(nil, A)) --+ adv(A). 
vp(vp(v(sleep), nil)) ~ \[,sleep\]. 
s(s(np(np(n(you) ), nil), V P)) --+ vp(Y P). 
DCG' 
g(X) -~ g(Y), d(Y, X). 
g(X) -- t(X). 
d(np(NP), s(s(NP, VP))) --~ g(vp(VP)). 
d(n(N), np(np(N, C))) -+ g(comp(C)). 
t(n(n(people) ) ) -+ \[people\]. 
t(vp(vp(~(steep), C) ) ) ~ \[sleep\], g(eomp( C) ). 
d(comp(C), comp(c(C, A))) ~ g(adv(A)). 
t(adv(adv(here))) ~ \[here\]. 
t(adv(adv(today))) --~ \[today\]. 
d(eomp(C), np(np(~(yo,,) ), C) ) -~ \[\]. 
d(,,(N), ,~p(~p(N, nil))) --+ \[\]. 
d(adv(A), corap(e(nil, A))) -~ \[\]. 
d(~p(W), s(s(,~p(np(~(yo~)), nil), VP))) -~ \[ \], 
DCG" 
g(x) -~ t(y), d_te(Y,X). 
d_te(X, X) ~ \[ \]. 
d_te(X, Z) -~ d(X, Y), d_tc(Y, Z). 
d(np(N P), s(s(N P, V P))) -~ g(vp(V P)). 
d(n(g), np(np(g, C))) --+ g(comp(C)). 
t(,(n(people))) -~ \[peopZe\]. 
t(vp(vp(v(sleep), C) ) ) ~ \[sleep\], g(comp( C) ). 
d(comp(C), comp(e(C, A))) ~ a(adv(A)). 
t(adv(adv(here))) -+ \[here\]. 
t(adv(adv(today))) --* \[today\]. 
d(comp(C), np(np(n(you) ), C) ) --+ \[\]. 
d(n(N), np(np(N, nil))) -+ \[ \]. 
d(adv(A), comp(c(nil. A))) --+ \[\]. 
t(vp(vp(v(sleep), nil))) --* \[~leep\]. 
d(vp(V P), s(s(np(np(n(you) ), nil), V P) ) ) --~ \[\]. 
Figure 2: Encoding (DCG') of a grammar (DCG) and left-reeursion elimination (I)CG"). 
1230 
