PARSING AS TREE TRAVERSAL 
Dale Gerdemann* 
Seminar für Sprachwissenschaft, Universität Tübingen†
ABSTRACT 
This paper presents a unified approach to parsing, in which top-down,
bottom-up and left-corner parsers are related to preorder, postorder
and inorder tree traversals. It is shown that the simplest bottom-up
and left-corner parsers are left recursive and must be converted using
an extended Greibach normal form. With further partial execution, the
bottom-up and left-corner parsers collapse together as in the BUP
parser of Matsumoto.
1 INTRODUCTION 
In this paper, I present a unified approach to parsing, in which
top-down, bottom-up and left-corner parsers are related to preorder,
postorder and inorder tree traversals. To some extent, this connection
is already clear since for each parsing strategy the nodes of the
parse tree are constructed according to the corresponding tree
traversal. It is somewhat trickier, though, to actually use a tree
traversal program as a parser since the resulting parser may be left
recursive. This left recursion can be eliminated, however, by
employing a version of Greibach Normal Form which is extended to
handle argument instantiations in definite clause grammars.
The resulting parsers resemble the standard Prolog versions of such
parsers. One can then go one step further and partially execute the
parser with respect to a particular grammar, as is normally done with
definite clause grammars (Pereira & Warren [10]). A surprising result
of this partial execution is that the bottom-up and left-corner
parsers become identical when they are both partially executed. This
may explain why the BUP parser of Matsumoto et al. [6] [7] was
referred to as a bottom-up parser even though it clearly follows a
left-corner strategy.
*The research presented in this paper was partially sponsored by
Teilprojekt B4 "Constraints on Grammar for Efficient Generation" of
the Sonderforschungsbereich 340 of the Deutsche
Forschungsgemeinschaft. I would also like to thank Guido Minnen and
Dieter Martini for helpful comments. All mistakes are of course my
own.
†Kl. Wilhelmstr. 113, D-72074 Tübingen, Germany,
dg@sfs.nphil.uni-tuebingen.de.
2 TREE TRAVERSAL PROGRAMS
Following O'Keefe [8], we can implement preorder, postorder and
inorder tree traversals as DCGs, which will then be converted directly
into top-down, bottom-up and left-corner parsers, respectively. The
general schema is:
x_order(Tree) -->
    (x_ordered node labels in Tree).
Note that in this case, since we are most likely to call x_order with
the Tree variable instantiated, we are using the DCG in generation
mode rather than as a parser. When used as a parser on the string S,
the procedure will return all trees whose x_order traversal produces
S. The three instantiations of this procedure are as follows:
% preorder traversal
pre(empty) --> [].
pre(node(Mother,Left,Right)) -->
    [Mother],
    pre(Left),
    pre(Right).
% postorder traversal
post(empty) --> [].
post(node(Mother,Left,Right)) -->
    post(Left),
    post(Right),
    [Mother].
% inorder traversal
in(empty) --> [].
in(node(Mother,Left,Right)) -->
    in(Left),
    [Mother],
    in(Right).
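As a quick check of the generation-mode reading, the following sketch runs the three traversal DCGs with phrase/2 on a small tree. The predicate example_tree/1 and its node labels are illustrative assumptions, not part of the original programs. The last query runs pre in the parsing direction, which is unproblematic for pre because it consumes a label before recursing; the same cannot be assumed for post and in.

```prolog
% Traversal DCGs as given in the text.
pre(empty) --> [].
pre(node(Mother,Left,Right)) --> [Mother], pre(Left), pre(Right).

post(empty) --> [].
post(node(Mother,Left,Right)) --> post(Left), post(Right), [Mother].

in(empty) --> [].
in(node(Mother,Left,Right)) --> in(Left), [Mother], in(Right).

% Hypothetical example tree (labels chosen for illustration only).
example_tree(node(s, node(np,empty,empty), node(vp,empty,empty))).

% Generation mode: the tree is instantiated, the label list is computed.
%   ?- example_tree(T), phrase(pre(T), Ls).    % Ls = [s,np,vp]
%   ?- example_tree(T), phrase(post(T), Ls).   % Ls = [np,vp,s]
%   ?- example_tree(T), phrase(in(T), Ls).     % Ls = [np,s,vp]
%
% Parsing mode with pre: all trees whose preorder traversal is [a,b].
%   ?- findall(T, phrase(pre(T), [a,b]), Ts).
%   Ts = [node(a,empty,node(b,empty,empty)),
%         node(a,node(b,empty,empty),empty)]
```

Only the order of the right-hand side differs between the three programs, which is what the label lists in the comments reflect.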
2.1 DIRECT ENCODING OF 
PARSING STRATEGIES 
Analogous to these three traversal programs, there are three parsing
strategies, which differ from the tree traversal programs in only two
respects. First, the base case for a parser should be to parse a
lexical item rather than to parse an empty string. And second, in the
recursive clauses, the mother category fits into the parse tree and is
licensed by the auxiliary predicate rule/3 but it does not figure into
the string that is parsed.
As was the case for the three tree traversal programs, the three
parsers differ from each other only with respect to the right hand
side order. For simplicity, I assume that phrase structure rules are
binary branching, though the approach can easily be generalized to
non-binary branching.¹
% top-down parser
td(node(PreTerm,lf(Word))) -->
    [Word],
    {word(PreTerm,Word)}.
td(node(Mother,Left,Right)) -->
    {rule(Mother,Left,Right)},
    td(Left),
    td(Right).
% bottom-up parser
bu(node(PreTerm,lf(Word))) -->
    [Word],
    {word(PreTerm,Word)}.
bu(node(Mother,Left,Right)) -->
    bu(Left),
    bu(Right),
    {rule(Mother,Left,Right)}.
% left-corner parser
lc(node(PreTerm,lf(Word))) -->
    [Word],
    {word(PreTerm,Word)}.
lc(node(Mother,Left,Right)) -->
    lc(Left),
    {rule(Mother,Left,Right)},
    lc(Right).
As seen here, the only difference between the three strategies
concerns the choice of when to select a phrase structure rule.² Do
you start with a rule and then try to satisfy it, as in the top-down
approach, or do you parse the daughters of a rule first before
selecting the rule, as in the bottom-up approach, or do you take an
intermediate strategy, as in the left-corner approach?
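To see the top-down parser at work, the sketch below pairs it with a toy grammar. The word/2 and rule/3 facts are assumptions made for illustration: rule/3 is stated directly over tree skeletons, with node/2 for preterminals and node/3 for phrasal categories, which is one way (not necessarily the intended one) to let the shared Left and Right variables serve both as rule arguments and as parse trees. Only td is run here; bu and lc in this direct form call themselves before consuming any input.

```prolog
% Hypothetical toy lexicon.
word(det, the).
word(n, dog).
word(n, cat).
word(v, saw).

% Hypothetical toy grammar, stated over tree skeletons:
% node/2 marks a preterminal, node/3 a phrasal category.
rule(s,  node(np,_,_), node(vp,_,_)).
rule(np, node(det,_),  node(n,_)).
rule(vp, node(v,_),    node(np,_,_)).

% Top-down parser as given in the text: select a rule first,
% then parse the left and right daughters.
td(node(PreTerm,lf(Word))) -->
    [Word],
    {word(PreTerm,Word)}.
td(node(Mother,Left,Right)) -->
    {rule(Mother,Left,Right)},
    td(Left),
    td(Right).

% ?- phrase(td(T), [the,dog,saw,the,cat]).
% T = node(s,
%          node(np, node(det,lf(the)), node(n,lf(dog))),
%          node(vp, node(v,lf(saw)),
%                   node(np, node(det,lf(the)), node(n,lf(cat)))))
```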
¹The only problematic case is for left corner since the corresponding
tree traversal inorder is normally defined only for binary trees. But
inorder is easily extended to non-binary trees as follows: i. visit
the left daughter in inorder, ii. visit the mother, iii. visit the
rest of the daughters in inorder.
²As opposed to, say, a choice of whether to use operations of
expanding and matching or operations of shifting and reducing.
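The extension of inorder to non-binary trees can be written in the same DCG style. This is only a sketch: the tree(Label, Daughters) representation, the predicate names in_nary and in_rest, and the treatment of a daughterless node (it contributes just its own label) are assumptions introduced here, not taken from the traversal programs above.

```prolog
% Hypothetical n-ary tree representation: tree(Label, Daughters),
% where Daughters is a (possibly empty) list of trees.

% A node without daughters contributes just its own label.
in_nary(tree(Label, [])) --> [Label].
% i. leftmost daughter, ii. mother, iii. remaining daughters.
in_nary(tree(Label, [First|Rest])) -->
    in_nary(First),
    [Label],
    in_rest(Rest).

% Traverse each remaining daughter in inorder, left to right.
in_rest([]) --> [].
in_rest([D|Ds]) --> in_nary(D), in_rest(Ds).

% ?- phrase(in_nary(tree(vp, [tree(v,[]), tree(np,[]), tree(pp,[])])), Ls).
% Ls = [v,vp,np,pp]
```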
3 GREIBACH NORMAL FORM PARSERS
While this approach reflects the logic of the top-down, bottom-up and
left-corner parsers in a clear way, the resulting programs are not all
usable in Prolog since the bottom-up and the left-corner parsers are
left-recursive. There exists, however, a general technique for removal
of left-recursion, namely, conversion to Greibach normal form. The
standard Greibach normal form conversion, however, does not allow for
DCG type rules, but we can easily take care of the Prolog arguments by
a technique suggested by Problem 3.118 of Pereira & Shieber [9] to
produce what I will call Extended Greibach Normal Form (EGNF).³
Pereira & Shieber's idea has been more formally presented in the
Generalized Greibach Normal Form of Dymetman ([1] [2]); however, the
simplicity of the parsers here does not justify the extra complication
in Dymetman's procedure. Using this transformation, the bottom-up
parser then becomes as follows:⁴
³EGNF is similar to normal GNF except that the arguments attached to
non-terminals must be manipulated so that the original instantiations
are preserved. For specific grammars, it is pretty easy to see that
such a manipulation is possible. It is much more difficult (and beyond
the scope of this paper) to show that there is a general rule for such
manipulations.
⁴The Greibach NF conversion introduces one auxiliary predicate, which
(following Hopcroft & Ullman [4]) I have called b. Of course, the GNF
conversion also does not tell us what to do with the auxiliary
procedures in curly brackets. What I've done here is simply to put
these auxiliary procedures in the transformed grammar in positions
corresponding to where they occurred in the original grammar. It's not
clear that one can always find such a "corresponding" position, though
in the case of the bottom-up and left-corner parsers such a position
is easy to identify.
% EGNF bottom-up
bu(node(PreTerm,lf(Word))) -->
    [Word],
    {word(PreTerm,Word)}.
bu(Node) -->
    [Word],
    {word(PreTerm,Word)},
    b(node(PreTerm,lf(Word)),Node).
b(L,node(Mother,L,R)) -->
    bu(R),
    {rule(Mother,L,R)}.
b(L,Node) -->
    bu(R),
    {rule(Mother,L,R)},
    b(node(Mother,L,R),Node).
This, however, is not very efficient since the two clauses of both bu
and b differ only in whether or not there is a final call to b. We can
reduce the amount of backtracking by encoding this optionality in the
b procedure itself.
% Improved EGNF bottom-up
bu(Node) -->
    [Word],
    {word(PreTerm,Word)},
    b(node(PreTerm,lf(Word)),Node).
b(Node,Node) --> [].
b(L,Node) -->
    bu(R),
    {rule(Mother,L,R)},
    b(node(Mother,L,R),Node).
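The pattern behind the improved parser, an auxiliary predicate that carries the tree built so far and may stop at any point, can be seen in isolation on a smaller left-recursive DCG. The following is a minimal sketch of left-recursion removal with argument preservation, not the full Greibach conversion; all names in it (e, t, e_rest, add/2, num/1 and the token plus) are introduced here purely for illustration.

```prolog
% Left-recursive original (shown only as a comment; Prolog would
% loop on it because e calls itself before consuming any input):
%   e(add(L,R)) --> e(L), [plus], t(R).
%   e(T)        --> t(T).

% Transformed version: parse one t, then extend the tree to the right.
e(E) --> t(T), e_rest(T, E).

% e_rest(Left, Result): Left is the tree built so far.
e_rest(E, E) --> [].                      % stop: the result is what we have
e_rest(L, E) -->                          % or attach one more operand
    [plus],
    t(R),
    e_rest(add(L,R), E).

% A term is a single number token.
t(num(N)) --> [N], { number(N) }.

% ?- phrase(e(E), [1,plus,2,plus,3]).
% E = add(add(num(1),num(2)),num(3))
```

The first argument of e_rest plays the role of the first argument of b: it is what preserves the left-branching structure that the left-recursive rule would have built.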
By the same EGNF transformation and improvements, the resulting
left-corner parser is only minimally different from the bottom-up
parser:
% Improved EGNF Left-corner
lc(Node) -->
    [Word],
    {word(PreTerm,Word)},
    b(node(PreTerm,lf(Word)),Node).
b(Node,Node) --> [].
b(L,Node) -->
    {rule(Mother,L,R)},
    lc(R),
    b(node(Mother,L,R),Node).
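The two improved parsers can be loaded side by side and compared on a toy grammar. In this sketch the auxiliary predicate is renamed b_bu and b_lc so that both versions fit in one file; that renaming, the word/2 facts and the rule/3 facts (stated over tree skeletons, with node/2 for preterminals and node/3 for phrasal categories) are assumptions made for illustration.

```prolog
% Hypothetical toy lexicon and grammar.
word(det, the).
word(n, dog).
word(n, cat).
word(v, saw).

rule(s,  node(np,_,_), node(vp,_,_)).
rule(np, node(det,_),  node(n,_)).
rule(vp, node(v,_),    node(np,_,_)).

% Improved EGNF bottom-up: parse the right daughter, then pick the rule.
bu(Node) -->
    [Word],
    {word(PreTerm,Word)},
    b_bu(node(PreTerm,lf(Word)),Node).
b_bu(Node,Node) --> [].
b_bu(L,Node) -->
    bu(R),
    {rule(Mother,L,R)},
    b_bu(node(Mother,L,R),Node).

% Improved EGNF left-corner: pick the rule, then parse the right daughter.
lc(Node) -->
    [Word],
    {word(PreTerm,Word)},
    b_lc(node(PreTerm,lf(Word)),Node).
b_lc(Node,Node) --> [].
b_lc(L,Node) -->
    {rule(Mother,L,R)},
    lc(R),
    b_lc(node(Mother,L,R),Node).

% Both parsers return the same single tree for this sentence:
% ?- phrase(bu(T), [the,dog,saw,the,cat]).
% ?- phrase(lc(T), [the,dog,saw,the,cat]).
% T = node(s,
%          node(np, node(det,lf(the)), node(n,lf(dog))),
%          node(vp, node(v,lf(saw)),
%                   node(np, node(det,lf(the)), node(n,lf(cat)))))
```

Neither version is left recursive any more: every recursive call is preceded by the consumption of at least one word.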
4 PARTIAL EXECUTION 
The improved EGNF bottom-up and left-corner parsers differ now only in
the position of the auxiliary predicate in curly brackets. If this
auxiliary predicate is partially executed out with respect to a
particular grammar, the two parsers will become identical. For
example, if we have a rule of the form:
s(tree(s,NP,VP)) -->
    np(NP),
    vp(VP).
For either parser, this will result in 
one b clause of the form: 
b(np(NP),Node) -->
    lc(vp(VP)),
    b(node(s(tree(s,NP,VP)),
           np(NP),vp(VP)),Node).
This is essentially equivalent to the kind of rules produced by
Matsumoto et al. ([6] [7]) in their "bottom-up" parser BUP.⁵ As seen
here, Matsumoto et al. were not wrong to call their parser bottom-up,
but they could have just as well called it left-corner.
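The collapse can be reproduced on a toy grammar. In the sketch below, the rule/3 facts listed in the comment are hypothetical; unfolding the call to rule/3 in the general b clause, once per fact, yields the three specialised clauses. Since rule/3 no longer appears in any body, its position relative to the recursive call no longer matters: the same clauses result from the bottom-up and the left-corner version (the recursive call is written lc here, but could equally be named bu).

```prolog
% Hypothetical toy lexicon.
word(det, the).
word(n, dog).
word(n, cat).
word(v, saw).

% Hypothetical grammar, given only as a comment because it has been
% compiled into the b clauses below:
%   rule(s,  node(np,A,B), node(vp,C,D)).
%   rule(np, node(det,A),  node(n,B)).
%   rule(vp, node(v,A),    node(np,B,C)).

lc(Node) -->
    [Word],
    {word(PreTerm,Word)},
    b(node(PreTerm,lf(Word)),Node).

b(Node,Node) --> [].
% s --> np vp, with rule/3 unfolded
b(node(np,A,B),Node) -->
    lc(node(vp,C,D)),
    b(node(s,node(np,A,B),node(vp,C,D)),Node).
% np --> det n, with rule/3 unfolded
b(node(det,A),Node) -->
    lc(node(n,B)),
    b(node(np,node(det,A),node(n,B)),Node).
% vp --> v np, with rule/3 unfolded
b(node(v,A),Node) -->
    lc(node(np,B,C)),
    b(node(vp,node(v,A),node(np,B,C)),Node).

% ?- phrase(lc(T), [the,dog,saw,the,cat]).
% T = node(s,
%          node(np, node(det,lf(the)), node(n,lf(dog))),
%          node(vp, node(v,lf(saw)),
%                   node(np, node(det,lf(the)), node(n,lf(cat)))))
```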
⁵This rule is not precisely the same as the rules used in BUP since
Matsumoto et al. compile their rules a little further to take
advantage of the first argument and predicate name indexing used in
Prolog.
5 CONCLUSION
In most standard presentations, simple top-down, bottom-up and
left-corner parsers are described in terms of pairs of operations such
as expand/match, shift/reduce or sprout/match. But it is entirely
unclear what expanding and matching has to do with shifting, reducing
or sprouting. By relating parsing to tree traversal, however, it
becomes much clearer how these three approaches to parsing relate to
each other. This is a natural comparison, since clearly the possible
orders in which a tree can be traversed should not differ from the
possible orders in which a parse tree can be constructed. What's new
in this paper, however, is the idea that such tree traversal programs
could be translated into parsers using extended Greibach Normal Form.
Such a unified approach to parsing is mostly useful simply to
understand how the different parsers are related. It is surprising to
see, for example, that with partial execution, the bottom-up and
left-corner parsers become the same. The similarity between bottom-up
and left-corner parsing has caused a certain amount of confusion in
the literature. For example, the so-called "bottom-up" chart parser
presented (among other places) in Gazdar & Mellish [3] in fact uses a
left-corner strategy. This was pointed out by Wirén [11] but has not
received much attention in the literature. It is hoped that the
unified approach to parsing presented here will help to clear up other
such confusions.
Finally, one might mention a connection to Government-Binding parsing,
as presented in Johnson & Stabler [5]. These authors present a
generate and test approach, in which X-bar structures are randomly
generated and then tested against GB principles. Once the logic of the
program is expressed in such a manner, efficiency considerations are
used in order to fold the testing procedures into the generation
procedure. One could view the strategy taken in this paper as rather
similar. Running a tree traversal program in reverse is like randomly
generating phrase structure. Then these randomly generated structures
are tested against the constraints, i.e., the phrase structure rules.
What I have shown here is that the decision as to where to fold in the
constraints is very significant. Folding in the constraints at
different positions actually gives completely different parsing
strategies.
References 
[1] Marc Dymetman. A generalized Greibach normal form for definite
clause grammars. In COLING-92 vol. I, pages 366-372, 1992.
[2] Marc Dymetman. Transformations de Grammaires logiques.
Applications au problème de la réversibilité en Traduction
Automatique. PhD thesis, Université de Grenoble, Grenoble, France,
1992. Thèse d'Etat.
[3] Gerald Gazdar and Chris Mellish. Natural Language Processing in
Prolog. Addison-Wesley, Reading, Mass, 1989.
[4] John Hopcroft and Jeffrey Ullman. Introduction to Automata Theory
and Computation. Addison-Wesley, Reading, Mass, 1979.
[5] Mark Johnson and Edward Stabler, 1993. Lecture Notes for Course
taught at the LSA Summer School in Columbus Ohio.
[6] Y. Matsumoto, H. Hirakawa, H. Miyoshi, and H. Yasukawa. BUP: A
bottom-up parser embedded in Prolog. New Generation Computing,
1(2):145-158, 1983.
[7] Yuji Matsumoto. Natural Language Parsing Systems based on Logic
Programming. PhD thesis, Kyoto University, 1989.
[8] Richard O'Keefe. The Craft of Prolog. MIT Press, Cambridge, Mass,
1990.
[9] Fernando C. N. Pereira and Stuart Shieber. Prolog and Natural
Language Analysis. CSLI Lecture Notes No. 10. Chicago University
Press, Chicago, 1987.
[10] Fernando C. N. Pereira and David H. D. Warren. Definite clause
grammars-a survey of the formalism and a comparison with augmented
transition networks. Artificial Intelligence, 13:231-278, 1980. Also
in Grosz et al., 1986.
[11] Mats Wirén. A comparison of rule-invocation strategies in
context-free chart parsing. In EACL Proceedings, 3rd Annual Meeting,
pages 226-233, 1987.
