The Structure of Shared Forests 
in Ambiguous Parsing 
Sylvie Billot*† Bernard Lang*
INRIA
and †Université d'Orléans
billot@inria.inria.fr lang@inria.inria.fr
Abstract 
The Context-Free backbone of some natural language ana- 
lyzers produces all possible CF parses as some kind of shared 
forest, from which a single tree is to be chosen by a disam- 
biguation process that may be based on the finer features of 
the language. We study the structure of these forests with 
respect to optimality of sharing, and in relation with the 
parsing schema used to produce them. In addition to a theo- 
retical and experimental framework for studying these issues, 
the main results presented are: 
- sophistication in chart parsing schemata (e.g. use of 
look-ahead) may reduce time and space efficiency instead of 
improving it, 
- there is a shared forest structure with at most cubic size 
for any CF grammar, 
- when $O(n^3)$ complexity is required, the shape of a shared forest is dependent on the parsing schema used.
Though analyzed on CF grammars for simplicity, these re- 
sults extend to more complex formalisms such as unification 
based grammars. 
Key words: Context-Free Parsing, Ambiguity, Dynamic 
Programming, Earley Parsing, Chart Parsing, Parsing 
Strategies, Parsing Schemata, Parse Tree, Parse Forest. 
1 Introduction 
Several natural language parsers start with a pure Context-Free (CF) backbone that makes a first sketch of the structure of the analyzed sentence, before it is handed to a more elaborate analyzer (possibly a coroutine) that takes into account the finer grammatical structure to filter out undesirable parses (see for example \[24,28\]). In \[28\], Shieber surveys existing variants of this approach before giving his own tunable approach based on restrictions that "split up the infinite nonterminal domain into a finite set of equivalence classes that can be used for parsing". The basic motivation for this approach is to benefit from the CF parsing technology whose development over 30 years has led to powerful and efficient parsers \[1,7\].
A parser that takes into account only an approximation of the grammatical features will often find ambiguities it cannot resolve in the analyzed sentences¹. A natural solution is then to produce all possible parses, according to the CF backbone, and then select among them on the basis of the complete feature information. One hitch is that the number of parses may be exponential in the size of the input sentence, or even infinite for cyclic grammars or incomplete sentences \[16\]. However, chart parsing techniques have been developed that produce an encoding of all possible parses as a data structure whose size is polynomial in the length of the input sentence. These techniques are all based on a dynamic programming paradigm.
*Address: INRIA, B.P. 105, 78153 Le Chesnay, France.
The work reported here was partially supported by the Eureka Software Factory project.
¹ Ambiguity may also have a semantic origin.
The kind of structure they produce to represent all parses of the analyzed sentence is an essential characteristic of these algorithms. Some of the published algorithms produce only a chart as described by Kay in \[14\], which only associates nonterminal categories with segments of the analyzed sentence \[11,39,13,3,9\], and which thus still requires non-trivial processing to extract parse-trees \[26\]. The worst-case size of such a chart is only quadratic in the size of the input².
However, practical parsing algorithms will often produce a more complex structure that explicitly relates the instances of nonterminals associated with sentence fragments to their constituents, possibly in several ways in case of ambiguity, with a sharing of some common subtrees between the distinct ambiguous parses \[7,4,24,31,25\]³.
One advantage of this structure is that the chart retains only those constituents that can actually participate in a parse. Furthermore, it makes the extraction of parse-trees a trivial matter. A drawback is that this structure may be cubic in the length of the parsed sentence, and more generally polynomial⁴ for some proposed algorithms \[31\]. However, these algorithms are rather well behaved in practice, and this complexity is not a problem.
In this paper we shall call shared forests such data structures used to represent simultaneously all parse trees for a given sentence.
² We do not consider CF recognizers that have asymptotically the lowest complexity, but are only of theoretical interest here \[~S,5\].
³ There are several other published implementations of chart parsers \[23,20,33\], but they often do not give much detail on the output of the parsing process, or even side-step the problem altogether \[33\]. We do not consider here the well-formed substring tables of Sheil \[26\], which fall somewhere in between in our classification. They do not use pointers and parse-trees are only "indirectly" visible, but may be extracted rather simply in linear time. The table may contain useless constituents.
⁴ Space cubic algorithms often require the language grammar to be in Chomsky Normal Form, and some authors have incorrectly conjectured that cubic complexity cannot be obtained otherwise.
Several questions may be asked in relation to shared forests:
• How to construct them during the parsing process?
• Can the cubic complexity be attained without modifying the grammar (e.g. into Chomsky Normal Form)?
• What is the appropriate data structure to improve sharing and reduce time and space complexity?
• How good is the sharing of tree fragments between ambiguous parses, and how can it be improved?
• Is there a relation between the coding of parse-trees in the shared forest and the parsing schema used?
• How well formalized is their definition and construction?
These questions are of importance in practical systems 
because the answers impact both the performance and the 
implementation techniques. For example good sharing may 
allow a better factorization of the computation that filters 
parse trees with the secondary features of the language. The 
representation needed for good sharing or low space com- 
plexity may be incompatible with the needs of other com- 
ponents of the system. These components may also make 
assumptions about this representation that are incompatible 
with some parsing schemata. The issue of formalization is of course related to the formal tractability of correctness proofs for algorithms using shared forests.
In section 2 we describe a uniform theoretical framework in 
which various parsing strategies are expressed and compared 
with respect to the above questions. This approach has been 
implemented into a system intended for the experimental 
study and comparison of parsing strategies. This system is 
described in section 3. Section 4 contains a detailed example produced with our implementation, which illustrates both the working of the system and the underlying theory.
2 A Uniform Framework 
To discuss the above issues in a uniform way, we need a general framework that encompasses all forms of chart parsing and shared forest building in a unique formalism. We shall take as a basis a formalism developed by the second author in previous papers \[15,16\]. The idea of this approach is to separate the dynamic programming constructs needed for efficient chart parsing from the chosen parsing schema. Comparison between the classifications of Kay \[14\] and Griffiths & Petrick \[10\] shows that a parsing schema (or parsing strategy) may be expressed in the construction of a Push-Down Transducer (PDT), a well studied formalization of left-to-right CF parsers⁵. These PDTs are usually non-deterministic and cannot be used as produced for actual parsing. Their backtrack simulation does not always terminate, and is often time-exponential when it does, while breadth-first simulation is usually exponential for both time and space. However, by extending Earley's dynamic programming construction to PDTs, Lang provided in \[15\] a way of simulating all possible computations of any PDT in cubic time and space complexity. This approach may thus be used as a uniform framework for comparing chart parsers⁶.
⁵ Griffiths & Petrick actually use Turing machines for pedagogical reasons.
2.1 The algorithm 
The following is a formal overview of parsing by dynamic programming interpretation of PDTs.
Our aim is to parse sentences in the language $\mathcal{L}(G)$ generated by a CF phrase structure grammar $G = (V, \Sigma, \Pi, N)$ according to its syntax. The notation used is $V$ for the set of nonterminals, $\Sigma$ for the set of terminals, $\Pi$ for the rules, $N$ for the initial nonterminal, and $\varepsilon$ for the empty string.
We assume that, by some appropriate parser construction technique (e.g. \[12,6,1\]), we mechanically produce from the grammar $G$ a parser for the language $\mathcal{L}(G)$ in the form of a (possibly non-deterministic) push-down transducer (PDT) $T_G$. The output of each possible computation of the parser is a sequence of rules⁷ in $\Pi$ to be used in a left-to-right reduction of the input sentence (this is obviously equivalent to producing a parse-tree).
We assume for the PDT $T_G$ a very general formal definition that can fit most usual PDT construction techniques. It is defined as an 8-tuple $T_G = (Q, \Sigma, \Delta, \Pi, \delta, \dot{q}, \$, F)$ where: $Q$ is the set of states, $\Sigma$ is the set of input word symbols, $\Delta$ is the set of stack symbols, $\Pi$ is the set of output symbols⁸ (i.e. rules of $G$), $\dot{q}$ is the initial state, $\$$ is the initial stack symbol, $F$ is the set of final states, and $\delta$ is a finite set of transitions of the form $(p\,A\,a \mapsto q\,B\,u)$ with $p, q \in Q$, $A, B \in \Delta \cup \{\varepsilon\}$, $a \in \Sigma \cup \{\varepsilon\}$, and $u \in \Pi^*$.
Let the PDT be in a configuration $\rho = (p\;A\alpha\;ax\;u)$ where $p$ is the current state, $A\alpha$ is the stack contents with $A$ on the top, $ax$ is the remaining input where the symbol $a$ is the next to be shifted and $x \in \Sigma^*$, and $u$ is the already produced output. The application of a transition $\tau = (p\,A\,a \mapsto q\,B\,v)$ results in a new configuration $\rho' = (q\;B\alpha\;x\;uv)$ where the terminal symbol $a$ has been scanned (i.e. shifted), $A$ has been popped and $B$ has been pushed, and $v$ has been concatenated to the existing output $u$. If the terminal symbol $a$ is replaced by $\varepsilon$ in the transition, no input symbol is scanned. If $A$ (resp. $B$) is replaced by $\varepsilon$ then no stack symbol is popped from (resp. pushed on) the stack.
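The configuration and transition definitions above map directly onto code. The following Python sketch is only an illustration under our own assumptions: the names `Transition`, `Config` and `apply` are hypothetical, `None` plays the role of ε, and the stack is stored as a tuple with its top first. It applies one transition to one configuration, the elementary step that the dynamic programming interpretation simulates for all computations at once.

```python
from typing import NamedTuple, Optional, Tuple


class Transition(NamedTuple):
    """(p A a |-> q B u); None stands for the empty string epsilon."""
    p: str
    pop: Optional[str]    # A: stack symbol to pop, or None
    scan: Optional[str]   # a: input word to scan, or None
    q: str
    push: Optional[str]   # B: stack symbol to push, or None
    out: Tuple[str, ...]  # u: output symbols appended to the output


class Config(NamedTuple):
    """(p  A.alpha  a.x  u), with the top of the stack stored first."""
    state: str
    stack: Tuple[str, ...]
    rest: Tuple[str, ...]
    output: Tuple[str, ...]


def apply(t: Transition, c: Config) -> Optional[Config]:
    """Return the configuration reached by applying t to c, or None if t does not apply."""
    if t.p != c.state:
        return None
    stack, rest = c.stack, c.rest
    if t.pop is not None:                 # pop A from the top of the stack
        if not stack or stack[0] != t.pop:
            return None
        stack = stack[1:]
    if t.scan is not None:                # scan (shift) the next input word a
        if not rest or rest[0] != t.scan:
            return None
        rest = rest[1:]
    if t.push is not None:                # push B
        stack = (t.push,) + stack
    return Config(t.q, stack, rest, c.output + t.out)
```

For instance, applying `Transition("p", "A", "a", "q", "B", ("r1",))` to `Config("p", ("A", "$"), ("a", "b"), ())` yields state `q`, stack `("B", "$")`, remaining input `("b",)` and output `("r1",)`.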
Our algorithm consists in an Earley-like⁹ simulation of the PDT $T_G$. Using the terminology of \[1\], the algorithm builds an item set $S_i$ successively for each word symbol $x_i$ holding position $i$ in the input sentence $x$. An item is constituted of two modes of the form $(p\,A\,i)$ where $p$ is a PDT state, $A$ is a stack symbol, and $i$ is the index of an input symbol. The item set $S_i$ contains items of the form $((p\,A\,i)\;(q\,B\,j))$. These items are used as nonterminals of an output grammar $\mathcal{G} = (\mathcal{S}, \Pi, \mathcal{P}, U_f)$, where $\mathcal{S}$ is the set of all items (i.e. the union of the $S_i$), and the rules in $\mathcal{P}$ are constructed together with their left-hand-side item by the algorithm. The initial nonterminal $U_f$ of $\mathcal{G}$ derives the last items produced by a successful computation.
⁶ The original intent of \[15\] was to show how one can generate efficient general CF chart parsers, by first producing the PDT with the efficient techniques for deterministic parsing developed for compiler technology \[6,12,1\]. This idea was later successfully used by Tomita \[31\], who applied it to LR(1) parsers \[6,1\], and later to other push-down based parsers \[32\].
⁷ Implementations usually denote these rules by their index in the set $\Pi$.
⁸ Actual implementations use output symbols from $\Pi \cup \Sigma$, since rules alone do not distinguish words in the same lexical category.
⁹ We assume the reader to be familiar with some variation of Earley's algorithm. Earley's original paper uses the word state (from dynamic programming terminology) instead of item.
Appendix A gives the details of the construction of items and rules in $\mathcal{G}$ by interpretation of the transitions of the PDT. More details may be found in \[15,16\].
2.2 The shared forest 
An apparently major difference between the above algorithm and other parsers is that it represents a parse as the string of the grammar rules used in a leftmost reduction of the parsed sentence, rather than as a parse tree (cf. section 4). When the sentence has several distinct parses, the set of all possible parse strings is represented in finite shared form by a CF grammar that generates that possibly infinite set. Other published algorithms produce instead a graph structure representing all parse-trees with sharing of common subparts, which corresponds well to the intuitive notion of a shared forest.
This difference is only in appearance. We show here in section 4 that the CF grammar of all leftmost parses is just a theoretical formalization of the shared-forest graph. Context-Free grammars can be represented by AND-OR graphs that are closely related to the syntax diagrams often used to describe the syntax of programming languages \[37\], and to the transition networks of Woods \[22\]. In the case of our grammar of leftmost parses, this AND-OR graph (which is acyclic when there is only finite ambiguity) is precisely the shared-forest graph. In this graph, AND-nodes correspond to the usual parse-tree nodes, while OR-nodes correspond to ambiguities, i.e. distinct possible subtrees occurring in the same context. Sharing of subtrees is represented by nodes accessed by more than one other node.
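Why sharing matters can be made concrete by counting the trees that an acyclic AND-OR graph represents: an OR-node adds up the counts of its alternatives, and an AND-node multiplies the counts of its sons. In the Python sketch below the graph is written as a grammar (a dict from nonterminal to alternative right-hand sides); `count_trees` and `make_chain` are hypothetical names, and `make_chain` builds an artificial forest, not one from this paper, whose size grows linearly in k while the number of trees is 2^k. Memoization visits each shared node once; cyclic graphs (infinite ambiguity) are not handled.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


def count_trees(grammar: Grammar, root: str) -> int:
    """Number of trees encoded by an acyclic AND-OR graph written as a grammar."""

    @lru_cache(maxsize=None)
    def count(sym: str) -> int:
        if sym not in grammar:        # terminal: a leaf
            return 1
        total = 0
        for rhs in grammar[sym]:      # OR-node: add up the alternatives
            product = 1
            for son in rhs:           # AND-node: multiply the counts of the sons
                product *= count(son)
            total += product
        return total

    return count(root)


def make_chain(k: int) -> Grammar:
    """Artificial forest: each X_i (i < k) has two alternatives that share X_(i+1)."""
    g: Grammar = {f"X{i}": [("a", f"X{i + 1}"), ("b", f"X{i + 1}")] for i in range(k)}
    g[f"X{k}"] = [("c",)]
    return g
```

With k = 20 this forest has 41 rules yet encodes 2^20 = 1,048,576 distinct trees.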
The grammar viewpoint is the following (cf. the example in section 4). Nonterminal (resp. terminal) symbols correspond to nodes with (resp. without) outgoing arcs. AND-nodes correspond to right-hand sides of grammar rules, and OR-nodes (i.e. ambiguities) correspond to nonterminals defined by several rules. Subtree sharing is represented by several uses of the same symbol in rule right-hand sides.
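To illustrate the grammar viewpoint on a concrete input, the Python sketch below builds a grammar of parses for the example grammar and sentence of section 4. It swaps in a plain CKY-style span chart for the PDT-based construction of section 2.1, and it does not produce the binary son lists of section 2.3: its nonterminals are hypothetical triples (A, i, j), one per constituent A spanning words i to j; each alternative is a pair (rule number, right-hand side); and reusing a triple in several right-hand sides is exactly subtree sharing. It assumes rules with one or two right-hand-side symbols and no unit rules between nonterminals, which holds for that grammar. Each parse is read back as a postfix string of words and rule numbers.

```python
from collections import defaultdict
from itertools import product

# Example grammar of section 4 without the delimiter rule (0): number -> (lhs, rhs).
RULES = {1: ("s", ("np", "vp")), 2: ("s", ("s", "pp")), 3: ("np", ("n",)),
         4: ("np", ("det", "n")), 5: ("np", ("np", "pp")),
         6: ("pp", ("prep", "np")), 7: ("vp", ("v", "np"))}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def parse_forest(words):
    """Grammar of parses: item (A, i, j) -> list of (rule number, right-hand side)."""
    forest = defaultdict(list)

    def match(x, i, j):
        # The occurrence of symbol x over words[i:j]: an item, a word, or None.
        if x in NONTERMINALS:
            return (x, i, j) if (x, i, j) in forest else None
        return x if j == i + 1 and words[i] == x else None

    n = len(words)
    for length in range(1, n + 1):                 # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for r, (lhs, rhs) in RULES.items():
                if len(rhs) == 1:
                    splits = [((rhs[0], i, j),)]
                else:                              # binary rule: try every split point
                    splits = [((rhs[0], i, k), (rhs[1], k, j)) for k in range(i + 1, j)]
                for parts in splits:
                    sons = tuple(match(*p) for p in parts)
                    if None not in sons:           # one more alternative for (lhs, i, j)
                        forest[(lhs, i, j)].append((r, sons))
    return forest


def parses_of(forest, sym):
    """Yield each parse below sym as a postfix list of words and rule numbers."""
    if not isinstance(sym, tuple):                 # a word of the input
        yield [sym]
        return
    for r, sons in forest.get(sym, []):
        for parts in product(*(list(parses_of(forest, s)) for s in sons)):
            yield [x for part in parts for x in part] + [str(r)]
```

For "n v det n prep n", the item ("s", 0, 6) is an OR-node with two alternatives, while ("pp", 4, 6) appears in right-hand sides used by both parses and is therefore shared.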
To our knowledge, this representation of parse-forests as 
grammars is the simplest and most tractable theoretical for- 
malization proposed so far, and the parser presented here is 
the only one for which the correctness of the output gram- 
mar -- i.e. of the shared-forest -- has ever been proved. 
Though in the examples we use graph(ical) representations for intuitive understanding (grammars are also sometimes represented as graphs \[37\]), they are not the proper formal tool for manipulating shared forests and developing formalized (proved) algorithms that use them. Graph formalization is considerably more complex and awkward to manipulate than the few, well understood and specialized concepts of CF grammars. Furthermore, unlike graphs, this grammar formalization of the shared forest may be tractably extended to other grammatical formalisms (cf. section 5).
More importantly, our work on the parsing of incomplete 
sentences \[16\] has exhibited the fundamental character of 
our grammatical view of shared forests: when parsing the 
completely unknown sentence, the shared forest obtained is 
precisely the complete grammar of the analyzed language. 
This also leads to connections with the work on partial evaluation \[8\].
2.3 The shape of the forest 
For our shared forest, a cubic space complexity (in the worst case; space complexity is often linear in practice) is achieved, without requiring that the language grammar be in Chomsky Normal Form, by producing a grammar of parses that has at most two symbols on the right-hand side of its rules. This amounts to representing the list of sons of a parse tree node as a Lisp-like list built with binary nodes (see figures 1 & 2), and it allows partial sharing of the sons¹⁰.
The structure of the parse grammar, i.e. the shape of the 
parse forest, is tightly related to the parsing schema used, 
hence to the structure of the possible computation of the 
non-deterministic PDT from which the parser is constructed. 
First we need a precise characterization of parsing strategies, 
whose distinction is often blurred by superimposed optimiza- 
tions. We call bottom-up a strategy in which the PDT 
decides on the nature of a constituent (i.e. on the grammar 
rule that structures it), after having made this decision first 
on its subconstituents. It corresponds to a postfix left-to- 
right walk of the parse tree. Top-Down parsing recognizes 
a constituent before recognition of its subconstituents, and 
corresponds to a prefix walk. Intermediate strategies are also 
possible. 
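The two strategies differ in when a rule is output relative to its subconstituents. A minimal Python sketch, assuming a parse tree encoded as a nested pair (rule number, list of sons) with words as leaves (this encoding and the names `postfix` and `prefix` are ours); the tree used is the first parse of the example in section 4. The postfix walk yields the bottom-up output order and the prefix walk the top-down one.

```python
# A parse tree is (rule number, sons); leaves are words.
# First parse of the section 4 example "n v det n prep n".
TREE = (2, [(1, [(3, ["n"]), (7, ["v", (4, ["det", "n"])])]),
            (6, ["prep", (3, ["n"])])])


def postfix(tree):
    """Bottom-up order: a rule is output after all its subconstituents."""
    if isinstance(tree, str):
        return [tree]
    rule, sons = tree
    return [x for son in sons for x in postfix(son)] + [str(rule)]


def prefix(tree):
    """Top-down order: a rule is output before its subconstituents."""
    if isinstance(tree, str):
        return [tree]
    rule, sons = tree
    return [str(rule)] + [x for son in sons for x in prefix(son)]


print(" ".join(postfix(TREE)))  # n 3 v det n 4 7 1 prep n 3 6 2
print(" ".join(prefix(TREE)))   # 2 1 3 n 7 v 4 det n 6 prep 3 n
```

The postfix string is the first parse listed in section 4, without its $ delimiters.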
The sequence of operations of a bottom-up parser is basically of the following form (up to possible simplifying optimizations): to parse a constituent $A$, the parser first parses and pushes on the stack each subconstituent $B_i$; at some point, it decides that it has all the constituents of $A$ on the stack, pops them all, then pushes $A$ and outputs the (rule number $r$ of the) recognized rule $r: A \to B_1 \ldots B_{n_r}$. Dynamic programming interpretation of such a sequence results in a shared forest containing parse-trees with the shape described in figure 1, i.e. where each node of the forest points to the beginning of the list of its sons.
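The sequence of operations just described can be sketched as a naive non-deterministic shift/reduce search in Python. This is an illustration under stated assumptions, not the paper's PDT construction nor its dynamic programming: a reduction pops a whole right-hand side in one step (a PDT would use several single-symbol transitions), shifted words are also output (as the footnote on output symbols describes for actual implementations), the $ delimiter rule is omitted, and every choice is explored by backtracking, which is exponential in general. The function name `all_bottom_up_outputs` is hypothetical.

```python
# Example grammar of section 4 without the delimiter rule (0): number -> (lhs, rhs).
RULES = {1: ("s", ("np", "vp")), 2: ("s", ("s", "pp")), 3: ("np", ("n",)),
         4: ("np", ("det", "n")), 5: ("np", ("np", "pp")),
         6: ("pp", ("prep", "np")), 7: ("vp", ("v", "np"))}


def all_bottom_up_outputs(words, start="s"):
    """Explore every shift/reduce choice by backtracking; collect accepted outputs."""
    results = set()

    def run(stack, pos, out):
        if pos == len(words) and stack == (start,):
            results.add(" ".join(out))
        for r, (lhs, rhs) in RULES.items():
            k = len(rhs)
            if len(stack) >= k and stack[-k:] == rhs:
                # pop the subconstituents, push the constituent, output the rule number
                run(stack[:-k] + (lhs,), pos, out + (str(r),))
        if pos < len(words):
            # shift: push the next word and output it
            run(stack + (words[pos],), pos + 1, out + (words[pos],))

    run((), 0, ())
    return results
```

For "n v det n prep n" it returns exactly the two parse strings given in section 4, without the delimiters. The search terminates here only because the grammar has no ε-rules and no unit rules between nonterminals.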
A top-down PDT uses a different sequence of operations, 
detailed in appendix B, resulting in the shape of figure 2 
where a forest node points to the end of the list of sons, which 
is itself chained backward. These two figures are only simple 
examples. Many variations on the shape of parse trees and 
forests may be obtained by changing the parsing schema. 
Sharing in the shared forest may correspond to sharing of a complete subtree, but also to sharing of a tail of a list of sons: this is what allows the cubic complexity. Thus bottom-up parsing may share only the rightmost subconstituents of a constituent, while top-down parsing may share only the leftmost subconstituents. This relation between the parsing schema and the shape of the shared forest (and type of sharing) is a consequence of intrinsic properties of chart parsing, and not of our specific implementation.
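The link between list shape and what can be shared shows up when son lists are hash-consed into rules with exactly two right-hand-side symbols. In the Python sketch below (the class and method names are hypothetical), the default mode chains each list forward and ends it with `nil`, as for bottom-up parsing (figure 1), while `backward=True` chains it backward so that the node points to the cell of its last son, as for top-down parsing (figure 2). Caveat: hash-consing merges structurally identical subtrees even at different input positions, which a real shared forest indexed by positions would not do; the sketch only illustrates the shape of the sharing.

```python
class ForestEncoder:
    """Hash-conses parse trees into rules with exactly two right-hand-side symbols."""

    def __init__(self, backward=False):
        self.backward = backward  # False: forward lists (figure 1); True: backward lists (figure 2)
        self.rules = {}           # right-hand-side pair -> nonterminal name

    def _intern(self, rhs):
        # Reuse the nonterminal of an identical right-hand side: this is the sharing.
        if rhs not in self.rules:
            self.rules[rhs] = f"nt{len(self.rules)}"
        return self.rules[rhs]

    def _sons(self, sons):
        cell = "nil"
        if self.backward:
            # each cell holds the cell for the preceding sons and one son;
            # the returned cell is the one holding the last son
            for son in sons:
                cell = self._intern((cell, self.encode(son)))
        else:
            # each cell holds one son and the cell for the following sons, ending with nil;
            # the returned cell is the one holding the first son
            for son in reversed(sons):
                cell = self._intern((self.encode(son), cell))
        return cell

    def encode(self, tree):
        """tree is a word or a pair (rule number, sons); returns its symbol."""
        if isinstance(tree, str):
            return tree
        rule, sons = tree
        return self._intern((self._sons(sons), rule))
```

For two constituents with hypothetical sons "a b c" and "z b c", the forward encoding needs 6 rules because the cells for "b c" are shared, while the backward encoding needs 8; with sons "a b c" and "a b z" it is the other way round.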
It is for example to be expected that the bidirectional na- 
ture of island parsing leads to irregular structure in shared 
forests, when optimal sharing is sought for. 
3 Implementation and Experimental Results
The ideas presented above have been implemented in an experimental system called Tin (after the woodman of Oz). The intent is to provide a uniform framework for the construction and experimentation of chart parsers, somewhat as systems like MCHART \[29\], but with a more systematic theoretical foundation. The kernel of the system is a virtual parsing machine with a stack and a set of primitive commands corresponding essentially to the operation of a practical Push-Down Transducer. These commands include for example: push (resp. pop) to push a symbol on the stack (resp. pop one), checkwindow to compare the look-ahead symbol(s) to some given symbol, checkstack to branch depending on the top of the stack, scan to read an input word, output to output a rule number (or a terminal symbol), goto for unconditional jumps, and a few others. However, these commands are never used directly to program parsers. They are used as machine instructions for compilers that compile grammatical definitions into Tin code according to some parsing schema.
¹⁰ This was noted by Sheil \[26\] and is implicit in his use of "2-form" grammars.
Figure 1: Bottom-up parse-tree
Figure 2: Top-down parse-tree
A characteristic of these commands is that they may all be marked as non-deterministic. The intuitive interpretation is that there is a non-deterministic choice between a command thus marked and another command whose address in the virtual machine code is then specified. However, execution of the virtual machine code is done by an all-paths interpreter that follows the dynamic programming strategy described in section 2.1 and appendix A.
The Tin interpreter is used in two different ways: 
1. to study the effectiveness for chart parsing of known 
parsing schemata designed for deterministic parsing. 
We have only considered formally defined parsing schemata, corresponding to established PDA construction techniques that we use to mechanically translate CF grammars into Tin code (e.g. LALR(1) and LALR(2) \[6\], weak precedence \[12\], LL(0) top-down (recursive descent), LR(0), LR(1) \[1\] ...).
2. to study the computational behavior of the generated 
code, and the optimization techniques that could be 
used on the Tin code -- and more generally chart 
parser code -- with respect to code size, execution 
speed and better sharing in the parse forest. 
Experimenting with several compilation schemata has shown that sophistication may have a negative effect on the efficiency of all-path parsing¹¹. Sophisticated PDT construction techniques tend to multiply the number of special cases, thereby increasing the code size of the chart parser. Sometimes this also prevents sharing of locally identical subcomputations because of differences in context analysis. This in turn may result in less sharing in the parse forest and sometimes longer computation, as in example SBBL in appendix C, but of course it does not change the set of parse-trees encoded in the forest¹². Experimentally, weak precedence gives slightly better sharing than LALR(1) parsing. The latter is often viewed as more efficient, whereas it only has a larger deterministic domain.
One essential guideline to achieve better sharing (and often 
also reduced computation time) is to try to recognize every 
grammar rule in only one place of the generated chart parser 
code, even at the cost of increasing non-determinism. 
Thus simpler schemata such as precedence, LL(0) (and probably LR(0)¹³) produce the best sharing. However, since they correspond to a smaller deterministic domain within the CF grammar realm, they may sometimes be computationally less efficient because they produce a larger number of useless items (i.e. edges) that correspond to dead-end computational paths.
Slight sophistication (e.g. LALR(1) used by Tomita in \[31\], or LR(1)) may slightly improve computational performance by detecting dead-end computations earlier. This may however be at the expense of the forest sharing quality. More sophistication (say LR(2)) usually loses on both accounts, as explained earlier. The duplication of computational paths due to distinct context analysis outweighs the benefits of early elimination of dead-end paths. But there can be no absolute rule: if a grammar is "close" to the LR(2) domain, an LR(2) schema is likely to give the best result for most parsed sentences.
¹¹ We mean here the sophistication of the CF parser construction technique rather than the sophistication of the language features chosen to be used by this parser.
¹² This negative behavior of some techniques originally intended to preserve determinism had been remarked and analyzed in a special case by Bouckaert, Pirotte and Snelling \[3\]. However we believe their result to be weaker than ours, since it seems to rely on the fact that they directly interpret grammars rather than first compile them. Hence each interpretive step includes in some sense compilation steps, which are more expensive when look-ahead is increased. Their paper presents several examples that run less efficiently when look-ahead is increased. For all these examples, this behavior disappears in our compiled setting. However the grammar SBBL in appendix C shows a loss of efficiency with increased look-ahead that is due exclusively to loss of sharing caused by irrelevant contextual distinctions. This effect is particularly visible when parsing incomplete sentences \[16\].
Efficiency loss with increased look-ahead is mainly due to state splitting \[6\]. This should favor LALR techniques over LR ones.
¹³ Our results do not take into account a newly found optimization of PDT interpretation that applies to all and only to bottom-up PDTs. This should make simple bottom-up schemes competitive for sharing quality, and even increase their computational efficiency. However it should not change qualitatively the relative performances of bottom-up parsers, and may emphasize even more the phenomenon that reduces efficiency when look-ahead increases.
Sophisticated schemata correspond also to larger parsers, 
which may be critical in some natural language applications 
with very large grammars. 
The choice of a parsing schema ultimately depends on the grammar used, on the corpus (or kind) of sentences to be analyzed, and on a balance between computational and sharing efficiency. It is best decided on an experimental basis with a system such as ours. Furthermore, we do not believe that any firm conclusion limited to CF grammars would be of real practical usefulness. The real purpose of the work presented is to gain qualitative insight into phenomena which are best exhibited in the simpler framework of CF parsing. This insight should help us with more complex formalisms (cf. section 5) for which the phenomena might be less easily evidenced.
Note that the evidence gained contradicts the common belief that parsing schemata with a large deterministic domain (see for example the remarks on LR parsing in \[31\]) are more effective than simpler ones. Most experiments in this area were based on incomparable implementations, while our uniform framework gives us a common theoretical yardstick.
4 A Simple Bottom-Up Example 
The following is a simple example based on a bottom-up 
PDT generated by our LALR(1) compiler from the following 
grammar taken from \[31\]: 
(0) '$ax ::= $ 's $
(1) 's ::= 'np 'vp
(2) 's ::= 's 'pp
(3) 'np ::= n
(4) 'np ::= det n
(5) 'np ::= 'np 'pp
(6) 'pp ::= prep 'np
(7) 'vp ::= v 'np
Nonterminals are prefixed with a quote symbol. The first rule is used for initialization and handling of the delimiter symbol $. The $ delimiters are implicit in the actual input sentence.
The sample input is "n v det n prep n". It stands for (for example) the sentence: "I see a man at home".
4.1 Output grammar produced by the parser 
The grammar of parses of the input sentence is given in fig- 
ure 3. 
The initial nonterminal is the left-hand side of the first rule. For readability, the nonterminals have been given computer-generated names of the form ntx, where x is an integer. All other symbols are terminal. Integer terminals correspond to rule numbers of the input language grammar given above, and the other terminals are symbols of the parsed language, except for the special terminal "nil", which indicates the end of the list of subconstituents of a sentence constituent, and may also be read as the empty string $\varepsilon$. Note the ambiguity for nonterminal nt4.
It is possible to simplify this grammar to 7 rules without 
losing the sharing of common subparses. However it would 
no longer exhibit the structure that makes it readable as a 
shared-forest (though this structure could be retrieved). 
nt0  ::= nt1 0          nt19 ::= nt20 nil
nt1  ::= nt2 nt3        nt20 ::= n
nt2  ::= $              nt21 ::= nt22 nil
nt3  ::= nt4 nt37       nt22 ::= nt23 6
nt4  ::= nt5 2          nt23 ::= nt24 nt25
nt4  ::= nt29 1         nt24 ::= prep
nt5  ::= nt6 nt21       nt25 ::= nt26 nil
nt6  ::= nt7 1          nt26 ::= nt27 3
nt7  ::= nt8 nt11       nt27 ::= nt28 nil
nt8  ::= nt9 3          nt28 ::= n
nt9  ::= nt10 nil       nt29 ::= nt8 nt30
nt10 ::= n              nt30 ::= nt31 nil
nt11 ::= nt12 nil       nt31 ::= nt32 7
nt12 ::= nt13 7         nt32 ::= nt14 nt33
nt13 ::= nt14 nt15      nt33 ::= nt34 nil
nt14 ::= v              nt34 ::= nt35 5
nt15 ::= nt16 nil       nt35 ::= nt16 nt36
nt16 ::= nt17 4         nt36 ::= nt22 nil
nt17 ::= nt18 nt19      nt37 ::= nt38 nil
nt18 ::= det            nt38 ::= $
Figure 3: Grammar of parses of the input sentence 
The two parses of the input sentence defined by this grammar are:
$ n 3 v det n 4 7 1 prep n 3 6 2 $
$ n 3 v det n 4 prep n 3 6 5 7 1 $
Here again the two $ symbols must be read as delimiters. The "nil" symbols, no longer useful, have been omitted in these two parses.
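The claim that Figure 3 encodes exactly these two parses can be checked mechanically. The Python sketch below loads the rules of Figure 3 as text, enumerates every terminal string derived from nt0 (the grammar is acyclic here, so enumeration terminates) and drops the nil symbols. Each result also ends with 0, the number of the initialization rule, contributed by the rule nt0 ::= nt1 0; the two parses listed above leave it out. The helper names `load` and `terminal_strings` are ours.

```python
from itertools import product

FIGURE_3 = """
nt0 ::= nt1 0
nt1 ::= nt2 nt3
nt2 ::= $
nt3 ::= nt4 nt37
nt4 ::= nt5 2
nt4 ::= nt29 1
nt5 ::= nt6 nt21
nt6 ::= nt7 1
nt7 ::= nt8 nt11
nt8 ::= nt9 3
nt9 ::= nt10 nil
nt10 ::= n
nt11 ::= nt12 nil
nt12 ::= nt13 7
nt13 ::= nt14 nt15
nt14 ::= v
nt15 ::= nt16 nil
nt16 ::= nt17 4
nt17 ::= nt18 nt19
nt18 ::= det
nt19 ::= nt20 nil
nt20 ::= n
nt21 ::= nt22 nil
nt22 ::= nt23 6
nt23 ::= nt24 nt25
nt24 ::= prep
nt25 ::= nt26 nil
nt26 ::= nt27 3
nt27 ::= nt28 nil
nt28 ::= n
nt29 ::= nt8 nt30
nt30 ::= nt31 nil
nt31 ::= nt32 7
nt32 ::= nt14 nt33
nt33 ::= nt34 nil
nt34 ::= nt35 5
nt35 ::= nt16 nt36
nt36 ::= nt22 nil
nt37 ::= nt38 nil
nt38 ::= $
"""


def load(text):
    """Parse 'lhs ::= sym sym ...' lines into {lhs: [rhs, ...]}."""
    grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        grammar.setdefault(lhs.strip(), []).append(rhs.split())
    return grammar


def terminal_strings(grammar, sym):
    """Yield every terminal string derived from sym (assumes an acyclic grammar)."""
    if sym not in grammar:                     # terminal symbol
        yield [sym]
        return
    for rhs in grammar[sym]:                   # one alternative per rule
        for parts in product(*(list(terminal_strings(grammar, s)) for s in rhs)):
            yield [t for part in parts for t in part]


G = load(FIGURE_3)
parses = {" ".join(t for t in d if t != "nil") for d in terminal_strings(G, "nt0")}
for p in sorted(parses):
    print(p)
# $ n 3 v det n 4 7 1 prep n 3 6 2 $ 0
# $ n 3 v det n 4 prep n 3 6 5 7 1 $ 0
```

Note that nt8, nt14, nt16 and nt22 each occur in right-hand sides used by both parses: this is the subtree sharing that figures 4 and 5 depict.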
4.2 Parse shared-forest constructed from that grammar
To explain the structure of the shared forest, we first build a 
graph from the grammar, as shown in figure 4. Each node 
corresponds to one terminal or nonterminal of the grammar 
in figure 3, and is labelled by it. The labels at the right 
of small dashes are rule numbers from the parsed language 
grammar (see beginning of section 4). The basic structure is 
that of figure 1. 
From this first graph, we can trivially derive the more tra- 
ditional shared forest given in figure 5. Note that this simpli- 
fied representation is not always adequate since it does not 
allow partial sharing of their sons between two nodes. Each 
node includes a label which is a non-terminal of the parsed 
language grammar, and for each possible derivation (several 
in case of ambiguity) there is the number of the grammar 
rule used for that derivation. Though this simplified version 
is more readable, the representation of figure 5 is not ade- 
quate to represent partial sharing of the subconstituents of 
a constituent. 
Of course, the "constructions" given in this section are
purely virtual. In an implementation, the data-structure rep- 
resenting the grammar of figure 3 may be directly interpreted 
and used as a shared-forest. 
A similar construction for top-down parsing is sketched in 
appendix B. 
Figure 4: Graph of the output grammar
Figure 5: The shared forest
5 Extensions 
As indicated earlier, our intent is mostly to understand phe- 
nomena that would be harder to evidence in more complex 
grammatical formalisms. 
This statement implies that our approach can be extended. 
This is indeed the case. It is known that many simple parsing schemata can be expressed with stack based machines \[32\]. This is certainly the case for all left-to-right CF chart parsing schemata.
We have formally extended the concept of PDA into that 
of Logical PDA which is an operational push-down stack de- 
vice for parsing unification based grammars \[17,18\] or other 
non-CF grammars such as Tree Adjoining Grammars \[19\]. 
Hence we are reusing and developing our theoretical \[18\] and experimental \[38\] approach in this much more general setting, which is more likely to be effectively usable for natural language parsing.
Furthermore, these extensions can also express, within the PDA model, non-left-to-right behavior such as is used in island parsing \[38\] or in Sheil's approach \[26\]. More generally they allow the formal analysis of agenda strategies, which we have not considered here. In these extensions, the counterparts of parse forests are proof forests of definite clause programs.
6 Conclusion 
Analysis of all-path parsing schemata within a common framework exhibits in comparable terms the properties of these schemata, and gives objective criteria for choosing a given schema when implementing a language analyzer. The approach taken here supports both theoretical analysis and actual experimentation, both for the computational behavior of parsers and for the structure of the resulting shared forest.
Many experiments and extensions still remain to be made:
improved dynamic programming interpretation of bottom- 
up parsers, more extensive experimental measurements with 
a variety of languages and parsing schemata, or generaliza- 
tion of this approach to more complex situations, such as 
word lattice parsing \[21,30\], or even handling of "secondary" 
language features. Early research in that latter direction is 
promising: our framework and the corresponding paradigm 
for parser construction have been extended to full first-order 
Horn clauses \[17,18\], and are hence applicable to unification 
based grammatical formalisms \[27\]. Shared forest construc- 
tion and analysis can be generalized in the same way to these 
more advanced formalisms. 
Acknowledgements: We are grateful to Véronique
Donzeau-Gouge for many fruitful discussions.
This work has been partially supported by the Eureka 
Software Factory (ESF) project. 
References 
\[1\] Aho, A.V.; and Ullman, J.D. 1972 The Theory of Parsing, Translation and Compiling. Prentice-Hall, Englewood Cliffs, New Jersey.
\[2\] Billot, S. 1988 Analyseurs Syntaxiques et Non-Déterminisme. Thèse de Doctorat, Université d'Orléans la Source, Orléans (France).
\[3\] Bouckaert, M.; Pirotte, A.; and Snelling, M. 1975 Efficient Parsing Algorithms for General Context-Free Grammars. Information Sciences 8(1): 1-26.
\[4\] Cocke, J.; and Schwartz, J.T. 1970 Programming Languages and Their Compilers. Courant Institute of Mathematical Sciences, New York University, New York.
\[5\] Coppersmith, D.; and Winograd, S. 1982 On the Asymptotic Complexity of Matrix Multiplication. SIAM Journal on Computing, 11(3): 472-492.
\[6\] DeRemer, F.L. 1971 Simple LR(k) Grammars. Communications ACM 14(7): 453-460.
\[7\] Earley, J. 1970 An Efficient Context-Free Parsing 
Algorithm. Communications ACM 13(2): 94-102. 
\[8\] Futamura, Y. (ed.) 1988 Proceedings of the Workshop on Partial Evaluation and Mixed Computation. New Generation Computing 6(2,3).
\[9\] Graham, S.L.; Harrison, M.A.; and Ruzzo, W.L. 1980 An Improved Context-Free Recognizer. ACM Transactions on Programming Languages and Systems 2(3): 415-462.
\[10\] Griffiths, T.; and Petrick, S. 1965 On the Relative Efficiencies of Context-Free Grammar Recognizers. Communications ACM 8(5): 289-300.
\[11\] Hays, D.G. 1962 Automatic Language-Data Processing. In Computer Applications in the Behavioral Sciences, (H. Borko ed.), Prentice-Hall, pp. 394-423.
\[12\] Ichbiah, J.D.; and Morse, S.P. 1970 A Technique for Generating Almost Optimal Floyd-Evans Productions for Precedence Grammars. Communications ACM 13(8): 501-508.
\[13\] Kasami, T. 1965 An Efficient Recognition and Syntax Analysis Algorithm for Context-Free Languages. Report of Univ. of Hawaii, also AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford (Massachusetts), also 1968, University of Illinois Coordinated Science Lab. Report, No. R-257.
\[14\] Kay, M. 1980 Algorithm Schemata and Data Structures in Syntactic Processing. Proceedings of the Nobel Symposium on Text Processing, Gothenburg.
\[15\] Lang, B. 1974 Deterministic Techniques for Efficient Non-deterministic Parsers. Proc. of the 2nd Colloquium on Automata, Languages and Programming, J. Loeckx (ed.), Saarbrücken, Springer Lecture Notes in Computer Science 14: 255-269. Also: Rapport de Recherche 72, IRIA-Laboria, Rocquencourt (France).
\[16\] Lang, B. 1988 Parsing Incomplete Sentences. Proc. of the 12th Internat. Conf. on Computational Linguistics (COLING'88) Vol. 1: 365-371, D. Vargha (ed.), Budapest (Hungary).
\[17\] Lang, B. 1988 Datalog Automata. Proc. of the 3rd Internat. Conf. on Data and Knowledge Bases, C. Beeri, J.W. Schmidt, U. Dayal (eds.), Morgan Kaufmann Pub., pp. 389-404, Jerusalem (Israel).
\[18\] Lang, B. 1988 Complete Evaluation of Horn Clauses, an Automata Theoretic Approach. INRIA Research Report 913.
\[19\] Lang, B. 1988 The Systematic Construction of Earley Parsers: Application to the Production of O(n^6) Earley Parsers for Tree Adjoining Grammars. In preparation.
\[20\] Li, T.; and Chun, H.W. 1987 A Massively Parallel Network-Based Natural Language Parsing System. Proc. of 2nd Int. Conf. on Computers and Applications, Beijing (Peking): 401-408.
\[21\] Nakagawa, S. 1987 Spoken Sentence Recogni- 
tion by Time-Synchronous Parsing Algorithm of 
Context-Free Grammar. Proc. ICASSP 87, Dallas 
(Texas), Vol. 2 : 829-832. 
\[22\] Pereira, F.C.N.; and Warren, D.H.D. 1980 Definite Clause Grammars for Language Analysis -- A Survey of the Formalism and a Comparison with Augmented Transition Networks. Artificial Intelligence 13: 231-278.
\[23\] Phillips, J.D. 1986 A Simple Efficient Parser for 
Phrase-Structure Grammars. Quarterly Newslet- 
ter of the Soc. for the Study of Artificial Intelli- 
gence (AISBQ) 59: 14-19. 
\[24\] Pratt, V.R. 1975 LINGOL -- A Progress Report. In Proceedings of the 4th IJCAI: 422-428.
\[25\] Rekers, J. 1987 A Parser Generator for Finitely 
Ambiguous Context-Free Grammars. Report CS- 
R8712, Computer Science/Dpt. of Software Tech- 
nology, Centrum voor Wiskunde en Informatica, 
Amsterdam (The Netherlands). 
\[26\] Sheil, B.A. 1976 Observations on Context Free Parsing. In Statistical Methods in Linguistics: 71-109, Stockholm (Sweden), Proc. of Internat. Conf. on Computational Linguistics (COLING-76), Ottawa (Canada). Also: Technical Report TR 12-76, Center for Research in Computing Technology, Aiken Computation Laboratory, Harvard Univ., Cambridge (Massachusetts).
\[27\] Shieber, S.M. 1984 The Design of a Computer Language for Linguistic Information. Proc. of the 10th Internat. Conf. on Computational Linguistics -- COLING'84: 362-366, Stanford (California).
\[28\] Shieber, S.M. 1985 Using Restriction to Extend Parsing Algorithms for Complex-Feature-Based Formalisms. Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics: 145-152.
\[29\] Thompson, H. 1983 MCHART: A Flexible, Mod- 
ular Chart Parsing System. Proc. of the National 
Conf. on Artificial Intelligence (AAAI-83), Wash- 
ington (D.C.), pp. 408-410. 
\[30\] Tomita, M. 1986 An Efficient Word Lattice Parsing Algorithm for Continuous Speech Recognition. In Proceedings of IEEE-IECE-ASJ International Conference on Acoustics, Speech, and Signal Processing (ICASSP 86), Vol. 3: 1569-1572.
\[31\] Tomita, M. 1987 An Efficient Augmented- 
Context-Free Parsing Algorithm. Computational 
Linguistics 13(1-2): 31-46. 
\[32\] Tomita, M. 1988 Graph-structured Stack and Natural Language Parsing. Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics: 249-257.
\[33\] Uehara, K.; Ochitani, R.; Kakusho, O.; Toyoda, J. 1984 A Bottom-Up Parser based on Predicate Logic: A Survey of the Formalism and its Implementation Technique. 1984 Internat. Symp. on Logic Programming, Atlantic City (New Jersey): 220-227.
\[34\] U.S. Department of Defense 1983 Reference 
Manual for the Ada Programming Language. 
ANSI/MIL-STD-1815 A. 
\[35\] Valiant, L.G. 1975 General Context-Free Recognition in Less than Cubic Time. Journal of Computer and System Sciences, 10: 308-315.
\[36\] Villemonte de la Clergerie, E.; and Zanchetta, A. 1988 Evaluateur de Clauses de Horn. Rapport de Stage d'Option, École Polytechnique, Palaiseau (France).
\[37\] Wirth, N. 1971 The Programming Language Pas- 
cal. Acta Informatica, 1(1). 
\[38\] Ward, W.H.; Hauptmann, A.G.; Stern, R.M.; and Chanak, T. 1988 Parsing Spoken Phrases Despite Missing Words. In Proceedings of the 1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 88), Vol. 1: 275-278.
\[39\] Younger, D.H. 1967 Recognition and Parsing of Context-Free Languages in Time n^3. Information and Control, 10(2): 189-208.
A The algorithm 
This is the formal description of a minimal dynamic programming
PDT interpreter. The actual Tin interpreter has a larger
instruction set. Comments are prefixed with --.

-- Begin parse with input sentence x of length n
step-A: -- Initialization
  Ũ := ((q̃ $ 0) (q̃ $ 0));   -- initial item
  π̃ := (Ũ → ε);             -- first rule of output grammar
  S_0 := {Ũ};                -- initialize item-set S_0
  𝒫 := {π̃};                  -- rules of output grammar
  i := 0;                    -- input-scanner index is set
                             -- before the first input symbol
step-B: -- Iteration
  while i ≤ n loop
    for every item U = ((p A i) (q B j)) in S_i do
      for every transition τ in δ do
        -- we consider four kinds of transitions, corresponding
        -- to the instructions of a minimal PDT interpreter.
        if τ = (p ε ε ↦ r ε z) then                   -- OUTPUT z
          V := ((r A i) (q B j));
          S_i := S_i ∪ {V};
          𝒫 := 𝒫 ∪ {(V → U z)};
        if τ = (p ε ε ↦ r C ε) then                   -- PUSH C
          V := ((r C i) (p A i));
          S_i := S_i ∪ {V};
          𝒫 := 𝒫 ∪ {(V → ε)};
        if τ = (p A ε ↦ r ε ε) then                   -- POP A
          for every item Y = ((q B j) (s D k)) in S_j do
            V := ((r B i) (s D k));
            S_i := S_i ∪ {V};
            𝒫 := 𝒫 ∪ {(V → Y U)};
        if τ = (p ε a ↦ r ε ε) and a = x_{i+1} then   -- SHIFT a
          V := ((r A i+1) (q B j));
          S_{i+1} := S_{i+1} ∪ {V};
          𝒫 := 𝒫 ∪ {(V → U)};
    i := i+1;
  end loop;
step-C: -- Termination
  for every item U = ((f $ n) (q̃ $ 0)) in S_n such that f ∈ F do
    𝒫 := 𝒫 ∪ {(U_f → U)};
      -- U_f is the initial nonterminal of the output grammar.
-- End of parse
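The interpreter above can be sketched in Python to make the item bookkeeping concrete. This is a minimal sketch under stated assumptions, not the Tin implementation: the `Transition` record and its `kind` strings are hypothetical encodings of the four instructions, the bottom stack symbol is assumed to be written `$`, the goal nonterminal is named `U_f`, and each item set S_i is re-scanned until it stops growing, so that a POP whose partner item Y is added to the same set S_i later is not missed (an ordering detail the pseudocode leaves implicit). The output grammar, i.e. the shared forest, is returned as a set of rules.

```python
from typing import NamedTuple, Optional

BOTTOM = "$"   # assumed name of the initial (bottom) stack symbol
GOAL = "U_f"   # initial nonterminal of the output grammar


class Transition(NamedTuple):
    """Hypothetical encoding of the four minimal PDT instructions.

    "out"   (p eps eps -> r eps z)  OUTPUT z
    "push"  (p eps eps -> r C eps)  PUSH C
    "pop"   (p A eps -> r eps eps)  POP A
    "shift" (p eps a -> r eps eps)  SHIFT a
    """
    kind: str
    p: str                     # source state
    r: str                     # target state
    sym: Optional[str] = None  # z, C, A or a, depending on kind


def interpret(delta, q_init, finals, x):
    """Run PDT delta on x; items are pairs of modes ((p, A, i), (q, B, j)).

    Returns (item_sets, rules, accepted), where rules is the output
    grammar as a set of (lhs, rhs_tuple) pairs; an empty rhs is epsilon.
    """
    n = len(x)
    u0 = ((q_init, BOTTOM, 0), (q_init, BOTTOM, 0))  # initial item
    S = [set() for _ in range(n + 1)]
    S[0].add(u0)
    P = {(u0, ())}  # first rule of the output grammar: u0 -> epsilon

    for i in range(n + 1):
        changed = True
        while changed:  # re-scan S[i] until no new item is added to it
            changed = False
            for U in list(S[i]):
                (p, A, _), (q, B, j) = U
                produced = []  # (target item-set index, new item, rule rhs)
                for t in delta:
                    if t.p != p:
                        continue
                    if t.kind == "out":
                        produced.append((i, ((t.r, A, i), (q, B, j)), (U, t.sym)))
                    elif t.kind == "push":
                        produced.append((i, ((t.r, t.sym, i), (p, A, i)), ()))
                    elif t.kind == "pop" and t.sym == A:
                        for Y in list(S[j]):  # Y = ((q B j) (s D k))
                            if Y[0] == (q, B, j):
                                produced.append((i, ((t.r, B, i), Y[1]), (Y, U)))
                    elif t.kind == "shift" and i < n and x[i] == t.sym:
                        produced.append((i + 1, ((t.r, A, i + 1), (q, B, j)), (U,)))
                for k, V, rhs in produced:
                    P.add((V, rhs))
                    if V not in S[k]:
                        S[k].add(V)
                        changed = changed or k == i

    for U in S[n]:  # termination: items ((f $ n) (q_init $ 0)) with f final
        (f, A, _), below = U
        if f in finals and A == BOTTOM and below == (q_init, BOTTOM, 0):
            P.add((GOAL, (U,)))
    accepted = any(lhs == GOAL for lhs, _ in P)
    return S, P, accepted


# Usage: S -> a recognized by two different rules r1 and r2 (an ambiguity).
delta = [
    Transition("push", "q0", "q1", "S"),   # predict S
    Transition("shift", "q1", "q2", "a"),  # read a
    Transition("out", "q2", "q3", "r1"),   # output rule r1
    Transition("out", "q2", "q3", "r2"),   # output rule r2
    Transition("pop", "q3", "f", "S"),     # S is complete
]
S, P, ok = interpret(delta, "q0", {"f"}, "a")
print(ok, len(P))  # prints: True 7
```

Both outputs lead to the same item ((q3 S 1)(q0 $ 0)), which therefore carries two alternative rules, while the POP that follows is performed only once: this is the sharing that the output grammar records.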
B Interpretation of a top-down PDT 
To illustrate the creation of the shared forest, we present 
here informally a simplified sequence of transitions in their 
order of execution by a top-down parser. We indicate the 
transitions as Tin instructions on the left, as defined in ap- 
pendix A. On the right we indicate the item and the rule 
produced by execution of each instruction: the item is the 
left-hand-side of the rule. 
The pseudo-instruction scan is given in italics because
it does not exist, and stands for the parsing of a
sub-constituent: either several transitions for a complex
constituent or a single shift instruction for a lexical constituent.
The global behavior of scan is the same as that of shift, and
it may be understood as a shift on the whole sub-constituent.
Items are represented by a pair of integers. Hence we give
no details about states or input, but keep just enough
information to see how items are inter-related when applying a
pop transition: it must use two items of the form (a,b) and
(b,c), as indicated by the algorithm.
The symbol r stands for the rule used to recognize a
constituent s, and r_i stands for the rule used to recognize its
i-th sub-constituent s_i. The whole sequence, minus the first and
the last two instructions, would be equivalent to "scan s".
...         (6,5)
push r      (7,6) -> ε
push r1     (8,7) -> ε
scan s1     (9,7) -> (8,7) s1
out r1      (10,7) -> (9,7) r1
pop         (11,6) -> (7,6) (10,7)
push r2     (12,11) -> ε
scan s2     (13,11) -> (12,11) s2
out r2      (14,11) -> (13,11) r2
pop         (15,6) -> (11,6) (14,11)
push r3     (16,15) -> ε
scan s3     (17,15) -> (16,15) s3
out r3      (18,15) -> (17,15) r3
pop         (19,6) -> (15,6) (18,15)
out r       (20,6) -> (19,6) r
pop         (21,5) -> (6,5) (20,6)
...
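The pair bookkeeping in this trace follows mechanically from the algorithm of appendix A, and can be replayed in a few lines. The sketch below assumes, as in the trace, that each instruction creates a fresh item numbered consecutively and that, along this single path, an item can be looked up by its first index; `replay` and the instruction strings are hypothetical names chosen for illustration.

```python
def replay(start, program):
    """Replay a single-path instruction trace on pair items (a, b).

    Returns the list of (new_item, rhs) rules; rhs mixes items and
    symbols, and an empty rhs stands for epsilon.
    """
    by_first = {start[0]: start}  # item lookup by its first index
    cur, next_id, rules = start, start[0] + 1, []
    for op, *arg in program:
        if op == "push":  # new item stacked on cur = (a, b): (new, a) -> eps
            new, rhs = (next_id, cur[0]), ()
        elif op in ("scan", "out"):  # keeps the lower index: (new, b) -> cur sym
            new, rhs = (next_id, cur[1]), (cur, arg[0])
        elif op == "pop":  # combines (b, c) with cur = (a, b): (new, c) -> (b,c) (a,b)
            below = by_first[cur[1]]
            new, rhs = (next_id, below[1]), (below, cur)
        else:
            raise ValueError(f"unknown instruction {op!r}")
        by_first[new[0]] = new
        rules.append((new, rhs))
        cur, next_id = new, next_id + 1
    return rules


TRACE = [("push", "r"), ("push", "r1"), ("scan", "s1"), ("out", "r1"), ("pop",),
         ("push", "r2"), ("scan", "s2"), ("out", "r2"), ("pop",),
         ("push", "r3"), ("scan", "s3"), ("out", "r3"), ("pop",),
         ("out", "r"), ("pop",)]

for lhs, rhs in replay((6, 5), TRACE):
    print(lhs, "->", rhs if rhs else "eps")
```

Every pop pairs the current item (a,b) with the unique earlier item (b,c), which is why the numbering alone is enough to rebuild the rules.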
This grammar may be simplified by eliminating useless
non-terminals, i.e. those deriving the empty string ε or a single
other non-terminal. As in section 4, the simplified grammar
may then be represented as a graph which is similar, with
more details (the rules used for the sub-constituents), to the
graph given in figure 2.
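A minimal sketch of this simplification, under two assumptions not spelled out in the text: the output grammar is stored as a dictionary from each item to its list of right-hand sides, and a non-terminal is eliminated only when it has a single rule whose right-hand side is empty or is one other non-terminal, so that no alternative (and hence no sharing) is lost. The start symbol is kept. The example data are the rules produced between push r and out r in the trace above, with s1, s2, s3, r1, r2, r3 and r treated as terminals.

```python
def simplify(rules, start, terminals):
    """Inline non-terminals whose only rule is X -> eps or X -> Y."""
    rules = {lhs: [tuple(rhs) for rhs in alts] for lhs, alts in rules.items()}
    while True:
        target = None
        for X, alts in rules.items():
            if X == start or len(alts) != 1:
                continue
            rhs = alts[0]
            is_eps = len(rhs) == 0
            is_unit = len(rhs) == 1 and rhs[0] not in terminals and rhs[0] != X
            if is_eps or is_unit:
                target = (X, rhs)
                break
        if target is None:
            return rules
        X, replacement = target
        del rules[X]
        for lhs in rules:  # substitute X by its (empty or unit) body everywhere
            rules[lhs] = [
                tuple(t for s in rhs for t in (replacement if s == X else (s,)))
                for rhs in rules[lhs]
            ]


T = {"s1", "s2", "s3", "r", "r1", "r2", "r3"}
trace_rules = {
    (7, 6): [()], (8, 7): [()], (9, 7): [((8, 7), "s1")],
    (10, 7): [((9, 7), "r1")], (11, 6): [((7, 6), (10, 7))],
    (12, 11): [()], (13, 11): [((12, 11), "s2")], (14, 11): [((13, 11), "r2")],
    (15, 6): [((11, 6), (14, 11))],
    (16, 15): [()], (17, 15): [((16, 15), "s3")], (18, 15): [((17, 15), "r3")],
    (19, 6): [((15, 6), (18, 15))], (20, 6): [((19, 6), "r")],
}
for lhs, alts in simplify(trace_rules, (20, 6), T).items():
    print(lhs, "->", alts)
```

After simplification the node (15,6) directly combines the nodes for the first two sub-constituents, (10,7) and (14,11), each labeled with its rule r1 or r2.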
C Experimental Comparisons 
This appendix gives some of the experimental data gathered to
compare compilation schemata.
For each grammar, the first table gives the size of the PDTs
obtained by compiling it according to several compilation schemata.
This size corresponds to the number of instructions generated for
the PDT, which is roughly the number of possible PDT states.
The second table gives two figures for each schema and for
several input sentences. The first figure is the number of items
computed to parse that sentence with the given schema: it may
be read as the number of computation steps and is thus a measure
of computational efficiency. The second figure is the number of
items remaining after simplification of the output grammar; it is
thus an indicator of sharing quality. Sharing is better when this
second figure is low.
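The simplification applied before counting the second figure is not detailed here. As an assumption, one standard ingredient is the removal of useless non-terminals: those that derive no terminal string (non-productive) and those not reachable from the start symbol through productive rules. A minimal sketch, treating any symbol without rules as a terminal; the grammar in the usage lines is a made-up example, not data from the tables.

```python
def useful_nonterminals(rules, start):
    """Return the non-terminals that are both productive and reachable.

    rules maps each non-terminal to a list of right-hand sides (tuples);
    any symbol without an entry in rules is treated as a terminal.
    """
    def derivable(rhs, productive):
        return all(s in productive or s not in rules for s in rhs)

    productive = set()
    changed = True
    while changed:  # least fixpoint of "has a rule made of productive symbols"
        changed = False
        for lhs, alts in rules.items():
            if lhs not in productive and any(derivable(r, productive) for r in alts):
                productive.add(lhs)
                changed = True
    if start not in productive:
        return set()
    reachable, todo = {start}, [start]
    while todo:  # follow only rules whose symbols are all productive
        for rhs in rules[todo.pop()]:
            if derivable(rhs, productive):
                for s in rhs:
                    if s in rules and s not in reachable:
                        reachable.add(s)
                        todo.append(s)
    return reachable


# Y is productive but unreachable; Z is reachable from Y but never productive.
example = {"U": [("X", "r1")], "X": [()], "Y": [("X", "r2"), ("Z",)], "Z": [("Z", "a")]}
print(sorted(useful_nonterminals(example, "U")))  # prints: ['U', 'X']
```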
In these tables, columns headed with LR/LALR stand for the
LR(0), LR(1), LALR(1) and LALR(2) cases (which often give the
same results), unless one of these cases has its own explicit column.
Tests were run on the GRE, NSE, UBDA and RR grammars
of \[3\]: they did not exhibit the loss of efficiency with increased
look-ahead that was reported for the bottom-up look-ahead of \[3\].
We believe the results presented here are consistent and give
an accurate comparison of performances of the parsers considered,
despite some implementation departures from the strict theoretical
model required by performance considerations. A first version of
our LL(0) compiler gave results that were inconsistent with the
results of the bottom-up parsers. This was due to a weakness in
that LL(0) compiler, which was then corrected. We consider this
experience to be a confirmation of the usefulness of our uniform
framework.
It must be stressed that these are preliminary experiments. On
the basis of their analysis, we intend a new set of experiments
that will better exhibit the phenomena discussed in the paper. In
particular we wish to study variants of the schemata and dynamic
programming interpretation that give the best possible sharing.
C.1 Grammar UBDA
A ::= A A | a

              LR(0)  LR(1)  LALR(1)  LALR(2)  preced.  LL(0)
PDT size      38     60     41       41       36       46

input string  LR/LALR    preced.    LL(0)
a             14 - 9     15 - 9     41 - 9
aa            23 - 15    29 - 15    75 - 15
aaaaaa        249 - 156  226 - 124  391 - 112
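Grammar UBDA is highly ambiguous: a^n has as many parses as there are binary bracketings of n symbols (a Catalan number), which is why sharing matters so much for it. The sketch below uses a CYK-style packed forest, a swapped-in construction rather than any of the PDT schemata measured above, with one node per span a[i:j] and one packed alternative per split point. It illustrates the tree count growing exponentially while the forest stays polynomial: O(n^2) nodes and O(n^3) packed alternatives.

```python
def ubda_forest(n):
    """Packed CYK-style forest for A ::= A A | a over the input a^n.

    Node (i, j) stands for an A covering a[i:j]; each list entry is one
    packed alternative: ("a",) for a leaf, or a pair of child nodes.
    """
    forest = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                forest[(i, j)] = [("a",)]
            else:  # one alternative per split point k
                forest[(i, j)] = [((i, k), (k, j)) for k in range(i + 1, j)]
    return forest


def count_trees(forest, root):
    """Number of distinct parse trees packed in the forest below root."""
    memo = {}

    def trees(node):
        if node not in memo:
            total = 0
            for alt in forest[node]:
                product = 1
                for child in alt:  # the terminal "a" contributes a factor 1
                    product *= trees(child) if child in forest else 1
                total += product
            memo[node] = total
        return memo[node]

    return trees(root)


for n in (3, 6, 12):
    f = ubda_forest(n)
    alternatives = sum(len(alts) for alts in f.values())
    print(n, count_trees(f, (0, n)), len(f), alternatives)
# prints:
# 3 2 6 7
# 6 42 21 41
# 12 58786 78 298
```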
C.2 Grammar RR
A ::= x A | x
This grammar is LALR(1) but not LR(0), which explains the
lower performance of the LR(0) parser.

              LR(0)  LR(1)  LALR(1)  LALR(2)  preced.  LL(0)
PDT size      34     37     37       37       48       46

input string  LR(0)    LR/LALR  preced.  LL(0)
x             14 - 9   14 - 9   15 - 9   28 - 9
xx            23 - 13  20 - 13  25 - 13  43 - 13
xxxxxx        99 - 29  44 - 29  56 - 29  123 - 29

C.3 Picogrammar of English
S  ::= NP VP | S PP
NP ::= n | det n | NP PP
VP ::= v NP
PP ::= prep NP

              LR(0)  LR(1)  LALR(1)  LALR(2)  preced.  LL(0)
PDT size      110    341    104      104      90       116

input string        LR/LALR    preced.    LL(0)
n v n prep n        71 - 47    72 - 47    169 - 43
n v n (prep n)^2    146 - 97   141 - 93   260 - 77
n v n (prep n)^3    260 - 172  245 - 161  371 - 122
n v n (prep n)^6    854 - 541  775 - 491  844 - 317
C.4 Grammar of Ada expressions
This grammar, too long for inclusion here, is the grammar of
expressions of the language Ada as given in the reference manual
\[34\]. This grammar is ambiguous.
In these examples, the use of look-ahead gives approximately a
25% gain in speed efficiency over LR(0) parsing, with the same
forest sharing.
However, the use of look-ahead may increase the LR(1) parser
size quadratically with the grammar size. Still, a better engineered
LR(1) construction should not usually increase that size as
dramatically as indicated by our experimental figure.

              LR(0)  LR(1)  LALR(1)  preced.
PDT size      587    32210  534      323

input string  LR(0)     LR(1)     LALR(1)   preced.
a*3           74 - 39   59 - 39   59 - 39   80 - 39
(a*3)+b       137 - 75  113 - 75  113 - 75  293 - 75
a*3+b**4      169 - 81  122 - 81  122 - 81  227 - 81

C.5 Grammar PB
E ::= a A d | a B c | b A c | b B d
A ::= e
B ::= e
This grammar is LR(1) but is not LALR. For each compilation
schema it gives the same result on all possible inputs: aed, aec,
bec and bed.

              LR(0)    LR(1)    LALR(1) & (2)  preced.  LL(0)
PDT size      76       100      80             84       122
any input     26 - 15  23 - 15  26 - 15        29 - 15  47 - 15
C.6 Grammar SBBL
E ::= X A d | X B c | Y A c | Y B d
X ::= f
Y ::= f
A ::= e A | g
B ::= e A | g

              LR(0)  LR(1)  LALR(1)  LALR(2)  preced.
PDT size      159    294    158      158      104

input string  LR(0)    LR(1)    LALR(1) & (2)  preced.
fegd          50 - 21  57 - 37  50 - 21        84 - 36
feeegd        62 - 29  75 - 49  62 - 29        110 - 44

The terminal f may be ambiguously parsed as X or as Y. This
ambiguous left context uselessly increases the complexity of the
LR(1) parser during recognition of the A and B constituents. Hence
LR(0) performs better in this case, since it ignores the context.
