The Primordial Soup Algorithm 
A Systematic Approach to the Specification of Parallel Parsers 
Wil Janssen, Mannes Poel, Klaas Sikkel, Job Zwiers
University of Twente, Dept. of Computer Science
P.O. Box 217, 7500 AE Enschede, The Netherlands
E-mail: {janssenw,mpoel,sikkel,zwiers}@cs.utwente.nl
Abstract 
A general framework for parallel parsing is presented, which allows
for a unified, systematic approach to parallel parsing. The Primordial
Soup Algorithm creates trees by allowing partial parse trees to combine
arbitrarily. By adding constraints to the general algorithm, a large
class of parallel parsing strategies can be defined. This is
exemplified by CYK, (bottom-up) Earley and de Vreught & Honig parsers.
From such a parsing strategy, algorithms for various machine
architectures can be derived in a systematic way.
1 Introduction 
In this paper we present a general framework for parallel parsing
algorithms. Parsing can be seen as a process in which a set of partial
parse trees is recognized. One starts with the productions as
elementary trees. Small trees can be combined into larger trees,
yielding ever larger and larger structures, until completed parses for
a particular target sentence are produced. We envisage the set of
recognized trees as a kind of primordial soup. Small trees float around
and if they fit together they can be combined into a larger tree. This
is, in a nutshell, the Primordial Soup paradigm.
In the most general approach, trees can combine in arbitrary ways.
That is, a new tree can be created from two existing trees if there is
a partial overlap between the trees. The overlapping part is unified.
Tree creation is nondestructive, in the sense that a tree can be used
more than once for the production of a larger tree. Or, to put it in a
different way, the initial soup contains an abundant amount of raw
material. Thus all relevant trees can actually be created.
The Primordial Soup Algorithm can be refined into a variety of parsing
strategies by adding constraints, either on the allowed type of trees
or on the way in which existing trees can be combined. A parsing
strategy specifies which trees will be recognized, without bothering to
specify control and data structures that must be added in order to
arrive at practical implementations.
For the development of parallel implementations, the (initial) absence
of control structure is an advantage, as decisions about system
architecture can be deferred to a later stage when the strategy has
been fleshed out in more detail. Our specification of the Primordial
Soup Algorithm allows for a systematic derivation of implementations of
parsing strategies, as is shown in a more detailed technical report
[JPSZ]. These derivations are carried out within a partial order
framework as introduced in [JPZ]. In the restricted space available
here we concentrate on the Primordial Soup Algorithm as a framework for
the specification of parsing strategies.
In section 2 the Primordial Soup Algorithm is introduced, exemplified
by a CYK-like approach. In section 3, the formalism is slightly
extended so as to allow for the description of almost any parallel or
sequential parser.
2 The Primordial Soup 
The Primordial Soup Algorithm will be introduced after some remarks
about notation and parsers. We show that the algorithm is a
generalization of well-known parsing strategies.
2.1 Preliminaries
We use the following notational conventions. Nonterminals are denoted
by $A, B, \ldots \in N$; terminals are denoted by
$a, b, \ldots \in \Sigma$. We write $V$ for $N \cup \Sigma$, with
typical elements $X, Y, \ldots$ Terminal strings are denoted by
$s, t, u, v, w, \ldots \in \Sigma^*$, arbitrary strings by
$\alpha, \beta, \ldots \in V^*$.
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992   373   PROC. OF COLING-92, NANTES, AUG. 23-28, 1992
Let $G = (N, \Sigma, P, S)$ be a context-free grammar. Let
$w = a_1 \ldots a_n \in \Sigma^*$ be the sentence. While executing an
arbitrary parsing algorithm, we maintain a set of trees that might be
subtrees of a parse for $w$. Let $\mathcal{F}_{bt}$ be the class of
finitely branching trees, in which all nodes have a label from some
universal class of symbols. Let
$\mathcal{T}(G) \subset \mathcal{F}_{bt}$ be the class of trees that
can be constructed from $P$; i.e., if some node is labelled $A$ and its
children $X_1, \ldots, X_n$, then $A \to X_1 \ldots X_n \in P$. We will
usually write $\mathcal{T}$ for $\mathcal{T}(G)$; individual trees are
denoted $\rho, \sigma, \tau, \ldots \in \mathcal{T}$.
We write $\mathit{root}(\tau)$ for the label of the root of a tree
$\tau$. The yield of a tree $\tau$, denoted by $\mathit{yield}(\tau)$,
is defined as the concatenation of the labels of the leaves. Clearly,
$\mathit{yield}(\tau) \in V^*$. Note that leaves labelled $\varepsilon$
(generated by empty productions) are not visible in the yield, as
$\varepsilon$ disappears in concatenation. A tree $\tau$ is a parse
tree for $w$ if $\mathit{root}(\tau) = S$ and
$\mathit{yield}(\tau) = w$. For arbitrary $w \in \Sigma^*$ a subclass
$\mathcal{T}_w \subset \mathcal{T}$ is defined that contains trees
$\tau$ with $\mathit{yield}(\tau) = a_i \ldots a_j$ for some substring
$a_i \ldots a_j$ of $w$. $\mathcal{T}_w$ is called the set of
subparses of $w$. The root of a subparse need not be $S$; it can be any
nonterminal $A \in N$.
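As a convenient notation for trees we write
$\tau = \langle A \leadsto \alpha \rangle$ for an arbitrary tree with
$A = \mathit{root}(\tau)$ and $\alpha = \mathit{yield}(\tau)$. In
general $\langle A \leadsto \alpha \rangle$ is not uniquely determined,
as every derivation $A \Rightarrow^{+} \alpha$ defines a tree
$\langle A \leadsto \alpha \rangle$. If we want to stress that a
derivation $A \Rightarrow^{+} \alpha\beta\gamma$ can be obtained as
$A \Rightarrow^{+} \alpha B \gamma \Rightarrow^{+} \alpha\beta\gamma$
we write
$\langle A \leadsto \alpha \langle B \leadsto \beta \rangle \gamma \rangle$
for the tree $\langle A \leadsto \alpha\beta\gamma \rangle$. Thus the
tree notation is generalized into
$\langle A \leadsto \xi_1 \ldots \xi_n \rangle$, where each $\xi_i$ is
either a leaf or a subtree. This simple tree notation is extended with
the following conventions:
• A tree $\langle A \leadsto \alpha \rangle$ corresponding to a
  single-step derivation $A \Rightarrow \alpha$ is also denoted as
  $\langle A \to \alpha \rangle$. This corresponds to a production
  $A \to \alpha \in P$.
• As a convenient shorthand, a tree
  $\langle A \leadsto \alpha \langle B_1 \leadsto \beta_1 \rangle \ldots \langle B_n \leadsto \beta_n \rangle \gamma \rangle$
  will be abbreviated to
  $\langle A \leadsto \alpha \langle B_1 \ldots B_n \leadsto \beta_1 \ldots \beta_n \rangle \gamma \rangle$.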
2.2 Various bottom-up parsers 
Our basic approach results from a generalization of various bottom-up
parsing algorithms. The oldest and perhaps best known of these is the
Cocke-Younger-Kasami (CYK) algorithm [You]. It requires the grammar to
be in Chomsky Normal Form, i.e., productions have the form $A \to BC$
or $A \to a$. If we have trees
$\tau_1 = \langle B \leadsto a_{i+1} \ldots a_k \rangle$ and
$\tau_2 = \langle C \leadsto a_{k+1} \ldots a_j \rangle$ and if there
is a production $A \to BC \in P$, we can construct a larger tree
$\langle A \leadsto a_{i+1} \ldots a_j \rangle$ from $\tau_1$ and
$\tau_2$. This can be continued until
$\langle S \leadsto a_1 \ldots a_n \rangle$ has been derived, or no new
trees can be constructed.
The CYK algorithm is usually described as a recognizer, rather than a
parser. A recognition algorithm collects a set of items that denote the
existence of trees, rather than trees themselves. If it is deduced that
$A \Rightarrow^{*} a_{i+1} \ldots a_j$ (without having constructed a
corresponding tree), this will be denoted by an item
$[A \leadsto a_{i+1} \ldots a_j]$. In general, an item
$[A \leadsto \alpha]$ denotes the existence of one or more trees
$\langle A \leadsto \alpha \rangle$. The string $w$ is grammatically
correct if and only if an item $[S \leadsto w]$ can be recognized.
The CYK algorithm recognizes items of the form
$[A \leadsto a_{i+1} \ldots a_j]$. For notational convenience, such an
item is usually written as $[i, A, j]$. Thus we get the conventional
description of CYK recognition: an item $[i, A, j]$ can be recognized
iff $[i, B, k]$ and $[k, C, j]$ have been recognized previously for
some $i < k < j$ and $A \to BC \in P$.
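The recognition rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation derived in [JPSZ]: the function name `cyk_items`, the encoding of the grammar as sets of tuples, and the seeding of items $[i, A, i+1]$ from productions $A \to a$ are choices made here. In the spirit of the paper, no particular order of combination is imposed; the item set is simply saturated.

```python
from itertools import product


def cyk_items(words, unary, binary):
    """Return all items (i, A, j) such that A derives words[i:j].

    unary:  set of (A, a) pairs, one per production A -> a
    binary: set of (A, B, C) triples, one per production A -> B C
    (Both encodings are assumptions made for this sketch.)
    """
    # Seed: an item [i, A, i+1] for every word matched by a production A -> a.
    items = {(i, A, i + 1)
             for i, word in enumerate(words)
             for (A, a) in unary if a == word}
    changed = True
    while changed:  # saturate; the order of combination is left open
        changed = False
        # list(items) is a snapshot, so adding to items inside the loop is safe.
        for (i, B, k), (k2, C, j) in product(list(items), repeat=2):
            if k != k2:
                continue
            for (A, B2, C2) in binary:
                if (B2, C2) == (B, C) and (i, A, j) not in items:
                    items.add((i, A, j))
                    changed = True
    return items


# Hypothetical toy grammar: S -> A B, A -> a, B -> b.
toy_unary = {("A", "a"), ("B", "b")}
toy_binary = {("S", "A", "B")}
recognized = (0, "S", 2) in cyk_items(list("ab"), toy_unary, toy_binary)
```

The sentence is accepted exactly when the item `(0, S, n)` is in the returned set, which mirrors the condition that $[S \leadsto w]$ can be recognized.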
Several recognition and parsing algorithms deal with arbitrary
context-free grammars along the same line as CYK, involving some more
technicalities for handling productions of arbitrary length, including
$\varepsilon$-productions. For example, a bottom-up variant of Earley's
recognition algorithm [Ear, GHR] recognizes items of the form
$$[i, A \to \alpha \bullet \beta, j]$$
denoting the fact that $\alpha \Rightarrow^{*} a_{i+1} \ldots a_j$.
That is, the first part of a production has been recognized. If
$\beta = \varepsilon$, i.e. the item is of the form
$[i, A \to \alpha \bullet, j]$, the entire production has been
recognized; such an item denotes the existence of a tree
$\langle A \leadsto a_{i+1} \ldots a_j \rangle$. We call this algorithm
Bottom-Up Earley (BUE) in the sequel; the top-down filter of Earley's
algorithm has been deleted so as to allow parallel bottom-up, rather
than left-to-right processing of the string.
Still, BUE recognizes each individual nonterminal in left-to-right
manner, for which there is no a priori reason. De Vreught and Honig
[dVH] describe a similar, more general algorithm (which we abbreviate
VH), using double dotted items
$[i, A \to \alpha \bullet \beta \bullet \gamma, j]$ where
$\beta \Rightarrow^{*} a_{i+1} \ldots a_j$. In this case $\beta$
corresponds to a part of the string that has been recognized, whereas
$\alpha$ and $\gamma$ still need to be recognized.
Both BUE and VH can easily be extended 
to parsing algorithms, producing partial parse 
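trees of the form
$\langle A \to \langle \alpha \leadsto a_{i+1} \ldots a_j \rangle \beta \rangle$
and
$\langle A \to \alpha \langle \beta \leadsto a_{i+1} \ldots a_j \rangle \gamma \rangle$,
respectively.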
2.3 The Primordial Soup Algorithm 
VH is by no means the most general algorithm. As the ultimate
generalization we can allow any tree in $\mathcal{T}$. The top is a
nonterminal and the leaves can be any symbol in $V$; a tree may or may
not be part of a parse for $w$.
Initially we start with elementary trees that correspond to the
productions in our grammar. New trees can be added by merging (copies
of) existing trees which agree on their common parts. This can be seen
as some kind of unification process on parse trees. The string is
parsed when a tree $\tau = \langle S \leadsto a_1 \ldots a_n \rangle$
is produced; the algorithm terminates when no new trees can be added.
Metaphorically speaking, one can think of the initial set of trees as a
primordial soup in which small structures react with each other,
creating ever larger and more complicated structures. We therefore call
it the Primordial Soup Algorithm. Superficially, it may resemble the
unification space of Vosse and Kempen [VK], who think of molecules
floating in a test-tube and entering into chemical bonds with other
molecules. The paradigms are different however, as in the primordial
soup, unlike the test-tube, raw material abounds and multiple copies of
any structure can be created.
The most general version of the Primordial Soup Algorithm, which
allows trees to be combined by unification of arbitrary overlapping
parts, is a formalism in which a wide variety of parsing algorithms can
be specified with great ease. Before turning to that, we first
formalize a slightly limited, but somewhat easier version of the
Primordial Soup Algorithm.
The algorithm starts off with an initial set of recognized trees
$\mathcal{S}$ consisting of trees corresponding to the productions in
our grammar. New trees can be added to $\mathcal{S}$ by taking
combinations of existing trees. The simplest way to combine trees is
the following.
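Let $\sigma = \langle A \leadsto \alpha B \gamma \rangle \in \mathcal{S}$
and $\tau = \langle B \leadsto \beta \rangle \in \mathcal{S}$. We can
unify the leaf $B$ in $\sigma$ with the root $B$ in $\tau$, yielding a
new tree
$\langle A \leadsto \alpha \langle B \leadsto \beta \rangle \gamma \rangle$.
This tree is denoted by $\sigma \lhd \tau$. The (partial) function
$\lhd : \mathcal{F}_{bt} \times \mathcal{F}_{bt} \to \mathcal{F}_{bt}$
is called composition. Note that there can be multiple occurrences of
$B$ in $\mathit{yield}(\sigma)$, which means that $\sigma \lhd \tau$
need not be determined uniquely. Also, we will use the operator $\lhd$
in a liberal way, allowing more than one extension to be made at the
same time. Let
$\sigma = \langle A \leadsto \alpha_0 B_1 \alpha_1 B_2 \alpha_2 \rangle$
and $\tau_i = \langle B_i \leadsto \beta_i \rangle$. We write
$$\sigma \lhd \tau_1, \tau_2$$
for the tree
$\langle A \leadsto \alpha_0 \langle B_1 \leadsto \beta_1 \rangle \alpha_1 \langle B_2 \leadsto \beta_2 \rangle \alpha_2 \rangle$,
using $\lhd$ as a polyadic operator with one left-hand argument and an
arbitrary number of right-hand arguments.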
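To make the composition operator concrete, the following Python sketch shows one possible encoding. It is an illustration only; the representation is an assumption made here and does not appear in the paper: a leaf is a plain string, an internal node is a tuple `(label, child, ...)`, and the names `root`, `tree_yield` and `compose` are hypothetical. Because $\sigma \lhd \tau$ need not be unique, `compose` is written as a generator that yields one result per matching leaf.

```python
def root(tree):
    """Label of the root; a leaf is its own root."""
    return tree if isinstance(tree, str) else tree[0]


def tree_yield(tree):
    """Concatenation of the leaf labels, left to right, as a list of symbols."""
    if isinstance(tree, str):
        return [tree]
    return [sym for child in tree[1:] for sym in tree_yield(child)]


def compose(sigma, tau):
    """Yield every tree obtained by replacing ONE leaf of sigma that is
    labelled root(tau) by the tree tau (binary composition).

    Assumes tau is an internal node (a tuple), not a bare leaf.
    """
    if isinstance(sigma, str):
        return
    label, children = sigma[0], sigma[1:]
    for idx, child in enumerate(children):
        if isinstance(child, str):
            if child == root(tau):
                yield (label, *children[:idx], tau, *children[idx + 1:])
        else:
            # Descend: the matching leaf may sit deeper in sigma.
            for new_child in compose(child, tau):
                yield (label, *children[:idx], new_child, *children[idx + 1:])


# Production tree <S -> A B> composed with <B -> b>.
example = list(compose(("S", "A", "B"), ("B", "b")))
```

The polyadic form $\sigma \lhd \tau_1, \tau_2$ can be obtained by applying `compose` repeatedly; this sketch keeps only the binary case to stay short.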
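As initial contents of the primordial soup, we take the trees
$\langle A \to \alpha \rangle$ corresponding to productions
$A \to \alpha \in P$. Such a tree $\langle A \to \alpha \rangle$ is
called a production tree or a production for short. We define an
operator $\mathcal{A} : 2^{\mathcal{T}} \to 2^{\mathcal{T}}$ that
yields all new trees that can be composed from the contents of the soup
by
$$\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau_1, \ldots, \tau_k \in \mathcal{T} \mid \{\sigma, \tau_1, \ldots, \tau_k\} \subset \mathcal{S}\}.$$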
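This definition of $\mathcal{A}$ has one shortcoming, however. Rather
than all parses for all sentences we only want the parses for one
particular sentence $w \in \Sigma^*$. In general, this problem is
tackled by redefining $\mathcal{A}$ as
$$\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau_1, \ldots, \tau_k \in \mathcal{T} \mid \{\sigma, \tau_1, \ldots, \tau_k\} \subset \mathcal{S} \wedge \mathit{allowed}(\sigma \lhd \tau_1, \ldots, \tau_k)\}$$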
in which a predicate allowed specifies which trees are allowed to be
added. Which trees can be discarded right away, and which ones should
be added to the soup? As we are only interested in trees that can be
extended to parses for some specific sentence $w$, the terminal part of
the yield should be extendable to $w$. That is, $w$ can be produced
from $\mathit{yield}(\tau)$ by replacing every nonterminal in $\tau$
with some string of terminals. Formally, for terminal strings
$s \in \Sigma^*$ we define
$$\mathit{extends}(s, t) \stackrel{\mathrm{def}}{=} \exists u, v \in \Sigma^* \, (t = usv),$$
i.e. $s$ is a substring of $t$. For strings in $V^*$ containing at
least one nonterminal, we define extends recursively as
$$\mathit{extends}(\alpha A \beta, t) \stackrel{\mathrm{def}}{=} \exists s \in \Sigma^* \, (\mathit{extends}(\alpha s \beta, t)).$$
Finally we define
$$\mathit{allowed}(\tau) \stackrel{\mathrm{def}}{=} \mathit{extends}(\mathit{yield}(\tau), w),$$
in accordance with the informal definition given above. Note, however,
that we still may create an infinite number of useless trees, simply by
not adding terminals to the yield! If $\mathit{yield}(\tau) \in N^*$
then $\mathit{allowed}(\tau)$ holds: each leaf can be extended to
$\varepsilon$, and the empty string is indeed a substring of $w$. In
3.2 we will see how this problem can be tackled in general; here we
will only regard a subclass of $\mathcal{T}$ that does not contain
trees with arbitrarily large nonterminal yields.
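The recursive definition of extends can be checked without enumerating terminal strings. The Python sketch below is one way to do so and is an assumption of this illustration, not part of the paper: since every nonterminal may be replaced by any (possibly empty) terminal string, a yield extends to $w$ exactly when its maximal runs of terminals occur in $w$ in the same order without overlapping, and a leftmost-first search suffices. The names `find` and `extends` and the list-of-symbols encoding are hypothetical.

```python
def find(run, sentence, start):
    """Index of the first occurrence of the list run in sentence at or
    after position start, or None if there is none."""
    for i in range(start, len(sentence) - len(run) + 1):
        if sentence[i:i + len(run)] == run:
            return i
    return None


def extends(yield_syms, sentence, nonterminals):
    """True if replacing each nonterminal in yield_syms by some (possibly
    empty) terminal string can give a contiguous substring of sentence.

    yield_syms and sentence are lists of symbols; nonterminals is a set.
    """
    # Split the yield into maximal runs of terminals; nonterminals are gaps.
    runs, current = [], []
    for sym in yield_syms:
        if sym in nonterminals:
            runs.append(current)
            current = []
        else:
            current.append(sym)
    runs.append(current)

    pos = 0
    for run in runs:  # place each run as far left as possible
        start = find(run, sentence, pos)
        if start is None:
            return False
        pos = start + len(run)
    return True


sentence = ["the", "bird", "flies"]
categories = {"S", "NP", "VP", "det", "noun", "verb"}
```

With these definitions, `extends(["NP", "VP"], sentence, categories)` holds even though no word has been matched, which is exactly the problem with purely nonterminal yields noted above.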
This finally allows us to define the Primordial 
Soup Algorithm. 
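    Program primordial_soup
    declare
        $\mathcal{S}$: set of $\mathcal{T}$
    begin
        $\mathcal{S} := \{\tau \in \mathcal{T} \mid \mathit{production}(\tau)\}$;
        while $(\mathcal{A}(\mathcal{S}) - \mathcal{S}) \neq \emptyset$ do $\mathcal{S} := \mathcal{S} \cup \mathcal{A}(\mathcal{S})$
    end {primordial_soup}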
2.4 Specifying parse strategies 
More specific and more useful instances of the al- 
gorithm can be defined by imposing restrictions 
on the trees to be added. A strategy is a char- 
acterization of trees that are to be added to the 
primordial soup S under some additional con- 
straints. Different constraints specify different 
strategies. We call it strategy, rather than al- 
gorithm, as no control structure is specified ex- 
plicitly. For the sake of simplicity we assume 
that $\mathcal{A}(\mathcal{S})$ is added all at once, but it should
be understood that, if so desired, only a subset
of $\mathcal{A}(\mathcal{S})$ need be added at each step. A strategy
can be refined into a (parallel or sequential)
algorithm by adding control structure and data 
structures so as to keep track of intermediate re- 
sults in an efficient manner. For examples of the 
design of parsing algorithms from such strategies, 
see [JPSZ].
Parsing strategies can be characterized by 
two types of restrictions: on the types of trees 
allowed in the soup and on the operators that 
create new trees from existing ones. Both kinds 
of restrictions are interchangeable most of the 
time; if trees are allowed to combine only in some 
specific way, the set of generated trees will be 
restricted, and vice versa. 
As a simple example, we will specify a strategy for the CYK parser. To
that end, we define an additional predicate
$$\mathit{complete}(\tau) \stackrel{\mathrm{def}}{=} \mathit{yield}(\tau) \in \Sigma^*,$$
i.e., a tree is complete if its yield does not contain any nonterminal.
Such a tree can only be used as a right-hand side argument of a
composition. Recalling that the CYK algorithm is defined only for
grammars in Chomsky Normal Form (i.e., productions are of the type
$A \to BC$ and $A \to a$), we can define the CYK strategy by
$$\mathcal{A}_{CYK}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau_1, \tau_2 \mid \mathit{complete}(\sigma \lhd \tau_1, \tau_2) \wedge \mathit{allowed}(\sigma \lhd \tau_1, \tau_2)\}.$$
Apart from the initial production trees, $\mathcal{S}$ will only
contain trees of the form
$\langle A \leadsto a_{i+1} \ldots a_j \rangle$. The complete predicate
specifies that newly created trees have a terminal yield; this must be
a substring of $w$ due to the allowed predicate. It is trivial to
verify that all such trees are added to $\mathcal{S}$ in due course.
Hence the specification of CYK is sound and complete.
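The CYK strategy, together with the saturation loop of the program in 2.3, can be sketched on trees rather than items. The Python code below is a minimal illustration under assumptions made here: trees are nested tuples `(label, child, ...)` with strings as leaves, the grammar is passed as hypothetical sets `unary` and `binary`, and for a complete tree the allowed predicate reduces to a plain substring test. It is not the parallel implementation derived in [JPSZ].

```python
from itertools import product


def tree_yield(tree):
    """Leaf labels of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [sym for child in tree[1:] for sym in tree_yield(child)]


def is_substring(seq, sentence):
    """True if the list seq occurs contiguously in the list sentence."""
    return any(sentence[i:i + len(seq)] == seq
               for i in range(len(sentence) - len(seq) + 1))


def a_cyk(soup, sentence, binary, terminals):
    """One application of the CYK operator: compose a production tree
    <A -> B C> with two trees rooted B and C, keeping only results that
    are complete (terminal yield) and allowed (yield is a substring)."""
    new = set()
    for sigma in soup:
        if sigma not in binary:  # only production trees A -> B C
            continue
        A, B, C = sigma
        for t1, t2 in product(soup, repeat=2):
            if t1[0] == B and t2[0] == C:
                rho = (A, t1, t2)
                y = tree_yield(rho)
                if all(s in terminals for s in y) and is_substring(y, sentence):
                    new.add(rho)
    return new


def primordial_soup_cyk(sentence, unary, binary):
    """Saturate the soup: S := S u A(S) until nothing new appears."""
    terminals = {a for (_, a) in unary}
    soup = set(unary) | set(binary)  # initial soup: the production trees
    while True:
        new = a_cyk(soup, sentence, binary, terminals) - soup
        if not new:
            return soup
        soup |= new


# Hypothetical ambiguous grammar S -> S S | a, sentence "a a a".
words = ["a", "a", "a"]
soup = primordial_soup_cyk(words, {("S", "a")}, {("S", "S", "S")})
parses = [t for t in soup if t[0] == "S" and tree_yield(t) == words]
```

For this toy grammar the soup ends up containing both bracketings of the three-word sentence, while trees with a four-word yield are rejected by the allowed test, which is what makes the loop terminate.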
3 Other parse strategies 
We redefine the Primordial Soup Algorithm from 
section 2 in a more general manner, and show 
its power and elegance by specifying the parsing 
strategies of Bottom-Up Earley, De Vreught & 
Honig and some variants of CYK. 
3.1 Unification and superposition 
In section 2 we used only the composition operator $\lhd$ to create new
trees from existing ones. Composition can be seen as a specific case of
superposition, in which arbitrary overlapping parts of trees can be
unified.
We will first define unification, which is a special case of
superposition in which the roots of two trees are mapped onto each
other. For the definition of unification, we use the derivation
operator $\Rightarrow$ for trees. If
$\tau = \langle A \leadsto \alpha B \gamma \rangle$ and
$\sigma = \langle A \leadsto \alpha \langle B \to \beta \rangle \gamma \rangle$,
we write $\tau \Rightarrow \sigma$. A tree $\sigma$ is called an
extension of $\tau$ if $\tau \Rightarrow^{*} \sigma$, where
$\Rightarrow^{*}$ means applying the derivation $\Rightarrow$ zero or
more times. Now two trees $\tau$ and $\sigma$ unify if a tree $\rho$
exists that is an extension of both $\sigma$ and $\tau$. I.e.,
$$\mathit{unify}(\sigma, \tau) \stackrel{\mathrm{def}}{=} \exists \rho \in \mathcal{T} \, (\tau \Rightarrow^{*} \rho \wedge \sigma \Rightarrow^{*} \rho).$$
$\rho$ is called an upper bound of $\tau$ and $\sigma$. Furthermore, if
$\sigma$ and $\tau$ unify, there is a unique least upper bound, denoted
by $\tau \sqcup \sigma$, satisfying
$$\text{if } \tau \Rightarrow^{*} \rho \text{ and } \sigma \Rightarrow^{*} \rho \text{ then } \tau \sqcup \sigma \Rightarrow^{*} \rho.$$
$\tau \sqcup \sigma$ is called the unification of $\tau$ and $\sigma$.
Note that the roots of $\tau$ and $\sigma$ coincide in
$\tau \sqcup \sigma$. Unification can be generalized to superposition
by allowing the root of one tree to be unified with an arbitrary node
of the other tree, under the constraint that the overlapping parts of
both trees are identical; see Figure 1. This superposition operator is
denoted by $\sim$. Note that, in general, superposition is not uniquely
determined. Hence it is defined as a function
$\sim : \mathcal{F}_{bt} \times \mathcal{F}_{bt} \to 2^{\mathcal{F}_{bt}}$,
whereas unification is defined as a partial function
$\sqcup : \mathcal{F}_{bt} \times \mathcal{F}_{bt} \to \mathcal{F}_{bt}$.
For a more formal definition, see [JPSZ].
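Unification and superposition as described above can be illustrated with a short Python sketch. The formal definitions are in [JPSZ]; what follows is only an informal reading of the text, with assumptions made here: trees are nested tuples `(label, child, ...)` with strings as unexpanded leaves, failure is signalled by `None`, and `superpose` only tries to unify the root of `tau` with the nodes of `sigma` (one direction of the operator). The function names are hypothetical.

```python
def unify(sigma, tau):
    """Least upper bound of two trees whose roots coincide, or None.

    An unexpanded leaf X unifies with any subtree rooted X; two expanded
    nodes unify only if they agree on label, arity and all children.
    """
    if isinstance(sigma, str) and isinstance(tau, str):
        return sigma if sigma == tau else None
    if isinstance(sigma, str):
        return tau if tau[0] == sigma else None
    if isinstance(tau, str):
        return sigma if sigma[0] == tau else None
    if sigma[0] != tau[0] or len(sigma) != len(tau):
        return None
    children = [unify(s, t) for s, t in zip(sigma[1:], tau[1:])]
    if any(child is None for child in children):
        return None
    return (sigma[0], *children)


def superpose(sigma, tau):
    """Yield every tree obtained by unifying tau with some subtree of
    sigma and putting the result back in place of that subtree."""
    top = unify(sigma, tau)
    if top is not None:
        yield top
    if isinstance(sigma, str):
        return
    for idx in range(1, len(sigma)):
        for sub in superpose(sigma[idx], tau):
            yield sigma[:idx] + (sub,) + sigma[idx + 1:]


# Two partial trees for S -> A B that agree on their common part.
merged = unify(("S", ("A", "a"), "B"), ("S", "A", ("B", "b")))
```

Applying `superpose` to a production tree and a tree rooted at one of its leaves gives the same result as composition, in line with the remark that composition is a specific case of superposition.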
Figure 1: superposition of trees. If $\sigma \sqcup \tau' = \rho$ for a
subtree $\tau'$ of $\tau$, then $\tau'$ is replaced by $\rho$.
3.2 Some general restrictions 
As discussed in 2.3, we do not want to recognize all trees leading to
parses of arbitrary strings. We introduced the general idea that a tree
is allowed only if the terminal part of the yield extends to the
sentence. For the CYK algorithm, this simple criterion is fine. In
general, however, it is too restrictive, in the sense that some
familiar parsing algorithms cannot handle it. Suppose, for example,
that a tree $\langle A \leadsto aB \rangle$ is extended with a
production $\langle B \to bCd \rangle$ into
$\langle A \leadsto abCd \rangle$. In principle, this should only be
allowed if $ab$ and $d$ occur in $w$ in this order. A parser which uses
only local information, e.g. an LR(1) parser, cannot determine whether
a terminal $d$ occurs somewhere in the string, perhaps after a large
substring produced by $C$.
We will use a rather more subtle scheme to match the yield of a tree
against the sentence, so as to allow for refinement into arbitrary
parsing algorithms. Having a tree $\langle A \leadsto aB \rangle$ we
can check that $a$ occurs in $w$ and mark the leaf $a$ accordingly.
Marking a leaf is denoted by underlining the terminal symbol. The tree
$\langle A \leadsto \underline{a}B \rangle$ can be extended to
$\langle A \leadsto \underline{a} \langle B \to bCd \rangle \rangle = \langle A \leadsto \underline{a}bCd \rangle$
and then to $\langle A \leadsto \underline{ab}Cd \rangle$, irrespective
of whether $d$ occurs in the string at all.
The notion of marking terminals with occurrences in the string fits
quite well to parsing natural languages, rather than un-interpreted
context-free grammars. In practical NL parsing, the word categories
rather than the individual words are used as terminals, although they
are in fact pre-terminals. Using the word categories as terminals, a
marked terminal is a word category applied to a word from the sentence.
As an example, consider the sentence the bird flies. The initial soup
might contain:
    $\langle S \to NP \; VP \rangle$          $\langle det \to the \rangle$
    $\langle NP \to det \; noun \rangle$      $\langle noun \to bird \rangle$
    $\langle VP \to verb \rangle$             $\langle noun \to flies \rangle$
                                              $\langle verb \to flies \rangle$
Word categories need not be uniquely defined. In this case the word
flies fits into two categories. A tree
$\langle NP \leadsto the \; noun \rangle$ could be combined with
$\langle noun \to flies \rangle$, yielding a noun phrase the flies.
This tree is ruled out by extends, however, as the flies does not
extend to the bird flies.
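In summary, we distinguish two types of initial trees:
$$\mathit{initial}(\tau) \stackrel{\mathrm{def}}{=} \mathit{production}(\tau) \vee \mathit{marker}(\tau),$$
$$\mathit{production}(\langle A \to \alpha \rangle) \stackrel{\mathrm{def}}{=} A \to \alpha \in P,$$
$$\mathit{marker}(\langle a \to \underline{a} \rangle) \stackrel{\mathrm{def}}{=} a \in \Sigma, \; \underline{a} \text{ in the sentence}.$$
The extends predicate can be defined so as to apply to strings of
markings (i.e. words) rather than terminals. Furthermore, if we do not
want to construct arbitrarily large trees with a non-marked yield, we
can define
$$\mathit{allowed}(\tau) \stackrel{\mathrm{def}}{=} \mathit{extends}(\mathit{yield}(\tau), w) \wedge |\mathit{yield}(\tau)| \le |w|.$$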
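Finally, allowing arbitrary tree construction with superposition
($\sim$) rather than composition ($\lhd$), a general version of the
operator $\mathcal{A}$ is given by
$$\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\rho \in \sigma \sim \tau \mid \sigma \in \mathcal{S} \wedge \tau \in \mathcal{S} \wedge \mathit{allowed}(\rho)\}.$$
The algorithm is now given by
    Program primordial_soup
    declare
        $\mathcal{S}$: set of $\mathcal{T}$
    begin
        $\mathcal{S} := \{\tau \in \mathcal{T} \mid \mathit{initial}(\tau)\}$;
        while $(\mathcal{A}(\mathcal{S}) - \mathcal{S}) \neq \emptyset$ do $\mathcal{S} := \mathcal{S} \cup \mathcal{A}(\mathcal{S})$
    end {primordial_soup}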
For acyclic grammars (i.e., grammars that do not allow a derivation
$A \Rightarrow^{+} A$), only a finite number of trees can be
constructed, hence the algorithm is guaranteed to halt. When a grammar
is cyclic, an infinite number of parses exist. Every finite (subtree of
a) parse will be found within a finite number of steps.
From the point of view of efficiency, the above algorithm isn't
sensible at all. Its strength, however, derives from the fact that a
very large class of parallel parsing algorithms can be defined as
specializations, by constraining the general algorithm in various ways.
Some examples will be given shortly.
We have concentrated on context-free gram- 
mars for the sake of simplicity. It should be clear, 
though, that extension to various types of unifi- 
cation grammars is straightforward. 
3.3 Different breeds of trees 
As we have seen in the CYK example, complete trees are an important
class of trees. But, having introduced markers, it is obvious that we
consider a tree to be complete only if the entire yield has been
marked. Therefore we redefine
$$\mathit{complete}(\tau) \stackrel{\mathrm{def}}{=} \mathit{yield}(\tau) \in \underline{\Sigma}^*.$$
Note that all marker trees are complete, and that production trees are
complete iff they correspond to an $\varepsilon$-production.
Palm trees consist of a roof (corresponding to a single production) and
a trunk (consisting of a number of adjacent complete trees). They are
the result of composing production trees and complete trees. We can
define them as
$$\mathit{palm}(\tau) \stackrel{\mathrm{def}}{=} \tau = \langle A \to \alpha \langle \beta \leadsto v \rangle \gamma \rangle \wedge \alpha\gamma \neq \varepsilon \wedge \beta \neq \varepsilon.$$
By notational convention, $A \to \alpha\beta\gamma$ is a production and
$v \in \Sigma^*$. Note that in general $\beta$ is a sequence of symbols
$X_1 \ldots X_n$; each $X_i$ is the root of a complete tree
$\langle X_i \leadsto v_i \rangle$. Degenerate cases, with only a trunk
($\alpha\gamma = \varepsilon$) or only a roof
($\beta = \varepsilon = v$) are excluded explicitly.
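As a generalization of palm trees, we may consider trees with more than
one trunk. This type of tree is denoted by baobab (footnote 1):
$$\mathit{baobab}(\tau) \stackrel{\mathrm{def}}{=} \tau = \langle A \to \alpha_0 \langle \beta_1 \leadsto v_1 \rangle \alpha_1 \ldots \langle \beta_k \leadsto v_k \rangle \alpha_k \rangle \wedge \alpha_0 \ldots \alpha_k \neq \varepsilon \wedge \beta_1 \ldots \beta_k \neq \varepsilon.$$
For baobabs, like palms, we exclude degenerate cases. Note, however,
that any palm is also a baobab. Palms and baobabs are illustrated in
Figure 2.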
Figure 2: palm trees and baobabs (left: palm tree; right: baobab)
1 The baobab is an African tree that has branches from which roots
originate, supporting the roof. Such roots grow out to additional
trunks.
3.4 CYK revisited 
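The only addition to our previous specification of CYK is that it
should produce trees with marked yields. To that end, we can define an
initial step
$$\mathcal{A}_0(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau \in \mathcal{T} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{production}(\sigma) \wedge \mathit{marker}(\tau)\}.$$
For the remainder of the algorithm, production trees
$\sigma = \langle A \to BC \rangle$ are composed with two complete
trees $\tau_1$ and $\tau_2$ as usual, denoting ternary composition by
$\sigma \lhd \tau_1, \tau_2$.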
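To keep in line with other algorithms to follow, we could alternatively
define CYK with a binary composition operator. As a consequence, a new
tree is created in two steps. First a production tree is combined with
a complete tree, giving a palm. In the second step the palm is combined
with a second complete tree, giving a new complete tree. We define two
functions $\mathcal{A}$:
$$\mathcal{A}_1(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau \in \mathcal{T} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{production}(\sigma) \wedge \mathit{complete}(\tau)\},$$
$$\mathcal{A}_2(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \lhd \tau \in \mathcal{T} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{palm}(\sigma) \wedge \mathit{complete}(\tau) \wedge \mathit{allowed}(\sigma \lhd \tau)\}.$$
But as intermediate palm trees do not occur in the CYK algorithm as
such, we define the function $\mathcal{A}'_{CYK}$ (for other than
initial steps) as
$$\mathcal{A}'_{CYK}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \mathcal{A}_2(\mathcal{A}_1(\mathcal{S}) \cup \mathcal{S}).$$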
A more liberal approach would be to allow the intermediate results to
be in the soup:
$$\mathcal{A}''_{CYK}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \mathcal{A}_1(\mathcal{S}) \cup \mathcal{A}_2(\mathcal{S}).$$
For grammars in Chomsky Normal Form this hardly seems sensible. But
when CYK is extended to arbitrary CFGs, a complete tree can be created
from a production tree through an intermediate series of palm trees. If
symbols in the right-hand side of a production can be recognized in
arbitrary order, the condition $\mathit{palm}(\sigma)$ in the
definition of $\mathcal{A}_2$ should be replaced by
$\mathit{baobab}(\sigma)$.
3.5 Bottom-Up Earley 
The BUE algorithm is defined for arbitrary context-free grammars. It is
usually described as a recognition algorithm. An item
$[i, A \to \alpha \bullet \beta, j]$ denotes the fact that
$\alpha \Rightarrow^{*} a_{i+1} \ldots a_j$ has been recognized. From
$[i, A \to \alpha \bullet B \gamma, j]$ and
$[j, B \to \beta \bullet, k]$ a new item
$[i, A \to \alpha B \bullet \gamma, k]$ can be derived. We will define
the algorithm on trees, rather than items. Trees of the form
$\langle A \to \langle \alpha \leadsto v \rangle \beta \rangle$ are
recognized for $v = a_{i+1} \ldots a_j$ a substring of $w$.
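The item-based description of BUE can be made concrete with a small recognizer. The Python sketch below is an illustration only. The combination of a waiting item with a completed item is the step stated above; the other two ingredients are assumptions of this sketch, following the usual presentation of bottom-up Earley recognition: an initial item $[i, A \to \bullet \alpha, i]$ for every production and position, and a scan step that moves the dot over a terminal matching the next word. The names `bue_items` and `recognizes` and the grammar encoding are hypothetical.

```python
def bue_items(words, productions):
    """Saturate the set of items (i, A, rhs, dot, j), meaning that
    rhs[:dot] derives words[i:j]. productions is a list of (A, rhs)
    pairs with rhs a tuple of symbols."""
    n = len(words)
    nonterminals = {A for A, _ in productions}
    # Assumed initialisation: [i, A -> . alpha, i] at every position i.
    items = {(i, A, rhs, 0, i) for A, rhs in productions for i in range(n + 1)}
    agenda = list(items)
    while agenda:
        i, A, rhs, dot, j = agenda.pop()
        new = []
        if dot < len(rhs):
            X = rhs[dot]
            if X not in nonterminals:
                # Assumed scan step: move the dot over a matching word.
                if j < n and words[j] == X:
                    new.append((i, A, rhs, dot + 1, j + 1))
            else:
                # [i, A -> a.Bg, j] + [j, B -> b., k] gives [i, A -> aB.g, k]
                for (j2, B, rhs2, dot2, k) in list(items):
                    if j2 == j and B == X and dot2 == len(rhs2):
                        new.append((i, A, rhs, dot + 1, k))
        else:
            # This item is completed: advance every item waiting for A at i.
            for (h, C, rhs2, dot2, i2) in list(items):
                if i2 == i and dot2 < len(rhs2) and rhs2[dot2] == A:
                    new.append((h, C, rhs2, dot2 + 1, j))
        for item in new:
            if item not in items:
                items.add(item)
                agenda.append(item)
    return items


def recognizes(words, productions, start):
    """True if some completed item for the start symbol spans the input."""
    n = len(words)
    return any(A == start and i == 0 and j == n and dot == len(rhs)
               for (i, A, rhs, dot, j) in bue_items(words, productions))


# Toy grammar for the example sentence used in 3.2.
grammar = [("S", ("NP", "VP")), ("NP", ("det", "noun")), ("VP", ("verb",)),
           ("det", ("the",)), ("noun", ("bird",)), ("noun", ("flies",)),
           ("verb", ("flies",))]
accepted = recognizes("the bird flies".split(), grammar, "S")
```

Each item is placed on the agenda once, and a pair of combinable items is always handled when the later of the two is processed, so the order in which items are taken from the agenda does not affect the result.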
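We define the set of Earley trees $\mathcal{E} \subset \mathcal{T}$ as
$$\mathcal{E} \stackrel{\mathrm{def}}{=} \{\langle A \to \langle \alpha \leadsto v \rangle \beta \rangle \in \mathcal{T} \mid A \to \alpha\beta \in P \wedge v \in \Sigma^*\}.$$
Note that productions ($\alpha = \varepsilon$) and complete trees
($\beta = \varepsilon$) are also included in $\mathcal{E}$. The
operation of the algorithm is described by
$$\mathcal{A}_{BUE}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \sim \tau \in \mathcal{E} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{allowed}(\sigma \sim \tau)\}.$$
From the definition of $\mathcal{E}$ it follows that
$\sigma \sim \tau \in \mathcal{E}$ iff $\mathit{complete}(\tau)$ and
the leftmost unmarked symbol of $\mathit{yield}(\sigma)$ is
$\mathit{root}(\tau)$. The soundness follows from the definitions and
completeness is trivially proven with induction on the size of the
tree, hence the algorithm is correct.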
3.6 De Vreught and Honig's algorithm 
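The VH algorithm also uses complete trees and palm trees, with the
difference that the trunk of a palm tree does not necessarily cover the
leftmost part of the roof. We define a set $\mathcal{V}$ of trees,
analogously to the set of Earley trees, by
$$\mathcal{V} \stackrel{\mathrm{def}}{=} \{\langle A \to \alpha \langle \beta \leadsto v \rangle \gamma \rangle \in \mathcal{T} \mid A \to \alpha\beta\gamma \in P \wedge v \in \Sigma^*\}.$$
The functions to combine trees are defined differently, however:
$$\mathcal{A}_1(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \sim \tau \in \mathcal{V} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{production}(\sigma) \wedge \mathit{complete}(\tau)\},$$
$$\mathcal{A}_2(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \sqcup \tau \in \mathcal{V} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{palm}(\sigma) \wedge \mathit{palm}(\tau) \wedge \mathit{allowed}(\sigma \sqcup \tau)\},$$
$$\mathcal{A}_{VH}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \mathcal{A}_1(\mathcal{S}) \cup \mathcal{A}_2(\mathcal{S}).$$
The first operation was originally called inclusion, the second
concatenation. The former combines a nonterminal tree and a complete
tree to a palm tree, whereas the latter combines two palm trees into a
palm tree with a wider trunk, using unification. It cannot result in a
proper baobab because of the definition of $\mathcal{V}$. A subtle
difference to the original algorithm is that we allow trunks of
$\sigma$ and $\tau$ to overlap, which is prohibited in their approach.
It is not difficult to add this condition, if required.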
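A similar result is obtained by replacing the functions $\mathcal{A}_1$
and $\mathcal{A}_2$ by a function similar to the one used for Earley's
algorithm (but now for trees in $\mathcal{V}$ instead of in
$\mathcal{E}$). Thus a generalized bottom-up Earley parser, for which
left-to-right parsing of a constituent is not necessary, is defined by
$$\mathcal{A}(\mathcal{S}) \stackrel{\mathrm{def}}{=} \{\sigma \sim \tau \in \mathcal{V} \mid \sigma, \tau \in \mathcal{S} \wedge \mathit{complete}(\tau) \wedge \mathit{allowed}(\sigma \sim \tau)\}.$$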
4 Conclusions 
The Primordial Soup paradigm facilitates the specification of parsing
strategies, i.e., high-level specifications of parsing algorithms,
without explicit control flow and data structures.
A specification without control flow is a good basis for the design of
a parallel implementation, as it allows a further refinement of the
design before any decision on architecture is taken. For more details,
see [JPSZ], where this has been exemplified with a design for a
parallel CYK parser, using the Primordial Soup paradigm and the
formalism introduced in [JPZ].
The Primordial Soup framework can be used 
to design new parsing algorithms by mixing fea- 
tures of existing algorithms. For example, the 
Earley operator for tree composition in combina- 
tion with the De Vreught & Honig set of allowed 
trees yields a generalized Earley parser that has 
been rigorously defined in only two lines. 
The specification of parsing strategies is given 
in a formalism closely resembling predicate logic. 
This makes it almost trivial to derive prototype 
implementations in (parallel) logic programming 
languages like Prolog or Parlog [JPSZ].

References 

[Ear] J. Earley. An Efficient Context-Free Parsing Algorithm.
Comm. ACM 13 (1970) 90-102.

[GHR] S.L. Graham, M.A. Harrison, W.L. Ruzzo. An Improved Context-Free
Recognizer. Trans. on Prog. Lang. and Syst. 2 (1980) 415-462.

[JPSZ] W. Janssen, M. Poel, K. Sikkel, J. Zwiers. The Primordial Soup
Algorithm. Memoranda Informatica 91-77, University of Twente (1991).

[JPZ] W. Janssen, M. Poel, J. Zwiers. Action Systems and Action
Refinement in the Development of Parallel Systems. CONCUR '91,
Springer Lecture Notes in Computer Science 527 (1991) 298-316.

[VK] T. Vosse, G. Kempen. A Hybrid Model of Human Sentence Processing:
Parsing Right-Branching, Center-Embedded and Cross-Serial Dependencies.
Proc. 2nd Int. Workshop on Parsing Technologies, Cancun (1991) 73-78.

[dVH] J.P.M. de Vreught, H.J. Honig. A Tabular Bottom-Up Recognizer.
Report 89-78, Faculty of Technical Mathematics and Informatics,
Delft University of Technology (1989).

[You] D.H. Younger. Recognition of context-free languages in time
$n^3$. Information and Control 10 (1967) 189-208.
