CONTEXT-FREENESS OF THE LANGUAGE ACCEPTED
BY MARCUS' PARSER
R. Nozohoor-Farshi
School of Computing Science, Simon Fraser University
Burnaby, British Columbia, Canada V5A 1S6
ABSTRACT 
In this paper, we prove that the set of sentences parsed
by Marcus' parser constitutes a context-free language. The
proof is carried out by constructing a deterministic pushdown
automaton that recognizes those strings of terminals that are
parsed successfully by the Marcus parser.
1. Introduction
While Marcus [4] does not use phrase structure rules as
base grammar in his parser, he points out some correspondence
between the use of a base rule and the way packets are
activated to parse a construct. Charniak [2] has also assumed
some phrase structure base grammar in implementing a Marcus
style parser that handles ungrammatical situations. However,
neither has suggested a type for such a grammar or the
language accepted by the parser. Berwick [1] relates Marcus'
parser to LR(k,t) context-free grammars. Similarly, in [5] and
[6] we have related this parser to LRRL(k) grammars.
Inevitably, these raise the question of whether the string set
parsed by Marcus' parser is a context-free language.
In this paper, we provide the answer for the above
question by showing formally that the set of sentences accepted
by Marcus' parser constitutes a context-free language. Our
proof is based on simulating a simplified version of the parser
by a pushdown automaton. Then some modifications of the
PDA are suggested in order to ascertain that Marcus' parser,
regardless of the structures it puts on the input sentences,
accepts a context-free set of sentences. Furthermore, since the
resulting PDA is a deterministic one, it confirms the
determinism of the language parsed by this parser. Such a
proof also provides a justification for assuming a context-free
underlying grammar in automatic generation of Marcus type
parsers as discussed in [5] and [6].
2. Assumption of a finite size buffer 
Marcus' parser employs two data structures: a pushdown
stack which holds the constructs yet to be completed, and a
finite size buffer which holds the lookaheads. The lookaheads
are completed constructs as well as bare terminals. Various
operations are used to manipulate these data structures. An
"attention shift" operation moves a window of size k=3 to a
given position on the buffer. This occurs in parsing some
constructs, e.g., some NP's, in particular when a buffer node
other than the first indicates start of an NP. "Restore buffer"
restores the window to its previous position before the last
"attention shift". Marcus suggests that the movements of the
window can be achieved by employing a stack of displacements
from the beginning of the buffer, and in general he suggests
that the buffer could be unbounded on the right. But in
practice, he notes that he has not found a need for more than
five cells, and PARSIFAL does not use a stack to implement
the window or virtual buffer.
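The stack-of-displacements idea mentioned above can be sketched as follows. This is only an illustrative sketch, not PARSIFAL's implementation (which, as noted, does not use a stack): the class name `Buffer`, its methods, and the sample tokens are all hypothetical.

```python
class Buffer:
    """Finite buffer whose window is located by a stack of displacements."""

    def __init__(self, cells, k=3):
        self.cells = list(cells)   # completed constructs or bare terminals
        self.k = k                 # window size
        self.offsets = [0]         # stack of window start positions

    def window(self):
        """Return the cells currently visible through the window."""
        start = self.offsets[-1]
        return self.cells[start:start + self.k]

    def attention_shift(self, i):
        """Move the window so that its i-th cell (1-based) becomes the first."""
        self.offsets.append(self.offsets[-1] + i - 1)

    def restore_buffer(self):
        """Undo the most recent attention shift, if there was one."""
        if len(self.offsets) > 1:
            self.offsets.pop()


buf = Buffer(["have", "the", "boys", "taken", "it"])
```

Calling `buf.attention_shift(2)` exposes the cells starting at "the", and `buf.restore_buffer()` returns the window to the beginning of the buffer.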
A comment regarding an infinite buffer is in place here.
An unbounded buffer would yield a parser with two stacks.
Generally, such parsers characterize context-sensitive languages
and are equivalent to linear bounded automata. They have also
been used for parsing some context-free languages. In this role
they may hide the non-determinism of a context-free language
by storing an unbounded number of lookaheads. For example,
LR-regular [3], BCP(m,n), LR(k,∞) and FSPA(k) parsers [8] are
such parsers. Furthermore, basing parsing decisions on the
whole left contexts and k lookaheads in them has often resulted
in defining classes of context-free (context-sensitive) grammars
with undecidable membership. LR-regular, LR(k,∞) and
FSPA(k) are such classes. The class of GLRRL(k) grammars
with unbounded buffer (defined in [5]) seems to be the known
exception in this category that has decidable membership.
Walters [9] considers context-sensitive grammars with
deterministic two-stack parsers and shows the undecidability of
the membership problem for the class of such grammars.
In this paper we assume that the buffer in a Marcus
style parser can only be of a finite size b (e.g., b=5 in Marcus'
parser). The limitation on the size of the buffer has two
important consequences. First, it allows a proof for the
context-freeness of the language to be given in terms of a
PDA. Second, it facilitates the design of an effective algorithm
for automatic generation of a parser. (However, we should add
that: 1- some Marcus style parsers that use an unbounded
buffer in a constrained way, e.g., by restricting the window to
the k rightmost elements of the buffer, are equivalent to
pushdown automata. 2- Marcus style parsers with unbounded
buffer, similar to GLRRL parsers, can still be constructed for
those languages which are known to be context-free.)
3. Simplified parser 
A few restrictions on Marcus' parser will prove to be
convenient in outlining a proof for the context-freeness of the
language accepted by it.
(i) Prohibition of features:
Marcus allows syntactic nodes to have features containing the
grammatical properties of the constituents that they represent.
For implementation purposes, the type of a node is also
considered as a feature. However, here a distinction will be
made between this feature and others. We consider the type of
a node and the node itself to convey the same concept (i.e., a
non-terminal symbol). Any other feature is disallowed. In
Marcus' parser, the binding of traces is also implemented
through the use of features. A trace is a null deriving
non-terminal (e.g., an NP) that has a feature pointing to
another node, i.e., the binding of the trace. We should stress at
the outset that Marcus' parser outputs the annotated surface
structure of an utterance and traces are intended to be used by
the semantic component to recover the underlying
predicate/argument structure of the utterance. Therefore one
could put aside the issue of trace registers without affecting any
argument that deals with the strings accepted by the parser, i.e.,
frontiers of surface structures. We will reintroduce the features
in the generalized form of PDA for the completeness of the
simulation.
(ii) Non-accessibility of the parse tree:
Although most of the information about the left context is
captured through the use of the packeting mechanism in
Marcus' parser, he nevertheless allows limited access to the
nodes of the partial parse tree (besides the current active node)
in the action parts of the grammar rules. In some rules, after
the initial pattern matches, conditional clauses test for some
property of the parse tree. These tests are limited to the left
daughters of the current active node and the last cyclic node
(NP or S) on the stack and its descendants. It is plausible to
eliminate tree accessibility entirely through adding new packets
and/or simple flags. In the simplified parser, access to the
partial parse tree is disallowed. However, by modifying the
stack symbols of the PDA we will later show that the proof of
context-freeness carries over to the general parser (that tests
limited nodes of parse tree).
(iii) Atomic actions: 
Action segments in Marcus' grammar rules may contain a series
of basic operations. To simplify the simulation, we assume that
in the simplified parser actions are atomic. Breakdown of a
compound action into atomic actions can be achieved by
keeping the first operation in the original rule and introducing
new singleton packets containing a default pattern and a
remaining operation in the action part. These packets will
successively deactivate themselves and activate the next packet
much like "run <rule> next"s in PIDGIN. The last packet will
activate the first if the original rule leaves the packet still
active. Therefore in the simplified parser action segments are of
the following forms:
(1) Activate packets1; [deactivate packets2].
(2) Deactivate packets1; [activate packets2].
(3) Attach ith; [deactivate packets1]; [activate packets2].
(4) [Deactivate packets1]; create node; activate packets2.
(5) [Deactivate packets1]; cattach node; activate packets2. *
(6) Drop; [deactivate packets1]; [activate packets2].
(7) Drop into buffer; [deactivate packets1];
[activate packets2].
(8) Attention shift (to ith cell); [deactivate packets1];
[activate packets2].
(9) Restore buffer; [deactivate packets1]; [activate packets2].
Note that "forward attention shift" has no explicit command in
Marcus' rules. An "AS" prefix in the name of a rule implies
the operation. Backward window move has an explicit command
"restore buffer". The square brackets in the above forms
indicate optional parts. Feature assignment operations are
ignored for the obvious reason.
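The breakdown of a compound action into atomic ones can be sketched as below. This is one possible reading of the description above, not Marcus' or PIDGIN's own mechanism: the function `split_compound`, the dictionary layout, the auxiliary packet names, and the assumption that the original packet deactivates itself while the auxiliary packets run are all hypothetical.

```python
def split_compound(packet, pattern, operations, stays_active=True):
    """Split a rule with several basic operations into atomic-action rules.

    The first rule keeps the original pattern and the first operation; each
    later operation goes into a new singleton packet with a default pattern.
    Every packet deactivates itself and activates the next one, and the last
    packet re-activates the original packet if it is meant to stay active.
    """
    names = [packet] + [f"{packet}-aux{i}" for i in range(1, len(operations))]
    rules = []
    for i, op in enumerate(operations):
        last = i == len(operations) - 1
        action = [op]
        if len(operations) > 1:
            action.append(f"deactivate {names[i]}")
            if not last:
                action.append(f"activate {names[i + 1]}")
            elif stays_active:
                action.append(f"activate {packet}")
        rules.append({"packet": names[i],
                      "pattern": pattern if i == 0 else "default",
                      "action": action})
    return rules
```

For example, `split_compound("parse-np", "[=det]", ["create NP", "attach 1st"])` yields two rules, the second living in the hypothetical packet `parse-np-aux1`.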
4. Simulation of the simplified parser 
In this section we construct a PDA equivalent to the
simplified parser. This PDA recognizes the same string set that
is accepted by the parser. Roughly, the states of the PDA are
symbolized by the contents of the parser's buffer, and its stack
symbols are ordered pairs consisting of a non-terminal symbol
(i.e., a stack symbol of the parser) and a set of packets
associated with that symbol.
Let $N$ be the set of non-terminal symbols, and $\Sigma$ be
the set of terminal symbols of the parser. We assume the top
S node, i.e., the root of a parse tree, is denoted by $S_0$, a
distinct element of $N$. We also assume that a final packet is
added to the PIDGIN grammar. When the parsing of a
sentence is completed, the activation of this packet will cause
the root node $S_0$ to be dropped into the buffer, rather than
being left on the stack. Furthermore, let $\mathbf{P}$ denote the set of
all packets of rules, and $2^{\mathbf{P}}$ the powerset of $\mathbf{P}$, and let
$P, P_1, P_2, \ldots$ be elements of $2^{\mathbf{P}}$. When a set of packets $P$ is active,
the pattern segments of the rules in these packets are compared
with the current active node and contents of the virtual buffer
(the window). Then the action segment of a rule with highest 
priority that matches is executed. In effect the operation of the 
parser can be characterized by a partial function $M$ from active
packets, current active node and contents of the window into
atomic actions, i.e.,
$M: 2^{\mathbf{P}} \times N^{(1)} \times V^{(k)} \to \text{ACTIONS}$
where $V = N \cup \Sigma$, $V^{(k)} = V^0 + V^1 + \ldots + V^k$ and ACTIONS is the
set of atomic actions (1) - (9) discussed in the previous section.
* "Cattach" is used as a short notation for "create and attach".
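The partial function M can be pictured as a table lookup over the rules of the active packets. The sketch below is illustrative only: the rule records, the `priority` field (assumed here to mean "smaller number wins"), and the sample packet and action names are hypothetical, and real PIDGIN patterns are richer than these type lists.

```python
def matches(pattern, node, window):
    """A pattern is (node_type, [cell_types]); None matches anything."""
    want_node, want_cells = pattern
    if want_node is not None and want_node != node:
        return False
    if len(want_cells) > len(window):
        return False
    return all(w is None or w == c for w, c in zip(want_cells, window))


def M(rules, active_packets, node, window):
    """Return the action of the best matching rule, or None if undefined."""
    candidates = [r for r in rules
                  if r["packet"] in active_packets
                  and matches(r["pattern"], node, window)]
    if not candidates:
        return None  # M is a partial function: no value for this argument
    return min(candidates, key=lambda r: r["priority"])["action"]


# Hypothetical rules, for illustration only.
RULES = [
    {"packet": "parse-np", "priority": 10,
     "pattern": ("NP", ["det"]), "action": "attach 1st"},
    {"packet": "parse-np", "priority": 5,
     "pattern": ("NP", ["det", "noun"]), "action": "attach 1st; activate parse-noun"},
]
```

Because the packets, node types and window contents all range over finite sets, such a table is finite, which is what the PDA construction relies on.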
Now we can construct the equivalent PDA
$A = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, f)$ in the following way.
$\Sigma$ = the set of input symbols of $A$, is the set of terminal
symbols in the simplified parser.
$\Gamma$ = the set of stack symbols $[X,P]$, where $X \in N$ is a
non-terminal symbol of the parser and $P$ is a set of packets.
$Q$ = the set of states of the PDA, each of the form
$\langle P_1, P_2, \text{buffer}\rangle$, where $P_1$ and $P_2$ are sets of packets. In general
$P_1$ and $P_2$ are empty sets except for those states that represent
dropping of a current active node in the parser. $P_1$ is the set
of packets to be activated explicitly after the drop operation,
and $P_2$ is the set of those packets that are deactivated. "buffer"
is a string in $(|^{(1)}V)^{(m)}\,|\,V^{(k)}$, where $0 \le m \le b-k$. The last
vertical bar in "buffer" denotes the position of the current
window in the parser and those on the left indicate former
window positions.
$q_0$ = the initial state = $\langle\emptyset,\emptyset,\lambda\rangle$, where $\lambda$ denotes the null
string.
$f$ = the final state = $\langle\emptyset,\emptyset,S_0\rangle$. This state corresponds to the
outcome of an activation of the final packet in the parser. In
this way, i.e., by dropping the $S_0$ node into the buffer, we can
show the acceptance of a sentence simultaneously by empty
stack and by final state.
$Z_0$ = the start symbol = $[S_0,P_0]$, where $P_0$ is the set of initial
packets, e.g., {SS-Start, C-Pool} in Marcus' parser.
$\delta$ = the move function of the PDA, defined in the following
way:
Let $P$ denote a set of active packets, $X$ an active node
and $W_1W_2\ldots W_n$, $n \le k$, the content of a window. Let
$\alpha|W_1W_2\ldots W_n\beta$ be a string (representing the buffer) such that
$\alpha \in (|^{(1)}V)^{(b-k)}$ and $\beta \in V^*$, where Length$(\alpha'W_1W_2\ldots W_n\beta) \le b$,
and $\alpha'$ is the string $\alpha$ in which vertical bars are erased.
Non-$\lambda$-moves: The non-$\lambda$-moves of the PDA $A$ correspond to
bringing the input tokens into the buffer for examination by
the parser. In Marcus' parser input tokens come to the
attention of parser as they are needed. Therefore, we can
assume that when a rule tests the contents of n cells of the
window and there are fewer tokens in the buffer, terminal
symbols will be brought into the buffer. More specifically, if
$M(P,X,W_1\ldots W_n)$ has a defined value (i.e., $P$ contains a packet
with a rule that has pattern segment $[X][W_1]\ldots[W_n]$), then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_j\rangle, W_{j+1}, [X,P]) =$
$(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_jW_{j+1}\rangle, [X,P])$ for all $\alpha$, and for $j = 0, \ldots, n-1$
and $W_{j+1} \in \Sigma$.
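As a rough illustration of a non-λ-move, the sketch below reads one terminal into the window when a rule needs more cells than are present. The tuple encoding of a state as (P1, P2, alpha, window), the function name `read_token`, and the parameter `needed` (the n of the matching rule) are assumptions made for this example; the stack is left untouched, as in the definition above.

```python
EMPTY = frozenset()


def read_token(state, token, needed, terminals):
    """Move <∅,∅,α|W1..Wj> to <∅,∅,α|W1..Wj W_{j+1}> on input W_{j+1}.

    Returns the new state, or None where the move is undefined.
    """
    p1, p2, alpha, window = state
    if p1 or p2:                      # only defined for states with empty P1, P2
        return None
    if token not in terminals:        # input symbols must be terminals
        return None
    if len(window) >= needed:         # the rule already sees enough cells
        return None
    return (EMPTY, EMPTY, alpha, window + (token,))
```

Starting from the state `(EMPTY, EMPTY, (), ("the",))` with `needed=2`, reading "boy" extends the window to `("the", "boy")`, after which no further input move applies for that rule.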
$\lambda$-moves: By $\lambda$-moves, the PDA mimics the actions of the
parser on successful matches. Thus the $\delta$-function on $\lambda$ input
corresponding to each individual atomic action is determined
according to one of the following cases.
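Cases (1) and (2):
If $M(P,X,W_1W_2\ldots W_n)$ = "activate $P_1$; deactivate $P_2$" (or
"deactivate $P_2$; activate $P_1$"), then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1W_2\ldots W_n\beta\rangle, \lambda, [X,P]) =$
$(\langle\emptyset,\emptyset,\alpha|W_1W_2\ldots W_n\beta\rangle, [X,(P \cup P_1) - P_2])$ for all $\alpha$ and $\beta$.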
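Case (3):
If $M(P,X,W_1\ldots W_i\ldots W_n)$ = "attach ith (normally i is 1);
deactivate $P_1$; activate $P_2$", then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_i\ldots W_n\beta\rangle, \lambda, [X,P]) =$
$(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_{i-1}W_{i+1}\ldots W_n\beta\rangle, [X,(P \cup P_2) - P_1])$ for all
$\alpha$ and $\beta$.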
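Cases (4) and (5):
If $M(P,X,W_1\ldots W_n)$ = "deactivate $P_1$; create/cattach $Y$; activate
$P_2$" then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [X,P]) =$
$(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, [X,P - P_1][Y,P_2])$ for all $\alpha$ and $\beta$.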
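Case (6):
If $M(P,X,W_1\ldots W_n)$ = "drop; deactivate $P_1$; activate $P_2$", then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [X,P]) = (\langle P_2,P_1,\alpha|W_1\ldots W_n\beta\rangle, \lambda)$ for all
$\alpha$ and $\beta$, and furthermore
$\delta(\langle P_2,P_1,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [Y,P']) =$
$(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, [Y,(P' \cup P_2) - P_1])$ for all $\alpha$ and $\beta$, and
$P' \in 2^{\mathbf{P}}$, $Y \in N$.
The latter move corresponds to the deactivation of the packets
$P_1$ and activation of the packets $P_2$ that follow the dropping of
a current active node.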
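Case (7):
If $M(P,X,W_1\ldots W_n)$ = "drop into buffer; deactivate $P_1$; activate
$P_2$", (where $n < k$), then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [X,P]) = (\langle P_2,P_1,\alpha|XW_1\ldots W_n\beta\rangle, \lambda)$ for
all $\alpha$ and $\beta$, and furthermore
$\delta(\langle P_2,P_1,\alpha|XW_1\ldots W_n\beta\rangle, \lambda, [Y,P']) =$
$(\langle\emptyset,\emptyset,\alpha|XW_1\ldots W_n\beta\rangle, [Y,(P' \cup P_2) - P_1])$ for all $\alpha$ and $\beta$, and
for all $P' \in 2^{\mathbf{P}}$ and $Y \in N$.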
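Case (8):
If $M(P,X,W_1\ldots W_i\ldots W_n)$ = "shift attention to ith cell; deactivate
$P_1$; activate $P_2$", then
$\delta(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_i\ldots W_n\beta\rangle, \lambda, [X,P]) =$
$(\langle\emptyset,\emptyset,\alpha|W_1\ldots W_{i-1}|W_i\ldots W_n\beta\rangle, [X,(P \cup P_2) - P_1])$ for all $\alpha$ and $\beta$.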
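Case (9):
If $M(P,X,W_1\ldots W_n)$ = "restore buffer; deactivate $P_1$; activate $P_2$",
then
$\delta(\langle\emptyset,\emptyset,\alpha_1|\alpha_2|W_1\ldots W_n\beta\rangle, \lambda, [X,P]) =$
$(\langle\emptyset,\emptyset,\alpha_1|\alpha_2W_1\ldots W_n\beta\rangle, [X,(P \cup P_2) - P_1])$ for all $\alpha_1$, $\alpha_2$ and $\beta$
such that $\alpha_2$ contains no vertical bar.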
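A few of the λ-moves above can be sketched in code to show how the packet sets and window bars are manipulated. This is a minimal sketch under assumed encodings, not the paper's formal construction: a state is a tuple (P1, P2, buffer), the stack is a list of (node, packets) pairs, the buffer is a list of symbols with "|" markers, and all function names are hypothetical.

```python
EMPTY = frozenset()
BAR = "|"


def update(packets, activate, deactivate):
    """Return (packets ∪ activate) − deactivate."""
    return (packets | frozenset(activate)) - frozenset(deactivate)


def activate_move(state, stack, activate, deactivate):
    """Cases (1)/(2): only the top symbol [X, P] changes."""
    node, packets = stack[-1]
    return state, stack[:-1] + [(node, update(packets, activate, deactivate))]


def drop_move(state, stack, deactivate, activate):
    """Case (6), first move: pop [X, P] and park the packet sets in the state."""
    _, _, buffer = state
    return (frozenset(activate), frozenset(deactivate), buffer), stack[:-1]


def resume_move(state, stack):
    """Case (6), second move: apply the parked sets to the new top [Y, P']."""
    to_activate, to_deactivate, buffer = state
    node, packets = stack[-1]
    new_top = (node, update(packets, to_activate, to_deactivate))
    return (EMPTY, EMPTY, buffer), stack[:-1] + [new_top]


def _last_bar(buffer):
    """Index of the rightmost bar, i.e. the current window position."""
    return len(buffer) - 1 - buffer[::-1].index(BAR)


def attention_shift(buffer, i):
    """Case (8): put a new bar before the i-th cell (1-based) of the window."""
    pos = _last_bar(buffer) + i
    return buffer[:pos] + [BAR] + buffer[pos:]


def restore_buffer(buffer):
    """Case (9): erase the last bar, re-exposing the previous window."""
    last = _last_bar(buffer)
    if BAR not in buffer[:last]:
        raise ValueError("no earlier window position to restore")
    return buffer[:last] + buffer[last + 1:]
```

The two-step treatment of "drop" is needed because a single PDA move can inspect only the top stack symbol: the first move pops the dropped node and the second updates the packets of the node that becomes active.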
Now from the construction of the PDA, it is obvious
that $A$ accepts those strings of terminals that are parsed
successfully by the simplified parser. The reader may note that
the value of $\delta$ is undefined for the cases in which
$M(P,X,W_1\ldots W_n)$ has multiple values. This accounts for the fact
that Marcus' parser behaves in a deterministic way.
Furthermore, many of the states of $A$ are unreachable. This is
due to the way we constructed the PDA, in which we
considered activation of every subset of $\mathbf{P}$ with any active node
and any lookahead window.
5. Simulation of the general parser 
It is possible to lift the restrictions on the simplified
parser by modifying the PDA. Here, we describe how Marcus'
parser can be simulated by a generalized form of the PDA.
(i) Non-atomic actions:
The behaviour of the parser with non-atomic actions can be
described in terms of $M' \in M^*$, a sequence of compositions of
$M$, which in turn can be specified by a sequence $\delta'$ in $\delta^*$.
(ii) Accessibility of descendants of current active node, and
current cyclic node:
What parts of the partial parse tree are accessible in Marcus'
parser seems to be a moot point. Marcus [4] states
"the parser can modify or directly examine exactly two
nodes in the active node stack... the current active node
and S or NP node closest to the bottom of stack...
called the dominating cyclic node... or... current cyclic
node... The parser is also free to examine the
descendants of these two nodes..., although the parser
cannot modify them. It does this by specifying the
exact path to the descendant it wishes to examine."
The problem is whether, by descendants of these
two nodes, one means the immediate daughters, or descendants
at arbitrary levels. It seems plausible that accessibility of
immediate descendants is sufficient. To explore this idea, we
need to examine the reason behind partial tree accesses in
Marcus' parser. It could be argued that tree accessibility serves
two purposes:
(1) Examining what daughters are attached to the current active
node considerably reduces the number of packet rules one
needs to write.
(2) Examining the current cyclic node and its daughters serves
the purpose of binding traces. Since transformations are applied
in each transformational cycle to a single cyclic node, it seems
unnecessary to examine descendants of a cyclic node at
arbitrarily lower levels.
If Marcus' parser indeed accesses only the immediate
daughters (a brief examination of the sample grammar [4] does
not seem to contradict this), then the accessible part of a
parse tree can be represented by a pair of nodes and their
daughters. Moreover, the set of such pairs of height-one trees
is finite in a grammar. Furthermore, if we extend the access
to the descendants of these two nodes down to a finite fixed
depth (which, in fact, seems to have supporting evidence from
X-bar theory and C-command), we will still be able to represent
the accessible parts of parse trees with a finite set of finite
sequences of fixed height trees.
A second interpretation of Marcus' statement is that
descendants of the current cyclic node and current active node
at arbitrarily lower levels are accessible to the parser. However,
in the presence of non-cyclic recursive constructs, the notion of
giving an exact path to a descendant of the current active or
current cyclic node would be inconceivable; in fact one can
argue that in such a situation parsing cannot be achieved
through a finite number of rule packets. The reader is
reminded here that PIDGIN (unlike most programming
languages) does not have iterative or recursive constructs to test
the conditions that are needed under the latter interpretation.
Thus, a meaningful assumption in the second case is to
consider every recursive node to be cyclic, and to limit
accessibility to the subtree dominated by the current cyclic node
in which branches are pruned at the lower cyclic nodes. In
general, we may also include cyclic nodes at fixed recursion
depths, but again branches of a cyclic node beyond that must
be pruned. In this manner, we end up with a finite number of
finite sequences (hereafter called forests) of finite trees
representing the accessible segments of partial parse trees.
Our conclusion is that at each stage of parsing the
accessible segment of a parse tree, regardless of how we
interpret Marcus' statement, can be represented by a forest of
trees that belong to a finite set $T_{N,h}$. $T_{N,h}$ denotes the set of
all trees with non-terminal roots and of a maximum height h.
In the general case, this information is in the form of a forest,
rather than a pair of trees, because we also need to account for
the unattached subtrees that reside in the buffer and may
become an accessible part of an active node in the future.
Obviously, these subtrees will be pruned to a maximum height
h-1. Hence, the operation of the parser can be characterized by
the partial function $M$ from active packets, subtrees rooted at
current active and cyclic nodes, and contents of the window into
compound actions, i.e.,
$M: 2^{\mathbf{P}} \times (T_{N,h} \cup \{\lambda\}) \times (T_{C,h} \cup \{\lambda\}) \times (T_{N,h-1} \cup \Sigma)^{(k)} \to \text{ACTIONS}$
where $T_{C,h}$ is the subset of $T_{N,h}$ consisting of the trees with
cyclic roots.
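The finiteness of the set of bounded-height trees can be made concrete with a rough upper bound. The sketch below counts ordered labelled trees of height at most h, under an assumption that is not stated in the text: every node has at most d daughters. With n taken as the size of the whole vocabulary (terminals and non-terminals), the count bounds the size of $T_{N,h}$ from above; the function name and parameters are hypothetical.

```python
def count_trees(n_labels, max_daughters, height):
    """Number of ordered labelled trees of height <= `height`.

    Each node carries one of `n_labels` labels and has between 0 and
    `max_daughters` daughters. Uses the recurrence
        T(0) = n,   T(h) = n * sum_{j=0..d} T(h-1)**j.
    """
    count = n_labels                      # height 0: a single labelled node
    for _ in range(height):
        count = n_labels * sum(count ** j for j in range(max_daughters + 1))
    return count
```

The bound grows very quickly with h and d, which is consistent with the later remark that the resulting PDA has a huge number of symbols, but it is finite for any fixed choice of the parameters.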
In the PDA simulating the general parser, the set of
stack symbols $\Gamma$ would be the set of triples $[T_Y,T_X,P]$, where
$T_Y$ and $T_X$ are the subtrees rooted at current cyclic node $Y$
and current active node $X$, and $P$ is the set of packets
associated with $X$. The states of this PDA will be of the form
$\langle X, P_1, P_2, \text{buffer}\rangle$. The last three elements are the same as
before, except that the buffer may now contain subtrees
belonging to $T_{N,h-1}$. (Note that in the simple case, when h=1,
$T_{N,h-1} = N$). The first entry is usually $\lambda$, except that when the
current active node $X$ is dropped, this element is changed to
$T'_X$. The subtree $T'_X$ is the tree dominated by $X$, i.e., $T_X$,
pruned to the height h-1.
Definition of the move function for this PDA is very 
similar to the simplified case. For example, under the 
assumption that the pair of height-one trees rooted at current
cyclic node and current active node is accessible to the parser,
the definition of $\delta$ function would include the following
statement among others:
If $M(P,T_X,T_Y,W_1\ldots W_n)$ = "drop; deactivate $P_1$; activate $P_2$"
(where $T_X$ and $T_Y$ represent the height-one trees rooted at the
current active and cyclic nodes $X$ and $Y$), then
$\delta(\langle\lambda,\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [T_Y,T_X,P]) =$
$(\langle X,P_2,P_1,\alpha|W_1\ldots W_n\beta\rangle, \lambda)$ for all $\alpha$ and $\beta$. Furthermore,
$\delta(\langle X,P_2,P_1,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [T_Y,T_Z,P']) =$
$(\langle\lambda,\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, [T_Y,T_Z,(P' \cup P_2) - P_1])$ for all $(T_Z,P')$ in
$T_{N,1} \times 2^{\mathbf{P}}$ such that $T_Z$ has $X$ as its rightmost leaf.
In the more general case (i.e., when h > 1), as we noted in
the above, the first entry in the representation of the state will
be $T'_X$, rather than its root node $X$. In that case, we will
replace the rightmost leaf node of $T_Z$, i.e., the non-terminal $X$,
with the subtree $T'_X$. This mechanism of using the first entry
in the representation of a state allows us to relate attachments.
Also, in the simple case (h=1) the mechanism could be used to
convey feature information to the higher level when the current
active node is dropped. More specifically, there would be a
bundle of features associated with each symbol. When the node
$X$ is dropped, its associated features would be copied to the $X$
symbol appearing in the state of the PDA (via first $\delta$-move).
The second $\delta$-move allows us to copy the features from the $X$
symbol in the state to the $X$ node dominated by the node $Z$.
(iii) Accommodation of features:
The features used in Marcus' parser are syntactic in nature and
have finite domains. Therefore the set of attributed symbols in
that parser constitute a finite set. Hence syntactic features can
be accommodated in the construction of the PDA by allowing
complex non-terminal symbols, i.e., attributed symbols instead of
simple ones.
Feature assignments can be simulated by replacing the
top stack symbol in the PDA. For example, under our previous
assumption that two height-one trees rooted at current active
node and current cyclic node are accessible to the parser, the
definition of $\delta$ function will include the following statement:
If $M(P,T_X{:}A,T_Y{:}B,W_1\ldots W_n)$ = "assign features $A'$ to current
active node; assign features $B'$ to current cyclic node; deactivate
$P_1$; activate $P_2$" (where $A$, $A'$, $B$ and $B'$ are sets of features),
then
$\delta(\langle\lambda,\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, \lambda, [T_Y{:}B, T_X{:}A, P]) =$
$(\langle\lambda,\emptyset,\emptyset,\alpha|W_1\ldots W_n\beta\rangle, [T_Y{:}B \cup B', T_X{:}A \cup A', (P \cup P_2) - P_1])$ for
all $\alpha$ and $\beta$.
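The feature-assignment move amounts to rewriting the top stack symbol. A minimal sketch follows, assuming a stack symbol is encoded as a triple ((cyclic_tree, B), (active_tree, A), P) with feature sets and packet sets as frozensets; the function name, the encoding, and the sample feature and packet names are hypothetical.

```python
def assign_features(stack, new_active, new_cyclic, deactivate, activate):
    """Replace [T_Y:B, T_X:A, P] by [T_Y:B∪B', T_X:A∪A', (P∪P2)−P1].

    new_active is A', new_cyclic is B', deactivate is P1, activate is P2.
    The state of the PDA is unchanged by this move, so only the stack
    is passed in and returned.
    """
    (cyclic, b), (active, a), packets = stack[-1]
    top = ((cyclic, b | frozenset(new_cyclic)),
           (active, a | frozenset(new_active)),
           (packets | frozenset(activate)) - frozenset(deactivate))
    return stack[:-1] + [top]
```

Since each feature ranges over a finite domain, the set of such attributed triples remains finite, so the stack alphabet of the PDA stays finite.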
Now, by lifting all three restrictions introduced on the
simplified parser, it is possible to conclude that Marcus' parser
can be simulated by a pushdown automaton, and thus accepts a
context-free set of strings. Moreover, as one of the reviewers
has suggested to us, we could make our result more general if
we incorporate a finite number of semantic tests (via a finite
oracle set) into the parser. We could still simulate the parser
by a PDA.
Furthermore, the pushdown automaton which we have
constructed here is a deterministic one. Thus, it confirms the
determinism of the language which is parsed by Marcus'
mechanism. We should also point out that our notion of a
context-free language being deterministic differs from the
deterministic behaviour of the parser as described by Marcus.
However, since every deterministic language can be parsed by a
deterministic parser, our result adds more evidence to believe
that Marcus' parser does not hide non-determinism in any
form.
It is easy to obtain (through a standard procedure) an
LR(1) grammar describing the language accepted by the
generalized PDA. Although this grammar will be equivalent to
Marcus' PIDGIN grammar (minus any semantic considerations),
and it will be a right cover for any underlying surface grammar
which may be assumed in constructing the Marcus parser, it
will suffer from being an unnatural description of the language.
Not only may the resulting structures be hardly usable by any
reasonable semantic/pragmatic component, but also parsing
would be inefficient because of the huge number of
non-terminals and productions.
In automatic generation of Marcus-style parsers, one can
assume either a context-free or a context-sensitive grammar (as
a base grammar) which one feels is naturally suitable for
describing surface structures. However, if one chooses a
context-sensitive grammar then one needs to make sure that it
only generates a context-free language (which is unsolvable in
general). In [5] and [6], we have proposed a context-free base
grammar which is augmented with syntactic features (e.g.,
person, tense, etc.) much like attributed grammars in compiler
writing systems. An additional advantage with this scheme is
that semantic features can also be added to the nodes without
an extra effort. In this way one is also able to capture the
context-sensitivity of a language.
6. Conclusions 
We have shown that the information examined or
modified during Marcus parsing (i.e., segments of partial parse
trees, contents of the buffer and active packets) for a PIDGIN
grammar is a finite set. By encoding this information in the
stack symbols and the states of a deterministic pushdown
automaton, we have shown that the resulting PDA is equivalent
to the Marcus parser. In this way we have proved that the set
of surface sentences accepted by this parser is a context-free
set.
An important factor in this simulation has been the 
assumption that the buffer in a Marcus style parser is bounded. 
It is unlikely that all parsers with unbounded buffers written in 
this style can be simulated by deterministic pushdown automata.
Parsers with unbounded buffers (i.e., two-stack parsers) are used
either for recognition of context-sensitive languages, or if they
parse context-free languages, possibly to hide the
non-determinism of a language by storing an unlimited number
of lookaheads in the buffer. However, this does not mean that
some Marcus-type parsers that use an unbounded buffer in a
constrained way are not equivalent to pushdown automata.
Shipman and Marcus [7] consider a model of Marcus' parser in
which the active node stack and buffer are combined to give a
single data structure that holds both complete and incomplete
subtrees. The original stack nodes and their lookaheads
alternately reside on this structure. Letting an unlimited number
of completed constructs and bare terminals reside on the new
structure is equivalent to having an unbounded buffer in the
original model. Given the restriction that attachments and drops
are always limited to the k+1 rightmost nodes of this data
structure, it is possible to show that a parser in this model with
an unbounded buffer still can be simulated with an ordinary
pushdown automaton. (The equivalent condition in the original
model is to restrict the window to the k rightmost elements of
the buffer. However, simulation of the single structure parser is
much more straightforward.)
ACKNOWLEDGEMENTS
The author is indebted to Dr. Len Schubert for posing
the question and carefully reviewing an early draft of this
paper, and to the referees for their helpful comments. The
research reported here was supported by the Natural Sciences
and Engineering Research Council of Canada operating grants
A8818 and 69203 at the universities of Alberta and Simon
Fraser.
REFERENCES
[1] R.C. Berwick. The Acquisition of Syntactic Knowledge. MIT
Press, 1985.
[2] E. Charniak. A parser with something for everyone.
Parsing natural language, ed. M. King, pp. 117-149. Academic
Press, London, 1983.
[3] K. Culik II and R. Cohen. LR-regular grammars: an
extension of LR(k) grammars. Journal of Computer and System
Sciences, vol. 7, pp. 66-96, 1973.
[4] M.P. Marcus. A Theory of Syntactic Recognition for
Natural Language. MIT Press, Cambridge, MA, 1980.
[5] R. Nozohoor-Farshi. LRRL(k) grammars: a left to right
parsing technique with reduced lookaheads. Ph.D. thesis, Dept.
of Computing Science, University of Alberta, 1986.
[6] R. Nozohoor-Farshi. On formalizations of Marcus' parser.
COLING-86, 1986.
[7] D.W. Shipman and M.P. Marcus. Towards minimal data
structures for deterministic parsing. IJCAI-79, 1979.
[8] T.G. Szymanski and J.H. Williams. Non-canonical
extensions of bottom-up parsing techniques. SIAM Journal of
Computing, vol. 5, no. 2, pp. 231-250, June 1976.
[9] D.A. Walters. Deterministic context-sensitive languages.
Information and Control, vol. 17, pp. 14-61, 1970.
