DETERMINISTIC LEFT TO RIGHT PARSING OF 
TREE ADJOINING LANGUAGES* 
Yves Schabes 
Dept. of Computer & Information Science 
University of Pennsylvania 
Philadelphia, PA 19104-6389, USA 
schabes@linc.cis.upenn.edu 
K. Vijay-Shanker 
Dept. of Computer & Information Science 
University of Delaware 
Newark, DE 19716, USA 
vijay@udel.edu 
Abstract 
We define a set of deterministic bottom-up left to right 
parsers which analyze a subset of Tree Adjoining Lan- 
guages. The LR parsing strategy for Context Free 
Grammars is extended to Tree Adjoining Grammars 
(TAGs). We use a machine, called Bottom-up Embedded
Push Down Automaton (BEPDA), that recognizes
in a bottom-up fashion the set of Tree Adjoining Languages
(and exactly this set). Each parser consists of a
finite state control that drives the moves of a Bottom-up 
Embedded Pushdown Automaton. The parsers handle 
deterministically some context-sensitive Tree Adjoining 
Languages. 
In this paper, we informally describe the BEPDA; then,
given a parsing table, we explain the LR parsing algorithm.
We then show how to construct an LR(0) parsing
table (no lookahead). An example of a context-sensitive 
language recognized deterministically is given. Then, 
we explain informally the construction of SLR(1) pars- 
ing tables for BEPDA. We conclude with a discussion 
of our parsing method and current work. 
1 Introduction 
LR(k) parsers for Context Free Grammars (Knuth, 1965) 
consist of a finite state control (constructed given a CFG) 
that drives deterministically with k lookahead symbols 
a push down stack, while scanning the input from left 
to right. It has been shown that they recognize exactly 
the set of languages recognized by deterministic push 
down automata. LR(k) parsers for CFGs have
proven useful for compilers and, more recently, for natural
language processing. For natural language processing,
although LR(k) parsers are not powerful enough,
*The first author is partially supported by DARPA grant N0014-85-
K0018, ARO grant DAAL03-89-C-0031PRI and NSF grant IRI84-10413
A02. We are extremely grateful to Bernard Lang and David Weir for
their valuable suggestions.
conflicts between multiple choices are solved by pseudo-parallelism
(Lang, 1974; Tomita, 1987). This gives rise
to a class of powerful yet efficient parsers for natural 
languages. It is in this context that we study determin- 
istic (LR(k)-style) parsing of TAGs. 
The set of Tree Adjoining Languages is a strict su- 
perset of the set of Context Free Languages (CFLs). 
For example, the cross serial dependency construction
in Dutch can be generated by a TAG. 1 Walters (1970),
Révész (1971), Turnbull and Lee (1979) investigated
deterministic parsing of the class of context-sensitive 
languages. However they used Turing machines which 
recognize languages much more powerful than Tree Ad- 
joining Languages. So far no deterministic bottom-up 
parser has been proposed for any member of the class 
of the so-called "mildly context sensitive" formalisms 
(Joshi, 1985) in which Tree Adjoining Grammars fall. 2 
Since the set of Tree Adjoining Languages (TALs) is a 
strict superset of the set of Context Free Languages, in 
order to define LR-type parsers for TAGs, we need to 
use a more powerful configuration than a finite state automaton
driving a push down stack. We investigate the
design of deterministic left to right bottom up parsers for 
TAGs in which a finite state control drives the moves 
of a Bottom-up Embedded Push Down Stack. The class 
of corresponding non-deterministic automata recognizes 
exactly the set of TALs. 
We focus our attention on showing how a bottom- 
up embedded pushdown automaton is deterministically 
driven given a parsing table. To illustrate the building 
of a parsing table, we consider the simplest case, i.e. 
building of LR(0) items and the corresponding LR(0) 
1 The parsers that we develop in this paper can parse these constructions
deterministically (see Figure 5).
2 Tree Adjoining Grammars, Modified Head Grammars, Linear Indexed
Grammars and Categorial Grammars (all of which generate
the same subclass of context-sensitive languages) fall in the class of
the so-called "mildly context sensitive" formalisms. The Embedded
Push Down Automaton recognizes exactly this set of languages
(Vijay-Shanker, 1987).
parsing table for a given TAG. An example for a TAG 
generating a context-sensitive language is given in Fig- 
ure 5. Finally, we consider the construction of SLR(1) 
parsing tables. 
We assume that the reader is familiar with TAGs. We 
refer the reader to Joshi (1987) for an introduction to 
TAGs. We will assume that the trees can be combined 
by adjunction only. 
2 Automata Models of TAGs
Before we discuss the Bottom-up Embedded Push- 
down Automaton (BEPDA) which we use in our parser, 
we will introduce the Embedded Pushdown Automaton 
(EPDA). An EPDA is similar to a pushdown automaton 
(PDA) except that the storage of an EPDA is a sequence 
of pushdown stores. A move of an EPDA (see Figure 1) 
allows for the introduction of bounded pushdowns above 
and below the current top pushdown. Informally, this
move can be thought of as corresponding to the adjoining
operation in TAGs, with the pushdowns introduced
above and below the current pushdown reflecting
the tree structure to the left and right of the foot node of
the auxiliary tree being adjoined. The spine (path from root
to foot node) is left on the previous stack.
The generalization of a PDA to an EPDA whose stor- 
age is a sequence of pushdowns captures the generaliza- 
tion of the nature of the derived trees of a CFG to the 
nature of derived trees of a TAG. From Thatcher (1971), 
we can observe that the path set of a CFG (i.e. the set 
of all paths from root to leaves in trees derived by a 
CFG) is a regular set. On the other hand, the path set 
of a TAG is a CFL. This follows from the nature of the 
adjoining operation of TAGs, which suggests stacking 
along the path from root to a leaf. For example, as we
traverse down a path in a tree γ (in Figure 1), if adjunction,
say by β, occurs then the spine of β has to be
traversed before we can resume the path in γ.
[Figure: EPDA move for an adjunction of β, with pushdowns labelled "left of foot of β", "spine of β" and "right of foot of β"]
Figure 1: Embedded Pushdown Automaton
3 Bottom-up Embedded Pushdown Automaton 3
For any TAG G, an EPDA can be designed such that 
its moves correspond to a top-down parse of a string 
generated by G (the EPDA characterizes exactly the set of
Tree Adjoining Languages; Vijay-Shanker, 1987). If
we wish to design a bottom-up parser, say by adopting 
a shift reduce parsing strategy, we have to consider the 
nature of a reduce move of such a parser (i.e. using 
EPDA storage). This reduce move, for example applied 
after completely considering an auxiliary tree, must be 
allowed to 'remove' some bounded pushdowns above 
and below some (not necessarily bounded) pushdown. 
Thus (see Figure 2), the reduce move is like the dual of 
the wrapping move performed by an EPDA. 
Therefore, we introduce the Bottom-up Embedded Pushdown
Automaton (BEPDA), whose moves are the duals of
those of an EPDA. The two moves of a BEPDA are the unwrap
move depicted in Figure 2, which is an inverse of
the wrap move of an EPDA, and the introduction of
new pushdowns on top of the previous pushdown (push
move). In an EPDA, when the top pushdown is emptied,
the next pushdown automatically becomes the new
top pushdown. The inverse of this step is to allow for
the introduction of new pushdowns above the previous
top pushdown. These are the two moves allowed in a
BEPDA; the various steps in our parsers are sequences
of one or more such moves.
Due to space constraints, we do not show the equivalence
between BEPDA and EPDA apart from noting that
the moves of the two machines are duals of each other.
4 LR Parsing Algorithm 
An LR parser consists of an input, an output, a sequence 
of stacks, a driver program, and a parsing table that has 
three parts (ACTION, GOTOright and GOTOfoot). The
parsing program is the same for all LR parsers; only
the parsing tables change from one grammar to another.
The parsing program reads characters from the input one 
character at a time. The program uses the sequence of 
stacks to store states. 
The parsing table consists of three parts, a parsing
action function ACTION and two goto functions
GOTOright and GOTOfoot. The program driving the
LR parser first determines the state i currently on top
of the top stack and the current input token $a_r$. Then it
consults the ACTION table entry for state i and token
3 The need to use a bottom-up version of an EPDA in LR style parsing
of TAGs was suggested to us by Bernard Lang and David Weir.
Also their suggestions played an instrumental role in the definition of
BEPDA, for example restrictions on the moves allowed.
[Figure: a BEPDA with a read-only input tape and a stack of stacks. The UNWRAP move, the inverse of the EPDA move, involves a bounded number of stacks of bounded size above and below a stack with an unbounded number of stack elements, and a bounded number of stack elements. The PUSH move is shown separately.]
Figure 2: Bottom-up Embedded Pushdown Automaton
$a_r$. The entry in the action table can have one of the
following five values:
• Shift j (sj), where j is a state;
• Resume Right of δ at address dot (rsδ@dot),
where δ is an elementary tree and dot is the address
of a node in δ;
• Reduce Root of the auxiliary tree β in which the
last adjunction on the spine was performed at address
star (rdβ@star);
• Accept (acc);
• Error, no action applies, the parser rejects the input
string (errors are associated with empty table
entries).
The functions GOTOright and GOTOfoot take a state
i and an auxiliary tree β and produce a state j.
An example of a parsing table for a grammar generating
$L = \{a^n b^n e c^n d^n \mid n \geq 0\}$ is given in Figure 5.
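The three-part table described above can be pictured as a small data structure. The following is only a sketch: the class names (`Shift`, `ResumeRight`, `ReduceRoot`, `Accept`, `ParsingTable`) and the tree names `"alpha"` and `"beta"` are hypothetical, and the few entries shown are the ones that can be read off the example sequence of moves in Figure 5, not the full table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Shift:
    state: int                 # sj


@dataclass(frozen=True)
class ResumeRight:
    tree: str                  # elementary tree (hypothetical name)
    dot: str                   # address of the node in that tree


@dataclass(frozen=True)
class ReduceRoot:
    tree: str                  # auxiliary tree (hypothetical name)
    star: Optional[str]        # None when no star is set


@dataclass(frozen=True)
class Accept:
    pass


Action = Union[Shift, ResumeRight, ReduceRoot, Accept]


@dataclass
class ParsingTable:
    action: Dict[Tuple[int, str], Action]
    goto_right: Dict[Tuple[int, str], int]
    goto_foot: Dict[Tuple[int, str], int]

    def lookup(self, state: int, token: str) -> Optional[Action]:
        # An empty table entry stands for Error, so a missing key yields None.
        return self.action.get((state, token))


# A few entries inferred from the Figure 5 trace (assumption: partial table only).
TABLE = ParsingTable(
    action={
        (0, "a"): Shift(2),
        (4, "c"): ResumeRight("alpha", "0"),
        (8, "d"): ReduceRoot("beta", None),
        (5, "$"): Accept(),
    },
    goto_right={(11, "beta"): 12, (4, "beta"): 5},
    goto_foot={(9, "beta"): 10, (3, "beta"): 6},
)
```

Representing Error as a missing entry mirrors the convention that errors are associated with empty table entries.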
We denote an instantaneous description of the
BEPDA by a pair whose first component is the sequence
of pushdowns and whose second component is the unexpended
input:
$$(\|t_m \cdots t_1 \| \cdots \| s_1 \cdots s_w,\ a_r a_{r+1} \cdots a_n \$)$$
In the above sequence of pushdowns, the stacks are
piled up from left to right. $\|$ stands for the bottom of a
stack, $s_w$ is the top element of the top stack, $s_1$ is the
bottom element of the top stack, $t_1$ is the top element
of the bottom stack and $t_m$ is the bottom element of the
bottom stack.
The initial configuration of the parser is set to:
$$(\|0,\ a_1 \cdots a_n \$)$$
where 0 is the start state and $a_1 \cdots a_n \$$ is the input string
to be read with an end marker ($).
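As a small illustration of this notation, the sketch below prints a configuration from a list of stacks. The function name `render` and the list-of-lists representation (bottom stack first, bottom element first within each stack) are assumptions made for the example, and `||` is used as an ASCII stand-in for the bottom-of-stack symbol.

```python
from typing import Sequence


def render(stacks: Sequence[Sequence[int]], remaining: str) -> str:
    """Format (sequence of pushdowns, unread input) as an instantaneous description."""
    # Each stack is written as '||' followed by its elements, bottom element first.
    body = "".join("||" + " ".join(str(s) for s in stack) for stack in stacks)
    return f"({body}, {remaining})"


initial = render([[0]], "aabbeccdd$")
later = render([[0], [2], [4, 9, 10], [12]], "d$")
```

With this convention the rightmost element printed is always the top element of the top stack, which is the state the parser consults next.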
Suppose the parser reaches the configuration:
$$(\|t_m \cdots t_1 \| \cdots \| i_w \cdots i_1 i,\ a_r a_{r+1} \cdots a_n \$)$$
The next move of the parser is determined by reading
$a_r$, the current input token, and the state i on top of the
sequence of stacks, and then consulting the parsing table
entry for ACTION[i, $a_r$]. The parser keeps applying the
move associated with ACTION[i, $a_r$] until acceptance or
error occurs. The following moves are possible:
(i) ACTION[i, $a_r$] = shift state j (sj). The parser executes
a push move, entering the configuration:
$$(\|t_m \cdots t_1 \| \cdots \| i_w \cdots i_1 i \| j,\ a_{r+1} \cdots a_n \$)$$
(ii) ACTION[i, $a_r$] = resume right of δ at address dot
(rsδ@dot). The parser has reached the position to the right of and
below the node at address dot in δ, say η, on which
an auxiliary tree has been adjoined. The information
identifying the auxiliary tree is in the sequence of
stacks and must be recovered. There are two cases:
Case 1: η does not subsume a foot node. Let k
be the number of terminal symbols subsumed by η.
Before applying this move, the current configuration
looks like:
$$(\| \cdots \| i_k \| \cdots \| i_1 \| i,\ a_r \cdots a_n \$)$$
The top k stacks are merged into one stack
and the stack $\|m$ is pushed on top of it, where
m = GOTOfoot[$i_k$, β] for some auxiliary tree β that
can be adjoined in δ at η, and the parser enters the
configuration:
$$(\| \cdots \| i_k \| i_{k-1} \cdots i_1 i \| m,\ a_r \cdots a_n \$)$$
Case 2: η subsumes the foot node of δ. Let k (resp.
k') be the number of terminal symbols to the right
(resp. to the left) of the foot node subsumed by η.
Before applying this move, the configuration looks
like:
$$(\| \cdots \| n_{k'+1} \| \cdots \| n_1 \| s_1 \cdots s_l \| i_k \| \cdots \| i_1 \| i,\ a_r \cdots a_n \$)$$
The k' stacks below the (k + 2)th stack from the top
as well as the k + 1 top stacks are rewritten onto the
(k + 2)th stack and the stack $\|m$ is pushed on top of it,
where m = GOTOfoot[$n_{k'+1}$, β] for some auxiliary
tree β that can be adjoined in δ at η, and the parser
enters the configuration:
$$(\| \cdots \| n_{k'+1} \| s_1 \cdots s_l\, n_{k'} \cdots n_1\, i_k \cdots i_1 i \| m,\ a_r \cdots a_n \$)$$
(iii) ACTION[i, $a_r$] = reduce root of an auxiliary tree β
in which the last adjunction on the spine was performed
at address star (rdβ@star). The parser has
finished the recognition of the auxiliary tree β. It
must remove all information about β and continue
the recognition of the tree in which β was adjoined.
The parser executes an unwrap move. Let k (resp.
k') be the number of terminal symbols to the left
(resp. to the right) of the foot node of β. Let ξ be
the node at address star in β (ξ = nil if star is not
set). Let p be the number of terminal symbols to
the left of the foot node subsumed by ξ (p = 0 if
ξ = nil). First, p + k' + 1 symbols are popped from the top of the
sequence of stacks. Then k - p single element
stacks below the new top stack are unwrapped.
Let j be the new top element of the top stack and let
m = GOTOright[j, β]. Then j is popped and the single
element stack $\|m$ is pushed on top of the top stack.
By keeping track of the auxiliary trees being reduced, 
it is possible to output a parse instead of acceptance or 
an error. 
The parser recognizes the derived tree inside out: it 
extracts recursively the innermost auxiliary tree that has 
no adjunction performed in it. 
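The three kinds of moves can be sketched as operations on a list of stacks. This is a minimal sketch under stated assumptions, not the authors' implementation: the storage is a Python list of lists (bottom stack first, bottom element first), the function names are hypothetical, and the numbers k, k', p and the GOTO values used in the replay are read off the example sequence of moves of Figure 5 (k = k' = 1 for the node at address 2 of the auxiliary tree, k = k' = 2 for its root) rather than computed from a grammar.

```python
from typing import Callable, List

Stacks = List[List[int]]  # bottom stack first; each stack lists its bottom element first


def flatten(stacks: Stacks) -> List[int]:
    return [state for stack in stacks for state in stack]


def shift(st: Stacks, j: int) -> None:
    """Move (i): push the single element stack ||j."""
    st.append([j])


def resume_right_case1(st: Stacks, k: int, goto_foot: Callable[[int], int]) -> None:
    """Move (ii), Case 1: merge the top k stacks, then push ||m."""
    assert k >= 1, "this sketch assumes at least one subsumed terminal"
    n = len(st)
    i_k = st[n - k - 1][-1]              # top of the stack just below the merged ones
    st[n - k:] = [flatten(st[n - k:]), [goto_foot(i_k)]]


def resume_right_case2(st: Stacks, k: int, k_prime: int,
                       goto_foot: Callable[[int], int]) -> None:
    """Move (ii), Case 2: rewrite k' stacks below and k + 1 stacks above onto one stack."""
    n = len(st)
    pivot = n - (k + 2)                  # index of the (k + 2)th stack from the top
    low = pivot - k_prime                # index of the lowest of the k' stacks below it
    merged = st[pivot] + flatten(st[low:pivot]) + flatten(st[pivot + 1:])
    m = goto_foot(st[low - 1][-1])       # state n_{k'+1}
    st[low:] = [merged, [m]]


def pop_symbol(st: Stacks) -> int:
    """Pop one symbol from the top stack; an emptied stack disappears."""
    symbol = st[-1].pop()
    if not st[-1]:
        st.pop()
    return symbol


def reduce_root(st: Stacks, k: int, k_prime: int, p: int,
                goto_right: Callable[[int], int]) -> None:
    """Move (iii): pop p + k' + 1 symbols, unwrap k - p stacks, replace j by ||m."""
    for _ in range(p + k_prime + 1):
        pop_symbol(st)
    n = len(st)
    del st[n - 1 - (k - p): n - 1]       # the k - p stacks just below the new top stack
    j = pop_symbol(st)
    st.append([goto_right(j)])


def replay_figure5() -> List[Stacks]:
    """Replay the moves for aabbeccdd$; GOTO values are inferred from the trace."""
    goto_foot = {9: 10, 3: 6}.__getitem__
    goto_right = {11: 12, 4: 5}.__getitem__
    st: Stacks = [[0]]
    trace: List[Stacks] = []

    def snap() -> None:
        trace.append([list(s) for s in st])

    for j in (2, 2, 3, 9, 4):
        shift(st, j)
    resume_right_case1(st, 1, goto_foot)      # resume right of the initial tree at 0
    snap()
    shift(st, 11)
    resume_right_case2(st, 1, 1, goto_foot)   # resume right of the auxiliary tree at 2
    snap()
    shift(st, 7)
    shift(st, 8)
    reduce_root(st, 2, 2, 0, goto_right)      # reduce root, no star set
    snap()
    shift(st, 13)
    reduce_root(st, 2, 2, 1, goto_right)      # reduce root, star at address 2
    snap()
    return trace
```

Each snapshot returned by `replay_figure5()` can be compared with the corresponding configuration in Figure 5; the last one is the accepting configuration with the two single element stacks 0 and 5.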
5 LR(0) Parsing Tables 
This section explains how to construct an LR(0) parsing
table given a TAG. The construction is an extension
of the one used for CFGs. Similarly to Schabes and
Joshi (1988), we extend the notion of dotted rules to
trees. We define the closure operations that correspond
to adjunction. Then we explain how transitions between
states are defined. Figure 5 gives an example of
a finite state automaton used to build the parsing table
for a TAG generating a context-sensitive
language.
We first explain preliminary concepts (originally de- 
fined to construct an Earley-type parser for TAGs) that 
will be used by the algorithm. Dotted rules are extended 
to trees. Then we recall a tree traversal that the algo- 
rithm will mimic in order to scan the input from left to 
right. 
A dotted symbol is defined as a symbol associated
with a dot above or below and either to the left or to
the right of it. The four positions of the dot are annotated
by la, lb, ra, rb (resp. left above, left below, right
above, right below). In practice, only two dot
positions can be used (to the left and to the right of
a node). However, for the sake of simplicity, we will use
four different dot positions. A dotted tree is defined
as a tree with exactly one dotted symbol. Furthermore,
some nodes in the dotted tree can be marked with a star.
A star on a node expresses the fact that an adjunction
has been performed on the corresponding node. A dotted
tree is referred to as [α, dot, pos, stars], where α is a
tree, dot is the address of the dot, pos is the position of
the dot (la, lb, ra or rb) and stars is a list of nodes in
α annotated by a star.
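A dotted tree, as defined above, is a four-field record. The sketch below is one possible encoding: the class name `DottedTree`, the use of tuples of integers for node addresses (with the empty tuple for the root) and the tree name `"alpha"` are assumptions made for the example. Making the record immutable and hashable lets a state of the automaton be represented as a set of dotted trees.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Address = Tuple[int, ...]          # assumed encoding of a node address; () is the root
POSITIONS = ("la", "lb", "ra", "rb")


@dataclass(frozen=True)
class DottedTree:
    tree: str                      # name of the elementary tree (hypothetical)
    dot: Address                   # address of the dotted node
    pos: str                       # one of la, lb, ra, rb
    stars: FrozenSet[Address] = frozenset()   # nodes annotated by a star

    def __post_init__(self) -> None:
        if self.pos not in POSITIONS:
            raise ValueError(f"unknown dot position: {self.pos!r}")


# The dotted tree placed in state set 0 for an initial tree: dot left and above the root.
start = DottedTree("alpha", (), "la")
state0 = frozenset({start})
```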
Given a dotted tree with the dot above and to the left 
of the root, we define a tree traversal of a dotted tree (as
shown in Figure 3) that will enable us to scan the
frontier of an elementary tree from left to right while try- 
ing to recognize possible adjunctions between the above 
and below positions of the dot of interior nodes. 
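[Figure: a tree traversed from left to right, starting at the dot to the left and above the root; the nodes shown include E, F, G, H, I at addresses 2.1, 2.2, 2.3, 3.1, 3.2]
Figure 3: Left to Right Tree Traversal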
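The traversal can be sketched as a depth-first walk over dot positions. This is an illustration under an assumption: every node is taken to be visited at la and lb before its children and at rb and ra after them, which is one reading of the description above (the figure itself is not reproduced here). The names `Node` and `traverse` are hypothetical, and addresses are tuples with children numbered from 1.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Address = Tuple[int, ...]


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def traverse(node: Node, address: Address = ()) -> Iterator[Tuple[Address, str]]:
    """Yield (address, dot position) pairs in left to right order."""
    yield (address, "la")              # left above: before anything below the node
    yield (address, "lb")              # left below: adjunction on this node predicted
    for index, child in enumerate(node.children, start=1):
        yield from traverse(child, address + (index,))
    yield (address, "rb")              # right below: subtree below the node finished
    yield (address, "ra")              # right above: node completely recognized


tree = Node("S", [Node("a"), Node("b")])
positions = list(traverse(tree))
```

The leaves are met in frontier order, so a parser following this walk reads the input from left to right.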
A state in the finite state automaton is defined to be
a set of dotted trees closed under the following operations:
Adjunction Prediction, Left Completion, Move
Dot Down, Move Dot Up and Skip Node (see Figure 4). 4
Adjunction Prediction predicts all possible auxiliary
trees that can be adjoined at a given node. Left Completion
occurs when an auxiliary tree is recognized up
to its foot node. All trees in which that tree can be 
adjoined are pulled back with the node on which ad- 
junction has been performed added to the list of stars. 
Move Dot Down moves the dot down the links. Move 
Dot Up moves the dot up the links. Skip Node moves 
the dot up on the right hand side of a node on which no 
adjunction has been performed. 
All the states in the finite state automaton (FSA) must 
be closed under the closure operations. The FSA is 
4 These operations correspond to processors in the Earley-type
parser for TAGs.
[Figure: the closure operations Adjunction Prediction, Move Dot Up, Move Dot Down, Left Completion and Skip Node]
Figure 4: Closure Operations
built as follows. In state set 0, we put all initial trees
with a dot to the left and above the root. The state is
then closed. Then recursively we build new states with
the following transitions (we refer to Figure 5 for an
example of such a construction).
• A transition on a (where a is a terminal symbol)
from $S_i$ to $S_j$ occurs if and only if in $S_i$ there is a
dotted tree [δ, dot, la, stars] in which the dot is to
the left and above a terminal symbol a; $S_j$ consists
of the closure of the set of dotted trees of the form
[δ, dot, ra, stars].
• A transition on $\beta_{right}$ from $S_i$ to $S_j$ occurs iff in
$S_i$ there is a dotted tree [δ, dot, rb, stars] such that
the dot is to the right and below a node on which
β can be adjoined; $S_j$ consists of the closure of the
set of dotted trees of the form [δ, dot, ra, stars'].
If the dotted node of [δ, dot, rb, stars] is not on the
spine 5 of δ, stars' consists of all the nodes in stars
that strictly dominate the dotted node. When the
dotted node is on the spine, stars' consists of all
the nodes in stars that strictly dominate the dotted
node, if there are some; otherwise stars' = {dot}.
• A Skip foot of [β, dot, lb, stars] transition from
$S_i$ to $S_j$ occurs iff in $S_i$ there is a dotted tree
[β, dot, lb, stars] such that the dot is to the left
and below the foot node of the auxiliary tree β; $S_j$
consists of the closure of the set of dotted trees of
the form [β, dot, rb, stars].
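The rule for stars' in the second transition can be written down directly. The sketch below assumes that node addresses are encoded as tuples of integers, so that a node strictly dominates another exactly when its address is a proper prefix of the other's; the function names are hypothetical.

```python
from typing import FrozenSet, Tuple

Address = Tuple[int, ...]          # assumed address encoding; () is the root


def strictly_dominates(ancestor: Address, node: Address) -> bool:
    """True when `ancestor` is a proper ancestor of `node`."""
    return len(ancestor) < len(node) and node[:len(ancestor)] == ancestor


def stars_after_right(dot: Address, stars: FrozenSet[Address],
                      on_spine: bool) -> FrozenSet[Address]:
    """Compute stars' for the transition that moves the dot from rb to ra."""
    kept = frozenset(s for s in stars if strictly_dominates(s, dot))
    if on_spine and not kept:
        # No starred node above the dotted node: the dotted node itself is starred.
        return frozenset({dot})
    return kept


off_spine = stars_after_right((2, 1), frozenset({(2,), (3,)}), on_spine=False)
on_spine_empty = stars_after_right((2,), frozenset(), on_spine=True)
on_spine_kept = stars_after_right((2,), frozenset({()}), on_spine=True)
```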
The parsing table is constructed from the FSA built as 
above. In the following, we write trans(i, z) for the set of
states in the FSA reached from state i on the transition 
labeled by z. 
The actions for ACTION(i, a) are:
• Shift j (sj). It applies iff j ∈ trans(i, a).
• Resume Right of [δ, dot, rb, stars] (rsδ@dot).
It applies iff in state i there is a dotted tree
[δ, dot, rb, stars], where dot ∈ stars.
• Reduce Root of β (rdβ@star). It applies iff in
state i there is a dotted tree [β, 0, ra, {star}], where
β is an auxiliary tree. 6
• Accept occurs iff a is the end marker (a = $) and
there is a dotted tree [α, 0, ra, {star}], where α is
an initial tree and the dot is to the right and above
the root node.
• Error, if none of the above applies.
5 Nodes on the path from root node to foot node.
The GOTO table encodes the transitions in the
FSA on non-terminal symbols. It is indexed by
a state and by $\beta_{right}$ or $\beta_{foot}$, for all auxiliary
trees β: j ∈ GOTO(i, label) iff there is a transition
from i to j on the given label (label ∈
{$\beta_{right}$, $\beta_{foot}$ | β is an auxiliary tree}).
If more than one action is possible in an entry of the action
table, the grammar is not LR(0): there is a conflict
of actions, and the grammar cannot be parsed deterministically
without lookahead.
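The LR(0) condition just stated can be checked mechanically once the candidate actions of each state are known. The following is a sketch only: it assumes the shift transitions and the Resume Right / Reduce Root actions of each state have already been extracted from the automaton, it encodes actions as plain strings, and the function names and sample inputs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Cell = Tuple[int, str]             # (state, input token)


def build_action_table(shifts: Dict[Cell, int],
                       reductions: Dict[int, List[str]],
                       alphabet: Iterable[str]) -> Dict[Cell, Set[str]]:
    """Collect every applicable action per table entry."""
    tokens = list(alphabet)
    table: Dict[Cell, Set[str]] = defaultdict(set)
    for (state, token), target in shifts.items():
        table[(state, token)].add(f"s{target}")
    for state, names in reductions.items():
        for name in names:
            # Without lookahead, a reduction applies whatever the next token is.
            for token in tokens:
                table[(state, token)].add(name)
    return dict(table)


def conflicts(table: Dict[Cell, Set[str]]) -> Dict[Cell, Set[str]]:
    """Entries with more than one action; the grammar is LR(0) iff this is empty."""
    return {cell: acts for cell, acts in table.items() if len(acts) > 1}


ok = build_action_table({(0, "a"): 2}, {8: ["rd beta@-"]}, "abcde$")
clash = build_action_table({(1, "c"): 3}, {1: ["rs alpha@0"]}, "abcde$")
```

In `clash`, the hypothetical state 1 both shifts on c and resumes right, so the entry for (1, c) holds two actions and the table is not deterministic.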
An example of a finite state automaton used for the
construction of the LR(0) table for a TAG (trees α1, β1
in Figure 5) generating 7 $L = \{a^n b^n e c^n d^n \mid n \geq 0\}$, its
corresponding parsing table, and an example of a
sequence of moves are given in Figure 5.
6 0 is the address of the root node.
7 In the given TAG (trees α1 and β1), if we omit a and c, we obtain
a TAG that is similar to the one for the Dutch cross-serial construction.
This grammar can still be handled by an LR(0) parser.
In the trees α and β, na stands for null adjunction constraint (i.e.
no auxiliary tree can be adjoined on a node with a null adjunction
constraint).
[Figure panels: TAG for L = {a^n b^n e c^n d^n} (trees α1 and β1); Finite State Automaton for a BEPDA Recognizing L = {a^n b^n e c^n d^n}; Example of LR(0) Parsing Table, with PARSING ACTION columns a, b, c, d, e, $ and GOTO columns foot and right; Example of sequences of moves]

Parser configuration                          Next move
(‖0, aabbeccdd$)                              s2
(‖0‖2, abbeccdd$)                             s2
(‖0‖2‖2, bbeccdd$)                            s3
(‖0‖2‖2‖3, beccdd$)                           s9
(‖0‖2‖2‖3‖9, eccdd$)                          s4
(‖0‖2‖2‖3‖9‖4, ccdd$)                         rsα@0
(‖0‖2‖2‖3‖9‖4‖10, ccdd$)                      s11
(‖0‖2‖2‖3‖9‖4‖10‖11, cdd$)                    rsβ@2
(‖0‖2‖2‖3‖4 9 10 11‖6, cdd$)                  s7
(‖0‖2‖2‖3‖4 9 10 11‖6‖7, dd$)                 s8
(‖0‖2‖2‖3‖4 9 10 11‖6‖7‖8, d$)                rdβ@-
(‖0‖2‖4 9 10‖12, d$)                          s13
(‖0‖2‖4 9 10‖12‖13, $)                        rdβ@2
(‖0‖5, $)                                     acc

sj ≡ Shift j; rsδ@dot ≡ Resume Right of δ at dot; rdβ@star ≡ Reduce Root of β with star at address star; $ ≡ end of input.
Figure 5: Example of the construction of an LR(0) parser for a TAG recognizing L = {a^n b^n e c^n d^n}
6 SLR(1) Parsing Tables 
The tables that we have constructed are LR(0) tables.
The Resume Right and Reduce Root moves are performed
regardless of the next input token. The accuracy
of the parsing table can be improved by computing
lookaheads. FIRST and FOLLOW can be extended
to dotted trees. 8 FIRST of a dotted tree corresponds to
the set of leftmost symbols appearing below the subtree
dominated by the dotted node. FOLLOW of a dotted tree
defines the set of tokens that can appear in a derivation
immediately following the dotted node. Once FIRST
and FOLLOW are computed, the LR(0) parsing table can
be improved to an SLR(1) table: Resume Right and Reduce
Root are applicable only on the input tokens in the
follow set of the dotted tree.
For example, the SLR(1) table for the TAG built with
trees α1 and β1 is given in Figure 6.
[Table: PARSING ACTION columns a, b, c, d, e, $ and GOTO columns foot and right]
Figure 6: Example of SLR(1) Parsing Table
By associating dotted trees with lookaheads, one can 
also compute LR(k) items in the finite state automaton 
in order to build LR(k) parsing tables. 
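The SLR(1) refinement only changes where Resume Right and Reduce Root are entered in the table. The sketch below assumes the FOLLOW sets have already been computed (their definition is not given here), identifies each dotted tree by an opaque string key, and uses hypothetical names and sample values throughout.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, str]             # (state, input token)


def slr1_action_table(shifts: Dict[Cell, int],
                      reductions: Dict[int, List[Tuple[str, str]]],
                      follow: Dict[str, Set[str]]) -> Dict[Cell, Set[str]]:
    """Enter each reduction only under the tokens in the FOLLOW set of its dotted tree."""
    table: Dict[Cell, Set[str]] = defaultdict(set)
    for (state, token), target in shifts.items():
        table[(state, token)].add(f"s{target}")
    for state, items in reductions.items():
        for action_name, dotted_tree_key in items:
            for token in follow[dotted_tree_key]:
                table[(state, token)].add(action_name)
    return dict(table)


# Hypothetical state 1: it shifts on c and can also resume right.
# With an assumed FOLLOW set {d, $} the two actions no longer share an entry.
table = slr1_action_table(
    shifts={(1, "c"): 3},
    reductions={1: [("rs alpha@0", "alpha@0:rb")]},
    follow={"alpha@0:rb": {"d", "$"}},
)
```

A state that would have a shift/resume conflict in an LR(0) table is deterministic here as long as the shifted token is outside the FOLLOW set.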
7 Current Research 
The deterministic parsers we have developed do not satisfy
an important property satisfied by LR parsers for
CFGs. This property is often described as the viable prefix
property, which states that as long as the portion of
the input considered so far leads to some stack configuration
(i.e. does not lead to error), it is always possible
to find a suffix to obtain a string in the language.
Our parsers do not satisfy this property because the
left completion move is not a 'reduce' move. This move
8 Due to the lack of space, we do not define FIRST and FOLLOW.
However, we explain the basic principles used for the computation of
FIRST and FOLLOW.
applies when we have reached the bottom-left end (to the
left of the foot node) of an auxiliary tree, say β. If we
had considered this move to be a reduce move, then
popping the appropriate number of elements off the storage
would allow us to figure out which tree (into which β
was adjoined), say α, to proceed with. Rather than using
this information (which is available in the storage of
the BEPDA), by putting left completion in the closure
operations, we apply a move that is akin to the predict
move of Earley's parser. That is, we continue by considering
every possible node β could have been adjoined at,
which could include nodes in trees that were not used
so far. However, we do not accept incorrect strings; we
only lose the prefix property (for an example see Figure 7).
As a consequence, errors are always detected but
not as soon as possible.
Parser configuration                          Next move
(‖0, aabeccdd$)                               s2
(‖0‖2, abeccdd$)                              s2
(‖0‖2‖2, beccdd$)                             s3
(‖0‖2‖2‖3, eccdd$)                            s4
(‖0‖2‖2‖3‖4, ccdd$)                           rsα@0
(‖0‖2‖2‖3‖4‖6, ccdd$)                         s7
(‖0‖2‖2‖3‖4‖6‖7, cdd$)                        error
Figure 7: Example of error detecting
The reason why we did not consider the left completion
move to be a reduce move is related to the restrictions
on the moves of the BEPDA, which is weakly equivalent
to TAGs (perhaps also due to the fact that left to right
parsing may not be the most natural strategy for parsing TAGs, which
produce trees with context-free path sets). In CFGs,
where there is only horizontal stacking, a single reduction
step is used to account for the application of a rule
in left to right parsing. On the other hand, with TAGs,
if a tree is used successfully, it appears that a prediction
move and more than one reduction move are necessary
for an auxiliary tree. In left to right parsing, a prediction is
made to start an auxiliary tree β at its top left end; a reduction
is appropriate to recover the node β was adjoined at,
at the left completion stage; a reduction is needed again at
the resume right stage to resume the right end of β; finally a
reduction is needed at the right completion stage. In our
algorithm, reductions are used at the resume right stage and
at the reduce root stage. Even if a reduction step is applied at
the left completion stage, an encoding of the fact that the left
part of β (as well as the left part of trees adjoined on
the spine of β) has been completed has to be restored in
the storage (note that in a reduction move of any shift reduce
parser for CFGs, any information about the rule used is
discarded once the reduction step is applied). So far we have
not been able to apply a reduction step at the left completion
stage, reinsert the left part of β and yet maintain
the correct sequence in the storage so that the right part
of β can be recovered at the resume right stage. We are
considering alternative strategies for shift reduce parsing
with BEPDA as well as considering whether there are
other automata models equivalent to TAGs better suited
for deterministic left to right parsing of tree-adjoining
languages.
8 Conclusion
We have introduced a bottom-up machine (Bottom-up 
Embedded Push Down Automaton) that enabled us to 
define LR-like parsers for TAGs. The machine recog- 
nizes in a bottom-up fashion exactly the set of Tree Ad- 
joining Languages. 
We described the LR parsing algorithm and a method 
for computing LR(0) parsing tables. We also men- 
tioned the possibility of building SLR(k) parsing tables 
by defining the notions of FIRST and FOLLOW sets for 
TAGs. 
As shown for the example, no lookaheads are necessary
to parse deterministically the language
$L = \{a^n b^n e c^n d^n \mid n \geq 0\}$. If instead of using e, we had the
empty string ε in the initial tree, an LR(0)-like parser would
not be enough. On the other hand, an SLR(1)-like parser
would suffice.
We have noted that our parsers do not satisfy the valid 
prefix property. As a consequence, errors are always 
detected but not as soon as possible. 
Similar to the work of Lang (1974) and Tomita (1987) 
extending LR parsers for arbitrary CFGs, the LR parsers 
for TAGs can be extended to solve by pseudo-parallelism 
the conflicts of moves. 
References
Joshi, Aravind K., 1985. How Much Context-Sensitivity
is Necessary for Characterizing Structural
Descriptions - Tree Adjoining Grammars. In
Dowty, D., Karttunen, L., and Zwicky, A. (editors),
Natural Language Processing - Theoretical, Computational
and Psychological Perspectives. Cambridge
University Press, New York. Originally presented in
a Workshop on Natural Language Parsing at Ohio
State University, Columbus, Ohio, May 1983.
Joshi, Aravind K., 1987. An Introduction to Tree Adjoining
Grammars. In Manaster-Ramer, A. (editor),
Mathematics of Language. John Benjamins, Amsterdam.
Knuth, D. E., 1965. On the translation of languages
from left to right. Inf. Control 8:607-639.
Lang, Bernard, 1974. Deterministic Techniques for Efficient
Non-Deterministic Parsers. In Loeckx, Jacques
(editor), Automata, Languages and Programming,
2nd Colloquium, University of Saarbrücken. Lecture
Notes in Computer Science, Springer Verlag.
Révész, G., 1971. Unilateral context sensitive grammars
and left to right parsing. J. Comput. System Sci.
5:337-352.
Schabes, Yves and Joshi, Aravind K., June 1988. An
Earley-Type Parsing Algorithm for Tree Adjoining
Grammars. In 26th Meeting of the Association for
Computational Linguistics (ACL'88). Buffalo.
Thatcher, J. W., 1971. Characterizing Derivations Trees
of Context Free Grammars through a Generalization
of Finite Automata Theory. J. Comput. Syst. Sci.
5:365-396.
Tomita, Masaru, 1987. An Efficient Augmented-Context-Free
Parsing Algorithm. Computational Linguistics
13:31-46.
Turnbull, C. J. M. and Lee, E. S., 1979. Generalized
Deterministic Left to Right Parsing. Acta Informatica
12:187-207.
Vijay-Shanker, K., 1987. A Study of Tree Adjoining
Grammars. PhD thesis, Department of Computer and
Information Science, University of Pennsylvania.
Walters, D. A., 1970. Deterministic Context-Sensitive
Languages. Inf. Control 17:14-40.
