TAL Recognition in O(M(n2)) Time 
Sanguthevar Rajasekaran 
Dept. of CISE, Univ. of Florida 
raj~cis.ufl.edu 
Shibu Yooseph 
Dept. of CIS, Univ. of Pennsylvania 
yooseph@gradient.cis.upenn.edu 
Abstract 
We propose an O(M(n2)) time algorithm 
for the recognition of Tree Adjoining Lan- 
guages (TALs), where n is the size of the 
input string and M(k) is the time needed 
to multiply two k x k boolean matrices. 
Tree Adjoining Grammars (TAGs) are for- 
malisms suitable for natural language pro- 
cessing and have received enormous atten- 
tion in the past among not only natural 
language processing researchers but also al- 
gorithms designers. The first polynomial 
time algorithm for TAL parsing was pro- 
posed in 1986 and had a run time of O(n6). Quite recently, 
an O(n 3 M(n)) algorithm 
has been proposed. The algorithm pre- 
sented in this paper improves the run time 
of the recent result using an entirely differ- 
ent approach. 
1 Introduction 
The Tree Adjoining Grammar (TAG) formalism was 
introduced by :loshi, Levy and Takahashi (1975). 
TAGs are tree generating systems, and are strictly 
more powerful than context-free grammars. They 
belong to the class of mildly context sensitive gram- 
mars (:loshi, et al., 1991). They have been found 
to be good grammatical systems for natural lan- 
guages (Kroch, Joshi, 1985). The first polynomial 
time parsing algorithm for TALs was given by Vi- 
jayashanker and :loshi (1986), which had a run time 
of O(n6), for an input of size n. Their algorithm 
had a flavor similar to the Cocke-Younger-Kasami 
(CYK) algorithm for context-free grammars. An 
Earley-type parsing algorithm has been given by 
Schabes and Joshi (1988). An optimal linear time 
parallel parsing algorithm for TALs was given by 
Palls, Shende and Wei (1990). In a recent paper, 
Rajasekaran (1995) shows how TALs can be parsed 
in time O(n3M(n)). 
In this paper, we propose an O(M(n2)) time 
recognition algorithm for TALs, where M(k) is the 
time needed to multiply two k x k boolean matri- 
ces. The best known value for M(k) is O(n 2"3vs) 
(Coppersmith, Winograd, 1990). Though our algo- 
rithm is similar in flavor to those of Graham, Har- 
rison, & Ruzzo (1976), and Valiant (1975) (which 
were Mgorithms proposed for recognition of Con- 
text Pree Languages (CFLs)), there are crucial dif- 
ferences. As such, the techniques of (Graham, et al., 
1976) and (Valiant, 1975) do not seem to extend to 
TALs (Satta, 1993). 
2 Tree Adjoining Grammars 
A Tree Adjoining Grammar (TAG) consists of a 
quintuple (N, ~ U {~}, I, A, S), where 
N is a finite set of nonterminal symbols, 
is a finite set of terminal symbols disjoint from 
N, 
is the empty terminal string not in ~, 
I is a finite set of labelled initial trees, 
A is a finite set of auxiliary trees, 
S E N is the distinguished start symbol 
The trees in I U A are called elementary trees. All 
internal nodes of elementary trees are labelled with 
nonterminal symbols. Also, every initial tree is la- 
belled at the root by the start symbol S and has 
leaf nodes labelled with symbols from ~3 U {E}. An 
auxiliary tree has both its root and exactly one leaf 
(called the foot node ) labelled with the same non- 
terminal symbol. All other leaf nodes are labelled 
with symbols in E U {~}, at least one of which has a 
label strictly in E. An example of a TAG is given in 
figure 1. 
A tree built from an operation involving two other 
trees is called a derived tree. The operation involved 
is called adjunction. Formally, adjunction is an op- 
eration which builds a new tree 7, from an auxiliary 
tree fl and another tree ~ (a is any tree - initial, aux- 
iliary or derived). Let c~ contain an internal node m 
labelled X and let fl be the auxiliary tree with root 
node also labelled X. The resulting tree 7, obtained 
by adjoining fl onto c~ at node m is built as follows 
(figure 2): 
166 
Initial tree 
O~ 
S I 
E 
G = {{S},{a,b,c,e }, { or}, { ~}, S} 
S 
S 
b S* 
Figure 1: Example of a TAG 
Auxiliary tree 
1. The subtree of a rooted at m, call it t, is excised, 
leaving a copy of m behind. 
2. The auxiliary tree fl is attached at the copy of 
m and its root node is identifed with the copy 
of m. 
3. The subtree t is attached to the foot node of fl 
and the root node of t (i.e. m) is identified with 
the foot node of ft. 
This definition can be extended to include adjunc- 
tion constraints at nodes in a tree. The constraints 
include Selective, Null and Obligatory adjunction 
constraints. The algorithm we present here can he 
modified to include constraints. 
For our purpose, we will assume that every inter- 
nal node in an elementary tree has exactly 2 children. 
Each node in a tree is represented by a tuple < 
tree, node index, label >. (For brevity, we will refer 
to a node with a single variable m whereever there 
is no confusion) 
A good introduction to TAGs can be found in 
(Partee, et al., 1990). 
3 Context Free recognition in 
O( M(n)) Time 
The CFG G = (N,~,P, A1), where 
N is a set of Nonterminals {A1, A2, .., Ak}, 
is a finite set of terminals, 
P is a finite set of productions, 
A1 is the start symbol 
is assumed to be in the Chomsky Normal Form. 
Valiant (1975) shows how the recognition problem 
can be reduced to the problem of finding Transitive 
Closure and how Transitive Closure can be reduced 
to Matrix Multiplication. 
Given an input string aza2 .... an E ~*, the recur- 
sive algorithm makes use of an (n+l)× (n+l) upper 
triangular matrix b defined by 
hi,i+1 = {Ak I(Ak --* a,) E P}, 
bi,j = ¢, for j   i + 1 
and proceeds to find the transitive closure b + of this 
matrix. (If b + is the transitive closure, then Ak E 
b. +. ¢:~ Ak-~ ai .... aj-1) $,J 
Instead of finding the transitive closure by the cus- 
tomary method based on recursively splitting into 
disjoint parts, a more complex procedure based on 
'splitting with overlaps' is used. The extra cost in- 
volved in such a strategy can be made almost negligi- 
ble. The algorithm is based on the following lemma 
Lemma : Let b be an n x n upper triangular ma- 
trix, and suppose that for any r > n/e, the tran- 
sitive closure of the partitions \[1 < i,j < r\] and 
\[n- r < i,j < n\] are known. Then the closure of b 
can be computed by 
I. performing a single matrix multiplication, and 
2. finding the closure of a 2(n - r) × 2(n - r) up- 
per triangular matrix of which the closure of the 
partitions\[1 < i,j < n- r\] and \[n- r < i,j < 
2(n - r)\] are known. 
Proof: See (Valiant, 1975)for details 
The idea behind (Valiant, 1975) is based on visu- 
alizing Ak E b+j as spanning a tree rooted at the 
node Ak with l~aves ai through aj-1 and internal 
nodes as nonterminals generated from Ak according 
to the productions in P. Having done this, the fol- 
lowing observation is made : 
Given an input string al...a, and 2 distinct sym- 
bol positions, i and j, and a nonterminal Ak such 
that Ak E b + ., where i' < i,j' > j, then 3 a non- 
I P3 terminal A k, which is a descendent of Ak in the 
b + . where tree rooted at Ak, such that A k, E i d' 
i" < i, j" > j and A k, has two children Ak~ and Ak2 
such thatAk~ Eb +, andAk2 Eb +..withi<s<j. 
A k, can be thought of as a minimal node in this 
sense.(The descendent relation is both reflexive and 
transitive) 
Thus, given a string al...a, of length n, (say r = 
2/3), the following steps are done : 
167 
t 
Figure 2: Adjunction Operation 
k 
t 
1. Find the closure of the first 2/3 ,i.e. all nodes 
spanning trees which are within the first 2/3 . 
2. Find the closure of the last 2/3 , i.e. all nodes 
spanning trees which are within the last 2/3. 
3. Do a composition operation (i.e. matrix multi- 
plication) on the nodes got as a result of Step 
1 with nodes got as a result of Step 2. 
4. Reduce problem size to az...an/zal+2n/3...an 
and find closure of this input. 
The point to note is that in step 3, we can get rid 
of the mid 1/3 and focus on the remaining problem 
size. 
This approach does not work for TALs because of 
the presence of the adjunction operation. 
Firstly, the data structure used, i.e. the 2- 
dimensional matrix with the given representation, 
is not sufficient as adjunction does not operate on 
contiguous strings. Suppose a node in a tree domi- 
nates a frontier which has the substring aiaj to the 
left of the foot node and akat to the right of the 
footnode. These substrings need not be a contigu- 
ous part of the input; in fact, when this tree is used 
for adjunction then a string is inserted between these 
two suhstrings. Thus in order to represent a node, 
we need to use a matrix of higher dimension, namely 
dimension 4, to characterize the substring that ap- 
pears to the left of the footnode and the substring 
that appears to the right of the footnode. 
Secondly, the observation we made about an entry 
E b + is no longer quite true because of the presence 
of adjunction. 
Thirdly, the technique of getting rid of the mid 
1/3 and focusing on the reduced problem size alone, 
does not work as shown in figure 3: 
Suppose 3' is a derived tree in which 3 a node rn 
on which adjunction was done by an auxiliary tree 
ft. Even if we are able to identify the derived tree 
71 rooted at m, we have to first identify fl before we 
can check for adjunction, fl need not be realised as 
a result of the composition operation involving the 
nodes from the first and last 2/3's ,(say r =2/3). 
Thus, if we discard the mid 1/3, we will not be able 
to infer that the adjunction had indeed taken place 
at node m. 
4 Notations 
Before we introduce the algorithm, we state the no- 
tations that will be used. 
We will be making use of a 4-dimensional matrix 
A of size (n + 1) x (n + 1) x (n + 1) x (n + 1), where 
n is the size of the input string. 
(Vijayashanker, Joshi, 1986) Given a TAG G and 
an input string aza2..an, n > 1, the entries in A will 
be nodes of the trees of G. We say, that a node m 
(= < 0, node index, label >) E A(i,j, k, l) iff m is a 
node in a derived tree 7 and the subtree of 7 rooted 
at m has a yield given by either ai+l...ajXak+l...al 
(where X is the footnode of r/, j < k) or ai+l .... az 
(when j = k). 
If a node m E A(i,j,k,l}, we will refer to m as 
spanning a tree (i,j,k,l). 
When we refer to a node m being realised as a 
result of composition of two nodes ml and rnP, we 
mean that 3 an elementary tree in which m is the 
parent of ml and m2. 
A Grown Auxiliary Tree is defined to be either 
a tree resulting from an adjunction involving two 
auxiliary trees or a tree resulting from an adjunction 
involving an auxiliary tree and a grown auxiliary 
tree. 
Given a node m spanning a tree (i,j,k,l), we define 
the last operation to create this tree as follows : 
if the tree (i,j,k,l) was created in a series of op- 
erations, which also involved an adjunction by an 
auxiliary tree (or a grown auxiliary tree) (i, Jl, kz, l) 
onto the node m, then we say that the last opera- 
tion to create this tree is an adjunction operation; 
else the last operation to create the tree (i,j,k,l) is a 
composition. 
The concept of last operation is useful in modelling 
the steps required, in a bottom-up fashion, to create 
168 
n .. x 
71 
Node m has label X 
/, 
'3' 
Derived tree 
71 
Figure 3: Situation where we cannot infer the adjunction if we simply get rid of the mid 1/3 
a tree. 
5 Algorithm 
Given that the set of initial and auxiliary trees can 
have leaf nodes labelled with e, we do some prepro- 
cessing on the TAG G to obtain an Association List 
(ASSOC LIST) for each node. ASSOC LIST (m), 
where m is a node, will be useful in obtaining chains 
of nodes in elementary trees which have children la- 
belled ~. 
Initialize ASSOC LIST (m) = ¢, V m, and then 
call procedure MAKELIST on each elementary tree, 
in a top down fashion starting with the root node. 
Procedure MAKELIST (m) 
Begin 
1. If m is a leaf then quit 
2. If m has children ml and me both yielding the 
empty string at their frontiers (i.e. m spans a 
subtree yielding e) then 
ASSOC LIST (ml) = ASSOC 
LIST (m) u {m) 
ASSOC LIST (m2) = ASSOC 
LIST (m) U (m} 
3. If m has children m1 and me, with only me 
yielding the empty string at its frontier, then 
ASSOC LIST (ml) = ASSOC 
LIST (m) u {m) 
End 
We initially fill A(i,i+l,i+l,i+l) with all nodes 
from Smt,Vml, where S,~1 = {ml} O AS- 
SOC LIST (ml), ml being a node with the same 
label as the input hi+l, for 0 < i < n-1. We also fill 
A(i,i,j,j), i < j, with nodes from S,~2, Vm2, where 
Sin2 = {me) tJ ASSOC LIST (me), me being a foot 
node. All entries A(i,i,i,i), 0 < i < n, are filled with 
nodes from Sraa,Vm3, where S,n3 = { m3} U AS- 
SOC LIST (mS), m3 having label ¢. 
Following is the main procedure, Compute Nodes, 
which takes as input a sequence rlr2 ..... rp of symbol 
positions (not necessarily contiguous). The proce- 
dure outputs all nodes spanning trees (i,j,k,O, with 
{i, 1} E {rl,r2 ..... ~'ip } and {j,k} E {rl,r I Jr Z,...,rp}. 
The procedure is initially called with the sequence 
012..n corresponding to the input string aa ..... an. 
The matrix A is updated with every call to this pro- 
cedure and it is updated with the nodes just realised 
and also with the nodes in the ASSOC LISTs of the 
nodes just realised. 
Procedure Compute Nodes ( rl r2 ..... rp ) 
Begin 
1. Ifp = 2, then 
a. Compose all nodes E A(rl,j, k, re) with all 
nodes E A(re,re, re, re), rt < j < k < re. 
Update A . 
b. Compose all nodes E A(rl,rl,rl,rx) with 
all nodes E A(rt, j, k, r2), rt < j < k < re. 
Update A . 
e. Check for adjunctions involving nodes re- 
alised from steps a and b. Update A . 
d. Return 
2. Compute Nodes ( rlr2 ..... rep/a ). 
3. Compute Nodes ( rl+p/z ..... rp ). 
4. a. Compose nodes realised from step 2 with 
nodes realised from step 3. 
b. Update A. 
5. a. Check for all possible adjunctions involving 
the nodes realised as a result of step 4. 
b. Update A. 
6. Compute Nodes ( rlre...rp/arl+2p/a...r p ) 
169 
End 
Steps la,lb and 4a can be carried out in the fol- 
lowing manner : 
Consider the composition of node ml with node 
me. For step 4a, there are two cases to take care of. 
Case 1 
If node ml in a derived tree is the ancestor of the 
foot node, and node me is its right sibling, such that 
ml 6 A(i, j, k, l) and m2 E A(l, r, r, s), then their 
parent, say node m should belong to A(i,j,k,s). 
This composition of ml with me can be reduced to a 
boolean matrix multiplication in the following way: 
(We use a technique similar to the one used in (Ra- 
jasekaran, 1995)) Construct two boolean matrices 
B1, of size ((n 4- 1)2p/3) × (p/3) and Be, of size 
(p/3) x (p/3). 
Bl(ijk, l) = 1 iff ml E A(i,j,k,I) 
and i E {rl, .., rv/3} 
and 1 E {rl+p/3, ..r2p/3} 
= 0 otherwise 
Note that in B1 0 < j < k < n. 
BeEs ) = 1 iff me e A(I,r, r,s) 
and 1 E {r1+;13, ..rep/3} 
and s E {rl+ep/3, .., rp} 
-- 0 otherwise 
Clearly the dot product of the ijk th row of B1 
with the s th column of Be is a 1 iff m E A(i, j, k, s). 
Thus, update A(i,j,k, s) with {m} U ASSOC LIST 
(m). 
Case 2 
If node me in a derived tree is the ancestor of the 
foot node, and node ml is its left sibling, such that 
ml E A(i,j,j,l) and m2 E A(l,p, q, r), then their 
parent, say node m should belong to A(i,p,q,s). 
This can also be handled similar to the manner de- 
scribed for case 1. Update A(i,p,q,s) with {m} U 
ASSOC LIST (m). 
Notice that Case 1 also covers step la and Case 2 
also covers step lb. 
Step 5a and Step lc can be carried out in the 
following manner : 
We know that if a node m E A(i,j,k,i), and the 
root ml of an auxiliary tree E A(r, i, i, s), then ad- 
joining the tree 7/, rooted at ml, onto the node m, 
results in the node m spanning a tree (rj,k,s), i.e. m 
E A(r, j, k, s). 
We can essentially use the previous technique of 
reducing to boolean matrix multiplication. Con- 
struct two matrices C1 and Ce of sizes (p2/9) x (n + 
1) 2 and (n + 1) 2 x (n + 1) 2, respectively, as follows : 
Cl(ii, jk) = 1 iff 3ml, root of an auxiliary 
tree E A(i, j, k, l), with same label as m and 
Cl(il, jk) = 0 otherwise 
Note that in CI i E {rl,..,rpls}, i E 
{rl+2p/3 , .., rp}, and 0 _< j < k < n. 
Ce(qt, rs) = 1 iff m E A(q, r, s, t) 
-- 0 otherwise 
Note that inC2 0<q<r<s<t<n. 
Clearly the dot product of the ii th row of C1 with 
the rs th column of Ce is a 1 iff m E A(i,r,s,l). 
Thus, update A(i, r, s, l) with {m} U ASSOC LIST 
(m). 
The input string ala2...an is in the language gener- 
ated by the TAG G iff 3 a node labelled S in some 
A(O,j,j,n), 0 <_ j < n. 
6 Complexity 
Steps la, lb and 4a can be computed in 
O(neM(p)). 
Steps 5a and le can be computed in 
O((ne/pe)eM(pg)). 
If T(p) is the time taken by the procedure Compute 
Nodes, for an input of size p, then 
T(p) = 3T(2p/3)4-O(n2M(p))4- 
O( ( ne /pe)e M (pe) ) 
where n is the initial size of the input string. 
Solving the recurrence relation, we get T(n) - 
O(M(ne)). 
7 Proof of Correctness 
We will show the proof of correctness of the algo- 
rithm by induction on the length of the sequence of 
symbol positions. 
But first, we make an observation, given any two 
symbol positions (r~, rt), rt > r~ 4-1 , and a node m 
spanning a tree (i,j, k, l) such that i < rs and i _> rt 
with j and k in any of the possible combinations as 
shown in figure 4. 
3 a node m' which is a descendent of the 
node m in the tree (i,j,k,l) and which either 
E ASSOC LIST(ml) or is the same as ml, with 
ml having one of the two properties mentioned be- 
low : 
1. ml spans a tree (il,jl, kl, 11) such that the last 
operation to create this tree was a composition 
operation involving two nodes me and m3 with 
me spanning (ix, J2, k2, 12) and m3 spanning 
(12,j3, ks, ix). (with (r, < l~. < rt), 01 <- r,), 
(rt < !1) and either (j2 = kz,j3 = jl,k3 = kl) 
or (j2 = jl,k2 = kl,j3 = k3) ) 
2. ml spans a tree (il,jl, kl, ll) such that the last 
operation to create this tree was an adjunction 
by an auxiliary tree (or a grown auxiliary tree) 
(il, j2, ke, Ix), rooted at node me, onto the node 
ml spanning the tree (je,jl, kl, k2) such that 
node me has either the property mentioned in 
(1) or belongs to the ASSOC LIST of a node 
170 
I I 
rs rt 
j k 
2 
3 
4 j 
5 
Figure 4: Combinations 
j k 
j k 
k 
j k 
of j and k being considered 
which has the property mentioned in (1). (The 
labels of ml and me being the same) 
Any node satisfying the above observation will be 
called a minimal node w.r.t, the symbol positions 
(r,, r0. 
The minimM nodes can be identified in the follow- 
ing manner. If the node m spans (i,j, k, l) such that 
the last operation to create this tree is a composition 
of the form in figure ha, then m tO ASSOC LIST(m) 
is minimal. Else, if it is as shown in figure 5b, we 
can concentrate on the tree spanned by node ml and 
repeat the process. But, if the last operation to cre- 
ate (i, j, k, 1) was an adjunction as shown in figure 
5c, we can concentrate on the tree (il, j, k, 11) ini- 
tially spanned by node m. If the only adjunction 
was by an auxiliary tree, on node m spanning tree 
(Q,j,k, lx) as shown in figure 5d, then the set of 
minimal nodes will include both m and the root ml 
of the auxiliary, tree and the nodes in their respec- 
tive ASSOC LISTs. But if the adjunction was by a 
grown auxiliary tree as shown in figure he, then the 
minimal nodes include the roots of/31,/32, ..,/3s, 7 
and the node m. 
Given a sequence < rl,r2,..,rp >, we call 
(rq,r~+l) a gap, iff rq+l ¢ rq + 1. Identifying min- 
imal nodes w.r.t, every new gap created, will serve 
our purpose in determining all the nodes spanning 
trees (i, j, k, 1), with {i, l} e {rl, r2, .., rp}. 
Theorem : Given an increasing sequence < 
rl, r2, .., rp > of symbol positions and given 
a. V gaps (rq, rq+l), all nodes spanning trees (i,j,k,l} 
with rq < i < j < k < l < rq+l 
b. V gaps (rq, rq+l), all nodes spanning trees (i,j,k,l) 
such that either rq < i < rq+l or rq < l < rq+l 
c. V gaps (rq,rq+l) , all the minimal nodes for the 
gap such that these nodes span trees (i,j,k,l) with 
{i,l} E { rl,r2,..,rp } and i <_ 1 
in addition to the initialization information, the 
algorithm computes all the nodes spanning trees 
(i,i,k,O with (i,l} ~ { r~,r~,..,rp } and i _< i < 
k<l. 
m 
Proof : 
Base Cases : 
For length = 1, it is trivial as this information is 
already known as a result of initialization. 
For length = 2, there are two cases to consider : 
1. r2 = rl + 1, in which case a composition in- 
volving nodes from A(rl, rl, rl, rl) with nodes 
from A(rl, r2, r2, r2) and a composition involv- 
ing nodes from A(rl, r2, r2, r2) with nodes from 
A(r2, r2, r2, r2), followed by a check for adjunc- 
tion involving nodes realised from the previous 
two compositions, will be sufficient. Note that 
since there is only one symbol from the input 
(namely, ar~), and because an auxiliary tree has 
at least one label from ~, thus, checking for one 
adjunction is sufficient as there can be at most 
one adjunction. 
2. r2 ~ rl + 1, implies that (rl,r2) is a gap. 
Thus, in addition to the information given 
as per the theorem, a composition involv- 
ing nodes from A(rl, j, k, r2) with nodes from 
A(r2,r2, r2,r2) and a composition involving 
nodes from A(rl,rl,rl,rl) with nodes from 
A(rl, j, k, r2), (rl < j < k < r2), followed by an 
adjunction involving nodes realised as a result of 
the previous two compositions will be sufficient 
as the only adjunction to take care of involves 
the adjunction of some auxiliary tree onto a 
node m which yields e, and m E A(rl, rl, rl, rl) 
or m E A(r2,r2,r2, r2). 
Induction hypothesis : V increasing sequence 
< rl,r2, ..,r~ > of symbol positions of length < p, 
(i.e q < p), the algorithm, given the information as 
171 
(5a) 
m 
r r s t 
(ab) 
m 
(5c) 
m 
auxiliary A 
• tree o~,.~//////2X grow. 
tree ///// ~k//~ 
i il ' j k ' ll ! 
(Se) 
i z 
I 
(M) 
root of auxiliary ra tree has property 
tree ~///J//~ 
i -'i 1 ' l 1 1 
Grown aux tree formed by adjoining 
Ps " P2 Pl 
onto root of grown aux tree 7 
Root of ~1 has property shown in (Sa) 
Figure 5: Identifying minimal nodes 
required by the theorem, computes all nodes span- 
ning trees (i,j,k,l) such that {i, l} e { rl, r2, .., rq } 
and i < j < k < I. Induction : Given an increasing 
sequence < rl, r~, .., rp, rp+l > of symbol positions 
together with the information required as per parts 
a,b,c of the theorem, the algorithm proceeds as fol- 
lows: 
1. By the induction hypothesis, the algorithm 
correctly computes all nodes spanning trees 
(i,j,k,i) within the first 2/3, i.e, {i,l} E { 
rt, r2, .., r2(p+D/3 } and i < l . By the hypothe- 
sis, it also computes all nodes (i',j,k',l')within 
the last 2/3, i.e, { i ~, ! ~ } E {rl+(p+l)/3, .., rp+z} 
and i' < i'. 
2. The composition step involving the nodes 
from the first and last 2/3 of the sequence 
< rl, r2, .., rp, rp+i >, followed by the adjunc- 
tion step captures all nodes m such that either 
a. m spans a tree (i,j,k,l)such that the last op- 
eration to create this tree was a composi- 
tion operation on two nodes ml and m2 
with ml spanning (i,j',k;l'} and me span- 
ning 
(i;j",k",l). (with i E { rl, r2, .., r(p+l)/3 }, 
i E { rl+(p+l)/3,..,r2(p+D/3 } and I E ! 
ri+2(p+z)/3, .., rp+z }, and either (j' = k, 
j" =j, k" = k) or (j' =j, k'= k,j" = k') ). 
b. m spans a tree O,J, k,l) such that the last op- 
eration to create this tree was an adjunc- 
tion by an auxiliary or grown auxiliary tree 
(i,j',k',l), rooted at node mI, onto the node 
m spanning the tree (j',j,k,k') such that 
node ml has either the property mentioned 
in (1) or it belongs to the ASSOC LIST of 
a node which has the property mentioned 
in (1). (The labels of m and ml being the same) 
Note that, in addition to the nodes m captured 
from a or b, we will also be realising nodes E 
ASSOC LIST (m). 
The nodes captured as a result of 2 are 
the minimal nodes with respect to the gap 
(r(p+l)/a, rl+2(p+l)/3) with the additional property 
that the trees (i,j,k,l) they span are such that i E { 
rl, r2, .., r(p+l)\]3 } and l E { rl+2(p+l)\]3, .., rp+l }. 
Before we can apply the hypothesis on the se- 
quence < rx, r2, .., r(p+t)/3, rl+2(p+l)\[3, ..rp+l >, we 
have to make sure that the conditions in parts 
a,b,c of the theorem are met for the new gap 
(r(p+1)/3, rl+2(p+l)/3). It is easy to see that con- 
ditions for parts a and b are met for this gap. We 
have also seen that as a result of step 2, all the mini- 
mal nodes w.r.t the gap (r(p+x)/3 , rl+2(p+l)/3), with 
172 
the desired property as required in part c have been 
computed. Thus applying the hypothesis on the 
sequence < rl, r2, .., r(p+l)\[3, rl+2(p+l)/3, ..rp+l >, 
the algorithm in the end correctly computes all 
the nodes spanning trees (ij,k,1) with {i,l} E 
{rl,r2,..,rp+x } andi<j<k<l. D 
8 Implementation 
The TAL recognizer given in this paper was im- 
plemented in Scheme on a SPARC station-10/30. 
Theoretical results in this paper and those in (Ra- 
jasekaran, 1995) clearly demonstrate that asymp- 
totically fast algorithms can be obtained for TAL 
parsing with the help of matrix multiplication al- 
gorithms. The main objective of the implementa- 
tion was to check if matrix multiplication techniques 
help in practice also to obtain efficient parsing algo- 
rithms. 
The recognizer implemented two different algo- 
rithms for matrix multiplication, namely the triv- 
ial cubic time algorithm and an algorithm that ex- 
ploits the sparsity of the matrices. The TAL recog- 
nizer that uses the cubic time algorithm has a run 
time comparable to that of Vijayashanker-\]oshi's al- 
gorithm. 
Below is given a sample of a grammar tested and 
also the speed up using the sparse version over the 
ordinary version. The grammar used, generated the 
TAL anbnc n. This grammar is shown in figure 1. 
Interestingly, the sparse version is an order of 
magnitude faster than the ordinary version for 
strings of length greater than 7. 
i\[ String 
abe 
aabbcc 
Answer 
Yes 
Yes 
Speedup \[1 
3.1 
6.1 
aabcabe No 8.0 
abacabac No 11.7 
aaabbbccc Yes 11.4 
The above implementation results suggest that 
even in practice better parsing algorithms can be 
obtained through the use of matrix multiplication 
techniques. 
9 Conclusions 
In this paper we have presented an O(M(n2)) time 
algorithm for parsing TALs, n being the length of 
the input string. We have also demonstrated with 
our implementation work that matrix multiplication 
techniques can help us obtain efficient parsing algo- 
rithms. 
Acknowledgements 
This research was supported in part by an NSF Re- 
search Initiation Award CCR-92-09260 and an ARO 
grant DAAL03-89-C-0031. 

References 
D. Coppersmith and S. Winograd, Matrix Multi- 
plication Via Arithmetic Progressions, in Proc. 
19th Annual ACM Symposium on Theory of Com- 
puting, 1987,pp. 1-6. Also in Journal of Symbolic 
Computation, Vol. 9, 1990, pp. 251-280. 
S.L. Graham, M.A. Harrison, and W.L. Ruzzo, On 
Line Context Free Language Recognition in Less 
than Cubic Time, Proc. A CM Symposium on The- 
ory of Computing, 1976, pp. 112-120. 
A.K. Joshi, L.S. Levy, and M. Takahashi, Tree Ad- 
junct Grammars, Journal of Computer and Sys- 
tem Sciences, 10(1), 1975. 
A.K. Joshi, K. Vijayashanker and D. Weir, The Con- 
vergence of Mildly Context-Sensitive Grammar 
Formalisms, Foundational Issues of Natural Lan- 
guage Processing, MIT Press, Cambridge, MA, 
1991,pp. 31-81. 
A. Kroch and A.K. Joshi, Linguistic Relevance of 
Tree Adjoining Grammars, Technical Report MS- 
CS-85-18, Department of Computer and Informa- 
tion Science, University of Pennsylvania, 1985. 
M. Palis, S. Shende, and D.S.L. Wet, An Optimal 
Linear Time Parallel Parser for Tree Adjoining 
Languages, SIAM Journal on Computin#,1990. 
B.H. Partee, A. Ter Meulen, and R.E. Wall, Stud- 
ies in Linguistics and Philosophy, Vol. 30, Kluwer 
Academic Publishers, 1990. 
S. Rajasekaran, TAL Parsing in o(n 6) Time, to ap- 
pear in SIAM Journal on Computing, 1995. 
G. Satta, Tree Adjoining Grammar Parsing and 
Boolean Matrix Multiplication, to be presented in 
the 31st Meeting of the Association for Computa- 
tional Linguistics, 1993. 
G. Satta, Personal Communication, September 
1993. 
Y. Schabes and A.K. Joshi, An Earley-Type Parsing 
Algorithm for Tree Adjoining Grammars, Proc. 
26th Meeting of the Association for Computa- 
tional Linguistics, 1988. 
L.G. Valiant, General Context-Free Recognition in 
Less than Cubic Time, Journal of Computer and 
System Sciences, 10,1975, pp. 308-315. 
K. Vijayashanker and A.K. Joshi, Some Computa- 
tional Properties of Tree Adjoining Grammars, 
Proc. 2~th Meeting of the Association for Com- 
putational Linguistics, 1986. 
