The Computational Complexity of the 
Correct-Prefix Property for TAGs 
Mark-Jan Nederhof* 
German Research Center for Artificial 
Intelligence 
A new upper bound is presented for the computational complexity of the parsing problem for TAGs, under the constraint that input is read from left to right in such a way that errors in the input are observed as soon as possible, which is called the "correct-prefix property." The former upper bound, O(n^9), is now improved to O(n^6), which is the same as that of practical parsing algorithms for TAGs without the additional constraint of the correct-prefix property.
1. Introduction 
Traditionally, parsers and recognizers for regular and context-free languages process 
input from left to right. If a syntax error occurs in the input, they often detect that
error immediately after its position is reached. The position of the syntax error can 
be defined as the rightmost symbol of the shortest prefix of the input that cannot be 
extended to be a correct sentence in the language L. 
In formal notation, this prefix for a given erroneous input w ∉ L is defined as the string va, where w = vax, for some x, such that vy ∈ L, for some y, but vaz ∉ L, for any z. (The symbols v, w, ... denote strings, and a denotes an input symbol.) The occurrence of a in w indicates the error position.
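As a concrete illustration of this definition (our sketch, not part of the original paper), the following Python fragment locates the error position by querying a hypothetical oracle extends_to_sentence(v) that decides whether v is a prefix of some sentence in L:

```python
def error_position(w, extends_to_sentence):
    """Return the 1-based position of the symbol a in w = vax such that v can
    still be extended to a sentence of L but va cannot; return None if every
    prefix of w is a correct prefix.  `extends_to_sentence` is an assumed
    oracle deciding whether its argument is a prefix of some string in L."""
    for k in range(1, len(w) + 1):
        if not extends_to_sentence(w[:k]):
            return k          # w[k-1] is the offending occurrence of a
    return None

# Example with L = {"ab", "ac"}: the error in "ad" is at position 2.
assert error_position("ad", lambda v: any(s.startswith(v) for s in ("ab", "ac"))) == 2
```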
If the error is detected as soon as it is reached, then all prefixes of the input that 
have been processed at preceding stages are correct prefixes, or more precisely, they are 
prefixes of some correct strings in the language. Hence, we speak of the correct-prefix 
property. 1 
An important application can be found in the area of grammar checking: upon 
finding an ungrammatical sentence in a document, a grammar checker may report to 
the user the presumed position of the error, obtained from a parsing algorithm with 
the correct-prefix property. 
For context-free and regular languages, the correct-prefix property can be satisfied without additional costs of space or time. Surprisingly, it has been claimed by Schabes and Waters (1995) that this property is problematic for the mildly context-sensitive languages represented by tree-adjoining grammars (TAGs): the best practical parsing algorithms for TAGs have time complexity O(n^6) (Vijay-Shankar and Joshi [1985]; see Satta [1994] and Rajasekaran and Yooseph [1995] for lower theoretical upper bounds), whereas the only published algorithm with the correct-prefix property, that by Schabes and Joshi (1988), has complexity O(n^9).
In this paper we present an algorithm that satisfies the correct-prefix property and operates in O(n^6) time. This algorithm merely recognizes input, but it can be extended
* DFKI, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany. E-mail: nederhof@dfki.de
1 We adopt this term from Sippu and Soisalon-Soininen (1988). In some publications, the term valid 
prefix property is used. 
© 1999 Association for Computational Linguistics
to be a parsing algorithm with the ideas from Schabes (1994), which also suggest how 
it can be extended to handle substitution in addition to adjunction. The complexity 
results carry over to linear indexed grammars, combinatory categorial grammars, and 
head grammars, since these formalisms are equivalent to TAGs (Vijay-Shanker and 
Weir 1993, 1994). 
We present the actual algorithm in Section 3, after the necessary notation has been 
discussed in Section 2. The correctness proofs are discussed in Section 4, and the time 
complexity in Section 5. The ideas in this paper give rise to a number of questions for 
further research, as discussed in Section 6. 
2. Definitions 
Our definition of TAGs simplifies the explanation of the algorithm, but differs slightly 
from standard treatment such as that of Joshi (1987). 
A tree-adjoining grammar is a 4-tuple (Σ, NT, I, A), where Σ is the set of terminals, I is the set of initial trees, and A is the set of auxiliary trees. We refer to the trees in I ∪ A as elementary trees. The set NT, the set of nonterminals, does not play any role in this paper.
We refer to the root of an elementary tree t as R_t. Each auxiliary tree has exactly one distinguished leaf, which is called the foot. We refer to the foot of an auxiliary tree t as F_t.
We use variables N and M to range over nodes in elementary trees. We assume 
that the sets of nodes belonging to distinct elementary trees are pairwise disjoint. 
For each leaf N in an elementary tree, except when it is a foot, we define label(N) to be the label of the node, which is either a terminal from Σ or the empty string ε. For all other nodes, label is undefined.
For each node N that is not a leaf or that is a foot, Adj(N) is the set of auxiliary trees 
that can be adjoined at N, plus possibly the special element nil. For all other nodes, 
Adj is undefined. If a set Adj(N) contains nil, then this indicates that adjunction at N 
is not obligatory. 
For each nonleaf node N we define children(N) as the (nonempty) list of daughter 
nodes. For all other nodes, children is undefined. An example of a TAG is given in 
Figure 1. 
The language described by a TAG is given by the set of strings that are the yields 
of derived trees. A derived tree is obtained from an initial tree by performing the 
following operation on each node N, except when it is a leaf: The tree is excised at N, 
and between the two halves a fresh instance of an auxiliary tree, which is taken from 
the set Adj(N), is inserted, or the element nil is taken from Adj(N), in which case no 
new nodes are added to the tree. Insertion of the new auxiliary tree, which from now 
on will be called adjunction, is done in such a way that the bottom half of the excised 
tree is connected to the foot of the auxiliary tree. The new nodes that are added to the 
tree as a result are recursively subjected to the same operation. This process ends in a 
complete derived tree once all nodes have been treated. 
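To make these definitions and the adjunction operation concrete, here is a minimal Python sketch of our own. A Node record carries the fields label, children, Adj, and a foot marker, and yields enumerates the strings derivable from a node by choosing, at every treated node, either nil-adjunction or adjunction of one of the allowed auxiliary trees; the depth bound is an assumption of the sketch, added only so that obligatory adjunction cannot recurse forever.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    label: str = ""                        # terminal or "" (epsilon), for non-foot leaves
    children: List["Node"] = field(default_factory=list)
    adj: Set[Optional[str]] = field(default_factory=set)  # auxiliary tree names; None = nil
    is_foot: bool = False

def yields(node, aux, plug="", depth=3):
    """All terminal strings derivable from `node`; a foot node produces `plug`,
    the yield of the lower half of the tree excised at the adjunction node.
    `aux` maps auxiliary tree names to their root Nodes; `depth` bounds nesting."""
    if node.children:                              # interior node: concatenate daughters
        below = {""}
        for d in node.children:
            below = {x + y for x in below for y in yields(d, aux, plug, depth)}
    elif node.is_foot:                             # foot: the excised lower half goes here
        below = {plug}
    else:                                          # ordinary leaf
        return {node.label}
    out = set()
    for choice in node.adj:                        # treat the node: adjoin or nil-adjoin
        if choice is None:
            out |= below
        elif depth > 0:                            # adjoin `choice`; its foot receives `below`
            for b in below:
                out |= yields(aux[choice], aux, plug=b, depth=depth - 1)
    return out

# A toy grammar (not the one of Figure 1): an initial tree with yield "ab",
# and an optional auxiliary tree b1 inserting "c" above the "b".
b1 = Node(children=[Node(label="c"), Node(is_foot=True, adj={None})], adj={None})
a1 = Node(children=[Node(label="a"),
                    Node(children=[Node(label="b")], adj={None, "b1"})], adj={None})
print(sorted(yields(a1, {"b1": b1})))              # ['ab', 'acb']
```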
An example of the derivation of a string is given in Figure 2. We start with initial tree a1 and treat R_a1, for which we find Adj(R_a1) = {b2, nil}. We opt to select nil, so that no new nodes are added. However, in the figure we do split R_a1 in order to mark it as having been treated. Next we treat N_a1, and we opt to adjoin b1, taken from Adj(N_a1) = {b1, b3}. After another "nil-adjunction" at R_b1, we adjoin b2 at N_b1. Note that this is an obligatory adjunction, since Adj(N_b1) does not contain nil. Some more nil-adjunctions lead to a derived tree with yield acdb, which is therefore in the language described by the TAG.
Figure 1
A tree-adjoining grammar. (The original figure draws the initial trees a1 and a2 and the auxiliary trees b1, b2, and b3, annotating each node with its set Adj; for instance, Adj(R_a1) = {b2, nil}, Adj(N_a1) = {b1, b3}, Adj(R_b1) = {b1, b2, nil}, Adj(N_b1) = {b2}, and Adj(F_b1) = Adj(F_b2) = Adj(F_b3) = {nil}.)

Figure 2
Derivation of the string acdb. (The original figure shows the successive stages: the initial tree a1, a nil-adjunction at R_a1, adjunction of b1 at N_a1, a nil-adjunction at R_b1, adjunction of b2 at N_b1, and three further nil-adjunctions, resulting in the derived tree with yield acdb.)
In order to avoid cluttering the picture with details, we have omitted the names 
of nodes at which (nil-)adjunction has been applied. We will reintroduce these names 
later. A further point worth mentioning is that here we treat the nodes in preorder: we 
traverse the tree top-down and left-to-right, and perform adjunction at each node the 
first time it is encountered. 2 Any other strategy would lead to the same set of derived 
trees, but we chose preorder treatment since this matches the algorithm we present 
below. 
2 The tree that is being traversed grows in size during the traversal, contrary to traditional usage of the 
notion of "traversal." 
3. The Algorithm 
The input to the recognition algorithm is given by the string a_1 a_2 ... a_n, where n is the length of the input. Integers i such that 0 ≤ i ≤ n will be used to indicate "positions" in the input string. Where we refer to the input between positions i and j we mean the string a_{i+1} ... a_j.
The algorithm operates by means of least fixed-point iteration: a table is gradually 
filled with elements derived from other elements, until no more new ones can be found. 
A number of "steps" indicate how table elements are to be derived from others. 3 
For the description of the steps we use a pseudoformal notation. Each step consists 
of a list of antecedents and a consequent. The antecedents are the conditions under 
which an incarnation of the step is executed. The consequent is a new table element 
that the step then adds to the parse table, unless of course it is already present. An 
antecedent may be a table element, in which case the condition that it represents is 
membership in the table. 
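This fixed-point computation can be pictured as a standard agenda-driven chart loop in the style of deductive parsing (Shieber, Schabes, and Pereira 1995). The following sketch is ours and deliberately generic: each step is modeled as a function that receives one newly derived item together with the current table and returns whatever consequents it can now justify.

```python
def deduce(axioms, steps):
    """Compute the least fixed point of the given deduction steps.

    `axioms`: iterable of items derivable without item antecedents.
    `steps`:  iterable of functions step(trigger, table) -> iterable of items,
              where `trigger` is a newly added item used as one antecedent and
              the remaining antecedents are looked up in `table`.
    Items must be hashable; the returned set is the completed parse table."""
    table = set()
    agenda = list(axioms)
    while agenda:
        trigger = agenda.pop()
        if trigger in table:
            continue
        table.add(trigger)
        for step in steps:
            for consequent in step(trigger, table):
                if consequent not in table:
                    agenda.append(consequent)
    return table
```

The steps of Sections 3.1 through 3.5 can then be plugged in as such functions; sketches of them are given below.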
The main table elements, or items, are 6-tuples [h, N → α • β, i, j, f1, f2]. Here, N is a node from some elementary tree t, and αβ is the list of the daughter nodes of N. The daughters in α together generate the input between positions i and j. The whole elementary tree generates input from position h onwards.
Internal to the elementary tree, there may be adjunctions; in fact, the traversal of the tree (implying (nil-)adjunctions at all nodes) has been completed up to the end of α. Furthermore, tree t may itself be an auxiliary tree, in which case it is adjoined in another tree. Then, the foot may be dominated by one of the daughters in α, and the foot generates the part of the input between positions f1 and f2. When the tree is not an auxiliary tree, or when the foot is not dominated by one of the daughters in α, then f1 and f2 both have the dummy value "-".
Whether t is an initial or an auxiliary tree, it is part of a derived tree of which everything to the left of the end of α generates the input between positions 0 and j. The traversal has been completed up to the end of α.
See Figure 3 for an illustration of the meaning of items. We assume R_t and F_t are the root and foot of the elementary tree t to which N belongs; F_t may not exist, as explained above. R is the root of some initial tree. The solid lines indicate what has been established; the dashed lines indicate what is merely predicted. If F_t exists, the subtree below F_t indicates the lower half of the derived tree in which t was adjoined.
The shaded areas labeled by I, II, and III have not yet been traversed. In particular, it has not yet been established that these parts of the derived tree together generate the input between positions j and n.
For technical reasons, we assume an additional node for each elementary tree t, which we denote by ⊤. This node has only one daughter, viz. the actual root node R_t. We also assume an additional node for each auxiliary tree t, which we denote by ⊥. This is the unique daughter of the actual foot node F_t; we set children(F_t) = ⊥.
In summary, an item indicates how a part of an elementary tree contributes to the 
recognition of some derived tree. 
Figure 4 illustrates the items needed for recognition of the derived tree from the 
running example. We have simplified the notation of items by replacing the names of 
leaves (other than foot nodes) by their labels. 
3 A "step" is more accurately called an "inference rule" in the literature on deductive parsing (Shieber, Schabes, and Pereira 1995). For the sake of convenience we will apply the shorter term. 
Figure 3
An item [h, N → α • β, i, j, f1, f2].
 1:  [0, ⊤ → • R_a1, 0, 0, -, -]                (Init)
 2:  [0, R_a1 → • a N_a1, 0, 0, -, -]           (Pred 2) 1
 3:  [0, R_a1 → a • N_a1, 0, 1, -, -]           (Scan 1) 2
 4:  [1, ⊤ → • R_b1, 1, 1, -, -]                (Pred 1) 3
 5:  [1, R_b1 → • N_b1 F_b1, 1, 1, -, -]        (Pred 2) 4
 6:  [1, ⊤ → • R_b2, 1, 1, -, -]                (Pred 1) 5
 7:  [1, R_b2 → • F_b2 d, 1, 1, -, -]           (Pred 2) 6
 8:  [1, F_b2 → • ⊥, 1, 1, -, -]                (Pred 2) 7
 9:  [1, N_b1 → • c, 1, 1, -, -]                (Pred 3) 8 + 5
10:  [1, N_b1 → c •, 1, 2, -, -]                (Scan 1) 9
11:  [1, F_b2 → ⊥ •, 1, 2, 1, 2]                (Comp 1) 10 + 8 + 5
12:  [1, R_b2 → F_b2 • d, 1, 2, 1, 2]           (Comp 2) 11 + 7
13:  [1, R_b2 → F_b2 d •, 1, 3, 1, 2]           (Scan 1) 12
14:  [1, ⊤ → R_b2 •, 1, 3, 1, 2]                (Comp 2) 13 + 6
15a: [N_b1 → c •, 1, 3, -, -]                   (Adj 0) 14 + 10
15:  [1, R_b1 → N_b1 • F_b1, 1, 3, -, -]        (Adj 2) 15a + 5
16:  [1, F_b1 → • ⊥, 3, 3, -, -]                (Pred 2) 15
17:  [0, N_a1 → • b, 3, 3, -, -]                (Pred 3) 16 + 3
18:  [0, N_a1 → b •, 3, 4, -, -]                (Scan 1) 17
19:  [1, F_b1 → ⊥ •, 3, 4, 3, 4]                (Comp 1) 18 + 16 + 3
20:  [1, R_b1 → N_b1 F_b1 •, 1, 4, 3, 4]        (Comp 2) 19 + 15
21:  [1, ⊤ → R_b1 •, 1, 4, 3, 4]                (Comp 2) 20 + 4
22a: [N_a1 → b •, 1, 4, -, -]                   (Adj 0) 21 + 18
22:  [0, R_a1 → a N_a1 •, 0, 4, -, -]           (Adj 2) 22a + 3
23:  [0, ⊤ → R_a1 •, 0, 4, -, -]                (Comp 3) 22 + 1
Figure 4
The items needed for recognition of a derived tree. (The original figure also draws the corresponding derived tree, with input positions 0 through 4 marked below its yield a c d b.)
There is one special kind of item, with only five fields instead of six. This is 
used as an intermediate result in the adjunctor steps to be discussed in Section 
3.5. 
Figure 5
The initialization (Init).

Figure 6
The first scanner step (Scan 1).
3.1 Initializer 
The initializer step predicts initial trees t starting at position 0; see Figure 5. 
t ∈ I
---------------------------------------- (Init)
[0, ⊤ → • R_t, 0, 0, -, -]
For the running example, item 1 in Figure 4 results from this step. 
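In the sketches that follow we encode an item [h, N → α • β, i, j, f1, f2] as the tuple (h, N, dot, i, j, f1, f2), where dot = |α| and None stands for the dummy value "-"; this encoding, and accessor names such as top_node(t) for the extra node ⊤ of tree t, are our assumptions, not part of the paper. Under that encoding, Init simply seeds the table:

```python
def init_items(initial_trees, top_node):
    """Init: for every initial tree t, assert [0, TOP_t -> . R_t, 0, 0, -, -].
    Items are encoded as (h, node, dot, i, j, f1, f2), with None for '-'."""
    return [(0, top_node(t), 0, 0, 0, None, None) for t in initial_trees]
```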
3.2 Scanner 
The scanner steps try to shift the dot rightward in case the next node in line is labeled
with a terminal or ε, which means the node is a leaf but not a foot. Figure 6 sketches
the situation with respect to the input positions mentioned in the step. The depicted 
structure is part of at least one derived tree consistent with the input between positions 
0 and j + 1, as explained earlier. 
[h, N → α • Mβ, i, j, f1, f2],
label(M) = a_{j+1}
---------------------------------------- (Scan 1)
[h, N → αM • β, i, j + 1, f1, f2]

[h, N → α • Mβ, i, j, f1, f2],
label(M) = ε
---------------------------------------- (Scan 2)
[h, N → αM • β, i, j, f1, f2]
For the running example in Figure 4, Scan 1 derives, among others, item 3 from 
item 2, and item 13 from item 12. 
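Under the same encoding, both scanner steps can be written as a single function over one item; children and label are assumed dictionaries giving children(N) and label(M), and next_symbol is a_{j+1}, or None at the end of the input:

```python
def scan(item, children, label, next_symbol):
    """Scan 1 / Scan 2: move the dot over a leaf M that is not a foot."""
    h, N, dot, i, j, f1, f2 = item
    daughters = children[N]
    if dot == len(daughters):
        return None                          # dot already at the end
    M = daughters[dot]
    if M not in label:
        return None                          # M is not a non-foot leaf
    if label[M] == "":                       # Scan 2: empty label, no input consumed
        return (h, N, dot + 1, i, j, f1, f2)
    if label[M] == next_symbol:              # Scan 1: M matches a_{j+1}
        return (h, N, dot + 1, i, j + 1, f1, f2)
    return None
```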
Figure 7
The three predictor steps (Pred 1, Pred 2, and Pred 3).
3.3 Predictor 
The first predictor step predicts a fresh occurrence of an auxiliary tree t, indicated in Figure 7. The second predicts a list of daughters γ lower down in the tree, abstaining from adjunction at the current node M. The third predicts the lower half of a tree in which the present tree t was adjoined.
[h, N → α • Mβ, i, j, f1, f2],
t ∈ Adj(M)
---------------------------------------- (Pred 1)
[j, ⊤ → • R_t, j, j, -, -]

[h, N → α • Mβ, i, j, f1, f2],
nil ∈ Adj(M),
children(M) = γ
---------------------------------------- (Pred 2)
[h, M → • γ, j, j, -, -]

[j, F_t → • ⊥, k, k, -, -],
[h, N → α • Mβ, i, j, f1, f2],
t ∈ Adj(M),
children(M) = γ
---------------------------------------- (Pred 3)
[h, M → • γ, k, k, -, -]
For the running example, Pred 1 derives item 4 from item 3 and item 6 from 
item 5. Pred 2 derives, among others, item 5 from item 4. Pred 3 derives item 9 from 
items 8 and 5, and item 17 from items 16 and 3. 
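A sketch of the three predictor steps in the same style; adj, children, top_node, and foot_tree (which maps a foot node to the name of its auxiliary tree) are assumed grammar accessors, and nil is represented by None:

```python
def pred1(item, children, adj, top_node):
    """Pred 1: for each auxiliary tree t in Adj(M), where M is the node behind
    the dot, derive [j, TOP_t -> . R_t, j, j, -, -]."""
    h, N, dot, i, j, f1, f2 = item
    ds = children[N]
    if dot < len(ds) and ds[dot] in adj:
        for t in adj[ds[dot]]:
            if t is not None:                       # skip nil
                yield (j, top_node(t), 0, j, j, None, None)

def pred2(item, children, adj):
    """Pred 2: if nil is in Adj(M), start on M's daughters:
    [h, M -> . children(M), j, j, -, -]."""
    h, N, dot, i, j, f1, f2 = item
    ds = children[N]
    if dot < len(ds) and None in adj.get(ds[dot], ()):
        yield (h, ds[dot], 0, j, j, None, None)

def pred3(foot_item, item, children, adj, foot_tree):
    """Pred 3: combine [j, F_t -> . BOT, k, k, -, -] with
    [h, N -> alpha . M beta, i, j, f1, f2], t in Adj(M), into
    [h, M -> . children(M), k, k, -, -]."""
    g, F, fdot, k, k2, x1, x2 = foot_item
    h, N, dot, i, j, f1, f2 = item
    ds = children[N]
    t = foot_tree.get(F)                            # None unless F is a foot node
    if (t is None or fdot != 0 or k != k2 or (x1, x2) != (None, None)
            or g != j or dot == len(ds) or t not in adj.get(ds[dot], ())):
        return
    yield (h, ds[dot], 0, k, k, None, None)
```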
3.4 Completer 
The first completer step completes recognition of the lower half of a tree in which an auxiliary tree t was adjoined, and asserts recognition of the foot of t; see Figure 8. The second and third completer steps complete recognition of a list of daughter nodes γ, and initiate recognition of the list of nodes β to the right of the mother node of γ.
[h, M → γ •, k, l, f1', f2'],
t ∈ Adj(M),
[j, F_t → • ⊥, k, k, -, -],
[h, N → α • Mβ, i, j, f1, f2]
---------------------------------------- (Comp 1)
[j, F_t → ⊥ •, k, l, k, l]
Figure 8
Two of the completer steps (Comp 1 and Comp 2).
[h, M → γ •, j, k, f1, f2],
[h, N → α • Mβ, i, j, -, -],
M dominates the foot of tree t
---------------------------------------- (Comp 2)
[h, N → αM • β, i, k, f1, f2]

[h, M → γ •, j, k, -, -],
[h, N → α • Mβ, i, j, f1, f2]
---------------------------------------- (Comp 3)
[h, N → αM • β, i, k, f1, f2]
See Figure 4 for use of these three steps in the running example. 
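The completer steps in the same style; dominates_foot(M) is an assumed predicate that holds when M dominates the foot of the elementary tree it belongs to (in particular when M is that foot itself), and foot_tree is as before. Each function checks that its item antecedents match the rule and returns the consequent, or None:

```python
def comp1(done_item, foot_item, dotted_item, children, adj, foot_tree):
    """Comp 1: [h, M -> gamma ., k, l, f1', f2'], t in Adj(M),
    [j, F_t -> . BOT, k, k, -, -], [h, N -> alpha . M beta, i, j, f1, f2]
    derive [j, F_t -> BOT ., k, l, k, l]."""
    h, M, mdot, k, l, f1p, f2p = done_item
    g, F, fdot, k2, k3, x1, x2 = foot_item
    h2, N, dot, i, j, f1, f2 = dotted_item
    t = foot_tree.get(F)
    ds = children[N]
    if (t is None or fdot != 0 or (k2, k3) != (k, k) or (x1, x2) != (None, None)
            or h != h2 or g != j or mdot != len(children[M])
            or dot == len(ds) or ds[dot] != M or t not in adj.get(M, ())):
        return None
    return (j, F, 1, k, l, k, l)

def comp2(done_item, dotted_item, children, dominates_foot):
    """Comp 2: [h, M -> gamma ., j, k, f1, f2] and [h, N -> alpha . M beta, i, j, -, -],
    M dominating the foot of its tree, derive [h, N -> alpha M . beta, i, k, f1, f2]."""
    h, M, mdot, j, k, f1, f2 = done_item
    h2, N, dot, i, j2, g1, g2 = dotted_item
    ds = children[N]
    if (h != h2 or j != j2 or (g1, g2) != (None, None)
            or mdot != len(children[M]) or dot == len(ds) or ds[dot] != M
            or not dominates_foot(M)):
        return None
    return (h, N, dot + 1, i, k, f1, f2)

def comp3(done_item, dotted_item, children):
    """Comp 3: as Comp 2, but the foot positions travel with the dotted item:
    [h, M -> gamma ., j, k, -, -] and [h, N -> alpha . M beta, i, j, f1, f2]
    derive [h, N -> alpha M . beta, i, k, f1, f2]."""
    h, M, mdot, j, k, g1, g2 = done_item
    h2, N, dot, i, j2, f1, f2 = dotted_item
    ds = children[N]
    if (h != h2 or j != j2 or (g1, g2) != (None, None)
            or mdot != len(children[M]) or dot == len(ds) or ds[dot] != M):
        return None
    return (h, N, dot + 1, i, k, f1, f2)
```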
3.5 Adjunctor 
The adjunctor steps perform the actual recognition of an adjunction of an auxiliary 
tree t in another tree at some node M. The first adjunctor step deals with the case in 
which the other tree is again adjoined in a third tree (the two darkly shaded areas in 
Figure 9) and M dominates the foot node. The second adjunctor step deals with the 
case in which the other tree is either an initial tree, or has its foot elsewhere, i.e., not 
dominated by M. 
The two respective cases of adjunction are realized by step Adj 0 plus step Adj 1, and by step Adj 0 plus step Adj 2. The auxiliary step Adj 0 introduces items of a somewhat different form than those considered up to now, viz. [M → γ •, j, k, f1', f2']. The interpretation is suggested in Figure 10: at M a tree has been adjoined. The adjoined tree and the lower half of the tree that M occurs in together generate the input from j to k. The depicted structure is part of at least one derived tree consistent with the input between positions 0 and k. In the case in which M dominates a foot node, as suggested in the figure, f1' and f2' have a value other than "-".
[j, ⊤ → R_t •, j, k, f1, f2],
[h, M → γ •, f1, f2, f1', f2'],
t ∈ Adj(M)
---------------------------------------- (Adj 0)
[M → γ •, j, k, f1', f2']
Figure 9
The two adjunctor steps (Adj 1 and Adj 2), implicitly combined with Adj 0.
[M → γ •, j, k, f1', f2'],
M dominates the foot of tree t',
[h, F_t' → ⊥ •, f1', f2', f1', f2'],
[h, N → α • Mβ, i, j, -, -]
---------------------------------------- (Adj 1)
[h, N → αM • β, i, k, f1', f2']

[M → γ •, j, k, -, -],
[h, N → α • Mβ, i, j, f1, f2]
---------------------------------------- (Adj 2)
[h, N → αM • β, i, k, f1, f2]
For the running example, Adj 0 derives the intermediate item 15a from items 14 and 10, and from this and item 5, Adj 2 derives item 15. Similarly, Adj 0 and Adj 2 together derive item 22. There are no applications of Adj 1 in this example.
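Adj 0 and Adj 2 in the same style; Adj 1 is analogous to Adj 2 but additionally consults the foot item of the tree containing M, exactly as in the rule above. The accessor root_tree, mapping the extra node ⊤ of a tree t to the name t, is an assumption of the sketch, and the five-field intermediate items are encoded as plain 5-tuples:

```python
def adj0(top_item, done_item, children, adj, root_tree):
    """Adj 0: [j, TOP_t -> R_t ., j, k, f1, f2] and [h, M -> gamma ., f1, f2, f1', f2'],
    t in Adj(M), derive the five-field item [M -> gamma ., j, k, f1', f2']."""
    g, T, tdot, j, k, f1, f2 = top_item
    h, M, mdot, i2, j2, f1p, f2p = done_item
    t = root_tree.get(T)                       # name of the tree whose TOP node is T
    if (t is None or tdot != 1 or g != j or (i2, j2) != (f1, f2)
            or mdot != len(children[M]) or t not in adj.get(M, ())):
        return None
    return (M, j, k, f1p, f2p)

def adj2(mid_item, dotted_item, children):
    """Adj 2: the five-field item [M -> gamma ., j, k, -, -] and
    [h, N -> alpha . M beta, i, j, f1, f2] derive [h, N -> alpha M . beta, i, k, f1, f2]."""
    M, j, k, f1p, f2p = mid_item
    h, N, dot, i, j2, f1, f2 = dotted_item
    ds = children[N]
    if ((f1p, f2p) != (None, None) or j != j2
            or dot == len(ds) or ds[dot] != M):
        return None
    return (h, N, dot + 1, i, k, f1, f2)
```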
An alternative formulation of the adjunctor steps, without Adj 0, could be the 
following: 
[j, ⊤ → R_t •, j, k, f1, f2],
[h, M → γ •, f1, f2, f1', f2'],
t ∈ Adj(M),
M dominates the foot of tree t',
[h, F_t' → ⊥ •, f1', f2', f1', f2'],
[h, N → α • Mβ, i, j, -, -]
---------------------------------------- (Adj 1')
[h, N → αM • β, i, k, f1', f2']

[j, ⊤ → R_t •, j, k, f1, f2],
[h, M → γ •, f1, f2, -, -],
t ∈ Adj(M),
[h, N → α • Mβ, i, j, f1', f2']
---------------------------------------- (Adj 2')
[h, N → αM • β, i, k, f1', f2']
Figure 10
An item [M → γ •, j, k, f1', f2'].
That this formulation is equivalent to the original combination of the three steps 
Adj 0, Adj 1, and Adj 2 can be argued in two stages. 
First, the h in [h, M → γ •, f1, f2, f1', f2'] or [h, M → γ •, f1, f2, -, -] occurring as second antecedent of Adj 1' or Adj 2', respectively, can be replaced by a fresh variable h' without affecting the correctness of the algorithm. In particular, the occurrence of h in the second antecedent of Adj 1' is redundant because of the inclusion of the fifth antecedent [h, F_t' → ⊥ •, f1', f2', f1', f2']. Note that, conversely, this fifth antecedent is redundant with respect to the second antecedent, since existence of an item [h, M → γ •, f1, f2, f1', f2'], such that M dominates the foot of a tree t', implies the existence of an item [h, F_t' → ⊥ •, f1', f2', f1', f2']. For further explanation, see Section 4.
Second, the first three antecedents of Adj 1' and Adj 2' can be split off to obtain Adj 0, Adj 1, and Adj 2, justified by principles that are the basis for optimization of database queries (Ullman 1982).
The rationale for the original formulation of the adjunction steps as opposed to the alternative formulation by Adj 1' and Adj 2' lies in the consideration of time complexity, as will be discussed in Section 5.
4. Properties 
The first claim we make about the algorithm pertains to its correctness as a recognizer: 
Claim 
After completion of the algorithm, the item [0, ⊤ → R_t •, 0, n, -, -], for some t ∈ I, is in the table if and only if the input is in the language described by the grammar.
Note that the input is in the language if and only if the input is the yield of a 
derived tree. 
The idea behind the proof of the "if" part is that for any derived tree constructed 
from the grammar we can indicate a top-down and left-to-right tree traversal that is 
matched by corresponding items that are computed by steps of the algorithm. The 
tree traversal and the corresponding items are exemplified by the numbers 1, ..., 23 in Figure 4.
For the "only if" part, we can show for each step separately that the invariant suggested in Figure 3 is preserved. To simplify the proof one can look only at the last five fields of items [h, N → α • β, i, j, f1, f2], h being irrelevant for the above claim.
We do, however, need h for the proof of the following claim: 
Figure 11
Pred 1 preserves the invariant. (Panel (a) depicts the first antecedent; panel (b) the tree constructed to justify the consequent.)
Claim 
The algorithm satisfies the correct-prefix property, provided the grammar is reduced. 
A TAG is reduced if it does not contain any elementary trees that cannot be part 
of any derived tree. (One reason why an auxiliary tree might not be a part of any 
derived tree is that at some node it may have obligatory adjunction of itself, leading 
to "infinite adjunction.") 
Again, the proof relies on the invariant sketched in Figure 3. The invariant can be 
proven correct by verifying that if the items in the antecedents of some step satisfy 
the invariant, then so does the item in the consequent. 
A slight technical problem is caused by the obligatory adjunctions. The shaded 
areas in Figure 3, for example, represent not merely subtrees of elementary trees, but 
subtrees of a derived tree, which means that at each node either adjunction or nil- 
adjunction has been performed. 
This issue arises when we prove that Pred 1 preserves the invariant. Figure 11(a) represents the interpretation of the first antecedent of this step, [h, N → α • Mβ, i, j, f1, f2]; without loss of generality we only consider the case that f1 = f2 = -. We may assume that below M some subtree exists, and that at M itself either adjunction with auxiliary tree t' or nil-adjunction has been applied; the figure shows the former case.
In order to justify the item from the consequent, [j, ⊤ → • R_t, j, j, -, -], we construct the tree in Figure 11(b), which is the same as that in Figure 11(a), except that t' is replaced by auxiliary tree t, which has been traversed so that at all nodes either adjunction or nil-adjunction has been applied, including the nodes introduced recursively through adjunctions. Such a finite traversal must exist since the grammar is reduced.
For the other steps we do not need the assumption that the grammar is reduced in order to prove that the invariant is preserved. For example, for Adj 1 we may reason as follows: The item [M → γ •, j, k, f1', f2'], the first antecedent, informs us of the existence of the structure in the shaded area of Figure 12(a). Similarly, the items [h, F_t' → ⊥ •, f1', f2', f1', f2'] and [h, N → α • Mβ, i, j, -, -] provide the shaded areas of Figures 12(b) and 12(c). Note that in the case of the first or third item, we do not use all the information that the item provides. In particular, the information that the structures are part of a derived tree consistent with the input between positions 0 and k (in the case of (a)) or j (in the case of (c)) is not needed.
Figure 12
Adj 1 preserves the invariant. (Panels (a) through (c) depict the structures provided by the three item antecedents; panel (d) the resulting derived tree.)
The combined information from these three items ensures the existence of the derived tree depicted in Figure 12(d), which justifies the consequent of Adj 1, viz. [h, N → αM • β, i, k, f1', f2'].
The other steps can be proven to preserve the invariant in similar ways.
Now the second claim follows: if the input up to position j has been read, resulting in an item of the form [h, N → α • β, i, j, f1, f2], then there is a string y such that a_1 ... a_j y is in the language. This y is the concatenation of the yields of the subtrees labeled I, II, and III in Figure 3.
The full proofs of the two claims above are straightforward but tedious. Further- 
more, our new algorithm is related to many existing recognition algorithms for TAGs 
(Vijay-Shankar and Joshi 1985; Schabes and Joshi 1988; Lang 1988; Vijay-Shanker and 
Weir 1993; Schabes and Shieber 1994; Schabes 1994), some of which were published 
together with proofs of correctness. Therefore, including full proofs for our new algo- 
rithm does not seem necessary. 
5. Complexity 
The steps presented in pseudoformal notation in Section 3 can easily be composed 
into an actual algorithm (Shieber, Schabes, and Pereira 1995). This can be done in such 
a way that the order of the time complexity is determined by the maximal number of 
different combinations of antecedents per step. If we restrict ourselves to the order of 
the time complexity expressed in the length of the input, this means that the complexity 
is given by O(n^p), where p is the largest number of input positions in any step.
However, a better realization of the algorithm exists that allows us to exclude the variables for input positions that occur only once in a step, which we will call irrelevant input positions. This realization relies on the fact that an intermediate step may be applied that reduces an item I with q input positions to another item I' with q' < q input positions, omitting those that are irrelevant. That reduced item I' then takes the place of I in the antecedent of the actual step. This has a strong relationship to optimization of database queries (Ullman 1982).
For example, there are nine variables in Comp 1, of which i, f1, f2, f1', and f2' are all irrelevant, since they occur only once in that step. An alternative formulation of this step is therefore given by the combination of the following three steps:
[h, M → γ •, k, l, f1', f2']
---------------------------------------- (Omit 5-6)
[h, M → γ •, k, l, ?, ?]

[h, N → α • Mβ, i, j, f1, f2]
---------------------------------------- (Omit 3-5-6)
[h, N → α • Mβ, ?, j, ?, ?]

[h, M → γ •, k, l, ?, ?],
t ∈ Adj(M),
[j, F_t → • ⊥, k, k, -, -],
[h, N → α • Mβ, ?, j, ?, ?]
---------------------------------------- (Comp 1')
[j, F_t → ⊥ •, k, l, k, l]
The question marks indicate omitted input positions. Items containing question 
marks are distinguished from items without them, and from items with question marks 
in different fields. 
In Comp 1' there are now only four input positions left. The contribution of this step to the overall time complexity is therefore O(n^4) rather than O(n^9). The contribution of Omit 5-6 and Omit 3-5-6 to the time complexity is O(n^5).
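In code, the Omit steps are plain projections of items; any fixed placeholder object can play the role of the question mark, so that reduced items are kept distinct from full ones (a sketch under the tuple encoding used earlier):

```python
OMITTED = "?"   # placeholder for an input position that has been projected away

def omit_56(done_item):
    """Omit 5-6: [h, M -> gamma ., k, l, f1', f2'] => [h, M -> gamma ., k, l, ?, ?]."""
    h, M, mdot, k, l, _f1, _f2 = done_item
    return (h, M, mdot, k, l, OMITTED, OMITTED)

def omit_356(dotted_item):
    """Omit 3-5-6: [h, N -> alpha . M beta, i, j, f1, f2] => [h, N -> alpha . M beta, ?, j, ?, ?]."""
    h, N, dot, _i, j, _f1, _f2 = dotted_item
    return (h, N, dot, OMITTED, j, OMITTED, OMITTED)
```

Comp 1' then matches on the reduced items, so that only the four positions h, j, k, and l remain to be enumerated.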
For the entire algorithm, the maximum number of relevant input positions per 
step is six. Thereby, the complexity of left-to-right recognition for TAGs under the 
constraint of the correct-prefix property is O(n^6). There are five steps that contain six
relevant input positions, viz. Comp 2, Comp 3, Adj 0, Adj 1, and Adj 2. 
In terms of the size of the grammar G, the complexity is O(|G|^2), since at most two elementary trees are simultaneously considered in a single step. Note that in some steps we address several parts of a single elementary tree, such as the two parts represented by the items [h, F_t' → ⊥ •, f1', f2', f1', f2'] and [h, N → α • Mβ, i, j, -, -] in Adj 1. However, the second of these items uniquely identifies the second field of the first item, and therefore this pair of items amounts to only one factor of |G| in the time complexity.
The complexity of O(n^6) that we have achieved depends on two ideas: first, the use of Adj 0, Adj 1, and Adj 2 instead of Adj 1' and Adj 2', and second, the exclusion of irrelevant variables above. Both are needed. The exclusion of irrelevant variables alone, in combination with Adj 1' and Adj 2', leads to a complexity of O(n^8). Without excluding irrelevant variables, we obtain a complexity of O(n^9) due to Comp 1, which uses nine input positions.
The question arises where the exact difference lies between our algorithm and 
that of Schabes and Joshi (1988), and whether their algorithm could be improved to 
obtain the same time complexity as ours, using techniques similar to those discussed 
above. This question is difficult to answer precisely because of the significant difference 
between the types of items that are used in the respective algorithms. However, some 
general considerations suggest that the algorithm from Schabes and Joshi (1988) is 
inherently more expensive. 
First, the items from the new algorithm have five input positions, which implies that storage of the parse table requires a space complexity of O(n^5). The items from the older algorithm have effectively six input positions, which leads to a space complexity of O(n^6).
Second, the "Right Completor" from Schabes and Joshi (1988), which roughly 
corresponds with our adjunctor steps, has nine relevant input positions. This step can 
be straightforwardly broken up into smaller steps that each have fewer relevant input 
positions, but it seems difficult to reduce the maximal number of positions to six. 
A final remark on Schabes and Joshi (1988) concerns the time complexity in terms of the size of the grammar that they report, viz. O(|G|^2). This would be the same upper bound as in the case of the new algorithm. However, the correct complexity seems to be O(|G|^3), since each item contains references to two nodes of the same elementary tree, and the combination in "Right Completor" of two items entails the simultaneous use of three distinct nodes from the grammar.
6. Further Research 
The algorithm in the present paper operates in a top-down manner, being very similar 
to Earley's algorithm (Earley 1970), which is emphasized by the use of the "dotted" 
items. As shown by Nederhof and Satta (1994), a family of parsing algorithms (top- 
down, left-corner, PLR, ELR, and LR parsing [Nederhof 1994]) can be carried over
to head-driven parsing. An obvious question is whether such parsing techniques can 
also be used to produce variants of left-to-right parsing for TAGs. Thus, one may 
conjecture, for example, the existence of an LR-like parsing algorithm for arbitrary 
TAGs that operates in O(n^6) and that has the correct-prefix property.
Note that LR-like parsing algorithms were proposed by Schabes and Vijay-Shanker 
(1990) and Nederhof (1998). However, for these algorithms the correct-prefix property 
is not satisfied. 
Development of advanced parsing algorithms for TAGs with the correct-prefix 
property is not at all straightforward. In the case of context-free grammars, the addi- 
tional benefit of LR parsing, in comparison to, for example, top-down parsing, lies in 
the ability to process multiple grammar rules simultaneously. If this is to be carried 
over to TAGs, then multiple elementary trees must be handled simultaneously. This 
is difficult to combine with the mechanism we used to satisfy the correct-prefix prop- 
erty, which relies on filtering out hypotheses with respect to "left context." Filtering 
out such hypotheses requires detailed investigation of that left context, which, how- 
ever, precludes treating multiple elementary trees simultaneously. An exception may 
be the case when a TAG contains many, almost identical, elementary trees. It is not 
clear whether this case occurs often in practice. 
Therefore, further research is needed not only to precisely define advanced parsing 
algorithms for TAGs with the correct-prefix property, but also to determine whether 
there are any benefits for practical grammars. 
Acknowledgments 
Most of the presented research was carried 
out within the framework of the Priority 
Programme Language and Speech 
Technology (TST) while the author was 
employed at the University of Groningen. 
The TST-Programme is sponsored by NWO 
(Dutch Organization for Scientific Research). 
An error in a previous version of this paper 
was found and corrected with the help of 
Giorgio Satta. 
References 
Earley, Jay. 1970. An efficient context-free 
parsing algorithm. Communications of the 
ACM, 13(2):94-102, February. 
Joshi, Aravind K. 1987. An introduction to 
tree adjoining grammars. In Alexis 
Manaster-Ramer, editor, Mathematics of 
Language. John Benjamins Publishing 
Company, Amsterdam, pages 87-114. 
Lang, Bernard. 1988. The systematic 
construction of Earley parsers: 
Application to the production of O(n^6)
Earley parsers for tree adjoining 
grammars. Unpublished paper, December. 
Nederhof, Mark-Jan. 1994. An optimal 
tabular parsing algorithm. In Proceedings 
of the 32nd Annual Meeting, pages 117-124, 
Las Cruces, NM, June. Association for 
Computational Linguistics. 
Nederhof, Mark-Jan. 1998. An alternative 
LR algorithm for TAGs. In COLING-ACL 
'98: 36th Annual Meeting of the Association for
Computational Linguistics and 17th 
International Conference on Computational 
Linguistics, volume 2, pages 946-952, 
Montreal, Quebec, Canada, August. 
Nederhof, Mark-Jan and Giorgio Satta. 1994. 
An extended theory of head-driven 
parsing. In Proceedings of the 32nd Annual
Meeting, pages 210-217, Las Cruces, NM, 
June. Association for Computational 
Linguistics. 
Rajasekaran, Sanguthevar and Shibu 
Yooseph. 1995. TAL recognition in 
O(M(n^2)) time. In Proceedings of the 33rd
Annual Meeting, pages 166-173, 
Cambridge, MA, June. Association for 
Computational Linguistics. 
Satta, Giorgio. 1994. Tree-adjoining 
grammar parsing and Boolean matrix 
multiplication. Computational Linguistics, 
20(2):173-191. 
Schabes, Yves. 1994. Left to right parsing of 
lexicalized tree-adjoining grammars. 
Computational Intelligence, 10(4):506-524. 
Schabes, Yves and Aravind K. Joshi. 1988. 
An Earley-type parsing algorithm for tree 
adjoining grammars. In Proceedings of the
26th Annual Meeting, pages 258-269, 
Buffalo, NY, June. Association for 
Computational Linguistics. 
Schabes, Yves and Stuart M. Shieber. 1994. 
An alternative conception of 
tree-adjoining derivation. Computational 
Linguistics, 20(1):91-124. 
Schabes, Yves and K. Vijay-Shanker. 1990. 
Deterministic left to right parsing of tree 
adjoining languages. In Proceedings of the
28th Annual Meeting, pages 276-283, 
Pittsburgh, PA, June. Association for 
Computational Linguistics. 
Schabes, Yves and Richard C. Waters. 1995. 
Tree insertion grammar: A cubic-time, 
parsable formalism that lexicalizes 
context-free grammar without changing 
the trees produced. Computational 
Linguistics, 21(4):479-513. 
Shieber, Stuart M., Yves Schabes, and 
Fernando C. N. Pereira. 1995. Principles 
and implementation of deductive parsing. 
Journal of Logic Programming, 24:3-36.
Sippu, Seppo and Eljas Soisalon-Soininen. 
1988. Parsing Theory, Vol. I: Languages and
Parsing. Volume 15 of EATCS Monographs 
on Theoretical Computer Science. 
Springer-Verlag. 
Ullman, Jeffrey D. 1982. Principles of Database
Systems. Computer Science Press. 
Vijay-Shankar, K. and Aravind K. Joshi. 
1985. Some computational properties of 
tree adjoining grammars. In Proceedings of 
the 23rd Annual Meeting, pages 82-93, 
Chicago, IL, July. Association for 
Computational Linguistics. 
Vijay-Shanker, K. and David J. Weir. 1993. 
Parsing some constrained grammar 
formalisms. Computational Linguistics, 
19(4):591-636. 
Vijay-Shanker, K. and David J. Weir. 1994. 
The equivalence of four extensions of 
context-free grammars. Mathematical 
Systems Theory, 27:511-546. 
