An Alternative Conception of 
Tree-Adjoining Derivation 
Yves Schabes* 
Mitsubishi Electric Research Laboratory 
Stuart M. Shieber†
Harvard University 
The precise formulation of derivation for tree-adjoining grammars has important ramifications 
for a wide variety of uses of the formalism, from syntactic analysis to semantic interpretation and 
statistical language modeling. We argue that the definition of tree-adjoining derivation must be 
reformulated in order to manifest the proper linguistic dependencies in derivations. The particular 
proposal is both precisely characterizable through a definition of TAG derivations as equivalence 
classes of ordered derivation trees, and computationally operational, by virtue of a compilation 
to linear indexed grammars together with an efficient algorithm for recognition and parsing 
according to the compiled grammar. 
1. Introduction 
In a context-free grammar, the derivation of a string in the rewriting sense can be cap- 
tured in a single canonical tree structure that abstracts all possible derivation orders. 
As it turns out, this derivation tree also corresponds exactly to the hierarchical structure 
that the derivation imposes on the string, the derived tree structure of the string. The 
formalism of tree-adjoining grammars (TAG), on the other hand, decouples these two 
notions of derivation tree and derived tree. Intuitively, the derivation tree is a more 
finely grained structure than the derived tree, and as such can serve as a substrate 
on which to pursue further analysis of the string. This intuitive possibility is made 
manifest in several ways. Fine-grained syntactic analysis can be pursued by imposing 
on the derivation tree further combinatorial constraints, for instance, selective adjoin- 
ing constraints or equational constraints over feature structures. Statistical analysis 
can be explored through the specification of derivational probabilities as formalized 
in stochastic tree-adjoining grammars. Semantic analysis can be overlaid through the 
synchronous derivations of two TAGs. 
All of these methods rely on the derivation tree as the source of the important 
primitive relationships among trees. The decoupling of derivation trees from derived 
trees thus makes possible a more flexible ability to pursue these types of analyses. At 
the same time, the exact definition of derivation becomes of paramount importance. 
In this paper, we argue that previous definitions of tree-adjoining derivation have not 
taken full advantage of this decoupling, and are not as appropriate as they might be 
for the kind of further analysis that tree-adjoining analyses could make possible. In 
particular, the standard definition of derivation, attributable to Vijay-Shanker (1987), 
* Cambridge, MA 02139
† Division of Applied Sciences, Cambridge, MA 02138
© 1994 Association for Computational Linguistics
Computational Linguistics Volume 20, Number 1 
requires that auxiliary trees be adjoined at distinct nodes in elementary trees. However, 
in certain cases, especially cases characterized as linguistic modification, it is more 
appropriate to allow multiple adjunctions at a single node. 
In this paper we propose a redefinition of TAG derivation along these lines, 
whereby multiple auxiliary trees of modification can be adjoined at a single node, 
whereas only a single auxiliary tree of predication can. The redefinition constitutes a 
new definition of derivation for TAG that we will refer to as extended derivation. For 
such a redefinition to be serviceable, however, it is necessary that it be both precise 
and operational. In service of the former, we provide a formal definition of extended 
derivation using a new approach to representing derivations as equivalence classes of 
ordered derivation trees. With respect to the latter, we provide a method of compi- 
lation of TAGs into corresponding linear indexed grammars (LIG), which makes the 
derivation structure explicit; and show how the generated LIG can drive a parsing 
algorithm that recovers, either implicitly or explicitly, the extended derivations of the 
string. 
The paper is organized as follows. First we review Vijay-Shanker's standard defi- 
nition of TAG derivation and introduce the motivation for extended derivations. Then 
we present the extended notion of derivation and its formal definition. The original 
compilation of TAGs to LIGs provided by Vijay-Shanker and Weir and our variant for 
extended derivations are both described. Finally, we discuss a parsing algorithm for 
TAG that operates by a variant of Earley parsing on the corresponding LIG. The set 
of extended derivations can subsequently be recovered from the set of Earley items 
generated by the algorithm. The resultant algorithm is further modified so as to build 
an explicit derivation tree incrementally as parsing proceeds; this modification, which 
is a novel result in its own right, allows the parsing algorithm to be used by systems 
that require incremental processing with respect to tree-adjoining grammars. 
2. The Standard Definition of Derivation 
To exemplify the distinction between standard and extended derivations, we exhibit 
the TAG of Figure 1.¹ This grammar derives some simple noun phrases such as
"roasted red pepper" and "baked red potato." The former, for instance, is associated
with the derived tree in Figure 2(a). The tree can be viewed as being derived in two
ways:²
Dependent: The auxiliary tree β_ro is adjoined at the root node (address ε)³ of β_re.
The resultant tree is adjoined at the N node (address 1) of initial tree α_pe.
This derivation is depicted as the derivation tree in Figure 3(a).
Independent: The auxiliary trees β_ro and β_re are adjoined at the N node (address
1) of the initial tree α_pe. This derivation is depicted as the derivation tree
in Figure 3(b).
1 Here and elsewhere, we conventionally use the Greek letter α and its subscripted and primed variants
for initial trees, β and its variants for auxiliary trees, and γ and its variants for elementary trees in
general. The foot node of an auxiliary tree is marked with an asterisk ('*').
2 We ignore here the possibility of another dependent derivation wherein adjunction occurs at the foot
node of an auxiliary tree. Because this introduces yet another systematic ambiguity, it is typically
disallowed by stipulation in the literature on linguistic analyses using TAGs.
3 The address of a node in a tree is taken to be its Gorn number, that sequence of integers specifying
which branches to traverse in order starting from the root of the tree to reach the node. The address of
the root of the tree is therefore the empty sequence, notated ε. See the appendix for a more complete
discussion of notation.
Yves Schabes and Stuart M. Shieber Tree-Adjoining Derivation 
[Figure: five elementary trees, given here in bracketed form. (α_po) NP(N(potato)); (α_pe) NP(N(pepper)); (β_ro) N(Adj(roasted) N*); (β_re) N(Adj(red) N*); (β_b) N(Adj(baked) N*).]
Figure 1 
A sample tree-adjoining grammar. 
[Figure: two derived trees, given here in bracketed form. (a) NP(N(Adj(roasted) N(Adj(red) N(pepper)))); (b) NP(N(Adj(red) N(Adj(roasted) N(pepper)))).]
Figure 2 
Two trees derived by the grammar of Figure 1. 
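[Figure: two derivation trees. (a) α_pe has a single child β_re on an arc labeled 1, and β_re has a single child β_ro on an arc labeled ε. (b) α_pe has two children, β_ro and β_re, each on an arc labeled 1.]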
Figure 3 
Derivation trees for the derived tree of Figure 2(a) according to the grammar of Figure 1. 
In the independent derivation, two trees are separately adjoined at one and the same 
node in the initial tree. In the dependent derivation, on the other hand, one auxiliary 
tree is adjoined to the other, the latter only being adjoined to the initial tree. We will 
use this informal terminology uniformly in the sequel to distinguish the two general 
topologies of derivation trees. 
The standard definition of derivation, as codified by Vijay-Shanker, restricts deriva- 
tions so that two adjunctions cannot occur at the same node in the same elementary tree. The 
dependent notion of derivation (Figure 3(a)) is therefore the only sanctioned derivation 
for the desired tree in Figure 2(a); the independent derivation (Figure 3(b)) is disal- 
lowed. Vijay-Shanker's definition is appropriate because for any independent deriva- 
tion, there is a dependent derivation of the same derived tree. This can be easily seen 
in that any adjunction of β_2 at a node at which an adjunction of β_1 occurs could instead
be replaced by an adjunction of β_2 at the root of β_1.
The advantage of this standard definition of derivation is that a derivation tree in 
this normal form unambiguously specifies a derived tree. The independent derivation 
tree, on the other hand, is ambiguous as to the derived tree it specifies in that a 
notion of precedence of the adjunctions at the same node is unspecified, but crucial to 
the derived tree specified. This follows from the fact that the independent derivation 
tree is symmetric with respect to the roles of the two auxiliary trees (by inspection), 
whereas the derived tree is not. By symmetry, therefore, it must be the case that the 
same independent derivation tree specifies the alternative derived tree in Figure 2(b). 
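The ambiguity can be checked mechanically. The following Python sketch is illustrative only: the encoding of trees as (label, children) pairs, the representation of Gorn addresses as tuples of 1-based child indices, and the helper names (`adjoin`, `plug_foot`, `words`) are assumptions of the sketch rather than part of the formalism. It adjoins β_ro and β_re at the same N node of α_pe in both orders and compares the results with the dependent derivation.

```python
# Trees are (label, children) pairs; a foot node has a label ending in "*".
def leaf(word):
    return (word, ())

ALPHA_PE = ("NP", (("N", (leaf("pepper"),)),))
BETA_RO = ("N", (("Adj", (leaf("roasted"),)), ("N*", ())))
BETA_RE = ("N", (("Adj", (leaf("red"),)), ("N*", ())))

def subtree(tree, addr):
    """Return the subtree at a Gorn address (tuple of 1-based child indices)."""
    for i in addr:
        tree = tree[1][i - 1]
    return tree

def replace(tree, addr, new):
    """Return a copy of tree with the subtree at addr replaced by new."""
    if not addr:
        return new
    label, kids = tree
    i = addr[0] - 1
    return (label, kids[:i] + (replace(kids[i], addr[1:], new),) + kids[i + 1:])

def plug_foot(aux, sub):
    """Return a copy of aux with its foot node replaced by sub."""
    label, kids = aux
    if label.endswith("*"):
        return sub
    return (label, tuple(plug_foot(k, sub) for k in kids))

def adjoin(tree, addr, aux):
    """Adjoin the auxiliary tree aux at address addr of tree."""
    return replace(tree, addr, plug_foot(aux, subtree(tree, addr)))

def words(tree):
    """Left-to-right yield of a tree as a list of words."""
    label, kids = tree
    if not kids:
        return [label]
    return [w for k in kids for w in words(k)]

N = (1,)  # address of the N node in ALPHA_PE

# Independent derivation, read in the two possible orders of adjunction.
re_then_ro = adjoin(adjoin(ALPHA_PE, N, BETA_RE), N, BETA_RO)
ro_then_re = adjoin(adjoin(ALPHA_PE, N, BETA_RO), N, BETA_RE)

# Dependent derivation: BETA_RO at the root of BETA_RE, the result at N.
dependent = adjoin(ALPHA_PE, N, adjoin(BETA_RE, (), BETA_RO))

print(" ".join(words(re_then_ro)))
print(" ".join(words(ro_then_re)))
print(dependent == re_then_ro)
```

Under these assumptions the two orders yield the two trees of Figure 2, and only one of them coincides with the tree given by the dependent derivation, which is why an unordered independent derivation tree cannot by itself determine a derived tree.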
3. Motivation for an Extended Definition of Derivation 
In the absence of some further interpretation of the derivation tree nothing hinges on 
the choice of derivation definition, so that the standard definition disallowing inde- 
pendent derivations is as reasonable as any other. However, tree-adjoining grammars 
are almost universally extended with augmentations that make the issue apposite. 
We discuss three such variations here, all of which argue for the use of independent 
derivations under certain circumstances.⁴
3.1 Adding Adjoining Constraints 
Already in very early work on tree-adjoining grammars (Joshi, Levy, and Takahashi 
1975) constraints were allowed to be specified as to whether a particular auxiliary 
tree may or may not be adjoined at a particular node in a particular tree. The idea 
is formulated in its modern variant as selective-adjoining constraints (Vijay-Shanker and 
Joshi 1985). As an application of this capability, we consider the traditional grammatical 
view that directional adjuncts can be used only with certain verbs.⁵ This would account
4 The formulation of derivation for tree-adjoining grammars is also of significance for other grammatical 
formalisms based on weaker forms of adjunction such as lexicalized context-free grammar (Schabes 
and Waters 1993a) and its stochastic extension (Schabes and Waters 1993b), though we do not discuss 
these arguments here. 
5 For instance, Quirk, Greenbaum, Leech, and Svartvik (1985, page 517) remark that "direction adjuncts 
of both goal and source can normally be used only with verbs of motion." Although the restriction is 
undoubtedly a semantic one, we will examine the modeling of it in a TAG deriving syntactic trees for 
two reasons. First, the problematic nature of independent derivation is more easily seen in this way. 
Second, much of the intuition behind TAG analyses is based on a tight relationship between syntactic 
and semantic structure. Thus, whatever scheme for semantics is to be used with TAGs will require 
appropriate derivations to model these data. For example, an analysis of this phenomenon by adjoining 
constraints on the semantic half of a synchronous TAG would be subject to the identical argument. See 
Section 3.3. 
for the felicity distinctions between the following sentences: 
1. a. Brockway walked his Labrador towards the yacht club.
   b. # Brockway resembled his Labrador towards the yacht club.
This could be modeled by disallowing through selective adjoining constraints the 
adjunction of the elementary tree corresponding to a towards adverbial at the VP node 
of the elementary tree corresponding to the verb resembles.⁶ However, the restriction
applies even with intervening (and otherwise acceptable) adverbials. 
2. a. Brockway walked his Labrador yesterday.
   b. Brockway walked his Labrador yesterday towards the yacht club.
3. a. Brockway resembled his Labrador yesterday.
   b. # Brockway resembled his Labrador yesterday towards the yacht club.
Under the standard definition of derivation, there is no direct adjunction in the latter 
sentence of the towards tree into the resembles tree. Rather, it is dependently adjoined 
at the root of the elementary tree that heads the adverbial yesterday, the latter directly 
adjoining into the main verb tree. To restrict both of the ill-formed sentences, then, 
a restriction must be placed not only on adjoining the goal adverbial in a resembles 
context, but also in the yesterday adverbial context. But this constraint is too strong, as 
it disallows sentence (2b) above as well. 
The problem is that the standard derivation does not correctly reflect the syn- 
tactic relation between the adverbial modifier and the phrase it modifies when there 
are multiple modifications in a single clause. In such a case, each of the adverbials 
independently modifies the verb, and this should be reflected in their independent 
adjunction at the same point. But this is specifically disallowed in a standard deriva- 
tion. 
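The contrast can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: derivation nodes are encoded as (tree name, list of (address, child) arcs), the tree names and the address labels "VP" and "root" are hypothetical stand-ins for elementary trees and Gorn addresses, and a selective adjoining constraint is modeled simply as a forbidden (parent, child) pair.

```python
# A derivation node is (tree_name, [(address, child_node), ...]).
# FORBIDDEN holds (parent_tree, auxiliary_tree) pairs ruled out by a
# selective adjoining constraint; the names are hypothetical labels.
FORBIDDEN = {("resembled", "towards")}

def violations(node):
    """Collect every parent-to-child arc that breaks a constraint."""
    name, arcs = node
    found = []
    for _address, child in arcs:
        if (name, child[0]) in FORBIDDEN:
            found.append((name, child[0]))
        found.extend(violations(child))
    return found

# Sentence (3b) under the standard definition: "towards" adjoins at the
# root of "yesterday", which in turn adjoins into the verb tree.
dependent_3b = ("resembled",
                [("VP", ("yesterday", [("root", ("towards", []))]))])

# Sentence (3b) with both adverbials adjoined at the verb tree's VP node.
independent_3b = ("resembled",
                  [("VP", ("yesterday", [])), ("VP", ("towards", []))])

# Sentence (2b), which should remain acceptable.
independent_2b = ("walked",
                  [("VP", ("yesterday", [])), ("VP", ("towards", []))])

print(violations(dependent_3b))
print(violations(independent_3b))
print(violations(independent_2b))
```

In this toy encoding the constraint on the verb tree never sees the towards tree in the dependent derivation, whereas the independent derivation exposes the offending arc directly and leaves sentence (2b) untouched.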
Another example along the same lines follows from the requirement that tense 
as manifested in a verb group be consistent with temporal adjuncts. For instance, 
consider the following examples: 
4. a. Brockway walked his Labrador yesterday.
   b. # Brockway will walk his Labrador yesterday.
5. a. # Brockway walked his Labrador tomorrow.
   b. Brockway will walk his Labrador tomorrow.
Again, the relationship is independent of other intervening adjuncts. 
6. a. Brockway walked his Labrador towards the yacht club yesterday.
   b. # Brockway will walk his Labrador towards the yacht club yesterday.
7. a. # Brockway walked his Labrador towards the yacht club tomorrow.
   b. Brockway will walk his Labrador towards the yacht club tomorrow.
It is important to note that these arguments apply specifically to auxiliary trees that 
correspond to a modification relationship. Auxiliary trees are used in TAG typically 
6 Whether the adjunction occurs at the VP node or the S node is immaterial to the argument. 
for predication relations as well,⁷ as in the case of raising and sentential complement
constructions.⁸ Consider the following sentences. (The brackets mark the leaves of the
pertinent trees to be combined by adjunction in the assumed analysis.)
8. a. Brockway assumed that Harrison wanted to walk his Labrador.
   b. [Brockway assumed that] [Harrison wanted] [to walk his Labrador]
9. a. Brockway wanted to try to walk his Labrador.
   b. [Brockway wanted] [to try] [to walk his Labrador]
10. a. Harrison wanted Brockway tried to walk his Labrador.
    b. [Harrison wanted] [Brockway tried] [to walk his Labrador]
11. a. Harrison wanted to assume that Brockway walked his Labrador.
    b. [Harrison wanted] [to assume that] [Brockway walked his Labrador]
Assume (following, for instance, the analysis of Kroch and Joshi [1985]) that the trees
associated with the various forms of the verbs try, want, and assume all take senten- 
tial complements, certain of which are tensed with overt subjects and others untensed 
with empty subjects. The auxiliary trees for these verbs specify by adjoining constraints 
which type of sentential complement they take: assume requires tensed complements, 
want and try untensed. Under this analysis the auxiliary trees must not be allowed to 
independently adjoin at the same node. For instance, if trees corresponding to "Harri- 
son wanted" and "Brockway tried" (which both require untensed complements) were 
both adjoined at the root of the tree for "to walk his Labrador," the selective adjoin- 
ing constraints would be satisfied, yet the generated sentence (10a) is ungrammatical. 
Conversely, under independent adjunction, sentence (11a) would be deemed ungram- 
matical, although it is in fact grammatical. Thus, the case of predicative trees is entirely 
unlike that of modifier trees. Here, the standard notion of derivation is exactly what 
is needed as far as interpretation of adjoining constraints is concerned. 
An alternative would be to modify the way in which adjoining constraints are 
updated upon adjunction. If after adjoining a modifier tree at a node, the adjoining 
constraints of the original node, rather than those of the root and foot of the modifier 
tree, are manifest in the corresponding nodes in the derived tree, the adjoining con- 
straints would propagate appropriately to handle the examples above. This alternative 
leads, however, to a formalism for which derivation trees are no longer context-free, 
with concomitant difficulties in designing parsing algorithms. Instead, the extended 
definition of derivation effectively allows use of a Kleene-* in the "grammar" of deriva- 
tion trees. 
Adjoining constraints can also be implemented using feature structure equations 
(Vijay-Shanker and Joshi 1988). It is possible that judicious use of such techniques 
might prevent the particular problems noted here. Such an encoding of a solution 
requires consideration of constraints that pass among many trees just to limit the co- 
occurrence of a pair of trees. However, it more closely follows the spirit of TAGs to 
state such intuitively local limitations locally. 
7 We use the term 'predication' in its logical sense, that is, for auxiliary trees that serve as logical 
predicates over the trees into which they adjoin, in contrast to the term's linguistic sub-sense in which 
the argument of the predicate is a linguistic subject. 
8 The distinction between predicative and modifier trees has been proposed previously for purely 
linguistic reasons by Kroch (1989), who refers to them as complement and athematic trees, respectively. 
The arguments presented here can be seen as providing further evidence for differentiating the two 
kinds of auxiliary trees. A precursor to this idea can perhaps be seen in the distinction between 
repeatable and nonrepeatable adjunction in the formalism of string adjunct grammars, a precursor of 
TAGs (Joshi, Kosaraju, and Yamada 1972b, pages 253-254). 
In summary, the interpretation of adjoining constraints in TAG is sensitive to the 
particular notion of derivation that is used. Therefore, it can be used as a litmus 
test for an appropriate definition of derivation. As such, it argues for a nonstandard 
independent notion of derivation for modifier auxiliary trees and a standard dependent 
notion for predicative trees. 
3.2 Adding Statistical Parameters 
In a similar vein, the statistical parameters of a stochastic lexicalized TAG (SLTAG) 
(Resnik 1992; Schabes 1992) specify the probability of adjunction of a given auxiliary 
tree at a specific node in another tree. This specification may again be interpreted 
with regard to differing derivations, obviously with differing impact on the resulting 
probabilities assigned to derivation trees. (In the extreme case, a constraint prohibiting 
adjoining corresponds to a zero probability in an SLTAG. The relation to the argument 
in the previous section follows thereby.) Consider a case in which linguistic modifi- 
cation of noun phrases by adjectives is modeled by adjunction of a modifying tree. 
Under the standard definition of derivation, multiple modifications of a single NP 
would lead to dependent adjunctions in which a first modifier adjoins at the root of 
a second. As an example, we consider again the grammar given in Figure 1, which
admits of derivations for the strings "baked red potato" and "baked red pepper."
Specifying adjunction probabilities on standard derivations, the distinction between
the overall probabilities for these two strings depends solely on the adjunction
probabilities of β_re (the tree for red) into α_po and α_pe (those for potato and pepper, respectively),
as the tree β_b for the word baked is adjoined at the root of β_re in both
standard derivations. In the extended derivations, on the other hand, both modifying
trees are adjoined independently into the noun trees. Thus, the overall probabilities 
are determined as well by the probabilities of adjunction of the trees for baked into the 
nominal trees. It seems intuitively plausible that the most important relationships to 
characterize statistically are those between modifier and modified, rather than between 
two modifiers.⁹ In the case at hand, the fact that one typically refers to the process
of cooking potatoes as "baking," whereas the appropriate term for the corresponding 
cooking process applied to peppers is "roasting," would be more determining of the 
expected overall probabilities. 
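A toy calculation illustrates the point. The Python sketch below is a deliberate simplification and not the SLTAG model itself: it scores a derivation as the product of one probability per adjunction arc, ignores addresses and the probabilities of not adjoining, and uses invented numbers. The table `P` and the helper names are hypothetical.

```python
# Hypothetical adjunction probabilities P[(modifier, head)]; the numbers
# are invented for illustration and are not estimated from any corpus.
P = {
    ("red", "potato"): 0.10, ("red", "pepper"): 0.10,
    ("baked", "potato"): 0.20, ("baked", "pepper"): 0.01,
    ("baked", "red"): 0.05,
}

def prob(node):
    """Product of the adjunction probabilities on every arc of a derivation."""
    name, children = node
    p = 1.0
    for child in children:
        p *= P[(child[0], name)] * prob(child)
    return p

def standard(noun):
    # baked adjoins at the root of red, which adjoins into the noun tree
    return (noun, [("red", [("baked", [])])])

def extended(noun):
    # baked and red adjoin independently into the noun tree
    return (noun, [("red", []), ("baked", [])])

for noun in ("potato", "pepper"):
    print(noun, prob(standard(noun)), prob(extended(noun)))
```

With these made-up values the standard derivations cannot distinguish the two strings, because the only noun-sensitive parameter is that of red, while the extended derivations let the baked-potato versus baked-pepper preference influence the result.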
Note again that the distinction between modifier and predicative trees is important. 
The standard definition of derivation is entirely appropriate for adjunction probabili- 
ties for predicative trees, but not for modifier trees. 
3.3 Adding Semantics 
Finally, the formation of synchronous TAGs has been proposed to allow use of TAGs 
in semantic interpretation, natural language generation, and machine translation. In 
previous work (Shieber and Schabes 1990), the definition of synchronous TAG deriva- 
tion is given in a manner that requires multiple adjunctions at a single node. The need 
for such derivations follows from the fact that synchronous derivations are intended 
to model semantic relationships. In cases of multiple adjunction of modifier trees at 
9 Intuition is an appropriate guide in the design of the SLTAG framework, as the idea is to set up a 
linguistically plausible infrastructure on top of which a lexically based statistical model can be built. In 
addition, suggestive (though certainly not conclusive) evidence along these lines can be gleaned from 
corpus analyses. For instance, in a simple experiment in which medium-frequency triples of exactly
the discussed form "(adjective) (adjective) (noun)" were examined, the mean mutual information 
between the first adjective and the noun was found to be larger than that between the two adjectives. 
The statistical assumptions behind this particular experiment do not allow very robust conclusions to 
be drawn, and more work is needed along these lines. 
a single node, the appropriate semantic relationships comprise separate modifications 
rather than cascaded ones, and this is reflected in the definition of synchronous TAG 
derivation.¹⁰ Because of this, a parser for synchronous TAGs must recover, at least
implicitly, the extended derivations of TAG-derived trees. Shieber (in press) provides 
a more complete discussion of the relationship between synchronous TAGs and the 
extended definition of derivation with special emphasis on the ramifications for formal 
expressivity. 
Note that the independence of the adjunction of modifiers in the syntax does not 
imply that semantically there is no precedence or scoping relation between them. As 
exemplified in Figure 5, the derived tree generated by multiple independent adjunc- 
tions at a single node still manifests nesting relationships among the adjoined trees. 
This fact may be used to advantage in the semantic half of a synchronous tree-adjoining 
grammar to specify the semantic distinction between, for example, the following two 
sentences:¹¹
12. a. Brockway ran over his polo mallet twice intentionally.
    b. Brockway ran over his polo mallet intentionally twice.
We hope to address this issue in greater detail in future work on synchronous tree- 
adjoining grammars. 
3.4 Desired Properties of Extended Derivations 
We have presented several arguments that the standard notion of derivation does not 
allow for an appropriate specification of dependencies to be captured. An extended 
notion of derivation is needed that 
1. differentiates predicative and modifier auxiliary trees;
2. requires dependent derivations for predicative trees;
3. allows independent derivations for modifier trees; and
4. unambiguously and nonredundantly specifies a derived tree.
Furthermore, following from considerations of the role of modifier trees in a grammar 
as essentially optional and freely applicable elements, we would like the following 
criterion to hold of extended derivations: 
5. If a node can be modified at all, it can be modified any number of times,
   including zero times.
Recall that a derivation tree (as traditionally conceived) is a tree with unordered 
arcs where each node is labeled by an elementary tree of a TAG and each arc is labeled 
by a tree address specifying a node in the parent tree. In a standard derivation tree 
no two sibling arcs can be labeled with the same address. In an extended derivation 
tree, however, the condition is relaxed: No two sibling arcs to predicative trees can be 
10 The importance of the distinction between predicative and modifier trees with respect to how 
derivations are defined was not appreciated in the earlier work; derivations were taken to be of the 
independent variety in all cases. In future work, we plan to remedy this flaw. 
11 We are indebted to an anonymous reviewer of an earlier version of this paper for raising this issue 
crisply through examples similar to those given here. 
labeled with the same address. Thus, for any given address there can be at most one 
predicative tree and several modifier trees adjoined at that node. As we have seen, this 
relaxed definition violates the fourth desideratum above; for instance, the derivation 
tree in Figure 3(b) ambiguously specifies both derived trees in Figure 2. In the next 
section we provide a formal definition of extended derivations that satisfies all of the 
criteria above. 
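The two sibling-arc conditions can be stated as simple checks. The following Python sketch assumes a hypothetical encoding of derivation trees as (tree name, list of (address, child) arcs), with addresses as tuples, and assumes that the classification of auxiliary trees as predicative is supplied as a set of names; none of these names comes from the definitions above.

```python
def is_standard(node):
    """No two sibling arcs anywhere in the tree share an address."""
    _name, arcs = node
    addresses = [addr for addr, _child in arcs]
    return (len(addresses) == len(set(addresses))
            and all(is_standard(child) for _addr, child in arcs))

def is_extended(node, predicative):
    """No two sibling arcs to predicative trees share an address."""
    _name, arcs = node
    addresses = [addr for addr, child in arcs if child[0] in predicative]
    return (len(addresses) == len(set(addresses))
            and all(is_extended(child, predicative) for _addr, child in arcs))

# Figure 3(a): dependent derivation.
fig_3a = ("a_pe", [((1,), ("b_re", [((), ("b_ro", []))]))])
# Figure 3(b): two modifier trees at address 1 of the initial tree.
fig_3b = ("a_pe", [((1,), ("b_ro", [])), ((1,), ("b_re", []))])
# Two predicative trees adjoined at the root of the same tree
# (the tree names here are hypothetical).
two_predicates = ("to_walk", [((), ("wanted", [])), ((), ("tried", []))])

PREDICATIVE = {"wanted", "tried"}  # assumed classification

print(is_standard(fig_3a), is_extended(fig_3a, PREDICATIVE))
print(is_standard(fig_3b), is_extended(fig_3b, PREDICATIVE))
print(is_standard(two_predicates), is_extended(two_predicates, PREDICATIVE))
```

The checks only enforce the address conditions; they say nothing about which derived tree an accepted derivation tree denotes, which is the remaining problem taken up by the formal definition.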
4. Formal Definition of Extended Derivations 
In this section we introduce a new framework for describing TAG derivation trees that 
allows for a natural expression of both standard and extended derivations, and makes 
available even more fine-grained restrictions on derivation trees. First, we define or- 
dered derivation trees and show that they unambiguously but redundantly specify 
derivations.¹² We characterize the redundant trees as those related by a sibling
swapping operation. Derivation trees proper are then taken to be the equivalence classes of
ordered derivation trees in which the equivalence relation is generated by the sibling 
swapping. By limiting the underlying set of ordered derivation trees in various ways, 
Vijay-Shanker's definition of derivation tree, a precise form of the extended definition, 
and many other definitions of derivation can be characterized in this way. 
4.1 Ordered Derivation Trees 
Ordered derivation trees, like the traditional derivation trees, are trees with nodes
labeled by elementary trees where each arc is labeled with an address in the tree for 
the parent node of the arc. However, the arcs are taken to be ordered with respect to 
each other. 
An ordered derivation tree is well-formed if for each of its arcs, linking parent
node labeled γ to child node labeled γ′ and itself labeled with address t, the tree γ′
is an auxiliary tree that can be adjoined at the node t in the tree γ. (Alternatively, if
substitution is allowed, γ′ may be an initial tree that can be substituted at the node t
in γ. Later definitions ignore this possibility, but are easily generalized.)
We define the function 𝒟 from ordered derivation trees to the derived trees they
specify, according to the following recursive definition:

$$
\mathcal{D}(D) =
\begin{cases}
\gamma & \text{if $D$ is a trivial tree of one node labeled with the elementary tree $\gamma$} \\[4pt]
\gamma[\mathcal{D}(D_1)/t_1, \mathcal{D}(D_2)/t_2, \ldots, \mathcal{D}(D_k)/t_k] & \text{if $D$ is a tree with root node labeled with the elementary tree $\gamma$} \\
 & \text{and with $k$ child subtrees $D_1, \ldots, D_k$} \\
 & \text{whose arcs are labeled with addresses $t_1, \ldots, t_k$}
\end{cases}
$$

Here γ[A_1/t_1, ..., A_k/t_k] specifies the simultaneous adjunction of trees A_1 through A_k
at t_1 through t_k, respectively, in γ. It is defined as the iterative adjunction of the A_i in
order at their respective addresses, with appropriate updating of the tree addresses of
any later adjunction to reflect the effect of earlier adjunctions that occur at addresses
dominating the address of the later adjunction.
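The recursive definition, including the address updating, can be sketched in Python as follows. This is one possible reading under stated assumptions, not a definitive implementation: trees are (label, children) pairs with foot labels ending in "*", addresses are tuples of 1-based child indices, a pending address is updated only when an earlier adjunction site strictly dominates it (so that a later adjunction at the same address lands on top of the earlier one), and the compound-noun tree `a_bp` is a hypothetical addition used to exercise the updating.

```python
def leaf(word):
    return (word, ())

def subtree(tree, addr):
    for i in addr:
        tree = tree[1][i - 1]
    return tree

def replace(tree, addr, new):
    if not addr:
        return new
    label, kids = tree
    i = addr[0] - 1
    return (label, kids[:i] + (replace(kids[i], addr[1:], new),) + kids[i + 1:])

def plug_foot(aux, sub):
    label, kids = aux
    if label.endswith("*"):
        return sub
    return (label, tuple(plug_foot(k, sub) for k in kids))

def adjoin(tree, addr, aux):
    return replace(tree, addr, plug_foot(aux, subtree(tree, addr)))

def foot_address(tree, prefix=()):
    """Address of the foot node of an auxiliary tree, or None if absent."""
    label, kids = tree
    if label.endswith("*"):
        return prefix
    for i, kid in enumerate(kids, start=1):
        found = foot_address(kid, prefix + (i,))
        if found is not None:
            return found
    return None

def shift(addr, site, foot):
    """Update a pending address after an adjunction at site: nodes strictly
    below site now sit below the foot of the adjoined tree."""
    if len(addr) > len(site) and addr[:len(site)] == site:
        return site + foot + addr[len(site):]
    return addr

def derived(node, grammar):
    """Derived tree of an ordered derivation tree (name, [(addr, child)])."""
    name, arcs = node
    tree = grammar[name]
    pending = [(addr, derived(child, grammar)) for addr, child in arcs]
    while pending:
        (site, aux), rest = pending[0], pending[1:]
        tree = adjoin(tree, site, aux)
        foot = foot_address(aux)
        pending = [(shift(addr, site, foot), a) for addr, a in rest]
    return tree

def words(tree):
    label, kids = tree
    return [label] if not kids else [w for k in kids for w in words(k)]

GRAMMAR = {
    "a_pe": ("NP", (("N", (leaf("pepper"),)),)),
    # Hypothetical tree with nested N nodes at addresses (1,) and (1, 2).
    "a_bp": ("NP", (("N", (("N", (leaf("bell"),)), ("N", (leaf("pepper"),)))),)),
    "b_ro": ("N", (("Adj", (leaf("roasted"),)), ("N*", ()))),
    "b_re": ("N", (("Adj", (leaf("red"),)), ("N*", ()))),
}

ro, re = ("b_ro", []), ("b_re", [])
same_1 = derived(("a_pe", [((1,), re), ((1,), ro)]), GRAMMAR)
same_2 = derived(("a_pe", [((1,), ro), ((1,), re)]), GRAMMAR)
nested_1 = derived(("a_bp", [((1,), ro), ((1, 2), re)]), GRAMMAR)
nested_2 = derived(("a_bp", [((1, 2), re), ((1,), ro)]), GRAMMAR)
print(" ".join(words(same_1)), "|", " ".join(words(same_2)))
print(nested_1 == nested_2)
```

In this sketch, sibling order matters when the two arcs carry the same address, but not when one address dominates the other, since the pending address is shifted through the foot of the tree adjoined above it.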
12 Historical precedent for independent derivation and the associated ordered derivation trees can be 
found in the derivation trees postulated for string adjunct grammars (Joshi, Kosaraju, and Yamada 
1972a, 99-100). In this system, siblings in derivation trees are viewed as totally, not partially, ordered. 
The systematic ambiguity introduced thereby is eliminated by stipulating that the sibling order be 
consistent with an arbitrary ordering on adjunction sites. 
4.2 Derivation Trees 
It is easy to see that the derived tree specified by a given ordered derivation tree is 
unchanged if adjacent siblings whose arcs are labeled with different tree addresses are 
swapped. (This is not true of adjacent siblings whose arcs are labeled with the same 
address.) That is, if t ≠ t′ then γ[..., A/t, B/t′, ...] = γ[..., B/t′, A/t, ...]. A graphical
"proof" of this intuitive fact is given in Figure 4. A formal proof, although tedious and 
unenlightening, is possible as well. We provide it in an appendix, primarily because 
the definitional aspects of the TAG formulation may be of some interest. 
This fact about the swapping of adjacent siblings shows that ordered derivation 
trees possess an inherent redundancy. The order of adjacent sibling subtrees labeled 
with different tree addresses is immaterial. Consequently, we can define true derivation 
trees to be the equivalence classes of the base set of ordered derivation trees under the 
equivalence relation generated by the sibling subtree swapping operation above. This 
is a well-formed definition by virtue of the proposition argued informally above. 
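One way to compute with these equivalence classes is to pick a canonical representative of each. The Python sketch below assumes a hypothetical encoding of ordered derivation trees as (tree name, list of (address, child) arcs) with addresses as tuples. Since swaps are permitted only between adjacent siblings with different addresses, the relative order of siblings sharing an address is invariant across a class; a stable sort of the siblings by address therefore gives the same result for every member of a class. The choice of sort key is arbitrary and is an assumption of the sketch.

```python
def canonical(node):
    """Canonical representative of the class of an ordered derivation tree."""
    name, arcs = node
    kids = [(addr, canonical(child)) for addr, child in arcs]
    # list.sort is stable, so siblings with the same address keep their order.
    kids.sort(key=lambda arc: arc[0])
    return (name, tuple(kids))

def equivalent(d1, d2):
    """True if d1 and d2 are related by a sequence of permitted swaps."""
    return canonical(d1) == canonical(d2)

A, B, C = ("b_a", []), ("b_b", []), ("b_c", [])  # hypothetical auxiliary trees

# Siblings at different addresses may be reordered freely.
print(equivalent(("g", [((1,), A), ((2,), B)]),
                 ("g", [((2,), B), ((1,), A)])))

# Siblings at the same address may not.
print(equivalent(("g", [((1,), A), ((1,), C)]),
                 ("g", [((1,), C), ((1,), A)])))

# B can move past both A and C, but A and C keep their relative order.
print(equivalent(("g", [((1,), A), ((2,), B), ((1,), C)]),
                 ("g", [((2,), B), ((1,), A), ((1,), C)])))
```

Comparing canonical forms avoids enumerating the members of a class, which can be numerous when many siblings carry distinct addresses.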
This definition generalizes the traditional definition in not restricting the tree ad- 
dress labels in any way. It therefore satisfies criterion (3) of Section 3.4. Furthermore, by 
virtue of the explicit quotient with respect to sibling swapping, a derivation tree under 
this definition unambiguously and nonredundantly specifies a derived tree (criterion
(4)). It does not, however, differentiate predicative from modifier trees (criterion (1)), nor
can it therefore mandate dependent derivations for predicative trees (criterion (2)). 
This general approach can, however, be specialized to correspond to several pre- 
vious definitions of derivation tree. For instance, if we further restrict the base set 
of ordered derivation trees so that no two siblings are labeled with the same tree 
address, then the equivalence relation over these ordered derivation trees allows for 
full reordering of all siblings. Clearly, these equivalence classes are isomorphic to the 
unordered trees, and we have reconstructed Vijay-Shanker's standard definition of 
derivation tree. 
If we instead restrict ordered derivation trees so that no two siblings corresponding 
to predicative trees are labeled with the same tree address, then we have reconstructed 
a version of the extended definition argued for in this paper. Under this restriction, 
criteria (1) and (2) are satisfied, while maintaining (3) and (4). 
By careful selection of other constraints on the base set, other linguistic restrictions 
might be imposed on derivation trees, still using the same definition of derivation trees 
as equivalence classes over ordered derivation trees. In the next section, we show that 
the definition of the previous paragraph should be further restricted to disallow the 
reordering of predicative and modifier trees. We also describe other potential linguistic 
applications of the ability to finely control the notion of derivation through the use of 
ordered derivation trees. 
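The quotient by sibling swapping can be made concrete with a small sketch. The representation below (the class `DNode` and the functions `canonical` and `equivalent`) is hypothetical and not part of the formal definition; it only illustrates that, because adjacent siblings may be swapped exactly when their addresses differ, a stable sort of siblings by address yields one canonical representative per equivalence class.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DNode:
    """Node of an ordered derivation tree (hypothetical representation)."""
    tree: str  # name of the elementary tree at this node
    # Ordered list of (address, subtree) pairs, one per adjoined or substituted tree.
    children: List[Tuple[str, "DNode"]] = field(default_factory=list)


def canonical(node: DNode) -> tuple:
    """Return a canonical representative of the equivalence class of `node`.

    Swapping is allowed only between adjacent siblings with distinct addresses,
    so two sibling sequences are equivalent exactly when, for every address,
    they list the children at that address in the same relative order.
    Python's sorted() is stable, so sorting by address keeps that order.
    """
    kids = sorted(node.children, key=lambda child: child[0])
    return (node.tree, tuple((addr, canonical(sub)) for addr, sub in kids))


def equivalent(a: DNode, b: DNode) -> bool:
    """True if the two ordered derivation trees denote the same derivation."""
    return canonical(a) == canonical(b)
```

Under this sketch, restricting the base set (for example, forbidding two siblings with the same address) changes which ordered trees are admitted but leaves the comparison itself untouched.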
4.3 Further Restrictions on Extended Derivations 
The extended definition of derivation tree given in the previous section effectively 
specifies the output derived tree by adding a partial ordering on sibling arcs that 
correspond to modifier trees adjoined at the same address. All other arcs are effectively 
unordered (in the sense that all relative orderings of them exist in the equivalence 
class). 
Assume that in a given tree $\gamma$, at a particular address t, the k modifier trees $\mu_1, \ldots, \mu_k$
are directly adjoined in that order. Associated with the subtrees rooted at the k elementary
auxiliary trees in this derivation are k derived auxiliary trees ($A_1, \ldots, A_k$,
respectively). The derived tree specified by this derivation tree, according to the definition
of $\mathcal{D}$ given above, would have the derived tree $A_1$ directly below $A_2$ and so
forth, with $A_k$ at the top. Now suppose that in addition, a predicative tree $\pi$ is also
[Figure 4: diagrams (a), (b), and (c) not reproduced.]
Figure 4 
A graphical proof of the irrelevance of adjacent sibling swapping. 
These diagrams show the effect of performing two adjunctions (of auxiliary trees depicted, 
one as dark-shaded and one light-shaded), presumed to be specified by adjacent siblings in an 
ordered derivation tree. The adjunctions are to occur at two addresses (referred to in this 
caption as t and t', respectively). The two addresses must be such that either (a) they are 
distinct but neither dominates the other, (b) t dominates t' (or vice versa), or (c) they are 
identical. In case (a) the diagram shows that either order of adjunction yields the same 
derived tree. Adjunction at t and then t' corresponds to the upper arrows, adjunction at t' and 
then t the lower arrows. Similarly, in case (b), adjunction at t followed by adjunction at an 
appropriately updated t' yields the same result as adjunction first at t' and then at t. Clearly, 
adjunctions occurring before these two or after do not affect the interchangeability. Thus, if 
two adjacent siblings in a derivation tree specify adjunctions at distinct addresses t and t', the 
adjunctions can occur in either order. Diagram (c) demonstrates that this is not the case when 
t and t' are the same. 
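[Figure 5: (a) schematic derivation tree rooted at $\gamma$ with children $\mu_1, \ldots, \mu_k, \pi$; (b) the associated derived tree. Diagrams not reproduced.]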
Figure 5 
Schematic extended derivation tree and associated derived tree. 
In a derived tree, the predicative tree adjoined at an address t is required to follow all 
modifier trees adjoined at the same address, as in (a). The derived tree therefore appears as 
depicted in (b) with the predicative tree outermost. 
adjoined at address t. It must be ordered with respect to the $\mu_i$ in the derivation tree,
and its relative order determines where in the bottom-to-top order in the derived tree
the tree $A_\pi$ associated with the subderivation rooted at $\pi$ goes.
The question that we raise here is whether all k + 1 possible placements of the tree
$\pi$ relative to the $\mu_i$ are linguistically reasonable. We might allow all k + 1 orderings
(as in the definition of the previous section), or we might restrict them by requiring, 
say, that the predicative tree always be adjoined before, or perhaps after, any modifier 
trees at a given address. We emphasize that this is a linguistic question, in the sense 
that the definition of extended derivation is well formed whatever decision is made 
on this question. 
Henceforth, we will assume that predicative trees are always adjoined after any 
modifier trees at the same address, so that they appear above the modifier trees in the 
derived tree. We call this "outermost predication" because a predicative tree appears 
wrapped around the outside of the modifier trees adjoined at the same address. (See 
Figure 5.) If we were to mandate innermost predication, in which a predicative tree 
is always adjoined before the modifier trees at the same address, the predicative tree 
would appear within all of the modifier trees, innermost in the derived tree. 
Linguistically, the outermost method specifies that if both a predicative tree and a 
modifier tree are adjoined at a single node, then the predicative tree attaches higher 
than the modifier tree; in terms of the derived tree, it is as if the predicative tree 
were adjoined at the root of the modifier tree. This accords with the semantic intuition 
that in such a case (for English at least), the modifier is modifying the original tree, 
not the predicative one. (The alternate "reading," in which the modifier modifies the 
predicative tree, is still obtainable under an outermost-predication standard by having 
the modifier auxiliary tree adjoin dependently at the root node of the predicative tree.) 
In contrast, the innermost-predication method specifies that the modifier tree attaches 
higher, as if the modifier tree adjoined at the root of the predicative tree and was 
therefore modifying the predicative tree, contra semantic intuitions. 
For this reason, we specify that outermost predication is mandated. This is easily 
done by further limiting the base set of ordered derivation trees to those in which 
predicative trees are ordered after modifier tree siblings. 
(From a technical standpoint, by the way, the outermost-predication method has 
the advantage that it requires no changes to the parsing rules to be presented later, 
but only a single addition. The innermost-predication method induces some subtle 
interactions between the original parsing rules and the additional one, necessitating 
a much more complicated set of modifications to the original algorithm. In fact, the 
complexities in generating such an algorithm constituted the precipitating factor that 
led us to revise our original innermost-predication attempt at redefining tree-adjoining 
derivation. The linguistic argument, although commanding, became clear to us only 
later.) 
Another possibility, which we mention but do not pursue here, is to allow for 
language-particular precedence constraints to restrict the possible orderings of deriva- 
tion-tree siblings, in a manner similar to the linear precedence constraints of ID/LP 
format (Gazdar, Klein, Pullum, and Sag 1985) but at the level of derivation trees. 
These might be interpreted as hard constraints or soft orderings depending on the 
application. This more fine-grained approach to the issue of ordering has several ap- 
plications. Soft orderings might be used to account for ordering preferences among 
modifiers, such as the default ordering of English adjectives that accounts for the typ- 
ical preference for "a large red ball" over "? a red large ball" and the typical ordering 
of temporal before spatial adverbial phrases in German. 
Similarly, hard constraints might allow for the handling of an apparent counter- 
example to the outermost-predication rule. 13 One natural analysis of the sentence 
13. At what time did Brockway say Harrison arrived? 
would involve adjunction of a predicative tree for the phrase "did Brockway say" at 
the root of the tree for "Harrison arrived." A Wh modifier tree "at what time" must 
be adjoined in as well. The example question is ambiguous, of course, as to whether 
it questions the time of the saying or of the arriving. In the former case, the modifier 
tree presumably adjoins at the root of the predicative tree for "did Brockway say" that 
it modifies. In the latter case, which is of primary interest here, it must adjoin at the 
root of the tree for "Harrison arrived." Thus, both trees would be adjoined at the same 
address, and the outermost-predication rule would predict the derived sentence to be 
"Did Brockway say at what time Harrison arrived." To get around this problem, we 
might specify hard ordering constraints for English that place all Wh modifier trees 
after all predicative trees, which in turn come after all non-Wh modifier trees. This 
would place the Wh modifier outermost as required. 
Although we find this extra flexibility to be an attractive aspect of this approach, 
we stay with the more stringent outermost-predication restriction in the material that 
follows. 
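The ordering restrictions discussed in this section can be sketched as a check on sibling sequences. The encoding below is only an illustration under assumed names: the kind labels (`modifier`, `predicative`, `wh_modifier`), the rank tables, and the function `respects_precedence` are hypothetical. A larger rank means "adjoined later," that is, higher in the derived tree. The check covers ordering only; the separate requirement that at most one predicative tree adjoin at an address is not tested here.

```python
from typing import Dict, List, Tuple

# Outermost predication: predicative trees follow modifier trees at the same address.
OUTERMOST: Dict[str, int] = {"modifier": 0, "predicative": 1}

# The hard-constraint variant sketched for English Wh modifiers:
# non-Wh modifiers, then predicative trees, then Wh modifiers.
WH_VARIANT: Dict[str, int] = {"modifier": 0, "predicative": 1, "wh_modifier": 2}


def respects_precedence(siblings: List[Tuple[str, str]], rank: Dict[str, int]) -> bool:
    """Check one sibling sequence of an ordered derivation tree.

    `siblings` lists (address, kind) pairs in derivation-tree order.  The
    sequence is admitted if, among siblings sharing an address, ranks never
    decrease.  Siblings at different addresses are not constrained.
    """
    last_rank: Dict[str, int] = {}
    for address, kind in siblings:
        r = rank[kind]
        if r < last_rank.get(address, r):
            return False
        last_rank[address] = r
    return True
```

With `OUTERMOST`, a sequence placing a predicative tree before a modifier at the same address is rejected; with `WH_VARIANT`, a Wh modifier may follow the predicative tree, which corresponds to placing the Wh modifier outermost.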
13 Other solutions are possible that do not require extended derivations or linear precedence constraints. 
For instance, we might postulate an elementary tree for the verb arrived that includes a substitution 
node for a fronted adverbial Wh phrase. 
5. Compilation of TAGs to Linear Indexed Grammars 
In this section we present a technique for compiling tree-adjoining grammars into 
linear indexed grammars such that the linear indexed grammar makes explicit the 
extended derivations of the TAG. This compilation plays two roles. First, it provides 
for a simple proof of the generative equivalence of TAGs under the standard and 
extended definitions of derivation, as described at the end of this section. Second, it 
can be used as the basis for a parsing algorithm that recovers the extended derivations 
for strings. The design of such an algorithm is the topic of Section 6. 
Linear indexed grammars (LIG) constitute a grammatical framework based, like 
context-free, context-sensitive, and unrestricted rewriting systems, on rewriting strings 
of nonterminal and terminal symbols. Unlike these systems, linear indexed grammars, 
like the indexed grammars from which they are restricted, allow stacks of marker 
symbols, called indices, to be associated with the nonterminal symbols being rewritten. 
The linear version of the formalism allows the full index information from the parent 
to be used to specify the index information for only one of the child constituents. 
Thus, a linear indexed production can be given schematically as: 
$$N_0[..\beta_0] \rightarrow N_1[\beta_1] \cdots N_{s-1}[\beta_{s-1}]\; N_s[..\beta_s]\; N_{s+1}[\beta_{s+1}] \cdots N_k[\beta_k]$$
The $N_i$ are nonterminals, the $\beta_i$ strings of indices. The ".." notation stands for the
remainder of the stack below the given string of indices. Note that only one element
on the right-hand side, $N_s$, inherits the remainder of the stack from the parent. (This
schematic rule is intended to be indicative, not definitive. We ignore issues such as
the optionality of the inherited stack, how terminal symbols fit in, and so forth.
Vijay-Shanker and Weir [1990] present a complete discussion.)
Vijay-Shanker and Weir (1990) present a way of specifying any TAG as a linear 
indexed grammar. The LIG version makes explicit the standard notion of derivation 
being presumed. Also, the LIG version of a TAG grammar can be used for recognition 
and parsing. Because the LIG formalism is based on augmented rewriting, the parsing 
algorithms can be much simpler to understand and easier to modify, and no loss of 
generality is incurred. For these reasons, we use the technique in this work. 
The compilation process that manifests the standard definition of derivation can 
be most easily understood by viewing nodes in a TAG elementary tree as having 
both a top and bottom component, identically marked for nonterminal category, that 
dominate (but may not immediately dominate) each other. (See Figure 6.) The rewrite 
rules of the corresponding linear indexed grammar capture the immediate domination 
between a bottom node and its child top nodes directly, and capture the domination 
between top and bottom parts of the same node by optionally allowing rewriting from 
the top of a node to an appropriate auxiliary tree, and from the foot of the auxiliary 
tree back to the bottom of the node. The index stack keeps track of the nodes on which 
adjunction has occurred so that the recognition to the left and the right of the foot 
node will occur under identical assumption of derivation structure. 
The TAG grammar is encoded as a LIG with two nonterminal symbols t and b cor- 
responding to the top and bottom components, respectively, of each node. The stack 
indices correspond to the individual nodes of the elementary trees of the TAG gram- 
mar. Thus, there are as many stack index symbols as there are nodes in the elementary 
trees of the grammar, and each such index (i.e., node) corresponds unambiguously to 
a single address in a single elementary tree. (In fact, the symbols can be thought of as 
pairs of an elementary tree identifier and an address within that tree, and our imple- 
mentation encodes them in just that way.) The index at the top of the stack corresponds 
[Figure 6: diagram not reproduced; its surviving labels mark Type 3 and Type 4 productions.]
Figure 6 
Schematic structure of adjunction with top and bottom of each node separated. 
[Figure 7: diagram with nodes $\eta_1$, $\eta_2$, $\eta_3$ not reproduced.]
Figure 7
A stack of indices $[\eta_1\eta_2\eta_3]$ captures the adjunction history that led to the reaching of the node
$\eta_3$ in the parsing process.
Parsing of an elementary tree $\alpha$ proceeded to node $\eta_1$ in that tree, at which point
adjunction of the tree containing $\eta_2$ was pursued by the parser. When the node $\eta_2$ was
reached, the tree containing $\eta_3$ was implicitly adjoined. Once this latter tree is completely
parsed, the remainder of the tree containing $\eta_2$ can be parsed from that point, and so on.
to the node being rewritten. Thus, a LIG nonterminal with stack $t[\eta]$ corresponds to
the top component of node $\eta$, and $b[\eta_1\eta_2\eta_3]$ corresponds to the bottom component of
$\eta_3$. The indices $\eta_1$ and $\eta_2$ capture the history of adjunctions that are pending completion
of the tree in which $\eta_3$ is a node. Figure 7 depicts the interpretation of a stack of
indices.
In summary, given a tree-adjoining grammar, the following LIG rules are generated:
1. Immediate domination dominating foot: For each auxiliary tree node $\eta$ that
dominates the foot node, with children $\eta_1, \ldots, \eta_s, \ldots, \eta_n$, where $\eta_s$ is the
child that also dominates the foot node, include a production
$$b[..\eta] \rightarrow t[\eta_1] \cdots t[\eta_{s-1}]\, t[..\eta_s]\, t[\eta_{s+1}] \cdots t[\eta_n].$$
2. Immediate domination not including foot: For each elementary tree node $\eta$
that does not dominate a foot node, with children $\eta_1, \ldots, \eta_n$, include a
production
$$b[\eta] \rightarrow t[\eta_1] \cdots t[\eta_n].$$
3. No adjunction: For each elementary tree node $\eta$ that is not marked for
substitution or obligatory adjunction, include a production
$$t[..\eta] \rightarrow b[..\eta].$$
4. Start root of adjunction: For each elementary tree node $\eta$ on which the
auxiliary tree $\beta$ with root node $\eta_r$ can be adjoined, include the following
production:
$$t[..\eta] \rightarrow t[..\eta\eta_r].$$
5. Start foot of adjunction: For each elementary tree node $\eta$ on which the
auxiliary tree $\beta$ with foot node $\eta_f$ can be adjoined, include the following
production:
$$b[..\eta\eta_f] \rightarrow b[..\eta].$$
6. Start substitution: For each elementary tree node $\eta$ marked for
substitution on which the initial tree $\alpha$ with root node $\eta_r$ can be
substituted, include the production
$$t[\eta] \rightarrow t[\eta_r].$$
We will refer to productions generated by Rule i above as Type i productions. For
example, Type 3 productions are of the form $t[..\eta] \rightarrow b[..\eta]$. For further information
concerning the compilation see Vijay-Shanker and Weir (1990). For present purposes, it 
is sufficient to note that the method directly embeds the standard notion of derivation 
in the rewriting process. To perform an adjunction, we move (by Rule 4) from the 
node adjoined at to the top of the root of the auxiliary tree. At the root, additional 
adjunctions might be performed. When returning from the foot of the auxiliary tree 
back to the node where adjunction occurred, rewriting continues at the bottom of the 
node (see Rule 5), not the top, so that no more adjunctions can be started at that node. 
Thus, the dependent nature of predicative adjunction is enforced because only a single 
adjunction can occur at any given node. 
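The six rule types can be illustrated with a short generator. This is a minimal sketch under stated assumptions, not the implementation referred to in the text: the `Node` class, the `adjoin` and `subst` tables mapping a node name to the trees that may be adjoined or substituted there, and the string rendering of productions are all hypothetical. Terminal leaves are handled by an assumed `word` field, since the schematic rules leave terminals unspecified.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    name: str                                  # unique stack index for this node
    children: List["Node"] = field(default_factory=list)
    foot: bool = False                         # foot node of an auxiliary tree
    subst: bool = False                        # marked for substitution
    oa: bool = False                           # marked for obligatory adjunction
    word: Optional[str] = None                 # terminal symbol, if a terminal leaf


def dominates_foot(n: Node) -> bool:
    return n.foot or any(dominates_foot(c) for c in n.children)


def find_foot(n: Node) -> Optional[Node]:
    if n.foot:
        return n
    for c in n.children:
        f = find_foot(c)
        if f is not None:
            return f
    return None


def walk(n: Node) -> Iterator[Node]:
    yield n
    for c in n.children:
        yield from walk(c)


def rhs_symbol(c: Node) -> str:
    """Render a child: terminals as themselves, the spine child with '..'."""
    if c.word is not None:
        return c.word
    return f"t[..{c.name}]" if dominates_foot(c) else f"t[{c.name}]"


def compile_tag(roots: Dict[str, Node],
                adjoin: Dict[str, List[str]],
                subst: Dict[str, List[str]]) -> List[Tuple[int, str]]:
    """Return (type, production) pairs for rule Types 1 through 6."""
    rules: List[Tuple[int, str]] = []
    for root in roots.values():
        for n in walk(root):
            if n.word is not None:
                continue  # terminals only appear on right-hand sides
            if n.children:
                on_spine = dominates_foot(n)
                lhs = f"b[..{n.name}]" if on_spine else f"b[{n.name}]"
                rhs = " ".join(rhs_symbol(c) for c in n.children)
                rules.append((1 if on_spine else 2, f"{lhs} -> {rhs}"))
            if not n.subst and not n.oa:
                rules.append((3, f"t[..{n.name}] -> b[..{n.name}]"))
            for beta in adjoin.get(n.name, []):
                r = roots[beta]
                f = find_foot(r)
                assert f is not None, "adjoinable trees must have a foot node"
                rules.append((4, f"t[..{n.name}] -> t[..{n.name} {r.name}]"))
                rules.append((5, f"b[..{n.name} {f.name}] -> b[..{n.name}]"))
            for alpha in subst.get(n.name, []):
                rules.append((6, f"t[{n.name}] -> t[{roots[alpha].name}]"))
    return rules


# Toy grammar (hypothetical): initial tree a0 -> x, auxiliary tree b0 -> b1* y.
ALPHA = Node("a0", [Node("a1", word="x")])
BETA = Node("b0", [Node("b1", foot=True), Node("b2", word="y")])
TOY_RULES = compile_tag({"alpha": ALPHA, "beta": BETA}, {"a0": ["beta"]}, {})
```

In the toy grammar, the Type 4 rule leaves `t[..a0]` for the root of the auxiliary tree and the Type 5 rule returns to `b[..a0]`, so no second adjunction can start at `a0`, as the paragraph above describes.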
In order to permit extended derivations, we must allow for multiple modifier tree 
adjunctions at a single node. There are two natural ways this might be accomplished, 
as depicted in Figure 8. 
1. Modified start foot of adjunction rule: Allow moving from the bottom of the
foot of a modifier auxiliary tree to the top (rather than the bottom) of the 
node at which it adjoined (Figure 8b). 
2. Modified start root of adjunction rule: Allow moving from the bottom (rather 
than the top) of a node to the top of the root of a modifier auxiliary tree 
(Figure 8c). 
As can be seen from the figures, both of these methods allow recursion at a node, 
unlike the original method depicted in Figure 8a. Thus multiple modifier trees are 
allowed to adjoin at a single node. Note that since predicative trees fall under the 
original rules, at most a single predicative tree can be adjoined at a node. The two 
[Figure 8: diagrams (a), (b), and (c) not reproduced.]
Figure 8 
Schematic structure of possible predicative and modifier adjunctions with top and bottom of 
each node separated. 
methods correspond exactly to the innermost- and outermost-predication methods 
discussed in Section 4.3. For the reasons described there, the latter is preferred. 14
In summary, independent derivation structures can be allowed for modifier aux- 
iliary trees by starting the adjunction process from the bottom, rather than the top of 
a node for those trees. Thus, we split Type 4 LIG productions into two subtypes for 
predicative and modifier trees, respectively. 
4a. Start root of predicative adjunction: For each elementary tree node $\eta$ on
which the predicative auxiliary tree $\beta$ with root node $\eta_r$ can be adjoined,
include the following production:
$$t[..\eta] \rightarrow t[..\eta\eta_r].$$
4b. Start root of modifier adjunction: For each elementary tree node $\eta$ on which
the modifier auxiliary tree $\beta$ with root node $\eta_r$ can be adjoined, include
the following production:
$$b[..\eta] \rightarrow t[..\eta\eta_r].$$
Once this augmentation has been made, we no longer need to allow for adjunctions at 
the root nodes of modifier auxiliary trees, as repeated adjunction is now allowed for 
14 The more general definition allowing predicative trees to occur anywhere within a sequence of modifier adjunctions would be achieved by adding both types of rules. 
by the new rule 4b. Consequently, grammars should forbid adjunction of a modifier
tree $\beta_1$ at the root of a modifier tree $\beta_2$ except where $\beta_1$ is intended to modify $\beta_2$
directly.
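The effect of splitting Type 4 can be sketched by tracking whether rewriting at a single node currently sits at its top or its bottom component. The functions `start_root_rule` and `licensed` below are hypothetical illustrations. The input to `licensed` lists the trees adjoined at one node from outermost to innermost in the derived tree, with `P` for a predicative tree and `M` for a modifier tree.

```python
def start_root_rule(node: str, aux_root: str, modifier: bool) -> str:
    """Render the Type 4a (predicative) or Type 4b (modifier) production."""
    lhs = "b" if modifier else "t"
    return f"{lhs}[..{node}] -> t[..{node} {aux_root}]"


def licensed(top_down: str) -> bool:
    """Can this sequence of adjunctions at one node be rewritten by the rules?

    Rewriting starts at the top component.  Type 4a starts only from the top,
    and the matching Type 5 rule returns to the bottom.  Type 4b starts from
    the bottom (reached via Type 3 if necessary), and Type 5 returns to the
    bottom again, so modifier adjunction can repeat.
    """
    state = "t"
    for kind in top_down:
        if kind == "P":
            if state != "t":
                return False  # no predicative adjunction once at the bottom
            state = "b"
        elif kind == "M":
            state = "b"
        else:
            raise ValueError(f"unknown kind: {kind!r}")
    return True
```

The licensed sequences are exactly those with at most one `P`, occurring first: the predicative tree is outermost and any number of modifier trees lie beneath it.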
This simple modification to the compilation process from TAG to LIG fully spec- 
ifies the modified notion of derivation. Note that the extra criterion (5) noted in Sec- 
tion 3.4 is satisfied by this definition: modifier adjunctions are inherently repeatable 
and eliminable as the movement through the adjunction "loop" ends up at the same 
point that it begins. The recognition algorithms for TAG based on this compilation, 
however, must be adjusted to allow for the new rule types. 
This compilation makes possible a simple proof of the weak-generative equivalence
of TAGs under the standard and extended derivations. 15 Call the set of languages
generable by a TAG under the standard definition of derivation $\mathrm{TAL}_s$ and under the
extended definition $\mathrm{TAL}_e$. Clearly, $\mathrm{TAL}_s \subseteq \mathrm{TAL}_e$ since the standard definition can be
mimicked by making all auxiliary trees predicative. The compilation above provides
the inclusion $\mathrm{TAL}_e \subseteq \mathrm{LIL}$, where LIL is the set of linear indexed languages. The final
inclusion $\mathrm{LIL} \subseteq \mathrm{TAL}_s$ has been shown indirectly by Vijay-Shanker (1987) using embedded
push-down automata and modified head grammars as intermediaries. From
these inclusions, we can conclude that $\mathrm{TAL}_s = \mathrm{TAL}_e$.
6. Recognition and Parsing 
A recognition algorithm for TAGs can be constructed based on the above translation 
into corresponding LIGs as specified by Rules 1 through 6 in the previous section. The 
algorithm is not a full recognition algorithm for LIGs, but rather, is tuned for exactly 
the types of rules generated as output of this compilation process. In this section, we 
present the recognition algorithm and modify it to work with the extended derivation 
compilation. 
We will use the following notations in this and later sections. The symbol P will 
serve as a variable over the two LIG grammar nonterminals t and b. The substring of 
the string $w_1 \cdots w_n$ being parsed between indices i and j will be notated as $w_{i+1} \cdots w_j$,
which we take to be the empty string when i is greater than or equal to j. We will use
$\Gamma$, $\Delta$, and $\Theta$ for sequences containing terminals and LIG nonterminals with their stack
specifications. For instance, $\Gamma$ might be $t[\eta_1]\,t[..\eta_2]\,t[\eta_3]$.
The parsing algorithm can be seen as a tabular parsing method based on deduction 
of items, as in Earley deduction (Pereira and Warren 1983). We will so describe it, by 
presenting inference rules over items of the form 
$$\langle P[\eta] \rightarrow \Gamma \bullet \Delta,\ i, j, k, l \rangle.$$
Such items play the role of the items of Earley's algorithm. Unlike the items of Earley's
algorithm, however, an item of this form does not embed a grammar rule proper; that
is, $P[\eta] \rightarrow \Gamma\Delta$ is not necessarily a rule of the grammar. Rather, it is what we will call
a reduced rule; for reasons described below, the nonterminals in $\Gamma$ and $\Delta$ as well as
the nonterminal $P[\eta]$ record only the top element of each stack of indices. We will use
the notation $\overline{P[\eta] \rightarrow \Gamma\Delta}$ for the unreduced form of the rule whose reduced form is
$P[\eta] \rightarrow \Gamma\Delta$. For instance, the rule specified by the notation $\overline{t[\eta_1] \rightarrow t[\eta_2]}$ might be the
rule $t[..\eta_1] \rightarrow t[..\eta_1\eta_2]$. The reader can easily verify that the TAG to LIG compilation is
such that there is a one-to-one correspondence between the generated rules and their
reduced form. Consequently, this notation is well defined.
15 We are grateful to K. Vijay-Shanker for bringing this point to our attention. 
The dot in the items is analogous to that found in Earley and LR items as well. It 
serves as a marker for how far recognition has proceeded in identifying the subconstituents
for this rule. The indices i, j, k, and l specify the portion of the string $w_1 \cdots w_n$
covered by the recognition of the item. The substring between i and l (i.e., $w_{i+1} \cdots w_l$)
has been recognized, perhaps with a region between j and k where the foot of the tree
below the node $\eta$ has been recognized. (If the foot node is not dominated by $\Gamma$, we
take the values of j and k to be the dummy value '-'.)
6.1 The Inference Rules 
In this section, we specify several inference rules for parsing a LIG generated from a 
TAG, which we recall in this section. One explanatory comment is in order, however, 
before the rules are presented. The rules of a LIG associate with each constituent a 
nonterminal and a stack of indices. It seems natural for a parsing algorithm to maintain 
this association by building items that specify for each constituent the full information 
of nonterminal and index stack. However, this would necessitate storing an unbounded 
amount of information for each potential constituent, resulting in a parsing algorithm 
that is potentially quite inefficient when nondeterminism arises during the parsing 
process, and perhaps noneffective if the grammar is infinitely ambiguous. Instead, the 
parse items manipulated by the inference rules that we present do not keep all of 
this information for each constituent. Rather, the items keep only the single top stack 
element for each constituent (in addition to the nonterminal symbol). This drastically 
decreases the number of possible items and accounts for the polynomial character of 
the resultant algorithm. 16 Side conditions make up for some of the loss of information, 
thereby maintaining correctness. For instance, the Type 4 Completor rule specifies a 
relation between $\eta$ and $\eta_f$ that takes the place of popping an element off of the stack
associated with $\eta$. However, the side conditions are strictly weaker than maintaining
full stack information. Consequently, the algorithm, though correct, does not maintain 
the valid prefix property. See Schabes (1991) for further discussion and alternatives. 
Scanning and prediction work much as in Earley's original algorithm. 
• Scanner: 
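$$\frac{\langle b[\eta] \rightarrow \Gamma \bullet a\Delta,\ i, j, k, l \rangle}{\langle b[\eta] \rightarrow \Gamma a \bullet \Delta,\ i, j, k, l+1 \rangle} \quad a = w_{l+1}$$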
Note that the only rules that need be considered are those where the 
parent is a bottom node, as terminal symbols occur on the right-hand 
side only of Type 1 or 2 productions. Otherwise, the rule is exactly as 
that for Earley's algorithm except that the extra foot indices (j and k) are 
carried along. 
• Predictor: 
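$$\frac{\langle P[\eta] \rightarrow \Gamma \bullet P'[\eta']\,\Delta,\ i, j, k, l \rangle}{\langle P'[\eta'] \rightarrow \bullet\,\Theta,\ l, -, -, l \rangle} \quad \overline{P'[\eta'] \rightarrow \Theta}$$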
This rule serves to form predictions for any type production in the 
grammar, as the variables P and P' range over the values t and b. In the 
16 Vijay-Shanker and Weir (1990) first proposed the recording of only the top stack element in order to 
achieve efficient parsing. The algorithm they presented is a bottom-up general LIG parsing algorithm. 
Schabes (1991) sketches a proof of an $O(n^6)$ bound for an Earley-style algorithm for TAG parsing that
is more closely related to the algorithm proposed here. 
predicted item, the foot is not dominated by the (empty) recognized 
input, so that the dummy value '-' is used for the foot indices. Note that 
the predicted item records the reduced form of an unreduced rule
$\overline{P'[\eta'] \rightarrow \Theta}$ of the grammar.
Completion of items (moving of the dot from left to right over a nonterminal) 
breaks up into several cases, depending on which production type is being completed. 
This is because the addition of the extra indices and the separate interpretations for 
top and bottom productions require differing index manipulations to be performed. 
We will list the various steps, organized by what type of production they participate
in the completion of.
Productions that specify immediate domination (from Rules 1 and 2) are completed
whenever the top of the child node is fully recognized.
• Type 1 and 2 Completor:
$$\frac{\langle b[\eta_1] \rightarrow \Gamma \bullet t[\eta]\,\Delta,\ m, j', k', i \rangle \quad \langle t[\eta] \rightarrow \Theta\,\bullet,\ i, j, k, l \rangle}{\langle b[\eta_1] \rightarrow \Gamma\, t[\eta] \bullet \Delta,\ m, j \cup j', k \cup k', l \rangle}$$
Here, $t[\eta]$ has been fully recognized as the substring between i and l. The
item expecting $t[\eta]$ can be completed. One of the two antecedent items
might also dominate the foot node of the tree to which $\eta$ and $\eta_1$ belong,
and would therefore have indices for the foot substring. The operations
$j \cup j'$ and $k \cup k'$ are used to specify whichever of j or j' (and respectively
for k or k') contain foot substring indices. The formal definition of $\cup$ is as
follows:
$$j \cup j' = \begin{cases} j & \text{if } j' = - \\ j' & \text{if } j = - \\ j & \text{if } j' = j \\ \text{undefined} & \text{otherwise} \end{cases}$$
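The foot-index operation is small enough to state directly in code. In this sketch the name `foot_union` is hypothetical, `None` stands in for the dummy value '-', and the undefined case is signaled with an exception.

```python
from typing import Optional


def foot_union(j: Optional[int], j_prime: Optional[int]) -> Optional[int]:
    """Combine two foot indices, at most one of which should carry information.

    None plays the role of the dummy value '-'.  If both indices are bound
    they must agree; otherwise the operation is undefined and the completor
    rule using it does not apply.
    """
    if j_prime is None:
        return j
    if j is None:
        return j_prime
    if j == j_prime:
        return j
    raise ValueError(f"foot indices {j} and {j_prime} are incompatible")
```

A completor built on this function would treat the raised error as "no consequent item."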
The remaining rules (3 through 6) are each completed by a particular completion 
instance. 
• Type 3 Completor: 
$$\frac{\langle t[\eta] \rightarrow \bullet\, b[\eta],\ i, -, -, i \rangle \quad \langle b[\eta] \rightarrow \Theta\,\bullet,\ i, j, k, l \rangle}{\langle t[\eta] \rightarrow b[\eta]\,\bullet,\ i, j, k, l \rangle}$$
This rule is used to complete a prediction that no (predicative)
adjunction occurs at node $\eta$. Once the part of the string dominated by
$b[\eta]$ has been found, as evidenced by the second antecedent item, the
prediction of no adjunction can be completed.
• Type 4 Completor:
$$\frac{\langle t[\eta] \rightarrow \bullet\, t[\eta_r],\ i, -, -, i \rangle \quad \langle t[\eta_r] \rightarrow \Theta\,\bullet,\ i, j, k, l \rangle \quad \langle b[\eta] \rightarrow \Delta\,\bullet,\ j, p, q, k \rangle}{\langle t[\eta] \rightarrow t[\eta_r]\,\bullet,\ i, p, q, l \rangle} \quad t[..\eta] \rightarrow t[..\eta\eta_r]$$
Here, an adjunction has been predicted at $\eta$, and the adjoined derived
tree (between $t[\eta]$ and $b[\eta]$) and the derived material that $\eta$ itself
dominates (below $b[\eta]$) have both been completed. Thus $t[\eta]$ is
completely recognized. Note that the side condition (the unreduced form
of the reduced rule in the first antecedent item) is placed merely to
guarantee that $\eta_r$ is the root node of an adjoinable auxiliary tree.
• Type 5 Completor:
$$\frac{\langle b[\eta_f] \rightarrow \bullet\, b[\eta],\ i, -, -, i \rangle \quad \langle b[\eta] \rightarrow \Theta\,\bullet,\ i, j, k, l \rangle}{\langle b[\eta_f] \rightarrow b[\eta]\,\bullet,\ i, i, l, l \rangle} \quad b[..\eta\eta_f] \rightarrow b[..\eta]$$
When adjunction has been performed and recognition up to the foot
node $\eta_f$ has been performed, it is necessary to recognize all the material
under the foot node. When that is done, the foot node prediction can be
completed. Note that it must be possible to have adjoined the auxiliary
tree at node $\eta$ as specified in the production in the side condition.
• Type 6 Completor: 
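$$\frac{\langle t[\eta] \rightarrow \bullet\, t[\eta_r],\ i, -, -, i \rangle \quad \langle t[\eta_r] \rightarrow \Theta\,\bullet,\ i, -, -, l \rangle}{\langle t[\eta] \rightarrow t[\eta_r]\,\bullet,\ i, -, -, l \rangle} \quad t[\eta] \rightarrow t[\eta_r]$$
Completion of the material below the root node $\eta_r$ of an initial tree
allows for the completion of the node at which substitution occurred.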
The recognition process for a string $w_1 \cdots w_n$ starts with some items that serve as
axioms for these inference rules. For each rule $t[\eta_s] \rightarrow \Gamma$ where $\eta_s$ is the root node of
an initial tree whose node is labeled with the start nonterminal, the item
$\langle t[\eta_s] \rightarrow \bullet\,\Gamma,\ 0, -, -, 0 \rangle$ is an axiom. If from these axioms an item of the form
$\langle t[\eta_s] \rightarrow \Gamma\,\bullet,\ 0, -, -, n \rangle$ can
be proved according to the rules of inference above, the string is accepted; otherwise
it is rejected.
Alternatively, the axioms can be stated as if there were extra rules $S \rightarrow t[\eta_s]$ for
each $\eta_s$ a start-nonterminal-labeled root node of an initial tree. In this case, the axioms
are items of the form $\langle S \rightarrow \bullet\, t[\eta_s],\ 0, -, -, 0 \rangle$ and the string is accepted upon proving
$\langle S \rightarrow t[\eta_s]\,\bullet,\ 0, -, -, n \rangle$. In this case, an extra prediction and completion rule is needed
just for these rules, since the normal rules do not allow S on the left-hand side. This
point is taken up further in Section 6.4.
Generation of items can be cached in the standard way for inference-based parsing 
algorithms (Shieber 1992); this leads to a tabular or chart-based parsing algorithm. 
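The caching of items amounts to computing the closure of the axioms under the inference rules with a chart and an agenda. The sketch below shows only that control structure, under assumed names (`close`, `step`), and exercises it with a deliberately simple toy rule (composition of pairs) rather than the TAG inference rules themselves.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def close(axioms: Iterable[Hashable],
          step: Callable[[Hashable, Set[Hashable]], Iterable[Hashable]]) -> Set[Hashable]:
    """Compute the set of items provable from `axioms`.

    `step(item, chart)` must return every consequent obtainable by combining
    `item` with items already in `chart` (which includes `item` itself).
    Each item enters the chart once, so repeated derivations are not redone.
    """
    chart: Set[Hashable] = set()
    agenda = deque(axioms)
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue  # already derived: this is the caching step
        chart.add(item)
        agenda.extend(list(step(item, chart)))
    return chart


def compose(item, chart):
    """Toy inference rule: from (a, b) and (b, c) conclude (a, c)."""
    a, b = item
    out = []
    for c, d in chart:
        if b == c:
            out.append((a, d))
        if d == a:
            out.append((c, b))
    return out
```

A recognizer in this style would supply the axioms of the previous paragraphs and a `step` function that tries the scanner, predictor, and completor rules against the chart.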
6.2 The Algorithm Invariant 
The algorithm maintains an invariant that holds of all items added to the chart. We 
will describe the invariant using some additional notational conventions. Recall that 
$\overline{P[\eta] \rightarrow \Gamma}$ is the LIG production in the grammar whose reduced form is $P[\eta] \rightarrow \Gamma$. The
notation $\Gamma[\gamma]$, where $\gamma$ is a sequence of stack symbols (i.e., nodes), specifies the sequence
$\Gamma$ with $\gamma$ replacing the occurrence of .. in the stack specifications. For example, if $\Gamma$
is the sequence $t[\eta_1]\,t[..\eta_2]\,t[\eta_3]$, then $\Gamma[\gamma] = t[\eta_1]\,t[\gamma\eta_2]\,t[\eta_3]$. A single LIG derivation step
will be notated with $\Rightarrow$ and its reflexive transitive closure with $\Rightarrow^*$.
The invariant specifies that $\langle P[\eta] \rightarrow \Gamma \bullet \Delta,\ i, j, k, l \rangle$ is in the chart only if 17
1. If node $\eta$ dominates the foot node $\eta_f$ of the tree to which it belongs, then
there exists a string of stack symbols (i.e., nodes) $\gamma$ such that
(a) $\overline{P[\eta] \rightarrow \Gamma\Delta}$ is a LIG rule in the grammar, where $\overline{\Gamma}$ is the
unreduced form of $\Gamma$.
(b) $\overline{\Gamma}[\gamma] \Rightarrow^* w_{i+1} \cdots w_j\, b[\gamma\eta_f]\, w_{k+1} \cdots w_l$
(c) $b[\gamma\eta_f] \Rightarrow^* w_{j+1} \cdots w_k$
2. If node $\eta$ does not dominate the foot node $\eta_f$ of the tree to which it
belongs or there is no foot node in the tree, then
(a) $\overline{P[\eta] \rightarrow \Gamma\Delta}$ is a LIG rule in the grammar, where $\overline{\Gamma}$ is the
unreduced form of $\Gamma$.
(b) $\overline{\Gamma} \Rightarrow^* w_{i+1} \cdots w_l$
(c) j and k are not bound.
According to this invariant, for a node $\eta_s$ that is the root of an initial tree, the item
$\langle t[\eta_s] \rightarrow \Gamma\,\bullet,\ 0, -, -, n \rangle$ is in the chart only if $t[\eta_s] \Rightarrow \overline{\Gamma} \Rightarrow^* w_1 \cdots w_n$. Thus, soundness of
the algorithm as a recognizer follows.
6.3 Modifications for Extended Derivations 
Extending the algorithm to allow for the new types of production (specifically, as 
derived by Rule 4b) requires adding a completion rule for Type 4b productions. For 
the new type of production, a completion rule of the following form is required: 
• Type 4b Completor: 
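$$\frac{\langle b[\eta] \rightarrow \bullet\, t[\eta_r],\ i, -, -, i \rangle \quad \langle t[\eta_r] \rightarrow \Theta\,\bullet,\ i, j, k, l \rangle \quad \langle b[\eta] \rightarrow \Delta\,\bullet,\ j, p, q, k \rangle}{\langle b[\eta] \rightarrow t[\eta_r]\,\bullet,\ i, p, q, l \rangle} \quad b[..\eta] \rightarrow t[..\eta\eta_r]$$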
In addition to being able to complete Type 4b items, we must also be able to 
complete other items using completed Type 4b items. This is an issue in particular for
completor rules that might move their dot over a $b[\eta]$ constituent, namely the
Type 3 and 5 Completors. However, these rules have been stated so that the antecedent
item with right-hand side $b[\eta]$ already matches Type 4b items. Furthermore, the general
statement, including index manipulation is still appropriate in the context of Type 4b 
productions. Thus, no further changes to the recognition inference rules are needed 
for this purpose. 
17 The invariant is not stated as a biconditional because this would require strengthening of the 
antecedent condition. The natural strengthening, following the standard for Earley's algorithm, would 
be to add a requirement that the item be consistent with left context, as 
(d) 7/s ~* Wl"" wiP\[7"q\] 
but this is too strong. This condition implies that the algorithm possesses the valid prefix property, 
which it does not. The exact statement of the invariant condition that would allow for exact 
specifications of the item semantics is the topic of ongoing research. However, the current specification 
is sufficient for proving soundness of the algorithm. 
Yves Schabes and Stuart M. Shieber Tree-Adjoining Derivation 
However, a bit of care must be taken in the interpretation of the Type 1/2 Completor.
Type 4b items that require completion bear a superficial resemblance to Type 1
and 2 items, in that both have a constituent of the form $t[\_]$ after the dot. In Type 4b
items, the constituent is $t[\eta_r]$; in Type 1 and 2 items, it is $t[\eta]$. But it is crucial that the Type 1/2
Completor not be used to complete Type 4b items. A simple distinguishing characteristic
is that in Type 1 and 2 items to be completed, the node $\eta$ after the dot is never a
root node (as it is immediately dominated by $\eta_1$), whereas in Type 4b items, the node
$\eta_r$ after the dot is always a root node (of a modifier tree). Simple side conditions can
distinguish the cases.
Figure 9 contains the final versions of the inference rules for recognition of LIGs 
corresponding to extended TAG derivations. 
6.4 Maintaining Derivation Structures 
One of the intended applications for extended derivation TAG parsing is the parsing 
of synchronous TAGs. Especially important in this application is the ability to generate 
the derivation trees while parsing proceeds. 
A synchronous TAG is composed of two base TAGs (which we will call the source 
TAG and the target TAG) whose elementary trees have been paired one-to-one. A syn- 
chronous TAG whose source TAG is a grammar for a fragment of English and whose 
target TAG is a grammar for a logical form language may be used to generate logical 
forms for each sentence of English that the source grammar admits (Shieber and Sch- 
abes 1990). Similarly, with source and target swapped, the synchronized grammar may 
be used to generate English sentences corresponding to logical forms (Shieber and Sch- 
abes 1991). If the source and target grammars specify fragments of natural languages, 
an automatic translation system is specified (Abeillé, Schabes, and Joshi 1990).
Abstractly viewed, the processing of a synchronous grammar proceeds by parsing 
an input string according to the source grammar, thereby generating a derivation 
tree for the string; mapping the derivation tree into a derivation tree for the target 
grammar; and generating a derived tree (hence, derived string) according to the target 
grammar. 
One frequent worry about synchronous TAGs as used in their semantic interpreta- 
tion mode is whether it is possible to perform incremental interpretation. The abstract 
view of processing just presented seems to require that a full derivation tree be de- 
veloped before interpretation into the logical form language can proceed. Incremental 
interpretation, on the other hand, would allow partial interpretation results to guide 
the parsing process on-line, thereby decreasing the nondeterminism in the parsing 
process. Whether incremental interpretation is possible depends precisely on the ex- 
tent to which the three abstract phases of synchronous TAG processing can in fact be 
interleaved. In previous work we left this issue open. In this section, we allay these 
worries by showing how the extended TAG parser just presented can build derivation 
trees incrementally as parsing proceeds. Once this has been demonstrated, it should 
be obvious that these derivation trees could be transferred to target derivation trees 
during the parsing process and immediately generated from. Thus, incremental inter- 
pretation is demonstrated to be possible in the synchronous TAG framework. In fact, 
the technique presented in this section has allowed for the first implementation of syn- 
chronous TAG processing, by Onnig Dombalagian. This implementation was directly 
based on the inference-based TAG parser mentioned in Section 6.5 and presented in 
full elsewhere (Schabes and Shieber 1992). 
We associate with each item a set of operations that have been implicitly carried 
out by the parser in recognizing the substring covered by the item. An operation can 
be characterized by a derivation tree and a tree address at which the derivation tree is 
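• Scanner:
\[
\frac{\langle b[\eta] \to \Gamma \bullet a\Delta,\ i, j, k, l \rangle}{\langle b[\eta] \to \Gamma a \bullet \Delta,\ i, j, k, l+1 \rangle} \qquad a = w_{l+1}
\]
• Predictor:
\[
\frac{\langle P[\eta] \to \Gamma \bullet P'[\eta']\Delta,\ i, j, k, l \rangle}{\langle P'[\eta'] \to \bullet\, \Theta,\ l, -, -, l \rangle} \qquad P'[\eta'] \to \Theta
\]
• Type 1 and 2 Completor:
\[
\frac{\langle b[\eta_1] \to \Gamma \bullet t[\eta]\Delta,\ m, j', k', i \rangle \quad \langle t[\eta] \to \Theta \bullet,\ i, j, k, l \rangle}{\langle b[\eta_1] \to \Gamma\, t[\eta] \bullet \Delta,\ m, j \cup j', k \cup k', l \rangle} \qquad \eta \text{ not a root node}
\]
• Type 3 Completor:
\[
\frac{\langle t[\eta] \to \bullet\, b[\eta],\ i, -, -, i \rangle \quad \langle b[\eta] \to \Theta \bullet,\ i, j, k, l \rangle}{\langle t[\eta] \to b[\eta] \bullet,\ i, j, k, l \rangle}
\]
• Type 4a Completor:
\[
\frac{\langle t[\eta] \to \bullet\, t[\eta_r],\ i, -, -, i \rangle \quad \langle t[\eta_r] \to \Theta \bullet,\ i, j, k, l \rangle \quad \langle b[\eta] \to \Delta \bullet,\ j, p, q, k \rangle}{\langle t[\eta] \to t[\eta_r] \bullet,\ i, p, q, l \rangle} \qquad t[..\eta] \to t[..\eta\eta_r]
\]
• Type 4b Completor:
\[
\frac{\langle b[\eta] \to \bullet\, t[\eta_r],\ i, -, -, i \rangle \quad \langle t[\eta_r] \to \Theta \bullet,\ i, j, k, l \rangle \quad \langle b[\eta] \to \Delta \bullet,\ j, p, q, k \rangle}{\langle b[\eta] \to t[\eta_r] \bullet,\ i, p, q, l \rangle} \qquad b[..\eta] \to t[..\eta\eta_r]
\]
• Type 5 Completor:
\[
\frac{\langle b[\eta_f] \to \bullet\, b[\eta],\ i, -, -, i \rangle \quad \langle b[\eta] \to \Theta \bullet,\ i, j, k, l \rangle}{\langle b[\eta_f] \to b[\eta] \bullet,\ i, i, l, l \rangle} \qquad b[..\eta\eta_f] \to b[..\eta]
\]
• Type 6 Completor:
\[
\frac{\langle t[\eta] \to \bullet\, t[\eta_r],\ i, -, -, i \rangle \quad \langle t[\eta_r] \to \Theta \bullet,\ i, -, -, l \rangle}{\langle t[\eta] \to t[\eta_r] \bullet,\ i, -, -, l \rangle} \qquad t[\eta] \to t[\eta_r]
\]
Figure 9
Inference rules for extended derivation TAG recognition.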
to be placed; it corresponds roughly to a branch of a derivation tree. Prediction items 
have the empty set of operations. Type 4 and 6 completion steps build new elements 
of the sets as they correspond to actually carrying out adjunction and substitution 
operations, respectively. Other completion steps merely pool the operations from their 
constituent parts. 
In describing the building of derivation trees, we will use normal set notation for 
the sets of derivation trees. We will assume that for each node $\eta$, there are functions
$tree(\eta)$ and $addr(\eta)$ that specify, respectively, the initial tree that $\eta$ occurs in and its
address in that tree. Finally, we will use a constructor function for derivation trees 
$deriv(\gamma, S)$, where $\gamma$ specifies an elementary tree and $S$ specifies a set of operations on
it. An operation is built with op(t, D) where t is a tree address and D is a derivation 
tree to be operated at that address. 
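The two constructors can be pictured as immutable records. The following Python sketch is only an illustration under our own assumptions: the class names `Deriv` and `Op`, the helper `complete_adjunction`, and the tree names `alpha_dog`, `beta_red`, and `beta_big` are hypothetical and do not come from the implementation discussed in this paper. It shows how a Type 4 style completion step could add one new operation to the set pooled from below.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Address = Tuple[int, ...]  # a tree address: a sequence of positive integers


@dataclass(frozen=True)
class Deriv:
    """deriv(gamma, S): an elementary tree plus the operations carried out on it."""
    tree: str
    ops: FrozenSet["Op"]


@dataclass(frozen=True)
class Op:
    """op(t, D): derivation tree D is operated at address t of the parent tree."""
    addr: Address
    deriv: Deriv


def deriv(tree, ops=()):
    return Deriv(tree, frozenset(ops))


def op(addr, d):
    return Op(tuple(addr), d)


def complete_adjunction(addr, aux_tree, s1, s2):
    """Build {op(addr, deriv(aux_tree, S1))} U S2, as in the Type 4 completors."""
    return frozenset({op(addr, deriv(aux_tree, s1))}) | frozenset(s2)


# Hypothetical example: two modifier trees adjoined at the same address (2,).
inner = complete_adjunction((2,), "beta_red", frozenset(), frozenset())
outer = complete_adjunction((2,), "beta_big", frozenset(), inner)
whole = deriv("alpha_dog", outer)
```

Because the records are frozen and hashable, sets of operations can be pooled with ordinary set union, which matches the use of normal set notation in the text.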
Figure 10 lists the previously presented recognition rules augmented to build 
derivation structures as the final component of each item. The axioms for this inference
system are items of the form $\langle S \to \bullet\, t[\eta_s],\ 0, -, -, 0, \{\} \rangle$, where we assume as in
Section 6.1 that there are extra rules $S \to t[\eta_s]$ for each $\eta_s$ a start-nonterminal-labeled
root node of an initial tree. We require an extra rule for prediction and completion to
handle this new type of rule. The predictor rule is the obvious analog:
• Start Rule Predictor:
\[
\frac{\langle S \to \Gamma \bullet P'[\eta']\Delta,\ i, j, k, l, S \rangle}{\langle P'[\eta'] \to \bullet\, \Theta,\ l, -, -, l, \{\} \rangle} \qquad P'[\eta'] \to \Theta
\]
In fact, the existing predictor rule could have been easily generalized to handle this 
case. 
The completor for these start rules is the obvious analog to a Type 6 completor, 
except in the handling of the derivation. It delivers, instead of a set of derivation 
operations, a single derivation tree. 
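• Start Rule Completor:
\[
\frac{\langle S \to \bullet\, t[\eta_s],\ i, -, -, i, \{\} \rangle \quad \langle t[\eta_s] \to \Theta \bullet,\ i, -, -, l, S \rangle}{\langle S \to t[\eta_s] \bullet,\ i, -, -, l, deriv(tree(\eta_s), S) \rangle}
\]
The string is accepted upon proving $\langle S \to t[\eta_s] \bullet,\ 0, -, -, n, D \rangle$, where $D$ is the
derivation developed during the parse.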
6.5 Complexity Considerations 
The inference system of Section 6.3 essentially specifies a parsing algorithm with
complexity of $O(n^6)$ in the length of the string. Adding explicit derivation structures to the
items, as in the inference system of the previous section, eliminates the polynomial 
character of the algorithm in that there may be an unbounded number of derivations 
corresponding to any given item of the original sort. Even for finitely ambiguous 
grammars, the number of derivations may be exponential. Nonetheless, this fact does 
not vitiate the usefulness of the second algorithm, which maintains derivations ex- 
plicitly. The point of this augmentation is to allow for incremental interpretation--for 
interleaved processing of a post-syntactic sort--so as to guide the parsing process in 
making choices on-line. By using the extra derivation information, the parser should 
be able to eliminate certain nondeterministic paths of computation; otherwise, there 
is no reason to do the interpretation incrementally. But this determinization of choice 
presumably decreases the complexity. Thus, the extra information is designed for use 
in cases where the full search space is not intended to be explored. 
Of course, a polynomial shared-forest representation of the exponential number 
of derivations could have been maintained (by maintaining back pointers among the 
items in the standard fashion). For performing incremental interpretation for the pur- 
pose of determinization of parsing, however, the non-shared representation is suffi- 
cient, and preferable on grounds of ease of implementation and expository conve- 
nience. 
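• Scanner:
\[
\frac{\langle b[\eta] \to \Gamma \bullet a\Delta,\ i, j, k, l, S \rangle}{\langle b[\eta] \to \Gamma a \bullet \Delta,\ i, j, k, l+1, S \rangle} \qquad a = w_{l+1}
\]
• Predictor:
\[
\frac{\langle P[\eta] \to \Gamma \bullet P'[\eta']\Delta,\ i, j, k, l, S \rangle}{\langle P'[\eta'] \to \bullet\, \Theta,\ l, -, -, l, \{\} \rangle} \qquad P'[\eta'] \to \Theta
\]
• Type 1 and 2 Completor:
\[
\frac{\langle b[\eta_1] \to \Gamma \bullet t[\eta]\Delta,\ m, j', k', i, S_1 \rangle \quad \langle t[\eta] \to \Theta \bullet,\ i, j, k, l, S_2 \rangle}{\langle b[\eta_1] \to \Gamma\, t[\eta] \bullet \Delta,\ m, j \cup j', k \cup k', l, S_1 \cup S_2 \rangle} \qquad \eta \text{ not a root node}
\]
• Type 3 Completor:
\[
\frac{\langle t[\eta] \to \bullet\, b[\eta],\ i, -, -, i, \{\} \rangle \quad \langle b[\eta] \to \Theta \bullet,\ i, j, k, l, S \rangle}{\langle t[\eta] \to b[\eta] \bullet,\ i, j, k, l, S \rangle}
\]
• Type 4a Completor:
\[
\frac{\langle t[\eta] \to \bullet\, t[\eta_r],\ i, -, -, i, \{\} \rangle \quad \langle t[\eta_r] \to \Theta \bullet,\ i, j, k, l, S_1 \rangle \quad \langle b[\eta] \to \Delta \bullet,\ j, p, q, k, S_2 \rangle}{\langle t[\eta] \to t[\eta_r] \bullet,\ i, p, q, l, \{op(addr(\eta), deriv(tree(\eta_r), S_1))\} \cup S_2 \rangle} \qquad t[..\eta] \to t[..\eta\eta_r]
\]
• Type 4b Completor:
\[
\frac{\langle b[\eta] \to \bullet\, t[\eta_r],\ i, -, -, i, \{\} \rangle \quad \langle t[\eta_r] \to \Theta \bullet,\ i, j, k, l, S_1 \rangle \quad \langle b[\eta] \to \Delta \bullet,\ j, p, q, k, S_2 \rangle}{\langle b[\eta] \to t[\eta_r] \bullet,\ i, p, q, l, \{op(addr(\eta), deriv(tree(\eta_r), S_1))\} \cup S_2 \rangle} \qquad b[..\eta] \to t[..\eta\eta_r]
\]
• Type 5 Completor:
\[
\frac{\langle b[\eta_f] \to \bullet\, b[\eta],\ i, -, -, i, \{\} \rangle \quad \langle b[\eta] \to \Theta \bullet,\ i, j, k, l, S \rangle}{\langle b[\eta_f] \to b[\eta] \bullet,\ i, i, l, l, S \rangle} \qquad b[..\eta\eta_f] \to b[..\eta]
\]
• Type 6 Completor:
\[
\frac{\langle t[\eta] \to \bullet\, t[\eta_r],\ i, -, -, i, \{\} \rangle \quad \langle t[\eta_r] \to \Theta \bullet,\ i, -, -, l, S \rangle}{\langle t[\eta] \to t[\eta_r] \bullet,\ i, -, -, l, \{op(addr(\eta), deriv(tree(\eta_r), S))\} \rangle} \qquad t[\eta] \to t[\eta_r]
\]
Figure 10
Inference rules for extended derivation TAG parsing.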
As a proof of concept, the parsing algorithm just described was implemented in 
Prolog on top of a simple, general-purpose, agenda-based inference engine. Encod- 
ings of explicit inference rules are essentially interpreted by the inference engine. The 
Prolog database is used as the chart; items not already subsumed by a previously gen- 
erated item are asserted to the database as the parser runs. An agenda of potential new 
items is maintained. Items are added to the agenda as inference rules are triggered by 
items added to the chart. Because the inference rules are stated explicitly, the relation 
between the abstract inference rules described in this paper and the implementation 
is extremely transparent. As a meta-interpreter, the prototype is not particularly efficient.
(In particular, the implementation does not achieve the theoretical $O(n^6)$ bound
on complexity, because of a lack of appropriate indexing.) Code for the prototype 
implementation is available for distribution electronically from the authors. 
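The chart-and-agenda control regime described above can be sketched in a few lines. The following is a Python illustration rather than the Prolog prototype itself, and every name in it (`run_agenda`, `combine`, the toy grammar `BINARY`) is a hypothetical stand-in. Subsumption is simplified to item equality, and the single toy rule is a CKY-style binary combination, not one of the LIG inference rules of Figures 9 and 10.

```python
from collections import deque


def run_agenda(axioms, rules):
    """Exhaust the agenda and return the chart (the set of all derived items)."""
    chart = set()
    agenda = deque(axioms)
    while agenda:
        item = agenda.popleft()
        if item in chart:            # already in the chart: nothing new follows
            continue
        chart.add(item)              # "assert" the item to the chart
        for rule in rules:           # the new chart item triggers the rules
            for new_item in rule(item, chart):
                if new_item not in chart:
                    agenda.append(new_item)
    return chart


# Hypothetical toy grammar with the single production S -> A B.
BINARY = {("A", "B"): "S"}


def combine(item, chart):
    """Toy rule: join two adjacent items (X, i, j) and (Y, j, l) into (Z, i, l)."""
    x, i, j = item
    for (y, k, l) in list(chart):
        if k == j and (x, y) in BINARY:      # trigger item is the left part
            yield (BINARY[(x, y)], i, l)
        if l == i and (y, x) in BINARY:      # trigger item is the right part
            yield (BINARY[(y, x)], k, j)


# Axioms for a two-word input whose words have categories A and B.
chart = run_agenda([("A", 0, 1), ("B", 1, 2)], [combine])
```

Stating each rule as a separate function that is triggered by a newly added item keeps the correspondence between abstract rule and code direct, at the cost of scanning the whole chart on every trigger; an indexed chart would be needed to approach the theoretical bound.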
7. Conclusion 
The precise formulation of derivation for tree-adjoining grammars has important rami- 
fications for a wide variety of uses of the formalism, from syntactic analysis to semantic 
interpretation and statistical language modeling. We have argued that the definition of 
tree-adjoining derivation must be reformulated in order to take greatest advantage of 
the decoupling of derivation tree and derived tree by manifesting the proper linguistic 
dependencies in derivations. The particular proposal is both precisely characterizable 
through a definition of TAG derivations as equivalence classes of ordered derivation 
trees and computationally operational by virtue of a compilation to linear indexed 
grammars together with an efficient algorithm for recognition and parsing according 
to the compiled grammar. 
Acknowledgments 
Order of authors is not intended as an 
indication of precedence of authorship. 
Much of the work reported in this paper 
was performed while the first author was at 
the Department of Computer and 
Information Science, University of 
Pennsylvania, Philadelphia, PA. The first 
author was supported in part by DARPA 
Grant N0014-90-31863, ARO Grant 
DAAL03-89-C-0031, and NSF Grant 
IRI-90-16592. The second author was 
supported in part by Presidential Young 
Investigator award IRI-91-57996 from the 
National Science Foundation and a 
matching grant from Xerox Corporation. 
The authors wish to thank Aravind Joshi for 
his support of the research, and Aravind 
Joshi, Judith Klavans, Anthony Kroch, 
Shalom Lappin, Kathy McCoy, Fernando 
Pereira, James Pustejovsky, and 
K. Vijay-Shanker for their helpful 
discussions of the issues involved. We are 
indebted to David Yarowsky for aid in the 
design of the experiment mentioned in 
footnote 9 and for its execution. 
References 
Abeillé, Anne; Schabes, Yves; and Joshi,
Aravind K. (1990). "Using lexicalized tree 
adjoining grammars for machine 
translation." In Proceedings, 13th 
International Conference on Computational 
Linguistics, Volume 3, 1-6, Helsinki, 
Finland. 
Gazdar, Gerald; Klein, Ewan; Pullum, 
Geoffrey K.; and Sag, Ivan A. (1985). 
Generalized Phrase Structure Grammar. 
Blackwell. 
Joshi, A. K.; Kosaraju, S. R.; and Yamada, 
H. M. (1972a). "String adjunct grammars:
I. Local and distributed adjunction." 
Information and Control, 21(2), 93-116. 
Joshi, A. K.; Kosaraju, S. R.; and Yamada, 
H. M. (1972b). "String adjunct grammars: 
II. Equational representation, null 
symbols, and linguistic relevance." 
Information and Control, 21 (3), 235-260. 
Joshi, Aravind K.; Levy, L. S.; and 
Takahashi, M. (1975). "Tree adjunct 
grammars." Journal of Computer and System 
Sciences, 10(1), 136-163. 
Kroch, Anthony S. (1989). "Asymmetries in 
long distance extraction in a TAG 
grammar." In Alternative Conceptions of 
Phrase Structure, edited by M. Baltin and 
A. Kroch, 66-98. University of Chicago 
Press. 
Kroch, Anthony S., and Joshi, Aravind K. 
(1985). "The linguistic relevance of tree 
adjoining grammar." Technical Report 
MS-CIS-85-18, Department of Computer 
and Information Science, University of 
Pennsylvania, Philadelphia, PA. 
Pereira, Fernando C. N., and Warren, David 
H. D. (1983). "Parsing as deduction." In 
Proceedings, 21st Annual Meeting of the 
Association for Computational Linguistics, 
137-144. Cambridge, MA. 
Quirk, Randolph; Greenbaum, Sidney; 
Leech, Geoffrey; and Svartvik, Jan (1985). 
A Comprehensive Grammar of the English 
Language. Longman. 
Resnik, Philip (1992). "Probabilistic 
tree-adjoining grammar as a framework 
for statistical natural language 
processing." In Proceedings, 14th 
International Conference on Computational 
Linguistics, 418-424. Nantes, France. 
Schabes, Yves (1991). "The valid prefix 
property and left to right parsing of 
tree-adjoining grammar." In Proceedings, 
Second International Workshop on Parsing 
Technologies, 21-30. Cancun, Mexico. 
Schabes, Yves (1992). "Stochastic lexicalized 
tree-adjoining grammars." In Proceedings, 
14th International Conference on 
Computational Linguistics, 426-432. Nantes, 
France. 
Schabes, Yves, and Shieber, Stuart M. (1992). 
"An alternative conception of 
tree-adjoining derivation." Technical 
Report 08-92, Harvard University, 
Cambridge, MA. 
Schabes, Yves, and Waters, Richard C. 
(1993a). "Lexicalized context-free 
grammars." In Proceedings, 31st Annual 
Meeting of the Association for Computational 
Linguistics, 121-129. Columbus, OH. 
Schabes, Yves, and Waters, Richard C. 
(1993b). "Stochastic lexicalized 
context-free grammars." In Proceedings, 
Third International Workshop on Parsing 
Technologies, 257-266. Tilburg, The 
Netherlands and Durbuy, Belgium. 
Shieber, Stuart M. (1992). Constraint-Based 
Grammar Formalisms. MIT Press. 
Shieber, Stuart M. (in press). "Restricting the 
weak-generative capacity of synchronous 
tree-adjoining grammars." Computational 
Intelligence. 
Shieber, Stuart M., and Schabes, Yves (1990). 
"Synchronous tree-adjoining grammars." 
In Proceedings, 13th International Conference 
on Computational Linguistics, Volume 3, 
253-258. Helsinki, Finland. 
Shieber, Stuart M., and Schabes, Yves (1991). 
"Generation and synchronous tree 
adjoining grammars." Computational 
Intelligence, 4(7), 220-228. 
Vijay-Shanker, K. (1987). A Study of Tree 
Adjoining Grammars. Doctoral dissertation, 
Department of Computer and 
Information Science, University of 
Pennsylvania, Philadelphia, PA. 
Vijay-Shanker, K., and Joshi, Aravind K. 
(1985). "Some computational properties of 
tree adjoining grammars." In Proceedings, 
23rd Annual Meeting of the Association for 
Computational Linguistics, 82-93. Chicago, 
IL. 
Vijay-Shanker, K., and Joshi, Aravind K. 
(1988). "Feature structure based tree 
adjoining grammars." In Proceedings, 12th 
International Conference on Computational 
Linguistics, 714-719. Budapest, Hungary.
Vijay-Shanker, K., and Weir, David J. (1990). 
"Polynomial parsing of extensions of 
context-free grammars." In Current Issues 
in Parsing Technologies, edited by Masaru 
Tomita, 191-206. Kluwer Academic 
Publishers. 
Appendix A: Proof of Redundancy of Adjacent Sibling Swapping 
A.1 Preliminaries 
A.1.1 Tree Addresses. We define tree addresses (variables over which are conventionally
notated $p, q, \ldots, t, u, v$ and their subscripted and primed variants) as the finite,
possibly empty, sequences of positive integers (conventionally $i, j, k$), with $\cdot$ as the
sequence concatenation operator. We uniformly abuse notation by conflating the
distinction between singleton sequences and their one element.
We use the notation $p \prec q$ to notate that tree address $p$ is a proper prefix of $q$,
and $p \preceq q$ for improper prefix. When $p \preceq q$, we write $q - p$ for the (possibly empty)
sequence obtained from $q$ by removing $p$ from the front, e.g., $1 \cdot 2 \cdot 3 \cdot 4 - 1 \cdot 2 = 3 \cdot 4$.
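As a concrete reading of these conventions, tree addresses can be modeled as Python tuples of positive integers, with tuple concatenation playing the role of the concatenation operator. The function names below are our own assumptions, chosen for illustration only.

```python
def is_prefix(p, q):
    """Improper prefix: p is a prefix of q, possibly equal to q."""
    return q[:len(p)] == p


def is_proper_prefix(p, q):
    """Proper prefix: p is a prefix of q and p differs from q."""
    return len(p) < len(q) and is_prefix(p, q)


def subtract(q, p):
    """q - p: remove p from the front of q; defined only when p is a prefix of q."""
    if not is_prefix(p, q):
        raise ValueError("p must be an (improper) prefix of q")
    return q[len(p):]


# The example from the text: 1.2.3.4 - 1.2 = 3.4
difference = subtract((1, 2, 3, 4), (1, 2))
```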
A.1.2 Trees. We will take trees (conventionally $A$, $B$, $E$, $T$; also $\alpha$, $\beta$, $\gamma$ in the prior text)
to be finite partial functions from tree addresses to symbols, such that the functions
are
Prefix closed: For any tree $T$, if $T(p \cdot i)$ is defined then $T(p)$ is defined.
Left closed: For any tree $T$, if $T(p \cdot i)$ is defined and $i > 1$ then $T(p \cdot (i - 1))$ is
defined.
We will refer to the domain of a tree $T$, the tree addresses for which $T$ is defined,
as the nodes of $T$. A node $p$ of $T$ is a frontier node if $T(p \cdot i)$ is undefined for all $i$. A node
of $T$ is an interior node if it is not a frontier node. We say that a node $p$ of $T$ is labeled
with a symbol $s$ if $T(p) = s$.
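A finite partial function from addresses to symbols is naturally a dictionary. The sketch below, with hypothetical function names and a hypothetical example tree, checks the two closure conditions and tests for frontier nodes. It relies on the observation that, in a left-closed tree, a node has no children exactly when its first child is undefined.

```python
def is_tree(t):
    """Check that dict t (address tuple -> symbol) is prefix closed and left closed."""
    for p in t:
        if p:                                    # p = q . i for some q and i
            q, i = p[:-1], p[-1]
            if i < 1 or q not in t:              # prefix closed (and i positive)
                return False
            if i > 1 and q + (i - 1,) not in t:  # left closed
                return False
    return True


def is_frontier(t, p):
    """A node is a frontier node if it has no children."""
    return p in t and p + (1,) not in t


# Hypothetical example: the tree S(a, B(b)).
example = {(): "S", (1,): "a", (2,): "B", (2, 1): "b"}
```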
A.2 Tree-Adjoining Grammars and Derivations 
A.2.1 Tree-Adjoining Grammars. In the following definitions, we restrict attention to
tree-adjoining grammars in which adjunction is the only operation; substitution is not
allowed. The definitions are, however, easily augmented to include substitution. We
define a tree-adjoining grammar to be given by a quintuple $\langle \Sigma, N, \mathcal{I}, \mathcal{A}, S \rangle$ where
• $\Sigma$ is a finite set of terminal symbols.
• $N$ is a finite set of nonterminal symbols disjoint from $\Sigma$.
• ($V = \Sigma \cup N$ is the vocabulary of the grammar.)
• $S$ is a distinguished nonterminal symbol, the start symbol.
• $\mathcal{I}$ is a finite set of trees, the initial trees, where
--interior nodes are labeled by nonterminal symbols, and
--frontier nodes are labeled by terminal symbols or the special
symbol $\epsilon$. (We require that $\epsilon \notin V$, as $\epsilon$ intuitively specifies the
empty string.)
• $\mathcal{A}$ is a finite set of trees, the auxiliary trees, where
--interior nodes are labeled by nonterminal symbols, and
--frontier nodes are labeled by terminal symbols or $\epsilon$, except for
one node, called the foot node, which is labeled with a
nonterminal symbol.
• ($\mathcal{E} = \mathcal{I} \cup \mathcal{A}$ is the set of elementary trees of the grammar.)
By convention, the address of the foot node of a tree $A$ is notated $f_A$.
A.2.2 Adjunction. The adjunction of an auxiliary tree $A$ at address $t$ in tree $E$, notated
$E[A/t]$, is defined to be the smallest (least defined) tree $T$ such that
\[
T(r) = \begin{cases}
E(r) & \text{if } t \not\prec r & (1) \\
A(u) & \text{if } r = t \cdot u \text{ and } f_A \not\prec u & (2) \\
E(t \cdot u) & \text{if } r = t \cdot f_A \cdot u & (3)
\end{cases}
\]
These cases are disjoint except at addresses $t$ and $t \cdot f_A$. We have
\[ T(t) = E(t) \]
by clause (1), and
\[ T(t) = A(\epsilon) \]
by clause (2). Similarly, we have
\[ T(t \cdot f_A) = A(f_A) \]
by clause (2) and
\[ T(t \cdot f_A) = E(t) \]
by clause (3). So for an adjunction to be well defined, it must be the case that
\[ E(t) = A(\epsilon) = A(f_A) \]
that is, the node at which adjunction occurs must have the same label as the root and
foot of the auxiliary tree adjoined. This is, of course, standard in definitions of TAG.
Alternatively, this constraint can be added as a stipulation and the definition modified
as follows:
\[
T(r) = \begin{cases}
E(r) & \text{if } t \not\preceq r \\
A(u) & \text{if } r = t \cdot u \text{ and } f_A \not\preceq u \\
E(t \cdot u) & \text{if } r = t \cdot f_A \cdot u
\end{cases}
\]
We will use this latter definition below.
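The latter definition translates directly into code when trees are dictionaries from address tuples to labels. The following Python sketch is illustrative only: the function name `adjoin`, the explicit `foot` parameter, and the example trees are our own assumptions. The label constraint is enforced as a stipulation, as in the modified definition.

```python
def is_prefix(p, q):
    """Improper prefix test on address tuples."""
    return q[:len(p)] == p


def adjoin(E, A, foot, t):
    """Return E[A/t], where foot is the address f_A of the foot node of A."""
    if not (E[t] == A[()] == A[foot]):
        raise ValueError("adjunction site, root, and foot labels must agree")
    T = {}
    for r, label in E.items():
        if not is_prefix(t, r):
            T[r] = label                       # clause 1: nodes outside the subtree at t
        else:
            u = r[len(t):]
            T[t + foot + u] = label            # clause 3: subtree at t moves under the foot
    for u, label in A.items():
        if not is_prefix(foot, u):
            T[t + u] = label                   # clause 2: nodes of A other than the foot
    return T


# Hypothetical example: adjoin B(c, B*) at node 2 of S(a, B(b)).
E = {(): "S", (1,): "a", (2,): "B", (2, 1): "b"}
A = {(): "B", (1,): "c", (2,): "B"}            # foot node at address (2,)
T = adjoin(E, A, (2,), (2,))
```

The result is the tree S(a, B(c, B(b))): the auxiliary tree occupies the adjunction site, and the material formerly rooted there hangs from the foot.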
A.2.3 Ordered Derivation Trees. Ordered derivation trees are ordered trees composed
of nodes, conventionally notated as $\eta$, possibly in its subscripted and primed variants.
(For ordered derivation trees, we will be less formal as to their mathematical structure.
In particular, the formalization of the previous section need not apply; the definitions
that follow define all of the structure that we will need.) The parent of a node $\eta$
in a derivation tree will be written $parent(\eta)$, and the tree in $\mathcal{E}$ that the node marks
adjunction of will be notated $tree(\eta)$. The tree $tree(\eta)$ is to be adjoined into its parent
$tree(parent(\eta))$ at an address specified on the arc in the tree linking the two; this address
is notated $addr(\eta)$. (Of course, the root node has no parent or address; the $parent$ and
$addr$ functions are partial.)
An ordered derivation tree is well formed if for each arc in the derivation tree
from $\eta$ to $parent(\eta)$ labeled with $addr(\eta)$, the tree $tree(\eta)$ is an auxiliary tree that can be
adjoined at the node $addr(\eta)$ in $tree(parent(\eta))$.
We repeat from Section 4.1 the definition of the function $\mathcal{D}$ from derivation trees
to the derived trees they specify, in the notation of this appendix:
\[
\mathcal{D}(D) = \begin{cases}
tree(\eta) & \text{if } D \text{ is a trivial tree of one node } \eta \\
tree(\eta)[\mathcal{D}(D_1)/t_1, \mathcal{D}(D_2)/t_2, \ldots, \mathcal{D}(D_k)/t_k] & \text{if } D \text{ is a tree with root node } \eta \text{ and with } k \text{ child subtrees } D_1, \ldots, D_k \\
& \text{whose arcs are labeled with addresses } t_1, \ldots, t_k.
\end{cases}
\]
As in Section 4.1, $E[A_1/t_1, \ldots, A_k/t_k]$ specifies the simultaneous adjunction of trees
$A_1$ through $A_k$ at $t_1$ through $t_k$, respectively, in $E$. It is defined as the iterative adjunction
of the $A_i$ in order at their respective addresses, with appropriate updating of the tree
addresses of later adjunctions to reflect the effect of earlier adjunctions. In particular,
the following inductive definition suffices; the base case holds for the adjunction of
zero auxiliary trees.
\[
\begin{aligned}
E[\,] &= E \\
E[A_1/t_1, A_2/t_2, \ldots, A_k/t_k] &= (E[A_1/t_1])[A_2/update(t_2, A_1, t_1), \ldots, A_k/update(t_k, A_1, t_1)]
\end{aligned}
\]
where
\[
update(s, A, t) = \begin{cases}
s & \text{if } t \not\prec s \\
t \cdot f_A \cdot (s - t) & \text{if } t \prec s
\end{cases}
\]
In the following section, we leave out parentheses in specifying sequential adjunctions
such as $(E[A_1/t_1])[A_2/t_2]$ under a convention of left associativity of the $[\_/\_]$
operator.
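The inductive definition can be exercised on a small example. The sketch below is an illustration under our own assumptions (dictionary-encoded trees, an explicit foot address passed with each auxiliary tree, hypothetical names `adjoin_all`, `A1`, `A2`); it performs two adjunctions at distinct addresses, one a proper prefix of the other, in both orders.

```python
def is_prefix(p, q):
    return q[:len(p)] == p


def is_proper_prefix(p, q):
    return len(p) < len(q) and is_prefix(p, q)


def adjoin(E, A, foot, t):
    """E[A/t] for dictionary-encoded trees; foot is the foot address of A."""
    T = {}
    for r, label in E.items():
        if not is_prefix(t, r):
            T[r] = label                        # untouched part of E
        else:
            T[t + foot + r[len(t):]] = label    # subtree at t re-hung under the foot
    for u, label in A.items():
        if not is_prefix(foot, u):
            T[t + u] = label                    # nodes of A other than the foot
    return T


def update(s, foot, t):
    """update(s, A, t): address s after adjoining a tree with this foot at t."""
    if is_proper_prefix(t, s):
        return t + foot + s[len(t):]
    return s


def adjoin_all(E, adjunctions):
    """Simultaneous adjunction E[A1/t1, ..., Ak/tk]; items are (A, foot, t)."""
    pending = list(adjunctions)
    while pending:
        (A, foot, t), rest = pending[0], pending[1:]
        E = adjoin(E, A, foot, t)
        pending = [(A2, f2, update(t2, foot, t)) for (A2, f2, t2) in rest]
    return E


# Hypothetical example: E = S(B(b)); A1 = S(x, S*) at the root; A2 = B(c, B*) at node 1.
E = {(): "S", (1,): "B", (1, 1): "b"}
A1 = {(): "S", (1,): "x", (2,): "S"}
A2 = {(): "B", (1,): "c", (2,): "B"}
first = adjoin_all(E, [(A1, (2,), ()), (A2, (2,), (1,))])
second = adjoin_all(E, [(A2, (2,), (1,)), (A1, (2,), ())])
```

In the first order, the address of the second adjunction is updated from 1 to 2.1, because the earlier adjunction at the root has pushed the original tree under the foot of the first auxiliary tree; in the second order no update is needed. Both orders yield the same derived tree for this example.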
A.3 Effect of Sibling Swaps 
In this section, we show that the derived tree specified by a given ordered deriva- 
tion tree is unchanged if adjacent siblings whose arcs are labeled with different tree 
addresses are swapped. This will be shown as the following proposition. 
Proposition 
If $t \neq t'$ then $E[\ldots, A/t, B/t', \ldots] = E[\ldots, B/t', A/t, \ldots]$.
We start with a lemma, the case for only two adjunctions.
Lemma
If $t \neq t'$ then $E[A/t, B/t'] = E[B/t', A/t]$.
Proof 
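There are three major cases, depending on the relationship of $t$ and $t'$:
Case $t \prec t'$: Let $s = t' - t$. Then
\[
\begin{aligned}
E[A/t, B/t'](r) &= E[A/t][B/update(t', A, t)](r) \\
&= E[A/t][B/t \cdot f_A \cdot s](r) \\
&= \begin{cases}
E[A/t](r) & \text{if } t \cdot f_A \cdot s \not\preceq r \\
B(u) & \text{if } r = t \cdot f_A \cdot s \cdot u \text{ and } f_B \not\preceq u \\
E[A/t](t \cdot f_A \cdot s \cdot u) & \text{if } r = t \cdot f_A \cdot s \cdot f_B \cdot u
\end{cases} \\
&= \begin{cases}
E(r) & \text{if } t \cdot f_A \cdot s \not\preceq r \text{ and } t \not\preceq r \\
A(v) & \text{if } t \cdot f_A \cdot s \not\preceq r \text{ and } r = t \cdot v \text{ and } f_A \not\preceq v \\
E(t \cdot v) & \text{if } t \cdot f_A \cdot s \not\preceq r \text{ and } r = t \cdot f_A \cdot v \\
B(u) & \text{if } r = t \cdot f_A \cdot s \cdot u \text{ and } f_B \not\preceq u \\
E(t \cdot s \cdot u) & \text{if } r = t \cdot f_A \cdot s \cdot f_B \cdot u
\end{cases} \\
&= \begin{cases}
E(r) & \text{if } t \not\preceq r \\
A(v) & \text{if } r = t \cdot v \text{ and } f_A \not\preceq v \\
E(t \cdot v) & \text{if } s \not\preceq v \text{ and } r = t \cdot f_A \cdot v \\
B(u) & \text{if } r = t \cdot f_A \cdot s \cdot u \text{ and } f_B \not\preceq u \\
E(t \cdot s \cdot u) & \text{if } r = t \cdot f_A \cdot s \cdot f_B \cdot u
\end{cases}
\end{aligned}
\]
If siblings are swapped,
\[
\begin{aligned}
E[B/t', A/t](r) &= E[B/t'][A/update(t, B, t')](r) \\
&= E[B/t'][A/t](r) \\
&= \begin{cases}
E[B/t \cdot s](r) & \text{if } t \not\preceq r \\
A(v) & \text{if } r = t \cdot v \text{ and } f_A \not\preceq v \\
E[B/t \cdot s](t \cdot v) & \text{if } r = t \cdot f_A \cdot v
\end{cases} \\
&= \begin{cases}
E(r) & \text{if } t \not\preceq r \\
A(v) & \text{if } r = t \cdot v \text{ and } f_A \not\preceq v \\
E(t \cdot v) & \text{if } r = t \cdot f_A \cdot v \text{ and } t \cdot s \not\preceq t \cdot v \\
B(u) & \text{if } r = t \cdot f_A \cdot v \text{ and } t \cdot v = t \cdot s \cdot u \text{ and } f_B \not\preceq u \\
E(t \cdot s \cdot u) & \text{if } r = t \cdot f_A \cdot v \text{ and } t \cdot v = t \cdot s \cdot f_B \cdot u
\end{cases} \\
&= \begin{cases}
E(r) & \text{if } t \not\preceq r \\
A(v) & \text{if } r = t \cdot v \text{ and } f_A \not\preceq v \\
E(t \cdot v) & \text{if } s \not\preceq v \text{ and } r = t \cdot f_A \cdot v \\
B(u) & \text{if } r = t \cdot f_A \cdot s \cdot u \text{ and } f_B \not\preceq u \\
E(t \cdot s \cdot u) & \text{if } r = t \cdot f_A \cdot s \cdot f_B \cdot u
\end{cases} \\
&= E[A/t, B/t'](r)
\end{aligned}
\]
Case $t' \prec t$: Analogously.
Case $t \not\prec t'$ and $t' \not\prec t$:
\[
\begin{aligned}
E[A/t, B/t'](r) &= E[A/t][B/update(t', A, t)](r) \\
&= E[A/t][B/t'](r) \\
&= \begin{cases}
E[A/t](r) & \text{if } t' \not\preceq r \\
B(u) & \text{if } r = t' \cdot u \text{ and } f_B \not\preceq u \\
E[A/t](t' \cdot u) & \text{if } r = t' \cdot f_B \cdot u
\end{cases} \\
&= \begin{cases}
E(r) & \text{if } t' \not\preceq r \text{ and } t \not\preceq r \\
A(v) & \text{if } t' \not\preceq r \text{ and } r = t \cdot v \text{ and } f_A \not\preceq v \\
E(t \cdot v) & \text{if } t' \not\preceq r \text{ and } r = t \cdot f_A \cdot v \\
B(u) & \text{if } r = t' \cdot u \text{ and } f_B \not\preceq u \\
E(t' \cdot u) & \text{if } r = t' \cdot f_B \cdot u
\end{cases}
\end{aligned}
\]
Note that this is unchanged (up to variable renaming) under swapping of
$A$ for $B$ and $t$ for $t'$. That is, $E[A/t, B/t'](r) = E[B/t', A/t](r)$. $\Box$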
We now return to the main proposition.
Proposition
If $t \neq t'$ then $E[\ldots, A/t, B/t', \ldots] = E[\ldots, B/t', A/t, \ldots]$.
Proof
The effect of the adjunctions before the two specified in the swap is obviously the
same on all following adjunctions, so we need only show that
\[
E[A/t, B/t', C_1/t_1, \ldots, C_k/t_k] = E[B/t', A/t, C_1/t_1, \ldots, C_k/t_k]
\]
without loss of generality. We examine the effect of the $A$ and $B$ adjunctions on the
tree address $t_i$ for each $C_i$ separately. In the case of the former adjunction order
\[
\begin{aligned}
& E[A/t, B/t', \ldots, C_i/t_i, \ldots] \\
&= E[A/t][B/update(t', A, t), \ldots, C_i/update(t_i, A, t), \ldots] \\
&= E[A/t][B/update(t', A, t)][\ldots, C_i/update(update(t_i, A, t), B, update(t', A, t)), \ldots] \\
&= E[A/t, B/t'][\ldots, C_i/update(update(t_i, A, t), B, update(t', A, t)), \ldots]
\end{aligned}
\]
and for the latter adjunction order:
\[
\begin{aligned}
& E[B/t', A/t, \ldots, C_i/t_i, \ldots] \\
&= E[B/t'][A/update(t, B, t'), \ldots, C_i/update(t_i, B, t'), \ldots] \\
&= E[B/t'][A/update(t, B, t')][\ldots, C_i/update(update(t_i, B, t'), A, update(t, B, t')), \ldots] \\
&= E[B/t', A/t][\ldots, C_i/update(update(t_i, B, t'), A, update(t, B, t')), \ldots] \\
&= E[A/t, B/t'][\ldots, C_i/update(update(t_i, B, t'), A, update(t, B, t')), \ldots]
\end{aligned}
\]
This last step holds by virtue of the lemma.
Thus, it suffices to show that
\[
update(update(t_i, A, t), B, update(t', A, t)) = update(update(t_i, B, t'), A, update(t, B, t'))
\]
Again, we perform a case analysis depending on the prefix relationships of $t$, $t'$,
and $t_i$. Note that we make use of the fact that if $t \prec t'$ then $(t' - t) \cdot s = t' \cdot s - t$.
Case $t \prec t'$:
Subcase $t' \prec t_i$:
\[
\begin{aligned}
& update(update(t_i, A, t), B, update(t', A, t)) \\
&= update(t \cdot f_A \cdot (t_i - t), B, t \cdot f_A \cdot (t' - t)) \\
&= t \cdot f_A \cdot (t' - t) \cdot f_B \cdot (t_i - t') \\
&= t \cdot f_A \cdot (t' \cdot f_B \cdot (t_i - t') - t) \\
&= update(t' \cdot f_B \cdot (t_i - t'), A, t) \\
&= update(update(t_i, B, t'), A, update(t, B, t'))
\end{aligned}
\]
Subcase $t' \not\prec t_i$ and $t \prec t_i$:
\[
\begin{aligned}
& update(update(t_i, A, t), B, update(t', A, t)) \\
&= update(t \cdot f_A \cdot (t_i - t), B, t \cdot f_A \cdot (t' - t)) \\
&= t \cdot f_A \cdot (t_i - t) \\
&= update(t_i, A, t) \\
&= update(update(t_i, B, t'), A, update(t, B, t'))
\end{aligned}
\]
Subcase $t' \not\prec t_i$ and $t \not\prec t_i$:
\[
\begin{aligned}
& update(update(t_i, A, t), B, update(t', A, t)) \\
&= update(t_i, B, t \cdot f_A \cdot (t' - t)) \\
&= t_i \\
&= update(t_i, A, t) \\
&= update(update(t_i, B, t'), A, update(t, B, t'))
\end{aligned}
\]
Case $t' \prec t$: The proof is as for the previous case with $t$ for $t'$ and vice versa.
Case $t \not\prec t'$ and $t' \not\prec t$:
Subcase $t \prec t_i$: We can conclude from the assumptions that $t' \not\prec t_i$.
Then
\[
\begin{aligned}
& update(update(t_i, A, t), B, update(t', A, t)) \\
&= update(t \cdot f_A \cdot (t_i - t), B, t') \\
&= t \cdot f_A \cdot (t_i - t) \\
&= update(t_i, A, t) \\
&= update(update(t_i, B, t'), A, update(t, B, t'))
\end{aligned}
\]
Subcase $t \not\prec t_i$ and $t' \prec t_i$: The proof is as for the previous subcase
with $t$ for $t'$ and vice versa.
Subcase $t \not\prec t_i$ and $t' \not\prec t_i$:
\[
\begin{aligned}
& update(update(t_i, A, t), B, update(t', A, t)) \\
&= update(t_i, B, t') \\
&= update(t_i, A, t) \\
&= update(update(t_i, B, t'), A, update(t, B, t'))
\end{aligned}
\]
$\Box$
