SOME COMPUTATIONAL PROPERTIES 
OF TREE ADJOINING GRAMMARS* 
K. Vijay-Shankar and Aravind K. Joshi
Department of Computer and Information Science 
Room 268 Moore School/D2 
University of Pennsylvania 
Philadelphia, PA 19104
ABSTRACT
Tree Adjoining Grammar (TAG) is a formalism for natural
language grammars. Some of the basic notions of TAG's were
introduced in [Joshi, Levy, and Takahashi, 1975] and by [Joshi, 1983].
A detailed investigation of the linguistic relevance of TAG's has been
carried out in [Kroch and Joshi, 1985]. In this paper, we will describe
some new results for TAG's, especially in the following areas: (1)
parsing complexity of TAG's, (2) some closure results for TAG's, and
(3) the relationship to Head grammars.
1. INTRODUCTION 
Investigation of constrained grammatical systems from the
point of view of their linguistic adequacy and their computational
tractability has been a major concern of computational linguists for
the last several years. Generalized Phrase Structure grammars
(GPSG), Lexical Functional grammars (LFG), Phrase Linking
grammars (PLG), and Tree Adjoining grammars (TAG) are some
key examples of grammatical systems that have been and still
continue to be investigated along these lines.
Some of the basic notions of TAG's were introduced in [Joshi,
Levy, and Takahashi, 1975] and [Joshi, 1983]. Some preliminary
investigations of the linguistic relevance and some computational
properties were also carried out in [Joshi, 1983]. More recently, a
detailed investigation of the linguistic relevance of TAG's was
carried out by [Kroch and Joshi, 1985].
In this paper, we will describe some new results for TAG's,
especially in the following areas: (1) parsing complexity of TAG's, (2)
some closure results for TAG's, and (3) the relationship to Head
grammars. These topics will be covered in Sections 3, 4, and 5
respectively. In section 2, we will give an introduction to TAG's. In
section 6, we will state some properties not discussed here. A detailed
exposition of these results is given in [Vijay-Shankar and Joshi, 1985].
*This work was partially supported by NSF Grants MCS-821011e-CER. MCS-82~7204. We want to thank Carl Pollard, Kelly Roach, David Searls, and
David Weir. We have benefited enormously by valuable discussions with them.
2. TREE ADJOINING GRAMMARS--TAG's 
We now introduce tree adjoining grammars (TAG's). TAG's
are more powerful than CFG's, both weakly and strongly.¹ TAG's
were first introduced in [Joshi, Levy, and Takahashi, 1975] and
[Joshi, 1983]. We include their description in this section to make the
paper self-contained.
We can define a tree adjoining grammar as follows. A tree
adjoining grammar G is a pair (I,A) where I is a set of initial trees,
and A is a set of auxiliary trees.
A tree α is an initial tree if it is of the form

    α =      S
            / \
           /   \
          /     \
         ---------
         terminals

That is, the root node of α is labelled S and the frontier nodes
are all terminal symbols. The internal nodes are all non-terminals.
A tree β is an auxiliary tree if it is of the form

    β =      X
            / \
           /   \
          /     \
         ----X----
          w1   w2

That is, the root node of β is labelled with a non-terminal X
and the frontier nodes are all labelled with terminal symbols except
one which is labelled X. The node labelled by X on the frontier will
be called the foot node of β. The frontiers of initial trees belong to
Σ*, whereas the frontiers of the auxiliary trees belong to Σ* N Σ+ ∪
Σ+ N Σ*.
We will now define a composition operation called adjoining
(or adjunction) which composes an auxiliary tree β with a tree γ.
Let γ be a tree with a node n labelled X and let β be an auxiliary
tree with the root labelled with the same symbol X. (Note that β
must have, by definition, a node (and only one) labelled X on the
frontier.)
¹Grammars G1 and G2 are weakly equivalent if the string language of G1,
L(G1) = the string language of G2, L(G2). G1 and G2 are strongly equivalent if
they are weakly equivalent and for each w in L(G1) = L(G2), both G1 and G2
assign the same structural description to w. A grammar G is weakly adequate
for a (string) language L, if L(G) = L. G is strongly adequate for L if L(G) = L
and for each w in L, G assigns an "appropriate" structural description to w. The
notion of strong adequacy is undoubtedly not precise because it depends on the
notion of appropriate structural descriptions.
Adjoining can now be defined as follows. If β is adjoined to γ
at the node n then the resulting tree γ' is as shown in Fig. 2.1
below.
[Figure 2.1: the tree γ, rooted in S, with a node n labelled X that
dominates the subtree t; the auxiliary tree β, whose root and foot are
labelled X; and the resulting tree γ', which is γ without t, with β
inserted at n and t attached below the foot node of β.]
The tree t dominated by X in γ is excised, β is inserted at the
node n in γ and the tree t is attached to the foot node (labelled X) of
β, i.e., β is inserted or adjoined to the node n in γ pushing t
downwards. Note that adjoining is not a substitution operation.
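The adjoining operation can be illustrated with a minimal Python sketch. It is an illustration only, not part of the original presentation: the names `Node`, `copy_tree`, `find_foot`, `adjoin` and `frontier` are assumptions introduced here, and trees are modified in place.

```python
class Node:
    """A tree node; foot=True marks the foot node of an auxiliary tree."""
    def __init__(self, label, children=None, foot=False):
        self.label = label
        self.children = children or []
        self.foot = foot

def copy_tree(node):
    """Return a fresh copy so an auxiliary tree can be adjoined repeatedly."""
    return Node(node.label, [copy_tree(c) for c in node.children], node.foot)

def find_foot(node):
    """Return the unique foot node of an auxiliary tree (or None)."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None

def adjoin(n, beta):
    """Adjoin the auxiliary tree beta at node n (in place)."""
    if n.label != beta.label:
        raise ValueError("the root of beta must carry the label of node n")
    new = copy_tree(beta)
    foot = find_foot(new)
    # The excised subtree t (everything below n) is attached to the foot.
    foot.children, foot.foot = n.children, n.foot
    # The body of beta takes the place of t below n.
    n.children, n.foot = new.children, False

def frontier(node):
    """Labels of the leaves, read left to right."""
    if not node.children:
        return [node.label]
    return [label for c in node.children for label in frontier(c)]

# gamma = S(a, X(b)); beta = X(c, X*, d), where X* is the foot node.
x = Node("X", [Node("b")])
gamma = Node("S", [Node("a"), x])
beta = Node("X", [Node("c"), Node("X", foot=True), Node("d")])
adjoin(x, beta)
print(frontier(gamma))
```

The subtree below the node (here the leaf b) ends up below the foot of the copied auxiliary tree, which is the sense in which adjoining pushes t downwards rather than substituting for it.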
We will now define
T(G): The set of all trees derived in G starting from initial
trees in I. This set will be called the tree set of G.
L(G): The set of all terminal strings which appear in the
frontier of the trees in T(G). This set will be called the string
language (or language) of G. If L is the string language of a TAG G
then we say that L is a Tree-Adjoining Language (TAL). The
relationship between TAG's, context-free grammars, and the
corresponding string languages can be summarized as follows ([Joshi,
Levy, and Takahashi, 1975], [Joshi, 1983]).
Theorem 2.1: For every context-free grammar, G', there is an
equivalent TAG, G, both weakly and strongly.
Theorem 2.2: For every TAG, G, we have the following
situations:
a. L(G) is context-free and there is a context-free grammar
G' that is strongly (and therefore weakly) equivalent to
G.
b. L(G) is context-free and there is no context-free grammar
G' that is strongly equivalent to G. Of course, there must
be a context-free grammar that is weakly equivalent to G.
c. L(G) is strictly context-sensitive. Obviously in this case,
there is no context-free grammar that is weakly
equivalent to G.
Parts (a) and (c) of Theorem 2.2 appear in [Joshi, Levy, and
Takahashi, 1975]. Part (b) is implicit in that paper, but it is
important to state it explicitly as we have done here because of its
linguistic significance. Example 2.1 illustrates part (a). We will now
illustrate parts (b) and (c).
Example 2.2: Let G = (I,A) where

I:  α1 =   S
           |
           e

A:  β1 =   S            β2 =   T
          / \                 / \
         a   T               a   S
            / \                 / \
           S   b               T   b

Let us look at some derivations in G.

γ0 = α1 =  S         γ1 =   S            γ2 =   S
           |               / \                 / \
           e              a   T               a   T
                             / \                 / \
                            S   b               a   S
                            |                      / \
                            e                     T   b
                                                 / \
                                                S   b
                                                |
                                                e

γ1 = γ0 with β1 adjoined at S as indicated in γ0.
γ2 = γ1 with β2 adjoined at T as indicated in γ1.
Clearly, L(G), the string language of G, is
L = { a^n e b^n | n ≥ 0 }
which is a context-free language. Thus, there must exist a context-
free grammar, G', which is at least weakly equivalent to G. It can be
shown however that there is no context-free grammar G' which is
strongly equivalent to G, i.e., T(G) = T(G'). This follows from the
fact that the set T(G) (the tree set of G) is non-recognizable, i.e.,
there is no finite state bottom-up tree automaton that can recognize
precisely T(G). Thus a TAG may generate a context-free language,
yet assign structural descriptions to the strings that cannot be
assigned by any context-free grammar.
Example 2.3: Let G = (I,A) where

I:  α1 =   S
           |
           e

A:  β1 =   S              β2 =   T
          / \                   / \
         a   T                 a   S
            /|\                   /|\
           b S c                 b T c
The precise definition of L(G) is as follows:
L(G) = L1 = { w e c^n | n ≥ 0, w is a string of a's and b's such that
    (1) the number of a's = the number of b's = n, and
    (2) for any initial substring of w, the number
        of a's ≥ the number of b's }
L1 is a strictly context-sensitive language (i.e., a context-
sensitive language that is not context-free). This can be shown as
follows. Intersecting L1 with the regular language a* b* e c* results in
the language
L2 = { a^n b^n e c^n | n ≥ 0 } = L1 ∩ a* b* e c*
L2 is a well-known strictly context-sensitive language. The result
of intersecting a context-free language with a regular language is
always a context-free language; hence, L1 is not a context-free
language. It is thus a strictly context-sensitive language. Example
2.3 thus illustrates part (c) of Theorem 2.2.
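The definition of L1 and its intersection with a* b* e c* can be checked mechanically. The Python sketch below is an illustration only: the function names `in_L1` and `in_L2` are assumptions introduced here, and the code tests the set-theoretic definition given above rather than running the grammar of Example 2.3.

```python
import re

def in_L1(s):
    """Membership in L1 = { w e c^n }, following conditions (1) and (2)."""
    match = re.fullmatch(r"([ab]*)e(c*)", s)
    if match is None:
        return False
    w, n = match.group(1), len(match.group(2))
    # (1) the number of a's = the number of b's = n
    if w.count("a") != n or w.count("b") != n:
        return False
    # (2) every initial substring of w has at least as many a's as b's
    balance = 0
    for ch in w:
        balance += 1 if ch == "a" else -1
        if balance < 0:
            return False
    return True

def in_L2(s):
    """Membership in L1 intersected with the regular language a* b* e c*."""
    return in_L1(s) and re.fullmatch(r"a*b*ec*", s) is not None

print(in_L1("ababecc"), in_L2("ababecc"), in_L2("aabbecc"))
```

The string ababecc belongs to L1 because the a's and b's may be interleaved, but it is filtered out by the regular language, leaving only strings of the form a^n b^n e c^n.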
TAG's have more power than CFG's. However, the extra
power is quite limited. The language L1 has equal number of a's, b's
and c's; however, the a's and b's are mixed in a certain way. The
language L2 is similar to L1, except that a's come before all b's.
TAG's as defined so far are not powerful enough to generate L2.
This can be seen as follows. Clearly, for any TAG for L2, each
initial tree must contain equal number of a's, b's and c's (including
zero), and each auxiliary tree must also contain equal number of a's,
b's and c's. Further in each case the a's must precede the b's. Then
it is easy to see from the grammar of Example 2.3, that it will not be
possible to avoid getting the a's and b's mixed. However, L2 can be
generated by a TAG with local constraints (see Section 2.1). The so-
called copy language
L = { w e w | w ∈ {a,b}* }
also cannot be generated by a TAG; however, it too can be generated
by a TAG with local constraints. It is thus clear that TAG's can
generate more than context-free languages. It can be shown that
TAG's cannot generate all context-sensitive languages [Joshi, 1984].
Although TAG's are more powerful than CFG's, this extra
power is highly constrained and apparently it is just the right kind
for characterizing certain structural descriptions. TAG's share almost
all the formal properties of CFG's (more precisely, the corresponding
classes of languages), as we shall see in section 4 of this paper and
[Vijay-Shankar and Joshi, 1985]. In addition, the string languages of
TAG's can also be parsed in polynomial time, in particular in O(n^6).
The parsing algorithm is described in detail in section 3.
2.1. TAG's with Local Constraints on Adjoining 
The adjoining operation as defined in Section 2 is "context-
free". An auxiliary tree, say,

         X
        / \
       /   \
      /     \
     ----X----

is adjoinable to a tree t at a node, say, n, if the label of that
node is X. Adjoining does not depend on the context (tree context)
around the node n. In this sense, adjoining is context-free.
In [Joshi, 1983], local constraints on adjoining similar to those
investigated by [Joshi and Levy, 1977] were considered. These are a
generalization of the context-sensitive constraints studied by [Peters
and Ritchie, 1969]. It was soon recognized, however, that the full
power of these constraints was never fully utilized, both in the
linguistic context as well as in the "formal languages" of TAG's.
The so-called proper analysis contexts and domination contexts (as
defined in [Joshi and Levy, 1977]) as used in [Joshi, 1983] always
turned out to be such that the context elements were always in a
specific elementary tree, i.e., they were further localized by being in
the same elementary tree. Based on this observation and a
suggestion in [Joshi, Levy and Takahashi, 1975], we will describe a
new way of introducing local constraints. This approach not only
captures the insight stated above, but it is truly in the spirit of
TAG's. The earlier approach was not so, although it was certainly
adequate for the investigation in [Joshi, 1983]. A precise
characterization of that approach still remains an open problem.
G = (I,A) is a TAG with local constraints if for each
elementary tree t ∈ I ∪ A, and for each node, n, in t, we specify the
set B of auxiliary trees that can be adjoined at the node n. Note
that if there is no constraint then all auxiliary trees are adjoinable at
n (of course, only those whose root has the same label as the label of
the node n). Thus, in general, B is a subset of the set of all the
auxiliary trees adjoinable at n.
We will adopt the following conventions. 
1. Since, by definition, no auxiliary trees are adjoinable to a
node labelled by a terminal symbol, no constraint has to
be stated for a node labelled by a terminal.
2. If there is no constraint, i.e., all auxiliary trees (with the
appropriate root label) are adjoinable at a node, say, n,
then we will not state this explicitly.
3. If no auxiliary trees are adjoinable at a node n, then we
will write the constraint as (φ), where φ denotes the null
set.
4. We will also allow for the possibility that for a node at
least one adjoining is obligatory, of course, from the set
of all possible auxiliary trees adjoinable at that node.
Hence, a TAG with local constraints is defined as follows. G =
(I,A) is a TAG with local constraints if for each node, n, in each tree
t, we specify one (and only one) of the following constraints.
1. Selective Adjoining (SA): Only a specified subset of the
set of all auxiliary trees are adjoinable at n. SA is
written as (C), where C is a subset of the set of all
auxiliary trees adjoinable at n.
If C equals the set of all auxiliary trees adjoinable at n,
then we do not explicitly state this at the node n.
2. Null Adjoining (NA): No auxiliary tree is adjoinable at
the node n. NA will be written as (φ).
3. Obligatory Adjoining (OA): At least one (out of all the
auxiliary trees adjoinable at n) must be adjoined at n.
OA is written as (OA), or as O(C) where C is a subset of
the set of all auxiliary trees adjoinable at n.
Example 2.4: Let G = (I,A) be a TAG with local constraints where

I:  α1 =        S (φ)
               /     \
        (β1) S         S (β2)
             |         |
             a         b

A:  β1 =   S (β1)            β2 =      S (β2)
          /   \                       /   \
         a     S (φ)            (φ) S      b

In α1 no auxiliary trees can be adjoined to the root node. Only
β1 is adjoinable to the left S node at depth 1 and only β2 is
adjoinable to the right S node at depth 1. In β1 only β1 is adjoinable
at the root node and no auxiliary trees are adjoinable at the foot
node. Similarly for β2.
We must now modify our definition of adjoining to take care of
the local constraints. Given a tree γ with a node, say, n, labelled A
and given an auxiliary tree, say, β, with the root node labelled A, we
define adjoining as follows. β is adjoinable to γ at the node n if β ∈
B, where B is the constraint associated with the node n in γ. The
result of adjoining β to γ will be as defined earlier, except that the
constraint C associated with n will be replaced by C', the constraint
associated with the root node of β, and by C'', the constraint
associated with the foot node of β. Thus, given
[Figure: the tree γ, rooted in S, with the node n labelled A carrying
the constraint (C); and the auxiliary tree β, whose root A carries the
constraint (C') and whose foot A carries the constraint (C'').]
The resultant tree γ' is:
[Figure: the tree γ' obtained by inserting β at n; the node in the
position of the root of β is labelled A (C') and the node in the
position of the foot of β is labelled A (C'').]
We also adopt the convention that any derived tree with a node 
which has an OA constraint associated with it will not be included in 
the tree set associated with a TAG, G. The string language L of G is 
then defined as the set of all terminal strings of all trees derived in G 
(starting with initial trees) which have no OA constraints left in
them.
Example 2.5: Let G = (I,A) be a TAG with local constraints
where

I:  α1 =   S            A:  β =    S (φ)
           |                      / \
           e                     a   S
                                    /|\
                                   b | c
                                     S (φ)

There are no constraints in α1. In β no auxiliary trees are adjoinable
at the root node and the foot node and for the center S node there
are no constraints.
Starting with α1 and adjoining β to α1 at the root node we
obtain

γ1 =   S (φ)
      / \
     a   S
        /|\
       b | c
         S (φ)
         |
         e

Adjoining β to the center S node (the only node at which
adjunction can be made) we have

γ2 =   S (φ)
      / \
     a   S (φ)
        / \
       a   S
          /|\
         b | c
           S (φ)
          /|\
         b | c
           S (φ)
           |
           e
It is easy to see that G generates the string language
L = { a^n b^n e c^n | n ≥ 0 }
Other languages such as L' = { a^(n^2) | n ≥ 1 }, L'' = { a^(2^n) | n ≥ 1 }
also cannot be generated by TAG's. This is because the strings of a
TAL grow linearly (for a detailed definition of the property called
the "constant growth" property, see [Joshi, 1983]).
For those familiar with [Joshi, 1983], it is worth pointing out
that the SA constraint is only abbreviatory, i.e., it does not affect the
power of TAG's. The NA and OA constraints however do affect the
power of TAG's. This way of looking at local constraints has not only
greatly simplified their statement, but it has also allowed us to
capture the insight that the 'locality' of the constraint is statable in
terms of the elementary trees themselves!
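Adjoining with local constraints can be illustrated on the grammar of Example 2.5 with a short Python sketch. It is an illustration under stated assumptions, not the paper's formulation: the names `Node`, `adjoinable`, `adjoin`, `derive` and the tree name "beta" are introduced here, only SA and NA constraints are modelled (OA is omitted), and a constraint is encoded as `None` (no constraint) or a set of auxiliary tree names (the empty set being NA).

```python
class Node:
    def __init__(self, label, children=(), foot=False, allowed=None):
        self.label = label
        self.children = list(children)
        self.foot = foot
        # allowed: None = no constraint, frozenset() = NA, other set = SA
        self.allowed = allowed

NA = frozenset()

def copy_tree(node):
    return Node(node.label, [copy_tree(c) for c in node.children],
                node.foot, node.allowed)

def preorder(node):
    yield node
    for child in node.children:
        yield from preorder(child)

def adjoinable(node, name, beta):
    """beta (called `name`) may be adjoined at node if labels match and
    the constraint at the node admits it."""
    return node.label == beta.label and (node.allowed is None
                                         or name in node.allowed)

def adjoin(node, name, beta):
    if not adjoinable(node, name, beta):
        raise ValueError("adjunction not permitted at this node")
    new = copy_tree(beta)
    foot = next(x for x in preorder(new) if x.foot)
    # The excised subtree goes below the foot, which keeps beta's foot constraint.
    foot.children, foot.foot = node.children, node.foot
    # The node takes over beta's body and beta's root constraint.
    node.children, node.foot, node.allowed = new.children, False, new.allowed

def frontier(node):
    if not node.children:
        return [node.label]
    return [x for c in node.children for x in frontier(c)]

# Example 2.5: alpha1 = S(e); beta = S(NA)(a, S(b, S*(NA), c)).
ALPHA = Node("S", [Node("e")])
BETA = Node("S", [Node("a"),
                  Node("S", [Node("b"), Node("S", foot=True, allowed=NA),
                             Node("c")])],
            allowed=NA)

def derive(n):
    """Adjoin beta n times, each time at the first node that permits it."""
    tree = copy_tree(ALPHA)
    for _ in range(n):
        site = next(x for x in preorder(tree) if adjoinable(x, "beta", BETA))
        adjoin(site, "beta", BETA)
    return "".join(frontier(tree))

print([derive(n) for n in range(3)])
```

After the first adjunction only the center S node admits a further adjunction, so every derivation is forced to keep the a's, b's and c's in blocks.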
2.2. Simple Linguistic Examples
We now give a couple of linguistic examples. Readers may refer
to [Kroch and Joshi, 1985] for details.
1. Starting with γ1 = α1 which is an initial tree and then adjoining
β1 (with appropriate lexical insertions) at the indicated node in α1,
we obtain γ2.
[Figure: γ1 = α1, the initial tree for "the girl is a senior"; β1, an
auxiliary tree rooted in NP for the relative clause "who met Bill"; and
γ2, the derived tree for "The girl who met Bill is a senior", obtained
by adjoining β1 at the subject NP node of α1.]
2. Starting with the initial tree γ1 = α2 and adjoining β2 at
the indicated node in α2 we obtain γ2.
[Figure: γ1 = α2, the initial tree for "PRO to invite Mary", whose root
S carries the constraint O(β2); β2, the auxiliary tree for "John
persuaded Bill S", whose foot S carries the constraint (φ); and γ2, the
derived tree for "John persuaded Bill to invite Mary", obtained by
adjoining β2 at the root of α2.]
Note that the initial tree α2 is not a matrix sentence. In order
for it to become a matrix sentence, it must undergo an adjunction at
its root node, for example, by the auxiliary tree β2 as shown above.
Thus, for α2 we will specify a local constraint O(β2) for the root
node, indicating that α2 requires for it to undergo an adjunction at
the root node by an auxiliary tree β2. In a fuller grammar there will
be, of course, some alternatives in the scope of O().
3. PARSING TREE-ADJOINING LANGUAGES
3.1. Definitions
We will give a few additional definitions. These are not
necessary for defining derivations in a TAG as defined in section 2.
However, they are introduced to help explain the parsing algorithm
and the proofs for some of the closure properties of TAL's.
DEFINITION 3.1 Let γ, γ' be two trees. We say γ ⊢ γ' if in γ we
adjoin an auxiliary tree to obtain γ'.
⊢* is the reflexive, transitive closure of ⊢.
DEFINITION 3.2 γ' is called a derived tree if γ ⊢* γ' for some
elementary tree γ.
We then say γ' ∈ D(γ).
The frontier of any derived tree γ belongs to either Σ* N Σ+ ∪
Σ+ N Σ* if γ ∈ D(β) for some auxiliary tree β, or to Σ* if γ ∈ D(α)
for some initial tree α. Note if γ ∈ D(α) for some initial tree α, then
γ is also a sentential tree.
If β is an auxiliary tree, γ ∈ D(β) and the frontier of γ is w1 X
w2 (X is a nonterminal, w1, w2 ∈ Σ*) then the leaf node having this
non-terminal symbol X at the frontier is called the foot of γ.
Sometimes we will be loosely using the phrase "adjoining with
a derived tree" γ ∈ D(β) for some auxiliary tree β. What we mean is
that suppose we adjoin β at some node and then adjoin within β and
so on, we can derive the desired derived tree γ ∈ D(β) which uses the
same adjoining sequence and use this resulting tree to "adjoin" at
the original node.
3.2. The Parsing Algorithm
The algorithm we present here to parse Tree-Adjoining
Languages (TALs) is a modification of the CYK algorithm (which is
described in detail in [Aho and Ullman, 1973]), which uses a dynamic
programming technique to parse CFL's. For the sake of making our
description of the parsing algorithm simpler, we shall present the
algorithm for parsing without considering local constraints. We will
later show how to handle local constraints.
We shall assume that any node in the elementary trees in the
grammar has at most two children. This assumption can be made
without any loss of generality, because it can be easily shown that
for any TAG G there is an equivalent TAG G1 such that any node in
any elementary tree in G1 has at most two children. A similar
assumption is made in the CYK algorithm. We use the terms ancestor
and descendant throughout the paper as a transitive and reflexive
relation; for example, the foot node may be called the ancestor of the
foot node.
The algorithm works as follows. Let a1...an be the input to be
parsed. We use a four-dimensional array A; each element of the
array contains a subset of the nodes of derived trees. We say a node
X of a derived tree γ belongs to A[i,j,k,l] if X dominates a sub-tree of
γ whose frontier is given by either a_(i+1)...a_j Y a_(k+1)...a_l (where
the foot node of γ is labelled by Y) or a_(i+1)...a_l (i.e., j = k. This
corresponds to the case when γ is a sentential tree). The indices
(i,j,k,l) refer to the positions between the input symbols and range
over 0 through n. If i = 5 say, then it refers to the gap between a5
and a6.
Initially, we fill A[i,i+1,i+1,i+1] with those nodes in the
frontier of the elementary trees whose label is the same as the input
a_(i+1) for 0 ≤ i ≤ n-1. The foot nodes of auxiliary trees will belong to
all A[i,i,j,j], such that i ≤ j.
We are now in a position to fill in all the elements of the array
A. There are five cases to be considered.
Case 1. We know that if a node X in a derived tree is the
ancestor of the foot node, and node Y is its right sibling, such that X
∈ A[i,j,k,l] and Y ∈ A[l,m,m,n], then their parent, say, Z should
belong to A[i,j,k,n], see Fig 3.1a.
Case 2. If the right sibling Y is the ancestor of the foot node
such that it belongs to A[l,m,n,p] and its left sibling X belongs to
A[i,j,j,l], then we know that the parent Z of X and Y belongs to
A[i,m,n,p], see Fig 3.1b.
Case 3. If neither X nor its right sibling Y are the ancestors of
the foot node (or there is no foot node) then if X ∈ A[i,j,j,l] and Y ∈
A[l,m,m,n] then their parent Z belongs to A[i,j,j,n].
Case 4. If a node Z has only one child X, and if X ∈ A[i,j,k,l],
then obviously Z ∈ A[i,j,k,l].
Case 5. If a node X ∈ A[i,j,k,l], and the root Y of a derived
tree γ having the same label as that of X belongs to A[m,i,l,n], then
adjoining γ at X makes the resulting node to be in A[m,j,k,n], see Fig
3.1c.
[Figure 3.1: (a) Case 1: the parent Z of X and Y, where the left sibling
X dominates the foot node and the frontier positions are i, j, k, l, n;
(b) Case 2: the parent Z of X and Y, where the right sibling Y
dominates the foot node and the frontier positions are i, j, l, m, n, p;
(c) Case 5: a derived auxiliary tree γ with root Y, spanning m to n,
adjoined at a node X that spans i, j, k, l.]
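The index bookkeeping in Cases 1, 2, 3 and 5 can be sketched in Python as follows. This is an illustration only: the function names are assumptions introduced here, each array position is represented as a plain 4-tuple, and a function returns None when the two positions do not fit together.

```python
def case1(x, y):
    """X in A[i,j,k,l] (dominates the foot), Y in A[l,m,m,n] -> Z in A[i,j,k,n]."""
    (i, j, k, l), (l2, m, m2, n) = x, y
    return (i, j, k, n) if l == l2 and m == m2 else None

def case2(x, y):
    """X in A[i,j,j,l], Y in A[l,m,n,p] (dominates the foot) -> Z in A[i,m,n,p]."""
    (i, j, j2, l), (l2, m, n, p) = x, y
    return (i, m, n, p) if l == l2 and j == j2 else None

def case3(x, y):
    """X in A[i,j,j,l], Y in A[l,m,m,n], no foot below either -> Z in A[i,j,j,n]."""
    (i, j, j2, l), (l2, m, m2, n) = x, y
    return (i, j, j, n) if l == l2 and j == j2 and m == m2 else None

def case5(x, root):
    """X in A[i,j,k,l], root of the adjoined tree in A[m,i,l,n] -> X in A[m,j,k,n]."""
    (i, j, k, l), (m, i2, l2, n) = x, root
    return (m, j, k, n) if (i, l) == (i2, l2) else None

print(case1((0, 1, 2, 3), (3, 4, 4, 5)))   # left sibling carries the foot
print(case5((1, 2, 3, 4), (0, 1, 4, 6)))   # adjunction wraps the span of X
```

In Case 5 the span of X, from i to l, must coincide with the gap left by the foot of the adjoined tree, which is why the inner two indices of the root position are compared with the outer two indices of X.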
Although we have stated that the elements of the array
contain a subset of the nodes of derived trees, what really goes in
there are the addresses of nodes in the elementary trees. Thus
the size of any set is bounded by a constant, determined by the
grammar. It is hoped that the presentation of the algorithm below
will make it clear why we do so.
3.3. The Algorithm
The complete algorithm is given below.
Step 1   For i=0 to n-1 step 1 do
Step 2       put all nodes in the frontier of elementary
             trees whose label is a_(i+1) in A[i,i+1,i+1,i+1].
Step 3   For i=0 to n-1 step 1 do
Step 4       for j=i to n-1 step 1 do
Step 5           put foot nodes of all auxiliary trees in
                 A[i,i,j,j]
Step 6   For l=0 to n step 1 do
Step 7       For i=l to 0 step -1 do
Step 8           For j=i to l step 1 do
Step 9               For k=l to j step -1 do
Step 10                  do Case 1
Step 11                  do Case 2
Step 12                  do Case 3
Step 13                  do Case 5
Step 14                  do Case 4
Step 15  Accept if root of some initial tree ∈ A[0,j,j,n],
         0 ≤ j ≤ n
where,
(a) Case 1 corresponds to the situation where the left sibling is the
ancestor of the foot node. The parent is put in A[i,j,k,l] if the left
sibling is in A[i,j,k,m] and the right sibling is in A[m,p,p,l], where k
≤ m < l, m ≤ p, p ≤ l. Therefore Case 1 is written as
    For m=k to l-1 step 1 do
        for p=m to l step 1 do
            if there is a left sibling in A[i,j,k,m] and the
            right sibling in A[m,p,p,l] satisfying appropriate
            restrictions then put their parent
            in A[i,j,k,l].
(b) Case 2 corresponds to the case where the right sibling is the
ancestor of the foot node. If the left sibling is in A[i,m,m,p] and the
right sibling is in A[p,j,k,l], i ≤ m < p and p ≤ j, then we put their
parent in A[i,j,k,l]. This may be written as
    For m=i to j-1 step 1 do
        For p=m+1 to j step 1 do
            for all left siblings in A[i,m,m,p] and right
            siblings in A[p,j,k,l] satisfying appropriate
            restrictions put their parents
            in A[i,j,k,l].
(c) Case 3 corresponds to the case where neither of the children is an
ancestor of the foot node. If the left sibling ∈ A[i,j,j,m] and the right
sibling ∈ A[m,p,p,l] then we can put the parent in A[i,j,j,l] if it is the
case that (i < j ≤ m or i ≤ j < m) and (m < p ≤ l or m ≤ p <
l). This may be written as
    for m = j to l-1 step 1 do
        for p = j to l step 1 do
            for all left siblings in A[i,j,j,m] and
            right siblings in A[m,p,p,l] satisfying the appropriate
            restrictions put their parent in A[i,j,j,l].
(e) Case 5 corresponds to adjoining. If X is a node in A[m,j,k,p] and
Y is the root of an auxiliary tree with the same symbol as that of X, such
that Y is in A[i,m,p,l] ((i < m ≤ p ≤ l or i ≤ m ≤ p < l) and (m
< j ≤ k ≤ p or m ≤ j ≤ k < p)), then X is put in A[i,j,k,l]. This may
be written as
    for m = i to l step 1 do
        for p = m to l step 1 do
            if a node X ∈ A[m,j,k,p] and the root of an
            auxiliary tree is in A[i,m,p,l] then put X in A[i,j,k,l]
Case 4 corresponds to the case where a node Y has only one child X.
If X ∈ A[i,j,k,l] then put Y in A[i,j,k,l]. Repeat Case 4 again if Y has
no siblings.
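The following Python sketch shows a small recognizer in the spirit of the algorithm above. It is a simplified illustration under several assumptions, not the algorithm as stated: it ignores local constraints, it repeats the five cases until no array entry changes instead of using the loop order of steps 6 through 9 (so it does not achieve the stated time bound), it marks the absence of a foot node with j = k = None rather than j = k, and nodes with more than two children are not handled. All names (`Node`, `nodes_of`, `recognize`) are introduced here.

```python
from collections import defaultdict

class Node:
    """Node of an elementary tree; foot=True marks an auxiliary tree's foot."""
    def __init__(self, label, *children, foot=False):
        self.label = label
        self.children = list(children)
        self.foot = foot

def nodes_of(tree):
    yield tree
    for child in tree.children:
        yield from nodes_of(child)

def recognize(initials, auxiliaries, tokens, start="S"):
    n = len(tokens)
    all_nodes = [x for t in initials + auxiliaries for x in nodes_of(t)]
    # chart[node] = set of (i, j, k, l); j = k = None if no foot lies below.
    chart = defaultdict(set)
    for x in all_nodes:
        if x.foot:                       # foot nodes: A[i,i,l,l] for i <= l
            for i in range(n + 1):
                for l in range(i, n + 1):
                    chart[x].add((i, i, l, l))
        elif not x.children:             # terminal leaves matching the input
            for i, token in enumerate(tokens):
                if token == x.label:
                    chart[x].add((i, None, None, i + 1))
    changed = True
    while changed:
        changed = False
        for parent in all_nodes:         # Cases 1-4: combine children
            kids = parent.children
            new = set()
            if len(kids) == 1:
                new = set(chart[kids[0]])
            elif len(kids) == 2:
                for (i, j1, k1, m) in chart[kids[0]]:
                    for (m2, j2, k2, l) in chart[kids[1]]:
                        if m == m2 and (j1 is None or j2 is None):
                            j, k = (j1, k1) if j2 is None else (j2, k2)
                            new.add((i, j, k, l))
            if not new <= chart[parent]:
                chart[parent] |= new
                changed = True
        for beta in auxiliaries:         # Case 5: adjoin beta at a node x
            for x in all_nodes:
                if x.label != beta.label or not (x.children or x.foot):
                    continue
                new = {(i, j, k, l)
                       for (i, m, p, l) in chart[beta]
                       for (m2, j, k, p2) in chart[x]
                       if (m2, p2) == (m, p)}
                if not new <= chart[x]:
                    chart[x] |= new
                    changed = True
    return any(root.label == start and (0, None, None, n) in chart[root]
               for root in initials)

# The grammar of Example 2.2, which generates a^n e b^n.
alpha1 = Node("S", Node("e"))
beta1 = Node("S", Node("a"), Node("T", Node("S", foot=True), Node("b")))
beta2 = Node("T", Node("a"), Node("S", Node("T", foot=True), Node("b")))
print(recognize([alpha1], [beta1, beta2], list("aaebb")))
```

Because entries are only ever added and the set of possible entries is finite, the loop terminates; the loop order of steps 6 through 9 serves to make a single pass over the array sufficient.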
3.4. Complexity of the Algorithm 
It is obvious that steps 10 through 14 (cases a-e) are completed
in O(n^2), because the different cases have at most two nested for
loop statements, the iterating variables taking values in the range 0
through n. They are repeated at most O(n^4) times, because of the
four loop statements in steps 6 through 9. The initialization phase
(steps 1 through 5) has a time complexity of O(n + n^2) = O(n^2).
Step 15 is completed in O(n). Therefore, the time complexity of the
parsing algorithm is O(n^6).
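The O(n^4) factor can be checked by counting how many index tuples the loops of steps 6 through 9 visit. The Python sketch below is an illustration only; the function name `count_cells` is an assumption introduced here.

```python
from math import comb

def count_cells(n):
    """Count the (i, j, k, l) tuples visited by the loops of steps 6-9."""
    count = 0
    for l in range(0, n + 1):                # step 6
        for i in range(l, -1, -1):           # step 7
            for j in range(i, l + 1):        # step 8
                for k in range(l, j - 1, -1):    # step 9
                    count += 1
    return count

# The loops visit every tuple with 0 <= i <= j <= k <= l <= n exactly once,
# and there are C(n+4, 4) such tuples, which grows as n^4 / 24.
for n in range(8):
    print(n, count_cells(n), comb(n + 4, 4))
```

Each visited tuple costs at most O(n^2) work in the case bodies, which gives the overall O(n^6) bound.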
3.5. Correctness of the Algorithm 
The main issue in proving the algorithm correct is to show
that while computing the contents of an element of the array A, we
must have already determined the contents of other elements of the
array needed to correctly complete this entry. We can show this
inductively by considering each case individually. We give an
informal argument below.
Case 1: We need to know the contents of A[i,j,k,m], A[m,p,p,l]
where m < l, i < m, when we are trying to compute the contents of
A[i,j,k,l]. Since l is the variable iterated in the outermost loop (step
6), we can assume (by induction hypothesis) that for all m < l and
for all p,q,r, the contents of A[p,q,r,m] are already computed. Hence,
the contents of A[i,j,k,m] are known. Similarly, for all m > i, and
for all p, q, and r ≤ l, A[m,p,q,r] would have been computed. Thus,
A[m,p,p,l] would also have been computed.
Case 2: By a similar reasoning, the contents of A[i,m,m,p] and
A[p,j,k,l] are known since p < l and p > i.
Case 3: When we are trying to compute the contents of some
A[i,j,j,l], we need to know the nodes in A[i,j,j,p] and A[p,q,q,l]. Note j
> i or j < l. Hence, we know that the contents of A[i,j,j,p] and
A[p,q,q,l] would have been computed already.
Case 5: The contents of A[i,m,p,l] and A[m,j,k,p] must be
known in order to compute A[i,j,k,l], where (i < m ≤ p ≤ l or i ≤
m ≤ p < l) and (m < j ≤ k ≤ p or m ≤ j ≤ k < p). Since
either m > i or p < l, contents of A[m,j,k,p] will be known.
Similarly, since either m < j or k < p, the contents of A[i,m,p,l]
would have been computed.
3.6. Parsing with Local Constraints
So far, we have assumed that the given grammar has no local
constraints. If the grammar has local constraints, it is easy to modify
the above algorithm to take care of them. Note that in Case 5, if an
adjunction occurs at a node X, we add X again to the element of the
array we are computing. This seems to be in contrast with our
definition of how to associate local constraints with the nodes in a
sentential tree. We should have added the root of the auxiliary tree
instead to the element of the array being computed, since so far as
the local constraints are concerned, this node decides the local
constraints at this node in the derived tree. However, this scheme
cannot be adopted in our algorithm for obvious reasons. We let pairs
of the form (X,C) belong to elements of the array, where X is as
before and C represents the local constraints to be associated with
this node.
We then alter the algorithm as follows. If (X,C1) refers to a
node at which we attempt to adjoin with an auxiliary tree (whose
root is denoted by (Y,C2)), then adjunction would be determined by C1.
If adjunction is allowed, then we can add (X,C2) in the corresponding
element of the array. In cases 1 through 4, we do not attempt to add
a new element if any one of the children has an obligatory
constraint.
Once it has been determined that the given string belongs to
the language, we can find the parse in a way similar to the scheme
adopted in the CYK algorithm. To make this process simpler and more
efficient, we can use pointers from the new element added to the
elements which caused it to be put there. For example, consider
Case 1 of the algorithm (step 10). If we add a node Z to A[i,j,k,l],
because of the presence of its children X and Y in A[i,j,k,m] and
A[m,p,p,l] respectively, then we add pointers from this node Z in
A[i,j,k,l] to the nodes X, Y in A[i,j,k,m] and A[m,p,p,l]. Once this has
been done, the parse can be found by traversing the tree formed by
these pointers.
A parser based on the techniques described above is currently
being implemented and will be reported at time of presentation.
4. CLOSURE PROPERTIES OF TAG's 
In this section, we present some closure results for TALs. We
now informally sketch the proofs for the closure properties.
Interested readers may refer to [Vijay-Shankar and Joshi, 1985] for
the complete proofs.
4.1. Closure under Union 
Let G1 and G2 be two TAGs generating L1 and L2 respectively.
We can construct a TAG G such that L(G) = L1 ∪ L2.
Let G1 = (I1, A1, N1, S), and G2 = (I2, A2, N2, S).
Without loss of generality, we may assume that N1 ∩ N2 = φ.
Let G = (I1 ∪ I2, A1 ∪ A2, N1 ∪ N2, S). We claim that L(G) = L1
∪ L2.
Let x ∈ L1 ∪ L2. Then x ∈ L1 or x ∈ L2. If x ∈ L1, then it
must be possible to generate the string x in G, since I1, A1 are in
G. Hence x ∈ L(G). Similarly if x ∈ L2, we can show that x ∈ L(G).
Hence L1 ∪ L2 ⊆ L(G). If x ∈ L(G), then x is derived using either
only I1, A1 or only I2, A2 since N1 ∩ N2 = φ. Hence, x ∈ L1 or x ∈
L2. Thus, L(G) ⊆ L1 ∪ L2. Therefore, L(G) = L1 ∪ L2.
4.2. Closure under Concatenation
Let G1 = (I1,A1,N1,S1), G2 = (I2,A2,N2,S2) be two TAGs
generating L1, L2 respectively, such that N1 ∩ N2 = φ. We can
construct a TAG G = (I, A, N, S) such that L(G) = L1 . L2. We
choose S such that S is not in N1 ∪ N2. We let N = N1 ∪ N2 ∪
{S}, A = A1 ∪ A2. For all t1 ∈ I1, t2 ∈ I2, we add t12 to I, as shown
in Fig 4.2.1. Therefore, I = { t12 | t1 ∈ I1, t2 ∈ I2 }, where the nodes
in the subtrees t1 and t2 of the tree t12 have the same constraints
associated with them as in the original grammars G1 and G2. It is
easy to show that L(G) = L1 . L2, once we note that there are no
auxiliary trees in G rooted with the symbol S, and that N1 ∩ N2 =
φ.
Figure 4.2.1. The initial trees $t_1$ and $t_2$, and the tree $t_{12}$ formed by
placing $t_1$ and $t_2$ as the left and right subtrees of a new root labelled $S$.
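The construction of the new initial trees can be sketched as follows. This is an illustration only: trees are encoded as `(label, children)` pairs, adjoining constraints are omitted, and the sample trees in `I1` and `I2` are hypothetical.

```python
from itertools import product


def frontier(tree):
    """Concatenate the leaf labels of a (label, children) tree, left to right."""
    label, children = tree
    if not children:
        return label
    return "".join(frontier(child) for child in children)


def concat_initial_trees(I1, I2, S="S"):
    # One new initial tree t12 per pair (t1, t2): a fresh root S over t1 and t2.
    return [(S, [t1, t2]) for t1, t2 in product(I1, I2)]


# Hypothetical initial trees of G1 (rooted at S1) and G2 (rooted at S2).
I1 = [("S1", [("a", [])]), ("S1", [("b", [])])]
I2 = [("S2", [("c", [])])]

I = concat_initial_trees(I1, I2)
print([frontier(t) for t in I])  # ['ac', 'bc']
```

Since no auxiliary tree is rooted in the fresh symbol S, every derivation in the combined grammar adjoins only inside the $t_1$ part or the $t_2$ part, which is why the frontier always splits into a string of $L_1$ followed by a string of $L_2$.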
4.3. Closure under Kleene Star
Let $G_1 = (I_1, A_1, N_1, S_1)$ be a TAG generating $L_1$. We can show
that we can construct a TAG $G$ such that $L(G) = L_1^*$. Let $S$ be a
symbol not in $N_1$, and let $N = N_1 \cup \{S\}$. We let the set $I$ of initial
trees of $G$ be $\{t_e\}$, where $t_e$ is the tree shown in Fig 4.3a. The set of
auxiliary trees $A$ is defined as
$A = \{ t_{1A} \mid t_1 \in I_1 \} \cup A_1$.
The tree $t_{1A}$ is as shown in Fig 4.3b, with the constraints on
the root of each $t_{1A}$ being the null adjoining constraint, no
constraints on the foot, and the constraints on the nodes of the
subtree $t_1$ of the trees $t_{1A}$ being the same as those for the
corresponding nodes in the initial tree $t_1$ of $G_1$.
To see why $L(G) = L_1^*$, consider $x \in L(G)$. Obviously, the tree
derived (whose frontier is given by $x$) must be of the form shown in
Fig 4.3c, where each $t_i'$ is a sentential tree in $G_1$ such that
$t_i' \in D(t_i)$, for an initial tree $t_i$ in $G_1$. Thus, $L(G) \subseteq L_1^*$.
On the other hand, if $x \in L_1^*$, then $x = w_1 \ldots w_n$, $w_i \in L_1$ for
$1 \le i \le n$. Let each $w_i$ then be the frontier of the sentential tree $t_i'$ of
$G_1$ such that $t_i' \in D(t_i)$, $t_i \in I_1$. Obviously, we can derive the tree $T$,
using the initial tree $t_e$, and have a sequence of adjoining operations
using the auxiliary trees $t_{iA}$ for $1 \le i \le n$. From $T$ we can obviously
obtain the tree $T'$, the same as given by Fig 4.3c, using only the
auxiliary trees in $A_1$. The frontier of $T'$ is obviously $w_1 \ldots w_n$. Hence,
$x \in L(G)$. Therefore, $L_1^* \subseteq L(G)$. Thus $L(G) = L_1^*$.
Figure 4.3. (a) The initial tree $t_e$. (b) The auxiliary tree $t_{1A}$.
(c) The derived tree $T'$.
4.4. Closure under Intersection with Regular Languages
Let $L_T$ be a TAL and $L_R$ be a regular language. Let $G$ be a
TAG generating $L_T$ and $M = (Q, \Sigma, \delta, q_0, Q_F)$ be a finite state
automaton recognizing $L_R$. We can construct a grammar $G_1$ and will
show that $L(G_1) = L_T \cap L_R$.
Let $\alpha$ be an elementary tree in $G$. We shall associate with each
node a quadruple $(q_1,q_2,q_3,q_4)$ where $q_1,q_2,q_3,q_4 \in Q$. Let $(q_1,q_2,q_3,q_4)$
be associated with a node X in $\alpha$. Let us assume that $\alpha$ is an
auxiliary tree, and that X is an ancestor of the foot node of $\alpha$, and
hence the ancestor of the foot node of any derived tree $\gamma$ in $D(\alpha)$.
Let Y be the label of the root and foot nodes of $\alpha$. Suppose the frontier of
$\gamma$ ($\gamma$ in $D(\alpha)$) is $w_1 w_2 Y w_3 w_4$, and the frontier of the subtree of $\gamma$
rooted at Z, which corresponds to the node X in $\alpha$, is $w_2 Y w_3$. The
idea of associating $(q_1,q_2,q_3,q_4)$ with X is that it must be the case
that $\delta^*(q_1, w_2) = q_2$ and $\delta^*(q_3, w_3) = q_4$. When $\gamma$ becomes a part of
the sentential tree $\gamma'$ whose frontier is given by $u w_1 w_2 v w_3 w_4 w$,
then it must be the case that $\delta^*(q_2, v) = q_3$. Following this
reasoning, we must make $q_2 = q_3$ if Z is not the ancestor of the foot
node of $\gamma$, or if $\gamma$ is in $D(\alpha)$ for some initial tree $\alpha$ in $G$.
We have assumed here, as in the case of the parsing algorithm
presented earlier, that any node in any elementary tree has at most
two children.
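The conditions that a quadruple places on the frontier can be made concrete with the extended transition function $\delta^*$. The sketch below is illustrative: the automaton (tracking the parity of the number of a's) and the helper names `delta_star` and `quadruple_ok` are assumptions introduced here, not part of the construction itself.

```python
def delta_star(delta, q, w):
    """Extended transition function: the state reached from q after reading w."""
    for symbol in w:
        q = delta[(q, symbol)]
    return q


def quadruple_ok(delta, quad, w2, w3, v=None):
    """Check delta*(q1, w2) = q2 and delta*(q3, w3) = q4 for the frontier
    w2 Y w3, and, if the string v below the foot is given, delta*(q2, v) = q3."""
    q1, q2, q3, q4 = quad
    ok = delta_star(delta, q1, w2) == q2 and delta_star(delta, q3, w3) == q4
    if v is not None:
        ok = ok and delta_star(delta, q2, v) == q3
    return ok


# Hypothetical automaton over {a, b}: the state records the parity of a's read.
delta = {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
}

print(quadruple_ok(delta, ("even", "odd", "odd", "even"), "a", "ab", v="bb"))  # True
```

A quadruple thus records where the automaton is before and after each of the two frontier segments that surround the foot node.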
From $G$ we can obtain $G_1$ as follows. For each initial tree $\alpha$,
associate with the root the quadruple $(q_0, q, q, q_f)$ where $q_0$ is the
initial state of the finite state automaton $M$, and $q_f \in Q_F$. For each
auxiliary tree $\beta$ of $G$, associate with the root the quadruple
$(q_1,q_2,q_3,q_4)$, where $q,q_1,q_2,q_3,q_4$ are some variables which will later
be given values from $Q$. Let X be some node in some elementary tree
$\alpha$. Let $(q_1,q_2,q_3,q_4)$ be associated with X. Then, we have to consider
the following cases.
Case 1: X has two children Y and Z. The left child Y is the
ancestor of the foot node of $\alpha$. Then associate with Y the quadruple
$(p, q_2, q_3, q)$, and $(q, r, r, s)$ with Z, and associate with X the
constraint that only those trees whose root has the quadruple
$(q_1, p, s, q_4)$, among those which were allowed in the original grammar,
may be adjoined at this node. If $q_1 \neq p$ or $q_4 \neq s$, then the
constraint associated with X must be made obligatory. If in the
original grammar X had an obligatory constraint associated with it,
then we retain the obligatory constraint regardless of the relationship
between $q_1$ and $p$, and $q_4$ and $s$. If the constraint associated with X
is a null adjoining constraint, we associate $(q_1, q_2, q_3, q)$ and
$(q, r, r, q_4)$ with Y and Z respectively, and associate the null adjoining
constraint with X. If the label of Z is $a$, where $a \in \Sigma$, then we choose
$s$ and $q$ such that $\delta(q, a) = s$. In the null adjoining constraint case,
$q$ is chosen such that $\delta(q, a) = q_4$.
Case 2: This corresponds to the case where a node X has two
children Y and Z, with $(q_1,q_2,q_3,q_4)$ associated at X. Let Z (the right
child) be the ancestor of the foot node of the tree $\alpha$. Then we shall
associate $(p,q,q,r)$, $(r,q_2,q_3,s)$ with Y and Z. The associated constraint
with X shall be that only those trees among those which were
allowed in the original grammar may be adjoined, provided their root
has the quadruple $(q_1,p,s,q_4)$ associated with it. If $q_1 \neq p$ or $q_4 \neq s$,
then we make the constraint obligatory. If the original grammar had an
obligatory constraint, we will retain the obligatory constraint. A null
constraint in the original grammar will force us to use the null constraint
and not consider the cases where it is not the case that $q_1 = p$ and
$q_4 = s$. If the label of Y is a terminal $a$, then we choose $r$ such that
$\delta^*(p,a) = r$. If the constraint at X is a null adjoining constraint, then
$\delta^*(q_1,a) = r$.
Case 3: This corresponds to the case where neither the left
child Y nor the right child Z of the node X is the ancestor of the foot
node of $\alpha$, or where $\alpha$ is an initial tree. Then $q_2 = q_3 = q$. We will
associate with Y and Z the quadruples $(p,r,r,q)$ and $(q,s,s,t)$ respectively.
The constraints are assigned as before; in this case they are dictated by the
quadruple $(q_1,p,t,q_4)$. If it is not the case that $q_1 = p$ and $q_4 = t$,
then it becomes an OA constraint. The OA and NA constraints at X
are treated as in the previous cases, and so is the case where either
Y or Z is labelled by a terminal symbol.
Case 4: If $(q_1,q_2,q_3,q_4)$ is associated with a node X which has
only one child Y, then we can deal with the various cases as follows.
We will associate with Y the quadruple $(p,q_2,q_3,s)$ and the constraint
that the root of a tree which can be adjoined at X should have the
quadruple $(q_1,p,s,q_4)$ associated with it, among the trees which were
allowed in the original grammar, if it is to be adjoined at X. The
cases where the original grammar had a null or obligatory constraint
associated with this node, or where Y is labelled with a terminal symbol, are
treated as in the previous cases.
Once this has been done, let $q_1,\ldots,q_m$ be the independent
variables for this elementary tree $\alpha$; then we produce as many copies
of $\alpha$ so that $q_1,\ldots,q_m$ take all possible values from $Q$. The only
difference among the various copies of $\alpha$ so produced will be the
constraints associated with the nodes in the trees. Repeat the process
for all the elementary trees in $G_1$. Once this has been done and each
tree given a unique name, we can write the constraints in terms of
these names. We will now show why $L(G_1) = L_T \cap L_R$.
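Producing one copy of an elementary tree per assignment of states to its independent variables is a plain enumeration of $Q^m$. A small sketch, assuming hypothetical variable names `p`, `q`, `r` and a two-state automaton; the function `instantiate` is introduced here for illustration only:

```python
from itertools import product


def instantiate(variables, Q):
    """Yield one assignment (a dict) for each way of giving every
    variable a state from Q, i.e. |Q| ** len(variables) assignments."""
    for values in product(Q, repeat=len(variables)):
        yield dict(zip(variables, values))


Q = ["q0", "q1"]
copies = list(instantiate(["p", "q", "r"], Q))
print(len(copies))  # 8, i.e. |Q| ** 3
print(copies[0])    # {'p': 'q0', 'q': 'q0', 'r': 'q0'}
```

Each assignment would label one copy of the tree; the number of copies is exponential in the number of independent variables of the tree but, for a fixed grammar, polynomial in $|Q|$.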
Let $w \in L(G_1)$. Then there is a sequence of adjoining
operations starting with an initial tree $\alpha'$ to derive $w$. Obviously, $w \in
L_T$, since corresponding to each tree used in deriving $w$ there is
a corresponding tree in $G$, which differs only in the constraints
associated with its nodes. Note, however, that the constraints
associated with the nodes in trees in $G_1$ are just a restriction of the
corresponding ones in $G$, or an obligatory constraint where there was
none in $G$. Now, if we can assume (by induction hypothesis) that if
after $n$ adjoining operations we can derive $\gamma' \in D(\alpha')$, then there is a
corresponding tree $\gamma \in D(\alpha)$ in $G$, which has the same tree structure
as $\gamma'$ but differs only in the constraints associated with the
corresponding nodes, then if we adjoin at some node in $\gamma'$ to obtain
$\gamma_1'$, we can adjoin in $\gamma$ to obtain $\gamma_1$ (corresponding to $\gamma_1'$).
Therefore, if $w$ can be derived in $G_1$, then it can definitely be derived
in $G$.
If we can also show that $L(G_1) \subseteq L_R$, then we can conclude
that $L(G_1) \subseteq L_T \cap L_R$. We can use induction to prove this. The
induction hypothesis is that if all derived trees obtained after $k \le n$
adjoining operations have the property P, then so will the derived
trees after $n + 1$ adjoinings, where P is defined as follows.
Property P: If any node X in a derived tree $\gamma$ has the foot node of
the tree $\beta$ to which X belongs, labelled Y, as a descendant, such that
$w_1 Y w_2$ is the frontier of the subtree of $\beta$ rooted at X, then if
$(q_1,q_2,q_3,q_4)$ had been associated with X, $\delta^*(q_1,w_1) = q_2$ and
$\delta^*(q_3,w_2) = q_4$, and if $w$ is the frontier of the subtree under the foot
node of $\beta$ in $\gamma$, then $\delta^*(q_2,w) = q_3$. If X is not the ancestor of the
foot node of $\beta$, then the frontier of the subtree of $\beta$ below X is of the form
$w_1 w_2$. Suppose X has associated with it $(q_1,q,q,q_2)$; then $\delta^*(q_1,w_1) = q$,
$\delta^*(q,w_2) = q_2$.
Actually, what we mean by an adjoining operation is not
necessarily just one adjoining operation but the minimum number so
that no obligatory constraints are associated with any nodes in the
derived trees. Similarly, the base case need not consider only
elementary trees, but the smallest (in terms of the number of
adjoining operations) tree starting with elementary trees which has
no obligatory constraint associated with any of its nodes. The base
case can be seen easily considering the way the grammar was built
(it can be shown formally by induction on the height of the tree). The
inductive step is obvious. Note that the derived tree we are going to
use for adjoining will have the property P, and so will the tree at
which we adjoin; the former because of the way we designed the
grammar and assigned constraints, and the latter because of the
induction hypothesis. Thus so will the new derived tree. Once we
have proved this, all we have to do to show that $L(G_1) \subseteq L_R$ is to
consider those derived trees which are sentential trees and observe
that the roots of these trees obey property P.
Now, if a string $x \in L_T \cap L_R$, we can show that $x \in L(G_1)$. To
do that, we make use of the following claim.
Let $\beta$ be an auxiliary tree in $G$ with root labelled Y and let $\gamma \in
D(\beta)$. We claim that there is a $\beta'$ in $G_1$ with the same structure as $\beta$,
such that there is a $\gamma'$ in $D(\beta')$, where $\gamma'$ has the same structure
as $\gamma$, such that there is no OA constraint in $\gamma'$. Let X be a node in
$\beta_1$ which was used in deriving $\gamma$. Then there is a node X' in $\gamma'$ such
that X' belongs to the auxiliary tree $\beta_1'$ (with the same structure as
$\beta_1$). There are several cases to consider -
Case 1: X is the ancestor of the foot node of $\beta_1$, such that the
frontier of the subtree of $\beta_1$ rooted at X is $w_3 Y w_4$ and the frontier of
the subtree of $\gamma$ rooted at X is $w_3 w_1 Z w_2 w_4$. Let $\delta^*(q_1,w_3) = q$,
$\delta^*(q,w_1) = q_2$, $\delta^*(q_3,w_2) = r$, and $\delta^*(r,w_4) = q_4$. Then X' will have
$(q_1,q,r,q_4)$ associated with it, and there will be no OA constraint in
$\gamma'$.
Case 2: X is the ancestor of the foot node of $\beta_1$, and the frontier of
the subtree of $\beta_1$ rooted at X is $w_3 Y w_4$. Let the frontier of the
subtree of $\gamma$ rooted at X be $w_3 w_1 w_2 w_4$. Then we claim that X' in $\gamma'$
will have associated with it the quadruple $(q_1,q,r,q_4)$, if $\delta^*(q_1,w_3) = q$,
$\delta^*(q,w_1) = p$, $\delta^*(p,w_2) = r$, and $\delta^*(r,w_4) = q_4$.
Case 3: Let the frontier of the subtree of $\beta_1$ (and also $\gamma$) rooted at X
be $w_1 w_2$. Let $\delta^*(q,w_1) = p$, $\delta^*(p,w_2) = r$. Then X' will have
associated with it the quadruple $(q,p,p,r)$.
We shall prove our claim by induction on the number of
adjoining operations used to derive $\gamma$. The base case (where $\gamma = \beta$) is
obvious from the way the grammar $G_1$ was built. We shall now
assume that all derived trees $\gamma$ which have been derived from $\beta$
using $k$ or fewer adjoining operations have the property required in
our claim. Let $\gamma$ be a derived tree in $\beta$ after $k$ adjunctions. By our
inductive hypothesis we may assume the existence of the
corresponding derived tree $\gamma' \in D(\beta')$ derived in $G_1$. Let X be a node
in $\gamma$ as shown in Fig. 4.4.1. Then the node X' in $\gamma'$ corresponding to
X will have associated with it the quadruple $(q_1',q_2',q_3',q_4')$. Note we
are assuming here that the left child Y' of X' is the ancestor of the
foot node of $\beta'$. The quadruples $(q_1',q_2',q_3',p)$ and $(p,p_1,p_1,q_4')$ will
be associated with Y' and Z' (by the induction hypothesis). Let $\gamma_1$ be
derived from $\gamma$ by adjoining $\beta_1$ at X as in Fig. 4.4.2. We have to
show the existence of $\beta_1'$ in $G_1$ such that the root of this auxiliary
tree has associated with it the quadruple $(q,q_1',q_4',r)$. The existence
of the tree follows from the induction hypothesis ($k = 0$). We have also
got to show that there exists $\gamma_1''$ with the same structure as $\gamma'$ but
one that allows $\beta_1'$ to be adjoined at the required node. But this
should be so, since from the way we obtained the trees in $G_1$, there
will exist $\gamma_1''$ such that $X_1'$ has the quadruple $(q,q_2',q_3',r)$ and the
constraints at $X_1'$ are dictated by the quadruple $(q,q_1',q_4',r)$, but
such that the two children of $X_1'$ will have the same quadruples as in
$\gamma'$. We can now adjoin $\beta_1'$ in $\gamma_1''$ to obtain $\gamma_1'$. It can be shown that
$\gamma_1'$ has the required property to establish our claim.
Figure 4.4.1. The derived tree $\gamma$ containing the node X.
Figure 4.4.2. The tree $\gamma_1$ obtained from $\gamma$ by adjoining $\beta_1$ at X.
Firstly, any node below the foot of $\beta_1'$ in $\gamma_1'$ will satisfy our
requirements, as they are the same as the corresponding nodes in $\gamma_1''$.
Since $\beta_1'$ satisfies the requirement, it is simple to observe that the
nodes in $\beta_1'$ will do so too, even after the adjunction of $\beta_1'$ in $\gamma_1''$. However,
because the quadruples associated with $X_1'$ are different, the
quadruples of the nodes above $X_1'$ must reflect this change. It is easy
to check the existence of an auxiliary tree such that the nodes above
$X_1'$ satisfy the requirements as stated above. It can also be argued on
the basis of the design of grammar $G_1$ that there exist trees which
allow this new auxiliary tree to be adjoined at the appropriate place.
This then allows us to conclude that there exists a derived tree for
each derived tree belonging to $D(\beta)$ as in our claim. The next step is
to extend our claim to take into account all derived trees (i.e.,
including the sentential trees). This can be done in a manner similar
to our treatment of derived trees belonging to $D(\beta)$ for some
auxiliary tree $\beta$ as above. Of course, we have to consider only the
case where the finite state automaton starts from the initial state $q_0$,
and reaches some final state on the input which is the frontier of
some sentential tree in $G$. This, then, allows us to conclude that
$L_T \cap L_R \subseteq L(G_1)$. Hence, $L(G_1) = L_T \cap L_R$.
5. HEAD GRAMMARS AND TAG's
In this section, we attempt to show that Head Grammars (HG)
are remarkably similar to Tree Adjoining Grammars. It appears that
the basic intuition behind the two systems is more or less the same.
Head Grammars were introduced in [Pollard,1984], but we follow the
notations used in [Roach,1984]. It has been observed that TAG's and
HG's share a lot of common formal properties such as almost
identical closure results and similar pumping lemmas.
Consider the basic operation in Head Grammars - the Head
Wrapping operation. A derivation from a non-terminal produces a
pair $(i, a_1 \ldots a_i \ldots a_n)$ (a more convenient representation for this pair is
$a_1 \ldots \underset{\uparrow}{a_i}\, a_{i+1} \ldots a_n$). The arrow denotes the head of the string, which in
turn determines where the string is split up when a wrapping operation
takes place. For example, consider X -> LL2(A,B), and let A derive $w \underset{\uparrow}{h} x$
and B derive $u \underset{\uparrow}{g} v$. Then we say that X derives $w h u \underset{\uparrow}{g} v x$.
We shall define some functions used in the HG formalism,
which we need here. If A derives in 0 or more steps the headed string
$w \underset{\uparrow}{h} x$ and B derives $u \underset{\uparrow}{g} v$, then
1) if X -> LL1(A,B) is a rule in the grammar then
   X derives $w \underset{\uparrow}{h} u g v x$
2) if X -> LL2(A,B) is a rule in the grammar then
   X derives $w h u \underset{\uparrow}{g} v x$
3) if X -> LC1(A,B) is a rule in the grammar then
   X derives $w \underset{\uparrow}{h} x u g v$
4) if X -> LC2(A,B) is a rule in the grammar then
   X derives $w h x u \underset{\uparrow}{g} v$
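The four operations can be sketched on headed strings represented as a string plus the index of its head symbol. This is an illustration under that assumed encoding; the `Headed` type and the helper `split` are introduced here and are not notation from the HG literature.

```python
from typing import NamedTuple


class Headed(NamedTuple):
    s: str     # the string
    head: int  # index of the head symbol within s


def split(a):
    """Split a headed string just after its head: 'whx' -> ('wh', 'x')."""
    return a.s[: a.head + 1], a.s[a.head + 1 :]


def LL1(a, b):  # wrap a around b, keep the head of a
    left, right = split(a)
    return Headed(left + b.s + right, a.head)


def LL2(a, b):  # wrap a around b, take the head of b
    left, right = split(a)
    return Headed(left + b.s + right, len(left) + b.head)


def LC1(a, b):  # concatenate, keep the head of a
    return Headed(a.s + b.s, a.head)


def LC2(a, b):  # concatenate, take the head of b
    return Headed(a.s + b.s, len(a.s) + b.head)


A = Headed("whx", 1)  # head h
B = Headed("ugv", 1)  # head g

r = LL2(A, B)
print(r.s, r.s[r.head])  # whugvx g
```

The wrapping operations differ from the concatenation operations only in where the second string is placed; within each pair, the two variants differ only in which argument supplies the head of the result.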
Now consider how a derivation in TAGs proceeds -
Let $\beta$ be an auxiliary tree and let $\alpha$ be a sentential tree as in
Fig 5.1. Adjoining $\beta$ at the root of the sub-tree $\gamma$ gives us the
sentential tree in Fig 5.1. We can now see how the string $whx$ has
"wrapped around" the sub-tree, i.e., the string $ugv$. This seems to
suggest that there is something similar in the role played by the foot
in an auxiliary tree and the head in a Head Grammar, and in how the
adjoining operations and head-wrapping operations operate on
strings. We could say that if X is the root of an auxiliary tree $\beta$ and
$a_1 \ldots a_i X a_{i+1} \ldots a_n$ is the frontier of a derived tree $\gamma \in D(\beta)$, then the
derivation of $\gamma$ would correspond to a derivation from a non-terminal
X to the string $a_1 \ldots \underset{\uparrow}{a_i}\, a_{i+1} \ldots a_n$ in HG, and the use of $\gamma$ in some
sentential tree would correspond to how the strings $a_1 \ldots a_i$ and
$a_{i+1} \ldots a_n$ are used in deriving a string in HL.
Figure 5.1. A sentential tree $\alpha$ containing the sub-tree $\gamma$ with frontier $ugv$,
the auxiliary tree $\beta$ with root X, and the sentential tree obtained by
adjoining $\beta$ at the root of $\gamma$.
Based on this observation, we attempt to show the close
relationship of TAL's and HL's. It is more convenient for us to think
of the headed string $(i, a_1 \ldots a_n)$ as the string $a_1 \ldots a_n$ with the head
pointing in between the symbols $a_i$ and $a_{i+1}$, rather than at the
symbol $a_i$. The definition of the derivation operators can be extended
in a straightforward manner to take this into account. However, we
can achieve the same effect by considering the definitions of the
operators LL,LC,etc. Pollard suggest~ that cases such u LL2~,~ ) be 
left undefined. We shall assume that if ~" mwty then L L~,k) -- 
andLC,(X~) ~ kx. 
We, then ;ay tha~t if G is x Head Grammar, then w I -- whx belongs 
to L(G) if and only if S derives the headed string w~or whkx. 
With this new definition, we shall show, without giving the proof,
that the class of TAL's is contained in the class of HL's, by
systematically converting any TAG G to a HG G'. We shall assume,
without loss of generality, that the constraints expressed at the nodes
of elementary trees of G are -
1) Nothing can be adjoined at a node (NA),
2) Any appropriate tree (symbols at the node and root of the
auxiliary tree must match) can be adjoined (AA), or
3) Adjoining at the node is obligatory (OA).
It is easy to show that these constraints are enough, and that
selective adjoining can be expressed in terms of these and additional
non-terminals. We now give a procedural description of obtaining
an equivalent Head Grammar from a Tree Adjoining Grammar. The
procedure works as follows. It is a recursive procedure
(Convert_to_HG) which takes in two parameters, the first
representing the node on which it is being applied and the second the
label appearing on the left-hand side of the HG productions for this
node. If X is a nonterminal, for each auxiliary tree $\beta$ whose root has
the label X, we obtain a sequence of productions such that the first
one has X on the left-hand side. Using these productions, we can
derive the string $w_1 X w_2$ where a derived tree in $D(\beta)$ has a frontier
$w_1 X w_2$. If Y is a node with label X in some tree where
adjoining is allowed, we introduce the productions
   Y' -> LL2(X,N')   {so that a derived tree with root
      label X may wrap around the string derived from the subtree
      below this node}
   N' -> LCi(A1,...,Aj)   {assuming that there
      are j children of this node and the i-th child is the
      ancestor of the foot node. By calling the procedure
      recursively for all the j children of Y with Ak, k
      ranging from 1 through j, we can derive from N' the
      frontier of the subtree below Y}
   Y' -> N'   {this is to handle the case where no
      adjunction takes place at Y}
If G is a TAG then we do the following -
   Repeat for every initial tree
      Convert_to_HG(root, S')   {S' will be the start symbol of
                                 the new Head Grammar}
   Repeat for each auxiliary tree
      Convert_to_HG(root, root symbol)
where Convert_to_HG(node, name) is defined as follows
   if node is an internal node then
      Case 1: The constraint at the node is AA.
         add productions Sym -> LL2(node symbol, N'),
                         N' -> LCi(A1',...,Ai',...,Aj'),
                         Sym -> LCi(A1',...,Ai',...,Aj')
         where N', A1',...,Aj' are new non-terminal
         symbols, A1,...,Aj correspond to the j children
         of the node, and i = 1 if the foot node is not a descendant
         of node, else i is such that the i-th child of node is an
         ancestor of the foot node; j = number of children of node
         for k = 1 to j step 1 do
            Convert_to_HG(k-th child of node, Ak').
      Case 2: The constraint at the node is NA.
         Same as Case 1 except don't add the productions
            Sym -> LL2(node symbol, N'),
            N' -> LCi(A1',...,Aj').
      Case 3: The constraint at the node is OA.
         Same as Case 1 except that we don't add
            Sym -> LCi(A1',...,Aj')
   else if the node has a terminal symbol a
      then add the production Sym -> a
   else {it is a foot node}
      if the constraint at the foot node is AA then
         add the productions
            Sym -> LL2(node symbol, λ) / λ
      if the constraint is OA then add only the
         production
            Sym -> LL2(node symbol, λ)
      if the constraint is NA add the production Sym -> λ
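The procedure above can be sketched in executable form as follows. This is one possible reading, not the authors' implementation: the `Node` class, the fresh names `A1, A2, ...`, the treatment of the foot node as its own "ancestor" when choosing i, and the constraints chosen for the sample trees are all assumptions made for illustration. Head marking on terminals and on λ is omitted.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    constraint: str = "AA"  # "AA", "NA" or "OA"
    foot: bool = False


def contains_foot(node):
    return node.foot or any(contains_foot(c) for c in node.children)


def convert_to_hg(node, sym, prods, fresh):
    """Append to prods the productions for node, with sym on the left-hand side."""
    if node.children:  # internal node
        names = [next(fresh) for _ in node.children]
        # i: 1-based index of the child dominating (or being) the foot, else 1
        i = next((k for k, c in enumerate(node.children, 1) if contains_foot(c)), 1)
        body = f"LC{i}({', '.join(names)})"
        if node.constraint in ("AA", "OA"):  # adjoining possible at this node
            n = next(fresh)
            prods.append(f"{sym} -> LL2({node.label}, {n})")
            prods.append(f"{n} -> {body}")
        if node.constraint in ("AA", "NA"):  # adjoining may be skipped
            prods.append(f"{sym} -> {body}")
        for child, name in zip(node.children, names):
            convert_to_hg(child, name, prods, fresh)
    elif node.foot:
        if node.constraint in ("AA", "OA"):
            prods.append(f"{sym} -> LL2({node.label}, λ)")
        if node.constraint in ("AA", "NA"):
            prods.append(f"{sym} -> λ")
    else:  # terminal leaf
        prods.append(f"{sym} -> {node.label}")


# Hypothetical sample grammar: an initial tree S over the empty string, and an
# auxiliary tree S(a, S(b, S*, c)) with NA at its root and at its foot.
alpha = Node("S", [Node("λ")])
beta = Node("S", [Node("a"),
                  Node("S", [Node("b"),
                             Node("S", constraint="NA", foot=True),
                             Node("c")])],
            constraint="NA")

fresh = (f"A{n}" for n in count(1))
prods = []
convert_to_hg(alpha, "S'", prods, fresh)  # initial tree: start symbol S'
convert_to_hg(beta, "S", prods, fresh)    # auxiliary tree: its root symbol
print("\n".join(prods))
```

The three internal-node cases collapse into two independent tests: the wrapping productions are emitted whenever adjoining is permitted (AA or OA), and the plain concatenation production whenever adjoining may be skipped (AA or NA).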
We shall now give an example of converting a TAG G to a
HG. G contains a single initial tree $\alpha$, and a single auxiliary tree $\beta$
as in Fig. 5.2.
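Figure 5.2. The initial tree $\alpha$ (root S dominating e) and the auxiliary
tree $\beta$ (root S with children a and S; this inner S has children b, the
foot node S, and c). The root and the foot node of $\beta$ each carry a
constraint annotation.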
Obviously, $L(G) = \{ a^n b^n c^n \mid n \ge 0 \}$.
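The language of this grammar can be checked on small cases by carrying out the adjoining operation directly. The sketch below assumes a reading of Fig. 5.2 in which $\alpha$ is S over the empty string and $\beta$ is S(a, S(b, S*, c)) with adjoining disallowed at the root and foot of $\beta$; constraints are not represented in the code, so the adjunction sites are chosen by hand to respect them. The tuple encoding and the function names are illustrative.

```python
# A node is (label, children, is_foot); a childless non-foot node is a terminal.
def leaf(sym):
    return (sym, (), False)


ALPHA = ("S", (leaf(""),), False)  # S over the empty string
FOOT = ("S", (), True)
BETA = ("S", (leaf("a"), ("S", (leaf("b"), FOOT, leaf("c")), False)), False)


def plug(aux, subtree):
    """Return a copy of aux with its foot node replaced by subtree."""
    label, children, is_foot = aux
    if is_foot:
        return subtree
    return (label, tuple(plug(c, subtree) for c in children), False)


def adjoin(tree, path, aux):
    """Adjoin aux at the node of tree reached by the child indices in path."""
    if not path:
        return plug(aux, tree)
    label, children, is_foot = tree
    k = path[0]
    new_child = adjoin(children[k], path[1:], aux)
    return (label, children[:k] + (new_child,) + children[k + 1:], is_foot)


def frontier(tree):
    label, children, _ = tree
    if not children:
        return label
    return "".join(frontier(c) for c in children)


def derive(n):
    """Adjoin BETA n times: first at the root of ALPHA, then each time at
    the inner S node introduced by the previous adjunction."""
    tree = ALPHA
    for k in range(n):
        tree = adjoin(tree, (1,) * k, BETA)
    return frontier(tree)


print([derive(n) for n in range(4)])  # ['', 'abc', 'aabbcc', 'aaabbbccc']
```

Each adjunction inserts one a to the left of the site and wraps one b and one c around the material below it, which is what keeps the three counts equal.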
Applying the procedure Convert_to_HG to this grammar we
obtain the HG whose productions are given by -
s'-~ LL=(S,S) 
s -• t.c2(e.c) 
8 -• "\[ 
c -• U.,2(S,D)m 
O -• ~Ct(E.F,G) 
E -• b F -> "~ 
which eta be revrittan u s' 
-• s/~ 
S-• LC=(a,A') 
A' -• Ltq(S,b~c) or t' ->l,lq(S,l~c) 
It can be verified that this grammar generates exactly
L(G).
It is worth emphasizing that the main point of this exercise was
to show the similarities between Head Grammars and Tree Adjoining
Grammars. We have shown how a HG G' (using our extended
definitions) can be obtained in a systematic fashion from a TAG
G. It is our belief that the extension of the definition may not be
necessary. Yet, this conversion process should help us understand the
similarities between the two formalisms.
6. OTHER MATHEMATICAL PROPERTIES
OF TAG's
Additional formal properties of TAG's have been discussed in
[Vijay-Shankar and Joshi,1985]. Some of them are listed below
1) Pumping lemma for TAG's
2) TAL's are closed under substitution and homomorphisms
3) TAL's are not closed under the following operations
   a) intersection with TAL's
   b) intersection with CFL's
   c) complementation
Some other properties that have been considered in
[Vijay-Shankar and Joshi,1985] are as follows
1) closure under the following operations
   a) inverse homomorphisms
   b) gsm mappings
2) semilinearity and Parikh-boundedness.
References 
1. Aho, A.V., and Ullman, J.D., 1973 "Theory of Parsing, Translation,
and Compiling, Volume 1: Parsing", Prentice-Hall, Englewood Cliffs,
N.J., 1973.
2. Joshi, A.K., 1983 "How much context-sensitivity is necessary for
characterizing structural descriptions - tree adjoining grammars" in
Natural Language Processing - Theoretical, Computational, and
Psychological Perspectives (ed. D. Dowty, L. Karttunen, A. Zwicky),
Cambridge University Press, New York, (originally presented in
1983) to appear in 1985.
3. Joshi, A.K., and Levy, L.S., 1977 "Constraints on Structural
Descriptions: Local Transformations", SIAM Journal of Computing,
June 1977.
4. Joshi, A.K., Levy, L.S., and Takahashi, M., 1975 "Tree adjoining
grammars", Journal of Computer and System Sciences, March 1975.
5. Kroch, T., and Joshi, A.K., 1985 "Linguistic relevance of tree
adjoining grammars", Technical Report MS-CIS-85-18, Dept. of
Computer and Information Science, University of Pennsylvania, April
1985.
6. Pollard, C., 1984 "Generalized Phrase Structure Grammars, Head
Grammars, and Natural Language", Ph.D dissertation, Stanford
University, August 1984.
7. Roach, K., 1984 "Formal Properties of Head Grammars",
unpublished manuscript, Stanford University, also presented at the
Mathematics of Languages workshop at the University of Michigan,
Ann Arbor, Oct. 1984.
8. Vijay-Shankar, K., and Joshi, A.K., 1985 "Formal Properties of Tree
Adjoining Grammars", Technical Report, Dept. of Computer and
Information Science, University of Pennsylvania, July 1985.
