TAG's as a Grammatical
Formalism for Generation
David D. McDonald and James D. Pustejovsky
Department of Computer and Information Science
University of Massachusetts at Amherst
1. Abstract
Tree Adjoining Grammars, or "TAG's" (Joshi, Levy &
Takahashi 1975; Joshi 1983; Kroch & Joshi 1985), were
developed as an alternative to the standard syntactic
formalisms that are used in theoretical analyses of language.
They are attractive because they may provide just the
aspects of context sensitive expressive power that actually
appear in human languages while otherwise remaining
context free.
This paper describes how we have applied the theory of
Tree Adjoining Grammars to natural language generation.
We have been attracted to TAG's because their central
operation--the extension of an "initial" phrase structure tree
through the inclusion, at very specifically constrained
locations, of one or more "auxiliary" trees--corresponds
directly to certain central operations of our own,
performance-oriented theory.
We begin by briefly describing TAG's as a formalism
for phrase structure in a competence theory, and summarize
the points in the theory of TAG's that are germane to our
own theory. We then consider generally the position of a
grammar within the generation process, introducing our use
of TAG's through a contrast with how others have used
systemic grammars. This takes us to the core results of
our paper: using examples from our research with
well-written texts from newspapers, we walk through our
TAG inspired treatments of raising and wh-movement, and
show the correspondence of the TAG "adjunction" operation
and our "attachment" process.
In the final section we discuss extensions to the theory,
motivated by the way we use the operation corresponding
to TAG's adjunction in performance. This suggests that the
competence theory of TAG's can be profitably projected to
structures at the morphological level as well as the present
syntactic level.
2. Tree Adjunction Grammars
The theoretical apparatus of a TAG consists of a
primitively defined set of "elementary" phrase structure
trees, a "linking" relation that can be used to define
dependency relations between two nodes within an
elementary tree, and an "adjunction" operation that
combines trees under specifiable constraints. The elementary
trees are divided into two sets: initial and auxiliary. Initial
trees have only terminals at their leaves. Auxiliary trees are
distinguished by having one non-terminal among their
leaves; the category of this node must be the same as the
category of the root. All elementary trees are "minimal" in
the sense that they do not recurse on any non-terminal.
A node N1 in an elementary tree may be linked
(co-indexed) to a second node N2 in the same tree
provided N1 c-commands N2. Linking is used to indicate
grammatically defined dependencies between nodes such as
subcategorization relationships or filler-gap dependencies.
Links are preserved (though "stretched out") when their tree
is extended through adjunction; this is the mechanism
TAG's use to represent unbounded dependencies.
Sentence derivations start with an initial tree, and
continue via the adjunction of an arbitrary number of
auxiliary trees. To adjoin an auxiliary tree A with root
category X to an initial (or derived) tree T, we first select
some node of category X within T to be the point at
which the adjunction is to occur. Then (1) the subtree of
T dominated by that instance of X (call it X') is removed
from T, (2) the auxiliary tree A is knit into T at the
position where X' had been located, and (3) the subtree
dominated by X' is knit into A to replace the second
occurrence of the category X at A's frontier. The two trees
have now been merged by "splicing" A into T, displacing
the subtree of T at the point of the adjunction to the
frontier of A.
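The three steps above can be sketched in code. The following is a minimal illustration only, not the authors' implementation: the `Node` class, the `foot` flag marking the non-terminal leaf of an auxiliary tree, and the function names are all assumptions introduced for this sketch. The example trees are the raising pair that the paper analyzes later (two tankers / be hit by missiles, extended by be reported).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A phrase structure node; a leaf with foot=True is an auxiliary tree's foot."""
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, or None if it has none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(site: Node, aux_root: Node) -> None:
    """Adjoin the auxiliary tree rooted at aux_root at the node `site`, in place."""
    foot = find_foot(aux_root)
    if foot is None or aux_root.label != site.label or foot.label != site.label:
        raise ValueError("root, foot and adjunction site must share one category")
    displaced = site.children          # (1) remove the subtree dominated by X'
    foot.children = displaced          # (3) knit it in at the frontier of A
    foot.foot = False
    site.children = aux_root.children  # (2) knit A in where X' had been


def words(node: Node) -> List[str]:
    """Read the terminal string off the frontier, left to right."""
    if not node.children:
        return [node.label]
    return [w for child in node.children for w in words(child)]


# Initial tree: [S [NP two tankers] [INFL' [INFL be] [VP hit by missiles]]]
inner = Node("INFL'", [Node("INFL", [Node("be")]),
                       Node("VP", [Node("hit"), Node("by"), Node("missiles")])])
initial = Node("S", [Node("NP", [Node("two"), Node("tankers")]), inner])

# Auxiliary tree: [INFL' [INFL be] [VP reported INFL'(foot)]]
auxiliary = Node("INFL'", [Node("INFL", [Node("be")]),
                           Node("VP", [Node("reported"), Node("INFL'", foot=True)])])

adjoin(inner, auxiliary)
print(" ".join(words(initial)))
```

The sketch mutates the site node rather than copying trees, so any reference held to a node of T (for instance a link between two nodes) still points at the same object after adjunction. Morphology is ignored, so the frontier is read off as uninflected forms.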
For example we could take the initial tree:
[S' who_i does [S John like e_i ]]
(the subscript "i" indicates that the "who" and the trace "e"
are linked) and adjoin to it the auxiliary tree:
to produce the derived tree:
Adjunction may be "constrained". The grammar writer
may specify which specific trees may be adjoined to a given 
node in an elementary tree; if no specification is given the 
default is that there is no constraint and that any auxiliary 
tree may be adjoined to the node. 
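A constraint of this kind amounts to an optional list of permitted auxiliary trees attached to a node. The following is a small sketch under assumed names (`may_adjoin`, and tree names such as "A2" used as plain strings); it is an illustration, not part of any TAG implementation.

```python
from typing import FrozenSet, Optional


def may_adjoin(allowed: Optional[FrozenSet[str]], aux_name: str) -> bool:
    """Decide whether the auxiliary tree named aux_name may adjoin at a node.

    allowed=None models the default (no specification given, so any
    auxiliary tree may adjoin); otherwise only the listed trees may.
    """
    return allowed is None or aux_name in allowed


unconstrained = None                 # default: any auxiliary tree
only_a2 = frozenset({"A2"})          # only the tree named A2
nothing = frozenset()                # no adjunction permitted at this node

print(may_adjoin(unconstrained, "A1"), may_adjoin(only_a2, "A1"))
```

Representing "no specification" as `None` rather than as an empty set keeps the default case distinct from a node at which nothing may be adjoined.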
2.1 Key features of the theory of TAG's
A TAG specifies surface structure. There is no notion
of derivation from deep structure in the theory of 
TAG's--the primitive trees are not transformed or otherwise 
changed once they are introduced into a text, only 
combined with other primitive trees. As Kroch and Joshi
point out, this means that a TAG is incomplete as an
account of the structure of a natural language, e.g. a TAG
grammar will contain both an active and a passive form of
the same verbal subcategorization pattern, without a
theory-mediated description of the very close relationship
between them.
To our minds this is by no means a deficit. The
procedural machinery that generative grammars have
traditionally carried with them to characterize relations like
that of active to passive has only gotten in the way of
employing those characterizations in processing models of
generation. This is because a generation model, like any
theory of performance, has a procedural structure of its
own and cannot coexist with an incompatible one, at least
not while still operating efficiently or while retaining a
simple mapping from its actual machine to the virtual
machine that its authors put forward as their account of
psycholinguistic data.
Our own generator uses surface structure as its only
explicitly represented linguistic level. Thus grammatical
formalisms that dwell on the rules governing surface form 
are more useful to us than those that hide those rules in a 
deep to surface transformational process. 
A TAG involves the manipulation of very small
elementary structures. This is because of the stipulation
that elementary trees may not include recursive nodes. It
implies that the sentences one sees in everyday usage, e.g.
newspaper texts, are the result of many successive adjunctions.
This melds nicely with a move that we have made in
recent years to view the conceptual representation from
which generation proceeds as consisting of a heap of very
small, redundantly related information units that have been
deliberately selected by a text planning system from the
total state of the knowledge base at the time of utterance;
each such unit will correspond in the final text to a head
lexical item plus selected thematic arguments--a linguistic
entity that is easily projected onto the elementary trees of a
TAG.
TAG theory includes only one operation, adjunction,
and otherwise makes no changes to the elementary trees
that go into a text. This comports well with the indelibility
stipulation in our model of generation, since selected text
fragments can be used directly as specified by the grammar
without the need for any later transformation. The
composition options delimited by the constraints on
adjunction given with a TAG define a space of alternative
text forms which can correspond directly in generation to
alternative conceptual relations among information units,
alternatives in rhetorical intent, and alternatives in prose
style.
3. Adapting TAG's to Generation 
The mapping from TAG's as a formalism for
competence theories of language to our formalism for
generation is strikingly direct. As we describe in Section 5,
their adjunction operation corresponds to our attachment
process; their constraints on adjunction correspond to our
attachment points; their surface structure trees correspond to
our surface structure trees.1 We further hypothesize that
two quite strong correspondence claims can be made,
though considerably more experimentation and theorizing
will have to be done with both formalisms before these
claims can be confirmed.
1. The primitive information units in realization
   specifications can be realized exclusively as one or
   another elementary tree as defined by a suitable
   TAG, i.e. linguistic criteria can be used in
   determining the proper modularity of the
   conceptual structure.2
2. Conversely, for any textual relationship which our
   generator would derive by the attachment of
   multiple information units into a single package,
   there is a corresponding rule of adjunction. Since
   we use attachment in the realization of nominal
   compounds like "oil tanker", this has the force of
   extending the domain of TAG analyses into
   morphology. (See section 7).
4. The Place of Grammar in a Theory of Generation
To understand why we are looking at TAG's rather
than some other formalism, one must first understand the
role of grammar within our processing model. The following
is a brief summary of the model; a more complete
description can be found in McDonald & Pustejovsky 
1 Our model of generation does not employ the trees of
labeled nodes that appear in most theoretical linguistic analyses. Our
surface structure incorporates the semantic properties of trees, but it
also includes reifications of constituent positions like "subject" or
"sentence" and is better characterized overall as an "executable
sequence of labeled positions". We discuss this further in section 5.1.
2 If this hypothesis is successful, it has very consequential
implications for the "size" of the information units that the text
planner constructs: they would not be realized as units that include
recursive nodes. We will discuss this and other implications in a
later paper.
We have always had two complementary goals in our
research: on the one hand our generation program has had
to be of practical utility to the knowledge based expert
systems that use it as part of a natural language interface.
This means that architecturally our generator has always
been designed to produce text from conceptual
specifications, "plans", developed by another program and
consequently has had to be sensitive to the limitations and
varying approaches of the present state of the art in
conceptual representation.
At the same time, we want the architecture of the
virtual machine that we abstract out of our program to be
effective as a source of psycholinguistic hypotheses about
the actual generation process that humans use; it should,
for example, provide the basis for predictive accounts of
human speech error behavior and apparent planning
limitations. To achieve this, we have restricted ourselves to
a highly constrained set of representations and operations,
and have adopted strong and suggestive stipulations on our
design such as high locality, information encapsulation,
online quasi-realtime runtime performance, and indelibility.3
This restricts us as programmers, but disciplines us as
theorists.
We see the process of generation as involving three
temporally intermingled activities: (1) determining what goals
the utterance is to achieve, (2) planning what information
content and rhetorical force will best meet those goals given
the context, and (3) realizing the specified information and
rhetorical intent as a grammatical text. Our linguistic
component (henceforth LC), the Zetalisp program MUMBLE,
handles the third of these activities, taking a "realization
specification" as input, and producing a stream of
morphologically specialized words as output.
As described in [McDonald 1984], LC is a
"description-directed" process: it uses the structure of the
realization specification it is given, plus the syntactic surface
structure of the text in progress (which it extends
incrementally as the specification is realized) to directly
control its actions, interpreting them as though they were
sequential computer programs. This technique imposes
strong demands on the descriptive formalism used for
3 "Indett, iaty" in a compmattoa requm= that m a~oe o4 • 
pro=m (matml dmmm. cee~-mml repmmmatiom. ~ ~m. 
ctg.) call be ~ tmdom olgg it has beta pegtonm& Maw/ 
mmbacMrackiag, mra~l pml~lm dem~ ha~ tim property; it is 
our tam for wdmt ~ \[Lel~ I rdermd to m tim Ixepany o( tXmlg 
4 A realbams ~dfka~oa m Jar, rurally be ,-~-~ m 
m w~ tmmy r~sndm~, ~ ~ t~t ~ -" tim 
"me~aSo le~:l" ~~ ~ • tat. 
5 Whigh m m my that it pemmtly ~ meitt~8 mtha 
~m tats. We expect m m~t mtb ~ ompm ~, 
~, 8nd tl~ amd to ,,Wm~ tl~ mpt~mmm~ I~m e~ m 
tnmeatimud mmo~ ~ ~ to ma m~ dmSm fee 
mamimency pattern ht mrfam mmctme. 
representing surface structure. For example, nodes and
category labels now designate actions the generator is to
take (e.g. imposing scoping relations or constraining
embedded decisions) and dictate the inclusion of function
words and morphological specializations.
4.1 Unbundling Systemic Grammars
Of the established linguistic formalisms, systemic
grammar [Halliday 1976] has always been the most
important to AI researchers on generation. Two of the
most important generation systems that have been
developed, PROTEUS [Davey 1974] and NIGEL [Mann &
Matthiessen 1983], use systemic grammar, and others,
including ourselves, have been strongly influenced by it.
The reasons for this enthusiasm are central to the special
concerns of generation. Systemic grammars employ a
functional vocabulary: they emphasize the uses to which
language can be put--how languages achieve their speakers'
goals--rather than its formal structure. Since the generation
process begins with goals, unlike the comprehension process
which begins with structure, this orientation makes systemic
grammars more immediately useful than, for example,
transformational generative grammars or even procedurally
oriented AI formalisms for language such as ATN's.
The generation researcher's primary question is why use
one construction rather than another--active instead of
passive, "the" instead of "a". The principal device of a
systemic grammar, the "choice system", supports this
question by highlighting how the constructions of the
language are grouped into sets of alternatives. Choice
systems provide an anchoring point for the rules of a
theory of language use, since it is natural to associate the
various semantic, discourse, or rhetorical criteria that bear
on the selection of a given construction or feature with the
choice system to which the construction belongs, thus
providing the basis of a decision-procedure for selecting
from its listed alternatives; the NIGEL system does exactly
this in its "chooser" procedures.
In our formalism we make use of the same information
as a systemic grammar captures; however, we have chosen to
bundle it quite differently. The underlying reason for this is
that our concern for psycholinguistic modeling and efficient
processing takes precedence in our design decisions about
how the facts of language and language use should be
represented in a generator. It is thus instructive to look at
the different kinds of linguistic information that a network
of choice systems carries. In our system we distribute these
to separate computational devices.
o Dependencies among structural features: A generator
  must respect the constraints that dependencies impose
  and appreciate the impact they have on its
  realization options: for example that some
  subordinate clauses can not express tense or modality
  while main clauses are required to; or that a
  pronominal object forces particle movement
  while a lexical object leaves it optional.
o Usage criteria. The decision procedures associated
  with each choice system are not a part of the
  grammar per se, although they are naturally
  associated with it and organized by it. Also most
  systemic grammars include very abstract features such
  as "generic reference" or "completed action", which
  relate the language's surface features, and
  thus are more controllers of why a construct is used
  rather than constructs themselves.
o Coordinated structural alternatives. A sentence may
  be either active or passive, either a question or a
  statement. By grouping these alternatives into
  systems and using these systems exclusively when
  constructing a text, one is guaranteed not to
  combine inconsistent structural features.
o Efficient ordering of choices. The network that
  connects choice systems provides a natural path
  between decisions, which if followed strictly
  guarantees that a choice will not be made unless it
  is required, and that it will not be made before any
  of the choices that it is itself dependent upon,
  insuring that it can be made indelibly.
o Typology of surface structure. Almost by accident
  (since its specification is distributed throughout all of
  the systems implicitly), the grammar determines the
  pattern of dominance and constituency relationships
  of the text. While not a principle of the theory,
  the trees of clauses, NPs, etc. in systemic grammars
  tend to be shallow and broad.
We believe, but have not yet established, that
equivalence transformations can be defined that would take
a systemic grammar as a specification to construct the
alternative devices that we use in our generator (or
augment devices that derive from other sources, e.g. a
TAG) by decomposing the information in the systemic
grammar along the lines just described and redistributing it.
5. Example Analyses
One of the task domains we are currently developing
involves newspaper reports of current events. We are
"reverse engineering" leading paragraphs from actual
newspaper articles to produce narrow but complete
conceptual representations, and then designing realization
specifications--plans--that will lead our LC to reconstruct
the original text or motivated variations on it. We have
adopted this domain because the news reporting task, with
its requirement of communicating what is new and
significant in an event as well as the event itself, appears
to impose exceptionally rich constraints on the selection of
what conceptual information to report and on what
syntactic constructions to use in reporting it (see discussion
in Clippinger & McDonald [1983]). We expect to find out
how much complexity a realization specification requires in
order to motivate such carefully composed texts; this will
later guide us in designing a text planner with sufficient
capabilities to construct such specifications on its own.
Our examples are drawn from the text fragment below
(Associated Press, 12/23/84); the realization specification we
use to reproduce the text follows.
"LONDON. Two oil tankers, the Norwegian-owned
Thorshavet and a Liberian-registered vessel, were
reported to have been hit by missiles Friday in the
Gulf.
The Thorshavet was ablaze and under tow to
Bahrain, officials in Oslo said. Lloyds reported that
two crewmen were injured on the Liberian ship."
(ttweay" s.ever~me.C~-tar~er-war 
~v~Oon.as.to-e~gce 
(m~evem #<urm~ern-tym_vary~vaU~ 
#<tgt.oy-nmgks Ymnmgvet> 
#<llt-Oy~ t.lbm~> > 
i #~.of-m 2> 
tmr~y.m ) 
(pareetm~ #~ Ttumtuvm Osto-ofltc~a> 
#~ Lbemn Uo~> )) 
This realization specification represents the structured 
object which gives the toplevel plan for this utterance. 
Symbols preceded by colons indicate particular features of
the utterance. The two expressions in parentheses are the
content items of the specification and are restricted to
appear in the utterance in that order. The first symbol in
each expression is a label indicating the function of that
item within the plan; embedded items appearing in angle
brackets are information units from the current-events
knowledge base.
Obviously this plan must be considerably refined before
it could serve as a proximal source for the text; that is
why we point out that it is a "toplevel" plan. It is a
specification for the general outline of the utterance which
must be fleshed out by recursive planning once its
realization has begun and the LC can supply a linguistic
context to further constrain the choices for the units and
the rhetorical features.
For present purposes, the key fact to appreciate about
this realization specification is how different it is in form
from the surface structure. One cannot produce the cited
text simply by traversing and "reading out" the elements of
the specification as though one were doing direct
production. Substantial rearrangements are required, and
these must be done under the control of constraints which
can only be stated in linguistic vocabulary with terms like
"subject" or "raising".
The first unit in the specification, #<same-event-type..>,
is a relation over two other units. It indicates that a
commonality between the two has been noticed and deemed
significant in the underlying representation of the event.
The present LC always realizes such relations by merging
the realizations of the two units. If nothing else occurred,
this would give us the text "Two oil tankers were hit by
missiles".
As it happens, however, a pending rhetorical constraint
from the realization specification,
~v 8wto-sotm~ 
will force the addition of yet another information unit,6 the
reporting event by the news service that announced the
alleged event (e.g. a press release from Iraq, Reuters, etc.).
In this case the "content" of the reporting event is the two
reports which have already been planned for
inclusion in the utterance as part of the "particulars" part
of the specification. Let us look closely at how that
reporting event unit is folded into surface structure.
When not itself the focus of attention, a reporting
event is typically realized as "so-and-so said X", that is, the
content of the report is more important than the report
itself; whatever significance the report or its source has as
news will be indicated subtly through which of the
alternative realizations below is selected for it.7
Desired characteristic       Resulting text
de-emphasize report          Two tankers were hit, Gulf shipping sources said.
source is given elsewhere    Two tankers were reported hit.
emphasize report             Iraq reported it hit two tankers.
Figure 2 Possibilities for expressing report(source, info) in
newspaper prose
In our LC, these alternative "choices" are grouped
together into a "realization class" as shown in Figure 3.
Our realization classes have their historic origins in the
choice systems of systemic grammar, though they are very
different in almost every concrete detail. The most
important difference of interest theoretically is that while
systemic choice systems select among single alternative
features (e.g. passive, gerundive), realization classes select
among entire surface structure fragments at a time (which
might be seen as prespecified selections of bundles of
features). That is, our approach to generation calls for us
to organize our decision procedures so as to select the
values for a number of linguistic features simultaneously in
one choice where a systemic grammar would make the
selection incrementally.8
:parameters (agent proposition verb)
:choices
(( (AGENT-VERBs-that-PROP agent verb prop) clause focus(agent) emphasize(self) )
; e.g. "Lloyds reports Iraq hit two tankers."
; encompasses variations with and without that, and
; also tenseless complements like "John believes him
; to be a fool."
( (raise-VERB-into-PROP (passivize verb) prop)
clause focus(prop) mentioned-elsewhere(agent) )
; "Two tankers were reported to have been hit"
( (it-VERB-PROP verb prop)
~em~(a~nt} ) 
; e.g. "It is reported that 2 tankers were hit."
( Oe~t~P~OP aomt veto ~mv) 
; "Two tankers were hit, Gulf sources said." 
))
Figure 3 The believe-verbs realization class
Returning to our example, we are now faced with
the need to incorporate a unit denoting the report of the
Iraqi attacks into the utterance to act as a certification of
the #<hit-by-missiles> events. This will be done using the
realization class believe-verbs; the class is applicable to any
information unit of the form report(source, info) (and
others). It determines the realization of such units both
when they appear in isolation and, as in the present case,
when they are to augment an utterance corresponding to
one of their arguments.
From this realization class the choice
raise-VERB-into-PROP will be selected since (1) the fact that
two ships were hit is most significant, meaning that the
focus will be on the information and not the source (n.b.
when the class executes the source will be bound to its
agent parameter and the information about the missile hits
to the proposition parameter); (2) there is no rhetorical
motivation for us to occupy space in the first sentence with
the sources of the report since they have already been
planned to follow. These conditions are sensed by attached
procedures associated with the characteristics that annotate
the choice (i.e. focus and mentioned-elsewhere).
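The selection just described can be pictured as a list of choices, each annotated with characteristics whose attached procedures test the current situation. The sketch below is only illustrative: the `Choice` class, the dictionary-based planning context, and the first-match selection policy are assumptions made for this example, not a description of how MUMBLE actually evaluates a realization class. Only the two choice names and the characteristic names focus and mentioned-elsewhere come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]
Characteristic = Callable[[Context], bool]


@dataclass
class Choice:
    """One alternative of a realization class with its annotating characteristics."""
    name: str
    characteristics: List[Characteristic]


def focus(parameter: str) -> Characteristic:
    """True when the (hypothetical) planning context puts the focus on parameter."""
    return lambda context: context["focus"] == parameter


def mentioned_elsewhere(parameter: str) -> Characteristic:
    """True when parameter is already planned to appear later in the text."""
    return lambda context: parameter in context["planned_later"]


BELIEVE_VERBS = [
    Choice("AGENT-VERBs-that-PROP", [focus("agent")]),
    Choice("raise-VERB-into-PROP", [focus("prop"), mentioned_elsewhere("agent")]),
]


def select(choices: List[Choice], context: Context) -> str:
    """Return the first choice whose characteristics all hold (assumed policy)."""
    for choice in choices:
        if all(test(context) for test in choice.characteristics):
            return choice.name
    raise LookupError("no choice is compatible with the context")


# The tanker example: the hits are in focus, the sources are planned to follow.
context = {"focus": "prop", "planned_later": {"agent"}}
print(select(BELIEVE_VERBS, context))
```

Because each choice names a whole surface structure fragment, a single successful test sequence fixes several linguistic features at once, which is the contrast with incremental feature selection drawn above.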
6 We will not ~ the ~ by whgh featu~ in th~ 
spe(~matJon infhgn~ r~-W-=tmn. Rgatisat~on apug/ficau of th~ 
compka~ty of th/s exampks aru still very n~w in ou~ ~ and we 
am umu~ wlgtbcg tl~ ~ is t~tt~ ~ st th~ ~awmal 
dim•inS • compomi~ pngm, imia t~ Mmmmq 
(during oo~ o4' th~ B immgst~m) or mthin tbo LC mmJ/sbnl 
• --,--,~- ~ ami~pm~t alm-ut/~m. At 0m ~ ow 
~m m'~ immuglum~. 
7 "1"l gin za--~,,'- mm atl~lg~; actual oam ~ be ..... 
m~ u'ff wU~ffm~do mot ~m~ia my of dm umta ~m 
havu czammecl. P~luq~ tim "1cut N1 w p~tiou ts mo mlxmam m 
mum on a pronoun. 
8 The technique of using choice systems to control the selection
of interacting features is employed by the most well-known
applications of systemic grammars to generation (i.e. the work of
Davey [1974] and Mann & Matthiessen [1983]). However, very recent
work with systemic grammars at Edinburgh by Patten [1985]
departs from this design. Patten uses a semantic-level planning
component to directly select groups of features at the rightward,
"output", side of a systemic network, and then works backwards
through the network to determine what other, not yet specified
features must be added to the text for it to be grammatical; control
is thus not in the grammar proper, with grammar rules relegated to
constraint specification only. We are intrigued by this technique and
look forward to its further development.
Since the PROP is already in place in the surface
structure tree, the LC will be interpreting
raise-VERB-into-PROP as a specification of how it may fold
the auxiliary tree for reported into the tree for Two oil
tankers were hit by missiles Friday in the Gulf. This
corresponds to the TAG analysis in Figure 4 [Kroch &
Joshi 1985].
Initial Tree (I1):
[S [NP two tankers] [INFL' (A2) [INFL be] [VP hit by missiles]]]
Auxiliary Tree (A2):
[INFL' [INFL be] [VP reported INFL']]
Figure 4 Initial and auxiliary trees for raising to subject
The initial tree for Two oil tankers were hit by missiles, I1,
may be extended at its INFL' node as indicated by the
constraint given in parentheses by that node. Figure 5
shows the tree after the auxiliary tree A2, named by that
constraint, has been adjoined. Notice that the original
INFL' of Figure 4 is now in the complement position of
report, giving us the sentence Two oil tankers were reported
to have been hit by missiles.
[S [NP two tankers]
   [INFL' [INFL be]
          [VP reported [INFL' [INFL be] [VP hit by missiles]]]]]
Figure 5 After embedding report
5.1 Path Notation
As readers of any of our earlier papers are aware, we
do not employ a conventional tree notation in our LC. A
generation model places its own kinds of demands on the
representation of surface structure, and these lead to
principled departures from the conventions adopted by
theoretical linguists. Figure 6 shows the surface structure as
our LC would actually represent it just before the moment
when the adjunction is made.
... --> [SENTENCE] --> ...
[SUBJECT] --------------> [PREDICATE]
NP (plural)               O Attach-...-Predicate
                          #<hit by missiles>
[quant] ---> [head]
two          N
             [premod] ---> [head]
             oil           tanker
Figure 6 Surface structure in path notation
We call this representation path notation because it
defines the path that our LC follows. Formally the structure
is not a tree but a unidirectional linked list whose formation
rules obey the axioms of a tree (e.g. any path "down"
through a given node must eventually pass back "up"
through that same node). The path consists of a stream of
entities representing phrasal nodes, constituent positions
(indicated by square brackets), instances of information units
(in boldface), instances of words, and activated attachment
points (the labeled circle under the predicate; see next
section). The various symbols in the figure (e.g. sentence,
predicate, etc.) have attached procedures that are activated
as the point of speech moves along the path, a process we
call "phrase structure execution". Phrase structure execution
is the means by which grammatical constraints are imposed
on embedded decisions and function words and grammatical
morphemes are produced (for discussion see McDonald
[1984]).
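A rough picture of phrase structure execution is a one-way linked list whose elements carry procedures that fire as the point of speech passes them. The sketch below is a simplified illustration under assumed names (`Position`, `execute`, `mark_plural`, `head_noun`); the state dictionary and the naive "-s" plural rule are inventions for the example, not details of the LC.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, str]
Procedure = Callable[[State, List[str]], None]


@dataclass
class Position:
    """One entity on the path: a phrasal node, constituent position, or word."""
    label: str
    procedure: Optional[Procedure] = None
    next: Optional["Position"] = None


def link(positions: List[Position]) -> Position:
    """Chain the positions into a one-way list and return its first element."""
    for here, there in zip(positions, positions[1:]):
        here.next = there
    return positions[0]


def execute(start: Position) -> List[str]:
    """Move the point of speech along the path, running attached procedures."""
    state: State = {}
    output: List[str] = []
    position: Optional[Position] = start
    while position is not None:
        if position.procedure is not None:
            position.procedure(state, output)
        position = position.next
    return output


def mark_plural(state: State, output: List[str]) -> None:
    """A phrasal node imposing a constraint on decisions embedded below it."""
    state["number"] = "plural"


def say(word: str) -> Procedure:
    return lambda state, output: output.append(word)


def head_noun(stem: str) -> Procedure:
    """Emit the noun, adding a plural morpheme if the enclosing NP requires it."""
    def procedure(state: State, output: List[str]) -> None:
        plural = state.get("number") == "plural"
        output.append(stem + "s" if plural else stem)
    return procedure


path = link([
    Position("[SUBJECT]"),
    Position("NP (plural)", mark_plural),
    Position("[quant]", say("two")),
    Position("[premod]", say("oil")),
    Position("[head]", head_noun("tanker")),
])
print(execute(path))
```

The traversal never revisits or rewrites an earlier position, which is one simple way to read the indelibility stipulation: output, once appended, is not retracted.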
Once one has begun to think of surface structure as a
traversal path, it is a short step to imagining being able to
cut the path and "splice in" additional position sequences.9
This splicing operation inherits a natural set of constraints
on the kinds of distortions that it can perform, since, by
the indelibility stipulation, existing position sequences can
not be destroyed or rethreaded. It is our impression that
these constraints will turn out to be formally the same as
those of a TAG, but we have not yet carried out the
detailed analysis to confirm this.
9 The poml~lit7 of ~tdnS tbo mrf-,~ m-.~re and mm, s~os 
,al,-~-- ~ ~ ms ~ mn~ of t~m~m .lrcady in 
has ~ in our theory oL I~n~ml u t978, Wk..- We used 
it m ~ ntimS v~be whom rbetmk~ form mm the ~ 8s 
"b,~ uh.,~= I/ko ~. 0~" p,',=.m. =.,~ rare =~m~e 
ua8 o( tim ~ m tbo ~ of u dmlnm attachmem ~ dates 
from ths ~ ot t~. 
10 Conm~. ~ Llmsm uean movabou ~ in ~ & 
\[1985\]. lhvviom m of TAO theory ailawed "~t~t 
mmatint qmafimtiom ~at it fact ~ am~ mpimmd. 
Th8 prtmm c~mmims ~ we attrtcdve foma~ ,~ tt~ nat 
be muml IccaUy m a .~Je trm. 
5.2 Attachment Points
The TAG formalism allows a grammar writer to define
"constraints" by annotating the nodes of elementary trees
with lists indicating what auxiliary trees may be adjoined to
them (including "any" or "none").10 In a similar manner
the "choices" in our realization classes--which by our
hypothesis can be taken to always correspond to TAG
elementary trees--include specifications of the attachment
points at which new information units can be integrated
into the surface structure path they define. Rather than
being constraints on an otherwise freely applying operation,
as in a TAG, attachment points are actual objects
interposed in the path notation of the surface structure. A
list of the attachment points active at any moment is
maintained by the attachment process and consulted
whenever an information unit needs to be added. Most
units could be attached at any of several points, with the
decision being made on the basis of what would be most
consistent with the desired prose style (cf. McDonald &
Pustejovsky [1985a]). When one of the points is selected it is
instantiated, usually splicing in new surface structure in the
process, and the new unit added at a designated position
within the new structure. Figure 7 shows our present
definition of the attachment point that ultimately leads to
the addition of "was reported".
referenco-voV~ 
( mnmO-vem-w~ ) ime~ae~-atumewem-poee 
( (sctu~-mt "~,~ste peru•} 
nm~rsas4mJ~j~ (~ 
(v0-~mlv~) ; specification of new phrase 
veto ; where the unit being an~.bed goes 
~n~rdt~~} ; when~ the eximng ccutunts go ~fec~-an~Uw-m,~aXt~--,~um-~mm 
~,~em-0aasm~um 0net~m-em "Tms~me)) 
Figure 7 The attachment-point used by was reported
This attachment point goes with any choice (elementary
tree) that includes a constituent position labeled predicate.
It is placed in the position path immediately after (or
"under") that position (see Figure 6), where it is available
to any new unit that passes the indicated requirements.
When this attachment is selected, it builds a new VP
node that has the old VP as one of its constituents, then
splices this new node into the path in its place as shown in
Figure 7.
The unit being attached, e.g. the report of the attack
on the two oil tankers, is made the verb of the new VP.
Later, once the phrase structure execution process has
walked into the new VP and reached that verb position,
the unit's realization class (believe-verbs) will be consulted
and a choice selected that is consistent with the
grammatical constraints of being a verb (i.e. a conventional
variant on the raise-VERB-into-PROP choice), giving us
... --> [SENTENCE] --> ...
[SUBJECT] --------------> [PREDICATE]
NP
two oil tankers           [verb] ---> [infinitive-complement]
                          report      #<hit by missiles>
Figure 8 The path after attachment
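The effect of instantiating this attachment point can be sketched as a small re-embedding operation: the existing contents of the predicate position are kept intact and placed inside a newly built phrase, with the attached unit in the verb position. The classes and the function name below are hypothetical, chosen only to mirror the labels in the figures; information units are stood in for by plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Phrase:
    """A phrasal node with an ordered sequence of constituent positions."""
    category: str
    slots: List["Slot"] = field(default_factory=list)


@dataclass
class Slot:
    """A labeled constituent position holding a phrase or an information unit."""
    label: str
    contents: Union[Phrase, str]


def attach_as_raising_verb(predicate: Slot, new_unit: str) -> None:
    """Splice a new VP into the predicate position around its old contents."""
    old_contents = predicate.contents  # kept as is: nothing is destroyed
    predicate.contents = Phrase("VP", [
        Slot("verb", new_unit),                        # where the new unit goes
        Slot("infinitive-complement", old_contents),   # where the old contents go
    ])


predicate = Slot("predicate", "#<hit by missiles>")
attach_as_raising_verb(predicate, "#<report>")
print([slot.label for slot in predicate.contents.slots])
```

Note that the operation only adds structure above the old contents; this mirrors the observation that the splicing operation cannot destroy or rethread existing position sequences.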
From this discussion one can see that our treatment of
attachment uses two structures, an attachment point and a
choice, where a TAG would only use one structure, an
auxiliary tree. This is a consequence of the fact that we
are working with a performance model of generation that
must show explicitly how conceptual information units are
rendered into texts as part of a psycholinguistically plausible
process, while a TAG is a formalism for competence
theories that only need to specify the syntactic structures of
the grammatical strings of a language. This is a significant
difference, but not one that should stand in our way in
comparing what the two theories have to offer each other.
Consequently in the rest of this paper we will omit the
details of the path notation and attachment point definitions
to facilitate the comparison of theoretical issues.
6. Generating questions using a TAG version of
wh-movement
Earlier we illustrated the TAG concept of "linking" by
showing how one would start with an initial tree consisting
of the innermost clause of a question plus the fronted
wh-phrase and then build outward by successively adjoining
the desired auxiliary phrases to the S node that intervenes
between the wh-phrase and the clause. Wh-questions are
thus built from the bottom up, as in fact is any sentence
involving verbs taking sentential complements.
This analysis has the desirable property of allowing us
to state the dependencies between the Wh-phrase and the
gap as a local relation on a single elementary tree,
eliminating the need to include any machinery for
movement in the theory. All unbounded dependencies now
derive from adjunctions (which, as far as the grammar is
concerned, can be made without limit), rather than from the
explicit migration of a constituent across clauses.
We also find this locality property to be desirable, and
use an analogous representation in our treatment of questions
and other kinds of Wh-questions and unbounded dependency
constructions.
This design has consequences for how the
realization specifications for these constructions must be
organized. In particular, the logician's usual representation
of sentential complement verbs as higher operators is not
tenable in that role. For example we cannot have the
source of, say, How many ships did Reuters report that Iraq
had said it attacked? be a single expression, with the lambda
operator outermost and the report, say, and attack predicates
nested successively inside it, when used as a realization
specification. Such an expression would have us
realize the lambda operator first, the report
second, the say third, and so on. A local TAG analysis of
Wh-movement requires us to have the Lambda and the attack in
a single "layer" of the specification, otherwise we would be
forced to violate one of the defining principles of our theory
of generation, namely that the choices in a
realization class may "see" only the immediate arguments of
the unit being realized; they may not look "inside" those
arguments to subsequent levels of conceptual structure.
This principle has served us well, and we are reluctant
to give it up without a very compelling reason.
We decided instead to give up the internal representation of
sentential complement verbs as single expressions. This
move was easy for us to make since such expressions are
awkward to manipulate in the "East Coast" style frame
knowledge bases that we use in our own reasoning systems,
and we have preferred a relational style
with redundant conceptual units for quite
some time.
The representation we use instead amounts to breaking
up the logical expression into individual units, and allowing
them to include references to each other.
U1 = lambda(quantity-of-ships) . attack(Iraq, quantity-of-ships)
U2 = say(Iraq, U1)
U3 = report(Reuters, U2)
Given such a network as the realization specification,
the LC must have some principle by which to judge where
to start: which unit should form the basis of the surface
structure to which the others are then attached? A natural
principle to adopt is to start with the unit
that does not mention any other units in its definition.
We are considering adopting the policy that such units
should be allowed only realizations as initial trees, while
units whose definition involves "pointing to" other
units should be allowed only realizations as auxiliary trees.
We have not, however, worked through all of the
ramifications such a policy might have on other parts of
our generation model; without yet knowing whether it would
improve or degrade the other parts of our theory,
we are reluctant to assert it as one of our hypotheses
relating our generation model to TAG's.
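To make the policy under consideration concrete, here is a minimal Python sketch. It assumes a simplified encoding of the three units (the lambda binder of U1 is left out, and each unit is just a predicate with a list of arguments); the names `UNITS`, `references`, `classify`, and `attachment_order` are hypothetical and introduced only for this illustration.

```python
# Each unit: (predicate, arguments). An argument that names another unit
# is a reference to it. The lambda binder on U1 is omitted for brevity.
UNITS = {
    "U1": ("attack", ["Iraq", "quantity-of-ships"]),
    "U2": ("say", ["Iraq", "U1"]),
    "U3": ("report", ["Reuters", "U2"]),
}


def references(name, units):
    """Other units mentioned in this unit's definition."""
    return [arg for arg in units[name][1] if arg in units]


def classify(units):
    """Units that mention no other unit are candidates for initial trees;
    units that point to other units are candidates for auxiliary trees."""
    return {
        name: "auxiliary" if references(name, units) else "initial"
        for name in units
    }


def attachment_order(units):
    """Start from the unit with no references, then repeatedly take the
    units whose references have all been placed already."""
    placed, pending = [], list(units)
    while pending:
        ready = [n for n in pending
                 if all(r in placed for r in references(n, units))]
        if not ready:
            raise ValueError("circular references among units")
        placed.extend(ready)
        pending = [n for n in pending if n not in ready]
    return placed


print(classify(UNITS))
print(attachment_order(UNITS))
```

Under this encoding U1 is the only unit with no outgoing references, so it is the starting point, and U2 and U3 follow in the order in which their referents become available.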
Given this representation, the realization of the
question is fairly straightforward (see Figure 9). The
Lambda expression is assigned a realization class for clausal
Wh constructions, whereupon the extracted argument
quantity-of-ships is placed in COMP, and the body of the
expression is placed in the HEAD position. At the same
time, the two instances of quantity-of-ships are specially
marked. The one in COMP is assigned to the realization class
for wh phrases appropriate to quantity (e.g. it will
have the choice how many X and possibly related choices
such as <quantity> of which and other variants appropriate
to relative clauses or other positions where Wh constructions
can be used). Simultaneously the instance of
quantity-of-ships in the argument position of the head frame
attack is assigned to the realization class for Wh-trace.
These two specializations are the equivalent, in our model,
of the TAG linking relation.
Figure 9  Question formation with sentential complement (labels legible in the source: "Reuters reports", S, COMP, WH(ships), [Iraq attack e])
The two pending units, U2 and U3, are then attached
to this matrix, submerging first the original unit and then U2
into complement positions.
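The bottom-up construction can be sketched as follows. This is a schematic illustration only: it ignores tense, auxiliary inversion, and the choice of wh-phrase, and the function names `initial_question`, `adjoin_at_s`, and `render` are assumptions made for this example rather than anything in the authors' system.

```python
def initial_question(wh_phrase, clause):
    """Initial tree: the fronted wh-phrase in COMP plus the innermost
    clause, which already contains the trace 'e'."""
    return {"COMP": wh_phrase, "S": clause}


def adjoin_at_s(tree, matrix):
    """Adjoin a matrix clause at the S node between COMP and the clause.
    The existing S is submerged as the matrix clause's complement;
    COMP and the trace are left untouched."""
    return {"COMP": tree["COMP"], "S": [matrix, tree["S"]]}


def linearize(s):
    """Flatten the nested S structure into a bracketed string."""
    if isinstance(s, str):
        return s
    matrix, complement = s
    return f"{matrix} [{linearize(complement)}]"


def render(tree):
    return f"{tree['COMP']} [{linearize(tree['S'])}]"


tree = initial_question("WH(ships)", "Iraq attack e")   # from U1
tree = adjoin_at_s(tree, "Iraq say")                    # attach U2
tree = adjoin_at_s(tree, "Reuters report")              # attach U3
print(render(tree))
```

The relation between the wh-phrase and its trace is fixed once, in the initial tree; each later adjunction only pushes the existing clause one level deeper.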
7. Extensions to the Theory of TAG
Context-free grammars are able to express the word
formation processes that seem to exist for natural languages
(cf. Williams [1981], Selkirk [1982]). A TAG analysis of such
a grammar seems like a natural application to the current
version of the theory (cf. Pustejovsky (in preparation)). To
illustrate our point, consider compounding rules in English.
We can say that for a context-free grammar for word
formation, Gw, there is a TAG, Tw, that is equivalent to
Gw (cf. Figures 10 and 11). Consider a fragment of Gw
below. [1]
[1] Cf. Langendoen [1981] for the generative capacity of natural language word formation
components.
N -> {N | A | V | P} N
A -> {N | A | P} A
V -> P V
Figure 10  CFG Fragment for Word Formation
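The rule fragment can be read as a table from the head (right-hand) category to the categories allowed on its left. The short Python sketch below encodes it that way; `RULES`, `compound_category`, and `category_of` are hypothetical names, and the left-branching fold is a simplifying assumption made for this illustration.

```python
from functools import reduce

# Head category -> categories permitted as the left member of the compound.
RULES = {
    "N": {"N", "A", "V", "P"},   # N -> {N | A | V | P} N
    "A": {"N", "A", "P"},        # A -> {N | A | P} A
    "V": {"P"},                  # V -> P V
}


def compound_category(left, right):
    """Category of the compound [left right], or None if no rule applies.
    The right-hand member is the head and fixes the category."""
    return right if left in RULES.get(right, set()) else None


def category_of(categories):
    """Category of a left-branching compound such as [[N N] N]."""
    def step(left, right):
        return None if left is None else compound_category(left, right)
    return reduce(step, categories)


print(category_of(["N", "N", "N"]))   # e.g. oil tanker terminal
print(compound_category("V", "A"))    # no rule in the fragment
```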
The TAG Tw fragment would be:
AUXILIARY TREES:  [N comp N]   [A comp A]   [V P V]
INITIAL TREES:    [N oil]   [N tanker]   [N terminal]
Figure 11  TAG Fragment for Word Formation
Now consider the compound "oil tanker terminal", taken
from the newspaper reporting domain, and its derivation in
TAG theory, shown in Figure 12.
Figure 12  TAG derivation of oil tanker terminal
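A derivation of this kind can be approximated with nested tuples. The sketch below is an assumption-laden illustration: it treats the auxiliary tree [N comp N] as wrapping an existing N, and it fills the comp slot with another tree, which is a simplification of this sketch rather than a claim about the formalism. The names `adjoin_compound` and `words` are hypothetical.

```python
def adjoin_compound(head, modifier):
    """Adjoin the auxiliary tree [X comp X] at `head`: the result has the
    head's category, with `modifier` in comp and the old head at the foot."""
    category = head[0]
    return (category, modifier, head)


def words(tree):
    """Read the words off the leaves, left to right."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree[1]]
    return [w for subtree in tree[1:] for w in words(subtree)]


# Initial trees, written as (category, word).
oil = ("N", "oil")
tanker = ("N", "tanker")
terminal = ("N", "terminal")

# Adjoin at "terminal" with "tanker" in comp, then adjoin inside comp.
oil_tanker = adjoin_compound(tanker, oil)
derived = adjoin_compound(terminal, oil_tanker)
print(derived)
print(" ".join(words(derived)))
```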
Let us compare this derivation to the process used by
the LC. The underlying information units from which this
compound is derived in our system are those below. The
planner has decided that the units below need to be
communicated in order to adequately express the concept.
The top-level unit in this bundle is <terminal>.
U1 ... U5 (unit definitions not legible in the source)
The first unit to be positioned in the surface structure is
U1, and appears as the head of an NP. There is an
attachment point on this position, however, which allows for
the possibility of attaching U2 prenominally. One of the
choices associated with this unit is a compound
expressed in terms of an auxiliary tree. A
snapshot at this point in the derivation shows the following
structure.
[[u2] u1]
The next unit opened up in this structure is U3, which also
allows for attachment prenominally. Thus an auxiliary
corresponding to U4 is introduced, giving us the structure
below:
[[[u4] u3] u1]
The selectional constraints imposed by the structural
positioning of information unit U4 allow only a
compounding choice. Had there been no word-level
compound realization option, we would have worked our
way into a corner without expressing the relation between
<oil> and <tanker>. Because of this it may be better
to view units such as U4 as being associated directly with a
lexical compound form, i.e. oil tanker. This partial
solution, however, would not speak to the problem of active
word formation in the language. Furthermore, it would be
interesting to compare the strategic decisions made by a
generation system with those planning decisions made by
humans when speaking. This is an aspect of generation that
merits much further research.
8. Acknowledgements 
This research has been supported in part by
contract N0014-85-K-0017 from the Defense Advanced
Research Projects Agency. We would like to thank Marie
Vaughan for help in the preparation of this text.
9. References 
Clippinger & McDonald (1983) "Why Good Writing is
Easier to Understand", Proc. IJCAI-83, pp. 730-732.
Davey (1974) Discourse Production, Ph.D. Dissertation,
Edinburgh University; published in 1979 by Edinburgh
University Press.
Halliday (1976) System and Function in Language, Oxford
University Press.
Joshi (1983) "How Much Context-Sensitivity is Required to
Provide Reasonable Structural Descriptions: Tree
Adjoining Grammars", preprint to appear in Dowty,
Karttunen & Zwicky (eds.) Natural Language Processing:
Psycholinguistic, Computational, and Theoretical
Perspectives, Cambridge University Press.
Kroch, T. and A. Joshi (1985) "The Linguistic Relevance of
Tree Adjoining Grammar", University of Pennsylvania,
Dept. of Computer and Information Science.
Langendoen, D.T. (1981) "The Generative Capacity of
Word-Formation Components", Linguistic Inquiry,
Volume 12.2.
Mann & Matthiessen, "Nigel: A Systemic Grammar for
Text Generation", in Freedle (ed.) Systemic Perspectives
on Discourse, Ablex.
Marcus (1980) A Theory of Syntactic Recognition for Natural
Language, MIT Press.
McDonald (1984) "Description Directed Control: Its
Implications for Natural Language Generation", in
Cercone (ed.) Computational Linguistics, Pergamon
Press.
McDonald & Pustejovsky (1985a) "SAMSON: a
computational theory of prose style in generation",
Proceedings of the 1985 meeting of the European
Association for Computational Linguistics.
(1985b) "Description-Directed Natural
Language Generation", Proceedings of IJCAI-85,
W. Kaufmann Inc., Los Altos CA.
Patten T. (1985) "A Problem Solving Approach to
Generating Text from Systemic Grammars", Proceedings
of the 1985 meeting of the European Association for
Computational Linguistics.
Pustejovsky, J. (In Preparation) "Word Formation in Tree
Adjoining Grammars"
Selkirk (1982) The Syntax of Words, MIT Press.
Williams (1981) "Argument Structure and Morphology", The
Linguistic Review, 1, 81-114.
