American Journal of Computational Linguistics 
Microfiche 35 
PROCEEDINGS 
13th ANNUAL MEETING 
ASSOCIATION FOR COMPUTATIONAL LINGUISTICS 
Timothy C. Diller, Editor 
Sperry-Univac 
St. Paul, Minnesota 55101 
Copyright © 1975 by the Association for Computational Linguistics 
PREFACE 
Session 4 centered around two major topics: modeling the flow of information in discourse and representing and utilizing the knowledge of the world shared by communicators. The paper by Deutsch describes a mechanism for identifying the referents of definite noun phrases within a task-oriented dialogue. (Note the closely related paper by Klappholz and Lockman in Session 5.) Bruce compares two discourse models: a "discourse grammar", which defines the set of found and/or likely discourse structures, and a "demand processor", which accounts for utterances as responses to and activators of internal demands. Phillips presents various cohesive links found in coherent discourse and then considers the inferential process essential to filling in knowledge only implicit in the linking mechanisms. Cullingford discusses the major components of SAM (Script Applier Mechanism), a computational system modeling the organization and management of extralinguistic world knowledge. Badler describes a system for translating visual input into propositional descriptions of discrete events. Focussing on a particular type of visual input (American Sign Language), Kegl and Chinchor present the use of frame analysis in describing various communicatory devices in ASL. Thanks to Carl Hewitt for chairing this session.

--Timothy C. Diller, Program Committee Chairman
TABLE OF CONTENTS 
SESSION 4: MODELING DISCOURSE AND WORLD KNOWLEDGE . . . 1
Establishing Context in Task-Oriented Dialogs, Barbara G. Deutsch . . . 4
Discourse Models and Language Comprehension, Bertram C. Bruce . . . 19
Judging the Coherency of Discourse, Brian Phillips . . . 36
An Approach to the Organization of Mundane World Knowledge: The Generation and Management of Scripts, R. E. Cullingford . . . 50
The Conceptual Description of Physical Activities, Norman Badler . . . 70
A Frame Analysis of American Sign Language, Judy Anne Kegl and Nancy Chinchor . . . 84
American Journal of Computational Linguistics, Microfiche 35: 4

ESTABLISHING CONTEXT IN TASK-ORIENTED DIALOGS

Barbara G. Deutsch
Artificial Intelligence Center
Stanford Research Institute
Menlo Park, California 94025
ABSTRACT 
This paper describes part of the discourse component of a speech understanding system for task-oriented dialogs; specifically, a mechanism for establishing a focus of attention to aid in identifying the referents of definite noun phrases. In building a representation of the dialog context, the discourse processor takes advantage of the fact that task-oriented dialogs have a structure that closely parallels the structure of the task. The semantic network of the system is partitioned into focus spaces, with each focus space containing only those concepts pertinent to the dialog relating to a subtask. The focus spaces are linked to their corresponding subtasks and ordered in a hierarchy determined by the relations among subtasks.

This research was supported by the Defense Advanced Research Projects Agency of the Department of Defense and monitored by the U.S. Army Research Office under Contract No. DAHC04-75-C-0006.
Language communication entails the transmission of concepts from the speaker's model of the world to the listener's. It is crucial that the speaker be able to communicate descriptions of concepts in his model in a way that allows the listener to pick out the relevant related concept in his model. In normal human communication it is not necessary to describe a concept in a completely unambiguous way; contextual clues from both the situation and the surrounding dialog are counted on to help disambiguate. The listener's problem is to use that context to help in his identification of the concept being communicated. As a simple example, consider the utterance "Hand me the box-end wrench" as it might occur in a conversation between two people working on a maintenance task. Although many box-end wrenches may be known to both the speaker and the listener, the fact that the listener has a particular box-end wrench in his hand makes the noun phrase unambiguous. (For other examples, see Norman, Rumelhart et al., 1975.) In the most extreme case, the use of pronouns depends entirely on the dialog context to determine the intended referent; "it" can refer to any single inanimate object or event.

A second problem arises with elliptical expressions. Often the surrounding dialog supplies enough information so that only a word or two suffices to communicate an entire (complex) idea. For example, consider the following exchange:

E: Bolt the pump to the platform.
A: O.K.
E: What tools are you using [to bolt the pump to the platform]?
A: My fingers [are the tools I am using ...]

The expressions in brackets indicate the full utterance that was meant by the partial utterance. The listener must fill in this information from the surrounding dialog.

This paper considers such phenomena as they occur in task-oriented dialogs. By task-oriented dialog we mean conversation directed toward the completion of some task. In particular, we will be concerned with a computer-based consultant task in which an apprentice technician communicates with a computer system about the repair of electromechanical devices. The understanding system must maintain models of the world and of the dialog to disambiguate references in the apprentice's speech.
DISCOURSE IN SPEECH UNDERSTANDING

In a speech understanding system, the discourse component is one of several sources of knowledge that must interact in interpreting an utterance (see Paxton and A. Robinson, 1975; J. Robinson, 1975). Because of the uncertainty in the acoustic signal, it is important that higher level sources of knowledge like discourse give advice to the system at early stages in the analysis. For this reason, in our current speech system, routines for identifying the referents of definite noun phrases are applied as soon as a possible noun phrase is identified, rather than waiting for an interpretation of the entire utterance. In essence, the procedure entails searching the recent context to find possible referents and returning a list of candidates.

Ellipsis and pronoun resolution require a more local context than the resolution of nonpronominal definite noun phrases (DNPs). A description of the processing for ellipsis and pronoun resolution is contained in the section "Discourse Analysis and Pragmatics" in Walker et al., 1975. In this paper we concentrate on mechanisms for resolving DNPs.
The problem of resolving DNPs is basically a problem of finding a matching structure in memory. In the case of a computer system with a semantic network knowledge base, the problem is that of finding the network structure corresponding to the structure of the noun phrase. The node that maps onto the head node of the parse structure representing the noun phrase is the concept being identified by the noun phrase. For example, if the knowledge base contains the nodes shown in Figure 1 (and there are no other nodes with e (element) or s (superset) arcs to WRENCHES), then either node W1 or node W3, but not W2, will match the phrase "the box-end wrench". Matching is not always straightforward. For example, consider the situation portrayed in Figure 2. The ds, or delineating element, arcs (see Hendrix, 1975a) link a node to delineating information about members of the set that node represents. B-E is a set of
[Figure 1 here]
FIGURE 1 NETWORK DESCRIPTION OF THREE WRENCHES

[Figure 2 here]
FIGURE 2 SEMANTIC NET SHOWING MEMBERS OF TWO SUBSETS OF THE SET "WRENCHES"

[Figure 3 here]
FIGURE 3 SEMANTIC NET SHOWING PARSE SPACE FOR "BOX-END WRENCH"
box-end wrenches to which W1 belongs; H-E is a set of hex-end wrenches to which W2 belongs. If the apprentice now says "... the box-end wrench", he means W1. The utterance level structure created by parsing (see Hendrix, 1975b) for the phrase "box-end wrench" is inside the space NP in Figure 3; some deduction must be done to establish the correspondence between N1 and W1.

The structure matching routines that form a basic part of the DNP resolver take as input a parse level network of nodes and arcs and a data network to match it against. (The current matcher was written by R. E. Fikes.) In general, a large number of objects in the data net may be candidates for the matcher (i.e., objects that are elements of the same set as the object being identified by the DNP). Since, in itself, the matcher has no way of deciding which objects to consider first, additional mechanisms are needed to limit the search.
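The candidate-gathering step just described (collecting data-net objects that are elements of the same set as the object named by the DNP) can be sketched roughly as follows. This is an illustrative sketch only, not the SRI matcher itself; the data table and the function name are invented, using the wrenches of Figure 1.

```python
# Hypothetical sketch (not the actual matcher, which was written by
# R. E. Fikes): candidates for a definite noun phrase are data-net
# nodes that are elements of the set named by the phrase's head noun
# and whose known features do not conflict with the description.

DATA_NET = {
    "W1": {"element_of": "WRENCHES", "end_type": "BOX-END"},
    "W2": {"element_of": "WRENCHES", "end_type": "HEX-END"},
    "W3": {"element_of": "WRENCHES", "end_type": "BOX-END"},
}

def match_candidates(head_set, features):
    """Return the data-net nodes in head_set that are consistent
    with every feature of the parse-level description."""
    hits = []
    for node, props in sorted(DATA_NET.items()):
        if props.get("element_of") != head_set:
            continue
        if all(props.get(k) == v for k, v in features.items()):
            hits.append(node)
    return hits

print(match_candidates("WRENCHES", {"end_type": "BOX-END"}))  # ['W1', 'W3']
```

As in the Figure 1 example, W1 and W3 both survive the filter for "the box-end wrench", which is why additional focusing mechanisms are needed to order or limit the search.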
FOCUS SPACES

The discourse component must determine a subset of the semantic net knowledge base for consideration by the matcher. That is, it must be able to establish as a local context that subset of the system's total knowledge base that is relevant at a given point in the dialog. This is analogous to determining what is in the user's focus of attention. Put another way, we would like to highlight certain nodes and arcs of the semantic network.

In task-oriented dialogs, the dialog context is actually a composite of three different component contexts: a verbal context, a task context, and a context of general world knowledge. The verbal context includes the history of preceding utterances, their syntactic form, the objects and actions discussed in them, and the particular words used. The task context is the focus supplied by the task being worked on. It includes such information as: where the current subtask fits in the overall plan, what its subtasks are, what actions are likely to follow, and what objects are important. The context of general world knowledge is the information that reflects a background understanding of the properties and interrelations of objects and actions; for example, the fact that tool boxes typically contain tools and that attaching entails some kind of fastening.
To highlight objects in the dialog and provide verbal context, network partitioning is used in a new way. Hendrix (1975a) has suggested imposing a logical partitioning on network structures for encoding logical connectives and quantifiers. Using the same technique, a focus partitioning may be used to divide the network into a number of local contexts. Nodes and arcs belong to both logical and focus spaces. The logical and focus partitions are independent of one another in the sense that the logical spaces on which a node or arc lies neither determine nor depend on the focus spaces in which the node or arc lies.

A new focus space is created for each subtask that enters the dialog. The task model (described shortly) imposes a hierarchical ordering, based on the subtask hierarchy, on these spaces. This hierarchy determines what nodes and arcs are visible from a given space. The arcs and nodes that belong to a space are the only ones immediately visible from that space. Arcs and nodes in spaces that are above a given space in the hierarchy are potentially visible, but must be requested specifically to be seen. Other arcs and nodes are not visible.

A node may appear in any number of focus spaces. When the same object is used in two different subtasks, either the same or different aspects of the object may be in focus in the two subtasks. It is also possible for a node or arc to be in no focus space; in this case the object is not strongly associated with the actual performance of any particular subtask. Such objects must be described relative to the global task environment. For completeness, we define a top-most space, called the "communal space", and a bottom-most space, called the "vista space". The communal space contains the relationships that are time invariant (e.g., the fact that tools are found in tool boxes) or common to all contexts. The vista space is below all other spaces and hence can see everything in the semantic net. This perspective is useful for determining the relationships into which an object has entered.
The task model in our system will be embodied in a procedural net, which encodes the task structure in a hierarchy of subtasks and encodes each subtask as a partial ordering of steps (Sacerdoti, 1975). The procedural net system also allows tasks to be expanded dynamically to further levels of detail when necessary. A representation of the hierarchy of subtasks is important for reference resolution. An examination of task-oriented dialogs shows that references operate within tasks and up the hierarchy chain (Deutsch, 1974). Using the hierarchy of the procedural net to impose a hierarchy on the focus spaces enables us to search for references in hierarchical order. Having a representation of the partial ordering of tasks allows us to capture the alternatives the apprentice has in choosing subsequent tasks.
We have explicitly separated the three components of the dialog context. The representation of an object in a focus space will include only the relationships that have been mentioned in the dialog concerning the corresponding subtask or that are inherent in the procedural net description of the local task. Thus, the verbal component is supplied by the information recorded in the focus space hierarchy. Forward references to objects in the task (task component) are found by examining the procedural net. The general world knowledge component is information that is present in the communal space. When resolving a DNP, we can dynamically allocate effort between examining links in the local focus space, looking forward in the task, looking back up the focus space hierarchy, and looking deeper into knowledge base information.
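The hierarchy-climbing part of this search can be sketched as follows. This is a hypothetical illustration, not the actual system: the FocusSpace class and resolve_dnp function are invented names, and the toy search looks only in the local space and then up the hierarchy to the communal space (omitting the forward look into the task).

```python
# Illustrative sketch only. Each focus space knows its parent (the
# space above it in the subtask hierarchy); the communal space is the
# top-most parent. Resolution returns candidates from the first space,
# walking upward, that contains any matching object.

class FocusSpace:
    def __init__(self, name, parent=None, objects=None):
        self.name = name
        self.parent = parent            # space above this one, or None at the top
        self.objects = set(objects or [])

def resolve_dnp(description, active_space):
    """Search the active focus space, then each space above it in turn."""
    space = active_space
    while space is not None:
        hits = sorted(o for o in space.objects if description in o)
        if hits:
            return space.name, hits
        space = space.parent
    return None, []

communal = FocusSpace("COMMUNAL", objects=["toolbox", "box-end wrench"])
fs1 = FocusSpace("FS1", parent=communal, objects=["pump", "platform"])
fs4 = FocusSpace("FS4", parent=fs1, objects=["bolt/nuts", "pliers"])

print(resolve_dnp("pliers", fs4))   # found locally: ('FS4', ['pliers'])
print(resolve_dnp("pump", fs4))     # found up the hierarchy: ('FS1', ['pump'])
```

The point of the structure is exactly the allocation question in the text: the matcher is handed the small local set first, and only widens its search when that fails.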
GENERAL STRATEGY

The general strategy is to search first the currently active focus space and then to examine the next level of detail in the task. If the referent cannot be found there, the search continues back up the focus space hierarchy and then further down the task chain. The current context to be used by the discourse processor includes:

(1) A focus space containing the objects currently in focus
(2) A link to the associated node in the task model
(3) A type flag used in setting up expectations.

The type is necessary because there are subdialogs that do not directly reflect on the task structure. For example, there are subdialogs about tool identification ("What is a wheelpuller?") and tool use ("How do I use this wrench?"). References in these subdialogs do not follow the same focus space hierarchy and task chain.

The dialog shown in Table 1 will be examined to show how a combination of a task model and focus spaces may be used to help resolve references.
E: I would like you to assemble the air compressor.
A: O.K.
E: I suggest you begin by attaching the pump to the platform.
A: O.K.
E: What are you doing now?
A: Using the pliers to get the nuts in underneath the platform.
E: I realize this is a difficult task.
A: I'm tightening the bolts now. They're all in place.
E: Good.
A: How tightly should I install this pipe elbow that fits into the pump?

Table 1: Subdialog for air compressor assembly.
A partial procedural net for assembling an air compressor is shown in Figure 4. The terms "install", "connect", and "attach" refer to conceptual actions rather than lexical items. The dashed lines connect higher level tasks to their constituent subtasks. The time sequence of steps in the task is left to right. The partial ordering of tasks is encoded with the S and J nodes. The S, or ANDSPLIT, node indicates the beginning of parallel branches in the partial ordering. The nodes on arcs coming out of an S node may be done in any order. The J, or ANDJOIN, node indicates a point where several parallel tasks must be completed. The boxes labeled T are relevant to the subdialog fragment.
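The ANDSPLIT/ANDJOIN discipline can be sketched minimally. The procedural net formalism is Sacerdoti's; the function below is an invented illustration of the join condition only, using two of the parallel branches from Figure 4.

```python
# A minimal sketch of the ANDJOIN condition (names invented for this
# example). Branches leaving an ANDSPLIT node may be performed in any
# order; the ANDJOIN node may be passed only after every branch that
# feeds it has been completed.

PARALLEL_BRANCHES = ["POSITION PUMP ON PLATFORM", "INSTALL PULLEY"]

def join_ready(completed):
    """True once all parallel branches feeding the join are done."""
    return all(step in completed for step in PARALLEL_BRANCHES)

done = set()
print(join_ready(done))                      # False: nothing finished yet
done.add("POSITION PUMP ON PLATFORM")
print(join_ready(done))                      # False: one branch still open
done.add("INSTALL PULLEY")
print(join_ready(done))                      # True: the join may be passed
```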
In the following analysis of the dialog, the utterances are considered in relation to the dialog history and the procedural net task model. (The search for references inside focus spaces is currently implemented; integration with the task model is not.) The context information listed under (1)-(3) above is shown in the figures: (1) the network; (2) PNETTIE; (3) FSTYPE.

E: I would like you to assemble the air compressor.
A: O.K.
E: I suggest you begin by attaching the pump to the platform.

[At this point, we are at task T1; focus spaces FS0 and FS1 shown in Figure 5 have been set up.]

A: O.K.

[This could mean "I'm done", but the response comes right after the instruction, and the task takes a while.]
[Figure 4 here: top-level task ASSEMBLE AIR COMPRESSOR, with boxes including INSTALL AFTERCOOLER, INSTALL ELBOW, INSTALL BELT, INSTALL PUMP, INSTALL PULLEY, CONNECT AFTERCOOLER TO PUMP, POSITION PUMP ON PLATFORM, BOLT PUMP TO PLATFORM, START PUMP-MOUNT NUTS AND BOLTS, and TIGHTEN PUMP-MOUNT NUTS; boxes labeled T4 and T5 mark the subtasks relevant to the subdialog.]

FIGURE 4 PARTIAL PROCEDURAL NET FOR ASSEMBLING AIR COMPRESSOR

[Figure 5 here: focus space FS0, with PNETTIE and FSTYPE TASK fields.]
FIGURE 5 FOCUS SPACES FS0 AND FS1
[Figure 6 here: focus space FS4 containing PLATFORMS, BOLT/NUTS, POSITIONINGS, PLIERS, and ATTACH-OPS; PNETTIE T4, FSTYPE TASK.]
FIGURE 6 FOCUS SPACE FOR STARTING BOLT/NUTS OPERATION 
E: What are you doing now?

[After a suitable waiting period, the expert queries the progress of the user.]

A: Using the pliers to get the nuts in underneath the platform.

["The pliers" can be resolved because there is only one pair; if this were not the case, the task model would have to be consulted. For both "the nuts" and "the platform" the FS hierarchy is consulted. "The platform" is in focus in the current FS. There is no sign of nuts, so we look forward in the task model. The relevant parts are located in subtask T4. This causes a new context to be established, as shown in Figure 6.]

E: I realize this is a difficult task.

[An attempt to assess the apprentice's perception of the problem. Note that at this point the task has barely begun and the expert does not have a very good model of the apprentice.]

A: I'm tightening the bolts now. They're all in place.

[FS4 contains "the bolts"; they were brought into focus when T4 was started. "They" is determined to refer to "the bolts" by checking the objects in the previous utterance for number agreement. Note that the last statement confirms the closure of T4; "tighten" opens T5.]

E: Good.

A: How tightly should I install this pipe elbow that fits into the pump?

[There is no pipe elbow in the current FS. (Note that up until that point in the dialog the apprentice might have been asking about task T5.) We close T5; because of the task structure this brings us back up to the top level. We are at the point of looking into new tasks. At present all of the tasks are considered equally. Eventually T6 is found to involve an elbow.]
In summation, then, the focus spaces provide a way of isolating certain parts of the semantic net, thus providing a way to focus on immediately relevant information. By tying the focus spaces to the task model, the search for referents can follow the structure of task references. Both the task model and the focus spaces are linked to the general knowledge base; thus, it is possible to go from an item in either the task model or a focus space to other known but not previously referenced information about that item. The focus spaces and task model provide access to context information about objects in the domain, making it possible to resolve references to those objects.
REFERENCES

Deutsch, Barbara G. The Structure of Task-Oriented Dialogs. Contributed Papers, IEEE Symposium on Speech Recognition, Carnegie-Mellon University, Pittsburgh, Pennsylvania, 15-19 April 1974. IEEE, New York, 1974, 250-254.

Hendrix, Gary G. Expanding the Utility of Semantic Networks Through Partitioning. Advance Papers of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, 3-8 September 1975, 115-121 (a).

Hendrix, Gary G. Semantic Processing for Speech Understanding. Presented at the Thirteenth Annual Meeting of the Association for Computational Linguistics, Boston, Massachusetts, 30 October - 1 November 1975 (b).

Norman, D. A., Rumelhart, D. E., et al. Explorations in Cognition. W. H. Freeman and Company, San Francisco, 1975.

Paxton, William H., and Robinson, Ann E. System Integration and Control in a Speech Understanding System. Presented at the Thirteenth Annual Meeting of the Association for Computational Linguistics, Boston, Massachusetts, 30 October - 1 November 1975.

Robinson, Jane J. A Tunable Performance Grammar. Presented at the Thirteenth Annual Meeting of the Association for Computational Linguistics, Boston, Massachusetts, 30 October - 1 November 1975.

Sacerdoti, Earl. A Structure for Plans and Behavior. Technical Note 109, Artificial Intelligence Center, Stanford Research Institute, Menlo Park, California, August 1975.
DISCOURSE MODELS AND LANGUAGE COMPREHENSION

Bertram C. Bruce
Bolt Beranek and Newman Inc.
50 Moulton Street, Cambridge, Massachusetts 02138
ABSTRACT 
Higher order structures such as "discourse" and "intention" must be included in any complete theory of language understanding. This paper compares two approaches to modeling discourse. The first centers on the concept of a "discourse grammar" which defines the set of likely (i.e., easily understood) discourse structures.

A second approach is a "demand processing" model in which utterances create demands on both the speaker and the hearer. Responses to these demands are based on their relative "importance", the length of time they have been around, and conditions attached to each demand. The flow of responses provides another level of explanation for the discourse structure.

These two approaches are discussed in terms of flexibility, efficiency, and their role in a more complete theory of discourse understanding.
As has been said many times, understanding anything (a problem, an action, a word) demands some knowledge of the context in which it appears. Certainly this is true of language, where an utterance's meaning may depend upon who the speaker is, when he is talking, what has just been said, who the listeners are, what the purpose of the conversation is, and so on. It is reasonable to define language understanding as the process of applying contextual knowledge to a sound (or string of symbols) to produce a change in that context. Successful language understanding occurs whenever the changes in the hearer's context (model of the world) coincide with the changes the speaker intended.
Of course, stating a problem in a different way does not solve it. Instead it suggests a series of subsidiary questions such as:

(1) What is a context? What does it look like? What are its components, its structural characteristics?

(2) How does a new utterance change an existing context? What is the assimilation process? What must be kept; what can be discarded?

(3) How does a model of changing context account for observed phenomena such as the ability to switch contexts, and to return later (but not too much later)?

(4) How does the domain of conversation influence the structure of a "context"? Do different mechanisms operate when the subject matter is tightly constrained?

It may be quite a while before questions of this type can be answered fully. This paper is a discussion of some of the issues and of the characteristics required of a solution. To do this, we will examine two classes of discourse models which have been proposed. The first is a "discourse grammar" approach which attempts to define the set of likely sequences of utterances. The second is a "demand processor" approach which attempts to account for utterances as responses to internal demands. Before discussing the specifics of these models, a sample dialogue, from the much restricted world of man-machine communication, is presented and discussed in Section 2.
2. An Example from Travel Budget Management
The problem of discourse understanding is complex enough that a complete analysis is certainly premature. We will examine here a micro-discourse, restricted to two conversants, one of them a machine, and a conversation restricted in scope to that of managing a travel budget. Imagine that a person responsible for a travel budget is trying to decide how many people to send to an upcoming conference. He has available a computer system which can answer questions about the budget, conferences, trips, and so on. In talking to such a system, the travel budget manager is engaging in an admittedly simple dialogue, but one which nevertheless exhibits many interesting properties of discourse understanding. (The following dialogue is hypothetical; however, a system of this type is being developed at BBN (Woods, et al. (1974, 1975)).) The (human) manager begins:
(1) WHEN IS THE NEXT ASSP CONFERENCE? 
May 15-18, 1975 
(2) IS THAT IN OUR CURRENT BUDGET YEAR? 
Yes. 
(3) HOW MUCH MONEY DO WE HAVE LEFT? 
$316.41 remains in the budget.
(4) WHAT IS THE COST OF THE ASSP CONFERENCE? 
The registration fee is $25. Would you like me to estimate 
the cost of a trip to the conference? 
(5) YES. FOR THREE PEOPLE. 
Assuming they travel by air . . . 
Assuming each person spends three days in 
Philadelphia . . . 
Assuming a per day expense of $25 . . . 
The cost would be $371.50.
(6) WILL THE BUDGET COVER THAT? 
No, the deficit would be $55.09. 
(7) ASSUME THEY GO BY BUS.
The cost would be $327.40. The budget deficit would be 
$10.99. 
(8) OK. SCHEDULE A TRIP FOR THREE PEOPLE BY BUS TO THE ASSP MEETING.
Done. That leaves $-10.99 in the budget.
Even in a simple dialogue like the one above we see important discourse phenomena. For example, the manager's input (4) is ambiguous to the system. "Cost" could mean "registration fee" or "cost of a trip to the conference". The system has to ask about this ambiguity, producing an embedded question-answer pair within the original question-answer pair (cf. Schegloff's (1972) "insertion sequences"). However, there is more than a Q-1, Q-2, A-2, A-1 pattern in (4) and (5). The system does not just say, "Did you mean 'fee cost' or 'trip cost'?". Assuming that "fee cost" is a reasonable interpretation, and useful to know besides, it simply states that fact, and asks about the other interpretation, which would demand more computation. A discourse model should account for this apparent awareness of computational difficulty, which is exhibited in human conversation and between a human and our idealized machine above.
Another phenomenon worth noting in this dialogue is the variation in detail and precision among the utterances. Sentence (8) is fairly precise and complete. Since alternatives have been considered to the trip he has decided upon, it is important to stress those aspects of the trip ("three people", "by bus") which have been in question. On the other hand, sentence (3) is clearly elliptical. This is all right since the question is merely exploratory. Furthermore, the previous question ensures that "money ... left" refers to money in the current budget. An adequate discourse model should account as well for our apparent ability to accommodate for the speech channel capacity, to minimize transmission errors through the use of redundancy and stress, and in general to attempt to optimize the communication.
One way to account for these and related phenomena is to postulate a discourse grammar. The grammar might say that part of a dialogue is a "question-answer" pair, and that it may be recursive in the sense that question-answer pairs may be embedded within it. This approach is discussed in the next section. A contrasting approach is to say that each utterance produces "demands" in the heads of the listeners. Responses to these demands may take the form of subsequent utterances. This latter model is discussed in Section 4.
Upon reading a dialogue like the example in Section 2, most of us readily form an opinion about its structure. In any dialogue we see this kind of structure: one person is asking another to do something; two people are arguing about politics, or discussing a novel. There is almost always a structure higher than the individual sentences. In the example of Section 2, the travel budget manager seems to be entering into a "schedule a trip" dialogue. His question about a future conference is one of the cues to a bundle of information known by both him and the system about scheduling trips. Such a bundle has been variously referred to as a "frame" (Minsky (1975), Winograd (1975)), a "script" (Abelson (1975), Schank and Abelson (1975)), a "theme" (Phillips (1975)), a "story schema" (Rumelhart (1975)), and a "social action paradigm" (Bruce (1975a, 1975b)).
The information associated with scheduling a trip includes facts about dates and times, about the budget, about travel, about conferences, and so on. It also includes "plans", that is, time-ordered structures of beliefs about achieving "goals". In this case, the goal is scheduling a trip to a conference. (See also Bruce and Schmidt (1974), Schmidt (1975).) One such partially instantiated plan might be:

1. Find out to which budget the trip should belong.
2. Determine how much is in the budget (budget).
3. Figure the cost of the trip (tripcost).
4. Decide whether (budget - tripcost) is acceptable.
5. If acceptable, schedule the trip and stop.
6. If not acceptable, determine if trip can be modified to be cheaper.
   a. If modifiable, go to 3.
   b. If not modifiable, stop.
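The loop formed by steps 3 through 6 can be sketched as a short function. The function and its acceptability test are invented for illustration (acceptable here simply means the budget covers the cost); the figures in the sample calls are the ones from the dialogue in Section 2.

```python
# A hypothetical rendering of plan steps 2-6: try each cost estimate
# (step 3), test it against the budget (step 4), schedule the first
# affordable variant (step 5), or give up when no cheaper modification
# remains (step 6b).

def schedule_trip(budget, trip_costs):
    """trip_costs lists the variants in the order they are considered,
    e.g. travel by air first, then the cheaper bus modification."""
    for cost in trip_costs:                 # steps 3 and 6a
        if budget - cost >= 0:              # step 4: is the balance acceptable?
            return ("scheduled", round(budget - cost, 2))   # step 5
    return ("not schedulable", None)        # step 6b

print(schedule_trip(316.41, [371.50, 327.40]))  # neither air nor bus fits
print(schedule_trip(400.00, [371.50]))          # air travel fits the budget
```

Note that the manager in the dialogue actually schedules the bus trip despite a $10.99 deficit; a realistic acceptability test in step 4 would be a judgment call rather than the strict comparison used here.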
The steps (1-6) above are ordered, though nothing is said about their relative lengths. Also, there are variants on the plan where the order might be changed; e.g., step 3 might come before step 2 in some other plan. The structure of such a plan, coupled with the by now commonplace observation that a discourse is structured, leads to the natural idea of representing a discourse by a grammar. Such a grammar may be large; it may be probabilistic; it may apply in only limited domains. Nevertheless it does give some idea of what to expect in a dialogue and may play a central role in language comprehension.
A portion of the grammar for our example dialogue is shown in Figure 1. This is an Augmented Transition Network (ATN) in which the arcs may refer to other networks (PUSH arcs), may signify direct transitions to other states (JUMP arcs), or may signify conclusion of the path (POP arcs). For example, in addition to this "SCHEDULE" network there is a network wherein the manager describes a new trip to be entered and the system asks him questions to complete the description.
Fig. 1. ATN for scheduling a trip. 
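The PUSH/JUMP/POP mechanics can be illustrated with a toy interpreter. Since the network of Figure 1 is not recoverable from this copy, the states and arcs below are invented; only the three arc types match the description above.

```python
# A toy discourse ATN. A PUSH arc descends into a subnetwork (here, an
# embedded question-answer pair), a JUMP arc moves between states, and
# a POP arc returns to the calling network. All names are invented.

SCHEDULE_NET = {"S0": ("PUSH", "QA", "S1"),   # descend into a Q-A subnet
                "S1": ("JUMP", None, "S2"),
                "S2": ("POP", None, None)}
QA_NET = {"Q": ("JUMP", None, "A"),
          "A": ("POP", None, None)}
NETS = {"SCHEDULE": SCHEDULE_NET, "QA": QA_NET}
START = {"SCHEDULE": "S0", "QA": "Q"}

def run(net_name, state):
    """Follow the single arc out of each state, recording the states
    visited; the trace stands in for parsing real utterances."""
    stack, trace = [], []
    while True:
        trace.append(state)
        kind, target, nxt = NETS[net_name][state]
        if kind == "PUSH":
            stack.append((net_name, nxt))   # where to resume after the POP
            net_name, state = target, START[target]
        elif kind == "JUMP":
            state = nxt
        elif not stack:                     # POP at the top level: done
            return trace
        else:                               # POP back to the caller
            net_name, state = stack.pop()

print(run("SCHEDULE", "S0"))  # ['S0', 'Q', 'A', 'S1', 'S2']
```

The "sloppy" PUSHes and POPs discussed later in the paper are exactly the cases this rigid interpreter cannot handle: a real dialogue may descend into a subnetwork and never return.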
A discourse or dialogue grammar can be used with a modified ATN parser to "parse" a dialogue, generating both analyses of the current utterance and predictions about the one to come. In fact, one such modified parser and grammar has been implemented for the BBN speech system (Bruce (1975c), Woods, et al. (1975)). For many dialogues, the grammar applies quite well, testing for the head verb in the utterance, the mood, and checking presuppositions of the action implied. When successful, it makes corresponding predictions for application to the next utterance. Unfortunately, when the grammar fails it is not very good at recovering from its error.

Discourse grammars seem to be most effective in tightly constrained domains: more, for instance, in a discussion about how to cook a turkey, where there are specific subproblems to analyze, than in the travel budget management domain, and less still in a general question answering context. (Cf. Deutsch (1974, 1975).)
Lest it be thought that discourse parsing is just sentence
parsing for "big sentences", I should emphasize some of the
differences, differences which some would say preclude the use of
terms like "grammar", "ATN", and "parsing". First, discourse
parsing proceeds in a mode of partial parse, then output, then
partial parse, etc. In other words, the goal is to derive
information from the partial discourse which has occurred to
suggest what may follow and to explicate the role of the current
utterance. The parse is never completed; no structure is built.
Since the entire discourse is not available to the parser (as the
entire sentence is to a sentence parser), it is necessarily
probabilistic. One can never know how the next utterance may
alter the current interpretation of the trend of the dialogue.
Another important difference is that PUSHes and POPs in the
discourse grammar are "sloppy". That is, the participants in a
dialogue may descend several levels ("Before you finish, let me
tell you about ...", "Before that ...") and never "pop" back up
to the original level of the discourse. A discourse parser is
faced with the peculiar phenomenon that a PUSH usually implies a
POP but not always.
Some, but not all of these oddities of a discourse grammar 
are resolved by an approach which emphasizes internal models of 
the speaker and the listeners. This approach is discussed in the 
next section. 
4. Demand Discourse
One obvious characteristic of a discourse is that many
processes may be occurring at once. A person cannot, nor does he
wish to, respond at one time to all unanswered questions, extend
each unfinished line of thought, or deal with every
inconsistency. While a grammar may predict the most likely
action for a given point in a dialogue, it is not very good at
suggesting alternatives out of the main line. There appears to
be an additional mechanism of roughly the following form: 
An event in a discourse (or prior to it) sets up a number of 
internal demands. Examples of such demands are to confirm what 
was said, explore its consequences, dispute it, answer it, etc. 
For any given event (such as an utterance) there may be none,
one, or many demands created. A person's own action may place
demands upon himself. If X asks a question of Y, then Y normally
establishes an internal demand to answer the question. But X may
also establish a demand of the form, "check to see if the
question has been answered". This latter demand may generate a
later utterance such as, "Why haven't you answered me?".
Simple demand models already exist in a few systems. In 
general, they suggest that utterances are produced in response to 
conditions in the (internal model of the) environment rather than 
as units in a larger linguistic form. (See also Stansfield 
(1975)). It would be premature to argue that either a demand 
model or a grammar model is sufficient by itself. Instead, what
follows is simply a description of a demand model for the travel 
budget management domain mentioned above. 
Internal demands on the travel budget system help to explain
how one computation of a response can be pushed down, while a
whole dialogue takes place to obtain missing information, and how
a computation can spawn subsequent expectations or digressions.
Associated with each demand is a priority, a pointer (purpose) to
the demand which spawned this one (if any), and a time marker
indicating how long the demand has been around. An active
unanswered question is a typical demand with high priority.
Demands of lower priority include such things as a notice by the
system that the manager is over his budget. Such a notice might
not be communicated until after direct questions had been
answered. The fact that some questions cannot be answered
without more information leads to the
User-makes-query 
System-asks-question 
User-clarifies
System-answers-query 
kind of embedding which is typically represented in a discourse
grammar by a PUSH to a "clarification" state.
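A toy version of such a demand agenda can be sketched as a
priority queue. The priority values and demand names below are
invented; the text specifies only that each demand carries a
priority, a purpose pointer, and a time marker:

```python
import heapq
import itertools
from dataclasses import dataclass

# A toy demand agenda.  Priorities, descriptions and the numeric
# scale are assumptions, not the travel budget system's actual data.

@dataclass
class Demand:
    description: str
    priority: int                    # lower number = more urgent
    purpose: "Demand | None" = None  # the demand that spawned this one
    time: int = 0                    # when the demand arose

_seq = itertools.count()             # tie-breaker for equal priorities
agenda = []

def post(demand):
    heapq.heappush(agenda, (demand.priority, next(_seq), demand))

def next_demand():
    return heapq.heappop(agenda)[2]

# An unanswered query stays pending while a fill-in question it
# spawned is served first; the over-budget notice waits longest.
query = Demand("answer user query", priority=2, time=0)
post(query)
post(Demand("notify manager over budget", priority=5, time=0))
post(Demand("ask fill-in question: destination?", priority=1,
            purpose=query, time=1))

d = next_demand()
print(d.description, "| spawned by:", d.purpose.description)
```

The purpose pointer lets the clarification sub-dialogue shown
above be reconstructed: the fill-in question points back to the
suspended query that spawned it.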
Counter-demands are questions the system has explicitly or 
implicitly asked the user. While it should not hold on to these 
as long as it does to demands, nor expect too strongly that they 
will be met, the system can reasonably expect that most 
counter-demands will be resolved in some way. This is an 
additional influence on the discourse structure. 
A demand model also includes a representation of the current 
topic, the active focus of attention in the dialogue. For the 
travel budget system, it could be the actual budget, a 
hypothetical budget, a particular trip, or a conference. The
current topic is used as an anchor point for resolving references 
and deciding how much detail to give in responses. Again, this 
structure leads to certain modes of interaction. For example, if 
the manager says "Enter a trip," the system notes that the 
current topic has changed to an incompletely described trip. 
This results in demands that cause standard fill-in questions to 
be asked. If the manager wants to complete the trip description 
later, then the completion of the trip description becomes a low 
priority demand.
5. Synthesis? 
Discourse has been an object of study for many both in and
out of the field of computational linguistics. Especially worth
noting is the work of sociolinguists such as Labov (1972), Sacks,
Schegloff, and Jefferson (1975), and Schegloff (1972). Linguists
(e.g. Grimes), sociologists (e.g. Goffman (1971)), and
philosophers (e.g. Austin (1962), Searle (1969)) have important
direct or related contributions. I certainly can't presume in
this short paper to give the definitive solution to all the
problems revolving around the discourse question. What I have 
tried to do is to emphasize a distinction in approach between 
looking at a discourse as a linguistic whole with subparts being 
individual utterances, and as a side effect of responses to task 
demands.
Both approaches are useful in exemplifying ways in which the 
otherwise hazy area of discourse might be modeled. The grammar 
approach makes the strongest statement about actual discourse 
structure and can best be used where the structure is well known 
or can be tightly constrained, e.g. in generating a discourse or 
in a man-machine system where the computer imposes control on the 
dialogue. A grammar and a discourse parser can be very efficient 
in such situations. When the dialogue is less predictable the
(more bottom-up) demand processing approach may be more resistant
to "surprises" in the dialogue.
The ultimate discourse model probably contains aspects of 
both goal-directed grammars and of localized responses to 
demands. What should be particularly interesting to see is how 
characteristics of the model are affected by the type of 
discourse, human-machine v. human-human, problem-oriented v. 
information-exchanging, or new domain v. old.
American Journal of Computational Linguistics Microfiche 35 : 36
JUDGING THE COHERENCY OF DISCOURSE
Brian Phillips
Department of Information Engineering
University of Illinois at Chicago Circle
Box 4348, Chicago 60680
ABSTRACT 
The component propositions of a coherent discourse exhibit anaphoric,
spatio-temporal, causal and thematic structures. Not all of this structure
is explicit, but must be inferred using a model of cognitive knowledge.
The organization of knowledge in the model allows a bottom-up
analysis of discourse. Further, knowledge is formed into small complexes
rather than into the large monolithic structures found in Scripts/Frames.
1. The Structure of Coherent Discourse.
A discourse is judged coherent if its constituent propositions are
connected. Various types of cohesive links are observed in discourse:
anaphoric, spatial, temporal, causal and thematic. We will formally
describe the structure of a well-formed discourse in terms of these
connectives.
1.1 Anaphora. 
Two kinds of anaphora can be distinguished. The first is marked 
by the presence of a proform (or by the repetition of a form):
(1) Henry travels too much. He is getting a foreign accent. 
Antecedents may be nominal, verbal or clausal. 
The second kind of anaphora has a dependent that is an abstract 
term for the antecedent. For example, 
(2) John put the car into 'reverse' instead of 'drive' 
and hit a wall. The mistake cost him $200 in repairs. 
'Mistake' in (2) is an abstract characterization of the gear selection
expressed in the first sentence.
A conventional way to label the recurring actors in discourse is
as 'dramatis personae'. However cohesion can result not only from
multiple appearances of people, but of any concept, as in (2).
1.2 Spatio-temporal and Causal Connectives.
Space, time and cause give coherency to a set of propositions. 
(3) The King was in the counting house, counting out his 
money. The Queen was in the parlour, eating bread 
and honey. 
The actions in (3) are set in different rooms, but of the same 'palace'.
(4) After Richard talked to the reporter, he went to lunch.
The temporal sequence of events in (4) is expressed by 'after'.
(5) John eats garlic. Martha avoids him.
To non-aficionados garlic is known only for its aroma, detection of
which causes evasive action.
Cause, illustrated in (5), is an important discourse connective.
Note however, that this is an ethnocentric view; in other cultures a
different position may have to be taken, for example, a teleological
world view (White: 1975).
This dimension of discourse structure is termed its 'plot' structure.
1.3 Thematicity.
Discourse is expected to have a theme, to have a topic. 
For example,
(6) Dino Frances drowned today in Middle Branch Reservoir
after rescuing his son Dino Jr. who had fallen into
the water while on a fishing trip.
is a news story from the New York Times, with a theme of, say, 'tragedy'.
Discourse may have more than one theme, but these should not conflict.
(7) Eating the fish made Gerry sick. He had measles in May.
In (7) we have an incoherent structure. The proposition 'Gerry sick'
belongs both to a topic 'food-poisoning' and to a biography of illnesses.
The analysis of fairy-tales by Lakoff (1972) suggests that discourse has
a strictly tree-like thematic organization.
It is concluded that the propositions of a coherent discourse are
connected either by coreference or (preferably) causally, and that it
has a single theme (which may be the root of a tree of themes).
2. The Role of Inference.
Not all of discourse structure is overtly stated; discourse is highly 
elliptic. In (4) the discourse connective 'after' is present to mark a 
temporal sequence, but in (5) there is no realization of the causal relation 
between the two propositions. Normally one assumes that a discourse is 
coherent; hence (3) is most acceptable if the rooms are taken as being with- 
in the same habitation. Evidently a reader must infer omitted structure. 
The inferences are made from his cognitive store of world knowledge.
There is much discussion at present about inference as part of under- 
standing. To make inferences is easy; the problem is to make the right 
ones. It helps to have a goal. It is suggested that discourse can be 
said to be understood when it has been judged coherent, as defined above. 
3. Mechanisms of Inference. 
A model of cognitive knowledge -- an encyclopedia -- should be 
capable of making the inferences necessary to form an opinion about 
the coherency of a discourse. The present encyclopedia originated with 
Hays (1973); a fuller description can be found in Phillips (1975). It
is implemented as a directed graph. Labeled nodes characterize concepts
and labeled arcs relations between concepts.
Propositions have a structure of case-related concepts, based on
Fillmore (1968). This is our 'syntagmatic' organization of knowledge.
As propositions are essentially the building blocks of discourse, we
will not dwell on their structure here.
3.1 Anaphora. 
If the dependent is a proform then part of understanding is to
determine the correct antecedent. There are syntactic constraints
(Langacker: 1969) which serve to narrow down choices for antecedents and
to give an order of preference. The chosen antecedent will be the first 
that, when substituted for the proform, produces a meaningful proposition 
that is coherent in context. 
A meaningful proposition is one that has a counterpart in the ency-
clopedia. The counterpart may be the self-same proposition, or more
likely, a generalized proposition (hereafter a GP). For example,
rather than 'Joan drink milk', we would expect to find 'animal imbibe
liquid'.
How are GPs found? All concepts belong to partially ordered
taxonomic structures in the encyclopedia (our 'paradigmatic' organiz-
ation of concepts). From any concept it is possible to follow para-
digmatic relations to a more general concept, which may be a constit-
uent of a proposition. An intersection of paradigmatic paths origin-
ating from each concept in a discourse proposition (hereafter a DP),
taking account of syntagmatic structure, gives a GP. If there is no
such intersection, then the DP is not consistent with encyclopedic
knowledge.
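The paradigmatic search for a GP can be sketched as follows. The
toy taxonomy is invented, chosen so that 'Joan drink milk'
generalizes to 'animal imbibe liquid' as in the example above:

```python
from collections import deque

# A sketch of locating a generalized proposition (GP) for a
# discourse proposition (DP).  The taxonomy and GP store are
# hypothetical toy data, not the encyclopedia of Hays (1973).

ISA = {                          # paradigmatic (taxonomic) relations
    "Joan": "person", "person": "animal",
    "drink": "imbibe",
    "milk": "liquid",
}
GPS = {("animal", "imbibe", "liquid")}   # stored encyclopedic GPs

def ancestors(concept):
    """The concept plus everything reachable by paradigmatic links."""
    seen, queue = {concept}, deque([concept])
    while queue:
        parent = ISA.get(queue.popleft())
        if parent and parent not in seen:
            seen.add(parent)
            queue.append(parent)
    return seen

def find_gp(dp):
    """Intersect the paradigmatic paths of each DP concept with the
    syntagmatic slots of stored GPs; None means the DP is not
    consistent with encyclopedic knowledge."""
    paths = [ancestors(c) for c in dp]
    for gp in GPS:
        if all(slot in path for slot, path in zip(gp, paths)):
            return gp
    return None

print(find_gp(("Joan", "drink", "milk")))
```

Note that the search respects syntagmatic structure: each slot of
the GP must be reached from the corresponding slot of the DP, so a
scrambled DP like 'milk drink Joan' finds no GP.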
Abstract terms can be defined by complexes of GPs, each having
sufficient conceptual content to define situations in which they apply.
For example, a definition of 'mistake' must be such that it applies to
part of the first sentence in (2).
3.2 Space, Time and Cause. 
To infer omitted spatio-temporal and causal relations (termed
'discursive' relations in the encyclopedia), it is also necessary to
locate GPs. The encyclopedia, of course, includes these relations, but
between GPs. Schematically, from a discourse proposition P1 we can
locate P2, a GP, in the manner outlined above. P2 may have a discursive
relation R to another GP, P3. A proposition P4, a particularized version
of P3, and the relation R, between P1 and P4, can be added to the
discourse, figure 1.
[Figure 1: a discursive relation between discourse propositions
inferred via their GPs in the encyclopedia.]
Often P4 will be a proposition already stated in the discourse; merely
the relation need be inferred to augment the plot structure. It may,
however, be necessary to infer a chain of propositions to link the
original DPs. The question arises whether there is a limit on the
number of propositions in a 'sensible' inferred path. Intuitively
there is, but at present we have no formal insight.
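A bounded search for such an inferred chain might look like this.
The CAUSE relations are invented toy data, loosely modeled on the
drowning example analyzed later in the paper:

```python
from collections import deque

# A sketch of bounded chain inference: breadth-first search over
# discursive (here, CAUSE) relations between GPs, refusing paths
# longer than a fixed bound.  The relation table is hypothetical.

CAUSE = {
    "fall": ["injured"],
    "injured": ["not-able-to-act"],
    "not-able-to-act": ["drown"],
}

def causal_chain(start, goal, limit=3):
    """Return a causal path from start to goal with at most `limit`
    inferred steps, or None if no 'sensible' path exists."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) > limit:
            continue            # path too long to be 'sensible'
        for nxt in CAUSE.get(path[-1], []):
            queue.append(path + [nxt])
    return None

print(causal_chain("fall", "drown"))
```

The `limit` parameter is the formal stand-in for the intuitive
bound on path length that the text leaves open.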
3.3 Thematicity.
A theme is a complex of GPs, structurally indistinguishable from
that used in characterizing abstract terms like 'mistake'. The potential
presence of a theme is detected in the process of seeking GPs for DPs.
All GPs, whether or not they are part of a thematic definition, can be
located by paradigmatic searches; some GPs have additional structure
indicating that they are components of themes. It is not sufficient to
establish a theme for discourse by separately finding DPs that correspond
to all the GPs of a theme. The thematic definition and the relevant
part of the discourse must be tested holistically to ensure that the
correct coreferentialities exist among the propositions.
3.4 Overview of Inference.
There are two basic processes underlying inference. First there
is the process of locating a GP given a DP. This is implemented essen-
tially by a breadth-first search through the paradigmatic structure of
the encyclopedia. Secondly there is the process of matching a complex
of propositions in discourse against an encyclopedic complex. The
latter process is qualitatively different as it involves tests for co-
reference that the former does not.
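The holistic match, with its coreference test, can be sketched as
pattern unification over shared variables. The two-proposition
'drowning' pattern below is an invented stand-in for a real
thematic definition:

```python
# A sketch of matching a discourse complex against an encyclopedic
# complex.  The shared variable ?x enforces the coreference test:
# both patterns must be satisfied by the SAME individual.

THEME = [("?x", "in", "water"), ("?x", "not-able", "act")]

def match(theme, propositions):
    """Bind theme variables so every pattern matches some DP with
    consistent (coreferential) bindings; None if impossible."""
    def unify(pattern, prop, env):
        env = dict(env)
        for p, d in zip(pattern, prop):
            if p.startswith("?"):
                if env.setdefault(p, d) != d:
                    return None      # coreference violated
            elif p != d:
                return None
        return env

    def solve(patterns, env):
        if not patterns:
            return env
        for prop in propositions:
            e = unify(patterns[0], prop, env)
            if e is not None:
                result = solve(patterns[1:], e)
                if result is not None:
                    return result
        return None
    return solve(theme, {})

dps = [("son", "in", "water"), ("father", "not-able", "act"),
       ("son", "not-able", "act")]
print(match(THEME, dps))
```

Here the match succeeds only with ?x bound to 'son' throughout;
the separately present 'father not-able act' cannot satisfy the
second pattern once ?x is bound, which is exactly the kind of
coreferentiality condition invoked in the example of Section 4.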
Complexes of propositions have obvious functional similarities with
'Paraplates' (Wilks: 1975), 'Scripts' (Schank and Abelson: 1975) and
'Frames' (Minsky: 1975). Adding to the expanding terminology, our
version is known as 'metalingual definitions'.
Metalingual definitions serve to define abstract terms ('mistake'),
themes ('tragedy') and plans (used by Furugori (1974) in his robot
planner). The distinctions are more terminological than substantive;
their functions are interchangeable; in other contexts a plan could be a
theme, a theme an abstract term, etc.
When an abstract concept has a metalingual definition, a matching
discourse may be rewritten in terms of that concept. For example, 'buy'
has such a definition, say 'person1 gives object to person2, person2
gives money to person1'. To properly make the transduction to 'person2
buys object from person1', there must be a case frame for 'buy' linked
to concepts in its definition. A proposition produced by abstraction
is structurally indistinguishable from a proposition that was in the
original discourse, and can be subject to encyclopedic processing,
including further abstraction. Conversely, if a proposition contains a
concept having a metalingual definition, then the proposition can
be decomposed into a complex of propositions patterned on the definition.
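The decomposition of 'buy' through its metalingual definition can
be sketched as follows. The tuple encoding and variable names are
assumptions, standing in for the case-frame linkage the text
describes:

```python
# A sketch of decomposing a proposition through a metalingual
# definition, using the 'buy' example; all encodings are invented.

DEFINITIONS = {
    "buy": [("?p1", "gives", "?obj", "?p2"),
            ("?p2", "gives", "money", "?p1")],
}
# case frame linking 'buy' argument positions (buyer, object,
# seller) to the variables of the definition
CASE_FRAME = {"buy": ("?p2", "?obj", "?p1")}

def decompose(prop):
    """Rewrite (buyer, 'buy', object, seller) as the complex of
    propositions patterned on the definition; other propositions
    pass through unchanged."""
    verb = prop[1]
    if verb not in DEFINITIONS:
        return [prop]
    buyer, obj, seller = prop[0], prop[2], prop[3]
    env = dict(zip(CASE_FRAME[verb], (buyer, obj, seller)))
    return [tuple(env.get(t, t) for t in pattern)
            for pattern in DEFINITIONS[verb]]

print(decompose(("John", "buy", "car", "dealer")))
```

Running the case frame in the other direction would perform the
abstraction described above, rewriting a matching give/give
complex as a single 'buy' proposition.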
4. An Example. 
A schematic analysis of (6) shows the inference system in operation,
resulting in a structure that satisfies the criteria of coherence.
At each step we will indicate the encyclopedic knowledge used in
the inference, and the current state of the discourse. The original
discourse propositions and the inferred propositions are marked with
distinct symbols in the diagrams.
Step 0. Initial state.
[Diagram: the original propositions 'Father drowns', 'Father
rescues son', 'Son in water', 'Son falls'.]
Step 1. Fall causes injury.
[Diagram: 'Son falls' CAUSE 'Son injured' added.]
Step 2. Injury causes inability to act.
[Diagram: 'Son injured' CAUSE 'Son not able to act' added.]
Step 3. In water and not able to act causes rescue.
[Diagram: 'Son in water' and 'Son not able to act' jointly CAUSE
'Father rescues son'.]
Conjunction is indicated by part-whole relations. Note that a link to
one of the original propositions has been established.
Step 4. To rescue someone who is in water it may be necessary to be
in water.
[Diagram: 'Father in water' inferred from 'Father rescues son'.]
Step 5. Acting can make you weary.
[Diagram: 'Father rescues son' CAUSE 'Father weary' added.]
Step 6. If weary then unable to act.
[Diagram: 'Father weary' CAUSE 'Father not able to act' added.]
Step 7. If in water and not able to act then drown.
[Diagram: 'Father in water' and 'Father not able to act' jointly
CAUSE 'Father drowns'.]
A link to the final proposition of the discourse is made. Corefer-
entiality conditions prevent 'Son in water' and 'Father not able to
act' conjoining to satisfy the conditions on this inference.
Note that the antecedent condition on this inference is the same
as at step 3. Both resultant situations are possible, and are noted.
The system can select either. However, the wrong choice does not lead
to a connected structure, and a back-up to the alternative has to be
made.
The discourse now has an inferred causal structure connecting all 
the original propositions. 
From a thematic analysis of drowning stories in general (Phillips:
1975), the common theme can be described as 'giving a cause for the
person being in the water, and giving a cause for the victim not being
able to act (thereby not being able to save himself)'. This theme fits
the discourse by virtue of the propositions which stand in
causal relations to 'being in the water' and 'not able to act' for the
victim. The theme 'tragedy' is defined as 'someone does something good
and dies as a result of this action'. The father's rescue of his son and
subsequent demise satisfy this theme. For the story to
be coherent, these themes must not overlap; in fact we see that the
'drowning' theme is properly contained by 'tragedy'.
5. Discussion. 
The analysis is so organized that the themes are determined in 
a bottom up manner, as are all generalized facts used in the analysis. 
Though not presently implemented, it should be possible to use potential 
themes, ones for which only some component propositions have been found, 
in a predictive manner.
The complexes of propositions, in metalingual definitions of themes
and elsewhere, are really not that complex. The ones in the example 
contain only a few propositions. Each has only the essentials of the 
situation. The final structure arises from many small pieces of 
knowledge rather than from one monolithic aggregate. This seems to be 
a more natural organization, as each of the simpler structures can be
freely applied in many contexts, rather than being bound to one situation. 
The discourse judgement is relative to the knowledge of the hearer. 
Whether the inferences are those intended by the author is another 
question. Ideally they should be, or differences should be unimportant.
A misleading inference indicates poor writing by the author; he has
misjudged the knowledge of his audience.
Directing inferences on a discourse towards the goal of judging it 
coherent provides a normalized version of the discourse, if the process 
is successful. The normalized structure can form the basis for further 
processing: content analysis, stylistic analysis, etc. It may also
provoke various questions; for example, we could ask if the inferences
were correct; we have the 'rescue' situation applying to the father, but
he wasn't rescued, so why not?
American Journal of Computational Linguistics Microfiche 35 : 50
AN APPROACH TO THE ORGANIZATION OF MUNDANE WORLD KNOWLEDGE
Richard Cullingford
Yale University
New Haven, Connecticut 06511
ABSTRACT 
In understanding stories or natural-language discourse,
hearers draw upon an enormous base of shared world knowledge
about common situations like going to restaurants, theaters or
supermarkets to help establish the needed context. This paper
presents an approach to the management of this type of knowledge
based upon the concept of a situational script [Schank and
Abelson, 1975]. The application of scripts in story
understanding is illustrated via a computer model called SAM
(Script Applier Mechanism).
In simple one-script stories, SAM constructs a trace 
through a preformed data structure containing the input, other 
events not mentioned but commonly assumed, the important 
The research described in this paper was supported in part
by the Advanced Research Projects Agency of the Department of
Defense and monitored by the Office of Naval Research under
contract N00014-75-C-1111.
inferences associated with the events, and the interconnecting
causal links. In more complicated stories, SAM handles the 
invocation and closing of parallel, nested and sequential 
scripts. 
1.0 Introduction 
Natural-language processing research in recent years has
increasingly focussed upon the modeling of human world knowledge
and management of the resulting data base (1). This has come
about largely because of the enormous problems encountered in the
processing of texts, as opposed to single sentences, by
traditional methods based upon syntactic analysis and low-level
semantics. This state of affairs should not be surprising, since
it is quite clear that people draw upon a huge store of shared,
extra-linguistic world knowledge in understanding even the
simplest stories or engaging in the most rudimentary
conversation.
Much of the knowledge that hearers utilize to establish the 
background or context of a story appears to be episodic in
nature, distilled from many experiences in common situations like
going to restaurants, football games and supermarkets. This
paper presents an approach to the representation and handling of
this type of mundane world-knowledge based upon the concept of a
situational script [Schank and Abelson, 1975]. The application
(1) See, for example, the emphasis on this area in
"Theoretical Issues in Natural Language Processing", Proceedings
of the Interdisciplinary Workshop in Computational Linguistics,
10-13 June 1975, Cambridge, Mass.
of scripts in story understanding will be illustrated via a
computer model called SAM (Script Applier Mechanism) devised for
a PDP-10 computer at Yale University.
1.1 Examples of Story Understanding in SAM 
Before attacking the various aspects of scripts and the
particular activities carried on by SAM's script applier, it
seems appropriate to give some examples of input and the various
outputs that SAM produces. This will give the reader some
feeling for the kinds of heavily script-based stories SAM can
handle and the understanding tasks it performs.
We present three different types of story input. The first
story makes references to a common script, the restaurant script,
then stops the script in midstream:
Story I
John went to a restaurant. He sat down. He got mad.
He left.
A 'long paraphrase' output generated by SAM for this story is as
follows:
John was hungry. He decided to go to a restaurant.
He went to one. He sat down in a chair. A waiter
did not go to the table. John became upset. He
decided he was going to leave the restaurant. He
left it.
Next is a typical boring restaurant story conforming closely 
to expected scriptal data: 
Story II
John went to a restaurant. The hostess seated John.
The hostess gave John a menu. John ordered a
lobster. He was served quickly. He left a large
tip. He left the restaurant.
SAM has generated summary and question-answering output for this
story:
Summary
John went to a restaurant and he ate a lobster.
Q: Why did the hostess give John a menu?
A: So John could ask the waiter for a meal.
Q: Why did John go to a restaurant?
A: So John could eat a meal.
Q: Did the waiter give John a menu?
A: John got the menu from the hostess.
Q: Why did John leave the waiter a large tip?
A: Probably John remembered the waiter served a
lobster to him quickly.
Q: Did John pay the bill?
A: John paid the check.
The system has also provided long-paraphrase outputs for this 
story, not shown here, in both English and Chinese. 
The final example invokes several scripts, and calls up an
unusual path in one script because of an odd occurrence in an
earlier one:
Story III
John went to New York by bus. On the bus he talked
to an old lady. When he left the bus, he thanked the
driver. He took the subway to Leone's. On the
subway his pocket was picked. He got off the train
and entered Leone's. He had some lasagna. When the
check came, he discovered he couldn't pay. The
management told him he would have to wash dishes.
Long Paraphrase
John went to a bus stop. He waited at it a few
minutes. He entered a bus. The driver got the
ticket from John. He went to a seat. He sat down in
it. While John was on the bus an old lady and John
talked. The driver took John to New York. He went
to the driver. While getting off the bus John
thanked the driver. John got off it.
He entered a station. He put a token in the
turnstile. He went to the platform. He waited at it
a few minutes. He entered a subway car. A thief
went to John. The thief picked John's pocket. He
went. John went to the seat. He sat down in it.
The driver took John to Leone's. He left the subway
car. He left the station.
He entered Leone's. He looked around inside it.
He saw he could go to a table. He went to it. He
sat down in the seat. He ordered some lasagna. The
waiter indicated to the chef John would like him to
prepare something. The chef prepared the lasagna.
The waiter got it from the chef. The waiter went to
the table. He served the lasagna to John. He ate
it. He became full.
He asked the waiter for the check. John got it
from the waiter. John read the check. John
discovered he was unable to pay the check. He
indicated to the waiter he was unable to pay the
check. The management told John he would have to
wash dishes. He entered the kitchen. He washed
dishes. He left Leone's.
[paragraphing has been added to the 
computer output for ease of reading] 
In these example stories, SAM analyzes each input sentence
into a Conceptual Dependency (CD) representation. If this
representation fits a script, that script is called into memory
and successive inputs are matched in the script and linked up by
a SAM program called the script applier. The script applier
output is processed by other SAM programs depending on the type
of final output desired, and English or, for Story II, Chinese is
generated. The point to be stressed is that all the
'understanding' processing is done on a single data structure,
the story representation constructed by the script applier. We
discuss in particular the scriptal data base, the script applier
and the story representation in succeeding sections. Additional
details on the other parts of SAM can be found in [Schank et al.,
1975].
2.0 Situational Scripts 
As implemented in SAM, a situational script is a network of
CD patterns describing the major paths and turning points
commonly understood by middle-class Americans to occur in
stereotyped activities such as going to theaters, restaurants and
supermarkets. The script idea is very similar to the
independently developed 'frame system' for story understanding
described in [Charniak, 1975], which is itself based loosely on
the 'frame' concept [Minsky, 1974] currently used in vision
research.
The patterns provided in scripts are of two general kinds:
events, which we will construe broadly as including states and
state-changes (2) as well as mental and physical ACTS; and
causal relations among these events [Schank, 1973 and 1974].
(2) Certain actions like driving a car or preparing food
involve complex, learned sensory-motor skills as well as scriptal
knowledge. Such actions are summarized within a script as a
causal relation terminating in the chief state-change effected by
the action. For example, the sentence "The cook prepared the
meal" is represented in LISP CD format as:
((CON ((ACTOR (*COOK*) <=> (*DO*)))
 LEADTO
 ((ACTOR (*MEAL*) LEAVING (*COOKSTATE* VAL (0))))))
Patterns are used in scripts not only because of the variety of
possible fillers for the roles in scripts, but also to constrain
the amount of information needed to identify a story input.
Thus, for example, the script provides a LISP CD template like:
((ACTOR (X) <=> (*PTRANS*) OBJECT (X) TO (*INSIDE*
PART (RESTAURANT))))
to identify inputs like:
John went into Leone's.
John walked into Leone's.
John came into Leone's from the subway.
(X and RESTAURANT are dummy variables). This allows the script
applier to ignore inessential features of an input (like the
Instrument of the underlying ACT or the place John came from in
the examples given above), and thus provides a crude beginning
for a theory of forgetting.
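Template matching with dummy variables can be sketched as
follows. The dictionary encoding of CD is a simplification, and
the role names are only loosely modeled on the LISP template
above:

```python
# A sketch of script-template matching against a CD-style input.
# Dummy variables (?X, ?RESTAURANT) bind to fillers; extra roles
# in the input (e.g. FROM, the Instrument) are simply ignored,
# as the text describes.  The encoding is an assumption.

TEMPLATE = {"ACTOR": "?X", "ACT": "*PTRANS*", "OBJECT": "?X",
            "TO": "?RESTAURANT"}

def match_template(template, cd):
    """Return variable bindings if the CD input fits the template,
    else None.  A variable appearing twice (?X) must take the same
    filler both times."""
    bindings = {}
    for role, slot in template.items():
        if role not in cd:
            return None
        filler = cd[role]
        if slot.startswith("?"):
            if bindings.setdefault(slot, filler) != filler:
                return None
        elif slot != filler:
            return None
    return bindings          # unmatched CD roles are forgotten

# "John came into Leone's from the subway"
cd = {"ACTOR": "JOHN", "ACT": "*PTRANS*", "OBJECT": "JOHN",
      "TO": "LEONES", "FROM": "SUBWAY"}
print(match_template(TEMPLATE, cd))
```

Because the FROM role is absent from the template, the place John
came from never enters the bindings, which is the crude theory of
forgetting the text mentions.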
In the present implementation, SAM possesses three 'regular'
scripts, for riding a bus, for riding a subway, and for going to
a restaurant (3). These scripts have been simplified in various
ways. For example, all of them assume that there is only a
single main actor. The bus script has been restricted to a
single 'track' for a long-distance bus ride, and the restaurant
script does not have a 'McDonald's' or a 'Le Pavillon' track.
This was done primarily to have a data base capable of handling
specific stories of interest available in a reasonable time,
secondarily to limit the storage needed (4). Nevertheless, as
(3) The data base also contains script-like structures for
'weird' or 'unusual' happenings like the main actor's becoming
ill, or, as in Story III, having his pocket picked. Such
activities could be handled by a generalized inferencing program
like the one described in [Rieger, 1975].
the examples of Section 1.1 indicate, the current scripts are a
reasonable first pass at the dual problems of creating and
managing this type of data structure.
2.1 Goals, Predictions and Roles in Scripts
Each situational script supplies a default goal statement
which is assumed, in the absence of input from higher level
cognitive processes like 'planning' [Schank and Abelson, 1975],
to be what a story referring to a script is about. The
restaurant script, for example, defines the INGEST and the
resulting state-change in hunger as the central events of a story 
about eating in restaurants. Closely related to the goal 
statement is the sequence of mutual obligations that many scripts 
seem to entail. Invoking the bus script, for example, implies 
the contract between the rider and the bus management of a PTRANS 
to the desired location in return for the ATRANS of the fare.
Such obligations have a powerful influence on the predictions the 
system makes about new input. In the restaurant context, for 
example, an input referring to an event beyond ordering or eating
is not initially expected, because these events form the initial
statement of obligation. Thus the system takes longer to
identify a story sequence like:
John went to a diner. He left a large tip.
Once an input about ordering has been processed, SAM is prepared 
(4) The text for the restaurant script, presently the 
largest of the scripts, occupies roughly 100 blocks of PDP-10 
disk storage, or about 64,000 ASCII characters. 
to hear about the preparation and serving of food, actions 
associated with eating, or paying the bill, but not about leaving 
the restaurant. This is because the main actor has not fulfilled 
the other half of the obligation.
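The obligation-driven narrowing and widening of predictions can be sketched in Python (the event names and the sequence itself are invented stand-ins for the actual restaurant script):

```python
# Hedged sketch: a prediction set that expands once the diner's half of
# the restaurant "contract" (ordering) has been seen, and again once the
# obligation is discharged by paying. All names are invented.

RESTAURANT_SEQUENCE = ["enter", "sit", "order", "serve", "eat",
                       "pay", "tip", "leave"]

def expected_events(seen):
    """Events a SAM-style applier would currently predict, given the
    script events already identified in the story."""
    if "pay" in seen:
        horizon = len(RESTAURANT_SEQUENCE) - 1   # leaving now expected
    elif "order" in seen:
        horizon = RESTAURANT_SEQUENCE.index("pay")  # ...but not leaving yet
    else:
        horizon = RESTAURANT_SEQUENCE.index("order")  # only up to ordering
    return set(RESTAURANT_SEQUENCE[:horizon + 1]) - set(seen)

print("tip" in expected_events([]))                          # False
print("tip" in expected_events(["enter", "order", "eat", "pay"]))  # True
```

This mirrors why "John went to a diner. He left a large tip." takes longer to identify: the tip lies beyond the initially predicted horizon.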
The binding of nominals in the story input to appropriate 
fillers in the script templates is accomplished in SAM by means 
of script variables with associated features. In the rather
crude system of features presently used, each script variable is
assigned a superset membership class: e.g., a hamburger is a
'food', while a waiter is a 'human'. Certain variables are also
given roles: e.g., a hostess or a waiter can fill the
'maitre d'' role. The former property would enable the system to
distinguish between "The waiter brought Mary a hamburger" and
"The waiter brought Mary the check". The latter property
identifies important roles in script contexts, primarily those to
which it is possible to make definite reference without previous
introduction, like 'the driver', 'the cook' or 'the check'. For
stories in which certain script variables are not bound, the 
system provides a set of default bindings for the roles not 
mentioned: thus, SAM fills in 'meal' for a story in which the
food ordered is not explicitly named. Variables without 
distinguished roles default to an indefinite filler, like 
'someone' for the main actor. 
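A Python sketch of this binding scheme; the superset classes, role defaults, and all other names are invented illustrations, not SAM's actual feature system:

```python
# Hypothetical sketch of script-variable binding: each variable wants a
# superset class; unmentioned variables get a role default, and variables
# without a distinguished role fall back to an indefinite filler.

SUPERSET = {"hamburger": "food", "check": "bill",
            "waiter": "human", "hostess": "human"}
ROLE_DEFAULTS = {"food": "meal"}          # e.g. unnamed food -> 'meal'

def bind(script_vars, mentioned):
    """Bind each script variable to a mentioned nominal of the right
    class, else to the role's default, else to 'someone'."""
    bindings = {}
    for var, wanted_class in script_vars.items():
        found = [n for n in mentioned if SUPERSET.get(n) == wanted_class]
        bindings[var] = (found[0] if found
                         else ROLE_DEFAULTS.get(wanted_class, "someone"))
    return bindings

print(bind({"FOOD": "food", "SERVER": "human"}, ["waiter"]))
# {'FOOD': 'meal', 'SERVER': 'waiter'}
```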
2.2 Script Structure 
Each SAM script is organized in a top-down manner as
follows: into tracks, consisting of scenes, which are in turn
composed of subscenes. Each track of a script corresponds to a 
manifestation of the situation differing in minor features of the 
script roles, or in a different ordering of the scenes. So, for 
example, eating in an expensive restaurant and in McDonald's
share recognizable seating, ordering, paying, etc., activities,
but contrast in the price of the food, type of food served,
number of restaurant personnel, sequence of ordering and seating,
and the like. Script scenes are organized around the main
top-level acts, occurring in some definite sequence, that
characterize a scriptal situation. The giving of presents, for 
example, would be a scene focus in a birthday party script, but 
putting on a party hat would not be. The latter would correspond 
to a subscene, perhaps within the 'preparing-to-celebrate' scene
of that script. In general, subscenes are organized around acts
more or less closely related to the main act of the scene, either
contributing a precondition for the main act, as walking to a
table precedes sitting down; or resulting from the main act, as 
arriving at the desired location follows from the driver's act of 
driving the bus. An intuitive way of identifying scene foci and 
scene boundaries is to visualize a script network of interwoven 
paths. In such a network, the scene foci would correspond to 
points of maximum constriction; scene boundaries to points of
most constriction between foci. This essentially means that all
paths through a scene go through the main act (except abort
paths, discussed below), and relatively few events are at scene
edges. 
It is necessary, therefore, to distinguish certain events in 
a script: scripts, their tracks, scenes and subscenes all have
'main', 'initial' and 'final' events. For example, the main
event of the 'ordering' scene in a restaurant is the ordering act
itself; an initial event is reading the menu; and a final event 
is the waiter telling the cook the order. Additionally, scripts 
and tracks have associated 'summaries', which refer to a script 
in general terms. Consider, for example, the following sentence 
from Story III: "John went to New York by bus". This sentence
is marked in the underlying meaning representation by the SAM 
analyzer as a summary because of the presence of: 
((ACTOR (*JOHN*) <=> (*SDO*) OBJECT ($BUS))) 
in the Instrument slot (5). Such sentences have two common
functions in simple stories. They may indicate that a script was 
invoked and completed, and no further input should be expected 
for this instance of the script. This function of the summary 
often occurs with scripts (like those associated with travelling)
which tend to be used as 'instruments' of other scripts (as in
getting to a restaurant or store). Alternatively, they may 
signal that a wider range of possible next inputs is to be 
expected than would be predicted if the script were entered via 
an initial event. For example, the story sequence initiated with 
a summary: 
John took a train to New York. While leaving the 
train, he tipped the conductor. 
(5) The primitive ACT SDO is an extension of the primitive 
dummy CD ACT DO, and stands for an actor performing his script 
for a given situation, in this case the bus script ($BUS). 
sounds more natural than a sequence beginning with an initial 
event: 
John got on a train. While leaving the train, he
tipped the conductor.
These two functions of the summary contrast widely in the range
of predictions they invoke. However, additional inputs after a 
summary, as in the example above, often give the psychological 
Scenes are built up out of subscenes, which usually contain 
a single chunk of causal chain or 'path'. In SAM scripts, these 
paths are assigned a 'value' to indicate roughly their normality
in the scriptal context. Several pathvalues have been found
useful in setting up the story representation. At one end of the 
normality range is 'default', which designates the path the 
script applier takes through a scene when the input does not
explicitly refer to it. For example, the input sequence: 
John went to Consiglio's. He ordered lasagna. 
makes no mention of John's sitting down, which would commonly be
assumed in this situation. The system, following the default 
path, would fill in that John probably looked around inside the 
restaurant, saw an empty table, walked over to it, etc. Next on 
the normality scale is 'nominal', designating paths which are
usual in the script, not involving errors or obstructions in the
normal flow of events. The sentences in Story II which refer to
the hostess are examples of nominal inputs. Finally, there are
the 'interference/resolution' paths in a script. These are 
followed when an event occurs which blocks the normal functioning 
of the script. In a restaurant, for example, having to wait for 
a table is a mild interference; its resolution occurs when one
becomes available. More serious because it conflicts directly 
with the goal/obligation structure of the script is the main 
actor's discovery that he has no money to pay the bill. This is 
resolved in Story III by his doing dishes. An extreme example of
an interference is the main actor's becoming irritated when a 
waiter fails to take his order, as in Story I, followed by his 
leaving the restaurant. When this happens, the script is said to 
have taken an 'abort' path. 
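The pathvalue labels and the default-path fill-in might be sketched as follows (Python; the scene steps and labels' spellings are invented):

```python
# Sketch of the pathvalues attached to subscene paths, and of filling
# in default steps when the story skips them. Step names are invented.

from enum import Enum

class PathValue(Enum):
    DEFAULT = "default"            # assumed when not mentioned
    NOMINAL = "nominal"            # usual, unobstructed events
    INTERFERENCE = "interference"  # blocks the normal flow
    RESOLUTION = "resolution"      # removes an interference
    ABORT = "abort"                # script abandoned early

# a toy seating scene: each step carries its pathvalue
SEATING_SCENE = [("look-around", PathValue.DEFAULT),
                 ("see-table", PathValue.DEFAULT),
                 ("walk-to-table", PathValue.DEFAULT),
                 ("sit-down", PathValue.NOMINAL)]

def instantiate(mentioned):
    """Return the path actually taken, filling unmentioned default steps."""
    return [step for step, pv in SEATING_SCENE
            if step in mentioned or pv is PathValue.DEFAULT]

# "John went to Consiglio's. He ordered lasagna." never mentions sitting,
# so the default steps are assumed:
print(instantiate({"sit-down"}))
```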
In addition to the above, certain incomplete paths, i.e.,
paths having no direct consequences within the script, have been
included in the scriptal data base. The most important of these
incomplete paths are the inferences from, and preconditions for, 
the events in the direct causal paths. Lumped under the 
pathvalue 'inference', these subsidiary events identify crucial 
resultative and enabling links which are useful in particular for 
question-answering [Lehnert, 1975]. For example, the main path
event 'John entered the train' has attached the precondition that
the train must have arrived at the platform, which in turn is 
given as a result of the driver's bringing the train to the 
station. Similarly, a result of the main path event 'John paid
the bill' is that he has less money than previously. Both of 
these types of path amount to a selection among the vast number 
of inferences that could be made from the main path event by an 
inferencing mechanism like Rieger's Conceptual Memory program
[Rieger, 1975].
A special class of resultative inferences are those common
events which are potentialized by main path events, though they 
may not occur in a given story. Labelled with the pathvalue 
'parallel', these events may either occur often in a specific 
context without having important consequences, as in "The waiter
filled John's water glass"; or they may happen in almost any
context without contributing much to the story, as in the
sentence "On the bus, John talked to an old lady", from Story
III. Since such parallel paths often lead nowhere, they are
good candidates for being forgotten. 
3.0 The Script Applier 
Construction of a story representation from CD input 
supplied by the SAM analyzer is the job of the script applier 
(6). Under control of the SAM executive, the applier locates 
each new input in its collection of situational scripts, links it 
up with what has gone before, and makes predictions about what is 
likely to happen next. Since the SAM system as a whole is 
intended to model human understanding of simple, script-like
stories, the script applier organizes its output into a form
suitable for subsequent summary, paraphrase and
question-answering activities.
In the course of fitting a new input into the story 
(6) The current version of the applier is programmed in 
MLISP/LISP 1.6 and runs in an 85K core image on a PDP-10 
computer. Processing of Story III, the longest story attempted
to date, took approximately 8 minutes with SAM as the single user 
of the timesharing system. 
representation, the applier performs several important subtasks. 
Identifying an input often requires an implicit job of reference 
specification. For example, in the sentence from Story III
beginning "When the check came...", there is surface ambiguity,
reflected in the parser's output, regarding donor and recipient.
This ambiguity is settled in the restaurant context by the
assumption that the recipient is the main actor and that the 
donor is a member of the restaurant staff, preferably the waiter. 
An allied problem arises when the applier, in placing a new
conceptualization in the story representation, determines the 
relevant time relations. Certain types of time data are computed 
from the output conceptualization itself: for example, the
relation between an MTRANS and its MOBJECT, which may determine
whether 'remember' or 'ask for' is appropriate in the final
output. Other time relations are defined by the causal structure
of the script itself: thus 'eating' follows 'ordering'.
More complex time-order computations have to be made when 
the applier identifies two or more 'simple' conceptualizations in 
a compound input derived from sentences containing ambiguous 
words like 'during' or 'when'. Examples of this were encountered 
during the processing of Story III, for example, in the sentence
"When he left the bus, he thanked the driver". The system
resolves this compound input into the plausible sequence of a
PTRANS to the driver, the MTRANS of the 'thanking', and the
PTRANS off the bus. 
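One way to sketch that time-order computation is to sort the sub-conceptualizations by their causal position in the script (Python; the event names and the ordering list are invented stand-ins):

```python
# Sketch: ordering the simple conceptualizations of an ambiguous "when"
# sentence by their causal position in a toy bus script. The real
# applier consults the script's causal structure; here that structure
# is flattened into an invented linear order.

BUS_SCRIPT_ORDER = ["board", "ride", "move-to-driver",
                    "thank-driver", "exit-bus"]

def resolve_compound(events):
    """Return the plausible temporal sequence of sub-events."""
    return sorted(events, key=BUS_SCRIPT_ORDER.index)

# "When he left the bus, he thanked the driver."
print(resolve_compound(["exit-bus", "thank-driver", "move-to-driver"]))
# ['move-to-driver', 'thank-driver', 'exit-bus']
```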
3.1 Story Representation
The output of the script applier consists of linked story 
segments, one per script invoked, giving the particular script 
paths traversed by the input story. The backbone of the story 
representation is the eventlist of all the acts and state-changes
that took place. The eventlist is doubly linked, causally and
temporally, with the type of causation and time relations filled
in within a story segment by the applier.
Attached to the eventlist are the appropriate, instantiated
preconditions, inferences and parallel events for each main path
event. As discussed above, the inferences and preconditions have
been selected for their expected utility in question-answering.
Each story segment is identified by a label which gives 
access to important properties of the segment: what script it 
came from; what the particulars were of the script summary, 
maincon, entrycon, and exitcon this time through; and what 
interference/resolution cycles were encountered. Additionally,
pointers are provided to extra-scriptal 'weird' events that
happened in the story. At the top, the global identifier STORY 
gives the gross structure of the story in terms of sequential, 
parallel and nested scripts and the weird things. This 
hierarchical organization facilitates summary and short 
paraphrase processing, while retaining the fine structure needed 
for extended paraphrasing and question-answering.
Story III illustrates most of the present capabilities of
the SAM script applier in story understanding. The applier 
accepts a CD representation of the nine sentences in turn from 
the analyzer and builds an eventlist consisting of 56 main path 
conceptualizations and 39 associated preconditions/inferences.
The 'parallel' events of John talking to the old lady and the bus 
driver also appear in the eventlist. The eventlist is divided 
into four story segments, one each for the bus, subway and 
restaurant scripts and one for the 'weird' robbery event. The 
identifier for the subway segment is marked as containing the 
weird event, as is the global STORY. The restaurant segment 
contains the interference/resolution pair 'unable to pay/wash 
dishes'. Additionally, the lack of money encountered during the
paying scene was checked with the SAM executive during the
processing of Story III, since it violates one of the prime
preconditions of the restaurant script. Since the executive
found that the loss of money was a consequence of the stealing
event that occurred earlier, this event is not marked as weird.
Appropriate summaries are provided for each story segment. At 
the top, STORY contains the information that the four segments 
are organized as a sequence of bus, subway and restaurant, with 
the pickpocket event nested inside the subway segment. 
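The doubly linked eventlist backbone described in Section 3.1 might look like this in outline (Python sketch; the field names and events are invented):

```python
# Sketch of the eventlist backbone: each event carries temporal links
# in both directions, plus causal links filled in by the applier.

class Event:
    def __init__(self, concept):
        self.concept = concept
        self.after = None        # temporal successor
        self.before = None       # temporal predecessor
        self.results_in = None   # causal successor
        self.enabled_by = None   # causal predecessor

def link(a, b, causal=False):
    """Doubly link a -> b temporally, and causally if requested."""
    a.after, b.before = b, a
    if causal:
        a.results_in, b.enabled_by = b, a

order = Event("john-orders")
serve = Event("waiter-serves")
eat = Event("john-eats")
link(order, serve, causal=True)
link(serve, eat, causal=True)

print(eat.before.concept, eat.enabled_by.concept)
# waiter-serves waiter-serves
```

As the text notes, such a linear backbone breaks down for several actors or non-synchronous events; Section 4.0 returns to this.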
4.0 Future Work 
As the examples show, SAM is capable of handling fairly 
complex stories in its present state of development. However, 
several extensions and additions to the scriptal data base and 
the script applier appear to be needed before SAM can achieve its 
ultimate potential. 
First, a more flexible method of pattern-matching is 
required so that the full diversity of input role-fillers can be 
accommodated. A method of comparing features of nominals in the 
parser output to the appropriate script variables is needed so 
that over- or underspecified inputs can be correctly identified. 
For example, the applier should be able to recognize the phrase 
'the restaurant' as a partially specified instance of 'Leone's',
found earlier. 
As an extension of this, input conceptualizations of a
descriptive nature (e.g., "The restaurant was of red brick")
need to be processed in a way that allows the system to update
its 'image' of the role-fillers in a script. The facilities
needed are similar to those provided by the 'occurrence set' in
Rieger's Conceptual Memory program [Rieger, 1975].
The most important problem to be faced, however, is the 
generalization of the story representation to handle stories with 
several main actors, or with non-synchronous events. It is clear
that the simple linear eventlist structure described in Section 
3.1 would not be adequate for even such a simple story sequence 
as: 
"The cook made the lasagna, Meanwhile the wine 
steward poured the wine, *I 
4.1 Acknowledgement 
The programs discussed here are only a part of the SAM 
system, and a great deal of credit is due to my co-workers in the 
Yale AI Project: to Professors Roger Schank and Bob Abelson for
the theory on which SAM is based and for their overall guidance;
to Dr. Chris Riesbeck for valuable discussion and criticism, as
well as a substantial part of the programming effort; and to
Gerry DeJong, Leila Habib, Wendy Lehnert, Jim Meehan, Dick
Proudfoot, Wally Stutzman and Bob Wilensky.
Department of Computer and Information Science 
The Moore School of Electrical Engineering 
University of Pennsylvania 
Philadelphia 19174 
ABSTRACT 
A system has been designed to translate connected sequences of visual 
images of physical activities into conceptual descriptions. The representation 
of such activities is based on a canonical verb of motion so that the
conceptual description will be compatible with semantic networks in natural
language understanding systems. A case structure is described which is 
derived from the kinds of information obtainable in image data. A possible 
solution is presented to the problem of segmenting the temporal information 
stream into linguistically and physically meaningful events. An example
is given for a simple scenario, showing part of the derivation of the lowest 
level events. The results of applying certain condensations to these events
show how details can be systematically eliminated to produce simpler, more 
general, and hence shorter, descriptions. 
This research was primarily supported by Canadian Defense Research 
Board grant 9820-11, and partially by National Science Foundation grant
If we view a motion picture such as illustrated in Figure 1, we are able 
to give a description of the physical activities in the scenario. 
This description is linguistic in the sense that the words used express our recognition
of objects and movements as conceptual entities. A system for 
performing a sizeable part of this transformation of visual data into
conceptual descriptions has been designed. It is described in Badler (1975);
here we will present one small part of the system which is 
concerned with the organization of abstracted data from successive images 
of the scenario. 
We are interested in a possible solution to the following problem: Given
that a conceptual description of a scenario is to be generated, how is it 
decided where one verb instance starts and another ends? In other words, 
we seek computational criteria which separate visual experience into 
discrete "chunks" or events. By organizing the representation of an event 
into a case structure for a canonical motion verb, events can be described 
in linguistic terms. Verbs of motion have been investigated directly or 
indirectly by Miller (1972), Hendrix et al. (1973a, 1973b), Martin (1973), and
Schank (1973); semantic databases using variants of case structure verb
representations (Fillmore (1968)) include Winograd (1972), Rumelhart et al.
(1972), and Simmons (1973).
We are concerned with physical movements of rigid or jointed objects 
so that motions may be restricted to translations and rotations. 
Objects may 
appear or disappear and the observer is free to move about. 
The resulting 
activities are combinations of these where observer motions are factored
out if at all possible. We assume that the scenarios contain recognizable 
objects exhibiting physically possible, and preferably natural, motions. 
A particular activity might consist of a single event, a sequence of events, 
sets of event sequences, or hierarchic organizations of events. 
The concept 
of "walking" is a good example of the last. 
Events are the basic building blocks 
of the conceptual description, and our events indicate the motion of objects.
The interpretation of motion in terms of causal relationships is generally 
Figure 1. The moving car scenario
Table 1
Adverbials
Type  Relationship                                Set of Concepts
(1)   between the orientation and trajectory      BACKWARD, FORWARD, SIDEWAYS,
      or axis of an object                        AROUND, OVER, CLOCKWISE,
                                                  COUNTERCLOCKWISE
(2)   between the trajectory of an object         DOWN(WARD), UP(WARD), NORTHWARD,
      and fixed world directions                  SOUTHWARD, EASTWARD, WESTWARD
(3)   changing between objects                    ACROSS, AGAINST, ALONG, APART,
                                                  AROUND, AT, AWAY-FROM, BEHIND,
                                                  BY, FROM, IN, INTO, OFF, ON,
                                                  ONTO, OUT, OUT-OF, OVER,
                                                  THROUGH, TO, TOGETHER, UNDER
(4)   indicative of source and target             AWAY-FROM, IN-THE-DIRECTION-OF,
                                                  IN(WARD), OUT(WARD), TOWARD
(5)   between the path of an object and           AFTER, AHEAD-OF, ALONG, APART,
      other (moving) objects                      TOGETHER, WITH
(6)   between an event and a previous             BACK-AND-FORTH, TO-AND-FRO,
      event                                       UP-AND-DOWN, BACK, THROUGH
beyond the scope of the current system, although a semantic inference com- 
ponent could be included. Our descriptions consist mostly of observation 
of motion in context rather than explanation of why motion occurred. 
The general descriptive methodology is to keep only one static relational 
description of the scenario, that of the current image. Changes between 
it and the next sequential image are described by storing the names of 
changes in event nodes in a semantic network. In general, names of 
changes correspond to adverbs or prepositions (adverbials) describing 
directions or changing static relationships. Computational definitions for 
the set of adverbials in Table 1 appear in Badler (1975). We are only
concerned with the senses of the adverbials pertaining to movement. Definitions
are implemented as demons: procedures which are activated, then executed,
by the successive appearance of certain assertions in the image description 
or current conceptual database. These demons are related to those of 
Charniak (1972), although our use of them, their numbers, and their 
organization are simplified and restricted. They are used to recognize or 
classify properties or changes and to generate the hierarchic descriptive 
structure. An essential feature of this methodology is that the descriptions 
are continually condensed by this change abstraction process; descriptions 
grow in depth rather than length. 
The semantic information stored for each object in the scenario 
includes its TYPE, structural SUB-PARTS, VISIBILITY, MOBILITY, LOCATION,
ORIENTATION, and SIZE.
Most of these properties are determined from 
the image sequence, but some are stored in object models (indexed by TYPE) 
in the semantic network.
The events are also nodes in the semantic network. Each object is
potentially the SUBJECT of an event node. A sequence of event nodes forms 
a history of movement of an object; only the latest node in the sequence is 
active. The set of active event nodes describes the current events in the
scenario seen so far. The cases of the event node along with their approximate 
definitions follow. 
SUBJECT: An object which is exhibiting movement. 
AGENT: A motile object which contacts the SUBJECT. 
INSTRUMENT: A moving object which contacts the SUBJECT. 
REFERENCE: A pair of object features (on a fixed object) which are 
used to fix absolute directions independent of the observer's position. 
DIRECTION: A temporally-ordered list of adverbials and their associated 
objects which apply to this SUBJECT. 
TRAJECTORY: The spatial direction of a location change of the SUBJECT. 
VELOCITY: The approximate magnitude of the velocity of the SUBJECT 
along the TRAJECTORY; it includes a RATES list containing STARTS, 
STOPS and (optionally) INCREASES or DECREASES. 
AXIS: The spatial direction of an axis of an orientation change (rotation) 
of the SUBJECT. 
ANGULAR-VELOCITY: Similar to VELOCITY, except for rotation about
the AXIS.
NEXT: The temporal successor event node having the same SUBJECT.
START-TIME: The time of the onset of the event.
END-TIME: The time of the termination of the event. 
REPEAT-PATH: A list of event nodes which form a repeating sequence. 
These cases differ from Miller's (1972) primarily in the lack of a "permissive" 
case and our separation of the TRAJECTORY and AXIS cases. 
REFERENCE 
is new; one of its uses is to resolve descriptions of the same event from 
different viewpoints. The explicit times could be replaced by temporal 
relations. Miller's reflexive/objective distinction is not needed as each 
moving object has its own event nodes, regardless of the AGENT. 
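A Python rendering of the event-node cases (the case names follow the list above; the class itself and the example values are invented):

```python
# Sketch of an event node as a dataclass. Field names follow the
# paper's case list; types and the example are illustrative only.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class EventNode:
    subject: str                              # object exhibiting movement
    agent: Optional[str] = None               # motile object contacting SUBJECT
    instrument: Optional[str] = None          # moving object contacting SUBJECT
    reference: Optional[tuple] = None         # feature pair fixing directions
    direction: list = field(default_factory=list)  # adverbial history
    trajectory: Optional[str] = None
    velocity: float = 0.0
    axis: Optional[str] = None
    angular_velocity: float = 0.0
    next: Optional["EventNode"] = None        # successor with same SUBJECT
    start_time: Optional[int] = None
    end_time: Optional[int] = None

    def terminated(self):
        # "An event node is terminated when it has a non-NIL NEXT value."
        return self.next is not None

e1 = EventNode(subject="car", start_time=0, end_time=3)
e2 = EventNode(subject="car", start_time=3, end_time=5)
e1.next = e2
print(e1.terminated(), e2.terminated())   # True False
```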
A few necessary definitions follow before the presentation of the event 
generation algorithm. 
A null event node has all its cases NIL or zero except START-TIME,
END-TIME, and perhaps NEXT.
An event node is terminated when it has a non- NIL NEXT value. 
The function CREATE-EVENT-NODE (property pairs) creates an event 
node with the indicated case values, returning the node as a result. 
To compare successive values of numerical properties, a queue is
associated with the case in current event nodes only. The front of the queue
is represented by "*": the place where new information is stored. The
queues have length three; the three positions will be referenced by prefixing
the case name with either "NEW", "CURRENT", or "LAST". A function
SHIFT manipulates property queues when they require updating:
LAST-property := CURRENT-property;
CURRENT-property := NEW-property;
NEW-property := NIL.
The times will be abbreviated by TN and TC.
For a particular event node E:
TN := NEW-END-TIME(E);
TC := CURRENT-END-TIME(E);
Thus TN is always equal to the present image time.
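The three-position queue and SHIFT can be sketched directly (Python; NIL is rendered as None, and the example values are invented):

```python
# Sketch of the three-position property queue and SHIFT, following the
# NEW/CURRENT/LAST scheme in the text.

def shift(queue):
    """queue = [NEW, CURRENT, LAST]; return the shifted queue:
    LAST := CURRENT; CURRENT := NEW; NEW := NIL."""
    new, current, _last = queue
    return [None, new, current]

q = ["fast", "slow", "stopped"]   # NEW, CURRENT, LAST
q = shift(q)
print(q)   # [None, 'fast', 'slow']
```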
Now we can present the algorithm for the demon which controls the con- 
struction of the entire event graph. It is executed once for each image when
all lower level demons have finished; it creates, terminates, or updates each 
current event node. 
A.1. Creating event nodes.
A.1.1. An event node E is created when a mobile object first becomes
visible and identifiable as an object.
E := CREATE-EVENT-NODE((SUBJECT object-node)
(VELOCITY (* 0. 0.))
(ANGULAR-VELOCITY (* 0. 0.))
(START-TIME NIL)
(END-TIME (* TN TN))).
The NIL START-TIME has the interpretation that we do not know what 
was happening to this object prior to time TN. 
A.1.2. An event node E is created when a jointed part of the parent
object with current event node EP is first observed to move relative to the
parent, for example, an arm relative to a person's body.
TC := CURRENT-END-TIME(EP);
E := CREATE-EVENT-NODE((SUBJECT object-part-node)
(AGENT parent-object-node)
(INSTRUMENT joint-node)
(REFERENCE ...)
(DIRECTION ...)
(TRAJECTORY ...)
(VELOCITY ...)
(AXIS ...)
(ANGULAR-VELOCITY ...)
(START-TIME TC)
(END-TIME (TN TC TC))).
This is interpreted as the parent object moving the part using the joint as 
the "instrament". Any appfopriate attributes are placed in the NEW -property 
positions. The node E is then immediately terminated (A. 1.3). 
A. 1.3, An event node E2 is created whenever another event node El 
- 
is terminated. 
TC: = CURRENT-END-TIME(E 1); 
NEXT(E1): = CREATE-EVENT-NODE( 
(SUBJECT ...)
(AGENT ...)
(INSTRUMENT ...)
(REFERENCE ...)
(DIRECTION ...)
(TRAJECTORY SHIFT(TRAJECTORY(E1)))
(VELOCITY SHIFT(VELOCITY(E1)))
(AXIS SHIFT(AXIS(E1)))
(ANGULAR-VELOCITY SHIFT(ANGULAR-VELOCITY(E1)))
(START-TIME TC)
(END-TIME SHIFT(END-TIME(E1))));
E2 := NEXT(E1).
SUBJECT, AGENT, INSTRUMENT, REFERENCE, and DIRECTION are those
which were present at termination of the previous node, subject to any 
additional conditions that changes in these may require. 
A.2. Terminating event nodes. An event node E is terminated when
there are significant changes in its properties. All queue structures are 
deleted. 
END-TIME(E) := CURRENT-END-TIME(E);
TRAJECTORY(E) := CURRENT-TRAJECTORY(E);
AXIS(E) := CURRENT-AXIS(E);
VELOCITY(E) := (CURRENT-VELOCITY(E) RATES(VELOCITY(E)));
ANGULAR-VELOCITY(E) := (CURRENT-ANGULAR-VELOCITY(E)
RATES(ANGULAR-VELOCITY(E))).
The DIRECTION list is unaltered except that the terminating adverbial(s) may
be added to DIRECTION(E) rather than to DIRECTION(NEXT(E))
(see A.2.5).
A.2.1. Changes in SUBJECT. The assumptions of object rigidity and
permanence preclude changes in an object. 
A.2.2/3. Changes in AGENT and INSTRUMENT. These must be
preceded by changes in CONTACT relations between objects and the SUBJECT.
See A.2.5 on DIRECTION.
A.2.4. Changes in REFERENCE. A change in the REFERENCE features
forces termination of every event node referencing those features, as such 
changes are usually caused by spatial or temporal discontinuities in the 
scenario. 
A.2.5. Changes in DIRECTION.
Changes in type (1) adverbials must be preceded by changes in TRAJECTORY,
VELOCITY, AXIS, or ANGULAR-VELOCITY, because a relationship between 
an orientation and a TRAJECTORY or AXIS cannot change without at least 
one of the four cases changing. Changes in BACKWARD, FORWARD, and 
SIDEWAYS cause termination; this may occur with no orientation change 
if the TRAJECTORY has a non-zero derivative. For example, move a box 
in a circle while keeping its orientation constant. 
Changes in type (2) adverbials must be preceded by a change in TRAJECTORY, 
but some of these changes may be too slight to cause termination from the 
TRAJECTORY criteria (A.2.6). Changes from UP to DOWN or vice versa
are the only ones in this group causing termination. 
Changes in type (3) adverbials terminate event nodes if and only if there 
is a change in a CONTACT relation or a VISIBILITY property.
If the 
CONTACT is made or the VISIBILITY established, the adverbial goes into 
the new node's DIRECTION list. If the CONTACT is broken or VISIBILITY 
lost, the adverbial remains on the front of the terminated node's DIRECTION 
list. 
Since the type (4) adverbials are only indicators of current source and 
target, these do not change unless the path of the SUBJECT changes or 
the target object moves. Therefore no terminations arise from this group. 
The type (5) adverbials relate paths of the SUBJECT to other objects. 
They cause termination when they come into effect, and terminate their 
own nodes when they cease to describe the path. 
The type (6) adverbials include higher level events and the basic
repetitions. These all terminate the current event node. The repeated 
events (for example, BACK-AND-FORTH) are terminated when the
repetition appears to cease. 
A.2.6. Changes in TRAJECTORY. The changes in TRAJECTORY
that are most important are those which change its derivative significantly.
A change in the derivative from or to zero can be used (the start or end 
of a turn), but only the start is actually used for termination. Once the 
turn is begun, how it ends is unimportant since the final (current) tra- 
jectory is always saved. 
The other termination case watches for a momentarily large derivative 
which settles back to smaller values. This indicates a probable collision. 
It is of crucial importance in inferring CONTACT relations between objects 
when none were (or could be) directly observed. 
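The derivative-spike test for inferring a collision might be sketched as follows (Python; the headings, thresholds, and function name are all invented):

```python
# Sketch of the second TRAJECTORY-termination case: a momentarily large
# change in heading that settles back to small values suggests a
# probable collision. Thresholds (degrees) are invented.

def collision_suspected(last, current, new, spike=45.0, settle=10.0):
    """Headings at the LAST, CURRENT, and NEW times of the queue."""
    jump = abs(current - last)    # momentary derivative magnitude
    after = abs(new - current)    # does it settle back down?
    return jump >= spike and after <= settle

print(collision_suspected(0.0, 90.0, 95.0))   # spike then settles: True
print(collision_suspected(0.0, 5.0, 10.0))    # smooth turn: False
```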
A.2.7. Changes in VELOCITY. A change in VELOCITY from zero to
a positive value (from a positive value to zero) terminates the current
event node and enters STARTS (STOPS) in the new node's (old node's)
VELOCITY RATES list.
A.2.8. Changes in AXIS. A reversal of rotation terminates the event 
node. This corresponds to a change in AXIS to the opposite direction, with 
no intermediate values. 
A.2.9. Changes in ANGULAR-VELOCITY. A change in ANGULAR- 
VELOCITY from zero to a positive value (from a positive value to zero) terminates 
the current event node and enters STARTS (STOPS) in the new node's (old 
node's) ANGULAR-VELOCITY RATES list. 
A.2.10. Changes in NEXT are not meaningful. 
A.2.11-12. Changes in START-TIME and END-TIME are not meaningful. 
A.2.13. Changes in REPEAT-PATH. When new data fails to match 
the appropriate sub-event node of a REPEAT-PATH event node E, E is 
terminated. The definition of "match" for the basic repetitions appears 
in Badler (1975). The problem, in general, remains open. See, for example, 
Becker (1973). 
A.3. Maintaining event nodes. If the new assertions do not cause 
termination of the event node, the property queues are merely shifted: 
TRAJECTORY(E) := SHIFT(TRAJECTORY(E)); 
VELOCITY(E) := SHIFT(VELOCITY(E)); 
AXIS(E) := SHIFT(AXIS(E)); 
ANGULAR-VELOCITY(E) := SHIFT(ANGULAR-VELOCITY(E)); 
END-TIME(E) := SHIFT(END-TIME(E)). 
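The maintenance step amounts to a bounded-history shift on each property queue. A minimal sketch, assuming a queue length of three and a dictionary layout for the event node (both are our assumptions):

```python
from collections import deque

def shift(queue, new_value):
    """SHIFT: drop the oldest entry and append the newest, keeping a
    bounded history for one property of the event node."""
    queue.popleft()
    queue.append(new_value)
    return queue

def maintain(event_node, new_assertions):
    """If the new assertions cause no termination, merely shift each
    property queue, as in A.3."""
    for prop in ("TRAJECTORY", "VELOCITY", "AXIS",
                 "ANGULAR-VELOCITY", "END-TIME"):
        shift(event_node[prop], new_assertions[prop])
    return event_node

# Illustrative event node with three-deep history per property.
node = {p: deque([None, None, None], maxlen=3)
        for p in ("TRAJECTORY", "VELOCITY", "AXIS",
                  "ANGULAR-VELOCITY", "END-TIME")}
maintain(node, {"TRAJECTORY": "NORTH", "VELOCITY": 1.0,
                "AXIS": "UP", "ANGULAR-VELOCITY": 0.0, "END-TIME": 5})
```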
What does an event mean? This algorithm motivates a theorem that 
the events generated are the finest meaningful partition of the movements 
in the image sequence into distinct activities. The hypothesis of the 
assertion is the natural environment being observed and the linguistically- 
based conceptual description desired. The conclusion is that an event node 
produced from this algorithm describes either the lack of motion or else 
an unimpeded, simple linear or smoothly curving (or rotating) motion of 
the SUBJECT with no CONTACT changes. In addition, the orientation of 
the SUBJECT does not change much with respect to the trajectory. The 
proof of this assertion follows directly from the choice of termination 
conditions. 
We will apply this algorithm to data obtained from each of the images 
in Figure 1. The lower front edge of the house is arbitrarily chosen as 
the REFERENCE feature; NORTH is toward the right of each image. We 
will not discuss the computation of the static relations from each image, 
only list in Table 2 the changes in the static description from image-to- 
image. Trajectory and rotation data are omitted for simplicity, although 
changes of significance are indicated. 
If we "write out" the event node sequence using the canonical motion 
verbs MOVES and TURNS with the adverbial phrases from the RATES 
and DIRECTION lists, we obtain the following lengthy but accurate 
description: 
C.1 There is a CAR. 
C.2 The CAR STARTS MOVING TOWARD the OBSERVER and EASTWARD, 
then ONTO the ROAD. 
C.3 The CAR, while GOING FORWARD, STARTS TURNING, MOVES 
TOWARD the OBSERVER and EASTWARD, then NORTHWARD-AND- 
EASTWARD, then FROM the DRIVEWAY and OUT-OF the 
DRIVEWAY, then OFF-OF the DRIVEWAY. 
Table 2 
Selected assertions and changes involved in the description of Figure 1. 

Image | Action | Static Assertion            | Event Assertion           | Result
------+--------+-----------------------------+---------------------------+----------------------
  1   | ADD    | IN-FRONT-OF(CAR OBSERVER)   |                           |
      | ADD    | IN-BACK-OF(CAR HOUSE)       |                           |
      | ADD    | RIGHT-OF(CAR HOUSE)         |                           |
      | ADD    | NEAR-TO(CAR HOUSE)          |                           |
      | ADD    | SURROUNDED-BY(CAR DRIVEWAY) |                           |
      | ADD    | LEFT-OF(CAR DRIVEWAY)       |                           |
      | ADD    | IN-BACK-OF(CAR DRIVEWAY)    |                           |
      | ADD    | RIGHT-OF(CAR DRIVEWAY)      |                           |
      | ADD    | AT(CAR DRIVEWAY)            |                           |
      | ADD    | SUPPORTED-BY(CAR DRIVEWAY)  |                           |
------+--------+-----------------------------+---------------------------+----------------------
  5   | DELETE | IN-BACK-OF(CAR DRIVEWAY)    | (STARTS)                  | (A.2.7.)
      | ADD    | SUPPORTED-BY(CAR ROAD)      | EASTWARD                  |
      | ADD    | IN-FRONT-OF(CAR DRIVEWAY)   | TOWARD OBSERVER           |
------+--------+-----------------------------+---------------------------+----------------------
      | ADD    | IN-FRONT-OF(CAR HOUSE)      | TRAJECTORY change         | terminate C2 (A.2.6.)
      |        |                             | ONTO ROAD                 | terminate C2 (A.2.5.)
      |        |                             | ANGULAR-VELOCITY (STARTS) | terminate C2 (A.2.9.)
      |        |                             | NORTHWARD-AND-EASTWARD    |
------+--------+-----------------------------+---------------------------+----------------------
  7   | DELETE | LEFT-OF(CAR DRIVEWAY)       | OUT-OF DRIVEWAY           | terminate C3 (A.2.5.)
      | DELETE | SURROUNDED-BY(CAR DRIVEWAY) | FROM DRIVEWAY             |
      | DELETE | AT(CAR DRIVEWAY)            | FORWARD                   |
      | ADD    | NEAR-TO(CAR DRIVEWAY)       | OFF-OF DRIVEWAY           |
      | DELETE | SUPPORTED-BY(CAR DRIVEWAY)  |                           |
------+--------+-----------------------------+---------------------------+----------------------
      | DELETE | NEAR-TO(CAR DRIVEWAY)       | NORTHWARD                 |
      | ADD    | LEFT-OF(CAR HOUSE)          |                           |
      | ADD    | FAR-FROM(CAR DRIVEWAY)      |                           |
------+--------+-----------------------------+---------------------------+----------------------
 12   | DELETE | NEAR-TO(CAR HOUSE)          | AROUND HOUSE              | (STOPS) (A.2.9.)
      | ADD    | FAR-FROM(CAR HOUSE)         | AWAY-FROM DRIVEWAY        |
      |        |                             | AWAY-FROM HOUSE           |
------+--------+-----------------------------+---------------------------+----------------------
 15   | DELETE | VISIBILITY(CAR VISIBLE)     | AWAY                      |

Notes: 
Relations with HOUSE use the house front orientation, not the 
observer's front. 
Termination of Ci creates Ci+1 by A.1.3. 
C.4 The CAR, while GOING FORWARD, MOVES NORTHWARD-AND- 
EASTWARD, then NORTHWARD, then AROUND the HOUSE and 
AWAY-FROM the DRIVEWAY, then AWAY-FROM the HOUSE and 
STOPS TURNING. 
C.5 The CAR, while GOING FORWARD, MOVES NORTHWARD, then 
AWAY. 
The canonical form follows easily from the case representation and the 
DIRECTION list orderings. The directional adverbials FORWARD, 
BACKWARD and SIDEWAYS are interpreted as lasting the duration of the 
event, hence are written as "while GOING..." clauses. STARTS is always 
interpreted at the beginning of the sentence, STOPS at the end. The 
termination conditions assure the correctness of this placement. 
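The write-out convention just described can be sketched as a small template function. The function name and node fields are our illustration; only the ordering rules (STARTS first, STOPS last, durative adverbials as "while GOING..." clauses) come from the text.

```python
# Hedged sketch of writing out one event node with the canonical
# motion verb MOVES; field names are illustrative.

def write_out(subject, rates, directions, durative=None):
    """Render one event node: STARTS at the beginning, STOPS at the
    end, durative adverbials as a 'while GOING ...' clause."""
    s = f"The {subject}"
    if durative:
        s += f", while GOING {durative},"
    if "STARTS" in rates:
        s += " STARTS MOVING "
    else:
        s += " MOVES "
    s += ", then ".join(directions)
    if "STOPS" in rates:
        s += ", then STOPS"
    return s + "."
```

For example, the node behind C.2 would render from its RATES list ["STARTS"] and its ordered DIRECTION list.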
There is much redundancy in this description, but it is only the lowest 
level, after all, and many activities span several events. Two sets of 
condensations are applied by demons that watch over terminated event nodes. 
The first set is mostly concerned with interpreting certain null events 
caused by the image sampling rate and removing trajectory changes 
which prove to be insignificant. The second set of demons removes adverbials 
referring to directions in the support plane, removes RATES terms except 
STOPS, and generalizes redundant adverbials referring to the same object. 
The result of applying these condensations is: 
C.2 The CAR MOVES TOWARD the OBSERVER, then ONTO the ROAD. 
C.3 The CAR, while GOING FORWARD, MOVES TOWARD the 
OBSERVER, then FROM the DRIVEWAY. 
C.4 The CAR, while GOING FORWARD, MOVES AROUND the HOUSE 
and AWAY-FROM the DRIVEWAY, then AWAY-FROM the HOUSE, 
then STOPS TURNING. 
C.5 The CAR, while GOING FORWARD, MOVES AWAY. 
Another condensation can be applied for the sake of less redundant output. 
It does not, however, permanently affect the database: 
The CAR MOVES TOWARD the OBSERVER, then ONTO the ROAD, while 
GOING FORWARD, then FROM the DRIVEWAY, then AROUND the 
HOUSE, then AWAY-FROM the HOUSE, then STOPS TURNING, then 
MOVES AWAY. 
Note that FROM the DRIVEWAY follows ONTO the ROAD. This is due to 
the pictorial configuration: the car is on the road before it leaves the 
driveway. The position of the "while GOING FORWARD" phrase could be 
shifted backwards in time to the beginning of the translatory motion, but 
this may be risky in general. We will leave it where it is, since this is 
primarily a higher level linguistic matter. 
By applying demons which recognize instances of specific motion 
verbs to the individual event nodes, then condensing as above, we get: 
The CAR APPROACHES, then MOVES ONTO the ROAD, then LEAVES 
the DRIVEWAY, then TURNS AROUND the HOUSE, then DRIVES 
AWAY-FROM the HOUSE, then STOPS TURNING, then DRIVES AWAY. 
The major awkwardness with this last description is that it relates the 
car to every other object in the scene. Normally one object or another 
would be the focus of attention and statements would be made regarding 
its role. Such manipulations of the descriptions are yet unclear. 
In conclusion, we have outlined a small part of a system designed to 
translate sequences of images into linguistic semantic structures. Space 
permitted us only one example, but the method also yields descriptions 
for scenarios containing observer movement and jointed objects (such as 
walking persons). The availability of low level data has significantly 
shaped the definitions of the adverbials and motion verbs. Further work 
on these definitions, especially motion verbs, is anticipated. We expect 
that the integration of vision and language systems will benefit both domains 
by sharing in the specification of representational structures and description 
processes. 
American Journal of Computational Linguistics 
Microfiche 35 : 84 
JUDY ANNE KEGL AND NANCY CHINCHOR 
Department of Linguistics 
University of Massachusetts 
Amherst 01002 
ABSTRACT 
This paper is a justification for the use of frame analysis as a linguis- 
tic theory of American Sign Language. We give examples to illustrate how 
frame analysis captures many of the important features of ASL. 
0. Introduction 
From a linguistic standpoint, we are interested in language processing 
systems for the claims that they make about language in general. Our interest 
in those claims leads us to examine what implications they may have for 
the analysis of languages other than English. The data from American Sign 
Language (ASL) is important because it is indicative of the way people per- 
ceive and represent events. This linguistic data requires careful analysis 
and much psychological insight before it can be used as evidence for any par- 
ticular theory of representation of visual knowledge of events. We have 
tried to bring together some ideas from artificial intelligence, linguistics, 
and psycholinguistics in order to analyze the data from ASL. 
The major framework we have adopted from AI is that of frames. Minsky's 
introduction of frames as a way of representing knowledge and the further 
formulations of frames and related notions by Winograd and Fillmore form the 
bases for our frame analysis. We rely heavily on the work done by psycholin- 
guists on visual perception as a justification for using frame analysis. 
Further justification comes as a result of the work of linguists and psycho- 
linguists on ASL and the visual perception of the deaf. 
The two most direct sources for our analysis of ASL are Reid (1974) and 
Thompson (1975). Reid's paper presents a clear and useful distinction between 
the linguistic level of the sentence and the conceptual level of the image. 
The sentence is a generalization and the image is an instantiation of that 
generalization. However, "the units in a sentence are not just realized as 
'parts' of a whole represented in the image by the individual participants, 
rather these units act reciprocally to determine jointly the character of the 
related participants and to unite them into a system of dependencies." At 
the level of the sentence the verb is all-important because it governs the re- 
lations that exist between the nouns. However, it has no direct representation 
in the image; it is merely embodied in the structure of the image. Thompson's 
paper gives guidelines for using frames in linguistic analysis. His defini- 
tions of key concepts and his examples of frames for English have been a 
model for our analysis. 
1. American S ign Language 
ASL is the language of many deaf people in the US. There is a continuum 
encompassing the many versions of several sign systems. ASL is a manual lan- 
guage composed of signs, fingerspelling, and occasional initialization of 
signs. It is in no way a signed version of English but is rather an indepen- 
dent language, as different from English as is French or Japanese. 
ASL is a visual language. This visual modality allows it not only a tem- 
poral but also a multidimensional spatial framework, as well as freedom from 
many of the constraints normally put on a linear language. Many spatial rela- 
tions can be preserved in miniature in what has been referred to in the sign 
literature as a visual analog. For example, the sentence 'Fred stood in 
front of Harry' does not necessitate a linear description. It can be repre- 
sented by the indexicalized marker for FRED being positioned in the signing 
space in front of the one for HARRY. It is with respect to the specification 
of location and the use of deictic elements that sign most clearly distin- 
guishes itself from spoken languages. This and other related problems in sign 
will be examined later in this paper. Focusing on the aspects of visual ana- 
log and deixis does not imply that sign does not employ many of the linear 
and temporal devices used in spoken languages, but rather that these devices 
serve different functions. 
ASL is linearly ordered with respect to a standard method for presenting 
a scenario. The order of presentation is usually ground, then figures, then 
the action or relation involved. A room would be specified, then a door, 
then relevant furniture, then participants in an action. Generally, signs are 
presented in such a way as to allow further reference to them even if this 
referencing was not intended when the element was introduced into the dis- 
course. 
A relational grammar (Perlmutter and Postal) can be useful in describing 
ASL. Their grammar focuses on the relations of various participants in an 
action to the verb. The notion of subject can be related to what Friedman 
calls the Agent (AGENT-PATIENT) or what Reid calls the causer (CAUSER-AFFECTED 
ELEMENT-RANGE). The Agent or causer shows up in sign as the active participant, 
the patient as the usually stationary participant being acted upon. As in re- 
lational grammar, these relations are based upon observational properties of 
the terms with respect to the verb. The relational model is attractive be- 
cause it does not force one to specify the syntactic form of the sentence 
through a rigid ordering or tree structure. 
Even more flexible is a frame analysis model which allows one to speak 
in terms of a scene or visual image. Proximal relations can then be preserved 
without translation into any linear forms. The frames approach emphasizes an 
important aspect so often repeated in descriptions of ASL. What one is doing 
is building a picture -- a scene. The signer is always thinking in terms of 
the picture he is presenting. He is trying to produce a miniature character- 
ization of a real event. When elements of the event are present and within 
access for him to refer to in his discourse, he will use them. For example, 
he will point to an actual person rather than producing an arbitrary grammat- 
ical index to refer to that person. Describing sign language through frames 
allows one to stress the visual picture being presented. It allows also for 
the smooth integration of other communication conventions used within the 
speech act. For example, if mime is found to be more explicit than the use 
of conventionalized ASL forms, it can easily be incorporated into the dis- 
course, making the total presentation a more direct representation of the 
event. 
2. Visual Logic 
Boyes (1972) gives various arguments based on visual perception experi- 
ments for analyzing sign in terms of visual logic. By 'visual logic,' she 
means a system of rules similar to the rules people use to make sense of any 
visual experience. In the next section we show that frame analysis can be con- 
sidered an appropriate visual logic for sign language. First we would like 
to present the basic arguments from Boyes (1972) for using visual logic, since 
these arguments also support the use of frame analysis. 
There are three major results of visual perception experimentation which 
Boyes cites in order to begin a study of the constraints that the visual mode 
puts on a sign language. These results all show the limitations of visual 
memory as compared to auditory memory. These memory processes can each be 
divided into the same three stages. First, there is the initial storage of 
the stimulus, which is identical to the actual stimulus. This part of memory 
is referred to as iconic memory (visual mode) or echoic memory (auditory mode). 
The next stage is short term memory, where rehearsal can take place. Rehears- 
al is the process of repetition of the stored material during which the mate- 
rial is recoded, i.e., grouped into meaningful segments. This recoded mate- 
rial is then stored in long term memory. 
One result that Boyes cites is that iconic memory is shorter than echoic 
memory. Iconic storage usually lasts for between 250 msec and 1 sec whereas 
echoic storage can last as long as 10 sec. A second fact is that the reaction 
time to visual stimuli is longer than that to auditory stimuli. The third 
result is that visual short term memory is more limited than auditory short 
term memory in that it does not seem to be able to hold as many items in the 
presence of continued input. The current figures for this are 4 or 5 items 
maximum in visual STM as opposed to 7 ± 2 items in auditory STM. Boyes 
claims that this difference is due to the limited capacity for rehearsal of 
visual information. 
All three of these results show that there is generally less time avail- 
able for processing the sign sentence than there is for the spoken sentence. 
The temporal segmentation of sign would have to produce segments short enough 
to fit in iconic memory. And the sentence would have to be structured in such 
a way as to not tax STM with its limited rehearsal capacity. The sentence 
structure cannot rely on dependencies of elements which are temporally sepa- 
rated beyond the span of visual STM. Boyes seems to go a bit too far here 
and says that there should not be a "syntax which depends on decoding a tem- 
poral succession of images as a unit." But all this really means is that the 
sentences in ASL must be shorter than 5 items, or that they must be processed 
in a way that does not require linguistic links between items which are sepa- 
rated by more than 4 items. Of course, more must be known about the linguis- 
tic processing of sign language before these conclusions can be made more 
specific. 
In any case, it is clear that more information must be encoded per time 
interval in a visual language than in a spoken language, if we assume that 
the rate of transmission of information is to be the same in both. This can 
be accomplished by the mode of production in two ways. First, the symbol 
system used must be more direct, i.e., there should be a simpler mapping be- 
tween visual sign and meaning than there is between sound and meaning. Sec- 
ondly, sign must utilize its spatial dimensions to overcome the temporal lim- 
itations on the transmission of information. 
Frame analysis is able to represent these qualities of ASL. 
3. Frame Analysis 
Frames are a convention for representing knowledge. Frame analysis is 
a method for representing language as a system of frames. There are four 
different types of linked frames that we will be using. These are discussed 
in Thompson (1975). Thompson attempts to resolve the apparent conflict in 
terminology with reference to the notions of scenes and frames in the work on 
prototype semantics (Fillmore and Rosch, MSSB, 1975) and the work on natural 
language understanding systems (Winograd and Bobrow, MSSB, 1975). In order 
to do so, he focuses on two dichotomies. The first yields two types of 
frames, those representing knowledge of events and those representing linguis- 
tic knowledge. The second dichotomy further refines the categorization so 
that each type of frame can describe prototypic knowledge or knowledge of the 
instance at hand. These distinctions, then, give rise to four types of 
frames: Scene Prototype Frames (SPF), Scene Instance Frames (SIF), Linguistic 
Prototype Frames (LPF), and Linguistic Instance Frames (LIF). Before we dis- 
cuss the structure of each type of frame, we would like to indicate their pos- 
sible functions in processing ASL. A sees an event, and an SIF is formed 
with guidance from the appropriate SPF, which was activated when one of its 
principal defining characteristics was recognized. A wishes to communi- 
cate this scene to B. A constructs the sign sentences by following the links 
from the SPF to an LPF. The LPF will guide the filling in of an LIF based on 
the actual participants in the SIF, thus producing the appropriate sign sen- 
tences. B watches A's signing and essentially reverses this process. An LIF 
begins to be formed and activates an LPF, which guides the filling in of the 
LIF and causes the activation of an SPF. The SPF guides the filling in of 
the SIF with information from the LIF. Once the SIF contains all the requi- 
site information, B is said to have understood what A signed to him. 
What information do these frames contain and what are the various links, 
or "perspectives" as Thompson calls them, between these frames? Thompson 
suggests a certain internal structure for these frames. 
A frame contains at least three sorts of things: slots, states, 
and actions. 
Slots are for identifying the participants in a given frame. 
Each slot has a name and a value. In an Instance Frame, these values 
will usually be names of other Instance Frames which describe the 
things which are filling each slot, while in Prototype Frames, they 
will usually be names of other Prototype Frames which contain infor- 
mation about the sort of thing which can fill the associated slot. 
States are statements about various relationships which hold 
among the slots, and actions describe transitions between states. 
We will need a slightly different structure because of the kind of information 
that is usually presented in sign. The major addition that we make is a cat- 
egory of slots called Ground, which contains such things as the setting and 
the time element. We call the rest of the slots Figures. An example of an 
SPF would be {PREDATOR-PREY}: 

Slots 
  Ground 
    TIME   {time} 
    PLACE  {place} 
  Figures 
    PRED   {animal} 
    PREY   {animal} 

States 
  I.   PRED doesn't have PREY 
  II.  PREY has protection 
  III. PRED gets PREY 
  IV.  PREY gets caught 

Actions 
  A. I. becomes false and III. becomes true 
  B. II. becomes false and IV. becomes true 
  C. I. becomes true and IV. becomes false 
  D. II. becomes true and III. becomes false 
  A or C; A implies B, C implies D 
An instance of this frame would have the ground and figure slots filled 
in with links to other instance frames, as in the following SIF: 

{PREDATOR-PREY} (instance) 

Slots 
  Ground 
    TIME   {narrative time 415} 
    PLACE  {house 584} 
  Figures 

States and Actions (as in SPF) 
The corresponding LPF would look much the same, except for the crucial addition 
of the verb. An LPF contains Ground and Figure slots along with a verb slot. 
The States and Actions are no longer present. Presumably the verb and the 
cases encode all this information. A perspective is given in order to match 
the Figure slots in the SPF with the case slots in the LPF. 

{PREDATOR-PREY} 

Slots 
  Ground 
    TIME   {position on time line} 
    PLACE  {position in sign space} 
  Figures 
    AGENT    {animal} 
    PATIENT  {animal} 
  VERB  {WANT, GET, EAT} 

Perspectives 
  {PREDATOR-PREY, SPF} 
    PRED = AGENT 
    PREY = PATIENT 
This account of the LPF is much in the spirit of Thompson's LPF. But our ac- 
count of the LIF is different. We are dealing with sign and not a spoken 
language. The case relations are clearly manifested on the surface in sign 
because the hands act out the scene. So our LIF looks as follows: 

{PREDATOR-PREY} 

Slots 
  Ground 
    TIME   {position on time line 617} 
    PLACE  {position in sign space 729} 
  Figures 
    AGENT    {wolf} 
    PATIENT  {pig 911} 
  VERB  {WANT, GET, EAT} 
There is no need to have Thompson's perspective to tell us what case roles 
the subject, object, etc. of the verb play in the prototype. Processing will 
be faster, since the linguistic prototype and instance frames are more alike 
in ASL. 
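The four frame types and the SPF-to-LPF perspective can be sketched as simple structures. The class names, field layout, and example fillers below are our own illustration, not Thompson's notation or the authors' implementation (none exists in the paper).

```python
# Hedged sketch of the four frame types; names and layout are
# illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str
    ground: dict    # setting slots, e.g. TIME, PLACE
    figures: dict   # participant slots

@dataclass
class SceneFrame(Frame):          # SPF / SIF
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)

@dataclass
class LinguisticFrame(Frame):     # LPF / LIF
    verb: list = field(default_factory=list)
    # perspective maps scene figure slots onto case slots
    perspective: dict = field(default_factory=dict)

lpf = LinguisticFrame("PREDATOR-PREY",
                      ground={"TIME": "position on time line",
                              "PLACE": "position in sign space"},
                      figures={"AGENT": "animal", "PATIENT": "animal"},
                      verb=["WANT", "GET", "EAT"],
                      perspective={"PRED": "AGENT", "PREY": "PATIENT"})

def instantiate(sif_figures, perspective):
    """Fill LIF figure slots from SIF figure slots by renaming through
    the perspective -- the only link step needed, which is the
    processing saving claimed for ASL."""
    return {perspective[slot]: filler for slot, filler in sif_figures.items()}
```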
In sign the four frames are more alike in structure, and there is much 
less need for links between frames. This cuts down processing time greatly 
and compensates for the limitations on visual memory. Linguistic frames 
differ from scene frames in the presence of the verb. As Reid says, the 
grammar of the image is different from the grammar of the language in that 
the image is made up of participants and properties attributed to them where- 
as the sentence is a package held together by the verb. Frame analysis for- 
malizes this notion and reflects the speed of processing ASL. We propose 
that it be seriously explored as a linguistic theory for sign language. 
4. A Frame Analysis of Sign Language 
The remainder of this paper will include a description of some devices 
in sign as well as a discussion of how they might be handled by a theory of 
Frame Analysis. These devices are not only interesting features to analyze, 
but also reveal the structure of the frames (focus, boundaries, weak points). 
Indexing is a process in ASL which parallels pronominalization and 
deixis (this, that, here, there) in spoken language. There are two types of 
indexing: real world references and conventional references. 
Real world references are of the type discussed earlier. When the per- 
son referred to is in the vicinity, one points directly to that person rather 
than to an arbitrary index. The same goes for location. Also, a person re- 
cently having left a group of signers will be referred to by pointing to the 
position he previously occupied. 
In frame analysis, the grammatical to real world reference link could be 
achieved by resorting to a higher frame encompassing the speech act. This 
speech act frame monitors the entire event and specifies what is common knowl- 
edge shared among the participants in the speech act. That shared knowledge 
determines the set of objects, persons and locations which can be referred 
to directly (by means of pointing). For example, if A knows that B has, in 
his knowledge of the room they are in, the vision of a bookshelf in one corner, 
then A can point directly to it without having to name it. The same goes 
for the shared knowledge of locations. If two people share the knowledge 
that city X is the obvious referent of a point back over the left shoulder, 
then it will be used. Where this knowledge isn't shared, this referencing 
would be forbidden. 
There are several types of conventional indices for things, locations 
and people as well as positions for such indexing. The stationary person 
index, commonly referred to as grammatical indexing, involves referring to 
certain individuals by pointing to conventional places within the signing 
space: right, left, distal right, distal left, and straight ahead, in that or- 
der (for a right-handed signer). Indexing into these positions allows ready 
reference at any following time within the discourse. 
Grammatical indexing uses a frame for reference similar to the speech 
act frame. In this frame, however, index points are specified as to which 
arbitrary referents are tied to them. In cases where participants are close- 
ly linked to spatial locations, they use these locations as their index 
points. 
Indices must be established (i.e., JOHN (indexed left position); ALICE 
(indexed right position)). Since the tie between these indices and their ref- 
erents is weak and arbitrary, they must frequently be reestablished. In the 
videotape, reindexing played a role in aiding us in our determination of 
frame boundaries. Reindexing interacts with the sign we have termed NEUTRAL 
POSITION (arms drop to sides). NEUTRAL POSITION is used to mark the end of 
a long discourse. Directly following NEUTRAL POSITION, at the beginning of 
a new frame, the signer would reindex 3 (the sign THREE) and focus upon one 
of the three pigs. Reindexing also marks mistakes and overcomplicated ref- 
erencing. 
Besides NEUTRAL POSITION, there is another PAUSE SIGN which aids in the 
delineation of discourse and, therefore, in the discovery of frames. The 
PAUSE SIGN occurs at breaks between actions within frames or at shifts between 
agentive characters in frames. 
Other key sign structures which aid in frame determination are body po- 
sition shifting and the use of index markers. As a result of the limited 
length of this paper we cannot fully examine these devices here. However, an 
extended version of this paper and copies of the transcription of the video- 
tape of "The Three Little Pigs" are available from the authors. 
Acknowledgements. 
We would like to thank Tommy Radford for his help both in the signing of 
the story of The Three Little Pigs and in providing helpful comments for its 
analysis. We would also like to thank the Sign Group and the Frames Group 
from the MSSB summer meetings, Berkeley, 1975. A special note of thanks to 
George Lakoff, whose insights into our common interests made this paper possi- 
ble. The research for this paper was supported by the 1975 MSSB Workshop on 
Alternative Theories of Syntax and Semantics. 

Bibliography
Boyes, Penny. 1972. "Visual Processing and the Structure of Sign Language." Unpublished ms. 

Friedman, Lynn. 1975. "On the Semantics of Space, Time, and Person Reference in the American Sign Language." Unpublished Master's thesis, University of California at Berkeley. 

Reid, L. Starling. 1974. "Toward a Grammar of the Image." Psychological Bulletin, vol. 81, no. 6 (June), pp. 319-334. 

Thompson, Henry. 1975. "Frames for Linguists." Unpublished ms. 


Abelsbn, Robert. lvConcepts for Representing Mundane Reality in Plansv1 . In Representation and Understandinq: Studies Cognitive Science (Ed: D. Bobrow and A. Collins), Academic Press, New York, 1975. 

Austin, J. L. How - to - Do Thinas with Words. Clarendon Press, Oxford, 1962. 

Bruce, Bertram. "Belief Systems and Language Under~tanding~~. BBN Report No. 2973, 1975a. 

Bruce, Bertram. "Generation as a Social Actionw, In Theoretical Issues in Natural Lan~uaae process in^ (Ed: B, L. Nash-Webber and R. C. Schank), ACL, 1975b. 

Bruce, Bertram. 'TPragmatics in Speech Understandingf1. Proc. 4th IJCAI, Tbilisi, 1975~. 

Bruce, Bertram and C. F. Schmidt. tlEpisode Understanding and Belief Guided Parsingw. Presented at 12th ACL Meeting, Amherst, 1974. (Also Rutgers Computer Science Dept. Report CBM-TR-32). 

Deutsch, Barbara G. "The Structure of Task Oriented Dialoguesu. Contributed Papers, IEEE Symposium on Speech Recognition, CMU, Pittsburgh, 1974. 

Deutsch, Barbara G. Discourse Analysis and Pragmat,icsn . In S~eech Understanding Reaearch (D. Walker, W. Paxton, J. Robinson, G. Hendrix, Be Deutsch, and A. Robinson), Annual Technical Report, SRI, 1975. 

Goffman, Erving. Relations in Public. Basic Books, New York, 1971 

Grimes, Joseph. The Thread of Discourse. Mouton, Paris, in press. 

Labov, William. llRules for Ritual  insult^^^. In Studies Social Interaction (Ed: David Sudnow), The Free Press (Macmillan), 1972. 

Minsky, Marvin. "A Framework for the Representation of Knowledgef1. In The Psvcholoay - of Computer Vision (Ed: P. Winston), 1975. 

Phillips, Brian. To~ic Analvsis. Ph. D. Thesis, SUNY Buffalo, 1975. 

Rumelhart, David. flNotes, on a Schema for Storiesn, In Representation and Understandinq: Studies Connitive Science (Ed: D. Bobrow and A. Collins), Academic Press, New York, 1975 

Sacks, Harvey, Emanuel Schegloff and Gail Jefferson. "A Simplest Systematics for the Organization of Turn-Taking for Conversationsv. Semiotiea, 1974. 

Schank, Roger and Robert Abelson. "Scripts, Plans and Knowledge". Proc. 4th IJCAI, Tbilisi, 1975.

Schegloff, Emanuel A. "Notes on a Conversational Practice: Formulating Place". In Studies in Social Interaction (Ed: David Sudnow), The Free Press (Macmillan), 1972.

Schmidt, Charles F. "Understanding Human Action: Recognizing the Motives and Plans of Other Persons". Carnegie Symposium on Cognition: Cognition and Social Behavior, CMU, Pittsburgh, 1975.

Searle, J. R. Speech Acts. Cambridge University Press, London, 1969. 

Stansfield, James L. Programming a Dialogue Teaching Situation. Ph.D. Thesis, U. of Edinburgh, 1974.

Winograd, Terry. "Frame Representations and the Declarative-Procedural Controversy". In Representation and Understanding: Studies in Cognitive Science (Ed: D. Bobrow and A. Collins), Academic Press, New York, 1975.

Woods, William, M. Bates, B. Bruce, J. Colarusso, C. Cook, L. Gould, D. Grabel, J. Makhoul, B. Nash-Webber, R. Schwartz, J. Wolf. "Natural Communication with Computers, Final Report - Vol. I, Speech Understanding Research at BBN". BBN Report No. 2976, 1974.

Woods, William A., R. Schwartz, C. Cook, J. Klovstad, M. Bates, B. Nash-Webber, B. Bruce, J. Makhoul. "Speech Understanding Systems: QTPR 3". BBN Report No. 3115, 1975.

Schank and Abelson 1975. R. C. Schank and R. P. Abelson, "Scripts, Plans and Knowledge", Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, USSR, 1975.

Schank 1973. R. C. Schank, "Causality and Reasoning", Technical Report No. 1, Istituto per gli Studi Semantici e Cognitivi, Castagnola, Switzerland, 1973.

Schank 1974. R. C. Schank, "Understanding Paragraphs", Technical Report No. 6, Istituto per gli Studi Semantici e Cognitivi, Castagnola, Switzerland, 1974.

Schank et al. 1975. R. C. Schank and the Yale AI Project, "SAM--A Story Understander", Research Report No. 43, Yale University Department of Computer Science, 1975.

Lehnert 1975. W. P. Lehnert, "What Makes SAM Run? Script-Based Techniques for Question Answering", Proceedings of the Conference on Theoretical Issues in Natural Language Processing, edited by R. Schank and B. Nash-Webber, 1975.

Charniak 1975. E. Charniak, "Organization and Inference in a Frame-Like System of Common Sense Knowledge", Proceedings of the Conference on Theoretical Issues in Natural Language Processing, edited by R. Schank and B. Nash-Webber, 1975.

Minsky 1974. M. Minsky, "Frame-Systems", MIT AI Memorandum, 1974.

Rieger 1975. C. Rieger, "Conceptual Memory", in Conceptual Information Processing, R. Schank (ed.), North-Holland, 1975.

Fillmore, C. J. 1969. Toward a Modern Theory of Case. In Reibel and Schane. 

Furugori, T. 1974. "A memory model and simulation of memory processes for driving a car." Technical Report No. 77, Department of Computer Science, SUNY Buffalo. 

Hays, D. G. 1973. Types of Processes on Cognitive Networks. In Proceedings of the 1973 International Conference on Computational Linguistics. Pisa.

Lakoff, G. 1972. Structural Complexity in Fairy Tales. The Study of Man 1, 128-150.

Langacker, R. W. 1969. On Pronominalization and the Chain of Command. In Reibel and Schane.

Minsky, M. 1975. A Framework for Representing Knowledge. In P. H. Winston (ed.), The Psychology of Computer Vision. McGraw-Hill, NY.

Phillips, B. 1975. Topic Analysis. Unpublished Ph.D. Thesis. SUNY Buffalo.

Reibel, D. A. and S. A. Schane (eds.). 1969. Modern Studies in English: Readings in Transformational Grammar. Prentice-Hall, Englewood Cliffs.

Schank, R. C. and R. P. Abelson. 1975. Scripts, Plans, and Knowledge. In Advance Papers of the Fourth International Joint Conference on Artificial Intelligence. IJCAI.

White, M. 1975. Abstract Definition in the Cognitive Network: The Metaphysical Terminology of a Contemporary Millenarian Community. Unpublished Ph.D. Thesis. SUNY Buffalo.

Wilks, Y. 1975. A Preferential, Pattern-Seeking, Semantics for Natural Language Inference. Artificial Intelligence 6, 53-74. 

Badler, N. (1975). "Temporal scene analysis: Conceptual descriptions of object movements." University of Toronto, Department of Computer Science, Technical Report No. 80, February 1975.

Becker, J. (1973). "A model for the encoding of experiential information." In Computer Models of Thought and Language, Schank, R. and Colby, K. (eds.), W. H. Freeman & Co., San Francisco, 1973, pp. 396-434.

Charniak, E. (1972). "Toward a model of children's story comprehension." MIT Artificial Intelligence Report TR-266, December 1972.

Fillmore, C. (1968). "The case for case." In Universals in Linguistic Theory, Bach, E. and Harms, R. (eds.), Holt, Rinehart and Winston, Inc., Chicago, 1968.

Hendrix, G. (1973a). "Modeling simultaneous actions and continuous processes." Artificial Intelligence 4, Winter 1973, pp. 145-180.

Hendrix, G., Thompson, C. and Slocum, J. (1973b). "Language processing via canonical verbs and semantic models." Third International Joint Conference on Artificial Intelligence, August 1973, pp. 262-269.

Martin, W. (1973). "The things that really matter - A theory of prepositions, semantic cases, and semantic type checking." Automatic Programming Group, Internal Memo 13, MIT Project MAC, 1973.

Miller, G. (1972). "English verbs of motion: A case study in semantics and lexical memory." In Coding Processes in Human Memory, Melton, A. and Martin, E. (eds.), V. H. Winston & Sons, Washington, D.C., 1973, pp. 335-372.

Rumelhart, D., Lindsay, P. and Norman, D. (1972). "A process model for long term memory." In Organization of Memory, Tulving, E. and Donaldson, W. (eds.), Academic Press, New York, 1972, pp. 197-246.

Schank, R. (1973). "The fourteen primitive actions and their inferences." Stanford A.I. Laboratory Memo AIM-183, 1973.

Simmons, R. (1973). "Semantic networks: Their computation and use in understanding English sentences." In Computer Models of Thought and Language, Schank, R. and Colby, K. (eds.), W. H. Freeman & Co., San Francisco, 1973, pp. 63-113.

Winograd, T. (1972). Understanding Natural Language. Academic Press, New York, 1972.
