American Journal of Computational Linguistics 
Microfiche 55
THE FINITE STRING
NEWSLETTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
ACL  New officers for 1977 . . . . . . . . . . . . . . . 2
     Call for papers . . . . . . . . . . . . . . . . . . 3
     Minutes, 1976 business meeting  . . . . . . . . . . 4
     Secretary-Treasurer's report  . . . . . . . . . . . 7
     Financial report  . . . . . . . . . . . . . . . . . 9
Humanities - 3rd International Conference  . . . . . . . 10
Linguistic and Literary Analysis - 5th International . . 11
Graphics and Interactive Techniques - 4th Annual . . . . 12
Undergraduate Curricula and Computing Conference . . . . 13
Representation and Understanding, edited by Daniel G.
     Bobrow and Allan Collins. Reviewed by John Mylopoulos 14
The Role of Speech in Language, edited by James F.
     Kavanagh and James E. Cutting. Reviewed by Sieb
     Nooteboom . . . . . . . . . . . . . . . . . . . . . 26
Algebraic Parsing of Context-Free Languages
     Stephen F. Weiss and Donald F. Stanat . . . . . . . 38
A Comparison of Term Value Measurements for Automatic
     Indexing - Gerard Salton  . . . . . . . . . . . . . 61
SNOPAR: A Grammar Testing System - T. P. Kehler  . . . . 84
AMERICAN JOURNAL OF COMPUTATIONAL LINGUISTICS is published
by the Association for Computational Linguistics. EDITOR:
David G. Hays, 5848 Lake Shore Road, Hamburg, New York
14075. EDITORIAL ASSISTANT: William Benson. SECRETARY-
TREASURER: Donald E. Walker, Stanford Research Institute,
Menlo Park, California 94025.
American Journal of Computational Linguistics Microfiche 55 : 2
NEW OFFICERS FOR 1977 
President PAUL CHAPIN 
National Science Foundation 
Vice President JONATHAN ALLEN 
Massachusetts Institute of Technology 
Secretary-Treasurer DONALD E. WALKER 
Stanford Research Institute 
Executive Council JERRY HOBBS 
City University of New York 
Continuing members of the Executive Committee are Bonnie
Nash-Webber (through 1977) and Timothy C. Diller (through
1978). Continuing members of the Nominating Committee are
William A. Woods, Jr. (1977) and Aravind K. Joshi (1978).
The Editor is a member of the Executive Committee ex officio.
American Journal of Computational Linguistics Microfiche 55 : 3 
CALL FOR PAPERS 
1-page abstract, with title but no name
Letter with author's name and paper title 
DEADLINE January 1, 1977 
Members of ACL should have received prior 
notice of this deadline by letter. 
ADDRESS Jonathan Allen 
Room 36-575 
Massachusetts Institute of Technology 
Cambridge 02139 
The Georgetown University Round Table
(on Linguistics and Anthropology) will
be held immediately following the ACL
meeting.
Microfiche 55 : 4 
ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
Petrick announced that Hood Roberts, Secretary-Treasurer
of the ACL for the past five years, has resigned from that
post, coincident with his departure from the Center for
Applied Linguistics to establish his own company. Don Walker
has been appointed to the position on an interim basis for
the remainder of the year. Petrick expressed the gratitude
and appreciation of the Association for the dedication and
service Roberts has provided during his tenure, a sentiment
strongly supported by the members present.
Walker read the Secretary-Treasurer's Report and the
Financial Report, both of which had been prepared by
Roberts. Copies are attached to these Minutes. Membership
renewals, billing practices, and the financial status of the
Association were discussed. Petrick announced that John
Moyne, as Chairman of the Membership Committee, was
preparing a campaign to recruit new members.
Petrick reviewed the status of the AJCL in the absence
of Dave Hays, its Editor. In discussing Hays' recent survey
of the membership about the Journal, Petrick remarked on its
quality and thoroughness, both in preparation and in the
analysis of the results. Over 200 members responded, an
unusually high percentage; they strongly supported
continuing publication in microfiche form. There was also
considerable interest expressed in having The Finite String
available in hard copy and in making it possible to acquire
full size copies of certain articles. Petrick announced an
Executive Committee decision, contingent on adequate
financial support, that at least part of the contents of The
Finite String would be issued in hard copy form,
particularly those items of key importance and timely
interest to the membership. The cost of making hard copies
of articles available would be determined, and members
would be notified accordingly.
Petrick announced that the Executive Committee had
voted to increase the Editorial Board of the AJCL from 14 to
15 members and to establish three year terms of office with
five new members to be appointed each year on a regular
basis.
Petrick announced that the increases in expenses
associated with the AJCL and with the preparation and
distribution of a hard copy newsletter required raising the
dues $5 to a new total of $15 for individual memberships. A
suggestion was made from the floor that a class of family
membership be established that would allow reduced dues for
one of two spouses so that both could be members but only
one copy of publications would be received.
Petrick announced that the Executive Committee had
decided to publish all the information that could be
gathered about recent events associated with Igor Mel'cuk.
Mel'cuk was fired from a long term position as Senior
Research Fellow of the Institute of Linguistics in the
Academy of Sciences of the USSR, ostensibly on the basis of
a letter he had submitted to the New York Times. The
letter, which was published in January, expressed
disagreement with the criticisms of Andrei Sakharov made by
the Soviet press. In March, Mel'cuk was fired; subsequently
he prepared a letter describing the circumstances and asked
that it be brought to the attention of American scientists.
Questions had been raised about the appropriateness of
publishing such correspondence on the grounds that it might
hurt either Mel'cuk or the ACL or both. An extensive
discussion from the floor indicated that a variety of
positions were taken on the issue. Petrick assured the
members that the Soviet position would be represented to the
extent that information about it was available.
The 15th Annual Meeting of the ACL is being planned for
Washington, D.C., in conjunction with the Georgetown
University Round Table on Languages and Linguistics.
Tentative dates are 15-16 March 1977.
MINUTES: 14th Annual ACL Business Meeting
Martin Kay reported briefly on COLING 76, the 6th
International Conference on Computational Linguistics, which
was held at the University of Ottawa, Ottawa, Canada, from
2[?] June to 3 July 1976. The next conference is scheduled
for Varna, Bulgaria, in 1978.
Walker announced that the next International Joint
Conference on Artificial Intelligence will be held 22-26
August 1977 at the Massachusetts Institute of Technology in
Cambridge, Massachusetts.
Jane Robinson reported on Local Arrangements, with
particular emphasis on the banquet scheduled for the
evening, shortly after the conclusion of the business
meeting.
Paul Chapin reported for the Program Committee. Of the
21 abstracts submitted, 14 were accepted; he expressed his
appreciation to the Committee members for their assistance.
His experience with publicity about the Call for Papers
suggested that a check list be established to provide more
effective notification.
Bob Barnes reported for the Nominating Committee that
the following slate of officers had been proposed:
President: Paul Chapin, NSF
Vice President: Jonathan Allen, MIT
Secretary-Treasurer: Don Walker, SRI
Executive Committee: Jerry Hobbs, CUNY
Nominating Committee: Stan Petrick, IBM
A motion that the slate be accepted unanimously was carried.
Bonnie Nash-Webber expressed the appropriate sentiments
in the form of a Resolutions Committee Report.
The microfiche question was raised again, and Petrick
reviewed the results of the questionnaire, the decision to
provide newsletter information in hard copy form, and the
provision of hard copies of selected articles at cost.
The meeting adjourned.
Donald E. Walker
Secretary-Treasurer, Pro-tem
Microfiche 55 : 7
Secretary-Treasurer's Report
I am always sorry on those rare occasions when I cannot
attend the annual meeting of ACL; my Angst is even greater
now that this meeting is my last one as your
secretary-treasurer. My annual report to you typically
consists of statements about membership and finances. This
will be a typical report.
When the new journal was first issued in 1974 there
was a dramatic increase in the number of ACL members, from
just under 100 in 1973 to over 800 by early 1975. Since
then, these impressive gains have been so seriously eroded
that our current membership stands at 580 (445 individuals
and 135 institutions). A total of 212 individuals and 46
institutions who had paid for 1975 did not renew for 1976,
although each week several renewals continue to dribble in.
Several reasons might be thought of for the decline:
1. The heavy promotional activities at the beginning
brought in some members who really weren't as interested in
computational linguistics as they may have thought they
were.
2. Some members do not like microfiches.
3. The recently established method of billing members for
their annual dues (including the dues notice on one of the
opaque cards in the journal), which was conceived as an
economy measure, clearly failed to produce results, and I
would urge--pragmatically--that this method never be tried
again.
It is hoped that the newly reactivated Membership
Committee, under the chairmanship of John Moyne, will be
able to devise creative answers to this chronic problem.
In an organization such as ours, where the association
is almost entirely dependent on the payment of annual dues,
even a slight drop in membership causes serious problems.
This year's financial situation was further exacerbated by
three additional things:
ACL Secretary-Treasurer's Report, 6 October 1976 Page 8
1. The continued, unrealistically low dues rate of $10, for
which members are receiving nearly 2,000 microfiche pages
yearly. [This problem was not unexpected and was discussed
at the last Annual Meeting in Boston, and there were good
reasons for leaving the dues at $10 until such time as the
ACL decided what to do about the journal.]
3. Unexpected charges -- primarily the $1,130.00 for
refreshments (coffee and pastries) which were generously
provided by the Sheraton Hotel at the last annual meeting.
[I now believe that the Sheraton chain, indeed, is owned by
ITT.]
The customary categorized financial statement is given
below. Although the statement reflects ACL's income and
expenses, some adjustments within these figures will be made
later, pending a detailed allocation of the costs incurred
in and income derived from the TINLAP volumes.
Respectfully submitted,
A. Hood Roberts
ACL Secretary-Treasurer's Report, 6 October 1976 Page 9
ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
Financial Report for 1976

Receipts:
  Membership dues, 1975-1976                       $14,353.37
  ACL 76 meeting receipts to date                      357.15
                                                   ----------
                                                 $1[?],818.61

Disbursements:
  Administrative costs, office supplies,
    mailing, and AJCL costs not
    covered by CAL account 317                     $ 4,777.64
  Membership ACAL                                       50.00
  AFIPS dues 1976                                      500.00
  Annual meeting costs 1975-1976                     1,130.58
  Paid out of ACL membership receipts
    into CAL Account 317 for AJCL,
    as required by NSF                               9,039.60
                                                   ----------
                                                   $15,497.82

Balance as of October 10, 1976
American Journal of Computational Linguistics Microfiche 55 : 10
THIRD INTERNATIONAL CONFERENCE ON
COMPUTING IN THE HUMANITIES
SPONSORED BY THE UNIVERSITIES OF MONTREAL AND WATERLOO
THEMES Frontiers between language and literature; Fine arts;
Graphics; Historical studies; Information retrieval;
Input techniques; Lexicography; Literary stylistics;
Medieval studies; Music; Photocomposition; Public
service systems; Semantics
INTERNATIONAL COMMITTEE F. V. Spechtler, Austria; J. R. Allen,
Canada; A. Jones, England; I. T. Piirainen, Finland;
L. Fossier, France; W. Lenders, Germany; M. L. Alinei,
Holland; S. C. Loh, Hong Kong; F. Papp, Hungary;
B. Jónsson, Iceland; S. K. Havanur, India; U. Ornan,
Israel; L. F. Lara, Mexico; K. Hyldgaard-Jensen, Sweden;
J. Joyce, USA; J. Raben, USA.
REGISTRATION Professor J. S. North 
Chairman, ICCH3 
Department of English 
University of Waterloo 
Waterloo, Ontario, Canada 
N2L 3G1 
American Journal of Computational Linguistics Microfiche 55 : 11
FIFTH INTERNATIONAL SYMPOSIUM ON THE USE OF COMPUTERS IN
LINGUISTIC AND LITERARY ANALYSIS
UNIVERSITY OF ASTON IN BIRMINGHAM 
3 - 7 APRIL 1978 
Authorship studies 
Concordances 
Classical studies 
Input-output 
Oriental studies
Software 
Stylistic analysis 
Syntactic analysis 
Text editing 
Language-oriented groups 
Education 
Lexicography 
Literary statistics 
ADDRESS FOR CORRESPONDENCE 
Professor D. E. Ager 
CLLR 
Department of Modern Languages 
University of Aston in Birmingham 
Gosta Green 
Birmingham B4 7ET 
England 
American Journal of Computational Linguistics Microfiche 55 : 12
FOURTH ANNUAL CONFERENCE
COMPUTER GRAPHICS AND
INTERACTIVE TECHNIQUES
CALL FOR PAPERS
TOPICS
Graphical theory and techniques such as
languages, hardware, software, tools, portability,
standards, device independence,
line graphics, raster graphics, data structures,
satellite systems, human factors,
applications in the areas of environment,
urban planning, transportation, cartography,
biomedicine, animation, computer aided design, art,
music, business, statistics, recreational
graphics, decision making, and computer
graphics education.
Papers may report original work, unusual
or unique applications or techniques of
computer graphics, or they may evaluate
graphical specifics.
DEADLINE
A short abstract is requested by December 1,
1976, and the final paper must be submitted
by May 2, 1977.
PROGRAM CHAIRMAN James E. George 415-447-1100 Ext. 3360
Los Alamos Scientific Laboratory
P. O. Box 1663, MS 272
Los Alamos, New Mexico 87545
American Journal of Computational Linguistics Microfiche 55 : 13 
CALL FOR PAPERS: 1977 CONFERENCE ON COMPUTERS IN THE
UNDERGRADUATE CURRICULA
MICHIGAN STATE UNIVERSITY, EAST LANSING
SUBSTANCE
Reports of actual experience with computer use in a specific
course or sequence of courses, in any field except computer
science. No proposals; no repetition of previous reports
without substantial new results. Survey papers only with
synthesis or thorough evaluation.
FORMAT
Original manuscript suitable for reproduction in the proceedings.
Typed, double spaced, up to 15 pages. 8"x10" pictorial matter,
glossy B&W photographs or photographable drawings.
Title page: authors' names, complete mailing address, telephone
numbers; if multiple, indicate which handles correspondence and
will deliver the talk. Each page should have the principal
author's name on it.
DEADLINE
ADDRESS
Gerald L. Engel, Virginia Institute of Marine Science,
Gloucester Point, Virginia 23062.
TRAVEL GRANTS
A limited number of partial travel and subsistence grants may
be available to speakers and others from minority institutions
and small colleges. Information and applications from CCUC/8
Travel Grant Committee, Eppley Center, MSU, East Lansing 48824
American Journal of Computational Linguistics 
Microfiche 55 : 14 
REPRESENTATION AND UNDERSTANDING
STUDIES IN COGNITIVE SCIENCE
EDITED BY DANIEL G. BOBROW AND ALLAN COLLINS
Xerox Palo Alto Research Center and Bolt Beranek and Newman
Academic Press, Inc.
New York
LC 75-21630 $15.00 ISBN 0-12-108550-3
REVIEWED BY JOHN MYLOPOULOS
Department of Computer Science
University of Toronto
Toronto, Ontario, Canada M5S 1A7
A major goal of Artificial Intelligence research today is
to design systems that "understand" a body of knowledge, i.e.
use it whenever appropriate. The representation of the knowledge
available to such an "understander" system is an important issue
for the system's design and is intimately related to the proposed
uses of that knowledge. This book includes a collection of
thirteen papers written by some of the best known researchers
who are currently working on understander systems. The papers
were selected among those presented at a conference held in
memory of Jaime Carbonell.
The contents of the book are as follows:
I. Theory of Representation
1. Dimensions of Representation
Daniel G. Bobrow
2. What's in a Link: Foundations for Semantic Networks
William A. Woods
3. Reflections on the Formal Description of Behavior
Joseph D. Becker
4. Systematic Understanding:
Synthesis, Analysis, and Contingent Knowledge
in Specialized Understanding Systems
Robert J. Bobrow & John Seely Brown
II. New Memory Models
5. Some Principles of Memory Schemata
Daniel G. Bobrow & Donald A. Norman
6. A Frame for Frames:
Representing Knowledge for Recognition
Benjamin J. Kuipers
7. Frame Representations and
the Declarative-Procedural Controversy
Terry Winograd
III. Higher Level Structures
8. Notes on a Schema for Stories
David E. Rumelhart
9. The Structure of Episodes in Memory
Roger C. Schank
10. Concepts for Representing Mundane Reality in Plans
Robert P. Abelson
IV. Semantic Knowledge in Understander Systems
11. Multiple Representations of Knowledge
for Tutorial Reasoning
John Seely Brown & Richard R. Burton
12. The Role of Semantics
in Automatic Speech Understanding
Bonnie Nash-Webber
13. Reasoning From Incomplete Knowledge
Allan Collins, Eleanor H. Warnock,
Nelleke Aiello, Mark L. Miller
As stated in the book's introduction, the section on "Theory
of Representation" deals with general issues regarding the
representation of knowledge, while that on "New Memory Models"
discusses the implications of the assumption that input
information is always interpreted in terms of large structural units
derived from experience. The section titled "Higher Level
Structures" focuses on the representation of plans, episodes
and stories within memory. Finally, the section on "Semantic
Knowledge in Understander Systems" describes on-going work of
the SOPHIE, SPEECHLIS and SCHOLAR projects at BBN.
In attempting to review the papers that appear in this book
collectively rather than individually, we arrived at a slightly
Representation and Understanding 20 
for declarative, vs. flexible interaction among different facts,
for procedural.
Schank's paper [9] includes a discussion on whether the
organization of human memory is episodic or semantic.
An episodic
memory organization implies that knowledge is stored as temporally
dated episodes and events, with temporal-spatial relations
linking these events. A semantic memory organization, on the
other hand, involves time-invariant knowledge a person possesses,
e.g., "all elephants are animals". A corollary of these
definitions is that an episodic memory organization favours
temporal and causal connectives (e.g., THEN, REASON, ENABLE etc.),
whereas a semantic memory organization uses extensively the "ISA
hierarchy" (e.g., "an elephant is-a animal"). The discussion
presented in the paper on this issue is somewhat confusing since
at one point (pp. 255-256) the two types of organization are
contrasted as if they were mutually exclusive, while later on
(p. 263) the paper argues for a combination of the notions of
semantic and episodic memory. In either case, Schank's work
certainly makes a convincing argument in favor of an episodic
memory organization by showing how it can be used to represent
the meaning of a paragraph.
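The contrast between the two organizations can be made concrete with a small sketch. It is an illustration only, not Schank's notation: the concept names, the sample events and the function names is_a and consequences are all assumptions introduced here. Semantic memory is modelled as time-invariant ISA links, episodic memory as events joined by the connectives named above.

```python
# Semantic memory: time-invariant ISA links (child -> parent).
# The concept names are illustrative assumptions.
ISA = {"elephant": "mammal", "mammal": "animal"}


def is_a(concept, category):
    """Follow ISA links upward until the category is found or the chain ends."""
    while concept is not None:
        if concept == category:
            return True
        concept = ISA.get(concept)
    return False


# Episodic memory: events joined by temporal and causal connectives.
# Each link is (earlier_event, connective, later_event); events are invented.
EPISODE = [
    ("John was hungry", "REASON", "John went to a restaurant"),
    ("John went to a restaurant", "THEN", "John ordered a meal"),
    ("John ordered a meal", "ENABLE", "John ate the meal"),
]


def consequences(event):
    """Return every event reachable from `event` through the episode's links."""
    reached, frontier = [], [event]
    while frontier:
        current = frontier.pop(0)
        for source, _connective, target in EPISODE:
            if source == current and target not in reached:
                reached.append(target)
                frontier.append(target)
    return reached
```

A query such as is_a("elephant", "animal") is answered from the hierarchy alone, whereas consequences("John was hungry") has to walk the links of one particular episode; nothing prevents a single memory from holding both kinds of structure.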
II. Critique and Extensions of Representation of Knowledge Paradigms
Several papers, including some that were mentioned in the
previous section, criticize, refine, or extend one of the existing
paradigms for the representation of knowledge.
The most notable example among those in this category is
Woods' paper [2], which criticizes many (mis)uses of semantic
networks by pointing out situations where their semantics are
poorly defined or inconsistent. Particular attention is
paid to the representation of quantification and that of relative
clauses.
As many of the readers undoubtedly know, Minsky's influential
paper introducing "frames" [15] provides more of an ideology than
a theory for representing knowledge. Kuipers in [6] argues in
favor of a number of properties frames should have, such as the
ability to describe an object or situation to varying degrees of
detail, the ability to be instantiated and the ability to handle
small perturbations of expected input data without major failures.
He illustrates the desirability of these features with a simple
example of object recognition.
The second half of Winograd's paper makes an attempt to
synthesize declarative and procedural aspects of a representation.
His proposal is based on frames and uses a generalization (ISA)
hierarchy having a number of features, including the ability to
associate procedures to objects on the hierarchy which specify
how to perform different operations on those objects. Many of
the ideas in [5] and [7] have been incorporated in KRL [16],
as developed by D. Bobrow and Winograd.
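A minimal sketch may help to fix the idea of procedures attached to objects on a generalization hierarchy. It is not KRL and not Winograd's formalism; the class Frame, its method names and the shape example are assumptions made for illustration. Slots and procedures that a frame lacks are inherited from its ancestors, and instantiation creates a more specific frame beneath an existing one.

```python
class Frame:
    """A node in a generalization (ISA) hierarchy with slots and procedures."""

    def __init__(self, name, parent=None, slots=None, procedures=None):
        self.name = name
        self.parent = parent
        self.slots = dict(slots or {})
        self.procedures = dict(procedures or {})

    def lookup(self, table_name, key):
        """Search this frame, then its ancestors, for `key` in the named table."""
        frame = self
        while frame is not None:
            table = getattr(frame, table_name)
            if key in table:
                return table[key]
            frame = frame.parent
        raise KeyError(key)

    def get(self, slot):
        """Return a slot value, inherited from an ancestor if necessary."""
        return self.lookup("slots", slot)

    def perform(self, operation, *args):
        """Run the nearest attached procedure for `operation` on this frame."""
        return self.lookup("procedures", operation)(self, *args)

    def instantiate(self, name, **slots):
        """Create a more specific frame whose parent is this frame."""
        return Frame(name, parent=self, slots=slots)


def describe(frame):
    # Attached procedure: it reads slots declaratively through the hierarchy.
    return f"{frame.name}: {frame.get('sides')} sides, colour {frame.get('colour')}"


# Hypothetical hierarchy: shape > triangle > t1.
shape = Frame("shape", slots={"colour": "unknown"}, procedures={"describe": describe})
triangle = Frame("triangle", parent=shape, slots={"sides": 3})
t1 = triangle.instantiate("t1", colour="red")
```

Here t1.perform("describe") finds the procedure two levels up, on the frame for shapes, while the slot values it reads come from the instance and from the triangle frame; the declarative and procedural parts are stored separately but meet at lookup time.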
III. Representing Different Kinds of Knowledge
Information entering an understander system may have many
different "forms", i.e. it may be coded as photographs or line
drawings, simple sentences or paragraphs or even complete
stories. Moreover, it may have different "content", i.e. involve
a fairy tale world of kings and dragons, a blocks world of cubes
and pyramids, a social, mental or physical world. One important
aspect of the representation problem is the definition of a
collection of knowledge, defined by a restriction on its form
and/or content, and the investigation of the adequacy of a
particular representation.
As mentioned earlier, Woods' paper does discuss the
representation of quantification in terms of semantic networks, where
the form of the knowledge involved is presumably (first order)
Predicate Calculus and the content is unconstrained. It also
discusses the representation of relative clauses and complex
sentences where the form is natural language and the content is,
again, unconstrained.
Rumelhart's paper [8] is primarily concerned with the
discovery of structure underlying simple stories. The structure
is defined in terms of a phrase structure grammar with semantic
rules associated to each production. The paper certainly follows
the general trend towards studying linguistic units larger than
sentences, such as paragraphs, dialogues or stories. Whether
the methodology used (in particular, phrase structure grammars)
will be found adequate for the description of structure in stories
remains to be seen.
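The pairing of productions with semantic rules can be sketched in a few lines. The two productions, the relation names ALLOW and INITIATE, the function name interpret and the sample sentences below are assumptions chosen for illustration and should not be read as the grammar given in the paper. Each production expands a category into constituents, and its semantic rule names the relation holding among them.

```python
# Each production pairs a syntactic expansion with a semantic rule that
# names the relation holding among the constituents.
# Category and relation names are illustrative assumptions.
GRAMMAR = {
    "Story": (["Setting", "Episode"], "ALLOW"),
    "Episode": (["Event", "Reaction"], "INITIATE"),
}


def interpret(symbol, leaves):
    """Expand `symbol` top-down, consuming one text segment per terminal.

    Returns a nested (relation, part, part, ...) tuple for nonterminals
    and the raw segment for terminals.
    """
    if symbol not in GRAMMAR:
        return leaves.pop(0)
    constituents, relation = GRAMMAR[symbol]
    return (relation, *[interpret(part, leaves) for part in constituents])


# Invented three-sentence story, already segmented into terminals.
segments = [
    "Margie was holding her balloon.",
    "The wind carried it into a tree.",
    "Margie cried.",
]
structure = interpret("Story", list(segments))
```

The result nests the INITIATE relation between the event and the reaction inside the ALLOW relation between the setting and the episode. The sketch assumes the story has already been segmented and fits the grammar exactly, which is precisely the adequacy question raised above.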
Schank [9] deals mainly with the problem of constructing
a structure of causally-linked actions and changes of states
(episodes) from a paragraph. When episodes are used to make
sense of new inputs in often-experienced situations, they are
called "scripts".
The paper ends with a brief introduction of
scripts.
More details about them can be found in more recent
publications by Schank and his students, e.g. [17,18].
Rumelhart's and Schank's work are related in that they both
attempt to define the structure of a collection of knowledge
limited with respect to form (stories for Rumelhart, paragraphs
for Schank) and unconstrained with respect to content. Moreover,
both papers agree that the underlying representation used must
involve causally-linked events, and the causal connectives they
employ are similar.
Abelson's paper is concerned with the representation of
"mundane reality" involving social actions. The approach he
follows is to postulate a number of primitive states and actions
for achieving these states, in terms of which hopefully all simple
social behaviour can be described. The discussion of the
primitives is quite thorough, but the examples given do not
provide sufficient evidence that the primitives proposed are in
fact descriptively adequate. Abelson's work is complementary to
Schank's in several respects and there is more recent joint work
on the subject [19].
IV. Ongoing Projects involving Understander Systems
The last three papers of the book discuss particular projects
involving the design and implementation of understander systems.
[11] describes the scope, basic methodology, and achievements
of SOPHIE, a knowledge-based computer-aided instruction (CAI)
system [...]
by asking questions, answering questions and letting him try out
his ideas. Of particular interest to computational linguists
should be the section describing the "semantic grammar" developed
by Burton to handle the types of sentences expected during a
dialogue on electronic circuits.
Nash-Webber [12] provides an overview of the BBN SPEECHLIS
project in the context of a discussion on the use of semantic
knowledge for speech understanding. Finally, [13] discusses some
of the inference rules implemented or being considered for
implementation by the SCHOLAR project whose aim is to develop a
knowledge-based CAI system that teaches geography. The reader
may find many of the rules stated in the paper completely
reasonable and yet quite shaky from a logical point of view. For
example, one rule (the uniqueness assumption) states that if only
one thing is found, it can be assumed that it constitutes a
complete set. Thus if someone knows of only one city called
"Springfield" and located in Massachusetts, he can use the
uniqueness assumption to reply "no" to "Is Springfield in Kentucky?"
even though there may well be such a city.
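The uniqueness assumption is easy to state as a procedure, and doing so shows where it departs from sound inference. The sketch below is not SCHOLAR's implementation; the table KNOWN_LOCATIONS, the function is_located_in and its use_uniqueness flag are hypothetical names introduced for illustration.

```python
# Hypothetical knowledge base: place name -> regions it is known to be in.
KNOWN_LOCATIONS = {"Springfield": ["Massachusetts"]}


def is_located_in(name, region, use_uniqueness=True):
    """Answer "yes", "no" or "don't know" from an incomplete knowledge base."""
    known = KNOWN_LOCATIONS.get(name, [])
    if region in known:
        return "yes"
    if use_uniqueness and len(known) == 1:
        # Only one instance is known, so the known set is treated as complete.
        # This is plausible rather than valid: an unknown instance may exist.
        return "no"
    return "don't know"
```

With the assumption switched on, is_located_in("Springfield", "Kentucky") yields "no"; with use_uniqueness=False the same question yields "don't know", which is the only answer the stored facts strictly support.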
The papers in this section constitute an important complement
to the rest of the book which often involves discussions that are
too far removed from the reality of implemented (or implementable)
Overall, this book provides an excellent review of the state
of the art, circa 1975, on the problem of representing knowledge.
It should be apparent from the previous discussion that the
book assumes a familiarity with basic issues of representation
and understander system design. For more introductory discussions,
the reader is referred to [14] or Schank and Colby [20].
American Journal of Computational Linguistics 
Microfiche 55 : 26 
THE ROLE OF SPEECH IN LANGUAGE
EDITED BY JAMES F. KAVANAGH (GROWTH AND DEVELOPMENT BRANCH,
NATIONAL INSTITUTE OF CHILD HEALTH AND HUMAN DEVELOPMENT) AND 
JAMES E. CUTTING (DEPARTMENT OF PSYCHOLOGY, WESLEYAN UNIVERSITY) 
The MIT Press 
Cambridge, Massachusetts 02139 
1975 
xiv + 335 pages $15.00 ISBN 0-262-11059-8 
REVIEWED BY SIEB NOOTEBOOM 
Instituut voor Perceptie Onderzoek 
Postbus 513 den Dolech 2 Eindhoven 4502 
The book under review contains the proceedings of a small
conference (22 participants) with the same title, held in October
1973 at the Urban Life Center, Columbia, Maryland. The
conference was one in a series called "Communicating by Language",
sponsored by the National Institute of Child Health and Human
Development (NICHD). There are 19 papers, divided into 3 major
sections, viz.
I The development of speech in man and child
II Language without speech (dealing with sign language)
III Phonology and language
Some papers are followed by comments of one of the participants;
each paper or coherent group of papers is followed by a summary
of the open discussion.
A separate IVth section of the book
contains reflections on the conference by Ira J. Hirsh.
References are presented at the end of each paper. The editors
have provided a name index and a subject index at the end of
the book.
Many linguists and psycholinguists take it for granted
that language can be studied without studying speech.
Likewise many speech researchers seem to work from the view that
the production and perception of speech can be studied without
studying language. This situation leads Alvin Liberman to
state in his "Introduction to the conference" that "our topic
--the role of speech in language--is not an established one;
no one has made it the direct and primary object of his research."
Although this statement is perhaps too categorical, it certainly
is valid for most of the field. (An obvious exception, to my 
mind, is among others Professor Lindblom of the University of 
Stockholm, who systematically explores the explanatory value 
of quantitative models of speech production and perception in 
phonology, e.g. Lindblom 1972, 1975). The organizers of the 
conference, Kavanagh and Liberman, have taken care to select 
well-known researchers with different backgrounds and different 
interests to discuss the various problems which may be derived 
from the central question: "do we increase our understanding 
of language when we take into account that it is spoken?" 
The resulting texts make interesting reading, although 
one will look in vain for a convincing answer to the initial 
question. Different investigators have different opinions and 
the present state of knowledge does not seem to make it
possible to settle the matter. In most papers specialist
knowledge is freely intermixed with speculation, and it is not
always easy to tell the one from the other. The discussions
generally serve more to continue speculation than to criticize
in detail each other's thinking. These remarks are not meant
as a criticism of the conference and its proceedings. They
intend to give an indication, however, of the style of this
book, and a warning that one will not find here a thorough
discussion of empirical data or explicit, testable theories, 
that could be of use in more practically oriented work. Instead 
one finds a number of inspiring expositions of such diverse 
topics as similarities and dissimilarities between human and 
animal communication systems, the evolutionary connections 
between language, speech, and tool-making, the primacy of pro- 
duction or perception in the phylogenesis and the ontogenesis 
of speech, the primacy of signs or speech in the evolution of 
language, the articulate structure of signs in those who have 
sign language as their first language, the origins of phonolo- 
gical change, and the parallels in phonological and other lin- 
guistic organization of language. 
Below I will make a few remarks on a few selected topics: 
a) The evolution of speech and language
b) Spoken language and sign language 
c) Innate feature detectors 
d) The absence of prosody 
I will not attempt to cover in this review all papers in the 
book. 
A. THE EVOLUTION OF SPEECH AND LANGUAGE
In a number of places in this volume attempts are made to re- 
late results of recent empirical studies of several kinds to 
theoretical ideas on the evolution of speech and language in 
early man. So Peter Marler gives an interesting description
of communication systems in nonhuman primates and birds. His 
data on monkeys show a difference between discrete signal sys- 
tems, consisting of a limited number of acoustically well- 
distinguished sound signals, used by monkeys living in dense 
forests and having little visual contact, and graded signal 
systems displaying continuous variation of sound signals, used 
by terrestrial monkeys. The bird data on the white-crowned 
sparrow lead him to the concept of an innate auditory template 
for bird song, modifiable by a suitable external model and 
serving for the development of vocal behavior. In his
speculations on the origin of speech Marler emphasizes the
importance of the evolution of innate but modifiable auditory
templates for speech sounds, serving to distinguish between
acceptable and nonacceptable models for vocal development, for 
classifying acceptable sounds into subcategories and for de-
veloping speech. He also assumes that, while categorical 
processing was developed as an aid in identifying sounds from 
memory, continuous sensory processing of sounds was retained, 
thus leading to an intermingling of categorical and noncatego- 
rical (discrete and graded) processing. He finally suggests 
that "The substitution of categorical for continuous processing 
of speech sounds may have directly facilitated the introduction 
of syntax as a radical innovation in primate communication". 
There appear to be two basic assumptions underlying 
Marler's reasoning. One is that comparative studies of sensory 
and vocal behavior in animals and man may lead to interesting
theories about specific properties of the human brain under- 
lying man's capacity for speech and language. The other is 
that such studies may clarify the order in which postulated 
changes in vocal perception and development might have occurred 
in the evolution of early man. There is an important diffe- 
rence between these two assumptions. Whereas the former may 
lead to theories or hypotheses which in principle might become
testable, the latter does not, at least not within the limits 
of this reviewer's imagination. Obviously this lack of testa- 
bility is common to many speculations about the evolution of 
human behavior. This has in the past not kept scientists from
making reasonable guesses particularly about the evolution of
language and speech, and probably will not do so in the future.
In this volume both Hewes in his comments on Mattingly's paper 
and Liberman in his own contribution relate the genesis of 
language to toolmaking. Hewes observes similarities between 
syntactic structures and the prescribed order of the various 
steps necessary for the manufacture of flakes from a prepared 
Levallois core. Liberman, taking the same line of thought,
states that the Levallois toolmaking technique cannot reason- 
ably be described by means of a phrase-structure grammar. 
A transformational grammar which formally incorporates a memory
is necessary. As far as I understand his reasoning this is so 
because in making a particular chip one has to keep two things 
in mind, both the last chip that has been made and the final 
form of the tool. It seems to me, however, that in order to
give his argument its force it still has to be shown that
there is a fundamental difference in the necessary complexity 
of underlying mental structures between Levallois toolmaking 
and many forms of goal-oriented behavior we find in higher 
animals. 
Liberman also suggests that the final crucial stage in 
the evolution of human language would appear to be the
development of the bent two-tube supralaryngeal vocal tract of
modern man, which allows its possessors to generate acoustic
signals that (1) have very distinct acoustic properties and (2) are
easy to produce, being acoustically stable. Reconstructions 
from fossils tell him that the Neanderthal hominids had to do 
without this asset, and therefore probably retained a
communication system with a mixed phonetic level that relied on
both gestural and vocal components. At this point the reader
particularly feels the need for an expert criticism of the
validity of such reconstructions.
B. SPOKEN LANGUAGE AND SIGN LANGUAGE
The question whether speech or gestural communication has been
more important in the evolution of human language came up
several times during the conference. In reaction to Mattingly's
idea that "speech exemplifies a thoroughly and peculiarly 
human kind of knowing" Hewes commented that the depigmentation 
of the volar skin would indicate the antiquity of nonvocal 
communication. Indirect support for this supposed antiquity
of gestural communication comes from some fascinating studies
of American Sign Language (ASL), according to Bellugi and
Klima a full-fledged language of its own, and not a derivative 
or degenerate form of written or spoken English. Stokoe 
argues for the antiquity of sign language from a possible 
parallel between ontogeny and phylogeny. It appears to be the
case that the infant with deaf parents, learning ASL as its
first language, begins putting wordlike signs into sentencelike
structures at an earlier age than the child making two-word
or three-word sentences in speech. 
Bellugi and Klima have studied sign language from histo- 
rical changes in the form of signs, in short term memory 
experiments, by analyzing a collection of "slips of the hand",
and by comparing American Sign Language with Chinese Signs,
in all cases with profoundly deaf people who use sign language
as their primary form of communication. They show that signs 
in ASL are not simply signals which differ uniquely and
holistically from one another but are, rather, highly coded units.
They also provide evidence that grammatical processes bear the 
marks of the particular transmission system in which the
language developed. This seems to be confirmed in Huttenlocher's
contribution, comparing the encoding of spatial relations in
ASL and natural language (= spoken American English).
It is too early to draw any definite conclusions from 
these studies of sign language on the interdependence of 
natural language and speech, as the structure of sign language 
is only beginning to be understood. But it is certainly of 
much interest to students of language behavior that the human 
perceptual and cognitive systems appear to be so flexible that 
profoundly deaf people may develop visual communication systems 
among themselves which, if not equal in expressive power and 
speed of communication to natural spoken languages, at least 
come close to them. Further comparisons between the syntax of 
natural spoken languages and sign languages may lead to more 
caution in interpreting current ideas about what is and what
is not innate in our linguistic abilities. Similarly compari- 
sons between the efficiency of speech perception and the effi- 
ciency of visual sign perception might well make us wonder 
whether speech perception is as special as some theorists like 
to make us believe. 
C. INNATE FEATURE DETECTORS
The idea that speech perception is mediated by possibly innate,
speech-specific feature detectors was given considerable
attention in the conference. This idea supported Marler's
extrapolation from innate auditory templates in birds to innate
auditory templates in humans. Studdert-Kennedy provides a 
careful survey of the current empirical evidence concerning 
the perceptual processing of consonants and vowels, from which 
he concludes that the "human cortex is supplied with sets of 
acoustic detectors tuned to speech, each inhibited from output 
to the phonetic system in the absence of collateral response 
in other detectors". 
Cutting and Eimas present evidence that such feature 
detectors are innate. Eimas has shown that very young infants, 
one month and four months of age, can discriminate much better
between different speech sounds that belong to different
phonemic categories than between different speech sounds belonging
to the same phonemic category in adult speech. One may concur,
however, with the doubt expressed by Hirsh in his reflections 
on the conference whether Eimas's data are about speech or 
about general auditory perception. One may feel similar doubts 
about the interpretation Eimas and Cutting give to the data 
stemming from the selective adaptation paradigm, introduced in 
speech perception studies by Eimas and Corbit in 1973 and since 
then used by an increasing number of investigators. In selec- 
tive adaptation studies it is shown that repeated stimulation 
with a particular acoustic configuration, for instance a
syllable ba, may change the response distribution in a phoneme
identification task, for instance the binary forced choice
between ba and pa, measured with stimuli taken from the acoustic
continuum between ba and pa. In this case the number of
pa-responses would increase at the cost of the ba-responses. The
interpretation is that there are feature detectors which can 
be fatigued by repeated stimulation. By carefully studying 
which acoustic configurations lead to shifts in particular 
response distributions, it would be possible to find out what 
information is extracted by particular feature detectors. 
Cutting and Eimas argue for the existence of phonetic,
speech-specific feature detectors. More recent studies show that
categorical perception and selective adaptation are not unique
to speech perception (Cutting, Rosner and Foard 1976).
Furthermore, to my knowledge, nobody has yet seriously discussed
the difficulties for a theory of "wired-in" feature detectors
stemming from perceptual normalization experiments in which it
is shown that response distributions in phoneme identification
tasks may shift systematically due to the immediate
environment of the test segment (e.g. Fourcin 1972).
D. THE ABSENCE OF PROSODY
The volume under review is not only remarkable for the many
interesting and stimulating papers it contains but also for
what it does not contain. In a collection of papers with the
title "The role of speech in language" one would have expected
to find at least one contribution seriously discussing the 
relation between speech prosody and linguistic structure. It 
is ironical that the only paper in which intonational contrast 
is given more attention than obligatory lip service is Stokoe's
contribution "The shape of soundless language", dealing with
sign language. Stokoe's treatment of intonation and its kinesic
correlate in sign language seems to make explicit why so many 
speech researchers do not pay attention to speech prosody. He 
suggests that intonational contrasts "are not necessarily lin- 
guistic and have more affinity with other systems that signal 
affect than with phonemic contrasts. There remain then only 
phonemic contrasts between consonant and consonant, vowel and 
vowel, and tone and tone (when so used) as the indisputably
linguistic, basic features of language". One may fear that 
this undue overemphasis on phonemic contrast in speech percep- 
tion research will persist until speech scientists turn away 
from the study of isolated CV-syllables and start wondering 
about the perception of normal spontaneous connected speech. 
American Journal of Computational Linguistics
STEPHEN F. WEISS AND DONALD F. STANAT 
Department of Computer Science 
University of North Carolina 
New West Hall 035A 
Chapel Hill 27514
A class of algebraic parsing techniques for context-free 
languages is presented. A grammar is used to characterize 
a parsing homomorphism which maps terminal strings to a 
polynomial semiring. The image of a string under an 
appropriate homomorphism contains terms which specify all 
derivations of the string. The work describes a spectrum
of parsing techniques for each context-free grammar, ranging 
from a form of bottom-up to top-down procedures. 
ALGEBRAIC PARSING OF CONTEXT-FREE LANGUAGES
I. Introduction
For many years syntactic analysis and the theory of formal
languages have developed in a parallel, but not closely related,
fashion. The work described here is an effort to relate these
areas by applying the tools of formal power series to the problem
of parsing.
This paper presents an algebraic technique for parsing a broad 
class of context-free grammars. By parsing we mean the process of 
determining whether a string of terminal symbols, x, is a member
of the language generated by grammar G (i.e., is $x \in L(G)$?) and,
if it is, finding all derivations of x from the starting symbol
of G. We hope that posing the parsing problem in purely algebraic
terms will provide a basis for examination and comparison of parsing
algorithms and grammar classes. 
Section II presents an overview of the algebraic parsing process.
It provides a general notion of how the method works without going
into detail. Section III contains the algebraic preliminaries and
notational conventions needed in order to describe the parsing method
precisely. The formal presentation of the parsing method and the
proof of correctness form Section IV. Section V contains some
interesting special cases of the theorem and presents some examples
of parses.
II. Overview of the algebraic parsing process
The algebraic parsing formalism described here is applicable 
to all context-free grammars $G = \langle V_N, V_T, P, S \rangle$ except those that
contain productions of the form $A \to B$ where A and B are both
nonterminals, or erasing rules such as $A \to \epsilon$. The parsing process
consists first of constructing (on the basis of the grammar G) a
polynomial and a function defined on polynomials. A parse of x is 
obtained by repeated applications of the function to a polynomial 
P(x). The process has two features worthy of note. First, it 
produces all parses of x in parallel. Second, the process of 
converting a grammar into the required algebraic form is
straightforward and does not alter the structure of the grammar. This
property, the preservation of grammatical structure, is particularly 
important in areas such as natural language analysis where the 
structure that a grammar provides is as important as the language 
it generates. 
The polynomials we will use have terms of the form $(Z,\Delta)$, where
Z is a string over an extended alphabet and $\Delta$ represents a sequence
of productions of G. The process begins with a polynomial of ordered
pairs representing x, the string to be parsed. A function is
repeatedly applied to the polynomial; the number of applications
necessary is bounded by the input length. If the resulting polynomial
contains a term $(S,\Delta)$ where S is the starting symbol in G, then $\Delta$
represents the production sequence used in generating x from S. If
no such pair occurs, then x is not in L(G), and if multiple pairs
occur, $(S,\Delta_1), (S,\Delta_2), \ldots$, then x is ambiguous and the $\Delta_i$ specify
the several parses. A precise formulation of the polynomial and the
operations on it is given below.
III. Algebraic preliminaries and notation
A semigroup is formally defined as an ordered pair $\langle S,\cdot \rangle$ where
S is a set (the carrier) and $\cdot$ is an associative binary operation.
Similarly, a monoid is a triple consisting of a set, an operation
and a two-sided identity (e.g., $\langle S,\cdot,1 \rangle$). We will feel free to
denote a monoid or semigroup by its carrier.
For any set V, $V^*$ denotes the free monoid generated by V;
$V^* = \langle V^*, \text{concatenation}, \Lambda \rangle$. Similarly, $V^+$ denotes the free semigroup
generated by V; $V^+ = \langle V^+, \text{concatenation} \rangle$. We denote the length of a
string x in $V^*$ or $V^+$ by $|x|$.
For an arbitrary alphabet V, we define $\bar V = \{\bar a \mid a \in V\}$. The free
half-group generated by V, H(V), is defined to be the monoid
generated by $V \cup \bar V$ together with the relation $\bar a a = 1$, where 1 is
the monoid identity and a is any element of V. Note that in H(V)
the elements of $\bar V$ are left inverses but not right inverses of the
corresponding elements of V. We denote the extended alphabet
$V \cup \bar V$ by $\Sigma$.
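The one-sided cancellation in H(V) can be illustrated with a short sketch. It assumes a hypothetical encoding in which each symbol is a (name, barred) pair, and the function name `reduce_word` is ours rather than the paper's. A barred symbol cancels only when its unbarred partner follows it immediately on the right, which mirrors the single relation stated above.

```python
def reduce_word(word):
    """Reduce a word over V and V-bar using the single relation a-bar a = 1.

    Each symbol is a (name, barred) pair; this encoding is an assumption
    made for illustration only.
    """
    stack = []
    for name, barred in word:
        # A barred symbol directly to the left of its unbarred partner cancels.
        if not barred and stack and stack[-1] == (name, True):
            stack.pop()
        else:
            # The reverse order (a followed by a-bar) is left untouched.
            stack.append((name, barred))
    return stack
```

Because the only rewriting rule deletes an adjacent barred/unbarred pair, a single left-to-right pass with a stack is enough to reach the reduced form.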
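If $T = \langle T,\cdot,1 \rangle$ and $Q = \langle Q,+,0 \rangle$ are monoids, we denote by
$T \times Q$ the product monoid $\langle T \times Q, \otimes, (1,0) \rangle$. The carrier of $T \times Q$
is the cartesian product $T \times Q$ and the operation $\otimes$ is defined to be
the component-wise operation of T and Q:
$$(t_1, q_1) \otimes (t_2, q_2) = (t_1 \cdot t_2,\; q_1 + q_2).$$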
A semiring is an algebraic system $\langle S,+,\cdot,0 \rangle$ such that
$\langle S,+,0 \rangle$ is a commutative monoid,
$\langle S,\cdot \rangle$ is a semigroup,
and the operation $\cdot$ distributes over +:
$a\cdot(b+c) = a\cdot b + a\cdot c$,
$(a+b)\cdot c = a\cdot c + b\cdot c$.
A semiring is commutative if the operation $\cdot$ is commutative.
A semiring with identity is a system $\langle S,+,\cdot,0,1 \rangle$ where $\langle S,\cdot,1 \rangle$ is
a monoid. The semirings used in this paper are commutative and have
identities. Furthermore, in each case the additive identity is a
multiplicative zero:
$0\cdot x = x\cdot 0 = 0$.
The boolean semiring B consists of the carrier $\{0,1\}$ under the
commutative operations + and $\cdot$, where $1\cdot 1 = 1+x = 1$ and $0+0 = 0\cdot x = 0$
for all $x \in \{0,1\}$.
For an arbitrary monoid M we denote by R(M) the semiring of
polynomials described as follows:
1) Each term is of the form $c\alpha$ where $c \in B$ (the
boolean semiring of coefficients) and $\alpha \in M$.
2) Each polynomial is a formal sum (under +) of
a finite number of terms.
3) Addition and multiplication of terms is defined as follows:
a) $b\alpha + c\alpha = (b + c)\alpha$
b) $(b\alpha)(c\beta) = (bc)(\alpha\beta)$.
4) Addition and multiplication of polynomials is performed
in the usual manner consistent with 3).
Note that all coefficients of R(M) are either 1 or 0. We will
adopt the usual convention of not explicitly writing 1 for the terms
with that coefficient and omitting terms with a coefficient of 0.
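Since every coefficient is 0 or 1, a polynomial in R(M) is determined by the set of monoid elements that carry coefficient 1. The sketch below illustrates this under the assumption that M is the free monoid of Python strings under concatenation; the names `poly_add`, `poly_mul`, `ZERO` and `ONE` are hypothetical.

```python
def poly_add(p, q):
    """Sum of two polynomials: 1 + 1 = 1 in B, so addition is set union."""
    return p | q


def poly_mul(p, q):
    """Product of two polynomials: multiply terms pairwise in the monoid."""
    return frozenset(a + b for a in p for b in q)


# The additive identity (no terms) and the multiplicative identity
# (the single term whose monoid element is the empty string).
ZERO = frozenset()
ONE = frozenset({""})
```

With this representation the additive identity acts as a multiplicative zero automatically, because a product over an empty set of terms has no terms.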
A context-free grammar is a system $G = \langle V_N, V_T, P, S \rangle$ where $V_N$
and $V_T$ are finite, disjoint, nonempty sets denoted nonterminal and
terminal symbols respectively. We denote by V the set $V_N \cup V_T$. The
symbol S is the distinguished nonterminal from which all derivations
begin, and P is the set of productions of G. A context-free grammar
is proper if it does not contain productions of the form $A \to \epsilon$
(erasures) or $A \to B$ where A and B are both nonterminals.
It can easily be shown that the set of languages generated by
proper context-free grammars is exactly the set of context-free
languages. In addition, an arbitrary context-free grammar can be
made proper by a straightforward method which alters the structure
of the grammar very little. In this study we will deal with only
proper context-free grammars. This guarantees that all terminal
strings have a finite number of derivations in G, and thus makes
possible our goal of finding all derivations of an input.
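The properness condition is easy to check mechanically. The sketch below assumes a hypothetical encoding of productions as (left-hand side, right-hand side) pairs of strings with one character per symbol; `is_proper` is an illustrative name, not part of the paper.

```python
def is_proper(productions, nonterminals):
    """Return True if no production is an erasure or of the form A -> B.

    `productions` is a list of (lhs, rhs) string pairs, one character per
    symbol; this encoding is assumed for illustration.
    """
    for _lhs, rhs in productions:
        if len(rhs) == 0:
            return False  # erasure: A -> empty string
        if len(rhs) == 1 and rhs in nonterminals:
            return False  # A -> B with B a single nonterminal
    return True
```

Applied to the four productions used in the examples of Section V, the check succeeds; adding a rule such as A -> B makes it fail.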
Productions of G will be indexed by integers. Thus $A \xrightarrow{i} M$ denotes
that $A \to M$ is the $i$th production in P. We will deal only with
leftmost derivations. A leftmost derivation is completely specified by the
initial sentential form and the sequence of production indices. If
$\Delta \in I^*$ is the sequence of production indices in the leftmost derivation
of $N \in V^+$ from $M \in V^+$, we write $M \stackrel{\Delta}{\Rightarrow} N$. The length of a derivation $\Delta$
is denoted by $|\Delta|$, and is equal to the number of production indices in $\Delta$.
We will use, but not formally define, the notion of height of a
derivation, meaning the height of the corresponding derivation tree
or the length of the longest path from the root to the frontier of the
tree. The height of a derivation $\Delta$ will be denoted by $h(\Delta)$.
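To make concrete the claim that a leftmost derivation is fixed by its initial sentential form and its index sequence, the following sketch replays such a sequence. It uses the four productions of Example 1 in Section V; the encoding and the name `leftmost_derive` are assumptions made for illustration.

```python
# Productions of Example 1, indexed as in the paper.
PRODUCTIONS = {1: ("S", "aA"), 2: ("A", "AB"), 3: ("A", "a"), 4: ("B", "b")}
NONTERMINALS = {"S", "A", "B"}


def leftmost_derive(start, indices):
    """Apply the indexed productions, always rewriting the leftmost nonterminal."""
    form = start
    for i in indices:
        lhs, rhs = PRODUCTIONS[i]
        # Position of the leftmost nonterminal, or None if the form is terminal.
        pos = next((k for k, s in enumerate(form) if s in NONTERMINALS), None)
        if pos is None or form[pos] != lhs:
            raise ValueError(f"production {i} does not apply to {form!r}")
        form = form[:pos] + rhs + form[pos + 1:]
    return form
```

Replaying the sequence 1, 2, 2, 3, 4, 4 from S passes through aA, aAB, aABB, aaBB and aabB before reaching aabb.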
Since 'derivation' will always mean 'leftmost derivation' in the
sequel, the following assertions hold:
Assertion 1: A derivation is of height 0 if and only if it is of 
length 0. A derivation is of height 1 if and only if it is of length 1. 
Assertion 2: Let G be a proper context-free grammar, and
$$A \stackrel{\Delta}{\Rightarrow} M$$
where $|\Delta| \ge 0$. Then $\Delta$ is of height less than or equal to $|M|$.
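Assertion 3: Let $G = \langle V_N, V_T, P, S \rangle$ be a context-free grammar, I an
index set for P, and let the $j$th production of G be
$$A \to a_1 a_2 \cdots a_m.$$
Let $j\Gamma$ be a derivation
$$A \stackrel{j\Gamma}{\Rightarrow} M$$
of height n + 1. Then
$$\Gamma = \Gamma_1 \Gamma_2 \cdots \Gamma_m$$
and
$$M = M_1 M_2 \cdots M_m$$
and for all i, $1 \le i \le m$,
$$a_i \stackrel{\Gamma_i}{\Rightarrow} M_i$$
is a derivation of height n or less.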
The algebraic structure used in this work is the semiring of
polynomials $R(H \times I^*)$ where $H = H(V)$ is the free half-group generated
by V, and I is the index set of the set of productions P. We will
use an initial segment of the natural numbers, 1, 2, 3, ..., as
the index set I. Each term of a polynomial from $R(H \times I^*)$ consists
of an element from $H \times I^*$ together with a coefficient from the
boolean semiring B. The elements of $H \times I^*$ will be the basis for
calculating the parses of a string x. The elements of H will inter-
act to determine if a product of terms characterizes a derivation.
If so, the associated element of $I^*$ is the sequence of production
indices of the derivation.
The following notational conventions will be observed.
i, j, k, m, n $\in$ N (the set of natural numbers).
$\delta$, g, $\Psi$, v will denote functions. For the function g,
IV. An algebraic parsing theorem 
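Theorem (version 1):
Let $G = \langle V_N, V_T, P, S \rangle$ be a proper context-free
grammar. Then there exist homomorphisms v, g, and $\delta$,
and a special polynomial $p \in R((\Sigma \times I^*)^*)$, with $\Sigma = V \cup \bar V$
the extended alphabet, such that for every
$x \in V_T^+$, $x = x_1 \cdots x_n$, $x_i \in V_T$,
$$\delta\, g^n \prod_{i=1}^{n} p^n\, v(x_i)$$
contains a term $(S,\Delta)$ if and only if $\Delta$ is a leftmost derivation
of x from S.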
Construction for the proof: 
Let $V = V_1 \cup V_2$ be an arbitrary exhaustive division of V.
The construction is most economical when $V_1$ and $V_2$ are disjoint, but
this is not required.
The function v is the homomorphism induced by the following:
$v(a) = (a,\Lambda)$, $a \in V$, and $\Lambda$ is the identity in $I^*$.
Since v is a homomorphism, $v(\Lambda) = \Lambda$.
The function g is the homomorphism induced by defining
g on the generators of the domain as follows:
2i. $g(a,\Delta)$ contains the term $(a,\Delta)$; $a \in \Sigma$
2ii. If $A \to a b_1 \cdots b_n$ is the $i$th production
of P and $a \in V_1$ then $g(a,\Delta)$ contains
$(A, i\Delta)(\bar b_n,\Lambda) \cdots (\bar b_1,\Lambda)$
2iii. There are no other terms in $g(a,\Delta)$.
Note that because g is a homomorphism, $g(\Lambda) = \Lambda$, where $\Lambda$
is the identity of the monoid $(\Sigma \times I^*)^*$.
The function $\delta$ is the canonical homomorphism which
coalesces a product in $(\Sigma \times I^*)^*$ into a single ordered
pair by component-wise multiplication of the first
entries (thus allowing cancellation in H) and
catenation of the second entries. For example,
$$\delta[(S,1)(\bar A,\Lambda)(A,223)(\bar B,\Lambda)(\bar B,\Lambda)(B,4)(B,4)] = (S,122344).$$
The polynomial p is an element of $R((\Sigma \times I^*)^*)$ defined
as follows:
1. p contains the summand $\Lambda$;
2. If $a \in V_2$ and $A \to a b_1 \cdots b_n$ is the $j$th production
of P then p contains the summand
$(A, j)(\bar b_n,\Lambda) \cdots (\bar b_1,\Lambda)(\bar a,\Lambda)$;
3. p contains no other summands.
We adopt the convention that $p^k = \Lambda$ for $k \le 0$.
Note that since p contains $\Lambda$, $p^k$ contains $\Lambda$ as well
as all summands of $p^j$ for $j \le k$.
For notational convenience we adopt the following conventions.
First, where no ambiguity can result, products in $R((\Sigma \times I^*)^*)$ of
the form
$$(a_1,\Delta)(a_2,\Lambda) \cdots (a_n,\Lambda)$$
will be abbreviated as:
$$(a_1 a_2 \cdots a_n,\ \Delta).$$
No cancellation is implied by this notation since cancellation cannot
occur in $R((\Sigma \times I^*)^*)$. Second, we define the function $\Psi_k$ as follows:
$$\Psi_k(a_1 a_2 \cdots a_n) = p^k v(a_1)\, p^k v(a_2) \cdots p^k v(a_n)$$
where $a_i \in V$ and p is the polynomial defined above. Note that if
$k \le 0$, then $\Psi_k(a_1 a_2 \cdots a_n) = v(a_1 a_2 \cdots a_n)$ and $\Psi_k(\Lambda) = \Lambda$.
Using this notation, we can re-state the theorem as follows:
Theorem (version 2): Let $G = \langle V_N, V_T, P, S \rangle$ be a proper context-free
grammar. Then there exist maps $\Psi$, g and $\delta$
such that for every $x \in V_T^+$, $x = x_1 x_2 \cdots x_n$, $x_i \in V_T$,
$$\delta\, g^n\, \Psi_n(x)$$
contains a term $(S,\Delta)$ if and only if $S \stackrel{\Delta}{\Rightarrow} x$.
The proof of the theorem rests on three lemmas. Lemma I
implies the "if" part of the theorem; Lemma III implies the "only if"
part. Lemma II is used in the proof of Lemma III.
Lemma I: Let $M \in V^+$, $A \in V$, and $A \stackrel{\Delta}{\Rightarrow} M$. Then for all $k \ge h(\Delta)$,
$\delta g^k \Psi_k(M)$ contains $(A,\Delta)$.
Proof (by induction on $h(\Delta)$, the height of the derivation $\Delta$):
Basis: If $h(\Delta) = 0$, then M = A and $\Delta = \Lambda$. Then $\Psi_k(A) = p^k (A,\Lambda)$.
Since $\Lambda$ is a summand of p, it follows that $(A,\Lambda)$ is a summand of
$p^k (A,\Lambda)$, and therefore $(A,\Lambda)$ is a summand of $\delta g^k \Psi_k(A)$.
Thus the derivation $A \stackrel{\Lambda}{\Rightarrow} A$ is represented in $\delta g^k \Psi_k(A)$ by $(A,\Lambda)$, which
establishes the basis.
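Induction: Let $\Delta$ be a derivation of height n + 1, $A \stackrel{\Delta}{\Rightarrow} M$. By
assertion 3,
$$\Delta = j\,\Gamma_1 \Gamma_2 \cdots \Gamma_r$$
where
$$A \xrightarrow{j} a_1 a_2 \cdots a_r, \qquad a_i \stackrel{\Gamma_i}{\Rightarrow} M_i,$$
and
$$M = M_1 M_2 \cdots M_r$$
where $h(\Gamma_i) \le n$.
If $k \ge n$, then by the induction hypothesis, $\delta g^k \Psi_k(M_j)$ contains the summand
$(a_j,\Gamma_j)$.
Consider the term of $g^k \Psi_k(M_1)$ which cancels to $(a_1,\Gamma_1)$ in
$R(H \times I^*)$. This term must be of the form $(a_1,\Gamma_1')T$, where $\Gamma_1'$ is
a prefix of $\Gamma_1$. Either $a_1 \in V_1$ or $a_1 \in V_2$. The sum $\delta g^{k+1} \Psi_{k+1}(M_1)$
contains $\delta g\, g^k \Psi_k(M_1)$, which contains $\delta g(a_1,\Gamma_1')T$. If $a_1 \in V_1$, then
$g(a_1,\Gamma_1')$ contains $(A \bar a_r \cdots \bar a_3 \bar a_2,\ j\Gamma_1')$, and $\delta g(a_1,\Gamma_1')T$ contains
$(A \bar a_r \cdots \bar a_3 \bar a_2,\ j\Gamma_1)$.
On the other hand, the sum $\delta g^{k+1} \Psi_{k+1}(M_1)$ also
contains $\delta\, p\, g^k \Psi_k(M_1)$. If $a_1 \in V_2$, then $(A \bar a_r \cdots \bar a_2 \bar a_1,\ j)$ is a summand
of p, and therefore $\delta\, p\,(a_1,\Gamma_1')T$ contains $(A \bar a_r \cdots \bar a_3 \bar a_2,\ j\Gamma_1)$. Thus in
either case, $\delta g^{k+1} \Psi_{k+1}(M_1)$ contains the summand $(A \bar a_r \cdots \bar a_3 \bar a_2,\ j\Gamma_1)$ and
since every summand of $\delta g^k \Psi_k(M_j)$ is a summand of $\delta g^{k+1} \Psi_{k+1}(M_j)$, it
follows that $\delta g^{k+1} \Psi_{k+1}(M)$ contains
$$(A \bar a_r \cdots \bar a_2,\ j\Gamma_1)(a_2,\Gamma_2) \cdots (a_r,\Gamma_r) = (A,\ j\Gamma_1\Gamma_2 \cdots \Gamma_r) = (A,\Delta).$$
This completes the proof.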
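Lemma II: Let $a \in V$, $\Gamma \in I^*$. For $k \ge 0$, all terms of $g^k(a,\Gamma)$
are of the form $(b,\Delta\Gamma)(\bar c_m,\Lambda) \cdots (\bar c_1,\Lambda)$ where $b \in V$, $\bar c_i \in \bar V$, $m \ge 0$,
and $b \stackrel{\Delta}{\Rightarrow} a c_1 \cdots c_m$.
For notational convenience we abbreviate $c_1 \cdots c_m$ by N; hence we
denote $(b,\Delta\Gamma)(\bar c_m,\Lambda) \cdots (\bar c_1,\Lambda)$ by $(b\bar N,\Delta\Gamma)$.
Proof by induction on k, the number of applications of g. By
definition, $g^0(a,\Gamma) = (a,\Gamma)$ which establishes the assertion for the
value k = 0.
Assume the assertion holds for $k \le n$ and consider $g^{n+1}(a,\Gamma) = g\, g^n(a,\Gamma)$.
By the induction hypothesis, all terms of $g^n(a,\Gamma)$ are of the form
$(b\bar N,\Delta\Gamma)$ where $b \stackrel{\Delta}{\Rightarrow} aN$.
Hence terms of $g^{n+1}(a,\Gamma)$ are of the form
$g(b\bar N,\Delta\Gamma)$. Since g limited to $\bar V$ is the identity, $g(b\bar N,\Delta\Gamma) = [g(b,\Delta\Gamma)](\bar N,\Lambda)$.
By definition of g, $g(b,\Delta\Gamma)$ contains, apart from $(b,\Delta\Gamma)$ itself, only terms of the form $(C\bar M,\ j\Delta\Gamma)$
where $C \xrightarrow{j} bM$ is a production. Therefore terms of $g^{n+1}(a,\Gamma)$ are of
the form
$$(C \bar M \bar N,\ j\Delta\Gamma)$$
and since $C \stackrel{j}{\Rightarrow} bM$ and $b \stackrel{\Delta}{\Rightarrow} aN$ it follows that $C \stackrel{j\Delta}{\Rightarrow} aNM$.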
k 
- 
corollary: All terms of g (&r) are of the form (~NM,AT). 
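Lemma III: If $\delta g^k \Psi_k(M)$ contains $(A\bar N,\Delta)$, then $A \stackrel{\Delta}{\Rightarrow} MN$.
Proof by induction on the length of M:
Basis: Let $a \in V$ and assume
$\delta g^k \Psi_k(a)$ contains $(A\bar N,\Delta)$.
If $p_i$ represents an arbitrary summand of p other than $\Lambda$, then every
term of $g^k \Psi_k(a)$ can be represented in the form
$$g^k(p_1)\, g^k(p_2) \cdots g^k(p_n)\, g^k(a,\Lambda)$$
where $0 \le n \le k$ and n denotes the number of nontrivial summands of p
which are factors of the term.
By construction, every summand of p is either $\Lambda$ or of the form
$(B_i \bar P_i,\ j_i)$ where $B_i \in V_N$, $P_i \in V^+$, $j_i \in I$,
and $B_i \xrightarrow{j_i} P_i$ is a production in G.
By Lemma II, every term of $g^k(B_i \bar P_i,\ j_i)$ is of the form:
$(C_i \bar M_i \bar P_i,\ \Gamma_i j_i)$ where $C_i \in V$, $M_i, P_i \in V^*$, $\Gamma_i \in I^*$.
By the same lemma, it follows that every term of $g^k(a,\Lambda)$ is of the
form
$(C_{n+1} \bar M_{n+1},\ \Gamma_{n+1})$ where $C_{n+1} \in V$, $M_{n+1} \in V^*$, $\Gamma_{n+1} \in I^*$.
Hence every term of $g^k \Psi_k(a)$ is of the form
$$(C_1 \bar M_1 \bar P_1,\ \Gamma_1 j_1)(C_2 \bar M_2 \bar P_2,\ \Gamma_2 j_2) \cdots (C_n \bar M_n \bar P_n,\ \Gamma_n j_n)(C_{n+1} \bar M_{n+1},\ \Gamma_{n+1})$$
where $C_i \stackrel{\Gamma_i j_i}{\Rightarrow} P_i M_i$ for $1 \le i \le n$ and $C_{n+1} \stackrel{\Gamma_{n+1}}{\Rightarrow} a M_{n+1}$.
By assumption there is a term t of $g^k \Psi_k(a)$ such that $\delta[t] = (A\bar N,\Delta)$;
t must be in the form indicated above. In order for t to cancel under
$\delta$, the following must be true:
$C_1 = A$ since $C_1$ cannot cancel from t,
$\bar P_i = \bar Q_i \bar C_{i+1}$ for $1 \le i \le n$ since $C_2, \ldots, C_{n+1}$
must all cancel from t.
Therefore
$$t = (A \bar M_1 \bar Q_1 \bar C_2,\ \Gamma_1 j_1)(C_2 \bar M_2 \bar Q_2 \bar C_3,\ \Gamma_2 j_2) \cdots (C_n \bar M_n \bar Q_n \bar C_{n+1},\ \Gamma_n j_n)(C_{n+1} \bar M_{n+1},\ \Gamma_{n+1}).$$
This cancels to $(A\bar N,\Delta)$ as required with
$$N = M_{n+1} Q_n M_n Q_{n-1} M_{n-1} \cdots Q_1 M_1, \qquad \Delta = \Gamma_1 j_1 \Gamma_2 j_2 \cdots \Gamma_n j_n \Gamma_{n+1}.$$
Then, by the derivations noted above,
$$C_i \stackrel{\Gamma_i j_i}{\Rightarrow} C_{i+1} Q_i M_i, \quad 1 \le i \le n, \quad \text{and} \quad C_{n+1} \stackrel{\Gamma_{n+1}}{\Rightarrow} a M_{n+1}.$$
Hence, since $C_1 = A$,
$$A \stackrel{\Delta}{\Rightarrow} a M_{n+1} Q_n M_n \cdots Q_1 M_1$$
and thus
$$A \stackrel{\Delta}{\Rightarrow} aN.$$
This establishes the basis.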
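Induction: Assume that for all $M \in V^*$ such that $|M| \le n$, if
$\delta g^k \Psi_k(M)$ contains $(A\bar N,\Delta)$ then $A \stackrel{\Delta}{\Rightarrow} MN$. Let $\hat M = Ma$ be a string
such that $|Ma| = n+1$ and $\delta g^k \Psi_k(Ma)$ contains $(A\bar N,\Delta)$. Because $\delta$, g
and $\Psi_k$ are homomorphisms,
$$\delta g^k \Psi_k(Ma) = [\delta g^k \Psi_k(M)]\,[\delta g^k \Psi_k(a)].$$
Then $\delta g^k \Psi_k(M)$ must contain a term $(T_1,\Delta_1)$ and $\delta g^k \Psi_k(a)$ must contain
a term $(T_2,\Delta_2)$ such that $T_1 T_2 = A\bar N$ and $\Delta = \Delta_1 \Delta_2$.
In order for this to occur, $T_2$ must be of the form $B\bar N_2$ where
$B \in V$, $N_2 \in V^*$, and $T_1$ must be of the form $A\bar N_1 \bar B$ where $A \in V$,
$N_1 \in V^*$, and $\bar N = \bar N_1 \bar N_2$. (If $T_1$ and $T_2$ were not of this form,
cancellation to $A\bar N$ would be impossible.)
Thus $\delta g^k \Psi_k(M)$ contains
$(A\bar N_1 \bar B,\Delta_1)$, and by the induction hypothesis
$$A \stackrel{\Delta_1}{\Rightarrow} M B N_1.$$
Also $\delta g^k \Psi_k(a)$ contains $(B\bar N_2,\Delta_2)$ and by the basis
$$B \stackrel{\Delta_2}{\Rightarrow} a N_2.$$
It follows that
$$A \stackrel{\Delta_1 \Delta_2}{\Rightarrow} M a N_2 N_1$$
and since $\hat M = Ma$ and $N = N_2 N_1$,
$$A \stackrel{\Delta}{\Rightarrow} \hat M N,$$
which completes the proof.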
The theorem now follows from Lemmas I and III and Assertion 2.
The 'if' part follows from Lemma I and Assertion 2, and the 'only if'
part follows immediately from Lemma III for the special case of $N = \Lambda$.
As we have stated the theorem, the length of x is used to
determine a sufficient number of applications of g and $\Psi$. Alternatively,
the theorem could be formulated in terms of the heights of derivations
of x; if $\Delta$ is a derivation of x of height k, then for every $n \ge k$,
the term $(S,\Delta)$ will be in the polynomial $\delta g^n \Psi_n(x)$.
Furthermore, it
follows from Lemma III that no harm is done by choosing the value of
n too large, i.e., no 'false' derivation terms will occur.
In the first statement of the theorem, the derivation terms
are obtained from the polynomial $\delta g^n \prod_{i=1}^{n} p^n v(x_i)$ which can be
rewritten in the form
$$\delta\,[\,g^n p^n v(x_1)\; g^n p^n v(x_2) \cdots g^n p^n v(x_n)\,].$$
Although we have used a constant value of n (equal to the length of
x) for both the powers of the map g and the polynomial p, some
economy can be gained in this respect. In fact, the powers of g and
p can decrease from left to right so long as they remain large
enough to perform the appropriate computations on the suffix strings
of x. Thus, the theorem is true (but considerably more difficult to
prove) if one instead uses a parsing polynomial in which the powers
decrease in this manner.
V. Special cases of the theorem
A number of interesting special cases occur based on the choice
of $V_1$ and $V_2$.
Case 1. $V_1 = V_T$.
The function g handles all productions of the form
$$A \to aM, \quad a \in V_T,$$
while p handles productions of the form
$$A \to BM, \quad B \in V_N.$$
Notice that since g is nontrivial on only $V_T$, g need be used only
once; i.e.,
$$g^k = g \quad \text{for } k \ge 1.$$
The parsing polynomial is then
$$\delta\, g\, \Psi_n(x).$$
The special case of $V_1 = V_T$ and $V_2 = V_N$
results in a particularly
simple form if the grammar is in Greibach normal form. The polynomial
$p = (\Lambda,\Lambda)$ and therefore has no effect. Since g need only be applied
once, all derivations are found in one step.
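Example 1:
$G = \langle \{S,A,B\}, \{a,b\}, P, S \rangle$
P = 1. $S \to aA$
2. $A \to AB$
3. $A \to a$
4. $B \to b$
For the string x = aabb, the parsing polynomial $g[\Psi_k(x)]$ then contains
(among other things) for all $k \ge 2$,
$$[g\,v(a)]\,[p^2]\,[g\,v(a)]\,[g\,v(b)]\,[g\,v(b)].$$
This contains:
$$[(S,1)(\bar A,\Lambda)]\,[(A,2)(\bar B,\Lambda)(\bar A,\Lambda)(A,2)(\bar B,\Lambda)(\bar A,\Lambda)]\,[(A,3)]\,[(B,4)]\,[(B,4)]$$
Applying $\delta$ we get
$$(S,122344).$$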
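Case 2. $V_1 = V$.
The entire job of parsing is now done by g, since the polynomial
p is equal to $(\Lambda,\Lambda)$. Hence the parsing polynomial is
$$\delta\, g^n\, v(x).$$
Example 2: We use the same grammar and input string as above.
$V_1 = \{S, A, B, a, b\}$.
$V_2 = \emptyset$.
$g(S,\Lambda) = (S,\Lambda)$
$g(A,\Lambda) = (A,\Lambda) + (A,2)(\bar B,\Lambda)$
$g(B,\Lambda) = (B,\Lambda)$
$g(a,\Lambda) = (a,\Lambda) + (S,1)(\bar A,\Lambda) + (A,3)$
$g(b,\Lambda) = (b,\Lambda) + (B,4)$
The parsing polynomial for aabb is
$$\delta\, g^k\,[\,v(a)\,v(a)\,v(b)\,v(b)\,].$$
For $k \ge 3$, this contains
$$\delta\,[g(a,\Lambda)]\,[g^3(a,\Lambda)]\,[g(b,\Lambda)]\,[g(b,\Lambda)]$$
which in turn contains
$[(S,1)(\bar A,\Lambda)]\,[g^2(A,3)]\,[(B,4)]\,[(B,4)]$ after one application of g,
$[(S,1)(\bar A,\Lambda)]\,[(A,223)(\bar B,\Lambda)(\bar B,\Lambda)]\,[(B,4)]\,[(B,4)]$ after three.
Applying $\delta$ results in $(S,122344)$ as before.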
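The computation in Example 2 can be reproduced mechanically. The following is a minimal sketch of the Case 2 procedure only (all of V in the first part of the division, so p plays no role), under several assumptions of ours: pairs are encoded as (symbol, barred, indices) triples, a term is a tuple of such pairs, a polynomial is a Python set of terms, and the names `g_pair`, `g`, `delta` and `parses` are hypothetical. It is an unoptimized illustration, not the authors' implementation, and it makes no attempt to discard dead-end terms early.

```python
from itertools import product

# Productions of Example 1, indexed as in the paper.
PRODUCTIONS = {1: ("S", "aA"), 2: ("A", "AB"), 3: ("A", "a"), 4: ("B", "b")}


def g_pair(pair):
    """Images of one generator (symbol, barred, indices) under g, as terms."""
    sym, barred, idx = pair
    if barred:
        return [(pair,)]                  # g is the identity on barred symbols
    images = [(pair,)]                    # rule 2i: the pair itself
    for i, (lhs, rhs) in PRODUCTIONS.items():
        if rhs[0] == sym:                 # rule 2ii: production begins with sym
            tail = tuple((b, True, ()) for b in reversed(rhs[1:]))
            images.append(((lhs, False, (i,) + idx),) + tail)
    return images


def g(poly):
    """Extend g homomorphically to a set of terms (a boolean polynomial)."""
    out = set()
    for term in poly:
        for choice in product(*(g_pair(p) for p in term)):
            out.add(tuple(q for part in choice for q in part))
    return out


def delta(term):
    """Coalesce a term: cancel a-bar a in the first entries, catenate indices."""
    stack, indices = [], ()
    for sym, barred, idx in term:
        indices += idx
        if not barred and stack and stack[-1] == (sym, True):
            stack.pop()
        else:
            stack.append((sym, barred))
    return tuple(stack), indices


def parses(x, start="S"):
    """Index sequences D such that (start, D) occurs after len(x) rounds of g."""
    poly = {tuple((a, False, ()) for a in x)}   # v(x), one pair per input symbol
    for _ in range(len(x)):
        poly = g(poly)
    return {idx for first, idx in map(delta, poly) if first == ((start, False),)}
```

Calling `parses("aabb")` yields the single index sequence (1, 2, 2, 3, 4, 4), matching the term (S,122344) above, while a string outside the language such as "ab" yields the empty set.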
Case 3. $V_1 = \emptyset$.
Now the entire parse is handled by p. The parsing polynomial
becomes
$$\delta\, \Psi_n(x) = \delta \prod_{i=1}^{n} p^n\, v(x_i).$$
VI. Observations
The major theorem presented here shows how context-free
parsing may be carried out by purely algebraic means. All parses
of an input string are developed in parallel and the process is
guaranteed to terminate. As we have described the process, the
number of terms of a parsing polynomial for a string $x \in V_T^+$ is
unreasonably large. However, most of the terms in such a polynomial
are not associated with a derivation in the grammar, and methods
exist for reducing the computation by disregarding dead-end terms
before they are completely evaluated. By applying such techniques in
a straightforward fashion, and choosing $V_1$ and $V_2$ in various ways,
the algebraic method can be associated in natural ways with classical
parsing techniques. For example, the algebraic process in case 1
above is a goal directed top-down approach similar to the predictive
analyzer. Case 2 is the algebraic version of generalized bottom-up.
Parsing algorithms are typically so different one from another
that they are incomparable. But using techniques described above, 
many parsing algorithms may be posed in a single algebraic framework. 
This may facilitate the comparison and evaluation of parsers and 
of various classes of grammars. 
American Journal of Computational Linguistics Microfiche 55 : 61 
Department of Computer Science 
Cornell University 
Ithaca, New York 14853 
This work was supported in part by the National Science 
Foundation under grant GJ 43505. 
A number of statistical theories have been proposed capable of
identifying individual text words that are most useful for the
content representation of written texts and documents. Among
these are parameters based on the variance of the word-frequency
distribution (NOCC/EK), and on information theoretical
(signal-noise S/N) premises. These formal parameters are related to
practical automatic indexing techniques--most notably to the
discrimination value (DV) method, capable of generating content
identifiers (individual words, phrases, and word classes) that
distinguish the various texts and documents from each other.
It is shown that terms with favorable formal parameters also
exhibit desirable semantic characteristics in that such terms
are concentrated in documents judged relevant by the respective
user populations, and vice-versa for terms with unfavorable
formal properties.
1. Theories of Term Importance 
Automatic indexing may be considered to be a two-step process:
first the automatic identification of linguistic entities useful
for the representation of document content, and then the assignment
to the prospective content identifiers of weights reflecting
their importance for content description. Since these tasks
must ultimately depend on a study of the texts or documents
under consideration, a great deal can be learned by examining
Term Value Measurements 
the occurrence patterns of words and other linguistic entities in the documents
of a collection. Indeed, among the theories of term importance which have
been studied in recent years, the best known ones are based on the respective
frequency distributions across a variety of written texts.
A) Variance-Based Measures 
The most widely used of the statistical theories distinguishes so-called
"specialty" words from "nonspecialty" words by assuming that a deviation from
randomness in the occurrence pattern of certain text words is indicative of
specialization and hence of good content identifiers. Thus the best content
descriptors are terms whose occurrence patterns deviate most strongly from
randomness. Since a random sprinkling of the occurrences of a given text word
across the documents of a collection leads to word frequency distributions
which follow the Poisson model, a comparison of the actual frequency
characteristics of a given term with the Poisson distribution leads to the
appropriate distinction between good content words and poor ones.
More specifically, since the variance $v^k$ of the frequency distribution
of term k is proportional to the total frequency of occurrence $F^k$ for terms
whose distribution obeys the Poisson model, a measure of term importance is
obtainable by using a formula based on the ratio of $v^k$ to $F^k$. Some
typical formulas used for this purpose are $v^k/F^k$ and a variant of this
ratio scaled by a function of $n$, where $n$ is the collection size. [1,2,3]
The basic mathematical formulations are collected in Table 1.
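The variance-to-frequency idea can be illustrated numerically. The following is a minimal sketch, not one of the published formulas: it assumes the population variance of the per-document frequencies and the scaling n·v/F, chosen here only because a Poisson-like term (variance roughly equal to the mean F/n) then scores near 1. The function name `variance_ratio` and the sample data are hypothetical.

```python
def variance_ratio(freqs):
    """Return n * v / F for one term, where freqs[i] is its frequency in document i."""
    n = len(freqs)
    total = sum(freqs)                                   # F^k, total frequency in the collection
    mean = total / n                                     # F^k / n, average frequency per document
    variance = sum((f - mean) ** 2 for f in freqs) / n   # population variance v^k
    return n * variance / total


even = [2, 2, 2, 2, 2]       # identical count in every document
bursty = [10, 0, 0, 0, 0]    # all occurrences concentrated in one document

print(variance_ratio(even), variance_ratio(bursty))
```

With the same total frequency, the evenly spread term scores zero while the concentrated term scores well above 1, which is the deviation from randomness that the variance-based theories reward.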
Symbol     Definition
n          number of documents in the collection
f_i^k      frequency of term k in document i
b_i^k      binary frequency of term k in document i
F^k        total frequency of term k in collection
B^k        document frequency of term k in collection
           (number of documents in which the term occurs)
F^k/n      average frequency of term k in collection

Basic Frequency Formulas
Table 1
One such variance-based measure used by Dennis under the name of
NOCC/EK [3] may be computed as
It is obvious from this formulation that the most effective terms are those
whose occurrence frequencies $f_i^k$ in the individual documents deviate strongly
from the average frequency $F^k/n$.
B) Signal-Noise Measure 
Another measure based on the characteristics of the frequency distribution
of individual text units across the documents of a collection is the signal-noise
ratio, which varies with the skewness of the frequency distribution. This
measure has the form of entropy and assigns the highest value to those terms
whose occurrence characteristics exhibit the greatest variation from one
document to another; contrariwise, low values are assigned to terms with
relatively similar frequency patterns in each of the documents of a
collection. [3,4] The idea is that terms with even frequency distributions,
which may occur an identical number of times in each document of the
collection, cannot be used to distinguish the documents from each other; hence,
their assignment for purposes of content representation is counterproductive.
The reverse obtains for terms with skewed frequency distributions.
The signal-noise value (S/N) for term k is defined as

$$(S/N)^k = \log F^k - \sum_{i=1}^{n} \frac{f_i^k}{F^k} \log \frac{F^k}{f_i^k} \qquad (2)$$
The negative term in expression (2) is known as the noise $N^k$; it is
maximized for even distributions where $f_i^k = F^k/n$ for all $i$. The
properties of the signal-noise measure are thus very similar to those
described earlier for the variance-based formulas.
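A small numerical sketch may make the behavior of the noise term concrete. It assumes that expression (2) reads (S/N)^k = log F^k − Σ (f_i^k / F^k) log (F^k / f_i^k), uses natural logarithms (the base is an assumption), and skips documents in which the term does not occur. The function names are hypothetical.

```python
import math


def noise(freqs):
    """Noise N^k = sum_i (f_i / F) * log(F / f_i), skipping documents where f_i = 0."""
    total = sum(freqs)
    return sum((f / total) * math.log(total / f) for f in freqs if f > 0)


def signal(freqs):
    """Signal value log F^k - N^k, following the assumed reading of expression (2)."""
    return math.log(sum(freqs)) - noise(freqs)


even = [2, 2, 2, 2, 2]            # f_i = F/n in every document: noise is maximal
concentrated = [10, 0, 0, 0, 0]   # all occurrences in one document: noise is zero

print(noise(even), signal(even))
print(noise(concentrated), signal(concentrated))
```

For the even distribution the noise equals log n, so the signal drops to log(F/n); for the fully concentrated term the noise vanishes and the signal equals log F.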
C) Information Theoretic Considerations
The foregoing development leads to a distinction among the terms in
accordance with the relative sizes of the individual term frequencies $f_i^k$
in the documents and the total collection frequency $F^k$. A question
arises about the preferred size of the collection frequency $F^k$ (or of the
document frequency $B^k$) for terms that are useful as content identifiers.
This problem may be tackled by having recourse to certain information-theoretic
concepts. Consider the task of supplementing a set of existing index terms
identifying a collection of documents by addition of a certain number of new
terms. Each new term is then most effective when
a) it provides maximum additional reduction in uncertainty among the
   documents of the collection (that is, its assignment breaks up
   existing subsets of documents that cannot be distinguished by the
   existing term assignments into substantially smaller subsets);
b) it exhibits little redundancy with the previously available terms
   so that its assignment does indeed optimally divide the various
   document sets.
The first property is obviously not fulfilled for terms with low
document frequency $B^k$, that is, those assigned to very few documents in the
collection, because their assignment provides little additional discrimination
among the documents; the second property, on the other hand, does not obtain
for terms of high document frequency that may be assigned to a very large
number of documents, because such terms will obviously exhibit a good deal of
redundancy with the already existing terms.
The conclusion is that the best terms are those whose document frequency
$B^k$, or total frequency $F^k$, is neither too large nor too small, and whose
frequency distribution is skewed in that for some documents $f_i^k$ is much
larger than $F^k/n$ and for some others $f_i^k$ is much smaller than $F^k/n$.
D) The Discrimination Value Model 
The discrimination value model uses as a point of departure the retrieval
capability of the various index terms; specifically, a good content-indicative
term is designed to help in the retrieval of material that is wanted (thus
enhancing the recall), and in the rejection of material that is extraneous
(thus enhancing the precision).* To produce high recall, that is to retrieve
most everything that is relevant, the terms used to identify documents and user
queries must be fairly general in nature; high precision, on the other hand,
that is the rejection of the nonrelevant material, depends on the use of
reasonably specific content identifiers. The indexing problem then reduces to
the choice of terms that are specific enough to produce high precision while
also being general enough to produce high recall.
In the discrimination value model, the assumption is made that the best
terms in this respect are those which cause the maximum possible separation
among the documents in the "document space". Consider, in particular, a collection
of documents each identified by a set of content identifiers, or index terms.
The index term sets for two given documents can be compared to produce a
similarity coefficient measuring the closeness between the respective documents.
* Recall is the proportion of relevant material retrieved while precision is
the proportion of retrieved material that is relevant. An effective
retrieval system is one which produces the highest possible precision for a
given level of recall.
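The two definitions in the footnote translate directly into set arithmetic. The sketch below is illustrative; the function name `recall_precision` and the document identifiers are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query, given document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                        # relevant items actually retrieved
    recall = hits / len(relevant) if relevant else 0.0      # share of relevant material retrieved
    precision = hits / len(retrieved) if retrieved else 0.0 # share of retrieved material that is relevant
    return recall, precision


recall, precision = recall_precision(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(recall, precision)
```

Here two of the three relevant documents are retrieved (recall 2/3) and two of the four retrieved documents are relevant (precision 1/2).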
The existence of the term sets representing the various documents, and the
possibility of computing similarity measures between documents can be
used to define a document space for the collection. In such a space two
documents appear in close proximity when their similarity coefficient is
large; contrariwise, documents exhibiting little similarity are widely
separated in the document space. One may then conjecture that a document
space which is "bunched up", in the sense that all documents exhibit
somewhat similar term sets, is not useful for retrieval, since one document
cannot then be distinguished from another. On the contrary, a space which
is spread out in such a way that the documents are widely separated from
each other may provide an ideal retrieval situation since some documents may
then be retrieved - hopefully the relevant ones - while others can be
rejected.
This suggests that the value of an index term can be ascertained by
measuring the amount of spreading in the document space which occurs when
that term is assigned to the documents of the collection. Specifically, if
$Q$ is the density of the document space without term k present among the
content indicators, and $Q_k$ is the density after term k is assigned, then
for a good term $Q - Q_k > 0$, since the space will have spread after term k is
assigned. Conversely, for poor terms $Q - Q_k < 0$.* [5,6] An appropriate
* The density of the space might be computed, for example, as the sum of all
pairwise similarities between distinct document pairs, that is
$$Q = \sum_{i \neq j} S(D_i, D_j)$$
where $S(D_i, D_j)$, $0 \le S \le 1$, is the similarity between documents
$D_i$ and $D_j$.
measure of term importance is then the term discrimination value, $DV_k$,
defined as
It may be of interest to inquire into the relationship between the
discrimination value of a term and the statistical (frequency) parameters
introduced earlier. The following conclusions are reached from a study of
the indexing vocabularies in several different subject areas, relating the
document frequency of a term to its discrimination value: [5]
a) terms with very low document frequency that may be assigned to
   very few documents in a collection are generally poor discriminators;
   when the terms are arranged in decreasing order of their discrimination
   values (where rank 1 is assigned to the best discriminator, rank 2
   to the next best, and so on) such terms exhibit ranks in excess
   of t/2 for a total of t existing terms;
b) terms with high document frequencies, comprising those that are
   assigned to more than 10 percent of the documents of a collection, are
   the worst discriminators, with average discrimination ranks (ranks in
   decreasing discrimination value order) near t;
c) the best discriminators are those whose document frequency is neither
   too high nor too low, with document frequencies between n/100 and
   n/10 for n documents; their average discrimination ranks are generally
   below t/5 for t terms.
The vector space analysis then appears to confirm the conclusions derived
earlier from the statistical models, that terms which appear in a collection
with great rarity or excessive frequency are not optimal for content
description purposes.
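The density comparison behind the discrimination value can be sketched on a toy collection. The code below is an illustration under stated assumptions, not the procedure used in the experiments: it takes the cosine coefficient as the similarity S, counts each unordered document pair once, and scores a term as Q − Q_k (density without the term minus density with it). The function names and the three-document collection are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def density(docs):
    """Sum of pairwise similarities, each unordered document pair counted once."""
    return sum(cosine(u, v) for u, v in combinations(docs, 2))


def discrimination_value(docs, k):
    """Q - Q_k: density with term k removed minus density with term k assigned."""
    without_k = [[w for j, w in enumerate(doc) if j != k] for doc in docs]
    return density(without_k) - density(docs)


# Rows are documents, columns are terms 0..2.
docs = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
dv_common = discrimination_value(docs, 0)   # term 0 occurs in every document
dv_specific = discrimination_value(docs, 1) # term 1 occurs in one document only
print(dv_common, dv_specific)
```

In this toy space the term assigned to every document bunches the documents together and receives a negative value, while the term assigned to a single document spreads the space and receives a positive one.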
2. Comparison and Evaluation
The discrimination value analysis can be used to derive an effective
indexing policy: since the best terms appear to be those with medium
document frequencies, such terms can be directly assigned as content
identifiers without further refining transformations. On the other hand, terms
with excessively high document frequencies must be made more specific, thereby
decreasing the frequency of their assignment to the queries and documents
of the collection; contrariwise, terms with low document frequencies must
be made more general by increasing their assignment frequencies. [5] This can
be achieved by joining two or more high frequency terms into term phrases,
while assembling a number of low frequency terms into term classes.
Obviously, a term phrase exhibits a lower assignment frequency than any phrase
component, and vice-versa for a term class which replaces a number of
individual class elements.
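The indexing policy amounts to a three-way decision on document frequency. A minimal sketch follows, assuming the cut-offs n/100 and n/10 quoted earlier for the best discriminators are used as hard thresholds (the text presents them as empirical ranges, not fixed rules). The function name and the action labels are hypothetical.

```python
def indexing_action(doc_freq, n_docs):
    """Choose how to treat a term from its document frequency in a collection of n_docs."""
    low, high = n_docs / 100, n_docs / 10
    if doc_freq > high:
        return "combine into term phrase"    # too frequent: make more specific
    if doc_freq < low:
        return "group into term class"       # too rare: make more general
    return "assign directly"                 # medium frequency: use as is


# A collection of 424 documents, so the thresholds are 4.24 and 42.4.
actions = {df: indexing_action(df, 424) for df in (3, 20, 60)}
print(actions)
```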
It was shown earlier that the use of phrases and term classes (thesaurus)
constructed in accordance with the frequency requirements imposed by the
discrimination value theory produces substantial improvements in retrieval
effectiveness (recall and precision). In the present work, additional
relationships are examined between the statistical and the vector space models.
However, instead of actually using the various term sets in a retrieval
environment, an attempt is made to relate the formal frequency and vector
space properties of the terms to the semantic characteristics of these terms.
Specifically, consider a collection of documents in a given subject area
and an appropriate set of user queries pertaining to that area. For each user
query, the set of documents can be partitioned into two subsets consisting of the
relevant set R and the nonrelevant set I, respectively. Relevance is
assumed to be user-specified in such a way that a relevant item is assumed
to be one which is related in some sense to the information need expressed
by the various user queries. The linguistic, or semantic, character of a
given term can now be introduced by assuming that the most valuable content
identifiers assigned to a collection of texts are those which are concentrated
in the documents specified as relevant to the respective queries, as opposed
to the nonrelevant ones; contrariwise, the less valuable terms will be
concentrated in the nonrelevant items.
The discussion may be formalized by using the concept of term
relevance TR. [7] Consider a term k contained in query Q; the term
relevance TR(k) may be defined as

$$TR(k) = \frac{r_k / (|R| - r_k)}{h_k / (|I| - h_k)} \qquad (4)$$

where $r_k$ and $h_k$ are the number of documents containing term k that are
relevant and nonrelevant respectively to query Q, and $|R|$ and $|I|$ are the
total number of relevant and nonrelevant documents for that query.* When a
term k occurs in more than one query, its term relevance may be taken as the
average of the relevance values obtained for the various queries.
* The mathematically undesirable situation when $|R| = r_k$ or when $h_k = 0$
is not likely to occur in a practical environment.
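A short computational sketch of the term relevance follows. It assumes that TR(k) is the ratio of r_k / (|R| − r_k) to h_k / (|I| − h_k), a reading consistent with the footnote, which singles out |R| = r_k and h_k = 0 as the undefined cases. The function names and the sample counts are hypothetical.

```python
def term_relevance(r_k, h_k, num_relevant, num_nonrelevant):
    """TR(k) for one query: odds of the term among relevant over odds among nonrelevant items."""
    if r_k == num_relevant or h_k == 0:
        raise ValueError("term relevance is undefined when |R| = r_k or h_k = 0")
    relevant_odds = r_k / (num_relevant - r_k)
    nonrelevant_odds = h_k / (num_nonrelevant - h_k)
    return relevant_odds / nonrelevant_odds


def average_term_relevance(per_query_counts):
    """Average TR(k) over the queries in which term k occurs."""
    values = [term_relevance(*counts) for counts in per_query_counts]
    return sum(values) / len(values)


# Term found in 4 of 10 relevant and 6 of 414 nonrelevant documents for one query.
single = term_relevance(r_k=4, h_k=6, num_relevant=10, num_nonrelevant=414)
averaged = average_term_relevance([(4, 6, 10, 414), (1, 20, 5, 419)])
print(single, averaged)
```

A term concentrated in the relevant documents, as in the first query, receives a value well above 1; a term spread mainly over nonrelevant documents receives a value near or below 1.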
It is clear from the function (4) that high values are assigned to
those query terms which are prevalent in the relevant items and rare in
the nonrelevant, and vice-versa for those prevalent mainly in the nonrelevant.
Furthermore, the terms falling into the former class are likely to be more
useful for content representation than those in the latter.
To verify the relationships between the statistical models of word
importance and the vector space model, document collections are used in three
different subject areas, including aerodynamics (CRAN), medicine (MED) and
world affairs (TIME). The vocabularies and user populations are disjoint
for these three areas. Results which carry through for all three cases
should be extendable to other subject fields as well. The basic collection
statistics are contained in Table 2.
It may be seen from the Table that the term relevance is defined for
only a relatively small number of terms for each collection, namely 458, 172
and 375 for CRAN, MED, and TIME, respectively. The reason is that a term
relevance value is computable only for terms which occur jointly in certain
query-document pairs. For small experimental collections operating with a
restricted number of queries the size of the corresponding term sets is
obviously limited.
Consider now the comparison of the standard statistical term value
measures with the term discrimination values obtained by the vector space
transformations. Table 3 shows the values of the NOCC/EK and S/N measures
(expressions (1) and (2)) obtained for the 50 terms with highest discrimination
values and the 50 terms with lowest discrimination values for each of the three
test collections. The range of the respective values is given in each case,
as well as the average values for each set of 50 terms in percent (that is, on
Characteristics                  CRAN 424        MED 450      TIME 425
Subject area                     aerodynamics    medicine     world affairs
Number of documents              424             450          425
Number of user queries           155             24           83
Number of terms assigned
  to collection                  2651            4726         7569
Number of terms occurring
  jointly in queries
  and document sets              458             172          375

Basic Collection Statistics
Table 2
a scale of 0 to 100). T-test values are also shown representing the
probability that the two sets of 50 values (for the high DV and low DV
terms) could have been derived from a common probability distribution
by chance. In statistical significance testing, a t-test value smaller
than 0.05 is normally taken to imply a significant difference; that is,
the hypothesis that the two sets of values do in fact originate from a
common distribution is rejected in such a case. [8]
It may be seen that the ranges of values for the statistical parameters
NOCC/EK and S/N exhibit substantial differences for all three collections.
The same is true for the corresponding average values. Moreover the
differences are in all cases statistically significant. It is then clear
that a high discrimination value reflected in the ability of a term to
expand the document space upon assignment to the collection also implies
favorable statistical parameters in terms of variance and skewed frequency
distributions; the converse is true for the low discrimination values.
At the bottom of Table 3, range and average values are given for those
terms among the sets of 50 terms for which the term relevance is defined
(that is, those which co-occur jointly in some query-document pair).
Again the term relevance values are substantially different for the two
classes of DV terms, and these differences are statistically significant.
Also included in Table 3 are the multiplicative factors which relate
the average values for the 50 high discriminators and the 50 low
discriminators for each of the three measures (that is, the factor by
which the low average value must be multiplied to obtain the high).
It may be seen that this factor is much higher for the term relevance
than for either of NOCC/EK or S/N. The actual factors for the term
relevance are 6.66, 80.0 and 36.33 for the CRAN, MED, and TIME collections,
respectively. This indicates that the high discriminators have very much
higher average term relevance than the low discriminators; alternatively
expressed, there is substantial agreement between the semantic term
relevance concept and the automatically derived term discrimination values.
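The quoted factors can be reproduced from the average term relevance values reported in Table 3 (14.06% and 2.11% for CRAN, 16.0% and 0.20% for MED, 15.26% and 0.42% for TIME). The short check below is only an arithmetic illustration; the variable names are hypothetical.

```python
# (average for high discriminators, average for low discriminators), in percent,
# as reported for the term relevance rows of Table 3.
averages = {
    "CRAN": (14.06, 2.11),
    "MED": (16.0, 0.20),
    "TIME": (15.26, 0.42),
}

# Factor by which the low average must be multiplied to obtain the high average.
factors = {name: round(high / low, 2) for name, (high, low) in averages.items()}
print(factors)
```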
The data already included in Table 3 are shown in term relevance order 
in Table 4. The output of Table 4 contains range and average values for 
NOCC/EK, S/N, and DV for the 50 terms with highest term precision and the 
50 terms with lowest precision for the CRAN and TIME collections, respectively. 
Averages are produced for only 30 high and 30 low precision terms for the 
MED collection because in the medical environment the small number of 
available queries (24) made it possible to compute term precision values 
for only 172 terms in all. 
It is clear from the output of Table 4 that the differences in the
respective values are substantial in all cases, and the t-test values
indicate that they are fully significant. For the three collections under
study, the evidence indicates that terms with favorable formal parameters tend
to be concentrated in documents identified as relevant by the user population,
and vice-versa for terms with unfavorable formal parameters. Also shown in
Table 4 are average document frequency ($\bar{B}^k$) and average total frequency ($\bar{F}^k$)
values for the high and low relevance terms respectively. It may be seen that the
high relevance terms exhibit a much lower frequency spectrum (as expected
for good discriminators) than the low relevance terms. Once again, it
appears that the term relevance reflecting the semantic properties of the
terms in their particular collection environment effects a division among
the terms very similar to that obtained by the discrimination value
computations.
In earlier work it was shown that the discrimination value theory which
leads to the assignment to queries and documents of medium frequency terms
(including also phrases constructed from high frequency terms, and term
classes made up of low frequency terms) exhibits effective retrieval
characteristics. [4,5,6] Typical average retrieval precision values for
three different recall levels (recall of 0.1, 0.5, and 0.9) are shown for
the three collections in Table 5. The output shows that the use of
medium-frequency phrases and term classes improves performance by about 20 percent
compared with the assignment of single terms alone. The comparison of
Tables 3 and 4 between discrimination values on the one hand, and statistical
and semantic parameters on the other, indicates that the same theory which
produces such effective retrieval characteristics also conforms to the known
statistical and linguistic theories of term behavior.
a) CRAN 424 Collection

                              50 Terms with High       50 Terms with Low
                              Discrimination Values    Discrimination Values
NOCC/EK
  range                       [illegible]              [illegible]
  average (in percent)        [illegible]              [illegible]
  t-test                                  [illegible]
  average high/average low                [illegible]
S/N
  range                       1.954 to 0.699           1.222 to 0.000
  average (in percent)        [illegible]              59.95%
  t-test                                  0.00002
  average high/average low                [illegible]
Term Relevance TR
  range                       392.66 to 0.00           74.35 to 0.00
  average (in percent)        14.06%                   2.11%
                              (21 terms only)          (24 terms only)
  t-test                                  0.02208
  average high/average low                6.66

Comparison of Statistical Models with
Term Discrimination Values
Table 3
b) MED 450 Collection

                              50 Terms with High       50 Terms with Low
                              Discrimination Values    Discrimination Values
NOCC/EK
  range                       9215 to 1359             7614 to 531
  average (in percent)        29.51%                   15.61%
  t-test                                  0.00002
  average high/average low                1.89
S/N
  range                       2.792 to 0.693           1.738 to 0.126
  average (in percent)        48.46%                   23.93%
  t-test                                  0.00002
  average high/average low                2.03
Term Relevance TR
  range                       874.00 to 0.00           9.43 to 0.00
  average (in percent)        16.0%                    0.20%
                              (12 terms only)          (24 terms only)
  t-test                                  0.04274
  average high/average low                80.0

Comparison of Statistical Models with
Term Discrimination Values (cont.)
Table 3
c) TIME 425 Collection

                              50 Terms with High       50 Terms with Low
                              Discrimination Values    Discrimination Values
NOCC/EK
  range                       13010 to 2330            4712 to 451
  average (in percent)        37.5%                    10.81%
  t-test                                  0.00002
  average high/average low                3.46
S/N
  range                       2.966 to 1.424           1.876 to 0.231
  average (in percent)        68.85%                   26.44%
  t-test                                  0.00002
  average high/average low                2.60
Term Relevance TR
  range                       2459.00 to 62.62         27.73 to 0.44
  average (in percent)        15.26%                   0.42%
                              (12 terms only)          (23 terms only)
  t-test                                  0.33921
  average high/average low                36.33

Comparison of Statistical Models with
Term Discrimination Values (cont.)
Table 3
a) CRAN 424 Collection

                              50 High Relevance Terms      50 Low Relevance Terms
                              (avg B^k = 10.3,             (avg B^k = 58.9,
                               avg F^k = 24.6)              avg F^k = 84.0)
NOCC/EK
  range                       3657 to 420                  1584 to 432
  average                     38.95%                       20.66%
  t-test                                  0.00002
  average high/average low                1.89
S/N
  range                       1.953 to 0.000               0.998 to 0.045
  average                     42.81%                       20.63%
  t-test                                  0.00002
  average high/average low                2.08
DV
  range                       1.223 to 0.002               0.075 to -1.283
  average                     65.52%                       25.06%
  t-test                                  0.00140
  average high/average low                2.61

Comparison of Term Relevance with
Term Discrimination Values
Table 4
b) MED 450 Collection

                              30 High Relevance Terms      30 Low Relevance Terms
                              (avg B^k = 9.5,              (avg B^k = 22.5,
                               avg F^k = 24.0)              avg F^k = 41.9)
NOCC/EK
  range                       2648 to 521                  2248 to 1140
  average                     48.01%                       36.33%
  t-test                                  0.02378
  average high/average low                1.32
S/N
  range                       1.664 to 0.126               1.259 to 0.000
  average                     61.0%                        46.33%
  t-test                                  0.00272
  average high/average low                1.32
DV
  range                       0.135 to 0.006               0.688 to -1.030
  average                     62.11%                       56.11%
  t-test                                  0.00621
  average high/average low                1.11

Comparison of Term Relevance with
Term Discrimination Values (cont.)
Table 4
c) TIME 425 Collection

                              50 High Relevance Terms      50 Low Relevance Terms
                              (avg B^k = 12.5,             (avg B^k = 94.5,
                               avg F^k = [illegible])       avg F^k = [illegible])
NOCC/EK
  range                       13010 to 1117                2266 to [illegible]
  average                     16.1%                        3.4%
  t-test                                  0.00002
  average high/average low                4.74
S/N
  range                       2.966 to 0.000               1.376 to 0.126
  average                     42.31%                       19.25%
  t-test                                  0.00002
  average high/average low                2.20
DV
  range                       [illegible] to 0.000         [illegible] to -1.862
  average                     94.05%                       83.0%
  t-test                                  0.00148
  average high/average low                1.13

Comparison of Term Relevance with
Term Discrimination Values (cont.)
Table 4
[Table 5: Average Retrieval Precision for Various Recall Levels. Columns: CRAN 424, MED 450, TIME 425. Rows: A) Low Recall (0.1), B) Medium Recall (0.5), C) High Recall (0.9); each with i) single terms and ii) single terms, phrases and term classes.]

Recall-Precision Performance for
Medium Frequency Terms
(Discrimination Value Theory)
Table 5
American Journal of Computational Linguistics 
Microfiche 55 : 84 
SNOPAR: 
Department of Mathematics 
Texas Woman's University 
Denton, Texas 76204 
A grammar testing program has been developed which permits
modeling augmented transition network grammars as a series
of SNOBOL4 functions. SNOPAR is designed for linguistics
teaching and research. Emphasis is placed on the development
of small to medium grammars in a variety of languages. The
system has been used so far to develop a grammar of English
for use in a transformational grammar course and to develop
small grammars of a Nigerian and an American Indian language.
Intended applications of SNOPAR are in field linguistics and
grammar model testing.
The main part of the program is the routine PARSER. When
PARSER is called with a lexicon and grammar, input strings are
parsed according to the model grammar. The PARSER functions
available for grammar development are CAT, PARSE, SETR, GETR,
RESET, TESTR, GETF, GETCL, TO, BACK, FINDWRD, and BUILDS. The
function operations and descriptions of their arguments are
given in Table 1. After a parsing, PARSER returns control to
the user permitting examination of stacks and registers at all
TABLE 1
PARSER

CAT      looks up the word class of the current first word in
         the input string. If the word is not in the lexicon,
         an add routine is called which permits additions. If
         CAT succeeds by matching the current word class with
         its argument, the word is removed from the input string
         and pushed onto a stack (SAVEW). If it fails, an
         alternate class is tested, provided that the alternate
         flag is on. Fail return leaves the surface string
         unaltered.
PARSE    calls the function given by its argument and, if
         successful, pushes the structure returned by the function
         onto a stack (SAVEQ) and assigns the structure to the Q
         register.
SETR     sets the values of registers. It has three arguments:
         level, register name, and value. Each call of SETR
         causes the register name specified to be placed on
         a list for the specified level. SETR entries are
         treated as stacks, providing automatic saves for
         recursive calls.
GETR     returns the contents of the register name specified
         by its argument, and pops it off the stack, saving
         the last value.
TESTR    looks at the value of the register name specified
         by its argument without popping it off the stack.
RESET    changes the value of a register without changing
         stack levels.
GETF     looks up the feature value for a feature specified
         by its argument of the current value of the word
         register. Any word can be specified by giving a second
         argument. If GETF fails for the word, it looks at the
         root form of the word for certain features.
GETCL    looks up the word class of the word specified by its
         argument.
TO       has as its argument the new state label. It pushes the
         label onto a stack (PATH), outputs the state, outputs
         the contents of the Q register, and transfers control to
         the new state.
BACK     backs to the state specified by its argument.
FINDWRD  tests for the word specified by its argument.
BUILDS   builds a structure from the register name list.
SNOPAR 
levels. In the examination stage, traces may be turned on,
lexical entries may be examined, or minor changes to the grammar
may be made. Functions available for the examination of
stacks, registers and lexicon are POP, OUT, GETR, LOOKLEX, and
TRACE. A function GETENG is also available for dictionary
lookup in other languages. PARSER requires approximately 150
lines of SNOBOL code and is currently operating on a DEC 10.
A batch version has been tested on an IBM 360.
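The stack behavior of the register functions listed in Table 1 can be illustrated outside SNOBOL4. The following Python sketch is only a toy model under an assumed reading of the table (SETR pushes, GETR pops, TESTR peeks, RESET overwrites the top value); it is not the SNOPAR implementation, and the class name `Registers` and the register values are hypothetical.

```python
from collections import defaultdict


class Registers:
    """Toy model of stack-valued registers (an assumed reading of Table 1)."""

    def __init__(self):
        self._stacks = defaultdict(list)   # register name -> stack of values
        self._names = defaultdict(list)    # level -> register names set at that level

    def setr(self, level, name, value):
        """Push a value and record the register name on the list for this level."""
        if name not in self._names[level]:
            self._names[level].append(name)
        self._stacks[name].append(value)

    def getr(self, name):
        """Return the current value and pop it off the stack."""
        return self._stacks[name].pop()

    def testr(self, name):
        """Return the current value without popping it."""
        return self._stacks[name][-1]

    def reset(self, name, value):
        """Overwrite the current value without changing the stack depth."""
        self._stacks[name][-1] = value


regs = Registers()
regs.setr("S", "TYPE", "DCL")
regs.setr("S", "TYPE", "QUESTION")   # e.g. a recursive call saving the outer value
peeked = regs.testr("TYPE")          # look at the top value
regs.reset("TYPE", "IMP")            # replace the top value in place
popped = regs.getr("TYPE")           # pop it, exposing the saved outer value
remaining = regs.testr("TYPE")
print(peeked, popped, remaining)
```

Treating each register as a stack is what gives the automatic saves for recursive calls mentioned in the table: the inner value is popped on return and the outer value reappears.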
In order to use PARSER, a grammar and lexicon must be
developed as disc files. Since the grammar is developed as a
separate file, different components of the grammar can be tested
and put together in a variety of configurations. If a lexicon
is not developed as a disc file prior to a parse, it may be
entered from the terminal. A simple grammar which produces
surface structure trees is shown in Example 1 along with a
sample parsing. A portion of the lexicon is shown at the
bottom of the page. Example 2 shows the use of the GETF
function to handle agreement between plural adjectives and a plural
marker in Angas, a Nigerian language. Example 3 shows a
grammar which handles sentence embedding in English. Some samples
are shown. The model used in the Example 3 grammar
is basically the one developed in English Transformational
Grammar, by Jacobs and Rosenbaum. A [illegible] grammar for
[illegible] as well as a [illegible] grammar for Choctaw (an
American Indian language) [illegible] in development.
The complete SNOPAR system has, in addition to PARSER, a
routine for generating grammars from a state transition graph
and a register action table. This routine, called NEW, guides
the user through a state transition graph and register actions
to produce a grammar compatible with PARSER. The SNOPAR NEW
routine is still in development. The current routine allows
development of small grammars. The new developments will
provide diagnostics of grammar errors. SNOPAR also has a line
editor (FIXUP) and disc I-O commands. The complete system
allows repetitive testing of model grammars, permits editing,
and has trace capabilities for grammar debugging.
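
As a rough illustration of turning a transition table into grammar text, the sketch below emits statements in the layout of the Example 1 listing. It is written in Python and is not the NEW routine itself. The row format (label, action, success state, failure state) and the function names emit_statement and generate_grammar are assumptions made only for this illustration.

```python
def emit_statement(label, action, success=None, failure=None):
    """Format one grammar statement from one row of a transition table."""
    goto = ""
    if success:
        goto += f"S(TO(.{success}))"
    if failure == "FRETURN":
        goto += "F(FRETURN)"          # fail out of the current network
    elif failure:
        goto += f"F(TO(.{failure}))"  # fail over to another state
    line = f"{label:<8}{action}"
    return f"{line:<36}:{goto}" if goto else line.rstrip()


def generate_grammar(rows):
    """Join the formatted rows into one grammar text."""
    return "\n".join(emit_statement(*row) for row in rows)


# Hypothetical table rows: label, action, success state, failure state.
NP_ROWS = [
    ("NP", "CAT('DET')", "DET", None),
    ("", "CAT('PRO')", "PRO", "FRETURN"),
    ("DET", "SETR(.NP,'DET',Q)", None, None),
]
```

Calling generate_grammar(NP_ROWS) gives three statement lines that could be written to a disc file alongside a lexicon.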
Example 1 
S       PARSE(NP())                     :S(TO(.SNP))
        CAT('AUX')                      :S(TO(.QUES))F(FRETURN)
SNP     SETR(.S,'TYPE','DCL')
        SETR(.S,'SUBJ',Q)
TRYVP   PARSE(VP())                     :S(TO(.POPS))F(FRETURN)
QUES    SETR(.S,'TYPE','QUESTION')
        SETR(.S,'AUX',Q)
        SETR(.S,'TENSE',GETF('TNS'))
        PARSE(NP())                     :S(TO(.QNP))F(FRETURN)
QNP     SETR(.S,'SUBJ',Q)               :(TO(.TRYVP))
POPS    SETR(.S,'PRED',Q)
        S = BUILDS(S)                   :(RETURN)
NP      CAT('DET')                      :S(TO(.DET))
        CAT('PRO')                      :S(TO(.PRO))
        CAT('NPR')                      :S(TO(.NPR))F(FRETURN)
NPR     SETR(.NP,'PROP',Q)              :(TO(.POPNP))
PRO     SETR(.NP,'PRO',Q)               :(TO(.POPNP))
DET     SETR(.NP,'DET',Q)
ADJ     CAT('ADJ')                      :F(TO(.TRYN))
        SETR(.NP,'ADJ',Q)               :(TO(.ADJ))
TRYN    CAT('N')                        :F(FRETURN)
        SETR(.NP,'N',Q)
TRYPP   PARSE(PP())                     :F(TO(.POPNP))
        SETR(.NP,'PP',Q)                :S(TO(.TRYPP))
POPNP   NP = BUILDS(NP)                 :(RETURN)
PP      CAT('PREP')                     :F(FRETURN)
        SETR(.PP,'PREP',Q)
        PARSE(NP())                     :F(FRETURN)
        SETR(.PP,'PREPNP',Q)
        PP = BUILDS(PP)                 :(RETURN)
VP      CAT('V')                        :F(FRETURN)
        SETR(.VP,'V',Q)
        PARSE(NP())                     :S(TO(.VNP))
TRYVPP  PARSE(PP())                     :F(TO(.POPVP))
        SETR(.VP,'PP',Q)                :(TO(.TRYVPP))
VNP     SETR(.VP,'NP',Q)                :(TO(.POPVP))
POPVP   VP = BUILDS(VP)                 :(RETURN)
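
The control flow of the Example 1 grammar can be followed in a small Python sketch. It is an illustration only, not the SNOBOL implementation of PARSER. The lexicon entries for YOU, WALK, TO and VILLAGE are assumed, local Python lists stand in for the register stacks, and restoring the input position after a failed sub-parse is an assumption.

```python
# Hypothetical lexicon; entries other than DID and THE are assumed.
LEXICON = {
    "DID": {"class": "AUX", "TNS": "PAST"}, "YOU": {"class": "PRO"},
    "WALK": {"class": "V"}, "TO": {"class": "PREP"},
    "THE": {"class": "DET"}, "VILLAGE": {"class": "N"},
}

def builds(label, regs):
    """Render a register list as a bracketed structure."""
    return "(" + label + "".join(f"({n} {v})" for n, v in regs) + ")"

class Parser:
    def __init__(self, sentence):
        self.words = sentence.split()
        self.pos = 0     # index of the next unread word
        self.q = None    # Q register: last word matched or structure built
        self.path = []   # PATH stack of state labels

    def to(self, label):
        self.path.append(label)

    def cat(self, word_class):
        if self.pos < len(self.words):
            word = self.words[self.pos]
            if LEXICON.get(word, {}).get("class") == word_class:
                self.q, self.pos = word, self.pos + 1
                return True
        return False

    def parse(self, network):
        start = self.pos
        result = network()
        if result is None:
            self.pos = start  # assumed: a failed sub-parse consumes nothing
            return False
        self.q = result
        return True

    def s(self):
        regs = []
        if self.parse(self.np):
            self.to("SNP")
            regs += [("TYPE", "DCL"), ("SUBJ", self.q)]
        elif self.cat("AUX"):
            self.to("QUES")
            regs += [("TYPE", "QUESTION"), ("AUX", self.q),
                     ("TENSE", LEXICON[self.q]["TNS"])]
            if not self.parse(self.np):
                return None
            self.to("QNP")
            regs.append(("SUBJ", self.q))
            self.to("TRYVP")
        else:
            return None
        if not self.parse(self.vp):
            return None
        self.to("POPS")
        regs.append(("PRED", self.q))
        return builds("S", regs)

    def np(self):
        regs = []
        if self.cat("DET"):
            self.to("DET")
            regs.append(("DET", self.q))
            while self.cat("ADJ"):
                regs.append(("ADJ", self.q))
                self.to("ADJ")
            self.to("TRYN")
            if not self.cat("N"):
                return None
            regs.append(("N", self.q))
            while self.parse(self.pp):
                regs.append(("PP", self.q))
                self.to("TRYPP")
        elif self.cat("PRO"):
            self.to("PRO")
            regs.append(("PRO", self.q))
        elif self.cat("NPR"):
            self.to("NPR")
            regs.append(("PROP", self.q))
        else:
            return None
        self.to("POPNP")
        return builds("NP", regs)

    def pp(self):
        if not self.cat("PREP"):
            return None
        regs = [("PREP", self.q)]
        if not self.parse(self.np):
            return None
        regs.append(("PREPNP", self.q))
        return builds("PP", regs)

    def vp(self):
        if not self.cat("V"):
            return None
        regs = [("V", self.q)]
        if self.parse(self.np):
            self.to("VNP")
            regs.append(("NP", self.q))
        else:
            while self.parse(self.pp):
                regs.append(("PP", self.q))
                self.to("TRYVPP")
        self.to("POPVP")
        return builds("VP", regs)
```

Running Parser("DID YOU WALK TO THE VILLAGE").s() returns the bracketed S structure, and reading the path list from its end gives the states in the order that popping PATH would list them.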
TY LEXENG. 1 
DID= (AUX)(TNS PAST).
CAN= (AUX)(TNS PRES).
COULD= ([illegible] CAN).
WILL= (AUX)(TNS FUT).
THE= (DET).
A= (DET).
AN= (DET).
THAT= (CLIND).
BOY= (N)(NBR SING).
BOYS= (N)(NBR PL).
GIRL= (N)(NBR SING).
GIRLS= ([illegible])([illegible] PL).
MAN= (N)(NBR SING).
MEN= (N)(NBR PL).
WOMAN= (N)(NBR SING).
WOMEN= (N)(NBR PL).
TABLE= (N)(NBR SING).
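
A lexicon in this format can be read into a table and queried for features, as in the Python sketch below. This is an illustration, not the PARSER code. The function names, the dictionary layout, and the roots mapping used for the root-form fallback are assumptions, and the WALK and WALKED entries in the usage note are invented for the example.

```python
import re


def parse_entry(line):
    """Split 'WORD= (CLASS)(FEATURE VALUE).' into a word and an entry."""
    word, body = line.split("=", 1)
    groups = re.findall(r"\(([^()]*)\)", body)
    entry = {"class": groups[0].strip(), "features": {}}
    for group in groups[1:]:
        parts = group.split(None, 1)
        if len(parts) == 2:
            entry["features"][parts[0]] = parts[1]
    return word.strip(), entry


def load_lexicon(text):
    """Build a word table from one entry per line."""
    return dict(parse_entry(line) for line in text.splitlines() if "=" in line)


def getf(lexicon, feature, word, roots=None):
    """Look up a feature; fall back to the root form if one is known.

    The roots mapping (inflected word to root word) is a hypothetical
    stand-in for however the lexicon records root forms.
    """
    value = lexicon[word]["features"].get(feature)
    if value is None and roots and word in roots:
        value = lexicon[roots[word]]["features"].get(feature)
    return value
```

For example, with the entries "WALK= (V)(VTYP INTRANS)." and "WALKED= (V)(TNS PAST)." and roots = {"WALKED": "WALK"}, a lookup of VTYP for WALKED falls back to the entry for WALK.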
DID YOU WALK TO THE VILLAGE
DID YOU WALK TO THE VILLAGE
STATE QUES
COMPLEMENT STRING: YOU WALK TO THE VILLAGE
BUILD STRUCTURE DID
STATE PRO
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE YOU
STATE POPNP
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE YOU
STATE QNP
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE (NP(PRO YOU))
STATE TRYVP
COMPLEMENT STRING: WALK TO THE VILLAGE
BUILD STRUCTURE (NP(PRO YOU))
STATE DET
COMPLEMENT STRING: VILLAGE
BUILD STRUCTURE THE
STATE TRYN
COMPLEMENT STRING: VILLAGE
BUILD STRUCTURE
STATE POPNP
COMPLEMENT STRING:
BUILD STRUCTURE VILLAGE
STATE TRYVPP
COMPLEMENT STRING:
BUILD STRUCTURE (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))
STATE POPVP
COMPLEMENT STRING:
BUILD STRUCTURE (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))
STATE POPS
COMPLEMENT STRING:
BUILD STRUCTURE:
(VP(V WALK)(PP (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE))))))
STATE S
COMPLEMENT STRING:
BUILD STRUCTURE:
(S(TYPE QUESTION)(AUX DID)(TENSE PAST)(SUBJ (NP(PRO YOU)))
(PRED (VP(V WALK)(PP (PP(PREP TO)(PREPNP (NP(DET THE)(N VILLAGE)))))))
)
DO YOU WANT TO EXAMINE THE REGISTERS ?
YES
II
OT OUTPUT = POP(PATH) :S(OT)F(EXAS\S\MIN)
EP\P\OF
POPS
POPVP 
TRYVPP 
POPNP 
TRYN 
DET 
TRYVP 
QNP 
POPNP 
PRO 
QUES 
DO YOU WANT TO EXAMINE THE REGISTERS ?
- C 
Example 2
ANGAS NOUN PHRASE
NP 
POS 
ADJ 
KOM 
DET 
PL 
PLT
NUH 
THWA 
POPNP 
EOC 
END
ANGAS LEXICON 
L<'AS'> = '(NOUN)(ENG DOG)'
L<'MAT'> = '(NOUN)(ENG WOMAN)'
L<'FANA'> = '(POSPRO)(ENG MY)'
L<'RIZTO> = '( ADJ) (PL -PLl) (ENC GOOD) ' 
L<*R~~'J'-FfIJiTO> = '(ADJ) (P'L PL) (EMG GOOD) ' 
L<'BIJIM'> = '(ADJ)(PL -PL)(ENG BIG)'
e<'~~'tj-WAN *, = *(ADJ,) (PL PL)(E~.IS PIC>* 
L<'GAK.> (J EN ONE) * 
L< 'UAP .$ = (I ti TWO) 
L<'NYII*> = .(I)FT)(I.,NC 'I'IlJ~~)~ 
L<'~JA'> = *(Dl~T)(b,tl~~ Tt!b;)- 
L<'CE'> = '(DET)(ENG A)'
L<'MWA'> = '(PL)(ENG PLUR)'
L< 'RuLU.';*? = (1 1 (i "AMRE) 
L<'KI-) = '(Kl.)(EtlC; POS:;:.:;IVE) 
'EX$% 
STATE POPNP
COMPLEMENT STRING:
BUILD STRUCTURE MWA
STATE NP
COMPLEMENT STRING:
BUILD STRUCTURE (MP(NOLTN AS.)(POSPRQ FANAHADJ NAN-?:A~J)(DET CEHPL WA) 
1 
ENGLISH: DOG MY BIG A PLUR
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO 
INPUT STRUCTURE TO BE PARSED 
AS FANA BIJIM CE MWA
AS FANA BIJIM CE MWA
STATE POS
COMPLEMENT STRING: FANA BIJIM CE MWA
BUILD STRUCTURE AS
STATE ADJ
COMPLEMENT STRING: BIJIM CE MWA
BUILD STRUCTURE FANA
STATE DET
COMPLEMENT STRING: CE
BUILD STRUCTURE BIJIM
STATE PLT
COMPLEMENT STRING:
BUILD STRUCTURE MWA
STATE NP
COMPLEMENT STRING: DID NOT PARSE
BUILD STRUCTURE MWA
DO YOU WANT TO EXAMINE THE REGISTERS ? 
NO 
INPUT STRUCTURE TO BE PARSED
AS MWA
AS MWA
STATE POS
COMPLEMENT STRING: MWA
BUILD STRUCTURE AS
STATE KT
COMPLEMENT STRING: MWA
BUILD STRUCTURE
STATE ADJ
COMPLEMENT STRING: MWA
BUILD STRUCTURE
STATE KOM
COMPLEMENT STRING: MWA
BUILD STRUCTURE
STATE DET
COMPLEMENT STRING: MWA
BUILD STRUCTURE
STATE PL
COMPLEMENT STRING: MWA
BUILD STRUCTURE
STATE PLT
COMPLEMENT STRING:
BUILD STRUCTURE MWA
STATE POPNP
COMPLEMENT STRING:
BUILD STRUCTURE MWA
STATE NP
COMPLEMENT STRING:
BUILD STRUCTURE (NP(NOUN AS)(PL MWA))
ENGLISH: DOG PLUR
DO YOU WANT TO EXAMINE THE REGISTERS ?
un 
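
The agreement check that Example 2 demonstrates can be sketched in Python as follows. This is an illustration, not the Angas grammar itself. Only the five lexicon entries that are confirmed by the traces are included, the function names are hypothetical, and the rule that a phrase without the plural marker requires a -PL adjective is an assumption (the traces only show the failure of a -PL adjective with the plural marker).

```python
# Entries confirmed by the Example 2 traces; the dictionary layout is assumed.
ANGAS = {
    "AS": {"class": "NOUN", "ENG": "DOG"},
    "FANA": {"class": "POSPRO", "ENG": "MY"},
    "BIJIM": {"class": "ADJ", "PL": "-PL", "ENG": "BIG"},
    "CE": {"class": "DET", "ENG": "A"},
    "MWA": {"class": "PL", "ENG": "PLUR"},
}


def getf(word, feature):
    """Return a feature value of a word, or None if it has no such feature."""
    return ANGAS[word].get(feature)


def agrees(words):
    """Check that every adjective matches the presence of a plural marker."""
    plural = any(ANGAS[w]["class"] == "PL" for w in words)
    wanted = "PL" if plural else "-PL"
    return all(getf(w, "PL") == wanted
               for w in words if ANGAS[w]["class"] == "ADJ")


def gloss(words):
    """Word-by-word English gloss, as in the ENGLISH: line of the trace."""
    return " ".join(getf(w, "ENG") for w in words)
```

With these entries, AS FANA BIJIM CE MWA fails the check because BIJIM is marked -PL, while AS MWA passes and glosses as DOG PLUR.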
Example 3 
FUNCTION DEFINITIONS
G,RAP CEFINE("st]Nv) 
CEFIKF('ESOp) 
CEFJhW('NP[)PINw) 
CEFIREtPPP()') 
GFIFIW[p'VP()N*) 
CEFXhE(.XO() 
* S PARSER 
PARSE(S(1) 
OUT('S'rSTRpQ) (NX'T,COM) 
S       PARSE(NP())                     :S(TO(.SNP))
        CAT('AUX')                      :S(TO(.Q))
        PARSE(VP())                     :S(TO(.IMP))F(FRETURN)
SNP     SETR(.S,'SUBJ',Q)
        SETR(.S,'TYPE','DCL')
        PARSE(VP())                     :S(TO(.POPS))
        CAT('AUX')                      :S(TO(.AX))F(FRETURN)
IMP     SETR(.S,'TYPE','IMP')
        SETR(.S,'SUBJ','(PRO YOU)')     :(TO(.POPS))
Q       SETR(.S,'AUX',Q)
        SETR(.S,'TNS',GETF('TNS'))
        SETR(.S,'TYPE','Q')
        PARSE(NP())                     :S(TO(.QNP))F(FRETURN)
AX      SETR(.S,'AUX',Q)
        SETR(.S,'TNS',GETF('TNS'))
        FINDWRD('HAVE') SETR(.S,'HA','HAVE')
        PARSE(VP())                     :S(TO(.POPS))F(FRETURN)
QNP     SETR(.S,'SUBJ',Q)
        FINDWRD('HAVE') SETR(.S,'HA','HAVE')
        PARSE(VP())                     :S(TO(.POPS))F(FRETURN)
POPS    SETR(.S,'PRED',Q)
        S = BUILDS('/S/TYPE/SUBJ/PRED/') :(RETURN)
* NP PARSER 
NP CB%T{"DEI@> xS[TO(,DET>) 
CAT(*PRQP) rS(TO(,PROl) 
C.AT(*NPR@) t5[70( ,NPRIE 
FAFSECES[)I tSCTO(,NPES)l 
a[TO[,PbNP)) 
DET SETP( ,NEiI CDET'pQ) 
ADJ C~T("ADdpl !r(To(,WI 
EETR(~NPp@ADJP MgQ) 
BUkP(@M') ttTo(,ADJS) 
h' CAI( *fipj tF(FRETURN1 
SETP(,NF, 'MCIQ) I(TQ(,NPF)) 
FOSPPO SETG[ ,Nfi, *PROCpO) r(%Q(,ADJ)l 
NPR SETP ( ltJFI "NPR5rQP 
XS(SE%I; [CCA~Eb)t @FOSYl CHG~AM(~NB\ p~~~*c @PQS?IPR.@) IFI:TO(,MPP)) 
FAPSEIFSf,)) tF(TO(,ADJ)I 
SF'TPC,?4F,'POSSflrG) 
t;E = BUILDS(*/rJP/!~PP/POSS./@4 I [RETURN) 
POPrJF FAPSECFSO) rS(TO( ,NPESl) 
hF a HUILDSCNP) &!(FErLUPN) 
NPP FPRSE(PP()) IF(TOE,POPNPI) 
SkTH( ,fiEr "NPPC NrQ) 
EVPP(~N~I esTo(,Npn>) 
P E-0 GFTF('CASEP) bPOS* rS(TO(,POSPAO)) 
EFTu( ,f~Pr 'PPOCpQ1 1 (TQ( rPQPfJP1) 
BLNP CAT('ADJC) tFdTO( p?dBL)l 
SETPI ,!IF? 'ADJ' pQ) 
z (TO( ,FLhPlI 
NPL 
NPES 
HTPP 
VADJ 
VDJPF 
VADJES 
NTNP 
VNP 
V IONF 
IOL 
AUXBE 
PAS 
PFPP 
FRNP 
TRPAS 
CA?('N') I$(GETF('N0R')rvPL') 2F(FRETURH) 
~ETF(,KFI *KOpQ) t (T0CQPQPNl3)) 
SETS( ,biFr 'COPP'r C) 
hF = EUlGDS(%P) :IFEIUFN) 
PP PARSER 
C''A'I'(@PREPe) ;S(TO( ,PREP)IFCFRETURN) 
FCPPREP'> = Q 
PP -= 'IFFEP ' Fq'PPEP', NpO 
')' ;S(RETURNlF(FRETUPN] 
VP PARSER 
CAT(.V'I ~F(TO(,AUXBEI) 
SFTR( ,VFt 'TtJS'rGETF('TWS')) 
HkStdAM(FS*r*AUXr) GETR('TNS*) 
IS(GETF('VTYP')r'TPANS') 
tS [TO[ ,TPAPiS) )F(TO( ,ITRAN]c) 
SE'IF(,VFp 'VT'pQI I(~Q(,VNP)~ 
SETRC ,VPt 'VgPQ) tETo(,NTPP)3 
CPT("AD3') 
$S('IB[,VADJ>) 
FARSE(NP0) 
~S(TOC,NTRPI) 
FPFSE(PP()) tF[TO(,POPVP)) 
SETR(,VPICVPPo WIG) 
EUI.'P(?N') : (TO[ ,NTPP)'l 
SEfR( ,VP, *ADJgrQ) 
FFP SE(ES()) tS(TO(,VADJES)) 
VF = EUILDS(VP1 
PAPSECFFO) tF(RETUFN1 
VF = VP Q rtTO(,VDJPPI) 
SE?P(,VPI 'ADJESWpG) t(TO(,POPVP)) 
S~'IF(~VFr 'NTNPCJQ) ~(T~(QPQPVP>) 
FPFSECIC()) tS(TO(,XOI)) 
FAR-SEINE ()) IFCFPWTVRN) 
SETF( ,VP, *OBJ',Q) 
FAPSE(IO0) lS(ToC,XOL)lF(TOC,POPVP)) 
SETPCQVPt 'XOC,C) 
FAFSE(NF()) tS(TQ(,VXONP))F(FRETURH-j 
SE'IF[,VFc *OBJ'rQ) t(TOC,PQPVP)) 
SETR(,VF, 'IO',Q] t (TO( ,POPVP)) 
CA'I('V'r'~LTP) fS(TOC,BE)) 
IS('fESTF ( *TYPEo), 'C*) ISCTESTR(~AUX') @BEe) $F(TO[,'SRYESI) 
SE?~(,VFI~V'~GEZF~'AZJ)S'~~ tCTO(,PAS)) 
IS(TESIF(*IF')l PA~SECESO) :F (FRETUFW ,j 
NP = Q t(FETURN1 
GETF(,VFp 'V',Q) 
SE'IF(,tF, "TNSr,GETF[fTUS*)] 
CAT [ 'V') tFCTO(,TADJ]) 
kORO 'INGC iStTo(,ING)I 
IS[GETF('VTYP@))~'TRANSC) tFCFRETUPN) 
GETF('TNS*) 'PPRTf tS(TPPAS)P[FRETURN) 
VF\z 
SkTF( @VPr 'AUX'r 'RE') 
Sp-TF( ,VEr 'TMS'r 'PPRG') 
SETP( ,VP,*VCIQ) 
FAPSE(NP()) ISCTO~~PRNP)) 
VF = ~~ILUS(*VP) 
PAFSECPFC)) ~FGRETURN) 
VP = VP 0 tCTO[,PRPp)) 
SETF(,VP,*PPNPrpG] t(rO(,POPVP)) 
VP = 
'AUXpr 'BE') 
'TYFE'), '€2') SETR( ,VPp 'TNS', *PPRTU) 
VPES 
TFPP 
FIO 
PNPTS'I 
PNP 
FIOL 
POPVF 
* 
I0 
IOTO 
ADD, TO 
XOFOF 
TES 
TWV 
POPES 
END 
CETF( 'V-') 
SETP(;VP,-p~',~) 
FAFSECIQ())*. tSCTO(pP1O)I 
Fx?~D~Pu('~EY~) ;SIIO( ,PNPTST)) 
FI~C~hFI) I'FPOMF) tS(TO(,PQPTSTI) 
FAFSE(ES()) !S(al:Q,C ;VPESl.jF(TO[ ,CHGSBJ)) 
SETR{weVPc 'ORJV,R<'SUP3')) 
XS(?ESTFI*TYPE'IV'DCL~ FFSET(*TYPEPt*TRPhSC) 
I~(TESTF(*TYPFC),'Q') PESE'I('TYPE','QPAS') 
FESET ('SUBJ'p 'SOFEONE') t (TO(,POBVP)) 
SLTPC ,VPv 'OBJESVr Q) 
VQ = EUILDSCVP) 
FAPSECPPO) ~FCRETURN) 
VP = VF Q I (TO( ,IRPP)~ 
SETPC e~P~*XO'~Q) 
FJNChRD('BY@) ;SCTO(,PNS?ST)) 
F~~~HFD('FPOM~) tS(ToC,PNPTST>>F(FRETURN) 
FARSE(NF0) :S(Tol,PNP))FCFRETURN) 
SETP(,VF,'OBJ8rGfTRCCSUBJ*)] 
RESET( 'SUT3dS8 Cl] 
IS~TEST~  T TYPE^, ~DCL.) PSSET(@TYPE~, .TF?PA~~~ 
~S(TESIP['TYPE~')~~Q@) FESETr"TYPEr,VQpA$@] 
IS(P<'PO'>) tF(TQ( ,POPVP)) 
PAFSE(IO()) 
8S~~Q(8~IoLl>F(TO[,POPVfa)) 
EETP( 5VFp'10vrQ] 
:SfTO( ,PQPVP)) 
VF = BUILDS(VP) r (RETURN) 
INDIRECT OBJECT
FIhChPD(CTO*) rStTO(,IOTO)) 
FXVChPD( *FORp) rSETOC,IQFCR))FCFRZTURNl 
SETR(,ICI'PREPgpr*TO') 
FARSE (NF () ) 
xS(TOC~XONR)I 
STP = '10 ' STR ICFRE'XUPN) 
SETF( ,IC,"PREPfpQ) 
FAPSEChFO) rStTO(,IONP)) 
STR = 'FOR p $TI? 1 (FRETURN) 
SETRC.I~~*IONP.,G) 
Xa = aUILDs(X0) 8CPETUPN.l 
CAT ( 'CLINDp ) 8S(TO(,TESI) 
FIVCbPD(@TOc) gSCTO(,THV)) 
FXI;l?b.RD C 'HAVING') IS{TO(,ESVP)) 
IE(GEIF[rTNS')rrPPPCO) tF€FRETURN) 
FAFSF(VP()) aF [FRETURN) 
ES = C (RETURN 
SETP[,ESpCCLnI!JD ,Q) 
PARSF ($0) ?SrTQCaP0PEs3)F(FRE'3:URM] 
SLTP I ,ESP 'ItJF" TOC) 
EINDbFD[''llAVFe) OSITO(~ESVBII 
PARStl 1) OF (ADD,T0) 
SETFI,ESpCESVPB~C) 
ES = BUILOS(@/S/TYPE/SUU3/ESVP/@I n (RETURN) 
SETFl #EE, *AtlX"r 'HAVE@'] 
FARSE(VF0) rFCFPETURN) 
SETh( ,ESP *ESVP'~Q) 
ES = RUXLDSIV/S/TYPE/SU~3J/AtJX#ESVP/*) t CPETURN') 
ES a C tCPETlJPN) 
F 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
I WANT TO GO
I WANT TO GO
STATE S
COMPLEMENT STRING: GO
BUILD STRUCTURE:
- 
1 1 T'r'PE L.1 I , UE 1 I.I~F-1- PF 0 1 1 1 I I FFED 1 $' 1 Tid PF'E' 4 1 I*;T l~lRt{T-& 
- 
I I {z IT'~'PE'~ICLI I -l-IE,_I I~~PI~F'~J I I 1 I IEI'~:'F 1$'Ft1t,! 13n) 
Ff?E.Z *I .#*.I 3 :I .I .I .I .-I b 
C 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
I THIril TI-FIT I .IRIll '.1"171 IlIlTH HER 
I THIlib THHT I Ziiltl '-t'OCI 1111 TH HER 
lHld t4bT I11 LE%ICOt4 
LE: :ICUli ADil. TO fiE:OF'T- F'HFIE T'I'PE I TDP. ELI€ TS'Pi 'I'EZ 
'J'E" .- 
Id I2 P-11 ? 
.-. - 
AH!?? 
FE.a $LIFE ,TFIL t4G 
1% !TI TPHtiZ, tTC4 1 F'H 3T > 
II~Q~;~I$ 3 
ZTf3fE Z 
COlrlPLEMEtfT ?I WI {if:: HEF 
I :TF'UCTI,IFE: 
11if'r'PE DI:LI (:SlJE:J II~P~PFO I I F Ti FF'E'SI~~~T lHIli)E'* 
(OBJ fi-riP11;QMP ~I~T'I'F'E DGLJ 4:KE:J IIIF'~F'F'O It 1.1 tF'PED fVP1TIiI PHfTl 
rvT IHIIII tnBJ ~.~{P;PPD 41'UtJl 1 1PF.EP IIIITHI~{F'I,FFIO HEPl 11 i 11 11 11) 11 
INPUT STRUCTURE TO BE PARSED
JOHN'S BELIEVING THAT MARY IS GOING TO THE VILLAGE IS MYSTERIOUS
JOHN'S BELIEVING THAT MARY IS GOING TO THE VILLAGE IS MYSTERIOUS
STATE S
COMPLEMENT STRING: MYSTERIOUS
BUILD STRUCTURE:
( SL f YPC rrcl, i ! Ci~~~ (IF (E.lrK LICII-li\,! S ( I='OSs ( VF'( TfiS F.'r;CS ( VT r:ELIcvE ) 
( l3E.i' (NI'" ( COril ' ( 3 i ! f C.1, i ( SLJD J, ( (141 '( NI''~.' Mr;TiY ) j ( F'FiED ( VI-' (kUX I{E ) 
1 IV G09ilFtFiEf' I I VILL,,^.G11)))))1))$))) 
(PRED ( VI'' (V EE i ( TNS T'FiE.2 i ChLI,J M'I'S T'ERIOUG) ) 1 ) 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
THAT HE BROKE HER DISH IS SERIOUS
THAT HE BROKE HER DISH IS SERIOUS
STATE S
COMPLEMENT STRING: SERIOUS
BUILD STRUCTURE:
(S(TYT'E DCI,) (SbKtJ* (NF'(CO45F' (S(TYF'E KlCL) (SUEJ (NF1(I-"TiO HE))) 
(FRED ( VT-' t T14S PhST ) ( u f T:I?EI^II.\ ) C 0E.J ( T413 ( PRO I !El7 1 i N, LIISI I) 1 ) 1 ) ) 
z FRED ( VF'( v fi~ r~is FIRES i ~,LI.J SER~OUS 1 I 1 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
THE BOY BREAKING THE GLASS IS MULLIGAN
THE BOY BREAKING THE GLASS IS MULLIGAN
STATE S
COMPLEMENT STRING: MULLIGAN
BUILD STRUCTURE:
( 5 ( 7 YIZ'C DCI, 1 ( SUBJ C it TI t EjDY 1 ( EMP ((IF' (TNS F'F'RG ( VT EREAIi) 
(,OBJ uw;(ZsCr I ~I-ASS));~~~)(F'KED (LIF'(V BEHTNS FRES) 
( flTNT' i PJ!:# ( NrlR MllL 1-1 GAf4 j ) 1 1 ) 1 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
JOHN'S BEING THIN IS NICE
JOHN'S BEING THIN IS NICE
STATE S
COMPLEMENT STRING: NICE
BUILD STRUCTURE:
(E:<TYFIF, DCI,itSUI!J !f4F1(tJF'R JOIiNmSi*(POSS iLIF'(U-EE)(TNS F'F'KG) 
(ABJ 7Hlll)j)):~a(F'I;'CLI i l:E)ITNS I-'li'CS)(kLIJ PIICE)))) 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
TO BE A MAN WAS HIS DREAM
TO BE A MAN WAS HIS DREAM
STATE S
COMPLEMENT STRING: DREAM
BUILD STRUCTURE:
( I; ( T?I"IT DCI, 1 ( ,I;IJL{ J (NI ' ( I;OMTB ( S ( 'T Y[.'E 1 C Z1IK.i J 1 ( ET;VF.' ( VI" ( V PE ) 
J I j J r I ( I t I t h j ( E4 MAFI 1 i 1 j 1 ) 1 i ) ( l'*l<L I1 ( VI-' ( V BE ) 
(7.N:; lbfi:.;T) 1 } ) 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
..I I I I 1 I , IS lil. C;I\L.CSS 
I I I 4 I I I 112 iil- Clil-ESE; 
E;TttI,E 
C L f; TF: I E4r; f 17ECI;LESS 
LCLl I I.JI :, 1 lilll: 1 IJIit-1 
tS( TYIvr: DCL) (!;LI.T,;J I (UI*(TNS iL'PR(;i (VT 14liI:t\l\) (OL{-l (i4F1(N IfTSl!ESk)) - 
THE BOY RUNNING TO THE HOUSE IS JOHN
THE BOY RUNNING TO THE HOUSE IS JOHN
STATE S
COMPLEMENT STRING: JOHN
BUILD STRUCTURE:
(S(TYF'E UCI-1 (SUGJ OEP(DET TI-IE) (id DoY) (EMB (Ul='iU RUN) (TNS F'F'RGI 
(UFp (I"RI,r-' T CI (Ef1'- C DE r THE ) (N 
I-IUUSE > > > )'> j ) > i PIED ( VF' ( U BE) 
<TNS F'RES)(N?I'IP (I!I:'(NE'R JOI-IN)))))) 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
BREAKING DISHES IS RECKLESS
BREAKING DISHES IS RECKLESS
STATE S
COMPLEMENT STRING: RECKLESS
BUILD STRUCTURE:
(S(TYF'E DCLi(6UKiJ tNP(COM13 CVF(TNS T-'F'RG)(VT EREAh)(OBJ (NF'(N DISI-IESl>) 
))I) 
EL P J E:E > (TNS PRES) f AT1 J RECIiLESS ) ) 1 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
I WAS THINKING THAT YOU E\F\WERE CONSERVATIVE
I WAS THINKING THAT YOU WERE CONSERVATIVE
STATE S
COMPLEMENT STRING: CONSERVATIVE
BUILD STRUCTURE:
{SCTYFIE DCI,) (SUFJ (NF'(PR0 1)) 1 (FRET1 CUF'(RUX E:E) (TNS FF'RG) 
(V TI-/INI<j-CFYRPJF' (NF'CCOMI=' (S (TYF'E ICILL~ (SU~J (NF'(FRO YOU) ) 
(F'F:ED (I3 EEHTNS I='AST)(ATiJ CDNSERVCiT1VE)))I)))))) 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
JOHN WAS BELIEVED TO BE DELAYED
JOHN WAS BELIEVED TO BE DELAYED
STATE S
COMPLEMENT STRING: DELAYED
BUILD STRUCTURE:
(8 ( TYF'IE TriPhS ) t S~IIEI J SOHEONE? ( PRECI (:GUX, Blfb> ( TNS P~ES 1 t U *EEL JEVE ) 
(CIFJES C S Z TYI"E TT<FI.IS*) (,SUE{ J SC)r?EOl.ll: > I ILZV1.' ( V~?J( AUX Pic) ( T PIIS F'I"11T 3 
(U DELAY) (C1-E.J (NF (~tI=~l7 JQl{]gj >> j ) 1 > ) 3 1 
DO YOU WANT TO EXAMINE THE REGISTERS ?
NO
INPUT STRUCTURE TO BE PARSED
THAT HE BROKE HER DISH WAS SERIOUS
THAT HE BROKE HER DISH WAS SERIOUS
STATE S
COMPLEMENT STRING: SERIOUS
BUILD STRUCTURE:
(S $TYPE lDCld) (WC J (WP( COI.IR IS (TYPE DCL, 1 (SlJp,l (1 HE) j I. 
(F'ftED (V17'fifNS ~A$'J) (V-f GREhli) (OIIJ (I),F'(FBFiD H~K) (N DfS1-l))) ))I)!) )) 
WREIJ V E f, F.h:;T) ChLlJ I;Efi,ID,US:) 1 ) 
DO YOU WANT TO EXAMINE THE REGISTERS ?
YES - 

