American Journal of Computational Linguistics
THE FINITE STRING
NEWSLETTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
VOLUME 15 - NUMBER 2 JUNE 1978
AMERICAN JOURNAL OF COMPUTATIONAL LINGUISTICS is published by 
the Association for Computational Linguistics 
SECRETARY -TREASURER Donald E Walker, SRI International, 
Menlo Park, California 94025 
EDITOR David G Hays, 5048 Lakeshore Road, Hamburg, New
York, 14075 
ASSOCIATE EDITOR George E Heidorn, IBM Research Center, 
P O Box 218, Yorktown Heights, New York 10598
EDITORIAL ASSISTANT William Benzon 
Copyright (C) 1978
Association for Computational Linguistics
CONTENTS Microfiche 75: 2
THE DERIVATION OF ANSWERS FROM LOGICAL FORMS IN A
QUESTION ANSWERING SYSTEM, Fred J Damerau
ONE MORE STEP TOWARD COMPUTER LEXICOMETRY,
Nicholas V Findler and Shu-Hwa Lee
PUBLISHING AJCL
CONFERENCES ASIS AND HICSS
LINGUISTIC STRUCTURES PROCESSING Zampolli, ed
NATURAL LANGUAGE IN INFORMATION SCIENCE,
Walker, Karlgren, and Kay, eds
DESCRIPTION OF AJCL
THE DERIVATION OF ANSWERS FROM LOGICAL FORMS
IN A QUESTION ANSWERING SYSTEM
FRED J DAMERAU
IBM Corporation
Thomas J Watson Research Center
Yorktown Heights, New York
ABSTRACT 
This paper describes how the process of generating a
response given an underlying representation for an input
question is accomplished in the Transformational Question
Answering (TQA) system under development at IBM Research, a
brief description of which is given.
The last formal level of representation in this system is
called a logical form. The basic method of evaluation of
logical forms is the "generate and test" paradigm, used, for
example, in the LUNAR system (Woods, Kaplan and Nash-Webber,
1972), although that implementation must be fairly efficient
in order to be practical on a moderate size data base. The
basic idea is to keep track of the equivalence relationships
between the variables in the logical form and associated
constants, and use this information to derive from the data
base the extensions of the predicates contained in the
logical form. A similar proposal has been made by
Reiter (1976). The logical forms and the process by which
candidate sets are computed from these forms are described
in considerable detail. We believe it should not be
necessary for a computational linguistics project to
describe operations beyond the last level of formal
representation in order for an outsider to understand
exactly how a system operates sufficiently well that he can
predict its behavior. Although we have attempted to achieve
that, we still have a considerable way to go.
This paper describes how the process of generating a
response given an underlying representation for an input
question is accomplished in the Transformational Question
Answering (TQA) system under continuing development at IBM
Research. TQA has been operational in a laboratory mode for
several years. The system is now installed in the office of
the planning department of a small city where it is used to
access the file of land use for each parcel of land in the
city (about 10,000 parcels with 40 pieces of data for each
parcel). The system is undergoing modifications and
improvement prior to a formal evaluation stage.
A generalized flow diagram of the TQA system is given in
Figure 1. Input, from a display device or typewriter-like
terminal, is fed to the preprocessor, which segments the
input character string into words and performs lexical
lookup. The process of lookup is complicated somewhat by a
provision for synonym and phrase replacement. Words like
"car" and "automobile" are changed to "auto", and strings
like "gas station" are frozen into single lexical units.
Input
  |
Preprocessor <---------- Lexicon
  |
  | List of lexical trees
  |
Transformational parser <---- String transformations
  |
  | List of trees
  |
Context free parser <------ Context free phrase
  |                         structure rules
  | List of surface trees
  |
Transformational parser <---- Inverse transformational
  |                           grammar
  | Deep structure(s)
  |
Transformational parser <---- Data base specific
  |                           transformational rules
  | Query structure(s)
  |
Semantic interpreter <----- Semantic rules
  |
  | Logical form(s)
  |
Evaluator <------------ Data base
  |
Answer
Figure 1
The output from the lexical lookup is a list of trees,
each tree containing part of speech information, syntactic
features and semantic features, as required. A description
of the lexical component, now obsolete in its detail but
still valid in main outline, is given in Robinson (1973). The
list of trees is input to a set of string transformations,
described in Plath (1974). These transformations operate on
adjacent lexical items to deal with patterns of classifiers,
ordinal numbers, stranded prepositions, and the like. The
effect of this phase is to reduce the number of surface
parses and the amount of work done in the transformational
cycle. The resulting list of trees is input to a context
free parser, which produces a set of surface trees, each of
which is fed to the transformational recognizer.
The recognizer attempts to find an underlying structure
for each surface tree (Plath 1973). Typically only one of
a set of surface trees will result in an underlying
structure. This structure itself is input once again to the
transformational recognizer, using a (small) set of grammar
rules tailored to a specific data base to produce a query
structure. Query structures are similar to underlying
structures in form, but reflect the particular meaning
constraints resulting from the format and content of a given
data base. The query structure tree is processed by a
Knuth-style semantic interpreter (Petrick 1977), producing
a logical form. A logical form can best be thought of, in
our context, as a retrieval expression which is to be
evaluated, producing an answer to the English input query.
Since the major part of this paper is concerned with
processing logical forms, discussion of their specifics will
be deferred until later.
The process of answer extraction from the data base is
accomplished by a combination of LISP and PL/I programs,
described below, and an experimental relational data base
management system called Relational Storage System (RSS)
(Astrahan et al. 1976). The RSS provides the capability
to generate a data base of n-ary relations, with indexes on
any field of the relation, and low-level access commands
like OPEN, NEXT, CLOSE, with appropriate parameters, to
retrieve information from such a data base.
All the processing modules are under the control of a
driver module, which maintains communication with the user,
calls the processors in the correct sequence, and tests for
errors. An example of the processing of a question, with
the intermediate outputs, is given in Figure 2.
In this example, the numbers 2945, 6535, 6635, 6975 are
the numbers of milliseconds of computer time used up to the
point shown, on an IBM S/370 Model 168. The structures
printed are a bracketed terminal string representation of
structures which are stored and manipulated as trees by the
what are the heights of the drug stores ?
2945 SURFACE STRUCTURES:
1. (((WH SOME) (THING X1)) BE (THE ((HEIGHT X4)
(OF (THE ((DRUG-STORE 591) X7))))) ?)
6535 UNDERLYING STRUCTURES:
1. (BD IDENTICAL (THE (X4 (* BD HEIGHT X4
(THE ((DRUG-STORE 591) X7)) BD *))) ((WH SOME)
(THING X1)) BD)
6635 QUERY STRUCTURES:
1. (THE (X4 (* BD HEIGHT X4 (THE ((DRUG-STORE 591)
X7)) BD *)))
6975 LOGICAL FORM:
(setx 'X4
  '(foratleast 1 'X44
     '(setx 'X7
        '(testfct
           '591
           '('LUC X7 '1976)
           '=))
     (testfct
       X4
       '('JSTOR X44 '1976)
       '=)))
7995 ANSWERS:
NUMBER
STORIES
Figure 2
processing programs. The nonterminal nodes of the tree,
together with their associated complex features, represent
much additional information that is not shown here. The
number 591 is a land use code which, in the data base,
indicates a drug store, and the long numbers in the answer
are the parcel identifiers (ward-block-lot).
From this brief description, it should be apparent that
the TQA system, considered as a black box, is similar to
many others. In particular, there is a designated level of
meaning representation, the logical form, which is the last
formal construct in the system. The remaining processing
necessary to derive an answer and to format it for
presentation to a user is accomplished by an unstructured
set of computer programs. Two separate issues arise as a
result: how efficiently can the logical form be evaluated
against a real data base, and to what extent do the
processing functions further specify meaning, beyond that
carried by the logical form?
EVALUATION OF LOGICAL FORMS
The basic method of evaluation of logical forms is the
"generate and test" paradigm used, for example, in the LUNAR
system (Woods, Kaplan and Nash-Webber, 1972). The simple
version of this paradigm, used by Woods and implemented in
our early systems, involves checking pre-selected lists of
objects or, in the worst case, all the objects known to the
system, to see if they satisfy the query predicates. It is
computationally impractical except for small data bases.
Our current variant of this method is much more efficient.
The basic idea is to keep track of the equivalence
relationships between the variables in the logical form and
associated constants, and use this information to derive the
extensions of the predicates contained in the logical form
from the data base. A similar proposal has been made by
Reiter (1976). We do not, however, make such extensive use of
query transformations as Reiter outlined.
Logical forms
In order to describe the evaluation process, it is
necessary to describe the logical form in somewhat more
detail, referring for example again to Figure 2. In the
first place, except for the set-forming function setx, which
takes as arguments a variable name and a proposition, all
other well-formed formulas are composed of predicates and
their arguments. Some of the predicates are perfectly
ordinary, like greaterthan. Some are quantifiers, like
foratleast, which takes a limit argument n, an argument
which is a set, and a proposition p, and which is true just
in case n or more elements of the specified set satisfy the
proposition p. Others are special application predicates
like parcel, which is true just in case its single argument
is a parcel identifier.
The main data base related predicate is named testfct.
Referring to Figure 2, it is seen that testfct has three
arguments. The first is a constant or a variable which will
be replaced by a constant before evaluation, the second
argument is a list whose members determine a particular
data base value, and the third is an operator specifying the
relation which must hold between the first argument and the
data base value determined by the second argument.
The data base can be thought of as a collection of binary
relations, all sharing the same key. In our application,
this is the parcel identification or account number, by
which any piece of property can be identified. The list
which is the second argument of testfct consists of the
relation name and the key which identifies a value in the
relation. The key actually has two parts. The second part
is a year, now unused, although since the files in which we
are currently interested are changed on a yearly basis, we
anticipate maintaining and accessing historical data. The
first part of the key is the account number mentioned above.
In general, the second argument of testfct must be
sufficient to identify a unique binary relation and value in
that relation.
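The three arguments of testfct can be illustrated with a minimal sketch in Python (not the LISP and PL/I of the system itself). The toy relations, the parcel identifiers P1 to P3, and the function name eval_testfct are assumptions made for illustration; only the relation names LUC and JSTOR, the year 1976, and the operators "=" and "-=" come from the text.

```python
import operator

# Hypothetical toy data: one binary relation per field, each keyed by
# (account number, year). Relation contents are invented for illustration.
DATA_BASE = {
    "LUC": {("P1", 1976): 591, ("P2", 1976): 617, ("P3", 1976): 591},
    "JSTOR": {("P1", 1976): 2, ("P2", 1976): 9, ("P3", 1976): 1},
}

# Only "=" and "-=" appear in the text; the other operators are assumed.
OPERATORS = {"=": operator.eq, "-=": operator.ne,
             ">": operator.gt, "<": operator.lt}


def eval_testfct(value, key, op):
    """Return True if `value op stored` holds for the item named by `key`.

    `key` is (relation name, account number, year), mirroring the list
    that forms the second argument of testfct.
    """
    relation, account, year = key
    stored = DATA_BASE[relation].get((account, year))
    if stored is None:          # no such parcel/year in this relation
        return False
    return OPERATORS[op](value, stored)
```

For example, eval_testfct(591, ("LUC", "P1", 1976), "=") asks whether parcel P1 had land use code 591 in 1976.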
If the logical form is itself a proposition, the system
will answer either "yes" or "no". If the logical form has a
top level setx, the system will print the members of the set
satisfying the specified proposition, perhaps along with
some identifying information.
Simplifications 
A number of simplifications can be, and in part have been,
carried out on logical forms prior to evaluation. Some
predicates, for example, are essentially empty for purposes
of evaluation, in that they always evaluate to true. As an
example, the predicate dollar, for information fields
referring to taxes, is empty of meaning because the
processor assumes that the contents of the taxes field are
always dollars. A slightly less obvious example of a
possible simplification can be seen in Figure 2. The set
argument of the foratleast predicate contains no free
variables. It is not necessary, therefore, to evaluate the
inner setx function for each evaluation of the foratleast
predicate. Instead, the setx function is evaluated as soon
as the semantic interpreter has discovered that it has no
free variables, using the standard evaluation mechanism, and
the value, i.e., a set, is substituted for the setx
expression. Our system performs simplifications of this
kind in its normal mode (although it can also delay all
evaluations until a complete form has been built), so that
the final logical form seen by the retrieval functions
during processing is usually that shown in Figure 3, where
the inner setx has been replaced by the satisfying set,
viz., the parcel identifiers of the set of drug stores.
After all the applicable simplifications have been done, the
resulting form is passed to the evaluation function, EVALU.
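The test that licenses early evaluation of an inner setx is that it has no free variables. A minimal sketch of such a test follows, in Python rather than LISP. The tuple encoding of logical forms and the convention that variables are strings beginning with "X" are assumptions of this sketch, not a description of the actual interpreter.

```python
def is_var(term):
    # Assumed convention: query variables are strings such as "X4", "X44".
    return isinstance(term, str) and term.startswith("X")


def free_variables(form):
    """Return the variables of `form` not bound by setx or foratleast.

    Forms are nested tuples whose first element is the operator name
    (or, for a testfct key, the relation name), so it is never scanned.
    """
    if is_var(form):
        return {form}
    if not isinstance(form, tuple):
        return set()                      # constants and operator names
    head = form[0]
    if head == "setx":                    # ("setx", var, body)
        _, var, body = form
        return free_variables(body) - {var}
    if head == "foratleast":              # ("foratleast", n, var, domain, body)
        _, _, var, domain, body = form
        return free_variables(domain) | (free_variables(body) - {var})
    result = set()
    for part in form[1:]:
        result |= free_variables(part)
    return result


# The logical form of Figure 2 in the assumed tuple encoding.
inner = ("setx", "X7", ("testfct", 591, ("LUC", "X7", 1976), "="))
outer_body = ("foratleast", 1, "X44", inner,
              ("testfct", "X4", ("JSTOR", "X44", 1976), "="))
```

Here the inner setx has no free variables and so could be replaced by its value at once, while the foratleast form still depends on X4.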
The Pre-evaluator 
It might seem that since the system has been written in
LISP, it would only be necessary to define the appropriate
functions and then call the regular LISP evaluator, instead
of a special evaluator like EVALU. While this would be
possible, the difficulty with such an approach can readily
be seen by considering the embedded setx in Figure 2. The
desired set of X7s is that set of parcel identifiers for
which the associated land use code is "591". testfct is a
predicate which is true for the appropriate X7s, but what is
the candidate set of X7s which should be tested? At worst,
the system might consider the set of all objects it knows
about. As a better choice, the system could infer from the
syntax of testfct that the candidates are all members of the
set of parcel identifiers, but still there are almost 10,000
what are the heights of the drug stores ?
2930 SURFACE STRUCTURES:
1. (((WH SOME) (THING X1)) BE (THE ((HEIGHT X4)
(OF (THE ((DRUG-STORE 591) X7))))) ?)
UNDERLYING STRUCTURES:
1. (BD IDENTICAL (THE (X4 (* BD HEIGHT X4 (THE
((DRUG-STORE 591) X7)) BD *))) ((WH SOME)
(THING X1)) BD)
6599 QUERY STRUCTURES:
1. (THE (X4 (* BD HEIGHT X4 (THE ((DRUG-STORE 591)
X7)) BD *)))
LOGICAL FORM:
(setx 'X4
  '(foratleast 1 'X44
     '(90430000910 80100004811 80100000710
       705900016103)
     (testfct
       X4
       '('JSTOR X44 '1976)
       '=)))
NUMBER
STORIES
Figure 3
of those. A much better approach is to attempt to compute
the extension of those predicates for which the variable
being sought is an argument. Again referring to Figure 2, a
reasonable set (in fact the perfect set) of candidates for
X7 can be found by looking in the data base for that set of
identifiers for which the land use code is 591. If the data
base is properly organized, such a search can be very fast.
Not all predicates are so simple, however. The remainder of
this section will describe in some detail how candidate sets
for more complicated predicates are arrived at. Once
candidate sets have been computed, the EVALU function can
invoke the LISP evaluator on the logical form. The
alternative of including a candidate generator in the setx
program and all the potential top level predicates and then
applying the LISP EVAL function directly seems much less
attractive.
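The contrast between testing every parcel and computing the extension of a predicate directly can be sketched as follows. This is an illustration in Python under stated assumptions: the toy LUC relation and the in-memory inverted index stand in for the indexed RSS relations, whose real access commands are not shown in the text.

```python
from collections import defaultdict

# Hypothetical LUC relation: parcel identifier -> land use code.
LUC = {"P1": 591, "P2": 617, "P3": 591, "P4": 110}


def generate_and_test(code):
    # Simple paradigm: try every known parcel against the predicate.
    return {parcel for parcel, stored in LUC.items() if stored == code}


def build_index(relation):
    # Assumed stand-in for a data base index on the value field.
    index = defaultdict(set)
    for key, value in relation.items():
        index[value].add(key)
    return index


LUC_INDEX = build_index(LUC)


def candidates_from_index(code):
    # The extension of "land use code = code", found in one lookup.
    return set(LUC_INDEX.get(code, ()))
```

Both functions return the same set; the second touches only the parcels that actually carry the code, which is the point of the "much better approach" above.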
As a preliminary, notice that we need only insure that
candidate sets have been established for all the setx
variables in a logical form. This is so because, while each
quantifier has an associated variable, the domain of that
quantifier is either given explicitly as a list of
constants, or implicitly by a setx expression. Secondly,
since the object of pre-evaluation is merely to find
efficient, not necessarily optimal, candidate sets for the
setx variables, we need not keep track of the structure of a
complex predicate. As an example, consider Figure 4, which
is the logical form for the question,
"What drug stores are located in ward 8?"
The predicate of the setx is "and", but for purposes of
(setx 'X2
  '(and
     (testfct
       '591
       '('LUC X2 '1976)
       '=)
     (testfct
       '8
       '('WARD X2 '1976)
       '=)))
Figure 4
determining a candidate set we can consider each term of the
"and" individually. Evaluation of the form with a given
candidate set will ensure that a particular member
satisfies both terms of the "and".
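The division of labour for a conjunction can be sketched as below: one term supplies the candidate set, and full evaluation checks both terms on each candidate. The Python code and its toy LUC and WARD relations are assumptions for illustration only.

```python
# Hypothetical relations: parcel identifier -> land use code / ward.
LUC = {"P1": 591, "P2": 617, "P3": 591}
WARD = {"P1": 8, "P2": 8, "P3": 2}


def candidates_for_luc(code):
    # Pre-evaluation looks at a single term of the "and".
    return {parcel for parcel, stored in LUC.items() if stored == code}


def drug_stores_in_ward(ward):
    candidates = candidates_for_luc(591)
    # Evaluation proper: every candidate must satisfy both terms.
    return {parcel for parcel in candidates
            if LUC[parcel] == 591 and WARD[parcel] == ward}
```

The candidate set {P1, P3} is only required to contain the answer; the conjunction itself narrows it to the parcels in the requested ward.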
Operation of the pre-evaluation function. Pre-evaluation
is accomplished by a function EVALUA, which takes a logical
form, be it a setx expression or a proposition, as its
argument. It determines the type of form with which it is
dealing and calls an appropriate specialist routine. If, as
in the case of the "and" of Figure 4, the logical form being
considered contains more than one component form, EVALUA
calls itself recursively. Consequently, pre-evaluation is a
depth-first, left-to-right process. The function always
returns nil, all work being accomplished by changes to
global variables. Among these are a LISP variable which
contains a list of all setx variables in the logical form, a
LISP variable which lists each query variable for which a
value has been found, and its value, and a LISP variable
which keeps track of the equality relationships which have
been discovered between query variables for which a value is
yet to be found.
Operation of the algorithm can be better understood by
considering somewhat more complicated examples than those
seen previously. When EVALUA is given the logical form of
What parcels have an area exceeding 550000
square feet ?
7524 LOGICAL FORM:
(setx 'X2
  '(and
     (foratleast 1 'X39
       (setx 'X5
         '(testfct
            X5
            '('PARAREA X2 '1976)
            '=))
       (greaterthan X39 '550000))
     (parcel X2)))
Figure 5
Figure 5, it calls the setx specialist, which adds X2 to the
(null) list of set variables and the (null) list of query
variables, and calls EVALUA with the associated setx
predicate, "and". As mentioned, this simply results in two
calls to EVALUA, the first of which causes the quantifier
specialist to be invoked. (The second call, when made, will
not cause any change to the global lists of candidate values
for variables, since a candidate set of all parcel
identifiers is not useful for purposes of retrieval.) X39
is added to the list of query variables, and the domain
argument of the quantifier is inspected. When this is seen
to be an instance of setx rather than a list of constants,
two actions are taken. Notice that whatever the domain of
X39 is, it is a subset (perhaps not a proper subset) of the
domain of X5, i.e., the candidate set for X5 must include at
least all of the elements of X39. Further, any restrictions
which can be imposed on X39 can also be imposed on X5, since
the proposition associated with the quantifier is the one to
be satisfied, and any candidate not meeting this criterion
would be superfluous. Therefore, we can 1) enter into the
list of variable relationships the information that for
purposes of the pre-evaluator, X39 and X5 are equivalent, and
2) call EVALUA once more with the setx associated with X5 as
an argument.
X5 is added to the list of set variables, and
reinvocation of EVALUA with the setx predicate causes a call
to the specialist for testfct. Since there are two variables
in testfct, X5 and X2, for which values are unknown, a call
to the data base cannot yet be made. The instance of
testfct is placed on a list of pending data base calls,
preceded by the variables which require values. (Each time a
value for a variable is found, that list is inspected, and
any data base calls which can then be made are executed.)
Return is made to the quantifier specialist, which calls
EVALUA with the predicate over whose arguments
quantification is made, viz., greaterthan.
The specialist for numeric predicates, finding that one
argument is a variable and the other a constant, causes a
change in the variable list to show that X39 and
consequently X5 are greater than 550,000. A value like
">550,000" can be used by the data base component to narrow
its search just as well as a constant or list of constants,
and is therefore acceptable as the value of a candidate
list. These changes to the variable lists cause the list of
pending data base calls to be inspected and, since only one
variable is now unknown in the stacked testfct, a call to
the data base is made for those parcels with an area greater
than 550,000 square feet.
The specialist for testfct instructs the data base search
routine to return as a value a list corresponding to the
remaining variable in the form, i.e., X2. In the present
example, that is a list of parcel numbers, viz., those
parcels which have an area exceeding 550,000 square feet.
This list is then assigned as the value of the candidate set
for X2.
The stack of recursive calls to EVALUA will now unwind,
until a return is made to the evaluation function EVALU.
This function determines that candidate lists for all the
setx variables have been found, and creates a new list of
variable-candidate set pairs for use by the setx function
itself. Finally, EVALU can call the LISP evaluator, with
the original logical form as an argument.
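The bookkeeping in the walk-through of Figure 5 (variable equivalences, a restriction such as ">550,000", and a list of pending data base calls) can be sketched as follows. This is a simplified Python illustration, not the EVALUA code: the class PreEvaluator, its method names, and the toy PARAREA relation are all hypothetical, and a pending call here waits on a single value variable.

```python
import operator

# Hypothetical PARAREA relation: parcel identifier -> area in square feet.
PARAREA = {"P1": 600000, "P2": 12000, "P3": 900000}
COMPARE = {">": operator.gt, "<": operator.lt, "=": operator.eq}


class PreEvaluator:
    """Toy bookkeeping for equivalences, constraints and pending calls."""

    def __init__(self):
        self.same_as = {}      # variable -> variable it was equated with
        self.constraint = {}   # representative variable -> (op, constant)
        self.pending = []      # deferred calls: (value_var, relation, key_var)
        self.candidates = {}   # setx variable -> candidate set

    def find(self, var):
        # Follow equivalence links to the representative of var's class.
        while var in self.same_as:
            var = self.same_as[var]
        return var

    def equate(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.same_as[ra] = rb
            if ra in self.constraint:          # keep any known restriction
                self.constraint.setdefault(rb, self.constraint.pop(ra))

    def defer(self, value_var, relation, key_var):
        self.pending.append((value_var, relation, key_var))
        self.retry()

    def constrain(self, var, op, constant):
        self.constraint[self.find(var)] = (op, constant)
        self.retry()

    def retry(self):
        # Execute every pending call whose value variable is now restricted.
        waiting = []
        for value_var, relation, key_var in self.pending:
            known = self.constraint.get(self.find(value_var))
            if known is None:
                waiting.append((value_var, relation, key_var))
                continue
            op, constant = known
            self.candidates[key_var] = {
                key for key, stored in relation.items()
                if COMPARE[op](stored, constant)
            }
        self.pending = waiting


pre = PreEvaluator()
pre.equate("X39", "X5")            # quantifier domain is the setx over X5
pre.defer("X5", PARAREA, "X2")     # testfct has two unknowns: wait
pre.constrain("X39", ">", 550000)  # greaterthan supplies a restriction
```

After the last call the pending list is empty and the candidate set for X2 holds the parcels whose area exceeds 550,000, mirroring the sequence of events described above.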
The case of negatives. The predicate "not", denoted in
our system by not* to distinguish it from the LISP not,
presents special problems for the kind of system outlined
above. A simple example of the difficulty can be seen in
What drug stores are not in traffic
zone 6 ?
5651 LOGICAL FORM:
(setx 'X3
  '(and
     (not*
       (testfct
         '6
         '('TRAFZ X3 '1976)
         '=))
     (testfct
       '591
       '('LUC X3 '1976)
       '=)))
Figure 6
Figure 6, which corresponds to the question
"What drug stores are not located in traffic zone 6?"
and variants thereof. When the testfct specialist is given
the first half of the and in this form, along with
information that there is a dominating not*, it could in
principle generate a data base call, since there is only one
unassigned variable. The effect would be the retrieval of
all parcel identifiers of parcels not located in traffic
zone 6. This is a substantial fraction of the data base, and
would require inordinate amounts of time and storage space
to handle. Notice that the other half of the and will also
provide a candidate list for the variable X3, presumably
much smaller in size. It appears to be the case, from our
so far limited experience, that questions containing only a
single negated search clause hardly ever occur. The
evaluator therefore puts a testfct call of this type on the
stack mentioned earlier, indexed by the variable(s)
corresponding to the parcel identifier. When the second
half of the and of Figure 6 is processed, and a value found
for X3, the deferred testfct will be unstacked, resulting in
a data base call, and causing a retrieval based on that list
of identifiers rather than on the negated value. This data
base search is necessary, since we must find the traffic
zones for the parcels contained in the candidate list.
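The effect of deferring the negated clause can be sketched as follows, again in Python and with invented toy data. The relation names LUC and TRAFZ are taken from Figure 6; the parcel identifiers, the function name, and the lookup counter are assumptions added to make the saving visible.

```python
# Hypothetical relations: parcel identifier -> land use code / traffic zone.
LUC = {"P1": 591, "P2": 617, "P3": 591, "P4": 110, "P5": 591}
TRAFZ = {"P1": 6, "P2": 3, "P3": 2, "P4": 1, "P5": 4}


def drug_stores_not_in_zone(zone):
    deferred = [("TRAFZ", zone)]          # negated testfct goes on the stack
    # The positive clause supplies a (small) candidate list for X3.
    candidates = {p for p, code in LUC.items() if code == 591}
    # Unstack the deferred call: fetch zones for the candidates only.
    _relation, excluded = deferred.pop()
    zones = {p: TRAFZ[p] for p in candidates}
    lookups = len(zones)                  # accesses made by identifier
    answer = {p for p, z in zones.items() if z != excluded}
    return answer, lookups
```

With this toy data the deferred call needs three lookups by identifier, whereas retrieving every parcel outside zone 6 would have returned four of the five parcels before any were discarded.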
This example is also an illustration of why, as was
mentioned above, the logical form as a whole must in general
be evaluated by the LISP evaluator. In this case, the
candidate set for X3 derived from the second clause of the
and is a superset of the answer set, which can only be
derived by evaluating the whole conjunction. Some
efficiencies could doubtless be gained by skipping
evaluation in those cases where it is unnecessary, but that
is purely an implementation decision.
The not* of Figure 7 presents a different kind of problem
How many banks have a height not exceeding
5 floors ?
(setx 'X1
  '(quantity X1
     (setx 'X3
       '(and
          (not*
            (foratleast 1 'X45
              (setx 'X6
                '(testfct
                   X6
                   '('JSTOR X3 '1976)
                   '=))
              (greaterthan X45 '5)))
          (testfct
            '617
            '('LUC X3 '1976)
            '=)))))
Figure 7
from the previous example. Firstly, notice that the
negative must be passed inside the quantifier, since the
alternative of finding all buildings greater than 5 stories
in height and then getting the complement set with respect
to all buildings is extremely unattractive computationally.
In the second place, a search qualifier of "<= 5" does not
intuitively seem to be much worse than "> 5", at least in
the absence of data base distributional statistics. One
might, therefore, generate a search with such a qualifier.
Our present system does this, although experience may show
that all instances of testfct dominated by not* should be
deferred, as in the cases of "-=", for efficiency reasons.
Other specialists. Most of the important specialist
routines in the pre-evaluator have already been mentioned.
There are a few others which should be noted. One is a
generator function which, given a predicate, will produce
its extension from a stored list. This feature was heavily
used in our early system, which had a small data base, but
is currently hardly used at all, though it remains
available. In principle, one could, given a predicate like
"SCHOOL(X)", generate a list of schools. In the present
application, this would not be useful, but might in some
other. The sole uses at present are a generator for the
predicate RANK, for which a list of numbers from 1 to 100 is
produced, and for the predicate YEAR, which produces a list
of the numbers 1960 to 1985.
The proposition "(QUANTITY x s)" is true if x is equal to
the cardinality of the set s. The associated specialist has
the obvious function of determining when s is an instance
of setx.
Equality between variables can be inferred where the
domain of a quantified variable is given by an instance of
setx, as was illustrated above. Certain predicates also
allow this inference to be made. It is clear that
predicates like "EQUAL", "SAMEREF" (for "same reference"),
and "IDENTICAL" should belong to this class. Since
variables can only refer to individuals, the predicate
"MEMBER" also is in this class, e.g., given (MEMBER X3 (SETX
...)), a candidate set for X3 can be derived by evaluating
the setx expression.
Further efficiency considerations. It has already been
noted that generation from instances of testfct with an
operator of "-=" are deferred until enough information is
available to execute the query using a list of parcel
identifiers. Some other steps have also been taken to
reduce data base access time and subsequent evaluation time.
For one thing, the semantic interpreter has a preferred
ordering for instances of the predicate testfct. For
example, the relation "WARD" divides the parcels of the city
into 6 classes, while the relation "LUC" (Land Use Code)
divides the parcels into several hundred classes. If there
is no intrinsic reason for ordering the instances of testfct
differently, the one with "LUC" will occur earlier in the
logical form (cf. Figure 4). The pre-evaluation specialist
for testfct makes use of this ordering in two ways. If a
variable has been assigned a list of identifiers containing
fewer members than some threshold (currently set to
25, but can easily be changed), then a retrieval will always
be made using the list of identifiers rather than by a
constant compared to data base values. In Figure 4, the
second call to the testfct specialist will look up the ward
of the four drug stores instead of finding the hundreds of
parcels in ward 2. In some instances, particularly for
relations like Land Use Code, this may result in more data
base accesses than retrieving a new set of keys depending on
value, but the improvement cannot be large. In many other
instances, there is a big reduction in accesses.
If the candidate set is larger than 25, retrieval will be
made using the constant, but the length of the current
candidate list is used to limit the number of accesses.
Thus, if the current candidate list is 50, the data base
access program will terminate if it finds more than 50
identifiers with the value being used. A re-access is then
made using the list of identifiers. Again, this may result
in inefficiency in some cases where searches are ended just
before normal termination, but it does provide a guarantee
against excessively long retrievals.
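The two uses of the candidate list length can be sketched as one retrieval routine. This Python sketch assumes an in-memory dictionary in place of the RSS access program; the function names, the strategy labels, and the toy WARD relation are hypothetical, and the treatment of a list of exactly 25 members is a choice of the sketch, since the text does not settle it.

```python
THRESHOLD = 25  # the text's current setting; "can easily be changed"


def by_key(relation, value, candidates):
    # One access per candidate identifier.
    return {k for k in candidates if relation.get(k) == value}


def retrieve(relation, value, candidates):
    """Return (matching candidates, name of the strategy actually used)."""
    if len(candidates) < THRESHOLD:
        return by_key(relation, value, candidates), "by-key"
    found = set()
    for key, stored in relation.items():      # search on the constant
        if stored == value:
            found.add(key)
            if len(found) > len(candidates):  # cut the search off
                return by_key(relation, value, candidates), "cutoff"
    return found & candidates, "by-value"


# Hypothetical relation: 100 parcels split evenly over two classes.
WARD = {parcel: parcel % 2 for parcel in range(100)}
```

A three-member candidate list is looked up by identifier; a 30-member list starts a search on the value, which is abandoned once more than 30 matches are seen and replaced by a re-access by identifier; a 60-member list lets the value search run to completion.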
Any number of other efficiency measures could be adopted,
and more may be necessary than we now have. For the moment,
these seem to provide acceptable retrieval times.
The Evaluator
For the most part, evaluation of logical forms is quite
straightforward. Hidden semantic effects are discussed in
the next section; here we are mainly concerned with
computation.
Each instance of setx searches the list of
variable-candidate set pairs to find the candidate set
associated with its own variable and substitutes the members
of the set for the variable one by one into its associated
predicate. Those members of the candidate set for which the
predicate evaluates to true are placed in the solution set.
Operation of the quantifier predicates is similar to that of
setx, except that, as in Figure 5, it may be necessary to
evaluate an instance of setx to find the domain of the
quantification variable.
Evaluation of the other predicates consists simply of
applying a corresponding LISP function to the arguments.
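The behaviour of setx and of a quantifier such as foratleast during evaluation can be sketched as two small Python functions. Predicates are passed as Python callables here, which is an assumption of the sketch; the toy JSTOR relation and the candidate heights are invented, and the example loosely mirrors the logical form of Figure 3.

```python
def setx(var, predicate, candidate_sets):
    """Return the members of var's candidate set that satisfy predicate."""
    return {member for member in candidate_sets[var] if predicate(member)}


def foratleast(n, domain, predicate):
    """True just in case n or more elements of domain satisfy predicate."""
    return sum(1 for member in domain if predicate(member)) >= n


# Hypothetical JSTOR relation: parcel identifier -> number of stories.
JSTOR = {"A": 2, "B": 1, "C": 2}
drug_stores = {"A", "B", "C"}              # the pre-evaluated inner set
candidate_sets = {"X4": {1, 2, 3}}         # candidate heights for X4

heights = setx(
    "X4",
    lambda h: foratleast(1, drug_stores, lambda p: JSTOR[p] == h),
    candidate_sets,
)
```

The result keeps exactly those candidate heights for which at least one drug store has that number of stories.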
Sometimes the final logical form to be evaluated bears no
obvious relation to the input question, as in Figure 8. The
usual reason is that a large amount of evaluation was done
Are there more than 25 parcels in the Carhart
neighborhood ?
36229 LOGICAL FORM:
(greaterthan '176 '25)
Figure 8
during interpretation, because the form contained no free
variables. The full logical form corresponding to Figure 8
Are there more than 25 parcels in the Carhart neighborhood ?
15986 LOGICAL FORM:
(forall 'X115
  (setx 'X38
    '(quantity
       X38
       (setx 'X34
         '(and
            (testfct
              '9
              '('T~~~~~ X34 '1976)
              '=)
            (parcel X34)))))
  (greaterthan X115 '25))
Figure 9
is given in Figure 9.
The evaluation of the predicate testfct is not as obvious
as that of the others. One of the design goals in the
project has been to make it relatively easy to move from one
data base to another. As part of that effort, we have
attempted to make the LISP programs, as contrasted to the
PL/I programs, insensitive to the structure of the data
base. Our approach to this has been to define a list
structure, essentially nested binary relations, into which
the real data structure is mapped. Restructuring is
accomplished by the PL/I program which serves as the LISP -
RSS interface. At the same time as the PL/I program returns
values to the testfct specialist during the pre-evaluation
phase, it formats the corresponding data base items into the
standard structure and writes them onto a disk file, in
effect creating a sub-data base for the particular query.
Only the sub-data base is used during evaluation of logical
forms, to find values corresponding to keys in the instances
of testfct. In addition to isolating the LISP programs from
the real data structure, this tactic makes it unnecessary
for any programs called by the evaluator to re-access the
full data base, with a consequent efficiency gain.
Cxeation of the s'tahdard LISP data. bxse into which the 
real data is translated hap mean* that the set of 1 SP 
functions has undergone the Least modification in our chang'e 
of data base from busmess statistics to planning data. 
Except fox improvements made to increase the efficiency of 
PAGE 29 
programg, these 3!!outines are almost the same as they were 
besore. 
SEMANTIC EFFECTS OF EVALUATION
In principle the processes which will be used to compute
the answer to a query should be obvious at the level of
either the query structure or the logical form. We have
not, however, been completely successful in accomplishing
this. In some cases, we can see how it might be done and
have not gotten around to doing it because of more urgent
concerns. In other cases, we can see how to do it, but not
how to do it efficiently. In a few cases, it is not clear
what to do.
Approximation. Consider the sentence and corresponding
logical form shown in Figure 10. The precise system meaning
of "about" is clearly hidden in the program corresponding to
the operator APPROX. In the present implementation, APPROX
of x and y is true if:
1) when y<10, x>y-2 and x<y+2,
2) when 10<y<40, x>y-3 and x<y+3,
3) when y>=40, x>y-.05y and x<y+.05y.
I.e., 6 and 8 are approximately equal to 7, 14 and 18 are
approximately equal to 16, and 951 and 1049 are
approximately equal to 1000. Whether this definition is
What parcels are assessed at about $ 1000000 ?
6168 LOGICAL FORM:
(setx 'X2
  '(and
     (testfct
       '1000000
       '([illegible] X2 '1976)
       'APPROX)
     (parcel X2)))
6373 ANSWERS:
ASSESSMENT
[values illegible]
Figure 10
satisfactory or not clearly depends on a variety of
contextual factors. It should also be clear that the
semantic interpreter could produce a logical form in which
this meaning was expressed directly. We have chosen to
express the meaning in our processing programs primarily for
convenience, i.e. it was easiest to do it in this way, and
there was no obvious reason to do it elsewhere.
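The three tolerance bands of APPROX can be written out as a short Python function. This is a minimal sketch rather than the system's own code: the function name and argument order are illustrative, and the treatment of y = 10 (placed in the second band here) is an assumption, since the conditions as printed leave that value uncovered.

```python
def approx(x: float, y: float) -> bool:
    """True if x is approximately equal to the reference value y."""
    if y < 10:
        tolerance = 2
    elif y < 40:          # assumption: y == 10 is handled by this band
        tolerance = 3
    else:
        tolerance = 0.05 * y
    # Both bounds are strict, as in the stated conditions.
    return y - tolerance < x < y + tolerance
```

For instance, approx(14, 16) and approx(1049, 1000) hold, while approx(13, 16) does not, which agrees with the examples given in the text.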
A similar but slightly different example is shown in
Figure 11, where the output rather than the input is to be
an approximation to the true value. In this instance, a
function called FUZZUP is applied to a data base value to
About how many square feet do the drug
stores have ?
7227 LOGICAL FORM:
7479 ANSWERS:
Figure 11
find that number with the maximum number of trailing zeros
which satisfies the APPROX relation. The fuzzed value
rather than the true value becomes the output.
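One way to realize the behaviour described for FUZZUP is to try progressively finer powers of ten and keep the first rounded value that still satisfies APPROX. The sketch below is hypothetical: the search strategy, the restriction to positive integers and the helper name approx are assumptions, and the tolerance bands are repeated only so that the example is self-contained.

```python
def approx(x: float, y: float) -> bool:
    """Tolerance bands of APPROX (y == 10 assigned to the second band)."""
    tolerance = 2 if y < 10 else 3 if y < 40 else 0.05 * y
    return y - tolerance < x < y + tolerance


def fuzzup(value: int) -> int:
    """Return the value with the most trailing zeros that approximates value."""
    if value <= 0:
        return value                       # assumption: only positive values are fuzzed
    power = 10 ** len(str(value))          # coarsest granularity worth trying
    while power > 1:
        # Nearest multiple of the current power of ten.
        candidate = ((value + power // 2) // power) * power
        if candidate != 0 and approx(candidate, value):
            return candidate
        power //= 10
    return value                           # no rounder number is close enough
```

With these assumptions a stored area of 958320 would be reported as 1000000, and 7342 as 7000.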
A more subtle case is illustrated by Figure 12. It seems
clear that what is really wanted are those parcels with an
area of a million square feet or more, rather than exactly
1,000,000 square feet. If the latter result is wanted, the
question is better phrased "exactly 1,000,000" (and must be
phrased in this or a similar way in our system). On the
other hand, a value like 1,000,205 seems to imply that exact
equality is wanted. This intuition is captured in our system
What parcels have an area of
1,000,000 square feet?
8416 LOGICAL FORM:
(setx 'X2
  '(and
     (foratleast 1 'X45
       (setx 'X5
         '(testfct
            X5
            '(PARAREA X2 '1976)
            '=))
       '(equal X45 '1000000))
     (parcel X2)))
8789 ANSWERS:
80300000101
MORE PARTICULARS DESIRED?
YES OR NO?
Yes
EXPLANATIONS TO THE ANSWERS:
FOR 70880000900 MORE - 13590410
FOR 70790000100 MORE - 5977500
FOR 70790000100 MORE - 5583085
FOR 80300000101 ALMOST - 958320
Figure 12
by having the testfct predicate inspect its numeric
arguments with a function called ROUNDNM, which is true if
an argument is a round number, defined in our system to be a
number greater than 99 in which at least the rightmost half
of its digits are 0. In the case of round numbers, it seems
reasonable to give as an answer the identifier of a parcel
whose area is only slightly less than 1,000,000 square feet,
as well as greater. In our implementation, we use the same
lower limit as for APPROX, but this may be too low. In
order to insure that the answer is correctly understood by
the user, the system saves the exact values retrieved and
displays them on request, as shown in Figure 12.
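The round-number test and the relaxed comparison it triggers can be sketched as follows. The names roundnm and matches are illustrative, and two points are assumptions rather than statements from the text: "at least the rightmost half" is read as "trailing zeros make up at least half of the digits", and the lower limit is taken from the third APPROX band because round numbers exceed 99.

```python
def roundnm(n: int) -> bool:
    """True for a number above 99 whose rightmost half (or more) is zeros."""
    if n <= 99:
        return False
    digits = str(n)
    trailing_zeros = len(digits) - len(digits.rstrip("0"))
    return 2 * trailing_zeros >= len(digits)


def matches(target: int, value: int) -> bool:
    """Compare a data base value with the number given in the question."""
    if roundnm(target):
        # Round target: accept anything above the APPROX lower limit.
        return value > target - 0.05 * target
    return value == target      # a non-round number implies exact equality
```

Under these assumptions matches(1000000, 958320) holds, corresponding to the ALMOST entry in Figure 12, while a target of 1000205 is matched only by exactly that value.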
Equality of character values. A problem analogous to that
of numerical approximations occurs also in comparing
character string values. Consider the question and answer
pair shown in Figure 13. The contents of the OWNER field
What parcels does Shell own ?
4244 LOGICAL FORM:
(setx 'X2
  '(and
     (testfct
       'SHELL
       '(OWNER X2 '1976)
       '=)
     (parcel X2)))
4432 ANSWERS:
SHELL OIL COMPANY
SHELL OIL CO
Figure 13
have not been standardized, so that parcels could be owned
by "Shell Oil", "Shell Oil Co.", etc. Fortunately, for names
of persons, last names are listed first, so that the
strategy of assuming equality if the input argument and the
field value match up to a comma or a blank is generally
successful. Problems do arise; for example, properties
belong both to "The City of ..." and "City of ...", where
the left match fails to find all the relevant data items.
The opposite situation, i.e., over-generalization, can of
What parcels does Gluck own ?
4525 LOGICAL FORM:
(setx 'X2
  '(and
     (testfct
       'GLUCK
       '(OWNER X2 '1976)
       '=)
     (parcel X2)))
GLUCK, DE & ORS
GLUCK, CP
Figure 14
course also occur, cf. Figure 14. In any event, the
decision as to what constitutes sameness of reference is
buried in computer code, in this instance in the PL/I
program as well as in the LISP definition of the function
SAMEREF.
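The left-match strategy described above can be illustrated with a small Python function. It is a sketch only: the name sameref is borrowed from the text, but the body, the upper-casing of both strings and the example owner "SHELLEY, J" are assumptions made for illustration.

```python
def sameref(argument: str, field_value: str) -> bool:
    """Left match: the field must equal the argument up to a comma or blank."""
    arg = argument.strip().upper()
    value = field_value.strip().upper()
    if not value.startswith(arg):
        return False
    rest = value[len(arg):]
    # Accept an exact match, or a match that stops at a comma or a blank.
    return rest == "" or rest[0] in ", "
```

The function accepts "SHELL OIL CO" and "GLUCK, DE & ORS" for the arguments SHELL and GLUCK, and it also reproduces the failure noted in the text: the argument CITY does not match a field beginning "THE CITY OF".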
Definitions. The extensional definition of most
predicates can be derived from the data base. A few
predicates are defined by the system code. Examples are RANK
and YEAR, which as mentioned above have associated
generators. An additional example is LASTYEAR, which is
defined to be the previous year. Many other definitions of
this kind have been eliminated in the current version of the
system.
Answers. It is not always obvious what constitutes the
answer to a question. Consider the example in Figure 15.
Both the English question in its literal reading and the
logical form would seem to imply that the question would be
answered by presenting only the numbers in the right hand
column of the table which is actually printed as an answer.
Yet it is quite clear that a simple list would generally be
useless without the parcel identifiers printed on the left,
and indeed that identification would be expected by the
person entering such a question. The example of Figure 16
What is the gross floor area of the drug stores ?
7205 LOGICAL FORM:
7465 ANSWERS:
GROUND-FLOOR
AREA-SQ-FT
Figure 15
is less clear. An enumeration of the three wards in which
the four drug stores were located might have been a
sufficient answer. The answer given would be correct for
"In what ward is each drug store located?"
Moreover, given the question
"What are the wards which have drug stores?"
it is clear that only a list of wards should be the output,
and given
"What is the combined floor area of the drug stores?"
only a single number representing the total is the desired
In what wards are the drug stores located ?
9403 LOGICAL FORM:
(setx 'X3
  '(foratleast 1 'X64
     '(90430000910 80100000811 80100000710
       70590001610)
     '(testfct
        X3
        '(WARD X64 '1976)
        '=)))
9597 ANSWERS:
WARD
Figure 16
answer. (Our system does not as yet answer this question or
its analogues, although this is planned for later in the
year.) Since the ambiguity exhibited by the question of
Figure 16 is so pervasive in an application of this kind, we
have chosen to present a maximally general answer, including
identifications, when we are unable to resolve the ambiguity
directly. An exchange with the user could be devised to
elicit the information for resolution, but would rapidly
become tedious for questions of this type. For yes/no
questions, and for questions in which there is only one
object in the answer set, this problem naturally does not
arise, and the appropriate answer is easily produced.
We have not yet concerned ourselves with adding an
English response generator to the TQA system. In the
applications envisioned at present, such a capability does
not seem to be critical. We are able to manage with short
answers from the data base and with canned information and
error messages. In spite of this omission, it should also
be apparent that our computational component has a
considerable amount of linguistic knowledge embedded in it,
more than we would like. Whether it is possible to achieve
a level of formal representation which would make this
unnecessary is still unclear. Moreover, even if it were
possible, it is not clear whether such a solution would be
efficient enough, or even if it would be more perspicuous
than the current system. We intend to proceed as far as we
are able in this direction, out of conviction that
practically useful systems must be easily adaptable to new
applications, and that such adaptation is much more
difficult when computer code, even high-level computer code,
must be changed, rather than tables. This is not to imply
that we regard modification of a table whose size is on the
order of a grammar as trivial; quite the contrary.
Nonetheless, we believe it is easier to change a grammar or
a semantic interpreter expressed in table form than it is to
change a special parser or a special interpreter. In
essence, we believe it should not be necessary for a
computational linguistics project to describe operations
beyond the last level of formal representation in order for
an outsider to understand exactly how a system operates.
This system was formerly called REQUEST.
The form of Figure 3 is, in fact, subject to another
syntactic transformation prior to execution. Normally,
foratleast needs to be executed once for each potential
value of the setx variable. However, in the case where the
quantificational range of foratleast 1 is a constant,
repeated evaluation of the quantifier is quite
inefficient. Instead, a special retrieval function called
MAPFIELD, which can accept a list of arguments, replaces
forms like those of Figure 3. In this example the
replacement takes the form
(MAPFIELD 'X77 'JSTOR '(5043 .... ... 00) '1976 '=)
Although this transformation arises quite often in practice,
it is sufficiently non-general that we have not augmented
our inventory of logical forms by including MAPFIELD.
Instead, we look on it as an implementation measure only.
American Journal of Computational Linguistics 
Microfiche 75: 43 
ONE MORE STEP TOWARD COMPUTER LEXICOMETRY
NICHOLAS V. FINDLER AND SHU-HWA LEE 
Department of Computer Science 
State University of New York at Buffalo 
4226 Ridge Lea Road 
Amherst, New York 14226
ABSTRACT 
We describe the continuation of an earlier work on the
problem of lexical coverage. The objective is to prove
experimentally certain mathematical conjectures concerning the
relationship between the sizes of the covering and covered sets
of words, and the maximum length of dictionary definitions. The
data base on which the experiments are carried out has also been
extended to the full contents of an existing dictionary of
computer terminology. The results of the previous and present
work lay the foundations for quantitative studies on lexical
valence and its relation to the frequency of usage and other
principles of dictionary selection.
Besides the inherent interest in these investigations, the
concepts dealt with and the methods of quantifying dictionary
variables may eventually lead to more efficient dictionaries with
respect to precision, compactness, and computer time and memory
needed for processing.
Supported by NSF Grant MCS 76-24278.
First, we shall introduce the problem, define some basic
terms and provide a brief historical account of past results. In
order to render this paper fairly self-sufficient, a brief
summary of the previous work, Findler and Viil (1974), will also
have to be given.
A monolingual dictionary may be considered economical and
efficient if a small set of words is used to define a relatively
large set of entries. Quantitative information as to what size
vocabulary is needed to cover a given number of entries is very
scarce and may be characterized by two "data points":
The New Method English Dictionary published by M.P. West and
J.G. Endicott in 1961 uses 1,490 self-defined basic words to
explain some 18,000 words and 6,000 idioms, i.e. about 24,000
expressions. Thus, the size ratio is 0.062.
Ogden's Basic English, published in 1933, involves 850
English words and 50 "international" words to define 20,000
English words. The ratio of the covering and covered set sizes
is 0.045.
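The two size ratios quoted above follow directly from the stated counts, as the short calculation below shows. The function name is illustrative; the figures are those given in the text (1,490 basic words for about 24,000 expressions, and 850 + 50 words for 20,000 entries).

```python
def size_ratio(covering_size: int, covered_size: int) -> float:
    """Ratio of the covering set size to the covered set size."""
    return covering_size / covered_size


west = size_ratio(1490, 24000)       # New Method English Dictionary
ogden = size_ratio(850 + 50, 20000)  # Ogden's Basic English
print(f"{west:.3f} {ogden:.3f}")     # prints "0.062 0.045"
```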
The basis of selection was the "usefulness" of the words
employed in the definitions, as opposed to the frequency of their
occurrence in some standard texts. However, neither this concept
nor other principles of selection suggested by other researchers
have ever been quantitatively analyzed and made use of. We shall
discuss these issues later on.
In order to approach the problem in definite terms, Findler
(1970) considered three basic variables:
(i) the covering set, R, of size vR,
(ii) the covered set, S, of size vS,
(iii) the maximum definition length, N,
such that each word in S can be defined by at most N ordered
words from R.
The task was formulated to find
(a) vR as a function of vS at different parametric values
of N, and
(b) vR as a function of N at different parametric values
of vS.
Calling ΔvR/ΔvS the increment ratio and vR/vS the size ratio,
the following conjectures were made concerning the first task:
(a1) The increment ratio is, in general, less than one.
(a2) The increment ratio, in general, decreases as vS
increases.
(a3) For large constant values of N, vR approaches a
limiting value asymptotically as vS increases.
(a4) The increment ratio never exceeds the size ratio.
Two points need to be noted in this connection. An exception
to rules (a1) and (a2) would occur in a dictionary system
which does not treat polysemous words or homonyms as individual
entries, every time a new word with many meanings or homonyms is
introduced into the covered set. Second, the cited case is an
exception to rule (a1) but not to (a4). When N=1, the covering
and the covered sets are of the same size, i.e. both the
increment ratio and the size ratio equal one. However, not every
word is defined by itself only. If a new word is introduced that
already has a synonym in the covering set, it will be defined by
that synonym. In this case, the increment ratio is 0 and the size
ratio becomes less than 1. (This will be clear with the
description of the data base construction on page 11.)
For the second general task, (b), the following
conjectures were also made:
(b1) vR monotonically decreases as N increases.
(b2) For any fixed value of vS, vR asymptotically
approaches a lower limit as N increases without bound.
It seems reasonable to state in a qualitative sense that in
the process of generating a dictionary, smaller vR values mean
smaller storage requirements whereas smaller N values tend to
reduce processing time and output volume. In order to answer the
question "What are the optimum values of vR and N for a given vS
for a certain (family of) computer applications on a machine with
a given cost structure?" one has to consider the interrelation
of the above three basic variables and to compute three entities:
the semantic index (roughly, the number of different meanings) of
the elements in the covered set, the lexical valence (roughly,
the capability of being substituted for one another) of the
elements in the covering set, and the frequency of occurrence of
the elements of both sets. Quantitative investigations of the
last three dictionary variables are planned to follow the
present, second stage of our study.
THE DATA BASE AND THE PROGRAM
We have extended the data base used in our previous work,
Findler and Viil (1974).
The whole contents of the dictionary on
computer technology, Chandor (1970), is now included in the
present study. Its structure, rather simple and uniform, is
described below. First, some general principles of data base
construction are outlined.
Every element of the covered set is considered a single
lexical item, regardless of the number of words the original
dictionary entry consists of. Also, each word is coded as a
string of at most 10 characters (containable in one CDC Cyber
computer word). The abbreviations are still easy to read with
relatively short practice.
Only the dominant meaning of polysemous terms was dealt
with.
Each entry has thus one meaning and one definition.
Terms in the definitions (elements of the covering set) are
also considered lexical items, i.e. even multiword entities
appear as a single unit and are represented by at most 10
characters.
The basic vocabulary, that is the covering set, consists of
elements that also appear in the covered set. In our particular
case, they are non-technical words used to define the technical
terms of the computer dictionary. A definite distinction was
made between content words and function words (also called
operators). The latter were not included in the covering set nor
were they counted in determining the length of definitions.
Hence, the covering set consists only of content words.
The function words indicate grammatical and logical
relationships between the words contributing to the content.
They belong to 11 categories:
1) prepositions, e.g. of, in, to;
2) conjunctions, e.g. and, or, if;
3) the relative pronoun which;
4) combinations of preposition and relative pronoun, e.g.
in which, to which, by which;
5) present participles equivalent to a preposition, e.g.
using, containing, representing;
6) combinations of participle and preposition, e.g.
consisting of, opposed to, applied to;
7) combinations of adjective and preposition, e.g. capable
of, exclusive of, equal to;
8) combinations of noun and preposition, e.g. part of, set
of, number of;
9) combinations of preposition, noun, and preposition, e.g.
in terms of, by means of, in the form of;
10) prepositional phrases associated with a following
infinitive, e.g. used to, necessary to, in order to;
11) other frequently used purely functional expressions, e.g.
for example, namely, known as.
Actually, the function words were replaced by code numbers
in the dictionary. The code numbers were assigned consecutively
as the function words were needed during the construction of the
data base so that the order is purely random. A complete list of
the 121 function words used, together with their code numbers, is
given in Table I.
The original definitions were somewhat simplified and
standardized. In this process, articles were omitted (many
languages do very well without them). On the other hand,
implicit relationships were made explicit. Nouns are represented
in singular, thus avoiding another dictionary entry for plural
or, what would be worse, programming a "grammar". Likewise,
finite verb forms are represented in third person plural present
indicative active.
Avoiding the third person singular eliminates
another dictionary entry, and avoiding the passive voice
eliminates a great many participles, which otherwise would have
had to be entered. Of course, present and past participles (the
former identical to gerund in form) could not always be avoided
and had to be entered in the dictionary where needed. Auxiliary
verbs were automatically eliminated by avoiding compound tenses
and the passive voice. Finally, "to do" associated with negation
was simply omitted.
Some examples will make the encoding process clear.
Original dictionary entry:
aberration A defect in the electronic lens system of a
cathode ray tube.
Definition in the data base:
DEFECT (in) SYSTEM (of) ELECTRONIC LENS (of)
CATHRAYTUB
TABLE I
List of Function Words

Entries whose code numbers are not legible, in order of appearance:
is equivalent to; of; in; in terms of; using; and; which; in which;
between; to; or; from; used to; necessary to; part of;
consisting of; containing; capable of; by means of; opposed to;
when; on; so that; in order to; exclusive of; for; pertaining to;
under; as; such as; among; by; namely; related to;
concerned with; based on; constituting; resulting from; set of;
including; followed by; provided by; developed by; assigned to;
referred to; used as; in the form of; from which; into which;
number of; less; defining; known as; performing; performed by;
independent of; chosen by; for which; equal to; into; with;
according to; applied to; depending on; to which; whose;
obtained by; inherent in; through; during; where; during which;
out of; at; by which; used in; without; caused by; over; not;
but; extended to; so as to; for example; represented by;
along which; representing; against which; similar to

Entries with legible code numbers:
 92. at which          107. determined by
 93. whether           108. over which
 94. used by           109. in relation to
 95. about             110. belonging to
 96. before            111. corresponding to
 97. per               112. due to
 98. having            113. required for
 99. formed by         114. type of
100. around            115. across
101. after             116. because
102. since             117. designed
103. against           118. indicating
104. until             119. produced by
105. whereupon         120. outside
106. except            121. towards
Note that "electronic lens system" (should be:
electronic-lens system) means "system of electronic lens" (as
opposed to "electronic system of lens"), and this relationship is
made explicit. Note also that "cathode ray tube" is a single
lexical item.
Original dictionary entry:
absolute coding Program instructions which have been written
in absolute code, and do not require further processing
before being intelligible to the computer.
Data-base entry: ABSOCODING
Definition:
PROGRAM INSTRUCTIO (which) ONE WRITE (in)
ABSOLITCODE (and which not) REQUIRE FURTHER
PROCESSING (before) INTELIGIBL (to) COMPUTER
Note that the first predicate in the relative clause, third
person plural perfect indicative passive, is represented by the
singular indefinite pronoun "one" as subject, followed by the
standard plural active verb. The auxiliary "do" has been omitted
and the negation is represented by a function word. The
virtually redundant "being" has also been left out. In general,
the copula is omitted (some languages do very well without it).
Original dictionary entry:
analytical function generator A function generator in which
the function is a physical law. Also known as natural law
function generator, natural function generator.
Data-base entry: ANLYTFNCGN
Definition:
FNCGENRTR (in which) FUNCTION PHYSICAL LAW
Note also the omission of the gloss "Also known as ...".
The stylized definitions are easily understandable even to
human readers, as the printout of the dictionary demonstrates.
The data base was constructed by selecting the first entry,
then entering all the lexical items in its definition,
subsequently entering all the lexical items in the definitions of
these, etc. Words that were not defined in the original
dictionary were entered and defined by themselves; they
constitute the basic vocabulary. This procedure was continued
until everything was defined, i.e. until all the terms in the
covering set were also in the covered set. Then the next entry
was selected from the dictionary, and the above process was
repeated.
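The construction procedure just described amounts to closing the set of entries under "appears in a definition". A minimal Python sketch of that closure is given below; the function name, the dictionary representation and the sample definition of CATHRAYTUB are hypothetical and serve only to illustrate the idea.

```python
from collections import deque
from typing import Dict, List


def build_data_base(original: Dict[str, List[str]],
                    entries: List[str]) -> Dict[str, List[str]]:
    """Enter each selected entry and, transitively, every word in its definition."""
    data_base: Dict[str, List[str]] = {}
    for entry in entries:                     # select the next entry
        pending = deque([entry])
        while pending:
            word = pending.popleft()
            if word in data_base:
                continue
            # Words not defined in the original dictionary define themselves;
            # they form the basic vocabulary.
            definition = original.get(word, [word])
            data_base[word] = definition
            pending.extend(w for w in definition if w not in data_base)
    return data_base


# Hypothetical excerpt; only the ABERRATION definition comes from the text.
original = {
    "ABERRATION": ["DEFECT", "SYSTEM", "ELECTRONIC", "LENS", "CATHRAYTUB"],
    "CATHRAYTUB": ["TUBE", "ELECTRON"],
}
data_base = build_data_base(original, ["ABERRATION"])
basic_vocabulary = [w for w, d in data_base.items() if d == [w]]
```

After the call every word used in a definition is itself an entry, which is the condition that all terms in the covering set are also in the covered set.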
The dictionary was arranged in the form of a SLIP list,
Findler et al. (1971). Every entry (element of the covered set)
occupies four cells in this list: (1) entry word (as
character data, using FORTRAN format specification A10), (2)
definition length (an integer), (3) type of entry (an integer),
(4) sublist name.
Three types of entries were distinguished for programming
convenience:
1) code 0 indicates that the entry itself is not used in
any definition, i.e. it occurs only in the covered set and
not in the covering set;
2) code 1 indicates that the entry occurs in both sets and
is not an element of the basic vocabulary;
3) code 2 indicates that the entry is defined by itself,
i.e. it belongs to the basic vocabulary.
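The four cells of an entry and the three type codes can be mirrored by a small record type. The sketch below is an assumption-laden illustration: it uses a Python dataclass instead of SLIP list cells, stores the definition directly instead of a sublist name, and the sample data base (apart from the RUN definition, which follows Figure 1) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    word: str                  # entry word (at most 10 characters in the original)
    definition_length: int     # number of content words in the definition
    entry_type: int            # 0, 1 or 2 as described above
    definition: List[str]      # stands in for the sublist holding the definition


def make_entries(data_base: Dict[str, List[str]]) -> List[Entry]:
    """Classify every entry by its role in the covering and covered sets."""
    used = {w for word, d in data_base.items() for w in d if w != word}
    entries = []
    for word, definition in data_base.items():
        if definition == [word]:
            code = 2           # defined by itself: basic vocabulary
        elif word in used:
            code = 1           # occurs in both sets, not basic
        else:
            code = 0           # never used in a definition
        entries.append(Entry(word, len(definition), code, definition))
    return entries


# Hypothetical data base of content words only.
data_base = {
    "JOB": ["ONE", "RUN"],
    "RUN": ["PERFORMANCE", "ONE", "PROGRAM", "ROUTINE"],
    "PROGRAM": ["INSTRUCTIO", "SET"],
    "ROUTINE": ["PROGRAM", "PART"],
    "PERFORMANCE": ["PERFORMANCE"], "ONE": ["ONE"],
    "INSTRUCTIO": ["INSTRUCTIO"], "SET": ["SET"], "PART": ["PART"],
}
entries = {e.word: e for e in make_entries(data_base)}
```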
The sublist, the name of which is in the fourth cell for
every entry in the main list, contains the definition. This
arrangement conveniently separates the entry words from those in
the definitions.
A cell in this second level contains either a word (in A10
format), i.e. an element of the covering set, or a sublist name.
The codes for function words (integers) are contained in the
cells in the third level. This arrangement is convenient for
bypassing the function words in processing when they are not
needed. The general dictionary entry and an example thereof are
illustrated in Figure 1.
The fact that every dictionary entry owns a sublist is
practical in another respect: useful information about the entry
can be collected and deposited in a description list associated
with the sublist. For example, if it were desired to evaluate
the definition component of the lexical valence of each lexical
item, a program could be developed that counts how many times a
particular item occurs in the definition of other items and
stores this information in the description list created for that
item. Investigations of this nature will be done subsequently.
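The counting program suggested above could look roughly like the following Python sketch. The function name and the small data base are hypothetical, and a Counter takes the place of the description lists mentioned in the text.

```python
from collections import Counter
from typing import Dict, List


def definition_counts(data_base: Dict[str, List[str]]) -> Counter:
    """Count how often each item occurs in the definitions of other items."""
    counts: Counter = Counter()
    for word, definition in data_base.items():
        for item in definition:
            if item != word:          # a self-definition does not count
                counts[item] += 1
    return counts


# Hypothetical data base of content words only.
sample = {
    "JOB": ["ONE", "RUN"],
    "RUN": ["PERFORMANCE", "ONE", "PROGRAM", "ROUTINE"],
    "ROUTINE": ["PROGRAM", "PART"],
    "PROGRAM": ["PROGRAM"], "ONE": ["ONE"],
    "PERFORMANCE": ["PERFORMANCE"], "PART": ["PART"],
}
counts = definition_counts(sample)
```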
The task is to establish experimentally the relationship
between N and vR for fixed values of vS. The program starts out
with the values of some fixed data point obtained in the previous
FIGURE 1a. Data Structure for a Dictionary Entry: four cells in
the main list (entry word, definition length, entry type, sublist
name), with the sublist holding the definition.
FIGURE 1b. An Exemplary Dictionary Entry:
RUN =: PERFORMANCE of (=2) ONE PROGRAM or (=11) ROUTINE
Definition length: 4; entry type: 1.
study, Findler and Viil (1974), or one calculated for the
extended data base.
The size of the covering set, vR, is then
[...] definitions of length 1, 2, 3, 4, etc. (Recall, code 1 means
that such entries are not defined by themselves and occur both in
the covering and the covered set.) After the substitutions are
made in all definitions and the words are counted out of vR, the
[...] for different size covered sets, i.e. vS is kept at different
constant levels, for each N. (We note that a quantitatively
more satisfactory refinement could have been added to the [...]
with all the remaining definitions, and those which do not appear
in any definition are to be eliminated. Thus a basic word would
occur in the dictionary only if it is needed in a definition,
which is the case in the unreduced dictionary. This way, a more
natural proportion between the basic words and others could be
restored. However, in the present preliminary work, we did not
wish to pay the considerably higher price for such refinement.)
The program is very complex for two basic reasons. First,
the definitions of words to be replaced may themselves contain
one or more words to be replaced. Therefore, as many as
necessary iterations of replacement have to be carried out in the
process. Second, the huge data base representing the whole
dictionary had to be subdivided into files, only one of which
can be dealt with by the program at a time. The intermediate
results of one run have to be
transferred to the subsequent run,
which requires some tricky programming. A brief description of
the multi-file handling is given in the Appendix.
Figure 2 summarizes the results for four different levels of
the covered set. Although the procedure followed (leaving one
and then two files out of the nine, and adjusting for the bias
introduced) leads to quantitative inaccuracies, the conjectures
listed in the Introduction are fully corroborated.
FINAL COMMENTS
The data base encoded, some of the programs used and, most
of all, the experience gained in dealing with dictionaries and
their characteristic variables will be useful in attacking the
next set of problems. The latter relate to the question of what
size vocabulary is needed to cover a given number of dictionary
entries (without the ubiquitous circular definitions). The
answer should be given as a function of storage requirements and
processing time so that an optimum solution can be obtained for a
family of applications on a machine with a given cost structure.
Such study will involve the semantic index of the elements of the
covered set, the lexical valence of the elements of the covering
set, and the frequency of occurrence of the elements of both sets.
ACKNOWLEDGMENTS
We thank H. Viil, who co-authored with one of us (N.V.F.)
the first phase of this work, for many ideas and stimulating
discussions. We are also indebted to Penguin Books for their
permission to use one of their publications as our data base.
FIGURE 2. Variation of Maximum Definition Length with the Size of
Covering Set. Curves A to D correspond to four sizes of the
covered set; the legible legend entries are curve B, vS = 2300;
curve C, vS = 2480; curve D, vS = 2877.
In the following, we give a brief description of the way
multi-file handling has been organized.
It was noted before that the whole dictionary could not be
fitted in the core memory at one time and, therefore, the data
base had to be subdivided into 9 files to be processed
separately. There was a need, however, for some flow of
information between runs dealing with the different files. This
was arranged by additional files constructed during processing
time as well as a few control variable values being read from
cards at the beginning of runs subsequent to the first one.
The variable KNTPFT indicates the section of the dictionary
currently under study. The variable IPCONT is set to 0 for the
very first run for each N value. This tells the program to set
up new lists for Covered List, Covering List, and so-called
Waiting List. In all subsequent runs, its value is 1, which
indicates that the program must bring these lists in from an
additional, external file.
The program examines the current section of the dictionary,
entry by entry. If the entry is an element of the basic
vocabulary (type 2), the program bypasses it when it deals with
the unreduced dictionary (it is bound to be processed as part of
a definition later). Otherwise, this type of word is immediately
added to both the Covered List and the Covering List (such word
always covers itself), since the definitions in which they occur
may have been eliminated.
If a word is not found in the Covered List, it is put there
and the appropriate counter is incremented. Then all the words
in the definition of the word in question are put on the Waiting
List, which is subsequently processed. This is necessary because
of the adopted principle that all the covering words must
themselves be covered. (Tabulated data are meaningful only if
this condition is satisfied.)
The program eventually examines the Waiting List word by
word. If the current word is already on the Covered List (it may
have occurred earlier in the dictionary), the program checks if
it is also in the Covering List (it may not be because it has not
yet occurred in the definition of another word). If not, it is
put there and the appropriate counter is increased. All words on
the Waiting List come from definitions and must therefore be
added to the Covering List. After a word has been processed, it
is deleted from the Waiting List (but its processing may have
caused new entries to appear on the Waiting List).
If the current word is not on the Covered List, it must, of
course, be put there. First, however, the program tests if the
word occurs in the section of the dictionary currently in core
memory (its numerical value is between those of the first and
the last word of the section). If the word is not there, its
processing is postponed and the next word on the Waiting List is
examined, because it is more economical to process first all the
words available in the dictionary section present than to read in
other sections of the dictionary as the words dictate it (memory
swapping is expensive).
When the bottom of a non-empty Waiting List is reached, the
words remaining there must be in other sections of the
dictionary. Subsequent dictionary sections are brought in, to
replace the current one, in a cyclic manner until all processing
is completed.
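
The list handling described above can be sketched in a few lines of
modern code. This is only an illustration under stated assumptions, not
the original program: each dictionary section is assumed to be a mapping
from a headword to the words of its definition, the function name
build_lists and the sample entries are hypothetical, the counters and
the special treatment of basic-vocabulary (type 2) entries are omitted,
and a stop condition for words defined in no section is added so that
the sketch terminates.

```python
from collections import deque


def build_lists(sections):
    """Return (covered, covering, unresolved) for a sectioned dictionary.

    `sections` is a list of dicts; each maps a headword to the list of
    words in its definition. One section is "in core" at a time.
    """
    covered, covering = set(), set()
    waiting = deque()

    def process_waiting(section):
        # Examine the Waiting List word by word against the section in core.
        postponed = deque()
        resolved = 0
        while waiting:
            word = waiting.popleft()
            covering.add(word)            # every waiting word came from a definition
            if word in covered:
                resolved += 1
            elif word in section:         # entry is in core: cover it now
                covered.add(word)
                waiting.extend(section[word])
                resolved += 1
            else:                         # entry is in another section: postpone
                postponed.append(word)
        waiting.extend(postponed)
        return resolved

    # Examine each section entry by entry.
    for section in sections:
        for headword, definition in section.items():
            if headword not in covered:
                covered.add(headword)
                waiting.extend(definition)
            process_waiting(section)

    # Bring sections in cyclically until the Waiting List is empty, or until
    # a full cycle resolves nothing (assumed stop: words defined nowhere).
    while waiting:
        if sum(process_waiting(section) for section in sections) == 0:
            break
    return covered, covering, set(waiting)


# Made-up two-section dictionary, for illustration only.
sections = [
    {"cat": ["small", "animal"], "small": ["not", "big"]},
    {"animal": ["living", "thing"], "big": ["not", "small"]},
]
covered, covering, unresolved = build_lists(sections)
print(sorted(covered), sorted(covering), sorted(unresolved))
```

Postponing words whose entries are not in the current section, rather
than fetching their section at once, mirrors the economy argument in
the text: all the work that the section in core allows is done before
another section is read in.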
COMPUTATION IN DEPARTMENTS OF LINGUISTICS
RICHARD FRITZSON
Department of Linguistics
State University of New York at Buffalo
Buffalo, New York 14261 
That computers and linguists meet, for the most part, only in the still
somewhat exotic field of computational linguistics is a sad statement about
the state of ordinary linguistic research. The time when computers were to
be considered only the tool of the natural scientist or the statistically
minded social scientist is long past; 'word processing technology' is now
the specialty of a growing number of computer companies. Not only can this
technology be of great value in reducing the clerical burden of the linguist
and linguistics student, but linguists, as specialists who have been studying
and manipulating language for years, are in a position to be contributing to
this field.
In fact, in many areas of linguistic research (the analysis of
particular languages, the search for linguistic universals, the analysis of
discourse and text), computer technology can be of help to the linguist, and,
in many subfields of computer science (automated language processing, the
design of human/machine interfaces, the structuring of data bases), linguistics
has much to offer the computer scientist. Yet up until now, relatively few
such cross contributions have been made. Computer scientists have been slow
to discover the value of linguistics to their work; the time has come for
linguists to take the initiative and to train themselves (and their students)
to make use of and contribute to the field of computer science.
Specialized training in the use of the computer within a particular
discipline is not new. Students in many social sciences now find themselves
facing increasing pressure and mandatory requirements to take computer training
within their department. Linguistics is, in fact, unusual in not having such
requirements or even opportunities. At a time when graduating linguistics
students are facing a shrinking job market, the opportunity to be trained in a
commercially useful application of linguistics ought to be attractive to many
students.
Today, in most universities, computing is available to linguistics
departments only through the use of a large, central university computer which
is expected to be of service to all university departments. But, as computer
costs continue to fall, and as large computing centers continue to be
unresponsive to the needs of their new users, it will not be uncommon to find
more and more departments purchasing their own computing facilities and buying
or developing their own software. This is already happening today, both by
externally funded individual researchers and by entire departments in need of
specialized computing facilities. What kinds of computing equipment are
available for a linguistics department trying to equip itself today?
My answer is structured, to some extent, by the organization of language.
It is widely understood, even by non-stratificational linguists, that the
faculty of language is based on a stack of structured systems, each one building
a large number of units above from a smaller number below, i.e., a handful of
phonetic features combine to form less than fifty phonemic segments which
combine to form thousands of morphemes, tens or hundreds of thousands of
words, an infinite number of sentences and texts expressing countless ideas
and concepts. It will not be surprising to find that as one climbs this
stack, from phonology upward, the amount of computing power needed to perform
useful tasks and research increases in proportion to the increasing number
of units and the complexity of their structuring. I will concern myself,
mostly, with the possibilities available for the study of the lower levels.
This is because the type of linguistic work being done in the study of the
semantic and cognitive levels is still primarily research and the people
involved are more likely to already know their needs and options as far as
computing goes. Also, since the cost of computing in these areas is somewhat
higher, it is less likely that departments will be doing their own purchasing
for these purposes.
HARDWARE FOR THE PHONOLOGIST
The student of phonology, morphology and linguistic field analysis is
concerned primarily with the manipulation of linguistic text, expressed as a
series of phonemic symbols or blocks of phonetic features. The task is to
identify identical or similar substrings, correlate their appearance with a
particular meaning and segment the text into these identified substrings. As
new substrings are identified, the text is often rewritten with a new organization
based on new understandings, so as to improve the chances of finding new
substrings; field workers often use index cards for this purpose. Problem
after problem is solved in this way, with a not insignificant amount of time
being spent in the reorganizing and recopying stages. It is a tedious business
because it is very mechanical. In fact, efficient computer algorithms for
doing much of the job already exist and have been implemented on nearly all
computers in the form of text editors. The task is relatively simple and even
the smallest computer available can do an adequate job.
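
As a rough illustration of how mechanical the first step (identifying
identical substrings) is, the sketch below lists every substring shared
by at least two transcribed forms. It is a minimal example under
assumptions of my own: the function name recurring_substrings, the
minimum length of two symbols, and the four word forms are all made up
for the illustration, and each character is taken to stand for one
phonemic symbol.

```python
from collections import defaultdict


def recurring_substrings(forms, min_len=2):
    """Map each substring of at least `min_len` symbols to the set of
    forms that contain it, keeping only substrings found in two or more
    forms."""
    seen = defaultdict(set)
    for form in forms:
        for start in range(len(form)):
            for end in range(start + min_len, len(form) + 1):
                seen[form[start:end]].add(form)
    return {sub: found for sub, found in seen.items() if len(found) >= 2}


# Invented forms, for illustration only.
forms = ["kata", "katami", "noki", "nokimi"]
shared = recurring_substrings(forms)

# List candidate segments, longest first.
for sub in sorted(shared, key=lambda s: (-len(s), s)):
    print(sub, sorted(shared[sub]))
```

Correlating each candidate with a meaning, and deciding which
candidates are real morphemes, remains the analyst's job; the program
only removes the recopying.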
A linguistics department interested in providing its students with training
in the use of computers for this kind of work (and they will become standard
tools for the purpose very soon) would do well to purchase as many (one or
more) identical, small (hobbyist size) computers as it can afford. For
educational purposes, the very smallest microcomputers, equipped with modest
mass storage devices, such as tape cassettes or floppy discs, are just fine.
Assignments in classes can be distributed on departmentally owned or student
owned tapes or discs (less than $10 each). These can be automatically duplicated
just as assignments are now mimeographed; they are reusable and usually
contain enough room to store several assignments, including the partial
results from day to day and final solutions. For larger, research sized
projects, involving a lot of text, or more complicated analyses, such as
automated analysis of phonological tactics, the fastest microcomputers, with
larger mass storage devices, might be more appropriate.
(Implicit in the discussion of these types of machines is the fact that
student use of them is via an interactive terminal. Microcomputers are not
typically operated in 'batch mode', and no benefit could be derived from
doing linguistic analysis in any but an interactive mode of operation.)
While microcomputers and associated memories are relatively inexpensive,
linguists have a genuine need for sophisticated input and output devices which
are somewhat more expensive. Standard computer terminals generally provide
all and only the characters available on a typewriter keyboard; some provide
only upper case letters. What is needed is a terminal with the same capabilities
as the Selectric style typewriter: one with changeable type fonts, including the
standard phonetic symbol alphabet, with diacritics. CRT terminals (cathode
ray tube terminals) can provide this type of operation more cheaply, more
reliably, and more flexibly than printing terminals (there is no need to stop
and change type fonts). CRT terminals which support user designed type fonts
are available, and in fact, may be the only ones on which the standard phonetic
alphabet can be currently supplied. These terminals are somewhat expensive
(several thousand dollars each), but since they are very flexible, and often
support some degree of computer graphics display as well as having the
potential to display texts written in any language, they are valuable
educational tools.
If all or most of the terminals in a department are CRT type terminals,
it will be necessary to provide some means of producing 'hard copy' output
on paper. While most interactions with a computer can take place on a screen,
some record of the results of a session will be needed for study and evaluation.
Printers which can handle the flexible type fonts needed by linguists are
available. They are fast: they operate in the same way that copying machines
work and simply transfer the contents of the CRT screen to the paper (including
graphic materials). They are expensive. However, a small department might
well find that only one of these printers is necessary to meet its needs;
the results of work done on any of the small microcomputers could be moved
(either over communication lines or carried on a disc or tape) to the printer
with little or no delay.
HARDWARE FOR THE GRAMMARIAN
Syntax is, perhaps, the most widely studied subject in linguistics today.
Given that this is so, there is a real need for linguists, both professional
and student, to understand the extreme difficulty of the task of writing
a grammar for a language. That attempts are made to do this without the aid
of a computer is perhaps all the evidence one needs to see that the difficulties
are not well understood. A formal grammar, particularly one written in
the notations commonly used today, is very much like a computer program. It
is a list of instructions for generating a list of strings; a computer program
is a list of instructions for performing some process (which might be
generating a list of strings). Both need to be precise, both are very complex,
both suffer from the fact that a change in one part of the ordered list may
cause an unanticipated change in the effect of another part. It would be
very surprising to find that linguists were better at producing untested, yet
correct, formal characterizations of complex processes than computer
programmers. I expect that testing a newly written grammar will be as
enlightening an experience for a linguistics student as debugging a new
complex program is for a computer science student.
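
The parallel between a grammar and a program can be made concrete with
a toy example. The sketch below treats a small set of rewrite rules as
instructions for generating a list of strings. It is an illustration
only: the rules, the vocabulary and the function name generate are
hypothetical, and the grammar is deliberately non-recursive so that the
list of strings is finite.

```python
from itertools import product

# A made-up rewrite grammar: each symbol maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["linguist"], ["grammar"]],
    "V": [["tests"], ["sleeps"]],
}


def generate(symbol, grammar):
    """Yield every string (as a tuple of words) derivable from `symbol`."""
    if symbol not in grammar:              # a terminal word stands for itself
        yield (symbol,)
        return
    for expansion in grammar[symbol]:
        # Expand each symbol on the right-hand side, then combine the results.
        parts = [list(generate(s, grammar)) for s in expansion]
        for combo in product(*parts):
            yield tuple(word for part in combo for word in part)


sentences = [" ".join(words) for words in generate("S", GRAMMAR)]
for sentence in sentences:
    print(sentence)
```

Running the rules shows at once what reading them may not: the second
VP rule lets every verb take an object, so the grammar also produces
strings such as "the grammar sleeps the linguist". This is the kind of
unanticipated effect that testing a grammar exposes.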
Furthermore, just as the computer is of use in studying phonology and
morphology, it can also offer data organization services to aid in the study
of syntax. Automated tactic analysis of syntax is still a research project;
the software necessary for it is not likely to be produced by a software
house. But the research is probably best performed in a linguistics
department.
Having established a need, we must now recall a warning made earlier.
Useful contributions to the study of syntax by computers require more computing
power than is needed for similar contributions to the study of phonology
and morphology. While the need for sophisticated type fonts and input/output
devices is lower (not necessarily: a good educational syntax program would
permit the manipulation of syntactic trees on a graphics screen), there is
a real need for faster processors and increased memory capacity. To
purchase the necessary computing power, a department would have to step
up from the hobbyist microcomputer size machines to the scientific research
minicomputer (e.g., the middle range PDP-11 series). These machines cost
an order of magnitude more than the microcomputer and yet, when the subject
is syntax, will probably only serve a few students at a time.
An alternative, available to some departments, is to use the university's
central computing facility. Money could be spent on the best available
terminals and the needed communications equipment. Grammar testers have
been written by university researchers for typical university size computers
(Friedman 1971, for transformational grammars; Kehlen 1976, for ATN
grammars) and are available at little or no cost.
As I mentioned in the beginning, the use of the computer in the study of
semantics and cognition is still very much a research topic and little, if
any, of the work being done currently can be performed on small computers.
I will not describe the requirements of such work since they vary widely
depending on the nature of the work.
SOFTWARE FOR THE LINGUIST
What is missing from the computing facilities described so far is
software: programs which are of use in solving linguistics problems. The
small computers are sold with a minimum of very traditional computer software,
none of it of any use to the nonprogramming linguist. In fact, at no level
of computing power is there currently available commercial software which is
of use to nonprogramming linguists. For large computers, as mentioned above,
some of the results of university research work are available for some purposes.
However, for the types of machines that departments are likely to purchase,
there is essentially nothing.
This problem can be overcome in two ways. The standard method is for
a department to hire a student programmer to design and write the needed
software. This has several advantages: it is relatively cheap (especially
when university assistantships are available for the purpose), and it is
personal, in that the student can be instructed to write exactly the kind of
program that is needed. The disadvantages of this method lie in the quality
and durability of the systems produced in this way. Student programmers are,
in fact, students learning to program. Often their work is lacking in the
'ease-of-use' or 'human engineering' features found in well written,
commercially produced programs, and it is just these features which are
very important to users not familiar with or comfortable with computers.
Furthermore, programs produced by student programmers are not well known for
their reliability; maintenance of them is difficult and usually restricted
to the period of time that the original programmer is still available. Again,
to the user unfamiliar with computers, reliability is a very important feature.
It is very discouraging to try to do anything with semi-operational programs.
An alternative is to create sufficient demand for this type of educational
software so that a commercial software house or a well funded university
programming group would consider the investment of its time and money
profitable. With linguists and linguistic educators providing input at
the design level, very useful and reasonably priced software could be
produced in this way. The catch, however, lies in generating sufficient
demand.
A final comment about one other potential use of computers within a
linguistics department. The search for language universals (cross linguistic
research) requires very large collections of information. A collection of
partial and complete grammars along with sample texts for a large representative
sample of human languages is a formidable amount of information. The kinds
of questions posed by linguists using this information do not require immediate
interactive response. In fact, they traditionally require weeks or months
of library research for answers. It is therefore not unreasonable to consider
the storage of this information on a small, even hobbyist size, computer
equipped with large mass storage devices. The task is a difficult one, but
of potential value to both linguists and computer scientists.
Linguists need easier access to this information. A computerized
database, structured according to the needs of linguists, would be a very
valuable tool which could be distributed to any department willing to
make the necessary investment in hardware. The database is large, but unlike
many other large databases, it is one about whose structure a great deal is
known. Computer scientists are still looking for ways to effectively and
efficiently organize databases, and linguists, with their intimate knowledge
of the structure of language, have an opportunity here to provide an example
of how to use the structure of a body of information in storing it on a
computer effectively. It is a task which requires the expert knowledge of
several linguistic disciplines and it is a research project ideally suited to
a department of linguistics.
MANIFESTO
THE PRESS
DAVID G. HAYS, PUBLISHER
5048 Lakeshore Road 
Hamburg, New York 14075 
An Idea and a Problem
Contrary to a famous opinion, linearity came in with speech;
printing just let us see what we had been saying all the time.
But thought is nonlinear, and conversation flows as print
cannot. With electronic publication, we will be able to move
through a permanent record of collective knowledge with some
of the flexibility that conversation has always allowed.
But why a permanent record? Science is forever changing, and
so is art. Kuhnian revolutions, small or large, are frequent.
Electronic media do not impose artificial stasis on the flux
of ideas that gradually eliminates errors from science and
yields pleasure in art.
However, none of us have much experience in the new modes of
communication. Since all need help, we must, in the famous
phrase, explain to each other what none of us understand.
A Method 
THE PRESS at Twin Willows is mostly a method.
The method is to use printed paper, familiar to us all, and
microfiches, familiar to many, in shifting combination with
the unfamiliar electronic media.
A computer will be installed in the office of THE PRESS, and
used from the beginning for administration and text preparation.
Editors of books and journals that come to THE PRESS
can submit on floppy disk, on cassette, or by telephone, but
they can also submit on paper. Publication can be by
photoreproduction, rapid printing, high-quality offset, magnetic
recording to drive a computer, or through the telephone net.
Microfiches will be suggested for many publications.
As editors and readers gradually become familiar with the
new systems, teaching each other as they learn, we can expect
the contents of publications to become more and more suitable
to the new media, and less and less suitable to the old.
Services 
THE PRESS at Twin Willows will offer services at every step
from the author's conceptualization through advertising of
the finished work.
Editorial. For its clients, THE PRESS will help if necessary
to find expert readers who can submit opinions and suggestions
about the content of proposed articles and books. THE PRESS
will provide counsel on readability. THE PRESS will mark up
copy for typographic form, lay out pages, and otherwise give
traditional redactory services.
Administrative. For its clients, THE PRESS will maintain
tickler files and issue reminders to contributors and readers
when their submissions are due. It will prepare budgets and
keep accounts. It will maintain mailing lists, membership
lists, and consultation lists. It will conduct membership
surveys and elections of officers.
Bibliographic. As support can be obtained, THE PRESS will
[...] in collections and add its own classifications
and subject labels to make bibliography available to clients.
Thus the preparation of a bibliography for a work in progress
can be assigned to THE PRESS, and a book buyer can follow up
references or ask for selective dissemination.
Educational. THE PRESS will shortly begin publication of a
newsletter for clients and prospects: services and how to
use them, the competition, new products in hardware and
software, publications and courses for authors and editors, and
personal notes from the field of electronic publication.
Conferences, workshops, and courses will be organized as the 
field needs them and can support them. 
Handbooks, manuals, and other materials for editors will be 
written or collected as feasible, catalogued, and offered 
for sale or gift. 
Pricing Policy 
Methods and materials will be designed for each client
initially; later, a catalogue of components of the publication
process will be prepared so that the client can do the
design work.
Beyond the direct cost of labor performed and materials consumed
at THE PRESS and of services purchased for the client,
the equipment used will be billed at a rate intended to give
rapid amortization, and a management fee of 15% added.
This policy should bring the cost of information--books, 
journals, and electronic access--within the limits of anyone's 
purse. 
REVIEWS: MICRO HARDWARE, SOFTWARE
PUBLISHER: THE PRESS AT TWIN WILLOWS
May 23, 1978
To help hobbyists, householders, businesses, and
government keep up with the countless vendors who offer
hardware and software in the microcomputer market, THE PRESS
at Twin Willows will begin immediately to collect and
publish evaluative, analytic reviews, according to David
G. Hays, Publisher.
"When the computing market was dominated by just a few 
big companies," Hays says, "it was fairly easy to decide how 
to handle a computing problem. Once a buyer had settled on 
a computing budget, the market might offer only two or three
main frames big enough and cheap enough to do the job. Now
the buyer can design a machine to fit a purpose, and has to
choose components out of lists that run up to dozens of
alternatives. The worst part is, no one publishes the list!"
THE PRESS intends to correct part of the problem by
making useful information about the market available in easy
language and inexpensive format. "Before long," Hays expects,
"the hardware and software reviews will be accessible online
for clients to dial in."
Where will the reviews come from? THE PRESS invites any 
user of any microhardware or software to write it up; the 
editors at THE PRESS will rewrite if necessary, make sure 
that the evaluations are not illegally harsh, and eliminate 
the most obvious errors. No fees are offered to reviewers 
at present, but a change is contemplated. "Everyone who 
helps should be paid," as Hays puts it. 
Manufacturers and software houses can send their lists 
and item descriptions to be included with the evaluations. 
THE PRESS, which will also publish original material in
whatever technical fields need its services, is "mostly a
method," Hays says. Its purpose is to teach information
users how to cooperate with each other, making central
publishing less relevant.
Hays, who is setting up THE PRESS, is a professor of
linguistics and of computer science in the State University
of New York at Buffalo. He moved to Buffalo from The RAND
Corporation in 1968 after 13 years of research on language
and computing. Hays is honorary member of the International
Committee on Computational Linguistics, editor (1974-78) of
the American Journal of Computational Linguistics, and former
chairman of NSF's Social Science Advisory Committee.
THE PRESS offers no free literature, but is preparing
to issue a Newsletter. A $1 deposit will bring the first few
issues, including more about the hardware reviews. THE PRESS
is located at Twin Willows, 5048 Lake Shore Road, Hamburg,
New York 14075; the telephone number is 716-627-5571.
PUBLISHING AJCL 
DAVID G. HAYS 
THE PRESS at Twin Willows
May 20, 1978
A letter to:
ACL Executive Committee, AJCL Editorial Board
Dear Colleague:
My term as Editor expires, by my definition, at the end of
the present calendar year. The Association will choose a
new Editor; at the same time, I think that some changes in
operations are appropriate.
In the 1960s, I proposed the use of ultramicrofiches for
library development; but I said, if I did not write, that
photographic storage had a time limit; and the predicted
time is now up.
To supplement my University salary, I am organizing The Press
at Twin Willows. The enclosure describes the earliest form
of the venture; I hope for rapid evolution.
It would be to my commercial advantage to act as publisher
for AJCL. I believe that if ACL adopts the word-processing
and lexicographic businesses as areas of applied computational
linguistics the Association can grow and serve a significant
role in improvement of the common weal; and for The
Press to help would be very pleasant and profitable.
As Editor, I have contributed the use of space and equipment
that I paid for myself, and some small help that the University
gave. The new Editor may have more to offer, making The
Press redundant; in that case, I should like to open negotiations
for secondary publications extracted from AJCL.
The Press cannot offer quite so much; it will be necessary to 
bill the Association for machine time and personnel costs. 
But only out-of-pocket costs will appear on invoices if the 
Association decides to deal with The Press. 
As for member services, we can continue microfiches; offer
hard copy; move up quickly or slowly to typographic quality;
issue newsletters along with quarterly journal; and give
online access to computer files. Most of that can be done
immediately, but some of it may have to wait a few months.
It is up to the Association to say what it needs, if anything.
Sincerely,
ASIS: 41st ANNUAL MEETING
NOVEMBER 13 - 17
NEW YORK CITY
THEME: THE INFORMATION AGE IN PERSPECTIVE
36 Technical Sessions in three general areas:
COLLECTION, GENERATION, AND ANALYSIS OF INFORMATION
DISSEMINATION OF INFORMATION
INFORMATION FOR DECISION-MAKING AND CONTROL
FOR MORE INFORMATION CONTACT: ASIS
1155 16th Street, N.W.
Washington, D.C. 20036
202 - 659-3644
12th ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEMS SCIENCES
JANUARY 4 - 5
SPONSORSHIP: College of Business Administration
Department of Electrical Engineering
Department of Information and Computer Sciences
UNIVERSITY OF HAWAII
Association for Computing Machinery
Sessions on MEDICAL INFORMATION PROCESSING will be included in the
conference. For more information contact:
Dr. Bruce D. Shriver or 
Dr. Terry M. Walker 
HICSS-12/Medical Information Processing 
University of Southwestern Louisiana 
Box 44330 
Lafayette, LA 70504 
LINGUISTIC STRUCTURES PROCESSING:
STUDIES IN LINGUISTICS, COMPUTATIONAL LINGUISTICS, AND
ANTONIO ZAMPOLLI, EDITOR
Director of the Linguistics Division, CNUCE
The Institute of the Italian National Research Council (CNR)
FUNDAMENTAL STUDIES IN COMPUTER SCIENCE, Volume 5
NORTH-HOLLAND, AMSTERDAM & NEW YORK, XVI + 586 PP., 1977
ISBN 0-444-65017. US $44.95/DFL. 110.00
JONATHAN ALLEN, Synthesis of Speech from Unrestricted Text 
EMMON BACH, "The Position of Embedding Transformations in a
Grammar" Revisited
CHARLES J. FILLMORE, Scenes-and-Frames Semantics
EVA HAJICOVA, Focus and Negation
DAVID G. HAYS, Cognition: The Linguistic Approach
MARTIN KAY, Morphological and Syntactic Analysis
FERENC KIEFER, Some Observations Concerning the Differences
Between Sentence and Text
JOHN LYONS, Statements, Questions, and Commands
BARBARA H. PARTEE, John is Easy to Please
S.R. PETRICK, On Natural Language Based Computer Systems
YORICK WILKS, Natural Language Understanding Systems Within the
A.I. Paradigm: A Survey and Some Comparisons
TERRY WINOGRAD, Five Lectures on Artificial Intelligence
W.A. WOODS, Lunar Rocks in Natural English: Explorations in
Natural Language Question Answering
NATURAL LANGUAGE IN INFORMATION SCIENCE
SKRIPTOR, Stockholm, Sweden, 1977 
FID Publication 551 
This book presents the results of a Workshop on Linguistics and
Information Science organized by the Committee on Linguistics in
Documentation of the International Federation for Documentation (FID/LD)
and by the KVAL Institute for Information Science. It contains a series
of papers that provide perspectives on linguistics and information
science from the vantage points of information science (F. W. Lancaster,
University of Illinois), library science (Derek Austin, The British
Library), quantitative linguistics (Wolf Moskovich, Hebrew University of
Jerusalem), computational linguistics (Naomi Sager, New York
University), linguistics (Petr Sgall, Charles University), complex
semantic information processing (Teun A. van Dijk, University of
Amsterdam), and terminology (J. Goetschalckx, Commission of the European
Communities). The book also features a challenge paper on the
linguistics of information science (Hans Karlgren, KVAL Institute for
Information Science) that delineates major issues in this area. These
papers are bracketed by an overview of the Workshop (Donald E. Walker,
SRI International) and by a review of the field (Karen Sparck Jones,
Cambridge University, and Martin Kay, Xerox Palo Alto Research Center)
that updates the book Linguistics and Information Science, a
comprehensive survey prepared several years ago by Sparck Jones and Kay
under the auspices of FID/LD (Academic Press, New York, 1973).
Natural Language in Information Science will be of interest to
specialists in the areas referenced above and to anyone who wants to
know more about the potential of natural language processing for
information science. The price is $10.00 (U.S.) plus postage and
handling. Order as follows:
North and South America:
Roberts Information Services
8305-G Merrifield Avenue
Fairfax, Virginia 22030, USA

[...] and Australia:
Skriptor
S-104 65 Stockholm 15, Sweden
AJCL: JOURNAL OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
RESEARCH ON LANGUAGE:        Lexicology       Phonology
                             Dialectology     Language Change
                             Grammar          Semantics
                             Discourse        Universals
                             Understanding
LABORATORY EXPERIMENTATION:  Psychology       Phonetics
                             Sociology        Neurophysiology
PRACTICAL APPLICATION:       Translation      Documentation
                             Instruction      Lexicography
                             Robotics         Speech Recognition
SCHOLARLY INVESTIGATION:     Stylistics       Content Analysis
                             Text Comparison
CONTENT 
ORIGINAL CONTRIBUTIONS: Algorithms, programs, system designs,
experimental results, theoretical analyses
REVIEWS AND SURVEYS
ANNOUNCEMENTS: Symposia, conferences, publications, courses, grants
ABSTRACTS OF PUBLICATIONS: Wide coverage of journals, books, and
technical reports
RESEARCH IN PROGRESS
RESOURCES: A perpetual inventory of files of text, computer programs,
dictionaries, grammars, and other materials available to researchers
ADVERTISEMENTS: Announcements of books, equipment, services
AJCL description 
The AMERICAN JOURNAL OF COMPUTATIONAL LINGUISTICS is published on
4" by 6" units, each an index card or a microfiche. For each original
contribution, two units are supplied: an index card bearing an
extended summary, and a microfiche containing full text, illustrations,
and related materials. Abstracts, announcements, advertisements,
and resources may appear on cards or on microfiche. The microfiche
standard is MIC-9, reduction 24x, maximum 98 pages per fiche. Each
unit supplied carries at the top a heading characterizing its
content. The Journal is mailed in quarterly numbers; 14 to 5 fiche
are issued each year.
Subscriptions to the AMERICAN JOURNAL OF COMPUTATIONAL LINGUISTICS
are available through membership in the ASSOCIATION FOR COMPUTATIONAL
LINGUISTICS. For the year 1978, dues for individuals are $15; dues
for institutions are $30. A supplementary charge for first class
mailing (U.S.) is $2; for foreign subscriptions, the air printed
charge is $4. Volumes of the AJCL for 1974, 1975, and 1976 are
available at rates of $10 individual and $25 institutional per year;
the rates for the 1977 volume are $15 individual and $30 institutional.
For first class or air delivery, add $2 or $4 per year as appropriate.
Send dues, payable to the ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
(or ACL), or requests for information to:
Dr. Donald E. Walker, ACL
SRI International
Menlo Park, California 94025, USA
Pender M. McCarter, Editor
Washington Report
American Federation of Information Processing Societies, Inc., Washington Office, 1815 North Lynn Street, Suite 805, Arlington, Virginia 22209, 703 243-3000
Vol. IV, No. 6 
AFIPS IN WASHINGTON
AFIPS CONVENES [...] WHITE HOUSE, CONGRESSIONAL INFORMATION SYSTEMS;
SENIOR GOVERNMENT OFFICIALS ADDRESS AFIPS AUDIENCE
High-level Government officials last month addressed senior members of
AFIPS in a special Washington briefing on White House and Congressional
information systems. The AFIPS Conference on White House and Congressional
Information Systems, held May 3nd in the Presidential Press Conference
Room of the Old Executive Office Building, was attended by: Mr. Richard
Harden, special assistant to the President; Mr. Carl Calo, assistant
director for Information Systems, Office of Administration, Executive
Office of the President (EOP); Mr. Edward Zimmerman, special assistant
to the director, Office of Administration, EOP; Rep. Charlie Rose
(D-N.C.), chairman, House Policy Group on Information and Computers; Mr.
John Swearingen, director of Information Systems, U.S. Senate; Mr. Neal
Gregory, staff director, House Policy Group on Information and Computers;
and Mr. Boyd Alexander, director, House Information Systems. About 70
AFIPS individuals attended the special briefing, including officers,
members of the Board of Directors, presidents of the constituent societies,
and committee chairmen.
WHITE HOUSE, CONGRESSIONAL OFFICIALS ADDRESS AFIPS AUDIENCE [AFIPS/C. White]
Special Assistant to the President Richard Harden told the audience
that there are "long-range plans" for developing computer communications
between the White House and Capitol Hill. (Previous reports have indicated
that such communications could include sharing of budget information.) A
mini-computer in every Congressman's office (or at least clusters of
minis shared by Congressmen) were possibilities discussed by Congressional
participants in the conference.
Opening the White House presentation, Mr. Harden outlined the need for
improved information management within the EOP. He noted that the typical
Presidential adviser may be considering 30 to 40 issues with five to 10
issues of major importance being considered at a given moment. For each
issue, Harden said, several Federal agencies may be involved, as well as
a [...] of Congressional committees, and other groups.
Access to [...] nonconfidential systems, both public and
private. These might include (1) publicly available information from
Executive Branch systems such as: FAPRS, the Federal Assistance
Program Retrieval System, developed by the Department of Agriculture;
and the Department of Justice's JURIS, the Justice Retrieval System; (2)
Capitol Hill systems such as: LEGIS, the Legislative Information and
Status System; SOPAD, Summary of Proceedings and Debate; and SCORPIO,
Subject-Content-Oriented Retriever for Processing Information On-Line;
and (3) commercial resources, such as: the New York Times Information Bank,
Lockheed's Dialog, and Wharton Econometric Forecasting Associates' (WEFA)
economic modeling and data services.
A series of information processing utilities available to all
users. These might include correspondence control, word processing and
text editing, project tracking, and a document filing and retrieval system.
Various special-purpose systems to meet specific needs of
individual offices. Examples of present systems are: the Office of
Management and Budget's (OMB) Budget Preparation System; the White House's
Congressional Vote Analysis System; and the Office of the Vice-President's
Time Analysis System.
A request for proposal for development of this system is expected to be
issued early in July, following review by the General Services Administration,
it was announced at the AFIPS Conference. Mr. Calo said that, until July,
a "temporary upgrade" would be accomplished "with little or no increase in
present expenditures." He stressed that the upgrade would be replaced at
the time of the final procurement. [Ed.: At press time, it was learned
that Interdata, Inc., Oceanport, New Jersey, performed the temporary upgrade.]
Ed Zimmerman noted that in developing the plans for the White House systems,
the EOP has talked with the National Telecommunications and Information
Administration (NTIA), the new unit within the Commerce Department, and is
considering the recommendations of the Paperwork Commission. Mr. Zimmerman
also announced that a demonstration of an advanced communications information
retrieval system, accessing demographic information from the Census Bureau,
is scheduled in June on Capitol Hill, and in the Old Executive Office Building.
JUNE, 1978 2 AFIPS WASHINGTON REPORT
In the question-and-answer session on White House information systems that
followed, several individuals in the audience asked how privacy requirements
will be met in the new systems. While there will be direct access of public
data, Harden noted that private information will only be available to
the White House in summarized form, and will not include individual records.
Other questioners asked how the quality of information retrieved by the
systems would be safeguarded. Zimmerman said that EOP would be selective
in using data, and would constantly monitor its quality, as is done in
maintaining the quality of a good library.
In response to a question about the use of the systems at the very highest
levels of the White House, Harden replied that the President might eventually
use a CRT screen in his office.
Opening the presentation on Congressional information systems, Rep. Charlie
Rose noted that two-way cable has already been installed in all Congressmen's
offices and will permit video as well as data communication. (The House
has recently authorized members to purchase, out of their office budgets,
color televisions which could be used as display terminals.) In addition,
Mr. Rose cited the improved communications with constituents through the use
of word processing equipment. He also discussed the importance of computerized
mailing lists for Congressmen in countering inaccurate mailings by lobby groups.
Neal Gregory stated that some 230 Congressmen now use terminals to access
LEGIS, SOPAD, SCORPIO and JURIS. LEGIS provides information on bill status
in both the House and Senate; SOPAD gives an on-going account of proceedings
in both Houses; and JURIS contains numerous Justice Department legal briefs.
(Some 300 members will have terminals by the end of the year.) Mr. Gregory
cited the need for even more advanced word processing equipment to handle
at least some of the eight million letters received each month in the House.
Boyd Alexander noted that a detailed, three-month study of members is being
initiated to determine the need for additional information systems in the
House. He announced that an Amdahl 470V/5 had just been purchased to expand
the scope of information services. [Ed.: An Amdahl spokesman said delivery
was expected May 15th.] According to Mr. Alexander, a list of members'
recorded votes will be added to LEGIS around July.
John Swearingen announced a new Senate study released in May, entitled
Information Systems for the United States Senate. Mr. Swearingen also noted
the need for separate House and Senate groups to oversee information systems,
stating that the situation in the two Houses is comparable to different
companies with varying rules and procedures. He added that computer usage
in the Senate is less than that in the House. According to Swearingen,
the Senate receives up to two million letters per month, or 600 letters per
week that could be handled (at least in part) by word processing equipment.
In the ensuing question-and-answer session on Congressional systems, Mr.
Rose announced that the House Administration Committee is close to adopting
a rule forbidding the use of members' computerized mailing lists by campaign
committees. (An ethics rule of the Senate incorporates a similar provision.)
AFIPS President Dr. Theodore J. Williams introduced the participants and
moderated the discussions. Washington Activities Committee Chairman Keith
W. Uncapher complimented AFIPS volunteers for objectivity in providing
technical information to the Government. Mr. Uncapher noted that differing
views can be extremely valuable to high-level policymakers who must consider
all options. The Washington Activities Committee chairman also introduced
Alexander D. Roth, recently named to head the AFIPS Washington Office.
APPEALS COURT ORDERS AT&T, FCC TO IMPLEMENT PREVIOUS EXECUNET RULING,
REFUSES TO RECONSIDER DECISION; AT&T SEEKS STAY WHILE ASKING SUPREME COURT
TO PONDER CASE
The Bell system operating companies have begun processing requests by
MCI Communications Corp., a Washington-based specialized carrier, for
local telephone connections allowing MCI to expand its long-distance
phone service, Execunet, to 12 additional cities. In April, the U.S.
Court of Appeals in Washington ordered AT&T and the Federal Communications
Commission (FCC) to implement the court's July, 1977, ruling which
authorized the Execunet service. In May, the court refused to reconsider
its earlier decision as requested by AT&T and the FCC. At press time,
AT&T is seeking a stay while it asks the Supreme Court to consider the
case.
Despite the 1977 appeals court ruling (which the Supreme Court would not
overrule last January), the Federal Communications Commission, in February
(see Washington Report, 4/78, p. 3), held that AT&T was not required to
make the additional local connections required to implement Execunet.
At that time, only Commissioner Joseph Fogarty dissented from the FCC
filing, stating that the commission's action nullified the 1977 appeals
court ruling. In its April ruling the court agreed, arguing that "MCI
is in effect no better off than it was during the entire course of the
litigation in this court. Notwithstanding our favorable decision, it is
unable to expand Execunet."
The appeals court contended that AT&T and the FCC "twisted the issues we
contemplated in this case beyond recognition." AT&T had argued that it
would have to raise long-distance telephone rates if competition was
introduced by MCI into densely populated areas with Execunet. The FCC
held that the local connections should be denied, contingent on its
study into the effects of competition on AT&T.
The Execunet service, which provides voice and data communications,
involves calling a local number, then giving a code number to be connected
through MCI's network with another telephone in one of 18 cities now served
by Execunet. The appeals court decision is also expected to affect
Southern Pacific Communications Co.'s plans to market a service similar
to Execunet, called Sprint.
FEDERAL RESERVE BOARD ISSUES FINAL APPROVAL FOR NATIONWIDE ACH INTERCONNECTION
The Federal Reserve Board has issued final approval for a nationwide
interconnection of automated (check) clearing houses (ACHs) which, by
the end of this year, could permit the Fed's corporate customers to
debit or credit their private customers' accounts using the Federal
Reserve Communications System (FRCS) (see Washington Report, 3/78,
p. 8). In the past, the Fed has provided ACH check processing, check
settlement, and check delivery services on a strictly regional basis.
The April decision follows a 1976 pilot program undertaken by the Fed
which was criticized by a former White House Office of Telecommunications
Policy (OTP) official as a "surreptitious development of an on-line
capability."
In the interregional pilot program, some of the Fed's corporate
customers filed debit or credit instructions on magnetic tape with their
local ACHs. These instructions were then transmitted with FRCS to other
regional ACHs and eventually to the corporate customers' banks.
In January, seeking comment on the proposed nationwide program, the Fed's
Board of Governors said that "the probable long-run efficiencies resulting
from interconnection of all operating ACH facilities justify the Board's
action at this time to provide these services . . . Moreover, the Board
regards its action to interconnect the current regional ACH facilities
as a research and development program that will provide technical data
and experience in the operation of the nationwide ACH facilities. The
Federal Reserve System intends to make this information available to
those in the private sector interested in the development of alternative
systems."
The Fed also cited recommendations of the National Commission on Electronic
Fund Transfers (NCEFT) (see Washington Report, 11/77, p. 2) which urged
the Fed to continue development of "ACH-like services," while also
encouraging private sector development in the same area. However, the
Privacy Protection Study Commission (see Washington Report, 8/77, p. 2)
recommended that "no Government entity be allowed to own, operate, or
otherwise manage any part of an electronic payments mechanism that
involves transactions among private parties." The Fed has recently
implemented procedures which mandate removal of most all individual
names held in a data base after 30 days.
According to the Fed, 95 out of 121 sets of comments received since last
January endorsed the interregional ACH connection. Among those critical
of the program, the Department of Justice noted that Federal Reserve
involvement would discourage the private sector from developing similar
systems because the Fed does not charge for its program.
In its January announcement, the Fed added that provision of the
"interbank service" should also "enhance the opportunities open to depository
institutions for developing improved 'retail' payments services for the
public." Although not provided for in this nationwide ACH interconnection,
a point-of-sale (POS) switch could conceivably link consumers and retailers
with ACHs and the Fed. The NCEFT urged the Federal government not to
become involved "operationally" in POS switches "at present or in the
foreseeable future."
POLICYMAKING REPORTEDLY BEING CENTRALIZED IN WHITE HOUSE AS GELLER HEARING
HELD ON NEW ASSISTANT SECRETARY OF COMMERCE NOMINATION
Generally recognized as the Carter Administration's chief potential
spokesman on telecommunication policy, Henry Geller appeared before the
Senate Committee on Commerce, Science and Transportation to answer
questions about his nomination by the President as Assistant Secretary
of Commerce for Communications and Information. Although receiving
a friendly welcome from the Senate committee, Geller's appearance on
April 14th was overshadowed by a controversy over the failure of Barry
Jagoda, special assistant to the President for Media and Public Affairs,
to appear before the committee as requested on the same day.
Presidential Adviser Said to Exceed Role in Telecommunications. Sen.
Ernest F. Hollings (D-S.C.) invited Jagoda to appear before the committee
to respond to allegations that Mr. Carter's special assistant might be
exceeding his authority as adviser to the President, thus detracting from
Geller's presumed status as chief spokesman for telecommunications policy.
In declining to appear, Jagoda wrote Hollings that (as special assistant)
his role is "advisory, and I have no decisionmaking authority in
telecommunications policy." It appears, at press time, that until Jagoda's
status is resolved to the committee's satisfaction, the Geller nomination
will be delayed.
HOLLINGS, COMMERCE COMMITTEE GIVE FRIENDLY RECEPTION TO GELLER
[AFIPS/P. McCarter]
Policymaking Said Being Centralized in White House. Although the President's
reorganization of computer-related bodies stressed the need for combining
the functions of the White House Office of Telecommunications Policy with
the Commerce Department's Office of Telecommunications in order to strengthen
Cabinet government, recent developments (including the Jagoda controversy)
indicate that the President may be centralizing policymaking in the White
House.
Mr. Carter's aides and Cabinet met in April at Camp David reportedly to
determine procedures for centralizing long-range decisionmaking in the
White House. The apparent shift in emphasis from Cabinet government
to an increase in White House responsibility is further dramatized by
the recent appointment of Anne Wexler as special assistant to the
President. Ms. Wexler was formerly deputy undersecretary for Regional
Affairs in the Department of Commerce.
Geller Describes Working Relationship With Commerce Secretary, President.
At his Senate confirmation hearing, Assistant Secretary of
Commerce-designate Geller described his relationship with the Secretary of
Commerce, Juanita M. Kreps, and with the President. According to the nominee,
he would bring "important decisions" such as those concerned with common
carriers and Execunet to Secretary Kreps, with whom Geller says he has "ready
access." Prior to meeting with Carter, Geller indicated he would talk
first with Mrs. Kreps. Asked how he would react on a disagreement with
the Secretary, Geller replied, simply: "She wins."
NTIA to Formulate Position on Bell Bill. According to the nominee, the
new National Telecommunications and Information Administration (NTIA),
which Geller will head at Commerce, is formulating a position on the
Consumer and Communications Reform Act, the "Bell Bill," which he called
the "most important issue in telecommunications." Geller said failure to
study this issue would make NTIA "an advocate." He added that NTIA
is beginning its own studies on subsidies in the Bell System (i.e.,
whether revenues flow primarily from the private line services to the
public line services, as Bell claims, or vice versa). Geller also said
that NTIA will participate in the Federal Communications Commission (FCC)
rulemaking on message toll service (MTS) and wide-area toll service (WATS).
Electronic Mail, Privacy Issues, Transborder Data Flow Take Precedence
Over EFTS. In the nominee's March interview with the AFIPS Washington
Office (see Washington Report, 4/78, Supplement), Geller noted that NTIA
is studying electronic mail, e.g., "Should the U.S. Postal Service go
into electronic mail? . . . Are you going to give them [a] monopoly, not
likely. Will there be an advantage if they start competing with Bell or
Satellite Business Systems?" According to the Assistant Secretary of
Commerce-designate, electronic mail, privacy issues and transborder data
flow are "proceeding in a faster track" than electronic funds transfer
(EFTS). He told AFIPS Research Associate Pender M. McCarter: "We have
those ahead of EFTS. We are doing electronic mail right now, looking at
what should be done. We are deeply in the midst of privacy and will
continue. And, we have made a commitment of resources to the international
transborder data flow issue." Geller described NTIA as a "focal point"
on transborder data flow, saying: "We ought to be doing the digging and
supplying the information to the State Department, to the Congress, and
others, as may be necessary."
Geller's Nomination Endorsed. Also appearing before the Senate committee,
in support of Geller's nomination, were: Rep. Herbert E. Harris II (D-Va.);
Ms. Valeri Byrd, staff director, National Black Media Coalition; and Mr.
Paul G. Zurkowski, president, Information Industry Association. Following
Mr. Zurkowski's presentation, Sen. Hollings solicited "help from your
organization and others, on the convergence of computer and communications.

References 
Astrahan, M.M.; Blasgen, M.W.; Chamberlin, D.D.; Eswaran,
K.P.; Gray, J.N.; Griffiths, P.P.; King, W.F.; Lorie, R.A.;
McJones, P.; Mehl, J.W.; Putzolu, G.R.; Traiger, I.L.; Wade,
B.W.; Watson, V. (1976). System R: Relational Approach to
Database Management. ACM Transactions on Database Systems,
Vol. 1, No. 2, June 1976, pp. 97-137.

Petrick, Stanley R. (1977). Semantic Interpretation in the
Request System. In: Computational and Mathematical
Linguistics, Proceedings of the International Conference on
Computational Linguistics, Pisa, 27/VIII-1/IX 1973, pp.
585-610.

Plath, Warren J. (1973). Transformational Grammar and
Transformational Parsing in the Request System. IBM Research
Report RC 4396. Thomas J. Watson Research Center, Yorktown
Heights, N.Y.

Plath, Warren J. (1974). String Transformations in the
REQUEST System. American Journal of Computational
Linguistics, Microfiche 8.

Reiter, Raymond (1976). Query Optimization for
Question-Answering Systems. In: COLING 76, Proceedings.

Robinson, Jane J. (1973). An Inverse Transformational
Lexicon. In: Natural Language Processing, Randall Rustin, ed.
Algorithmics Press, Inc., New York, N.Y., 1973, pp. 43-60.

Woods, W.A.; Kaplan, R.M.; Nash-Webber, B. (1972). The Lunar
Sciences Natural Language Information System: Final Report.
BBN Report No. 2378. Bolt Beranek and Newman, Inc.,
Cambridge, Massachusetts, June 15, 1972.

Chandor, A. (1970). A Dictionary of Computers. Penguin Books,
Harmondsworth, England.

Findler, N.V. (1970). Some conjectures in computational
linguistics. Linguistics, No. 64, pp. 5-9.

Findler, N.V., J.L. Pfaltz and H.J. Bernstein (1972). Four
High-Level Extensions of FORTRAN IV: SLIP, AMPPL-II, TREETRAN
and SYMBOLANG. Spartan Books: New York.

Findler, N.V. and H. Viil (1974). A step toward computer
lexicometry. AJCL.
