<?xml version="1.0" standalone="yes"?>
<Paper uid="J79-1075">
  <Title>The Derivation of Answers from Logical Forms in a Question Answering System</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE DERIVATION OF ANSWERS FROM LOGICAL FORMS IN A
QUESTION ANSWERING SYSTEM, Fred J. Damerau
ONE MORE STEP TOWARD COMPUTER LEXICOMETRY,
</SectionTitle>
    <Paragraph position="0"> Nicholas V. Findler and Shu-Hwa Lee</Paragraph>
  </Section>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
PUBLISHING AJCL
CONFERENCES ASIS AND HICSS
LINGUISTIC STRUCTURES PROCESSING, Zampolli, ed.
NATURAL LANGUAGE IN INFORMATION SCIENCE,
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DESCRIPTION OF AJCL
American Journal
THE DERIVATION OF ANSWERS FROM LOGICAL FORMS
IN A QUESTION ANSWERING SYSTEM
FRED J. DAMERAU
IBM Corporation
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> This paper describes how the process of generating a response, given an underlying representation for an input question, is accomplished in the Transformational Question Answering (TQA) system under development at IBM Research, a brief description of which is given.</Paragraph>
    <Paragraph position="1"> The last formal level of representation in this system is called a logical form. The basic method of evaluation of logical forms is the &quot;generate and test&quot; paradigm, used, for example, in the LUNAR system (Woods, Kaplan and Nash-Webber, 1972), although an implementation must be fairly efficient in order to be practical on a data base of moderate size. The basic idea is to keep track of the equivalence relationships between the variables in the logical form and associated constants, and use this information to derive from the data base the extensions of the predicates contained in the logical form. A similar proposal has been made by Reiter (1976). The logical forms and the process by which candidate sets are computed from these forms are described in considerable detail. We believe it should not be necessary for a computational linguistics project to describe operations beyond the last level of formal representation in order for an outsider to understand exactly how a system operates sufficiently well that he can predict its behavior. Although we have attempted to achieve that, we still have a considerable way to go. This paper describes how the process of generating a response, given an underlying representation for an input question, is accomplished in the Transformational Question Answering (TQA) system under continuing development at IBM Research. TQA has been operational in a laboratory mode for several years. The system is now installed in the office of the planning department of a small city, where it is used to access the file of land use for each parcel of land in the city (about 10,000 parcels with 40 pieces of data for each parcel). The system is undergoing modifications and improvement prior to a formal evaluation stage. A generalized flow diagram of the TQA system is given in Figure 1.
Input, from a display device or typewriter-like terminal, is fed to the preprocessor, which segments the input character string into words and performs lexical lookup. The process of lookup is complicated somewhat by a provision for synonym and phrase replacement. Words like &quot;car&quot; and &quot;automobile&quot; are changed to &quot;auto&quot;, and strings like &quot;gas station&quot; are frozen into single lexical units. The output from the lexical lookup is a list of trees, each tree containing part of speech information, syntactic features and semantic features, as required. A description of the lexical component, now obsolete in its detail but still valid in main outline, is given in Robinson (1973). The list of trees is input to a set of string transformations,</Paragraph>
    <Paragraph position="2"> described in Plath (1974). These transformations operate on adjacent lexical items to deal with patterns of classifiers, ordinal numbers, stranded prepositions, and the like. The effect of this phase is to reduce the number of surface parses and the amount of work done in the transformational cycle. The resulting list of trees is input to a context-free parser, which produces a set of surface trees, each of which is fed to the transformational recognizer.</Paragraph>
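    <Paragraph>The synonym and phrase replacement performed during lexical lookup can be illustrated with a minimal sketch. The tables SYNONYMS and PHRASES and the function preprocess are hypothetical names chosen for illustration; the text gives only the &quot;car&quot;, &quot;automobile&quot; and &quot;gas station&quot; examples, and the actual TQA lexicon produces trees with features rather than plain strings.

```python
# Hypothetical tables; the real TQA lexicon is not reproduced in the text.
SYNONYMS = {"car": "auto", "automobile": "auto"}
PHRASES = {("gas", "station"): "gas_station"}


def preprocess(text):
    """Segment text into words, freeze known phrases, replace synonyms."""
    words = text.lower().split()
    out = []
    i = 0
    while len(words) > i:
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            # A two-word phrase becomes a single lexical unit.
            out.append(PHRASES[pair])
            i += 2
        else:
            # A single word is replaced by its canonical synonym, if any.
            out.append(SYNONYMS.get(words[i], words[i]))
            i += 1
    return out
```

Phrases are checked before single words so that a frozen phrase is never split by synonym replacement.</Paragraph>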
    <Paragraph position="3"> The recognizer attempts to find an underlying structure for each surface tree (Plath, 1973). Typically only one of a set of surface trees will result in an underlying structure. This structure itself is input once again to the transformational recognizer, using a (small) set of grammar rules tailored to a specific data base, to produce a query structure. Query structures are similar to underlying structures in form, but reflect the particular meaning constraints resulting from the format and content of a given data base. The query structure tree is processed by a Knuth-style semantic interpreter (Petrick, 1977), producing a logical form. A logical form can best be thought of, in</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> our context, as a retrieval expression which is to be evaluated, producing an answer to the English input query.</Paragraph>
    <Paragraph position="1"> Since the major part of this paper is concerned with processing logical forms, discussion of their specifics will be deferred until later. The process of answer extraction from the data base is accomplished by a combination of LISP and PL/I programs, described below, and an experimental relational data base management system called Relational Storage System (RSS) (Astrahan et al., 1976). The RSS provides the capability to generate a data base of n-ary relations, with indexes on any field of the relation, and low-level access commands like OPEN, NEXT, CLOSE, with appropriate parameters, to retrieve information from such a data base.</Paragraph>
    <Paragraph position="2"> All the processing modules are under the control of a driver module, which maintains communication with the user, calls the processors in the correct sequence, and tests for errors. An example of the processing of a question, with the intermediate outputs, is given in Figure 2.</Paragraph>
    <Paragraph position="3"> In this example, the numbers 2945, 6535, 6635, 6975 are the numbers of milliseconds of computer time used up to the point shown, on an IBM S/370 Model 168. The structures printed are a bracketed terminal string representation of structures which are stored and manipulated as trees by the [...] together with their associated complex features, represent much additional information that is not shown here. The number 591 is a land use code which, in the data base, indicates a drug store, and the long numbers in the answer are the parcel identifiers (ward-block-lot).</Paragraph>
    <Paragraph position="4"> From this brief description, it should be apparent that the TQA system, considered as a black box, is similar to many others. In particular, there is a designated level of meaning representation, the logical form, which is the last formal construct in the system. The remaining processing necessary to derive an answer and to format it for presentation to a user is accomplished by an unstructured set of computer programs. Two separate issues arise as a result: how efficiently can the logical form be evaluated against a real data base, and to what extent do the processing functions further specify meaning, beyond that carried by the logical form?</Paragraph>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
EVALUATION OF LOGICAL FORMS
</SectionTitle>
    <Paragraph position="0"> The basic method of evaluation of logical forms is the &quot;generate and test&quot; paradigm used, for example, in the LUNAR system (Woods, Kaplan and Nash-Webber, 1972). The simple version of this paradigm, used by Woods and implemented in our early systems, involves checking pre-selected lists of objects or, in the worst case, all the objects known to the system, to see if they satisfy the query predicates. It is computationally impractical except for small data bases.</Paragraph>
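    <Paragraph>The simple form of generate and test can be sketched as follows. This is an illustration only, not the LUNAR or TQA code: the dictionary LAND_USE and the parcel names are hypothetical, and only the land use code 591 (drug store) comes from the text. Every known object is tested, which is why the cost grows with the size of the data base.

```python
def generate_and_test(candidates, predicates):
    """Return every candidate that satisfies all of the query predicates."""
    return [obj for obj in candidates if all(p(obj) for p in predicates)]


# Hypothetical miniature data base: parcel identifier -> land use code.
LAND_USE = {"p1": 591, "p2": 100, "p3": 591}


def is_drug_store(parcel):
    """Code 591 denotes a drug store, as in the example of Figure 2."""
    return LAND_USE[parcel] == 591


# Worst case: every object known to the system is generated and tested.
drug_stores = generate_and_test(sorted(LAND_USE), [is_drug_store])
```
</Paragraph>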
    <Paragraph position="1"> Our current variant of this method is much more efficient. The basic idea is to keep track of the equivalence relationships between the variables in the logical form and associated constants, and use this information to derive the extensions of the predicates contained in the logical form from the data base. A similar proposal has been made by Reiter (1976). We do not, however, make such extensive use of query transformations as Reiter outlined.</Paragraph>
    <Paragraph position="2"> Logical forms. In order to describe the evaluation process, it is necessary to describe the logical form in somewhat more detail, referring for example again to Figure 2. In the first place, except for the set-forming function setx, which takes as arguments a variable name and a proposition, all other well-formed formulas are composed of predicates and their arguments. Some of the predicates are perfectly ordinary, like greaterthan. Some are quantifiers, like foratleast, which takes a limit argument n, an argument</Paragraph>
  </Section>
  <Section position="7" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> which is a set, and a proposition p, and which is true just in case n or more elements of the specified set satisfy the proposition p. Others are special application predicates like parcel, which is true just in case its single argument is a parcel identifier.</Paragraph>
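    <Paragraph>The truth condition of the foratleast quantifier can be stated as a short sketch. The Python signature below is an assumption made for illustration; in TQA the quantifier is a LISP predicate whose set argument may itself be a setx expression.

```python
def foratleast(n, domain, proposition):
    """True just in case n or more elements of domain satisfy proposition."""
    count = 0
    for element in domain:
        if proposition(element):
            count += 1
            if count >= n:
                return True  # the limit is reached; no need to look further
    return count >= n
```

Stopping as soon as the limit is reached avoids testing the rest of the set, a choice made here for the sketch rather than one documented in the text.</Paragraph>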
    <Paragraph position="1"> The main data base related predicate is named testfct.</Paragraph>
    <Paragraph position="2"> Referring to Figure 2, it is seen that testfct has three arguments. The first is a constant or a variable which will be replaced by a constant before evaluation, the second argument is a list whose members determine a particular data base value, and the third is an operator specifying the relation which must hold between the first argument and the data base value determined by the second argument.</Paragraph>
    <Paragraph position="3"> The data base can be thought of as a collection of binary relations, all sharing the same key. In our application, this is the parcel identification or account number, by which any piece of property can be identified. The list which is the second argument of testfct consists of the relation name and the key which identifies a value in the relation. The key actually has two parts. The second part is a year, now unused, although since the files in which we are currently interested are changed on a yearly basis, we anticipate maintaining and accessing historical data. The first part of the key is the account number mentioned above.</Paragraph>
    <Paragraph position="4"> In general, the second argument of testfct must be sufficient to identify a unique binary relation and value in that relation.</Paragraph>
    <Paragraph position="5"> If the logical form is itself a proposition, the system will answer either &quot;yes&quot; or &quot;no&quot;. If the logical form has a top level setx, the system will print the members of the set satisfying the specified proposition, perhaps along with some identifying information. Simplifications. A number of simplifications can be, and in part have been, carried out on logical forms prior to evaluation. Some predicates, for example, are essentially empty for purposes of evaluation, in that they always evaluate to true. As an example, the predicate dollar, for information fields referring to taxes, is empty of meaning because the processor assumes that the contents of the taxes field are always dollars. A slightly less obvious example of a possible simplification can be seen in Figure 2. The set argument of the foratleast predicate contains no free variables. It is not necessary, therefore, to evaluate the inner setx function for each evaluation of the predicate. Instead, the setx function is evaluated as soon as the semantic interpreter has discovered that it has no free variables, using the standard evaluation mechanism, and the value, i.e., a set, is substituted for the setx expression. Our system performs simplifications of this kind in its normal mode (although it can also delay all evaluations until a complete form has been built), so that the final logical form seen by the retrieval functions during processing is usually that shown in Figure 3, where the inner setx has been replaced by the satisfying set, viz., the parcel identifiers of the set of drug stores. After all the applicable simplifications have been done, the resulting form is passed to the evaluation function, EVALU.</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
The Pre-evaluator
</SectionTitle>
      <Paragraph position="0"> It might seem that since the system has been written in LISP, it would only be necessary to define the appropriate functions and then call the regular LISP evaluator, instead of a special evaluator like EVALU. While this would be possible, the difficulty with such an approach can readily be seen by considering the embedded setx in Figure 2. The desired set of X7s is that set of parcel identifiers for which the associated land use code is &quot;591&quot;. testfct is a predicate which is true for the appropriate X7s, but what is the candidate set of X7s which should be tested? At worst, the system might consider the set of all objects it knows about. As a better choice, the system could infer from the syntax of testfct that the candidates are all members of the set of parcel identifiers, but still there are almost 10,000 [...] reasonable set (in fact the perfect set) of candidates for X7 can be found by looking in the data base for that set of identifiers for which the land use code is 591. If the data base is properly organized, such a search can be very fast. Not all predicates are so simple, however. The remainder of this section will describe in some detail how candidate sets for more complicated predicates are arrived at. Once candidate sets have been computed, the EVALU function can invoke the LISP evaluator on the logical form. The alternative of including a candidate generator in the setx program and all the potential top level predicates and then applying the LISP EVAL function directly seems much less attractive.</Paragraph>
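      <Paragraph>Finding the candidate set for X7 by looking in the data base, rather than by testing every parcel, amounts to an index lookup. The following sketch assumes a hypothetical in-memory parcel file and index; the RSS indexes mentioned earlier play this role in the real system, and their interface is not shown here.

```python
from collections import defaultdict

# Hypothetical parcel file: parcel identifier -> land use code.
PARCELS = {"a": 591, "b": 110, "c": 591, "d": 420}


def build_index(parcels):
    """Invert the file so that each code maps to its parcel identifiers."""
    index = defaultdict(set)
    for parcel_id, code in parcels.items():
        index[code].add(parcel_id)
    return index


LUC_INDEX = build_index(PARCELS)


def candidates_for_code(code):
    """Candidate set for a setx variable, read directly from the index."""
    return LUC_INDEX.get(code, set())
```

With the index in place the cost of building the candidate set depends on the number of matching parcels, not on the size of the whole file.</Paragraph>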
      <Paragraph position="1"> As a preliminary, notice that we need only insure that candidate sets have been established for all the setx variables in a logical form. This is so because, while each quantifier has an associated variable, the domain of that quantifier is either given explicitly as a list of constants, or implicitly by a setx expression. Secondly, since the object of pre-evaluation is merely to find efficient, not necessarily optimal, candidate sets for the setx variables, we need not keep track of the structure of a complex predicate. As an example, consider Figure 4, which is the logical form for the question, &quot;What drug stores are located in ward 8?&quot; The predicate of the setx is &quot;and&quot;, but for purposes of</Paragraph>
      <Paragraph position="3"> determining a candidate set we can consider each term of the &quot;and&quot; individually. Evaluation of the form with a given candidate set will ensure that a particular member satisfies both terms of the &quot;and&quot;.</Paragraph>
      <Paragraph position="4"> Operation of the pre-evaluation function. Pre-evaluation is accomplished by a function EVALUA, which takes a logical form, i.e., a setx expression or a proposition, as its argument. It determines the type of form with which it is dealing and calls an appropriate specialist routine. If, as in the case of the &quot;and&quot; of Figure 4, the logical form being considered contains more than one component form, EVALUA calls itself recursively. Consequently, pre-evaluation is a depth-first, left-to-right process. The function always returns nil, all work being accomplished by changes to global variables. Among these are a LISP variable which</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> contains a list of all setx variables in the logical form, a LISP variable which lists each query variable for which a value has been found, and its value, and a LISP variable which keeps track of the equality relationships which have been discovered between query variables for which a value is yet to be found.</Paragraph>
    <Paragraph position="1"> Operation of the algorithm can be better understood by considering somewhat more complicated examples than those seen previously. When EVALUA is given the logical form of [...] variables, and calls EVALUA with the associated setx predicate, &quot;and&quot;. As mentioned, this simply results in two</Paragraph>
  </Section>
  <Section position="9" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> calls to EVALUA, the first of which causes the quantifier specialist to be invoked. (The second call, when made, will not cause any change to the global lists of candidate values for variables, since a candidate set of all parcel identifiers is not useful for purposes of retrieval.) X39 is added to the list of query variables, and the domain argument of the quantifier is inspected. When this is seen to be an instance of setx rather than a list of constants, two actions are taken. Notice that whatever the domain of X39 is, it is a subset (perhaps not a proper subset) of the domain of X5, i.e., the candidate set for X5 must include at least all of the elements of X39. Further, any restrictions which can be imposed on X39 can also be imposed on X5, since the proposition associated with the quantifier is the one to be satisfied, and any candidate not meeting this criterion would be superfluous. Therefore, we can 1) enter into the list of variable relationships the information that, for purposes of the pre-evaluator, X39 and X5 are equivalent, and 2) call EVALUA once more with the setx associated with X5 as an argument.</Paragraph>
    <Paragraph position="1"> X5 is added to the list of set variables, and reinvocation of EVALUA with the setx predicate causes a call to the specialist for testfct. Since there are two variables in testfct, X5 and X2, for which values are unknown, a call to the data base cannot yet be made. The instance of testfct is placed on a list of pending data base calls,</Paragraph>
  </Section>
  <Section position="10" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> preceded by the variables which require values. (Each time a value for a variable is found, that list is inspected, and any data base calls which can then be made are executed.) Return is made to the quantifier specialist, which calls EVALUA with the predicate over whose arguments quantification is made, viz., greaterthan.</Paragraph>
    <Paragraph position="1"> The specialist for numeric predicates, finding that one argument is a variable and the other a constant, causes a change in the variable list to show that X39 and consequently X5 are greater than 550,000. A value like &quot;&gt;550,000&quot; can be used by the data base component to narrow its search just as well as a constant or list of constants, and is therefore acceptable as the value of a candidate list. These changes to the variable lists cause the list of pending data base calls to be inspected and, since only one variable is now unknown in the stacked testfct, a call to the data base is made for those parcels with an area greater than 550,000 square feet.</Paragraph>
    <Paragraph position="2"> The specialist for testfct instructs the data base search routine to return as a value a list corresponding to the remaining variable in the form, i.e., X2. In the present example, that is a list of parcel numbers, viz., those parcels which have an area exceeding 550,000 square feet.</Paragraph>
    <Paragraph position="3"> This list is then assigned as the value of the candidate set for X2.</Paragraph>
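    <Paragraph>The walk-through above (equating X39 with X5, deferring the testfct call, then releasing it once a bound is known) can be mimicked with a toy model. Everything in this sketch is an assumption made for illustration: the class PreEvaluator, its method names and the three-parcel area table are hypothetical, and the real EVALUA works on LISP global variables rather than on an object.

```python
class PreEvaluator:
    """Toy model of the global lists kept during pre-evaluation."""

    def __init__(self, areas):
        self.areas = areas      # hypothetical data: parcel id -> square feet
        self.values = {}        # variable -> lower bound or candidate list
        self.equivalent = {}    # variable -> variable it is equated with
        self.pending = []       # deferred calls: (value variable, key variable)

    def equate(self, var, other):
        """Record that two variables are equivalent for pre-evaluation."""
        self.equivalent[var] = other

    def defer(self, value_var, key_var):
        """Stack a data base call whose variables are still unknown."""
        self.pending.append((value_var, key_var))

    def bind_lower_bound(self, var, bound):
        """Record var > bound, propagate it, then retry deferred calls."""
        self.values[var] = bound
        if var in self.equivalent:
            self.values[self.equivalent[var]] = bound
        self._retry()

    def _retry(self):
        still_waiting = []
        for value_var, key_var in self.pending:
            if value_var in self.values:
                bound = self.values[value_var]
                # Only one variable is unknown now: make the data base call.
                self.values[key_var] = sorted(
                    p for p, area in self.areas.items() if area > bound)
            else:
                still_waiting.append((value_var, key_var))
        self.pending = still_waiting


pre = PreEvaluator({"p1": 600000, "p2": 1200, "p3": 750000})
pre.equate("X39", "X5")              # domain of X39 is given by the setx on X5
pre.defer("X5", "X2")                # testfct with X5 and X2 both unknown
pre.bind_lower_bound("X39", 550000)  # greaterthan specialist finds the bound
```

After the last call the deferred entry has been executed and the candidate list for X2 holds the parcels whose area exceeds the bound.</Paragraph>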
  </Section>
  <Section position="11" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> The stack of recursive calls to EVALUA will now unwind, until a return is made to the evaluation function EVALU.</Paragraph>
    <Paragraph position="1"> This function determines that candidate lists for all the setx variables have been found, and creates a new list of variable-candidate set pairs for use by the setx function itself. Finally, EVALU can call the LISP evaluator, with the original logical form as an argument.</Paragraph>
    <Paragraph position="2"> The case of negatives. The predicate &quot;not&quot;, denoted in our system by not* to distinguish it from the LISP not, presents special problems for the kind of system outlined above. A simple example of the difficulty can be seen in [...] &quot;What drug stores are not located in traffic zone 6?&quot; and variants thereof. When the testfct specialist is given the first half of the and in this form, along with information that there is a dominating not*, it could in principle generate a data base call, since there is only one unassigned variable. The effect would be the retrieval of all parcel identifiers of parcels not located in traffic zone 6. This is a substantial fraction of the data base, and would require inordinate amounts of time and storage space to handle. Notice that the other half of the and will also provide a candidate list for the variable X3, presumably much smaller in size. It appears to be the case, from our so far limited experience, that questions containing only a single negated search clause hardly ever occur. The evaluator therefore puts a testfct call of this type on the stack mentioned earlier, indexed by the variable(s) corresponding to the parcel identifier. When the second half of the and of Figure 6 is processed, and a value found for X3, the deferred testfct will be unstacked, resulting in a data base call, and causing a retrieval based on that list of identifiers rather than on the negated value. This data base search is necessary, since we must find the traffic zones for the parcels contained in the candidate list.</Paragraph>
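    <Paragraph>The benefit of deferring the negated clause can be seen in a small sketch. The parcel table below is hypothetical, and the function name is chosen for illustration; the point is only the order of operations, in which the positive clause supplies a short candidate list and the traffic zone is then looked up for those candidates alone.

```python
# Hypothetical data: parcel identifier -> (land use code, traffic zone).
PARCELS = {"a": (591, 6), "b": (591, 2), "c": (110, 6), "d": (591, 4)}


def drug_stores_not_in_zone(zone):
    """Answer a question of the form 'What drug stores are not in zone N?'"""
    # Positive clause first: a small candidate list from the land use code.
    candidates = [p for p, (code, _) in PARCELS.items() if code == 591]
    # Deferred negated clause: fetch the zone only for those candidates.
    return sorted(p for p in candidates if PARCELS[p][1] != zone)
```

Retrieving by the negated value instead would touch every parcel outside the zone, which the text describes as a substantial fraction of the data base.</Paragraph>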
    <Paragraph position="3"> This example is also an illustration of why, as was mentioned above, the logical form as a whole must in general be evaluated by the LISP evaluator. In this case, the candidate set for X3 derived from the second clause of the</Paragraph>
  </Section>
  <Section position="12" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> and is a superset of the answer set, which can only be derived by evaluating the whole conjunction. Some efficiencies could doubtless be gained by skipping evaluation in those cases where it is unnecessary, but that is purely an implementation decision. The not* of Figure 7 (&quot;How many banks have a height not exceeding [...]&quot;) presents a different kind of problem</Paragraph>
    <Paragraph position="2"> from the previous example. Firstly, notice that the negative must be passed inside the quantifier, since the alternative of finding all buildings greater than 5 stories in height and then getting the complement set with respect to all buildings is extremely unattractive computationally.</Paragraph>
    <Paragraph position="3"> In the second place, a search qualifier of &quot;&lt;= 5&quot; does not intuitively seem to be much worse than &quot;&gt; 5&quot;, at least in the absence of data base distributional statistics. One might, therefore, generate a search with such a qualifier.</Paragraph>
    <Paragraph position="4"> Our present system does this, although experience may show that all instances of testfct dominated by not* should be deferred, as in the cases of &quot;-=&quot;, for efficiency reasons. Other specialists. Most of the important specialist routines in the pre-evaluator have already been mentioned.</Paragraph>
    <Paragraph position="5"> There are a few others which should be noted. One is a generator function which, given a predicate, will produce its extension from a stored list. This feature was heavily used in our early system, which had a small data base, but is currently hardly used at all, though it remains available. In principle, one could, given a predicate like &quot;SCHOOL(X)&quot;, generate a list of schools. In the present application, this would not be useful, but might in some other. The sole uses at present are a generator for the predicate RANK, for which a list of numbers from 1 to 100 is produced, and for the predicate YEAR, which produces a list of the numbers 1960 to 1985.</Paragraph>
    <Paragraph position="6"> The proposition &quot;(QUANTITY x s)&quot; is true if x is equal to the cardinality of the set s. The associated specialist has the obvious function of determining when s is an instance</Paragraph>
  </Section>
  <Section position="13" start_page="0" end_page="0" type="metho">
    <SectionTitle>
</SectionTitle>
    <Paragraph position="0"> of setx.</Paragraph>
    <Paragraph position="1"> Equality between variables can be inferred where the domain of a quantified variable is given by an instance of setx, as was illustrated above. Certain predicates also allow this inference to be made. It is clear that predicates like &quot;EQUAL&quot;, &quot;SAMEREF&quot; (for &quot;same reference&quot;), and &quot;IDENTICAL&quot; should belong to this class. Since variables can only refer to individuals, the predicate &quot;MEMBER&quot; also is in this class, e.g., given (MEMBER X3 (SETX ...)), a candidate set for X3 can be derived by evaluating the setx expression.</Paragraph>
    <Paragraph position="2"> Further efficiency considerations. It has already been noted that generation from instances of testfct with an operator of &quot;-=&quot; is deferred until enough information is available to execute the query using a list of parcel identifiers. Some other steps have also been taken to reduce data base access time and subsequent evaluation time. For one thing, the semantic interpreter has a preferred ordering for instances of the predicate testfct. For example, the relation &quot;WARD&quot; divides the parcels of the city into 6 classes, while the relation &quot;LUC&quot; (Land Use Code) divides the parcels into several hundred classes. If there is no intrinsic reason for ordering the instances of testfct differently, the one with &quot;LUC&quot; will occur earlier in the logical form (cf. Figure 4). The pre-evaluation specialist</Paragraph>
    <Paragraph position="4"> for testfct makes use of this ordering in two ways. If a variable has been assigned a list of identifiers containing fewer members than some threshold x (x is currently set to 25, but can easily be changed), then a retrieval will always be made using the list of identifiers rather than by a constant compared to data base values. In Figure 4, the second call to the testfct specialist will look up the ward of the four drug stores instead of finding the hundreds of parcels in ward 2. In some instances, particularly for relations like Land Use Code, this may result in more data base accesses than retrieving a new set of keys depending on value, but the improvement cannot be large. In many other instances, there is a big reduction in accesses.</Paragraph>
    <Paragraph position="5"> If the candidate set is larger than 25, retrieval will be made using the constant, but the length of the current candidate list is used to limit the number of accesses.</Paragraph>
    <Paragraph position="6"> Thus, if the length of the current candidate list is 50, the data base access program will terminate if it finds more than 50 identifiers with the value being used. A re-access is then made using the list of identifiers. Again, this may result in inefficiency in some cases where searches are ended just before normal termination, but it does provide a guarantee against excessively long retrievals.</Paragraph>
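    <Paragraph>The two retrieval rules just described can be put together in one sketch. The function retrieve, the dictionary-based relation and the strategy labels are hypothetical; only the threshold of 25 and the cutoff at the length of the current candidate list are taken from the text, and the final intersection with the candidates is a simplification made so that the sketch returns a single answer.

```python
THRESHOLD = 25  # value quoted in the text, where it is said to be easy to change


def retrieve(candidates, relation, value):
    """Restrict `candidates` to keys whose `relation` entry equals `value`.

    `relation` is a hypothetical dict from key to stored value.
    Returns (matching keys, name of the strategy that was used).
    """
    def by_keys():
        # One access per candidate key.
        return {k for k in candidates if relation.get(k) == value}

    if THRESHOLD > len(candidates):
        return by_keys(), "keys"

    limit = len(candidates)
    found = set()
    for key, stored in relation.items():
        if stored == value:
            found.add(key)
            if len(found) > limit:
                # Too many hits: abandon the scan and re-access by key list.
                return by_keys(), "keys after cutoff"
    return found.intersection(candidates), "value"
```

The cutoff bounds the work done by a value search at roughly the size of the candidate list, which is the guarantee against long retrievals mentioned above.</Paragraph>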
    <Paragraph position="7"> Any number of other efficiency measures could be adopted, and more may be necessary than we now have. For the moment,</Paragraph>
    <Paragraph position="9"> these seem to provide acceptable retrieval times.</Paragraph>
    <Paragraph position="10"> The Evaluator. For the most part, evaluation of logical forms is quite straightforward. Hidden semantic effects are discussed in the next section; here we are mainly concerned with computation.</Paragraph>
    <Paragraph position="11"> Each instance of setx searches the list of variable-candidate set pairs to find the candidate set associated with its own variable and substitutes the members of the set for the variable one by one into its associated predicate. Those members of the candidate set for which the predicate evaluates to true are placed in the solution set. Operation of the quantifier predicates is similar to that of setx, except that, as in Figure 5, it may be necessary to evaluate an instance of setx to find the domain of the quantification variable.</Paragraph>
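    <Paragraph>The behavior of setx at evaluation time can be sketched in a few lines. The dictionary candidate_sets stands in for the list of variable-candidate set pairs built by EVALU, and the numeric candidates are hypothetical; predicates are modeled as Python functions of one argument.

```python
def setx(variable, predicate, candidate_sets):
    """Solution set: candidates for `variable` that make `predicate` true."""
    return [member for member in candidate_sets[variable] if predicate(member)]


# Hypothetical candidate list produced by pre-evaluation.
candidate_sets = {"X2": [4, 9, 12, 30]}
solution = setx("X2", lambda n: n > 8, candidate_sets)
```
</Paragraph>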
    <Paragraph position="12"> Evaluation of the other predicates consists simply of applying a corresponding LISP function to the arguments. Sometimes the final logical form to be evaluated bears no obvious relation to the input question, as in Figure 8. The usual reason is that a large amount of evaluation was done during interpretation, because the form contained no free variables. The full logical form corresponding to Figure 8 [...] [Figure: &quot;Are there more than 25 parcels in the Carhart neighborhood?&quot;]</Paragraph>
  </Section>
  <Section position="14" start_page="0" end_page="0" type="metho">
    <SectionTitle>
15986 LOGICAL FORM:
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> The evaluatign of the predicate test'fct is not as nbbious as that of the othGrs One of the design goals in the project has been to make it reIatiLo1y easy to move from one data base to another. As past of that ef5brt, we have attempted to make the LISP programs, as c-ontsnsted tb the PL/I programs, insensitive to the stxuctuxe of the data base. Oux approach to ti has been to define a list strdcture, essenthlly nested binary relatSanS, into which the zeal data st!zucture is mapped. Restructuring is accomplished by the PL/I program which serves as the LISP RSS interface. At the same tune. as the PL/I program returns.</Paragraph>
    <Paragraph position="3"> vafues to the testgct specialist durlng tile pre-evaluation phase, it $oxmatS the corresponding data base items into the sbandard struchre and writes them onto a disk fie In effect creating a sub-data base 5or the particular query.</Paragraph>
    <Paragraph position="4"> Only the sub-data base is used during evaluation of logical forms, to find values corresponding to keys in the instances of testfct. In addition to isolating the LISP programs from the real data structure, this tactic makes it unnecessary for any programs called by the evaluator to re-access the full data base, with a consequent efficiency gain.</Paragraph>
    <Paragraph position="5"> Creation of the standard LISP data base into which the real data is translated has meant that the set of LISP functions has undergone the least modification in our change of data base from business statistics to planning data.</Paragraph>
    <Paragraph position="6"> Except for improvements made to increase the efficiency of</Paragraph>
    <Paragraph position="8"> programs, these routines are almost the same as they were before.</Paragraph>
  </Section>
  <Section position="15" start_page="0" end_page="0" type="metho">
    <SectionTitle>
SEMANTIC EFFECTS EVALUATION
</SectionTitle>
    <Paragraph position="0"> In principle the processes which will be used to compute the answer to a query should be obvious at the level of either the query structure or the logical form. We have not, however, been completely successful in accomplishing this. In some cases, we can see how it might be done and have not gotten around to doing it because of more urgent concerns. In other cases, we can see how to do it, but not how to do it efficiently. In a few cases, it is not clear what to do.</Paragraph>
    <Paragraph position="1"> Approximation. Consider the sentence and corresponding logical form shown in Figure 10. The precise system meaning of "about" is clearly hidden in the program corresponding to the operator APPROX. In the present implementation, APPROX of x and y is true if:</Paragraph>
    <Paragraph position="3"> I.e., 6 and 8 are approximately equal to 7, 14 and 18 are approximately equal to 16, and 951 and 1049 are approximately equal to 1000. Whether this definition is</Paragraph>
    <Paragraph position="5"> satisfactory or not clearly depends on a variety of contextual factors. It should also be clear that the semantic interpreter could produce a logical form in which this meaning was expressed directly. We have chosen to express the meaning in our processing programs primarily for convenience, i.e., it was easiest to do it in this way, and there was no obvious reason to do it elsewhere.</Paragraph>
    <Paragraph position="6"> A similar but slightly different example is shown in Figure 11, where the output rather than the input is to be an approximation to the true value. In this instance, a function called FUZZUP is applied to a data base value to find that number with the maximum number of trailing zeros which satisfies the APPROX relation. [Figure 11: "About how many square feet do the drug stores have?"] The fuzzed value rather than the true value becomes the output. A more subtle case is illustrated by Figure 12. It seems clear that what is really wanted are those parcels with an area of a million square feet or more, rather than exactly 1,000,000 square feet. If the latter result is wanted, the question is better phrased "exactly 1,000,000" (and must be phrased in this or a similar way in our system). On the other hand, a value like 1,000,205 seems to imply that exact equality is wanted. This intuition is captured in our system</Paragraph>
    <Paragraph position="8"> by having the testfct predicate inspect its numeric arguments with a function called ROUNDNM, which is true if an argument is a round number, defined in our system to be a number greater than 99 in which at least the rightmost half of its digits are 0. In the case of round numbers, it seems reasonable to give as an answer the identifier of a parcel</Paragraph>
    <Paragraph position="10"> whose area is only slightly less than 1,000,000 square feet, as well as greater. In our implementation, we use the same lower limit as for APPROX, but this may be too low. In order to insure that the answer is correctly understood by the user, the system saves the exact values retrieved and displays them on request, as shown in Figure 12.</Paragraph>
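    <Paragraph>The round-number test can be illustrated with a short Python sketch. It is not the system's own ROUNDNM code: it reads the definition as "greater than 99, with at least the rightmost half of the digits being zeros", and the treatment of numbers with an odd count of digits (the half is rounded up) is an assumption made here, as is the function name roundnm.

```python
def roundnm(n: int) -> bool:
    """True if n is greater than 99 and at least the rightmost half of its digits are zeros."""
    if not n > 99:
        return False
    digits = str(n)
    trailing_zeros = len(digits) - len(digits.rstrip("0"))
    # Assumption: for an odd number of digits the required half is rounded up.
    needed = (len(digits) + 1) // 2
    return trailing_zeros >= needed
```

Under this reading 1,000,000 counts as a round number, so a comparison against it could be relaxed, whereas 1,000,205 does not, so exact equality would be tested.</Paragraph>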
    <Paragraph position="11"> Equality of character values. A problem analogous to that of numerical approximations occurs also in comparing character string values. Consider the question and answer pair shown in Figure 13 ("What parcels does Shell own?"). The contents of the OWNER field</Paragraph>
  </Section>
  <Section position="16" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4244 LOGICAL FORM:
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"/>
  </Section>
  <Section position="17" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4432 ANSWERS:
SHELL OIL COMPANY
SHELL OIL CO
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="1"> have not been standardized, so that parcels could be owned by "Shell Oil", "Shell Oil Co.", etc. Fortunately, for names of persons, last names are listed first, so that the strategy of assuming equality if the input argument and the field value match up to a comma or a blank is generally successful. Problems do arise; for example, properties belong both to "The City of . . ." and "City of . . .", where the left match fails to find all the relevant data items.</Paragraph>
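    <Paragraph>The left-match strategy can be sketched in Python as follows. This is an illustration only, not the SAMEREF definition used in the system: the function name sameref, the case-insensitive comparison, and the sample owner strings other than the two Shell values are assumptions introduced here.

```python
def sameref(argument: str, field_value: str) -> bool:
    """Left match: the argument must equal the field value up to a comma or a blank."""
    arg = argument.strip().upper()        # assumption: comparison ignores case
    value = field_value.strip().upper()
    if value == arg:
        return True
    # Otherwise the field must start with the argument, and the character
    # that follows the argument must be a comma or a blank.
    return value.startswith(arg) and value[len(arg)] in ", "
```

With this sketch "Shell" matches both "SHELL OIL COMPANY" and "SHELL OIL CO", while a hypothetical argument "City of Exampleville" fails against "THE CITY OF EXAMPLEVILLE", which is the left-match failure noted above. A bare surname also matches every owner field that begins with that surname, so the strategy can retrieve too much as well as too little.</Paragraph>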
    <Paragraph position="2"> [Figure 14: "What parcels does Gluck own?"] The opposite situation, i.e., over-generalization, can of</Paragraph>
  </Section>
  <Section position="18" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4525 LOGICAL FORM:
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> course also occur, cf. Figure 14. In any event, the decision as to what constitutes sameness of reference is buried in computer code, in this instance in the PL/I program as well as in the LISP definition of the function SAMEREF.</Paragraph>
    <Paragraph position="3"> Definitions. The extensional definition of most predicates can be derived from the data base. A few predicates are defined by the system code. Examples are RANK and YEAR, which as mentioned above have associated generators. An additional example is LASTYEAR, which is defined to be the previous year. Many other definitions of this kind have been eliminated in the current version of the system.</Paragraph>
    <Paragraph position="4"> Answers. It is not always obvious what constitutes the answer to a question. Consider the example in Figure 15. Both the English question in its literal reading and the logical form would seem to imply that the question would be answered by presenting only the numbers in the right hand column of the table which is actually printed as an answer. Yet it is quite clear that a simple list would generally be useless without the parcel identifiers printed on the left, and indeed that identification would be expected by the person entering such a question. The example of Figure 16 ("In what wards are the drug stores located?") is less clear. An enumeration of the three wards in which the four drug stores were located might have been a sufficient answer. The answer given would be correct for "In what ward is each drug store located?" Moreover, given the question "What are the wards which have drug stores?" it is clear that only a list of wards should be the output, and given "What is the combined floor area of the drug stores?" only a single number representing the total is the desired</Paragraph>
  </Section>
  <Section position="19" start_page="0" end_page="0" type="metho">
    <SectionTitle>
9-403 LOGICAL FORM!
</SectionTitle>
    <Paragraph position="0"/>
    <Paragraph position="2"> answer. (Our system does not as yet answer this question or its analogues, although this is planned for later in the year.) Since the ambiguity exhibited by the question of Figure 16 is so pervasive in an application of this kind, we have chosen to present a maximally general answer, including identifications, when we are unable to resolve the ambiguity directly. An exchange with the user could be devised to elicit the information for resolution, but would rapidly become tedious for questions of this type. For yes/no questions, and for questions in which there is only one object in the answer set, this problem naturally does not</Paragraph>
    <Paragraph position="4"> arise, and the appropriate answer is easily produced.</Paragraph>
    <Paragraph position="5"> We have not yet concerned ourselves with adding an English response generator to the TQA system. In the applications envisioned at present, such a capability does not seem to be critical. We are able to manage with short answers from the data base and with canned information and error messages. In spite of this omission, it should also be apparent that our computational component has a considerable amount of linguistic knowledge embedded in it, more than we would like. Whether it is possible to achieve a level of formal representation which would make this unnecessary is still unclear. Moreover, even if it were possible, it is not clear whether such a solution would be efficient enough, or even if it would be more perspicuous than the current system. We intend to proceed as far as we are able in this direction, out of conviction that practically useful systems must be easily adaptable to new applications, and that such adaptation is much more difficult when computer code, even high-level computer code, must be changed, rather than tables. This is not to imply that we regard modification of a table whose size is on the order of a grammar as trivial; quite the contrary.</Paragraph>
    <Paragraph position="6"> Nonetheless, we believe it is easier to change a grammar or</Paragraph>
    <Paragraph position="8"> a semantic interpreter expressed in table form than it is to change a special parser or a special interpreter. In essence, we believe it should not be necessary for a computational linguistics project to describe operations beyond the last level of formal representation in order for an outsider to understand exactly how a system operates.</Paragraph>
    <Paragraph position="9"> This system was formerly called REQUEST. The form of Figure 3 is, in fact, subject to another syntactic transformation prior to execution. Normally, foratleast needs to be executed once for each potential value of the setx variable. However, in the case where the quantificational range of foratleast 1 is a constant, repeated evaluation of the quantifier is quite inefficient. Instead, a special retrieval function called MAPFIELD, which can accept a list of arguments, replaces forms like those of Figure 3. In this example the replacement takes the form: ( MAPFIELD 'x77 'JSTOR '(5043 .... ... 00) '1976 ' 1. Although this transformation arises quite often in practice, it is sufficiently non-general that we have not augmented our inventory of logical forms by including MAPFIELD.</Paragraph>
    <Paragraph position="10"> Instead, we look on it as an implementation measure only.</Paragraph>
  </Section>
  <Section position="20" start_page="0" end_page="0" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> We describe the continuation of an earlier work on the problem of lexical coverage. The objective is to prove experimentally certain mathematical conjectures concerning the relationship between the sizes of the covering and covered sets of words, and the maximum length of dictionary definitions. The data base on which the experiments are carried out has also been extended to the full contents of an existing dictionary of computer terminology. The results of the previous and present work lay the foundations for quantitative studies on lexical valence and its relation to the frequency of usage and other principles of dictionary selection.</Paragraph>
    <Paragraph position="1"> Besides the inherent interest in these investigations, the concepts dealt with and the methods of quantifying dictionary variables may eventually lead to more efficient dictionaries with respect to precision, compactness, and computer time and memory needed for processing.</Paragraph>
    <Paragraph position="2"> Supported by NSF Grant MCS 76-24278.</Paragraph>
    <Paragraph position="3"> First, we shall introduce the problem, define some basic terms and provide a brief historical account of past results. In order to render this paper fairly self-sufficient, a brief summary of the previous work, Findler and Viil (1974), will also have to be given.</Paragraph>
    <Paragraph position="4"> A monolingual dictionary may be considered economical and efficient if a small set of words is used to define a relatively large set of entries. Quantitative information as to what size vocabulary is needed to cover a given number of entries is very scarce and may be characterized by two "data points": The New Method English Dictionary published by M.P. West and J.G. Endicott in 1961 uses 1,490 self-defined basic words to explain some 18,000 words and 6,000 idioms, i.e. about 24,000 expressions. Thus, the size ratio is 0.062.</Paragraph>
    <Paragraph position="5"> Ogden's Basic English, published in 1933, involves 850 English words and 50 "international" words to define 20,000 English words. The ratio of the covering and covered set sizes is 0.045.</Paragraph>
    <Paragraph position="6"> The basis of selection was the "usefulness" of the words employed in the definitions, as opposed to the frequency of their occurrence in some standard texts. However, neither this concept nor other principles of selection suggested by other researchers have ever been quantitatively analyzed and made use of. We shall discuss these issues later on.</Paragraph>
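    <Paragraph>The two size ratios quoted above follow directly from the stated set sizes. The short Python check below is a sketch added for illustration; the function name size_ratio is an assumption, and the only inputs are the figures given in the text (1,490 words covering about 24,000 expressions, and 850 + 50 words covering 20,000 words).

```python
def size_ratio(covering: int, covered: int) -> float:
    """Ratio of the size of the covering set to the size of the covered set."""
    return covering / covered


west = size_ratio(1490, 24000)        # New Method English Dictionary
ogden = size_ratio(850 + 50, 20000)   # Basic English, including the international words
print(round(west, 3), round(ogden, 3))
```

A smaller ratio means that fewer defining words are needed per dictionary entry.</Paragraph>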
    <Paragraph position="7"> In order to approach the problem in definite terms, Findler  limiting value asymptotically as vS increases.</Paragraph>
    <Paragraph position="8"> (a4) The increment ratio never exceeds the size ratio.</Paragraph>
    <Paragraph position="9"> Two points need to be noted in this connection. An exception to rules (a1) and (a2) would occur in a dictionary system which does not treat polysemous words or homonyms as individual entries, every time a new word with many meanings or homonyms is introduced into the covered set. Second, the cited case is an exception to rule (a1) but not to (a4). When N=1, the covering and the covered sets are of the same size, i.e., both the increment ratio and the size ratio equal one. However, not every word is defined by itself only. If a new word is introduced that already has a synonym in the covering set, it will be defined by that synonym. In this case, the increment ratio is 0 and the size ratio becomes less than 1. (This will be clear with the description of the data base construction on page 11.) For the second general task, (b), the following conjectures were also made:  (b1) vR monotonically decreases as N increases.</Paragraph>
    <Paragraph position="10"> (b2) For any fixed value of vS, vR asymptotically approaches a lower limit as N increases without bound.</Paragraph>
    <Paragraph position="11"> It seems reasonable to state in a qualitative sense that in the process of generating a dictionary smaller vR values mean smaller storage requirements whereas smaller N values tend to reduce processing time and output volume. In order to answer the question "What are the optimum values of vR and N for a given vS for a certain (family of) computer applications on a machine with a given cost structure?" one has to consider the interrelation of the above three basic variables and to compute three entities: the semantic index (roughly, the number of different meanings) of the elements in the covered set, the lexical valence (roughly, the capability of being substituted for another) of the elements in the covering set, and the frequency of occurrence of</Paragraph>
    <Paragraph position="12"> the elements of both sets. Quantitative investigations of the last three dictionary variables are planned to follow the present, second stage of our study.</Paragraph>
  </Section>
  <Section position="21" start_page="0" end_page="0" type="metho">
    <SectionTitle>
THE DATA BASE AND THE PROGRAM
</SectionTitle>
    <Paragraph position="0"> We have extended the data base used in our previous work, Findler and Viil (1974).</Paragraph>
    <Paragraph position="1"> The whole contents of the dictionary on computer technology, Chandor (1970), is now included in the present study. Its structure, rather simple and uniform, is described below. First, some general principles of data base construction are outlined.</Paragraph>
    <Paragraph position="2"> Every element of the covered set is considered a single lexical item, regardless of the number of words the original dictionary entry consists of. Also, each word is coded as a string of at most 10 characters (containable in one CDC Cyber computer word). The abbreviations are still easy to read with relatively short practice.</Paragraph>
    <Paragraph position="3"> Only the dominant meaning of polysemous terms was dealt with.</Paragraph>
    <Paragraph position="4"> Each entry has thus one meaning and one definition. Terms in the definitions (elements of the covering set) are also considered lexical items, i.e. even multiword entities appear as a single unit and are represented by at most 10 characters. The basic vocabulary, that is the covering set, consists of elements that also appear in the covered set. In our particular case, they are non-technical words used to define the technical terms of the computer dictionary. A definite distinction was made between content words and function words (also called operators). The latter were not included in the covering set nor were they counted in determining the length of definitions. Hence, the covering set consists only of content words.</Paragraph>
    <Paragraph position="5"> The function words indicate grammatical and logical relationships between the words contributing to the content. They belong to 17 categories:  1) prepositions, e.g. of; 2) conjunctions, e.g. and, or, if; 3) the relative pronoun which; 4) combinations of preposition and relative pronoun, e.g. to which, by which; 5) present participles equivalent to a preposition, e.g. using, containing, representing; 6) combinations of participle and preposition, e.g. consisting of, opposed to, applied to; 7) combinations of adjective and preposition, e.g. capable of, exclusive of, equal to; 8) combinations of noun and preposition, e.g. part of, set of, number of; 9) combinations of preposition, noun, and preposition, e.g. in terms of, by means of, in the form of; 10) prepositional phrases associated with a following  Actually, the function words were replaced by code numbers in the dictionary. The code numbers were assigned consecutively as the function words were needed during the construction of the data base so that the order is purely random. A complete list of the 121 function words used, together with their code numbers, is given in Table I.</Paragraph>
    <Paragraph position="6"/>
  </Section>
  <Section position="22" start_page="0" end_page="57" type="metho">
    <SectionTitle>
INSERT TABLE I ABOUT HERE
</SectionTitle>
    <Paragraph position="0"> The original definitions were somewhat simplified and standardized. In this process, articles were omitted (many languages do very well without them). On the other hand, implicit relationships were made explicit. Nouns are represented in singular, thus avoiding another dictionary entry for plural or, what would be worse, programming a "grammar". Likewise,</Paragraph>
    <Paragraph position="1"> finite verb forms are represented in third person plural present indicative active.</Paragraph>
    <Paragraph position="2"> Avoiding the third person singular eliminates another dictionary entry, and avoiding the passive voice eliminates a great many participles, which otherwise would have had to be entered. Of course, present and past participles (the former identical to gerund in form) could not always be avoided and had to be entered in the dictionary where needed. Auxiliary verbs were automatically eliminated by avoiding compound tenses and the passive voice. Finally, "to do" associated with negation was simply omitted.</Paragraph>
    <Paragraph position="3"> Some examples will make the encoding process clear.</Paragraph>
    <Paragraph position="4"> Original dictionary entry: aberration A defect in the electronic lens system of a cathode ray tube.</Paragraph>
    <Paragraph position="5"> Definition in the data base:  [Table I, fragment. Function words with their code numbers: 110 belonging to; 111 corresponding to; 112 due to; 113 required for; 114 type of; 115 across; 116 because; 117 designed; 118 indicating; 119 produced by; 120 outside; 121 towards. Function words from an adjacent column, code numbers not shown: not; but; extended to; so as to; for example; represented by; along which; representing; against which.]</Paragraph>
    <Paragraph position="7"> Note that "electronic lens system" (should be: electronic-lens system) means "system of electronic lens" (as opposed to "electronic system of lens"), and this relationship is made explicit. Note also that "cathode ray tube" is a single lexical item.</Paragraph>
    <Paragraph position="8"> Original dictionary entry: absolute coding Program instructions which have been written in absolute code, and do not require further processing before being intelligible to the computer.</Paragraph>
    <Paragraph position="9">  Note that the first predicate in the relative clause, third person plural perfect indicative passive, is represented by the singular indefinite pronoun "one" as subject, followed by the standard plural active verb. The auxiliary "do" has been omitted and the negation is represented by a function word. The virtually redundant "being" has also been left out. In general, the copula is omitted (some languages do very well without it). Original dictionary entry: analytical function generator A function generator in which the function is a physical law. Also known as natural law function generator, natural function generator.</Paragraph>
    <Paragraph position="10">  The stylized definitions are easily understandable even to human readers as the printout of the dictionary demonstrates. The data base was constructed by selecting the first entry, then entering all the lexical items in its definition, subsequently entering all the lexical items in the definitions of these, etc. Words that were not defined in the original dictionary were entered and defined by themselves; they constitute the basic vocabulary. This procedure was continued until everything was defined, i.e. until all the terms in the covering set were also in the covered set. Then the next entry was selected from the dictionary, and the above process was repeated.</Paragraph>
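    <Paragraph>The construction procedure can be sketched as a closure computation in Python. This is a minimal sketch under stated assumptions, not the authors' SLIP/FORTRAN program: the function name build_data_base, the dictionary-of-lists representation, and the toy definitions are hypothetical, and function words are assumed to have been removed already.

```python
from typing import Dict, List


def build_data_base(source: Dict[str, List[str]], entries: List[str]) -> Dict[str, List[str]]:
    """Enter each selected entry together with every lexical item its definition needs."""
    data_base: Dict[str, List[str]] = {}
    for entry in entries:                      # select the entries one after another
        pending = [entry]
        while pending:                         # continue until everything is defined
            word = pending.pop()
            if word in data_base:
                continue
            # Words not defined in the source dictionary are defined by
            # themselves; they make up the basic vocabulary.
            definition = source.get(word, [word])
            data_base[word] = definition
            pending.extend(w for w in definition if w not in data_base)
    return data_base


# Hypothetical toy dictionary (content words only).
source = {"aberration": ["defect", "lens", "tube"], "lens": ["glass"]}
data_base = build_data_base(source, ["aberration"])
covered = set(data_base)
covering = {w for definition in data_base.values() for w in definition}
basic = {w for w, definition in data_base.items() if definition == [w]}
```

When the loop ends, every term used in a definition is itself an entry, which is the stopping condition stated above.</Paragraph>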
    <Paragraph position="11"> The dictionary was arranged in the form of a SLIP list, Findler et al. (1971). Every entry (element of the covered set) occupies four cells in this list: (1) entry word (as character data, using FORTRAN format specification A10), (2) definition length (an integer), (3) type of entry (an integer), (4) sublist name.</Paragraph>
    <Paragraph position="12"> Three types of entries were distinguished for programming convenience: 1) code 0 indicates that the entry itself is not used in any definition, i.e. it occurs only in the covered set and not in the covering set; 2) code 1 indicates that the entry occurs in both sets and is not an element of the basic vocabulary; 3) code 2 indicates that the entry is defined by itself, i.e. it belongs to the basic vocabulary.</Paragraph>
    <Paragraph position="13">  The sublist, the name of which is in the fourth cell for every entry in the main list, contains the definition. This arrangement conveniently separates the entry words from those in the definitions.</Paragraph>
    <Paragraph position="14"> A cell in this second level contains either a word (in A10 format), i.e. an element of the covering set, or a sublist name. The codes for function words (integers) are contained in the cells in the third level. This arrangement is convenient for bypassing the function words in processing when they are not needed. The general dictionary entry and an example thereof are illustrated in Figure 1.</Paragraph>
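    <Paragraph>The four-cell entry layout and the three type codes can be mirrored in a small Python sketch. It is an illustration only and simplifies the SLIP structure: the class Entry and the functions content_words and make_entries are hypothetical names, function-word codes are stored as integers in the same list as the content words rather than on a separate third level, and the sample definitions and the code number 5 are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

Item = Union[str, int]   # str: content word; int: code number of a function word


@dataclass
class Entry:
    word: str                    # cell 1: entry word (at most 10 characters in the original coding)
    definition_length: int       # cell 2: number of content words in the definition
    entry_type: int              # cell 3: 0, 1 or 2, as described in the text
    definition: List[Item] = field(default_factory=list)   # cell 4: the definition sublist


def content_words(definition: List[Item]) -> List[str]:
    """Bypass the function-word codes and keep only the content words."""
    return [item for item in definition if isinstance(item, str)]


def make_entries(definitions: Dict[str, List[Item]]) -> Dict[str, Entry]:
    # Collect every word that is used in the definition of some other entry.
    used = set()
    for word, definition in definitions.items():
        used.update(w for w in content_words(definition) if w != word)
    entries = {}
    for word, definition in definitions.items():
        words = content_words(definition)
        if words == [word]:
            entry_type = 2       # defined by itself: basic vocabulary
        elif word in used:
            entry_type = 1       # occurs in both the covered and the covering set
        else:
            entry_type = 0       # occurs only in the covered set
        entries[word] = Entry(word, len(words), entry_type, list(definition))
    return entries


# Hypothetical toy data; 5 stands for some function-word code.
entries = make_entries({
    "aberration": ["defect", 5, "lens"],
    "lens": ["glass"],
    "defect": ["defect"],
    "glass": ["glass"],
})
```

Keeping the function-word codes distinguishable from the content words is what allows the definition length to be counted over content words only.</Paragraph>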
    <Paragraph position="15"> INSERT FIGURE 1 ABOUT HERE The fact that every dictionary entry owns a sublist is practical in another respect: useful information about the entry can be collected and deposited in a description list associated with the sublist. For example, if it were desired to evaluate the definition component of the lexical valence of each lexical item, a program could be developed that counts how many times a particular item occurs in the definition of other items and stores this information in the description list created for that item. Investigations of this nature will be done subsequently. The task is to establish experimentally the relationship between N and vR for fixed values of vS. The program starts out  Findler and Viil (1974), or one calculated for the extended data base. The size of the covering set, vR, is then  definitions of length 1, 2, 3, 4, etc. (Recall, code 1 means that such entries are not defined by themselves and occur both in the covering and the covered set.) After the substitutions are made in all definitions and the words are counted  for different size covered sets, i.e. vS is</Paragraph>
    <Paragraph position="17"> constant  for each N. (We note that a quantitatively more satisfactory refinement could have been added to the  with all the remaining definitions, and those which do not appear in any definition are to be eliminated. Thus a basic word would occur in the dictionary only if it is needed in a definition, which is the case in the unreduced dictionary. This way, a more natural proportion between the basic words and others could be restored. However, in the present preliminary work, we did not wish to pay the considerably higher price for such refinement.) The program is very complex for two basic reasons. First, the definitions of words to be replaced may themselves contain one or more words to be replaced. Therefore, as many as necessary iterations of replacement have to be carried out in the process. Second, the huge data base representing the whole dictionary had to be subdivided into files only one of which can be dealt with by the program at a time. The intermediate results of one run have to be transferred to the subsequent run, which requires some tricky programming. A brief description of the multi-file handling is given in the Appendix.</Paragraph>
    <Paragraph position="18"> Figure 2 summarizes the results for four different levels of the covered set. Although the procedure followed (leaving one and then two files out of the nine, and adjusting for the bias introduced) leads to quantitative inaccuracies, the conjectures listed in the Introduction are fully corroborated.</Paragraph>
    <Paragraph position="19"/>
  </Section>
  <Section position="23" start_page="57" end_page="59" type="metho">
    <SectionTitle>
INSERT FIGURE 2 ABOUT HERE
FINAL COMMENTS
</SectionTitle>
    <Paragraph position="0"> The data base encoded, some of the programs used and, most of all, the experience gained in dealing with dictionaries and their characteristic variables will be useful in attacking the next set of problems. The latter relate to the question on what size vocabulary is needed to cover a given number of dictionary entries (without the ubiquitous circular definitions). The answer should be given as a function of storage requirements and processing time so that an optimum solution can be obtained for a family of applications on a machine with a given cost structure.</Paragraph>
    <Paragraph position="1"> Such study will involve the semantic index of the elements of the covered set, the lexical valence of the elements of the covering set, and the frequency of occurrence of the elements of both sets. ACKNOWLEDGMENTS We thank H. Viil, who co-authored with one of us (N.V.F.) the first phase of this work, for many ideas and stimulating discussions. We are also indebted to Penguin Books for their permission to use one of their publications as our data base. In the following, we give a brief description of the way multi-file handling has been organized.</Paragraph>
    <Paragraph position="2"> It was noted before that the whole dictionary could not be fitted in the core memory at one time and, therefore, the data base had to be subdivided into 9 files to be processed separately. There was a need, however, for some flow of information between runs dealing with the different files. This was arranged by additional files constructed during processing time as well as a few control variable values being read from cards at the beginning of runs subsequent to the first one.</Paragraph>
    <Paragraph position="3"> The variable KNTPFT indicates the section of the dictionary currently under study. The variable IPCONT is set to 0 for the very first run for each N value. This tells the program to set up new lists for Covered List, Covering List, and so-called Waiting List. In all subsequent runs, its value is 1, which indicates that the program must bring these lists in from an additional, external file.</Paragraph>
    <Paragraph position="4"> The nrcmrar exanfnes tbe current qeetion of the dlctjonarv, entrv v entrrf. IF the entrv I- an ele~ent of the haslc vocahttlarr (tvpe 2)&amp;quot; the prooram byaasses it vhen it Peals wxth the unxebucea itictxonaxv (~t +9 hound to be nraeessed as r~rt of a def jnition Later) . Fthertui ae, th s type of r ad is hedlately added to both the Covered List ant? the Paverincr Zist (c?ucb ~mrd aluta~rs caverq itselr) , since tb~ Ocrf~~itlon~ in tihlcb tbev occur may Pave been ~l~rn~nated.</Paragraph>
    <Paragraph position="5"> Tf a vrbrcl 3s not founil fin the Coverad ~1st~ It in ~i~t th~r~ and the appronrJ ate counter is 3 ncrenentad. vhen all the t ores zn the deflnj thn of the v70rA In csuestxan are nut an tbe F'ajtJncr Llst, vhhtch 1s suhsementlv processed. Thy s 3 s recessart? because of the ado~tea rr~rc~rl-e that all t covermu tnrds ruqt thmselvecr Fe covere?. (Tabuhttd data are Wan~ncr*ul onlv If tbw condxtlon 4s satlsfjefi,) The rracrrar eventu~llv exmlnes the Pa+tlncr t~st %wrA htr tmrrj, If the carrent vmrii I$ alreadlr an the Pnvcre8 Ll=t (jt mart have recurred earljer in the Pirtmnarp) , the nronyam cbrcl.s if ~t is a190 fin the Cover~ncr Lltqt (~t msv n~t he becan~e ft bas net' vet occurred in the Ctef~nq tmn of anotber I ard) . If not, ~t %p nut there ~nd t%e avnrcnr~ate rourtex 1s .rtlcrepsed nX3 I c\rec: or the ~~a~tinq Ilst COFP fr~~ rl~finrt~~ns am1 lnust tFerefore be ad fled to tbe Cnverjnc qt .</Paragraph>
    <Paragraph position="6"> after 8 mrP ha^ keen rsocessdP, it 1s deleted frer the fhjtl~o f 7 ~t (but it4 proce~~1~0 FaV ~AVC caused net, entries to annew nn the yT~itlnrr f let) .</Paragraph>
    <Paragraph position="7"> Tf the curr~rt ~~ort? j c: n~t Pn the Covered Tht, ft rust, of course, he nut therp. Pmt, hh~der, the proor~lr testp If the card, occurs in the sertlon aC the (1J~tronam qy~ently in core mmorv (it9 %nuer~ca1 ~alue?~ Fettfleen tFe9e of the fjrst end the last vord of the sectjon) , Tf' the \tor8 IC not there, it* processinr 1~ ~ast~oned and the nevt ~mr? on, the \7altlina 7 1st fs exaln~nee! because it zs ,yare econo~j ca3 to nroceqc. f 7 rrt all tFe PII !lords avalrlahle in the dr ct~onaxy sectlan present than to r~e4 {n other sect~arct of tbe dlctwnerv a6 tte wrPs dfctatc zt (meronr svfppno I s expens~ve) .</Paragraph>
    <Paragraph position="8"> When the bottom of a non-empty Waiting List is reached, the words remaining there must be in other sections of the dictionary. Subsequent dictionary sections are brought in, to replace the current one, in a cyclic manner until all processing is completed.</Paragraph>
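The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the authors' program: the function name cover_dictionary, the representation of the dictionary as a mapping from headword to definition words, and the omission of the counters and of the section-by-section paging through core memory are all simplifying assumptions made here for illustration.

```python
from collections import deque


def cover_dictionary(definitions, basic_vocabulary=frozenset()):
    """Return (covered, covering) sets for a toy dictionary.

    definitions maps each headword to the list of words in its definition
    (a hypothetical in-memory stand-in for the dictionary sections).
    """
    covered = set()     # Covered List: words already accounted for
    covering = set()    # Covering List: words used inside some definition
    waiting = deque()   # Waiting List: definition words still to be processed

    def expand(word):
        # Put the word on the Covered List and queue its definition words.
        covered.add(word)
        waiting.extend(definitions.get(word, []))

    for entry in definitions:
        if entry in basic_vocabulary:
            # Assumption: a basic word is taken to cover itself.
            covered.add(entry)
            covering.add(entry)
            continue
        if entry not in covered:
            expand(entry)
        while waiting:
            word = waiting.popleft()
            # Every word on the Waiting List came from a definition.
            covering.add(word)
            if word not in covered:
                expand(word)
    return covered, covering
```

Because every word taken from the Waiting List is itself expanded if it has not been covered yet, the sketch preserves the stated principle that all covering words must themselves be covered.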
    <Paragraph position="9"> American Journal of Computational Linguistics</Paragraph>
  </Section>
  <Section position="24" start_page="59" end_page="59" type="metho">
    <SectionTitle>
COMPUTATION IN DEPARTMENTS OF LINGUISTICS
RICHARD FRITZSON
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
Department of Linguistics
State University of New York at Buffalo
</SectionTitle>
      <Paragraph position="0"> Buffalo, New York 14261 That computers and linguists meet, for the most part, only in the still somewhat exotic field of computational linguistics is a sad statement about the state of ordinary linguistic research. The time when computers were to be considered only the tool of the natural scientist or the statistically minded social scientist is long past; 'word processing technology' is now the specialty of a growing number of computer companies. Not only can this technology be of great value in reducing the clerical burden of the linguist and linguistics student, but linguists, as specialists who have been studying and manipulating language for years, are in a position to be contributing to this field. In fact, in many areas of linguistic research (the analysis of particular languages, the search for linguistic universals, the analysis of discourse and text), computer technology can be of help to the linguist, and in many subfields of computer science (automated language processing, the design of human/machine interfaces, the structuring of data bases), linguistics has much to offer the computer scientist. Yet up until now, relatively few such cross contributions have been made. Computer scientists have been slow to discover the value of linguistics to their work; the time has come for linguists to take the initiative and to train themselves (and their students) to make use of and contribute to the field of computer science. Specialized training in the use of the computer within a particular discipline is not new. Students in many social sciences now find themselves facing increasing pressure and mandatory requirements to take computer training within their department. Linguistics is, in fact, unusual in not having such requirements or even opportunities. At a time when graduating linguistics students are
facing a shrinking job market, the opportunity to be trained in a commercially useful application of linguistics ought to be attractive to many students. Today, in most universities, computing is available to linguistics departments only through the use of a large, central university computer which is expected to be of service to all university departments. But as computer costs continue to fall, and as large computing centers continue to be unresponsive to the needs of their new users, it will not be uncommon to find more and more departments purchasing their own computing facilities and buying or developing their own software. This is already happening today, both by externally funded individual researchers and by entire departments in need of specialized computing facilities. What kinds of computing equipment are available for a linguistics department trying to equip itself today? My answer is structured, to some extent, by the organization of language. It is widely understood, even by non-stratificational linguists, that the faculty of language is based on a stack of structured systems, each one building a large number of units above from a smaller number below; i.e., a handful of phonetic features combine to form fewer than fifty phonemic segments, which combine to form thousands of morphemes, tens or hundreds of thousands of words, and an infinite number of sentences and texts expressing countless ideas and concepts. It will not be surprising to find that as one climbs this stack, from phonology upward, the amount of computing power needed to perform useful tasks and research increases in proportion to the increasing number of units and the complexity of their structuring. I will concern myself, mostly, with the possibilities available for the study of the lower levels. This is because the type of linguistic work being done in the study of the semantic and cognitive levels is still primarily research and the people involved are more likely to already know their needs
and options as far as computing goes. Also, since the cost of computing in these areas is somewhat higher, it is less likely that departments will be doing their own purchasing for these purposes.</Paragraph>
    </Section>
  </Section>
  <Section position="25" start_page="59" end_page="59" type="metho">
    <SectionTitle>
HARDWARE FOR THE PHONOLOGIST
</SectionTitle>
    <Paragraph position="0"> The student of phonology, morphology and linguistic field analysis is concerned primarily with the manipulation of linguistic text, expressed as a series of phonemic symbols or blocks of phonetic features. The task is to identify identical or similar substrings, correlate their appearance with a particular meaning and segment the text into these identified substrings. As new substrings are identified, the text is often rewritten with a new organization based on new understandings, so as to improve the chances of finding new substrings; field workers often use index cards for this purpose. Problem after problem is solved in this way, with a not insignificant amount of time being spent in the reorganizing and recopying stages. It is a tedious business because it is very mechanical. In fact, efficient computer algorithms for doing much of the job already exist and have been implemented on nearly all computers in the form of text editors. The task is relatively simple and even the smallest computer available can do an adequate job. A linguistics department interested in providing its students with training in the use of computers for this kind of work (and they will become standard tools for the purpose very soon) would do well to purchase as many (one or more) identical, small (hobbyist size) computers as it can afford. For educational purposes, the very smallest microcomputers, equipped with modest mass storage devices, such as tape cassettes or floppy discs, are just fine. Assignments in classes can be distributed on departmentally owned or student owned tapes or discs (less than $10 each). These can be automatically duplicated just as assignments are now mimeographed; they are reusable and usually contain enough room to store several assignments, including the partial results from day to day and final solutions. For larger, research sized projects, involving a lot of text, or more complicated analyses, such as automated analysis of
phonological tactics, the fastest microcomputers, with larger mass storage devices, might be more appropriate. (Implicit in the discussion of these types of machines is the fact that student use of them is via an interactive terminal. Microcomputers are not typically operated in 'batch mode', and no benefit could be derived from doing linguistic analysis in any but an interactive mode of operation.) While microcomputers and associated memories are relatively inexpensive, linguists have a genuine need for sophisticated input and output devices which are somewhat more expensive. Standard computer terminals generally provide all and only the characters available on a typewriter keyboard; some provide only upper case letters. What is needed is a terminal with the same capabilities as the Selectric style typewriter: one with changeable type fonts, including the standard phonetic symbol alphabet, with diacritics. CRT terminals (cathode ray tube terminals) can provide this type of operation more cheaply, more reliably, and more flexibly than printing terminals (there is no need to stop and change type fonts). CRT terminals which support user designed type fonts are available, and in fact may be the only ones on which the standard phonetic alphabet can be currently supplied. These terminals are somewhat expensive (several thousand dollars each), but since they are very flexible and often support some degree of computer graphics display, as well as having the potential to display texts written in any language, they are valuable educational tools. If all or most of the terminals in a department are CRT type terminals, it will be necessary to provide some means of producing 'hard copy' output on paper. While most interactions with a computer can take place on a screen, some record of the results of a session will be needed for study and evaluation. Printers which can handle the flexible type fonts needed by linguists are available. They are fast; they operate in the same way that
copying machines work and simply transfer the contents of the CRT screen to the paper (including graphic materials). They are expensive. However, a small department might well find that only one of these printers is necessary to meet their needs; the results of work done on any of the small microcomputers could be moved (either over communication lines or carried on a disc or tape) to the printer with little or no delay.</Paragraph>
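The substring-identification task described in this section (finding identical substrings that recur across transcribed forms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name recurring_substrings, its parameters min_len and min_count, and the idea of counting each substring once per form are choices made here, not a method given in the essay.

```python
from collections import Counter


def recurring_substrings(forms, min_len=2, min_count=2):
    """Count substrings that recur across a list of transcribed forms."""
    counts = Counter()
    for form in forms:
        seen = set()
        for start in range(len(form)):
            for end in range(start + min_len, len(form) + 1):
                seen.add(form[start:end])
        # Count each substring once per form, so repeats inside a single
        # form do not inflate its score.
        counts.update(seen)
    # Keep only the substrings found in at least min_count forms.
    return {s: n for s, n in counts.items() if n >= min_count}
```

Applied to a handful of forms such as "kitabu", "vitabu", "kitu" and "vitu" (used here purely as illustrative data), the recurring pieces that surface are candidates for segmentation; correlating them with meanings remains the analyst's job.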
  </Section>
  <Section position="26" start_page="59" end_page="59" type="metho">
    <SectionTitle>
HARDWARE FOR THE GRAMMARIAN
</SectionTitle>
    <Paragraph position="0"> Syntax is, perhaps, the most widely studied subject in linguistics today. Given that this is so, there is a real need for linguists, both professional and student, to understand the extreme difficulty of the task of writing a grammar for a language. That attempts are made to do this without the aid of a computer is perhaps all the evidence one needs to see that the difficulties are not well understood. A formal grammar, particularly one written in the notations commonly used today, is very much like a computer program. It is a list of instructions for generating a list of strings; a computer program is a list of instructions for performing some process (which might be generating a list of strings). Both need to be precise, both are very complex, both suffer from the fact that a change in one part of the ordered list may cause an unanticipated change in the effect of another part. It would be very surprising to find that linguists were better at producing untested, yet correct, formal characterizations of complex processes than computer programmers. I expect that testing a newly written grammar will be as enlightening an experience for a linguistics student as debugging a new complex program is for a computer science student. Furthermore, just as the computer is of use in studying phonology and morphology, it can also offer data organization services to aid in the study of syntax. Automated tactic analysis of syntax is still a research project; the software necessary for it is not likely to be produced by a software house. 
But the research is probably best performed in a linguistics department. Having established a need, we must now recall a warning made earlier. Useful contributions to the study of syntax by computers require more computing power than is needed for similar contributions to the study of phonology and morphology. While the need for sophisticated type fonts and input/output devices is lower (not necessarily: a good educational syntax program would permit the manipulation of syntactic trees on a graphics screen), there is a real need for faster processors and increased memory capacity. To purchase the necessary computing power, a department would have to step up from the hobbyist microcomputer size machines to the scientific research minicomputer (e.g., the middle range PDP-11 series). These machines cost an order of magnitude more than the microcomputer and yet, when the subject is syntax, will probably only serve a few students at a time. An alternative, available to some departments, is to use the university's central computing facility. Money could be spent on the best available terminals and the needed communications equipment. Grammar testers have been written by university researchers for typical university size computers (Friedman 1971, for transformational grammars; Kehlen 1976, for ATN grammars) and are available at little or no cost. As I mentioned in the beginning, the use of the computer in the study of semantics and cognition is still very much a research topic and little, if any, of the work being done currently can be performed on small computers. I will not describe the requirements of such work since they vary widely depending on the nature of the work. SOFTWARE FOR THE LINGUIST What is missing from the computing facilities described so far is software, programs which are of use in solving linguistics problems. The small computers are sold with a minimum of very traditional computer software, none of it of any use to the nonprogramming linguist. In fact, at no level 
of computing power is there currently available commercial software which is of use to nonprogramming linguists. For large computers, as mentioned above, some of the results of university research work are available for some purposes. However, for the types of machines that departments are likely to purchase, there is essentially nothing. This problem can be overcome in two ways. The standard method is for a department to hire a student programmer to design and write the needed software. This has several advantages: it is relatively cheap (especially when university assistantships are available for the purpose), and it is personal; the student can be instructed to write exactly the kind of program that is needed. The disadvantages of this method are in the quality and durability of the systems produced in this way. Student programmers are, in fact, students learning to program. Often their work is lacking in the 'ease-of-use' or 'human engineering' features found in well written, commercially produced programs, and it is just these features which are very important to users not familiar with or comfortable with computers. Furthermore, programs produced by student programmers are not well known for their reliability; maintenance of them is difficult and usually restricted to the period of time that the original programmer is still available. Again, to the user unfamiliar with computers, reliability is a very important feature. It is very discouraging to try to do anything with semi-operational programs. An alternative is to create sufficient demand for this type of educational software so that a commercial software house or a well funded university programming group would consider the investment of its time and money profitable. With linguists and linguistic educators providing input at the design level, very useful and reasonably priced software could be produced in this way. The catch, however, lies in generating sufficient demand. A final comment about one other potential use 
of computers within a linguistics department. The search for language universals (cross linguistic research) requires very large collections of information. A collection of partial and complete grammars along with sample texts for a large representative sample of human languages is a formidable amount of information. The kinds of questions posed by linguists using this information do not require immediate interactive response. In fact, they traditionally require weeks or months of library research for answers. It is therefore not unreasonable to consider the storage of this information on a small, even hobbyist size, computer equipped with large mass storage devices. The task is a difficult one, but of potential value to both linguists and computer scientists. Linguists need easier access to this information. A computerized database, structured according to the needs of linguists, would be a very valuable tool which could be distributed to any department willing to make the necessary investment in hardware. The database is large, but unlike many other large databases, it is one about whose structure a great deal is known. Computer scientists are still looking for ways to effectively and efficiently organize databases, and linguists, with their intimate knowledge of the structure of language, have an opportunity here to provide an example of how to use the structure of a body of information in storing it on a computer effectively. It is a task which requires the expert knowledge of several linguistic disciplines and it is a research project ideally suited to  Hamburg, New York 14075 An Idea and a Problem Contrary to a famous opinion, linearity came in with speech; printing just let us see what we had been saying all the time. But thought is nonlinear, and conversation flows as print cannot. With electronic publication, 
we will be able to move through a permanent record of collective knowledge with some of the flexibility that conversation has always allowed. But why a permanent so is art , Kuhnian  of ideas that gradually eliminates errors from science and yields pleasure in art. However, none of us have much experience in the new modes of communication. Since all need help, we must, in the famous phrase, explain to each other what none of us understand. A Method THE PRESS at Twin Willows is mostly a method. The method is to use printed paper, familiar to us all, and microfiches, familiar to many, in shifting combination with the unfamiliar electronic media. A computer will be installed in the office of THE PRESS, and used from the beginning for administration and text preparation. Editors of books and journals that come to THE PRESS can submit on floppy disk, on cassette, or by telephone, but they can also submit on paper. Publication can be by photoreproduction, rapid printing, high-quality offset, magnetic recording to drive a computer, or through the telephone net. Microfiches will be suggested for many publications. As editors and readers gradually become familiar with the</Paragraph>
    <Paragraph position="1"> new systems, teaching each other as they learn, we can expect the contents of publications to become more and more suitable to the new media, and less and less suitable to the old.</Paragraph>
    <Section position="1" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
Services
</SectionTitle>
      <Paragraph position="0"> THE PRESS at Twin Willows will offer services at every step from the author's conceptualization through advertising of the finished work.</Paragraph>
      <Paragraph position="1"> Editorial. For its clients, THE PRESS will help if necessary to find expert readers who can submit opinions and suggestions about the content of proposed articles and books. THE PRESS will provide counsel on readability. THE PRESS will mark up copy for typographic form, lay out pages, and otherwise give traditional redactory services.</Paragraph>
      <Paragraph position="2"> Administrative. For its clients, THE PRESS will maintain tickler files and issue reminders to contributors and readers when their submissions are due. It will prepare budgets and keep accounts. It will maintain mailing lists, membership lists, and consultation lists. It will conduct membership surveys and elections of officers.</Paragraph>
      <Paragraph position="3"> Bibliographic. As support can be obtained, THE PRESS will in collections and add its own classifications and subject labels to make bibliography available to clients. Thus the preparation of a bibliography for a work in progress can be assigned to THE PRESS, and a book buyer can follow up references or ask for selective dissemination. Educational. THE PRESS will shortly begin publication of a newsletter for clients and prospects: services and how to use them, the competition, new products in hardware and software, publications and courses for authors and editors, and personal notes from the field of electronic publication.</Paragraph>
      <Paragraph position="4"> Conferences, workshops, and courses will be organized as the field needs them and can support them.</Paragraph>
      <Paragraph position="5"> Handbooks, manuals, and other materials for editors will be written or collected as feasible, catalogued, and offered for sale or gift.</Paragraph>
    </Section>
    <Section position="2" start_page="59" end_page="59" type="sub_section">
      <SectionTitle>
Pricing Policy
</SectionTitle>
      <Paragraph position="0"> Methods and materials will be designed for each client initially; later, a catalogue of components of the publication process will be prepared so that the client can do the design work.</Paragraph>
      <Paragraph position="1"> Beyond the direct cost of labor performed and materials consumed at THE PRESS and of services purchased for the client, the equipment used will be billed at a rate intended to give rapid amortization, and a management fee of 15% added.</Paragraph>
      <Paragraph position="2"> This policy should bring the cost of information--books, journals, and electronic access--within the limits of anyone's purse.</Paragraph>
      <Paragraph position="3">  To help hobbyists, householders, businesses, and government keep up with the countless vendors who offer hardware and software in the microcomputer market, THE PRESS at Twin Willows will begin immediately to collect and publish evaluative, analytic reviews, according to David G. Hays, Publisher.</Paragraph>
      <Paragraph position="4"> &amp;quot;When the computing market was dominated by just a few big companies,&amp;quot; Hays says, &amp;quot;it was fairly easy to decide how to handle a computing problem. Once a buyer had settled on a computing budget, the market might offer only two or three main frames big enough and cheap enough to do the job. Now the buyer can design a machine to fit a purpose, and has to choose components out of lists that run up to dozens of alternatives. The worst part is, no one publishes the list!&amp;quot; THE PRESS intends to correct part of the problem by making useful information about the market available in easy language and inexpensive format. &amp;quot;Before long,&amp;quot; Hays expects, the hardware and software reviews will be accessible online for clients to dial in.</Paragraph>
      <Paragraph position="5">  Where will the reviews come from? THE PRESS invites any user of any microhardware or software to write it up; the editors at THE PRESS will rewrite if necessary, make sure that the evaluations are not illegally harsh, and eliminate the most obvious errors. No fees are offered to reviewers at present, but a change is contemplated. &amp;quot;Everyone who helps should be paid,&amp;quot; as Hays puts it. (From THE PRESS at Twin Willows, May 23, 1978.)</Paragraph>
      <Paragraph position="6"> Manufacturers and software houses can send their lists and item descriptions to be included with the evaluations. THE PRESS, which will also publish original material in whatever technical fields need its services, is &amp;quot;mostly a method,&amp;quot; Hays says. Its purpose is to teach information users how to cooperate with each other, making central publishing less relevant.</Paragraph>
      <Paragraph position="7"> Hays, who is setting up THE PRESS, is a professor of linguistics and of computer science in the State University of New York at Buffalo. He moved to Buffalo from The RAND Corporation in 1968 after 13 years of research on language and computing.</Paragraph>
      <Paragraph position="8"> Hays is honorary member of the International Committee on Computational Linguistics, editor (1974-78) of the American Journal of Computational Linguistics, and former chairman of NSF's Social Science Advisory Committee.</Paragraph>
      <Paragraph position="9"> THE PRESS offers no free literature, but is preparing to issue a Newsletter.</Paragraph>
      <Paragraph position="10"> A $1 deposit will bring the first few issues, including more about the hardware reviews. THE PRESS is located at Twin Willows, 5048 Lake Shore Road, Hamburg, New York 14075; the telephone number is 716-627-5571.</Paragraph>
      <Paragraph position="11">  My term as Editor expires, by my definition, at the end of the present calendar year. The Association will choose a new Editor; at the same time, I think that some changes in operations are appropriate.</Paragraph>
      <Paragraph position="12"> In the 1960s, I proposed the use of ultramicrofiches for library development; but I said, if I did not write, that photographic storage had a time limit; and the predicted time is now up.</Paragraph>
      <Paragraph position="13"> To supplement my University salary, I am organizing The Press at Twin Willows. The enclosure describes the earliest form of the venture; I hope for rapid evolution.</Paragraph>
      <Paragraph position="14"> It would be to my commercial advantage to act as publisher for AJCL. I believe that if ACL adopts the word-processing and lexicographic businesses as areas of applied computational linguistics, the Association can grow and serve a significant role in improvement of the common weal; and for The Press to help would be very pleasant and profitable.</Paragraph>
      <Paragraph position="15"> As Editor, I contributed the use of space and equipment that I paid for myself, and some small help that the University gave.</Paragraph>
      <Paragraph position="16">  The new Editor may have more to offer, making The Press redundant; in that case, I should like to open negotiations for secondary publications extracted from AJCL.</Paragraph>
      <Paragraph position="17"> The Press cannot offer quite so much; it will be necessary to bill the Association for machine time and personnel costs. But only out-of-pocket costs will appear on invoices if the Association decides to deal with The Press.</Paragraph>
      <Paragraph position="18"> As for member services, we can continue microfiches; offer hard copy; move up quickly or slowly to typographic quality; issue newsletters along with the quarterly journal; and give online access to computer files. Most of that can be done immediately, but some of it may have to wait a few months. It is up to the Association to say what it needs, if anything.</Paragraph>
    </Section>
  </Section>
</Paper>