<?xml version="1.0" standalone="yes"?> <Paper uid="J79-1008"> <Title>American Journal of Computational Linguistics STRING TRANSFORMATIONS IN THE REQUEST SYSTEM</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> STRING TRANSFORMATIONS IN THE REQUEST SYSTEM </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Yorktown Heights </SectionTitle> <Paragraph position="0"> Microfiche 8 copyright 1974 by the Association for Computational Linguistics</Paragraph> </Section> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> The REQUEST System is an experimental natural language query system based on a large transformational grammar of English. In the original implementation of the system the process of computing the underlying structures of input queries involved a sequence of three steps: (1) preprocessing (including dictionary lookup), (2) surface phrase structure parsing, and (3) transformational parsing. This scheme has since been modified to permit transformational operations not only on the full trees available after completion of surface parsing, but also on the strings of lexical trees which are the output of the preprocessing phase. Transformational rules of this latter type, which are invoked prior to surface parsing, are known as string transformations.</Paragraph> <Paragraph position="1"> Since they must be defined in the absence of such structural markers as the location of clause boundaries, string transformations are necessarily relatively local in scope.
Despite this inherent limitation, they have so far proved to be an extremely useful and surprisingly versatile addition to the REQUEST System.</Paragraph> <Paragraph position="2"> Applications to date have included homograph resolution, analysis of classifier constructions, idiom handling, and the suppression of large numbers of unwanted surface parses. While by no means a panacea for transformational parsing, the use of string transformations in REQUEST has permitted relatively rapid and painless extension of the English subset in a number of important areas without corresponding adverse impact on the size of the lexicon, the complexity of the surface grammar, and the number of surface parses produced.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> TABLE OF CONTENTS 1. Introduction 2. REQUEST System Organization </SectionTitle> <Paragraph position="0"/> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> References </SectionTitle> <Paragraph position="0"> String Transformations in the REQUEST System</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> The REQUEST (Restricted English Question-answering) System [1, 2] is an experimental natural language query system which is being developed at the IBM Thomas J. Watson Research Center. The system includes a large transformational grammar, a transformational parser, and a Knuth-style semantic interpreter.
The grammar and its associated lexicon are broadly oriented towards question-answering on periodic numerical data; they also include material specific to natural English interaction with collections of business statistics, as exemplified by the Fortune 500. The long-range objective of the work on REQUEST is to determine the extent to which machine-understandable subsets of English can be developed to provide non-programmers with a convenient and powerful tool for accessing information in formatted data bases without having to learn a formal query language. In the interest of facilitating effective &quot;understanding&quot; on the part of the system, the semantic scope of the English subset we are currently dealing with is largely restricted to the world of business statistics. Within that narrow domain of discourse, however, we are attempting to cover a relatively broad range of syntactic and lexical alternatives, in the hope of permitting future users to employ their normal patterns of written expression without major adjustment. The current REQUEST grammar covers a variety of basic English constructions in some depth, including wh- and yes-no questions, relative clauses and clausal negation. It is now being extended into such areas as comparison, conjunction and quantification which, while complex, appear to be of central importance in providing a semantically powerful subset of English.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. REQUEST System Organization </SectionTitle> <Paragraph position="0"> The REQUEST System consists of a set of programs written in LISP 1.5 together with an associated set of data files containing the lexicon, grammar, semantic interpretation rules and data base. The system runs interactively on a System/370 Model 158 under VM/370 in 768k bytes of virtual core.
As can be observed from Figure 1, the system contains two major components, one transformational, the other interpretive.</Paragraph> <Paragraph position="1"> The transformational component, which serves to analyze input strings and compute their underlying structures, consists of two main parts: a preprocessor and a parser. The interpretive component also has two major subcomponents: (i) a semantic interpreter, which translates each underlying structure into a logical form, i.e., a formal expression specifying the configuration of executable functions required to access the data base and compute the answer to the corresponding question,* and (ii) a retrieval component which contains the various data-accessing, testing, and output formatting functions needed to evaluate the logical form and complete the question-answering process.**</Paragraph> <Paragraph position="2"> Looking at the transformational component in somewhat greater detail, the role of the preprocessor is to partition the input string into words * Implementation of the semantic interpreter, which operates according to a scheme originally proposed by D. E. Knuth [3], is due to S. R. Petrick [1, 4, 5], who has also devised the specific semantic interpretation rules employed in REQUEST.</Paragraph> <Paragraph position="3"> ** F. J. Damerau is responsible for the design and implementation of the current retrieval component.</Paragraph> <Paragraph position="4"> producing a preprocessed string of lexical trees which serves as input to the parser. Multi-word strings that function as lexical units are identified by a &quot;longest match&quot; lookup in a special phrase lexicon; while the lexical trees corresponding to arabic numerals (which may variously represent cardinals, ordinals, or year names) are supplied algorithmically rather than by matching against the lexicon.
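The &quot;longest match&quot; phrase lookup just described can be sketched in miniature. The sketch below is an illustrative reconstruction in Python, not the REQUEST implementation (which was written in LISP 1.5), and the lexicon entries shown are invented for the example:

```python
# Hypothetical fragment of a phrase lexicon mapping multi-word strings
# to single lexical units; the entries are invented, not from REQUEST.
PHRASE_LEXICON = {
    ("fortune", "500"): "FORTUNE-500",
    ("the", "city", "of"): "CITY-CLASSIFIER",
}
MAX_PHRASE_LEN = max(len(key) for key in PHRASE_LEXICON)

def tokenize_with_phrases(words):
    """Scan left to right, preferring the longest phrase match at each position."""
    out, i = [], 0
    while i < len(words):
        # Try spans from the longest possible down to two words.
        for span in range(min(MAX_PHRASE_LEN, len(words) - i), 1, -1):
            key = tuple(w.lower() for w in words[i:i + span])
            if key in PHRASE_LEXICON:
                out.append(PHRASE_LEXICON[key])
                i += span
                break
        else:
            # No phrase matched: fall back to ordinary single-word lookup.
            out.append(words[i].lower())
            i += 1
    return out
```

Because candidate spans are tried longest-first at each position, a two-word entry can never pre-empt an overlapping three-word entry starting at the same position.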
In cases where there are gaps in the preprocessed string, due to the presence in the input of misspellings, unknown words, ambiguous pronoun references, and the like, the preprocessor prompts the user to supply the required information.</Paragraph> <Paragraph position="5"> Operation of the transformational parser* proceeds in three stages: (1) The preprocessed string is successively analyzed with respect to the structural description of each rule in a linearly ordered list of string transformations. Each successful match against a string transformation leads to modification of one or more of the trees in the preprocessed string through application of the operations specified in the structural change of the rule in question -- operations which are drawn from precisely the * The original design and implementation of the parser are due to Petrick [6].</Paragraph> <Paragraph position="6"> The version currently being used in REQUEST is the result of significant revisions and extensions by M. Pivovonsky, who (with the aid of E. O. Lippmann) has also been chiefly responsible for implementing the preprocessor.</Paragraph> <Paragraph position="7"> same inventory of elementary transformations that the system makes available for the processing of full trees by conventional cyclic and postcyclic transformations, namely: deletion, replacement of a tree by a list of trees, Chomsky adjunction, feature insertion, and feature deletion.</Paragraph> <Paragraph position="8"> (A more detailed account of the nature of string transformations and the motivation for their use in a transformational parser will be presented in the remaining sections of the paper.) (2) Upon completion of the string transformation phase, the resulting transformed preprocessed string -- still in the form of a list of trees -- is passed to a context-free parser in order to compute the surface structure(s) of the sentence.
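Stage (1) can be approximated as follows. This Python sketch is purely illustrative: REQUEST's rules are declarative pattern/change structures over lexical trees, not closures, but the control regime -- each rule in a linearly ordered list tried at every position of the string, with matched segments rewritten in place -- is the same:

```python
# Illustrative control loop for the string-transformation phase.
# A "rule" here is assumed to be a (matcher, rewriter) pair: the matcher
# returns the length of a match starting at position i (0 for no match),
# and the rewriter maps the matched segment to its replacement segment.
def apply_string_transformations(trees, rules):
    """Apply each rule, in order, at every position of the string of trees."""
    for matches, rewrite in rules:
        i = 0
        while i < len(trees):
            span = matches(trees, i)
            if span:
                # Splice the rewritten segment back into the string.
                trees = trees[:i] + rewrite(trees[i:i + span]) + trees[i + span:]
            i += 1
    return trees
```

A rule in the style of the year-classifier deletion discussed later would, for instance, match `["the", "year", "1968"]` and rewrite the first two tokens away.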
(Although one major effect of the employment of string transformations has been a substantial reduction in the number of unwanted surface parses, cases still occur with some frequency where more than one surface parse is produced.) (3) Finally, the transformational parser processes each surface structure in turn, attempting to map it step by step into a corresponding underlying structure according to the rules of a transformational grammar.</Paragraph> <Paragraph position="9"> In this process transformational inverses are applied in an order precisely opposite to that in which their &quot;forward&quot; counterparts would be invoked in sentence generation: inverses of the postcyclic transformations are applied first, starting with the &quot;latest&quot; and ending with the &quot;earliest&quot;; then the inverses of the cyclic transformations are applied (also in last-to-first order), working down the tree from the main clause to those that are most deeply embedded.</Paragraph> <Paragraph position="10"> To help ensure validity of its final output, the parser checks each intermediate output produced by successful application of an inverse transformation to determine whether or not its constituent structure conforms fully with the set of branching patterns that can be generated by the current grammar in underlying or intermediate structures. At the end of each inverse cycle, a similar check is performed to determine whether all structure above the next (lower) level of embedded S's is consistent with the inventory of allowable underlying structure patterns alone. Failure of either test results in immediate abandonment of the current analysis path. (As described in [2], other, more stringent tests involving the application of corresponding forward transformations can optionally be invoked in order to provide a more definitive validation of inverse transformational derivations.)</Paragraph> <Paragraph position="11"> 3.
Motivation for the Introduction of String Transformations Within the series of major processing steps described in the preceding section, the application of string transformations occurs at a point midway between preprocessing (including lexical lookup) and surface phrase structure parsing. Taken in sequence, these three steps have the cumulative effect of shifting the locus of analysis operations from the domain of word strings to that of full sentence trees, where conventional transformations (and their inverses) can meaningfully be invoked. Unlike the balance of the transformational parsing process, these three preliminary steps do not seem to bear a direct correspondence to familiar generative operations. Nevertheless, their combined effect is to produce the tree or trees which exist at that stage of the &quot;forward&quot; generation where the last postcyclic transformation has applied. Accordingly, it seems reasonable to view them initially as constituting a kind of &quot;bootstrap&quot; whose function is to set the stage for &quot;true&quot; transformational parsing.</Paragraph> <Paragraph position="12"> Prior to the introduction of string transformations in the REQUEST System, the entire burden of the &quot;bootstrap&quot; role just outlined necessarily fell on the preprocessor and the surface parser. Moreover, as will be explained below, certain basic principles concerning the nature of the system's transformational component -- relating to the range of inputs to be accepted and the criteria for satisfactory outputs -- had the effect of ensuring that the burden would be a large one. The full dimensions of the situation began to emerge once extensive testing of the first sizeable transformational grammar was underway. There followed a series of corrective actions, the last and most far-reaching of which was the introduction of string transformations.</Paragraph> <Paragraph position="13"> 3.
1 In the early design phases of what subsequently became the REQUEST System's transformational grammar, it was decided to adopt a level of underlying structure considerably more abstract than the deep structure of Chomsky's Aspects [7] -- a level which, somewhat in the spirit of generative semantics [8, 9], would go a long way towards direct representation of the meanings of sentences. Eschewing irrelevant details, the essentials of the representation adopted (which bears certain strong resemblances to the predicate calculus) are as follows: Each underlying structure tree represents a proposition (category S1) consisting of an underlying predicate (V) and its associated arguments (NP's) in that order. Argument slots are filled either by embedded propositions (complement S1's) or by nominal expressions (NOM's). A nominal expression directly dominates either a NOUN, or a NOUN and an S1 (the relative clause construction). Each NOUN dominates an INDEX node which is specified as a constant (+ CONST) in the case of proper nouns and as a variable (- CONST) otherwise. The INDEXes and the terminal nodes they dominate play an</Paragraph> <Paragraph position="14"> important role in the grammar, including the representation of coreference.* One major impact which this view of underlying structure had on what the &quot;bootstrap&quot; had to accomplish involved the connection of prepositional phrases to the balance of the surface structure tree. In underlying structure, the noun phrase corresponding to each surface prepositional phrase would appear as a specific argument in a specific proposition. Following the application of the generative transformations whose inverses the parser would employ, the resulting prepositional phrase would in most cases still be explicitly linked to the clause or clausal remnant derived from that underlying proposition.
Thus, in order to make possible a correct inverse transformational derivation, the surface parser would have to make all such linkages explicit. This requirement represented a significant departure from earlier practice in a number of phrase structure parsing systems, notably those employing predictive analysis [10, 11], where the problem of connecting prepositional phrases to the correct level of structure was simply ducked by making an arbitrary linkage to the nearest available * Much of the early work on the grammar, in particular the system of variables and constants, reflects suggestions by Paul Postal.</Paragraph> <Paragraph position="15"> candidate, thereby avoiding what would inevitably have been a large increase in the number of unwanted analyses. (A similar approach has recently been followed in the ATN parser of Woods' Lunar Sciences Natural Language Information System [12], but there the semantic interpreter is made to pick up the slack.) A second design principle which had a major impact on the mechanisms for computing surface structures from input strings was the already-mentioned goal of providing broad coverage of syntactic alternatives to promote ease of use. (As should be fairly obvious, expansion of grammatical coverage -- even in a restricted domain of discourse -- in general entails not only an increase in the size and complexity of lexicon and surface grammar but also an increase in the potential for lexical and syntactic ambiguity.) Two classes of syntactic alternatives whose coverage at the surface syntax level led to specific problems ultimately resolved by the use of string transformations were stranded prepositions and classifier constructions. In both cases the problems stemmed from the introduction of new possibilities for incorrectly connecting a preposition or prepositional phrase to the balance of the surface structure.
Stranded prepositions occur with some frequency in wh-questions and relative clauses in English, often yielding results whose naturalness compares favorably with that of the corresponding non-stranded versions, as in (1)-(3) below. Because of these circumstances, we felt obliged to provide for such constructions (1) a. What companies did XYZ sell oil to? b. To what companies did XYZ sell oil? (2) a. What was the city which ABC's headquarters was located in in 1969? b.</Paragraph> <Paragraph position="16"> What was the city in which ABC's headquarters was located in 1969? (3) a. What company was Mr. Jones the president of in 1972? b. ? Of what company was Mr. Jones the president in 1972? even in early versions of our grammar. The case for including classifier constructions, in which proper nouns are optionally accompanied by a common noun designating their semantic class (cf. the (a) versions of (4)-(7)), did not seem quite as compelling as that for stranded prepositions, since (4) a. the City of Sheboygan b. Sheboygan (5) a. the Commonwealth of Massachusetts b. Massachusetts (6) a. (the) Tentacle Corporation b. Tentacle (7) a. the year (of) 1965 b. 1965 the versions with classifiers have a formal, slightly pedantic quality that is absent from their classifier-less counterparts. Nevertheless, there appeared to be no reasonable grounds (such as obscurity, doubtful grammaticality, and the like) for excluding them from the subset. A third factor affecting the performance of the &quot;bootstrap&quot; was the conscious decision to try to get along initially with a surface parser which would be maximally simple with respect to both its computational mechanism and its surface phrase structure grammar. In particular, this meant employment of a context-free parser without either the complications or the benefits of sensitivity to syntactic and semantic features [11, 13].
The hope was that any additional surface parses which resulted from this approach would be effectively filtered out during transformational parsing by the various well-formedness checks on inverse derivations discussed at the end of Section 2.</Paragraph> <Paragraph position="17"> 3.2 Early Experience with the Parser Starting in late 1971, tests began on an inverse transformational grammar whose generative counterpart had been developed with the aid of Joyce Friedman's transformational grammar tester [14]. In the interest of debugging the system with as few unnecessary complications as possible, the initial examples were &quot;spoon fed&quot; to the parser using a minimal lexicon and surface grammar.</Paragraph> <Paragraph position="18"> While revealing no critical problems with the bootstrap, these first trial runs indicated that incorrect surface structures were indeed produced along with the correct ones and tended to give rise to analysis paths which continued for some time before being aborted by well-formedness tests. Sentences with ambiguous verb forms were a case in point. Thus, in the question &quot;What companies are computers made by?&quot; the surface parser produced two almost identical structures -- the first with &quot;made&quot; taken as a finite verb in the past tense, the second with it taken (correctly) as a past participle. The first analysis initiated a lengthy inverse derivation that was terminated as ill-formed only after the entire postcycle and the first inverse cycle had been traversed, meaning that nearly as much time was spent in pursuing this incorrect path as was required to follow the correct one. In this and a number of similar cases, however, it was observed that ill-formedness of the surface structure could have readily been detected at or near the outset of the transformational parsing process by performing tests employing the pattern-matching power of transformational rules.
This observation led to the introduction of so-called blocking rules in the transformational grammar, rules which proved to be quite effective in detecting and filtering out ill-formed structures such as the incompatible auxiliary/finite verb combination in the example just considered. In the spring of 1972, the surface grammar was greatly expanded in an attempt to cover the full range of structures that could be produced by the set of transformational rules then in use. At that point, the combined effect of the various design decisions affecting surface structures and surface parsing became immediately and painfully evident in the form of a combinatorial explosion: The brief and apparently innocuous question (8) &quot;Is the headquarters of XYZ in Buffalo?&quot; produced no less than 19 surface parses, a figure which soared to 147 when a third prepositional phrase was introduced by replacing &quot;Buffalo&quot; with the classifier construction &quot;the city of Buffalo&quot;. Although a blocking rule for detecting spurious stranded prepositions rather quickly killed off 16 of the 19 analyses in the former case, thereby reducing the analysis problem to tractable size, the system was unable to cope with the latter situation at all, due to problems of storage overflow.</Paragraph> <Paragraph position="19"> Thoughts of what would inevitably happen if we added yet another prepositional phrase (as in &quot;Was the headquarters of XYZ in the city of Buffalo in 1971?&quot;) made it clear that killing off unwanted surface parses after the fact by means of blocking rules was not enough; measures would have to be adopted which would prevent formation of most such analyses in the first place.
Two corrective steps were taken almost immediately: (a) coverage of classifier constructions was temporarily dropped, and (b) it was decided to explore what could be done towards elimination of spurious surface parses through selective refinement of category distinctions in the surface grammar.</Paragraph> <Paragraph position="20"> In the latter area, it was discovered (not surprisingly) that differences in the surface structure distribution of prepositional phrases, genitive noun phrases, and other types of noun phrases could be effectively exploited to suppress incorrect parses, as could distributional contrasts between proper nouns and common nouns, finite verbs and participles, etc. (In the case of (8) above, 13 of the original 19 parses were ruled out on the ground that proper nouns cannot take modifiers, while 3 more analyses (plus 4 of the 13 already eliminated) were excluded on the basis of distributional distinctions between prepositional phrases and other noun phrases.) Implementation of the refinements in the surface grammar required numerous part-of-speech code changes in the lexicon and a substantial increase in the number of rules in the surface grammar.</Paragraph> <Paragraph position="21"> Beyond this, the central problem was that the transformational grammar defines a specific class of surface structures -- employing only elements from a fixed set of intermediate symbols -- as the parses which must be found.</Paragraph> <Paragraph position="22"> In order to meet this requirement, the by now considerably expanded set of intermediate symbols employed in the surface grammar had to be mapped onto the smaller set compatible with the transformations. Thus, for example, PP (prepositional phrase) and NPG (genitive noun phrase) nodes in each surface structure would be replaced by NP nodes before transformational parsing began -- fortunately an extremely simple and rapid operation.
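The node-name mapping just described amounts to a bottom-up rename over each surface tree. A minimal Python sketch, assuming a (label, children) tuple encoding and a guessed two-entry fragment of the actual table (the real system collapses 32 temporary names onto 9):

```python
# Hypothetical fragment of the temporary-name table; the full mapping
# used in REQUEST is not reproduced here.
NODE_MAP = {"PP": "NP", "NPG": "NP"}

def relabel(tree):
    """Tree encoded as (label, children); rewrite temporary labels recursively."""
    label, children = tree
    return (NODE_MAP.get(label, label), [relabel(c) for c in children])
```

Because the rename is a pure table lookup applied once per node, the cost is linear in tree size, consistent with the paper's description of the step as extremely simple and rapid.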
(In the most recent version of REQUEST, the surface grammar employs a total of 32 temporary node names for this purpose; they are subsequently mapped onto a set of only 9 nodes for purposes of transformational parsing.)</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Problems of Growth of Coverage </SectionTitle> <Paragraph position="0"> The various measures just described had the effect of stabilizing the incidence of artificial surface structure ambiguities at a tolerably low level for a period of about a year, during which the transformational grammar roughly doubled in size from about 35 rules to over 70 as coverage was extended to include such structures as numerical quantifiers, time compounds, and various expressions involving rank and ordinality. The principal costs of ambiguity suppression were felt not in the analysis programs, which required only negligible modification for that purpose, but rather in the surface grammar, which grew much larger and more complex to the point where it became rather difficult to work with. Since a number of additional extensions of grammatical coverage were under active consideration -- among them the restoration of classifier constructions to the subset -- it seemed desirable to seek out some new approach to ambiguity suppression which
would not further overburden the surface grammar.</Paragraph> <Paragraph position="1"> The alternatives originally considered were uniformly unattractive.</Paragraph> <Paragraph position="2"> In the case of the classifier constructions, one could have achieved the immediate objective by simply loading up the phrase lexicon with an entry for each legitimate pairing of a classifier with a proper noun, thereby achieving a minor gain in grammatical coverage at the price of more than doubling the size of the lexicon.</Paragraph> <Paragraph position="3"> Another approach would have involved creating phrasal entries only for the classifiers themselves -- e.g., &quot;the city of&quot;, &quot;the state of&quot;, etc. -- leaving it to special ad hoc routines at the end of the preprocessor first to check the preprocessed string for the presence of immediately following proper nouns of the corresponding semantic class and then to effect the appropriate amalgamations or deletions. The second alternative was quickly rejected as even more distasteful than the first, since despite its relatively small initial cost, it would, if used at all extensively, have meant abandonment of an otherwise orderly and perspicuous analysis procedure. This train of thought, however, eventually led to the idea of modifying the preprocessed string not by ad hoc subroutines requiring accretions to the program, but by means of locally defined transformational rules employing the same computational apparatus and notational conventions as the existing forward and inverse transformations.
Within a week of its conception, the idea of a string transformation facility became a reality through some minor modifications to the flow of control of the parser.</Paragraph> <Paragraph position="4"> * Much of the ease of this transition stemmed from the generality of the original proper analysis mechanism, which was designed to accept a list of trees, rather than a single tree, as its input.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="11" type="metho"> <SectionTitle> 4. The Use of String Transformations in the REQUEST System </SectionTitle> <Paragraph position="0"> Because they apply to strings of unconnected lexical trees,* rather than to full surface trees with their representation of the structure of phrases and clauses, string transformations tend to be relatively local in scope, typically being restricted to constructions with contiguous elements. Despite this inherent limitation, such rules rapidly found a wide variety of uses within the REQUEST System: Classifier constructions were readily identified and transformed into classifierless counterparts by a handful of string transformations. Rules were also written for suppressing incorrect stranded prepositions, resolving homography, and translating certain idioms into a form more manageable for the surface parser. Finally, experiments were undertaken to explore the possibility of employing string transformations to deal with a limited but potentially useful range of conjunction constructions.</Paragraph> <Paragraph position="1"> A common thread running through several of these apparently diverse applications of string transformations is the application of what would otherwise have been treated as the inverse of a late postcyclic transformation at a point preceding surface structure parsing in order to achieve a * At least initially. Some string transformations currently in use produce what are in effect partial surface structures as output.
In fact, it is quite possible that an appropriately chosen cyclically ordered set of string transformations could supplant the surface grammar entirely; however, such a development appears unattractive at this time due to efficiency considerations.</Paragraph> <Paragraph position="2"> simplification of the surface grammar, a reduction in the number of spurious surface parses, or both. (The benefits of such a reordering stem in large part from the fact that derived constituent structure patterns provided for at the string transformation level need not be dealt with in the surface grammar, thereby reducing its size, its scope, and its potential for producing incorrect surface parses.) In the case of classifier constructions (Section 4.1) and of certain idioms involving notions of rank (Section 4.4), existing postcyclic transformations were actually replaced by string transformations; while in the case of stranded preposition prevention (Section 4.2), a string transformation was made to assume much of the load of an existing postcyclic blocking rule, resulting in a highly beneficial elimination of unwanted surface parses in both instances. In other situations, such as those involving homograph resolution (Section 4.3) and the treatment of the first group of idiom-processing rules discussed in Section 4.4, a correspondence of string transformations to locally-defined postcyclic transformations, while potentially possible, did not actually exist, since no attempt had been made to cover the constructions in question prior to the introduction of string transformations.</Paragraph> <Paragraph position="3"> 4.1 Classifier Constructions The string transformations relating to classifier constructions are exemplified by the rule &quot;City, State, Year Classifier&quot;, whose statement is displayed in Figure 2 using a hybrid tree/list notation in order to enhance legibility.
Like all transformations in the REQUEST System, this rule consists of a list with five main sections: header, structural pattern, condition, structural change, and feature change. The header, which serves to identify the rule and a number of basic attributes governing its application, is in the form of a list comprising the name, type (FORW, INVDIR, INVINDIR, STRING, or BLOCK), optionality (OB or OP), and mode (ALL, ANY, ONE, NA, or REANALYZE) of the transformation.</Paragraph> <Paragraph position="4"> Thus the rule CSYCLSFR is labeled as a string transformation whose execution is obligatory for all matches that may occur in the list of trees being processed.</Paragraph> <Paragraph position="5"> The structural pattern (possibly qualified by further constraints expressed in the condition section) defines the domain of applicability of the transformation in the form of a list of pattern elements, each specifying a tree or class of trees. For a match to occur, it must be possible to partition the input tree (or list of trees) into a list of non-overlapping, adjacent trees each of which matches the corresponding pattern element.</Paragraph> <Paragraph position="6"> Thus, the structural pattern in Figure 2 indicates that the rule CSYCLSFR requires that the preprocessed string be partitionable into the following six-segment sequence: (1) an arbitrary initial segment (possibly null), designated (X . 1), (2) an occurrence of the definite article THE, (3) a common NOUN (already represented in our surface structure as dominating an underlying predicate (V) and an INDEX) which happens to be one of the three classifiers CITY, STATE, or YEAR, (4) an occurrence of the preposition OF, (5) an INDEX bearing one of the feature pairs (+ CITY), (+ STATE), or (+ YEAR) (the absence of a preceding V node here is sufficient to guarantee that any matching item will necessarily be an INDEX (+ CONST) -- i.e.
, a proper noun); and (6) an arbitrary (possibly null) final segment, designated by (X . 7). The condition adds the further stipulation that the value of the variable ORX be compatible with node 3 in the pattern -- i.e., the proper noun must belong to the semantic class designated by the classifier.</Paragraph> <Paragraph position="7"> The structural change of a transformational rule may be stated in one of two ways. (1) If the change is relatively simple (as here), it may conveniently be stated in the form of two lists of numerals referring to the correspondingly labelled elements of the structural pattern. The first list identifies the elements under consideration; the second list (which must contain the same number of elements as the first) specifies what (if anything) happens to each of them -- replacement, deletion, sister adjunction to another element, etc. In the case of CSYCLSFR, the change specified is the* * In addition to providing the variables ALPHA, BETA, and GAMMA, which range over the set of feature values {+, -}, the notational system of the REQUEST transformational component includes the variables ORX, ORY, and ORZ, which range over sets of (feature value, feature name) pairs.</Paragraph> <Paragraph position="8"> deletion of the trees whose top nodes are labelled 2, 3, 4, and 5 (including, by convention, any higher nodes which dominate only deleted nodes).</Paragraph> <Paragraph position="9"> Thus the effect of the rule is to eliminate all classifiers of the designated type from the preprocessed string. Alternatively, the structural change may be expressed as a list of elementary operations, drawn from the set {REPLACE, DELETE, LCHOMADJ, RCHOMADJ}, and their arguments. This notation is typically employed when fixed trees are inserted (although the first option may still be taken in such cases) and is obligatory whenever a choice is made among alternative structural changes by evaluating one or more conditional expressions.
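The effect of the two-list form of the structural change can be pictured with a small sketch. The following Python fragment is illustrative only (REQUEST itself is a LISP system, and these data structures are invented for exposition): a match binds each numbered pattern element to a run of lexical trees, and the change ([2, 3, 4, 5] -&gt; delete, delete, delete, delete) simply drops those runs, leaving the proper noun behind.

```python
# Illustrative sketch (not the REQUEST implementation) of applying a
# "two lists of numerals" structural change such as CSYCLSFR's.
# A match is an ordered list of (label, trees) pairs, one pair per
# pattern element of the rule.

def apply_change(match, affected, result):
    """Delete or replace the runs matched by the labels in `affected`.
    `result[i]` is None for deletion, or a list of new trees."""
    action = dict(zip(affected, result))
    out = []
    for label, trees in match:
        if label not in action:
            out.extend(trees)              # element left untouched
        elif action[label] is not None:
            out.extend(action[label])      # element replaced
        # action[label] is None: element deleted, nothing emitted
    return out

# "which companies are in the city of Chicago?" -- labels follow the
# CSYCLSFR pattern: THE (2), classifier V and INDEX (3, 4), OF (5).
match = [
    (1, ["which", "companies", "are", "in"]),
    (2, ["THE"]),
    (3, ["V:CITY"]),
    (4, ["INDEX:classifier"]),
    (5, ["OF"]),
    (6, ["INDEX:CHICAGO"]),
    (7, ["?"]),
]
stripped = apply_change(match, [2, 3, 4, 5], [None, None, None, None])
```

Running the sketch leaves only the arbitrary end segments and the proper noun, mirroring the rule's stated effect of eliminating the classifier from the preprocessed string.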
Had this second option been taken in the case of the present rule, its structural change would have read: ((DELETE 2)</Paragraph> <Paragraph position="11"> The feature change section of each transformation is always expressed as a list of elementary operations which are members of the set {INSERT, DELETE}, together with their associated arguments. Where no feature change is associated with a rule, as is the case for CSYCLSFR, this final section of the rule statement is specified as NIL, the empty list.</Paragraph> <Paragraph position="12"> (The structural change and condition sections of transformations can similarly be defined as NIL, denoting that the tree structure remains unchanged and that there are no extra conditions on applicability, respectively.) Two other classifier-deleting string transformations which are very similar to &quot;City, State, Year Classifier&quot; are the rules &quot;Year Classifier&quot; (YRCLASFR) and &quot;Company Classifier&quot; (COCLASFR). The former deletes the lexical trees corresponding to the underlined material in examples like &quot;. . . the year 1968 . . .&quot;, while the latter does the same thing in examples such as &quot;. . . (the) American Can Company . . .&quot;. Although the underlying predicate COMPANY is the only one specified in the structural pattern of COCLASFR, the rule actually applies to instances where a form of either of the words &quot;company&quot; or &quot;corporation&quot; has been used in the input string, owing to the fact that the lexicon assigns the same underlying predicate to both in recognition of their synonymity. &quot;City State Block&quot; (CSBLOCK) and &quot;City State&quot; (CITYSTAT) are two rules, related to the preceding ones, which illustrate additional aspects of the system. Both of these rules follow CSYCLSFR in the list of string transformations.
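The five-section anatomy just described can also be mirrored as plain data. The following is a hypothetical Python rendering of CSYCLSFR's shape -- the real statement is the LISP list shown in Figure 2, and the section contents here are abbreviated glosses, not the actual notation:

```python
# Hypothetical data rendering of a REQUEST rule statement, mirroring
# the five sections named in the text: header, structural pattern,
# condition, structural change, and feature change.

CSYCLSFR = {
    # header: name, type, optionality, mode
    "header": ("CSYCLSFR", "STRING", "OB", "ALL"),
    # abbreviated pattern-element labels
    "pattern": ["(X . 1)", "(THE . 2)", "NOUN(3,4)", "(OF . 5)",
                "(INDEX . 6)", "(X . 7)"],
    "condition": "ORX compatible with node 3",   # informal gloss
    "change": ([2, 3, 4, 5], [None, None, None, None]),
    "feature_change": None,                      # NIL: no feature change
}

def rule_kind(rule):
    """Return the type field of the header (STRING, BLOCK, FORW, ...)."""
    return rule["header"][1]

def is_obligatory(rule):
    return rule["header"][2] == "OB"
```

Under this encoding a blocking rule such as CSBLOCK would differ only in its header type ("BLOCK") and in having no structural or feature change to perform.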
As indicated by its header information (Figure 3(a)), CSBLOCK is a blocking rule (BLOCK), which entails that it is obligatory (OB) and will result in termination of the current analysis path if the structural pattern matches the preprocessed string at least once. The structural pattern is identical to that for CSYCLSFR save for the omission of the alternatives relating to the predicate YEAR and the feature (+ YEAR). Due to the parallelism of the structural patterns and the relative ordering of the two rules, it is necessarily the case that CSBLOCK Header: (CSBLOCK BLOCK OB ONE)</Paragraph> <Section position="1" start_page="11" end_page="11" type="sub_section"> <SectionTitle> Structural Pattern: </SectionTitle> <Paragraph position="0"> ((X . 1) (THE . 2) NOUN (OF . 5) ((INDEX . 6)) (X . 7))</Paragraph> <Paragraph position="2"> will apply if and only if the classifier and the following proper noun do not correspond (any corresponding classifiers having already been deleted by CSYCLSFR). Thus CSBLOCK has the effect of aborting analyses where a proper name known to the system as designating a state has been classified as denoting a city, or vice-versa. The rule CITYSTAT does not refer to classifiers as such, but it does deal with a proper noun construction even more important for our particular subset: the precise identification of a specific city by appending the appropriate state name to the city name. This construction is essential in distinguishing among such cities as Portland, Maine and Portland, Oregon, not to mention the eighteen varieties of Springfield in the continental United States. The structural pattern of the rule (Figure 3(b)) specifies a domain consisting of a city name ((INDEX . 6) (+ CONST + CITY)) followed by an optional comma, followed by a state name (INDEX (+ CONST + STATE)).** . . . proper noun, if the user were not previously asked by the system to resolve it, as is our current practice. ** Cf. reference 15.</Paragraph> <Paragraph position="3"> *** The structural variable W is employed in structural patterns in place of the more usual X whenever one wishes to specify the occurrence of precisely one unknown tree.</Paragraph> <Paragraph position="4"> The pattern represents the state name as a single tree (W . 4).</Paragraph> <Paragraph position="5"> As indicated by the structural change, each match results in the replacement of the tree labelled 2 by a list of trees consisting of itself and the tree labelled 4, thereby pairing the state name with the city name by what amounts to right sister adjunction. The optional comma (COMMA . 3) and the state name (W . 4) -- plus, by the convention cited earlier, the structure dominating it -- are deleted. Finally, the feature (+ CITYSTATE) is added to the feature list of the node (INDEX . 6), where its presence will eventually be noted by the semantic interpreter as requiring a match on both elements of a (cityname, statename) pair in the data base. As far as the transformational component is concerned, the net effect of the rule is to make &quot;city, state&quot; constructions pass through both the surface parser and the inverse transformations as though they were simple city names. 4.2 Stranded Prepositions &quot;Stranded Preposition Prevention&quot; (Figure 4) is a string transformation designed to prevent surface structure parses in which non-stranded prepositions are erroneously analyzed as stranded ones.</Paragraph> <Paragraph position="6"> Since most prepositions, whether stranded or not,</Paragraph> <Paragraph position="7"> are obligatorily present in surface structures, this rule necessarily reflects an approach very different from the &quot;recognize and delete&quot; strategy employed in the string transformations involving classifiers. What is done here is to assign new word class codes to those prepositions determined to be non-strandable, and to write the surface structure rules for the new codes in such a way that they are only allowed to combine with a following noun phrase.</Paragraph> <Paragraph position="8"> Expressed in ordinary English, the statement of the rule reads about as follows: &quot;Replace the word class code of each preposition by the corresponding code for non-strandable prepositions except where the preposition immediately precedes an auxiliary, a punctuation mark, a verb form, or another preposition; assign any locative feature associated with the original word class code to the new word class code&quot;. As stated -- and as currently implemented -- the rule may well be at once both too weak and too strong, at least in an absolute sense. It is probably too weak in that it will fail to label as non-strandable any preposition which immediately precedes a noun phrase beginning with an adjective (VADJ), as, for example, in the sequence &quot;to large companies&quot;.
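The paraphrased rule amounts to a single left-to-right recoding pass, which can be sketched as follows. The word-class names (PREP, NPREP, etc.) are illustrative stand-ins for REQUEST's actual codes; VADJ appears in the exception set because adjectives are verb forms in this grammar, which is exactly why &quot;to large companies&quot; escapes recoding:

```python
# Sketch of the stranded-preposition recoding pass.  Class names are
# illustrative; "NPREP" stands for the non-strandable preposition code.
# A preposition is recoded unless its right neighbor is an auxiliary,
# punctuation, a verb form (including VADJ), or another preposition.

EXCEPTED_RIGHT_CONTEXT = {"AUX", "PUNCT", "VERB", "VADJ", "PREP"}

def recode_prepositions(tagged):
    """tagged: list of (word, word_class) pairs for one query."""
    out = []
    for i, (word, cls) in enumerate(tagged):
        # treat end-of-string like punctuation, so a sentence-final
        # preposition stays strandable
        nxt = tagged[i + 1][1] if i + 1 < len(tagged) else "PUNCT"
        if cls == "PREP" and nxt not in EXCEPTED_RIGHT_CONTEXT:
            cls = "NPREP"          # cannot be stranded
        out.append((word, cls))
    return out
```

Under these assumptions, &quot;from&quot; in &quot;from a subsidiary&quot; is recoded as non-strandable (its neighbor begins a noun phrase), while &quot;to&quot; in &quot;to large companies&quot; keeps its original code.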
This sort of deficiency is of little consequence, however, since the rule will serve its purpose well even if it fails to catch an occasional non-strandable preposition, leaving things as ambiguous as before in those cases.</Paragraph> <Paragraph position="9"> Excessive strength, in the sense of marking some stranded preposition as non-strandable, is potentially a much more serious flaw, since it precludes obtaining a correct analysis in such instances. Examples such as (9), where SPRPPREV would fail in just this way by applying incorrectly, are not particularly difficult to think up. However, the (9) Was the company XYZ bought ballbearings from a subsidiary of Universal Nut & Bolt? great majority of such examples -- including (9) -- seem to be irrelevant to the present REQUEST data base. Thus, while it is clear that our initial rule for stranded preposition prevention does not provide anything approaching a general solution to the problem, it does appear to be working satisfactorily for the moment in eliminating artificial surface ambiguities within a narrow domain of discourse.</Paragraph> <Paragraph position="10"> 4.3 Homograph Resolution One of the simplest and yet most useful of the 33 string transformations in the current version of REQUEST is the rule &quot;Ordinal Formation&quot; (ORDFORM). Its function is to match on each string consisting of an arabic numeral immediately followed by any member of the set of English ordinal-forming suffixes {d, nd, rd, st, th} and mark the sequence as an ordinal numeral. The operation of ORDFORM (Figure 5) is entirely straightforward. By this point in the analysis process, all arabic numerals have already been assigned lexical trees dominated by the node (VADJ (+ CARD)) -- the combination denoting a cardinal numeral -- during the input scanning phase of the preprocessor, while the ordinal-forming suffixes have been assigned trees dominated by the category ORD during the lexical lookup phase.
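The effect of ORDFORM on the preprocessed string can be sketched as follows, with (category, features, text) triples standing in for REQUEST's lexical trees (an illustrative encoding, not the system's own):

```python
# Sketch of ORDFORM: a (VADJ (+ CARD)) tree immediately followed by an
# ORD suffix tree becomes a single (VADJ (+ ORD)) tree; the suffix
# tree is deleted.  Triples stand in for lexical trees.

def ordform(trees):
    out, i = [], 0
    while i < len(trees):
        cat, feats, text = trees[i]
        nxt = trees[i + 1] if i + 1 < len(trees) else None
        if cat == "VADJ" and "+CARD" in feats and nxt and nxt[0] == "ORD":
            out.append(("VADJ", {"+ORD"}, text))   # reflag, drop suffix
            i += 2
        else:
            out.append(trees[i])
            i += 1
    return out

# "the 20 th largest company" -> "20" is marked as an ordinal
query = [("THE", set(), "the"), ("VADJ", {"+CARD"}, "20"),
         ("ORD", set(), "th"), ("VADJ", set(), "largest"),
         ("NOUN", set(), "company")]
```

Because the suffixes are ordinary lexicon entries, any numeral-plus-suffix pair is covered without enlarging the surface grammar -- the point made in the next paragraph.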
ORDFORM simply finds each instance in the preprocessed string where a (VADJ (+ CARD)) immediately precedes an ORD, deletes the ORD tree, and changes the feature on the VADJ from (+ CARD) to (+ ORD), thereby identifying that item as an ordinal numeral rather than a cardinal.</Paragraph> <Paragraph position="11"> The approach just described has the advantage of putting an unlimited set of ordinals at the disposal of the user at negligible cost, involving a few very minor additions to the lexicon and none at all to either the surface grammar or the preprocessor. The alternative of using a postcyclic transformation instead of a string transformation to achieve the same coverage was avoided because it would have imposed the additional requirement that the surface grammar be significantly enlarged through the inclusion of at least three new category symbols (for cardinals, ordinals, and ordinal suffixes) along with a set of context-free rules describing their distribution. Although identification of ordinal numerals of this type could also have been effected by building the appropriate tests directly into the preprocessor, the latter alternative would have been much less attractive than the string transformation approach for at least two reasons: First, it is inherently messier to bury such operations in a special program subroutine than to deal with them as just another transformational rule. Second, and more important, is the fact that the latter approach makes the system less general and flexible, since material specific to English is directly reflected in the
structure of the program itself, rather than being confined to the grammar, where it is readily accessible to the linguist who may wish to modify it or replace it by material describing some other natural language.</Paragraph> <Paragraph position="12"> Another string transformation currently employed to resolve word class homography on the basis of local context is the rule &quot;Cardinal Noun&quot; (CARDNOUN), which will be discussed only briefly here. The rule distinguishes instances where a cardinal numeral functions as a proper noun (10) from those in which it serves as a numerical quantifier of a following nominal expression (11). It does so by checking the immediate right-hand context of each (VADJ (+ CARD)) for the presence of</Paragraph> <Paragraph position="14"> (11) What companies employed at least 200,000 people in 1973? items (such as articles, auxiliaries, punctuation, and verbs) which are incompatible with the latter possibility, replacing the VADJ structure by a corresponding proper noun structure whenever a match occurs.</Paragraph> <Paragraph position="15"> (CARDNOUN follows ORDFORM in the list of string transformations in order to take advantage of the latter's replacement of certain cardinals by corresponding ordinals.) 4.4 Idiom Processing By their very definition, idiomatic expressions are items which present problems in grammatical analysis, semantic interpretation, or both. Although it would be very tempting to exclude all constructions of this sort from the English subset of REQUEST, the currency and naturalness of many idioms is so great that such a prohibition would entail abandonment of our goal of permitting future users to employ their normal patterns of expression.</Paragraph> <Paragraph position="16"> For idioms such as &quot;make money&quot; (in the
sense of &quot;be profitable&quot;), where the components are adjacent and the paradigmatic variants are few, one possible approach is to deal with the problem by putting appropriate entries in the phrase lexicon. For example, the entry for &quot;makes money&quot; in our present lexicon treats that combination as an intransitive verb in the present tense and singular number which dominates the same underlying predicate and has the same selectional features as the adjective &quot;profitable&quot;. Even in such a relatively straightforward case, however, it is not difficult to think of minor extensions, such as the inclusion of negatives (&quot;make no money&quot;), which will at least require another set of phrasal entries. Moreover, the phrase lexicon approach breaks down completely as soon as one deals with an idiomatic construction that includes an open class as one of its components, producing a situation parallel to that encountered earlier for classifier constructions. The attempt to provide broad coverage of constructions involving notions of rank and ordinality led to the consideration of a number of common idiomatic patterns including arbitrary cardinal or ordinal numerals. These patterns, three of which are illustrated in (12), were eventually dealt with successfully by the development of string transformations designed not only to cope with their (12) List the top 10 companies in 1973 growth rate! syntactic peculiarities but to set the stage for correct semantic processing as well.</Paragraph> <Paragraph position="17"> The nature of these idiom-processing transformations is perhaps best illustrated by considering the rule &quot;Top n&quot; (TOPN), whose statement appears in Figure 6. The structural pattern of TOPN specifies a sequence of elements consisting of an initial arbitrary string of trees, the adjective &quot;top&quot;, a cardinal numeral ((VADJ . 4) (+ CARD)), a nominal expression (NOM . 5), and a final arbitrary string of trees (X .
7).</Paragraph> <Paragraph position="18"> The structural change includes a replacement and two deletions.</Paragraph> <Paragraph position="19"> The syntax of a replacement operation is of the form (REPLACE &lt;list of trees&gt; &lt;tree&gt;); its execution results in the replacement of the item corresponding to &lt;tree&gt; by the items corresponding to &lt;list of trees&gt;. The replacement operation in TOPN is therefore to be understood as follows: the nominal expression tree in the input which matches the pattern element (NOM . 5) is replaced by a list of elements consisting of itself, followed by lexical trees corresponding to (i) the -ing form of the verb &quot;rank&quot;, (ii) the ordinal numeral &quot;first&quot; (where the (NQUOTE 1) notation causes the &quot;1&quot; to be interpreted as a literal, rather than as a reference to the pattern element (X . 1)), (iii) the preposition &quot;through&quot;, and (iv) the ordinal numeral corresponding to the cardinal which matched ((VADJ . 4) (+ CARD)) in the structural pattern. The two deletion operations remove the lexical trees for the cardinal numeral and the adjective &quot;top&quot; from the preprocessed string.</Paragraph> <Paragraph position="20"> In the case of (12c), the overall effect of this structural change is to replace the string of lexical trees corresponding to &quot;the top 20 companies&quot; by the string of trees corresponding to &quot;the companies ranking 1st through 20th&quot;. A subsequent string transformation called &quot;Rank Interval&quot; (RNKINTVL), operating in a fashion similar to that of &quot;City State&quot; (cf. Section 4.
1), then transforms the trees corresponding to &quot;1st through 20th&quot; into a single ordinal numeral tree (bearing the feature (+ INTERVAL)) which dominates the numerals &quot;1&quot; and &quot;20&quot;. As a result of these operations, both surface and transformational parsing of such examples has become completely routine, while their semantic interpretation has required only the addition of a simple mechanism -- triggered by the feature (+ INTERVAL) -- for generating a dense set of integers from its endpoints.</Paragraph> <Paragraph position="21"> Another group of string transformations involving rank are derived from what were originally late postcyclic transformations. The three rules in question -- &quot;First Superlative&quot; (FIRSTSUP), &quot;Nth Superlative&quot; (NTHSUPER), and &quot;Nth Place&quot; (NTHPLACE) -- collectively serve to restore the various deletions illustrated in (13).</Paragraph> <Paragraph position="22"> The prime motivation for shifting these rules from the postcycle to a point preceding surface parsing was that the structure and distribution of the various phrase remnants resulting from the deletions are at best difficult to describe within the framework of a context-free phrase structure grammar. A variety of ad hoc apparatus, including special word class codes for the verb &quot;rank&quot; and for superlative adjectives, as well as special phrase names for such sequences as &quot;the + superlative&quot; and &quot;ordinal numeral + superlative&quot;, would have to be introduced in order to provide broad coverage without an accompanying combinatorial explosion. By restoring the deletions before surface parsing, however, such distasteful and complicated measures are entirely avoided, since lexical categories are left unchanged and the surface parser has to do no more than parse an ordinary prepositional phrase in the position following the verb.</Paragraph> 4.5 Experiments in Limited Conjunction Processing As was mentioned in the introduction to this paper, one of the principal directions in which we are currently seeking to extend the English subset accepted by the REQUEST System is in the coverage of (coordinate) conjunction constructions. The fact that the underlying variety and complexity of these constructions tends to be masked by superficial similarities makes a selective, piecemeal approach to their coverage a generally dubious move in a system such as REQUEST, whose eventual users can hardly be expected to make distinctions that may not be immediately obvious even to a trained linguist. Despite strong reservations on this point, it was decided to employ the string transformation mechanism to deal with an extremely limited range of conjunction constructions on an experimental basis.</Paragraph> <Paragraph position="24"> The range of constructions chosen was confined to conjoined proper nouns exclusively, subject to the further constraint that all terms of a given conjunction be members of the same semantic class -- i.e., for the current data base, either company names, city names, state names, or year names.
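The class constraint on conjoined proper nouns amounts to a simple compatibility check, which can be sketched as follows. The class names mirror the four data-base domains just listed; in REQUEST itself the information lives in feature pairs such as (+ CITY) on INDEX nodes, so this encoding is illustrative only:

```python
# Sketch of the semantic-class constraint: a conjunction of proper
# nouns is accepted only if every conjunct carries the same one of
# the four data-base classes.

CLASSES = {"COMPANY", "CITY", "STATE", "YEAR"}

def conjoinable(conjunct_features):
    """conjunct_features: one feature set per conjoined proper noun."""
    if not conjunct_features:
        return False
    shared = CLASSES.intersection(*conjunct_features)
    return len(shared) == 1
```

Thus &quot;GM, Ford, and Chrysler&quot; (all company names) would pass the check, while a mixed conjunction of a city name and a state name would be rejected.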
While undeniably highly limited in scope, this particular incremental increase in grammatical coverage (if successful) had three distinct merits: (1) it appeared to be compatible with the adjacency constraints of string transformations, owing to the tendency of proper nouns to take no modifiers, (2) it seemed potentially explainable to a naive user in simple terms, and (3) it could provide a natural language interface to an existing, but as yet largely unused, capability of the output formatting routines to generate and display tables of values containing such information as the earnings of each of a set of companies over a period of years.</Paragraph> <Paragraph position="25"> The approach employed in the string transformations for processing conjoined proper nouns is exemplified by the rule &quot;City, State, Year . . .&quot;, whose statement includes a subpattern that is preceded by an asterisk and surrounded by a pair of parentheses. This notation identifies the occurrence of a so-called &quot;Kleene star expression&quot;, which is interpreted by the transformational parser as a pattern element that is to be matched by zero or more consecutive occurrences of tree sequences matching its components. The particular Kleene star expression used here will match a string of any</Paragraph> <Paragraph position="26"> length which consists entirely of an alternating sequence of proper nouns and commas, provided that all the proper nouns are members of the same</Paragraph> <Paragraph position="28"> semantic class. The pattern elements following the Kleene star expression specify that it must be followed by: (i) another instance of a proper noun of the appropriate class (this will be the initial instance if the null value of the Kleene star expression is the only one that matches);</Paragraph> <Paragraph position="29"> The effect of the condition, which precludes any match where the left-hand structural variable (X .
1) ends in a sequence of trees satisfying the pattern of the Kleene star expression, is to force a (unique) match of maximum length.</Paragraph> <Paragraph position="31"> (iii) an occurrence of one of the conjunctions &quot;and&quot; (AND . 4) or &quot;or&quot; (ORR . 5) (the latter so spelled because OR is already used to signal the presence of a disjunctive pattern element to the rule-processing routine); (iv) the final instance of a semantically compatible proper noun; and (v) the usual end variable.</Paragraph> <Paragraph position="32"> The structural change specifies (1) that the terminal elements of all but the rightmost conjunct (which are collectively associated with the pattern element (W . 2) during the pattern matching phase) are to be sister adjoined to the terminal element of that rightmost conjunct and (2) that the original occurrences of all trees but those corresponding to the end variables and the final conjunct are to be deleted. Conditional on the presence of the conjunction &quot;and&quot; (AND . 4), the feature change adds the feature (+ ANDSET) to the feature list of the surviving INDEX and the feature (- SG) to that of the NOUN node immediately above.</Paragraph> <Paragraph position="33"> (The latter operation automatically results in replacement of the original (+ SG).) If the conjunction is an &quot;or&quot; (ORR . 5) instead, the feature change merely adds the feature (+ ORSET) to the feature list of the INDEX, leaving the number of the NOUN unchanged.</Paragraph> <Paragraph position="34"> The overall effect of the rule reflects the by now familiar strategy of mapping a structure which would otherwise pose severe problems in surface parsing into a significantly simpler one which will be processed without difficulty by both the surface parser and the transformational parser. As in the case of CITYSTATE and RNKINTVL, a special feature is attached to the node in the output structure that directly dominates two or more terminal symbols as a result of the structural change of the rule.
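The output shaping and feature change of the conjunction rule can be sketched like this. The node layout and field names are invented for exposition; in REQUEST the terminals are gathered under the surviving INDEX by sister adjunction, and the features live on the INDEX and NOUN nodes:

```python
# Sketch of the conjunction rule's result: the terminals of all
# conjuncts end up under one surviving INDEX, flagged (+ ANDSET) or
# (+ ORSET); "and" also changes the NOUN's number from (+ SG) to (- SG).

def collapse_conjunction(terminals, conjunction):
    node = {"cat": "INDEX", "terminals": terminals,
            "features": set(), "noun_number": "+SG"}
    if conjunction == "and":
        node["features"].add("+ANDSET")
        node["noun_number"] = "-SG"     # replaces the original (+ SG)
    elif conjunction == "or":
        node["features"].add("+ORSET")  # NOUN number left unchanged
    return node
```

For &quot;GM, Ford, and Chrysler&quot; the surviving INDEX dominates three terminals and the NOUN above it is marked plural; the feature tells the semantic interpreter to treat the terminals as a conjoined set of proper nouns.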
In each case, the purpose of the feature is to communicate to the semantic interpreter how the elements of the set of terminal symbols are to be treated -- as a (city, state) pair, as the endpoints of a dense set of integers, or as the elements of a conjoined set of proper nouns.</Paragraph> <Paragraph position="35"> The experimental approach to proper noun conjunction just described appeared initially to be a rather effective one. Examples such as (14) went through the transformational component as smoothly as ones like (15), (14) How much did GM, Ford, and Chrysler earn in the years from 1967 through 1972? whereupon the interpretive component produced what appeared to be an appropriate answer -- in the case of (14), an earnings table with 18 entries</Paragraph> <Paragraph position="37"> (15) How much did Ford earn in 1969? listed by company and by year. It was not long, however, before consideration of examples such as (16) and (17) revealed that the initial appearance of an adequate solution had been highly misleading.</Paragraph> <Paragraph position="39"> (16) Was GM or Ford unprofitable in 1970? (17) What were the earnings of the Big Three auto companies for the 1966-1968 period? For the former example, at least two readings seem possible: one as a selectional question, paraphrased in (18a) (which would preclude a (18) a. Which auto company was unprofitable in 1970 -- GM or Ford? b. Was either GM or Ford unprofitable in 1970? yes or no answer), the other as a yes-no question (18b), where the conditions for giving a positive answer depend upon the interpretation of the &quot;or&quot; as inclusive or exclusive. In the case of (17), there seems to be a series of possible readings, roughly paraphrased by (19a-d), reflecting ambiguity as to whether what has been requested is earnings information (19) a. What were the earnings of each of the Big Three auto companies for each of the years 1966-1968? b. What were the combined earnings of the Big Three auto companies for each of the years 1966-1968? c.
What did the earnings of each of the Big Three auto companies total for the 1966-1968 period? d. What did the combined earnings of the Big Three auto companies total for the 1966-1968 period? (a) presented individually by company and by year, (b) summed over companies but not over years, (c) summed over years but not over companies, or (d) summed over both companies and years.</Paragraph> <Paragraph position="40"> Ambiguities of the types exemplified by (16) and (17) were found to be quite widespread in the sort of material we are dealing with, occurring in a number of examples such as (14) where their presence was not initially perceived. Moreover, it was soon realized that such ambiguities were totally different in character from the types we had previously been most concerned with, since they involved instances of genuine multiple meaning in the language, rather than ambiguities artificially introduced by the inadequacies of a grammatical description or a parsing mechanism. It was also clear that the underlying structures assigned to these ambiguous examples were seriously deficient, in that they did not indicate the presence of an ambiguous situation, much less what the ambiguous alternatives were.</Paragraph> <Paragraph position="41"> Further investigation indicated that the ambiguities encountered were not restricted to conjoined proper nouns, but could also occur in the case of plural noun phrases. For example, (20) is ambiguous between a reading requesting earnings listed individually by company and a reading (20) What were the 1972 earnings of the companies in Chicago? requesting a combined earnings figure -- exactly the same readings which would exist if the phrase &quot;the companies in Chicago&quot; were replaced by the conjoined names of all companies satisfying that description.
Thus, it appeared that the ambiguities we wished to understand and cope with were related not to conjunction per se, but to semantic properties of sets and relations on sets.</Paragraph> <Paragraph position="42"> This view was reinforced by the discovery of syntactically parallel examples with sharply contrasting ambiguity patterns, as in (21). While both (21a) and (21b) share a reading where what is desired is a production (employment) figure for each year in the period, only (21a) has a (21) a. How many cars were produced by Chrysler in the 1969-1972 period? b. How many people were employed by Chrysler in the 1969-1972 period? sensible reading where the annual figures are to be totalled up arithmetically. The reason lies in the distinction between quantities like earnings, auto production, and rainfall -- which are inherently additive and are measured on a cumulative basis -- and quantities like employment, assets, and temperature, which are measured on an instantaneous basis* and are not additive over time in a meaningful sense. On the other hand, (21b) seems to have two other possible readings (22a) and (22b), reflecting questions about the size of a set union and of a set intersection, respectively. Although neither version of (22) could be answered with</Paragraph> <Paragraph position="46"> * Although it is meaningful to add them on the way to computing an average over a period of time.</Paragraph> <Paragraph position="47"> (22) a. How many different people were employed by Chrysler in the 1969-1972 period? b. How many people were employed by Chrysler during the entire 1969-1972 period?
respect to a Fortune-500-type data base, where people are countable but indistinguishable, both are questions which it would be quite reasonable to try to deal with in a data base environment that included personnel files.</Paragraph> <Paragraph position="48"> At present, we are continuing to work on problems of conjunction handling both by pursuing the line of investigation just touched upon and by studying patterns of disambiguation suggested by such examples as (18), (19), and (22). The richness and subtlety of the material we have encountered -- scarcely hinted at here -- is particularly remarkable in the light of the severe limitations placed on the types of conjunction constructions to be considered. While the use of string transformations has not provided us with a satisfactory solution for even a small part of the domain of conjunction constructions, it has had the highly beneficial effect of bringing us face-to-face with a range of significant problems of which we had previously been almost totally unaware.</Paragraph> </Section> </Section></Paper>