<?xml version="1.0" standalone="yes"?> <Paper uid="J79-1047"> <Title>A SURVEY OF SYNTACTIC ANALYSIS PROCEDURES FOR NATURAL LANGUAGE</Title> <Section position="5" start_page="19" end_page="19" type="metho"> <SectionTitle> 3. ALGORITHM SPECIFICATIONS </SectionTitle> <Paragraph position="0"> We present below precise specifications for some of the parsing algorithms which have been discussed. These algorithms are presented in SETL, a programming language which is based on concepts from set theory and has been developed at New York University by a group led by Jack Schwartz. The large variety of data types, operators, and control structures in SETL makes it possible to specify the algorithms in a relatively compact and natural fashion. An implementation is available which includes most of the features of the specification language, so that algorithms can be tested in essentially the form in which they are published. A description of the subset of SETL which has been used in this report is given in the appendix.</Paragraph> <Section position="1" start_page="19" end_page="19" type="sub_section"> <SectionTitle> 3.1 Parsing Algorithms for Context-Free Grammars </SectionTitle> <Paragraph position="0"> Context-free grammars played a major role in the early stages of automatic natural language analysis. Although they have now generally been superseded by more complex and powerful grammars, many of these grammars are based on or have as one of their components a context-free grammar. The selection of an efficient context-free parser therefore remains an important consideration in natural language analysis.</Paragraph> <Paragraph position="1"> Because so many different context-free parsers have been proposed, a comprehensive survey would be impracticable. We shall rather present a taxonomy according to which most context-free parsers can be classified, and illustrate this classification with five of the possible basic algorithms. At the end we shall mention which of these are being used in current natural language systems.</Paragraph> <Paragraph position="2"> The first division we shall make is according to the amount of memory space required by the parser. Type 0 parsers store only the parse tree currently being built. The other parsers globally accumulate data from which all parses of a sentence can be extracted; types 1, 2 and 3 store this data in decreasingly compact representations. The four types are: (0) Develops a single parse tree at a time; at any instant the store holds a set of nodes corresponding to the nodes of an incomplete potential parse tree. (1) The store holds a set of nodes, each of which represents the fact that some substring of the sentence, from word f to word l, can be analyzed as some symbol N.</Paragraph> <Paragraph position="3"> (2) The store holds a set of nodes, each of which represents an analysis of some substring of the sentence, from word f to word l, as some symbol N (if there are several different analyses of words f to l as some symbol N, there will be several nodes corresponding to a single node in a type 1 parser).</Paragraph> <Paragraph position="4"> (3) The store holds a set of nodes, each of which corresponds to an analysis of
some substring of the sentence, from word f to word l, as some symbol N appearing as part of some incomplete potential parse tree (if symbol N, spanning words f to l, appears in several of the incomplete potential parse trees, there will be several nodes corresponding to each node in a type 2 parser).</Paragraph> <Paragraph position="5"> Type (0) parsers require only an amount of storage proportional to the length of the input sentence. The storage requirements of type (1) parsers grow as the cube of the length, while the requirements for types (2) and (3) grow exponentially.</Paragraph> <Paragraph position="6"> A second division can be made between top-down and bottom-up parsers. A third criterion for classification is whether alternative parses of a sentence are all produced together (parallel parser) or are generated sequentially (serial parser); this division does not apply to type (0) parsers.</Paragraph> <Paragraph position="7"> Finer divisions can be made of some of these categories.</Paragraph> <Paragraph position="8"> For example, among bottom-up parsers we can distinguish those which perform a reduction only when all required elements have been found from those which make a tentative reduction when the first element of a production is found (so-called "left-corner parsers"). Parallel parsers can be classified according to the ordering strategy they use in building nodes: by leftmost or rightmost word subsumed (i.e., spanned) by the node, or by level. In addition, we shall not consider a number of optimization strategies, such as selectivity matrices and shaper and generalized shaper tests for top-down parsers.</Paragraph> <Paragraph position="9"> We shall now describe algorithms in five of the categories. We have not included any type 3 parsers because, despite their profligate use of storage, they do not operate much faster than type 0 parsers. The only reported use of such a parser of which we are aware is the "Error-Correcting Parse Algorithm" of Irons (Comm. ACM 6, 669 (1963)). A top-down left-to-right parallel strategy was employed so that the parser could make a suitable modification to the sentence when it "got stuck" because of an error in the input.</Paragraph> <Paragraph position="10"> SETL procedures are given for these five parsers. The input data structures are the same in all cases: The sentence, passed through parameter SENTENCE, is a tuple. The elements of the tuple, the words of the sentence, are to be matched by terminal symbols from the grammar. The context-free grammar, passed through parameter GRAMMAR, is a set each of whose elements corresponds to a production.</Paragraph> <Paragraph position="12"> The production a0 → a1 a2 ... an is transformed into the (n+1)-tuple <a0, a1, a2, ..., an>. The root symbol of the grammar is passed to the parser in parameter ROOT.</Paragraph>
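<Paragraph position="13"> For concreteness, the same input conventions can be written down in a modern notation. The following Python fragment is only an illustration of the data shapes just described; the toy productions and the NONTERMINALS test are ours, not part of the SETL procedures:

# Illustrative Python analogue of the SETL input structures (assumed names).
# A production  a0 -> a1 ... an  becomes the (n+1)-tuple (a0, a1, ..., an).
GRAMMAR = {
    ("S", "NP", "VP"),            # S  -> NP VP
    ("NP", "the", "N"),           # NP -> the N
    ("VP", "V", "NP"),            # VP -> V NP
    ("N", "dog"),
    ("N", "cat"),
    ("V", "saw"),
}
ROOT = "S"
SENTENCE = ("the", "dog", "saw", "the", "cat")

# A symbol is terminal iff it heads no production (cf. the terminal test
# in procedure EXPAND below).
NONTERMINALS = {prod[0] for prod in GRAMMAR}
</Paragraph>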
Algorithm A. Type 0 Top-down Serial <Paragraph position="13"> This procedure builds the parse trees for the input sentence sequentially in a two-dimensional array TREE. The first subscript of TREE specifies the number of the node, the second selects a component of the node (the components used in the code below include NAME, DAUGHTERS, CURRENT OPTION, and ALTERNATE OPTIONS). As each analysis of the sentence is completed, it is added to the set PARSES. When parsing is finished, this set of trees is returned as the value of the function PARSE. The variable NODES holds a count of the number of nodes in the parse tree; this is also the number of the node most recently added to the tree. WORD holds the number of the next word in the sentence to be matched.</Paragraph> <Paragraph position="14"> The heart of the parser is the recursive procedure EXPAND. EXPAND is passed one argument, the number of a node in the parse tree. If EXPAND has not been called for this node before, it will try to expand the node, i.e., build a parse tree below the node which matches part of the remainder of the sentence. If EXPAND has already been called once for this node -- so that a tree already exists below this node -- EXPAND tries to find an alternate tree below the node which will match up with part of the remainder of the sentence.</Paragraph> <Paragraph position="15"> If EXPAND is successful -- an (alternate) tree below the node was found -- it returns the value true; if it is unsuccessful, it returns false. In the case where the node corresponds to a terminal symbol, EXPAND will return true on the first call only if the symbol matches the next word in the sentence; it will always return false on the second call.</Paragraph> <Paragraph position="16"> definef PARSE (GRAMMAR, ROOT, SENTENCE); local PARSES, TREE, NODES, WORD;</Paragraph> <Paragraph position="18"> if GRAMMAR{TREE(X, 'NAME')} eq nl then /* terminal symbol */ if TREE(X, 'ALTERNATE OPTIONS') eq 0 then /* first call -- test for match with sentence */ TREE(X, 'ALTERNATE OPTIONS') = nl; if WORD le #SENTENCE then if SENTENCE(WORD) eq TREE(X, 'NAME') then</Paragraph> <Paragraph position="20"/> <Paragraph position="22"> if EXPAND (TREE(X, 'DAUGHTERS')(I)) then /* expansion found... if this is last element, return successfully, else advance to next element */ if I eq #OPT then</Paragraph> <Paragraph position="24"> /* all expansions for this option have been generated; if more options, loop, else return false. */</Paragraph> <Paragraph position="26"> One way of viewing this procedure is to consider each node as a separate process. Each process creates and invokes the processes corresponding to its daughter nodes. In SETL, the algorithm cannot be represented directly in this way, since there are no mechanisms for creating and suspending processes.</Paragraph> <Paragraph position="27"> Instead, the data which would correspond to the local variables of the process are stored as components of each node in the parse tree. In languages which provide for the suspension of processes, such as SIMULA, the algorithm can be represented even more succinctly (see, for example, a version of this algorithm in "Hierarchical Program Structures" by O.-J. Dahl and C. A. R. Hoare, in Structured Programming by O.-J. Dahl et al., page 201).</Paragraph>
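<Paragraph position="28"> Since the SETL text above has suffered badly in reproduction, a rough Python rendering of the same control flow may help. This is a sketch, not the paper's procedure: generators stand in for the suspend-and-resume behavior attributed to SIMULA, producing one alternate expansion at a time, and all names are illustrative. Like any type 0 top-down serial parser, it loops on left-recursive grammars.

def expand(symbol, word, grammar, sentence, nonterminals):
    """Enumerate (tree, next_word) pairs: each way 'symbol' can be
    expanded to match part of sentence[word:], one alternative at a time."""
    if symbol not in nonterminals:                 # terminal symbol
        if word < len(sentence) and sentence[word] == symbol:
            yield symbol, word + 1
        return
    for option in grammar:                         # the node's options
        if option[0] != symbol:
            continue
        def elements(i, w, daughters, opt=option):
            # expand the i-th element of the option, then the rest
            if i == len(opt):
                yield (symbol, tuple(daughters)), w
                return
            for d, w2 in expand(opt[i], w, grammar, sentence, nonterminals):
                yield from elements(i + 1, w2, daughters + [d], opt)
        yield from elements(1, word, [])

def parse(grammar, root, sentence):
    nts = {p[0] for p in grammar}
    return [tree for tree, w in expand(root, 0, grammar, sentence, nts)
            if w == len(sentence)]
</Paragraph>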
Algorithm B. Type 2 Bottom-up Parallel <Paragraph position="29"> This algorithm is sometimes called the "Immediate Constituent Analysis" (ICA) algorithm, because it was used quite early in parsing natural language with ICA grammars. It constructs all nodes in a single left-to-right pass over the sentence. As each word is scanned, the parser builds all nodes which subsume a portion of the sentence ending at that word. The nodes ("spans") are accumulated in a two-dimensional array SPAN, whose first subscript specifies the number of the span and whose second subscript selects a component of the span, as follows:</Paragraph> <Paragraph position="30"> SPAN(n, 'NAME') = name of span n; SPAN(n, 'FW') = number of first sentence word subsumed by span n; SPAN(n, 'LW+1') = (number of last sentence word subsumed by span n) + 1; SPAN(n, 'DAUGHTERS') = tuple of numbers of the daughter spans of span n.</Paragraph> <Paragraph position="31"> At the end of the routine is some code to convert SPAN, a graph structure with each span potentially a part of many parses, into a set of parse trees. This code has two parts: a loop to find all root nodes created in the immediate constituent analysis, and a recursive routine EXPAND which makes copies of all descendants of the root node and puts them in TREE. Each node in the tree has the following components: TREE(n, 'NAME') = name of node n; TREE(n, 'FW') = number of first sentence word subsumed by node n; TREE(n, 'LW+1') = (number of last sentence word subsumed by node n) + 1; TREE(n, 'DAUGHTERS') = tuple of numbers of daughter nodes of node n; TREE(n, 'PARENT') = number of parent node of node n. The set of parse trees is accumulated in PARSES and finally returned as the value of the function PARSE.</Paragraph> <Paragraph position="33"> local TODO, WORD, CURRENT, DEF, DEFNAME, DEFELIST, REM, SPAN, SPANS, TREE, NODES, PARSES, MS, I;</Paragraph> <Paragraph position="35"> /* add span whose name is sentence word */ ADDSPAN (SENTENCE(WORD), WORD, WORD+1, nult); /* TODO contains the numbers of spans which were just created and for which we have not yet checked whether they can be used as the last daughter span in building some more spans */ end VREM; end VDEF; end while TODO; end VWORD; /* extract trees from set of spans */</Paragraph> <Paragraph position="37"> /* MATCH finds all sequences of spans which match the elements of the n-tuple ELIST and which span a portion of the sentence whose last word + 1 = ENDWDP1; returns a set, each element of which is an (n+1)-tuple, whose tail is one of the n-tuples of spans and whose head is the number of the first word spanned by the n-tuple of spans */ if ELIST eq nult then return {<ENDWDP1>}; else return [U: 1 <= I <= NODES] (if (SPAN(I, 'NAME') eq ELIST(#ELIST)) and (SPAN(I, 'LW+1') eq ENDWDP1)</Paragraph> <Paragraph position="39"> /* creates a node for each span in DAW and each descendant thereof, and returns a tuple with the numbers of the nodes (in TREE) corresponding to the spans in DAW */ if DAW eq Ω then return Ω;</Paragraph> <Paragraph position="41"> Algorithm C. Type 1 Bottom-up Parallel Algorithm C is the basic "nodal spans" parsing algorithm [Cocke 1970]. The sequencing logic is identical to that for Algorithm B. The only difference in the tree representation is that all spans in Algorithm B with common values in the NAME, FW, and LW+1 components are joined into a single span in Algorithm C. The DAUGHTERS component now becomes a set, each of whose elements corresponds to the value of the DAUGHTERS component of one of the spans in Algorithm B (this set is called the "division list" in the nodal spans algorithm): SPAN(n, 'DAUGHTERS') = a set each of whose elements is a tuple of numbers of daughter nodes of span n. In order to effect this change in the tree, it is necessary only to modify the procedure ADDSPAN to check whether a span with the specified value of NAME, FW, and LW+1 already exists: define ADDSPAN (NAME, FW, LWP1, DAUGHTERS);</Paragraph> <Paragraph position="43"> The procedure for converting the spans into a set of trees is now more complicated than for Algorithm B; see, for example, Wens [1975], Sec. 7.</Paragraph>
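<Paragraph position="44"> A compact Python sketch of the same single-pass, bottom-up strategy follows; it is an illustration under our own naming, not a transcription of the SETL. Changing it so that spans sharing (name, fw, lw+1) are pooled into one entry whose daughters component is a set of tuples -- the division list -- would give the type 1 organization of Algorithm C.

def parse_b(grammar, root, sentence):
    """Algorithm B in miniature: one left-to-right pass; after scanning
    each word, build every span ending at that word. A span is the tuple
    (name, fw, lwp1, daughters), daughters being a tuple of span numbers."""
    spans = []
    for word in range(len(sentence)):
        todo = [len(spans)]
        spans.append((sentence[word], word, word + 1, ()))
        while todo:
            cur = todo.pop()
            name, fw, lwp1, _ = spans[cur]
            for prod in grammar:
                if prod[-1] != name:
                    continue
                # current span is the last daughter; match the remaining
                # right-side elements leftward so that they end at word fw
                for head_fw, daus in list(match(spans, prod[1:-1], fw)):
                    todo.append(len(spans))
                    spans.append((prod[0], head_fw, lwp1, daus + (cur,)))
    return [s for s in spans if s[:3] == (root, 0, len(sentence))]

def match(spans, elist, endp1):
    """Yield (first_word, span_numbers) for span sequences whose names
    match elist and whose last word + 1 = endp1 (cf. MATCH above)."""
    if not elist:
        yield endp1, ()
        return
    for i, (name, fw, lwp1, _) in enumerate(spans):
        if name == elist[-1] and lwp1 == endp1:
            for head_fw, rest in match(spans, elist[:-1], fw):
                yield head_fw, rest + (i,)
</Paragraph>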
<Paragraph position="45"> Algorithm D. Type 2 Top-down Serial We now seek to combine the advantages of algorithm A with those of algorithms B and C. Algorithms B and C would construct any given tree over a portion of the sentence only once, whereas algorithm A might construct some trees many times during the course of a parse. On the other hand, B and C would construct many trees which A would never try to build. More precisely, B and C would build trees while processing word n+1 which could not enter into any parse for any sentence whose first n words were those processed so far.</Paragraph> <Paragraph position="46"> To combine these algorithms, we shall return to the basic framework provided by algorithm A. To this we add a mechanism for recording "well formed substrings." The first time the parser tries to analyze a portion of the sentence beginning at word f as an instance of symbol N, this mechanism records any and all trees constructed below node N. The next time the parser tries symbol N at word f, the saving mechanism retrieves this information so that the trees below N need not actually be rebuilt.</Paragraph> <Paragraph position="47"> The previously-completed trees are stored in the two-dimensional array WFS, whose structure is identical to that of SPAN in Algorithm B:</Paragraph> <Paragraph position="48"> WFS(n, 'NAME'), WFS(n, 'FW'), WFS(n, 'LW+1'), and WFS(n, 'DAUGHTERS') = name, first word, last word + 1, and tuple of daughters of substring n. WFSS holds the number of substrings in WFS. When the parsing operation is complete, WFS will contain a subset of the elements which were in TREE at the end of algorithm B.</Paragraph> <Paragraph position="49"> The tree used by the top-down parser must be augmented to allow for the possibility that the parser is not building a tree below a given node but rather consulting the table of well-formed substrings for that node. In that case the node will have, instead of a tuple of daughters and a set of alternative options, the number of the well-formed substring currently being used in the tree and the set of alternative well-formed substrings.</Paragraph> <Paragraph position="50"> The structure of a node is thus:</Paragraph> <Paragraph position="52"> TREE(n, 'ALTERNATE OPTIONS') = set of tuples representing productions not yet tried for node n; TREE(n, 'CURRENT WFS') = number of the well-formed substring currently used at node n; TREE(n, 'ALTERNATE WFS') = set of numbers of well-formed substrings not yet tried for node n.</Paragraph> <Paragraph position="54"> Finally, we require a table which indicates, for each symbol N and sentence word f, whether all the well-formed substrings for N starting at f have been recorded in WFS. For this the parser uses the two-dimensional array EXPANDED: EXPANDED(N, f) = true if all substrings have been recorded, Ω if not.</Paragraph> <Paragraph position="55"> The text of procedure D is given below; comments are included only for those statements added to procedure A.</Paragraph> <Paragraph position="56"> definef PARSE (GRAMMAR, ROOT, SENTENCE); local PARSES, TREE, NODES, WORD, WFS, WFSS, EXPANDED;</Paragraph> <Paragraph position="58"> local I, S, LAST, OPT; if EXPANDED (TREE(X, 'NAME'), TREE(X, 'FW')) eq true then /* the expansions for this symbol have been computed before */ /* if this is a new node, get its WFS entries */ if TREE(X, 'ALTERNATE WFS') eq Ω then</Paragraph> <Paragraph position="60"> GETOPT: if OPT eq Ω then OPT from TREE(X, 'ALTERNATE OPTIONS'); TREE(X, 'CURRENT OPTION') = OPT;</Paragraph> <Paragraph position="62"> if EXPAND (TREE(X, 'DAUGHTERS')(I)) then if I eq #OPT then</Paragraph> <Paragraph position="64"> Note that this parser returns an ordered pair consisting of the set of trees and the set of well-formed substrings, since the trees alone do not contain complete information about the sentence analysis.</Paragraph>
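<Paragraph position="64"> The essence of Algorithm D -- a top-down parser that records well-formed substrings -- can be sketched in Python as a memoized recursive analyzer; the memo dictionary plays the role of WFS together with EXPANDED. This is an illustrative reduction under our own names, not the SETL procedure, and, like the serial top-down parsers discussed below, it does not handle left recursion.

def parse_d(grammar, root, sentence):
    """Top-down with a well-formed-substring table: memo maps
    (symbol, fw) -> list of (tree, lwp1); a key being present plays
    the role of EXPANDED(symbol, fw) = true."""
    nts = {p[0] for p in grammar}
    memo = {}

    def analyses(symbol, fw):
        if symbol not in nts:                     # terminal symbol
            if fw < len(sentence) and sentence[fw] == symbol:
                return [(symbol, fw + 1)]
            return []
        if (symbol, fw) in memo:                  # WFS entries already built
            return memo[(symbol, fw)]
        memo[(symbol, fw)] = results = []
        for prod in grammar:
            if prod[0] != symbol:
                continue
            partial = [((), fw)]                  # (daughters, next word)
            for elem in prod[1:]:
                partial = [(ds + (t,), w2)
                           for ds, w in partial
                           for t, w2 in analyses(elem, w)]
            results.extend(((symbol, ds), w) for ds, w in partial)
        return results

    return [t for t, w in analyses(root, 0) if w == len(sentence)]
</Paragraph>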
<Paragraph position="65"> Algorithm E. Type 1 Top-Down Serial To complete our set of algorithms, we shall apply to Algorithm D the same change we made to convert Algorithm B to Algorithm C. That is, where in Algorithm D we may have had several well-formed substrings with the same values of NAME, FW, LW+1, we shall combine these into a single substring in Algorithm E: the component DAUGHTERS becomes a set, each of whose elements is a tuple corresponding to the value of DAUGHTERS of one of the substrings in Algorithm D. Just as we only had to change ADDSPAN in Algorithm B, we only have to change ADDWFS in Algorithm D.</Paragraph> <Paragraph position="67"> /* search for well formed substring with identical NAME, FW, LW+1 */ if 1 <= ∃W <= WFSS | ((WFS(W, 'NAME') eq NODEX('NAME')) and</Paragraph>
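<Paragraph position="68"> In the memo-table rendering given earlier, this one change is tiny; a hypothetical Python version of the modified ADDWFS:

def add_wfs(wfs, name, fw, lwp1, daughters):
    """The one change Algorithm E makes: substrings sharing NAME, FW and
    LW+1 are merged, their daughter tuples pooled into a division list."""
    wfs.setdefault((name, fw, lwp1), set()).add(tuple(daughters))

wfs = {}
add_wfs(wfs, "NP", 0, 2, (3, 4))   # NP over words 0-1, daughters 3 and 4
add_wfs(wfs, "NP", 0, 2, (5, 6))   # a second analysis of the same span
assert len(wfs) == 1 and len(wfs[("NP", 0, 2)]) == 2
</Paragraph>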
<Paragraph position="69"> Use of the Various Algorithms in Natural Language Systems The type 0 top-down algorithm (algorithm A) is one of the simplest and most frequently used. For example, a special version of this algorithm (for Greibach normal form grammars) was used in the original Harvard Predictive Analyzer [Kuno 1962]. The later version of the Harvard system, incorporating a "path elimination" technique, was a type 1 top-down serial parser, a variant of Algorithm E; instead of saving all daughters in WFS during the parse, they were recomputed later for those nodes appearing in a parse tree [Kuno 1965].</Paragraph> <Paragraph position="70"> Several current systems use augmented context-free grammars: grammars to which have been added restrictions on the parse tree, typically in the form of LISP predicates, which must be true if the tree is to be accepted as a sentence analysis. The Winograd [1971] system uses an augmented context-free grammar with the context-free component encoded as a program rather than data. The parsing strategy is essentially that of a type 0 top-down algorithm, except that back-up is explicitly controlled rather than automatic.</Paragraph> <Paragraph position="71"> Woods' system [1970b] also uses a type 0 top-down algorithm, although somewhat different from the one presented here since his grammar is a recursive transition network. The Linguistic String Project system [Sager 1967] started out with a parser based on a type 0 top-down algorithm; for efficiency it later progressed to a type 2 top-down algorithm. A type 2 rather than a type 1 algorithm was used because the restrictions can reject one analysis of a portion of the sentence as a particular symbol while accepting another analysis of the same portion of the sentence as the same symbol.</Paragraph> <Paragraph position="72"> For a type 2 algorithm, this means simply eliminating some nodes in WFS; for a type 1 algorithm, where a single node may represent several trees, a complicated procedure which could create new nodes would have been required in general. The Linguistic String Project parser is considerably more complex than the type 2 top-down serial parser shown above (algorithm D), in part because of the restrictions which must be evaluated during the parse, in part because (for reasons of storage economy) the system makes it possible to save only selected nodes in WFS.</Paragraph> <Paragraph position="73"> The type 2 bottom-up parallel algorithm also saw early use in natural language processing.</Paragraph> <Paragraph position="74"> The parser designed by Cocke for the Rand system was a special version of this algorithm for Chomsky normal form grammars.</Paragraph> <Paragraph position="75"> A thorough survey of the different ordering strategies possible with this algorithm was given by Hays [1967]. This algorithm was subsequently developed by Cocke (among others) into a type 1 bottom-up parallel algorithm named "nodal spans" and subsequently into a type 1 top-down parallel algorithm called "improved nodal spans" (see Cocke [1970] for a description of these algorithms). The latter is very similar to a parsing algorithm described by Earley [1970]. These type 1 algorithms have, to the best of our knowledge, not yet been used in natural language parsing.</Paragraph> <Paragraph position="76"> In closing, a few remarks are in order on the practical importance of the differences between the various algorithms.</Paragraph> <Paragraph position="77"> How significant is the difference between type 2 and type 1, between top-down and bottom-up, between serial and parallel? There has been no systematic study of these questions, and the answers to them are in all likelihood quite grammar specific.</Paragraph> <Paragraph position="78"> For example, the advantage of the top-down parser is that, in working on word n+1, it eliminates from consideration those symbols which could not occur in any parse tree whose first n terminal symbols are the first n words of the sentence. Is this a large effect? Although I am not aware of any measurement of this quantity, the factor seems to be relatively small for large-coverage English grammars -- perhaps reducing the number of symbols in half.</Paragraph> <Paragraph position="79"> The advantage of type 1 over type 2 algorithms depends on the degree of ambiguity of the grammar. How frequently can a portion of a sentence be analyzed as a particular symbol in several ways? For unaugmented context-free grammars the answer in general has been very frequently -- this was one of the problems of the context-free systems. For such grammars, type 1 algorithms would be much more efficient. When restrictions are added, however, they discriminate some of the analyses from others.</Paragraph> <Paragraph position="80"> A rich set of syntactic, semantic, and pragmatic restrictions (available so far only for small subsets of English in limited areas of discourse) would presumably eliminate almost all ambiguity, so that the advantage of a type 1 algorithm would then be small.</Paragraph> <Paragraph position="81"> Finally, we should mention the difference between serial and parallel parsers. Since serial and parallel algorithms will have created the same number of nodes by the time parsing is complete, the difference in time is probably quite small.</Paragraph> <Paragraph position="82"> The parallel algorithm may have the edge because "bookkeeping" is simpler. Also, the parallel algorithm can handle left recursion naturally, whereas a special mechanism is required for top-down serial parsers. On the other hand, a serial algorithm may be preferable if only the first parse of a sentence is required.</Paragraph> <Paragraph position="83"> In addition, the serial algorithms can more simply handle the situation where memory space is marginal. Normally most of the space in algorithms D and E is used by the set WFS, not by TREE.</Paragraph> <Paragraph position="84"> Consequently a type 1 or 2 serial parser can "rescue itself" when memory is almost exhausted by reverting to a type 0 algorithm; this simply means that it stops saving nodes in WFS. In terms of the SETL programs given above this requires, in addition to a change to ADDWFS, only that elements of EXPANDED no longer be set to true once saving has been terminated.</Paragraph>
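<Paragraph position="85"> A sketch of how such a rescue might look, assuming the memo-table rendering given earlier; the budget constant and the state flag are our inventions:

WFS_BUDGET = 100_000        # assumed limit on saved substrings

def add_wfs_guarded(wfs, key, daughters, state):
    """Stop saving once the budget is exhausted; the parser then behaves
    as a type 0 algorithm for the rest of the sentence. Because EXPANDED
    is likewise no longer set, later nodes are simply recomputed."""
    if state["saving"] and len(wfs) >= WFS_BUDGET:
        state["saving"] = False
    if state["saving"]:
        wfs.setdefault(key, set()).add(tuple(daughters))
</Paragraph>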
</Section> <Section position="2" start_page="19" end_page="19" type="sub_section"> <SectionTitle> 3.2 A Parser for Unrestricted Rewriting Rule Grammars </SectionTitle> <Paragraph position="0"> A number of natural language systems, such as REL (at the California Institute of Technology) and the Q-system (at the University of Montreal), have used unrestricted phrase structure grammars. In such grammars, each rule specifies that some sequence of symbols be rewritten as some other sequence of symbols. The parsing algorithm used in these systems was described by Kay in 1967 ("Experiments with a Powerful Parser," Martin Kay, in 2ème Conférence Internationale sur le Traitement Automatique des Langues, Grenoble).</Paragraph> <Paragraph position="1"> Kay added quite a few features to the basic parsing procedure to create his "powerful parser". These included rule ordering and conditions on rule application. Other unrestricted rewriting rule systems have also included some such features to permit the parsimonious description of complex natural language grammars. In this newsletter, however, we shall not be concerned with these additional features; only the basic parsing procedure will be described below.</Paragraph> <Paragraph position="2"> The parser to be presented represents only a small modification to the context-free parser B (the "immediate constituent analyzer") given earlier. To understand this modification, consider the following example. We are given a context-free grammar which includes the productions</Paragraph> <Paragraph position="3"> We shall create a diagram for the sentence by making each word into an arc connecting two nodes, which are labeled with the number of the word in the sentence and the number +1. Context-free parser B would first apply the production a → does in reverse, to obtain a span "a"; note that the arc for the span connects the nodes corresponding to the first word and the last word + 1 subsumed by the span.</Paragraph> <Paragraph position="4"> The parser would then apply x → y a in reverse, getting a span which subsumes the entire sentence.</Paragraph> <Paragraph position="5"> Now consider analyzing the same sentence with an unrestricted phrase structure grammar containing the productions z → y a and x → z b. We begin by using the first production to reduce the sentence to y a b. This raises the problem of how to label the node between arcs a and b. Although a and b together subsume the last three words of the sentence, no fraction of this can be assigned individually to a or to b; hence we cannot label this new node with the number of a sentence word. Instead, we assign a new, unique label (here, v1) to the node. We can then reverse the second production to get a span z, and finally reverse the third production to get a span x subsuming the entire sentence. [Arc diagrams not reproduced here.] In the program below, new node names are created by means of the SETL function newat, which returns a different symbol (a "blank atom" in SETL terminology) each time it is called. We have retained the span component names FW and LW+1 for the labels of the nodes at the ends of the arc, though their values may now be blank atoms instead of numbers.</Paragraph>
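<Paragraph position="6"> Before turning to the SETL text, the modification can be sketched in Python: a rule is applied in reverse by finding arcs that spell out its right side and adding arcs for its left side, with fresh labels -- the analogue of newat's blank atoms -- minted for any interior nodes. All names here are illustrative:

import itertools

fresh = (f"v{i}" for i in itertools.count(1))   # analogue of SETL's newat

def reduce_all(arcs, rule):
    """Apply one rewriting rule in reverse. rule = (lhs, rhs), both tuples
    of symbols; an arc is (name, from_node, to_node). For every chain of
    arcs spelling out rhs, add arcs for lhs over the same endpoints."""
    lhs, rhs = rule
    added = False
    for path in list(paths(arcs, rhs)):
        start, end = path[0][1], path[-1][2]
        nodes = [start] + [next(fresh) for _ in lhs[:-1]] + [end]
        for i, sym in enumerate(lhs):
            arcs.add((sym, nodes[i], nodes[i + 1]))
        added = True
    return added

def paths(arcs, symbols):
    """Chains of arcs, connected end to end, whose names spell symbols."""
    if not symbols:
        return
    snapshot = tuple(arcs)
    def extend(prefix, node, rest):
        if not rest:
            yield prefix
            return
        for arc in snapshot:
            if arc[0] == rest[0] and arc[1] == node:
                yield from extend(prefix + [arc], arc[2], rest[1:])
    for arc in snapshot:
        if arc[0] == symbols[0]:
            yield from extend([arc], arc[2], symbols[1:])

# reducing "y a b" by the rule z -> y a :
arcs = {("y", 1, 2), ("a", 2, "v0"), ("b", "v0", 5)}
reduce_all(arcs, (("z",), ("y", "a")))
assert ("z", 1, "v0") in arcs
</Paragraph>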
<Paragraph position="7"> A production of the form b1 ... bm → a1 ... an is again represented by a tuple containing both sides, and the body of the parser parallels that of parser B: /* TODO contains the numbers of spans which were just created and for which we have not yet checked whether they can be used as the last daughter span in building some more spans */ (while TODO ne nult) /* select a span from TODO */ CURRENT = hd TODO; TODO = tl TODO; /* loop over all productions whose last element = name of current span */ (VDEF ∈ GRAMMAR | DEF(#DEF) eq SPAN(CURRENT, 'NAME')) /* separate left and right sides of production */</Paragraph> <Paragraph position="8"> /* if the remaining elements of the right side can be matched by spans, add new spans whose names = left-hand side of production, for each match */</Paragraph> <Paragraph position="9"> /* MATCH finds all sequences of spans which match the elements of the n-tuple ELIST and which span a portion of the sentence whose last word + 1 = ENDWDP1; returns a set, each element of which is an (n+1)-tuple, whose tail is one of the n-tuples of spans and whose head is the number of the first word spanned by the n-tuple of spans */</Paragraph> <Paragraph position="10"> The unrestricted rewriting rule parser has the power of a Turing machine. The user is afforded great flexibility in the manipulation of sentence strings. One drawback of such power, however, is the absence of a decision procedure -- no parser can determine, for an arbitrary grammar of this type, that a given sentence string is ungrammatical.</Paragraph> <Paragraph position="11"> The user must therefore be careful to design grammars so that the parser will terminate (in a reasonable amount of time) for any input sentence.</Paragraph> </Section> <Section position="3" start_page="19" end_page="19" type="sub_section"> <SectionTitle> 3.3 Parsing Procedures for Transformational Grammars </SectionTitle> <Paragraph position="0"> Most linguistic research over the past fifteen years has been conducted within the framework of transformational grammar developed by Chomsky and Harris. In the early 1960s, a few years after the blossoming of transformational grammar, several efforts were begun to develop parsers which could operate fairly directly from a transformational grammar.</Paragraph> <Paragraph position="1"> Two of these achieved some measure of success: a project at MITRE led by Donald Walker [Zwicky 1965, Walker 1966] and work at MIT by Stanley Petrick [1965].</Paragraph> <Paragraph position="2"> These two efforts had quite different objectives.</Paragraph> <Paragraph position="3"> The MITRE group was concerned with a specific practical application: development of a natural language interface for a military information retrieval system. They developed a grammar for a subset of English meeting their requirements and were primarily concerned with designing a parser which could handle this particular grammar. Petrick, in contrast, developed a general parsing procedure which would work with any member of a class of transformational grammars. This difference in objective affected a number of design decisions regarding the parsing procedure, as we shall see later on.</Paragraph> <Paragraph position="4"> Petrick and his coworkers, at the Air Force Cambridge Research Laboratory and now at IBM, Yorktown Heights, have modified the parser to reflect changes in transformational grammar and to adapt it for use as the front-end in an information retrieval system [Petrick 1966, 1973, 1975; Keyser 1967; Plath 1974a, 1974b]. Interestingly enough, these modifications have brought Petrick's parser much closer to the original MITRE design.</Paragraph> <Paragraph position="5"> Since the structure of transformational grammar has varied in time and between different schools of linguistic theory, the notion of a transformational parser is not well defined.
In order to present a parsing algorithm, we have selected a particularly simple grammatical formulation. This formulation corresponds approximately to the early work of Chomsky (e.g., Syntactic Structures) and the theory used in the early versions of the MITRE and Petrick systems. Complicating factors, such as features and context-sensitive rules for lexical insertion, have been omitted.</Paragraph> <Paragraph position="6"> The grammar consists of a base component and a transformational component. The base component is a context-free grammar which produces a set of deep structure trees. The transformational component is a set of tree-rewriting rules which, when applied to a deep structure tree, produces one or more surface structure trees. The frontiers (terminal node sequences) of the surface structure trees are the sentences of the language.</Paragraph> <Paragraph position="7"> The root symbol of the base component is named S. The base component also contains a distinguished symbol COMP which appears on the left side of only one production: COMP → # S #. # is referred to as the sentence boundary marker. With the exclusion of this production, the grammar is not recursive.</Paragraph> <Paragraph position="8"> Each transformation consists primarily of a structural index and a structural change. The structural index is a tuple (vector) <si1, ..., sin>, each of whose components is either a symbol (the name of a node) or "X". The structural change is a tuple <sc1, ..., scn> of the same length as the structural index. Each of its components sci is in turn a tuple <sci1, ..., scini>, possibly empty (ni = 0). Each of the scij is either a terminal symbol or an integer between 1 and n. The application of transformational rules is based on the notion of a proper analysis, which is in turn based on the concept of a cut of a tree.</Paragraph> <Paragraph position="9"> Roughly speaking, a cut is defined by drawing a line from left to right through a tree, passing only through nodes (not through the lines connecting nodes); the nodes thus passed through form the cut. For example, for the tree [diagram not reproduced here], the sequence of nodes NP, VERB, N forms a cut. More formally (Aho and Ullman, The Theory of Parsing, Translation, and Compiling, Vol. I, p. 140), a cut is a subset C of the nodes D of the tree such that (1) no node in C is on a successor path from some other node in C, and (2) no other node of D can be added to C without violating rule 1. If the names of the nodes in the cut, arranged in sequence from left to right, match the structural index of the transformation, the cut is a proper analysis of the tree with respect to this transformation. A structural index matches the sequence of node names if there exists a substitution of sequences of symbols (possibly null and not including #) for the occurrences of "X" in the structural index which will make the structural index identical to the sequence of names.</Paragraph>
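<Paragraph position="10"> Matching a structural index against the left-to-right names of a cut is thus a small wildcard-matching problem. A Python sketch (ours, not the paper's, with the constraint that "X" never absorbs a boundary marker):

def matches(index, names):
    """Does the structural index match the cut's node names? 'X' matches
    any (possibly empty) run of symbols other than '#'."""
    def m(i, j):
        if i == len(index):
            return j == len(names)
        if index[i] == "X":
            # try every split; X may not absorb a boundary marker
            return any(m(i + 1, k)
                       for k in range(j, len(names) + 1)
                       if "#" not in names[j:k])
        return j < len(names) and names[j] == index[i] and m(i + 1, j + 1)
    return m(0, 0)

assert matches(("NP", "VERB", "N"), ("NP", "VERB", "N"))
assert matches(("X", "VERB", "X"), ("NP", "VERB", "N"))
assert not matches(("NP", "X"), ("VP", "N"))
</Paragraph>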
<Paragraph position="11"> For example, the cut NP VERB N would be matched by structural indices such as NP VERB N or X VERB X -- any index that becomes NP VERB N under some substitution for the X's.</Paragraph> <Paragraph position="12"> The proper analysis associates with each element of the structural index (except possibly "X"s) a node in the tree and hence a subtree, the tree dominated by that node. The structural change indicates how these subtrees are to be shuffled to effect the transformation. sci specifies what is to go into the position occupied by the node matched to the ith element of the structural index. If sci is a 1-tuple, we simply have a case of one node (and the subtree it dominates) replacing another; if sci is an ni-tuple, ni > 1, we first substitute sci1 for the original node and then insert sci2, ..., scini as right siblings of that node. If scij is an integer between 1 and n, the new node is the node matched to the scij-th element of the structural index; if scij is a terminal symbol, the new node is a terminal node with that name.</Paragraph> <Paragraph position="13"> Because the value of sci may be the null tuple < >, it is possible for a node in the tree to be left with no successors. We therefore "clean up" the tree after applying the transformation by deleting any nonterminal node not dominating at least one terminal node.</Paragraph> <Paragraph position="14"> The prescription just given is inadequate for components of the structural index equal to "X", since these may match zero or more than one node. We shall constrain the transformations so that nodes in the cut which are matched by "X"s do not take part in the transformation. In terms of the structural change, if sik = "X" then sck = <k> and no other scij = k.</Paragraph> <Paragraph position="15"> As an example (a simplification of the example in Keyser and Petrick, Syntactic Analysis, p. 9), consider the passive transformation. Its structural index is <NP, AUX, V, X, NP, X, BY, PASS>. [Structural change tuple and tree diagrams not reproduced here.] Applied to the tree shown in the original, it produces the proper analysis indicated by the dotted line; applying the transformation yields a tree with the frontier: the girl BE EN frighten BY the crocodile.</Paragraph> <Paragraph position="16"> In addition to the structural index and structural change, some transformations may have an identity condition, requiring that the subtrees matched by two elements of the structural index be identical for the transformation to apply. The rule COMP → # S #, which makes the base component recursive, also plays a special role in the transformations. If a COMP dominating # S # appears in the parse tree, we call the tree dominated by that S a constituent (or embedded) sentence, and the tree dominated by the next S above COMP the matrix sentence parse tree. The transformations are of two types, singulary and binary (or embedding). In a singulary transformation, the structural index does not contain the symbol #. In a binary transformation, the structural index is of the form α # β # γ, where α, β, and γ are strings of symbols not containing #.
The binary transformation deletes these boundary markers (if the #'s are the ith and jth components of the structural index, then none of the sckl = i or j), thus combining a constituent sentence with its matrix sentence.</Paragraph> <Paragraph position="17"> The transformations are also classed as optional or obligatory. Just like the generation of sentences with a context-free grammar, the application of transformations to a base structure may be viewed as a nondeterministic process. Depending on the choices made, one of several possible surface structures may be obtained from a single deep structure.</Paragraph> <Paragraph position="18"> The transformations are considered in a fixed order, to be described momentarily. If there exists no proper analysis for a transformation, the transformation is skipped. If there exist several proper analyses, one is chosen. If the transformation is obligatory, it is then applied; if it is optional, a choice is made to apply it or not.</Paragraph> <Paragraph position="19"> The singulary and binary transformations are separately ordered. The transformational process begins by selecting an embedded sentence tree not including any other embedded sentence. The singulary transformations are applied in sequence to this tree; structural indices are matched against the embedded tree, not the entire parse tree. The binary transformations are then applied to this tree and its matrix sentence tree; one of these should actually transform the tree, deleting the embedded sentence's boundary markers (if none applied, we would eventually be left with a surface structure containing #'s, which would be rejected). Another deepest embedded sentence is selected and the process repeats until no embedded sentences remain. The singulary transformations are then applied to the entire tree, completing the generating process (if the base structure contained no embedded sentences, this would be the only step).</Paragraph> <Paragraph position="20"> In order to parse a sentence -- obtain its deep structures -- we would like to reverse the process just described: first build one or more potential surface structure parse trees for a sentence and then, by applying the transformations in reverse, try to obtain a valid deep structure from each of these.</Paragraph> <Paragraph position="21"> We shall deal with these two steps in turn.</Paragraph> <Paragraph position="22"> The surface structure parse tree will, in general, contain many structures which could not be directly generated by the base component. If we want to produce all the surface structure trees for a sentence using a context-free grammar, it will be necessary to augment the base component. For example, if the base component contains the production A → X Y and there is a transformation which interchanges X and Y, the rule A → Y X must be included in the grammar which is used to produce the surface structure trees. Petrick has described (in his Ph.D. thesis) a procedure which can determine, from the base and transformational components, how the base must be augmented in order to obtain all surface structure trees.</Paragraph> <Paragraph position="23"> Because a transformation can replace one node with two, it is possible for repeated application of such a transformation to produce a node in the surface structure with an arbitrary number of immediate descendants. This means that
an infinite number of rules must be added to the base component. Petrick noted, however, that if a limit is placed on the length of sentences to be analyzed (and certain minimal assumptions are made about the grammar), only a finite number of rules are required. (Alternatively, it seems, a recursive transition network could be used to obtain the surface structure, since such a device allows a node to have an arbitrary number of immediate descendants.)</Paragraph> <Paragraph position="24"> This augmented grammar will produce all the valid surface structure parse trees, but it will also produce, in general, many spurious trees (trees not derivable from deep structures). This is unavoidable, since a context-free grammar is a much weaker computational device than a transformational grammar. Because the language defined by the augmented base component is larger than that defined by the transformational grammar, the augmented base component is called a covering grammar. Since each spurious surface analysis will have to undergo a lengthy reverse transformational process before it is recognized as invalid, it is important to minimize the number of such parses. The seriousness of this problem is indicated by some early results obtained by the MITRE group. The MITRE system did not have a procedure for automatically augmenting the base component; theirs was assembled manually. Using a small grammar, one of their 12-word test sentences obtained 48 surface analyses, almost all of them spurious.</Paragraph> <Paragraph position="25"> Petrick had similar experience: he found that the covering grammars produced by his procedure were too broad, producing too many surface parses. He has instead, like the MITRE group, produced his surface grammars manually, by analyzing constructions which appear in the surface structure of input sentences to determine which productions are required. In this way, he has been able to produce a practically useful covering grammar for a limited area of discourse.</Paragraph> <Paragraph position="26"> These practical difficulties do not negate the value of an automatic procedure, such as that described by Petrick, which will produce a covering grammar we can be sure is complete (will produce all valid surface analyses). They do indicate, however, the value of developing procedures which produce "tighter" surface grammars, perhaps by observing that certain sequences of transformations are impossible and hence suppressing the corresponding surface grammar productions.</Paragraph> <Paragraph position="27"> They also suggest that a more powerful device than a context-free grammar -- such as an "augmented context-free grammar"* -- should perhaps be used to generate the surface analyses. This view is held by a number of workers, such as Sager and Woods, who are also aiming at a transformational decomposition.</Paragraph> <Paragraph position="28"> Armed with a covering grammar, we turn now to the construction of a reverse transformational component. This component should produce, from a surface structure, all base structures which can generate the surface structure (and, for a spurious surface structure, indicate that there are no base structures).</Paragraph> <Paragraph position="29"> * "Augmented" meaning here that the grammar may contain predicates which are arbitrary computable functions.</Paragraph> <Paragraph position="30"> The first problem is that it is not always possible to construct such a component.
If a (forward) transformation simply deletes a portion of the parse tree, it will in general be impossible to reconstruct that portion when working backwards from the surface structure: there may be an infinite number of deep structures which produce one surface structure. Such a situation is called an irrecoverable deletion. (This is in contrast to recoverable deletions, which make use of a component of forward transformations briefly mentioned earlier: identity conditions. An identity condition specifies two or more components of the structural index; the transformation may be applied only if the trees dominated by the nodes matched by these elements are identical. If some -- but not all -- of these trees are deleted by a transformation, the deletion is recoverable: the reverse transformational component may restore the deletion by copying another part of the tree.) So, to be able to construct a reverse component at all, the grammar may contain no irrecoverable deletions.</Paragraph> <Paragraph position="31"> Life would be relatively easy if, for each transformation, one could produce an inverse transformation which undoes the change wrought in the tree. Unfortunately, for the form of transformation we have chosen, this is not possible. Consider, for example, a transformation with the structural index and structural change shown in the original figure, and suppose that the only structure to which it ever applies in the generation of a sentence is the one illustrated there; reversing this transformation seems straightforward enough. [Structural index, structural change, and example trees not reproduced here.] The reverse transformation need not be a true inverse; it does not have to be able to reconstruct any input given to the forward transformation. It need only be able to reconstruct those inputs which occur in the derivation of sentences.</Paragraph> <Paragraph position="32"> In this case, it must insert a B dominating a C. This operation cannot be performed in the transformational formalism described above (unless a B dominating a C is present in the tree). In terms of elementary changes to a parse tree, this formalism permits only deletion (of a node), replacement (of one node by another), and sister adjunction (insertion of one node "next" to another, with both dominated by the same node); it does not allow insertion of one node below another. This formalism was used in Petrick's original system. Most more recent systems, including the MITRE system and Petrick's later systems, have allowed a richer set of elementary operations, capable of making an arbitrary change to a tree.</Paragraph> <Paragraph position="33"> Even if the set of operations is sufficient to form a set of reverse transformations, their formation is not trivial. For a transformation such as the one just considered, the reverse transformation cannot be generated from an examination of the forward transformation alone. One must examine the entire grammar to see how the transformation is used in sentence generation. This is a complex process which has (to the author's knowledge) never been programmed. In the MITRE system, the reverse transformations were all produced manually.</Paragraph> <Paragraph position="34"> Petrick, seeking originally a procedure which would work automatically from the transformational grammar, took a different tack. He developed a reverse transformational component which mapped the surface string (the sentence) into a set of potential deep structure strings; the latter were then parsed by the base component.
The individual transformations of the reverse component are string rewriting rules and not tree rewriting rules.</Paragraph> <Paragraph position="35"> The advantage of this approach lies in the simplicity of forming the individual reverse transformations. The reverse transformations will be in one-to-one correspondence with the forward ones, and each reverse transformation T' can be computed on the basis of the corresponding forward transformation T alone. These reverse transformations will satisfy the following property: for any tree t with frontier s, if T maps t into t' with frontier s', then T' maps s' into s.</Paragraph> <Paragraph position="36"> Suppose we are given a forward transformation with structural index si and structural change sc. Since we are interested only in the frontier and not in the internal structure of the tree, we shall use a reduced structural change rsc = [+: 1 <= i <= #sc] sc(i), obtained by concatenating the elements of the structural change. The fact that a proper analysis exists for a tree with frontier s implies that s can be divided into substrings s(1), ..., s(n), where s(i) is the frontier of the subtree matched by the i-th component of the structural index; the transformed frontier s' is then the concatenation of these substrings (and of the constant terminal symbols) in the order given by rsc. Let r = #rsc. How can this shuffle be reversed? We begin by creating an inverse structural index isi(j) (1 <= j <= r) according to isi(j) = if rsc(j) is an integer then si(rsc(j)) else rsc(j), and an inverse structural change isc(j) (1 <= j <= n) according to isc(j) = if ∃k | rsc(k) eq j then k else si(j).</Paragraph> <Paragraph position="37"> Then, given a string s', we divide it into r substrings, requiring that the j-th substring be the frontier of some tree with root isi(j) (again unless isi(j) = 'X'). One of these divisions will be the one produced by the forward transformation (there may be others). These substrings are then rearranged according to the isc, producing the original string s: s(j) = if isc(j) is an integer then s'(isc(j)) else isc(j). If there are several matches to the isi, the transformation must be applied to all; we can only be sure that one of the resulting strings will be s. If the forward transformation is a recoverable deletion involving identity conditions, the formulas given above are somewhat more complicated.</Paragraph>
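<Paragraph position="38"> These formulas translate directly into code. The following Python sketch computes isi and isc from a forward transformation and reassembles s from a division of s'; it is an illustration only, with 0-based lists holding the text's 1-based element numbers, and it omits identity conditions:

def invert(si, sc):
    """Compute the inverse structural index and change by the formulas
    above. Lists are 0-based here, but the integers stored in them remain
    the 1-based element numbers used in the text."""
    rsc = [x for component in sc for x in component]     # concatenated sc
    isi = [si[x - 1] if isinstance(x, int) else x for x in rsc]
    isc = [next((k + 1 for k, x in enumerate(rsc) if x == j), si[j - 1])
           for j in range(1, len(si) + 1)]
    return isi, isc

def reassemble(parts, isc):
    """Given s' divided into substrings matching isi, rebuild s."""
    s = []
    for x in isc:
        s.extend(parts[x - 1] if isinstance(x, int) else [x])
    return s

isi, isc = invert(("A", "B"), ((2,), (1,)))   # a transformation that swaps
assert isi == ["B", "A"] and isc == [2, 1]
assert reassemble([["b1"], ["a1"]], isc) == ["a1", "b1"]
</Paragraph>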
<Paragraph position="39"> Given a set of reverse transformations, we must finally specify the sequencing among them. The reverse transformations should be considered in precisely the reverse order from that of the corresponding forward transformations. The sequencing is again cyclic, with each iteration now creating an embedded sentence.</Paragraph> <Paragraph position="40"> Even if a reverse transformation matches the sentence being decomposed, one cannot be sure that the corresponding forward transformation was involved in the generation of the sentence. Undoing the transformation may lead to a dead end (no other reverse transformations apply), and another transformation may also have produced the current structure. Consequently, both possibilities -- undoing and not undoing the transformation -- must be followed. In analogy with the forward transformations, one can say that all reverse transformations are optional.</Paragraph> <Paragraph position="41"> This implies, unfortunately, that parsing time can increase exponentially with the number of applicable transformations. Such a procedure has therefore proved impracticable for all but the smallest grammars and sentences. To avoid this exponential growth, the parser must have some way of determining directly from a tree the last transformation which applied to produce the tree.</Paragraph> <Paragraph position="42"> An analysis must be made of the possible intermediate structures which can arise in sentence generation, and the resulting information translated into conditions on the reverse transformations. Such an analysis has not been automated, but it is normally a straightforward and integral part of the manual construction of a reverse transformational component. The MITRE group was able to specify the appropriate conditions for all their reverse transformations; their system provided for optional reverse transformations but their grammar did not utilize this facility.</Paragraph> <Paragraph position="43"> Eliminating optional reverse transformations is more difficult in a reverse component using string rewriting rules, not retaining any tree structure between transformations.* Most of the information which is needed to determine which transformation to undo is not available. In any case, the original impetus for using string rewriting rules -- providing a procedure which can operate directly from the transformational grammar -- is lost when we seek to add, for reasons of efficiency, restrictions which are not automatically generated from the grammar.</Paragraph> <Paragraph position="44"> * Petrick's original system, using string rewriting rules, did retain some low-level tree structures between transformations, but his later systems did not.</Paragraph> <Paragraph position="45"> Petrick's current parser, part of the REQUEST system, is much closer in overall structure to the MITRE design. A set of potential surface structure trees are operated upon by a reverse transformational component consisting of tree rewriting rules. The reverse transformations are prepared manually, not obtained automatically from corresponding forward transformations. The conditions on the reverse transformations are sufficiently tight to obviate the need for optional reverse transformations. As a result, they are able to operate efficiently with a moderately large set of reverse transformations (about 130).</Paragraph> <Paragraph position="46"> Once all reverse transformations have been applied, the resulting structures must be checked to determine which are valid deep structures. If the reverse transformations work on trees, each tree must be examined for productions not in the base component. If the reverse transformations work on strings, each string must be parsed using the base component.</Paragraph> <Paragraph position="47"> The original Petrick and MITRE procedures envisioned a final synthesis phase. This phase would apply to each deep structure produced by the reverse transformational component. It would apply the corresponding forward transformations to determine whether the original sentence can be recovered; if it cannot, the deep structure is rejected.</Paragraph> <Paragraph position="48"> Such a check is necessary if the reverse transformations can produce deep structures which do not lead back to the original sentence and perhaps do not lead to any sentence at all. This is certainly the case with the reverse transformations applied to strings; such transformations are unable to capture many constraints present when applying the forward transformations to trees. It can also be true with reverse transformations working on trees, if the constraints on the reverse transformations are too loose. With reverse transformations on trees, however, it should be possible to formulate constraints sufficiently tight as to obviate the need for a synthesis phase.
A synthesis check is optional in the current Petrick-Plath system. Instead of applying the forward transformations in a separate synthesis phase after a deep structure is obtained, however, they are applied during analysis after each corresponding inverse transformation is applied.</Paragraph> </Section> </Section> <Section position="6" start_page="19" end_page="19" type="metho"> <SectionTitle> THE PROCEDURE </SectionTitle> <Paragraph position="0"> We present below a SETL version of one of Petrick's early transformational parsers. As was noted earlier, this algorithm is of importance because it is the only procedure which can work directly from a forward transformational grammar.</Paragraph> <Paragraph position="1"> The SETL program has been adapted from the LISP program developed by Petrick at the Air Force Cambridge Research Laboratories (1966), and is somewhat simpler than the version presented in Petrick's thesis. In particular, the 1966 version preserves no tree structure between reverse transformations.</Paragraph> <Paragraph position="2"> Considerable liberty has been taken in rewriting the program for presentation here. Features which were not deemed essential to an understanding of the basic procedure were deleted. Specifically, the procedure for converting forward to reverse transformations was not included; identity conditions in transformations were not provided; optional elements in structural indices were not allowed. On the other hand, the gross flow of control of the LISP program has been preserved.</Paragraph> <Paragraph position="3"> The main procedure of the parser, XFPARSE, takes five arguments: SENTENCE, the string to be analyzed; XFMN, the set of reverse transformations; NUMXFMNS, the number of transformations; BASEGR, the (context-free) base component; and AUXRULES, the additional rules which must be added to the base component to form the covering grammar. The context-free grammars have the form described in Sec. 3.1. Each transformation has, among its components, its inverse structural index; if the transformation is unary, it also has the component XFMN(i, 'ISC'), the inverse structural change. If the transformation is binary, the inverse structural change will contain a pair of sentence boundary markers, with the component sentence inside the markers and the matrix sentence outside (the general form is m m m # c c c # m m m, with the m's part of the matrix sentence and the c's part of the component). For the parser, it is more convenient to represent this as two separate tuples, one with the boundary markers and the elements between them replaced by the symbol 'COMP' (m m m 'COMP' m m m), the other with the elements between the markers (c c c). In the transformation, these are the components XFMN(i, 'ISC-MATRIX'), the inverse structural change for the matrix sentence, and a corresponding component holding the inverse structural change for the component sentence.</Paragraph> <Paragraph position="4"> The value of XFPARSE is a set, with each element giving one possible deep structure frontier for the sentence, and the reverse transformations which were applied to obtain that deep structure.</Paragraph>
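<Paragraph position="5"> The control structure just outlined, and elaborated below, can be sketched as a worklist computation. The function and parameter names here are ours, and the recursive treatment of embedded sentences is omitted:

def xfparse(sentence, undo_xfmns, base_accepts):
    """Skeleton of the decomposition loop: undo_xfmns is a list of
    functions, in reverse of the forward order, each mapping a frontier
    to the set of frontiers obtained by undoing one transformation
    (empty if it does not apply); base_accepts(f) is the covering-grammar
    check."""
    todo = {(tuple(sentence), ())}
    for n, undo in enumerate(undo_xfmns):
        work = list(todo)
        while work:                        # close under this transformation
            frontier, history = work.pop()
            for result in undo(frontier):
                item = (tuple(result), history + (n,))
                if item not in todo:       # optional: originals remain too
                    todo.add(item)
                    work.append(item)
    return {(f, h) for f, h in todo if base_accepts(f)}
</Paragraph>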
<Paragraph position="6"> To determine whether these are in fact deep structures for the sentence, it would be necessary to go through the generative transformational phase for each potential deep structure and verify that the original sentence can be obtained.</Paragraph> <Paragraph position="7"> If the deep structure contains no embedded sentences, the structure of an element in the returned set is <deep-structure-frontier, transformations-applied>, where deep-structure-frontier is a tuple whose elements are the symbols in the frontier of one possible deep structure. Transformations-applied is a tuple whose elements are the transformations applied to obtain this deep structure; the first element gives the last transformation applied in the decomposition, and hence the first which would be applied in generation. If the deep structure contains an embedded sentence, the element will still be a pair as just described; deep-structure-frontier, however, will not include as elements two boundary markers and the intervening embedded sentence. Instead, at that point in the tuple will be an element which is itself a pair, with the first element the frontier of the embedded sentence and the second the transformations applied to decompose the embedded sentence. The transformations-applied element of the top-level pair will include only the embedding transformation and the transformations applied to the sentence before it was divided into matrix and constituent.</Paragraph> <Paragraph position="8"> Since each reverse transformation is optional and may apply in several ways, there will usually be many paths to follow during the decomposition process. In XFPARSE, each such path is recorded as an element in the set TODO; the element is a frontier/history pair, just like those produced as output. The main loop of the parser runs over the transformations. For each element of TODO, if the transformation applies, all possible transforms of the element are added to TODO; since the transformation is optional, the original element remains as well. When an inverse embedding transformation applies, XFPARSE is called recursively to decompose the embedded sentence.</Paragraph> <Paragraph position="9"> definef XFPARSE (SENTENCE, XFMN, NUMXFMNS, BASEGR, AUXRULES); local TODO, DONE, PARSES, SGRAMMAR, XFMNNO, CONT, SENT, XFAPPLD,</Paragraph> <Paragraph position="10"> /* all transformations have been tried */ /* select and return those strings which can be analyzed by the base component */</Paragraph> <Paragraph position="11"> Most of the work of the parser is done by the two routines PROCES and IMPOSE. PROCES matches the current string against the inverse structural index, and IMPOSE computes the effect of the inverse structural change.</Paragraph> <Paragraph position="12"> PROCES takes four arguments:</Paragraph> </Section> </Paper>