<?xml version="1.0" standalone="yes"?> <Paper uid="J84-3005"> <Title>Strong Generative Capacity, Weak Generative Capacity, and Modern Linguistic Theories</Title> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Unbounded Deletion, Past and Present </SectionTitle> <Paragraph position="0"> It has long been recognized that the possibility of unbounded deletion is at the root of the computational power of Aspects-style transformational theories. If what a machine must do to recognize whether or not a given sentence (surface string) is in the language generated by some transformational grammar is to recover its deep structure, and if deep structures can be arbitrarily large compared to the surface strings derived from them, then the recognition procedures for such languages are not even recursive.</Paragraph> <Paragraph position="1"> Before describing Peters and Ritchie's formal characterization of this connection between deep structure length and the complexity of recognition, it would be valuable to give some insight into just why this connection should hold. Recall that a recursive set is one where membership in the set can be determined in some finite (though perhaps large) amount of time. Here, the set we have in mind is the language generated by some transformational grammar, L(TG); given some sentence s, our job is to calculate whether s is in L(TG) and return a yes or no answer in some finite amount of time. Also recall that a set is recursively enumerable (r.e.) if, whenever s is in fact in L(TG), there is a procedure such that the answer yes can come back in some finite amount of time, but if s is not in the language, we have no such guarantee; the procedure could just run forever.</Paragraph> <Paragraph position="2"> The key insight connecting length of deep structure to recursive enumerability comes from examining the conditions under which a computation could run forever. If we use a standard Turing machine model of computation, one thing that could happen is that the machine could just keep using more and more new tape cells, moving a step each time. This could go on forever. So one way to obtain unbounded computation time is to use unbounded space.</Paragraph> <Paragraph position="3"> If we substitute for the &quot;tape cells&quot; of the Turing machine the number of embedded S or NP cycles in some arbitrarily large deep structure, and if we must recover this deep structure in order to figure out whether or not the sentence is in the grammar, then we have our correspondence between unbounded deep structures and unbounded time for computation.</Paragraph> <Paragraph position="4"> But is this the only way to achieve unboundedly long computations? Why not just have the machine shuttle back and forth along some fixed sequence of tape cells, using the same space but looping forever? This is certainly possible, but in this case one can show that the number of distinct machine configurations is bounded above by the cross-product of a fixed number of possible moves times a fixed number of possible cell contents. But this means we could &quot;shut off&quot; the machine after this number of time steps (counting each Turing machine move as a tick of the clock), since the machine cannot do anything new after this number of moves.[1] In other words, given an upper bound on the space a machine uses, we can fix an upper bound on the length of time the machine can ever use without looping forever.
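The counting argument behind this bound (the standard result cited from Hopcroft and Ullman below) can be sketched as follows. For a Turing machine with state set Q and tape alphabet Γ confined to S tape cells, the number of distinct configurations (state, head position, tape contents) is at most

\[ \#\{\text{configurations}\} \;\le\; |Q| \cdot S \cdot |\Gamma|^{S}. \]

Any computation that stays within S cells but runs longer than this must repeat a configuration and is therefore looping forever; so every halting computation in space S takes at most that many steps.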
In short, then, the only way to get non-recursive computations is by using unbounded space.[2] In the transformational analog, Peters and Ritchie (1973) connected recognition complexity to the possible difference in length between deep structures and surface strings: Let G be a transformational grammar. Let f_G be the cycling function of G, where f_G(x) is 0 if x is not in L(G), and otherwise is the least number s such that G assigns x a deep structure with s subsentences. If f_G is bounded by an elementary (primitive) recursive function, then L(G) is elementary (primitive) recursive. (In fact, if f_G is linear, then L(G) is in a still smaller class.) If the cycling function is not bounded, then L(G) is not even recursive.</Paragraph> <Paragraph position="5"> It is the possibility of arbitrary deletion that makes a surface sentence arbitrarily &quot;shorter&quot; than its corresponding underlying deep structure. Lapointe (1977), in an excellent review, sums up the situation: Putnam noted that early theories of transformations allowed grammars which could generate any r.e. language (whether recursive or not). The chief reason for this was that early theories allowed arbitrary deletions and substitutions in the course of a derivation. Arbitrary permutations or copying could never cause a grammar to generate a nonrecursive set, for if t_i and t_{i+1} are successive steps in a derivation such that t_{i+1} arises through the application of a permutation or copying rule \[from\] t_i, then . . . the number of terminal symbols in t_{i+1} will be at least as great as \[the number of\] terminal symbols in t_i. But this property, that successive steps in a derivation do not &quot;shrink&quot; in length, is the basic defining characteristic of context-sensitive grammars. Therefore only the application of rules which reduce length (that is, deletions and substitutions) could cause a grammar to generate a non-CS \[context-sensitive - rcb\], and perhaps a nonrecursive, language. (1977: 228) We can exhibit this result in a compact form. It is well known that any r.e. set can be described as the homomorphic image of the intersection of two context-free languages (Ginsburg, Greibach, and Harrison 1967).</Paragraph> <Paragraph position="8"> Recall that a homomorphism is simply a &quot;respelling&quot; of the symbols of a language. The key point is that the homomorphism required here permits the deletion of unbounded strings of symbols; that is, it may map symbols to the empty string. In fact, all the proofs demonstrating the power of Aspects-style transformational grammars make use of this erasing power in one fashion or another. The remainder of this section reviews two of these demonstrations in the literature, one by Peters (1973), and one by Kimball (1967). (Another demonstration that Aspects-style TGs can generate any r.e. language, given by Salomaa (1971), also uses unbounded deletion, but is similar to Kimball's approach and will not be discussed here.)
[1] The &quot;clock&quot; takes up a bit of extra space - log space - since it has to count!
[2] This standard result may be found in Hopcroft and Ullman (1979: 300-301).
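Stated compactly (a paraphrase of the Ginsburg, Greibach, and Harrison result just cited, not their notation): for every recursively enumerable language L there are context-free languages L1 and L2 and a homomorphism h such that

\[ L = h(L_1 \cap L_2), \qquad \text{with } h(a) = \varepsilon \text{ permitted for some symbols } a. \]

The erasing clause is what matters: without it the image could never be shorter than the intersected strings, and, as the Lapointe passage above explains, only context-sensitive (hence recursive) languages could result.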
The point of going through the examples in detail is to show exactly how each proof relies on unbounded deletion, and why it is that each does not go through under the assumptions of the current government-binding theory. The basic reason for the change is that unlimited erasing or deletion is no longer allowed.</Paragraph> <Paragraph position="12"> Indeed, as the next section will make clear, only a linear amount of erasing is permitted in current theories. This insight is the key to the analysis of the modern theory.</Paragraph> <Paragraph position="13"> We begin with Peters's 1973 demonstration. Peters gives a specific example showing just how &quot;large&quot; deep structures can be associated with &quot;short&quot; surface sentences. Again, copying and deletion are the culprits.</Paragraph> <Paragraph position="14"> Peters's example relies on the &quot;Equi NP Deletion&quot; analysis of sentences such as these: 1. Their sitting down promises to steady the canoe.</Paragraph> <Paragraph position="15"> On this account, such sentences have an underlying structure that explicitly reconstructs the missing NP subject of the embedded complement to promise: 2. \[NP Their sitting down\] promises \[S \[NP their sitting down\] to steady the canoe\].</Paragraph> <Paragraph position="16"> Note that this sentence consists of three S phrases: the root S and two embedded S phrases (the subject NP of the matrix clause and the subject NP of the complement of the VP). The subject NP of the VP complement is deleted under structural identity with the matrix subject NP. This deletion follows the &quot;recoverability of deletion&quot; constraint. Peters next builds a surface string that has a large associated deep structure by embedding this sentence recursively in a construction of the same type, that is, a sentence that has a matrix subject NP structurally identical to the subject NP of a verb complement. But the subject NP now contains more than two S phrases (namely, three). Given identity between the subject NP of the matrix and the subject of the complement, it follows that at the level of deep structure the subject NP of the complement must have the same number of subsentences as the subject NP of the matrix, here, three. The new sentence given below must have a deep structure with more than 2^2 = 4 S phrases in all: 3. Their sitting down promising to steady the canoe threatens to spoil the joke.</Paragraph> <Paragraph position="17"> Clearly, as Peters notes, we can carry out this embedding over and over again. Each time the number of deep structure subsentences is at least doubled, because of the assumption that the complement NP subject is identical to that of the matrix subject. If we let ds(n) be the size of the deep structure corresponding to such a sentence of length n, then we have the inductive formula ds(n) >= 2 ds(n-1). If we solve this recurrence, we find that the number of deep structure subsentences grows as an exponential function of the length of the surface string, exactly the sort of sentence that was to be constructed. If the sentence recognizer must reconstruct this entire deep structure in order to determine language membership, then at least this much space, and hence at least this much time, will be required, just to write down the deep structure.</Paragraph> <Paragraph position="18"> Interestingly, the argument does not work under current versions of transformational theory.
The simple reason is that we no longer explicitly copy material to reconstruct a deep structure; in fact, we no longer rebuild deep structure at all. In place of the literally duplicated subject complement NPs, we have an empty category placeholder, PRO, indexed to the proper antecedent NP as appropriate.[3]</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. \[NP \[NP Their sitting down\]_i promising \[PRO_i to steady </SectionTitle> <Paragraph position="0"> the canoe\]\]_j threatens \[PRO_j to spoil the joke\] Crucially, the indexed PROs are not &quot;nested&quot;. What does this mean and why does this matter? PRO_j is indexed to the entire matrix subject NP their sitting down promising to steady the canoe. But it does not contain as a subpart the PRO corresponding to their sitting down (although it may be indexed to a subpart of a long antecedent string). The underlying predicate-argument structure is fixed without building up an explicit representation of antecedents in the embedded clause, what used to be called &quot;deep structure.&quot;</Paragraph> </Section> <Section position="4" start_page="0" end_page="2" type="metho"> <SectionTitle> [3] The other possibility is that the empty category is a trace, the result of </SectionTitle> <Paragraph position="0"> the movement of an NP from an argument position like the direct object of a transitive verb. Here the empty category is PRO rather than trace because the subject NP position in the complement is not governed by tense or the matrix verb, but the reader may safely ignore this detail for our purposes here.</Paragraph> <Paragraph position="1"> By changing what the representation looks like we have avoided the problem of exponential space growth. At each step we add just a single new element to the reconstructed structure, the new PRO. Our new inductive equation is simply ds(n) = ds(n-1) + 1 - a linear increase in the size of reconstructed forms, compared to the surface sentence lengths. In fact, if we ignored the bracketing and just counted the PROs, at each step we also add a new word (the new verb), and the underlying representations are always just a fixed constant larger than the corresponding surface sentences.</Paragraph> <Paragraph position="2"> One subtle point still remains. At each step we add an indexed element, PRO_i. The index itself must grow as the number of PROs increases. If we assume a standard binary encoding, an index of size i will take log2 i space to write down, not the constant space implicitly assumed just above. Since log i < i, at worst the space added for each embedding will be proportional to i. Summed over n possible embeddings, this is at worst n^2 space, not exponential space. In the next section we shall see how this representation of so-called &quot;empty categories&quot; works in general.[4] Peters's example centered on a &quot;natural&quot; construction that exhibited exponential deep structure growth. We next turn to a more artificial example, but one showing how arbitrarily large deep structures may be used to generate any r.e. language. Kimball (1967) does this by exhibiting a transformational grammar that meets a variant of the Ginsburg, Greibach, and Harrison theorem (due to Haines, cited in Kimball 1967: 185).
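Before turning to Kimball's construction, the contrast between the two analyses above can be summarized by writing out and solving the two recurrences (a sketch; constants are suppressed):

\[ ds(n) \ge 2\,ds(n-1) \;\Rightarrow\; ds(n) \ge 2^{\,n-1}\,ds(1), \qquad ds(n) = ds(n-1) + 1 \;\Rightarrow\; ds(n) = ds(1) + (n-1). \]

Literal copying forces exponential growth in the reconstructed structure, while one PRO per embedding gives only linear growth. Even charging log2 i bits for the i-th index, the total extra space is

\[ \sum_{i=1}^{n} \log_2 i \;\le\; n \log_2 n \;\le\; n^2, \]

the quadratic bound mentioned above.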
In brief, Kimball sets up a base context-free grammar to generate two trees, rooted at S1 and S2, corresponding to the two context-free languages demanded by the homomorphism theorem (CFL1 and CFL2), and a third tree dominating these two that eventually &quot;simulates&quot; the homomorphism H. S1 dominates a terminal string labeled x and S2 a terminal string labeled y. Dominating these two subtrees is a third tree that, besides x and y, dominates a terminal string z.</Paragraph> <Paragraph position="3"> The idea is to use a single transformation to successively check that the first member of x matches the first member of y and z; if so, this element is erased in x and y. If all elements match, and we are at the end of z (indicated by a special symbol), then the two strings are identical; this step carries out the intersection of the two context-free languages. A final transformation performs the required homomorphism. Figure 1 depicts the overall scheme. It is important to point out that both S1 and S2 generate context-free languages that are self-embedding, of the general form a^i c a^i. Thus they must exhibit recursion on some nonterminal node in the relevant context-free grammar.</Paragraph> <Paragraph position="4"> As Kimball notes, the strings x and y are deleted under identity with z, so nothing is amiss here in the Aspects theory. x and y are arbitrarily long. The underlying &quot;deep structure&quot; (the context-free base) is arbitrarily larger than the resulting surface string, namely, some part of z that remains after the homomorphism does its work.</Paragraph> <Paragraph position="5"> What happens to this example in a modern transformational theory? The key point is that the modern theory does not have a deletion operation like the one just presented. Instead, a constituent is moved from one position to a &quot;landing site&quot; within its own cyclic domain or to the next higher cyclic domain.[5] In English, the cyclic nodes are S and NP.[6] When a node is moved, it leaves behind a trace, denoted e, of the same category as the displaced constituent, but with no phonological features.</Paragraph> <Paragraph position="6"> (Thus the trace is not &quot;pronounced&quot; and does not show up in the surface sentence.) The trace is co-indexed to the displaced constituent, as indicated by a subscript. For example, given the sentence 5. John bought what, we could move what, yielding (after some adjustment with the auxiliary verb) 6. What did John buy e_i. Now consider Kimball's tree structures again. Since S1 and S2 are true recursive sub-trees, in a trace-oriented theory the way that we would get deletion would be to successively move elements of x and y to higher and higher phrases, leaving behind traces (denoted by e_i) as we go.</Paragraph> <Paragraph position="7"> Schematically, our output structure would have to look something like that in figure 2, where R indicates a cyclic node. As it stands, though, this structure is impossible because it requires traces to be linked to elements that are &quot;too far away&quot;: according to a key constraint of the modern transformational theory, the subjacency constraint, the linking can cross at most one cyclic node.</Paragraph> <Paragraph position="8"> Since all recursion must eventually pass through S or NP nodes, subjacency must be violated by the trees pictured in figure 2. Put another way, the rule that &quot;erases&quot;, for example, x_1, now must move x_1 across many S or NP nodes, and this movement is not directly possible.
An alternative is to move x_i successive-cyclically, up the chain of R nodes one step at a time. But there are only two ways to do this: either we wind up moving more and more nodes at each step - at the nth step we move n nodes, which must &quot;land&quot; at n spots at the next higher cyclic domain - or we collapse how many nodes we move by adjoining some of the moved elements together. Figure 3 shows both possibilities.</Paragraph> <Paragraph position="8"> Both solutions are ruled out in current theories of transformational grammar. The movement of an arbitrary number of nodes in a single cycle is impossible because it calls for an arbitrary number of &quot;landing sites&quot; in domain n+1. In fact, there can only be a finite number of such possibilities, as specified by a set of context-free base rules. For example, we can move an NP to a subject or object position, a wh phrase to a COMP position (the position occupied by that in I know that Mary likes ice cream).[7] But we cannot move n nodes in a single cycle, because there will not be enough places to put the moved constituents.</Paragraph> <Paragraph position="10"> Figure 1 (schematic). Kimball's deletion transformation - structural condition (roughly): \[S1 X x\] \[S2 Y y\] Z z, where x = y = z and x, y, z are terminal symbols; structural change: delete x and y.
[4] There is another solution to the index growth problem, one that will be required later on. Suppose that each indexed NP or PRO is in effect a distinct element of the grammar's vocabulary. Thus the grammar allows a denumerable infinity of &quot;pre-indexed&quot; elements NP1, NP2, .... This is not such a strange proposal, since the index is not used for any syntactic process, but simply for co-indexing. As we shall see, this same proposal is made, usually implicitly, in most current theories, for example, in the lexical-functional theory.</Paragraph> <Paragraph position="11"> The second solution is also ruled out. Either the adjoined NPs linked to the traces violate subjacency (as pictured), or else we must also adjoin at each step i a copy of the (i-1)st trace to the ith trace. But this last method is also barred, because we do not admit &quot;nested&quot; traces, or tree structures that dominate some arbitrarily deep sequence of nested empty elements. In other words, a trace can be co-indexed at its &quot;top level&quot; to a displaced constituent, but is otherwise &quot;opaque&quot;; it has no interior structure. There are in fact good linguistic reasons for a principle banning nested traces (see Hornstein 1984).</Paragraph> <Paragraph position="12"> Section 3 probes the formal implications of this constraint in more detail.</Paragraph> <Paragraph position="13"> It is hard, then, to see how the required trace structure linking x and z could even be built. But this is just the first step in Kimball's proof. Enforcing equality between x and y looks even harder. Of course, this by no means shows that there is no way to carry out Kimball's construction, but it does hint at some of the difficulties in a revised grammatical framework that does not permit the same liberties with deletions as Aspects, and does not rely on an explicitly reconstructed D-structure.</Paragraph>
<Section position="1" start_page="0" end_page="2" type="sub_section"> <SectionTitle> 2. The Complexity of Modern Transformational Grammar </SectionTitle> <Paragraph position="0"> As we have seen, the crux of the problem with Aspects-style transformational grammars is deletion, and, more pointedly, the demand to recover unboundedly large deep structures in order to determine sentence-hood. The proofs of intractability all hinge on the assumption that the job of the parser is to recover a literal copy of deleted elements. If this assumption is not needed, then the job of the recognizer could well be easier. The modern theory requires only the recovery of a trace- or PRO-augmented structure, an &quot;annotated surface structure&quot;. This makes a difference. As Lapointe (1977) shows, it makes the recognition problem for such languages recursive. Whatever the merits of their arguments on other grounds, Lapointe's result renders moot Bresnan and Kaplan's concerns (1982: xli) about the non-recursiveness of transformational theory, since their criticisms apply only to the older Aspects theory. This is our first conclusion.</Paragraph>
[7] See the next section for more on &quot;landing sites.&quot;
<Paragraph position="1"> Much more than this can be said. If the recognizer does not have to recover full deep structures, then its job could be much easier, as observed by Peters and Ritchie (1973): Putnam proposed that the class of transformational grammars be defined so that they satisfy a &quot;cut-elimination&quot; theorem. We can interpret this rather broadly to mean that for every grammar G1 in a class there exists G2 such that (i) L(G1) = L(G2) and (ii) there is a constant k with the property that for every x in L(G2), there is a deep phrase marker p underlying x with respect to G2 such that l\[d(p)\] &lt;= k l(x). (1973: 81-82) Here, the notation l(x) stands for &quot;length of&quot;, while d(p) is the &quot;debracketization&quot; of the deep structure. The debracketization consists of terminal elements sans right and left brackets, but with traces and PROs. As Peters and Ritchie go on to say: We now see that any grammar satisfying such a cut-elimination theorem generates a language which, more than being recursive, is context sensitive. This is so because a nondeterministic linear bounded automaton can determine both that a labeled bracketing p is strongly generated by a context-sensitive grammar and that it underlies a given string x if the automaton has enough tape to write p. (1973: 82) How would such a linearly bounded recognizer work? Roughly, it would use a kind of &quot;analysis by synthesis&quot;: given a sentence of length n, it would mark out a length of input tape kn, k a constant depending on the transformational grammar. The machine would be guaranteed that annotated surface structures could not get larger than this.</Paragraph> <Paragraph position="4"> The machine would then use its nondeterministic power to &quot;guess&quot; all possible annotated surface structures less than or equal to this length, now with the proviso that one of them must be a correct underlying structure if the sentence in question is in fact in the language generated by the grammar.
Since the number of (NP or S) cycles in each structure is bounded, we may simply try all possible transformational rules (again nondeterministically) to produce possible surface sentences, one at a time. If there is a match, then the sentence is in the language; if all structures less than our bound are tested and fail, then the sentence is not in the language.[8] What then of modern transformational grammar? We claim that D-structure need not be reconstructed at all to determine grammaticality. This may be a surprise for some readers accustomed to the older picture of a transformational grammar, where annotated surface structure is just the result of mapping from D-structure under the operation of Move alpha. But it is nonetheless true. Chomsky (1981: 91ff.) observes that annotated surface structures may be simply defined with respect to certain admissibility conditions (more on these shortly) without regard to an actual movement rule that maps from one level to another.[9] Our goal, then, will be to assume that only annotated surface structure is built to test grammaticality.[10] We must now define more carefully just what annotated surface structure is in the current GB theory. We then show that these representations are at most linearly larger than their corresponding surface sentences.</Paragraph> <Paragraph position="5"> We begin simply by describing the set of admissible annotated surface structures without reference to D-structure. That is, we define the set of annotated surface structures statically, in the manner that Joshi and Levy (1977) define a set of admissible tree structures. Roughly, the annotated surface structures of a given grammar are just the set of all well-formed labeled bracketings produced by the constraints of X-bar theory plus the restrictions imposed by lexical subcategorization, plus bracketings where empty categories appear in certain positions, governed by a fixed set of conditions. In more detail, the well-formed annotated surface structures are defined inductively as follows:[11] (I): Following standard assumptions, X-bar constraints along with locality conditions on subcategorization together yield a system describable by a context-free grammar (see, e.g., Gazdar and Pullum 1981). All NPs dominate some lexical material and correspond in one and only one way to the A positions, arguments subcategorized by the relevant verbs, again following the method outlined by Joshi and Levy (1977); the positions in English are: adjacent to the verb, for an object NP; first NP under S, for a subject NP; first NP under PP, for an oblique PP; and so forth.[12] Further, all such lexical NPs must appear in argument (A) positions, where the notion of an argument position again depends in a strictly local way on the verb (e.g., the subject position of seem in English is not an argument position). Finally, we allow a finite number of specified lexical deletions (of particular words), such as of, you, in any single phrase, as long as no other constraints are violated. All labeled bracketings meeting these conditions are well-formed annotated surface structures.
[8] The details of the testing procedure are not given here, but may of course add some fixed space to the kn bound required to write down the annotated surface structures.</Paragraph> <Paragraph position="6"> [9] At least, this seems to be so for all cases in English. But a note of qualification is required. There may be subtle examples showing that D-structure must be explicitly rebuilt in order to test grammaticality. Such examples do not seem to arise in English, but they may in other languages, such as Italian.
So for example, it may be in Italian that the grammaticality of such examples as was built a house may demand explicit reference to D-structure, in order to determine whether a verb is a real passive or merely adjectival. If so, then the conclusions in the main text might not hold, since D-structure would have to be built.
[10] Note that this is true of the Marcus parser (Marcus 1982).</Paragraph> <Paragraph position="7"> [11] Even this account is incomplete in some details, ignoring certain alternative formulations of the theory. But these defects can be repaired at the cost of adding more or slightly different clauses to the definition. For example, we omit a discussion of clitics, verb movement, government defined as mutual c-command, or subject-verb agreement. This last constraint may be defined via lexical insertion contexts, following Chomsky (1965) as formalized by Joshi and Levy (1977).</Paragraph> <Paragraph position="10"> (II): Any of the structures of (I) with empty categories (e.c.'s) replacing NPs, subject to the following conditions, are well-formed annotated surface structures: (i) Every such e.c. is an atomic constituent with a numerical index and with no internal bracketing; (ii) If the e.c. is governed by some X0 (a lexical element such as a verb, noun, and so on), where X governs Y iff the first branching node dominating X dominates Y and there is no intervening maximal projection (full phrase) between X and Y, then: 1. the e.c. must be c-commanded by an NP antecedent (= element with the same numerical index), where c-command is defined just as government but dropping the clause about maximal projections; and 2. the antecedent is either a lexical NP or another e.c. in a non-argument (A-bar) position (the complement of the A positions defined above); and 3. the e.c. must be subjacent to that antecedent, where subjacency is defined as usual.</Paragraph> <Paragraph position="11"> (iii) Else, the e.c. is ungoverned (is a PRO) and can receive an arbitrary index.[13] (III): Any of the structures defined by (I) and (II), and, in addition, with a wh phrase in COMP position c-commanding a governed e.c., or another wh phrase in COMP position, and with the same index as that other e.c. or phrase, is a well-formed annotated surface structure.</Paragraph> <Paragraph position="12"> (IV): Any of the structures defined by (I)-(III), and, in addition, with one of those structures with an e.c. having an index the same as that of an element adjoined to VP (following Baltin 1982), and c-commanded by and subjacent to that element, is a well-formed annotated surface structure.[14] There can be at most one such adjoined position.</Paragraph> <Paragraph position="13"> (V): Any of the structures defined by (I)-(IV) conjoined so as to meet Williams's (1978) Across the Board (ATB) conventions is a well-formed annotated surface structure.[15]
[13] Subject to constraints dictated by &quot;control&quot; theory, that is. We ignore this matter here by assuming an arbitrary index for PRO; this does not bear in any essential way on the description of the possible annotated surface structures.
Neither c-command nor subjacency seems required for control; hence this may fall under whatever mechanism it is that interprets the indices of ordinary pronouns generally, a matter we leave outside the scope of annotated surface structure.</Paragraph> <Paragraph position="14"> [14] Note that the position so adjoined to VP is not part of the obligatory argument structure mentioned by the verb.</Paragraph> <Paragraph position="15"> Without going into detail on the ATB constraint, its effect is to place e.c.'s in a conjunct if its verb or verb phrase is missing; the e.c. is bound to a c-commanding antecedent, as before.</Paragraph> <Paragraph position="16"> In the case of more complex reduced conjunctions (The meat is ready to heat, serve, and eat) the missing constituent sequence may be represented by a single e.c. in each missing position, though alternative analyses are to be preferred here.[15] (VI): Nothing else is a well-formed annotated surface structure.</Paragraph> <Paragraph position="17"> We also need some technical constraints. As pointed out above, we must assume that the actual index of an NP (as denoted by a subscript) takes up no extra space beyond the constant storage required for a distinct nonterminal name. Otherwise the amount of space required to write down an annotated structure of length proportional to n could be at worst proportional to n log n, where the log n factor is used to hold the number of the index n. To sidestep this problem we assume a denumerable infinity of distinct NP &quot;names&quot;.</Paragraph> <Paragraph position="18"> This same assumption must be made explicitly or implicitly in any theory that assumes co-indexing but still strives for linearity in the size of underlying structures. Consider Kaplan and Bresnan's sketch (1982: 263-267) that lexical-functional language recognition uses only linear space. In the full description of lexical-functional languages, names are distinguished as co-referential or not. Thus, two occurrences of, say, Mary must be distinct. In the LFG formalism, this is indicated by subscripts. (See, for example, Kaplan and Bresnan's examples, 1982: 225-227.) But then, this means that the sheer size of a functional structure (f-structure), the lexical-functional analog of an annotated surface structure, could be of size n log n, again with log n space for the indices. Just writing down one f-structure could take more than linear space. Kaplan and Bresnan do not say in any detail just how they intend to check for sentence-hood using just linear space, but since all of their descriptions involve building an f-structure, we may assume that at least this much space will be required. In short, in order to obtain a linear space bound, Kaplan and Bresnan need to adopt exactly the proposal made above.</Paragraph> <Paragraph position="19"> A second key assumption is that traces may not be nested; a trace cannot contain another trace. This ban is required because otherwise we could build a tree representation containing just empty elements (the traces). Since a tree can be arbitrarily large, a single NP or S domain could have an arbitrarily large but surface-empty structure of elements, just what was to be avoided. We are now ready to state just what we want to show.</Paragraph> <Paragraph position="20"> Theorem. Let G be a government-binding grammar, and L(G) the language it generates.
Let AS_i be the annotated surface structure(s) associated with sentence w_i in L(G). (If there is more than one such annotated surface structure, then AS_i is a set of annotated surface structures; AS_i is a singleton set if there is just one annotated surface structure.) Then there is a constant k such that for all sentences w_i in L(G), and for all annotated surface structures AS_i underlying w_i, |AS_i| &lt;= k|w_i|. The proof proceeds by induction on the number of cycles (S or NP domains) in an annotated surface structure corresponding to a sentence in L(G). First we shall show that the length of a one-cycle annotated surface structure is linearly proportional to its corresponding surface sentence. This will be easy, since within a single cycle (S or NP domain), there can be movement to at most a fixed number of &quot;landing sites&quot; as defined above: the A positions, plus COMP, plus one adjunct to a VP. The lexical entry for a verb mentions only a finite number of such arguments. The one additional landing site adjoined to VP can receive only one phrase, because in order to receive more, additional phrases would have to be adjoined in the manner of the Kimball-type structures discussed in the previous section. But these would violate subjacency.[16] Once we have established linearity in the base case, we then look at annotated surface structures i and i+1 cycles deep. Assuming that structures of depth i maintain linearity, we show that those of depth i+1 do also. This step is tedious, since one must go through the possible ways to obtain the i+1 cycle from the one preceding it, one by one. The landing site analysis is exploited here, as is subjacency. The empty category analysis is also used. Subjacency helps because there is no way to &quot;skip&quot; a cycle, constructing structures of depth i+2 from those of depth i directly. Proof. Basis step: i = 1 (bottom cycle, no embedded sentences or NPs). Given a surface sentence w_i, we consider the length of the corresponding annotated surface structure. Let s = the length of the surface sentence. There are four cases. Case 1. No e.c.'s in the S or NP cycle, and no specified lexical deletions. Assume a context-free base with no useless nonterminals or cycles, and with rules where the length of the longest righthand side is p. If m = the number of nonterminals in the derivation of a sentence in this grammar, then m &lt;= cs for some fixed positive integer c, as may be easily verified by induction. In addition, to write down the annotated surface structure, we must add two bracket labels for each nonterminal symbol. Thus |AS_i| = 2m + s &lt;= 3cs. Note that if we wanted to establish a relationship between debracketed annotated surface structures and surface strings, then this last step would be unnecessary.</Paragraph> <Paragraph position="21"> Case 2. A finite number of specified lexical deletions within this cycle, e.g., of, as in all of the people -&gt; all the people, or an imperative (if a root sentence).
[15] For a more recent formulation of the ATB conventions as the linear union of phrase markers, see Goodall (1983). We note in passing that the phrase marker union also preserves linearity of annotated surface structures.
[16] Recall that now we are applying subjacency as a static constraint on annotated surface structures.
In fact, since in the basis step we consider only annotated surface structures one S or NP cycle deep, this case does not arise.</Paragraph> <Paragraph position="22"> Let the maximum number of these deletions be K. Then |AS_i| &lt;= 3cs + 3K &lt;= (3c + 3K)s. Let c' = 3c + 3K. Then |AS_i| &lt;= c's. Again, we can omit the 2K factor for the debracketed case.</Paragraph> <Paragraph position="25"> Case 3. Empty categories within this (S or NP) cycle, with antecedents in the same S or NP. There are a finite number of such positions, as described earlier: only NP argument positions (thematically marked by the verb), or the adjoined position to VP. Let C bound this number from above. Then |AS_i| &lt;= cps + C; using the same approach as in Case 2, the righthand side of this inequality is less than c's. Clearly, combinations of Cases 2 and 3 cause no problems, because we can add a constant number of deletions together with the constants obtained from within-cycle e.c.'s to obtain a new constant factor.</Paragraph> <Paragraph position="26"> Case 4. Empty categories in the cycle without antecedents in that domain. If an empty category exists in an S or NP without an antecedent (NP, wh, etc.) in that domain, then clearly the corresponding surface string is shorter than the corresponding annotated surface structure, since it does not include the empty category symbols. However, again the addition of each empty category symbol adds just one to the total annotated surface structure length, and there are at most a finite number of such positions (A-bar positions, such as COMP, as described earlier). Therefore, the corresponding annotated surface structure is just a constant longer than the corresponding surface sentence, as in Case 3.</Paragraph> <Paragraph position="27"> Well-formed annotated surface structures exhibiting the features of gapping, VP deletion, and conjunction reduction do not show up at this step, since they combine two i-level cycles into an i+1 domain. They are considered in the induction step. This completes the basis step.</Paragraph> <Paragraph position="28"> Induction step. Suppose that up through cycle i we have that |AS_i| &lt;= ks_i, where s_i is the terminal length at the ith cycle, and k is a constant. We now must show that this relation holds for structures of depth i+1. There are five possibilities.</Paragraph> <Paragraph position="29"> Case 1. No empty categories at the top level of cycle i+1. Then the terminal string associated with this cycle consists of two parts: whatever terminals are introduced directly by nonterminals in cycle i+1, and new elements of cycle i+1 bound to e.c.'s in cycle i. But there are a finite number of empty category sites for material in the current domain, by the definition of a well-formed annotated surface structure. Call this number C. By the inductive hypothesis, any of these constituents themselves meet the condition that their annotated surface structures are bounded above by a linear multiple of their terminal strings. Thus the total annotated surface structure for the current cycle is at most C times the bound on previous cycles, plus a constant to accommodate the length of terminals introduced directly in cycle i+1.</Paragraph> <Paragraph position="31"> Case 2. Specified deletions in cycle i+1. If there are a finite number of specified lexical deletions, this is just like the basis case.
This case includes the introduction of PROs. PRO can appear in a finite number of new positions in cycle i+1 (the subject position, if ungoverned).</Paragraph> <Paragraph position="32"> Case 3. E.c.'s with antecedents within cycle i+1. The demonstration proceeds as in the basis case.</Paragraph> <Paragraph position="33"> Case 4. E.c.'s with antecedents in cycle i+2. Again, like the basis case. This cannot change the linearity bound.</Paragraph> <Paragraph position="34"> Case 5. Annotated surface structures with empty verb, verb phrase, and coordinate reduction positions. This is the only new situation that arises in the induction step as opposed to the basis step. Suppose we have a conjunct formed by deleting material from each of n conjuncts. An example is the meat is ready to take out of the fridge, heat, and serve. The example is from Rounds (1975: 137), attributed to E. Bach. If such constructions involved actual recovery of deleted deep structure material, then problems could arise. The literal material would have to be copied, and we could get a linearity-violating Peters-type sentence.</Paragraph> <Paragraph position="35"> But this problem can be avoided with an interpretive approach governed by the &quot;across the board&quot; conventions of Williams (1978). We supply indices, not actual copied material, for the well-formed annotated surface structure. The ATB constraint lines up the conjuncts to be co-ordinated, one under the other. For example, a sentence like the meat is ready to heat, serve, and eat is factored as follows, where we have deleted duplicate material in the other conjuncts: The meat_i is ready to heat e_i. We can represent term (2) as an unordered set of lexical items, for example, heat, eat, serve. Plainly, this representation cannot be more than linearly larger than the surface sentence.[17] Similar results hold for empty categories linked to verbs and verb phrases. Each cyclic domain of the associated annotated surface structure contains a constant number of empty VP &quot;gaps&quot;, denoted \[e\]; there can be at most one main verb, VP, or auxiliary verb sequence per cyclic domain. Therefore, the total number of gaps in the conjoined structure is bounded from above by a constant times s_{i+1}, the length of the terminal string.
[17] Again, the Goodall (1983) representation would be suitable here.</Paragraph> <Paragraph position="36"> This exhausts the range of possible cases, completing the induction and the proof.</Paragraph> <Paragraph position="37"> The linearity demonstration shows that restricting deletions has a powerful effect on the weak generative capacity of a transformational grammar. The implications of that result for linguistic description are discussed in the next section.</Paragraph> </Section> </Section> <Section position="5" start_page="2" end_page="2" type="metho"> <SectionTitle> 3. The Root of Complexity: Lexical-Functional Grammars and GB Grammars </SectionTitle> <Paragraph position="0"> The results of the previous section show something about the weak generative capacity of modern transformational grammars. The study of weak generative capacity is not an end in itself, however. In the best case, we would like weak generative capacity to be a kind of diagnostic aid to tell us that something is amiss with a linguistic theory. We would like our theory to be able to describe all and only the natural languages.
A theory could fail to do this in two ways, either in terms of weak generative capacity or in terms of strong generative capacity. A theory that is too powerful could generate either unnatural tree structures (and so be too powerful in terms of strong generative capacity) or unnatural sentences (and be too powerful in terms of weak generative capacity). If we are interested in the rule systems (grammars) that underlie linguistic behavior, then it is ultimately strong generative capacity that is of interest. Still, weak generative capacity can help here to point the way to excess strong generative capacity. We will not want to stop at diagnosis, however. We also want to determine just why a particular theory can generate too many languages - what the source of its excess power is. We saw that with Aspects transformational grammars the additional power lay with unbridled deletion. What of other recent theories of grammar? In this section we shall present an example of exactly this kind. This will be a language that is presumably not a natural language. We will use this language as a &quot;probe&quot; into the power of current linguistic theories. We shall see that this language can be easily generated by a lexical-functional grammar, but not by a GB grammar. More important, this weak generative result has a strong generative capacity reflex. We can use this result to locate the excess power of the LFG system. This could be of value in discovering restrictions for the LFG system. In terms of strong generative capacity, the more important goal, we shall see that the LFG theory has the ability to define unification predicates over hierarchical tree structures, something unavailable in the GB theory. This extension of the traditional definition of linguistic predicates has implications for the ability of LFG to describe unnatural grammars, not just unnatural languages.</Paragraph> <Paragraph position="1"> Here is what we mean to show in more detail. LFGs use a particular kind of unification machinery (described below) in order to account for well-formed sentence structures of Dutch (Bresnan, Kaplan, Peters, and Zaenen 1982).</Paragraph> <Paragraph position="2"> This unification procedure is central to the construction of the grammatical structures of lexical-functional theory.</Paragraph> <Paragraph position="3"> But it is also powerful enough to describe grammars quite unlike any natural grammatical system. By changing the Dutch LFG only slightly we can produce a rule system that allows &quot;object control&quot; via a preceding NP (as in Mary persuaded John to leave) just in case the NP in question and the controlled position are equally deeply embedded.</Paragraph> <Paragraph position="4"> This we take to be an unnatural rule system.</Paragraph> <Paragraph position="5"> To begin, we present our artificial &quot;diagnostic&quot; language, the power of 2 language, L2 = {a^i | i is a power of 2}. L2 is a lexical-functional language, since it is generated by a lexical-functional grammar whose annotations on the nonterminals enforce the restriction that the same number of A expansions be taken on each subtree; expansions are symmetric all the way down to the &quot;words&quot;, the a's. This guarantees a power-of-2 expansion; the details are left to the reader.</Paragraph> <Paragraph position="8"> We can now ask deeper questions. First, why can lexical-functional grammars generate such languages? More on this shortly. Second, can L2 be generated by a GB grammar?
To answer the second question first: the answer here seems to be no, because of a property of GB languages that is violated by L2, namely, the constant growth property, defined and discussed for tree adjunct grammars by Joshi (1983). This property will only be briefly explored below; for more complete remarks, see Berwick and Weinberg (1984).</Paragraph> <Paragraph position="9"> If we arrange the sentences of L2 in order of increasing length, we see that they become farther and farther apart. In fact, for any fixed set of constants C, we can always find a sentence of L2, say w_i, such that there is no w_j in L2 with |w_i| = |w_j| + c, for c in C. We state this property as follows: Definition. A language L is said to possess the constant growth property (or be constant growth) if and only if there exists a constant M and a set of constants C such that for all sentences w_k in L with |w_k| &gt; M, there exists another sentence in L, w_k', such that w_k is at most a constant longer than w_k': |w_k| = |w_k'| + c, for some c in C. A grammar is said to possess the constant growth property iff the language it generates is constant growth.[18] Lexical-functional grammars, then, are not necessarily constant growth. In contrast, government-binding grammars cannot generate such languages, because they are constant growth. Intuitively, the demonstration works much like the linearity proof. For a full discussion, see Berwick and Weinberg (1984: Appendix A). The point is that no government-binding grammar can generate L2 or any non-constant-growth language.[19]
[18] So far as can now be determined, constant growth seems to be a purely mathematical property of natural languages that has no clear &quot;functional&quot; reflection. Presumably, constant growth is a derivative of other, deeper properties of natural languages.</Paragraph> <Paragraph position="10"> What is it that gives lexical-functional grammars their ability to define languages like L2? LFGs can test complete subtrees for compatibility. At a dominating node we can check whether an entire hierarchical structure is feature-compatible with another structure. This follows from the account of functional structure unification defined by Kaplan and Bresnan (1982). Functional structures are hierarchical in nature; they are directed, acyclic graphs. Functional structure well-formedness is defined by the condition of functional structure uniqueness. Roughly speaking, there can be no conflicts in the assignment of feature complexes, even if those features are in fact hierarchical structures.</Paragraph> <Paragraph position="11"> This kind of feature compatibility test goes well beyond that required for the checking of &quot;ordinary&quot; agreement, as in subject-verb number agreement. When we test a subject and verb for agreement, all that we do is check an unordered list of features for compatibility. The number, gender, and so forth of the subject NP must agree with that of the verb, as percolated through the VP. It is a far cry from this kind of agreement checking to the &quot;agreement&quot; of two entire tree structures, but this is what is implied by the lexical-functional unification procedure.[20] As we saw in our earlier example, this unification procedure is sufficient to generate the power of 2 language. A natural question to ask, then, is whether this ability to compare entire functional structures is necessary.
For if all cases of functional structure unification can be replaced by unordered feature agreement tests, then there is no motivation for adopting the more powerful mechanism, at least on these grounds.</Paragraph> <Paragraph position="12"> The lexical-functional theory is, perhaps, already committed to the ability to test hierarchical functional structures for compatibility. For functional structures are certainly hierarchical in nature. They must encode the hierarchical relationships between root and embedded propositions, for example. A functional structure is used as the input to semantic interpretation, and so must reflect hierarchical dependencies. Otherwise we cannot decipher the relationships in a complex sentence like John expected Mary to persuade Bill to win. The feature checking machinery must be designed to test for functional structure compatibility because that is the only level of representation where features like the number of the subject are to be found. But once we permit feature checking of functional structures at a single, unembedded level for the number of a subject NP, it is hard to see how we can rule it out for a more complex functional structure.</Paragraph> <Paragraph position="13"> In fact, lexical-functional researchers have proposed natural language cases where one must check one complex functional structure for compatibility against another.</Paragraph> <Paragraph position="14"> [19] Another example is the language of perfect squares.</Paragraph> <Paragraph position="15"> [20] Several other theories also adopt a directed, acyclic graph notation for features, among these Kay's (1982) unification grammar and Shieber's (1983) PATR II formalism. Interestingly, Sag et al. (1984) adopt the more restricted view of features.</Paragraph> <Paragraph position="16"> Just such a case has been discussed by Bresnan, Kaplan, Peters, and Zaenen (1982), in the analysis of certain Dutch sentences. We will not review all the details of their proposal here except to establish the point that hierarchical functional structure comparisons are crucially implicated. The data Bresnan et al. want to account for is this. Dutch contains infinitely many sentences of the following sort (examples from Bresnan et al. 1982: 614):
... dat Jan de kinderen zag zwemmen
... that Jan the children saw swim
... that Jan saw the children swim
... dat Jan Piet Marie de kinderen zag helpen laten zwemmen
... that Jan Piet Marie the children saw help make swim
... that Jan saw Piet help Marie make the children swim
These Dutch sentences must have a certain constituent structure. Their proposed structure consists of two branching &quot;spines&quot;, one a right-branching tree of VPs containing objects and complements, the other a right-branching tree of Vs containing verbs without their objects and complements. Every verb uses its lexical argument structure to demand certain NP objects, or to demand that the verb complement's subject be controlled by the verb's object or subject. For example, the verb zag demands that its object control the subject of zag's verbal complement. This is analogous to the English case where the object of a verb, for example persuade, controls the subject of persuade's complement, as in We persuaded John to leave.</Paragraph> <Paragraph position="18"> The lexical-functional system encodes this agreement in number of verbs and NPs by forcing an identification between the functional structure of the object of zag and the functional structure of the verb complement of zag (denoted VCOMP).
The &quot;equation&quot; is written (up VCOMP SUBJ) = (up OBJ). The problem, of course, is that if we have three verbs then we have three such constraints, but the associated NPs that satisfy them lie along a distinct VP &quot;spine&quot; of the constituent structure tree that is separated from the verbs along the V spine. In other words, the &quot;control&quot; equations are built up along the rightmost, V spine of the constituent structure tree, but the NPs that satisfy these equations lie along the left side. How can we assemble the NP functional structures for proper checking against the control equation demands? Because feature checking can occur only at some common dominating mother node, the first place where all elements are &quot;visible&quot; to each other is at the first VP node completely dominating both right and left subtrees. The way that Bresnan et al. accomplish this task is to build up along the rightmost subtree a functional structure representation that encodes all of the control equations, in the form of a hierarchical functional structure with unfilled slots for the subjects and objects mentioned by the controlling verbs.</Paragraph> <Paragraph position="19"> Note that the structure is indeed hierarchical, containing embedded components. Along the lefthand side of the constituent tree Bresnan et al. build up a second hierarchical functional structure that &quot;merges&quot; successfully into the righthand one just in case the number of NPs and their assignment to controlled positions meshes with the &quot;slots&quot; left remaining in the righthand functional structure.</Paragraph> <Paragraph position="20"> One must build and check a hierarchy of features because, in order to encode the possibility of an arbitrary number of controlled NPs below the dominating VP node, we must adopt some means of encoding a potentially arbitrary number of features (denoting each of the NPs and their associated verbs). But given that the functional structure &quot;equations&quot; annotating the underlying context-free grammar are fixed once the grammar is written down, the only way to do that is by building up some recursive structure that mimics the constituent structure derivation as a chain. (With only a finite number of features, we can only encode an infinite number of different cases by means of chains or trees.) This means that Bresnan et al. are forced to adopt hierarchical feature checking as the means to describe the Dutch sentences.</Paragraph> <Paragraph position="21"> In contrast, the government-binding theory represents the same pairing of NPs and verbs via a &quot;flat&quot; co-indexing scheme. Jan de kinderen zag zwemmen would be roughly NP_1 NP_2 \[V V_1 V_2\] in annotated surface structure (see Evers 1975 and Berwick and Weinberg 1984). As outlined in Berwick and Weinberg (1984), potential co-indexings can be evaluated by non-erasing pushdown transductions that test only single, unanalyzed nodes, never building up tree-structured features as in the lexical-functional grammar example. (Note again that D-structures are not reconstructed to carry out this check.) The problem for the lexical-functional machinery is that once hierarchical checking is admitted for this one example, there is nothing to bar it in other cases. But then the power of 2 language can be generated.
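To make the contrast concrete, here is a minimal sketch in Python; the feature names (OBJ, VCOMP, PRED) echo the discussion above, but the encoding as nested dictionaries and the &quot;none&quot; end-marker are invented for illustration and are not the Kaplan-Bresnan formalism. A flat agreement test compares an unordered bundle of atomic features, while hierarchical unification recurses into embedded structures, so its success can depend on how many levels of embedding each side has:

    def flat_agree(a, b):
        """Ordinary agreement check: compare atomic features only (number, person, ...)."""
        return all(b.get(k) == v for k, v in a.items() if not isinstance(v, dict))

    def unify(a, b):
        """Hierarchical unification over nested feature structures; None signals a clash."""
        if not isinstance(a, dict) or not isinstance(b, dict):
            return a if a == b else None        # atomic values must match exactly
        merged = dict(a)
        for key, value in b.items():
            if key in merged:
                sub = unify(merged[key], value)
                if sub is None:
                    return None                 # conflict somewhere inside
                merged[key] = sub
            else:
                merged[key] = value
        return merged

    def vcomp_chain(depth, obj_pred):
        """Right-branching f-structure with `depth` VCOMP embeddings above an object."""
        f = {"OBJ": {"PRED": obj_pred}, "VCOMP": "none"}   # bottom: no further complement
        for _ in range(depth):
            f = {"VCOMP": f}
        return f

    # Flat agreement: an unordered bundle of atomic features, as in subject-verb agreement.
    print(flat_agree({"NUM": "sg", "PERS": "3"}, {"NUM": "sg", "PERS": "3", "CASE": "nom"}))  # True

    # Hierarchical unification: success depends on how deeply the two sides are embedded.
    control_side = vcomp_chain(2, "kinderen")   # equations built up along the verb spine
    np_side_ok   = vcomp_chain(2, "kinderen")   # NP structure embedded to the same depth
    np_side_bad  = vcomp_chain(1, "kinderen")   # one level too shallow
    print(unify(control_side, np_side_ok) is not None)    # True
    print(unify(control_side, np_side_bad) is not None)   # False: depths do not line up

The last two lines are the point at issue: once the check recurses, whether two structures unify can turn on counting levels of embedding, exactly the kind of predicate that, the text argues, natural-language rules never need.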
<Paragraph position="21"> The idea behind our unnatural grammar is this. We will build a grammar where a verb controls a higher object NP just in case both the verb and that NP are essentially equally deeply embedded along different &quot;spines&quot; of the constituent structure tree. This we take to be a highly unnatural system. There is no natural language where a control property &quot;counts&quot; in this way.</Paragraph> <Paragraph position="22"> We need these context-free rules and their functional structure annotations:

1. VP → NP VP (V̄)
       (↑ SUBJ)=↓   (↑ VCOMP)=↓   ↑=↓
2. VP → NP
       (↑ OBJ)=↓
3. V̄ → V V̄
             (↑ VCOMP)=↓
4. V̄ → V
5. NP → N

Rules (2)-(5) are precisely those used by Bresnan et al.</Paragraph> <Paragraph position="23"> Rule (1) is different. (1) has the associated equation (↑ SUBJ) = ↓ attached to the NP node instead of the equation (↑ OBJ) = ↓. We must also add new lexical entries for the following &quot;verbs&quot;:</Paragraph> <Paragraph position="25"> The effect of this modest change is a rule system that has exactly the properties we claimed. Consider first the functional structure built up along the lefthand VP branching spine. The last NP expansion will have the associated equation (↑ OBJ) = ↓. Each VP demands that the VCOMP functional structure component associated with the node above it be identified with the functional structure built up at that VP. The effect is to build up a hierarchical arrangement of VCOMP functional structures, one for every VP node that is generated except for the top and the bottommost VP. In addition, a subject functional structure component is passed up from all NPs but the last one. The object from the lefthand functional structure merges into this righthand structure successfully if and only if it has one level of embedding less than the righthand structure. This is our desired result. Otherwise the object structure cannot be laid on top of the righthand structure and overlap properly; it must coincide with the empty object slot on the righthand side.21</Paragraph> <Paragraph> 21 For example, suppose we interchanged V2 and V3. Then the control verb V3 is less deeply embedded than the object NP it is supposed to control. This structure should be ruled out, and it is. The lefthand functional structure will be as before. But now the righthand functional structure will not merge properly with the lefthand functional structure because that functional structure demands that the object be embedded inside two VCOMPs, whereas the righthand structure calls for an object embedded inside just one. Similarly, if V3 were embedded one more level down, the number of VCOMPs would not match. Only when the number of embeddings is the same (plus one) on both left- and righthand sides is the structure well-formed.</Paragraph>
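<Paragraph> To make the counting behaviour concrete, the following sketch simulates the intended effect under the reconstruction of the rules given above. The left spine embeds the object under one VCOMP per application of rule (1); the verb spine opens one VCOMP per application of rule (3) and leaves an empty object slot at the bottom; merger is modelled simply as a comparison of embedding depths. This is an illustration of the claimed effect only, not the Kaplan-Bresnan algorithm, and it glosses over the lexical entries and the off-by-one bookkeeping discussed in footnote 21; all names are ours.

def left_structure(num_nps):
    """Lefthand f-structure (NP/VP spine): each rule-(1) application adds a
    SUBJ and one level of VCOMP embedding; the last NP supplies the OBJ."""
    f = {"OBJ": "object-value"}
    for i in range(num_nps - 1, 0, -1):
        f = {"SUBJ": f"np{i}", "VCOMP": f}
    return f

def right_structure(num_verbs):
    """Righthand f-structure (verb spine): each rule-(3) application adds one
    level of VCOMP embedding; the innermost level has an unfilled OBJ slot."""
    f = {"PRED": f"v{num_verbs}", "OBJ": None}
    for i in range(num_verbs - 1, 0, -1):
        f = {"PRED": f"v{i}", "VCOMP": f}
    return f

def obj_depth(f, depth=0):
    """Count how many VCOMPs deep the OBJ attribute sits."""
    return depth if "OBJ" in f else obj_depth(f["VCOMP"], depth + 1)

def merges(num_nps, num_verbs):
    """Merger succeeds only when the filled object and the empty object slot
    sit at the same VCOMP depth, i.e. only when the grammar 'counts'."""
    return obj_depth(left_structure(num_nps)) == obj_depth(right_structure(num_verbs))

print(merges(3, 3))   # True: depths line up, structure well-formed
print(merges(3, 4))   # False: one VCOMP too many on the verb spine
</Paragraph>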
</Section> <Section position="6" start_page="2" end_page="2" type="metho"> <SectionTitle> 4. The Formal Characterization of Natural Languages </SectionTitle> <Paragraph position="0"> Summarizing the analysis so far, we have seen just how modern transformational theories differ formally from their older counterparts. We have also seen that that difference is reflected as a weak and strong generative capacity difference between the new theory and the lexical-functional theory.</Paragraph> <Paragraph position="1"> Some questions are still unanswered. In the previous section, we came to a partial diagnosis of the source of the extra power of the lexical-functional theory. In this section we would like to pin down that diagnosis. At the same time we shall offer a different perspective on the formal characterization of natural languages. This analysis will necessarily be more speculative. Still, it is hoped that the discussion will provoke a fresh look at how to go about the mathematical analysis of natural languages.</Paragraph> <Paragraph position="3"> To begin, let us recall that the suspected source of extra power in the lexical-functional theory is the unification procedure defined over hierarchical structures (constituent structures). We also argued that nothing like this kind of power is required to describe natural languages. In this section we shall investigate this claim more deeply. We shall look at one case, co-ordination, that might seem to require full unification, and see that in fact hierarchical unification is not required.</Paragraph> <Paragraph position="4"> At first glance, co-ordination would seem to demand some kind of unification predicate. The reason is that co-ordinate structures obey a familiar principle (Williams 1978) that permits only &quot;similar&quot; conjuncts to be linked. 22 One way to visualize the parallelism constraint is to imagine the two conjuncts being laid on top of one another. If they match, then the conjunction is permitted; otherwise it is not. Williams (1978) formalizes this condition. This process is reminiscent of the lexical-functional unification procedure. (Compare it to the Dutch example given earlier.) Here too, we &quot;overlaid&quot; two hierarchical structures to determine well-formedness. The Dutch sentences were legal just in case two hierarchical spines could be so overlaid, or unified. Is unification required? On closer inspection the analogy with unification breaks down. It is true that the parallelism of co-ordinate conjuncts demands a match in terms of phrasal nodes. The key difference between lexical-functional unification and the co-ordination constraint is that co-ordinate parallelism need only hold at the top level of a phrasal sequence.</Paragraph> <Paragraph position="5"> Internal details of the matched conjuncts do not matter.</Paragraph> <Paragraph position="6"> This is in contrast to the unification predicate, which, as the Dutch example shows, can demand a hierarchical match. For example, the following conjunction is perfectly grammatical, even though the conjoined VPs are internally different, one containing an Adjectival Phrase and the other a Noun Phrase (example from Goodall 1983): the bouncer was muscular and was a guitarist. One can even conjoin active and passive sentences (John went to Boston and was taken for a ride). As Goodall (1983) demonstrates, one way to describe this effect is as the union of the top level of phrasal nodes (actually, phrase markers).</Paragraph>
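<Paragraph> The difference can be stated in a few lines. In the sketch below a constituent is simply a category label together with its daughters; the first test inspects only the top-level categories of the two conjuncts, while the second demands identity at every level of structure, as a fully recursive, unification-style match would. The Goodall example passes the first test but fails the second. The encoding and the function names are ours, for illustration only.

# A constituent: (category, list of daughter constituents)
vp1 = ("VP", [("V", []), ("AP", [("A", [])])])               # "was muscular"
vp2 = ("VP", [("V", []), ("NP", [("Det", []), ("N", [])])])  # "was a guitarist"

def conjoinable_top_level(c1, c2):
    """Co-ordinate parallelism: only the top-level categories need match."""
    return c1[0] == c2[0]

def identical_throughout(c1, c2):
    """A recursive, unification-style match: every level must agree."""
    cat1, kids1 = c1
    cat2, kids2 = c2
    if cat1 != cat2 or len(kids1) != len(kids2):
        return False
    return all(identical_throughout(k1, k2) for k1, k2 in zip(kids1, kids2))

print(conjoinable_top_level(vp1, vp2))   # True: both VPs, conjunction allowed
print(identical_throughout(vp1, vp2))    # False: internal AP vs. NP mismatch
</Paragraph>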
<Paragraph position="7"> In contrast, the Kaplan and Bresnan unification procedure (1982: 272), as defined by their statement (190c), recursively defines a union over what may be an entire tree:

(190) c. If e1 and e2 are both f-structures, let A1, A2 be the sets of attributes of e1 and e2, respectively. Then a new f-structure e is constructed with
e = {(a, v) | a ∈ A1 ∪ A2 and v = merge[Locate[(e1, a)], Locate[(e2, a)]]}

(Locate is an operator that actually finds the sub-f-structure with the specified attribute structure.) Here, (a, v) is the union of a hierarchical attribute set, since this last step is carried out recursively to all levels of structure. This means that there is nothing to stop us from writing a co-ordination rule in the lexical-functional system that demands equality in tree structure through all levels of hierarchical detail, contrary to what is observed. 23 We might speculate then that a general property of constraint statements in natural languages is that they are defined in terms of predicates on linear sequences of structures (phrase markers), rather than by hierarchically defined unification predicates. It remains to explore just what this restriction comes to, but it is clear that this is exactly where and how lexical-functional grammar diverges from the &quot;classical&quot; view of generative grammar. The classic view, outlined in Chomsky's Logical Structure of Linguistic Theory, defined predicates in terms of a concatenative algebra at each of several levels of representation (phonetic, syntactic, and so forth). The details are not essential here, but one property of these algebras is that they fixed predicates in terms of linear sequences of elements, rather than trees. 24 The lexical-functional system extends the power of representational description to include the possibility of unification predicates defined over nonlinear constituent structures. While this violation of the usual syntactic adjacency restrictions (observed from the earliest days of generative grammar) is certainly sufficient to describe natural languages, the examples presented here show that it is not necessary.</Paragraph> <Paragraph position="8"> This diagnosis also tells us one way to repair the lexical-functional theory. One could restrict the lexical-functional theory to ban hierarchical unification predicates. One way to do this is simply to eliminate the recursive step of Kaplan and Bresnan's unification procedure (190c) (1982: 272), excerpted earlier. For example, feature merger could be restricted to operate over just two cyclic (S or NP) domains. One would still need a way to handle constructions like those in Dutch, or, should they be necessary, the ww constructions. Of course, it may be that other restrictions suffice.</Paragraph>
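<Paragraph> For concreteness, here is a small sketch of a (190c)-style union over f-structures represented as nested attribute-value dictionaries. It is a simplification of Kaplan and Bresnan's operator (there is no Locate step, and inconsistency is simply reported as failure), but it exhibits the property at issue: the marked recursive call carries the union down through every level of structure, and it is exactly that step which a restriction of the kind just suggested would bound or eliminate. The code is ours, for illustration only.

def merge(e1, e2):
    """Union of two f-structures (attribute-to-value dictionaries), simplified
    from Kaplan and Bresnan's (190c); atomic value clashes count as failure."""
    if not (isinstance(e1, dict) and isinstance(e2, dict)):
        if e1 == e2:
            return e1
        raise ValueError(f"inconsistent values: {e1!r} vs {e2!r}")
    merged = {}
    for attr in set(e1) | set(e2):
        if attr in e1 and attr in e2:
            # The recursive step: it descends through ALL levels of structure.
            # Banning hierarchical unification would bound or remove this call.
            merged[attr] = merge(e1[attr], e2[attr])
        else:
            merged[attr] = e1.get(attr, e2.get(attr))
    return merged

# Two partial f-structures that agree where they overlap:
left = {"OBJ": {"PRED": "kinderen", "NUM": "pl"}}
right = {"PRED": "zag", "OBJ": {"NUM": "pl"}, "VCOMP": {"PRED": "zwemmen"}}
print(merge(left, right))
</Paragraph>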
<Paragraph position="9"> Whatever the outcome of these changes, a more general question for future work centers on the status of the concatenation algebras underpinning traditional generative grammar. While there has been some formal work in this area (see Borgida 1983 and Berwick 1982), it remains to be seen whether the linear predicates presupposed by such a model do indeed characterize what it means to be a natural grammar. If they do, then extensions to more general unification predicates, as in LFG, unification grammar, or PATR-II, may well be unwarranted.</Paragraph> </Section> </Paper>