<?xml version="1.0" standalone="yes"?> <Paper uid="P90-1015"> <Title>LICENSING AND TREE ADJOINING GRAMMAR IN GOVERNMENT BINDING PARSING</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> This paper aims to provide a psychologically plausible mechanism for putting the knowledge which a speaker has of the syntax of a language, the competence grammar, to use. The representation of knowledge of language I assume is that specified by Government Binding (GB) Theory, introduced in [Chomsky, 1981]. GB, as a competence theory, emphatically does not specify the nature of the language processing mechanism. In fact, &quot;proofs&quot; that transformational grammar is inadequate as a linguistic theory due to various performance measures are fundamentally flawed, since they suppose a particular connection between the grammar and parser [Berwick and Weinberg, 1984]. Nonetheless, it seems desirable to maintain a fairly direct connection between the linguistic competence and its processing.* Otherwise, claims of the psychological reality of this particular conception of competence become essentially vacuous, since they cannot be falsified but for the data on which they are founded, i.e. grammaticality judgments. Thus, in building a model of language processing, I would like to posit as direct a link as is possible between linguistic competence and the operations of the parser while still maintaining certain desirable computational properties.
* I would like to thank the following for their valuable discussion and suggestions: Naoki Fukui, Jamie Henderson, Aravind Joshi, Tony Kroch, Mitch Marcus, Michael Niv, Yves Schabes, Mark Steedman, Enric Vallduví. This work was partially supported by ARO Grants DAAL03-89-C0031 PRI and DAAG29-84-K-0061 and DARPA grant N00014-85-K0018. The author is supported by a Unisys doctoral fellowship.</Paragraph> <Paragraph position="1"> What are the computational properties necessary for psychological plausibility? Since human syntactic processing is an effortless process, we should expect that it take place efficiently, perhaps in linear time, since sentences do not become more difficult to process simply as a function of their length. Determinism, as proposed by Marcus [1980], seems desirable as well. In addition, the mechanism should operate in an incremental fashion.</Paragraph> <Paragraph position="2"> Incrementality is evidenced in the human language processor in two ways. As we hear a sentence, we build up semantic representations without waiting until the sentence is complete. Thus, the semantic processor should have access to syntactic representations prior to an utterance's completion. Additionally, we are able to perceive ungrammaticality in sentences almost immediately after the ill-formedness occurs. Thus, our processing mechanism should mimic this early detection of ungrammatical input.</Paragraph> <Paragraph position="3"> Unfortunately, a parser with the most transparent relationship to the grammar, a &quot;parsing as theorem proving&quot; approach as proposed by [Johnson, 1988] and [Stabler, 1990], does not fare well with respect to our computational desiderata. It suffers from the legacy of the computational properties of first order theorem proving, most notably undecidability, and is thus inadequate for our purposes. The question, then, is how far we must retreat from this direct instantiation so that we can maintain the requisite properties.
In this paper, I attempt to provide an answer. I propose a parsing model which represents the principles of the grammar in a fairly direct manner, yet preserves efficiency and incrementality. The model depends upon two key ideas. First, I utilize the insight of [Abney, 1986] in the use of licensing relations as the foundation for GB parsing. By generalizing Abney's formulation of licensing, I can directly encode and enforce a particular class of the principles of GB theory and in so doing efficiently build phrase structure. The principles expressible through licensing are not all of those posited by GB. Thus, the others must be enforced using a different mechanism. Unfortunately, the unbounded size of the tree created with licensing makes any such mechanism computationally abhorrent. In order to remedy this, I make use of the Tree Adjoining Grammar (TAG) framework [Joshi, 1985] to limit the working space of the parser. As the parser proceeds, its working structure is bounded in size.</Paragraph> <Paragraph position="4"> If this bound is exceeded, we reduce this structure by one of the operations provided by the TAG formalism, either substitution or adjunction. This results in two structures, each of which forms an independent elementary tree. Interestingly, the domain of locality imposed by a TAG elementary tree appears to be sufficient for the expression of the remaining grammatical principles. Thus, we can check for the satisfaction of the remaining grammatical principles in just the excised piece of structure and then send it off for semantic interpretation. Since this domain of constraint checking is bounded in size, this process is done efficiently. This mechanism also works in an incremental fashion.</Paragraph> </Section> <Section position="4" start_page="0" end_page="112" type="metho"> <SectionTitle> 2 Abney's Licensing </SectionTitle> <Paragraph position="0"> Since many grammatical constraints are concerned with the licensing of elements, Abney [1986] proposes utilizing licensing structure as a more concrete representation for parsing. This allows for more efficient processing yet maintains &quot;the spirit of the abstract grammar.&quot; Abney's notion of licensing requires that every element in a structure be licensed by performing some syntactic function. Any structure with unlicensed elements is ill-formed. Abney takes theta role assignment to be the canonical case of licensing and assumes that the properties of a general licensing relation should mirror those of theta assignment, namely, that it be unique, local and lexical.</Paragraph> <Paragraph position="1"> The uniqueness property for theta assignment requires that an argument receive one and only one theta role. Correspondingly, licensing is unique: an element is licensed via exactly one licensing relation. Locality demands that theta assignment, and correspondingly licensing, take place under a strict definition of government: sisterhood. Finally, theta assignment is lexical in that it is the properties of the theta assigner which determine what theta assignment relations obtain. Licensing will have the same property; it is the licenser that determines how many and what sort of elements it licenses.</Paragraph> <Paragraph position="2"> Each licensing relation is a 3-tuple (D, Cat, Type). D is the direction in which licensing occurs. Cat is the syntactic category of the element licensed by this relation. Type specifies the linguistic function accomplished by this licensing relation. This can be either functional selection, subjecthood, modification or theta assignment. Functional selection is the relation which obtains between a functional head and the element for which it subcategorizes, i.e. between C and IP, I and VP, D and NP. Subjecthood is the relation between a head and its &quot;subject&quot;. Modification holds between a head and an adjunct. Theta assignment occurs between a head and its subcategorized elements.
[Figure 1: licensing relations in a simple clause (S = subjecthood, F = functional selection, M = modification, T = theta)]</Paragraph>
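To make the shape of these relations concrete, the fragment below spells out plausible 3-tuples for the heads of a simple clause. This is an illustrative sketch only; the encoding and the particular value choices are mine, not Abney's.

```python
# Licensing relations as 3-tuples (D, Cat, Type):
# D = direction of licensing, Cat = category licensed, Type = function served.
ABNEY_RELATIONS = {
    "C": [("right", "IP", "functional-selection")],
    "I": [("right", "VP", "functional-selection"),
          ("left",  "DP", "subjecthood")],
    "V": [("right", "DP", "theta")],  # a transitive V theta-licenses its object
}
```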
<Paragraph position="3"> Figure 1 gives an example of the licensing relations which might obtain in a simple clause. Parsing with these licensing relations simply consists of determining, for each lexical item as it is read in a single left to right pass, where it is licensed in the previously constructed structure or whether it licenses the previous structure.</Paragraph> <Paragraph position="4"> We can now re-examine Abney's claim that these licensing relations allow him to retain &quot;the spirit of the abstract grammar.&quot; Since licensing relations talk only of very local relationships, those occurring between sisters, this system cannot enforce the constraints of binding, control, and the ECP, among others. Abney notes this and suggests that his licensing should be seen as a single module in a parsing system. One would hope, though, that principles which have their roots in licensing, such as those of theta and case theory, could receive natural treatments. Unfortunately, this is not true. Consider the theta criterion. While this licensing system is able to encode the portion of the constraint that requires theta roles to be assigned uniquely, it fails to guarantee that all NPs (arguments) receive a theta role. This is crucially not the case since NPs are sometimes licensed not by theta but by subject licensing. Thus, the following pair will be indistinguishable:
i. It seems that the pigeon is dead
ii. * Joe seems that the pigeon is dead
Both It and Joe will be appropriately licensed by a subject licensing relation associated with seems. The case filter also cannot be expressed, since objects of ECM verbs are &quot;licensed&quot; by the lower clause as subject, yet also require case. Thus, the following distinction cannot be accounted for:
i. Carol asked Ben to swat the fly
ii. * Carol tried Ben to swat the fly
Here, in order to get the desired syntactic structure (with Ben in the lower clause in both cases), Ben will need to be licensed by the inflectional element to. Since such a licensing relation must be unique, the case assigning properties of the matrix verbs will be irrelevant. What seems to have happened is that we have lost the modularity of the syntactic relations constrained by grammatical principles. Everything has been conflated onto one homogeneous licensing structure.</Paragraph> </Section> <Section position="5" start_page="112" end_page="113" type="metho"> <SectionTitle> 3 Generalized Licensing </SectionTitle> <Paragraph position="0"> In order to remedy these deficiencies, I propose a system of Generalized Licensing. In this system, every node is assigned two sets of licensing relations: gives and needs.</Paragraph> <Paragraph position="1"> Gives are similar to Abney's licensing relations: they are satisfied locally and determined lexically. Needs specify the ways in which a node must be licensed.1 A need of type theta, for example, requires a node to be licensed by a theta relation. In the current formulation, needs differ from gives in that they are always directionally unspecified. We can now represent the theta criterion by placing theta gives on a theta assigner for each argument and theta needs on all DPs. This encodes both that all theta roles must be assigned and that all arguments must receive theta roles.</Paragraph>
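As a minimal sketch of this encoding (my own toy classes, not the paper's implementation), gives and needs can be kept as separate records on each node, with the theta criterion falling out of where they are placed:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Give:
    direction: str            # "left" or "right"; gives are directional
    type: str                 # "theta", "case", "subject", "fselect", ...
    value: str                # category (or feature value) licensed
    sat_by: Optional["Node"] = None

@dataclass
class Need:
    type: str                 # how this node itself must be licensed
    sat_by: Optional["Node"] = None   # needs carry no direction

@dataclass
class Node:
    label: str
    gives: list = field(default_factory=list)
    needs: list = field(default_factory=list)

# The theta criterion, encoded twice over:
laugh = Node("V'", gives=[Give("left", "theta", "DP")])   # every role assigned
harry = Node("DP", needs=[Need("theta"), Need("case")])   # every argument roled
```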
<Paragraph position="2"> In Generalized Licensing, we allow a greater vocabulary of relation types: case, theta assignment, modification, functional selection, predication, f-features, etc. We can then explicitly represent many of the relations which are posited in the grammar and preserve the modularity of the theory. As a result, however, certain elements can and must be multiply licensed. DPs, for instance, will have needs for both theta and case as a result of the case filter and theta criterion. We therefore relax the requirement that all nodes be uniquely licensed. Rather, we demand that all gives and needs be uniquely &quot;satisfied.&quot; The uniqueness requirement in Abney's relations is now pushed down to the level of individual gives and needs. Once a give or need is satisfied, it may not participate in any other licensing relationships.</Paragraph> <Paragraph position="3"> One further generalization which I make concerns the positioning of gives and needs. In Abney's system, licensing relations are associated with lexical heads and applied to maximal projections of other heads. Phrase structure is thus entirely parasitic upon the reconstruction of licensing structure. I propose to have an independent process of lexical projection. A lexical item projects to the correct number of levels in its maximal projection, as determined by theta structure, f-features, and other lexical properties.2 Gives and needs are assigned to each of these nodes. As with Abney's system, licensing takes place under a strict notion of government (sisterhood). However, the projection process allows licensing relations determined by a head to take place over a somewhat larger domain than sisterhood to the head. A DP's theta need resulting from the theta criterion, for example, is present only at the maximal projection level. This is the node which stands in the appropriate structural relation to a theta give. As a result of this projection process, though, we must explicitly represent structural relations during parsing.</Paragraph> <Paragraph position="4"> The reader may have noticed that multiple needs on a node might not all be satisfiable in one structural position. Consider the case of a DP subject which possesses both theta and case needs. The S-structure subject of the sentence receives its theta role from the verb, yet it receives its case from the tense/agreement morpheme heading IP. This is impossible, though, since given the structural correlate of the licensing relation, the DP would then have to be directly dominated both by IP and by VP. Yet, it cannot be in either one of these positions alone, since we would then have unsatisfied needs and hence an ill-formed structure. Thus, our representation of grammatical principles and the constraints on give and need satisfaction force us into adopting a general notion of chain and, more specifically, the VP internal subject hypothesis. A chain consists of a list of nodes (a1, ..., an) such that the nodes share gives and needs and each ai c-commands ai+1. The first element in the chain, a1, the head, is the only element which can have phonological content. Others must be empty categories.</Paragraph>
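Continuing the toy classes above, and anticipating the example discussed next, a chain can be modeled as nodes that alias the very same need objects, so that satisfaction at any link counts once for the whole chain:

```python
# Chain (a1, ..., an): links share gives and needs; a1 alone is pronounced.
theta, case = Need("theta"), Need("case")
john  = Node("DP",       needs=[theta, case])   # a1: S-structure subject
trace = Node("DP-trace", needs=[theta, case])   # a2: VP-internal position

# Each need is satisfied exactly once, but at a different link of the chain:
case.sat_by  = Node("I'")   # nominative case give on I', to the subject
theta.sat_by = Node("V'")   # theta give on V', to the trace
assert all(n.sat_by is not None for n in john.needs + trace.needs)
```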
<Paragraph position="5"> Now, since the elements of the chain can occupy different structural positions, they may be governed and hence licensed by distinct elements. In the simple sentence
[IP John_i tns/agr [V' t_i smile]]
the trace node which is an argument of smile forms a chain with the DP John. In its V'-internal position, the theta need is satisfied by the theta give associated with the V. In subject position, the case need is satisfied by the case give on the I' projection of the inflectional morphology.
2 I assume the relativized X-bar theory proposed in [Fukui and Speas, 1986].</Paragraph> <Paragraph position="8"> Now, how might we parse using these licensing relations? Abney's method is not sufficient, since a single instance of licensing no longer guarantees that all of a node's licensing constraints are satisfied. I propose a simple mechanism which generalizes Abney's approach: we proceed left to right, projecting the current input token to its maximal projection p and adding the associated gives and needs to each of the nodes. These are determined by examination of information in the lexical entries (such as using the theta grid to determine theta gives), examination of language specific parameters (using head directionality in order to determine the directionality of gives, for example), and consultation of UG parameters (for instance, as a result of the case filter, every DP maximal projection will have an associated case need). The parser then attempts to combine this projection with previously built structure in one of two ways. We may attach p as the sister of a node n on the right frontier of the developing structure, when p is licensed by n, either by a give in n and/or a need in the node p. Another possibility is that the previously built structure is attached as sister to a node m dominated by the maximal projection p, by satisfying a give in m and/or a need on the root of the previously built structure. In the case of multiple attachment possibilities, we order them according to some metric, such as the one proposed by Abney, and choose the most highly ranked option.</Paragraph> <Paragraph position="9"> As structure is built, nodes in the tree with unsatisfied gives and needs may become closed off from the right frontier of the working structure. In such positions, they will never become satisfied. In the case of a need in an internal node n which is unsatisfied, we posit the existence of an empty category m, which will be attached later to the structure such that (n, m) forms a chain. We posit an element to have been moved into a position exactly when it is licensed at that position yet its needs are not completely satisfied. After positing the empty category, we push it onto the trace stack. When a node has an unsatisfied give and no longer has access to the right frontier, we must posit some element, not phonologically represented in the input, which satisfies that give relation. If there is an element on the top of the trace stack which can satisfy this give, we pop it off the stack and attach it.3 Of course, if the trace has any remaining needs, it is returned to the trace stack, since its new position is isolated from the right frontier. If no such element appears on top of the trace stack, we posit a non-trace empty category of the appropriate type, if one exists in the language.4
3 Note that the use of this stack to recover filler-gap structures forbids non-nested dependencies, as in [Fodor, 1978].</Paragraph>
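The sketch below renders the attachment step and the trace stack discipline in terms of the toy classes above. It is a deliberate simplification under my own assumptions: categories are matched by label, one attachment direction is checked per call, and the bookkeeping that decides when a node has become closed off from the right frontier is assumed to happen elsewhere.

```python
def attach(host: Node, p: Node, direction: str) -> bool:
    """Attach p as sister of host when an unsatisfied give in host licenses
    p's category in that direction; a matching need in p is satisfied too."""
    for g in host.gives:
        if g.sat_by is None and g.direction == direction and g.value == p.label:
            g.sat_by = p
            for n in p.needs:
                if n.sat_by is None and n.type == g.type:
                    n.sat_by = host
            return True
    return False

def on_closed_off(node: Node, trace_stack: list) -> None:
    """Called when node loses access to the right frontier."""
    for n in node.needs:
        if n.sat_by is None:              # unsatisfied need: posit a trace
            trace_stack.append(Node("t:" + node.label, needs=[n]))  # chain-mate
    for g in node.gives:
        if g.sat_by is None:              # unsatisfied give: find a filler
            if trace_stack and trace_stack[-1].label == "t:" + g.value:
                ec = trace_stack.pop()    # trace on top can take the give
            else:
                ec = Node(g.value)        # non-trace empty category, e.g. PRO
            g.sat_by = ec
            for n in ec.needs:            # the give may feed the chain's needs
                if n.sat_by is None and n.type == g.type:
                    n.sat_by = node
            if any(n.sat_by is None for n in ec.needs):
                trace_stack.append(ec)    # its new position is isolated too
```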
[Figure 2: working structure for &quot;Harry tns/agr&quot;: the I projection carries gives <left, case, nominative, DP>, <left, subject, DP, DP> and <right, functional-select, VP, ?>; the subject DP Harry carries needs <case, nominative, I'> and <theta, ?, ?> (notation as in footnote 5)]
<Paragraph position="1"> Let's try this mechanism on the sentence &quot;Harry laughs.&quot; The first token received is Harry and is projected to DP. No gives are associated with this node, but theta and case needs are inserted into the need set as a result of the theta criterion and the case filter. Next, tns/agr is read and projected to I'', since it possesses f-features (cf. [Fukui and Speas, 1986]). Associated with the I° node is a rightward functional selection give of value V. On the I' node is a leftward nominative case give, from the f-features, and a leftward subject give, as a result of the Extended Projection Principle. The previously constructed DP is attached as sister to the I' node, thereby satisfying the subject and case gives of the I' as well as the case need of the DP. We are thus left with the structure in figure 2.5 Next, we see that the theta need of the DP is inaccessible from the right frontier, so we push an empty category DP whose need set contains this unsatisfied theta need onto the trace stack. The next input token is the verb laugh. This is projected to a single bar level. Since laugh assigns an external theta role, we insert a leftward theta give to a DP into the V' node. This verbal projection is attached as sister to I°, satisfying the functional selection give of I. However, the theta give in V' remains unsatisfied and, since it is leftward, is inaccessible. We therefore need to posit an empty category. Since the DP trace on top of the trace stack will accept this give, the trace stack is popped and the trace is attached via Chomsky-adjunction to the V' node, yielding the structure in figure 3. Since this node forms a chain with the subject DP, the theta need on the subject DP is now satisfied. We have now reached the end of our input. The resulting structure is easily seen to be well-formed, since all gives and needs are satisfied.</Paragraph>
4 Such a simplistic approach to determining whether a trace or non-trace empty category should be inserted is clearly not correct. For instance, in &quot;tough movement&quot; (Alvin_i is tough PRO to feed t_i) the proposed mechanism will insert the trace of Alvin in subject position rather than PRO. It remains for future work to determine the exact mechanism by which such decisions are made.
5 In the examples which follow, gives are shown as 4-tuples (D, Type, Val, SatBy) where D is the direction, Type is the type of licensing relation, Val is the licensing relation value and SatBy is the node which satisfies the give (marked as ? if the relation is as yet unsatisfied). Needs are 3-tuples (Type, Val, SatBy) where these are as in the gives. For purposes of readability, I remove previously satisfied gives and needs from the figure. Of course, such information persists in the parser's representation.
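For concreteness, the figure-2 configuration can be written out in exactly this tuple notation; the particular values below are my reconstruction from the prose, not a reproduction of the original figure.

```python
# Gives: (D, Type, Val, SatBy); needs: (Type, Val, SatBy); "?" = unsatisfied.
figure_2 = {
    "I'": {"gives": [("left", "case", "nominative", "DP_Harry"),
                     ("left", "subject", "DP", "DP_Harry")]},
    "I":  {"gives": [("right", "functional-select", "VP", "?")]},
    "DP_Harry": {"needs": [("case", "nominative", "I'"),
                           ("theta", "?", "?")]},  # triggers the trace push
}
```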
<Paragraph position="5"> We have adopted a very particular view of traces: their positions in the structure must be independently motivated by some other licensing relation. Note, then, that we cannot analyze long distance dependencies through successive cyclic movement. There is no licensing relation which will cause the intermediate traces to exist. Ordinarily, these traces exist only to allow a well-formed derivation, i.e. one not ruled out by subjacency or by a barrier to antecedent government. Thus, we need to account for constraints on long distance movement in another manner. We will return to this in the next section.</Paragraph> <Paragraph position="7"> The mechanism I have proposed allows a fairly direct encoding of some of the principles of grammar, such as case theory, theta theory, and the extended projection principle. However, many other constraints of GB, such as the ECP, control theory, binding theory, and bounding theory, cannot be expressed perspicuously through licensing.</Paragraph> <Paragraph position="8"> Since we want our parser to maintain a fairly direct connection with the grammar, we need some additional mechanism to ensure the satisfaction of these constraints.</Paragraph> <Paragraph position="9"> Recall, again, the computational properties we wanted to hold of our parsing model: efficiency and incrementality.</Paragraph> <Paragraph position="10"> The structure building process I have described has worst case complexity O(n^2), since the set of possible attachment sites can grow linearly with the input. While not enormously computationally intensive, this is greater than the linear time bound we desire. Also, checking for satisfaction of non-licensing constraints over unboundedly large structures is likely to be quite inefficient. There is also the question of when these other constraints are checked. To accord with incrementality, they must be checked as soon as possible, and not function as post-processing &quot;filters.&quot; Unfortunately, it is not easy to determine when a given constraint can be checked such that further input will not change its satisfaction status. We do not want to rule a structure ungrammatical simply because it is incomplete. Finally, it is unclear how we might incorporate this mechanism, which builds an ever larger syntactic structure, into a model which performs semantic interpretation incrementally.</Paragraph> </Section> <Section position="7" start_page="114" end_page="116" type="metho"> <SectionTitle> 4 Limiting the Domain with TAG </SectionTitle> <Paragraph position="0"> These problems with our model are solved if we can place a limit on the size of the structures we construct. The number of licensing possibilities would be bounded, yielding linear time for structure construction. Also, constraint checking could be done in a constant amount of processing. Unfortunately, the productivity of language requires us to handle sentences of unbounded length and thus linguistic structures of unbounded size.</Paragraph> <Paragraph position="1"> TAG provides us with a way to achieve this paradise.</Paragraph> <Paragraph position="2"> TAG accomplishes linguistic description by factoring recursion from local dependencies [Joshi, 1985].
It posits a set of primitive structures, the elementary trees, which may be combined through the operations of adjunction and substitution. An elementary tree is a minimal non-recursive syntactic tree, a predication structure containing positions for all arguments. I propose that this is the projection of a lexical head together with any of the associated functional projections of which it is a complement. For instance, a single elementary tree may contain the projection of a V along with the I and C projections in which it is embedded.6 Along the frontier of these trees may appear terminal symbols (i.e. lexical items) or non-terminals.</Paragraph> <Paragraph position="3"> The substitution operation is the insertion of one elementary tree at a frontier non-terminal of another elementary tree of the same type as the inserted tree's root. Adjunction allows the insertion of one elementary tree of a special kind, an auxiliary tree, at a node internal to another (cf. figure 4). In auxiliary trees, there is a single distinguished non-terminal on the frontier of the tree, the foot node, which is identical in type to the root node. Only adjunctions, and not substitutions, may occur at this node.</Paragraph>
6 This definition of TAG elementary trees is consistent with the Lexicalized TAG framework [Schabes et al., 1988] in that the lexical head may be seen as the anchor of the elementary tree. For further details and consequences of this proposal on elementary tree well-formedness, see [Frank, 1990].
<Paragraph position="5"> TAG has proven useful as a formalism in which one can express linguistic generalizations since it seems to provide a sufficient domain over which grammatical constraints can be stated [Kroch and Joshi, 1985] [Kroch and Santorini, 1987]. Kroch, in two remarkable papers [1986] and [1987], has shown that even constraints on long distance dependencies, which intuitively demand a more &quot;global&quot; perspective, can be expressed using an entirely local (i.e. within a single elementary tree) formulation of the ECP, one which also allows for the collapsing of the CED with the ECP. This analysis does not utilize intermediate traces; instead, the link between filler and gap is &quot;stretched&quot; upon the insertion of intervening structure during adjunctions. Thus, we are relieved of the problem that intermediate traces are not licensed, since we do not require their existence.</Paragraph> <Paragraph position="8"> Let us suppose a formulation of GB in which all principles not enforced through generalized licensing are stated over the local domain of a TAG elementary tree. Now, we can use the model described above to create structures corresponding to single elementary trees. However, we restrict the working space of the parser to contain only a single structure of this size. If we perform an attachment which violates this &quot;memory limitation,&quot; we are forced to reduce the structure in our working space. We will do this in one of two ways, corresponding to the two mechanisms which TAG provides for combining structure: either we will undo a substitution or undo an adjunction.</Paragraph> <Paragraph position="9"> However, all chains are required to be localized within individual elementary trees. Once an elementary tree is finished, non-licensing constraints are checked and it is sent off for semantic interpretation. This is the basis for my proposed parsing model. For details of the algorithm, see [Frank, 1990]. This mechanism operates in linear time and deterministically, while maintaining coarse grained (i.e. clausal) incrementality for grammaticality determination and semantic interpretation.</Paragraph>
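Before turning to the raising example below, here is a minimal sketch of the two reduction operations. The tree representation and the naive search for the foot node are my own simplifications (TAG itself constrains where adjunction may occur); this is not the algorithm of [Frank, 1990].

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Tree:
    label: str                      # e.g. "IP", "I'", "DP"
    children: list = field(default_factory=list)
    foot: bool = False              # marks the foot of an auxiliary tree

def unsubstitute(site: Tree) -> Tree:
    """Undo a substitution: excise the completed subtree at a frontier
    non-terminal, leaving the bare non-terminal behind."""
    excised = Tree(site.label, site.children)
    site.children = []
    return excised

def find_foot(node: Tree, label: str) -> Optional[Tree]:
    """Locate a proper descendant of node bearing the given label."""
    for child in node.children:
        if child.label == label:
            return child
        hit = find_foot(child, label)
        if hit is not None:
            return hit
    return None

def unadjoin(node: Tree) -> Optional[Tree]:
    """Undo an adjunction: excise the structure recursive on node's label
    (root and foot share the label), splicing the material below the foot
    back in under node. The excised piece is an auxiliary tree."""
    foot = find_foot(node, node.label)
    if foot is None:
        return None
    below = foot.children
    foot.children, foot.foot = [], True
    aux = Tree(node.label, node.children)   # root ... foot, same label
    node.children = below                   # working structure shrinks
    return aux
```

Under these assumptions, unadjoining at a node excises the root-to-foot recursive structure and leaves the spliced-out remainder in the workspace, which is what happens to the I'-recursive &quot;seemed&quot; structure in figures 7 and 8 below.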
<Paragraph position="11"> Consider this model on the raising sentence &quot;Harry seemed to kiss Sally.&quot; We begin as before with &quot;Harry tns/agr&quot;, yielding the structure in figure 2. Before we receive the next token of input, however, we see that the working structure is larger than the domain of an elementary tree, since the subject DP constitutes an independent predication from the one determined by the projection of I. We therefore unsubstitute the subject DP and send it off to constraint checking and semantic interpretation. At this point, we push a copy of the subject DP node onto the trace stack due to its unsatisfied theta need.</Paragraph> <Paragraph position="12"> We continue with the verb seem, which projects to V' and attaches as sister to I, satisfying the functional selection give and yielding the structure in figure 5. There remains only one elementary tree in working space, so we need not perform any domain reduction. Next, to projects to I', since it lacks f-features to assign to its specifier. This is attached as object of seem, as in figure 6. At this point, we must again perform a domain reduction operation, since the upper and lower clauses form two separate elementary trees. Since the subject DP remains on the trace stack, it cannot yet be removed: all dependencies must be resolved within a single elementary tree. Hence, we must unadjoin the structure recursive on I' shown in figure 7, leaving the structure in figure 8 in the working space. This structure is sent off for constraint checking and semantic interpretation. We continue with kiss, projecting and attaching it as functionally selected sister of I and popping the DP from the trace stack to serve as external argument. Finally, we project and attach the DP Sally as sister of V, where it receives both theta role and case. This DP is unsubstituted in the same manner as the subject and is sent off for further processing. We are left finally with the structure in figure 9, all of whose gives and needs are satisfied, and we are finished.</Paragraph> <Paragraph position="13"> ... constrained, we might be able to retain the efficient nature of the current model. Other strategies for resolving such indeterminacies using statistical reasoning or hard coded rules or templates might also be possible, but these constructs are not the sort of grammatical knowledge we have been considering here and would entail further abstraction from the competence grammar. Another problem with the parser has to do with the incompleteness of the algorithm. Sentences such as ...</Paragraph> <Paragraph position="14"> This model also handles control constructions, bare infinitives, ECM verbs and binding of anaphors, modification, genitive DPs and others. Due to space constraints, these are not discussed here, but see [Frank, 1990].</Paragraph> </Section> </Paper>