<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1012"> <Title>Programming in Logic with Constraints for Natural Language Processing</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2. Movement Theory in GB </SectionTitle> <Paragraph position="0"> In this section, we briefly summarize the main aspects of movement theory (Chomsky 1982, 1986) and give several examples. GB theory is a complete revision of the baroque set of rules and transformations of the standard theory, achieving much greater expressive power and explanatory adequacy. GB theory is composed of a very small base component (which follows X-bar syntax), a single movement rule and a small set of principles whose role is to control the power of the movement rule. GB exhibits greater clarity, ease of understanding and linguistic coverage (in spite of some points which remain obscure). The recent formalization of GB theory has several attractive properties for the design of a computational model of natural language processing, among which: - concision and economy of means, - a high degree of parametrization, - modularity (e.g. independence of filtering principles), - declarativity (e.g. no order in the application of rules), - absence of intermediate structures (e.g.</Paragraph> <Paragraph position="1"> no deep structure).</Paragraph> <Paragraph position="2"> GB theory postulates four levels: d-structure (sometimes not taken into account, as in our approach), s-structure (the surface form of the structural description), phonetic form (PF) and logical form (LF). The latter two levels are derived independently from s-structure. We will mainly be interested here in the s-structure level. 
Since movement theory also applies, with different parameter values, to LF, we will also show how our approach is well-adapted to deriving the LF level from the s-structure level.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Move-α and constraints </SectionTitle> <Paragraph position="0"> In GB, the grammaticality of a sentence is based on the existence of a well-formed annotated surface form of that sentence. Thus, no real movements of constituents occur, and additional computational and representational problems are avoided. Up to now, very few and only partial works have been undertaken to model the principles of GB theory. Among them, let us mention (Berwick and Weinberg 1986), (Stabler 1987) and (Brown et al. 1987). There is, however, an increasing interest in this approach.</Paragraph> <Paragraph position="1"> GB theory postulates a single movement rule, move-α, controlled by principles and filters. This very general rule states: Move any constituent α to any position.</Paragraph> <Paragraph position="2"> The most immediate constraints are that α is moved to the left to an empty position (a subject position which is not θ-marked) or is adjoined to a COMP or INFL node (new positions are created from nothing, but this is not in contradiction with the projection principle). Constraints and filters control movement, but they also force movement. For example, when a verb is used in the passive voice, it can no longer assign case to its object. The object NP must thus move to a place where it is assigned case. The (external) subject θ-role being also suppressed, the object NP naturally moves to the subject position, where it is assigned case while keeping its previous θ-role.</Paragraph> <Paragraph position="3"> Another immediate constraint is the θ-criterion: each argument has one and only one θ-role, and each θ-role is assigned to one and only one argument. 
Such roles are lexically induced by means of the projection principle (and by lexical insertion), thus conferring an increasing role on lexical subcategorization. Finally, government gives a precise definition of what a constituent can govern and thus of how the projection principle is handled.</Paragraph> <Paragraph position="4"> Move-α is too abstract to be directly implementable. It needs to be at least partially instantiated, in a way which preserves its generality and its explanatory power. In addition, while the theory is attaining higher and higher levels of adequacy, interest in analysing the specifics of particular constructions is decreasing. As a consequence, we have to make explicit elements left in the shade or simply neglected. Finally, the feature system implicit in GB theory also has to be integrated.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Examples of movements </SectionTitle> <Paragraph position="0"> All the examples given below are expressed within the framework of head-initial languages like French and English. Let us first consider the relative clause construction. In a relative clause, an N is pronominalized, moved to the left and adjoined to a COMP node. A trace of N is left behind and co-indexed with the pronominalized N: (1) [COMP N(+Pro)i ........ [N2 tracei ] ...... ] as in: [COMP thati John met [N2 tracei ] yesterday ] where i is the co-indexation link.</Paragraph> <Paragraph position="1"> The case of the passive construction is a little more complicated and needs to be explained. An object NP is moved to a subject position because the passivisation of the verb no longer allows the verb to assign case to its object NP and a θ-role to its subject NP (in an indirect manner): at d-structure we have, for example: [ [NP ] [INFL gives [N2 a book ] ] ] and at s-structure we have: [ [NP a booki ] [INFL is given [N2 tracei ] ] ]. 
At d-structure, the subject NP is not mentioned here. In a passive construction, the subject is not moved to a PP position (by N2). θ-roles are redistributed when the verb is passivized (this illustrates once again the prominent role played by the lexical description and the projection principle) and a by-complement with the previous θ-role of the subject NP is created.</Paragraph> <Paragraph position="2"> Another example is the subject-to-subject raising operation, where: It seems that Jane is on time becomes: Jane seems to be on time.</Paragraph> <Paragraph position="3"> Jane moves to a position without a θ-role (it is not θ-marked by seem). When the clause is on time is in the infinitive form, the subject NP position is no longer case-marked, forcing Jane to move: [INFL Janei seem [COMP tracei [VP to be on time ] ] ] Finally, let us consider the wh-construal construction occurring at logical form (LF) (May 1986). The representation of: Who saw what ? is at s-structure: [COMP2 [COMP whoi ] [INFL tracei saw [N what ] ] ] and becomes at LF: [COMP2 [COMP whatj ] [COMP whoi ] [INFL tracei saw tracej ] ].</Paragraph> <Paragraph position="4"> Both what and who are adjoined to a COMP node. This latter type of movement is also restricted by a small number of general principles based on the type of landing site a raised quantifier may occupy and on the nature of the nodes a quantifier can cross over when raised. The first type of constraint will be directly expressed in rules by means of features; the latter will be dealt with in section 5, devoted to Bounding theory, where a model of the subjacency constraint is presented.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Towards a computational </SectionTitle> <Paragraph position="0"> expression of movements Movements have to be expressed in a simple computational way. 
Let us consider the relative clause construction (wh-movement in general); all the other examples can be expressed in the same way.</Paragraph> <Paragraph position="1"> The relative clause construction can be expressed in a declarative way by stating, very informally, that: within the domain of a COMP, an N(+Pro) is adjoined to that COMP and, somewhere else in that domain, an N2 is derived into a trace co-indexed with that N(+Pro). The notion of domain associated with a node like COMP refers to Bounding theory and will be detailed in section 5. The constraint on the co-existence in that domain of an N(+Pro) adjoined to a COMP and, somewhere else, of an N2 derived into a trace can be directly expressed by constraints on syntactic trees and, thus, by constraints on proof trees in an operational framework. This is precisely the main motivation of DISLOG, which we now briefly introduce.</Paragraph> <Paragraph position="2"> 3. An Introduction to DISLOG, Programming in Logic with Discontinuities.</Paragraph> <Paragraph position="3"> Dislog is an extension of Prolog. It is a language composed of standard Prolog clauses and of Dislog clauses. Its computational aspects are similar to those of Prolog. The foundations of DISLOG are given in (Saint-Dizier 1988b). We now introduce and briefly illustrate the main concepts of Dislog.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1. Dislog clauses </SectionTitle> <Paragraph position="0"> A Dislog clause is a finite, unordered set of Prolog clauses fi of the form: {f1, f2, ......, fn}.</Paragraph> <Paragraph position="1"> The informal meaning of a Dislog clause is: if a clause fi in a Dislog clause is used in a given proof tree, then all the other fj of that Dislog clause must be used to build that proof tree, with the same substitutions applied to identical variables. 
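The all-or-nothing reading of a Dislog clause can be sketched operationally. The following Python fragment is a hypothetical illustration (the function and names are ours, and it deliberately ignores the condition on shared substitutions): given the set of clause identifiers used in a finished proof tree, it checks that every Dislog clause was used entirely or not at all.

```python
# Hypothetical sketch of the Dislog-clause condition (not the actual
# Dislog implementation): a proof tree is abstracted as the set of
# clause identifiers it uses; each Dislog clause is a set of such
# identifiers that must be used all-or-nothing.

def satisfies_dislog(used_clauses, dislog_clauses):
    """True iff, for each Dislog clause, either none or all of its
    member clauses occur among the clauses used in the proof."""
    for group in dislog_clauses:
        present = group & used_clauses
        if present and present != group:
            return False  # some fi used without the other fj
    return True

program = [frozenset({"f1", "f2", "f3"})]
print(satisfies_dislog({"f1", "f2", "f3", "g"}, program))  # True
print(satisfies_dislog({"f1", "g"}, program))              # False
```

A clause outside any Dislog clause (like g above) is unconstrained, which mirrors the fact that a singleton Dislog clause is just a Prolog clause.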
For example, the Dislog clause (with empty bodies here, for the sake of clarity): { arc(a,b), arc(e,f) }.</Paragraph> <Paragraph position="2"> means that, in a graph, the use of arc(a,b) to construct a proof is conditional on the use of arc(e,f). If one is looking for paths in a graph, this means that every path going through arc(a,b) will also have to go through arc(e,f).</Paragraph> <Paragraph position="3"> A Dislog clause with a single element is equivalent to a Prolog clause (also called a definite program clause).</Paragraph> <Paragraph position="4"> A Dislog program is composed of a set of Dislog clauses. The definition of a predicate p in a Dislog program is the set of all Dislog clauses which contain at least one definite clause with head predicate symbol p. Here is an example of a possible definition for p:</Paragraph> <Paragraph position="6"> A full example is given in section 3.3.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Constraining Dislog clauses </SectionTitle> <Paragraph position="0"> We now propose some simple restrictions of the above general form of Dislog clauses. A first type of restriction is to impose an order on the use of Prolog clauses in a Dislog clause. We say that an instance of a clause ri precedes an instance of a clause rj in a proof tree if either ri appears in that proof tree to the left of rj or ri dominates rj. Notice that this notion of precedence is independent of the strategy used to build the proof tree. In the following diagram, the clause a :- a1 precedes the clause b :- b1:</Paragraph> <Paragraph position="2"> To model this notion of precedence, we add to Dislog clauses the traditional linear precedence restriction notation, with the meaning given above: a < b means that the clause with head a precedes the clause with head b (clause numbers can also be used). 
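To make the path reading of the arc example concrete, here is a small Python simulation over an assumed toy graph (the graph and helper names are ours, not from the paper): it enumerates acyclic paths and keeps only those satisfying the Dislog clause, i.e. a path using arc(a,b) must also use arc(e,f).

```python
# Toy graph, assumed for illustration only.
ARCS = {("a", "b"), ("b", "e"), ("e", "f"), ("b", "f"), ("f", "g")}

def paths(src, dst, used=()):
    """Enumerate paths as tuples of arcs, never reusing an arc."""
    if src == dst:
        yield used
    for (x, y) in ARCS:
        if x == src and (x, y) not in used:
            yield from paths(y, dst, used + ((x, y),))

def admissible(path):
    """Dislog clause { arc(a,b), arc(e,f) }: using arc(a,b)
    commits the proof to using arc(e,f) as well."""
    return ("a", "b") not in path or ("e", "f") in path

ok = [p for p in paths("a", "g") if admissible(p)]
# The path a-b-f-g is rejected; only a-b-e-f-g survives.
```

In real Dislog the filtering is interleaved with proof construction rather than applied afterwards; the post-hoc filter is only meant to show which answers the constraint retains.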
When the clause order in a Dislog clause is complete, we use the more convenient notation: f1 / f2 / ............ / fn.</Paragraph> <Paragraph position="3"> which means that f1 precedes f2, which precedes f3, etc. The relation / is viewed as an accessibility relation.</Paragraph> <Paragraph position="4"> Another improvement to Dislog clauses is the adjunction of modalities. We want to allow Prolog clauses in a Dislog clause to be used several times. This makes it possible to deal, for example, with parasitic gaps and with pronominal references. We use the modality m, applied to a clause, to express that this clause can be used any number of times in a Dislog clause. For example, in:</Paragraph> <Paragraph position="6"> the clause f3 can be used any number of times, provided that f1 and f2 are used. Substitutions for identical variables remain the same as before.</Paragraph> <Paragraph position="7"> Another notational improvement is the use of the semi-colon ';', with a similar meaning as in Prolog, to factor out rules having similar parts:</Paragraph> <Paragraph position="8"> {a,b}. and {a,c}. can be factored out as: {a,(b;c)}.</Paragraph> <Paragraph position="9"> which means that a must be used with either b or c.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Programming in Dislog </SectionTitle> <Paragraph position="0"> Here is a short and simple example where Dislog turns out to be very well-adapted.</Paragraph> <Paragraph position="1"> In a conventional programming language, there are several one-to-one or one-to-many relations between non-contiguous instructions. For instance, there is a relation between a procedure and its corresponding calls, and another relation between a label declaration and its corresponding branching instructions. The Dislog rule format is very well adapted to expressing those relations, permitting variables to be shared between several definite clauses in a Dislog clause. 
These variables can percolate, for example, addresses of entry points.</Paragraph> <Paragraph position="2"> We now consider the compiler given in (Sterling and Shapiro 1986), which transforms a program written in a simplified version of Pascal into a set of basic instructions (built in the argument). This small compiler can be augmented with two Dislog rules: {procedure declaration, procedure call(s)}.</Paragraph> <Paragraph position="3"> {label statement, branching instruction(s) to label}.</Paragraph> <Paragraph position="4"> In order for a procedure call to be allowed to appear before the declaration of the corresponding procedure, we do not state any linear precedence restriction. Furthermore, procedure call and branching instruction description rules are in a many-to-one relation with, respectively, the procedure declaration and the label declaration. A procedure call may indeed appear several times in the body of a program (this is precisely the role of a procedure, in fact). Thus, we have to use the modality m as follows: {procedure declaration, m(procedure call)}.</Paragraph> <Paragraph position="5"> {label statement, m(branching instruction to label)}.</Paragraph> <Paragraph position="6"> In a parse tree corresponding to the syntactic analysis of a Pascal program, we could have, for example, the following tree:</Paragraph> <Paragraph position="8"> The main calls and the Dislog rules are the following:</Paragraph> <Paragraph position="10"> We have carried out an efficient and complete implementation in which Dislog rules are compiled into Prolog clauses.</Paragraph> <Paragraph position="11"> 4. Expressing movement rules in Dislog A way of thinking of move-α (as in Sells 1985) is that it expresses the 'movement' part of a relation between two structures. We quote the term movement because, in our approach, we no longer deal with d-structure and thus no longer have movements, but rather long-distance relations or constraints. 
We think that, in fact, move-α is itself the relation (or the prototype of the relation) and that the constraints (case assignment, θ-marking, bounding theory, etc.) are just specific arguments of, or constraints on, that relation: everything is possible (the relation) and the constraints filter out incorrect configurations. From this point of view, Dislog is a simple and direct computational model for move-α.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Expressing movement in Dislog </SectionTitle> <Paragraph position="0"> The relativisation rule given above is expressed in a straightforward way by a Dislog clause. That Dislog clause is composed of two Prolog(-like) clauses. The first clause deals with the adjunction of the N(+Pro) to the COMP and the second clause deals with the derivation of the N2 into a trace. A shared variable I permits us to establish the co-indexation link. The Dislog clause is the following, in which we adopt the</Paragraph> <Paragraph position="2"> An xp is a predicate which represents any category. The category is specified in the first argument, the bar level in the second, syntactic features in the third (oversimplified here); the fourth argument is the co-indexation link and the last one, not dealt with here, contains the logical form associated with the rule. Notice that using identical variables (namely, here, I and Case) in two different clauses in a Dislog clause permits the transfer of feature values in a very simple and transparent way.</Paragraph> <Paragraph position="3"> The passive construction is expressed in a similar way. Notice that we are only interested in the s-structure description, since we produce annotated surface forms (from which we then derive a semantic representation). The passive construction rule in Dislog is:</Paragraph> <Paragraph position="5"> Case and θ-role are lexically induced. 
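The effect of the shared index variable I can be imitated in Python (a purely illustrative sketch; the binding dictionary stands in for Prolog unification): because both clauses of the relativisation Dislog clause consult the same binding environment, the relative pronoun and its trace necessarily carry the same index.

```python
# Illustrative sketch only: a shared binding environment plays the role
# of the unification of the variable I across the two clauses of the
# relativisation Dislog clause.
import itertools

fresh_index = itertools.count(1)

def adjoin_pronoun(bindings):
    """First clause: adjoin N(+Pro) to COMP, creating the index I."""
    bindings["I"] = next(fresh_index)
    return ("comp", "that", bindings["I"])

def derive_trace(bindings):
    """Second clause: derive the N2 into a trace co-indexed via I."""
    return ("trace", bindings["I"])

env = {}                       # one environment per Dislog-clause instance
filler = adjoin_pronoun(env)
gap = derive_trace(env)
assert filler[2] == gap[1]     # the co-indexation link holds
```

A fresh environment per instance of the Dislog clause keeps the indexation of one relative clause independent of the next, just as fresh Prolog variables would.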
Following a specification format like that of (Sells 1985), we have, for example, for the verb to eat the following lexical entry: eat, V, (subject:NP, agent), (object:NP, patient), assigns case to object.</Paragraph> <Paragraph position="6"> which becomes, with the passive inflection: eaten, V, (object:NP, patient), assigns no case. (The by-complement is also lexically induced, by a lexical transformation of the same kind, with: iobject:NP, agent, case: ablative.) Let us now consider the subject-to-subject raising operation. At d-structure, the derivation of an N2 into the dummy pronoun it is replaced by the derivation of that N2 into an overt noun phrase. This is formulated as follows in Dislog:</Paragraph> <Paragraph position="8"> xp(n,2,Case,I,_) --> trace(I).</Paragraph> <Paragraph position="9"> xp(n,2,Case,I,_), The movement construction rules given above have many similarities. They can be informally put together to form a single, partially instantiated movement rule, roughly as follows:</Paragraph> <Paragraph position="11"/> </Section> <Section position="8" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Other uses of Dislog for </SectionTitle> <Paragraph position="0"> natural language processing Dislog has many other uses in natural language processing. At the semantic level, it can be used in a convenient way as a computational model to deal with quantifier raising, with negation and modality operator raising operations, or to model some meaning postulates in Montague semantics. Dislog can also provide a simple model for temporal relations involving the notion of (partial) precedence of actions or events.</Paragraph> <Paragraph position="1"> Semantic interpretation or formula optimisation often involves putting together or rewriting elements which are not necessarily contiguous in a formula. 
Dislog rules can then be used as rewriting rules.</Paragraph> <Paragraph position="2"> In order to properly anchor the N2, we have to repeat in the Dislog rule a rule from the base component (the rule with infl). Once again, this is lexically induced from the description of the verb to seem: when the N2 is raised, the proposition following the completive verb has no subject; it is tenseless, i.e. in the infinitive form. Finally, notice the case variable, designed to maintain the case chain. The wh-construal construction at LF is dealt with in exactly the same manner; an N2(+pro) is adjoined to a COMP node:</Paragraph> <Paragraph position="4"> xp(n,2,Case,I,_) --> trace(I).</Paragraph> <Paragraph position="5"> Case permits the distinction between different pronouns. Notice that this rule is exactly like the relative construction rule.</Paragraph> <Paragraph position="6"> Dislog rules describing movements can be used in any order and are independent of the parsing strategy. They are simple, but their interactions can become quite complex. However, the high level of declarativity of Dislog permits us to control movements in a sound way.</Paragraph> <Paragraph position="7"> Finally, at the level of syntax, we have shown in (Saint-Dizier 1987) that Dislog can be efficiently used to deal with free phrase order or free word order languages, producing as a result a normalized syntactic tree. Dislog can also be used to skip parts of sentences which cannot be parsed.</Paragraph> </Section> <Section position="9" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Formal grammatical aspects of Dislog rules </SectionTitle> <Paragraph position="0"> A Dislog rule can be interpreted by a term attribute grammar. A term attribute grammar has arguments which are terms. It is a context-free grammar that has been augmented with conditions (on arguments) enabling non-context-free aspects of a language to be specified. 
A Dislog rule can be translated as follows into a term attribute grammar.</Paragraph> <Paragraph position="1"> Consider the rule: a --> b / c --> d.</Paragraph> <Paragraph position="2"> a possible (and simple) interpretation is: a(X,Y) --> b(X,X1), add(X1,[c-->d],Y).</Paragraph> <Paragraph position="3"> b(X,Y) --> withdraw([c-->d],X,Y1), d(Y1,Y).</Paragraph> <Paragraph position="4"> When a --> b is executed, the rule c --> d is stored in an argument (X and Y represent input and output arguments for storing the rules to be executed, just as strings of words are stored in DCGs). c --> d can only be executed if it is present in the list. At the end of the parsing process, the list of rules to be executed must be empty (except for rules marked with the modality m). Notice also that shared variables in a Dislog rule are unified and further percolated when rules are stored by the procedure add.</Paragraph> <Paragraph position="5"> Dislog is, however, more general and more powerful because it deals with unordered sets of rules rather than with a single, rigid rewriting rule; it also permits the introduction of modalities, and no extra symbols (to represent skips or to avoid loops) need to be introduced (see Saint-Dizier 1988b).</Paragraph> <Paragraph position="6"> Dislog rules can be used to express context-sensitive languages. For example, consider the language L = {a^n b^m c^n d^m, with n, m positive integers}; it is recognized by the following grammar. Stating that the number of a's is equal to the number of c's and that the number of b's is equal to the number of d's, we have: { (S --> [a], S), (S --> [c], S) }. { (S --> [b], S), (S --> [d], S) }. 
S --> [a] / [b] / [c] / [d].</Paragraph> <Paragraph position="7"> Bounding nodes and modalities can also be added to deal with more complex languages.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.4 Related works </SectionTitle> <Paragraph position="0"> Dislog originates a new type of logic-based grammar that we call Contextual Discontinuous Grammars. The formalisms closest to Dislog are Extraposition Grammars (Pereira 1981) and Gapping Grammars (Dahl and Abramson 1984). As opposed to Gapping Grammars, Dislog deals with trees rather than with graphs. Gapping Grammars are of type 0 and are much more difficult to write and to control the power of. Compared to Extraposition Grammars, Dislog no longer operates movements of strings, and it is also more general, since a Dislog clause can contain any number of Prolog clauses which can be used in any order and at any place within a domain. Extraposition Grammars also involve graphs (although much simpler ones than those of Gapping Grammars) instead of trees, which are closer to the linguistic reality. The implementation of Dislog is about as efficient as the very insightful implementation provided by F. Pereira.</Paragraph> <Paragraph position="1"> 5. Bounding Theory in Dislog Bounding theory is a general phenomenon common to several linguistic theories and expressed in very similar ways. Roughly speaking, Bounding theory states constraints on the way constituents can be moved or, in non-transformational terms, on the way relations can be established between non-contiguous elements in a sentence. The main type of constraint is expressed in terms of domains over the boundaries of which relations cannot be established. 
For example, if A is a bounding node (or a subtree which is a sequence of bounding nodes), then the domain of A is the domain of which it is the root, and no constituent X inside that domain can have relations with a constituent outside it (at least not directly). In Dislog, if an instance of a Dislog clause is activated within the domain of a bounding node, then the whole Dislog clause has to be used within that domain. For a given application, bounding nodes are specified as a small database of Prolog facts and are interpreted by the Dislog system.</Paragraph> <Paragraph position="2"> More recently (Dahl, forthcoming), Static Discontinuity Grammars have been introduced, motivated by the need to model GB theory for sentence generation. They overcome some drawbacks of Gapping Grammars by prohibiting movements of constituents in rules. They have also borrowed several aspects from Dislog (like bounding nodes and its procedural interpretation). In the case of Quantifier Raising, we have several types of bounding nodes: the nodes of syntax, nodes corresponding to conjunctions, modals, some temporal expressions, etc. Those nodes are declared as bounding nodes and are then processed by Dislog in a way transparent to the grammar writer.</Paragraph> <Paragraph position="3"> 6. An implementation of Dislog for natural language processing We have carried out a specific implementation of Dislog for natural language parsing, described in (St-Dizier, Toussaint, Delaunay and Sebillot 1989). 
The very regular format of the grammar rules (X-bar syntax) permits us to define a specific implementation which, in spite of the high degree of parametrization of the linguistic system, is very efficient.</Paragraph> <Paragraph position="1"> We use a bottom-up parsing strategy similar to that given in (Pereira and Shieber 1987), with some adaptations due to the very regular rule format of X-bar syntax rules, and a one-step look-ahead mechanism which very efficiently anticipates the rejection of many inappropriate rules. The sentences we have worked on involve several complex constructions; they are parsed in 0.2 to 2 seconds of CPU time in Quintus Prolog on a SUN 3.6 workstation.</Paragraph> </Section> </Section> </Paper>