File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/92/c92-1057_metho.xml
Size: 24,134 bytes
Last Modified: 2025-10-06 14:12:54
<?xml version="1.0" standalone="yes"?> <Paper uid="C92-1057"> <Title>A Generalized Greibach Normal Form for Definite Clause Grammars</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 The Generalized Greibach Normal </SectionTitle> <Paragraph position="0"> Form for Definite Clause Grammars</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.1 Definite Grammar Schemes </SectionTitle> <Paragraph position="0"> As usually defined (see \[6\]), a definite clause grammar consists of two separate sets of clauses: 1. Nonterminal clauses, written as: a(T1, ..., Tn) → α, where a is a nonterminal, T1, ..., Tn are terms (variables, constants, or complex terms), and α is a sequence of &quot;terminal goals&quot; \[term\], of &quot;nonterminal goals&quot; b(T'1, ..., T'm), and of &quot;auxiliary predicate goals&quot; {p(T''1, ..., T''k)}.</Paragraph> <Paragraph position="1"> 5 See the Appendix for some indications on the methods used.</Paragraph> <Paragraph position="2"> 6 Thus, a side-effect of the GGNF is to provide a decision procedure for the problem of knowing whether a DCG is offline-parsable or not. This is equivalent to deciding whether a context-free grammar is infinitely ambiguous or not, a problem the decidability of which seems to be &quot;quasi-folk-knowledge&quot;, although I was innocent of this until the fact was brought to my attention by Yves Schabes, among others: the proof is more or less implicit in the usual technique to make a CFG &quot;cycle-free&quot; \[1, p. 150\]. See also \[4\] for a special case of this problem. (Caveat: the notion of &quot;cycle&quot; in &quot;cycle-free&quot; is technically different from the notion used here, which simply means a cycle in the graph associated with the relation &quot;callable from&quot;. See note 10.) 2. 
Auxiliary clauses, constituting an autonomous definite program, defining the auxiliary predicates appearing in the right-hand sides of nonterminal rules.</Paragraph> <Paragraph position="3"> These clauses are written: p(T1, ..., Tn) :- β, where β is some sequence of predicate goals.</Paragraph> <Paragraph position="4"> A definite grammar scheme DGS is syntactically identical to a definite clause grammar, except for the fact that: 1. The Ti arguments appearing in the nonterminal and auxiliary predicate goals are restricted to being variables: no constants or complex terms are allowed; 2. Only nonterminal clauses appear in the definite grammar scheme, but no auxiliary clause; the auxiliary predicates which may appear in the right-hand sides of clauses do not receive a definition.</Paragraph> <Paragraph position="5"> A definite grammar scheme can be seen as an incompletely specified definite clause grammar, that is, a definite clause grammar &quot;lacking&quot; a definition for the auxiliary predicates {p(X1, ..., Xn)} that it &quot;uses&quot;. The auxiliary predicates are &quot;free&quot;: the interpretation of p is not fixed a priori, but can be an arbitrarily chosen n-ary relation on a certain Herbrand universe of terms. 7 Example 1 The following clauses define a definite</Paragraph> <Paragraph position="7"> In this definite grammar scheme, only variables appear as arguments; the auxiliary predicates p1, p2, p3 and q do not receive a definition.</Paragraph> <Paragraph position="8"> If a definite program defining these auxiliary predicates is added to the definite grammar scheme, one obtains a full-fledged definite clause grammar, which can be interpreted in the usual manner.</Paragraph> <Paragraph position="9"> Conversely, every definite clause grammar can be seen as the conjunction of a definite grammar scheme and of a definite clause program. 
In order to do so, a minor transformation must be performed: each complex term T appearing as an argument in the head or body of a nonterminal clause must be replaced by a variable X, this variable being implicitly constrained to unify with T through the addition of an ad-hoc unification goal in the body of the clause (see \[3\]).</Paragraph> <Paragraph position="10"> 7 The domain of interpretation can in fact be any set. Taking it to be the Herbrand universe over a certain vocabulary of functional symbols makes it possible to &quot;simulate&quot; a DCG, by fixing the interpretation of the free auxiliary predicates in this domain. Another linguistically relevant domain of interpretation is the set of directed acyclic graphs built over a certain vocabulary of labels and atomic symbols, which permits the simulation of unification grammars of the PATR-II type.</Paragraph> <Paragraph position="11"> 8 The usual symbol for the initial nonterminal is s; we prefer to use a1 for reasons of notational coherence.</Paragraph> <Paragraph position="13"> p1(nil).</Paragraph> <Paragraph position="14"> p3(r).</Paragraph> <Paragraph position="15"> Let us define two new predicates p2 and q by the following clauses: p2(f).</Paragraph> <Paragraph position="16"> q(E, A, B, cons(B, A)). Then the definite clause grammar above can be rewritten as:</Paragraph> <Paragraph position="18"> p1(nil).</Paragraph> <Paragraph position="19"> p2(f).</Paragraph> <Paragraph position="20"> p3(r).</Paragraph> <Paragraph position="21"> q(E, A, B, cons(B, A)). that is, in the form of a definite grammar scheme to which has been added a set of auxiliary clauses defining its auxiliary predicates. This definite grammar scheme is in fact identical with DGS1 (see previous example). In the remainder of this paper, we will be interested, not directly in transformations of definite clause grammars, but in transformations of definite grammar schemes. 
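The argument-abstraction step described above (replacing each non-variable argument by a fresh variable and adding an ad-hoc unification goal) can be sketched as follows. This is only an illustrative Python sketch under stated assumptions, not the procedure of \[3\]: terms are encoded as strings (variables start with an uppercase letter, constants with a lowercase one) or as tuples (functor, arg1, ..., argn); the names abstract_clause, aux1, aux2, ... and V1, V2, ... are hypothetical, and the fresh variable names are assumed not to occur in the clause already. Terminal goals are left out for brevity.

```python
import itertools


def is_var(term):
    # Assumed convention: a variable is a string starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def variables_of(term):
    # Collect the variables of a term in order of first occurrence.
    if is_var(term):
        return [term]
    if isinstance(term, tuple):
        seen = []
        for arg in term[1:]:
            for var in variables_of(arg):
                if var not in seen:
                    seen.append(var)
        return seen
    return []  # a constant contains no variables


def abstract_clause(head, body):
    """Turn one nonterminal clause into scheme form.

    head and every goal in body are tuples (name, arg1, ..., argn).
    Returns (new_head, new_body, aux_goals, aux_clauses).
    """
    counter = itertools.count(1)
    aux_goals, aux_clauses = [], []

    def abstract_goal(goal):
        new_args = []
        for arg in goal[1:]:
            if is_var(arg):
                new_args.append(arg)
                continue
            k = next(counter)
            fresh, pred = "V%d" % k, "aux%d" % k  # hypothetical fresh names
            shared = variables_of(arg)
            # Goal added to the clause body: {auxk(shared..., fresh)}
            aux_goals.append((pred, *shared, fresh))
            # Head clause defining auxk as a unification with the original term
            aux_clauses.append((pred, *shared, arg))
            new_args.append(fresh)
        return (goal[0], *new_args)

    new_head = abstract_goal(head)
    new_body = [abstract_goal(goal) for goal in body]
    return new_head, new_body, aux_goals, aux_clauses
```

For the clause a1(cons(B, A)) → a2(B), a1(A), encoded as abstract_clause(("a1", ("cons", "B", "A")), [("a2", "B"), ("a1", "A")]), the sketch returns the head a1(V1), the unchanged body, the auxiliary goal {aux1(B, A, V1)}, and the auxiliary clause aux1(B, A, cons(B, A)).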
The transformation of a definite grammar scheme DGS into DGS' will respect the following conditions: * The auxiliary predicates of DGS and of DGS' are the same; * For any definite clause program P which defines the auxiliary predicates in DGS (and therefore also those in DGS'), the definite clause grammar DCG obtained through the adjunction of P to DGS has the same denotational semantics as the definite clause grammar DCG' obtained through the adjunction of P to DGS'.</Paragraph> <Paragraph position="22"> Under the preceding conditions, DGS and DGS' are said to be equivalent definite grammar schemes. The grammar transformations thus defined are, in a certain sense, universal transformations: they are valid independently from the interpretation given to the auxiliary predicates.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 GGNF for definite clause grammars </SectionTitle> <Paragraph position="0"> Structure of the GGNF The definite grammar scheme DGS, on the terminal vocabulary V, having Q as its set of auxiliary predicates, is said to be in Generalized Greibach Normal Form if: 1. The nonterminals of DGS are partitioned into three distinct subsets: A = {a1}, called the initial set; U, called the set of unit nonterminals; N, called the set of co-unit nonterminals.</Paragraph> <Paragraph position="1"> 2. The rules of DGS are partitioned into three groups of rules, called the factorization group (defining the elements of A), the unit group (defining the elements of U), and the co-unit group (defining the elements of N), graphically presented in the following manner: factorization rules: definition of the elements of A; unit rules: definition of the elements of U; co-unit rules: definition of the elements of N. 3. Factorization rules are taken among the two following rules: a1(X1, ..., Xn) → n1(X1, ..., Xn), and/or: a1(X1, ..., Xn) → u1(X1, ..., 
Xn), where n1 ∈ N and u1 ∈ U, and where n ∈ ℕ is the arity of the initial nonterminal a1.</Paragraph> <Paragraph position="2"> 4. Unit rules are of the form: u(X1, ..., Xm) → 𝒰, where u ∈ U is a unit nonterminal of arity m, m ∈ ℕ, and where 𝒰 is a finite sequence of nonterminal unit goals of U, of auxiliary predicates of Q, or is the empty string \[\]. The group of unit rules forms a subscheme of the GGNF definite grammar scheme (see below).</Paragraph> <Paragraph position="3"> 5. Co-unit rules are of the form:</Paragraph> <Paragraph position="5"> where n ∈ N is a co-unit nonterminal of arity k, k ∈ ℕ, where \[term\] ∈ V, and where 𝒩 is a finite sequence of terminal goals of V, of nonterminal unit goals of U, of auxiliary predicates of Q, or of nonterminal co-unit goals of N.</Paragraph> <Paragraph position="6"> 6. The context-free skeleton of DGS, considered as a context-free grammar, is reduced. 9 A context-free grammar is said to be reduced iff all its nonterminals are accessible from a1 and are productive (see \[5, pp. 73-78\]). ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 368 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 The nonterminals of A, U, and N are defined in terms of one another, as well as in terms of \[ \], of the terminals of V, and of the auxiliary predicates of Q, according to the definitional hierarchy illustrated below: \[Figure: definitional hierarchy of A, U, N, \[ \], V and Q\]</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.3 Structure of the unit subscheme and offline-parsability </SectionTitle> <Paragraph position="0"> One can remark that: * The group of unit rules is closed: the definition of unit nonterminals involves only unit nonterminals (but no co-unit nonterminal). 
For this reason, the group of unit rules is called the unit subscheme (or, loosely, the unit subgrammar) of the GGNF definite grammar scheme.</Paragraph> <Paragraph position="1"> * The unit subscheme can only generate the empty string \[ \].</Paragraph> <Paragraph position="2"> The unit subscheme of the GGNF is said to contain a cycle iff there exists a unit nonterminal u(X1, ..., Xm) which &quot;calls itself recursively&quot;, directly or indirectly, inside this group. 10 One can show that this property is equivalent to the fact that the context-free skeleton of DGS is infinitely ambiguous, or, in other words, to the fact that DGS is not offline-parsable \[3\].</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.4 Top-down parsing with the GGNF </SectionTitle> <Paragraph position="0"> Let DGS be a definite grammar scheme in GGNF, having Q for its set of auxiliary predicates. Assume that every element p of Q, of arity n, is defined through a head clause 11 of the form: p(T1, ..., Tn).</Paragraph> <Paragraph position="1"> where T1, ..., Tn can be any terms; in other words, the auxiliary predicates are constrained to be simply unifications. Let DCG be the definite clause grammar obtained through adjunction of these clauses to DGS. The grammar DCG has the following properties: 10 For example, the scheme:</Paragraph> <Paragraph position="3"> contains a cycle in u1. 11 We use the terminology 'head clause' for a clause without body. A more standard terminology would be 'unit clause', but this would conflict with our technical notion of &quot;unit&quot; (a nonterminal generating the empty string \[ \]).</Paragraph> <Paragraph position="4"> 1. If the unit subscheme does not contain a cycle, then, for any input string String, the standard top-down parsing algorithm terminates, after enumerating all the analyses for String; 2. 
If the unit subscheme contains a cycle, the top-down parsing algorithm can terminate or not, depending on the definition given to the auxiliary predicates.</Paragraph> <Paragraph position="5"> We give below three examples of definite grammar schemes, and of the equivalent definite grammar schemes in GGNF.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.5 Examples </SectionTitle> <Paragraph position="0"> Example 3 Consider the definite grammar scheme</Paragraph> <Paragraph position="2"> The following definite grammar scheme DGS2 is in GGNF and is equivalent to DGS1:</Paragraph> <Paragraph position="4"> {q(E, 1I, B, Z)}, h(Z, X) Suppose P is any auxiliary definite program which defines the auxiliary predicates p1, p2, p3, q. Then the definite clause grammars DCG1 and DCG2, obtained by adjunction of this program to DGS1 and DGS2, respectively, are equivalent.</Paragraph> <Paragraph position="5"> The unit subscheme of DGS2 does not contain a cycle (it is empty 12). One can conclude that DCG1, as well as DCG2, are offline-parsable. If, moreover, it is assumed that P defines the auxiliary predicates as being unifications, then it can be concluded that top-down parsing with DCG2 will enumerate all possible analyses for a given string and terminate.</Paragraph> <Paragraph position="6"> For instance, assume that the auxiliary program consists of the following four clauses (see Example 2): p1(nil).</Paragraph> <Paragraph position="8"> {q(E, V, B, Z)}, h(Z, X) p1(nil).</Paragraph> <Paragraph position="9"> p2(f).</Paragraph> <Paragraph position="10"> p3(r).</Paragraph> <Paragraph position="11"> q(E, A, B, cons(B, A)). These two definite clause grammars are declaratively equivalent. They both accept strings of the form: oh oui ... 
oui where oui is repeated k times, k ∈ ℕ, and assign to each of these strings the (single) analysis represented by the term:</Paragraph> <Paragraph position="13"> On the other hand, from the operational point of view, if a top-down parsing algorithm is used, DCG1 loops on any input string, 13 while DCG2 enumerates all solutions on backtracking (here, zero or one solution, depending on whether the string is in the language generated by the grammar) and terminates.</Paragraph> <Paragraph position="14"> Example 4 Consider the following definite grammar scheme DGS3:</Paragraph> <Paragraph position="16"> 13 Remark that DCG1 is left recursive in a &quot;vicious&quot; (covert) way: nonterminal a1 calls itself, not immediately, but after calling a3, which does not consume anything in the input string.</Paragraph> <Paragraph position="17"> The GGNF of DGS3 is DGS4 below:</Paragraph> <Paragraph position="19"> From an inspection of DGS4 it can be concluded that: * The unit subscheme does not contain a cycle. 14 Therefore DGS4, and consequently DGS3, is offline-parsable.</Paragraph> <Paragraph position="20"> * If DCG3 (resp. DCG4) is the definite clause grammar obtained through the adjunction to DGS3 (resp. DGS4) of clauses defining the auxiliary predicates p, q, r, then DCG3 and DCG4 are equivalent; furthermore, if these definitions make p, q, r unification predicates, then top-down parsing with DCG4 terminates, after enumerating all solutions.</Paragraph> <Paragraph position="21"> Example 5 Consider the following definite grammar</Paragraph> <Paragraph position="23"> From an inspection of DGS6 it can be concluded that: * The unit subscheme contains a cycle. Therefore neither DGS6 nor DGS5 is offline-parsable.</Paragraph> <Paragraph position="24"> * If DCG5 (resp. 
DCG6) is the definite clause grammar obtained through the adjunction to DGS5 (resp.</Paragraph> <Paragraph position="25"> DGS6) of clauses defining the auxiliary predicates p, q, then DCG5 and DCG6 are equivalent; * Even if p, q are defined as unifications, top-down parsing with DCG6 may not terminate.</Paragraph> <Paragraph position="26"> Regarding the last point, let us show that different definitions for p and q result in different computational behaviors: 14 It can easily be shown that, iff this is the case, then the unit nonterminals can be completely eliminated, as in the case of example 3 above.</Paragraph> <Paragraph position="27"> Situation 1 Assume that p, q are defined through the following clauses: p(nil).</Paragraph> <Paragraph position="28"> q(f(X), X).</Paragraph> <Paragraph position="29"> In such a situation, top-down parsing with DCG6 of the input string oh does not terminate: an infinite number of solutions (X = nil, X = f(nil), ...) are enumerated on backtracking and the program loops. 15 Situation 2 Assume that p, q are defined by the following clauses: p(X) :- fail.</Paragraph> <Paragraph position="30"> q(nil, nil).</Paragraph> <Paragraph position="31"> The first clause defines p as being 'false' (omitting any clause for p would have the same result). In such a situation, top-down parsing with DCG6 terminates.</Paragraph> </Section> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Conclusions: </SectionTitle> <Paragraph position="0"> * Few, if any, normal form results for DCGs (and for their close relatives, unification grammars) were previously known. 
The GGNF transformation can be applied to any DCG, whether offline-parsable or not.</Paragraph> <Paragraph position="1"> * In the GGNF, the potential sources of undecidability of the parsing problem are factorized in the unit subgrammar, a grammar &quot;over&quot; the empty string \[ \]. The GGNF as a whole is offline-parsable exactly when its unit subgrammar is. This is the case iff the unit subgrammar does not contain a nonterminal calling itself recursively.</Paragraph> <Paragraph position="2"> * The GGNF seems to provide the closest analogue to the GNF that one can hope to find for DCGs. 16 * If the DCG (or equivalently its GGNF) is offline-parsable then top-down parsing with the GGNF finds all solutions to the parsing problem and terminates.</Paragraph> <Paragraph position="3"> * The transformation to GGNF can be specialized to the simpler case of a context-free grammar. In this case, the GGNF provides a variant of the standard GNF preserving degrees of ambiguity. 17 15 This is not the worst possible case: here, at least, all solutions end up being enumerated on backtracking. This would not be the case with more complex definitions for p and q.</Paragraph> <Paragraph position="4"> 16 Justification for this claim is given in \[3\].</Paragraph> <Paragraph position="5"> 17 For lack of space, this point was not discussed in the paper. Consider the context-free grammar CFG (which is the skeleton of example 5): a1 → \[oh\], a1 → a1. The GNF for this grammar is the grammar: a1 → \[oh\]. The original grammar assigns an infinite degree of ambiguity to \[oh\], while its GNF does not (in fact it is easy to show that a GNF can never be infinitely ambiguous). On the other hand, Appendix: Some indications on the transformation method We can only give here some brief indications in the hope that the interested reader will be motivated to look into the full description given in \[3\]. 
We start with some comments on the GGNF in the CFG case and then move on to the case of definite grammar schemes.</Paragraph> <Paragraph position="6"> CFGs, Algebraic Systems, and the GGNF. The most powerful transformation methods existing for context-free grammars are &quot;algebraic&quot; (&quot;matrix based&quot; \[8\]) ones relying on the concepts of formal power series and algebraic systems (see \[5, 9\]). Using such concepts, a context-free grammar such as:</Paragraph> <Paragraph position="8"> which represents a fixpoint equation in the variables (or &quot;nonterminals&quot;) a1, a2 on a certain algebraic structure (a non-commutative semiring) of formal power series ℕ∞«V*», where ℕ∞ is the set of non-negative integers, extended to infinity. Informally, an element of ℕ∞«V*» represents a language on the vocabulary V (such that \[v\] ∈ V), where each string in the language is associated with a number, finite or infinite, which can be interpreted as the degree of ambiguity of this string relative to the system (or, equivalently, the corresponding CFG).</Paragraph> <Paragraph position="9"> In the example at hand, it can be easily verified that the following assignments of formal power series to a1, a2:</Paragraph> <Paragraph position="11"> satisfy the system. 18 In terms of the corresponding CFG, this fact implies that (1) the empty string \[ \] is recognized exactly once by the grammar, and (2) each of the strings \[v\], \[v\]\[v\], \[v\]\[v\]\[v\], ... 
is recognized exactly twice by the grammar.</Paragraph> <Paragraph position="12"> From the point of view of transformations, algebraic systems have certain important advantages over context-free grammars: (1) they make ambiguity degrees explicit, (2) they involve equations (rather than rewriting rules), the GGNF of CFG is:</Paragraph> <Paragraph position="14"> and it can be verified that it preserves degrees of ambiguity.</Paragraph> <Paragraph position="15"> This difference, which may be considered minor in the case of CFGs, plays an important role in the transformation of DCGs.</Paragraph> <Paragraph position="16"> solution to this system.</Paragraph> <Paragraph position="17"> where &quot;equals can be replaced by equals&quot;, and (3) they possess a rich algebraic structure (addition, multiplication) which endows them with mathematical perspicuity and power.</Paragraph> <Paragraph position="18"> There are some substantial differences between the transformation steps used to obtain the GGNF of an algebraic system and the standard ones used to obtain its GNF, the principal one lying in the necessity to preserve degrees of ambiguity at each step. In the GNF case, the initial step consists in first transforming the initial system into a proper system (a notion analogous to that of proper CFG), an operation which does not preserve degrees of ambiguity, and then performing the main transformation. 
For this reason, the transformation steps in the GGNF case must be formulated in a more global way, which, among other complications, involves the use of certain identities on regular languages. 19 However, there are also important similarities between the GNF and the GGNF transformations, among them the observation that the elementary algebraic system in the variable a on the vocabulary V = {\[v\], \[t\]}: a = a \[v\] + \[t\] has the unique solution a = \[t\] \[v\]*, an observation which can be much generalized, and which plays a central role in both cases.</Paragraph> <Paragraph position="19"> DCGs, Mixed Systems, and the GGNF. In order to define the GGNF in the case of Definite Grammar Schemes (or, equivalently, DCGs), we have introduced so-called mixed systems, a generalization of algebraic systems capable of representing association of structures to strings. Without going into details, let us consider the following definite grammar scheme:</Paragraph> <Paragraph position="21"> This scheme is reformulated as the mixed system: a1_x = a1_y a2_z p_x^{yz} + a2_y q_x^{y} + r_x, a2_x = \[v\] s_x. In this system, the variables (or nonterminals) a1, a2 are seen as functions E → B«V*» (where B is the set of booleans {0, 1}), that is, as functions mapping elements of a set E (often taken to be a Herbrand universe), representing linguistic structures, into formal series of B«V*», that is, into languages over V. This can be seen to correspond to the intuitive notion that a nonterminal &quot;associates&quot; structures to strings. As for p, q, r, s, they are seen respectively as functions from E³, E², E, E into B ⊂ B«V*», that is, as predicates of different arities over E. The system represents a fixpoint equation on the variables a1, a2, given the constants \[v\], p, q, r, s. 
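The observation made earlier in this appendix about the elementary system a = a \[v\] + \[t\] can be checked on a bounded fragment. The following Python sketch is an illustration only: the function names step and solve and the length bound max_len (assumed to be at least 1) are introduced here and do not come from the paper. It iterates the right-hand side starting from the empty series, keeps multiplicities, and discards words longer than the bound. The result is consistent with the solution a = \[t\] \[v\]*, each word having multiplicity 1; it does not by itself prove uniqueness.

```python
from collections import Counter


def step(series, max_len):
    # Apply the right-hand side a [v] + [t] once to a truncated series.
    result = Counter()
    result[("t",)] += 1                   # the constant term [t]
    for word, mult in series.items():
        extended = word + ("v",)          # right-multiplication by [v]
        if max_len >= len(extended):      # drop words beyond the length bound
            result[extended] += mult
    return result


def solve(max_len):
    # Iterate from the empty series until the truncated series stops changing.
    series = Counter()
    while True:
        nxt = step(series, max_len)
        if nxt == series:
            return series
        series = nxt


# Words are tuples of symbol names; values are degrees of ambiguity.
solution = solve(3)
```

With the bound 3, the computed series contains exactly the words \[t\], \[t\]\[v\] and \[t\]\[v\]\[v\], each with multiplicity 1.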
20 Although mixed systems are defined on more complex structures than are algebraic systems, the transformation methods for algebraic systems generalize without difficulty to their case, and these methods form the mathematical basis of the results reported in this paper.</Paragraph> <Paragraph position="22"> 19 Such as the identity (e + f)* = e*(f e*)*.</Paragraph> <Paragraph position="23"> 20 For the interested reader, the given system expresses (using &quot;conventions of summation&quot; familiar in tensor algebra) the</Paragraph> </Section> </Paper>