<?xml version="1.0" standalone="yes"?> <Paper uid="P94-1029"> <Title>AN EXTENDED THEORY OF HEAD-DRIVEN PARSING</Title> <Section position="2" start_page="0" end_page="216" type="ackno"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> We show that more head-driven parsing algorithms can be formulated than those occurring in the existing literature. These algorithms are inspired by a family of left-to-right parsing algorithms from a recent publication. We further introduce a more advanced notion of &quot;head-driven parsing&quot; which allows more detailed specification of the processing order of non-head elements in the right-hand side. We develop a parsing algorithm for this strategy, based on LR parsing techniques.</Paragraph> <Paragraph position="1"> Introduction According to the head-driven paradigm, parsing of a formal language is started from the elements within the input string that are most contentful, either from a syntactic or, more generally, from an information-theoretic point of view. This results in the weakening of the left-to-right feature of most traditional parsing methods. Following a pervasive trend in modern theories of Grammar (consider for instance [5, 3, 11]), the computational linguistics community has devoted considerable attention to the head-driven paradigm by investigating its applications to context-free language parsing.</Paragraph> <Paragraph position="2"> Several methods have been proposed so far exploiting some nondeterministic head-driven strategy for context-free language parsing (see among others [6, 13, 2, 14]). All these proposals can be seen as generalizations to the head-driven case of parsing prescriptions originally conceived for the left-to-right case. The methods above suffer from deficiencies that are also noticeable in the left-to-right case. 
In fact, when several rules in the grammar share the same head element, or share some infix of their right-hand side including the head, the recognizer nondeterministically guesses a rule just after having seen the head. In this way analyses that could have been shared are duplicated in the parsing process.</Paragraph> <Paragraph position="3"> Interesting techniques have been proposed in the left-to-right deterministic parsing literature to overcome redundancy problems of the above kind, thus reducing *Supported by the Dutch Organisation for Scientific Research (NWO), under grant 00-62-518</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Giorgio Satta </SectionTitle> <Paragraph position="0"> Università di Padova Dipartimento di Elettronica e Informatica via Gradenigo 6/A, 35131 Padova Italy satta@dei.unipd.it the degree of nondeterminism of the resulting methods. These solutions range from predictive LR parsing to LR parsing [15, 1]. On the basis of work in [8] for nondeterministic left-to-right parsing, we trace here a theory of head-driven parsing going from crude top-down and head-corner to more sophisticated solutions, in an attempt to make the behaviour of head-driven methods successively more deterministic.</Paragraph> <Paragraph position="1"> Finally, we propose an original generalization of head-driven parsing, allowing a more detailed specification of the order in which elements of a right-hand side are to be processed. We study in detail a solution to such a head-driven strategy based on LR parsing. 
Other methods presented in this paper could be extended as well.</Paragraph> </Section> <Section position="2" start_page="0" end_page="216" type="sub_section"> <SectionTitle> Preliminaries </SectionTitle> <Paragraph position="0"> The notation used in the sequel is for the most part standard and is summarised below.</Paragraph> <Paragraph position="1"> Let D be an alphabet (a finite set of symbols); D+ denotes the set of all (finite) non-empty strings over D and D* denotes D+ ∪ {ε}, where ε denotes the empty string. Let R be a binary relation; R+ denotes the transitive closure of R and R* denotes the reflexive and transitive closure of R.</Paragraph> <Paragraph position="2"> A context-free grammar G = (N, T, P, S) consists of two finite disjoint sets N and T of nonterminal and terminal symbols, respectively, a start symbol S ∈ N, and a finite set of rules P. Every rule has the form A → α, where the left-hand side (lhs) A is an element from N and the right-hand side (rhs) α is an element from V+, where V denotes N ∪ T. (Note that we do not allow rules with empty right-hand sides. This is for the sake of presentational simplicity.) We use symbols A, B, C, ... to range over N, symbols X, Y, Z to range over V, symbols α, β, γ, ... to range over V*, and v, w, z, ... to range over T*.</Paragraph> <Paragraph position="3"> In the context-free grammars that we will consider, called head grammars, exactly one member from each rhs is distinguished as the head. We indicate the head by underlining it, e.g., we write A → αX̲β. An expression A → αγ̲β denotes a rule in which the head is some member within γ. We define a binary relation ◁ such that B ◁ A if and only if A → αB̲β for some α and β. 
Relation ◁* is called the head-corner relation.</Paragraph> <Paragraph position="4"> For technical reasons we sometimes need the augmented set of rules P†, consisting of all rules in P plus the extra rule S′ → ⊥S, where S′ is a fresh nonterminal, and ⊥ is a fresh terminal acting as an imaginary zeroth input symbol. The relation → determined by P† is extended to a relation on V* × V* as usual. We write γ →_p δ whenever γ → δ holds as an extension of p ∈ P†. We write γ →_{p1 p2 ... ps} δ if γ →_{p1} δ1 →_{p2} δ2 ... δ_{s-1} →_{ps} δ. For a fixed grammar, a head-driven recognition algorithm can be specified by means of a stack automaton A = (T, Alph, Init(n), ↦, Fin(n)), parameterised with the length n of the input. In A, symbols T and Alph are the input and stack alphabets respectively, Init(n), Fin(n) ∈ Alph are two distinguished stack symbols and ↦ is the transition relation, defined on Alph+ × Alph+ and implicitly parameterised with the input.</Paragraph> <Paragraph position="5"> Such an automaton manipulates stacks Γ ∈ Alph+ (constructed from left to right) while consulting the symbols in the given input string. The initial stack is Init(n). Whenever Γ ↦ Γ′ holds, one step of the automaton may, under some conditions on the input, transform a stack of the form Γ″Γ into the stack Γ″Γ′. In words, Γ ↦ Γ′ denotes that if the top-most few symbols on the stack are Γ then these may be replaced by the symbols Γ′. Finally, the input is accepted whenever the automaton reaches stack Fin(n). Stack automata presented in what follows act as recognizers. Parsing algorithms can directly be obtained by pairing these automata with an output effect.</Paragraph> <Paragraph position="6"> A family of head-driven algorithms This section investigates the adaptation of a family of left-to-right parsing algorithms from [8], viz. top-down, left-corner, PLR, ELR, and LR parsing, to head grammars. 
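The head-corner relation just defined lends itself to a direct computation by closure. The following is a small sketch, not from the paper: rules are encoded as triples (lhs, rhs, head_index), an illustrative encoding, so that `("S", ("A", "b"), 0)` stands for S → A̲ b; from these we compute the transitive closure ◁+ (and hence ◁*, which only adds reflexive pairs) and check for head recursion A ◁+ A.

```python
# Sketch (illustrative encoding, not the paper's notation): a rule is
# (lhs, rhs, head_index), e.g. ("S", ("A", "b"), 0) for S -> A̲ b.

def head_corner(rules):
    """Transitive closure of the relation B ◁ A, where B is the head
    of some rule for A (i.e. the relation ◁+)."""
    closure = {(rhs[h], lhs) for lhs, rhs, h in rules}
    changed = True
    while changed:
        changed = False
        for b, a in list(closure):
            for c, d in list(closure):
                if d == b and (c, a) not in closure:
                    closure.add((c, a))
                    changed = True
    return closure

def is_head_recursive(rules):
    """True iff A ◁+ A for some A; on such grammars head-driven
    top-down parsing may loop."""
    return any(b == a for b, a in head_corner(rules))

RULES = [
    ("S", ("A", "b"), 0),  # S -> A̲ b
    ("A", ("S", "c"), 0),  # A -> S̲ c  (so S ◁+ S: head-recursive)
    ("A", ("a",), 0),
]
assert ("a", "S") in head_corner(RULES)  # a ◁+ S
assert is_head_recursive(RULES)
assert not is_head_recursive([("S", ("a",), 0)])
```

The same closure is what the head-corner algorithm below compiles into its predictive side conditions C ◁* B.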
Top-down parsing The following is a straightforward adaptation of top-down (TD) parsing [1] to head grammars.</Paragraph> <Paragraph position="7"> There are two kinds of stack symbol (items), one of the form [i, A, j], which indicates that some subderivation from A is needed deriving a substring of ai+1 ... aj, the other of the form [i, k, A → α • γ • β, m, j], which also indicates that some subderivation from A is needed deriving a substring of ai+1 ... aj, but specifically using the rule A → αγ̲β, where γ →* ak+1 ... am has already been established. Formally, we have</Paragraph> <Paragraph position="9"> We call a grammar head-recursive if A ◁+ A for some A. Head-driven TD parsing may loop exactly for the grammars which are head-recursive. Head recursion is a generalization of left recursion for traditional TD parsing. In the case of grammars with some parameter mechanism, top-down parsing has the advantage over other kinds of parsing that top-down propagation of parameter values is possible in collaboration with context-free parsing (cf. the standard evaluation of definite clause grammars), which may lead to more efficient processing. This holds for left-to-right parsing as well as for head-driven parsing [10].</Paragraph> <Paragraph position="10"> Head-corner parsing The predictive steps from Algorithm 1, represented by Clause 0 and supported by Clauses 0a and 0b, can be compiled into the head-corner relation ◁*. This gives the head-corner (HC) algorithm below. The items of the form [i, A, j] from I^TD are no longer needed now; we define I^HC to consist of the remaining items.</Paragraph> <Paragraph position="11"> Algorithm 2 (head-corner) A^HC = (T, I^HC, Init(n), ↦, Fin(n)), where Init(n) = [-1, -1, S′ → • ⊥ • S, 0, n], Fin(n) = [-1, -1, S′ → • ⊥S •, n, n], and ↦ is given by the following clauses. (Clauses 1b, 2b, 3b, 4b are omitted, since these are symmetric to 1a, 2a, 3a, 4a, respectively.) 
1a [i, k, A → α • γ • Bβ, m, j] ↦ [i, k, A → α • γ • Bβ, m, j][m, p - 1, C → η • a • θ, p, j] where there are C → ηa̲θ ∈ P† and p such that m < p ≤ j and ap = a and C ◁* B 2a [i, k, A → α • γ • aβ, m, j] ↦ [i, k, A → α • γa • β, m + 1, j] provided m < j and am+1 = a 3a [i, k, D → α • γ • Aβ, m, j][i′, k′, B → • δ •, m′, j′] ↦ [i, k, D → α • γ • Aβ, m, j][i′, k′, C → η • B • θ, m′, j′] provided m = i′, where there is C → ηB̲θ ∈ P† such that C ◁* A (j = j′ is automatically satisfied) 4a [i, k, A → α • γ • Bβ, m, j][i′, k′, B → • δ •, m′, j′] ↦ [i, k, A → α • γB • β, m′, j] provided m = k′ (m = i′ and j = j′ are automatically satisfied) Head-corner parsing as well as all algorithms in the remainder of this paper may loop exactly for the grammars which are cyclic (where A →+ A for some A). The head-corner algorithm above is the only one in this paper which has already appeared in the literature, in different guises [6, 13, 2, 14].</Paragraph> <Paragraph position="12"> Predictive HI parsing We say two rules A → α1 and B → α2 have a common infix α if α1 = β1α̲γ1 and α2 = β2α̲γ2, for some β1, β2, γ1 and γ2. The notion of common infix is an adaptation of the notion of common prefix [8] to head grammars. If a grammar contains many common infixes, then HC parsing may be very nondeterministic; in particular, Clauses 1 or 3 may be applied with different rules C → ηa̲θ ∈ P† or C → ηB̲θ ∈ P† for fixed a or B. In [15] an idea is described that allows reduction of nondeterminism in case of common prefixes and left-corner parsing. The resulting algorithm is called predictive LR (PLR) parsing. The following is an adaptation of this idea to HC parsing. The resulting algorithm is called predictive HI (PHI) parsing. (HI parsing, to be discussed later, is a generalization of LR parsing to head grammars.) First, we need a different kind of item, viz. of the form [i, k, A → γ, m, j], where there is some rule A → αγ̲β. 
With such an item, we simulate computation of different items [i, k, A → α • γ • β, m, j] ∈ I^HC, for different α and β, which would be treated individually by an HC parser. Formally, we have I^PHI = {[i, k, A → γ, m, j] | A → αγ̲β ∈ P† ∧ i ≤ k < m ≤ j} Algorithm 3 (Predictive HI) A^PHI = (T, I^PHI, Init(n), ↦, Fin(n)), where Init(n) = [-1, -1, S′ → ⊥, 0, n], Fin(n) = [-1, -1, S′ → ⊥S, n, n], and ↦ is given by the following (symmetric &quot;b-clauses&quot; omitted). 1a [i, k, A → γ, m, j] ↦ [i, k, A → γ, m, j][m, p - 1, C → a, p, j] where there are C → ηa̲θ, A → αγ̲Bβ ∈ P† and p such that m < p ≤ j and ap = a and C ◁* B 2a [i, k, A → γ, m, j] ↦ [i, k, A → γa, m + 1, j] provided m < j and am+1 = a, where there is A → αγ̲aβ ∈ P† 3a [i, k, D → γ, m, j][i′, k′, B → δ, m′, j′] ↦ [i, k, D → γ, m, j][i′, k′, C → B, m′, j′] provided m = i′ and B → δ ∈ P†, where there are D → αγ̲Aβ, C → ηB̲θ ∈ P† such that C ◁* A 4a [i, k, A → γ, m, j][i′, k′, B → δ, m′, j′] ↦ [i, k, A → γB, m′, j] provided m = k′ and B → δ ∈ P†, where there is A → αγ̲Bβ ∈ P† Extended HI parsing The PHI algorithm can process simultaneously a common infix α in two different rules A → β1α̲γ1 and A → β2α̲γ2, which reduces nondeterminism.</Paragraph> <Paragraph position="13"> We may however also specify an algorithm which succeeds in simultaneously processing all common infixes, irrespective of whether the left-hand sides of the corresponding rules are the same. This algorithm is inspired by extended LR (ELR) parsing [12, 7] for extended context-free grammars (where right-hand sides consist of regular expressions over V). By analogy, it will be called extended HI (EHI) parsing.</Paragraph> <Paragraph position="14"> This algorithm uses yet another kind of item, viz. of the form [i, k, {A1, A2, ..., Ap} → γ, m, j], where there exists at least one rule A → αγ̲β for each A ∈ {A1, A2, ..., Ap}. 
With such an item, we simulate computation of different items [i, k, A → α • γ • β, m, j] ∈ I^HC which would be treated individually by an HC parser. Formally, we have I^EHI = {[i, k, Δ → γ, m, j] | ∅ ⊂ Δ ⊆ {A | A → αγ̲β ∈ P†} ∧ i ≤ k < m ≤ j} Algorithm 4 (Extended HI) A^EHI = (T, I^EHI, Init(n), ↦, Fin(n)), where Init(n) = [-1, -1, {S′} → ⊥, 0, n], Fin(n) = [-1, -1, {S′} → ⊥S, n, n], and ↦ is given by: 1a [i, k, Δ → γ, m, j] ↦ [i, k, Δ → γ, m, j][m, p - 1, Δ′ → a, p, j] where there is p such that m < p ≤ j and ap = a and Δ′ = {C | ∃C → ηa̲θ, A → αγ̲Bβ ∈ P†(A ∈ Δ ∧ C ◁* B)} is not empty 2a [i, k, Δ → γ, m, j] ↦ [i, k, Δ′ → γa, m + 1, j] provided m < j and am+1 = a and Δ′ = {A ∈ Δ | A → αγ̲aβ ∈ P†} is not empty 3a [i, k, Δ → γ, m, j][i′, k′, Δ′ → δ, m′, j′] ↦ [i, k, Δ → γ, m, j][i′, k′, Δ″ → B, m′, j′] provided m = i′ and B → δ ∈ P† for some B ∈ Δ′ such that Δ″ = {C | ∃C → ηB̲θ, D → αγ̲Aβ ∈ P†(D ∈ Δ ∧ C ◁* A)} is not empty 4a [i, k, Δ → γ, m, j][i′, k′, Δ′ → δ, m′, j′] ↦ [i, k, Δ″ → γB, m′, j] provided m = k′ and B → δ ∈ P† for some B ∈ Δ′ such that Δ″ = {A ∈ Δ | A → αγ̲Bβ ∈ P†} is not empty This algorithm can be simplified by omitting the sets Δ from the items. This results in common infix (CI) parsing, which is a generalization of common prefix parsing [8]. CI parsing does not satisfy the correct subsequence property, to be discussed later. For space reasons, we omit further discussion of CI parsing. HI parsing If we translate the difference between ELR and LR parsing [8] to head-driven parsing, we are led to HI parsing, starting from EHI parsing, as described below. The algorithm is called HI because it computes head-inward derivations in reverse, in the same way as LR parsing computes rightmost derivations in reverse [1]. Head-inward derivations will be discussed later in this paper. 
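The common infixes that the PHI and EHI items collapse can be enumerated directly. A small sketch, not from the paper, using the same illustrative (lhs, rhs, head_index) rule encoding: two rules share a common infix exactly when they share an infix of their right-hand sides containing the head.

```python
# Sketch (illustrative encoding): a rule is (lhs, rhs, head_index).

def head_infixes(rhs, h):
    """All infixes rhs[i:j] of the rhs that contain the head position h."""
    return {rhs[i:j] for i in range(h + 1) for j in range(h + 1, len(rhs) + 1)}

def common_infixes(rule1, rule2):
    """Infixes (containing the head) shared by the two rules; when this is
    non-empty, an HC parser would guess between the rules nondeterministically."""
    (_, rhs1, h1), (_, rhs2, h2) = rule1, rule2
    return head_infixes(rhs1, h1) & head_infixes(rhs2, h2)

r1 = ("S", ("c", "A", "b"), 1)  # S -> c A̲ b
r2 = ("S", ("A", "d"), 0)       # S -> A̲ d
assert common_infixes(r1, r2) == {("A",)}
```

An EHI item Δ → γ then groups, for a given recognized infix γ, all left-hand sides whose rules contain γ around their head.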
HI parsing uses items of the form [i, k, Q, m, j], where Q is a non-empty set of &quot;double-dotted&quot; rules A → α • γ • β. The fundamental difference with the items in I^EHI is that the infix γ in the right-hand sides does not have to be fixed. Formally, we have</Paragraph> <Paragraph position="16"> We explain the difference in behaviour of HI parsing with regard to EHI parsing by investigating Clauses 1a and 2a of Algorithm 4. (Clauses 3a and 4a would give rise to a similar discussion.) Clauses 1a and 2a both address some terminal ap, with m < p ≤ j. In Clause 1a, the case is treated that ap is the head (which is not necessarily the leftmost member) of a rhs which the algorithm sets out to recognize; in Clause 2a, the case is treated that ap is the next member of a rhs of which some members have already been recognized, in which case we must of course have p = m + 1.</Paragraph> <Paragraph position="17"> By using the items from I^HI we may do both kinds of action simultaneously, provided p = m + 1 and ap is the leftmost member of some rhs of some rule, where it occurs as head.1 The lhs of such a rule should satisfy a requirement which is more specific than the usual requirement with regard to the head-corner relation.2 We define the left head-corner relation (and the right head-corner relation, by symmetry) as a subrelation of the head-corner relation as follows.</Paragraph> <Paragraph position="18"> We define: B ∠ A if and only if A → B̲α for some α. The relation ∠* now is called the left head-corner relation.</Paragraph> <Paragraph position="19"> We define</Paragraph> <Paragraph position="21"> parse will be found, due to the absence of rules with empty right-hand sides (epsilon rules).</Paragraph> <Paragraph position="22"> 2Again, the absence of epsilon rules is of importance here. {C → • X • θ | C → X̲θ ∈ P† ∧ ∃A → α • γ • Bβ ∈ Q(C ∠* B)} ∪ {A → α • γX • β | A → α • γ • Xβ ∈ Q} and assume symmetric definitions for gotoleft1 and gotoleft2.</Paragraph> <Paragraph position="23"> The above discussion gives rise to the new Clauses 1a and 2a of the algorithm below. The other clauses are derived analogously from the corresponding clauses of Algorithm 4. Note that in Clauses 2a and 4a the new item does not replace the existing item, but is pushed on top of it; this requires extra items to be popped off the stack in Clauses 3a and 4a.3 Algorithm 5 (HI) A^HI = (T, I^HI, Init(n), ↦, Fin(n)), where Init(n) = [-1, -1, {S′ → • ⊥ • S}, 0, n], Fin(n) = [-1, -1, {S′ → • ⊥S •}, n, n], and ↦ defined: 1a [i, k, Q, m, j] ↦ [i, k, Q, m, j][m, p - 1, Q′, p, j] where there is p such that m + 1 < p ≤ j and ap = a and Q′ = gotoright1(Q, a) is not empty 2a [i, k, Q, m, j] ↦ [i, k, Q, m, j][i, k, Q′, m + 1, j] provided m < j and am+1 = a and Q′ = gotoright2(Q, a) is not empty 3a [i, k, Q, m, j]I1 ... Ir-1[i′, k′, Q′, m′, j′] ↦ [i, k, Q, m, j][i′, k′, Q″, m′, j′] provided m < k′, where there is B → • X1 ... Xr • ∈ Q′ such that Q″ = gotoright1(Q, B) is not empty 4a [i, k, Q, m, j]I1 ... Ir-1[i′, k′, Q′, m′, j′] ↦ [i, k, Q, m, j][i, k, Q″, m′, j] provided m = k′ or k = k′, where there is B → • X1 ... Xr • ∈ Q′ such that Q″ = gotoright2(Q, B) is not empty We feel that this algorithm has only limited advantages over the EHI algorithm for other than degenerate head grammars, in which the heads occur either mostly leftmost or mostly rightmost in right-hand sides. In particular, if there are few sequences of rules of the form A → A̲1α1, A1 → A̲2α2, ..., Am-1 → A̲mαm, or of the form A → α1A̲1, A1 → α2A̲2, ..., Am-1 → αmA̲m, then the left and right head-corner relations are very sparse and HI parsing virtually simplifies to EHI parsing. 
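The sparseness remark above is easy to check mechanically. A sketch, not from the paper, again with the illustrative (lhs, rhs, head_index) encoding: B is a left head-corner of A iff some rule for A has its head leftmost and equal to B, and symmetrically on the right.

```python
# Sketch (illustrative encoding): a rule is (lhs, rhs, head_index).

def left_head_corner(rules):
    """Base pairs (B, A) with B ∠ A: rules A -> B̲ alpha whose head is leftmost."""
    return {(rhs[0], lhs) for lhs, rhs, h in rules if h == 0}

def right_head_corner(rules):
    """Symmetric: rules A -> alpha B̲ whose head is rightmost."""
    return {(rhs[-1], lhs) for lhs, rhs, h in rules if h == len(rhs) - 1}

RULES = [
    ("S", ("A", "b"), 0),       # head leftmost: A is a left head-corner of S
    ("A", ("c", "B"), 1),       # head rightmost: B is a right head-corner of A
    ("B", ("c", "d", "e"), 1),  # internal head: contributes to neither relation
]
assert left_head_corner(RULES) == {("A", "S")}
assert right_head_corner(RULES) == {("B", "A")}
```

When both sets (and their transitive closures) are small relative to the grammar, the goto functions above rarely add predicted rules, which is the precise sense in which HI parsing then collapses into EHI parsing.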
In the following we discuss a variant of head grammars which may provide more opportunities to use the advantages of the LR technique.</Paragraph> <Paragraph position="24"> A generalization of head grammars The essence of head-driven parsing is that there is a distinguished member in each rhs which is recognized first. Subsequently, the other members to the right and to the left of the head may be recognized.</Paragraph> <Paragraph position="25"> An artifact of most head-driven parsing algorithms is that the members to the left of the head are recognized strictly from right to left, and vice versa for the members to the right of the head (although recognition of the members in the left part and in the right part may be interleaved). This restriction does not seem to be justified, except by some practical considerations, and it prevents truly non-directional parsing.</Paragraph> <Paragraph position="26"> We propose a generalization of head grammars in such a way that each of the two parts of a rhs on both sides of the head again has a head. The same holds recursively for the smaller parts of the rhs. The consequence is that a rhs can be seen as a binary tree, in which each node is labelled by a grammar symbol. The root of the tree represents the main head. The left son of the root represents the head of the part of the rhs to the left of the main head, etc.</Paragraph> <Paragraph position="27"> We denote binary trees using a linear notation. For example, if α and β are binary trees, then (α)X(β) denotes the binary tree consisting of a root labelled X, a left subtree α and a right subtree β. The notation of empty (sub)trees (ε) may be omitted. The relation →* ignores the head information as usual.</Paragraph> <Paragraph position="28"> Regarding the procedural aspects of grammars, generalized head grammars have no more power than traditional head grammars. This fact is demonstrated by a transformation τ_head from the former to the latter class of grammars. 
A transformed grammar τ_head(G) contains special nonterminals of the form [α], where α is a proper subtree of some rhs in the original grammar G = (N, T, P, S). The rules of the transformed grammar are given by: A → [α] X̲ [β] for each A → (α)X(β) ∈ P [(α)X(β)] → [α] X̲ [β] for each proper subtree (α)X(β) of a rhs in G where we assume that each member of the form [ε] in the transformed grammar is omitted.</Paragraph> <Paragraph position="29"> It is interesting to note that τ_head is a generalization of a transformation τ_two which can be used to transform a context-free grammar into two-normal form (each rhs contains one or two symbols). A transformed grammar τ_two(G) contains special nonterminals of the form [α], where α is a proper suffix of a rhs in G. The rules of τ_two(G) are given by A → X [α] for each A → Xα ∈ P [Xα] → X [α] for each proper suffix Xα of a rhs in G where we assume that each member of the form [ε] in the transformed grammar is omitted.</Paragraph> <Paragraph position="30"> HI parsing revisited Our next step is to show that generalized head grammars can be effectively handled with a generalization of HI parsing (generalized HI (GHI) parsing). 
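The transformation τ_head can be sketched directly from its two rule schemata. The code below is an illustration under the same assumed tuple encoding of trees; bracketed nonterminal names and the worklist are implementation choices of the sketch, not the paper's.

```python
# Illustrative sketch of tau_head: a generalized rule (lhs, tree) yields a
# head-grammar rule lhs -> [alpha] X̲ [beta], plus one rule per proper subtree.

def linear(tree):
    if tree is None:
        return ""
    left, label, right = tree
    l, r = linear(left), linear(right)
    return (f"({l})" if l else "") + label + (f"({r})" if r else "")

def tau_head(grammar):
    """grammar: list of (lhs, tree). Returns head-grammar rules as
    (lhs, rhs, head_index); members [epsilon] are omitted."""
    out, agenda, seen = [], list(grammar), set()
    while agenda:
        lhs, tree = agenda.pop()
        if (lhs, tree) in seen:
            continue
        seen.add((lhs, tree))
        left, x, right = tree
        rhs = []
        if left is not None:
            rhs.append("[" + linear(left) + "]")
            agenda.append((rhs[-1], left))
        head = len(rhs)  # position of the head X in the new rhs
        rhs.append(x)
        if right is not None:
            rhs.append("[" + linear(right) + "]")
            agenda.append((rhs[-1], right))
        out.append((lhs, tuple(rhs), head))
    return out

# S -> (A(d))s : left subtree A(d), main head s
rules = tau_head([("S", ((None, "A", (None, "d", None)), "s", None))])
assert ("S", ("[A(d)]", "s"), 1) in rules
assert ("[A(d)]", ("A", "[d]"), 0) in rules
assert ("[d]", ("d",), 0) in rules
```

Each level of the tree thus becomes one binary (or unary) head-grammar rule, mirroring how τ_two flattens plain right-hand sides into suffix nonterminals.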
This new algorithm exhibits a superficial similarity of the distinction between trees and rules with the one between kernel and nonkernel items of LR parsing [1].</Paragraph> <Paragraph position="31"> the smallest set which satisfies</Paragraph> <Paragraph position="33"> The trees or rules of which the main head is some specified symbol X can be selected from a set Q by goto(Q, X) = {t ∈ Q | t = (α)X(β) ∨ t = A → (α)X(β)}.</Paragraph> <Paragraph position="34"> In a similar way, we can select trees and rules according to a left or right subtree.</Paragraph> <Paragraph position="36"> We assume a symmetric definition for gotoright.</Paragraph> <Paragraph position="37"> When we set out to recognize the left subtrees from a set of trees and rules, we use the following function.</Paragraph> <Paragraph position="39"> We assume a symmetric definition for right.</Paragraph> <Paragraph position="40"> The set I^GHI contains different kinds of item: * Items of the form [i, k, Q, m, j], with i ≤ k < m ≤ j, indicate that trees (α)X(β) and rules A → (α)X(β) in Q are needed deriving a substring of ai+1 ... aj, where X →* ak+1 ... am has already been established. * Items of the form [k, Q, m, j], with k < m ≤ j, indicate that trees (α)X(β) and rules A → (α)X(β) in Q are needed deriving a substring of ak+1 ... aj, where (α)X →* ak+1 ... am has already been established.</Paragraph> <Paragraph position="41"> Items of the form [i, k, Q, m] have a symmetric meaning. 
* Items of the form [k, t, m], with k < m, indicate that</Paragraph> <Paragraph position="43"> where there is p such that m < p ≤ j and Q′ = goto(right(Q), ap) is not empty 3b [i, k, Q, m] ↦ [i, k, Q, m][i, p - 1, Q′, p, k] where there is p such that i < p ≤ k and Q′ = The algorithm above is based on the transformation τ_head. It is therefore not surprising that this algorithm is reminiscent of LR parsing [1] for a transformed grammar τ_two(G). For most clauses, a rough correspondence with actions of LR parsing can be found: Clauses 2 and 3 correspond with shifts. Clause 5 corresponds with reductions with rules of the form [Xα] → X [α] in τ_two(G). Clauses 6 and 7 correspond with reductions with rules of the form A → X [α] in τ_two(G). For Clauses 1 and 4, corresponding actions are hard to find, since these clauses seem to be specific to generalized head-driven parsing.</Paragraph> <Paragraph position="44"> The reason that we based Algorithm 6 on τ_head is twofold. Firstly, the algorithm above is more appropriate for presentational purposes than an alternative algorithm we have in mind which is not based on τ_head, and secondly, the resulting parsers need fewer sets Q.</Paragraph> <Paragraph position="45"> This is similar in the case of LR parsing.5 Example 1 Consider the generalized head grammar with the following rules: Assume the input is given by a1a2a3a4 = c a b s. The steps performed by the algorithm are given in Figure 1. 
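All recognizers in this paper are specified as stack automata with transitions Γ ↦ Γ′ on the top of the stack. A minimal generic driver for such specifications might look as follows; this is a sketch of the model, not the paper's formulation, and the transition interface is an assumption of the sketch.

```python
# Sketch: a driver for stack automata specified as top-of-stack rewritings
# Gamma |-> Gamma', accepting when the stack equals (Fin,).

def recognize(init, fin, steps, max_states=10000):
    """Search the space of stacks reachable from (init,).

    `steps(stack)` yields pairs (k, replacement): pop the top-most k
    symbols and push the tuple `replacement` in their place.
    """
    agenda, seen = [(init,)], set()
    while agenda:
        stack = agenda.pop()
        if stack == (fin,):
            return True
        if stack in seen or len(seen) >= max_states:
            continue
        seen.add(stack)
        for k, replacement in steps(stack):
            agenda.append(stack[:len(stack) - k] + tuple(replacement))
    return False

# Toy instance: push an X on top of Init, then trade both for Fin.
def steps(stack):
    if stack[-1] == "Init":
        yield 1, ("Init", "X")
    if stack == ("Init", "X"):
        yield 2, ("Fin",)

assert recognize("Init", "Fin", steps)
```

A tabular realization would instead share the items pushed by different stacks, which is where the redundancy issues discussed in the conclusion arise.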
□ Apart from HI parsing, also TD, HC, PHI, and EHI parsing can be adapted to generalized head-driven parsing. Correctness The head-driven stack automata studied so far differ from one another in their degree of nondeterminism.</Paragraph> <Paragraph position="46"> In this section we take a different perspective. For all these devices, we show that quite similar relations exist between stack contents and the way input strings are visited. Correctness results easily follow from such characterisations. (Proofs of statements in this section are omitted for reasons of space.) Let G = (N, T, P, S) be a head grammar. To be used below, we introduce a special kind of derivation.</Paragraph> <Paragraph position="47"> 5It is interesting to compare LR parsing for a context-free grammar G with LR parsing for the transformed grammar τ_two(G). The transformation has the effect that a reduction with a rule is replaced by a cascade of reductions with smaller rules; apart from this, the transformation does not affect the global run-time behaviour of LR parsing. More serious are the consequences for the size of the parser: the required number of LR states for the transformed grammar is smaller [9].</Paragraph> <Paragraph position="48"> (Figure 2: the composition of σ-derivations ρi, 1 ≤ i ≤ 3. The starting place of each σ-derivation is indicated, each triangle representing the application of a single production.) where p1, p2, ..., ps are productions in P†, s ≥ 1, pi rewrites the unique nonterminal occurrence introduced as the head element of pi-1 for 2 ≤ i ≤ s, ps = (B → η) and ρ ∈ P* rewrites η into z ∈ T+.</Paragraph> <Paragraph position="49"> The indicated occurrence of string η in (1) is called the handle of the σ-derivation. When defined, the rightmost (leftmost) nonterminal occurrence in α (β, respectively) is said to be adjacent to the handle. 
The notions of handle and adjacent nonterminal occurrence extend in an obvious way to derivations of the form ζAθ → ζγ0zγ1θ, where A → γ0zγ1 is a σ-derivation. By composing σ-derivations, we can now define the class of sentential forms we are interested in. (Figure 2 shows a case example.) Definition 2 A head-outward sentential form is obtained through a derivation</Paragraph> <Paragraph position="51"> where q ≥ 1, each ρi is a σ-derivation and, for 2 ≤ i ≤ q, only one string γi-1,j is rewritten by applying ρi at a nonterminal occurrence adjacent to the handle of ρi-1.</Paragraph> <Paragraph position="52"> Sequence ρ1, ρ2, ..., ρq is said to derive the sentential form in (2).</Paragraph> <Paragraph position="53"> The definition of head-outward sentential form suggests a corresponding notion of head-outward derivation. Informally, a head-outward derivation proceeds by recursively expanding to a terminal string first the head of a rule, and then the remaining members of the rhs, in an outward order. Conversely, we have head-inward (HI) derivations, where first the remaining members in the rhs are expanded, in an inward order (toward the head), after which the head itself is recursively expanded. Note that HI parsing recognizes a string by computing an HI derivation in reverse (cf. LR parsing).</Paragraph> <Paragraph position="54"> Let w = a1a2 ... an, n ≥ 1, be a string over T and let a0 = ⊥. For -1 ≤ i ≤ j ≤ n, we write (i, j]w to denote substring ai+1 ... aj.</Paragraph> <Paragraph position="55"> Theorem 1 For A one of A^HC, A^PHI or A^EHI, the following facts are equivalent:</Paragraph> <Paragraph position="57"> for the respective automata, 1 ≤ t ≤ q; (ii) a sequence of σ-derivations ρ1, ρ2, ..., ρq, q ≥ 1, derives a head-outward sentential form</Paragraph> <Paragraph position="59"> where π is a permutation of {1, ..., q}, ρt has handle ηt which derives (kπ(t), mπ(t)]w, 1 ≤ t ≤ q, and mπ(t-1) ≤ kπ(t), 2 ≤ t ≤ q.</Paragraph> <Paragraph position="60"> As an example, an accepting stack configuration [-1, -1, S′ → • ⊥S •, n, n] corresponds to a σ-derivation (S′ → ⊥S)ρ, ρ ∈ P+, with handle ⊥S which derives the head-outward sentential form γ0(-1, n]wγ1 = ⊥w, from which the correctness of the head-corner algorithm follows directly.</Paragraph> <Paragraph position="61"> If we assume that G does not contain any useless symbols, then Theorem 1 has the following consequence: if the automaton at some point has consulted the symbols ai1, ai2, ..., aim from the input string, i1, ..., im increasing indexes, then there is a string in the language generated by G of the form v0ai1v1 ... vm-1aimvm.</Paragraph> <Paragraph position="62"> Such a statement may be called the correct subsequence property (a generalization of the correct prefix property [8]). Note that the order in which the input symbols are consulted is only implicit in Theorem 1 (the permutation π) but is severely restricted by the definition of head-outward sentential form. A more careful characterisation can be obtained, but will take us outside of the scope of this paper.</Paragraph> <Paragraph position="63"> The correct subsequence property is enforced by the (top-down) predictive feature of the automata, and holds also for A^TD and A^HI. Characterisations similar to Theorem 1 can be provided for these devices. We investigate below the GHI automaton.</Paragraph> <Paragraph position="64"> For an item I ∈ I^GHI of the form [i, k, Q, m, j], [k, Q, m, j], [i, k, Q, m] or [k, t, m], we say that k (m respectively) is its left (right) component. Let N′ be the set of nonterminals of the head grammar τ_head(G).</Paragraph> <Paragraph position="65"> We need a function yld from reachable items in I^GHI into (N′ ∪ T)*, specified as follows. 
If we assume</Paragraph> <Paragraph position="67"> It is not difficult to show that the definition of yld is consistent (i.e. the particular choice of a tree or rule from Q is irrelevant).</Paragraph> <Paragraph position="68"> Theorem 2 The following facts are equivalent: (i) A^GHI reaches a configuration whose stack contents are I1 I2 ... Iq, q ≥ 1, with kt and mt the left and right components, respectively, of It, and yld(It) = γt, for 1 ≤ t ≤ q; (ii) a sequence of σ-derivations ρ1, ρ2, ..., ρq, q ≥ 1, derives in τ_head(G) a head-outward sentential form γ0(kπ(1), mπ(1)]w γ1 (kπ(2), mπ(2)]w γ2 ... γq-1 (kπ(q), mπ(q)]w γq where π is a permutation of {1, ..., q}, ρt has handle ηt which derives (kπ(t), mπ(t)]w, 1 ≤ t ≤ q, and mπ(t-1) ≤ kπ(t), 2 ≤ t ≤ q.</Paragraph> <Paragraph position="69"> We have presented a family of head-driven algorithms: TD, HC, PHI, EHI, and HI parsing. The existence of this family demonstrates that head-driven parsing covers a range of parsing algorithms wider than commonly thought.</Paragraph> <Paragraph position="70"> The algorithms in this family are increasingly deterministic, which means that the search trees have a decreasing size, and therefore simple realizations, such as backtracking, are increasingly efficient.</Paragraph> <Paragraph position="71"> However, similar to the left-to-right case, this does not necessarily hold for tabular realizations of these algorithms. The reason is that the more refined an algorithm is, the more items represent computation of a single subderivation, and therefore some subderivations may be computed more than once. This is called redundancy. Redundancy has been investigated for the left-to-right case in [8], which solves this problem for ELR parsing. Head-driven algorithms have an additional source of redundancy, which has been solved for tabular HC parsing in [14]. 
The idea from [14] can also be applied to the other head-driven algorithms from this paper.</Paragraph> <Paragraph position="72"> We have further proposed a generalization of head-driven parsing, and we have shown an example of such an algorithm based on LR parsing. Prospects to even further generalize the ideas from this paper seem promising.</Paragraph> </Section> </Section> </Paper>