<?xml version="1.0" standalone="yes"?>
<Paper uid="P90-1001">
  <Title>POLYNOMIAL TIME PARSING OF COMBINATORY CATEGORIAL GRAMMARS*</Title>
  <Section position="2" start_page="0" end_page="0" type="metho">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> In this paper we present a polynomial time parsing algorithm for Combinatory Categorial Grammar.</Paragraph>
    <Paragraph position="1"> The recognition phase extends the CKY algorithm for CFG. The process of generating a representation of the parse trees has two phases. Initially, a shared forest is built that encodes the set of all derivation trees for the input string. This shared forest is then pruned to remove all spurious ambiguity.</Paragraph>
  </Section>
  <Section position="3" start_page="0" end_page="1" type="metho">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Combinatory Categorial Grammar (CCG) [7, 5] is an extension of Classical Categorial Grammar in which both function composition and function application are allowed. In addition, forward and backward slashes are used to place conditions on the relative ordering of adjacent categories that are to be combined. There has been considerable interest in parsing strategies for CCG [4, 11, 8, 2]. One of the major problems that must be addressed is that of spurious ambiguity. This refers to the possibility that a CCG can generate a large number of (exponentially many) derivation trees that assign the same function argument structure to a string. In [9] we noted that a CCG can also generate exponentially many genuinely ambiguous (non-spurious) derivations. This constitutes a problem for the approaches cited above since it results in their respective algorithms taking exponential time in the worst case. The algorithm we present is the first known polynomial time parser for CCG.</Paragraph>
    <Paragraph position="1"> The parsing process has three phases. Once the recognizer decides (in the first phase) that an input can be generated by the given CCG, the set of parse trees can be extracted in the second phase. Rather than enumerating all parses, in Section 3 we describe how they can be encoded by means of a shared forest (represented as a grammar) with which an exponential number of parses are encoded using a polynomially bounded structure. This shared forest encodes all derivations including those that are spuriously ambiguous. In Section 4.1, we show that it is possible to modify the shared forest so that it contains no spurious ambiguity. This is done (in the third phase) by traversing the forest, examining two levels of nodes at each stage, detecting spurious ambiguity locally. The three stage process of recognition, building the shared forest, and eliminating spurious ambiguity takes polynomial time. *This work was partially supported by NSF grant IRI8909810. We are very grateful to Aravind Joshi, Michael Niv, ...</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
1.1 Definition of CCG
</SectionTitle>
      <Paragraph position="0"> A CCG, $G$, is denoted by $(V_T, V_N, S, f, R)$ where $V_T$ is a finite set of terminals (lexical items), $V_N$ is a finite set of nonterminals (atomic categories), $S$ is a distinguished member of $V_N$, $f$ is a function that maps elements of $V_T$ to finite sets of categories, and $R$ is a finite set of combinatory rules. Combinatory rules have the following form. In each of the rules $x, y, z_1, \ldots$ are variables and $|_i \in \{\backslash, /\}$.</Paragraph>
      <Paragraph position="1">
        1. Forward application: $x/y \;\; y \rightarrow x$
        2. Backward application: $y \;\; x\backslash y \rightarrow x$
        3. Forward composition (for $n \ge 1$): $x/y \;\; y |_1 z_1 |_2 z_2 \ldots |_n z_n \rightarrow x |_1 z_1 |_2 z_2 \ldots |_n z_n$
        4. Backward composition (for $n \ge 1$): $y |_1 z_1 |_2 z_2 \ldots |_n z_n \;\; x\backslash y \rightarrow x |_1 z_1 |_2 z_2 \ldots |_n z_n$
      </Paragraph>
      <Paragraph position="2"> In the above rules, $x | y$ is the primary category and the other left-hand-side category is the secondary category. Also, we refer to the leftmost nonterminal of a category as the target of the category. We assume that categories are parenthesis-free. The results presented here, however, generalize to the case of fully parenthesized categories. The version of CCG used in [7, 5] allows for the possibility that the use of these combinatory rules can be restricted. Such restrictions limit the possible categories that can instantiate the variables. We do not consider this possibility here, though the results we present can be extended to handle these restrictions.</Paragraph>
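The forward rules above can be illustrated with a minimal Python sketch. The representation is an assumption made for illustration only: a parenthesis-free category is a target nonterminal plus a list of (slash, nonterminal) arguments, and the function name forward_combine is hypothetical, not taken from the paper.

```python
from typing import List, Optional, Tuple

# Assumed encoding: S\NP/NP becomes ("S", [("\\", "NP"), ("/", "NP")]).
Arg = Tuple[str, str]            # (slash, nonterminal)
Category = Tuple[str, List[Arg]]


def forward_combine(primary: Category, secondary: Category) -> Optional[Category]:
    """Forward application (n = 0) or forward composition (n >= 1).

    primary   plays the role of x/y
    secondary plays the role of y |1 z1 ... |n zn
    The result is x |1 z1 ... |n zn, or None if the rule does not apply.
    """
    p_target, p_args = primary
    s_target, s_args = secondary
    if not p_args:
        return None  # the primary category has no argument to consume
    slash, wanted = p_args[-1]
    if slash != "/" or wanted != s_target:
        return None  # last argument is not a forward slash over y
    # Drop the consumed /y and append the arguments of the secondary category.
    return (p_target, p_args[:-1] + s_args)
```

Backward application and composition would mirror this function with the operands swapped and the slash test changed to a backslash.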
      <Paragraph position="3"> Derivations in a CCG involve the use of the combinatory rules in $R$. Let $\Rightarrow$ be defined as follows, where $T_1$ and $T_2$ are strings of categories and terminals and $c, c_1, c_2$ are categories.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
1.2 Context-Free Paths
</SectionTitle>
      <Paragraph position="0"> In Section 2 we describe a recognition algorithm that involves extending the CKY algorithm for CFG. The differences between the CKY algorithm and the one presented here result from the fact that the derivation tree sets of CCG have more complicated path sets than the (regular) path sets of CFG tree sets. Consider the set of CCG derivation trees of the form shown in Figure 1 for the language $\{ww \mid w \in \{a, b\}^*\}$.</Paragraph>
      <Paragraph position="1"> Due to the nature of the combinatory rules, categories behave rather like stacks since their arguments are manipulated in a last-in-first-out fashion. This has the effect that the paths can exhibit nested dependencies as shown in Figure 1. Informally, we say that CCG tree sets have context-free paths. Note that the tree sets of CFG have regular paths and cannot produce such tree sets.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="1" end_page="1" type="metho">
    <SectionTitle>
2 Recognition of CCG
</SectionTitle>
    <Paragraph position="0"> The recognition algorithm uses a 4-dimensional array $L$ for the input $a_1 \ldots a_n$. In entries of the array $L$ we cannot store complete categories: since exponentially many categories can derive the substring $a_i \ldots a_j$,¹ it is necessary to store categories carefully. It is possible, however, to share parts of categories between different entries in $L$. This follows from the fact that the use of a combinatory rule depends only on (1) the target category of the primary category of the rule; (2) the first argument (suffix of length 1) of the primary category of the rule; (3) the entire (bounded) secondary category. Therefore, we need only find this (bounded) information in each array entry in order to determine whether a rule can be used. Entries of the form $((A, \alpha), T)$ are stored in $L[i, j][p, q]$. This encodes all categories whose target is $A$, whose suffix is $\alpha$, and that derive $a_i \ldots a_j$. The tail $T$ and the indices $p$ and $q$ are used to locate the remaining part of these categories. Before describing precisely the information that is stored in $L$ we give some definitions.</Paragraph>
    <Paragraph position="2"> Let $k_1$ be the largest $n$ such that $R$ contains a rule whose secondary category is $y |_1 z_1 |_2 z_2 \ldots |_n z_n$ and let $k_2$ be the maximum of $k_1$ and all $n$ where there is some $c \in f(a)$ such that $c = A\alpha$ and $|\alpha| = n$.</Paragraph>
    <Paragraph position="3"> In considering how categories that are derived in the course of a derivation should be stored we have two cases. 1. Categories that are either introduced by lexical items appearing in the input string or whose length is less than $k_1$ and could therefore be secondary categories of a rule. Thus all categories whose length is bound by $k_2$ are encoded in their entirety within a single array entry.</Paragraph>
    <Paragraph position="4"> 2. All other categories are encoded with a sharing mechanism in which we store up to $k_1$ arguments locally together with an indication of where the remaining arguments can be found. ¹That is, exponentially many with respect to $j - i$. Since previous approaches to CCG parsing store entire categories they can take exponential time.</Paragraph>
    <Paragraph position="5"> Next, we give a proposition that characterizes when an entry is included in the array by the algorithm. An entry $((A, \alpha), T)$ is in $L[i, j][p, q]$, where $A \in V_N$ and $\alpha \in (\{\backslash, /\}V_N)^*$, when one of the following holds.</Paragraph>
    <Paragraph position="6"> If $T = \gamma$ then $\gamma \in \{\backslash, /\}V_N$, $1 \le |\alpha| \le k_1$, and for some $\alpha' \in (\{\backslash, /\}V_N)^*$ the following hold. (1) $A\alpha'\alpha \Rightarrow^* a_i \ldots a_{p-1} \, A\alpha'\gamma \, a_{q+1} \ldots a_j$.</Paragraph>
    <Paragraph position="7"> (2) $A\alpha'\gamma \Rightarrow^* a_p \ldots a_q$.</Paragraph>
    <Paragraph position="8"> (3) Informally, the category $A\alpha'\gamma$ in (1) above is "derived" from $A\alpha'\alpha$ such that there is no intervening point in the derivation before reaching $A\alpha'\gamma$ at which all of the suffix $\alpha$ of $A\alpha'\alpha$ has been "popped". Alternatively, if $T = -$ then $0 \le |\alpha| &lt; k_1 + k_2$, $(p, q) = (0, 0)$ and $A\alpha \Rightarrow^* a_i \ldots a_j$. Note that we have $|\alpha| &lt; k_1 + k_2$ rather than $|\alpha| \le k_2$ (as might have been expected from the discussion above). This is the case because a category whose length is strictly less than $k_2$ can, as a result of function composition, result in a category of length $&lt; k_1 + k_2$. Given the way that we have designed the algorithm below, the latter category is stored in this (non-sharing) form.</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
2.1 Algorithm
</SectionTitle>
      <Paragraph position="0"> If $c \in f(a_i)$ for some category $c$ such that $c = A\alpha$, then include the tuple $((A, \alpha), -)$ in $L[i, i][0, 0]$.</Paragraph>
      <Paragraph position="1"> For some $i$ and $j$, $1 \le i &lt; j \le n$, consider each rule $x/y \;\; y |_1 z_1 \ldots |_m z_m \rightarrow x |_1 z_1 \ldots |_m z_m$.²</Paragraph>
      <Paragraph position="2"> For some $k$, $i \le k &lt; j$, we look for some $((B, \beta), -) \in L[k+1, j][0, 0]$, where $|\beta| = m$ (corresponding to the secondary category of the rule), and we look for $((A, \alpha/B), T) \in L[i, k][p, q]$ for some $\alpha$, $T$, $p$ and $q$ (corresponding to the primary category of the rule).</Paragraph>
      <Paragraph position="3"> From these entries in $L$ we know that for some $\alpha'$, $A\alpha'\alpha/B \Rightarrow^* a_i \ldots a_k$ and $B\beta \Rightarrow^* a_{k+1} \ldots a_j$.</Paragraph>
      <Paragraph position="4"> Thus, by the combinatory rule given above we have $A\alpha'\alpha\beta \Rightarrow^* a_i \ldots a_j$ and we should store an encoding of the category $A\alpha'\alpha\beta$ in $L[i, j]$. This encoding depends on $\alpha'$, $\alpha$, $\beta$, and $T$.
        * $T = -$: If $|\alpha\beta| &lt; k_1 + k_2$ then (case 1a) add $((A, \alpha\beta), -)$ to $L[i, j][0, 0]$. Otherwise, (case 1b) add $((A, \beta), /B)$ to $L[i, j][i, k]$.
        * $T \ne -$ and $m &gt; 1$: The new category is longer than the one found in $L[i, k][p, q]$. If $\alpha \ne \epsilon$ then (case 2a) add $((A, \beta), /B)$ to $L[i, j][i, k]$, otherwise (case 2b) add $((A, \beta), T)$ to $L[i, j][p, q]$.
        ²Backward composition and application are treated in the same way as this rule, except that all occurrences below of $i$ and $k$ are swapped with occurrences of $k+1$ and $j$, respectively.</Paragraph>
      <Paragraph position="5">
        * $T \ne -$ and $m = 1$ (case 3): The new category has the same length as the one found in $L[i, k][p, q]$. Add $((A, \alpha\beta), T)$ to $L[i, j][p, q]$.
        * $T = \gamma \ne -$ and $m = 0$: The new category has a length one less than the one found in $L[i, k][p, q]$. If $\alpha \ne \epsilon$ then (case 4a) add $((A, \alpha), T)$ to $L[i, j][p, q]$. Otherwise, (case 4b) since $\alpha = \epsilon$ we have to look for part of the category that is not stored locally in $L[i, k][p, q]$. This may be found by looking in each entry $L[p, q][r, s]$ for each $((A, \beta'\gamma), T')$. We know that either $T' = -$ or $\beta' \ne \epsilon$, and we add $((A, \beta'), T')$ to $L[i, j][r, s]$. Note that for some $\alpha''$, $A\alpha''\beta'\gamma \Rightarrow^* a_p \ldots a_q$, $A\alpha''\beta'/B \Rightarrow^* a_i \ldots a_k$, and thus by the combinatory rule above $A\alpha''\beta' \Rightarrow^* a_i \ldots a_j$.</Paragraph>
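The case analysis for a forward rule can be sketched as a single Python function. This is a minimal sketch under assumptions, not the authors' implementation: entries are tuples (target, stored arguments, tail) with None standing for the tail "-", and the names forward_cases and lookup are hypothetical. The caller is assumed to have already checked that the last stored argument of the primary entry is /B, where B is the target of the secondary entry.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Arg = Tuple[str, str]                      # (slash, nonterminal), e.g. ("/", "B")
Args = Tuple[Arg, ...]
Entry = Tuple[str, Args, Optional[Arg]]    # ((A, alpha), T); None stands for "-"
Cell = Tuple[int, int]                     # the [p, q] part of L[i, j][p, q]
Placed = Tuple[Entry, Cell]


def forward_cases(i: int, k: int, k1: int, k2: int,
                  primary: Entry, pq: Cell, beta: Args,
                  lookup: Callable[[Cell], Iterable[Placed]]) -> List[Placed]:
    """Entries to add to L[i, j] for one forward combination.

    primary is ((A, alpha/B), T) taken from L[i, k][p, q]; beta is the
    argument string of the secondary entry ((B, beta), -) in L[k+1, j][0, 0].
    lookup((p, q)) is assumed to yield (entry, (r, s)) for all of L[p, q].
    """
    target, stored, tail = primary
    alpha, slash_b = stored[:-1], stored[-1]   # slash_b is the consumed /B
    m = len(beta)
    if tail is None:                                         # T = -
        if k1 + k2 > len(alpha) + m:
            return [((target, alpha + beta, None), (0, 0))]  # case 1a
        return [((target, beta, slash_b), (i, k))]           # case 1b
    if m > 1:                                                # category grows
        if alpha:
            return [((target, beta, slash_b), (i, k))]       # case 2a
        return [((target, beta, tail), pq)]                  # case 2b
    if m == 1:
        return [((target, alpha + beta, tail), pq)]          # case 3
    if alpha:                                                # m == 0
        return [((target, alpha, tail), pq)]                 # case 4a
    found = []                                               # case 4b
    for (target2, stored2, tail2), rs in lookup(pq):
        if target2 == target and stored2 and stored2[-1] == tail:
            rest = stored2[:-1]                              # beta' in the text
            if tail2 is None or rest:
                found.append(((target, rest, tail2), rs))
    return found
```

Only case 4b consults other cells of the table, which is why it is the case that determines the running time discussed in the text.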
      <Paragraph position="7"> As in the case of the CKY algorithm we should have loop statements that allow $i$, $j$ to range from 1 through $n$ such that the length of the spanned substring starts from 1 ($i = j$) and increases to $n$ ($i = 1$ and $j = n$).</Paragraph>
      <Paragraph position="8"> When we consider placing entries in $L[i, j]$ (i.e., to detect whether a category derives $a_i \ldots a_j$) we have to consider whether there are two subconstituents (to simplify the discussion let us consider only forward combinations) which span the substrings $a_i \ldots a_k$ and $a_{k+1} \ldots a_j$. Therefore we need to consider all values for $k$ from $i$ through $j - 1$ and consider the entries in $L[i, k][p, q]$ and $L[k+1, j][0, 0]$ where $i \le p \le q \le k$ or $p = q = 0$.</Paragraph>
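The table initialization and the loop order described above can be sketched in Python as follows. The flat dictionary key (i, j, p, q) standing in for $L[i, j][p, q]$, and the names init_table, spans and primary_cells, are assumptions made for this illustration.

```python
from collections import defaultdict


def init_table(words, lexicon):
    """Put ((A, alpha), -) in L[i, i][0, 0] for each lexical category A alpha.

    lexicon maps a word to a list of (target, argument list) pairs;
    None stands for the tail "-".
    """
    table = defaultdict(set)
    for i, word in enumerate(words, start=1):
        for target, args in lexicon.get(word, ()):
            table[(i, i, 0, 0)].add((target, tuple(args), None))
    return table


def spans(n):
    """Yield (i, j) with 1-based positions, shortest substrings first."""
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            yield i, i + length - 1


def primary_cells(i, k):
    """Candidate [p, q] cells for the primary entry in L[i, k][p, q]."""
    return [(0, 0)] + [(p, q) for p in range(i, k + 1) for q in range(p, k + 1)]
```

A recognizer would iterate over spans(n), then over each split point k from i through j - 1, and then over primary_cells(i, k), applying the case analysis to every pair of matching entries.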
      <Paragraph position="9"> The above algorithm can be shown to run in time $O(n^7)$ where $n$ is the length of the input.</Paragraph>
      <Paragraph position="10"> In case 4b we have to consider all possible values for $r$, $s$ between $p$ and $q$. The complexity of this case dominates the complexity of the algorithm since the other cases involve fewer variables (i.e., $r$ and $s$ are not involved). Case 4b takes time $O((q - p)^2)$ and with the loops for $i$, $j$, $k$, $p$, $q$ ranging from 1 through $n$ the time complexity of the algorithm is $O(n^7)$.</Paragraph>
      <Paragraph position="11"> However, this algorithm can be improved to obtain a time complexity of $O(n^6)$ by using the same method employed in [9]. This improvement is achieved by moving part of case 4b outside of the $k$ loop, since looking for $((A, \beta'\gamma), T')$ in $L[p, q][r, s]$ need not be done within the $k$ loop. The details of the improved method may be found in [9] where parsing of Linear Indexed Grammar (LIG) was considered. Note that $O(n^6)$ (which we achieve with the improved method) is the best known result for parsing Tree Adjoining Grammars, which generate the same class of languages generated by CCG and LIG.</Paragraph>
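The two bounds can be read off by counting loop variables. The breakdown of the improved bound below is a sketch of one plausible accounting, assuming that the part of case 4b kept inside the $k$ loop only records that a suitable $k$ exists; the actual construction is the one given in [9].

```latex
% Naive version: case 4b runs inside the loops over i, j, k, p, q.
\[
  \underbrace{O(n^{5})}_{i,\,j,\,k,\,p,\,q} \times \underbrace{O(n^{2})}_{r,\,s} = O(n^{7})
\]
% Improved version: the lookup over r, s is moved outside the k loop.
\[
  \underbrace{O(n^{5})}_{\text{work inside the } k \text{ loop}}
  + \underbrace{O(n^{4})}_{i,\,j,\,p,\,q} \times \underbrace{O(n^{2})}_{r,\,s}
  = O(n^{6})
\]
```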
      <Paragraph position="12"> $A[..\alpha] \rightarrow A_1[\alpha_1] \ldots A_{i-1}[\alpha_{i-1}] \; A_i[..\beta] \; A_{i+1}[\alpha_{i+1}] \ldots A_n[\alpha_n]$ and $A[\alpha] \rightarrow a$. The first form of production is interpreted as: if a nonterminal $A$ is associated with some stack with the sequence $\alpha$ on top (denoted $[..\alpha]$), it can be rewritten such that the $i$-th child inherits this stack with $\beta$ replacing $\alpha$. The remaining children inherit the bounded stacks given in the production.</Paragraph>
      <Paragraph position="13"> The second form of production indicates that if a nonterminal $A$ has a stack containing a sequence $\alpha$ then it can be rewritten to a terminal symbol $a$.</Paragraph>
      <Paragraph position="14"> The language generated by a LIG is the set of strings derived from the start symbol with an empty stack.</Paragraph>
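The stack handling in the first form of LIG production can be sketched in Python. The encoding is an assumption for illustration: a stack is a tuple whose top is its last element, and inherit_stack is a hypothetical helper name.

```python
def inherit_stack(stack, alpha, beta):
    """Stack passed to the distinguished child of A[..alpha] -> ... A_i[..beta] ...

    Returns None when alpha is not on top of the stack, i.e. when the
    production cannot be applied to this stack.
    """
    n = len(alpha)
    start = len(stack) - n
    if n > len(stack) or stack[start:] != alpha:
        return None
    # Replace the top sequence alpha by beta; the rest of the stack is shared.
    return stack[:start] + beta
```

For example, with the stack ("a", "b", "c"), alpha = ("c",) and beta = ("d", "e"), the distinguished child receives ("a", "b", "d", "e"), while the other children receive the fixed, bounded stacks named in the production.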
    </Section>
  </Section>
  <Section position="5" start_page="1" end_page="5" type="metho">
    <SectionTitle>
3 Recovering All Parses
</SectionTitle>
    <Paragraph position="0"> At this stage, rather than enumerating all the parses, we will encode these parses by means of a shared forest structure. The encoding of the set of all parses must be concise enough so that even an exponential number of parses can be represented by a polynomial sized shared forest. Note that this is not achieved by any previously presented shared forest representation for CCG [8].</Paragraph>
    <Section position="1" start_page="1" end_page="1" type="sub_section">
      <SectionTitle>
3.1 Representing the Shared Forest
</SectionTitle>
      <Paragraph position="0"> Recently, there has been considerable interest in the use of shared forests to represent ambiguous parses in natural language processing [1, 8]. Following Billot and Lang [1], we use grammars as a representation scheme for shared forests. In our case, the grammars we produce may also be viewed as acyclic and-or graphs, which is the more standard representation used for shared forests.</Paragraph>
      <Paragraph position="1"> The grammatical formalism we use for the representation of shared forests is Linear Indexed Grammar (LIG)³. Like Indexed Grammars (IG), in a LIG stacks containing indices are associated with nonterminals, with the top of the stack being used to determine the set of productions that can be applied. Briefly, we define LIG as follows.</Paragraph>
      <Paragraph position="2"> If $\alpha$ is a sequence of indices and $\gamma$ is an index, we use the notation $A[\alpha\gamma]$ to represent the case where a stack is associated with a nonterminal $A$ having $\gamma$ on top with the remaining stack being $\alpha$. We use the following forms of productions.</Paragraph>
      <Paragraph position="3"> ³It has been shown in [10, 3] that LIG and CCG generate the same class of languages.</Paragraph>
    </Section>
    <Section position="2" start_page="1" end_page="5" type="sub_section">
      <SectionTitle>
3.2 Building the Shared Forest
</SectionTitle>
      <Paragraph position="0"> We start building the shared forest after the recognizer has completed the array $L$ and decided that a given input $a_1 \ldots a_n$ is well-formed. In recovering the parses, having established that some $\alpha$ is in an element of $L$, we search other elements of $L$ to find two categories that combine to give $\alpha$. Since categories behave like stacks the use of CFG for the representation of the set of parse trees is not suitable. For our purposes the LIG formalism is appropriate since it involves stacks and productions describing how a stack can be decomposed based on only its top and bottom elements.</Paragraph>
      <Paragraph position="1"> We refer to the LIG representing the shared forest as $G_{sf}$. The indices used in $G_{sf}$ have the form $(A, \alpha, i, j)$. The terminals used in $G_{sf}$ are names for the combinatory rule or the lexical assignment used (thus derived terminal strings encode derivations in $G$). For example, the terminal $F_m$ indicates the use of the forward composition rule $x/y \;\; y |_1 z_1 |_2 z_2 \ldots |_m z_m$ and $(c, a)$ indicates the lexical assignment of $c$ to the symbol $a$. We use one nonterminal, $P$.</Paragraph>
      <Paragraph position="2"> An input $a_1 \ldots a_n$ is accepted if it is the case that $((S, \epsilon), -) \in L[1, n][0, 0]$. We start by marking this entry. By marking an entry $((A, \alpha), T) \in L[i, j][p, q]$ we are predicting that there is some derivation tree, rooted with the category $S$ and spanning the input $a_1 \ldots a_n$, in which a category represented by this entry will participate. Therefore at some point we will have to consider this entry and build a shared forest to represent all derivations from this category.</Paragraph>
      <Paragraph position="3"> Since we start from $((S, \epsilon), -) \in L[1, n][0, 0]$ and proceed to build a (representation of) derivation trees in a top down fashion we will have loop statements that vary the substring spanned ($a_i \ldots a_j$) from the largest possible (i.e., $i = 1$ and $j = n$) to the smallest (i.e., $i = j$). Within these loop statements the algorithm (with some particular values for $i$ and $j$) will consider marked entries, say $((A, \alpha), T) \in L[i, j][p, q]$ (where $i \le p \le q \le j$ or $p = q = 0$), and will build representations of all derivations from the category (specified by the marked entry) such that the input spanned is $a_i \ldots a_j$. Since $((A, \alpha), T)$ is a representation of possibly more than one category, several cases arise depending on $\alpha$ and $T$. All these cases try to uncover the reasons why the recognizer placed this entry in $L[i, j][p, q]$. Hence the cases considered here are inverses of the cases considered in the recognition phase (and noted in the algorithm given below).</Paragraph>
      <Paragraph position="4">  Mark $((S, \epsilon), -)$ in $L[1, n][0, 0]$.</Paragraph>
      <Paragraph position="5"> By varying $i$ from 1 to $n$, $j$ from $n$ to $i$ and for all appropriate values of $p$ and $q$, if there is a marked entry, say $((A, \alpha'), T) \in L[i, j][p, q]$, then do the following.</Paragraph>
      <Paragraph position="6"> * Type 1 Production (inverse of 1a, 3, and 4a)  If for some $k$ such that $i \le k &lt; j$, some $\alpha$, $\beta$ such that $\alpha' = \alpha\beta$, and $B \in V_N$ we have $((A, \alpha/B), T) \in L[i, k][p, q]$ and $((B, \beta), -) \in L[k+1, j][0, 0]$ then let $p$ be the production $P[..(A, \alpha', i, j)] \rightarrow F_m \; P[..(A, \alpha/B, i, k)] \; P[(B, \beta, k+1, j)]$ where $m = |\beta|$. If $p$ is not already present in $G_{sf}$ then add $p$ and mark $((A, \alpha/B), T) \in L[i, k][p, q]$ as well as $((B, \beta), -) \in L[k+1, j][0, 0]$.</Paragraph>
      <Paragraph position="8">
        * Type 2 Production (inverse of 1b and 2a)  If for some $k$ such that $i \le k &lt; j$, and $\alpha$, $B$, $T'$, $r$, $s$ we have $((A, \alpha/B), T') \in L[i, k][r, s]$ where $(p, q) = (i, k)$, $((B, \alpha'), -) \in L[k+1, j][0, 0]$, $T = /B$, and the lengths of $\alpha$ and $\alpha'$ meet the requirements on the corresponding strings in case 1b and 2a of the recognition algorithm then let $p$ be the production $P[..(A, \alpha/B, i, k)(A, \alpha', i, j)] \rightarrow F_m \; P[..(A, \alpha/B, i, k)] \; P[(B, \alpha', k+1, j)]$ where $m = |\alpha'|$. If $p$ is not already present in $G_{sf}$ then add $p$ and mark $((A, \alpha/B), T') \in L[i, k][r, s]$ and $((B, \alpha'), -) \in L[k+1, j][0, 0]$.
        * Type 3 Production (inverse of 2b)  If for some $k$ such that $i \le k &lt; j$, and some $B$ it is the case that $((A, /B), T) \in L[i, k][p, q]$ and $((B, \alpha'), -) \in L[k+1, j][0, 0]$ where $|\alpha'| &gt; 1$ then let $p$ be the production $P[..(A, \alpha', i, j)] \rightarrow F_m \; P[..(A, /B, i, k)] \; P[(B, \alpha', k+1, j)]$ where $m = |\alpha'|$. If $p$ is not already present in $G_{sf}$ then add $p$ and mark $((A, /B), T) \in L[i, k][p, q]$ and $((B, \alpha'), -) \in L[k+1, j][0, 0]$.
        * Type 5 Production  If $j = i$, then it must be the case that $T = -$ and there is a lexical assignment assigning the category $A\alpha'$ to the input symbol given by $a_i$. Therefore, if it has not already been included, output the production $P[(A, \alpha', i, i)] \rightarrow (A\alpha', a_i)$.
        The number of terminals and nonterminals in the grammar is bounded by a constant. The number of indices and the number of productions in $G_{sf}$ are $O(n^5)$. Hence the shared forest representation we build is polynomial with respect to the length of the input, $n$, despite the fact that the number of derivation trees could be exponential.</Paragraph>
      <Paragraph position="9"> We will now informally argue that $G_{sf}$ can be built in time $O(n^7)$. Suppose an entry $((A, \alpha'), T)$ is in $L[i, j][p, q]$ indicating that for some $\beta$ the category $A\beta\alpha'$ dominates the substring $a_i \ldots a_j$. The method outlined above will build a shared forest structure to represent all such derivations. In particular, we will start by considering a production whose left hand side is given by $P[..(A, \alpha', i, j)]$. It is clear that the introduction of a production of type 4 dominates the time complexity since this case involves three other variables (over input positions), i.e., $r$, $s$, $k$; whereas the introduction of other types of production involves only one new variable $k$. Since we have to consider all possible values for $r$, $s$, $k$ within the range $i$ through $j$, this step will take $O((j - i)^3)$ time. With the outer loops for $i$, $j$, $p$, and $q$ allowing these indices to range from 1 through $n$, the time taken by the algorithm is $O(n^7)$.</Paragraph>
      <Paragraph position="10"> Since the algorithm given here for building the shared forest simply finds the inverses of moves made in the recognition phase we could have modified the recognition algorithm so as to output appropriate $G_{sf}$ productions during the process of recognition without altering the asymptotic complexity of the recognizer.</Paragraph>
      <Paragraph position="11"> However this will cause the introduction of useless productions, i.e., those that describe subderivations which do not partake in any derivation from the category $S$ spanning the entire input string $a_1 \ldots a_n$.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="5" end_page="5" type="metho">
    <SectionTitle>
4 Spurious Ambiguity
</SectionTitle>
    <Paragraph position="0"> We say that a given CCG, G, exhibits spurious ambiguity if there are two distinct derivation trees for a string w that assign the same function argument structure. Two well-known sources of such ambiguity in CCG result from type raising and the associativity of composition. Much attention has been given to the latter form of spurious ambiguity and this is the one that we will focus on in this paper.</Paragraph>
    <Paragraph position="1"> To illustrate the problem, consider the following string of categories.</Paragraph>
    <Paragraph position="2"> $A_1/A_2 \;\; A_2/A_3 \;\; \ldots \;\; A_{n-1}/A_n$. Any pair of adjacent categories can be combined using a composition rule. The number of such derivations is given by the Catalan series and is therefore exponential in $n$. We return a single representative of the class of equivalent derivation trees (arbitrarily chosen to be the right branching tree in the later discussion).</Paragraph>
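The growth of the Catalan series can be checked with a few lines of Python. The function names are hypothetical; the only fact used is the standard closed form for Catalan numbers, which count the binary bracketings of a row of items.

```python
from math import comb


def catalan(k):
    """The k-th Catalan number, C(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def bracketings(num_categories):
    """Number of binary derivation trees over a row of adjacent categories."""
    return catalan(num_categories - 1)
```

For the string above, which contains n - 1 categories, this gives catalan(n - 2) derivations, all assigning the same function argument structure.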
    <Section position="1" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
4.1 Dealing with Spurious Ambiguity
</SectionTitle>
      <Paragraph position="0"> We have discussed how the shared forest representation, $G_{sf}$, is built from the contents of array $L$. The recognition algorithm does not consider whether some of the derivations built are spuriously equivalent and this is reflected in $G_{sf}$. We show how productions of $G_{sf}$ can be marked to eliminate spuriously ambiguous derivations. Let us call this new grammar $G_{ns}$.</Paragraph>
      <Paragraph position="1"> As stated earlier, we are only interested in detecting spuriously equivalent derivations arising from the associativity of composition. Consider the example involving spurious ambiguity shown in Figure 2. This example illustrates the general form of spurious ambiguity (due to associativity of composition) in the derivation of a string made up of contiguous substrings $a_{i_1} \ldots a_{j_1}$, $a_{i_2} \ldots a_{j_2}$, and $a_{i_3} \ldots a_{j_3}$ resulting in a category $A_1\alpha_1\alpha_2\alpha_3$. For the sake of simplicity we assume that each combination indicated is a forward combination and hence $i_2 = j_1 + 1$ and $i_3 = j_2 + 1$.</Paragraph>
      <Paragraph position="2"> Each of the 4 combinations that occur in the above figure arises due to the use of a combinatory rule, and hence will be specified in $G_{sf}$ by a production. For example, it is possible for combination 1 to be represented by the following type 1 production.</Paragraph>
      <Paragraph position="3"> $P[..(A_1, \alpha'\alpha_2/A_3, i_1, j_2)] \rightarrow F_m \; P[..(A_1, \alpha'/A_2, i_1, j_1)] \; P[(A_2, \alpha_2, i_2, j_2)]$ where $i_2 = j_1 + 1$, $\alpha'$ is a suffix of $\alpha_1$ of length less than $k_1$, and $m = |\alpha_2|$. Since $A_2\alpha_2/A_3$ and $A_3\alpha_3$ are used as secondary categories, their lengths are bounded by $k_1 + 1$. Hence these categories will appear in their entirety in their representations in the $G_{sf}$ productions. The four combinations will hence be represented in $G_{sf}$ by productions such as the Type 1 production above.</Paragraph>
      <Paragraph position="4"> These productions give us sufficient information to detect spurious ambiguity locally, i.e., the local left and right branching derivations. Suppose we choose to retain the right branching derivations only. We are no longer interested in combination 2. Therefore we mark the production corresponding to this combination.</Paragraph>
      <Paragraph position="5"> This production is not discarded at this stage because although it is marked it might still be useful in detecting more spurious ambiguity. Notice in Figure 3 that the subtree obtained from considering combination 5 and combination 1 is right branching whereas the entire derivation is not. Since we are looking for the presence of spurious ambiguity locally (i.e., by considering two step derivations), in order to mark this derivation we can only compare it with the derivation where combination 7 combines Aa/A1 with $A_1\alpha_1\alpha_2\alpha_3$ (the result of combination 2). Notice we would have already marked the production corresponding to combination 2. If this production had been discarded then the required comparison could not have been made and the production due to combination 6 could not have been marked. At the end of the marking process all marked productions can be discarded.</Paragraph>
      <Paragraph position="6"> In the procedure to build the grammar $G_{ns}$ we start with the productions for lexical assignments (type 5).</Paragraph>
      <Paragraph position="7"> By varying $i_1$ from $n$ to 1, $j_3$ from $i_1 + 2$ to $n$, $i_2$ from $j_3$ to $i_1 + 1$, and $i_3$ from $i_2 + 1$ to $j_3$ we look for a group of four productions (as discussed above) that locally indicates the presence of spurious ambiguity. Productions involved in derivations that are not right branching are marked.</Paragraph>
      <Paragraph position="8"> It can be shown that this local marking of spurious derivations will eliminate all and only the spuriously ambiguous derivations. That is, enumerating all derivations using unmarked productions, will give all and only genuine derivations. If there are two derivations that are spuriously ambiguous (due to the associativity of composition) then in these derivations there must be at least one occurrence of subderivations of the nature depicted in Figure 3. This will result in the marking of appropriate productions and hence the spurious ambiguity will be detected. By induction it is also possible to show that only the spuriously ambiguous derivations will be detected by the marking process outlined above.</Paragraph>
    </Section>
  </Section>
</Paper>