<?xml version="1.0" standalone="yes"?> <Paper uid="P96-1011"> <Title>Efficient Normal-Form Parsing for Combinatory Categorial Grammar*</Title> <Section position="4" start_page="80" end_page="83" type="metho"> <SectionTitle> 4 A Normal Form for &quot;Pure&quot; CCG </SectionTitle> <Paragraph position="0"> It is convenient to begin with a special case. Suppose the CCG grammar includes not some but all instances of the binary rule templates in (4). (As always, a separate lexicon specifies the possible categories of each word.) If we group a sentence's parses into semantic equivalence classes, it always turns out that exactly one parse in each class satisfies the following simple declarative constraints: (7) a. No constituent produced by >Bn, any n ≥ 1, ever serves as the primary (left) argument to >Bn', any n' ≥ 0.</Paragraph> <Paragraph position="1"> b. No constituent produced by <Bn, any n ≥ 1, ever serves as the primary (right) argument to <Bn', any n' ≥ 0.</Paragraph> <Paragraph position="2"> The notation here is from (4). More colloquially, (7) says that the output of rightward (leftward) composition may not compose or apply over anything to its right (left). A parse tree or subtree that satisfies (7) is said to be in normal form (NF).</Paragraph> <Paragraph position="3"> As an example, consider the effect of these restrictions on the simple sentence &quot;John likes Mary.&quot; Ignoring the tags -OT, -FC, and -BC for the moment, (8a) is a normal-form parse. Its competitor (8b) is not, nor is any larger tree containing (8b). But non- 3 How inefficient? (i) has exponentially many parses: n = 10 yields 82,756,612 parses, falling into 48,620 semantic equivalence classes. Karttunen's method must therefore add 48,620 representative parses to the appropriate chart cell, first comparing each one against all the previously added parses--of which there are 48,620/2 on average--to ensure it is not semantically redundant. 
(Additional comparisons are needed to reject parses other than the lucky 48,620.) Adding a parse can therefore take exponential time.</Paragraph> <Paragraph position="5"> Structure sharing does not appear to help: parses that are grouped in a parse forest have only their syntactic category in common, not their meaning. Karttunen's approach must tease such parses apart and compare their various meanings individually against each new candidate. By contrast, the method proposed below is purely syntactic--just like any &quot;ordinary&quot; parser--so it never needs to unpack a subforest, and can run in polynomial time.</Paragraph> <Paragraph position="6"> standard constituents are allowed when necessary:</Paragraph> <Paragraph position="8"> It is not hard to see that (7a) eliminates all but right-branching parses of &quot;forward chains&quot; like A/B B/C C or A/B/C C/D D/E/F/G G/H, and that (7b) eliminates all but left-branching parses of &quot;backward chains.&quot; (Thus every functor will get its arguments, if possible, before it becomes an argument itself.) But it is hardly obvious that (7) eliminates all of CCG's spurious ambiguity. One might worry about unexpected interactions involving crossing composition rules like A/B B\C → A\C. Significantly, it turns out that (7) really does suffice; the proof is in §4.2.</Paragraph> <Paragraph position="9"> It is trivial to modify any sort of CCG parser to find only the normal-form parses. No semantics is necessary; simply block any rule use that would violate (7). In general, detecting violations will not hurt performance by more than a constant factor. Indeed, one might implement (7) by modifying CCG's phrase-structure grammar. Each ordinary CCG category is split into three categories that bear the respective tags from (9). 
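Blocking a rule use that violates (7) is a constant-time tag check. A minimal sketch in Python (the representation and the function name are illustrative, not from the paper; the tags follow (9)):

```python
# Tags from (9): "OT" = output of application or a lexical item,
# "FC" = output of >Bn (n >= 1), "BC" = output of <Bn (n >= 1).
# A rule is encoded as ('>', n) for >Bn or ('<', n) for <Bn.

def combine_tag(rule, left_tag, right_tag):
    """Output tag for a proposed rule use, or None if constraint (7) blocks it."""
    direction, n = rule
    if direction == '>':
        if left_tag == 'FC':      # (7a): no >Bn output may serve as the
            return None           #       primary (left) argument of a forward rule
        return 'FC' if n >= 1 else 'OT'
    else:
        if right_tag == 'BC':     # (7b): the mirror image for backward rules
            return None
        return 'BC' if n >= 1 else 'OT'
```

A chart parser simply discards any proposed constituent for which the check returns None; this is the constant-factor overhead just mentioned.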
The 24 templates schematized in (10) replace the two templates of (4).</Paragraph> <Paragraph position="10"> Any CFG-style method can still parse the resulting spuriosity-free grammar, with tagged parses as in (8). In particular, the polynomial-time, polynomial-space CCG chart parser of (Vijay-Shanker & Weir, 1993) can be trivially adapted to respect the constraints by tagging chart entries.</Paragraph> <Paragraph position="11"> (9) -FC output of >Bn, some n ≥ 1 (a forward composition rule) -BC output of <Bn, some n ≥ 1 (a backward composition rule) -OT output of >B0 or <B0 (an application rule), or lexical item (10) a. Forward application >B0: x/y-{OT,BC} y-{OT,FC,BC} → x-OT b. Backward application <B0: y-{OT,FC,BC} x\y-{OT,FC} → x-OT c. Fwd. composition >Bn (n ≥ 1): x/y-{OT,BC} y|nzn⋯|2z2|1z1-{OT,FC,BC} → x|nzn⋯|2z2|1z1-FC d. Bwd. composition <Bn (n ≥ 1): y|nzn⋯|2z2|1z1-{OT,FC,BC} x\y-{OT,FC} → x|nzn⋯|2z2|1z1-BC (11) a. Syn/sem for >Bn (n ≥ 0): x/y y|nzn⋯</Paragraph> <Paragraph position="12"> It is interesting to note a rough resemblance between the tagged version of CCG in (10) and the tagged Lambek calculus L*, which (Hendriks, 1993) developed to eliminate spurious ambiguity from the Lambek calculus L. Although differences between CCG and L mean that the details are quite different, each system works by marking the output of certain rules, to prevent such output from serving as input to certain other rules.</Paragraph> <Section position="1" start_page="81" end_page="82" type="sub_section"> <SectionTitle> 4.1 Semantic equivalence </SectionTitle> <Paragraph position="0"> We wish to establish that each semantic equivalence class contains exactly one NF parse. But what does &quot;semantically equivalent&quot; mean? 
Let us adopt a standard model-theoretic view.</Paragraph> <Paragraph position="1"> For each leaf (i.e., lexeme) of a given syntax tree, the lexicon specifies a lexical interpretation from the model. CCG then provides a derived interpretation in the model for the complete tree. The standard CCG theory builds the semantics compositionally, guided by the syntax, according to (11). We may therefore regard a syntax tree as a static &quot;recipe&quot; for combining word meanings into a phrase meaning.</Paragraph> <Paragraph position="2"> One might choose to say that two parses are semantically equivalent iff they derive the same phrase meaning. However, such a definition would make spurious ambiguity sensitive to the fine-grained semantics of the lexicon. Are the two analyses of VP/VP VP VP\VP semantically equivalent? If the lexemes involved are &quot;softly knock twice,&quot; then yes, as softly(twice(knock)) and twice(softly(knock)) arguably denote a common function in the semantic model. Yet for &quot;intentionally knock twice&quot; this is not the case: these adverbs do not commute, and the semantics are distinct.</Paragraph> <Paragraph position="3"> It would be difficult to make such subtle distinctions rapidly. Let us instead use a narrower, &quot;intensional&quot; definition of spurious ambiguity. The trees in (12a-b) will be considered equivalent because they specify the same &quot;recipe,&quot; shown in (12c). No matter what lexical interpretations f, g, h, k are fed into the leaves A/B, B/C/D, D/E, E/F, both the trees end up with the same derived interpretation, namely a model element that can be determined from f, g, h, k by calculating λxλy.f(g(h(k(x)))(y)).</Paragraph> <Paragraph position="4"> By contrast, the two readings of &quot;softly knock twice&quot; are considered to be distinct, since the parses specify different recipes. 
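The recipe idea can be made concrete by evaluating two bracketings over symbolic leaf meanings; here is a small Python illustration on the shorter chain A/B B/C C (the encoding is ours, not the paper's machinery):

```python
# Model lexical interpretations as functions over symbolic terms, so the
# derived interpretation of a parse is exactly the "recipe" it computes.

def apply_fwd(f, x):            # >B0:  x/y  y  ->  x
    return f(x)

def compose_fwd(f, g):          # >B1:  x/y  y/z  ->  x/z
    return lambda z: f(g(z))

f = lambda t: "f(%s)" % t       # interpretation of A/B
g = lambda t: "g(%s)" % t       # interpretation of B/C
c = "c"                         # interpretation of C

right_branching = apply_fwd(f, apply_fwd(g, c))        # A/B (B/C C)
left_branching  = apply_fwd(compose_fwd(f, g), c)      # (A/B B/C) C
assert right_branching == left_branching == "f(g(c))"  # one recipe, one class

# "softly knock twice": the two attachments yield different recipes.
softly = lambda v: "softly(%s)" % v
twice  = lambda v: "twice(%s)" % v
assert softly(twice("knock")) != twice(softly("knock"))
```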
That is, given a suitably free choice of meanings for the words, the two parses can be made to pick out two different VP-type functions in the model. The parser is therefore conservative and keeps both parses. 4</Paragraph> </Section> <Section position="2" start_page="82" end_page="83" type="sub_section"> <SectionTitle> 4.2 Normal-form parsing is safe & complete </SectionTitle> <Paragraph position="0"> The motivation for producing only NF parses (as defined by (7)) lies in the following existence and uniqueness theorems for CCG.</Paragraph> <Paragraph position="1"> Theorem 1 Assuming &quot;pure CCG,&quot; where all possible rules are in the grammar, any parse tree α is semantically equivalent to some NF parse tree NF(α).</Paragraph> <Paragraph position="2"> (This says the NF parser is safe for pure CCG: we will not lose any readings by generating just normal forms.) Theorem 2 Given distinct NF trees α ≠ α′ (on the same sequence of leaves). Then α and α′ are not semantically equivalent.</Paragraph> <Paragraph position="3"> (This says that the NF parser is complete: generating only normal forms eliminates all spurious ambiguity.) Detailed proofs of these theorems are available on the cmp-lg archive, but can only be sketched here. Theorem 1 is proved by a constructive induction on the order of α, given below and illustrated in (13): * For α a leaf, put NF(α) = α.</Paragraph> <Paragraph position="4"> * (<R, β, γ> denotes the parse tree formed by combining subtrees β, γ via rule R.) If α = <R, β, γ>, then take NF(α) = <R, NF(β), NF(γ)>, which exists by inductive hypothesis, unless this is not an NF tree. In the latter case, WLOG, R is a forward rule and NF(β) = <Q, β1, β2> for some forward composition rule Q. Pure CCG turns out to provide forward rules S and T such that α′ = <S, β1, NF(<T, β2, γ>)> is a constituent and is semantically equivalent to α. 
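For the fragment containing only >B0 and >B1, the rotation in this construction can be sketched as follows (our simplification: in this fragment the rules S and T coincide with R, whereas the full proof chooses their arities case by case):

```python
def nf(tree):
    """Normalize a forward chain. A tree is a leaf (category string) or a
    triple (rule, left, right) with rule in {'>B0', '>B1'}."""
    if isinstance(tree, str):
        return tree
    rule, left, right = tree
    left, right = nf(left), nf(right)
    if isinstance(left, tuple) and left[0] == '>B1':
        # The primary (left) argument is composition output, which (7a)
        # forbids: rotate <R, <Q, b1, b2>, g> into <S, b1, NF(<T, b2, g>)>.
        _, b1, b2 = left
        return (rule, b1, nf((rule, b2, right)))
    return (rule, left, right)

# The left-branching parse of  A/B B/C C  normalizes to the right-branching one:
assert nf(('>B0', ('>B1', 'A/B', 'B/C'), 'C')) == \
       ('>B0', 'A/B', ('>B0', 'B/C', 'C'))
```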
Moreover, since β1 serves as the primary subtree of the NF tree NF(β), β1 cannot be the output of forward composition, and is NF besides. Therefore α′ is NF:</Paragraph> <Paragraph position="6"> 1993) appear to share this view of semantic equivalence.</Paragraph> <Paragraph position="7"> Unlike (Karttunen, 1986), they try to eliminate only parses whose denotations (or at least λ-terms) are systematically equivalent, not parses that happen to have the same denotation through an accident of the lexicon.</Paragraph> <Paragraph position="9"> This construction resembles a well-known normal-form reduction procedure that (Hepple & Morrill, 1989) propose (without proving completeness) for a small fragment of CCG.</Paragraph> <Paragraph position="10"> The proof of theorem 2 (completeness) is longer and more subtle. First it shows, by a simple induction, that since α and α′ disagree they must disagree in at least one of these ways: (a) There are trees β, γ and rules R ≠ R′ such that <R, β, γ> is a subtree of α and <R′, β, γ> is a subtree of α′. (For example, S/S S\S may form a constituent by either <B1x or >B1x.) (b) There is a tree γ that appears as a subtree of both α and α′, but combines to the left in one case and to the right in the other.</Paragraph> <Paragraph position="11"> Either condition, the proof shows, leads to different &quot;immediate scope&quot; relations in the full trees α and α′ (in the sense in which f takes immediate scope over g in f(g(x)) but not in f(h(g(x))) or g(f(x))). Condition (a) is straightforward. Condition (b) splits into a case where γ serves as a secondary argument inside both α and α′, and a case where it is a primary argument in α or α′. 
The latter case requires consideration of γ's ancestors; the NF properties crucially rule out counterexamples here.</Paragraph> <Paragraph position="12"> The notion of scope is relevant because semantic interpretations for CCG constituents can be written as restricted lambda terms, in such a way that constituents having distinct terms must have different interpretations in the model (for suitable interpretations of the words, as in §4.1). Theorem 2 is proved by showing that the terms for α and α′ differ somewhere, so correspond to different semantic recipes. Similar theorems for the Lambek calculus were previously shown by (Hepple, 1990; Hendriks, 1993).</Paragraph> <Paragraph position="13"> The present proofs for CCG establish a result that has long been suspected: the spurious ambiguity problem is not actually very widespread in CCG.</Paragraph> <Paragraph position="14"> Theorem 2 says all cases of spurious ambiguity can be eliminated through the construction given in theorem 1. But that construction merely ensures a right-branching structure for &quot;forward constituent chains&quot; (such as A/B B/C C or A/B/C C/D D/E/F/G G/H), and a left-branching structure for backward constituent chains. So these familiar chains are the only source of spurious ambiguity in CCG.</Paragraph> </Section> </Section> <Section position="5" start_page="83" end_page="85" type="metho"> <SectionTitle> galootN \]N\]NP </SectionTitle> <Paragraph position="0"> If some rules are removed from a &quot;pure&quot; CCG grammar, some parses will become unavailable.</Paragraph> <Paragraph position="1"> Theorem 2 remains true (≤ 1 NF per reading).</Paragraph> <Paragraph position="2"> Whether theorem 1 (≥ 1 NF per reading) remains true depends on what set of rules is removed. For most linguistically reasonable choices, the proof of theorem 1 will go through, 5 so that the normal-form parser of §4 remains safe. 
But imagine removing only the rule B/C C → B: this leaves the string A/B B/C C with a left-branching parse that has no (legal) NF equivalent.</Paragraph> <Paragraph position="3"> In the sort of restricted grammar where theorem 1 does not obtain, can we still find one (possibly non-NF) parse per equivalence class? Yes: a different kind of efficient parser can be built for this case.</Paragraph> <Paragraph position="4"> Since the new parser must be able to generate a non-NF parse when no equivalent NF parse is available, its method of controlling spurious ambiguity cannot be to enforce the constraints (7). The old parser refused to build non-NF constituents; the new parser will refuse to build constituents that are semantically equivalent to already-built constituents. This idea originates with (Karttunen, 1986).</Paragraph> <Paragraph position="5"> However, we can take advantage of the core result of this paper, theorems 1 and 2, to do Karttunen's redundancy check in O(1) time--no worse than the normal-form parser's check for -FC and -BC tags.</Paragraph> <Paragraph position="6"> (Karttunen's version takes worst-case exponential time for each redundancy check: see footnote 3.) The insight is that theorems 1 and 2 establish a one-to-one map between semantic equivalence classes and normal forms of the pure (unrestricted) CCG: (15) Two parses α, α′ of the pure CCG are semantically equivalent iff they have the same normal form: NF(α) = NF(α′).</Paragraph> <Paragraph position="7"> The NF function is defined recursively by §4.2's proof of theorem 1; semantic equivalence is also defined independently of the grammar. So (15) is meaningful and true even if α, α′ are produced by a restricted CCG. The tree NF(α) may not be a legal parse under the restricted grammar. 
However, it is still a perfectly good data structure that can be maintained outside the parse chart, to serve as a magnet for α's semantic class. 5 For the proof to work, the rules S and T must be available in the restricted grammar, given that R and Q are. This is usually true: since (7) favors standard constituents and prefers application to composition, most grammars will not block the NF derivation while allowing a non-NF one. (On the other hand, the NF parse of A/B B/C C/D/E uses >B2 twice, while the non-NF parse gets by with >B2 and >B1.) The proof of theorem 1 (see (13)) actually shows how to construct NF(α) in O(1) time from the values of NF on smaller constituents. Hence, an appropriate parser can compute and cache the NF of each parse in O(1) time as it is added to the chart. It can detect redundant parses by noting (via an O(1) array lookup) that their NFs have been previously computed.</Paragraph> <Paragraph position="8"> Figure 1 gives an efficient CKY-style algorithm based on this insight. (Parsing strategies besides CKY would also work, in particular (Vijay-Shanker & Weir, 1993).) The management of cached NFs in steps 9, 12, and especially 16 ensures that duplicate NFs never enter the oldNFs array: thus any alternative copy of α.nf has the same array coordinates used for α.nf itself, because it was built from identical subtrees.</Paragraph> <Paragraph position="9"> The function PreferableTo(α, τ) (step 15) provides flexibility about which parse represents its class. PreferableTo may be defined at whim to choose the parse discovered first, the more left-branching parse, or the parse with fewer non-standard constituents. Alternatively, PreferableTo may call an intonation or discourse module to pick the parse that better reflects the topic-focus division of the sentence. (A variant algorithm ignores PreferableTo and constructs one parse forest per reading. Each forest can later be unpacked into individual equivalent parse trees, if desired.) 
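The O(1) bookkeeping amounts to interning NF trees by the coordinates (rule, left child's seqno, right child's seqno). A Python sketch of just that interning step (a dictionary stands in for the paper's array; names such as intern_nf are ours):

```python
# Each distinct NF tree receives a small integer (seqno); two parses are
# semantically equivalent iff their NFs intern to the same integer.

oldNFs = {}      # (rule, left_seqno, right_seqno) -> seqno
counter = 0

def intern_nf(rule, left_seqno, right_seqno):
    """O(1) expected time: one dictionary probe per newly built constituent."""
    global counter
    key = (rule, left_seqno, right_seqno)
    if key not in oldNFs:
        counter += 1
        oldNFs[key] = counter
    return oldNFs[key]

# Two parses whose NFs decompose identically collide on one seqno,
# so the second is detected as redundant ...
assert intern_nf('>B0', 1, 2) == intern_nf('>B0', 1, 2)
# ... while a genuinely different NF gets a fresh seqno.
assert intern_nf('>B1', 1, 2) != intern_nf('>B0', 1, 2)
```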
(Vijay-Shanker & Weir, 1990) also give a method for removing &quot;one well-known source&quot; of spurious ambiguity from restricted CCGs; §4.2 above shows that this is in fact the only source. However, their method relies on the grammaticality of certain intermediate forms, and so can fail if the CCG rules can be arbitrarily restricted. In addition, their method is less efficient than the present one: it considers parses in pairs, not singly, and does not remove any parse until the entire parse forest has been built.</Paragraph> 6 Extensions to the CCG Formalism In addition to the Bn (&quot;generalized composition&quot;) rules given in §2, which give CCG power equivalent to TAG, rules based on the S (&quot;substitution&quot;) and T (&quot;type-raising&quot;) combinators can be linguistically useful. S provides another rule template, used in the analysis of parasitic gaps (Steedman, 1987; Szabolcsi, 1989): (16) a. >S: x/y|z y|z → x|z b. <S: y|z x\y|z → x|z Although S interacts with Bn to produce another source of spurious ambiguity, illustrated in (17), the additional ambiguity is not hard to remove. It can be shown that when the restriction (18) is used together with (7), the system again finds exactly one parse from every equivalence class. Figure 1: 1. for i := 1 to n 2. C[i - 1, i] := LexCats(word[i]) (* word i stretches from point i - 1 to point i *) 3. for width := 2 to n 4. for start := 0 to n - width 5. end := start + width 6. for mid := start + 1 to end - 1 7. for each parse tree α = <R, β, γ> that could be formed by combining some β ∈ C[start, mid] with some γ ∈ C[mid, end] by a rule R of the (restricted) grammar 8. α.nf := NF(α) (* can be computed in constant time using the .nf fields of β, γ, and other constituents already in C. Subtrees are also NF trees. *) 9. existingNF := oldNFs[α.nf.rule, α.nf.leftchild.seqno, α.nf.rightchild.seqno] 10. if undefined(existingNF) (* the first parse with this NF *) 11. α.nf.seqno := (counter := counter + 1) (* number the new NF & add it to oldNFs *) 12. oldNFs[α.nf.rule, α.nf.leftchild.seqno, α.nf.rightchild.seqno] := α.nf 13. add α to C[start, end] 14. α.nf.currparse := α 15. elsif PreferableTo(α, existingNF.currparse) (* replace reigning parse? *) 16. α.nf := existingNF (* use cached copy of NF, not new one *) 17. remove α.nf.currparse from C[start, end] 18. add α to C[start, end] 19. α.nf.currparse := α 20. return(all parses from C[0, n] having root category S) (A simpler normal-form parser will suffice for most grammars.)</Paragraph> <Paragraph position="11"> (18) a. No constituent produced by >Bn, any n ≥ 2, ever serves as the primary (left) argument to >S.</Paragraph> <Paragraph position="12"> b. No constituent produced by <Bn, any n ≥ 2, ever serves as the primary (right) argument to <S.</Paragraph> <Paragraph position="13"> Type-raising presents a greater problem. Various new spurious ambiguities arise if it is permitted freely in the grammar. In principle one could proceed without grammatical type-raising: (Dowty, 1988; Steedman, 1991) have argued on linguistic grounds that type-raising should be treated as a mere lexical redundancy property. That is, whenever the lexicon contains an entry of a certain category X, with semantics x, it also contains one with (say) category T/(T\X) and interpretation λp.p(x).</Paragraph> <Paragraph position="14"> As one might expect, this move only sweeps the problem under the rug. If type-raising is lexical, then the definitions of this paper do not recognize (19) as a spurious ambiguity, because the two parses are now, technically speaking, analyses of different sentences. Nor do they recognize the redundancy in (20), because--just as for the example &quot;softly knock twice&quot; in §4.1--it is contingent on a kind of lexical coincidence, namely that a type-raised subject commutes with a (generically) type-raised object. 
Such ambiguities are left to future work.</Paragraph> </Section> </Paper>