<?xml version="1.0" standalone="yes"?> <Paper uid="P94-1018"> <Title>A Psycholinguistically Motivated Parser for CCG</Title> <Section position="7" start_page="127" end_page="129" type="evalu"> <SectionTitle> 5 CCG and flexible derivation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="127" end_page="127" type="sub_section"> <SectionTitle> 5.1 The Problem </SectionTitle> <Paragraph position="0"> CCG's distinguishing characteristic is its derivational flexibility -- the fact that one string is potentially assigned many truth-conditionally equivalent analyses. This feature is crucial to the present approach of incremental parsing (as well as for a range of grammatical phenomena, see e.g. Steedman 1987, 1994; Dowty 1988). But the additional ambiguity, sometimes referred to as 'spurious', is also a source of difficulty for parsing. For example, the truth-conditionally unambiguous string 'John was thinking that Bill had left' has CCG derivations corresponding to each of the 132 different binary trees possible for seven leaves. The fact that this sentence makes no unusual demands on humans makes it clear that its exponentially proliferating ambiguous analyses are pruned somehow.</Paragraph> <Paragraph position="1"> The interpreter, which can resolve many kinds of ambiguity, cannot be used for this task: it has no visible basis for determining, for example, that the single-constituent analysis 'John was thinking' 2In addition to the category-ambiguity problem in (3), the viable analysis criterion solves other problems, analogous to shift-reduce ambiguities, which are omitted here for reasons of space. 
The interested reader is referred to Niv (1993a) for a comprehensive discussion and an implementation of the parser proposed here.</Paragraph> <Paragraph position="2"> somehow makes more sense (in CCG) than the two-constituent analysis 'John'+'was thinking'.</Paragraph> <Paragraph position="3"> Note that the maximally left-branching derivation is the one which most promptly identifies syntactic relations, and is thus the preferred derivation. It is possible to extend the viable analysis criterion to encompass this consideration of efficiency as well.</Paragraph> <Paragraph position="4"> The infant learns that it is usually most efficient to combine whenever possible, and to discard an analysis in which a combination is possible, but not taken.3</Paragraph> <Paragraph position="5"> While this left-branching criterion eliminates the inefficiency due to flexibility of derivation, it gives rise to difficulties with (5).</Paragraph> <Paragraph position="6"> John loves Mary madly (5) s/vp vp/np np vp\vp In (5), it is precisely the non-left-branching derivation of 'John loves Mary' which is necessary in order to make the VP constituent available for combination with the adverb. (See Pareschi and Steedman 1987.)</Paragraph> </Section> <Section position="2" start_page="127" end_page="128" type="sub_section"> <SectionTitle> 5.2 Previous Approaches </SectionTitle> <Paragraph position="0"> Following up on the work of Lambek (1958), who proposed that the process of deriving the grammaticality of a string of categories be viewed as a proof, there have been quite a few proposals put forth for computing only normal forms of derivations or proofs (König 1989; Hepple and Morrill 1989; Hepple 1991; inter alia). The basic idea with all of these works is to define 'normal forms' -- distinguished members of each equivalence class of derivations -- and to require the parser to search this smaller space of possible derivations. 
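The difficulty that example (5) poses for the left-branching criterion can be made concrete with a small illustrative sketch (mine, not the paper's parser): the combinatory rules needed for (5), showing that the maximally left-branching derivation of 'John loves Mary' never builds the vp constituent that the adverb needs.

```python
# Illustrative sketch (not the paper's implementation).  Categories are
# atoms (strings) or triples (result, slash, argument), covering (5):
#   John    loves    Mary    madly
#   s/vp    vp/np    np      vp\vp

JOHN  = ('s', '/', 'vp')
LOVES = ('vp', '/', 'np')
MARY  = 'np'
MADLY = ('vp', '\\', 'vp')

def fapp(x, y):
    """>0 forward application: X/Y  Y  =>  X"""
    if isinstance(x, tuple) and x[1] == '/' and x[2] == y:
        return x[0]
    return None

def bapp(x, y):
    """<0 backward application: Y  X\\Y  =>  X"""
    if isinstance(y, tuple) and y[1] == '\\' and y[2] == x:
        return y[0]
    return None

def fcomp(x, y):
    """>1 forward composition: X/Y  Y/Z  =>  X/Z"""
    if (isinstance(x, tuple) and isinstance(y, tuple)
            and x[1] == '/' and y[1] == '/' and x[2] == y[0]):
        return (x[0], '/', y[2])
    return None

# Left-branching (maximally incremental) derivation of 'John loves Mary':
s_np = fcomp(JOHN, LOVES)   # ('s', '/', 'np')
s    = fapp(s_np, MARY)     # 's' -- but no vp constituent was ever built,
                            # so MADLY (vp\vp) has nothing to attach to.

# The non-left-branching derivation makes the vp available for the adverb:
vp = fapp(LOVES, MARY)              # 'vp'
s2 = fapp(JOHN, bapp(vp, MADLY))    # 's'
```

Note that `bapp('s', MADLY)` fails, which is exactly the problem the left-branching criterion runs into at 'madly'.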
But none of the proposed methods result in parsing systems which proceed incrementally through the string.4 Karttunen (1989) and others have proposed chart-based parsers which directly address the derivational ambiguity problem. For the present purpose, the principal feature of chart parsing -- the factoring out of constituents from analyses -- turns out to create an encumbrance: The interpreter cannot compare constituents, or arcs, for the purposes of ambiguity resolution. It must compare analyses of the entire prefix so far, which are awkward to compute from the developing chart.</Paragraph> <Paragraph position="1"> 3Discussion of the consequences of this move on the processing of picture-noun extractions and ambiguity-related filled-gap effects is omitted for lack of space. See Niv (1993a).</Paragraph> <Paragraph position="2"> 4In the case of Hepple's (1991) proposal, a left-branching normal form is indeed computed. But its computation must be delayed for some words, so it does not provide the interpreter with timely information about the incoming string.</Paragraph> <Paragraph position="3"> Pareschi and Steedman (1987) propose the following strategy (which can be taken out of the chart-parsing context of their paper): construct only maximally left-branching derivations, but allow a limited form of backtracking when a locally non-left-branching derivation turns out to have been necessary. For example, when parsing (5), Pareschi and Steedman's algorithm constructs the left-branching analysis for 'John loves Mary'. When it encounters 'madly', it applies >0 in reverse to solve for the hidden VP constituent 'loves Mary' by subtracting the s/vp category 'John' from the s. The idea with this 'revealing' operation is to exploit the fact that the rules >n and <n, when viewed as three-place relations, are functional in all three arguments. That is, knowledge of any two of {left constituent, right constituent, result} uniquely determines the third. 
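The functionality that revealing exploits can be sketched as follows (an illustration of mine, not Pareschi and Steedman's implementation; the string-based categories and helper names are my own simplifications): under >0, any two of {left constituent, right constituent, result} determine the third.

```python
# Sketch of 'revealing' via reverse >0 (X/Y  Y  =>  X).  Categories are
# plain strings with a single outermost slash, e.g. 's/vp'.

def split(cat):
    """Split 'X/Y' into ('X', 'Y'); atomic categories yield None."""
    i = cat.find('/')
    return (cat[:i], cat[i + 1:]) if i >= 0 else None

def fapp(left, right):
    """Forward application: given left and right, compute the result."""
    parts = split(left)
    return parts[0] if parts and parts[1] == right else None

def reveal_right(left, result):
    """Given left constituent and result, solve for the hidden right one."""
    parts = split(left)
    return parts[1] if parts and parts[0] == result else None

def reveal_left(right, result):
    """Given right constituent and result, solve for the left one."""
    return result + '/' + right

# Parsing (5): after the left-branching derivation of 'John loves Mary'
# yields s, subtracting the s/vp category of 'John' from the s reveals
# the hidden vp constituent 'loves Mary':
hidden = reveal_right('s/vp', 's')   # 'vp'
```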
There are many problems with the completeness and soundness of Pareschi and Steedman's proposal (Hepple 1987; Niv 1993a). For example, in (7), the category b\c cannot be revealed after it has participated in two combinations of mixed direction: <0 and >0.</Paragraph> <Paragraph position="4"> Nonetheless, the idea of revealing is very attractive in the present setting. I propose to replace their unification-based revealing operation with a normal-form based manipulation of the derivation history. The idea is to construct and maintain the maximally incremental, left-branching derivations (see section 4). When a constituent such as the VP 'loves Mary' in (5) may be necessary, e.g. whenever the rightmost constituent in an analysis is of the form X\Y, the next-to-rightmost derivation is rewritten to its equivalent right-branching derivation by repeated application of the local transformations defined in (8) and (9).</Paragraph> <Paragraph position="5"> The right frontier of the rewritten derivation now provides all the grammatically possible attachment sites.</Paragraph> <Paragraph position="7"> Results from the study of rewrite systems (see Klop (1992) for an overview) help determine the computational complexity of this operation:</Paragraph> </Section> <Section position="3" start_page="128" end_page="129" type="sub_section"> <SectionTitle> 6.1 A Rewrite System for Derivations </SectionTitle> <Paragraph position="0"> If x is a node in a binary tree, let λ(x) (resp. ρ(x)) refer to its left (resp. right) child.</Paragraph> <Paragraph position="1"> Any subtree of a derivation which matches the left-hand-side of either (8) or (9) is called a redex. The result of replacing a redex by the corresponding right-hand-side of a rule is called the contractum. A derivation is in normal form (NF) if it contains no redexes. In the following I use the symbol → to also stand for the relation over pairs of derivations such that the second is derived from the first by one application of (8) or (9). Let ← be the converse of →. 
Let ↔ be → ∪ ←. Let →* be the reflexive transitive closure of →; similarly, let ←* be the reflexive transitive closure of ←, and ↔* the reflexive transitive closure of ↔. Note that ↔*</Paragraph> <Paragraph position="2"> is an equivalence relation.</Paragraph> <Paragraph position="3"> A rewrite system is strongly normalizing (SN) iff every sequence of applications of → is finite.</Paragraph> <Paragraph position="4"> Theorem 1 → is SN.5 proof Every derivation with n internal nodes is assigned a positive integer score. An application of → is guaranteed to yield a derivation with a lower score. 5Hepple and Morrill (1989) proved SN for a slight variant of →. The present proof provides a tighter score function; see lemma 1 below.</Paragraph> <Paragraph position="5"> This is done by defining functions μ and σ for each node of the derivation as follows:</Paragraph> <Paragraph position="7"> Each application of → decreases σ, the score of the derivation. This follows from the monotonic dependency of the score of the root of the derivation upon the scores of each sub-derivation, and from the fact that locally, the score of a redex decreases when → is applied: In figure 2, a derivation is depicted schematically with a redex whose sub-constituents are named a, b, and c. Applying → reduces the score of the redex, hence the score of the whole derivation.</Paragraph> <Paragraph position="10"> Observe that μ(x) is the number of internal nodes in x.</Paragraph> <Paragraph position="11"> Lemma 1 Given a derivation x, let n = μ(x). Every sequence of applications of → is of length at most n(n - 1)/2.6 proof By induction on n: Base case: n = 1; 0 applications are necessary. Induction: Suppose true for all derivations of fewer than n internal nodes. Let m = μ(λ(x)). 
6Niv (1994) shows by example that this bound is tight.</Paragraph> <Paragraph position="12"> So 0 ≤ m ≤ n - 1 and μ(ρ(x)) = n - m - 1.</Paragraph> <Paragraph position="14"> So far I have shown that every sequence of applications of → is not very long: at most quadratic in the size of the derivation. I now show that when there is a choice of redex, it makes no difference which redex one picks. That is, all redex selection strategies result in the same normal form.</Paragraph> <Paragraph position="15"> A rewrite system is Church-Rosser (CR) just in case ∀x, y. (x ↔* y ⊃ ∃z. (x →* z ∧ y →* z)). A rewrite system is Weakly Church-Rosser (WCR) just in case ∀w, x, y. ((w → x ∧ w → y) ⊃ ∃z. (x →* z ∧ y →* z)). Lemma 2 → is WCR.</Paragraph> <Paragraph position="16"> proof Let w be a derivation with two distinct redexes x and y, yielding the two distinct derivations w' and w'' respectively. There are a few possibilities: case 1: x and y share no internal nodes. There are three subcases: x dominates y (includes y as a subconstituent), x is dominated by y, or x and y are incomparable with respect to dominance. In each case, it is clear that the order of application of → makes no difference.</Paragraph> <Paragraph position="17"> case 2: x and y share some internal node. Without loss of generality, y does not dominate x. There exists a derivation z such that w' →* z ∧ w'' →* z. This is depicted in figure 3. (Note that all three internal nodes in figure 3 are of the same rule direction, either > or <.) □ Lemma 3 (Newman) WCR ∧ SN ⊃ CR.</Paragraph> <Paragraph position="18"> Theorem 2 → is CR.</Paragraph> <Paragraph position="19"> proof From theorem 1 and lemmas 2 and 3. □ Therefore any maximal sequence of applications of → will lead to the normal form.7 We are free to select the most efficient redex selection scheme. From lemma 1 the worst case is quadratic. 
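The rewrite system and its bounds can be sketched in a simplified model of my own (derivations are bare binary trees, every left-branching node counts as a redex, and the score function below is one choice with the properties used in the proof of theorem 1; real CCG rewriting must also recompute the combinatory rule at each rotated node):

```python
# Sketch: the local transformations (8)/(9) rotate a left-branching redex
# ((a, b), c) into the right-branching (a, (b, c)).  CCG rule bookkeeping
# is omitted; only the tree-rewriting skeleton is modelled.

def mu(x):
    """mu(x): number of internal nodes of x (0 for a leaf)."""
    return mu(x[0]) + mu(x[1]) + 1 if isinstance(x, tuple) else 0

def sigma(x):
    """One suitable score: sum over internal nodes of mu(left child).
    Each rotation strictly decreases it, so -> is strongly normalizing;
    a left comb with n internal nodes scores n(n-1)/2, matching lemma 1."""
    if not isinstance(x, tuple):
        return 0
    return sigma(x[0]) + sigma(x[1]) + mu(x[0])

def rewrite_once(t):
    """Apply -> at the redex closest to the root; returns (tree, applied?)."""
    if not isinstance(t, tuple):
        return t, False
    left, right = t
    if isinstance(left, tuple):            # redex ((a, b), c) -> (a, (b, c))
        a, b = left
        return (a, (b, right)), True
    new_right, changed = rewrite_once(right)
    return (left, new_right), changed

def normal_form(t):
    """Rewrite to the right-branching normal form, counting steps."""
    steps, changed = 0, True
    while changed:
        prev = sigma(t)
        t, changed = rewrite_once(t)
        if changed:
            assert sigma(t) < prev         # the score decreases (SN)
            steps += 1
    return t, steps

# Maximally left-branching derivation of a 4-word string:
left_comb = ((('w1', 'w2'), 'w3'), 'w4')
nf, steps = normal_form(left_comb)
# nf == ('w1', ('w2', ('w3', 'w4')));  with the root-closest strategy the
# number of steps stays within n, well under the n(n-1)/2 worst case.
n = mu(left_comb)
assert steps <= n <= n * (n - 1)
```

Because the system is CR, any other redex selection strategy would reach the same `nf`; the root-closest choice above just reaches it in fewer steps.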
Niv (1994) shows that the optimal strategy, of applying → as close as possible to the root, yields application sequences of at most n steps.</Paragraph> <Paragraph position="20"> 7Assuming, as is the case with extant CCG accounts, that constraints on the applicability of the combinatory rules do not present significant roadblocks to the derivation rewrite process.</Paragraph> </Section> <Section position="4" start_page="129" end_page="129" type="sub_section"> <SectionTitle> 6.2 Discussion </SectionTitle> <Paragraph position="0"> Given the rightmost subconstituent recovered using the normal form technique above, how should parsing proceed? Obviously, if the leftward-looking category which precipitated the normal form computation is a modifier, i.e. of the form X\X, then it ought to be combined with the recovered constituent in a form analogous to Chomsky adjunction. But what if this category is not of the form X\X? For example, should the parser compute the reanalysis in (10)? Ascribing the same non-garden-path status to the reanalysis in (10) that we do to (6) would constitute a very odd move: Before reanalysis, the derivation encoded the commitment that the /b of the first category is satisfied by the b of the b/c in the second category. This commitment is undone in the reanalysis. This is an undesirable property to have in a computational model of parsing commitment, as it renders certain revisions of commitments easier than others, without any empirical justification. 
Furthermore, given the possibility that the parser may change its mind about what serves as argument to what, the interpreter must be able to cope with such non-monotonic updates to its view of the analysis so far -- this would surely complicate the design of the interpreter.8 Therefore, constituents on the right frontier of a right-normal-form derivation should only combine with 'endocentric' categories to their right.</Paragraph> <Paragraph position="1"> The precise definition of 'endocentric' depends on the semantic formalism used -- it certainly includes post-head modifiers, and might also include coordination.</Paragraph> <Paragraph position="2"> Stipulating that certain reanalyses are impossible immediately makes the parser 'incomplete' in the sense that it cannot find the analysis in (10).</Paragraph> <Paragraph position="3"> From the current perspective of identifying garden paths, this incompleteness is a desirable, even a necessary property. In (10), committing to the composition of a/b and b/c is tantamount to being led down the garden path. In a different sense, the current parser is complete: it finds all analyses if the Viable Analysis Criterion and the interpreter never discard any analyses.</Paragraph> </Section> </Section> </Paper>