<?xml version="1.0" standalone="yes"?>
<Paper uid="P99-1060">
  <Title>An Earley-style Predictive Chart Parsing Method for Lambek Grammars</Title>
  <Section position="3" start_page="0" end_page="465" type="metho">
    <SectionTitle>
2 The Lambek Calculus
</SectionTitle>
    <Paragraph position="0"> We are concerned with the implicational (or 'product-free') fragment of the associative Lambek calculus (Lambek, 1958). A natural deduction formulation is provided by the following rules of elimination and introduction, which correspond to steps of functional application and abstraction, respectively (as the term labelling reveals). The rules are sensitive to the order of assumptions. In the [/I] (resp. [\I]) rule, [B] indicates a discharged or withdrawn assumption, which is required to be the rightmost (resp. leftmost) of the proof.</Paragraph>
    <Paragraph position="1"> The above proof illustrates 'hypothetical reasoning', i.e. the presence of additional assumptions ('hypotheticals') in proofs that are subsequently discharged. It is because of this phenomenon that standard chart methods are inadequate for the Lambek calculus: hypotheticals do not belong at any position on the single ordering over lexical categories by which standard charts are organised (footnote 1). The previous chart methods for the Lambek calculus deal with this problem in different ways. The method of König (1990, 1994) places hypotheticals on separate 'minicharts' which can attach into other (mini)charts where combinations are possible. The method requires rather complicated book-keeping. The method of Hepple (1992) avoids this complicated book-keeping, and also rules out some useless subderivations allowed by König's method, but does so at the cost of computing a representation of all the possible category sequences that might be tested in an exhaustive sequent proof search.</Paragraph>
    <Paragraph position="2"> Footnote 1: In effect, hypotheticals belong on additional suborderings, which can connect into the main ordering of the chart at various positions, generating a branching, multi-dimensional ordering scheme.</Paragraph>
    <Paragraph position="3"> Neither of these methods exhibits performance that would be satisfactory for practical use (footnote 2).</Paragraph>
  </Section>
  <Section position="4" start_page="465" end_page="465" type="metho">
    <SectionTitle>
3 Some Preliminaries
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="465" end_page="465" type="sub_section">
      <SectionTitle>
3.1 First-order Compilation for
Categorial Parsing
</SectionTitle>
      <Paragraph position="0"> Hepple (1996) introduces a method of first-order compilation for implicational linear logic, to provide a basis for efficient theorem proving of various categorial formalisms. Implicational linear logic is similar to the Lambek calculus, except that it has only a single non-directional implication ⊸. The idea of first-order compilation is to eliminate the need for hypothetical reasoning by simplifying higher-order formulae (whose presence requires hypothetical reasoning) to first-order formulae. This involves excising the subformulae that correspond to hypotheticals, leaving a first-order residue. The excised subformulae are added as additional assumptions. For example, a higher-order formula (Z ⊸ Y) ⊸ X simplifies to Z + (Y ⊸ X), allowing proof (a) to be replaced by (b):</Paragraph>
      <Paragraph position="2"> The method faces two key problems: avoiding invalid deduction and getting an appropriate semantics for the combination. To avoid invalid deduction, an indexing scheme is used to ensure that a hypothetical must be used to derive the argument of the residue functor from which it was excised (e.g. Z must be used to derive the argument Y of Y ⊸ X, a condition satisfied in proof (b)). To get the same semantics with compilation as without, the semantic effects of the introduction rule are compiled into the terms of the formulae produced, e.g. (Z ⊸ Y) ⊸ X : w gives Z : z plus Y ⊸ X : λu.w(λz.u). Terms are combined, not using standard application/β-reduction, but rather an operation λx.g + h ⇒ g[h//x] where a variant of substitution is used that allows 'accidental' variable capture. Thus when Y ⊸ X combines with its argument, whose derivation includes Z, the latter's variable becomes bound.</Paragraph>
      <Paragraph position="3"> Footnote 2: Morrill (1996) provides a somewhat different tabular method for Lambek parsing within the proof net deduction framework, in an approach where proof net checking is made by unifying labels marked on literals. The approach tabulates MGUs for the labels of contiguous subsegments of a proof net.</Paragraph>
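      <Paragraph position="4"> The compilation step and the capture-permitting substitution described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the tuple encodings of formulae and terms, and the names compile_second_order, subst_capturing and combine, are assumptions made for the example. The indexing scheme is omitted, and only a single level of nesting, (Z -o Y) -o X, is handled.

```python
# Hypothetical encodings (not from the paper):
#   formulae: an atom is a string; ("-o", A, B) stands for A -o B
#   terms:    ("var", x), ("lam", x, body), ("app", f, a)

def var(name):
    return ("var", name)

def lam(v, body):
    return ("lam", v, body)

def app(f, a):
    return ("app", f, a)

def compile_second_order(formula, term, z="z", u="u"):
    """Split (Z -o Y) -o X : w into the residue Y -o X : lam u. w(lam z. u)
    and the hypothetical Z : z. Index sets are left out of this sketch."""
    assert isinstance(formula, tuple) and formula[0] == "-o"
    _, arg, result = formula
    assert isinstance(arg, tuple) and arg[0] == "-o", "expects (Z -o Y) -o X"
    _, hyp, inner = arg
    residue = (("-o", inner, result), lam(u, app(term, lam(z, var(u)))))
    hypothetical = (hyp, var(z))
    return residue, hypothetical

def subst_capturing(term, x, h):
    """Replace free occurrences of x in term by h WITHOUT renaming bound
    variables, so free variables of h may be captured on purpose."""
    kind = term[0]
    if kind == "var":
        return h if term[1] == x else term
    if kind == "lam":
        _, v, body = term
        if v == x:                      # x is rebound here, nothing to replace
            return term
        return lam(v, subst_capturing(body, x, h))
    _, f, a = term
    return app(subst_capturing(f, x, h), subst_capturing(a, x, h))

def combine(functor_term, arg_term):
    """The operation lam x. g + h => g[h//x]."""
    kind, x, g = functor_term
    assert kind == "lam"
    return subst_capturing(g, x, arg_term)

residue, hyp = compile_second_order(("-o", ("-o", "Z", "Y"), "X"), var("w"))
# Suppose Y is derived from the hypothetical Z using some f : Z -o Y.
result = combine(residue[1], app(var("f"), hyp[1]))
print(result)   # the term w(lam z. f z), with z now bound
```

 Ordinary capture-avoiding substitution would rename the bound z and leave the hypothetical's variable free; skipping the renaming is what lets the compiled term recover the effect of the introduction rule.</Paragraph>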
      <Paragraph position="5"/>
    </Section>
    <Section position="2" start_page="465" end_page="465" type="sub_section">
      <SectionTitle>
3.2 Multiset-valued Linear Indexed
Grammar
</SectionTitle>
      <Paragraph position="0"> Rambow (1994) introduces the multiset-valued linear indexed grammar formalism ({}-LIG). Indices are stored in an unordered multiset representation (cf. the stack of conventional linear indexed grammar). The contents of the multiset at any mother node in a tree are distributed amongst its daughter nodes in a linear fashion, i.e. each index is passed to precisely one daughter. Rules take the form A0[m0] → A1[m1]...An[mn]. The multiset of indices m0 is required to be present in, and is removed from, the multiset context of the mother node in a tree. For each daughter Ai, the indices mi are added to whatever other indices are inherited by that daughter. Thus, a rule A[] → B[1] C[] (where [] indicates an empty multiset) can license the use of a rule D[1] → a within the derivation of its daughter B[1], and so the indexing system allows the encoding of dominance relations.</Paragraph>
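      <Paragraph position="1"> As an illustration of how indices are removed, distributed and added when a {}-LIG rule is applied, consider the following sketch. The encoding of rules as pairs, the function name apply_rule and the explicit shares argument are assumptions made for this example. Treating a terminal rule as one with no daughters, and requiring that nothing is left over at such a rule, is likewise an assumption of the sketch.

```python
from collections import Counter

def apply_rule(context, rule, shares):
    """Expand a mother node whose multiset context is `context`.

    rule   = (m0, [(daughter, mi), ...]) for A0[m0] -> A1[m1] ... An[mn]
    shares = one Counter per daughter, saying which inherited indices it gets
    Returns the (daughter, multiset context) pairs, or None if the rule is
    not applicable or the proposed distribution is not linear.
    """
    m0, daughters = rule
    if m0 - context:                 # an index required by m0 is missing
        return None
    inherited = context - m0         # what remains to be passed down
    if len(shares) != len(daughters):
        return None
    if sum(shares, Counter()) != inherited:
        return None                  # every index must go to exactly one daughter
    return [(cat, share + mi) for (cat, mi), share in zip(daughters, shares)]

# A[] -> B[1] C[] : the index 1 is introduced on the daughter B
top = (Counter(), [("B", Counter([1])), ("C", Counter())])
print(apply_rule(Counter(), top, [Counter(), Counter()]))

# D[1] -> a can only be used where the index 1 is in the context
leaf = (Counter([1]), [])
print(apply_rule(Counter([1]), leaf, []))   # applicable: []
print(apply_rule(Counter(), leaf, []))      # not applicable: None
```

 Because the index 1 only enters the context below B, a rule that consumes it can appear only inside the subtree rooted at B, which is the dominance encoding mentioned above.</Paragraph>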
    </Section>
  </Section>
  <Section position="5" start_page="465" end_page="470" type="metho">
    <SectionTitle>
4 A New Chart Parsing Method for Lambek Grammars
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="465" end_page="468" type="sub_section">
      <SectionTitle>
4.1 Lambek to SLMG Conversion
</SectionTitle>
      <Paragraph position="0"> The first task of the parsing approach is to convert the antecedent formulae of the sequent to be proved into a collection of rules of a formalism I call Span Labelled Multiset Grammar (SLMG). For digestibility, I will present the conversion process in three stages. (I will assume that in any sequent Γ ⇒ A to be proved, the succedent A is atomic. Any sequent not in this form is easily converted to one, of equivalent theoremhood, which is.)</Paragraph>
      <Paragraph position="2"> Firstly, directional types are labelled with span information using the labelling scheme of Morrill (1995) (which is justified in relation to relational algebraic models for the Lambek calculus (van Benthem, 1991)). An antecedent Xi in X1...Xn ⇒ X0 has basic span (h-i) where h = (i - 1). The labelled formula is computed from (Xi:(h-i))⁺ using the polar translation functions shown in Figure 1 (where p̄ denotes the complementary polarity to p) (footnote 3). As an example, Figure 1 also shows the results of converting the antecedents of X/(Y/Z), W, (W\Y)/Z ⇒ X (where k is a constant and i, j variables) (footnote 4).</Paragraph>
      <Paragraph position="3"> Footnote 3: The constants produced in the translation correspond to 'new' string positions, which make up the additional suborderings on which hypotheticals are located. The variables produced in the translation become instantiated to some string constant during an analysis, fixing the position at which an additional subordering becomes 'attached to' another (sub)ordering. Footnote 4: The idea of implementing categorial grammar as a non-directional logic, but associating atomic types with string position pairs (i.e. spans) to handle word order, is used in Pareschi (1988), although in that approach all string positions instantiate to values on a single ordering (i.e. integers 0 to n for a string of length n), which is not sufficient for Lambek calculus deductions.</Paragraph>
      <Paragraph position="4"> The second stage of the conversion is adapted from the first-order compilation method of Hepple (1996), discussed earlier, modified to handle directional formulae and using a modified indexation scheme to record dependencies between residue formulae and excised hypotheticals (one where both the residue and hypothetical record the dependency). For this procedure, the 'atomic type plus span label' units that result from the previous stage are treated as atomic units. The procedure T is defined by the cases shown in Figure 2 (although the method is perhaps best understood from the example also shown there). Its input is a pair (T, t), T a span labelled formula, t its associated term (footnote 5). This procedure simplifies higher-order formulae to first-order ones in the manner already discussed, and records dependencies between hypothetical and residue formulae using the indexing scheme. Assuming the antecedents of our example X/(Y/Z), W, (W\Y)/Z ⇒ X to have terms s1, s2, s3 respectively, compilation yields results as in the example in Figure 2. The higher-order X/(Y/Z) yields two output formulae: the main residue X/Y and the hypothetical Z, with the dependency between the two indicated by the common index 1 in the argument index set of the former and the principal index set of the latter. The empty sets elsewhere indicate the absence of such dependencies.</Paragraph>
      <Paragraph position="5"> The final stage of the conversion process converts the results of the second phase into SLMG productions. The method will be explained by example. For a functor such as B\(((A\X)/D)/C), we can easily project the sequence of arguments it requires. Footnote 5: Note that the "+" of (A + Γ) in (T0) simply pairs together the single compiled formula A with the set Γ of compiled formulae, where A is the main residue of the input formula and Γ its derived hypotheticals.</Paragraph>
      <Paragraph position="6"> [Figure 2 (fragment): cases of the compilation procedure T] (T1b) as for (T1a) modulo directionality of connective. (T2a) T((m, X1/(Y:m1), t)) = (m, X2/(Y:m1), λv.s) + Γ where Y atomic, T((m, X1, (t v))) = (m, X2, s) + Γ, v a fresh variable. (T2b) as for (T2a) modulo directionality of connective.</Paragraph>
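      <Paragraph position="7"> The projection of a first-order functor's arguments into the right-hand side of a production can be sketched as follows. The tuple encoding of directional types and the function name project are assumptions made for this example, and span labels and index sets are ignored.

```python
# Hypothetical encoding (not from the paper):
#   ("/", R, A)  stands for R/A, which seeks its argument A to the right
#   ("\\", A, R) stands for A\R, which seeks its argument A to the left
#   atoms are plain strings

def project(functor):
    """Return (result, rhs), with rhs the arguments in left-to-right order."""
    left, right = [], []
    t = functor
    while isinstance(t, tuple):
        if t[0] == "/":
            _, t, arg = t
            right.append(arg)      # arguments taken later lie further right
        else:
            _, arg, t = t
            left.insert(0, arg)    # arguments taken later lie further left
    return t, left + right

# B\(((A\X)/D)/C)
functor = ("\\", "B", ("/", ("/", ("\\", "A", "X"), "D"), "C"))
print(project(functor))            # ('X', ['A', 'B', 'C', 'D'])
```

 The functor combines first with B on its left, then with C and D on its right, and finally with A on its left, so the arguments line up as A B C D around the functor's own position.</Paragraph>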
      <Paragraph position="8"> If the functor was the lexical category of a word w, it might be viewed as fulfilling a role akin to a PS rule such as X → A B w C D. For the present approach, with explicit span labelling, there is no need to include a rhs element to mark the position of the functor (or word) itself, so the corresponding production would be more akin to X → A B C D. For an atomic formula, the corresponding production will have an empty rhs, e.g. A → 0 (footnote 6). The left and right hand side units of SLMG productions all take the form A[m](i-j), where A is an atomic type, m is a set of indices (if m is empty, the unit may be written A[](i-j)), and (i-j) a span label. Footnote 6: Note that 0 is used rather than ε to avoid the suggestion of the empty string, which it is not; matters to do with the 'string' are handled solely within the span labelling. This point is reinforced by observing that the 'string language' generated by a collection of SLMG productions will consist only of (nonempty) sequences of 0's. The real import of an SLMG derivation is not its terminal yield, but rather the instantiation of span labels that it induces (for string matters), and its structure (for semantic matters).</Paragraph>
      <Paragraph position="9"> For a formula (m, T, t) resulting after first-order compilation, the rhs elements of the corresponding production correspond to the arguments (if any) of T, whereas its lhs combines the result type (plus span) of T with the multiset m. For our running example X/(Y/Z), W, (W\Y)/Z ⇒ X, the formulae resulting from the second phase (by first-order compilation) give rise to productions as shown in Figure 3. The associated semantic term for each rule is intended to be applied to the semantics of its daughters in their left-to-right order (which may require some reordering of the outermost lambdas, cf. the terms of the first-order formulae, e.g. as for the last rule).</Paragraph>
      <Paragraph position="10"> A sequent X1...Xn ⇒ X0 is proven if we can build an SLMG tree with root X0[](0-n) in which the SLMG rules derived from the antecedents are each used precisely once, and which induces a consistent binding over span variables.</Paragraph>
      <Paragraph position="11"> For our running example, the required derivation, shown below, yields the correct interpretation s1(λz.s3 z s2).</Paragraph>
      <Paragraph position="13"> Note that 'linear resource use', i.e. that each rule must be used precisely once, is enforced by the span labelling scheme and does not need to be separately stipulated.</Paragraph>
      <Paragraph position="14"> Thus, the span (0-n) is marked on the root of the derivation. To bridge this span, the main residues of the antecedent formulae must all participate (since each 'consumes' a basic subspan of the main span) and they in turn require participation of their hypotheticals via the indexing scheme.</Paragraph>
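      <Paragraph position="15"> The intuition that the main residues must all take part in bridging the root span can be illustrated with a small check on basic spans. This is only a sketch of the tiling argument for basic spans, with hypothetical function names; it does not model hypotheticals, span variables or the indexing scheme.

```python
def basic_spans(n):
    """Antecedent i (counting from 1) has basic span (h, i) with h = i - 1."""
    return [(i - 1, i) for i in range(1, n + 1)]

def bridges(spans, n):
    """True if the spans chain end to end from position 0 to position n."""
    position = 0
    for left, right in sorted(spans):
        if left != position:       # a gap or an overlap breaks the chain
            return False
        position = right
    return position == n

spans = basic_spans(3)
print(spans)                         # [(0, 1), (1, 2), (2, 3)]
print(bridges(spans, 3))             # True
print(bridges(spans[:1] + spans[2:], 3))   # False: antecedent 2 is left out
```

 Leaving any antecedent out leaves a gap in the chain, and using one twice would cover the same positions twice, which is why use of each main residue exactly once follows from the span labels on the root.</Paragraph>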
      <Paragraph position="16"/>
    </Section>
    <Section position="2" start_page="468" end_page="470" type="sub_section">
      <SectionTitle>
4.2 The Earley-style Parsing Method
</SectionTitle>
      <Paragraph position="0"> The chart parsing method to be presented is derived from the Earley-style DTG parsing method of Rambow et al. (1995), and in some sense both simplifies and complicates their method. In effect, we abstract from their method a simpler one for Earley-style parsing of {}-LIG (which is a simpler formalism than the Linear Prioritized Multiset Grammar (LPMG) into which they compile DTG), and then extend this method to handle the span labelling of SLMG. A key difference of the new approach as compared to standard chart methods is that the usual external notion of span is dispensed with, and the combination of edges is instead regimented in terms of the explicit span labelling of categories in rules. The unification of span labels requires edges to carry explicit binding information for span variables. We use R to denote the set of rules derived from the sequent, and E the set of edges in the chart. The general form of edges is: ((m1, m2), θ, r, (A → Γ • Δ)) where (A → Γ, Δ) ∈ R, θ is a substitution over span variables, r is a restrictor set identifying span variables whose values are required non-locally (explained below), and m1, m2 are multisets. In a {}-LIG or SLMG tree, there is no restriction on how the multiset indices associated with any non-terminal node can be distributed amongst its daughters. Rather than cashing out the possible distributions as alternative edges in the predictor step, we can instead, in effect, 'thread' the multiset through the daughters, i.e. passing the entire multiset down to the first daughter, and passing any indices that are not used there on to the next daughter, and so on.</Paragraph>
      <Paragraph position="1"> For an edge ((m1, m2), θ, r, (A → Γ • Δ)), m1 corresponds to the multiset context at the time the ancestor edge with dotted rule (A → • Γ Δ) was introduced, and m2 is the current multiset for passing onto the daughters in Δ. We call m1 the initial multiset and m2 the current multiset.</Paragraph>
      <Paragraph position="2"> The chart method employs the rules shown in Figure 4. We shall consider each in turn.</Paragraph>
      <Paragraph position="3"> Initialisation: The rule recorded on the edge in this chart rule is not a real one (i.e. ∉ R), but serves to drive the parsing process via the prediction of edges for rules that can derive X0[](0-n). A successful proof of the sequent is shown if the completed chart contains an inactive edge for the special goal category, i.e. there is some edge ((∅, ∅), ∅, ∅, (GOAL[](*-*) → Λ •)) ∈ E. Prediction: The current multiset of the predicting edge is passed onto the new edge as its initial multiset.</Paragraph>
      <Paragraph position="4"> The latter's current multiset (m6) may differ from its initial one due either to the removal of an index to license the new rule's use (i.e. if m5 is non-empty), or to the addition of indices from the predicting edge's next rhs unit (i.e. if m4 is non-empty). (Note the 'sloppy' use of set, rather than explicitly multiset, notation. The present approach is such that the same index should never appear in both of two unioned sets, so there is in practice little difference.)</Paragraph>
      <Paragraph position="6"> [Figure 4 (fragment): chart rules] Initialisation: if the initial sequent is X1 ... Xn ⇒ X0. Completer: if ((m1, m2), θ1, r1, (A[m3](f-g) → Γ • B[m4](i-h), Δ)) ∈ E and ((m2, m5), θ2, r2, (B[m6](i-j) → Λ •)) ∈ E then ((m1, m5), θ3, r1, (A[m3](f-(gθ)) → Γ, B[m4](i-j) • (Δθ))) ∈ E</Paragraph>
      <Paragraph position="8"> The line θ = θ1 + MGU((g-h), (i-j)) checks that the corresponding span labels unify, and that the resulting MGU can consistently augment the binding context of the predicting edge.</Paragraph>
      <Paragraph position="9"> This augmented binding is used to instantiate span variables in the new edge where possible.</Paragraph>
      <Paragraph position="10"> It is a characteristic of this parsing method, with top-down left-to-right traversal and associated propagation of span information, that the left span index of the next daughter sought by any active edge is guaranteed to be instantiated, i.e. g above is a constant.</Paragraph>
      <Paragraph position="11"> Commonly the variables appearing in SLMG rules have only local significance and so their substitutions do not need to be carried around with edges. For example, an active edge might require two daughters B[](g-h) C[](h-i). A substitution for h that comes from combining with an inactive edge for B[](g-h) can be immediately applied to the next daughter C[](h-i), and so does not need to be carried explicitly in the binding of the resulting edge.</Paragraph>
      <Paragraph position="12"> However, a situation where two occurrences of a variable appear in different rules may arise as a result of first-order compilation, which will sometimes (but not always) separate a variable occurrence in the hypothetical from another in the residue. For the rule set of our running example, we find an occurrence of h in both the first and second rule (corresponding to the main residue and hypothetical of the initial higher-order functor). The link between the two rules is also indicated by the indexing system. It turns out that for each index there is at most one variable that may appear in the two rules linked by the index. The identity of the 'non-local variables' that associate with each index can be straightforwardly computed off the SLMG grammar (or during the conversion process).</Paragraph>
      <Paragraph position="13"> The function nlv returns the set of non-local variables that associate with a multiset of indices. The line r2 = nlv(m2 ∪ m4) computes the set of variables whose values may need to be passed non-locally, i.e. from the predicting edge down to the predicted edge, or from an inactive edge that results from combination of this predicted edge up to the active edge that consumes it. This 'restrictor set' is used in reducing the substitution θ to cover only those variables whose values need to be stored with the edge. The only case where a substitution needs to be retained for a variable that is not in the restrictor set arises regarding the next daughter it seeks. For example, an active edge might require two daughters B[](g-h) C[1](k-i), where the second's index links it to a hypothetical with span (k-h). Here, a substitution for h from a combination for the first daughter cannot be immediately applied and so should be retained until a combination is made for the second daughter. The function call dauglnlv(Δ) returns the set of non-local variables associated with the multiset indices of the next daughter in Δ (or the empty set if Δ is empty).</Paragraph>
      <Paragraph position="14"> There may be at most one variable in this set that appears in the substitution θ.</Paragraph>
      <Paragraph position="16"> The corresponding line of the prediction rule reduces the substitution to cover only the variables whose values need to be stored. Failing to restrict the substitution in this way undermines the compaction of derivations by the chart, i.e. we would find edges in the chart corresponding to the same subderivation, but which are not recognised as such during parsing because they record incompatible substitutions.</Paragraph>
      <Paragraph position="17"> Completer: Recall from the prediction step that the predicted edge's current multiset may differ from its initial multiset due to the addition of indices from the predicting edge's next rhs unit (i.e. m4 in the prediction rule). Any such added indices must be 'used up' within the subderivation of that rhs element which is realised by the combinations of the predicted edge. This requirement is checked by the condition m5 ⊆ m2. The treatment of substitutions here is very much as for the prediction rule, except that both input edges contribute their own substitution.</Paragraph>
      <Paragraph position="18"> Note that for the inactive edge (as for all inactive edges), both components of the span (i-j) will be instantiated, so we need only unify the right index of the two spans; the left indices can simply be checked for atomic identity. This observation is important to efficient implementation of the algorithm, for which most effort is in practice expended on the completer step. Active edges should be indexed (i.e. hashed) with respect to the (atomic) type and left span index of the next rhs element sought. For inactive edges, the type and left span index of the lhs element should be used. For the completer step when an active edge is added, we need only access inactive edges that are hashed on the same type/left span index to consider for combination; all others can be ignored, and vice versa for the addition of an inactive edge.</Paragraph>
      <Paragraph position="19"> It is notable that the algorithm has no scanning rule, which is due to the fact that the positions of 'lexical items' or antecedent categories are encoded in the span labels of rules, and need no further attention. In the Rambow et al. (1995) algorithm, the scanning component also deals with epsilon productions.</Paragraph>
      <Paragraph position="20"> Here, rules with an empty rhs are dealt with by prediction, by allowing an edge added for a rule with an empty rhs to be treated as an inactive edge (i.e. we equate "() •" and "• ()").</Paragraph>
      <Paragraph position="21"> If the completed chart indicates a successful analysis, it is straightforward to compute the proof terms of the corresponding natural deduction proofs, given a record of which edges were produced by combination of which other edges, or by prediction from which rule. Thus, the term for a predicted edge is simply that of the rule in R, whereas a term for an edge produced by a completer step is arrived at by combining a term of the active edge with one for the inactive edge (using the special substitution operation that allows 'accidental binding' of variables, as discussed earlier). Of course, a single edge may compact multiple alternative subproofs, and so return multiple terms. Note that the approach has no problem in handling multiple lexical assignments; they simply result in multiple rules generated off the same basic span of the chart.</Paragraph>
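      <Paragraph position="22"> The handling of span substitutions in the prediction and completer steps, i.e. extending a binding by the MGU of two span labels and then cutting it down with a restrictor set, can be sketched as follows. The representation is an assumption made for this example: span variables are strings beginning with "?", anything else is a constant, and a substitution is a dictionary. The names unify_spans and restrict are hypothetical.

```python
def is_var(x):
    """Span variables are written as strings starting with '?' (assumption)."""
    return isinstance(x, str) and x.startswith("?")

def walk(x, theta):
    """Follow bindings in theta until a constant or an unbound variable."""
    while is_var(x) and x in theta:
        x = theta[x]
    return x

def unify_spans(theta, span_a, span_b):
    """Return theta extended by an MGU of the two spans, or None on a clash.
    Span positions are atomic, so no occurs check is needed."""
    theta = dict(theta)
    for a, b in zip(span_a, span_b):
        a, b = walk(a, theta), walk(b, theta)
        if a == b:
            continue
        if is_var(a):
            theta[a] = b
        elif is_var(b):
            theta[b] = a
        else:
            return None              # two different constants
    return theta

def restrict(theta, keep):
    """Keep only the bindings of variables in the restrictor set `keep`."""
    return {v: walk(v, theta) for v in keep if walk(v, theta) != v}

theta = unify_spans({}, (1, "?h"), (1, 2))
print(theta)                                   # {'?h': 2}
print(unify_spans(theta, (1, "?h"), (1, 3)))   # None: ?h is already 2
print(restrict({"?h": 2, "?j": 3}, {"?h"}))    # {'?h': 2}
```

 Storing only the restricted substitution on an edge means that two edges for the same subderivation compare equal even when they were reached through different purely local bindings.</Paragraph>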
    </Section>
  </Section>
</Paper>