<?xml version="1.0" standalone="yes"?> <Paper uid="E89-1003"> <Title>EFFICIENT PROCESSING OF FLEXIBLE CATEGORIAL GRAMMAR</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2. Structural Completeness </SectionTitle> <Paragraph position="0"> In the next section we present a grammatical calculus which is more flexible than the systems considered by Wittenburg (1987) and Pareschi & Steedman (1987), and is therefore attractive for linguistic purposes. At the same time, it offers a solution to the spurious ambiguity problem.</Paragraph> <Paragraph position="1"> Spurious ambiguity causes problems for parsing in the systems mentioned above, because there is no systematic relationship between syntactic structures and semantic representations. That is, there is no way to identify in advance, for a given sentence S, a proper subset of the set of all possible syntactic structures and associated semantic representations that is guaranteed to contain all possible semantic representations of S.</Paragraph> <Paragraph position="2"> (5) Strong Structural Completeness If a sequence of categories X1 .. Xn reduces to Y, with semantics Y', there is a reduction to Y, with semantics Y', for any bracketing of X1..Xn into constituents.</Paragraph> <Paragraph position="3"> Grammars with this property can potentially circumvent the spurious ambiguity problem, since for these grammars we only have to inspect all left-branching syntax trees to find all possible readings. This method will only fail if the set of left-branching trees itself contains spuriously ambiguous derivations. In section 4 we will show that these can be eliminated from the calculus presented below.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3. 
The P-calculus </SectionTitle> <Paragraph position="0"> Consider now a grammar for which the following property holds:</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> (4) Structural Completeness </SectionTitle> <Paragraph position="0"> If a sequence of categories X1 .. Xn reduces to Y, there is a reduction to Y for any bracketing of X1..Xn into constituents. (Moortgat, 1987:5) 2 Structurally complete grammars are interesting linguistically, since they are able to handle, for instance, all kinds of non-constituent conjunction, and also because they allow for strict left-to-right processing (see Moortgat, 1988).</Paragraph> <Paragraph position="1"> The latter observation has consequences for parsing as well, since if we can parse every sentence in a strict left-to-right manner (that is, we produce only strictly left-branching syntax trees), the parsing algorithm can be greatly simplified. Notice, however, that such a parsing strategy is only valid if we also guarantee that all possible readings of a sentence can be found in this way. Thus, instead of (4), we are looking for grammars with the following, slightly stronger, property: 2 Buszkowski (1988) provides a slightly different definition in terms of functor-argument structures.</Paragraph> <Paragraph position="2"> The P(roduct)-calculus is a categorial grammar, based on Lambek (1958), which has the property of strong structural completeness. In Lambek (1958), the foundations of flexible categorial grammar are formulated in the form of a calculus for syntactic categories. Well-known categorial rules, such as application, composition and category-raising, are theorems of this calculus. A largely neglected aspect of this calculus is the use of the product-operator.</Paragraph> <Paragraph position="3"> The calculus we present below was developed as an alternative to Moortgat's (1988) M-system. 
The M-system is a subset of the Lambek-calculus, which uses, next to application, only a very general form of composition. Since it has no raising, it seems to be an attractive candidate for investigating the possibilities of left-associative parsing for categorial grammar. It is not completely satisfactory, however, since structural completeness is not fully guaranteed, and also since it is unknown whether the strong structural completeness property holds for this system. In our calculus, we hope to overcome these problems by using product-introduction and -elimination rules instead of composition.</Paragraph> <Paragraph position="4"> The kernel of the P-calculus is right- and left-application, as usual. Next to these,</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> -21 - </SectionTitle> <Paragraph position="0"> we use a rule for introducing the product-operator, and two inference rules for eliminating it. We can use this calculus to produce left-branching syntax trees for any given (grammatical) sentence. (7) is a simple example. rule (application or composition, for instance), we now have the freedom to concatenate arbitrary categories, completely irrespective of their internal structure. The P-calculus is structurally complete. To prove this, we prove that for any four categories A,B,C,D, it holds that :</Paragraph> <Paragraph position="2"> is the derivability relation. From this, structural completeness may be concluded, since any bracketing (or branching of syntax trees) can be obtained by applying this equivalence an arbitrary number of times.</Paragraph> <Paragraph position="3"> Proof: From (AB)C --> D it follows that there exists a category E such that AB --> The first step in the derivation of (7) is the application of rule I. The other two reductions ((a) and (b)) are instantiations of the inference rule P. 
As the example shows, the *-operator (more particularly, its use in I) does something like concatenation, but whereas such operations are normally associated with particular grammatical rules (i.e. you may concatenate two elements of category NP and VP, respectively, if there is a rule 3 To improve readability, we assume that the operators / and \ take precedence over * (X*Y/Z should be read as X*(Y/Z)).</Paragraph> <Paragraph position="5"> We can now include semantics in the proof given above, and from this we may conclude that strong structural completeness holds for the P-calculus as well.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4. Eliminating Spurious </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Ambiguity </SectionTitle> <Paragraph position="0"> In this section we outline a subset of the P-calculus for which efficient processing is possible. As was noted above, in the P-calculus there is always a strictly left-branching derivation for any reading of a sentence S. The restrictions we add in this section are needed to eliminate spurious ambiguities from these left-branching derivations. Restricting a parser so that it will only accept left-branching derivations will not directly lead to an efficient parsing procedure for the P-calculus. The reason is twofold.</Paragraph> <Paragraph position="1"> First, nothing in the P-calculus excludes spurious ambiguity which occurs within the set of left-branching analysis trees. Consider again example (7). This sentence is unambiguous, but nevertheless we can give a left-branching derivation for it which differs from the one given earlier: If we try to prove that A*B and C can be reduced to a category D, we could use P, with I in the left premise. To prove the second premise, we could use P', again using I in the left premise. 
But now the right premise of P' is identical to our initial problem, and thus we have made a useless loop, which could even lead to an infinite regress.</Paragraph> <Paragraph position="2"> These problems can be eliminated if we restrict the grammar in two ways. First of all, we consider only derivations of the form C1,...,Cn ==> S, where C1,...,Cn,S do not contain the product-operator. This means we require that the start symbol of the grammar and the set of lexical categories be product-free. Notice that this restriction can easily be made, since most categorial lexicons do not contain the product-operator anyway.</Paragraph> <Paragraph position="3"> Given this restriction, the inference rule P can be restricted: we require that the left premise of this rule is always an instance of either left- or right-application. Consider what would happen if we used I here:</Paragraph> <Paragraph position="5"> Since the lexicon is product-free, and we are interested in strictly left-branching derivations only, we know that C must be a product-free category. If we combine B and C through I, we are faced with the problem in ***. At this point we could, for instance, use I again, thereby instantiating D as A*(B*C).</Paragraph> <Paragraph position="6"> But this will lead to a spurious ambiguity, since we know that: A*(B*C) E => F iff (A*B)*C E => F 4.</Paragraph> <Paragraph position="7"> A category (A*B)*C can be obtained by applying I directly to A*B and C.</Paragraph> <Paragraph position="8"> If we apply P' at point ***, we find ourselves trying to find a solution for A B => E, and then E C => D. 
But this is nothing other than trying to find a left-branching derivation for A,B,C => D, and therefore the inference step in (11) has not led to anything new.</Paragraph> <Paragraph position="9"> In fact, given that the lexicon is product-free and only application may be used in the left premise of P, P' is never needed to derive a left-branching tree.</Paragraph> <Paragraph position="10"> As a result, we get (12), where we have made a distinction between reduction rules (right- and left-application) and other rules. This enables us to restrict the left premise of P. The fact that every reduction rule is also a general rule of the grammar is expressed by R. P' has been eliminated.</Paragraph> <Paragraph position="11"> 4 In the P-calculus, this follows from the fact that E must be product-free. It is a theorem of the Lambek-calculus as well.</Paragraph> <Paragraph position="13"> The system in (12) is a subset of the P-calculus, which is able to generate a strictly left-branching derivation for every reading of a given sentence of the grammar.</Paragraph> <Paragraph position="14"> The Prolog fragment in (13) shows how the restricted system in (12) can be used to define a simple left-associative parsing algorithm.</Paragraph> </Section> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 5. Shift-reduce parsing </SectionTitle> <Paragraph position="0"> It has sometimes been noted that a derivation tree in categorial grammar (such as (7)) does not really reflect constituent structure in the traditional sense, but that it reflects a particular parse process. 
This may be true for categorial systems in general, but it is particularly clear for the P-calculus.</Paragraph> <Paragraph position="1"> Consider for instance how one would parse a sentence like (7) using a shift-reduce parsing technique, and having only right- and left-application as syntax rules. Shifting an element onto the stack (apart from the first one maybe) seems to be equivalent to combining elements by means of I.</Paragraph> <Paragraph position="2"> The stack is after all nothing but a somewhat different representation of the product types we used earlier. The fact that adding one element to the stack (vp\vp) induces two reduction steps is comparable to the fact that the inference rule P may have the effect of eliminating more than one slash at a time.</Paragraph> <Paragraph position="3"> The similarity between shift-reduce parsing and the derivations in P brings in another interesting aspect. The shift-reduce algorithm is a correct parsing strategy, because it will produce all (syntactic) ambiguities for a given input string. This means that in the example above, a shift-reduce parser would only produce one syntax tree (assuming that the grammar has only application). If the input is potentially ambiguous, as in (15), there are two different derivations. (15) a/a a a\a It is after shifting a onto the stack that a difference arises. Here, one can either reduce or shift one more step. The first choice leads to the left-branching derivation, the second to the right-branching one.</Paragraph> <Paragraph position="4"> The choice between shifting and reducing has a categorial equivalent. In the P-calculus, one can either produce a left-branching derivation tree for (15) by using application only, or as indicated in (16).</Paragraph> <Paragraph position="5"> Note that the P-calculus thus is able to find genuine syntactic (or potentially semantic) ambiguities, without producing a differently branching phrase structure. 
The correspondence to shift-reduce parsing already suggests this, of course, since we should consider the phrase structure produced by a structurally complete grammar much more as a record of the parse process than as a constituent structure in the traditional sense.</Paragraph> </Section> </Paper>