<?xml version="1.0" standalone="yes"?>
<Paper uid="P01-1028">
  <Title>Generating with a Grammar Based on Tree Descriptions: a Constraint-Based Approach</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 Description Grammars
</SectionTitle>
    <Paragraph position="0"> There is a range of grammar formalisms which depart from Tree Adjoining Grammar (TAG) by taking as basic building blocks tree descriptions rather than trees. D-Tree Grammar (DTG) is proposed in (Rambow et al., 1995) to remedy some empirical and theoretical shortcomings of TAG; Tree Description Grammar (TDG) is introduced in (Kallmeyer, 1999) to support syntactic and semantic underspecification and Interaction Grammar is presented in (Perrier, 2000) as an alternative way of formulating linear logic grammars. Like all these frameworks, DG uses tree descriptions and thereby benefits first, from the extended domain of locality which makes TAG particularly suitable for generation (cf. (Joshi, 1987)) and second, from the monotonicity which differentiates descriptions from trees with respect to adjunction (cf. (Vijay-Shanker, 1992)).</Paragraph>
    <Paragraph position="1"> DG differs from DTG and TDG however in that it adopts an axiomatic rather than a generative view of grammar: whereas in DTG and TDG, derived trees are constructed through a sequence of rewriting steps, in DG derived trees are models satisfying a conjunction of elementary tree descriptions. Moreover, DG differs from Interaction Grammars in that it uses a flat rather than a Montague style recursive semantics thereby permitting a simple syntax/semantics interface (see below).</Paragraph>
    <Paragraph position="2"> A Description Grammar is a set of lexical entries of the form a1a3a2a5a4a7a6a9a8 where a2 is a tree description and a6 is the semantic representation associated with a2 .</Paragraph>
    <Paragraph position="3"> Tree descriptions. A tree description is a conjunction of literals that specify either the label of a node or the position of a node relative to  other nodes. As a logical notation quickly becomes unwieldy, we use graphics instead. Figure 1 gives a graphic representation of a small DG fragment. The following conventions are used.</Paragraph>
    <Paragraph position="4"> Nodes represent node variables, plain edges strict dominance and dotted edges dominance. The labels of the nodes abbreviate a feature structure, e.g. the label NP:a90 represents the feature structure a91a93a92a21a94a96a95a39a97a26a98a33a99a100a4a31a101a89a102a88a103a9a97a104a103a106a105 , while the anchor represents the a99a108a107a87a109a33a98 value in the feature structure of the immediately dominating node variable.</Paragraph>
    <Paragraph position="5"> Node variables can have positive, negative or neutral polarity which are represented by black, white and gray nodes respectively. Intuitively, a negative node variable can be thought of as an open valency which must be filled exactly once by a positive node variable while a neutral node variable is a variable that may not be identified with any other node variable. Formally, polarities are used to define the class of saturated models. A saturated model a110 for a tree description a2 (written a110 a111a112 S a2 ) is a model in which each negative node variable is identified with exactly one positive node variable, each positive node variable with exactly one negative node variable and neutral node variables are not identified with any other node variable. Intuitively, a saturated model for a given tree description is the smallest tree satisfying this description and such that all syntactic valencies are filled. In contrast, a free model a110 for a2 (written, a110 a111a112 F a2 ) is a model such that every node in that model interprets exactly one node variable in a2 .</Paragraph>
    <Paragraph position="6"> In DG, lexical tree descriptions must obey the following conventions. First, the polarities are used in a systematic way as follows. Roots of  fragments (fully specified subtrees) are always positive; except for the anchor, all leaves of fragments are negative, and internal node variables are neutral. This guarantees that in a saturated model, tree fragments that belong to the denotation of distinct tree descriptions do not overlap. Second, we require that every lexical tree description has a single minimal free model, which essentially means that the lexical descriptions must be tree shaped.</Paragraph>
    <Paragraph position="7"> Semantic representation. Following (Stone and Doran, 1997), we represent meaning using a flat semantic representation, i.e. as multisets, or conjunctions, of non-recursive propositions. This treatment offers a simple syntax-semantics interface in that the meaning of a tree is just the conjunction of meanings of the lexical tree descriptions used to derive it once the free variables occurring in the propositions are instantiated. A free variable is instantiated as follows: each free variable labels a syntactic node variable a146 and is unified with the label of any node variable identified with a146 . For the purpose of this paper, a simple semantic representation language is adopted which in particular, does not include &amp;quot;handles&amp;quot; i.e. labels on propositions. For a wider empirical coverage including e.g. quantifiers, a more sophisticated version of flat semantics can be used such as Minimal Recursion Semantics (Copestake et al., 1999).</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
3 Parsing with DG
</SectionTitle>
    <Paragraph position="0"> Parsing with DG can be formulated as a model generation problem, the task of finding models satisfying a give logical formula. If we restrict our attention to grammars where every lexical tree description has exactly one anchor and (unrealistically) assuming that each word is associated  with exactly one lexical entry, then parsing a sentence a153a155a154a106a156a31a156a31a156a7a153 a139 consists in finding the saturated model(s) a110 with yield a153a155a154a106a156a31a156a31a156a7a153 a139 such that a110 satisfies the conjunction of lexical tree descriptions</Paragraph>
    <Paragraph position="2"> a0a161a160 the tree description associated with the word a153 a160 by the grammar. Figure 2 illustrates this idea for the sentence &amp;quot;John loves Mary&amp;quot;. The tree on the right hand side represents the saturated model satisfying the conjunction of the descriptions given on the left and obtained from parsing the sentence &amp;quot;John sees Mary&amp;quot; (the isolated negative node variable, the &amp;quot;ROOT description&amp;quot;, is postulated during parsing to cancel out the negative polarity of the top-most S-node in the parse tree). The dashed lines between the left and the right part of the figure schematise the interpretation function: it indicates which node variables gets mapped to which node in the model.</Paragraph>
    <Paragraph position="3"> As (Duchier and Thater, 1999) shows however, lexical ambiguity means that the parsing problem is in fact more complex as it in effect requires that models be searched for that satisfy a conjunction of disjunctions (rather than simply a conjunction) of lexical tree descriptions.</Paragraph>
    <Paragraph position="4"> The constraint based encoding of this problem presented in (Duchier and Thater, 1999) can be sketched as follows1. To start with, the conjunction of disjunctions of descriptions obtained on the basis of the lexical lookup is represented as a matrix, where each row corresponds to a word from the input (except for the first row which is filled with the above mentioned ROOT description) and columns give the lexical entries associated by the grammar with these words. Any matrix entry which is empty is filled with the formula a95a3a162a35a163a161a164 which is true in all models. Figure 3 shows an example parsing matrix for the string &amp;quot;John saw Mary&amp;quot; given the grammar in Figure 1.2 Given such a matrix, the task of parsing con- null the ROOT description in the matrices.</Paragraph>
    <Paragraph position="5"> sists in: 1. selecting exactly one entry per row thereby producing a conjunction of selected lexical entries, 2. building a saturated model for this conjunction of selected entries such that the yield of that model is equal to the input string and 3. building a free model for each of the remain null ing (non selected) entries.</Paragraph>
    <Paragraph position="6"> The important point about this way of formulating the problem is that it requires all constraints imposed by the lexical tree descriptions occurring in the matrix to be satisfied (though not necessarily in the same model). This ensures strong constraint propagation and thereby reduces nondeterminism. In particular, it avoids the combinatorial explosion that would result from first generating the possible conjunctions of lexical descriptions out of the CNF obtained by lexical lookup and second, testing their satisfiability.</Paragraph>
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4 Generating with DG
</SectionTitle>
    <Paragraph position="0"> We now show how the parsing model just described can be adapted to generate from some semantic representation a0 , one or more sentence(s) with semantics a0 .</Paragraph>
    <Section position="1" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.1 Basic Idea
</SectionTitle>
      <Paragraph position="0"> The parsing model outlined in the previous section can directly be adapted for generation as follows. First, the lexical lookup is modified such that propositions instead of words are used to determine the relevant lexical tree descriptions: a lexical tree description is selected if its semantics subsumes part of the input semantics. Second, the constraint that the yield of the saturated model matches the input string is replaced by a constraint that the sum of the cardinalities of the multisets of propositions associated with the lexical tree descriptions composing the solution tree equals the cardinality of the input semantics. Together with the above requirement that only lexical entries be selected whose semantics subsumes part of the goal semantics, this ensures that the semantics of the solution trees is identical with the input semantics.</Paragraph>
      <Paragraph position="1"> The following simple example illustrates this idea. Suppose the input semantics is a91a96a98a108a94a127a165a166a164a5a167a24a90a168a4a89a169a39a109a33a107a47a98a168a170a35a4a31a98a108a94a33a165a171a164a5a167a24a172a173a4a31a165a174a94a127a162a35a175a161a170a35a4a39a176a31a164a35a164a5a167a89a177a127a4a178a90a168a4a178a172a179a170a21a105 and the grammar is as given in Figure 1. The generating matrix then is:</Paragraph>
      <Paragraph position="3"> Given this generating matrix, two matrix models will be generated, one with a saturated model a110a189a188 satisfying a2a178a190a53a191a86a192a35a193a194a133a195a2a150a196a89a197a24a197a35a132a140a133a198a2a150a199a100a200a178a201a89a202 and a free model satisfying a2a150a196a3a197a3a197a178a203 and the other with the saturated model a110a205a204 satisfying a2a178a190a53a191a86a192a35a193a140a133a194a2 a196a89a197a3a197a178a203 a133a194a2 a199a100a200a178a201a89a202 and a free model satisfying a2a150a196a89a197a3a197 a132 . The first solution yields the sentence &amp;quot;John sees Mary&amp;quot; whereas the second yields the topicalised sentence &amp;quot;Mary, John sees.&amp;quot;</Paragraph>
    </Section>
    <Section position="2" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
4.2 Going Further
</SectionTitle>
      <Paragraph position="0"> The problem with the simple method outlined above is that it severely restricts the class of grammars that can be used by the generator. Recall that in (Duchier and Thater, 1999)'s parsing model, the assumption is made that each lexical entry has exactly one anchor. In practice this means that the parser can deal neither with a grammar assigning trees with multiple anchors to idioms (as is argued for in e.g. (Abeill'e and Schabes, 1989)) nor with a grammar allowing for trace anchored lexical entries. The mirror restriction for generation is that each lexical entry must be associated with exactly one semantic proposition. The resulting shortcomings are that the generator can deal neither with a lexical entry having an empty semantics nor with a lexical entry having a multi-propositional semantics. We first show that these restrictions are too strong. We then show how to adapt the generator so as to lift them.</Paragraph>
      <Paragraph position="1"> Empty Semantics. Arguably there are words such as &amp;quot;that&amp;quot; or infinitival &amp;quot;to&amp;quot; whose semantic contribution is void. As (Shieber, 1988) showed, the problem with such words is that they cannot be selected on the basis of the input semantics.</Paragraph>
      <Paragraph position="2"> To circumvent this problem, we take advantage of the TAG extended domain of locality to avoid having such entries in the grammar. For instance, complementizer &amp;quot;that&amp;quot; does not anchor a tree description by itself but occurs in all lexical tree descriptions providing an appropriate syntactic context for it, e.g. in the tree description for &amp;quot;say&amp;quot;. Multiple Propositions. Lexical entries with a multi-propositional semantics are also very common. For instance, a neo-Davidsonian semantics would associate e.g. a162a35a163a5a98a206a167a89a177a41a170a35a4a54a94a88a207a87a164a39a98a161a95a41a167a89a177a127a4a178a90a208a170 with the verb &amp;quot;run&amp;quot; or a162a209a163a5a98a206a167a89a177a96a4a178a90a208a170a35a4a86a99a5a94a33a176a209a95a41a167a89a177a150a170 with the past tensed &amp;quot;ran&amp;quot;. Similarly, agentless passive &amp;quot;be&amp;quot; might be represented by an overt quantification over the missing agent position (such as</Paragraph>
      <Paragraph position="4"> the complement verb semantics). And a grammar with a rich lexical semantics might for instance associate the semantics a214a215a94a33a98a161a95a31a167a89a177a33a154a54a4a178a90a168a4a21a177a88a216a54a170 , a107a87a94a33a217a127a164a93a167a89a177a54a216a33a4a178a90a208a170 with &amp;quot;want&amp;quot; (cf. (McCawley, 1979) which argues for such a semantics to account for examples such as &amp;quot;Reuters wants the report tomorrow&amp;quot; where &amp;quot;tomorrow&amp;quot; modifies the &amp;quot;having&amp;quot; not the &amp;quot;wanting&amp;quot;).</Paragraph>
      <Paragraph position="5"> Because it assumes that each lexical entry is associated with exactly one semantic proposition, such cases cannot be dealt with the generator sketched in the previous section. A simple method for fixing this problem would be to first partition the input semantics in as many ways as are possible and to then use the resulting partitions as the basis for lexical lookup.</Paragraph>
      <Paragraph position="6"> The problems with this method are both theoretical and computational. On the theoretical side, the problem is that the partitioning is made independent of grammatical knowledge. It would be better for the decomposition of the input semantics to be specified by the lexical lookup phase, rather than by means of a language independent partitioning procedure. Computationally, this method is unsatisfactory in that it implements a generate-and-test procedure (first, a partition is created and second, model generation is applied to the resulting matrices) which could rapidly lead to combinatorial explosion and is contrary in spirit to (Duchier and Thater, 1999) constraint-based approach.</Paragraph>
      <Paragraph position="7"> We therefore propose the following alternative procedure. We start by marking in each lexical entry, one proposition in the associated semantics as being the head of this semantic representation. The marking is arbitrary: it does not matter which proposition is the head as long as each semantic representation has exactly one head. We then use this head for lexical lookup.</Paragraph>
      <Paragraph position="8">  basis of their index. That is, a lexical entry is selected iff its head unifies with a proposition in the input semantics. To preserve coherence, we further maintain the additional constraint that the total semantics of each selected entries subsumes (part of) the input semantics. For instance, given the grammar in Figure 4 (where semantic heads are underlined) and the input semantics a162a209a163a5a98a100a167a89a177a96a4a178a90a208a170a35a4a31a98a108a94a33a165a171a164a179a167a24a90a168a4a53a245a96a246a127a247a179a146a168a170a35a4a86a99a179a94a150a176a209a95a33a167a89a177a41a170 , the generating matrix will be:  Given this matrix, two solutions will be found: the saturated tree for &amp;quot;John ran&amp;quot; satisfying the conjunction a2a178a190a12a191a44a192a35a193a248a133a213a2a150a201a73a200 a193 and that for &amp;quot;John did run&amp;quot; satisfying a2a178a190a12a191a44a192a35a193a249a133a250a2a150a201a89a251 a193 a133a252a2a33a253a178a254a151a253 . No other solution is found as for any other conjunction of descriptions made available by the matrix, no saturated model exists.</Paragraph>
      <Paragraph position="9"> 5 Comparison with related work Our generator presents three main characteristics: (i) It is based on an axiomatic rather than a generative view of grammar, (ii) it uses a TAG-like grammar in which the basic linguistic units are trees rather than categories and (iii) it assumes a flat semantics.</Paragraph>
      <Paragraph position="10"> In what follows we show that this combination of features results in a generator which integrates the positive aspects of both top-down and bottom-up generators. In this sense, it is not unlike (Shieber et al., 1990)'s semantic-head-driven generation. As will become clear in the following section however, it differs from it in that it integrates stronger lexicalist (i.e. bottom-up) information. null</Paragraph>
    </Section>
    <Section position="3" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.1 Bottom-Up Generation
</SectionTitle>
      <Paragraph position="0"> Bottom-up or &amp;quot;lexically-driven&amp;quot; generators (e.g., (Shieber, 1988; Whitelock, 1992; Kay, 1996; Carroll et al., 1999)) start from a bag of lexical items with instantiated semantics and generates a syntactic tree by applying grammar rules whose right hand side matches a sequence of phrases in the current input.</Paragraph>
      <Paragraph position="1"> There are two known disadvantages to bottom-up generators. On the one hand, they require that the grammar be semantically monotonic that is, that the semantics of each daughter in a rule subsumes some portion of the mother semantics.</Paragraph>
      <Paragraph position="2"> On the other hand, they are often overly non-deterministic (though see (Carroll et al., 1999) for an exception). We now show how these problems are dealt with in the present algorithm.</Paragraph>
      <Paragraph position="3"> Non-determinism. Two main sources of non-determinism affect the performance of bottom-up generators: the lack of an indexing scheme and the presence of intersective modifiers.</Paragraph>
      <Paragraph position="4"> In (Shieber, 1988), a chart-based bottom-up generator is presented which is devoid of an indexing scheme: all word edges leave and enter the same vertex and as a result, interactions must be considered explicitly between new edges and all edges currently in the chart. The standard solution to this problem (cf. (Kay, 1996)) is to index edges with semantic indices (for instance, the edge with category N/x:dog(x) will be indexed with x) and to restrict edge combination to these edges which have compatible indices. Specifically, an active edge with category A(...)/C(c ...) (with c the semantics index of the missing component) is restricted to combine with inactive edges with category C(c ...), and vice versa.</Paragraph>
      <Paragraph position="5"> Although our generator does not make use of a chart, the constraint-based processing model described in (Duchier and Thater, 1999) imposes a similar restriction on possible combinations as it in essence requires that only these nodes pairs be tried for identification which (i) have opposite polarity and (ii) are labeled with the same semantic index.</Paragraph>
      <Paragraph position="6"> Let us now turn to the second known source of non-determinism for bottom-up generators namely, intersective modifiers. Within a constructive approach to lexicalist generation, the number of structures (edges or phrases) built when generating a phrase with a146 intersective modifiers is a255 a139 in the case where the grammar imposes a single linear ordering of these modifiers. For instance, when generating &amp;quot;The fierce little black cat&amp;quot;, a naive constructive approach will also build the subphrases (1) only to find that these cannot be part of the output as they do not exhaust the input semantics.</Paragraph>
      <Paragraph position="7"> (1) The fierce black cat, The fierce little cat, The little black cat, The black cat, The fierce cat, The little cat, The cat.</Paragraph>
      <Paragraph position="8"> To remedy this shortcoming, various heuristics and parsing strategies have been proposed. (Brew, 1992) combines a constraint-propagation mechanism with a shift-reduce generator, propagating constraints after every reduction step. (Carroll et al., 1999) advocate a two-step generation algorithm in which first, the basic structure of the sentence is generated and second, intersective modifiers are adjoined in. And (Poznanski et al., 1995) make use of a tree reconstruction method which incrementally improves the syntactic tree until it is accepted by the grammar. In effect, the constraint-based encoding of the axiomatic view of generation proposed here takes advantage of Brew's observation that constraint propagation can be very effective in pruning the search space involved in the generation process.</Paragraph>
      <Paragraph position="9"> In constraint programming, the solutions to a constraint satisfaction problem (CSP) are found by alternating propagation with distribution steps.</Paragraph>
      <Paragraph position="10"> Propagation is a process of deterministic inference which fills out the consequences of a given choice by removing all the variable values which can be inferred to be inconsistent with the problem constraint while distribution is a search process which enumerates possible values for the problem variables. By specifying global properties of the output and letting constraint propagation fill out the consequences of a choice, situations in which no suitable trees can be built can be detected early. Specifically, the global constraint stating that the semantics of a solution tree must be identical with the goal semantics rules out the generation of the phrases in (1b). In practice, we observe that constraint propagation is indeed very efficient at pruning the search space. As table 5 shows, the number of choice points (for these specific examples) augments very slowly with the size of the input.</Paragraph>
      <Paragraph position="11"> Semantic monotonicity. Lexical lookup only returns these categories in the grammar whose semantics subsumes some portion of the input semantics. Therefore if some grammar rule involves a daughter category whose semantics is not part of the mother semantics i.e. if the grammar is semantically non-monotonic, this rule will never be applied even though it might need to be. Here is an example. Suppose the grammar contains the following rule (where X/Y abbreviates a category with part-of-speech X and semantics Y): vp/call up(X,Y) a0 v/call up(X,Y), np/Y, pp/up And suppose the input semantics is</Paragraph>
      <Paragraph position="13"> input, lexical lookup will return the categories V/call up(john,mary), NP/john and NP/mary (because their semantics subsumes some portion of the input semantics) but not the category PP/up. Hence the sentence &amp;quot;John called Mary up&amp;quot; will fail to be generated.</Paragraph>
      <Paragraph position="14"> In short, the semantic monotonicity constraint makes the generation of collocations and idioms problematic. Here again the extended domain of locality provided by TAG is useful as it means that the basic units are trees rather than categories. Furthermore, as argued in (Abeill'e and Schabes, 1989), these trees can have multiple lexical anchors. As in the case of vestigial semantics discussed in Section 4 above, this means that phonological material can be generated without its semantics necessarily being part of the input.</Paragraph>
    </Section>
    <Section position="4" start_page="0" end_page="0" type="sub_section">
      <SectionTitle>
5.2 Top-Down Generation
</SectionTitle>
      <Paragraph position="0"> As shown in detail in (Shieber et al., 1990), top-down generators can fail to terminate on certain grammars because they lack the lexical information necessary for their well-foundedness. A simple example involves the following grammar fragment: null r1. s/S a0 np/NP, vp(NP)/S r2. np/NP a0 det(N)/NP, n/N r3. det(N)/NP a0 np/NP0, poss(NP0,NP)/NP r4. np/john a0 john r5. poss(NP0,NP)/mod(N,NP0) a0 s r6. n/father a0 father r7. vp(NP)/left(NP) a0 left Given a top-down regime proceeding depth-first, left-to-right through the search space defined by the grammar rules, termination may fail to occur as the intermediate goal semantics NP (in the second rule) is uninstantiated and permits an infinite loop by iterative applications of rules r2 and r3. Such non-termination problems do not arise for the present algorithm as it is lexically driven. So for instance given the corresponding DG fragment for the above grammar and the input semantics a91 a1 a164a4a3a39a95a39a167a89a177a96a4a178a90a208a170a35a4a5a3a31a94a127a95a24a107a5a164a31a162a108a167a24a90a9a4a178a172 a170a35a4a31a98a108a94a33a165a171a164a5a167a24a172a173a4a53a245a96a246a127a247a179a146a168a170a21a105 , the generator will simply select the tree descriptions for &amp;quot;left&amp;quot;, &amp;quot;John&amp;quot;, &amp;quot;s&amp;quot; and &amp;quot;father&amp;quot; and generate the saturated model satisfying the conjunction of these descriptions.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="0" end_page="0" type="metho">
    <SectionTitle>
6 Implementation
</SectionTitle>
    <Paragraph position="0"> The ideas presented here have been implemented using the concurrent constraint programming language Oz (Smolka, 1995). The implementation includes a model generator for the tree logic presented in section 2, two lexical lookup modules (one for parsing, one for generation) and a small DG fragment for English which has been tested in parsing and generation mode on a small set of English sentences.</Paragraph>
    <Paragraph position="1"> This implementation can be seen as a proof of concept for the ideas presented in this paper: it shows how a constraint-based encoding of the type of global constraints suggested by an axiomatic view of grammar can help reduce non-determinism (few choice points cf. table 5) but performance decreases rapidly with the length of the input and it remains a matter for further research how efficiency can be improved to scale up to bigger sentences and larger grammars.</Paragraph>
  </Section>
class="xml-element"></Paper>