<?xml version="1.0" standalone="yes"?>
<Paper uid="P02-1003">
  <Title>Generation as Dependency Parsing</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
2 The Realization Problem
</SectionTitle>
    <Paragraph position="0"> In this paper, we deal with the subtask of natural language generation known as surface realization: given a grammar and a semantic representation, the problem is to find a sentence which is grammatical according to the grammar and expresses the content of the semantic representation.</Paragraph>
    <Paragraph position="1"> We represent the semantic input as a multiset (bag) of ground atoms of predicate logic, such as {buy(e,a,b), name(a,mary) car(b)}. To encode syntactic information, we use a tree-adjoining grammar without feature structures (Joshi and Schabes, 1997). Following Stone and Doran (1997) and Kay (1996), we enhance this TAG grammar with a syntax-semantics interface in which nonterminal nodes of the elementary trees are equipped with index variables, which can be bound to individuals in the semantic input. We assume that the root node, all substitution nodes, and all nodes that admit adjunction carry such index variables. We also assign a semantics to every elementary tree, so that lexical entries are pairs of the form (ph, T), where ph is a multiset of semantic atoms, and T is an initial or auxiliary tree, e.g.</Paragraph>
    <Paragraph position="3"> When the lexicon is accessed, x,y,z get bound to terms occurring in the semantic input, e.g. e,a,b in our example. Since we furthermore assume that every index variable that appears in T also appears in ph, this means that all indices occurring in T get bound at this stage.</Paragraph>
    <Paragraph position="4"> The semantics of a complex tree is the multiset union of the semantics of the elementary trees involved. Now we say that the realization problem of a grammar G is to decide for a given input semantics S and an index i whether there is a derivation tree which is grammatical according to G, is assigned the semantics S, and has a root node with index i.</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="5" type="metho">
    <SectionTitle>
3 NP-Completeness of Realization
</SectionTitle>
    <Paragraph position="0"> This definition is the simplest conceivable formalization of problems occurring in surface realization as a decision problem: It does not even require us to compute a single actual realization, just to check</Paragraph>
    <Paragraph position="2"> whether one exists. Every practical generation system generating from flat semantics will have to address this problem in one form or another.</Paragraph>
    <Paragraph position="3"> Now we show that this problem is NP-complete.</Paragraph>
    <Paragraph position="4"> A similar result was proved in the context of shake-and-bake generation by Brew (1992), but he needed to use the grammar in his encoding, which leaves the possibility open that for every single grammar G, there might be a realization algorithm tailored specifically to G which still runs in polynomial time.</Paragraph>
    <Paragraph position="5"> Our result is stronger in that we define a single grammar G ham whose realization problem is NP-complete in the above sense. Furthermore, we find that our proof brings out the sources of the complexity more clearly. G ham does not permit adjunction, hence the result also holds for context-free grammars with indices.</Paragraph>
    <Paragraph position="6">  It is clear that the problem is in NP: We can simply guess the elementary trees we need and how to combine them, and then check in polynomial time whether they verbalize the semantics. The NP-hardness proof is by reducing the well-known HAMILTONIAN-PATH problem to the realization problem. HAMILTONIAN-PATH is the problem of deciding whether a directed graph has a cycle that visits each node exactly once, e.g. (1,3,2,1) in the graph shown above.</Paragraph>
    <Paragraph position="7"> We will now construct an LTAG grammar G ham such that every graph G =(V,E) can be encoded as a semantic input S for the realization problem of G ham , which can be verbalized if and only if G has a Hamiltonian cycle. S is defined as follows:</Paragraph>
    <Paragraph position="9"> a Hamiltonian cycle.</Paragraph>
    <Paragraph position="10"> The grammar G ham is given in Fig. 1; the start symbol is B, and we want the root to have index 1. The tree a  models an edge transition from node i to the node k by consuming the semantic encodings of this edge and (by way of a substitution of a  )of the node i. The second substitution node of a  can be filled either by another a  , in which way a path through the graph is modelled, or by an a  , in which case we switch to an &amp;quot;edge eating mode&amp;quot;. In this mode, we can arbitrarily consume edges using a  , and close the tree with a  when we're done. This is illustrated in Fig. 2, the tree corresponding to the cycle in the example graph above. The Hamiltonian cycle of the graph, if one exists, is represented in the indices of the B nodes. The list of these indices is a path in the graph, as the a  trees model edge transitions; it is a cycle because it starts in 1 and ends in 1; and it visits each node exactly once, for we use exactly one a  tree for each node literal. The edges which weren't used in the cycle can be consumed in the edge eating mode. The main source for the combinatorics of the realization problem is thus the interaction of lexical ambiguity and the completely free order in the flat semantics. Once we have chosen between a  in the realization of each edge literal, we have determined which edges should be part of the prospective Hamiltonian cycle, and checking whether it really is one can be done in linear time. If, on the other hand, the order of the input placed restrictions on the structure of the derivation tree, we would again have information that told us when to switch into the edge eating mode, i.e. which edges should be part peter likes mary  of the cycle. A third source of combinatorics which does not become so clear in this encoding is the configuration of the elementary trees. Even when we have committed to the lexical entries, it is conceivable that only one particular way of plugging them into each other is grammatical.</Paragraph>
  </Section>
  <Section position="5" start_page="5" end_page="5" type="metho">
    <SectionTitle>
4 Topological Dependency Grammar
</SectionTitle>
    <Paragraph position="0"> These factors are exactly the same that make dependency parsing for free word order languages difficult, and it seems worthwhile to see whether optimized parsers for dependency grammars can also contribute to making generation efficient. We now sketch a dependency formalism which has an efficient parser and then discuss some of the important properties of this parser. In the next section, we will see how to employ the parser for generation.</Paragraph>
    <Section position="1" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
4.1 The Grammar Formalism
</SectionTitle>
      <Paragraph position="0"> The parse trees of topological dependency grammar (TDG) (Duchier and Debusmann, 2001; Duchier, 2002) are trees whose nodes correspond one-to-one to the words of the sentence, and whose edges are labelled, e.g. with syntactic relations (see Fig. 3). The trees are unordered, i.e. there is no intrinsic order among the children of a node. Word order in TDG is initially completely free, but there is a separate mechanism to specify constraints on linear precedence. Since completely free order is what we want for the realization problem, we do not need these mechanisms and do not go into them here.</Paragraph>
      <Paragraph position="1"> The lexicon assigns to each word a set of lexical entries; in a parse tree, one of these lexical entries has to be picked for each node. The lexical entry specifies what labels are allowed on the incoming edge (the node's labels) and the outgoing edges (the node's valency). Here are some examples:  The lexical entry for &amp;quot;likes&amp;quot; specifies that the corresponding node does not accept any incoming edges (and hence must be the root), must have precisely one subject and one object edge going out, and can have arbitrarily many outgoing edges with label adv (indicated by [?]). The nodes for &amp;quot;Peter&amp;quot; and &amp;quot;Mary&amp;quot; both require their incoming edge to be labelled with either subj or obj and neither require nor allow any outgoing edges.</Paragraph>
      <Paragraph position="2"> A well-formed dependency tree for an input sentence is simply a tree with the appropriate nodes, whose edges obey the labels and valency restrictions specified by the lexical entries. So, the tree in Fig. 3 is well-formed according to our lexicon.</Paragraph>
    </Section>
    <Section position="2" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
4.2 TDG Parsing
</SectionTitle>
      <Paragraph position="0"> The parsing problem of TDG can be seen as a search problem: For each node, we must choose a lexical entry and the correct mother-daughter relations it participates in. One strength of the TDG approach is that it is amenable to strong syntactic inferences that tackle specifically the three sources of complexity mentioned above.</Paragraph>
      <Paragraph position="1"> The parsing algorithm (Duchier, 2002) is stated in the framework of constraint programming (Koller and Niehren, 2000), a general approach to coping with combinatorial problems. Before it explores all choices that are possible in a certain state of the search tree (distribution), it first tries to eliminate some of the choices which definitely cannot lead to a solution by simple inferences (propagations). &amp;quot;Simple&amp;quot; means that propagations take only polynomial time; the combinatorics is in the distribution steps alone. That is, it can still happen that a search tree of exponential size has to be explored, but the time spent on propagation in each of its node is only polynomial. Strong propagation can reduce the size of the search tree, and it may even make the whole algorithm run in polynomial time in practice.</Paragraph>
      <Paragraph position="2"> The TDG parser translates the parsing problem into constraints over (variables denoting) finite sets of integers, as implemented efficiently in the Mozart programming system (Oz Development Team, 1999). This translation is complete: Solutions of the set constraint can be translated back to correct dependency trees. But for efficiency, the parser uses additional propagators tailored to the specific inferences of the dependency problem. For instance, in the &amp;quot;Peter likes Mary&amp;quot; example above, one such propagator could contribute the information that neither the &amp;quot;Peter&amp;quot; nor the &amp;quot;Mary&amp;quot; node can be an adv child of &amp;quot;likes&amp;quot;, because neither can accept an adv edge. Once the choice has been made that &amp;quot;Peter&amp;quot; is the subj child of &amp;quot;likes&amp;quot;, a propagator can contribute that &amp;quot;Mary&amp;quot; must be its obj child, as it is the only possible candidate for the (obligatory) obj child.</Paragraph>
      <Paragraph position="3"> Finally, lexical ambiguity is handled by selection constraints. These constraints restrict which lexical entry should be picked for a node. When all possible lexical entries have some information in common (e.g., that there must be an outgoing subj edge), this information is automatically lifted to the node and can be used by the other propagators. Thus it is sometimes even possible to finish parsing without committing to single lexical entries for some nodes.</Paragraph>
    </Section>
  </Section>
  <Section position="6" start_page="5" end_page="5" type="metho">
    <SectionTitle>
5 Generation as Dependency Parsing
</SectionTitle>
    <Paragraph position="0"> We will now show how TDG parsing can be used to enumerate all sentences expressing a given input semantics, thereby solving the realization problem introduced in Section 2. We first define the encoding.</Paragraph>
    <Paragraph position="1"> Then we give an example and discuss some runtime results. Finally, we consider a particular restriction of our encoding and ways of overcoming it.</Paragraph>
    <Section position="1" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
5.1 The Encoding
</SectionTitle>
      <Paragraph position="0"> Let G be a grammar as described in Section 2; i.e. lexical entries are of the form (ph, T), where ph is a flat semantics and T is a TAG elementary tree whose nodes are decorated with semantic indices. We make the following simplifying assumptions. First, we assume that the nodes of the elementary trees of G are not labelled with feature structures. Next, we assume that whenever we can adjoin an auxiliary tree at a node, we can adjoin arbitrarily many trees at this node. The idea of multiple adjunction is not new (Schabes and Shieber, 1994), but it is simplified here because we disregard complex adjunction constraints. We will discuss these two restrictions in the conclusion. Finally, we assume that every lexical semantics ph has precisely one member; this restriction will be lifted in Section 5.4.</Paragraph>
      <Paragraph position="1"> Now let's say we want to find the realizations of the input semantics S = {ph</Paragraph>
      <Paragraph position="3"> }, using the grammar G. The input &amp;quot;sentence&amp;quot; of the parsing start mary buy car indef red</Paragraph>
      <Paragraph position="5"> problem we construct is the sequence {start}[?]S, where start is a special start symbol. The parse tree will correspond very closely to a TAG derivation tree, its nodes standing for the instantiated elementary trees that are used in the derivation.</Paragraph>
      <Paragraph position="6"> To this end, we use two types of edge labels substitution and adjunction labels. An edge with a substitution label subst A,i,p from the node a to the node b (both of which stand for elementary trees) indicates that b should be plugged into the p-th substitution node in a that has label A and index i.We write subst(A) for the maximum number of occurrences of A as the label of substitution nodes in any elementary tree of G; this is the maximum value that p can take.</Paragraph>
      <Paragraph position="7"> An edge with an adjunction label adj A,i from a to b specifies that b is adjoined at some node within a carrying label A and index i and admitting adjunction. It does not matter for our purposes to which node in abis adjoined exactly; the choice cannot affect grammaticality because there is no feature unification involved.</Paragraph>
      <Paragraph position="8"> The dependency grammar encodes how an elementary tree can be used in a TAG derivation by restricting the labels of the incoming and outgoing edges via labels and valency requirements in the lexicon. Let's say that T is an elementary tree of G which has been matched with the input atom ph</Paragraph>
      <Paragraph position="10"> stantiating its index variables. Let A be the label and i the index of the root of T.IfT is an auxiliary tree, it accepts incoming adjunction edges for A and i, i.e. it gets the labels value {adj A,i }.IfT is an initial tree, it will accept arbitrary incoming substitution edges for A and i, i.e. its labels value is {subst A,i,p  |1 [?] p [?] subst(A)} In either case, T will require precisely one out-going substitution edge for each of its substitution nodes, and it will allow arbitrary numbers of outgoing adjunction edges for each node where we can adjoin. That is, the valency value is as follows: {subst A,i,p  |ex. substitution node N in T s.t. A is label, i is index of N, and N is pth substitution node for A:i in T} [?]{adj A,i [?]|ex. node with label A, index i in T which admits adjunction} We obtain the set of all lexicon entries for the</Paragraph>
      <Paragraph position="12"> as just specified. The start symbol, start, gets a special lexicon entry: Its labels entry is the empty set (i.e. it must be the root of the tree), and its valency entry is the set {subst S,k,1 }, where k is the semantic index with which generation should start.</Paragraph>
    </Section>
    <Section position="2" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
5.2 An Example
</SectionTitle>
      <Paragraph position="0"> Now let us go through an example to make these definitions a bit clearer. Let's say we want to verbalize the semantics {name(m, mary), buy(e,m,c), car(c), indef(c), red(c)} The LTAG grammar we use contains the elementary trees which are used in the tree in Fig. 5, along with the obvious semantics; we want to generate a sentence starting with the main event e. The encoding produces the following dependency grammar; the entries in the &amp;quot;atom&amp;quot; column are to be read as abbreviations of the actual atoms in the input semantics. null  If we parse the &amp;quot;sentence&amp;quot; start mary buy car indef red with this grammar, leaving the word order completely open, we obtain precisely one parse tree, shown in Fig. 4. Reading this parse as a TAG derivation tree, we can reconstruct the derived tree in Fig. 5, which indeed produces the string &amp;quot;Mary buys a red car&amp;quot;.</Paragraph>
    </Section>
    <Section position="3" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
5.3 Implementation and Experiments
</SectionTitle>
      <Paragraph position="0"> The overall realization algorithm we propose encodes the input problem as a DG parsing problem and then runs the parser described in Section 4.2, which is freely available over the Web, as a black box. Because the information lifted to the nodes by the selection constraints may be strong enough to compute the parse tree without ever committing to unique lexical entries, the complete parse may still contain some lexical ambiguity. This is no problem, however, because the absence of features guarantees that every combination of choices will be grammatical. Similarly, a node can have multiple children over adjunction edges with the same label, and there may be more than one node in the upper elementary tree to which the lower tree could be adjoined. Again, all remaining combinations are guaranteed to be grammatical.</Paragraph>
      <Paragraph position="1"> In order to get an idea of the performance of our realization algorithm in comparison to the state of the art, we have tried generating the following sentences, which are examples from (Carroll et al., 1999): (1) The manager in that office interviewed a new consultant from Germany.</Paragraph>
      <Paragraph position="2"> (2) Our manager organized an unusual additional weekly departmental conference.</Paragraph>
      <Paragraph position="3"> We have converted the XTAG grammar (XTAG Research Group, 2001) into our grammar format, automatically adding indices to the nodes of the elementary trees, removing features, simplifying adjunction constraints, and adding artificial lexical semantics that consists of the words at the lexical anchors and the indices used in the respective trees. XTAG typically assigns quite a few elementary trees to one lemma, and the same lexical semantics can often be verbalized by more than hundred elementary trees in the converted grammar. It turns out that the dependency parser scales very nicely to this degree of lexical ambiguity: The sentence (1) is generated in 470 milliseconds (as opposed to Carroll et al.'s 1.8 seconds), whereas we generate (2) in about 170 milliseconds (as opposed to 4.3 seconds).</Paragraph>
      <Paragraph position="4">  Although these numbers are by no means a serious evaluation of our system's performance, they do present a first proof of concept for our approach.</Paragraph>
      <Paragraph position="5"> The most encouraging aspect of these results is that despite the increased lexical ambiguity, the parser gets by without ever making any wrong choices, which means that it runs in polynomial time, on all examples we have tried. This is possible because on the one hand, the selection constraint automatically compresses the many different elementary trees that XTAG assigns to one lemma into very few classes. On the other hand, the propagation that rules out impossible edges is so strong that the free input order does not make the configuration problem much harder in practice. Finally, our treatment of modification allows us to multiply out the possible permutations in a postprocessing step, after the parser has done the hard work. A particularly striking example is (2), where the parser gives us a single solution, which multiplies out to 312 = 13 * 4! different realizations. (The 13 basic realizations correspond to different syntactic frames for the main verb in the XTAG grammar, e.g. for topicalized or passive constructions.)</Paragraph>
    </Section>
    <Section position="4" start_page="5" end_page="5" type="sub_section">
      <SectionTitle>
5.4 More Complex Semantics
</SectionTitle>
      <Paragraph position="0"> So far, we have only considered TAG grammars in which each elementary tree is assigned a semantics that contains precisely one atom. However, there are cases where an elementary tree either has an empty semantics, or a semantics that contains multiple atoms. The first case can be avoided by exploiting TAG's extended domain of locality, see e.g.</Paragraph>
      <Paragraph position="1"> (Gardent and Thater, 2001).</Paragraph>
      <Paragraph position="2"> The simplest possible way for dealing with the second case is to preprocess the input into several  A newer version of Carroll et al.'s system generates (1) in 420 milliseconds (Copestake, p.c.). Our times were measured on a 700 MHz Pentium-III PC.</Paragraph>
      <Paragraph position="3"> different parsing problems. In a first step, we collect all possible instantiations of LTAG lexical entries matching subsets of the semantics. Then we construct all partitions of the input semantics in which each block in the partition is covered by a lexical entry, and build a parsing problem in which each block is one symbol in the input to the parser.</Paragraph>
      <Paragraph position="4"> This seems to work quite well in practice, as there are usually not many possible partitions. In the worst case, however, this approach produces an exponential number of parsing problems. Indeed, using a variant of the grammar from Section 3, it is easy to show that the problem of deciding whether there is a partition whose parsing problem can be solved is NP-complete as well. An alternative approach is to push the partitioning process into the parser as well. We expect this will not hurt the runtime all that much, but the exact effect remains to be seen.</Paragraph>
    </Section>
  </Section>
  <Section position="7" start_page="5" end_page="5" type="metho">
    <SectionTitle>
6 Comparison to Other Approaches
</SectionTitle>
    <Paragraph position="0"> The perspective on realization that our system takes is quite different from previous approaches. In this section, we relate it to chart generation (Kay, 1996; Carroll et al., 1999) and to another constraint-based approach (Gardent and Thater, 2001).</Paragraph>
    <Paragraph position="1"> In chart based approaches to realization, the main idea is to minimize the necessary computation by reusing partial results that have been computed before. In the setting of fixed word order parsing, this brings an immense increase in efficiency. In generation, however, the NP-completeness manifests itself in charts of worst-case exponential size. In addition, it can happen that substructures are built which are not used in the final realization, especially when processing modifications.</Paragraph>
    <Paragraph position="2"> By contrast, our system configures nodes into a dependency tree. It solves a search problem, made up by choices for mother-daughter relations in the tree. Propagation, which runs in polynomial time, has access to global information (illustrated in Section 4.2) and can thus rule out impossible mother-daughter relations efficiently; every propagation step that takes place actually contributes to zooming in on the possible realizations. Our system can show exponential runtimes when the distributions span a search tree of exponential size.</Paragraph>
    <Paragraph position="3"> Gardent and Thater (2001) also propose a constraint based approach to generation working with a variant of TAG. However, the performance of their system decreases rapidly as the input gets larger even when when working with a toy grammar. The main difference between their approach and ours seems to be that their algorithm tries to construct a derived tree, while ours builds a derivation tree.</Paragraph>
    <Paragraph position="4"> Our parser only has to deal with information that is essential to solve the combinatorial problem, and not e.g. with the internal structure of the elementary trees. The reconstruction of the derived tree, which is cheap once the derivation tree has been computed, is delegated to a post-processing step. Working with derived trees, Gardent and Thater (2001) cannot ignore any information and have to keep track of the relationships between nodes at points where they are not relevant.</Paragraph>
  </Section>
class="xml-element"></Paper>