<?xml version="1.0" standalone="yes"?> <Paper uid="C92-2092"> <Title>LETTING THE CAT OUT OF THE BAG: GENERATION FOR SHAKE-AND-BAKE MT</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> LETTING THE CAT OUT OF THE BAG: GENERATION FOR SHAKE-AND-BAKE MT CHRIS BREW </SectionTitle> <Paragraph position="0"> Sharp Laboratories of Europe Ltd.</Paragraph> <Paragraph position="1"> Oxford Mon, May 11, 1992 chrisbr@prg.ac.ox.uk</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> This paper discusses an implementation of the sentence generation component of a Shake-and-Bake Machine Translation system. Since the task itself is NP-complete, and therefore almost certainly intractable, our algorithm is a heuristic method based on constraint propagation. We present preliminary evidence that this is likely to offer greater efficiency than previous algorithms.</Paragraph> <Paragraph position="1"> In SLE's approach to multilingual machine translation \[Whitelock, 1991, this conference\] we envisage the process of sentence generation as beginning from a multiset or bag of richly structured lexical signs rather than from a conventional logical form or other underlying structure. The translation equivalences are stated between sets of lexical signs, with the superstructure of non-terminal symbols being no more than the means by which monolingual grammars are implemented. The work described here was motivated by a desire to improve on a correct but inefficient algorithm provided by Whitelock \[Whitelock, 1991, this conference\]. 
We begin by introducing the problem, proceed by investigating its worst-case behaviour, and conclude by describing new algorithms for Shake-and-Bake generation.</Paragraph> <Paragraph position="2"> Since the linear order of the source language is not transferred into the bag, it is the business of the monolingual grammar writer to ensure that the word-order requirements of the target language are suitably encoded, and the business of the algorithm designer to ensure that this encoding is exploited as efficiently as possible. For an example of the grammar writer's responsibility, the difference between &quot;Mary likes Frances&quot; and &quot;Frances likes Mary&quot; can be encoded in the sharing of index variables between the proper nouns and the verb. For an example of the algorithm designer's responsibility, it would be a mistake (as Whitelock has noted) to provide a translation or generation algorithm which unintentionally unified the two index variables, leading to a reading in which &quot;Mary&quot; and &quot;Frances&quot; are alternative names for the same person.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2. Shake-and-Bake Generation 2.1 Complexity results 2.1.1 Specification </SectionTitle> <Paragraph position="0"> Shake-and-Bake generation has more in common with a parsing algorithm than with conventional generation from logical form or other underlying structure. The input to the task consists of the following elements: * A set (B) of lexical signs having cardinality |B|.</Paragraph> <Paragraph position="1"> * A grammar (G) against which to parse this input.</Paragraph> <Paragraph position="2"> and a solution to the problem consists of * A parse of any sequence (S) such that S contains all the elements of B.</Paragraph> <Paragraph position="3"> The unordered nature of B is the difference between Shake-and-Bake generation and conventional CFG parsing. 
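To make the word-order point concrete, here is a minimal sketch in which the bag is a multiset and the verb's shared index variables alone determine target-language order. The sign format, names and indices are invented for exposition and are not the paper's actual feature structures:

```python
from collections import Counter

# Hypothetical toy signs: (word, category, index info). The verb records
# the indices of its subject and object, which are shared with the nouns,
# so linear order is recoverable from the unordered bag alone.
mary    = ("Mary",    "np",   "i1")
frances = ("Frances", "np",   "i2")
likes   = ("likes",   "verb", ("i1", "i2"))  # (subject index, object index)

bag = Counter([mary, frances, likes])  # the input is a multiset, not a string

def linearise(bag):
    """Order the bag as subject-verb-object by following shared indices."""
    verb = next(s for s in bag if s[1] == "verb")
    subj_ix, obj_ix = verb[2]
    subj = next(s for s in bag if s[1] == "np" and s[2] == subj_ix)
    obj  = next(s for s in bag if s[1] == "np" and s[2] == obj_ix)
    return [subj[0], verb[0], obj[0]]

print(linearise(bag))  # ['Mary', 'likes', 'Frances']
```

Swapping the verb's index pair to ("i2", "i1") yields "Frances likes Mary" from the same nouns; conversely, a procedure that carelessly unified i1 with i2 would collapse the two readings, which is exactly the mistake noted above.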
Although we are really interested in more expressive grammar frameworks, it will for the moment be convenient to assume that G is a simple context-free grammar. Since it is always possible to re-implement a CFG in the more expressive formalisms, the Shake-and-Bake generation problem for these formalisms is certainly at least as hard as the equivalent problem for CFGs. 2.1.2 Upper bound. Simply stating the Shake-and-Bake problem in these terms yields a naive generation algorithm and a minor technical result. The algorithm, which we shall call generate-and-test, is simply to feed the results of permuting the input bag to a standard context-free parser. The minor technical result, which will be used to establish a complexity result in §2.1.4, is that Shake-and-Bake generation is in NP. Once we note that * Context-free parsing is a polynomial process.</Paragraph> <Paragraph position="4"> * The &quot;magical&quot; non-determinism which NP allows is enough to permute the input string using no more than polynomial time and space.</Paragraph> <Paragraph position="5"> \[ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992, p. 610; PROC. OF COLING-92, NANTES, AUG. 23-28, 1992\] it becomes obvious that Shake-and-Bake generation falls within the definition of NP given by Garey and Johnson \[Garey and Johnson, 1979, p. 32\]. This provides an upper bound on the complexity of Shake-and-Bake generation by showing it to be in NP (rather than being, for example, PSPACE-hard or worse). All that remains to be shown is whether it also satisfies the definition of NP-completeness given on p. 38 of the same work.</Paragraph> <Paragraph position="6"> 2.1.3 Lower bound. The purpose of this section is to establish a lower bound on the complexity of Shake-and-Bake generation. 
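The generate-and-test algorithm just described can be sketched as follows. The toy CNF grammar, lexicon and CKY recognizer are assumptions for illustration only; the permutation step is what costs |B|! in the worst case:

```python
from itertools import permutations

# Illustrative toy grammar in CNF: (B, C) -> A, plus a lexicon.
GRAMMAR = {
    ("np", "vp"): "s",
    ("v", "np"): "vp",
}
LEXICON = {"Mary": "np", "likes": "v", "Frances": "np"}

def cky_recognises(words, start="s"):
    """Polynomial-time CKY recognition of an ordered word sequence."""
    n = len(words)
    # chart[i][j] = set of categories spanning words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        if (b, c) in GRAMMAR:
                            chart[i][j].add(GRAMMAR[(b, c)])
    return start in chart[0][n]

def generate_and_test(bag):
    """Naive Shake-and-Bake generation: O(|B|!) orderings, each parsed."""
    return [list(p) for p in permutations(bag) if cky_recognises(list(p))]

# Without index constraints, both NP-V-NP orders survive the test.
print(generate_and_test(["likes", "Frances", "Mary"]))
```

The polynomial recognizer plus the "magical" nondeterministic choice of a permutation is exactly the membership-in-NP argument made above.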
We do this by demonstrating that Shake-and-Bake generation is equivalent to the problem which Garey and Johnson \[Garey and Johnson, 1979, pp. 50-53\] call THREE-DIMENSIONAL MATCHING, but which we prefer to refer to as the MENAGE A TROIS PROBLEM.</Paragraph> <Paragraph position="7"> This is a generalization to three dimensions of the well-known MARRIAGE PROBLEM. In the MARRIAGE PROBLEM the task is a constrained pairwise matching of elements from two disjoint sets, while in the MENAGE A TROIS PROBLEM the task is the construction of triples based on elements from three disjoint sets. While the original two-dimensional problem is soluble in polynomial time, the three-dimensional analogue is NP-complete. It is therefore of interest to demonstrate a reduction from MENAGE A TROIS to the Shake-and-Bake generation problem, since this serves to establish the complexity class of the latter problem.</Paragraph> <Paragraph position="8"> 2.1.4 The MENAGE A TROIS in the bag. The MENAGE A TROIS problem involves three sets A, B, C of identical cardinality n, having elements which we shall refer to as a1...an, b1...bn and c1...cn, along with a set M of constraints, each of which is a triple representing a mutually acceptable ménage à trois. The overall goal is to find a set of three-way marriages selected from M such that every member of A, B and C participates in exactly one triple. Garey and Johnson provide a proof, after Karp, that the MENAGE A TROIS problem is equivalent to the standard problem of 3SAT. We now provide a polynomial-time reduction from an arbitrary instance of MENAGE A TROIS to an instance of Shake-and-Bake generation, which allows the same conclusion to be drawn for this problem.</Paragraph> <Paragraph position="2"> We start by forming an input string S containing all the elements of the three sets A, B, C, in any order. We then construct a context-free grammar G from M, such that each constraint of the form {ai,bj,ck} corresponds to a distinct ternary production in G, with the form x --> ai, bj, ck.</Paragraph> <Paragraph position="3"> To complete the grammar we need a final production of the form s --> x, x, ..., x (with n occurrences of x). The role of this production is to ensure that a parse can be achieved if and only if there is a way of covering the input string with constraints. The construction of the grammar and the input string is clearly a polynomial process. Context-free parsing has the property that a leaf node of the input string can only be directly dominated by one node of the final analysis tree, and by the definition of Shake-and-Bake generation given above the Shake-and-Bake process for G and S must succeed if and only if G admits, under the standard node-admissibility interpretation of context-free grammars, a string S1 which is a permutation of S. By combining the preparation described above with Shake-and-Bake generation, we obtain a solution of MENAGE A TROIS. Taken together with the result from §2.1.2, this constitutes a demonstration that Shake-and-Bake generation is NP-complete.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 2.2 Conclusion </SectionTitle> <Paragraph position="0"> It is highly unlikely that we will be able to find algorithms for Shake-and-Bake generation which are efficient in the general case: while it might conceivably turn out that NP-complete problems are, after all, soluble in polynomial time, they must for the moment be assumed intractable. We therefore proceed to the discussion of algorithms which are exponential in the worst case, but which do not necessarily exhibit the exponential behaviour unless the grammar is extremely unusual.</Paragraph> <Paragraph position="1"> 3. Improved Generation algorithms. It may not be possible to find algorithms which come with a useful theoretical upper bound on run-time cost, but it is still worth looking for ones which will provide acceptable behaviour for realistic inputs. This makes the assessment of such algorithms an empirical matter.</Paragraph> <Paragraph position="4"> 3.1 Whitelock's algorithm. Whitelock's algorithm is a generalisation of Shift-Reduce parsing. It is an improvement on the naive generate-and-test outlined above, but exhibits exponential behaviour even on the sort of inputs which our MT system is likely to encounter. A case in point is found in the analysis of English adjectives. We shall be using the phrase &quot;The fierce little brown cat&quot; as our main example. Figure 1 lists the questionable variants: &quot;The fierce brown little cat&quot;, &quot;The brown fierce little cat&quot;, &quot;The brown little fierce cat&quot;, &quot;The little brown fierce cat&quot;. For the sake of argument suppose that we need to rule out the questionable versions of the phrase in Figure 1. It is not clear that these phrases are completely ungrammatical, but they serve the present purpose of providing an illustration, and most English speakers agree that they would only be produced in highly unusual circumstances.</Paragraph> <Paragraph position="5"> In order to cover this data in a unification grammar, we adopt the encoding shown in Figure 2. This states that &quot;fierce&quot; must precede &quot;little&quot; or &quot;brown&quot; if either of these is present, and that &quot;little&quot; must precede &quot;brown&quot; if both are present. (The type assignments are based on the systematic ...) This set of type assignments prevents the dubious phrases listed in Figure 1, but still allows syntactically acceptable phrases such as &quot;The fierce cat&quot;, &quot;The little cat&quot; and &quot;The little brown cat&quot;. 
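Figure 2 itself (the unification-type encoding) is not reproduced in this text, so the following stand-in merely simulates its effect with numeric precedence ranks; the ranks and the function name are invented for illustration and are not the paper's encoding:

```python
# Assumed precedence fierce < little < brown, mirroring the ordering
# constraints stated for Figure 2 (not the paper's type assignments).
RANK = {"fierce": 0, "little": 1, "brown": 2}

def order_ok(adjectives):
    """True iff the adjectives appear in an allowed relative order."""
    ranks = [RANK[a] for a in adjectives]
    return ranks == sorted(ranks)

assert order_ok(["fierce", "little", "brown"])      # the fierce little brown cat
assert order_ok(["little", "brown"])                # the little brown cat
assert not order_ok(["fierce", "brown", "little"])  # a dubious Figure 1 order
```

Any subsequence of the full order is accepted, which is why the shorter phrases above remain grammatical while every Figure 1 variant is rejected.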
In principle, this means that generation from a bag produced by analysis of &quot;La petite chatte féroce et brune&quot; will eventually yield the correct outcome. Unfortunately, for phrases like this one Whitelock's algorithm displays spectacular inefficiency.</Paragraph> <Paragraph position="6"> For example, the algorithm will construct the intermediate phrases shown in Figure 3 (&quot;The fierce brown cat&quot;, &quot;The fierce cat&quot;, &quot;The brown cat&quot;, &quot;The little cat&quot;, &quot;The cat&quot;), all of which eventually lead to an impasse because it is impossible to incorporate the remaining adjectives while respecting the prescribed ordering. The reason for this behaviour is that Whitelock's algorithm makes reductions on the basis of mere possibility, rather than taking account of the fact that all elements of the input bag must eventually be consumed.</Paragraph> <Paragraph position="7"> 3.2 Constraint propagation. We are looking for a global property of the input bag which can be exploited to prune the search space involved in the generation process, and we wish to exploit the completeness property which Whitelock's algorithm neglects. Van Benthem's \[1986\] observation that categorial grammars display a count invariant, while promising, cannot be directly applied to unification-based grammars. As an alternative we develop an approach to Shake-and-Bake generation in which the basic generator is augmented with a simple constraint propagation algorithm \[Waltz, 1972\]. The augmented generator is potentially more efficient than Whitelock's, since the constraint propagation component helps to direct the generator's search for solutions.</Paragraph> <Paragraph position="8"> Our new algorithm relies on the ability to break a bag of signs into its component basic signs, and arranges these signs according to their nesting level. 
Nesting level is defined to be zero for the functor of a categorial sign, one for the functors of its direct arguments, two for the functors of any arguments which form part of these arguments, and so on. Thus the category a/(b|c)/d has an a with nesting level 0, a b and a d with nesting level 1, and a c with nesting level 2. We organize the basic signs of the input bag into a graph in which two nodes are linked if and only if * Their nesting levels differ by exactly one.</Paragraph> <Paragraph position="9"> * They arise from different lexical items.</Paragraph> <Paragraph position="10"> These are necessary but not sufficient conditions for two basic signs to undergo unification in the course of a completed derivation.</Paragraph> <Paragraph position="11"> In the example of the fierce brown cat we obtain the connections listed in Figure 4 and the graph shown in Figure 5. It simplifies the algorithm to hallucinate a dummy node corresponding to the &quot;inverse&quot; of the target category of the derivation; this is node 0. The node numbers shown in Figure 5 ff. correspond to those listed in Figure 4. The structure is a directed graph, in which elements are linked if and only if they may stand in a functor/argument relationship.</Paragraph> <Paragraph position="12"> The results of doing this are shown in Figure 6. The task of parsing is reinterpreted as a search for a particular sort of spanning tree for the graph. Our new algorithm is an interleaving of Whitelock's shift-reduce parsing algorithm with a constraint propagation component designed to facilitate early detection of situations in which no suitable spanning tree can be built. This helps to prune the search space, reducing the amount of unnecessary work carried out during generation.</Paragraph> </Section> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 
</SectionTitle> <Paragraph position="0"> We know that each node of the structure must participate in exactly one functor/argument relationship, but in order to distinguish between those elements which may stand in such a relationship and those which actually form part of a complete solution, it is necessary to propagate constraints through the graph. In order to do this it is convenient to add construction lines linking signs in functor position to the corresponding signs which occur in their argument positions.</Paragraph> <Paragraph position="1"> In Figure 6 we can immediately see that node 3 must be connected to node 2, since there are no other links leading away from node 3. Similarly the link from 9 to 8 must be present in any spanning tree, since there is no other way of reaching node 8. Node 1 must be connected to node 0 for analogous reasons.</Paragraph> <Paragraph position="2"> Once these links have been established, we can delete alternative links which they preclude.</Paragraph> <Paragraph position="3"> This results in the deletion of the lines from node 9 to nodes 6, 4 and 2, and that of the line from 7 to 2. This produces Figure 7. The resulting system can once again be simplified by deleting the line from ... of the phrase in question. In this example the constraints encoded in the graph are sufficient to drive the analysis to a unique conclusion, without further search, but this will not always happen.</Paragraph> <Paragraph position="4"> We need a combination of constraint propagation with a facility for making (reasonably intelligent) guesses when confronted with a choice of alternatives. This is described in the next section.</Paragraph> <Paragraph position="5"> 4.2 The code. We combine the constraint propagation mechanism with Whitelock's original shift-reduce parser, propagating constraints after every reduction step. 
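The forced-link reasoning above can be sketched as a small fixpoint loop. The graph encoding (node mapped to a set of candidate partners) is an assumption, and the node numbers below are only a toy fragment, not the full graph of Figures 4-7:

```python
def propagate(links):
    """Commit every node that has exactly one candidate partner, deleting
    the alternative links each commitment precludes, until nothing changes.
    Returns the committed pairing, or None when some node is left with no
    candidates (no suitable spanning tree can be completed)."""
    links = {n: set(ps) for n, ps in links.items()}
    committed = {}
    changed = True
    while changed:
        changed = False
        for n, ps in links.items():
            if n in committed:
                continue
            if not ps:
                return None              # early detection of a dead end
            if len(ps) == 1:
                p = next(iter(ps))       # forced link, as with 3-2 and 9-8
                committed[n] = p
                committed[p] = n
                for m in links:          # delete precluded alternatives
                    if m not in (n, p):
                        links[m].discard(n)
                        links[m].discard(p)
                links[p] = {n}
                changed = True
    return committed

# A fragment in the spirit of Figure 6: node 3 links only to 2 and node 8
# only to 9, so both links are forced and 9's other candidate disappears.
print(propagate({3: {2}, 2: {3, 9}, 9: {8, 2}, 8: {9}}))
```

In the full algorithm this loop would run after every proposed reduction, and a None result is exactly the failure signal that prunes the shift-reduce search.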
The parser has the role of systematically choosing between alternative reductions, while the constraint propagation mechanism fills in the consequences of a particular set of choices.</Paragraph> <Paragraph position="6"> Listing 1 provides a schematic Prolog implementation of the algorithm described in this section. The code is essentially that of a shift-reduce parser, with the following modifications: * One of the elements in a reduction is taken from the top of the stack, while the other is taken from anywhere in the tail of the stack. This idea, due to Whitelock and Reape, ensures that the input is treated as a bag rather than a string.</Paragraph> <Paragraph position="7"> * At initialization a constraint graph is constructed. Every time a reduction is proposed the constraint propagation component is informed, allowing it to (reversibly) update the graph by propagating constraints. Constraint propagation may fail if the constraint mechanism is able to show that there will be no way of completing a suitable spanning tree given the choices which have been made by the shift-reduce component.</Paragraph> <Paragraph position="8"> In this algorithm it is the role of the shift-reduce component to make guesses, and the role of the constraint solver to follow through the consequences of these guesses. In the limit this will clearly reduce to an inefficient implementation of exhaustive search, but this should not be a surprise given the NP-completeness of the task.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.3 Results </SectionTitle> <Paragraph position="0"> We have conducted an experiment to show the relative performance of the two algorithms.</Paragraph> <Paragraph position="1"> Figure 8 shows the number of reductions which were carried out by each algorithm in dealing with a range of sentences about fierce cats and tame foxes. (The talk of cats and bags is because we are trying to get a CATegory out of a BAG.) 
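Listing 1 itself is Prolog and is not reproduced in this text; the following Python skeleton is only a reconstruction of the shift-reduce component it describes, in which the stack top may reduce with any element of the tail so that the input behaves as a bag. The rule table and category names are invented for illustration, and the constraint-propagation hook is omitted:

```python
def generate(bag, reduce_fn, goal):
    """Depth-first search over shift and reduce moves on an unordered bag."""
    def search(stack, remaining):
        if not remaining and stack == [goal]:
            yield goal                       # complete derivation found
        if stack:
            top, tail = stack[0], stack[1:]
            # Reduce: combine the top with ANY element of the tail
            # (Whitelock and Reape's bag treatment of the stack).
            for i, other in enumerate(tail):
                combined = reduce_fn(other, top)
                if combined is not None:
                    yield from search([combined] + tail[:i] + tail[i + 1:],
                                      remaining)
        # Shift: any element of the bag may be shifted next.
        for i, sign in enumerate(remaining):
            yield from search([sign] + stack,
                              remaining[:i] + remaining[i + 1:])
    return list(search([], list(bag)))

# Toy reductions (purely illustrative): "a" and "b" combine to "ab",
# which combines with "c" to give the goal category "s".
RULES = {("a", "b"): "ab", ("ab", "c"): "s"}

def toy_reduce(x, y):
    return RULES.get((x, y)) or RULES.get((y, x))

print(generate(["c", "a", "b"], toy_reduce, "s"))
```

In the full algorithm each successful reduction would also notify the constraint propagation component, whose failure vetoes the move and cuts off the branch before further guessing.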
The constraint propagation algorithm attempts substantially fewer reductions than the original in all cases, with an increasing performance advantage for longer sentences. This remains true even when real-time measurements are used, although the difference is less marked because of the overhead of the constraint propagation algorithm.</Paragraph> <Paragraph position="2"> These preliminary results must obviously be interpreted with some caution, since the examples were specially constructed. Further work is in hand to test the performance of the algorithms on larger grammars and more realistic sentences.</Paragraph> <Paragraph position="3"> Because the problem is NP-complete, it is most unlikely that there is an algorithm which will prove efficient in all cases, but the algorithm described here already provides worthwhile improvements in practice, and there is considerable scope for further improvement. For example, for grammars related to HPSG it seems probable that considerable benefit would be gained from adding a constraint propagation component to an unordered version of a head-corner parsing algorithm, as described by Van Noord \[Van Noord, 1991\]. Alternatively, it may be that constraint graphs, like the LR parsing tables described by Briscoe and Carroll \[Briscoe and Carroll, 1991\], are suitable locations for the storage of probabilistic information derived from the analysis of corpora.</Paragraph> </Section> </Section> </Paper>