<?xml version="1.0" standalone="yes"?> <Paper uid="P95-1035"> <Title>An Efficient Generation Algorithm for Lexicalist MT</Title> <Section position="2" start_page="0" end_page="261" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> Lexicalist approaches to MT, particularly those incorporating the technique of Shake-and-Bake generation (Beaven, 1992a; Beaven, 1992b; Whitelock, 1994), combine the linguistic advantages of transfer (Arnold et al., 1988; Allegranza et al., 1991) and interlingual (Nirenburg et al., 1992; Dorr, 1993) approaches. Unfortunately, the generation algorithms described to date have been intractable. In this paper, we describe an alternative generation component which has polynomial time complexity.</Paragraph> <Paragraph position="1"> Shake-and-Bake translation assumes a source grammar, a target grammar and a bilingual dictionary which relates translationally equivalent sets of lexical signs, carrying across the semantic dependencies established by the source language analysis stage into the target language generation stage.</Paragraph> <Paragraph position="2"> The translation process consists of three phases: 1. A parsing phase, which outputs a multiset, or bag, of source language signs instantiated with sufficiently rich linguistic information established by the parse to ensure adequate translations. 2. A lexical-semantic transfer phase which employs the bilingual dictionary to map the bag of instantiated source signs onto a bag of target language signs. *We wish to thank our colleagues Kerima Benkerimi, David Elworthy, Peter Gibbins, Ian Johnson, Andrew Kay and Antonio Sanfilippo at SLE, and our anonymous reviewers for useful feedback and discussions on the research reported here and on earlier drafts of this paper.</Paragraph> <Paragraph position="3"> 3. A generation phase which imposes an order on the bag of target signs which is guaranteed grammatical according to the monolingual target grammar. 
This ordering must respect the linguistic constraints which have been transferred into the target signs.</Paragraph> <Paragraph position="4"> The Shake-and-Bake generation algorithm of (Whitelock, 1992) combines target language signs using the technique known as generate-and-test. In effect, an arbitrary permutation of signs is input to a shift-reduce parser which tests them for grammatical well-formedness. If they are well-formed, the system halts indicating success. If not, another permutation is tried and the process repeated. The complexity of this algorithm is O(n!) because all permutations (n! for an input of size n) may have to be explored to find the correct answer, and indeed must be explored in order to verify that there is no answer.</Paragraph> <Paragraph position="5"> Proponents of the Shake-and-Bake approach have employed various techniques to improve generation efficiency. For example, (Beaven, 1992a) employs a chart to avoid recalculating the same combinations of signs more than once during testing, and (Popowich, 1994) proposes a more general technique for storing which rule applications have been attempted; (Brew, 1992) avoids certain pathological cases by employing global constraints on the solution space; researchers such as (Brown et al., 1990) and (Chen and Lee, 1994) provide a system for bag generation that is heuristically guided by probabilities. However, none of these approaches is guaranteed to avoid protracted search times if an exact answer is required, because bag generation is NP-complete (Brew, 1992).</Paragraph> <Paragraph position="6"> Our novel generation algorithm has polynomial complexity (O(n^4)). The reduction in theoretical complexity is achieved by placing constraints on the power of the target grammar when operating on instantiated signs, and by using a more restrictive data structure than a bag, which we call a target language normalised commutative bracketing (TNCB). 
A TNCB records dominance information from derivations and is amenable to incremental updates. This allows us to employ a greedy algorithm to refine the structure progressively until either a target constituent is found and generation has succeeded or no more changes can be made and generation has failed.</Paragraph> <Paragraph position="7"> In the following sections, we will sketch the basic algorithm, consider how to provide it with an initial guess, and provide an informal proof of its efficiency.</Paragraph> </Section> </Paper>
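<!-- Illustrative sketch, not part of the original paper. The generate-and-test procedure described in the introduction can be pictured roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of (Whitelock, 1992): the toy lexicon, the is_well_formed check (a stand-in for the shift-reduce parser) and all function names are hypothetical.

```python
from itertools import permutations

# Hypothetical toy lexicon mapping each word to a category (an assumption).
TOY_LEXICON = {"the": "Det", "brown": "Adj", "dog": "N", "barks": "V"}


def is_well_formed(words):
    """Toy stand-in for the parser test: accept the pattern Det Adj* N V."""
    cats = [TOY_LEXICON[w] for w in words]
    if len(cats) < 3:
        return False
    if cats[0] != "Det" or cats[-2] != "N" or cats[-1] != "V":
        return False
    # Everything between the determiner and the noun must be an adjective.
    return all(c == "Adj" for c in cats[1:-2])


def generate_and_test(bag, test=is_well_formed):
    """Try orderings of the bag until one passes the test.

    Returns (ordering, tried), where ordering is None if no permutation
    is accepted and tried counts the permutations examined.
    """
    tried = 0
    for candidate in permutations(bag):
        tried += 1
        if test(candidate):
            return list(candidate), tried
    return None, tried


if __name__ == "__main__":
    print(generate_and_test(["barks", "dog", "the", "brown"]))
    print(generate_and_test(["dog", "barks", "brown"]))
```

When no ordering is acceptable, the loop has to visit every one of the n! permutations before it can report failure, which is the worst-case cost the introduction attributes to generate-and-test. -->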