<?xml version="1.0" standalone="yes"?> <Paper uid="H05-1101"> <Title>Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 803-810, Vancouver, October 2005. ©2005 Association for Computational Linguistics. Some Computational Complexity Results for Synchronous Context-Free Grammars</Title> <Section position="3" start_page="803" end_page="804" type="metho"> <SectionTitle> 2 Synchronous context-free grammars </SectionTitle> <Paragraph position="0"> Several definitions for synchronous context-free grammars have been proposed in the literature; see for instance (Chiang, 2004; Chiang, 2005). Our definition is based on syntax-directed translation schemata (SDTS; Aho and Ullman, 1972), with the difference that we do not impose the restriction that two paired context-free productions have the same left-hand side. As will be discussed in Section 4, this results in an enriched generative capacity when probabilistic extensions are considered. We assume the reader is familiar with the definition of context-free grammar (CFG) and with the associated notion of derivation.</Paragraph> <Paragraph position="1"> Let V_N and V_T be sets of nonterminal and terminal symbols, respectively. In what follows we need to represent bijections between all the occurrences of nonterminals in two strings over V_N ∪ V_T. This can be done by annotating nonterminals with indices from an infinite set. We define I(V_N) = {A^(t) | A ∈ V_N, t ∈ ℕ} and V_I = I(V_N) ∪ V_T. We write index(γ), γ ∈ V_I^*, to denote the set of all indices (the integers t) that appear in symbols in γ.</Paragraph> <Paragraph position="2"> Two strings γ, γ′ ∈ V_I^* are synchronous if each index in index(γ) occurs only once in γ, each index in index(γ′) occurs only once in γ′, and index(γ) = index(γ′). 
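As an illustration that is not part of the original paper, the definition of synchronous strings can be sketched in Python. The representation is an assumption made only for this example: a terminal is a plain string, an indexed nonterminal A^(t) is a tuple (A, t), and the helper names index_list and synchronous are hypothetical.

```python
def index_list(gamma):
    """Return the indices of the indexed nonterminals in gamma, in order.

    Assumed representation: a terminal is a str, an indexed
    nonterminal A^(t) is a tuple (A, t).
    """
    return [sym[1] for sym in gamma if isinstance(sym, tuple)]


def synchronous(gamma1, gamma2):
    """True if no index repeats inside either string and both index sets agree."""
    idx1 = index_list(gamma1)
    idx2 = index_list(gamma2)
    unique1 = len(set(idx1)) == len(idx1)
    unique2 = len(set(idx2)) == len(idx2)
    return unique1 and unique2 and set(idx1) == set(idx2)


# Toy strings (hypothetical, not taken from the paper's example grammar).
left = [("A", 1), "a", ("B", 2)]
right = ["b", ("B", 2), ("A", 1)]
print(synchronous(left, right))                    # same indices, reordered
print(synchronous(left, [("A", 1), ("B", 3)]))     # index sets differ
print(synchronous([("A", 1), ("B", 1)], [("A", 1)]))  # index 1 repeats
```

The second string may list the shared indices in any order, which is how a pair of synchronous strings encodes a permutation of the nonterminals.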
Therefore synchronous strings have the general form</Paragraph> <Paragraph position="4"> I(V_N), t_i ≠ t_j for i ≠ j and π is some permutation defined on the set {1,...,r}.</Paragraph> <Paragraph position="5"> Definition 1 A synchronous context-free grammar (SCFG) is a tuple G = (V_N, V_T, P, S), where V_N, V_T are finite, disjoint sets of nonterminal and terminal symbols, respectively, S ∈ V_N is the start symbol and P is a finite set of synchronous productions, each of the form [A_1 → α_1, A_2 → α_2], with A_1, A_2 ∈ V_N and α_1, α_2 ∈ V_I^* synchronous strings.</Paragraph> <Paragraph position="6"> The size of an SCFG G is defined as |G| = ∑_{[A_1 → α_1, A_2 → α_2] ∈ P} |A_1 α_1 A_2 α_2|. Based on an example from (Yamada and Knight, 2001), we provide a sample SCFG fragment translating from English to Japanese, specified by means of the following synchronous productions:</Paragraph> <Paragraph position="8"> Note that in production s_2 above, the nonterminals VB and TO generated from nonterminal VB2 in the English component are inverted in the Japanese component, where some additional lexical material is also added.</Paragraph> <Paragraph position="9"> In an SCFG, the 'derives' relation is defined on synchronous strings in terms of simultaneous rewriting of two nonterminals with the same index. Some additional notation will help us define this relation precisely. A reindexing is a one-to-one function on ℕ. We extend a reindexing f to V_I by letting f(A^(t)) = A^(f(t)) for A^(t) ∈ I(V_N) and f(a) = a for a ∈ V_T. We also extend f to strings in V_I^* by letting f(ε) = ε and f(Xγ) = f(X)f(γ), for each X ∈ V_I and γ ∈ V_I^*. We say that strings γ_1, γ_2 ∈ V_I^* are independent if index(γ_1) ∩ index(γ_2) = ∅. Definition 2 Let G = (V_N, V_T, P, S) be an SCFG and let γ_1, γ_2 be synchronous strings in V_I^*. 
The derives relation [γ_1, γ_2] ⇒_G [δ_1, δ_2] holds whenever there exist an index t in index(γ_1), a synchronous production [A_1 → α_1, A_2 → α_2] in P and some reindexing f such that (i) f(α_1 α_2) and γ_1 γ_2 are independent; and (ii) γ_i = γ′_i A_i^(t) γ″_i, δ_i = γ′_i f(α_i) γ″_i, for i = 1, 2. We also write [γ_1, γ_2] ⇒^s_G [δ_1, δ_2] to explicitly indicate that the derives relation holds through some synchronous production s ∈ P.</Paragraph> <Paragraph position="10"> Since δ_1 and δ_2 in Definition 2 are synchronous strings, we can define the reflexive and transitive closure of ⇒_G, written ⇒^*_G. This relation is used to represent derivations in G. In case we have [γ_{1,i−1}, γ_{2,i−1}] ⇒^{s_i}_G [γ_{1,i}, γ_{2,i}] for 1 ≤ i ≤ n, n ≥ 1, we also write [γ_{1,0}, γ_{2,0}] ⇒^σ_G [γ_{1,n}, γ_{2,n}], where σ = s_1 s_2 ··· s_n. We always assume some canonical form for derivations (for instance, left-most derivation on the left component). As in the case of context-free grammars, each derivation in G can be associated with a pair of parse trees, that is, one parse tree for each dimension.</Paragraph> <Paragraph position="11"> Back to our example, we report a fragment of a derivation of the string pair [he adores listening to music, kare ha ongaku wo kiku no ga daisuki desu]:</Paragraph> <Paragraph position="13"> kare ha TO^(6) VB^(5) ga daisuki desu].</Paragraph> <Paragraph position="14"> The translation generated by an SCFG G is a binary relation over V_T^* defined as</Paragraph> <Paragraph position="16"> The set of strings that are translations of a given string w_1 is defined as:</Paragraph> <Paragraph position="18"> A probabilistic SCFG (PSCFG) is a pair (G, p_G) where G = (V_N, V_T, P, S) is an SCFG and p_G is a function from P to real numbers in [0,1] such that,</Paragraph> <Paragraph position="20"/> </Section> <Section position="4" start_page="804" end_page="807" type="metho"> <SectionTitle> 3 The membership problem </SectionTitle> <Paragraph position="0"> We consider here the membership problem for 
SCFGs, defined as follows: given as input an SCFG G and a pair [w_1, w_2], decide whether [w_1, w_2] is in T(G). This problem has been considered for instance in (Wu, 1997) for his inversion transduction grammars and has applications in the support of several tasks of automatic annotation of parallel corpora, such as segmentation, bracketing, phrasal and word alignment. We show that the membership problem for SCFGs is NP-hard. The result could be derived from the findings in (Melamed et al., 2004) that synchronous rewriting systems such as SCFGs are related to the class of so-called linear context-free rewriting systems (LCFRSs) and from the result that the membership problem for LCFRSs is NP-hard (Satta, 1992; Kaji and others, 1994). However, we provide here a direct proof, to simplify the presentation.</Paragraph> <Paragraph position="1"> Theorem 1 The membership problem for SCFGs is NP-hard.</Paragraph> <Paragraph position="2"> Proof. We reduce from the three-satisfiability problem (3SAT, Garey and Johnson, 1979). Let ⟨U, C⟩ be an instance of the 3SAT problem, where U = {u_1, ..., u_p} is a set of variables and C = {c_1, ..., c_n} is a set of clauses. Each clause is a set of three literals from {u_1, ū_1, ..., u_p, ū_p}.</Paragraph> <Paragraph position="3"> The general idea of the proof is to use a string pair [w_1 w_2 ··· w_p, w_c], where w_c is a string representation of C and each w_i is a string controlling the truth assignment for the variable u_i. We then construct an SCFG G such that each w_i can be derived in two possible ways only, using some specialized productions of G, encoding the truth assignment of variable u_i. In this way the derivation of the whole string w_1 ··· w_p in the left dimension corresponds to a guess of a truth assignment for U. Accordingly, on the right dimension only those symbols of w_c will be derived that represent clauses that hold true under the guessed assignment.</Paragraph> <Paragraph position="4"> We need some additional notation. 
Below we treat C as an alphabet of atomic symbols. We use a function d such that, for every i with 1 ≤ i ≤ p, c_{d(i,1)}, c_{d(i,2)}, ..., c_{d(i,s_i)} is the sequence of all clauses that include literal u_i, in the left-to-right order in which they appear within c_1 c_2 ··· c_n, and c_{d(i,s_i+1)}, c_{d(i,s_i+2)}, ..., c_{d(i,t_i)} is the sequence of all clauses that include literal ū_i, again as they appear within c_1 c_2 ··· c_n from left to right. Note that we must have ∑_{i=1}^{p} t_i = 3n. We also use a function e such that, for every 1 ≤ i ≤ p and 1 ≤ j ≤ t_i, e(i,j) = j + ∑_{k=1}^{i−1} t_k (assume ∑_{k=1}^{0} t_k = 0). Consider the alphabet {a_i, b_i | 1 ≤ i ≤ p}. For every i, 1 ≤ i ≤ p, let w_i denote a sequence of exactly t_i + 1 alternating symbols a_i and b_i, i.e., w_i ∈ (a_i b_i)^+ ∪ (a_i b_i)^* a_i. For every 1 ≤ i ≤ p, let x(i,1) = a_i b_i and let x(i,h) = a_i (resp. b_i) if h is even (resp. odd), 2 ≤ h ≤ t_i. Let also x̄(i,h) = a_i (resp. b_i) if h is odd (resp.</Paragraph> <Paragraph position="5"> even), 1 ≤ h ≤ t_i − 1, and let x̄(i,t_i) = a_i b_i (resp. b_i a_i) if t_i is odd (resp. even). Therefore we can write w_i = x(i,1) x(i,2) ··· x(i,t_i) = x̄(i,1) x̄(i,2) ··· x̄(i,t_i).</Paragraph> <Paragraph position="6"> Finally, we need a permutation π defined on the set {1, ..., 3n} as follows. Fix i and j with 1 ≤ i ≤ p and 1 ≤ j ≤ t_i, and let h be the number of occurrences of the clause c_{d(i,j)} found in the sequence c_{d(1,1)}, c_{d(1,2)}, ..., c_{d(1,t_1)}, c_{d(2,1)}, ..., c_{d(i,j)}. Note that we must have 1 ≤ h ≤ 3. Then we set</Paragraph> <Paragraph position="8"> We can now define the target instance ⟨G, [w, w′]⟩ of our reduction. Let [w, w′] =</Paragraph> <Paragraph position="10"> below define set P: (i) for every 1 ≤ i ≤ p:</Paragraph> <Paragraph position="12"> It is easy to see that |G|, |w| and |w′| are polynomially related to |U| and |C|. From a derivation of [w, w′] ∈ 
T(G), we can exhibit a truth assignment that satisfies C simply by reading off the derivation of the left string w_1 w_2 ··· w_p. Conversely, starting from a truth assignment that satisfies C we can prove [w, w′] ∈ T(G) by means of (finite) induction on |U|: this part requires a careful inspection of all items in the definition of G.</Paragraph> <Paragraph position="13"> From Theorem 1 we may conclude that algorithms for the membership problem for SCFGs are very unlikely to run in polynomial time. In the literature, several algorithms for this problem have been proposed using tabular methods (chart parsing). In the worst case, all these algorithms run in time Θ(|G| · n^{k(G)}), with G an SCFG and n the length of the input string pair. We know that, unless P = NP, k(G) cannot be a constant. We now prove a lower bound on k(G), thereby providing an exponential time lower bound for our problem under the assumption of the tabular paradigm.</Paragraph> <Paragraph position="14"> Tabular methods for the membership problem are based on the following representation. Given a synchronous production</Paragraph> <Paragraph position="16"> the already recognized constituent pairs B_{1i}, B_{2π(i)} are gathered together in several steps, keeping a record of the spanned substrings of the input. To provide a concrete example, if we gather all the B_{1i}'s on the left dimension from left to right, the partial analysis we obtain after the first step can be represented as a state ⟨s^(1), (i_11, j_11), (i_21, j_21)⟩, meaning that B_11 and B_{2π(1)} span substrings w_1[i_11, j_11] and w_2[i_21, j_21], respectively. At the second step we have a state ⟨s^(2), (i_11, j_12), (i_21, j_21),</Paragraph> <Paragraph position="18"> spans w_2[i_22, j_22]. 
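The left-to-right strategy just described can be sketched in Python to see how the number of index pairs in a state depends on the permutation. This is only an illustration under simplifying assumptions that are ours: the permutation is given as a list pi in which pi[i-1] is the right-hand position paired with left-hand position i, adjacent constituents are taken to merge into a single span, and the helper names count_runs and left_to_right_spans are hypothetical.

```python
def count_runs(positions):
    """Number of maximal runs of consecutive integers among the positions."""
    pos = set(positions)
    # A run starts at x exactly when x - 1 has not been gathered.
    return sum(1 for x in pos if x - 1 not in pos)


def left_to_right_spans(pi):
    """Index pairs held by the state after each left-to-right step.

    After k steps the left dimension is covered by one span, while the
    right dimension needs one span per maximal run in pi(1), ..., pi(k).
    """
    spans = []
    for k in range(1, len(pi) + 1):
        spans.append(1 + count_runs(pi[:k]))
    return spans


identity = [1, 2, 3, 4, 5, 6]
interleaved = [1, 3, 5, 2, 4, 6]
print(left_to_right_spans(identity))     # [2, 2, 2, 2, 2, 2]
print(left_to_right_spans(interleaved))  # [2, 3, 4, 3, 2, 2]
```

With the identity permutation two index pairs always suffice, whereas the interleaved permutation forces a state with four index pairs at the third step.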
We can see that, for some worst-case permutations, the left-to-right strategy demands increasingly more pairs of indices, so that the exponent in the time complexity grows linearly with r.</Paragraph> <Paragraph position="19"> How much better can we do if we exploit some strategy other than the left-to-right one above? More precisely, we ask how many unconnected spans a state may require for some worst-case permutation π, under the choice of the best possible parsing strategy for π itself.</Paragraph> <Paragraph position="20"> Theorem 2 In the worst case, standard tabular methods for the SCFG membership problem require an amount of time Ω(|G| · n^{c·√r}), with r the length of the longest production in G and c a constant.</Paragraph> <Paragraph position="21"> Proof. For any r ≥ 8 we let q = ⌊√(r/2)⌋ ≥ ⌊√(8/2)⌋ = 2, and define a permutation π_r on {1, ..., r}. We view the domain of π_r as composed of 2q blocks with q adjacent integers each, possibly followed by r − 2q² additional &quot;padding&quot; integers, and its codomain as composed of q blocks</Paragraph> <Paragraph position="23"> with 2q adjacent integers each, again possibly followed by r − 2q² &quot;padding&quot; integers. Permutation π_r transposes all blocks by sending the j-th element of the i-th block in the domain into the i-th element of the j-th block in the codomain, while mapping each padding integer identically into itself. Formally, for all positive integers i ≤ 2q and j ≤ q, π_r(q · (i − 1) + j) = 2q · (j − 1) + i, and for all integers i with 2q² &lt; i ≤ r, π_r(i) = i.</Paragraph> <Paragraph position="24"> We count below how many spans are instantiated by a state that has gathered p constituent pairs, 1 ≤ p ≤ r, in parsing production (1) under any possible strategy. When a constituent pair B_{1i}, B_{2π_r(i)} is gathered, we say integer i in the domain of π_r and integer π_r(i) in the codomain have been pebbled. 
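As a small sanity check on the definition of π_r, the following Python sketch builds the permutation from the formula above and pebbles a few constituent pairs. It is an illustration only: the function names block_permutation and pebble are hypothetical, and the dictionary representation is an assumption of this example.

```python
import math


def block_permutation(r):
    """Return (q, pi) where pi is the block-transposing permutation on 1..r."""
    q = math.isqrt(r // 2)  # equals floor(sqrt(r / 2))
    pi = {}
    for i in range(1, 2 * q + 1):      # i ranges over the 2q domain blocks
        for j in range(1, q + 1):      # j ranges over positions in a block
            pi[q * (i - 1) + j] = 2 * q * (j - 1) + i
    for i in range(2 * q * q + 1, r + 1):  # padding integers are fixed points
        pi[i] = i
    return q, pi


def pebble(pi, gathered):
    """Pebbled integers in the domain and in the codomain of pi."""
    domain = set(gathered)
    codomain = {pi[i] for i in gathered}
    return domain, codomain


q, pi = block_permutation(8)
print(q)                              # 2
print([pi[i] for i in range(1, 9)])   # [1, 5, 2, 6, 3, 7, 4, 8]
print(pebble(pi, [1, 2]))             # first domain block is spread over the codomain
```

For r = 8 the first domain block {1, 2} is sent to positions 1 and 5, that is, to the first element of each of the two codomain blocks.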
In this way each span (i,j) in a state corresponds to some run i, i + 1, ..., j of pebbled integers, with either i = 1 or i − 1 unpebbled, and with either j = r or j + 1 unpebbled. We call each such run a segment, and show that every parsing strategy demands at least q = ⌊√(r/2)⌋ segments either in the domain or in the codomain of π_r.</Paragraph> <Paragraph position="25"> We say that a block in the domain of π_r is empty, full, or mixed if, respectively, none, all, or some but not all of its elements have been pebbled. Assume that, for a given parsing strategy, the last block that becomes mixed does so when we place the i-th pebble, and the first block that becomes full does so when we place the j-th pebble. Obviously i ≠ j: the first pebble placed in a previously empty block cannot make it full since every block contains at least 2 elements.</Paragraph> <Paragraph position="26"> If i &lt; j, after placing the i-th pebble and before placing the j-th pebble every block in the domain of π_r is mixed. Each of these 2q blocks then contains at least one pebbled element which is adjacent to an unpebbled one and must therefore be either the first or the last element of a segment. The domain of π_r then contains at least 2q/2 = q segments.</Paragraph> <Paragraph position="27"> If j &lt; i, after placing the j-th pebble and before placing the i-th pebble at least one block in the domain of π_r (e.g., the h-th block) is full, and at least one (e.g., the k-th) is empty. Then, in each of the q blocks in the codomain of π_r, the h-th element is pebbled while the k-th is not. Therefore the h-th elements of any two consecutive blocks in the codomain of π_r must belong to two distinct segments, since at least one intermediate element is not pebbled. 
The codomain of π_r then contains at least q segments.</Paragraph> </Section> <Section position="5" start_page="807" end_page="808" type="metho"> <SectionTitle> 4 The translation problem </SectionTitle> <Paragraph position="0"> In this section we consider some formulations of the translation problem for PSCFGs that have been proposed in the literature. The most general definition of the translation problem for PSCFGs is this: for an input PSCFG G_p = (G, p_G) and an input string w, produce a representation of all possible parse trees, along with their probabilities, that are assigned by G to a string in the set T(G, w) under some translation of w.</Paragraph> <Paragraph position="1"> Variants of this definition can be found where the input is a single parse tree for w (Yamada and Knight, 2001), or where the output is a single parse tree, chosen according to some specific criteria (Wu and Wong, 1998). To formally study these problems, in what follows we focus on single parse trees associated with derivations in G_p. For a derivation σ of the form [S^(1), S^(1)] ⇒^σ_G [w_1, w_2], we write t_{σ,l} and t_{σ,r} to denote the left and the right parse trees, respectively, associated with σ. The probability that t_{σ,r} is obtained as a translation of t_{σ,l} through G_p is thus p_G([t_{σ,l}, t_{σ,r}]) = p_G(σ). Let t be some parse tree; we write y(t) to denote the string in the yield of t. For a string w ∈ V_T^* and a parse tree t, we also consider the probability that t is obtained from w through G_p, defined as: p_G([w, t]) = ∑_{t′ : y(t′) = w} p_G([t′, t]). (2) We can now precisely define the variants of the translation problem we are interested in. Given as input a PSCFG G_p = (G, p_G) and two strings</Paragraph> <Paragraph position="3"> p_G([t_1, t_2]). (3) If the synchronous productions in the underlying SCFG G have length bounded by some constant, then the above problem can be solved in polynomial time using extensions of the Viterbi search strategy to parse forests. 
This has been shown for instance in (Wu and Wong, 1998; Yamada and Knight, 2001; Melamed, 2004).</Paragraph> <Paragraph position="4"> A second interesting problem is defined as follows. Given as input a PSCFG G_p = (G, p_G) and a string w ∈ V_T^*, output the parse tree argmax_t p_G([w, t]). (4) Even in case we impose some constant bound on the length of the synchronous productions in G, the above problem is NP-hard, as we show in what follows. We assume the reader is familiar with the definition of probabilistic context-free grammar (PCFG) and with the associated notion of derivation probability (Wetherell, 1980). We denote a PCFG as a pair (G, p_G), with G = (V_N, V_T, P, S) the underlying context-free grammar and p_G the associated function providing the probability distributions for the productions in P, conditioned on their left-hand side. A probabilistic regular grammar (PRG) is a PCFG with underlying productions of the form A → aB or A → ε, with A, B nonterminal symbols and a a terminal symbol.</Paragraph> <Paragraph position="5"> We consider below a decision problem associated with PRGs, called the consensus problem, defined as follows: given as input a PRG (G, p_G) and a rational number d ∈ [0,1], decide whether there exists a string w in the language generated by G such that p_G(w) ≥ d. It has been shown in (Casacuberta and de la Higuera, 2000) that, for a PRG G whose productions have all probabilities expressed by rational numbers, the above problem is NP-complete. (Essentially the same result is also reported in (Lyngso and Pedersen, 2002), stated in terms of hidden Markov models.) We reduce the consensus problem for PRGs to a decision version of the problem in (4), called the best translated derivation problem and defined as follows. Given as input a PSCFG G_p = (G, p_G), a string w ∈ V_T^* and a rational number d ∈ [0,1], decide whether max_t p_G([w, t]) ≥ 
d.</Paragraph> <Paragraph position="6"> Theorem 3 The best translated derivation problem for the class PSCFG is NP-hard.</Paragraph> <Paragraph position="7"> Proof. We provide a reduction from the consensus problem for the class PRG with rational production probabilities. The main idea is described in what follows. Given the input PRG G_p, we construct a target PSCFG G′_p that translates string $ into $, with $ a special symbol. Given as input the string $, G′_p simulates all possible derivations of G_p through its own derivations. This is done by encoding the nonterminals appearing in a derivation ρ of G_p within the left component of some derivation σ of G′_p, and by encoding the terminal string generated by ρ within the right component of σ. The probability of ρ is also preserved by σ.</Paragraph> <Paragraph position="8"> Let G_p = (G, p_G), d be an instance of the consensus problem as above, with G = (V_N, V_T, P, S).</Paragraph> <Paragraph position="9"> We specify a PSCFG G′_p = (G′, p_G′) with G′ = (V′_N, {$}, P′, S) and V′_N = V_N ∪ V_T. Set P′ is constructed as follows: (i) for every (S → aA) ∈ P, s : [S → A^(1), S → a^(1)] is added to P′, with p_G′(s) = p_G(S → aA); (ii) for every (S → ε) ∈ P, s : [S → $, S → $] is added to P′, with p_G′(s) = p_G(S → ε); (iii) for every a ∈ V_T and (A → bB) ∈ P, s :</Paragraph> <Paragraph position="11"> Note that the construction of G′_p can be carried out in quadratic time in the size of G_p. It is not difficult to see that there exists a derivation of the form S ⇒_G a_1 A_1 ⇒_G a_1 a_2 A_2 ··· ⇒_G a_1 a_2 ··· a_n A_n if and only if there exists a derivation in G′ associated with unary trees t_1 and t_2, such that string S A_1 A_2 ··· A_n is read from the spine of t_1 and string S a_1 a_2 ··· a_n is read from the spine of t_2. 
Furthermore, the two derivations are composed of 'corresponding' productions with the same probabilities.</Paragraph> <Paragraph position="12"> We conclude that there exists a string w in L(G) with p_G(w) ≥ d if and only if there exists a unary tree t with string Sw$ read from the spine such that p_G′([$, t]) ≥ d.</Paragraph> <Paragraph position="13"> We discuss below an interesting consequence of Theorem 3. The SDTS formalism discussed in Section 1 has been extended to the probabilistic case in (Maryanski and Thomason, 1979), where it is called stochastic SDTS (SSDTS). As a corollary to the proof of Theorem 3, we obtain that one can define, through some PSCFG G_p and some fixed string w, a probability distribution p_G([w, t]) on parse trees that cannot be obtained through any SSDTS. Without providing the details of the definition of SSDTS, we give here only an outline of the proof. We also assume that the reader is familiar with probabilistic finite automata and with their distributional equivalence with PRGs.</Paragraph> <Paragraph position="14"> Consider the PSCFG G′_p = (G′, p_G′) defined in the proof of Theorem 3, and assume there exists some SSDTS G″_p = (G″, p_G″) such that, for every tree t, we have p_G″([$, t]) = p_G′([$, t]). Since in a derivation of an SDTS the generated trees are always isomorphic, up to some reordering of sibling nodes, we obtain that the productions of G″ must have the form [S → a^(1), S → a^(1)], [a → b^(1), a → b^(1)] and [a → $, a → $]. From these productions we can construct a probabilistic deterministic finite automaton generating the same language as the PRG G_p, and with the same distribution. 
But this is impossible since there are string distributions defined by some PRG that cannot be obtained through probabilistic deterministic finite automata; see for instance (Vidal et al., 2005).</Paragraph> <Paragraph position="15"> We conclude by remarking that in (Casacuberta and de la Higuera, 2000) it is shown that finding the best output string for a given input string is NP-hard for stochastic SDTS with a single nonterminal in each production's right-hand side. Our result in Theorem 3, stated for PSCFGs, is stronger, since it investigates individual parse trees rather than strings.</Paragraph> </Section> <Section position="6" start_page="808" end_page="808" type="metho"> <SectionTitle> 5 Concluding remarks </SectionTitle> <Paragraph position="0"> The presented results are based on worst-case analysis: further experimental evaluation needs to be carried out on multilingual corpora in order to assess the practical impact of these findings.</Paragraph> </Section> </Paper>