Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT/EMNLP), pages 803–810, Vancouver, October 2005. ©2005 Association for Computational Linguistics
Some Computational Complexity Results
for Synchronous Context-Free Grammars
Giorgio Satta
Dept. of Information Engineering
University of Padua
via Gradenigo, 6/A
I-35131 Padova
Italy
satta@dei.unipd.it
Enoch Peserico
Dept. of Information Engineering
University of Padua
via Gradenigo, 6/A
I-35131 Padova
Italy
enoch@dei.unipd.it
Abstract
This paper investigates some computa-
tional problems associated with proba-
bilistic translation models that have re-
cently been adopted in the literature on
machine translation. These models can be
viewed as pairs of probabilistic context-free
grammars working in a ‘synchronous’
way. Two hardness results for the class
NP are reported, along with an exponen-
tial time lower-bound for certain classes
of algorithms that are currently used in the
literature.
1 Introduction
State of the art architectures for machine transla-
tion are all based on mathematical models called
translation models. Generally speaking, a transla-
tion model accounts for all the elementary opera-
tions that rule the process of translation between the
words and the different word orderings of the source
and target languages. Translation models are usu-
ally enriched with statistical parameters, to drive the
search toward the most likely translation(s). Specialized
algorithms are provided for the automatic estimation
of these parameters from corpora of translation
pairs. Besides the task of natural language
translation, statistical translation models are also ex-
ploited in other applications, such as word align-
ment, multilingual document retrieval and automatic
dictionary construction.
The most successful translation models that are
found in the literature exploit finite-state machinery.
The approach started with the so-called IBM mod-
els (Brown et al., 1988), implementing a set of ele-
mentary operations, such as movement, duplication
and translation, that independently act on individ-
ual words in the source sentence. These word-to-
word models have been later enriched with the in-
troduction of larger units such as phrases; see for
instance (Och et al., 1999; Och and Ney, 2002).
Still, the generative capacity of these models lies
within the realm of finite-state machinery (Kumar
and Byrne, 2003), so they are unable to handle
nested structures and do not provide the expressivity
required to process language pairs with very differ-
ent word orderings.
Recently, more sophisticated translation models
have been proposed, borrowing from the theory of
compilers and making use of synchronous rewrit-
ing. In synchronous rewriting, two formal gram-
mars are exploited, one describing the source lan-
guage and the other describing the target language.
Furthermore, the productions of the two gram-
mars are paired and, in the rewriting process, such
pairs are always applied synchronously. Formalisms
based on synchronous rewriting have been empow-
ered with the use of statistical parameters, and spe-
cialized estimation and translation (decoding) algo-
rithms were newly developed. Among the several
proposals, we mention here the models presented
in (Wu, 1997; Wu and Wong, 1998), (Alshawi et al.,
2000), (Yamada and Knight, 2001), (Gildea, 2003)
and (Melamed, 2003).
In this paper we consider synchronous models
based on context-free grammars and probabilistic
extensions thereof. This is the most common choice
in statistical translation models that exceed the gen-
erative power of finite-state machinery. We focus
on two associated computational problems that have
been defined in the literature. One is the member-
ship problem, which involves testing whether an in-
put string pair can be generated by the model. The
other is the translation problem (also called the de-
coding problem) which involves the search for a
suitable translation of an input string/structure. It
has often been informally stated in the literature
that the use of structured models results in efficient,
polynomial time algorithms for the above problems.
We show here that sometimes this is not the case.
The contribution of this paper can be stated as fol-
lows:
• we show that the membership problem is NP-
hard, unless a constant bound is imposed on the
length of the productions (Section 3);
• we show an exponential time lower bound for
the membership problem, in case chart parsing
is adopted (Section 3);
• we show that translating an input string into
the best parse tree in the target language is NP-
hard, even in case productions are bounded in
length (Section 4).
Investigation of the computational complexity of
translation models has started in (Knight, 1999) for
word-to-word models. This paper can be seen as the
continuation of that line of research.
2 Synchronous context-free grammars
Several definitions for synchronous context-free
grammars have been proposed in the literature; see
for instance (Chiang, 2004; Chiang, 2005). Our
definition is based on syntax-directed translation
schemata (SDTS; Aho and Ullman, 1972), with the
difference that we do not impose the restriction that
two paired context-free productions have the same
left-hand side. As will be discussed in Section 4,
this results in an enriched generative capacity when
probabilistic extensions are considered. We assume
the reader is familiar with the definition of context-
free grammar (CFG) and with the associated notion
of derivation.
Let VN and VT be sets of nonterminal and termi-
nal symbols, respectively. In what follows we need
to represent bijections between all the occurrences
of nonterminals in two strings over VN ∪ VT. This
can be done by annotating nonterminals with indices
from an infinite set. We define
\(I(V_N) = \{A^{(t)} \mid A \in V_N,\ t \in \mathbb{N}\}\) and
\(V_I = I(V_N) \cup V_T\). We write index(γ), \(\gamma \in V_I^*\),
to denote the set of all indices (the integers t) that appear in
symbols in γ. Two strings \(\gamma, \gamma' \in V_I^*\) are synchronous
if each index in index(γ) occurs only once in γ, each index in
index(γ′) occurs only once in γ′, and index(γ) = index(γ′).
Therefore synchronous strings have the general form
\[
u_{10} A_{11}^{(t_1)} u_{11} A_{12}^{(t_2)} u_{12} \cdots u_{1,r-1} A_{1r}^{(t_r)} u_{1r},
\]
\[
u_{20} A_{21}^{(t_{\pi(1)})} u_{21} A_{22}^{(t_{\pi(2)})} u_{22} \cdots u_{2,r-1} A_{2r}^{(t_{\pi(r)})} u_{2r},
\]
where r ≥ 0, \(u_{1i}, u_{2i} \in V_T^*\),
\(A_{1i}^{(t_i)}, A_{2i}^{(t_{\pi(i)})} \in I(V_N)\),
\(t_i \neq t_j\) for \(i \neq j\), and π is some permutation
defined on the set {1,...,r}.
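As an illustration of this definition, the following minimal Python sketch tests whether two strings over V_I are synchronous. The encoding (a terminal is a plain string, an indexed nonterminal A(t) is a tuple `(A, t)`) and the function names are assumptions made for this example only; they are not part of the formalism.

```python
from typing import List, Tuple, Union

# Assumed encoding: a terminal is a str, an indexed nonterminal A^(t) is (A, t).
Symbol = Union[str, Tuple[str, int]]


def indices(gamma: List[Symbol]) -> List[int]:
    """Indices carried by the nonterminals of gamma, in order, with repetitions."""
    return [sym[1] for sym in gamma if isinstance(sym, tuple)]


def are_synchronous(gamma1: List[Symbol], gamma2: List[Symbol]) -> bool:
    """True if no index repeats inside either string and both index sets coincide."""
    i1, i2 = indices(gamma1), indices(gamma2)
    no_repeats = len(i1) == len(set(i1)) and len(i2) == len(set(i2))
    return no_repeats and set(i1) == set(i2)
```

Note that the test only looks at indices: two linked nonterminals are not required to carry the same name.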
Definition 1 A synchronous context-free gram-
mar (SCFG) is a tuple G = (VN,VT,P,S), where
VN, VT are finite, disjoint sets of nonterminal and
terminal symbols, respectively, S ∈ VN is the start
symbol and P is a finite set of synchronous productions,
each of the form \([A_1 \to \alpha_1,\ A_2 \to \alpha_2]\), with
\(A_1, A_2 \in V_N\) and \(\alpha_1, \alpha_2 \in V_I^*\) synchronous strings.
The size of a SCFG G is defined as
\[
|G| = \sum_{[A_1 \to \alpha_1,\ A_2 \to \alpha_2] \in P} |A_1 \alpha_1 A_2 \alpha_2|.
\]
Based on an example from (Yamada and Knight, 2001), we provide
a sample SCFG fragment translating from English to
Japanese, specified by means of the following synchronous
productions:
s1 : [VB → PRP(1) VB1(2) VB2(3),
VB → PRP(1) VB2(3) VB1(2)]
s2 : [VB2 → VB(1) TO(2),
VB2 → TO(2) VB(1) ga]
s3 : [TO → TO(1) NN(2), TO → NN(2) TO(1)]
s4 : [PRP → he, PRP → kare ha]
s5 : [VB1 → adores, VB1 → daisuki desu]
s6 : [VB → listening, VB → kiku no]
s7 : [TO → to, TO → wo]
s8 : [NN → music, NN → ongaku]
Note that in production s2 above, the nonterminals
VB and TO generated from nonterminal VB2 in
the English component are inverted in the Japanese
component, where some additional lexical material
is also added.
In a SCFG, the ‘derives’ relation is defined on
synchronous strings in terms of simultaneous rewrit-
ing of two nonterminals with the same index. Some
additional notation will help us define this relation
precisely. A reindexing is a one-to-one function
on \(\mathbb{N}\). We extend a reindexing f to \(V_I\) by letting
\(f(A^{(t)}) = A^{(f(t))}\) for \(A^{(t)} \in I(V_N)\) and f(a) = a
for \(a \in V_T\). We also extend f to strings in \(V_I^*\) by
letting f(ε) = ε and f(Xγ) = f(X)f(γ), for each
\(X \in V_I\) and \(\gamma \in V_I^*\). We say that strings
\(\gamma_1, \gamma_2 \in V_I^*\) are independent if
index(γ1) ∩ index(γ2) = ∅.
Definition 2 Let G = (VN,VT,P,S) be a SCFG
and let γ1, γ2 be synchronous strings in \(V_I^*\). The
derives relation \([\gamma_1, \gamma_2] \Rightarrow_G [\delta_1, \delta_2]\) holds
whenever there exist an index t in index(γ1), a synchronous
production \([A_1 \to \alpha_1,\ A_2 \to \alpha_2]\) in P
and some reindexing f such that
(i) f(α1α2) and γ1γ2 are independent; and
(ii) \(\gamma_i = \gamma_i' A_i^{(t)} \gamma_i''\), \(\delta_i = \gamma_i' f(\alpha_i) \gamma_i''\), for i = 1, 2.
We also write \([\gamma_1, \gamma_2] \Rightarrow_G^{s} [\delta_1, \delta_2]\) to explicitly
indicate that the derives relation holds through some
synchronous production s ∈ P.
Since δ1 and δ2 in Definition 2 are synchronous
strings, we can define the reflexive and transitive
closure of \(\Rightarrow_G\), written \(\Rightarrow_G^*\). This relation is used
to represent derivations in G. In case we have
\([\gamma_{1,i-1}, \gamma_{2,i-1}] \Rightarrow_G^{s_i} [\gamma_{1i}, \gamma_{2i}]\) for 1 ≤ i ≤ n,
n ≥ 1, we also write \([\gamma_{10}, \gamma_{20}] \Rightarrow_G^{\sigma} [\gamma_{1n}, \gamma_{2n}]\),
where \(\sigma = s_1 s_2 \cdots s_n\). We always assume some
canonical form for derivations (for instance, leftmost
derivation on the left component). Similarly to
the case of context-free grammars, each derivation
in G can be associated with a pair of parse trees, that
is, one parse tree for each dimension.
Back to our example, we report a fragment of a
derivation of the string pair [he adores listening to
music, kare ha ongaku wo kiku no ga daisuki desu]:
[VB(1), VB(1)]
⇒_G^{s1} [PRP(2) VB1(3) VB2(4),
PRP(2) VB2(4) VB1(3)]
⇒_G^{s4} [he VB1(3) VB2(4),
kare ha VB2(4) VB1(3)]
⇒_G^{s5} [he adores VB2(4),
kare ha VB2(4) daisuki desu]
⇒_G^{s2} [he adores VB(5) TO(6),
kare ha TO(6) VB(5) ga daisuki desu].
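The derivation fragment above can be replayed mechanically. The sketch below is only an illustration under stated assumptions: the tuple encoding of indexed nonterminals, the helper names (`derive`, `run`), the choice of fresh indices for the reindexing f, and the order in which the remaining productions s6, s3, s7, s8 are applied (leftmost on the left component) are all choices made for this example.

```python
from itertools import count

# Assumed encoding: terminals are str, indexed nonterminals are (name, index).
def nt(name, t):
    return (name, t)

# Productions s1..s8 of the sample fragment: ((A1, alpha1), (A2, alpha2)).
P = {
    "s1": (("VB", [nt("PRP", 1), nt("VB1", 2), nt("VB2", 3)]),
           ("VB", [nt("PRP", 1), nt("VB2", 3), nt("VB1", 2)])),
    "s2": (("VB2", [nt("VB", 1), nt("TO", 2)]),
           ("VB2", [nt("TO", 2), nt("VB", 1), "ga"])),
    "s3": (("TO", [nt("TO", 1), nt("NN", 2)]),
           ("TO", [nt("NN", 2), nt("TO", 1)])),
    "s4": (("PRP", ["he"]), ("PRP", ["kare", "ha"])),
    "s5": (("VB1", ["adores"]), ("VB1", ["daisuki", "desu"])),
    "s6": (("VB", ["listening"]), ("VB", ["kiku", "no"])),
    "s7": (("TO", ["to"]), ("TO", ["wo"])),
    "s8": (("NN", ["music"]), ("NN", ["ongaku"])),
}

def derive(pair, s, t, fresh):
    """Rewrite, in both components, the nonterminal carrying index t using s."""
    (a1, alpha1), (a2, alpha2) = P[s]
    # Reindexing f: send each index of the production to an index not used so far.
    f = {sym[1]: next(fresh) for sym in alpha1 if isinstance(sym, tuple)}
    out = []
    for gamma, lhs, alpha in zip(pair, (a1, a2), (alpha1, alpha2)):
        pos = gamma.index((lhs, t))  # ValueError if s does not apply at index t
        new = [(x[0], f[x[1]]) if isinstance(x, tuple) else x for x in alpha]
        out.append(gamma[:pos] + new + gamma[pos + 1:])
    return out

def leftmost_index(gamma):
    """Index of the leftmost nonterminal of gamma."""
    return next(sym[1] for sym in gamma if isinstance(sym, tuple))

def run(sequence):
    fresh = count(2)  # index 1 is taken by the start pair
    pair = [[nt("VB", 1)], [nt("VB", 1)]]
    for s in sequence:
        pair = derive(pair, s, leftmost_index(pair[0]), fresh)
    return [" ".join(side) for side in pair]

result = run(["s1", "s4", "s5", "s2", "s6", "s3", "s7", "s8"])
```

After the first four steps the intermediate pairs coincide with those listed above; the full sequence yields the string pair of the running example.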
The translation generated by a SCFG G is a binary
relation over \(V_T^*\) defined as
\[
T(G) = \{[w_1, w_2] \mid [S^{(1)}, S^{(1)}] \Rightarrow_G^* [w_1, w_2],\ w_1, w_2 \in V_T^*\}.
\]
The set of strings that are translations of a given
string w1 is defined as:
\[
T(G, w_1) = \{w_2 \mid [w_1, w_2] \in T(G)\}.
\]
A probabilistic SCFG (PSCFG) is a pair (G, pG)
where G = (VN,VT,P,S) is a SCFG and pG is a
function from P to real numbers in [0,1] such that,
for each A1, A2 ∈ VN, we have:
\[
\sum_{\alpha_1, \alpha_2} p_G([A_1 \to \alpha_1,\ A_2 \to \alpha_2]) = 1.
\]
If for n ≥ 1 and \(s_i \in P\), 1 ≤ i ≤ n, string
\(\sigma = s_1 s_2 \cdots s_n\) is a canonical derivation of the form
\([S^{(1)}, S^{(1)}] \Rightarrow_G^{\sigma} [w_1, w_2]\), we write
\(p_G(\sigma) = \prod_{i=1}^{n} p_G(s_i)\). If D([w1,w2]) is the set of all
canonical derivations in G for pair [w1,w2], we write
\(p_G([w_1, w_2]) = \sum_{\sigma \in D([w_1, w_2])} p_G(\sigma)\).
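The two quantities just defined can be made concrete with a small sketch. The toy grammar below (productions r1, r2, r3) is hypothetical and chosen only so that one string pair has two canonical derivations; the set of derivations is supplied by hand rather than computed, and the properness check is restricted to nonterminal pairs that actually head a production.

```python
from collections import defaultdict
from fractions import Fraction
from math import prod

# Hypothetical toy PSCFG: name -> (A1, alpha1, A2, alpha2, probability).
P = {
    "r1": ("S", "A(1)", "S", "A(1)", Fraction(1, 2)),
    "r2": ("S", "a", "S", "b", Fraction(1, 2)),
    "r3": ("A", "a", "A", "b", Fraction(1)),
}

def is_proper(P):
    """Probabilities sum to 1 for every pair (A1, A2) that heads a production."""
    totals = defaultdict(Fraction)
    for a1, _, a2, _, pr in P.values():
        totals[(a1, a2)] += pr
    return all(total == 1 for total in totals.values())

def derivation_probability(sigma):
    """p_G(sigma): product of the probabilities of the productions in sigma."""
    return prod(P[s][4] for s in sigma)

def pair_probability(derivations):
    """p_G([w1, w2]): sum over the supplied canonical derivations of the pair."""
    return sum(derivation_probability(sigma) for sigma in derivations)

# The pair [a, b] has two canonical derivations in the toy grammar.
D_ab = [["r2"], ["r1", "r3"]]
total = pair_probability(D_ab)
```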
3 The membership problem
We consider here the membership problem for
SCFG, defined as follows: given as input a
SCFG G and a pair [w1, w2], decide whether
[w1, w2] is in T(G). This problem has been considered,
for instance, in (Wu, 1997) for inversion
transduction grammars, and has applications in
the support of several tasks of automatic annotation
of parallel corpora, such as segmentation,
bracketing, and phrasal and word alignment. We show
that the membership problem for SCFGs is NP-hard.
The result could be derived from the findings
in (Melamed et al., 2004) that synchronous rewriting
systems such as SCFGs are related to the class of so-called
linear context-free rewriting systems (LCFRSs), and
from the result that the membership problem for
LCFRSs is NP-hard (Satta, 1992; Kaji et al.,
1994). However, we provide here a direct proof, to
simplify the presentation.
Theorem 1 The membership problem for SCFGs is
NP-hard.
Proof. We reduce from the three-satisfiability
problem (3SAT, Garey and Johnson, 1979). Let
〈U,C〉 be an instance of the 3SAT problem, where
U = {u1,...,up} is a set of variables and C =
{c1,...,cn} is a set of clauses. Each clause is a set
of three literals from \(\{u_1, \bar{u}_1, \ldots, u_p, \bar{u}_p\}\).
The general idea of the proof is to use a string
pair [w1w2 ···wp, wc], where wc is a string repre-
sentation of C and each wi is a string controlling the
truth assignment for the variable ui. We then con-
struct a SCFG G such that each wi can be derived
in two possible ways only, using some specialized
productions of G, encoding the truth assignment of
variable ui. In this way the derivation of the whole
string w1 ···wp in the left dimension corresponds to
a guess of a truth assignment for U. Accordingly, on
the right dimension only those symbols of wc will
be derived that represent clauses that hold true un-
der the guessed assignment.
We need some additional notation. Below we
treat C as an alphabet of atomic symbols. We use
a function d such that, for every i with 1 ≤ i ≤ p,
\(c_{d(i,1)}, c_{d(i,2)}, \ldots, c_{d(i,s_i)}\) is the sequence of all
clauses that include literal \(u_i\), in the left to right
order in which they appear within \(c_1 c_2 \cdots c_n\), and
\(c_{d(i,s_i+1)}, c_{d(i,s_i+2)}, \ldots, c_{d(i,t_i)}\) is the sequence of all
clauses that include literal \(\bar{u}_i\), again as they appear
within \(c_1 c_2 \cdots c_n\) from left to right. Note that we
must have \(\sum_{i=1}^{p} t_i = 3n\). We also use a function
e such that, for every 1 ≤ i ≤ p and \(1 \le j \le t_i\),
\(e(i,j) = j + \sum_{k=1}^{i-1} t_k\) (assume \(\sum_{k=1}^{0} t_k = 0\)).
Consider the alphabet \(\{a_i, b_i \mid 1 \le i \le p\}\). For
every i, 1 ≤ i ≤ p, let \(w_i\) denote a sequence of
exactly \(t_i + 1\) alternating symbols \(a_i\) and \(b_i\), i.e.,
\(w_i \in (a_i b_i)^+ \cup (a_i b_i)^* a_i\). For every 1 ≤ i ≤ p,
let \(x(i,1) = a_i b_i\) and let \(x(i,h) = a_i\) (resp. \(b_i\))
if h is even (resp. odd), \(2 \le h \le t_i\). Let
also \(\bar{x}(i,h) = a_i\) (resp. \(b_i\)) if h is odd (resp.
even), \(1 \le h \le t_i - 1\), and let \(\bar{x}(i,t_i) = a_i b_i\)
(resp. \(b_i a_i\)) if \(t_i\) is odd (resp. even). Therefore
we can write
\[
w_i = x(i,1)\, x(i,2) \cdots x(i,t_i) = \bar{x}(i,1)\, \bar{x}(i,2) \cdots \bar{x}(i,t_i).
\]
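The two segmentations of w_i can be checked with a short sketch. The function names `w`, `x` and `x_bar` and the string encoding of the symbols (`"a3"` for a_3, and so on) are assumptions of this illustration; the code simply transcribes the case distinctions above and verifies that both factorizations spell out the same alternating string.

```python
def w(i, t):
    """w_i: t + 1 alternating symbols a_i, b_i, starting with a_i."""
    return [("a" if k % 2 == 0 else "b") + str(i) for k in range(t + 1)]

def x(i, h, t):
    """Segment x(i, h), for 1 <= h <= t."""
    a, b = f"a{i}", f"b{i}"
    if h == 1:
        return [a, b]
    return [a] if h % 2 == 0 else [b]

def x_bar(i, h, t):
    """Segment x-bar(i, h), for 1 <= h <= t."""
    a, b = f"a{i}", f"b{i}"
    if h == t:
        return [a, b] if t % 2 == 1 else [b, a]
    return [a] if h % 2 == 1 else [b]

def concat(segment, i, t):
    """Concatenate segment(i, 1), ..., segment(i, t)."""
    out = []
    for h in range(1, t + 1):
        out.extend(segment(i, h, t))
    return out
```

Since the first segment has length two in one factorization and length one in the other, the two factorizations never share a segment boundary inside w_i, which is what lets a derivation of w_i encode a binary choice.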
Finally, we need a permutation π defined on the
set {1,...,3n} as follows. Fix i and j with 1 ≤ i ≤ p
and \(1 \le j \le t_i\), and let h be the number of occurrences
of the clause \(c_{d(i,j)}\) found in the sequence
\(c_{d(1,1)}, c_{d(1,2)}, \ldots, c_{d(1,t_1)}, c_{d(2,1)}, \ldots, c_{d(i,j)}\). Note
that we must have 1 ≤ h ≤ 3. Then we set
\[
\pi(e(i,j)) = 3 \cdot [d(i,j) - 1] + h.
\]
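To make the bookkeeping concrete, the following sketch computes d, e, the values t_i and the permutation π for a small 3SAT instance. The input encoding (a clause is a list of three nonzero integers, +i for u_i and −i for its negation) and the function name `build_permutation` are assumptions of this example; it further assumes that each clause consists of three distinct literals.

```python
def build_permutation(p, clauses):
    """Return (d, e, t, pi) as dictionaries, all with 1-based keys.

    clauses[c - 1] is clause c, given as three distinct nonzero integers:
    +i stands for the literal u_i and -i for its negation (assumed encoding).
    """
    n = len(clauses)
    d, t = {}, {}
    for i in range(1, p + 1):
        positive = [c for c in range(1, n + 1) if i in clauses[c - 1]]
        negative = [c for c in range(1, n + 1) if -i in clauses[c - 1]]
        for j, c in enumerate(positive + negative, start=1):
            d[(i, j)] = c
        t[i] = len(positive) + len(negative)

    e, offset = {}, 0
    for i in range(1, p + 1):
        for j in range(1, t[i] + 1):
            e[(i, j)] = offset + j  # j plus the sum of t_k for k < i
        offset += t[i]

    pi, seen = {}, {}
    for i in range(1, p + 1):
        for j in range(1, t[i] + 1):
            c = d[(i, j)]
            seen[c] = seen.get(c, 0) + 1  # h: occurrences of clause c so far
            pi[e[(i, j)]] = 3 * (c - 1) + seen[c]
    return d, e, t, pi

# Example: (u1 or not u2 or u3) and (not u1 or u2 or u3).
d, e, t, pi = build_permutation(3, [[1, -2, 3], [-1, 2, 3]])
```

Every value π(e(i,j)) falls among the three positions 3·[d(i,j)−1]+1, ..., 3·d(i,j), so the three occurrences of each clause are sent to three adjacent positions.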
We can now define the target instance
⟨G, [w, w′]⟩ of our reduction. Let [w, w′] =
\([w_1 w_2 \cdots w_p,\ c_1 c_2 \cdots c_n]\). Let also G = (VN, VT,
P, S), with \(V_N = \{S\} \cup \{A_i \mid 1 \le i \le 3n\}\) and
\(V_T = C \cup \{a_i, b_i \mid 1 \le i \le p\}\). The productions
below define set P:
(i) for every 1 ≤ i ≤ p:
(a) for \(1 \le h \le s_i\):
\([A_{e(i,h)} \to x(i,h),\ A_{e(i,h)} \to c_{d(i,h)}]\),
\([A_{e(i,h)} \to x(i,h),\ A_{e(i,h)} \to \varepsilon]\),
\([A_{e(i,h)} \to \bar{x}(i,h),\ A_{e(i,h)} \to \varepsilon]\);
(b) for \(s_i + 1 \le h \le t_i\):
\([A_{e(i,h)} \to x(i,h),\ A_{e(i,h)} \to \varepsilon]\),
\([A_{e(i,h)} \to \bar{x}(i,h),\ A_{e(i,h)} \to c_{d(i,h)}]\),
\([A_{e(i,h)} \to \bar{x}(i,h),\ A_{e(i,h)} \to \varepsilon]\);
(ii)
\[
[S \to A_{e(1,1)}^{(e(1,1))} A_{e(1,2)}^{(e(1,2))} \cdots A_{e(1,t_1)}^{(e(1,t_1))} A_{e(2,1)}^{(e(2,1))} \cdots A_{e(p,t_p)}^{(e(p,t_p))},
\]
\[
\ S \to A_{\pi(e(1,1))}^{(\pi(e(1,1)))} A_{\pi(e(1,2))}^{(\pi(e(1,2)))} \cdots A_{\pi(e(1,t_1))}^{(\pi(e(1,t_1)))} A_{\pi(e(2,1))}^{(\pi(e(2,1)))} \cdots A_{\pi(e(p,t_p))}^{(\pi(e(p,t_p)))}].
\]
It is easy to see that |G|, |w| and |w′| are polynomially
related to |U| and |C|. From a derivation of
[w, w′] ∈ T(G), we can exhibit a truth assignment
that satisfies C simply by reading off the derivation
of the left string \(w_1 w_2 \cdots w_p\). Conversely, starting
from a truth assignment that satisfies C we can prove
[w, w′] ∈ T(G) by means of (finite) induction on |U|: this
part requires a careful inspection of all items in the
definition of G.
From Theorem 1 we may conclude that algo-
rithms for the membership problem for SCFGs are
very unlikely to run in polynomial time. In the
literature, several algorithms for this problem have
been proposed using tabular methods (chart parsing).
In the worst case, all these algorithms run in
time \(\Theta(|G| \cdot n^{k(G)})\), with G an SCFG and n the
length of the input string pair. We know that, unless
P = NP, k(G) cannot be a constant. We now
prove a lower bound on k(G), providing thereby an
exponential time lower bound result for our problem
under the assumption of the tabular paradigm.
Tabular methods for the membership problem are
based on the following representation. Given a synchronous
production
\[
s : [A_1 \to B_{11}^{(1)} \cdots B_{1r}^{(r)},\ A_2 \to B_{21}^{(\pi(1))} \cdots B_{2r}^{(\pi(r))}], \tag{1}
\]
the already recognized constituent pairs \(B_{1i}, B_{2\pi(i)}\)
are gathered together in several steps, keeping a record
of the spanned substrings of the input. To provide
a concrete example, if we gather all the \(B_{1i}\)'s
on the left dimension from left to right, the partial
analysis we obtain after the first step can be represented
as a state \(\langle s^{(1)}, (i_{11}, j_{11}), (i_{21}, j_{21}) \rangle\), meaning
that \(B_{11}\) and \(B_{2\pi(1)}\) span substrings \(w_1[i_{11}, j_{11}]\)
and \(w_2[i_{21}, j_{21}]\), respectively.¹ At the second
step we have a state \(\langle s^{(2)}, (i_{11}, j_{12}), (i_{21}, j_{21}), (i_{22}, j_{22}) \rangle\),
meaning that \(B_{11} B_{12}\) together span
\(w_1[i_{11}, j_{12}]\), \(B_{2\pi(1)}\) spans \(w_2[i_{21}, j_{21}]\) and \(B_{2\pi(2)}\)
spans \(w_2[i_{22}, j_{22}]\). We can see that, for some worst
case permutations, the left-to-right strategy demands
increasingly more pairs of indices, so that the exponent
in the time complexity grows linearly with r.
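The growth in the number of spans under the left-to-right strategy can be observed with a small sketch. It is an illustration only: the list `partner` (where `partner[k-1]` is the assumed position, in the right component, of the constituent paired with the k-th left constituent) and both function names are hypothetical conventions of this example.

```python
def count_segments(pebbled):
    """Number of maximal runs of consecutive integers in the set `pebbled`."""
    return sum(1 for v in pebbled if v - 1 not in pebbled)

def left_to_right_spans(partner):
    """Largest number of spans held by a state of the left-to-right strategy.

    After gathering the first k left constituents, the left component
    contributes one span and the right component one span per maximal
    run of already gathered positions.
    """
    right, worst = set(), 0
    for position in partner:
        right.add(position)
        worst = max(worst, 1 + count_segments(right))
    return worst

# A monotone pairing needs two spans; an interleaving pairing needs r/2 + 1.
monotone = left_to_right_spans([1, 2, 3, 4, 5, 6])
interleaved = left_to_right_spans([1, 3, 5, 2, 4, 6])
```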
How much better can we do, if we exploit some
strategy other than the left-to-right above? More
precisely, we ask how many unconnected spannings
a state may require for some worst case permutation
π, under the choice of the best possible parsing strategy
for π itself.
Theorem 2 In the worst case, standard tabular
methods for the SCFG membership problem require
an amount of time \(\Omega(|G| \, n^{c \sqrt{r}})\), with r the length of
the longest production in G and c a constant.
Proof. For any r ≥ 8 we let
\(q = \lfloor \sqrt{r/2} \rfloor \ge \lfloor \sqrt{8/2} \rfloor = 2\),
and define a permutation \(\pi_r\) on {1,...,r}. We view the domain
of \(\pi_r\) as composed of 2q blocks with q adjacent integers each,
possibly followed by \(r - 2q^2\) additional “padding” integers,
and its codomain as composed of q blocks with 2q adjacent integers
each, again possibly followed by \(r - 2q^2\) “padding” integers.
Permutation \(\pi_r\) transposes all blocks by sending the j-th element
of the i-th block in the domain into the i-th element
of the j-th block in the codomain, while mapping
each padding integer identically into itself. Formally,
for all positive integers i ≤ 2q and j ≤ q,
\(\pi_r(q \cdot (i - 1) + j) = 2q \cdot (j - 1) + i\), and for all
integers i with \(2q^2 < i \le r\), \(\pi_r(i) = i\).

¹ For a string \(w = a_1 \cdots a_n\), we write w[i,j] to denote the
substring \(a_{i+1} \cdots a_j\).
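The construction of the block-transposing permutation can be written down directly. The sketch below follows the formal definition above; the function name `pi_r` and the dictionary representation are choices made for this example.

```python
from math import isqrt

def pi_r(r):
    """Return (q, pi), with pi the block-transposing permutation on {1, ..., r}."""
    if r < 8:
        raise ValueError("the construction assumes r >= 8")
    q = isqrt(r // 2)  # equals floor(sqrt(r / 2))
    pi = {}
    for i in range(1, 2 * q + 1):       # i-th block of the domain
        for j in range(1, q + 1):       # j-th element of that block
            pi[q * (i - 1) + j] = 2 * q * (j - 1) + i
    for i in range(2 * q * q + 1, r + 1):
        pi[i] = i                       # padding integers are fixed points
    return q, pi

q8, pi8 = pi_r(8)
```

For r = 8 we have q = 2, so the domain is split into four blocks of two integers and the codomain into two blocks of four.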
We count below how many spans are instantiated
by a state that has gathered p constituent pairs,
1 ≤ p ≤ r, in parsing production (1) under any possible
strategy. When a constituent pair \(B_{1i}, B_{2\pi_r(i)}\)
is gathered, we say integer i in the domain of \(\pi_r\) and
integer \(\pi_r(i)\) in the codomain have been pebbled. In
this way each span (i, j) in a state corresponds to
some run i, i+1, ..., j of pebbled integers, with either
i = 1 or i−1 unpebbled, and with either j = r
or j+1 unpebbled. We call each such run a segment,
and show that every parsing strategy demands
at least \(q = \lfloor \sqrt{r/2} \rfloor\) segments either in the domain
or in the codomain of \(\pi_r\).
We say that a block in the domain of \(\pi_r\) is empty,
full, or mixed if, respectively, none, all, or some but
not all of its elements have been pebbled. Assume
that, for a given parsing strategy, the last block that
becomes mixed does so when we place the i-th pebble,
and the first block that becomes full does so
when we place the j-th pebble. Obviously \(i \neq j\):
the first pebble placed in a previously empty block
cannot make it full since every block contains at
least 2 elements.
If i < j, after placing the i-th pebble and before
placing the j-th pebble every block in the domain of
\(\pi_r\) is mixed. Each of these 2q blocks then contains
at least one pebbled element which is adjacent to an
unpebbled one and must therefore be either the first
or the last element of a segment. The domain of \(\pi_r\)
then contains at least 2q/2 = q segments.
If j < i, after placing the j-th pebble and before
placing the i-th pebble at least one block in the
domain of \(\pi_r\) (e.g., the h-th block) is full, and at
least one (e.g., the k-th) is empty. Then, in each
of the q blocks in the codomain of \(\pi_r\), the h-th element
is pebbled while the k-th is not. Therefore
the h-th elements of any two consecutive blocks in
the codomain of \(\pi_r\) must belong to two distinct segments,
since at least one intermediate element is not
pebbled. The codomain of \(\pi_r\) then contains at least
q segments.
In both cases some state records at least q segments, that is, a
number of spans proportional to \(\sqrt{r}\), which yields the stated bound.
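The pebbling argument can be checked empirically on small instances. The sketch below is a sanity check rather than a proof: it models a parsing strategy as a linear order in which the r constituent pairs are pebbled, as in the argument above, and samples random orders with a fixed seed. All function names, the number of trials and the seed are assumptions of this example.

```python
import random
from math import isqrt

def pi_r(r):
    """Block-transposing permutation on {1, ..., r}, for r >= 8."""
    q = isqrt(r // 2)
    pi = {q * (i - 1) + j: 2 * q * (j - 1) + i
          for i in range(1, 2 * q + 1) for j in range(1, q + 1)}
    pi.update({i: i for i in range(2 * q * q + 1, r + 1)})
    return q, pi

def segments(pebbled):
    """Number of maximal runs of consecutive pebbled integers."""
    return sum(1 for v in pebbled if v - 1 not in pebbled)

def max_segments(order, pi):
    """Largest segment count seen in the domain or codomain along `order`."""
    domain, codomain, worst = set(), set(), 0
    for i in order:
        domain.add(i)
        codomain.add(pi[i])
        worst = max(worst, segments(domain), segments(codomain))
    return worst

def check(r, trials=200, seed=0):
    """True if every sampled pebbling order reaches at least q segments."""
    q, pi = pi_r(r)
    rng = random.Random(seed)
    orders = [list(range(1, r + 1))]          # the left-to-right order
    for _ in range(trials):
        order = list(range(1, r + 1))
        rng.shuffle(order)
        orders.append(order)
    return all(max_segments(order, pi) >= q for order in orders)
```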
4 The translation problem
In this section we consider some formulations of the
translation problem for PSCFG that have been pro-
posed in the literature. The most general definition
of the translation problem for PSCFG is this: for
an input PSCFG \(G_p = (G, p_G)\) and an input string
w, produce a representation of all possible parse
trees, along with their probabilities, that are assigned
by G to a string in the set T(G, w) under some translation
of w.
Variants of this definition can be found where the
input is a single parse tree for w (Yamada and
Knight, 2001), or where the output is a single parse
tree, chosen according to some specific criteria (Wu
and Wong, 1998). To formally study these problems,
in what follows we focus on single parse trees associated
with derivations in \(G_p\). For a derivation σ of
the form \([S^{(1)}, S^{(1)}] \Rightarrow_G^{\sigma} [w_1, w_2]\), we write
\(t_{\sigma,l}\) and \(t_{\sigma,r}\) to denote the left and the right parse trees,
respectively, associated with σ. The probability that
\(t_{\sigma,r}\) is obtained as a translation of \(t_{\sigma,l}\) through \(G_p\) is
thus \(p_G([t_{\sigma,l}, t_{\sigma,r}]) = p_G(\sigma)\). Let t be some parse
tree; we write y(t) to denote the string in the yield
of t. For a string \(w \in V_T^*\) and a parse tree t, we
also consider the probability that t is obtained from
w through \(G_p\), defined as:
\[
p_G([w, t]) = \sum_{y(t') = w} p_G([t', t]). \tag{2}
\]
We can now precisely define the variants of the
translation problem we are interested in. Given
as input a PSCFG Gp = (G,pG) and two strings
\(w_1, w_2 \in V_T^*\), output the pair of parse trees
\[
\operatorname*{argmax}_{y(t_1) = w_1,\ y(t_2) = w_2} p_G([t_1, t_2]). \tag{3}
\]
If the synchronous productions in the underlying
SCFG G have length bounded by some constant,
then the above problem can be solved in polynomial
time using extensions of the Viterbi search strategy
to parse forests. This has been shown for instance
in (Wu and Wong, 1998; Yamada and Knight, 2001;
Melamed, 2004).
A second interesting problem is defined as fol-
lows. Given as input a PSCFG Gp = (G,pG) and a
string \(w \in V_T^*\), output the parse tree
\[
\operatorname*{argmax}_{t} \; p_G([w, t]). \tag{4}
\]
Even in case we impose some constant bound on
the length of the synchronous productions in G, the
above problem is NP-hard, as we show in what fol-
lows.
We assume the reader is familiar with the defini-
tion of probabilistic context-free grammar (PCFG)
and with the associated notion of derivation prob-
ability (Wetherell, 1980). We denote a PCFG as
a pair (G,pG), with G = (VN,VT,P,S) the un-
derlying context-free grammar and pG the associ-
ated function providing the probability distributions
for the productions in P, conditioned on their left-
hand side. A probabilistic regular grammar (PRG)
is a PCFG with underlying productions of the form
A → aB or A → ε, with A,B nonterminal symbols
and a a terminal symbol.
We consider below a decision problem associated
with PRG, called the consensus problem, defined as
follows: Given as input a PRG (G,pG) and a ra-
tional number d ∈ [0,1], decide whether there ex-
ists a string w in the language generated by G such
that pG(w) ≥ d. It has been shown in (Casacuberta
and de la Higuera, 2000) that, for a PRG G whose
productions have all probabilities expressed by ra-
tional numbers, the above problem is NP-complete.
(Essentially the same result is also reported in (Lyngso
and Pedersen, 2002), stated in terms of hidden
Markov models.) We reduce the consensus problem
for PRG to a decision version of the problem in (4),
called the best translated derivation problem and
defined as follows. Given as input a PSCFG \(G_p = (G, p_G)\),
a string \(w \in V_T^*\) and a rational number
d ∈ [0,1], decide whether \(\max_t p_G([w, t]) \ge d\).
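To fix ideas about the consensus problem, the sketch below computes p_G(w) for a PRG by summing over all derivations of w, and then searches exhaustively over strings up to a given length. The toy grammar, the dictionary encoding and the length bound `max_len` are assumptions of this example; because of the bound, the search is not a decision procedure for the problem, and its running time is exponential in `max_len`.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy PRG. RULES[A] lists (a, B, prob) for productions A -> aB;
# STOP[A] is the probability of the production A -> epsilon.
RULES = {
    "S": [("a", "A", Fraction(1, 2)), ("a", "B", Fraction(3, 10))],
    "B": [("b", "B", Fraction(1, 2))],
}
STOP = {"S": Fraction(1, 5), "A": Fraction(1), "B": Fraction(1, 2)}

def string_probability(w, rules=RULES, stop=STOP, start="S"):
    """p_G(w): total probability of all derivations of w."""
    weights = {start: Fraction(1)}  # probability mass per current nonterminal
    for a in w:
        nxt = {}
        for A, mass in weights.items():
            for b, B, pr in rules.get(A, []):
                if b == a:
                    nxt[B] = nxt.get(B, Fraction(0)) + mass * pr
        weights = nxt
    return sum((mass * stop.get(A, Fraction(0)) for A, mass in weights.items()),
               Fraction(0))

def consensus_up_to(d, max_len, alphabet=("a", "b")):
    """True if some string of length <= max_len has probability >= d."""
    return any(string_probability(w) >= d
               for n in range(max_len + 1)
               for w in product(alphabet, repeat=n))
```

In the toy grammar the string a has two derivations, through A and through B, and it is their sum that is compared against d.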
Theorem 3 The best translated derivation problem
for the class PSCFG is NP-hard.
Proof. We provide a reduction from the consensus
problem for the class PRG with rational production
probabilities. The main idea is described in what follows.
Given the input PRG \(G_p\), we construct a target
PSCFG \(G'_p\) that translates string $ into $, with $ a
special symbol. Given as input the string $, \(G'_p\) simulates
all possible derivations of \(G_p\) through its own
derivations. This is done by encoding the nonterminals
appearing in a derivation ρ of \(G_p\) within the left
component of some derivation σ of \(G'_p\), and by encoding
the terminal string generated by ρ within the
right component of σ. The probability of ρ is also
preserved by σ.
Let \(G_p = (G, p_G)\), d be an instance of the consensus
problem as above, with G = (VN,VT,P,S).
We specify a PSCFG \(G'_p = (G', p_{G'})\) with
\(G' = (V'_N, \{\$\}, P', S)\) and \(V'_N = V_N \cup V_T\). Set P′ is
constructed as follows:
(i) for every (S → aA) ∈ P, s : [S → A(1), S →
a(1)] is added to P′, with \(p_{G'}(s) = p_G(S \to aA)\);
(ii) for every (S → ε) ∈ P, s : [S → $, S → $] is
added to P′, with \(p_{G'}(s) = p_G(S \to \varepsilon)\);
(iii) for every a ∈ VT and (A → bB) ∈ P, s :
[A → B(1), a → b(1)] is added to P′, with
\(p_{G'}(s) = p_G(A \to bB)\);
(iv) for every a ∈ VT and (A → ε) ∈ P,
s : [A → $, a → $] is added to P′, with
\(p_{G'}(s) = p_G(A \to \varepsilon)\).
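Items (i) to (iv) translate almost literally into code. The sketch below builds P′ for a small PRG and checks that the probabilities attached to each pair of left-hand sides sum to one. The toy grammar, the tuple encoding of productions and the function names are assumptions of this example.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy PRG. RULES[A] lists (a, B, prob) for productions A -> aB;
# STOP[A] is the probability of the production A -> epsilon.
RULES = {
    "S": [("a", "A", Fraction(1, 2)), ("a", "B", Fraction(3, 10))],
    "B": [("b", "B", Fraction(1, 2))],
}
STOP = {"S": Fraction(1, 5), "A": Fraction(1), "B": Fraction(1, 2)}
TERMINALS = ("a", "b")

def build_pscfg(rules, stop, terminals, start="S"):
    """Return P' as a dict mapping ((A1, rhs1), (A2, rhs2)) to a probability.

    A right-hand side is either the terminal "$" or a single indexed
    nonterminal, written as the tuple (X, 1).
    """
    P = {}
    for A, outs in rules.items():
        for b, B, pr in outs:
            if A == start:
                P[((start, (B, 1)), (start, (b, 1)))] = pr   # item (i)
            for a in terminals:
                P[((A, (B, 1)), (a, (b, 1)))] = pr           # item (iii)
    for A, pr in stop.items():
        if A == start:
            P[((start, "$"), (start, "$"))] = pr             # item (ii)
        for a in terminals:
            P[((A, "$"), (a, "$"))] = pr                     # item (iv)
    return P

def is_proper(P):
    """Probabilities sum to 1 for every pair (A1, A2) that heads a production."""
    totals = defaultdict(Fraction)
    for ((A1, _), (A2, _)), pr in P.items():
        totals[(A1, A2)] += pr
    return all(total == 1 for total in totals.values())

P_prime = build_pscfg(RULES, STOP, TERMINALS)
```

Each production of the PRG is copied once per terminal symbol in items (iii) and (iv), so the number of synchronous productions is bounded by the product of the number of productions and the number of terminals.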
Note that the construction of \(G'_p\) can be carried out
in quadratic time in the size of \(G_p\). It is not difficult
to see that there exists a derivation of the form
\(S \Rightarrow_G a_1 A_1 \Rightarrow_G a_1 a_2 A_2 \cdots \Rightarrow_G a_1 a_2 \cdots a_n A_n\)
if and only if there exists a derivation in G′ associated
with unary trees \(t_1\) and \(t_2\), such that string
\(S A_1 A_2 \cdots A_n\) is read from the spine of \(t_1\) and string
\(S a_1 a_2 \cdots a_n\) is read from the spine of \(t_2\). Furthermore,
the two derivations are composed of ‘corresponding’
productions with the same probabilities.
We conclude that there exists a string w in L(G)
with \(p_G(w) \ge d\) if and only if there exists a unary
tree t with string Sw$ read from the spine such that
\(p_{G'}([\$, t]) \ge d\).
We discuss below an interesting consequence of
Theorem 3. The SDTS formalism discussed in Section 2
has been extended to the probabilistic case
in (Maryanski and Thomason, 1979), where it is called
stochastic SDTS (SSDTS). As a corollary to the proof of
Theorem 3, we obtain that one can define, through
some PSCFG \(G_p\) and some fixed string w, a probability
distribution \(p_G([w, t])\) on parse trees that cannot
be obtained through any SSDTS. Without providing
the details of the definition of SSDTS, we
give here only an outline of the proof. We also assume
that the reader is familiar with probabilistic
finite automata and with their distributional equivalence
with PRG.
Consider the PSCFG \(G'_p = (G', p_{G'})\) defined in
the proof of Theorem 3, and assume there exists
some SSDTS \(G''_p = (G'', p_{G''})\) such that, for every
tree t, we have \(p_{G''}([\$, t]) = p_{G'}([\$, t])\). Since in a
derivation of an SDTS the generated trees are always
isomorphic, up to some reordering of sibling nodes,
we obtain that the productions of G″ must have the
form [S → a(1), S → a(1)], [a → b(1), a → b(1)]
and [a → $, a → $]. From these productions we
can construct a probabilistic deterministic finite automaton
generating the same language as the PRG
\(G_p\), and with the same distribution. But this is impossible
since there are string distributions defined
by some PRG that cannot be obtained through probabilistic
deterministic finite automata; see for instance
(Vidal et al., 2005).
We conclude by remarking that in (Casacuberta
and de la Higuera, 2000) it is shown that finding
the best output string for a given input string is NP-
hard for stochastic SDTS with a single nonterminal
in each production’s right-hand side. Our result in
Theorem 3, stated for PSCFG, is stronger, since it investigates
individual parse trees rather than strings.
5 Concluding remarks
The presented results are based on worst case analysis:
further experimental evaluation needs to be carried
out on multilingual corpora in order to assess the
practical impact of these findings.
Acknowledgment
We are indebted to Dan Melamed and Mark-Jan
Nederhof for technical discussion on topics related
to this paper. Dan Melamed also suggested to us the
problem investigated by Theorem 2. The first author
is partially supported by MIUR under project PRIN
No. 2003091149 005.
References
A. V. Aho and J. D. Ullman. 1972. The Theory of Pars-
ing, Translation and Compiling, volume 1. Prentice-
Hall, Englewood Cliffs, NJ.
Hiyan Alshawi, Srinivas Bangalore, and Shona Douglas.
2000. Learning dependency translation models as col-
lections of finite state head transducers. Computa-
tional Linguistics, 26(1):45–60, March.
Peter F. Brown, John Cocke, Stephen A. Della Pietra,
Vincent J. Della Pietra, Frederick Jelinek, Robert L.
Mercer, and Paul Roossin. 1988. A statistical ap-
proach to language translation. In Proceedings of
the International Conference on Computational Lin-
guistics (COLING) 1988, pages 71–76, Budapest,
Hungary, August.
F. Casacuberta and C. de la Higuera. 2000. Computa-
tional complexity of problems on probabilistic gram-
mars and transducers. In L. Oliveira, editor, Gram-
matical Inference: Algorithms and Applications; 5th
International Colloquium, ICGI 2000, pages 15–24.
Springer.
D. Chiang. 2004. Evaluating Grammar Formalisms for
Applications to Natural Language Processing and Biological
Sequence Analysis. Ph.D. thesis, Department
of Computer and Information Science, University of
Pennsylvania.
D. Chiang. 2005. A hierarchical phrase-based model for
statistical machine translation. In Proc. of the 43rd
ACL, pages 263–270.
M. R. Garey and D. S. Johnson. 1979. Computers and
Intractability. Freeman and Co., New York, NY.
Daniel Gildea. 2003. Loosely tree-based alignment for
machine translation. In Proceedings of the 41st Annual
Meeting of the Association for Computational
Linguistics (ACL), Sapporo, Japan, July.
Y. Kaji et al. 1994. The computational complexity of
the universal recognition problem for parallel multiple
context-free grammars. Computational Intelligence,
10(4):440–452.
Kevin Knight. 1999. Decoding complexity in word-
replacement translation models. Computational Lin-
guistics, Squibs and Discussion, 25(4).
S. Kumar and W. Byrne. 2003. A weighted finite state
transducer implementation of the alignment template
model for statistical machine translation. In Proceed-
ings of HLT-NAACL.
R. B. Lyngso and C. N. S. Pedersen. 2002. The consensus
string problem and the complexity of comparing
hidden Markov models. Journal of Computer and
System Sciences, 65:545–569.
Fred J. Maryanski and Michael G. Thomason. 1979.
Properties of stochastic syntax-directed translation
schemata. International Journal of Computer and In-
formation Sciences, 8(2):89–110.
I. Dan Melamed, Giorgio Satta, and Benjamin Welling-
ton. 2004. Generalized multitext grammars. In Pro-
ceedings of the 42nd Annual Meeting of the Associa-
tion for Computational Linguistics (ACL), Barcelona,
Spain.
I. Dan Melamed. 2003. Multitext grammars and syn-
chronous parsers. In Proceedings of the Human Lan-
guage Technology Conference and the North Ameri-
can Association for Computational Linguistics (HLT-
NAACL), pages 158–165, Edmonton, Canada.
I. Dan Melamed. 2004. Statistical machine translation
by parsing. In Proceedings of the 42nd Annual Meet-
ing of the Association for Computational Linguistics
(ACL), Barcelona, Spain.
Franz Josef Och and Hermann Ney. 2002. Discrimina-
tive training and maximum entropy models for statis-
tical machine translation. In Proceedings of the 40th
Annual Meeting of the Association for Computational
Linguistics (ACL), Philadelphia, July.
Franz Josef Och, Christoph Tillmann, and Hermann Ney.
1999. Improved alignment models for statistical machine
translation. In Proceedings of the 4th Conference
on Empirical Methods in Natural Language Processing
(EMNLP), pages 20–28, College Park, Maryland.
G. Satta. 1992. Recognition of linear context-free rewrit-
ing systems. In Proc. of the 30th ACL, Newark,
Delaware.
E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta,
and R. C. Carrasco. 2005. Probabilistic finite-state
machines – Part I. IEEE Trans. on Pattern Analysis
and Machine Intelligence. To appear.
C. S. Wetherell. 1980. Probabilistic languages: A re-
view and some open questions. Computing Surveys,
12(4):361–379.
Dekai Wu and Hongsing Wong. 1998. Machine trans-
lation with a stochastic grammatical channel. In Pro-
ceedings of the 35th Annual Meeting of the Associa-
tion for Computational Linguistics (ACL), Montreal,
Canada, July.
Dekai Wu. 1997. Stochastic inversion transduction
grammars and bilingual parsing of parallel corpora.
Computational Linguistics, 23(3):377–404, Septem-
ber.
Kenji Yamada and Kevin Knight. 2001. A syntax-based
statistical translation model. In Proceedings of the
39th Annual Meeting of the Association for Compu-
tational Linguistics (ACL), pages 531–538, Toulouse,
July.
