Left-To-Right Parsing 
and Bilexical Context-Free Grammars 
Mark-Jan Nederhof 
DFKI 
Stuhlsatzenhausweg 3 
D-66123 Saarbrücken
Germany
nederhof@dfki.de
Giorgio Satta 
Dipartimento di Elettronica e Informatica 
Università di Padova
via Gradenigo, 6/A
I-35131 Padova, Italy
satta@dei.unipd.it
Abstract 
We compare the asymptotic time complexity of
left-to-right and bidirectional parsing techniques for
bilexical context-free grammars, a grammar formalism
that is an abstraction of language models used in
several state-of-the-art real-world parsers. We provide
evidence that left-to-right parsing cannot be realised
within acceptable time bounds if the so-called
correct-prefix property is to be ensured. Our evidence
is based on complexity results for the representation
of regular languages.
1 Introduction 
Traditionally, algorithms for natural language parsing
process the input string strictly from left to right.
In contrast, several algorithms have been proposed 
in the literature that process the input in a bidi- 
rectional fashion; see (van Noord, 1997; Satta and 
Stock, 1994) and references therein. The issue of 
parsing efficiency for left-to-right vs. bidirectional 
methods has long been debated. On the basis of
experimental results, it has been argued that the 
choice of the most favourable strategy should depend 
on the grammar at hand. With respect to grammar 
formalisms based upon context-free grammars, and 
when the rules of these formalisms strongly depend 
on lexical information, (van Noord, 1997) shows that 
bidirectional strategies are more efficient than left- 
to-right strategies. This is because bidirectional 
strategies are most effective in reducing the parsing 
search space, by activating as early as possible the 
maximum number of lexical constraints available in 
the grammar. 
In this paper we present mathematical arguments 
in support of the above empirically motivated the- 
sis. We investigate a class of lexicalized grammars 
that, in their probabilistic versions, have been widely 
adopted as language models in state-of-the-art real-
world parsers. The size of these grammars usually
grows with the square of the size of the working lex- 
icon, and thus can be very large. In these cases, the 
primary goal in the design of a parsing algorithm 
is to achieve asymptotic time performance sublinear 
in the size of the working grammar and indepen- 
dent of the size of the lexicon. These desiderata are 
met by existing bidirectional algorithms (Alshawi, 
1996; Eisner, 1997; Eisner and Satta, 1999). In con- 
trast, we show the following two main results for 
the asymptotic time performance of left-to-right al- 
gorithms satisfying the so called correct-prefix prop- 
erty. 
• In case off-line compilation of the working gram- 
mar is not allowed, left-to-right parsing cannot 
be realised within time bounds independent of 
the size of the lexicon. 
• In case polynomial-time, off-line compilation of 
the working grammar is allowed, left-to-right 
parsing cannot be realised in polynomial time, 
and independently of the size of the lexicon, unless
a strong conjecture based on complexity results
for the representation of regular languages
is falsified.
The first result implies that the well-known Earley
algorithm and related standard parsing techniques
that do not require grammar precompilation cannot
be directly extended to process the above-mentioned
grammars (resp. language models) within an
acceptable time bound. The second result provides
evidence that well-known parsing techniques such as
left-corner parsing, which require polynomial-time
preprocessing of the grammar, also cannot be directly
extended to process these formalisms within an
acceptable time bound.
The grammar formalisms we investigate are based
upon context-free grammars and are called bilexical
context-free grammars. Bilexical context-free
grammars have been presented in (Eisner and Satta,
1999) as an abstraction of language models that have
been adopted in several recent real-world parsers,
improving state-of-the-art parsing accuracy (Alshawi,
1996; Eisner, 1996; Charniak, 1997; Collins,
1997). Our results directly transfer to all these
language models. In a bilexical context-free grammar,
possible arguments of a word are always specified
along with possible head words for those arguments.
Therefore a bilexical grammar requires the grammar
writer to make stipulations about the compatibility
of particular pairs of words in particular roles,
something that was not necessarily true of general
context-free grammars.
The remainder of this paper is organized as fol- 
lows. We introduce bilexical context-free grammars 
in Section 2, and discuss parsing with the correct- 
prefix property in Section 3. Our results for parsing 
with on-line and off-line grammar compilation are 
presented in Sections 4 and 5, respectively. To com- 
plete the presentation, Appendix A shows that left- 
to-right parsing in time independent of the size of 
the lexicon is indeed possible when an off-line com- 
pilation of the working grammar is allowed that has 
an exponential time complexity. 
2 Bilexical context-free grammars 
In this section we introduce the grammar formalism 
we investigate in this paper. This formalism, originally
presented in (Eisner and Satta, 1999), is an abstraction
of the language models adopted by several
state-of-the-art real-world parsers (see Section 1).
We specify a non-stochastic version of the formal- 
ism, noting that probabilities may be attached to 
the rewrite rules exactly as in stochastic CFG (Gon- 
zales and Thomason, 1978; Wetherell, 1980). We 
assume that the reader is familiar with context-free 
grammars. Here we follow the notation of (Harrison, 
1978; Hopcroft and Ullman, 1979). 
A context-free grammar (CFG) is a tuple
$G = (V_N, V_T, P, S)$, where $V_N$ and $V_T$ are finite, disjoint
sets of nonterminal and terminal symbols, respectively,
$S \in V_N$ is the start symbol, and $P$ is a finite
set of productions having the form $A \to \alpha$, where
$A \in V_N$ and $\alpha \in (V_N \cup V_T)^*$. A "derives" relation,
written $\Rightarrow$, is associated with a CFG as usual. We
use the reflexive and transitive closure of $\Rightarrow$,
written $\Rightarrow^*$, and define $L(G)$ accordingly. The size of a
CFG $G$ is defined as $|G| = \sum_{(A \to \alpha) \in P} |A\alpha|$. If every
production in $P$ has the form $A \to B\,C$ or $A \to a$,
for $A, B, C \in V_N$, $a \in V_T$, then $G$ is said to be in
Chomsky Normal Form (CNF).
A CFG $G = (V_N, V_T, P, S[\$])$ in CNF is called a
bilexical context-free grammar if there exists a
set $V_D$, called the set of delexicalized nonterminals,
such that nonterminals from $V_N$ are of the form
$A[a]$, consisting of $A \in V_D$ and $a \in V_T$, and every
production in $P$ has one of the following two forms:
(i) $A[a] \to B[b]\ C[c]$, $a \in \{b, c\}$;
(ii) $A[a] \to a$.
A nonterminal $A[a]$ is said to have terminal symbol
$a$ as its lexical head. Note that in a parse tree for
$G$, the lexical head of a nonterminal is always
"inherited" from some daughter symbol (i.e., from some
symbol in the right-hand side of a production). In
the sequel, we also refer to the set $V_T$ as the lexicon
of the grammar.
A bilexical CFG can encode lexically specific
preferences in the form of binary relations on lexical
items. For instance, one might specify $P$ so as
to contain the production VP[solve] $\to$ V[solve]
NP[puzzles] but not the production VP[eat] $\to$
V[eat] NP[puzzles]. This will allow derivation of
some VP constituents such as "solve two puzzles",
while forbidding "eat two puzzles". See (Eisner and
Satta, 1999) for further discussion.
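The two production forms and the solve/eat example can be made concrete with a small sketch. The representation below (the `NT` tuple, the toy set `P`, and the helper names `is_bilexical` and `licensed`) is purely illustrative and not part of the formalism's original presentation.

```python
from typing import NamedTuple

class NT(NamedTuple):
    """A lexicalized nonterminal A[a]: delexicalized symbol plus lexical head."""
    sym: str
    head: str

def is_bilexical(lhs, rhs):
    """True if lhs -> rhs has form (i) A[a] -> B[b] C[c] with a in {b, c},
    or form (ii) A[a] -> a."""
    if len(rhs) == 2 and all(isinstance(x, NT) for x in rhs):
        return lhs.head in (rhs[0].head, rhs[1].head)
    if len(rhs) == 1 and isinstance(rhs[0], str):
        return rhs[0] == lhs.head
    return False

# Hypothetical toy production set P: it contains the "solve" production
# but deliberately omits the corresponding "eat" production.
P = {
    (NT("VP", "solve"), (NT("V", "solve"), NT("NP", "puzzles"))),
    (NT("V", "solve"), ("solve",)),
    (NT("V", "eat"), ("eat",)),
    (NT("NP", "puzzles"), ("puzzles",)),
}

def licensed(lhs, rhs):
    """A production is usable only if the grammar writer put it in P."""
    return (lhs, rhs) in P

solve = (NT("VP", "solve"), (NT("V", "solve"), NT("NP", "puzzles")))
eat = (NT("VP", "eat"), (NT("V", "eat"), NT("NP", "puzzles")))
print(licensed(*solve), licensed(*eat))  # prints: True False
```

Note that the "eat" production is well-formed as a bilexical production; it is excluded only because the grammar writer chose not to stipulate it, which is exactly the kind of word-pair preference the formalism is meant to express.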
The cost of this expressiveness is a very large
grammar. Indeed, we have $|G| = O(|V_D|^3 \cdot |V_T|^2)$,
and in practical applications $|V_T| \gg |V_D| > 1$. Thus,
the grammar size is dominated in its growth by the 
square of the size of the working lexicon. Even if we 
conveniently group lexical items with distributional 
similarities into the same category, in practical ap- 
plications the resulting grammar might have several 
thousand productions. Parsing strategies that can- 
not work in sublinear time with respect to the size of 
the lexicon and with respect to the size of the whole 
input grammar are very inefficient in these cases. 
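The bound on $|G|$ can be checked by a direct count. The following is a sketch under the worst-case assumption that every production allowed by forms (i) and (ii) is present in $P$; the count is not spelled out in the text above. For form (i), one chooses $A, B, C \in V_D$ and a pair $(b, c) \in V_T^2$; the head $a$ can be chosen in two ways when $b \neq c$ and in one way when $b = c$.

```latex
\begin{align*}
\#\{\text{productions of form (i)}\}
  &\le |V_D|^3 \cdot \bigl(2\,|V_T|^2 - |V_T|\bigr), \\
\#\{\text{productions of form (ii)}\}
  &\le |V_D| \cdot |V_T|, \\
|G| = \sum_{(A \to \alpha) \in P} |A\alpha|
  &\le 3\,|V_D|^3 \bigl(2\,|V_T|^2 - |V_T|\bigr) + 2\,|V_D|\,|V_T|
   = O\bigl(|V_D|^3 \cdot |V_T|^2\bigr).
\end{align*}
```

The factors 3 and 2 are the values of $|A\alpha|$ for productions of form (i) and (ii), respectively.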
3 Correct-prefix property 
So-called left-to-right strategies are standardly
adopted in algorithms for natural language parsing.
Although intuitive, the notion of left-to-right
parsing is a concept with no precise mathematical 
meaning. Note that in fact, in a pathological way, 
one could read the input string from left-to-right, 
storing it into some data structure, and then per- 
form syntactic analysis with a non-left-to-right strat- 
egy. In this paper we focus on a precise definition 
of left-to-right parsing, known in the literature as 
correct-prefix property parsing (Sippu and
Soisalon-Soininen, 1990). Several algorithms commonly used
in natural language parsing satisfy this property, as
for instance Earley's algorithm (Earley, 1970),
tabular left-corner and PLR parsing (Nederhof, 1994)
and tabular LR parsing (Tomita, 1986).
Let $V_T$ be some alphabet. A generic string over $V_T$
is denoted as $w = a_1 \cdots a_n$, with $n \geq 0$ and $a_i \in V_T$
($1 \leq i \leq n$); in case $n = 0$, $w$ equals the empty
string $\varepsilon$. For integers $i$ and $j$ with $1 \leq i \leq j \leq n$, we
write $w[i,j]$ to denote string $a_i a_{i+1} \cdots a_j$; if $i > j$,
we define $w[i,j] = \varepsilon$.
Let $G = (V_N, V_T, P, S)$ be a CFG and let
$w = a_1 \cdots a_n$ with $n \geq 0$ be some string over $V_T$. A
recognizer for the CFG class is an algorithm $R$ that,
on input $(G, w)$, decides whether $w \in L(G)$. We
say that $R$ satisfies the correct-prefix property
(CPP) if the following condition holds. Algorithm
$R$ processes the input string from left to right,
"consuming" one symbol $a_i$ at a time. If for some $i$,
$0 \leq i \leq n$, the set of derivations in $G$ having the
form $S \Rightarrow^* w[1,i]\gamma$, $\gamma \in (V_N \cup V_T)^*$, is empty, then
$R$ rejects and halts, and it does so before consuming
symbol $a_{i+1}$, if $i < n$. In this case, we say that $R$
has detected an error at position $i$ in $w$. Note that
the above property forces the recognizer to do relevant
computation for each terminal symbol that is
consumed.
We say that $w[1,i]$ is a correct-prefix for a language
$L$ if there exists a string $z$ such that $w[1,i]z \in L$.
In the natural language parsing literature, the
CPP is sometimes defined with the following condition
in place of the above. If for some $i$, $0 \leq i \leq n$,
$w[1,i]$ is not a correct-prefix for $L(G)$, then $R$ rejects
and halts, and it does so before consuming symbol
$a_{i+1}$, if $i < n$. Note that the latter definition asks
for a stronger condition, and the two definitions are
equivalent only in case the input grammar $G$ is
reduced.¹ While the above-mentioned parsing algorithms
satisfy the former definition of CPP, they do
not satisfy the latter. Actually, we are not aware of
any practically used parsing algorithm that satisfies
the latter definition of CPP.
One needs to distinguish CPP parsing from some
well known parsing algorithms in the literature that 
process symbols in the right-hand sides of each gram- 
mar production from left to right, but that do not 
exhibit any left-to-right dependency between differ- 
ent productions. In particular, processing of the 
right-hand side of some production may be initi- 
ated at some input position without consultation of 
productions or parts of productions that may have 
been found to cover parts of the input to the left 
of that position. These algorithms may also consult 
input symbols from left to right, but the processing 
that takes place to the right of some position i does 
not strictly depend on the processing that has taken 
place to the left of i. Examples are pure bottom-up 
methods, such as left-corner parsing without top- 
down filtering (Wiren, 1987). 
Algorithms that do satisfy the CPP make use of 
some form of top-down prediction. Top-down pre- 
diction can be implemented at parse-time as in the 
case of Earley's algorithm by means of the "predic- 
tor" step, or can be precompiled, as in the case of 
left-corner parsing (Rosenkrantz and Lewis, 1970), 
by means of the left-corner relation, or as in the case 
of LR parsers (Sippu and Soisalon-Soininen, 1990), 
through the closure function used in the construc- 
tion of LR states. 
4 Recognition without 
precompilation 
In this section we consider recognition algorithms
that do not require off-line compilation of the input
grammar. Among algorithms that satisfy the CPP,
the most popular example of a recognizer that does
not require grammar precompilation is perhaps
Earley's algorithm (Earley, 1970). We show here that
methods in this family cannot be extended to work
in time independent of the size of the lexicon, in
contrast with bidirectional recognition algorithms.

¹ A context-free grammar $G$ is reduced if every nonterminal
of $G$ can be part of at least one derivation that rewrites the
start symbol into some string of terminal symbols.

The result presented below rests on the following,
quite obvious, assumption. There exists a constant
$c$, depending on the underlying computation
model, such that in $k > 0$ elementary computation
steps any recognizer can only read up to $c \cdot k$
productions from set $P$. In what follows, and without
any loss of generality, we assume $c = 1$. Apart from
this assumption, no other restriction is imposed on
the representation of the input grammar or on the
access to the elements of sets $V_N$, $V_T$ and $P$.
Theorem 1 Let $f$ be any function of two variables
defined on natural numbers. No recognizer for bilexical
context-free grammars that satisfies the CPP can
run on input $(G, w)$ in an amount of time bounded
by $f(|V_D|, |w|)$, where $V_D$ is the set of delexicalized
nonterminals of $G$.
Proof. Assume the existence of a recognizer $R$
satisfying the CPP and running in $f(|V_D|, |w|)$ steps or
less. We show how to derive a contradiction.
Let $q \geq 1$ be an integer. Define a bilexical CFG
$G_q = (V_N^q, V_T^q, P_q, A[b_1])$ where $V_T^q$ contains $q + 2$
distinct symbols $\{b_1, \ldots, b_{q+2}\}$ and
$V_N^q = \{A[b_i] \mid 1 \leq i \leq q+1\} \cup \{T[b] \mid b \in V_T^q\}$,
and where set $P_q$ contains all and only the following
productions:
(i) $A[b_i] \to A[b_{i+1}]\ T[b_i]$, $1 \leq i \leq q$;
(ii) $A[b_{q+1}] \to T[b_{q+2}]\ T[b_{q+1}]$;
(iii) $T[b] \to b$, $b \in V_T^q$.
Productions in (i) are called bridging productions.
Note that there are $q$ bridging productions in $G_q$.
Also, note that $V_D^q = \{A, T\}$ does not depend on
the choice of $q$. Thus, we will simply write $V_D$.
Choose $q > \max\{f(|V_D|, 2), 1\}$. On input
$(G_q, b_{q+2} b_{q+1})$, $R$ does not detect any error at
position 1, that is after having read the first symbol $b_{q+2}$
of the input string. This is because $A[b_1] \Rightarrow^* b_{q+2}\gamma$
with $\gamma = T[b_{q+1}]\, T[b_q]\, T[b_{q-1}] \cdots T[b_1]$ is a valid
derivation in $G_q$. Since $R$ executes no more than
$f(|V_D|, 2)$ steps, from our assumption that reading
a production takes unit time it follows that there
must be an integer $k$, $1 \leq k \leq q$, such that bridging
production $A[b_k] \to A[b_{k+1}]\ T[b_k]$ is not read from
$G_q$. Construct then a new grammar $G'_q$ by replacing
in $G_q$ the production $A[b_k] \to A[b_{k+1}]\ T[b_k]$ with
the new production $A[b_k] \to T[b_k]\ A[b_{k+1}]$, leaving
everything else unchanged. It follows that, on
input $(G'_q, b_{q+2} b_{q+1})$, $R$ behaves exactly as before and
does not detect any error at position 1. But this is
a contradiction, since there is no derivation in $G'_q$ of
the form $A[b_1] \Rightarrow^* b_{q+2}\gamma$, $\gamma \in (V_N \cup V_T)^*$, as can be
easily verified. ∎
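The final step of the proof ("as can be easily verified") can be checked mechanically on small instances. The sketch below builds the productions of the grammar from the proof, optionally swapping the right-hand side of one bridging production, and computes which terminals can begin a string derived from the start symbol. The encoding (pairs such as `("A", i)` for $A[b_i]$, integers `i` for terminals $b_i$) and the function names are assumptions made for illustration only.

```python
def build_gq(q, flipped=None):
    """Productions of the grammar from the proof, as dict: lhs -> list of rhs.
    If flipped = k, bridging production k gets its right-hand side swapped."""
    prods = {}
    for i in range(1, q + 1):                      # (i) bridging productions
        rhs = (("A", i + 1), ("T", i))
        if i == flipped:
            rhs = (("T", i), ("A", i + 1))
        prods[("A", i)] = [rhs]
    prods[("A", q + 1)] = [(("T", q + 2), ("T", q + 1))]   # (ii)
    for i in range(1, q + 3):                      # (iii) T[b_i] -> b_i
        prods[("T", i)] = [(i,)]
    return prods

def first_terminals(prods, start):
    """Terminals b such that start =>* b gamma. Valid here because the grammar
    is in CNF and every nonterminal derives some terminal string."""
    seen, stack, firsts = set(), [start], set()
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        for rhs in prods[x]:
            if len(rhs) == 1:
                firsts.add(rhs[0])       # rhs is a single terminal
            else:
                stack.append(rhs[0])     # follow the leftmost nonterminal
    return firsts

q = 5
print(first_terminals(build_gq(q), ("A", 1)))      # {7}, i.e. b_{q+2}
for k in range(1, q + 1):
    # After swapping bridging production k, only b_k can come first.
    print(k, first_terminals(build_gq(q, flipped=k), ("A", 1)))
```

In every modified grammar the only possible first terminal is $b_k$ with $k \leq q$, so no derivation can start with $b_{q+2}$, which is the fact the proof relies on.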
We can use the above result in the comparison 
of left-to-right and bidirectional recognizers. The 
recognition of bilexical context-free languages can
be carried out by existing bidirectional algorithms
in time independent of the size of the lexicon and
without any precompilation of the input bilexical
grammar. For instance, the algorithms presented
in (Eisner and Satta, 1999) allow recognition in time
$O(|V_D|^3 |w|^4)$.² Theorem 1 states that this time
bound cannot be met if we require the CPP and if 
the input grammar is not precompiled. In the next 
section, we will consider the possibility that the in- 
put grammar is in a precompiled form. 
5 Recognition with precompilation 
In this section we consider recognition algorithms 
that satisfy the CPP and allow off-line, polynomial- 
time compilation of the working grammar. We focus 
on a class of bilexical context-free grammars where 
recognition requires the stacking of a number of un- 
resolved lexical dependencies that is proportional to 
the length of the input string. We provide evidence 
that the above class of recognizers perform much less 
efficiently for these grammars than existing bidirec- 
tional recognizers. 
We assume that the reader is familiar with the
notions of deterministic and nondeterministic finite
automata. We follow here the notation in (Hopcroft
and Ullman, 1979). A nondeterministic finite
automaton (FA) is a tuple $M = (Q, \Sigma, \delta, q_0, F)$, where
$Q$ and $\Sigma$ are finite, disjoint sets of state and alphabet
symbols, respectively, $q_0 \in Q$ and $F \subseteq Q$ are the
initial state and the set of final states, respectively, and
$\delta$ is a total function mapping $Q \times \Sigma$ to $2^Q$, the
power-set of $Q$. Function $\delta$ represents the transitions of the
automaton. Given a string $w = a_1 \cdots a_n$, $n \geq 0$, an
accepting computation in $M$ for $w$ is a sequence
$q_0, a_1, q_1, a_2, q_2, \ldots, a_n, q_n$ such that $q_i \in \delta(q_{i-1}, a_i)$
for $1 \leq i \leq n$, and $q_n \in F$. The language $L(M)$ is
the set of all strings in $\Sigma^*$ that admit at least one
accepting computation in $M$. The size of $M$ is
defined as $|M| = \sum_{q \in Q, a \in \Sigma} |\delta(q, a)|$. The automaton
$M$ is deterministic if, for every $q \in Q$ and $a \in \Sigma$, we
have $|\delta(q, a)| = 1$.
We call quasi-determinizer any algorithm $A$
that satisfies the following two conditions:
1. $A$ takes as input a nondeterministic FA
$M = (Q, \Sigma, \delta, q_0, F)$ and produces as output a device
$D_M$ that, when given a string $w$ as input, decides
whether $w \in L(M)$; and
2. there exists a polynomial $p_A$ such that every
$D_M$ runs in an amount of time bounded by
$p_A(|w|)$.

² More precisely, the running time for these algorithms is
$O(|V_D|^3 |w|^3 \min\{|V_T|, |w|\})$. In cases of practical interest,
we always have $|w| < |V_T|$.

We remark that, given a nondeterministic FA $M$
specified as above, known algorithms allow simulation
of $M$ on an input string $w$ in time $O(|M| \cdot |w|)$
(see for instance (Aho et al., 1974, Thm. 9.5)
or (Sippu and Soisalon-Soininen, 1988, Thm. 3.38)).
In contrast, a quasi-determinizer produces a device
that simulates $M$ in an amount of time independent
of the size of $M$ itself.
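The direct simulation mentioned above can be sketched by tracking the set of states that are active after each input symbol. This is a generic illustration, not the specific algorithm of the cited textbooks; the dictionary encoding of $\delta$ (missing keys standing for the empty set) and the example automaton are assumptions.

```python
def nfa_accepts(delta, q0, finals, w):
    """Simulate a nondeterministic FA on w by maintaining the set of active
    states. delta maps (state, symbol) to a set of states; a missing key
    stands for the empty set. Each input symbol inspects every transition
    at most once, so the work per symbol is bounded by the size of M."""
    active = {q0}
    for a in w:
        active = {r for q in active for r in delta.get((q, a), ())}
        if not active:
            return False        # no computation can continue
    return bool(active & finals)

# Example automaton (an assumption for illustration): strings over {a, b}
# whose second-to-last symbol is "a".
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
finals = {2}
print(nfa_accepts(delta, 0, finals, "ab"), nfa_accepts(delta, 0, finals, "ba"))
# prints: True False
```

The running time of this procedure grows with the number of transitions of the automaton, which is precisely the dependence that a quasi-determinizer is required to remove.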
A standard example of a quasi-determinizer is the
so-called power-set construction, used to convert
a nondeterministic FA into a language-equivalent
deterministic FA (see for instance (Hopcroft and
Ullman, 1979, Thm. 2.1) or (Sippu and
Soisalon-Soininen, 1988, Thm. 3.30)). In fact, there exist
constants $c$ and $d$ such that any deterministic FA
can be simulated on input string $w$ in an amount of
time bounded by $c\,|w| + d$. This requires function
$\delta$ to be stored as a $|Q| \times |\Sigma|$, 2-dimensional array with
values in $Q$. This is a standard representation for
automata-like structures; see (Gusfield, 1997, Sect.
6.5) for discussion.
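A minimal sketch of the power-set construction and of the array-based simulation follows. It builds only the subsets reachable from the initial state, and it assumes that alphabet symbols are the integers 0, 1, ... so that the table can be indexed directly. The example family of automata (strings whose n-th symbol from the end is 0) is a textbook illustration of exponential growth chosen here for convenience; it is not taken from the text above.

```python
def nth_from_end_nfa(n):
    """NFA with n + 1 states accepting the strings over {0, 1} whose
    n-th symbol from the end is 0. Returns (delta, q0, finals)."""
    delta = {(0, 0): {0, 1}, (0, 1): {0}}
    for i in range(1, n):
        delta[(i, 0)] = {i + 1}
        delta[(i, 1)] = {i + 1}
    return delta, 0, {n}

def determinize(num_symbols, delta, q0, finals):
    """Power-set construction restricted to reachable subsets.
    Returns (table, accepting): table[s][a] is the successor of
    deterministic state s on symbol a; state 0 is the initial state."""
    index = {frozenset([q0]): 0}
    order = [frozenset([q0])]
    table = []
    for subset in order:                 # 'order' grows while we iterate
        row = []
        for a in range(num_symbols):
            target = frozenset(r for q in subset for r in delta.get((q, a), ()))
            if target not in index:
                index[target] = len(order)
                order.append(target)
            row.append(index[target])
        table.append(row)
    accepting = [bool(s & finals) for s in order]
    return table, accepting

def dfa_accepts(table, accepting, w):
    """One array lookup per input symbol: time c*|w| + d."""
    state = 0
    for a in w:
        state = table[state][a]
    return accepting[state]

for n in range(1, 7):
    table, accepting = determinize(2, *nth_from_end_nfa(n))
    print(n + 1, "NFA states ->", len(table), "DFA states")
```

On this family the number of reachable deterministic states doubles with each additional state of the nondeterministic automaton, while `dfa_accepts` performs the same amount of work per input symbol regardless of the table size.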
We now pose the question of the time efficiency
of a quasi-determinizer, and consider the amount of
time needed in the construction of $D_M$. In (Meyer
and Fisher, 1971; Stearns and Hunt, 1981) it is
shown that there exist (infinitely many) nondeterministic
FAs with state set $Q$, such that any
language-equivalent deterministic FA must have at
least $2^{|Q|}$ states. This means that the power-set
construction cannot work in polynomial time in the size
of the input FA. Despite much effort, no algorithm
has been found, to the authors' knowledge,
that can simulate a nondeterministic FA on an input
string $w$ in linear time in $|w|$ and independently of
$|M|$, if only polynomial-time precompilation of $M$
is allowed. Even in case we relax the linear-time
restriction and consider recognition of $w$ in polynomial
time, for some fixed polynomial, it seems unlikely
that the problem can be solved if only polynomial-time
precompilation of $M$ is allowed. Furthermore,
if we consider precompilation of nondeterministic
FAs into "partially determinized" FAs that would
allow recognition in polynomial (or even exponential)
time in $|w|$, it seems unlikely that the analysis
required for this precompilation could consider fewer
than exponentially many combinations of states that
may be active at the same time for the original
nondeterministic FA. Finally, although more powerful
formalisms have been shown to represent some regular
languages much more succinctly than FAs (Meyer
and Fisher, 1971), while allowing polynomial-time
parsing, it seems unlikely that this could hold for
regular languages in general.
Conjecture There is no quasi-determinizer that 
works in polynomial time in the size of the input 
automaton. 
Before turning to our main result, we need to
develop some additional machinery. Let
$M = (Q, \Sigma, \delta, q_0, F)$ be a nondeterministic FA and let
$w = a_1 \cdots a_n \in L(M)$, where $n \geq 0$. Let
$q_0, a_1, q_1, \ldots, a_n, q_n$ be an accepting computation for
$w$ in $M$, and choose some symbol $\$ \notin \Sigma$. We can
now encode the accepting computation as
$(\$, q_0)(a_1, q_1) \cdots (a_n, q_n)$
where we pair alphabet symbols to states, prepending
$\$$ to make up for the difference in the number
of alphabet symbols and states. We now provide
a construction that associates $M$ with a bilexical
CFG $G_M$. Strings in $L(G_M)$ are obtained by pairing
strings in $L(M)$ with encodings of their accepting
computations (see below for an example).
Definition 1 Let $M = (Q, \Sigma, \delta, q_0, F)$ be a
nondeterministic FA. Choose two symbols $\$, \# \notin \Sigma$, and
let $\Delta = \{(a, q) \mid a \in \Sigma \cup \{\$\},\ q \in Q\}$. A bilexical
CFG $G_M = (V_N, V_T, P, C[(\$, q_0)])$ is specified as
follows:
(i) $V_N = \{T[a] \mid a \in V_T\} \cup \{C[\alpha], C'[\alpha] \mid \alpha \in \Delta\}$;
(ii) $V_T = \Delta \cup \Sigma \cup \{\#\}$;
(iii) $P$ contains all and only the following productions:
(a) for each $a \in V_T$,
$T[a] \to a$;
(b) for each $(a, q), (a', q') \in \Delta$ such that $q' \in \delta(q, a')$,
$C[(a, q)] \to C'[(a', q')]\ T[(a, q)]$;
(c) for each $(a, q) \in \Delta$,
$C'[(a, q)] \to T[a]\ C[(a, q)]$;
(d) for each $(a, q) \in \Delta$ such that $q \in F$,
$C[(a, q)] \to T[\#]\ T[(a, q)]$.
We give an example of the above construction.
Consider an automaton $M$ and a string
$w = a_1 a_2 a_3$ such that $w \in L(M)$. Let
$(\$, q_0)(a_1, q_1)(a_2, q_2)(a_3, q_3)$ be the encoding of an
accepting computation in $M$ for $w$. Then the
string $a_1 a_2 a_3 \# (a_3, q_3)(a_2, q_2)(a_1, q_1)(\$, q_0)$ belongs
to $L(G_M)$. The tree depicted in Figure 1 represents
a derivation in $G_M$ of such a string.
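The construction of Definition 1 can be tried out on a small automaton. The sketch below builds the productions of $G_M$ and tests membership with a plain CYK recognizer, which is used here only as a convenient membership test and is not one of the algorithms discussed in this paper. The encoding (tuples such as `("C", (a, q))` for $C[(a, q)]$, the strings `"$"` and `"#"` for the two extra symbols) and the example automaton are assumptions. Production (c) is generated only for $a \in \Sigma$, since $T[\$]$ is not among the nonterminals and the corresponding $C'$ nonterminals are never introduced by (b).

```python
from itertools import product

def build_gm(sigma, states, delta, q0, finals):
    """Productions of G_M as a set of (lhs, rhs) pairs, plus the start symbol."""
    pairs = [(a, q) for a in list(sigma) + ["$"] for q in states]   # Delta
    terminals = pairs + list(sigma) + ["#"]
    prods = {(("T", t), (t,)) for t in terminals}                    # (a)
    for (a, q), (a2, q2) in product(pairs, repeat=2):                # (b)
        if a2 != "$" and q2 in delta.get((q, a2), ()):
            prods.add((("C", (a, q)), (("C'", (a2, q2)), ("T", (a, q)))))
    for (a, q) in pairs:
        if a != "$":                                                 # (c)
            prods.add((("C'", (a, q)), (("T", a), ("C", (a, q)))))
        if q in finals:                                              # (d)
            prods.add((("C", (a, q)), (("T", "#"), ("T", (a, q)))))
    return prods, ("C", ("$", q0))

def cyk(prods, start, w):
    """Standard CYK membership test for a grammar in Chomsky Normal Form."""
    n = len(w)
    if n == 0:
        return False
    unary, binary = {}, {}
    for lhs, rhs in prods:
        table = unary if len(rhs) == 1 else binary
        table.setdefault(rhs[0] if len(rhs) == 1 else rhs, set()).add(lhs)
    chart = {(i, i + 1): set(unary.get(w[i], ())) for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            cell = set()
            for j in range(i + 1, k):
                for b in chart[i, j]:
                    for c in chart[j, k]:
                        cell |= binary.get((b, c), set())
            chart[i, k] = cell
    return start in chart[0, n]

# Example automaton (an assumption): strings over {a, b} ending in "b".
sigma, states = ["a", "b"], [0, 1]
delta = {(0, "a"): {0}, (0, "b"): {0, 1}}
prods, start = build_gm(sigma, states, delta, 0, {1})

good = ["a", "b", "#", ("b", 1), ("a", 0), ("$", 0)]   # w = ab, valid computation
bad = ["a", "b", "#", ("b", 1), ("a", 1), ("$", 0)]    # state 1 not reachable on "a"
print(cyk(prods, start, good), cyk(prods, start, bad))  # prints: True False
```

The set of delexicalized nonterminals used by `build_gm` is always {T, C, C'}, whatever the automaton, which is the property exploited in the proof of Theorem 2.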
[Figure 1: A derivation in $G_M$ for string $a_1 a_2 a_3 \# (a_3, q_3)(a_2, q_2)(a_1, q_1)(\$, q_0)$.]
The following fact will be used below.
Lemma 1 For each $w \in \Sigma^*$, $w\#$ is a correct-prefix
for $L(G_M)$ if and only if $w \in L(M)$.
Outline of the proof. We claim the following
fact. For each $k \geq 0$, $a_1, a_2, \ldots, a_k \in \Sigma$ and
$q_0, q_1, \ldots, q_k \in Q$ we have
$q_i \in \delta(q_{i-1}, a_i)$, for all $i$ ($1 \leq i \leq k$),
if and only if
$C[(\$, q_0)] \Rightarrow^* a_1 \cdots a_k\, C[(a_k, q_k)]\, (a_{k-1}, q_{k-1}) \cdots (\$, q_0)$.
The claim can be proved by induction on $k$, using
productions (a) to (c) from Definition 1.
Let $R$ denote the reverse operator on strings.³
From the above claim and using production (d) from
Definition 1, one can easily show that
$L(G_M) = \{w \# u \mid w \in L(M),\ u^R$ encodes an
accepting computation for $w\}$.
The lemma directly follows from this relation. ∎
We can now provide the main result of this sec- 
tion. To this end, we refine the definition of rec- 
ognizer presented in Section 3. A recognizer for the 
CFG class is an algorithm R that has random access 
to some data structure C(G) obtained by means of 
some off-line precompilation of a CFG G. On in- 
put w, which is a string on the terminal symbols of 
G, R decides whether w E L(G). The definition of 
the CPP extends in the obvious way to recognizers 
working with precompiled grammars. 
Theorem 2 Let $p$ be any polynomial in two variables.
If the conjecture about quasi-determinizers
holds true, then no recognizer exists that
(i) has random access to data structure $C(G)$
precompiled from a bilexical CFG $G$ in polynomial
time in $|G|$,
(ii) runs in an amount of time bounded by
$p(|V_D|, |w|)$, where $V_D$ is the set of delexicalized
nonterminals of $G$ and $w$ is the input string,
and
(iii) satisfies the CPP.

³ Note that $R$ does not affect individual symbols in a string.
Thus $(a, q)^R = (a, q)$.

Proof. Assume there exists a recognizer $R$ that satisfies
conditions (i) to (iii) in the statement of the
theorem. We show how this entails that the conjecture
about quasi-determinizers is false.
We use algorithm $R$ to specify a quasi-determinizer
$A$. Given a nondeterministic FA $M$,
$A$ goes through the following steps.
1. $A$ constructs grammar $G_M$ as in Definition 1.
2. $A$ precompiles $G_M$ as required by $R$, producing
data structure $C(G_M)$.
3. $A$ returns a device $D_M$ specified as follows.
Given a string $w$ as input, $D_M$ runs $R$ on string
$w\#$. If $R$ detects an error at any position $i$,
$0 \leq i \leq |w\#|$, then $D_M$ rejects and halts,
otherwise $D_M$ accepts and halts.
From Lemma 1 we have that $D_M$ accepts $w$ if and
only if $w \in L(M)$. Since $R$ runs in time $p(|V_D|, |w|)$
and since $G_M$ has a set of delexicalized nonterminals
independent of $M$, we have that there exists a polynomial
$p_A$ such that every $D_M$ works in an amount
of time bounded by $p_A(|w|)$. We therefore conclude
that $A$ is a quasi-determinizer.
It remains to be shown that $A$ works in polynomial
time in $|M|$. Step 1 can be carried out in time
$O(|M|)$. The compilation at Step 2 takes polynomial
time in $|G_M|$, following our hypotheses on $R$, and
hence polynomial time in $|M|$, since $|G_M| = O(|M|)$.
Finally, the construction of $D_M$ at Step 3 can easily
be carried out in time $O(|M|)$ as well. ∎
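The three steps of the reduction can be sketched in code, with an Earley-style recognizer standing in for the hypothetical recognizer $R$. This stand-in satisfies the CPP but does not meet the time bound of condition (ii), since its running time depends on the size of the grammar, and no real precompilation is performed; the sketch therefore illustrates only the correctness of the device $D_M$ (Lemma 1), not the complexity claim. All names, the tuple encoding of nonterminals, and the example automaton are assumptions. The alphabet must not contain the strings "T", "C", "C'", "$" or "#".

```python
def build_gm(sigma, states, delta, q0, finals):
    """Step 1: G_M of Definition 1 as dict: nonterminal -> list of rhs.
    Production (c) is generated only for a in sigma (T[$] does not exist)."""
    pairs = [(a, q) for a in list(sigma) + ["$"] for q in states]
    prods = {("T", t): [(t,)] for t in pairs + list(sigma) + ["#"]}   # (a)
    for (a, q) in pairs:
        rules = [(("C'", (a2, q2)), ("T", (a, q)))                    # (b)
                 for a2 in sigma for q2 in delta.get((q, a2), ())]
        if q in finals:
            rules.append((("T", "#"), ("T", (a, q))))                 # (d)
        prods[("C", (a, q))] = rules
        if a != "$":
            prods[("C'", (a, q))] = [(("T", a), ("C", (a, q)))]      # (c)
    return prods, ("C", ("$", q0))

def first_error(prods, start, w):
    """Earley-style left-to-right recognition. Returns the first position i
    for which no derivation start =>* w[:i] gamma exists, or None.
    Assumes that no production has an empty right-hand side."""
    n = len(w)
    chart = [set() for _ in range(n + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in prods[start]}
    for j in range(n + 1):
        if not chart[j]:
            return j                      # detected before consuming w[j]
        agenda = list(chart[j])
        while agenda:
            lhs, rhs, dot, i = agenda.pop()
            if dot == len(rhs):                              # completer
                new = [(l, r, d + 1, k) for (l, r, d, k) in chart[i]
                       if d < len(r) and r[d] == lhs]
            elif rhs[dot] in prods:                          # predictor
                new = [(rhs[dot], r, 0, j) for r in prods[rhs[dot]]]
            else:                                            # scanner
                if j < n and w[j] == rhs[dot]:
                    chart[j + 1].add((lhs, rhs, dot + 1, i))
                new = []
            for item in new:
                if item not in chart[j]:
                    chart[j].add(item)
                    agenda.append(item)
    return None

def make_device(sigma, states, delta, q0, finals):
    """Steps 1 to 3: return D_M, which accepts w iff no error is detected on w#."""
    prods, start = build_gm(sigma, states, delta, q0, finals)
    def device(w):
        return first_error(prods, start, list(w) + ["#"]) is None
    return device

# Example automaton (an assumption): second-to-last symbol is "a".
sigma, states = ["a", "b"], [0, 1, 2]
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
finals = {2}
device = make_device(sigma, states, delta, 0, finals)
print(device("ab"), device("ba"))   # prints: True False
```

The device never needs to read the encoding of the computation that follows `#` in the strings of $L(G_M)$: the correct-prefix property alone forces the recognizer to decide membership of $w$ in $L(M)$ by the time `#` has been consumed.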
In addition to Theorem 1, Theorem 2 states that, 
even in case the input grammar is compiled off- 
line and in polynomial time, we cannot perform 
CPP recognition for bilexical context-free grammars 
in time polynomial in the grammar and the input 
string but independent of the lexicon size. This 
is true with at least the same evidence that sup- 
ports the conjecture on quasi-determinizers. Again, 
this should be contrasted with the time performance 
of existing bidirectional algorithms, allowing recognition
for bilexical context-free grammars in time
$O(|V_D|^3 |w|^4)$.
In order to complete our investigation of the above
problem, in Appendix A we show that, when we drop
the polynomial-time restriction on the grammar
precompilation, it is indeed possible to get rid of any
$|V_T|$ factor from the running time of the recognizer.
6 Conclusion 
Empirical results presented in the literature show 
that bidirectional parsing strategies can be more 
time efficient in cases of grammar formalisms whose 
rules are specialized for one or more lexical items. 
In this paper we have provided an original mathematical
argument in favour of this thesis. Our results
hold for bilexical context-free grammars and
directly transfer to several language models that can
be seen as stochastic versions of this formalism (see
Section 1). We believe that these results can be extended
to other language models that properly embed
bilexical context-free grammars, as for instance
the more general history-based models used in
(Ratnaparkhi, 1997) and (Chelba and Jelinek, 1998). We
leave this for future work.
Acknowledgements 
We would like to thank Jason Eisner and Mehryar 
Mohri for fruitful discussions. The first author is 
supported by the German Federal Ministry of Edu- 
cation, Science, Research and Technology (BMBF) 
in the framework of the VERBMOBIL Project under 
Grant 01 IV 701 V0, and was employed at AT&T 
Shannon Laboratory during a part of the period this 
paper was written. The second author is supported 
by MURST under project PRIN: BioInformatica e
Ricerca Genomica and by University of Padua, under
project Sviluppo di Sistemi ad Addestramento
Automatico per l'Analisi del Linguaggio Naturale. 

A Recognition in time independent 
of the lexicon 
In Section 5 we have shown that it is unlikely that 
correct-prefix property parsing for a bilexical CFG 
can be carried out in polynomial time and indepen- 
dently of the lexicon size, when only polynomial- 
time off-line compilation of the grammar is allowed. 
To complete our presentation, we show here that 
correct-prefix property parsing in time independent 
of the lexicon size is indeed possible if we spend ex- 
ponential time on grammar precompilation. 
We first consider tabular LR parsing (Tomita,
1986), a technique which satisfies the correct-prefix
property, and apply it to bilexical CFGs. Our
presentation relies on definitions from (Nederhof and
Satta, 1996). Let $w \in V_T^*$ be some input string. A
property of LR parsing is that any state that can be
reached after reading prefix $w[1,j]$, $j \leq |w|$, must be
of the form
$goto(goto(\ldots(goto(q_{in}, X_1), \ldots), X_{m-1}), X_m)$
where $q_{in}$ is the initial LR state, and $X_1, \ldots, X_m$ are
terminals or nonterminals such that $X_1 \cdots X_m \Rightarrow^* w[1,j]$.
For a bilexical CFG, each $X_i$ is of the form $b_i$
or of the form $B_i[b_i]$, where $b_1, \ldots, b_m$ is some
subsequence of $w[1,j]$. This means that there are at most
$(2 + |V_D|)^n$ distinct states that can be reached by the
recognizer, apart from $q_{in}$. In the algorithm, the
tabulation prevents repeated manipulation of states for
a triple of input positions, leading to a time complexity
of $O(n^3 |V_D|^n)$, where $n = |w|$. Hence, when we
apply precompilation of the grammar, we can carry
out recognition in time exponential in the length of
the input string, yet independent of the lexicon size.
Note however that the precompilation for LR parsing
takes exponential time.
The second algorithm with the CPP we will con- 
sider can be derived from Earley's algorithm (Ear- 
ley, 1970). For this new recognizer, we achieve a 
time complexity completely independent of the size 
of the whole grammar, not merely independent of 
the size of the lexicon as in the case of tabular LR 
parsing. Furthermore, the input grammar can be 
any general CFG, not necessarily a bilexical one. In 
terms of the length of the input, the complexity is 
polynomial rather than exponential. 
Earley's algorithm is outlined in what follows,
with minor modifications with respect to its original
presentation. An item is an object of the form
$[A \to \alpha \bullet \beta]$, where $A \to \alpha\beta$ is a production from
the grammar. The recognition algorithm consists in
an incremental construction of an $(n+1) \times (n+1)$,
2-dimensional table $T$, where $n$ is the length of the
input string. At each stage, each entry $T[i,j]$ in
the table contains a set of items, which is initially
the empty set. After an initial item is added to entry
$T[0,0]$ in the table, other items in other entries
are derived from it, directly or indirectly, using three
steps called predictor, scanner and completer. When
no more new items can be derived, the presence of
a final item in entry $T[0,n]$ indicates whether the
input is recognized.
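The table-filling process just outlined can be sketched as follows, without any precompilation. This is an illustrative rendering under stated assumptions, not a transcription of the original algorithm: productions are given as a dictionary from nonterminals to lists of right-hand sides, any symbol that is not a key of that dictionary is treated as a terminal, and no production has an empty right-hand side. The function also reports the position at which a correct-prefix error is detected, if any. The two small grammars are the grammar from the proof of Theorem 1 for q = 1 and its variant with the bridging production swapped.

```python
from collections import defaultdict

def earley_table(prods, start, w):
    """Fill T[i, j] with dotted items (lhs, rhs, dot).
    Returns (accepted, error_position); error_position is None when every
    prefix of w can be extended to a sentential form of the grammar."""
    n = len(w)
    T = defaultdict(set)
    T[0, 0] = {(start, rhs, 0) for rhs in prods[start]}       # initial items
    for j in range(n + 1):
        agenda = [(i, item) for i in range(j + 1) for item in T[i, j]]
        if not agenda:
            return False, j        # no derivation start =>* w[:j] gamma
        while agenda:
            i, (lhs, rhs, dot) = agenda.pop()
            if dot == len(rhs):                                # completer
                new = [(h, (l, r, d + 1)) for h in range(i + 1)
                       for (l, r, d) in T[h, i]
                       if d < len(r) and r[d] == lhs]
            elif rhs[dot] in prods:                            # predictor
                new = [(j, (rhs[dot], r, 0)) for r in prods[rhs[dot]]]
            else:                                              # scanner
                if j < n and w[j] == rhs[dot]:
                    T[i, j + 1].add((lhs, rhs, dot + 1))
                new = []
            for h, item in new:
                if item not in T[h, j]:
                    T[h, j].add(item)
                    agenda.append((h, item))
    accepted = any((start, rhs, len(rhs)) in T[0, n] for rhs in prods[start])
    return accepted, None

g1 = {"A1": [("A2", "T1")], "A2": [("T3", "T2")],
      "T1": [("b1",)], "T2": [("b2",)], "T3": [("b3",)]}
g1_swapped = dict(g1, A1=[("T1", "A2")])

print(earley_table(g1, "A1", ["b3", "b2", "b1"]))      # (True, None)
print(earley_table(g1, "A1", ["b3", "b2"]))            # (False, None)
print(earley_table(g1_swapped, "A1", ["b3", "b2"]))    # (False, 1)
```

In the third call the error is reported at position 1, that is, right after the first input symbol has been consumed, which is the behaviour required by the correct-prefix property.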
The recognition process can be precompiled, 
based on the observation that for any grammar the 
set of all possible items is finite, and thereby all po- 
tential contents of T's entries can be enumerated. 
Furthermore, the dependence of entries on one another
is not cyclic; one item in $T[i,j]$ may be derived
from a second item in the same entry, but it is not
possible that, for example, an item in $T[i,j]$ is derived
from an item in $T[i',j']$, with $(i,j) \neq (i',j')$,
which is in turn derived from an item in $T[i,j]$.
A consequence is that entries can be computed 
in a strict order, and an operation that involves the 
combination of, say, the items from two entries $T[i,j]$
and $T[j,k]$ by means of the completer step can be
implemented by a simple table lookup. More pre- 
cisely, each set of items is represented by an atomic 
state, and combining two sets of items according 
to the completer step is implemented by indexing 
a 2-dimensional array by the two states representing 
those two sets, yielding a third state representing 
the resulting set of items. Similarly, the scanner 
and predictor steps and the union operation on sets 
of items can all be implemented by table lookup. 
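The representation of item sets by atomic states can be sketched as follows. Note that this sketch fills the completer table lazily, on first use, rather than enumerating all sets of items in advance as full precompilation would; the class StateTable and the encoding of items as (lhs, rhs, dot) tuples are assumptions made for illustration only.

```python
class StateTable:
    """Maps sets of items to integer states and caches the completer step."""

    def __init__(self):
        self.ids = {}        # frozenset of items -> state number
        self.sets = []       # state number -> frozenset of items
        self.completer = {}  # (left state, right state) -> resulting state

    def intern(self, items):
        """Return the atomic state that represents this set of items."""
        key = frozenset(items)
        if key not in self.ids:
            self.ids[key] = len(self.sets)
            self.sets.append(key)
        return self.ids[key]

    def complete(self, left, right):
        """Completer step on two states; a table lookup after first use."""
        pair = (left, right)
        if pair not in self.completer:
            # Nonterminals with a finished item [B -> gamma .] on the right.
            finished = {lhs for (lhs, rhs, dot) in self.sets[right]
                        if dot == len(rhs)}
            # Items on the left whose dot can move over such a nonterminal.
            advanced = {(lhs, rhs, dot + 1)
                        for (lhs, rhs, dot) in self.sets[left]
                        if dot < len(rhs) and rhs[dot] in finished}
            self.completer[pair] = self.intern(advanced)
        return self.completer[pair]


table = StateTable()
left = table.intern({("S", ("a", "S", "b"), 1)})  # [S -> a . S b]
right = table.intern({("S", (), 0)})              # [S -> .]
result = table.complete(left, right)              # state for [S -> a S . b]
```

Once a pair of states has been seen, combining them costs a single dictionary lookup, whatever the number of items in the two sets. The scanner, predictor and union operations could be cached in the same way.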
The time complexity of recognition can straight- 
forwardly be shown to be O(n^3), independent of 
the size of the grammar. However, massive pre- 
compilation is involved in enumerating all possi- 
ble sets of items and precomputing the operations 
on them. The motivation for discussing this algo- 
rithm is therefore purely theoretical: it illustrates 
the unfavourable complexity properties that The- 
orem 2, together with the conjecture about quasi- 
determinizers, attributes to the recognition problem 
if the correct-prefix property is to be ensured. 
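A rough count makes both sides of this trade-off explicit. It rests on assumptions not spelled out above: that each entry T[i, k] is obtained by one completer lookup (plus one union lookup) for every split point j with i ≤ j ≤ k, together with a constant number c of scanner and predictor lookups, and that I denotes the set of all items of the grammar.

```latex
% Recognition time: one group of lookups per triple (i, j, k).
\[
  \sum_{0 \le i \le k \le n} \bigl( (k - i + 1) + c \bigr)
  \;=\; \frac{(n+1)(n+2)(n+3)}{6} + c \, \frac{(n+1)(n+2)}{2}
  \;=\; O(n^3)
\]
% Precompilation: every subset of I is a potential state.
\[
  \#\,\text{states} \;\le\; 2^{|I|},
  \qquad
  \#\,\text{entries of the completer table} \;\le\; 2^{|I|} \cdot 2^{|I|} = 4^{|I|}
\]
```

The grammar thus disappears from the running time only because its size has moved into tables whose size may be exponential in the number of items.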

References 

A. V. Aho, J. E. Hopcroft, and J. D. Ullman. 1974. 
The Design and Analysis of Computer Algorithms. 
Addison-Wesley, Reading, MA. 

H. Alshawi. 1996. Head automata and bilingual 
tiling: Translation with minimal representations. 
In Proc. of the ACL, pages 167-176, Santa 
Cruz, CA. 

E. Charniak. 1997. Statistical parsing with a 
context-free grammar and word statistics. In 
Proc. of AAAI-97, Menlo Park, CA. 

C. Chelba and F. Jelinek. 1998. Exploiting syntactic 
structure for language modeling. In Proc. of the 
36th ACL, Montreal, Canada. 

M. Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proc. of the 35th 
ACL, Madrid, Spain. 

J. Earley. 1970. An efficient context-free parsing algorithm. Communications of the Association for 
Computing Machinery, 13(2):94-102. 

J. Eisner and G. Satta. 1999. Efficient parsing for 
bilexical context-free grammars and head automaton grammars. In Proc. of the 37th ACL, pages 
457-464, College Park, Maryland. 

J. Eisner. 1996. An empirical comparison of probability models for dependency grammar. Technical 
Report IRCS-96-11, IRCS, Univ. of Pennsylvania. 

J. Eisner. 1997. Bilexical grammars and a cubic-time probabilistic parser. In Proceedings of the 
Int. Workshop on Parsing Technologies, MIT, 
Cambridge, MA, September. 

R. C. Gonzales and M. G. Thomason. 1978. Syntactic Pattern Recognition. Addison-Wesley, Reading, MA. 

D. Gusfield. 1997. Algorithms on Strings, Trees and 
Sequences. Cambridge University Press, Cambridge, UK. 

M. A. Harrison. 1978. Introduction to Formal Language Theory. Addison-Wesley, Reading, MA. 

J. E. Hopcroft and J. D. Ullman. 1979. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, Reading, MA. 

A. R. Meyer and M. J. Fisher. 1971. Economy of description by automata, grammars and formal systems. In 12th Annual Symp. on Switching and Automata Theory, pages 188-190, New York. IEEE. 

M.-J. Nederhof and G. Satta. 1996. Efficient tabular 
LR parsing. In Proc. of the ACL, pages 239-246, Santa Cruz, CA. 

M.-J. Nederhof. 1994. An optimal tabular parsing algorithm. In Proc. of the ACL, pages 117-124, Las Cruces, New Mexico. 

A. Ratnaparkhi. 1997. A linear observed time statistical parser based on maximum entropy models. In Second Conference on Empirical Methods 
in Natural Language Processing, Brown University, Providence, Rhode Island. 

D. J. Rosenkrantz and P. M. Lewis. 1970. Deterministic left corner parsing. In IEEE Conf. Record 
11th Annual Symposium on Switching and Automata Theory, pages 139-152. 

G. Satta and O. Stock. 1994. Bidirectional context-free grammar parsing for natural language processing. Artificial Intelligence, 69:123-164. 

S. Sippu and E. Soisalon-Soininen. 1988. Parsing Theory: Languages and Parsing, volume 1. 
Springer-Verlag, Berlin, Germany. 

S. Sippu and E. Soisalon-Soininen. 1990. Parsing 
Theory: LR(k) and LL(k) Parsing, volume 2. 
Springer-Verlag, Berlin, Germany. 

R. E. Stearns and H. B. Hunt. 1981. On the equivalence and containment problem for unambiguous 
regular expressions, grammars, and automata. In 
22nd Annual Symp. on Foundations of Computer 
Science, pages 74-81, New York. IEEE. 

M. Tomita. 1986. Efficient Parsing for Natural Language. Kluwer, Boston, Mass. 

G. van Noord. 1997. An efficient implementation of 
the head-corner parser. Computational Linguistics, 23(3):425-456. 

C. S. Wetherell. 1980. Probabilistic languages: A 
review and some open questions. Computing Surveys, 12(4):361-379. 

M. Wiren. 1987. A comparison of rule-invocation 
strategies in parsing. In Proc. of the EACL, 
pages 226-233, Copenhagen, Denmark. 
