Fast Context-Free Parsing Requires Fast Boolean Matrix 
Multiplication 
Lillian Lee 
Division of Engineering and Applied Sciences 
Harvard University 
33 Oxford Street 
Cambridge, MA 02138
llee@eecs.harvard.edu
Abstract 
Valiant showed that Boolean matrix 
multiplication (BMM) can be used for 
CFG parsing. We prove a dual re- 
sult: CFG parsers running in time 
$O(|G||w|^{3-\epsilon})$ on a grammar G and a
string w can be used to multiply $m \times m$
Boolean matrices in time $O(m^{3-\epsilon/3})$.
In the process we also provide a formal 
definition of parsing motivated by an 
informal notion due to Lang. Our re- 
sult establishes one of the first limita- 
tions on general CFG parsing: a fast, 
practical CFG parser would yield a 
fast, practical BMM algorithm, which 
is not believed to exist. 
1 Introduction 
The context-free grammar (CFG) formalism 
was developed during the birth of the field of 
computational linguistics. The standard meth- 
ods for CFG parsing are the CKY algorithm 
(Kasami, 1965; Younger, 1967) and Earley's al- 
gorithm (Earley, 1970), both of which have a 
worst-case running time of $O(gN^3)$ for a CFG
(in Chomsky normal form) of size g and a string 
of length N. Graham et al. (1980) give a vari- 
ant of Earley's algorithm which runs in time 
$O(gN^3/\log N)$. Valiant's parsing method is the
asymptotically fastest known (Valiant, 1975). 
It uses Boolean matrix multiplication (BMM) 
to speed up the dynamic programming in the 
CKY algorithm: its worst-case running time is 
$O(gM(N))$, where $M(m)$ is the time it takes to
multiply two $m \times m$ Boolean matrices together.
The standard method for multiplying matrices
takes time $O(m^3)$. There exist matrix
multiplication algorithms with time complexity
$O(m^{3-\delta})$; for instance, Strassen's has a
worst-case running time of $O(m^{2.81})$ (Strassen, 1969),
and the fastest currently known has a worst-case
running time of $O(m^{2.376})$ (Coppersmith and
Winograd, 1990). Unfortunately, the constants 
involved are so large that these fast algorithms 
(with the possible exception of Strassen's) can- 
not be used in practice. As matrix multi- 
plication is a very well-studied problem (see 
Strassen's historical account (Strassen, 1990, 
section 10)), it is highly unlikely that simple, 
practical fast matrix multiplication algorithms 
exist. Since the best BMM algorithms all rely 
on general matrix multiplication¹, it is widely
believed that there are no practical $O(m^{3-\delta})$
BMM algorithms. 
One might therefore hope to find a way 
to speed up CFG parsing without relying on 
matrix multiplication. However, we show in 
this paper that fast CFG parsing requires 
fast Boolean matrix multiplication in a precise 
sense: any parser running in time $O(gN^{3-\epsilon})$
that represents parse data in a retrieval-efficient
way can be converted with little computational
overhead into an $O(m^{3-\epsilon/3})$ BMM algorithm.
Since it is very improbable that practical fast 
matrix multiplication algorithms exist, we thus 
establish one of the first nontrivial limitations 
on practical CFG parsing. 
¹ The "four Russians" algorithm (Arlazarov et al.,
1970), the fastest BMM algorithm that does not simply
use ordinary matrix multiplication, has worst-case
running time $O(m^3/\log m)$.
Our technique, adapted from that used by 
Satta (1994) for tree-adjoining grammar (TAG) 
parsing, is to show that BMM can be efficiently 
reduced to CFG parsing. Satta's result does not 
apply to CFG parsing, since it explicitly relies 
on the properties of TAGs that allow them to 
generate non-context-free languages. 
2 Definitions 
A Boolean matrix is a matrix with entries from 
the set {0, 1}. A Boolean matrix multiplication 
algorithm takes as input two $m \times m$ Boolean
matrices A and B and returns their Boolean product
$A \times B$, which is the $m \times m$ Boolean matrix
C whose entries $c_{ij}$ are defined by
$$c_{ij} = \bigvee_{k=1}^{m} (a_{ik} \wedge b_{kj}).$$
That is, $c_{ij} = 1$ if and only if there exists a
number k, $1 \le k \le m$, such that $a_{ik} = b_{kj} = 1$.
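As a point of reference, the definition above can be sketched directly as the standard cubic-time method. This is only an illustration: the function name `boolean_matrix_product` and the list-of-lists representation are assumptions made here, and the code uses 0-based indices where the text uses 1-based ones.

```python
def boolean_matrix_product(a, b):
    """Return the Boolean product of two square 0/1 matrices (lists of lists)."""
    m = len(a)
    c = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            # c[i][j] is the OR over k of (a[i][k] AND b[k][j]).
            c[i][j] = int(any(a[i][k] and b[k][j] for k in range(m)))
    return c


A = [[1, 0], [0, 0]]
B = [[0, 1], [0, 0]]
print(boolean_matrix_product(A, B))
```

The three nested loops over i, j, and k are what give this method its $O(m^3)$ running time.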
We use the usual definition of a context-free 
grammar (CFG) as a 4-tuple $G = (\Sigma, V, R, S)$,
where $\Sigma$ is the set of terminals, V is the set
of nonterminals, R is the set of productions,
and $S \in V$ is the start symbol. Given a string
$w = w_1 w_2 \cdots w_N$ over $\Sigma^*$, where each $w_i$ is an
element of $\Sigma$, we use the notation $w_i^j$ to denote
the substring $w_i w_{i+1} \cdots w_{j-1} w_j$.
We will be concerned with the notion of 
c-derivations, which are substring derivations 
that are consistent with a derivation of an entire 
string. Intuitively, $A \Rightarrow^* w_i^j$ is a c-derivation if
it is consistent with at least one parse of w. 
Definition 1 Let $G = (\Sigma, V, R, S)$ be a CFG,
and let $w = w_1 w_2 \cdots w_N$, $w_i \in \Sigma$. A nonterminal
$A \in V$ c-derives (consistently derives) $w_i^j$ if
and only if the following conditions hold:
• $A \Rightarrow^* w_i^j$, and
• $S \Rightarrow^* w_1^{i-1} A w_{j+1}^N$.
(These conditions together imply that $S \Rightarrow^* w$.)
We would like our results to apply to all 
"practical" parsers, but what does it mean for 
a parser to be practical? First, we would like 
to be able to retrieve constituent information 
for all possible parses of a string (after all, 
the recovery of structural information is what 
distinguishes parsing algorithms from recogni- 
tion algorithms); such information is very use- 
ful for applications like natural language under- 
standing, where multiple interpretations for a 
sentence may result from different constituent 
structures. Therefore, practical parsers should 
keep track of c-derivations. Secondly, a parser 
should create an output structure from which 
information about constituents can be retrieved 
in an efficient way -- Satta (1994) points out an 
observation of Lang to the effect that one can 
consider the input string itself to be a retrieval- 
inefficient representation of parse information. 
In short, we require practical parsers to output 
a representation of the parse forest for a string 
that allows efficient retrieval of parse informa- 
tion. Lang in fact argues that parsing means 
exactly the production of a shared forest struc- 
ture "from which any specific parse can be ex- 
tracted in time linear with the size of the ex- 
tracted parse tree" (Lang, 1994, pg. 487), and 
Satta (1994) makes this assumption as well. 
These notions lead us to equate practical 
parsers with the class of c-parsers, which keep 
track of c-derivations and may also calculate 
general substring derivations as well. 
Definition 2 A c-parser is an algorithm that
takes a CFG $G = (\Sigma, V, R, S)$ and
string $w \in \Sigma^*$ as input and produces output
$\mathcal{F}_{G,w}$; $\mathcal{F}_{G,w}$ acts as an oracle about parse
information, as follows:
• If A c-derives $w_i^j$, then $\mathcal{F}_{G,w}(A,i,j)$ = "yes".
• If $A \not\Rightarrow^* w_i^j$ (which implies that A does not
c-derive $w_i^j$), then $\mathcal{F}_{G,w}(A,i,j)$ = "no".
• $\mathcal{F}_{G,w}$ answers queries in constant time.
Note that the answer $\mathcal{F}_{G,w}$ gives can be arbitrary
if $A \Rightarrow^* w_i^j$ but A does not c-derive $w_i^j$.
The constant-time constraint encodes the no- 
tion that information extraction is efficient; ob- 
serve that this is a stronger condition than that 
called for by Lang. 
We define c-parsers in this way to make the 
class of c-parsers as broad as possible. If we 
had changed the first condition to "If A derives 
...", then Earley parsers would be excluded, 
since they do not keep track of all substring 
derivations. If we had written the second condition
as "If A does not c-derive $w_i^j$, then ...",
then CKY parsers would not be c-parsers, since
they keep track of all substring derivations, not
just c-derivations. So as it stands, the class of
c-parsers includes tabular parsers (e.g. CKY),
where $\mathcal{F}_{G,w}$ is the table of substring derivations,
and Earley-type parsers, where $\mathcal{F}_{G,w}$ is
the chart. Indeed, it includes all of the parsing 
algorithms mentioned in the introduction, and 
can be thought of as a formalization of Lang's 
informal definition of parsing. 
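To make the tabular case concrete, here is a minimal sketch of a CKY-style table used as the oracle of Definition 2. It assumes a grammar already in Chomsky normal form; the function name `cky_oracle` and the dictionary encoding of the rules are illustrative choices, not notation from the paper. Lookups go through a hash table, so the constant-time requirement holds only in the expected sense.

```python
from collections import defaultdict


def cky_oracle(terminal_rules, binary_rules, w):
    """Build a CKY table for a CNF grammar and return an oracle F(A, i, j).

    terminal_rules: dict mapping a terminal to the set of A with A -> terminal
    binary_rules:   dict mapping a pair (B, C) to the set of A with A -> B C
    w:              list of terminals; positions are 1-based and inclusive
    """
    n = len(w)
    table = defaultdict(set)  # table[(i, j)] = nonterminals deriving w_i .. w_j
    for i in range(1, n + 1):
        table[(i, i)] = set(terminal_rules.get(w[i - 1], ()))
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            for split in range(i, j):
                for left in table[(i, split)]:
                    for right in table[(split + 1, j)]:
                        table[(i, j)] |= binary_rules.get((left, right), set())

    def oracle(nonterminal, i, j):
        # "yes" exactly when the nonterminal derives w_i .. w_j.
        return nonterminal in table.get((i, j), ())

    return oracle


# Toy grammar (hypothetical): S -> A B, A -> a, B -> b, X -> b.
F = cky_oracle({"a": {"A"}, "b": {"B", "X"}}, {("A", "B"): {"S"}}, list("ab"))
print(F("S", 1, 2), F("X", 2, 2), F("B", 1, 1))
```

In the toy grammar, X derives the second symbol but does not c-derive it, since no derivation from S uses X. The oracle still answers "yes" for X, which Definition 2 permits because the answer may be arbitrary in that case.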
3 The reduction 
We will reduce BMM to c-parsing, thus prov- 
ing that any c-parsing algorithm can be used 
as a Boolean matrix multiplication algorithm. 
Our method, adapted from that of Satta (1994) 
(who considered the problem of parsing with 
tree-adjoining grammars), is to encode informa- 
tion about Boolean matrices into a CFG. Thus, 
given two Boolean matrices, we need to produce 
a string and a grammar such that parsing the 
string with respect to the grammar yields out- 
put from which information about the product 
of the two matrices can be easily retrieved. 
We can sketch the behavior of the grammar 
as follows. Suppose entries $a_{ik}$ in A and $b_{kj}$ in
B are both 1. Assume we have some way to
break up array indices into two parts so that
i can be reconstructed from $i_1$ and $i_2$, j can
be reconstructed from $j_1$ and $j_2$, and k can be
reconstructed from $k_1$ and $k_2$. (We will describe
a way to do this later.) Then, we will have
the following derivation (for a quantity $\delta$ to be
defined later):
$$C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* \underbrace{w_{i_2} \cdots w_{k_2+\delta}}_{\text{derived by } A_{i_1,k_1}} \underbrace{w_{k_2+\delta+1} \cdots w_{j_2+2\delta}}_{\text{derived by } B_{k_1,j_1}}$$
The key thing to observe is that $C_{i_1,j_1}$ generates
two nonterminals whose "inner" indices match,
and that these two nonterminals generate substrings
that lie exactly next to each other. The
"inner" indices constitute a check on $k_1$, and the
substring adjacency constitutes a check on $k_2$.
Let A and B be two Boolean matrices, each 
of size m x m, and let C be their Boolean matrix 
product, C = A x B. In the rest of this section, 
we consider A, B, C, and m to be fixed. Set 
$n = \lceil m^{1/3} \rceil$, and set $\delta = n+2$. We will be
constructing a string of length $3\delta$; we choose $\delta$
slightly larger than n in order to avoid having
epsilon-productions in our grammar.
Recall that $c_{ij}$ is non-zero if and only if we
can find a non-zero $a_{ik}$ and a non-zero $b_{\bar{k}j}$ such
that $k = \bar{k}$. In essence, we need simply check
for the equality of indices k and $\bar{k}$. We will
break matrix indices into two parts: our grammar
will check whether the first parts of k and
$\bar{k}$ are equal, and our string will check whether
the second parts are also equal, as we sketched
above. Encoding the indices ensures that the
grammar is of as small a size as possible, which
will be important for our time bound results.
Our index encoding function is as follows. Let
i be a matrix index, $1 \le i \le m$. Then we define
the function $f(i) = (f_1(i), f_2(i))$ by
$f_1(i) = \lfloor i/n \rfloor$ ($0 \le f_1(i) \le n^2$), and
$f_2(i) = (i \bmod n) + 2$ ($2 \le f_2(i) \le n+1$).
Since $f_1$ and $f_2$ are essentially the quotient and
remainder of integer division of i by n, we can
retrieve i from $(f_1(i), f_2(i))$. We will use the
notational shorthand of using subscripts instead
of the functions $f_1$ and $f_2$, that is, we write $i_1$
and $i_2$ for $f_1(i)$ and $f_2(i)$.
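The encoding and its inverse can be sketched as follows. The helper names `cube_root_ceiling`, `encode`, and `decode` are assumptions introduced for illustration; the loop simply checks the stated ranges and the round trip for one small value of m.

```python
def cube_root_ceiling(m):
    """Smallest integer n with n**3 >= m, computed without floating point."""
    n = 1
    while n ** 3 < m:
        n += 1
    return n


def encode(i, n):
    """f(i) = (f1(i), f2(i)) = (floor(i / n), (i mod n) + 2)."""
    return i // n, (i % n) + 2


def decode(i1, i2, n):
    """Invert encode: undo the +2 shift and recombine quotient and remainder."""
    return i1 * n + (i2 - 2)


m = 20
n = cube_root_ceiling(m)
for i in range(1, m + 1):
    i1, i2 = encode(i, n)
    assert 0 <= i1 <= n * n and 2 <= i2 <= n + 1  # ranges stated in the text
    assert decode(i1, i2, n) == i                  # i is recoverable
print(n, encode(20, n))
```

The shift by 2 in the second component keeps every encoded position at least 2, which is what later leaves room for a non-empty prefix of the string.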
It is now our job to create a CFG $G = (\Sigma, V, R, S)$
and a string w that encode information
about A and B and express constraints
about their product C. Our plan is to include
a set of nonterminals $\{C_{p,q} : 1 \le p,q \le n^2\}$ in
V so that $c_{ij} = 1$ if and only if $C_{i_1,j_1}$ c-derives
$w_{i_2}^{j_2+2\delta}$. In section 3.1, we describe a version
of G and prove it has this c-derivation property.
Then, in section 3.2 we explain that G can easily
be converted to Chomsky normal form in such
a way as to preserve c-derivations.
We choose the set of terminals to be $\Sigma =
\{w_\ell : 1 \le \ell \le 3n+6\}$, and choose the string
to be parsed to be $w = w_1 w_2 \cdots w_{3n+6}$.
We consider w to be made up of three
parts, x, y, and z, each of size $\delta$:
$$w = \underbrace{w_1 w_2 \cdots w_{n+2}}_{x} \underbrace{w_{n+3} \cdots w_{2n+4}}_{y} \underbrace{w_{2n+5} \cdots w_{3n+6}}_{z}.$$
Observe that for any i, $1 \le i \le m$, $w_{i_2}$ lies
within x, $w_{i_2+\delta}$ lies within y, and $w_{i_2+2\delta}$ lies
within z, since
$i_2 \in [2, n+1]$,
$i_2 + \delta \in [n+4, 2n+3]$, and
$i_2 + 2\delta \in [2n+6, 3n+5]$.
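The three interval claims can be checked mechanically for a small n. The function `segment` below is a hypothetical helper that names the part of w containing a given 1-based position.

```python
def segment(position, n):
    """Name the part of w = x y z (each of length delta = n + 2) holding a position."""
    delta = n + 2
    assert 1 <= position <= 3 * delta
    return "xyz"[(position - 1) // delta]


n = 3
delta = n + 2
for i2 in range(2, n + 2):  # all possible second components, 2 .. n+1
    assert segment(i2, n) == "x"
    assert segment(i2 + delta, n) == "y"
    assert segment(i2 + 2 * delta, n) == "z"
print("positions fall in x, y, z as claimed for n =", n)
```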
3.1 The grammar 
Now we begin building the grammar $G =
(\Sigma, V, R, S)$. We start with the nonterminals
$V = \{S\}$ and the production set $R = \emptyset$. We
add nonterminal W to V for generating arbitrary
non-empty substrings of w; thus we need
the productions
(W-rules) $W \rightarrow w_\ell W \mid w_\ell$, $1 \le \ell \le 3n+6$.
Next we encode the entries of the input matrices
A and B in our grammar. We include sets of
non-terminals $\{A_{p,q} : 1 \le p,q \le n^2\}$ and $\{B_{p,q} :
1 \le p,q \le n^2\}$. Then, for every non-zero entry
$a_{ij}$ in A, we add the production
(A-rules) $A_{i_1,j_1} \rightarrow w_{i_2} W w_{j_2+\delta}$.
For every non-zero entry $b_{ij}$ in B, we add the
production
(B-rules) $B_{i_1,j_1} \rightarrow w_{i_2+1+\delta} W w_{j_2+2\delta}$.
We need to represent entries of C, so we create
nonterminals $\{C_{p,q} : 1 \le p,q \le n^2\}$ and
productions
(C-rules) $C_{p,q} \rightarrow A_{p,r} B_{r,q}$, $1 \le p,q,r \le n^2$.
Finally, we complete the construction with
productions for the start symbol S:
(S-rules) $S \rightarrow W C_{p,q} W$, $1 \le p,q \le n^2$.
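The construction can be sketched as a function that lists the productions. Everything about the representation is an assumption made for illustration: the name `build_grammar`, terminals written as `("w", l)`, and nonterminals written as tuples such as `("A", p, q)`. One further assumption is flagged in the code: the first index components are taken to range over 0..n² rather than 1..n², because $f_1(i)$ can be 0 for small i.

```python
def build_grammar(a, b, n):
    """Return the productions of G as (left-hand side, right-hand side) pairs.

    a, b: square 0/1 matrices as lists of lists; entry (i, j) of the text is a[i-1][j-1].
    n:    the encoding base, ceil(m ** (1/3)) in the text.
    """
    m = len(a)
    delta = n + 2

    def f1(i):
        return i // n

    def f2(i):
        return (i % n) + 2

    rules = []
    # W-rules: W -> w_l W | w_l, so W derives any non-empty terminal string.
    for l in range(1, 3 * n + 7):
        rules.append(("W", (("w", l), "W")))
        rules.append(("W", (("w", l),)))
    # A-rules and B-rules: one production per non-zero matrix entry.
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            if a[i - 1][j - 1]:
                rules.append((("A", f1(i), f1(j)),
                              (("w", f2(i)), "W", ("w", f2(j) + delta))))
            if b[i - 1][j - 1]:
                rules.append((("B", f1(i), f1(j)),
                              (("w", f2(i) + 1 + delta), "W", ("w", f2(j) + 2 * delta))))
    # Assumption: first components range over 0..n^2, since f1(i) can be 0.
    parts = range(0, n * n + 1)
    for p in parts:
        for q in parts:
            # S-rules: S -> W C_{p,q} W.
            rules.append(("S", ("W", ("C", p, q), "W")))
            for r in parts:
                # C-rules: C_{p,q} -> A_{p,r} B_{r,q}.
                rules.append((("C", p, q), (("A", p, r), ("B", r, q))))
    return rules


A = [[1, 0], [0, 0]]
B = [[0, 1], [0, 0]]
grammar = build_grammar(A, B, n=2)
print(len(grammar))
```

Only the A-rules and B-rules depend on the matrix entries; the W-, C-, and S-rules depend on n alone.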
We now prove the following result about the 
grammar and string we have just described. 
Theorem 1 For $1 \le i,j \le m$, the entry $c_{ij}$
in C is non-zero if and only if $C_{i_1,j_1}$ c-derives
$w_{i_2}^{j_2+2\delta}$.
Proof. Fix i and j.
Let us prove the "only if" direction first.
Thus, suppose $c_{ij} = 1$. Then there exists a k
such that $a_{ik} = b_{kj} = 1$. Figure 1 sketches how
$C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$.
Claim 1 $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$.
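The production $C_{i_1,j_1} \rightarrow A_{i_1,k_1} B_{k_1,j_1}$ is one of
the C-rules in our grammar. Since $a_{ik} = 1$,
$A_{i_1,k_1} \rightarrow w_{i_2} W w_{k_2+\delta}$ is one of our A-rules, and
since $b_{kj} = 1$, $B_{k_1,j_1} \rightarrow w_{k_2+1+\delta} W w_{j_2+2\delta}$ is
one of our B-rules. Finally, since $i_2 + 1 \le (k_2 +
\delta) - 1$ and $(k_2 + 1 + \delta) + 1 \le (j_2 + 2\delta) - 1$,
we have $W \Rightarrow^* w_{i_2+1}^{k_2+\delta-1}$ and $W \Rightarrow^* w_{k_2+2+\delta}^{j_2+2\delta-1}$,
since both substrings are of length at least one.
Therefore,
$$C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* \underbrace{w_{i_2} W w_{k_2+\delta}}_{\text{derived by } A_{i_1,k_1}} \underbrace{w_{k_2+1+\delta} W w_{j_2+2\delta}}_{\text{derived by } B_{k_1,j_1}} \Rightarrow^* w_{i_2}^{j_2+2\delta},$$
and Claim 1 follows.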
Claim 2 $S \Rightarrow^* w_1^{i_2-1} C_{i_1,j_1} w_{j_2+2\delta+1}^{3n+6}$.
This claim is essentially trivial, since by
the definition of the S-rules, we know that
$S \Rightarrow^* W C_{i_1,j_1} W$. We need only show that neither
$w_1^{i_2-1}$ nor $w_{j_2+2\delta+1}^{3n+6}$ is the empty string (and
hence can be derived by W); since $1 \le i_2 - 1$
and $j_2 + 2\delta + 1 \le 3n + 6$, the claim holds.
Claims 1 and 2 together prove that $C_{i_1,j_1}$
c-derives $w_{i_2}^{j_2+2\delta}$, as required.²
² This proof would have been simpler if we had allowed
W to derive the empty string. However, we avoid
epsilon-productions in order to facilitate the conversion
to Chomsky normal form, discussed later.
[Figure 1: Schematic of the derivation process when $a_{ik} = b_{kj} = 1$. S expands to $W\, C_{i_1,j_1}\, W$ over the string $w_1 \cdots w_{i_2} \cdots w_{k_2+\delta} w_{k_2+1+\delta} \cdots w_{j_2+2\delta} \cdots w_{3n+6}$, which is partitioned into x, y, and z. The substrings derived by $A_{i_1,k_1}$ and $B_{k_1,j_1}$ lie right next to each other.]
Next we prove the "if" direction. Suppose
$C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, which by definition
means $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. Then there must be
a derivation resulting from the application of a
C-rule as follows:
$$C_{i_1,j_1} \Rightarrow A_{i_1,k'} B_{k',j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$$
for some $k'$. It must be the case that for some
$\ell$, $A_{i_1,k'} \Rightarrow^* w_{i_2}^{\ell}$ and $B_{k',j_1} \Rightarrow^* w_{\ell+1}^{j_2+2\delta}$. But
then we must have the productions $A_{i_1,k'} \rightarrow
w_{i_2} W w_\ell$ and $B_{k',j_1} \rightarrow w_{\ell+1} W w_{j_2+2\delta}$ with $\ell =
k'' + \delta$ for some $k''$. But we can only have such
productions if there exists a number k such that
$k_1 = k'$, $k_2 = k''$, $a_{ik} = 1$, and $b_{kj} = 1$; and this
implies that $c_{ij} = 1$. ∎
Examination of the proof reveals that we have 
also shown the following two corollaries. 
Corollary 1 For $1 \le i,j \le m$, $c_{ij} = 1$ if and
only if $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$.
Corollary 2 $S \Rightarrow^* w$ if and only if C is not
the all-zeroes matrix.
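Corollary 1 suggests a direct way to sanity-check the construction on small inputs. The sketch below is not a general parser. It is a hand-specialised check that exploits the shape of the A-, B-, and C-rules: because all terminals of w are distinct, a rule with right-hand side $w_s W w_e$ derives exactly the substring from position s to position e, provided at least one symbol lies strictly between them. The name `product_via_grammar` and the tuple encoding of rules are assumptions made for illustration.

```python
def product_via_grammar(a, b):
    """Boolean product of square 0/1 matrices, read off the A-, B- and C-rules of G."""
    m = len(a)
    n = 1
    while n ** 3 < m:  # n = ceil(m ** (1/3)), without floating point
        n += 1
    delta = n + 2

    def f1(i):
        return i // n

    def f2(i):
        return (i % n) + 2

    idx = range(1, m + 1)
    # Each rule X_{p,q} -> w_s W w_e is stored as the tuple (p, q, s, e).
    a_rules = {(f1(i), f1(k), f2(i), f2(k) + delta)
               for i in idx for k in idx if a[i - 1][k - 1]}
    b_rules = {(f1(k), f1(j), f2(k) + 1 + delta, f2(j) + 2 * delta)
               for k in idx for j in idx if b[k - 1][j - 1]}

    def c_derives(p, q, start, end):
        """Does C_{p,q} derive w_start .. w_end via some C_{p,q} -> A_{p,r} B_{r,q}?"""
        for (ap, r, s, e) in a_rules:
            # w_s W w_e covers w_s .. w_e only if W has at least one symbol to derive.
            if ap == p and s == start and e - s >= 2:
                # The B part must start right after the A part and stop at `end`.
                if (r, q, e + 1, end) in b_rules and end - (e + 1) >= 2:
                    return True
        return False

    return [[int(c_derives(f1(i), f1(j), f2(i), f2(j) + 2 * delta)) for j in idx]
            for i in idx]


A = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
B = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print(product_via_grammar(A, B))
```

In this example $a_{13} = 1$ and $b_{21} = 1$ share the same first component of the inner index but differ in the second, so the A-part ends at position 7 and the B-part also starts at position 7 instead of 8. The adjacency test rejects the pair, which is the "check on $k_2$" described in the sketch of the grammar.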
Let us now calculate the size of G. V consists
of $O((n^2)^2) = O(m^{4/3})$ nonterminals. R contains
$O(n)$ W-rules and $O((n^2)^2) = O(m^{4/3})$
S-rules. There are at most $m^2$ A-rules, since
we have an A-rule for each non-zero entry in A;
similarly, there are at most $m^2$ B-rules. And
lastly, there are $(n^2)^3 = O(m^2)$ C-rules. Therefore,
our grammar is of size $O(m^2)$; since G encodes
matrices A and B, it is of optimal size.
3.2 Chomsky normal form 
We would like our results to be true for the 
largest class of parsers possible. Since some 
parsers require the input grammar to be in 
Chomsky normal form (CNF), we therefore wish 
to construct a CNF version $G'$ of G. However,
in order to preserve time bounds, we desire that
$O(|G'|) = O(|G|)$, and we also require that Theorem 1
holds for $G'$ as well as G.
The standard algorithm for converting CFGs 
to CNF can yield a quadratic blow-up in the 
size of the grammar and thus is clearly un- 
satisfactory for our purposes. However, since 
G contains no epsilon-productions or unit pro- 
ductions, it is easy to see that we can convert 
G simply by introducing a small (O(n)) num- 
ber of nonterminals without changing any c- 
derivations for the Cp,q. Thus, from now on we 
will simply assume that G is in CNF. 
3.3 Time bounds 
We are now in a position to prove our relation 
between time bounds for Boolean matrix multi- 
plication and time bounds for CFG parsing. 
Theorem 2 Any c-parser P with running time
$O(T(g)t(N))$ on grammars of size g and
strings of length N can be converted into
a BMM algorithm $M_P$ that runs in time
$O(\max(m^2, T(m^2)t(m^{1/3})))$. In particular, if P
takes time $O(gN^{3-\epsilon})$, then $M_P$ runs in time
$O(m^{3-\epsilon/3})$.
Proof. $M_P$ acts as follows. Given two Boolean
$m \times m$ matrices A and B, it constructs G and
w as described above. It feeds G and w to P,
which outputs $\mathcal{F}_{G,w}$. To compute the product
matrix C, $M_P$ queries for each i and j,
$1 \le i,j \le m$, whether $C_{i_1,j_1}$ derives $w_{i_2}^{j_2+2\delta}$
(we do not need to ask whether $C_{i_1,j_1}$ c-derives
$w_{i_2}^{j_2+2\delta}$ because of corollary 1), setting $c_{ij}$
appropriately. By definition of c-parsers, each such
query takes constant time. Let us compute the
running time of $M_P$. It takes $O(m^2)$ time to
read the input matrices. Since G is of size
$O(m^2)$ and $|w| = O(m^{1/3})$, it takes $O(m^2)$ time
to build the input to P, which then computes
$\mathcal{F}_{G,w}$ in time $O(T(m^2)t(m^{1/3}))$. Retrieving C
takes $O(m^2)$. So the total time spent by $M_P$ is
$O(\max(m^2, T(m^2)t(m^{1/3})))$, as was claimed.
In the case where $T(g) = g$ and $t(N) = N^{3-\epsilon}$,
$M_P$ has a running time of $O(m^2 (m^{1/3})^{3-\epsilon}) =
O(m^{2+1-\epsilon/3}) = O(m^{3-\epsilon/3})$. ∎
The case in which P takes time linear in the 
grammar size is of the most interest, since in 
natural language processing applications, the 
grammar tends to be far larger than the strings 
to be parsed. Observe that theorem 2 translates
the running time of the standard CFG
parsers, $O(gN^3)$, into the running time of the
standard BMM algorithm, $O(m^3)$. Also, a
c-parser with running time $O(gN^{2.43})$ would yield
a matrix multiplication algorithm rivalling
Strassen's, and a c-parser with running time
better than $O(gN^{1.12})$ could be converted into
a BMM method faster than that of Coppersmith and
Winograd. As per the discussion above, even if
such parsers exist, they would in all likelihood
not be very practical. Finally, we note that if
a lower bound on BMM of the form $\Omega(m^{3-\alpha})$
were found, then we would have an immediate
lower bound of $\Omega(N^{3-3\alpha})$ on c-parsers running
in time linear in g.
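The exponent arithmetic behind these comparisons is easy to reproduce. The sketch below applies Theorem 2 with T(g) = g: a c-parser running in $O(gN^x)$ gives a BMM algorithm running in $O(m^{2+x/3})$. The function name `bmm_exponent` is an assumption introduced here, and exact rational arithmetic is used only to avoid floating-point rounding.

```python
from fractions import Fraction


def bmm_exponent(parser_exponent):
    """Exponent e with M_P in O(m**e), for a c-parser in O(g * N**parser_exponent).

    Theorem 2 with T(g) = g and t(N) = N**x gives max(m^2, m^2 * m^(x/3)).
    Pass the exponent as a string or integer so that Fraction parses it exactly.
    """
    return max(Fraction(2), 2 + Fraction(parser_exponent) / 3)


for x in ("3", "2.43", "1.12"):
    print(x, "->", float(bmm_exponent(x)))
```

A cubic parser maps to the cubic BMM bound, an exponent of 2.43 maps to Strassen's 2.81, and an exponent of 1.12 maps to roughly 2.373, just below the 2.376 of Coppersmith and Winograd.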
4 Related results and conclusion 
We have shown that fast practical CFG parsing 
algorithms yield fast practical BMM algorithms. 
Given that fast practical BMM algorithms are 
unlikely to exist, we have established a limita- 
tion on practical CFG parsing. 
Valiant (personal communication) notes that 
there is a reduction of $m \times m$ Boolean matrix
multiplication checking to context-free recognition
of strings of length $m^2$; this reduction
is alluded to in a footnote of a paper
by Harrison and Havel (1974). However, this
reduction converts a parser running in time
$O(|w|^{1.5})$ to a BMM checking algorithm running
in time $O(m^3)$ (the running time of the
standard multiplication method), whereas our 
result says that sub-cubic practical parsers are 
quite unlikely; thus, our result is quite a bit 
stronger. 
Seiferas (1986) gives a simple proof of
an $\Omega(N^2/\log N)$ lower bound (originally due to
Gallaire (1969)) for the problem of on-line linear
CFL recognition by multitape Turing machines.
However, his results concern on-line
recognition, which is a harder problem than 
parsing, and so do not apply to the general off- 
line parsing case. 
Finally, we recall Valiant's reduction of 
CFG parsing to boolean matrix multiplication 
(Valiant, 1975); it is rather pleasing to have the 
reduction cycle completed. 
5 Acknowledgments 
I thank Joshua Goodman, Rebecca Hwa, Jon 
Kleinberg, and Stuart Shieber for many helpful 
comments and conversations. Thanks to Les 
Valiant for pointing out the "folklore" reduc- 
tion. This material is based upon work sup- 
ported in part by the National Science Foun- 
dation under Grant No. IRI-9350192. I also 
gratefully acknowledge partial support from 
an NSF Graduate Fellowship and an AT&T 
GRPW/ALFP grant. Finally, thanks to Gior- 
gio Satta, who mailed me a preprint of his 
BMM/TAG paper several years ago. 

References 
Arlazarov, V. L., E. A. Dinic, M. A. Kronrod, and
I. A. Faradžev. 1970. On economical construction
of the transitive closure of an oriented graph.
Soviet Math. Dokl., 11:1209-1210. English trans- 
lation of the Russian article in Dokl. Akad. Nauk 
SSSR 194 (1970). 
Coppersmith, Don and Shmuel Winograd. 1990. 
Matrix multiplication via arithmetic progressions.
Journal of Symbolic Computation, 9(3):251-280. 
Special Issue on Computational Algebraic Com- 
plexity. 
Earley, Jay. 1970. An efficient context-free parsing
algorithm. Communications of the ACM,
13(2):94-102.
Gallaire, Hervé. 1969. Recognition time of context-free
languages by on-line Turing machines. Information
and Control, 15(3):288-295, September.
Graham, Susan L., Michael A. Harrison, and Wal- 
ter L. Ruzzo. 1980. An improved context-free 
recognizer. ACM Transactions on Programming
Languages and Systems, 2(3):415-462. 
Harrison, Michael and Ivan Havel. 1974. On the 
parsing of deterministic languages. Journal of the 
ACM, 21(4):525-548, October. 
Kasami, Tadao. 1965. An efficient recognition and 
syntax algorithm for context-free languages. Sci- 
entific Report AFCRL-65-758, Air Force Cam- 
bridge Research Lab, Bedford, MA. 
Lang, Bernard. 1994. Recognition can be 
harder than parsing. Computational Intelligence, 
10(4):486-494, November. 
Satta, Giorgio. 1994. Tree-adjoining grammar pars- 
ing and boolean matrix multiplication. Computa- 
tional Linguistics, 20(2):173-191, June. 
Seiferas, Joel. 1986. A simplified lower bound 
for context-free-language recognition. Informa- 
tion and Control, 69:255-260. 
Strassen, Volker. 1969. Gaussian elimination is not 
optimal. Numerische Mathematik, 14(3):354-356. 
Strassen, Volker. 1990. Algebraic complexity the- 
ory. In Jan van Leeuwen, editor, Handbook of 
Theoretical Computer Science, volume A. Elsevier 
Science Publishers, chapter 11, pages 633-672. 
Valiant, Leslie G. 1975. General context-free recog- 
nition in less than cubic time. Journal of Com- 
puter and System Sciences, 10:308-315. 
Younger, Daniel H. 1967. Recognition and parsing 
of context-free languages in time $n^3$. Information
and Control, 10(2):189-208. 
