<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1002">
  <Title>Fast Context-Free Parsing Requires Fast Boolean Matrix Multiplication</Title>
  <Section position="4" start_page="0" end_page="13" type="metho">
    <SectionTitle>
3 The reduction
</SectionTitle>
    <Paragraph position="0"> We will reduce BMM to c-parsing, thus proving that any c-parsing algorithm can be used as a Boolean matrix multiplication algorithm.</Paragraph>
    <Paragraph position="1"> Our method, adapted from that of Satta (1994) (who considered the problem of parsing with tree-adjoining grammars), is to encode information about Boolean matrices into a CFG. Thus, given two Boolean matrices, we need to produce a string and a grammar such that parsing the string with respect to the grammar yields output from which information about the product of the two matrices can be easily retrieved.</Paragraph>
    <Paragraph position="2"> We can sketch the behavior of the grammar as follows. Suppose entries $a_{ik}$ in A and $b_{kj}$ in B are both 1. Assume we have some way to break up array indices into two parts so that i can be reconstructed from $i_1$ and $i_2$, j can be reconstructed from $j_1$ and $j_2$, and k can be reconstructed from $k_1$ and $k_2$. (We will describe a way to do this later.) Then, we will have the following derivation (for a quantity $\delta$ to be defined later): $C_{i_1,j_1} \Rightarrow A_{i_1,k_1} B_{k_1,j_1} \Rightarrow^* w_{i_2} \cdots w_{k_2+\delta}\, w_{k_2+\delta+1} \cdots w_{j_2+2\delta}$, where $w_{i_2} \cdots w_{k_2+\delta}$ is derived by $A_{i_1,k_1}$ and $w_{k_2+\delta+1} \cdots w_{j_2+2\delta}$ is derived by $B_{k_1,j_1}$. The key thing to observe is that $C_{i_1,j_1}$ generates two nonterminals whose "inner" indices match, and that these two nonterminals generate substrings that lie exactly next to each other. The "inner" indices constitute a check on $k_1$, and the substring adjacency constitutes a check on $k_2$.</Paragraph>
    <Paragraph position="3"> Let A and B be two Boolean matrices, each of size $m \times m$, and let C be their Boolean matrix product, $C = A \times B$. In the rest of this section, we consider A, B, C, and m to be fixed. Set $n = \lceil m^{1/3} \rceil$, and set $\delta = n+2$. We will be constructing a string of length $3\delta$; we choose $\delta$ slightly larger than n in order to avoid having epsilon-productions in our grammar.</Paragraph>
    <Paragraph position="4"> Recall that $c_{ij}$ is non-zero if and only if we can find a non-zero $a_{ik}$ and a non-zero $b_{\bar{k}j}$ such that $k = \bar{k}$. In essence, we need simply check for the equality of indices $k$ and $\bar{k}$. We will break matrix indices into two parts: our grammar will check whether the first parts of $k$ and $\bar{k}$ are equal, and our string will check whether the second parts are also equal, as we sketched above. Encoding the indices keeps the grammar as small as possible, which will be important for our time bound results.</Paragraph>
    <Paragraph position="5"> Our index encoding function is as follows. Let i be a matrix index, $1 \le i \le m$. Then we define the function $f(i) = (f_1(i), f_2(i))$ by</Paragraph>
    <Paragraph position="7"> Since $f_1$ and $f_2$ are essentially the quotient and remainder of integer division of i by n, we can retrieve i from $(f_1(i), f_2(i))$. As notational shorthand, we use subscripts instead of the functions $f_1$ and $f_2$; that is, we write $i_1$ and $i_2$ for $f_1(i)$ and $f_2(i)$.</Paragraph>
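    <Paragraph> The encoding and its inverse can be sketched in a few lines of Python. The displayed definition of $f$ is not reproduced above, so the formulas used here, $f_1(i) = \lfloor i/n \rfloor$ and $f_2(i) = (i \bmod n) + 2$, are an assumption chosen to match the quotient-and-remainder description and the later requirement that $2 \le i_2 \le n+1$. The names `cube_root_ceiling`, `encode`, and `decode` are hypothetical.

```python
from typing import Tuple


def cube_root_ceiling(m: int) -> int:
    """Return the smallest positive integer n with n**3 >= m."""
    n = 1
    while m > n ** 3:
        n += 1
    return n


def encode(i: int, n: int) -> Tuple[int, int]:
    """Assumed encoding: (quotient, remainder + 2) of i divided by n."""
    return i // n, i % n + 2


def decode(i1: int, i2: int, n: int) -> int:
    """Invert encode by undoing the +2 shift on the remainder."""
    return i1 * n + (i2 - 2)
```

    For example, with $m = 20$ the sketch gives $n = 3$, and `encode(7, 3)` returns `(2, 3)`, from which `decode(2, 3, 3)` recovers 7. The shift by 2 keeps every second part at least 2, which is what later allows a non-empty prefix $w_1 \cdots w_{i_2-1}$.</Paragraph>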
    <Paragraph position="8"> It is now our job to create a CFG $G = (\Sigma, V, R, S)$ and a string w that encode information about A and B and express constraints about their product C. Our plan is to include a set of nonterminals $\{C_{p,q} : 1 \le p, q \le n^2\}$ in V so that $c_{ij} = 1$ if and only if $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$. In section 3.1, we describe a version of G and prove it has this c-derivation property.</Paragraph>
    <Paragraph position="9"> Then, in section 3.2 we explain that G can easily be converted to Chomsky normal form in such a way as to preserve c-derivations.</Paragraph>
    <Paragraph position="10"> We choose the set of terminals to be $\Sigma = \{w_\ell : 1 \le \ell \le 3n+6\}$, and choose the string to be parsed to be $w = w_1 w_2 \cdots w_{3n+6}$.</Paragraph>
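    <Paragraph position="11"> We consider w to be made up of three parts, x, y, and z, each of size $\delta$: $w = xyz$, where $x = w_1 \cdots w_{\delta}$, $y = w_{\delta+1} \cdots w_{2\delta}$, and $z = w_{2\delta+1} \cdots w_{3\delta}$.</Paragraph>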
    <Section position="1" start_page="11" end_page="12" type="sub_section">
      <SectionTitle>
3.1 The grammar
</SectionTitle>
      <Paragraph position="0"> Now we begin building the grammar $G = (\Sigma, V, R, S)$. We start with the nonterminals $V = \{S\}$ and the production set $R = \emptyset$. We add nonterminal W to V for generating arbitrary non-empty substrings of w; thus we need a set of W-rules for this purpose. Finally, we complete the construction with productions for the start symbol S: $S \to W\, C_{p,q}\, W$ for $1 \le p, q \le n^2$.</Paragraph>
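      <Paragraph> The construction can be sketched in Python as follows. Because the displayed rule sets are not reproduced above, the rule shapes here are inferred from the proof of Theorem 1 and from the size count later in this section: A-rules $A_{i_1,k_1} \to w_{i_2} W w_{k_2+\delta}$ for each non-zero $a_{ik}$, B-rules $B_{k_1,j_1} \to w_{k_2+\delta+1} W w_{j_2+2\delta}$ for each non-zero $b_{kj}$, C-rules $C_{p,q} \to A_{p,r} B_{r,q}$, and S-rules $S \to W C_{p,q} W$. The form of the W-rules, the index encoding, and the name `build_grammar` are assumptions of this sketch rather than the paper's definitions.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side symbols)


def build_grammar(a: List[List[int]], b: List[List[int]], n: int) -> List[Rule]:
    """Return the productions for two m x m 0/1 matrices (indices are 1-based)."""
    m = len(a)
    delta = n + 2

    def enc(i: int) -> Tuple[int, int]:
        # Assumed encoding: (quotient, remainder + 2).
        return i // n, i % n + 2

    # First parts that actually occur under the assumed encoding.
    firsts = sorted({enc(i)[0] for i in range(1, m + 1)})
    rules: List[Rule] = []

    # W-rules (assumed form): W derives any non-empty string of terminals.
    for pos in range(1, 3 * delta + 1):
        rules.append(("W", ("w%d" % pos, "W")))
        rules.append(("W", ("w%d" % pos,)))

    # One A-rule per non-zero entry of a, one B-rule per non-zero entry of b.
    for r in range(1, m + 1):
        for c in range(1, m + 1):
            (r1, r2), (c1, c2) = enc(r), enc(c)
            if a[r - 1][c - 1]:
                rhs = ("w%d" % r2, "W", "w%d" % (c2 + delta))
                rules.append(("A%d,%d" % (r1, c1), rhs))
            if b[r - 1][c - 1]:
                rhs = ("w%d" % (r2 + delta + 1), "W", "w%d" % (c2 + 2 * delta))
                rules.append(("B%d,%d" % (r1, c1), rhs))

    # C-rules force the inner first parts to match; S-rules wrap C in context.
    for p in firsts:
        for q in firsts:
            rules.append(("S", ("W", "C%d,%d" % (p, q), "W")))
            for r in firsts:
                rules.append(("C%d,%d" % (p, q), ("A%d,%d" % (p, r), "B%d,%d" % (r, q))))
    return rules
```

      Only the A- and B-rules depend on the matrix entries; the W-, C-, and S-rules depend on m alone.</Paragraph>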
      <Paragraph position="2"> We now prove the following result about the grammar and string we have just described.</Paragraph>
      <Paragraph position="3"> Theorem 1 For $1 \le i, j \le m$, the entry $c_{ij}$ in C is non-zero if and only if $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$. Proof. Fix i and j.</Paragraph>
      <Paragraph position="4"> Let us prove the "only if" direction first. Thus, suppose $c_{ij} = 1$. Then there exists a k such that $a_{ik} = b_{kj} = 1$. Figure 1 sketches how $C_{i_1,j_1}$ then c-derives $w_{i_2}^{j_2+2\delta}$; the argument rests on two claims. Claim 1 is that $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. Claim 2 is that $S \Rightarrow^* w_1^{i_2-1}\, C_{i_1,j_1}\, w_{j_2+2\delta+1}^{3n+6}$. This second claim is essentially trivial, since by the definition of the S-rules, we know that $S \Rightarrow^* W C_{i_1,j_1} W$. We need only show that neither $w_1^{i_2-1}$ nor $w_{j_2+2\delta+1}^{3n+6}$ is the empty string (and hence each can be derived by W); since $1 \le i_2 - 1$ and $j_2 + 2\delta + 1 \le 3n+6$, the claim holds. Claims 1 and 2 together prove that $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, as required. (Footnote 2: This proof would have been simpler if we had allowed W to derive the empty string. However, we avoid epsilon-productions in order to facilitate the conversion to Chomsky normal form, discussed later.) Next we prove the "if" direction. Suppose $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$, which by definition means $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. Then there must be a derivation resulting from the application of a C-rule as follows: $C_{i_1,j_1} \Rightarrow A_{i_1,k'} B_{k',j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$</Paragraph>
      <Paragraph position="5"> Figure 1: The substrings derived by $A_{i_1,k_1}$ and $B_{k_1,j_1}$ lie right next to each other.</Paragraph>
      <Paragraph position="6"> for some $k'$. It must be the case that for some $\ell$, $A_{i_1,k'} \Rightarrow^* w_{i_2}^{\ell}$ and $B_{k',j_1} \Rightarrow^* w_{\ell+1}^{j_2+2\delta}$. But then we must have the productions $A_{i_1,k'} \to w_{i_2} W w_{\ell}$ and $B_{k',j_1} \to w_{\ell+1} W w_{j_2+2\delta}$ with $\ell = k'' + \delta$ for some $k''$. But we can only have such productions if there exists a number k such that $k_1 = k'$, $k_2 = k''$, $a_{ik} = 1$, and $b_{kj} = 1$; and this implies that $c_{ij} = 1$. $\blacksquare$ Examination of the proof reveals that we have also shown the following two corollaries.</Paragraph>
      <Paragraph position="7"> Corollary 1 For $1 \le i, j \le m$, $c_{ij} = 1$ if and only if $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$. Corollary 2 $S \Rightarrow^* w$ if and only if C is not the all-zeroes matrix.</Paragraph>
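      <Paragraph> Corollary 1 suggests a direct sanity check, sketched here in Python. It relies on assumptions that are not confirmed by the text above: the encoding $f_1(i) = \lfloor i/n \rfloor$, $f_2(i) = (i \bmod n) + 2$, and the A-, B-, and C-rule shapes that appear in the proof of Theorem 1. Rather than running a general parser, the hypothetical function `product_via_grammar` decides whether $C_{i_1,j_1} \Rightarrow^* w_{i_2}^{j_2+2\delta}$ by looking for an A-rule and a B-rule with matching inner first parts whose substrings are adjacent, and `naive_product` supplies the reference answer.

```python
from typing import List


def naive_product(a: List[List[int]], b: List[List[int]]) -> List[List[int]]:
    """Reference Boolean product: c[i][j] = OR over k of (a[i][k] AND b[k][j])."""
    m = len(a)
    return [[int(any(a[i][k] and b[k][j] for k in range(m))) for j in range(m)]
            for i in range(m)]


def product_via_grammar(a: List[List[int]], b: List[List[int]], n: int) -> List[List[int]]:
    """Read the product off the (assumed) A- and B-rules, as in Corollary 1."""
    m = len(a)
    delta = n + 2

    def enc(i):
        # Assumed encoding: (quotient, remainder + 2).
        return i // n, i % n + 2

    # A rule X[p,r] -> w_s W w_e is stored as (p, r, s, e). Since e - s >= 2
    # for every rule built here, W always has a non-empty gap to fill, so
    # X[p,r] derives exactly the substring w_s ... w_e of w.
    a_rules = set()
    b_rules = set()
    for r in range(1, m + 1):
        for c in range(1, m + 1):
            (r1, r2), (c1, c2) = enc(r), enc(c)
            if a[r - 1][c - 1]:
                a_rules.add((r1, c1, r2, c2 + delta))
            if b[r - 1][c - 1]:
                b_rules.add((r1, c1, r2 + delta + 1, c2 + 2 * delta))

    c_mat = [[0] * m for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            (i1, i2), (j1, j2) = enc(i), enc(j)
            end = j2 + 2 * delta
            # C[i1,j1] -> A[i1,r] B[r,j1]: A must end where B begins.
            for (p, r, s, e) in a_rules:
                if p == i1 and s == i2 and (r, j1, e + 1, end) in b_rules:
                    c_mat[i - 1][j - 1] = 1
                    break
    return c_mat
```

      The membership test on `b_rules` carries both checks described earlier: the shared value `r` is the check on $k_1$, and the requirement that the B-rule start at position `e + 1` is the check on $k_2$.</Paragraph>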
      <Paragraph position="8"> Let us now calculate the size of G. V consists of $O((n^2)^2) = O(m^{4/3})$ nonterminals. R contains $O(n)$ W-rules and $O((n^2)^2) = O(m^{4/3})$ S-rules. There are at most $m^2$ A-rules, since we have an A-rule for each non-zero entry in A; similarly, there are at most $m^2$ B-rules. And lastly, there are $(n^2)^3 = O(m^2)$ C-rules. Therefore, our grammar is of size $O(m^2)$; since G encodes matrices A and B, it is of optimal size.</Paragraph>
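      <Paragraph> These counts follow from $n = \lceil m^{1/3} \rceil$. As a worked check, the block below also plugs in $m = 1000$; this value is an assumption chosen only to make the arithmetic concrete.

```latex
\begin{gather*}
n = \lceil m^{1/3} \rceil \le m^{1/3} + 1
  \quad\Longrightarrow\quad n^2 = O(m^{2/3}) \\
(n^2)^2 = O(m^{4/3}), \qquad (n^2)^3 = O(m^2) \\
m = 1000:\quad n = 10,\quad (n^2)^2 = 10^4 = m^{4/3},\quad
  (n^2)^3 = 10^6 = m^2,\quad 3n + 6 = 36
\end{gather*}
```

      In this example the C-rules and the worst-case A- and B-rules each number about $10^6$, so they dominate the $10^4$ S-rules and the handful of W-rules over 36 terminals.</Paragraph>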
    </Section>
    <Section position="2" start_page="12" end_page="12" type="sub_section">
      <SectionTitle>
3.2 Chomsky normal form
</SectionTitle>
      <Paragraph position="0"> We would like our results to be true for the largest class of parsers possible. Since some parsers require the input grammar to be in Chomsky normal form (CNF), we therefore wish to construct a CNF version $G'$ of G. However, in order to preserve time bounds, we desire that $O(|G'|) = O(|G|)$, and we also require that Theorem 1 holds for $G'$ as well as G.</Paragraph>
      <Paragraph position="1"> The standard algorithm for converting CFGs to CNF can yield a quadratic blow-up in the size of the grammar and thus is clearly unsatisfactory for our purposes. However, since G contains no epsilon-productions or unit productions, it is easy to see that we can convert G simply by introducing a small ($O(n)$) number of nonterminals without changing any c-derivations for the $C_{p,q}$. Thus, from now on we will simply assume that G is in CNF.</Paragraph>
    </Section>
    <Section position="3" start_page="12" end_page="13" type="sub_section">
      <SectionTitle>
3.3 Time bounds
</SectionTitle>
      <Paragraph position="0"> We are now in a position to prove our relation between time bounds for Boolean matrix multiplication and time bounds for CFG parsing.</Paragraph>
      <Paragraph position="1">  Theorem 2 Any c-parser P with running time $O(T(g)\,t(N))$ on grammars of size g and strings of length N can be converted into a BMM algorithm $M_P$ that runs in time $O(\max(m^2, T(m^2)\,t(m^{1/3})))$. In particular, if P takes time $O(gN^{3-\epsilon})$, then $M_P$ runs in time $O(m^{3-\epsilon/3})$.</Paragraph>
      <Paragraph position="2"> Proof. $M_P$ acts as follows. Given two Boolean $m \times m$ matrices A and B, it constructs G and w as described above. It feeds G and w to P, which outputs $\mathcal{F}_{G,w}$. To compute the product matrix C, $M_P$ queries for each i and j, $1 \le i, j \le m$, whether $C_{i_1,j_1}$ derives $w_{i_2}^{j_2+2\delta}$ (we do not need to ask whether $C_{i_1,j_1}$ c-derives $w_{i_2}^{j_2+2\delta}$ because of corollary 1), setting $c_{ij}$ appropriately. By definition of c-parsers, each such query takes constant time. Let us compute the running time of $M_P$. It takes $O(m^2)$ time to read the input matrices. Since G is of size $O(m^2)$ and $|w| = O(m^{1/3})$, it takes $O(m^2)$ time to build the input to P, which then computes $\mathcal{F}_{G,w}$ in time $O(T(m^2)\,t(m^{1/3}))$. Retrieving C takes $O(m^2)$. So the total time spent by $M_P$ is $O(\max(m^2, T(m^2)\,t(m^{1/3})))$, as was claimed.</Paragraph>
      <Paragraph position="3"> In the case where $T(g) = g$ and $t(N) = N^{3-\epsilon}$, $M_P$ has a running time of $O(m^2 (m^{1/3})^{3-\epsilon}) = O(m^{2+1-\epsilon/3}) = O(m^{3-\epsilon/3})$. $\blacksquare$ The case in which P takes time linear in the grammar size is of the most interest, since in natural language processing applications, the grammar tends to be far larger than the strings to be parsed. Observe that theorem 2 translates the running time of the standard CFG parsers, $O(gN^3)$, into the running time of the standard BMM algorithm, $O(m^3)$. Also, a c-parser with running time $O(gN^{2.43})$ would yield a matrix multiplication algorithm rivalling Strassen's, and a c-parser with running time better than $O(gN^{1.12})$ could be converted into a BMM method faster than Coppersmith and Winograd. As per the discussion above, even if such parsers exist, they would in all likelihood not be very practical. Finally, we note that if a lower bound on BMM of the form $\Omega(m^{3-\alpha})$ were found, then we would have an immediate lower bound of $\Omega(N^{3-3\alpha})$ on c-parsers running in time linear in g.</Paragraph>
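      <Paragraph> The exponents quoted above can be checked by solving $3 - \epsilon/3 = \omega$ for the parser exponent $3 - \epsilon$, where $\omega$ denotes the exponent of the matrix multiplication algorithm in question. The values $\omega \approx 2.81$ for Strassen's algorithm and $\omega \approx 2.376$ for Coppersmith and Winograd's are the commonly cited figures and are assumptions here, since the text above does not state them.

```latex
\begin{gather*}
3 - \frac{\epsilon}{3} = \omega
  \iff \epsilon = 9 - 3\omega
  \iff 3 - \epsilon = 3\omega - 6 \\
\text{Strassen: } \omega \approx 2.81
  \;\Rightarrow\; 3 - \epsilon \approx 3(2.81) - 6 = 2.43 \\
\text{Coppersmith and Winograd: } \omega \approx 2.376
  \;\Rightarrow\; 3 - \epsilon \approx 3(2.376) - 6 = 1.128 \\
\text{Lower bound: } \omega \ge 3 - \alpha
  \;\Rightarrow\; 3 - \epsilon \ge 3(3 - \alpha) - 6 = 3 - 3\alpha
\end{gather*}
```

      The factor of 3 arises because the string has length $O(m^{1/3})$, so each unit saved in the exponent of N saves only a third of a unit in the exponent of m.</Paragraph>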
    </Section>
  </Section>
  <Section position="5" start_page="13" end_page="13" type="metho">
    <SectionTitle>
4 Related results and conclusion
</SectionTitle>
    <Paragraph position="0"> We have shown that fast practical CFG parsing algorithms yield fast practical BMM algorithms.</Paragraph>
    <Paragraph position="1"> Given that fast practical BMM algorithms are unlikely to exist, we have established a limitation on practical CFG parsing.</Paragraph>
    <Paragraph position="2"> Valiant (personal communication) notes that there is a reduction of $m \times m$ Boolean matrix multiplication checking to context-free recognition of strings of length $m^2$; this reduction is alluded to in a footnote of a paper by Harrison and Havel (1974). However, this reduction converts a parser running in time $O(|w|^{1.5})$ to a BMM checking algorithm running in time $O(m^3)$ (the running time of the standard multiplication method), whereas our result says that sub-cubic practical parsers are quite unlikely; thus, our result is quite a bit stronger.</Paragraph>
    <Paragraph position="3"> Seiferas (1986) gives a simple proof of an $\Omega(N^2 / \log N)$ lower bound (originally due to Gallaire (1969)) for the problem of on-line linear CFL recognition by multitape Turing machines. However, his results concern on-line recognition, which is a harder problem than parsing, and so do not apply to the general off-line parsing case.</Paragraph>
    <Paragraph position="4"> Finally, we recall Valiant's reduction of CFG parsing to Boolean matrix multiplication (Valiant, 1975); it is rather pleasing to have the reduction cycle completed.</Paragraph>
  </Section>
</Paper>