<?xml version="1.0" standalone="yes"?> <Paper uid="P97-1002"> <Title>Fast Context-Free Parsing Requires Fast Boolean Matrix Multiplication</Title> <Section position="3" start_page="0" end_page="0" type="intro"> <SectionTitle> 2 Definitions </SectionTitle> <Paragraph position="0"> A Boolean matrix is a matrix with entries from the set $\{0, 1\}$. A Boolean matrix multiplication algorithm takes as input two $m \times m$ Boolean matrices $A$ and $B$ and returns their Boolean product $A \times B$, which is the $m \times m$ Boolean matrix $C$ whose entries $c_{ij}$ are defined by $c_{ij} = \bigvee_{k=1}^{m} (a_{ik} \wedge b_{kj})$.</Paragraph> <Paragraph position="2"> That is, $c_{ij} = 1$ if and only if there exists a number $k$, $1 \le k \le m$, such that $a_{ik} = b_{kj} = 1$.</Paragraph> <Paragraph position="3"> We use the usual definition of a context-free grammar (CFG) as a 4-tuple $G = (\Sigma, V, R, S)$, where $\Sigma$ is the set of terminals, $V$ is the set of nonterminals, $R$ is the set of productions, and $S \in V$ is the start symbol.</Paragraph> <Paragraph position="5"> Given a string $w = w_1 w_2 \cdots w_N$ in which each $w_i$ is an element of $\Sigma$, we use the notation $w_i^j$ to denote the substring $w_i w_{i+1} \cdots w_{j-1} w_j$. We will be concerned with the notion of c-derivations, which are substring derivations that are consistent with a derivation of an entire string. Intuitively, $A \Rightarrow^{*} w_i^j$ is a c-derivation if it is consistent with at least one parse of $w$.</Paragraph> <Paragraph position="6"> Definition 1 Let $G = (\Sigma, V, R, S)$ be a CFG, and let $w = w_1 w_2 \cdots w_N$, $w_i \in \Sigma$. A nonterminal $A \in V$ c-derives (consistently derives) $w_i^j$ if and only if the following conditions hold: (1) $A \Rightarrow^{*} w_i^j$, and (2) $S \Rightarrow^{*} w_1^{i-1} A w_{j+1}^{N}$.</Paragraph> <Paragraph position="8"> (These conditions together imply that $S \Rightarrow^{*} w$.) We would like our results to apply to all &quot;practical&quot; parsers, but what does it mean for a parser to be practical? 
First, we would like to be able to retrieve constituent information for all possible parses of a string (after all, the recovery of structural information is what distinguishes parsing algorithms from recognition algorithms); such information is very useful for applications like natural language understanding, where multiple interpretations for a sentence may result from different constituent structures. Therefore, practical parsers should keep track of c-derivations. Second, a parser should create an output structure from which information about constituents can be retrieved efficiently; Satta (1994) points out an observation of Lang to the effect that one can consider the input string itself to be a retrieval-inefficient representation of parse information. In short, we require practical parsers to output a representation of the parse forest for a string that allows efficient retrieval of parse information. Lang in fact argues that parsing means exactly the production of a shared forest structure &quot;from which any specific parse can be extracted in time linear with the size of the extracted parse tree&quot; (Lang, 1994, p. 487), and Satta (1994) makes this assumption as well.</Paragraph> <Paragraph position="9"> These notions lead us to equate practical parsers with the class of c-parsers, which keep track of c-derivations and may also calculate general substring derivations.</Paragraph> <Paragraph position="10"> Definition 2 A c-parser is an algorithm that takes a CFG $G = (\Sigma, V, R, S)$ and a string $w \in \Sigma^{*}$ as input and produces output $\mathcal{F}_{G,w}$; $\mathcal{F}_{G,w}$ acts as an oracle about parse information, as follows: * If $A$ c-derives $w_i^j$, then $\mathcal{F}_{G,w}(A, i, j)$ = &quot;yes&quot;.</Paragraph> <Paragraph position="11"> * If $A \not\Rightarrow^{*} w_i^j$ (which implies that $A$ does not c-derive $w_i^j$), then $\mathcal{F}_{G,w}(A, i, j)$ = &quot;no&quot;. 
* $\mathcal{F}_{G,w}$ answers queries in constant time.</Paragraph> <Paragraph position="12"> Note that the answer $\mathcal{F}_{G,w}$ gives can be arbitrary if $A \Rightarrow^{*} w_i^j$ but $A$ does not c-derive $w_i^j$. The constant-time constraint encodes the notion that information extraction is efficient; observe that this is a stronger condition than that called for by Lang.</Paragraph> <Paragraph position="13"> We define c-parsers in this way to make the class of c-parsers as broad as possible. If we had changed the first condition to &quot;If $A$ derives ...&quot;, then Earley parsers would be excluded, since they do not keep track of all substring derivations. If we had written the second condition as &quot;If $A$ does not c-derive $w_i^j$, then ...&quot;, then CKY parsers would not be c-parsers, since they keep track of all substring derivations, not just c-derivations. So as it stands, the class of c-parsers includes tabular parsers (e.g. CKY), where $\mathcal{F}_{G,w}$ is the table of substring derivations, and Earley-type parsers, where $\mathcal{F}_{G,w}$ is the chart. Indeed, it includes all of the parsing algorithms mentioned in the introduction, and can be thought of as a formalization of Lang's informal definition of parsing.</Paragraph> </Section> </Paper>