<?xml version="1.0" standalone="yes"?>
<Paper uid="P97-1002">
  <Title>Fast Context-Free Parsing Requires Fast Boolean Matrix Multiplication</Title>
  <Section position="3" start_page="0" end_page="0" type="intro">
    <SectionTitle>
2 Definitions
</SectionTitle>
    <Paragraph position="0"> A Boolean matrix is a matrix with entries from the set {0, 1}. A Boolean matrix multiplication algorithm takes as input two $m \times m$ Boolean matrices $A$ and $B$ and returns their Boolean product $A \times B$, which is the $m \times m$ Boolean matrix $C$ whose entries $c_{ij}$ are defined by $c_{ij} = \bigvee_{k=1}^{m} (a_{ik} \wedge b_{kj})$.</Paragraph>
    <Paragraph position="2"> That is, $c_{ij} = 1$ if and only if there exists a number $k$, $1 \le k \le m$, such that $a_{ik} = b_{kj} = 1$.</Paragraph>
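A minimal Python sketch of this definition follows. It uses 0-based indices rather than the 1-based indices above, and the function name `boolean_product` is hypothetical. It follows the definition directly and therefore takes O(m^3) time; it is not one of the fast (subcubic) algorithms that the paper is concerned with.

```python
from typing import List

Matrix = List[List[int]]  # entries are 0 or 1


def boolean_product(a: Matrix, b: Matrix) -> Matrix:
    """Return the m x m Boolean product C = A x B (0-based indices)."""
    m = len(a)
    c = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            # c[i][j] = OR over k of (a[i][k] AND b[k][j])
            c[i][j] = int(any(a[i][k] and b[k][j] for k in range(m)))
    return c


print(boolean_product([[1, 0], [1, 1]], [[0, 1], [1, 0]]))  # [[0, 1], [1, 1]]
```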
    <Paragraph position="3"> We use the usual definition of a context-free grammar (CFG) as a 4-tuple $G = (\Sigma, V, R, S)$, where $\Sigma$ is the set of terminals, $V$ is the set of nonterminals, $R$ is the set of productions, and $S \in V$ is the start symbol.</Paragraph>
    <Paragraph position="5"> Given a string $w = w_1 w_2 \cdots w_N$, where each $w_i$ is an element of $\Sigma$, we use the notation $w_i^j$ to denote the substring $w_i w_{i+1} \cdots w_{j-1} w_j$. We will be concerned with the notion of c-derivations, which are substring derivations that are consistent with a derivation of an entire string. Intuitively, $A \Rightarrow^* w_i^j$ is a c-derivation if it is consistent with at least one parse of $w$.</Paragraph>
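The following is a minimal Python sketch of the 4-tuple and of the substring notation. It uses 1-based inclusive indices, as in the text. The class `CFG`, its field names, and the method `is_well_formed` are hypothetical. The checks encode the usual side conditions: Σ and V are disjoint, S is in V, and each production has its left-hand side in V and its right-hand side over Σ ∪ V.

```python
from typing import FrozenSet, NamedTuple, Tuple

# A production A -> X1 ... Xk is stored as (A, (X1, ..., Xk)).
Production = Tuple[str, Tuple[str, ...]]


class CFG(NamedTuple):
    """G = (Sigma, V, R, S); field names are illustrative."""
    sigma: FrozenSet[str]     # Sigma: terminals
    v: FrozenSet[str]         # V: nonterminals
    r: FrozenSet[Production]  # R: productions
    s: str                    # S: start symbol

    def is_well_formed(self) -> bool:
        """Check the usual side conditions on the 4-tuple."""
        symbols = self.sigma.union(self.v)
        return (self.sigma.isdisjoint(self.v)
                and self.s in self.v
                and all(lhs in self.v and set(rhs).issubset(symbols)
                        for lhs, rhs in self.r))


def substring(w: str, i: int, j: int) -> str:
    """Return w_i^j = w_i w_{i+1} ... w_j (1-based, inclusive)."""
    return w[i - 1:j]


G = CFG(sigma=frozenset({"a", "b"}),
        v=frozenset({"S", "A", "B"}),
        r=frozenset({("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))}),
        s="S")
print(G.is_well_formed())     # True
print(substring("ab", 1, 1))  # a
```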
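    <Paragraph position="6"> Definition 1 Let $G = (\Sigma, V, R, S)$ be a CFG, and let $w = w_1 w_2 \cdots w_N$, $w_i \in \Sigma$. A nonterminal $A \in V$ c-derives (consistently derives) $w_i^j$ if and only if the following conditions hold: (1) $A \Rightarrow^* w_i^j$, and (2) $S \Rightarrow^* w_1^{i-1} A w_{j+1}^N$.</Paragraph>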
    <Paragraph position="8"> (These conditions together imply that $S \Rightarrow^* w$.) We would like our results to apply to all &quot;practical&quot; parsers, but what does it mean for a parser to be practical? First, we would like to be able to retrieve constituent information for all possible parses of a string (after all, the recovery of structural information is what distinguishes parsing algorithms from recognition algorithms). Such information is very useful for applications like natural language understanding, where multiple interpretations of a sentence may result from different constituent structures. Therefore, practical parsers should keep track of c-derivations. Second, a parser should create an output structure from which information about constituents can be retrieved efficiently. Satta (1994) points out an observation of Lang to the effect that, without this requirement, one could consider the input string itself to be a (retrieval-inefficient) representation of parse information. In short, we require practical parsers to output a representation of the parse forest for a string that allows efficient retrieval of parse information. Lang in fact argues that parsing means exactly the production of a shared forest structure &quot;from which any specific parse can be extracted in time linear with the size of the extracted parse tree&quot; (Lang, 1994, p. 487), and Satta (1994) makes this assumption as well.</Paragraph>
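Definition 1 can be made concrete with a small Python sketch. For illustration only, it assumes a grammar in Chomsky normal form; the section itself places no such restriction on G. The function name `c_derivations` and the `unary`/`binary` rule encoding are hypothetical. The sketch first runs a CKY-style inside pass to find every item satisfying condition (1). It then walks top-down from (S, 1, N) and keeps only items that lie in some complete parse of w, which is exactly condition (2). This is one way to compute the set, not a method taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Item = Tuple[str, int, int]  # (A, i, j): nonterminal A over w_i^j, 1-based inclusive


def c_derivations(unary: Dict[str, Set[str]],
                  binary: List[Tuple[str, str, str]],
                  start: str, w: str) -> Set[Item]:
    """Return every (A, i, j) such that A c-derives w_i^j.

    Assumes Chomsky normal form: `unary` maps a terminal t to the
    nonterminals A with A -> t, and `binary` lists rules A -> B C.
    """
    n = len(w)
    inside: Set[Item] = set()
    # Inside pass (CKY): items satisfying condition (1), A =>* w_i^j.
    for i in range(1, n + 1):
        for a in unary.get(w[i - 1], set()):
            inside.add((a, i, i))
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for k in range(i, j):  # split into w_i^k and w_{k+1}^j
                for a, b, c in binary:
                    if (b, i, k) in inside and (c, k + 1, j) in inside:
                        inside.add((a, i, j))
    # Top-down pass: keep items that also satisfy condition (2),
    # S =>* w_1^{i-1} A w_{j+1}^N, by expanding only complete parses.
    consistent: Set[Item] = set()
    if (start, 1, n) in inside:
        consistent.add((start, 1, n))
    for length in range(n, 1, -1):  # parents are handled before their children
        for i in range(1, n - length + 2):
            j = i + length - 1
            for k in range(i, j):
                for a, b, c in binary:
                    if ((a, i, j) in consistent and (b, i, k) in inside
                            and (c, k + 1, j) in inside):
                        consistent.add((b, i, k))
                        consistent.add((c, k + 1, j))
    return consistent


# S -> A B, A -> a, B -> b, C -> a: C derives w_1^1 but does not c-derive it.
unary = {"a": {"A", "C"}, "b": {"B"}}
binary = [("S", "A", "B")]
print(sorted(c_derivations(unary, binary, "S", "ab")))
# [('A', 1, 1), ('B', 2, 2), ('S', 1, 2)]
```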
    <Paragraph position="9"> These notions lead us to equate practical parsers with the class of c-parsers, which keep track of c-derivations and may also compute general substring derivations.</Paragraph>
    <Paragraph position="10"> Definition 2 A c-parser is an algorithm that takes a CFG $G = (\Sigma, V, R, S)$ and a string $w \in \Sigma^*$ as input and produces output $\mathcal{F}_{G,w}$; $\mathcal{F}_{G,w}$ acts as an oracle about parse information, as follows:  • If $A$ c-derives $w_i^j$, then $\mathcal{F}_{G,w}(A, i, j) =$ &quot;yes&quot;.</Paragraph>
    <Paragraph position="11"> • If $A \not\Rightarrow^* w_i^j$ (which implies that $A$ does not c-derive $w_i^j$), then $\mathcal{F}_{G,w}(A, i, j) =$ &quot;no&quot;. • $\mathcal{F}_{G,w}$ answers queries in constant time.</Paragraph>
    <Paragraph position="12"> Note that the answer $\mathcal{F}_{G,w}$ gives can be arbitrary if $A \Rightarrow^* w_i^j$ but $A$ does not c-derive $w_i^j$. The constant-time constraint encodes the notion that information extraction is efficient; observe that this is a stronger condition than the one called for by Lang.</Paragraph>
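Definition 2 requires only that F_{G,w} answer each query in constant time; it does not fix a data structure. The Python sketch below shows one hypothetical realization, with the class name `ParseOracle` assumed for illustration. It stores the recorded items (A, i, j) in a hash set, so each query is an expected constant-time membership test. A dense array indexed by nonterminal and positions would instead give worst-case constant time.

```python
from typing import FrozenSet, Iterable, Tuple

Item = Tuple[str, int, int]  # (A, i, j): nonterminal A over w_i^j


class ParseOracle:
    """Hypothetical F_{G,w}: answers "yes" or "no" to queries (A, i, j)."""

    def __init__(self, items: Iterable[Item]) -> None:
        # Items the parser chose to record, e.g. every c-derivation,
        # optionally plus other substring derivations.
        self._items: FrozenSet[Item] = frozenset(items)

    def __call__(self, a: str, i: int, j: int) -> str:
        # One hash-set membership test: expected O(1) per query.
        return "yes" if (a, i, j) in self._items else "no"


F = ParseOracle([("S", 1, 2), ("A", 1, 1), ("B", 2, 2)])
print(F("S", 1, 2))  # yes
print(F("C", 1, 1))  # no
```

Consider the grammar S -> A B, A -> a, B -> b, C -> a with input w = ab. The oracle above records exactly the c-derivations, so it answers "no" for (C, 1, 1). Definition 2 permits this answer: C derives w_1^1 but does not c-derive it, and the definition leaves the answer for that case arbitrary.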
    <Paragraph position="13"> We define c-parsers in this way to make the class of c-parsers as broad as possible. If we had changed the first condition to &quot;If $A$ derives ...&quot;, then Earley parsers would be excluded, since they do not keep track of all substring derivations. If we had written the second condition as &quot;If $A$ does not c-derive $w_i^j$, then ...&quot;, then CKY parsers would not be c-parsers, since they keep track of all substring derivations, not just c-derivations, and so may answer &quot;yes&quot; for substring derivations that are not c-derivations. As it stands, then, the class of c-parsers includes tabular parsers (e.g., CKY), where $\mathcal{F}_{G,w}$ is the table of substring derivations, and Earley-type parsers, where $\mathcal{F}_{G,w}$ is the chart. Indeed, it includes all of the parsing algorithms mentioned in the introduction, and it can be thought of as a formalization of Lang's informal definition of parsing.</Paragraph>
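The two one-sided answer conditions can be checked mechanically. The Python sketch below uses a hypothetical function, `satisfies_definition_2`, to test a candidate oracle against given sets of derivable and c-derivable items. The example is the small grammar S -> A B, A -> a, B -> b, C -> a on w = ab, where C derives w_1^1 without c-deriving it. `cky_table` stands in for a table of all substring derivations. `c_table` is a hypothetical table that records only c-derivations; it is not a model of an actual Earley chart.

```python
from typing import Callable, Set, Tuple

Item = Tuple[str, int, int]              # (A, i, j)
Oracle = Callable[[str, int, int], str]


def satisfies_definition_2(f: Oracle, derivable: Set[Item],
                           c_derivable: Set[Item], queries: Set[Item]) -> bool:
    """Check Definition 2's two answer conditions on the given queries.

    Items in derivable but not in c_derivable may be answered either way.
    """
    for item in queries:
        if item in c_derivable and f(*item) != "yes":
            return False  # a c-derivation must be answered "yes"
        if item not in derivable and f(*item) != "no":
            return False  # a non-derivable item must be answered "no"
    return True


# Grammar S -> A B, A -> a, B -> b, C -> a; input w = "ab".
derivable = {("S", 1, 2), ("A", 1, 1), ("C", 1, 1), ("B", 2, 2)}
c_derivable = {("S", 1, 2), ("A", 1, 1), ("B", 2, 2)}
queries = {(a, i, j) for a in "SABC" for (i, j) in ((1, 1), (1, 2), (2, 2))}


def cky_table(a: str, i: int, j: int) -> str:
    # Records every substring derivation, as a CKY table does.
    return "yes" if (a, i, j) in derivable else "no"


def c_table(a: str, i: int, j: int) -> str:
    # Hypothetical table recording only c-derivations.
    return "yes" if (a, i, j) in c_derivable else "no"


print(satisfies_definition_2(cky_table, derivable, c_derivable, queries))  # True
print(satisfies_definition_2(c_table, derivable, c_derivable, queries))    # True
print(cky_table("C", 1, 1), c_table("C", 1, 1))                           # yes no
```

Both tables pass, which illustrates why the conditions are one-sided. The stricter second condition ("if A does not c-derive w_i^j, answer no") would reject `cky_table` because of ("C", 1, 1). The stricter first condition ("if A derives w_i^j, answer yes") would reject `c_table` for the same item.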
  </Section>
</Paper>