<?xml version="1.0" standalone="yes"?>
<Paper uid="P91-1014">
  <Title>Polynomial Time and Space Shift-Reduce Parsing of Arbitrary Context-free Grammars</Title>
  <Section position="3" start_page="106" end_page="109" type="metho">
    <SectionTitle>
2 The Parser
</SectionTitle>
    <Paragraph position="0"> The parser we propose handles any context-free grammar; the grammar can be ambiguous and need not be in any normal form. The parser is a predictive, bottom-up shift-reduce parser that uses compiled top-down prediction information in the form of tables. Before run-time, a non-deterministic push-down automaton (NPDA) is constructed from a given context-free grammar. The parsing tables encode the finite state control and the moves of the NPDA. At run-time, the NPDA is driven in pseudo-parallel with the help of a chart. We first show the construction of a basic machine, which will be driven non-deterministically.</Paragraph>
    <Paragraph position="1"> In the following, the input string is $w = a_1 \ldots a_n$ and the context-free grammar being considered is</Paragraph>
    <Paragraph position="3"> $G = (\Sigma, NT, P, S)$, where $\Sigma$ is the set of terminal symbols, $NT$ the set of non-terminal symbols, $P$ a set of production rules, and $S$ the start symbol. We will need to refer to the subsequence of the input string $w = a_1 \ldots a_n$ from position $i$ to $j$, $w_{]i,j]}$, which we define as follows: $w_{]i,j]} = a_{i+1} \ldots a_j$ if $i &lt; j$, and $w_{]i,j]} = \varepsilon$ (the empty string) if $i \geq j$. We explain the data-structures used by the parser, the moves of the parser, and how the parsing tables are constructed for the basic NPDA. Then, we study the formal characteristics of the parser.</Paragraph>
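As a small illustration of this indexing convention (a sketch only; the helper name `span` is hypothetical and does not come from the paper), the substring w]i,j] corresponds to a half-open slice when the tokens are stored in a 0-based Python list:

```python
def span(tokens, i, j):
    """Return w]i,j], i.e. the tokens a_{i+1} ... a_j, or [] when i >= j."""
    # With 0-based storage, token a_k sits at index k - 1, so a_{i+1} ... a_j
    # is exactly the slice [i:j]; Python yields [] whenever i >= j.
    return list(tokens[i:j])

w = ["a", "b", "c"]      # a_1 = "a", a_2 = "b", a_3 = "c"
print(span(w, 0, 2))     # the tokens a_1 a_2
print(span(w, 2, 2))     # the empty string
```

Positions therefore name the gaps between tokens, which is why an item can use the same index both as the end of one span and as the start of the next.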
    <Paragraph position="4"> The parser uses two moves: shift and reduce. As in standard shift-reduce parsers, shift moves recognize new terminal symbols and reduce moves perform the recognition of an entire context-free rule. However, in the parser we propose, shift and reduce moves behave differently on rules whose recognition has just started (i.e. rules that have been predicted) than on rules of which some portion has already been recognized. This behavior enables the parser to perform reduce moves efficiently when ambiguity arises.</Paragraph>
    <Section position="1" start_page="107" end_page="108" type="sub_section">
      <SectionTitle>
2.1 Data-Structures and the Moves of
the Parser
</SectionTitle>
      <Paragraph position="0"> The parser collects items into a set called the chart, C. Each item encodes a well formed substring of the input. The parser proceeds until no more items can be added to the chart C.</Paragraph>
      <Paragraph position="1"> An item is defined as a triple $(s, i, j)$, where $s$ is a state in the control of the NPDA and $i$ and $j$ are indices referring to positions in the input string ($i, j \in [0, n]$). In an item $(s, i, j)$, $j$ corresponds to the current position in the input string and $i$ is a position in the input which will facilitate the reduce move.</Paragraph>
      <Paragraph position="2"> A dotted rule of a context-free grammar $G$ is defined as a production of $G$ associated with a dot at some position of the right hand side: $A \to \alpha \bullet \beta$ with $A \to \alpha\beta \in P$.</Paragraph>
      <Paragraph position="3"> We distinguish two kinds of dotted rules: kernel dotted rules, which are of the form $A \to \alpha \bullet \beta$ with $\alpha$ non-empty, and non-kernel dotted rules, which have the dot at the leftmost position in the right hand side ($A \to \bullet \beta$). As we will see, non-kernel dotted rules correspond to the predictive component of the parser.</Paragraph>
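The distinction between kernel and non-kernel dotted rules can be sketched as a small data structure. This is only an illustration under our own assumptions: the class name `DottedRule` and its methods are hypothetical, and the dot is encoded as the number of right-hand-side symbols that precede it.

```python
from typing import NamedTuple, Tuple

class DottedRule(NamedTuple):
    lhs: str                # the non-terminal A
    rhs: Tuple[str, ...]    # the right hand side, alpha followed by beta
    dot: int                # number of symbols to the left of the dot

    def is_kernel(self):
        # Kernel: the part before the dot (alpha) is non-empty.
        return self.dot > 0

    def next_symbol(self):
        # Symbol right after the dot, or None when the rule is complete.
        return self.rhs[self.dot] if self.dot != len(self.rhs) else None

    def __str__(self):
        parts = list(self.rhs[:self.dot]) + ["."] + list(self.rhs[self.dot:])
        return self.lhs + " -> " + " ".join(parts)

predicted = DottedRule("S", ("a", "S"), 0)   # non-kernel: S -> . a S
partial = DottedRule("S", ("a", "S"), 1)     # kernel:     S -> a . S
print(predicted, predicted.is_kernel())
print(partial, partial.is_kernel())
```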
      <Paragraph position="4"> We will see later that each state $s$ of the NPDA corresponds to a set of dotted rules for the grammar $G$. The set of all possible states in the control of the NPDA is written $\mathcal{S}$. Section 2.2 explains how the states are constructed.</Paragraph>
      <Paragraph position="5"> The algorithm maintains the following property (which guarantees its soundness): if an item $(s, i, j)$ is in the chart $C$, then for all dotted rules $A \to \alpha \bullet \beta \in s$ the following is satisfied:  (i) if $\alpha \in (\Sigma \cup NT)^+$, then $\exists \gamma \in (NT \cup \Sigma)^*$ such that $S \Rightarrow^* w_{]0,i]} A \gamma$ and $\alpha \Rightarrow^* w_{]i,j]}$; (ii) if $\alpha$ is the empty string, then $\exists \gamma \in (NT \cup \Sigma)^*$  such that $S \Rightarrow^* w_{]0,j]} A \gamma$.</Paragraph>
      <Paragraph position="6"> The parser uses three tables to determine which move(s) to perform: an action table, ACTION, and two goto tables, the kernel goto table, GOTOk, and the non-kernel goto table, GOTOnk.</Paragraph>
      <Paragraph position="7"> The goto tables are accessed by a state and a non-terminal symbol. They each contain a set of states:</Paragraph>
      <Paragraph position="9"> $r, r', s \in \mathcal{S}$, $X \in NT$. The use of these tables is explained below.</Paragraph>
      <Paragraph position="10"> The action table is accessed by a state and a terminal symbol. It contains a set of actions. Given an item $(s, i, j)$, the possible actions are determined by the content of ACTION($s, a_{j+1}$), where $a_{j+1}$ is the $(j+1)$-th input token. The possible actions contained in ACTION($s, a_{j+1}$) are the following: * KERNEL SHIFT $s'$ (ksh($s'$) for short), for $s' \in \mathcal{S}$. A new token is recognized in a kernel dotted rule $A \to \alpha \bullet a\beta$ and a push move is performed.</Paragraph>
      <Paragraph position="11"> The item $(s', i, j+1)$ is added to the chart, since $\alpha a$ spans in this case $w_{]i,j+1]}$.</Paragraph>
      <Paragraph position="12"> * NON-KERNEL SHIFT $s'$ (nksh($s'$) for short), for $s' \in \mathcal{S}$. A new token is recognized in a non-kernel dotted rule of the form $A \to \bullet a\beta$. The</Paragraph>
      <Paragraph position="14"> item $(s', j, j+1)$ is added to the chart, since $a$ spans in this case $w_{]j,j+1]}$.</Paragraph>
      <Paragraph position="16"> * REDUCE $X \to \beta$ (red($X \to \beta$) for short), for $X \to \beta \in P$. The context-free rule $X \to \beta$ has been totally recognized. The rule spans the substring $a_{i+1} \ldots a_j$. For all items in the chart of the form $(s', k, i)$, perform the following two steps: - for all $r_1 \in$ GOTOk($s', X$), add the item $(r_1, k, j)$ to the chart. In this case, a dotted rule of the form $A \to \alpha \bullet X\beta$ is combined with $X \to \beta \bullet$ to form $A \to \alpha X \bullet \beta$; since $\alpha$ spans $w_{]k,i]}$ and $X$ spans $w_{]i,j]}$, $\alpha X$ spans $w_{]k,j]}$.</Paragraph>
      <Paragraph position="17">  - for all $r_2 \in$ GOTOnk($s', X$), add the item $(r_2, i, j)$ to the chart. In this case, a dotted rule of the form $A \to \bullet X\beta$ is combined with $X \to \beta \bullet$ to form $A \to X \bullet \beta$; in this case, $X$ spans $w_{]i,j]}$. The following three operations are performed until no more items can be added to the chart $C$: (1) KERNEL SHIFT: if $(s, i, j) \in C$ and if ksh($s'$) $\in$ ACTION($s, a_{j+1}$), then $(s', i, j+1)$ is added to $C$.</Paragraph>
      <Paragraph position="18"> (2) NON-KERNEL SHIFT: if $(s, i, j) \in C$ and if nksh($s'$) $\in$ ACTION($s, a_{j+1}$), then $(s', j, j+1)$ is added to $C$.</Paragraph>
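      <Paragraph position="19"> (3) REDUCE: if $(s, i, j) \in C$, then for all $X \to \beta$ such that red($X \to \beta$) $\in$ ACTION($s, a_{j+1}$) and for all $(s', k, i) \in C$: for all $r_1 \in$ GOTOk($s', X$), $(r_1, k, j)$ is added to $C$; for all $r_2 \in$ GOTOnk($s', X$), $(r_2, i, j)$ is added to $C$.</Paragraph>
      <Paragraph position="21"> In the above algorithm, non-determinism arises from multiple entries in ACTION($s, a$) and also from the fact that GOTOk($s, X$) and GOTOnk($s, X$) contain a set of states.</Paragraph>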
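The three operations can be sketched as a naive fixpoint loop over the chart. This is a minimal illustration rather than the authors' implementation, and several details are our own assumptions: the tables are plain dictionaries keyed by (state, symbol), actions are tuples whose first field is "ksh", "nksh" or "red", the chart is initialised with the item (start, 0, 0), the input is padded with an end marker "$", and a string is accepted when some final state f yields an item (f, 0, n). The tables below were built by hand for the hypothetical toy grammar S -> a S | a.

```python
END = "$"  # end marker appended to the input (assumed spelling)

def recognize(tokens, action, goto_k, goto_nk, start, finals):
    padded = list(tokens) + [END]     # padded[j] is the token a_{j+1}
    n = len(tokens)
    chart = {(start, 0, 0)}           # assumed initial item
    changed = True
    while changed:                    # repeat until no item can be added
        changed = False
        for (s, i, j) in list(chart):
            new_items = set()
            for act in action.get((s, padded[j]), ()):
                if act[0] == "ksh":           # (1) kernel shift
                    new_items.add((act[1], i, j + 1))
                elif act[0] == "nksh":        # (2) non-kernel shift
                    new_items.add((act[1], j, j + 1))
                else:                         # (3) reduce, act = ("red", X, ...)
                    x = act[1]
                    for (s2, k, end) in list(chart):
                        if end == i:          # items of the form (s', k, i)
                            for r1 in goto_k.get((s2, x), ()):
                                new_items.add((r1, k, j))
                            for r2 in goto_nk.get((s2, x), ()):
                                new_items.add((r2, i, j))
            if not new_items.issubset(chart):
                chart.update(new_items)
                changed = True
    # Assumed acceptance test: some final state spans the whole input.
    return any((f, 0, n) in chart for f in finals)

# Hand-built tables for S -> a S | a, with states named 0..3:
# 0 = start state, 1 contains S -> a . S, 2 = {S -> a .}, 3 = {S -> a S .}.
ACTION = {
    (0, "a"): {("nksh", 1), ("nksh", 2)},
    (1, "a"): {("nksh", 1), ("nksh", 2)},
    (2, "a"): {("red", "S")}, (2, END): {("red", "S")},
    (3, "a"): {("red", "S")}, (3, END): {("red", "S")},
}
GOTO_K = {(1, "S"): {3}}
GOTO_NK = {}
print(recognize(["a", "a"], ACTION, GOTO_K, GOTO_NK, 0, {2, 3}))
```

The loop rescans the whole chart on every pass, which keeps the sketch short but does not achieve the complexity bounds discussed in Section 3; those require indexed arrays for the shift and reduce moves.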
    </Section>
    <Section position="2" start_page="108" end_page="109" type="sub_section">
      <SectionTitle>
2.2 Construction of the Parsing Tables
</SectionTitle>
      <Paragraph position="0"> We shall give an LR(0)-like method for constructing the parsing tables corresponding to the basic NPDA.</Paragraph>
      <Paragraph position="1"> Several other methods (such as LR(k)-like, SLR(k)-like) can also be used for constructing the parsing tables and are described in (Schabes, 1991).</Paragraph>
      <Paragraph position="2"> To construct the LR(0)-like finite state control for the basic non-deterministic push-down automaton that the parser simulates, we define three functions, closure, gotok and gotonk.</Paragraph>
      <Paragraph position="3"> If $s$ is a state, then closure($s$) is the state constructed from $s$ by the two rules: (i) Initially, every dotted rule in $s$ is added to closure($s$); (ii) If $A \to \alpha \bullet B\beta$ is in closure($s$) and $B \to \gamma$ is a production, then add the dotted rule $B \to \bullet \gamma$ to closure($s$) (if it is not already there). This rule is applied until no more new dotted rules can be added to closure($s$).</Paragraph>
      <Paragraph position="4"> If s is a state and if X is a non-terminal or terminal symbol, gotok(s,X) and gotonk(s,X) are the set of states defined as follows:</Paragraph>
      <Paragraph position="6"> The goto functions we define differ from the one defined for the LR(0) construction in two ways: first, we have distinguished transitions on symbols from kernel items and non-kernel items; second, each state in gotok($s, X$) and gotonk($s, X$) contains exactly one kernel item, whereas for the LR(0) construction they may contain more than one.</Paragraph>
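The closure rule and the two goto functions can be sketched as follows. The formal definitions of gotok and gotonk are not reproduced in the text above, so this sketch assumes a reading suggested by the description of the reduce move and by the remark that each resulting state has exactly one kernel item: gotok(s, X) collects closure({A -> alpha X . beta}) for every kernel rule A -> alpha . X beta in s, and gotonk(s, X) collects closure({A -> X . beta}) for every non-kernel rule A -> . X beta in s. Dotted rules are encoded as (lhs, rhs, dot) triples, and the toy grammar is hypothetical.

```python
def closure(items, productions):
    """Close a set of dotted rules (lhs, rhs, dot) under prediction."""
    result = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot == len(rhs):
            continue                       # dot at the far right: nothing to predict
        for (b, gamma) in productions:
            predicted = (b, gamma, 0)      # the dotted rule B -> . gamma
            if b == rhs[dot] and predicted not in result:
                result.add(predicted)
                todo.append(predicted)
    return frozenset(result)               # hashable, so states can be stored in sets

def goto_k(state, x, productions):
    """Transitions on x out of kernel dotted rules A -> alpha . x beta."""
    return {closure({(a, rhs, dot + 1)}, productions)
            for (a, rhs, dot) in state
            if dot > 0 and dot != len(rhs) and rhs[dot] == x}

def goto_nk(state, x, productions):
    """Transitions on x out of non-kernel dotted rules A -> . x beta."""
    return {closure({(a, rhs, 1)}, productions)
            for (a, rhs, dot) in state
            if dot == 0 and len(rhs) > 0 and rhs[0] == x}

# Toy grammar (hypothetical): S -> a S | a
P = [("S", ("a", "S")), ("S", ("a",))]
start = closure({(lhs, rhs, 0) for (lhs, rhs) in P if lhs == "S"}, P)
for state in goto_nk(start, "a", P):
    print(sorted(state))
```

On this grammar the start state has two non-kernel transitions on a, one per production, which is where the non-determinism of the automaton comes from. Empty productions are not exercised by the example.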
      <Paragraph position="7"> We are now ready to compute the set of states $\mathcal{S}$ defining the finite state control of the parser.</Paragraph>
      <Paragraph position="8"> The set of states is constructed as follows (SET OF STATES CONSTRUCTION):</Paragraph>
      <Paragraph position="10"> until no more states can be added to $\mathcal{S}$ end PARSING TABLES. Now we construct the LR(0)  parsing tables ACTION, GOTOk and GOTOnk from the finite state control constructed above. Given a context-free grammar $G$, we construct $\mathcal{S}$, the set of states for $G$, with the procedure given above. We construct the action table ACTION and the goto tables using the following algorithm.</Paragraph>
      <Paragraph position="11"> begin (CONSTRUCTION OF THE PARSING TABLES) Input: A context-free grammar $G = (\Sigma, NT, P, S)$.</Paragraph>
      <Paragraph position="12"> Output: The parsing tables ACTION, GOTOk and GOTOnk for $G$, the start state start and the set of final states $\mathcal{F}$.</Paragraph>
      <Paragraph position="13">  (i) for all $r \in$ gotok($s_i, a$), add ksh($r$) to ACTION($s_i, a$); (ii) for all $r \in$ gotonk($s_i, a$), add nksh($r$) to ACTION($s_i, a$); (iii) if $A \to \alpha \bullet$ is in $s_i$, then add red($A \to \alpha$)  to ACTION($s_i, a$) for all terminal symbols $a$ and for the end marker \$.</Paragraph>
      <Paragraph position="14"> Step 2. The kernel and non-kernel goto tables for state $s_i$ are determined for all non-terminal symbols $X$ as follows:  (i) $\forall X \in NT$, GOTOk($s_i, X$) := gotok($s_i, X$); (ii) $\forall X \in NT$, GOTOnk($s_i, X$) := gotonk($s_i, X$). Step 3. The start state of the parser is start := closure($\{S \to \bullet\alpha \mid S \to \alpha \in P\}$). Step 4. The set of final states of the parser is $\mathcal{F} := \{s \in \mathcal{S} \mid \exists\, S \to \alpha \in P \text{ s.t. } S \to \alpha \bullet \in s\}$. end (CONSTRUCTION OF THE PARSING TABLES) Appendix A gives an example of a parsing table.</Paragraph>
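The table construction can be sketched as a function that receives the closure and goto functions as parameters. Everything named here is an assumption made for illustration: `build_tables` is a hypothetical name, `closure(items)`, `goto_k(state, x)` and `goto_nk(state, x)` are expected to be supplied by the caller and to return hashable states (for example frozensets of (lhs, rhs, dot) triples), and the set of states is assumed to be obtained by a worklist that starts from the start state and applies both goto functions on every symbol until no new state appears.

```python
END = "$"  # end marker (assumed spelling)

def build_tables(productions, terminals, nonterminals, start_symbol,
                 closure, goto_k, goto_nk):
    # Start state: closure of all rules S -> . alpha.
    start = closure({(lhs, rhs, 0) for (lhs, rhs) in productions
                     if lhs == start_symbol})
    # Set of states: assumed worklist construction from the start state.
    states, todo = {start}, [start]
    while todo:
        s = todo.pop()
        for x in list(terminals) + list(nonterminals):
            for r in goto_k(s, x) | goto_nk(s, x):
                if r not in states:
                    states.add(r)
                    todo.append(r)
    action, table_k, table_nk = {}, {}, {}
    for s in states:
        for a in terminals:
            for r in goto_k(s, a):         # kernel shifts
                action.setdefault((s, a), set()).add(("ksh", r))
            for r in goto_nk(s, a):        # non-kernel shifts
                action.setdefault((s, a), set()).add(("nksh", r))
        for (lhs, rhs, dot) in s:
            if dot == len(rhs):            # completed rule A -> alpha .
                for a in list(terminals) + [END]:
                    action.setdefault((s, a), set()).add(("red", lhs, rhs))
        for x in nonterminals:             # goto tables on non-terminals
            table_k[(s, x)] = goto_k(s, x)
            table_nk[(s, x)] = goto_nk(s, x)
    # Final states: those containing a completed rule for the start symbol.
    finals = {s for s in states
              if any(lhs == start_symbol and dot == len(rhs)
                     for (lhs, rhs, dot) in s)}
    return action, table_k, table_nk, start, finals
```

Passing the three functions in keeps the sketch independent of how dotted rules are closed, so an LR(k)-like or SLR(k)-like variant could in principle be substituted without changing the table-filling loop.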
    </Section>
  </Section>
  <Section position="4" start_page="109" end_page="109" type="metho">
    <SectionTitle>
3 Complexity
</SectionTitle>
    <Paragraph position="0"> The recognizer requires in the worst case $O(|G| n^2)$ space and $O(|G|^2 n^3)$ time; $n$ is the length of the input string, and $|G|$ is the size of the grammar, computed as the sum of the lengths of the right hand sides of all productions: $|G| = \sum_{A \to \alpha \in P} |\alpha|$, where $|\alpha|$ is the length of $\alpha$.</Paragraph>
    <Paragraph position="2"> One of the objectives for the design of the non-deterministic machine was to make sure that it was not possible to reach an exponential number of states, a property without which the machine is doomed to have exponential complexity (Johnson, 1989). First we observe that the number of states of the finite state control of the non-deterministic machine that we constructed in Section 2.2 is proportional to the size of the grammar, $|G|$. By construction, each state (except for the start state) contains exactly one kernel dotted rule. Therefore, the number of states is bounded by the maximum number of kernel rules of the form $A \to \alpha \bullet \beta$ (with $\alpha$ non-empty), and is $O(|G|)$.</Paragraph>
    <Paragraph position="3"> We conclude that the algorithm requires in the worst case $O(|G| n^2)$ space, since the maximum number of items $(s, i, j)$ in the chart is proportional to $|G| n^2$.</Paragraph>
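The counting argument can be checked on a small example. The sketch below uses hypothetical helper names and a hypothetical toy grammar: it computes the grammar size, enumerates the kernel dotted rules (one per non-initial dot position), and derives a crude upper bound on the number of chart items by allowing every state with every pair of positions in [0, n].

```python
def grammar_size(productions):
    """Size of G: sum of the lengths of the right hand sides of all productions."""
    return sum(len(rhs) for (_lhs, rhs) in productions)

def kernel_dotted_rules(productions):
    """All kernel dotted rules (lhs, rhs, dot), i.e. those with dot >= 1."""
    return [(lhs, rhs, dot)
            for (lhs, rhs) in productions
            for dot in range(1, len(rhs) + 1)]

def max_chart_items(productions, n):
    """Crude bound on items (s, i, j): states times positions i times positions j."""
    max_states = len(kernel_dotted_rules(productions)) + 1   # + 1 for the start state
    return max_states * (n + 1) ** 2

toy = [("S", ("a", "S")), ("S", ("a",))]     # S -> a S | a
print(grammar_size(toy), len(kernel_dotted_rules(toy)), max_chart_items(toy, 2))
```

Since each right hand side of length m contributes exactly m kernel dotted rules, the number of kernel rules equals the grammar size, which is what makes the number of states linear in the size of the grammar.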
    <Paragraph position="4"> A close look at the moves of the parser reveals that the reduce move is the most complex one, since it involves a pair of items $(s, i, j)$ and $(s', k, i)$. This move can be instantiated in at most $O(|G|^2 n^3)$ ways, since $i, j, k \in [0, n]$ and there are in the worst case $O(|G|^2)$ pairs of states involved in this move. The parser therefore runs in the worst case in $O(|G|^2 n^3)$ time.</Paragraph>
    <Paragraph position="5"> One should however note that in order to bound the worst case complexity as stated above, arrays similar to the ones needed for Earley's parser must be used to implement the shift and reduce moves efficiently. As for Earley's parser, it can also be shown that the algorithm requires in the worst case $O(|G|^2 n^2)$ time for unambiguous context-free grammars and behaves in linear time on a large class of grammars.</Paragraph>
  </Section>
  <Section position="5" start_page="109" end_page="109" type="metho">
    <SectionTitle>
4 Retrieving a Parse
</SectionTitle>
    <Paragraph position="0"> The algorithm that we described in Section 2 is a recognizer. However, if we include pointers from an item to the other items (to a pair of items for the reduce moves or to an item for the shift moves) which caused it to be placed in the chart, the recognizer can be modified to record all parse trees of the input string.</Paragraph>
    <Paragraph position="1"> The representation is similar to a shared forest.</Paragraph>
    <Paragraph position="2"> The worst case time complexity of the parser is the same as for the recognizer ($O(|G|^2 n^3)$ time) but, as for Earley's parser, the worst case space complexity increases to $O(|G|^2 n^3)$ because of the additional bookkeeping.</Paragraph>
  </Section>
</Paper>