<?xml version="1.0" standalone="yes"?>
<Paper uid="P90-1035">
  <Title>DETERMINISTIC LEFT TO RIGHT PARSING OF TREE ADJOINING LANGUAGES*</Title>
  <Section position="5" start_page="276" end_page="277" type="metho">
    <SectionTitle>
2 Automata Models of TAGs
</SectionTitle>
    <Paragraph position="0"> Before we discuss the Bottom-up Embedded Pushdown Automaton (BEPDA), which we use in our parser, we introduce the Embedded Pushdown Automaton (EPDA). An EPDA is similar to a pushdown automaton (PDA), except that the storage of an EPDA is a sequence of pushdown stores. A move of an EPDA (see Figure 1) allows for the introduction of bounded pushdowns above and below the current top pushdown. Informally, this move can be thought of as corresponding to the adjoining operation in TAGs, with the pushdowns introduced above and below the current pushdown reflecting the tree structure to the left and right of the foot node of the auxiliary tree being adjoined. The spine (the path from the root to the foot node) is left on the previous stack.</Paragraph>
    <Paragraph position="1"> The generalization of a PDA to an EPDA whose storage is a sequence of pushdowns captures the generalization of the nature of the derived trees of a CFG to the nature of the derived trees of a TAG. From Thatcher (1971), we can observe that the path set of a CFG (i.e. the set of all paths from root to leaves in trees derived by a CFG) is a regular set. On the other hand, the path set of a TAG is a CFL. This follows from the nature of the adjoining operation of TAGs, which suggests stacking along the path from the root to a leaf. For example, as we traverse down a path in a tree \(\gamma\) (in Figure 1), if an adjunction, say by \(\beta\), occurs, then the spine of \(\beta\) has to be traversed before we can resume the path in \(\gamma\).</Paragraph>
    <Paragraph position="2"> [Figure 1: labels "left of foot of \(\beta\)", "spine of \(\beta\)", "right of foot of \(\beta\)"]</Paragraph>
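As a rough illustration of the storage described above, the following Python sketch models an EPDA store as a list of pushdowns and applies one wrap-style move to it. The function `epda_move` and its parameters `push`, `below` and `above` are hypothetical names introduced here; states and input are ignored, so this is only a sketch of how bounded pushdowns appear below and above the current top pushdown, not a full EPDA.

```python
def epda_move(store, push, below=(), above=()):
    """Apply one wrap-style move to a sequence of pushdowns.

    store: list of pushdowns, bottom pushdown first; each pushdown is a
           list whose last element is its top symbol.
    push:  symbols that replace the popped top symbol on the current top
           pushdown (the part that stands for the spine).
    below, above: bounded sequences of new pushdowns introduced below and
           above the current top pushdown.
    """
    if not store or not store[-1]:
        raise ValueError("the top pushdown must be non-empty")
    rest, top = store[:-1], list(store[-1])
    top.pop()                 # consume the current top symbol
    top.extend(push)          # spine symbols stay on the same pushdown
    new_store = rest + [list(s) for s in below] + [top] + [list(s) for s in above]
    # An emptied pushdown disappears, so the next one becomes the new top.
    return [s for s in new_store if s]


store = [["Z0"], ["A"]]
store = epda_move(store, push=["B"], below=[["R"]], above=[["L"]])
print(store)   # [['Z0'], ['R'], ['B'], ['L']]
```

Representing the store as a list of lists keeps the "sequence of pushdowns" reading literal: the last list is the top pushdown, and the last element of each list is its top symbol.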
  </Section>
  <Section position="6" start_page="277" end_page="277" type="metho">
    <SectionTitle>
3 Bottom-up Embedded Pushdown Automaton
</SectionTitle>
    <Paragraph position="0"> For any TAG G, an EPDA can be designed such that its moves correspond to a top-down parse of a string generated by G (EPDAs characterize exactly the set of Tree Adjoining Languages; Vijay-Shanker, 1987). If we wish to design a bottom-up parser, say by adopting a shift-reduce parsing strategy, we have to consider the nature of a reduce move of such a parser (i.e. using EPDA storage). This reduce move, for example when applied after completely considering an auxiliary tree, must be allowed to 'remove' some bounded pushdowns above and below some (not necessarily bounded) pushdown.</Paragraph>
    <Paragraph position="1"> Thus (see Figure 2), the reduce move is like the dual of the wrapping move performed by an EPDA.</Paragraph>
    <Paragraph position="2"> Therefore, we introduce the Bottom-up Embedded Pushdown Automaton (BEPDA), whose moves are dual to those of an EPDA. The two moves of a BEPDA are the unwrap move depicted in Figure 2, which is an inverse of the wrap move of an EPDA, and the introduction of new pushdowns on top of the previous pushdown (push move). In an EPDA, when the top pushdown is emptied, the next pushdown automatically becomes the new top pushdown. The inverse of this step is to allow for the introduction of new pushdowns above the previous top pushdown. These are the two moves allowed in a BEPDA; the various steps in our parsers are sequences of one or more such moves.</Paragraph>
    <Paragraph position="3"> Due to space constraints, we do not show the equivalence between BEPDA and EPDA apart from noting that the moves of the two machines are dual of each other.</Paragraph>
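The two BEPDA moves can be pictured with a small Python sketch. The names `bepda_push` and `bepda_unwrap` and all parameters are hypothetical, and the exact shape of the unwrap move (popping some symbols from the surviving pushdown and pushing one new symbol) is an assumption obtained by inverting a wrap move; the formal definition is not given in this section.

```python
def bepda_push(store, symbols):
    """Push move: introduce a new pushdown on top of the previous top."""
    return store + [list(symbols)]


def bepda_unwrap(store, n_above, n_below, n_pop, new_symbol):
    """Unwrap move (assumed form): drop n_above pushdowns above and
    n_below pushdowns below a chosen pushdown, pop n_pop symbols from
    that pushdown and push new_symbol on it. It becomes the new top."""
    if n_above + 1 + n_below > len(store):
        raise ValueError("not enough pushdowns to unwrap")
    mid_index = len(store) - 1 - n_above
    middle = list(store[mid_index])
    if n_pop > len(middle):
        raise ValueError("not enough symbols on the chosen pushdown")
    middle = middle[:len(middle) - n_pop] + [new_symbol]
    return store[:mid_index - n_below] + [middle]


store = bepda_push([["Z0"], ["R"], ["B"]], ["L"])
print(store)   # [['Z0'], ['R'], ['B'], ['L']]
store = bepda_unwrap(store, n_above=1, n_below=1, n_pop=1, new_symbol="A")
print(store)   # [['Z0'], ['A']]
```

The bounds `n_above` and `n_below` are fixed per move, while the chosen pushdown itself may be arbitrarily deep, which mirrors the "bounded pushdowns above and below some (not necessarily bounded) pushdown" wording.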
  </Section>
  <Section position="7" start_page="277" end_page="277" type="metho">
    <SectionTitle>
4 LR Parsing Algorithm
</SectionTitle>
    <Paragraph position="0"> An LR parser consists of an input, an output, a sequence of stacks, a driver program, and a parsing table that has three parts (ACTION, GOTO_right and GOTO_foot). The parsing program is the same for all LR parsers; only the parsing tables change from one grammar to another.</Paragraph>
    <Paragraph position="1"> The parsing program reads characters from the input one character at a time. The program uses the sequence of stacks to store states.</Paragraph>
    <Paragraph position="2"> The parsing table consists of three parts: a parsing action function ACTION and two goto functions GOTO_right and GOTO_foot. The program driving the LR parser first determines the state i currently on top of the top stack and the current input token \(a_r\). Then it consults the ACTION table entry for state i and token \(a_r\). [Footnote 3, on the title of Section 3: The need to use a bottom-up version of an EPDA in LR-style parsing of TAGs was suggested to us by Bernard Lang and David Weir. Their suggestions also played an instrumental role in the definition of the BEPDA, for example the restriction on the moves allowed.]</Paragraph>
    <Paragraph position="3"> [Figure label: read-only input tape] The entry in the action table can have one of the following five values: shift, resume right, reduce root, accept, or error (errors are associated with empty table entries).</Paragraph>
    <Paragraph position="4"> The functions GOTO_right and GOTO_foot take a state i and an auxiliary tree \(\beta\) and produce a state j.</Paragraph>
    <Paragraph position="5"> An example of a parsing table for a grammar generating \(L = \{a^n b^n e c^n d^n \mid n \geq 0\}\) is given in Figure 5. We denote an instantaneous description of the BEPDA by a pair whose first component is the sequence of pushdowns and whose second component is the unexpended input: \((\|t_m \cdots t_1\| \cdots \|s_1 \cdots s_w,\ a_r a_{r+1} \cdots a_n \$)\). In the above sequence of pushdowns, the stacks are piled up from left to right. \(\|\) stands for the bottom of a stack, \(s_w\) is the top element of the top stack, \(s_1\) is the bottom element of the top stack, \(t_1\) is the top element of the bottom stack and \(t_m\) is the bottom element of the bottom stack.</Paragraph>
    <Paragraph position="6"> The initial configuration of the parser is set to \((\|0,\ a_1 \cdots a_n \$)\), where 0 is the start state and \(a_1 \cdots a_n \$\) is the input string to be read, followed by an end marker ($). Suppose the parser reaches the configuration \((\|t_m \cdots t_1\| \cdots \|i_w \cdots i_1 i,\ a_r a_{r+1} \cdots a_n \$)\). The next move of the parser is determined by reading \(a_r\), the current input token, and the state i on top of the sequence of stacks, and then consulting the parsing table entry ACTION[i, \(a_r\)]. The parser keeps applying the move associated with ACTION[i, \(a_r\)] until acceptance or an error occurs. The following moves are possible: (i) ACTION[i, \(a_r\)] = shift state j (sj). The parser executes a push move, entering the configuration \((\|t_m \cdots t_1\| \cdots \|i_w \cdots i_1 i\|j,\ a_{r+1} \cdots a_n \$)\).</Paragraph>
    <Paragraph position="7"> (ii) ACTION[i, \(a_r\)] = resume right of \(\delta\) at address dot (rs\(\delta\)@dot). The parser is coming to the right and below of the node at address dot in \(\delta\), say \(\eta\), on which an auxiliary tree has been adjoined. The information identifying the auxiliary tree is in the sequence of stacks and must be recovered. There are two cases. Case 1: \(\eta\) does not subsume a foot node. Let k be the number of terminal symbols subsumed by \(\eta\). Before applying this move, the current configuration looks like \((\| \cdots \|i_k\| \cdots \|i_1\|i,\ a_r \cdots a_n \$)\). The k top stacks are merged into one stack and the stack \(\|m\) is pushed on top of it, where m = GOTO_foot[\(i_k\), \(\beta\)] for some auxiliary tree \(\beta\) that can be adjoined in \(\delta\) at \(\eta\), and the parser enters the configuration \((\| \cdots \|i_k\|i_{k-1} \cdots i_1 i\|m,\ a_r \cdots a_n \$)\). Case 2: \(\eta\) subsumes the foot node of \(\delta\). Let k (resp. k') be the number of terminal symbols to the right (resp. to the left) of the foot node subsumed by \(\eta\). Before applying this move, the configuration looks like \((\| \cdots \|n_{k'+1}\| \cdots \|n_1\|s_1 \cdots s_l\|i_k\| \cdots \|i_1\|i,\ a_r \cdots a_n \$)\). The k' stacks below the (k + 2)th stack from the top, as well as the k + 1 top stacks, are rewritten onto the (k + 2)th stack and the stack \(\|m\) is pushed on top of it, where m = GOTO_foot[\(n_{k'+1}\), \(\beta\)] for some auxiliary tree \(\beta\) that can be adjoined in \(\delta\) at \(\eta\), and the parser enters the configuration \((\| \cdots \|n_{k'+1}\|s_1 \cdots s_l n_{k'} \cdots n_1 i_k \cdots i_1 i\|m,\ a_r \cdots a_n \$)\).</Paragraph>
    <Paragraph position="8"> (iii) ACTION[i, \(a_r\)] = reduce root of an auxiliary tree \(\beta\) in which the last adjunction on the spine was performed at address star (rd\(\beta\)@star). The parser has finished the recognition of the auxiliary tree \(\beta\). It must remove all information about \(\beta\) and continue the recognition of the tree in which \(\beta\) was adjoined. The parser executes an unwrap move. Let k (resp. k') be the number of terminal symbols to the left (resp. to the right) of the foot node of \(\beta\). Let \(\xi\) be the node at address star in \(\beta\) (\(\xi\) = nil if star is not set). Let p be the number of terminal symbols to the left of the foot node subsumed by \(\xi\) (p = 0 if \(\xi\) = nil). First, p + k' + 1 symbols are popped from the top of the sequence of stacks. Then k - p single-element stacks below the new top stack are unwrapped. Let j be the new top element of the top stack and let m = GOTO_right[j, \(\beta\)]. Then j is popped and the single-element stack \(\|m\) is pushed on top of the top stack. By keeping track of the auxiliary trees being reduced, it is possible to output a parse instead of acceptance or an error.</Paragraph>
    <Paragraph position="9"> The parser recognizes the derived tree inside out: it extracts recursively the innermost auxiliary tree that has no adjunction performed in it.</Paragraph>
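The configurations and two of the simpler moves described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a configuration is a pair (list of stacks, remaining input), the function names and the table `GOTO_FOOT` with its single entry are hypothetical, Case 1 of the resume right move is implemented as we read it from the description above, and the sketch assumes k is at least 1. Case 2 and the reduce root move are left out.

```python
def initial_configuration(tokens):
    """One stack holding the start state 0, and the input plus end marker."""
    return ([[0]], list(tokens) + ["$"])


def shift(config, j):
    """Push move: add the single-element stack [j] and consume one token."""
    stacks, remaining = config
    return (stacks + [[j]], remaining[1:])


def resume_right_case1(config, k, goto_foot, beta):
    """Merge the k top stacks into one stack, then push the stack [m]."""
    stacks, remaining = config
    if k == 0 or k + 1 > len(stacks):
        raise ValueError("need k top stacks (k at least 1) and one stack below")
    split = len(stacks) - k
    below, top_k = stacks[:split], stacks[split:]
    merged = [state for stack in top_k for state in stack]
    # The state i_k is the top element of the stack just below the merged ones.
    m = goto_foot[(below[-1][-1], beta)]
    return (below + [merged, [m]], remaining)


GOTO_FOOT = {(3, "beta1"): 7}   # hypothetical table entry
config = initial_configuration("abc")
for state in (3, 4, 5):
    config = shift(config, state)
print(config)   # ([[0], [3], [4], [5]], ['$'])
config = resume_right_case1(config, 2, GOTO_FOOT, "beta1")
print(config)   # ([[0], [3], [4, 5], [7]], ['$'])
```

Keeping each stack as its own list makes the difference between pushing a new stack (shift) and rewriting several stacks into one (resume right) visible in the printed configurations.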
  </Section>
  <Section position="8" start_page="277" end_page="281" type="metho">
    <SectionTitle>
5 LR(0) Parsing Tables
</SectionTitle>
    <Paragraph position="0"> This section explains how to construct an LR(0) parsing table given a TAG. The construction is an extension of the one used for CFGs. Similarly to Schabes and Joshi (1988), we extend the notion of dotted rules to trees. We define the closure operations that correspond to adjunction. Then we explain how transitions between states are defined. We give in Figure 5 an example of a finite state automaton used to build the parsing table for a TAG (also shown in Figure 5) generating a context-sensitive language.</Paragraph>
    <Paragraph position="1"> We first explain preliminary concepts (originally defined to construct an Earley-type parser for TAGs) that will be used by the algorithm. Dotted rules are extended to trees. Then we recall a tree traversal that the algorithm will mimic in order to scan the input from left to right.</Paragraph>
    <Paragraph position="2"> A dotted symbol is defined as a symbol associated with a dot above or below and either to the left or to the right of it. The four positions of the dot are annotated by la, lb, ra, rb (resp. left above, left below, right above, right below). In practice, only two dot positions can be used (to the left and to the right of a node). However, for the sake of simplicity, we will use four different dot positions. A dotted tree is defined as a tree with exactly one dotted symbol. Furthermore, some nodes in the dotted tree can be marked with a star. A star on a node expresses the fact that an adjunction has been performed on the corresponding node. A dotted tree is referred to as \([\alpha, dot, pos, stars]\), where \(\alpha\) is a tree, dot is the address of the dot, pos is the position of the dot (la, lb, ra or rb) and stars is a list of nodes in \(\alpha\) annotated by a star.</Paragraph>
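A dotted tree as defined above maps naturally onto a small immutable record. The Python sketch below is illustrative only: the class name `DottedTree`, the use of a tree name instead of a tree object, and the encoding of node addresses as tuples of child indices (with the empty tuple for the root) are assumptions made here, not notation from the paper.

```python
from dataclasses import dataclass

POSITIONS = ("la", "lb", "ra", "rb")  # left above, left below, right above, right below


@dataclass(frozen=True)
class DottedTree:
    tree: str                       # name of the elementary tree, e.g. "alpha1"
    dot: tuple                      # address of the dotted node (assumed encoding)
    pos: str                        # one of POSITIONS
    stars: frozenset = frozenset()  # addresses of nodes marked with a star

    def __post_init__(self):
        if self.pos not in POSITIONS:
            raise ValueError("pos must be one of la, lb, ra, rb")


item = DottedTree("alpha1", (), "la")
state = frozenset({item, DottedTree("alpha1", (), "la")})
print(len(state))   # 1
```

Making the record frozen and hashable lets a state of the finite state automaton be represented directly as a set of dotted trees, so that duplicate items collapse automatically.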
    <Paragraph position="3"> Given a dotted tree with the dot above and to the left of the root, we define a tree traversal of a dotted tree (as shown in Figure 3) that will enable us to scan the frontier of an elementary tree from left to right while trying to recognize possible adjunctions between the above and below positions of the dot of interior nodes.</Paragraph>
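The traversal can be approximated by a depth-first walk that visits the dot positions of each node in order. The sketch below is a guess at the order, not a reproduction of Figure 3: trees are encoded as (label, children) pairs, addresses are tuples of child indices, interior nodes are visited at la, lb, rb and ra, and a leaf is assumed to be visited at la and then ra only. Foot nodes are not treated specially here. The name `traverse` is hypothetical.

```python
def traverse(node, address=()):
    """Yield (address, position) pairs in left-to-right traversal order."""
    label, children = node
    yield (address, "la")
    if children:
        yield (address, "lb")
        for index, child in enumerate(children):
            yield from traverse(child, address + (index,))
        yield (address, "rb")
    yield (address, "ra")


tree = ("S", [("a", []), ("b", [])])
for step in traverse(tree):
    print(step)
# ((), 'la')
# ((), 'lb')
# ((0,), 'la')
# ((0,), 'ra')
# ((1,), 'la')
# ((1,), 'ra')
# ((), 'rb')
# ((), 'ra')
```

The gap between the la and lb visits of an interior node (and between rb and ra) is where an adjunction on that node would be recognized.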
    <Paragraph position="4">  A state in the finite state automaton is defined to be a set of dotted trees closed under the following operations: Adjunction Prediction, Left Completion, Move Dot Down, Move Dot Up and Skip Node (see Figure 4). Adjunction Prediction predicts all possible auxiliary trees that can be adjoined at a given node. Left Completion occurs when an auxiliary tree is recognized up to its foot node. All trees in which that tree can be adjoined are pulled back with the node on which adjunction has been performed added to the list of stars. Move Dot Down moves the dot down the links. Move Dot Up moves the dot up the links. Skip Node moves the dot up on the right hand side of a node on which no adjunction has been performed.</Paragraph>
    <Paragraph position="5"> All the states in the finite state automaton (FSA) must be closed under the closure operations. The FSA is built as follows. In state set 0, we put all initial trees with a dot to the left and above the root. The state is then closed. Then we recursively build new states with the following transitions (we refer to Figure 5 for an example of such a construction).</Paragraph>
    <Paragraph position="6">  * A transition on a (where a is a terminal symbol) from \(S_i\) to \(S_j\) occurs if and only if in \(S_i\) there is a dotted tree \([\delta, dot, la, stars]\) in which the dot is to the left and above a terminal symbol a; \(S_j\) consists of the closure of the set of dotted trees of the form \([\delta, dot, ra, stars]\).</Paragraph>
    <Paragraph position="7"> * A transition on \(\beta_{right}\) from \(S_i\) to \(S_j\) occurs iff in \(S_i\) there is a dotted tree \([\delta, dot, rb, stars]\) such that the dot is to the right and below a node on which \(\beta\) can be adjoined; \(S_j\) consists of the closure of the set of dotted trees of the form \([\delta, dot, ra, stars']\). If the dotted node of \([\delta, dot, rb, stars]\) is not on the spine of \(\delta\), stars' consists of all the nodes in stars that strictly dominate the dotted node. When the dotted node is on the spine, stars' consists of all the nodes in stars that strictly dominate the dotted node, if there are some; otherwise stars' = {dot}.</Paragraph>
    <Paragraph position="8"> * A Skip foot of \([\beta, dot, lb, stars]\) transition from \(S_i\) to \(S_j\) occurs iff in \(S_i\) there is a dotted tree \([\beta, dot, lb, stars]\) such that the dot is to the left and below the foot node of the auxiliary tree \(\beta\); \(S_j\) consists of the closure of the set of dotted trees of the form \([\beta, dot, rb, stars]\).</Paragraph>
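The rule that computes stars' for a transition on the right part of an auxiliary tree is easy to misread, so here is a small Python sketch of it. Node addresses are assumed to be tuples of child indices, so that one node strictly dominates another exactly when its address is a proper prefix of the other; the function names and the boolean `on_spine` argument are hypothetical.

```python
def strictly_dominates(a, b):
    """True if the node at address a is a proper ancestor of the node at b."""
    return len(b) > len(a) and b[:len(a)] == a


def stars_after_right(dot, stars, on_spine):
    """Compute stars' from the dotted node address, stars and spine status."""
    kept = frozenset(s for s in stars if strictly_dominates(s, dot))
    if on_spine and not kept:
        # No starred ancestor on the spine: the dotted node itself is starred.
        return frozenset({dot})
    return kept


print(stars_after_right((0, 1), {(0,), (1,)}, on_spine=False))  # frozenset({(0,)})
print(stars_after_right((0,), set(), on_spine=True))            # frozenset({(0,)})
print(stars_after_right((0,), set(), on_spine=False))           # frozenset()
```

In both cases stars that do not dominate the dotted node are discarded; the spine case differs only in falling back to {dot} when nothing is kept.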
    <Paragraph position="9"> The parsing table is constructed from the FSA built as above. In the following, we write trans(i, z) for the set of states in the FSA reached from state i on the transition labeled by z.</Paragraph>
    <Paragraph position="10"> The actions for ACTION(i, a) are:  [...] state i there is a dotted tree \([\beta, 0, ra, \{star\}]\), where \(\beta\) is an auxiliary tree [footnote 6]. * Accept occurs iff a is the end marker (a = $) and there is a dotted tree \([\alpha, 0, ra, \{star\}]\), where \(\alpha\) is an initial tree and the dot is to the right and above the root node.</Paragraph>
    <Paragraph position="11"> * Error, if none of the above applies.</Paragraph>
    <Paragraph position="12"> The GOTO table encodes the transitions in the FSA on non-terminal symbols. It is indexed by a state and by \(\beta_{right}\) or \(\beta_{foot}\), for all auxiliary trees \(\beta\): \(j \in \mathrm{GOTO}(i, label)\) iff there is a transition from i to j on the given label (\(label \in \{\beta_{right}, \beta_{foot} \mid \beta \text{ is an auxiliary tree}\}\)).</Paragraph>
    <Paragraph position="13"> If more than one action is possible in an entry of the action table, the grammar is not LR(0): there is a conflict of actions, and the grammar cannot be parsed deterministically without lookahead.</Paragraph>
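The LR(0) test stated above amounts to checking that no entry of the action table holds more than one action. A minimal Python sketch, assuming the candidate actions are available as (state, token, action) triples; the function name `find_conflicts` and the sample entries are hypothetical and do not come from Figure 5.

```python
from collections import defaultdict


def find_conflicts(entries):
    """Return the table entries that hold more than one action."""
    table = defaultdict(set)
    for state, token, action in entries:
        table[(state, token)].add(action)
    return {key: actions for key, actions in table.items() if len(actions) > 1}


# Hypothetical entries: state 0 has both a shift and a reduce root on "a".
entries = [(0, "a", "s1"), (0, "a", "rd beta1"), (1, "b", "s2")]
conflicts = find_conflicts(entries)
for key, actions in conflicts.items():
    print(key, sorted(actions))   # (0, 'a') ['rd beta1', 's1']
print("LR(0)" if not conflicts else "not LR(0)")   # not LR(0)
```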
    <Paragraph position="14"> Figure 5 gives an example of a finite state automaton used for the construction of the LR(0) table for a TAG (trees \(\alpha_1\), \(\beta_1\)) generating \(L = \{a^n b^n e c^n d^n \mid n \geq 0\}\) [footnote 7], together with its corresponding parsing table and an example of a sequence of moves.</Paragraph>
    <Paragraph position="15"> Footnote 6: 0 is the address of the root node.</Paragraph>
    <Paragraph position="16"> Footnote 7: In the given TAG (trees \(\alpha_1\) and \(\beta_1\)), if we omit a and c, we obtain a TAG that is similar to the one for the Dutch cross-serial construction. This grammar can still be handled by an LR(0) parser. In the trees \(\alpha_1\) and \(\beta_1\), na stands for null adjunction constraint (i.e. no auxiliary tree can be adjoined on a node with a null adjunction constraint).</Paragraph>
    <Paragraph position="18"/>
  </Section>
  <Section position="9" start_page="281" end_page="281" type="metho">
    <SectionTitle>
6 SLR(1) Parsing Tables
</SectionTitle>
    <Paragraph position="0"> The tables that we have constructed are LR(0) tables.</Paragraph>
    <Paragraph position="1"> The Resume Right and Reduce Root moves are performed regardless of the next input token. The accuracy of the parsing table can be improved by computing lookaheads. FIRST and FOLLOW can be extended to dotted trees. FIRST of a dotted tree corresponds to the set of leftmost symbols appearing below the subtree dominated by the dotted node. FOLLOW of a dotted tree defines the set of tokens that can appear in a derivation immediately following the dotted node. Once FIRST and FOLLOW are computed, the LR(0) parsing table can be improved to an SLR(1) table: Resume Right and Reduce Root are applicable only on the input tokens in the FOLLOW set of the dotted tree.</Paragraph>
    <Paragraph position="2"> For example, the SLR(1) table for the TAG built with trees \(\alpha_1\) and \(\beta_1\) is given in Figure 6.</Paragraph>
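The step from LR(0) to SLR(1) entries can be sketched as a filter over the lookahead-free moves. In the Python sketch below, the function name `slr1_entries`, the item names and the FOLLOW sets are all hypothetical placeholders; computing FOLLOW for dotted trees is not shown, and the values do not come from Figure 6.

```python
def slr1_entries(lookahead_free, follow, tokens):
    """Keep a Resume Right or Reduce Root move only on tokens in FOLLOW.

    lookahead_free: (state, action, item) triples, where item names the
                    dotted tree that licenses the move.
    follow:         dict mapping each item to its FOLLOW set.
    tokens:         all input tokens, including the end marker.
    """
    table = {}
    for state, action, item in lookahead_free:
        for token in tokens:
            if token in follow[item]:
                table.setdefault((state, token), set()).add(action)
    return table


moves = [(4, "rd beta1", "item_a"), (4, "rs alpha1", "item_b")]
follow = {"item_a": {"c"}, "item_b": {"d", "$"}}   # hypothetical FOLLOW sets
table = slr1_entries(moves, follow, ["a", "b", "c", "d", "e", "$"])
for key in sorted(table):
    print(key, sorted(table[key]))
# (4, '$') ['rs alpha1']
# (4, 'c') ['rd beta1']
# (4, 'd') ['rs alpha1']
```

In this toy input, state 4 would hold two moves on every token in an LR(0) table; after filtering, each remaining entry holds a single action, which is the kind of conflict the FOLLOW sets are meant to remove.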
    <Paragraph position="4"> By associating dotted trees with lookaheads, one can also compute LR(k) items in the finite state automaton in order to build LR(k) parsing tables.</Paragraph>
  </Section>
</Paper>