<?xml version="1.0" standalone="yes"?>
<Paper uid="J93-4002">
  <Title>Parsing Some Constrained Grammar Formalisms</Title>
  <Section position="3" start_page="0" end_page="594" type="metho">
    <SectionTitle>
2. Linear Indexed Grammars
</SectionTitle>
    <Paragraph position="0"> An Indexed Grammar (Aho 1968) can be viewed as a CFG in which objects are nonterminals with an associated stack of symbols. In addition to rewriting nonterminals, the rules of the grammar can have the effect of pushing or popping symbols on top of the stacks that are associated with each nonterminal. Gazdar (1988) discussed a restricted form of Indexed Grammars in which the stack associated with the nonterminal on the left of each production can only be associated with one of the occurrences of nonterminals on the right of the production. Stacks of bounded size are associated with other occurrences of nonterminals on the right of the production. We call these Linear Indexed Grammars (LIG) (see footnote 2). Footnote 1: The path set of a tree is the set of strings labeling paths from the root to the frontier of the tree. The path set of a tree set is the union of path sets of trees in the set. Footnote 2: The name Linear Indexed Grammars is used by Duske and Parchmann (1984) to refer to a different restriction on Indexed Grammars in which productions are restricted to have only a single nonterminal on their right-hand side.</Paragraph>
    <Paragraph position="1">  K. Vijay-Shanker and David J. Weir Parsing Some Constrained Grammar Formalisms Definition 2.1 A LIG, $G$, is denoted by $(V_N, V_T, V_I, S, P)$ where $V_N$ is a finite set of nonterminals, $V_T$ is a finite set of terminals, $V_I$ is a finite set of indices (stack symbols), $S \in V_N$ is the start symbol, and $P$ is a finite set of productions.</Paragraph>
    <Paragraph position="2"> We adopt the convention that $\alpha$, $\beta$ (with or without subscripts and primes) denote members of $V_I^*$, and $\gamma$ denotes a stack symbol. As usual, $A$, $B$, $C$ will denote nonterminals, $a$, $b$, $c$ will denote terminals, and $u$, $v$, $w$ will denote members of $V_T^*$. Definition 2.2 A pair consisting of a nonterminal, say $A$, and a string of stack symbols, say $\alpha$, will be called an object of the grammar and will be written as $A(\alpha)$. Given a grammar, $G$, we define the set of objects $V_C(G) = \{ A(\alpha) \mid A \in V_N, \alpha \in V_I^* \}$.</Paragraph>
    <Paragraph position="3"> We use $\Upsilon$ to denote strings in $(V_C(G) \cup V_T)^*$. We write $A(..\alpha)$ to denote the nonterminal $A$ associated with an arbitrary stack with the string $\alpha$ on top. Also, we use $A()$ to denote that an empty stack is associated with $A$. The general form of a production in a LIG is: $A(..\alpha) \to w_1 A_1(\alpha_1) w_2 \ldots A_{i-1}(\alpha_{i-1}) w_i A_i(..\alpha_i) w_{i+1} A_{i+1}(\alpha_{i+1}) \ldots A_n(\alpha_n) w_{n+1}$ for $n &gt; 0$, where $w_1, \ldots, w_{n+1}$ are members of $V_T^*$.</Paragraph>
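    <Paragraph position="4"> Definition 2.3 The derivation relation, $\Rightarrow$, is defined below. If the above production is used then for any $\beta \in V_I^*$, $\Upsilon_1, \Upsilon_2 \in (V_C(G) \cup V_T)^*$: $\Upsilon_1 A(\beta\alpha) \Upsilon_2 \Rightarrow \Upsilon_1 w_1 A_1(\alpha_1) w_2 \ldots A_{i-1}(\alpha_{i-1}) w_i A_i(\beta\alpha_i) w_{i+1} A_{i+1}(\alpha_{i+1}) \ldots A_n(\alpha_n) w_{n+1} \Upsilon_2$.</Paragraph>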
    <Paragraph position="6"> We use $\stackrel{*}{\Rightarrow}$ as the reflexive, transitive closure of $\Rightarrow$. As a result of the linearity in the general form of the rules, we can observe that the stack $\beta\alpha$ associated with the object in the left-hand side of the derivation and $\beta\alpha_i$ associated with one object in the right-hand side have the initial part $\beta$ in common. In the derivation above, we will say that this object $A_i(\beta\alpha_i)$ is the distinguished child of $A(\beta\alpha)$. Given a derivation, the distinguished descendant relation is the reflexive, transitive closure of the distinguished child relation.</Paragraph>
    <Paragraph position="7"> The language generated by a LIG, $G$, is $L(G) = \{ w \mid S() \stackrel{*}{\Rightarrow} w \}$.</Paragraph>
    <Paragraph position="8"> Example 2.1 The LIG, $G = (\{S, T\}, \{a, b, c\}, \{\gamma_a, \gamma_b\}, S, P)$ generates $\{ wcw \mid w \in \{a, b\}^+ \}$ where $P$ contains the following productions.</Paragraph>
    <Paragraph position="9"> $S(..) \to a\,S(..\gamma_a)$; $S(..) \to b\,S(..\gamma_b)$; $S(..) \to T(..)$; $T(..\gamma_a) \to T(..)\,a$; $T(..\gamma_b) \to T(..)\,b$; $T() \to c$. A derivation tree for the string $abbcabb$ is given in Figure 1.</Paragraph>
    <Paragraph position="10">  Derivation tree for LIG.</Paragraph>
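    <Paragraph><![CDATA[The derivation behind Example 2.1 can be traced mechanically. The following Python sketch is illustrative only: the function name derive_wcw and the string encoding of the stack symbols (ga and gb standing for $\gamma_a$ and $\gamma_b$) are assumptions made here and are not part of the grammar definition. It applies the S productions to push one index per symbol of w, switches to T, and then pops the indices while emitting the copy of w to the right.

```python
def derive_wcw(w):
    """Return the sentential forms of a derivation of w c w (Example 2.1).

    Stack symbols are encoded as the strings "ga" and "gb" (an assumption
    of this sketch); the top of the stack is the last list element.
    """
    if not w or any(ch not in "ab" for ch in w):
        raise ValueError("w must be a non-empty string over {a, b}")

    stack = []                 # stack attached to the single nonterminal
    prefix, suffix = "", ""    # terminals already produced on either side

    def show(nonterminal):
        return prefix + nonterminal + "(" + " ".join(stack) + ")" + suffix

    forms = ["S()"]
    for ch in w:
        # S(..) -> a S(.. ga)   or   S(..) -> b S(.. gb)
        prefix += ch
        stack.append("g" + ch)
        forms.append(show("S"))

    # S(..) -> T(..)
    forms.append(show("T"))

    while stack:
        # T(.. ga) -> T(..) a   or   T(.. gb) -> T(..) b
        top = stack.pop()
        suffix = top[1] + suffix
        forms.append(show("T"))

    # T() -> c
    forms.append(prefix + "c" + suffix)
    return forms


if __name__ == "__main__":
    for form in derive_wcw("abb"):
        print(form)
```

Because the stack is last-in first-out, the index pushed last is popped first and its terminal lands rightmost, which is why the second half of the string repeats w in the same order.]]></Paragraph>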
    <Paragraph position="11"> In this paper rather than adopting the general form of rules as given above, we restrict our attention to grammars whose rules have the following form. In fact, this can be easily seen to constitute a normal form for LIG.</Paragraph>
    <Paragraph position="12">  1. $A(\alpha) \to \varepsilon$ where $\varepsilon \in V_T \cup \{\epsilon\}$ (a terminal or the empty string $\epsilon$) and length of $\alpha$, $len(\alpha) \geq 1$. 2. $A(..\gamma_1 \ldots \gamma_m) \to A_p(..\gamma_p)\,A_s(\alpha_s)$ where $m \geq 0$.</Paragraph>
    <Paragraph position="13"> 3. $A(..\gamma_1 \ldots \gamma_m) \to A_s(\alpha_s)\,A_p(..\gamma_p)$ where $m \geq 0$. 4. $A(..\gamma_1 \ldots \gamma_m) \to A_p(..\gamma_p)$ where $m \geq 0$.</Paragraph>
    <Paragraph position="14">  We allow at most two symbols in the right-hand side of productions because we intend to develop CKY-style algorithms. In the above rules we say that $A_p(..\gamma_p)$ is the primary constituent and $A_s(\alpha_s)$ is the secondary constituent. Notice also that in a derivation using such a rule, the primary constituent yields the distinguished child. (In grammatical theories that use a stack of subcategorized arguments, the top of the stack in the primary constituent determines which secondary constituent it can combine with.)</Paragraph>
    <Section position="1" start_page="593" end_page="594" type="sub_section">
      <SectionTitle>
2.1 Terminators
</SectionTitle>
      <Paragraph position="0"> Let us consider how we may extend the CKY algorithm for the recognition of LIG.</Paragraph>
      <Paragraph position="1"> Given a fixed grammar $G$ and an input $a_1 \ldots a_n$, the recognition algorithm will complete an $n \times n$ array $P$ such that an encoding of $A(\alpha)$ is stored in $P[i, d]$ if and only if $A(\alpha) \stackrel{*}{\Rightarrow} a_i \ldots a_{i+d-1}$. The algorithm will operate bottom-up. For example, if $G$ contains the rule $A(..\gamma_1 \ldots \gamma_m) \to A_p(..\gamma_p)\,A_s(\alpha_s)$ and we find an encoding of $A_p(\alpha_p\gamma_p)$ in $P[i, d_p]$ and an encoding of $A_s(\alpha_s)$ in $P[i + d_p, d_s]$ then an encoding of $A(\alpha_p\gamma_1 \ldots \gamma_m)$ will be stored in $P[i, d_p + d_s]$. What encoding scheme should be used? The most straightforward possibility would be to store a complete encoding of $A(\alpha_p\gamma_1 \ldots \gamma_m)$ in $P[i, d_p + d_s]$. However, in general, if an object $A(\alpha)$ derives a string of length $d$ then the length of $\alpha$ is $O(d)$ (see footnote 3). Hence there can be $O(k^d)$ objects that derive a substring of the input (of length $d$), for some constant $k$. Hence, the space and time complexity of this algorithm is exponential in the worst case (see footnote 4). The inefficiency of this approach can be seen by drawing an analogy with the following algorithm for CFG. Suppose that, rather than storing sets of nonterminals in each array entry, we store a set of trees containing all derivation subtrees that yield the corresponding substring. The problem with this is that the number of derivation trees is exponential with respect to the length of the string spanned. However, there is no need to store derivation trees since, in considering the combination of subderivation trees in the CFG, only the nonterminals at the root of the tree are relevant in determining whether there is a production that licenses the combination.</Paragraph>
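      <Paragraph><![CDATA[The straightforward scheme, in which every array entry holds complete stacks, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm developed in this paper: it handles only rule forms 1 to 3, it assumes that every rule of form 1 produces exactly one terminal (no empty-string rules), and it omits unary rules of form 4. The tuple encodings of rules, the function name recognize, the zero-based indexing, and the small grammar for wcw (with helper nonterminals Xa, Xb, Xc and a dummy index "#") are all hypothetical constructions introduced here.

```python
def recognize(word, start, terminal_rules, binary_rules):
    """Naive CKY-style recognition that stores complete stacks.

    terminal_rules: (A, alpha, x)      encodes A(alpha) -> x, x one terminal
    binary_rules:   (A, push, Ap, gp, As, alpha_s, primary_first) encodes
        A(.. push) -> Ap(.. gp) As(alpha_s)   if primary_first is True
        A(.. push) -> As(alpha_s) Ap(.. gp)   otherwise
    Stacks are tuples whose last element is the top of the stack.
    chart[i, d] holds every object (A, stack) deriving word[i:i+d].
    """
    n = len(word)
    chart = {}
    for i, ch in enumerate(word):
        chart[i, 1] = {(A, alpha) for (A, alpha, x) in terminal_rules if x == ch}

    for d in range(2, n + 1):
        for i in range(0, n - d + 1):
            cell = set()
            for d_left in range(1, d):
                left = chart[i, d_left]
                right = chart[i + d_left, d - d_left]
                for (A, push, Ap, gp, As, alpha_s, primary_first) in binary_rules:
                    prim, sec = (left, right) if primary_first else (right, left)
                    if (As, alpha_s) not in sec:
                        continue
                    for (B, stack) in prim:
                        if B == Ap and stack and stack[-1] == gp:
                            # Replace the exposed top gp by the pushed symbols.
                            cell.add((A, stack[:-1] + push))
            chart[i, d] = cell

    return n > 0 and (start, ()) in chart[0, n]


# Hypothetical grammar in rule forms 1-3 for { w c w : w in {a, b}+ }.
LETTER = {"ga": "a", "gb": "b"}
TERMINAL_RULES = [("Xa", ("#",), "a"), ("Xb", ("#",), "b"), ("Xc", ("#",), "c")]
BINARY_RULES = []
for g, letter in LETTER.items():
    X = "X" + letter
    BINARY_RULES.append(("S", (), "S", g, X, ("#",), False))     # S(..) -> X(#) S(.. g)
    BINARY_RULES.append(("S", (), "T", g, X, ("#",), False))     # S(..) -> X(#) T(.. g)
    BINARY_RULES.append(("T", (g,), "Xc", "#", X, ("#",), True))  # T(.. g) -> Xc(.. #) X(#)
    for x in LETTER:
        BINARY_RULES.append(("T", (x, g), "T", x, X, ("#",), True))  # T(.. x g) -> T(.. x) X(#)


if __name__ == "__main__":
    print(recognize("abbcabb", "S", TERMINAL_RULES, BINARY_RULES))
    print(recognize("abcba", "S", TERMINAL_RULES, BINARY_RULES))
```

Each entry stores whole stacks, so the stacks kept for a span grow with the length of that span; this is the source of the worst-case blow-up described above.]]></Paragraph>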
      <Paragraph position="2"> Likewise, because of the last-in first-out behavior in the manipulation of stacks in LIG, we will argue that it is not necessary to store the entire stack. For instance, consider the derivation (depicted by the tree shown in Figure 2) from the point of view of recording the derivation in a bottom-up parser (such as CKY). Let a node $\eta_1$ labeled $B(\beta\gamma_1 \ldots \gamma_k \ldots \gamma_m)$ be a distinguished descendant of a node $\eta$ labeled $A(\beta\gamma_1 \ldots \gamma_k)$ as shown in the figure. Viewing the tree bottom-up, let the node $\eta$, labeled $A(\beta\gamma_1 \ldots \gamma_k)$, be the first node above the node $\eta_1$, labeled $B(\beta\gamma_1 \ldots \gamma_k \ldots \gamma_m)$, where $\gamma_k$ gets exposed as the top of the stack. Because of the last-in first-out behavior, every distinguished descendant of $\eta$ above $\eta_1$ will have a label of the form $A'(\beta\gamma_1 \ldots \gamma_k\alpha)$ where $len(\alpha) \geq 1$. In order to record the derivation from $A(\beta\gamma_1 \ldots \gamma_k)$ it would be sufficient to store $A$ and $\gamma_1 \ldots \gamma_k$ if we could also access the entry that records the derivation from $A_t(\beta\gamma_t)$. In the entry for $\eta$, using a pointer to the entry for $A_t(\beta\gamma_t)$ would enable the recovery of the stack below the top $k$ symbols, $\gamma_1 \ldots \gamma_k$. However, this scheme works well only when $k \geq 2$. For instance, when $k = 1$, suppose we recorded only $A$, $\gamma_1$, and a pointer to the entry for $A_t(\beta\gamma_t)$. Suppose that we are looking for the symbol below $\gamma_1$, i.e., the top of $\beta$. Then it is possible that in a similar way the latter entry could also record just $A_t$, $\gamma_t$, and a pointer to some other entry to retrieve $\beta$. This situation can occur arbitrarily many times.</Paragraph>
      <Paragraph position="3"> Consider the derivation depicted in Figure 3. In this derivation we have indicated the branch containing only the distinguished descendants. We will assume that the node labeled $D(\beta\gamma_1 \ldots \gamma_{k-1}\gamma'_k \ldots \gamma'_{m'})$ is the closest distinguished descendant of $C(\beta\gamma_1 \ldots \gamma_{k-1}\gamma'_k)$ such that every node between them will have a label of the form $C'(\beta\gamma_1 \ldots \gamma_{k-1}\gamma'_k\alpha')$ where $len(\alpha') \geq 1$. Therefore, any node between that labeled $C(\beta\gamma_1 \ldots \gamma_{k-1}\gamma'_k)$ and $B(\beta\gamma_1 \ldots \gamma_m)$ will have a label of the form $C''(\beta\gamma_1 \ldots \gamma_{k-1}\alpha'')$ where $len(\alpha'') \geq 1$.</Paragraph>
      <Paragraph position="4"> Now the entries representing derivations from both $A(\beta\gamma_1 \ldots \gamma_{k-1}\gamma_k)$ and $C(\beta\gamma_1 \ldots \gamma_{k-1}\gamma'_k)$ could point back to the entry for the derivation from $A_t(\beta\gamma_t)$, whereas the entry for $C'(\beta\gamma_1 \ldots \gamma_{k-1}\gamma'_k\alpha')$ will point back to the entry for A [...]. We shall now formalize these notions by defining a terminator.</Paragraph>
      <Paragraph position="5"> Footnote 3: For instance, consider the grammar in Example 2.1 and the derivation in Figure 1. In general we can have derivations of the form $T(\gamma_a\gamma_b^n) \stackrel{*}{\Rightarrow} c\,a\,b^n$. However, if there exist productions of the form $A(\alpha) \to \epsilon$, where $\epsilon$ is the empty string, then the length of the stack in objects is not even bounded by the length of the strings they derive.</Paragraph>
    </Section>
  </Section>
</Paper>