<?xml version="1.0" standalone="yes"?>
<Paper uid="J95-4002">
  <Title>Tree Insertion Grammar: A Cubic-Time, Parsable Formalism that Lexicalizes Context-Free Grammar without Changing the Trees Produced Yves Schabes * MERL</Title>
  <Section position="4" start_page="480" end_page="483" type="metho">
    <SectionTitle>
2 In Schabes and Waters (1993a) these three kinds of auxiliary trees are referred to differently, as right
</SectionTitle>
    <Paragraph position="0"> recursive, left recursive, and centrally recursive, respectively.</Paragraph>
    <Paragraph position="1">  As in TAG, but in contrast to CFG, there is an important difference in TIG between a derivation and the tree derived. By means of simultaneous adjunction, there can be several trees created by a single derivation. In addition, there can be several different derivations for the same tree.</Paragraph>
    <Paragraph position="2"> To eliminate useless ambiguity in derivations, TIG prohibits adjunction: at nodes marked for substitution, because the same trees can be created by adjoining on the roots of the trees substituted at these nodes; at foot nodes of auxiliary trees, because the same trees can be created by simultaneous adjunction on the nodes the auxiliary trees are adjoined on; and at the roots of auxiliary trees, because the same trees can be created by simultaneous adjunction on the nodes the auxiliary trees are adjoined on. Figure 1 shows five elementary trees that might appear in a TIG for English. The trees containing 'boy' and 'saw' are initial trees. The remainder are auxiliary trees. As illustrated in Figure 2, substitution inserts an initial tree T in place of a frontier node μ that has the same label as the root of T and is marked for substitution.</Paragraph>
    <Paragraph position="3"> Adjunction inserts an auxiliary tree T into another tree at a node μ that has the same label as the root (and therefore foot) of T. In particular, μ is replaced by T and the foot of T is replaced by the subtree rooted at μ. The adjunction of a left auxiliary tree is referred to as left adjunction. This is illustrated in Figure 3. The adjunction of a right auxiliary tree is referred to as right adjunction (see Figure 4).</Paragraph>
    <Paragraph position="4"> Simultaneous adjunction is fundamentally ambiguous in nature and typically results in the creation of several different trees. The order in the sequences of left and right auxiliary trees fixes the order of the strings being combined. However, unless one of the sequences is empty, variability is possible in the trees that can be produced. The TIG formalism specifies that every tree is produced that is consistent with the specified order.</Paragraph>
    <Paragraph position="6"> Figure 5 illustrates the simultaneous adjunction of one left and one right auxiliary tree on a node. The string corresponding to the left auxiliary tree must precede the node, and the string corresponding to the right auxiliary tree must follow it. However, two different trees can be derived---one where the left auxiliary tree is on top and one where the right auxiliary tree is on top. The simultaneous adjunction of two left and two right auxiliary trees leads to six derived trees.</Paragraph>
    <Paragraph position="7"> The adjunction of a wrapping auxiliary tree is referred to as wrapping adjunction.</Paragraph>
    <Paragraph position="8"> This is illustrated in Figure 6. The key force of the restrictions applied to TIG, in</Paragraph>
    <Paragraph position="10"> comparison with TAG, is that they prevent wrapping adjunction from occurring, by preventing the creation of wrapping auxiliary trees. 3 Wrapping adjunction yields context-sensitive languages because two strings that are mutually constrained by being in the same auxiliary tree are wrapped around another string. This observation stems from the equivalence of TAG and head grammars (Vijay-Shanker et al. 1986). In contrast, every operation allowed by a TIG inserts a string into another string. Simultaneous adjunction merely specifies multiple independent insertions. Simultaneous left and right adjunction is not an instance of wrapping, because TIG does not allow there to be any constraints between the adjoinability of the trees in question.</Paragraph>
    <Paragraph position="11"> There are many ways that the TIG formalism could be extended. First, adjoining constraints could be used to prohibit the adjunction of particular auxiliary trees (or all auxiliary trees) at a given node.</Paragraph>
    <Paragraph position="12"> Second, one can easily imagine variants of TIG where simultaneous adjunction is more limited. One could allow only one canonical derived tree. One could allow at most one left auxiliary tree and one right auxiliary tree as we did in Schabes and Waters (1993a). One could forbid multiple adjunction altogether. We have chosen unlimited simultaneous adjunction here primarily because it reduces the number of chart states, since one does not have to record whether adjunction has occurred at a given node.</Paragraph>
    <Paragraph position="13"> Third, one can introduce stochastic parameters controlling the probabilities with which particular substitutions and adjunctions occur (see Schabes and Waters 1993b).</Paragraph>
    <Paragraph position="14"> Fourth, and of particular importance in the current paper, one can require that a TIG be lexicalized.</Paragraph>
    <Section position="1" start_page="483" end_page="483" type="sub_section">
      <SectionTitle>
Definition 7
</SectionTitle>
      <Paragraph position="0"> [LTIG] A lexicalized tree insertion grammar (LTIG) 4 G = (Σ, NT, I, A, S) is a TIG where every elementary tree in I ∪ A is lexicalized. A tree is lexicalized if at least one frontier node is labeled with a terminal symbol.</Paragraph>
      <Paragraph position="1"> An LTIG is said to be left anchored if every elementary tree is left anchored. An elementary TIG tree is left anchored if the first nonempty frontier element other than [footnote 3: Using a simple case-by-case analysis, one can show that, given a TIG, it is not possible to create a wrapping auxiliary tree. A proof of this fact is presented in Appendix A.]</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="483" end_page="486" type="metho">
    <SectionTitle>
4 In Schabes and Waters (1993a) a formalism almost identical to LTIG is referred to as lexicalized
</SectionTitle>
    <Paragraph position="0"> context-free grammar (LCFG). A different name is used here to highlight the importance of the nonlexicalized formalism, which was not given a name in Schabes and Waters (1993a).</Paragraph>
    <Paragraph position="1">  Schabes and Waters Tree Insertion Grammar the foot, if any, is a lexical item. All the trees in Figure 1 are lexicalized; however, only the ones containing seems, pretty, and smoothly are left anchored.</Paragraph>
    <Paragraph position="2"> 3. Relations between CFG, TIG and TAG In this section, we briefly compare CFG, TIG and TAG, noting that TIG shares a number of properties with CFG on the one hand and with TAG on the other.</Paragraph>
    <Paragraph position="3"> Any CFG can be trivially converted into a TIG that derives the same trees by converting each rule R into a single-level initial tree. If the right-hand side of R is empty, the initial tree created has a single frontier element labeled with ε. Otherwise, the elements of the right-hand side of R become the labels on the frontier of the initial tree, with the nonterminals marked for substitution.</Paragraph>
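The rule-to-tree conversion can be sketched in a few lines. The Python sketch below uses an illustrative encoding of our own (trees as (root, frontier) pairs, substitution nodes tagged with "sub", and the string "eps" standing in for ε); it is not notation from the paper.

```python
# Sketch of the CFG-to-TIG direction: each CFG rule becomes a
# single-level initial tree. An empty right-hand side yields a single
# frontier node labeled "eps"; nonterminal frontier elements are
# marked for substitution.
def cfg_rule_to_initial_tree(lhs, rhs, nonterminals):
    if not rhs:
        return (lhs, ["eps"])
    frontier = [("sub", s) if s in nonterminals else s for s in rhs]
    return (lhs, frontier)
```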
    <Paragraph position="4"> Similarly, any TIG that does not make use of adjoining constraints can be easily converted into a TAG that derives the same trees; however, adjoining constraints may have to be used in the TAG. The trivial nature of the conversion can be seen by considering the three differences between TIG and TAG.</Paragraph>
    <Paragraph position="5"> First, TIG prohibits elementary wrapping auxiliary trees. From the perspective of this difference, a TIG is trivially a TAG without the need for any alterations. Second, TIG prohibits adjunction on the roots of auxiliary trees and allows simultaneous adjunction while TAG allows adjunction on the roots of auxiliary trees and prohibits simultaneous adjunction. From the perspective of this difference in approach, a TIG is also trivially a TAG without alteration. To see this, consider the following: Suppose that there are a set of auxiliary trees T that are allowed to adjoin on a node # in a TIG. Simultaneous adjunction in TIG allows these auxiliary trees to be chained together in every possible way root-to-foot on #. The same is true in a TAG where the trees in T are allowed to adjoin on each other's roots.</Paragraph>
    <Paragraph position="6"> Third, TIG imposes a number of detailed restrictions on the interaction of left and right auxiliary trees. To convert a TIG into a TAG deriving the same trees and no more, one has to capture these restrictions. In general, this requires the use of adjoining constraints to prohibit the forbidden adjunctions.</Paragraph>
    <Paragraph position="7"> It should be noted that if a TIG makes use of adjoining constraints, then the conversion of the TIG to a TAG deriving the same trees can become more complex or even impossible, depending on the details of exactly how the adjoining constraints are allowed to act in the TIG and TAG.</Paragraph>
    <Paragraph position="8"> TIG generates context-free languages. Like CFG, TIG generates context-free languages. In contrast, TAG generates so-called tree adjoining languages (TALs) (Joshi 1985).</Paragraph>
    <Paragraph position="9"> The fact that any context-free language can be generated by a TIG follows from the fact that any CFG can be converted into a TIG. The fact that TIGs can only generate context-free languages follows from the fact that any TIG can be converted into a CFG generating the same language, as shown in the following theorem.</Paragraph>
    <Paragraph position="10"> Theorem 1 If G = (Σ, NT, I, A, S) is a TIG then there is a CFG G' = (Σ, NT', P, S) that generates the same string set. 5 5 As usual, a context-free grammar (CFG) G is a four-tuple (Σ, NT, P, S) where Σ is a set of terminal symbols, NT is a set of nonterminal symbols, P is a finite set of finite production rules that rewrite nonterminal symbols to, possibly empty, strings of terminal and nonterminal symbols, and S is a distinguished nonterminal symbol that is the start symbol of any derivation.</Paragraph>
    <Paragraph position="11">  Computational Linguistics Volume 21, Number 4 Proof The key step in converting a TIG into a CFG is eliminating the auxiliary trees. Given only initial trees, the final conversion to a CFG is trivial.</Paragraph>
    <Paragraph position="12"> * Step 1: For each nonterminal Ai in NT, add two more nonterminals Yi and Zi. This yields the new nonterminal set NT'. * Step 2: For each nonterminal Ai, include the following rules in P: Yi → ε and Zi → ε.</Paragraph>
    <Paragraph position="13"> * Step 3: Alter every node μ in every elementary tree in I and A as follows: Let Ai be the label of μ. If left adjunction is possible at μ, add a new leftmost child of μ labeled Yi and mark it for substitution. If right adjunction is possible at μ, add a new rightmost child of μ labeled Zi and mark it for substitution.</Paragraph>
    <Paragraph position="14"> * Step 4: Convert every auxiliary tree t in A into an initial tree as follows: Let Ai be the label of the root μ of t. If t is a left auxiliary tree, add a new root labeled Yi with two children: μ on the left, and on the right, a node labeled Yi and marked for substitution. Otherwise add a new root labeled Zi with two children: μ on the left, and on the right, a node labeled Zi and marked for substitution. Relabel the foot of t with ε, turning t into an initial tree.</Paragraph>
    <Paragraph position="15"> * Step 5: Every elementary tree t is now an initial tree. Each one is converted into a rule R in P as follows: The label of the root of t becomes the left-hand side of R. The labels on the frontier of t, with any instances of ε omitted, become the right-hand side of R.</Paragraph>
    <Paragraph position="16"> Every derivation in G maps directly to a derivation in G' that generates the same string. Substitution steps map directly. Adjunctions are converted into substitutions via the new nonterminals Yi and Zi. The new roots and their children labeled Yi and Zi created in Steps 3 and 4 allow arbitrarily many simultaneous adjunctions at a node. The right-linear ordering inherent in these structures encodes the ordering information specified for a simultaneous adjunction. ∎ It should be noted that while G' generates the same strings as G, it does not generate the same trees: the substitutions in G' that correspond to adjunctions in G create trees that are very different from the trees generated by G. For instance, if a left auxiliary tree T has structure to the right of its spine, this structure ends up on the left rather than the right of the node "adjoined on" in G'. However, this does not alter the strings that are generated, because, by the definition of TIG, the structure to the right of the spine of T must be entirely empty.</Paragraph>
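Parts of the construction are easy to render directly in code. The Python sketch below covers Steps 1, 2, and 5 only; the tree surgery of Steps 3 and 4 is elided, and the names Y_A/Z_A and the flat rule encoding are our own illustrative choices, not the paper's notation.

```python
# Step 1: for each nonterminal A, add two new nonterminals Y_A and Z_A.
def extended_nonterminals(nts):
    return set(nts) | {"Y_" + A for A in nts} | {"Z_" + A for A in nts}

# Step 2: include the rules Y_A -> eps and Z_A -> eps for every A.
# Rules are (lhs, rhs) pairs; an empty rhs list stands for eps.
def epsilon_rules(nts):
    return [("Y_" + A, []) for A in sorted(nts)] + \
           [("Z_" + A, []) for A in sorted(nts)]

# Step 5: a (now single-level) initial tree becomes a rule whose
# left-hand side is the root label and whose right-hand side is the
# frontier with instances of eps omitted.
def tree_to_rule(root, frontier):
    return (root, [x for x in frontier if x != "eps"])
```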
    <Paragraph position="17"> The theorem above does not convert TAGs into CFGs, because the construction involving Yi and Zi does not work for wrapping auxiliary trees. The reason for this is that a wrapping auxiliary tree has nonempty structure on both the left and the right of its spine.</Paragraph>
    <Paragraph position="18"> TIG generates context-free path sets. The path set of a grammar is the set of all paths from root to frontier in the trees generated by the grammar. The path set is a set of strings in (Σ ∪ NT)*. CFGs have path sets that are regular languages (RLs) (Thatcher 1971). In contrast, TAGs have path sets that are context-free languages (CFLs) (Weir 1988).</Paragraph>
    <Paragraph position="19">  The fact that the path sets generated by a TIG cannot be more complex than context-free languages follows from the fact that TIGs can be converted into TAGs generating the same trees. The fact that TIGs can generate path sets more complex than regular languages is shown by the following example.</Paragraph>
    <Paragraph position="20"> Consider the TIG in Figure 7. The path set L generated by this grammar contains a variety of paths, including Sx (from the elementary initial tree), SASBSx and SAa (from adjoining the elementary auxiliary tree once on the initial tree), and so on. By relying on the fact that the intersection of two regular languages must be regular, it is easy to show that L is not a regular language. In particular, consider: L ∩ (SA)*S(BS)*x = { (SA)^n S (BS)^n x | n ≥ 0 } This intersection corresponds to all the paths from root to x in the trees that are generated by recursively embedding the elementary auxiliary tree in Figure 7 into the middle of its spine. Since this intersection is not a regular language, L cannot be a regular language.</Paragraph>
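The shape of this path language can be checked concretely. Since the Figure 7 trees themselves are not reproduced here, the Python sketch below builds the derived trees directly from the paths quoted above (a reconstruction, not the paper's own trees) and verifies that the path from the root to x has the form (SA)^n S (BS)^n x.

```python
# Build the tree obtained by embedding the auxiliary tree n times into
# the middle of its spine: n "SA" segments above the middle S node,
# then n "BS" segments below it, ending in the initial tree's frontier
# node x. Trees are (label, children) pairs.
def derived(n):
    core = ("S", [("x", [])])
    for _ in range(n):
        core = ("S", [("B", [core])])
    for _ in range(n):
        core = ("S", [("A", [("a", []), core])])
    return core

def path_to_x(tree):
    # Concatenate labels along the unique root-to-x path.
    label, children = tree
    if label == "x":
        return "x"
    for child in children:
        p = path_to_x(child)
        if p is not None:
            return label + p
    return None
```

For example, path_to_x(derived(1)) yields "SASBSx", matching the path quoted in the text.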
  </Section>
  <Section position="6" start_page="486" end_page="492" type="metho">
    <SectionTitle>
4. Parsing TIG
</SectionTitle>
    <Paragraph position="0"> Since TIG is a restricted case of tree-adjoining grammar (TAG), standard O(n⁶)-time TAG parsers (Lang 1990; Schabes 1991; Vijay-Shanker 1987; Vijay-Shanker and Weir 1993; Vijay-Shanker and Joshi 1985) can be used for parsing TIG. Further, they can be easily optimized to require at most O(n⁴) time when applied to a TIG. However, this still does not take full advantage of the context-freeness of TIG.</Paragraph>
    <Paragraph position="1"> A simple O(n³)-time bottom-up recognizer for TIG in the style of the CKY parser for CFG can be straightforwardly constructed following the approach shown in Schabes and Waters (1993a).</Paragraph>
    <Paragraph position="2"> As shown below, one can obtain a more efficient left-to-right parsing algorithm for TIG that maintains the valid prefix property and requires O(n³) time in the worst case, by combining top-down prediction as in Earley's algorithm for parsing CFGs [Figure 8: An auxiliary tree and its textual representation.]</Paragraph>
    <Paragraph position="4"> (Earley 1970) with bottom-up recognition. The algorithm is a general recognizer for TIGs, which requires no condition on the grammar. 6</Paragraph>
    <Section position="1" start_page="487" end_page="490" type="sub_section">
      <SectionTitle>
4.1 An Earley-Style Cubic-Time Parser For TIG
</SectionTitle>
      <Paragraph position="0"> Notation. Suppose that G = (Σ, NT, I, A, S) is a TIG and that a1 ... an is an input string.</Paragraph>
      <Paragraph position="1"> The Greek letters μ, ν, and ρ are used to designate nodes in elementary trees. Subscripts are used to indicate the label on a node, e.g., μX. Superscripts are sometimes used to distinguish between nodes.</Paragraph>
      <Paragraph position="2"> A layer of an elementary tree is represented textually in a style similar to a production rule, e.g., μX → νY ρZ. For instance, the tree in Figure 8 is represented in terms of four layer productions as shown on the right of the figure.</Paragraph>
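Flattening a tree into layer productions can be sketched as follows. Since the Figure 8 tree is not reproduced here, the example below uses a made-up stand-in tree, and node names such as "m1_S" (mu-1, labeled S) are our own illustrative scheme.

```python
from itertools import count

# Flatten a (label, children) tree into layer productions, one per
# internal node, listed in top-down, left-to-right order. Node names
# are assigned in traversal order, so a parent is always numbered
# before its children.
def layer_productions(tree):
    counter = count(1)
    prods = []

    def walk(node):
        label, children = node
        name = f"m{next(counter)}_{label}"
        if children:
            slot = len(prods)
            prods.append(None)  # reserve the parent's place in the list
            prods[slot] = (name, [walk(c) for c in children])
        return name

    walk(tree)
    return prods
```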
      <Paragraph position="3"> The predicate Init(μX) is true if and only if μX is the root of an initial tree. The predicate LeftAux(ρX) is true if and only if ρX is the root of an elementary left auxiliary tree. The predicate RightAux(ρX) is true if and only if ρX is the root of an elementary right auxiliary tree. The predicate Subst(μX) is true if and only if μX is marked for substitution. The predicate Foot(μX) is true if and only if μX is the foot of an auxiliary tree. The predicate Adjoin(ρX, μX) is true if and only if the restrictions governing adjunction in TIG permit the auxiliary tree ρX to be adjoined on the node μX.</Paragraph>
      <Paragraph position="4"> Chart states. The Earley-style TIG parser collects states into a set called the chart, C. A state is a 3-tuple, [p, i, j], where: p is a position in an elementary tree as described below; and 0 ≤ i ≤ j ≤ n are integers indicating a span of the input string.</Paragraph>
      <Paragraph position="5"> During parsing, elementary trees are traversed in a top-down, left-to-right manner that visits the frontier nodes in left-to-right order (see Figure 9). Positions, which are depicted as dots in Figure 9, are used to represent the state of this traversal.</Paragraph>
      <Paragraph position="6"> In a manner analogous to dotted rules for CFG as defined by Earley (1968), being at a particular position with regard to a particular node divides the subtree rooted at the node into two parts: a left context consisting of children that have already been matched and a right context that still needs to be matched.</Paragraph>
      <Paragraph position="7"> Positions are represented by placing a dot in the production for the corresponding layer. For example, the fourth position reached in Figure 9 is represented as μ¹S → μ²A • μ⁴B. 6 This parser is the more remarkable because, for TAG, the best parser known that maintains the valid prefix property requires, in the worst case, more time than parsers that do not maintain the valid prefix property (O(n⁹)-time versus O(n⁶)) (Schabes 1991).</Paragraph>
      <Paragraph position="9"> In dotted layer productions, the Greek letters α, β, and γ are used to represent sequences of zero or more nodes.</Paragraph>
      <Paragraph position="10"> The indices i,j record the portion of the input string that is spanned by the left context. The fact that TIG forbids wrapping auxiliary trees guarantees that a pair of indices is always sufficient for representing a left context. As traversal proceeds, the left context grows larger and larger.</Paragraph>
      <Paragraph position="11"> Correctness condition. Given an input string a1 ... an, for every node μX in every elementary tree in G, the Earley-style TIG parsing algorithm guarantees that: [μX → α • β, i, j] ∈ C if and only if there is some derivation in G of some string beginning with a1 ... aj where ai+1 ... aj is spanned by: a sequence of zero or more left auxiliary trees simultaneously adjoined on μX, plus the children of μX corresponding to α, plus, if β is empty, zero or more right auxiliary trees simultaneously adjoined on μX.</Paragraph>
      <Paragraph position="12"> The algorithm. Figure 10 depicts the Earley-style TIG parsing algorithm as a set of inference rules. Using the deductive parser developed by Shieber, Schabes, and Pereira (1995), we were able to experiment with the TIG parser represented directly in this form (see Section 6).</Paragraph>
      <Paragraph position="13"> The first rule (1) initializes the chart by adding all states of the form [μS → •α, 0, 0], where μS is the root of an initial tree. The initial states encode the fact that any valid derivation must start from an initial tree whose root is labeled S.</Paragraph>
      <Paragraph position="14"> The addition of a new state to the chart can trigger the addition of other states as specified by the inference rules in Figure 10. Computation proceeds with the introduction of more and more states until no more inferences are possible. The last rule (13) specifies that the input is recognized if and only if the final chart contains a state of the form [μS → α•, 0, n], where μS is the root of an initial tree.</Paragraph>
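The control structure just described, adding states until no more inferences are possible, is a standard agenda-driven deductive closure. A generic Python sketch (the rule interface here is our own simplification of the inference-rule format, not the paper's):

```python
# Agenda-driven deductive closure: seed the chart with axiom states and
# apply every inference rule to each newly added state until fixpoint.
# A rule is a function from (state, chart) to an iterable of derived
# states; duplicates are filtered by the chart membership test.
def closure(axioms, rules):
    chart = set()
    agenda = list(axioms)
    while agenda:
        state = agenda.pop()
        if state in chart:
            continue
        chart.add(state)
        for rule in rules:
            agenda.extend(rule(state, chart))
    return chart
```

With the parser's rules 1-13 plugged in as rule functions, the final membership test of rule 13 becomes a simple check against the returned chart.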
      <Paragraph position="15"> The scanning and substitution rules recognize terminal symbols and substitutions of trees. They are similar to the steps found in Earley's parser for CFGs (Earley 1970). The scanning rules match fringe nodes against the input string. Rule 4 recognizes the presence of a terminal symbol in the input string. Rules 5 and 6 encode the fact that one can skip over nodes labeled with ε and foot nodes without having to match anything.</Paragraph>
      <Paragraph position="17"> An Earley-style recognizer for TIG, expressed using inference rules.</Paragraph>
      <Paragraph position="18"> The substitution rules are triggered by states of the form [μA → α • νB β, i, j] where νB is a node at which substitution can occur. Rule 7 predicts a substitution. It does this top down only if an appropriate prefix string has been found. Rule 8 recognizes a completed substitution. It is a bottom-up step that concatenates the boundaries of a fully recognized initial tree with a partially recognized tree.</Paragraph>
      <Paragraph position="19"> The subtree traversal rules control the recognition of subtrees. Rule 9 predicts a subtree if and only if the previous siblings have already been recognized. Rule 10 completes the recognition of a subtree. Rules 9 and 10 are closely analogous to rules 7 and 8. They can be looked at as recognizing a subtree that is required to be substituted as opposed to a subtree that may be substituted.</Paragraph>
      <Paragraph position="20"> The left and right adjunction rules recognize the adjunction of left and right auxiliary trees. The left adjunction rules are triggered by states of the form [μA → •α, i, j]. Rule 2 predicts the presence of a left auxiliary tree, if and only if a node that the auxiliary tree can adjoin on has already been predicted. Rule 3 supports the bottom-up recognition of the adjunction of a left auxiliary tree. The fact that left adjunction can occur any number of times (including zero) is captured by the fact that states of the form [μA → •α, i, j] represent both situations where left adjunction can occur and situations where it has occurred. The right adjunction rules (11 and 12) are analogous to the left adjunction rules, but are triggered by states of the form [μA → α•, i, j].</Paragraph>
      <Paragraph position="21"> As written in Figure 10, the algorithm is a recognizer. However, it can be straightforwardly converted to a parser by keeping track of the reasons why states are added to the chart. Derivations (and therefore trees) can then be retrieved from the chart (each in linear time).</Paragraph>
      <Paragraph position="22"> For the sake of simplicity, it was assumed in the discussion above that there are no adjunction constraints. However, the algorithm can easily be extended to handle such constraints by including them in the predicate Adjoin(ρX, μX).</Paragraph>
      <Paragraph position="23"> Computational bounds. The algorithm in Figure 10 requires space O(|G|n²) in the worst case. In this equation, n is the length of the input string and |G| is the size of the grammar G. For the TIG parser, |G| is computed as the sum over all the non-leaf nodes μ in all the elementary trees in G of: one plus the number of children of μ. The correctness of this space bound can be seen by observing that there are only |G|n² possible chart states [μX → α • β, i, j].</Paragraph>
      <Paragraph position="24"> The algorithm takes O(|G|²n³) time in the worst case. This can informally be seen by noting that the worst case complexity is due to the completion rules (3, 8, 10, and 12), because they apply to a pair of states, rather than just one state. Since each of the completion rules requires that the chart states be adjacent in the string, each can apply at most O(|G|²n³) times, since there are at most n³ possibilities for 0 ≤ i ≤ j ≤ k ≤ n.</Paragraph>
    </Section>
    <Section position="2" start_page="490" end_page="492" type="sub_section">
      <SectionTitle>
4.2 Improving the Efficiency of the TIG Parser
</SectionTitle>
      <Paragraph position="0"> As presented in Figure 10, the TIG parser is optimized for clarity rather than speed.</Paragraph>
      <Paragraph position="1"> There are several ways that the efficiency of the TIG parser can be improved.</Paragraph>
      <Paragraph position="2"> Parsing that is linear in the grammar size. The time complexity of the parser can be reduced from O(|G|²n³) to O(|G|n³) by using the techniques described in Graham et al. (1980).</Paragraph>
      <Paragraph position="3"> This improvement is very important, because |G| typically is much larger than n for natural language applications. The speedup can be achieved by altering the parser in two ways.</Paragraph>
      <Paragraph position="4"> The prediction rules (2, 7, 9, and 11) can apply O(|G|²n²) times, because they are triggered by a chart state and a grammar node ρ; and for each of O(|G|n²) possible values of the former there can be O(|G|) values of the latter. However, the new chart state produced by the prediction rules does not depend on the identity of the node in the triggering chart element, nor on the value of i, but rather only on whether there is any chart element ending at j that makes the relevant prediction. Therefore, the parser can be changed so that a prediction rule is triggered at most once for any j and ρ. This reduces the prediction rules to a time complexity of only O(|G|n).</Paragraph>
      <Paragraph position="5"> The completion rules (3, 8, 10, and 12) can apply O(|G|²n³) times, because they are triggered by pairs of chart states; and there can be O(|G|) possibilities for each element of the pair for each i ≤ j ≤ k. However, the new chart state produced by the completion rules does not depend on the identity of the node ρ in the second chart element, but rather only on whether there is any appropriate chart element from j to k. Therefore, the parser can be changed so that a completion rule is triggered at most once for any possible first chart state and k. This reduces the completion rules to a time complexity of O(|G|n³).</Paragraph>
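The first of these changes can be sketched as follows in Python (the state and index encodings are illustrative, not the paper's):

```python
# Fire each prediction at most once per (end position j, predicted
# node p): many chart states ending at j may ask for the same
# prediction, but the predicted state [p -> . gamma, j, j] is the same
# for all of them, so a seen-set keyed on (j, p) suffices.
def run_predictions(trigger_states, predictions_for):
    seen = set()
    predicted = []
    for (node, i, j) in trigger_states:
        for p in predictions_for(node):
            if (j, p) not in seen:
                seen.add((j, p))
                predicted.append((p, j, j))
    return predicted
```

The completion-rule change is analogous: key the firing on the first chart state and the end position k rather than on the full pair of states.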
      <Paragraph position="6"> Eliminating equivalent states. Rules 5 and 6 merely move from state to state without changing the span i, j. These rules reflect facts about the grammar and the traversal that do not depend on the input. These rules can be largely precompiled out of the</Paragraph>
      <Paragraph position="7"> algorithm by noting that the following states are equivalent: [μA → • νX α, i, j] ≡ [μA → νX • α, i, j] if (X = ε ∨ Foot(νX)) ∧ ¬∃ρA LeftAux(ρA) [μA → α • νX β, i, j] ≡ [μA → α νX • β, i, j] if (X = ε ∨ Foot(νX)) To take advantage of equivalent states during parsing, one skips directly from the first to the last state in a set of equivalent states. This avoids going through the normal rule application process and has the effect of reducing the grammar size.</Paragraph>
      <Paragraph position="8"> For a state [μA → • νX α, i, j] to be equivalent to [μA → νX • α, i, j], it is not sufficient that νX be labeled ε or be a foot node. It must also be the case that left adjunction is not possible on μA. If left adjunction is possible on μA, the state [μA → • νX α, i, j] must be independently retained in order to trigger left adjunction when appropriate.</Paragraph>
      <Paragraph position="9"> Sharing nodes in a TIG. An important feature of the parser in Figure 10 is that the nth child of a node need not be unique and a subtree need not have only one parent. (Nonuniqueness indicates that a subtree or a supertree appears at several different places in the grammar.) The only requirement when sharing nodes is that every possible way of constructing a tree that is consistent with the parent-child relationships must be a valid elementary tree in the grammar.</Paragraph>
      <Paragraph position="10"> For example, consider the trees in Figure 11.</Paragraph>
      <Paragraph position="11">  A pair of TIG trees.</Paragraph>
      <Paragraph position="12"> They can be represented individually as follows:</Paragraph>
      <Paragraph position="14"> However, taking maximum advantage of sharing within and between the trees, they can be represented more compactly as a single set of layer productions, with the declarations LeftAux(μ¹), Subst(μ⁶), and Foot(μ⁸) attached to the relevant nodes. In the above, two kinds of sharing are apparent. Subtrees are shared by using the same node (for example μA) on the right-hand side of more than one layer production.</Paragraph>
      <Paragraph position="15"> Supertrees are shared by explicitly recording the fact that there are multiple alternatives for the nth child of some node. This is represented textually above using curly braces.</Paragraph>
      <Paragraph position="16"> In the case of Figure 11, sharing reduces the grammar size |G| from 21 to 11.</Paragraph>
      <Paragraph position="17"> Depending on the amount of sharing present in a grammar, an exponential decrease in the grammar size is possible.</Paragraph>
      <Paragraph position="18"> Parsing left anchored LTIGs. The algorithm above can be extended to take advantage of the fact that the elementary trees in an LTIG are lexicalized. This does not change the worst case complexity, but is a dramatic improvement in typical situations, because it has the effect of dramatically reducing the size of the grammar that has to be considered when parsing a particular input string.</Paragraph>
      <Paragraph position="19"> Space does not permit a discussion of all the ways lexical sensitivity can be introduced into the TIG parser. However, one way of doing this is particularly important in the context of this paper. The LTIG lexicalization procedure presented in Section 5 produces grammars that have no left auxiliary trees and are left anchored---ones where for each elementary tree, the first element that must be matched against the input is a lexical item. By means of two simple changes in the prediction rules, the TIG parser can benefit greatly from this kind of lexicalization.</Paragraph>
      <Paragraph position="20"> First, whenever considering a node μB for prediction at position j, it should only be predicted if its anchor is equal to the next input item aj+1. Other predictions cannot lead to successful matches. However, if sharing is being used, then one chart state can correspond to a number of different positions in different trees. As a result, even though every tree has a unique left anchor, a given chart state can correspond to a set of such trees and therefore a set of such anchors. A prediction should be made if any of these anchors is the next element of the input.</Paragraph>
      <Paragraph position="21"> Second, when predicting a node ηb whose first child is a terminal symbol, it is known from the above that this child must match the next input element. Therefore, there is no need to create the state [ηb → •uα, j, j]. One can instead skip directly to the state [ηb → u•α, j, j+1].</Paragraph>
      <Paragraph position="22"> Both of the changes above depend critically on the fact that there are no left auxiliary trees. In particular, if there is a left auxiliary tree ρb that can be adjoined on ηb, then the next input item may be matched by ρb rather than ηb, and neither of the shortcuts above can be applied.</Paragraph>
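A minimal sketch of these two shortcuts (the state encoding and rule format are illustrative, not the paper's parser): a rule is predicted only when its left anchor matches the next token, and the dot is advanced past the anchor immediately, so no scan step is needed.

```python
# States are (lhs, rhs, dot, start); in a left-anchored grammar with no left
# auxiliary trees, rhs[0] is always a terminal (the anchor).
def predict(nonterminal, j, next_token, rules):
    """Return only the useful predicted states at input position j.

    Change 1: a rule is predicted only if its anchor equals the next token.
    Change 2: since the anchor is then known to match, the dot is advanced
    past it immediately, yielding states spanning j..j+1.
    """
    states = []
    for lhs, rhs in rules.get(nonterminal, []):
        if rhs[0] != next_token:         # Change 1: anchor filter
            continue
        states.append((lhs, rhs, 1, j))  # Change 2: skip the scan step
    return states

# Toy left-anchored rules: every right-hand side starts with a terminal.
RULES = {
    "NP": [("NP", ("the", "N")), ("NP", ("a", "N"))],
    "VP": [("VP", ("saw", "NP"))],
}

print(predict("NP", 0, "the", RULES))  # only the 'the' rule survives
```

With supertree sharing, the anchor test on a state would range over the set of anchors of all trees the state stands for, as described above.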
    </Section>
  </Section>
  <Section position="7" start_page="492" end_page="504" type="metho">
    <SectionTitle>
5. TIG Strongly Lexicalizes CFG
</SectionTitle>
    <Paragraph position="0"> In the following, we say that a grammar is lexicalized (Schabes 1990; Schabes et al. 1988) if every elementary structure contains a terminal symbol called the anchor. A CFG is lexicalized if every production rule contains a terminal. Similarly, a TIG is lexicalized if every tree contains a terminal symbol.</Paragraph>
    <Paragraph position="1"> A formalism F' is said to lexicalize another formalism F (Joshi and Schabes 1992) if, for every grammar G in F that does not derive the empty string, there is a lexicalized grammar G' in F' such that G and G' generate the same string set.</Paragraph>
    <Paragraph position="2"> F' is said to strongly lexicalize F if for every finitely ambiguous grammar G in F that does not derive the empty string, there is a lexicalized grammar G' in F' such that G and G' generate the same string set and tree set.</Paragraph>
    <Paragraph position="3"> The restrictions on the form of G in the definitions above are motivated by two key properties of lexicalized grammars (Joshi and Schabes 1992). First, lexicalized grammars cannot derive the empty string, because every structure introduces at least one lexical item. Thus, if a CFG is to be lexicalized, it must not be the case that S ⇒* ε. Second, lexicalized grammars are finitely ambiguous, because every rule introduces at least one lexical item into the resulting string. Thus, if a grammar is to be strongly lexicalized, it must be only finitely ambiguous. In the case of a CFG, this means that it must not be the case that X ⇒+ X for any nonterminal X.</Paragraph>
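The first precondition can be checked mechanically. Here is the standard nullable-symbol fixpoint (a sketch; the grammar encoding as a dict of right-hand-side tuples is illustrative) that determines whether S ⇒* ε:

```python
# Checking the precondition that the CFG not derive the empty string:
# compute the set of nullable nonterminals by a fixpoint, then test S.
def nullable(grammar):
    """grammar: {lhs: [rhs tuples]}; returns the set of nullable nonterminals."""
    null = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            if lhs in null:
                continue
            # lhs is nullable if some rhs consists entirely of nullable symbols
            # (an empty rhs is vacuously all-nullable).
            if any(all(sym in null for sym in rhs) for rhs in rhss):
                null.add(lhs)
                changed = True
    return null

G = {"S": [("A", "B"), ("a",)], "A": [()], "B": [("b",), ()]}
print("S" in nullable(G))  # True: S derives the empty string via A B
```

A grammar passes the precondition exactly when its start symbol is not in the computed set.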
    <Paragraph position="4"> Computational Linguistics, Volume 21, Number 4
As shown by Greibach (1965) and Rosenkrantz (1967), any CFG that does not generate the empty string can be converted into a lexicalized CFG. Moreover, this grammar can be left anchored---one where the first element of the right-hand side of each rule is a terminal symbol. However, this is only a weak lexicalization, because the trees generated by the lexicalized grammar are not the same as those generated by the original CFG.</Paragraph>
    <Paragraph position="5"> Another way to lexicalize CFGs is to convert them into categorial grammars (Bar-Hillel 1964). However, these are again only weak lexicalizations because the trees produced are not preserved. 7 Strong lexicalization can be obtained using TAG (Joshi and Schabes 1992; Schabes 1990), but only at the cost of O(n^6) parsing. TIG is O(n^3) parsable and strongly lexicalizes CFG.</Paragraph>
    <Section position="1" start_page="493" end_page="500" type="sub_section">
      <SectionTitle>
5.1 A Strong Lexicalization Procedure
</SectionTitle>
      <Paragraph position="0"> In the following, we give a constructive proof of the fact that TIG strongly lexicalizes CFG. The proof is based on a lexicalization procedure related to the one used to create Greibach normal form (GNF), as presented in Harrison (1978).</Paragraph>
      <Paragraph position="1"> 5.1.1 Lemmas. Our procedure relies on the following four lemmas. The first lemma converts CFGs into a very restricted form of TIG. The next three lemmas describe ways that TIGs can be transformed without changing the trees produced.</Paragraph>
      <Paragraph position="2"> Lemma 1 Any finitely ambiguous CFG G = (Σ, NT, P, S) can be converted into a TIG G' = (Σ, NT, I, {}, S) such that: (i) there are no auxiliary trees; (ii) no initial tree contains any interior nodes; (iii) G' generates the same trees and, therefore, the same strings as G; (iv) there is only one way to derive a given tree in G'.</Paragraph>
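The construction in Lemma 1 amounts to reading each production off as a one-level tree. A sketch under an ad hoc tree encoding (a label plus a list of (child label, marked-for-substitution?) pairs):

```python
# Sketch of the Lemma 1 construction: each CFG rule becomes a one-level
# initial tree; nonterminal children are marked for substitution, and an
# empty right-hand side becomes a single epsilon child.
def cfg_to_tig(productions, nonterminals):
    """productions: list of (lhs, rhs-tuple); returns a list of initial trees,
    each encoded as (root-label, [(child-label, marked_for_substitution)])."""
    trees = []
    for lhs, rhs in productions:
        if rhs:
            children = [(sym, sym in nonterminals) for sym in rhs]
        else:
            children = [("eps", False)]  # empty rule: single epsilon child
        trees.append((lhs, children))
    return trees

P = [("S", ("NP", "VP")), ("NP", ("the", "N")), ("VP", ())]
print(cfg_to_tig(P, {"S", "NP", "VP", "N"}))
```

Because each rule maps to exactly one tree and no interior nodes are created, the one-to-one correspondence between CFG derivations and TIG derivations claimed by the lemma is immediate from this encoding.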
      <Paragraph position="3">  We assume without loss of generality that G does not contain any useless production. The set I of initial trees in G' is constructed by converting each rule R in P into a one-level tree t whose root is labeled with the left-hand side of R. If R has n &gt; 0 elements on its right-hand side, then t is given n children, each labeled with the corresponding right-hand-side element. Each child labeled with a nonterminal is marked for substitution. If the right-hand side of R is empty, t is given one child labeled with ε. By construction, there are no auxiliary trees and no interior nodes in any initial tree. There is an exact one-to-one correspondence between derivations in G and derivations using the initial trees. Each rule substitution in G becomes a tree substitution in G'. As a result, exactly the same trees are generated in both cases, and there is only one way to generate each tree in G', because there cannot be two ways to derive the same tree in a CFG. □
Lemma 2 Let G = (Σ, NT, I, A, S) be a TIG. Let t ∈ I ∪ A be an elementary tree whose root is labeled Y and let # be a frontier element of t that is labeled X and marked for substitution. Further, suppose that if t is an initial tree, X ≠ Y. Let T' be the set of
7 This is true even if Bar-Hillel's Categorial Grammars are augmented with composition (Joshi, personal communication).</Paragraph>
      <Paragraph position="4">  The transformation specified by this lemma closes over substitution into # and then discards t. Since t cannot be substituted into #, this only generates a finite number of additional trees.</Paragraph>
      <Paragraph position="5"> Any complete derivation in G can be converted into exactly one derivation in G' as follows: A derivation consists of elementary trees and operations between them. Every use of t in a complete derivation in G has to be associated with a substitution of some u ∈ I for #. Taken as a group, the two trees t and u, along with the substitution operation between them, can be replaced by the appropriate new tree t' ∈ T' that was added in the construction of G'.</Paragraph>
      <Paragraph position="6"> Since TIGs do not treat the roots of initial trees in any special way, there is no problem converting any operation applied to the root of u into an operation on the corresponding interior node of t'. Further, since it cannot be the case that t = u, there is no ambiguity in the mapping defined above.</Paragraph>
      <Paragraph position="7"> Any derivation in G' can be converted into exactly one derivation in G by doing the reverse of the conversion above. Each instance t' of one of the new trees introduced is replaced by an instance of t with the appropriate initial tree u ∈ I being combined with it by substitution.</Paragraph>
      <Paragraph position="8"> Again, since TIGs do not treat the roots of initial trees in any special way, there is no problem converting any operation applied to an interior node of t' that corresponds to the root of u into an operation on the root of u.</Paragraph>
      <Paragraph position="9"> Further, if there is only one way to derive a given tree in G, there is no ambiguity in the mapping from derivations in G' to G, because there is no ambiguity in the mapping of T' to trees in G. The tree t' must be different from the other trees generated when creating T', because t' contains complete information about the trees it was created from. The tree t' must not be in I ∪ A. If it were, there would be multiple derivations for some tree in G---one involving t' and one involving t and u. Finally, t' must be different from t, because it must be larger than t.</Paragraph>
      <Paragraph position="10"> If there is only one way to derive a given tree in G, the mappings between derivations in G' and G are one-to-one and there is therefore only one way to derive a given tree in G'. □
Lemma 3 Let G = (Σ, NT, I, A, S) be a TIG. Let t ∈ I be an elementary initial tree whose root is labeled with X ≠ S. Further, suppose that none of the substitution nodes, if any, on the fringe of t are labeled X. Let U' be the set of every initial tree that can be created by substituting t for one or more frontier nodes in an initial tree u ∈ I that are labeled X and marked for substitution. Let V' be the set of every auxiliary tree that can be created by substituting t for one or more frontier nodes in an auxiliary tree v ∈ A that are labeled X and marked for substitution. Define G' = (Σ, NT, I', A', S) where</Paragraph>
      <Paragraph position="12"> Then, G' generates exactly the same trees as G. Further, if there is only one way to generate each tree generated by G, then there is only one way to generate each tree generated by G'.</Paragraph>
      <Paragraph position="13">  The transformation specified by this lemma closes over substitution of t and then discards t. Since t cannot be substituted into itself, this generates only a finite number of additional trees. Since the root of t is not labeled S, t is not required for any purpose other than substitution.</Paragraph>
      <Paragraph position="14"> Any complete derivation in G can be converted into exactly one derivation in G' as follows: Since the root of t is not labeled S, every use of t in a complete derivation in G has to be substituted into some frontier node # of some u ∈ I ∪ A. Taken as a group, the two trees u and t, along with any other copies of t substituted into other frontier nodes of u and the substitution operations between them, can be replaced by the appropriate new tree u' ∈ U' ∪ V' that was added in the construction of G'. Since TIGs do not treat the roots of initial trees in any special way, there is no problem converting any operation applied to the root of t into an operation on the corresponding interior node of u'. Further, since it cannot be the case that t = u, there is no ambiguity in the mapping defined above.</Paragraph>
      <Paragraph position="15"> Any derivation in G' can be converted into a derivation in G by doing the reverse of the conversion above. Each instance u' of one of the new trees introduced is replaced by one or more instances of t substituted into the appropriate tree u ∈ I ∪ A.</Paragraph>
      <Paragraph position="16"> Again, since TIGs do not treat the roots of initial trees in any special way, there is no problem converting any operation applied to the interior node of u' that corresponds to the root of t into an operation on the root of t.</Paragraph>
      <Paragraph position="17"> Further, if there is only one way to derive a given tree in G, there is no ambiguity in the mapping from derivations in G' to G, because there is no ambiguity in the mapping of u' to trees in G. The tree u' must be different from the trees that are generated by substituting t in other trees u, because u' contains complete information about the trees it was created from. The tree u' must not be in I ∪ A. If it were, there would be multiple derivations for some tree in G---one involving u' and one involving u and t. Finally, u' must be different from t, because it must be larger than t.</Paragraph>
      <Paragraph position="18"> If there is only one way to derive a given tree in G, the mappings between derivations in G' and G are one-to-one and there is therefore only one way to derive a given tree in G'. □
Lemma 4 Let G = (Σ, NT, I, A, S) be a TIG and X ∈ NT be a nonterminal. Let T ⊆ I be the set of every elementary initial tree t such that the root of t and the leftmost nonempty frontier node of t are both labeled X. Suppose that every node labeled X where adjunction can occur is the root of an initial tree in I. Suppose also that there is no tree in A whose root is labeled X. Let T' be the set of right auxiliary trees created by marking the first nonempty frontier node of each element of T as a foot rather than for substitution. Define G' = (Σ, NT, I - T, A ∪ T', S).</Paragraph>
      <Paragraph position="19"> Then, G' generates exactly the same trees as G. Further, if there is only one way to generate each tree generated by G, then there is only one way to generate each tree generated by G'.  Note that when converting the trees in T into trees in T', every initial tree is converted into a different auxiliary tree. Therefore, there is a one-to-one mapping between trees in T and T'. Further, since there are no X-rooted trees in A, A ∩ T' = {}.</Paragraph>
      <Paragraph position="20"> Since in G, every node labeled X where adjunction can occur is the root of an initial tree in I, it must be the case that in G', every node labeled X where adjunction can occur is the root of an initial tree in I - T, because the construction of T' did not create any new nodes labeled X where adjunction can occur. Therefore, the only way that any element of T' can be used in a derivation in G' is by adjoining it on the root of an initial tree u. The effect of this adjunction is exactly the same as substituting the corresponding t ∈ I in place of u and then substituting u for the first nonempty frontier node of t.</Paragraph>
      <Paragraph position="21"> Any complete derivation in G can be converted into exactly one derivation in G' as follows: Every instance of a tree in T has to occur in a substitution chain. The chain consists of some number of instances t1, t2, ..., tm of trees in T, with each tree substituted for the leftmost nonempty frontier node of the next. The top of the chain tm is either not substituted anywhere (i.e., only if X = S) or substituted at a node that is not the leftmost nonempty node of a tree in T. The bottom tree in the chain t1 has some tree u ∉ T substituted for its leftmost nonempty frontier node. Since there are no X-rooted trees in A, there cannot be any adjunction on the root of u or on the roots of any of the trees in the chain. The chain as a whole can be replaced by the simultaneous adjunction of the corresponding trees t'1, t'2, ..., t'm in T' on the root of u, with u used in the same way that tm was used.</Paragraph>
      <Paragraph position="22"> Any derivation in G' can be converted into a derivation in G by doing the reverse of the conversion above. Each use of a tree in T' must occur as part of the simultaneous adjunction of one or more auxiliary trees on the root of some initial tree u, because there are no other nodes at which this tree can be adjoined. Since the trees in T' are the only X-rooted trees in A', all the trees being simultaneously adjoined must be instances of trees in T'. The simultaneous adjunction can be replaced with a substitution chain combining the corresponding trees in T, with u substituted into the tree at the bottom of the chain and the top of the chain used however u was used.</Paragraph>
      <Paragraph position="23"> Further, if there is only one way to derive a given tree in G, there is no ambiguity in the mapping from derivations in G' to G, because there is no ambiguity in the mapping of the t'i to trees in G. If there is only one way to derive a given tree in G, the mappings between derivations in G' and G are one-to-one and there is therefore only one way to derive a given tree in G'. □ After an application of Lemmas 2-4, a TIG may no longer be in reduced form; however, it can be brought back to reduced form by discarding any unnecessary elementary trees. For instance, in Lemma 2, if # is the only substitution node labeled X and X ≠ S, then when t is discarded, every X-rooted initial tree can be discarded as well.</Paragraph>
      <Paragraph position="24"> 5.1.2 Constructing an LTIG. Using the above lemmas, an LTIG corresponding to a CFG can be constructed.</Paragraph>
      <Paragraph position="25"> Theorem 2 If G = (Σ, NT, P, S) is a finitely ambiguous CFG that does not generate the empty string, then there is an LTIG G' = (Σ, NT, I', A', S) generating the same language and tree set as G with each tree derivable in only one way. Furthermore, G' can be chosen so that all the auxiliary trees are right auxiliary trees and every elementary tree is left anchored.</Paragraph>
      <Paragraph position="26">  To prove the theorem, we first prove a somewhat weaker theorem and then extend the proof to the full theorem. We assume for the moment that the set of rules for G does not contain any empty rules of the form A → ε.</Paragraph>
      <Paragraph position="27"> The proof proceeds in four steps. At each step, none of the modifications made to the grammar changes the tree set produced or introduces more than one way to derive any tree. Therefore, the degree of ambiguity of each string is preserved by the constructed LTIG.</Paragraph>
      <Paragraph position="28"> An ordering {A1, ..., Am} of the nonterminals NT is assumed.</Paragraph>
      <Paragraph position="29"> * Step 1: Using Lemma 1, we first convert G into an equivalent TIG (Σ, NT, I, {}, S), generating the same trees. Because G does not contain any empty rules, the set of initial trees created does not contain any empty trees.</Paragraph>
      <Paragraph position="30"> * Step 2: In this step, we modify the grammar of Step 1 so that every initial tree t ∈ I satisfies the following property Π. Let the label of the root of t be Ai. The tree t must either: (i) be left anchored, i.e., have a terminal as its first nonempty frontier node; or (ii) have a first nonempty frontier node labeled Aj where i &lt; j.</Paragraph>
      <Paragraph position="31"> We modify the grammar to satisfy Π inductively for increasing values of i. Consider the A1-rooted initial trees that do not satisfy Π. Such trees must have their first nonempty frontier node labeled with A1. These initial trees are converted into right auxiliary trees as specified by Lemma 4. The applicability of Lemma 4 in this case is guaranteed since, after Step 1, there are no auxiliary trees, no interior nodes, and TIG prohibits adjunction at frontier nodes.</Paragraph>
      <Paragraph position="32"> We now assume inductively that Π holds for every Ai-rooted initial tree t where i &lt; k.</Paragraph>
      <Paragraph position="33"> Step 2a: Consider the Ak-rooted initial trees that fail to satisfy Π.</Paragraph>
      <Paragraph position="34"> Each one must have a first nonempty frontier node # labeled with Aj where j ≤ k. For those where j &lt; k, we generate a new set of initial trees by substituting other initial trees for # in accordance with Lemma 2.</Paragraph>
      <Paragraph position="35"> By the inductive hypothesis, the substitutions specified by Lemma 2 result in trees that are either left anchored, or have first nonempty frontier nodes labeled with Al where l &gt; j. For those trees where l &lt; k, substitution as specified by Lemma 2 is applied again.</Paragraph>
      <Paragraph position="36"> After at most k - 1 rounds of substitution, we reach a situation where every Ak-rooted initial tree that fails to satisfy Π has a first nonempty frontier node labeled with Ak.</Paragraph>
      <Paragraph position="37"> Step 2b: The Ak-rooted initial trees where the first nonempty frontier node is labeled with Ak are then converted into right auxiliary trees as specified by Lemma 4. The applicability of Lemma 4 in this situation is guaranteed by the following facts.</Paragraph>
      <Paragraph position="38"> First, there cannot have previously been any Ak-rooted auxiliary trees, because there were none after Step 1, and every auxiliary tree previously introduced in this induction has a root labeled Ai for some i &lt; k. Second, there cannot be any internal nodes in any elementary tree labeled Ak, because there were none after Step 1, and all subsequent substitutions have been at nodes labeled Ai where i &lt; k.</Paragraph>
      <Paragraph position="39"> Steps 2a and 2b are applied iteratively for each i, 1 ≤ i ≤ m, until every initial tree satisfies Π.</Paragraph>
      <Paragraph position="40"> * Step 3: In this step, we modify the set of initial trees further until every one is left anchored. We modify the grammar to satisfy this property inductively for decreasing values of i.</Paragraph>
      <Paragraph position="41"> According to property Π, every Am-rooted initial tree is left anchored, because there are no higher indexed nonterminals.</Paragraph>
      <Paragraph position="42"> We now assume inductively that every Ai-rooted initial tree t where i &gt; k is left anchored.</Paragraph>
      <Paragraph position="43"> The Ak-rooted initial trees must be left anchored, or have leftmost nonempty frontier nodes labeled with Aj, where j &gt; k. When the label is Aj, we generate new initial trees using Lemma 2. These new trees are all left anchored, because by the induction hypothesis, all the trees u substituted by Lemma 2 are left anchored.</Paragraph>
      <Paragraph position="44"> The above is repeated for each i until i = 1 is reached.</Paragraph>
      <Paragraph position="45"> * Step 4: Finally, consider the auxiliary trees created above. Each is a right auxiliary tree. If an auxiliary tree t is not left anchored, then the first nonempty frontier element after the foot is labeled with some nonterminal Ai. There must be some nonempty frontier element after the foot of t because G is not infinitely ambiguous. We can use Lemma 2 yet again to replace t with a set of left anchored right auxiliary trees. All the trees produced must be left anchored because all the initial trees resulting from Step 3 are left anchored.</Paragraph>
      <Paragraph position="46"> * Empty rules: The auxiliary assumption that G does not contain empty rules can be dispensed with.</Paragraph>
      <Paragraph position="47"> If G contains empty rules, then the TIG created in Step 1 will contain empty trees. These trees can be eliminated by repeated application of Lemma 3. Let t be an empty tree. Since G does not derive the empty string, the label of the root of t is not S. The tree t can be eliminated by applying Lemma 3. This can lead to the creation of new empty trees.</Paragraph>
      <Paragraph position="48"> However, these can be eliminated in turn using Lemma 3. This process must terminate because G is finitely ambiguous.</Paragraph>
      <Paragraph position="49"> Mark all the interior nodes in all the initial trees created by Lemma 3 as nodes where adjunction cannot occur. With the inclusion of these adjoining constraints, the procedure above works just as before. □ In the worst case, the number of elementary trees created by the LTIG procedure above can be exponentially greater than the number of production rules in G. This explosion in numbers comes from the compounding of repeated substitutions in Steps 2 and 3.</Paragraph>
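At the level of flat rules (ignoring the tree structure, and omitting Step 2b's conversion of left-recursive trees into auxiliary trees), the back-substitution of Step 3 can be sketched as follows; the grammar encoding is illustrative:

```python
# Rule-level sketch of Step 3: for decreasing i, expand a leading
# higher-indexed nonterminal until every rule is left anchored.
# After Step 2, any leading nonterminal in an Ai-rule has a higher index,
# so this loop terminates.
def step3(grammar, order, terminals):
    """grammar: {nonterminal: [rhs tuples]}; order: [A1, ..., Am]."""
    for i in range(len(order) - 1, -1, -1):
        a = order[i]
        done = False
        while not done:
            done = True
            new = []
            for rhs in grammar[a]:
                if rhs and rhs[0] not in terminals:
                    # leading nonterminal: substitute in all its alternatives
                    new.extend(alt + rhs[1:] for alt in grammar[rhs[0]])
                    done = False
                else:
                    new.append(rhs)
            grammar[a] = new
    return grammar

G = {"A1": [("A2", "b")], "A2": [("a",)]}
print(step3(G, ["A1", "A2"], {"a", "b"}))  # A1's rule becomes ('a', 'b')
```

The compounding of substitutions visible here (each alternative of the leading nonterminal yields a copy of the rest of the rule) is exactly the source of the exponential growth, and also of the repetitive structure that sharing exploits.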
      <Paragraph position="50">  Figure 12: Example of the operation of the LTIG procedure.</Paragraph>
      <Paragraph position="51"> However, as noted at the end of Section 4, counting the number of elementary trees is not an appropriate measure of the size of an LTIG. The compounding of substitutions in the LTIG procedure causes there to be a large amount of sharing between the elementary trees. Taking advantage of this sharing can counteract the exponential growth in the number of rules completely. In particular, if the CFG does not have any empty rules or sets of mutually left recursive rules involving more than one nonterminal, then the size of the LTIG created by the procedure of Theorem 2 will be smaller than the size of the original CFG.</Paragraph>
      <Paragraph position="52"> On the other hand, if a grammar has many sets of mutually left recursive rules involving more than one nonterminal, even taking advantage of sharing cannot stop an exponential explosion in the size of the LTIG. In the worst case, a grammar with m nonterminals can have m! sets of mutually left recursive rules, and the resulting LTIG will be enormous.</Paragraph>
      <Paragraph position="53"> 5.1.3 An Example. Figure 12 illustrates the operation of the LTIG procedure. Step 1 of the procedure converts the CFG at the top of the figure to the TIG shown on the second line.</Paragraph>
      <Paragraph position="54">  In Step 2, no change is necessary in the A1-initial tree. However, the first A2-initial tree has the A1-initial tree substituted into it. After that, the first two A2-initial trees are converted into auxiliary trees as shown on the third line of Figure 12. In Step 3, the A1-initial tree is lexicalized by substituting the remaining A2-initial tree into it. Step 4 creates the final LTIG by lexicalizing the auxiliary trees. The A1-initial tree is retained under the assumption that A1 is the start symbol of the grammar.  It has been shown (Joshi and Schabes 1992; Schabes 1990) that TAG extended with adjoining constraints not only strongly lexicalizes CFG, but itself as well. We conjecture that our construction can be extended so that given any TIG as input, an LTIG generating the same trees could be produced. As with TAGs, adjoining constraints forbidding the adjunction of specific auxiliary trees on specific nodes can be required in the resulting LTIG.</Paragraph>
    </Section>
    <Section position="2" start_page="500" end_page="502" type="sub_section">
      <SectionTitle>
5.2 Comparison of the LTIG, GNF, and Rosenkrantz Procedures
</SectionTitle>
      <Paragraph position="0"> The LTIG procedure is closely related to the procedure traditionally used to create GNF (see, for example, Harrison 1978). This procedure is referred to below as the GNF procedure. It is not the procedure originally developed by Greibach (1965). Rather, it is very similar to the procedure developed shortly thereafter by Abbott and Kuno (1965). The main part of the GNF procedure operates in three steps that are similar to Steps 2, 3, and 4. However, there are five important differences between the LTIG and GNF procedures.</Paragraph>
      <Paragraph position="1"> First, in lieu of Step 1, the GNF procedure converts the input into Chomsky normal form. This eliminates infinite ambiguity and empty rules, and puts the input grammar in a very specific form. The elimination of infinite ambiguity is essential, because the GNF procedure will not operate if infinite ambiguity is present. The elimination of empty rules is also essential, because empty rules in the input to the rest of the GNF procedure lead to empty rules in the output. However, the remaining changes caused by putting the input in Chomsky normal form are irrelevant to the basic goal of creating a left anchored output. A more compact left anchored grammar can typically be produced by eliminating infinite ambiguity and empty rules without making the other changes necessary to put the input in Chomsky normal form. In the following discussion, we assume a modified version of the GNF procedure that takes this approach.</Paragraph>
      <Paragraph position="2"> Second, the GNF procedure can reduce the ambiguity of the input grammar. This is due to loss of information when the same rule is derived in more than one way by the GNF procedure. Ambiguity can be retained simply by retaining any duplicate rules that are derived (Abbott and Kuno 1965).</Paragraph>
      <Paragraph position="3"> Third, the GNF procedure changes the trees produced. This is an essential difference and cannot be avoided. However, as shown by Abbott and Kuno (1965), it is possible to transform parse trees created using the GNF into the parse trees that would have been obtained using the original grammar, based on a record of exactly how each GNF rule was derived. In contrast to LTIG, which derives the correct trees in the first place, this transformation requires a separate post-processing phase after parsing.</Paragraph>
      <Paragraph position="4"> The fourth important difference between the LTIG and GNF procedures is the way they handle left recursive rules. The LTIG procedure converts them into right auxiliary trees. In contrast, the GNF procedure converts them into right recursive rules. That is to say, the GNF procedure converts rules of the form Ak → Ak α | β into rules of the form Ak → β | β Zk and Zk → α | α Zk. This is the source of the most radical changes in the trees produced.</Paragraph>
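The GNF transformation just described can be sketched at the rule level (an illustrative helper, not the paper's code): left recursive alternatives Ak → Ak α are replaced using a fresh right-recursive nonterminal Zk.

```python
# Classic left-recursion elimination used by the GNF procedure:
# A -> A a | b  becomes  A -> b | b Z  and  Z -> a | a Z  (Z a fresh symbol).
def eliminate_left_recursion(lhs, rules):
    """rules: list of right-hand sides (tuples of symbols) for nonterminal lhs."""
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == lhs]   # the alphas
    base = [rhs for rhs in rules if not rhs or rhs[0] != lhs]         # the betas
    if not recursive:
        return {lhs: rules}  # nothing to do
    z = lhs + "'"  # fresh right-recursive nonterminal (Zk in the text)
    new_lhs = [b for b in base] + [b + (z,) for b in base]
    new_z = [a for a in recursive] + [a + (z,) for a in recursive]
    return {lhs: new_lhs, z: new_z}

# A -> A a | b
print(eliminate_left_recursion("A", [("A", "a"), ("b",)]))
```

Where the string set is preserved, the derivation trees are not: the left-leaning spine over A becomes a right-leaning spine over Z, which is exactly the tree change the LTIG procedure avoids by using right auxiliary trees instead.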
      <Paragraph position="5">  Figure 14: The LTIG of Figure 12 converted into a CFG.</Paragraph>
      <Paragraph position="6"> Figure 13 illustrates the operation of the GNF procedure when applied to the same CFG as in Figure 12. Since the input grammar is finitely ambiguous and has no empty rules, it can be operated on as is.</Paragraph>
      <Paragraph position="7"> The step of the GNF procedure corresponding to Step 2 of the LTIG procedure converts the CFG at the top of Figure 13 into the rules shown in the second part of the figure. No change is necessary in the A1 rule. However, the first A2 rule has the A1 rule substituted into it. After that, the left recursive A2 rules are converted into right recursive rules utilizing a new nonterminal Z2.</Paragraph>
      <Paragraph position="8"> The step of the GNF procedure corresponding to Step 3 of the LTIG procedure lexicalizes the A1 rule by substituting the A2 rules into it.</Paragraph>
      <Paragraph position="9"> The final step of the GNF procedure lexicalizes the Z2 rules as shown at the bottom of Figure 13. Note that there are eight ways of substituting an A1 or A2 rule into the first position of a Z2 rule, but they yield only four distinct rules. For example, substituting A1 → aA2 into Z2 → A1 yields the same result as substituting A2 → a into Z2 → A2A2. If the LTIG created in Figure 12 is converted into a CFG as specified in Theorem 1, the rules in Figure 14 are obtained. Ambiguity is lost in this transformation, because both auxiliary trees turn into the same rule. If the empty rule in Figure 14 is eliminated by substitution, a grammar identical to the one at the bottom of Figure 13 results.</Paragraph>
      <Paragraph position="10"> We conjecture that there is, in general, an exact correspondence between the output of the LTIG procedure and the GNF procedure. In particular, if (a) the LTIG procedure is applied to a CFG in Chomsky normal form, (b) the LTIG is converted into a CFG as specified in Theorem 1, and (c) any resulting empty rules are eliminated by substitution, the result is always the same CFG as that produced by the GNF procedure. The fifth important difference between the LTIG and GNF procedures is that the output of the LTIG procedure can be represented compactly. There are two reasons for this. To start with, the use of auxiliary trees in an LTIG can allow it to be exponentially smaller than the equivalent GNF. To see this, note that the elimination of empty rules required when converting an LTIG into a GNF can cause an exponential increase in the number of rules. Furthermore, the trees created by the LTIG procedure have an extremely repetitive structure. As a result, node sharing can typically be used to represent the LTIG compactly---it is often smaller than the original CFG (see Section 6.1).</Paragraph>
      <Paragraph position="11">  A second procedure related to the LTIG procedure is the CFG lexicalization procedure of Rosenkrantz (1967). This procedure operates in a completely different way from Greibach's procedure---simultaneously eliminating all leftmost derivation paths of length greater than one, rather than shortening derivation paths one step at a time via substitution and eliminating left recursive rules one nonterminal at a time.</Paragraph>
      <Paragraph position="12"> One consequence of the simultaneous nature of the Rosenkrantz procedure is that one need not select an ordering of the nonterminals. This contrasts with the Greibach and LTIG procedures, where the ordering chosen can have a significant impact on the number of elementary structures in the result.</Paragraph>
      <Paragraph position="13"> As with the GNF procedure, one typically begins the Rosenkrantz procedure by converting the input to Chomsky normal form. This is necessary to remove infinite ambiguity and empty rules. However, it is also needed to remove chain rules, which would otherwise lead to nonlexicalized rules in the output. The conversion to Chomsky normal form makes many other changes as well, which are largely counterproductive if one wants to construct a left anchored grammar.</Paragraph>
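The chain-rule elimination mentioned above can be sketched as follows: a chain rule A → B is removed by giving A every non-chain rule of any nonterminal reachable from A through chain rules alone. (Standard construction; the helper name is invented, and nonterminals are taken to be exactly the symbols that appear as a left-hand side.)

```python
# Sketch of chain-rule elimination.  Rules are (lhs, rhs) pairs with rhs
# a tuple of symbols; a chain rule has a single nonterminal on the right.

def eliminate_chain_rules(rules):
    nts = {lhs for lhs, _ in rules}
    # chain-reachability closure: reach[a] = nonterminals a can reach
    # through chain rules alone (including itself)
    reach = {a: {a} for a in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if len(rhs) == 1 and rhs[0] in nts:   # a chain rule
                for a in nts:
                    if lhs in reach[a] and rhs[0] not in reach[a]:
                        reach[a].add(rhs[0])
                        changed = True
    # every nonterminal inherits the non-chain rules of what it reaches
    return sorted({(a, rhs) for a in nts for lhs, rhs in rules
                   if lhs in reach[a]
                   and not (len(rhs) == 1 and rhs[0] in nts)})

rules = [("A", ("B",)), ("B", ("C",)), ("C", ("c",)), ("B", ("b", "A"))]
result = eliminate_chain_rules(rules)
```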
      <Paragraph position="14"> As with the GNF procedure, ambiguity can be reduced and the trees derived are changed. However, the ambiguity can be retained if duplicate rules are maintained.</Paragraph>
      <Paragraph position="15"> It should also be possible to convert the resulting parse trees into parse trees in the original grammar. This could be a complicated process, however, since the Rosenkrantz algorithm alters the trees more radically than the GNF procedure.</Paragraph>
      <Paragraph position="16"> A key advantage of the Rosenkrantz procedure is that, unlike the Greibach and LTIG procedures, the output it produces cannot be exponentially larger than the input. In particular, the growth in the number of rules is at worst O(m^4), where m is the number of nonterminals. However, the Rosenkrantz procedure typically produces grammars that are less compact than those created by the LTIG procedure (see Section 6.1).</Paragraph>
      <Paragraph position="17"> It may be useful to develop a formalism and procedure that bear the same relationship to the Rosenkrantz procedure that TIG and the LTIG procedure bear to the GNF procedure. Given the fundamental advantages of the Rosenkrantz procedure over the GNF procedure, this might lead to a result that is superior to the LTIG procedure.</Paragraph>
    </Section>
    <Section position="3" start_page="502" end_page="504" type="sub_section">
      <SectionTitle>
5.3 Variants of the LTIG Procedure
</SectionTitle>
      <Paragraph position="0"> The LTIG procedure above creates a left anchored LTIG that uses only right auxiliary trees. As shown in Section 6.3, this is quite an advantageous form. However, other forms might be more advantageous in some situations. Many variants of the LTIG procedure are possible. For example, everywhere in the procedure, the word "right" can be replaced by "left" and vice versa. This results in the creation of a right anchored LTIG that uses only left auxiliary trees. This could be valuable when processing a language with a fundamentally left recursive structure.</Paragraph>
      <Paragraph position="1"> A variety of steps can be taken to reduce the number of elementary trees produced by the LTIG procedure. To start with, the choice of an ordering {A1, ..., Am} for the nonterminals is significant. In the presence of sets of mutually left recursive rules involving more than one nonterminal (i.e., sets of rules of the form {A → Bβ, B → Aα}), choosing the best ordering of the relevant nonterminals can greatly reduce the number of trees produced.</Paragraph>
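The sets of mutually left recursive nonterminals referred to above can be identified mechanically: they are the groups of nonterminals that reach each other through the "leftmost symbol" relation A → rhs[0]. A sketch, under the assumption that the grammar is given as plain (lhs, rhs) pairs:

```python
# Illustrative sketch: find sets of mutually left recursive nonterminals,
# i.e. groups {A, B, ...} such that each member can appear as the leftmost
# symbol in a derivation from any other member.

def mutually_left_recursive_sets(rules):
    nts = {lhs for lhs, _ in rules}
    # direct left-corner relation: A -> first symbol of each A-rule
    reach = {a: set() for a in nts}
    for lhs, rhs in rules:
        if rhs and rhs[0] in nts:
            reach[lhs].add(rhs[0])
    # transitive closure of the left-corner relation
    changed = True
    while changed:
        changed = False
        for a in nts:
            for b in list(reach[a]):
                new = reach[b] - reach[a]
                if new:
                    reach[a] |= new
                    changed = True
    # a is left recursive iff a reaches itself; group it with every
    # nonterminal it is mutually reachable with
    groups = set()
    for a in nts:
        if a in reach[a]:
            groups.add(frozenset(b for b in nts
                                 if b in reach[a] and a in reach[b]))
    return groups

# {A -> Bx, B -> Ay} is the mutually left recursive pattern from the text.
rules = [("A", ("B", "x")), ("B", ("A", "y")), ("C", ("c",))]
groups = mutually_left_recursive_sets(rules)
```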
      <Paragraph position="2"> If one abandons the requirement that the grammar must be left anchored, one can sometimes reduce the number of elementary trees produced dramatically. The reason for this is that instead of being forced to lexicalize each rule in G at the first position on its right hand side, one is free to choose the position that minimizes the total number of elementary trees eventually produced. However, one must be careful to meet the requirements imposed by TIG while doing this. In particular, one must create only left and right auxiliary trees as opposed to wrapping auxiliary trees. The search space of possible alternatives is so large that it is not practical to find an optimal LTIG; however, by means of simple heuristics and hill climbing, significant reductions in the number of elementary trees can be obtained.</Paragraph>
      <Paragraph position="3"> Finally, one can abandon the requirement that there be only one way to derive each tree in the LTIG. This approach is discussed in Schabes and Waters (1993c). In the presence of sets of mutually left recursive rules involving more than one nonterminal, allowing increased ambiguity can yield a significant reduction in the number of elementary trees.</Paragraph>
      <Paragraph position="4"> It should be noted that while exploring ways to create LTIGs with small numbers of elementary trees is interesting, it may not be of practical significance because the number of elementary trees is not a good measure of the size of a TIG. In particular, if a decreased number of elementary trees is accompanied by decreased sharing, this can lead to an increase in the grammar size, rather than a decrease. As illustrated in Section 6.1, the opportunities for sharing between the elementary trees in the LTIGs created by the LTIG procedure are so high that the grammars produced are often smaller than alternatives that have many fewer elementary trees.</Paragraph>
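The distinction between tree count and grammar size can be illustrated with a small sketch: if elementary trees are represented as nested tuples and identical subtrees are hash-consed, the size of the grammar is the number of distinct nodes rather than the total node count. (Illustrative only; the representation actually used in Section 6.1 may differ.)

```python
# Sketch of measuring grammar size with and without node sharing.
# A tree is a tuple (label, child1, child2, ...).

def shared_size(trees):
    """Number of distinct nodes once identical subtrees are shared."""
    table = {}
    def intern(node):
        label, children = node[0], tuple(intern(c) for c in node[1:])
        key = (label, children)
        if key not in table:
            table[key] = key
        return table[key]
    for t in trees:
        intern(t)
    return len(table)

def total_size(trees):
    """Total node count with no sharing at all."""
    def count(node):
        return 1 + sum(count(c) for c in node[1:])
    return sum(count(t) for t in trees)

# Two trees with a large common subtree: the shared representation is
# smaller than the unshared one.
sub = ("VP", ("V", ("saw",)), ("NP",))
t1 = ("S", ("NP",), sub)
t2 = ("S", ("NP", ("D",), ("N",)), sub)
```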
    </Section>
  </Section>
</Paper>