<?xml version="1.0" standalone="yes"?> <Paper uid="P88-1032"> <Title>AN EARLEY-TYPE PARSING ALGORITHM FOR TREE ADJOINING GRAMMARS *</Title> <Section position="3" start_page="0" end_page="259" type="metho"> <SectionTitle> 2 Dotted symbols, dotted trees, tree traversal </SectionTitle> <Paragraph position="0"> The full algorithm is explained in the next section. This section introduces preliminary concepts that will be used by the algorithm. We first show how dotted rules can be extended to trees. Then we introduce a tree traversal that the algorithm will mimic in order to scan the input from left to right. We define a dotted symbol as a symbol associated with a dot above or below and either to the left or to the right of it. The four positions of the dot are annotated by la, lb, ra, rb (resp. left above, left below, right above, right below). Then we define a dotted tree as a tree with exactly one dotted symbol.</Paragraph> <Paragraph position="1"> Given a dotted tree with the dot above and to the left of the root, we define a tree traversal of a dotted tree as follows (see Figure 3): * if the dot is at position la of an internal node, we move the dot down to position lb, * if the dot is at position lb of an internal node, we move to position la of its leftmost child, * if the dot is at position la of a leaf, we move the dot to the right to position ra of the leaf, * if the dot is at position rb of a node, we move the dot up to position ra of the same node, * if the dot is at position ra of a node, there are two cases: - if the node has a right sibling, then move the dot to the right sibling at position la.</Paragraph> <Paragraph position="2"> - if the node does not have a right sibling, then move the dot to its parent at position rb.</Paragraph> <Paragraph position="3"> This traversal will enable us to scan the frontier of an elementary tree from left to right while trying to recognize possible adjunctions between the above and below positions 
of the dot.</Paragraph> </Section> <Section position="4" start_page="259" end_page="264" type="metho"> <SectionTitle> 3 The algorithm </SectionTitle> <Paragraph position="0"> We define an appropriate data structure for the algorithm. We explain how to interpret the structures that the parser produces. Then we describe the algorithm itself.</Paragraph> <Section position="1" start_page="259" end_page="259" type="sub_section"> <SectionTitle> 3.1 Data structures </SectionTitle> <Paragraph position="0"> The algorithm uses two basic data structures: state and states set.</Paragraph> <Paragraph position="1"> A states set S is defined as a set of states. The states sets will be indexed by an integer: S_i with i ∈ N. The presence of any state in states set i will mean that the input string a_1 ... a_i has been recognized.</Paragraph> <Paragraph position="2"> Any tree α will be considered as a function from tree addresses to symbols of the grammar (terminal and non-terminal symbols): if z is a valid address in α, then α(z) is the symbol at address z in the tree α.</Paragraph> <Paragraph position="3"> Definition 2 A state s is defined as a 10-tuple, \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l\] where: * α: is the name of the dotted tree. * dot: is the address of the dot in the tree α. * side: is the side of the symbol the dot is on; side ∈ {left, right}.</Paragraph> <Paragraph position="4"> * pos: is the position of the dot; pos ∈ {above, below}.</Paragraph> <Paragraph position="5"> * star: is an address in α. The corresponding node in α is called the starred node.</Paragraph> <Paragraph position="6"> * l (left), f_l (foot left), f_r (foot right), t*_l (top left of starred node), b*_l (bottom left of starred node) are indices of positions in the input string ranging over \[0, n\], n being the length of the input string. They will be explained further below.</Paragraph> </Section> <Section position="2" start_page="259" end_page="260" type="sub_section"> <SectionTitle> 3.2 Invariant of the algorithm </SectionTitle> <Paragraph position="0"> The states s in a states set S_i have a common property. 
This section describes this invariant in order to give an intuitive interpretation of what the algorithm does. This invariant is similar to Earley's invariant.</Paragraph> <Paragraph position="1"> Before explaining the main characterization of the algorithm, we need to define the set of nodes on which an adjunction is allowed for a given state. Definition 3 The set of nodes 7~(s) on which an adjunction is possible for a given state s = \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l\], is defined as the union of the following sets of nodes in α: * the set of nodes that have been traversed on the left and right sides, i.e., the four positions of the dot have been traversed; * the set of nodes on the path from the root node to the starred node, root node and starred node included. Note that if there is no star this set is empty.</Paragraph> <Paragraph position="2"> Definition 4 (Left part of a dotted tree) The left part of a dotted tree is the union of the set of nodes in the tree that have been traversed on the left and right sides and the set of nodes that have been traversed on the left side only.</Paragraph> <Paragraph position="3"> We will first give an intuitive interpretation of the ten components of a state, and then give the necessary and sufficient conditions for membership of a state in a states set.</Paragraph> <Paragraph position="4"> We informally interpret a state s = \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l\] in the following way (see Figure 4): * l is an index in the input string indicating where the tree derived from α begins.</Paragraph> <Paragraph position="5"> * f_l is an index in the input string corresponding to the point just before the foot node (if any) in the tree derived from α.</Paragraph> <Paragraph position="6"> * f_r is an index in the input string corresponding to the point just after the foot node (if any) in the tree derived from α. The pair f_l and f_r will mean that the foot node subsumes the string a_{f_l+1} ... a_{f_r}. 
* star is the address in α of the deepest node that subsumes the dot on which an adjunction has been partially recognized. If there is no adjunction in the tree α along the path from the root to the dotted node, star is unbound.</Paragraph> <Paragraph position="7"> * t*_l is an index in the input string corresponding to the point in the tree where the adjunction on the starred node was made. If star is unbound, then t*_l is also unbound.</Paragraph> <Paragraph position="8"> * b*_l is an index in the input string corresponding to the point in the tree just before the foot node of the tree adjoined at the starred node. The pair t*_l and b*_l will mean that the string as far as the foot node of the auxiliary tree adjoined at the starred node matches the substring a_{t*_l+1} ... a_{b*_l} of the input string. If star is unbound, then b*_l is also unbound.</Paragraph> <Paragraph position="9"> * s ∈ S_i means that the recognized part of the dotted tree α, which is the left part of it, is consistent with the input string from a_1 to a_l, from a_l to a_{f_l}, and from a_{f_r} to a_i, or from a_1 to a_l and from a_l to a_i when the foot node is not in the recognized part of the tree.</Paragraph> <Paragraph position="10"> We are now ready to characterize the membership of s in S_i: Invariant 1 A state s = \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l\] is in S_i if and only if there is a derived tree from an initial tree such that (see Figure 4): 1. The tree α is part of the derivation.</Paragraph> <Paragraph position="11"> 2. The tree derived from α in the derivation tree has adjunctions only on nodes in 7~(s).</Paragraph> <Paragraph position="12"> 3. The part of the derived tree to the left of the dot spans the string a_1 ... a_i.</Paragraph> <Paragraph position="13"> 4. The tree derived from α has a yield that starts just after a_l, ends at a_{f_l} before the foot node (if f_l is defined), and starts after the foot node just after a_{f_r} (if f_r is defined).</Paragraph> <Paragraph position="14"> 5. 
If there are adjunctions on the path from the dotted node to the root of α, then star is the address of the deepest adjunction on that path and the auxiliary tree adjoined at that node star has a yield that starts just after a_{t*_l} and stops at its foot node at a_{b*_l}.</Paragraph> <Paragraph position="15"> The proof of this invariant has as corollaries the soundness, completeness, and therefore the correctness of the algorithm.</Paragraph> </Section> <Section position="3" start_page="260" end_page="263" type="sub_section"> <SectionTitle> 3.3 The recognizer </SectionTitle> <Paragraph position="0"> The Earley-type recognizer for TAGs follows: Let G be a TAG.</Paragraph> <Paragraph position="1"> Let a_1 ... a_n be the input string.</Paragraph> <Paragraph position="2"> program recognizer begin S_0 = { \[α, 0, left, above, 0, -, -, -, -, -\] | α is an initial tree } For i := 0 to n do begin Process the states of S_i, performing one of the following seven operations on each state s = \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l\] until no more states can be added: 1. Scanner 2. Move dot down 3. Move dot up 4. Left Predictor 5. Left Completor 6. Right Predictor 7. Right Completor</Paragraph> <Paragraph position="4"> If there is in S_n a state s = \[α, 0, right, above, 0, -, -, -, -, -\] such that α is an initial tree then return acceptance.</Paragraph> <Paragraph position="5"> end.</Paragraph> <Paragraph position="6"> The algorithm is a general recognizer for TAGs. Unlike the CKY algorithm, it requires no condition on the grammar: the trees can be binary or not, the elementary (initial or auxiliary) trees can have the empty string as frontier. It is an off-line algorithm: it needs to know the length n of the input string. However, we will see later that it can very easily be modified to an on-line algorithm by the use of an end-marker in the input string. We now describe one by one the seven processes. 
The current states set is presumed to be S_i and the state to be processed is s = \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l\]. Only one of the seven processes can be applied to a given state. The side, the position, and the address of the dot determine the unique process that can be applied to the given state.</Paragraph> <Paragraph position="7"> Definition 5 (Adjunct(α, address)) Given a TAG G, define Adjunct(α, address) as the set of auxiliary trees that can be adjoined in the elementary tree α at the node n which has the given address. In a TAG without any constraints on adjunction, if n is a non-terminal node, this set consists of all auxiliary trees that are rooted by a node with the same label as the label of n.</Paragraph> <Paragraph position="8"> The scanner scans the input string. Suppose that the dot is to the left of and above a terminal symbol (see Figure 5). Then if the terminal symbol matches the next input token, the program should record that a new token has been recognized and try to recognize the rest of the tree.</Paragraph> <Paragraph position="9"> Therefore the scanner applies to s = \[α, dot, left, above, l, f_l, f_r, star, t*_l, b*_l\] such that α(dot) is a terminal symbol and α(dot) = a_{i+1}. It adds \[α, dot, right, above, l, f_l, f_r, star, t*_l, b*_l\] to S_{i+1}.</Paragraph> <Paragraph position="10"> Move dot down (See Figure 6), moves the dot down, from position lb of the dotted node to position la of its leftmost child. It therefore applies to s = \[α, dot, left, below, l, f_l, f_r, star, t*_l, b*_l\] such that the node where the dot is has a leftmost child at address u.</Paragraph> <Paragraph position="11"> It adds \[α, u, left, above, l, f_l, f_r, star, t*_l, b*_l\] to S_i.</Paragraph> <Paragraph position="12"> Move dot up (See Figure 7), moves the dot &quot;up&quot;, from position ra of the dotted node to position la of its right sibling if it has a right sibling, otherwise to position rb of its parent.</Paragraph> <Paragraph position="13"> It therefore applies to s = \[α, dot, right, above, l, f_l, 
f_r, star, t*_l, b*_l\] such that the node on which the dot is has a parent node.</Paragraph> <Paragraph position="14"> * Case 1: the node where the dot is has a right sibling at address r.</Paragraph> <Paragraph position="15"> It adds \[α, r, left, above, l, f_l, f_r, star, t*_l, b*_l\] to S_i.</Paragraph> <Paragraph position="16"> * Case 2: the node where the dot is is the rightmost child of the parent node p.</Paragraph> <Paragraph position="17"> It adds \[α, p, right, below, l, f_l, f_r, star, t*_l, b*_l\] to S_i. Suppose that there is a dot to the left of and above a non-terminal symbol A (see Figure 8). Then the algorithm takes two paths in parallel: it makes a prediction of adjunction on the node labeled by A and tries to recognize the adjunction (step 1) and it also considers the case where no adjunction has been done (step 2). These operations are performed by the Left Predictor.</Paragraph> <Paragraph position="18"> It applies to s = \[α, dot, left, above, l, f_l, f_r, star, t*_l, b*_l\] such that α(dot) is a non-terminal.</Paragraph> <Paragraph position="19"> * Step 2 (no adjunction has been done): -- Case 1: the dot is not on the foot node.</Paragraph> <Paragraph position="20"> It adds the state \[α, dot, left, below, l, f_l, f_r, star, t*_l, b*_l\] to S_i.</Paragraph> <Paragraph position="21"> -- Case 2: the dot is on the foot node. Necessarily, since the foot node has not been already traversed, f_l and f_r are unspecified.</Paragraph> <Paragraph position="22"> It adds the state \[α, dot, left, below, l, i, -, star, t*_l, b*_l\] to S_i.</Paragraph> <Paragraph position="23"> Suppose that the auxiliary tree that we left-predicted has been recognized as far as its foot (see Figure 9). Then the algorithm should try to recognize what was pushed under the foot node. (A star in the original tree will signal that an adjunction has been made and half recognized.) 
This operation is performed by the Left Completor.</Paragraph> <Paragraph position="24"> It applies to s = \[α, dot, left, below, l, i, -, star, t*_l, b*_l\] such that the dot is on the foot node.</Paragraph> <Paragraph position="25"> For all states s' = \[β, dot', left, above, l', f'_l, f'_r, star', t*'_l, b*'_l\] in S_l such that α ∈ Adjunct(β, dot'): Case 1: dot' is on the foot node of β. Then necessarily, f'_l and f'_r are unbound.</Paragraph> <Paragraph position="26"> It adds the state \[β, dot', left, below, l', i, -, dot', l, i\] to S_i. Case 2: dot' is not on the foot node of β.</Paragraph> <Paragraph position="27"> It adds the state \[β, dot', left, below, l', f'_l, f'_r, dot', l, i\] to S_i. Suppose that there is a dot to the right of and below a node A (see Figure 10). If there has been an adjunction made on A (case 1), the program should try to recognize the right part of the auxiliary tree adjoined at A. However, if there was no adjunction on A (case 2), then the dot should be moved up. Note that the star will tell us if an adjunction has been made or not. These operations are performed by the Right Predictor.</Paragraph> <Paragraph position="28"> The right predictor applies to s = \[α, dot, right, below, l, f_l, f_r, star, t*_l, b*_l\] * Case 1: dot = star For all states s' = \[β, dot', left, below, t*_l, b*_l, -, star', t*'_l, b*'_l\] in S_{b*_l} such that β ∈ Adjunct(α, dot), it adds the state \[β, dot', right, below, t*_l, b*_l, i, star', t*'_l, b*'_l\] to S_i.</Paragraph> <Paragraph position="29"> * Case 2: dot ≠ star It adds the state \[α, dot, right, above, l, f_l, f_r, star, t*_l, b*_l\] to S_i.</Paragraph> <Paragraph position="30"> Suppose that the dot is to the right of and above the root of an auxiliary tree (see Figure 11). Then the adjunction has been totally recognized and the program should try to recognize the rest of the tree in which the auxiliary tree has been adjoined. This operation is performed by the Right Completor. 
It applies to s = \[α, 0, right, above, l, f_l, f_r, -, -, -\]. For all states s' = \[β, dot', left, above, l', f'_l, f'_r, star', t*'_l, b*'_l\] in S_l and for all states s'' = \[β, dot', right, below, l', f̄'_l, f̄'_r, dot', l, f_l\] in S_{f_r} such that α ∈ Adjunct(β, dot'), it adds \[β, dot', right, above, l', f̄'_l, f̄'_r, star', t*'_l, b*'_l\] to S_i.</Paragraph> <Paragraph position="31"> where f̄ = f if f is bound in state s', and f̄ can have any value if f is unbound in state s'.</Paragraph> </Section> <Section position="4" start_page="263" end_page="264" type="sub_section"> <SectionTitle> 3.4 Handling constraints on adjunction </SectionTitle> <Paragraph position="0"> In a TAG, one can, for each node of an elementary tree, specify one of the following three constraints on adjunction (Joshi, 1987): * Null adjunction (NA): disallow any adjunction on the given node.</Paragraph> <Paragraph position="1"> * Obligatory adjunction (OA): an auxiliary tree must be adjoined on the given node. * Selective adjunction (SA(T)): a set T of auxiliary trees that can be adjoined on the given node is specified.</Paragraph> <Paragraph position="2"> The algorithm can be very easily modified to handle those constraints. First, the function Adjunct(α, address) must be modified as follows: * Adjunct(α, address) = ∅, if there is NA on the node.</Paragraph> <Paragraph position="3"> * Adjunct(α, address) as previously defined, if there is OA on the node.</Paragraph> <Paragraph position="4"> * Adjunct(α, address) = T, if there is SA(T) on the node.</Paragraph> <Paragraph position="5"> Second, step 2 of the left predictor must be done</Paragraph> </Section> <Section position="5" start_page="264" end_page="264" type="sub_section"> <SectionTitle> 3.5 An example </SectionTitle> <Paragraph position="0"> We give one example that illustrates how the recognizer works. The grammar used for the example generates the language L = {a^n b^n e c^n d^n | n > 0}. The input string given to the recognizer is: aabbeccdd. 
The grammar is shown in Figure 12. The states sets are shown in Figure 14.</Paragraph> <Paragraph position="1"> Next to each state we have printed in parentheses the name of the processor that was applied to the state. The input is recognized since \[α, 0, right, above, 0, -, -, -, -, -\] is in states set S_9.</Paragraph> </Section> <Section position="6" start_page="264" end_page="264" type="sub_section"> <SectionTitle> 3.6 Remarks </SectionTitle> <Paragraph position="0"> Use of move dot up and move dot down Move dot down and move dot up can be eliminated in the algorithm by merging the original dot and the position it is moved to. However, for explanatory purposes we chose to use these two processors in this paper.</Paragraph> <Paragraph position="1"> Off-line vs on-line The algorithm given is an off-line recognizer. It can be very easily modified to work on line by adding an end marker to all initial trees in the grammar (see Figure 13).</Paragraph> <Paragraph position="2"> Extracting a parse The algorithm that we describe in section 3.3 is a recognizer. However, if we include pointers from a state to the other states which caused it to be placed in the states set, the recognizer can be modified to produce all parses of the input string.</Paragraph> </Section> <Section position="7" start_page="264" end_page="264" type="sub_section"> <SectionTitle> 3.7 Correctness </SectionTitle> <Paragraph position="0"> The correctness of the parser has been proven and is fully reported in Schabes and Joshi (1988). It consists of the proof of the invariant given in section 3.2. Our proof is similar in its concept to the proof of the correctness of Earley's parser given in Aho and Ullman (1973). The &quot;only if&quot; part of the invariant is proved by induction on the number of states that have been added so far to all states sets.</Paragraph> <Paragraph position="1"> The &quot;if&quot; part is proved by induction on a defined rank of a state. 
The soundness (the algorithm recognizes only valid strings) and the completeness (if a string is valid, then the algorithm will recognize it) are corollaries of this invariant.</Paragraph> </Section> <Section position="8" start_page="264" end_page="264" type="sub_section"> <SectionTitle> 3.8 Implementation </SectionTitle> <Paragraph position="0"> The parser has been implemented on Symbolics Lisp machines in Flavors. More details of the actual implementation can be found in Schabes and Joshi (1988). The current implementation has an O(|G|^2 n^9) worst case time complexity and O(|G| n^6) worst case space complexity. We have not as yet been able to reduce the worst case time complexity to O(|G|^2 n^6). We are currently attempting to reduce this bound. However, the main purpose of constructing an Earley-type parser is to improve the average complexity, which is crucial in practice.</Paragraph> </Section> </Section> <Section position="5" start_page="264" end_page="265" type="metho"> <SectionTitle> 4 Extensions </SectionTitle> <Paragraph position="0"> We describe how substitution is defined in a TAG.</Paragraph> <Paragraph position="1"> We discuss the consequences of introducing substitution in TAGs. Then we show how substitution can be parsed. We extend the parser to deal with feature structures for TAGs. Finally the relationship with PATR-II is discussed.</Paragraph> <Section position="1" start_page="264" end_page="265" type="sub_section"> <SectionTitle> 4.1 Introducing substitution in TAGs </SectionTitle> <Paragraph position="0"> TAGs use adjunction as their basic composition operation. It is well known that Tree Adjoining Languages (TALs) are mildly context-sensitive.</Paragraph> <Paragraph position="1"> TALs properly contain context-free languages. It is also possible to encode a context-free grammar with auxiliary trees using adjunction only. 
However, although the languages correspond, the possible encoding does not reflect directly the original context free grammar since this encoding uses adjunction. Substitution is the basic operation used in CFG. (Figure 14: states sets built by the recognizer for the input string aabbeccdd.)</Paragraph> <Paragraph position="2"> A CFG can be viewed as a tree rewriting system.</Paragraph> <Paragraph position="3"> It uses substitution as basic operation and it consists of a set of one-level trees. Substitution is a less powerful operation than adjunction.</Paragraph> <Paragraph position="4"> However, recent linguistic work in TAG grammar development (Abeillé, 1988) showed the need for substitution in TAGs as an additional operation for obtaining appropriate structural descriptions in certain cases such as verbs taking two sentential arguments (e.g. &quot;John equates solving this problem with doing the impossible&quot;) or compound categories. It has also been shown to be useful for lexical insertion (Schabes, Abeillé and Joshi, 1988). 
It should be emphasized that the introduction of substitution in TAGs does not increase their generative capacity. Neither is it a step back from the original idea of TAGs.</Paragraph> <Paragraph position="5"> Definition 6 (Substitution in TAG)</Paragraph> </Section> </Section> <Section position="6" start_page="265" end_page="268" type="metho"> <SectionTitle> Figure 16: Writing a CFG in TAG </SectionTitle> <Paragraph position="0"> We define substitution in TAGs to take place on specified nodes on the frontiers of elementary trees. When a node is marked to be substituted, no adjunction can take place on that node. Furthermore, substitution is always mandatory. Only trees derived from initial trees rooted by a node of the same label can be substituted on a substitution node. The resulting tree is obtained by replacing the node by the tree derived from the initial tree. Substitution is illustrated in Figure 15.</Paragraph> <Paragraph position="1"> We conventionally mark substitution nodes by a down arrow (↓).</Paragraph> <Paragraph position="2"> As a consequence, we can now encode directly a CFG in a TAG with substitution. The resulting TAG has only one-level initial trees and uses only substitution. An example is shown in Figure 16.</Paragraph> <Section position="1" start_page="265" end_page="266" type="sub_section"> <SectionTitle> 4.2 Parsing substitution </SectionTitle> <Paragraph position="0"> The parser can be extended very easily to handle substitution. We use Earley's original predictor and completor to handle substitution.</Paragraph> <Paragraph position="1"> The left predictor is restricted to apply to nodes to which adjunction can be applied.</Paragraph> <Paragraph position="2"> A flag subst? is added to the states. When set, it indicates that the tree (initial) has been predicted for substitution. We use the index l (as in Earley's original parser) to know where it has been predicted for substitution. 
When the initial tree that has been predicted for substitution has been totally recognized, we complete the state as Earley's original parser does.</Paragraph> <Paragraph position="3"> A state s is now an 11-tuple \[α, dot, side, pos, l, f_l, f_r, star, t*_l, b*_l, subst?\] where subst? is a boolean that indicates whether the tree has been predicted for substitution. The other components have not been changed.</Paragraph> <Paragraph position="4"> We add two more processors to the parser.</Paragraph> </Section> <Section position="2" start_page="266" end_page="266" type="sub_section"> <SectionTitle> Substitution Predictor </SectionTitle> <Paragraph position="0"> Suppose that there is a dot to the left of and above a non-terminal symbol A on the frontier that is marked for substitution (see Figure 17). Then the algorithm predicts for substitution all initial trees rooted by A and tries to recognize the initial tree.</Paragraph> <Paragraph position="1"> This operation is performed by the substitution predictor.</Paragraph> <Paragraph position="2"> It applies to s = \[α, dot, left, above, l, f_l, f_r, star, t*_l, b*_l, subst?\] such that α(dot) is a non-terminal on the frontier of α which is marked for substitution. It adds the states {\[β, 0, left, above, i, -, -, -, -, -, true\] | β is an initial tree s.t. β(0) = α(dot)} to S_i.</Paragraph> </Section> <Section position="3" start_page="266" end_page="266" type="sub_section"> <SectionTitle> Substitution Completor </SectionTitle> <Paragraph position="0"> Suppose that the initial tree that we predicted for substitution has been recognized (see Figure 18).</Paragraph> <Paragraph position="1"> Then the algorithm should try to recognize the rest of the tree in which we predicted a substitution. 
This operation is performed by the substitution completor.</Paragraph> <Paragraph position="2"> It applies to s = \[α, 0, right, above, l, -, -, -, -, -, true\]. For all states s' = \[β, dot', left, above, l', f'_l, f'_r, star', t*'_l, b*'_l, subst?'\] in S_l s.t. β(dot') is marked for substitution and β(dot') = α(0).</Paragraph> <Paragraph position="3"> It adds the following state to S_i: \[β, dot', right, above, l', f'_l, f'_r, star', t*'_l, b*'_l, subst?'\]. Complexity The introduction of the substitution predictor and the substitution completor does not increase the complexity of the overall TAG parser.</Paragraph> <Paragraph position="4"> If we encode a CFG with substitution in TAG, the parser behaves in O(|G|^2 n^3) worst case time and O(|G| n^2) worst case space like Earley's original parser. This comes from the fact that when there are no auxiliary trees and when only substitution is used, the indices f_l, f_r, t*_l, b*_l of a state will never be set. The algorithm will use only the substitution predictor and the substitution completor. Thus, it behaves exactly like Earley's original parser on CFGs.</Paragraph> </Section> <Section position="4" start_page="266" end_page="268" type="sub_section"> <SectionTitle> 4.3 Parsing feature structures for TAGs </SectionTitle> <Paragraph position="0"> The definition of feature structures for TAGs and their semantics was proposed by Vijay-Shanker (1987) and Vijay-Shanker and Joshi (1988). We first explain briefly how they work in TAGs and show how we have implemented them. We introduce in a TAG framework a language similar to PATR-II which was investigated by Shieber (Shieber, 1984 and 1986). 
We then show how one can embed the essential aspects of PATR-II in this system.</Paragraph> <Paragraph position="3"> Feature structures in TAGs As defined by Vijay-Shanker (1987) and Vijay-Shanker and Joshi (1988), to each adjunction node in an elementary tree two feature structures are attached: a top and a bottom feature structure. The top feature corresponds to a top view in the tree from the node. The bottom feature corresponds to the bottom view. When the derivation is completed, the top and bottom features of all nodes are unified. If the top and bottom features of a node do not unify, then a tree must be adjoined at that node.</Paragraph> <Paragraph position="4"> This definition can be trivially extended to substitution nodes. To each substitution node we attach two identical feature structures (top and bottom). The updating of features in case of adjunction is shown in Figure 19.</Paragraph> <Paragraph position="5"> Unification equations As in PATR-II, we express with unification equations dependencies between DAGs in an elementary tree. The system therefore consists of a TAG and a set of unification equations on the DAGs associated with nodes in elementary trees.</Paragraph> <Paragraph position="6"> An example of the use of unification equations in TAGs is given in Figure 20. Note that the top and bottom features of node S in α cannot be unified. This forces an adjunction to be performed on S. 
Thus, the following sentence is not accepted: *to go to the movies.</Paragraph> <Paragraph position="7"> The auxiliary tree β1 can be adjoined at S in α: John wants to go to the movies.</Paragraph> <Paragraph position="8"> But since the bottom feature of S has tensed value - in α and since the bottom feature of S has tensed value + in β2, β1 can not be adjoined at movies.</Paragraph> <Paragraph position="9"> We refer the reader to Abeillé (1988) and to Schabes, Abeillé and Joshi (1988) for further explanation of the use of unification equations and substitution in TAGs.</Paragraph> <Paragraph position="10"> Parsing and the relationship with PATR-II By adding to each state the set of DAGs corresponding to the top and bottom features of each node, and by making sure that the unification equations are satisfied, we have extended the parser to parse TAGs with feature structures.</Paragraph> <Paragraph position="11"> Since we introduced substitution and since we are able to encode a CFG directly, the system has the main functionalities of PATR-II. The system parses unification formalisms that have a CFG skeleton and a TAG skeleton.</Paragraph> </Section> </Section> </Paper>