File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/98/p98-2156_metho.xml
Size: 7,332 bytes
Last Modified: 2025-10-06 14:15:04
<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2156"> <Title>An alternative LR algorithm for TAGs</Title> <Section position="4" start_page="948" end_page="949" type="metho"> <SectionTitle> 4 The recognizer </SectionTitle> <Paragraph position="0"> Relying on the functions defined in the previous section, we now explore the steps of the LR automaton, which as usual reads input from left to right and manipulates a stack.</Paragraph> <Paragraph position="1"> We can divide the stack elements into two classes. One class contains the LR states from Q, the other contains elements of ℳ. A stack consists of an alternation of elements from these two classes. More precisely, each stack is an element from the following set of strings, given by a regular expression:</Paragraph> <Paragraph position="3"> Note that the bottom element of the stack is always qin. We will use the symbol Δ to range over stacks and substrings of stacks, and the symbol X to range over elements from ℳ.</Paragraph> <Paragraph position="4"> A configuration (Δ, w) of the automaton consists of a stack Δ and a remaining input w. The steps of the automaton are given by the binary relation ⊢ on pairs of configurations. There are three kinds of step: shift (Δq, aw) ⊢ (Δqaq', w), provided q' =</Paragraph> <Paragraph position="6"> following. If for some j (1 ≤ j ≤ m) Xj is of the form (M, L) then this provides the value of L, otherwise we set L = []. [Footnote 2: Exactly in the case that N dominates a foot node will (exactly) one of the Xj be of the form (M, L), for some M.] reduce aux tree (Δq0X1q1X2q2 ... Xmqm, w) ⊢ (Δq0Xq', w), provided t ∈ reductions(qm), X1 ... Xm ∈ CS(Rt) and q' = goto(q0, N) ≠ ∅, where we obtain node N from the (unique) Xj (1 ≤ j ≤ m) which is of the form (M, [N|L]), and set X = N if L = [] and X = (N, L) otherwise. The shift step is identical to that for context-free LR parsing. There are two reduce steps that must be distinguished. 
The first takes place when a subtree of an elementary tree t has been recognized. We then remove the stack symbols corresponding to a cross-section through that subtree, together with the associated LR states. We replace these by two other symbols, the first of which corresponds to the foot of an auxiliary tree, and the second of which is the associated LR state. If some node M of the cross-section dominates the foot of t, then we must copy the associated list L to the first of the new stack elements, after pushing N onto that list to reflect that the spine has grown one segment upwards.</Paragraph> <Paragraph position="7"> The second type of reduction deals with recognition of an auxiliary tree. Here, the head of the list [N|L], which indicates the node at which the auxiliary tree t has been adjoined according to previous bottom-up calculations, must match a node that occurs directly above the root node of the auxiliary tree; this is checked by the test q' = goto(q0, N) ≠ ∅.</Paragraph> <Paragraph position="8"> Input v is recognized if (qin, v) ⊢* (qin Δ q, ε) for some Δ and q ∈ Qfin. Then Δ will be of the form X1q1X2q2 ··· qm-1Xm, where X1 ··· Xm ∈ CS(Rt), for some t ∈ I.</Paragraph> <Paragraph position="9"> Up to now, it has been tacitly assumed that the recognizer has some mechanism at its disposal to find the strings X1 ··· Xm ∈ CS(Rt) and X1 ··· Xm ∈ CS+(N) in the stack. We will now explain how this is done.</Paragraph> <Paragraph position="10"> For each N, we construct a deterministic finite automaton that recognizes the strings from CS+(N) from right to left. There is only one final state, which has no outgoing transitions.</Paragraph> <Paragraph position="11"> This is related to the fact that CS+(N) is suffix-closed. A consequence is that, given any stack that may occur and any N, there is at most one string X1 ··· Xm ∈ CS+(N) that can be found from the top of the stack downwards, and this string is found in linear time. 
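As an illustration of the right-to-left search just described, the following minimal Python sketch scans a stack from the top downwards with a deterministic finite automaton. The names find_cross_section, dfa, start and final, the toy transition table, and the encoding of a stack as a Python list that alternates LR states (integers) and stack symbols (strings) are assumptions made for this illustration only; they do not come from the paper, and pairs of the form (N, L) are not modelled.

```python
def find_cross_section(stack, dfa, start, final):
    """Scan the stack from the top downwards and return the matched symbols.

    stack: list alternating LR states and stack symbols, with an LR state
           at the bottom and at the top, e.g. [0, "a", 3, "N1", 5].
    dfa:   dict mapping (automaton state, symbol) to the next automaton
           state; it reads a cross-section from right to left.
    Returns the symbols X1 ... Xm in left-to-right order, or None.
    """
    state = start
    matched = []
    # Stack symbols sit at odd indices; the top of the stack is an LR state.
    for i in range(len(stack) - 2, 0, -2):
        symbol = stack[i]
        if (state, symbol) not in dfa:
            return None  # the scan is stuck: no cross-section ends here
        state = dfa[(state, symbol)]
        matched.append(symbol)
        if state == final:
            # The final state has no outgoing transitions, so the first
            # match found is the only possible one.
            matched.reverse()
            return matched
    return None


# Hypothetical automaton for the single cross-section "a N1 c",
# read from right to left: c, then N1, then a.
toy = {("s0", "c"): "s1", ("s1", "N1"): "s2", ("s2", "a"): "f"}
print(find_cross_section([0, "x", 1, "a", 2, "N1", 3, "c", 4], toy, "s0", "f"))
# prints ['a', 'N1', 'c']
```

Each stack symbol is inspected at most once and the scan stops at the first visit to the final state, which is one way to see why the search takes time linear in the length of the matched string.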
For each t ∈ I ∪ A we also construct a deterministic finite automaton for CS(Rt). The procedure for t ∈ I is given in Figure 3, and an example of its application is given in Figure 4.</Paragraph> <Paragraph position="13"> [Figure caption fragment: ... (K, 𝒩, T, s, {f}) that recognizes CS(Rt), given some t ∈ I. K is the set of states, 𝒩 acts as alphabet here, T is the set of transitions, s is the initial state and f is the (only) final state.]</Paragraph> <Paragraph position="14"> The procedure for t ∈ A is similar except that it also has to introduce transitions labelled with pairs (N, L), where N dominates a foot and L is a stack in 𝒩*; it is obvious that we should not actually construct different transitions for different L ∈ 𝒩*, but rather one single transition (N, _), with the placeholder &quot;_&quot; representing all possible L ∈ 𝒩*.</Paragraph> <Paragraph position="15"> The procedure for CS+(N) can easily be expressed in terms of those for CS(Rt).</Paragraph> </Section> <Section position="5" start_page="949" end_page="950" type="metho"> <SectionTitle> 5 Extended example </SectionTitle> <Paragraph position="0"> For the TAG presented in Figure 1, the algorithm from Schabes and Vijay-Shanker (1990) does not work correctly. The language described by the grammar contains exactly the strings abc, a'b'c', adbec, and a'db'ec'. The algorithm from Schabes and Vijay-Shanker (1990), however, also accepts adb'ec' and a'dbec. In the former string, it acts as if it were recognizing the (ill-formed) tree in Figure 2: it correctly matches the part to the &quot;south&quot; of the adjunction to the part to the &quot;north-east&quot;. [Figure caption fragment: ... CS(R1), where R1 is the root node of α1 (Figure 1).]</Paragraph> <Paragraph position="1"> Then, after reading c', the information that would indicate whether a or a' was read is retrieved from the stack, but this information is merely popped without investigation. 
Thereby, the algorithm fails to perform the necessary matching of the elementary tree with regard to the part to the &quot;north-west&quot; of the adjunction.</Paragraph> <Paragraph position="2"> Our new algorithm recognizes exactly the strings in the language. For the running example, the set of LR states and some operations on them are shown in Figure 5. Arrows labelled with nodes N represent the goto function and those labelled with ⊥(N) represent the goto⊥ function. The initial state is 0. The thin lines separate the items resulting from the goto and goto⊥ functions from those induced by the closure function. (This corresponds to the distinction between kernel and nonkernel items known from context-free LR parsing.) That correct input is recognized is illustrated by the following: Note that as soon as all the terminals in the auxiliary tree have been read, the &quot;south&quot; section of the initial tree is matched to the &quot;north-west&quot; section through the goto function. Through subsequent shifts this is then matched to the &quot;north-east&quot; section.</Paragraph> <Paragraph position="3"> This is in contrast to the situation when incorrect input, such as adb'ec', is provided to the automaton. Here, the computation is stuck. In particular, a reduction with auxiliary tree β fails due to the fact that goto(1, N2) = ∅.</Paragraph> </Section> </Paper>