<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2156"> <Title>An alternative LR algorithm for TAGs</Title> <Section position="3" start_page="0" end_page="948" type="intro"> <SectionTitle> 2 Notation </SectionTitle> <Paragraph position="0"> For a good introduction to TAGs, the reader is referred to Joshi (1987). In this section we merely summarize our notation.</Paragraph> <Paragraph position="1"> A tree-adjoining grammar is a 4-tuple $(\Sigma, NT, I, A)$, where $\Sigma$ is a finite set of terminals, $I$ is a finite set of initial trees and $A$ is a finite set of auxiliary trees. We refer to the trees in $I \cup A$ as elementary trees. The set $NT$, a finite set of nonterminals, does not play any role in this paper.</Paragraph> <Paragraph position="2"> Each auxiliary tree has a distinguished leaf, called the foot. We refer to the foot of an auxiliary tree $t$ as $F_t$. We refer to the root of an elementary tree $t$ as $R_t$. The set of all nodes of an elementary tree $t$ is denoted by $\mathcal{N}(t)$, and we define the set of all nodes in the grammar by $\mathcal{N} = \bigcup_{t \in I \cup A} \mathcal{N}(t)$.</Paragraph> <Paragraph position="3"> For each non-leaf node $N$ we define children($N$) as the list of children nodes. For other nodes, the function children is undefined. The dominance relation $\lhd^*$ is the reflexive and transitive closure of the parent relation $\lhd$ defined by $N \lhd M$ if and only if children($N$) $= \alpha M \beta$, for some $\alpha, \beta \in \mathcal{N}^*$.</Paragraph> <Paragraph position="4"> Each leaf $N$ in an elementary tree, except when it is a foot, is labelled by either a terminal from $\Sigma$ or the empty string $\epsilon$. We identify such a node $N$ labelled by a terminal with that terminal. Thus, we consider $\Sigma$ to be a subset of $\mathcal{N}$.$^1$ For now, we will disallow $\epsilon$ as a label, since this causes a slight technical problem. We will return to this issue in Section 6. For each node $N$ that is not a leaf or that is a foot, we define Adjunct($N$) as the set of auxiliary trees that can be adjoined at $N$. 
This set may contain the element nil to indicate that adjunction at that node is not obligatory.</Paragraph> <Paragraph position="5"> An example of a TAG is given in Figure 1.</Paragraph> <Paragraph position="6"> There are two initial trees, $\alpha_1$ and $\alpha_2$, and one auxiliary tree $\beta$. For each node $N$, Adjunct($N$) has been indicated to the right of that node, unless Adjunct($N$) $= \{\mathrm{nil}\}$, in which case that information is omitted from the picture.</Paragraph> <Paragraph position="7"> 3 Construction of the LR table For technical reasons, we assume an additional node for each elementary tree $t$, which we denote by $\top$. This node has only one child, viz. the actual root node $R_t$. We also assume an additional node for each auxiliary tree $t$, which we denote by $\bot$. This is the unique child of the actual foot node $F_t$. The domain of the function children is extended to include foot nodes, by defining children($F_t$) $= \bot$, for each $t \in A$.</Paragraph> <Paragraph position="8"> For the algorithm, two kinds of tree need to be distinguished: elementary trees and subtrees of elementary trees. A subtree can be identified by a pair $(t, N)$, where $t$ is an elementary tree and $N$ is a node in that tree; the pair indicates the subtree of $t$ rooted at $N$. The set of all trees needed by our algorithm is given by:</Paragraph> <Paragraph position="10"> From here on, we will use the symbol $t$ exclusively to range over $I \cup A$, and $\tau$ to range over $T$ in general.</Paragraph> <Paragraph position="11"> $^1$ With this convention, we can no longer distinguish between different leaves in the grammar with the same terminal label. This merging of leaves with identical labels is not an inherent part of our algorithm, but it simplifies the notation considerably.</Paragraph> <Paragraph position="12"> For each $\tau \in T$, we may consider a part of the tree consisting of a node $N$ in $\tau$ and the list of its children nodes $\gamma$. 
Analogously to the notation for context-free parsing, we separate the list of children nodes into two lists, separated by a dot, and write $N \to \alpha \bullet \beta$, where $\alpha\beta = \gamma$, to indicate that the children nodes in $\alpha$ have already been matched against a part of the input string, and those in $\beta$ have as yet not been processed.</Paragraph> <Paragraph position="13"> The set of such objects for an elementary tree</Paragraph> <Paragraph position="15"> Such objects are attached to the trees $\tau \in T$ to which they pertain, to form the set of items: $\mathrm{Items} = \{[\tau, N \to \alpha \bullet \beta] \mid \tau \in T, (N \to \alpha \bullet \beta) \in P_\tau\}$. A completed item is an item that indicates a completely recognized elementary tree or subtree. Formally, items are completed if they are of the form $[t, \top \to R_t \bullet]$ or of the form $[(t, N), N \to \alpha \bullet]$.</Paragraph> <Paragraph position="16"> The main concept needed for the construction of the LR table is that of LR states. These are particular elements from $2^{\mathrm{Items}}$ to be defined shortly.</Paragraph> <Paragraph position="17"> First, we introduce the function closure from $2^{\mathrm{Items}}$ to $2^{\mathrm{Items}}$ and the functions goto and goto$_\bot$ from $2^{\mathrm{Items}} \times \mathcal{N}$ to $2^{\mathrm{Items}}$. For any $q \subseteq \mathrm{Items}$, closure($q$) is the smallest set such that: 1. $q \subseteq \mathrm{closure}(q)$; 2. $[\tau, N \to \alpha \bullet M\beta] \in \mathrm{closure}(q)$, nil $\in$</Paragraph> <Paragraph position="19"> $\beta$] $\in \mathrm{Items}$ implies $[\tau, N \to \alpha M \bullet \beta] \in \mathrm{closure}(q)$.</Paragraph> <Paragraph position="20"> Clauses 1 through 4 are reminiscent of the closure function for traditional LR parsing. 
Note that in clause 4 we set out to recognize a subtree $(t', N)$ of elementary tree $t'$. Clause 5 is unconventional: we traverse the tree $\tau$ upwards when the dot indicates that all children nodes of $M$ have been recognized.</Paragraph> <Paragraph position="21"> Next we define the function goto, for any $q \subseteq \mathrm{Items}$, and any $M \in \Sigma$ or $M \in \mathcal{N}$ such that Adjunct($M$) includes at least one auxiliary tree.</Paragraph> <Paragraph position="23"> The function goto$_\bot$ is similar in that it shifts the dot over a node, in this case the imaginary node $\bot$ which is the unique child of an actual foot node $F_t$. However, it only does this if $t$ is a tree which can be adjoined at the node that is given as the second argument.</Paragraph> <Paragraph position="24"> $\mathrm{goto}_\bot(q, M) = \{[\tau, F_t \to \bot \bullet] \mid [\tau, F_t \to \bullet \bot] \in \mathrm{closure}(q) \land t \in \mathrm{Adjunct}(M)\}$. The initial LR state is the set $q_{in} = \{[t, \top \to \bullet R_t] \mid t \in I\}$. We construct the set $Q$ of all LR states as the smallest collection of sets satisfying the conditions: 1. $q_{in} \in Q$; 2. $q \in Q$, $M \in \mathcal{N}$ and $q' = \mathrm{goto}(q, M) \neq \emptyset$ imply $q' \in Q$; and 3. $q \in Q$, $M \in \mathcal{N}$ and $q' = \mathrm{goto}_\bot(q, M) \neq \emptyset$ imply $q' \in Q$.</Paragraph> <Paragraph position="25"> An LR state is final if its closure includes a completed item corresponding to an initial tree: $Q_{fin} = \{q \in Q \mid \mathrm{closure}(q) \cap \{[t, \top \to R_t \bullet] \mid t \in I\} \neq \emptyset\}$. Final LR states indicate recognition of the input. Other completed items give rise to a reduction, a type of stack manipulation by the LR automaton to be defined in the next section. As defined below, reductions are uniquely identified by either auxiliary trees $t$ or by nodes $N$ obtained from the corresponding completed items.</Paragraph> <Paragraph position="27"> For each node $N$ in a tree, we consider the set $CS(N)$ of strings that represent horizontal cross-sections through the subtree rooted at $N$.</Paragraph> <Paragraph position="28"> If we do not want to include the cross-section through $N$ itself, we write $CS(N)^+$. 
A cross-section can also be seen as the yield of the subtree after removal of a certain number of its subtrees. For convenience, each node of an auxiliary tree (or subtree thereof) that dominates a foot node is paired with a stack of nodes. The intuition behind such a stack of nodes $[N_1, \ldots, N_m]$ is that it indicates a path, the so-called spine, through the derived tree in the direction of the foot nodes, where each $N_i$, with $1 \leq i &lt; m$, is a node at which adjunction has taken place.</Paragraph> <Paragraph position="29"> Such stacks correspond to the stacks of linear indexed grammars.</Paragraph> <Paragraph position="30"> The set of all stacks of nodes is denoted by $\mathcal{N}^*$. The empty stack is denoted by $[\,]$, and stacks consisting of head $H$ and tail $T$ are denoted by $[H|T]$. We define:</Paragraph> <Paragraph position="32"> and we simultaneously define the functions $CS$ and $CS^+$ from $\mathcal{N}$ to 2 &quot;~&quot; as the least functions</Paragraph> </Section></Paper>