<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-2183">
  <Title>A Descriptive Characterization of Tree-Adjoining Languages (Project Note)</Title>
  <Section position="3" start_page="1117" end_page="1119" type="metho">
    <SectionTitle>
2 Tree Manifolds and Automata
</SectionTitle>
    <Paragraph position="0"> Tree manifolds are a generalization to arbitrary dimensions of Gorn's tree domains (Gorn, 1967). A tree domain is a set of node addresses drawn from $\mathbb{N}^*$ (that is, a set of strings of natural numbers) in which $\varepsilon$ is the address of the root and the children of a node at address $w$ occur at addresses $w0, w1, \ldots$, in left-to-right order. To be well formed, a tree domain must be downward closed wrt domination, which corresponds to being prefix closed, and left sibling closed in the sense that if $wi$ occurs then so does $wj$ for all $j &lt; i$. In generalizing these, we first define the one-dimensional analog, string domains: downward closed sets of natural numbers interpreted as string addresses. From this point of view, the address of a node in a tree domain can be understood as the sequence of string addresses one follows in tracing the path from the root to that node. If we represent $\mathbb{N}$ in unary (with $n$ represented as $1^n$) then the downward closure property of string domains becomes a form of prefix closure analogous to downward closure wrt domination in tree domains, tree domains become sequences of sequences of '1's, and the left-closure property of tree domains becomes a prefix closure property for the embedded sequences.</Paragraph>
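The two well-formedness conditions on tree domains can be checked mechanically. The following is a minimal sketch, assuming addresses are represented as Python tuples of natural numbers with the empty tuple standing for the root address; the function name `is_tree_domain` is illustrative and does not come from the paper.

```python
def is_tree_domain(addresses):
    """Return True if a set of tuples of naturals is a well-formed tree domain."""
    nodes = set(addresses)
    for w in nodes:
        if w == ():
            continue  # the root has no parent and no left sibling
        parent, i = w[:-1], w[-1]
        # Prefix closure: the parent of every node must be present.
        # Checking one step per node suffices, since the parent is checked too.
        if parent not in nodes:
            return False
        # Left sibling closure: if child i occurs, child i - 1 must occur;
        # all smaller indices then follow by the same check on that sibling.
        if i > 0 and parent + (i - 1,) not in nodes:
            return False
    return True


# Root, two children, and one grandchild under the first child.
example = {(), (0,), (1,), (0, 0)}
print(is_tree_domain(example))
```

A set that contains child 1 of the root without child 0 fails left sibling closure, and a set that contains a grandchild without its parent fails prefix closure.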
    <Paragraph position="1"> Raising this to higher dimensions, we obtain, next, a class of structures in which each node expands into a (possibly empty) tree. A three-dimensional tree manifold (3-TM), then, is a set of sequences of tree addresses (that is, addresses of nodes in tree domains) tracing the paths from the root of one of these structures to each of the nodes in it. Again this must be downward closed wrt domination in the third dimension, equivalently wrt prefix; the sets of tree addresses labeling the children of any node must be downward closed wrt domination in the second dimension (again wrt prefix); and the sets of string addresses labeling the children of any node in any of these trees must be downward closed wrt domination in the first dimension (left-of, and, yet again, prefix). Thus 3-TM, tree domains (2-TM), and string domains (1-TM) can be defined uniformly as $d$th-order sequences of '1's which are hereditarily prefix closed. We will denote the set of all $d$-TM as $\mathbb{T}^d$. For any alphabet $\Sigma$, a $\Sigma$-labeled $d$-dimensional tree manifold is a pair $(T, \tau)$ where $T$ is a $d$-TM and $\tau : T \to \Sigma$ is an assignment of labels in $\Sigma$ to the nodes in $T$. We will denote the set of all $\Sigma$-labeled $d$-TM as $\mathbb{T}^d_\Sigma$.</Paragraph>
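The uniform definition suggests a single recursive check in which the dimension is only a parameter. The sketch below assumes that a 1-TM address is a natural number and that a d-TM address, for d of at least 2, is a tuple of (d-1)-TM addresses; `is_tm` is a hypothetical helper name, not notation from the paper.

```python
def is_tm(points, d):
    """Check that `points` is a d-dimensional tree manifold,
    i.e. that it is hereditarily prefix closed."""
    pts = set(points)
    if d == 1:
        # A string domain: a downward closed set of naturals
        # (prefix closed when the naturals are written in unary).
        return all(n == 0 or n - 1 in pts for n in pts)
    # Prefix closure in the d-th dimension: every non-root point has its parent.
    if any(p[:-1] not in pts for p in pts if p):
        return False
    # The addresses of the children of each point must form a (d-1)-TM.
    children = {}
    for p in pts:
        if p:
            children.setdefault(p[:-1], set()).add(p[-1])
    return all(is_tm(kids, d - 1) for kids in children.values())


# A 2-TM (tree domain): root, two children, one grandchild.
tree = {(), (0,), (1,), (0, 0)}
# A 3-TM: the root expands into a three-node tree, and the node at tree
# address (0,) of that tree expands into a one-node tree.
manifold = {(), ((),), ((0,),), ((1,),), ((0,), ())}
print(is_tm(tree, 2), is_tm(manifold, 3))
```

The same function handles every dimension because each level only asks whether the child addresses of a point form a manifold of the next lower dimension.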
    <Paragraph position="2"> Mimicking the development of tree manifolds, we can define automata over labeled 3-TM as a generalization of automata over labeled tree domains which, in turn, can be understood as an analogous generalization of ordinary finite-state automata over strings (labeled string domains).</Paragraph>
    <Paragraph position="3"> A $d$-TM automaton with state set $Q$ and alphabet $\Sigma$ is a finite set $\mathcal{A}^d \subseteq \Sigma \times Q \times \mathbb{T}^{d-1}_Q$, where $\mathbb{T}^{d-1}_Q$ is the set of $Q$-labeled $(d-1)$-TM.</Paragraph>
    <Paragraph position="4"> The interpretation of a tuple $(\sigma, q, \mathcal{T}) \in \mathcal{A}^d$ is that if a node of a $d$-TM is labeled $\sigma$ and $\mathcal{T}$ encodes the assignment of states to its children, then that node may be assigned state $q$. A run of a $d$-TM automaton $\mathcal{A}$ on a $\Sigma$-labeled $d$-TM $(T, \tau)$ is an assignment $r : T \to Q$ of states in $Q$ to nodes in $T$ in which each assignment is licensed by $\mathcal{A}$. If we let $Q_0 \subseteq Q$ be any set of accepting states, then the set of (finite) $\Sigma$-labeled $d$-TM recognized by $\mathcal{A}$, relative to $Q_0$, written $\mathcal{A}(Q_0)$, is the set of those for which there is a run of $\mathcal{A}$ that assigns the root a state in $Q_0$. A set of $d$-TM is recognizable iff it is $\mathcal{A}(Q_0)$ for some $d$-TM automaton $\mathcal{A}$ and set of accepting states $Q_0$.</Paragraph>
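To make the notion of a run concrete, the following sketch treats only the case d = 2, where the child structure in each tuple is a string of states. It assumes a tree is given as a pair of a label and a list of subtrees and an automaton as a set of triples (label, state, tuple of child states); these representations and the names `possible_states` and `recognizes` are illustrative assumptions.

```python
from itertools import product


def possible_states(tree, automaton):
    """Return the set of states that some run can assign to the root of `tree`."""
    label, kids = tree
    kid_states = [possible_states(k, automaton) for k in kids]
    result = set()
    for (a, q, child_word) in automaton:
        if a != label or len(child_word) != len(kids):
            continue
        # The tuple licenses q if each child can carry the state it demands.
        if all(s in ks for s, ks in zip(child_word, kid_states)):
            result.add(q)
    return result


def recognizes(tree, automaton, accepting):
    """True iff some run assigns the root a state in `accepting`."""
    return not possible_states(tree, automaton).isdisjoint(accepting)


# Toy automaton: evaluates conjunctions over leaves labeled "t" and "f".
AND_EVAL = {("t", 1, ()), ("f", 0, ())}
AND_EVAL |= {("and", min(x, y), (x, y)) for x, y in product((0, 1), repeat=2)}

print(recognizes(("and", [("t", []), ("t", [])]), AND_EVAL, {1}))
```

Computing the set of reachable states bottom-up is sound here because the choices made in distinct subtrees are independent of one another.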
    <Paragraph position="5"> The strength of the uniform definition of d-TM automata is that many, even most, properties of the sets they recognize can be proved uniformly--independently of their dimension.</Paragraph>
    <Paragraph position="6"> It is easy to see that in the typical "cross-product" construction of the proof of closure under intersection, for instance, the dimensionality of the TMs is a parameter that determines the type of the objects being manipulated but does not affect the manner of their manipulation. Uniform proofs can be obtained for determinization (in a bottom-up sense), for closure of the recognizable sets under projection, cylindrification and Boolean operations, and for decidability of emptiness.</Paragraph>
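The cross-product construction can be sketched so that the dimension never appears in the code. The sketch assumes that the child structure of each automaton tuple is a frozenset of (address, state) pairs, with addresses treated as opaque values; `pair_labels` and `product_automaton` are hypothetical names.

```python
def pair_labels(t1, t2):
    """Pair the states of two labeled child structures point by point.
    Returns None when the two structures have different domains."""
    d1, d2 = dict(t1), dict(t2)
    if d1.keys() != d2.keys():
        return None
    return frozenset((p, (d1[p], d2[p])) for p in d1)


def product_automaton(a1, a2):
    """Build the automaton whose states are pairs of states of a1 and a2."""
    result = set()
    for (s1, q1, t1) in a1:
        for (s2, q2, t2) in a2:
            if s1 != s2:
                continue  # both automata must read the same label
            paired = pair_labels(t1, t2)
            if paired is not None:
                result.add((s1, (q1, q2), paired))
    return result


a1 = {("a", "p", frozenset()), ("b", "p", frozenset({(0, "p")}))}
a2 = {
    ("a", "x", frozenset()),
    ("b", "y", frozenset({(0, "x")})),
    ("b", "y", frozenset({(0, "x"), (1, "x")})),
}
print(len(product_automaton(a1, a2)))
```

Taking pairs of accepting states as the accepting set then yields the intersection. Only the type of the addresses changes with the dimension, which is the point made in the text.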
    <Paragraph position="7"> 3 wSnT3 We are now in a position to build relational structures on d-dimensional tree manifolds. Let $T^d_n$ be the complete $n$-branching $d$-TM, that in which every point has a child structure that has depth $n$ in all its $(d-1)$ dimensions. Let $\mathsf{T}^3_n \stackrel{\mathrm{def}}{=} \langle T^3_n, \triangleleft_1, \triangleleft_2, \triangleleft_3 \rangle$ where, for all $x, y \in T^3_n$, $x \triangleleft_i y$ iff $x$ is the immediate predecessor of $y$ in the $i$th dimension. The weak monadic second-order language of $\mathsf{T}^3_n$ includes constants for each of the relations (we let them stand for themselves), the usual logical connectives, quantifiers and grouping symbols, and two countably infinite sets of variables, one ranging over individuals (for which we employ lowercase) and one ranging over finite subsets (for which we employ uppercase).</Paragraph>
    <Paragraph position="8"> If $\varphi(x_1, \ldots, x_n, X_1, \ldots, X_m)$ is a formula of this language with free variables among the $x_i$ and $X_j$, then we will assert that it is satisfied in $\mathsf{T}^3_n$ by an assignment $s$ (mapping the $x_i$ to individuals and the $X_j$ to finite subsets) with the notation $\mathsf{T}^3_n \models \varphi[s]$. The set of all sentences of this language that are satisfied by $\mathsf{T}^3_n$ is the weak monadic second-order theory of $\mathsf{T}^3_n$, denoted wSnT3.</Paragraph>
    <Paragraph position="9"> A set $\mathbb{T}$ of $\Sigma$-labeled 3-TM is definable in wSnT3 iff there is a formula $\varphi_{\mathbb{T}}(X_T, X_\sigma)_{\sigma \in \Sigma}$, with free variables among $X_T$ (interpreted as the domain of a tree) and $X_\sigma$ for each $\sigma \in \Sigma$ (interpreted as the set of $\sigma$-labeled points in $T$), such that</Paragraph>
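A hedged rendering of the defining condition, assuming the standard notion of weak monadic second-order definability: the labeled 3-TM in the set are exactly those whose domain and label sets satisfy the formula. The exact form is an assumption rather than a quotation from the paper, and the symbols $\mathbb{T}$, $\varphi_{\mathbb{T}}$, $\tau$ and $\mathsf{T}^3_n$ follow the conventions used in the surrounding definitions.

```latex
\[
  \mathbb{T} \;=\; \bigl\{\, (T, \tau) \;\bigm|\; T \text{ finite},\;
    \mathsf{T}^3_n \models \varphi_{\mathbb{T}}\bigl[\, X_T \mapsto T,\;
    X_\sigma \mapsto \{\, p \in T \mid \tau(p) = \sigma \,\} \,\bigr] \,\bigr\}
\]
```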
    <Paragraph position="11"> It should be reasonably easy to see that any recognizable set can be defined by encoding the local TMs of an accepting automaton in formulae in which the labels and states occur as free variables and then requiring every node to satisfy one of those formulae. One then requires the root to be labeled with an accepting state and "hides" the states by existentially binding them.</Paragraph>
    <Paragraph position="12"> The proof that every set of trees definable in wSnT3 is recognizable, while a little more involved, is just a lift of the proofs of Doner and of Thatcher and Wright. The initial step is to show that every formula in the language of wSnT3 can be reduced to equivalent formulae in which only set variables occur and which employ only the predicates $X \subseteq Y$ (with the obvious interpretation) and $X \triangleleft_i Y$ (satisfied iff $X$ and $Y$ are both singletons and the sole element of $X$ stands in the appropriate relation to the sole element of $Y$). It is easy to construct 3-TM automata (over the alphabet $\mathcal{P}(\{X, Y\})$, where $\mathcal{P}$ denotes power set) which accept trees encoding satisfying assignments for these atomic formulae. The extension to arbitrary formulae (over these atomic formulae) can then be carried out by induction on the structure of the formulae using the closure properties of the recognizable sets.</Paragraph>
  </Section>
  <Section position="4" start_page="1119" end_page="1120" type="metho">
    <SectionTitle>
4 Defining TALs in wSnT3
</SectionTitle>
    <Paragraph position="0"> The signature of wSnT3 is inconvenient for expressing linguistic constraints. In particular, one of the strengths of the model-theoretic approach is the ability to define long-distance relationships without having to explicitly encode them in the labels of the intervening nodes.</Paragraph>
    <Paragraph position="1"> We can extend the immediate predecessor relations to relations corresponding to (proper) above (within the 3-TM), domination (within a tree), and precedence (within a set of siblings) using: $$x \triangleleft_i^{+} y \;\stackrel{\mathrm{def}}{\iff}\; x \not\approx y \wedge (\exists X)[X(x) \wedge X(y) \wedge (\forall z)[X(z) \rightarrow (z \approx y \vee (\exists! z')[X(z') \wedge z \triangleleft_i z'])]].$$ This simply asserts that there is a sequence of (at least two) points linearly ordered by $\triangleleft_i$ in which $x$ precedes $y$.</Paragraph>
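The finite set whose existence the definition asserts can be computed directly as a chain. The sketch assumes that the immediate predecessor relation of one dimension is given as an acyclic Python dict mapping each point to its unique immediate predecessor; `witness_chain` is a hypothetical name.

```python
def witness_chain(x, y, pred):
    """Return a set X witnessing that x is properly above y, or None.

    `pred` maps each point to its immediate predecessor in one dimension;
    minimal points are absent from the map.
    """
    chain = [y]
    z = y
    while z in pred:
        z = pred[z]
        chain.append(z)
        if z == x:
            # Every point of the chain other than y has exactly one
            # immediate successor inside the chain, as the formula demands.
            return set(chain)
    return None


# a is the predecessor of b and d; b is the predecessor of c.
pred = {"b": "a", "c": "b", "d": "a"}
print(witness_chain("a", "c", pred))
```

The call returns None both when the two points are unrelated and when they are equal, which matches the requirement that the chain contain at least two points.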
    <Paragraph position="2"> To extend these through the entire structure we have to address the fact that the two-dimensional yield of a 3-TM is not well defined: there is nothing that determines which leaf of the tree expanding a node dominates the subtree rooted at that node. To resolve this, we extend our structures to include a set $H$ picking out exactly one head in each set of siblings, with the "foot" of a tree being that leaf reached from the root by a path of all heads. Given $H$, it is possible to define $\bar{\triangleleft}_2^{+}$ and $\bar{\triangleleft}_1^{+}$, variations of dominance and precedence that are inherited by substructures in the appropriate way (of course, $\bar{\triangleleft}_3^{+}$ is just $\triangleleft_3^{+}$). At the same time, it is convenient to include the labels explicitly in the structures. A headed $\Sigma$-labeled 3-TM, then, is</Paragraph>
    <Paragraph position="3"> a structure: $\langle T, \triangleleft_i, \triangleleft_i^{+}, \bar{\triangleleft}_i^{+}, H, P_\sigma \rangle_{1 \le i \le 3,\ \sigma \in \Sigma}$, where $T$ is a rooted, connected subset of $T^3_n$ for some $n$.</Paragraph>
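The role of the head set in fixing a foot can be illustrated on a single tree. The sketch assumes a tree domain given as a set of tuples of naturals and a set of head addresses containing exactly one member of each set of siblings; the function name `foot` is illustrative.

```python
def foot(tree_domain, heads):
    """Follow heads from the root until a leaf is reached and return it."""
    nodes = set(tree_domain)
    node = ()
    while True:
        kids = [w for w in nodes
                if len(w) == len(node) + 1 and w[:-1] == node]
        if not kids:
            return node  # a leaf reached by a path consisting only of heads
        # Exactly one sibling must be a head; the unpacking raises
        # ValueError if the head set violates that requirement.
        (node,) = [k for k in kids if k in heads]


domain = {(), (0,), (1,), (1, 0), (1, 1)}
heads = {(), (1,), (1, 0)}
print(foot(domain, heads))
```

With the foot determined in this way, the subtree rooted at an expanded node has a definite attachment point in the tree that expands it.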
    <Paragraph position="4"> With this signature it is easy to define the set of 3-TM that captures a TAG in the sense that their 2-dimensional yields (the sets of maximal points wrt $\triangleleft_3^{+}$, ordered by the headed dominance and precedence relations $\bar{\triangleleft}_2^{+}$ and $\bar{\triangleleft}_1^{+}$) form the set of trees derived by the TAG. Note that obligatory (OA) and null (NA) adjoining constraints translate to a requirement that a node be (non-)maximal wrt $\triangleleft_3^{+}$. In our automata-theoretic interpretation of TAGs, selective adjoining (SA) constraints are encoded in the states. Here we can express them directly: a constraint specifying the modifier trees which may adjoin to an N node, for instance, can be stated as a condition on the label of the root node of trees immediately below N nodes.</Paragraph>
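The reading of OA and NA constraints as (non-)maximality can be sketched as follows. The sketch assumes a 3-TM represented as a set of tuples of tree addresses, so that a point is maximal in the third dimension exactly when no other point extends it by one component, and a hypothetical dict `constraints` mapping points to the strings "OA" or "NA".

```python
def maximal_points(points):
    """Points with no child structure in the third dimension."""
    pts = set(points)
    expanded = {p[:-1] for p in pts if p}  # points that have some child
    return pts - expanded


def adjoining_constraints_hold(points, constraints):
    """NA points must be maximal; OA points must not be."""
    maximal = maximal_points(points)
    for p, kind in constraints.items():
        if kind == "NA" and p not in maximal:
            return False
        if kind == "OA" and p in maximal:
            return False
    return True


# The root expands into a three-node tree; the node at tree address (0,)
# of that tree expands into a one-node tree.
manifold = {(), ((),), ((0,),), ((1,),), ((0,), ())}
print(adjoining_constraints_hold(manifold, {((1,),): "NA", ((0,),): "OA"}))
```

The maximal points computed here are the ones that make up the 2-dimensional yield; ordering them requires the headed relations and is not attempted in this sketch.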
    <Paragraph position="5"> In general, of course, SA constraints depend not only on the attributes (the label) of a node, but also on the elementary tree in which it occurs and its position in that tree. Both of these conditions are actually expressions of the local context of the node. Here, again, we can express such conditions directly--in terms of the relevant elements of the node's neighborhood.</Paragraph>
    <Paragraph position="6"> At least in some cases this seems likely to allow for a more general expression of the constraints, abstracting away from the irrelevant details of the context.</Paragraph>
    <Paragraph position="7"> Finally, there are circumstances in which the primitive locality of SA constraints in TAGs is inconvenient. Schabes and Shieber (1994), for instance, suggest allowing multiple adjunctions of modifier trees to the same node on the grounds that selectional constraints hold between the modified node and each of its modifiers but, if only a single adjunction may occur at the modified node, only the first tree that is adjoined will actually be local to that node.</Paragraph>
    <Paragraph position="8"> They point out that, while it is possible to pass these constraints through the tree by encoding them in the labels of the intervening nodes, such a solution can have wide ranging effects on the overall grammar. As we noted above, the expression of such non-local constraints is one of the strengths of the model-theoretic approach.</Paragraph>
    <Paragraph position="9"> We can state them in a purely natural way, as a simple restriction on the types of the modifier trees which can occur below (in the $\triangleleft_3^{+}$ sense) the modified node.</Paragraph>
  </Section>
</Paper>