<?xml version="1.0" standalone="yes"?> <Paper uid="P98-2183"> <Title>A Descriptive Characterization of Tree-Adjoining Languages (Project Note)</Title> <Section position="2" start_page="0" end_page="1117" type="intro"> <SectionTitle> 1 Introduction </SectionTitle> <Paragraph position="0"> In the early Sixties Büchi (1960) and Elgot (1961) established that a set of strings is regular iff it is definable in the weak monadic second-order theory of the natural numbers with successor (wS1S). In the late Sixties and early Seventies this result was extended to the context-free languages by Thatcher and Wright (1968) and Doner (1970), who established that the CFLs are all and only the sets of strings forming the yields of sets of finite trees definable in the weak monadic second-order theory of multiple successors (wSnS). These descriptive characterizations have natural applications to constraint- and principle-based theories of syntax. We have employed them to explore the language-theoretic complexity of theories in GB (Rogers, 1994; Rogers, 1997b) and GPSG (Rogers, 1997a), and have used these model-theoretic interpretations as a uniform framework in which to compare these formalisms (Rogers, 1996). They have also provided a foundation for an approach to principle-based parsing via compilation into tree-automata (Morawietz and Cornell, 1997).</Paragraph> <Paragraph position="1"> Outside the realm of Computational Linguistics, these results have been employed in theorem proving, with applications to program and hardware verification (Henriksen et al., 1995; Biehl et al., 1996; Kelb et al., 1997). The scope of each of these applications is limited, to some extent, by the fact that no such descriptive characterizations exist for classes of languages beyond the context-free. 
As a result, there has been considerable interest in extending the basic results (Mönnich, 1997; Volger, 1997), but, prior to the work reported here, the proposed extensions have not preserved the simplicity of the original results.</Paragraph> <Paragraph position="2"> Recently, in (Rogers, 1997c), we introduced a class of labeled three-dimensional tree-like structures (three-dimensional tree manifolds, 3-TMs) which serve simultaneously as the derived and derivation structures of Tree-Adjoining Grammars (TAGs), in exactly the same way that labeled trees can serve as both derived and derivation structures for CFGs. We defined a class of automata over these structures that generalize tree-automata (which are, in turn, an analogous generalization of ordinary finite-state automata over strings), and showed that the class of tree manifolds recognized by these automata is exactly the class of tree manifolds generated by TAGs, if one relaxes the usual requirement that the labels of the root and foot of an auxiliary tree and the label of the node at which it adjoins all be identical. Thus there are analogous classes of automata at the level of labeled three-dimensional tree manifolds, at the level of labeled trees, and at the level of strings (trees and strings can be understood as two- and one-dimensional tree manifolds), which recognize sets of structures that yield, respectively, the TALs, the CFLs, and the regular languages. Furthermore, the generalization from each level to the next is simple enough that many results lift directly from one level to the next. In particular, we get that the recognizable sets at each level are closed under union, intersection, relative complement, projection, cylindrification, and determinization, and that emptiness of the recognizable sets is decidable. 
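To make the closure and decidability claims concrete one level down, here is a minimal Python sketch assuming deterministic bottom-up automata over finite labeled ordered trees (the tree level, not the 3-TM automata of the paper). The class TreeAutomaton, its methods, and the example automata parity_of_a and root_is_b are hypothetical names chosen for illustration. It shows the standard product construction for intersection and a reachability fixpoint for deciding emptiness.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, Optional, Tuple

# A tree is (label, children), where children is a tuple of trees;
# a leaf has the empty tuple as its children.
Tree = Tuple[Hashable, tuple]


class TreeAutomaton:
    """Hypothetical deterministic bottom-up (frontier-to-root) tree automaton.

    delta maps (label, tuple of child states) to a state. A missing entry
    means the run is undefined and the tree is rejected.
    """

    def __init__(self, delta: Dict[tuple, Hashable], final: FrozenSet[Hashable]):
        self.delta = delta
        self.final = final

    def run(self, tree: Tree) -> Optional[Hashable]:
        # Assign states to the children first, then look up this node.
        label, children = tree
        child_states = []
        for child in children:
            q = self.run(child)
            if q is None:
                return None
            child_states.append(q)
        return self.delta.get((label, tuple(child_states)))

    def accepts(self, tree: Tree) -> bool:
        q = self.run(tree)
        return q is not None and q in self.final

    def intersect(self, other: "TreeAutomaton") -> "TreeAutomaton":
        # Product construction: run both automata in lockstep on pairs of states.
        delta = {}
        for (a, qs), q in self.delta.items():
            for (b, ps), p in other.delta.items():
                if a == b and len(qs) == len(ps):
                    delta[(a, tuple(zip(qs, ps)))] = (q, p)
        final = frozenset((q, p) for q in self.final for p in other.final)
        return TreeAutomaton(delta, final)

    def is_empty(self) -> bool:
        # Saturate the set of states that some finite tree can reach; the
        # language is empty iff no final state is ever reached.
        reachable = set()
        changed = True
        while changed:
            changed = False
            for (_, qs), q in self.delta.items():
                if q not in reachable and all(c in reachable for c in qs):
                    reachable.add(q)
                    changed = True
        return not reachable.intersection(self.final)


def parity_of_a() -> TreeAutomaton:
    # Hypothetical example over binary trees labeled "a"/"b":
    # the state is the parity of the number of "a" nodes; accept even.
    delta = {}
    for label in ("a", "b"):
        bit = 1 if label == "a" else 0
        delta[(label, ())] = bit
        for p, q in product((0, 1), repeat=2):
            delta[(label, (p, q))] = (p + q + bit) % 2
    return TreeAutomaton(delta, frozenset({0}))


def root_is_b() -> TreeAutomaton:
    # Hypothetical example: the state is the node's own label; accept root "b".
    delta = {}
    for label in ("a", "b"):
        delta[(label, ())] = label
        for p, q in product(("a", "b"), repeat=2):
            delta[(label, (p, q))] = label
    return TreeAutomaton(delta, frozenset({"b"}))


both = parity_of_a().intersect(root_is_b())
leaf_a, leaf_b = ("a", ()), ("b", ())
print(both.accepts(("b", (leaf_a, leaf_a))))  # True: two a's, root b
print(both.accepts(("b", (leaf_a, leaf_b))))  # False: one a
print(both.accepts(("a", (leaf_a, leaf_b))))  # False: root is a
print(both.is_empty())                        # False
```

The sketch covers only intersection and emptiness; union, relative complement, and projection have comparably standard constructions for tree automata, and emptiness is decided by the same kind of bottom-up reachability computation.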
Closure under these operations, together with decidability of emptiness, is exactly what one needs to establish that recognizability by the automata over a class of structures characterizes satisfiability of monadic second-order formulae in the language appropriate for that class. Thus, just as the proofs of closure properties lift directly from one level to the next, Doner's and Thatcher and Wright's proofs that the recognizable sets of trees are characterized by definability in wSnS lift directly to a proof that the recognizable sets of three-dimensional tree manifolds are characterized by definability in their weak monadic second-order theory (which we will refer to as wSnT3).</Paragraph> <Paragraph position="3"> In this paper we carry out this program. In the next section we introduce 3-TMs and our uniform notion of automaton over tree manifolds of arbitrary (finite) dimension, and indicate the nature of the dimension-independent proofs of closure properties. In Section 3 we introduce wSnT3, the weak monadic second-order theory of n-branching 3-TMs, and sketch the proof that the sets definable in wSnT3 are exactly those recognizable by 3-TM automata. This, coupled with the characterization of TALs in Rogers (1997c), gives our descriptive characterization of TALs: a set of strings is generated by a TAG (modulo the generalization of Rogers (1997c)) iff it is the (string) yield of a set of 3-TMs definable in wSnT3. Finally, in Section 4 we look at how working in wSnT3 allows a potentially more transparent means of defining TALs and, in particular, a simplified treatment of constraints on modifiers in TAGs. Due to the limited length of this note, many of the details are omitted. The reader is directed to (Rogers, 1998) for a more complete treatment.</Paragraph> </Section> </Paper>