File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/05/j05-2003_metho.xml
Size: 64,800 bytes
Last Modified: 2025-10-06 14:09:39
<?xml version="1.0" standalone="yes"?> <Paper uid="J05-2003"> <Title>Tree-Local Multicomponent Tree-Adjoining Grammars with Shared Nodes</Title> <Section position="2" start_page="188" end_page="192" type="metho"> <SectionTitle> (XTAG Research Group 1998) might look as shown in Figure 2. In the derivation, the </SectionTitle> <Paragraph position="0"> Computational Linguistics Volume 31, Number 2 verspricht-tree adjoins to the root node of the reparieren-tree, and the nominative NP der Mechaniker is substituted for the subject node in the verspricht-tree. This leads to the tree on the right in Figure 2.</Paragraph> <Paragraph position="1"> When es is added, there is a problem: It should be added to reparieren, since it is one of its arguments. But at the same time, it should precede der Mechaniker; that is, it must be adjoined either to the root or to the NP$_{\mathrm{nom}}$ node. The root node belongs to verspricht, and the NP$_{\mathrm{nom}}$ node belongs to der Mechaniker. Consequently, an adjunction to either of them would not yield the desired predicate-argument structure. If one wanted to analyze only example (1), one could add to the grammar a tree for reparieren with a scrambled NP that allows adjunction of verspricht between the NP and the verb. But as soon as there are several scrambled elements that are arguments of different verbs, this no longer works.</Paragraph> <Paragraph position="2"> This example illustrates why scrambling is problematic for TAG. However, by adopting specific elementary trees, it is possible to handle part of the difficult scrambling data: It has been shown (see Joshi, Becker, and Rambow 2000) that TAG can describe scrambling up to depth two (two crossed VP borders). But this is not sufficient. Even though examples of scrambling of depth greater than two are rare, they can occur.
An example is (2), taken from Kulick (2000):
(2) ... zu reparieren] zu versuchen] zu versprechen] bereit ist
...that the refrigerator nobody to repair to try to promise willing is
'...that nobody is willing to promise to try to repair the refrigerator'
Consequently, TAG is not powerful enough to account for scrambling. Becker, Rambow, and Niv (1992) argue that even linear context-free rewriting systems (LCFRSs) (Weir 1988) are not powerful enough to describe scrambling. (LCFRSs are weakly equivalent to set-local MCTAGs and therefore more powerful than TAGs.) Although we think that the language Becker, Rambow, and Niv define as a kind of test language for scrambling is not exactly what one needs (see section 2.3), we still suspect that they are right in claiming that LCFRSs cannot describe scrambling.</Paragraph> <Section position="1" start_page="189" end_page="192" type="sub_section"> <SectionTitle> 1.3 TAG Variants Proposed for Scrambling </SectionTitle> <Paragraph position="0"> The problem with long-distance scrambling and TAG is that the trees representing the syntax of scrambled German subordinate clauses do not have the simple nested structure that ordinary TAG generates. The CETM requires that (positions for) all of the arguments of the lexical anchor of an elementary tree be included in that tree. But in a scrambled tree, the arguments of several verbs are interleaved freely. All TAG extensions that have been proposed to accommodate this interleaving involve factoring the elementary structures into multiple components and inserting these components at multiple positions in the course of the derivation.</Paragraph> <Paragraph position="1"> One of the first proposals was an analysis of German scrambling data using nonlocal MCTAG with additional dominance constraints (Becker, Joshi, and Rambow 1991).
However, the formal properties of nonlocal MCTAG are not well understood, and Kallmeyer Multicomponent TAGs with Shared Nodes it is assumed that the formalism is not polynomially parsable. Therefore this approach is no longer pursued, but it has influenced the different subsequent proposals.</Paragraph> <Paragraph position="2"> An alternative formalism for scrambling is V-TAG (Rambow 1994a, 1994b; Rambow and Lee 1994), a formalism that has nicer formal properties than nonlocal MCTAG.</Paragraph> <Paragraph position="3"> V-TAG also uses multicomponent sets (vectors) for scrambled elements; in this it is a variant of MCTAG. Additionally, there are dominance links among the trees of the same vector. In contrast to MCTAG, the trees of a vector in V-TAG are not required to be added simultaneously. The lexicalized V-TAGs that are of interest for natural languages are polynomially parsable. Rambow (1994a) proposes detailed analyses of a large range of different word order phenomena in German using V-TAG and thereby shows the linguistic usefulness of V-TAG.</Paragraph> <Paragraph position="4"> Even though V-TAG does not pose the problems of nonlocal MCTAG in terms of parsing complexity, it is still a nonlocal formalism in the sense that, as long as the dominance links are respected, arbitrary nodes can be chosen to attach the single components of a vector. Therefore, in order to formulate certain locality restrictions (e.g., for wh-movement and also for scrambling), one needs an additional means of putting constraints on what can interleave with the different trees of a vector, or in other words, constraints on how far a dominance link can be stretched. V-TAG allows us to put integrity constraints on certain nodes that disallow the occurrence of these nodes between two trees linked by a dominance link. This has the effect of making these nodes act as barriers. With integrity constraints, constructions involving long-distance movements can be correctly analyzed. 
But the explicit marking of barriers runs somewhat counter to the original, appealing TAG idea that such constraints result from the CETM, according to which the position of the moved element and the verb it depends on must be in the same elementary structure, and from the further combination possibilities of this structure. In other words, in local formalisms with an extended domain of locality such as TAG or tree-local and set-local MCTAG, such constraints result from the form of the elementary structures and from the locality of the derivation operation. That is, they follow from general properties of the grammar and need not be stated explicitly. This is one of the aspects that make TAG so attractive from a linguistic point of view, and it is lost in nonlocal TAG variants.</Paragraph> <Paragraph position="5"> D-tree substitution grammars (DSGs) (Rambow, Vijay-Shanker, and Weir 2001) are another TAG variant one could use for scrambling. DSGs are a description-based formalism; that is, the objects a DSG deals with are tree descriptions. A problem with DSG is that its expressive power is probably too limited to deal with all natural-language phenomena: According to Rambow, Vijay-Shanker, and Weir (2001), it &quot;does not appear to be possible for DSG to generate the copy language&quot; (page 101). This means that the formalism is probably not able to describe cross-serial dependencies in Swiss German. Furthermore, DSG is nonlocal, and therefore, as in the case of V-TAG, additional constraints (path constraints) have to be placed on material interleaving with the different parts of an elementary structure.</Paragraph> <Paragraph position="6"> Another TAG variant using tree descriptions is local tree description grammar (TDG) (Kallmeyer 2001). Local TDG can be used for scrambling in a way similar to DSG or V-TAG. The languages generated by local TDGs are semilinear.
However, the formalism allows one to generate tree descriptions with underspecified dominance relations, and the process of resolving the remaining dominance links is nonlocal. Therefore one may face the same problem as with DSG and V-TAG. Furthermore, it has not yet been shown that the formalism is polynomially parsable, and it is not clear whether such parsing is possible without any additional constraint or limitation on the underspecified tree descriptions.</Paragraph> <Paragraph position="7"> A further TAG variant proposed in order to deal with scrambling is segmented tree-adjoining grammar (SegTAG) (Kulick 2000). SegTAG uses an operation on trees called segmented adjunction that consists partly of a standard TAG adjunction and partly of a kind of tree merging or tree unification. In this operation, two quite different things get mixed up: the more or less resource-sensitive adjoining operation of standard TAG, in which subtrees cannot be identified, and the completely different unification operation.</Paragraph> <Paragraph position="8"> Perhaps a more coherent definition of SegTAG could be achieved by using tree descriptions instead of trees, but we will not pursue this here.</Paragraph> <Paragraph position="9"> The formal properties of SegTAG are not clear. Kulick (2000) suggests that SegTAGs are probably in the class of LCFRSs, but there is no actual proof of this. However, if SegTAG is in LCFRS, the generative power of the formalism is probably too limited to deal with scrambling in a general way. But it seems that the limit the grammar imposes on the complexity of the scrambling data, although fixed, can be arbitrarily high. (With increasing complexity, however, the elementary trees get larger and larger.) This means that one can probably define a SegTAG that can analyze scrambling up to some complexity level n for any $n \in \mathbb{N}$.
(Kulick does not give a definition of what a complexity level is; perhaps it is the depth of scrambling.) In this sense, a general treatment of scrambling might be possible. We follow a similar approach in this article by proposing a mildly context-sensitive formalism that can deal with scrambling up to some fixed complexity limit n that can be chosen arbitrarily high.</Paragraph> <Paragraph position="10"> All these TAG variants are interesting with respect to scrambling, and they give a great deal of insight into what kinds of structures are needed for scrambling. But as explained above, none of them is entirely satisfying. The most convincing one is V-TAG: this formalism can deal with scrambling, lexicalized V-TAG is polynomially parsable, and the set of languages V-TAG generates contains the set of all tree-adjoining languages (TALs) (in particular, the copy language). Furthermore, a large range of word order phenomena has been treated with V-TAG, which has shown its usefulness for linguistic applications. But as already mentioned, V-TAG has the drawback of being a nonlocal formalism. For the reasons explained above, it is desirable to find a local TAG extension for scrambling (as opposed to the nonlocality of derivations in V-TAG, DSG, and nonlocal MCTAG) such that locality constraints for movements follow only from the form of the elementary structures and from the local character of derivations. This article proposes a local TAG variant that can deal with scrambling (at least with an arbitrarily large set of scrambling phenomena), that is polynomially parsable, and that properly extends TAG in the sense that the set of all TALs is a proper subset of the languages it generates.</Paragraph> <Paragraph position="11"> In section 2, we introduce tree-local MCTAG with shared nodes (SN-MCTAG) and, in particular, restricted SN-MCTAG (RSN-MCTAG), formalisms that extend TAG in the sense mentioned above.
Section 3 presents linguistic applications of RSN-MCTAG, in particular an analysis of scrambling. In section 4, a relation between RSN-MCTAG and range concatenation grammar (RCG) (Boullier 1999, 2000) is established. This relation allows us to show that certain subclasses of RSN-MCTAG are mildly context-sensitive and therefore, in particular, polynomially parsable. These subclasses do not cover all cases of long-distance scrambling, but, in contrast to TAG, they cover an arbitrarily large set, providing scrambling analyses that respect the CETM. This means that the limit they impose on the complexity of the scrambling data one can analyze is variable: Based on empirical studies, it can be chosen large enough that the grammar covers all scrambling cases that one assumes to occur.</Paragraph> <Paragraph position="12"> 4 More precisely, only the root of the new elementary tree and, in the case of an adjunction, also the foot node get identified with the node the new tree attaches to. There is no unification of whole subtrees. Consequently, every edge occurring in the derived tree comes from exactly one edge in an elementary tree, and every edge from the elementary trees used in the derivation occurs exactly once in the derived tree. In this sense the operation is resource-sensitive.</Paragraph> </Section> </Section> <Section position="3" start_page="192" end_page="200" type="metho"> <SectionTitle> 2.
The Formalism </SectionTitle> <Paragraph position="0"> An informal introduction of (restricted) tree-local MCTAG with shared nodes can also be found in Kallmeyer and Yoon (2004).</Paragraph> <Section position="1" start_page="192" end_page="193" type="sub_section"> <SectionTitle> 2.1 Motivation: The Idea of Shared Nodes </SectionTitle> <Paragraph position="0"> Let us consider again example (1) in order to illustrate the general idea of shared nodes.</Paragraph> <Paragraph position="1"> In standard TAG, nodes to which new elementary trees are adjoined or substituted disappear; that is, they are replaced by the new elementary tree. For example, after having performed the derivation steps shown in Figure 2, the root node of the reparieren tree does not exist any longer. It is replaced by the verspricht tree, and its daughters have become daughters of the foot node of the verspricht tree. That is, the root node of the derived tree is considered to belong only to the verspricht tree. Therefore, an adjunction at that node is an adjunction at the verspricht tree.</Paragraph> <Paragraph position="2"> However, this standard TAG view is not completely justified: In the derived tree, the root node and the lower S node might as well be considered to belong to reparieren, since they are results of identifying the root node of reparieren with the root and the foot node of verspricht.</Paragraph> <Paragraph position="3"> Therefore, we propose that the two nodes in question belong to both verspricht and reparieren. In other words, these nodes are shared by the two elementary trees. Consequently, they can be used to add new elementary trees to verspricht and (in contrast to standard TAG) also to reparieren.</Paragraph> <Paragraph position="4"> In the following, we use an MCTAG, and we assume tree-locality; that is, the nodes to which the trees of such a set are added must all belong to the same elementary tree. 
Standard tree-local MCTAGs are weakly and even strongly equivalent to TAGs, but they allow us to generate a richer set of derivation structures. In combination with shared nodes, tree-local multicomponent derivation extends the weak generative power of the grammar (see Figure 4 for a sample tree-local MCTAG with shared nodes that generates a language that is not a tree-adjoining language).</Paragraph> <Paragraph position="5"> Let us go back to example (1). Assume the tree set in Figure 3 for the scrambled NP es. If the idea of shared nodes is adopted, this tree set can be added to reparieren using the root of the derived tree for adjunction of the first tree and the NP$_{\mathrm{acc}}$ substitution node for substitution of the second tree. The operation is tree-local, since both nodes are part of the reparieren tree.</Paragraph> <Paragraph position="6"> 5 Actually, in a feature-structure-based TAG (FTAG) (Vijay-Shanker and Joshi 1988), the top feature structure of the root of the derived tree is the unification of the top of the root of verspricht and the top of the root of reparieren. The bottom feature structure of the lower S node is the unification of the bottom of the foot of verspricht and the bottom of the root of reparieren. In this sense, the root of the reparieren tree gets split into two parts: The upper part merges with the root node of the verspricht tree, and the lower part merges with the foot node of the verspricht tree.</Paragraph> <Paragraph position="7"> 6 In a way, the idea of node sharing is already present in description-based definitions of TAG-related formalisms (see Vijay-Shanker 1992; Rogers 1994; Kallmeyer 2001). This is why these formalisms are monotonic with respect to the node properties described in the tree descriptions.
However, the possibility of exploiting this in order to obtain multiple adjunctions combined with multicomponent tree descriptions has not been pursued so far.</Paragraph> <Paragraph position="8"> Figure 3 Derivation of (1) dass es der Mechaniker zu reparieren verspricht ('that the mechanic promises to repair it') using shared nodes.</Paragraph> <Paragraph position="9"> The notion of shared nodes means in particular that a node can be used for more than one adjunction. (For example, in Figure 3, two trees were adjoined at the root of the reparieren tree.) A similar idea has led to the definition of extended derivation in Schabes and Shieber (1994). For certain auxiliary trees, Schabes and Shieber allow more than one adjunction at the same node. However, the definition of the derived tree in Schabes and Shieber (1994) is such that if first $\beta_1$ and then $\beta_2$ are adjoined at some node $\mu$ (i.e., in the derivation tree there are edges from some $\gamma$ to $\beta_1$ and $\beta_2$ [...] can be adjoined in any order. This is important for obtaining all the possible permutations of scrambled elements.</Paragraph> </Section> <Section position="2" start_page="193" end_page="198" type="sub_section"> <SectionTitle> 2.2 Formal Definition of Tree-Local MCTAG with Shared Nodes </SectionTitle> <Paragraph position="0"> As already mentioned, the idea of tree-local MCTAG with shared nodes is the following: In the case of a substitution of an elementary tree $\alpha$ into an elementary tree $\gamma$, the root node of the subtree $\alpha$ in the resulting tree is considered to be part of both $\alpha$ and $\gamma$.
Similarly, when an elementary tree $\beta$ is adjoined at a node that is part of the elementary trees $\gamma_1, \ldots, \gamma_n$, then in the resulting tree, the root and foot node of $\beta$ are both considered to be part of $\gamma_1, \ldots, \gamma_n$ [...] or foot nodes starting from $\gamma'$, then each of these adjunctions can be considered an adjunction at $\gamma$, since it takes place at a node shared by $\gamma$, $\gamma'$, and all the subsequently adjoined trees.</Paragraph> <Paragraph position="1"> Therefore, one way to define SN-MCTAG is to refer to the standard TAG derivation tree: Define the grammar as an MCTAG, and then allow only derivation trees that satisfy the following tree-locality condition: For each instance $\{\gamma_1, \ldots, \gamma_n\}$</Paragraph> <Paragraph position="3"> of an elementary tree set in the derivation tree, there is a $\gamma$ such that each of the $\gamma_i$ is either a daughter of $\gamma$ or is linked to one of the daughters of $\gamma$ by a chain of adjunctions at root or foot nodes.</Paragraph> <Paragraph position="4"> As an example, consider the derivation tree for (1) in Figure 3. The trees used in the derivation are the reparieren tree, the verspricht tree, the Mechaniker tree, and the two trees es and $\varepsilon$-es from the tree set in Figure 3. $\varepsilon$-es is substituted into reparieren at position 21, and verspricht is adjoined to reparieren at position $\varepsilon$. Then, Mechaniker is substituted into verspricht at position 1, and es is adjoined to verspricht at position $\varepsilon$. The derivation is tree-local in the node-sharing sense: For the tree set $\{\varepsilon\text{-es}, \text{es}\}$, $\varepsilon$-es is a daughter of reparieren in the derivation tree, and es is linked to reparieren by a first adjunction of verspricht to reparieren and a further adjunction of es to the root of verspricht. In the following, we adopt this way of viewing derivations in SN-MCTAG as specific multicomponent TAG derivations; that is, we define SN-MCTAG as a variant of MCTAG.
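The tree-locality condition with shared nodes can be checked mechanically on a derivation tree. The Python sketch below is only an illustration under stated assumptions, not the article's formalization: a derivation tree is a list of (mother, daughter, Gorn address) edges, the empty string stands for the root address $\varepsilon$, the tree names (e.g., eps-es for $\varepsilon$-es) are ad hoc, and the foot-node addresses in FOOT are hypothetical placeholders, since the exact shapes of the elementary trees in Figures 2 and 3 are not reproduced here. Encoding the derivation of (1) shows that the set {es, $\varepsilon$-es} is not tree-local in the ordinary sense (its trees have different mothers) but is tree-local in the node-sharing sense.

```python
from typing import Dict, List, Set, Tuple

# (mother, daughter, Gorn address); "" is the root address epsilon.
Edge = Tuple[str, str, str]

# Derivation tree of example (1), as described in the text.
EDGES: List[Edge] = [
    ("reparieren", "eps-es", "21"),     # substitution of the empty NP
    ("reparieren", "verspricht", ""),   # adjunction at the root of reparieren
    ("verspricht", "Mechaniker", "1"),  # substitution of the subject NP
    ("verspricht", "es", ""),           # adjunction at the root of verspricht
]

# Foot-node addresses of the auxiliary trees (hypothetical placeholders).
FOOT: Dict[str, str] = {"verspricht": "22", "es": "2"}


def sn_daughters(edges: List[Edge], g: str) -> Dict[str, str]:
    """Classify every SN-daughter of g as 'primary' or 'secondary'.

    Primary: an ordinary daughter of g. Secondary: reached from a daughter
    of g by a chain of adjunctions at root or foot nodes, i.e., at nodes
    that are still shared with g.
    """
    result: Dict[str, str] = {}
    stack: List[str] = []
    for m, d, _ in edges:
        if m == g:
            result[d] = "primary"
            stack.append(d)
    while stack:
        t = stack.pop()
        shared = {"", FOOT.get(t, "")}  # root, plus foot if t is auxiliary
        for m, d, p in edges:
            if m == t and p in shared and d not in result:
                result[d] = "secondary"
                stack.append(d)
    return result


def tree_local(edges: List[Edge], tree_set: Set[str]) -> bool:
    """Ordinary tree-locality: all trees of the set have one and the same mother."""
    mother = {d: m for m, d, _ in edges}
    return all(t in mother for t in tree_set) and len({mother[t] for t in tree_set}) == 1


def sn_tree_local(edges: List[Edge], tree_set: Set[str]) -> bool:
    """Tree-locality with shared nodes: some tree has all of them as SN-daughters."""
    return any(tree_set.issubset(sn_daughters(edges, g)) for g in {m for m, _, _ in edges})


print(sn_daughters(EDGES, "reparieren"))
# {'eps-es': 'primary', 'verspricht': 'primary', 'es': 'secondary'}
print(tree_local(EDGES, {"es", "eps-es"}), sn_tree_local(EDGES, {"es", "eps-es"}))
# False True
```

Only attachments at a root ("") or foot address extend a chain, mirroring the informal condition; the address 21 of the $\varepsilon$-es substitution matters only in that it makes $\varepsilon$-es an ordinary (primary) daughter of reparieren.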
Viewing derivations this way avoids formalizing a notion of shared nodes, even though shared nodes were the starting motivation for the formalism.</Paragraph> <Paragraph position="5"> We assume a definition of TAG as a tuple $G = \langle I, A, N, T \rangle$, with I the set of initial trees, A the set of auxiliary trees, and N and T the sets of nonterminal and terminal node labels, respectively (see, for example, Vijay-Shanker [1987] for a formal definition of TAG). Now we formally introduce multicomponent tree-adjoining grammars (Joshi</Paragraph> <Paragraph position="7"> As in TAG, a derivation starts from an initial tree, and in the final derived tree there must not be any remaining obligatory adjunction constraint, and all leaves must be labeled by a terminal or by the empty word.</Paragraph> <Paragraph position="8"> In each MCTAG derivation step, an elementary tree set is chosen, and the trees from this set are added to the already derived tree. Since they are added to pairwise different nodes, one can just as well add them one after the other; that is, each multicomponent derivation in an MCTAG $G = \langle I, A, N, T, \mathcal{A} \rangle$ corresponds to a derivation in the TAG $G_{TAG} := \langle I, A, N, T \rangle$.</Paragraph> <Paragraph position="9"> 8 $\mathcal{P}(X)$ is the set of all subsets of a set X.</Paragraph> <Paragraph position="10"> 9 As usual (see Vijay-Shanker 1987; Weir 1988), $\gamma[p, \gamma']$ is defined as follows: If $\gamma'$ is (derived from) an initial tree and the node at position p in $\gamma$ is a substitution node, then $\gamma[p, \gamma']$ is the tree one obtains by substitution of $\gamma'$ into $\gamma$ at node position p. If $\gamma'$ is (derived from) an auxiliary tree and the node at position p in $\gamma$ is an internal node, then $\gamma[p, \gamma']$ is the tree one obtains by adjunction of $\gamma'$ to $\gamma$ at node position p. Otherwise, $\gamma[p, \gamma']$ is undefined.</Paragraph> <Paragraph position="12"> Let us define the TAG derivation tree of such a multicomponent derivation as the corresponding derivation tree in $G_{TAG}$.
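The operation $\gamma[p, \gamma']$ of footnote 9 can be made concrete with a small Python sketch. It is an illustration, not the article's definition: trees are plain Node objects, Gorn addresses are tuples of positive integers (so () is the root and (2, 1) is address 21), the label-matching test is the usual TAG assumption rather than something stated in footnote 9, and the elementary trees for reparieren and verspricht are simplified, hypothetical shapes rather than the trees of Figure 2. As in footnote 9, substitution applies only at substitution nodes, adjunction only at internal nodes, and the result is undefined (None) otherwise.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Address = Tuple[int, ...]  # Gorn address: () is the root, (2, 1) is "21"


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    kind: Optional[str] = None  # "subst" (substitution node), "foot", or None


def node_at(t: Node, p: Address) -> Node:
    for i in p:
        t = t.children[i - 1]  # daughters are numbered from 1
    return t


def foot_address(t: Node) -> Optional[Address]:
    """Gorn address of the foot node, or None for an initial tree."""
    if t.kind == "foot":
        return ()
    for i, c in enumerate(t.children, start=1):
        a = foot_address(c)
        if a is not None:
            return (i,) + a
    return None


def combine(g: Node, p: Address, g2: Node) -> Optional[Node]:
    """g[p, g2]: substitute or adjoin g2 at address p of g; None if undefined."""
    g, g2 = copy.deepcopy(g), copy.deepcopy(g2)  # leave the inputs untouched
    try:
        target = node_at(g, p)
    except IndexError:
        return None
    if target.label != g2.label:  # assumption: labels must match, as usual in TAG
        return None
    foot = foot_address(g2)
    if foot is None:  # initial tree: a substitution node is required
        if target.kind != "subst":
            return None
    else:  # auxiliary tree: an internal node is required
        if target.kind is not None or not target.children:
            return None
        foot_node = node_at(g2, foot)
        foot_node.children, foot_node.kind = target.children, None
    if not p:
        return g2
    node_at(g, p[:-1]).children[p[-1] - 1] = g2
    return g


def frontier(t: Node) -> List[str]:
    """Left-to-right sequence of leaf labels."""
    if not t.children:
        return [t.label]
    return [w for c in t.children for w in frontier(c)]


# Simplified, hypothetical elementary trees (not the exact trees of Figure 2).
reparieren = Node("S", [Node("VP", [Node("NP", kind="subst"),
                                    Node("V", [Node("zu reparieren")])])])
verspricht = Node("S", [Node("NP", kind="subst"),
                        Node("VP", [Node("S", kind="foot"),
                                    Node("V", [Node("verspricht")])])])
mechaniker = Node("NP", [Node("der Mechaniker")])
es = Node("NP", [Node("es")])

t = combine(reparieren, (), verspricht)  # adjunction at the root
t = combine(t, (1,), mechaniker)         # substitution at the subject NP
t = combine(t, (2, 1, 1, 1), es)         # substitution at the object NP
print(" ".join(frontier(t)))  # der Mechaniker es zu reparieren verspricht
```

The derived string shows the unscrambled order: with ordinary substitution and adjunction, es ends up after der Mechaniker, which is exactly the problem discussed for example (1).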
We can then define tree-local, set-local, and nonlocal MCTAG, as well as the different variants of SN-MCTAG this article deals with, by putting different constraints on this derivation tree. Note that for each</Paragraph> <Paragraph position="14"> [...], the node address p in the derived tree $\gamma$ points at a node that is at some address $p'$ in some elementary tree $\gamma'$ that was already added ($\gamma'$ and $p'$ are unique). In this case, the TAG derivation tree contains an edge from $\gamma'$</Paragraph> <Paragraph position="16"> A TAG derivation tree can be considered a tuple of nodes and edges. As usual in finite trees, the edges are directed from the mother node to the daughter. Linear precedence is not needed in a derivation tree, since it does not influence the result of the derivation. So a derivation tree is a tuple $\langle N, E \rangle$, with N a finite set of instances of elementary trees and $E \subseteq N \times N \times \mathbb{N}^*$, where $\mathbb{N}^*$</Paragraph> <Paragraph position="18"> is the set of Gorn addresses. We define the parent relation as the relation between mothers and daughters in a derivation tree, the dominance relation as the reflexive transitive closure of the parent relation, and the node-sharing relation as the relation between nodes that either are mother and daughter or are linked first by a substitution/adjunction and then by a chain of adjunctions at root or foot nodes:</Paragraph> <Paragraph position="20"> [...] being an auxiliary tree with foot node address p. The TAG derivation trees for MCTAG derivations have certain properties resulting from the requirement that the trees of an instance of an elementary tree set must be added simultaneously to the already derived tree: First, if an elementary tree set is used, then all trees from this set must occur in the derivation tree. Second, one tree from an elementary tree set cannot be substituted or adjoined into another tree from the same set. Third, two tree sets cannot be interleaved.
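The first two of these properties can be checked directly on a derivation tree $\langle N, E \rangle$. The following Python sketch is a hedged illustration: edges are (mother, daughter, Gorn address) triples as in the definition above, the dictionary of tree-set instances and its key es-set are hypothetical names, and the third property (no interleaving of tree sets) is deliberately left out, because its precise formulation is not reproduced in this section.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, str]  # (mother, daughter, Gorn address); "" is the root


def tree_instances(edges: List[Edge], root: str) -> Set[str]:
    """All elementary-tree instances occurring in the derivation tree."""
    return {root} | {m for m, _, _ in edges} | {d for _, d, _ in edges}


def mc_violations(edges: List[Edge], root: str,
                  sets: Dict[str, FrozenSet[str]]) -> List[str]:
    """Check the first two properties of MCTAG derivation trees.

    1. If one tree of a set instance occurs, all of its trees occur.
    2. No tree of a set instance is substituted or adjoined into another
       tree of the same instance (no mother-daughter edge inside a set).
    """
    problems: List[str] = []
    present = tree_instances(edges, root)
    for name, members in sets.items():
        used = members.intersection(present)
        if used and used != members:
            problems.append(f"{name}: only some of its trees occur")
    owner = {t: name for name, members in sets.items() for t in members}
    for m, d, _ in edges:
        if m in owner and owner.get(d) == owner[m]:
            problems.append(f"{owner[m]}: {d} attached to {m} of the same set")
    return problems


# Derivation tree of example (1); only the tree set for es is non-unary.
EDGES: List[Edge] = [
    ("reparieren", "eps-es", "21"),
    ("reparieren", "verspricht", ""),
    ("verspricht", "Mechaniker", "1"),
    ("verspricht", "es", ""),
]
SETS = {"es-set": frozenset({"es", "eps-es"})}

print(mc_violations(EDGES, "reparieren", SETS))      # []
print(mc_violations(EDGES[:3], "reparieren", SETS))  # es is missing
bad = EDGES[:3] + [("eps-es", "es", "")]             # artificial violation
print(mc_violations(bad, "reparieren", SETS))        # es inside its own set
```

Unary tree sets never trigger either check, which is why they can be left out of SETS.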
For nonlocal MCTAG, these three properties are all the constraints the TAG derivation tree needs to satisfy.</Paragraph> <Paragraph position="21"> 10 This TAG derivation tree is not the MCTAG derivation tree defined in Weir (1988). The nodes of Weir's MCTAG derivation trees are labeled by sequences of elementary trees (i.e., by elementary tree sets), and each edge stands for simultaneous adjunctions/substitutions of all elements of such a set. [...] for $2 \le i \le n$.</Paragraph> <Paragraph position="22"> The proof of this lemma is given in the appendix. The lemma gives us a way to characterize nonlocal MCTAG via the properties of the TAG derivation trees the grammar licenses. With this characterization we get rid of the original simultaneity requirement: The corresponding properties are now captured in the three constraints (MC1)-(MC3). But since these constraints need to hold only for the TAG derivation trees that correspond to derived trees in the tree language, subderivation trees need not satisfy them. In other words, two trees $\gamma_1$ and $\gamma_2$ from the same elementary tree set can be added at different moments of the derivation, as long as the final complete TAG derivation tree satisfies (MC1)-(MC3).</Paragraph> <Paragraph position="23"> We now define tree-local, set-local, SN-tree-local, and SN-set-local TAG derivation trees by imposing further conditions. Basically, the difference between tree-local and set-local derivation trees on the one hand and their SN variants on the other is that the former are defined via the parent relation, whereas the latter are defined via the node-sharing relation.</Paragraph> <Paragraph position="24"> Definition 3 Let $G = \langle I, A, N, T, \mathcal{A} \rangle$ be an MCTAG. Let $D = \langle N, E \rangle$ be a TAG derivation tree for some [...] N, there is an instance $\Gamma$ of an elementary tree set such that for all $1 \le i \le n$, there is a t [...]</Paragraph> <Paragraph position="26"> The formalism we are proposing for scrambling is MCTAG with SN-tree-local TAG derivation trees.
We call these grammars tree-local MCTAGs with shared nodes: Definition 4 Let G be an MCTAG. G is a tree-local MCTAG with shared nodes iff the set of trees generated by G, $L_T(G)$, is defined as the set of those trees that can be derived with an SN-tree-local multicomponent TAG derivation tree in G. As usual, the string language $L_S(G)$ is then defined as the set of strings yielded by the trees in $L_T(G)$.</Paragraph> <Paragraph position="27"> All tree-adjoining languages can be generated by SN-MCTAGs, since a TAG corresponds to an MCTAG with unary multicomponent sets. For such an MCTAG, each TAG derivation tree is trivially SN-tree-local. In other words, in this case the set of generated trees is the same, whether the grammar is considered a TAG, a tree-local MCTAG, or an SN-MCTAG.</Paragraph> <Paragraph position="28"> In particular, all TAG analyses proposed so far can be maintained, since each TAG is trivially also an instance of SN-MCTAG. SN-MCTAG is a proper extension of TAG (and of tree-local MCTAG) in the sense that there are languages that can be generated by an SN-MCTAG but not by a TAG. As an example, consider Figure 4, which shows an SN-MCTAG for $\{www \mid w \in T^*\}$. Similarly, for every copy language $\{w^n \mid w \in T^*\}$ with a fixed $n \in \mathbb{N}$,</Paragraph> <Paragraph position="30"> an SN-MCTAG can be found. Other languages that can be generated by SN-MCTAG and that are not TALs are the counting languages $\{a_1^m a_2^m \cdots a_k^m \mid m \geq 0\}$ with $k \geq 5$</Paragraph> <Paragraph position="32"> (for $k \le 4$, these languages are tree-adjoining languages).</Paragraph> <Paragraph position="33"> There are two crucial differences between V-TAG and SN-MCTAG: First, in V-TAG, the adjunctions of auxiliary trees from the same set need not be simultaneous. In this respect, V-TAG differs not only from SN-MCTAG, but from any of the different MCTAGs mentioned above. 11 However, viewing a TAG as an SN-MCTAG allows us to obtain a richer set of SN-derivation structures, as introduced in the next section. This is exploited in Kallmeyer (2002) for semantics.</Paragraph> <Paragraph position="34"> Second, V-TAG is nonlocal in the sense of nonlocal MCTAG, whereas SN-MCTAG is local, even though the locality is not based on the parent relation in the derivation tree, as is the case in standard local MCTAG, but on the SN-dominance relation in the derivation tree. As a consequence of this locality, we do not need dominance links (i.e., dominance constraints that have to be satisfied by the derived tree) in SN-MCTAG, in contrast to other TAG variants for scrambling. The locality condition on the derivation sufficiently constrains the possibilities for attaching the trees from elementary tree sets: Different trees from a tree set attach to different nodes of the same elementary tree. Consequently, the dominance relations among these nodes determine the dominance relations among the different trees from the tree set. Therefore extra dominance links are not necessary. This is different for nonlocal TAG variants such as V-TAG or DSG, in which one can in principle attach the different components of an elementary structure at arbitrary nodes in the derived tree.</Paragraph> </Section> <Section position="3" start_page="198" end_page="200" type="sub_section"> <SectionTitle> 2.3 SN-MCTAG and Scrambling: Formal Considerations </SectionTitle> <Paragraph position="0"> Figure 5 shows an SN-MCTAG generating a language that cannot even be generated by linear context-free rewriting systems (see Becker, Rambow, and Niv [1992] for a proof), and therefore not by set-local MCTAG.
This example, however, concerns neither weak nor strong generative capacity, but something that Becker, Rambow, and Niv (1992) call derivational capacity: the derivation of $n$ [...]</Paragraph> <Paragraph position="2"> [...] must be such that the $\pi(i)$th $n$ and the $i$th $v$ come from the same elementary tree set in the grammar.</Paragraph> <Paragraph position="3"> The grammar in Figure 5 works as follows: Each derivation starts with $\alpha$. Then a first instance of the tree set (yielding $n_1$ and $v_1$) is added to the N and V nodes in $\alpha$. For each further instance of the tree set (yielding $n$ [...]</Paragraph> <Paragraph position="5"> [...] is adjoined to the root node of the $\beta$ [...]</Paragraph> <Paragraph position="7"> [...] adjunctions except the first occur at root nodes, and consequently all $\beta_v$ are (primary or secondary) SN-daughters of $\alpha$.</Paragraph> <Paragraph position="9"> [...] can be adjoined to any of the root or foot nodes of the $\beta_n$ that have already been added, since in this way all adjunctions of $\beta_n$ except the first one occur at root or foot nodes, and therefore all these $\beta_n$ are SN-daughters of $\alpha$. This allows us to place $n$ [...]</Paragraph> <Paragraph position="11"> [...]}, and thereby any permutation of the $n$s can be obtained. Since all nodes in the derivation tree are SN-daughters of $\alpha$, the derivation is SN-tree-local. Note that in the grammar in Figure 5, there is no NA constraint on the foot node of the first auxiliary tree in the tree set. This is crucial for allowing all permutations of the $n$ [...]</Paragraph> <Paragraph position="13"> [...]. In this respect, the elementary trees differ from what is usually done in TAG. Becker, Rambow, and Niv (1992) argue that a formalism that cannot generate the language in Figure 5 is not able to analyze scrambling adequately. We think, however, that this language is not exactly what one needs for scrambling. Figure 5</Paragraph> <Paragraph position="15"> [...] are in the same elementary tree set and they were added in the $i$th derivation step for all $i$, $1 \le i \le k$, and $\pi$ is a permutation of $(1, \ldots, k)\}$.</Paragraph> <Paragraph position="16"> Figure 6 Predicate-argument structure for SCR. The assumption underlying the language in Figure 5 is that $n$ [...]</Paragraph> <Paragraph position="18"> [...] should be added to $v$ [...]</Paragraph> <Paragraph position="20"> [...] the additional assumption that argument NPs are added by substitution, then one can require that the argument NPs have already been substituted (this is what Joshi, Becker, and Rambow [2000] call the weak co-occurrence constraint), that is, that the tree for [...]</Paragraph> <Paragraph position="22"> [...]. In this case, the language in Figure 5 is an appropriate test language for scrambling. But we do not want to make this assumption.</Paragraph> <Paragraph position="23"> Furthermore, there are more predicate-argument dependencies: $v$ [...]</Paragraph> <Paragraph position="25"> [...] for $i \geq 2$. This is what Joshi, Becker, and Rambow (2000) call the strong co-occurrence constraint. In other words, the dependency tree should be as in Figure 6.</Paragraph> <Paragraph position="26"> In addition to the permutation of the $n$ [...]</Paragraph> <Paragraph position="28"> [...] can be moved leftward, as long as they do not permute among themselves. Consequently, for scrambling data (without extraposition), one would rather want to generate the following language: SCR :=</Paragraph> <Paragraph position="30"> [...] in w for all $1 \le i \le k$ and $v$ [...]</Paragraph> <Paragraph position="32"> [...] in w for all $2 \le i \le k\}$ with the derivation structure in Figure 6. An SN-MCTAG generating this language is shown in Figure 7.</Paragraph> <Paragraph position="33"> The SN-MCTAG in Figure 7 yields the following derivations: Either start with $\alpha$ [...]} are added.
These sets can be added in any order; the auxiliary tree is always adjoined to the root node of the already derived tree, which is shared by all auxiliary trees that have been used so far and by the first $\alpha_1$. The initial tree is primarily substituted into the argument slot it fills. Kallmeyer Multicomponent TAGs with Shared Nodes So the only condition for adding such a tree set is that the verb it depends on has already been added, since the tree of this verb provides the substitution node for the initial tree. Therefore, since the lexical material is always left of the foot node, one obtains that $v$</Paragraph> <Paragraph position="35"> all $1 \le i \le k$.</Paragraph> <Paragraph position="36"> Note that in Figure 7, for a scrambled $n_i$, the substitution node is filled with an empty node, while $n_i$ is adjoined higher, at a node that is not yet available in the elementary structure of $v_i$. So the combination of $n$</Paragraph> <Paragraph position="38"> cannot be precompiled here.</Paragraph> </Section> </Section> <Section position="4" start_page="200" end_page="201" type="metho"> <SectionTitle> 2.4 Restricted SN-MCTAG </SectionTitle> <Paragraph position="0"> When the formal properties of SN-MCTAG are examined, it becomes clear that the formalism is hard to compare to other local TAG-related formalisms, since in the derivation tree, arbitrarily many trees can be secondary SN-daughters of a single elementary tree, such that these secondary links are considered to be adjunctions to that tree. This means that these secondary links are relevant for the SN-tree-locality of the derivation. An example is the grammar in Figure 5, in which, in each derivation step, the relevant node-sharing relations are the links between $\alpha$ and the two auxiliary trees of the new set.</Paragraph> <Paragraph position="1"> This means that for a word of length $k$, there are $k$ SN-daughters of $\alpha$ that are relevant for the SN-tree-locality of the derivation.
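The way the grammar in Figure 5 reaches every order of the $n$s can be illustrated at the string level. The Python sketch below is a deliberate simplification of our own: it ignores the trees and models only the effect described for Figure 5, namely that a new $n_i$ can be adjoined at the root or the foot node of any previously added $\beta_n$ and can therefore be placed anywhere among the $n$s added so far. The function name `reachable_orders` is hypothetical.

```python
from itertools import permutations


def reachable_orders(k):
    """Orders of n_1 ... n_k reachable by inserting each new n_i into any gap.

    Simplified string-level model (an assumption): adding the i-th tree set
    places n_i into one of the i gaps around n_1 ... n_{i-1}, which is the
    effect of adjoining its tree at the root or foot node of an earlier beta_n.
    """
    orders = [()]
    for i in range(1, k + 1):
        orders = [
            order[:gap] + (i,) + order[gap:]
            for order in orders
            for gap in range(len(order) + 1)
        ]
    return set(orders)


# For small k, every permutation of (1, ..., k) is reachable.
for k in range(1, 6):
    assert reachable_orders(k) == set(permutations(range(1, k + 1)))
```

Enumerating all insertion choices for $k$ tree sets yields exactly the $k!$ permutations, which is the string-level counterpart of the claim that any permutation of the $n$s can be obtained.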
The grammar in Figure 5 indicates that this property of SN-MCTAG is at least partly responsible for the fact that SN-MCTAG can generate languages that are not even mildly context-sensitive (i.e., that are not in the class of languages that can be generated by LCFRSs). However, it would be desirable to stay inside the class of mildly context-sensitive languages. Therefore, in the following, we define a restricted version, RSN-MCTAG, that limits the number of relevant secondary SN-daughters of an elementary tree. The restriction is obtained as follows: We require that in each derivation step, among the SN-relations between the old $\gamma$ and the new set $\Gamma$, there be at least one primary SN-relation. The number of primary SN-daughters of a specific elementary tree is limited, since the primary SN-daughters correspond to substitutions/adjunctions at pairwise different nodes and the number of nodes in an elementary tree is limited. Consequently, the number of relevant secondary SN-daughters for a node in the derivation tree is limited as well.</Paragraph> <Paragraph position="2"> An example of a derivation satisfying the new constraint is that in Figure 3, in which es is a secondary SN-daughter of reparieren, while the second element of the tree set, $\varepsilon$-es, is a primary SN-daughter of reparieren.</Paragraph> <Paragraph position="3"> The first condition of the definition says that the grammar is SN-tree-local, and the second condition ensures that at least one of the relevant SN-daughters of $\gamma$ is a primary SN-daughter, that is, an actual daughter of $\gamma$ in the TAG derivation tree.</Paragraph> <Paragraph position="4"> As with SN-MCTAG, all tree-adjoining languages can also be generated by RSN-MCTAGs. The sample grammars in Figures 4 and 5 are not RSN-MCTAGs. We suspect that there is no RSN-MCTAG that generates the language in Figure 5.
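The RSN restriction can be phrased as a check on a single derivation step. The sketch below assumes a simplified encoding of our own in which every tree of the newly added set is recorded with one host tree and a flag saying whether the link is primary (an actual daughter in the TAG derivation tree) or secondary; real SN-derivations can involve several shared nodes, which this encoding ignores. The names `SNLink` and `step_is_rsn_local` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SNLink:
    """How one tree of the newly added set attaches (hypothetical encoding).

    host:    the elementary tree gamma of which this tree is an SN-daughter
    primary: True for an actual daughter in the TAG derivation tree,
             False for a secondary SN-daughter obtained via node sharing
    """

    tree: str
    host: str
    primary: bool


def step_is_rsn_local(links):
    """Check one derivation step that adds a complete elementary tree set.

    SN-tree-locality: all trees of the set are SN-daughters of the same gamma.
    RSN restriction: at least one of these SN-relations is primary.
    """
    hosts = {link.host for link in links}
    return len(hosts) == 1 and any(link.primary for link in links)


# Figure 3: es is a secondary and epsilon-es a primary SN-daughter of reparieren.
figure_3 = [
    SNLink("es", "reparieren", primary=False),
    SNLink("epsilon-es", "reparieren", primary=True),
]
print(step_is_rsn_local(figure_3))  # True
```

A step in which both trees of the set were only secondary SN-daughters would fail the check, which is exactly the kind of step the restriction rules out.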
But the grammar in Figure 7 for the language SCR is an RSN-MCTAG.</Paragraph> <Paragraph position="5"> It can be shown that for the TAG derivation trees of an RSN-MCTAG, the following holds: For each instance of an elementary tree set $\Gamma$, the $\gamma$ to which all elements of $\Gamma$ are linked by node-sharing relations with at least one primary link is unique (which is not necessarily the case for general SN-MCTAG). This is formulated in the following }, with $\gamma$ being the unique elementary tree as described in the lemma, all $\langle\gamma, \gamma$</Paragraph> <Paragraph position="7"> adjunction links in $D$. The proof of the lemma is given in the appendix.</Paragraph> <Paragraph position="8"> Now we introduce the SN-derivation structure of a TAG derivation tree $D$ in an RSN-MCTAG. It consists of $D$ enriched with additional links for the secondary adjunctions. These links are labeled with the positions of the first substitutions/adjunctions on the chain that corresponds to the secondary adjunctions.</Paragraph> <Paragraph position="9"> Definition 6 Let $G = \langle I, A, N, T, \mathcal{A}\rangle$ be an RSN-MCTAG. Let $D = \langle N, E\rangle$ be a TAG derivation tree in $G$. The SN-derivation structure of $D$, $D$ That this lemma holds is nearly immediate: Each secondary adjunction must be associated with a primary adjunction or substitution into the same tree instance. There are at most $k$ primary adjunctions or substitutions into any tree instance if $k$ is the maximal number of nodes per elementary tree. Each of these primary operations adds one tree of a set with at most $n + 1$ trees, so at most $n$ further trees can accompany it as secondary adjunctions. Consequently, there are at most $k \times n$ secondary adjunctions per node if $n + 1$ is the maximal number of trees per elementary tree set.</Paragraph> <Paragraph position="10"> In linguistic applications, the SN-derivation structure is intended to reflect the predicate-argument dependencies of a sentence in the following way: For each tree in the SN-derivation structure, if this tree is secondarily adjoined to some other tree $\gamma$, then it depends on $\gamma$. Otherwise it depends on its mother node in the TAG derivation tree.
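The rule for reading dependencies off the SN-derivation structure amounts to a simple lookup. The following sketch assumes that the TAG derivation tree is given as a mother map and the secondary adjunctions as a separate map; both encodings and all tree names are illustrative assumptions. A tree depends on the tree it is secondarily adjoined to if there is one, and on its mother otherwise.

```python
def dependency_heads(mother, secondary_host):
    """Map each non-root tree to the tree it depends on.

    mother:         tree -> its mother node in the TAG derivation tree
    secondary_host: tree -> the tree gamma it is secondarily adjoined to
    A secondarily adjoined tree depends on gamma; every other tree depends
    on its mother in the TAG derivation tree.
    """
    return {tree: secondary_host.get(tree, parent) for tree, parent in mother.items()}


# Hypothetical structure: n2 hangs below v1 in the TAG derivation tree but is
# secondarily adjoined to v2, so it is read as an argument of v2.
mother = {"v2": "v1", "n1": "v1", "n2": "v1"}
secondary_host = {"n2": "v2"}
print(dependency_heads(mother, secondary_host))
# {'v2': 'v1', 'n1': 'v1', 'n2': 'v2'}
```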
In this way, the grammar for SCR in Figure 7 yields the desired dependency structure.</Paragraph> </Section> <Section position="5" start_page="201" end_page="206" type="metho"> <SectionTitle> 3. Linguistic Applications </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="201" end_page="206" type="sub_section"> <SectionTitle> 3.1 Scrambling with RSN-MCTAG </SectionTitle> <Paragraph position="0"> In this section, we present a small German grammar that allows us to analyze some cases of scrambling. The aim is not an exhaustive treatment of the phenomenon, but just to show that, in principle, an analysis of scrambling in German is possible using RSN-MCTAG. The data to which we restrict ourselves are word order variations of example (3) without extraposition, that is, under the assumption that the order of the verbs is zu reparieren zu versuchen verspricht: (3) ...dass er dem Kunden das Fahrrad zu reparieren zu versuchen verspricht. The elementary trees and tree sets for example (3) are shown in Figure 8. In contrast to standard TAG practice, which is often guided by technical considerations, we represent all arguments of a verb (including an embedded VP) by substitution nodes. For those parts that might be scrambled, there is a single elementary tree (for the case without scrambling) and a tree set used for scrambling. The tree set contains an auxiliary tree that can be primarily or secondarily adjoined to some root node and a tree with the empty word that is intended to fill the argument position. In order to avoid spurious ambiguities, we assume that whenever a derivation using the single elementary tree is possible, it is chosen.</Paragraph> <Paragraph position="1"> A scrambled element always adjoins to a VP node, and the scrambled element is to the left of the foot node.
Therefore it precedes everything that is below or on the</Paragraph> <Paragraph position="2"> right of the VP node to which it adjoins. Figure 8 Elementary trees for scrambling. Consequently, given the form of the verbal elementary trees in Figure 8, in which the verb is always below or to the right of all VP nodes allowing adjunction, the order $x\,v$, for $x$ a nominal or verbal argument of $v$, is always respected.</Paragraph> <Paragraph position="3"> For an element (a lexical item), the tree set for scrambling is used whenever one of the following three cases holds:</Paragraph> <Paragraph position="5"> Scrambling of depth more than one out of the element takes place.</Paragraph> <Paragraph position="6"> The element intervenes between some element A (on its right) and some element B (on its left) scrambled out of A, and the element itself does not belong to A.</Paragraph> <Paragraph position="7"> In other words, the fact that the set for scrambling is used for some element does not necessarily mean that this element is scrambled. It just means that one of the three cases above holds, that is, that some scrambling around this element takes place.</Paragraph> <Paragraph position="8"> One could actually do without the single trees and always use the tree sets. In this case, even if no scrambling took place, all argument slots would be filled by empty words, and all lexical material would be adjoined to the root node of the derived tree. At first glance, this seems rather odd. But if one considers the substitution nodes not as argument slots but rather as a kind of subcategorization feature marking which arguments need to be added, an analysis using only the tree sets makes sense. However, for this article, we keep the single trees.</Paragraph> <Paragraph position="9"> For example (3), a derivation without secondary adjunctions and using only the single trees is possible.
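The ordering property stated for the verbal elementary trees, namely that an argument $x$ always precedes its verb $v$, can be checked mechanically for a given word order. The sketch below encodes the dependencies of example (3) as we read them off the analysis in this article (er, dem Kunden, and zu versuchen depend on verspricht; zu reparieren on zu versuchen; das Fahrrad on zu reparieren). The encoding and the function name are our own assumptions, and the check covers only this ordering constraint, not grammaticality.

```python
def arguments_precede_heads(words, head_of):
    """True if every argument occurs to the left of the verb it depends on.

    words:   the sentence as a list of constituents (each assumed unique)
    head_of: argument constituent -> verb it depends on
    """
    position = {word: index for index, word in enumerate(words)}
    return all(position[head] > position[arg] for arg, head in head_of.items())


HEAD_OF = {
    "er": "verspricht",
    "dem Kunden": "verspricht",
    "zu versuchen": "verspricht",
    "zu reparieren": "zu versuchen",
    "das Fahrrad": "zu reparieren",
}
canonical = ["er", "dem Kunden", "das Fahrrad", "zu reparieren", "zu versuchen", "verspricht"]
np_permuted = ["das Fahrrad", "er", "dem Kunden", "zu reparieren", "zu versuchen", "verspricht"]
verb_before_object = ["er", "dem Kunden", "zu reparieren", "das Fahrrad", "zu versuchen", "verspricht"]

print(arguments_precede_heads(canonical, HEAD_OF))           # True
print(arguments_precede_heads(np_permuted, HEAD_OF))         # True
print(arguments_precede_heads(verb_before_object, HEAD_OF))  # False
```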
Let us consider the following word orders as examples of how secondary adjunction is used for scrambling: Consequently, for versuchen and er, the sets with two trees are used, whereas for all the other elements, the single trees can be used. In example (5), the reparieren-VP is scrambled out of the versuchen-VP, with dem Kunden intervening between the two. Therefore, the tree sets are used for reparieren and dem Kunden. For versuchen, the single tree can be used, since the scrambling out of versuchen is of depth one. In example (6), we have the same scrambling as in example (4), and additionally, das Fahrrad is scrambled out of the reparieren-VP and the versuchen-VP (depth two). Consequently, in this case one needs tree sets for Fahrrad, er, versuchen, and reparieren.</Paragraph> <Paragraph position="10"> Let us consider the analysis of example (4): Starting with verspricht, the single tree for dem Kunden and the tree set for versuchen (with adjunction of the auxiliary tree at the root) are added. This leads to the first tree in Figure 9. The VP nodes in boldface type in the figure are shared by versuchen and verspricht; that is, they can be used for further adjunction at the verspricht tree. (Of course, only the root node can be used for adjunction, since the other nodes have NA constraints.) It does not matter in which order er and zu reparieren are added. For er, the tree set is used. The auxiliary tree is secondarily adjoined to the root node, and the initial tree is substituted for the NP$_{\mathrm{nom}}$ node in the verspricht tree. This leads to the second tree in Figure 9. For reparieren and das Fahrrad, the single trees are added below the VP substitution node in the versuchen tree. The corresponding SN-derivation structure (see Figure 9) contains the desired predicate-argument dependencies. The TAG derivation tree is RSN-tree-local.</Paragraph> <Paragraph position="11"> Next, let us consider example (5).
Here, the single trees for er and versuchen are added to verspricht. This leads to the first tree in Figure 10. The VP node in boldface type in the figure belongs to verspricht and versuchen. It is next used for secondary adjunction of dem Kunden to the verspricht tree. The initial tree is substituted at the NP$_{\mathrm{dat}}$ slot. This leads to the second tree. Here, the bold VP node belongs to verspricht, versuchen, and Kunde. It is next used for secondary adjunction of the auxiliary tree of reparieren to versuchen, while the initial tree is substituted for the VP leaf in the versuchen tree.</Paragraph> <Paragraph position="12"> This leads to the third tree. After that, one needs only to add the single tree for das Fahrrad to reparieren. Note that this is a derivation in which the foot node of the elementary tree containing the lexical material does not dominate the tree with the empty word. Now let us consider the derivation of example (6). Here, only for dem Kunden is the single tree added by substitution. In all other cases, the tree set is used with (primary or secondary) adjunction at the root node of the already derived tree. This root node consequently belongs to all verbs that have already occurred in the derivation and can therefore be used to add arguments to any of them. 13 Actually, er here is not really scrambled, but since in our formalism, scrambled elements attach at the left of a VP, any other element even more to the left is treated as if it is scrambled (even if it depends on the matrix verb).</Paragraph> <Paragraph position="13"> We leave it to the reader to verify that all word orders can be generated. This kind of analysis also works for more than two embeddings.</Paragraph> <Paragraph position="14"> Since all scrambled elements attach to a VP node in the elementary tree of the verb they depend on, they cannot attach to the VP of a higher finite verb that embeds the sentence in which the scrambling occurs.
In this way, a barrier effect is obtained without establishing an explicit barrier of the kind used in V-TAG. Instead, this locality of scrambling is a consequence of the form of the elementary trees and of the locality of the derivations. Concerning adjunct scrambling, each adjunct has a single auxiliary tree as in standard TAG and, in addition, a set of two auxiliary trees: a lower auxiliary tree with an empty word and a higher auxiliary tree with the adjunct. This is shown in Figure 11. The internal VP node of the higher tree in the tree set serves as an adjunction site for the lower parts of other adjuncts. Similarly, the elementary trees of verbs need an extra VP node in order to adjoin adverbs.</Paragraph> <Paragraph position="15"> For more analyses of scrambling, including scrambling in combination with extraposition and topicalization, and also for an extension of the analysis presented here to Korean data, see Kallmeyer and Yoon (2004).</Paragraph> </Section> <Section position="2" start_page="206" end_page="206" type="sub_section"> <SectionTitle> 3.2 Raising Verbs and Subject-Auxiliary Inversion </SectionTitle> <Paragraph position="0"> Other phenomena often mentioned in the TAG literature (see, e.g., Rambow, Vijay-Shanker, and Weir 1995; Kulick 2000; Dras, Chiang, and Schuler 2004) as being problematic for TAG and tree-local MCTAG are sentences with raising verbs and subject-auxiliary inversion, as in examples (7) and (8): (7) Does Gabriel seem to be likely to eat gnocchi? (8) What does John seem to be certain to like? The standard TAG analyses of examples (7) and (8) (see Figure 12 for the analysis of example (8)) start with the eat and like trees, respectively, adjoin an auxiliary tree for likely and certain, respectively, and then add the trees for does and seem.
If we assume that these trees are in the same elementary tree set, then this last derivation step is nonlocal, since the does tree adjoins to eat and like, respectively, while the seem tree adjoins to likely and certain, respectively. Though different from scrambling, this problem seems to be of a similar nature, and formalisms that have been proposed for scrambling have also been used to treat these examples (see Kulick 2000).</Paragraph> <Paragraph position="1"> RSN-MCTAG allows us to analyze examples (7) and (8) in a way that puts does and seem into a single elementary tree set: After to be likely and to be certain, respectively, have been adjoined, the root nodes of the adjoined trees are still considered to be part of the elementary trees of eat and like, respectively. These elementary trees can then be used to add the elementary tree set for does and seem: Both auxiliary trees are adjoined to these trees. Figure 12 shows the corresponding SN-derivation structure.</Paragraph> </Section> </Section> <Section position="6" start_page="206" end_page="215" type="metho"> <SectionTitle> 4. RSN-MCTAG and Range Concatenation Grammar </SectionTitle> <Paragraph position="0"> In the following, we show that for each RSN-MCTAG of a certain type (i.e., with an additional restriction), a weakly equivalent simple range concatenation grammar (RCG) (Boullier 1999, 2000) can be constructed. It has been shown that RCGs generate exactly the class of all polynomially parsable languages (Bertsch and Nederhof 2001; appendix A). Furthermore, as shown in Boullier (1998b), simple RCGs in particular are even weakly equivalent to linear context-free rewriting systems (Weir 1988). As a consequence, one obtains that the languages generated by simple RSN-MCTAGs are mildly context-sensitive. Mild context-sensitivity was introduced in Joshi (1985). It characterizes formalisms that are polynomially parsable, are semilinear, and allow only a limited number of crossing dependencies.
(We do not give formal definitions of mild context-sensitivity and of LCFRS, since we do not need these definitions in this article.) Concerning RSN-MCTAGs in general, that is, without any further restriction, we are almost sure that they are not mildly context-sensitive. Perhaps they can even generate languages that are not in the class of languages generated by RCGs.</Paragraph> <Section position="1" start_page="206" end_page="209" type="sub_section"> <SectionTitle> 4.1 Range Concatenation Grammars </SectionTitle> <Paragraph position="0"> This section defines range concatenation grammars.</Paragraph> <Paragraph position="1"> , the arguments of the predicates in the clause are instantiated with substrings of $w$, more precisely, with the corresponding ranges. A range $\langle i, j\rangle$ with $0 \le i < j \le n$ corresponds to the substring between positions $i$ and $j$, that is, to the substring $t_{i+1} \cdots t_j$</Paragraph> <Paragraph position="3"> to the empty string $\varepsilon$. If $i > j$, then $\langle i, j\rangle$ is undefined.</Paragraph> <Paragraph position="4"> Definition 8 For a given clause, an instantiation with respect to a string $w = t_1 \cdots t_n$</Paragraph> <Paragraph position="6"> if consecutive variables and occurrences of terminals in an argument in the clause are mapped to $\langle i$</Paragraph> <Paragraph position="8"> can be replaced with the right-hand side of this instantiation.</Paragraph> <Paragraph position="9"> The language of an RCG $G$ is the set of strings that can be reduced to the empty word, that is, $\{w \mid S(\langle 0, |w|\rangle) \Rightarrow^{*} \varepsilon\}$. An RCG is said to be noncombinatorial if each of the arguments in the right-hand sides of the clauses is a single variable. It is said to be linear if no variable appears more than once in the left-hand side of a clause and no variable appears more than once in the right-hand side of a clause. It is said to be nonerasing if, for each clause, each variable occurring in the left-hand side also occurs in the right-hand side and vice versa.
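To make ranges, instantiation, and reduction to the empty word concrete, the following Python sketch implements a naive recognizer for RCGs whose right-hand-side arguments are single variables, together with a check for the clause properties defined in this section (a simple RCG being one that is noncombinatorial, linear, and nonerasing). The clause encoding, the function names, and the sample grammar are illustrative assumptions: the sample is a small simple RCG for $\{a^m b^m c^m \mid m \ge 0\}$, not a grammar from this article, and the recognizer makes no attempt at efficiency.

```python
from functools import lru_cache

# Illustrative clause encoding (our own): (lhs predicate, lhs arguments, rhs atoms).
# An argument is a tuple of symbols; upper-case symbols are variables, all other
# symbols are terminals, and the empty tuple () stands for the empty string.
CLAUSES = [
    ("S", (("X", "Y", "Z"),), [("A", (("X",), ("Y",), ("Z",)))]),
    ("A", (("a", "X"), ("b", "Y"), ("c", "Z")), [("A", (("X",), ("Y",), ("Z",)))]),
    ("A", ((), (), ()), []),  # A(eps, eps, eps) -> eps
]


def is_variable(symbol):
    return symbol.isupper()


def is_simple(clauses):
    """Noncombinatorial, linear, and nonerasing, checked clause by clause."""
    for _, lhs_args, rhs in clauses:
        lhs_vars = [s for arg in lhs_args for s in arg if is_variable(s)]
        rhs_args = [arg for _, args in rhs for arg in args]
        rhs_vars = [s for arg in rhs_args for s in arg]
        noncombinatorial = all(len(arg) == 1 and is_variable(arg[0]) for arg in rhs_args)
        linear = len(set(lhs_vars)) == len(lhs_vars) and len(set(rhs_vars)) == len(rhs_vars)
        nonerasing = set(lhs_vars) == set(rhs_vars)
        if not (noncombinatorial and linear and nonerasing):
            return False
    return True


def recognizes(clauses, w):
    """True if S instantiated with the range (0, |w|) reduces to the empty word.

    Assumes noncombinatorial clauses and no clause that reduces a predicate
    to itself on exactly the same ranges (otherwise the recursion would not end).
    """

    def bind(pattern, i, j, env):
        # Yield each extension of env mapping the symbols of pattern onto range (i, j).
        if not pattern:
            if i == j:
                yield env
            return
        head, rest = pattern[0], pattern[1:]
        if is_variable(head):
            for k in range(i, j + 1):  # the variable covers the range (i, k)
                if env.get(head, (i, k)) == (i, k):
                    yield from bind(rest, k, j, {**env, head: (i, k)})
        elif j > i and w[i] == head:  # a terminal must match t_{i+1}, i.e. w[i]
            yield from bind(rest, i + 1, j, env)

    def instantiate(args, ranges, env):
        if not args:
            yield env
            return
        i, j = ranges[0]
        for extended in bind(args[0], i, j, env):
            yield from instantiate(args[1:], ranges[1:], extended)

    @lru_cache(maxsize=None)
    def derives(pred, ranges):
        for lhs, lhs_args, rhs in clauses:
            if lhs != pred or len(lhs_args) != len(ranges):
                continue
            for env in instantiate(lhs_args, ranges, {}):
                if all(derives(p, tuple(env[arg[0]] for arg in args)) for p, args in rhs):
                    return True
        return False

    return derives("S", ((0, len(w)),))


print(is_simple(CLAUSES))             # True
print(recognizes(CLAUSES, "aabbcc"))  # True
print(recognizes(CLAUSES, "aabbc"))   # False
```

Memoizing `derives` on the predicate and its tuple of ranges mirrors the fact that an instantiated predicate is fully determined by its name and ranges.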
It is said to be simple if it is noncombinatorial, linear, and nonerasing.</Paragraph> <Paragraph position="10"> Simple RCGs and LCFRSs are weakly equivalent (Boullier 1998b).</Paragraph> </Section> <Section position="2" start_page="209" end_page="215" type="sub_section"> <SectionTitle> 4.2 Relation between RSN-MCTAG and Simple RCG </SectionTitle> <Paragraph position="0"> The goal of this section is to construct an equivalent simple RCG for a given RSN-MCTAG. In order to be able to perform this construction, in the following we further constrain the formalism of RSN-MCTAG by defining RSN-MCTAG of a specific arity $n$. For this version of RSN-MCTAG, the construction of an equivalent simple RCG is possible.</Paragraph> <Paragraph position="1"> First, let us sketch the general idea of the transformation from TAG to RCG (see Boullier 1998a). The RCG contains predicates $\langle\alpha\rangle(X)$ and $\langle\beta\rangle(L, R)$ for initial and auxiliary trees, respectively. $X$ covers the yield of $\alpha$ and all trees added to $\alpha$, and $L$ and $R$ cover those parts of the yield of $\beta$ (including all trees added to $\beta$) that are to the left and the right of the foot node of $\beta$. The clauses in the RCG reduce the argument(s) of these predicates by identifying those parts that come from the elementary tree $\alpha$/$\beta$ itself and those parts that come from one of the elementary trees added by substitution or adjunction. A sample TAG with an equivalent RCG is shown in Figure 13.</Paragraph> <Paragraph position="2"> For the construction of an equivalent RCG from a given RSN-MCTAG, we follow the same ideas while treating a secondary adjunction of $\beta$ at some $\gamma$ as adjunction at $\gamma$ and not as adjunction at the elementary tree that is the mother node of $\beta$ in the TAG derivation tree. There are two main differences between RSN-MCTAG and TAG that influence the construction of an equivalent RCG.</Paragraph> <Paragraph position="3"> First, more than one tree can be added to a node. Therefore we allow predicates of the form $\langle\alpha\beta$ were secondarily adjoined.
Since the number of secondary adjunctions at a node is limited by some constant depending on the grammar</Paragraph> <Paragraph position="4"> (see Lemma 3), $k$ is limited as well, and therefore this extension with respect to TAG adds only a finite number of predicates. Figure 13 A sample TAG and an equivalent RCG.</Paragraph> <Paragraph position="5"> Second, the contribution of an elementary tree $\alpha$/$\beta$, including the trees added to it, can be separated into arbitrarily many parts. Since each of the arguments of the predicates in the RCG has to cover a contiguous substring (a range) of the input string, one needs predicates of arbitrary arities, namely, $\langle\alpha\ldots\rangle(L\ldots)$, for the case where $n$ auxiliary trees were added at the root of $\alpha$/$\beta$ that were actually secondarily adjoined at some higher tree, such that these $n$ trees separate the contribution of $\alpha$/$\beta$ into $2n + 1$ or $2n + 2$ parts, respectively. This extension is problematic, since it leads to an RCG with predicates of arbitrary arity: a dynamic RCG (Boullier 2001), a variant of RCG that is not polynomially parsable and that we therefore want to avoid. For this reason, we need an additional constraint on the RSN-MCTAGs we employ.</Paragraph> <Paragraph position="6"> An example in which the contribution of an elementary tree is separated into three different parts is example (9), analyzed with the RSN-MCTAG in section 3.1 (see Figure 14). In the derived tree, the VP das Fahrrad zu reparieren zu versuchen (the broken triangles), which is the contribution of versuchen, is separated into three parts. The crucial point in example (9) is that in the SN-derivation structure (see Figure 14), there are two crossings of secondary edges inside one group of secondary links. This means that the contribution of versuchen is interrupted twice by arguments of verspricht (by Kunde and er). In order to avoid predicates of arbitrary arity, we therefore limit the number of crossings of secondary links.
We define the arity of an RSN-MCTAG depending on the maximal number of crossings that are allowed.</Paragraph> <Paragraph position="7"> First, we define special subgraphs of the SN-derivation structure, secondary groups. These are subgraphs consisting of a chain of one primary substitution/adjunction and subsequent adjunctions at root or foot nodes such that there are secondary adjunctions along the whole chain. For example, the nodes verspricht, zu versuchen, Kunde, zu reparieren, er, and Fahrrad in the SN-derivation structure in Figure 14 form such a group. For an SN-derivation structure of a certain arity, the number of crossings of secondary edges inside a single secondary group is then limited: For an SN-derivation structure of arity $n$, the number of crossings of secondary edges per secondary group is limited to $n/2 - 1$. In other words, if $i$ is the maximal number of crossings, then $2(i + 1)$ is the arity of the grammar. The arity is chosen such that an equivalent RCG of the same arity can be constructed. TAG, for example, is a grammar with 0 crossings, that is, of arity $2(0 + 1) = 2$ if the grammar is viewed as an SN-MCTAG, and the corresponding RCG is indeed of arity 2.</Paragraph> <Paragraph position="8"> (G), is defined as the set of those trees that can be derived in $G$ with an RSN-tree-local multicomponent TAG derivation tree such that the corresponding SN-derivation structure is of arity $n$.</Paragraph> <Paragraph position="9"> Figure 15 Sample RSN-MCTAG of arity four.</Paragraph> <Paragraph position="10"> Consider a simple example of the construction of an equivalent RCG for a given RSN-MCTAG. We choose an RSN-MCTAG of arity four and see that the arity of the corresponding RCG is four as well. The RSN-MCTAG is shown in Figure 15.
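The counting relations used in this section can be summarized in a few lines. The helper functions below use our own illustrative names; they restate the arity $2(i + 1)$ for at most $i$ crossings per secondary group (and its inverse), the $2n + 1$ or $2n + 2$ parts into which $n$ higher, secondarily adjoined trees split the contribution of an initial or auxiliary tree, and the bound from the proof of Lemma 3, namely $k \times n$ secondary adjunctions when $k$ is the maximal number of nodes per elementary tree and $n + 1$ the maximal size of a tree set.

```python
def arity_from_crossings(max_crossings):
    """Arity 2(i + 1) for at most i crossings of secondary edges per secondary group."""
    if 0 > max_crossings:
        raise ValueError("the number of crossings cannot be negative")
    return 2 * (max_crossings + 1)


def crossings_from_arity(arity):
    """Inverse of the above: arity n permits n/2 - 1 crossings per secondary group."""
    if arity % 2 or 2 > arity:
        raise ValueError("the arity must be an even number of at least 2")
    return arity // 2 - 1


def contribution_parts(kind, separating_trees):
    """Parts into which n higher, secondarily adjoined trees added at the root
    split a contribution: 2n + 1 for an initial tree, 2n + 2 for an auxiliary tree."""
    offset = {"initial": 1, "auxiliary": 2}[kind]
    return 2 * separating_trees + offset


def max_secondary_adjunctions(max_nodes, max_set_size):
    """Bound from the proof of Lemma 3: at most k primary operations per tree
    instance, each accompanied by at most n secondary adjunctions (set size n + 1)."""
    return max_nodes * (max_set_size - 1)


print(arity_from_crossings(0))             # 2 (TAG)
print(arity_from_crossings(1))             # 4 (the grammar in Figure 15)
print(contribution_parts("initial", 1))    # 3
print(contribution_parts("auxiliary", 1))  # 4
```

For the grammar in Figure 15, one crossing gives arity four, and the corresponding RCG predicates have arity three for initial trees and four for auxiliary trees.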
Whether this grammar is considered to be a general RSN-MCTAG or an RSN-MCTAG of arity four does not matter in this case, since even in the general case, all possible SN-derivation structures are of arity four. However, for other RSN-MCTAGs, the restriction to a certain arity might exclude certain TAG derivation trees and thereby reduce the language generated by the grammar.</Paragraph> <Paragraph position="11"> The language generated by the RSN-MCTAG in Figure 15 is {er zu kommen (zu</Paragraph> <Paragraph position="13"> derivation structures corresponding to the different strings are shown in Figure 15. The last one contains one crossing of secondary links; that is, the RSN-MCTAG is of arity four.</Paragraph> <Paragraph position="14"> Now let us look at the corresponding RCG. Since the arity of the RSN-MCTAG is four, the predicates of the corresponding RCG are of arity three (for initial trees) and four (for auxiliary trees).</Paragraph> <Paragraph position="15"> The contribution of $\alpha$ is never separated into parts; therefore, the first and the third arguments of the predicate $\langle\alpha\rangle$ are always $\varepsilon$. Looking at the SN-derivation structures in Figure 15, we have three different possibilities for $\langle\alpha$</Paragraph> <Paragraph position="17"> $\rangle$, where two trees were added to the same node and further adjunctions at the root of $\beta$ are possible. The point is that the part covered by $\beta$ and the trees added to it can be separated into different substrings. This leads to cannot be separated into different parts, since nothing can be adjoined to $\beta$. This example should give an idea of how an equivalent RCG for a given RSN-MCTAG of arity $n$ can be constructed.</Paragraph> <Paragraph position="18"> As already mentioned, in an RSN-MCTAG, the number of substitutions and (primary or secondary) adjunctions that can occur at each node is limited (see Lemma 3).
Therefore, the number of predicates needed in the corresponding RCG is limited as well. Furthermore, in an RSN-MCTAG of arity $n$, the contribution of an elementary tree is separated into at most $n$ parts. This still needs to be shown: Lemma 4 Let $G$ be an RSN-MCTAG of arity $n$. Then for all $w$ in the string language of $G$ and for all elementary trees $\gamma$ used to derive $w$ in $G$, the contribution of $\gamma$, that is, the yield of $\gamma$ and everything added to $\gamma$, is separated into at most $n$ parts. The proof is given in the appendix.</Paragraph> <Paragraph position="19"> Theorem 1 For each RSN-MCTAG $G$ of arity $n$, a simple RCG $G'$ of arity $n$ can be constructed such that $L(G) = L(G')$.</Paragraph> <Paragraph position="21"> The construction algorithm and a sketch of the proof are presented in the appendix. As a consequence of this theorem, the following corollary holds: Corollary For a given $n$, the string languages generated by RSN-MCTAGs of arity $n$ are mildly context-sensitive, and in particular they are polynomially parsable. Since we have shown that RSN-MCTAGs with a fixed arity generate languages that can also be generated by LCFRSs, we know that we can even construct a weakly equivalent set-local MCTAG. This set-local MCTAG, however, does not present an alternative to the RSN-MCTAG with fixed arity: It is very large, containing a large number of elementary trees per tree set (the number depends on the arity of the grammar) and, furthermore, a large number of trees without lexical material and a large number of internal nodes that are needed only to provide adjunction sites.</Paragraph> <Paragraph position="22"> An example is the set-local MCTAG in Figure 16. It is weakly equivalent to the RSN-MCTAG of arity four in Figure 15, and it even gives the correct dependency structure.
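Lemma 4 concerns the number of maximal contiguous substrings into which the contribution of an elementary tree is split. Given the word positions covered by a contribution, this number is easy to compute; the sketch below (function names and sample positions are hypothetical) does so and compares it with a given arity $n$. The sample mimics the situation of example (9), where a contribution is interrupted twice and thus falls into three parts.

```python
def count_parts(positions):
    """Number of maximal contiguous substrings covered by the given word positions."""
    ordered = sorted(set(positions))
    if not ordered:
        return 0
    # A new part starts wherever a position does not directly follow its predecessor.
    return 1 + sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur != prev + 1)


def within_arity(positions, arity):
    """True if the contribution is split into at most `arity` parts (cf. Lemma 4)."""
    return arity >= count_parts(positions)


# Hypothetical contribution interrupted twice, as in example (9): three parts.
contribution = [0, 3, 4, 7, 8]
print(count_parts(contribution))      # 3
print(within_arity(contribution, 2))  # False
print(within_arity(contribution, 4))  # True
```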
The verspricht tree contains several VP nodes that are needed in order to provide adjunction sites for the different parts of er and versuchen. The versuchen tree set needs an extra auxiliary tree that provides an additional VP node for adjunction and has to be separated from the tree containing versuchen, since the contribution of versuchen might be separated into different parts. Of course, this little grammar is still simple, since there are almost no possibilities of adjoining different trees at the same node or of separating the contribution of one lexical item into different parts. Figure 16 Equivalent set-local MCTAG for the RSN-MCTAG from Figure 15.</Paragraph> <Paragraph position="23"> As we have seen in Lemma 4, the linguistic significance of restricting the arity of the grammar to some $n$ is that the lexical material comprising a verb, all its arguments (including arguments and adjuncts of these arguments, etc.), and all its adjuncts cannot be separated into more than $n$ discontinuous substrings in the whole sentence. For example, an RSN-MCTAG of arity two with elementary tree sets similar to those proposed above for scrambling would not be able to analyze example (9). However, RSN-MCTAGs of arity $n$ for some sufficiently large fixed $n$ can perhaps even describe</Paragraph> <Paragraph position="24"> all cases of scrambling: See again the analysis of example (9) in Figure 14. Here, the contribution of versuchen and its arguments is split only by other elements secondarily adjoined to verspricht. If only a limited number of such secondary adjunctions were possible (this is the case), and if none of these other secondarily adjoined elements allowed for further secondary adjunctions at its root or foot node (this still needs to be investigated), then the number of crossings might be limited.
We leave this issue for further research.</Paragraph> <Paragraph position="25"> Even if RSN-MCTAG with a fixed arity could not analyze all scrambling data, $n$ could be chosen, on the basis of empirical studies, large enough that the grammar would cover all scrambling cases that one assumes to occur.</Paragraph> <Paragraph position="26"> The important point is that the complexity limit given by the fixed $n$ is variable; that is, an arbitrary $n$ can be chosen. This is different from TAG, for example, in which the limit is fixed (assuming, of course, that we desire only analyses respecting the CETM). In this sense, one can say that RSN-MCTAG can analyze scrambling in general.</Paragraph> </Section> </Section> </Paper>