<?xml version="1.0" standalone="yes"?>
<Paper uid="C88-1013">
  <Title>REPRESENTATION TREES AND STRING-TREE CORRESPONDENCES</Title>
  <Section position="1" start_page="0" end_page="0" type="metho">
    <SectionTitle>
REPRESENTATION TREES AND
STRING-TREE CORRESPONDENCES
</SectionTitle>
    <Paragraph position="0"/>
  </Section>
  <Section position="2" start_page="0" end_page="60" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> The correspondence between a string of a language and its abstract representation, usually a (decorated) tree, is not straightforward. However, it is desirable to maintain it, for example to build structured editors for texts written in natural language. As such correspondences must be compositional, we call them &amp;quot;Structured String-Tree Correspondences&amp;quot; (SSTC).</Paragraph>
    <Paragraph position="1"> We argue that a SSTC is in fact composed of two interrelated correspondences, one between nodes and substrings, and the other between subtrees and substrings, the substrings being possibly discontinuous in both cases. We then proceed to show how to define a SSTC with a Structural Correspondence Static Grammar (SCSG), and which constraints to put on the rules of the SCSG to get a &amp;quot;natural&amp;quot; SSTC.</Paragraph>
    <Paragraph position="2"> Keywords: linguistic descriptors, discontinuous constituents, discontinuous phrase structure grammars, structured string-tree correspondences, structural correspondence static grammars. Abbreviations: DPSG, MT, NL, SSTC, STCG.</Paragraph>
    <Paragraph position="3"> INTRODUCTION Ordered trees, annotated with simple labels or complex &amp;quot;decorations&amp;quot; (property lists), are widely used for representing natural language (NL) utterances. This corresponds to a hierarchical view: the utterance is decomposed into groups and subgroups. When the depth of linguistic analysis is such that a representation in terms of graphs, networks or sets of formulas would be more direct, one often still prefers to use tree structures, at the price of encoding the desired information in the decorations (e.g., by &amp;quot;coindexing&amp;quot; two or more nodes). This is because trees are conceptually and algorithmically easier to manipulate, and also because all usual interpretations based on the linguistic structure are more or less &amp;quot;compositional&amp;quot; in nature. If a language is described by a classical Phrase Structure Grammar, or by a (projective) Dependency Grammar, the tree structure &amp;quot;contains&amp;quot; the associated string in some easily defined sense. In particular, the surface order of the string is derived from some ordered traversal of the tree (left-to-right order of the leaves of a constituent tree, or infix order for a dependency tree).</Paragraph>
    <Paragraph position="4"> However, if one wants to associate &amp;quot;natural&amp;quot; structures to strings, for example abstract trees for programs or predicate-argument structures for NL utterances, this is no longer true. Elements of the string may have been erased, or duplicated, some &amp;quot;discontinuous&amp;quot; groups may have been put together, and the surface order may not be reflected in the tree (e.g., for a normalized representation). Such correspondences must be compositional: the complete tree corresponds to the complete string, the subtrees correspond to substrings, etc. Hence, we call them &amp;quot;Structured String-Tree Correspondences&amp;quot; (SSTC).</Paragraph>
    <Paragraph position="5"> For some applications, like classical (batch) Machine Translation (MT), it is not necessary to keep the correspondence explicit: for revising a translation, it is enough to show the correspondence between two sentences or two paragraphs. However, if one wants to build structured editors for texts written in natural language, thereby using at the same time a string (the text) and a tree (its representation), it seems necessary to represent explicitly the associated SSTC.</Paragraph>
    <Paragraph position="6"> In the first part, we briefly review the types of string-tree correspondences which are implied by the most usual types of tree representations of NL utterances. We argue that a SSTC should in fact be composed of two interrelated correspondences, one between nodes and substrings, and the other between subtrees and substrings, the substrings being possibly discontinuous in both cases. This is presented in more detail in the second part. In the last part, we show how to define a SSTC with a Structural Correspondence Static Grammar (SCSG), and which constraints to put on the rules of the SCSG to get a &amp;quot;natural&amp;quot; SSTC.</Paragraph>
    <Paragraph position="7"> I. CORRESPONDENCES BETWEEN A STRING [...] 1. PHRASE STRUCTURE TREES (C-STRUCTURES) Classical Phrase Structure trees give rise to a very simple kind of SSTC. To each string w = a1...an, let us associate the set of intervals i_j, 0≤i≤j≤n. w(i_j) denotes the substring a(i+1)...aj of w if i&lt;j, and the empty string otherwise. The root, or equivalently the whole tree, corresponds to w = w(0_n). Each leaf corresponds to some substring w(i_j), of length 0 or 1 (we may extend this to any length if terminals are allowed to be themselves strings). Then, the correspondence is such that any internal node of the tree, or equivalently each subtree &amp;quot;complete&amp;quot; in breadth and depth, corresponds to w(i_j) iff its m daughters (or its m immediate subtrees), in order, correspond to a sequence w(i1_j1),...,w(im_jm) such that i1=i, jm=j, and jk=i(k+1) for 0&lt;k&lt;m.</Paragraph>
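    <Paragraph> As a minimal sketch of the condition just stated, the following Python fragment checks whether a constituent tree whose nodes carry one interval each is tiled, in order, by the intervals of its daughters. The Node class, the function name and the two sample trees are illustrative assumptions, not part of the formalism.
<![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    interval: Tuple[int, int]          # (i, j) stands for w(i_j)
    daughters: List["Node"] = field(default_factory=list)


def is_contiguous_cover(node: Node) -> bool:
    """True if every internal node's interval is exactly tiled, in order,
    by the intervals of its daughters (i1 = i, jm = j, jk = i(k+1))."""
    if not node.daughters:
        return True
    i, j = node.interval
    position = i
    for daughter in node.daughters:
        start, end = daughter.interval
        if start != position or end < start:
            return False
        position = end
    return position == j and all(is_contiguous_cover(d) for d in node.daughters)


# "He picked the ball up", with positions 0..5 between the words.
# The leaf NP has length 2, i.e. a terminal that is itself a string.
good = Node("S", (0, 5), [
    Node("NP", (0, 1)),
    Node("VP", (1, 5), [Node("V", (1, 2)), Node("NP", (2, 4)), Node("PTC", (4, 5))]),
])
# Dropping the particle leaves position 4_5 uncovered under VP.
bad = Node("S", (0, 5), [
    Node("NP", (0, 1)),
    Node("VP", (1, 5), [Node("V", (1, 2)), Node("NP", (2, 4))]),
])
```
]]>
 With single intervals per node, a discontinuous group such as a verb and its separated particle cannot be given a node of its own, which is the limitation discussed next.</Paragraph>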
    <Paragraph position="8"> This type of correspondence is &amp;quot;projective&amp;quot;. It has however been argued that classical phrase structure trees are inadequate for characterising syntactic representations in general, especially in the case of so-called &amp;quot;discontinuous&amp;quot; constituents. Here are some examples.</Paragraph>
    <Paragraph position="9">  (1) John talked, of course, about politics.</Paragraph>
    <Paragraph position="10"> (2) He picked the ball up.</Paragraph>
    <Paragraph position="11"> (3) Je ne le lui ai pas donné.</Paragraph>
    <Paragraph position="12">  (I did not give it to him) According to (McCawley 82), sentence (1) contains a verb phrase &amp;quot;talked about politics&amp;quot;, which is divided by the adverbial phrase &amp;quot;of course&amp;quot;, which modifies the whole sentence, and not only the verbal kernel (or the verbal phrase, in Chomsky's terminology). Sentence (2) contains the particle &amp;quot;up&amp;quot;, which is separated from its verb &amp;quot;picked&amp;quot; by &amp;quot;the ball&amp;quot;. In sentence (3), the discontinuous negation &amp;quot;ne...pas&amp;quot; overlaps with the compound form of the verb &amp;quot;ai...donné&amp;quot;. Moreover, if a sentence in active voice is to be represented in a standard order (subject verb object complement), this sentence contains two displaced elements, namely the object &amp;quot;le&amp;quot; and the complement &amp;quot;lui&amp;quot;. (McCawley 82) and later (Bunt &amp; al 87) have argued that &amp;quot;meaningful&amp;quot; representations of sentences (2) and (1) should be the following phrase structure trees, (4) and (5), respectively.</Paragraph>
    <Paragraph position="13"> [Trees (4) and (5): phrase structure trees with discontinuous constituents for &amp;quot;He picked the ball up&amp;quot; and &amp;quot;John talked of course about politics&amp;quot;. The diagrams are not recoverable from the source.]</Paragraph>
    <Paragraph position="15"> Here, the correspondence is established between a node (or equivalently the complete subtree rooted at a node) and a sequence of intervals. If a displacement arises, as in (3), the left-to-right order of nodes in the tree may be incompatible with the order of the corresponding sequences of intervals in the string (the considered ordering is the natural lexicographic extension).</Paragraph>
    <Paragraph position="16"> Rather than introduce the awkward notion of a &amp;quot;discontinuous&amp;quot; tree, as above, with intersecting branches, we suggest keeping the tree diagrams in their usual form and showing the string separately. For sentence (3), then, we get the following diagram.</Paragraph>
    <Paragraph position="18"/>
    <Paragraph position="20"> Now, as before, the root of the tree still corresponds to w=w(0_n), and a leaf corresponds to an interval of length 0 or 1 (or more, see above). But an internal node with m daughters corresponds to a sequence of intervals, which is the &amp;quot;union&amp;quot; of the m sequences corresponding to its daughters.</Paragraph>
    <Paragraph position="21"> More precisely, a &amp;quot;sequence&amp;quot; of intervals is a list of the form S = w(i1_j1),...,w(ip_jp), in order (ik&lt;i(k+1) for 0&lt;k&lt;p) and without overlapping (jk&lt;i(k+1) for 0&lt;k&lt;p). Its union (denoted by &amp;quot;+&amp;quot;) with an interval I = w(i_j) is the smallest list containing all elements of S and of I. For example, S+I is: S itself, if there is a k such that ik≤i and j≤jk; S, augmented with w(i_j) inserted in the proper place, if j&lt;i1 or jp&lt;i or there is a k&lt;p such that jk&lt;i and j&lt;i(k+1);</Paragraph>
    <Paragraph position="23"> w(i1_j1),...,w(iq_jq),w(i_jr),...,w(ip_jp), if there are q and r such that jq&lt;i≤i(q+1) and ir≤j≤jr (other cases are analogous).</Paragraph>
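    <Paragraph> The union of a sequence of intervals with one more interval can be sketched as follows. This is only an illustration under stated assumptions: intervals are encoded as pairs (i, j), the function name is hypothetical, and two intervals that merely touch are merged, which is how the strict inequality between consecutive intervals is read here.
<![CDATA[
```python
from typing import List, Tuple

Interval = Tuple[int, int]


def union(seq: List[Interval], interval: Interval) -> List[Interval]:
    """Return S + I: the smallest ordered list of disjoint intervals
    covering every interval of seq together with the new interval."""
    i, j = interval
    result: List[Interval] = []
    for start, end in sorted(seq):
        if end < i or j < start:
            # No contact with the new interval: keep unchanged.
            result.append((start, end))
        else:
            # Overlapping or touching: absorb into the new interval.
            i, j = min(i, start), max(j, end)
    result.append((i, j))
    return sorted(result)


S = [(0, 1), (3, 4)]
contained = union(S, (3, 4))     # the interval is already in S
inserted = union(S, (5, 6))      # inserted in the proper place
bridged = union(S, (1, 3))       # touches both neighbours
```
]]>
 The union of two sequences is then obtained by adding the intervals of one to the other, one at a time.</Paragraph>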
  </Section>
  <Section position="3" start_page="60" end_page="60" type="metho">
    <SectionTitle>
2. DEPENDENCY TREES (F-STRUCTURES)
</SectionTitle>
    <Paragraph position="0"> In classical dependency trees, elements of the represented string appear on the nodes of the tree, with no auxiliary symbols, except a &amp;quot;dummy node&amp;quot;, often indicated by &amp;quot;=&amp;quot;, which serves to separate the left daughters from the right daughters.</Paragraph>
    <Paragraph position="1"> There are two aspects in the correspondence. First, a node corresponds to an element of the string, usually an interval of length 1. Second, the complete subtree rooted at a node corresponds to the interval union of the intervals corresponding to the node and to its subtrees.</Paragraph>
    <Paragraph position="2"> These intervals may not overlap.</Paragraph>
    <Paragraph position="3"> The string can be produced from the tree by an inorder traversal (one starts from the root, and, at any node, one traverses first the trees rooted at the left daughters, then the node, then the trees rooted at the right daughters, recursively).</Paragraph>
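    <Paragraph> The inorder traversal can be sketched in a few lines of Python. The DepNode class is a hypothetical encoding in which the dummy node is replaced by two separate lists of left and right daughters, and the sample tree is one plausible dependency analysis of sentence (2).
<![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DepNode:
    word: str
    left: List["DepNode"] = field(default_factory=list)
    right: List["DepNode"] = field(default_factory=list)


def linearize(node: DepNode) -> List[str]:
    """Left daughters first, then the node itself, then right daughters."""
    words: List[str] = []
    for daughter in node.left:
        words.extend(linearize(daughter))
    words.append(node.word)
    for daughter in node.right:
        words.extend(linearize(daughter))
    return words


tree = DepNode("picked",
               left=[DepNode("He")],
               right=[DepNode("ball", left=[DepNode("the")]), DepNode("up")])
sentence = " ".join(linearize(tree))
```
]]>
 Because every subtree is emitted as one contiguous block, a traversal of this kind can only produce projective correspondences.</Paragraph>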
    <Paragraph position="4"> Sentences (1) and (2) might be represented by trees (8) and (9) below.</Paragraph>
    <Paragraph position="5"> [Trees (8) and (9): dependency trees for sentences (1) and (2). In linear notation, as far as the source allows: talked(John:SUBJ, of:ADVS(course), about:OBJ1(politics)) and picked(He:SUBJ, ball:OBJ1(the:DES), up:PTC). The diagrams themselves are not recoverable; the surviving caption fragment reads &amp;quot;... nodes the syntactic functions usually attached to the edges&amp;quot;.]</Paragraph>
    <Paragraph position="6"> There may be some discussion on the structures produced. For example, some linguists would rather see &amp;quot;politics&amp;quot; dominating &amp;quot;about&amp;quot;. This is not our topic here, but we will use this other possibility in a later diagram. For the moment, note that discontinuity does not always disappear in dependency trees. Here is an example corresponding to sentence (3).</Paragraph>
    <Paragraph position="8"> Let us now take a simple example from the area of programming languages, which shows an abstract tree associated to an assignment, where some elements of the string are &amp;quot;missing&amp;quot; in the tree, and where a node corresponds to a &amp;quot;discontinuous&amp;quot; substring (a sequence of intervals).</Paragraph>
    <Paragraph position="10"> [Diagram not recoverable from the source; its surviving caption fragment reads &amp;quot;... language expression&amp;quot;.] Here, we have shown the correspondence between nodes and sequences. The parentheses are missing in the tree, which means that the sequence corresponding to the subtree rooted at node &amp;quot;+&amp;quot; is more than the union of the sequences corresponding to its subtrees. However, there is no overlapping between sequences corresponding to independent nodes or subtrees.</Paragraph>
    <Paragraph position="11"> Another remark is that the elements appearing on the nodes are not always identical with elements of the represented string. For example, we have replaced &amp;quot;:=&amp;quot; by &amp;quot;=&amp;quot; and the (discontinuous) substring &amp;quot;if then else&amp;quot; by &amp;quot;if_then_else&amp;quot;, in a usual fashion.</Paragraph>
    <Paragraph position="12">  3. PREDICATE-ARGUMENT TREES (P-STRUCTURES) In &amp;quot;predicate-argument structures&amp;quot;, it is usual to construct a unique node for a compound predicate, in the same spirit as the &amp;quot;if_then_else&amp;quot; operator above. With sentences (1) and (2), for example, we could get trees (12) and (13) below. Beside the logical relation (argument place) or the semantic relation, the nodes must also contain some other information, like tense, person, etc., which is not shown here.</Paragraph>
    <Paragraph position="13"> [Trees (12) and (13): predicate-argument trees for sentences (1) and (2). The diagrams are not recoverable from the source.] We now come to situations where overlapping occurs, and where it is natural to consider &amp;quot;incomplete&amp;quot; subtrees corresponding to &amp;quot;discontinuous&amp;quot; groups. This occurs frequently in cases of coordination with elision, as in: &amp;quot;John and Mary give Paul and Ann trousers and dresses.&amp;quot; In order to simplify the trees, we abstract this by the formal language {a^n v b^n c^n | n&gt;0}, and propose the two trees (14) and (15) below for the string &amp;quot;a a v b b c c&amp;quot; (also written a.1 a.2 v b.1 b.2 c.1 c.2 to show the positions) as more &amp;quot;natural&amp;quot; representations than the syntactic tree derived from a context-sensitive grammar in normal form for this language (all rules are of the form &amp;quot;l A r → l u r&amp;quot;, l and r being the left and right context, respectively).</Paragraph>
    <Paragraph position="15"> On certain nodes, we have represented the sequence corresponding to the complete subtree rooted at the node, followed by the sequence corresponding to the node itself. For nodes A, B, C in tree (14), this &amp;quot;local&amp;quot; sequence is empty.</Paragraph>
    <Paragraph position="16"> In both trees, it is clear that the sequence a1 v b1 c1 corresponds to an &amp;quot;incomplete&amp;quot; subtree, namely V(A(a1),B(b1),C(c1)) in (14) and V(a1,b1,c1) in (15).</Paragraph>
    <Paragraph position="17"> In tree (14), the coordination is shown directly on the graph, and the verb (V) is not shown as elided. It is a matter of further analysis to accept or not the distributive interpretation (&amp;quot;respectively&amp;quot; may hold between the three groups, the last two, or none).</Paragraph>
    <Paragraph position="18"> On the contrary, tree (15), in a sense, is a more &amp;quot;abstract&amp;quot; representation. It shows directly the interpretation as a coordination of two sentences, and &amp;quot;restores&amp;quot; the elided V.</Paragraph>
  </Section>
  <Section position="4" start_page="60" end_page="64" type="metho">
    <SectionTitle>
4. MULTILEVEL TREES (M-STRUCTURES)
</SectionTitle>
    <Paragraph position="0"> Multilevel tree structures, or m-structures for short, were introduced by B. Vauquois in 1974 (see (Vauquois 78)) for the purposes of Machine Translation. On the same graph, three &amp;quot;levels of interpretation&amp;quot; are described (constituents, syntactic dependencies, logical and semantic relations). As seen in other examples above, the nodes which refer directly to the string do not contain elements of the string, but rather representatives of (sequences of) elements of the string, called &amp;quot;lexical units&amp;quot; (LU), like &amp;quot;repair&amp;quot; for &amp;quot;reparation&amp;quot;, plus some information about the derivation used.</Paragraph>
    <Paragraph position="1"> The graph is deduced by simple rules from a dependency tree: each internal node is &amp;quot;lowered&amp;quot; in the &amp;quot;=&amp;quot; position and its syntactic function becomes &amp;quot;GOV&amp;quot; (for &amp;quot;governor&amp;quot;, or head in some other terminology), discontinuous lexical elements (like &amp;quot;ne...pas&amp;quot; or &amp;quot;ai...donné&amp;quot;) are represented by one node, coordination is represented by &amp;quot;vertical lists&amp;quot; as in tree (14), lexical units of referred elements are put in the nodes corresponding to the pronouns, an approximation of coindexing, etc.</Paragraph>
    <Paragraph position="2"> From the point of view of the associated correspondence between representation trees and represented strings, nothing new has to be mentioned.</Paragraph>
    <Paragraph position="3">  can not be required, even on the leaves, because the string &amp;quot;( b )&amp;quot; may well have a representation tree with the unique node b.</Paragraph>
    <Paragraph position="4">  2. if N has m daughters N1...Nm, then STREE(N) ⊇ STREE(N1)+...+STREE(Nm)+SNODE(N). In case of strict containment, the difference corresponds to the elements of the string which are represented by the subtree but which are not explicitly represented, like &amp;quot;(&amp;quot; and &amp;quot;)&amp;quot; in &amp;quot;( b )&amp;quot;.</Paragraph>
    <Paragraph position="5"> c) The sequence SSUBT(X,N) corresponding to a given incomplete subtree X rooted at node N of the whole tree T is defined recursively by: SSUBT(X,N) = STREE(X) if X = N, that is, if X is reduced to one node, not necessarily a leaf of T;</Paragraph>
    <Paragraph position="7"> the root of X, has p subtrees X1...Xp in T.</Paragraph>
    <Paragraph position="8"> In other words, one takes the smallest sequence containing the biggest sequence corresponding to the leaves of X (STREE on the leaves) and compatible with the monotony rules above.</Paragraph>
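    <Paragraph> Condition 2 above lends itself to a small executable check. The sketch below is an assumption-laden illustration: sequences are lists of pairs (i, j), a sequence is compared through the set of elementary intervals it covers, and the Node class and function names are hypothetical.
<![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Sequence = List[Tuple[int, int]]


def cells(seq: Sequence) -> Set[int]:
    """Elementary intervals k_(k+1) covered by a sequence, as the set of k."""
    return {k for i, j in seq for k in range(i, j)}


@dataclass
class Node:
    label: str
    stree: Sequence
    snode: Sequence
    daughters: List["Node"] = field(default_factory=list)


def covered_by_parts(node: Node) -> Set[int]:
    """Cells of SNODE(N) together with the cells of each STREE(Nk)."""
    covered = cells(node.snode)
    for daughter in node.daughters:
        covered |= cells(daughter.stree)
    return covered


def respects_containment(node: Node) -> bool:
    return covered_by_parts(node) <= cells(node.stree) and all(
        respects_containment(d) for d in node.daughters)


def implicit_cells(node: Node) -> Set[int]:
    """Elements represented by the subtree but by none of its parts."""
    return cells(node.stree) - covered_by_parts(node)


# The string "( b )", positions 0..3, represented by the single node b.
b = Node("b", stree=[(0, 3)], snode=[(1, 2)])
```
]]>
 For the node b, the implicit cells are exactly the two parentheses, which is the case of strict containment mentioned in the text.</Paragraph>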
    <Paragraph position="9"> Here are some interesting properties of SSTCs which may help to classify them.</Paragraph>
    <Paragraph position="10">  for any two sister nodes N1 and N2, N1 to the left of N2, STREE(N1) is completely to the left of STREE(N2). This means that,</Paragraph>
    <Paragraph position="12"> then jp ≤ k1.</Paragraph>
    <Paragraph position="13"> A SSTC is explicit if, for each elementary interval w(i_i+1), there is a node N such that SNODE(N) = w(i_i+1).</Paragraph>
    <Paragraph position="14"> A SSTC is complete if each elementary interval is contained in SNODE(N) for some node N. A SSTC is of the constituent type if SNODE(N) is empty for each non-terminal node N.</Paragraph>
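    <Paragraph> The last three properties can be tested mechanically once SNODE is known for every node. The following sketch assumes a flat, hypothetical encoding (node name mapped to its SNODE sequence and a terminal flag) and reads the first of the three properties as the one called explicit later in the text; n is the length of the string.
<![CDATA[
```python
from typing import Dict, List, Tuple

Sequence = List[Tuple[int, int]]
# Hypothetical flat encoding: node name -> (SNODE sequence, is_terminal).
Nodes = Dict[str, Tuple[Sequence, bool]]


def is_explicit(nodes: Nodes, n: int) -> bool:
    """Each elementary interval is the SNODE of some node."""
    snodes = [snode for snode, _ in nodes.values()]
    return all([(k, k + 1)] in snodes for k in range(n))


def is_complete(nodes: Nodes, n: int) -> bool:
    """Each elementary interval is contained in some SNODE."""
    covered = {k for snode, _ in nodes.values()
               for i, j in snode for k in range(i, j)}
    return covered >= set(range(n))


def is_constituent_type(nodes: Nodes) -> bool:
    """SNODE is empty on every non-terminal node."""
    return all(not snode for snode, terminal in nodes.values() if not terminal)


# Dependency-style tree for "He picked the ball up" (n = 5).
dependency: Nodes = {
    "picked": ([(1, 2)], False), "He": ([(0, 1)], True),
    "ball": ([(3, 4)], False), "the": ([(2, 3)], True), "up": ([(4, 5)], True),
}
# The string "( b )" (n = 3) represented by the single node b.
parenthesized: Nodes = {"b": ([(1, 2)], True)}
```
]]>
 On these two samples the dependency-style tree is explicit and complete but not of the constituent type, while the tree for the parenthesized string is neither explicit nor complete.</Paragraph>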
    <Paragraph position="15"> 3. REPRESENTATION In the examples above, we have encoded the correspondence in the tree. However, this is in practice not always necessary, or even practical. In the case of explicit and projective SSTCs, for instance, the string can be obtained directly from the tree, and there is no need to show the intervals. Note that, in the process of generating a string from a tree, one naturally starts from the top, not knowing the final length of the string, and goes down recursively, dividing this interval into smaller intervals. Rather than introduce variables representing the extremities of the created intervals, it may be more practical to start from a fixed interval, say 0_1 or 0_100. Then, the positions between the elements of the string will be denoted by an increasing sequence of rational numbers (0, 1/3, 1/2, 5/7), etc.</Paragraph>
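    <Paragraph> The idea of dividing a fixed interval with rational positions can be sketched with exact fractions. The nested-list tree encoding and the choice of splitting each interval into equal parts are assumptions made for this illustration only; the text does not prescribe how an interval is divided.
<![CDATA[
```python
from fractions import Fraction
from typing import List, Tuple, Union

Tree = Union[str, List["Tree"]]    # a leaf is a word, an internal node a list


def assign(tree: Tree, lo: Fraction, hi: Fraction) -> List[Tuple[str, Fraction, Fraction]]:
    """Give every leaf an interval lo_hi by dividing the mother's interval
    into as many equal parts as there are daughters (an assumed policy)."""
    if isinstance(tree, str):
        return [(tree, lo, hi)]
    width = (hi - lo) / len(tree)
    out = []
    for k, sub in enumerate(tree):
        out.extend(assign(sub, lo + k * width, lo + (k + 1) * width))
    return out


leaves = assign(["He", ["picked", ["the", "ball"], "up"]], Fraction(0), Fraction(1))
positions = [lo for _, lo, _ in leaves]
positions.append(leaves[-1][2])
```
]]>
 The positions come out as an increasing sequence of rationals between 0 and 1, without the final length of the string being known in advance.</Paragraph>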
    <Paragraph position="16"> In the case of &amp;quot;local&amp;quot; non-projectivity, we have tried some devices using two relative integers (POS,LEV) associated with each node N. POS(N) shows the relative order in the subtree rooted at mother(N), if LEV(N)=0, or more generally at its (LEV(N)+1)-th ancestor, if LEV(N)&gt;0. Unfortunately, all these schemes seem to work only for particular situations.</Paragraph>
    <Paragraph position="17"> Also, if the SSTC is overlapping, or not complete, it may be computationally costly to find the (smallest) subtree associated with a given (possibly discontinuous) substring. But this operation would be essential in a &amp;quot;structural&amp;quot; editor of NL texts. A possibility is then to encode the correspondence both in the tree and in the string.</Paragraph>
    <Paragraph position="18"> Finally, take the example of tree (15) above. Suppose that the user of a NL editor wants to change b1 (Paul, in the corresponding NL example) in a way which may contradict some agreement constraint between a1, v, b1 and c1. One should be able to find the smallest SSTC containing a1 and other elements, that is, the subtree V(a1,b1,c1) and the discontinuous substring a1 v b1 c1 (the notation a..v.b..c. might be suitable, if one wants to avoid indices).</Paragraph>
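    <Paragraph> The search for the smallest subtree covering a given set of string elements can be sketched as below. The shape chosen for the sample tree (a root S over two V nodes that both cover the position of v) is only a hypothetical stand-in for tree (15), whose diagram is not available here, and the Node class and function names are illustrative.
<![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

Sequence = List[Tuple[int, int]]


def cells(seq: Sequence) -> Set[int]:
    """Elementary intervals k_(k+1) covered by a sequence, as the set of k."""
    return {k for i, j in seq for k in range(i, j)}


@dataclass
class Node:
    label: str
    stree: Sequence
    daughters: List["Node"] = field(default_factory=list)


def smallest_subtree(node: Node, target: Set[int]) -> Optional[Node]:
    """Deepest node whose STREE covers every target cell (first one found
    when overlapping sisters both qualify), or None if there is none."""
    if not target <= cells(node.stree):
        return None
    for daughter in node.daughters:
        found = smallest_subtree(daughter, target)
        if found is not None:
            return found
    return node


def leaf(label: str, i: int) -> Node:
    return Node(label, [(i, i + 1)])


# Positions: a1=0_1, a2=1_2, v=2_3, b1=3_4, b2=4_5, c1=5_6, c2=6_7.
v1 = Node("V.1", [(0, 1), (2, 4), (5, 6)],
          [leaf("a1", 0), leaf("b1", 3), leaf("c1", 5)])
v2 = Node("V.2", [(1, 3), (4, 5), (6, 7)],
          [leaf("a2", 1), leaf("b2", 4), leaf("c2", 6)])
root = Node("S", [(0, 7)], [v1, v2])

hit = smallest_subtree(root, {0, 3})    # the cells of a1 and b1
```
]]>
 This top-down search visits the whole tree in the worst case, which illustrates why the text considers the operation potentially costly when the correspondence is stored only in the tree.</Paragraph>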
    <Paragraph position="19"> For these reasons, it may be worthwhile to consider the possibility of representing the SSTC independently of both the tree and the string. This is actually the idea behind the formalism of STCG (String-Tree Correspondence Grammar).</Paragraph>
    <Paragraph position="20"> The static grammars of (Vauquois &amp; Chappuy 85) are devices to define string-tree correspondences. They have been formalized by the STCGs of (Zaharin 86).</Paragraph>
    <Paragraph position="21"> Here, a context-free like apparatus of rules (also called &amp;quot;boards&amp;quot;, for &amp;quot;planches&amp;quot; in French, because they are usually written with two-dimensional tree diagrams) is used to construct the set of &amp;quot;legal&amp;quot; SSTCs. The axioms are all pairs (X,Y($F)), where X is an unbounded string variable, Y a starting node (standing for SENTENCE, or TITLE, for example), and $F is an unbounded forest variable.</Paragraph>
    <Paragraph position="22"> The terminals are all pairs (x,x'), where x is an element of a string and x' a one-node tree which represents it.</Paragraph>
    <Paragraph position="23"> The rules show how a SSTC is made up of smaller ones. The generated language is the set of all variable-free (&lt;string&gt;,&lt;tree&gt;) pairs derivable from an axiom by the grammar rules.</Paragraph>
    <Paragraph position="24"> In order to avoid undue formalism, let us give an example for the formal language {a^n b^n c^n | n&gt;0}.</Paragraph>
    <Paragraph position="25"> Rule R1: (a b c, S(a, b, c)) [The remaining rule diagram is not recoverable from the source.]</Paragraph>
    <Paragraph position="27"> X, Y and Z are string variables, $F a forest variable, and the indices are just there to distinguish elements with the same label.</Paragraph>
    <Paragraph position="28"> Actually, the formalism is a bit more precise and powerful, because it is possible to express that a correspondence in the r.h.s. (right hand side) is obtained only by certain rules, and to restrict the possible unifications (rather, a special kind called &amp;quot;identifications&amp;quot; in (Zaharin 86)). To illustrate this, we may rewrite the last element of the r.h.s. as:</Paragraph>
    <Paragraph position="30"> identifying X in XYZ with aX in aXbYcZ (in the l.h.s.).</Paragraph>
    <Paragraph position="31"> In the version of (Zaharin 86), the correspondence is always of constituent type, because the only applications considered had been to m-structures used for MT, where non-terminal nodes do not directly correspond to substrings.</Paragraph>
    <Paragraph position="32"> But this is by no means necessary, as the next example illustrates, with the language {a^n v b^n c^n | n&gt;0}.</Paragraph>
    <Paragraph position="33">  [Rule diagrams not recoverable from the source; the surviving fragment refers to the pair (aavbbcc, tree (15)).] But something has to be added to distinguish the STREE and SNODE parts.</Paragraph>
    <Paragraph position="34"> We simply associate to each constant or variable appearing in a STCG rule one or two expressions representing the STREE and SNODE sequences, separated by a &amp;quot;/&amp;quot; if necessary, with basic elements of the form &amp;quot;p_q&amp;quot;, where p and q are constant or variable indices. In any given (&lt;string&gt;,&lt;tree&gt;) pair, we associate one such expression to each element of &lt;string&gt;, and two to each node of &lt;tree&gt;, the first for STREE and the second for SNODE. The second may be omitted: by default, SNODE is taken to be empty on internal nodes and equal to STREE on leaves.</Paragraph>
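    <Paragraph> A reader for such annotations, including the default for an omitted SNODE, might look as follows. This is a sketch under assumptions: only constant indices are handled, and the plus sign used to join several basic elements inside one sequence is a hypothetical choice, since the text fixes only the slash and the p_q form.
<![CDATA[
```python
from typing import List, Tuple

Sequence = List[Tuple[int, int]]


def parse_sequence(text: str) -> Sequence:
    """Parse "p_q" elements joined by "+" (an assumed separator)."""
    text = text.strip()
    if not text:
        return []
    seq = []
    for part in text.split("+"):
        p, q = part.strip().split("_")
        seq.append((int(p), int(q)))
    return seq


def parse_annotation(text: str, is_leaf: bool) -> Tuple[Sequence, Sequence]:
    """Return (STREE, SNODE), applying the default when SNODE is omitted:
    empty on internal nodes, equal to STREE on leaves."""
    if "/" in text:
        stree_text, snode_text = text.split("/", 1)
        return parse_sequence(stree_text), parse_sequence(snode_text)
    stree = parse_sequence(text)
    return stree, (list(stree) if is_leaf else [])


verb = parse_annotation("0_1+2_4+5_6/2_3", is_leaf=False)
word = parse_annotation("3_4", is_leaf=True)
top = parse_annotation("0_7", is_leaf=False)
```
]]>
 Variable indices would require a symbolic representation in place of the integers used here.</Paragraph>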
    <Paragraph position="35"> Our last example may now be rewritten as follows.</Paragraph>
    <Paragraph position="36">  We will now give examples of STCGs which give rise to unnatural correspondences and try to derive some constraints on the rules. Let us first slightly modify [the modified rule diagram is not recoverable from the source]. In the first element of R2, XYZ has been replaced by ZYX. The following representation tree (16) would have been naturally associated with the string a1.a2.a3.b1.b2.b3.c1.c2.c3 by our first STCG. With this modification, it becomes associated with a1.c2.a3.b1.b2.b3.c1.a2.c3, as shown in the next diagram. [Tree (16) and the associated strings: diagram not recoverable from the source.] The problem here is that the subtree rooted at S.2, considered as a whole tree, should correspond to the string a2.c3.b2.b3.c2.a3, and that it corresponds to c2.a3.b2.b3.a2.c3 when embedded in the whole tree rooted at S.1.</Paragraph>
    <Paragraph position="37"> The STREE correspondences are not properly defined, because one should be able to distinguish between different permutations of the intervals, which is clearly impossible with our previous definitions and representations of SSTCs.</Paragraph>
    <Paragraph position="38"> This is because the order of the elements of the strings is not compatible in the l.h.s. and in the r.h.s.: our first constraint will be to forbid this in STCG rules.</Paragraph>
    <Paragraph position="39"> Our second constraint will be to forbid the use of auxiliary variables which do not correspond to substrings (subtrees) of the terminal (variable-free) pairs produced by the STCG.</Paragraph>
    <Paragraph position="40"> Let us illustrate this with the following STCG, which constructs the representation tree S(A(u),B(v)) for each word w on {a,b,c} of even length such that w=uv and |u|=|v|.</Paragraph>
    <Paragraph position="42"> [Figure 15: Example of STCG with auxiliary variables. The rules are not recoverable from the source.] There is a natural SSTC between the representation tree and the string. For example, we get S(A(a,b,c),B(b,a,c)) for w=abcbac. But the construction of this final correspondence involves the construction of pairs such as (abcPPP,S(A(a,b,c),P,P,P)), which are just used for counting.</Paragraph>
    <Paragraph position="43"> If we try to put sequence expressions on the P nodes and string elements, we notice that it would be necessary to extend the intervals of w, rather than to divide them. Otherwise, we would make the first P of abcPPP correspond to the second b of w=abcbac, which is quite natural, but what would we associate to the first P of bBcPPP ? If we represent explicitly (and separately) the structure of a given (&lt;string&gt;,&lt;tree&gt;) element of the SSTC by its derivation tree in the STCG, the second constraint will allow us to instantiate all variables by substrings or subtrees of &lt;string&gt; and &lt;tree&gt;, without having to construct other auxiliary strings and trees.</Paragraph>
    <Paragraph position="44"> This, of course, would permit a more economical implementation, in terms of space.</Paragraph>
    <Paragraph position="45"> Finally, note that the interesting properties of SSTCs mentioned in Part II above have simple expressions as constraints on the rules of our extended STCG formalism. CONCLUDING REMARKS Trees have been widely used for the representation of natural language utterances. However, there have been arguments saying that they are not adequate for representing the so-called &amp;quot;discontinuous&amp;quot; structures.</Paragraph>
    <Paragraph position="46"> This has led to various solutions, relying, for instance, on encoding the desired information in the nodes (e.g. &amp;quot;coindexing&amp;quot;), or on defining trees with &amp;quot;discontinuous&amp;quot; constituents.</Paragraph>
    <Paragraph position="47"> We have presented here a proposal for representing discontinuous constituents, and, more generally, non-projective and incomplete SSTCs with overlapping.</Paragraph>
    <Paragraph position="48"> The proposal uses the ordinary definition of ordered trees. This is made possible by separating the representation tree from the surface utterance (which the tree is a representation of). The correspondence between the two may be represented explicitly by means of sequences of intervals attached to the nodes.</Paragraph>
    <Paragraph position="49"> This opens up a discussion on (and definitions of) structured string-tree correspondences in general. This representation might also be used in syntactic editors for programs or in syntactico-semantic editors for NL texts.</Paragraph>
    <Paragraph position="50">  Finally, the formalism of the String-Tree Correspondence Grammar has been extended to give the means of representing the said structured correspondences.</Paragraph>
    <Paragraph position="51"> An analogous problem is to define structured correspondences between representation trees, for instance between source and target interface structures in transfer-based MT systems. We do not yet know of any satisfactory proposal.</Paragraph>
    <Paragraph position="52"> A solution to this problem would give two very interesting results: - first, a way to specify structural transfers in a reasoned manner, just as STCGs are used to specify structural analysers or generators; - second, a way to put a text and its translation in a very fine-grained correspondence. This is quite easy with word-for-word approaches, of course, and also for approaches using classical (projective) PS trees or dependency trees, but has become quite difficult with more sophisticated approaches using p-structures or m-structures.</Paragraph>
  </Section>
</Paper>