REPRESENTATION TREES AND 
STRING-TREE CORRESPONDENCES 
presented for COLIN6-88 
Budapest, 22-27 August 1988 
by 
Ch. B01TET & Y.ZAHARIN 
GETA, BP 68 
Urliversit@ de Grenoble et CNRS 
38402 Saint-Martin-d'H@res, FRANCE 
PTMK, Universiti Satns Malaysia 
11800 Penang, MALAYSIA 
AB~AC E 
The corresponderlce between a string of a language and 
its abstract representation, usually a (decorated) tree, 
is not Straightforward. Ilowever, it is desirable to 
maintain it, for Example to build structured editors for 
tex ts wr 1 t t El/ i n nat urn 1 Ianguage. AS such 
ccr'resp)ndences must be compos 1 t iona\] , we ca \] I ~hem 
"Structured Strmg--lree Correspondences" (SSTC). 
We ~jrgue that a SSTC is m fact composed of two 
mterrelated correspondences, one between nodes and 
substr ings, and the other between subt tees and 
substrings, the substrings being possibly discontinuous 
in both cases. We then proceed to show how to define a 
SSTC witl~ a Structura! Correspondence Static Grammar 
(SCSG), and ~qich constraints to put on the rules of the 
SCSG to get a "natural" SSTC. 
Kev~d'~ : linguist ic dascr lpt ors, distort inuous 
consti tuents, discont imuous phrase structure grammars, 
st rLICt ured str ing- tree correspondences, structural 
corrosp:)ndence static gralilnlars 
t~t~),&~D~: DPSG, M\], N\[., SSIC, STCG. 
~U.¢3JLQ_N 
Ordered trees, annotated with simple labels or COmplex 
'cecora~ions" (property lists), are widely used for 
representing natural language (NL) utterances. This 
oErresponOs to a hierarchical view: the utterance is 
decomposed into groups and subgroups. When the depth of 
lmguiscic analys~s is suc~ that a representation m 
terms of graphs, networks or sets of formulas would l)e 
more Jirect, one often st i \] I prefers to use tree 
structures, at the price of encoding the desired 
informa::ion in the decorations (e g., by "ooindexing" two 
or more nodes). This is because trees are conceptual\]y 
and a\]gorithmical\]y eas~er to manipu\]ate, and also 
because all usua\] interpretations based on the linguistic 
structure are more or less "compositiona\]" in nature. 
If a language is described by a classical Phrase 
Structure Grammar, or by a (projective) Dependency 
Grammar, the tree structure "contains" the associated 
string in some easily defined sense. \]n particular, the 
surface order of tile string is derived from some ordered 
traverse1 of the tree (left--to-right order of the leaves 
of a constituent tree, or infix order' foe a dependency 
tree). 
However, if one wants to associate "natural" 
structures to strings, for examole abstract trees for 
programs or predicate-argument structures for NL 
utterances, this is no longer true. Elements of the 
string may have been erased, or duplicated, some 
"discontinuous" groups may have been put together, and 
the surface order may not be reflected in the tree (e.g., 
for' e normalized representation). Such correspondences 
must be compositional: the complete tree corresponds to 
the complete string, thee subtrees correspond to 
suPstrings, etc. Hence, we call them "Structured 
String-tree Correspondences" (SSTC). 
For some applications, like classical (batch) Machine 
Translation (MT), it is not necessary to Keep the 
correspondence explicit: 'For revising a translation, it 
is enough to show the correspondence between two 
sentences or two paragraphs. 14owever, if one wants to 
build structured editors for texts written tn natural 
language, thereby using at the same time a string (the 
text) and a tree (its representation), it seems necessary 
to represent explicitly the associated SSTC. 
In the first part, we briefly review the types of 
string-tree correspondences whloh are implied by the most 
usual types of tree representations of NL utterances. We 
argue that a SSTC should in fact be composed of two 
interrelated correspondences, one between nodes and 
substrings, and the other between subtrees and 
substrings, the substrings being possibly discontinous m 
both cases. This is presented in more detail in the 
second part. \]n the last part, we show how to define a 
SSTC with a Structural Correspondence Static Grammar 
(SCSG), and which constraints to put on the rules of the 
SCSG to get a "natural" SSTC. 
\[. ~CRR~N~..E,_j~TWEEN A STRIN~ 
1. p~F~E ~TRUCTURE TREES (C-STRUCTURESI 
Classical Phrase Structure trees give rise to a very 
simple Kind of SSTC. To each string w = al...an, let us 
associate the set of interva\]s i j, O~i~j~n. w(i j} 
denotes the substring ai...a3 of w if i<o, 6 otherwise. 
The root, or equtva\]ently the M\]o\]e tree, corresponds 
to w = wl0n). Each \]ear corresponds to some substring 
w(i j), of length 0 or 1 (we may extend this to any 
lengtm if terminals are allowed to be themselves Strings 
Then, the correspondence is such that any internal node 
of the tree, or equivalently each tree "complete" m 
breadth and depth, COrrespondS tO w(i.j), iff its m 
daughters (or' its m immediate subtrees), in order', 
correspond to a sequence w{iL_jl),...,w(im gm\], 
such that i1=i, jm=j, and jk=ik*l for O<k<m. 
This type of correspondence is "prooectwe" It has 
however Peer\] argued that classical phrase structure trees 
are maaequate for" charaoterising syntactic 
representations in genera\], especially in the ease of 
so-oat\]ed "discontinuous"constttuents. Here are some 
examples. 
(1) John Z lkiL_\[k~C~, of course, ~j~. 
(2) He ~ the ball £/{Q. 
(3) Je ~ le lui al ~ donn@. 
(I did not give it to him) 
According tO (McOawley 82), sentence (1) contains a 
verb phrase "talked about politics", wlnich is divided by 
the adverbial phrase "of course", which modifies the 
#~ole sentence, end not only the verbal kernel (or the 
verbal phrase, in ChomsKy's terminology). Sentence (2) 
contains the particle "up", whtoh ls separated from its 
verb "picKed" by "the ball", In sentence (3), the 
discontinuous negation "ne..,pas" overlaps with the 
composed form of the verb "ai...donn~". Moreover, i{ a 
sentence in active voice ls to be represented in a 
standard order (subject verb object complement), this 
sentence contains two displaced elements, namely the 
object "le" and the complement "lui". 
(McCawley 82) and later (Bunt & al 87} have argued 
that "meaningful" representations of sentences (2) and 
(3) should be the following phrase structure trees, (4) 
and (5), respectively. 
59 
+ ....................................................... + 
S (4) S (5) ! 
! ! I I 
! ! ! ! VP ! 
! ~ ! ! ! I I 
! ! ! ! ! V I __ ! 
! NP ! VP ! ) NP ! ! 
I I ~1 I ; I I I I 
! ! ! ! ! ! ! ! ! I ! 
! ! V ADVP PP He picked the bali upl 
! ! ! ! ~ I 
! ! ! ! ! ! ! ! 
!John talked of course about politics I 
....................................................... ÷ 
Figure l: Examples of discontinuous phrase structure 
trees 
Along the same line, and taking into consideration the 
displaced elements, a "meaningful" representation for 
sentence (3) would be tree (6). 
+ ........................................................ 
S (6) 
NP ! VP 
i ! ! ! ! 
I V NP NP 
! ! ! ! ! I 
! ! ! ! ! 
! NEG! ! ! ! 
! ~ ! ! ! ! ! 
de ne le lui ai pas donne ÷ ....... ~ ............................................... 
Figure 2: Example of discontinuity and displacement 
here, the correspondence is establ iehed between a node 
(or equivalently the complete suDtree rooted at a node) 
and a sequence of intervals. If a displacement arises, ee 
in (3), the left-to-right order of nodes in the tree may 
be incompatible with the order of the corresponding 
sequences of intervals in the strtng (the considered 
ordering is the natural lexioographic extension). 
Rather than to introduce the awkward notion of 
"discontinuous" tree, as above, with intersecting 
branches, we suggest to keep the tree diagrams in thelr 
usual form and to show the string separately. For 
sentence (3), then, we get the following diagram. 
. ....................................................... 
! S ('/) 
q 
! ! ! I 
! NP NEG __.VP _ 
! ! ! ! ! ! I 
! Je ne pas V NP NP 
! : : : ! ! ! I 
! . : : : .................. ai ........ donne le lul 
! : : : : : : ', 
! : : .................... : ........ : ................ : ................ : : 
! : : : .......... : ........ : ................ : ............................ : \] , : ' : : : : 
! de ne le lui ai bae donn6 
+ ....................................................... 
Figure 3: Separation of a string and its "dlsoontlnou8" 
PS tree 
NOw, as before, the root of the tree still corresponds 
to w=w(0_n\], and a leaf corresponds to an interval of 
length O or 1 (or more, see above). But an internal node 
with m daughters corresponds to a sequence of intervals, 
~hich' is the "union" of the m sequences corresponding to 
Its daughters. 
More precisely, a "sequence" of Intervals is a llst of 
the form S = w{il_jl) ..... wlip_jp}, in order (Ik<Ik+1 for 
O<K<p) and without overlapping (jk<ik+1 for O<k<p). Its 
union (denoted by "+") with an interval I = w(i j} is the 
smallest list containing all elements of S and of I. For 
example, S+I is: 
S itself, if there is a k such that ik<t and j_<jk; 
S, augmented with wii J} inserted in the proper place, 
if j<il or jp<i or there is a k<p such that Jk<i and 
j<i K* 1 ; 
60 
w{ll_jl} ..... w{tq_jq}.w{i_Jr} ..... w(lp_jp}, if there 
are q and r such that Jq<t~lq+l and tr~j~jr (other 
cases are analogous), 
2. DEPENDENCY TREES (F~S~RUCTUR~) 
In classical dependency trees, elements of the 
represented string appear on the nodes of the tree, with 
no auxiliary symbols, except a "dummy node", often 
indicated by "=", which serves to separate the left 
daughters from the right daughters. 
There are two aspects in the correspondence. First, a 
node corresponds to an element of the string, usually an 
interval of length 1. Second, the complete subtree rooted 
at a node corresponds to the tnterval union of the 
intervals corresponding tO the node and to Its subtree. 
These intervals may not overlap. 
The string can be produced from the tree by an tnorder 
traversal (one starts from the root, and, at any node, 
one traverses ftrst the trees rooted at the left 
daughters, then the node, then the trees rooted at the 
right daughters, reeursively). 
Sentences (1) and (2) might be represented by trees 
(8) and (9) below. 
+ ......................................................... ÷ 
! talked (8) picked (9) ! 
! ! I ! 
! t ! ~ ! ! ! ! ! I 
!John ' of__ about He = _ball up ! 
ISUBJ : ADVS I OBJ1 ! SUBJ : l OBJ1PTC! : : ! ! ! ! : : ! ! : ! 
: : = course = politics : : the " : ! 
: : : : : .... : ; : DES : : t ; : : : : : . . : . : ! 
: : : : : : Hi picked the bail up ' 
• , , : , , ! 
IJohn taiKed of course about polltics ! ÷ ......................................................... ÷ 
Figure 4: Examples of classical dependency trees 
\]n those trees, the discontinuities shov;n tn the PS 
trees (4) and (5) have disappeared. We have shown on some 
nodes the syntactic functions usually attached to the 
edges. 
There may be some discussion on the structures 
produced. For example, some linguists would rather" see 
"politics" dominating "about". This tS not our" tOpiC 
here, but we wtll use this other possibility in a later 
diagram. For the moment, note that discontinuity does 
not always disappear in dependency trees. Here is an 
example corresponding to sentence (3). 
+ ........................................................ 
I aonn6 (10) 
I ! 
!! ! ! I I ! 
I de ne le Jut at = 
ISUBJ NEG OBJ1 OBJ2 AUX : 
: ! ! : : : 
: = pa8 : : : 
: : NEG2.,..: .......... : .......... : .......... : : : : : : ; : 
• : , , : : . 
j; ne le 1;t at pas d&n6 
Figure 5: Example of a "dtsoonttnous" dependency tree 
Let us now take a simple example from the area of 
programming languages, ~¢nioh $he~ an abstract tree 
associated to an assignment, ~here some elements of the 
string are "missing" in the tree, and where a node 
oorreeponds to a "discontinuous" substring (a sequence of 
intervals). 
÷ .......................................................... 
! if then_else (01+2_3+6_7) (11) 
! i 
! ! ! ! 
! ok =: (4 5) =: (8_9) 
! 12 _ ! i f ! ! ! ! 
! a x " (10 11) x 
! 56 34 ! ?_8 
! a + (13_14) 
! 910 ! ! ! ! 
! b 0 
! 12_13 1415 ! 
! if ok then x := a else x := a ~ ( b + e ) ! 
!01_23456"/_89_10 1_12_13_14_15_16 ÷ ......................................................... 
Figure 6: Example of "abstract" tree for a formal 
language expression 
Here, we have shown the correspondence between nodes 
and sequences. The parentheses are mlsstng in the tree, 
wtqich means that the sequence corresponding to the 
subtree rooted at node "+" is more than the union of the 
sequences oorrespondfng to its subtrees. However, there 
is no overlapping between sequences corresponding to 
independent nodes or suPtrees. 
Anoeher remark is that the elements appearing on the 
nodes are not always identical with elements of the 
represented string. FOr example, we have replaced ":=" by 
"=. " ~nd the (discontinuous) substring "if then else" by 
"if thE.m else", in a usual fashion. 
3. P_RED OATE-ARGUMENT TREES (P-STRUCTURES) 
In "predicate-argument structures", it is usual to 
construct a unique node for a compound predicate, in the 
same ~;pirit as the "if_then_else" operator above, With 
sentences (1) and (2), for example, we could get trees 
(12) and (13) below. Beside the logical relation 
(argument place) or the semantic relation, the nodes must 
also contain some other information, like tense, person, 
etc,, ~hich is not sho~n here. 
! __~! I I 
! ! ! ! I I 
! John of course politics He ___ball I 
! ARGO ESTII~ __ ARG1 ARGO ! ARe1 I 
! 0 1 2 4 \[ 5_6 0 1 the 3 4 I 
! about 2_3 I 
TOPIC ! 
! 4_5 He picked the ball up I 
! 01 2345 I 
t I 
! John talked of course about poltttcs I 
!0 I 23 4 .5__6 I ÷ ........................................................ ÷ 
Figure ?; Examples of predteate-argumont trees 
We now come to Situations where overlapping occurs, 
and ~r}re It ts natural to consider "tnooaplete" subtree8 
corresl)ondtng to "dlsco~ttnous" groups. 
Thhl occurs frequently in eases of coordination with 
elision, as tn: 
"John and Mary give Paul and Ann trousers and 
Cresses." 
In order to simplify the trees, ~ abstract this by 
the f{)rma\] language {an v bn on t n>O}, and propose the 
two i:rees (14) and (15) below for the string 
"a a v b b c c" (also written a. 1 a.2 v b. 1 b.2 e.l c.2 
to sl~L}w the positions) as more "natural" representations 
than i:he syntactic tree derived from a context-sensitive 
grammar in normal form for this language (all rules are 
of the form "1A r --~ 1 u r", 1 and r being the left and 
right ~:ontext, respectively). 
........................................................ + 
(14) V (0_7/2_3) V (0_7/2_3) (15) 
I 
! I ! I ! I ! 
A (0 Z) g (3_5) C (5_7) a.1 b.1 c. 1 V (1_3.4_5÷6_*/) 
t I ! 0_13_45_6 __! /23) 
I ! ! ! ! ! P I 
la. 1 A b. 1 B c.1 C (6_*/) a.2 b.2 c.2 
\]0_1 ! 8_4 I 5_6 I 12456_*/ 
a.2 b.2 e.2 
l_Z 4_5 6_*/ 
a a v b b c c 
O__ 1 __2__.3__4__5 L7 
Figure 8: Examples of p-etructures for al a2 v bl b2 cl 
c2 
On certain nodes, we have represented the sequence 
corresponding to the complete 8ubtree rooted at the node, 
fel \]owed by the sequence Corresponding to the node 
itself. For nodes A, B, C in tree (14), this "local" 
8equanoe ts empty. 
In both trees, tt i8 clear that the sequence al V bl 
ol corresponds to an "incomplete" subtree, namely 
V(A(al),B(bl),C(cl)) In (14) and V(al,bl,cl) in (15). 
In tree (14), the cOOrdination is shoal directly on 
the graph, and the verb (V) is not shown as elided. \]t is 
a matter of further analysis to accept or not the 
distributive Interpretation ("respectively" may hold 
between the three groups, the last two ones, or nones). 
On the contrary, tree (15), in a sense, is a more 
"abstract" representation. It shows directly the 
interpretation as a coordination of two sentences, and 
"restores" the elided V. 
4, MULTILEVEL TREES (M-STRUCTURES) 
Multilevel tree structures, or m-structures for short, 
have been introduced by B.VAUQUOIS in 19.//4 (see (Vaupuols 
*/8)) for the purposes of Machine Translation. On the 
same graph, three "levels of interpretation" are 
described (constituents, syntactic dependencies, logical 
and semantic relations). AS seen in other examples 
above, the nodes ~¢nich refer directly to the string do 
not contatn elements of the string, but rather 
representatives of (sequences of) elements of the string, 
called "lexical units" (LU), like "repair" for 
"reparation", plus some information about the derivation 
used. 
The graph is deduced by simple rules from a dependency 
tree: each tnternat node t8 "lowered" tn the "'" position 
and its syntactic function becomes "GOV" (for "governor", 
or head in some other terminology), discontinuous lexical 
elements (like "ne...pas" or "al...denn~" are represented 
by one node, coordination ts represented by "vertical 
ltsts" as tn tree (14), lextoal units of referred 
element~ are put In the nodes corresponding to the 
pronouns, an approximation of colndexlng, etc.. 
From the point of view of the associated 
correspondence between representation trees and 
represented 8trtngs, nothing new has to be mentioned. 
II. A PROPOSAL: STRUCTURED STRING-TREE CORRESPONDENCES 
Our proposal Is now almost complete. 
I. DEFINITIONS 
a) The correspOndence between a string and its 
representation tree ts made of two interrelated 
correspondences: 
between nodes and (possibly discontinuous) 8ubstrings; 
between (possibly Incomplete) subtrees and (possibly 
dtsconttnous) substrlngs. 
gl 
b) It can be encoded on the tree by attaching to each 
node N two sequences of intervals, called SNODE(N) and 
SIREE(N), such that: 
1. SNODE(N) ~ STREE(N), v~ich means that SNODE(N) 18 
"contained" tm STREE(N) with respect to tts basic 
elements (the w(i j}), that is, that 
StREE(N) = STREE(N) g SNODE(N). Note that equality 
can not be required, even on the leaves, because 
the string "( b )" may well have a representation 
tree with the unique nede b. 
2. if N has m daughters Nt...Nm, then 
STREE(N) ~ STREE(N1)+...+STREE(Nm) + SNODE(N). \]n 
case of strict containment, the difference 
correspond to the elements of the string which are 
represented by the subtree but which are not 
explicitly represented, like "(l' and ")" in 
"( b )". 
c) The sequence SSUBT(X,N) corresponding to a given 
mcomplete subtree X rooted at node N of the whole 
tree T is defined recursively by: 
SSUB1(X,N) : STREE(X) if X : N, that is, if × iS 
reduced to one node, not necessarily a leaf of T; 
SSUBT(X,N) = SSUBT(XI)*..I+SSUBT(XD) U SNODE(N). if N, 
the root of X, has p subtrees XI...XD in T. 
In other words, one takes the smallest sequence 
contaming the bi9gest sequence corresponding to the 
leaves of x (S\]REE on the leaves) and compatible with 
the monotony rules above. 
Here are some interesting properties of SSTCs which 
may help to classify them. 
A SSTC iS ~ if 
STREE(NI) and STREE(N2) have an empty intersection If 
N1 and N2 are independent; 
SNODE(N1) and STREE(N2) have an empty intersection if 
N2 is a daughter of NI. 
A SSTC is eB_~~eta~_t&~ if 
it is non-overlapping; 
for any two sister nodes N1 and N2, N1 to the left of 
N2, STREE(N1) is completely to the left of STREE(N2). 
This means that, 
if STREE(N1) = w(il_jl},..w(ip_jp) or 
and STREE(N2) = w(kl_ll}...w(kqlq} or ~, 
then jp~kl. 
A SSTC is irg_LE~\[ if. for each elementary interval 
w(i_i-1), there is a node N such that 
SNODE(N) = w{i i+1). 
A SSTC is gQEP_Lg_C~ if each elementary interval Is 
contained in SNODE(N) for some node N. 
A SSTC is of the g~ if SNODE(N) is empty 
for each non terminal node N. 
3. ~_R~PRE~SNTATION 
In the examples above, we have encoded the 
correspondence in the tree. However, this is in practice 
not always necessary, or even practical. 
in the case of explicit and projective SSTCs, for 
instance, the string can De obtained directly from the 
tree, and there is no need to show the intervals, 
Note that, in the process of generating a string from 
a tree, one naturally starts from the top, not knowing 
the final length of the string, and goes down 
recurs \]rely, dividing this i nt erv~a \] into smaller 
intervals. Rather than to introduce variables 
representing the extremities of the created intervals, it 
may be more practical to start from a fixed interval, say 
0_1 or 0 lO0. Yhen. the Positions between the elements of 
G2 
the string will be denoted by am increasing sequence of 
rational numbers (0, 1/3, 1/2, 5/?), etc. 
In the case of "local" non-projectivtty, we have tried 
some devices using two relative integers (POS,LEV) 
associated with each node N. POS(N) st~ws the relative 
order in the subtree rooted at mother(N), if LEV(N)=O, or 
more generally at tts LEV(N÷I) ancestor, if t.EV(N)>O. 
Unfortunately, all these schemes seem to work only for 
particular situations. 
Also, if the SSTC is overlapping, or' not complete, 1( 
may be computationally costly fo find the (sma\]lest) 
subtree associated with a given (possib\]y discontinuous) 
substrtng. But this operation would be essential in a 
"structural" editor of NL texts. A possibility is then 
to encode the correspondence both in the tree and in the 
string. 
Finally, take the example of tree (15) above. Suppose 
that the user of a NL editor wants to cllange bl (Paul, in 
the corresponding NL example) in a way v~Hch may 
contradict some agreement constraint between al, v, bl 
and el. One should be able to ftmd the smallest SSIC 
containing al and other elements, that is, the subtr'ee 
V(al,bl,cl) and the discontinuous substring al v bl cl 
(the notation a..v.b..c., might be suitable, if one 
wants to avoid indices). 
For these reasons, it may be werth~qile to COnSider 
the possibility of representing the $gTC independently of 
beth the tree and the string. This is actually the ldea 
behind the formalism of gTCG (String-Tree Correspondence 
Grammar). 
The static grammars of (Vauquols & Chappuy 85) are 
devices to define string-tree correspondences. They have 
been formalized by the STCGs of (Zahartn 86). 
Here, a context-free ltke apparatus of rules (also 
called "boards", for "planches" in French, because they 
are usually written with two~dtmenslonal tree diagrah~s) 
is used to construct the set of "legal" SSYCs. 
The axioms are all pairs (X,Y($F)), where X is an 
unbounded string variable, Y a starting node (standing 
for SENTENCE, or TITLE, for example), and SF is an 
unbounded forest variable. 
The terminals are all pairs (x,x'), where x is an 
element of a strtng and x' a one-node tree vZ~ich 
represents it. 
The rules chow how a SSTC t8 made up of smaller' ones. 
\]he generated language ts the set of all variable-free 
(<strlng>,<tree>) pairs derivable from an axiom by the 
grammar rules. 
In order to avoid undue formalism, let us give an 
example for the formal language (an bn cn I n>O). 
IRule RI: (@ b c , S(a0 b, c)) w 
t ==> ! 
I (a,a) (b,b) (c,c) ! 
IRule R2: (a X b Y c ~ , S.l(a, b, c, S.2($F)) ! ! ~=> ! 
! (a,a) (b,b) (e,c) (X Y Z, 5.2($F)) ! + ................................................................. 
Figure 9: A slmp\]e SCSG for an bn-cn 
X, Y and Z are string variables, SF ~ forest variable, 
and the indices are Just there to distinguish elements 
with the same label. 
Actually, the formalism is a bit more precise and 
powerful, because it is posslb\]e to express that a 
correspondence in the r.h.s. (right hand side) is 
obtained only by certain rules, and to restrict the 
possible unifications (rather, a sparta1 Ktnd called 
"identifications" in (Zaharim 86}). 1'0 illustrate this, 
we may rewrite the last element of the r.h.s, as: 
+ ................................................................................ + 
! (X Y Z, S.2($F)) !.. 
i with ref ! 
\[ "(RI: X/a, Y/b, Z/e, S 2/S ! 
! !R2: X/aX, Y/bY, Z/cZ,' $F/(a,b,c,S.2($F))) ! ÷ .............................................................................. ÷ 
Figure IO Exarllple of with r'ef cart in a r.h.s. 
R2: X/aX,... means that the subcorrespondenoe 
(XYZ,S.2($F)) may be generated by rule R2, thereby 
identifying X in ×YZ with ax in a×bYeZ (in the \].I~,s.). 
In the ver'sTon of (Zaharin 86}, the correspondence is 
alv~,ays oF cor~st ituent type, because time only appl teat tons 
considered had been to m-structures used for L4T, where 
non--terminal nodes do not directly correspond to 
subst rings. 
But tills is by no means necessary, as the next example 
illustrates, with the language (an v bn cn 1 n>0). 
+ ................................................................................. ÷ 
!Rule RI: (a b C , V(a, b, C)) ! 
! ==> I 
! (a,a) (v,V) (b,b) (c,c) ! 
• i ........................................................................... .j 
!Rule R2: (a x v b Y o Z , V.l(a, b, c, V.2($F)) I ! ==> ! 
! (e,a) (v,V.1) (b,b) (c,c) (X v Y Z, V.2($F)) ! 
! with ref ! 
! (RI: X/a, Y/b, Z/C, V.2/V, SF/(a,b,c) ! 
! !RE: X/aX, Y/bY, Z/cZ, V.2/V.1, $F/(a,b,c,V.2($F))) ! 
F:iguro ll: SI'CG for an v bn on giving tree (15) 
II\] i S STCG generates correspondences such as 
(aavbbco. tree (15)). But something has to be added to 
dist ingu ish the STREE and SNODE parts. 
We simply associate to each constant or" variable 
appearing in a STCG rule one or two expressions 
represem ing the STREE and SNODE sequences, separated by 
a "/" if necessary, with basic elements of the form 
"p_q", ~.~here p and q are constant or" variab\]e mdtces. 
In any given (<string>,<tree>) Dair, we associate one 
such expression to each element of <string>, and two to 
each node of <tree>, the first for STREE and the second 
for" SN0bE. The second may be omitted: by default, SNODE 
is taken to be empty on internal nodes and equal to STREE 
on leaves. 
Our last example may now be rewritten as follows. 
+ ..................................................................... + 
!Rule RI:V 
(t0_t2 +i3_i4+i5_i6/il i2) 
( i ) 
a lvi2 b o a b c tO tl i _ i3j415_i6 i0_tl t3 i4 t5_i6 
==> 
(a,a), (b,b), (c,o) 
IRule R2: b 
( a x v b Y c z t 
i0_11 il i2 i2_13 t4_I5 i5 i6 t7_i8 t8_19 I i 
V,1 (t013+i4 I6+i7_t9/12 i3) i 
I i 
+ I t ! ) q 
a b c V.2 i 
lO_il i4 t5 i7 i8 (11 t3+i5 i6÷i7 i9/iE, i3) i 
! i 
SF ~:=> 
i 
(a,a) (b,b) (v,V.1) (c,c) (x v Y Z , V.2) I 
I i 
SF i 
wtth Per 
(RI: X/a, Y/b, Z/c, V.2/V, $F/(a,b,o) i 
IR2: X/aX, Y/bY, Z/cZ, V.Z/V. 1, $F/(a,b,c,V.2($F))) 
Flgure 12: Extended STOG for an v bn cn 
We will now give examples of STCBs which give rise to 
unnatural correspondences end try to derive some 
constraints on the rules. Let us first slightly modify 
our first STCG for an bn on. 
÷ ................................................................ + 
IRule RI: (a b C , S(a, b, c)) ! 
t ~=> ! 
! (a,a) (b,b) (c,c) ! 
!Rule R2: (a Z b Y c X , S,l(a, b, c.S.2($F)) + ! ==> ! 
! (a,a) (b,b) (c,c) (X Y Z, S.2($F)) ! + ........................................................... + 
Figure 13: Example of "unordered" STCG 
In the first element of R2, XYZ has been replaced by 
ZYX. The following representation tree (16) would have 
been naturally associated with the string 
al.aZ.a3.bl.bE.b3.cl,cE,o3 by our first STCG. With this 
modtPlcation, it becomes associated with 
a1.o2.a3.bl.bE.b3.cl.a2,03, as sho~ in the next diagram. 
........................................................... + 
s,1 (09) (16) ! 
! I t ! 
a.1 b. 1 c.I S.2 (1_3+4_6÷? 9) 
01 34 67 ! I 
! ! ! ! 
a.2 b,2 c,2 S.3 (2~3÷5_6÷8_9) 
78 45 12 I 
___} ..... 
a.3 b.3 c.3 
23 5_6 8_9 
al ¢Z a3 bl b2 b3 ol a2 c3 
0__,__ 1 23__4 __5__6__ ? __8+9 
................................................................. ÷ 
Figure 14: Example of STC6 "unordered" w.r.t, the 
etr lngs 
The problem here is that the subtree rooted at S,2, 
considered as e whole tree, should correspond to the 
strtng a2.c3.b2.b3.c2.a3, and that it corresponds to 
02.a3.b2.ba.a2.c3 when embedded in the whole tree rooted 
at S,1. 
The STREE Correspondences are not properly def ined, 
because one should be able to distinguish between 
different permutations of the Intervals, which is clearly 
{,3 
impossible with our previous definitions and 
representations of SSTCs. 
This is because the order of the elements of the 
strings is not compatible in the l.h.s, and in the 
r.h.s.: our first constraint will be to forbid this in 
STCG rules. 
Our second constraint will be to forbid the use of 
auxiliary variables which do not correspond to substrlngs 
(subtrees) of tme terminal (variable-free) pairs produced 
by the STCG. 
Let us illustrate this witl~ the following STCG, which 
constructs the representation tree S(A(u),B(v)) for each 
word w on (a,b,e) of even length such that w=uv and 
MU=NV. 
+ ....................................................... + 
!Rule RI: ( x P , S(A(x),P) ) ! ! ==> ! 
' (x,x) (P,P) -- x 6 (a,b,c) ! 4 ....................................................... 
!Rule R2: ( x X Y P, S(A(x,$L),$M,P) ) ! 
i ==> ! 
\] ......... ......................... 4 
!Rule RS: ( x Y , S(A($L),B($M)) ) ! 
i ==> ! 
( X Z , S(A($L),$F) ) ! 
L ( Y Z , S(A($M),$F) ) ! + ....................................................... + 
F~gure 15: Example of STCG with auxiliary variables 
There is a natural SSTC between the representation 
tree and the string. For example, we get 
S(A(a,b,c),B(b,a,c)) for w=abcbac, But the construction 
of this final correspondence involves the construction of 
pairs SUCh as (abcPPP,S(A(a,b,c),P,P,P)), w~ich are just 
used for counting. 
If we try to put sequence expressions on the P nodes 
and string elements, we notice that it would be necessary 
to extend the intervals of w, rather than to divide them, 
Otherwise, we would make the first P of aDoPPP correspond 
to the second b of w=abcbac, which is quite natural, but 
what would we associate to the first P of bBcPPP ? 
\]f we represent explicitly (and separately) the 
structure of a given (<string>,<tree>) element of the 
SSTC by its derivation tree in the STCG, the second 
constraint will allow us to instantiate all variables by 
substrings or subtrees of <string> and <tree>, wtthout 
having to construct other auxiliary strings and trees. 
This, of course,' would permit a mope economical 
~mplementation, in terms of space. 
Finally, note that the interesting properties of SSTCs 
mentioned in Ill.l above have simple expressions as 
constraints on the rules of our extended STCG formalism. 
CONCLUDIN6 R~MARK~ 
Trees have been widely used for the representation of 
naturat language utterances. However, there have been 
arguments saying that they are not adequate for 
representing the so-called 'discontinuous' structures. 
This has led to various solutions, relying, for instance, 
on encoding the desired information in the nodes (e.g. 'eoindexin9"), 
or on oefining trees with "discontinuous" 
const i tuents. 
We have presented here a proposal for representing 
discont inuous constituents, and, more generally, 
non-projective and uncomplete SSTCs with overlapping. 
The proposal uses the ordinary definition of ordered 
trees. This is made possible by separating the 
representation tree from the surface utterance (which the 
tree is a representation of). The correspondence between 
the two may be represented explicitly by means cf 
sequences of intervals attached to the nodes. 
This opens Up a discussion on (and definitions of) 
structured string-tree correspondences in general. Thls 
representation might also be used in syntactic editors 
for programs or In syntact~co-semanttc editors for NL 
texts. 
64 
Finally, the formalism of the String-Tree 
Correspondence 6rammar has been extended to glve the 
means of representing the said structured 
correspondences. 
An analogous problem is to define structured 
correspondences between representation trees, for 
lnstanoe between source and target interface structures 
in transfer-based MT systems. We do not yet know of any 
satisfactory proposal. 
A solution to this problem would give two very 
Interesting results: 
- first, a way to specify structural transfers in a 
reasoned manner, just as STCGs are used to specify 
structural analysers or generators, 
second, a way to put a text and its translation in a 
very fine-grained correspondence. This is quite easy 
with word-for-word approaches, of course, and also for 
approaches using classical (projective) PS trees or 
dependency trees, but has become qutte difficult with 
more sophisticated approaches using p-structures or 
m-structures. 

References

{Bunt & a\] 87) H.BUNT, J.THESINGH & K. VAN PER SLOOT 
(1987) 
Discontinuous ognstttuents ~n .~ ~ 
~a~J~_g 
Proc. 3rd Conf. ACL European Chapter, Copenhagen, 
April 1987. 

{McCawley 82} J.D. MCCAWLEY (1982) 
p~renthettoal and discontinuous 
r~ 
Linguistic inquiry 13 (1), 913106, 1982. 
constituent 

{Vauquois 78\] B.VAUQUOIS (1978) 
Description £Le\]_~ ~;~ Jnterm6diatre 
Communication pr~sent~e au colloque de Luxembourg, 
April 1978, BETA document, Grenoble. 

(Vauquois ~ Boiler 85} 8.VAUQUOI8 & CH.BOITET (1985) 
~ranslat~cn6$.GETA ~ University) 
Computational Linguistics, 11:1, 28-36, January 
1985. 

{Vauquois & Chappuy 85) B.VAUQUOIS & S.CHAPPUY (1985) 
~ crammers 
Proc. Conf. on theoretical & methodological issues 
in MT, Colgate Univ., Hamilton, N.Y., August 1985. 

(Zahartn 86\] Y.ZAHARIN (1986) 
t rB_~Leg.t~ a~ ~ m t~e n \]~v.~j~ ~. tD_~Z~! 
nlao.qua~ m I~Lr~ ~ 
Ph.D. Thesis, Untverstti Salns Malaysia, March 1986 
(Research conducted under GETA-USM cooperation GETA 
document, Grenoble. 

(Zahartn 87a} Y.ZAHARIN (1987) 
Strina-Tree Correspondence ~ 8 
r~ ~ gg_f~ tr~ corresoondence I~ 
of ~ and tree structures? 
Prec. 3rd Conf. ACL European Chapter, Copenhagen, 
April 1987. 

{Zaharin 8?b} Y.ZAHARIN (1987) 
The ~ ~ at FE..TAz = ~v_qQa~L~ 
the journal TECHNOLOGOS (LISH-CNRS), prlntemps, 
1987, Paris. 

(ZaJac 86\] R.ZAJAC (1986) 
SCSL; ~ 1i~ sDeclflcatton n.~ :EO£ 
Proc. of COLING-88, IKS, 393-398, Bonn, August 
Z5-29, 1986. 
