LINGUISTIC DEVELOPMENTS IN EUROTRA SINCE 1983.
Lieven Jaspaert
Katholieke Universiteit Leuven (Belgium)
I wish to put the theory and metatheory currently 
adopted in the Eurotra project [Arno86] into a
historical perspective, indicating where and why 
changes to its basic design for a transfer-based MT 
(TBMT) system have been made. 
Let T be some theory of representation, inducing
sets of representations R_s and R_t for languages L_s
and L_t (seen as sets of texts), respectively.
Transfer-based translation is described as follows: 
           TRF
    R_s ---------- R_t
     |              |
  AN |              | GEN
     |              |
    L_s ---------- L_t
           TRA
where AN, GEN and TRF are binary relations, and TRA
is the composition of AN, TRF and GEN, i.e.
(i) AN ⊆ L_s × R_s, GEN ⊆ R_t × L_t, TRF ⊆ R_s × R_t
(ii) TRA = AN ∘ TRF ∘ GEN
We also need to introduce two parameters, viz.
hypotheses about T. A theory (e.g. for the AN
relation) is multistratal when it consists of a set
of subtheories {t1,t2,...,tn}, each characterising a
set of representations R_i, such that
(iii) AN = AN_1 ∘ AN_2 ∘ ... ∘ AN_n,
(iv) AN_1 ⊆ L_s × R_1,
AN_2 ⊆ R_1 × R_2,
...,
AN_n ⊆ R_(n-1) × R_n.
Otherwise, a theory is monostratal.
A theory T is multidimensional when descriptions of
linguistic objects along several linguistic
dimensions are merged into one single
representational object. The notion of linguistic
dimension is meant to correspond to some organising
principle for a theory of representation (e.g.
constituency, grammatical relations, logical
semantics, etc.). Otherwise, a theory is
monodimensional.
In what follows we describe the various Eurotra
approaches to TBMT in terms of this basic model.
Initially, due to its GETA inheritance, Eurotra
adhered to a monostratal multidimensional model for
TBMT. Computationally, it was based on the Grenoble
formalism of the générateur de structures (gds).
Linguistically, it advocated a diluted form of
dependency theory as a basis for TBMT.
The observation that theoretical linguistics had
been incapable of providing a practically applicable
basis for translation had led Grenoble to build
almost no linguistic commitment into the gds
formalism. Every possible form of linguistic
interpretation was to be expressed as an ordered
tree with complex property lists on the nodes, which
was manipulated by two basic operations, viz. tree
transformations and lexical substitution. The GETA
preoccupation with robustness, on the other hand,
made them require that all linguistic information
about texts should be merged into one single gds. On
failure to compute parts of a deeper linguistic
dimension, the intuition went, some clever algorithm
could be used to extract from the gds an equivalent
piece of representation on the next less pretentious
dimension. The logical extreme of this reasoning was
that, if all else failed, it should be possible to
recover the original text from the gds.
Grenoble, however, had perceived the usefulness of
dependency theory (DT) for TBMT. There is a sense in
which DT is a lexically oriented theory of language,
and, in the end, translation is a question of
getting the right translation for words.
Nevertheless, the marriage between DT and the gds
design led to (1) procrustinated linguistics, and
(2) a formalism with intractable semantics.
1.1. Monostratal.
The advocated representation theory was not
stratified in any interesting sense. Rather, the
whole burden of modularising the relation between
text and representation was put on the translation
of the relation into a procedure: discussions about
[...] never brought to bear.
The innovation of [Arno83] was its attempt to derive
requirements on T from a set of more abstract
principles, seen as a theory of MT providing a
framework within which possible substantive theories
for TBMT could be devised and compared. The weakness
of the framework was to seek to motivate the tools
inherited from GETA a posteriori. Its merit was to
be a partial theory of TBMT, independent of the
inheritance.
Its major concern was directed at elucidating the
division of labour between AN, TRF and GEN, and at
deriving implications on T from this understanding.
The pivotal principles of the framework that have
survived the many face lifts of the Eurotra model
are isoduidy and Q-differentiation.
The principle of isoduidy allowed for a principled
definition, in terms of properties of T, of the
domain of the GEN relation of some language in terms
of the codomain of the AN relation for that same
language, thus indirectly defining the TRF relation.
The principle of Q-differentiation required that T
should be sufficiently expressive to ensure that all
meaning aspects of text that are relevant for
translation (called 'Q') be represented in members
of R. The two principles together provided a basis
for designing a transfer device that was (1)
developmentally simple, and (2) Q-preserving. These
are necessary features of any multilingual TBMT
system striving for good-quality translation.
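The relational reading of the basic model, and the way isoduidy ties GEN to AN for one and the same language, can be made concrete with a small sketch. It is only an illustration under stated assumptions: relations are modelled as finite sets of pairs, all texts and representation labels are invented toy data, and isoduidy is rendered in its simplest possible reading (the domain of GEN coincides with the codomain of AN for that language), which the text above does not spell out in this form.

```python
from typing import Hashable, Set, Tuple

Relation = Set[Tuple[Hashable, Hashable]]


def compose(r: Relation, s: Relation) -> Relation:
    """Left-to-right composition: (x, z) whenever (x, y) in r and (y, z) in s."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def domain(r: Relation) -> set:
    return {x for x, _ in r}


def codomain(r: Relation) -> set:
    return {y for _, y in r}


# Hypothetical toy data: one English text, one French text.
AN_en: Relation = {("the dog sleeps", "en:sleep(dog)")}
TRF_en_fr: Relation = {("en:sleep(dog)", "fr:dormir(chien)")}
AN_fr: Relation = {("le chien dort", "fr:dormir(chien)")}
# GEN for French is taken here as the converse of AN for French (an assumption).
GEN_fr: Relation = {(rep, text) for text, rep in AN_fr}

# TRA = AN o TRF o GEN, read left to right as in (ii).
TRA = compose(compose(AN_en, TRF_en_fr), GEN_fr)

# Simplest reading of isoduidy: GEN accepts exactly what AN can produce.
isoduid_fr = domain(GEN_fr) == codomain(AN_fr)
```

With GEN fixed by AN in this way, the only relation left to design per language pair is TRF, which is the sense in which the TRF relation is defined indirectly.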
1.2. Multidimensional.
Despite its success in providing an initial
framework for Eurotra, [Arno83] failed dismally when
it came to deriving from it a substantive linguistic
representation theory. The failure was not unrelated
to the absence of motivation for the GETA vestiges.
The gds comprised a flat geometry and a rich
decoration on the nodes. Given the requirement of
merging, the geometry for all dimensions (text
string, morphology, surface syntax, deep syntax,
semantics) had to be very similar: this was only
possible by making the geometry quite meaningless,
and by putting the whole expressive burden on the
labelling of nodes. The need to preserve surface
word order (robustness) gave geometry its only
interesting task: the representation of word order
through the ordering of sister nodes. Within a
merged approach, this requirement led to the
arbitrary interdependence of the subtheories for the
various linguistic dimensions. The problem was most
tangible in the design of a subtheory of T for a
semantic dimension. T became unnecessarily complex
and inconsistent. Given the absence of linguistic
commitments built into the tools and the failure of
the framework to answer substantive linguistic
questions, debates about the relative merits of
particular representational choices were
inconclusive.
We give an example of linguistic procrustination.
Surface word order being represented by the order of
sister nodes in the merged tree (the gds), tree
geometry was seen as ordered. The geometry of
dependency representations, on the other hand, is
normally unordered. The way out was a refashioning
of DT as a compromise between DT and X-bar theory
with a single bar: a subset of the information about
the governing node was lowered into the subtree
representing its dependents, and the subtree was
required to be ordered in conformity with the
position of elements in the input text. This worked
badly with all sorts of difficult linguistic
phenomena: exocentric constructions (e.g.
conjunction), gapping, discontinuity, long-distance
dependencies, etc. Much of the linguistic research,
then, was aimed at overcoming these problems in a
principled way by means of a theory of empty
elements. Although the latter was intuitively
consistent, it caused such an increase in the
complexity of the formalism that the formalism
defied any coherent formal characterisation.
The first design was, amongst other things, unable
to flesh out the problem of robustness. Combining a
multidimensional representation with a basically
all-paths combinatorial algorithm led to the
inability to rely on the actual computation of
combinations of information required by the safety
net algorithm.
The second design (which was never formally accepted
by the project) purported to solve this problem,
without eliminating multidimensionality. It was
multistratal and multidimensional.
2.1. Multistratal.
It was observed that the representations induced by
T had to meet two (possibly conflicting)
requirements: (1) they had to have sufficient
expressive power to allow for adequate translation
via simple transfer, and (2) their computation had
to be feasible. As a consequence, T was split into
two subtheories, T_a and T_f, where the former was
directed at the needs of adequacy for simple
transfer and the latter at the reliable presence of
a consistent representation from which either the
more pretentious T_a representation was reached or,
alternatively, translation via less simple transfer
was possible. The model that emerged was the
following:
                    TRF_a
          R_s/a -------------- R_t/a
            |                    |
       AN_a |       TRF_f        | GEN_a
            |                    |
   AN     R_s/f -------------- R_t/f     GEN
            |                    |
       AN_f |                    | GEN_f
            |                    |
           L_s  -------------- L_t
                     TRA
The motivation for this design hinged on (1) the
fact that the f-stratum could make use of know-how
in computational linguistics, (2) the fact that the
f-stratum was a good starting point for innovative
research on what T_a should be for multilingual
TBMT, (3) the fact that the model gave content to
the notion of safety nets (robustness), and (4)
developmental issues.
The claim made was that with a monolithic T, the
formulation of safety nets is hindered by the
hybridity problem: their input domain could be any
unpredictable combination of feasibly computable and
adequate information on several dimensions in the
gds. The new design provided the f-stratum as a more
reliable basis for safe safety nets.
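The role of the f-stratum as a basis for safety nets can be pictured as a simple fallback in the control flow. The following is a minimal sketch, not the project's algorithm: the function names (an_f, an_a, trf_a, trf_f, gen_a, gen_f) merely mirror the labels of the model above, the subscript a for the adequacy-oriented stratum is an assumption, and the toy string-rewriting functions stand in for real analysis, transfer and generation.

```python
from typing import Callable, Optional

Rep = str  # toy stand-in for a linguistic representation


def translate(text: str,
              an_f: Callable[[str], Rep],
              an_a: Callable[[Rep], Optional[Rep]],
              trf_a: Callable[[Rep], Rep],
              trf_f: Callable[[Rep], Rep],
              gen_a: Callable[[Rep], Rep],
              gen_f: Callable[[Rep], str]) -> str:
    r_sf = an_f(text)               # f-stratum: assumed to be reliably computable
    r_sa = an_a(r_sf)               # deeper stratum: may fail, signalled by None
    if r_sa is not None:
        r_tf = gen_a(trf_a(r_sa))   # simple transfer on the deeper stratum
    else:
        r_tf = trf_f(r_sf)          # safety net: less simple transfer on the f-stratum
    return gen_f(r_tf)


# Hypothetical toy components, for illustration only.
def toy_an_f(text: str) -> Rep:
    return "f:" + text


def toy_an_a(rep: Rep) -> Optional[Rep]:
    # Pretend that deep analysis fails on anything containing "???".
    return None if "???" in rep else rep.replace("f:", "a:", 1)


def toy_trf_a(rep: Rep) -> Rep:
    return rep + "|trf_a"


def toy_trf_f(rep: Rep) -> Rep:
    return rep + "|trf_f"


def toy_gen_a(rep: Rep) -> Rep:
    return rep.replace("a:", "f:", 1)


def toy_gen_f(rep: Rep) -> str:
    return rep.replace("f:", "", 1)


TOYS = (toy_an_f, toy_an_a, toy_trf_a, toy_trf_f, toy_gen_a, toy_gen_f)
deep_result = translate("hello", *TOYS)
fallback_result = translate("??? hello", *TOYS)
```

The point of the sketch is that the input of the safety net is always a complete f-stratum representation, never an unpredictable mixture of partial results, which is what the hybridity problem rules out for a monolithic T.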
2.2. Multidimensional.
This feature of the design did not change. Instead
of one multidimensional representation, we now had
two. No further attempt was made, however, to
justify the use of multidimensional representations.
Given the rejection of theoretical modularity on the 
basis of considerations of reliability of 
computation, the only course to take seemed to be to 
abandon the multidimensional view itself and to let 
the strata themselves represent linguistic
dimensions. The new model became multistratal and
monodimensional.
3.1. Multistratal.
T was described as a set of independently defined
subtheories for representing normalised text (nt),
morphology (mo), surface syntax (ss), deep syntax
(ds) and semantics (sem). They were conceptually
related to each other, however, by being based on a
common central notion of dependency defined in terms
of slot filling and modification. A strength of this
move is that linguistics in Eurotra could now profit
from linguistic work in the outside world.
The proposal suffered, however, from the absence of
a clear view on what sorts of dedicated operations
were needed to actually map between arbitrarily
different dependency trees. Nor were considerations
of the computational complexity of arbitrary
tree-transformation formalisms taken into account in
the definition of the levels. A proposal to relate
all these levels to each other by giving them all a
lexicalist underpinning was rejected by the C.E.C.
Finally, a stratificational strategy was imposed on
the makers of the design, with the (unjustified)
intuition that it would provide a basis for the
incorporation of safety nets into the model.
The model now roughly looked as follows (with 
question marks indicating undefined parts): 
                    TRF_sem
         R_s/sem -------------- R_t/sem
            ?        TRF_ds        ?
         R_s/ds  --- ????????? --- R_t/ds
            ?        TRF_ss        ?
         R_s/ss  --- ????????? --- R_t/ss
   AN       ?        TRF_mo        ?       GEN
         R_s/mo  --- ????????? --- R_t/mo
            ?        TRF_nt        ?
         R_s/nt  --- ????????? --- R_t/nt
            ?                      ?
           L_s   -------------- L_t
                      TRA
3.2. Monodimensional. 
Representations reflect only one linguistic 
dimension: the gds approach was completely 
abandoned. 
The theories identified described the representation 
of normalised text strings, the internal structure 
of words, the surface dependency, the canonical 
dependency and the semantic dependency of the input 
texts. 
4. The present design [desT85][Arno86].
The properties of the current Eurotra design 
constitute the topic of Arnold & des Tombe's paper 
in this volume. Here, I merely relate it to previous 
hypotheses about the Eurotra translation model.
The design is multistratal and monodimensional and
can be depicted as follows: 
AN
          G_s/nt    G_s/mo    G_s/cs    G_s/rs    G_s/sem
text_s <-> nt_s <-> mo_s <-> cs_s <-> rs_s <-> sem_s
TRF
text_t <-> nt_t <-> mo_t <-> cs_t <-> rs_t <-> sem_t
          G_t/nt    G_t/mo    G_t/cs    G_t/rs    G_t/sem
GEN
4.1. Multistratal. 
Each stratum corresponds to an autonomous generating 
device for a representation language, Each generator 
consists of a set of atoms and a set of constructors 
that together allow for the generation of L(8), a 
set of formally well-formed derivation trees. The 
latter are then evaluated (by unification) to a set 
of meaningfull representations, R(G). 
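The distinction between L(G), the formally well-formed derivation trees, and R(G), the derivations that survive evaluation by unification, can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the flat feature dictionaries, the toy atom inventory ATOMS, and the single agreement check bear no relation to the actual Eurotra formalism.

```python
from typing import Dict, Optional

Features = Dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Unify two flat feature sets; None signals a clash."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            return None
        out[key] = value
    return out


# Hypothetical atoms of a toy generator, each with its feature set.
ATOMS: Dict[str, Features] = {
    "this": {"num": "sg"},
    "these": {"num": "pl"},
    "dog": {"num": "sg"},
    "dogs": {"num": "pl"},
}


def evaluate(derivation) -> Optional[Features]:
    """Evaluate a derivation tree bottom-up by unifying its parts.

    A derivation is either an atom (a string) or a tuple
    (constructor_name, part_1, ..., part_n). None means the derivation
    is formally well-formed but has no representation.
    """
    if isinstance(derivation, str):
        return ATOMS.get(derivation)
    _, *parts = derivation
    result: Optional[Features] = {}
    for part in parts:
        value = evaluate(part)
        if value is None:
            return None
        result = unify(result, value)
        if result is None:
            return None
    return result


good = evaluate(("np", "this", "dog"))
bad = evaluate(("np", "this", "dogs"))
```

Both derivations belong to L(G) in this toy setting, but only the first evaluates to a representation and so contributes to R(G).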
The intuition underlying this model is that
translation between natural language texts can be
split up into a sequence of more primitive
translations between elements of adjacent
generators. Adjacent generators must be devised so
that the primitive translations that obtain are also
simple. This is taken to mean that primitive
translations must be (1) compositional and (2)
one-shot. The justification for compositionality is
the intuition that the translation of some
expression E is a straightforward function of the
translation of E's parts and of the way these parts
are put together. The one-shot requirement is needed
to restrain the complexity of this function: the
codomain of a primitive translation must always be
well-formed in terms of the target generator. This
forbids internal strategy inside translators.
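A compositional, one-shot primitive translation can be sketched as a recursive mapping over trees followed by a well-formedness check against the target generator. This is a hedged illustration only: the tuple encoding of trees, the mapping tables and the arity-based notion of well-formedness are all assumptions introduced here, not part of the design described above.

```python
from typing import Dict, Set


def translate(tree, atom_map: Dict[str, str], cons_map: Dict[str, str]):
    """Compositional translation: translate the parts, then recombine them
    with the translated constructor. A tree is an atom (str) or a tuple
    (constructor_name, part_1, ..., part_n)."""
    if isinstance(tree, str):
        return atom_map[tree]
    head, *parts = tree
    return (cons_map[head],
            *(translate(p, atom_map, cons_map) for p in parts))


def well_formed(tree, atoms: Set[str], arities: Dict[str, int]) -> bool:
    """One-shot check: the output must already be well-formed in terms of
    the target generator, with no repair step inside the translator."""
    if isinstance(tree, str):
        return tree in atoms
    head, *parts = tree
    return (arities.get(head) == len(parts)
            and all(well_formed(p, atoms, arities) for p in parts))


# Hypothetical toy tables for one primitive translation.
ATOM_MAP = {"the": "le", "dog": "chien", "sleep": "dormir"}
CONS_MAP = {"clause": "phrase", "np": "gn"}
TARGET_ATOMS = {"le", "chien", "dormir"}
TARGET_ARITIES = {"phrase": 2, "gn": 2}

source = ("clause", ("np", "the", "dog"), "sleep")
target = translate(source, ATOM_MAP, CONS_MAP)
ok = well_formed(target, TARGET_ATOMS, TARGET_ARITIES)
```

Because each translator only consults its two tables, a translation between texts can be obtained by applying several such translators in sequence, one per pair of adjacent generators.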
The project is examining various hypotheses about 
particular instantiations of this core model: e.g. 
translators could perform any one of the following 
four mappings: (i) derivation to derivation, (ii)
derivation to representation, (iii) representation 
to derivation and (iv) representation to 
representation. Possibility (i) was found to be too 
restrictive. We now study possibility (iii). Note 
the similarity between (iv) and the structural 
correspondence approach adopted in LFG for mapping 
between information structures of a different 
nature. 
4.2. Monodimensional.
The current strata envisaged are normalised text,
morphology, configurational surface syntax,
relational surface syntax and semantics. Morphology
is based on work on word grammar as independent of
phrase structure grammar. Configurational syntax
draws from the X-bar theory literature. Relational
syntax representations resemble LFG f-structures.
The semantic stratum, finally, is not yet fully
specified: this has to do with the very special
requirements that translation by means of simple
transfer puts on a semantic representation theory.
The point is, however, that the non-semantic levels
are claimed to be feasible (cf. f-stratum in 2) and
that they can thus provide a basis for researching a
translation-oriented semantic theory.
5. Conclusion. 
I hope to have slightly lifted the veil that has 
hidden the Eurotra project from the scientific 
community for a number of years. It has become 
clear, hopefully, that the Eurotra design has become 
more homogeneous and that it constitutes a valuable 
step towards a better understanding of the problem 
of machine translation. 
References

[Arno83]: Arnold, Jaspaert & des Tombe, Linguistic [...]

[Arno84a]: Arnold, Jaspaert & des Tombe, ETL-3 Final
Report, C.E.C., 1984.

[Arno84b]: Arnold, Jaspaert & des Tombe, ETL-[...] Final
Report, C.E.C., 1984.

[Arno85a]: Arnold, Jaspaert & des Tombe, [...]

[Arno86]: Arnold & des Tombe, Basic Theory and
Methodology in Eurotra, to appear in: S. Nirenburg [...],
1986.

[desT85]: des Tombe, Arnold, Jaspaert, Johnson,
Krauwer, Rosner, Varile & Warwick, A Preliminary
Linguistic Framework for EUROTRA, In: Proceedings of
the Conference on Theoretical and Methodological
Issues in Machine Translation of Natural Languages,
Colgate University, 1985, 283-289.
