PRESENTATION OF THE 
EUROLANG PROJECT 
B.SEITE, D.BACHUT, D.MARET, B.ROUDAUD 
SITE 
12 rue de Reims - 94700 Maisons-Alfort - FRANCE 
tel : +33 16 1 43 96 72 00 
e-mail : D Bachut@site-maisons-alfort.fr 
ABSTRACT 
International trade in general and particularly the 
Single European Market will bring about a considerable 
increase in the already huge documentation market. NLP 
products will contribute to improving the 
competitiveness of industry in this strategic field. 
The industrial objective of EUROLANG is thus 
to provide efficient NLP tools and to give the European 
business community a better opportunity to maintain a 
command of multilingual technical and commercial 
communication. 
The technical objective of the EUROLANG 
project is to build an MT/NLP toolbox, offering a wide 
range of 'open' and powerful tools, which reflect the 
state-of-the-art in computing and linguistic techniques. 
These tools will then be validated via the production of a 
multilingual MT system based on second generation 
principles. 
With respect to the computing aspects, the main 
technical choices are : portability, maintainability, 
openness, possibility of evolution and ergonomy. This 
implies the use of standard techniques and tools (UNIX, 
X11, MOTIF, C, SQL, SGML). 
The linguistic developments, based on a 
syntactico-semantic analysis, will follow an industrial 
methodology, implying formal linguistic specifications. 
The use of specialised languages will ensure a better 
separation between software and lingware, and thus better 
modularity. The three-phase translation process 
guarantees the multilingual aspect of the MT system. 
INTRODUCTION 
In the light of the communication society, 
which is leading industrial companies to become 
involved in NLP technology, the EUROLANG partners 
have decided to pool their technical, human and 
commercial resources in order to define and develop a 
new 'high quality' / 'low cost' MT system. This system 
will be based on state-of-the-art computing and linguistic 
techniques, and is aimed at the market for quality 
translation with post-editing. 
To ensure the reusability of the linguistic 
resources and the possibility of evolution of the system, 
a powerful toolbox, providing different NLP tools and 
useful 'plug and play' functionalities, will be developed. 
This toolbox will be a linguistic platform providing a 
user-friendly environment to design and develop different 
kinds of NLP applications. 
In the following paper, we present more 
specifically the aims and motivations of the project and 
the main technical choices, with regard to both 
computing and linguistic issues. 
GENERAL PRESENTATION 
Aims and motivations of EUROLANG 
The technical objective of the project is to build 
an MT/NLP toolbox, offering a wide range of 'open' and 
powerful tools, which reflect the state-of-the-art in 
computing and linguistic techniques. These tools will 
then be validated via the production of a multilingual 
MT system based on second generation principles. This 
system will process five European languages (English, 
French, German, Italian, Spanish). Only ten language 
pairs (chosen according to the market and partner needs) 
will be dealt with : the eight pairs involving 
English, and the two French/German pairs. The 
dictionaries will contain 50 000 terms in each language, 
and their equivalents in the other languages. 
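
The count of ten pairs can be checked with a short sketch. It assumes that a 'pair' is directional (source language, target language), which is one reading of 'the two French/German pairs'; the variable names are illustrative only and are not part of the project.

```python
from itertools import permutations

LANGUAGES = ["English", "French", "German", "Italian", "Spanish"]

# All directed (source, target) pairs over five languages: 5 * 4 = 20.
all_pairs = list(permutations(LANGUAGES, 2))

# Assumption: a pair is kept if it involves English, or if it links
# French and German (in either direction).
selected = [
    (src, tgt)
    for src, tgt in all_pairs
    if "English" in (src, tgt) or {src, tgt} == {"French", "German"}
]

english_pairs = [pair for pair in selected if "English" in pair]
print(len(all_pairs), len(english_pairs), len(selected))
```

Under this reading, half of the twenty possible directed pairs are covered.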
The general objective is to yield products at the 
end of the project, which will be targeted on the 'low 
cost' / 'high quality' market. This goal is made possible 
by the 1991 state-of-the-art MT technology and will 
considerably increase the share of the market currently 
occupied by the commercial MT systems. 
The market is estimated at 12 billion dollars 
(Gartner Group figures), and is rapidly increasing. The 
following trends suggest a boom in this market : 
- continuous increase in international trade, 
- enormous need within the Single European 
  Market, 
- shortage and ever increasing cost of qualified 
  translators, 
- strategic need for a company involved in a 
  competitive export market to possess high 
  quality translations of technical and commercial 
  documents rapidly available. 
ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1289 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
The Japanese were the first to appreciate the 
strategic importance of substantial investment in this 
field. The major corporations invest considerable sums 
and participate in projects such as EDR, ATR... The 
state-of-the-art in computing and linguistic technologies 
associated with skills to deal with different European 
languages seems to be a good opportunity for Europe to 
become a leader in this sector. 
The linguistic quality of a translation is of 
course one of the major criteria involved in the 
evaluation of an MT system; however, there are others 
(cf. [Roudaud 91]). Some linguistic phenomena are very 
complex, and thus very costly, to deal with, whereas 
another technical solution, less resource-consuming, 
could lead to an equivalent (or even better) result, as far 
as efficiency is concerned. The industrial issue implies 
that a compromise must be reached between cost, 
efficiency and linguistic quality : the right solution to 
the right problem. For example, anaphora are a very 
difficult linguistic problem to solve, whereas in 
technical documentation the French pronoun 'il' can most 
often be translated into English by 'it'. In such a case, a 
better solution would be to propose 'it' as the default 
translation, and 'he' and 'she' as alternatives, so that the 
revisor can choose one or the other. 
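
The pronoun example can be sketched as a lookup that returns a default plus alternatives. This is only an illustration: the table, the `Proposal` type and the function name are hypothetical, not the project's design.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    default: str          # translation inserted in the draft
    alternatives: tuple   # other choices offered to the revisor

# Hypothetical table for technical documentation, following the text:
# 'il' defaults to 'it', with 'he' and 'she' as alternatives.
PRONOUN_TABLE = {"il": Proposal("it", ("he", "she"))}

def translate_pronoun(word):
    """Return the default translation and the alternatives for a pronoun."""
    entry = PRONOUN_TABLE.get(word.lower())
    if entry is None:
        # Not a listed pronoun: pass the word through with no alternatives.
        return Proposal(word, ())
    return entry

proposal = translate_pronoun("Il")
print(proposal.default, proposal.alternatives)
```

The costly anaphora resolution is replaced here by a cheap default, and the final choice is left to the revisor.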
This last point shows that we must adopt a 
global approach for the design of an MT system. Not 
only is the MT kernel of major importance, but a 
user-friendly environment, providing many useful tools 
for the end-user (translator, writer,...), must also be designed. 
This is the reason why special attention is paid in this 
project to the design of the pre- and post-editing tools. 
For users, an MT system is only a part of the 
documentary system, and integration and connection with 
other tools must be foreseen. 
It is well-known that second generation MT 
systems are very time-consuming. Predictions for the 
coming years show that workstations will very rapidly 
reach 50 or 100 MIPS, and that the cost of MIPS will 
continue to decrease. All these factors contribute to the 
reduction of exploitation costs of MT systems and to the 
improvement of the delivery time/volume ratio. 
Organisation and main technical choices 
EUROLANG is a three-year EUREKA project 
which began in December '91. The first year will mainly 
consist of specifying both computer and linguistic 
developments. The two following years will be devoted 
to development, integration, tests and evaluation. 
The global cost of the project is 684 MFF 
(about 140M US$). It involves five European 
countries: France, Germany, Italy, Spain, United 
Kingdom. The partners of the project are : 
- in France : 
  - SITE group (prime contractor) 
  - CAP GEMINI INNOVATION 
  - CNET 
  - GETA 
  - LADL 
  - MATRA SPACE MARCONI 
- in Germany : 
  - SIEMENS NIXDORF 
  - KRUPP Industries 
  - IAI Saarbrücken 
- in United Kingdom : 
  - RANK XEROX Ltd. 
  - UMIST (Manchester) 
  - University of Essex 
- in Spain : 
  - BDE 
  - Universidad de Barcelona 
  - Universidad Autónoma de Barcelona 
- in Italy : 
  - LEXICON 
  - THAMUS 
  - Università di Salerno 
  - Università di Pisa 
  - Università di Torino 
Most of the partners participate in projects in 
the field of MT/NLP (EUROTRA, ESPRIT...) and have 
practical experience. The other partners are industrialists 
with needs in this area, and they will be very active in 
the definition of the end-user stations (pre/post-editing...) 
and in the evaluation of the products. 
The project is co-managed by SITE and 
SIEMENS. 
SITE, prime contractor of the project, has a 
twofold competence : MT/NLP, in particular via its 
subsidiary B'VITAL, and industrial documentation 
management, with respect to writing as well as 
translation. SITE's translators have validated the quality 
of the translations produced using the ARIANE MT 
system and SITE is thus convinced that this technology 
can be used profitably in an industrial environment (cf. 
[Bachut 91a]). Unfortunately, the cost of such machine 
translation is currently so high that the gain obtained by 
reducing the translators' work is lost when the cost of 
the CPU used is added to the human cost. Furthermore, 
the linguistic quality of the product is not a sufficient 
condition for improving the translators' efficiency : the 
translator workstation should be designed to fit the 
necessary ergonomic requirements. 
SIEMENS NIXDORF, which is developing, 
maintaining, using (SIEMENS' translators use METAL) 
and commercialising the METAL MT system (cf. [Slocum 
83], [Schneider 91]), is particularly interested in the 
definition of a common European NLP platform and 
wishes to improve METAL technology. This is the 
reason why SIEMENS NIXDORF decided to play an 
active role in the EUROLANG project. Its commercial 
experience is one major advantage in the commercial 
perspective of EUROLANG. 
On these bases, SITE and SIEMENS 
NIXDORF decided that it was necessary to develop a 
new MT system, based on a considerably improved 
ARIANE and METAL technology, considering the 
advanced state of current computer technology and the 
evolution of linguistics. Technical choices are thus 
being made bearing in mind the industrial needs : 
portability, maintainability, openness, possibility of 
evolution and ergonomy. 
TECHNICAL ASPECTS 
Computer aspects 
As already mentioned, the main objective is to 
provide a powerful toolbox, containing tools dedicated to 
linguistic developments. One of the most important 
characteristics of such a toolbox is the implied 
reusability of its components. A plug and play strategy 
will thus enable the linguists to develop different kinds 
of applications using the existing 'components' of the 
toolbox. A 'lingware workbench' will provide all the 
facilities required to specify, implement, test and 
maintain these applications. To facilitate the 
communication between the tools and with external 
systems and thus enable the plug and play strategy, an 
API (Application Programming Interface) will be 
defined. 
The toolbox will be designed in such a way that 
new tools can easily be added, ensuring its durability. 
Any evolution of the 'state of the art' in computational 
linguistics can thus be rapidly taken into account in the 
EUROLANG product. 
Most of the initial tools will be specialized 
languages, allowing the developer (or linguist) to handle 
concepts he is used to. Such linguistic languages will 
consist of 4GLs (4th Generation Languages, i.e. 
specialized programming languages adapted to specific 
developments) and the associated compilers and 
interpreters. This architecture ensures a better 
independence of the lingware and the software (for 
instance, the pattern matching mechanism is part of the 
software and should not be programmed by the linguist), 
and consequently a better linguistic modularity. 
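
The separation between lingware and software can be illustrated with a minimal sketch, in which the rules are plain data a linguist could edit and the pattern matching engine is generic code. The rule format and the function name are assumptions made for this illustration; they do not describe the EUROLANG specialised languages.

```python
# "Lingware": declarative rules (hypothetical format). Each rule rewrites
# a sequence of categories into a single phrase category.
RULES = [
    (("DET", "NOUN"), "NP"),
    (("VERB", "NP"), "VP"),
]

# "Software": a generic pattern matching engine that contains no
# language-specific knowledge.
def apply_rules(categories, rules):
    """Rewrite the category list until no rule pattern matches any more."""
    cats = list(categories)
    changed = True
    while changed:
        changed = False
        for pattern, result in rules:
            size = len(pattern)
            for start in range(len(cats) - size + 1):
                if tuple(cats[start:start + size]) == pattern:
                    cats[start:start + size] = [result]
                    changed = True
                    break
            if changed:
                break  # restart from the first rule after each rewrite
    return cats

print(apply_rules(["VERB", "DET", "NOUN"], RULES))
```

Changing the linguistic coverage here means editing `RULES` only. The loop terminates because every pattern in this sketch is longer than its replacement.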
Lexical and textual data bases are also needed in 
the toolbox, to enable an easy management of the lexical 
and textual resources. The lexical data base will provide a 
user-friendly interface to add or modify terms. A flexible 
underlying model is necessary to allow modification of 
the linguistic model, and thus modification of the 
linguistic information needed in the dictionaries. 
Representation of texts and characters in a 
multilingual environment is a crucial issue. Although 
work has already been undertaken to solve this 
problem, no general standard exists as yet, and an external 
and an internal representation should be designed, taking 
into account any standard or recommendation (e.g. Text 
Encoding Initiative recommendation). 
A general exchange format, based on SGML 
(Standard Generalized Markup Language), will thus be 
defined for both lexical and textual data. It will guarantee 
the openness of the system by allowing the reusability 
of the lingware. 
The final MT environment will provide a user-friendly 
translator's workstation. Two kinds of 
functionalities are foreseen : pre-editing and post-editing 
functionalities. Pre-editing functionalities will comprise 
conventional tools (e.g. spelling checker) and enhanced 
functionalities (e.g. tools to handle new words and 
predict their linguistic behaviour). Post-editing 
functionalities will comprise functionalities needed by 
any translator (even to translate ab initio) and 
functionalities specialised for MT revision. Among all 
the foreseen functionalities, the following are worth 
underlining : direct access to dictionaries, management of 
successive annotations, intelligent search and replace 
manipulations, easy access to alternative translations 
offered by the MT system, request for information 
concerning the MT system, and other specific word 
processing functions. 
To ensure that the system is portable, 
developments will be made in C or C++ portable 
language (ANSI), under UNIX. Graphics will be 
produced under X-WINDOW/MOTIF. The standards 
currently in force will be respected (SQL, SGML, etc.). 
Although UNIX has been chosen for the developments 
during the project, the PC world (with WINDOWS) is 
one of our future objectives. 
Linguistic aspects 
The first application of the toolbox will be a 
multilingual MT system, based on second generation 
technology : 
- the use by linguists of specialised languages, 
  ensuring a better separation between the 
  lingware and the software, 
- a three-phase translation : analysis, transfer and 
  generation, ensuring a better multilingual 
  approach. 
The underlying linguistic theory is based on a 
syntactico-semantic analysis, giving a deep 
representation of the text in an annotated tree structure 
(in which each node is 'annotated' by a set of linguistic 
features). The main tools used in performing such an 
analysis will be a slightly contextual parser, based on 
the METAL parser, and a ROBRA-like tree transducer 
(ROBRA is the tree transducer designed by GETA in the 
ARIANE MT system, cf. [Boitet 82], [Boitet 86]). The 
transfer phase makes it possible to translate words in 
context and the generation phase allows the linguist to 
specify the surface structure of the text, depending on the 
deep structure calculated. 
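
The three phases and the annotated tree can be pictured with a deliberately simplified sketch. It is not the METAL parser or ROBRA: the `Node` type, the toy lexicon and the three functions are hypothetical, and the 'analysis' only builds a flat tree with one leaf per word.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A tree node annotated with a set of linguistic features."""
    label: str
    features: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

# Hypothetical French-to-English transfer lexicon, keyed by lemma.
TRANSFER_LEXICON = {"le": "the", "moteur": "engine"}

def analyse(words):
    """Toy analysis: one annotated leaf per word under a single root."""
    leaves = [Node(w, {"lemma": w.lower()}) for w in words]
    return Node("S", {"lang": "fr"}, leaves)

def transfer(tree, lexicon):
    """Map each source lemma to a target lemma; keep unknown lemmas as-is."""
    leaves = []
    for leaf in tree.children:
        lemma = leaf.features["lemma"]
        target = lexicon.get(lemma, lemma)
        leaves.append(Node(target, {"lemma": target}))
    return Node(tree.label, {"lang": "en"}, leaves)

def generate(tree):
    """Toy generation: linearise the leaves into a surface string."""
    return " ".join(leaf.label for leaf in tree.children)

source_tree = analyse(["Le", "moteur"])
target_tree = transfer(source_tree, TRANSFER_LEXICON)
print(generate(target_tree))
```

Only `transfer` depends on both languages in this sketch; `analyse` and `generate` each deal with a single language, which is what allows them to be reused across language pairs.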
Linguistic development methodology, already 
used by B'VITAL and SITE linguistic teams, implies 
formal linguistic specifications to describe the desired 
deep structure. These specifications will be written 
using a specialised 4GL inspired by GETA's static 
grammar formalism (cf. [Vauquois 85]). 
Common linguistic interface structures are 
being defined (in the first project phase) to facilitate the 
plug and play mechanism between different linguistic 
components. These linguistic interfaces will consist of 
the definition of the minimal requirements which should 
be followed by the linguistic specifications of all the 
languages involved. This will also ensure a better 
multilingualism and make it possible to reduce the 
transfer phase between two languages. 
Given that the MT product is designed for use in 
industry, a certain number of characteristics are essential 
to the final system : 
- the system should always provide at least one 
  translation, 
- when several translations are possible, the 
  presentation of the different solutions to the 
  revisor should be user-friendly, 
- unpredicted phenomena or new words should not 
  block the whole translation process 
  (robustness). 
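
The robustness requirement can be sketched as a word-level fallback: every word yields a non-empty list of candidates, and unknown words are copied through and flagged instead of stopping the process. The lexicon and the function name are hypothetical and only illustrate the principle.

```python
# Hypothetical bilingual lexicon: source word -> candidate translations,
# preferred candidate first.
LEXICON = {
    "la": ["the"],
    "pompe": ["pump"],
}

def translate_robustly(words, lexicon):
    """Return one non-empty candidate list per word, plus the unknown words."""
    candidates, unknown = [], []
    for word in words:
        found = lexicon.get(word.lower())
        if found:
            candidates.append(list(found))
        else:
            # Unknown word: copy it through unchanged and flag it,
            # instead of aborting the whole sentence.
            candidates.append([word])
            unknown.append(word)
    return candidates, unknown

candidates, unknown = translate_robustly(["La", "pompe", "XZ-42"], LEXICON)
best = " ".join(options[0] for options in candidates)
print(best)
print(unknown)
```

The flagged words could then be shown to the revisor or fed to the tools that handle new words.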
CONCLUSION 
Considering the industrial objective of the 
project, EUROLANG will provide not only an efficient 
European MT system, to compete with Japanese and 
other MT systems abroad, but also an unequalled NLP 
platform for large scale NLP application developments. 
This industrial goal will not prevent all 
linguistic and computing developments from being based 
on the current state-of-the-art technology. To ensure the 
durability of the toolbox, an R&D stratum will prepare 
for future versions of the product, in which new tools 
may be added and old ones may be improved. 
The EUROLANG system will thus give the 
European business community a better opportunity to 
maintain the command of multilingual technical and 
commercial communication, which is crucial for 
developing international cooperation and for safeguarding 
all language specificities. 

BIBLIOGRAPHY 
[Alonso 88] 
ALONSO, J., "A model for Transfer Control in 
the METAL MT-System", COLING 88, 
Budapest, 1988. 
[Bachut 91a] 
BACHUT, D., & al., "Industrialisation d'un 
système de TAO français-anglais pour la 
documentation technique", Génie Linguistique 91, 
Versailles, 1991. 
[Bachut 91b] 
BACHUT, D., & al., "Traduction et 
Terminologie : expérience et perspectives 
industrielles", 2èmes journées scientifiques du 
RLTT, Mons, 1991. 
[Boitet 82] 
BOITET, Ch., & al., "ARIANE-78: an 
integrated environment for automated translation 
and human revision", COLING-82, Prague, 1982. 
[Boitet 86a] 
BOITET, Ch., "The French National MT-Project: 
Technical organization and translation results of 
CALLIOPE-AERO", IBM Conf. on Translation 
Mechanization, Copenhague, 1986. 
[Boitet 86b] 
BOITET, Ch., "Current Machine Translation 
systems developed with GETA's methodology and 
software tools", ASLIB, London, 1986. 
[Boitet 87] 
BOITET, Ch., "Current state and future outlook 
of the research at GETA", MT Summit, Hakone, 
1987. 
[Chandioux 76] 
CHANDIOUX, J., "METEO: un système 
opérationnel pour la traduction des bulletins 
météorologiques destinés au grand public", Meta 
21, 1976. 
[Chappuy 83] 
CHAPPUY, S., "Formalisation de la description 
des niveaux d'interprétation des langues 
naturelles", Thèse 3e cycle informatique, 
Grenoble, 1983. 
[Gross 75] 
GROSS, M., "Méthodes en syntaxe : Régime des 
complétives", Editions HERMANN, Paris, 1975. 
[Gross 90] 
GROSS, M., GUILLET, A., "Modèles 
Linguistiques", Traitement des Langues 
Naturelles, Ecoles d'été du CNET, Lannion, 
1990. 
[Hutchins 86] 
HUTCHINS, W.J., "MACHINE 
TRANSLATION: past, present, future", 
Chichester, Ellis Horwood series in Computers 
and their applications, 1986. 
[Isabelle 78] 
ISABELLE, P., & al., "TAUM-AVIATION : 
description d'un système de traduction automatisée 
des manuels d'entretien en aéronautique", 
COLING, 1978. 
[Kugler 91] 
KUGLER, M., & al., "The Translator's 
workbench : An Environment for Multi-Lingual 
Text Processing and Translation", MT Summit 
III, Washington, 1991. 
[Perschke 89] 
PERSCHKE, S., "EUROTRA", MT Summit II, 
Munich, 1989. 
[Roudaud 91] 
ROUDAUD, B., "A procedure for the evaluation 
and improvement of an MT system by the end-user", 
Workshop on Evaluation of MT Systems, 
Ste Croix, 1991. 
[Schneider 91] 
SCHNEIDER, T., "The METAL System", MT 
Summit III, Washington, 1991. 
[Schütz 91] 
SCHÜTZ, J., & al., "An Architecture Sketch of 
Eurotra-II", MT Summit III, Washington, 1991. 
[Scott 89] 
SCOTT, B.E., "The LOGOS System", MT 
Summit II, Munich, 1989. 
[Séité 91] 
SEITE, B., "Enjeux du TALN en gestion 
documentaire : Stratégie de SITE en ingénierie 
documentaire", Salon International des Industries 
de la Langue, OFIL, Paris, 1991. 
[Slocum 83] 
SLOCUM, J., "A Status Report on the LRC 
Machine Translation System", First Conference 
on Applied Natural Language Processing, ACL, 
Santa Monica, 1983. 
[Uchida 89] 
UCHIDA, H., "ATLAS", MT Summit II, 
Munich, 1989. 
[Vauquois 85] 
VAUQUOIS, B., & CHAPPUY, S., "Static 
grammars : a formalism for the description of 
linguistic models", International Conference on 
Theoretical and Methodological Issues in Machine 
Translation of Natural Language, Colgate 
University, 1985. 
