<?xml version="1.0" standalone="yes"?> <Paper uid="C92-4209"> <Title>PRESENTATION OF THE EUROLANG PROJECT</Title> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> GENERAL PRESENTATION </SectionTitle> <Paragraph position="0"> Aims and motivations of EUROLANG The technical objective of the project is to build an MT/NLP toolbox offering a wide range of 'open' and powerful tools that reflect the state of the art in computing and linguistic techniques. These tools will then be validated via the production of a multilingual MT system based on second generation principles. This system will process five European languages (English, French, German, Italian, Spanish). Only ten language pairs (chosen according to the market and partner needs) will be dealt with: the eight pairs involving English, and the two French/German pairs. The dictionaries will contain 50 000 terms in each language, and their equivalents in the other languages.</Paragraph> <Paragraph position="1"> The general objective is to yield products at the end of the project, which will be targeted on the 'low cost' / 'high quality' market. This goal is made possible by the state of the art in MT technology as of 1991, and reaching it will considerably increase the share of the market currently occupied by commercial MT systems.</Paragraph> <Paragraph position="2"> The market is estimated at 12 billion dollars (Gartner Group figures), and is rapidly increasing. The following trends suggest a boom in this market: continuous increase in international trade, enormous need within the Single European Market, shortage and ever increasing cost of qualified translators, ACTES DE COLING-92, NANTES, 23-28 AOÛT 1992 1289 PROC. OF COLING-92, NANTES, AUG. 
23-28, 1992 strategic need for a company involved in a competitive export market to have high-quality translations of technical and commercial documents rapidly available.</Paragraph> <Paragraph position="3"> The Japanese were the first to appreciate the strategic importance of substantial investment in this field. The major corporations invest considerable sums and participate in projects such as EDR, ATR... The state of the art in computing and linguistic technologies, combined with the skills needed to deal with different European languages, seems to be a good opportunity for Europe to become a leader in this sector.</Paragraph> <Paragraph position="4"> The linguistic quality of a translation is of course one of the major criteria involved in evaluating an MT system; however, there are others (cf. [Roudaud 91]). Some linguistic phenomena are very complex, and thus very costly, to deal with, whereas another technical solution, less resource-consuming, could lead to an equivalent (or even better) result, as far as efficiency is concerned. The industrial issue implies that a compromise must be reached between cost, efficiency and linguistic quality: the right solution to the right problem. For example, anaphora are a very difficult linguistic problem to solve, whereas in technical documentation the French pronoun 'il' can most often be translated into English as 'it'. In such a case, a better solution would be to propose 'it' as the default translation, with 'he' and 'she' as alternatives, so that the revisor can choose one or the other.</Paragraph> <Paragraph position="5"> This last point shows that we must adopt a global approach for the design of an MT system. Not only is the MT kernel of major importance, but a user-friendly environment, providing many useful tools for the end-user (translator, writer, ...), must also be designed. This is the reason why special attention is paid in this project to the design of the pre- and post-editing tools. 
For users, an MT system is only one part of the documentary system, so integration and connection with other tools must be foreseen.</Paragraph> <Paragraph position="6"> It is well known that second generation MT systems are very time-consuming to run. Predictions for the coming years show that workstations will very rapidly reach 50 or 100 MIPS, and that the cost per MIPS will continue to decrease. All these factors contribute to the reduction of exploitation costs of MT systems and to the improvement of the delivery time/volume ratio.</Paragraph> <Paragraph position="7"> Organisation and main technical choices EUROLANG is a three-year EUREKA project which began in December '91. The first year will mainly consist of specifying both computer and linguistic developments. The two following years will be devoted to development, integration, tests and evaluation.</Paragraph> <Paragraph position="8"> The global cost of the project is 684 MFF (about 140M US$). It involves five European countries: France, Germany, Italy, Spain, United Kingdom. The partners of the project are: - in France: - SITE group (prime contractor) - CAP GEMINI INNOVATION - CNET - GETA - LADL - MATRA SPACE MARCONI - in Germany: - SIEMENS NIXDORF - KRUPP Industries - IAI Saarbrücken - in United Kingdom: - RANK XEROX Ltd.</Paragraph> <Paragraph position="9"> - UMIST (Manchester) - University of Essex - in Spain: - BDE - Universidad de Barcelona - Universidad Autónoma de Barcelona - in Italy: - LEXICON - THAMUS - Università di Salerno - Università di Pisa - Università di Torino Most of the partners participate in projects in the field of MT/NLP (EUROTRA, ESPRIT...) and have practical experience. The other partners are industrialists with needs in this area, and they will be very active in the definition of the end-user stations (pre/post-editing...) 
and in the evaluation of the products.</Paragraph> <Paragraph position="10"> The project is co-managed by SITE and SIEMENS.</Paragraph> <Paragraph position="11"> SITE, prime contractor of the project, has a twofold competence: MT/NLP, in particular via its subsidiary B'VITAL, and industrial documentation management, with respect to writing as well as translation. SITE's translators have validated the quality of the translations produced using the ARIANE MT system and SITE is thus convinced that this technology can be used profitably in an industrial environment (cf. [Bachut 91a]). Unfortunately, the cost of such machine translation is currently so high that the gain obtained by reducing the translators' work is lost when the cost of the CPU used is added to the human cost. Furthermore, the linguistic quality of the product is not a sufficient condition for improving the translators' efficiency: the translator workstation should be designed to fit the necessary ergonomic requirements.</Paragraph> <Paragraph position="12"> SIEMENS NIXDORF, which develops, maintains, uses (SIEMENS' translators use METAL) and commercialises the METAL MT system (cf. [Slocum 83], [Schneider 91]), is particularly interested in the definition of a common European NLP platform and wishes to improve METAL technology. This is the reason why SIEMENS NIXDORF decided to play an active role in the EUROLANG project. Its commercial experience is a major asset for the commercial prospects of EUROLANG.</Paragraph> <Paragraph position="13"> On these bases, SITE and SIEMENS NIXDORF decided that it was necessary to develop a new MT system, based on a considerably improved ARIANE and METAL technology, considering the advanced state of current computer technology and the evolution of linguistics. 
Technical choices are thus being made bearing in mind the industrial needs: portability, maintainability, openness, possibility of evolution and ergonomy.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> TECHNICAL ASPECTS </SectionTitle> <Paragraph position="0"> Computer aspects As already mentioned, the main objective is to provide a powerful toolbox, containing tools dedicated to linguistic developments. One of the most important characteristics of such a toolbox is the implied reusability of its components. A plug and play strategy will thus enable the linguists to develop different kinds of applications using the existing 'components' of the toolbox. A 'lingware workbench' will provide all the facilities required to specify, implement, test and maintain these applications. To facilitate the communication between the tools and with external systems, and thus enable the plug and play strategy, an API (Application Programming Interface) will be defined.</Paragraph> <Paragraph position="1"> The toolbox will be designed in such a way that new tools can easily be added, ensuring its durability. Any evolution of the 'state of the art' in computational linguistics can thus be rapidly taken into account in the EUROLANG product.</Paragraph> <Paragraph position="2"> Most of the initial tools will be specialized languages, allowing the developer (or linguist) to handle concepts he is used to. Such linguistic languages will consist of 4GLs (4th Generation Languages, i.e.</Paragraph> <Paragraph position="3"> specialized programming languages adapted to specific developments) and the associated compilers and interpreters. 
This architecture ensures greater independence between the lingware and the software (for instance, the pattern matching mechanism is part of the software and should not be programmed by the linguist), and consequently better linguistic modularity.</Paragraph> <Paragraph position="4"> Lexical and textual data bases are also needed in the toolbox, to enable an easy management of the lexical and textual resources. The lexical data base will provide a user-friendly interface to add or modify terms. A flexible underlying model is necessary to allow modification of the linguistic model, and thus modification of the linguistic information needed in the dictionaries.</Paragraph> <Paragraph position="5"> Representation of texts and characters in a multilingual environment is a crucial issue. Although work has already been undertaken to solve this problem, no general standard exists as yet, and an external and an internal representation should be designed, taking into account any standard or recommendation (e.g. the Text Encoding Initiative recommendation).</Paragraph> <Paragraph position="6"> A general exchange format, based on SGML (Standard Generalized Markup Language), will thus be defined for both lexical and textual data. It will guarantee the openness of the system by allowing the reusability of the lingware.</Paragraph> <Paragraph position="7"> The final MT environment will provide a user-friendly translator's workstation. Two kinds of functionalities are foreseen: pre-editing and post-editing functionalities. Pre-editing functionalities will comprise conventional tools (e.g. spelling checker) and enhanced functionalities (e.g. tools to handle new words and predict their linguistic behaviour). Post-editing functionalities will comprise functionalities needed by any translator (even to translate ab initio) and functionalities specialised for MT revision. 
Among all the foreseen functionalities, the following are worth underlining: direct access to dictionaries, management of successive annotations, intelligent search and replace manipulations, easy access to alternative translations offered by the MT system, requests for information concerning the MT system, and other specific word processing functions.</Paragraph> <Paragraph position="8"> To ensure that the system is portable, developments will be made in portable (ANSI) C or C++, under UNIX. Graphics will be produced under X-WINDOW/MOTIF. The standards currently in force will be respected (SQL, SGML, etc.). Although UNIX has been chosen for the developments during the project, the PC world (with WINDOWS) is one of our future objectives.</Paragraph> <Paragraph position="9"> Linguistic aspects The first application of the toolbox will be a multilingual MT system, based on second generation technology: the use by linguists of specialised languages, ensuring a better separation between the lingware and the software; and a three-phase translation (analysis, transfer and generation), ensuring a better multilingual approach.</Paragraph> <Paragraph position="10"> The underlying linguistic theory is based on a syntactico-semantic analysis, giving a deep representation of the text in an annotated tree structure (in which each node is 'annotated' by a set of linguistic features). The main tools used in performing such an analysis will be a slightly contextual parser, based on the METAL parser, and a ROBRA-like tree transducer (ROBRA is the tree transducer designed by GETA in the ARIANE MT system, cf. [Boitet 82], [Boitet 86]). 
The transfer phase makes it possible to translate words in context and the generation phase allows the linguist to specify the surface structure of the text, depending on the deep structure calculated.</Paragraph> <Paragraph position="11"> The linguistic development methodology, already used by the B'VITAL and SITE linguistic teams, implies formal linguistic specifications to describe the desired deep structure. These specifications will be written using a specialised 4GL inspired by GETA's static grammar formalism (cf. [Vauquois 85]).</Paragraph> <Paragraph position="12"> Common linguistic interface structures are being defined (in the first project phase) to facilitate the plug and play mechanism between different linguistic components. These linguistic interfaces will consist of the definition of the minimal requirements which should be followed by the linguistic specifications of all the involved languages. This will also ensure a better multilingualism and make it possible to reduce the transfer phase between two languages.</Paragraph> <Paragraph position="13"> Given that the MT product is designed for use in industry, a certain number of characteristics are essential to the final system: the system should always provide at least one translation; when several translations are possible, the presentation of the different solutions to the revisor should be user-friendly; and unpredicted phenomena or new words should not block the whole translation process (robustness).</Paragraph> </Section> </Paper>