TOWARD INTEGRATED DICTIONARIES FOR M(a)T:
motivations and linguistic organisation
presented for COLING-86
Bonn, 25-29 August 1986
by
Ch. BOITET & N. NEDOBEJKINE
GETA, BP 68
Université de Grenoble
38402 Saint-Martin-d'Hères, FRANCE
ABSTRACT
In the framework of Machine (aided) Translation systems, two types of lexical knowledge are used, "natural" and "formal", in the form of on-line terminological resources for human translators or revisors and of coded dictionaries for Machine Translation proper.
A new organization is presented, which allows both types to be integrated in a unique structure, called "fork" integrated dictionary, or FID. A given FID is associated with one natural language and may give access to translations into several other languages.
The FIDs associated with languages L1 and L2 contain all the information necessary to generate the coded dictionaries of M(a)T systems translating from L1 into L2 or vice versa.
The skeleton of a FID may be viewed as a classical monolingual dictionary, augmented with one (or several) bilingual dictionaries. Each item is a tree structure, constructed by taking the "natural" information (a tree) and "grafting" onto it some "formal" information.
Various aspects of this design are refined and illustrated by detailed examples, several scenarios for the construction of FIDs are presented, and some problems of organization and implementation are discussed. A prototype implementation of the FID structure is under way in Grenoble.
Key-words: Machine (aided) Translation, Fork Integrated Dictionary, Lexical Data Base, Specialized Languages for Linguistic Programming.
Abbreviations: M(a)T, MT, HAMT, MAHT, FID, LEXDB, SLLP.
INTRODUCTION
Integrated Machine (aided) Translation ("M(a)T") systems include two types of translator aids. First, there is a sort of translator "workstation", relying on a text processing system augmented with special functions and giving access to one or several "natural" on-line "lexical resources" [4,7], such as dictionaries, terminology lists or data banks, and thesauri. This constitutes the Machine Aided Human Translation ("MAHT") aspect. Second, there may be a true Machine Translation ("MT") system, whose "lingware" consists of "coded" grammars and dictionaries. This is the (human aided) MT aspect, abbreviated as "HAMT", or simply "MT", because human revision is even more necessary for machine translations than for human translations.
The term "coded" does not only mean that MT grammars and dictionaries are written in Specialized Languages for Linguistic Programming ("SLLP"), but also that the grammatical and lexical information they contain is of a more "formal" nature. In some systems, the formal lexical information is a reduction (and perhaps an oversimplification) of the information found in usual dictionaries. But in all sophisticated systems, it is far more detailed, and relies on some deep analysis of the language. Moreover, the access keys may be different: classical dictionaries are accessed by lemmas, while formal dictionaries may be accessed by morphs (roots, affixes...), lemmas, lexical units, and even other linguistic properties. In many systems written in ARIANE-78 [1], lemmas are not directly used.
Efforts have been made to devise data base systems for the natural or the formal aspect, separately. Multilingual terminological data bases, such as TERMIUM [5] or EURODICAUTOM, illustrate the first type.
On the other hand, the Japanese and the French National MT projects have developed specialized lexical data base systems ("LEXDB"), in which the (formal) information is entered, and from which MT dictionaries are produced. More precisely, there is a data base for each language (L), and one for each pair of languages (L1,L2) handled by the MT system. From the first LEXDB, analysis and synthesis MT dictionaries for L are automatically constructed, while transfer dictionaries for (L1,L2) are produced from the second.
In an integrated M(a)T system, it would be useful to maintain the two types of dictionaries in a unique structure, in order to ensure coherency. This structure would act as a "pivot", being the source of the "natural" view as well as of the "formal" dictionaries. Moreover, it would be interesting, for the same reasons, to reduce the number of LEXDBs. With the technique mentioned above, there are n² of them for n languages.
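The n² figure can be checked with a short computation. This is a minimal sketch, assuming one LEXDB per language plus one per ordered pair of languages (transfer dictionaries being directional), which is the reading under which the total is exactly n²; the function names are hypothetical.

```python
def lexdb_count(n: int) -> int:
    """LEXDBs needed with one base per language and one per ordered pair."""
    monolingual = n              # analysis and synthesis dictionaries for each L
    bilingual = n * (n - 1)      # transfer dictionaries for each (L1, L2), L1 != L2
    return monolingual + bilingual   # n + n(n - 1) = n**2


def fid_count(n: int) -> int:
    """With the organization proposed here, one FID per language."""
    return n


for n in (2, 3, 5):
    print(f"{n} languages: {lexdb_count(n)} LEXDBs vs {fid_count(n)} FIDs")
```

For three languages, this gives 9 LEXDBs against 3 FIDs.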
The authors began research along these lines in 1982 [6]. In 1985, this led to a tentative (small-scale) implementation of a first prototype, adapted to the aims of a Eurotra contract.
At the time of revision of this paper, work on specification and implementation was being continued by a small team trying to construct a Japanese-French-English LEXDB for a particular domain. This is why some details given in this paper are already obsolete. However, the spirit has remained the same.
The main idea of the new organization is to integrate both types of dictionaries in a unique structure, called "fork" integrated dictionary, or "FID". A given FID is associated with one natural language and may give access to translations into several other languages. Hence, there would be only n FIDs for n languages. The form of the "natural" part has been designed to reflect the organization of current modern usual dictionaries. This is why we have limited ourselves to the "fork" architecture, and have not attempted to construct a unique structure for n languages.
In the first part, we present the "skeleton" of a FID item. Part II shows how to "graft" codes onto it, and discusses the nature and place of those codes. Finally, some problems of organization and implementation are discussed in Part III. An annex gives a complete example for the lemmas associated with the lexical unit COMPTER.
I. USING A "NATURAL" SKELETON
After having studied the structures of several classical dictionaries, including LOGOS, LAROUSSE, ROBERT, HARRAP'S, WEBSTER, SACHS, etc., we have proposed a standard form for the "natural skeleton" of a FID item. Items are accessed by the lemmas, but the notion of lexical unit ("LU", or "UL" in French) is present. Lemmas are "normal forms" of words (in English, the infinitive for verbs, the singular for nouns, etc.). A lexical unit is the main element of a derivational family, and is usually denoted by the main lemma of this family. Lexical units are useful in MT systems, for paraphrasing purposes.
1. SOME SIMPLE EXAMPLES
1.1. "atmosphère", "atmosphérique"
clé "atmosphère"
  lm cl N.F. ul -- base --
    constr 1 : NON QUANTIFIE
      raff 1 : ASTRONOMIE
        sens 1 :
          def "masse gazeuse qui entoure un astre"
          ex "l'atmosphère terrestre"
          dériv "atmosphérique" cl A.
            schem RELATIF-A
        trad 1 :
          ANG "atmosphere"
          RUS "atmosfera"
          ALM "Atmosphäre"
      raff 2 : FIGURE
        sens 2 :
          def "ambiance, climat moral"
          ex "une atmosphère déprimante"
        trad 2 :
          ANG voir trad 1
          RUS voir trad 1
          ALM "Stimmung"
    constr 2 : QUANTIFIE
      sens 3 :
        def "unité de pression"
        ex "une pression de 2 atmosphères"
      trad 3 : voir trad 1

clé "atmosphérique"
  lm cl A. ul "atmosphère" cl orig N.F. voir ul sens 1
    sens
      def "relatif à l'atmosphère"
      ex "perturbations atmosphériques"
    trad
      ANG "atmospheric"
      RUS "atmosfernyij"
      ALM "atmosphärisch"

clé "préméditer"
  lm cl V.T.1 ul -- base "prémédit"
    dériv suff "ation" cl N.F. schem ACTION-DE
    dériv PPAS cl A. schem QUI-EST--
    sens
      def "décider, préparer avec calcul"
      ex "le pharmacien avait prémédité la rupture"
      ex "il avait prémédité de s'enfuir"
    trad
      ANG "premeditate" cl V.
      RUS "zamyishlitq"
      ALM "vorsetzen"

clé "prémédité"
  lm cl A. ul "préméditer" cl orig V.T.1
    dériv PPAS
    sens
      def "qui est réalisé avec préméditation"
      ex "son crime fut prémédité"
    trad
      ANG "premeditated" cl A.
      RUS "prednamerennyij"
      ALM "vorsätzlich"

clé "préméditation"
  lm cl N.F. ul "préméditer" cl orig V.T.1
    dériv suff "ation" schem ACTION-DE
    sens
      def "dessein réfléchi d'accomplir une action"
      ex "meurtre avec préméditation"
    trad
      ANG "premeditation"
      RUS "prednamerennostq"
      ALM "Vorsatz"
1.3. Types of elements in the notation
There are three types of elements in the examples. Keywords are underlined; they show the articulation of the standard structure. In case of repetition at the same level, numbers are used (e.g. trad 1). Identifiers are in uppercase (and should be in italics, but for the limitations of our printer). They correspond to the list of abbreviations which is usually placed at the beginning of a classical dictionary. They may contain some special signs such as "." or "-".
Strings are shown between double quotes. They correspond to the data. We use our "local" transcription, based on ISO-025 (French character set).
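The three types of elements can be told apart mechanically. The following is a minimal sketch of a tokenizer for one line of the notation; the keyword list is an assumption drawn from the examples above, not a complete inventory, and the extra labels KEY-REF (for "--") and NUMBER (for the numbers attached to repeated keywords) are added for illustration only.

```python
import re

# Keywords taken from the examples above; an assumption, not a complete list.
KEYWORDS = {"clé", "lm", "cl", "ul", "base", "constr", "raff", "sens", "def",
            "ex", "dériv", "suff", "schem", "trad", "voir", "orig"}

# A string is anything between double quotes; other tokens are blank-separated.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def classify(line: str) -> list[tuple[str, str]]:
    """Label the tokens of one line of the notation."""
    result = []
    for tok in TOKEN_RE.findall(line):
        if tok == ":":
            continue                                   # pure punctuation
        if tok.startswith('"'):
            result.append(("STRING", tok[1:-1]))       # data
        elif tok == "--":
            result.append(("KEY-REF", tok))            # stands for the key of the item
        elif tok in KEYWORDS:
            result.append(("KEYWORD", tok))            # articulation of the structure
        elif tok.rstrip(":").isdigit():
            result.append(("NUMBER", tok.rstrip(":")))  # e.g. the 1 of "trad 1"
        elif tok.isupper():
            result.append(("IDENTIFIER", tok))         # abbreviation: N.F., RELATIF-A
        else:
            result.append(("UNKNOWN", tok))
    return result


print(classify('dériv "atmosphérique" cl A.'))
# [('KEYWORD', 'dériv'), ('STRING', 'atmosphérique'), ('KEYWORD', 'cl'), ('IDENTIFIER', 'A.')]
```

Quoted strings are matched first, so data containing blanks, such as a definition, stays a single token.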
2. FORM OF AN ITEM
2.1. Keys, lemmas, lexical units
As illustrated above, an item may consist of several lemmas, because of possible ambiguities between two canonical forms (e.g. LIGHT-noun and LIGHT-adjective). The corresponding LU is always given. The symbol "--" stands for the key of the item. Confusion should be avoided in the denotation of LUs. For example, for the lemmas LIGHT, we could denote the LU corresponding to the first one (the noun) by "-- lm 1" or "-- cl N.".
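Resolving "--" and such LU denotations is simple once an item is represented as a key with a list of lemmas. A minimal sketch follows; the classes Item and Lemma and their method names are hypothetical, and we assume, for the LIGHT example, that both lemmas have the key itself as LU.

```python
from dataclasses import dataclass


@dataclass
class Lemma:
    cls: str   # class, e.g. "N." or "A."
    lu: str    # lexical unit; "--" stands for the key of the item


@dataclass
class Item:
    key: str
    lemmas: list[Lemma]

    def expand(self, text: str) -> str:
        """Replace the symbol "--" by the key of the item."""
        return text.replace("--", self.key)

    def denote(self, ref: str) -> Lemma:
        """Resolve a denotation such as "-- lm 1" or "-- cl N."."""
        head, kind, value = ref.split(maxsplit=2)
        if head != "--":
            raise ValueError("only denotations relative to this key are handled")
        if kind == "lm":
            return self.lemmas[int(value) - 1]         # lemmas are numbered from 1
        if kind == "cl":
            found = [lm for lm in self.lemmas if lm.cls == value]
            if len(found) != 1:
                raise LookupError(f"{ref!r} does not select exactly one lemma")
            return found[0]
        raise ValueError(f"unknown kind of denotation: {kind!r}")


light = Item("LIGHT", [Lemma("N.", "--"), Lemma("A.", "--")])
noun = light.denote("-- lm 1")
print(noun.cls, light.expand(noun.lu))   # N. LIGHT
```

A denotation by class fails loudly when two lemmas share the class, which is exactly the confusion the text warns against.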
2.2. Constructions, refinements, meanings
The preceding items have been chosen for their relative simplicity. In general, a lemma may lead to several constructions, and a construction to several refinements, each defined as a "meaning", for lack of a better word.
Further refinements may be added, to select various translations for a given meaning. The following diagram illustrates the idea.
  key
  |_ lemma
  |    |_ construction
  |    |    |_ meaning/transl. __ ANG construction
  |    |                       |_ RUS construction
  |    |                       |_ ALM construction
  |    |_ construction
  |         |_ refinement
  |         |    |_ meaning/transl. __ ANG __ refinement __ construction
  |         |                       |      |_ refinement __ construction
  |         |                       |_ RUS construction
  |         |                       |_ ALM construction
  |         |_ refinement
  |              |_ meaning/transl. __ ANG construction
  |                                 |_ RUS construction
  |                                 |_ ALM construction
  |_ lemma ... meaning/transl. ...
  <-------------- L1 --------------><-------- L2 -------->
Intuitively, constraints are more local to the left than to the right. The presence of a construction may be tested in a sentence, but the notion of domain of discourse or of level of language is obviously more global.
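The left-to-right order of constraints suggests a simple selection procedure: first test each construction locally on the sentence, then use the more global context to choose among its refinements. This is a minimal sketch under strong assumptions: the classes, the boolean test attached to each construction, the single context label per refinement and the helper quantified are hypothetical simplifications, and the data is abridged from the "atmosphère" item.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Meaning:
    definition: str
    translations: dict[str, str]          # target language -> translation


@dataclass
class Refinement:
    context: Optional[str]                 # global: domain, level of language; None = any
    meaning: Meaning


@dataclass
class Construction:
    test: Callable[[list[str]], bool]      # local: checked on the sentence itself
    refinements: list[Refinement] = field(default_factory=list)


def select(constructions, sentence, context, target):
    """Return the translation reached by the first path whose constraints hold."""
    words = sentence.split()
    for construction in constructions:
        if not construction.test(words):   # local constraint (left of the diagram)
            continue
        for refinement in construction.refinements:
            if refinement.context in (None, context):   # global constraint
                return refinement.meaning.translations.get(target)
    return None


def quantified(words):                     # crude stand-in for a real syntactic test
    return any(w.isdigit() for w in words)


ATMOSPHERE = [
    Construction(lambda w: not quantified(w), [
        Refinement("ASTRONOMIE", Meaning("masse gazeuse qui entoure un astre",
                                         {"ANG": "atmosphere", "ALM": "Atmosphäre"})),
        Refinement("FIGURE", Meaning("ambiance, climat moral",
                                     {"ANG": "atmosphere", "ALM": "Stimmung"})),
    ]),
    Construction(quantified, [
        Refinement(None, Meaning("unité de pression", {"ANG": "atmosphere"})),
    ]),
]

print(select(ATMOSPHERE, "une atmosphère déprimante", "FIGURE", "ALM"))   # Stimmung
```

The local test is evaluated on the words of the sentence, while the context is supplied from outside, which mirrors the distinction made above.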
The notion of construction is fundamental. In particular, predicative words cannot be translated in isolation, and it is necessary to translate expressions of the form P(x,y,z), P being the predicate and x, y, z its arguments, possibly with conditions on the arguments. Note that idioms or locutions are particular forms of constructions.
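The construction notation used in the annex (for instance QN.x -- QCH.y A QN.z for "compter", translated by S-O.x -- S-O.z FOR S-TH.y for "charge") makes this predicate-argument structure explicit: "--" stands for the predicate and each argument carries a variable. A minimal sketch, assuming this token-level reading of the notation (the regular expression and the function names are hypothetical), checks that two constructions share the same arguments and fills a target construction.

```python
import re

# An argument slot is CATEGORY.variable, e.g. QN.x, S-TH.y, S-O/S-TH.z.
SLOT = re.compile(r"^(?P<cat>[A-Z/+-]+)\.(?P<var>[a-z])$")


def arguments(pattern: str) -> list[str]:
    """Argument variables of a construction, in surface order."""
    return [m.group("var") for tok in pattern.split() if (m := SLOT.match(tok))]


def realize(pattern: str, predicate: str, fillers: dict[str, str]) -> str:
    """Fill a construction: '--' is the predicate, CAT.v is argument v,
    and any other token is a fixed word such as a preposition."""
    words = []
    for tok in pattern.split():
        slot = SLOT.match(tok)
        if tok == "--":
            words.append(predicate)
        elif slot:
            words.append(fillers[slot.group("var")])
        else:
            words.append(tok.lower())
    return " ".join(words)


source = "QN.x -- QCH.y A QN.z"          # compter, constr 1
target = "S-O.x -- S-O.z FOR S-TH.y"      # charge, first cstrad of trad 1
print(arguments(source), arguments(target))   # ['x', 'y', 'z'] ['x', 'z', 'y']
print(realize(target, "charges",
              {"x": "the garage", "y": "the repair", "z": "the customer"}))
# the garage charges the customer for the repair
```

The same variables appear on both sides in a different surface order, which is why a predicate must be translated together with its arguments.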
In general, refinements may be local or global. Local refinements often consist in restrictions on the semantic features of the arguments ("to count on somebody" vs. "to count on something"). Global refinements concern the domain, the style (level of discourse), or the typology (abstract, bulletin, article, check-list...).
In our view, a meaning in L1 is translated by one or several constructions in L2.
We have thus avoided translating a meaning by a meaning, which might seem more logical. But this would have forced us to describe the corresponding cascade of constraints in L2. As a matter of fact, it is usually possible to reconstruct it from the constraints in L1 and contrastive knowledge about L1 and L2. Hence, we follow the practice of usual dictionaries.
2.3. Translations: "fork" dictionaries
We have shown how to include in an item its translations into several target languages; hence the term "fork". The "handle" of the item consists of all the information concerning the source language (L1). In order for such an organization to work, we must have at least two such dictionaries, for L1 and L2, as no detailed information about L2 is included in the L1-based dictionary. This information may be found in the L2-based dictionary, by looking up the appropriate item and locating the construction: the information lies on the path from the key to the construction.
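Locating the construction in the L2-based dictionary amounts to following a path from the key. A minimal sketch, assuming a much simplified FID in which a key maps to its lemmas (indexed by class) and each lemma to its list of constructions; the dictionary contents and the function name are hypothetical.

```python
# Hypothetical, much simplified English-based FID: key -> class -> constructions.
FID_ANG = {
    "charge": {
        "V.": ["S-O.x -- S-O.z FOR S-TH.y",
               "S-O.x -- S-TH.y TO S-O.z"],
    },
    "count": {
        "V.": ["S-O/S-TH.x -- S-O/S-TH.y"],
    },
}


def locate(fid, key, cls, construction):
    """Return the path (key, class, construction number) in the target-language
    FID, or None; the L2 information is found along this path."""
    constructions = fid.get(key, {}).get(cls, [])
    if construction in constructions:
        return key, cls, constructions.index(construction) + 1
    return None


# One branch of the French item "compter": trad 1, ANG part.
print(locate(FID_ANG, "charge", "V.", "S-O.x -- S-TH.y TO S-O.z"))   # ('charge', 'V.', 2)
```

A failed lookup (None) signals that the L1-based dictionary refers to something absent from the L2-based one, an inconsistency worth reporting.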
3. FACTORIZATION AND REFERENCE
As seen in the examples, we introduce some possibilities of naming subparts of a given lemma, by simply numbering them (in "atmosphère", trad 3 simply refers to trad 1).
This makes it possible not only to factorize some information, such as translations, but also to defer certain parts of the item. For example, translations might be grouped at the end of the (linear) writing of an item. The same can be said of the formal part of the information (see below).
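Resolving such references is straightforward as long as they are followed recursively and cycles are rejected. A minimal sketch, assuming numbered parts are stored under their names (e.g. "trad 1") and that a reference is written "voir <name>", either for a whole part or for one language within it; the function name is hypothetical.

```python
def translation(parts: dict, trad: str, lang: str, _seen=None) -> str:
    """Translation of one numbered trad part into lang, following 'voir' references."""
    seen = set() if _seen is None else _seen
    if (trad, lang) in seen:
        raise ValueError(f"circular reference through {trad} ({lang})")
    seen.add((trad, lang))
    part = parts[trad]
    if isinstance(part, str):            # assumed: a string part is always a reference
        return translation(parts, part.removeprefix("voir "), lang, seen)
    entry = part[lang]
    if entry.startswith("voir "):        # only this language is deferred
        return translation(parts, entry.removeprefix("voir "), lang, seen)
    return entry


ATMOSPHERE = {
    "trad 1": {"ANG": "atmosphere", "RUS": "atmosfera", "ALM": "Atmosphäre"},
    "trad 2": {"ANG": "voir trad 1", "RUS": "voir trad 1", "ALM": "Stimmung"},
    "trad 3": "voir trad 1",
}
print(translation(ATMOSPHERE, "trad 2", "ANG"))   # atmosphere
```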
II. GRAFTING "FORMAL" INFORMATION ("CODES")
1. PRINCIPLES
1.1. Attributes and classes
The formalized information may correspond to several distinct linguistic theories. Such a theory is defined by a set of formal attributes, each of a well-defined type. For example, the morphosyntactic class might be defined as a scalar attribute:
    CATMS = (VERB, NOUN, ADJECTIVE, ADVERB, CONJUNCTION, etc.)
The gender might be defined as a set attribute:
    GENDER = ens (MASCULIN, FEMININ, NEUTRE).
Each theory may give rise to several implementations (lingwares), each of them having a particular notation for representing these attributes and their values. Moreover, in a given lingware, the information relative to an item may be distributed among several components, such as analysis, transfer and synthesis dictionaries.
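A theory in this sense can be declared as a set of typed attributes against which info parts are checked. A minimal sketch, assuming only the two attribute types mentioned above (scalar and set); the Attribute class is hypothetical, and the value lists are truncated as in the text ("etc.").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    values: frozenset
    is_set: bool = False   # scalar: exactly one value; set ("ens"): any subset

    def check(self, value):
        """Return the value if it is well typed for this attribute, else raise."""
        if self.is_set:
            if isinstance(value, str):
                raise TypeError(f"{self.name} expects a collection of values")
            chosen = frozenset(value)
        else:
            chosen = frozenset({value})
        unknown = chosen - self.values
        if unknown:
            raise ValueError(f"{self.name}: unknown value(s) {sorted(unknown)}")
        return chosen if self.is_set else value


# Value lists truncated as in the text ("etc.").
CATMS = Attribute("CATMS", frozenset({"VERB", "NOUN", "ADJECTIVE", "ADVERB", "CONJUNCTION"}))
GENDER = Attribute("GENDER", frozenset({"MASCULIN", "FEMININ", "NEUTRE"}), is_set=True)

print(CATMS.check("NOUN"), sorted(GENDER.check({"MASCULIN", "FEMININ"})))
# NOUN ['FEMININ', 'MASCULIN']
```

Each lingware would then only need a table translating these values into its own notation.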
Usually, combinations of particular properties (or attribute/value pairs) are given names and called classes. For example, in ARIANE-78, there are the "morphological" and "syntactic" "formats", abbreviated as FTM and FTS, in the AM (morphological analysis) dictionaries. Special questionnaires, called "indexing charts", lead to the appropriate class by asking global questions (vs. one particular question for each possible attribute).
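A class can be seen as a name for a set of attribute/value pairs, and an indexing chart as a decision tree whose global questions lead to one class. A minimal sketch, assuming binary questions; the class names NCFEM and VB appear in the coded examples of this paper, but their contents here, the class NCMAS and the questions are hypothetical.

```python
# Hypothetical classes: each name stands for a set of attribute/value pairs.
CLASSES = {
    "VB":    {"CATMS": "VERB"},
    "NCFEM": {"CATMS": "NOUN", "GENDER": "FEMININ"},
    "NCMAS": {"CATMS": "NOUN", "GENDER": "MASCULIN"},
}

# Hypothetical indexing chart: (question, subchart if yes, subchart if no);
# a leaf is a class name. One global question may settle several attributes.
CHART = ("Is the word a verb?",
         "VB",
         ("Can the word follow 'une' in a noun phrase?", "NCFEM", "NCMAS"))


def run_chart(chart, answer):
    """Walk the chart, asking questions through answer(question) -> bool."""
    while not isinstance(chart, str):
        question, if_yes, if_no = chart
        chart = if_yes if answer(question) else if_no
    return chart


replies = {"Is the word a verb?": False,
           "Can the word follow 'une' in a noun phrase?": True}
cls = run_chart(CHART, replies.__getitem__)
print(cls, CLASSES[cls])   # NCFEM {'CATMS': 'NOUN', 'GENDER': 'FEMININ'}
```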
1.2. Form of what is grafted
In the simplest case, there is one theory, and one corresponding lingware. The grafted part will be of the form:
    app info   properties in the theory
        code   codes (classes and possibly basic properties)
The keyword app means "appended".
In a less simple case, there might be two theories of French, called A and B. Suppose that there is an analyzer, FR1, and a synthesizer, FRA, corresponding to A, and two analyzers and a synthesizer (FR2, FR3, FRB), relative to B. The grafted part will be of the form:
    app th A info properties in theory A
             code LS FR1 AM FTM CMO01 FTS CS023
             code LC FRA ...           (LS for source language,
                                        LC for target language)
        th B info properties in theory B
             code LS FR2 AM FTM FORM3 FTS SEM25
             code LS FR3 ...
             code LC FRB ...
"AM" must be known as an introducer of codes for morphological analysis in ARIANE-78-based lingwares.
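Since a code line is a flat sequence of tokens, it can be split mechanically. A minimal sketch, assuming a line reads "code", then LS or LC, then the lingware, then groups opened by a component introducer (AM, as stated above, and AX, GM, GX and TL, which appear in the examples but whose readings are not given here), each followed by format/class pairs, commas joining several classes under one format. The parser and its output layout are hypothetical.

```python
# Component introducers appearing in the examples of this paper; only AM
# (morphological analysis) is explained in the text.
INTRODUCERS = {"AM", "AX", "GM", "GX", "TL"}


def parse_code(line: str) -> dict:
    """Split a 'code' line into direction, lingware and per-component codes."""
    tokens = line.replace(",", " , ").split()
    keyword, direction, lingware, *rest = tokens
    if keyword != "code" or direction not in ("LS", "LC"):
        raise ValueError(f"not a code line: {line!r}")
    components: dict[str, dict[str, list[str]]] = {}
    current = None
    i = 0
    while i < len(rest):
        if rest[i] in INTRODUCERS:
            current = components.setdefault(rest[i], {})
            i += 1
            continue
        if current is None or i + 1 >= len(rest):
            raise ValueError(f"malformed code line: {line!r}")
        fmt, classes = rest[i], [rest[i + 1]]
        i += 2
        while i + 1 < len(rest) and rest[i] == ",":   # e.g. "PAF SDETAT, V1ACT"
            classes.append(rest[i + 1])
            i += 2
        current[fmt] = classes
    return {"direction": direction, "lingware": lingware, "components": components}


print(parse_code("code LS FR2 AM FTM FORM3 FTS SEM25"))
# {'direction': 'LS', 'lingware': 'FR2', 'components': {'AM': {'FTM': ['FORM3'], 'FTS': ['SEM25']}}}
```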
Formal parts may be attached at all levels of an item, for factorization purposes. The information is supposed to be cumulated along a path from a key to a "meaning" or to a translation. If two bits of information are contradictory, the most recent one (rightmost in our diagrams) has preeminence.
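The cumulation rule amounts to merging the formal parts met along the path, later ones overriding earlier ones. A minimal sketch, assuming a simplified item tree in which each node may carry an info part (a mapping from attributes to values) and children indexed by name; the default SEM value attached to the lemma is hypothetical and only serves to show a contradiction being resolved.

```python
def info_on_path(node: dict, path: list[str]) -> dict:
    """Cumulate the info parts from the root (the key) along path.

    When two parts contradict each other, the most recent one (rightmost in
    the diagrams, i.e. deeper in the tree) has preeminence.
    """
    merged = dict(node.get("info", {}))
    for name in path:
        node = node["children"][name]
        merged.update(node.get("info", {}))
    return merged


# Simplified "atmosphère" item, theory A only. SEM=STRUCT at the lemma level is
# a hypothetical default; in the coded example of section 2 it sits under raff 1.
ITEM = {
    "info": {"CATMS": "NOUN", "GENDER": "FEMININ", "SEM": "STRUCT"},
    "children": {
        "constr 1": {"children": {"raff 1": {"info": {"SEM1": "ASTRE"}},
                                  "raff 2": {"info": {"SEM": "ETAT",
                                                      "SEM1": "ACTIVITE"}}}},
        "constr 2": {"info": {"SEM": "UNITE"}},
    },
}

print(info_on_path(ITEM, ["constr 2"]))
# {'CATMS': 'NOUN', 'GENDER': 'FEMININ', 'SEM': 'UNITE'}
```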
Taking again the example of systems written in ARIANE-78, we may suggest distributing the codes in the following fashion. One could attach:
- the morphological codes (FTM) and the "morphs" to the roots ("bases") or to the lemmas;
- the "local" syntactico-semantic codes (FTS) to the lemmas or to the constructions;
- the "global" syntactic codes (concerning the typology) to the various levels of refinement;
- the codes concerning the derivations to the dériv parts, wherever they appear in the item.
2. AN EXAMPLE ("ATMOSPHÈRE")
clé "atmosphère"
  lm cl N.F. ul --
    app
        th A info FLEXN=S, MORPH="atmosphère",
                  DERIV="atmosphérique"
             code LS FR1 AM FTM FXN1
             code LC FRA GM FAF FXN1
        th B info FLEXN=ES, MORPH="atmosphèr",
                  ALTER=GRAVE, SUF=IQUE
             code LS FR2 AM FTM FNES10
             code LC FRB GM FAF FNES10
    app th A info CATMS=NOUN, GENDER=FEMININ
             code LS FR1 AM FTS NCFEM
             code LC FRA GM FAF NCFEM
        th B info CAT=N, GNR=FEM, N=NC, AMBSEM=3
             code LS FR2 AM FTS NCFEM3
             code LC FRB GM FAF NCFEM
    constr 1 : NON QUANTIFIE
      raff 1 : ASTRONOMIE
        sens 1 :
          def "masse gazeuse qui entoure un astre"
          ex "l'atmosphère terrestre"
          dériv "atmosphérique" cl A.
            schem RELATIF-A
        trad 1 :
          ANG "atmosphere"
          RUS "atmosfera"
          ALM "Atmosphäre"
        app th A info SEM=STRUCT, SEM1=ASTRE,
                      DERPOT=NADJ, SCHEM=13
                 code LS FR1 AX FAF PNA
                 code LC FRA GX PAF PNA13
            th B info SEM=COLLECT, CLCT=FLUID,
                      SEM1=SPHERE, DERPOT=NA
                 code LS FR2 AX FAF PNA PAF COLF
                 code LC FRB GX FAF DERIQUE
      raff 2 : FIGURE
        sens 2 :
          def "ambiance, climat moral"
          ex "une atmosphère déprimante"
        trad 2 :
          ANG voir trad 1
          RUS voir trad 1
          ALM "Stimmung"
        app th A info SEM=ETAT, SEM1=ACTIVITE
                 code LS FR1 AX PAF SDETAT, V1ACT
                 code LC FRA ...
    constr 2 : QUANTIFIE
      sens 3 :
        def "unité de pression"
        ex "une pression de 2 atmosphères"
      trad 3 : voir trad 1
      app th A info SEM=UNITE
               code LS FR1 AX PAF SOUNT
               code LC FRA ...
          th B info SEM=UNITE, SEM2=POIDS
               code LS FR1 AX PAF SOUNT, VPPS
               code LC FRB ...
3. CONSTRUCTION OF INTEGRATED DICTIONARIES
Suppose the natural skeleton of an item is obtained by using available dictionaries. There are two main methods for constructing the app parts.
First, one may begin by filling the info parts. This is the technique followed by the two afore-mentioned national projects. For this, people without special background in computational linguistics may be used. They fill in questionnaires (on paper or on screen) asking questions directly related to the formal attributes. This information is checked and inserted in the info parts at the proper places, which are determined by knowing the relation between the "natural" information and the "theory".
In a second stage, programs knowing the relation between the theory and a particular lingware will fill the code parts.
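The programs of this second stage can be driven by a table describing the relation between the theory and each lingware. A minimal sketch, assuming this relation is written as rules of the form "if all these properties hold, emit this class"; the rule table and the function name are hypothetical, although the lines produced for the "atmosphère" lemma coincide with those of the coded example in section 2.

```python
# Hypothetical relation between theory A and two lingwares: for each
# (direction, lingware), rules "required properties -> component, format, class".
RULES = {
    ("LS", "FR1"): [
        ({"CATMS": "NOUN", "GENDER": "FEMININ"}, "AM", "FTS", "NCFEM"),
        ({"CATMS": "VERB"}, "AM", "FTS", "VB"),
    ],
    ("LC", "FRA"): [
        ({"CATMS": "NOUN", "GENDER": "FEMININ"}, "GM", "FAF", "NCFEM"),
    ],
}


def code_lines(info: dict) -> list[str]:
    """Generate the code parts of one app part from its info part."""
    lines = []
    for (direction, lingware), rules in RULES.items():
        fields = []
        for required, component, fmt, cls in rules:
            if all(info.get(attr) == value for attr, value in required.items()):
                fields += [component, fmt, cls]
        if fields:
            lines.append(" ".join(["code", direction, lingware, *fields]))
    return lines


for line in code_lines({"CATMS": "NOUN", "GENDER": "FEMININ"}):
    print(line)
# code LS FR1 AM FTS NCFEM
# code LC FRA GM FAF NCFEM
```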
The second method tries to make better use of existing MT dictionaries. First, the relation between the elements of a lingware and the "natural" system is defined, and programs are constructed to extract the useful information from the MT dictionaries and to distribute it at the appropriate places. Then, knowing the relation between the "coded" information and the theory, info parts may be constructed or completed.
At the time this paper was revised, M. Dymetman was implementing such a program to construct a FID from our current Russian-French MT system. His results and conclusions should be the theme of a forthcoming paper.
Inconsistencies may be detected at various stages in the construction of a FID, and the underlying DB (data base) system must provide facilities for constructing checks, using them to locate incorrect parts, and modifying the item.
III. PROBLEMS OF DESIGN AND IMPLEMENTATION
The construction of an implemented "mock-up" has led us to identify some problems in the design, to wonder whether there is any available DBMS (data base management system) adequate for our purposes, and to ask what should be done about the representation of characters in a multilingual setting.
1. RELATION BETWEEN NATURAL AND FORMAL INFORMATION
The relation between the formal information of a theory and the formal information of an implemented model of it (a lingware) is simple: the latter is a notational variant of (a subset of) the former.
By contrast, it is not so easy to define and use the relation between a formal theory and the "natural" information. The theory might ignore some aspects, such as phonology or etymology, while it would use "semantic" categories (such as COUNTABLE, TOOL, HUMAN, PERSONNIFIABLE, CONCRETE, ABSTRACT...) far more detailed than the "natural" ones (SOMEBODY, SOMETHING...).
In order for the construction of such FIDs to be possible, we must at least require that all "selective" information, which guides the choice of a meaning and of a translation, be in some sense common to the natural and the formal systems.
Hence, these systems must share a certain degree of homogeneity. Dictionaries containing very little grammatical information (e.g. only the class) cannot be used as skeletons for FIDs integrating the lexical data base of a (lexically) sophisticated MT system.
Another problem is just how to express the relation between the systems, in such a way that it is possible:
- to reconstruct (part of) the skeleton of an item from the "coded" information;
- to compute (part of) the formal information on a path of the skeleton.
For the time being, we can write ad hoc programs to 
perform these tasks, for a particular pair of systems, 
but we have no satisfactory way to "declare" the relation 
and to automatically generate programs from it. 
2. TYPE OF UNDERLYING DATA-BASE SYSTEM 
P. Vauquois (a son of B. Vauquois) and D. Bachut have implemented the above-mentioned mock-up in Prolog-CRISS, a dialect of Prolog which provides facilities for the manipulation of "banks" of clauses. It is possible to represent the tree structure of an item directly by a (complex) term, making it easy to program the functions associated with a FID directly in Prolog.
However, Prolog is not a DBMS, and, at least with the current implementations of Prolog, a large scale implementation would be very expensive to use (in terms of time and space), or perhaps even impossible to realize.
As FIDs would certainly grow to at least 50000 items (perhaps to 200000 or more), it might be preferable to implement them in a commercially available DBMS, such as DL1, SOCRATE, etc. A numeric simulation made by E. de Goussineau shows that a (1-2) FID of about 100000 lemmas could be implemented in a Socrate DB, of the network type, in one or two "virtual spaces". No experiment has yet been conducted to evaluate the feasibility of the method and its cost.
Other possibilities include relational and specialized DBMSs. In a relational DBMS, each Socrate entity would give rise to a relation. Specialized DBMSs have been developed for terminological data banks, such as TERMIUM or EURODICAUTOM. There is a general tool for building terminological DBs, ALEXIS [3].
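To give an idea of the relational alternative, here is a minimal sketch using SQLite through Python's standard sqlite3 module, assuming one relation per level of the skeleton. The table and column names are hypothetical, the refinement level and the app parts are omitted for brevity, and nothing here reflects the Socrate simulation mentioned above.

```python
import sqlite3

# Hypothetical schema: one relation per level of the skeleton (refinements
# and app parts omitted).
SCHEMA = """
CREATE TABLE item         (entry_key TEXT PRIMARY KEY);
CREATE TABLE lemma        (id INTEGER PRIMARY KEY,
                           entry_key TEXT REFERENCES item(entry_key),
                           cls TEXT, lu TEXT);
CREATE TABLE construction (id INTEGER PRIMARY KEY,
                           lemma_id INTEGER REFERENCES lemma(id),
                           num INTEGER, label TEXT);
CREATE TABLE meaning      (id INTEGER PRIMARY KEY,
                           construction_id INTEGER REFERENCES construction(id),
                           num INTEGER, definition TEXT);
CREATE TABLE translation  (meaning_id INTEGER REFERENCES meaning(id),
                           lang TEXT, form TEXT);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO item VALUES ('atmosphère')")
db.execute("INSERT INTO lemma VALUES (1, 'atmosphère', 'N.F.', 'atmosphère')")
db.executemany("INSERT INTO construction VALUES (?, 1, ?, ?)",
               [(1, 1, "NON QUANTIFIE"), (2, 2, "QUANTIFIE")])
db.executemany("INSERT INTO meaning VALUES (?, ?, ?, ?)",
               [(1, 1, 1, "masse gazeuse qui entoure un astre"),
                (3, 2, 3, "unité de pression")])
db.executemany("INSERT INTO translation VALUES (?, ?, ?)",
               [(1, "ANG", "atmosphere"), (1, "ALM", "Atmosphäre"),
                (3, "ANG", "atmosphere")])

rows = db.execute("""
    SELECT c.label, m.num, t.form
    FROM lemma AS l
    JOIN construction AS c ON c.lemma_id = l.id
    JOIN meaning AS m      ON m.construction_id = c.id
    JOIN translation AS t  ON t.meaning_id = m.id
    WHERE l.entry_key = ? AND t.lang = ?
    ORDER BY m.num
""", ("atmosphère", "ANG")).fetchall()
print(rows)   # [('NON QUANTIFIE', 1, 'atmosphere'), ('QUANTIFIE', 3, 'atmosphere')]
```

Walking a path of the tree becomes a chain of joins, which is the price paid for the flat relational representation.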
3. CHARACTER SETS 
None of the above-mentioned systems provides facilities for handling multilingual character sets. Hence, all strings representing units of the natural languages considered, including the keys, must be represented by appropriate transcriptions.
This is clumsy for languages written in the Roman alphabet, and almost unacceptable for other languages, alphabetic or ideographic. Supposing that bit-map terminals and printers are available, two solutions may be envisaged:
- define appropriate ASCII or EBCDIC transcriptions, and equip the DBMS with corresponding interfaces;
- modify the DBMS itself to represent and handle several (possibly large) character sets. This is what has been done in Japan, where programming languages, text processing systems and operating systems have been adapted to the 16-bit JIS (or JES) standard.
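For the first solution, the transcription must be reversible, so that keys and strings can be converted back for display. A minimal sketch of such an interface layer, assuming a hypothetical escape character "%" followed by a two-character code; this is not the transcription actually used at GETA, and a real table would have to cover every character of the languages handled.

```python
# Hypothetical transcription table: accented letter -> two ASCII characters,
# always introduced by the escape character "%".
CODES = {"é": "e'", "è": "e`", "ê": "e^", "à": "a`", "â": "a^", "ç": "c,",
         "ù": "u`", "ô": "o^", "î": "i^", "ä": "a:", "ü": "u:"}
DECODE = {code: char for char, code in CODES.items()}


def to_ascii(text: str) -> str:
    """Transcribe text into plain ASCII for storage in the DBMS."""
    out = []
    for ch in text:
        if ch == "%":
            out.append("%%")                 # escape the escape character itself
        elif ch in CODES:
            out.append("%" + CODES[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            raise ValueError(f"no transcription for {ch!r}")
    return "".join(out)


def from_ascii(text: str) -> str:
    """Inverse of to_ascii, for display on a bit-map terminal."""
    out, i = [], 0
    while i < len(text):
        if text[i] != "%":
            out.append(text[i])
            i += 1
        elif text[i + 1] == "%":
            out.append("%")
            i += 2
        else:
            out.append(DECODE[text[i + 1:i + 3]])
            i += 3
    return "".join(out)


print(to_ascii("atmosphère"))   # atmosph%e`re
```

Note that ASCII order on transcribed keys differs from dictionary order, so the interface would also have to handle sorting.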
CONCLUSION
We have presented and illustrated the new concept of FID, or Fork Integrated Dictionary. To our knowledge, this is the first attempt to unify classical and MT dictionaries. However, only a small mock-up has been implemented, and some problems of design and implementation have been detected. It remains to be seen whether large scale FIDs can be constructed and used in an operational setting.
ACKNOWLEDGMENT 
We are grateful to ADI (French Agence de l'Informatique) and to the EC (European Community, EUROTRA project) for having given us the opportunity to start this research and to test some of our ideas.
--0--0--0--0--0--0--0--0--
ANNEX: "COMPTER"
clé "compter"
  app 1   (no ":", hence see forward)
  base "compt"   app 2
    constr 1 : QN.x -- QCH.y A QN.z
      app 3
      sens 1 :
        def "faire payer"
      trad 1 :
        ANG "charge" cl V.
          cstrad S-O.x -- S-O.z FOR S-TH.y
                 S-O.x -- S-TH.y TO S-O.z
          app 4
        RUS "zakhestq" cl V.
        ALM "auszahlen" cl V.
          cstrad J-D.x -- ETW.y J-M.z
          app 5
    constr 2 : QN.x -- QN.y POUR QN.z
               QN.x -- QCH.y POUR QCH.z
      app 6   (further app parts suppressed)
      sens 2 :
        def "tenir pour"
      trad 2 :
        ANG "consider" cl V.
          cstrad S-O.x -- S-O/S-TH.y AS S-O/S-TH.z
        RUS "skhitatq" cl V.
          cstrad KTO.x -- KOGO/KHTO.y KEM/KHEM.z
        ALM "halten" cl V.
          cstrad J-D.x -- J-N/ETW.y FUER J-N/ETW.z
    constr 3 : QN.x -- QN.y PARMI QN.z
               QN.y -- PARMI QN.z POUR QN.x
               QN.x -- QCH.y PARMI QCH.z
               QCH.y -- PARMI QCH.z POUR QN.x
      sens 3 :
        def "considérer comme faisant partie de"
      trad 3 :
        ANG "count" cl V.
          cstrad S-O.x -- S-O/S-TH.y AMONG S-O/S-TH.z
        RUS "skhitatq" cl V.
          cstrad KTO.x -- KOGO/KHTO.y SREDI KOGO/KHEGO.z
        ALM "zählen" cl V.
          cstrad J-D.x -- J-N/ETW.y ZU J-N/ETW.z
    constr 4 : QN.x -- INF/QUE+IND/SUR+QCH.y
      sens 4 :
        def "espérer"
      trad 4 :
        ANG "expect" cl V.
          cstrad S-O.x -- TO+INF/THAT+IND/S-TH.y
        RUS "rasskhityivatq" cl V.
          cstrad KTO.x -- INF/KHTO+IND/NA+KHTO.y
        ALM "hoffen" cl V.
          cstrad J-D.x -- ZU+INF/DASS+IND/AUF+ETW.y
    constr 5 : QN.x -- SUR QN.y
      sens 5 :
        def "avoir confiance"
      trad 5 :
        ANG "rely" cl V.
          cstrad S-O.x -- ON S-O.y
        RUS "polozhitqsya" cl V.
          cstrad KTO.x -- NA KOGO.y
        ALM "zählen" cl V.
          cstrad J-D.x -- AUF J-N.y
    constr 6 : QN.x -- AVEC QN/QCH.y
      sens 6 :
        def "prendre en considération"
      trad 6 :
        ANG "reckon" cl V.
          cstrad S-O.x -- WITH S-O/S-TH.y
        RUS "skhitatqsya" cl V.
          cstrad KTO.x -- S KEM/KHEM.y
        ALM "rechnen" cl V.
          cstrad J-D.x -- MIT J-M/ETW.y
    constr 7 : QCH.x -- TANT-DE.y
      sens 7 :
        def "totaliser"
        ex "la bibliothèque compte 1000 livres"
      trad 7 :
        ANG "count" cl V.
          cstrad S-TH.x -- SO-MUCH.y
        RUS "naskhityivatq" cl V.
          cstrad KHTO.x -- SKOLQKO.y
        ALM "zählen" cl V.
          cstrad ETW.x -- SOVIEL.y
    constr 8 : QN/QCH.x -- QCH.y
      raff x.PERSONNE/INSTRUMENT & y.NOM-DE-MESURE
        sens 8 :
          def "mesurer, évaluer"
        trad 8 :
          ANG "count" cl V.
            cstrad S-O/S-TH.x -- S-TH.y
          RUS "otskhitatq" cl V.
            cstrad KTO/KHTO.x -- KHTO.y
          ALM "rechnen" cl V.
            cstrad J-D/ETW.x -- ETW.y
      raff x.PERSONNE/INSTRUMENT
           & y.NOM-COLLECTIF/PLURIEL-DENOMBRABLE
        sens 9 :
          def "dénombrer"
          ex "compter les moutons"
        trad 9 :
          ANG "count" cl V.
            cstrad S-O/S-TH.x -- S-O/S-TH.y
          RUS "skhitatq" cl V.
            cstrad KTO/KHTO.x -- KOGO/KHTO.y
          ALM "zählen" cl V.
            cstrad J-D/ETW.x -- J-N/ETW.y
    constr 9 : QN/QCH.x --
      raff x.PERSONNE/INSTRUMENT
           & -- DE-TETE/SUR-SES-DOIGTS/JUSQU'A
        sens 10 :
          def "énumérer"
        trad 10 : voir trad 9 (sans y)
      raff
        sens 11 :
          def "être important"
        trad 11 :
          ANG voir trad 10
          RUS "skhitatqsya" cl V.
            cstrad NUZHNO -- S KEM/KHEM.x
          ALM "wichtig" cl A.
            cstrad J-D/ETW.x -- SEIN
      raff x.PERSONNE
        sens 12 :
          def "regarder à la dépense"
        trad 12 :
          ANG "stingy" cl A.
            cstrad S-O.x BE --
          RUS "yekonomnyij" cl A.
            cstrad KTO.x (BYITQ) --
          ALM "sparsam" cl A.
            cstrad J-D.x -- SEIN
    constr 10 : locut A -- DE QCH.x cl PREP.
      sens 13 :
        def "à partir de"
      trad 13 :
        ANG "reckoning" cl PREP.
          cstrad -- FROM S-TH.x
        RUS "nakhinaya" cl PREP.
          cstrad -- S KHEGO.x
        ALM "von" cl PREP.
          cstrad -- ETW.x AN
app 1 : th A info CAT=V, EXPANS=(TRANS,INTRANS),
                  SEM=(ACTION,ETAT)
             code LS FR1 AM FTS VB AX PAF VT1TR
                  LC FRA GX FAF VB
app 2 : th A info CONJUG=1GR
             code LS FR1 AM FTM VB1A
                  LC FRA GM FAF VB1A
app 3 : th A info PRED=ECHANGE, MODALITE=FACTITIF,
                  VL1=GN, VL2=AGN, VL0=GN
             code LS FR1 AX FAF SCHR11 PAF ECHFC
                  LC FRA GX PCP CSTR1 FAF SCHR11
app 4 : th A info ARG2=FOR, ARGINV=12
             code LS FR1 TL FAF XYFORZ PAF INV12
etc...
--0--0--0--0--0--0--0--0--

REFERENCES 

1. Ch. Boitet & N. Nedobejkine (1981),
"Recent developments in Russian-French Machine Translation at Grenoble",
Linguistics 19, 199-271 (1981).

2. Ch. Boitet, P. Guillaume, M. Quézel-Ambrunaz (1982),
"ARIANE-78: an integrated environment for automated translation and human revision",
Proceedings COLING82, North-Holland, Linguistic Series No 47, 19-27, Prague, July 82.

3. Société ERLI (1984),
"ALEXIS : présentation générale et manuel d'utilisation",
Soc. ERLI, Charenton, October 1984.

4. A. Melby (1982),
"Multi-level translation aids in a distributed system",
Proceedings COLING82, North-Holland, 215-220, Prague, July 82.

5. M. Perrier (1982),
"Banque TERMIUM, gouvernement du Canada. Lexique BCF (budgétaire, comptable et financier)",
Bureau des traductions, Direction de la terminologie, section économique et juridique, Ottawa, June 1982.

6. Ch. Boitet & N. Nedobejkine (1982),
"Base lexicale : organisation générale et indexage",
ESOPE project, ADI, final report, part D, 1982.

7. D.E. Walker (1985),
"Knowledge Resource Tools for Accessing Large Text Files",
Proc. of the Conf. on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Colgate Univ., Aug. 14-16, 1985.
