A PROTOTYPE MACHINE TRANSLATION BASED ON EXTRACTS FROM DATA PROCESSING MANUALS
11. Luctkens Ph. Ferment 
Department of Information Science and Documentation
Free University of Brussels
Belgium
The following article presents a prototype
for the machine translation of English into
French. The study was carried out over a
period of nine months, following a six-month
preliminary study, under contract with the
Burroughs Company and using a micro-computer
of the B20 series.
The prototype aims to provide a diagnostic
study that lays the foundations for further
development, rather than immediately producing
an accurate but limited realisation.
By way of experiment, the corpus for translation
was based on selected extracts from computer
systems manuals. After studying the basic
material and assessing the various decision
criteria, it was decided to construct a
prototype made up of three components:
analysis, transfer and generation.
Although the prototype was designed with
multilingual applications in mind, it appeared
preferable at this stage not to set up a system
with an interlingua, since the elaboration of
the interlingua alone would have taken up a
disproportionate amount of time (King, Perschke,
1984), thus handicapping the development of the
prototype itself.
1. General outline of the prototype

General outline              Prototype
SL text
Analysis
    Preprocessing ........ formatting of text with a view to
                           further processing
  + Morph. anal. ......... not envisaged for the moment
  + Synt. anal. .......... ATN to produce a deep structure
  + Disambiguation ....... not envisaged for the moment
Transfer
  + Lex. transfer ........ morphemic translation
    Str. transfer ........ adaptation of the parse tree to
                           generation in the TL
Generation
    Synt. synth. ......... generation of surface structures
                           linked with the SL
  + Morph. synth. ........ rules of agreement, conjugation, ...
TL text
    Post-editing ......... in the first stage, use of the B20
                           text processor

+ : sub-components with dictionary look-up
2. "Analysis" component
In the prototype, the "Analysis" component
uses only three of the above sub-components:
preprocessing, source-language dictionary and
syntactical parser. The reasons for not using
morphological analysis and disambiguation are
given below.
2.1. Preprocessing
The preprocessing sub-component recognizes
which sentences to analyse, a sentence being
considered as a series of signs which are
themselves grouped together into words, and
ending in a full stop. The full stop is the
only special sign which is taken into account.
Moreover, all capital letters placed at the
beginning of sentences are converted to lower
case before analysis and are reintroduced
during generation.
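As a rough illustration of this preprocessing step, the sketch below splits a text on full stops and records whether each sentence began with a capital, so that generation can restore it. The function names and the dictionary layout are hypothetical, and the naive treatment is an assumption made for brevity: acronyms such as 'ATN' would also lose their first capital, and decimal points would be taken as sentence ends.

```python
def preprocess(text):
    """Split text into sentences on full stops (the only special sign used)
    and lower-case each sentence-initial capital, remembering that it was there."""
    sentences = []
    for chunk in text.split("."):
        words = chunk.split()
        if not words:
            continue  # skip the empty chunk after the final full stop
        first = words[0]
        capitalised = first[:1].isupper()
        if capitalised:
            words[0] = first[0].lower() + first[1:]
        sentences.append({"words": words, "capitalised": capitalised})
    return sentences


def restore_capital(words, capitalised):
    """Reintroduce the sentence-initial capital during generation."""
    if capitalised and words:
        words = [words[0][0].upper() + words[0][1:]] + words[1:]
    return " ".join(words) + "."
```

For example, `preprocess("Each bit may be used. The field is virtual.")` yields two sentences whose first words are 'each' and 'the', both flagged as capitalised.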
One could envisage allowing for punctuation
signs when parsing, since these sometimes help
to root out ambiguities in certain sentences
[...].
2.2. Morphological analysis
As the prototype was being realised on the basis
of, and for, a limited corpus, the SL dictionary
was made up of complete forms: working out a
morphological parser is simpler than working out
a syntactical parser, so it could be left until
a later stage.
2.3. Syntactical analysis
The Augmented Transition Network (ATN) was
selected for the analysis: it had been used
successfully in many previous systems, such as
LUNAR, SHRDLU, INTELLECT and, more recently,
ENGSPAN (Leon, 1984). T. Winograd presents
three networks in great detail in his book
'Language as a Cognitive Process' (Winograd,
1983). These were taken as the basis for the
four networks of the prototype (Sentence, Noun
Phrase, Prepositional Phrase and Adjectival
Phrase), thus making it possible to speed up
the development of a parser which had already
proved itself elsewhere.
Most of the modifications made to Winograd's
ATN were aimed at increasing its performance
(especially by dealing with the most common
cases of coordination) and its deterministic
capacities, thereby ensuring the accuracy of
the initial analysis supplied by the system.
It is in fact on this first analysis that the
transfer operates, because the micro-computer's
memory was saturated before it had managed to
supply all possible analyses.
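To make the mechanism concrete, here is a toy transition network in the spirit of the prototype's Noun Phrase and Prepositional Phrase networks. It is neither Winograd's network nor the prototype's: the lexicon, state names and register names are hypothetical, and only category tests, sub-network calls (PUSH) and register setting are modelled. Like the prototype, it keeps the first analysis it finds rather than enumerating all of them.

```python
# Hypothetical toy lexicon: complete word form -> set of categories.
LEXICON = {
    "the": {"DET"}, "each": {"DET"},
    "virtual": {"ADJ"}, "logical": {"ADJ"},
    "field": {"NOUN"}, "item": {"NOUN"}, "value": {"NOUN"},
    "of": {"PREP"},
}

# Each network maps a state to an ordered list of arcs (kind, label, next_state, register).
# CAT consumes one word of category `label`; PUSH calls the sub-network `label`;
# JUMP moves without consuming input; POP ends the network successfully.
NETWORKS = {
    "NP": {
        "NP/": [("CAT", "DET", "NP/DET", "det"), ("JUMP", None, "NP/DET", None)],
        "NP/DET": [("CAT", "ADJ", "NP/DET", "adjs"), ("CAT", "NOUN", "NP/N", "head")],
        "NP/N": [("PUSH", "PP", "NP/N", "pps"), ("POP", None, None, None)],
    },
    "PP": {
        "PP/": [("CAT", "PREP", "PP/P", "prep")],
        "PP/P": [("PUSH", "NP", "PP/NP", "obj")],
        "PP/NP": [("POP", None, None, None)],
    },
}
LIST_REGISTERS = {"adjs", "pps"}  # registers that accumulate several values


def set_reg(registers, reg, value):
    """Return a copy of the registers with `reg` set, so failed paths leave no trace."""
    new = dict(registers)
    if reg in LIST_REGISTERS:
        new[reg] = new.get(reg, []) + [value]
    else:
        new[reg] = value
    return new


def traverse(net, state, words, pos, registers):
    """Try the arcs of `state` in order; return the first (registers, position) found, else None."""
    for kind, label, nxt, reg in NETWORKS[net][state]:
        if kind == "POP":
            return registers, pos
        result = None
        if kind == "JUMP":
            result = traverse(net, nxt, words, pos, registers)
        elif kind == "CAT":
            if pos < len(words) and label in LEXICON.get(words[pos], set()):
                result = traverse(net, nxt, words, pos + 1,
                                  set_reg(registers, reg, words[pos]))
        elif kind == "PUSH":
            sub = traverse(label, label + "/", words, pos, {})
            if sub is not None:
                sub_registers, sub_pos = sub
                result = traverse(net, nxt, words, sub_pos,
                                  set_reg(registers, reg, (label, sub_registers)))
        if result is not None:
            return result
    return None


def parse(net, words):
    """Parse the whole word list with network `net`; return its registers or None."""
    result = traverse(net, net + "/", words, 0, {})
    if result is not None and result[1] == len(words):
        return result[0]
    return None
```

For instance, `parse("NP", "the virtual field of each item".split())` returns registers with head 'field', adjective 'virtual' and one PP whose object NP has head 'item'.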
2.4. Disambiguation 
Within the prototype framework, the creation
of a disambiguation sub-component would have
taken up too much time and would not have been
particularly useful, since this research is
deliberately designed to apply only to a
limited corpus in which most of the ambiguities
concern Prepositional Phrase attachment and
need not be resolved for translation from
English into French.
2.5. Source-Language dictionary 
For the various reasons explained above, the 
dictionary includes only complete forms. 
All variable words are characterised by
different syntactical features, some of which
concern their form while others do not.
All of these are treated by the analysis com- 
ponent. Semantic features could easily be 
added at a later stage. 
Words forming certain 'traditional' classes
may belong to various categories of the
prototype dictionary. This is notably the case
with cardinal adjectives, which are classified
both as determiners and as substantives.
At present, the only compounds that the
prototype dictionary accepts are locutions of
at most two consecutive words. Longer
locutions, compound verbs and other
discontinuous compounds, which are quite rare
in the corpus, will be treated as follows at a
later stage: all words liable to appear in
compounds will be tagged with a pointer to
this effect, to enable the preprocessing
sub-component to determine whether a compound
or a simple form is present in a given text.
Numbers were not introduced into the prototype
dictionary. The parser would accept them if a
routine were created that automatically
attributed the noun and determiner categories
to them.
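The dictionary behaviour described in this section can be sketched as follows. Complete forms are looked up directly, a two-word locution is preferred over its single words, the cardinal 'two' carries both determiner and noun entries, and numerals get those same two categories from a routine of the kind the prototype does not yet have. All entries, feature names and the `number_entries` routine are hypothetical.

```python
# Hypothetical full-form entries: each inflected form is listed separately,
# so no morphological analysis is needed.
SL_DICTIONARY = {
    "bit": [{"cat": "NOUN", "num": "sg"}],
    "bits": [{"cat": "NOUN", "num": "pl"}],
    "read": [{"cat": "VERB", "form": "inf"}, {"cat": "VERB", "form": "past"}],
    "two": [{"cat": "DET", "num": "pl"}, {"cat": "NOUN", "num": "pl"}],  # cardinal
    "as well": [{"cat": "ADV"}],  # a two-word locution
    "as": [{"cat": "PREP"}],
}


def number_entries(token):
    """Hypothetical routine: give a numeral both determiner and noun categories."""
    return [{"cat": "DET"}, {"cat": "NOUN"}]


def look_up(words):
    """Return (form, entries) pairs, preferring a two-word locution when one exists."""
    result, i = [], 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in SL_DICTIONARY:
            result.append((pair, SL_DICTIONARY[pair]))
            i += 2
            continue
        word = words[i]
        if word.isdigit():
            result.append((word, number_entries(word)))
        else:
            result.append((word, SL_DICTIONARY.get(word, [])))  # [] = unknown word
        i += 1
    return result
```

Limiting the look-ahead to one word matches the two-word limit on locutions; longer or discontinuous compounds would need the pointer mechanism the text describes.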
3. "Transfer" component 
The transfer component deals with the re- 
sults obtained by the analysis component. 
3.1. Structural transfer
By dealing with the structural transfer first,
one is notably saved from wasting time
translating forms that will subsequently be
dropped (such as 'will'), since the adaptation
to French tense is done along with the
structural transfer.
The structural transfer operates on the sen- 
tence as a whole, on various levels. It on- 
ly saves those results of the analysis that 
are pertinent for the generation. 
3.1.1. Sentence 
The various constituent elements of the clause
are rewritten so as to conform to the following
sequence:
(Passive) + (Negative) + Role + NP1 + Auxiliaries + Verb + (NP2) + (NP3) + PP
NP1 is the deep subject of the clause, NP2 is
the direct object (or the attribute, or even
nothing at all if the main verb is of the 'be'
type) and NP3 is the indirect object.
All passive clauses are put into the active
voice during the analysis and structural
transfer. It is the transformations applied
during generation into French that, where
necessary, restore the passive voice.
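A minimal sketch of this clause rewrite, assuming the analysis supplies the surface subject, object, auxiliaries and a passive flag. The parameter names, the 'DECL' role label and the use of 'DNP' for a missing agent are assumptions for illustration, and the auxiliaries are assumed to arrive without the passive 'be'. The Passive flag is kept at the front so that generation can restore the passive voice.

```python
def rewrite_clause(subject, auxiliaries, verb, obj=None, indirect=None,
                   agent=None, passive=False, negative=False, role="DECL", pps=()):
    """Order a clause as (Passive) + (Negative) + Role + NP1 + Auxiliaries
    + Verb + (NP2) + (NP3) + PP, putting a passive clause into the active voice.
    `auxiliaries` is assumed to exclude the passive 'be' (hypothetical convention)."""
    if passive:
        # Active deep structure: the agent (or a dummy NP when there is none)
        # becomes the deep subject NP1, and the surface subject becomes NP2.
        np1 = agent if agent is not None else "DNP"
        np2 = subject
    else:
        np1, np2 = subject, obj
    sequence = ["Passive"] if passive else []  # flag lets generation restore the passive
    if negative:
        sequence.append("Negative")
    sequence += [role, np1, *auxiliaries, verb]
    for optional in (np2, indirect):  # (NP2) + (NP3)
        if optional is not None:
            sequence.append(optional)
    sequence += list(pps)
    return sequence
```

For 'The entire field of booleans can be treated', `rewrite_clause("the entire field of booleans", ["can"], "treat", passive=True)` gives `["Passive", "DECL", "DNP", "can", "treat", "the entire field of booleans"]`.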
3.1.2. Noun Phrase 
Three rewrites are possible for the noun 
phrase : 
- Number + Pronoun 
- DNP ('dummy NP') 
- Number + ((Determiner) + Noun + (Adjective) 
+ (Noun) + (PP) + (S)) 
The rewrite elements are derived from vari- 
ous registers of the analysis result. 
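The three Noun Phrase rewrites can be pictured as a choice made from the analysis registers. In this sketch the register names are hypothetical, the default number 'sg' is an assumption, and the use of DNP for an absent NP (for instance the missing deep subject of an agentless passive) is an illustrative guess. Note that the slots follow the French order Noun + Adjective + (Noun), as in 'un article virtuel du champ' for 'a virtual field item'.

```python
def rewrite_np(registers):
    """Pick one of the three Noun Phrase rewrites from hypothetical analysis registers:
    Number + Pronoun | DNP | Number + ((Det) + Noun + (Adj) + (Noun) + (PP) + (S))."""
    if registers is None:  # no NP at all: use the dummy NP
        return ["DNP"]
    number = registers.get("number", "sg")
    if "pronoun" in registers:
        return [number, registers["pronoun"]]
    rewrite = [number]
    for slot in ("det", "noun", "adj", "noun_modifier"):  # French ordering of the slots
        if slot in registers:
            rewrite.append(registers[slot])
    return rewrite + registers.get("pps", []) + registers.get("clauses", [])
```

The words are still English at this point, since lexical transfer only runs after structural transfer.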
3.1.3. Verb Phrase 
By Verb Phrase is understood here the Auxi- 
liary together with the Main Verb. This in- 
volves 'Auxiliary' in its widest sense, that 
is comprising all that precedes the verb : 
tense (present, infinitive and/or imperfect),
modality and even person. It should be no- 
ted that only third person forms appear in 
the corpus studied. 
In its fullest form, the verb phrase is
rewritten as follows:
(Infinitive) + Present/Imperfect + 3rd p. + (Avoir/Etre + Past Participle) + (Modal) + (Avoir/Etre + Past Participle) + Verb
To arrive at this rewrite, many interacting
rules are brought into play, notably to deal
with the multiple feature categories and with
the treatment of 'be', 'dummy be' and
'dummy modal'.
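The sketch below maps an English auxiliary sequence onto the slots of this verb phrase rewrite and drops 'will' without translating it, as described for structural transfer. The modal mapping, the slot labels and the choice to leave the tense as Present when 'will' is dropped are all assumptions; the text says only that 'will' is dropped, not which French tense results. The Infinitive slot and the first perfect slot are not modelled.

```python
# Hypothetical English modal -> French modal mapping, for illustration only.
MODALS = {"can": "pouvoir", "may": "pouvoir", "must": "devoir"}


def rewrite_verb_phrase(auxiliaries, verb, imperfect=False):
    """Map English auxiliaries onto the slot order
    (Infinitive) + Present/Imperfect + 3rd p. + (Avoir/Etre + Past Participle)
    + (Modal) + (Avoir/Etre + Past Participle) + Verb.
    Only the perfect slot after the modal is produced, since an English modal
    cannot itself carry a perfect."""
    slots = ["Imperfect" if imperfect else "Present", "3rd p."]
    for aux in auxiliaries:
        if aux == "will":
            continue  # dropped during structural transfer, never translated
        if aux in MODALS:
            slots.append(("Modal", MODALS[aux]))
        elif aux in ("have", "has", "had"):
            slots.append("Avoir/Etre + Past Participle")
    slots.append(("Verb", verb))
    return slots
```

For 'may have allocated' this yields Present, 3rd p., the modal 'pouvoir', a perfect slot and the verb, which generation can realise as 'peut avoir alloué'.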
3.2. Transfer dictionary 
In English, as in other languages, a word may
belong to several grammatical categories ('all'
is at once adverb, determiner and pronoun) or,
indeed, the same form may carry several sets of
features ('read' has the features of infinitive,
present (except for the third person singular),
past and past participle). Moreover, one
English word may have several possible
translations in French. For these reasons, it
seemed convenient to create a transfer
dictionary situated between the source-language
and target-language dictionaries, in order to
avoid an excessive multiplication of
relationships and to facilitate the extension
of the system to other language pairs.
Unlike the English terms which are in the 
dictionary in a complete form, their French 
translations are presented in canonical form. 
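A minimal sketch of this three-level arrangement, assuming transfer entries keyed by a lemma and a category. All entries and the key format 'lemma/CATEGORY' are hypothetical. Every complete English form points to a single transfer key, so English inflection does not multiply the links to French, and another language pair would only need a different transfer table passed as `transfer`.

```python
# Source-language dictionary (hypothetical): complete English forms -> transfer keys.
SL_FORMS = {
    "read": ["read/VERB"], "reads": ["read/VERB"],
    "all": ["all/ADV", "all/DET", "all/PRON"],
    "field": ["field/NOUN"], "fields": ["field/NOUN"],
}

# Transfer dictionary for English-French (hypothetical): key -> French canonical forms.
EN_FR = {
    "read/VERB": ["lire"],
    "all/ADV": ["tout"], "all/DET": ["tout"], "all/PRON": ["tout"],
    "field/NOUN": ["champ", "zone"],
}


def candidates(form, category, transfer=EN_FR):
    """French canonical forms for an English complete form used in a given category."""
    keys = [k for k in SL_FORMS.get(form, []) if k.endswith("/" + category)]
    return [target for k in keys for target in transfer.get(k, [])]
```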
3.3. Lexical transfer 
Lexical transfer operates directly after the 
structural transfer. At the moment, it is 
always the first translation (when there are 
several possibilities) that is chosen. 
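The current selection rule can be sketched with the choice of translation kept as a replaceable function, so that a later strategy (such as a prompt to the operator or a style index) would only need to supply a different `choose`. The entries and names are hypothetical, and passing unknown keys through unchanged is an assumption of this sketch.

```python
# Hypothetical transfer entries: transfer key -> French canonical forms, in dictionary order.
TRANSFER = {
    "field/NOUN": ["champ", "zone"],
    "store/VERB": ["mémoriser", "stocker"],
}


def first_translation(key, options):
    """Current prototype behaviour: always take the first translation listed."""
    return options[0]


def lexical_transfer(keys, choose=first_translation):
    """Translate transfer keys one by one; `choose` is the pluggable selection strategy."""
    output = []
    for key in keys:
        options = TRANSFER.get(key)
        # Keys without an entry are passed through unchanged (an assumption).
        output.append(choose(key, options) if options else key)
    return output
```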
One could envisage adopting various means of
selecting the best translation, ranging from
consulting the human operator to using a style
index.
4. "Generation" component 
The generation or synthesis takes place in 
two stages : the syntactical generation is 
followed by the morphological generation. 
Both of these stages refer to data from the
target-language dictionary as well as from 
the common data pool. 
The generation in French is inspired by the
rules of Chomskyan generative and
transformational grammar, specifically as
presented in the work of C. Nique (Nique, 1978).
Most of the other grammatical theories
currently in vogue (Montague Grammar,
Generalized Phrase Structure Grammar, ...) make
wide use of semantics and thus require far more
powerful computer resources than those
available on micro-computers at present.
4.1. Target-Language dictionary 
In the target-language dictionary, the various
grammatical categories must be supplemented
with the features that allow for the agreement
of the canonical forms.
A common data pool is associated with this
dictionary. It enables the verbs to be
conjugated correctly (root table and
conjugation table). It also contains the
different forms of the determiners and their
conditions of usage.
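The root table and conjugation table might combine as follows. This is a minimal sketch restricted to third-person forms, since only these occur in the corpus. The table layouts, class names and entries are hypothetical.

```python
# Hypothetical fragments of the common data pool.
ROOT_TABLE = {  # canonical form -> (root, conjugation class)
    "mémoriser": ("mémoris", "er"),
    "occuper": ("occup", "er"),
    "lire": ("li", "lire"),
}
CONJUGATION_TABLE = {  # conjugation class -> (tense, number) -> third-person ending
    "er": {("present", "sg"): "e", ("present", "pl"): "ent",
           ("imperfect", "sg"): "ait", ("imperfect", "pl"): "aient"},
    "lire": {("present", "sg"): "t", ("present", "pl"): "sent",
             ("imperfect", "sg"): "sait", ("imperfect", "pl"): "saient"},
}


def conjugate(lemma, tense, number):
    """Third-person form of a French verb given in canonical (infinitive) form."""
    root, conj_class = ROOT_TABLE[lemma]
    return root + CONJUGATION_TABLE[conj_class][(tense, number)]
```

A single root per verb is a simplification: irregular verbs such as 'pouvoir' (peut / peuvent) would need several roots.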
4.2. Syntactical Generation 
The generation is carried out by means of 
transformations. Below are presented those 
transformations that have a fundamental role 
in the elaboration of the structure of the 
sentence in French and in the ordering of 
its terms. Others directly concern the mor- 
phology of the words, and are outlined brief- 
ly later on. 
In accordance with the theory of generative
and transformational grammar, transformations
apply in a fixed order in an ascending cycle,
that is to say from the inside outwards,
starting with the most subordinate clauses.
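The ascending cycle can be sketched as a recursive walk that processes every subordinate clause before its governing clause, applying the same ordered list of transformations in each cycle. The clause dictionaries and the two transformations are hypothetical simplifications: negation wraps the finite verb in 'ne ... pas', and subordination simply appends each processed subordinate clause after the main clause, which is only an assumption about placement.

```python
def apply_cycle(clause, transformations, trace):
    """Apply the transformations in a fixed order, innermost clause first."""
    for sub in clause.get("subordinates", []):
        apply_cycle(sub, transformations, trace)
    for transform in transformations:
        transform(clause)
        trace.append((clause["id"], transform.__name__))


def negation(clause):
    """Wrap the finite verb in 'ne ... pas' when the clause is flagged Negative."""
    if clause.get("negative"):
        i = clause["finite"]
        clause["words"][i:i + 1] = ["ne", clause["words"][i], "pas"]


def subordination(clause):
    """Append each already-processed subordinate clause after the main clause words."""
    for sub in clause.get("subordinates", []):
        clause["words"] += sub["words"]
```

With a negative relative clause 'qui peut être alloué' under 'chaque nom est un identifieur', the relative clause is negated first and then inserted, giving 'chaque nom est un identifieur qui ne peut pas être alloué'.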
Passive Transformation:
e.g.: The entire field of booleans can be treated (active deep structure) → Le champ entier de booléens peut être traité.
Transformation of Negation:
e.g.: Each name is an identifier which cannot be allocated (positive deep structure) → Chaque nom est un identifieur qui ne peut pas être alloué.
Transformation of Subordination, which correctly inserts the subordinate clauses:
e.g.: Each bit may be used to store a logical value → Chaque bit peut être employé pour mémoriser une valeur logique.
Auxiliary Transformation:
- if Avoir/Etre occurs in the rewrite of the verb phrase, the appropriate auxiliary is chosen according to the feature specified in the target-language dictionary.
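A minimal sketch of the auxiliary choice, assuming each target-language verb entry records the auxiliary it takes. The dictionary entries and the slot representation are hypothetical. Each Avoir/Etre placeholder takes the auxiliary required by the next modal or main verb in the rewritten verb phrase. The être required by the passive is not handled here.

```python
# Hypothetical target-language entries: the auxiliary each verb takes in compound tenses.
TL_VERBS = {
    "allouer": {"aux": "avoir"},
    "pouvoir": {"aux": "avoir"},
    "devenir": {"aux": "être"},
}


def choose_auxiliaries(slots, dictionary=TL_VERBS):
    """Replace each ('AvoirEtre', None) placeholder by the auxiliary required
    by the next modal or main verb in the rewritten verb phrase."""
    result = list(slots)
    for i, (kind, _) in enumerate(result):
        if kind == "AvoirEtre":
            verb = next(v for k, v in result[i + 1:] if k in ("Modal", "Verb"))
            result[i] = ("Aux", dictionary[verb]["aux"])
    return result
```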
Transformation of Adverb Movement:
e.g.: A virtual field item always occupies an integral number of 4-bit digits → Un article virtuel du champ occupe toujours un nombre entier de chiffres de quatre bits.
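In its simplest form, the movement in this example can be sketched as swapping an adverb that immediately precedes the finite verb with that verb. The tag names 'ADV' and 'VFIN' are hypothetical, and a real rule would need to handle compound tenses and other adverb classes.

```python
def move_adverb(words, tags):
    """Return a copy of `words` in which an adverb placed immediately before the
    finite verb is moved to just after it ('always occupies' -> 'occupe toujours').
    Tags are hypothetical: 'ADV' for adverbs, 'VFIN' for the finite verb."""
    words = list(words)
    for i in range(len(tags) - 1):
        if tags[i] == "ADV" and tags[i + 1] == "VFIN":
            words[i], words[i + 1] = words[i + 1], words[i]
            break  # only the first such adverb is moved in this sketch
    return words
```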
4.3. Morphological generation
The morphological generation is made up of the
following transformations: subject-verb
agreement, conjugation, noun qualifier (which
inserts 'de le' between a noun and its
complement), insertion of determiner, noun
agreement, determiner agreement, adjective
agreement, placement of adjective, elision and
contraction.
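The last two transformations can be sketched as follows. Elision is applied before contraction, so that 'de le identifieur' becomes 'de l'identifieur' rather than 'du identifieur', while the 'de le' inserted by the noun qualifier contracts to 'du' before a consonant. The word lists are hypothetical and incomplete. Treating only vowels as triggers for elision is an assumption: words beginning with a mute h are not handled.

```python
VOWELS = set("aeiouyàâäéèêëîïôöùûü")
ELIDABLE = {"le": "l'", "la": "l'", "de": "d'", "ne": "n'", "que": "qu'"}
CONTRACTIONS = {("de", "le"): "du", ("de", "les"): "des",
                ("à", "le"): "au", ("à", "les"): "aux"}


def elide_and_contract(words):
    """Apply elision, then contraction, and return the resulting French text."""
    elided = []
    for i, w in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else ""
        if w in ELIDABLE and nxt[:1].lower() in VOWELS:
            elided.append(ELIDABLE[w])
        else:
            elided.append(w)
    out, i = [], 0
    while i < len(elided):
        pair = tuple(elided[i:i + 2])
        if pair in CONTRACTIONS:
            out.append(CONTRACTIONS[pair])
            i += 2
        else:
            out.append(elided[i])
            i += 1
    # Attach elided forms such as "l'" to the following word.
    return " ".join(out).replace("' ", "'")
```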
5. Conclusion 
The results obtained over a relatively brief
period by a team of two researchers may be
considered encouraging, and they give grounds
for optimism about the future of machine
translation or machine-aided translation on
small systems.
References 
KING (M.), PERSCHKE (S.). - Eurotra. - Lugano, April 1984.

LEON (M.). - Development of English-Spanish Machine Translation. - Cranfield, 1984.

NIQUE (C.). - Initiation à la grammaire générative. - Paris, Colin, 1978. - 176 p.

NIQUE (C.). - Grammaire générative : hypothèses et argumentations. - Paris, Colin, 1978. - 207 p.

WINOGRAD (T.). - Language as a Cognitive Process, Syntax. - London, Addison-Wesley, 1983. - 640 p.
