TYPOLOGY STUDY OF FRENCH TECHNICAL TEXTS, 
WITH A VIEW TO DEVELOPING A MACHINE 
TRANSLATION SYSTEM 
B.ROUDAUD 
B'VITAL (SITE group) 
35 rue J. Chanrion - 38000 Grenoble - FRANCE 
tel : +33 76 51 04 96 
e-mail : B Roudaud@site-maisons-alfort.fr 
ABSTRACT 
Within the industrial context of the information 
society, technical translation represents a considerable 
commercial stake. In the light of this, machine 
translation is considered as being an application of 
paramount importance. It is for this reason that the 
activities of B'VITAL have always centered around the 
processing of technical texts. 
The following article gives an account of the 
various tasks carried oat over the last few years on 
corpus analysis. We have drawn conclusions as to the 
validity of the notion of text typologies, applied in 
particular to technical matter, with a view to developing 
a machine translation system. The study was conducted 
using a fair amount of French documents and has led us 
to observe in particular, that a same typology may be 
identified in texts originating from varying fields. 
INTRODUCTION 
The technical literature of a nuclear plant 
represents about 150 000 pages (figure quoted by EDF, 
France, in 1990). About 20 000 pages make up the 
maintenance literature of a an aircraft, of which 5 000 
are subject to revision every three months on average. 
In France, the cost per page for a translation 
varies from 250 to 400 francs for a translation from 
French to one of the other European languages 
(according to Bossard Consultants), which amounts to 
6 million francs for the maintenance literature of an 
aircraft. 
If a machine translation system is able to reduce 
considerably the time spent per page by translators, even 
if the result is not perfect, it will signify a gain with 
respect to delivery times and costs. 
To avoid problems concerning non standardised 
terminology, we have endeavoured to stick to the terms 
familiar to traditional grammar, even though they may 
not always be appropriate. 
I wiah to thank my colleagues, D. Baehut, O. Gamrat and 
M.C. Puerta, for their help. 
1 - DEFINITION OF THE 
CORPUS STUDY 
According to our method, based on GETA's 
works (Grenoble), if the MT system comes across a non 
handled phenomenon, the sentence is not rejected and 
translation is carried through using those parts of the 
sentence which have been successfully analysed (CL \[1\] 
& \[2\]). Within the framework of an industrial 
application in which the ratio of development costs to 
gain in quality is preponderant, rare phenomena may 
thus be handled in a very simplified way, or even not 
handled at all. 
Midway between controlled syntax systems and 
systems 'which translate everything', our approach to 
MT favours the development of systems adapted, a 
posteriori, to the texts to be translated. This theory goes 
by the assumption that it is possible to define and then 
recognise the typology of texts, specifying in particular 
their form, the linguistic phenomena present or absent 
and the general vocabulary used (excluding terminology). 
The study, which was strongly influenced by the 
MT application, began in 1984, during the French Projet 
National de Traduction Automatique (PNTAO), with an 
intensive research phase. It is still currently in progress 
although only sporadically, with the study of new texts. 
The first study aimed at proving that it was possible to 
define a typology for the given corpus. Its result was the 
definition of an initial typology (less restrictive than the 
METEO typology for instance, see \[6\]), which was 
further refined (though not radically changed) during the 
years which followed. The definition of file typology 
retained consisted of a list of the phenomena handled, in 
other words, a list of the static grammar charts (Cf. \[2\] 
& \[3\]) which are part of the linguistic specifications of 
the system. We will not give a full formal explanation 
of the defined typology, however, we will view it from a 
more informal point of view. 
The whole corpus was made up of documents in 
French provided by different fh'ms (Sonovision, SITE, 
A6rospatiale, EDF, Rh6ne Poulenc, Syseca...). It 
consists mainly of aeronautical documents (maintenance 
documents, job cards...), data processing texts (extracts 
from reference documents or from user guides, or 
ACRES DE COLING-92, NANTES, 23-28 AO~T 1992 I 2 8 4 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
software error messages), minutes of meetings, extracts 
from work schedules or from technico-commercial 
documents. 
The aeronautical documents which make up the 
greater part of the corpus (about 50%), maintenance 
manuals and job cards, were initially chosen as the basic 
corpus for the PN'fAO for ,several reasons : 
they were available ill machine readable form, at 
Souovisiou (which later became the SITE group), 
they corresponded to a consklerable commercial need, 
they concerned a sector of strong export activity, 
they were representative of a large number of 
technologies (mechanics, data processing, fluid 
mechanics, strength of materials, etc.). 
It is for these reasons that they coustitute even 
today, an important source of enrichment for the corpus. 
The initial corpus was made up of about 
400 pages taken from maintenance and service guides of 
Marcel Dassault aircraft. Part of the texts, provided with 
the English translation, were made up of job cards. I~tch 
job card indicates how to go about a specific servicing 
operation. They contain general recommea&ltions and 
precise instructions : 
NOTA : 
L'avion dolt ~tre arnarr~ Iorsque la vltesse 
du vent risque de d~passer bO noeuds. 
Pour les vents tres violents, i} est 
pr#f#rable de mettre I'avion sous abrL 
A - Preparation du travail 
a. Proc~dcr au campement de I'avion (voir 
02.1 I). 
B - Execution de l'operatlon 
0 I Deposer" les portes 1905, 1907, 4106 
02 Visser une attache d'amarrage sur chaque 
rerrure 
The other texts were ~rvice notes describing an 
apparatus or a mechanism, how it functions and the 
servicing procedures to be followed. The following is an 
illustration of this second type of text : 
section I0 ~ UTILISA\] ION 
1 I - Description - Fonctionnement 
I ;.I - G~n~ralltes 
Le raccord auto-obturable d'un {)quipement 
mtacanlque a pour but d'assurer le 
raccordement raplde d'une gent)ration 
l~ydl'aullque ~un banc de test et permet la 
mlse en fonctlon des diff6rents organes de 
celte g~n~ratlon hydraulique saris perte de 
I iquide e\[ sans entree d'air. 
Le raccord auto-obturable sert egalement 
au rempllssage et au d~gazage des circuits, 
The study enabled us to pinpoint tile 
characteristic of these two types of texts, It turned out 
that their typology was identical. The main difference is 
at form level : the first type of text contains enumerated 
instructions, generally described using short phrases 
without fnll stops and which do not exist in the second 
type of text. 
'the purpose for extending the study was to 
iuvestigate the possible existing differences amongst 
various types of documents in various fields. It turned 
out that most of the literature iulended for technicians 
COlXespouds to the original typology. 
Amongst the other corpora studied (about 
800 pages form various fields, e.g. electrical installation 
manuals, computer manuals and advetlisiug, software 
manuals and advertising, aeronautical texts, description 
of composite materials, maintenance documentation for 
mechanical installations, etc.), we have classified the 
texts into two groups : 
texts pertinent to the initial corpus, in other 
words those texts whose typology was identical 
or closely re~sembled that of the first corpus, 
non-pertinent texts containing new phenomena. 
311is second type of text will probably require a 
sub-categorisatiou into other typologies (a task 
not as yet carried out). 
Typology is field independent. We have received 
extracts tiom aeronautical documents, provided by 
Adrospatiale. Some of these texts tall into the non- 
pertinent category, while others fall into rite pertinent 
category. Furthermore, it was found that texts 
originating h'om different fields (data processing or 
electrical engineering, for example) fell into the 
typology. 
Amongst the non-pertinent texts we found 
mainly texts which require rewriting rather than 
Ixanslatinn, lot instance user manuals (at least those 
intended for non specialists), scientific articles, 
advertising and legal texts. Such texts are obviously less 
adapted to MT. We al~ found minutes of meetings and 
client service reports, whose syntax is often very 
f,.mciful. 
As far as quantity goes, the volume of technical 
taatter is much greater than that of advertising and legal 
matter (Cf. Van Dijk consultancy figures), Maintenance 
and service guides, intended for specialists, and referetlce 
guides represent a much greater volume than that of user 
guides, intended for non-specialists (which are less 
hoxnogenoas typologically speaking). 
2 - RESULT OF THE CORPUS 
STUDY 
The study laid rile emphasis on the problems 
directly related to tn, mslation, in particular with respect 
to accuracy, 
The sub-sections wlfich follow illustrate tile 
defined typology mid instance the linguistic phenomena 
not encountered in the so-called pertinent texts studied. 
Several points must be underlined : 
ACRES DI! COLING-92, NAN'lJ~S, 23-28 AOl~q' 1992 l 2 8 5 PREC. OF COL1NG-92, NANrES, AUG. 23-28, 1992 
1- any phenomenon classed as being absent may 
nevertheless be marginally present in a text ; 
2- the corpus study showed that few of the phenomena 
absent from the initial study were later added. 
2.1. FREQUENT PHENOMENA 
It is not possible to list fully all the phenomena 
encountered. We will however endeavour to give the 
main features of the typology of the texts studied, giving 
the most unusual or significant examples (from the 
pertinent corpus). The following hence plays more of an 
illustrative role than a descriptive role. 
Punctuation 
1- Commas 
Generally speaking, puntuation is used somewhat 
irregularly. There are few commas, and they do not 
always appear where they would normally be 
expected. Consequently, it appears pointless to use 
them for linguistic purposes, for example, in the 
determining of the limit of a coordination. 
2 - Full stops 
As previously mentioned, the job cards come in the 
form of enumerated operations. Each operation is 
described by one or more small paragraphs 
containing sentences, As a rule, there are no full 
stops indicating the end of a paragraph : 
01 Ouvrir los verrlTres 
NOTA : 
L'lnspection extdrleure de I'avion 
s'effectue par le pilot@ accompagn6 du 
mecanicien. Ell@ @st exTcut~e darts le sens 
des aiguilles d'une montre, en partant de la 
gauche du poste avant 
02 Contr61er que... 
Enumerations 
Every possible type of enumexation can be found. In 
each case, processing is delicate : 
S'assurer que : 
l'assiette de I'avion reste volsine de la 
ligne de vol 
- la manoeuvre ne pr~sente pas de danger 
NOTA : 
...le mTcanicien doit: 
s'assurer de l'absence de fultes d'hulle... 
inspecter le rev~tement : sur les @l~ments 
en stratifi~ list,s cl-apres, s'assurer de 
I'dtat de la peinture et de 1'absence 
d'ental lie ou blessure 
- karmans voilure-fuselage intrados et 
arMTre 
- saumons voilure 
Infinitive clauses 
They are veay frequent in the texts, especially as 
operations to be carried out are described using the 
infinitive form (and not using the imperative form : 
faites ceci, or personal form : vous faites ceci ). They 
appear as the main clause of a sentence, as an object or 
as an adverbial : 
Apres avow solgneusement ass@ch6 les 
partie externes du raccord, soumettre 
celul-cl a une pression interne de 0,6 bar 
pendant 5 minutes... 
...it est prdfTrable de mettre l'avion sous 
abris. 
Ne pas tendre exag~rement les cordes. 
Aglr sur la valve pour la faire reculer. 
Infimtive chuses introduced by file preposition 
d, with a passive adjectival value are frequently 
encountered. They bear a certain modality : 
fixer un obturateur sur la buse d'entree du 
circuit d'air iLEe, LFI92~ (to be cooled) 
• .. Ileffet i~t_gJ2Le,13.~ (to be obtained) 
Les coupures de frettes ne sont pas ~. 
Drendre en consideration 
Finally, the various infinitive clauses can be 
inter-coordinated or coordinated with different types of 
clauses (object e* adverbial clauses). Coordinations (apart 
from enumerations) are generally limited to two or three 
clauses. 
Conjugated verbal clauses 
We class a conjugated verbal clause as being any 
clause governed by a conjugated verb. The clauses 
encountered may appear either as a main (or independen0 
clause or a subordinate clause (object or adverbial). All 
the common types of clauses can be found (personal, 
impersonal, active, passive...). Coordinations 
(enumerations aside) are generally limited to two or three 
clauses. Examples of characteristic sentences : 
Pour les vents tres violents, il est 
prdferable de mettre l'avion sous abri. 
S1 le liqulde polluant s'inflltre dans la zone 
sltu@e entre la jante de la roue et le talon, 
il faut imperativement d@monter le pneu 
pour le nettoyer. 
Amongst the remarkable clauses, we 
encountered several examples of inverted subjects (which 
could pass as a stylistic turn unlikely to be found in a 
technical tex0. For example : 
Au cadre 19 se trouve un bo~tier etanche... 
Relative clauses 
Very few are encountered in the texts (on 
average 1 relative clause every 2 pages), most of them 
are introduced by the relative pronoun qta (for more than 
95% of the texts studied). Here are a few examples : 
II est ba~d sur la conductibilit6 
thermique.., et sur le mode de circulation 
adopt@ qul permettent un 6change... entre 
les deux circuits d'air qui la traverse. 
AC'lXS DE COLING-92. NANTES. 23-28 AOt~q' 1992 1 2 8 6 PROC. OF COLING-92. NANTES. AUG. 23-28, 1992 
,.. elles poss~dent chacune un anneau 
travers lequel passe la sangle de rappel,.. 
Noun phrases 
We class noun phrases as being all groups with 
a nominal value whether or not they are introduced by a 
proposition. It is likely that all the possible types of 
noun phrases exist in the corpus. We observed with 
interest that in some cases, the groups were quite often 
barely" grmnmatically acceptable. The principle of 
juxtaposing nouns in French is a rare phenomena. All 
the same, juxtaposition is quite a generalised pratice in 
the texts studied (this is probably the case for a lot of 
technical texts). The following juxtapositions are 
normally accepted : 
I'antenne I~ACAN 
I'ensemble raccord-bouchon 
un syst~me de type ba~onnette 
Whereas the following are less acceptable 
grammatically speaking : 
I'ensemble de hissage avion 
Iongueur part i e \] Isse 
I'axe avlon 
The following complex examples inuslrate the 
degree of complexity noun phrases may attain (the part 
of the sentence which is not the noun phrase is 
bracketed) : 
(11 proc~de) de la technique des 
refroidisseurs 9 surfaces secondalres, 
appel~,s retroidisseurs a lames et 
intercalaires ou refroidisseurs compacts 
qui convlent particuli~,rement bten au gaz 
ayant un mauvais coefficient d'echange 
thermique. 
Verb nominalisation (verbiaction noun 
derivation) is also frequently used : 
... effectuer une v~riflcation du tarage... 
(rather than v#rifier le forage) 
,., proc~der ~ la d~pose des panneaux... 
Faire une verification du reglage... 
These structures are built using a small number 
of French verbs (loire, effectuer,...), and they are used 
with all nominalisations of verbs of action. 
Idiomatic expressions 
We class idiomatic expressions as being 
variable or non-variable expressions, which require 
identification within a text in order to ensure correct 
translation. Idiomatic expressions exist in all the 
syntactic categories. Nominal idiomatic expressions are 
probably the most frequent, particularly in specialised 
terminology (more than 80 % of the temis are nominal 
idiomatic expressions, in aeronaalical termiuology). 
Without going into further detail, it can be said 
that they are all present in the corpus. However, only a 
relatively small number of verbal expressions can be 
found (in comparison with the potential wealth of the 
French language) and there has been no example of a 
transformation of these expressions (for example les 
mesures qui doivent ~tre prises) in the corpus. They are 
generally made up using a limited number of verbs 
(prendre, tenir, mettre...) : 
Malntenlr en place l'embase... 
Cette pression de rempllssage tlent compte 
d'une perte de charge... 
... mettre les commandes du r~gulateur 
oxyg~)ne sur ON.,, 
We must bear in mind that expressions of the 
effectuer-plus-a-nominalised-verb type do not belong to 
the specific idiomatic expression class, and are considered 
as being a commonly accepted Wansformation. 
2.2. RARE OR NONEXISTENT 
PHENOMENA 
The following is a list of significant examples 
of linguistic phenomena rarely or never encountered, it is 
not an exhaustive list. 
Interrogative clauses 
No direct nor indirect interrogatives were encountered 
in any of the texts. 
Imperative clauses 
No imperatives (imperative mood) were encountered 
in the pertinent corpus, they are replaced by 
infinitive clauses (such is the case for most service 
or maintenance notes in French) : 
proc~der au campement de I'avlon 
ne pas tendre exag~rement les cordes 
Amongst the non-pertinent texts, messages for users 
of a data processing system, user guides or training 
guides may contain imperatives or (sometimes) 
interrogatives. 
Direct speech 
No examples of direct speech were encotmtered. 
Comparative phrases with a congmrative reference 
Comparative phrases containing a comparative 
reference were extremely rare in the corpus studied. 
Only one example was encountered : 
La b~che ~ eau est plus volumineuse que la 
cuve d'{)vaporation, composee d'~lement en 
alliage le, ger, soudes. 
Concessive clauses 
No concessive clauses were encountered. 
Clausal subjects 
No examples of personal clauses with a syntactic 
role of subject were encountered. It is hence possible 
to find : 
...11 est recommandL ~ de ne pas Inverser la 
palette... 
but not : 
Inverser la palette n'est pas recommand6... 
Human personal pronouns 
ACTES DE COLING-92, NANTES. 23-28 AOt~q" 1992 1 2 8 7 PROC. OF COL1NG-92, NAN'I\]~S. AUG. 23-28, 1992 
Although not totally absent, personal pronouns are 
rare in technical texts (one per page on average). 
None of the personal pronouns encountered referred 
to a human (in fact, humans are rarely referred to in 
the texts) in the pextinent corpus. 
This information meant that for the French-English 
system, personal pronouns would always be "it" (or 
"they") as it is not necessary to search for the 
refcmmce. 
Rhetorical figures 
Rhetorical figures are rare and correspond to 
established usages. 
Only one metaphor was encountered in the texts : 
.. les commandes chemlnent sur le cot~ droll 
sous le plancher passager.. 
The problem is solved "at dictionary level, given that 
a cable can cheminer (make its way). 
The metonyms encountered appear to be accepted 
metonyms which are transposable into the target 
language. Thus, in the above example, the term 
commandes which represents the cable or cables 
which propagate the controls, is literally translated 
into English, without shocking the technicians of 
the field. 
Anaphora are rare, and as with personal pronouns, if 
encountered could be handled in a simplified way. 
Finally, with regard to the aspectual 
phenomena, in French, syntactically speaking, there is 
very little aspectual indication. English, which was our 
target language very often uses the progressive form. We 
therefore paid particular attention to the reconstitution of 
the progressive form. After having carried out a 
comparative study of the texts, we were able to draw the 
conclusion that the progressive form was virtually 
nonexisent in the translated texts. 
5 - CONCLUSION 
The corpus study has enabled us to draw several 
conclusions. Firstly, it has enabled the definition of a 
text typology which is pertinent to a considerable 
volume of texts of different origins and different 
technical fields (although this excludes in particular 
computer user guides and documents for the general 
public). 
The definition of this typology has enabled us 
to draw up accurate linguistic specifications, simplifying 
and sometimes ignoring the handling of certain 
linguistic phenomena. The specifications were then used 
to develop an FAT system (based on GETA's system, 
ARIANE), whose grammars are less complex, and hence 
easier to maintain. The quality of translation, tested on a 
important number of aeronautical texts (maintenance 
guides) and on several specimens of pertinent texts, have 
proved the validity of this approach : revision times 
varied from 20 to 30 minutes per page (average 
translation times in SITE is about 1 page per hour). 
The study of texts of different typologies has 
enabled as to pinpoint the limits of a minimal language, 
used in all types of texts, and hence the indispensable 
kernel of any new typology. 
From an economic point of view, the texts 
targeted by the typology described here represent a 
considerable amount of the documents to be translated. 
BIBLIOGRAPHY 
\[1\] BACHUT, D., & al., "Industrialisation d'un 
syst~me de TAO fran~ais-anglais pour la 
documentation technique", G6nie Linguistique 91, 
Versailles, 1991. 
\[2\] BOITET, Ch., & al., "ARIANE-78: an 
integrated environment for automated translation 
and human revision", COLING~82, Prague, 1982. 
\[3\] BorI'ET, Ch., "The French National Mr-Project: 
Technical organization and translation results of 
CALLIOPE-AERO", IBM Conf. on Translqtion 
Mechanization, Copenhagtm, 1986. 
\[4\] BOITET, Ch., "Current Machine Translation 
systems developed with GETA's methodology and 
software tools", ASLIB, London, 1986. 
\[5\] BOITET, Ch., "Current state and future outlook 
of the research at GETA", MT Summit, Hakone, 
1987. 
\[6\] CHANDIOUX, J., "METEO: un syst~me 
op6rationoel pour la traduction des bulletins 
mtttorologiques destints au grand public", Meta 
21, 1976. 
\[7\] CHAPPUY, S., "Formalisation de la description 
des niveaux d'interprttation des langues 
naturelles", Th~se 3e cycle informatique, 
Grenoble, 1983. 
\[8\] HUTCHINS, W.J., "MACHINE 
TRANSLATION: past, present, future", 
Chichester, Ellis Horwood series in Computer 
and their applications, 1986. 
K1TTREDGE, R., & LEHRBERGER, J., 
"Sublanguages : study of language in restricted 
semantic domains", Berlin, de Gruyter, 1982. 
VAUQUOIS, B., & CHAPPUY, S., "Static 
grammars : a formalism for the description of 
linguistic models", International Conference on 
Theoretical and Methodological Issues in Machine 
Translation of Natural Language, Colgate 
University, 1985. 
