A Study of some Lexical Differences between French and 
English Instructions in a Multilingual Generation Framework* 
Farid Cerbah 
Dassault Aviation 
DGT/DTN/EL -- 78. quai Marcel Dassault - cedex 300 
92552 Saint-Cloud - FRANCE 
e-mail: cerbah@dassault-avion.fr -- Fax: 33 (1) 47-11-52-83 
Abstract 
This paper describes ongoing research on the 
lexicalisation problem in a multilingual gener- 
ation framework. We will focus in particular on 
two major types of verbal differences observed 
in a corpus of bilingual (French - English) pro- 
cedural texts extracted from aircraft mainte- 
nance manuals. To deal with these two types 
of differences, we propose lexicalisation mech- 
anisms, which proceed from the same semantic 
representation for both French and English re- 
alisations. We will however discuss at the end 
of the paper other types of lexical differences 
which may require language-specific inputs. 
keywords: Multilingual generation, lexical 
choice, controlled languages. 
1 Introduction 
Technical documentation appears as a promising
application area for text generation. Several
works (\[18, 17, 6, 12, 7\] 1) demonstrate
that NLG techniques may contribute in the future
to make technical documentation more reliable
and maintainable. Many of these contributions
are concerned with multilingual generation,
which is often presented as an alternative
to Machine Translation. The multilingual
generation approach stipulates that technical
documents, such as maintenance manuals,
can be generated automatically in several
languages from knowledge bases used in design
processes or constructed for the purpose of
automatic documentation production.
* This paper partly covers a work made by the
author at Dassault Aviation within a Technical
cooperation between Dassault Aviation and British
Aerospace - Military Aircraft Division. The
University of Edinburgh was involved in this
project as a sub-contractor of British Aerospace.
1 This list is far from being exhaustive.
GhostWriter is a bilingual generation system 
under development at Dassault Aviation and 
British Aerospace. Our objective in this 
project is to show how French and English 
maintenance procedures can be generated from 
an abstract representation of underlying action 
plans expressed in a formalism inspired by AI 
planning models. The role of the text gener- 
ator is to propose bilingual drafts of procedu- 
ral texts intended to be integrated in mainte- 
nance manuals, and to perform rephrasing op- 
erations which may be requested by the techni- 
cal author, for example grouping maintenance 
instructions at surface level or changing the 
specificity level of an instruction. 
The design of a multilingual generation system, 
needless to say, requires a precise analysis of the 
linguistic means used by each language to ex- 
press the same conceptual content. The aim of 
this paper is to describe the main verbal differ- 
ences observed in a bilingual corpus of proce- 
dural texts and to analyse their impacts on the 
lexicalisation mechanisms of the sentence generation
system GLOSE \[4\] used in GhostWriter.
The structure of this paper is as follows. I give
in section 2 an overview of GLOSE. Then, in
section 3, I briefly discuss the corpus analysis
and its role in the design of the multilingual
generation system. Sections 4 and 5 focus
on specific types of lexical differences and
the related lexicalisation mechanisms. Finally,
the conclusion will describe some lexical diver- 
gences which may require the introduction of 
language-specific semantic representations. 
2 The sentence generator 
Our sentence realiser GLOSE is based on 
Meaning-Text Theory (MTT) \[14\]. This linguis- 
tic theory offers many potentialities for mul- 
tilingual applications. In computational lin- 
guistics, it has been primarily used as a the- 
oretical basis for language generation models 
(e.g. \[2, 1, 16\]). Recently, some works in 
the fields of machine translation and computational
lexicography (e.g. \[8\], \[9\]) have taken advantage
of the lexicographic descriptive concepts offered
by MTT, in particular the well-known notion
of lexical function. In accordance with the
stratified framework of MTT, the target
representation of the lexicalisation process of GLOSE
is a Deep Syntactic representation -- mainly a
dependency tree, whose nodes are labeled with
full lexemes and lexical functions. The relations
between nodes represent deep syntactic
relations which are defined as abstractions over
superficial syntactic relations. The dependency
tree is enriched with communicative bipartitions
such as Theme/Rheme and Given/New.
We will ignore these communicative constraints 
in this paper because they are of minor impor- 
tance for the linguistic phenomena considered 
here. Lexical functions are used to represent 
syntactico-semantic relations between lexemes, 
such as synonymy, hyperonymy, and various 
types of collocational relations.
GLOSE is composed of two MT-models 2, one for 
each of the two languages considered in our do- 
main. It should be mentioned that only the 
grammatical realisation 3 component of GLOSE 
can be considered as an implementation of
"pure" MT-models, since we do not use at
the lexicalisation phase MTT-style semantic net- 
works which represent in this theory a linguisti- 
cally motivated semantic level, independent of 
the conceptual level. The integration of such 
semantic representations in a multilingual en- 
vironment raises several theoretical and practi- 
cal problems which will be the object of future 
investigations. We should note that these problems
are studied by several NLG researchers (e.g.
\[10, 11, 13\]). At present, we consider the
lexicalisation problem as a mapping process from
conceptual representations to French and English
lexemes. This process relies on concept-lexeme
mapping structures, integrated in the
lexicon, and which represent elementary
transitions from conceptual structures to lexemes.
2 A Meaning-Text model consists of the grammar
and the lexicon of a particular language.
3 We mean by grammatical realisation the following
(main) linguistic operations: (1) transition from deep
syntactic representation to surface syntactic
representation, (2) linearisation of the surface syntactic
representation and (3) surface morphology.
3 The contrastive analysis 
The corpus is composed of about thirty bilingual
pairs of extended procedural texts extracted
from aircraft maintenance manuals.
Our contrastive analysis concentrates on verbal 
expressions. Verbal differences between French
and English instructions can be classified along
three interrelated dimensions: (1) lexical --
French and English versions diverge because of
differences in the lexical resources available in
the two languages; (2) syntactic -- equivalent
verbs exist but the two versions cannot rely
on similar syntactic constructions; and (3)
stylistic -- lexically and syntactically equivalent
versions may be obtained but one of them
would be stylistically incorrect.
We should stress that, when designing the lex- 
icalisation component of a multilingual gen- 
eration system, one should be careful in de- 
ciding how much importance should be given 
to such a contrastive analysis. In the corpus, 
bilingual sentences expressing the same content 
may differ significantly, even though closely re- 
lated and acceptable versions can be obtained. 
Hence, in such cases, it is difficult to know if 
the author(s) had good reasons to make the 
English and French versions so different and 
if the differences should be respected in the 
automatic generation process. For aeronautic 
maintenance procedures, controlled languages 
-- in particular AECMA/AIA Simplified English 
and GIFAS Rationalised French -- provide useful
guidance, which helps to identify the relevant
differences for multilingual generation.
The lexical differences reported in the next sec- 
tions will be systematically evaluated from a 
controlled language perspective. This does not 
mean that controlled languages should be con- 
sidered as "absolute" references. We will see 
that the writing rules defining these languages 
are sometimes too general. 
4 Operator verbs 
Our corpus analysis reveals that a precise account
of operator verbs is required. This lexical
class comprises semantically poor items like do,
carry out in English and effectuer, procéder in
French, which are combined with predicative
nouns to form complex predicates. For example,
in sentence (1F), the operator verb procéder
takes as its direct object the predicative noun
remplissage which, in some way, denotes the
action to be performed:
(1F) Procéder au remplissage du réservoir
hydraulique.
(Lit. 'Proceed with the filling of the hydraulic
reservoir.')
Operator verb constructions have already been 
studied from a machine translation perspec- 
tive \[5\]. Such constructions raise an interest- 
ing problem for MT because they cannot be 
translated in a purely compositional manner. 
For example, a compositional English translation
of the sentence "John a posé une question
à Mary" would lead to the incorrect sentence
"John put a question to Mary", whereas the
correct (or the most closely related) translation
would be "John asked Mary a question".
To make the appropriate translation, an MT 
system should be able to identify in the initial 
sentence the semi-idiomatic expression poser 
une question and consequently build a sentence 
based on the equivalent English expression ask 
a question. Besides, the equivalent expression 
in the target language does not always exist,
which means that even more complex
correspondences should be found. The literal
translation associated with sentence (1F) illustrates
this point. We can hardly get an acceptable 
English translation if we want to preserve the 
structure of the French instruction. The En- 
glish equivalent of (1F) found in the corpus is 
based on the verb fill which takes as direct ob- 
ject the translation of the argument of the pred- 
icative noun remplissage in (1F): 
(1E) Fill the hydraulic reservoir. 
French and English instructions often diverge 
on this aspect. Operator verbs are exceedingly 
common in the French versions. We have found 
many pairs of bilingual instructions where the 
French instruction is based on an operator verb 
construction and the English instruction on a 
simple verb. Here are some excerpts which il- 
lustrate this regularity: 
(2E) Bleed suction lines. 
(2F) Effectuer la purge du circuit d'aspiration.
(Lit. 'Carry out the bleeding of suction lines.') 
(3E) Change the hydraulic fluid. 
(3F) Effectuer le renouvellement du liquide 
hydraulique. 
(Lit. 'Carry out the renewal of hydraulic liquid.') 
(4E) Carefully clean the filter body. 
(4F) Effectuer un nettoyage soigné du corps
du filtre. 
(Lit. 'Carry out a careful cleaning of the filter 
body.') 
It is important to note that, in many cases, 
these French instructions can be paraphrased 
by sentences based on simple verbs. For exam- 
ple, sentence (2F) can be paraphrased by the 
sentence based on the verb purger, directly re- 
lated to the predicative noun used in (2F): 
(2F') Purger le circuit d'aspiration. 
((2F') is the closest translation of the English 
version (2E)) 
This remark holds for all the examples given 
above. The choice of operator verbs is often
a consequence of technical writers' stylistic
preferences. However, as shown by the literal 
translations, stylistically inadequate sentences 
would result if this preference were equally ap- 
plied for English. 
Simplified English and Rationalised French
recommend restricting the use of operator verbs,
assuming that verbs that directly show the actions
make maintenance instructions clearer. However,
operator verbs cannot always be avoided,
even in English. Consider the following pair:
(5E) Gain access to rear compartment. 
(5F) Accéder à la soute arrière.
We can hardly find an acceptable paraphrase of 
(5E) built on a simple verb. We will also show 
later that sometimes operator verbs cannot be
avoided when some attributes of the action to 
be performed should be conveyed explicitly. 
[Figure 1 (diagram not reproduced): dotted arrows link the input
representation of the fill action to three deep syntactic trees,
rooted in FILL(V), REMPLIR(V) and PROCEDER (governing REMPLISSAGE),
which yield respectively:
(1E) Fill the hydraulic reservoir.
(1F') Remplir le réservoir hydraulique.
(1F) Procéder au remplissage du réservoir hydraulique.]
Figure 1: An illustration of operator verb/simple verb selections.
4.1 Operator verb constructions in
the lexicalisation process
The sentence generator should be able to gen- 
erate multilingual pairs of instructions similar 
to the excerpts (2), (3) and (4), by selecting 
an operator verb construction for one element 
of the pair and a 'simple verb construction' 
for the other element. For this kind of dif- 
ferences, the French and English lexicalisations 
rely on the same basic mechanisms. However, 
the way these basic mechanisms are combined 
is language-specific. 
Let us look more closely at the pair (1) 4 and at 
the lexicalisation process required to produce 
such sentences. Surface realisation starts with 
the following input representation: 
SemInput = action-token-1
Illoc-value = Imperative
Domain-predicate = fill
Agent = object-token-2
Domain-object = operator-1
Referential-status = specific
Patient = object-token-3
Domain-object = hydr-reservoir-4
Referential-status = specific
This structure represents an imperative illocu- 
tionary act. Its propositional content is an ac- 
tion of type fill which has two arguments 
Agent and Patient. Figure 1 illustrates
potential correspondences between this input
representation and the deep syntactic
representations required to derive sentences (1E),
(1F'), and (1F) after grammatical realisation.
The dotted arrows indicate the possible lexical
mappings of the conceptual predicate fill.
The English realisation and the first French
option (1F') rely on a simple correspondence
between the predicate fill and the corresponding
verbs (fill and remplir). By contrast, the second
French option is based on a complex
correspondence between the predicate fill and a
multi-lexemic structure procéder → remplissage.
4 (1E) Fill the hydraulic reservoir.
(1F) Procéder au remplissage du réservoir
hydraulique.
To deal with this lexical phenomenon, two
lexicalisation rules are involved. These rules may
roughly be described as follows. Given the
input representation 5:
SemInput = action-token
Illoc-value = Imperative
Domain-predicate = P
Agent = x1
Patient = x2
Role_n = xn
5 For the sake of clarity, we consider that the
illocutionary value is always imperative since we
strictly focus in this paper on the instructional
parts of the procedures. This illocutionary value
does not affect the lexicalisation of the
proposition, i.e. the construction of the deep
syntactic tree. However, it has an effect on
grammatical realisation, such as erasing the subject
during the transition to the surface syntactic level.
r1: Simple Verb Construction
1. Look in the concept-lexeme mapping
structures for a correspondence P → V.
2. Lexicalise the arguments x1, ..., xn and
link the resulting lexemic structures to V.
r2: Operator Verb Construction
1. Look for a mapping structure P → N.
2. Look in the lexical entry of N for a verb V
such that V = Oper1(N).
3. Lexicalise x1 and link the resulting lexemic
structure to V by means of a deep syntactic
relation I.
4. Link N to V by means of a relation II.
5. Lexicalise the remaining arguments
x2, ..., xn and link the resulting lexemic
structures to N.
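The two rules can be sketched in Python as follows. This is only an illustrative sketch under several assumptions: the dictionary-based lexicon `LEXICON_FR`, the `Node` class and the function names are hypothetical and are not GLOSE's actual data structures; argument lexicalisation is reduced to a table lookup; and the deep syntactic relation of each argument is taken from its position, whereas in the lexicon it would be specified in the lexical entry.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a deep syntactic tree: a lexeme and its labelled dependents."""
    lexeme: str
    deps: list = field(default_factory=list)  # list of (relation, Node) pairs

    def link(self, relation, child):
        self.deps.append((relation, child))


# Hypothetical French lexicon fragment (illustrative entries only).
LEXICON_FR = {
    "verbs": {"fill": "REMPLIR"},           # mapping structures P -> V
    "nouns": {"fill": "REMPLISSAGE"},       # mapping structures P -> N
    "oper1": {"REMPLISSAGE": "PROCEDER"},   # Oper1(N) = V
    "objects": {"operator-1": "MECANICIEN", "hydr-reservoir-4": "RESERVOIR"},
}

RELATIONS = ["I", "II", "III", "IV"]  # assumption: relation given by position


def lexicalise_arg(arg, lexicon):
    # Placeholder for the construction of referring expressions.
    return Node(lexicon["objects"][arg])


def r1_simple_verb(pred, args, lexicon):
    """r1: map P to a verb V and attach all arguments to V."""
    verb = lexicon["verbs"].get(pred)
    if verb is None:
        return None  # the rule fails: no correspondence P -> V
    root = Node(verb)
    for relation, arg in zip(RELATIONS, args):
        root.link(relation, lexicalise_arg(arg, lexicon))
    return root


def r2_operator_verb(pred, args, lexicon):
    """r2: map P to a noun N, take V = Oper1(N), attach x1 and N to V."""
    noun = lexicon["nouns"].get(pred)
    if noun is None:
        return None
    verb = lexicon["oper1"].get(noun)
    if verb is None:
        return None
    root, noun_node = Node(verb), Node(noun)
    root.link("I", lexicalise_arg(args[0], lexicon))   # step 3
    root.link("II", noun_node)                         # step 4
    for relation, arg in zip(RELATIONS[1:], args[1:]):  # step 5
        noun_node.link(relation, lexicalise_arg(arg, lexicon))
    return root
```

With the arguments `["operator-1", "hydr-reservoir-4"]`, `r1_simple_verb` builds a tree rooted in REMPLIR, while `r2_operator_verb` builds a tree rooted in PROCEDER whose second actant is REMPLISSAGE, which in turn governs RESERVOIR.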
Several remarks should be made about these 
rules: 
• To link predicative lexemes to their dependents
(i.e. realisations of arguments), correspondences
between conceptual roles and deep
syntactic relations (I, II, ..., IV) are specified
in the lexical entry of each verb and predicative
noun. Hence, a concept-lexeme mapping
structure indicates not only which lexeme(s)
can be used to express a concept but also how
the roles of the concept should be realised in
terms of deep syntactic relations.
• In an MTT-like lexicon, predicative nouns are
linked to their operator verbs by means of the
lexical functions Oper1, Oper2, ... (for example,
Oper1(remplissage) = procéder). The
number designates the actant of the predicative
noun which is promoted as first actant (syntactic
subject) of the operator verb. In the procedures
we have analysed, only the Oper1 function
seems to be relevant.
• The rule r2 maps a single concept P to a 
multi-lexemic structure composed of an oper- 
ator verb governing a predicative noun. How- 
ever, this correspondence is not given as such 
in the lexicon. It appears more natural to con- 
sider that the lexical realisation performed by 
rule r2 relies primarily on a correspondence be- 
tween the predicate P and the predicative noun. 
It should also be mentioned that such basic
correspondences can also be exploited to generate
similar phrases in other types of constructions.
For example, the correspondence fill →
remplissage, used by the rule r2 when generating
the sentence (1F), can also be used to
construct the nominalisation le remplissage de
l'accumulateur in the declarative sentence:
(6F) Le remplissage de l'accumulateur doit
provoquer l'allumage du voyant sur le
tableau hydraulique.
(Lit. 'The replenishment of the accumulator should
cause the warning light to come on on the
hydraulic panel.')
• The lexicalisation of arguments involves other 
mechanisms, which concern in particular the 
construction of referring expressions \[3\]. 
• An appropriate generation of multilingual
instructions in accordance with these lexical
differences can be achieved by assigning priorities
to these rules. In English, r1 should be preferred
and r2 applied only if r1 fails. For example,
this last case would occur when generating
sentence (5E) 6: r1 would fail because the
lexicon does not contain a mapping structure
relating the atomic predicate gain_access and
a simple verb. In French, it is, however, difficult
to assign absolute priorities in the same
way, since we can find both types of constructions
in similar contexts. If the stylistic preferences
observed in the corpus have to be reflected in
the automatically generated texts, a reasonable
solution would be to select either of these
rules indifferently. Notice that Rationalised French,
which is not respected in the procedural texts
we have analysed, would assign a higher priority
to r1, resulting in an identical parameterisation
of the lexicalisation mechanisms for both
languages.
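The language-specific combination of the rules can be sketched as an ordered list of rules tried in turn. The sketch below is hypothetical: the toy rules built by `make_rule`, the mapping tables and the `shuffle` option (which stands for the indifferent selection suggested for corpus-style French) are assumptions made for the illustration only.

```python
import random


def make_rule(name, table):
    """Build a toy rule that succeeds only if the predicate is in its table."""
    def rule(pred):
        lexemes = table.get(pred)
        return (name, lexemes) if lexemes is not None else None
    return rule


# Hypothetical mapping tables (illustrative entries only).
EN_R1 = make_rule("r1", {"fill": "fill", "bleed": "bleed"})
EN_R2 = make_rule("r2", {"gain_access": "gain + access"})
FR_R1 = make_rule("r1", {"fill": "remplir", "bleed": "purger"})
FR_R2 = make_rule("r2", {"fill": "procéder + remplissage",
                         "bleed": "effectuer + purge"})


def lexicalise(pred, rules, shuffle=False, rng=random):
    """Try the rules in priority order; optionally ignore the priorities."""
    ordered = list(rules)
    if shuffle:
        rng.shuffle(ordered)  # no absolute priority between the rules
    for rule in ordered:
        result = rule(pred)
        if result is not None:
            return result
    raise LookupError(f"no lexicalisation found for {pred!r}")
```

Under these assumptions, `lexicalise("gain_access", [EN_R1, EN_R2])` falls back on r2 because r1 fails, whereas `lexicalise("fill", [FR_R1, FR_R2], shuffle=True)` may return either construction. Passing `[FR_R1, FR_R2]` without shuffling corresponds to the Rationalised French parameterisation.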
4.2 The problem of complex actions 
We have assumed so far that actions to be verbalised
can be represented by simple predicate -
argument structures. However, actions may
have attributes (manner, temporal constraints,
...) which should be conveyed explicitly. In
general, the two types of constructions represented
by rules r1 and r2 are possible, even when
some attribute of the action should be realised
at surface level. For example, in (4F) 7 the manner
attribute of the cleaning action is expressed
as an adjective since this action is nominalised.
But if the same action were expressed as a verb
the manner attribute would take the form of an
adverbial modifier:
(4F') Nettoyer soigneusement le corps du filtre.
(Lit. 'Carefully clean the body of the filter.')
6 (5E) Gain access to rear compartment.
To deal with such modifiers, a minor extension
of rules r1 and r2 is required. The rules should
be able to introduce modifiers on the 'main'
predicative element of the sentence, i.e. the
main verb in r1 and the direct object of the
operator verb (the predicative noun) in r2:
• In r1: an attribute of the action will be
realised as an adverb linked to the main verb
V by means of an attributive deep syntactic
relation (ATTR).
• In r2: the attribute will be realised as an
adjective linked to the predicative
noun N by an attributive relation.
The problem is that sometimes these attributes
cannot take an adverbial form and, in the analysed
procedural texts, it seems that this limitation
is an important motivation for using operator
verbs. They provide the ability to introduce
such attributes in an adjectival form.
Consider the following pair:
(7E) Carry out a dry ventilation of the reactor.
(7F) Effectuer une ventilation sèche du
réacteur.
From both English and French versions, we
cannot derive in a simple way equivalent
expressions based on a simple verb because of the
adverbial modifiers:
(7E') *Ventilate drily the reactor.
(7F') *Ventiler sèchement le réacteur.
A key problem for text generation is to be able
to avoid such incorrect sentences. This problem
has already been tackled in \[15\]. Meteer
proposes to express the input semantic content
in terms of abstract linguistic resources,
i.e. semantic categories, which prevent incorrect
combinations of concrete linguistic resources
during surface realisation. Following
Meteer's analysis, the lexeme dry in (7E) denotes
a property which cannot be realised if
an event perspective is taken on the predicate.
This constraint enforces the nominalisation of
the action. By contrast, an attribute of category
manner can be combined with both event
and object perspectives. This explains why
(4F) and (4F') are both acceptable. In many
cases, the characterisation of attributes along
the semantic opposition manner/property explains
the acceptability or unacceptability of the
"adverbial forms". However, this characterisation
is not always straightforward and it appears
that more precise oppositions should be
introduced.
7 (4F) Effectuer un nettoyage soigné du corps du filtre.
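The manner/property opposition can be read as a filter on the rules r1 and r2. The following sketch is a simplification made for illustration: the tables `COMPATIBLE`, `PERSPECTIVE` and `FORM` and the function names are hypothetical, and the two-way opposition is exactly the coarse characterisation which, as noted above, is not always sufficient.

```python
# Perspectives on the action with which each attribute category can combine
# (assumption: only the two categories discussed in the text are modelled).
COMPATIBLE = {
    "manner": {"event", "object"},
    "property": {"object"},
}

# r1 realises the action as a verb (event perspective), r2 as a
# predicative noun governed by an operator verb (object perspective).
PERSPECTIVE = {"r1": "event", "r2": "object"}

# Surface form taken by the attribute under each rule.
FORM = {"r1": "adverb", "r2": "adjective"}


def admissible_rules(attribute_categories, candidates=("r1", "r2")):
    """Keep the rules whose perspective is compatible with every attribute."""
    return [
        rule for rule in candidates
        if all(PERSPECTIVE[rule] in COMPATIBLE[cat] for cat in attribute_categories)
    ]


def attribute_forms(attribute_categories):
    """Map each admissible rule to the form the attributes would take."""
    return {rule: FORM[rule] for rule in admissible_rules(attribute_categories)}
```

For the careful cleaning of (4F), `admissible_rules(["manner"])` keeps both rules, whereas for the dry ventilation of (7E), `admissible_rules(["property"])` keeps only r2, which blocks the starred sentences (7E') and (7F').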
5 Specificity level of verbal 
items 
Another important lexical difference concerns 
the specificity level of each element of the bilin- 
gual pairs. A French instruction may be less 
specific because a conceptual argument has 
been left implicit while explicitly realised in the 
equivalent English instruction. However, even 
when both instructions are at the same specificity
level, differences may appear in the way
semantic content is spread over the lexical ma- 
terial. This is mainly due to the fact that verbs 
available in both languages do not necessarily 
cover the same part of the initial content. 
We will focus on three types of lexical diver- 
gences which are frequent in the analysed pro- 
cedures: 
1. Domain-specific vs ordinary verb
The two verbs have similar argument structures
but one of them belongs to the technical
jargon of the domain.
(8E) Unlock valve clapper nut.
(8F) Défreiner l'écrou du clapet de valve.
The verbs unlock and défreiner have a very
close meaning, but the second one is domain
specific and imposes more constraints on its
second argument (the direct object). For example,
the English sentence unlock the door is
acceptable but not the French one Défreiner la
porte.
2. Specific vs general verb 
One of the two verbs has a more specific mean- 
ing: 
(9E) Charge the accumulator with nitrogen. 
(9F) Gonfler l'accumulateur à l'azote.
(Lit. 'Inflate the accumulator with nitrogen.') 
The choice of a more general verb for the En- 
glish version is purely stylistic since a specific 
verb -- inflate -- exists, as shown in the literal 
translation of (9F). We have found several
divergences of this kind, which seem to be
stylistically motivated. \[19\] describes similar
divergences between English and German
instructions.
Notice that, with respect to Simplified English,
sentence (9E) is not acceptable, since specific
verbs have to be preferred when available.
We will see in section 5.1 that, interestingly, in- 
structions can be made more precise with gen- 
eral verbs because of differences in argument 
structures: a general verb may have a more ex- 
tended argument structure than a specific one. 
3. Ordinary vs denominal verb 
The two verbs have distinct argument structures.
One of them, in general the English one,
incorporates an argument which is expressed
at surface level in the French version. Such
argument incorporation is often realised through
the use of denominal verbs, which are much more
frequent in English procedures:
(10E) Jack up the aircraft.
(10F) Mettre l'avion sur vérins.
(Lit. 'Put the aircraft on jacks.')
The verb jack up has no direct equivalent in 
French. Hence, the French version has to rely 
on a general verb and the locative argument 
should be realised at surface level. In the cor- 
pus, denominal verbs are systematically used 
in the English versions (when they are avail- 
able) even though this choice leads to bilingual 
pairs with quite different lexical structures. 
Such verbs ensure conciseness and, sometimes, 
the lack of denominal verbs in French makes 
the French version much longer. It should be 
stressed that, in general, both instructions are 
at the same specificity level, even though one 
of them appears more complex. 
5.1 Consequences for the lexicalisa- 
tion mechanisms 
1. Let us start with the first type of differences,
domain-specific vs ordinary verb. The corpus
shows that domain-specific verbs are often
preferred over ordinary verbs. A plausible motivation
for this preference is that, as illustrated by
example (8) 8, they impose precise selectional
restrictions on the arguments. The important
point for multilingual generation is that the
absence of a domain-specific verb in one language
does not affect lexicalisation in the other one
(i.e., a specific verb will be used if available).
2. The second type of differences is a more complex
issue. Both Simplified English and Rationalised
French include a writing rule which
says that specific words should be preferred over
general words. This rule can be used as a guiding
principle in the verb selection mechanisms.
However, it is not always sufficient to
reach the appropriate specificity level required
for the instruction. Selecting a more specific
verb does not necessarily lead to a more specific
instruction. A verb may have a precise
meaning but a restricted argument structure,
which may force some part of the initial content
to be left implicit. To illustrate this point, let
us compare the following surface realisations of
the same instruction:
(11E) Remove lockwire from filter bowl.
(11E') Unlock the filter bowl.
The verb unlock is more specific than remove,
but the locking device to be removed is not
specified as a surface argument of the verb. By
contrast, this argument can be made explicit
with the verb remove. Which of these two versions
can be considered more specific? (11E)
seems more specific, for the 'unlocking' action,
though incompletely specified by the main verb
remove, is somewhat suggested by the argument
lockwire (since, obviously, the function of
a lockwire is to lock). Besides, it conveys an
additional piece of information -- the nature of
the locking device -- which cannot be expressed
in (11E').
The integration in a text generation system of
such evaluations of instruction specificity level
is not a straightforward issue. Complex world
knowledge and lexical semantic inferences are
involved in these evaluations, and they require
a deeper model of domain knowledge and precise
semantic definitions of lexical items. At
present, our approach is less ambitious. We
take advantage of the simple heuristic "the
more arguments a verb has, the more specific
the resulting instruction" in order to detect
potential conflicts. This ability to detect lexical
options may help to perform rephrasing
operations. For example, if sentence (11E') is
generated first, considering that more specific
verbs should be preferred, a rephrasing request
would cause the generator to propose an alternative
realisation based on the general verb remove,
which allows the argument left implicit in the
first proposal to be expressed at surface level.
According to our corpus, this kind of rephrasing
operation will normally concern only the
English versions, since in the French procedures
specific verbs are systematically preferred.
8 (8E) Unlock valve clapper nut.
(8F) Défreiner l'écrou du clapet de valve.
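The heuristic and its use for rephrasing can be sketched as follows. The `Option` record, the `predicate_depth` field (standing for the position of the verb's predicate in a concept hierarchy) and the function names are assumptions introduced for the illustration; they do not describe the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One lexical option for an action (hypothetical record)."""
    verb: str
    predicate_depth: int      # assumption: higher = more specific verb
    realised_roles: tuple     # conceptual roles expressed at surface level


# The two options discussed for example (11).
OPTIONS = [
    Option("unlock", 2, ("Agent", "Location")),
    Option("remove", 1, ("Agent", "Patient", "Location")),
]


def first_proposal(options):
    """Writing rule: prefer the most specific verb."""
    return max(options, key=lambda option: option.predicate_depth)


def rephrasing_candidates(chosen, options):
    """Heuristic: an option realising more arguments than the chosen one
    may give a more specific instruction, so it signals a potential conflict."""
    return [
        option for option in options
        if len(option.realised_roles) > len(chosen.realised_roles)
    ]
```

Here `first_proposal(OPTIONS)` selects unlock, as in (11E'), and `rephrasing_candidates` then returns the option built on remove, which a rephrasing request could propose as (11E).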
Let us now describe briefly how these functionalities
are concretely integrated in the lexicalisation
component. The generation of an
instruction based on a specific verb involves the
rules r1 and r2 (see section 4.1) 9. These rules
establish correspondences between the conceptual
predicate of the action and a specific lexical
item. The choice of a more general verb relies
on the same rules but the generation process
will proceed from a transformed input
representation built on a superordinate predicate.
For instance, to produce sentence (11E') 10,
lexicalisation will proceed from the following
representation, provided that the mapping
structure remove-locking-device → unlock is given
in the lexicon:
SemInput = action-token-1
Illoc-value = Imperative 
Domain-predicate = remove-locking-device 
Agent = object-token-2 
Domain-object = operator-1 
Referential-status = specific 
Patient = object-token-3 
Domain-object = lockwire-4 
Referential-status = specific 
Location = object-token-4 
Domain-object = filter-bowl-5 
Referential-status = specific 
9 And also the rule r3, dedicated to the selection of
denominal verbs, which will be defined later.
10 (11E') Unlock the filter bowl.
At the deep syntactic level, only arguments
Agent and Location will be realised as actants
of the verb unlock (Agent as actant I
and Location as actant II). The generation
of sentence (11E) 11 will proceed from an input
representation based on the superordinate
conceptual predicate remove with the same
arguments. The predicate will be directly linked to
the verb remove as specified in the lexicon and
the three arguments will be realised at the deep
syntactic level.
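The transformation of the input and the selective realisation of arguments can be sketched as below. The dictionaries `SUPERORDINATE` and `LEXICON_EN`, the flat encoding of the input and the function names are hypothetical. The role-to-actant correspondences for unlock follow the text above; those given for remove are an assumption made for the example.

```python
# Hypothetical concept hierarchy fragment: predicate -> superordinate predicate.
SUPERORDINATE = {"remove-locking-device": "remove"}

# Hypothetical lexical entries: predicate -> (verb, role -> deep syntactic relation).
LEXICON_EN = {
    "remove-locking-device": ("unlock", {"Agent": "I", "Location": "II"}),
    # Assumed correspondences for 'remove' (not given in the text).
    "remove": ("remove", {"Agent": "I", "Patient": "II", "Location": "III"}),
}

# Simplified encoding of the input representation of example (11).
EXAMPLE = {
    "predicate": "remove-locking-device",
    "roles": {
        "Agent": "operator-1",
        "Patient": "lockwire-4",
        "Location": "filter-bowl-5",
    },
}


def generalise(sem_input):
    """Rebuild the input on the superordinate predicate, keeping the arguments."""
    return {**sem_input, "predicate": SUPERORDINATE[sem_input["predicate"]]}


def deep_syntax(sem_input, lexicon=LEXICON_EN):
    """Select the verb and realise only the roles its lexical entry covers."""
    verb, role_map = lexicon[sem_input["predicate"]]
    actants = {
        role_map[role]: filler
        for role, filler in sem_input["roles"].items()
        if role in role_map
    }
    return verb, actants
```

Under these assumptions, `deep_syntax(EXAMPLE)` leaves the Patient (the lockwire) implicit, while `deep_syntax(generalise(EXAMPLE))` realises all three arguments as actants of remove.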
3. As we have already said, the use of de- 
nominal verbs often causes differences between 
the French and English versions of instructions, 
since they are usually not available in French. 
Besides, even when they are available they are 
not systematically used as in the English ver- 
sions, as attested by the following example: 
(12E) Pressurise the hydraulic system.
(12F) Mettre le circuit hydraulique sous
pression.
(Lit. 'Put the hydraulic system under pressure.')
The sentence (12F') based on the denominal
verb pressuriser and which is equivalent to
(12F) is also present in the corpus:
(12F') Pressuriser le circuit hydraulique.
The lexicalisation rules defined so far perform 
mappings between a single concept (the pred- 
icate) and one or several lexemes. By con- 
trast, the selection of denominal verbs involves 
mappings between several concepts and a sin- 
gle lexeme. A denominal verb covers not only 
the main predicate but also an argument of 
the predicate. In the example given in figure 
2, the French and English versions are derived 
from the same conceptual representation. The 
French version results from a one-to-one mapping
between concepts of the input representation and
lexemes. In particular, the predicate lock is
directly mapped to the verb freiner and the
argument Instrument to the phraseme 'fil frein'.
The generation of such sentences relies on rules
r1 and r2. However, in the English
version, it is the combination of the predicate 
lock and the argument Instrument which is 
mapped to the main verb lockwire. 
To ensure such correspondences, an additional 
11 (11E) Remove lockwire from filter bowl.
[Figure 2 diagram; only the following is recoverable.]
SemInput = action-token-3
illoc-value = Imperative
domain-predicate = lock
Instrument = object-token-1
Domain-object = lockwire-2
referential-status = massic
Agent = object-token-2
Domain-object = operator-1
referential-status = specific
Location = object-token-3
Domain-object = filter-body
referential-status = specific
[role label not recoverable]
Domain-object = bowl-1
referential-status = specific
French realisation (main verb FREINER): Freiner au fil frein la cuve sur le corps du filtre.
English realisation (main verb LOCKWIRE): Lockwire bowl on filter body.
Figure 2: In the English version, the predicate and the instrument argument are mapped to a 
denominal verb. 
rule is required: 
Given the input representation: 
SemInput = action-token 
Illoc-value = Imperative 
Domain-predicate = P 
Agent = x1
Patient = x2
Role_n = xn
r3: Argument Incorporation
1. Look in the concept-lexeme mapping structures
for a correspondence P + xi → V, i ∈ {1,...,n}.
2. Lexicalise the remaining arguments and 
link the resulting lexemic structures to V. 
To be consistent with the lexical preferences 
observed in the corpus, this rule should have 
the highest priority. 
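
The two steps of r3 can be illustrated with a minimal Python sketch. It is not the system's implementation: the table names INCORPORATING and DIRECT, the functions find_incorporation and lexicalise, and the Patient label given to the bowl are assumptions made for the illustration. The fallback branch is only a simplified stand-in for the one-to-one rules, and the Agent argument is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SemInput:
    """Language-neutral input: a domain predicate plus role -> concept arguments."""
    predicate: str
    roles: Dict[str, str]


# Hypothetical concept-lexeme mapping structures (illustrative entries only).
# Many-to-one entries used by r3: (language, predicate, role, concept) -> verb.
INCORPORATING: Dict[Tuple[str, str, str, str], str] = {
    ("en", "lock", "Instrument", "lockwire"): "lockwire",
}
# One-to-one entries: (language, concept) -> lexeme or phraseme.
DIRECT: Dict[Tuple[str, str], str] = {
    ("en", "lock"): "lock",
    ("fr", "lock"): "freiner",
    ("en", "lockwire"): "lockwire",
    ("fr", "lockwire"): "fil frein",
    ("en", "bowl"): "bowl",
    ("fr", "bowl"): "cuve",
    ("en", "filter-body"): "filter body",
    ("fr", "filter-body"): "corps du filtre",
}


def find_incorporation(sem: SemInput, lang: str) -> Optional[Tuple[str, str]]:
    """Step 1 of r3: search for a correspondence P + x_i -> V."""
    for role, concept in sem.roles.items():
        verb = INCORPORATING.get((lang, sem.predicate, role, concept))
        if verb is not None:
            return verb, role
    return None


def lexicalise(sem: SemInput, lang: str) -> Tuple[str, Dict[str, str]]:
    """Return the main verb and the lexemes of the arguments still to be realised."""
    hit = find_incorporation(sem, lang)  # r3 is tried first (highest priority)
    if hit is not None:
        verb, absorbed = hit
    else:
        # Simplified stand-in for the one-to-one rules: map the predicate directly.
        verb, absorbed = DIRECT[(lang, sem.predicate)], None
    # Step 2 of r3: lexicalise the remaining arguments, to be linked to the verb.
    args = {role: DIRECT[(lang, concept)]
            for role, concept in sem.roles.items() if role != absorbed}
    return verb, args


# Input modelled on Figure 2 (the Patient label for the bowl is an assumption).
sem = SemInput("lock", {"Instrument": "lockwire",
                        "Patient": "bowl",
                        "Location": "filter-body"})
print(lexicalise(sem, "en"))
print(lexicalise(sem, "fr"))
```

With these toy entries, the English call absorbs the Instrument argument into the verb lockwire, whereas the French call finds no incorporating entry and keeps freiner with 'fil frein' as a separate argument, both from the same input.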
The incorporated argument does not always 
hold the same semantic role. For example, it 
can be the instrument as in the verbs lock- 
wire, energise and pressurise or a locative ar- 
gument as in the verb jack up. It should also be 
mentioned that such incorporations are not re- 
stricted to arguments. \[19\] discusses closely re- 
lated phenomena concerning German, English 
and French instructions. The authors provide 
in particular some examples where a manner 
attribute is realised as an adverb in English 
while incorporated in the verb in German and 
French 12.
6 Conclusion 
We have focused in this paper on some frequent 
lexical differences between French and English 
instructions. We have also proposed a specifi- 
cation of lexicalisation mechanisms, without in- 
troducing distinct semantic representations for 
French and English lexicalisations. We do not 
claim however that distinct representations can 
always be avoided. Our corpus reveals the ex- 
istence of deeper differences (though less fre- 
quent) which call for language-specific repre- 
sentations. For example, we have found sev- 
eral instructions where aspectual values are 
conveyed explicitly in French but not in En- 
glish. Another interesting case concerns agen- 
tivity values assigned to the operator in the in- 
structions. Consider the following example: 
(13E) Allow hydraulic pressure to fall. 
(13F) Chuter la pression hydraulique. 
(Lit. 'Decrease hydraulic pressure.')
In (13E), the operator is presented as the en- 
abler of a physical process, whereas in (13F), he 
12 For example:
(E) affect adversely - (G) beeinträchtigen - (F) amoindrir
is presented as the causer of an action. It seems
that the generation of such a bilingual pair
requires language-specific semantic inputs built
on distinct event categories. Interestingly, we
have noticed that controlled languages will not,
in most cases, allow these deeper differences to
appear. One member of the pair is often rejected
by the corresponding controlled language. For
example, (13E) does not comply with Simplified
English, which would encourage the use of the
more direct form: Decrease the hydraulic
pressure. This last sentence is closer to (13F) and
we can reasonably suppose that these two
sentences can be generated from the same input.
Acknowledgments 
I would like to thank Alexis Nasr, Corinne
Fournier, and Owen Rambow for helpful
comments on earlier versions of this paper.

References
\[1\] L. Bourbeau, D. Carcagno, E. Goldberg, R. Kittredge,
and A. Polguère. Bilingual synthesis of weather
forecasts in an operations environment. In
Proceedings of the 13th International Conference on
Computational Linguistics (COLING'90), Helsinki, 1990.
COLING-90.
\[2\] M. Boyer and G. Lapalme. Generating paraphrases
from meaning-text semantic networks. Computational
Linguistics, 1:103-117, 1985.
\[3\] F. Cerbah. Referring Expressions in Ghost-Writer.
Technical report, Dassault Aviation - British
Aerospace, 1995.
\[4\] F. Cerbah and C. Fournier. The syntactic component
of the GLOSE generation system. Technical report,
Dassault Aviation, 1995.
\[5\] L. Danlos. Support verb constructions: linguistic
properties, representation, translation. French
Language Studies, (2):1-32, 1992.
\[6\] J. Delin, A. Hartley, C. Paris, D. Scott, and
K. Van Linden. Expressing procedural relationships in
multilingual instructions. In Proceedings of the
Seventh International Workshop on Natural Language
Generation, Kennebunkport, Maine, 1994.
\[7\] A. F. Hartley and C. L. Paris. Supporting
Multilingual Document Production: Machine Translation
or Multilingual Generation? In IJCAI Workshop on
Multilingual Text Generation, pages 34-41, Montréal,
1995.
\[8\] U. Heid. Notes on the use of lexical functions for
the description of collocations in an NLP lexicon. In
International Workshop on the Meaning-Text Theory,
pages 217-229, Darmstadt, 1992.
\[9\] D. Heylen, L. Humphreys, S. Warwick-Armstrong,
N. Calzolari, and S. Murison-Bowie. Collocations and
the lexicalisation of semantic operations -- lexical
functions for multilingual lexicons. In International
Workshop on the Meaning-Text Theory, pages 173-183,
Darmstadt, 1992.
\[10\] L. Iordanskaja, R. Kittredge, and A. Polguère.
Lexical selection and paraphrase in a meaning-text
generation model. In C. Paris, W. Swartout, and
W. Mann, editors, Natural Language Generation in
Artificial Intelligence and Computational Linguistics,
pages 293-312. Kluwer Academic Publishers, 1991.
\[11\] R. Kittredge. Efficiency vs. Generality in
Interlingual Design. In IJCAI Workshop on Multilingual
Text Generation, pages 64-74, Montréal, 1995.
\[12\] L. Kosseim and G. Lapalme. Content and rhetorical
status selection in instructional texts. In
Proceedings of the Seventh International Workshop on
Natural Language Generation, Kennebunkport, Maine,
1994.
\[13\] B. Lavoie. Interlingua for Bilingual Statistical
Reports. In IJCAI Workshop on Multilingual Text
Generation, pages 84-93, Montréal, 1995.
\[14\] I. A. Mel'čuk. Dependency Syntax: Theory and
Practice. State University of New York Press, New
York, 1988.
\[15\] M. W. Meteer. Bridging the generation gap between
text planning and linguistic realization.
Computational Linguistics, 7(4), 1991.
\[16\] O. Rambow and T. Korelsky. Applied Text
Generation. In Third Conference on Applied Natural
Language Processing, pages 40-47, Trento, Italy, 1992.
\[17\] E. Reiter, C. Mellish, and J. Levine. Automatic
generation of on-line documentation in the IDAS
project. In Proceedings of the Third Conference on
Applied Natural Language Processing (ANLP-1992),
pages 64-71, Trento, Italy, 1992.
\[18\] D. Rösner and M. Stede. Customizing RST for the
automatic production of technical manuals. In R. Dale,
E. Hovy, D. Rösner, and O. Stock, editors, Aspects of
Automated Natural Language Generation, Lecture Notes
in Artificial Intelligence 587, pages 199-214.
Springer Verlag, Berlin, 1992.
\[19\] M. Stede and B. Grote. The lexicon: Bridge
between language-neutral and language-specific
representations. In IJCAI Workshop on Multilingual
Text Generation, pages 129-135, Montréal, 1995.
