Evaluating and comparing three text-production techniques 
José Coch
GSI-Erli
1, pl. des Marseillais
F-94227 Charenton-le-Pont Cedex
France
jose.coch@erli.fr
Abstract 
What are the benefits of using Natural Language Generation in an industrial application? We have attempted to answer part of this question by describing an assessment of three techniques for producing multisentential text: semi-automatic fill-in-the-blank interfacing, automatic linguistic-and-templates hybrid generation, and human writing. This assessment used a black-box methodology, with an independent, blind-tested jury that assigned quality levels according to a set of criteria. The texts used for the assessment were business reply letters.
1 Introduction 
There are many more industrial projects in Analysis than in Natural Language Generation. The benefits of using applied NLG would therefore appear to be a crucial issue. We provide a partial response to this issue by analysing the assessment of three different techniques for producing multisentential text (in this case, business reply letters).
In the following section, we describe the three techniques under assessment: semi-automatic non-linguistic fill-in-the-blank interfacing, automatic linguistic-and-template hybrid generation, and human writing.
The third section deals with the black-box methodology and quality criteria used for the assessment.
The fourth section describes the results of the assessment.
The fifth section gives examples of letters produced by both the semi-automatic system and the linguistic-and-template hybrid system.
The last section analyses the results of the assessment.
2 Three techniques for producing multisentential text
This section describes the three text-production techniques under assessment.
2.1 Fill-in-the-blank semi-automatic technique
Since 1975, the mail department of La Redoute (a European mail-order company) has been using a semi-automatic reply system, referred to below as "SA", consisting of a number of predefined and fill-in-the-blank sentences or paragraphs, which are identified by codes that the writers memorise. Writing a letter therefore involves typing the code that corresponds to the desired paragraph and inserting the relevant elements. The resulting sentences or paragraphs are thus concatenations of predefined and inserted texts.
1. A relatively high number of predefined sentences and paragraphs have to be provided to cover the writers' needs, but:
2. In fact, writers use only a reduced set of predefined paragraphs, the number of which depends on the writer.
3. The quality of the resulting style of reply varies widely.
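The code-driven concatenation described above can be sketched as follows. This is a minimal illustration, not La Redoute's system: the paragraph codes (`ACK`, `DELAY`, `CLOSE`) and their wording are hypothetical, and Python's `string.Template` stands in for the blanks. Whatever the writer types into a blank is copied verbatim, which is how an error such as the misspelt date in the example of section 5.1 can reach the letter.

```python
from string import Template

# Hypothetical paragraph codes: the real SA code set is not given in the paper.
PARAGRAPHS = {
    "ACK": Template("I have received your letter of $date."),
    "DELAY": Template("I am afraid that I cannot give you an exact delivery date for your $item."),
    "CLOSE": Template("I remain at your entire disposal."),
}


def compose(entries):
    """Build a letter body from (code, blanks) pairs typed by the writer.

    Each code selects a predefined paragraph; the blanks are filled with
    the writer's text exactly as typed, with no linguistic checking.
    """
    paragraphs = []
    for code, blanks in entries:
        paragraphs.append(PARAGRAPHS[code].substitute(blanks))
    return "\n".join(paragraphs)


letter = compose([
    ("ACK", {"date": "3 Otober"}),  # a typing error passes straight through
    ("DELAY", {"item": "cardigan"}),
    ("CLOSE", {}),
])
print(letter)
```

The writer, not the system, is responsible for both the choice of codes and the correctness of the inserted text.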
2.2 Automatic Hybrid Generation (Linguistic + Template approach)
La Redoute and GSI-Erli have developed a real-situation pilot system (for details on this project, see (Coch, David, and Magnoler, 1995)) which builds up a text (i.e. a letter) from data entered by the human operator who processes the request, a customer database, and knowledge bases. It uses GSI-Erli's AlethGen text generation toolbox (see (Coch, 1996)).
The overall system is composed of two main modules: the Decision module and the Generation module.
The Decision module has the following functions: 
• it allows the writer (who reads the request letter) 
to identify the author and subject of the request 
letter; 
• it asks the writer for relevant information; 
• it suggests a decision (for example, order 
cancellation, renewal, etc.), after consulting the 
customer database and the domain knowledge; 
• it asks the writer to validate the decision (or make 
a different choice); 
• it communicates the relevant information to the 
Generation module. 
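The flow of the Decision module listed above might be sketched as follows. Everything concrete here is a hypothetical assumption: the request fields, the toy customer records, the decision labels, and the single rule standing in for the domain knowledge base. The `validate` callback models the writer confirming or replacing the suggested decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    customer_id: str  # author of the request letter, identified by the writer
    subject: str      # subject of the request, identified by the writer


# Hypothetical stand-in for the customer database.
CUSTOMERS = {
    "C042": {"order_shipped": False},
    "C043": {"order_shipped": True},
}


def suggest_decision(req: Request, customers: dict) -> str:
    """One toy rule standing in for the domain knowledge base."""
    record = customers[req.customer_id]
    if req.subject == "cancel order":
        # An order that has already been shipped cannot simply be cancelled.
        return "explain shipment" if record["order_shipped"] else "order cancellation"
    return "ask for more information"


def decision_module(req: Request, customers: dict,
                    validate: Callable[[str], str]) -> dict:
    """Suggest a decision, let the writer validate or change it, and
    return the data that would be passed to the Generation module."""
    suggestion = suggest_decision(req, customers)
    decision = validate(suggestion)  # writer accepts or substitutes a choice
    return {"customer": req.customer_id, "subject": req.subject,
            "decision": decision}


# The writer accepts the suggestion unchanged.
print(decision_module(Request("C042", "cancel order"), CUSTOMERS, lambda s: s))
```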
The Generation module automatically produces the reply letter in a standard format (SGML). This module consists of several submodules (for more details, see (Coch, David and Magnoler 1995) and (Coch and David, 1994)): the direct generator; the text deep-structure planner (or conceptual planner); the text surface-structure planner (or rhetorical planner); and linguistic realisation, inspired by the Meaning-Text Theory.
The direct generator has two functions:
1. planning the text in direct mode (top-down), and
2. generating more or less fixed expressions or non-linguistic texts (e.g. tables, addresses, lists).
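The two functions of the direct generator can be illustrated with a minimal sketch, assuming an invented customer record with hypothetical fields (`title`, `name`, `street`, `postcode`, `city`). The address block is pure string manipulation, the salutation is a fixed expression chosen from a small table (wording taken from the example letters in section 5), and the letter body is left as a slot, since in the hybrid system it comes from the planners and the linguistic realiser.

```python
# Fixed expressions taken from the example letters in section 5;
# the keys of this table are hypothetical.
SALUTATIONS = {"Mme": "Chère Madame,", "M.": "Cher Monsieur,"}


def address_block(customer: dict) -> str:
    """Non-linguistic text: customer fields laid out line by line."""
    return "\n".join([
        f"{customer['title']} {customer['name']}",
        customer["street"],
        f"{customer['postcode']} {customer['city']}",
    ])


def direct_plan(customer: dict, body: str) -> list:
    """Top-down plan in direct mode: a fixed sequence of letter parts.

    `body` is an opaque slot here; in the hybrid system it would come
    from the planners and the linguistic realiser.
    """
    return [address_block(customer), SALUTATIONS[customer["title"]], body]


# Invented customer record, for illustration only.
customer = {"title": "Mme", "name": "Dupont", "street": "12 rue de Lille",
            "postcode": "59100", "city": "Roubaix"}
print("\n\n".join(direct_plan(customer, "<body>")))
```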
The direct generator could be used without the other 
submodules to generate texts in an automatic but 
non-linguistic way (manipulation of character 
strings). Reiter (Reiter, 1995) calls this technique 
"the template approach". 
The output of the conceptual planner is the text's deep structure, in which the events to be carried out are not yet in a definitive order. The conceptual planner uses logical, causality, and time rules (see (Coch and David, 1994)).
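To make the idea of a deep structure without a definitive order concrete, the sketch below represents it as a set of events plus causal and temporal relations, and derives one admissible order with a topological sort (`graphlib.TopologicalSorter`, Python 3.9+). The event labels are hypothetical, loosely based on the letter in section 5.2, and the topological sort is a stand-in chosen for illustration, not the rules actually used by AlethGen.

```python
from graphlib import TopologicalSorter

# Hypothetical deep structure for the letter in section 5.2:
# a set of events plus relations between them, but no surface order.
EVENTS = {"order_recorded", "items_unavailable", "customer_informed",
          "delivery_postponed", "delay_extended", "items_sent"}
RELATIONS = [
    ("order_recorded", "before", "customer_informed"),
    ("items_unavailable", "cause", "customer_informed"),
    ("items_unavailable", "cause", "delivery_postponed"),
    ("delivery_postponed", "before", "delay_extended"),
    ("delay_extended", "before", "items_sent"),
]


def admissible_order(events, relations):
    """Return one linear order in which every cause precedes its effect
    and every earlier event precedes the later one."""
    predecessors = {event: set() for event in events}
    for first, _kind, second in relations:
        predecessors[second].add(first)
    return list(TopologicalSorter(predecessors).static_order())


print(admissible_order(EVENTS, RELATIONS))
```

Several orders usually satisfy the same relations, so the printed order may vary between runs; in the system described here, the final surface order is chosen later by the rhetorical planner.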
The rhetorical module chooses concrete operators, modalities and surface order, according to rhetorical rules. The choices made depend on certain attributes, e.g. whether the addressee is aware of an event, whether an event is in the addressee's favour, and so on.
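A rhetorical rule of the kind described above might look like the following sketch. The two attributes come from the text; the operator names (`REMIND`, `APOLOGISE`, `STATE`), the event labels, and the priority between the two tests are assumptions, loosely modelled on the letter in section 5.2 ("as you were informed at the time", "I am very sorry that"). Modalities and surface order are not covered.

```python
from dataclasses import dataclass


@dataclass
class Event:
    label: str
    addressee_aware: bool  # does the addressee already know about the event?
    in_favour: bool        # is the event in the addressee's favour?


def choose_operator(event: Event) -> str:
    """Hypothetical rhetorical rule over the two attributes named in the text."""
    if event.addressee_aware:
        return "REMIND"     # e.g. "as you were informed at the time"
    if not event.in_favour:
        return "APOLOGISE"  # e.g. "I am very sorry that ..."
    return "STATE"


events = [
    Event("shoes_not_received", addressee_aware=False, in_favour=False),
    Event("items_unavailable", addressee_aware=True, in_favour=False),
    Event("items_will_be_sent", addressee_aware=False, in_favour=True),
]
print([(e.label, choose_operator(e)) for e in events])
```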
Lastly, the linguistic generation submodule realises each event from the text surface structure. It uses anaphora rules (see (Coch, David and Wonsever, 1994)) as well as semantic, deep-syntactic, surface-syntactic, and morphological rules. This submodule is inspired mainly by the Meaning-Text Theory (as developed, for example, in (Mel'čuk, 1988) and (Mel'čuk and Polguère, 1987)).
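The effect of anaphora on repetition can be shown with a deliberately toy realiser. It is neither Meaning-Text Theory nor the rule set of (Coch, David and Wonsever, 1994): the single rule here, a hypothetical assumption, replaces a noun phrase with a pronoun when the same referent was mentioned in the previous sentence. The sentences are taken from the English translation of the letter in section 5.2.

```python
def realise(sentences):
    """Toy realiser over (referent, noun_phrase, pronoun, template) tuples.

    Hypothetical anaphora rule: if the referent was also mentioned in the
    previous sentence, the pronoun replaces the full noun phrase.
    """
    output, previous = [], None
    for referent, noun_phrase, pronoun, template in sentences:
        text = template.format(x=pronoun if referent == previous else noun_phrase)
        output.append(text[0].upper() + text[1:])  # capitalise sentence-initial words
        previous = referent
    return " ".join(output)


SENTENCES = [
    ("shoes", "the white sports shoes", "them", "I am very sorry that you have not received {x}."),
    ("shoes", "the white sports shoes", "they", "{x} were not available."),
    ("delivery", "the delivery", "it", "{x} was postponed by two weeks."),
]
print(realise(SENTENCES))
```

Without the rule, the second sentence would repeat "the white sports shoes", which is the kind of repetition the jury penalised in section 4.2.4.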
In accordance with Reiter (Reiter, 1995), La Redoute 
and GSI-Erli's system can be defined as "hybrid", 
because it uses both linguistic and template 
techniques. 
2.3 Human writing 
The third technique used was human writing in 
"ideal" conditions: one of La Redoute's best writers 
wrote the letters with no time constraints. 
2.4 Functional differences 
It should be noted that the three techniques described differ from an external functional point of view:
• in the semi-automatic approach, the writers compose the letter themselves, even if assisted by a set of predefined-paragraph codes;
• in the automatic hybrid approach, the operator enters data on the addressee and letter, but does not have to compose the reply letter;
• in the third case, the writer has to write the letter. 
Reiter (Reiter, 1995) studied the difference between the linguistic generation and template approaches. Unlike the three techniques above, these two techniques do not differ from an external functional point of view.
3. Methodology 
3.1 Evaluation Tests 
Black-box methodology was used for the assessment, which was carried out by an independent jury of 14 people, who were representative of end users, in a blind-test context. The jury was not informed of the automatic generation project.
Each member of the jury examined the quality of a 
set of 60 letters (20 produced by the SA system, 20 
by the automatic hybrid system, and 20 human- 
written, for identical cases). No member of the jury 
knew which technique had been used for producing 
each of the letters. 
Each member of the jury wrote a report on each letter, with assessment values according to quality criteria. Examples of these criteria are:
• correct spelling,
• good grammar,
• comprehensiveness,
• rhythm and flow,
• appropriateness of the tone,
• proximity, personalisation,
• absence of repetition,
• correct choice and precision of the terminology used.
The first three criteria were considered eliminatory, and were marked 0 or 1. The other criteria were marked out of 20.
There were also other criteria, which are not listed here because they were too application-oriented and confidential.
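The marking scheme above can be sketched as a small scoring function. The criterion keys and the sample marks are hypothetical shorthand for illustration; the paper does not say how a failed eliminatory criterion was combined with the averages, so this sketch simply reports the two results side by side.

```python
# Shorthand keys for the three eliminatory criteria listed above (marked 0 or 1).
ELIMINATORY = ("spelling", "grammar", "comprehensiveness")


def assess(marks: dict) -> tuple:
    """Return (passes all eliminatory criteria, mean of the marks out of 20)."""
    for criterion in ELIMINATORY:
        if marks[criterion] not in (0, 1):
            raise ValueError(f"{criterion} must be marked 0 or 1")
    graded = {c: m for c, m in marks.items() if c not in ELIMINATORY}
    for criterion, mark in graded.items():
        if not 0 <= mark <= 20:
            raise ValueError(f"{criterion} must be marked out of 20")
    passed = all(marks[c] == 1 for c in ELIMINATORY)
    return passed, sum(graded.values()) / len(graded)


# Hypothetical marks given by one jury member to one letter.
marks = {"spelling": 1, "grammar": 1, "comprehensiveness": 1,
         "rhythm and flow": 14, "right tone": 16}
print(assess(marks))
```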
3.2 Representativity of the results
Given that the tests used only 20 letters of each type, 
one might question their representativity. 
In fact, representativity is ensured by the projection of the results of the previous phase (system tests), which used the same quality criteria, involved a reduced jury (2 to 6 members), and was based on 200 test cases (200 letters of each type).
The test cycle was performed six times:
Delivery → Test → Diagnosis → Correction → Delivery
After the sixth cycle, the average quality scores showed that the results would be sufficiently representative.
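For example, for the following criteria: rhythm and flow, precision of terminology, and absence of repetitions:
[Figure: average score per test step for rhythm and flow, precision of terminology, and absence of repetitions; x-axis: step; y-axis: score out of 20, from below 10 to 16.]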
We can thus conclude that, for the automatic letters, the results are representative.
The semi-automatic letters were produced by human "writers" in a real situation. There is no proof of their representativity, but several people who know the semi-automatic system were of the opinion that the semi-automatic letters used in the test were better than the average semi-automatic letter.
4. Assessment results 
4.1 Eliminatory criteria and overall average
All the automatic and human letters met the eliminatory criteria standards. However, this was not the case for the semi-automatic system, mainly because of problems of comprehension, but also because of grammatical mistakes in the fill-in-the-blank system.
The overall averages of the entire jury, for all the quality criteria (including application-oriented criteria) and for all the letters, were as follows:
• semi-automatic system: 11 out of 20
• automatic hybrid system: 14.5 out of 20
• human-written letters: 15.5 out of 20.
It can be seen that the quality of the letters generated by the pilot system using AlethGen was far superior to that of the semi-automatic system using predefined paragraphs (14.5 vs. 11 out of 20).
These tests show that the "ideal" human-written letters are, as expected, the best. However, the difference between the human-written letters and those produced by the automatic hybrid system is relatively slight (15.5 vs. 14.5 out of 20).
4.2 Detailed results 
Below are the averages for the whole jury and all the 
letters, as regards the non-eliminatory criteria: 
4.2.1 Rhythm and flow
• semi-automatic system: 12.8 out of 20
• automatic hybrid system: 14 out of 20
• human-written letters: 16.8 out of 20
Differences:
• ideal human letters vs. automatic letters: 2.8
• automatic letters vs. SA letters: 1.2
• ideal human letters vs. SA letters: 4
The difference between the ideal human letters and those obtained with the automatic hybrid system is considerable: 2.8 out of 20.
4.2.2 Right tone
• semi-automatic system: 11.6 out of 20
• automatic hybrid system: 13.6 out of 20
• human-written letters: 14.4 out of 20
Differences:
• ideal human letters vs. automatic letters: 0.8
• automatic letters vs. SA letters: 2
• ideal human letters vs. SA letters: 2.8
The results obtained by the ideal human letters and those generated automatically are close. However, the difference between automatic and semi-automatic letters is considerable: 2 out of 20.
4.2.3 Proximity, personalisation 
• semi-automatic system 12 out of 20 
• automatic hybrid system 15.2 out of 20 
• human-written letters 17.6 out of 20 
Differences: 
• ideal human letters vs. automatic letters: 2.4
• automatic letters vs. SA letters: 3.2 
• ideal human letters vs. SA letters: 5.6 
Here, all the differences are considerable. The human letters are obviously the best, but the difference between the automatic and semi-automatic letters is very great: 3.2 out of 20.
4.2.4 Absence of repetition
• semi-automatic system 11.2 out of 20
• automatic hybrid system 14.8 out of 20
• human-written letters 17.6 out of 20
Differences:
• ideal human letters vs. automatic letters: 2.8
• automatic letters vs. SA letters: 3.6
• ideal human letters vs. SA letters: 6.4
For this criterion, all the differences are considerable, but that between the automatic and semi-automatic letters is very great: 3.6 out of 20.
4.2.5 Correct choice of terminology
• semi-automatic system 11.6 out of 20
• automatic hybrid system 14 out of 20
• human-written letters 16 out of 20
Differences:
• ideal human letters vs. automatic letters: 2
• automatic letters vs. SA letters: 2.4
• ideal human letters vs. SA letters: 4.4
Here, all differences are relatively great. That between the automatic and semi-automatic letters is considerable: 2.4 out of 20.
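The differences reported in sections 4.2.1 to 4.2.5 are simple subtractions of the jury averages. The sketch below recomputes them from the averages given above, as a consistency check; the dictionary keys are shorthand labels, and rounding to one decimal only removes floating-point noise.

```python
# Jury averages out of 20 from section 4.2: (SA, automatic hybrid, human).
AVERAGES = {
    "rhythm and flow": (12.8, 14.0, 16.8),
    "right tone": (11.6, 13.6, 14.4),
    "proximity, personalisation": (12.0, 15.2, 17.6),
    "absence of repetition": (11.2, 14.8, 17.6),
    "choice of terminology": (11.6, 14.0, 16.0),
}


def differences(sa, auto, human):
    """Pairwise gaps, rounded to one decimal to absorb floating-point noise."""
    return {
        "human vs. automatic": round(human - auto, 1),
        "automatic vs. SA": round(auto - sa, 1),
        "human vs. SA": round(human - sa, 1),
    }


for criterion, scores in AVERAGES.items():
    print(criterion, differences(*scores))
```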
5. Examples 
Below are several examples of letters produced using the semi-automatic fill-in-the-blanks system and the automatic linguistic-and-template hybrid system.
5.1 Semi-automatic letter 
Chère Madame,
J'ai bien reçu votre courrier du 30tobre [sic] et je comprends tout à fait votre mécontentement.
Nous faisons le maximum pour contenter nos clients, mais nous sommes dépendants des délais de livraison que nous imposent certains fournisseurs.
Je suis désolée de ne pouvoir vous donner une date précise de livraison, croyez bien que je regrette vivement ce retard.
Restant à votre entière disposition, je vous prie de croire, Chère Madame, à l'expression de mes sentiments dévoués.
[Dear Madam,
In reply to your letter of 3rd Otober [sic], I can completely understand your dissatisfaction.
We do our utmost to satisfy our customers, but are dependent on the delivery times imposed on us by certain suppliers.
I am afraid that I cannot give you an exact delivery date, and sincerely apologise for this delay.
I remain at your entire disposal should you require any further assistance.
Yours sincerely,]
5.2 Linguistic and template example 
Chère Madame,
Je suis désolée que vous n'ayez pas reçu les chaussures de sport blanches.
Comme vous en avez été informée lors de l'enregistrement de votre commande, elles n'étaient pas disponibles. La livraison était différée de deux semaines.
Ce délai sera un peu plus long que prévu.
Dès la rentrée en stock de ces chaussures de sport, je vous les enverrai immédiatement, en priorité.
J'espère que vous nous pardonnerez cette attente et que vous voudrez bien patienter.
Je vous prie d'agréer, Chère Madame, l'expression de mon entier dévouement.
[Dear Madam,
I am very sorry that you have not received the white sports shoes.
These items were not available when your order was recorded, as you were informed at the time. The delivery was postponed by two weeks.
The delivery will in fact take a little longer than planned.
As soon as these sports shoes are in stock, I will send them to you as a priority.
I hope that you will forgive us for this delay, and are prepared to wait for your delivery.
Yours sincerely,]
5.3 Comments 
a) Spelling error in the semi-automatic letter, due to the date written by the operator in a blank of a predefined sentence.
b) Personalisation: the article and its colour are mentioned only in the automatic letter.
c) Precision of terminology (precision of the explanation): clearly, the automatic letter is much more precise.
5.4 Semi-automatic example
The following example shows the typical problem of repetition in the semi-automatic letters.
Cher Monsieur,
J'ai bien reçu votre lettre qui a retenu toute mon attention.
Je réponds à votre demande concernant la marchandise différée suivante : cardigan 4566654 taille 114.
La marchandise a été enregistrée sous le no 176 788956.
Un envoi a été fait le 23 juin.
Normalement, vous devriez déjà avoir reçu la livraison de ce paquet, veuillez m'adresser de préférence un chèque pour régler la marchandise que nous vous avons envoyée.
Restant à votre entière disposition, je vous prie de croire, Cher Monsieur, en mes sentiments dévoués.
[Dear Sir,
I have received your letter, which I have read with great attention.
I am writing in reply to your request concerning the following postponed merchandise: cardigan 4566654 size 114.
The merchandise was recorded with the number 176 788956.
«Sending occurred» on June 23rd.
You should already have received this parcel, therefore would you please send me a cheque in payment of the merchandise that we have sent to you.
I remain at your entire disposal.
Yours sincerely,]
6. Analysis of results and Conclusion
6.1 Analysis of results 
The order of results for the different techniques is always the same for all the criteria: first, human writing; second, the automatic hybrid approach; and third, the semi-automatic system. Let us now examine the salient points of each type of technique.
Semi-automatic system
The principal weak points of the semi-automatic system are as follows, in decreasing order of variation in relation to the human averages:
• Eliminatory criteria not always met, due to problems of comprehension and grammar.
• Excessive repetition (a difference of 6.4 out of 20 in relation to human writing, and of 3.6 in relation to the automatic system).
• Lack of personalisation (5.6 and 3.2).
• Lack of precision in the choice of vocabulary (4.4 and 2.4).
Automatic hybrid system 
The principal strong points of the automatic linguistic-and-templates system based on AlethGen are as follows, in decreasing order of variation in relation to the semi-automatic averages:
• Eliminatory criteria always met.
• Absence of repetition (3.6 out of 20 better than the semi-automatic system).
• Proximity, personalisation (3.2 better than the semi-automatic system).
• Precision in the choice of vocabulary (2.4 better).
The main points for improvement for the automatic system are as follows, in decreasing order of variation in relation to the human averages:
• Absence of repetition (human letters 2.8 out of 20 
better). 
• Rhythm and flow (human letters 2.8 better). 
• Proximity, personalisation (human letters 2.4 
better). 
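The orderings used in the lists above (decreasing variation in relation to a reference average) can be recomputed from the averages of section 4.2 by sorting the per-criterion gaps. This is a sketch for checking the lists, not part of the original study; the dictionary keys are shorthand labels. Absence of repetition and rhythm and flow are tied at 2.8 between the human and automatic letters, so their relative order depends on how ties are broken: Python's stable sort keeps dictionary order, which puts rhythm and flow first.

```python
# Jury averages out of 20 from section 4.2: (SA, automatic hybrid, human).
AVERAGES = {
    "rhythm and flow": (12.8, 14.0, 16.8),
    "right tone": (11.6, 13.6, 14.4),
    "proximity, personalisation": (12.0, 15.2, 17.6),
    "absence of repetition": (11.2, 14.8, 17.6),
    "choice of terminology": (11.6, 14.0, 16.0),
}
SA, AUTO, HUMAN = 0, 1, 2


def ranked_gaps(lower, higher):
    """Criteria sorted by decreasing gap (higher minus lower technique)."""
    gaps = {c: round(s[higher] - s[lower], 1) for c, s in AVERAGES.items()}
    # sorted() is stable even with reverse=True, so ties keep dictionary order.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


print(ranked_gaps(SA, HUMAN)[:3])    # SA weak points vs. human letters
print(ranked_gaps(SA, AUTO)[:3])     # automatic strong points vs. SA
print(ranked_gaps(AUTO, HUMAN)[:3])  # automatic points for improvement
```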
Human writing 
The best characteristics of the human letters were 
absence of repetition, and proximity / 
personalisation, which were both given scores of 
17.6 out of 20. 
It can be seen that the jury considered the tone of the human letters to be not very good: only 14.4 out of 20. This would appear to be mainly for reasons related to commercial communication rather than computational linguistics.
6.2 Conclusion 
The first conclusion is that semi-automatic systems 
(just as real-situation human writing) are subject to 
human mistakes, and that the texts they produce may 
be difficult to understand. 
The second conclusion is that the weak points of the 
semi-automatic systems are the strong points of the 
automatic hybrid systems, in the same order. 
We can conclude that, even if current automatic 
generation systems could do better (and we believe 
that this will soon be the case), one of the two main 
reasons for using linguistic-and-template hybrid 
systems such as that developed by La Redoute and
GSI-Erli, rather than using semi-automatic systems, 
is the improvement in quality (the other being, of 
course, productivity). 
Although there are more research and industrial 
projects in Analysis than in Natural Language 
Generation, Generation has great potential, since the 
gains in terms of quality and productivity 
largely justify the investment. 
References 
José Coch and Raphaël David. 1994. Representing
knowledge for planning multisentential text.
Proceedings of the 4th Conference on Applied
Natural Language Processing, Stuttgart,
Germany.
José Coch, Raphaël David, and Dina Wonsever.
1994. Plans, rhetoric and anaphora in a text
generation tool. Working papers of the IBM
Institute for Logic and Linguistics. Special Issue
on Focus and Natural Language Processing, IBM
Deutschland Informationssysteme GmbH,
Scientific Centre, Heidelberg, Germany.
José Coch, Raphaël David, and Jeannine Magnoler.
1995. Quality test for a mail generation system.
Proceedings of Linguistic Engineering 95,
Montpellier, France.
José Coch. 1996. Overview of AlethGen.
Proceedings of the International Workshop on
Natural Language Generation (INLG-96),
Herstmonceux, England.
Igor Mel'čuk. 1988. Dependency Syntax: Theory and
Practice. State University of New York Press,
Albany, NY, USA.
Igor Mel'čuk and Alain Polguère. 1987. A Formal
Lexicon in the Meaning-Text Theory (or How to
Do Lexica with Words). Computational
Linguistics, 13(3-4):276-289.
Ehud Reiter. 1994. Has a consensus NL Generation 
architecture appeared, and is it psycho- 
linguistically plausible? In Proceedings of the 
Seventh International Workshop on Natural 
Language Generation, pages 163-170. 
Ehud Reiter. 1995. NLG vs. Templates. In 
Proceedings of the 1995 European NL Generation 
Workshop, Holland. 
