Generation of Extended Bilingual Statistical Reports 
L. Iordanskaja, M. Kim, R. Kittredge, B. Lavoie and A. Polgu~re 
CoGenTex Inc. 
810 rue Champagneur, suite 210 
Montreal H2V 4S3, Quebec, Canada 
1 Introduction 
During tim past few years we liave been concerned 
with developing models for the automatic planning 
and realization of report texts wittlin technical sub- 
languages of English and French. Since 1987 we have 
been implementing Meaning-Text language models 
(MTMs) \[6, 7\] for the task of realizing sentences 
from semantic specifications that are output by a 
text planner. A relatively complete MTM implemen- 
tation for English was tested in the domain of oper- 
ating system audit summaries in tile Gossip project 
of 1987-89 \[3\]. At COLING-gO a report was given on 
the fully operational FoG system for generating ma- 
rine forecasts in both English and French at weather 
centres in Eastern Canada \[1\]. The work reported 
on here concerns the experimental generation of ex- 
tended bilingual summaries of Canadian statistical 
data. Our first focus has been on labour force sur- 
veys (LFS), where an extensive corpus of published 
reports in each language is available for empirical 
study. Tire current LFS system has built on the ex- 
perience of the two preceding systems, but goes be- 
yond either of them 1. Iu contrast to FoG, but similar 
to Gossip, LFS uses a semantic net representation of 
sentences as input to the realization process. Like 
Gossip, LFS also makes use of theme/theme con- 
straints to help optimize lexical and syntactic choices 
during sentence realizatiou. But in contrast to Gos- 
sip, which produced only English texts, LFS is bilin- 
gual, making use of the conceptual level of repre- 
sentation produced by the planner as an interlingua 
from which to derive the linguistic semantic repre- 
sentations for texts in the two languages indepen- 
dently. Hence the LFS interlingua is much "deeper" 
than FoG's deep-syntactic interlingua. This allows 
us to iutroduce certain semantic differences between 
English and I,¥ench sentences that we observe in nat- 
ural "translation twin" texts. 
1The LFS Bystem is being developed by CoGenTex Inc. 
under contract 36902-O-0749/Ol-XAF wich Communications 
Canada, Canadian Workplace Automation Research Centre. 
Tim first four authors have current academic affiliations with 
the Universlt6 de Montr~al. Polgu6re is now at the National 
University of Singapore. 
LFS is based on a much more detailed text planning 
process than was attempted earlier, and results in 
texts of much greater length and complexity. For 
example, sentence order within certain parts of sta- 
tistical texts depends on data salience, therefore re- 
quiring locally dynamic text planning. Text plan- 
ning also includes tests that allow for appropriate 
use of certain quantifier expressions (e.g., all, mosO, 
evaluative words such as also and only, and intra- 
sentential pronominalization. 
LFS also incorporates some substantial extensions 
in our use of the Meaning-Text framework. First, 
it makes more use of lexical functions (ef.\[8\]), the 
mechanism in MTMs that allows computation of ap- 
propriate collocations and semautieally related lex- 
emes needed ill paraphrasing and ill conflict resolu- 
tion during generation. Second, the grammar is more 
extensive, covering important types of conjunction 
and ellipsis. 
Generation in the domain of employment statistics 
is not new. Roesner's Semtex system \[10\] produced 
German (and later, English) summaries of such data 
that are remarkably similar in style as well as coutent 
to our own. The difference lies in our use of a pow- 
erflfl linguistic model that promises to simplify the 
problem of scaling up the generator to more complex 
and varied texts, or extend them to other varieties of 
text. Furthermore, the LFS project is using feedback 
from domain experts to refine tile rules nsed in both 
text planning and realization. 
2 Text Planning for Statistical 
Reports 
Our approach to planning statistical reports is sim- 
ilar to that used on tile Gossip project \[2\]. A "con- 
ceptual frame" tree schema is mstantiated with input 
data to provide an initial characterization of the in- 
tended content of the reports. Input data for the 
employment domain is in the form of relational ta- 
AcrEs DE COLING-92. NANTES. 23-28 not~q" 1992 1 0 I 9 I)ROC. OF COLING-92, NANTES. AUG. 23-28. 1992 
bles which provide numerical values for employment, 
unemployment, participation rate, etc., broken down 
by age, sex, region and industry, for the current re- 
porting period (e.g., month), as well as for previ- 
ous comparison periods (e.g., preceding month, one 
year ago, etc.). The instantiated tree gives a prelim- 
inary hierarchical structure for the future text, and 
provides a framework for further processing on the 
content to determine the details of text structure. 
For example, comparisons of employment changes in 
various labour force groups will lead to ordering mes- 
sages (future clauses) so as to highlight the most sig- 
nificant changes. The tree structure is traversed and 
modified as a part of this process. The conceptual 
text tree also carries annotations of theme and theme 
specifications which will constrain the set of possible 
texts which can be derived from it. 
An important part of text planning is the identifica- 
tion of messages which can be grouped together into 
structures which will give rise to single sentences. 
This includes conjoining two messages with identical 
theme to give marked structures that will produce 
linguistic conjunction and subject pronominalization 
later as in: 
(1) Employment increased by 20,000 among women 
while it decreased slightly among men. 
Conceptual conjunction includes checking and mark- 
ing similarities in thematic elements that may later 
lead to ellipsis, as in (2), with the possible introduc- 
tion of lexical functions such as in (3) 2. 
(2) Employment increased by 5000 in Manitoba, by 
10,800 in Alberta and by 15,000 in Ontario. 
(3) For the week ended November 18, 1989, the sea- 
sonally adjusted level of employment was estimated 
at 12,518,000, up 32,000from October. 
It has been noticed \[5\] that certain types of report 
texts have complex internal dependencies that put 
special demands on the planning mechanism used. 
In particular, top-down expansion of rhetorical oper- 
ators is inadequate for generating statistical reports 
in our domain. Our planning approach, by making 
use of the power of arbitrary tests and operations 
on tree schemata, allows us to adequately represent 
the cross-serial dependencies found among the pieces 
of content of these reports. However, a more gen- 
eral, but appropriately constrained language for re- 
port planning seems to be a desirable goal for future 
research. 
ZThe lexeme up is the value of tile lexical function Adv 1 
applied to the verb ir.crea#e. 
3 Interlingual Representation 
Published bilingual reports in our domain occasion- 
ally exhibit deep differences between corresponding 
English and French sentences, as in (4a) and (4b): 
(4a) Employment remained virtually unchanged. 
(4b) L'emploi a pen vari4. 
\["Employment changed little."\] 
Not only are the surface syntactic structures incom- 
parable in this case, but they cannot be easily related 
on the level of linguistic semantics, because their se- 
mantic predicates are dissimilar. We have therefore 
chosen to use a conceptual interlingua (the output 
of the text planning process) in order to derive sepa- 
rate semantic net representations of the sentences in 
each language. Hence the sentences (4a) and (4b) are 
derived from non-isomorphic Meaning-Text seman- 
tic networks, which allow us to fully represent the 
two languages' different '~iewpoints" on the same 
conceptual material. 
4 Realizer Design 
Grammatical realization in the LFS system is the 
process by which the semantic nets produced by the 
planner for the incipient sentences are converted into 
surface sentences of each language. Our realizer for 
English is based largely on the general Meaning-Text 
sentence realizer used in Gossip, with some addi- 
tions to cover structures found in statistical texts. 
A comparable realizer for French has been built for 
LFS. As in the case of Gossip, we use four main lin- 
guistic levels of representation between conceptual 
structures and texts: semantic nets (SemR), deep 
syntactic dependency trees (DSyntR), surface syn- 
tactic dependency trees (SSyntR) and morphologi- 
cal strings (MorphR). For each language, the first 
linguistic operation requires searching the semantic 
net for a given sentence to determine the commu- 
nicatively dominant node. This search is constrained 
by the theme/theme specifications which the SemR 
inherits from the conceptual structure (see \[9, 3\]). 
The second operation consists of "replacing" sin- 
gle or complex (configurations of) meauing-bearing 
nodes in the semantic network by actual lexemes 
of the language, and replacing semantic features on 
those nodes by grammatical features which will be 
attached to the nodes of the future deep-syntactic 
tree. These operations lead to a reduced semam 
tic graph (RSemR), which is intermediate between 
SemR and DSyntR. In fact, the SemR is not modi~ 
ACIES DE COLING-92, NANTES, 23-28 ^ot~r 1992 I 0 2 0 PROC. or COLING-92, Nhtcrns, AUG. 23-28, 1992 
fled, but rather it is used as a blueprint for building 
the RSemR, just as each subsequent representation 
is built by mapping rules from its ancestor represen- 
tation. 
The production of the DSyntR tree out of the 
RSemR, called "arborization", entails the mapping 
of predicate-argument relations to deep syntactic re- 
lations using information about potential dominant 
nodes of tim RSemR and grammatical features. 
The SSyntR is built by mapping deep-syntactic 
nodes and relations into their surface-syntactic coun- 
terparts. Single DSyntR nodes corresponding to 
phrasemes (i.e., locutions) give rise to syntactic sub- 
trees in SSyntR, and some grammatical lexemes are 
introduced, including auxiliary verbs, articles and 
syntactically motivated prepositions. 
The next mapping, to MorpbR structure, determines 
word order and all syntactically motivated morpho- 
logical features. A final operation produces actual 
text by computing the final (graphical) wordforms 
based on the morphological features attached to lex- 
emes in MorphR. 
5 Lexical Functions 
The sublanguage of statistical summary reports 
shows a certain amount of variation in the syntactic 
structure and \[exieal choices used to express a given 
content. We have used lexical functions to imple- 
ment this paraphrastic variation in a systematic way 
within our Meaning-Text models, much as was done 
in Gossip \[3\]. Briefly stated, lexical functions (LFs) 
can be considered abstract meanings which have dif- 
ferent lexical values depending on their argument 
lexemes. LFs provide a way of delaying some id- 
iosyncratic lexical realizations until after major syn- 
tactic choices have beeu made. They also allow us 
to formulate very general paraphrase rules. 
Our statistical reports, with their emphasis on nu- 
merical changes and comparisons of change, pro- 
vide an excellent opportunity to use lexical flmc- 
tions, such as Magn ("intensifying" word), S O (ac- 
tion nominal) and Oper I (agent-oriented support 
verb). For example, sentence (6) can be calculated 
to be a paraphrase of (5): 
(5) Employment decreased sharply in October. 
(6) Employment showed a sharp decrease in October. 
A general paraphrase rule states that a verbal lexeme 
(here, decrease), can be paraphrased by a syntactic 
construction where the new verb (i.e., show) is the 
value of Oper I operating on the nominalizatiou (i.e., 
S0) of the old verb. This computation, carried out 
by successively looking up LF values in argument 
word lexieal entries, derives the new verb show by 
functional composition. In the derived paraphrase 
sentence (6) this new verb takes as its syntactic ob- 
ject the nominalization of the old verb. In a separate 
operation which if factored out of the paraphrase op- 
eration, the lexical value of the intensifier sharp is 
computed via the lexical function Magn operating 
on lexeme decrease. It is simpler to delay its evalu- 
ation until after the change in grmnmatieal category 
of the head word. The paraphrase rule which re- 
lates the two verbal constructions of (5) and (6) can 
be stated using only lexieal functions, lexical class 
(part-of-speech) symbols and grammatical relations, 
without reference to specific lexical |tents. 
In addition to the above "well-known" lexical func- 
tions, our domain also makes use of Syn (synonym), 
AntiMagn (diminutive modifier), Locin (locative 
preposition), Adv 1 (locative adverb) and several 
more "exotic" ones. Most LFs used in our system 
are introduced during the mapping from RSemR to 
DSyntR. Exceptions include Syn, which is used dur- 
ing reduction of SemR to RSemR. When there are 
two semantic nodes with identical lexemic meanings, 
the realizer uses Syn to lexicalize one differently 
from the other by finding a synonym. 
6 Implementation and Future 
Directions 
The LFS system is implemented in Quintus Prolog 
on Sun 4 workstations. Adaptations to several spe- 
cific varieties of employment reports have been car- 
ried out, including the multi-paragraph general sum- 
mary reports for English and French, given below 
in §6.1 and §6.2 respectively. The approach out- 
lined here is now being extended to produce other 
varieties of statistical reports, dealing with different 
kinds of data (e.g., retail trade summaries). The user 
interface, which currently allows various choices from 
among a set of options, is being made more flexible 
and dynamic by tying tile choices more directly to 
tile tree schemata that guide the planning process. 
Until now, LFS paraphrasing capability has been im- 
plemented only for eases where variation is needed to 
avoid repetition within a given sentence. The next 
step, now in preparation, is to enforce variation over 
longer stretches of text such as whole paragraphs. 
Ac'r~ DE COLING-92. NANTES, 23-28 AOt~T 1992 1 0 2 1 PRoc. OF COLING-92. NANTES, Auo. 23-28, 1992 
6.1 Sample English output 6.2 Corresponding French output 
COMMENTARY 
Overview 
Estimates for November 1989 from Statistics Canada's Labnur 
Force Survey show th~,t the seasonally adjusted level of employ- 
ment r~ae by 32000 and that the level of unemployment inclosed 
by 30000. The unemployment rate increased by 0.2 to 7.6. 
Employment 
For the week ended November 3, 1989, the seasonally adjusted 
level of employment ~ estimated at 12568000, up 32000 from 
October. The increase was concentrated among women aged 25 
and over. The employment / population ratio remained virtually 
unchanged ( 62.1 ). 
Employment among women aged 25 and over rose by 44000 and 
their employment / population ratio increased by 0.5 to 52.3 
Employment among men aged 25 and over fell by 12000 and their 
employment / population ratio decreased by 0.3 to 72.5. 
Part-time employment increased by 25000. The increase was 
evenly distributed between men and women. 
Full-time employment remained virtually unchanged. An increase 
among women was offset by a decrease among men, 
The level of employment fell by 10000 in agriculture, by 12000 in 
transportation, communication and other utilities and by 12000 
in primary industries other than agriculture. The level of employ- 
ment rose by 68000 in services and by 20000 in trade. The level 
of employment remained virtually unchanged in the other sectors. 
The level of employment rose by 11000 in Quebec, by 8000 in 
Alberta, by 6000 in British Columbia and by 0000 in Ontario. The 
level of employment remained virtually unchanged in the other 
lectors. 
Unemployment and Participation Rate 
The a~aaonally adjusted level of unemployment was estimated at 
1032000 for November 1989, up 30000 from October. The un- 
employment rate rose by 0.2 to 7.0 and the participation rate 
increased by 0.3 to 07.2. 
The increase in unemployment was concentrated among men aged 
25 and over. 
Unemployment among men aged 25 and over increased by 24000 
while unemployment remained virtually unchanged among women 
aged 25 and over. 
The unemployment rate among men aged 15 to 24 increaLed by 
0.7 to 12,9. 
The participation rate among men aged 15 to 24 increased by ft.5 
to 73.4 and the participation rate remained virtually unchanged 
among women aged 15 to 24. 
The seasonally adjusted level of unemployment remained virtu- 
ally unchanged in moat provinces. The level of unemployment 
increm~d only in Ontario ( + 24000 ) 
COMMENTAIRE 
Ape~u 
Lea estimations ti~g*es de Penqu6te de Statistique Canada sur la 
population active pour novembre 1989 indiquent que le niveau 
d~saisonnaliad de I'emploi a augment~ de 32000 et que le niveau 
du ch6mage a augmentd de 30000. Le taux de ch6mage a aagmentd 
de 0.2 it 7.6. 
Emploi 
Pour la semaine ~ terminant le 3 novembre 1989, le niveau 
d6aaisonnalia~ d~emploi eat estimd it 12568000, en hauaae de 32000 
par rapport /L octobre, La hausae a principalement touchd lea 
famines de 25 ans et plus. Le rapport emploi / population n'a 
pratiquement pas vari6 ( 02.1 ). 
L'emploi chez lea famines de 25 ass et plus a augment~ de 44000 
et le rapport emploi / population chez celles de 25 ass et plus a 
augment~ de 0.5 /L 52.3. 
L'emploi chez lea hommes de 25 ass et plus a diminud de 12000 
et le rapport emploi / population chez ceux de 25 ass et plus a 
baissd de 0.3 ~ 7Z5. 
L'emploi a temps partiel a augmentd de 25000, La hauase a'~tait 
~galement r6partie entre lea hommes et lea femmea. 
L'emploi it temps plein n'a pratiquement pan vari6. Une hausse 
chez lea famines a 6td compensde par une balsas chez lea hommes. 
De niveau d'emploi a diminu6 de 10000 dana le secteur de 
l'agriculture, de 12000 dana celui des transports, communications 
et autres Jervices publics et de 12000 dana lea industries primai~a 
autres que l'agrieulture. Le niveau d~emploi a augment~ de 68000 
dana lea industries de services et de 20000 dane le aecteur du com- 
merce. Le niveau d'emploi n'a pratiquement pas vari6 dons lea 
autres secteura. 
Le niveau d'emploi a augment6 de 11000 au Quebec, de 8000 en 
Alberta, de 6000 en Colombie-Britannique et de 5000 en Ontario. 
Le niveau d'emploi n~a pratiquement pan vari6 dana lea autres 
provinces. 
Ch6mage et taux d'activit6 
Le niveau ddsaisonnalia~ de chTmage eat entire6 it 1032000 pour 
novembre 1989, en hauaae de 30000 par rapport it octobre. Le 
taux de ch6mage a augmentd de 0.2 it 7.6 et le taux d*activit6 a 
augment~ de 0.3 A 67.2. 
La hausse du ch6mage a principalement touch6 lea hommes de 20 
ann et plus. 
Le chSmage cheg lea hommes de 20 ann et plus a augment6 de 
24000 slots que le chTmage n'a pratiquement pas varid chez lea 
femmes de 25 ann et plus. 
Le taux de chTmage chez lea hommes de 15 ~ 24 ass a augmentd 
de 0.7 it 12.9. 
Le taux d'activit6 chez les hommes de 15 ik 24 ass a augmentd de 
0.5 it 73.4 et le taux d'activitd n'a pratiquement pan vari~ chez lea 
famines de 15 /t 24 mls. 
Le niveau ddsailonnalia~ de ch6mage n'a pratiquement pas vari6 
dana la plupart des provinces. Le niveau de chSmage a augment~ 
aeulement en Ontario ( q- 24000 ). 
ACl~ DE COLING-92, NANTES, 23-28 AOfff 1992 I 0 2 2 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
References 
\[1\] Bourbeau, L., D. Carcagno, E. Goldberg, R. 
Kittredge and A. Polgu~re (1990) "Bilingual 
Generation of Weather Forecasts in an Opera- 
tions Environment", Proceedings of the 13th In. 
ternational Conference on Computational Lin- 
guistics, vol.3, pp. 318-320. 
\[2\] Careagno D. and L. Iordanskaja (1989) "Con- 
tent Determination and Text Structuring in 
GOSSIP", Extended Abstracts of the Second Eu- 
ropean Workshop on Natural Language Genera- 
lion, Edinburgh. 
\[3\] Iordanskaja, L., R. Kittredge and A. Polgubre 
(1991) "Lexical Selection and Paraphrase in a 
Meaning-Text Generation Model" in Natural 
Language Generation in Artificial Intelligence 
and Computational Linguistics (C. Paris, W. 
Swartout and W. Mann, eds.), Kluwer Aca- 
demic Publishers, pp.293-312. 
\[4\] Kittredge R.., L.Iordanskaja and A. Polgubre 
(1988) "Multi-Lingual Text Generation and tile 
Meaning-Text Theory", Proc. of the 2rid Inter- 
national Conf. on Theoretical and Methodolog- 
ical Issues in Machine Translation of Natural 
Languages, Carnegie-Mellon University. 
\[5\] Kittredge 1%, T. Korelsky and O. Rambow 
(1991) "On the Need for Domain Communi- 
cation Knowledge", Computational lnte!hgencc, 
7(4): 305-314. 
\[6\] Mel'~.uk I. (1981) "Meaning-Text Models", An- 
nual Review of Anthropology, vol.10, pp.27-62. 
\[7\] Mel'/!uk I. and N. Pertsov (1987) Surface Syntax 
of English, Benjaanins, Amsterdam. 
\[8\] Mel'~uk I. and A. Polgu6re (1987) "A Formal 
Lexicon in the Meaning Text Theory (or, how 
to do lexica with words)", Computational Lin- 
guistics, 13(3-4): 261-275. 
\[9\] Polgu~re, A. (1990) Stracturalion el mise en jeu 
procgdurale d'un modkle linguistique ddclaratif 
dans un cadre de gdndration de texte, Ph.D. the- 
sis, Universit6 de Montr6al. 
\[10\] Roesner, D. (1987) "The Automated News 
Agency: SEMTEX - A Text Generator for Ger- 
man" Natural Language Generation: New Re- 
sults in Artificial Intelligence, Psychology and 
Linguistics (G. Kempen, ed.), Martinus Nijhoff 
Publishers, pp.133-148. 
ACRES DE COLING-92, NANTES, 23-28 AO~' 1992 1 0 2 3 PROC. OF COLING-92, NANTES, AUG. 23-28, 1992 
