A complete integrated NLG system using AI and NLU tools
Laurence Danlos
Lattice
U. Paris 7, Case 7003
2, place Jussieu
75251 Paris Cedex 05
France
danlos@linguist.jussieu.fr
Adil El Ghali
Lattice
U. Paris 7, Case 7003
2, place Jussieu
75251 Paris Cedex 05
France
adil@linguist.jussieu.fr
Abstract
A standard architecture for an NLG system has
been defined in (Reiter and Dale, 2000). Their
work describes the modularization of an NLG
system and the tasks of each module. How-
ever, they do not indicate what kind of tools can
be used by each module. Nevertheless, we be-
lieve that certain tools widely used by the AI or
NLU community are appropriate for NLG tasks.
This paper presents a complete integrated NLG
system which uses a Description logic for the
content determination module, Segmented Dis-
course Representation Theory for the document
structuring module and a lexicalized formalism
for the tactical component. The NLG system,
which takes into account a user model, is illus-
trated with a generator which produces texts
explaining the steps taken by a proof assistant.
1 Introduction
The standard architecture of an NLG system
proposed in (Reiter and Dale, 2000) is repre-
sented schematically in Figure 1.1. The tool
used by a module and the data structure of
its output are not precisely defined. According
to Reiter and Dale, they vary from one author
to the other. However, we believe that certain
tools widely used by the AI or NLU community
are appropriate for NLG tasks. Therefore, we
reformulate in more specific terms Figure 1.1 as
Figure 1.2.
The paper describes the modules of Fig-
ure 1.2: Section 3 justifies the use of a descrip-
tionlogicforthecontentdeterminationtaskand
its ouput, a “message”; Section 4 specifies the
use of sdrt for the document structuring task
Documentstructuringà la SDRT
Communicative goals
ContentDetermination
DocumentStructuring
description logicwith aDetermination 
Content
Communicative goals
LexicalizedMicro−planner
 SurfaceLexicalized
  Semantic    Dependency tree
realizer 
Micro−planner
representation    Semantic
Document Planner
Figure 1.1Standard architecture of an NLG system
Tactical component
SDRS
Figure 1.2Architecture of an NLG system
Document plan
message logical form
with data structures
TextText
 Surfacerealizer 
and its output, a “document plan”; Section 5
exposes briefly the use of a lexicalized formal-
ism in the tactical component. Each section is
illustrated with GePhoX, a generator that pro-
duces texts which explain the steps taken by
PhoX , a proof assistant (Raffalli and Roziere,
2002). GePhoX is presented in Section 2.
As this paper intends to present a complete
NLG system, there is no room for explaining
each module in detail. We refer the reader to (El
Ghali, 2001) for the content determination mod-
ule, to (Danlos et al., 2001) for the document
structuring module and to Danlos (1998; 2000)
for the lexicalized tactical component. These
goal ∀p,d : N(d negationslash= N0 → ∃q,r : N(r < d∧p = q ∗d + r))
1. intros.
2. elim −4 H well founded.N.
3. intros.
4. elim −1 d −3 a lesseq.case1.N.
5. next.
6. intros ∃ ∧.
7. next −3.
8. instance ?1 N0.
9. instance ?2 a.
10. intro.
11. trivial.
12. local aprime = a - d.
13. elim −1 aprime H3.
14. trivial.
15. elim lesseq.S rsub.N.
16. elim −1 [case] H0.
17. trivial =H1 H5.
18. trivial.
19. lefts H5 ∧ ∃.
20. intros ∧ ∃.
21. next −3.
22. instance ?4 r.
23. instance ?3 S q.
24. rewrite mul.lS.N −r add.associative.N −r H8.
25. intro.
26. trivial.
27. save euclide exists.
Table 1: Proof script for Euclidian division
modules have been built more or less indepen-
dently from each other, but with the same un-
derlying idea: adaptation of NLU/AI theories
and tools to NLG. They are now integrated in a
complete model, which is presented here and il-
lustrated with GePhoX, a generator whose im-
plementation is in progress.
2 GePhoX
PhoX is an extensible proof assistant based on
higherorderlogic, whichhasbeendeveloppedto
help mathematicians building proofs and teach-
ing mathematics. Like other proof assistants,
PhoX works interactively. The user (a mathe-
matician) first gives the theorem to be proven
(a goal). PhoX returns a list of subgoals which
should be easier to prove than the initial goal.
The user enters a command to guide PhoX
in choosing or achieving a subgoal. The proof
is thus computed top-down from goals to ev-
idences. The user’s commands form a Proof
script. PhoX’s output is a list of successive
goals equivalent to a Proof tree.
Both the Proof script and PhoX’s output are
difficult to read (even for a mathematician), as
the reader can see for himself in Table 1 and
Table 2. Hence, the need of an NLG system in
order to obtain an easy-to-read version of the
proof.
GePhoX is given as input both the Proof
script and the Proof tree. It is one of the
Here is the goal:
goal 1/1
|- /\p,d:N (d != N0 ->
\/q,r:N (r < d & p = q * d + r))
End of goals.
%PhoX% intros.
1 goal created.
New goal is:
goal 1/1
H := N p
H0 := N d
H1 := d != N0
|- \/q,r:N (r < d & p = q * d + r)
End of goals.
. . .
Table 2: Proof tree for Euclidian division
main original proposals in our generator (simi-
lar generators, such as PROVERB (Huang and
Fiedler, 1997), take as input only the Proof
tree). It makes it possible for GePhoX to start
from an incomplete proof and produce texts
during the interactive session. These texts help
the mathematician user: before entering a new
command in the Proof script, he can read a text
reminding himself what he has been doing so
far. The Proof script is also useful for identify-
ing the reasoning strategies that have been used
(reasoning by contradiction or induction), while
it is very hard (if not impossible) to retrieve this
information from a Proof tree with its numerous
deduction steps.
Another originality of GePhoX is that it
takes into account the knowledge of the user
who can be either a mathematician using PhoX
or a person more or less novice in mathematics.
For the same proof, GePhoX can generate sev-
eral texts according to the user model.
3 Using a descrition logic (DL)
The knowledge representation system Kl-One
(Branchman et al., 1979) was the first DL. It
was created to formalize semantic networks and
frames (Minsky, 1974). It introduces the no-
tions of TBoxes and ABoxes respectively for
terminological and assertional knowledge. Kl-
One has been widely used in the NLG commu-
nity to formalize the domain model. On the
other hand, this is not the case for the more
recent DLs. Nevertheless, they present at least
two advantages compared to Kl-One : 1) for a
large variety of DLs, sound and complete algo-
rithms have been developped for main inference
problems such as subsumption, concepts satis-
fiability and consistency (Donini et al., 1996);
2) the relations between instances and classes
are well defined for all the constructors, and
their mathematical and computational proper-
ties have been studied in detail (Horrocks et al.,
2000). So we believe that DLs are appropriate
for the content determination task as shown in
3.3. Let us first briefly present DLs.
3.1 A brief introduction to DL
The three fundamental notions of DLs are in-
dividuals (representing objects in the domain),
concepts (describing sets of individuals), and
roles (representing binary relations between
individuals or concepts). A DL is characterized
by a set of constructors that allow us to build
complex concepts/roles from atomic ones.
The set of constructors which seem useful for
GePhoX and their syntax are shown in Table
3; examples of concepts and roles with their
semantic are shown underneath Table 3.
Constructor (abbreviation) Syntax
atomic concept A
top latticetop
bottom ⊥
conjonction C ∧ D
disjonction (U) C ∨ D
complement (C) rightangleneC
univ. quant. ∀R.C
exist. quant. (E) ∃R.C
numeral restrictions (N) >n R.C
≤n R.C
collection of individuals (O) {a1,...,an}
atomic role P
roles conjonction (R) Q∧R
inverse role R−1
role composition Q ◦ R
Table 3: Syntax of standard constructors
Examples of concepts with their semantic
Theorem, Variable, {H1}, ∃CHOOSE.User
{ x / Theorem(x) } : Theorem concept
{ x / Variable(x) } : Variable concept
{ H1} : concept constructed by the O construc-
tor on individual H1
{ x / ∃ u : User, CHOOSE(u,x) }
Examples of roles with their semantic
IMPLIES, PROVES
{ x,y / IMPLIES(x,y) } : x implies y
{ x,y / PROVES(x,y) } : x proves y
The choice of constructors is domain depen-
dent. Constructors other than those used in
GePhoX (e.g. temporal extension) can be used
for other domains (e.g. domains with non triv-
ial temporal information), without altering the
mathematical and computational properties.
3.2 Domain and user models in DL
The Domain model is the set of concepts and
roles necessary to express the input of the gen-
erator. More formally, let TD be a TBox, such
that each input I can be described by means of
an ABox AD corresponding to TD. The knowl-
edge base ΣD = (TD,AD) is called knowledge
base for the domain and noted dkb. The User
model is a knowledge base ΣU = (TU,AU) such
that TU and AU are respectivly subsets of TD
and AD. ΣU is noted ukb. Table 4 shows a
part of the dkb for GePhoX.
Goal MathObj
Subgoal Axiom
Hypothese Theorem
Rules well_founded
Intro lesseq.case1
Elim add.associative
Rewrite Operator
Trivial LogicalOper
Left Exist
ReasonningStrategy Forall
ByInduction LAnd
ByContradiction ArithOper
. . . Add
. . . Multi
Table 4: GePhoX Domain model
3.3 Content determination tasks
The content determination module performs
four tasks, as shown in Figure 2.
Translation: The input of the generator (as-
sertional information) is first translated into
concepts of the TBox. For that purpose, a
correspondence is established between the ele-
ments of the input and concepts and roles in
the dkb. The O constructor is used to keep
information about the individuals occurring in
the input. For example, command 2 in Table 1
with individual H is translated into the concept
C0 .= ∃EliminationWell founded.Hypothese
{H}, and commands 8 to 11 are translated into
C1 .= ∃ByInduction {p}.
Selection: The selection task consists of
choosing the relevant concepts among those
constructed in the translation phase with regard
to the ukb. For example, if C0 is an unknown
concept for the user, a conceptC must be looked
up in the ukb such as C approximates1 C0.
TBoxConcepts
Concepts
Translation
Selection
Verification
Instanciation
Terminological Assertional
Logical FormABox
Input
Figure 2: Content Determination Tasks
Verification: At this point, the coherence of
all the concepts of the selection is verified. For
example, if the user tries to reason by induction
on a real number, GePhoX tells him that it is
not possible.
Instanciation: With the information about
individuals, which have been kept in the transla-
tion phase (with the use of the O constructor),
the instanciation task is straightforward. Ta-
ble 5 shows some instanciated concepts for the
Euclidian division.
As is well known, designing knowledge bases
(dkb and ukb) and translating the input of the
generator into concepts and roles of the DL is
a difficult task which has to be fulfilled for ev-
ery generator. However, with a DL, the selec-
tion, verification and instanciation tasks are do-
main independent: algorithms and their imple-
mentation are reusable. Moreover, when using
a DL for the content determination task, the
“message” is a first order logic formula (a stan-
dard representation shared by a large commu-
1Given two TBoxes T and T prime with T ⊂ T prime and a
concept C ∈ T − T prime, Cprime ∈ T prime approximates C if C
minimally subsumes Cprime or Cprime minimally subsumes C.
. ∃p1 ∈ Entier
named(p1,p)
choose(user,p1)
. ∃d1 ∈ EntierNonNul
named(d1,d)
choose(user,d1)
. ∃f1 ∈ Formula
constant(f1,∃q,r: N (r < d∧p = q.d + r))
. prove(user,f1)
induction(f1,p1)
...
Table 5: DL-Message for Euclidian division
nity) which takes into account the user knowl-
edge and whose coherence has been checked.
4 Using SDRT for document
structuring
In (Danlos et al., 2001) we advocate using sdrt
(Segmented Discourse Representation Theory
(Asher, 1993; Asher and Lascarides, 1998)) as
a discourse framework, since sdrt and drt
(Discourse Representation Theory, (Kamp and
Reyle, 1993)) are the most popular frameworks
for formal and computational semantics. Let us
briefly present sdrt.
4.1 A brief introduction to SDRT
sdrt, designed first for text understanding, was
introduced as an extension ofdrtin order to ac-
count for specific properties of discourse struc-
ture. sdrt can be viewed as a super-layer on
drt whose expressiveness is enhanced by the
use of discourse relations. Thus the drt struc-
tures (Discourse Representation Structures or
drs) are handled as basic discourse units in
sdrt.
drss are ”boxed” first order logic formulae.
Formally, a drs is a couple of sets 〈U,Con〉. U
(the universe) is the set of discourse referents.
Con contains the truth conditions representing
the meaning of the discourse.
A sdrs is a pair 〈U,Con〉, see Figure 3. U
is a set of labels of drs or sdrs which can
be viewed as “speech act discourse referents”
(Asher and Lascarides, 1998). Con is a set of
conditions on labels of the form:
• pi : K, where pi is a label from U and K is
a (s)drs
• R(pii,pij), where pii and pij are labels and
R a discourse relation. Discourse relations
are inferred non-monotonically by means of
a defeasible glue logic exploiting lexical and
world knowledge.
SDRS
labels
a0a1a0a3a2
a0a5a4
a6a8a7
a6a10a9 Max
falla11 a7a13a12a14a6a16a15
a0 a2 a4
a17a19a18a20a7a2
a17a21a9 John
pusha11 a7 a2 a12a14a17a22a12a14a18a23a15
a18a10a9a24a6
Explanationa11 a0 a12 a0 a2a15
discoursereferents
DRS(basicdiscourseconstituents)
conditions(content/meaning)
discourserelation
Figure 3: sdrs for Max fell. John pushed him.
4.2 Building a SDRS
Starting from a “message” encoded into a log-
ical form, the document structuring module
builds a sdrs. On a first step, the logical form
is translated into a drs. In the case of a purely
existential formula2, this amounts to putting all
the variables into the universe of the drs and
splitting the formula into elementary conjoined
conditions.
After this first step, the document structuring
task amounts to building a sdrs from a drs and
to go on recursively on each embedded (s)drs.
This process is schematized below.
universe
condition1
condition2
condition3
condition4
condition5
condition6
condition7
−→
pi1 pi2 pi3
pi1 :
universe1
condition1
condition7
pi2 :
universe2
condition2
condition5
pi3 : universe3condition
4
R1(pi1,pi2) → condition3
R2(pi2,pi3) → condition6
Let us first examine the principles governing
the splitting of the conditions. All the condi-
tions in the drs have to be expressed in the
sdrs. Two cases arise:
• either a condition in the drs appears as a
condition in one of the sub-drs; that is the
case for condition1 which appears in the
sub-drs labelled pi1;
2More complex formulas are not considered here.
• or it is expressed through a discourse re-
lation; that is the case for condition3 with
R1(pi1,pi2) → condition3, which means that
R1(pi1,pi2) must have condition3 among its
consequences: no other element is in charge
of expressing condition3.
To establish discourse relations, the sdrt
conditions are reversed. As an illustration, in
sdrt for text understanding, there is the Ax-
iom (1) for Narration. This axiom states that if
Narration holds between two sdrss pi1 and pi2,
then the main event (me) of pi1 happens before
the main event of pi2.
(1) a50(Narration(pi1,pi2) → me(pi1) < me(pi2))
For text generation, this axiom is reversed as
shown below (Roussarie, 2000, p. 154):
• If k1 and k2 are drss whose main eventu-
alities are not states,
• and if the main event of k1 occurs before
the main event of k2,
• then Narration(pi1,pi2) is valid when pi1 and
pi2 respectively label k1 and k2.
As another example, the condition
cause(e1,e2) can be expressed through Re-
sult(pi1,pi2) or Explanation(pi2,pi1) when pi1
and pi2 label the sub-drss that contain the
descriptions of e1 and e2 respectively.
Let us now examine how we determine the
universes of sub-drss, i.e. discourse refer-
ents, while observing two technical constraints,
namely:
• the arguments of any condition in a sub-
drs must appear in the universe of this
drs;
• the universes of all the sub-drss have to be
disjoint. This constraint is the counterpart
of the following constraint in understand-
ing: “partial drss introduce new discourse
referents” (Asher, 1993, p. 71).
These two constraints are not independent.
Assuming that the first constraint is respected,
the second one can be respected with the fol-
lowing mechanism: if a variable x already ap-
pears in a preceding sub-drs labelled pix, then
a new variable y is created in the universe of
the current sub-drs labelled piy and the con-
dition y = x is added to the conditions of piy.
The discourse referent y will be generated as an
anaphora if pix is available to piy (Asher, 1993),
otherwise it will be generated as a definite or
demonstrative NP.
A document structuring module la sdrt
based on the principles we have just exposed
can be used for any generator (whose “message”
is first order logic formula). The algorithm and
the rules establish discourse relations (obtained
by reversing the rules in NLU) are generic. See
below an example of sdrs in GePhoX, the
sdrs built from Table 5.
pi3pi4
pi3 :
pi1pi2
pi1 :
x u e1
user(u)
entier(x)
named(x,p)
choose(e1,u,x)
pi2 :
y v e2
entier-non-nul(y)
named(y, d)
choose(e2,v,y)
v = u
Parallel(pi1,pi2)
pi4 :
x1 f w e3
formula(f)
constant(f,∃q,r:N ...)
prove(e3,w,f)
induction(e3,x1)
w = u
x1 = x
Narration(pi3,pi4)
Table 6: sdrs for Euclidian division
5 Using a lexicalized grammar for
the tactical component
Lexicalized grammars are commonly used in
NLU and also in NLG (Stede, 1996). In Dan-
los (1998; 2000) we propose a lexicalized formal-
ism, called g-tag, for the tactical component of
an NLG system. It is modularized into a micro-
planner which produces a semantic dependency
tree and a surface realizer which produces the
text (see Figure 1.2).
The surface realizer is designed to use the syn-
tactic and lexical information of a lexicalized
tag grammar. The tag grammar is extended
to handle multi-sentential texts and not only
isolated sentences.
The microplanner is based on a lexicalized
conceptual-semantic interface. This interface is
made up of concepts; each concept is associated
with a lexical database. In our model, a con-
cept is either a term in the TBox or a discourse
relation. A lexical database for a given concept
records the lexemes lexicalizing it with their ar-
gument structure, and the mappings between
the conceptual and semantic arguments. The
process of generating a semantic dependency
tree from a sdrs 〈U,Con〉 is recursive:
- An element pii in U is generated as a clause
if pii labels a drs and recursively as a text
(possibly a complex sentence) if pii labels a
sdrs.
- A condition R(pii,pij) in Con is generated as a
text “Si. Cue Sj.” or as a complex sentence
“Si Cue Sj.”, where Si generates pii, Sj pij,
and Cue is a cue phrase which is encoded
in the lexical database associated with R
(Cue may be empty).
- A condition pi : K in Con where K is a drs
〈U,Con〉 is generated as a clause according
to the following constraints (which are the
counterparts of constraints in understand-
ing):
• A discourse referent in U is generated as an
NP or a tensed verb.
• Conditions guide lexical choices. Condi-
tions such as x = John correspond to
proper nouns. Equality conditions between
discourse referents (e.g. x = y) give rise
to (pronominal or nominal) anaphora. The
other conditions, e.g. prove(e1,x,y), are
lexicalized through the lexical data base as-
sociated with the concept (prove).
The surface realizer, based on a tag gram-
mar, is a set of lexical data bases. A data base
for a given lexical entry encodes the syntactic
structures realizing it with their syntactic argu-
ments. With such a tag grammar and a mor-
phological module, the text is computed in a de-
terministic way from the semantic dependency
tree.
6 Conclusion
Since NLG is a subfield of NLP, which is itself
a subfield of AI, it seems to be a good idea to
reuse tools developped by the NLP or AI com-
munity. We have shown in this paper how to
integrate DL, sdrt, and a lexicalized grammar
into an NLG system, while following the stan-
dard pipelined architecture3.
3Some authors (de Smedt et al., 1996) have made jus-
tified criticisms of the pipelined architecture. However,
we decided to keep it for the time being.
Theorem.
∀p,d:IN (d negationslash= 0 → ∃q,r:IN (r < d ∧ p = q.d + r))
Proof. Let us choose p,d two natural numbers
with d negationslash= 0. By induction on p we prove
∃q,r:IN (r < d ∧ p = q.d + r). Let take a a strictly
positive natural. We assume
∀b:IN (b < a → ∃q,r:IN (r < d ∧ b = q.d + r))
and we must prove ∃q,r:IN (r < d ∧ a = q.d + r).
We distinguish two cases: a < d and d ≤ a. In the
first case, we choose q = 0 and r = a. In the second
case, we take aprime = a − d. Using the induction hy-
pothesis on aprime, we find two naturals q,r such that
r < d and aprime = q.d + r. We take S q and r as quo-
tient and remaining for the division of a. We must
prove a = S q.d + r which is immediate.
Table 7: A Text of proof for Euclidian division
GePhoX illustrates the applicabilty of our
system. It is currently being implemented in
Java. The development of the document plan-
ner of GePhoX is work in progress. The goal
is to interface this module with CLEF (Meunier
and Reyes, 1999), an implementation of g-tag.
We intend to produce a text as shown in Ta-
ble 7.

References

N. Asher and A. Lascarides. 1998. The seman-
tics and pragmatics of presupposition. Jour-
nal of Semantics, 15(3):239–300.

N. Asher. 1993. Reference to Abstract Objects
in Discourse. Kluwer, Dordrecht.

R. Branchman, R. Bobrow, P. Cohen, J. Klovs-
tad, B. Webber, and W. Woods. 1979. Re-
search in natural language understanding.
Technical Report 4274, Bolt. Beranek and
Newman, Cambridge MA.

L. Danlos, B. Gaiffe, and L. Roussarie. 2001.
Document structring `a la sdrt. In ACL’2001
Toulouse Proceeding.

L. Danlos. 1998. G-TAG : un formalisme lexi-
calis´e pour la g´en´eration de textes inspir´e de
tag. Revue T.A.L., 39(2):7–33.

L. Danlos. 2000. G-TAG: A lexicalized formal-
ism for text generation inspired by Tree Ad-
joining Grammar. In A. Abeill´e and O. Ram-
bow, editors, Tree Adjoining Grammars: for-
malisms, linguistics analysis and processing,
pages 343–370. CSLI Publications, Stanford.

K. de Smedt, H. Horacek, and M. Zock.
1996. Architectures for natural language
generation: Problems and perspectives. In
G. Adorni and M. Zock, editors, Trends in
NLG. Proceedings of the 4th European Work-
shop, EWNLG’93, Pisa. Springer-Verlag.

F. Donini, M. Lenzerini, D. Nardi, and
A. Schaerf. 1996. Reasoning in descrip-
tion logics. In G. Brewka, editor, Principles
of Knowledge Representation and Reasoning,
Studies in Logic, Language and Information.
CLSI Publications.

A. El Ghali. 2001. Une logique de description
pour le module quoi-dire-? DEA de linguis-
tique informatique, Universit´e Paris 7.

I. Horrocks, U. Sattler, and S. Tobies. 2000.
Practical reasoning for very expressive de-
scription logics. Logic Journal of the IGPL,
8(3):239–264.

X. Huang and A. Fiedler. 1997. Proof verbal-
ization as an application of NLG. In IJCAI
(2), pages 965–972.

H. Kamp and U. Reyle. 1993. From Discourse
to Logic. Kluwer Academic Publishers, Dor-
drecht, The Netherlands.

F. Meunier and R. Reyes. 1999. La plate
forme de dveloppement de gnrateurs de textes
CLEF. In Actes du 2e Colloque Franco-
phone sur la Gnation Automatique de Textes,
GAT’99, Grenoble.

M. Minsky. 1974. A framework for representing
knowledge. MIT-AI Laboratory Memo 306.

C. Raffalli and P. Roziere, 2002. The PhoX
Proof checker documentation. LAMA, Uni-
versit´e de Savoie / Universit´e Paris 7.

E. Reiter and R. Dale. 2000. Building Natural
Language Generation Systems. Cambridge
University Press.

L. Roussarie. 2000. Un mod`ele th´eorique
d’inf´erences de structures s´emantiques et dis-
cursives dans le cadre de la g´en´eration au-
tomatique de textes. Th`ese de doctorat en lin-
guistique, Universit´e Paris 7.

M. Stede. 1996. Lexical paraphrases in multi-
lingual sentences generation. Machine Trans-
lation, 11.
