Document Structure and Multilingual Authoring 
Caroline Brun Marc Dymetman Veronika Lux 
Xerox Research Centre Europe 
6 chemin de Maupertuis 
38240 Meylan, France 
{brun, dymetman, lux}©xrce, xerox, com 
Abstract 
The use of XML-based authoring tools is swiftly be- 
coming a standard in the world of technical docu- 
mentation. An XML document is a mixture of struc- 
ture (the tags) and surface (text between the tags). 
The structure reflects the choices made by the au- 
thor during the top-down stepwise refinement of the 
document under control of a DTD grammar. These 
choices are typically choices of meaning which are 
independent of the  in which the document 
is rendered, and can be seen as a kind of interlin- 
gua for the class of documents which is modeled by 
the DTD. Based on this remark, we advocate a rad- 
icalization of XML authoring, where the semantic 
content of the document is accounted for exclusively 
in terms of choice structures, and where appropri- 
ate rendering/realization mechanisms are responsi- 
ble for producing the surface, possibly in several lan- 
guages simultaneously. In this view, XML authoring 
has strong connections to natural  genera- 
tion and text authoring. We describe the IG (In- 
teraction Grammar) formalism, an extension of DT- 
D's which permits powerful linguistic manipulations, 
and show its application to the production of multi- 
lingual versions of a certain class of pharmaceutical 
documents. 
1 Introduction 
The world of technical documentation is forcefully 
moving towards the use of authoring tools based 
on the XML markup  (W3C, 1998; Pardi, 
1999). This  is based on grammatical spec- 
ifications, called DTD's, which are roughly similar 
to context-free grammars 1 with an arbitrary num- 
ber of non-terminals and exactly one predefined ter- 
minal called pcdata. The pcdata terminal has a 
special status: it can dominate any character st, ring 
(subject to certain restrictions on the characters al- 
lowed). Authoring is seen as a. top-down interactive 
process of step-wise refinement of the root nonter- 
minal (corresponding to the whole document) where 
the author iteratively selects a rule for expanding a 
lBut see (Wood, 1995: Prescod, 1998) for discussions of 
the differences. 
nonterminal already present in the tree and where 
in addition s/he can choose an arbitrary sequence 
of characters (roughly) for expanding tile pcdata 
node. The resulting document is a mixture of tree- 
like structure (the context-free derivation tree cor- 
responding to the author's selections), represented 
through tags, and of surface, represented as free-text 
(PCDATA) between the tags. 
We see however a tension between the structure 
and surface aspects of an XML document: 
® While structural choices are under system con- 
trol (they have to be compatible with the DTD), 
surface choices are not. 2 
• Surface strings are treated as unanalysable 
chunks for the styling mechanisms that render 
the XML document to the reader. They can 
be displayed in a given font or moved around, 
but they lack the internal structure that would 
permit to "re-purpose" them for different ren- 
dering situations, such as displaying on mobile 
telephone screens, wording differently for a spe- 
cific audience, or producing prosodically ade- 
quate phonetic output. This situation stands 
in contrast with the underlying philosophy of 
XML, which emphasizes the separation between 
content specification and the multiple situations 
in which this content can be exploited. 
. Structural decisions tend t,o be associated wit, h 
choices of meaning which are independent of the 
 in which the document is rendered. 
Thus for instance the DTD for an aircraft main- 
tenance manual might distinguish between two 
kinds of risks: caution (material damage risk) 
and warning (risk to the operator). By select- 
ing one of these options (a choice that will lead 
t,o further-t_owerdevel choices,), the::author takes 
a decision of a semantic nature, which is quite 
independent of the  in which the docu- 
ment is to be rendered, and which could be ex- 
ploited to produce multilingual versions of the 
2With the emergenceof schemas (W3C, 1999a), which per- 
mit some typing of the surface (float, boolean, string, etc.), 
some degree of control is becoming more feasible. 
24 
document. By contrast, a PCDATA string is 
-specific.and ill-suited for multilingual 
applications. 
These remarks point to a possible radical view of 
XML authoring that advocates that surface strings 
be altogether eliminated from the document content, 
and that author choices be all under the explicit con- 
trol of the DTD and reflected in the document struc- 
ture. Such a view, which is argued for in a related 
paper (Dymetman et el., 2000), emphasizes the link 
application of MDA to a certain domain of pharma- 
ceutical documents. 
2 Our approach to Multilingual 
Document Authoring 
Our Multilingual Document Authoring system has 
the following main features: 
First, the authoring process is monolingual, but 
the results are multilingual. At each point of the pro- 
cess the author can view in his/her own  the 
..... . .......... between ~ML`d~cumeqt~a~a9ring`~aad;mu~ti~nguaL;~,~.~te~t:~s/h~hasa~u~h~rex~:~.~aa~a~d~rea~Èwhere~he ..: 
text authoring/generation (Power and Scott, 1998; text still needs refinement are highlighted. Menus 
Hartley and Paris, 1997; Coch, 1996): the choices for selecting a refinement are also presented to the 
made by the author are treated as a kind of in- author is his/her own . Thus, the author is 
terlingua (specific to the class of documents being always overtly working in the  s/he nows, 
modelled), and it is the responsibility of appropri- but is implicitly building a -independent 
ate "rendering" mechanisms to produce actual text representation of the document content. From this 
from these choices ill tile different s 3 under representation, the system builds multilingual texts 
consideration, in any of several s simultaneously. This ap- 
For such a program, existing XML tools suffer proach characterizes our system as belonging to an 
however from serious limitations. First, DTD's are emerging paradigm of"natural  authoring" 
too poor in expressive power (they are close to (Power and Scott, 1998; Hartley and Paris, 1997), 
context-free grammars) for expressing dependencies which is distinguished from natural  gener- 
between different parts of the document, an aspect ation by the fact that the semantic input is provided 
which becomes central as soon as the document interactively by a person rather than by a program 
micro-structure (its fine-grained semantic structure) accessing digital knowledge representations. 
starts to play a prominent role, as opposed to simply Second, the system maintains strong control both 
its macro-structure (its organization in large seman- over the semantics and the realizations of the docu- 
tic units, typically larger than a paragraph). Second, ment. At the semantic level, dependencies between 
current rendering mechanisms such as CSS (Cascad- different parts of the representation of the document 
ing Style Sheets) or XSLT (XLS transformation lan- content can be imposed: for instance the choice of 
guage) (W3C, 1999b) are ill-adapted for handling a certain chemical at a certain point in a mainte- 
even simple linguistic phenomena such as morpho- nance manual may lead to an obligatory warning 
logical variation or subject-verb agreement, at another point in the manual. At the realization 
In order to overcome these limitations, we are level, which is not directly manipulated by the au- 
using a formalism, Interaction Grammars (IG), a thor, the system can impose terminological choices 
specialization of Definite Clause Grammars (Pereira (e.g. company-specific nomenclature for a given con- 
and Warren, 1980) which originates in A. Ranta's cept) or stylistic choices (such as choosing between 
Grammatical Framework (GF) (Ranta; M~enp~igt using the infinitive or the imperative mode in French 
and Ranta, 1999; Dynaetman et el., 2000), a gram- to express an instruction to an operator). 
matical formalism based on Martin-LSf's Type The- Finally, and possibly most distinctively, the st- 
ory (Martin-L6f, 1984) and building on previous ex- mantle representation underlying the authoring pro- 
perience with interactive mathematical proof editors cess is strongly document-centric and geared towards 
(Magnusson and Nordstr6m, 1994). In this formal- directly expressing the choices which uniquely char- 
ism, the carrier of meaning is a choice tree (called aeterize a given document in an homoge~cous class 
"abstract tree" in GF), a strongly typed object in of documents belonging to the same domain. Our 
which dependencies between substructures can be view is document-centric in the sense that it takes 
easily stated using the notion of dependent types, as its point of departure the widespread practice of 
The remainder of this paper is organized as fol- using XML tools for authoring the macro-structure 
lows. In section 2,,,we give a'~,high.teveloverview .of ..... of doeuments,-oand--extends this-practice towards an 
the Multilingual Document Authoring (MDA) sys- account of their m.icro-structure. But the analysis 
tern that we have developed at XRCE. In section of the micro-structure is only pushed as far as is 
3, we present in some detail the formalism of In- necessary in order to account for the variability in- 
teraction Grammars. In section 4. we describe an side the class of documents considered, and not in 
terms of the ultimate meaning constituents of lan- 3The word "" should be understood here in an 
extended sense tha! not only covers English. French. etc., but guage. This nlicro-structure can in general be de- 
also different styles or modes of communication, lerlniued by studying a corpus of documents and by 
25 
exposing the structure of choices that distinguish a 
given document from other documents in this class. 
This structure of choices is represented in a choice 
tree, which is viewed as the semantic representation 
for the document. 4 One single choice may be asso- 
ciated with text realizations of drastically different 
granularities: while in a pharmaceutical document 
the choice of an ingredient may result in the produc- 
tion of a single word, the choice of a "responsability- 
waiver" may result in a long stereotypical paragraph 
of text, the further analysis of which would be totally 
.counter-productive. 
3 Interaction Grammars 
Let us now give some details about the formalism 
of Interaction Grammars. We start by explaining 
the notion of choice tree on the basis of a simple 
context-free grammar, analogous to a DTD. 
Context-free grammars and choice trees 
Let's consider the following context-free grammar 
for describing simple "addresses" in English such as 
"Paris, France": s 
address --> city, ",", 
country. 
country --> "France". 
country --> "Germany". 
city --> "Paris". 
city --> "Hamburg". 
city --> "the capital of", 
country. 
What does it mean, remembering the XML anal- 
ogy, to author a "document" with such a CFG? It 
means that the author is iteratively presented with 
partial derivation trees relative to the grammar (par- 
tial in the sense that leaves can be terminals or non- 
terminals), and at each given authoring step both 
selects a certain nonterminal to "refine", and also a 
given rule to extend this non-terminal one step fur- 
ther: this action is repeated until the derivation tree 
is complete. 
If one conventionally uses the identifier 
nonterminal~ to name the i-th rule expanding 
the nonterminal nonterminal, then the collection 
of choices made by the author during a session can 
be represented by a choice tree labelled with rule 
identifiers, also called combinators. An example 
of such a tree is addressl(city2,country2) 
4This kind of semantic representation stands i-n contrast 
to some representations commonly used in NLP, which tend 
to emphasize the fine-grained predicate-argument structure of 
sentences independently of the productivity of such analyses 
.\[or a given class of documents. 
5For compatibility with the notacionsCo follow, we use low- 
ercase to denote nonlerminals, aml quoted strings to denote 
terminals, rather than tile inore usna\[ ul)pot'case lowercase 
convent ions. 
which corresponds to choices leading to the output 
"Hamburg, Germany". 6 In.practice, rather than 
using combinator names which strictly adhere to 
this numbering scheme, we prefer to use mnemonic 
names directly relating to the meaning of the 
choices. In the sequel we will use the names adr; 
fra, ger, par, ham, cap for the six rules in the 
example grammar. The choice tree just described is 
thus written adr(ham,ger). 
Making choice trees explicit As we have ar- 
gued previously, choices trees are in our view the cen- . 
tral repositoi-y of documentc0ntent and we Want to 
manipulate them explicitely. Definite Clause Gram- 
mars represent possibly the simplest extension of 
context-free grammars permitting such manipula- 
tion. Our context-free grammar can be extended 
straightforwardly into the DCG: 7 
address(adr(Co,C)) --> city(C), "," 
country(Co). 
country(fra) --> "France". 
country(ger) --> "Germany". 
city(par) --> "Paris". 
city(ham) --> "Hamburg". 
city(cap(Co)) --> "the capital of", 
country(Co). 
What these rules do is simply to construct choice 
trees recursively. Thus, the first rule says that if the 
author has described a city through the choice tree 
C and a country through the choice tree Co, then the 
choice tree adr(Co,C) represents the description of 
an address. 
If now, in this DCG, we "forget" all the terminals, 
which are -specific, by replacing them with 
the empty string, we obtain the following "abstract 
gram mar'l: 
address(adr(Co,C)) --> city(C), country(Co). 
country(fra) --> \[\]. 
country(ger) --> \[\]. 
city(par) --> \[\]. 
city(ham) --> \[\]. 
city(cap(Co)) --> country(Co). 
which is in fact equivalent to the definite clause 
program: s 
SSuch a choice tree can be projected into a derivation 
tree in a straightforward way, by mapping a combinator 
nonterminali into the monterminal name nontermin,:.l, and 
by 'introducing terminal material as required by the specific 
rules. 
7According to the usual logic programming conventions, 
lowercase letters denote predicates and functors, whereas up- 
percase letters denote metavariables that can be instauciated 
with terms. 
Sin the sense that rewriting the nonterminal goal 
address (adr (Co ,C)) to the empty string in the DCG is equiv- 
alent to proving the goal address(adr(Co,C)) in the program, 
26 
address(adr(Co,C)) :- city(C), country(Co). 
country (f ra). 
country (ger). 
city(par). 
city(ham). 
city(cap(Co)) :- country(Co). 
This abstract grammar (or, equivalently, this logic 
program), is  independent and recursively 
defines a set of well-formed choice trees of different 
categories, or types. Thus, the tree adr(ham,ger) 
is .well-formed "in".. the. :typ~/add.~:r~s, ,End the .lice 
cap(fra) well-formed in the type city. 
Dependent Types In order to stress the type- 
related aspects of the previous tree specifications, 
we are actually using in our current implementa- 
tion the following notation for the previous abstract 
grammar: 
adr(Co,C)::address --> C::city, 
Co : : country. 
fra: :country --> \[\] . 
ger: :country --> \[\] . 
par: :city --> \[3 . 
ham: :city --> \[\]. 
cap(Co)::city --> Co::country. 
The first rule is then read: "if C is a tree of 
type city, and Co a tree of type country, then 
adr(Co,C) is a tree of type address", and similarly 
for the remaining rules. 
The grammars we have given so far are deficient 
in one important respect: there is no dependency 
between the city and the country in the same ad- 
dress, so that the tree adr(ham,fra) is well-formed 
in the type address. In order to remedy this prob- 
lena, dependent types (Ranta; Martin-L6f, 1984)can 
be used. From our point of view, a dependent type 
is simply a type that can be parametrized by objects 
of other types. We write: 
adr(Co,C)::address --> C::city(Co), 
Co: :country. 
fra: :country --> \[\] . 
get: :country --> \[\] . 
par::city(fra) --> \[\]. 
ham::city(ger) --> \[\]. 
cap(Co)::city(Co) --> Co::country. 
in which the type city is now parametrized by 
objects of type country, and where the notation 
par : : city(fra) is read as "'paris atree of the type: 
city of fra'. 9 
which is another way of stating the well-known duality be- 
tween the rewriting and the goal-proving approaches to the 
interpretation of Prolog. 
9In terms of the underlying Prolog implementation. "::" is 
simply an infix operator for a predicate of arity 2 which relates 
an object and its type, and both simple and dependent types 
are handled st raighforwardly. 
Parallel Grammars and Semantics-driven 
• Compositionality.for .;Text .;Realizat6ion We 
have just explained how abstract grammars can be 
used for specifying well-formed typed trees repre- 
senting the content of a document. 
In order to produce actual multilingual documents 
from such specifications, a simple approach is to al- 
low for parallel realization English, French ..... gram- 
mars, which all have the same underlying abstract. 
grammar (program), but which introduce terminals 
specific, to ~the_  -at. hand. Thus. the (ollow- 
ing French andEnglish gi-annmkrs a/'e pai~allel to the ':" 
previous abstract grammar:l° 
adr(Co,C)::address --> C::city(Co), ",", 
Co: :country. 
fra: :country --> "France". 
ger : : country --> "Germany". 
par::city(fra) --> "Paris". 
ham: : city(ger) --> "Hamburg". 
cap(Co)::city(Co) --> "the capital of", 
Co : : country. 
adr(Co,C)::address --> C::city(Co), ",", 
Co : : country. 
fra: : country --> "In France". 
ger : : country --> "i' Allemagne". 
par: : city(fra) --> "Paris". 
ham: : city (get) -- > "Hambourg". 
cap(Co): :city(Co) --> "In capitale de", 
Co: :country. 
This view of realization is essentially the one we 
have adopted in the prototype at the time of writ- 
ing, with some straighforward additions permitting 
the handling of agreement constraints and morpho- 
logical variants. This simple approach has proven 
quite adequate for the class of documents we have 
been interested in. 
However, such an approach sees the activity of 
generating text from an abstract structure as ba- 
sically a compositional process on strings, that is, 
a process where strings are recursively associated 
with subtrees and concatenated to produce strings 
at the next subtree level. But such a direct proce- 
dure has well-known limitations when the seinantic 
and syntactic levels do not have a direct correspon- 
dence (simple example: ordering a list of modifiers 
around a noun). We are currently experimenting 
with.a, powerful extension~of.stri.ng compqsihonality - 
where tim objects compositionally associated with 
abstract subtrees are not strings, but syntactic rep- 
resentations with rich internal structure. The text 
10Because the order of goals in the right-hand side of an ab- 
stract grammar rule is irrelevant, the goals on the right-hand 
sides of rule in two parallel realization grammars can appear 
in a different order, which permits certain reorganizations of 
the linguistic material (situation not shown in the example). 
27 
itself is obtained from the syntactic representation 
associated with the .total tree .by simply enumerat- 
ing its leaves. 
In this extended view, realization grammars have 
rules of the following form: 
al(B,C .... )::a(D .... )-Syn --> 
B::b(E .... )-SynB, 
C::c(F,...)-SynC, 
general public. Le VIDAL ® includes a collection of 
notices ,for .around• 5 5.00. dmgs..a~ailable .in France. 
As the publisher, OVP-t~ditions du Vidal has taken 
care of homogeneity across the notices, reformatting 
and reformulating source information. The main 
source are the New Drug Authorizations (Autori- 
sation de Mise sur le March~), regulatory docu- 
ments written by pharmaceutical laboratories and 
approved by legal authorities. 
Relative to multilingual document authoring, this 
{constraints (B, C ..... D, E, F .... ) }, corpus has three features whicli,~e, considered highly 
• ' {compose=engt.ish(~synB ;~.SynC, " :-;-.Syn.)~}-~.--:-desi-r~ble:;(l)-it-dea\[s.with ,a.res\[rlcted-~em~:tit d~2 
The rule shown is a rule for English: the syn- 
tactic representations are  dependent; par- 
allel rules for the other s are obtained by 
replacing the compose_english constraint (which is 
unique to this rule) by constraints appropriate to the 
other s under consideration. 
Heterogeneous Trees and Interactivity Natu- 
ral  authoring is different from natural lan- 
guage generation in one crucial respect. Whenever 
the abstract tree to be generated is incomplete (for 
instance the tree cap(Co)), that is, has some leaves 
which are yet uninstanciated variables, the genera- 
tion process should not proceed with nondeterminis- 
tically enumerating texts for all the possible instan- 
elations of the initial incomplete structure. Instead 
it should display to the author as much of the text as 
it can in its present "knowledge state", and enter into 
an interaction with the author to allow her to fur- 
thor refine the incomplete structure, that is, to fur- 
ther instanciate some of the uninstanciated leaves. 
To this purpose, it is useful to introduce along with 
the usual combinators (adr, fra, cap, etc.) new 
combinators of arity 0 called typenames, which are 
notated type, and are of type "type. These combi- 
nators are allowed to stand as leaves (e.g. in the tree 
cap(country)) and the trees thus obtained are said 
to be heterogeneous. The typenames are treated by 
the text generation process as if they were standard 
semantic units, that is, they are associated with text 
units which are generated "at their proper place" in 
the generated output. These text units are specially 
phrased and highlighted to indicate to the author 
that some choice has to be made to refine the un- 
derlying type (e.g. obtaining the text "la capitale de 
PAYS"). This choice has the effect of further instan- 
elating the incomplete tree with "true" combinators, 
main (for which various terminological resources are 
available), (2) it is a homogeneous collection of docu- 
ments all complying to the same division in sections 
and sub-sections, (3) there is a strong trend in in- 
ternational bodies such as the EEC towards making 
drug package notices (which are similar to VIDAL 
notices) available in multilingual versions strictly 
aligned on a common model. 11 
4.2 Corpus analysis 
An analysis of a large collection of notices from Le 
VIDAL ® de la famille, describing different drugs, 
from different laboratories was conducted in order 
to identify: 
* the structure of a notice, 
® the semantic dependencies between elements in 
the structure. 
For this task, all the recta-information available is 
useful, in particular: explanations provided by Le 
VIDAL ® de la famille and help of a domain expert. 
Corpus study was a necessary preliminary task be- 
fore modeling the notices in the IG formalism pre- 
sented in section 2. 
4.2.1 Structure 
Notices from Le VIDAL ® are all built on the same 
model, including a title (the name of the drug, plus 
some general information about it). followed by sec- 
tions describing the main characteristics of the cirug: 
general description, composition, indications, con- 
traindications, warnings, drug interactions, preg- 
nancy and breast-feeding, dosage and administra- 
tion, possible side effects. This initial knowledge 
• about the semantic content of the document is cap- 
tured with a first., simple context free rule, such as: 
and the generation process is iterated. 
4 An Application to Pharmaceutical 
Documents 
4.1 Corpus selection 
Our corpus consists in drug notices extracted froln 
"'Le VIDAL®de la Famille" (Editions du Vidal. 
1998). a practical book about heahh made for the 
........ vidalNot.ice(T,D,C, I ,CI.~W,DI ~ PaBF,D~i-A,PSI) : :notice 
--> 
T: :title, 
D: :description, 
C: :composition, 
I lA similar but less extended corpus was previously built 
by the third author as the basis for a prototype of muhilingual 
ctocument authoring using G F. 
28 
I::indications, 
Cl::contraindications, 
W::warnings, 
DI::drugsInteraction, 
PaBF::pregnancyAndBreastFeeding, 
DaA::dosageAndAdmin, 
PSI::possibleSideEffects. 
Each section is associated with context-bee rules 
that describe its internal structure: 
'vidalTitle(N,APi ..., .~;>)~:-.:~d£1e-=:n ....... 
--> 
N::name0fDrug, 
AP::activePrinciples ..... 
vidalDescription(N,PF,P...)::description 
--> 
\['DESCRIPTION'\], 
N::nameOfDrug, 
PF::pharmaceutForm, 
P::package ..... 
vidalDosageAndAdmin(D,A)::dosageAndAdmin 
--> 
\['DOSAGE AND ADMINISTRATION'\], 
D::dosage, 
A::administration. 
tablet::pharmaceutForm --> \['tablet'\]. 
eyeDrops:::pharmaceutForm --> \['eye drops'\]. 
At this point, we allow parallel realizations for 
French and English. So, in addition to the English 
grammar given above, we have the French grammar: 
vidalTitle(N, AP ........ )::title 
--> 
N::name0fDrug, 
AP::activePrinciples, ... . 
vidalDescr(N,PF,P...)::description 
--> 
\['PRESENTATION'\], 
N::nameOfDrug, 
PF::pharmaceutForm, 
P::package ..... 
vidalDosageAndAdmin(D,A)::dosageAndAdmin 
--> 
\['MODE D'EMPLOI ET POSOLOGIE'\], 
D::dosage, 
A::administration. 
tablet::pharmaceutForm --> \['comprim~'\]. 
eyeDrops:::pharmaceutForm --> \['collyre'\]. 
This first grammar is fully eq.ivalent to a XML 
I)TD that describes the structure of a notice, though 
it distinguishes finer-grained units 1hart traditional 
l)TI)s tends to do. 
4.2.2 Modeling dependencies 
, ,~ButHG :~ goes £urt, her,:than XM-L DTDs ~it~h'regard 
to the semantic control of documents: it enables us 
to express dependencies which may arise in differ- 
ent parts of a document, including tong-distance de- 
pendencies, through the use of dependent types pre: " 
sented in section 2. 
Identification of the dependencies to be modeled was 
done in a second stage of the corpus study. For ex- 
ample, we identified dependencies between: 
, ........ ,:.-.: :~-~ "the:--ghamaaeoa~tieal ,:forrrr;0t~ a :gi,#ed~dtfug :(.cbn:.- 
cept pharmaceutForm) and its packaging (con- 
cept package), 
® particular ingredients given in the section com- 
position and warning instructions given ill the 
section warnings, 
® categories of patients the drug is intended for in 
the section description and posology indicated 
for each category in the section indications. 
To illustrate the modeling task, we now give more 
details about one particular dependency identified. 
Intuitively, it appears that there is a strong link be- 
tween the pharmaceutical form of a given drug and 
the way it should be administered: tablets are swal- 
lowed, eye drops are put in the eyes, powder is di- 
luted in water etc. In our first grammar, the phar- 
maceutical form concept appears in the description 
section, since the administration way is described in 
the dosage and administration section. The use of 
dependent types permits to link these sections to- 
gether according to the pharmaceutical form. Tile 
parts of the (English) grammar involved become: 
vidalNotice(T,D,C,I,CI,W,DI,PaBF,DaA,PSI)::notice 
--> 
T::title, 
D::description(PF), 
C::composition, 
I::indications, 
CI::contraindications, 
W::warnings, 
DI::drugslnteraction, 
PaBF::pregnancyAndBreastFeeding, 
DaA::dosageAndAdmin(PF), 
PSI::possibleSideEffects. 
vidalDescription(N,PF,P,...)::description(PF) 
--> 
\['D~SCRIPTION'\], " • 
N::nameOfDrug, 
PF::pharmaceutForm, 
P::package ..... 
vidalDosageAndAdmin(D,A)::dosageAndAdmin(PF) 
--> 
\['DOSAGE AND ADMINISTRATION'\], 
D::dosage, 
29 
A : : administration (PF). 
The administration section should now be de-- .... 
scribed according to the pharmaceutical form it pre- 
supposes, several administration ways being compat- 
ible with each form: 
t abletsAdminl : : administrat ion (Tablet) 
¢O~I'~£-INDICAT%(~mS: ce ~idl¢~ent rm do|t p~s ~tre ut~l~sb dlns les C~S sutvancs: 
----> aller~le au~ /~1SS nocu~ent t'asetrlne i 
\['Swallow the tablets without "- l 
crunching them. '\] . ar~n~: "'... -" ..... _ . ~w=' ~':" :~','~.~'%'-~~ -.-" .......... 
• \[KTERACTZORS HI~DICAHENTEIJSES: Ce |~atc~ent aeut tnterlqtr avec a'autres ~ed~ca~ents. tablet sAdmin2 : :administrat ion (Tablet) ~,o~ .... ~ - ~-,~,. ,, .... t,,~ ,nt ,~n..,~to ,.~ .... t.,.,~ ,,~ 
augmentation des effets ~a~Is~r~bles. - le ltthtu~: ~9uentatlon ~u taux de Hthiu| 
__> dam le sanq. 
\['Let the tablets melt under c.oss~ss( £TT AttAI~M~,T: 
the tongue. '\] . 
eyeDropsAdmin::administration(EyeDrops) 
--> 
\[~Pull the lower eyelid down while 
looking up and squeeze the eye drops, 
so that they fall between the eyelid 
and the eyeball.'\]. 
emacs: "prolo@ ° : 
I 
llOaOF£1t IbuDrofane 
P'R~\[NTATION: RUROFEN : ¢ot~r|m~ C blanc ) : bQIte de Z£ - ~ah &~ - 15.s F - • t@orat01 res Boots Healt.care 
¢o,tposrrzoq: p cD 
Ibugrofene .............. 20fl ig 
INDICATIONS: Ce |~dlcuent est u, gncl-|nflulatOl¢o non stero~cHen {PISS). I\] osc 
ut|lis6 e, cas de aouIeurs diverses. 
.~OOE D'EHPtOI ET POSOLOCZE: i\[~llllllmllmlml ~ . P~ologta Usuel t e: : ~ ¢o.pr i mes ...,~;441.i~  g/l 
The consequence of such a modeling is a better 
control of the semantic content of the document in 
the process of being authored: once the user chooses 
tablet as pharmaceutical form in the section descrip- 
tion, his choice is restricted between the two con- 
cepts tabletsAdminl and tabletsAdmin~ in the ad- 
ministration section. If he chooses eye drops as the 
pharmaceutical form, there is no choice left if the ad- 
ministration section: the text fragment correspond- 
ing to the concept eyeDropsAdmin will be generated 
automatically in the document. 
This example illustrates how dependencies are 
propagated into the macro-structure, but they can 
be propagated into the micro-structure as well: for 
example, in the description section, we can express 
that the packaging of the drugs is also dependent of 
their form: tablets are packaged in boxes, eye drops 
in flasks, powder in packets, etc.: 
vidalDescription(N,P .... )::description(PF) 
--> 
\['DESCRIPTIDN'\], 
N::name0fDrug, 
PF::pharmaceutForm, 
P::package(PF) ..... 
box: :package(Tablet) .--> \['Box'\]. 
flask::package(EyeDrops) --> \['Flask'\]. 
This example shows that tile granularity degree of 
the linguistic realization cat\] vary from full text seg- 
ment (administration ways) to sing\[e words (forms 
like tablet, eye drops, powder, etc.). This is highly 
related to the reusability of the concept: references 
to specific forrns may appear it\] many parts of the 
Figure 1: A stage in the authoring of a notice, with 
French text shown. 
document, while the administration ways are more 
or less frozen segments. 12 
The level of generality of dependencies encoded in 
the grammar needs to be paid attention to: one has 
to be sure that a given dependency is viable over a 
large collection of documents in the domain. If a 
choice made by the grammar writer is too specific, 
the risk is that it may be not relevant for other docu- 
ments. For this reason, an accurate knowledge of the 
corpus is necessary to ensure an adequate coverage 
of documents in the domain. 
4.3 An Example 
Screen copies of the IG interface during an authoring 
process of a VIDAL notice are given on figures 1 and 
2. Figure 1 represents the notice authored in French 
at a given stage. The fields still to be refined by 
tile user appear ill dark. When the author wants to 
refine a given field, a pulldown menu presenting tile 
choices for this field appears on the screen. Here, the 
author chooses to refine the field avaler in the admin- 
istration (mode d'emploi et posologie ) section: the 
corresponding menu.proposes the list of.administra- 
tion ways corresponding to the pharmaceutical form 
tablet he has chosen before. Figure 2 shows the par- 
allel notice in English but one step further, i.e. once 
he has selected the administration way. 
12 For a discussion of some of the issues regarding the use of 
templates in nature\[  generation systems, see (\[-leit er, 
1995). 
30 
I ............................ .... ~ .: • - .... aaa-.;~o~}=: . .................. . . 7~. 7-~ i.. 
RUnOFE# Ibuprofen 
OESERIPT|ON: HUROfEH : tablet ( vhite ) ; box of 20 - G~ Rezab - X5.8 F - . Boots 
Real thcare Laboratories 
¢~¢SFrIOH: 0 tb 
~buDrot en .............. 200 i~ 
INDICATZC~S: This dru9 Is a ,on-~terold/I anct-lnflulatcry (NSAIPS). It IS used to 
treat various pal~s 
COliTRA.\[KOIC&Tl~44S: This drug should not be used in the following cases: all~rtly to 
NSAtOS l in particular t~_~p_trtn i 
WA~I~INCS: ...... . • 
~RU¢ I~TER~'I'ZONS: This clru9 can Interact ~ltb other drugs. In ~art~cular: - asplr,, 
aria the other non steroidal ~t~-tnfl~latory drugs: ~ncrea.se of side effects. - 
Lithium: ~¢reasl of blood hth~ul rate. 
I 
PRECNN(CV MD 8REAST-rE£DINC: 
VeDm~ 
DOSAGE AnD .~DMINISTRATI(~4: ~ tablet swallowed vith a lass of 
aye. ~ . 
t 
PC~SIeLE SlO£ EFFECTS: 
Figure 2: The parallel English notice one authoring 
step later. 
5 Conclusion 
XML-based authoring tools are more and more 
widely used in the business community for sup- 
porting the production of technical documentation, 
controlling their quality and improving their re- 
usability. In this paper, we have stressed the connec- 
tions between these practices and current research in 
natural  generation and authoring. We have 
described a formalism which removes some of the 
limitations of DTD's when used for the production 
of multilingual texts and presented its application to 
a certain domain of pharmaceutical documents. 
Acknowledgements Thanks to Jean-Pierre 
Chanod, Marie-H~_lb.ne Corr/mrd, Sylvain Pogodalla 
and Aarne Ranta for important contributions, 
discussions and comments. 

References 
a. Coch. 1996. Evaluating and comparing three text 
production techniques. In Proceedings of the 16th 
International Confe~vnce on Computational kin- 
guistics. 
OVP l~ditions du Vidal, editor. 1998. Le VIDAL de 
la famille. HACHETTE. 
M. Dymetman. V. Lux, and A. Ranta. 2000. XML 
and multilingual document authoring: Conver- 
gent trends. In Pro,'eedings Coling 2000, Saar- 
brficken. 
A. Hartley and ('. Paris. 1997. Muhilingual docu- 
ment production-: from supporl for translating to 
support for authoring. In Machine Translation, 
Special Issue. on New Tools for Huma n TranslaT,.. 
tots, pages 109-128. 
L. Magnusson and B. Nordstr6m. 1994. The ALF 
proofeditor and its proof engine. In Lecture Notes 
in Computer Science 806: Springer. 
P. Martin-L6f. 1984. Intuitionistic Type Theory. 
Bibliopolis, Naples. 
P. M/ienp/ii and A. Ranta. 1999. The type theory 
and type checker of GF. In Colloquium on Prin- 
ziples, .Logics, ..and Implementations .of High-Level 
Progrdmm.ihg L~inTJages, Worl~shop: On-Logical 
Frameworks and Meta-s, Paris, Septem- 
ber. Available at http ://www. cs. chalmers, se/ 
~aarne/papers/Ifm 1999. ps. gz. 
W. Pardi. 1999. XML in Action. Microsoft Press. 
Fernando C. N. Pereira and David H. D. Warren. 
1980. Definite clause grammars for  anal- 
ysis. Artificial Intelligence, 13:231-278. 
R. Power and D. Scott. 1998. Multilingual au- 
thoring using feedback texts. In Proceedings of 
the 17th International Conference on Computa- 
tional Linguistics and 36th Annual Meeting of the 
Association for Computational Linguistics, pages 
1053-1059. 
P. Prescod. 1998. Formalizing SGML 
and XML instances and schemata 
with forest automata theory. 
http ://www. prescod, net/forest/shorttut/. 
A. Ranta. Grammatical Framework work 
page. http ://www. cs. chalmers, se/ 
aarne/GF/pub/work- index/index, html. 
E. Reiter. 1995. NLG vs. templates. In Proceedings 
of the 5th European Workshop on Natural Lan- 
guage Generation (EWNLG '95), pages 95-106, 
Leiden. 
W3C, 1998. Extensible Markup Language (XML) 
1.0, February. W3C reconunendation. 
W3C, 1999a. XML Schema - Part 1: Structu~vs, 
Part 2 : Datatypes -, December. W3C Working 
draft. 
W3C, 1999b. XSL Transformations (XSLT), 
November. W3C recommendation. 
D. Wood. 1995. Standard Generalized Markup Lan- 
guage: Mathematical and philosophical issues. 
Lecture Notes in Computer Science. 1000:344-- 
365. 
