XML and Multilingual Document Authoring: Convergent Trends 
Marc l)ymctman Veronika Lux 
Xerox Research Centre Europe 
6, chemin de Maupertuis 
38240 Meylan, France 
{ dymetman,lux } @xrce.xerox.com 
Aarne Ranta 
Department of Computing Science 
Chahners University of Technology 
and GOteborg University 
S-412 96 GOteborg, Sweden 
aarne @ cs.chahners, se 
Abstract 
Typical al)proaches to XML authoring view a XML doc- 
urnent as a mixture of structure (the tags) and surlhce 
(texl between the tags). We advoeale a radical approach 
where the surface disappears from lhe XML documenl 
altogether to be handled exclusively by rendering mech- 
anisms. This move is based on the view that the author's 
choices when authoring XML docutnciHs are best seen 
as -i~eutral semantic decisions, that lhe SlftlC- 
lure can then be viewed as inlerlingual content, and that 
the textual oulpul should be derived from this co)lien\[ by 
-sl~ecific realization mechanisms, lhus assimi- 
lating XML aufllol'ing lo Mullilingual Document Amhof 
ing. However, slandard XMI, tools have imporlant lhni- 
tations when used for such a ptu'pose: (1) they are weak 
at propagating semanlic dependencies belween dil'ferenl 
parts of the st,'ucture, and, (2) current XMI. rendering 
tools are ill-suited for handling the grammatical combi- 
nation of lextual units. We present two relalcd proposals 
for overcoming these limitalions: one (GI:) origitmting 
in the Iradilion of malhemalical proof edilors and con- 
slructivc type lhcery, the other (IG), a speciali×ation of 
l)elinite Clause (_\]ranllllars strongly inspired by (iF. 
1 Introduction 
The typical al3pl'oacll to XML authoring views an XML 
doctmlcnt as a mixture of wee-like strttctttre, expressed 
througll balanced labelled parentheses (tim lags), and 
of sul:face, expressed llu'ough free lexi interspersed be- 
tween lhe tags (PCI)ATA). A l)octunent Type l)elini- 
lion (DTD) is roughly similar to a coiitext-free gram- 
mar j with exactly one predelined terminal. It delines a 
set o1' well-formed structures, (hat is, a la,guage over 
trees, where each nonterminal node can dominate either 
the empty string, or a sequence of occurrences of nonter- 
minal nodes and of 111o terminal node pcdata. The ter- 
minal pcdata has a specM status: it can in turn dominate 
any characler string (subjecl to certain reslrictions on the 
characters allowed). Authoring is typically seen as a top- 
down interactive process of step-wise refinement of the 
root nonterminal (corresponding to the whole document) 
where the aulhor ileratively chooses a rule for expanding 
IBu( see (l'rescod, 1998) lbr an inleresfing discussion oflhe differ- 
enccs. 
a nonlerminal aheady present in the tree, 2 and where in 
addition the author can choose an arbitrary sequence of 
characters (roughly) for expanding lhe pcdata node. 
One can observe the following trends in the XML 
world: 
A move towards more typing of the surface: 
Schemas (W3C, 1999a), which are an inlluemial 
proposal for the ieplacenlent of I)TD's, provide for 
types such as float, boolean, uri, etc., instead o\[" 
the single type pcdata; 
A move, aheady constitulive of the main lmlpose 
of XMl, its opposed l(1 HTML for instance, towards 
clearer separation between content and form, where 
the original XML document is responsible for con- 
lent, and powerful styling lnechanisms (e.g. XSI.T 
(W3C, 1999b)) are available for rendering 111o doc- 
tlll/en\[ \[o lhe end-user. 
We advocate an approach in which these two moves 
are radicali×cd in tile folk)wing ways: 
Strongly typed, surface-free XML documents. The 
whole content of the document is a trcc whore each node 
is labelled and typed. For inlernal nodes, lhe lype is just 
the usual nonierminal name (or category), and Ille label 
is a name for the expansion chosen for this nonlernfinal, 
lhat is, an identifier of which rule was chosen to expand 
ibis nonterminal. For leaves, lhe type is a semanlically 
specilic category such as Integer, Animal, etc., and lhe 
label is a specilic concept of this type, such as three or 
dog) 
Styling responsible for producing tim text itself. 
The styling mechanisnl is not only responsible for ren- 
dering the layout of the lext (typography, order and pre- 
sentation of lhe elements), but also for producing the text 
itse!ffrom 111o document content. 
What are (he motiw~tions behind this proposal? 
Autlmring choices carry -independent 
meaning. First, let us note that lhe expansion choices 
2We arc ignoring here tl~e aspecls of lhis process relating to lhe 
regular ,mlure of Ihe righ(-halld sides of rules, but Ihese parliculars are 
uncssenlial lo the nlaill g:lfgtllllOnl. 
3Note Ihat lnlcgcr is of"logical type" e, whereas Animal is of log- 
ical lype (c, t): lhe,'c is no reslriction on lhe denotalional slalus of 
leaves. 
243 
<!ELEMENT Risk (Caution I Warning) > risk-rulel: Risk --> Caution 
risk-rule2: Risk --> Warning 
<!ELEMENT Caution ( ... I ... I ... ) > caution-rulel: 
caution-rule2: 
caution-rule3: 
Caution --> .., 
Caution --> ... 
Caution --> ... 
Figure 1 : Context-flee rules (shown on the right) corresponding to the aircraft DTD (shown on the left); for illustration 
purposes, we have assumed that there are in turn three semantic varieties of cautious. The rule identitier on the left 
can be seen as a semantic label for each expansion choice (in practice, the rule identifiers are given mnemonic names 
directly related to their meauing). 
made during the authoring of an XML document gener- 
ally carry -independent meaning. For instance, 
the DTD for an aircraft maintenance manual might be 
legally required to distinguish between risk instructions 
of two kinds: caut ion (risk related to material damages) 
and warning (risk to the operator). Or a D~'I) describing 
a personal list of contacts might provide a choice of gen- 
der (male, female), title (dr, prof, default), country 
(ger, fra,...), etc. Each such authoring choice, which 
formally consists in selecting among different rules for 
expanding the same nonterminal (see Figure 1), corre- 
sponds to a semantic decision which is independent of 
the  chosen for expressing the document. A 
given DTD has an associated expressive space of tree 
structures which fall under its explicit control, and the 
author is situating herself in this space through top-down 
expansion choices. There is then a tension between on 
the one hand these cxplicitely controlled choices, which 
should be rendered differently in different s 
(thus ger as Germany, Allemagne, Deutschland .... and 
Warning by a paragraph starting with Warnillg! ...; At- 
tention, Danger! ...; Achtung, Lebensgefahr! ...), and 
on the other hand the uncontrolled inclusion in the XML 
document of free PCDATA strings, which are written in 
a specific . 
Surface-fi'ce XML documents. We propose to com- 
pletely remove these surface strings from the XML doc- 
ument, and replace them with explicit meaning labels. 4 
The tree structure of the document then becomes the sole 
repository of content, and can be viewed as a kind of in- 
terlingua for describing a point in the expressive space 
of tile DTD (a strongly domain-dependent space); it is 
then the responsability of the -specific rendering 
mechanisms to "display" such content in each individual 
 where the document is needed. 
XML and Multilingual Document Authoring. In 
this conception, XML authoring has a strong connection 
to the enterprise of Multilingual Document Authoring in 
which the author is guided in the specilication of the 
document content, and where the system is responsible 
4There are autlmring situations in which it may be necessary for 
the user to introduce new selllalllic labels eorleSl)onding lo expres- 
sive needs not foreseen by lhe creator of the original I)TD. To handle 
such situations, it is useflfl to view the l)TI)'s as open-ended objecls 1o 
which new semantic labels and types can be added at authoring time. 
for generating from this content textual output in several 
s simultaneously (see (Power and Scott, 1998; 
Hartley and Paris, 1997; Coch, 1996)). 
Now there are some obvious problems with this view, 
due to the current limitations of XML tools. 
Limitations of XML for multilingual document au- 
thoring. The first, possibly most serious, limitation 
originates in the fact that a standard DTD is severely re- 
stricted in the semantic dependencies it can express be- 
tween two subtrces in the document structure. Thus, if 
in the description of a contact, a city of residence is in- 
cluded, one may want to constrain such an information 
depending on the country of residence; or, in the air- 
craft maintenance manual example, one might want to 
automatically include some warning in case a dangerous 
chemical is mentioned somewhere else in the document. 
Because DTD's are essentially ofcontcxt-fi'ce expressive 
power, the only communication between a subtree and its 
environment has to be mediated through the name of the 
nonterminal rooting this subtree (for instance the nonter- 
minal Country), which presents a bottleneck to informa- 
tion ilow. 
The second limitation comes fi'om the fact that the cur- 
rent styling tools for rendering an XML document, such 
as CSS (Cascading Style Sheets), which arc a strictly 
layout-oriented , or XSLT (XSL transformation 
), which is a more generic tool for transforming 
an XML document into another one (such as a display- 
oriented HTML file) are poorly adapted to linguistic pro- 
cessing. In particulm, it seems difficult in such for- 
malisms to express such basic grammatical facts as ntun- 
ber or gender agreement. But such problems become 
central as soon as semantic elements corresponding to 
textual units below the sentence level have to be com- 
bined and rendered linguistically. 
We will present two related proposals for overcom- 
ing these limitations. The first, the Grammatical Frame- 
work (GF)(Ranta, 2000), originates in constructive type- 
theory (Martin-L6f, 1984; Ranta, 1994) and in mathe- 
matical proof editors (Magnusson and Nordstr6m, 1994). 
The second, h~teraction Grammars (IG), is a specializa- 
tion of Definite Clause Grammars strongly inspired by 
GF. The two approaches present certain lk)rmal differ- 
ences that will not be examined in detail in this papeh 
244 
but they share a number of important assumptions: 
• The semantic representations are strrmgly O'ped 
trees, and rich dependencies between subtrees can 
be specilied; 
• The abstract tree is independe,lt of tile different tex- 
tual realization hmguages; 
• Tim surface realization in each  is obtained 
by a semalltics-driven compositional process; that 
is, the surface realizations are constructed by a 
bottom-up recursive process which associates sur- 
face realizations to abstract tree nodes by recur- 
sively combining the realizations of daugthcr nodes 
to obtain the realization of the mother node. 
• The grammars are revelwible, that is, can be used 
both for generation and for parsing; 
• The authoring process is an interactive process 
of repeatedly asking the author to further specify 
nodes in the absmlct tree of which only the type is 
known at the 1)oint of interacti(m (tyFe re/itlemeHt). 
This process is mediated througll text in the lan- 
guage of the author, showing the types t(5 be relined 
as specially highlighted textual units. 
2 GF ~ the Grammatical Framework 
The Grammatical Framework (GF; (Ranta, 2000)) is a 
special-purpose programming hmguage combining co~z- 
strttctive type thee O, with an annotation hmguage for 
concrete syntax. A grammar, in the sense of GF, delines, 
on one hand, an abstract s3,1ttax (a system of types and 
typed syntax trees), and on the other hand, a mapping {51 
tile abstract syntax into a co,icicle sy, tta.v. The abstract 
syntax has cotCtlot 3, declarations, such as 
cat Country ; cat City ; 
and combinator (orfttnctiolO dechuations, such as 
fun Get : Country ; fun Fra : Country ; 
fun Ham : City ; fun Par : City ; 
fun cap : Country -> City ; 
The type of a combinator can be either a basic type, such 
as the type City of the combinator Ham, or a function 
type, such as the type of the combinator cap. Syntax 
trees formed by combinators of functioll types are con> 
plex functional terlns, such as 
cap Fra 
of type City. 
"file concrete syntax part of a GF grammar gives lit~- 
earization rules, which assign strings (or, in general, 
more complex linguistic objects) to syntax trees. For the 
abstract syntax above, we may lmve 
fin Ger = "Germany" ; fin Fra = "France" ; 
lin Ham = "Hamburg" ; lin Par : "Paris" ; 
lin cap Co = "the capital of" ++ Co ; 
Thus tile linearization of cap Fra is 
the capital of France 
2.1 GFinXMI, 
Functional terms have a straightforward encoding in 
XML, l'el~resenting a term of tile forna 
.\[ (11 . .. (I,~ 
by the XML object 
<J'> ct', ... a',, </f> 
where each e~ is tile encoding of a i. In this encoding, 
cap Fra is 
<cap> 
<Fra> 
</Fra> 
</cap> 
Tile simple encoding does not pay attention to the 
types (51' the objects, and has no interesting DTI). To 
express type distinctions, we will hence use a slightly 
more complicated representation, in which the category 
and combinator declarations of GF are represented as 
DTDs in XML, so that GF type dlecking becomes equiv- 
alent ,a, itll XML validatiom The represelm~tion of the GF 
grallllllaf o1' tile previous section is tile DTI) 
<!ELEMENT Country (Ger \[ Fra) > 
<!ELEMENT Get EMPTY > 
<!ELEMENT Fra EMPTY > 
<!ELEMENT City (Ham I Par I (cap,Country))> 
<!ELEMENT Ham EMPTY > 
<!ELEMENT Par EMPTY > 
<!ELEMENT cap EMPTY > 
In this DTD, each category is represented as an EI,E- 
MENT dclinition, listing all combinators producing trees 
of that category. The combinators themselves are repre- 
sented as EMPTY elements. The XML representation of 
the capital (51' France is 
<City> 
<cap /> 
<Country> 
<Fra /> 
</Country> 
</City> 
which is a wdid XML object w.r.t, tile given DTD. 
The latter encoding of GF in XML enjoys two impor- 
tant properties: 
• All well-typed GF trees are represented by valid 
XML objects. 
• An XML represents a unique GF tree. 
The tirst property guarantees that type checking in the 
sense of GF (and type theory) can be used for validation 
of XML objects. The second property guarantees that GF 
objects can be stored in tim XML format. (The second 
property is already gt, aranteed by tile simpler encoding, 
which ignores types.) 
()ther prope,'ties one would desire are the followillg: 
245 
• All valid XML objects represent well-typed GF 
trees. 
• A DTD represents a unique GF abstract grammar. 
These properties cannot be satislied, in general. The rea- 
son is that GF grammars may contain dependent types, 
i.e. types depending on objects. We will retnrn to this 
notion shortly. But let us first consider the use of GF for 
nmltilingual generation. 
2.2 Multilingualgeneration in GF 
Multilingual generation in GF is based on parallel gram- 
mars': two (or more) GF grammars are parallel, if they 
have the same abstract syntax. They may differ in con- 
crete syntax. A grammar parallel to the one above is de- 
fined by the concrete syntax 
param Case = hem \[ gen ; 
oper noml : Str -> Case => Str = 
ks -> tbl {{nom} => s, {gen} -> s+"n"} ; 
oper nom2 :Str -> Case => Str 
ks -> tbl 
{{nom} => s+"ki", {gen} -> s+"gin"} ; 
lincat Country = Case => Str ; 
lincat City = Case => Str; 
lin Ger = noml "Saksa" ; 
lin Fra = noml "Ranska" ; 
lin Ham = noml "Hampuri" ; 
lin Par = noml "Pariisi" ; 
lin cap Co = 
tbl {c => Co!gen ++ 
nora2 "p~iikaupun" ! c} ; 
This grammar renders GF objects in Finnish. In addition 
to linearization rules, it has rules introducing parameters 
and operations, and rules detining the linearization O,pes" 
corresponding to basic types: the linearization type el' 
Country, for instance is not just string (Str), but a func- 
tion fl'om cases to strings. 
Not only the linearization rules proper, but also param- 
eters and linearization types wwy a lot fl'om one hmguage 
to another. In our example, we have the paralnetre of ease 
with two values (in larger granunars for Finnish, as many 
as 16 may be required!), and two patterns for inflecting 
Finnish nouns. The syntax tree cap Fra produces the 
strings 
Ranskan p~fikaupunki 
Ranskan p~kaupungin 
which are the nominative and the genitive form, respec- 
tively. 
2.3 Del)endent types 
DTDs in XML are capable of representing simple types, 
i.e. types without dependencies. Even a simple type sys- 
tem can contribute a lot to the semantic control of doc- 
uments. For instance, the above grammar permits the 
formation of the English noun phrase 
the capital of France 
but not of 
the capital of Paris 
Both of these expressions would be well-formed w.r.t. 
an "ordinary" granunar, in which both France and Paris 
would be classitied simply as noun phrases. 
Dependent types are types depending on objects of 
other types. An example is the following alternative dec- 
laration of Country and City: 
cat Country ; cat City (Co:Country) ; 
Under tiffs definition, there are no objects of type City 
(which is no longer a well-formed type), but of types 
City Ger and City Fra. Tlms we define e.g. 
fun Ham : City Ger ; fun Par : City Fra ; 
fun cap : (Co:Country) -> City Co ; 
Observe the use of the variable Co in the type of the com- 
binator capital: the variable is bound to the argument 
type and then used in the value type. The capital of a 
country is by definition a city of the same country. This 
involves a generalization o1' function types with depen- 
dent types. 
Now consider a simplified format ()f postal addresses: 
an address is a pair of a country and a city. The GF rule 
is either 
fun addr : Country -> City -> Address ; 
iin addr Co C = C ++ "," ++ Co ; 
using simple types or 
fun addr : 
(Co:Country) -> C±ty Co -> Address ; 
&in addr Co C = C ++ "," ++ Co ; 
using dependent types. The invalid address 
Hamburg, France 
is well-typed by the former definition but not by the lat- 
ter. Using the laUer delinition gives a simple mechanism 
of semantic control ot' addresses. The same idea can ob- 
viously be exlended to full addresses with street names 
and numbers. Such dependencies cannot, however, be 
expressed in DTDs: both of the address rules above cor- 
respond to one and the same ELEMENT definition, 
<!ELEMENT Address (addr, Country, City) > 
This example 
enoughlbr GF 
<Address> 
<addr /> 
<Country> 
<Fra /> 
</Country> 
<City> 
<Ham /> 
</City> 
</Address> 
also shows Illat XML validity is not 
well-formedness: the object 
246 
is valid w.r.t, the DTD, but the corresponding Ot-; object 
addr Fra ttam 
is not well-typed. 
2.4 Computation rules 
In addition to categories and cornbinators, GF grammars 
may contain definitions, such as 
def cap Fra = Par ; 
Definitions belong to the abstract syntax. They define 
a normal form for syntax trees (recursively replace de- 
fienda by definientes), as well as a paraphrase relation 
(sameness of normal tbrm). These notions are, of course 
reflected in the concrete syntax: the addresses 
the capital of France, France 
Paris, France 
are paraphrases, and the latter is the normal form of the 
former. 
= .... - -,- ,--- 1 \[U"e m IE'~i ml~w ml°pu°i's ~lum= ~J II 
 Nll 
I\[ "''~ ~ ~ ~I/ll I I 
~h~1- ~ul~f ~//11 
II lt'~eore~. Fc? aH numbers ×, there e×ists a rtlOcer u ' such I//11 
II ~hat ~.L~ ~,~n~ U~ ,,'. ~-o~f. C~-.id~. ~,~'bitr, a~u I// 
11 nuliJer x. '?z3re<~? . Ik.nce, "for' all riJ~2rs ~, there exists a Ill II 
li ~-,4,c,r'e~. Pot tous lore rBc~hoes ;4. il e~i~te ~.~mbre ~:'.~\[/I1 
j\[ x, ~ r~., ~.111~\] (~ t~.: n~',-~, ~- ~ ~-ooql II//1/ I\[ m~oon L~,u:~oUill " " t~-(~- n~, u =(~.'ql II jill 
II et~oH lukux~ik4}l ...... I ........ . v. II Ill/It 
I \[ <Text > crh,ua~w,-o411 ~,~. (, ,:- \[~, r,, q I I Ill/I/ 
I I <Pr*~> <E--:ist/> <0c~ll E~t,/i'. ~ -_ ~,, ~-_ (~ =11 I I IM II </Prop> <~r'oo> </ @tEll _ . . 
I .... ta ~ ~_l~m H , .................................... Ill.,,,  411Lt 
?~a ~ Proof (Exist !lat (','..'>:' qll®l.l 0:~s~ ~r,, s: ~" ~',~ tl Iii-\]1t 
-- ~iI1_:_'. ~ ,_ t%:t~:F°°~)\]lll~ 
Figure 2: GF session for editing a mathematical proof 
text. 
2.5 GF editing tools 
An editing tool has been implemented for GF, using 
metavariables to represent yet undefined parts of expres- 
sions. The user can work on any metavariable, in various 
different ways, e.g. 
• by choosing a combinator from a menu, 
® by entering a string that is parsed, 
• by reading a previously defined object from a file, 
® by using an automatic search of suitable instantia- 
tions. 
These functionalities and their metatheory have been 
used for about a decade in a number of syntax edi- 
tors for constructive type theory, usually known as proof 
editors (Magnusson and NordstrOm, 1994). From this 
point of view, the GF editor is essentially a proof edi- 
tor together with supplementary views, provided by the 
concrete syntax. The current implementation of GF 
is a plugin module of the proof editor Alfa (Hallgren, 
2000). The window dump in Figure 2 shows a GF ses- 
sion editing a mathematical proof. Five views are pro- 
vided: abstract syntax in type-theoretical notation, En- 
glish, French, Finnish, and XML. One metavariable is 
seen, expecting the user to find a Proof of the proposi- 
tion that there exists a number .r' such that a', is smaller 
than x', where x is an arbitrary number given in the con- 
text (for the sake of Universal Introduction). 
3 IG : Interaction Grammars 
We have just described an approach to solving the limita- 
tions of usual XML tools for multilingual document au- 
thoring which originates in the tradition of constructive 
type-theory and mathematical proof editors. We will now 
sketch an approach strongly inspired by GF but which 
formally is more in the tradition of logic-programming 
based unification grammars, and which is currently un-. 
der development at Xerox Research Centre Europe (see 
(Brun et al., 2000) for a more extended description of 
this project). 
Definite Clause Grammars, or DCG's, (Pereira and 
Warren, 1980), are possibly the simplest unification- 
based extension of context-free grammars, and have 
good reversibility properties which make them adapted 
both to parsing and to generation. A typical view of what 
a DCG rule looks like is the following: 5 
a(al(B,C .... )) ---> 
<textl>, 
b(B), 
<text2>, 
e(c), 
<text3>, 
{constraints (B,C,...)}. 
This rule expresses the fact that (1) some abstract 
structure al (B, C .... ) is in category a if the structure 
B is in category b, the structure C in category c ..... and 
furthermore a certain number of constraints are satisfied 
by the structures B, C .... ; (2) if the structures B, C .... can 
be "rendered" by character strings StringB, StringC, 
.... then the structure al(B,C .... ) can be rendered by 
the string obtained by concatenating the text <text:t> 
(that is, a certain constant sequence of terminals), then 
StringB, then <text2>, then StringC, etc. 
In this formalism, a grammar for generating English 
addresses (see preceding section) might look like: 
SReminder: according to the usual logic programming conventions, 
lowercase letters denote predicates and functors, whereas uppercase 
letters denote metavariables that will be instantiated with terms. 
247 
address(addr(Co,C)) --> city(C), ",", 
country(Co). 
country(fra) --> "France". 
country(get) --> "Germany". 
city(par) --> "Paris" 
city(cap(Co)) --> "the capital of", 
country(Co). 
The analogies with the GF grammars of the previous 
section arc clear. What is traditionally called a cate- 
gory (or nonterminal, or predicate) in the logic program- 
ruing terminology, can also be seen as a type (address, 
country, city) and functors such as get, par, addr, 
cap can be seen as combinators. 
If, in this DCG, we "forget" all the constant strings 
by replacing them with the empty string, we obtain the 
following "abstract grammar": 
address(addr(Co,C)) --> city(C), country(Co). 
country(fra) --> \[\]. 
country(ger) --> \[\]. 
city(par) --> \[\]. 
city(cap(Co)) --> country(Co). 
which is in fact equivalent to the definite clause pro- 
gram: 6 
address(addr(Co,C)) :- city(C), country(Co). 
country(fra). 
country(ger) . 
city(par) . 
city(cap(Co)) :- country(Co). 
This program is -independent and recursively 
dclines a set el' well-formed trees to which it assigns 
types (thus cap(fra) is a well-formed tree o1' type 
city). 
As they stand, such definite clause grammars and pro- 
grams, although suitable Ibr simple generation tasks, are 
not directly adapted for the process of interactive multi- 
lingual document authoring. In order to make them more 
appropriate for that task, we need to specialize and adapt 
DCGs in the way that we now describe. 
Parallel grammars. The tirst move is to allow for 
parallel English, French ..... grammars, which all have 
the same underlying abstract gralnmar (program). So in 
addition to the Englisb grammar given above, we have 
tim French grammar: 
address(addr(Co,C)) --> city(C), ",", 
country(Co). 
country(fra) --> "la France". 
country(get) --> "l'Allemagne". 
city(par) --> "Paris". 
city(cap(Co)) --> "la capitale de", 
country(Co) . 
6hl the sense that rewriling the llOntCI'nlilull goal 
address (addr (Co ,C) ) to the empty siring in lhe I)CG is equivalent 
|o proving the goal address (addr (Co, C) ) in the program (l)cransart 
and Maluszynski, 1993). 
Dependent Categories. The grammars we have given 
arc delicient in one importaut respect: there is no de- 
pendency between the city and the country in the salne 
address. In order to remedy this problem, a stan- 
dard logic programming move would he to reformulate 
the abstract grammar (and similarly for the - 
dependent ones) as: 
address(addr(Co,C)) --> city(C,Co), 
country(Co). 
country(fra) --> \[\]. 
country(ger) --> \[\]. 
city(par,fra) --> \[\]. 
city(cap(Co),Co) --> country(Co). 
The expression city(C, Co) is usually read as the re- 
lation "C is a city of Co", which is line for computational 
purposes, but this reading obscures the notion that the 
object C is being typed as a city; more precisely, it is 
being typed as a city of Co. In order to make this read- 
ing more apparent, we will write the grammar as: 
address(addr(Co,C)) --> cityc0(C), 
country(Co). 
country(fra) --> \[\]. 
country(ger) --> \[\]. 
cityf~(par) --> \[\]. 
cityco(cap(Co)) --> country(Co). 
That is, we allow the categories to be indexed by terms 
(a move which is a kind of "currying" ot' a relation into 
a type for its first argument). Dependent categories are 
similar to the dependent types of constructive type the- 
ory. 
Heterogeneous trees. Natural  authoring is 
different from natural  generation in one cru- 
cial respect. Whenever the abstract tree to be generated 
is incomplete (for instance the tree cap(Co)), that is, 
has some leaves which are yet uninstantiated variables, 
the generation process should not proceed with noude- 
terministically enumerating texts for all the possible in- 
stantiations of the initial incomplete structure. Instead it 
should display to the author as much of the text as it can 
in its present "knowledge state", and enter into an inter- 
action with the author to allow her to further refine the 
incomplete structure, that is, to further instantiate some 
of the uninstantiated leaves. To this purpose, it is use- 
ful to introduce along with the usual combinators (addr, 
fra, cap, etc.) new combinators of arity 0 called type- 
names, which are notated type, and are of type type. 
These combiuators are allowed to stand as leaves (e.g. in 
the tree cap(country)) and the trees thus obtained are 
said to be heterogeneous. The typenames are treated by 
the text generation process as if they were standard se- 
mantic units, that is, they are associated with text trails 
which arc generated "at their proper place" in the gen- 
erated output. These text units are specially phrased and 
highlighted to indicate to the author that some choice has 
to be made to reline the underlying type (e.g. obtaining 
248 
the text "la capimle de PAYS"). This choice has the efl'ect 
of further instantiating the incomplete tree with "true" 
combinators, and the gmmration process is iterated. 
Extended senmntics-driven eompositionality. The 
simple DCG view presented at the beginning of this sec- 
tion sees the process of generating text from an abstract 
structure as basically a compositional process on strings, 
that is, a process where strings are recursively associated 
with subtrees and concatenated to l~roduce strings at the 
next subtree level. But such a direct process of construct- 
ing strings Ires well-known limitations when the seman- 
tic and syntactic levels do not have such a direct corre- 
spondence (simple example: ordering a list of modifiers 
around a noun). We are currently experimenting with a 
powerful extension of string compositionality where the 
objects compositionally associated with abstract subtrees 
are not strings, but syntactic representations with rich in- 
ternal structure. The text itself is obtained fiom the syn- 
tactic representation associated with the total tree by Siln- 
ply enumerating its leaves. 
The picture we get of an IG grammar is tinally the 
following: 
aD,..(al(B,C .... ))-Syn --> 
bE,...(B)-SynB, 
CF,...(C)-SynC, 
{constraints(B,C,...,D,E,F,...)}, 
{compose_english(SynB, SynC, Syn)}. 
The rule shown is a rule for English: the syntactic 
representations are hmguage dependent; Parallel rules 
for tim other hmguages are obtained by replacing the 
compose engl'ish constraint (which is tmique to this 
rule) by constraints appropriate to the other hmguages 
under consideration. 
4 Conclusion 
XML-based authoring tools are more and more widely 
used in the business community for supporting the pro- 
duction of technical documentation, controlling their 
quality and improving their reusability. In this paper, 
we have stressed the connections between these practices 
and current research in natural  genenttion and 
authoring. We have described two related fornmlisms 
which are proposals for removing some of the limitations 
of XML DTD's when used for tim production of multi- 
lingual texts. 
From a compt, tational linguist's point of view, there 
might be little which seems novel or exciting in XML 
representations. Still XML has a great potential as a lin- 
gua.franca and in driving a large community of users 
towards authoring practices where content is becoming 
more and more explicit. There may be a great opportu- 
nity here for researchers in natural hmguage generation 
to connect to a growing sot, rce of applications. 
Acknowledgements 
Thanks for contributions, discussions and comments to 
Ken Beesley, Caroline Brtm, Jean-Pierre Chanod, Marie- 
Hdl8ne Corrdard, Pierre Isabelle, Bengt Nordstr6m, Syl- 
vain Pogodalla and Annie Zaenen. 

References 

C. Brun, M. l)ymetman, and V. Lux. 2000. l)ocument 
structure and multilinguat authoring. In Proceedings of 
First h~telwatiomd Natural lzmguage Generation Confer- 
ence (INLG '2000), Mitzpe P, amon, Israel, June. 

J. Coch. 1996. Evaluating and comparing three text production 
techniques. In Proceedings of the 16th International Conference on Computational Linguistics. 

1: l)eransart and J. Maluszynski. 1993. A Gramntatical View 
of Logic Programming. MIT Press. 

Thonms llallgren. 2000. Alfa Home Page. Awfilable fi'om 
http ://wm~. cs. chalmers, se/~hallgren/Alfa/ 

A. ltartley and C. Paris. 1997. Multilingual document produc- 
tion: fiom support for translating to support for authoring. 
In Machine Translation, Special Issue on New 7bols.for Htt- 
man 7)'anslators, pages 109-128. 

L. Magnusson and B. NordslrOm. 1994. The ALF proof editor 
and its proof engine. In Lecture Notes in Conqmler Science 
806. SpringeL 

P. Martin-L6f. 1984. hmdlionistic 7\]ype 7heoo,. Bibliopolis, 
Naples. 

W. Pardi. 1999. XML in Action. Microsoft Press. 

Femando C. N. Pereira and David II. D. Warren. 1980. Deft- 
nite clause grammars for  analysis. Artificial huel- 
ligence, 13:231-278. 

R. Power and D. Scott. 1998. Multilingual authoring using 
feedback texts. In Proceedings of the 17th International 
Conference on Computational linguistics and 36th Annual 
Meeting of the Association for Computational Linguistics, 
pages 1053-1059. 

P. Prescod. 1998. Fornmlizing SGMI, and XML In- 
stances and Schemata with Forest Automata Theory. 
http : //m~w. prescod, net/f orest/shorttut/. 

A. Ranta. 1994. 7~vpe-Theorelical Grammar. Oxford Univer- 
sity Press. 

Aarne Ranm. 2000. GF Work Page. Awfilablc fi'om 
h'c t;p://m,m, cs. chalmers, se/~aarne/(IF/ 
pub/work-index/ 

W3C, 1998. Exlensible Marktq~ Language (XML) 1.0, Febru- 
ary. W3C recommendation. 

W3C, 1999a. XML Schema - l'art h Strltctttres, Part 2 : 
Datatypes -, l)ecembe,. W3C Working draft. 

W3C, 1999b. XSL Transformations (XSLT), Novcmbe,; W3C 
recommendation. 
