An Empirical Study in Multilingual Natural Language 
Generation: What Should A Text Planner Do? 
Daniel Marcu 
Information Sciences Institute and 
Department of Computer Science 
University of Southern California 
4676 Admiralty Way, Suite 1001 
Marina del Rey, CA 90292 
marcu@isi.edu
Lynn Carlson
U.S. Department of Defense
Ft. Meade, MD 20755
lmcarls@afterlife.ncsc.mil
Maki Watanabe
Department of Linguistics
University of Southern California
Los Angeles, CA 90089
mwatanab@usc.edu
Abstract 
We present discourse annotation work aimed at constructing a parallel corpus of Rhetorical Structure trees for a collection of Japanese texts and their corresponding English translations. We discuss implications of our empirical findings for the task of text planning in the context of implementing multilingual natural language generation systems.
1 Introduction 
The natural language generation community has emphasized for a number of years the strengths of multilingual generation (MGEN) systems (Iordanskaja et al., 1992; Rösner and Stede, 1992; Reiter and Mellish, 1993; Goldberg et al., 1994; Paris et al., 1995; Power and Scott, 1998). These strengths concern the reuse of knowledge, the support for early drafts in several languages, the support for maintaining consistency when making changes, the support for producing alternative formulations, and the potential for producing higher quality outputs than machine translation. (The weaknesses concern the high cost of building large, language-independent knowledge bases, and the difficulty of producing high-quality, broad-coverage generation algorithms.)
From an economic perspective, the more a system can rely on language-independent modules for the purpose of multilingual generation, the better. If an MGEN system needs to develop language-dependent knowledge bases, and language-dependent algorithms for content selection, text planning, and sentence planning, it is difficult to justify its economic viability. However, if most of these components are language-independent and/or much of the code can be re-used, an MGEN system becomes a viable option.
Many of the early implementations of MGEN systems have adopted the perspective that text planners can be implemented as language-independent modules (Iordanskaja et al., 1992; Goldberg et al., 1994), possibly followed by a linearization stage, in which discourse trees are re-written to reflect language-specific constraints (Rösner and Stede, 1992; Stede, 1999). Although such an approach may be adequate for highly restricted text genres, such as weather forecasts, it usually poses problems for less restricted genres. Studies of instruction manuals (Rösner and Stede, 1992; Delin et al., 1994; Delin et al., 1996) suggest that there are variations with respect to the way high-level communicative goals are realized across languages. For example, Delin et al. (1994) noticed that sentences (1), (2), and (3), which were taken from a trilingual instruction manual for a step-aerobics machine, yield non-isomorphic Rhetorical Structure (Mann and Thompson, 1988) analyses in English, French, and German respectively (see Figure 1).
(1) English: [The stepping load can be altered 1] [by loosening the locking lever 2] [and changing the position of the cylinder foot 3].
(2) French: [Pour modifier la charge d'appui, 1] [desserrer les levieres 2] [puis déplacer le pied des vérins 3] ([To modify the load stepping 1] [loosen the levers 2] [then change the foot of the cylinder foot 3])
(3) German: [Nach Lockern der Klemmhebel 2] [kann 1] [durch Verschieben des Zylinderfußes 3] [die Tretbelastung verändert werden. 1] ([After loosening of the levers 2] [can 1] [by pushing of the cylinder foot 3] [the load changed be. 1])
However, previous discourse studies do not estimate how ubiquitous such non-isomorphic analyses are. Are the examples above an exception or the norm? Are non-isomorphic analyses specific to discourse structures built across elementary discourse units of single sentences, or do they also occur across sentences and paragraphs? If non-isomorphism is ubiquitous, how should an MGEN system be designed in order to effectively deal with non-isomorphic discourse structures when mapping knowledge bases into multiple languages?
In this paper, we describe an experiment that was designed to answer these questions. To investigate how discourse structures differ across languages, we manually built a parallel corpus of discourse trees of Japanese newspaper texts and their corresponding English translations. In Section 2, we present some of the problems specific to the construction of such a corpus. In Section 3, we present our experiment and discuss our empirical findings. In Section 4, we discuss the implications of our work for the task of text planning in the context of multilingual natural language generation.
[RST trees for the English, French, and German sentences (1)-(3); graphics not reproduced]
Figure 1: Contrasting multilingual discourse structure representations (Delin et al., 1994, p. 63)
2 Towards building a parallel corpus 
of discourse trees: an example 
Consider, for example, Japanese sentence (4), a word-by-word "gloss" of it (5), and a two-sentence translation of it that was produced by a professional translator (6).
(4) [Japanese source sentence, segmented into units 1-7; text not recoverable]
(5) [The Ministry of Health and Welfare last year revealed 1] [population of future estimate according to 2] [in future 1.499 persons as the lowest 3] [that after *SAB* rising to turn that 4] [*they* estimated but 5] [already the estimate misses a point 6] [prediction became. 7]
(6) [In its future population estimates 1] [made public last year, 2] [the Ministry of Health and Welfare predicted that the SAB would drop to a new low of 1.499 in the future, 3] [but would make a comeback after that, 4] [increasing once again. 5] [However, it looks as if that prediction will be quickly shattered. 6]
The labeled spans of text represent elementary discourse units (edus), i.e., minimal text spans that have an unambiguous discourse function (Mann and Thompson, 1988). If we analyze the text fragments closely, we will notice that in translating sentence (4), a professional translator chose to realize the information in Japanese unit 2 first (unit 2 in text (4) corresponds roughly to unit 1 in text (6)); to realize then some of the information in Japanese unit 1 (part of unit 1 in text (4) corresponds to unit 2 in text (6)); to fuse then information given in units 1, 3, and 5 in text (4) and realize it in English as unit 3; and so on. Also, the translator chose to re-package the information in the original Japanese sentence into two English sentences.
At the elementary unit level, the correspondence between Japanese sentence (4) and its English translation (6) can be represented as in (7), where j ⊂ e denotes the fact that the semantic content of unit j is realized fully in unit e; j ⊃ e denotes the fact that the semantic content of unit e is realized fully in unit j; j = e denotes the fact that units j and e are semantically equivalent; and j ≅ e denotes the fact that there is a semantic overlap between units j and e, but neither proper inclusion nor proper equivalence.
(7) j1 ⊃ e2; j1 ≅ e3;
    j2 = e1;
    j3 ⊂ e3;
    j4 ⊃ e4; j4 ≅ e5;
    j5 ≅ e3;
    j6 ⊂ e6;
    j7 ⊂ e6
Hence, the mappings in (7) provide an explicit representation of the way information is re-ordered and re-packaged when translated from Japanese into English. However, when translating text, it is not only that information is re-packaged and re-ordered; it is also that the rhetorical rendering changes. What is realized in Japanese using an ELABORATION relation can be realized in English using, for example, a CONTRAST or a CONCESSION relation.
[RST trees for the Japanese text (4), top, and the English text (6), bottom; graphics not reproduced]
Figure 2: The discourse structures of texts (4) and (6).
Figure 2 presents in the style of Mann and Thompson (1988) the discourse structures of text fragments (4) and (6). Each discourse structure is a tree whose leaves correspond to the edus and whose internal nodes correspond to contiguous text spans. Each node is characterized by a status (NUCLEUS or SATELLITE) and a rhetorical relation, which is a relation that holds between two non-overlapping text spans. (There are a few exceptions to this rule: some relations, such as the CONTRAST relation that holds between unit [3] and span [4,5] in the structure of the English text, are multinuclear.) The distinction between nuclei and satellites comes from the empirical observation that the nucleus expresses what is more essential to the writer's intention than the satellite, and that the nucleus of a rhetorical relation is comprehensible independent of the satellite, but not vice versa. Rhetorical relations that end in the suffix "-e" denote relations that correspond to embedded syntactic constituents. For example, the ELABORATION-OBJECT-ATTRIBUTE-E relation that holds between units 2 and 1 in the English discourse structure corresponds to a restrictive relative. We chose to label these relations because we have noticed that they often dominate complex discourse trees, whose elementary units are fully fleshed clauses.
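The tree representation described above can be sketched as a small data structure. This is only an illustrative sketch: the class name Node, the function collect_spans, and the convention of storing a relation on the parent of the spans it connects are assumptions, not part of the annotation tool. The statuses and the inner relation in the example fragment are placeholders, since the text only states that CONTRAST holds between unit [3] and span [4,5].

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Span = Tuple[int, int]


@dataclass
class Node:
    status: str                     # "NUCLEUS" or "SATELLITE"
    relation: Optional[str] = None  # relation among this node's children (assumed convention)
    edu: Optional[int] = None       # edu position; set for leaves only
    children: List["Node"] = field(default_factory=list)


def collect_spans(node: Node) -> Tuple[Span, List[Span]]:
    """Return the span covered by `node` and every span found in its subtree."""
    if node.edu is not None:
        leaf = (node.edu, node.edu)
        return leaf, [leaf]
    covered: List[Span] = []
    lows, highs = [], []
    for child in node.children:
        (low, high), below = collect_spans(child)
        lows.append(low)
        highs.append(high)
        covered.extend(below)
    span = (min(lows), max(highs))
    return span, covered + [span]


# English units 3-5: CONTRAST between unit [3] and span [4,5].
# Statuses and the "UNSPECIFIED" relation are placeholders.
fragment = Node("NUCLEUS", "CONTRAST", children=[
    Node("NUCLEUS", edu=3),
    Node("NUCLEUS", "UNSPECIFIED", children=[
        Node("NUCLEUS", edu=4),
        Node("SATELLITE", edu=5),
    ]),
])

top_span, all_spans = collect_spans(fragment)
```

Enumerating the spans of a tree in this way is what makes it possible to compare two trees span by span.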
If one knows the mappings at the edu level, one can determine the mappings at the span (discourse constituent) level as well. For example, using the elementary mappings in (7), one can determine that Japanese span [1,2] corresponds to English span [1,2], Japanese unit [4] to English span [4,5], Japanese span [6,7] to English unit [6], Japanese span [1,5] to English span [1,5], and so on. As Figure 2 shows, the CONCESSION relation that holds between spans [1,5] and [6,7] in the Japanese tree corresponds to a similar relation that holds between span [1,5] and unit [6] in the English tree (modulo the fact that, in Japanese, the relation holds between sentence fragments, while in English it holds between full sentences). However, the TEMPORAL-AFTER relation that holds between units [3] and [4] in the Japanese tree is realized as a CONTRAST relation between unit [3] and span [4,5] in the English tree. And because Japanese units [6] and [7] are fused into unit [6] in English, the relation ELABORATION-OBJECT-ATTRIBUTE-E is no longer made explicit in the English text.
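One plausible reading of how span-level correspondences follow from unit-level ones is sketched below: an English span is paired with the smallest Japanese span that covers every Japanese unit linked to one of its units. The pairs in LINKS are read off (7) with the relation types dropped; the names LINKS, japanese_sources, and covering_span are hypothetical, and the procedure itself is an assumption rather than the authors' stated algorithm.

```python
from typing import List, Tuple

# (japanese_unit, english_unit) pairs read off the mappings in (7);
# the type of each correspondence is deliberately ignored here.
LINKS: List[Tuple[int, int]] = [
    (1, 2), (1, 3), (2, 1), (3, 3), (4, 4), (4, 5), (5, 3), (6, 6), (7, 6),
]


def japanese_sources(e_low: int, e_high: int) -> List[int]:
    """Japanese units linked to any English unit in [e_low, e_high]."""
    return sorted({j for j, e in LINKS if e_low <= e <= e_high})


def covering_span(units: List[int]) -> Tuple[int, int]:
    """Smallest contiguous span that contains all the given units."""
    return (min(units), max(units))


# The four correspondences mentioned in the text.
english_to_japanese = {
    (1, 2): covering_span(japanese_sources(1, 2)),
    (4, 5): covering_span(japanese_sources(4, 5)),
    (6, 6): covering_span(japanese_sources(6, 6)),
    (1, 5): covering_span(japanese_sources(1, 5)),
}
```

Under this reading the sketch reproduces the four span pairs given in the text, for instance English span [4,5] against Japanese unit [4].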
Assume now that it is the task of an MGEN system to produce from a knowledge base texts (4) and (6). The system will have to select the appropriate information, generate text plans for the two texts, generate sentence plans, and realize them. Should the system generate a text plan having a structure similar to the RST analysis at the top or the bottom of Figure 2? Or something in between? As one can see, the discourse trees in Figure 2 are quite different: they suggest that depending on the output language, text plans should use different relations, different orderings of elementary units, different aggregations across semantic units, etc.
Some researchers may argue that the two RST analyses in Figure 2 are too specific; that they, in fact, correspond to text plans that have been already refined by an aggregation module and, arguably, even by a sentence planner. After all, the re-ordering of units 1 and 2 can be explained only in terms of different syntactic constraints in Japanese and English. We agree with such a concern. Nevertheless, as our experiment shows, significant differences across discourse trees are found not only for trees built at the sentence level, but also for trees built at the paragraph and text levels. For these levels, it is difficult to explain the differences in terms of language-specific syntactic constraints. Rather, it seems more adequate to assume that there are significant differences with respect to the way information is organized rhetorically across languages. The experiment described in the next section estimates quantitatively this difference.
3 Experiment
In order to assess how similar discourse structures are across languages, we built manually a corpus of discourse trees for 40 Japanese texts and their corresponding translations. The texts, selected randomly from the ARPA corpus (White and O'Connell, 1994), contained on average about 460 words. We developed a discourse annotation protocol for Japanese and English along the lines followed by Marcu et al. (1999). We used Marcu's discourse annotation tool (1999) in order to manually construct the discourse structure of all Japanese and English texts in the corpus. 10% of the Japanese and English texts were rhetorically labeled by two of us. The agreement was statistically significant (Kappa = 0.650, p < 0.01 for Japanese and Kappa = 0.748, p < 0.01 for English (Carletta, 1996; Siegel and Castellan, 1988)). The tool and the annotation protocol are available at http://www.isi.edu/~marcu/software/. For each pair of Japanese-English discourse structures, we also built manually an alignment file, which specified the correspondence between the edus of the Japanese and English texts.
Using labeled recall and precision figures, we computed the similarity between English and Japanese discourse trees with respect to their assignment of edu boundaries, hierarchical spans, nuclearity, and rhetorical relations. Because the trees we compared differ from one language to the other in the number of elementary units, the order of these units, and the way the units are grouped recursively into discourse spans, we computed two types of recall and precision figures.
In computing Position-Dependent (P-D) recall and precision figures, a Japanese span was considered to match an English span when the Japanese span contained all the Japanese edus that corresponded to the edus in the English span, and when the Japanese and English spans appeared in the same position linearly. For example, the English tree in Figure 2 is characterized by 10 sub-sentential spans, which span across positions [1,1], [2,2], [3,3], [4,4], [5,5], [6,6], [1,2], [4,5], [3,5], and [1,5]. (Span [1,6] subsumes 2 sentences, so it is not sub-sentential.) The Japanese discourse tree has only 4 spans that could be matched in the same positions with English spans, namely spans [1,2], [4,4], [5,5], and [1,5]. Hence the similarity between the Japanese tree and the English tree with respect to their discourse structure below the sentence level has a recall of 4/10 and a precision of 4/11 (in Figure 2, there are 11 sub-sentential Japanese spans).
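The arithmetic of the Position-Dependent example can be retraced with a few lines of code. The span lists and the count of 11 Japanese spans are taken from the example in the text; the variable and function names are hypothetical, and the sketch only counts matches that the text already lists rather than deciding which spans match.

```python
from fractions import Fraction

# Sub-sentential English spans of Figure 2, as listed in the text.
english_spans = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6),
                 (1, 2), (4, 5), (3, 5), (1, 5)]
# Spans the text reports as matched in the same linear position.
matched_in_place = [(1, 2), (4, 4), (5, 5), (1, 5)]
# Number of sub-sentential Japanese spans reported for Figure 2.
japanese_span_count = 11


def recall_precision(n_matched: int, n_english: int, n_japanese: int):
    """Recall is measured against the English spans, precision against the Japanese ones."""
    return Fraction(n_matched, n_english), Fraction(n_matched, n_japanese)


# Every matched span must be one of the English spans.
assert all(span in english_spans for span in matched_in_place)

recall, precision = recall_precision(
    len(matched_in_place), len(english_spans), japanese_span_count)
```

Here recall equals 4/10 and precision equals 4/11, the figures given above.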
Position-Independent (P-I) recall and precision figures are affected less than Position-Dependent figures when a Japanese span "floats" during the translation to a position in the English tree that is different from its position in the initial tree. The position-independent figures reflect the intuition that if two trees t1 and t2 both have a subtree t, t1 and t2 are more similar than they would be if they did not share any subtree. For instance, for the spans at the sub-sentential level in the trees in Figure 2, the position-independent recall is 6/10 and the position-independent precision is 6/11 because in addition to spans [1,2], [4,4], [5,5], and [1,5], one can also match Japanese span [1,1] to English span [2,2] and Japanese span [2,2] to English span [1,1]. The Position-Independent figures offer a more optimistic metric for comparing discourse trees. They span a wider range of values than the Position-Dependent figures, which enables a finer grained comparison, which in turn enables a better characterization of the differences between Japanese and English discourse structures.
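The contrast between the two kinds of figures can be illustrated on the Figure 2 example by recording each match as a (Japanese span, English span) pair and keeping only the pairs whose positions coincide for the position-dependent count. This is a sketch under that assumption; the names matches and scores are hypothetical, and the pairs and totals are those reported in the text.

```python
from fractions import Fraction

# (japanese_span, english_span) pairs reported for the sub-sentential level.
matches = [
    ((1, 2), (1, 2)), ((4, 4), (4, 4)), ((5, 5), (5, 5)), ((1, 5), (1, 5)),
    ((1, 1), (2, 2)),  # "floated" spans: same content, different position
    ((2, 2), (1, 1)),
]
N_ENGLISH, N_JAPANESE = 10, 11  # span totals reported for Figure 2


def scores(pairs, position_dependent: bool):
    """Return (recall, precision); P-D keeps only pairs in identical positions."""
    kept = [p for p in pairs if not position_dependent or p[0] == p[1]]
    return Fraction(len(kept), N_ENGLISH), Fraction(len(kept), N_JAPANESE)


pd_recall, pd_precision = scores(matches, position_dependent=True)
pi_recall, pi_precision = scores(matches, position_dependent=False)
```

The two floated spans are the only difference between the counts, which is why the P-I figures can never be lower than the P-D ones.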
In order to provide a better estimate of how close two discourse trees were, we computed Position-Dependent and -Independent recall and precision figures for the sentential level (where units are given by edus and spans are given by sets of edus or single sentences); paragraph level (where units are given by sentences and spans are given by sets of sentences or single paragraphs); and text level (where units are given by paragraphs and spans are given by sets of paragraphs). These figures offer a detailed picture of how discourse structures and relations are mapped from one language to the other. Some of the differences at the sentence level can be explained by differences between the syntactic structures of Japanese and English. The differences at the paragraph and text levels have a purely rhetorical explanation.
Level               Units                       Spans
                    P-D R  P-D P  P-I R  P-I P  P-D R  P-D P  P-I R  P-I P
Sentence             29.1   25.0   71.0   61.0   27.2   22.7   56.0   46.6
Paragraph            53.9   53.4   62.1   61.6   46.8   47.3   53.2   53.8
Text                 41.3   42.6   74.1   76.5   31.5   32.6   54.4   56.5
Weighted Average     36.0   32.5   69.6   63.0   31.8   28.4   55.2   49.2
All                   8.2    7.4   74.5   66.8    5.9    5.3   50.6   45.8

Level               Nuclei                      Relations
                    P-D R  P-D P  P-I R  P-I P  P-D R  P-D P  P-I R  P-I P
Sentence             21.3   17.7   44.3   36.9   14.9   12.4   30.5   25.4
Paragraph            38.6   39.0   43.3   43.8   31.9   32.3   35.1   35.5
Text                 28.8   29.9   48.5   50.4   26.1   27.1   41.1   42.7
Weighted Average     26.0   23.1   44.8   39.9   20.1   17.9   33.1   29.5
All                   4.4    3.9   39.4   35.7    3.3    3.0   26.8   24.3
Table 1: Similarity of the Japanese and English discourse structures
As expected, when one computes the recall and 
precision figures with respect to the nuclearity and 
relation assignments, one also factors in the nucle- 
arity status and the rhetorical relation that is asso- 
ciated with each span. 
Table 1 summarizes the results (P-D and P-I 
(R)ecall and (P)recision figures) for each level (Sen- 
tence, Paragraph, and Text). It presents Recall and 
Precision figures with respect to span assignment, 
nuclearity status, and rhetorical relation labeling of 
discourse spans. The numbers in the "Weighted 
Average" line report averages of the Sentence-, 
Paragraph-, and Text-specific figures, weighted ac- 
cording to the number of units at each level. The 
numbers in the "All" line reflect recall and precision 
figures computed across the entire trees, with no at-
tention paid to sentence and paragraph boundaries.
Given the significantly different syntactic structures of Japanese and English, we were not surprised by the low recall and precision results that reflect the similarity between discourse trees built below the sentence level. However, as Table 1 shows, there are astonishing differences between discourse trees at the paragraph and text levels as well. For example, the Position-Independent figures show that only about 62% of the sentences and only about 53% of the hierarchical spans built across sentences could be matched between the two corpora. When one looks at the nuclearity status and rhetorical relations associated with the spans built across sentences, the P-I recall and precision figures drop to about 43% and 35% respectively.
The differences in recall and precision are explained both by differences in the way information is packaged into paragraphs in the two languages and the way it is structured rhetorically both within and above the paragraph level.
4 How should a multilingual text 
planner work? 
The results in Section 3 strongly suggest that if one is to build text plans in the context of a Japanese-English multilingual generation system, a language-independent text planning module whose output is mapped straightforwardly into sentence plans (Iordanskaja et al., 1992; Goldberg et al., 1994) will not do. The differences between the rhetorical structures of Japanese and English texts are simply too big to support the derivation of a unique text plan that would subsume both the Japanese- and English-specific realizations. If we are to build MGEN systems capable of generating rich texts in languages as distant as English and Japanese, we will need to use more sophisticated techniques. In the rest of this section, we discuss a set of possible approaches, which are consistent with work that has been carried out to date in the NLG field.
4.1 Use text plan representations that are 
more abstract than discourse trees 
Delin et al. (1994) have shown that although the rhetorical renderings in Figure 1 are non-isomorphic, they are all subsumed by one common, more abstract text-plan representation language that formalizes the procedural relations of Generation and Enablement (Goldman, 1970). One can conceive of text plans being represented as sequences of actions or hierarchies of actions and goals over which one can identify Generation and Enablement relations that hold between them. In such a framework, text planning is carried out in a language-independent manner, which is then followed by a rhetorical "fleshing out". (Delin et al. (1994) have shown how Generation and Enablement relations are realized rhetorically in various languages using relations such as PURPOSE, SEQUENCE, CONDITION, and MEANS.)
Bateman and Rondhuis (1997) suggest that the variability present in Delin et al.'s Rhetorical Structure analyses in Figure 1 can be explained by the inadequate mixture of intentional and semantic relations, at different levels of granularity. They propose that discourse phenomena should be accounted for at a more abstract level than RST relations and they present a classification system in terms of "stratification", "metafunction", and "paradigmatic/syntagmatic axiality" that enables one to represent discourse structures at multiple levels of abstraction.
Adopting such an approach could be an extremely rewarding enterprise. Unfortunately, the research of Delin et al. (1994) and Bateman and Rondhuis (1997) cannot be applied yet to unrestricted domains. Generation and Enablement are only two of the abstract relations that can hold between actions and goals. And some texts, such as descriptions, are difficult to characterize only in terms of actions and goals. Building a "complete" taxonomy of such abstract relations and identifying adequate mappings between these relations and rhetorical relations are still open problems.
4.2 Derive a language-independent discourse structure, and then linearize it
Rösner and Stede (1992) and Stede (1999) assume that a discourse representation à la Mann and Thompson imposes no constraints on the linear order of the leaves. For the purpose of multilingual text planning, one can, hence, assume that a language-independent text planner derives first a language-independent rhetorical structure and then linearizes it, i.e., transforms it to make it language specific. The transformations that Rösner and Stede have applied concern primarily re-orderings of the children of some nodes and re-assignment of rhetorical relation labels. But given, for example, the significant differences between the discourse structures in Figure 2, it is difficult to envision what the language-independent text plan might look like. It is definitely possible to conceive of such a text plan representation. However, the linearization module will need then to be much more sophisticated: it will need to be able to rewrite full structures, re-order constituents, aggregate across possibly non-adjacent units, etc.
4.3 Implement a text planning algorithm for one language only. For all other languages, devise discourse-tree rewriting modules
In this approach, the system developer assigns a preferential status to one of the languages that are to be handled by the MGEN system. Let's call this language P. The system developer implements text planning algorithms only for this language. For any other language O, the developer implements a discourse-tree rewriting module capable of rewriting P-specific discourse structures into O-specific discourse structures. When generating texts in language P, the MGEN system works as a monolingual generator. When generating texts in language O, the MGEN system generates a text plan in language P, maps it into language O, and then proceeds further with the sentence planning and realization stages. Marcu et al. (2000) present and evaluate a discourse-tree rewriting algorithm that exploits machine learning methods in order to map Japanese discourse trees into discourse trees that resemble English-specific renderings.
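The control flow of this approach can be sketched as follows. Everything in the sketch is hypothetical: the function names, the dictionary used as a stand-in for a text plan, and the toy rewriter, which merely relabels the TEMPORAL-AFTER relation of the Japanese tree in Figure 2 as CONTRAST. It is not the learned rewriting algorithm of Marcu et al. (2000); it only shows where such a module would sit.

```python
from typing import Callable, Dict

TextPlan = Dict[str, object]  # hypothetical stand-in for a discourse tree


def make_planner(preferred: str,
                 plan_preferred: Callable[[str], TextPlan],
                 rewriters: Dict[str, Callable[[TextPlan], TextPlan]]):
    """Build a planner that plans in language P and rewrites for any other language O."""
    def plan(content: str, target: str) -> TextPlan:
        base = plan_preferred(content)      # text plan specific to language P
        if target == preferred:
            return base                     # monolingual case: no rewriting
        return rewriters[target](base)      # P-specific plan rewritten for O
    return plan


def toy_japanese_planner(content: str) -> TextPlan:
    # Toy plan: one relation over two units, as between units [3] and [4].
    return {"relation": "TEMPORAL-AFTER", "units": [3, 4]}


def toy_japanese_to_english(plan: TextPlan) -> TextPlan:
    # Toy rewriting rule: relabel the relation and leave the rest unchanged.
    rewritten = dict(plan)
    if rewritten["relation"] == "TEMPORAL-AFTER":
        rewritten["relation"] = "CONTRAST"
    return rewritten


planner = make_planner("ja", toy_japanese_planner, {"en": toy_japanese_to_english})
japanese_plan = planner("knowledge-base content", "ja")
english_plan = planner("knowledge-base content", "en")
```

Sentence planning and realization would then operate on whichever plan the planner returns.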
The advantage of such an approach is that the tree-rewriting modules can be also used in the context of machine translation systems in order to re-package and re-organize the input text rhetorically, to reflect constraints specific to the target language. The disadvantage is that, from an NLG perspective, there is no guarantee that such a system could produce better results than a system that implements language-dependent text planning modules.
4.4 Derive language-dependent text plans
Another viable approach is to acknowledge that text plans vary significantly across languages and, therefore, should be derived by language-dependent planners. To this end, one could use both top-down (Hovy, 1993; Moore and Paris, 1993) and bottom-up (Marcu, 1997; Mellish et al., 1998) text planning algorithms. The advantage of this approach is that it has the potential of producing trees that reflect the peculiarities specific to any language. The disadvantage is that only the text planning algorithms are general: the plan operators and the rhetorical relations they operate with are language-dependent, and hence, more expensive to develop and maintain.
4.5 Discussion 
Depending on the languages and text genres it operates with, an MGEN system may get away with a language-independent text planner. However, for sophisticated genres and distant languages, implementing a language-independent planner that is straightforwardly mapped into sentence plans does not appear to be a felicitous solution. We enumerated four possible alternatives for addressing the text planning problem in an MGEN system. Each of the approaches has its own pluses and minuses. Which will eventually win in large-scale deployable MGEN systems remains an open question.

References 
John A. Bateman and Klaas Jan Rondhuis. 1997.
Coherence relations: Towards a general specifica-
tion. Discourse Processes, 24:3-49.
Jean Carletta. 1996. Assessing agreement on clas-
sification tasks: The kappa statistic. Computa-
tional Linguistics, 22(2):249-254, June.
Judy L. Delin, Anthony Hartley, Cécile L. Paris, Do-
nia R. Scott, and Keith Vander Linden. 1994. Ex-
pressing procedural relationships in multilingual
instructions. In Proceedings of the Seventh Inter-
national Workshop on Natural Language Genera-
tion, pages 61-70, Kennebunkport, Maine, June.
J. Delin, D. Scott, and A. Hartley. 1996. Prag-
matic congruence through language-specific map-
pings from semantics to syntax. Technical report,
ITRI Research Report ITRI-96-12, University of
Brighton.
E. Goldberg, N. Driedger, and R. Kittredge. 1994.
Using natural-language processing to produce
weather forecasts. IEEE Expert, 9(2):45-53.
A.I. Goldman. 1970. A Theory of Human Action. 
Prentice Hall, Englewood Cliffs, NJ. 
Eduard H. Hovy. 1993. Automated discourse gen- 
eration using discourse structure relations. Artifi- 
cial Intelligence, 63(1-2):341-386, October. 
L. Iordanskaja, M. Kim, R. Kittredge, B. Lavoie, 
and A. Polguere. 1992. Generation of extended 
bilingual statistical reports. In Proceedings of 
the 14th International Conference on Compu- 
tational Linguistics (COLING'92), pages 1019- 
1023, Nantes, France. 
William C. Mann and Sandra A. Thompson. 1988. 
Rhetorical structure theory: Toward a functional 
theory of text organization. Text, 8(3):243-281. 
Daniel Marcu. 1997. From local to global coherence: 
A bottom-up approach to text planning. In Pro- 
ceedings of the Fourteenth National Conference on 
Artificial Intelligence (AAAI-97), pages 629-635, 
Providence, Rhode Island, July 28-31. 
Daniel Marcu, Estibaliz Amorrortu, and Magdalena
Romera. 1999. Experiments in constructing a
corpus of discourse trees. In Proceedings of the
ACL'99 Workshop on Standards and Tools for
Discourse Tagging, pages 48-57, University of
Maryland, June 22.
Daniel Marcu, Lynn Carlson, and Maki Watan-
abe. 2000. The automatic translation of discourse
structures. In Proceedings of the First Annual
Meeting of the North American Chapter of the As-
sociation for Computational Linguistics NAACL-
2000, Seattle, Washington, April 29 - May 3.
Chris Mellish, Alistair Knott, Jon Oberlander,
and Mick O'Donnell. 1998. Experiments using
stochastic search for text planning. In Proceed-
ings of the 9th International Workshop on Natural
Language Generation, pages 98-107, Niagara-on-
the-Lake, Canada, August 5-7.
Johanna D. Moore and Cécile L. Paris. 1993. Plan-
ning text for advisory dialogues: Capturing inten-
tional and rhetorical information. Computational
Linguistics, 19(4):651-694.
C. Paris, K. Vander Linden, M. Fischer, A. Hart-
ley, L. Pemberton, R. Power, and D. Scott. 1995.
A support tool for writing multilingual instruc-
tions. In Proceedings of the 14th International
Joint Conference on Artificial Intelligence (IJ-
CAI'95), pages 1398-1404, Montreal, Canada.
Richard Power and Donia Scott. 1998. Multilingual
authoring using feedback texts. In Proceedings of
the 36th Annual Meeting of the Association for
Computational Linguistics (ACL'98), Montreal,
Canada, August.
Ehud Reiter and Chris Mellish. 1993. Optimizing
the costs and benefits of natural language gen-
eration. In Proceedings of the 13th International
Joint Conference on Artificial Intelligence, pages
1164-1169.
Dietmar Rösner and Manfred Stede. 1992. Cus-
tomizing RST for the automatic production
of technical manuals. In R. Dale, E. Hovy,
D. Rösner, and O. Stock, editors, Aspects of Au-
tomated Natural Language Generation; 6th Inter-
national Workshop on Natural Language Gener-
ation, number 587 in Lecture Notes in Artificial
Intelligence, pages 199-214, Trento, Italy, April.
Springer-Verlag.
Sidney Siegel and N.J. Castellan. 1988. Non- 
parametric Statistics for the Behavioral Sciences. 
McGraw-Hill, second edition. 
Manfred Stede. 1999. Rhetorical structure and the-
matic structure in text generation. In Working
Notes of the Workshop on Levels of Represen-
tation in Discourse, pages 117-123, Edinburgh,
Scotland, July 7-9.
J. White and T. O'Connell. 1994. Evalua-
tion in the ARPA machine-translation pro-
gram: 1993 methodology. In Proceedings of
the ARPA Human Language Technology Work-
shop, pages 135-140, Washington, D.C. See also
http://ursula.georgetown.edu/.
