A Media-Independent Content Language for Integrated Text and 
Graphics Generation 
Nancy Green*, Giuseppe Carenini**, Stephan Kerpedjiev*, Steven Roth*, 
Johanna Moore** 
*Carnegie Mellon University, Pittsburgh, PA 15213 USA 
**University of Pittsburgh, Pittsburgh, PA 15260 USA 
Abstract 
This paper describes a media-independent knowl- 
edge representation scheme, or content language, for 
describing the content of communicative goals and 
actions. The language is used within an intelligent 
system for automatically generating integrated text 
and information graphics presentations about com- 
plex, quantitative information. The language is de- 
signed to satisfy four requirements: to represent in- 
formation about complex quantitative relations and 
aggregate properties; compositionality; to represent 
certain pragmatic distinctions needed for satisfying 
communicative goals; and to be usable as input by 
the media-specific generators in our system. 
1 Introduction 
This paper describes a media-independent 
knowledge representation scheme, or content 
language, for describing the content of commu- 
nicative goals and actions. The language is 
used within an intelligent system for automati- 
cally generating integrated text and information 
graphics 1 presentations about complex, quanti- 
tative information. The goal of the current im- 
plementation of the system is to produce analy- 
ses and summarizations of the quantitative data 
output by a transportation scheduling program. 
In our approach \[Kerpedjiev etal.1997a, 
Kerpedjiev et al.1997b, Green et a1.1998, 
This work was supported by DARPA contract num- 
ber DAA-1593K0005. le.g., charts, tables, maps, rather than pictorial forms 
of representation. 
69 
Kerpedjiev et ai.1998\], the content and orga- 
nization of a presentation is first planned at 
a media-independent level using a hierarchical 
planner \[Young1994\]. In this way, a high-level 
presentation goal, such as to assist the user to 
evaluate a transportation schedule created by 
the scheduling program, is ultimately decom- 
posed into media-independent subgoals, whose 
content is represented in the content language. 
The content language also is used to represent 
the content of the media-independent commu- 
nicative acts, e.g., Assert and Recommend, se- 
lected by the planner to satisfy these subgoals. 2 
Content language expressions are constructed 
by the plan constraint functions of the presen- 
tation plan operators during planning. 
The content language in the presentation 
plan is used by the system's two media-specific 
generators, one for text and one for information 
graphics. A media allocation component de- 
cides which parts of the plan shall be realized by 
each generator. The text generator transforms 
its assigned parts to sentence specifications, 
for realization by a general-purpose sentence 
generator (SURGE) \[Elhadad and Robin1996\]. 
The graphics generator transforms its assigned 
parts of the plan to a sequence of user tasks 
which a graphic must support in order to sat- 
isfy the presentation goals. The tasks are 
then input to a graphic design system (SAGE) 
2In other words, the content language describes that 
which is to be asserted, recommended, believed, etc., 
rather than the types of communicative acts to be per- 
formed or propositional attitudes which the acts are in- 
tended to achieve. 
\[Roth and Mattis1990, Roth et a1.1994\] which 
automatically designs and realizes a graphic 
supporting the tasks. 
One of the requirements for our content lan- 
guage is the ability to represent complex de- 
scriptions of quantitative database attributes, 
such as total port capacity of all ports and 
90~ of the total weight of the cargo arriving 
by clay 25. In addition to application-specific 
concepts such as port capacity, such descrip- 
tions involve the specification of application- 
independent quantitative relations (e.g., 90~ of 
...), aggregate properties of sets (e.g., total ... 
of all ... ), and time-dependent relations (e.g., 
increase from ... to ... during the interval ...). 
Thus, we would like for the language to be able 
to express a wide range of quantitative and tem- 
poral relations and aggregate properties, rather 
than just those required for the current domain 
of transportation scheduling. 
Another requirement is for the content lan- 
guage to represent these descriptions composi- 
tionally. A compositional representation should 
facilitate the work of the text and graphics gen- 
erators, as well as media coordination. 
A third requirement for the content lan- 
guage is the ability to represent subtle differ- 
ences in communicative intention with respect 
to the same data. To give an example in the 
domain which will be used for illustration in 
the rest of the paper, the same data 3 could un- 
derly either the assertion that Three newspapers 
that are circulated in Pittsburgh carry only na- 
tional news or the assertion that Three news- 
papers that carry only national news are circu- 
lated in Pittsburgh. However, while conveying 
the same facts about the three newspapers, the 
two assertions are not interchangeable. The first 
assertion would be more effective than the sec- 
ond in an argument such as 
Be careful which newspaper you read to 
find out what is going on locally. The 
3All data used in the paper is fictitious. However, 
many of the examples were inspired by a naturally oc- 
curing example about the numbers of readers of newspa- 
pers read in Pittsburgh. We have selected this domain 
for illustration because it requires minimal background 
knowledge. 
70 
Post-Gazette covers both national and 
local news, but three newspapers that 
are circulated in Pittsburgh carry only 
national news. 
while the second would be more effective than 
the first in 
Pittsburghers are interested in national 
affairs. In fact, three newspapers that 
carry only national news are circulated 
in Pittsburgh. 
As will be shown later in the paper, the con- 
tent language enables related assertions such as 
these to be differentiated. 
A final requirement is for the representa- 
tion scheme to be media-independent in order 
to provide a common input language for the 
media-specific generators. We assume that such 
a common language will facilitate the difficult 
problem of media coordination. On the other 
hand, the language must satisfy the needs of 
both the text and information graphics genera- 
tors. 
In the rest of the paper, first we describe 
the content language, focusing on aspects of the 
content language which are applicable to other 
domains. Next, we illustrate how subtle varia- 
tions in communicative intention can be repre- 
sented in the content language, and give exam- 
ples of how they can be expressed in text and 
information graphics. Finally, we describe some 
related work. 
2 Content Language 
In order to ensure that the language would be 
applicable to a variety of quantitative domains, 
we first performed a corpus analysis, the results 
of which are summarized in the next section. 
Then we describe the syntax we adopted to sat- 
isfy the requirements given in the introduction. 
2.1 Corpus Analysis 
We have collected samples of presentations with 
integrated natural language and graphics in or- 
der to describe and analyze the vocabulary and 
structure of such presentations. To ensure gen- 
erality, the corpus includes presentations from 
different disciplines (Economics and Medicine) 
and intended for different audiences. 4 It also 
includes samples from collections of presenta- 
tions compiled by others, such as \[Tufte1983, 
Tufte1990, Tufte1997, Kosslyn1994\], and pre- 
scriptive examples found in books on how 
to design effective presentations \[Zelazny1996, 
Kosslyn1994\]. 
The analysis of this corpus contributed di- 
rectly to the development of a vocabulary for 
the content language. To describe the content of 
the presentations in the corpus, we distinguish 
three different sets of predicates with associated 
modifiers, as follows: 5 
• Comparison Predicates: \[much\] Greater, 
Lower I Highest, Lowest I \[very\] Far-from, 
Close-to I \[almost I exactly \] Equal, n- 
Times.Comparison Predicates apply to any 
quantitative attribute of individuals or sets, 
e.g., On this measure Central Europe's stock- 
markets are still puny compared with those of 
fast-growing Asian countries. 
• Global Predicates: \[widelylslightly\] Vary 
\[:from :to\], Constant. 
Global Predicates apply to quantitative at- 
tributes of sets, e.g., Sales representative per- 
formance is uneven. 
• Trend Predicates: Remain-constant I \[con- 
siderably I slightly\] Increase, Decrease \[:from 
:to\] I Drop, Fall, Rise \[:from :to\] I Reach-a- 
Plateau I Fluctuate, 
Trend Predicates apply only to time series (a 
set of data ordered chronologically), e.g., Pro- 
duction of television sets in Russia fell from 
4.5m units in 1991 to fewer than lm in 1995. 
2.2 Syntax 
The first three requirements described in the In- 
troduction (representing quantitative and tem- 
poral relations and aggregate properties, com- 
4Economics: The Economist (March-August 1996). 
Medicine: UC Berkeley Wellness Letter (June 1993 and 
September 1996), Scientific American (September 1996), 
New England Journal o\] Medicine (April-August 1996). 
5Square brackets indicate optionality, the bar 
exclusive-or, and commas separate variants with differ- 
ing orientations. 
71 
positionality, and representing certain prag- 
matic distinctions) led us to make use of a first- 
order logic with restricted quantification (RQ- 
FOL), which has been used for representing 
the meaning of natural language queries involv- 
ing complex referring expressions \[Woods1983, 
Webber1983\]. The features of RQFOL most 
useful for our purposes are (i) that it permits 
pragmatic distinctions to be made among ex- 
pressions which are semantically equivalent, and 
(ii) that it supports the compositional specifica- 
tion of complex descriptions of discourse entities 
\[Webber1983\]. 
A pragmatic distinction supported in RQ- 
FOL and our content language is the distinc- 
tion between the main predication of an expres- 
sion and information to be conveyed about the 
objects of the main predication. For example, 
although (la) and (lb) are semantically equiv- 
alent with (lc), they are not interchangeable in 
their effectiveness for achieving different com- 
municative intentions (as was demonstrated in 
the Introduction.) In (la) the main predication 
is about news coverage, whereas in (lb) it is 
about newspaper circulation. 
(la) Three newspapers that are circulated in 
Pittsburgh carry only national news. 
(lb) Three newspapers that carry only national 
news are circulated in Pittsburgh. 
(lc) There is a set of three newspapers such that 
for every newspaper in the set, it is 
circulated in Pittsburgh and carries only 
national news. 
To represent this distinction in the content 
language, a communicative act has the form, 
(Act Proposition Referents), where Act specifies 
the type of action (such as Assert), Proposition 
is a quantifier-free FOL formula describing the 
main predication, and Referents is a list describ- 
ing the arguments of the main predication. (It 
is assumed that the agent performing a com- 
municative action is the system, and that the 
audience is the user.) For example, (la) and 
(Ib) can be analyzed ass realizing the assertions 
(2a) and (2b), respectively. In (2a), the main 
predication is (has-coverage ?dl National-only); 
the variable ?dl is further described as three 
newspapers that are circulated in Pittsburgh. 6 
In (2b), the main predication is (has-circulation 
?dI Pittsburgh); the variable ?dl is further de- 
scribed as three newspapers whose coverage is 
national news only. 
(2a) (Assert (has-coverage ?dl National-only) 
((?dl (for (cardinal 3) ?x newspaper 
(has-circulation ?x Pittsburgh))))) 
(2b) (Assert (has-circulation ?dl Pittsburgh) 
((?dl (for (cardinal 3) ?x newspaper 
(has-coverage ?x National-only))))) 
In general, each element of the Referents list 
has the form (term description), where term is 
a variable or a database object identifier; and 
term denotes a discourse entity. If provided, de- 
scription specifies information about term that 
is required to achieve the goal(s) of the commu- 
nicative act, as opposed to information whose 
only function is to enable the audience to iden- 
tify the entity. Only descriptions with an at- 
tributive function are specified in the presenta- 
tion plan. Referential descriptions, whose func- 
tion is only to enable the audience to identify 
an entity, are constructed by the media-specific 
generators. (For information about the different 
roles of attributive and referential descriptions 
in our system, see \[Green et a1.1998\].) In gen- 
eral, description is of the form (for quantifier 
variable class restriction). (In (2a) and (2b), 
quantifier is the cardinal 3, the class is news- 
paper, and the restriction is (has-circulation 
?z Pittsburgh) and (has-coverage ?x National- 
only), respectively.) 
Complex descriptions can easily be ex- 
pressed in a compositional manner in the con- 
tent language. For example, (3a) is a possible 
realization in text of the assertion given in (3b). 
(A graphic realizing (3b) is shown in (3c) of Fig- 
ure 1.) In (3b), the main predication, (gt ?dl 
?d2), is that ?dl is greater than ?d2. ?dl is to 
be described as the unique integer ?x such that 
?x is the number of readers of SPPG. ($PPG 
is a database object denoting the Post-Gazette.) 
?d2 is described as the unique integer ?x such 
6By convention, symbols prefixed with ? are variables, 
and symbols prefixed with $ are database identifiers. 
72 
that ?x is the total of ?d3; ?d3 is described as 
the unique set of integers ?y such that ?y is the 
number of readers of ?d4; and ?d4 is described 
as the elements of the set ($WSJ, $NYT, and 
$USA), (whose elements are database objects 
denoting the Wall Street Journal, the New York 
Times, and USA Today, respectively). 
(3a) 
(3b) 
The number of readers of the Post-Gazette is 
greater than the number of Pittsburgh 
readers of the New York Times, the Wall 
Street Journal, and USA Today combined. 
(Assert (gt ?dl ?d2) 
((?dl (for the ?x integer 
(has-number-of-readers SPPG ?x))) 
($PPG 0) 
(?d2 (for the ?x integer (total ?x ?d3))) 
(?d3 (for the ?w set (lambda ?y integer 
(has-number-of-Pitts-readers ?d4 ?y))) 
(?d4 (for all ?z newspaper 
(in-set ?z ($WSJ $NYT SUSA)))) 
($WSJ 0) ($NYT 0) ($USA 0)) 
3 Examples 
In this section we illustrate how different com- 
municative intentions about the same data can 
be represented in the content language, and how 
these intentions can be expressed in text and in- 
formation graphics. One goal of this exercise is 
to illustrate what distinctions can be expressed 
graphically, but not what information should be 
expressed in graphics. (The problem of deciding 
which media to use, media allocation, is beyond 
the scope of this paper.) Thus, the examples 
of graphics are minimal in the sense that they 
have been designed to convey the information 
to be asserted and as little as possible other in- 
formation. However, in some cases it is not pos- 
sible not to convey more in graphics than was 
intended. 
For example in (3c) in Figure 1, which re- 
alizes (3b), the graphic also conveys informa- 
tion about relative numbers of readers of each 
of the newspapers, e.g., that the Post-Gazette 
has about one-third more than the sum of the 
others, and that the others have about the same 
number of readers each. Note that although 
it is not the communicative intention in (3b) 
(3c) (4c) 
(5c) 
• New York Times 
Wall Street Journal 
~ USA Today 
PPG 
NAT 
number of readers 
PPG: Post-Gazette 
NAT: newspapers with national 
coverage only that are read 
in Pittsburgh 
~///// 
• Post-Gazette .............. 
44 Newspapers with national 
coverage only that are 
read in Pittsburgh 
(6c) 
Coverage of newspapers 
read in Pittsburgh 
Local National 
Coverage Coverage 
Post-Gazette • • 
other ............ ~ ...... 
newspapers ........... 
........................................... 
Figure 1: Assertions expressed in graphics 
to convey the particular numbers of readers of 
each newspaper (hence the x-axis does not show 
actual numbers), information about the actual 
numbers of readers of each newspaper is needed 
during graphics generation to design (3c). (If 
the presentation's intention was to convey the 
particular numbers of readers of the newspa- 
pers, then different assertions specifying the ac- 
tual numbers would be planned.) 
Whereas in (3b), four newspapers are indi- 
viduated, it is possible to make an assertion 
such as (4b) in which the members of the set 
($NAT) of newspapers with only national cov- 
erage are not individuated. The assertion in 
(4b) could be expressed in text as (4a), or in 
graphics in (4c) in Figure 1. However, this 
graphic still expresses more than (4b), e.g., that 
the number of PPG readers is about one-third 
more than the number of NAT readers (even 
though the x-axis does not show the actual num- 
bers of readers). 
(4a) The number of readers of the Post-Gazette is 
greater than the total number of readers of 
the newspapers read in Pittsburgh with 
national coverage only. 
73 
(4b) (Assert (gt ?dl ?d2) 
((?dl (for the ?x integer 
(has-number-of-readers SPPG ?x))) 
($PPG 0) 
(?d2 (for the ?y integer 
(has-total-number-of-Pitts-readers 
SNAT ?y))) 
($NAT (for the ?w set (lambda ?z newspaper 
(and (has-coverage ?z National-only) 
(has-circulation ?z Pittsburgh)))))) 
In contrast to (3b), (hb) differentiates the 
members of NAT, but does not identify or oth- 
erwise describe them. (hb) could be expressed 
in text as (ha), and in graphics as in (5 c) in Fig- 
ure 1. Once again, the graphic has side-effects. 
In this case, it conveys additional information 
about the relative numbers of readers among 
the newspapers with national coverage only, and 
the fact that there are three of those newspa- 
pers. Comparing (5c) to (3c), in (3c) the total 
number of readers of the three other newspapers 
is expressed by concatenating segments of bars 
representing the three newspapers into a single 
bar whose length represents the total number of 
readers of the three newspapers. Although this 
information can be computed from (5c), it is not 
directly realized in the graphic. 
(Sa) The number of readers of the Post-Gazette is 
greater than the number of readers in 
Pittsburgh of any newspaper with national 
coverage only. 
(55) (Assert (gt ?all ?d2) 
((?dl (for the ?x integer 
(has-number-of-readers $PPG ?x))) 
($PPG O) 
(?d2 (for the ?y integer 
(has-number-of-Pitts-readers ?d3 ?y))) 
(?d3 (for each ?z newspaper 
(has-coverage ?z National)))) 
In contrast to the preceding examples, (6b) 
illustrates a communicative intention (about 
the same data as in the other examples) with a 
different main predication. In text, (6b) could 
be expressed as in (6a); the main predication is 
about the coverage of the Post-Gazette rather 
than about the number of readers. This dif- 
ference in main predication results in a graphic 
such as (6c) in Figure i with a different struc- 
ture than those of the preceding examples. 
(6a) Only 1 of the newspapers read in Pittsburgh, 
the Post-Gazette, has both national and 
local coverage. 
(6b) (Assert (has-coverage $PPG Local-National) 
(($PPG (for (only 1) ?x newspaper 
(in-set ?x ?dl))) 
(?dl (for the ?w set (lambda ?x newspaper 
(has-circulation ?x Pittsburgh))))) 
4 Related Work 
Several projects 
have studied the problem of media-independent 
knowledge representation schemes for auto- 
matic generation of multimedia presentations. 
The COMET \[Feiner and McKeown1991\] and 
WIP \[Wahlster et ah1993\] systems generate in- 
structions for operating physical devices, and 
\[Maybury1991\] describes a system that designs 
narrated or animated route directions in a car- 
tographic information system. These systems 
represent content about complex sequences of 
actions the user can perform on the physical 
device and their effects, as well as spatial con- 
74 
cepts. However, this work is not relevant to 
information graphics generation. 
The multimedia system whose focus is clos- 
est to ours, 
PostGraphe \[Fasciano and Lapalme1996\], is a 
system that generates multimedia statistical re- 
ports consisting of graphics and text. However, 
there are some fundamental differences with our 
approach. First, in Postgraphe it is assumed 
that a presentation is about the entire dataset, 
whereas our content language can be used to 
describe subsets and individuals in the dataset. 
Second, in Postgraphe graphics are generated 
directly from its knowledge representation lan- 
guage; then text is generated based upon the 
graphics. Thus, it is not clear whether the lan- 
guage is truly media-independent, i.e., whether 
it could be used to generate text directly. Also, 
Postgraphe's language of intentions is less gen- 
eral than our approach of generating presenta- 
tion plans for achieving communicative goals. 
For example, in Postgraphe the language can 
be used to specify the intention to compare two 
variables of a dataset in a way that emphasizes 
an increase. In our approach, complex argu- 
ments can be planned. 
5 Conclusions 
This paper describes a media-independent 
knowledge representation scheme, or content 
language, for describing the content of commu- 
nicative goals and actions. The language is 
used within an intelligent system for automati- 
cally generating integrated text and information 
graphics presentations about complex, quanti- 
tative information. To ensure that the language 
will be applicable to a variety of quantitative 
domains, it is based upon a corpus analysis of 
integrated natural language and graphics pre- 
sentations. The language is designed to sat- 
isfy four requirements: to represent information 
about complex quantitative relations and aggre- 
gate propertiess; compositionality; to represent 
certain pragmatic distinctions needed for satis- 
fying communicative goals; and to be usable as 
part of the input to the media-specific (text and 
graphics) generators. 

References 
\[Elhadad and Robin1996\] M. Elhadad 
and J. Robin. 1996. An overview of SURGE: A 
reusable comprehensive syntactic realization com- 
ponent. Technical Report Technical Report 96-03, 
Dept of Mathematics and Computer Science, Ben 
Gurion University, Beer Sheva, Israel. 
\[Fasciano and Lapalme1996\] M. Fasciano and 
G. Lapalme. 1996. PostGraphe: a System for the 
Generation of Statistical Graphics and Text. In 
Proceedings of the 8th International Natural Lan- 
guage Generation Workshop, pages 51-60, Sus- 
sex, UK, June. 
\[Feiner and McKeown199i\] S. Feiner and K. McKe- 
own. 1991. Automating the generation of coordi- 
nated multimedia explanations. IEEE Computer, 
24(10):33-40, October. 
\[Green et a1.1998\] 
Nancy Green, Giuseppe Carenini, and Johanna 
Moore. 1998. A principled representation of at- 
tributive descriptions for integrated text and in- 
formation graphics presentations. In Proceedings 
of the Ninth International Workshop on Natural 
Language Generation, Niagara-on-the-Lake, On- 
tario, Canada. To appear. 
\[Kerpedjiev et al.1997a\] S. Kerpedjiev, G. Carenini, 
S. Roth, and J. Moore. 1997a. AutoBrief: a multi- 
media presentation system for assisting data anal- 
ysis. Computer Standards and Interfaces, 18:583- 
593. 
\[Kerpedjiev et al.1997b\] Stephan Kerped- 
jiev, Giuseppe Carenini, Steven F. Roth, and Jo- 
hanna D. Moore. 1997b. Integrating planning and 
task-based design for multimedia presentation. In 
International Conference on Intelligent User In- 
terfaces (IUI '97), pages 145-152. Association for 
Computing Machinery. 
\[Kerpedjiev et a1.1998\] 
Stephan Kerpedjiev, Giuseppe Carenini, Nancy 
Green, Steven F. Roth, and Johanna D. Moore. 
1998. Saying it in graphics: from intentions to 
visualizations. In Proceedings of the Symposium 
on Information Visualization (Info Vis '98). IEEE 
Computer Society Technical Committee on Com- 
puter Graphics. To appear. 
\[Kosslyn1994\] Stephen M. Kosslyn. 1994. Elements 
of Graph design. W.H. Freeman and Company. 
\[Maybury1991\] Mark T. Maybury. 1991. Plan- 
ning multimedia explanations using communica- 
tive acts. In Proceedings of the Ninth National 
Conference on Artificial Intelligence, pages 61- 
66, July. 
\[Roth and Mattis1990\] S.F. Roth and J. Mattis. 
1990. Data characterization for intelligent graph- 
ics presentation. In Proceedings of the Confer- 
ence on Human Factors in Computing Systems 
(SIGCHI '90), pages 193-200. 
\[Rothet a1.1994\] Steven F. Roth, John Koloje- 
jchick, Joe Mattis, and Jade Goldstein. 1994. 
Interactive graphic design using automatic pre- 
sentation knowledge. In Proceedings of the Con- 
ference on Human Factors in Computing Systems 
(SIGCHI '94), pages 112-117. 
\[Tufte1983\] Edward R. Tufte. 1983. The Visual Dis- 
play of Quantitative Information. Graphics Press, 
Cheshire, Conn. 
\[Tufte1990\] Edward R. Tufte. 1990. Envisioning 
information. Graphics Press, Cheshire, Conn. 
\[Tufte1997\] Edward R. Tufte. 1997. Visual Expla- 
nations. Graphics Press, Cheshire, Conn. 
\[Wahlster et a1.1993\] 
W. Wahlster, E. Andre, W. Finkler, H.-J. Prof- 
itlich, and T. Rist. 1993. Plan-based integration 
of natural language and graphics generation. Ar- 
tificial Intelligence, 63:387-427. 
\[Webber1983\] Bonnie L. Webber. 1983. So what can 
we talk about now? In B. Grosz, K. S. Jones, and 
B. L. Webber, editors, Readings in Natural Lan- 
guage Processing. Morgan Kaufmann, Los Altos, 
California. 
\[Woods1983\] W. Woods. 1983. Semantics and quan- 
tification in natural language question answering. 
In B. Grosz, K. S. Jones, and B. L. Webber, ed- 
itors, Readings in Natural Language Processing. 
Morgan Kaufmann, Los Altos, California. 
\[Young1994\] Michael R. Young. 1994. A developer's 
guide to the Longbow discourse planning system. 
Technical Report ISP TR Number: 94-4, Univer- 
sity of Pittsburgh, Intelligent Systems Program. 
\[Zelazny1996\] Gene Zelazny. 1996. Say it with 
charts: the executive's guide to visual communi- 
cation. IRWIN Professional Publishing. 
