<?xml version="1.0" standalone="yes"?>
<Paper uid="W96-0406">
  <Title>PostGraphe: a system for the generation of statistical graphics and text</Title>
  <Section position="3" start_page="0" end_page="1985" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Graphics and text are very different media. Fortunately, when their integration is successful, they complement each other very well: a picture shows whereas a text describes. In this research, we are studying the interaction between the text of a statistical report and its figures. Reports are an organized synthesis of data that span a whole array of forms going from tables of numbers to a text summarizing the findings. Statistical reports are particularly interesting because the reader can easily be overwhelmed by the raw data. Without an appropriate preliminary statistical analysis to make the important points stand out, and without an efficient organization and presentation, the reader might be lost. In this paper, we present the important factors in the generation process as well as its important steps. We then give an overview of a statistical report generator called PostGraphe.</Paragraph>
    <Paragraph position="1"> 2 Important factors in the generation process A number of factors have to be considered in order to produce a statistical report containing text and graphics. These factors include the writer's goals, the types and values of the variables to be presented, and the relations between these variables. The writer's goals have a major role in the generation process. As we can see in figures 1 and 2, the same data can be expressed in very different ways according to the message the writer wishes to transmit. The example presents the same set of data (profits during the years 1971-1976) according to two different perspectives which reflect the writer's goals or intentions. In figure 1, the goal is to present the evolution of the profits during the relevant time period. In figure 2, the message is totally different, and corresponds to a different goal: to compare the profits for the 6 years of the data set. Because of its temporal nature, the usual way of presenting this data is the message of evolution. The difference can be seen in the organization of the graphs and in the wording of the text. In figure 1, the evolution is emphasized by using the horizontal axis for the years [20, 3]. This is the accepted way of presenting temporal data. The years are sorted in ascending order, also to give the impression of evolution. The associated text describes the overall evolution and points out an interesting irregularity. On the other hand, the writer's intention for figure 2 is totally different. In order to show a comparison, a few structural changes have to be made. First of all, the years are presented on the vertical axis, thus eliminating the impression of evolution [20]. This change is important to the perception of the graph because it makes its message clearer by eliminating a false inference.</Paragraph>
    <Paragraph position="2"> Second, the years are treated as a nominal variable instead of an ordinal one, and thus sorted according to the profit values. This reordering has two positive effects: it further destroys the impression of evolution by making the years non-sequential and it allows a better comparison of the profits [9]. The text is also different from the one in figure 1: instead of describing how the profits evolved, it merely points out the best and the worst years for profits. This difference in perspective is important for a writer, especially when trying to convey more subjective messages [10].</Paragraph>
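    <Paragraph> The effect of the writer's goal on the layout of figures 1 and 2 can be sketched as follows. This is only a minimal illustration in Python, not the PostGraphe implementation: the function name layout_for_goal, the goal labels and the profit values are hypothetical, chosen only to agree with the text associated with the two figures.

```python
# Hypothetical profit values; the paper does not list the actual figures.
PROFITS = {1971: 16, 1972: 12, 1973: 11, 1974: 9, 1975: 18, 1976: 13}


def layout_for_goal(data, goal):
    """Return the axis used for the years and the order in which they appear."""
    if goal == "evolution":
        # Years are ordinal: horizontal axis, ascending order.
        return {"year_axis": "horizontal", "order": sorted(data)}
    if goal == "comparison":
        # Years are treated as nominal: vertical axis, sorted by profit value.
        by_profit = sorted(data, key=lambda year: data[year], reverse=True)
        return {"year_axis": "vertical", "order": by_profit}
    raise ValueError("unknown goal: " + goal)


evolution = layout_for_goal(PROFITS, "evolution")
comparison = layout_for_goal(PROFITS, "comparison")
```

    With these illustrative values, the comparison layout starts with 1975 and 1971 and ends with 1974, which agrees with the text associated with figure 2.</Paragraph>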
    <Paragraph position="3"> If the communicative goals aren't well identified, it is very easy to convey the wrong impression to the reader. This problem is often complicated by the fact that a single graph or text can convey many messages at once, some more direct than others. For example, figures 3 and 4 show two graphs that share a subset of intentions. The main message is one of evolution in figure 3 and correlation in figure 4, but both graphs also transmit, with lower efficiency, the main message of the other graph. Correlation is perceptible in the line graph because the two sets of data can be followed together and evolution can be perceived in the point graph because significant year clusters are marked by different shapes. Thus, determining which types of graphs or text best satisfy single goals is not sufficient; one also has to take into account the cumulative influence of the secondary messages conveyed by all parts of the report.</Paragraph>
    <Paragraph position="5"> Globally, the profits have gone down despite a strong rise from 1974 to 1975.</Paragraph>
    <Paragraph position="6"> The profits were at their highest in 1975 and 1971. They were at their lowest in 1974, with about half their 1975 value.</Paragraph>
    <Paragraph position="7"> As might be expected, the types of variables give a lot of information about the structure of the elements of the report [2, 12, 13]. For example, although a continuous variable is better represented by a line graph, the nature of a discrete variable will become more apparent using a column graph. Graphics-only systems can get away with a simple type-system as presented in [12, 13]. This type system classifies the visual and organizational properties of data variables using such categories as nominal, ordinal, and quantitative. A more complex classification is helpful in general as it allows the classification of other useful properties, e.g. temporal, but in the case of text generation, it becomes necessary in order to express the units of the variables. For example, knowing that "May" and "July" are months allows a generator to produce temporal expressions such as "two months later" [11].</Paragraph>
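    <Paragraph> As a rough sketch of how such a type system might be used, the Python fragment below attaches a scale, a continuity flag and a unit to each variable. The class Variable and the functions default_graph and months_later are hypothetical names introduced for illustration; they are assumptions and do not describe the actual type system of PostGraphe or of the systems cited above.

```python
from dataclasses import dataclass
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
NUMBER_WORDS = ["one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven"]


@dataclass
class Variable:
    name: str
    scale: str                  # "nominal", "ordinal" or "quantitative"
    continuous: bool
    unit: Optional[str] = None  # e.g. "month"; needed for text generation


def default_graph(var: Variable) -> str:
    """Line graph for continuous variables, column graph for discrete ones."""
    return "line graph" if var.continuous else "column graph"


def months_later(var: Variable, start: str, end: str) -> str:
    """Build a temporal expression between two months of the same year."""
    if var.unit != "month":
        raise ValueError("the unit of the variable must be 'month'")
    gap = MONTHS.index(end) - MONTHS.index(start)
    if gap not in range(1, 12):
        raise ValueError("end must come after start in the same year")
    unit = "month" if gap == 1 else "months"
    return f"{NUMBER_WORDS[gap - 1]} {unit} later"


month = Variable("month", "ordinal", continuous=False, unit="month")
expression = months_later(month, "May", "July")
```

    Here the unit is what makes the expression possible: without knowing that the values are months, a generator could only repeat the two labels.</Paragraph>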
    <Paragraph position="8"> To further refine the selection process, we have to take into account not only the types, but also the specific values of the data samples. The number of values sometimes has a lot of influence on the choice of an expression schema. For example, a discrete variable with 200 values will often be treated as continuous, thus overriding the influence of its natural type. In other cases, the range of values has a strong influence. Indeed, as can be seen in figure 5, a seemingly good choice can be invalidated when the range of values is extreme. These factors influence the structure and contents of a statistical report and have to be looked at simultaneously in order to be effective. Many systems based on APT [12, 13] use types to determine structure, but specific values are often overlooked and the simultaneous use of types and goals is rare. To further illustrate the importance of simultaneous application of these factors, let's look at figure 5 again. In this graph the small values are not readable because of the scale. In general, this is considered a problem and can be corrected by using a different scale (logarithmic or split). However, if the intention of the writer is to illustrate the enormous difference between company D and the others, the graph is very efficient as it is.</Paragraph>
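    <Paragraph> The interplay of types, values and goals described above can be sketched in Python as follows. The thresholds MANY_VALUES and EXTREME_RATIO, the goal label and the sample values are all assumptions made for illustration; the paper gives no numeric thresholds and does not list the data of figure 5.

```python
MANY_VALUES = 50      # hypothetical: beyond this, a discrete variable is treated as continuous
EXTREME_RATIO = 100   # hypothetical: largest/smallest ratio regarded as an extreme range


def treat_as_continuous(is_continuous, n_values):
    """The number of values can override the natural type of a variable."""
    return is_continuous or n_values >= MANY_VALUES


def choose_scale(values, goal):
    """Pick a scale from the range of the values and the writer's goal."""
    positive = [v for v in values if v > 0]
    if not positive:
        return "linear"
    extreme = max(positive) >= EXTREME_RATIO * min(positive)
    if extreme and goal != "emphasize difference":
        # Small values would be unreadable on a linear scale.
        return "logarithmic or split"
    return "linear"


# Hypothetical values for four companies, the last one far larger than the others.
companies = [3, 5, 4, 900]
```

    With these sample values, a goal of plain comparison leads to a logarithmic or split scale, whereas the goal of emphasizing the difference keeps the linear scale, as in the discussion of figure 5.</Paragraph>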
    <Paragraph position="9"> Our research extends the work of Bertin [2] and MacKinlay [12, 13] on the types and organization of variables and the work of Zelazny on messages and goals [20], and integrates it with other theories on the use of tables [19, 9] and graphs [5, 6, 8, 16, 17, 18].</Paragraph>
  </Section>
</Paper>