<?xml version="1.0" standalone="yes"?> <Paper uid="J01-3004"> <Title>Towards Constructive Text, Diagram, and Layout Generation for Information Presentation</Title> <Section position="2" start_page="0" end_page="416" type="abstr"> <SectionTitle> 1. Introduction </SectionTitle> <Paragraph position="0"> The desirability of combining text, layout, graphics, diagrams, punctuation, and typesetting in order to present information most effectively is uncontroversial--indeed, in traditional graphic design and publishing, they could scarcely be conceived of as separate. It is therefore natural that computational attempts to synthesize texts, diagrams, and layout automatically should also now converge. In this paper, we argue that effective and coherent information presentation is best supported by adopting a common framework for physical layout and language/diagram generation. Whereas previous research has made this point convincingly for graphical and textual representations--particularly, for example, in the WIP (André et al. 1993), COMET (Feiner and McKeown 1993), and SAGE (Kerpedjiev et al. 1997; Green, Carenini, and Moore 1998) systems--we take this further and demonstrate that the same commonalities extend to include overall page layout, an area that has not previously received sufficient attention.</Paragraph> <Paragraph position="1"> The paper focuses on two aspects of automatic information presentation new in our work: * a general mechanism for organizing presentations around informational regularities in the data to be expressed--the regularities then inform the presentational strategies used for natural language, diagram, and layout generation; * the construction of an indirect relationship between structured communicative intentions (typically represented in both mono- and multimodal work by some kind of rhetorical structure) and their expression in page layout.</Paragraph> <Paragraph position="2"> The former allows us to ensure broad consistency of perspective and informational organization across elements presented using different media, e.g., across diagram and text; the latter allows us to draw closer to the kind of sophisticated layout that is observable in human-produced presentations.</Paragraph> <Paragraph position="3"> We organize the paper as follows. We first introduce the mechanism for data-driven aggregation that we have developed, since this underlies our approach to both natural language generation and diagram design (Section 2). We then sketch the place of layout as an organizing framework within our approach as a whole (Section 3), setting out by means of examples some of the issues focused upon in the empirical investigation (Section 4). We then summarize the results of the empirical study in terms of an abstract specification for performing page layout (Section 5) and provide a first illustration of its application within the prototype multimodal information-presentation system DArtbio (Dictionary of Art: biographies) (Section 6). We conclude by summarizing the main contributions of our work and some of the follow-up research and development to which it is now leading (Section 7).</Paragraph> <Paragraph position="4"> 2. Data-driven Aggregation for Visualization and Natural Language Generation It is commonly recognized in work on multimodal information presentation that much of the true value of such presentations lies in appropriate juxtapositions of non-identical but overlapping information.
Textual presentations and graphical presentations have differing strengths and weaknesses, and so their combination can achieve powerful synergies. Conversely, simply placing textual and graphical information together is no guarantee that one view is supportive of another: if the perspective on the data taken in a graphic and that taken in a text have no relation (or, worse, even clash), then the result is incoherence rather than synergy--cf. the discussions by authors such as Arens, Hovy, and Vossers (1993), Fasciano and Lapalme (1996), Green et al. (1998), and Fasciano and Lapalme (2000).</Paragraph> <Paragraph position="5"> One means of ensuring mutually compatible presentations across modes is to drive both the language and the graphic generation from the same communicative intentions. If an automatic natural language generator and an automatic graphic generator both receive the task of expressing broadly similar, or compatible, intentions, then there is a good chance that the resulting presentations will also be perceived to be compatible and mutually supportive. This has been used to good effect in systems such as the CGS (Caption Generation System) of Mittal et al. (1998), where it is clearly crucial that the text and the graphic be in close correspondence. Another, in some ways related, approach is to derive both the graphic and textual elements from different components of a single presentation plan: thus, for example, one part of the presentation plan might express textually an instruction that must be carried out (turn the dial), while another part of the plan elaborates on that instruction by showing a diagram in which the location of the action to be performed is identified graphically. This has been explored extensively in systems such as WIP (André et al. 1993), PPP (André, Rist, and Müller 1998), and COMET (Feiner and McKeown 1993).</Paragraph> <Paragraph position="6"> While both of these approaches are essentially top-down, or goal-driven, effective presentations can also be produced by responding to regularities found in the data to be presented. Such regularities are difficult to predict, as they are strongly contingent on what set of data happens to have been selected. "Data-driven" methods of this kind are commonly found in automatic visualization, where the goal is to present users with some comprehensible view of large collections of data. Utilizing regularities in the data is essential for effective visualization. In previous work (Reichenberger, Kamps, and Golovchinsky 1995), a set of techniques for generative diagram design was developed for precisely this task, i.e., for presenting overviews of datasets. We subsequently recognized that this approach also has applications to the task of aggregation in natural language generation, and we thus adapted it for use across both textual and graphical presentation modes. This provides a further technique for ensuring consistency between graphical and textual presentations--if both the graphical and textual presentations express the same regularities, or redundancies, that have been found in a dataset, then they are necessarily compatible in this respect.
This allows us to use contingent data-driven organizations for generating information while nevertheless preserving coherent and mutually supportive views across presentation modalities.</Paragraph> <Section position="1" start_page="410" end_page="414" type="sub_section"> <SectionTitle> 2.1 Data-driven Aggregation: the Mechanism </SectionTitle> <Paragraph position="0"> The original generative diagram-design algorithms developed by Reichenberger, Kamps, and Golovchinsky (1995) built on the landmark work of Mackinlay (1986).</Paragraph> <Paragraph position="1"> Here, a data-classification algorithm flexibly links relational data with elements of a graphical language. These elements are allocated particular degrees of expressiveness so that appropriate graphical resources can be selected as required to capture the data being described. Reichenberger et al. extended this approach by employing a general type hierarchy of data properties to determine algorithmically the most specific property subtype (e.g., transitive, acyclic directed graph, inclusion, etc.) that accurately describes a dataset to be visualized. This subtype allows in turn selection of the particular forms of diagrammatic representation (e.g., trees, nested boxes, directed arrows, etc.) that are expressively adequate, but not over-expressive, for that dataset.</Paragraph> <Paragraph position="2"> The theoretical basis of these methods is given in detail in Kamps (1997; 1998).</Paragraph> <Paragraph position="3"> They rest on a new application of Formal Concept Analysis (FCA) (Wille 1982). FCA is an applied mathematical discipline based on a formal notion of concepts and concept hierarchies and allowing the exploitation of mathematical reasoning for conceptual data analysis and processing. In particular, FCA permits the efficient construction of dependency lattices that effectively represent the functional and set-valued dependencies established among the domains of some data relation. Such dependency lattices can then motivate the differential selection of appropriate graphical presentations.</Paragraph> <Paragraph position="4"> FCA starts from the notion of a formal context (G,M,I) representing a dataset in which G is a set of objects, M is a set of attributes, and I establishes a binary relation between the two sets. I(g, m) is read as "object g has property m," where g ∈ G and m ∈ M. Such a context is called a one-valued context. For illustration, we draw on the domain of the DArtbio system that we discuss below: an example of a one-valued context corresponding to the attribute Profession for a set of artists is shown in the table to the left of Figure 1. [Figure 1: Example of a one-valued context and its corresponding lattice.]</Paragraph> <Paragraph position="5"> Concepts in FCA are defined in accordance with the traditional theory of concepts, and consist of an extension and an intension. The extension is a subset A of the set of objects G and the intension is a subset B of the set of attributes M. We call the pair (A,B) a formal concept if each object of the extension has all the properties of the intension. Thus, for the data shown in Figure 1, the pair ({Gropius, Breuer}, {Urban Planner, Architect}) represents a formal concept: each of the members of the extension possesses all the attributes mentioned in the intension. The set of all concepts for some formal context can be computed effectively using the Next Closure algorithm developed by Ganter and Wille (1996).</Paragraph>
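For readers who prefer code to notation, the derivation operators and the formal-concept condition can be sketched as follows. This is a minimal Python illustration over toy data, using naive subset enumeration rather than Next Closure; it is not the authors' implementation, and the object and attribute names are assumed for illustration only.

```python
from itertools import chain, combinations

# Toy one-valued context for the attribute Profession (illustrative data only).
incidence = {                      # I(g, m): object g has property m
    "Gropius":     {"Architect", "Urban Planner"},
    "Breuer":      {"Architect", "Urban Planner"},
    "Moholy-Nagy": {"Urban Planner"},
    "A. Albers":   {"Designer"},
}
objects = set(incidence)
attributes = set().union(*incidence.values())

def intent(A):
    """Attributes shared by every object in A (all attributes if A is empty)."""
    return set.intersection(*(incidence[g] for g in A)) if A else set(attributes)

def extent(B):
    """Objects possessing every attribute in B."""
    return {g for g in objects if set(B) <= incidence[g]}

def is_formal_concept(A, B):
    """(A, B) is a formal concept iff the extension and intension determine each other."""
    return intent(A) == set(B) and extent(B) == set(A)

print(is_formal_concept({"Gropius", "Breuer"}, {"Urban Planner", "Architect"}))  # True

# Naive enumeration of all formal concepts: close every attribute subset.
subsets = chain.from_iterable(combinations(attributes, r) for r in range(len(attributes) + 1))
concepts = {(frozenset(extent(B)), frozenset(intent(extent(B)))) for B in subsets}
```

Every pair produced by the closing step satisfies the formal-concept condition; on larger contexts, the Next Closure algorithm mentioned above generates each concept directly rather than closing every attribute subset.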
<Paragraph position="6"> The main theorem of concept analysis then shows that the set of concepts for a formal context can be organized into a complete lattice structure under the following definition of the subconcept relation: a concept (A,B) is a subconcept of (A*,B*) if and only if A ⊆ A* (equivalently, B* ⊆ B) (Wille 1982). The concept lattice may be constructed by starting from the top concept (the one that has no superconcepts) and proceeding top-down recursively. In each step we compute the set of direct subconcepts and link them to the respective superconcept until we reach the greatest lower bound of the lattice itself (the existence of which is always guaranteed for finite input data structures). An efficient implementation of this algorithm is given in Kamps (1997). The lattice corresponding to our example one-valued context is given to the right of Figure 1. This lattice shows the full labeling of formal concepts in order to ease comparison with the originating table. Much of this information is redundant, however, and so we generally use variations on the abbreviated, more concise form shown in Figure 2. [Figure 2: Concept lattice example, more succinctly labeled. Here, the extension label for each node consists of just those elements that are added at that node moving up the lattice; conversely, the members of the intensions are shown moving down the lattice, again adding just those elements that are new for that node. For example, the node labeled simply Gropius, Breuer corresponds to the full form ({Gropius, Breuer}, {Architect, Urban Planner}), since both Gropius and Breuer are newly added to the extension at that node, while no new elements are added to the intension--'Architect' and 'Urban Planner' are both inherited from above.] Such lattices naturally capture similarities and differences between the values of the specified attributes of objects: each concept of the lattice indicates objects with some set of values in common. Moreover, the generalizations are organized by subsumption, which supports the selection of most-specific subtypes. When considering datasets in general, we typically need to express more information than that of single attributes, and for this we require multi-valued contexts. An example of a multi-valued context is shown in Table 1, which includes our previous one-valued context as one of its columns; for ease of discussion, however, we will for the present restrict the Profession attribute so that each artist has only one profession. The table shows the subject areas/professions, institutions, and time periods in which the indicated artists were active. [Table 1: A collection of facts concerning artists and their professions, drawn from the frame-based domain model used for the Dictionary of Art: biographies and re-expressed as a table of facts and attributes. (The facts are for illustrative purposes only and should not be taken as reliable statements of art history!)] Formally, a multi-valued context is a generalization of a one-valued context and may be represented as a quadruple (G,M,W,I), where G, M, and I are as before, and W represents the set of values of the attributes--which are, in contrast to the one-valued-context case, not trivially either true or false, applicable or not.
To identify the value w ∈ W of attribute m ∈ M for an object g ∈ G, we adopt the notation m(g) = w and read this as "attribute m of object g has value w."</Paragraph> <Paragraph position="7"> Kamps (1997) renders multi-valued contexts amenable to the techniques for dependency-lattice construction by deriving a one-valued context that captures the functional dependencies of the original multi-valued context. To see how this works, we first note that a functional dependency in a relation table is established when the following implication always holds: for two arbitrary objects g, h ∈ G and two domains D, D* ∈ M, D(g) = D(h) ⇒ D*(g) = D*(h). This implication suggests the following construction for an appropriate one-valued dependency context: for the set of objects take the set of two-element subsets of the given multi-valued context, P2(G); for the set of attributes take the set of domains M; and for the connecting incidence relation take IN({g,h}, m) :⇔ m(g) = m(h). The required dependency context is then represented by the triple (P2(G), M, IN). This is illustrated in the table to the left of Figure 3, which shows the one-valued context corresponding to the multi-valued context of Table 1. [Figure 3: Example dependency context and corresponding lattice.] An entry here indicates that the identified attribute has the same value for both of the facts identified in the object labels of the leftmost column: for example, g1 and g2 share the values of their Profession and School attributes. This provides a holistic view of the dependency structure of the original data and is, moreover, computationally simple to achieve.</Paragraph> <Paragraph position="12"> It is then straightforward to construct a dependency lattice as described above; this is shown to the right of Figure 3. The arcs in this lattice now represent the functional dependencies between the involved domains, and the equalities (e.g., m(g1) = m(g2)) represent the redundancies present in the data. For example, the lower left node labeled Period indicates not only that the third- and fourth-row entries under Period (g3 and g4) are identical but also, following the upward arc, that these entries are equal with respect to School; similarly, following upward arcs, the middle node (m(g1) = m(g2)) indicates that the first- and second-row entries (i.e., g1 and g2) are equal with respect to both School and Profession. The lattice as a whole indicates that there are functional relationships from the set of persons into the set of professions, the set of periods, and the set of schools. A further functional relationship exists from the set of periods into the set of schools.</Paragraph>
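The dependency-context construction can be made concrete with a small sketch. This is a toy Python rendering over assumed fact values (loosely modeled on Table 1, but not the paper's actual data or the Kamps implementation): the derived context takes the two-element subsets P2(G) as objects and the domains as attributes, with IN({g,h}, m) holding exactly when m(g) = m(h); functional dependencies can then be read off directly.

```python
from itertools import combinations

# Toy multi-valued context: facts about artists (assumed values, for illustration only).
facts = {
    "g1": {"Person": "Gropius",   "Profession": "Architect",     "School": "Harvard", "Period": "1937-1951"},
    "g2": {"Person": "Breuer",    "Profession": "Architect",     "School": "Harvard", "Period": "1937-1946"},
    "g3": {"Person": "A. Albers", "Profession": "Designer",      "School": "BMC",     "Period": "1933-1949"},
    "g4": {"Person": "J. Albers", "Profession": "Urban Planner", "School": "BMC",     "Period": "1933-1949"},
}
domains = ["Person", "Profession", "School", "Period"]

# Derived one-valued dependency context (P2(G), M, IN), with IN({g,h}, m) :<=> m(g) = m(h).
pairs = list(combinations(sorted(facts), 2))
IN = {(pair, m): facts[pair[0]][m] == facts[pair[1]][m] for pair in pairs for m in domains}

def functionally_determines(d, d_star):
    """D -> D*: every pair of facts agreeing on d also agrees on d_star."""
    return all(IN[(pair, d_star)] for pair in pairs if IN[(pair, d)])

print(functionally_determines("Person", "Period"))  # True: Person is a key in this toy relation
print(functionally_determines("Period", "School"))  # True: g3 and g4 share both Period and School
print(functionally_determines("School", "Person"))  # False: Harvard is shared by two people
```

The equalities recorded in IN (for example, that g1 and g2 agree on both School and Profession in this toy data) are exactly the kind of redundancy that labels the nodes of the dependency lattice described above.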
<Paragraph position="13"> Once such a lattice has been constructed, we also have as a consequence a set of classifications of the original relational input, or dataset. This can directly drive visualization as follows. For graphics generation, it is important that all domains of the relation become graphically encoded: this means the encoding is complete. Kamps (1997) proposes a corresponding graphical encoding algorithm that starts by encoding the bottom domain and walks up the lattice, employing a bottom-up, left-to-right strategy for encoding the upper domains. The idea of this model, much abbreviated, is that the cardinality of the bottom domain is the largest, whereas the domains further up in the lattice contain fewer elements. Thus, the bottom domain is graphically encoded using so-called graphical elements (rectangle, circle, line, etc.), whereas the upper domains are encoded using graphical attributes (color, width, radius) and set-valued attributes that must be attached to graphical elements. In general, it is preferable to favor graphical attributes over set-valued attributes, as this keeps graphical complexity moderate.</Paragraph> <Paragraph position="14"> Figure 4 shows two example diagrams that are produced from the dataset of Table 1 via the dependency lattice shown to the right of Figure 3. [Figure 4: Example diagrams generated for the example data. Alternatives are produced by two distinct traversals of the aggregation lattice.] Informally, from the lattice we can see directly that artists (Person) can be classified on the one hand according to work period (following the left-hand arc upwards) and, on the other hand, jointly according to school and profession (following the vertical arc). The algorithm first allocates the attribute Person, indicated in the lowest node of the lattice, to the basic graphical element rectangle; the individual identities of the set members are given by a graphical attachment: a string giving the artist's name. The functional relationship between the set of persons and the set of time periods is then represented by the further graphical attribute of the length of the rectangle. This is motivated by the equivalence of the properties of temporal intervals in the data and the properties of the graphical relationship of spatial intervals on the page. Two paths are then open: following the functional relationship first either to the set of schools or to the set of professions. Diagram (a) in Figure 4 adopts the first path and encodes the school relationship by means of the further graphical attribute of the color of the rectangle, followed by a nesting rectangle for the relationship to Professions; diagram (b) illustrates the second path, in which the selection of graphical encodings is reversed. Both the selection of color and that of nesting rectangles are again motivated by the correspondence between the formal properties of the graphical relations and those of the dependencies observed in the data. Reinstating the multiple professions of Gropius and Breuer mentioned in Figure 1 gives rise to a rather different dependency lattice in which the second solution is no longer possible.</Paragraph> <Paragraph position="18"> All of these mechanisms were implemented and used extensively for visualization in the context of an Editor's Workbench for supporting editorial decisions during the design of large-scale publications such as encyclopedias (Rostek, Möhr, and Fischer 1994; Kamps et al. 1996).</Paragraph> </Section> <Section position="2" start_page="414" end_page="416" type="sub_section"> <SectionTitle> 2.2 The Partial Equivalence of Diagram Design and Text Design </SectionTitle> <Paragraph position="0"> A selection of particular graphical elements entails the expression of particular functional dependencies. This is similar to decisions that need to be made when generating text. For instance, the equality m(g1) = m(g2) in the lattice of Figure 3 above can also motivate a particular grouping of information in a corresponding linguistic presentation. That is, whereas graphically the equality motivates an association of both Gropius and Breuer with the graphical attributes allocated to Professions and Schools, textually we may connect both artists in a single sentence: g1 (concerning Gropius) and g2 (concerning Breuer) can be compactly expressed by collapsing their (identical) school and profession attributes: Both Gropius and Breuer were architects and taught at Harvard.</Paragraph> <Paragraph position="1"> A similar phenomenon holds for the grouping m(g3) = m(g4); here, g3 (concerning A. Albers) and g4 (concerning J. Albers) may be succinctly expressed by collapsing their identical period and school attributes.</Paragraph>
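A minimal sketch of this aggregation step, continuing the toy `facts` table above (a hypothetical helper, not the authors' NLG component), groups facts by the attributes on which they agree and collapses the shared material into a single sentence. Grouping by Profession and School captures the equality m(g1) = m(g2); grouping by Period and School would capture m(g3) = m(g4).

```python
from collections import defaultdict

def group_by(facts, shared_attrs):
    """Group facts whose values agree on all of shared_attrs (a lattice equality)."""
    groups = defaultdict(list)
    for row in facts.values():
        key = tuple(row[a] for a in shared_attrs)
        groups[key].append(row["Person"])
    return groups

def aggregate_sentences(facts):
    """Collapse identical Profession and School values into one sentence per group."""
    sentences = []
    for (profession, school), people in group_by(facts, ["Profession", "School"]).items():
        if len(people) > 1:
            subject, predicate = " and ".join(people), f"were {profession.lower()}s"
        else:
            article = "an" if profession[0].lower() in "aeiou" else "a"
            subject, predicate = people[0], f"was {article} {profession.lower()}"
        sentences.append(f"{subject} {predicate} and taught at {school}.")
    return sentences

for s in aggregate_sentences(facts):
    print(s)   # e.g. "Gropius and Breuer were architects and taught at Harvard."
```

In the approach described here, the choice of which attributes to collapse is read off the dependency lattice rather than fixed in advance, which is what ties this aggregation step to the visualization machinery.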
<Paragraph position="3"> Combining these considerations motivates the following approximate textual re-rendering of diagram (b): Anni Albers (who was a designer) and J. Albers (who was an urban planner) both taught at the BMC from 1933 until 1949. Moholy-Nagy (who was also an urban planner) taught from 1937 until 1938 at the New Bauhaus. Gropius and Breuer (both architects) were, at partially overlapping times (1937-1951 and 1937-1946, respectively), at Harvard. Hilberseimer (who was an architect too) taught at the IIT from 1938 until 1967.</Paragraph> <Paragraph position="4"> A textual re-rendering of diagram (a) would reflect the contrasting groupings entailed there: i.e., Breuer, Gropius, and Hilberseimer would be grouped at top level whereas the two Albers would not.</Paragraph> <Paragraph position="5"> A dependency lattice extracts partial commonalities that remain constant over subsets of the data to be presented, and this is closely related to the problem of aggregation in NLG (cf. Dalianis [1999]). The functional redundancies captured by the lattice construction are precisely those redundancies that indicate opportunities for structurally induced aggregation. Selecting a particular graphical element or attribute to realize some aspect of the data is in fact an aggregation step. In Bateman et al. (1998), we have shown this in terms more familiar to NLG by re-interpreting in dependency-lattice terms some of the standard examples of aggregation discussed in the literature. Below, we show that mutual consistency between textual fragments produced by our NLG component and graphical elements produced by the automatic visualization component can be enforced by driving both from a common dependency lattice.</Paragraph> <Paragraph position="6"> 3. Preliminaries for Layout: Inputs and Outputs for the Layout Determination Task Page layout, more properly termed typographic design, is usually divided into three levels: microtypography, macrotypography (layout proper), and style.
Here we are most concerned with macrotypography--the segmentation of a page of information into more or less closely related "visual blocks." Macrotypography is a central component of professional document design; indeed, "Every designer knows that how elements are put together on a page communicates a powerful message" (Adobe Inc., InDesign product information sheet).</Paragraph> <Paragraph position="7"> Unfortunately, with some valuable exceptions (cf., for example, Schriver [1996], Waller [1988], and Bernhardt [1985]), the professionals do not then go on to tell us just what that message might be.</Paragraph> <Paragraph position="8"> Our starting point for investigating layout and its message rests on the fact that layout is not a fixed property of information presentation; i.e., similar information can be subjected to diverse layouts. We then assume, following Schriver (1996) and others, that layout decisions should be functionally motivated in terms of a presentation's communicative purposes. We illustrate this further, while at the same time setting the scene for our empirical investigation, by briefly considering the kinds of layout variation that are commonly found. We do this in two steps. First, we characterize more finely the notion of layout as such; then, we consider how selections among possible layouts may be motivated.</Paragraph> </Section> <Section position="3" start_page="416" end_page="416" type="sub_section"> <SectionTitle> 3.1 Layout structure </SectionTitle> <Paragraph position="0"> Issues of layout were already present within the visualization framework discussed above. For example, the relationship of the graphical blocks representing particular artists, or the positioning of the diagrams' legends with respect to the diagrams themselves, all involve decisions of layout. The solution developed as part of the automatic visualization component used in the Editor's Workbench was to consider layout itself as a particular class of diagrams, with their own particular properties and concerns.</Paragraph> <Paragraph position="1"> An automatic page-layout component (APALO) was accordingly implemented as a specialization of the general visualization task.</Paragraph> <Paragraph position="2"> Fully specified layout diagrams specify the physical placement and appearance of elements on a page. In order to generalize across such layouts, we define an abstract level of representation called Layout Structure. Layout structure abstracts away from the precise details of physical layouts to focus on classes of layouts that are visually "equivalent": visually equivalent layouts suggest the same page blocks, with similar inter-block relationships of perceived prominence and similarity.</Paragraph> <Paragraph position="3"> Our view of layout structure draws heavily on Southall (1992), who defines a restricted set of typographical relation types. These include: containment, i.e., recursive block structure; reading order, i.e., generally left-to-right, top-to-bottom reading paths in Western cultures; similarity, describing blocks that share some visual properties such as size, typeface selection, structure, etc.; and reference, where a connection between visual blocks is suggested by physical proximity. We represent layout structures in terms of a tree structure (representing containment) augmented by a restricted set of possible additional annotations corresponding to the remaining typographical relation types. The annotations thus serve either to further constrain the possible physical layouts that may render the layout structure, or to place mutual constraints on the rendering possibilities--for example, a type-equivalence annotation requires consistency in rendering decisions across the units declared to be type-equivalent.</Paragraph>
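Such a layout structure can be pictured as a small annotated tree. The following Python sketch is a hypothetical rendering of the idea (the field names and the example node are assumptions for illustration; this is not the APALO data structure):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LayoutNode:
    """One node of a layout structure: containment is the tree itself,
    the remaining typographical relations are carried as annotations."""
    node_id: str
    children: List["LayoutNode"] = field(default_factory=list)
    words: int = 0                              # content summary: amount of text
    pictures: int = 0                           # content summary: number of pictures
    importance: Optional[int] = None            # target visual weight, as a percentage
    type_equivalent_to: Optional[str] = None    # must be rendered like the named node
    reading_order_after: Optional[str] = None   # must follow the named node when read

# A fragment corresponding to the kind of annotation described for node 2.3.2
# of Figure 5 below (403 words, 3 pictures, importance 50%).
page = LayoutNode("2", children=[
    LayoutNode("2.3", children=[
        LayoutNode("2.3.1", words=120, importance=20),
        LayoutNode("2.3.2", words=403, pictures=3, importance=50,
                   reading_order_after="2.3.1",
                   type_equivalent_to="2.3.1"),
    ]),
])
```

Only containment is structural in this sketch; the remaining annotations constrain how a renderer may realize the tree, which is the sense in which they "further constrain the possible physical layouts" as described above.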
<Paragraph position="4"> A simple example of a layout structure and its correspondence to a physical layout is shown in Figure 5. Here we see that annotations also provide a numerical summary of the information to be displayed in any layout element (which may either be descriptive or denote a target): for example, node 2.3.2 in the figure is annotated 403w+3p:50, indicating that it consists of a block of text with 403 words and 3 pictures and is allocated an importance score of 50%. These scores impose target visual weights for the corresponding page elements (i.e., more important nodes should be more prominent, which can be achieved by a larger surface area combined with less, but heavier, type, by the use of prominent colors, etc.). More information concerning layout structure and its motivation is given in Reichenberger et al. (1996).</Paragraph> <Paragraph position="5"> Given a fully specified layout structure as input, APALO renders it as a physical page by mapping constituency to nested boxes (i.e., inclusion diagrams), and strength of connection and sequence to spatial displacement: the boxes included within an enclosing box are arrayed two-dimensionally to influence reading order. Typographic attributes, such as type size, specific typeface within the family (bold, italic, etc.), arrangement of the type (ragged right, flush matter, etc.), leading, coloring, and orientation, are all assigned at this stage, respecting any constraints on presentation given in the abstract layout structure. Since it is rarely the case that a layout structure is so tightly specified that only one physical layout is possible, the implementation uses progressive refinement and allows a user either to stop the process at any point or</Paragraph> </Section> </Section> </Paper>