<?xml version="1.0" standalone="yes"?> <Paper uid="J98-3004"> <Title>Describing Complex Charts in Natural Language: A Caption Generation System</Title> <Section position="3" start_page="433" end_page="434" type="metho"> <SectionTitle> 4 Except those that are positioned relative to other objects, as explained in Section 4. </SectionTitle> <Paragraph position="0"> A SAGE-generated version of the well-known Minard graphic.</Paragraph> <Paragraph position="1"> SAGE's output consists of one or more coordinated sets of 2-D information graphics that use a variety of graphical techniques to integrate multiple data attributes in a single display. SAGE integrates multiple attributes in three ways: * by representing them as different properties of the same set of graphical objects. For example, both the left and the right edges of the bars in the leftmost chart in Figure 2 are used to map attributes (asking-price and selling-price respectively). * by assembling multiple graphical objects into groups that function as units to express data. For example, the interval bar and the mark in the leftmost chart in Figure 2 are used to show different types of price-related attributes: asking-price, selling-price, and the agency-estimate. * by aligning multiple charts and tables together with respect to a common axis. For example, the three charts in Figure 2 are aligned on the Y-axis, which indicates the houses.</Paragraph> <Paragraph position="2"> Creating a graphic that integrates data in this way is partly an encoding process in which the values of data attributes are converted into graphical values of properties of objects (e.g., color, shape, and spatial position of polygons). Interpreting the information in a graphic is a decoding process, where people must translate visual symbols back into data values. SAGE creates graphics that enable people to efficiently perform information-seeking tasks (e.g., searching for clusters of data values that are different from the rest and looking up other facts to understand what makes them different). In designing graphics, however, SAGE only considers how effectively attributes can be mapped to graphical properties to support a task. For example, a requirement to be able to search for particular values by name might result in the relevant attribute being arranged along an axis in lexicographic order; on the other hand, if it is important to find the maximum and minimum values in a set, SAGE might order these values in terms of magnitude. SAGE, like other automated presentation systems (Casner 1991; Mackinlay 1986), does not take into account perceptual complexities associated with the resulting graphic. For instance, SAGE does not explicitly reason about the difficulties users may have in translating bicolor saturation scales to exact numerical values. 5</Paragraph> </Section> <Section position="4" start_page="434" end_page="400001" type="metho"> <SectionTitle> 5 The Minard Graphic, shown in Figure 1, uses the bicolor saturation technique to map temperature </SectionTitle> <Paragraph position="0"> values to the march segments shown in the map.</Paragraph> <Paragraph position="1"> SAGE can also design complex presentations that have overlapping objects, or use cluster composition to define a novel combination or grouping of graphical objects in the presentation. This can make understanding some of the graphics that SAGE generates quite difficult.
Fortunately, the picture representation used in SAGE contains a complete declarative representation of the content and structure of the graphic in a form that can be used for reasoning by other components. Thus, this representation can be used to reason about possible sources of user confusion arising from mappings that are either complex or ambiguous to the user.</Paragraph> <Paragraph position="2"> SAGE's representation serves three functions in explanation generation. First, it helps define what a viewer must understand about a graphic in order to obtain useful information from it. It does this by defining the elements of a graphic and the way they combine to express facts (i.e., how they map to data). Second, the representation describes the structure of both the graphical presentation and the data it presents, so that they can be explained coherently. Finally, the representation helps derive judgments of complexity for specifying graphical elements needing text explanation. To understand these three functions, we briefly review the representation.</Paragraph> <Paragraph position="3"> Graphemes are the basic building blocks for constructing pictures. Marks, text, lines, and bars are some of the different grapheme classes available in SAGE. Each grapheme type consists of a definition of the parameters that control the appearance of all graphemes of that type; different grapheme subtypes can be created by varying specific parameters. Individual graphemes can be generated by providing appropriate values for all the input parameters. For instance, individual marks can be generated by providing values for the parameters x-coordinate, y-coordinate, shape, size, and color to an instance of a mark class encoder; individual line segments can be generated by providing values for the coordinates x1, y1, x2, and y2, thickness, and color to the line class encoder.</Paragraph> <Paragraph position="4"> Symbol classes are used to organize graphemes into structures that express facts in the data set. A labelled-mark, an interval bar, and a bar with an attached label are some of the more familiar symbol classes available in SAGE. Each symbol class consists of a definition of the spatial relationship among a set of graphemes and the correspondence between the parameters of this set and attribute types in a data set. A labelled-mark, for instance, would be defined as a combination of a mark and a text label and the spatial relationship between them. Consider the labelled-marks in the chart shown in Figure 3. The spatial position of the label is dependent on the position of the mark: it is offset slightly to the right and above the mark. Symbol classes in SAGE can be either predefined (some of the more common ones, such as a labelled-mark, have already been defined), or created by the system based on rules about combining different graphemes into clusters.</Paragraph> <Paragraph position="5"> Encoders are used to relate specific data values and graphical values to each other.</Paragraph> <Paragraph position="6"> Horizontal/vertical axes, color keys, size keys, and shape keys are some of the different encoders available in SAGE. Each encoder class consists of a definition of the relation between a family of data set attributes and a particular graphical type. SAGE can then use this information to map data values to graphical values in designing a picture, and provide a frame of reference (e.g., axes, keys, etc.) that can be used to visually interpret specific values in the picture.
For instance, a color encoder could map data values &quot;less than 5&quot; to the graphical value &quot;blue&quot; and others to &quot;red.&quot; A schematic of the encoders used in the chart shown in Figure 3 is shown in Figure 4.</Paragraph> <Paragraph position="7"> In addition to this knowledge about graphemes, symbols, and encoders, SAGE uses knowledge of the characteristics of data relevant to graphic design (Roth and Mattis 1990; Roth and Hefley 1993), including knowledge of data types and scales of measurement (e.g., quantitative, interval, ordinal, or nominal data sets), structural relationships among data (e.g., the relation between the endpoints of ranges or between the two coordinates of a 2-D geographic location), and the functional dependencies among attributes in database relations (e.g., one:one, one:many, many:many). As we will show later, the latter is an important factor in selecting a high-level discourse strategy for generating explanatory captions.</Paragraph> <Paragraph position="9"> Figure 3 A graphic generated by SAGE to illustrate the use of encoders.</Paragraph> <Paragraph position="11"> Finally, SAGE has a library of graphical techniques, knowledge of the appropriateness of the techniques for different data and tasks, and design knowledge for assembling these techniques into composites that can integrate information in a single display. SAGE uses this graphic design knowledge together with the data characterization knowledge to generate displays of information.</Paragraph> <Paragraph position="12"> To summarize, the portion of SAGE's knowledge base that is most relevant for generating explanatory captions is its graphical syntax and semantics. The syntax includes a definition of the graphical constituents that convey information: spaces (e.g., charts, maps, tables), graphemes (e.g., labels, marks, bars), their properties (e.g., color, shape), and encoders--the frames of reference that enable their properties to be interpreted/translated back to data values (e.g., axes, graphical keys). The syntax also defines the ways in which graphemes can be combined to form symbols---composites that integrate multiple data attributes (e.g., a label attached to a mark). The syntactic structure of a graphical display, like the linguistic structure of text, can provide guidance for creating structurally coherent explanations.</Paragraph> <Paragraph position="13"> The representation of the semantics of graphics conveys the way data is mapped to the syntactic elements of displays. It also provides guidance for organizing explanatory captions by grouping graphical elements that express data attributes that form a coherent group. The data characterization provides knowledge of the structure of the data and therefore also influences the structure of the explanation.</Paragraph> <Paragraph position="14"> 3. Discourse Strategies for Generating Captions Explanations about informational graphics can be classified into at least three categories based on the structural properties of the picture, the structure of the underlying data attributes, and their mapping to spaces and graphemes. These explanation strategies reflect the overall structure of the graphic presentation: whether the spaces are aligned along a common axis, and whether the graphic is organized around the functionally independent attribute (FIA). An attribute is functionally independent if it uniquely determines the values of all other attributes.
For example, in one of our current data sets about house sales, the house's street address has been specified as the FIA; it uniquely determines asking-price, selling-price, and the other attributes in the database. In contrast, the listing agency does not uniquely determine any of the other attributes in the house-sales relation.</Paragraph> <Paragraph position="15"> In addition to the factors mentioned above---used to select the overarching discourse strategies---the system makes use of additional information about the symbols and their mappings used in the display to select and organize information to be presented in the caption. For instance, the system uses graphical information to determine the order in which information is presented. This reasoning can occur at various levels of the picture representation: at the space level (all objects in a space are described before objects in another space), at the grapheme cluster level (all objects in a cluster are described together), and at the encoder level (all objects that map the same attribute type are described together). SAGE's representation of the graphical display thus provides additional information that can be considered when text explanations are generated.</Paragraph> <Paragraph position="16"> The process of generating natural language explanations can be divided into three conceptual stages: (i) select a discourse strategy to provide the overall organization of the explanation based on the structural properties of the graphical presentation, the relations expressed in the data set, and the data-to-grapheme mappings; (ii) within each space of the presentation, use the complexity metric to determine the amount of detail to be included in the explanation; and (iii) reason about the tactical decisions in sentence planning.</Paragraph> <Paragraph> Figure 4 Encoders used in mapping attributes to a labelled-mark in the chart shown in Figure 3.</Paragraph> <Paragraph position="17"> In our current application, content selection mainly consists of determining the complex or ambiguous aspects of a graphic presentation. In general, knowledge-based systems cannot afford to generate a paraphrase of the entire knowledge base. As illustrated by Figure 5, an explanation that includes all the facts in the underlying picture representation or data set for even a simple graphic in SAGE would be extremely verbose. Most of the facts expressed in such a caption would be both obvious and unnecessary for the average user. Studies have shown that approximately three-fourths of the time spent by users in interpreting a graphic is used in understanding the data-to-grapheme mappings (Shah 1995; Cleveland and McGill 1987). Therefore, our initial goal was to generate captions describing only those mappings that might be either complex or ambiguous for the average user. The system can currently analyze a picture representation for five different types of complexities and ambiguities; these are discussed in greater detail in the following section (Section 4).
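To make the notion of functional independence concrete, the sketch below shows one way such a test could be written over a tabular relation. It is purely illustrative: the helper name, the attribute names, and the values for the second house are hypothetical, and this is not the system's implementation.

def is_functionally_independent(rows, candidate):
    """Return True if every value of `candidate` determines a single combination
    of values for all the other attributes in the relation."""
    seen = {}
    for row in rows:
        key = row[candidate]
        rest = tuple(sorted((k, v) for k, v in row.items() if k != candidate))
        if key in seen and seen[key] != rest:
            return False  # same candidate value co-occurs with different dependent values
        seen[key] = rest
    return True

# Two illustrative tuples from a house-sales relation (values for 6343 Walnut are invented).
house_sales = [
    {"address": "3237 Beechwood", "asking-price": 82000, "selling-price": 75000, "listing-agency": "Agency-A"},
    {"address": "6343 Walnut", "asking-price": 61000, "selling-price": 55000, "listing-agency": "Agency-A"},
]

assert is_functionally_independent(house_sales, "address")             # the street address can serve as the FIA
assert not is_functionally_independent(house_sales, "listing-agency")  # the listing agency cannot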
This section discusses the three strategies used by the system to structure the content during text planning.</Paragraph> <Paragraph position="18"> The sentence planning phase is discussed in Section 5, where the individual components implementing the tactical decisions in the microplanner are described in detail.</Paragraph> <Section position="1" start_page="438" end_page="439" type="sub_section"> <SectionTitle> 3.1 Strategy 1: Graphic Organized Around the Functionally Independent Attribute </SectionTitle> <Paragraph position="0"> As mentioned earlier, the three strategies used by the caption generator depend upon both the structure of the graphic presentation and the relations in the data set presented in the graphic. The first strategy can be applied when the data set contains a functionally independent attribute (FIA) that is used as an organizing device or &quot;anchor&quot; for the entire graphic. This occurs either when the graphic has only one space and the FIA is mapped to one of the axes, or when there are multiple spaces and the FIA is mapped to the axis of alignment.</Paragraph> <Paragraph position="1"> These three charts show information about houses from data set PGH23. Each chart has two axes. The Y-axis identifies the houses in the three charts. The data set contains 17 items. The X-axis in the first chart indicates house prices. The origin is at zero and there are 4 ticks on the axis, with the maximum value being $320,000. The difference between each tick is $80,000. The values mapped to the axis range from $55,000 to $310,000. The left edge of the bar shows the selling price of a house whereas the right edge of the bar shows the asking price of a house. Selling prices shown range from $55,000 to $304,000. Asking prices range from $61,000 to $310,000. The horizontal position of the square mark shows the agency estimate. These range from $55,000 to $305,000. For example ...</Paragraph> <Paragraph position="2"> Figure 5 A fragment of one possible verbose caption for the graphic in Figure 6.</Paragraph> <Paragraph position="3"> In such cases, the strategy attempts to reinforce the organizing role of the functionally independent attribute. The explanation strategy identifies the anchor and the independent attribute first. Then, it describes each space in the picture relative to the anchor. Domain attributes mapped in the graphic are also mentioned in the context of the FIA and the type of relationship defined between them (one:one or one:many). Two SAGE-generated graphics and the associated explanations that illustrate this organizing principle are shown in Figures 6 and 7. These two figures illustrate the importance of a caption generator in this application. Both figures present the same data set about house sales. However, the presentations generated by SAGE are different, make use of different mappings, and give rise to different perceptual complexities. Consequently, the content of the captions generated is also different.
However, in both captions, the overall discourse strategy is the same: to emphasize the aligning Y-axis and the functionally independent attribute--the house-address--and to structure the description of the other attributes in terms of the FIA.</Paragraph> </Section> <Section position="2" start_page="439" end_page="440" type="sub_section"> <SectionTitle> 3.2 Strategy 2: Single Space Organized Around Dependent Attributes </SectionTitle> <Paragraph position="0"> In cases where the graphic is organized around dependent attributes, the explanation cannot be structured around any of them. This is because the attribute may be defined in either one:many or many:many relationships in the dataset and cannot therefore be used as an identifier. This is the case in Figures 8 and 9. In these two figures, the attributes that are mapped to the axes of the charts are dependent attributes such as days-on-market, number-of-rooms, and lot-size. None of these can be used to refer to other attributes unambiguously. Thus, the discourse strategy cannot be the same as in the case where an FIA is mapped along one of the axes. Instead the explanation emphasizes the relation between the dependent attribute(s) that serve as organizer(s).</Paragraph> <Paragraph position="1"> There are two strategies depending on whether or not the figure consists of multiple spaces. If there is only a single space in the graphic, the explanation emphasizes the relation between the attributes encoded against the two axes. A SAGE-generated graphic and the associated explanation that illustrates this organizing principle is shown in Figure 8. The explanation emphasizes the relationship between the attributes mapped along the axes: Figure 8 shows the relationship between the variation in house prices and the number of days a house is on the market in the data set.</Paragraph> <Paragraph position="3"/> </Section> <Section position="3" start_page="440" end_page="440" type="sub_section"> <SectionTitle> Listing Agency </SectionTitle> <Paragraph position="0"> These three charts show information about houses from data set PGH-23. The Y-axis identifies the houses in the three charts. In the first chart, house prices are shown by the X-axis. The house's selling price is shown by the left edge of a bar, whereas the asking price is shown by the right edge. The horizontal position of the mark shows the agency estimate. For example, as shown in the highlighted tuple, the asking price of 3237 Beechwood is $82K, its selling price is $75K, and the agency estimate is $81K. In the second chart, the house's date on the market is shown by the left edge of a bar, whereas date sold is shown by the right edge. Color indicates the neighborhood.</Paragraph> <Paragraph position="1"> The third chart shows the listing agency.</Paragraph> <Paragraph position="2"> Figure 6 Graphic with caption generated using strategy 1.</Paragraph> </Section> <Section position="4" start_page="440" end_page="441" type="sub_section"> <SectionTitle> 3.3 Strategy 3: Multiple Spaces Aligned along an Axis with Dependent Attributes </SectionTitle> <Paragraph position="0"> The second strategy discussed above is only applicable if there is a single space in the presentation. However, SAGE is capable of designing presentations with multiple spaces that are aligned along dependent attributes in the data set. In such cases, the explanation generator cannot describe all the concepts in the presentation using strategy 2.
This is because if one of the spaces in the presentation happens to have the FIA mapped to its non-aligned axis, a description such as &quot;this space shows the (one:one) relationship between the {FIA} and {attribute-2}&quot; would not be natural. In such cases, it is more natural to use strategy 1 to describe the mappings in that space. Therefore, strategy 3 allows the system to organize the caption for each space accordingly, depending upon whether the FIA is mapped along its nonaligned axis. Figure 9 shows such a graphic and the corresponding caption. The two charts in Figure 9 are aligned along the X-axis, which is used to encode house-price. In generating the captions for the two charts, the system describes each one independently, using either strategy 1 or 2, as appropriate. It describes the top one first (following the structure of the graphic) and then the bottom one. Each of them, in this case, is described using strategy 2 because they both have dependent attributes mapped along the axes.</Paragraph> <Paragraph position="1"> 4. Graphical Complexity: The Need for Clarification In the previous section, we discussed three strategies used to organize the information to be presented. As mentioned earlier, it is important to select information about mappings based on either complexity or ambiguity if the caption is to be both succinct and informative. We have identified five types of graphical complexities, described below, that can make it difficult for a user to understand complex data-to-grapheme mappings. 6</Paragraph> <Paragraph position="3"> This chart and table show information about house sales from data set PGH-23. The Y-axis identifies the houses in the two spaces. In the chart, dates are shown along the X-axis. The house's date on the market is shown by the left edge of a bar, whereas the date sold is shown by the right edge. Color indicates the listing agency. The label to the left of a bar indicates the asking price, whereas the label to the right indicates the selling price. The table shows the agency estimate.</Paragraph> <Paragraph position="4"> Figure 7 Caption for an alternative presentation of the dataset used in Figure 6.</Paragraph> </Section> <Section position="5" start_page="441" end_page="442" type="sub_section"> <SectionTitle> 4.1 Encoder Complexity </SectionTitle> <Paragraph position="0"> To read data values shown in a picture, users must understand the encoders used in designing the picture. Encoders allow the user to map between graphical values and attribute values. Two examples of encoders are the axes (which allow users to map between positional values in the picture and data values along the axes), and graphical keys (these can illustrate mappings between variables such as size and shape and attribute values). Complexities can arise either (i) when an encoder is complex, or (ii) when an encoder mapping uses a scale that is complex.</Paragraph> <Paragraph position="1"> Consider, for instance, Figure 10. Among the encoders used in this picture are the X and Y axes, which map positional information to house prices and house addresses, respectively. In the chart shown here, the X-axis does not have a zero origin (presumably in order to make the differences between the data items clearer by having more screen real estate to display a smaller range of data values).
Because of this translation of the origin, it is no longer possible to conclude in this chart that a bar twice as long as another bar encodes a value twice as large (for instance, bars representing houses WALNUT-6343 and VERMONT-637 in Figure 10). Both axis translation and truncation---to compress empty regions in quantitative data---can lead to false inferences. Similar decoding problems can occur with other encoding techniques as well, as when a quantitative attribute is mapped to the area of a circle, or nonlinear scales are used along axes.</Paragraph> <Paragraph position="3"/> </Section> <Section position="6" start_page="442" end_page="442" type="sub_section"> <SectionTitle> House Price </SectionTitle> <Paragraph position="0"> This chart shows information about house sales from data set PGH-23. It emphasizes the relationship between house prices and the number of days on the market. The X-axis shows the house prices, whereas the Y-axis shows the house's number of days on the market. The house's listing agency is indicated by color. The selling price is shown by the left edge of the bar, whereas the asking price is shown by the right edge. The position of the mark shows the agency estimate.</Paragraph> <Paragraph position="1"> Figure 8 Graphic with caption generated using strategy 2.</Paragraph> <Paragraph position="3"> A more complex example of encoding technique complexity can be seen in Figure 1. Saturation and color are combined in a single encoding technique to express temperature. Dark red indicates 100 degrees and dark blue indicates -40 degrees. As the color gets paler (less saturated), it indicates a less extreme temperature. For example, pale red (pink) indicates 65 degrees, while pale blue indicates -5 degrees. White indicates a transition point. 7 Thus both the frame of reference (the color saturation key) and the technique are potentially complex here. Figure 1 also illustrates range complexity: the user must determine what the transition point is (whether it is the center of the scale, or some special value, such as 32 degrees F). The graphic is not explicit about whether the two ranges on both sides of this special transition point are balanced.</Paragraph> </Section> <Section position="7" start_page="442" end_page="400001" type="sub_section"> <SectionTitle> 4.2 Grapheme Complexity </SectionTitle> <Paragraph position="0"> Although the encoder (e.g., positional encoding on an axis) and the mapping (e.g., the scale used along the axis) may both be simple, a grapheme that uses that encoder and mapping may still be difficult for users to interpret. This may occur for a variety of reasons ranging from too many mappings to problems in identifying the mappings.</Paragraph> <Paragraph position="1"> 7 Not only is the encoding technique complex, but the user must understand the conventions used--blue to the cooler side of the scale, red to the warmer.</Paragraph> <Paragraph position="3"/> </Section> <Section position="8" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> House Price </SectionTitle> <Paragraph position="0"> These charts show information about house sales from data set PGH-23. In the two charts, the X-axis shows the selling prices. The top chart emphasizes the relationship between the number of rooms and the selling price.
The bottom chart emphasizes the relationship between the lot size and the selling price.</Paragraph> <Paragraph position="1"> Figure 9 Graphic with caption generated using strategy 3.</Paragraph> <Paragraph position="2"> Complexities of this type can arise from: * multiple grapheme properties: In some cases, the presentations can include graphemes that have a large number of geometric properties used in mapping data attributes. Consider, for instance, Figure 11. While the encoders in the figure are relatively straightforward, the fact that four different mappings are used here--x-position, y-position, shape, and color--can hinder comprehension.</Paragraph> <Paragraph position="3"> * unclear geometric properties: Circular marks and horizontal bars are usually familiar to most readers and SAGE chooses them whenever possible. However, in some cases the system may have to use graphemes that are not as common. In such cases, the reader has to not only understand the encoder and the mapping technique, but also understand which property of the grapheme is being used in each encoding.</Paragraph> <Paragraph position="4"> Consider, for instance, if a triangular mark is used in a plot chart: in order to interpret its positional property, it is essential to know which of its three vertices (or the center) is used in the mapping.</Paragraph> <Paragraph position="5"> * semantic properties: The third type of grapheme complexity occurs in graphemes that have subcomponents. For instance, if an icon of a truck were to be used as a grapheme, and different subcomponents were used in the mappings (e.g., speed of the truck to the wheel size, cargo type to tank color), the reader must understand not only the various data-to-grapheme mappings, but also the relationship between the various subcomponents.</Paragraph> </Section> <Section position="9" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 4.3 Ambiguous Mapping Complexity </SectionTitle> <Paragraph position="0"> A user's ability to identify the mapping of even simple techniques can be hindered when dissimilar graphemes (or dissimilar properties of a grapheme) are used to map to similar attribute types. Consider, for instance, the charts in Figures 12 and 13. The left and right edges of the bar in Figure 12 refer to the selling-price and asking-price of a house in the domain. However, the X-axis represents prices in general, and there is no way to distinguish between the two from the figure itself. Similarly, in Figure 13, the two text labels refer to two different prices, but the two attributes cannot be distinguished from one another solely from the figure. 8</Paragraph> </Section> <Section position="10" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 4.4 Composition Complexity </SectionTitle> <Paragraph position="0"> When multiple graphemes occur in a space, they can be confusing at first until their relationships to each other are clarified. Compositions can result in clusters of two types: * Cooperative Graphemes: For example, consider the chart shown in Figure 14. The mark and label graphemes form an aggregate that must be considered together. In this case, since the label conveying the real estate agency is slightly offset from the position on the X and Y axes, it cannot be interpreted as being related to a particular house and a date of sale on its own.
Grapheme composition results in multiple graphemes being displayed as a spatially grouped conceptual unit--these need to be understood as such and interpreted accordingly.</Paragraph> <Paragraph position="1"> * Interfering Graphemes: Unfortunately, grapheme composition does not always result in a cluster where the graphemes are distinct and non-occluding. Consider, for instance, the chart shown in Figure 8. The mark indicating the agency estimate of the selling price often overlaps with the interval bar showing the actual asking and selling prices. In some cases, the asking and selling prices are so close that the mark indicating the agency estimate actually occludes the interval bar. Clusters such as this can hinder interpretation and it is important that such mappings be clarified.</Paragraph> <Paragraph position="2"> 8 In the housing domain, it may be assumed that asking-price is greater than or equal to selling-price, but in fact, this is not always the case. Buyers sometimes get into bidding wars that cause the selling price to become greater than the asking price.</Paragraph> <Paragraph position="3"> Figure 11 Comprehension difficulties can result from complex graphemes with multiple properties being used in the encoding.</Paragraph> </Section> <Section position="11" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 4.5 Alignment Complexity </SectionTitle> <Paragraph position="0"> As illustrated in Figures 6, 7, and 9, alignment of multiple charts and/or tables can be a useful technique for supporting comparisons, rapid lookups for many attributes of the same object, and for maintaining consistent scales. Whenever an alignment occurs, all but one of the charts become separated from the aligning axis labels and the relation between the aligned axis and the rest of the charts may not be clear.</Paragraph> <Paragraph position="1"> The complexity assessment module in the system is capable of identifying the graphemes in the display that are complex for any of the five reasons described in this section. It annotates the picture representation generated by SAGE to indicate the graphemes and their types of complexity. The result of the complexity assessment for the Minard graphic--Figure 1--is shown in Figure 15. As discussed earlier, for instance, the mapping between the attribute temperature and the color of the line is complex for two reasons: (i) encoding complexity, because of the use of color and saturation, and (ii) range complexity, because of the unequal distributions of warm and cold temperatures. Figure 16 gives the complexity assignment for the graphic shown in Figure 6. In this case, the mapping between the attribute asking-price and the bar is complex for three reasons: (i) grapheme complexity, since the interval bar is a complex grapheme; (ii) ambiguous mapping, since from the graphic, it is not possible to determine whether the attribute is mapped to the left edge or the right edge of the bar; and (iii) composition complexity, since the bar and the mark can overlap and occlude each other (indicated by i for &quot;interfering&quot;).</Paragraph> <Paragraph position="2"> Complexities can arise from ambiguous mappings (b).</Paragraph> <Paragraph position="3"> Figure 15 Result of the complexity assessment module for the &quot;Minard Graphic&quot; in Figure 1 (i and c are used to indicate interfering and cooperating graphemes respectively).</Paragraph>
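The annotated result just described can be suggested with a minimal sketch. The Python names below are hypothetical stand-ins, not SAGE's internal picture representation; the two entries simply paraphrase the assessments for the Minard graphic and for Figure 6 discussed above.

from enum import Enum

class Complexity(Enum):
    ENCODER = "encoder"
    RANGE = "range"
    GRAPHEME = "grapheme"
    AMBIGUOUS_MAPPING = "ambiguous mapping"
    COMPOSITION_INTERFERING = "composition (interfering)"
    COMPOSITION_COOPERATING = "composition (cooperating)"
    ALIGNMENT = "alignment"

# Each (data attribute, graphical element) mapping is tagged with the complexity
# types detected for it by the assessment module.
complexity_annotations = {
    ("temperature", "color of the line"): [Complexity.ENCODER, Complexity.RANGE],
    ("asking-price", "interval bar"): [Complexity.GRAPHEME,
                                       Complexity.AMBIGUOUS_MAPPING,
                                       Complexity.COMPOSITION_INTERFERING],
}

# Downstream components can then ask which mappings need clarification in the caption.
mappings_needing_clarification = [m for m, kinds in complexity_annotations.items() if kinds]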
<Paragraph position="4"> The annotated picture representation can then be used as one of the knowledge sources in the NLG system to select and structure information appropriately in generating the captions.</Paragraph> </Section> </Section> <Section position="5" start_page="400001" end_page="400001" type="metho"> <SectionTitle> 5. Generating Explanatory Captions </SectionTitle> <Paragraph position="0"> A high-level overview of the system divided into functional modules is shown in Figure 17. A brief description of each module is given below. Detailed descriptions follow later in the section.</Paragraph> <Paragraph> Figure 16 Result of the complexity assessment module for Figure 6 (columns: Data attribute, Graphical Element, Complexity Type).</Paragraph> <Paragraph position="1"> Text Planning Module. The text planner takes as input the goal to generate a caption, the picture representation generated by SAGE (annotated by the complexity module), and generates a partially ordered text plan. The leaves of the text plan represent speech acts about propositions that need to be conveyed.</Paragraph> <Paragraph position="2"> Ordering Module. The ordering module takes a partially ordered text plan and imposes a total order on the speech acts. This may be based on (i) domain-specific knowledge about orderings (for instance, knowledge about temporal order of events), or in the absence of this, (ii) knowledge about graphics (e.g., the left edge of a bar is discussed before the right edge of a bar).</Paragraph> <Paragraph position="3"> Aggregation Module. The output of the ordering module is passed to an aggregation module that can combine multiple propositions into fewer, more complex ones. For instance, the module may combine some propositions regarding a grapheme into one complex proposition for more natural output.</Paragraph> <Paragraph position="4"> Centering Module. Once clauses are ordered and aggregated, coherence of the generated text can be further improved by selecting appropriate orderings between arguments of each clause. For this task, we have developed a selection strategy based on the centering model.</Paragraph> <Paragraph position="5"> Referring Expression Module. The referring expression module analyzes the picture representation and uses the discourse plan to determine appropriate referring expressions for the concepts in the speech acts.</Paragraph> <Paragraph position="6"> Lexical Choice and Realization Modules. The lexical choice module picks lexical items and transforms the speech acts to functional descriptors (FDs) to be processed by FUF/SURGE (Elhadad and Robin 1992; Elhadad 1992), the realization module used to generate the English text.</Paragraph> <Section position="1" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 5.1 Text Planning Module </SectionTitle> <Paragraph position="0"> The planner constructs text plans from its library of discourse action descriptions.</Paragraph> <Paragraph position="1"> The representation of communicative action is separated into two types of operators: action operators and decomposition operators. Action operators capture the conditions (preconditions and constraints) under which an action can be executed, and the effects the action achieves if executed under the appropriate conditions.
Preconditions specify conditions that the agent should plan to achieve (e.g., the hearer knows a certain term), while constraints specify conditions that the agent should not attempt to plan to change (e.g., facts and rules about the domain). Effects describe the changes that a discourse action is intended to have on the hearer's mental state. If an action is composite, there must be at least one decomposition operator indicating how to break the action down into more primitive steps. Each decomposition operator provides a partial specification for a subplan that can achieve the action's effects, provided the preconditions are true at the time the steps in the decomposition are executed.</Paragraph> <Paragraph position="2"> As an example of how action and decomposition operators are used to encode discourse actions, consider the two operators in Figure 18. These two operators describe the discourse action describe-space-mappings, whose only effect is achieving the state in which the reader knows all the data-to-grapheme mappings shown. The first operator is an action operator and it indicates that describe-space-mappings can be used to achieve the state where the reader knows about the mappings. The second operator in Figure 18 is one of the decomposition operators for the describe-space-mappings action. The decomposition of a nonprimitive action can be expressed either in terms of subactions (:steps slot), or in terms of subgoals of the action's effect (:rewrite slot), or in terms of both. For instance, the :rewrite slot of the decomposition in Figure 18 specifies that one way to achieve describe-space-mappings's effect of having the hearer know all the mappings in one space is to achieve the three subgoals of having the hearer know all the interfering, cooperating, and vanilla mappings in that space. 9 This example also illustrates how the graphical complexity metrics are used for content selection by the text planner: just as this operator can be used to describe spaces in which all three types of graphemes are present, there are other operators that deal specifically with encoder complexities, compositional complexities, etc.</Paragraph> <Paragraph position="3"> As illustrated by the second operator in Figure 18, decomposition operators may also have constraints, which indicate the conditions under which the decomposition may be applied. Such constraints often specify the type of information needed for particular communicative strategies, and satisfying them causes the planner to find content to be included in explanations. For example, the constraints of the second operator not only check that a single space is being described, but also find the graphemes of the three types used in the explanation, and the anchor mapping in this space. When the planner attempts to use a decomposition operator, it must try to satisfy all of its constraints. If a constraint contains no unbound variables, it is simply checked against the knowledge source to which it refers. However, if the constraint contains free variables (e.g., ?int-graphs in the second operator), the system must search its knowledge bases for acceptable bindings for these variables. In this way, satisfying constraints directs the planner to select appropriate content to include in explanations.
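The flavor of such an operator pair can be suggested with a small sketch. The rendering below is hypothetical--plain Python dictionaries standing in for the system's operator language, not a reproduction of Figure 18--but it follows the slots described above: an effect, preconditions, constraints whose free variables bind content, and a :rewrite-style decomposition into subgoals.

# Action operator: describe-space-mappings achieves the state in which the hearer
# knows all the data-to-grapheme mappings in a space.
action_operator = {
    "action": "describe-space-mappings",
    "effect": "(know-all-mappings ?hearer ?space)",
    "preconditions": ["(recognize-space ?hearer ?space)",
                      "(know-anchor-mapping ?hearer ?space)"],
}

# One decomposition operator: its constraints both check applicability and bind
# content (e.g., ?int-graphs), and its rewrite restates the effect as subgoals for
# the interfering, cooperating, and vanilla mappings in that space.
decomposition_operator = {
    "decomposes": "describe-space-mappings",
    "constraints": ["(single-space ?space)",
                    "(interfering-graphemes ?space ?int-graphs)",
                    "(cooperating-graphemes ?space ?coop-graphs)",
                    "(vanilla-graphemes ?space ?van-graphs)"],
    "rewrite": ["(know-mappings ?hearer ?int-graphs)",
                "(know-mappings ?hearer ?coop-graphs)",
                "(know-mappings ?hearer ?van-graphs)"],
}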
In the case of the operator shown in Figure 18, the two preconditions that must be satisfied are (i) that the reader must be able to recognize the space (i.e., know which space is being discussed, and the data set being visualized), and (ii) that the reader know what the anchor mapping in the space is (if any). Anchor mappings refer to the mapping between a functionally independent attribute (FIA)---usually the key in the database schema---and the axis it is mapped to. Thus, action and decomposition operators specify how information can be combined in a discourse to achieve effects on the hearer's mental state.</Paragraph> <Paragraph position="4"> are posted to the text planner. The system generates a plan by iterating through a loop that refines the current plan (either decompositionally or causally), checking the plan after each refinement to ensure that it has not introduced any errors. Decompositional refinement selects a composite action and creates a subplan for that action by adding instances of the steps listed in the decomposition operator to the current plan. Causal refinement selects an unsatisfied precondition of a step in the plan and adds a causal link to establish the needed condition. This is done either by finding a step already in the plan that achieves the appropriate effect, or by using an action operator to create a new step that achieves the needed condition as one of its effects. For a complete definition of the algorithm, its computational properties, and its utility for discourse planning, see Young, Pollack, and Moore (1994), and Young and Moore (1994).</Paragraph> <Paragraph position="5"> In the remainder of the section, we present the modules that follow the text planning process and implement tactical decisions. To clarify the discussion, we describe how each module contributes to the generation of clauses (3) to (5) in the sample caption shown in Figure 19.</Paragraph> <Paragraph> Sample caption: (1) This chart presents information about house sales from data-set TS-2480. (2) The y-axis shows the houses. (3) The house's selling price is shown by the left edge of the bar (4) whereas the asking price is shown by the right edge. (5) The horizontal position of the mark shows the agency estimate.</Paragraph> <Paragraph> Figure 19 A representative caption used to illustrate our discussions.</Paragraph> </Section> <Section position="2" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 5.2 Ordering Module </SectionTitle> <Paragraph position="0"> The steps in a completed text plan are partially ordered, and thus further processing must be performed in order to generate a caption. The order of execution of steps in the plan may either be explicitly specified by the operator writer or may have constraints imposed on it by causal links. For instance, in the plan operator shown in Figure 18, all the steps corresponding to the goal recognize-space will be ordered before the steps corresponding to the goal know-all-mappings because recognize-space is a precondition. However, most steps in the plan are not explicitly ordered and do not have causal links between them dictating the ordering. The ordering module takes as input the discourse plan, with links specifying the ordering relations between subtrees, and orders the leaf nodes--the speech acts--based on a set of heuristics.
In our application, for instance, unless otherwise indicated, the system will describe the left edge of the bar before the right edge. 10 The ordering module sorts first on the basis of the space ordering. This is based on the assumption that in the absence of any other discourse strategy (such as the need to emphasize or compare properties of a concept across multiple spaces), the reader will browse the spaces from left to right. After the plan steps have been sorted on a space-by-space basis, the module sorts plan steps on the basis of their graphical mappings, using the following ordering heuristics: position > color > shape > size > text > others. Finally, within each resulting subset, the module orders steps by grapheme type using the following ordering: line set > bar set > mark set > text set > others. The strategy of ordering first by graphical mapping and then by grapheme type is based on our analysis of hand-generated captions. We found that most captions tended to be structured along the mappings rather than along the graphemes.</Paragraph> <Paragraph position="5"> Let us now examine how the system's ordering rules determine the ordering among clauses 3-5 of the sample caption shown in Figure 19. First, clauses 3-5 are grouped together because they are all mappings to position. Second, clauses 3-4 precede clause 5 because bar set must precede mark set. Finally, clause 3 precedes clause 4, because of the conventional preference for left-to-right ordering between edges of floating bars.</Paragraph> <Paragraph position="6"> So far, we have examined the ordering strategy that the system will follow by default. However, the ordering module can also take an optional input, a functional specification, which can be used to determine plan step orderings that do not conform with the default ordering. Using this optional specification, the system can take advantage of domain knowledge, such as temporal sequencing, which can play an important role in discourse sequence. For instance, in general it may be preferable to state the mappings of the left and right edges of a bar in that order. However, if the left edge of a bar indicates selling-price and the right edge indicates asking-price, and the usual temporal ordering between the events suggests that one discuss the asking price of a house before the selling price, this would lead to mentioning the right edge before the left edge, contrary to the default ordering.</Paragraph> </Section> <Section position="3" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 5.3 Aggregation Module </SectionTitle> <Paragraph position="0"> Once the speech acts are ordered, they are passed to the aggregation module. In the general case, aggregation in natural language is a very difficult problem (Dalianis 1996; Shaw 1995; Huang and Fiedler 1996). Fortunately, our generation task requires a type of aggregation that is relatively straightforward. Our aggregation strategy only conjoins pairs of contiguous propositions about the same grapheme type in the same space. The module checks for grapheme types rather than specific graphemes to cover circumstances where, for instance, a chart may have a number of grey and black bars (which are different graphemes of the same type).
This enables the system to generate text of the form &quot;The grey bars indicate the selling price of the house, whereas the black bars indicate the asking price.&quot; When two propositions are combinable, namely they are about the same grapheme type in the same space, the system checks to see if the two properties being discussed are contrastive in some way. For instance, whether the two properties under consideration are the opposite edges of a bar, or are the X and Y axes, etc. If so, the system picks a contrastive cue phrase (e.g., whereas) to merge the clauses resulting from the two propositions, otherwise the system picks the cue phrase and.</Paragraph> <Paragraph position="1"> Let us now briefly examine how aggregation affected clauses 3-5 of the sample caption in Figure 19. Clauses 3-4 were conjoined because they are about the same grapheme type, a horizontal bar, in the same space. Moreover, the module placed a whereas cue phrase between the two clauses, because the opposite edges of a bar are considered contrastive properties.</Paragraph> </Section> <Section position="4" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 5.4 Centering Module </SectionTitle> <Paragraph position="0"> Once clauses are ordered and aggregated, coherence of the generated text can be further improved by selecting appropriate orderings between arguments of each clause.</Paragraph> <Paragraph position="1"> For this task, we have developed a selection strategy based on the centering model.</Paragraph> <Paragraph position="2"> Focus (e.g., Sidner 1979; Grosz 1977) and centering (e.g., Grosz, Joshi, and Weinstein 1995) models are attempts at explaining linguistic and attentional factors that contribute to local coherence among utterances. Although focus and centering models were originally developed as foundations for understanding systems, they have frequently been proposed as effective knowledge sources for NLG systems. In particular, for generating referring expressions (including pronominalization) (see Dale [1992], Appelt [1985], and Maybury [1991]), for deciding when to combine clauses (subordination and aggregation) (see Derr and McKeown [1984]), and finally for choosing appropriate inter/intraclause orderings, namely, ordering between clauses and between their arguments (see Maybury [1991], Hovy and McCoy [1989], and McKeown [1985]).</Paragraph> <Paragraph position="3"> Details on centering theory and its relation to discourse structure can be found in Grosz, Joshi, and Weinstein (1995), Walker (1993), Walker, Iida, and Cote (1994), Grosz and Sidner (1993), and Gordon, Grosz, and Gilliom (1993); for lack of space in this paper, we only provide a minimal introduction to the basic terminology of centering theory.</Paragraph> <Paragraph position="4"> Centers are semantic objects (not words, phrases, or syntactic forms) that link an utterance to other utterances in the same discourse segment. Centering theory provides definitions for three different centers, and for four possible center transitions between two adjacent utterances. It also states two fundamental constraints on center movement and realization.</Paragraph> <Paragraph position="5"> * Cf(U): The set of forward-looking centers, which contains all the entities that can link the current utterance to the following one. It is not constrained by features of previous utterances. Elements of Cf(U) are ordered; the major determinant of the ranking on the Cf(U) is grammatical role with subject > object > others. 12
* Cp(U): Highest ranking element of Cf(U) * Cb(U): The backward-looking center (unique) is the highest ranking element of Cf(Ui-1) realized in the current utterance Ui. Cb(U) is a discourse construct, therefore the same utterance in different discourse segments may have a different Cb.</Paragraph> <Paragraph position="6"> Center Transitions. The four possible center transitions across pairs of utterances are shown in Table 1.</Paragraph> <Paragraph position="7"> The central tenet in centering theory is that discourse coherence of a text span increases (and a reader's cognitive load decreases) proportionately to the extent that discourse within the span follows two fundamental centering constraints (Grosz, Joshi, and Weinstein 1995). These are: Constraint on realization: If any element in the set of forward-looking centers of an utterance (Ui) is realized by a pronoun in the following utterance (Ui+1), then the backward-looking center of the following utterance (Ui+1) must also be realized by a pronoun.</Paragraph> <Paragraph position="8"> Constraint on movement: (i.e., centering transitions) Sequences of CONTINUATIONS are preferred over sequences of RETAININGS; and sequences of RETAININGS are preferred over sequences of SHIFTINGS (and consequently, smooth shifts are preferred over rough shifts).</Paragraph> <Paragraph position="9"> Grosz and her colleagues suggest that a competent generation system should apply the constraint on movement by planning ahead in an attempt to minimize the number of SHIFTS in a locally coherent discourse segment (Grosz, Joshi, and Weinstein 1995). Our centering-based strategy implements this suggestion by selecting intraclause orderings that enforce centering transitions consistent with a given discourse structure. The strategy is general and can be applied to any discourse structure, but to be effectively applied to the generation of captions, some assumptions not supported in terms of centering theory must be made.</Paragraph> <Paragraph position="11"> This chart presents information about house sales from data-set TS-1742. The Y-axis indicates the houses. The dark gray bar shows the house's selling price whereas the black bar shows the asking price.</Paragraph> <Paragraph position="12"> Figure 20 The referring expression module uses color in this case to distinguish between the two types of bars and the attributes mapped to them.</Paragraph> <Paragraph position="13"> The problem is that the NPs generated in the captions are often possessive and have complex syntactic structures (e.g., the selling price of the house, the mark's horizontal position) and centering theory is not yet clear on the determination of centers in complex syntactic structures such as possessives and subordinate clauses (Grosz and Sidner 1993). To accommodate this problem, we made two assumptions. First, given possessives of the form &quot;property of grapheme/entity&quot;, either the grapheme or the entity is the center, not their properties. Second, even when only a property (e.g., selling-price, right edge) is mentioned, the corresponding entity or grapheme is the center.</Paragraph> <Paragraph position="14"> Our centering strategy processes the ordered speech acts sequentially and assumes that text spans describing the mappings from properties of a grapheme to properties of an entity are locally coherent discourse segments.
The strategy enforces the constraint on movement within each of these discourse segments by preferring a CONTINUATION or a SMOOTH-SHIFT transition to a RETAIN or a ROUGH-SHIFT transition, respectively. This is done by keeping the highest-ranking forward-looking center of the first clause of the segment (which is either an entity or a grapheme) as the Cp(Ui) of all the following clauses in the same segment. In this way, in all such clauses the Cb(Ui) and the Cp(Ui) will be the same and, according to Table 1, this corresponds to forcing either CONTINUATIONS or SMOOTH-SHIFTS.</Paragraph> <Paragraph position="15"> Furthermore, the strategy applies an additional constraint on movement: between segments dealing with different graphemes, the strategy explicitly marks the segment boundaries by preferring ROUGH-SHIFT over SMOOTH-SHIFT and RETAIN over CONTINUATION. This case is not mentioned in Grosz, Joshi, and Weinstein (1995). However, since the system maintains local coherence in a segment by minimizing ROUGH-SHIFTS and RETAINS, it seems intuitive to prefer ROUGH-SHIFTS and RETAINS to emphasize the change at segment boundaries (i.e., the boundaries between such segments should be maximally incoherent). Thus, in the caption generation application, when a text span describing the mapping for a grapheme (a discourse segment) is followed by a description of a mapping for a different grapheme (another discourse segment), the centering strategy will try to force either a ROUGH-SHIFT or RETAIN to mark the segment boundary. This is done by moving the Cb(Ui) of the clause following the boundary out of the clause front position. That is, if the grapheme is the Cb(Ui), the domain entity is placed in front of the clause, and vice versa in the other case.</Paragraph> <Paragraph position="16"> For example, consider the effect of the centering strategy on clauses 3--5 of the sample caption shown in Figure 19. Since clauses 3 and 4 are about mappings from properties of the same grapheme--a horizontal bar--they are assumed to belong to the same discourse segment. Therefore, the system keeps the Cp of clause 4 equal to the Cp of clause 3 by placing the possessive the house's asking price in front of the clause. In contrast, since clauses 4 and 5 are about mappings from properties of different graphemes, a RETAIN centering transition (as opposed to a CONTINUATION) was enforced by moving the possessive corresponding to the Cb, the house's agency estimate, out of the front position. Once intraclause orderings are determined by the centering strategy, the annotated speech acts are passed to the referring expression module.</Paragraph> </Section> <Section position="5" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 5.5 Referring Expression Module </SectionTitle> <Paragraph position="0"> The referring expression module is largely based on the algorithm for incremental interpretation described in Dale and Reiter (1995). The incremental interpretation algorithm can generate appropriate referring expressions by incrementally constructing a set of attributes that uniquely identify the desired referent. These identifying attributes are selected based on a domain-specific default ordering. In our case, the only referential problem is identifying the graphemes, and often the type of the grapheme (e.g., &quot;bar&quot;) is sufficient to do so. 13 However, sometimes, a graphic may contain multiple graphemes of the same type.
In such cases, the system must utilize additional perceptual properties (e.g., color, saturation, size, shape) to build an appropriate referring expression. For example, the referring expressions for the bars in the caption for the chart shown in Figure 20 use color as an additional identifying attribute.</Paragraph> <Paragraph position="1"> Since our system generates multisentential captions, the referring expression module takes into account what is in focus at a given point in the discourse in order to generate concise and natural expressions. The referring expression module considers in focus all of the forward-looking centers (i.e., Cf) computed by the centering module, and simply removes identifying attributes if they are in the Cf at that point in the discourse. This strategy results in the more concise rephrasing: (3) The house's selling price is shown by the left edge of the bar (4) whereas the asking price is shown by the right edge. The horizontal position of the mark shows the agency estimate. 13 For instance, we do not have to worry about issues such as implicatures conveyed by lexical choices or the use of non-basic-level classes, since the set of objects and the available ways of referring to them in our context is so limited.</Paragraph> <Paragraph position="2"> There are other forms of referring expression reduction due to discourse context that require a more sophisticated treatment. Hand-written captions often radically simplify descriptions to express facts such as: &quot;the third chart shows the neighborhood.&quot; However, the system-generated caption would express the underlying proposition, based on the data to grapheme mappings, as: &quot;the position of the mark in the third chart shows the neighborhood.&quot; The sequence of reductions shown below could achieve the more natural effect by repeatedly reasoning about the picture and the information being conveyed by each statement.</Paragraph> <Paragraph position="3"> (1) the position of the mark in the third chart shows the neighborhood (2) the mark in the third chart shows the neighborhood (3) the third chart shows the neighborhood The system would need to realize that position was the only attribute of the mark being used for a mapping, and position is always clear in a graph and need not explicitly be mentioned; thus resulting in statement (2). However, since the mark is the only grapheme used in the graph, the system could leave off mentioning the mark as well, thus resulting in statement (3). There are two ways of dealing with this issue: (i) The system could apply iterative refinements of the referring expressions generated by the planner, as done in the local brevity algorithm (Reiter 1990). However, handling this single case in that way would have substantially increased the computational cost of generating referring expressions in all cases, without significantly improving any of the other (perfectly appropriate) referring expressions generated by the module. (ii) The system could recognize this specific situation at a higher level and process the speech acts appropriately to avoid this situation completely. Thus, rather than considering this situation as a problem of generating an appropriate expression for the concept position of the mark in the third chart, we have chosen to push this problem up to the planner level during content selection.
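As a rough illustration of the two mechanisms just described, the sketch below (with hypothetical attribute names and data, not the module's actual code) pairs Dale and Reiter-style incremental attribute selection with a crude focus-based reduction; it is a simplification and ignores the planner-level handling discussed next.

# A rough sketch, not the module's actual code, of the two mechanisms above:
# incremental selection of identifying attributes, and a crude reduction that
# drops description components whose referents are already forward-looking centers.

PREFERENCE_ORDER = ["type", "color", "saturation", "size", "shape"]   # assumed default ordering

def distinguishing_attrs(referent, distractors):
    """Add attributes, in preference order, until no distractor is left."""
    chosen = {}
    remaining = list(distractors)
    for attr in PREFERENCE_ORDER:
        if not remaining:
            break
        value = referent.get(attr)
        if any(d.get(attr) != value for d in remaining):     # rules out at least one distractor
            chosen[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    chosen.setdefault("type", referent.get("type"))           # the head noun is always realized
    return chosen

def reduce_by_focus(components, in_focus):
    """Drop components of a description whose referents are already in focus (in Cf)."""
    reduced = [c for c in components if c not in in_focus]
    return reduced if reduced else components[:1]              # never drop the whole description

# Two bars distinguished by color, as in the caption for Figure 20:
bars = [{"type": "bar", "color": "dark gray"}, {"type": "bar", "color": "black"}]
print(distinguishing_attrs(bars[0], [bars[1]]))    # {'color': 'dark gray', 'type': 'bar'}

# "the right edge of the bar" can shrink to "the right edge" once the bar is
# among the forward-looking centers (compare examples (3) and (4) above):
print(reduce_by_focus(["right edge", "bar"], in_focus={"bar", "house"}))   # ['right edge']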
Consequently, there are operators that look specifically for this situation--a single grapheme in a space, mapping a single property--and the planner selects them when it arises. While this does tend to muddy the distinction between the &quot;high-level&quot; planner and the &quot;lower-level&quot; tactical processing--because the planner is now forced to deal with this one situation regarding referring expressions that should arguably be dealt with more properly by the referring expression module--it does enable the system to generate appropriate texts with a simpler, more efficient approach in this application.</Paragraph> <Paragraph position="4"> It should be noted that there is one additional type of referring expression that our system is capable of generating. This happens in situations when the graphic being explained is considered complex enough to require an example. In such cases, the system attempts to highlight the grapheme corresponding to the tuple being used in the example. There are a number of ways in which the relevant grapheme can be highlighted--with an arrow, a circle surrounding the grapheme, a change in color, or another graphical annotation--and a corresponding number of ways in which the caption can then refer to the grapheme. This is similar to the approaches used for generating cross-modal references discussed in the context of the COMET (McKeown et al. 1992) and WIP (André and Rist 1994) projects. This will be illustrated in the next subsection, which discusses the generation of examples.</Paragraph> </Section> <Section position="6" start_page="400001" end_page="400001" type="sub_section"> <SectionTitle> 5.6 Example Generation Module </SectionTitle> <Paragraph position="0"> If the text planner encounters particularly complex data-to-grapheme mappings, it can attempt to present an example to clarify the problematic mappings. Our current implementation is designed to trigger the example generation process in the case of interfering grapheme clusters where occlusion can hinder interpretation. Plan operators (described in Section 5.1) contain constraints that check for the appropriate conditions and establish goals for the generation of an example in the caption. In response, the example generation module selects a grapheme shown in the picture, finds the data values associated with the individual grapheme, and constructs an example that can be used by the text planner. Additionally, the example generator posts a request to SAGE to highlight the relevant instance in the picture. If the highlighting request succeeds, the example generator annotates the example with this information and the resulting caption mentions the highlighted grapheme. Currently, this is the only case in which the caption generation mechanism can influence the graphic design. A caption fragment that includes an example is shown below: For example, as shown in the highlighted tuple, 3237 Beechwood Boulevard's asking price is 79900 dollars and its selling price is 65000 dollars. Its agency estimate is 79781.625 dollars. Its neighborhood is Squirrel Hill.</Paragraph> <Paragraph position="1"> There are a number of issues relevant to the generation of captions that integrate examples and text (Mittal and Paris 1992, 1993).
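Before turning to those issues, the sketch below gives a schematic rendering of the flow just described; the function names, grapheme identifier, and highlighting hook are hypothetical stand-ins for the actual interfaces between the example generator, the text planner, and SAGE.

# Schematic sketch of the example-generation flow described above.  Only the
# overall sequence (pick a grapheme, look up its data values, request
# highlighting, record the outcome) mirrors the text; all names are hypothetical.

def pick_identifiable(graphemes):
    """Placeholder selection heuristic: prefer a grapheme that is easy to single out."""
    return graphemes[0] if graphemes else None

def generate_example(graphemes, lookup_tuple, request_highlight):
    """graphemes: candidate graphemes shown in the picture (assumed representation).
    lookup_tuple: returns the data values mapped to a grapheme.
    request_highlight: asks the graphic side to highlight a grapheme; True on success."""
    candidate = pick_identifiable(graphemes)
    if candidate is None:
        return None
    example = {"grapheme": candidate, "values": lookup_tuple(candidate)}
    example["highlighted"] = request_highlight(candidate)   # caption may then say "the highlighted tuple"
    return example

# Hypothetical usage with the house-sales values quoted in the caption fragment above:
example = generate_example(
    graphemes=["interval-bar-17"],                           # hypothetical grapheme id
    lookup_tuple=lambda g: {"house": "3237 Beechwood Boulevard",
                            "asking-price": 79900,
                            "selling-price": 65000,
                            "neighborhood": "Squirrel Hill"},
    request_highlight=lambda g: True)
print(example["highlighted"])   # True: the caption can refer to the highlighted grapheme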
We will not discuss these issues in detail here because the context in which our current system generates explanations is very restricted (as compared to the general case of expository text, in which examples are traditionally used when novel or abstract concepts are being introduced). The main difference between generating examples for purely textual descriptions and our current application is in the selection of values used for illustration: one of the constraints in our current situation is the ability of the reader to identify the grapheme in question. Rather than use a strategy that finds and uses either extreme, limiting values or more prototypical values, the current application requires the selection of a grapheme that is easy to identify and that facilitates the interpretation of values mapped to it. To enable this, the system must be able to reason about individual graphemes as well as the picture as a whole: which graphemes are not crowded by other graphemes, are not so small, thin, or otherwise unconventional as to make interpretation difficult, have data values mapped to them that can be discussed in the caption, 14 etc. 14 This situation often occurs in maps, when certain tuples are better for examples because they're close to landmarks that can be used to identify them.</Paragraph> <Paragraph position="2"> 6. System Implementation and Evaluation: A Discussion In general, it is essential to empirically evaluate theories and systems that purportedly implement them. Not only do evaluations help others understand the strengths and limitations of various hypotheses and systems, but they also facilitate comparisons between competing claims in many cases. However, NLG evaluations are considered difficult (Hovy and Meteer 1990). NLG systems can be evaluated at many different levels, some of which are orthogonal to each other. Our case is no exception. There are at least three different, and equally important, questions that one could investigate further: * validity of the complexity metric: This is perhaps the most critical aspect, since without a valid complexity metric, the system would not be able to generate reasonable captions irrespective of how well any/all of the other components performed. The only way to corroborate the complexity metrics we discussed here would be through rigorous user experiments; fortunately, a recent dissertation on graph comprehension (Shah 1995) looked at some of the factors in our complexity metrics and found that many of the factors used were indeed correlated with the increased times required to interpret graphs and charts.</Paragraph> <Paragraph position="3"> * validity of the discourse strategies: The paper discussed three discourse strategies for structuring information presented in the captions. There are at least two ways to evaluate the set of strategies used: (1) We could perform a corpus analysis on a different set of charts and captions than those used to initially infer the strategies, in an effort to see how well they fit the test set: this is the usual approach in machine learning, where the learning and test sets are kept separate for precisely this reason. This would require significant resources to find and code charts and their captions for both the data displayed and the discourse strategies used, but it would help determine whether the set of discourse strategies we had come up with was both consistent and complete.
(2) Another way to evaluate the discourse strategies would be to conduct user comprehension tests with various charts and captions generated using strategies chosen at random: while this would be less efficient at testing the set of strategies for completeness, it would allow us to validate that a particular strategy (from our set of three) was best suited for particular types of charts.</Paragraph> <Paragraph position="4"> * utility of the captions generated: This is really the &quot;value-added&quot; test: are the captions and the graphics together better than the graphics alone for some purpose? If so, the value of generating the captions would be confirmed. We conducted an informal, subjective evaluation of the system over a period of two years. Whenever users interacted with SAGE and were unable to understand a graphic, we suggested that they generate a caption. Later on, we requested feedback on their experience: whether the captions were useful or not, and whether they would have liked to see something different. We can categorically state that the captions clearly help in understanding the graphic being presented. The need for natural language explanations seems to arise every time a novel, complex graphic is generated--something that happens quite frequently with SAGE.</Paragraph> <Paragraph position="5"> A large part of the work we have discussed in this paper is system-independent and applicable to any automatic graphic design system. Perhaps the most surprising aspect of our current implementation is how far one can get with such a simple architecture. We made certain simplifying decisions initially in order to get a prototype implemented. Surprisingly few of these simplifying assumptions were problematic down the line. An example of this is our pipelined architecture. Most NLG researchers agree that the various modules in an NLG system need to be strongly interconnected, with bidirectional communication and control, and need to use shared data structures. We started off with a pipelined architecture and were surprised to find that the simplifications seemed to be problematic in only one situation (which we were able to get around by planning appropriately). There are several advantages to a pipelined approach in a case like ours: not only is it easy to design, implement, and test each module independently, but it also becomes easy to extend the functionality of any individual module without significantly affecting the others. While such a simplified architecture will certainly not suffice for all generation tasks, this is a strong argument for trying the minimal approach first to see where it falls short and why.</Paragraph> <Paragraph position="6"> Over the last two years, this system has been used to generate captions for several hundred figures in different domains (housing-sales, Napoleon's march of 1812, logistics transportation, scheduling, etc.). Porting the system from one domain to another usually requires only specifying the lexicon for the new domain (e.g., battle, troops, etc.).
The fact that the captions generated in each of these--quite different--domains are deemed useful and natural by users is testimony to the effectiveness of the caption generation mechanism currently in place.</Paragraph> <Paragraph position="7"> It should be noted that there are two shortcomings in the system that will be addressed in future work: (1) the caption generation system, as described here, cannot, in general, modify the graphics designed by SAGE. There are several cases where this capability would be extremely useful, but the system was designed to work after SAGE had designed and rendered the graphic. There is one specialized case where coordination currently occurs, which is when the caption generator presents an example. In that case, the caption generator can request that the graphemes corresponding to the tuple values used in the example be highlighted in the picture; (2) the system does not, as yet, analyze the data set for interesting patterns or clusters of data points. To do this, the system will need a clustering analysis module that can be used by the caption generator. As a result, the system cannot generate captions of the sort &quot;this chart shows that sales were flat throughout 1995, but rose sharply in 1996.&quot;</Paragraph> </Section> </Section> </Paper>