<?xml version="1.0" standalone="yes"?> <Paper uid="W01-0802"> <Title>A Two-stage Model for Content Determination</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Content Determination </SectionTitle> <Paragraph position="0"> Content determination is the task of deciding on the information content of a generated text. In the three-stage pipeline model of Reiter and Dale (2000), content determination is part of the first stage, document planning, along with document structuring (determining the textual and rhetorical structure of a text). Content determination is extremely important to end users; in most applications users probably prefer a text which poorly expresses appropriate content to a text which nicely expresses inappropriate content. From a theoretical perspective content determination should probably be based on deep reasoning about the system's communicative goal, the user's intentions, and the current context (Allen and Perrault 1980), but this requires an enormous amount of knowledge and reasoning, and is difficult to do robustly in real applications.</Paragraph> <Paragraph position="1"> In recent years many new content determination strategies have been proposed, ranging from the use of sophisticated signal-processing techniques (Boyd 1997) to complex planning algorithms (Mittal et al 1998) to systems which exploit cognitive models of the user (Fiedler 1998). However, most of these strategies have only been demonstrated in one application.</Paragraph> <Paragraph position="2"> Furthermore, as far as we can tell these strategies are usually based on the intuition and experiences of the developers. While realisation, microplanning, and document structuring techniques are increasingly based on analyses of how humans perform these tasks (including corpus analysis, psycholinguistic studies, and KA activities), most papers on content determination make little reference to how human experts determine the content of a text. Human experts are often consulted with regard to the details of content rules, especially when schemas are used for content determination (Goldberg et al 1994, McKeown et al 1994, Reiter et al 2000); but they rarely seem to be consulted (as far as we can tell) when deciding on the general algorithm or strategy to use for content determination.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Summarising Time-Series Data </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Text summaries of Time-Series Data </SectionTitle> <Paragraph position="0"> Time-series data is a collection of values of a set of parameters over time. Such data is very common in the modern world, with its proliferation of databases and sensors, and humans frequently need to examine and make inferences from time-series data.</Paragraph> <Paragraph position="1"> Currently, human examination of time-series data is generally done either by direct inspection of the data (for small data sets), by graphical visualisation, or by statistical analyses.</Paragraph> <Paragraph position="2"> However, in some cases textual summaries of time-series data are also useful. For example, newspapers regularly publish textual summaries of weather predictions, the results of polls and surveys, and stock market activity, instead of just showing numbers and graphs. 
This may be because graphical depictions of time-series data require time and skill to interpret, which are not always available. A doctor rushing to the side of a patient who is suffering from a heart attack, for example, may not have time to examine a set of graphs of time-series data, and a newspaper reader may not have the statistical knowledge necessary to interpret raw poll results.</Paragraph> <Paragraph position="3"> Perhaps the major problem today with textual descriptions of time-series data is that they must be produced manually, which makes them expensive and also means they cannot be produced instantly. Graphical depictions of data, in contrast, can be produced quickly and cheaply using off-the-shelf computer software; this may be one reason why they are so popular.</Paragraph> <Paragraph position="4"> If textual summaries of time-series data could be automatically produced by software as cheaply and as quickly as graphical depictions, then they might be more widely used.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 SUMTIME </SectionTitle> <Paragraph position="0"> The goal of the SUMTIME project is to develop better techniques for automatically generating textual summaries of time-series data, in part by integrating leading-edge NLG and time-series analysis technology. We are currently focusing on two domains: Meteorology - producing weather forecasts from numerical weather simulations. This work is done in collaboration with Weather News Inc (WNI)/Oceanroutes, a leading meteorological company.</Paragraph> <Paragraph position="1"> Gas Turbines - summarising sensor readings from a gas turbine. This work is done in collaboration with Intelligent Applications, a leading developer of monitoring software for gas turbines.</Paragraph> <Paragraph position="2"> These domains are quite different in time-series terms, not least in the size of the data set. A typical weather forecast is based on tens of values for tens of parameters, while a summary of gas-turbine sensor readings may be based on tens of thousands of values for hundreds of parameters. We hope that looking at such different domains will help ensure that our results are generalisable and not domain-specific. We will start working on a third domain in 2002; this is likely to be a medical one, perhaps (although this is not definite) summarising sensor readings in neonatal intensive care units.</Paragraph> <Paragraph position="3"> The first year of SUMTIME (which started in April 2000) has mostly been devoted to knowledge acquisition, that is, to trying to understand how human experts summarise time-series data. This was done using various techniques, including corpus analysis, observation of experts writing texts, analysis of content rules suggested by experts, discussion with experts, and think-aloud sessions, where experts 'think aloud' while writing texts (Sripada, 2001).</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.3 Example </SectionTitle> <Paragraph position="0"> The following table shows an example segment of meteorological time-series data, specifically predicted wind speed and wind direction at an offshore oil exploration site. The time field is shown in 'day/hour' format.</Paragraph> <Paragraph position="1"> The above example is just a sample showing the data and its corresponding forecast text for the wind subsystem.
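As a purely illustrative sketch of how such a segment of wind data might be represented in software (the field names and sample values below are our own assumptions, not the original forecast data), a minimal Python representation could look like this:

# Illustrative sketch only: field names and values are assumed, not taken from the paper's data.
from dataclasses import dataclass
from typing import List

@dataclass
class WindReading:
    day_hour: str        # time in 'day/hour' format, e.g. '05/06'
    speed_kt: float      # predicted wind speed in knots
    direction: str       # predicted wind direction, e.g. 'SSW'

# A hypothetical segment of predicted wind data for an offshore site.
segment: List[WindReading] = [
    WindReading("05/06", 10.0, "SSW"),
    WindReading("05/12", 14.0, "SW"),
    WindReading("05/18", 18.0, "SW"),
]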
Real weather forecast reports are much longer and are produced from data involving many more weather parameters than just wind speed and wind direction.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Human Summarisation </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Meteorology </SectionTitle> <Paragraph position="0"> In the domain of weather forecasting, we observed how human experts carry out the task of summarising weather data by video recording a meteorologist thinking aloud while writing weather forecasts. Details of the KA have been described in Sripada (2001). Our observations included the following: 1. In the case of weather forecasts, time-series data represent the values of important weather parameters (wind speed, direction, temperature, rainfall), which collectively describe a single system, the weather. It seemed as though the expert was constructing a mental picture of the source of the data from the significant patterns in the time series. Thus the first activity is that of data interpretation, to obtain a mental model of the weather.</Paragraph> <Paragraph position="1"> 2. The mental model of the weather is mostly in terms of elements and objects of the atmosphere, such as cold fronts and warm fronts; it also seems to be qualitative rather than numerical. In other words, it qualitatively describes the meteorological state of the atmosphere. The expert calls this an 'overview of the weather'.</Paragraph> <Paragraph position="2"> 3. Building the overview involves interpreting the time-series weather data. While interpreting this data the expert used his meteorological knowledge (which includes his personal experience in interpreting weather data) to arrive at an overview of the weather. During this phase, he appeared to be unconcerned about the end user of the overview (see 4.1.1 below).</Paragraph> <Paragraph position="3"> We call this process Domain Problem Solving (DPS), where information is processed using domain knowledge exclusively.</Paragraph> <Paragraph position="4"> 4. Forecasts are written after the forecaster gets a clear mental picture (overview) of the weather. Building the overview from the data is an objective process which does not depend on the forecast client (user), whereas writing the forecast is subjective and varies with the client.</Paragraph> <Paragraph position="5"> Two examples of the influence of the overview on wind texts (Section 3.3) are: 1. When very cold air flows over a warm sea, surface winds may be underestimated by the numerical weather model. In such cases the forecaster uses his 'overview of the weather' to increase wind speeds and also perhaps add other instability features to the forecast, such as squalls.</Paragraph> <Paragraph position="6"> 2. If the data contains an outlier, such as a wind direction which is always N except for one time period in which it is NE, then the expert uses the overview to decide if the outlier is meteorologically plausible and hence should be reported, or if it is likely to be an artefact of the simulation and hence should not be reported (a rough sketch of this kind of check is given below).</Paragraph> <Paragraph position="7"> The above examples involve reasoning about the weather system. Forecasters also consider user goals and tasks, but this reasoning may be less affected by the overview.
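To make the two overview-based checks above concrete, here is a minimal sketch of how they might be expressed as rules (the data structures, field names, and the correction factor are our own illustrative assumptions, not the forecaster's actual procedure or the SUMTIME design):

# Illustrative sketch only: overview fields, thresholds and rules are assumed.
from dataclasses import dataclass

@dataclass
class WeatherOverview:
    cold_air_over_warm_sea: bool   # qualitative judgement taken from the overview
    unstable_airmass: bool         # whether brief shifts or squalls are plausible

def adjust_wind_speed(model_speed_kt: float, overview: WeatherOverview) -> float:
    """Example 1: raise NWP wind speeds when the overview says the model underestimates them."""
    if overview.cold_air_over_warm_sea:
        return model_speed_kt * 1.15   # hypothetical correction factor
    return model_speed_kt

def report_outlier_direction(outlier_dir: str, usual_dir: str,
                             overview: WeatherOverview) -> bool:
    """Example 2: report a one-off change of direction only if the overview makes it plausible."""
    if outlier_dir == usual_dir:
        return True
    # A brief shift (e.g. N to NE) is reported only when the overview supports it;
    # otherwise it is treated as a simulation artefact and suppressed.
    return overview.unstable_airmass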
As an example of such user-oriented reasoning, in one think-aloud session the forecaster decided to use the phrase 20-24 to describe the wind speed when the data file predicted a wind speed of 19kt. He explained to us that he did this because he knew that oil-rig staff used different operational procedures (for example for docking supply boats) when the wind exceeded 20kt, and he also knew that even if the average wind speed in the period was 19kt, the actual speed was going to vary minute by minute and would often be above 20kt. Hence he decided to send a clear signal to the rig staff that they should expect to use '20kt or higher' procedures, by predicting a wind speed of 20-24. This reasoning about the user took place after the overview had been created, and did not seem to involve the overview.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Gas Turbine Sensors </SectionTitle> <Paragraph position="0"> Unlike in the previous domain, in the gas turbine (GT) domain there are currently no textual summaries of turbine data written by humans.</Paragraph> <Paragraph position="1"> We therefore asked the domain experts to comment orally on the data; the experts also attempted to summarise their comments at the end of each session if they found something worth summarising. Our observations included: 1. The main task is to identify the abnormal data and summarise it. However, an abnormal trend in a specific channel might be caused by a change in another channel (for instance, an increase in the output voltage can be explained by a corresponding increase in the fuel input). Thus individual channel data needs to be interpreted in the context of the other channels.</Paragraph> <Paragraph position="2"> 2. The expert agreed during the KA session that he first analyses the data numerically to obtain qualitative trends relating to the GT before generating comments. The state of the GT that produced the data is thus constructed through data interpretation, and knowledge of this state is then used to check whether or not the turbine is healthy. Since a GT is an artefact created by humans, it is possible to have a fairly accurate model of its states (unlike the weather!).</Paragraph> <Paragraph position="3"> 3. The phrases used by the expert often express the trends in the data as if they were physical happenings on the turbine, like 'running down' for a decreasing trend in shaft-speed data. This indicates that the expert is merely expressing the state of the GT. This in turn indicates that at the time the summarisation is done, the mental model of the state of the GT is available.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Evidence from Other Projects </SectionTitle> <Paragraph position="0"> After making the above observations, we examined think-aloud transcripts from an earlier project at the University of Aberdeen, STOP (Reiter et al 2000), which involved building an NLG system that produced smoking-cessation letters from smoking questionnaires. These transcripts (from think-aloud sessions of doctors and other health professionals manually writing smoking-cessation letters) showed that in this domain too, experts would usually first build an overview (in this case, of the smoker) before starting to determine the detailed content of a letter. Below is an excerpt from one of the transcripts of a KA session: << ....
The first thing I have got to do is to read through the questionnaire just to get some idea of where he is at with his smoking. ...... >> We did not investigate overview formation in any detail in STOP, but the issue did come up once in a general discussion with a doctor about the think-aloud process. This particular doctor said that he built in his mind a mental image of the smoker (including a guess at what he or she looked like), and that he found this image very useful in deciding how best to communicate with the smoker.</Paragraph> <Paragraph position="1"> There is also evidence of an overview influencing content determination in RajuGuide (Sripada 1997), a system that generates route descriptions. At a high level of abstraction, RajuGuide has two parts: the first is responsible for planning the route the user wants, and the second for generating the text describing the route. The route computed by the first part, which is in the form of a series of coordinates, is not directly communicated to the user. Instead, the second part attempts to enrich the route description depending upon what the user already knows and what additional information the knowledge base holds for that particular route. We believe that the route computed by the route planner is the overview in this case, and that it drives the content determination process in the second part.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Two-stage Model for Content Determination </SectionTitle> <Paragraph position="0"> These observations have led us to make the following hypotheses: 1. Humans form a qualitative overview of the input data set.</Paragraph> <Paragraph position="1"> 2. Not all the information in the overview is used in the text.</Paragraph> <Paragraph position="2"> 3. The overview does not depend on pragmatic factors such as the user's tastes; these are considered at a later stage of the content determination process.</Paragraph> <Paragraph position="3"> Based on the above hypotheses, we propose a two-stage model for content determination, as depicted in Figure 1. We assume that the Domain Data Source (DDS) is external to the text generator, and that a Domain Problem Solver or Domain Reasoner (DR) is available for data processing. This reasoning module draws inferences while interpreting the input data set and is ultimately responsible for generating the overview. The Communication Goal (CG) is the input to the data summarisation system; in response to it, the system accesses the DDS to produce an overview of the data using the DR. In the context of the overview produced by the DR, the Communication Reasoner generates the final content specification, taking into account the influence of the User Constraints (UC) and other pragmatic factors. This content is then sent to subsequent NLG modules (not shown), such as microplanning and surface realisation.</Paragraph> <Paragraph position="4"> Our model has some similarities to the one proposed by Barzilay et al (1998), in that the Domain Reasoner uses general domain knowledge similar to their RDK, while the Communication Reasoner uses communication knowledge similar to their CDK and DCK.</Paragraph> <Paragraph position="5"> The central feature of the above model is the idea of a data overview and its effect on content selection. One possible use of overviews is to trigger context-dependent content rules.
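As a very rough sketch of what this two-stage organisation and such context-dependent rules might look like in code (all class names, fields, thresholds and rules here are illustrative assumptions of ours, not the actual SUMTIME design), consider:

# Illustrative sketch of the two-stage model; names and rules are assumed, not from SUMTIME.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Overview:
    facts: Dict[str, object] = field(default_factory=dict)   # qualitative result of stage 1

@dataclass
class UserConstraints:
    expertise: str = "non-specialist"                          # pragmatic factors, used only in stage 2

@dataclass
class ContentRule:
    # A context-dependent content rule: fires only when the overview provides its context.
    condition: Callable[[Overview, UserConstraints], bool]
    message: Callable[[Overview, UserConstraints], dict]

def domain_reasoner(raw_data: dict) -> Overview:
    """Stage 1 (DR): interpret the raw data into a qualitative overview using domain knowledge only."""
    ov = Overview()
    ov.facts["unstable_airmass"] = raw_data.get("lapse_rate", 0.0) > 0.0098   # assumed threshold
    return ov

def communication_reasoner(ov: Overview, uc: UserConstraints,
                           rules: List[ContentRule]) -> List[dict]:
    """Stage 2 (CR): select content by firing rules in the context of the overview and user constraints."""
    return [rule.message(ov, uc) for rule in rules if rule.condition(ov, uc)]

In such a sketch only the second stage consults the user constraints, mirroring the third hypothesis above.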
The time-series analysis part of SUMTIME is largely based on Shahar's (1997) model, which makes heavy use of such rules. In Shahar's model contexts are inferred by separate mechanisms; we believe that these should be incorporated into the overview, but this needs further investigation.</Paragraph> <Paragraph position="6"> At the current stage of our project we have only a rough idea of what makes up the proposed data overview. Our suspicion is that it is hard to give a generic definition of the data overview that holds for all domains. Instead, we think of the data overview as the result of inferences made from the input data, whose purpose is to help trigger the right content determination rules. For example, in our meteorology domain the input time-series data comes from a numerical weather prediction (NWP) model, but even the most sophisticated NWP models do not fully represent the real atmosphere; all models work with approximations. The NWP data displayed to the meteorologist is therefore interpreted to arrive at a conceptual model in his or her head, which is the overview.</Paragraph> <Paragraph position="7"> 7 Issues with the two-stage model There are a number of issues that need to be resolved with respect to the two-stage model described above.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 7.1 Is overview creation a human artefact? </SectionTitle> <Paragraph position="0"> The main basis for including the overview in the two-stage model has been the observation, made during the think-aloud sessions, that experts form overviews before writing texts. It can be argued that even if humans need an overview, computer programs may not; admittedly, it is hard to prove that they do. What can be done, however, is to show the advantages a computer program gains by using an overview for content selection.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 7.2 Does the overview have any other utility than just providing context for content determination rules? </SectionTitle> <Paragraph position="0"> We believe that the overview can play multiple roles in the overall process of writing textual forecasts. First, the overview can bring additional information into the text that is not directly present in the underlying raw data. In Reiter and Dale's (2000) terminology, overviews are a technique for generating Computable Data content, that is, content which is not directly present in the input data but can be computed or inferred from it. Such content provides much of the value of summary texts. Indeed, one could argue that simple textual descriptions of a set of data values without extra computed or inferred content, such as those produced by TREND (Boyd, 1997), might not be much more useful than a graph of the data.</Paragraph> <Paragraph position="1"> The overview may also help in deciding how reliable the input data is, which is especially important in the meteorology domain, since the data comes from an NWP simulation. This could, for example, help the generation system decide whether to use precise temporal terms such as 'Midnight' or vague temporal terms such as 'tonight'.
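A minimal sketch of how an overview-level reliability estimate could drive this choice of wording (the reliability score, threshold, and phrase tables below are our own assumptions, not part of SUMTIME):

# Illustrative sketch only: the reliability score, threshold and phrasings are assumed.
def temporal_phrase(hour: int, data_reliability: float) -> str:
    """Use a precise time phrase when the overview rates the NWP data as reliable,
    and a vaguer one when it does not."""
    precise = {0: "Midnight", 6: "early morning", 12: "Midday", 18: "early evening"}
    vague = {0: "tonight", 6: "in the morning", 12: "around midday", 18: "in the evening"}
    if data_reliability >= 0.8:               # assumed threshold
        return precise.get(hour, f"{hour:02d}00")
    return vague.get(hour, "later")

# For example, temporal_phrase(0, 0.9) gives 'Midnight', while temporal_phrase(0, 0.5) gives 'tonight'.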
Again one could argue that the ability to convey such uncertainty and reliability information to a non-specialist is a key advantage of textual summaries over graphs.</Paragraph> <Paragraph position="2"> In general, the overview allows reasoning to be carried out on the raw data, and this will probably be useful in many ways.</Paragraph> <Paragraph position="3"> 7.3 How is the overview related to the domain ontology? The basic concepts present in an overview may be quite different from the basic concepts present in a written text. For example, the overview built by our expert meteorologist was based on concepts such as lapse rate (the rate at which temperature varies with height), movement of air masses, and atmospheric stability. However, the texts he wrote mentioned none of these; instead they talked about wind speed, wind direction, and showers. In the STOP domain, overviews created by doctors often seemed to contain many qualitative psychological attributes (depression, self-confidence, motivation to quit, etc.) which were not explicitly mentioned in the actual texts written by the doctors.</Paragraph> <Paragraph position="4"> This suggests that the conceptual ontology (that is, the specification of underlying concepts) of the overview may be quite different from the ontology underlying the actual texts.</Paragraph> <Paragraph position="5"> The overview ontology includes concepts used by experts when reasoning about a domain (such as air masses or motivation), while the text ontology includes concepts useful for communicating information to the end user (such as wind speed, or longer life expectancy). 7.4 What do experts think about the two-stage model? When the two-stage model was reported back to a WNI expert who had participated in a think-aloud session, the expert agreed that he does build an overview (as he did during the KA session) while writing forecasts, but felt that its use may not be necessary for writing all forecasts. In his opinion, the interpretation of most data sets does not require the use of the overview.</Paragraph> <Paragraph position="6"> However, he was quick to add that the quality of the forecasts can be improved by using overviews, which facilitate reasoning with the weather data.</Paragraph> </Section> </Section> </Paper>