File Information

File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/97/j97-1004_metho.xml

Size: 90,422 bytes

Last Modified: 2025-10-06 14:14:29

<?xml version="1.0" standalone="yes"?>
<Paper uid="J97-1004">
  <Title>Developing and Empirically Evaluating Robust Explanation Generators: The KNIGHT Experiments</Title>
  <Section position="4" start_page="66" end_page="71" type="metho">
    <SectionTitle>
2. The Task of Explanation Generation
</SectionTitle>
    <Paragraph position="0"> Explanation generation is the task of extracting information from a formal representation of knowledge, imposing an organization on it, and realizing the information in text. An explanation system must be able to map from a formal representation of domain knowledge (i.e., one which can be used for automated reasoning, such as the predicate calculus) to a textual representation of domain knowledge. Because of the significant differences in formal and textual representational schemes, successfully bridging the gap between them is one of the major challenges faced by an explanation system.</Paragraph>
    <Paragraph position="1"> To communicate complex ideas, an explanation system should be able to produce extended explanations such as those in Figure 1, which shows several explanations produced by KNIGHT from the domain of botanical anatomy, physiology, and development. Note that each of these is a multisentential explanation; the first is a multi-paragraph explanation. These explanations are expository: in contrast to causal explanations produced by automated reasoning systems, expository explanations describe domain phenomena, such as anatomical structures and physiological processes.</Paragraph>
    <Paragraph position="2"> Explanation generation is typically decomposed into two subtasks, explanation planning and realization. Explanation planning itself has two subtasks: content deterruination, in which knowledge structures are extracted from a knowledge base, and organization, in which the selected knowledge structures are arranged in a manner appropriate for communication in natural language. To perform these tasks successfully, an explanation planner must have access to discourse knowledge, which informs its decisions about the content and organization of textual explanations. The organizational aspect of discourse knowledge plays a particularly important role in the construction of extended explanations. Our work focuses on content determination and organization and de-emphasizes issues in realization. However, to more cleanly evaluate the explanation planning work, we have developed a robust realization system that is built on FUF (Elhadad 1992), a unification-based implementation of a large systemic grammar.</Paragraph>
    <Paragraph position="3">  Computational Linguistics Volume 23, Number 1 Question: What happens during embryo sac formation? Response: Embryo sac formation is a kind of female gametophyte formation.</Paragraph>
    <Paragraph position="4"> During embryo sac formation, the embryo sac is formed from the megaspore mother cell. Embryo sac formation occurs in the ovule.</Paragraph>
    <Paragraph position="5"> Embryo sac formation is a step of angiosperm sexual reproduction.</Paragraph>
    <Paragraph position="6"> It consists of megasporogenesis and embryo sac generation.</Paragraph>
    <Paragraph position="7"> During megasporogenesis, the megaspore mother cell divides in the nucellus to form 4 megaspores. During embryo sac generation, the embryo sac is generated from the megaspore.</Paragraph>
    <Paragraph position="8"> Question: What happens during pollen tube growth? Response: Pollen tube growth is a step of angiosperm sexual reproduction.</Paragraph>
    <Paragraph position="9"> It consists of sperm cell generation and sperm cell transport.</Paragraph>
    <Paragraph position="10"> During sperm cell generation, the pollen generative cell divides to form 2 angiosperm sperm cells. During sperm cell transport, 2 angiosperm sperm cells are transported from the pollen tube to the embryo sac.</Paragraph>
    <Paragraph position="11"> Question: What is a spore? Response: The spore is a kind of haploid cell. 4 spores are produced from the spore mother cell during sporogenesis. The spore divides to form 2 plant gametes during gametogenesis. Gametogenesis is a step of gametophyte development.</Paragraph>
    <Paragraph position="12"> Question: What is a root system? Response: The root system is part of the plant and is connected to the mainstem. It is below the hypocotyl and is surrounded by the rhizosphere. The subregions of the root system include the meristem, which is where root system growth occurs.</Paragraph>
    <Paragraph position="13"> Figure 1 Explanations produced by K NIGHT from the Biology Knowledge Base.</Paragraph>
    <Section position="1" start_page="67" end_page="68" type="sub_section">
      <SectionTitle>
2.1 Evaluation Criteria and Desiderata
</SectionTitle>
      <Paragraph position="0"> Evaluating the performance of explanation systems is a critical and nontrivial problem.</Paragraph>
      <Paragraph position="1"> Although gauging the performance of explanation systems is inherently difficult, five evaluation criteria can be applied.</Paragraph>
      <Paragraph position="2">  * Coherence: A global assessment of the overall quality of the explanations generated by a system.</Paragraph>
      <Paragraph position="3"> * Content: The extent to which the information is adequate and focused. * Organization: The extent to which the information is well organized.  explanations are in accord with the established scientific record. In addition to performing well on the evaluation criteria, if explanation systems are to make the difficult transition from research laboratories to field applications, we want them to exhibit two important properties, both of which significantly affect scalability. First, these systems' representation of discourse knowledge should be easily inspected and modified. To develop explanation systems for a broad range of domains, tasks, and question types, discourse-knowledge engineers must be able to create and efficiently debug the discourse knowledge that drives the systems' behavior. The second property that explanation systems should exhibit is robustness. Despite the complex and possibly malformed representational structures that an explanation system may encounter in its knowledge base, it should be able to cope with these structures and construct reasonably well-formed explanations.</Paragraph>
    </Section>
    <Section position="2" start_page="68" end_page="71" type="sub_section">
      <SectionTitle>
2.2 Semantically Rich, Large-Scale Knowledge Bases
</SectionTitle>
      <Paragraph position="0"> Given the state of the art in explanation generation, the field is now well positioned to explore what may pose its greatest challenge and at the same time may result in its highest payoff: generating explanations from semantically rich, large-scale knowledge bases. Large-scale knowledge bases encode information about domains that cannot be reduced to a small set of principles or axioms. For example, the field of human anatomy and physiology encompasses a body of knowledge so immense that many years of study are required to assimilate only one of its subfields, such as immunology.</Paragraph>
      <Paragraph position="1"> Large-scale knowledge bases are currently being constructed for many applications, and the ability to generate explanations from these knowledge bases for a broad range of tasks such as education, design, and diagnosis is critical.</Paragraph>
      <Paragraph position="2"> Large-scale knowledge bases whose representations are semantically rich are particularly intriguing. These knowledge bases consist of highly interconnected networks of (at least) tens of thousands of facts. Hence, they represent information not only about a large number of concepts but also about a large number of relationships that hold between the concepts. One such knowledge base is the Biology Knowledge Base (Porter et al. 1988), an immense structure encoding information about botanical anatomy, physiology, and development. One of the largest knowledge bases in existence, it is encoded in the KM frame-based knowledge representation language. 1 KM provides the basic functionalities of other frame-based representation languages and is accompanied by a graphical user interface, KNED, for entering, viewing, and editing frame-based structures (Eilerts 1994).</Paragraph>
      <Paragraph position="3"> The backbone of the Biology Knowledge Base is its taxonomy, which is a large hierarchical structure of biological objects and biological processes. In addition to the objects and processes, the taxonomy includes the hierarchy of relations that may appear on concepts. The relation taxonomy provides a useful organizing structure for encoding information about &amp;quot;second order&amp;quot; relations, i.e., relations among all of the first order relations.</Paragraph>
      <Paragraph position="4"> Figure 2 depicts the Biology Knowledge Base's representation of embryo sac formation. This is a typical fragment of its semantic network. Each of the nodes in this network is a concept, e.g., megaspore mother cell, which we refer to as a unit or a frame.  A representation of embryo sac formation.</Paragraph>
      <Paragraph position="5"> Each of the arcs is a relation in the knowledge base. For example, the location for embryo sac formation is the concept ovule. We refer to these relations as slots or attributes and to the units that fill these slots, e.g., ovule, as values. In addition, we call a structure of the form (Unit Slot Value) a triple. The Biology Knowledge Base currently contains more than 180,000 explicitly represented triples, and its deductive closure is significantly larger.</Paragraph>
      <Paragraph position="6"> We chose biology as a domain for three reasons. First, it required us to grapple with difficult representational problems. Unlike a domain such as introductory geometry, biology cannot be characterized by a small set of axioms. Second, biology is not a &amp;quot;single-task&amp;quot; subject. Unlike the knowledge bases of conventional expert systems, e.g., MYCIN (Buchanan and Shortliffe 1984), the Biology Knowledge Base is not committed to any particular task or problem-solving method. Rather, it encodes general knowledge that can support diverse tasks and methods such as tutoring students, performing diagnosis, and organizing reference materials. For example, in addition to its use in explanation generation, it has been used as the basis for an automated qualitative model builder (Rickel and Porter 1994) for qualitative reasoning. Finally, we chose biology because of the availability of local domain experts at the University of Texas at Austin.</Paragraph>
      <Paragraph position="7"> It is important to note that the authors and the domain experts entered into a &amp;quot;contractual agreement&amp;quot; with regard to representational structures in the Biology Knowledge Base. To eliminate all requests for representational modifications that would skew the knowledge bas e to the task of explanation generation, the authors entered into this agreement: they could request representational changes only if knowledge was incon- null Lester and Porter Robust Explanation Generators sistent or missing. This facilitated a unique experiment in which the representational structures were not tailored to the task of explanation generation.</Paragraph>
      <Paragraph position="8"> 3. Accessing Semantically Rich, Large-Scale Knowledge Bases To perform well, an explanation system must select from a knowledge base precisely that information needed to answer users' questions with coherent and complete explanations. Given the centrality of content determination for explanation generation, it is instructive to distinguish two types of content determination, both of which play key roles in an explanation system's behavior: Local content determination is the selection of relatively small knowledge structures, each of which will be used to generate one or two sentences; global content determination is the process of deciding which of these structures to include in an explanation.</Paragraph>
      <Paragraph position="9"> There are two benefits of interposing a knowledge-base-accessing system between an explanation planner, which performs global content determination, and a knowledge base. First, it keeps the explanation planner at arm's length from the representation of domain knowledge, thereby making the planner less dependent on the particular representational conventions of the knowledge base and more robust in the face of errors. In addition, it can help build explanations that are coherent. Studies of coherence have focused on one aspect of coherence, cohesion, which is determined by the overall organization and realization of the explanation (Grimes 1975; Halliday and Hassan 1976; Hobbs 1985; Joshi and Weinstein 1981). However, the question &amp;quot;To insure coherence, how should the content of individual portions of an explanation be selected?&amp;quot; is equally important. Halliday and Hassan (1976) term this aspect of coherence semantic unity. There are at least two approaches to achieving semantic unity: either &amp;quot;packets&amp;quot; of propositions must be directly represented in the domain knowledge, or a knowledge-base-accessing system must be able to extract them at runtime.</Paragraph>
      <Paragraph position="10"> One type of coherent knowledge packet is a view. For example, the concept photosynthesis can be viewed as either a production process or an energy transduction process. Viewed as production, it would be described in terms of its raw materials and products: &amp;quot;During photosynthesis, a chloroplast uses water and carbon dioxide to make oxygen and glucose.&amp;quot; Viewed as energy transduction, it would be described in terms of input energy forms and output energy forms: &amp;quot;During photosynthesis, a chloroplast converts light energy to chemical bond energy.&amp;quot; The view that is taken of a concept has a significant effect on the content that is selected for its description. If an explanation system could (a) invoke a knowledge-base-accessing system to select views, and (b) translate the views to natural language (Figure 3), it would be well on its way to producing coherent explanations.</Paragraph>
      <Paragraph position="11"> As a building block for the KNIGHT explanation system, we designed and implemented a robust KB-accessing system that extracts views (Acker 1992; McCoy 1989; McKeown, Wish, and Matthews 1985; Souther et al., 1989; Swartout 1983; Suthers 1988, 1993) of concepts represented in a knowledge base. Each view is a coherent subgraph of the knowledge base describing the structure and function of objects, the change made to objects by processes, and the temporal attributes and temporal decompositions of processes. Each of the nine accessors in our library (Table 1) can be applied to a given concept (the concept of interest) to retrieve a view of that concept. There are three classes of accessors: those that are applicable to all concepts (As-Kind-Of and Functional), those that are applicable to objects (Partonomic-Connection and Substructural), and those that are applicable to processes (Auxiliary-Process--which includes  Accessing and translating a view of photosynthesis.</Paragraph>
      <Paragraph position="12"> Causal, Modulatory, Temporal, and Locational subtypes--Participants, Core-Connection, 2 Subevent, and Temporal-Step). In addition to these &amp;quot;top level&amp;quot; accessors, the library also provides a collection of some 20 &amp;quot;utility&amp;quot; accessors that extract particular aspects of views previously constructed by the system. 3 To illustrate, the Participants accessor extracts information about the &amp;quot;actors&amp;quot; of the given process. For example, some of the actors in the photosynthesis process are chloroplasts, light, chlorophyll, carbon dioxide, and glucose. By specifying a reference process-the second argument of the Participants accessor--the external agent can request a view of the process from the perspective of the reference process. For example, if the system applies the Participants accessor with photosynthesis as the concept of interest and production as the reference process, then the accessor will extract information about the producer (chloroplast), the raw materials (water and carbon dioxide), and the products (oxygen and glucose). In contrast, if the system applies the Participants accessor with photosynthesis as the concept of interest but with energy transduction as the reference process, then it would extract information about the transducer (chlorophyll), the energy provider (a photon), the input energy form (light), and the output energy form (chemical bond energy). By selecting different reference concepts, different information about a particular process will be returned.</Paragraph>
      <Paragraph position="13"> In addition to coherence, robustness is an important design criterion. We define robustness as the ability to gracefully cope with the complex representational struc-</Paragraph>
    </Section>
  </Section>
  <Section position="5" start_page="71" end_page="71" type="metho">
    <SectionTitle>
2 The Core-Connection accessor determines the relation between a process and a core concept.
</SectionTitle>
    <Paragraph position="0"> A core concept is one which is particularly central to a domain. For example, in biology, processes such as &amp;quot;development&amp;quot; and &amp;quot;reproduction&amp;quot; play central roles in many physiological explanations. During the design of a KB-accessing system, the domain-knowledge engineer selects the core concepts and flags these concepts in the knowledge base.</Paragraph>
  </Section>
  <Section position="6" start_page="71" end_page="73" type="metho">
    <SectionTitle>
3 For a more comprehensive description of the accessors, see Lester (1994).
</SectionTitle>
    <Paragraph position="0"> Lester and Porter Robust Explanation Generators Table 1 Library of knowledge-base accessors. KB accessor Arguments Description of View As-Kind-Of concept Finds view of concept as a kind of reference reference concept.</Paragraph>
    <Paragraph position="1"> Auxiliary-Process process Finds temporal, causal, or locational view-type information about process as specified by view-type.</Paragraph>
    <Paragraph position="2"> Participants process Finds &amp;quot;actor-oriented&amp;quot; view of process as reference viewed from the perspective of reference process.</Paragraph>
    <Paragraph position="3"> Core-Connection process Finds the connection between process and a &amp;quot;core&amp;quot; process.</Paragraph>
    <Paragraph position="4"> Functional object Finds functional view of process object with respect to process.</Paragraph>
    <Paragraph position="5"> Partonomic- object Finds the connection from object to a Connection &amp;quot;superpart&amp;quot; of the object in the &amp;quot;partonomy.&amp;quot;</Paragraph>
    <Section position="1" start_page="72" end_page="73" type="sub_section">
      <SectionTitle>
Subevent process
Substructural object
Temporal-Step process
</SectionTitle>
      <Paragraph position="0"> Finds view of &amp;quot;steps&amp;quot; of process.</Paragraph>
      <Paragraph position="1"> Finds structural view of parts of object.</Paragraph>
      <Paragraph position="2"> Finds view of process with respect to another process of which process is a &amp;quot;step.&amp;quot; tures encountered in large-scale knowledge bases without failing (halting execution). The KB accessors achieve robust performance in four ways:  * Omission Toleration: They do not assume that essential information will actually appear on a given concept in the knowledge base. * Type Checking: They employ a type-checking system that exploits the knowledge base's taxonomy.</Paragraph>
      <Paragraph position="3"> * Error Handling: When they detect an irregularity, they return appropriate error codes to the explanation planner.</Paragraph>
      <Paragraph position="4"> * Term Accommodation: They tolerate specialized (and possibly unanticipated) representational vocabulary by exploiting the relation  taxonomy.</Paragraph>
      <Paragraph position="5"> The following four techniques operate in tandem to achieve robustness. First, to cope with knowledge structures that contain additional, unexpected information, the KB accessors were designed to behave as &amp;quot;masks.&amp;quot; When they are applied to particular structures in a knowledge base, the accessors mask out all attributes that they were not designed to seek. Hence, they are unaffected by inappropriate attributes that were installed on a concept erroneously. Second, sometimes a domain-knowledge engineer installs inappropriate values on legal attributes. When the accessors encounter attributes with inappropriate values, they prevent fatal errors from occurring by employing a rigorous type-checking system. For example, suppose a domain-knowledge engineer had erroneously installed an object as one of the subevents of a process. The type-checking system detects the problem. Third, when problems are detected, the nature of the error is noted and reported to the explanation planner. Because the  Computational Linguistics Volume 23, Number 1 planner can reason about the types of problems, it can properly attend to them by excising the offending content from the explanation it is constructing. The KB accessor library currently uses more than 25 different error codes to report error conditions. For example, it will report no superevent available if the &amp;quot;parent&amp;quot; event of a process has not been included. Fourth, the KB accessors exhibit immunity to modifications of the representational vocabulary by the domain-knowledge engineer. For example, given an object, the Substructural accessor inspects the object to determine its parts. Rather than merely examining the attribute parts on the given object, the Substructural accessor examines all known attributes that bear the parts relation to other objects. These attributes include has basic unit, layers, fused parts, and protective components. The Substructural accessor recognizes that each of these attributes are partonomic relations by exploiting the knowledge base's relation taxonomy.</Paragraph>
      <Paragraph position="6"> By using these techniques together, we have developed a KB-accessing system that has constructed several thousand views without failing. Moreover, the view types on which the accessors are based performed well in a preliminary empirical study (Acker and Porter 1994), and evaluations of the KB accessors' ability to construct coherent views, as measured by domain experts' ratings of KNIGHT'S explanations (Section 8), are encouraging.</Paragraph>
  </Section>
  <Section position="7" start_page="73" end_page="74" type="metho">
    <SectionTitle>
4. A Programming Language for Discourse Knowledge
</SectionTitle>
    <Paragraph position="0"> Since the time of Aristotle, a central tenet of rhetoric has been that a rich structure underlies text. This structure shapes a text's meaning and assists its readers in deciphering that meaning. For almost two decades, computational linguists have studied the problem of automatically inducing this structure from a given text. Research in explanation planning addresses the inverse problem: automatically creating this structure by selecting facts from a knowledge base and subsequently using these facts to produce text. To automatically construct explanation plans (trees that encode the hierarchical structure of texts, as well as their content \[Grosz and Sidner 1986; Mann and Thompson 1987\]), an explanation system must possess discourse knowledge (knowledge about what characterizes a clear explanation). This discourse knowledge enables it to make decisions about what information to include in its explanations and how to organize the information.</Paragraph>
    <Paragraph position="1"> It is important to emphasize the following distinction between discourse knowledge and explanation plans: discourse knowledge specifies the content and organization for a class of explanations, e.g., explanations of processes, whereas explanation plans specify the content and organization for a specific explanation, e.g., an explanation of how photosynthesis produces sugar. Discourse-knowledge engineers build representations of discourse knowledge, and this discourse knowledge is then used by a computational module to automatically construct explanation plans, which are then interpreted by a realization system to produce natural language.</Paragraph>
    <Paragraph position="2"> The KB-accessing system described above possesses discourse knowledge in the form of KB accessors. Applying this discourse knowledge, the system retrieves views from the knowledge base. Although this ability to perform local content determination is essential, it is insufficient; given a query posed by a user, the generator must be able to choose multiple KB accessors, provide the appropriate arguments to these accessors, and organize the resulting views. Hence, in addition to discourse knowledge about local content determination, an explanation system that produces multi-paragraph explanations must also possess knowledge about how to perform global content determination and organization. This section sets forth two design requirements for a representation of discourse knowledge, desclibes the Explanation Design  Lester and Porter Robust Explanation Generators Package formalism, which was designed to satisfy these requirements, and discusses how EDPs can be used to encode discourse knowledge.</Paragraph>
    <Section position="1" start_page="74" end_page="74" type="sub_section">
      <SectionTitle>
4.1 Requirements for a Discourse-Knowledge Representation
</SectionTitle>
      <Paragraph position="0"> Our goal is to develop a representation of discourse knowledge that satisfies two requirements: It should be expressive, and it should facilitate efficient representation of discourse knowledge by discourse-knowledge engineers. 4 Each of these considerations are discussed in turn, followed by a representation that satisfies these criteria.</Paragraph>
      <Paragraph position="1"> Expressiveness. A representation of discourse knowledge must permit discourse-knowledge engineers to state how an explanation planner should: * select propositions from a knowledge base by extracting views, * control the amount of detail in an explanation, i.e., if a user requests that terse explanations be generated, the explanation planner should select only the most important propositions, * consider contextual conditions when determining which propositions to include, * order the propositions, and * group the propositions into appropriate segments, e.g., paragraphs.</Paragraph>
      <Paragraph position="2"> The first three aspects of expressiveness are concerned with content determination. To effectively express what content should be included in explanations, a representation of discourse knowledge should enable discourse-knowledge engineers to encode specifications about how to choose propositions about particular topics, the importance of those topics, and ufider what conditions the propositions associated with the topics should be included. These &amp;quot;inclusion conditions&amp;quot; govern the circumstances under which the explanation planner will select particular classes of propositions from the knowledge base when constructing an explanation. For example, a discourse-knowledge engineer might express the rule: &amp;quot;The system should communicate the location of a process if and only if the user of the system is familiar with the object where the process occurs.&amp;quot; As the explanation planner uses this knowledge to construct a response, it can determine if the antecedent of the rule (&amp;quot;the user of the system is familiar with the object where the process occurs&amp;quot;) is satisfied by the current context; if the antecedent is satisfied, then the explanation planner can include in the explanation the subtopics associated with the rule's consequent.</Paragraph>
      <Paragraph position="3"> The final two aspects of expressiveness (ordering and grouping of propositions) are concerned with organization. To encode organizational knowledge, a representation of discourse knowledge should permit discourse-knowledge engineers to encode topic/subtopic relationships. For example, the subtopics of a process description might include (1) a categorical description of the process (describing taxonomically what kind of process it is), (2) how the actors of the process interact, and (3) the location of the process.</Paragraph>
      <Paragraph position="4"> A representation should be sufficiently expressive that it can be used to encode the kinds of discourse knowledge discussed above, and it should be applicable to</Paragraph>
    </Section>
  </Section>
  <Section position="8" start_page="74" end_page="79" type="metho">
    <SectionTitle>
4 While expressiveness and knowledge engineering are the criteria we address, others are also of considerable importance, e.g., soundness and completeness of discourse planners.
</SectionTitle>
    <Paragraph position="0"> Computational Linguistics Volume 23, Number 1 representing discourse knowledge for a broad range of discourse genres and domains. However, discourse knowledge does not specify what syntactic structure to impose on a sentence, nor does it lend any assistance in making decisions about matters such as pronominalization, ellipsis, or lexical choice. These decisions are delegated to the realization system.</Paragraph>
    <Paragraph position="1"> Discourse-Knowledge Engineering. For a given query type, domain, and task, a discourse-knowledge engineer must be able to represent the discourse knowledge needed by an explanation system for responding to questions of that type in that domain about that task. Pragmatically, to represent discourse knowledge for a broad range of queries, domains, and tasks, a formalism must facilitate efficient representation of discourse knowledge. Kittredge, Korelsky, and Rarnbow (1991) have observed that representing new domain-dependent discourse knowledge--they term it domain communication knowledge--is required to create advanced discourse generators, e.g., those for special purpose report planning. Therefore, ease of creation, modification, and reuse are important goals for the design of a discourse formalism. For example, to build an explanation system for the domain of physics, a discourse-knowledge engineer could either build an explanation system de novo or modify an existing system. On the face of it, the second alternative involves less work and is preferable, but designing explanation systems that can be easily modified is a nontrivial task. In the case of physics, a discourse-knowledge engineer may need to modify an existing explanation system so that it can produce explanations appropriate for mathematical explanations.</Paragraph>
    <Paragraph position="2"> To do so, the discourse-knowledge engineer would ideally take an off-the-shelf explanation generator and add discourse knowledge about how to explain mathematical interpretations of the behavior of physical systems. Because of the central role played by discourse-knowledge engineers, a representation of discourse knowledge should be designed to minimize the effort required to understand, modify, and represent new discourse knowledge.</Paragraph>
    <Section position="1" start_page="75" end_page="78" type="sub_section">
      <SectionTitle>
4.2 Explanation Design Packages
</SectionTitle>
      <Paragraph position="0"> Explanation Design Packages emerged from an effort to accelerate the representation of discourse knowledge without sacrificing expressiveness. Our previous explanation generators employed a representation of discourse knowledge that was coded directly in Lisp (Lester and Porter 1991a, 1991b). Although this approach worked well for small prototype explanation systems, it proved unsatisfactory for building fully functioning explanation systems. In particular, it was very difficult to maintain and extend discourse knowledge expressed directly in code.</Paragraph>
      <Paragraph position="1"> Although EDPs are more schema-like than plan-based approaches and consequently do not permit an explanation system to reason about the goals fulfilled by particular text segments, 5 they have proven enormously successful for discourse-knowledge engineering. EDPs give discourse-knowledge engineers an appropriate set of abstractions for specifying the content and organization of explanations. They combine a frame-based representation language with embedded procedural constructs. To mirror the structure of expository texts, an EDP contains a hierarchy of nodes, which provides the global organization of explanations. EDPs are schema-like (McKeown 1985; Paris 1988) structures that include constructs found in traditional programming languages. Just as prototypical programming languages offer conditionals, iterative control structures, and procedural abstraction, EDPS offer discourse-knowledge engineers counterparts of  these constructs that are precisely customized for explanation-planning. 6 Moreover, each EDP names multiple KB accessors, which are invoked at explanation-planning time.</Paragraph>
      <Paragraph position="2"> Because EDPS are frame-based, they can be easily viewed and edited by knowledge engineers using the graphical tools commonly associated with frame-based languages.</Paragraph>
      <Paragraph position="3"> The EDP formalism has been implemented in the KM frame-based knowledge representation language, which is the same representational language used in the Biology Knowledge Base. Because KM is accompanied by a graphical user interface, discourse-knowledge engineers are provided with a development environment that facilitates EDP construction. This has proven to be very useful for addressing a critical problem in scaling up explanation generation: maintaining a knowledge base of discourse knowledge that can be easily constructed, viewed, and navigated by discourse-knowledge engineers.</Paragraph>
      <Paragraph position="4"> EDPs have several types of nodes, where each type provides a particular set of attributes to the discourse-knowledge engineer (Table 2). Note that content specification nodes may have elaboration nodes as their children, which in turn may have their own content specification nodes. This recursive appearance of content specification nodes permits a discourse-knowledge engineer to construct arbitrarily deep trees. In general, a node of a particular type in an EDP is used by the explanation planner to construct a corresponding node in an explanation plan. We discuss the salient aspects of each type of node below. 7 Exposition Nodes. An exposition node is the top-level unit in the hierarchical structure and constitutes the highest-level grouping of content. For example, the exposition 6 EDPs are Turing-equivalent. 7 Representational details of EDPs are discussed in (Lester 1994).  Computational Linguistics Volume 23, Number 1 node of the Explain-Process EDP has four children, Process Overview, Output-Actor-Fates, Temporal Info, and Process Details, each of which is a topic node. Both the order and grouping of the topic nodes named in an exposition node are significant. The order specifies the linear left-to-right organization of the topics, and the grouping specifies the paragraph boundaries. The content associated with topic nodes that are grouped together will appear in a single paragraph in an explanation.</Paragraph>
      <Paragraph position="5"> Topic Nodes. Topic nodes are subtopics of exposition nodes, and each topic node includes a representation of the conditions under which its content should be added to an explanation. Topic nodes have the atomic inclusion property, which enables an explanation planner to make an &amp;quot;atomic&amp;quot; decision about whether to include--or exclude--all of the content associated with a topic node. Atomicity permits discourse-knowledge engineers to achieve coherence by demanding that the explanation planner either include or exclude all of a topic's content. At runtime, if the explanation planner determines that inclusion conditions are not satisfied or if a topic is not sufficiently important given space limitations (see below), it can comprehensively eliminate all content associated with the topic.</Paragraph>
      <Paragraph position="6"> An important aspect of discourse knowledge is the relative importance of subtopics with respect to one another. If an explanation's length must be limited--such as when a user has employed the verbosity preference parameter to request terse explanations-an explanation planner should be able to decide at runtime which propositions to include. EDPS permit discourse-knowledge engineers to specify the relative importance of each topic by assigning a qualitative value (Low, Medium, or High) to its centrality attribute.</Paragraph>
      <Paragraph position="7"> Another important aspect of representing discourse knowledge is the ability to encode the conditions under which a group of propositions should be included in an explanation. Discourse-knowledge engineers can express these inclusion conditions as predicates on the knowledge base and on a user model (if one is employed). For example, he or she should be able to express the condition that the content associated with the Output-Actor-Fates topic should be included only if the process being discussed is a conversion process. Inclusion conditions are expressed as Boolean expressions that may contain both built-in user modeling predicates and user-defined functions.</Paragraph>
      <Paragraph position="8"> Content Speci~cation Nodes. Content specification nodes house the high-level specifications for extracting content from the knowledge base. To fulfill this function, they provide constructs known as content specification expressions. These expressions are instantiated at runtime by the explanation planner, which then dispatches the knowledge base accessors named in the expressions to extract propositions from the knowledge base. Content specification expressions reside in content specification nodes, as in Figure 4. When creating content specification expressions, the discourse-knowledge engineer may name any knowledge base accessor in the KB accessor library. For example, the Super-Structural Connection content specification in Figure 4 names a KB accessor called Find-Partonomic-Connection, and the Process Participants Description content specification names the Make-Participants-View accessor.</Paragraph>
      <Paragraph position="9"> Although the discourse-knowledge engineer may write arbitrarily complex specification expressions in which function invocations are deeply nested, these expressions can become difficult to understand, debug, and maintain. Just as other programming languages provide local variables, e.g., the binding list of a let statement in Lisp, so do content specification nodes. Each time a discourse-knowledge engineer creates a local variable, he or she creates an expression for computing the value of the local  variable at runtime. For example, the Process Participants Description content specification in Figure 4 employs a local variable ?Reference-Process. The content specification expression associated with ?Reference-Process names the KB accessor Find-Ref-Conc and the global variable ?Primary-Concept. Local variables provide a means of decomposing more complex content specification expressions into simpler ones.</Paragraph>
      <Paragraph position="10"> Elaboration Nodes. Elaboration nodes specify optional content that may be included in explanations. They are structurally and functionally identical to topic nodes, i.e., they have exactly the same attributes, and the children of elaboration nodes are content specifications. The distinction between elaboration nodes and topic nodes is maintained only as a conceptual aid to discourse-knowledge engineers: it stands as a reminder that topic nodes are used to specify the primary content of explanations, and elaboration nodes are used to specify supplementary content.</Paragraph>
    </Section>
    <Section position="2" start_page="78" end_page="79" type="sub_section">
      <SectionTitle>
4.3 Developing Task-Specific EDPs
</SectionTitle>
      <Paragraph position="0"> A discourse-knowledge engineer can use EDPS to encode discourse knowledge for his or her application. In our work, we focused on two types of texts that occur in many domains: process descriptions and object descriptions. For example, in biology, one encounters many process-oriented descriptions of physiological and reproductive mechanisms, as well as many object-oriented descriptions of anatomy. In the course of our research, we informally reviewed numerous (on the order of one hundred) passages in several biology textbooks. These passages focused on explanations of the anatomy, physiology, and reproduction of plants. Some explanations were very terse (e.g., those that occurred in glossaries), whereas some were more verbose (e.g., multipage explanations of physiological processes). Most of the texts also contained information about other aspects of botany, such as experimental methods and historical developments; these were omitted from the analysis. We manually &amp;quot;parsed&amp;quot; each passage into an informal language of structure, function, and process which is commonly found in the discourse literature; see Mann and Thompson (1987), McKeown (1985), Paris (1988), Souther et al. (1989), and Suthers (1988), for example. Our final step was to generalize the most commonly occurring patterns into abstractions that covered as many aspects of the passages as possible, which we then encoded in two Explanation Design Packages. While this work was essential for gaining insights about biological texts, it was a sketchy and preliminary effort to informally characterize their content and organization. A promising line of future work is to construct a large corpus of  Computational Linguistics Volume 23, Number 1 parsed discourse through a formal analysis. This will enable the natural language generation community to begin making inroads into producing discourse in the same manner that corpus-based techniques have aided discourse understanding efforts.</Paragraph>
      <Paragraph position="1"> The EDPs resulting from the analysis, Explain-Process and Explain-Object, can be used by an explanation planner to generate explanations about the processes and objects of physical systems. While these EDPS enable an explanation planner to generate quality explanations, we conjecture that employing a large library of specialized EDPS would produce explanations of higher quality. For the same reason that Kittredge, Korelsky, and Rambow (1991) note that domain-dependent discourse knowledge is critical for special purpose discourse generation, it appears that including EDPS specific to describing particular classes of biological processes (e.g., development and reproduction), would yield explanations whose content and organization better mirror that of explanations produced by domain experts. 8 Although we will not discuss the details of the EDPs here, it is instructive to examine their structure and function. The Explain-Process EDP (Figure 5) can be used by the explanation planner to generate explanations about the processes that physical objects engage in. For example, given a query about how a biological process such as embryo sac formation is carried out, the explanation planner can apply the Explain-Process EDP to construct an explanation plan that houses the content and organization of the explanation. The Explain-Process EDP has four primary topics:  * Process Overview: Explains how a process fits into a taxonomy, discusses the role played by its actors, and discusses where it occurs.</Paragraph>
      <Paragraph position="2"> * Process Details: Explains the steps of a process.</Paragraph>
      <Paragraph position="3"> * Temporal Attributes: Explains how a process is related temporally to other processes.</Paragraph>
      <Paragraph position="4"> * Output-Actor-Fates: Discusses how the &amp;quot;products&amp;quot; of a process are used by other processes.</Paragraph>
      <Paragraph position="5">  As computational linguists have known for many years, formally characterizing texts is a very difficult, time-consuming, and error-prone process. Because any initial discourse representation effort must, by necessity, be considered only a beginning, the next step was to incrementally revise the EDPs. The EDPs were used to automatically construct hundreds of explanations: the explanation planner used the EDPs to construct explanation plans, and the realization system translated these plans to natural language.</Paragraph>
      <Paragraph position="6"> The resulting explanations were presented to our domain expert, who critiqued both their content and organization, and we used these critiques to incrementally revise the EDPs. The majority of revisions involved the reorganization and removal of nodes in the EDPs. For example, the domain expert consistently preferred a different global organization than the one encoded in the original Explain-Process EDP. He also preferred explanations produced by a version of the Explain-Process EDP in which the information that had previously been associated with a Process Significance topic was associated with the Temporal Attributes topic. Moreover, he found that an Actor Elaborations node produced information that was &amp;quot;intrusive.&amp;quot; Some revisions involved 8 While we have not explored this hypothesis in the work described here, the EDP framework can be used to test it empirically.</Paragraph>
      <Paragraph position="7">  The final version of the Explain-Process explanation design.</Paragraph>
      <Paragraph position="8"> modifications to particular attributes of the nodes. For example, the inclusion condition on the original Output-Actor-Fates topic was TRUE. Instead, the domain expert preferred for explanations to include the content associated with this topic only when the process being described was a &amp;quot;conversion&amp;quot; process. After approximately twenty passes through the critiquing and revision phases, EDPs were devised that produced clear explanations meeting with the domain expert's approval.</Paragraph>
    </Section>
  </Section>
  <Section position="9" start_page="79" end_page="83" type="metho">
    <SectionTitle>
5. Planning Explanations
</SectionTitle>
    <Paragraph position="0"> Explanation planning is the task of determining the content and organization of explanations. We have designed an architecture for explanation generation and implemented a full-scale explanation generator, KNIGHT, 9 based upon this architecture.</Paragraph>
    <Paragraph position="1">  specification that comes in the form of a qualitative rating expressing the desired length of the explanation (Figure 6). The query interpreter--whose capabilities have been addressed only minimally in our work--translates the query to a canonical form, which is passed, along with the verbosity specification, to the explanation planner. Explanation planning is a synthetic task in which multiple resources are consulted to assemble data structures that specify the content and organization of explanations. KNIGHT's explanation planner uses the following resources: the Biology Knowledge Base, Explanation Design Packages, the KB-accessing system, and an overlay user model. 1deg The explanation planner invokes the EDP Selector, which chooses an Explanation Design Package from the EDP library. The explanation planner then applies the EDP by traversing its hierarchical structure. For each node in the EDP, the planner determines if it should construct a counterpart node in the explanation plan it is building. (Recall that the topic nodes and elaboration nodes of an EDP are instantiated only when their conditions are satisfied.) As the plan is constructed, the explanation planner updates the user model to reflect the contextual changes that will result from explaining the views in the explanation plan, attends to the verbosity specification, and invokes KB accessors to extract information from the knowledge base. Recall that the accessors return views, which are subgraphs of the knowledge base. The planner attaches the views to the explanation plan; they become the plan's leaves. Planning is complete when the explanation planner has traversed the entire EDP.</Paragraph>
    <Paragraph position="2"> The planner passes the resulting explanation plan to the realization component (Section 6) for translation to natural language. The views in the explanation plan are grouped into paragraph clusters. After some &amp;quot;semantic polishing&amp;quot; to improve the content for linguistic purposes, the realization component translates the views in the explanation plan to sentences. The realization system collects into a paragraph all of the sentences produced by the views in a particular paragraph cluster. Explanation generation terminates when the realization component has translated all of the views in the explanation plan to natural language.</Paragraph>
    <Section position="1" start_page="81" end_page="83" type="sub_section">
      <SectionTitle>
5.2 The Explanation-Planning Algorithms
</SectionTitle>
      <Paragraph position="0"> The EXPLAIN algorithm (Figure 7) is supplied with a query type (e.g., Describe-Process), a primary concept (e.g., embryo sac formation), and a verbosity specification (e.g., High).</Paragraph>
      <Paragraph position="1"> Its first step is to select an appropriate EDP. The EDP library has an indexing structure that maps a query type to the EDP that can be used to generate explanations for queries of that type. This indexing structure permits EDP selection to be reduced to a simple look-up operation. For example, given the query type Describe-Process, the EDP Selector will return the Explain-Process Explanation Design Package. The planner is now in a position to apply the selected EDP to the knowledge base. The APPLY EDP algorithm takes four arguments: the exposition node of the EDP that will be applied, a newly created exposition node, which will become the root of the explanation plan that will be constructed, the verbosity specification, and the loop variable bindings. 11 The planner first locates the root of the selected EDP, which is an exposition node.</Paragraph>
      <Paragraph position="2"> Next, it creates the corresponding exposition node for the soon-to-be-constructed explanation plan. It then invokes the APPLY EDP algorithm, which is given the exposition 10 As the planner constructs explanation plans, it consults an overlay user model (Carr and Goldstein 1977). KNIGHT's user-sensitive explanation generation is not addressed in this paper. For a discussion of this work, see Lester and Porter (1991b) and Lester (1994).</Paragraph>
      <Paragraph position="4"> The EXPLAIN algorithm.</Paragraph>
      <Paragraph position="5"> node of the EDP to be applied, the newly created exposition node that will become the root of the explanation plan, the verbosity, and a list of the loop variable bindings. 12 The APPLY EDP algorithm (Figure 8) and the algorithms it invokes traverse the hierarchical structure of the EDP to build an explanation plan. Its first action is to obtain the children of the EDP's exposition node; these are the topic nodes of the EDP. For each topic node, the EDP Applier constructs a new (corresponding) topic node for the evolving explanation plan. The Applier must then weigh several factors in its decision about whether to include the topic in the explanation: inclusion, which is the inclusion condition associated with the topic; centrality, which is the centrality rating that the discourse-knowledge engineer has assigned to the topic; and verbosity, which is the verbosity specification supplied by the user.</Paragraph>
      <Paragraph position="6"> If the inclusion condition evaluates to FALSE, the topic should be excluded regardless of the other two factors. Otherwise, the COMPUTE INCLUSION algorithm must consider the topic's importance and the amount of detail requested and will include the topic in the following circumstances: the verbosity is High; the verbosity is Low but the topic's centrality has been rated as High by the discourse-knowledge engineer; or the verbosity is Medium and the topic's centrality has been rated as Medium or High.</Paragraph>
      <Paragraph position="7"> When the COMPUTE INCLUSION algorithm returns TRUE, the Applier obtains the children of the EDP's topic. These are its content specification nodes. For each of the topic's content specification nodes, the Applier invokes the DETERMINE CONTENT algorithm, which itself invokes KB accessors named in the EDP's content specification nodes. This action extracts views from the knowledge base and attaches them to the explanation plan.</Paragraph>
      <Paragraph position="8"> To determine the content of the information associated with elaboration nodes, DETERMINE CONTENT invokes the APPLY EDP algorithm. Because it was the APPLY EDP algorithm that invoked DETERMINE CONTENT, this is a recursive call. In this invocation of APPLY EDP--as opposed to the &amp;quot;top-level&amp;quot; invocation by the EXPLAIN algorithm--APPLY EDP is given an elaboration node instead of a topic node. By recursively invoking APPLY EDP, DETERMINE CONTENT causes the planner to traverse the elaboration branches of a content node. The recursion bottoms out when the system encounters the leaves of the EDP, i.e., content specification nodes in the EDP that do not have elaborations.</Paragraph>
      <Paragraph position="9"> Rather than merely returning a flat list of views, the EXPLAIN algorithm examines the paragraph specifications in the nodes of the EDP it applied. The paragraph spec- null The order of the paragraph clusters controls the global structure of the final textual explanation; the order of the views in each paragraph cluster determines the order of sentences in the final text. 13 Finally, the EXPLAIN algorithm passes the paragraph clusters to the REALIZE algorithm, which translates them to natural language.</Paragraph>
    </Section>
  </Section>
  <Section position="10" start_page="83" end_page="93" type="metho">
    <SectionTitle>
6. Realization
</SectionTitle>
    <Paragraph position="0"> The explanation planner should be viewed as an automatic specification writer: its task is to write specifications for the realization component, which interprets the specifications to produce natural language. Although our work focuses on the design, construction, and evaluation of explanation planners, by constructing a full-scale natural language generator, it becomes possible to conduct a &amp;quot;pure&amp;quot; empirical evaluation of explanation planners. Without a realization component, the plans produced by an explanation planner would need to be manually translated to natural language, which would raise questions about the purity of the experiments. We therefore designed and implemented a full-scale realization component. 14 13 The realization algorithm treats these groupings as suggestions that may be overridden in extenuating circumstances. 14 During the past few years, we have developed a series of realization systems. The first realizer, which was designed and implemented by the first author, was a template-based generator. The second realizer, which was designed by Kathy Mitchell and the authors (Mitchell 1992), used the Penman (Mann 1983) surface generator. The third realizer (Callaway and Lester 1995) is described briefly in this section; it was developed by the first author and Charles Callaway.</Paragraph>
    <Paragraph position="2"> Figure 9 A functional description.</Paragraph>
    <Paragraph position="5"> Realization can be decomposed into two subtasks: functional realization, constructing functional descriptions from message specifications supplied by a planner; and surface generation, translating functional descriptions to text. Functional descriptions encode both semantic information (case assignments) and structural information (phrasal constituent embeddings). Syntactically, a functional description is a set of attribute and value pairs (a v) (collectively called a feature set), where a is an attribute (a feature) and v is either an atomic value or a nested feature set. 15 To illustrate, Figure 9 depicts a sample functional description. The first line, (cat clause), indicates that what follows will be some type of verbal phrase, in this case a sentence. The second line contains the keyword proc, which denotes that everything in its scope will describe the structure of the entire verbal phrase. The next structure comes under the heading partic; this is where the thematic roles of the clause are specified. In this instance, one thematic role exists in the main sentence, the agent (or subject), which is further defined by its lexical entry and a modifying prepositional phrase indicated by the keyword qualifier. The structure beginning with circum creates the subordinate infinitival purpose clause. It has two thematic roles, subject and object. The subject has a pointer to identify itself with the subject of the main clause while the object contains a typical noun phrase. The feature set for the circum clause indicates the wide range of possibilities for placement of the clause as well as for introducing additional phrasal substructures into the purpose clause.
15 Functional descriptions may also employ syntactic sugar for purposes of legibility.</Paragraph>
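    <Paragraph> From this description, the approximate shape of such a functional description can be reconstructed (an illustration only, not the original Figure 9; the role names and every value not named in the text are our guesses):
((cat clause)
 (proc ((type material) (lex "form")))         ; structure of the verbal phrase
 (partic ((agent ((lex "...")                  ; main-clause subject: lexical
                  (qualifier ((cat pp)))))))   ; entry plus a modifying PP
 (circum
  ((purpose                                    ; infinitival purpose clause
    ((cat clause)
     (partic ((agent ((semantics (partic agent semantics)))) ; pointer to the
              (affected ((cat np))))))))))     ; main subject; an NP object</Paragraph>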
    <Paragraph position="6"> To construct functional descriptions from views extracted from a knowledge base, KNIGHT employs a functional realization system (Callaway and Lester 1995). Given a 15 Functional descriptions may also employ syntactic sugar for purposes of legibility.</Paragraph>
    <Paragraph position="7">  Lester and Porter Robust Explanation Generators view, the functional realizer uses its knowledge of case mappings, syntax, and lexical information to construct a functional description, which it then passes to the FUF  surface generator. The functional realizer consists of five principal components:.</Paragraph>
    <Paragraph position="8"> * Lexicon: Physically distributed throughout the knowledge base; each concept frame has access to all of the lexical information relevant to its own realization.</Paragraph>
    <Paragraph position="9"> * Functional Description Skeleton Library: Contains a large number of Functional Description (FD) Skeletons, each of which encodes the associated syntactic, semantic, and role assignments for interpreting a specific type of message specification.</Paragraph>
    <Paragraph position="10"> * Functional Description Skeleton Retriever: Charged with the task of selecting the correct Functional Description Skeleton from the skeleton library.</Paragraph>
    <Paragraph position="11"> * Noun Phrase Generator: Responsible for drawing lexical information from  the lexicon to create a self-contained functional description representing each noun phrase required by the FD-Skeleton processor.</Paragraph>
    <Paragraph position="12"> * Functional Description Skeleton Processor: Gathers all of the available information from the FD-Skeleton, the lexicon, and the noun phrase generator; produces the final functional description.</Paragraph>
    <Paragraph position="13"> When the functional realizer is given a view, its first task is to determine the appropriate FD-Skeleton to use. Once this is accomplished, the FD-Skeleton is passed along with the message specification to the FD-Skeleton processor. The FD-Skeleton processor first determines if each of the essential descriptors is present; if any of these tests fail, it will note the deficiency and abort. If the message is well-formed, the FD-Skeleton processor passes each realizable concept unit found on the message specification to the noun phrase generator, which uses the lexicon to create a functional description representing each concept unit. The noun phrase generator then returns each functional description to the FD-Skeleton processor, which assigns case roles to the (sub)functional descriptions. The resulting functional description, which encodes the functional structure for the entire content of the message specification, is then passed to the surface realizer. Surface realization is accomplished by FUF (Elhadad 1992). Developed by Elhadad and his colleagues at Columbia, FUF is accompanied by an extensive, portable English grammar, which is &amp;quot;the result of five years of intensive experimentation in grammar writing&amp;quot; (p. 121) and is currently the largest &amp;quot;generation grammar&amp;quot; in existence (Elhadad 1992). Given a set of functional descriptions, FUF constructs the final text.</Paragraph>
    <Paragraph position="14"> 7. Example Behavior To illustrate the behavior of the system, consider the concept of embryo sac formation. The semantic network in the Biology Knowledge Base that represents information about embryo sac formation was shown in Figure 2. When KNIGHT is given the task of explaining this concept, 16 it applies the Explain-Process EDP as illustrated in Figure 5.  Computational Linguistics Volume 23, Number 1 its traversal of this tree, it begins with Process Overview, which has a High centrality rating and an inclusion condition of TRUE. KNIGHT executes the COMPUTE INCLUSION algorithm with the given verbosity of High, which returns TRUE, i.e., the information associated with the topic should be included.</Paragraph>
    <Paragraph position="15"> Hence, it now begins to traverse the children of this topic node, which are the As-Kind-Of-Process Description, Process Participants, and Location Description content specification nodes. For the As-Kind-Of Process Description, it computes a value for the local variable ?Reference-Concept, which returns the valuefemalegametophyteformation. It then instantiates the content specification template on As-Kind-Of Process Description, which it then evaluates. This results in a call to the As-Kind-Of KB accessor, which produces a view. The view produced in this execution will eventually be translated to the sentence, &amp;quot;Embryo sac formation is a kind of female gametophyte formation.&amp;quot; Similarly, KNIGHT instantiates the content specification expressions of Process Participants Description and Location Description, which also cause KB accessors to be invoked; these also return views. The first of these views will be used to produce the sentence, &amp;quot;During embryo sac formation, the embryo sac is formed from the megaspore mother cell,&amp;quot; and the second will produce the sentence, &amp;quot;Embryo sac formation occurs in the ovule.&amp;quot; Next KNIGHT visits the Location Partonomic-Connection node, which is an elaboration of Location Description. However, because its inclusion condition is not satisfied, this branch of the traversal halts.</Paragraph>
    <Paragraph position="16"> Next, KNIGHT visits each of the other topics of the Explain-Process exposition node: Output-Actor-Fates, Temporal Information and Process Details. When it visits the Output-Actor-Fates topic, the inclusion condition is not satisfied. Because it was given a High verbosity specification and the inclusion conditions are satisfied, both Temporal Information and Process Details are used to determine additional content. The view constructed from Temporal Information will produce the sentence, &amp;quot;Embryo sac formation is a step of angiosperm sexual reproduction,&amp;quot; and the Process Details will result in the generation of descriptions of the steps of embryo sac formation, namely, megasporogenesis and embryo sac generation. When the views in the resulting explanation plan (Figure 10) are translated to text by the realization system, KNIGHT produces the explanation shown in Figure 1.</Paragraph>
    <Paragraph position="17"> These algorithms have been used to generate explanations about hundreds of different concepts in the Biology Knowledge Base. For example, Section 2 shows other explanations generated by KNIGHT. The explanation of pollen tube growth was produced by applying the Explain-Process EDP, and the explanations of spore and root system were produced by applying the Explain-Object EDP.</Paragraph>
    <Paragraph position="18"> 8. Evaluation Traditionally, research projects in explanation generation have not included empirical evaluations. Conducting a formal study with a generator has posed difficulties for at least three reasons: the absence of large-scale knowledge bases; the problem of robustness; and the subjective nature of the task. First, the field of explanation generation has experienced a dearth of &amp;quot;raw materials.&amp;quot; The task of an explanation generator is three-fold: to extract information from a knowledge base, to organize this information, and to translate it to natural language. Unless an explanation generator has access to a sufficiently large knowledge base, the first step--and hence the second and third-cannot be carried out enough times to evaluate the system empirically. Unfortunately, because of the tremendous cost of construction, large-scale knowledge bases are scarce. Second, even if large-scale knowledge bases were more plentiful, an explanation generator cannot be evaluated unless it is sufficiently robust to produce many explana- null An explanation plan for embryo sac formation: High verbosity.</Paragraph>
    <Paragraph position="19"> tions. In very practical terms, a generator is likely to halt abruptly when it encounters unusual and unexpected knowledge structures; if this happens frequently, the system will generate too few explanations to enable a meaningful evaluation. We conjecture that most implemented explanation generators would meet with serious difficulties when applied to a large-scale knowledge base.</Paragraph>
    <Paragraph position="20"> Third, explanation generation is an ill-defined task. It stands in contrast to a machine learning task such as rule induction from examples. Although one can easily count the number of examples that an induction program classifies correctly, there is no corresponding objective metric for an explanation generator. Ideally, we would like to &amp;quot;measure&amp;quot; the coherence of explanations. Although it is clear that coherence is of paramount importance for explanation generation, there is no litmus test for it.</Paragraph>
    <Paragraph position="21"> Given these difficulties, how can one evaluate the architectures, algorithms, and knowledge structures that form the basis for an explanation generator? The traditional approach has been to conduct an analytical evaluation of a system's architecture and demonstrate that it can produce well-formed explanations on a few examples. While this evaluation technique is important, it is not sufficient. Three steps can be taken to promote better evaluation. First, we can construct large-scale knowledge bases, such as the Biology Knowledge Base. Second, we can design and implement robust explanation systems that employ a representation of discourse knowledge that is easily manipulable by discourse-knowledge engineers. Third, to ensure that a knowledge base is not tailored to the purposes of explanation generation, we can enter into a contractual agreement with knowledge engineers; this eliminates all requests for representational modifications that would skew the representation to the task of explanation generation.</Paragraph>
    <Section position="1" start_page="88" end_page="90" type="sub_section">
      <SectionTitle>
8.1 Experimental Design
</SectionTitle>
      <Paragraph position="0"> The Two-Panel evaluation methodology can be used to empirically evaluate natural language generation work. We developed this methodology, which involves two pan- null Computational Linguistics Volume 23, Number 1 els of domain experts, to combat the inherent subjectivity of NLG: although multiple judges will rarely reach a consensus, their collective opinion provides persuasive evidence about the quality of explanations. To ensure the integrity of the evaluation results, a central stipulation of the methodology is that the following condition be maintained throughout the study: Computer Blindness: None of the participants can be aware that some texts are machine-generated or, for that matter, that a computer is in any way involved in the study.</Paragraph>
      <Paragraph position="1"> The methodology involves four steps:  1. Generation of explanations by computer.</Paragraph>
      <Paragraph position="2"> 2. Formation of two panels of domain experts.</Paragraph>
      <Paragraph position="3"> 3. Generation of explanations by one panel of domain experts.</Paragraph>
      <Paragraph position="4"> 4. Evaluation of all explanations by the second panel of domain experts.  Each of these is discussed in turn.</Paragraph>
      <Paragraph position="5"> Explanation Generation: KNIGHT. Because KNIGHT's operation is initiated when a user poses a question, the first task was to select the questions it would be asked. To this end, we combed the Biology Knowledge Base for concepts that could furnish topics for questions. Although the knowledge base focuses on botanical anatomy, physiology, and development, it also contains a substantial amount of information about biological taxons. Because this latter area is significantly less developed, we ruled out concepts about taxons. In addition, we ruled out concepts that were too abstract (e.g., Object). We then requested KNIGHT to generate explanations about the 388 concepts that passed through these filters.</Paragraph>
      <Paragraph position="6"> To thoroughly exercise KNIGHT'S organizational abilities, we were most interested in observing its performance on longer explanations. Hence, we eliminated explanations of concepts that were sparsely represented in the knowledge base. To this end, we passed the 388 explanations through a &amp;quot;length filter&amp;quot;: explanations that consisted of at least 3 sentences were retained; shorter explanations were disposed of. 17 This produced 87 explanations, of which 48 described objects and 39 described processes. Finally, to test an equal number of objects and processes, we randomly chose 30 objects and 30 processes.</Paragraph>
      <Paragraph position="7"> Two Panels of Domain Experts. To address the difficult problem of subjectivity, we assembled 12 domain experts, all of whom were Ph.D. students or post-doctoral scientists in biology. Because we wanted to gauge KNIGHT's performance relative to humans, we assigned each of the experts to one of two panels: the Writing Panel and the Judging Panel. By securing the services of such a large number of domain experts, we were able to form relatively large panels of 4 writers and 8 judges (Figure 11). To promote high-quality human-generated explanations, we assigned the 4 most experienced experts to the Writing Panel. The remaining 8 experts were assigned to the Judging Panel to evaluate explanations.</Paragraph>
      <Paragraph position="8"> 17 A separate study would be to evaluate KNIGHT on very short (one-sentence and two-sentence) explanations. However, this study would be an evaluation of how it behaves in the face of highly incomplete knowledge rather than a fair head-to-head comparison with knowledgeable experts.</Paragraph>
    </Section>
    <Section position="2" start_page="90" end_page="91" type="sub_section">
      <SectionTitle>
Evaluations
</SectionTitle>
      <Paragraph position="0"> The Two-Panel methodology in the KNIGHT experiments.</Paragraph>
      <Paragraph position="1"> To minimize the effect of factors that might make it difficult for judges to compare KNIGHT's explanations with those of domain experts, we took three precautions. First, we attempted to control for the length of explanations. Although we could not impose hard constraints, we made suggestions about how long a typical explanation might be. Second, to make the &amp;quot;level&amp;quot; of the explanations comparable, we asked writers to compose explanations for a particular audience, freshman biology students. Third, so that the general topics of discussion would be comparable, we asl&lt;ed writers to focus on anatomy, physiology, and development.</Paragraph>
      <Paragraph position="2"> Explanation Generation: Humans. To ensure that the difficulty of the concepts assigned to the writers were the same as those assigned to KNIGHT, the writers were given the task of explaining exactly the same set of concepts that KNIGHT had explained. Because we wanted to give writers an opportunity to explain both objects and processes, each writer was given an approximately equal number of objects and processes. Each of the four writers was given 15 concepts to explain, and each concept was assigned to exactly one writer. We then transcribed their handwritten explanations and put them and KNIGHT'S explanations into an identical format. At this point, we had a pool of 120 explanations: 60 of these pertained to objects (30 written by biologists and 30 by  Computational Linguistics Volume 23, Number 1 KNIGHT), and the other 60 pertained to processes (also 30 written by biologists and 30 by KNIGHT).</Paragraph>
      <Paragraph position="3"> Explanation Evaluation. We then submitted the explanations to the panel of eight judges. The judges were not informed of the source of the explanations, and all of the explanations appeared in the same format. Each judge was given 15 explanations to evaluate. Judges were asked to rate the explanations on several dimensions: overall quality and coherence, content, organization, writing style, and correctness. To provide judges with a familiar rating scale, they were asked to assign letters grades (A, B, C, D, or F) to each explanation on each of the dimensions. Because carefully evaluating multiple dimensions of explanations is a labor-intensive task, time considerations required us to limit the number of explanations submitted to each judge. Hence, we assigned each judge a set of 15 explanations. (On average, each judge took an hour to evaluate 15 explanations.) We assigned explanations to judges using an allocation policy that obeyed the following four constraints:  * System-Human Division: Each judge received explanations that were approximately evenly divided between those that were produced by KNIGHT and those that were produced by biologists.</Paragraph>
      <Paragraph position="4"> * Object-Process Division: Each judge received explanations that were approximately evenly divided between objects and processes.</Paragraph>
      <Paragraph position="5"> * Single-Explanation Restriction: No judge received two explanations of the same concept. TM * Multijudge Stipulation: The explanations written by each writer were  parceled out to at least two judges, i.e., rather than having one judge evaluate one writer's explanations, that writer's explanations were distributed among multiple judges.</Paragraph>
      <Paragraph position="6"> It is important to emphasize again that the judges were not made aware of the purpose of the experiment, nor were they told that any of the explanations were computergenerated. null</Paragraph>
    </Section>
    <Section position="3" start_page="91" end_page="93" type="sub_section">
      <SectionTitle>
8.2 Results
</SectionTitle>
      <Paragraph position="0"> By the end of the study, we had amassed a large volume of data. To analyze it, we converted each of the &amp;quot;grades&amp;quot; to their traditional numerical counterparts, i.e., A = 4, B = 3, C = 2, D = 1, and F = 0. Next, we computed means and standard errors for both KNIGHT'S and the biologists' grades. We calculated these values for the overall quality and coherence rating, as well as for each of the dimensions of content, organization, writing style, and correctness. On the overall rating and on each of the dimensions, KNIGHT scored within approximately half a grade of the biologists (Table 3). 19 Given these results, we decided to investigate the differences between KNIGHT's grades and the biologists' grades. When we normalized the grades by defining an A to be the mean of the biologists' grades, KNIGHT earned approximately 3.5 (a B+). Comparing differences in dimensions, KNIGHT performed best on correctness and content, not quite as well on writing style, and least well on organization.</Paragraph>
      <Paragraph position="1"> 18 The purpose of this constraint is to promote immediate, nondeliberative reactions from the judges. An alternate study would consist of judges consciously analyzing pairs of explanations to perform an explicit comparative analysis.</Paragraph>
      <Paragraph position="2"> 19 In the tables, :k denotes the standard error, i.e., the standard deviation of the mean.</Paragraph>
      <Paragraph position="3">  Because the differences between KNIGHT and the biologists were narrow in some cases, we measured the statistical significance of these differences by running standard t-tests. 2deg KNIGHT's grades on the content; organization, and correctness dimensions did not differ significantly from the biologists' (Table 4). Of course, an insignificant difference does not indicate that KNIGHT'S performance and the biologists' performance was equivalent--an even larger sample size might have shown a significant difference-however, it serves as an indicator that KNIGHT'S performance approaches that of the biologists on these three dimensions.</Paragraph>
      <Paragraph position="4"> To gauge how well KNIGHT generates explanations about objects--as opposed to processes--we computed means and standard errors for both KNIGHT's explanations of objects and the biologists' explanations of objects. We did the same for the explanations of processes. For both objects and processes, KNIGHT scored within half a grade of the biologists. Again, we measured the statistical significance of these differences. Although there was a significant difference between KNIGHT and biologists on explanations of processes, KNIGHT and the biologists did not differ significantly on explanations of objects (Tables 5 and 6). A probable cause of this result lies in the domain: in biology, process explanations are often more complex than object explanations, therefore making process explanations more challenging to generate.</Paragraph>
      <Paragraph position="5"> As a final test, we compared KNIGHT to each of the individual writers. For a given writer, we assessed KNIGHT's performance relative to that writer: we compared the grades awarded to KNIGHT and the grades awarded to the writer on explanations generated in response to the same set of questions. This analysis produced some surprising results. Although there were substantial differences between KNIGHT and &amp;quot;Writer 1,&amp;quot; KNIGHT was somewhat closer to &amp;quot;Writer 2,&amp;quot; it was very close to &amp;quot;Writer 3,&amp;quot; and its performance actually exceeded that of &amp;quot;Writer 4.&amp;quot; KNIGHT and Writers 2, 3, and 4 did not differ significantly (Table 7).</Paragraph>
      <Paragraph position="6"> 20 All t-tests were unpaired, two-tailed. The results are reported for a 0.05 level of confidence.</Paragraph>
    </Section>
  </Section>
  <Section position="11" start_page="93" end_page="96" type="metho">
    <SectionTitle>
9. Related Work
</SectionTitle>
    <Paragraph position="0"> By synthesizing a broad range of research in natural language generation, KNIGHT provides a &amp;quot;start-to-finish&amp;quot; solution to the problem of automatically constructing expository explanations from semantically rich, large-scale knowledge bases. It introduces a new evaluation methodology and builds on the conceptual framework that has evolved in the NLG community over the past decade, particularly in techniques for knowledge-base access and discourse-knowledge representation. We discuss each of these in turn.</Paragraph>
    <Paragraph position="1"> Evaluation Methodologies. With regard to evaluation, KNIGHT is perhaps most closely related to five NLG projects that have been empirically evaluated: PAULINE (Hovy 1990), EDGE (Cawsey 1992), the Example Generator 21 (Mittal 1993), ANA (Kukich 1983), and 21 Mittal's system has no official name; we refer to it as &amp;quot;the Example Generator&amp;quot; for ease of reference.  Lester and Porter Robust Explanation Generators STREAK (Robin 1994). By varying pragmatic information such as tone, Hovy enabled PAULINE to generate many different paragraphs on the same topic. PAULINE'S texts were not formally analyzed by a panel of judges, and it did not produce texts on a wide range of topics (it generated texts on only three different events). However, this project is a significant achievement in terms of evaluation scale because of the sheer number of texts it produced: PAULINE generated more than 100 different paragraphs on the same subject. In a second landmark evaluation, Cawsey undertook a study in which subjects were allowed to interact with her explanation generation system, EDGE (Cawsey 1992). Subjects posed questions to EDGE about the operation of four circuits. Cawsey analyzed the system's behavior as the dialogues progressed, interviewed subjects, and used the results to revise the system. Although EDGE does not include a realization system (other than simple templates) and it was not subjected to a tightly controlled, formal evaluation, it was sufficiently robust to be used interactively by eight subjects.</Paragraph>
    <Paragraph position="2"> The EXAMPLE GENERATOR (Mittal 1993), ANA (Kukich 1983), and STREAK (Robin 1994) were each subjected to formal (quantitative) evaluations. Mittal developed and formally evaluated a generator that produced descriptions integrating text and examples. Rather than evaluating the explanations directly, subjects were given a quiz about the concept under consideration. 22 The degree to which the experiments controlled for specific factors, e.g., the effect of example positioning, example types, example complexity, and example order, is remarkable. ANA and STREAK were both subjected to quantitative, corpus-based evaluations. Kukich employed a corpus-based methodology to judge the coverage of ANA's knowledge structures. STREAK, which constructs summaries of basketball games, is part of a larger effort by J. Robin, K. McKeown, and their colleagues at Columbia and Bellcore to develop robust document generation systems (McKeown, Robin, and Kukich 1995). It was evaluated with a corpus-based study that produced estimates of STREAK's sublanguage coverage, extensibility, and the overall effectiveness of its revision-based generation techniques. Although neither of these studies employed human judges to critique text quality, the rigor with which they were conducted has significantly raised the standards for evaluating generation systems.</Paragraph>
    <Paragraph position="3"> The relationship between the KNIGHT evaluation and those of its predecessors is summarized in Table 8. KNIGHT, STREAK, and ANA were all evaluated formally, i.e., quantitatively, while PAULINE and EDGE were evaluated informally. The KNIGHT, EDGE, and EXAMPLE GENERATOR evaluations employed humans as judges, while the ANA and STREAK evaluations had &amp;quot;artificial judges&amp;quot; in the form of corpora, and PAULINE was evaluated without judges. KNIGHT is the only system to have been evaluated in the context of a semantically rich, large-scale knowledge base. KNIGHT is also the only system to have been evaluated in a kind of restricted Turing test in which the quality of its text was evaluated by humans in a head-to-head comparison against the text produced by humans (domain experts) in response to the same set of questions.</Paragraph>
    <Paragraph position="4"> Knowledge-Base Access. Several projects in explanation generation have exploited views to improve the quality of the explanations they provide. The ADVISOR system (McKeown, Wish, and Matthews 1985) represents views with a multiple-hierarchy knowledge base. ADVISOR infers a user's current goal from his or her most recent utterances and uses this goal to select a hierarchy from the multiple-hierarchy knowledge base. The  selected view controls the content of the explanation and the reasoning that produced that content. In a similar vein, viewpoints in Swartout's XPLAIN (Swartout 1983) are annotations that indicate when to include a piece of knowledge in an explanation.</Paragraph>
    <Paragraph position="5"> It is preferable to construct (i.e., extract) views at runtime rather than encoding them in a knowledge base. If a KB-accessing system could dynamically construct views, the discourse-knowledge engineer would be freed from the task of anticipating all queries and rhetorical situations and precompiling semantic units for each situation.</Paragraph>
    <Paragraph position="6"> KNIGHT, ROMPER (McCoy 1989), and Suthers' work (Suthers 1988, 1993) use these types of views to determine the content of their explanations. Once a perspective is selected, ROMPER includes in its explanations only those attributes whose salience values are the highest. In contrast to ROMPER'S views, which are domain specific, that is, confined to the domain of financial securities, Suthers' and KNIGHT'S views are domain independent. Suthers set forth a set of views which can be used to select coherent subsets of domain knowledge: structural, functional, causal, constraint, and process. He also developed a view retriever and a highly refined theory of explanation generation in which views play a significant role. KNIGHT'S views are very similar to McCoy's and Suthers' in that they define the relations and properties of a concept that are relevant when considering a concept from a viewpoint belonging to that view type (Acker 1992; Souther et al. 1989). They also provide four types of knowledge-base access robustness, as discussed in Section 3.</Paragraph>
    <Paragraph position="7"> Discourse Generation. Two principle mechanisms have been developed for generating discourse: schemata and top-down planners. 23 McKeown's pioneering work on schemata marks the beginning of the &amp;quot;modern era&amp;quot; of discourse generation (McKeown 1985). Schemata are ATN-like structures that represent naturally occurring patterns of discourse. For example, a schema for defining a concept includes instructions to identify its superclass, to name its parts, and to list its attributes. To construct an explanation plan, McKeown's TEXT system traverses the schemata and sequentially instantiates rhetorical predicates with propositions from a knowledge base. Paris extended schemata to generate descriptions of complex objects in a manner that is appropriate for the user's level of expertise (Paris 1988), and ROMPER's schemata include information about the content of propositions to be selected, as well as their communicative role. Although schemata have been criticized because they lack flexibility, they successfully capture many aspects of discourse structure.</Paragraph>
    <Paragraph position="8"> An alternative to schemata is the top-down planning approach (Cawsey 1992; 23 A third alternative, proposed by Sibun (1992), are short-range strategies that exploit relations such as spatial proximity to guide the generator through the domain knowledge. Though flexible, they do not account for extended explanations, which require a more global rhetorical structure.</Paragraph>
    <Paragraph position="9">  Lester and Porter Robust Explanation Generators Hovy 1993; Maybury 1992; Moore 1995; Moore and Paris 1993; Suthers 1991). 24 The operators of two of these planning systems are based on Rhetorical Structure Theory (RST) (Mann and Thompson 1987). Hovy's (1993) Structurer is a hierarchical planner whose operators instantiate relations from RST. The Reactive Planner also uses RSTlike operators. However, unlike all of the preceding research--and unlike KNIGHT as well--it offers sophisticated mechanisms for generating explanations in interactive contexts (Moore 1995; Moore and Paris 1993). Because the operators explicitly record the rhetorical effects achieved, and because the system records alternative operators it could have chosen, as well as assumptions it made about the user, the Reactive Planner can respond to follow-up questions--even if they are ambiguous--in a principled manner. A related approach has been taken by Cawsey in the EDGE system (Cawsey 1992). EDGE has facilities for managing conversations, so users may interrupt the system to ask questions, and EDGE can either answer the question immediately or postpone its response. Suthers (1991) has developed a sophisticated hybrid approach that includes planning techniques as well as plan critics, simulation models, reorganization methods, and graph traversal. By assembling these diverse mechanisms into a single architecture, he demonstrates how the complexities of explanation planning can be dealt with in a coherent framework. The principal advantage of top-down planners over schema-based generators is their ability to reason about the structure, content, and goals of explanations--as opposed to merely instantiating pre-existing plans embodied by schemata.</Paragraph>
    <Paragraph position="10"> KNIGHT'S EDPS are much more schema-like than plan-like. Although EDPs have inclusion conditions, which are similar to the constraint attribute of RST-based operators, and they provide a centrality attribute, which enables KNIGHT to reason about the inclusion of a topic if &amp;quot;space&amp;quot; is limited, EDPs do not in general permit KNIGHT to reason about the goals fulfilled by particular text segments as do plan-based systems. For example, if the expressions in an EDP'S inclusion condition are not satisfied, KNIGHT cannot create a plan to satisfy them. Moreover, although EDPs are effective for generating explanations, achieving other communicative goals, for example, Correct-Misconception, may be beyond their capabilities. Despite these drawbacks, EDPS have proven to be very useful as a discourse-knowledge engineering tool, a result that can be attributed in large part to their combining a frame-based representation with procedural constructs. In a sense, EDPs are schemata whose representation has been fine-tuned to maximize ease of use on a large scale.</Paragraph>
  </Section>
class="xml-element"></Paper>
Download Original XML