<?xml version="1.0" standalone="yes"?>
<Paper uid="H89-1022">
  <Title>A Flexible Interface for Linking Applications to Penman's Sentence Generator</Title>
  <Section position="3" start_page="153" end_page="155" type="metho">
    <SectionTitle>
2 Basic Design of the Interface
</SectionTitle>
    <Paragraph position="0"> The principal components and sources of information used by Penman are shown in Figure 1. This diagram shows only the components necessary to generate one sentence at a time, although the same design has been augmented to generate coherent paragraphs. We first describe the components that are provided by Penman and that make up its knowledge of the (English) language. We then describe the components whose contents depend on the application.</Paragraph>
    <Section position="1" start_page="153" end_page="153" type="sub_section">
      <SectionTitle>
2.1 Resources Provided by Penman
</SectionTitle>
      <Paragraph position="0"> Grammar: Penman's sentence generator is based on Nigel (Mann &amp; Matthiessen 83), a large systemic-functional grammar of English based primarily on the work of Halliday (Halliday 85). Nigel is a network of interdependent points of minimal grammatical contrast, called systems. Each system defines a collection of alternatives, called grammatical features, that specify a particular aspect of a sentence (e.g., active or passive). The semantics of the Nigel grammar is defined by a set of inquiries that control choices of grammatical features by acquiring information from the knowledge sources in Penman's operating environment.</Paragraph>
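To make the relation between systems, grammatical features and inquiries concrete, here is a minimal Python sketch. It assumes a drastically simplified network in which each system has an optional entry condition (a feature that must already have been chosen), a list of alternative features, and an inquiry function that selects one alternative by consulting a knowledge source. The system names, feature names and inquiries are hypothetical and are not taken from Nigel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical knowledge source: facts that an inquiry may consult.
KnowledgeSource = Dict[str, object]


@dataclass
class System:
    """A point of minimal grammatical contrast (greatly simplified)."""
    name: str
    entry_feature: Optional[str]               # feature that must already be chosen; None = always entered
    features: List[str]                        # the alternatives this system offers
    inquiry: Callable[[KnowledgeSource], str]  # selects one alternative


def traverse(systems: List[System], ks: KnowledgeSource) -> List[str]:
    """Enter each system whose entry condition holds and record the chosen feature."""
    chosen: List[str] = []
    for system in systems:  # assumes systems are listed in dependency order
        if system.entry_feature is not None and system.entry_feature not in chosen:
            continue
        feature = system.inquiry(ks)
        if feature not in system.features:
            raise ValueError(f"{system.name}: inquiry returned unknown feature {feature!r}")
        chosen.append(feature)
    return chosen


# Hypothetical fragment: MOOD is always entered; VOICE only for declaratives.
NETWORK = [
    System("MOOD", None, ["declarative", "interrogative"],
           lambda ks: "interrogative" if ks.get("question") else "declarative"),
    System("VOICE", "declarative", ["active", "passive"],
           lambda ks: "passive" if ks.get("agent_unknown") else "active"),
]

print(traverse(NETWORK, {"agent_unknown": True}))  # ['declarative', 'passive']
```

Real systemic networks allow conjunctive and disjunctive entry conditions and contain far more systems; the linear traversal here only illustrates how inquiries, rather than the grammar itself, supply the information that decides each choice.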
      <Paragraph position="1"> Basic Lexicon: The basic lexicon provided by Penman contains the definitions of almost all closed-class words (e.g., prepositions, auxiliary verbs) and of other frequently used words (approximately 900 root words). The organization of Penman's lexicon is described in (Cumming &amp; Albano 86).</Paragraph>
      <Paragraph position="2"> Upper Model: Penman must be able to treat the symbols given in demands for text appropriately (for example, actions are typically expressed as verbs and objects as nouns). Therefore these symbols must have some definition in terms of Penman's taxonomy of knowledge. This taxonomy is called the upper model, and it contains abstract categories that reflect grammatical distinctions made in English. The upper model is maintained as a property-inheritance network, using the LOOM knowledge representation system (MacGregor &amp; Bates 87); it has also been encoded in several other frameworks.</Paragraph>
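Property inheritance in such a taxonomy can be sketched in a few lines of Python. The concepts Thing, Process and Object, the domain concept ship, and the typical-realization property are hypothetical stand-ins; only Material-Process and enroute appear elsewhere in this paper. The sketch shows how a domain concept placed under an upper model concept inherits what is known about that concept.

```python
from typing import Dict, Iterator, Optional

# Hypothetical taxonomy fragment: concept -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Process": "Thing",
    "Material-Process": "Process",
    "Object": "Thing",
    # domain model concepts subordinated to upper model concepts
    "enroute": "Material-Process",
    "ship": "Object",
}

# Properties stated locally on a concept; descendants inherit them.
LOCAL_PROPS: Dict[str, Dict[str, str]] = {
    "Process": {"typical-realization": "verb"},
    "Object": {"typical-realization": "noun"},
}


def ancestors(concept: str) -> Iterator[str]:
    """Yield the concept itself, then each more general concept up to the root."""
    current: Optional[str] = concept
    while current is not None:
        yield current
        current = PARENT[current]


def lookup(concept: str, prop: str) -> Optional[str]:
    """Return the most specific value of prop found along the inheritance chain."""
    for c in ancestors(concept):
        if prop in LOCAL_PROPS.get(c, {}):
            return LOCAL_PROPS[c][prop]
    return None


def specializes(concept: str, general: str) -> bool:
    """True if general is the concept itself or one of its ancestors."""
    return general in ancestors(concept)


print(lookup("enroute", "typical-realization"))    # verb
print(lookup("ship", "typical-realization"))       # noun
print(specializes("enroute", "Material-Process"))  # True
```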
    </Section>
    <Section position="2" start_page="153" end_page="155" type="sub_section">
      <SectionTitle>
2.2 Resources Provided by Specific Applications
</SectionTitle>
      <Paragraph position="0"> These are the preparatory knowledge sources that identify those expressive options in English vocabulary and grammar that can be used to represent information from a particular application domain.</Paragraph>
      <Paragraph position="1"> * Lexical Item Definitions: The lexicon should contain definitions of any words (with the exception of proper names) that will be generated from the vocabulary of the application domain. Penman provides a programmed tool (Cumming &amp; Albano 86) to help application developers define words with appropriate grammatical features so that they are under the full control of Penman's grammar. * Domain Model: The domain model is a taxonomy of knowledge that is specific to the application domain. In general, the categories of the domain model are more specific than those of the upper model. When a category of some application domain is defined to be more specific than an upper model category, Penman is able to make inferences concerning how the application category might be expressed in English. Most applications that are sophisticated enough to need a text generator require such a model as a natural part of their work, either explicitly or implicitly (e.g., the field types and the relations among them in relational databases).</Paragraph>
      <Paragraph position="2">  - Links to the upper model: All domain model concepts and relations used in demands for text by the application must specialize some concept or relation in Penman's upper model.</Paragraph>
      <Paragraph position="3"> - Links to the lexicon: Any number of lexical items may be associated with a domain model concept, and these items will be used by Penman when they have features that do not conflict with other constraints imposed by the sentence plan or grammar. It is not necessary to provide lexical associations for every concept of the domain model, but concepts without any lexical associations can only be expressed using words that are associated with more general concepts.</Paragraph>
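The fallback in the last point can be sketched as a search that starts at a concept and moves toward more general concepts until it finds a lexical item whose features do not conflict with the current constraints. The hierarchy (vessel, frigate), the lexical entries and the flat attribute-value encoding of features are hypothetical; Penman's lexicon and grammar are considerably richer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical concept hierarchy: concept -> more general concept.
PARENT: Dict[str, Optional[str]] = {
    "Object": None,
    "vessel": "Object",
    "frigate": "vessel",
}

# Hypothetical lexical links: concept -> list of (word, features).
LEXICON: Dict[str, List[Tuple[str, Dict[str, str]]]] = {
    "vessel": [("vessel", {"number": "singular"}), ("vessels", {"number": "plural"})],
    "Object": [("thing", {"number": "singular"})],
}


def conflicts(features: Dict[str, str], constraints: Dict[str, str]) -> bool:
    """A word conflicts if a constraint names one of its attributes with another value."""
    return any(k in constraints and constraints[k] != v for k, v in features.items())


def choose_word(concept: str, constraints: Dict[str, str]) -> Optional[str]:
    """Prefer words linked to the concept itself, then to ever more general concepts."""
    current: Optional[str] = concept
    while current is not None:
        for word, features in LEXICON.get(current, []):
            if not conflicts(features, constraints):
                return word
        current = PARENT[current]
    return None  # no usable word anywhere up the hierarchy


print(choose_word("frigate", {"number": "plural"}))  # vessels
```

Because no word is linked to frigate, the search falls back to the words linked to vessel and skips the singular form that conflicts with the plural constraint.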
      <Paragraph position="4"> 3 Experience with the Basic Design
Penman was used during 1988 within several projects at the Information Sciences Institute as an experimental English text generator in the following application domains:
* Navy Pacific Fleet Briefing (CINCPAC): used by the Integrated Interfaces project (Arens et al. 88) to report facts from a database about the positions and activities of ships;
* Digital Circuit Diagnosis (DCD): used by the Explainable Expert Systems project (Swartout 83) to describe electronic components and actions suggested by an expert system to repair them;
* Program Enhancement Advisor (PEA): used by the Explainable Expert Systems project to describe potential improvements to computer programs;
* German-English Machine Translation: used in cooperation with the Eurotra-D project (Bateman et al. 89) to produce English translations of German text.</Paragraph>
      <Paragraph position="5"> These different applications demonstrated that Penman provides the basis for an adaptable general-purpose text generation capability. They also indicated several ways that Penman's interface with applications could be improved. In a case study of linking the DCD application to Penman, it was found that substantial effort was spent on two tasks: 1. subordinating the application domain model to Penman's upper model; 2. constructing input specifications for Penman's sentence generator.</Paragraph>
      <Paragraph position="6"> The first task was difficult mostly because application programmers are likely to be unfamiliar with the upper model. To overcome this difficulty, descriptions of the upper model hierarchy have been encoded in an upper model construction tool. This tool traverses the upper model hierarchy, under the guidance of the application programmer, to place domain model concepts under appropriate upper model concepts. The second task was difficult because Penman's former input notation (see Sondheimer &amp; Nebel 86) proved to be cumbersome, requiring all information to be stated in a form similar to the predicate calculus. Although it provided a strong foundation for formal reasoning, the predicate calculus style of notation was relatively inflexible. Some linguistic constraints could not be stated directly, and other kinds of information had to be stated redundantly.</Paragraph>
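The guided placement performed by the upper model construction tool can be sketched as a top-down descent in which a callback stands in for the application programmer's answers. The hierarchy fragment, the place function and the scripted chooser are hypothetical; the sketch illustrates the traversal only and is not the tool itself.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical upper model fragment: concept -> list of more specific concepts.
CHILDREN: Dict[str, List[str]] = {
    "Thing": ["Process", "Object"],
    "Process": ["Material-Process", "Mental-Process"],
    "Object": [],
    "Material-Process": [],
    "Mental-Process": [],
}

# The "programmer": given a domain concept and candidate children, pick one, or None to stop.
Chooser = Callable[[str, List[str]], Optional[str]]


def place(domain_concept: str, choose: Chooser, root: str = "Thing") -> str:
    """Descend from the root while the chooser selects a more specific concept."""
    current = root
    while CHILDREN[current]:
        pick = choose(domain_concept, CHILDREN[current])
        if pick is None:
            break  # the programmer judges the current concept specific enough
        if pick not in CHILDREN[current]:
            raise ValueError(f"{pick!r} is not a child of {current!r}")
        current = pick
    return current


def scripted(replies: List[Optional[str]]) -> Chooser:
    """Build a chooser that returns pre-recorded replies in order (stands in for a dialogue)."""
    it = iter(replies)
    return lambda concept, options: next(it, None)


print(place("enroute", scripted(["Process", "Material-Process"])))  # Material-Process
```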
      <Paragraph position="7"> Hence we developed SPL, a new interface notation with the following characteristics:  * constraints can be stated at multiple levels of abstraction: both propositional content and linguistic features can be selectively controlled; * more information can be predefined, including linguistic features that rarely vary in the application domain (e.g., default to present tense), and frequently used clusters of information (e.g., how to refer to an object by a proper name); * constraints can be separately specified on different occurrences of an entity.</Paragraph>
    </Section>
  </Section>
  <Section position="4" start_page="155" end_page="157" type="metho">
    <SectionTitle>
4 SPL: A New Sentence Plan Notation
</SectionTitle>
    <Paragraph position="0"> SPL representations are lists of terms describing the types of entities and the particular attributes of those entities to be expressed in English. The attributes of SPL terms provide control at several levels of abstraction. At the two most basic levels, attributes may specify semantic relations to be expressed from the application's knowledge base, or they may directly specify responses to Penman's inquiries, which determine grammatical features of sentences.</Paragraph>
    <Paragraph position="1"> The syntax of the SPL notation is defined in Figure 2, using BNF-style productions (note that A* denotes zero or more occurrences of A, and A+ denotes one or more occurrences of A). The SPL notation is formally similar to the typed feature logic developed by Smolka (Smolka 88) in a similar context of sharing information between language processing and knowledge representation systems. The formal properties of this kind of notation provide a sound method for merging partial descriptions into more complete descriptions, which makes it straightforward to merge predefined information with the information contained in the specification for a particular sentence.</Paragraph>
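Merging partial descriptions can be illustrated with a simplified unification of nested attribute-value structures. This is a deliberate simplification, not Smolka's feature logic: it ignores sorts (types) and variable sharing and simply fails when two descriptions give different atomic values to the same attribute. The attribute names are hypothetical.

```python
from typing import Any, Dict

Description = Dict[str, Any]  # attribute -> atomic value or nested description


class MergeConflict(Exception):
    """Raised when two descriptions assign incompatible values to one attribute."""


def merge(a: Description, b: Description) -> Description:
    """Combine two partial descriptions; shared attributes must be compatible."""
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            result[key] = b_val
        elif isinstance(result[key], dict) and isinstance(b_val, dict):
            result[key] = merge(result[key], b_val)  # recurse into sub-descriptions
        elif result[key] != b_val:
            raise MergeConflict(f"{key}: {result[key]!r} vs {b_val!r}")
    return result


predefined = {"tense": "present", "polarity": "positive"}
sentence = {"type": "enroute", "actor": {"type": "ship"}, "tense": "present"}
print(merge(predefined, sentence))
```

Merging a description that specifies tense past with one that specifies tense present raises MergeConflict, which is the kind of inconsistency a sound merging method has to detect.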
    <Section position="1" start_page="155" end_page="156" type="sub_section">
      <SectionTitle>
4.1 A Simple Example
</SectionTitle>
      <Paragraph position="0"> A simple example of the SPL notation is shown in Figure 3. The main term of this plan informs Penman that it should generate a sentence to express e1, which is a variable denoting an entity in the application's knowledge base. The type of e1 is enroute, the name of a concept in the domain model that specializes the Material-Process concept (i.e., a kind of action) of Penman's upper model. Penman uses this information about the type of e1 to choose an appropriate verb, be en route, for the sentence.</Paragraph>
      <Paragraph position="1"> The term describing e1 also contains five attributes. The interpretation of each attribute depends on the type of its keyword. The keywords :actor, :destination.r and :ebeg.r are the names of relations in the domain model. These three attributes inform Penman that the action e1 has an actor denoted by s2, a destination denoted by p3, and a time denoted by d4. :theme is a special keyword that may optionally be used to control thematization; in this case, its value specifies that the phrase referring to d4 should come at the front of the generated sentence. The keyword :tense is the name of a macro that expands the value past into a collection of attributes that specify responses to some of Penman's inquiries, as described below.</Paragraph>
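Figure 3 is not reproduced here, but the attribute classification just described can be sketched in Python. The sketch assumes a flat, Lisp-style rendering (variable / type :keyword value ...) of the main term, reconstructed only from the description above, and uses fixed lookup sets in place of the domain model and macro definitions; treating unknown keywords as inquiry responses is likewise an assumption.

```python
import re
from typing import List, Tuple

# Assumed classification tables; a real system would consult the domain model
# and the macro definitions instead of fixed sets.
DOMAIN_RELATIONS = {"actor", "destination.r", "ebeg.r"}
SPECIAL_KEYWORDS = {"theme"}
MACROS = {"tense"}


def parse_term(text: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Parse a flat term '(var / type :kw value ...)' whose values are atoms."""
    tokens = re.findall(r"[()/]|[^\s()/]+", text)
    ok = len(tokens) >= 5 and tokens[0] == "(" and tokens[2] == "/" and tokens[-1] == ")"
    if not ok:
        raise ValueError("expected (var / type :kw value ...)")
    var, typ, rest = tokens[1], tokens[3], tokens[4:-1]
    if len(rest) % 2:
        raise ValueError("every keyword needs a value")
    pairs: List[Tuple[str, str]] = []
    for i in range(0, len(rest), 2):
        if not rest[i].startswith(":"):
            raise ValueError(f"expected a keyword, got {rest[i]!r}")
        pairs.append((rest[i][1:], rest[i + 1]))
    return var, typ, pairs


def classify(keyword: str) -> str:
    """Decide how an attribute is interpreted from the kind of its keyword."""
    if keyword in DOMAIN_RELATIONS:
        return "relation"
    if keyword in SPECIAL_KEYWORDS:
        return "special"
    if keyword in MACROS:
        return "macro"
    return "inquiry-response"  # assumption: anything else names an inquiry


var, typ, attrs = parse_term(
    "(e1 / enroute :actor s2 :destination.r p3 :ebeg.r d4 :theme d4 :tense past)")
for kw, val in attrs:
    print(var, kw, val, classify(kw))
```

Each attribute is printed with its category: the three domain relations, :theme as a special keyword, and :tense as a macro.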
    </Section>
    <Section position="2" start_page="156" end_page="156" type="sub_section">
      <SectionTitle>
4.2 Macros
</SectionTitle>
      <Paragraph position="0"> The SPL notation provides macros to allow predefinition of frequently used clusters of information. Penman's grammar can control a large number of grammatical features when it builds sentences. In many cases, the grammar is capable of expressing far more delicate shades of meaning than a particular application may require. Rather than require that the application repeatedly specify all the inquiry responses needed to generate some grammatical phenomenon, Penman allows macros to abbreviate the specification at the level of detail that the application supports.</Paragraph>
      <Paragraph position="1"> For example, to specify English tense in a fully general way, one must specify ordering relations among three times: the actual speaking time, the event time, and the time of reference with which the event is contrasted. Many applications do not require such delicate control of temporal relations; some distinguish only between present and past. For this case, we define a macro called :tense that takes the values present or past and expands them into the appropriate inquiry responses. Penman provides a predefined package of common macro keywords, such as the macro for tense described above. It also provides functions for creating new macros that an application can use to customize its interface to Penman.</Paragraph>
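A macro such as :tense can be sketched as a table from each permitted value to a cluster of inquiry responses, together with a function for defining new macros. The inquiry names (event-precedes-speaking-q, reference-precedes-speaking-q), their values and define_macro are hypothetical placeholders, not Penman's actual inquiries or functions; they only mirror the idea of fixing the orderings among speaking time, event time and reference time.

```python
from typing import Dict, List, Tuple

# macro keyword -> value -> inquiry responses it abbreviates (hypothetical names)
MACRO_TABLE: Dict[str, Dict[str, Dict[str, str]]] = {}


def define_macro(keyword: str, expansions: Dict[str, Dict[str, str]]) -> None:
    """Register (or redefine) a macro keyword and its per-value expansions."""
    MACRO_TABLE[keyword] = expansions


def expand(attributes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace each macro attribute by the inquiry responses it stands for."""
    out: List[Tuple[str, str]] = []
    for keyword, value in attributes:
        if keyword in MACRO_TABLE:
            try:
                out.extend(MACRO_TABLE[keyword][value].items())
            except KeyError:
                raise ValueError(f":{keyword} does not accept {value!r}") from None
        else:
            out.append((keyword, value))  # not a macro: keep the attribute as given
    return out


# Only two tenses are needed, so each value fixes the time orderings outright.
define_macro("tense", {
    "present": {"event-precedes-speaking-q": "notprecede",
                "reference-precedes-speaking-q": "notprecede"},
    "past": {"event-precedes-speaking-q": "precede",
             "reference-precedes-speaking-q": "precede"},
})

print(expand([("actor", "s2"), ("tense", "past")]))
```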
    </Section>
    <Section position="3" start_page="156" end_page="156" type="sub_section">
      <SectionTitle>
4.3 Defaults
</SectionTitle>
      <Paragraph position="0"> Often it is useful to be able to predefine features of sentences that do not change frequently within an application domain. To enable this, Penman provides a facility for defining default values for any of the inquiries that it uses to obtain information from an application. Many of Penman's inquiries come supplied with initial default values that will be used unless specific information in a sentence plan overrides them.</Paragraph>
      <Paragraph position="1"> For example, consider the sentence plan given in Figure 3. It does not specify whether the sentence should be a statement, a question, or a command, nor does it specify whether the sentence should express positive or negative polarity. Penman's predefined default values provide the inquiry responses needed to generate a statement with positive polarity. In addition to the initial defaults supplied by Penman, functions are provided that enable the application to dynamically define new default values in packages called default environments. Default environments are maintained in a stack-like memory, with the Penman-supplied defaults at the bottom. This organization makes it possible for an application to temporarily change default settings for a particular portion of a text, and then return to the default environment that was previously in effect.</Paragraph>
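The stack of default environments can be sketched as follows. DefaultStack and its methods are hypothetical names, as are the inquiry names in the usage lines; lookup searches from the most recently pushed environment down to the initial defaults at the bottom.

```python
from typing import Dict, List, Optional


class DefaultStack:
    """Default environments; the bottom one holds the system-supplied defaults."""

    def __init__(self, initial: Dict[str, str]):
        self._envs: List[Dict[str, str]] = [dict(initial)]

    def push(self, env: Dict[str, str]) -> None:
        """Start a new environment, e.g. for one portion of a text."""
        self._envs.append(dict(env))

    def pop(self) -> Dict[str, str]:
        """Return to the environment that was previously in effect."""
        if len(self._envs) == 1:
            raise RuntimeError("cannot pop the initial defaults")
        return self._envs.pop()

    def lookup(self, inquiry: str) -> Optional[str]:
        """The most recently pushed environment wins; fall back toward the bottom."""
        for env in reversed(self._envs):
            if inquiry in env:
                return env[inquiry]
        return None


defaults = DefaultStack({"speech-act-q": "statement", "polarity-q": "positive"})
defaults.push({"polarity-q": "negative"})  # temporary change for one passage
print(defaults.lookup("polarity-q"), defaults.lookup("speech-act-q"))  # negative statement
defaults.pop()
print(defaults.lookup("polarity-q"))  # positive
```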
    </Section>
    <Section position="4" start_page="156" end_page="157" type="sub_section">
      <SectionTitle>
4.4 Interpretation of Sentence Plans
</SectionTitle>
      <Paragraph position="0"> A sentence plan in the SPL notation is interpreted in two phases. First, the plan is pre-processed and transformed into an internal representation. This pre-processing step includes expansion of macros, distribution of type information to variable terms, and a check of the consistency of terms. The first term of the plan is identified to the sentence generator as the initial unit of information to be expressed (usually as the main clause of a sentence). Then, Penman invokes its sentence generator to produce a sentence according to the expanded plan.</Paragraph>
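The pre-processing phase can be sketched in Python over a simple list of terms. The term representation (variable, optional type, attribute dictionary), the one-entry macro table and the inquiry names are assumptions, and "distribution of type information" is read here as copying a variable's declared type to other terms for that variable that do not restate it.

```python
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, Optional[str], Dict[str, str]]  # (variable, type or None, attributes)

# Hypothetical macro table: keyword -> value -> inquiry responses.
MACROS = {"tense": {"past": {"event-precedes-speaking-q": "precede"}}}


def preprocess(plan: List[Term]) -> List[Term]:
    # 1. Expand macros into the inquiry responses they abbreviate.
    expanded: List[Term] = []
    for var, typ, attrs in plan:
        new_attrs: Dict[str, str] = {}
        for kw, val in attrs.items():
            new_attrs.update(MACROS[kw][val] if kw in MACROS else {kw: val})
        expanded.append((var, typ, new_attrs))

    # 2. Consistency check while collecting declared types:
    #    a variable may not be declared with two different types.
    types: Dict[str, str] = {}
    for var, typ, _ in expanded:
        if typ is None:
            continue
        if var in types and types[var] != typ:
            raise ValueError(f"{var} declared as both {types[var]} and {typ}")
        types[var] = typ

    # 3. Distribute types to terms that mention a variable without a type.
    return [(var, typ or types.get(var), attrs) for var, typ, attrs in expanded]


plan = [("e1", "enroute", {"actor": "s2", "tense": "past"}),
        ("s2", "ship", {}),
        ("s2", None, {"number": "singular"})]  # a second, untyped occurrence of s2
result = preprocess(plan)
print(result[0])
```

The first term of the result, e1, is the one that would be handed to the sentence generator as the initial unit.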
      <Paragraph position="1"> Penman's sentence generator uses a series of inquiries to the sentence plan and other knowledge sources in order to guide the generation process. Inquiries may obtain answers from three sources, tried in the following sequence: 1. SPL keyword: The sentence plan is searched for a keyword that matches the name (and, optionally, the parameters) of the inquiry, and the corresponding value is returned.</Paragraph>
      <Paragraph position="2"> 2. knowledge sources: Each inquiry may have an executable (i.e., Lisp) function associated with it, called an inquiry implementation, which searches knowledge sources for appropriate information. Inquiry implementations generally obtain information from the domain and upper models about the type or relational attributes of SPL terms.</Paragraph>
      <Paragraph position="3"> 3. active default value: When the inquiry implementation returns an undefined value for the inquiry, or when the inquiry has no implementation, the currently active default value for the inquiry is used. In general, the attributes in an SPL specification that correspond directly to linguistic distinctions, such as inquiry responses, take precedence over other attributes, such as relations from the domain knowledge base. In addition, all attributes contained in the SPL specification for a particular sentence take precedence over any default values that have been defined for an inquiry.</Paragraph>
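The answering sequence and the precedence it implies can be sketched as a single lookup function. Representing the plan's inquiry responses as a dictionary, implementations as zero-argument functions, and an undefined answer as None are assumptions, and matching on inquiry parameters is omitted; the inquiry name polarity-q is hypothetical.

```python
from typing import Callable, Dict, Optional

Implementation = Callable[[], Optional[str]]  # returns None for an undefined answer


def answer_inquiry(name: str,
                   plan_keywords: Dict[str, str],
                   implementations: Dict[str, Implementation],
                   active_defaults: Dict[str, str]) -> Optional[str]:
    # 1. SPL keyword: an explicit response in the sentence plan wins.
    if name in plan_keywords:
        return plan_keywords[name]
    # 2. Knowledge sources: run the inquiry implementation, if there is one.
    impl = implementations.get(name)
    if impl is not None:
        result = impl()
        if result is not None:
            return result
    # 3. Active default value: used when there is no implementation or it is undefined.
    return active_defaults.get(name)


impls = {"polarity-q": lambda: None}  # hypothetical: finds nothing in the knowledge base
defaults = {"polarity-q": "positive"}
print(answer_inquiry("polarity-q", {}, impls, defaults))                          # positive
print(answer_inquiry("polarity-q", {"polarity-q": "negative"}, impls, defaults))  # negative
```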
    </Section>
  </Section>
</Paper>