<?xml version="1.0" standalone="yes"?> <Paper uid="W90-0108"> <Title>Upper Modeling: organizing knowledge for natural language processing</Title> <Section position="1" start_page="0" end_page="55" type="abstr"> <SectionTitle> Abstract </SectionTitle>
<Paragraph position="0"> A general, reusable computational resource has been developed within the Penman text generation project for organizing domain knowledge appropriately for linguistic realization. This resource, called the upper model, provides a domain- and task-independent classification system that supports sophisticated natural language processing while significantly simplifying the interface between domain-specific knowledge and general linguistic resources. This paper presents the results of our experiences in designing and using the upper model in a variety of applications over the past 5 years. In particular, we present our conclusions concerning the appropriate organization of an upper model, its domain-independence, and the types of interrelationships that need to be supported between upper model and grammar and semantics.</Paragraph>
<Paragraph position="1"> Introduction: interfacing with a text generation system Consider the task of interfacing a domain-independent, reusable, general text generation system with a particular application domain, in order to allow that application to express system-internal information in one or more natural languages. Internal information needs to be related to strategies for expressing it. This could be done in a domain-specific way by coding how the application domain requires its information to appear.</Paragraph>
<Paragraph position="2"> This is clearly problematic, however: it requires detailed knowledge on the part of the system builder both of how the generator controls its output forms and of the kinds of information that the application domain contains. A more general solution to the interfacing problem is thus desirable.</Paragraph>
<Paragraph position="3"> We have found that the definition of a mapping between knowledge and its linguistic expression is facilitated if it is possible to classify any particular instances of facts, states of affairs, situations, etc. that occur in terms of a set of general objects and relations of specified types that behave systematically with respect to their possible linguistic realizations.</Paragraph>
<Paragraph position="4"> This approach has been followed within the PENMAN text generation system [Mann and Matthiessen, 1985; The Penman Project, 1989] where, over the past 5 years, we have been developing and using an extensive, domain- and task-independent organization of knowledge that supports natural language generation: this level of organization is called the upper model [Bateman et al., 1990; Mann, 1985; Moore and Arens, 1985]. The majority of natural language processing systems currently planned or under development now recognize the necessity of some level of abstract 'semantic' organization similar to the upper model that classifies knowledge so that it may be more readily expressed linguistically.1 However, they mostly suffer from either a lack of theoretical constraint concerning their internal contents and organization and the necessary mappings between them and surface realization, or a lack of abstraction which binds them too closely to linguistic form.
It is important both that the contents of such a level of abstraction be motivated on good theoretical grounds and that the mapping between that level and linguistic form be specifiable.</Paragraph>
<Paragraph position="5"> Our extensive experiences with the implementation and use of a level of semantic organization of this kind within the PENMAN system now permit us to state some clear design criteria and a well-developed set of necessary functionalities.</Paragraph>
<Paragraph position="6"> The Upper Model's Contribution to the Solution to the Interface Problem: Domain independence and reusability The upper model decomposes the mapping problem by establishing a level of linguistically motivated knowledge organization specifically constructed as a response to the task of constraining linguistic realizations;2 generally we refer to this level of organization as meaning rather than as knowledge in order to distinguish it from language-independent knowledge and to emphasize its tight connection with linguistic forms (cf. [Matthiessen, 1987:259-260]). While it may not be reasonable to insist that application domains organize their knowledge in terms that respect linguistic realizations -- as this may not provide suitable organizations for, e.g., domain-internal reasoning -- we have found that it is reasonable, indeed essential, that domain knowledge be so organized if it is also to support expression in natural language relying on general natural language processing capabilities.</Paragraph>
<Paragraph position="7"> 1 Including, for example: the Functional Sentence Structure of XTRA: [Allgayer et al., 1989]; [Chen and Cha, 1988]; [Dahlgren et al., 1989]; POLYGLOSS: [Emele et al., 1990]; certain of the Domain and Text Structure Objects of SPOKESMAN: [Meteer, 1989]; TRANSLATOR: [Nirenburg et al., 1987]; the Semantic Relations of EUROTRA-D: [Steiner et al., 1987]; JANUS: [Weischedel, 1989]. Space naturally precludes detailed comparisons here: see [Bateman, 1990] for further discussion.</Paragraph>
<Paragraph position="8"> The general types constructed within the upper model necessarily respect generalizations concerning how distinct semantic types can be realized. We then achieve the necessary link between particular domain knowledge and the upper model by having an application classify its knowledge organization in terms of the general semantic categories that the upper model provides. This does not require any expertise in grammar or in the mapping between upper model and grammar. An application needs only to concern itself with the 'meaning' of its own knowledge, and not with fine details of linguistic form. This classification functions solely as an interface between domain knowledge and upper model; it does not interfere with domain-internal organization. The text generation system is then responsible for realizing the semantic types of the level of meaning with appropriate grammatical forms.3 Further, when this classification has been established for a given application, application concepts can be used freely in input specifications since their possibilities for linguistic realization are then known.
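To make the classification step just described concrete, the following is a minimal sketch in Python: the upper-model category names object, directed-action, and size are taken from the text, while the table layout, the subsumes helper, and all other names are hypothetical simplifications of what is in practice a knowledge-representation taxonomy rather than the PENMAN implementation.

    # A toy fragment of an upper model (child category -> parent category)
    # and a domain's one-time classification of its own concepts under it.
    UPPER_MODEL = {
        "thing": None,
        "object": "thing",
        "process": "thing",
        "directed-action": "process",
        "quality": "thing",
        "size": "quality",
    }

    DOMAIN_CLASSIFICATION = {
        "cat": "object",
        "mouse": "object",
        "chase": "directed-action",
        "little": "size",
    }

    def subsumes(general, specific):
        # True if `general` is the same category as, or an ancestor of, `specific`.
        while specific is not None:
            if specific == general:
                return True
            specific = UPPER_MODEL[specific]
        return False

    def um_category(domain_concept):
        # The upper-model category under which a domain concept was classified.
        return DOMAIN_CLASSIFICATION[domain_concept]

    # The grammar interface consults only upper-model categories, never the
    # domain's own concepts, so the same realization machinery serves any domain.
    assert subsumes("process", um_category("chase"))   # may be realized as a clause
    assert subsumes("object", um_category("mouse"))    # may be realized as a nominal

On this picture, connecting a new application to the generator amounts to supplying a new classification table; the linguistic resources that consult the upper-model side of the mapping remain untouched.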
This arrangement supports two significant functionalities:
* interfacing with a natural language system is radically simplified, since much of the information specific to language processing is factored out of the required input specifications and into the relationship between upper model and linguistic resources;
* the need for domain-specific linguistic processing rules is greatly reduced, since the upper model provides a domain-independent, general and reusable conceptual organization that may be used to classify all domain-specific knowledge when linguistic processing is to be performed.</Paragraph>
<Paragraph position="9"> 2 Although my discussion here is oriented towards text generation, our current research aims at fully bi-directional linguistic resources [Kasper, 1988; Kasper, 1989]; the mapping is therefore to be understood as a bi-directional mapping throughout.</Paragraph>
<Paragraph position="10"> 3 This is handled in the PENMAN system by the grammar's inquiry semantics, which has been described and illustrated extensively elsewhere (e.g., [Bateman, 1988; Mann, 1983; Matthiessen, 1988]).</Paragraph>
<Paragraph position="11"> An example of the simplification that use of the upper model offers for a text generation system interface language can be seen by contrasting the input specification required for a generator such as MUMBLE-86 [Meteer et al., 1987] -- which employs realization classes considerably less abstract than those provided by the upper model -- with the input required for PENMAN.4 Figure 1 shows corresponding inputs for the generation of the simple clause: Fluffy is chasing little mice. The appropriate classification of domain knowledge concepts such as chase, cat, mouse, and little in terms of the general semantic types of the upper model (in this case, directed-action, object, object, and size respectively -- for definitions see [Bateman et al., 1990]) automatically provides information about syntactic realization that needs to be explicitly stated in the MUMBLE-86 input (e.g., S-V-O_two-explicit-args, np-common-noun, restrictive-modifier, adjective). Thus, for example, the classification of a concept mouse as an object in the upper model is sufficient for the grammar to consider a realization such as, in MUMBLE-86 terms, a general-np with a particular np-common-noun and accessories of gender neuter. Similarly, the classification of chase as a directed-action opens up linguistic realization possibilities including clauses with a certain class of transitive verbs and characteristic possibilities for participants, corresponding nominalizations, etc. Such low-level syntactic information is redundant for the PENMAN input; a rough sketch of the shape of such an upper-model-oriented input is given below.</Paragraph>
<Paragraph position="12"> The domain-independence of the upper model is further shown in the following example of text generation control. Consider two rather different domains: a navy database of ships and an expert system for digital circuit diagnosis.5 The navy database contains information concerning ships, submarines, ports, geographical regions, etc. and the kinds of activities that ships, submarines, etc. can take part in. The digital circuit diagnosis expert system contains information about sub-components of digital circuits, the kinds of connections between those sub-components, their possible functions, etc.
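Since Figure 1 is not reproduced in this extract, the following minimal sketch, written as nested Python data purely for illustration, suggests the rough shape an upper-model-oriented input for the Fluffy clause takes; the role names (actor, actee) and feature names are assumptions made for exposition and do not reproduce the actual input notation.

    # Illustrative only: the rough shape of an upper-model-oriented input for
    # "Fluffy is chasing little mice", written as nested Python data rather
    # than the notation of Figure 1.
    fluffy_clause = {
        "process": "chase",                 # classified as directed-action
        "tense": "present-progressive",
        "actor": {"concept": "cat", "name": "Fluffy"},    # object
        "actee": {
            "concept": "mouse",             # object
            "number": "plural",
            "qualities": [{"concept": "little"}],         # size
        },
    }

    # Nothing here names clause patterns, noun-phrase classes, or parts of
    # speech; those realization decisions follow from the upper-model
    # classification of chase, cat, mouse, and little.

A MUMBLE-86-style input, by contrast, must name the syntactic realization classes for each of these elements explicitly, as the examples above indicate.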
A typical sentence from each domain might be:
circuit domain: The faulty system is connected to the input
navy domain: The ship which was inoperative is sailing to Sasebo
The input specifications for both of these sentences are shown in Figure 2. These specifications freely intermix upper model roles and concepts (e.g., domain,
4 Note that this is not intended to single out MUMBLE-86: the problem is quite general; cf. unification-based frameworks such as [McKeown and Paris, 1987], or the Lexical Functional Grammar (LFG)-based approach of [Momma and Dörre, 1987]. As mentioned above, current developments within most such approaches are considering extensions similar to that covered by the upper model.
5 These are, in fact, two domains with which we have had experience generating texts using the upper model.</Paragraph> </Section></Paper>