<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0415"> <Title>Denotation I t~not~tion I Denotation E co,n,\[ x Connotatiot~ Connotatior~ Colmolation I Partial / Partial \[ Partial C SemSpec SemSpec I SemSpec \[ 0 Alternation~ Altffnafio~ Alternations N MorphSyn |MorphSynt I MorphSynt Generation</Title> <Section position="4" start_page="141" end_page="141" type="metho"> <SectionTitle> 2 Two-step sentence generation </SectionTitle> <Paragraph position="0"> The MOOSE sentence generator grew out of experience with building the TECHDOC system \[Rösner, Stede 1994\], which produces instructional text in multiple languages from a common representation. Specifically, MOOSE accounts for the fact that events can receive different verbalizations even in closely related languages such as English and German. It is designed as a sentence generation module that pays attention to language-specific lexical idiosyncrasies and that can be incorporated into a larger-scale text generator.</Paragraph> <Section position="1" start_page="141" end_page="141" type="sub_section"> <SectionTitle> 2.1 MOOSE in a nutshell </SectionTitle> <Paragraph position="0"> Figure 1 gives an overview of the system architecture. The generator assumes a language-neutral level of event representation, the situation specification or SitSpec. Using parts of the target lexicon (see section 2.3), the lexical options for verbalizing the SitSpec are determined. For verbs, the applicable alternations and extensions are computed (see section 4) and added to the set of options. Then a language-specific semantic specification, the SemSpec, is constructed in accordance with generation parameters pertaining to brevity and stylistic features. The SemSpec is then handed over to a surface generator: Penman \[Penman Group 1989\] for English, and a variant developed at FAW Ulm for German. 
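The control flow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MOOSE implementation: every function name, the dictionary-based SitSpec and lexicon, and the use of lexeme length as a stand-in for the brevity parameter are hypothetical.

```python
# Hypothetical sketch of the flow SitSpec -> lexical options -> SemSpec -> surface generator.
# All names and data layouts are illustrative assumptions, not MOOSE code.

def candidate_options(sitspec, lexicon):
    """Collect lexicon entries whose denoted concept occurs in the SitSpec."""
    concepts = set(sitspec["concepts"])
    return [entry for entry in lexicon if entry["denotes"] in concepts]


def expand_verbs(options):
    """For verbs, add one extra option per applicable alternation or extension."""
    expanded = list(options)
    for entry in options:
        if entry["pos"] == "verb":
            for alternation in entry.get("alternations", []):
                expanded.append(dict(entry, reading=alternation))
    return expanded


def build_semspec(options, language, brevity=True):
    """Pick one option per concept; 'brevity' is a toy stand-in for generation parameters."""
    by_concept = {}
    for entry in options:
        by_concept.setdefault(entry["denotes"], []).append(entry)
    lex = {}
    for concept, entries in by_concept.items():
        chosen = min(entries, key=lambda e: len(e["lexeme"])) if brevity else entries[0]
        lex[concept] = chosen["lexeme"]
    return {"language": language, "lex": lex}


# Surface generators named in the text; the strings are labels only.
SURFACE_GENERATORS = {"english": "Penman", "german": "German Penman variant"}


def generate(sitspec, lexicon, language):
    """Run the two steps and return the generator label plus the toy SemSpec."""
    options = expand_verbs(candidate_options(sitspec, lexicon))
    semspec = build_semspec(options, language)
    return SURFACE_GENERATORS[language], semspec


lexicon = [
    {"lexeme": "drain", "denotes": "drain", "pos": "verb", "alternations": ["causative"]},
    {"lexeme": "oil", "denotes": "oil", "pos": "noun"},
    {"lexeme": "engine", "denotes": "engine", "pos": "noun"},
    {"lexeme": "motor", "denotes": "engine", "pos": "noun"},
]
sitspec = {"concepts": ["drain", "oil", "engine"]}
generator, semspec = generate(sitspec, lexicon, "english")
print(generator, semspec)
```

The point of the sketch is the ordering of the stages: options are collected and extended first, and only then is a single language-specific structure committed to and passed on.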
As opposed to the 'traditional' Penman idea, the domain model in which the input SitSpec is represented has been de-coupled from the linguistic upper model, in order to achieve variety in verbalization that would otherwise not be possible \[Stede and Grote 1995\]. MOOSE is implemented in Macintosh Common Lisp and uses MacPenman; a full description of the system is given in \[Stede 1996\].</Paragraph> </Section> <Section position="2" start_page="141" end_page="141" type="sub_section"> <SectionTitle> 2.2 Levels of representation </SectionTitle> <Paragraph position="0"> A central assumption of the research reported here is that the &quot;deepest&quot; level of representation is in general not a linguistic representation; instead, we assume a domain model of some sort, implemented in a KL-ONE language.</Paragraph> <Paragraph position="1"> Thus, an explicit transition between instantiated domain knowledge and a language-specific semantic sentence representation is seen as the central step in generation.</Paragraph> <Paragraph position="2"> SitSpec A SitSpec is meant to be neutral between the target languages and between particular paraphrases. It is organized along a variant of the ontological categories proposed by Vendler \[1967\] and developed further, inter alia, by Bach \[1986\]. We have extended Bach's ontology by breaking up events so that their internal structure is explicitly represented (similar to Pustejovsky's \[1991\] proposal): An event is composed of a pre-state (holding before the event commences), a post-state (holding when the event is over), and an optional activity that brings the transition about. An event without such an activity is a mere state transition, e.g., The room lit up. 
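The three-part event structure can be made concrete with a small Python sketch. The class layout and the concept names ("light-state", "fill-state", "drain") are hypothetical and chosen only for illustration; the source gives its own representation in KL-ONE notation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class State:
    concept: str  # hypothetical concept name, e.g. "light-state"
    value: str    # hypothetical value, e.g. "off"


@dataclass
class Event:
    pre_state: State                # holds before the event commences
    post_state: State               # holds when the event is over
    activity: Optional[str] = None  # optional activity that brings the transition about


def is_mere_transition(event):
    """An event without an activity is a mere state transition."""
    return event.activity is None


# "The room lit up": a change of state with no activity recorded.
room_lit_up = Event(State("light-state", "off"), State("light-state", "on"))
# An event that does include an activity, such as oil draining from an engine.
oil_draining = Event(State("fill-state", "full"), State("fill-state", "empty"), activity="drain")
print(is_mere_transition(room_lit_up), is_mere_transition(oil_draining))  # True False
```

Making the activity an optional field mirrors the text: both kinds of event share the pre-state and post-state, and only the presence of an activity separates them.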
An event including an activity is a culmination; as an example, consider the event of oil draining from an engine, which is given here in an abbreviated KL-ONE notation (role names in capital letters, instance names in lower-case):</Paragraph> <Paragraph position="4"> Figure 2 shows the overall taxonomy of situation types. Subsumed by the general ontological system, a domain model is defined that holds the concepts relevant for representing situations in a technical sample domain and that specifies the exact conditions for the well-formedness of situations.</Paragraph> </Section> </Section> <Section position="6" start_page="142" end_page="143" type="metho"> <SectionTitle> SITUATION STATE ACTIVITY EVENT </SectionTitle> <Paragraph position="0"> The domain model is implemented in the KL-ONE language LOOM \[MacGregor, Bates 1987\].</Paragraph> <Paragraph position="1"> SemSpec The level of SemSpecs is motivated by the notion of &quot;upper modelling&quot; \[Bateman et al. 1990\] and is a subset of the input representation language that was developed for Penman, the sentence plan language (SPL) \[Kasper 1989\]. As opposed to a general SPL term, a SemSpec must contain only upper model (UM) concepts and no domain concepts--recall that the domain model in MOOSE is not subsumed by the upper model. Furthermore, since our system takes lexicalization as the decisive task in mapping a SitSpec to a SemSpec, the UM concepts referred to in a SemSpec must be annotated with :lex expressions; thus, a SemSpec is a lexicalized structure. 
Accordingly, we see the upper model as a taxonomy of lexical classes.</Paragraph> <Paragraph position="2"> SemSpecs are constructed from SitSpecs by selecting a UM-process and mapping SitSpec elements to participant roles of that process, so that all elements of the SitSpec are covered. This choice of process and participants in effect establishes a perspective on the situation; SitSpec is underspecified in this respect.</Paragraph> <Paragraph position="3"> SemSpec is still underspecified with regard to, for example, constituent order and lexical choice between near-synonyms (which have the same semantics with respect to SitSpec yet differ in terms of style, collocational restrictions, etc.). These and other decisions are made, on the basis of verbalization parameters, by the surface generators.</Paragraph> <Section position="1" start_page="142" end_page="143" type="sub_section"> <SectionTitle> 2.3 The role of the lexicon </SectionTitle> <Paragraph position="0"> MOOSE is designed with the goal of strong lexical paraphrasing capabilities in mind. Therefore, its lexicon is rich in information so that lexical choices can be made on the basis of various generation parameters (which are not discussed in this paper). A lexical entry in MOOSE has the following components: Denotation A partial SitSpec that defines the applicability condition of the lexeme: if its denotation subsumes some part of the input SitSpec, then (and only then) it is a candidate lexical option for the verbalization. Covering The subset of the denotation nodes that are actually expressed by the lexeme. 
One of the constraints for sentence production is that every node of the SitSpec be covered by some lexeme.</Paragraph> <Paragraph position="1"> Partial SemSpec (PSemSpec) The contribution that the lexeme can make to a sentence SemSpec.</Paragraph> <Paragraph position="2"> By means of shared variables, the partial SemSpec is linked to the denotation.</Paragraph> <Paragraph position="3"> Connotations Stylistic features pertaining to formality, floridity, etc. See \[DiMarco et al. 1993\]. Salience assignment (for verbs only): A specification of the different degrees of prominence that the verb assigns to the participants.</Paragraph> <Paragraph position="4"> Alternation rules (for verbs only): Pointers to lexical rules that represent alternations the verb can undergo (see section 4).</Paragraph> <Paragraph position="5"> Morphosyntactic features Standard features needed by the surface generator to produce correct utterances.</Paragraph> <Paragraph position="6"> The central task of the SitSpec--SemSpec mapping is the production of a complete SemSpec by unifying the partial SemSpecs (PSemSpecs) associated with a subset of the lexical options, such that the lexemes in this subset collectively cover the entire SitSpec. This unification process is driven by the candidate verbs; their PSemSpec consists of an upper model process and the mappings from situation elements to process participants, which are achieved by co-indexing with positions in the denotation. By means of sharing this information between denotation and PSemSpec, the lexicon entries serve as a &quot;bridge&quot; between the SitSpec to be verbalized and the intermediate representation SemSpec; thus, the role of the lexicon in MOOSE is somewhat similar to that in DIOGENES \[Nirenburg and Nirenburg 1988\].</Paragraph> <Paragraph position="7"> Importantly, the denotation of a lexeme need not be a single concept; instead, it can be a complete configuration of concepts and roles (cf. Horacek \[1990\]). 
This is necessary since we want to break up the internal event structure in the representation of verb meaning. The consequences are a higher computational cost in finding lexical options, but also greater flexibility in finding different verbalizations of the same event. As an example, consider the denotation of the causative reading of to fill:</Paragraph> <Paragraph position="9"> The variables are bound to instances or atomic values of the SitSpec when the two are matched against each other. The filler of the VALUE role in the POST-STATE appears in angle brackets because it is a default value, which we do not discuss further here. The accompanying partial SemSpec of to fill contains the same variables: (x / directed-action :lex fill :actor B :actee A :inclusive C &lt;:destination D&gt;) When the denotation is matched against a SitSpec, the variable bindings are propagated to the partial SemSpec; and when it is unified with the partial SemSpecs corresponding to the other elements, a complete SemSpec results, from which PENMAN produces a sentence like Jill filled the tank with oil. (If the VALUE is different from 'full, it also gets verbalized, such as in Jill filled the tank to the second mark.)</Paragraph> </Section> </Section> <Section position="7" start_page="143" end_page="145" type="metho"> <SectionTitle> 3 Verb semantics </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="143" end_page="143" type="sub_section"> <SectionTitle> 3.1 Aktionsart </SectionTitle> <Paragraph position="0"> Since verb denotations are complex enough to reflect certain parts of event structure, they can be related to the notion of Aktionsart: the verb-inherent features characterizing (primarily) the temporal distribution of the event denoted. 
The phenomena subsumed under Aktionsart are far from clear-cut, and there is no generally accepted and well-defined set of features.</Paragraph> <Paragraph position="1"> In the following, we use the terms given by Bussmann \[1983\] and discuss only those Aktionsart features that are directly relevant for us because they relate types of SITUATIONS to denotations of verbs. Thus, within the context of our system, we define Aktionsart features in terms of patterns of verb denotations. The following table lists the correspondences.</Paragraph> <Paragraph position="2"> Aktionsart: Denotation pattern. stative: (state X); durative: (protracted-activity X); semelfactive: (momentaneous-activity X)</Paragraph> <Paragraph position="4"> Simple cases are stative verbs like to own or to know. Durative verbs characterize continuous occurrences that do not have internal structure, like to sleep or to sit. In the class of non-durative verbs we find, amongst others, the opposition between iterative and semelfactive ones. The former are durative activities that result from repeating the same occurrence. In contrast, a semelfactive verb denotes a single occurrence, thus in our system a MOMENTANEOUS-ACTIVITY, as for example to knock. Transformative verbs involve a change of some state, without a clearly recognizable event that would be responsible for it: The room lit up. The denotation of such verbs thus involves a pre-state and a post-state, which is the negation of the former. In our ontology, these are TRANSITIONS. Resultative verbs, on the other hand, characterize situations in which something is going on and then comes to an end, thereby resulting in some new state (CULMINATIONS in our ontology). Their denotation includes an activity and a post-state. In the literature, such verbs are often also called inchoative. 
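The correspondences described so far can be sketched as a small Python lookup. The dictionary encoding of a denotation is a hypothetical simplification introduced for illustration only; the sketch covers the three patterns that survive in the table plus the two event patterns given in the prose, and it deliberately leaves iterative verbs unclassified because their pattern is not spelled out here.

```python
# Patterns from the table: a denotation consisting of a single situation type.
SIMPLE_PATTERNS = {
    "state": "stative",
    "protracted-activity": "durative",
    "momentaneous-activity": "semelfactive",
}


def aktionsart(denotation):
    """Return the Aktionsart label for a toy denotation, or None if no pattern applies.

    'denotation' is a dict with a 'type' key; events may also carry
    'pre_state', 'activity' and 'post_state' keys (hypothetical encoding).
    """
    kind = denotation["type"]
    if kind in SIMPLE_PATTERNS:
        return SIMPLE_PATTERNS[kind]
    if kind == "event":
        if "activity" in denotation and "post_state" in denotation:
            return "resultative"      # activity plus post-state (a CULMINATION)
        if "pre_state" in denotation and "post_state" in denotation:
            return "transformative"   # pre-state plus post-state (a TRANSITION)
    return None


print(aktionsart({"type": "state"}))                                # to know
print(aktionsart({"type": "momentaneous-activity"}))                # to knock
print(aktionsart({"type": "event", "pre_state": "dark", "post_state": "lit"}))  # to light up
```

Checking for an activity before checking for a pre-state keeps the two event patterns apart even when a denotation happens to list all three parts.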
The final verb-inherent feature we use is the well-known causative, which reflects the presence of a CAUSER in the denotation. (Footnote 1, on 'inchoative': The term 'inchoative' is used to cover a rather broad range of phenomena, including the beginning of an event (e.g., to inflame) or its coming to an end. We think the term is overloaded and prefer to use 'resultative' for the latter group.)</Paragraph> </Section> <Section position="2" start_page="143" end_page="144" type="sub_section"> <SectionTitle> 3.2 Valency </SectionTitle> <Paragraph position="0"> Valency, as introduced by Tesnière \[1959\], refers to the distinction between actants and circumstantials (central participants associated with the verb versus temporal, locational, and other circumstances). This separation is in principle widely accepted, but views differ on where to draw the line and how to motivate it. The notion of valency was further developed predominantly in German linguistics, with a culmination point being the valency dictionary of German verbs by Helbig and Schenkel \[1973\].</Paragraph> <Paragraph position="1"> They made an additional distinction between 'obligatory' and 'optional' actants; Somers \[1987, ch. 1\] proceeded to propose six different levels of valency binding. 
He also pointed out that there are different opinions on the type of entities that are subject to a verb's valency requirements: some authors describe them by syntactic class, some by semantic deep cases, and some by their function (subject, object, etc.).</Paragraph> <Paragraph position="2"> In our approach, which is driven by the (practical) needs of multilingual generation (MLG), we aim at encapsulating syntactic matters in the front-end generators and here look at valency in the SitSpec-SemSpec mapping: When characterizing the linking between SitSpec elements and SemSpec participants/circumstances, we describe valency in terms of upper model concepts.</Paragraph> <Paragraph position="3"> We wish to distinguish cases like the following: * Tom disconnected the wire {from the plug}. To disconnect requires a SOURCE, but it can be omitted in a suitably specific context.</Paragraph> <Paragraph position="4"> * Sally ate. While to eat usually requires a direct object, it can also be used intransitively due to the strong semantic expectation it creates on the nature of the object--independent of the context.</Paragraph> <Paragraph position="5"> * Tom put the book on the table. To put requires a DESTINATION, and it cannot be omitted, no matter how specific the context.</Paragraph> <Paragraph position="6"> * The water drained from the tank {in the garage}.</Paragraph> <Paragraph position="7"> Locative circumstances like in the garage are not restricted to particular verbs and can occur in addition to PATHS required by the verb.</Paragraph> <Paragraph position="8"> Adopting the three categories proposed by Helbig and Schenkel \[1973\], we distinguish between obligatory and optional participants on the one hand, and circumstances on the other. The criterion of optionality, as indicated above, singles out the obligatory complements. But how, exactly, can we motivate the distinction between optional participants and circumstances in our framework? 
By relating the PSemSpec to the SitSpec, via the denotation. In the disconnect case, for instance, the two items CONNECTOR and CONNECTEE are both integral elements of the situation. The situation would not be well-formed with either of them absent, and the domain model encodes this restriction. Therefore, both elements also occur in the denotation of to disconnect, and a co-indexed variable provides the link to the PSemSpec. Only when building the sentence SemSpec is it relevant to know that the CONNECTEE can be omitted. The CONNECTEE in the denotation therefore must have its counterpart in the PSemSpec--that is the SOURCE--but there it is marked as optional (see figure 6 below).</Paragraph> <Paragraph position="9"> With circumstances, the situation is different: A SitSpec is complete and well-formed without the information on, for instance, the location of an event.</Paragraph> <Paragraph position="10"> Hence, a verb's denotation cannot contain that information, and it follows that it is not present in the PSemSpec, either.</Paragraph> </Section> <Section position="3" start_page="144" end_page="145" type="sub_section"> <SectionTitle> 3.3 Verbs and the upper model </SectionTitle> <Paragraph position="0"> Now, since our instrument for ensuring the well-formedness of PSemSpecs and SemSpecs is the upper model, we need to inspect the role of valency information in the UM.</Paragraph> <Paragraph position="1"> Bateman et al. \[1990\] are well aware of the problems with ascribing simple valency patterns to verbs, but for the practical implementation of Penman and the UM, some strict--and simplifying--category distinctions had to be made. Thus, all participants of process types, as listed above, are coded in LOOM as obligatory roles.</Paragraph> <Paragraph position="2"> Circumstances, on the other hand, are in the UM coded as LOOM relations, and there are no restrictions as to what circumstances can occur with what processes. 
Spatio-temporal information is generally seen as a circumstance. Concerning the linguistic realizations, Penman and the UM in their present form essentially go back to the Tesnièrian suggestion that participants are realized as nominal groups (with some obvious exceptions, as in say that x), and circumstances as prepositional phrases or as adverbs.</Paragraph> <Paragraph position="3"> But neither this syntactic division corresponding to participants and circumstances (direct or indirect object versus adverbs or prepositional phrases) nor the UM's semantic postulate that spatio-temporal aspects are circumstances holds in general. Regarding spatial relationships, we find verbs that specifically require PATH-expressions, which cannot be treated on a par with circumstances: Recall to put, which requires a direct object and a DESTINATION. Causative to pour requires a direct object as well as a PATH with either a SOURCE, or a DESTINATION, or both: pour the water from the can into the bucket. Some verbs, as is well-known, can occur with either a PATH (Tom walked into the garden) or with a PLACE (Tom walked in the garden), and only in the garden can here be treated as a circumstance. And to disconnect requires a direct object (the entity that is disconnected) and a SOURCE (the entity that something is disconnected from), which can be omitted if it is obvious from the context: Disconnect the wire! The upper model in its present form cannot make distinctions of this kind. It is not possible to specify a PATH expression, which will be realized as a prepositional phrase, as an obligatory participant. About to disconnect (in the causative reading), which is a MATERIAL-PROCESS, the UM can only state that the roles ACTOR and ACTEE must be filled, but not the fact that there is another entity involved--in the domain model called the CONNECTEE--which is verbalized as a SOURCE. 
Moreover, the UM does not know that the CONNECTEE is optional in the verbalization; it does not distinguish between obligatory and optional participants. As a step towards a more fine-grained distinction between participants and circumstances, we differentiate between requirements of process types (as coded in the UM) and requirements of individual verbs, which are to be coded in the lexical entries. In a nutshell, valency (as a lexical property) needs to supplement the participant/circumstance requirements that can be stated for types of processes. To encode the valency information, we use the partial SemSpec of a lexicon entry. The participant roles stated there are either obligatory or optional; optional roles are marked with angle brackets: to disconnect PSS: (x / directed-action :actor A :actee B &lt; :source C &gt;) With obligatory participants, the verb is only applicable if the elements denoted by these participants are present in the SitSpec. Optional participants need not necessarily be included in the verbalization: If they are present in the SitSpec, they may be omitted if there is some good reason (e.g., a stylistic preference); if they are not present in the SitSpec, the verb can be used anyway.</Paragraph> </Section> </Section> </Paper>