File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/96/w96-0501_metho.xml
Size: 11,123 bytes
Last Modified: 2025-10-06 14:14:26
<?xml version="1.0" standalone="yes"?> <Paper uid="W96-0501"> <Title>An Overview of SURGE: a Reusable Comprehensive Syntactic Realization Component</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 Reusable Realization Component for NLG </SectionTitle> <Paragraph position="0"> Natural language generation has traditionally been divided into three successive tasks: (1) content determination, (2) content organization, and (3) linguistic realization. The goal of a reusable realization component is to encapsulate the domain-independent part of this third task. The input to such a component should thus be as high-level as possible without hindering portability. Independent efforts to define such an input have crystallized around a skeletal, partially lexicalized thematic tree specifying the semantic roles, open-class lexical items and top-level syntactic category of each constituent. An example SURGE input with the corresponding sentence is given in Fig. 1.</Paragraph> <Paragraph position="1"> The task of the realization component is to map such a skeletal tree onto a natural language sentence. It involves the following sub-tasks: (1) Map thematic structure onto syntactic roles: e.g., agent, process, possessed and possessor onto subject, verb-group, direct-object and indirect-object (respectively) in S1.</Paragraph> <Paragraph position="2"> (2) Control syntactic paraphrasing and alternations: e.g., adding the (dative-move yes) feature to I1 would result in the generation of the paraphrase (S2): &quot;She hands the editor the draft&quot;. (3) Prevent over-generation: e.g., fail when adding the same (dative-move yes) feature to an input similar to I1 except that the possessed role is filled by ((cat pers-pro)) (for personal pronoun) to avoid the generation of (S3) *&quot;She hands the editor it&quot;.</Paragraph> <Paragraph position="3"> (4) Provide defaults for syntactic features: e.g., 
definite for the NPs of S1.</Paragraph> <Paragraph position="4"> (5) Propagate agreement features, providing enough input to the morphology module: e.g., after the agent and process thematic roles have been mapped to the subject and verb-group syntactic roles,</Paragraph> <Paragraph position="5"> propagate the default (person third) feature added to the subject filler to the verb-group filler; without such propagation the morphology module would not be able to inflect the verb &quot;to hand&quot; as &quot;hands&quot; in S1.</Paragraph> <Paragraph position="6"> (6) Select closed-class words: e.g., &quot;she&quot;, &quot;the&quot; and &quot;to&quot; in S1. (7) Provide linear precedence constraints among syntactic constituents: e.g., subject > verb-group > indirect-object > direct-object once the default active voice has been chosen for S1.</Paragraph> <Paragraph position="7"> (8) Inflect open-class words (morphological processing): e.g., the verb &quot;to hand&quot; as &quot;hands&quot; in S1.</Paragraph> <Paragraph position="8"> (9) Linearize the syntactic tree into a string of inflected words following the linear precedence constraints.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 The FUF/SURGE package </SectionTitle> <Paragraph position="0"> SURGE is implemented in the special-purpose programming language FUF [1] and is distributed as a package with a FUF interpreter. This interpreter has two components: (1) the functional unifier, which fleshes out the input skeletal tree with syntactic features from the grammar, and (2) the linearizer, which inflects each word at the bottom of the fleshed-out tree and prints the words out following the linear precedence constraints indicated in the tree.</Paragraph> <Paragraph position="1"> FUF is an extension of the original functional unification formalism put forward by Kay [5]. 
It is based on two powerful concepts: encoding knowledge in recursive sets of attribute-value pairs called Functional Descriptions (FDs) and uniformly manipulating these FDs through the operation of unification.</Paragraph> <Paragraph position="2"> Both the input and the output of a FUF program are FDs, while the program itself is a meta-FD called a Functional Grammar (FG). An FG is an FD with disjunctions and control annotations. Control annotations are used in FUF for two distinct purposes: (1) to control recursion on linguistic constituents: the tree of the input FD is fleshed out in top-down fashion by re-unifying each of its sub-constituents with the FG, and (2) to reduce backtracking when processing disjunctions.</Paragraph> <Paragraph position="3"> SURGE represents our own synthesis, within a single working system and computational framework, of the descriptive work of several (non-computational) linguists. We took inspiration principally from [4] for the overall organization of the grammar and the core of the clause and nominal sub-grammars; [3] for the semantic aspects of the clause; [7] for the treatment of long-distance dependencies; and [8] for the many linguistic phenomena not mentioned in other works, yet encountered in many generation application domains.</Paragraph> <Paragraph position="4"> Since many of these sources belong to the systemic linguistic school, SURGE is mostly a functional unification implementation of systemic grammar. In particular, the type of FD that it accepts as input specifies a &quot;process&quot; in the systemic sense: it can be an event or a relation. 
The hierarchy of general process types defining the thematic structure of a clause (and the associated semantic class of its main verb) in the current implementation is compact and able to cover many clause structures.</Paragraph> <Paragraph position="5"> Yet the argument structure and/or semantics of many English verbs do not fit neatly into any element of this hierarchy [6]. To overcome this difficulty, SURGE also includes lexical processes inspired by lexicalist grammars such as the Meaning-Text Theory and HPSG [7].</Paragraph> <Paragraph position="6"> A lexical process is a shallower and less semantic form of input, where the sub-categorization constraints and the mapping from the thematic roles to the oblique roles [7] are already specified (instead of being automatically computed by the grammar, as is the case for general process types). The use of specific lexical processes to complement general process types is an example of the type of theory integration that we were forced to carry out during the development of SURGE. In the current state of linguistic research, such a heterogeneous approach is the best practical strategy to provide broad coverage.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Organization and Coverage </SectionTitle> <Paragraph position="0"> At the top level, SURGE is organized into sub-grammars, one for each syntactic category. Each sub-grammar encapsulates the relevant part of the grammar to access when recursively unifying an input sub-constituent of the corresponding category. For example, generating the sentence &quot;James buys the book&quot; involves successively accessing the sub-grammars for the clause, the verb group, the nominal group (twice) and the determiner sequence. Each sub-grammar is then divided into a set of systems (in the systemic sense), each one encapsulating an orthogonal set of decisions, constraints and features. 
The main top-level syntactic categories used in SURGE are: clause, nominal group (or NP), determiner sequence, verb group, adjectival phrase and PP.</Paragraph> <Paragraph position="1"> Following [4], the thematic roles accepted by SURGE in input clause specifications first divide into nuclear and satellite roles. Nuclear roles answer the question &quot;who/what was involved?&quot; about the situation described by the clause. They include the process itself, generally surfacing as the verb, and its associated participants, surfacing as verb arguments. Satellite roles (also called adverbials) answer the questions &quot;when/where/why/how did it happen?&quot; and surface as the remaining clause complements.</Paragraph> <Paragraph position="2"> Following this sub-division of thematic roles, the clause sub-grammar is divided into four orthogonal systems: (1) Transitivity, which handles the mapping of nuclear thematic roles onto a default core syntactic structure for main assertive clauses.</Paragraph> <Paragraph position="3"> (2) Voice, which handles departures from the default core syntactic structure triggered by the use of syntactic alternations (e.g., passive or dative moves).</Paragraph> <Paragraph position="4"> (3) Mood, which handles departures from the default core syntactic structure triggered by variations in terms of speech acts (e.g., interrogative or imperative clause) and syntactic functions (e.g., matrix vs.</Paragraph> <Paragraph position="5"> subordinate clause).</Paragraph> <Paragraph position="6"> (4) Adverbial, which handles the mapping of satellite roles onto the peripheral syntactic structure.</Paragraph> <Paragraph position="7"> Nominals are an extremely versatile syntactic category, and except for limited cases, no linguistic semantic classification of nominals has been provided. Consequently, while for clauses input can be provided in thematic form, for nominals it must be provided directly in terms of syntactic roles. 
The task of mapping domain-specific thematic relations to the syntactic slots in an NP is therefore left to the client program.</Paragraph> <Paragraph position="8"> The verb group grammar decomposes into three major systems: tense, polarity and modality. SURGE implements the full set of 36 English tenses identified in [4], pp. 198-207. Its interface to the client program is expressed in terms of Allen's temporal relations (e.g., to describe a past event,</Paragraph> <Paragraph position="9"> the client provides the feature (tpattern (:et :before :st)), specifying that the event time (et) precedes the speech time (st)).</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Current Work </SectionTitle> <Paragraph position="0"> The development of SURGE itself continues, as prompted by the needs of new applications and by our better understanding of the respective tasks of syntactic realization and lexical choice [2]. We are specifically working on (1) integrating a more systematic implementation of Levin's alternations within the grammar,</Paragraph> <Paragraph position="1"> (2) extending composite processes to include mental and verbal ones, (3) modifying the nominal grammar to support nominalizations and some forms of syntactic alternations, and (4) improving the treatment of obligatory pronominalization and binding. As it stands, SURGE provides a comprehensive syntactic realization component that is easy to integrate within a wide range of architectures for complete generation systems. It is available on the WWW at http://www.cs.bgu.ac.il/surge/.</Paragraph> </Section> </Paper>