An Overview of SURGE: a Reusable 
Comprehensive Syntactic Realization 
Component 
Michael Elhadad 
elhadad©cs, bgu. ac. il 
Mathematics and Computer Science Dept. 
Ben Gurion University in the Negev 
Beer Sheva, 84105 Israel 
Jacques Robin 
j r~di. ufpe. br 
Departamento de Informgtica 
Universidade Federal de Pernambuco 
Recife. PE 50740-540 Brazil 
1 Introduction 
This paper describes a short demo provid- 
ing an overview of SURGE (Systemic Unifi- 
cation Realization Grammar of English) a 
syntactic realization front-end for natural 
language generation systems. Developed 
over the last seven years 1 it embeds one 
of the most comprehensive computational 
grammar of English for generation avail- 
able to date. It has been successfully re- 
used in eight generators, that have little in 
common in terms of architecture. It has 
also been used for teaching natural lan- 
guage generation at several academic in- 
stitutions. 
We first define the task of a stand-alone 
syntactic realization component within 
the overall generation process. We then 
briefly survey the computational formal- 
ism underlying the implementation of 
SURGg as well as the syntactic theories 
that it integrates. We then describe the 
structure of the grammar. 
1The research presented in this paper started 
out while the authors were doing their PhD. at. 
Columbia University, New York. We are both 
indebted to Kathleen McKeown for her guidance 
and support during those years. 
2 Reusable Realization 
Component for NLG 
Natural language generation has been tra- 
ditionally divided into three successive 
tasks: (1) content determination, (2) con- 
tent organization, and (3) linguistic real- 
ization. The goal of a re-usable realization 
component is to encapsulate the domain- 
independent part of this third task. The 
input to such component should thus be 
as high-level as possible without hindering 
portability. Independent efforts to define 
such an input have crystalized around a 
skeletal, partially lexicMized thematic tree 
specifying the semantic roles, open-class 
lexical items and top-level syntactic cat- 
egory of each constituents. An example 
SURGE input with the corresponding sen- 
tence is given in Fig. 1. 
The task of the realization component 
is to map such skeletal tree onto a natural 
language sentence. It involves the follow- 
ing sub-tasks: 
(1) Map thematic structure onto syntac- 
tic roles: e.g., agent; 
process, possessed and pcssessor onto 
subjec't, verb-group, direc't-objec't 
and indirect-object (respectively) in 
$1. 
(2) Control syntactic paraphrasing and al- 
Input Specification (I1): 
cat 
process 
partzc 
clause 
type relation 
lex 
agent 
affected 
possessor 
possessed 
composite \] 
possessive 
"hand" 
\[ cat pers_pro \] 
gender feminine m\[ \] 
let "editor" 
N 
lex "draft" 
Output Sentence ($1): "She hands the draft to the 
editor" 
Figure 1: An {;xample St!RGE I/O 
ternations \[6\]: e.g., adding tile 
(dative-move yes) feature to 11 would 
result in the generation of the paraphrase 
($2): "She hands the editor the draft". 
(3) Prevent over-generation: e.g., fail 
when adding the same (dative-move 
yes) feature to an input similar to I1 
except that the possessed role is filled 
by ((cat pers-pro)) (for personal pro- 
noun) to avoid the generation of ($8) * 
"She hands the editor it". 
(4) Provide defaults for syntactic features: 
e.g.. definite for the NPs of $1. 
(5) Propagate agreement features, provid- 
ing enough input to the morphology mod- 
ule: e.g.. after the agent and process 
thematic roles have been mapped to the 
subject and verb-group syntactic roles. 
propagate the default (person third) 
feature added to the subject filler to the 
verb-group filler; without such a propa- 
gation the morphology module would not 
be able to inflect the verb "to hand" as 
• 'hands" in $1. 
(6) Select closed-class words: e.g., "'she", 
"'the" and "'to" in $1. 
(7) Provide linear precedence constraints 
among syntactic con- 
stituents: e.g., subject > verb-group > 
indirect-object > direct-object once 
the default active voice has been chosen 
for $1. 
(8) Inflect open-class words (morphologi- 
cal processilLg): e.g., the verb "to hand" 
as "'hands" in $1. 
(9) Linearize the syntactic tree into a 
string of inflected words following the lin- 
ear precedence constraints. 
3 The FUF/SURGE package 
SURGE is implemented in the special- 
purpose programming language PuP \[1\] 
and it is distributed as a package with 
a PuP interpreter. This interpreter has 
two components: (1) the functional unifier 
that fleshes Out the input skeletal tree with 
syntactic features from the grammar, and 
(2) the linearizer that inflects each word 
at the bottom of the fleshed out tree and 
print them out following the linear prece- 
dence constraints indicated in the tree. 
~u F is an extension of the original func- 
tional unification formalism put forward 
bv Kay \[5\]. It is based on two powerful 
concepts: encoding knowledge in recursive 
sets of attribute value pairs called Func- 
tional Descriptions (FD) and uniformly 
manipulating these FDs through the op- 
eration of unification. 
Both the input and the output of a FUF 
program are FDs, while the program itself 
is a meta-FD called a Functional Gram- 
mar (FG). An FG is an FD with disjunc- 
tions and control annotations. Control an- 
notations are used in Fur for two distinct 
purposes: (1) to control recursion on lin- 
guistic constituents: the tree of the in- 
put FD is fleshed out in top-down fashion 
by re-unifying each of its sub-constituent 
with the FG. and (2) to reduce backtrack- 
ing when processing disjunctions. 
SURGE represents our own synthesis, 
within a single working system and com- 
putational framework, of the descriptive 
work of several (non-computational) lin- 
guists. We took inspiration principally 
from \[4\] for the overall organization of the 
grammar and the core of the clause ,rod 
nominal sub-grammars; \[3\] for the seman- 
tic aspects of the clause; \[7\] for the treat- 
ment of long-distance dependencies: and 
\[8\] for the many linguistic phenomena not 
mentioned in other works, yet encountered 
in many generation application domains. 
Since many of these sources belong to 
the systemic linguistic school, SURGE iS 
mostly a functional unification implemen- 
tation of systemic grammar. In particu- 
lar, the type of FD that it accepts as input 
specifies a "process" in the systemic sense: 
it can be an event: or a relation. The hier- 
archy of general process types defining the 
thematic structure of a clause (and the as- 
sociated semantic class of its main verb) 
in the current implementation is compact 
and able to cover many clause structures. 
Yet, the argument structure and/or se- 
mantics of many English verbs do not fit 
neatly in any element of this hierarchy \[6\]. 
To overcome this difficulty. SURGE also in- 
cludes lexical processes inspired bv lexieal- 
ist grammars such as the Meaning-Text 
Theory and HPSG \[7\]. 
A lexical process is a shallower and less 
semantic form of input, where the sub- 
categorization constraints and the map- 
ping from the thematic roles to the oblique 
roles \[7\] are already specified (instead 
of being automatically computed by the 
grammar as is the case for general pro- 
cess types). The use of specific lexical 
processes to complement general process 
types is an example of the type of theorv 
integration that we were forced to carry 
out during the development of SURGE. In 
the current state of linguistic research, 
such an heterogeneous approach is the 
best practical strategy to provide broad 
coverage. 
4 Organization and Cover- 
age 
At the top-level, SURGE is organized into 
sub-grammars, one for each syntactic cat- 
egory. Each sub-grammar encapsulates 
the relevant part of the grammar to ac- 
cess when recursively unifying an input 
sub-constituent of the corresponding cat- 
egory. For example, generating the sen- 
tence ".lames buys the book" involves suc- 
cessively accessing the sub-grammars for 
the clause, the verb group, the nomi- 
nal group (twice) and the determiner se- 
quence. Each sub-grammar is then di- 
vided into a set of systems (in the systemic 
sense), each one encapsulating an orthog- 
onal set of decisions, constraints and fea- 
tures. The main top-level syntactic cate- 
gories used in SURGE are: clause, nominal 
group (or NP), determiner sequence, verb 
group, adjectival phrase and PP. 
Following \[4\], the thematic roles ac- 
cepted by SURGE in input clause specifica- 
tions first divide into: nuclear and satellite 
roles. Nuclear roles, answer the questions 
"who/what was involved?" about the sit- 
uation described by the clause. They in- 
clude the process itself, generally surfac- 
ing as the verb and its associated partici- 
pants surfacing as verb arguments. Satel- 
lite roles (also called adverbials) answer 
the questions "when/where/why/how did 
it happen?" and surfa.ce as the remaining 
clause complements. 
Following this sub-division of thematic 
roles, the clause sub-grammar is divided 
into four orthogonal systems: 
(1) Transitivity, which handles mapping 
of nuclear thematic roles onto a default 
core syntactic structure for main assertive 
clauses. 
(2) Voice, which handles departures from 
the default core syntactic structure trig- 
gered by the use of syntactic alternations 
(e.g., passive or dative moves). 
(3) Mood, which handles departures from 
the default core syntactic structure trig- 
gered by variations in terms speech acts 
(e.g., interrogative or imperative clause) 
and syntactic functions (e.g.. matrix vs. 
subordinate clause). 
(4) Adverbial, which handles mapping of 
satellite roles onto the peripheral svntactic 
structure. 
Nominals are an extremely versatile 
syntactic category, and except for limited 
cases, no linguistic semantic classification 
of nominals has been provided. Conse- 
quently, while for clauses input can be pro- 
vided in thematic form. for nominals it 
must be provided directly in terms of svn- 
tactic roles. The task of mapping domain- 
specific thematic relations to the syntactic 
slots in an NP is therefore left to the client 
program. 
The verb group grammar decompo~es 
in three major systems: tense, polarity 
and modality. SUR.GE implements the full 
36 English tenses identified in \[4\] pp.19S- 
207 It provides an interface to the client 
program is in terms Allen's temporal re- 
lations (e.g., to describe a past event. 
the client provides the feature (tpatl:ern 
(:et :before :sc)),specifying that the 
event time (et) precedes the speech time 
(st)). 
5 Current Work 
The development of SURGE itself contin- 
ues. as prompted by the needs of new 
applications, and by our better under- 
standing of the respective tasks of syntac- 
tic realization and lexical choice \[2\]. We 
are specifically working on (1) integrat- 
ing a more systematic implementation of 
Levin's Mternations within the grammar. 
(2) extending composite processes to in- 
clude mental and verbal ones. (3) modify- 
ing the nominal grammar to support nom- 
inalizations and some forms of syntactic 
alternations and (4) improving the treat- 
ment of obligatory pronominalization and 
binding. As it stands, SURGE provides a 
comprehensive syntactic realization com- 
ponent, easy to integrate within a wide 
range of architectures tbr complete genera- 
tion systems. It is available on the WWW 
at http ://www. cs .bgu. ac. il/surge/. 

References 
\[1\] M. Elhadad. Using argumentatior~ to 
control lezical choice: a unification- 
based implementation. PhD the- 
sis, Computer Science Department. 
Columbia University, 1993. 
\[2\] M. Elhadad, K. McKeown. and 
• l. Robin. Floatings constraints in lexi- 
cal choice. Computational Linguistics. 
1996. To appear. 
\[3\] R. Fawcett. The semantics of clause 
and verb for relational processes in en- 
glish. In M. Halliday and R. Fawcett, 
editors. New developments in systemic 
linguistics. Frances Pinter, London 
and New York, 1987. 
\[4\] M. Halliday. An introduction to func- 
tional grammar. Edward Arnold. Lon- 
don. 1994. 2nd Edition. 
\[5\] M. Kay. Functional grammar. In Pro- 
ceedings of the 5th Annual Meeting of 
the Berkeley Linguistic Society, 1979. 
\[6\] B. Levin. English verb classes and 
alternations: a preliminary investiga- 
tion. University of Chicago Press. 
1.993. 
\[7\] C. Pollard and \[. A. Sag. Head Driven 
Phrase Structure Grammar. University 
of Chicago Press, Chicago, 1994. 
\[8\] R. Quirk, S. Greenbaum, G. Leech. 
and J. Svartvik. A comprehen- 
sive grammar of the English language. 
Longman, 1985. 
