.~N F~GLISH GF~NERATOR FOR A CASE-LABELLED DEP~qD~QCY REPRESENIW£10N 
John Irving Tait 
Acorn Computers Ltd. 
Fulboum~ Road 
Cherry Hinton 
Cambridge CB1 4JN 
U.K. 
Abstract 
The paper describes a progrmn which has been 
constructed to produce English strings from a 
case-labellea depenaency representation. The 
program uses an especially single and uniform 
control structure with a well defined separation 
of the different knowledge sources used during 
ge~,eration. Furthermore, the majority of t/le 
syst~n's knowledge is expressed in a declarative 
form, so in priciple the generator ' s knowledge 
bases could be used for purposes other than 
gex,eration. The ge~erator uses a two-pass control 
structure, the first translating from the 
s~nantically orientated case-labelled dependency 
structures into surface syntactic trees and the 
second translating from these trees into English 
str i~/s. 
The generator is very flexible: it can be run in 
such a way as to produce all the possible 
syntactically legitimate variations on a giveJ, 
utterance, and has built in facilities to do some 
synon~s substitution. It has been used in a 
nu, ber of application domains: notably as a part of 
a free text retrieval system and as part of a 
natural language front end to a relational database 
system. 
i. Introduction 
This pa\[~er describes a progrmn which has been 
constructed to translate from Boguraev ' s 
case-labelled depe~idency representations (Boguraev, 
1979: see also 8oguraev and Sparck Jones, 1982) to 
English strings. Although the principles on which 
the program has been constructed are primarily a 
new mix of established idea~, the generator 
incorporates a number of novel features. In 
particular, it caLlbines an especially simple a,~ 
uniform control structure with a well defined 
separatlon of t/le differe~,t ka,owledge sources used 
du~ing generation. It operates in two passes, the 
f~rst translating from the semantically orientated 
case-labelled dependency structures into surface 
syntactic trees a~,d the secona translating fran 
these trees into English strings. 
The translation fran de\[~_ndency structures to 
surface syntactic trees is the more c~mplex of the 
two passes unaertaken by the generator a~ will be 
described here. The other, translation from 
instantiated surface trees to text strings is 
relatively straightforward and will not be dealt 
with in this paper. It is fundamentally a tree 
flattening process, and is described in detail in 
Tait and Sparck Jones (1983). 
2. The Generator's Knowledge Structures 
The generator's Knowledge is separated into four 
sections, as follows. 
i) a set of bare templates of phrasal and 
clausal structures which restrict the 
surface trees other parts of the system may 
produce by defining the branching factor at 
a giv~_n node type. For example, the patterns 
record that English has intransitive, 
transitive and ditransitive, but not 
tritraneitive, verb phrases. The bare 
template for noun phrases is illustrated in 
Figure i. 
2 ) a lexicon and an associated morphological 
process~. 
3) a set of productzon rules which fill out 
partially instantiated syntactic trees 
produced from the phrasal ~,~ clausal 
patterns. These rules contain most of the 
syst~n's knowleuge about the relatzonship 
between the constructs of Boguraev' s 
representation la,~uage and English for~. 
4) another set of production rules which c~vert 
filled out surface trees to English strings. 
/-Q~antifier 
I -Determiner 
I -Or(/inal 
Noun Phrase = I -Ntm~er 
I -Adjective-list 
1 ?~%~l-modif ier- list 
\-\[~ost-mcdifers 
Figure i 
Template for Noun Phrase 
These four knowledge sources represent ti~e 
generator's entzre knowledge of both English and 
Boguraev ' s representation language. Although they 
are obviously interrelatea, each is distinct and 
separate. This well defzned separation greatly 
194 
increases t/~e extensability and maintainability of 
the syst~. 
A~ noted in the previous section the application of 
the rules of section 4 will not be discussed in 
this paper. The r~nainder of the paper discusses 
the use ~.%~de of t/~ first three knowledge sources. 
3. Tra,,slation frr, n Dependency Structures to 
Surface Syntactic Trees 
The pranary work of conversion frQm the dependency 
representations to the surface syntactic trees is 
~Luertaken by a set of production rules, each rule 
being associated with ane of the case labels used 
~, 8oguraev's representation scheme. These rules 
are applied by a suite of programs which exploit 
information about the structure of Bcguraev ' s 
dependency structures. For example they know where 
in a nominal aependency structure to find the word 
sense name of the head noun ('oscillatorl' in 
Figure 2) and where to find its case list (to 
which the production rules should be applied). 
(n (oscillatorl THING 
( @@ det ( thel ONE) ) 
(## nmod 
((((trace (clause v agent)) 
(clause 
(v (be2 BE 
( @@ agent 
(n (frequencyl SIGN)) ) 
(@@ state 
(st (n (n~,eless NIL)) 
(val ( high3 KISD ) ) ) ) 
))) ))) ))1 
Figure 2 
Boguraev Representation used for 
"the high frequency oscillator" 
It must be emphasize~ that Bcguraev's use of the 
teon case is much wider than is cxma,on in 
i inguistics. Not only is it used to cover 
prepositior~al attac~L~nt to nouns as ~ell as 
verbs; it is also used to cover sane other forms 
of attac~nent to, and modification of, nouns, for 
example by determiners ( like "a" ) and even for 
plural or singular number. In the pi~:ase "the high 
frequ~,cy oscillator", whose representation is 
illustrated by Figure 2, the link between 
' oscillatorl ' ( standing for "oscillator" ), and the 
determiner ( ' (thel ONE) ', representing "the") is 
the so-called case-label de__~t. Similarly the 
prenominal modifier "high frequent-y" (represented 
by ti~e c~nplex structure to the lower right of the 
flgure) is linked to 'oscillatorl' by nmod. 
Each ca~e-associated production rule takes four 
inputs, as follows: 
11 the depea\]dent iten attacheu to tI~ case link, 
for example ' (thel ONE)' i,i the case o~ det 
given below; 
2) an environment which is used to pass 
information from the processing of higher 
levels of the representation down to lower 
levels: for example tense fran the 
sentential level into an embedde~ relative 
clause; the enviroament is also used to allow 
various kinds of control over the generation 
process: for example to determine how many 
paraphrases of a sentence are produced; 
3 ) a partially instantiated phrase or clause 
template, which will ultimately form part of 
the surface syntactic tree output by the 
first pass of the generator; 
4 ) the dictionary entry for the daminant itam of 
tI~ current case list: in Figure 2 this is 
the entry for ' oscillatorl ', presented in 
Figure 3. 
(oscillatorl 
( oscillatorl-#1 
(root oscillator ) 
(syntax-patterns Noun-phrase-pattern ) ) ) 
Figure 3 
Dictionary entry for 'oscillatorl' 
The rules vary greatly in cx~nplexity: the structure 
illustrated in Figure 2 requires the use of both 
the simplest and most complex form of rule. 
The det production rule may be described in 
pseudo-English as: 
If the partially inst~,itiated template is for 
a noun ptu:ase then look up the lexical items 
(potentially synon~nl~) as~.~ciated with the 
word sense name 'thel', and insert each in 
the determiner slot in a new copy o~ r/le 
syntactic node. 
(Of course for English there is only one lexical 
item associated with 'thel': "the".) At the other 
extreme is the production rule for the nmod case. 
The nmcd case in Bcguraev's dependency structures 
is used to associate the pre-ncminal modifiers in 
a ccni~und nominal with tI~e ~ead notu~. The 
pre-~cminal modifiers are represented as a list of 
simple nQninal representations. 
(Noun-Phrase (NIL the NIL NIL NIL 
((Noun-Phrase NIL NIL NIL NIL 
(high) NIL frequ~icy NIL)) 
oscillator NIL) ) 
Figure 4 
Surface Structure Tree for 
"the high frequency oscillator" 
In English the nmod production rule might be 
195 
expres~eu a~: 
If the partially instantiated template is for 
a noun phrase, apply the processor which, 
given an existing ,~3minal representation, 
instantiates a corresponding phrasal 
~Iplate, to each nominal repr~ztati~z in 
the dependent item list: form the results 
into a set of lists, one for each 
combination of possible results for 
expressing each nominal: insert each result 
list ~zto a copy of the partially 
instantiated t~nplate Originally passed to 
the rule. 
The surface structure tree prc~L_~fed after these 
rules have been applied to the representation of 
Figure 2 is given in Figure 4. Note that the tree 
contains syntactic category names, and that 
unfilled slots in the tree are filled with NIL. 
Thus if the phrase to be generated was "all the 
high frequency oscillators", the flrst NIL in the 
surface syntactic tree (representing the unfilled 
quantifier slot of the dominant noun phrase node) 
would be replaced by "all". The order of the words 
in the surface syntactic tree represents the order 
in which they will be produced in the output 
sentence. 
These two production rules, for the det and 
case labels, are fairly typical o-f-those used 
el~ewhere in the system. There is, however, an 
{,nportant feature tt~y fail to illustrate. In 
c<xztrast with more ccnve~tional cases, ~ and 
det do not require the identification of a lexical 
~tem associated with the case-label itself. This is 
of course necessary when expressing prepositional 
plzases. 
4. Distinctive Feauures of this Translation Process 
The two most noteworthy features of the generation 
phase which produces surface structure trees are 
tl~e control structure employed and distribution of 
the syst~ language knowledge between its 
dl ~ferent components. 
NO mention Of the system's c~trol structure was 
made in the previous section. The structure used 
zs sufficiently powerful and elegant tlmt it could 
be ignored entirely when building up the systems 
~zowledge of Bcguraev's representation language 
an~ of English. However, the efficiency of the 
generator described here is largely a result of the 
control structure used. It is rare for this system 
to take more than a few fracti~,s of a sec~ to 
generate a sentex,ce: a sharp contrast with 
approaches based on unification, like Appelt's 
(1983) TELk~RAM. 
First the current representational structure is 
classified as clausal, sL~ple nominal, Or complex 
(typically relativised) nominal. Second, a suitable 
structure dismantling function is applied to the 
structure which identifies the head lexical token 
from the structure and separates out its case-list. 
Third the dictionary entry for the head lexical 
item is obtained, and. after checkinu the 
syntactic ~arKers in the dictionary ~,try anu 
phrasal or clause templates suitable for the 
environ~,t are ic~ztified. Fourth, appropriate 
production rules are applied to each ele, ent of the 
structure's case list in order to instantiate the 
templates. Frequently this whole process is applied 
recursively to some dependent representation level. 
So, for example, the representation for "high 
frequency" is prccessed by a second call of the 
noun phrase processor from within the call dealing 
with the dominant noninal, 'oscillatorl'. When the 
case list has been completely processed, the 
di~rsntling function applies any necessary 
morphological processing to the head lexical item 
( for example to reflect subject/verb and 
person/nu~ agre~Rent). 
This simple fra~nework covers all the processing 
done by the generator. 
The split ~etween the syntactic ~lowledge 
represented in the p|u:asal and clausal templates 
a~ in the production rules is also unusual. The 
templates define the shape of t/~e surface 
syntactic trees which the system can produce. It 
places no restrictions on the form of the fillers 
for any slot in a gran~ node. The production 
rules ~,force categorial and order~,~ 
restrictions. So, for example, the templates 
reflect the fact that English possesses 
hztransitive, transitive and ditransitive verbs, 
whilst the production rules ensure that the 
subject of a clause is of a suitable syntactic 
category, and that the subject precedes the verb 
in simple declarative sentences. 
The surface structure trees prcduce~ contain all 
the words in the sentence to be produced in the 
order and form in which they are to be output. Thus 
it is a straightforward matter to generate English 
strings fran them. 
5. C~iclusion 
The generator presented here is in essence a 
development of the Micro-Mumble generator 
descriheu in Mee|~ (1981). But in the process of 
extending Meehan's framework for a wide coverage 
system, his original design has been radically 
transformed. Most notably, the system described 
here has its syntactic knowledge largely separated 
fran its knowledge of the input representation 
language. It has, however, retained the eleg~It 
control structure of Meehan's original. This 
distinguishes it from the early generators in the 
same style, like Goldman's (1975) BABEL. 
At the san~ thne the generator described here is 
very flexible: it can be run in such a way as to 
produce all the possible syntactically legitimate 
variations on a given utterance, and has built in 
facilities to do same synonym substitution. The 
envircnn%~-nt mechanism is very ( perhaps too) 
powerful, and could be used to dynastically select 
possible ways of expressing a given structure in 
almost any way required. 
The system's knowledge of ,~tural language and of 
196 
t~ representation language is expressed in a 
fundmn~itally r%/le-like way, most notably without 
the use o£ an assignment ~necl~Lnism. In principle 
such rules could be used backwards, that is they 
could be used to parse incoming English. H~ver no 
work has been done to develop a parser which uses 
t/~ generators rules, so this possibility remains 
pure speculation at present. 
The generator described here, it must be 
e,pbasized, covers Only part of the task of 
generation. Unlike, for example, McKecwn's (1980) 
system, it deals not with what to say, but only 
with how to say it. Boguraev ' s representation 
identifies sentence bot~K~aries and the majority of 
content word~ to be used in the utterance being 
produceu (see Figure i), making the task of the 
generator relatively straightforward. However, the 
techniques used could deal with a representation 
which was much less closely related to the surface 
text provided this representation retained a 
fairly straightforward relationship between 
propositional units of the meaning representation 
~u~ the clausal structure of the language. For 
example, a representat ion language which 
represented ally states and times, but not the 
events which linked different states and times 
would probably require a more puwerful framework 
than ti~t provided by the generator described 
here. Hc~ver, another case-labelled dependency 
language, like Schank's ( 1975 ) Conceptual 
Dependency (CD) Representation, could be handled 
by providing the ge~lerator with a new set of 
syntactico-semant£c production rules, a new lexicon ~ 
and t/~ replaca~ent of the functions for 
dismantling Boguraev's dependency representation 
with functions for dismantling CD structures. 
The fr~ork of ti~ g~lerator has been completely 
implemented and tested with a lexicon of a few 
hundred words and a grammar covering much of the 
E,~lish noun plu:ase and a number of the more 
straightforward sentence types. It has bee__n used 
in a number of applications, most notably document 
retrieval (Sparck Jones and Tait, 1984a and 1984b) 
and relational database access (Bcguraev and 
Sparck Jales, 1983). 
The program described here is efficient (rarely 
taking more than a few fractions of second to 
generate a seJ,tence) in c~,trast with approaches 
based On complex pattern matching (like Appelt 
(1983), and Jacohs (1983)). On the other |round, the 
esse~itial simplicity and uniformity of the approach 
adopted here has meant that the generator is no 
,sore difficult to maintain and extend than i~re 
linguistically motivated approaches, for example 
Appelt's. Thus it has demonstrated its usefulness 
as a practical tool for computational linguistic 
research. 
~CKNOWLE\[~S~2~TS 
This work was supported by the British Library 
Research and Development Department and was 
undertaken in the University of C;,nbridge Ccmguter 
Laboratory. I would like to thank Bran Boguraev, 
Ted Briscce and Karen Sparck Jones for the helpful 
comments they made on the first draft of this 
paper. I would also like to th~ my ~onymous 
referees for the very helpful comments they ~aade on 
the an earlier draft of the paper. 
REFER~S 
Appelt, D.E. (1983) TELS3RAM: A Grammar Formalism 
for Language Planning. Proceedings of the 
Eighth International Joint Conference on 
Artificial Intelligence. Karlsruhe. 
Boguraev, B. K. (1979) Autcmatic Resolution of 
Linguistic Ambiguities. Technical Report No. Ii, 
University of Cambridge Computer Laboratory. 
Boguraev, B.K. and K. Sparck Jones (1982) A natural 
language ~,~lyser for database access. In 
Information Technology: Research and 
Development; vol. i. 
Bo~uraev, B.K. and K. Sparck Jones (1983) A natural 
language front end to data bases with 
evaluative feedback. In New Applications of 
Da~aha~as (Ed. Garadin and Gelenbe), Academic 
Press, London. 
Goldman, N. (1975) Conceptual Generation. In 
Conceptual Information Processing, R. C. 
Schank, North Holland, Amsterda~n. 
Jacobs, P. S. (1983) Generation in a Natural 
Language In~erface. Proceedings of the Eighth 
International Joint Conference on Artificial 
Intelligence. Karlsru|~. 
McKecwn, K .R. ( 1980 ), Generati~ Relevant 
Explanations: Natural Language Responses to 
Questions about Database Structure. Proceedings 
of the First Annual National C~,\[erence on 
Artificial Intelligence, Stanford, Ca. 
Meehan, J. ( 198i ) Micro-TALE-SPIN. In Inside 
Computer Understanding, R.C. Schank and C.K. 
Riesbeck, Lawrence Erlbaum A~sociates, 
Hillsdale, New Jersey. 
Schank, R. C. ( 1975 ) Conceptual Infom~at 1on 
Processing, North Holland, Amsterdam. 
Sparck Jones K. and J. I. Tait (1984a), Automatic 
Search Term Variant Generation. Journal of 
Documentation, Vol 40, No. i. 
Sparck Jones, K. and J. I. Tait ( 1984b), 
Linguistically Motivated Descriptive Term 
Selection. Proceedings of COLING ~4, Association 
for Computational Linguistics, Stanford. 
Tait, J.I. and K. Sparck Jones (1983), Aut~natic 
Search Term Variau,t Generation for Document 
Retrieval; British Library R&D Report 5793, 
Cambridge. 
197 
