24 
1965 International Conference 
on Computational Linguistics 
GENEP~TION, P~ODUCTION, AND T~ANSLATION 
Petr Sgall 
Malostransk@ n. 23 
Prague I 
Czechoslovakia 
Sgall 1 
Abstract: The notions of generation, production, synthesis 
etc. are analyzed, with the conclusion, that the description of 
a natural language should have ~ a creative and a trans- 
ductive part. As a system meeting the principal known conditions, 
a sequence of pushdown store transducers, interpreted as a stra- 
t~f~cational language descrlption~is proposed. 
Sgall 2 
I.I In the works of N. ~homsky, P.M.Postal, J.J.Katz and 
others there are formulated three zeneral aims of generative 
grammar: 
a) the grammar generatesf all and only the grammatical senten- 
ces of the doscribed languase, in the senBe of a mathem,~tical 
enumeration of the set of grammatical sentences; 
b) the grammar automatically ascribes to each generated sentence 
a structural d~scription, not contradictory to the intuitive 
evidence of native speakers~ if the same sentence is generated 
by the grammar in ~ different ways, ~ different structural 
descriptions a~e ascribed to it~ 
c) the grammar is a description of the mechanism the user of the 
language has internalized and uses in the process of langua- 
ge co~2.unication (as a speaker, hearer etc.). 
There are, of course, other aims or conditions laid upon 
generative grammars, e.g. those concerning degrees of grmmmati ~ 
calness of sentences. But our main point here concerns the three 
conditions cited; we assume that these a~e actually three diffe- 
rent conditions, in particular that ~ and ~ are not identlca~ 
being implied by ~, so that a grammar can exist, which satisfies 
the conditions A and ~, but not ~. 
I~2 If we examine the new version of transformational 
descript-i~n~from that point of view, we see that there are 
difficulties concerning the ccndition ~. Of course, we are now 
applying the conditions not only to the grammar, but to the 
whole description, including its semantic component. ~e are not 
certain, however, that the mutual relations of the thyee components 
of this description have been characterized quite correctly. There 
is one creative component, (A), whose output is interpreted by 
Sgall 5 
both (B), i.e. semantics, and (C); but then - in my opinion - (A) 
ig the basis (including lexical rules), and the transformational 
and phonological parts 0elong to (C). Whether this is the case or 
not, the formulation of the new shape of the transformational 
description is a further and important step in overcoming the one- 
sidedness of descriptivism. An adequate basis for fulfilling the 
conditions g and ~ may be considered to have been reached in ~his 
way. 
But our question remains: can this system really be considered 
to describe the mechanism used in speech communication? 
J.J.Katz writes (/5/, p.4; similarly in /6/): "The whole 
process may be pictured as follows: the speaker, for reasons that 
are biographically relevant but linguistically not, chooses a 
message he wants to convey to the hearer .... This message is 
translated into syntactic form by the selection of a syntactic 
structure whose semantic interpretation is this message~' Of course, 
the question, why a particular message has been chosen, is not a 
linguistic one. But if the mechanism used by the speaker to trans- 
late the message into syntactic form were to be described as a 
device for translating, not just for selection, this other des- 
cription would be preferable, as to that point. 
1.3 The generative description is neutral to the speaker and 
to the hearer, i.e. to the production and understandlng of senten- 
ces. But in a sense the generation is nearer to the production, 
just as the recognition procedure is nearer to the understanding 
of sentences. That is, the sentence appears as an output string in 
the generative description and in the process of production, whereas 
in the recognition routine and in the process of understanding it 
appears as an input string. 
N.C~omsky introduced the distinction of the competence and the 
performance of language users;~ the methodological importance of 
Sgall 4 
this distinction is undeniable. But it is not evident whether 
the generative grammar is the same as the description of the 
competence of both speaker and hearer; it is not clear whether 
a recognition grammar belongs to the realm of performance (Cfo 
Hays /4Tp.8/) , or to the realm of competence, also (cf~ catego- 
rical and other grammars, formulated as recognition grammars, 
e.g. /I/, but not intended as models of the hearer's activity). 
One sees, of course, that the actual programs for random genera- 
tion of sentences /17,9/ are not models of the speaker's activity, 
either. But that is not due, perhaps, to their being related to 
generative grammars. 
So our questions are: Has a description of the user's 
coz~petence to be neutral with regard to the speaker and the 
hea~er? (Or, better: Are all the differences between the speaker's 
and the he~rer's activity due only to their performance, are there 
no differences which should be respected in a description of their 
competence, of the mechanism they have internalized?) And, further: 
If the generative description is neutral in this respect, is the 
recognition description also neutral? - ~f we assume that the set 
of grammatical ~entences is a recursive one, then why should we 
prefer ~eneration as the form af the description of lanA~age, and 
not recognition? 
2.1. The mechanism internalized by the user of a language 
enables him (I) to choose a message as such, (II) to formulate 
this message, i.e. to translate it from its "semantic" form to 
one of the corresponding phonetic forms, (IIl) to "interpret" , 
or understand a sentence, i.e. to translate it f~om its phonetic 
form to an~ of the corresponding "semantic" forms, (IV) to choose 
the most appropriate form, while translating in either direction. 
The parts (I) and (IV) cannot be fully described by linguistic 
Sgall 5 
means alone. As to parts (II) and (III), we see that they are 
essentially translating procedures. They may include some blocking 
or checking sub parts, but we should try to describe at least their 
main parts as tr~msducers or in some similar way (or, alternatively, 
to show that this is impossible). 
The vast and heterogeneous field presystematically called 
4 semantics should be analyzed from such a point of view; there are 
elements in it which belong to the "transductive" part of the 
mechanism and are purely linguistic;~ the relations between levels 
or strata of the l~n~uage system can be described as "semantic" 
relations (relations of"representation", with the so-called 
"assymetrical2 dualism", i.e. generally manyvj- toz many relations 
/~,I~. On the other hand, there are othe~ quite distinct, "semantic" 
problems, concerning the manner of choice of a message itself, and 
\ 
of these only some are intrinsically linguistic (e.g. questions 
concerning anal~contradiction). 
2.2 Even if the mechanism briefly outlined above could be 
integrally described, it ~ould be a description of the competence, 
not of the perfd~mance of a user of language. It would not describe, 
particularly, the relations connected with restrictions of memory 
and other faculties of individual speakers, nor v~rious possible 
short-cuts used during the process of speech. It would hardly be 
possible to de~cyibe, by linguistic means, the conditions under by 
which a construction is being built upVthe speaker during the act 
of speech, and the conditions under which it is delivered as a whole 
by the memory~ only extreme eases, where there is only the latter~ 
possibility, are distinguished explicitly by the grammar; the former 
constructions are, e.g. "analytical" forms of newly coined or 
accepted words formed "by analogy"; at the other extreme are idioms, 
"irregular" forms and other "exceptions", which have to be listed 
Sgall 6 
by the grammar. In most cases, there are both possibilities, and 
it is not one of the purposes of linguistics to say whether (or, 
under what conditions) the spe~ker chooses the first or the second. 
We can say, then, that the distinction between competence and 
performance is independent from that between speaker and hea~er. 
If the description contains some part(s) used only by the speaker, 
and others used only by the hearer, it does not follow that it is 
no longer a description only of their competence. 
2.3. If I understand correctly the words of J.J.Katz, cited 
in 1.2, then he defines the message as a unit of the semantic level 
(the message is a semantic interpretation of a sentoid). Enumeration 
of all possible messages (of a language) is a necessary part (or 
prerequisite, perhaps) of a description of a language, and it is,of 
course, preferable that the mechanism of this enumeration should 
ascribe structural descriptions to the generated messages (see 
in 1.1). But it is not elear whether the transformational descrip- 
tion really can serve as such an enumeration. If so (e.g. if the 
semantic component does not, as a filter, block an infinite number 
of sentoids), then it would be possible to reverse the semantic 
component in some way (to discover its inverse machine), and use 
this inverted procedure as the description of the mechanism the 
speaker uses while translating the message to its syntactic form. 
If there is no such possibility, then the semantic component in its 
present form is not satisfactory, either. 
On th~ other hand, it is not necessary, in a linguistic descrip- 
tion, to ~enerate the set of messages by a system which could serve 
as a model of the mechanism internalized by the users of a language 
(1.1.~). Bearing in m~nd, that the description of the complicated 
relations between the semantic and phonetic levels, and not the 
selection of the message, belongs to the classical aims of linguis- 
tics, we can model not the .:~echanism used by the speaker and the 
Sgall 7 • 
hearer, but %~used by the translator, who has to discover the~K" 
meaning(s) pertaining to a certain phonetic string and vice versa, 
I 
but not to choose the meaning itself. (But cf.3.3.)~ Of course, 
we have to distinguish here between cases where translation 
proceeds only as an operation on texts, without using other than 
linguistics knowledge and knowledge obtained from the translated 
text itself, and cases, where the translator has to obtain from 
external sources information not contained in the text but 
D ~ 1 " " ~'~ ~e ~eq, ired by the grammar of the output lan~uag (in cases as "king 
John's son" tran~l~ted to German "ein Sohn des K6nigs J." or 
"der Sohn. " of./13/) We are concerned here only with transla- "• 3 • 
tion in the narrow sense;, the mechanism descrioed does not enable 
the translator to select one of the possibilities in the cases 
where such external information is necessary. 
3.1• Viewed± in this way, the generative system should have~,h 
twQ parts: a purely generative one, serving to enumerate possible 
messages, or meanings of messages, and another "transductive" part, 
describing the tr~nslation of messages from one level to another. 
The message can be composed, of course, of many sentences, but, 
as is usual today, we take no account of the difference and 
speak of sentences onl~ in the Sequel. 
This sjstem should not be more elaborate than necessary, but 
there are some known conditions it has to fulfil. As to weak gene- 
rative power , it has to generate not only the sets of strings gene- 
2ated by a context-free phrase structure grammar, but also at least 
set of all strings of the form xa_~x, where ~ is a symbol o~he the 
output vocabulary and ~ is any string of such symbols not containing 
~; cf., in a somewhat different formulation, /12/. As to strong 
generative power, the system has to ascribe not only taxonomical 
structural descriptions, i.e. there must be a possibility of having 
' Sgall 8 
structural descriptions not just for sentences, but for several le- 
vels of sentence structure, such as the deep structure and the 
surface structure. 
3.2. One possible system meeting these conditions is a 
sequence of pushdown store transducers /11;3/, interpreted as a 
stratificational description /8;14/. Its first, generative and recur- 
sire part would be a context-free phrase structure grammar or a sys- 
T~r~ ~\]r tem .... ~.ly equivalent to it, enumerating the "sentence meanings "~ 
in such a way that the output string itself serves as the descrip- 
tion of the sentence structure at this level. This string is 
t~mslated then by an other pushdown store transducer 6r other~a~ 
transducers)to the level of "surface stA'ucture" (a l&nearized 
tree 
version Of the dependency mx~ of the sentence), and furt- 
her to the morphological level. For a ~of such a system, see 
/15; 16/. 
We can formulate the first part of this system so as to gene- 
rate all strings of the form xa_._xx" and only them (where ~" is the 
reflection of ~), and the second part can consist of the following 
rules (symbolized here as done by Evey, with some simplification; 
denotes any symbol of the input alphabet other than ~ ): 
input p inn. --@ inn. p 
state state 
b 1 1 
a 1 2 
b 2 Z b 
3 stop! 
output 
b 
8 
b 
Sgall 9 
The output of this translator would be x ax, whenever xa__.xx war 
its input string. So the first condition of 3.1. is fulfilled. It 
should be noted that we are not u~3ing the word "translate" in the 
technical sense used by R.J.Evey; the input language of our trans- 
lator contains the set of the strings xax" as its proper sub~.~et. 
There is still a possibility that, at least for some languages, a 
description would be adequate where the input language of each 
translator is the same as the output language of its predecessor 
in the sequence of machines. Th@ weak generative power of such a 
system would then equal that of a context-free phrase structure 
l~mguage (cf./13/, theorem 2:6.6). 
As to the socond condition, concerning stror~ generative power, 
our system, as a stratificational one, shares the main property of 
transformational description and cannot be held to be taxonomic. 
There ~re three levels or strata (not to speak of phonology amd 
phonetics; but the question of the nmmber of strata is an empirical 
one and it is possible that typologically different languages do 
not coincide in this respect; we are working v~ith inflected 
languages, such as Czech or ~ussian): the tectogrammatical or 
"semantic" level (Lo) , the phenogrammatical level (surface structure, 
L1) , and the morpholo ical level (L2). The sequence of representations 
of a sentence on each of these levels, s member of L~ LIXL 2 
~ulfilling the condition that each of its elements is derivable 
by the transducer system (from its predecessor in the sequence, if 
Shore is any), is a structural description of that sentence. A 
complete derivational history is not needed here, because the symbols 
useful for the structural description appear in the terminal strings 
of the transducers, so that these strings (or some of them, as a 
sequence of two transducers is generally needed to translate the 
sentence f~m one level to a~other)themselves serve as the 
Sgall I0 
representations of sentences at the corresponding level. 
So, not a unique description of the ss~actic structure is 
ascribed to a sentence (with no homonymy) but, rather, two dis- 
tinct descriptions (elements of L 0 and L1, respectively), whose 
mutual relation can be compared with that of Chomsky's deep struc- 
ture and surface structure. In cases such as questions, grammati- 
cally conditioned change of word order apparently disturbing the 
constituent structure of the ~entence, nominalizations etc., the 
rules of our system are to a great extent parallel to those of the 
tr~nsformational description. But cases such as the passive or 
lexical synonyL~ and honlonymy are handled in our system in such a 
% 
way that the representation of the sentence at the tecto~rammatical 
level is unique for the synonymous expressions (and distinct, of 
course, for the homonymous ones). 
The third level of our system, the morphological, includes 
categories of "rectlon" and concord, not included in the "syntac- 
tic" levels. 
3.3. The recursive component of our system, w~th the initial 
symbol ~ and the gradual derivation of the terminal string first 
by rules of expansion and then by rules of selection, does not 
_ _ ~, stated in 2.3 , it is not necessa- Wulfil condition c o,? 1.I. But, ~ • 
ry that a linguistic description should contain a description of the 
mecl~nism used by the speaker in that part of his activity. Rossibly 
this gap can be filled in the future by reformulation of the gene-- 
ration routine. 
There are other difficulties in this component, concerning the 
condition ~. The main point here is the formulation of context 
restrictions relevant to the selection of terminal symbols; we 
think that it is possible to define various partitions on the 
vocabulary of this component, so that we can work with rule schemata 
Sgall 11 
not only as with a conventional simplification of notatien~ but 
also as with an intrinsic part of the system;~ the weak equivalence 
with a context-free phrase structure grammar should not be lost by 
this. 
Another remark concerns the confronYation of the tectogra- 
mmatical levels of two language% necessarily made by a translator. 
An ideal case of translation would be the full coizcidence of both 
these levels. Generallyj the translator, if he is to translate 
correG~ly, is supposed to use (as a part of the mechanism interna- 
lized) some routine converting strings of the tecto&Tammatical 
level of the input language to those of the output language. This 
part of the mechanism is not as yet described by our system, as we 
consider one language only, i.e. a translator translating from 
his mother tongue to the some l~nguage. But elaboration of this 
part of the system is needed, and is directly relevant to Machine 
Translation as well as to general linguistic theory. 
3.4. The transducers have to fulfil certain special conditions, 
so that the existence not only of an inverse machine ~bir each of 
them but also of a recognition routine for the whole system is 
ensured as well. These conditions, which we shall not present here 
in full, are parallel to the absence of deletion rules in a phrase 
structure grammar under similar circumstances. We can say, infor- 
mally, that no symbol re~d in the input is deleted by the rules 
of a transducer, except in cases where that symbol can be determined 
uniquely by the resulting output. Free deletions are confined to 
transduction from the morphonological to the phonological level, 
or from the phonological to the phonetic level. At those stages, 
the existence of a recognition routine is given by the fact that 
the corresponding transducers are finite state#. 
The system can serve then, in its transductive part, as a 
Sgall 12 
description of a part of the translator's competence, as well as 
a basis for translation al~orithmus. As to the actual performance, 
as well as the actual algorithms, here various digressions from 
the process defined by the transducer system can be found, as well 
as restrictions due to the finite memory etc. A hearer performing 
(mostly in an unconscious way, probably) the process of understan- 
ding a sentence, does not have to follow the rules of the abpve 
mentioned inverse machines punctually. He can use various short- 
cuts or trial-and-error methods, checking the results (cf.Matthews, 
/I0/). But t~at may also be true for the speaker, as far as we know. 
3.5. With regard to terminology, it is possible to distinguish 
several pairs of notions concerning routines of both kinds: 
the sentence (its phonetic 
representation) is 
in the output 
enumeration 
~eneration 
in the input 
decision 
recognition 
synthesis analy sis 
production understanding 
note 
definition of a set 
...combined with automatic 
a~3cribing of structural 
description 
...interpretable as the 
description of the mecha- 
nism used by the translator 
...process of performance 
4. So we arrive at the conclusion that the connection bwtween 
the theory of grammar and the purposes of Machine Translation is an 
intrinsic one. It is of importance to practical research (where 
research on MT, using the findings of the theory of grammar and 
algebraic linguistics, makes possible through study of its empirical 
Sgall IS 
questions and testing of its working hypotheses). Moreover, the 
relation~ship between the theory and the purposes of tr~Lnslation 
has a bearing on the understanding of the theoretical co~epts them- 
selves. And the importance of this a~pect Will increase, probably, 
as theory proceeds from the study of individual l~nguages to the 
~escription of language universals and other relations between 
natural lsngu~%es, and to the description of language in generalo 
Sgall 14 
r% 

References 

/I/ Y.Bar-Hillel, Language and Information, Je~"usalem- Re~ding- 
London, 1964 

/2/ N.Chomsky, Catego~'ies and Y<elations in Syntactic Theory (mimeo~jr. 
for the Conference on Structural Linguistics, ~lagdeburg, 
1964) 

/3/ R.J.Evey, The Theory and Applications of Pushdown Store ~%achi- 
nes, Mathematical Linguistics and Automatic Translation~ 
Report No. IO-NSF of M.I.T., Laboratory of Electronics, 
Cambridge, Mass., 1963 

/4/ D.G.Hays, Dependency Theory: A Formalism and Some Observations, 
I/emorandum ~,~-4087/P2, The i~and Corp., Sta. l~onica, 1964 

/5/ J.J.Katz, The Semantic Component of a Linguistic Description 
(mimeogT. for the Conference on Structtu'al Linguistics, 
Magdeburg, 1964) 

/6/ J.J.Katz, In ~re Defen@e of i~lenta\]ism, Language~ 1964 

/7/ J.J.Katz - P.~i.Postal, An Inte~grated Theory of Linguistic 
'~ I.T. Descriptions, ~eSearch ~onograph No. 26, ~,,. 
1964 Cambridge, Lass., 

/8/ S.~,~.Lamb, The Ser:emic Approach to Semantics, Ame "~. Anthropologist 
66, No. 3, P.2, 1964, 57-78 

/9/ i~.B.Lees, Autoratic Gene1"ation of Natural-Langua&e Sentences, 
re~d at the ~olloquium "Foundations of I~\[athematics..." 
Tihany, 1962 (in press) 

/I0/ G.H.2~latthews, Anal~sis by Synthesis of Sentences in the Light 
of Recent Developments in the Theory of Gran~ar (re d at 
the ~olloquium on Algebraic Lingui~tics amd }~\[T, Prague 
1964 (in p1"ess in Kybe~-netika) 
Sgall ~ .Q 

/ll/ A.G.Oet~nger, Automatic Syntactic Analysis and the Pushdown 
St~m~ Structure of Language and Its Mathematical 
Aspects, Proc. of the ~ymp. in Appl. Math. 12, 1961, 104- 
-129 

/12/ P.M.Postal, Constituent Structure, Indiana Univ., Bloomington 
(=IJAL 30/1, Fart III), 1964 V 

/13/ I.I.~evzin - V.J.Xozencvejg, Osnovy ob~e~;~o i ma~innogo perevo- 
da, ~ioscow 1964 

/14/ P.S~alI, Zur Frage der Ebenen im Sprachsystem, Trsvaux linguls- 
tiques de Prague I, 1964, 95-106 

/15/ P.Sgall, ~eneratlve ~eschreibung und die Ebenen des Sprach- 
s~,stems (re t~d at the Conf. on Struc~ Linz., Magdeburg, 
1964) 

/16/ roSg~ll, Ein mehrstufiges generatives System (read at the 
~olloquium on Al~ebr. Ling. and MT, Prague, 1964) 

/17/ V.E.Yngve, Random Gene2ation of English Sentences, in Proc. 
1961 Intern. ~on~r. on ~achine Translation of Languages 
and Applied l~Ig. Analysis, Teddington (1962) 
