COLING 82, J. Horecký (ed.)
North-Holland Publishing Company 
© Academia, 1982
RANDOM GENERATION OF CZECH SENTENCES 
Jarmila Panevová
Department of Applied Mathematics 
Faculty of Mathematics and Physics 
Charles University 
Prague 
Czechoslovakia 
The experiments testing the theoretical adequacy 
and the practical usefulness of the Functional 
Generative Description (FGD) are described. The 
FGD consists of a generative component, which, 
in the experimental version, has the shape of a 
context-free grammar combined with elements of 
dependency approach, and the other components 
having the form of pushdown store automata. The
latter components have a transductive role, transducing
the semantic (tectogrammatical) representations
of sentences to the lower levels of the
language system. The transduction is articulated
into several steps corresponding more or less to
the levels of the language system (surface syntax,
morphemics, morphophonemics, phonemics, or, as
the case may be, graphemics) postulated in European
structural linguistics. The theoretical and
practical qualities of the system are evaluated.
1. The model of generative description called Functional Generative
Description (FGD) was proposed early in the sixties (Sgall,
1964), and it has been developed since then by the group of algebraic
linguistics, Charles University, Prague. This description is being
enriched and completed from the empirical point of view and from the
point of view of its theoretical adequacy. To allow for a systematic
elaboration of both of these aspects, FGD is tested on computers in
the form of random generation of semantic representations of Czech
sentences and of transducing them to their outer shape. The computer
testing also fulfils another aim: FGD, besides being an appropriate
framework for empirical study and theoretical description of language,
can also be applied as a background for practical projects, such
as synthesis for machine translation into Czech, synthesis of answers
in question-answering systems, etc.
2. The main features distinguishing FGD from most other
linguistic frameworks are: (i) in FGD there is no place where
transformation rules are needed; (ii) the generative power is concentrated
in its first component, generating underlying representations on
the level of linguistic meaning representing a specific patterning
of extralinguistic, ontological content, the set of generated
strings surpassing only moderately the set of context-free languages
(Plátek and Sgall, 1978); (iii) FGD is based on a dependency approach
to syntax; (iv) a stratificational approach is used here, articulating
the generation of sentences into several steps corresponding to
particular language levels ordered from meaning to the outer shape
(level of tectogrammatics, of surface syntax, of morphemics, morphophonemics
and, at the end, graphemics in our case, or the phonemic level
in a theoretical description).
In describing the theoretical features of FGD, it must be stressed
that the description itself is being developed much more quickly
than the system of programmes can reflect. The first component of FGD,
generating tectogrammatical representations (TR's), was reformulated
from a formalism corresponding to a context-free phrase
structure grammar with elements of dependency features (Sgall, 1967)
into a pure dependency formalism using pushdown store automata and
also including a description of the topic/focus articulation of the
sentence (see Hajičová and Sgall, 1980).
3. The component generating TR's was implemented in the older
form, as a context-free grammar. The first experiments with random
generation are restricted to a relatively small lexicon of
about 300 "deep" lexical units; the number of units increases on the
surface level, where function words are also present, as well as the
units gained by means of "syntactic" derivation in Kuryłowicz's (1936)
sense (suffixation and prefixation serving for nominalizations, etc.).
The enriched output lexicon consists of more than 1500 units. As for
the grammatical phenomena concerning the different linguistic levels,
we tried to make the system relatively complete even in the first
stage of the experiments; the coordinated constructions, the pronouns
of the 1st and 2nd person and some of the possible word order variants
were omitted in this stage.
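
The idea of random generation from a context-free grammar can be pictured with a minimal sketch. The toy grammar, the symbol names and the handful of lexical units below are assumptions made only for illustration; they are not the rules or the lexicon of the implemented system.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of alternative right-hand sides.
TOY_GRAMMAR = {
    "S": [["ACTOR", "PRED"], ["ACTOR", "PRED", "PATIENT"]],
    "ACTOR": [["žena"], ["pán"]],
    "PRED": [["vyrobit"]],
    "PATIENT": [["nůžky"], ["listí"]],
}


def generate(symbol, grammar, rng):
    """Expand `symbol` by choosing one alternative uniformly at random."""
    if symbol not in grammar:
        # A symbol without rules is treated as a terminal (a lexical unit).
        return [symbol]
    right_hand_side = rng.choice(grammar[symbol])
    result = []
    for child in right_hand_side:
        result.extend(generate(child, grammar, rng))
    return result


rng = random.Random(0)  # fixed seed so that a run can be repeated
sample = generate("S", TOY_GRAMMAR, rng)
print(sample)
```

Every call yields a sequence of "deep" lexical units licensed by the grammar; nothing in the sketch checks semantic compatibility, which mirrors the situation discussed in section 6.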
4. We concentrate our attention, first of all, on the transductive
components. All the linguistically relevant semantic information
is included in the TR's, which are disambiguated representations,
identical for all synonymous surface variants. This
means that the transductive components describe the asymmetric dualism
between a function and its forms (in the sense of Karcevskij,
1929, and the Prague School of Linguistics, cf. Vachek, 1964). The
relation between form and function may be illustrated by the following
examples, concerning the relations between a TR and the corresponding
surface syntactic representations and between these and the morphemic
ones: the participant Actor may be expressed by a surface
subject, by an adverbial of actor (in passive constructions), or by
a possessive adjective or a noun in genitive or instrumental (with
nominalizations); the functor (Fillmore's 'case') Instrument may be
expressed by the morphemic case of instrumental, or by the prepositional
constructions na + locative, pomocí + genitive.
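
The one-to-many relation between a function and its forms can be written down as a small table. The sketch below only restates the examples given above; the dictionary layout and the name functions_of_form are illustrative assumptions, not the tables of the implemented system.

```python
# Hypothetical table: one function -> the forms that may express it.
FORMS_OF_FUNCTION = {
    "Actor": [
        "surface subject",
        "adverbial of actor (passive)",
        "possessive adjective (nominalization)",
        "noun in genitive (nominalization)",
        "noun in instrumental (nominalization)",
    ],
    "Instrument": [
        "instrumental case",
        "na + locative",
        "pomocí + genitive",
    ],
}


def functions_of_form(form, table):
    """Invert the table: which functions can the given form express?"""
    return sorted(name for name, forms in table.items() if form in forms)
```

Looking a form up with functions_of_form goes in the opposite direction, from the outer shape back to the function.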
The mathematical apparatus used for the transductive components
of FGD is a sequence of pushdown store automata, transducing the TR
into the surface representations (dependency trees) and the latter
into morphemic ones (strings); then follows a finite automaton transducing
the representation into the graphemic output form (there are
some differences in this respect between the theoretical description
of language and the procedure serving for applied projects). On both
levels of the structure of the sentence the dependency tree representing
the structural order has as its root the predicate of the main
clause, which is the only node not dependent on any other node.
Every node is labelled by a representation of a single (autosemantic)
word form, having the shape of a complex symbol containing its syntactic,
morphological and lexical parts, corresponding to the character
of the particular level. The dependency tree preserves the condition
of projectivity.
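
The condition of projectivity can be checked mechanically. The following is a minimal sketch, assuming that every node carries its position in the linear order; the class Node and its fields are hypothetical, and the label is a plain word standing in for the complex symbol.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str      # stands in for the complex symbol of the node
    position: int   # place of the word form in the linear order
    children: list = field(default_factory=list)


def subtree_positions(node):
    """Collect the positions of a node and of all nodes depending on it."""
    positions = [node.position]
    for child in node.children:
        positions.extend(subtree_positions(child))
    return positions


def is_projective(node):
    """A tree is projective if every subtree covers an unbroken span."""
    positions = sorted(subtree_positions(node))
    contiguous = positions == list(range(positions[0], positions[-1] + 1))
    return contiguous and all(is_projective(child) for child in node.children)


# "Sledovat to není snadné": the predicate of the main clause is the root.
tree = Node("není", 2, [
    Node("sledovat", 0, [Node("to", 1)]),
    Node("snadné", 3),
])
```

For this tree is_projective returns True; a tree in which a dependent is separated from its governor by a word belonging to another subtree would be rejected.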
Each transduction of the representation of the structure of the
sentence to the adjacent level needs a pair of automata. The conditions
constraining the transduction to the next level can be characterized
as follows:
(a) In a given step only a single dependency syntagm (the governing
word and its modification) is processed; one of the two steps adjusts
the morphological features (called grammatemes, cf. Panevová,
1979; 1980) in accordance with certain properties of the other member
of the syntagm; mostly the modifier changes according to the character
of its governing word: e.g. the actor of a passive verb is converted
into an adverbial, that of a nominalized verb (a surface noun) into
an attribute: hosté přišli - příchod hostů (the guests arrived - the
arrival of the guests).
(b) A single pass through the sentence (in the text-to-rule order)
is sufficient for every transducer.
(c) The process of transduction is based on the governing unit being
handled by every pushdown automaton earlier than its modifications
(dependent words). We work with the new characteristics of the governing
word when its modifications are being processed.
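
Conditions (a) and (c) can be pictured with a small sketch in which the governing word is settled first and its kind is then handed down to its modifications. The tuple layout, the rule table and the names used are assumptions for illustration only; the real tables are those described in Panevová (1979).

```python
# Hypothetical rule table: kind of the governing word -> surface role of its Actor.
ACTOR_ROLE_BY_GOVERNOR = {
    "active verb": "subject",
    "passive verb": "adverbial",
    "nominalized verb": "attribute",
}


def transduce(node, governor_kind=None):
    """Handle the governing word first, then each modification in its light.

    A node is a tuple (word, functor, kind, modifiers).
    """
    word, functor, kind, modifiers = node
    role = functor
    if functor == "Actor" and governor_kind in ACTOR_ROLE_BY_GOVERNOR:
        role = ACTOR_ROLE_BY_GOVERNOR[governor_kind]
    result = [(word, role)]
    for modifier in modifiers:
        # Only one syntagm (governor + one modifier) is looked at per step.
        result.extend(transduce(modifier, kind))
    return result


# hosté přišli / příchod hostů: the same Actor under two kinds of governor.
clause = ("přijít", "Predicate", "active verb", [("host", "Actor", "noun", [])])
nominal = ("přijít", "Predicate", "nominalized verb", [("host", "Actor", "noun", [])])
```

With these inputs the Actor comes out as a subject under the active verb and as an attribute under the nominalized verb, which is the contrast shown by the example in (a).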
The main programme (the defining function) of every automaton
is based on the fact that the root of every dependency tree is processed
first. (It should be noted that the linearized dependency
tree is converted into the sequence "regens post rectum"; nevertheless,
not only this structural order, but also the linear order, more
or less directly corresponding to the surface (morphemic) word order,
are preserved in it.) Then the first member from the right depending
on the root is read by the automaton I and compared with the root,
i.e. modified according to its properties. If the last word form read
has no modifiers, it may be printed on the output of the given automaton;
if there is some modifier present, the governing word is
placed into the pushdown store and this pair of word forms connected
by the dependency relation is then compared and evaluated. This means
that the matrices or tables described in detail in Panevová (1979)
are involved, where the changes obligatory or optional for the given
pair of word forms (syntagm) are determined. These tables form an
inner part of the automaton, which cannot be separated from the work
of the whole procedure; the empirical data determining the choice of
the means (forms) for functions (meanings) are involved here. The
word form processed at a given point in time can be printed on the
output only at a point when all the subtrees dependent on it have
already been printed. On the output of every pushdown transducer but
the last (i.e. with the exception of the morphemic representation)
we again receive an order of word forms adapted to the further processing,
with the root of the tree as the first node; the modifying (dependent)
nodes are read from the right side.
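
The discipline of the pushdown store described above can be sketched as follows. This is a simplification under stated assumptions: the tree is given as nested tuples, the comparison with the tables is left out, and the function name emit_order is hypothetical. It shows only that a word form leaves the store after all subtrees depending on it have been written out, with dependents read from the right.

```python
def emit_order(tree):
    """Return the labels in the order in which they may be printed.

    `tree` is a tuple (label, [dependents in left-to-right order]).
    """
    output = []
    # Each entry of the store: (label, dependents not yet read).
    store = [(tree[0], list(tree[1]))]
    while store:
        label, pending = store[-1]
        if pending:
            # Read the next dependent from the right and place it on the store.
            child = pending.pop()
            store.append((child[0], list(child[1])))
        else:
            # Everything below this word form is out, so it may be printed.
            store.pop()
            output.append(label)
    return output


tree = ("není", [("sledovat", [("to", [])]), ("snadné", [])])
order = emit_order(tree)
```

In the resulting order every governing word follows its dependents, i.e. the sequence is of the "regens post rectum" kind.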
5. We want to present here a short survey of the linguistic
problems solved by the individual automata. The first pair of automata
chooses between active and passive constructions, nominalization,
infinitive construction or subordinate clause. Some of the rules are
optional (cf. the synonymy between e.g. Snažil se, aby přišel včas -
subordinate clause, Snažil se přijít včas - infinitive construction,
Snažil se o včasný příchod - nominalized form of He attempted to come
in time); in the cases where a choice has to be made between
several equivalent constructions, a probability has been added for the
particular possibilities (determined in a quite preliminary way, the
results of which are being checked in the course of the experiments).
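
A probabilistic choice between synonymous constructions can be sketched as a weighted random draw. The numerical weights below are invented for the illustration, since no values are given in the text, and the function name is hypothetical.

```python
import random

# Hypothetical weights; the paper states no numerical probabilities.
CONSTRUCTIONS = {
    "subordinate clause": 0.5,       # Snažil se, aby přišel včas
    "infinitive construction": 0.3,  # Snažil se přijít včas
    "nominalization": 0.2,           # Snažil se o včasný příchod
}


def choose_construction(weights, rng):
    """Draw one of the equivalent constructions according to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(1982)  # fixed seed so that a run can be repeated
draws = [choose_construction(CONSTRUCTIONS, rng) for _ in range(1000)]
```

Counting the draws gives a quick check of whether preliminary weights produce the intended mixture of constructions.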
The rules for choosing simple or prepositional cases, subordinating
conjunctions, etc. are a matter of the transduction from surface
syntax to morphemics. The morphemic units of number,
verbal aspect, tense etc. are also assigned there; e.g. with the word forms
characterized (by an index) as a "plurale tantum", any grammateme of
number is obligatorily converted to plural as a morphemic unit: nůžky
(scissors), kalhoty (trousers). The rules of grammatical congruence
are also applied here (congruence between adjectival adjuncts and
their head noun, between subject and predicate, etc.).
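
The two kinds of rules just mentioned can be sketched in a few lines. The dictionary layout, the index name PLURALE_TANTUM and the set of agreement categories are assumptions for illustration, not the notation of the implemented system.

```python
def morphemic_number(word):
    """Return the morphemic number; a plurale tantum is always plural."""
    if "PLURALE_TANTUM" in word["indices"]:
        return "PL"
    return word["grammatemes"]["number"]


def agree(adjective, head):
    """Copy gender, number and case from the head noun to its adjunct."""
    result = dict(adjective)
    for category in ("gender", "number", "case"):
        result[category] = head[category]
    return result


scissors = {
    "lemma": "nůžky",
    "indices": {"PLURALE_TANTUM"},
    "grammatemes": {"number": "SG"},  # the grammateme of number in the TR
}
```

Whatever grammateme of number the TR carries, morphemic_number(scissors) gives plural, while a word without the index keeps its own number.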
For the formulation of such rules, of course, detailed empirical
studies are needed of the contextual conditions influencing the choice
of the particular expression of such underlying units as Actor,
Instrument, etc. (Instrumental case - psát perem, prepositional
phrase na + Loc - psát na psacím stroji, pomocí + Gen - přeložit
pomocí slovníku, etc., i.e. write with a pen, write on a typewriter,
translate by means of a dictionary, respectively, all correspond
to Instrument).
The next step consists in the procedure of morphemic synthesis
(see Weisheitelová, 1979). This procedure is adapted to the purposes
of practical projects, so that a direct transition to graphemics is
attempted. Here the structural order is no longer needed and the
representation of word forms provided with information on the morphemes
included can be submitted to the procedure combining the lexical
stem (and, as the case may be, its alternations) with endings to
create a correct sequence of Czech word forms corresponding to the
meaning represented in the TR with which the whole procedure of
transduction started.
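
The combination of a lexical stem, its alternations and an ending can be illustrated by a minimal sketch. The two lexicon entries and the fragment of an ending table are hypothetical and cover only the forms named in the comments; the actual procedure is the one described by Weisheitelová (1979).

```python
# Hypothetical lexicon: basic stem plus alternated stems for particular forms.
LEXICON = {
    "host": {"stem": "host", "alternations": {}},
    "ruka": {"stem": "ruk", "alternations": {("SG", "LOC"): "ruc"}},
}

# Hypothetical fragment of an ending table: (lemma, number, case) -> ending.
ENDINGS = {
    ("host", "PL", "NOM"): "é",   # hosté
    ("host", "PL", "GEN"): "ů",   # hostů
    ("ruka", "SG", "NOM"): "a",   # ruka
    ("ruka", "SG", "LOC"): "e",   # ruce (stem alternation k -> c)
}


def synthesize(lemma, number, case):
    """Join the (possibly alternated) stem with the ending of the form."""
    entry = LEXICON[lemma]
    stem = entry["alternations"].get((number, case), entry["stem"])
    return stem + ENDINGS[(lemma, number, case)]
```

The structural order plays no role at this point: each word form is synthesized from its own morphemic information alone.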
6. Some tectogrammatical representations of Czech sentences
that have already been obtained as a result of the functioning of the
procedure of random generation on the EC 1040 computer can serve as
illustrations. Most of them are correct from the grammatical point of view,
though their meaningfulness can often be doubted; however, the constraints
on the semantic compatibility of lexical units are - in our
opinion - a matter reaching beyond linguistic competence as such.
The questions of the boundary between these semantic selection restrictions
and the grammatical conditions on strict subcategorization
are far from clear; this can be illustrated by the following examples,
the underlying trees of which were derived by our
system during the first experiments:
Něco chtělo mít Kladno. - Something wanted to have (the town of)
Kladno.
Právě něco těšilo každého muže. - Exactly something pleased every
man.
Máme rozvit svátek. - We have expanded a holiday.
Co byla paměť spodem? - What was the memory from the bottom?
Each of these sentences seems to be connected with specific empirical
problems concerning the mentioned boundary, and thus also the boundary
between the system of language and the domain of cognition. Needless
to say, clearly acceptable sentences were also derived by our
system, such as:
Každá žena vyráběla nůžky. - Every woman manufactured scissors.
Panovník měl být co nejkratší. - The sovereign should have been
as short as possible.
Musíme seznámit s kvalitou paměť. - We must make memory
acquainted with quality.
Pán vyrobil listí. - The gentleman manufactured leaves.
The system of programmes for the transductive components is extremely
complex and due to this fact the linking of its partial programmes
(procedures and subprocedures) for individual automata is a
difficult task from the point of view of computer storage and of the
human work concerning the debugging of the programmes. A recent
sample of computer outputs will be demonstrated at the conference.
APPENDIX 
The TR of the sentence "Sledovat to není snadné" (To keep track of
it is not easy); dependent nodes are indented under their governor,
and [...] marks a part of a label that is illegible in the source:

CO-BY2T:8(NEG);2(PROCES),3(NONFR),4(SIMULT),5(NONACTUAL),6(IND)
    [...]3(NONFR),4(SIMULT),5(NONACT),6(IND),8(AG)
        PRONTN-TEN:;8(PAT),12(SG),13(NEUTR)
    ADJ-SNADNE2:;1(POS),8(PRED)
The syntactic surface representation of the same sentence (dependent
nodes are indented under their governor; [...] marks a part of a label
that is illegible in the source):

CO-BY2T:8(NEG);2(PROCES),3(NONFR),4(SIMULT),5(NONACTUAL),6(IND),9(ACTIVE)
    V-[...]:20(ACC);2(NONPERF),3(NONFR),4(SIMULT),5(NONACTUAL),6(IND),8(SUBJ),9(ACTIVE)
        PRONTN-TEN:;8(OBJ),12(SG),13(NEUTR)
    ADJ-SNADNE2:;1(POS),8(PREDN),12(SG),13(NEUTR)
The morphemic representation:

V-SLEDUJ:;2(NONPERF),3(NONFR),6(IND),9(ACTIVE),10(INF),
PRONTN-TEN:;8(ACC),12(SG),13(NEUTR),
CO-BY2T:;2(NONPERF),3(NONFR),4(PRES),5(NEG),6(IND),9(ACTIVE),11(3PERS),
ADJ-SNADNE2:;1(POS),8(NOM),12(SG),13(NEUTR)
In the Appendix we present a slightly simplified representation of a
Czech sentence on the different levels. The sequence of data in the
complex symbols functioning as labels of single nodes is as follows:
part of speech, lexical item, indices (between the signs ":" and
";"), grammatemes (after the ";" sign). The correspondence between
a function and its expression (form) on the adjacent level may be
interpreted on the basis of our example. The rules from which the
correspondences between a function and its form(s) may be obtained,
on the basis of the main programme of a pushdown transducer or of
its subprocedures having the form of tables, were characterized in
Panevová (1979; 1980).
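
The layout of a complex symbol described above lends itself to a small parser. The sketch assumes that each index or grammateme is written as a number followed by a value in parentheses, e.g. 8(NOM); this spelling and the function name parse_complex_symbol are assumptions made for the illustration.

```python
import re

# One index or grammateme: a number followed by a value in parentheses.
ITEM = re.compile(r"(\d+)\(([^)]*)\)")


def parse_complex_symbol(symbol):
    """Split 'POS-LEXEME:indices;grammatemes' into its four parts."""
    head, _, rest = symbol.partition(":")
    part_of_speech, _, lexical_item = head.partition("-")
    indices, _, grammatemes = rest.partition(";")
    return {
        "part_of_speech": part_of_speech,
        "lexical_item": lexical_item,
        "indices": {int(n): v for n, v in ITEM.findall(indices)},
        "grammatemes": {int(n): v for n, v in ITEM.findall(grammatemes)},
    }


parsed = parse_complex_symbol("ADJ-SNADNE2:;1(POS),8(NOM),12(SG),13(NEUTR)")
```

Comparing the parsed labels of the same word on two adjacent levels, e.g. the value under number 8, shows directly which form has been chosen for a given function.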

REFERENCES

Hajičová, E. and Sgall, P., A dependency-based specification of
topic and focus, SMIL - Journal of Linguistic Calculus 1-2 (1980)
93-109.

Karcevskij, S., Du dualisme asymétrique du signe linguistique,
in: Travaux du Cercle linguistique de Prague 1 (1929) 88ff.

Kuryłowicz, J., Dérivation lexicale et dérivation syntaxique,
Bulletin de la Soc. ling. de Paris 37 (1936) 79-92.

Panevová, J., From tectogrammatics to morphemics, in: Explizite
Beschreibung der Sprache und automatische Textbearbeitung 4 (1979)
3-166.

Panevová, J., Formy a funkce ve stavbě české věty [Forms and
functions in the structure of Czech sentence] (Academia, Praha,
1980).

Plátek, M. and Sgall, P., A scale of context-sensitive languages:
Applications to natural language, Information and Control 38
(1978) 1-20.

Sgall, P., Zur Frage der Ebenen im Sprachsystem, in: Travaux
linguistiques de Prague 1 (1964) 95-106.

Sgall, P., Generativní popis jazyka a česká deklinace (Academia,
Praha, 1967).

Vachek, J., A Prague School Reader in Linguistics (Indiana University
Press, Bloomington, 1964).

Weisheitelová, J., Transducing components of functional generative
description 2, in: Explizite Beschreibung der Sprache und
automatische Textbearbeitung 5 (1979) 3-67.
