An intuitive representation of context-free languages 
By László KALMÁR in Szeged, Hungary
1. In this paper, the following conception of language is used. A
language is an ordered triple L = ⟨V, C, f⟩ where V and C are two
disjoint, non-empty, finite sets and f is a mapping of C into the set
of all subsets of the free semigroup F(V) generated by V. The set V is
called the vocabulary (in Chomsky [1], terminal vocabulary); its
elements are called words, and those of F(V) word strings. The
elements of C (which correspond to the auxiliary vocabulary in Chomsky
[1]) are called (grammatical) categories. For any category c ∈ C,
the elements of f(c) ⊆ F(V) are called the word strings belonging to
the category c.
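The definition can be mirrored directly in code. The sketch below is ours (the class and method names are hypothetical): because f(c) is usually infinite, it stores each f(c) as a membership test rather than as a set, and it treats F(V) as a free semigroup, i.e. as non-empty tuples of words.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

WordString = Tuple[str, ...]   # an element of F(V): a non-empty tuple of words

@dataclass(frozen=True)
class Language:
    vocabulary: FrozenSet[str]                       # V
    categories: FrozenSet[str]                       # C
    f: Dict[str, Callable[[WordString], bool]]       # c -> membership test for f(c)

    def __post_init__(self) -> None:
        if not self.vocabulary or not self.categories:
            raise ValueError("V and C must be non-empty")
        if self.vocabulary & self.categories:
            raise ValueError("V and C must be disjoint")
        if set(self.f) != set(self.categories):
            raise ValueError("f must be defined exactly on C")

    def belongs(self, string: WordString, c: str) -> bool:
        """True iff `string` is a word string belonging to category c."""
        in_fv = len(string) > 0 and all(w in self.vocabulary for w in string)
        return in_fv and self.f[c](string)

# Hypothetical example of the usual case C = {s}: S = even-length strings of a's.
even_a = Language(frozenset({"a"}), frozenset({"s"}),
                  {"s": lambda w: len(w) % 2 == 0})
print(even_a.belongs(("a", "a"), "s"), even_a.belongs(("a",), "s"))  # True False
```

With a single category s this is exactly the usual notion of a language as a set of sentences; further categories simply add further membership tests.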
The usual conception of language, viz. a subset S of F(V), is the
particular case in which C = {s} contains a single element s (the
category of sentences), the sentences, i.e. the word strings belonging
to the category s, being the elements of S. However, for natural and
formal languages alike, the more general conception above seems more
appropriate: in the case of a natural language we are interested not
only in what the sentences are but also in what the noun phrases, verb
phrases, etc. are, and, for a programming language, say, not only in
what the programs are but also in what the declarations, statements,
expressions, etc. are. Another advantage of our more general
conception is that, for a generative grammar of L, we can use the set C
of categories (or, of course, any superset of C) as auxiliary
vocabulary.
Accordingly, we define a context-free grammar as an ordered triple
G = ⟨V, C, R⟩ where V and C are two disjoint, non-empty, finite sets
and R is a subset of the Cartesian product of C with the free
semigroup F(V ∪ C) generated by the union of V and C. V and C are
called the vocabulary and the set of categories (or terminal and
auxiliary vocabulary), respectively; the elements r of R, which are of
the form ⟨c, α⟩ with c ∈ C and α ∈ F(V ∪ C), are called (production)
rules.
A rule ⟨c, α⟩ will be written in the sequel as "c:α", as in the
presentation of ALGOL 68 [2], rather than "c → α" as in Chomsky or
"c ::= α" (as in the presentation of ALGOL 60 [3]). A "mixed string",
i.e. an element α of F(V ∪ C), is called a direct production of a
category c if c:α is a rule; productions of a category c are defined
recursively as (i) its direct productions and (ii) mixed strings
β = α1α2α3 formed of productions β' = α1c'α3 of c by replacing a
category c' by a direct production α2 of c'. (We denote the semigroup
operation of F(V ∪ C) by juxtaposition and do not distinguish in
notation a string formed of a single element (of V ∪ C) from that
element.) Terminal productions of c are those of its productions which
are elements of F(V) (i.e. formed of words only); and the language L
generated by a context-free grammar G = ⟨V, C, R⟩ is defined as
L = ⟨V, C, f⟩ where, for any category c ∈ C, f(c) is the set of all
terminal productions of c. (Also, any language L' = ⟨V, C', f'⟩ with
C' ⊆ C, where f' is the mapping f above restricted to C', could be
regarded as a language generated by G as well.) A language L is a
context-free language if it is generated by some context-free grammar.
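The recursive definition of productions suggests a direct, if brute-force, way to list the terminal productions of a category up to a given length. The sketch below (function and type names are ours) assumes every right-hand side is non-empty, which holds because rules lie in C × F(V ∪ C) with F a free semigroup; replacements then never shorten a mixed string, so mixed strings longer than the bound can be discarded. It rewrites only the leftmost category, which is enough to reach every terminal production of a context-free grammar.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

MixedString = Tuple[str, ...]        # an element of F(V ∪ C)
Rule = Tuple[str, MixedString]       # a rule c:α, stored as (c, α)

def terminal_productions(categories: Set[str], rules: List[Rule],
                         c: str, max_len: int) -> Set[MixedString]:
    """Terminal productions of category c that contain at most max_len words."""
    rhs_of: Dict[str, List[MixedString]] = {}
    for lhs, rhs in rules:
        rhs_of.setdefault(lhs, []).append(rhs)

    found: Set[MixedString] = set()
    seen: Set[MixedString] = set()
    queue = deque(rhs_of.get(c, []))          # (i) the direct productions of c
    while queue:
        mixed = queue.popleft()
        # Right-hand sides are non-empty, so no replacement shortens a
        # mixed string: longer ones can never yield a short terminal one.
        if mixed in seen or len(mixed) > max_len:
            continue
        seen.add(mixed)
        cats = [i for i, s in enumerate(mixed) if s in categories]
        if not cats:
            found.add(mixed)                  # formed of words only
            continue
        i = cats[0]                           # the leftmost category c'
        for alpha2 in rhs_of.get(mixed[i], []):   # (ii) β = α1 α2 α3
            queue.append(mixed[:i] + alpha2 + mixed[i + 1:])
    return found

rules = [("s", ("0",)), ("s", ("(", "s", ")"))]   # the grammar s:0, s:(s)
print(sorted(terminal_productions({"s"}, rules, "s", 5), key=len))
# [('0',), ('(', '0', ')'), ('(', '(', '0', ')', ')')]
```

The `seen` set keeps unit-rule cycles such as c:c', c':c from looping, since only finitely many mixed strings respect the length bound.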
2. The intuitive representation of context-free languages about which
I shall speak is a representation by means of flag diagrams. A flag
diagram is an ordered septuple D = ⟨V, C, H, f1, f2, g1, g2⟩ where V
and C are disjoint, non-empty, finite sets; H is a finite oriented
graph; f1 and f2 are mappings of two disjoint, non-empty subsets P1 and
P2, respectively, of the set P of points (vertices) of H onto C; and
g1 and g2 are mappings of two disjoint subsets E1 and E2, respectively,
of the set E of edges of H, of which E1 is non-empty, onto V and into
C, respectively. The sets V and C are again called the vocabulary (set
of words) and the set of categories, respectively. A point p1 ∈ P1
and a point p2 ∈ P2 with f1(p1) = f2(p2) = c ∈ C are called a starting
c-point and an ending c-point, respectively; an edge e1 ∈ E1 with
g1(e1) = v ∈ V and an edge e2 ∈ E2 with g2(e2) = c ∈ C are called a
v-edge and a c-edge, respectively. Starting and ending c-points are
marked by a flag-head pointing to the left and to the right,
respectively (i.e. by a pentagon with two horizontal, one vertical and
two slant sides, the slant sides forming an angle pointing to the left
or to the right), bearing the symbol c; v-edges e are marked by the
word v written above the edge e, and c-edges are marked by a double
flag-head pointing to both sides (i.e. by a hexagon with two horizontal
and four slant sides which form two angles, pointing to the left and to
the right), bearing the symbol c. (See Fig. 1.)
Fig. 1. [Figure: a starting c-point, an ending c-point, a v-edge and a c-edge.]
In the case E2 = ∅ we call D = ⟨V, C, H, f1, f2, g1, g2⟩, which can be
written for short as D = ⟨V, C, H, f1, f2, g1⟩ since g2 is the empty
mapping, viz. the mapping of the empty set E2 into C, a finite state
flag diagram. In this case, an (oriented, possibly self-intersecting)
path Q (i.e. one going possibly several, but a finite number of, times
through the same point or edge of H) is called a c-path of H (c ∈ C)
if it leads from a starting c-point p1 to an ending c-point p2 but
otherwise does not go through any starting or ending c-point (implying
the condition that Q must not go through p1 or p2 once more).
Let e_1, e_2, ..., e_n be the edges belonging to E1 of a c-path Q of H,
each written as many times as Q goes through it and in the order in
which Q goes through them. The word string v_1 v_2 ... v_n, where
v_i = g1(e_i) for i = 1, 2, ..., n, is called the word string to be
read along the c-path Q. The language represented by a finite state
flag diagram D = ⟨V, C, H, f1, f2, g1⟩ is defined as L = ⟨V, C, f⟩,
where, for any c ∈ C, f(c) is the set of all word strings to be read
along some c-path of H. A language is a finite state language if it is
represented by some finite state flag diagram.
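For a finite state flag diagram, the word strings of f(c) up to a given number of words can be collected by a breadth-first search over pairs (point, words read so far). The sketch below uses our own encoding (dictionaries for f1 and f2, edges as triples, None for an edge outside E1); the example diagram, for identifiers over the letters a, b and the digits 0, 1, is hypothetical and is not a transcription of any figure in this paper.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, Optional[str]]   # (from point, to point, word or None)

def finite_state_language(start: Dict[str, str], end: Dict[str, str],
                          edges: List[Edge], c: str,
                          max_len: int) -> Set[Tuple[str, ...]]:
    """Word strings with at most max_len words read along some c-path.

    start and end play the roles of f1 and f2 (point -> category); an edge
    whose word is None belongs to E but not to E1 and contributes no word.
    """
    # Apart from its two end points, a c-path avoids all starting/ending c-points.
    blocked = ({p for p, k in start.items() if k == c}
               | {p for p, k in end.items() if k == c})
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)

    found: Set[Tuple[str, ...]] = set()
    queue = deque((p, ()) for p, k in start.items() if k == c)
    seen = set(queue)        # states (point, words read so far); finitely many
    while queue:
        point, read = queue.popleft()
        for _, nxt, word in out.get(point, []):
            new = read if word is None else read + (word,)
            if len(new) > max_len:
                continue
            if end.get(nxt) == c:
                found.add(new)              # the c-path ends here
            elif nxt not in blocked and (nxt, new) not in seen:
                seen.add((nxt, new))
                queue.append((nxt, new))
    return found

# Hypothetical diagram: a letter, then any letters or digits, for identifiers.
start = {"L0": "letter", "I0": "identifier"}
end = {"L1": "letter", "I2": "identifier"}
edges = [("L0", "L1", "a"), ("L0", "L1", "b"),
         ("I0", "I1", "a"), ("I0", "I1", "b"),
         ("I1", "I1", "a"), ("I1", "I1", "b"),
         ("I1", "I1", "0"), ("I1", "I1", "1"),
         ("I1", "I2", None)]
print(len(finite_state_language(start, end, edges, "identifier", 2)))  # 10
```

The self-loops at I1 are traversed repeatedly, as the definition of a possibly self-intersecting path allows; the length bound is what keeps the search finite.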
Flag diagrams in general are generalizations of finite state flag
diagrams. In the case of a flag diagram D = ⟨V, C, H, f1, f2, g1, g2⟩
in general, an (oriented, possibly self-intersecting) path Q of H is
called a c-path of H (c ∈ C) if, besides leading from a starting
c-point to an ending c-point but otherwise not going through any
starting or ending c-point, it does not go through any c'-edge of H
(c' ∈ C). The word string to be read along a c-path of H is defined in
the same way as in the case of a finite state flag diagram.
In order to define the language represented by a flag diagram in
general, we still need some auxiliary notions. Consider two different
points p1 and p2 of the oriented graph H of a flag diagram
D = ⟨V, C, H, f1, f2, g1, g2⟩. The subgraph H' of H connecting p1 with
p2 consists, by definition, of all points p of H such that at least one
(oriented) path leads from p1 to p and at least one from p to p2,
together with the edges which connect these points p in H. (If there
is no such point p, then H' is the empty graph; otherwise, both p1 and
p2 are points of H'.) A subgraph of H connecting a starting c-point
with an ending c-point (c ∈ C), provided it is not empty, is called a
c-subgraph of H.
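The subgraph connecting p1 with p2 is the intersection of the points reachable from p1 with the points from which p2 is reachable. In this sketch (function names are ours) a point counts as reachable from itself, which agrees with the remark that p1 and p2 belong to H' whenever H' is not empty; edges are triples whose third component is an arbitrary label, and the sample graph is our assumed reading of Fig. 2 with an extra dangling point d.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, Optional[str]]   # (from point, to point, label); label unused here

def reachable(succ: Dict[str, Set[str]], source: str) -> Set[str]:
    """All points q with an oriented path from source to q (source included)."""
    seen, stack = {source}, [source]
    while stack:
        for q in succ.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def connecting_subgraph(edges: List[Edge], p1: str, p2: str):
    """Points and edges of the subgraph H' of H connecting p1 with p2."""
    fwd: Dict[str, Set[str]] = {}
    bwd: Dict[str, Set[str]] = {}
    for a, b, _ in edges:
        fwd.setdefault(a, set()).add(b)
        bwd.setdefault(b, set()).add(a)
    # p lies in H' iff p is reachable from p1 and p2 is reachable from p.
    points = reachable(fwd, p1) & reachable(bwd, p2)
    kept = [e for e in edges if e[0] in points and e[1] in points]
    return points, kept   # (set(), []) means H' is the empty graph

# Assumed reading of Fig. 2 ("<s>" marks the c-edge), plus a dangling point d.
H = [("s0", "s1", "0"), ("s0", "a", "("), ("a", "b", "<s>"),
     ("b", "s1", ")"), ("a", "d", "0")]
points, kept = connecting_subgraph(H, "s0", "s1")
print(sorted(points))  # ['a', 'b', 's0', 's1']
```

Point d is reachable from s0 but s1 is not reachable from d, so d and its edge are left out of the s-subgraph.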
The derivatives of a flag diagram D = ⟨V, C, H, f1, f2, g1, g2⟩ are
defined by recursion as (i) D itself, and (ii) any flag diagram of the
form D' = ⟨V, C, H', f1', f2', g1', g2'⟩ which can be obtained from some
derivative D'' = ⟨V, C, H'', f1'', f2'', g1'', g2''⟩ of D by replacing
one of its c-edges e (c ∈ C) by any c-subgraph H''' of H. Here,
replacing has to be understood in the following sense. First, the
starting c-point p1 and the ending c-point p2 of H connected by H''' are
replaced by the starting point p3 and the ending point p4,
respectively, of the edge e of H''; then, the graph H''' thus modified
is inserted, instead of e, between p3 and p4 in H''. Any point or edge
of H'' and H''' which was a starting c'-point, an ending c'-point, a
v-edge or a c'-edge in H'' and H, respectively (c' ∈ C, v ∈ V),
remains so after the replacement; in particular, p3 and p4 remain
marked or unmarked as they were in H'', rather than getting marked by a
flag-head bearing the symbol c and pointing to the left and to the
right, respectively, as p1 and p2, respectively, were marked in H.
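One replacement step can be sketched as follows, assuming the points and edges of a c-subgraph are already given. The classes Edge and FlagDiagram, the function replace_c_edge and the renaming scheme p#tag for fresh copies are our own conventions, and the example uses an assumed reading of Fig. 2 in which the whole graph is the s-subgraph connecting s0 with s1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    word: Optional[str] = None       # set for a v-edge (in E1)
    category: Optional[str] = None   # set for a c-edge (in E2)

@dataclass
class FlagDiagram:
    start: Dict[str, str]            # f1: starting point -> category
    end: Dict[str, str]              # f2: ending point -> category
    edges: List[Edge] = field(default_factory=list)

def replace_c_edge(current: FlagDiagram, original: FlagDiagram, e: Edge,
                   sub_points: Set[str], sub_edges: List[Edge],
                   p1: str, p2: str, tag: str) -> FlagDiagram:
    """Derivative step: put a copy of the c-subgraph of `original` (H''')
    connecting p1 with p2 in place of the c-edge e of `current` (H'')."""
    c = e.category
    if c is None or e not in current.edges:
        raise ValueError("e must be a c-edge of the current diagram")
    if original.start.get(p1) != c or original.end.get(p2) != c:
        raise ValueError("p1 and p2 must be a starting and an ending c-point")
    # p1 becomes p3 = e.src and p2 becomes p4 = e.dst; the other points of
    # H''' get fresh names (the suffix `tag` is our own naming convention).
    rename = {p: f"{p}#{tag}" for p in sub_points}
    rename[p1], rename[p2] = e.src, e.dst
    # p3 and p4 keep their markings from H''; copied inner points keep theirs from H.
    start, end = dict(current.start), dict(current.end)
    for p in sub_points - {p1, p2}:
        if p in original.start:
            start[rename[p]] = original.start[p]
        if p in original.end:
            end[rename[p]] = original.end[p]
    edges = list(current.edges)
    edges.remove(e)                  # drop exactly one occurrence of e
    edges += [Edge(rename[x.src], rename[x.dst], x.word, x.category)
              for x in sub_edges]
    return FlagDiagram(start, end, edges)

D = FlagDiagram({"s0": "s"}, {"s1": "s"},
                [Edge("s0", "s1", word="0"), Edge("s0", "a", word="("),
                 Edge("a", "b", category="s"), Edge("b", "s1", word=")")])
D1 = replace_c_edge(D, D, Edge("a", "b", category="s"),
                    {"s0", "a", "b", "s1"}, D.edges, "s0", "s1", tag="1")
print(len(D1.edges), sum(x.category is not None for x in D1.edges))  # 7 1
```

In D1, the s-paths that avoid c-edges read 0 and (0), the first two members of f(s); replacing the new c-edge from a#1 to b#1 once more would add ((0)).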
Now, we define the language represented by a flag diagram
D = ⟨V, C, H, f1, f2, g1, g2⟩ as L = ⟨V, C, f⟩ where, for any c ∈ C,
f(c) is the set of all word strings to be read along some c-path of the
oriented graph H' of some derivative D' = ⟨V, C, H', f1', f2', g1', g2'⟩
of D.
As a simple example, Fig. 2 shows a flag diagram D representing the
language L = ⟨V, C, f⟩ with V = {0, (, )}, C = {s} and
f(s) = {0, (0), ((0)), (((0))), ...}. Fig. 3 shows some of the
derivatives of D.
Fig. 2. [Figure: the flag diagram D.]
Fig. 3. [Figure: some derivatives of D.]
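To see the example language emerge mechanically, the sketch below uses a simplification of our own, not the derivative-based definition above: in the manner of a recursive transition network, a c'-edge may be crossed by reading any word string already found for c', and the sets are recomputed until they stop growing. For our assumed reading of Fig. 2 it reproduces f(s) = {0, (0), ((0)), ...}; in general the two notions need not coincide (the derivative definition, for instance, also keeps c-points inside an inserted subgraph off-limits to the enclosing c-path), so treat it as an approximation. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, Optional[str], Optional[str]]  # (src, dst, word, category)
Strings = Set[Tuple[str, ...]]

def rtn_language(start: Dict[str, str], end: Dict[str, str],
                 edges: List[Edge], max_len: int) -> Dict[str, Strings]:
    """Approximate f(c), for every c, by strings of at most max_len words.

    Simplification (not the derivative-based definition): a c'-edge may be
    crossed by reading any word string already found for c'.
    """
    cats = set(start.values()) | set(end.values())
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)

    f: Dict[str, Strings] = {c: set() for c in cats}
    while True:                      # the sets only grow and are bounded
        new_f: Dict[str, Strings] = {c: set() for c in cats}
        for c in cats:
            blocked = ({p for p, k in start.items() if k == c}
                       | {p for p, k in end.items() if k == c})
            stack = [(p, ()) for p, k in start.items() if k == c]
            seen = set(stack)
            while stack:
                point, read = stack.pop()
                for _, nxt, word, cat in out.get(point, []):
                    if cat is not None:
                        pieces = f.get(cat, set())     # strings known for cat so far
                    else:
                        pieces = {(word,)} if word is not None else {()}
                    for piece in pieces:
                        new = read + piece
                        if len(new) > max_len:
                            continue
                        if end.get(nxt) == c:
                            new_f[c].add(new)
                        elif nxt not in blocked and (nxt, new) not in seen:
                            seen.add((nxt, new))
                            stack.append((nxt, new))
        if new_f == f:
            return f
        f = new_f

# Assumed reading of Fig. 2: s0 starting s-point, s1 ending s-point, c-edge a -> b.
fig2 = [("s0", "s1", "0", None), ("s0", "a", "(", None),
        ("a", "b", None, "s"), ("b", "s1", ")", None)]
print(sorted(rtn_language({"s0": "s"}, {"s1": "s"}, fig2, 5)["s"], key=len))
# [('0',), ('(', '0', ')'), ('(', '(', '0', ')', ')')]
```

Each round adds one more level of nesting, which mirrors taking one more derivative of D.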
3. Obviously, any context-free language can be represented by some
flag diagram. Indeed, let G = ⟨V, C, R⟩ be a context-free grammar. To
each rule r = c:α ∈ R, where α = s_1 s_2 ... s_n, c ∈ C and
s_1, s_2, ..., s_n ∈ V ∪ C, form an oriented graph H_r with n + 1
points p_0, p_1, p_2, ..., p_n, of which p_0 is a starting c-point, p_n
an ending c-point and, for i = 1, 2, ..., n, p_{i-1} is connected with
p_i by an s_i-edge (i.e., for s_i = v ∈ V a v-edge, for s_i = c' ∈ C a
c'-edge), oriented towards p_i. The (disconnected) union of these
oriented graphs H_r, for all rules r ∈ R, defines a flag diagram which
obviously represents the language generated by G.
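The construction translates line for line into code. In the sketch below (names are ours) a rule c:s_1...s_n is a pair (c, (s_1, ..., s_n)), the points of H_r are named r<k>.p<i>, and an edge is a tuple (src, dst, word, category) with exactly one of word and category set.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, Optional[str], Optional[str]]  # (src, dst, word, category)
Rule = Tuple[str, Tuple[str, ...]]                    # c:s_1 ... s_n as (c, (s_1, ..., s_n))

def grammar_to_flag_diagram(categories: Set[str], rules: List[Rule]):
    """Disconnected union of the graphs H_r, one for each rule r of R."""
    start: Dict[str, str] = {}
    end: Dict[str, str] = {}
    edges: List[Edge] = []
    for k, (c, rhs) in enumerate(rules, start=1):
        if not rhs:
            raise ValueError("a rule's right-hand side must be non-empty")
        pts = [f"r{k}.p{i}" for i in range(len(rhs) + 1)]   # p_0, ..., p_n of H_r
        start[pts[0]] = c                                   # p_0: starting c-point
        end[pts[-1]] = c                                    # p_n: ending c-point
        for i, s in enumerate(rhs, start=1):                # p_{i-1} -> p_i is an s_i-edge
            if s in categories:
                edges.append((pts[i - 1], pts[i], None, s))   # c'-edge
            else:
                edges.append((pts[i - 1], pts[i], s, None))   # v-edge
    return start, end, edges

start, end, edges = grammar_to_flag_diagram({"s"}, [("s", ("0",)), ("s", ("(", "s", ")"))])
print(start)   # {'r1.p0': 's', 'r2.p0': 's'}
print(end)     # {'r1.p1': 's', 'r2.p3': 's'}
```

For R = {s:0, s:(s)} this yields two components, one per rule, with a single s-edge from r2.p1 to r2.p2.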
Using this construction, e.g., for the context-free grammar
G = ⟨V, C, R⟩ with V = {0, (, )}, C = {s} and R = {s:0, s:(s)}, which
generates the language represented by the flag diagram shown in Fig. 2,
we would obtain the flag diagram shown in Fig. 4. Replacing this
disconnected flag diagram by the connected one shown in Fig. 2
corresponds to the ALGOL 68-like way of writing s:0;(s) (or the
ALGOL 60-like way of writing s::=0|(s)) the rules belonging to R.
Fig. 4. [Figure: the disconnected flag diagram obtained from R = {s:0, s:(s)} by the construction above.]
Besides the possibility of reducing the number of starting and ending
c-points in a similar way, we often have the more important
possibility of reducing the number of c-edges. E.g. the language
generated by the context-free grammar G = ⟨V, C, R⟩ with
V = {a, b, ..., z, 0, 1, ..., 9}, C = {letter, digit, identifier},
R = {r1, r2, ..., r26, r27, r28, ..., r36, r37, r38, r39}, where
r1 = letter:a
r2 = letter:b
...
r26 = letter:z
r27 = digit:0
r28 = digit:1
...
r36 = digit:9
r37 = identifier:letter
r38 = identifier:identifier letter
r39 = identifier:identifier digit
(here, a space has been used between "identifier" and "letter" or
"digit" to denote the semigroup operation), can be represented by the
flag diagram indicated by Fig. 5, which we get using the above
construction and for which the overall number of c-edges is 5; but it
can also be represented by the flag diagram indicated by Fig. 6, for
which this number is 0, the language in question being actually a
finite state language.
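The count of 5 c-edges for the construction can be checked mechanically: in the flag diagram built rule by rule, each rule c:s_1...s_n contributes one c'-edge for every category among s_1, ..., s_n. A minimal sketch (the function and variable names are ours):

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]   # c:s_1 ... s_n stored as (c, (s_1, ..., s_n))

def construction_c_edges(categories: Set[str], rules: List[Rule]) -> int:
    """Overall number of c-edges in the flag diagram built rule by rule."""
    return sum(1 for _, rhs in rules for s in rhs if s in categories)

letters = [chr(code) for code in range(ord("a"), ord("z") + 1)]   # r1 ... r26
digits = [str(d) for d in range(10)]                               # r27 ... r36
identifier_rules = ([("letter", (x,)) for x in letters]
                    + [("digit", (d,)) for d in digits]
                    + [("identifier", ("letter",)),                # r37
                       ("identifier", ("identifier", "letter")),   # r38
                       ("identifier", ("identifier", "digit"))])   # r39
cats = {"letter", "digit", "identifier"}
print(len(identifier_rules), construction_c_edges(cats, identifier_rules))  # 39 5
```

Only r37, r38 and r39 contain categories on their right-hand sides (1 + 2 + 2 of them), which is where the 5 comes from.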
Fig. 5. [Figure: the flag diagram obtained by the construction above.]
Fig. 6. [Figure: a finite state flag diagram for the same language.]
To show a more complicated example, take the language of the Church
lambda conversion [4], where we use x, x|, x||, x|||, ... as variables
and, for simplicity (as a matter of fact, for obtaining a context-free
language at all), we allow the abstraction λv[M] even if the variable v
is not contained, or is contained as a bound variable, in the
(well-formed) formula M. This language is generated by the
context-free grammar G = ⟨V, C, R⟩ with V = {x, |, λ, {, }, [, ], (, )},
C = {variable, formula} and R = {r1, r2, r3, r4, r5}, where
r1 = variable:x
r2 = variable:variable|
r3 = formula:variable
r4 = formula:{formula}(formula)
r5 = formula:λvariable[formula]
(here, the semigroup operation is again denoted by juxtaposition).
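To make the grammar concrete, here is a recursive-descent recognizer for the category formula. It rests on our reading of the partly illegible rules, namely V = {x, |, λ, {, }, [, ], (, )}, r4 = formula:{formula}(formula) and r5 = formula:λvariable[formula] in Church's notation, and it treats each character as one word; the function names are hypothetical. Consuming the strokes after x greedily is safe because, outside rule r2 itself, a variable is never followed by |.

```python
from typing import Optional

def parse_variable(t: str, i: int) -> Optional[int]:
    """Rules r1, r2: a variable is x followed by any number of strokes |."""
    if i < len(t) and t[i] == "x":
        i += 1
        while i < len(t) and t[i] == "|":
            i += 1
        return i
    return None

def parse_formula(t: str, i: int) -> Optional[int]:
    """Rules r3-r5; returns the position just after the formula, or None."""
    if i >= len(t):
        return None
    if t[i] == "x":                                   # r3: formula:variable
        return parse_variable(t, i)
    if t[i] == "{":                                   # r4: formula:{formula}(formula)
        j = parse_formula(t, i + 1)
        if j is None or t[j:j + 2] != "}(":
            return None
        k = parse_formula(t, j + 2)
        return k + 1 if k is not None and t[k:k + 1] == ")" else None
    if t[i] == "λ":                                   # r5: formula:λvariable[formula]
        j = parse_variable(t, i + 1)
        if j is None or t[j:j + 1] != "[":
            return None
        k = parse_formula(t, j + 1)
        return k + 1 if k is not None and t[k:k + 1] == "]" else None
    return None

def is_formula(t: str) -> bool:
    """True iff the whole string t is a terminal production of `formula`."""
    return parse_formula(t, 0) == len(t)

print(is_formula("λx[{x}(x|)]"), is_formula("{x}x"))  # True False
```

Each branch of parse_formula corresponds to one rule, so the recognizer's structure follows the grammar directly; a flag diagram with fewer c-edges, such as the one in Fig. 7, would share work between branches instead.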
Fig. 7 shows a simple flag diagram representing this language. Here, 
the overall number of c-edges is 4. 
Fig. 7. [Figure: a flag diagram representing the language of lambda conversion.]
Flag diagrams can be used to advantage as a tutorial tool in teaching
programming languages. When there are a great number of categories,
starting and ending c-points can be marked by flags whose handles have
different lengths for different categories c, rather than just by
flag-heads pointing to the left and to the right, respectively. This
is shown in Fig. 8, which gives a survey of the possible modes in
ALGOL 68. Here, flags instead of flag-heads are not really needed, for
we have only two categories. However, by inserting some more flags
(which also requires some more branchings), we can get a flag diagram
which is equivalent to the metaproduction rules of modes (with 25
categories) in [2], 1.2.1.
Besides such tutorial use, flag diagrams might be of theoretical
interest in furnishing a natural classification of context-free
languages according to the minimum of the overall number of c-edges in
the flag diagrams representing a given such language. This minimum can
be considered as a measure of the non-finite state character of the
given language. However, for such a theoretical use, a method for
calculating the minimum in question would be needed.
Fig. 8. [Figure: flag diagram surveying the modes of ALGOL 68; legible labels include integral, real, boolean, character, format, procedure with, reference to, row of, structured with ... field, parameter, letter, digit, zero ... nine, union of, and, mode.]

References 

[1] N. Chomsky, Three models for the description of language, IRE
Transactions, 2 (1956), 113-124.

[2] A. van Wijngaarden (editor), B. J. Mailloux, J. E. L. Peck, and
C. H. A. Koster, Final Draft Report on the Algorithmic Language
ALGOL 68, Stichting Mathematisch Centrum, Amsterdam, Rekenafdeling,
MR 100 (1968).

[3] P. Naur (editor), J. W. Backus, J. Green, C. Katz, J. McCarthy,
A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein,
A. van Wijngaarden, and M. Woodger, Report on the Algorithmic Language
ALGOL 60, Numerische Math., 2 (1960), 106-136, or Communications ACM,
3 (1960), 299-314.

[4] A. Church, A set of postulates for the foundation of logic, second
paper, Annals of Math., (2) 34 (1933).
