The Relevance of Some Compiler Construction
Techniques to the Description and Translation
of Languages
by
Steven I. Laszlo
Western Union Telegraph Co. 
The framework is machine-translation. Compiler-building
can for a variety of reasons be considered as a special case of
machine-translation. It is the purpose of this paper to explicate
some techniques used in compiler-building, and to relate these to
linguistic theory and to the practice of machine-translation.
The generally observed machine-translation procedure could
be schematized as in FIGURE 1, or to put it another way, as
break-down, translation, and recomposition.

1. Parsing the source-text.
2. Translation from source to object-language.
3. Synthesis of grammatically correct object-text.

FIGURE 1.

The translation
usually occurs on the level of some simplified, canonical form
(that is not necessarily the kernel-form) of both languages, such
that the source-text is decomposed, and the object-text recomposed
from this form. The translation algorithm usually requires a
statement of the structure of both the source and the object-language,
as well as the statement of some primitive-to-primitive
correspondence paradigm for both syntactic and lexical primitives.
Compilers on the other hand work on the basis of only the first two
steps of FIGURE 1: breakdown and translation. Consequently,
the processor requires only statements of the structure of the
source-language and of the correspondence paradigm. That does
not imply that the structure of the object-language is irrelevant to
the process of translation, but that it is implicit in the
correspondence paradigm, and in the selection of what is a
primitive or terminal in the description of the source-language.

* Currently at Decision Systems, Inc.
Through the use of examples it will be shown that BNF and
similar language-description devices (8) are -- by themselves --
both analytically and generatively inadequate and depend on other
devices, implicit in the translation algorithm. It will be shown
that by some extensions of the notion of P-rules and some
applications of the concept of T-rules (4), a description that is both
analytically and generatively adequate may be constructed for
programming languages. The programming language P.O.L. 2 (12),
(13) was selected for the examples because an adequate, fully
explicit description does exist for it; furthermore, the language
contains most syntactically problematic features of other
programming languages as well as presenting a few unique problems
in description that are worthy of attention.
The failure to come to grips with the identity problem is
sufficient to demonstrate the inadequacy of BNF and similar devices
(8). The simplified program-segments in FIGURE 2 serve to
illustrate this problem.

EXAMPLE 1.
1. Let A be variable.
2. Let B be = "7".
3. Let C be = "9.5".
4. Let D be = ".072".
5. A = B + C / D.
6. Print A.

EXAMPLE 2.
Define Funct (A, B) = (C).
...
End.
---and elsewhere---
Funct (Q, R) = (Z).
V = D + K * Funct (P, T).

FIGURE 2.

BNF and similar devices would generate a parse
designating "A", "B", etc. in EXAMPLE 1 as identifier (a
syntactic word-class) but would fail to indicate that the various
occurrences of a given identifier (e.g., "A" in statements 1, 5,
and 6) are those of the same lexical token or semantic object.
Related to the identity problem is the restriction that each
identifier occurring in a program statement must also occur in one
and only one definition. This restriction may be called the
definition problem. BNF, etc., do not handle the definition
problem. Other manifestations of the identity and definition
problems are associated with the use of macro- or compound
functions (see EXAMPLE 2, FIGURE 2), subscript expressions,
etc.
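The two restrictions can be made concrete with a small table-building check. The sketch below is only an illustration in Python, not part of the P.O.L. 2 description: the statement encoding (a kind tag plus a list of identifiers) and the name `check_identifiers` are assumptions made for the example.

```python
from collections import Counter


def check_identifiers(statements):
    """Return (multiply_defined, undefined) sets of identifiers.

    Each statement is a hypothetical pair (kind, identifiers), where kind
    is "define" for a definition and "use" for any other statement.
    """
    definitions = Counter()
    used = set()
    for kind, identifiers in statements:
        if kind == "define":
            definitions.update(identifiers)
        else:
            used.update(identifiers)
    # Definition problem: every identifier needs one and only one definition.
    multiply_defined = {name for name, count in definitions.items() if count > 1}
    undefined = {name for name in used if name not in definitions}
    return multiply_defined, undefined


# EXAMPLE 1 of FIGURE 2, reduced to the identifiers in each statement.
example_1 = [
    ("define", ["A"]),               # 1. Let A be variable.
    ("define", ["B"]),               # 2. Let B be = "7".
    ("define", ["C"]),               # 3. Let C be = "9.5".
    ("define", ["D"]),               # 4. Let D be = ".072".
    ("use", ["A", "B", "C", "D"]),   # 5. A = B + C / D.
    ("use", ["A"]),                  # 6. Print A.
]
print(check_identifiers(example_1))  # (set(), set())
```

Keying the table on the identifier's spelling is what ties the occurrences of "A" in statements 1, 5, and 6 to one object; a phrase-structure parse alone records only that each occurrence is an identifier.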
Since there exists a demonstrable necessity for establishing 
the above mentioned identities and restriction (3), compilers 
contain -- implicit in the translation algorithm -- an elaborate 
table-building/table-searching/identity-testing procedure. Without 
such procedures, the syntactic description is inadequate, full 
analysis and translation impossible. In order to deal with these 
problems explicitly, it was decided to incorporate a 
transformational component along with the BNF-like phrase-
structure component in the description of P.O.L. 2. The above
reasons for positing a transformational component are in essence
the programming-language equivalents of Chomsky's original
reasons to use transformations in the description of natural 
languages. 

Rule 1. M -> #, M_sel.1, #
   where "M" is the initial symbol, "#" is the boundary marker,
   and the subscript will be explained later.
Rule 2. M_sel.1 -> DEFINE, functmention, program, END
   where the convention is used that terminal symbols are all
   capital letters, and members of the intermediate alphabet
   are in lower case.
Rule 3. program -> ..., placeholder, M, ...

FIGURE 3.

In FIGURE 3, in a simplified form, it is shown that the phrase-
structure component generates function definitions (17), (18)
embedded in others (see Rule 3), and that the form of the function
is generated in the definition -- as the expansion of the symbol
"functmention" -- while place-holders are generated for instances
of use of the function. Transformations replace the place-holders
with the appropriate form of the function generated in the
definition, thus accounting for both the identity and the definition
problems. Other transformations exist to handle other instances of
these problems, e.g., labels, identifiers, subscript expressions.
The method is identical: the form is generated in the relevant
definition, place-holders are generated for instances of use, and
the place-holders are replaced transformationally with the correct
form generated in the definition.
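The place-holder mechanism can be illustrated with a short sketch. It is not the rigorous statement of the transformations: the token encoding (triples tagged "DEF", "PLACEHOLDER", and "TEXT"), the key that links a place-holder to its definition, and the name `replace_placeholders` are all assumptions made for the illustration.

```python
def replace_placeholders(tokens):
    """Replace each place-holder by the form generated in its definition.

    tokens is a list of hypothetical (tag, key, form) triples:
      ("DEF", key, form)          a definition that generates `form`
      ("PLACEHOLDER", key, None)  an instance of use awaiting replacement
      ("TEXT", None, form)        any other terminal material
    """
    # First pass: collect the form generated by each definition.
    forms = {}
    for tag, key, form in tokens:
        if tag == "DEF":
            if key in forms:
                raise ValueError(f"{key!r} is defined more than once")
            forms[key] = form

    # Second pass: every place-holder must be replaced by a defined form.
    output = []
    for tag, key, form in tokens:
        if tag == "PLACEHOLDER":
            if key not in forms:
                raise ValueError(f"no definition generates a form for {key!r}")
            output.append(forms[key])
        else:
            output.append(form)
    return output


tokens = [
    ("DEF", "f", "Funct"),
    ("TEXT", None, "V = D + K *"),
    ("PLACEHOLDER", "f", None),
]
print(replace_placeholders(tokens))  # ['Funct', 'V = D + K *', 'Funct']
```

In this sketch a definition without a matching place-holder passes silently, while a place-holder without a definition is rejected.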
Other transformations deal with additional notational
restrictions of P.O.L. 2. One such restriction is that a function
definition may reference other functions but a definition may not be
embedded in another. Definitions (see FIGURE 3) are in fact
generated embedded, and it becomes necessary to posit some
exbedding transformation (7), moving the nested definitions outside
the "parent" definition. There exist several proofs in the literature
establishing the equivalence between languages generated by
grammars with and without the use of boundary markers (5), (10).
The exbedding transformation may be expressed more simply if
boundary markers are used (see FIGURE 4).

#, ..., #, M, #, ..., #  ->  #, M, #, #, ..., #

FIGURE 4.

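A minimal sketch of exbedding over a tree of definitions follows. The `Definition` class and the function `exbed` are hypothetical names introduced for the illustration, and placing each moved definition in front of its parent is one reading of FIGURE 4, not a statement of the actual transformation.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    name: str
    body: list = field(default_factory=list)  # statements (str) or nested Definitions


def exbed(definition):
    """Return a flat list of definitions, nested ones moved out ahead of their parent."""
    moved_out = []
    remaining = []
    for item in definition.body:
        if isinstance(item, Definition):
            # A nested definition may itself contain definitions.
            moved_out.extend(exbed(item))
        else:
            remaining.append(item)
    return moved_out + [Definition(definition.name, remaining)]


inner = Definition("Inner", ["C = A + B."])
outer = Definition("Outer", ["X = 1.", inner, "Y = Inner (X, X)."])
print([d.name for d in exbed(outer)])  # ['Inner', 'Outer']
```

After the call no definition contains another, while references to "Inner" inside "Outer" are left in place.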
The boundary-markers may be deleted later by another
transformation, or they may rewrite as carriage-returns on some
keyboard, depending on the orthography of the particular
implementation and medium. The T-rules may be generated by
positing a set of elementary transformations (i.e., single node
operations) and a set of formation and combination rules over the
set of elementary transformations, producing some set of compound
or complex transformations. This is not significantly different
from having locally ordered subsets of a set of elementary
transformations (11), (12).
Syntactic descriptions of programming languages published in
the past -- e.g., (1), (9), (19) -- generally took a program-
statement to correspond to the basic unit of grammar, denoted by
the initial symbol of the phrase-structure grammar. The grammar
discussed here takes a function definition (see FIGURE 3) as its
basic unit. Program-statements are elements of the intermediate
alphabet and have no other theoretical standing or significance.
The natural language correlates of program-statements are
sentences, and function definitions correspond to some larger-
than-sentence units of discourse (e.g., paragraphs or chapters).
This procedure may lead to some syntactic or at least linguistic
method of distinguishing between "meaningful" and "meaningless"
programs. Using a syntax of programs or functions also yields
an intuitively more pleasing set of relationships among elements of
the described language.
The present grammar makes no effort to distinguish between
"elegant" and "inelegant" programming, but does distinguish both
from "ungrammatical" code. Declaring arguments or variables
never referenced is inelegant; referencing undeclared operands is
ungrammatical. To return momentarily to the identity and
definition problems: it is possible to generate a definition such that
there are no corresponding place-holders; but each place-holder
must be replaced by some definition-generated form of the
appropriate nature. In describing the definition and use of functions,
separate place-holders accommodate recursive use and the general
case of usage.
It is customary to give descriptions of programming languages
such that -- with the exception of some small set of key words such
as arithmetic operators, delimiters of definitions, etc. -- the
phrase-structure grammar generates character-strings for the
lexical items. In natural languages the vocabulary is fixed. There
is a stable, limited set of vocabulary elements that correspond to
each syntactic word-class. In programming languages that is not
the case: a small set of word-classes rewrite each as a set of one
or more key-words; others will expand -- through the use of some
phrase-structure rules -- as any string of characters. In the
description of P.O.L. 2 it was decided to separate the
lexicon-generation rules from the phrase-structure rules. Though
they are the same shape that BNF rules of the same purpose would be,
it was determined that separating the rules generating lexical
items -- even as morphophonemic rules of natural languages represent
a separate class of rules -- is more intuitively acceptable: a class
of orthographic rules. FIGURE 5 indicates what some of these rules
might look like.
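One reading of the orthographic rules of FIGURE 5 is that an identifier is an alpha optionally followed by a character string, and that a character string is a run of alphas and numerals. Under that reading the rules can be sketched as a recognizer. The alphabet assigned to `alpha` (the letters A-Z) and the function names are assumptions of the sketch, not part of the P.O.L. 2 description.

```python
import string

ALPHA = set(string.ascii_uppercase)  # assumption: alpha expands to the letters A-Z
NUMERAL = set(string.digits)         # numeral expands to the decimal digits


def is_characterstring(text):
    """characterstring: one or more symbols, each an alpha or a numeral."""
    return len(text) > 0 and all(ch in ALPHA or ch in NUMERAL for ch in text)


def is_identifier(text):
    """identifier: an alpha, optionally followed by a characterstring."""
    if not text or text[0] not in ALPHA:
        return False
    rest = text[1:]
    return rest == "" or is_characterstring(rest)


for candidate in ["A", "B2", "2B", ""]:
    print(repr(candidate), is_identifier(candidate))
```

Keeping these functions apart from any phrase-structure parser mirrors the separation argued for above: the lexicon is generated by its own class of rules.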

Rule 1. identifier -> alpha <, characterstring>
   where "< ... >" enclose optional items.
Rule 2. characterstring -> { alpha | numeral } <, characterstring>
   where "{ ... }" enclose alternative options such that one and
   only one of the options enumerated must be selected.
Rule 3. alpha -> { ... }
Rule 4. numeral -> { 0 | 1 | 2 | ... }

FIGURE 5.

In the text of FIGURE 3, Rule 1, the explanation of the
subscript was deferred. Functions and operators used in
programming languages are two notational variants of the same
concept (17). Depending on the notation of the system, any operation
may be expressed either as an operator or a function. Since in
P.O.L. 2 there are both functions and operators, depending on
notational convenience, newly defined operations may be defined
as either. Being defined as one or the other, however, restricts
their distribution or "embeddability" to certain contexts. This
phenomenon is accounted for by the use of a device similar to the
notation of complex symbol theory (4), (11), (12), (15). The
P.O.L. 2 notation is such that functions (i.e., defined macros)
may occur as functions, coordinate transformations (linear or
otherwise) or as operands (denoting their value for a particular
set of arguments) and operators may appear as arithmetic,
relational or logical operators, depending on range and/or domain
as well as distributional restrictions. In P.O.L. 2 every program --
however simple or complex -- must have an "outermost" function,
one into which all others are embedded by the P-rules. The first
rule of the grammar (see FIGURE 3, Rule 1) expands the
"outermost" function. Elsewhere in the phrase-structure
component, depending on context, other "M_sel.1"s are introduced,
as well as "M_sel.2"s, "M_sel.3"s, "M_sel.4"s, and so on.
These correspond to the various embedded occurrences of functions
and operators. The rewrites or expansions of the several versions
of "M" are almost identical except for the string denoting the left
bracket delimiting the definition. Alternative solutions exist but
the above one appears most intuitively satisfying.
There are proofs and demonstrations in the literature to the 
effect that full, left, or right parenthesis notation is context-free, 
but not much on elided parenthesis notation. We have in the past 
constructed several context-sensitive grammars generating elided 
parenthesis notation, but they did not seem very satisfactory. 
Adding a device not heretofore associated with production-rules, a
set of rules was produced to generate the elided parentheses
notation such that the rules look and process very much like context-
free rules (see FIGURE 6).

Rule 1. expression -> expression_n
Rule 2. expression_n -> { expression_n+e, operator_n, expression_n+e
                          | expression_n+e }
        expression_n -> { "(", expression, ")"
                          | identifier placeholder
                          | unaryoperator, expression }
   where for one cycle (11) n remains the same integer between
   subrules 1 and 2 and e remains the same integer increment.

FIGURE 6.

Though the "counter" n and the "increment" e are not part of a
known system of production rules, their nature and the reason for
their use can be clearly stated. Their use permits a simpler
scanner for the syntax than context-restricted rules do.
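One way to read the counter is as an operator level: an expression at level n is built from expressions at level n+e joined by an operator of level n. The sketch below follows that reading. It is an interpretation, not the P.O.L. 2 scanner: the operator table `LEVELS`, the two levels, the treatment of unary minus, and the loop that chains several operators of the same level are all assumptions of the example.

```python
import re

# Hypothetical operator table: each binary operator is assigned a level n.
LEVELS = {"+": 1, "-": 1, "*": 2, "/": 2}
LOWEST, INCREMENT, HIGHEST = 1, 1, 2


def tokenize(text):
    return re.findall(r"[A-Za-z]\w*|\d+(?:\.\d+)?|[-+*/()]", text)


def parse(tokens):
    """expression -> expression_n, starting at the lowest level."""
    tree, rest = parse_level(tokens, LOWEST)
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def parse_level(tokens, n):
    """expression_n -> expression_(n+e), operator_n, expression_(n+e) | expression_(n+e)"""
    if n > HIGHEST:
        return parse_primary(tokens)
    left, rest = parse_level(tokens, n + INCREMENT)
    # Sketch convenience: chain operators of the same level left to right.
    while rest and LEVELS.get(rest[0]) == n:
        op = rest[0]
        right, rest = parse_level(rest[1:], n + INCREMENT)
        left = (op, left, right)
    return left, rest


def parse_primary(tokens):
    """'(' expression ')' | unary operator and operand | identifier"""
    if not tokens:
        raise SyntaxError("unexpected end of expression")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        tree, rest = parse_level(rest, LOWEST)
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return tree, rest[1:]
    if head == "-":
        # Assumption: unary minus applies to the next primary only.
        operand, rest = parse_primary(rest)
        return ("neg", operand), rest
    if head in LEVELS or head == ")":
        raise SyntaxError(f"unexpected token {head!r}")
    return head, rest


print(parse(tokenize("B + C / D")))    # ('+', 'B', ('/', 'C', 'D'))
print(parse(tokenize("(B + C) / D")))  # ('/', ('+', 'B', 'C'), 'D')
```

Each call to `parse_level` consults only the current token and the integer n, which is the sense in which such rules "look and process very much like" context-free ones.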
A similar counter is used to handle the concatenations of
n-tuples. In P.O.L. 2 an item of data may be declared as a pair,
triple, or n-tuple, and operations may be performed over n-tuples
of identical n's (see FIGURE 7).

Rule 1. n-tuple-expression -> n-tuple, operator, n-tuple
   where n = n = n. Any of the n-tuples may however be
   concatenates of two or more n-tuples of smaller n's such that:
Rule 2. n-tuple -> (m)-tuple, concatenator, (n-m)-tuple
   where n and m are positive integers and the arithmetic
   relationship designated obtains.

FIGURE 7.

Of course, the (m)-tuple or the (n-m)-tuple may be further broken 
down by the same rule into further concatenates. 
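The arithmetic carried by the n-tuple counter can be sketched with ordinary Python tuples. The function names are hypothetical, and applying the operator element by element is an assumption of the example; the rules of FIGURE 7 state only that the n's must agree.

```python
from operator import add


def concatenate(*parts):
    """n-tuple -> (m)-tuple, concatenator, (n-m)-tuple, applied repeatedly."""
    result = ()
    for part in parts:
        if len(part) == 0:
            raise ValueError("each concatenated part must have a positive n")
        result += tuple(part)
    return result


def apply_operator(op, left, right):
    """n-tuple-expression -> n-tuple, operator, n-tuple, with equal n on both sides."""
    if len(left) != len(right):
        raise ValueError(f"n mismatch: {len(left)}-tuple and {len(right)}-tuple")
    # Assumption: the operation is applied element by element.
    return tuple(op(a, b) for a, b in zip(left, right))


pair, single, triple = (1, 2), (3,), (10, 20, 30)
# A pair concatenated with a 1-tuple is a triple (m = 2, n - m = 1, n = 3).
print(apply_operator(add, concatenate(pair, single), triple))  # (11, 22, 33)
```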
The above are selected examples rather than an exhaustive 
list of the transformations in the syntax of P.O.L. 2. A rigorous
statement of the transformations is available, stated as mappings
of structural descriptions into structural descriptions, accounting
for the attachment and detachment of nodes. Presenting the 
selection of transformations here in a descriptive rather than a 
rigorous form offers an idea of the general approach. 
Constructing the phrase structure component, many alternative 
solutions or approaches came up at every juncture; in specifying 
the transformational component, the alternatives quickly multiplied 
beyond manageable proportions. It is certainly the case that 
throughout its brief but exciting history, one of the aims of 
transformational theory has been to describe language in terms of 
the most restricted -- hence simplest -- system possible. But one
may well regard the sets of devices so far advanced as parts of
transformational theory, as algorithmic alphabets (in the A.A.
Markov/Martin Davis (5), (15) sense). Specific algorithmic
alphabets are more or less arbitrary selections from some universe
of elementary and compound algorithms bound by formation and
combination rules. This paper is not a proposal toward the
modification, extension or restriction of transformational theory,
merely an indication that an overlapping set of algorithms may be
selected to deal with a similar but not identical problem: the
structural description of some formal notation systems such as
programming languages.
Beyond doubt, substantial simplification and sophistication may 
be achieved over the model described here. The effort here has 
been toward the application of linguistic techniques to artificial 
languages, conforming to the linguist's notion of what it means to
"give an account of the data", rather than to the laxer standards of
the methods used to describe programming languages.

References

1. Backus, J.W. "The Syntax and Semantics of the Proposed
International Algebraic Language of the Zurich ACM-GAMM
Conference", Information Processing; Proceedings of the
International Conference on Information Processing. Paris:
UNESCO, 1960.

2. Cheatham, Jr., T.E. The Introduction of Definitional Facilities
into Higher Level Programming Languages. Draft Report
CA-6605-0611; Wakefield, Mass.: Computer Associates, Inc.,
1966.

3. Cheatham, Jr., T.E. The Theory and Construction of Compilers.
Draft Report CA-6606-0111; Wakefield, Mass.: Computer
Associates, Inc., 1966.

4. Chomsky, Noam. Aspects of the Theory of Syntax. Cambridge,
Mass.: MIT Press, 1965.

5. Chomsky, Noam. "On Certain Formal Properties of
Grammars", Information and Control, 2, (1959), pp. 137-167.

6. Davis, Martin. Computability and Unsolvability. New York:
McGraw-Hill, 1958.

7. Fillmore, C.J. "The Position of Embedding Transformations
in a Grammar", Word, 19, 2, (1963).

8. Gorn, Saul. "Specification Languages for Mechanical Languages
and their Processors -- A Baker's Dozen", Communications of
the ACM, 7, 12, (1961).

9. Heising, W.P. "History and Summary of FORTRAN Standardi-
zation Development for the ASA", Communications of the ACM,
7, 10, (1964).

10. Landweber, P.S. "Three Theorems on Phrase Structure
Grammars of Type I", Information and Control, 6, (1963),
pp. 131-136.

11. Lakoff, G.P. Cycles and Complex Symbols in English Syntax.
Unpublished manuscript, Indiana University, 1963.

12. Lakoff, G.P. Some Constraints on Transformations.
Unpublished manuscript, Indiana University, 1964.

13. Laszlo, S.I. "Report on a Proposed General Purpose Procedure
Oriented Computer Programming Language". Report of the
Institute of Educational Research, Bloomington: Indiana
University, 1965.

14. Laszlo, S.I. "P.O.L., A General Purpose, Procedure
Oriented Computer Programming Language". Report of the
Institute of Educational Research, Bloomington: Indiana
University, 1965.

15. Matthews, P.H. "Problems of Selection in Transformational
Grammar", Journal of Linguistics, 1, (1965).

16. Markov, A.A. Theory of Algorithms. Washington, D.C.:
U.S. Printing Office, 1965.

17. McCarthy, John. "A Basis for a Mathematical Theory of
Computation", in Computer Programming and Formal Systems.
P. Braffort & D. Hirschberg (ed.), Amsterdam: N. Holland
Publishing Co., 1963.

18. McCarthy, John, et al. LISP 1.5 Programmer's Manual.
Cambridge, Mass.: MIT Press, 1962.

19. Naur, Peter (ed.) "Revised Report on the Algorithmic Language
ALGOL 60", Communications of the ACM, reprinted in
E.W. Dijkstra, A Primer of ALGOL Programming. New York:
Academic Press, 1964.
