DEFINING NATURAL LANGUAGE GRAMMARS IN GPSG 
Eric Sven Ristad 
MIT Artificial Intelligence Lab Thinking Machines Corporation 
545 Technology Square and 245 First Street 
Cambridge, MA 02139 Cambridge, MA 02142 
1 Overview 
Three central goals of work in the generalized phrase struc- 
ture grammar (GPSG) linguistic framework, as stated in the 
leading book "Generalized Phrase Structure Grammar" Gaz- 
dar et al (1985) (hereafter GKPS), are: (1) to characterize all 
and only the natural language grammars, (2) to algorithmically 
determine membership and generative power consequences of 
GPSGs, and (3) to embody the universalism of natural lan- 
guage entirely in the formal system, rather than by statements 
made in it. 1 
These pages formally consider whether GPSG's weak context- 
free generative power (wcfgp) will allow it to achieve the three 
goals. The centerpiece of this paper is a proof that it is unde- 
cidable whether an arbitrary GPSG generates the nonnatural 
language ~'. On the basis of this result, I argue that GPSG 
fails to define the natural language grammars, and that the gen- 
erative power consequences of the GPSG framework cannot be 
algorithmically determined, contrary to goals one and two. 2 In 
the process, I examine the linguistic universalism of the GPSG 
formal system and argue that GPSGs can describe an infinite 
class of nonnatural context-free languages. The paper concludes 
with a brief diagnosis of the result and suggests that the problem 
might be met by abandoning the weak context-free generative 
power framework and assuming substantive constraints. 
1.1 The Structure of GPSG Theory 
A generalized phrase structure grammar contains five language- 
particular components (immediate dominance (ID) rules, meta- 
rules, linear precedence (LP) statements, feature co-occurrence 
IGKPS clearly outline their goals. One, uto arrive at a constrained met- 
alanguage capable of defining the grammars of natural languages, but not 
the grammar of, say, the set of prime numbers2(p.4). Two, to construct 
an explicit linguistic theory whose formal consequences are clearly and eas- 
ily determinable. These 'formal consequences' include both the generative 
power consequences demanded by the first goal and membership determi- 
nation: GPSG regards languages "as collections whose membership is def- 
initely and precisely specifiable."(p.1) Three, to define a linguistic theory 
where ~lhe universalism \[of natural language\] is, ultimately, intended to be 
entirely embodied in the formal system, not ezpressed by statements made in 
it.'(p.4, my emphasis) 
2The proof technique make use of invalid computations, and the actual 
GPSG constructed is so simple, so similar to the GPSGs proposed for actual 
natural languages, and so flexible in its exact formulation that the method of 
proof suggests there may be no simple reformulations of GPSG that avoid 
this problem. The proof also suggests that it is impossible in principle 
to algorithmically determine whether linguistic theories based on a wcfgp 
framework (e.g. GPSG) actually define the natural language grammars. 
restrictions (FCRs), and feature specification defaults (FSDs)) 
and four universal components: a theory of syntactic features, 
principles of universal feature instantiation, principles of seman- 
tic interpretation, and formal relationships among various com- 
ponents of the grammar. 3 
The set of ID rules obtained by taking the finite closure 
of the metarules on the ID rules is mapped into local phrase 
structure trees, subject to principles of universal feature instan- 
tiation, FSDs, FCRs, and LP statements. Finally, these local 
trees are assembled to form phrase structure trees, which are 
termmated by lexical elements. 
The essence of GPSG is the constrained mapping of ID rules 
into local trees. The constraints of GPSG theory subdivide 
into absolute constraints on local trees (due to FCRs and LP- 
statements) and relative constraints on the rule to local tree 
mapping (stemming from FSDs and universal feature instan- 
tiation). The absolute constraints are all language-particular, 
and consequently not inherent in the formal GPSG framework. 
Similarly, the relative constraints, of which only universal in- 
stantiation is not explicitly language-particular, do not apply 
to fully specified ID rules and consequently are not strongly in- 
herent in the GPSG framework either. 4 In summary, GPSG 
local trees are only as constrained as ID rules are: that is, not 
at all. 
The only constraint strongly inherent in GPSG theory (when 
compared to context-free grammars (CFGs)) is finite feature 
closure, which limits the number of GPSG nonterminal symbols 
to be finite and bounded. S 
1.2 A Nonnatural GPSG 
Consider the exceedingly simple GPSG for the nonnatural lan- 
guage Z*, consisting solely of the two ID rules 
SThis work is based on current GPSG theory as presented in GKPS. The 
reader is urged to consult that work for a formal presentation and thorough 
exposition of current GPSG theory. 
4I use "strongly inherent" to mean ~unavoidable by virtue of the formal 
framework." Note that the use of problematic feature specifications in 
universal feature instantiation means that this constraint is dependent on 
other, parochial, components (e.g. FCRs). Appropriate choice of FCRs 
or ID rules will abrogate universal feature inetantiation, thus rendering it 
implicitly language particular too. 
5This formal constraint is extremely weak, however, since the theory 
of syntactic features licenses more than 10 TM syntactic categories. See 
Ristad, E.S. (1986), ~Computational Complexity of Current GPSG Theory ~ 
in these proceedings for a discussion. 
40 
S ---* {},H I E 
This G PSG generates local trees with all possible subcategoriza- 
tion specifications -- the SUBCAT feature may assume any value 
in the non-head daughter of the first ID rule, and S generates 
the nonnatural language ~*. 
This exhibit is inconclusive, however. We have only shown 
that GKPS -- and not GPSG -- have failed to achieve the first 
goal of GPSG theory. The exhibition leaves open the possibility 
of trivially reformalizing GPSG or imposing ad-hoc constraints 
on the theory such that I will no longer be able to personally 
construct a GPSG for Z*. 
2 Undecidability and Generative Power 
in GPSG 
That "= Z*?" is undecidable for arbitrary context-free gram- 
mars is a well-known result in the formal language literature 
(see Hopcraft and Ullman(1979:201-203)). The standard proof 
is to construct a PDA that accepts all invalid computations of 
a TM M. From this PDA an equivalent CFG G is directly con- 
structible. Thus, L(G) = ~' if and only if all computations of 
M are invalid, i.e. L(M) = 0. The latter problem is undecid- 
able, so the former must be also. 
No such reduction is possible for a proof that "-- ~*?" is 
undecidable for arbitrary GPSGs. In the above reduction, the 
number of nonterminals in G is a function of the size of the 
simulated TM M. GPSGs, however, have a bounded number 
of nonterminal symbols, and as discussed above, that is the 
essential difference between CFGs and GPSGs. 
Only weak generative power is of interest for the follow- 
ing proof, and the formal GPSG constraints on weak generative 
power are trivially abrogated. For example, exhaustive constant 
partial ordering (ECPO) -- which is a constraint on strong gen- 
erative capacity -- can be done away with for all intents and 
purposes by nonterminal renaming, and constraints arising from 
principles of universal feature instantiation don't apply to fully 
instantiated ID rules. 
First, a proof that "-- ~*?" is undecidable for context-free 
grammars with a very small number of terminal and nonter- 
minal symbols is sketched. Following the proof for CFGs, the 
equivalent proof for GPSGs is outlined. 
2.1 Outline of a Proof for Small CFGs 
Let L(z,~ ) be the class of context-free grammars with at least 
x nonterminal and y terminal symbols. I now sketch a proof 
that it is undecidable of an arbitrary CFG G c L(~,v ) whether 
L(G) = ~* for some x, y greater than fixed lower bounds. The 
actual construction details are of no obvious mathematical or 
pedagogical interest, and will not be included. The idea is 
to directly construct a CFG to generate the invalid computa- 
tions of the Universal Turing Machine (UTM). This grammar 
will be small if the UTM is small. The "smallest UTM" of 
Minsky(1967:276-281) has seven states and a four symbol tape 
alphabet, for a state-symbol product of 28 (!). Hence, it is not 
surprising that the "smallest GUT M" that generates the invalid 
computations of the UTM has seventeen nonterminals and two 
terminals. 
Observe that if a string w is an invalid computation of the 
universal Turing machine M = (Q,\]E, r, 5, q0, B, F) on input x, 
then one of the following conditions must hold. 
1. w has a "syntactic error," that is, w is not of the form 
Xl~g2~''" ~Xm~ , where each xi is an instantaneous de- 
scription (ID) of M. Therefore, some xl is not an ID of 
M. 
2. xl is not initial; that is, Xl ~ q0~* 
3. x,~ is not final; that is xm ~ r*fF* 
4. x~ F-. M (X~+l) R is false for some odd i 
5. (xi) R ~-*M Xi+l is false for some even i 
Straightforward construction of GVTM will result in a CFG 
containing on the order of twenty or thirty nonterminals and 
at least fifteen terminals (one for each UTM state and tape 
symbol, one for the blank-tape symbol, and one for the instan- 
taneous description separator "~'). Then the subgrammars 
which ensure that (xi) R ~-~'M xi+l is false for some even i and 
that x~ ~--~M (xi+l) R is false for some odd i may be cleverly 
combined so that nonterminals encode more information, and 
SO on. 
The final trick, due to Albert Meyer, reduces the terminals 
to 2 at the cost of a lone nonterminal by encoding the n ter- 
minals as log n -- k-bit words over the new terminal alphabet 
{0, 1}, and adding some rules to ensure that the final grammar 
could generate \]E* and not (~4).. The productions 
N4 --* OL41L4 I OOL4 I 01L~ I llL4 I ... 
are added to the converted CFG GtVTM, which generates a 
language of the form 
L4 --* oooo I OOOl \] OOlO I ... I E I L4L4 
Where L4 generates all symbols of length 4, and N4 gener- 
ates all strings not of length 0 rood k, where k = 4 (i.e. all 
strings of length 1,2,3 mod 4). Deeper consideration of the ac- 
tual GUTM reveals that the N4 nonterminal is also eliminable. 
Note that all the preceding efforts to reduce the number of 
nonterminals and terminals increase the number of context-free 
productions. This symbol-production tradeoff becomes clearer 
when one actually constructs GUTM. 
Suppose the distinguished start symbol for GVTM is SUTM. 
Then we form a new CFG consisting Of all productions of the 
form 
41 
S ---* {Q - q0}{E p - (M}}{N4 U L4} 
and the one production 
S ---* SUT M 
where (M} is the length p encoding of an arbitrary TM M, 
and L4, N4 are as defined above. 
This ensures that strings whose prefix is "q0(M)" will be 
generated starting from S if and only if they are generated start- 
ing from SVrM: that is, they are invalid computations of the 
UTM on M. 
2.2 Some Details for Lc~,v ) and GPSG 
Let the nonterminal symbols F, Q, and E in the following CFG 
portion generate the obvious terminal symbols corresponding to 
the equivalent UTM sets. B is the terminal blank symbol. 
Then, the following sketched CF productions generate the 
IDs of M such that zi ~---~M (Xi+l) R is false for some odd i. 
The $4 and $5 nonterminals are used to locate the even and 
odd i IDs zi of w. Sok generates the language {F t_J #}*. 
s4 -~ rs4 I #s5 I #SoddSok 
S5 -~ rs5 I#s4 I #s,.,.Sok 
$odd -~ Sl# 
Sl ~ rs~r I s2 I s6l s7 
Ss -~ rs~ \[ rs3 
s7 -, srr I ssr 
$2 --* EaESzFbF 
where a # b, both in E 
s~ -. aqbSa{r s - pca} if 8(q, b) = (p, c, R) 
aqbSs{r s - cap} if 8(q,b) = (p,c,L) 
S2 --* aqB#B{r s - pca} if 8(q, B) = (p,c, R) 
aqB#B{r 3 - cap} if 8(q, B) = (p, c, L) 
s3 -. rs~r I QB#Brr I ZB#Br 
$1 and $2 must generate a false transition for odd i, while Sz 
need not generate a false transition and is used to pad out the 
IDs of w. The nonterminals Se,S7 accept IDs with improperly 
different tape lengths. The first $2 production accepts transi- 
tions where the tape contents differ in a bad place, the second $2 
production accepts invalid transitions other than at the end of 
the tape, and the third $2 accepts invalid end of the tape transi- 
tions. Note that the last two $2 productions are actually classes 
of productions, one for each string in F 3 -pca, F 3 - cap,.... 
The GPSG for "= E*?" is constructed in a virtually iden- 
tical fashion. Recall that the GPSG formal framework does not 
bar us from constructing a grammar equivalent to the CFG just 
presented. The ID rules used in the construction will be fully 
specified so as to defeat universal feature instantiation, and the 
construction will use nonterminal renaming to avoid ECPO. 
Let the GPSG category C be fully specified for all features 
(the actual values don't matter) with the exception of, say, the 
binary features GER, NEG. NULL and POSS. Arrange those four 
features in some canonical order, and let binary strings of length 
four represent the values assigned to those features in a given 
category. For example, C\[0100\] represents the category C with 
the additional specifications (\[-GER\], \[+NEG\], \[-NULL\], \[- 
POSS\]). We replace Soda by C\[0000\], S1 by C\[0001\], $2 by 
C\[0010\], $3 by C\[0011\], $6 by C\[0100\], and Sr by C\[0101\]. The 
nonterminal r is replaced by three symbols of the form C\[1 l xx\], 
one for each linear precedence r conforms too. Similarly, Y. is 
replaced by two symbols of the form C\[100x\]. The ID rules, in 
the same order as the CF productions above (with a portion of 
the necessary LP statements) are: 
c\[oooo\] -~ c\[oool\]# 
C\[0001\] -* C\[llO0\]C\[O001\]C\[llO1\]{C\[O010\]\[C\[0100\]\]C\[OIO1\] 
C\[OIO0\]--* C\[llO0\]C\[OIO0\] I C\[llO0\]C\[O011 \] 
cIolol\] -~ C\[OlOl\]C\[llOlltC\[oonlc\[llOl\] 
c\[oolo\] -~ C\[10001aC\[lOO1\]C\[OOn\]C\[XXO1\]bC\[U101 
where a ~ b, both in E 
C\[0010\] ~ aqbC\[00u\]{r ~- pca} if6(q,b) = (p,c,R) 
aqbC\[oon\]{r 3 - cap} if 8(q,b) = (p,c,L) 
C\[0010\] --* aqB#B{r s -pca} if 8(q, B) = (p, c, R) 
aqB#B{r 3 - cap} if 8(q,B) = (p,c,L) 
C\[0011\] -~ C\[1100\]C\[0011\]C\[1101\] \] 
QB#BC\[llO0\]C\[ll01\] I 
C \[1000\] B# BC \[1100\] 
C\[ll00\] < C\[O001\],C\[O011\],C\[OIO0\],C\[OIO1\] < C\[ll01\] 
C\[I000\] < a < C\[1001\] < C\[0011\] < C\[1110\] 
While the sketched ID rules are not valid GPSG rules, just 
as the sketched context-free productions were not the valid com- 
ponents of a context-free grammar, a valid GPSG can be con- 
structed in a straightforward and obvious manner from the 
sketched ID rules. There would be no metarules, FCRs or FSDs 
in the actual grammar. 
The last comment to be made is that in the actual GUTM, 
only the number of productions is a function of the size of the 
UTM. The UTM is used only as a convincing crutch -- i.e. not 
at all. Only a small, fixed number of nonterminals are needed to 
construct a CFG for the invalid computations of any arbitrary 
Turing Machine. 
3 Interpreting the Result 
The preceding pages have shown that the extremely simple non- 
natural language ~* is generated by a GPSG, as is the more 
complex language Llc consisting of the invalid computations of 
an arbitrary Turing machine on an arbitrary input. Because 
42 
Llc is a GPSG language, "= E'?" is undecidable for GPSGs: 
there is no algorithmic way of knowing whether any given GPSG 
generates a natural language or an unnatural one. So, for ex- 
ample, no algorithm can tell us whether the English GPSG of 
GKPS really generates English or ~*. 
The result suggests that goals 1, 2, 3 and the context-free 
framework conflict with each other. Weak context-free gener- 
ative power allows both ~* and Lie, yet by goal 1 we must 
exclude nonnatural languages. Goal 2 demands it be possi- 
ble to algorithmically determine whether a given GPSG gener- 
ates a desired language or not, yet this cannot be done in the 
context-free framework. Lastly, goal 3 requires that all nonnat- 
ural languages be excluded on the basis of the formal system 
alone, but this looks to be impossible given the other two goals, 
the adopted framework, and the technical vagueness of "natural 
language grammar." 
The problem can be met in part by abandoning the context- 
free framework. Other authors have argued that natural lan- 
guage is not context-free, and here we argue that the GPSG 
theory of GKPS can characterize context-free languages that 
are too simple or trivial to be natural, e.g. any finite or reg- 
ular language. 6 The context-free framework is both too weak 
and too strong -- it includes nonnatural languages and excludes 
natural ones. Moreover, CFL's have the wrong formal proper- 
ties entirely: natural language is surely not closed under union, 
concatenation, Kleene closure, substitution, or intersection with 
regular sets! 7 In short, the context-free framework is the wrong 
idea completely, and this is to be expected: why should the ar- 
bitrary generative power classifications of mathematics (formal 
language theory) be at all relevant to biology (human language)? 
Goal 2, that the naturalness of grammars postulated by 
linguistic theory be decidable, and to a lesser extent goal 3, 
are of dubious merit. In my view, substantive constraints aris- 
ing from psychology, biology or even physics may be freely in- 
voked, with a corresponding change in the meaning of "natural 
language grammar" from "mentally-representable grammar" to 
something like "easily learnable and speakable mentally-representab\]£ 
grammar." There is no a priori reason or empirical evidence to 
suggest that the class of mentally representable grammars is not 
fantastically complex, maybe not even decidable, s 
One promising restriction in this regard, which if properly 
formulated would alleviate GPSG's actual and formal inability 
to characterize only the natural language grammars, is strong 
nativism -- the restrictive theory that the class of natural lan- 
eWhile 'natural language grammar' is not defined precisely, recent work 
has demonstrated empirically that natural language is not context-free, and 
therefore GPSG theory will not be able to characterize all the human lan- 
guage grammars. See, for example, Higglnbotham(1984), Shieber(1985), 
and Culy(1985). For counterarguments, see Pullum(1985). Nash(1980), 
chapter 5, discusses the impossibility of accounting for free word order lan- 
guages (e.g. Warlplrl) using ID/LP grammars. I focus on the goal of 
characterizing only the natural language grammars in this paper. 
VThe finite, bounded number of nonterminals allowed in GPSG theory 
plays a linguistic role in this regard, because the direct consequence of finite 
feature closure is that GPSG languages are not truly closed under union, 
concatenation, or substitution. 
8See Chomsky(1980:120) for a discussion. 
guages is finite. This restriction is well motivated both by the 
issues raised here and by other empirical considerations. ° The 
restriction, which may be substantive or purely formal, is a for- 
mal attack on the heart of the result: the theory of undecidabil- 
ity is concerned with the existence or nonexistence of algorithms 
for solving problems with an infinity of instances. Furthermore, 
the restriction may be empirically plausible, l°'xl 
The author does not have a clear idea how GPSG might be 
restricted in this manner, and merely suggests strong nativism 
as a well-motivated direction for future GPSG research. 
Acknowledgments. The author is indebted to Ed Barton, 
Robert Berwick, Noam Chomsky, Jim Higginbotham, Richard 
Larson, Albert Meyer, and David Waltz for assistance in writ- 
ing this paper, and to the MIT Artificial Intelligence Lab and 
Thinking Machines Corporation for supporting this research. 
4 References 
Chomsky, N. (1980) Rules and Representations. New York: 
Columbia University Press. 
Gasdar, G., E. Klein, G. Pullum, and I. Sag (1985) General- 
ized Phrase Structure Grammar. Oxford, England: Basil 
Blackwell. 
Higginbotham, J. (1984) "English is not a Context-Free Lan- 
guage," Linguistic Inquiry 15: 119-126. 
~Note that invoking finiteness here is technically different from hiding 
intractability with finiteness. Finiteness is the correct generalization here, 
because we are interested in whether GPSG generates nonnatural languages 
or not, and not in the computational cost of determining the generative 
capacity of an arbitrary GPSG. A finiteness restriction for the purposes of 
computational complexity is invalid because it prevents us from properly 
using the tools of complexity theory to study the computational complexity 
of a problem. 
l°See Osherson et. el. (1984) for an exposition of strong nativism and 
related issues. The theory of strong nativism can be derived in formal 
learning theory from three empirically motivated axioms: (1) the ability of 
language learners to learn in noisy environments, (2) language learner mem- 
ory limitations (e.g. inability to remember long-past utterances), and (3) 
the likelihood that language learners choose simple grammars over more 
complex, equivalent ones. These formal results are weaker empirically 
than they might appear at first glance: the equivalence of Ulearned~ gram- 
mars is measured using only weak generative capacity, ignoring uniformity 
considerations. 
llAn alternate substantive constraint, suggested by Higginbotham (per- 
sonal communication) and not explored here, is to require natural language 
grammars to generate non-dense languages. Let the density of a class of lan- 
guages be an upper bound (across all languages in the class) on the ratio 
of grammatical utterances to grammatical and ungrammatical utterances, 
in terms of utterance lengths. If the density of natural languages was small 
or even logarithmic in utterance length, as one might expect, and a decid- 
able property of the reformulated GPSG's, then undecidability of "= \]~*?n 
would no longer reflect on the decidability of whether the GPSG framework 
characterized all and only the natural language grammars. The exact spec- 
ification of this density constraint is tricky because unit density decides 
"= IE'?" , and therefore density measurements cannot be too accurate. 
Furthermore, ~* and Lic can be buried in other languages, i.e. concate- 
nated onto the end of an arbitrary (finite or infinite) language, weakening 
the accuracy and relevance of density measurements. 
43 
Hopcroft, J.E., and J.D. Ullman (1979) Introduction to Au- 
tomata Theory, Languages, and Computation. Reading, 
M.A: Addiso~a- Wesley. 
Minsky, M. (1967) Computation: Finite and Infinite Machines. 
Englewood Cliffs, N.J: Prentice-Hall. 
Nash, D. (1980) "Topics in Warlpiri Grammars," M.I.T. De- 
partment of Linguistics and Philosophy Ph.D dissertation, 
Cambridge. 
Osherson, D., M. Stob, and S. Weinstein (1984) "Learning The- 
ory and Natural Language," Cognition 17: 1-28. 
Pullum, G.K. (1985) "On Two Recent Attempts to Show that 
English is Not a CFL," Computational Linguistics 10: 182- 
186. 
Shieber, S.M. (1985) "Evidence Against the Context-Freeness of 
Natural Language," Linguistics and Philosophy 8: 333-344. 
44 
