REVISED GENERALIZED PHRASE STRUCTURE GRAMMAR 
Eric Sven Ristad†
M.I.T. Artificial Intelligence Lab 
545 Technology Square, 805 
Cambridge, MA 02139 
Thinking Machines Corporation 
245 First Street 
Cambridge, MA 02142 
ABSTRACT 
In this paper, I revise generalized phrase structure grammar 
(GPSG) linguistic theory so that it is more tractable and linguis- 
tically constrained. Revised GPSG is also easier to understand, 
use, and implement. I provide an account of topicalization, ex- 
plicative pronouns, and parasitic gaps in the revised system and 
conclude with suggestions for efficient parser design. 
1 Introduction and Motivation 
A linguistic theory specifies a computational process that assigns 
structural descriptions to utterances. This process requires cer- 
tain computational resources, such as time or space. In a descrip- 
tively adequate linguistic theory, the computational resources 
available to the theory match those used by the ideal speaker- 
hearer. The goal of this paper is to revise generalized phrase 
structure grammar (GPSG) so that its computational power cor- 
responds to the ability of the speaker-hearer. 
The bulk of this paper is devoted to identifying what com- 
putational resources are used by GPSG theory, and deciding 
whether they are linguistically necessary. GPSG contains five 
formal devices, each of which provides the theory with the re- 
sources to model some linguistic phenomenon or ability. I iden- 
tify those aspects of each device that cause intractability and then 
restrict the computational power of each device to more closely 
match the (inherent) complexity of the phenomenon or ability 
it models. The remainder of the paper presents the new formal 
system and exercises it in the domain of topicalization, explica- 
tive pronouns, and parasitic gaps. I conclude with suggestions 
for efficient parser design and future research. 
In my opinion, the primary value of this work lies both in the result (revised GPSG, or RGPSG) and in the methodology of using complexity analysis to improve linguistic theories. The methodology shows how a tool of modern computer science can help us understand and improve theories of linguistic competence. Moreover, complexity analysis forms the foundation of informed parser design. I believe RGPSG is of value both to linguists and to computational linguists because it is more tractable and easier to understand, use, and implement. It can be efficiently implemented and appears to have better empirical coverage than its GPSG ancestor.
†The author is supported by a graduate fellowship from the IBM Corporation. This research was supported in part by Thinking Machines Corporation and by NSF Grant DCR-85552543, under a Presidential Young Investigator Award to Professor Robert C. Berwick. I wish to thank Ed Barton for stylistic improvements and helpful discussion; Robert Berwick for support, criticism, and suggesting I pursue this research; and Geoff Pullum for his patient help with GPSG theory.
2 Eliminating Intractability in GPSG 
Ristad (1986a) examines the computational complexity of two components of the GPSG formal system (metarules and the feature system) and shows how each of these systems can lead to computational intractability. Ristad also proves that the universal recognition problem for GPSGs is EXP-POLY hard, and hence intractable.² In other words, the fastest recognition algorithm for GPSGs can take more than exponential time.
These results may appear surprising, given GPSG's weak 
context-free generative power. They also raise some important
computational and linguistic questions: why GPSG-Recognition 
is so difficult, what aspects of the GPSG formalisms cause in- 
tractability, and whether they are linguistically necessary. I be- 
gin with an outline of the GPSG formal system, as presented in 
Gazdar, Klein, Pullum, and Sag (1985), GKPS hereafter. Sub- 
sequently, I identify and remove the excess computational power 
provided by each formal device. 
2.1 Overview of GPSG Formalisms 
From the perspective of classic formal language theory, a GPSG may be thought of as a grammar for generating a context-free grammar. The generation process begins with immediate dominance (ID) rules, which are context-free productions with unordered right-hand sides. An important feature of ID rules is that nonterminals in the rules are not atomic symbols (for example, NP). Rather, GPSG nonterminals are sets of [feature, feature-value] pairs. For example, [N +] is a [feature, feature-value] pair, and the set {[N +], [V -], [BAR 2]} is the GPSG representation of a noun phrase. Next, metarules apply to the ID rules, resulting in an enlarged set of ID rules. Metarules have fixed input and output patterns containing a distinguished multiset variable W in addition to constants. If an ID rule matches the input pattern under some specialization of the variable W, then the metarule generates an ID rule corresponding to the metarule's output pattern under the same specialization of W. For example, the passive metarule

    VP → W, NP
        ⇓                                   (1)
    VP[PAS] → W, (PP[by])

says that "for every ID rule in the grammar which permits a VP to dominate an NP and some other material, there is also a rule in the grammar which permits the passive category VP[PAS] to dominate just the other material from the original rule, together (optionally) with a PP[by]" (GKPS:59). In Ristad (1986a), the finite closure problem is used to determine the cost of metarule application. Principles of universal feature instantiation (UFI) apply to the resulting enlarged set of ID rules, defining a set of phrase structure trees of depth one (local trees). One principle of UFI is the head feature convention, which ensures that phrases are projected from lexical heads. Informally, the head feature convention is GPSG's X-bar theory. Ristad (1986a) uses the category membership problem to determine, in part, the cost of mapping ID rules to local trees. Finally, linear precedence (LP) statements are applied to the instantiated local trees. LP statements order the unordered daughters in the instantiated local trees. The ultimate result, therefore, is a set of ordered local trees, and these are equivalent to the context-free productions in a context-free grammar. The resulting context-free grammar derives the language of the GPSG.

²The universal recognition problem most accurately reflects the difficulty of processing a grammatical formalism because it incorporates the grammar in the problem statement, as explained in Barton, Berwick, and Ristad (1987).
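To make the role of the multiset variable W concrete, here is a minimal Python sketch of the passive metarule (1). It assumes a hypothetical encoding in which categories are plain strings and an ID rule is a pair of a mother and a `Counter` of daughters, so W is simply whatever remains of the multiset after one NP is removed. Real GPSG matching works on feature-based categories and their extensions, which this sketch ignores.

```python
from collections import Counter

# Hypothetical encoding for illustration: categories are plain strings and an
# ID rule is (mother, Counter of daughters), i.e. an unordered multiset RHS.

def apply_passive(rule):
    """Sketch of the passive metarule (1):  VP -> W, NP  =>  VP[PAS] -> W, (PP[by]).

    W is specialized to whatever daughters remain after removing one NP.
    Returns the list of ID rules generated from `rule` (empty if no match).
    """
    mother, daughters = rule
    if mother != "VP" or daughters["NP"] == 0:
        return []  # the input pattern does not match this rule
    w = daughters - Counter({"NP": 1})  # the specialization of W
    # The optional PP[by] is expanded into two output rules.
    return [("VP[PAS]", w), ("VP[PAS]", w + Counter({"PP[by]": 1}))]
```

For the transitive rule VP → H[2], NP, the sketch yields VP[PAS] → H[2] and VP[PAS] → H[2], PP[by].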
The process of assigning structural descriptions to utterances 
consists of two steps in GPSG: the projection of ID rules to local 
trees and the derivation of utterances from nonterminals, using 
the local trees. Accordingly, formal devices may supply resources 
to either process. 
2.2 Theory of Syntactic Features 
In current GPSG theory, syntactic categories (nonterminals) encode linguistic relations as feature-value pairs. If a relation holds between two categories in a phrase structure tree, then the relation will be encoded in every category on the unique path between the two categories. The primary computational resource provided by the theory of syntactic features is polynomial space, largely because of the vast number of possible syntactic categories arising from finite feature closure. Ristad (1986a) observes that finite feature closure admits a surprisingly large number of possible categories: $O(3^{a \cdot b!})$, where a is the number of atomic-valued features and b the number of category-valued features. In fact, there are more than $10^{775}$ categories in the GKPS system.
Fortunately, the full power of embedded categories does not appear to be linguistically necessary, because no category-valued feature need ever contain another.³ In GPSG, there are three category-valued features: SLASH, which marks the path between a gap and its filler with the category of the filler; AGR, which marks the path between an argument and the functor that syntactically agrees with it (between the subject and matrix verb, for example); and WH, which marks the path between a wh-word and the minimal clause that contains it with the morphological type of the wh-word. AGR will never contain SLASH because a functor (verb or predicate) will never select a gap or a constituent containing a gap as its argument. Conversely, SLASH will never be required to contain AGR because such a category corresponds to "the following imaginary (and rather weird) case: Suppose we found a language in which finite verb phrases could be fronted over an unbounded domain provided that they were in the agreement form associated with third-person-singular NP controllers" (Pullum, personal communication). Similarly, because the value of WH is the category of a wh- noun phrase, and because wh- nominals never contain gaps, WH can never contain SLASH or AGR. In point of fact, no category embeddings appear in the GKPS grammar for English, and it is difficult to see how they would appear in a GPSG for any other natural language.

³Let f and g be any distinct category-valued features. I am arguing that although f may appear inside g in some language, f will never be required to appear inside g.
The obvious revision, then, is unit feature closure: to limit category-valued features to containing only 0-level categories (0-level categories do not contain any category-valued features). I adopt this strongly falsifiable constraint in RGPSG. The depth of category embedding is purely an empirical issue, and hence unit closure is not ad hoc. The other revision is primarily notational: any RGPSG feature f may assume the distinguished values noBind or unbound in addition to those values determined by ρ(f). A noBind value indicates that the feature may not receive a value in an extension of the given category, while unbound indicates that the feature does not currently have a value, and may receive one in extension.
2.3 Immediate Dominance/Linear Precedence 
GPSG's ID/LP format models certain word order phenomena, 
such as the head parameter and some free word order facts. An 
ID rule is a context-free production 
C0 → C1, C2, ..., Ck
whose left-hand side (LHS) is the mother category and whose
right-hand side (RHS) is an unordered multiset of daughter categories, some of which may be designated as head daughters. The
LHS immediately dominates the unordered RHS in a tree of depth 
one (a local tree). 
2.3.1 Complexity in ID/LP 
ID rules significantly increase the time resources available to the GPSG derivation process in four related ways. First, a derivation step is nondeterministic because a category may immediately dominate more than one RHS. Second, the derivation process may alternate between a derivation step involving the ID rules C → C1 | ... | Ck that corresponds to an OR-transition (only one of k possible successors must yield a terminal string) and a derivation step involving an ID rule C → C1, C2, ..., Ck that corresponds to an AND-transition (all k successors must yield terminal strings). These two devices introduce lexical and structural ambiguity. As is well known, ambiguity is a central property of natural languages. Therefore, I consider this aspect of ID rules linguistically essential, and it will be retained in RGPSG.
Third, unrestricted null transitions in ID rules are a source of 
intractability because they allow GPSGs to generate enormous 
phrase structure trees whose yield is the empty string (see Ristad, 
1986a). Thus, a parser that uses such a grammar must nondeterministically postulate elaborate phrase structure in between
its input tokens. The indisputable unnaturalness of this ability 
motivates me to greatly restrict null transitions in RGPSG. 
Fourth, the multiset RHS of an ID rule contributes to a large space of local phrase structure trees: an ID rule with a RHS of cardinality b can, if unconstrained by LP statements, correspond to b! ordered productions. In parsing practice, this can cause a combinatorial explosion in a context-free parser's state space (see Barton, 1985). In addition to causing nondeterminism in any GPSG-based parser, the multiset RHS confers on GPSG the ability to count nonterminals. The apparent artificiality of this device, as discussed in Barton, Berwick, and Ristad (1987:260-261), will motivate me to adopt a substantive constraint of short ID rules in RGPSG (binary branching, for example).⁴
2.3.2 Revised ID/LP 
RGPSG ID rules have exactly one mother and at least one head daughter. The heads are separated notationally from the nonheads by a colon, and appear to the left of the colon. The mother and all head daughters are implicitly specified for [NULL -]. For example, the RGPSG headed ID rule (2) corresponds to the GPSG ID rule (3).

    VP → [SUBCAT 2] : NP                     (2)
    VP[NULL -] → H[SUBCAT 2, NULL -], NP     (3)

There is only one lexical element for the null string, and it is universal across all grammars:

    X2ᵢ[SLASH X2ᵢ, NULL +] → ε

Co-subscripting indicates that the two X2 categories must be identical in any legal projection of the rule, with the exception of the [NULL +] and SLASH specifications. This restricted ID rule format, when coupled with a restriction on metarules that prevents them from affecting head daughters, prevents head daughters from ever being erased in an RGPSG derivation. Thus, null transitions are effectively eliminated from RGPSG.
An ordered production is an ID rule whose daughters are completely linearly ordered, that is, a string of daughter categories rather than a multiset of head and nonhead daughters. An ordered production is LP-acceptable if all LP statements in the RGPSG are true of it.
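The following sketch enumerates the LP-acceptable ordered productions of a single ID rule, which also makes visible the b! growth discussed in section 2.3.1. As a simplification, it assumes daughters are atomic labels and that each LP statement is a pair (a, b) meaning "a precedes b"; actual GPSG LP statements range over partially specified categories.

```python
from itertools import permutations

def violates(order, a, b):
    """True if some b occurs before some a in `order`, violating a < b."""
    return any(order[i] == b and a in order[i + 1:] for i in range(len(order)))

def lp_acceptable_orders(daughters, lp):
    """Yield each distinct ordering of `daughters` that satisfies every LP
    statement in `lp` (a set of (a, b) pairs meaning a precedes b)."""
    seen = set()
    for order in permutations(daughters):  # up to b! candidate orders
        if order in seen:
            continue  # repeated labels produce duplicate permutations
        seen.add(order)
        if not any(violates(order, a, b) for a, b in lp):
            yield order
```

With no LP statements, b distinct daughters yield all b! orders; each LP statement prunes that space.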
The RGPSG ID/LP formalism does not contain formal constraints sufficient to guarantee polynomial-time recognition, although the linguistically justified use of short ID rules can render ID rules tractable, because ID/LP grammars with bounded rules can be parsed in time polynomial in the grammar size.⁵
2.4 Metarules 
Metarules are lexical redundancy rules. Formally, they are functions that take lexical ID rules (ID rules with a lexical head) to sets of lexical ID rules. See the GKPS passive metarule above. The GKPS grammar for English also includes metarules for subject-aux inversion, extraposition, and transitivity alternations. The complete set of ID rules in a GPSG is the maximal set that can be arrived at by taking each metarule and applying it to the set of rules that did not themselves arise from the application of that metarule. This maximal set is called the finite closure FC(M, R) of a set R of lexical ID rules under a set M of metarules.

⁴The binary branching constraint is independently motivated by the linguistic arguments of Kayne (1981) and others. In that work, Kayne argues that the path from a governed category to its governor (for example, from an anaphor to its antecedent) must be unambiguous. Informally put, "an unambiguous path is a path such that, in tracing it out, one is never forced to make a choice between two (or more) unused branches, both pointing in the same direction" (Kayne 1981:146). The unambiguous path requirement sharply constrains fan-out in phrase structure trees because n-ary branching, for n > 2, is only possible when none of the n sister nodes must govern any other nodes in the phrase structure tree.

⁵If the length bound for natural language grammars is the constant b, then any ID/LP grammar G can be converted into a strongly equivalent CFG G′ of size $O(|G| \cdot b!) = O(|G|)$ by simply expanding out the constant number of linear precedence possibilities. In the GKPS and RGPSG grammars for English, b = 3 because double object constructions ([give NP NP], for example) are assigned a flat, ternary branching structure. (I ignore the iterating coordination schema, which licenses rules with unbounded right-hand sides.) It is important, however, that the short rules reflect a genuine constraint and that the grammar does not use some other mechanism to get the effect of longer rules (feature instantiation, for example).
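The finite closure definition can be sketched as a breadth-first search, under the reading that each metarule applies at most once along any derivation. Rules are any hashable values and metarules are functions returning lists of new rules. The `max_depth` parameter is hypothetical: setting it to 1 or 2 approximates the unit closure and biclosure restrictions mentioned in this paper, on the assumption that those restrictions bound the number of metarules in a derivation.

```python
from collections import deque

def finite_closure(rules, metarules, max_depth=None):
    """Sketch of FC(M, R): close `rules` under `metarules`, applying each
    metarule at most once along any derivation.

    rules:     iterable of hashable ID rules (R).
    metarules: dict mapping a name to a function rule -> list of new rules (M).
    max_depth: hypothetical cap on the number of metarules in a derivation
               (1 ~ unit closure, 2 ~ biclosure, None ~ finite closure).
    """
    # A search state pairs a rule with the metarules used to derive it.
    seen = {(r, frozenset()) for r in rules}
    queue = deque(seen)
    while queue:
        rule, used = queue.popleft()
        if max_depth is not None and len(used) >= max_depth:
            continue
        for name, metarule in metarules.items():
            if name in used:
                continue  # skip metarules already used in this rule's derivation
            for new in metarule(rule):
                state = (new, used | {name})
                if state not in seen:
                    seen.add(state)
                    queue.append(state)
    return {rule for rule, _ in seen}
```

Since every derivation step adds a new metarule name, the search always terminates; the number of states it can visit, however, grows with the number of ordered metarule sequences.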
2.4.1 Complexity of Metarules 
Metarules can increase the time and space resources available to 
the derivation process by introducing null transitions and ambi- 
guity in ID rules and by increasing the space of ID rules more 
than exponentially. They can also increase the cost of the projection process itself: finite closure is nondeterministic (NP-hard, in fact) because metarules are applied to ID rules nondeterministically.
2.4.2 Revised Metarules 
Unrestricted null transitions are both linguistically and computationally undesirable. Moreover, the ability of metarules to affect lexical head daughters is in direct conflict with their linguistic purpose: "to express generalizations about the subcategorization possibilities of lexical heads" (GKPS:59). Unrestricted metarules can destroy the relation between a phrase and its lexical head, and thereby violate X-bar theory. The first step in revising metarules is to restrict them to affect only nonhead daughters in lexical ID rules. Because of this change, metarules cannot alter the implicit [NULL -] specification on the head daughters. Therefore, once a category is expanded in a derivation, it must be lexically realized in the derived string. This formal constraint ensures that the empty string does not have elaborate phrase structure in RGPSG.
Metarule finite closure generates many linguistically incorrect 
ID rules that must be excluded by other GPSG devices (FCRs, 
for example). The GKPS grammar for English contains six meta- 
rules; out of approximately 1944 possible metarule interactions 
in principle, only two such interactions appear to be productive 
(passive followed by subject-aux inversion or slash termination 
metarule 1).⁶ Therefore, the second metarule restriction adopted
by RGPSG is biclosure, instead of finite closure.⁷
⁶Given a set of n metarules, the number of possible metarule interactions is the number of ways to pick n or fewer metarules from the set, where order matters and repetitions are not allowed. That number is given by the total number of possible k-selections from the n metarules, where k varies from 0 (no metarules apply) to n (all n metarules apply, in some order). Thus, the number of possible interactions f(n) is $f(n) = \sum_{k=0}^{n} \frac{n!}{(n-k)!} \approx n! \cdot e$. This is not the size of metarule finite closure, because it does not consider the possibility of a metarule matching an ID rule in more than one way.

⁷Metarule biclosure does not overgenerate as badly as finite closure, and thereby promotes descriptive adequacy at the expense of some explanatory power. Biclosure has an edge in descriptive economy (explanatory power) over unit closure because simpler (and fewer) metarules are needed with biclosure. Thus, the length of metarule derivations is not totally ad hoc, because it is subject to a scientific criterion.
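The count in footnote 6 can be checked numerically. This sketch computes f(n) as a sum of ordered selections using the standard-library `math.perm` (Python 3.8+), alongside the n!·e approximation.

```python
from math import e, factorial, perm

def interactions(n):
    """f(n) from footnote 6: ordered selections of k distinct metarules out
    of n, summed over k = 0..n, i.e. the sum of n!/(n-k)!."""
    return sum(perm(n, k) for k in range(n + 1))

def estimate(n):
    """The n! * e approximation given in footnote 6."""
    return e * factorial(n)
```

For n = 6, `interactions(6)` is 1957 and `estimate(6)` is about 1957.2; for n ≥ 1 the exact sum equals the integer part of n!·e.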
2.5 Principles of Universal Feature Instantiation 
The ID rules obtained by taking the finite closure of the metarules on the ID rules are projected to local phrase structure trees.
Abstractly, this process establishes the connection between those 
relations encoded in ID rules (for example, domination, subcate- 
gorization, case, modification, and predication) and the nonlocal 
linguistic relations. Local trees are projected from ID rules by 
mapping the categories in a rule into legal extensions of those 
categories in the projected local tree. 
Principles of universal feature instantiation (UFI) constrain this projection by requiring categories in a local tree to agree in certain feature specifications when it is possible for them to do so. For example, the head feature convention (HFC) requires the mother to agree with all head features that the head daughters agree on, if agreement is possible. The HFC expresses X-bar theory in part, requiring a phrase to be the projection of its head. It also plays a central role in the GPSG account of coordination phenomena, requiring the conjuncts in a coordinate structure to all participate in the same linguistic relations with the rest of the sentence. The two other principles of UFI are the control agreement principle and the foot feature principle. The control agreement principle represents the GPSG theory of predicate-argument relations; informally, it requires predicates to agree with their arguments (for example, verb phrases must agree with their subject NPs in English). The foot feature principle provides a partial account of gap-filler relations in the GPSG system, including parasitic gaps and the binding facts of reflexive and reciprocal pronouns; it plays a role strikingly similar to that of Pesetsky's (1982) path theory and Chomsky's (1986) binding and chain theories.⁸ Informally, the foot feature principle ensures that certain syntactic information is not lost. "Exceptional" feature specifications are those feature specifications in an ID rule that should agree by virtue of a principle of UFI, but are unable to without changing a feature specification inherited from the ID rule.
2.5.1 Complexity of UFI
The three principles of UFI all cause intractability because they provide the derivation process with reusable space resources. First, each principle of UFI can enforce nonlocal feature agreement in phrase structure. Ristad (1986b) shows how this causes NP-hardness when coupled with lexical ambiguity or null transitions. A related source of intractability is that the projection of ID rules to local trees can create an astronomical space of local trees, which in turn increases parser search space. These two sources of intractability cannot be eliminated because they are essential to GPSG's account of linguistic agreement among conjuncts and between predicates and their arguments, gaps and their fillers, and phrases and their lexical heads.

⁸The possibility of expressing the control agreement and foot feature principles as local constraints on nonlocal relations falls out from the central role of c-command, or equivalently unambiguous paths, in binding theory. C-command is a local relation, in fact the primary source of locality in phrase structure (see Berwick and Wexler 1982). Similarly, the possibility of encoding multiple gap-filler relations in one feature specification of one category corresponds to the "no crossing" constraint of path theory. Pesetsky (1982:556) compares the predictions of path theory and principles of UFI when the two diverge in cases of double extraction (for example, a problem_j that I know who_i to talk to e_i about e_j) from coordinate structures. He concludes that "the apparent simplicity of the slash category solution fades when more complex cases are considered."
The use of exceptional feature specifications in these princi- 
ples allows a derivation to reuse the space resources provided by 
the ID rules and theory of syntactic features. In the reduction 
of Ristad (1986a), head features encode an alternating Turing 
machine tape. The HFC is used to transfer the tape contents 
for an ATM configuration C0 (represented by the mother) to its immediate successors C1, C2, ..., Ck (the head daughters). The configurations C0, C1, ..., Ck have identical tapes, with the critical exception of one tape square. If the HFC enforced absolute
agreement between the head features of the mother and head 
daughters, the polynomial space ATM computation could not be 
simulated in this manner. 
2.5.2 Universal Feature Instantiation in RGPSG 
Principles of universal feature instantiation in RGPSG all pre- 
serve a simple invariant across all ID rules. They are mono- 
tonic; that is, they never delete or alter existing feature spec- 
ifications. The head feature convention, for example, ensures 
that the mother agrees exactly with all head feature specifica- 
tions that the head daughters agree on, regardless of where the 
specifications come from. 
Principles of UFI are first applied to the ID rule output of 
metarule unit closure. After this initial application, each princi- 
ple always applies, governing the well-formedness of the ID rule 
extension relation. The resulting ID rules derive utterances in 
the language generated by the RGPSG. 
Head feature convention. The head feature convention en- 
forces the invariant that the mother is in absolute agreement 
with all head features on which the head daughters agree. It 
also requires the BAR value on a head daughter to be less than or 
equal to the BAR value on the mother. HEAD contains exactly 
those features that must be equivalent on the mother and head 
daughters of every ID rule.⁹

HEAD = {AGR, ADV, AUX, INV, LOC, N, NFORM, PAS, PAST, PER, PFORM, PLU, PRD, V, VFORM}
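A sketch of the RGPSG head feature convention as a check on a fully instantiated ID rule follows, assuming categories are dicts from feature names to string values. It treats an unspecified feature as the value None, so absolute agreement also requires the mother to be unspecified wherever all head daughters are. The BHEAD extension for multiply-headed rules (footnote 9) is not modeled.

```python
HEAD = {"AGR", "ADV", "AUX", "INV", "LOC", "N", "NFORM", "PAS", "PAST",
        "PER", "PFORM", "PLU", "PRD", "V", "VFORM"}

def hfc_ok(mother, heads):
    """Check the RGPSG head feature convention on one instantiated ID rule.

    mother: dict of feature specifications; heads: non-empty list of head
    daughters (every RGPSG rule has at least one head).
    (1) For every HEAD feature on which all head daughters agree, the mother
        must bear exactly that value (None stands for 'unspecified').
    (2) No head daughter may have a BAR value greater than the mother's.
    """
    for f in HEAD:
        values = [h.get(f) for h in heads]
        if all(v == values[0] for v in values) and mother.get(f) != values[0]:
            return False
    m_bar = mother.get("BAR")
    if m_bar is not None:
        for h in heads:
            if h.get("BAR") is not None and int(h["BAR"]) > int(m_bar):
                return False
    return True
```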
Control agreement principle. The control agreement principle (CAP) differs from the HFC in that it establishes equivalences (links) between the categories in an ID rule: when two categories are linked in an ID rule, the two categories must be identical in any legal extension of that rule. Links are calculated immediately after the HFC has applied to the ID rules for the first time; once a link is established in an ID rule, it cannot be changed or undone.¹⁰ The first part of the CAP calculates control relations between categories, while the second part of the CAP establishes links using the control relations. In all cases, linking is indicated by co-subscripting.

⁹In order to properly account for feature instantiation in the binary and iterating coordination schemata, the binary head (BHEAD) features BAR, SUBJ, SUBCAT, and SLASH are considered to be head features for the purposes of the HFC in all nonlexical, multiply-headed ID rules.

¹⁰In GKPS, only head feature specifications and inherited foot feature specifications determine the semantic types relevant to the definition of control. RGPSG simplifies this by considering inherited feature specifications and only some head feature specifications. Alternatively, control relations could be calculated every time the HFC instantiates a feature specification.
RGPSG control relations are calculated as follows. A predicate is a VP or an instantiation of XP[+PRD], such as a predicate nominal or adjective phrase. The control feature of a category Ci, where Ci(BAR) ≠ 0, is SLASH if Ci is specified for SLASH; otherwise, it is AGR. Control is calculated once and for all immediately after the HFC has applied to the ID rules resulting from metarule unit closure.

Let f be the control feature of a category C1. Then C1 is controlled by C2 in a rule if and only if C1(f) = C2, C2 ⊒ X2, and either the rule is C0 → C1 : C2 (recall that C1 is the head daughter), or the rule is C0 → C3 : C1, C2, and C0, C1 ⊒ VP.
The RGPSG control agreement principle states: In an ID rule

    r = C0 → C1, ..., Cj : Cj+1, ..., Cn

• If Ci controls Ck and fk is the control feature of Ck, then Ck(fk) and Ci are linked.
• If there is a nonhead predicate Ci with no controller, then link Ci(fi) and C0(f0), where fi and f0 are the control features of Ci and C0, respectively.
In the theory of GKPS, the control agreement principle performs subject-verb agreement by enforcing a control relation between the two daughters of the rule

    S → H[-SUBJ], X2

In RGPSG, this rule must be stated as

    S → X2[-SUBJ, AGR X2] : X2

if we wish to enforce the control relation between the two daughters. Because control relations in RGPSG are static (never recalculated), this control relation exists even if X2 ≠ NP. Fortunately, no verb will ever be specified for [AGR AP] in the lexicon, and therefore any "questionable" control relations involving an X2 other than NP are ignored at the lexical insertion level.
Foot feature principle. The foot feature principle (FFP) re- 
quires any foot feature specification instantiated on a daughter 
category to also be instantiated on the mother. The specifica- 
tion is identical to any instantiation of the same feature on other 
daughter categories. The FFP ensures that (1) the existence 
of inherited foot features on any category of an ID rule blocks 
instantiation of those foot features on any other component cat- 
egory of the rule, and (2) inherited foot features are equivalent 
across all component categories of the rule. This second condi- 
tion may be too strong. 
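A minimal sketch of the FFP's effect on a single local tree: foot feature specifications found on the daughters are copied to the mother, and conflicting specifications block the tree. The set of foot features here is an assumption for illustration, and the sketch does not model the distinction between inherited and instantiated specifications.

```python
FOOT = {"SLASH", "WH"}  # assumed set of foot features for this sketch

def apply_ffp(mother, daughters):
    """Return a copy of `mother` extended with the foot feature
    specifications found on the daughters, or None if the daughters
    disagree with each other or with the mother on some foot feature."""
    result = dict(mother)
    for f in FOOT:
        values = [d[f] for d in daughters if f in d]
        if not values:
            continue
        if any(v != values[0] for v in values[1:]):
            return None  # daughters carry conflicting specifications
        if f in result and result[f] != values[0]:
            return None  # the mother already bears a different value
        result[f] = values[0]
    return result
```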
Because the empty string can be dominated only by a category of the form α[NULL +, SLASH α] in RGPSG, the FFP tries to ensure that every gap will have a unique filler. Unfortunately, it is impossible to truly guarantee recoverability of deletions in RGPSG, because the FFP can only locally constrain the rule-to-tree projection, and not the ID rules themselves. This situation is unavoidable in the GPSG framework, simply because SLASH does not always mark the complete path between a gap and its filler in accepted GPSG analyses. The classic example is the GPSG analysis of subject dependencies, where an S/NP is reanalyzed as a VP, effectively deleting an NP gap in subject position. In GKPS, this operation is performed by slash termination metarule 2 (GKPS:160-2): [SLASH NP] only marks the path from the filler to the mother of the reanalyzed VP. Another example is the GKPS (pp. 150-152) analysis of missing-object constructions such as John is easy to please. In missing-object constructions, [SLASH NP] only marks the path from the NP gap to the VP[INF]/NP dominating to please, failing to continue through the AP easy to please to the filler John. Many sweeping changes would be necessary before the FFP could strictly enforce recoverability of deletions in RGPSG.
2.6 Marking Conventions 
Feature co-occurrence restrictions (FCRs) and feature specification defaults (FSDs) are explicit marking conventions used in the GPSG system both to express language-particular facts and to restrict the overgeneration of other formal devices (both metarule and feature closure). FCRs and FSDs are restrictive predicates on categories, constructed by Boolean combination of feature specifications. All legal categories must unconditionally satisfy all FCRs. All categories must also satisfy all FSDs, if it is possible to do so without violating an FCR or a principle of universal feature instantiation. For example,

    FCR 1: [INV +] ⊃ ([AUX +] ∧ [VFORM FIN])

requires any category that bears the [INV +] feature specification to also bear the specifications [AUX +] and [VFORM FIN].
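Because FCRs are Boolean combinations of feature specifications, they can be evaluated directly on a category. The nested-tuple encoding of predicates below is hypothetical, and the sketch treats an unspecified feature as not satisfying a specification.

```python
def holds(pred, cat):
    """Evaluate a Boolean combination of feature specifications on `cat`.
    pred is ("spec", feature, value), ("not", p), or (op, p, q) with op in
    "and", "or", "implies"; `cat` is a dict from feature names to values."""
    op = pred[0]
    if op == "spec":
        _, feature, value = pred
        return cat.get(feature) == value
    if op == "not":
        return not holds(pred[1], cat)
    a, b = holds(pred[1], cat), holds(pred[2], cat)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "implies":
        return (not a) or b
    raise ValueError(f"unknown operator {op!r}")

# FCR 1: [INV +] ⊃ ([AUX +] ∧ [VFORM FIN])
FCR1 = ("implies", ("spec", "INV", "+"),
        ("and", ("spec", "AUX", "+"), ("spec", "VFORM", "FIN")))
```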
2.6.1 Complexity of Marking Conventions 
FCRs and FSDs both provide significant resources to the GPSG 
projection process. First, they allow the projection process to 
reuse the polynomial space provided by the theory of syntactic 
features, because they can establish equivalences between the fea- 
tures in a category C and the features in a category contained 
in C. This ability to apply across embedded categories vastly 
increases the complexity of the rule-to-tree projection. To see 
why it is linguistically unnecessary, consider the role of embed- 
ded categories. A category-valued feature f expresses a nonlocal 
linguistic relation between a category C and the one or more cat- 
egories that bear the feature specification \[f C\]. Thus, in the 
linguistically relevant cases, every embedded category eventually 
~surfaces" in phrase structure, where the marking conventions 
are free to apply. The one exception to this argument is FCR 
13 in the GKPS grammar for English, which applies 'across' an 
embedded category. 
FCR 13: [FIN, AGR NP] ⊃ [AGR NP[NOM]]
In RGPSG, marking conventions may not apply to or across em- 
bedded categories. The effect of FCR 13 is achieved in RGPSG 
by a combination of the simple default SD 2 in section 3.2.2 below 
and carefully written ID rules. 
Second, FCRs and FSDs of the "disjunctive consequence"
form [f v] ⊃ [f1 v1] ∨ ... ∨ [fn vn] compute the direct analog
of the NP-complete satisfiability problem: when several such
FCRs are used together, the GPSG projection process must
nondeterministically try each of the n feature-value disjuncts of
every such FCR, and the number of combinations to be tried grows
exponentially with the number of FCRs.
Third, the process of applying FSDs to local trees is very 
complex, in part because it is not informationally encapsulated. 
Rather than simply considering the (existing) feature specifica- 
tions in each target category separately, FSD application is af- 
fected by the other categories in the ID rule, all principles of 
universal feature instantiation, and even FCRs. 
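The analogy with satisfiability can be seen directly. The sketch below assumes flat categories with finitely many atomic values per feature, encodes each clause of a small CNF formula as an FCR of the disjunctive-consequence form, and searches by brute force for a fully specified extension that satisfies every FCR. The encoding (a dummy [SAT +] antecedent) and all function names are hypothetical; the point is only that deciding such FCR sets is as hard as deciding satisfiability.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Spec = Tuple[str, str]          # a feature specification [f v]
FCR = Tuple[Spec, List[Spec]]   # [f v] ⊃ [f1 v1] ∨ ... ∨ [fn vn]


def satisfies(cat: Dict[str, str], fcr: FCR) -> bool:
    (f, v), disjuncts = fcr
    if cat.get(f) != v:
        return True  # antecedent false: the FCR holds vacuously
    return any(cat.get(fi) == vi for fi, vi in disjuncts)


def find_extension(cat: Dict[str, str], fcrs: List[FCR],
                   values: Dict[str, List[str]]) -> Optional[Dict[str, str]]:
    """Brute-force search for a full extension of cat satisfying all FCRs.

    Every combination of values for the unspecified features may be tried,
    so the search is exponential in the number of free features."""
    free = [f for f in values if f not in cat]
    for combo in product(*(values[f] for f in free)):
        ext = {**cat, **dict(zip(free, combo))}
        if all(satisfies(ext, fcr) for fcr in fcrs):
            return ext
    return None


# Hypothetical encoding of (x1 ∨ ¬x2) ∧ (x2 ∨ x3) ∧ (¬x1 ∨ ¬x3):
# each clause is one FCR whose antecedent [SAT +] holds of the category.
BOOL = ["+", "-"]
VALUES = {"SAT": ["+"], "X1": BOOL, "X2": BOOL, "X3": BOOL}
CLAUSES: List[FCR] = [
    (("SAT", "+"), [("X1", "+"), ("X2", "-")]),
    (("SAT", "+"), [("X2", "+"), ("X3", "+")]),
    (("SAT", "+"), [("X1", "-"), ("X3", "-")]),
]
```

Calling `find_extension({"SAT": "+"}, CLAUSES, VALUES)` returns a satisfying assignment as a category; an unsatisfiable clause set yields `None` only after every combination has been tried.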
2.6.2 Simple Defaults in RGPSG 
There is no reason to believe that marking conventions need be 
so powerful and unconstrained. The approach RGPSG takes is to 
virtually eliminate marking conventions. Rather than stating the 
internal constraints on categories explicitly (and redundantly), 
as FCRs do, RGPSG eliminates FCRs altogether. Instead, the 
constraints FCRs express are implicitly stated in the rest of the
grammar, for example in the way ID rules and metarules are
written. The sole explicit marking convention in RGPSG is the
simple default (SD). Unlike FCRs and FSDs, SDs are constructive,
easy to understand, and computationally tractable. Each
SD is applied (and may be understood) to each category
independently of all other categories and RGPSG formal devices,
including other SDs. SDs are applied to ID rules immediately after
the initial application of principles of UFI.
An SD contains a predicate and a consequent. The conse- 
quent is a list of feature specifications. The predicate is a Boolean 
combination of truth-values and feature specifications such that 
if a category C bears or extends a given feature specification, that 
feature specification is true of C, else false. If the predicate is 
true of a given category C in a rule and the consequent includes 
only unbound and unlinked features, then the feature specifica- 
tions listed in the consequent are instantiated on C. Each SD is 
applied simultaneously to every top-level category in every rule 
exactly once, in the order specified by the grammar. Consider 
the following SD: 
SD 1: if [SUBCAT] then [BAR 0]
If the target category C in an ID rule is specified for the SUBCAT
feature but unspecified for the BAR feature, then the SD will
force the feature specification [BAR 0] on C.
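The SD procedure just described is simple enough to state in a few lines. This Python sketch assumes flat categories, approximates "bears or extends a feature specification" by checking that a feature is present, and represents linking as a hypothetical per-category set of linked feature names; `SimpleDefault`, `apply_sd`, and `apply_sds` are illustrative names. Note that `apply_sd` inspects only the one category it is given, which is the informational encapsulation that distinguishes SDs from FSDs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Category = Dict[str, object]  # assumption: flat feature -> value map


@dataclass
class SimpleDefault:
    name: str
    predicate: Callable[[Category], bool]   # Boolean combination of specs
    consequent: List[Tuple[str, object]]    # feature specifications to add


def specified_for(cat: Category, feature: str) -> bool:
    return feature in cat


def apply_sd(sd: SimpleDefault, cat: Category, linked: Set[str]) -> Category:
    """Apply one SD to one category, looking at nothing but that category."""
    if not sd.predicate(cat):
        return cat
    # Fire only if every consequent feature is unbound and unlinked.
    if any(f in cat or f in linked for f, _ in sd.consequent):
        return cat
    return {**cat, **dict(sd.consequent)}


def apply_sds(sds: List[SimpleDefault], rule: List[Category],
              linked: List[Set[str]]) -> List[Category]:
    """Apply each SD exactly once, in grammar order, to every top-level
    category of the rule (mother first, then daughters)."""
    for sd in sds:
        rule = [apply_sd(sd, cat, links) for cat, links in zip(rule, linked)]
    return rule


# SD 1: if [SUBCAT] then [BAR 0]
SD1 = SimpleDefault("SD 1", lambda c: specified_for(c, "SUBCAT"), [("BAR", 0)])
```

For a rule whose mother is {"BAR": 2} and whose head daughter is {"SUBCAT": 13}, `apply_sds([SD1], ...)` adds [BAR 0] to the head only; a head already specified as [BAR 1], or one whose BAR feature is linked, is left alone.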
3 The Revised Theory 
In this section, I explain how the formal subsystems described 
above fit together. I begin by formally specifying the class of 
RGPSGs and the languages they generate. I conclude by translating
the GKPS analysis of topicalization, explicative pronouns,
and parasitic gaps to the RGPSG formal system.
Figure 1 shows the internal organization of RGPSG. The set
of ID rules R' defined by metarule unit closure, UFI, and SD
application generates the language of the RGPSG as follows. If
R' contains a rule $A \to \omega$ with an extension $A' \to \omega'$ that
satisfies all principles of UFI and is an LP-acceptable ordered
production, then for any strings $\alpha$ and $\beta$ of terminals and
nonterminals, we write $\alpha A' \beta \Rightarrow \alpha \omega' \beta$. This is a
derivation step. The language of an RGPSG contains all terminal
strings that can be derived, using the ID rules, from any extension
of the distinguished start category. Let $\Rightarrow^*$ be the reflexive
transitive closure of $\Rightarrow$. Then the language $L(G)$ generated by
$G$ is

$$L(G) = \{\, x \mid x \in V_T^* \text{ and } \exists C \in K\,[(C \sqsupseteq \mathrm{Start}) \wedge C \Rightarrow^* x] \,\}$$

[Figure 1 diagram: ID rules R pass through metarule unit closure (UC) to yield R', to which SDs and UFI then apply; each stage is annotated with an O-bound on grammar size.]
Figure 1: This diagram shows the internal organization of an RGPSG
G with ID rules R, metarules M, and simple defaults S. The
O-bounds show the effect of various formal devices on derived
grammar symbol size.
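The derivation step can be made concrete with a minimal sketch that drops features entirely, an assumption that sidesteps UFI and the extension relation: categories are atomic labels, an ID rule pairs a mother with an unordered tuple of daughters, and an LP statement (a, b) requires a to precede b. The names `lp_acceptable`, `ordered_productions`, and `derive_step`, and the sample LP statements, are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, List, Tuple

IDRule = Tuple[str, Tuple[str, ...]]  # (mother, unordered daughters)
LPRules = FrozenSet[Tuple[str, str]]  # (a, b): a must precede b


def lp_acceptable(order: Tuple[str, ...], lp: LPRules) -> bool:
    """An ordering is LP-acceptable unless some later daughter is required
    to precede some earlier one."""
    return not any((order[j], order[i]) in lp
                   for i in range(len(order))
                   for j in range(i + 1, len(order)))


def ordered_productions(rule: IDRule, lp: LPRules) -> List[Tuple[str, ...]]:
    """All distinct LP-acceptable orderings of an ID rule's daughters."""
    _, daughters = rule
    return sorted({p for p in permutations(daughters) if lp_acceptable(p, lp)})


def derive_step(form: List[str], i: int, rule: IDRule,
                order: Tuple[str, ...], lp: LPRules) -> List[str]:
    """One step alpha A beta => alpha omega beta: rewrite form[i] as `order`."""
    mother, daughters = rule
    if form[i] != mother:
        raise ValueError(f"position {i} holds {form[i]}, not {mother}")
    if sorted(order) != sorted(daughters) or not lp_acceptable(order, lp):
        raise ValueError(f"{order} is not an LP-acceptable ordering of {daughters}")
    return form[:i] + list(order) + form[i + 1:]


LP = frozenset({("V0", "NP"), ("V0", "PP")})  # hypothetical LP statements
VP_RULE: IDRule = ("VP", ("V0", "NP", "PP"))
```

With these LP statements the VP rule has two ordered productions, (V0, NP, PP) and (V0, PP, NP), and `derive_step` rejects any ordering that places NP or PP before V0.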
Ristad (1986b) proves that the universal recognition problem for
RGPSG is NP-complete, a significant decrease in complexity
from the EXP-POLY time hardness of GPSG-Recognition.¹¹ In
fact, of the more than ten sources of intractability lurking in
GPSG, only two remain in RGPSG: lexical ambiguity and
nonlocal feature agreement. Critically, these two sources of
intractability in RGPSG appear to be linguistically essential.
3.1 Efficient RGPSG Parsing 
Intractability in RGPSG arises from a particularly deadly com- 
bination of feature agreement and lexical ambiguity. Underspec- 
ification of categories in ID rules and metarules can be costly. 
This suggests that limiting the number of head features or the 
scope of their agreement will mitigate the intractability. An ef- 
ficient recognition algorithm might approximate grammaticality 
by failing to transfer all head features through coordinate struc- 
tures (for example, letting them assume default values instead), 
or by aborting a parse in the face of excessive lexical or structural
ambiguity. Efficient parsing techniques based on partial
enforcement of UFI are also possible. One such implementation,
which propagates feature specifications bottom up using Earley's 
algorithm, is in progress at Thinking Machines Corporation. 
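One of the approximations mentioned above, aborting a parse in the face of excessive ambiguity, can be sketched with a CKY-style recognizer. CKY is swapped in here for brevity; it is not the Earley-based implementation mentioned above. The sketch assumes binary rules over atomic labels, a single atomic AGR value per category with "*" standing for an unspecified value, and a hypothetical per-cell `budget`; the toy lexicon is illustrative only.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Cat = Tuple[str, str]  # (label, AGR value); "*" marks an unspecified AGR


class AmbiguityBudgetExceeded(Exception):
    """Raised when one chart cell holds more readings than allowed."""


def unify(a: str, b: str) -> Optional[str]:
    """Agreement on one atomic feature, with "*" as the unspecified value."""
    if a == "*":
        return b
    if b == "*" or a == b:
        return a
    return None


def recognize(words: List[str], lexicon: Dict[str, Set[Cat]],
              rules: Dict[Tuple[str, str], List[str]], start: str,
              budget: int = 50) -> bool:
    n = len(words)
    chart: List[List[Set[Cat]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(word, ()))  # lexical ambiguity enters here
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[i][j]
            for k in range(i + 1, j):
                for (left, l_agr), (right, r_agr) in product(chart[i][k], chart[k][j]):
                    agr = unify(l_agr, r_agr)
                    if agr is None:
                        continue  # agreement failure
                    for mother in rules.get((left, right), []):
                        cell.add((mother, agr))
            if len(cell) > budget:
                raise AmbiguityBudgetExceeded(f"cell {i}-{j} holds {len(cell)} readings")
    return any(label == start for label, _ in chart[0][n])


# Hypothetical toy grammar: "fish" is ambiguous between NP[3s], NP[3p], VP[3p].
LEXICON = {
    "kim": {("NP", "3s")},
    "sleeps": {("VP", "3s")},
    "sleep": {("VP", "3p")},
    "fish": {("NP", "3s"), ("NP", "3p"), ("VP", "3p")},
}
RULES = {("NP", "VP"): ["S"]}
```

Each lexically ambiguous word multiplies the pairs that agreement must check, and the budget turns a potential blowup into an explicit, early failure, at the cost of rejecting some grammatical inputs.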
¹¹ This decrease in complexity is significant from both theoretical and
practical perspectives. First, NP-complete problems typically have good
average time algorithms, while EXP-POLY problems do not. Next, the fastest
recognizer known for GPSGs can require double-exponential time in the worst
case, while RGPSG has a simple exponential time recognizer. Finally,
NP-complete problems have efficient witnesses, while EXP-POLY hard problems
do not. This means that RGPSG parses can always be verified efficiently,
while GPSG parses cannot, in general.
Barton (1986) proposes a constraint-based computational solution
to intractability in the two-level Kimmo morphological
analyzer. Intractability arises from unbounded agreement processes
in that system, and similar techniques based on constraint
propagation may be adapted to create an efficient approximate
parsing algorithm for RGPSG. Tuples of features would correspond
to constraint-propagation nodes, while tuples of sets of
feature-values would correspond to node labels; features could
receive multiple values in this implementation. Nodes would be 
connected by both RGPSG ID rules and principles of universal 
feature instantiation. 
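The constraint-propagation idea can be sketched as follows. Each node stands for one feature of one category and carries a set of candidate values; each link stands for an agreement equality contributed by an ID rule or a principle of UFI. Propagation repeatedly intersects the value sets of linked nodes until nothing changes. This is a generic arc-consistency illustration restricted to equality constraints, offered under those assumptions; it is not Barton's (1986) algorithm, and the node names in the example are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def propagate(domains: Dict[str, Set[str]],
              links: Iterable[Tuple[str, str]]) -> Optional[Dict[str, Set[str]]]:
    """Narrow each node's value set until every linked pair agrees.

    Returns the narrowed value sets, or None if some node is left empty
    (an agreement clash)."""
    doms = {node: set(vals) for node, vals in domains.items()}
    neighbors: Dict[str, List[str]] = {node: [] for node in doms}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    if any(not vals for vals in doms.values()):
        return None
    queue = deque(doms)  # revise around every node at least once
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            narrowed = doms[other] & doms[node]  # equality constraint
            if narrowed != doms[other]:
                if not narrowed:
                    return None
                doms[other] = narrowed
                queue.append(other)  # its neighbors must be revised again
    return doms
```

For example, linking an ambiguous NP.AGR ∈ {3s, 3p} to VP.AGR ∈ {3s}, and VP.AGR to S.AGR ∈ {1s, 3s, 3p}, narrows all three nodes to {3s}. A node may keep several values when the links do not force a choice, which is why such a parser would only approximate grammaticality.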
3.2 Linguistic Analysis of English 
This section reproduces three of the more intricate linguistic anal- 
yses of GKPS in order to illustrate RGPSG's formalisms. To 
reproduce their comprehensive analysis of English in toto would 
be a disservice to that work and is beyond the scope of this 
paper. Instead, Ristad (1986b) provides an RGPSG roughly 
equivalent to their GPSG for English; the reader should consult 
GKPS for the accompanying linguistic exposition. In all cases, 
co-subscripting indicates linking. 
3.2.1 Topicalization
Rule 4a expands clauses, and rule 4b introduces unbounded
dependency constructions (UDCs) in English.
(4)
a. S → X2[-SUBJ, AGR X2] : X2
b. S → X2[+SUBJ, SLASH X2] : X2
In both cases the X2 nonhead daughter controls the head daughter,
and the control agreement principle links the value of the
head daughter's control feature with the X2 daughter, creating
the ID rules in 5.
(5)
a. S → VP[AGR X2₁] : X2₁
b. S[SLASH noBind] → S[SLASH X2₁] : X2₁[SLASH noBind]
In the following discussion, [3s] and [3p] abbreviate [PER 3, -PLU]
and [PER 3, +PLU], respectively. Note that it is impossible to
extract any constituent out of the X2 daughter in 5b because
the foot feature principle has forced [SLASH noBind] on the X2
daughter and its mother. This explains why 6 is unacceptable
in RGPSG, although it is permissible in the theory of GKPS.
(6) *New York [[the girl from __] [we want __ to succeed]]
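Co-subscripting (linking) can be pictured as value sharing. The minimal sketch below assumes each feature slot is named by a hypothetical (category, feature) pair and uses a union-find structure, so that instantiating one member of a linked set instantiates all of them. For rule 5b, linking the head S's SLASH slot to the X2 filler means that whatever category fills the X2 position is also the SLASH value of the head. The class and slot names are illustrative, not part of RGPSG.

```python
from typing import Dict, Hashable, Optional

Slot = Hashable  # assumption: a slot is named by e.g. ("S", "SLASH")


class Links:
    """Union-find over feature slots; linked slots always share one value."""

    def __init__(self) -> None:
        self.parent: Dict[Slot, Slot] = {}
        self.value: Dict[Slot, str] = {}

    def find(self, slot: Slot) -> Slot:
        self.parent.setdefault(slot, slot)
        while self.parent[slot] != slot:
            self.parent[slot] = self.parent[self.parent[slot]]  # path halving
            slot = self.parent[slot]
        return slot

    def link(self, a: Slot, b: Slot) -> None:
        """Co-subscript two slots; fails if both already carry different values."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        va, vb = self.value.get(ra), self.value.get(rb)
        if va is not None and vb is not None and va != vb:
            raise ValueError(f"cannot link {a!r} and {b!r}: {va} vs {vb}")
        self.parent[rb] = ra
        if va is None and vb is not None:
            self.value[ra] = vb
        self.value.pop(rb, None)

    def instantiate(self, slot: Slot, val: str) -> None:
        root = self.find(slot)
        current = self.value.get(root)
        if current is not None and current != val:
            raise ValueError(f"{slot!r} already has value {current}")
        self.value[root] = val

    def get(self, slot: Slot) -> Optional[str]:
        return self.value.get(self.find(slot))
```

After `link(("S", "SLASH"), ("X2", "self"))`, instantiating the filler as NP[3s] makes `get(("S", "SLASH"))` return "NP[3s]" as well.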
3.2.2 Explicative pronouns 
Now I account for the distribution of the explicative pronouns it 
and there in infinitival constructions on the basis of postulated ID 
rules and principles of universal feature instantiation (see GKPS,
pp. 115-121). The feature specification [AGR NP[NFORM α]] is
abbreviated as [+α] below, where α is it, there, or NORM.
The RGPSG for English includes the ID rules 7,
(7)
a. S → X2[-SUBJ, AGR X2] : X2
b. VP → [13] : VP[INF]
c. VP → [16] : (PP[to]), VP[INF]
d. VP → [17] : NP, VP[INF]
e. VP[AGR S] → [20] : NP
the simple defaults 8,
(8)
a. SD 1: if [SUBCAT] then [BAR 0]
b. SD 2: if [+V, -N, -SUBJ] then [+NORM]
the extraposition metarule 9,
(9)
X2[AGR S] → W
⇓
X2[+it] → W, S
and the lexical entries 10. All other nouns are specified for
[NFORM NORM] by their lexical entries.
(10)
(it, NP[PRO, -PLU, NFORM it])
(there, NP[PRO, NFORM there])
From the ID rules in 7, RGPSG generates the following ID
rules.
(11)
a. VP[AGR₁] → V0[13, AGR₁] : VP[INF, AGR₁]
b. VP[AGR₁] → V0[16, AGR₁] : (PP[to]), VP[INF, AGR₁]
The absence of a controlling category allows the CAP to link the
AGR values of the mother and VP[INF] predicate daughter. The
HFC then links the AGR values of the mother and lexical head
daughter. SD 1 specifies the head daughter for [BAR 0], while
SD 2 cannot affect the linked AGR values.
VP[AGR₁ NP[NORM]] → V0[14, AGR₁ NP[NORM]] : VP[INF, AGR₁ NP[NORM]]
The CAP and HFC operate as in 11, except that the
[+NORM] specification is inherited from the ID rule 7b and
propagated through the rule by the CAP and HFC.
(12)
VP[AGR₂ NP[NORM]] → V0[17, AGR₂ NP[NORM]] : NP₁, VP[INF, AGR₁ NP]
The NP daughter controls its VP[INF] sister, and the CAP links
the AGR value of the VP to its sister NP. SD 2 specifies the mother
for [+NORM], and the HFC forces this specification on the head
daughter.
The rules in 13 introduce [+it] and [+there] specifications.
Note that 13a is the result of the extraposition metarule on the
ID rule 7e.
(13)
a. VP[+it] → [20] : NP, S
b. VP[+it] → [21] : (PP[to]), S[FIN]
c. VP[AGR NP[NFORM there, PLU α]] → [22] : NP[PLU α]
The rules in 13 may only expand the VP daughters of the 
ID rules 11 and 12 in a derivation (compare their AGR values). 
Thus, the grammar claims that explicative pronouns only occur 
in utterances generated using the rules in 13, in combination with 
the "extending" rules 11 and 12. This describes the following 
facts from GKPS, p. 120.¹²
(14) {It / *There / *Kim} [continues [to bother [Lou] [that Robin was chosen]]]
(15) {*It / There / *Kim} [appeared (to us) [to be [nothing in the park]]]
(16) Leslie [believed {it / *there / *Kim} [to bother [us] [that Lee lied]]]
(17) We [believed {*it / there / *Kim} [to be [no flaws in the argument]]]
¹² In order to better understand these examples, associate each constituent
with the ID rule that generated it. To help with this task, the main
verbs and their SUBCAT values are: (continue, 13), (appear, 16),
(believe, 17), (bother, 20), (be, 22).
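The interaction of linking and SD 2 in rules 11 through 13 can be traced with a toy model. The Python sketch below abstracts each verb to the way its VP obtains an AGR NFORM value: passed up unchanged from a VP[INF] complement (rules 11), reset to NORM by SD 2 once an object NP takes over control (rule 12), or fixed directly (rules 13, or SD 2 for an ordinary verb). The lexicon keys, including `bother+S` for extraposed bother and the ordinary verb `sleep`, are hypothetical, and the model ignores every feature except NFORM.

```python
from typing import List, Tuple

# Hypothetical toy lexicon abstracting rules 11-13 and SD 2.
#   "link"    : CAP and HFC link the VP's AGR to its VP[INF] complement
#   "control" : an object NP controls VP[INF]; SD 2 makes the mother [+NORM]
#   otherwise : the VP's AGR NFORM value is fixed directly
LEXICON = {
    "continue": "link",      # SUBCAT 13, rule 11a
    "appear": "link",        # SUBCAT 16, rule 11b
    "believe": "control",    # SUBCAT 17, rule 12
    "bother+S": "it",        # SUBCAT 20 after extraposition, rule 13a
    "be": "there",           # SUBCAT 22, rule 13c
    "sleep": "NORM",         # an ordinary verb: SD 2 default
}


def nform_requirements(verbs: List[str]) -> Tuple[str, List[str]]:
    """Given verbs from the matrix clause downward, return the NFORM value
    demanded of the subject and of each controlling object NP (top-down)."""
    *upper, lowest = verbs
    value = LEXICON[lowest]
    if value in ("link", "control"):
        raise ValueError(f"{lowest} needs a VP[INF] complement")
    objects: List[str] = []
    for verb in reversed(upper):
        kind = LEXICON[verb]
        if kind == "control":
            objects.append(value)  # the object NP inherits the embedded AGR
            value = "NORM"         # SD 2 specifies the mother for [+NORM]
        elif kind != "link":
            raise ValueError(f"{verb} takes no VP[INF] complement")
        # for "link", the AGR value passes upward unchanged
    objects.reverse()
    return value, objects
```

For instance, `nform_requirements(["believe", "bother+S"])` yields `("NORM", ["it"])`, matching 16: the subject of believed is an ordinary NP, while the object must be it.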
3.2.3 Parasitic gaps 
Simple parasitic gaps, that is, those introduced in verb phrases
by lexical rules, present no problem for RGPSG because the FFP
demands that all instantiations of SLASH on daughters be equal to
each other and to the SLASH instantiation on the mother.
(18) [VP/NP V0[13] NP/NP PP[to]/NP]
(19) Kim wondered which models Sandy
{[had sent [pictures of __] [to __]] /
[had sent [pictures of __] [to Bill]] /
[had sent [pictures of Bill] [to __]]}
The FFP insists that nonlexical heads be instantiated for SLASH if
any nonhead daughter is, thereby explaining the unacceptability
of 20 and the acceptability of 21.
(20)
a. *[S/NP NP/NP VP]
b. *Kim wondered which authors [[reviewers of __] [always detested sushi]]
(21)
a. [S/NP NP/NP VP/NP]
b. Kim wondered which authors [[reviewers of __] [always detested __]]
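The two FFP conditions just used can be checked mechanically on a single local tree. The following is a minimal sketch that treats SLASH values as plain strings and encodes only the two conditions stated in the text (equal SLASH instantiations, and SLASH on a nonlexical head whenever a nonhead bears it), not the full GKPS foot feature principle. The `Daughter` class and `ffp_ok` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Daughter:
    label: str
    slash: Optional[str] = None   # SLASH value, or None if uninstantiated
    head: bool = False
    lexical: bool = False


def ffp_ok(mother_slash: Optional[str], daughters: List[Daughter]) -> bool:
    """Check the two FFP conditions described in the text for one local tree."""
    values = {d.slash for d in daughters if d.slash is not None}
    if mother_slash is not None:
        values.add(mother_slash)
    # (i) all SLASH instantiations in the local tree must be equal
    if len(values) > 1:
        return False
    # (ii) a nonlexical head must bear SLASH if any nonhead daughter does
    if any(d.slash is not None and not d.head for d in daughters):
        if any(d.head and not d.lexical and d.slash is None for d in daughters):
            return False
    return True
```

On the local trees above, `ffp_ok` accepts 18 (the head V0[13] is lexical, so it need not bear SLASH) and 21a, and rejects 20a, whose nonlexical VP head lacks SLASH.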
This analysis of parasitic gaps exactly follows the one presented 
in GKPS on matters of fact. These facts may be questionable, 
however. Some sentences considered acceptable in GKPS (for
example, "Kim wondered which models Sandy had sent pictures of
to Bill" and "Kim wondered which authors reviewers of always
detested") are marginal for some native English speakers. Note that
both sentences are marked unacceptable in the GB framework
because of subjacency violations.
4 Conclusion
This work is similar to that of Shieber (1986) in its attempt to
reconstruct GPSG theory. Shieber, however, is concerned solely
with creating a more easily implementable description of GPSG
theory, rather than with changing the theory in a linguistically
or computationally significant way.
It would be instructive to identify and restrict the computational
resources provided by the formal devices in other linguistic
theories (for example, lexical-functional grammar, government-binding
theory, or morphological theory). Barton, Berwick, and
Ristad (1987) explore the utility of complexity analysis in other
linguistic domains, although the research strategy reported here
is not the focus of that work.
5 References
Barton, E., 1985. On the complexity of ID/LP parsing. Computational
Linguistics 11(4):205-218.
Barton, E., 1986. Constraint propagation in Kimmo systems.
Proceedings of the 24th Annual Meeting of the Association
for Computational Linguistics. Columbia University, New
York: Association for Computational Linguistics.
Barton, E., R. Berwick, and E. Ristad, 1987. Computational
Complexity and Natural Language. Cambridge, MA: MIT
Press.
Berwick, R. and K. Wexler, 1982. Parsing efficiency and
c-command. Proceedings of the First West Coast Conference
on Formal Linguistics. Los Angeles, CA: University of
California at Los Angeles, pp. 29-34.
Chomsky, N., 1986. Knowledge of Language: Its Origins, Nature,
and Use. New York: Praeger Publishers.
Gazdar, G., E. Klein, G. Pullum, and I. Sag, 1985. Generalized
Phrase Structure Grammar. Oxford, England: Basil Blackwell.
Kayne, R., 1981. Unambiguous paths. In Levels of Syntactic
Representation, R. May and J. Koster, eds. Dordrecht: Foris
Publications, pp. 143-183.
Pesetsky, D., 1982. Paths and categories. Ph.D. dissertation,
MIT Department of Linguistics and Philosophy, Cambridge,
MA.
Ristad, E.S., 1986a. Computational complexity of current GPSG
theory. Proceedings of the 24th Annual Meeting of the
Association for Computational Linguistics. Columbia University,
New York: Association for Computational Linguistics,
pp. 30-39.
Ristad, E.S., 1986b. Complexity of linguistic models: a
computational analysis and reconstruction of generalized phrase
structure grammar. S.M. Thesis, MIT Department of Electrical
Engineering and Computer Science, Cambridge, MA.
Shieber, S., 1986. A simple reconstruction of GPSG. Proceedings
of the 11th International Conference on Computational
Linguistics. Bonn, West Germany, 20-22 August, 1986.
