SYNTACTIC CONSTRAINTS AND EFFICIENT PARSABILITY
Robert C. Berwick 
Room 820, MIT Artificial Intelligence Laboratory
545 Technology Square, Cambridge, MA 02139 
Amy S. Weinberg 
Department of Linguistics, MIT
Cambridge, MA 02139 
ABSTRACT 
A central goal of linguistic theory is to explain why natural 
languages are the way they are. It has often been supposed that 
computational considerations ought to play a role in this
characterization, but rigorous arguments along these lines have been 
difficult to come by. In this paper we show how a key "axiom" of 
certain theories of grammar, Subjacency, can be explained by 
appealing to general restrictions on on-line parsing plus natural 
constraints on the rule-writing vocabulary of grammars. The 
explanation avoids the problems with Marcus' [1980] attempt to
account for the same constraint. The argument is robust with respect
to machine implementation, and thus avoids the problems that often
arise when making detailed claims about parsing efficiency. It has the
added virtue of unifying in the functional domain of parsing certain 
grammatically disparate phenomena, as well as making a strong claim 
about the way in which the grammar is actually embedded into an 
on-line sentence processor. 
I INTRODUCTION 
In its short history, computational linguistics has been driven by
two distinct but interrelated goals. On the one hand, it has aimed at 
computational explanations of distinctively human linguistic behavior 
-- that is, accounts of why natural languages are the way they are 
viewed from the perspective of computation. On the other hand, it has 
accumulated a stock of engineering methods for building machines to
deal with natural (and artificial) languages. Sometimes a single body 
of research has combined both goals. This was true of the work of 
Marcus [1980], for example. But all too often the goals have remained
opposed -- even to the extent that current transformational theory has 
been disparaged as hopelessly "intractable" and no help at all in 
constructing working parsers. 
This paper shows that modern transformational grammar (the 
"Government-Binding" or "GB" theory as described in Chomsky 
[1981]) can contribute to both aims of computational linguistics. We
show that by combining simple assumptions about efficient parsability 
along with some assumptions about just how grammatical theory is to
be "embedded" in a model of language processing, one can actually 
explain some key constraints of natural languages, such as Subjacency.
(The argument is different from that used in Marcus [1980].) In fact,
almost the entire pattern of constraints taken as "axioms" by the GB
theory can be accounted for. Second, contrary to what has sometimes
been supposed, by exploiting these constraints we can show that a
GB-based theory is particularly compatible with efficient parsing
designs, in particular, with extended LR(k,t) parsers (of the sort
described by Marcus [1980]). We can extend the LR(k,t) design to
accommodate such phenomena as antecedent-PRO and pronominal
binding, rightward movement, gapping, and VP deletion.
A. Functional Explanations of Locality Principles
Let us consider how to explain locality constraints in natural 
languages. First of all, what exactly do we mean by a "locality 
constraint"? The paradigm case is that of Subjacency: the distance
between a displaced constituent and its "underlying" canonical 
argument position cannot be too large, where the distance is gauged (in 
English) in terms of the number of S(entence) or NP
phrase boundaries. For example, in sentence (1a) below, John (the
so-called "antecedent") is just one S-boundary away from its 
presumably "underlying" argument position (denoted "x", the 
"trace") as the Subject of the embedded clause, and the sentence is
fine: 
(1a) John seems [S x to like ice cream].
However, all we have to do is to make the link between John and x
extend over two S's, and the sentence is ill-formed: 
(1b) John seems [S it is certain [S x to like ice cream]].
This restriction entails a "successive cyclic" analysis of 
transformational rules (see Chomsky \[1973\]). In order to derive a 
sentence like (1c) below without violating the Subjacency condition,
we must move the NP from its canonical argument position through 
the empty Subject position in the next higher S and then to its surface 
slot: 
(1c) John seems [e] to be certain x to get the ice cream.
Since the intermediate subject position is filled in (lb) there is no licit 
derivation for this sentence. 
More precisely, we can state the Subjacency constraint as follows: 
No rule of grammar can involve X and Y in a configuration like the 
following, 
[ ... X ... [α ... [β ... Y ... ] ... ] ... X ... ]

where α and β are bounding nodes (in English, S or NP phrases).
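As a rough computational gloss (our sketch, not part of the paper's formalism), the condition amounts to a count over the bounding nodes separating an antecedent from its trace; the list-of-labels encoding of the path is an assumption of the illustration:

```python
# Illustrative sketch only: Subjacency as a limit on the number of
# bounding nodes (S or NP, for English) crossed by an antecedent-trace
# link. The "path" here is the list of phrase labels that dominate the
# trace but not the antecedent -- a simplifying encoding of the tree.
BOUNDING_NODES = {"S", "NP"}

def obeys_subjacency(path):
    """True iff the link crosses at most one bounding node."""
    crossed = sum(1 for label in path if label in BOUNDING_NODES)
    return crossed <= 1
```

On the paper's examples, the path ["S"] (sentence (1a)) passes, while ["S", "S"] (sentence (1b)) fails.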
Why should natural languages be designed this way and not some
other way? Why, that is, should a constraint like Subjacency exist at
all? Our general result is that under a certain set of assumptions about 
grammars and their relationship to human sentence processing one can 
actually expect the following pattern of syntactic locality constraints:
(l) The antecedent-trace relationship must 
obey Subjacency, but other "binding"
relationships (e.g., NP-PRO) need not obey
Subjacency.
(2) Gapping constructions must be subject
to a bounding condition resembling
Subjacency, but VP deletion need not be.
(3) Rightward movement must be strictly
bounded.
To the extent that this predicted pattern of constraints is actually 
observed -- as it is in English and other languages -- we obtain a 
genuine functional explanation of these constraints and support for the 
assumptions themselves. The argument is different from Marcus'
because it accounts for syntactic locality constraints (like Subjacency)
as the joint effect of a particular theory of grammar, a theory of how
that grammar is used in parsing, a criterion for efficient parsability,
and a theory of how the parser is built. In contrast, Marcus
attempted to argue that Subjacency could be derived from just the
(independently justified) operating principles of a particular kind of 
parser. 
B. Assumptions. 
The assumptions we make are the following: 
(1) The grammar includes a level of 
annotated surface structure indicating how 
constituents have been displaced from their 
canonical predicate argument positions. 
Further, sentence analysis is divided into 
two stages, along the lines indicated by the
theory of Government and Binding: the 
first stage is a purely syntactic analysis that 
rebuilds annotated surface structure; the 
second stage carries out the interpretation 
of variables, binding them to operators, all
making use of the "referential indices" of 
NPs. 
(2) To be "visible" at a stage of analysis a 
linguistic representation must be written in 
the vocabulary of that level. For example, 
to be affected by syntactic operations, a 
representation must be expressed in a 
syntactic vocabulary (in the usual sense); to 
be interpreted by operations at the second 
stage, the NPs in a representation must 
possess referential indices. (This 
assumption is not needed to derive the 
Subjacency constraint, but may be used to
account for another "axiom" of current 
grammatical theory, the so-called 
"constituent command" constraint on 
antecedents and the variables that they
bind.) This "visibility" assumption is a
rather natural one. 
(3) The rule-writing vocabulary of the 
grammar cannot make use of arithmetic 
predicates such as "one", "two" or "three",
but only such predicates as "adjacent".
Further, quantificational statements are not
allowed in rules. These two assumptions
are also rather standard. It has often been 
noted that grammars "do not count" -- that 
grammatical predicates are structurally 
based. There is no rule of grammar that 
takes just the fourth constituent of a
sentence and moves it, for example. In 
contrast, many different kinds of rules of 
grammar make reference to adjacent 
constituents. (This is a feature found in 
morphological, phonological, and syntactic 
rules.) 
(4) Parsing is not done via a method that
carries along (a representation of) all
possible derivations in parallel. In 
particular, an Earley-type algorithm is ruled 
out. To the extent that multiple options 
about derivations are not pursued, the parse 
is "deterministic." 
(5) The left-context of the parse (as defined 
in Aho and Ullman [1972]) is literally
represented, rather than generatively 
represented (as, e.g., a regular set). In 
particular, just the symbols used by the 
grammar (S, NP, VP ...) are part of the
left-context vocabulary, and not "complex"
symbols serving as proxies for the set of
left-context strings.1 In effect, we make the
(quite strong) assumption that the sentence 
processor adopts a direct, transparent 
embedding of the grammar. 
Other theories or parsing methods do not meet these constraints 
and fail to explain the existence of locality constraints with respect to 
this particular set of assumptions.2 For example, as we show, there is
no reason to expect a constraint like Subjacency in the Generalized 
Phrase Structure Grammars (GPSGs) of Gazdar [1981], because there
is no inherent barrier to easily processing a sentence where an
antecedent and a trace are unboundedly far from each other.
Similarly, if a parsing method like Earley's algorithm were actually
used by people, then Subjacency remains a mystery on the functional
grounds of efficient parsability. (It could still be explained on other 
functional grounds, e.g., that of learnability.)
II PARSING AND LOCALITY PRINCIPLES 
To begin the actual argument then, assume that on-line sentence 
processing is done by something like a deterministic parser.3
Sentences like (2) cause trouble for such a parser: 
(2) What_i do you think that John told Mary ... that he
would like to eat x_i?
1. Recall that the successive lines of a left- or right-most derivation in a context-free
grammar constitute a regular language, as shown in, e.g., DeRemer [1969].
2. Plainly, one is free to imagine some other set of assumptions that would do the job.
3. If one assumes a backtracking parser, then the argument can also be made to go
through, but only by assuming that backtracking is very costly. Since this sort of parser
clearly subsumes the LR(k)-type machines under the right construal of "cost", we make
the stronger assumption of LR(k)-ness.
The problem is that on recognizing the verb eat the parser must decide 
whether to expand the parse with a trace (the transitive reading) or 
with no postverbal element (the intransitive reading). The ambiguity
cannot be locally resolved since eat takes both readings. It can only be 
resolved by checking to see whether there is an actual antecedent. 
Further, observe that this is indeed a parsing decision: the machine 
must make some decision about how to build a portion of the parse
tree. Finally, given non-parallelism, the parser is not allowed to pursue 
both paths at once: it must decide now how to build the parse tree (by 
inserting an empty NP trace or not). 
Therefore, assuming that the correct decision is to be made on-line 
(or that retractions of incorrect decisions are costly) there must be an 
actual parsing rule that expands a category as transitive iff there is an 
immediate postverbal NP in the string (no movement) or if an actual 
antecedent is present. However, the phonologically overt antecedent 
can be unboundedly far away from the gap. Therefore, it would seem 
that the relevant parsing rule would have to refer to a potentially 
unbounded left context. Such a rule cannot be stated in the finite 
control table of an LR(k) parser. Therefore we must find some finite
way of expressing the domain over which the antecedent must be 
searched. 
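The decision point can be made concrete with a toy subcategorization table (the table, names, and flags below are our invented illustration, not a rule of the Marcus parser): a deterministic device must commit to exactly one expansion when it reaches eat.

```python
# Toy illustration of the on-line decision at "eat": expand with a
# postverbal trace (transitive) or with nothing (intransitive). The
# verb alone cannot decide, since "eat" permits both frames.
SUBCAT = {
    "eat":    {"trans", "intrans"},
    "devour": {"trans"},
    "sleep":  {"intrans"},
}

def expand_verb(verb, next_word_is_np, antecedent_pending):
    """Pick exactly one expansion -- no parallel paths are kept."""
    frames = SUBCAT[verb]
    if next_word_is_np and "trans" in frames:
        return "V NP"        # overt object present in the string
    if antecedent_pending and "trans" in frames:
        return "V trace"     # posit an empty NP bound to the wh-filler
    return "V"               # intransitive reading

# The catch: computing `antecedent_pending` honestly requires looking
# back through the left context for the filler -- the potentially
# unbounded search that cannot live in a finite LR(k) rule table.
```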
There are two ways of accomplishing this. First, one could express 
all possible left-contexts as some regular set and then carry this
representation along in the finite control table of the LR(k) machine.
This is always possible in the case of a context-free grammar, and in
fact is the "standard" approach.4 However, in the case of (e.g.)
wh-movement, this demands a generative encoding of the associated finite
state automaton, via the use of complex symbols like "S/wh"
(denoting the "state" that a wh has been encountered) and rules to pass
along this non-literal representation of the state of the parse. This
approach works, since we can pass along this state encoding through
the VP (via the complex non-terminal symbol VP/wh) and finally into
the embedded S. This complex non-terminal is then used to trigger an
expansion of eat into its transitive form. In fact, this is precisely the
solution method advocated by Gazdar. We see then that if one adopts
a non-terminal encoding scheme there should be no problem in
parsing any single long-distance gap-filler relationship. That is, there
is no need for a constraint like Subjacency.5
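To see why no bound emerges, the complex-symbol device can be sketched as a mechanical grammar transform (the toy rules below are our own, not Gazdar's fragment): every phrasal category gets a "/wh" twin that threads the gap feature down the tree, so the filler-gap "state" rides along in the nonterminal alphabet through arbitrarily many clauses.

```python
# Sketch of GPSG-style "slash" categories: for each rule A -> B C,
# add gapped variants A/wh -> B/wh C and A/wh -> B C/wh, plus a rule
# discharging the gap as a trace. The toy grammar is illustrative only.
PHRASAL = {"S", "VP", "NP"}      # categories that may contain a gap

BASE_RULES = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "S"], ["V", "NP"], ["V"]],
}

def add_slash_rules(rules):
    slashed = {"NP/wh": [["trace"]]}          # bottom out: the gap itself
    for lhs, expansions in rules.items():
        for rhs in expansions:
            for i, cat in enumerate(rhs):
                if cat in PHRASAL:            # thread the gap into one daughter
                    gapped = rhs[:i] + [cat + "/wh"] + rhs[i + 1:]
                    slashed.setdefault(lhs + "/wh", []).append(gapped)
    return slashed

SLASH_RULES = add_slash_rules(BASE_RULES)
# "VP/wh -> V S/wh" chains through any number of embedded clauses, so
# nothing in this encoding forces a Subjacency-like distance bound.
```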
Second, the problem of unbounded left-context is directly avoided 
if the search space is limited to some literally finite left context. But 
this is just what the Subjacency constraint does: it limits where an
antecedent NP could be to an immediately adjacent S or S̄. This
constraint has a simple interpretation in an actual parser (like that built
by Marcus [1980]). The IF-THEN pattern-action rules that make up
the Marcus parser's finite control "transition table" must be finite in
order to be stored inside a machine. The rule actions themselves are
literally finite. If the rule patterns must be literally stored (e.g., the
pattern [S [S̄ [S ... must be stored as an actual arbitrarily long string of S
nodes, rather than as the regular set S+), then these patterns must be
literally finite. That is, parsing patterns must refer to literally bounded
right and left context (in terms of phrasal nodes).6 Note further that
4. Following the approach of DeRemer [1969], one builds a finite state automaton that
recognizes exactly the set of left-context strings that can arise during the course of a
right-most derivation, the so-called characteristic finite state automaton.
5. Plainly the same holds for a "hold cell" approach to computing filler-gap
relationships.
6. Actually then, this kind of device falls into the category of bounded context parsing,
as defined by Floyd [1964].
this constraint depends on the sheer representability of the parser's
rule system in a finite machine, rather than on any details of
implementation. Therefore it will hold invariantly with respect to
machine design -- no matter what kind of machine we build, if we assume a
literal representation of left-contexts, then some kind of finiteness
constraint is required. The robustness of this result contrasts with the
usual problems in applying "efficiency" results to explain grammatical
constraints. These often fail because it is difficult to consider all
possible implementations simultaneously. However, if the argument is
invariant with respect to machine design, this problem is avoided.
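The contrast with a literal rule table can be sketched the same way (the stack encoding below is entirely our assumption): a pattern stored as an actual string of nodes can only mention a fixed number of bounding nodes, so any antecedent search it performs is automatically Subjacency-sized.

```python
# Sketch: a literally-stored pattern inspects only a fixed window of
# the left context. The left context is modeled as a list of
# (phrase_label, carries_wh_filler) pairs, most recent last.
def antecedent_visible(left_context, window=2):
    """Look for a wh-filler within the last `window` bounding nodes --
    the literal-pattern analogue of Subjacency's adjacency limit."""
    bounding = [entry for entry in left_context if entry[0] in ("S", "NP")]
    return any(has_wh for _, has_wh in bounding[-window:])
```

A filler one clause up is found; one buried under two further S nodes is invisible to any such fixed pattern, whatever machine stores the table.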
Given literal left-contexts and no (or costly) backtracking, the 
argument so far motivates some bounding condition for ambiguous 
sentences like these. However, to get the full range of cases these
functional facts must interact with properties of the rule writing system 
as defined by the grammar. We will derive the fact that the bounding
condition must be subjacency (as opposed to tri- or quad-jacency) by
appeal to the fact that grammatical constraints and rules are stated in a
vocabulary which is non-counting. Arithmetic predicates are
forbidden. But this means that since only the predicate "adjacent" is
permitted, any literal bounding restriction must be expressed in terms
of adjacent domains: hence Subjacency. (Note that "adjacent" is also
an arithmetic predicate.) Further, Subjacency must apply to all traces
(not just traces of ambiguously transitive/intransitive verbs) because a
restriction to just the ambiguous cases would involve using existential
quantification. Quantificational predicates are barred in the rule
writing vocabulary of natural grammars.7
Next we extend the approach to NP movement and Gapping. 
Gapping is particularly interesting because it is difficult to explain
why this construction (unlike other deletion rules) is bounded. That is,
why is (3) but not (4) grammatical: 
(3) John will hit Frank and Bill will [e]VP George.
*(4) John will hit Frank and I don't believe Bill will
[e]VP George.
The problem with gapping constructions is that the attachment of 
phonologically identical complements is governed by the verb that the 
complement follows. Extraction tests show that in (5) the phrase after
Mary attaches to V″ while in (6) it attaches to V‴. (See Hornstein and
Weinberg [1981] for details.)
(5) John will run after Mary.
(6) John will arrive after Mary.
In gapping structures, however, the verb of the gapped constituent is
not present in the string. Therefore, correct attachment of the
complement can only be guaranteed by accessing the antecedent in the
previous clause. If this is true, however, then the bounding argument
for Subjacency applies to this case as well: given deterministic parsing
of gapping done correctly, and a literal representation of left-context,
then gapping must be context-bounded. Note that this is a particularly
7. Of course, there is another natural predicate that would produce a finite bound on
rule context: if NP and trace had to be in the same S domain. Presumably, this is also an
option that could get realized in some natural grammars: the resulting languages would
not have overt movement outside of an S. Note that the natural predicates simply give
the range of possible natural grammars, not those actually found.
The elimination of quantificational predicates is supportable on grounds of acquisition.
interesting example because it shows how grammatically dissimilar
operations like wh-movement and gapping can "fall together" in the 
functional domain of parsing. 
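In parsing terms the point can be sketched as follows (the attachment table and level names are our invented stand-ins for the Hornstein and Weinberg analysis): when the verb slot is gapped, the attachment decision for the following phrase must consult the previous clause's verb.

```python
# Sketch: PP attachment in a gapped clause depends on the elided verb,
# so the parser must reach back to the antecedent clause to decide.
ATTACHMENT = {"run": "low (inside V-bar)", "arrive": "high (outside V-bar)"}

def attach_complement(clause_verb, antecedent_verb):
    """Use the clause's own verb if present; otherwise fall back to the
    antecedent clause's verb, as in 'Bill will [e] George after Mary'."""
    verb = clause_verb if clause_verb is not None else antecedent_verb
    return ATTACHMENT[verb]
```

If the antecedent is reachable only within a bounded literal left context, gapping inherits the same Subjacency-like bound as wh-traces.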
NP-trace and gapping constructions contrast with
antecedent-(pro)nominal binding, lexical anaphor relationships, and
VP deletion. These last three do not obey Subjacency. For example, a
Noun Phrase can be unboundedly far from a (phonologically empty)
PRO, even in terms of S domains:
John_i thought it was certain that ... [PRO_i feeding himself]
would be easy.
Note though that in these cases the expansion of the syntactic tree does
not depend on the presence or absence of an antecedent.
(Pro)nominals and lexical anaphors are phonologically realized in the
string and can unambiguously tell the parser how to expand the tree.
(After the tree is fully expanded the parser may search back to see
whether the element is bound to an antecedent, but this is not a
parsing decision.) VP deletion sites are also always locally detectable
from the simple fact that every sentence requires a VP. The same
argument applies to PRO. PRO is locally detectable as the only 
phonologically unrealized element that can appear in an ungoverned 
context, and the predicate "ungoverned" is local.8 In short, there is no
parsing decision that hinges on establishing the PRO-antecedent, VP
deletion-antecedent, or lexical anaphor-antecedent relationship. But
then, we should not expect bounding principles to apply in these cases,
and, in fact, we do not find these elements subject to bounding. Once
again, then, apparently diverse grammatical phenomena behave alike
within a functional realm. 
To summarize, we can explain why Subjacency applies to exactly 
those elements that the grammar stipulates it must apply to. We do 
this using both facts about the functional design of a parsing system 
and properties of the formal rule writing vocabulary. To the extent
that the array of assumptions about the grammar and parser actually
explain this observed constraint on human linguistic behavior, we 
obtain a powerful argument that certain kinds of grammatical 
representations and parsing designs are actually implicated in human
sentence processing. 
8. Since α is ungoverned iff "α is governed" is false, and "governed" is a bounded
predicate, being restricted to roughly a single maximal projection (at worst an S̄).

III ACKNOWLEDGEMENTS

This report describes work done at the Artificial Intelligence
Laboratory of the Massachusetts Institute of Technology. Support for
the Laboratory's artificial intelligence research is provided in part by
the Advanced Research Projects Agency of the Department of Defense
under Office of Naval Research Contract N00014-80-C-0505.

IV REFERENCES

Aho, Alfred and Ullman, Jeffrey [1972] The Theory of Parsing,
Translation, and Compiling, vol. I, Prentice-Hall.

Chomsky, Noam [1973] "Conditions on Transformations," in S.
Anderson & P. Kiparsky, eds., A Festschrift for Morris Halle, Holt,
Rinehart and Winston.

Chomsky, Noam [1981] Lectures on Government and Binding, Foris
Publications.

DeRemer, Frederick [1969] Practical Translators for LR(k) Languages,
PhD dissertation, MIT Department of Electrical Engineering and
Computer Science.

Floyd, Robert [1964] "Bounded-context syntactic analysis,"
Communications of the Association for Computing Machinery, 7, pp.
62-66.

Gazdar, Gerald [1981] "Unbounded dependencies and coordinate
structure," Linguistic Inquiry, 12:2, 155-184.

Hornstein, Norbert and Weinberg, Amy [1981] "Preposition stranding
and case theory," Linguistic Inquiry, 12:1.

Marcus, Mitchell [1980] A Theory of Syntactic Recognition for Natural
Language, MIT Press.
