An Attribute-Grammar Implementation of Government-binding Theory
Nelson Correa 
Department of Electrical and Computer Engineering 
Syracuse University 
111 Link Hall 
Syracuse, NY 13244 
ABSTRACT 
The syntactic analysis of languages with respect to 
Government-binding (GB) grammar is a problem 
that has received relatively little attention until 
recently. This paper describes an attribute grammar 
specification of the Government-binding theory. The 
paper focuses on the description of the attribution 
rules responsible for determining antecedent-trace 
relations in phrase-structure trees, and on some 
theoretical implications of those rules for the GB 
model. The specification relies on a 
transformation-less variant of Government-binding
theory, briefly discussed by Chomsky (1981), in
which the rule move-α is replaced by an interpretive
rule. Here the interpretive rule is specified by means 
of attribution rules. The attribute grammar is 
currently being used to write an English parser 
which embodies the principles of GB theory. The 
parsing strategy and attribute evaluation scheme 
are cursorily described at the end of the paper. 
Introduction 
In this paper we consider the use of attribute grammars
(Knuth, 1968; Waite and Goos, 1984) to provide
a computational definition of the Government-binding
theory laid out by Chomsky (1981, 1982).
This research thus constitutes a move in the direc- 
tion of seeking specific mechanisms and realizations 
of universal grammar. The attribute grammar pro- 
vides a specification at a level intermediate between 
the abstract principles of GB theory and the partic- 
ular automatons that may be used for parsing or 
generation of the language described by the theory. 
Almost by necessity and the nature of the goal set 
out, there will be several arbitrary decisions and 
details of realization that are not dictated by any 
particular linguistic or psychological facts, but 
perhaps only by matters of style and possible com- 
putational efficiency considerations in the final pro- 
duct. It is therefore safe to assume that the partic- 
ular attribute grammar that will be arrived at 
admits of a large number of non-isomorphic vari- 
ants, none of which is to be preferred over the oth- 
ers a priori. The specification given here is for 
English. Similar specifications of the parametrized 
grammars of typologically different languages may 
eventually lead to substantive generalizations about 
the computational mechanisms employed in natural 
languages. 
The purpose of this research is twofold: First, 
to provide a precise computational definition of 
Government-binding theory, as its core ideas are 
generally understood. We thus begin to provide an 
answer to criticisms that have recently been leveled 
against the theory regarding its lack of formal explicitness
(Gazdar et al., 1985; Pullum, 1985). Unlike
earlier computational models of GB theory, such as 
that of Berwick and Weinberg (1984), which 
assumes Marcus' (1980) parsing automaton, the 
attribute grammar specification is more abstract 
and neutral regarding the choice of parsing automata.
Attribute grammar offers a language
specification framework whose formal properties are
generally well-understood and explored. A second 
and more important purpose of the present research 
is to provide an alternate and mechanistic charac- 
terization of the principles of universal grammar. 
To the extent that the implementation is correct, 
the principles may be shown to follow from the sys- 
tem of attributes in the grammar and the attribu- 
tion rules that define their values. 
The current version of the attribute grammar
is being used to implement an English
parser written in Prolog. Although the parser is not
yet complete, we expect that its breadth of coverage
of the language will be substantially larger than 
that of other Government-binding parsers recently 
reported in the literature (Kashket (1986), Kuhns 
(1986), Sharp (1985), and Wehrli (1984)). Since the 
parser is firmly based on Government-binding 
theory, we expect its ability to handle natural 
language phenomena to be limited only by the accu- 
racy and correctness of the underlying theory. 
In the development below I will assume that 
the reader is familiar with the basic concepts and 
terminology of Government-binding theory, as well 
as with attribute grammars. The reader is referred 
to Sells (1985) for a good introduction to the 
relevant concepts of GB theory, and to Waite and 
Goos (1984) for a concise presentation on attribute 
grammars. 
The Grammatical Model Assumed
For the attribute grammar specification we assume 
a transformation-less variant of Government- 
binding theory, briefly discussed by Chomsky (1981, 
p.89-92), in which rule move-α is eliminated in favor
of a system Mα of interpretive rules which determines
antecedent-trace relations. A more explicit
proposal of a similar nature is also made by Koster
(1978). We assume a context-free base, satisfying 
the principles of X'-theory, which directly generates
structure trees at a surface structure level of
representation. S-structure may be derived from
surface structure by application of Mα. The rest of
the theory remains as in standard Government- 
binding (except for some obvious reformulation of 
principles that refer to Grammatical Functions at 
D-Structure). 
The grammatical model that obtains is that 
of (1). The base generates surface structures, with 
phrases in their surface places along with empty 
categories where appropriate. Surface structure is 
identical to S-structure, except for the fact that the 
association between moved phrases and their traces 
is not present; chain indices that reveal history of 
movement in the transformational account are not 
present. The interpretive system Mα, here defined
by attribution rules, then applies to construct the 
absent chains and thus establish the linking rela- 
tions between arguments and positions in the argu- 
ment structures of their predicates, yielding the S- 
structure level. In this manner the operations form- 
erly carried out by transformations reduce to attri- 
bute computations on phrase-structure trees. 
(1)
        Context-free base
               |
        Surface structure
               | Mα
          S-Structure
            /     \
          PF       LF
Interpretive Rule 
I sketch briefly how the interpretive system Mα is
defined. Two attributes, node and Chain, are associated
with NP, and a method for functionally classifying
empty categories in structure trees is
developed (relying on conditions of Government and
Case-marking). In addition, two attributes, A-Chain
and A'-Chain, are defined for every syntactic
category that may be found in the c-command
domain of NP. In particular, A-Chain and A'-Chain
are defined for C, COMP', S, INFL', VP, and
V' (assuming Chomsky's (1986) two-level X'-system).
The meanings attached to these attributes
are as follows. Node defines a preorder enumeration
of tree nodes; Chain is an integer that represents
the syntactic chain to which an NP belongs;
A-Chain (A'-Chain) determines whether an argument
(non-argument) chain propagates across a
given node of a tree, and gives the number of that
chain, if any.
Somewhat arbitrarily, and for the sake of 
concreteness, we assume that a chain is identified by 
the node number of the phrase that heads the chain. 
For the root node, the attribution rules dictate
A-Chain = A'-Chain = 0. The two attributes
are then essentially percolated downwards.
However, whenever a lexical NP or PRO is found in
a θ̄-position (a position that receives no θ-role),
an argument chain is started, setting
the value of A-Chain to the node number of the
NP found, which is used to identify the new chain.
Thus NP traces in the c-command domain of the
NP are able to identify their antecedent. Similarly,
when a Wh-phrase is found in COMP specifier position,
the value of A'-Chain is set to the chain
number of that phrase, and lower Wh-traces may
pick up their antecedent in a similar fashion.
Downwards propagation of the attributes 
A-Chain and A'-Chain explains in a simple way
the observed c-command constraint between a trace 
and its antecedent. 
The precise statement of the attribution rules 
that implement the interpretive rule described is 
given in Appendix A. In the formulation of the 
attribution rules, it is assumed that certain other 
components of Government-binding theory have 
already been implemented, in particular parts of 
Government and Case theories, which contribute to 
the functional determination of empty categories. 
The implementation of the relevant parts of these 
subtheories is described elsewhere (Correa, in 
preparation). We assume that all empty categories 
are base-generated, as instances of the same EC 
[NP e]. Their types are then determined structurally,
in a manner similar to the proposal made by
Koster (1978). The attributes empty, pronominal, 
and anaphoric used by the interpretive system 
achieve a full functional partitioning of NP types 
(van Riemsdijk and Williams (1986), p.278); their 
values are defined by attribution rules in Appendix 
B, relying on the values of the attributes Governor 
and Cases. The values of these attributes are in
turn determined by the Government and Case 
theories, respectively, and indicate the relevant 
governor of the NP and grammatical Case assigned 
to it. 
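The functional determination just described can be sketched in a few lines of Python. This is only an illustrative reading of the rules in Appendix B, not the Prolog implementation: the class `NP`, its field names, and the helper `classify` are hypothetical, `governed=False` stands for Governor = <0,'nil'>, and `case=None` stands for Cases = 'nil'. The type names returned by `classify` follow the usual GB typology of empty categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NP:
    empty: bool                     # True for an empty category [NP e]
    governed: bool = True           # False models Governor = <0,'nil'>
    case: Optional[str] = None      # None models Cases = 'nil'
    wh: bool = False                # models Whs = '+'
    lex_pronominal: bool = False    # N'.pronominal, used for lexical NPs only
    lex_anaphoric: bool = False     # N'.anaphoric, used for lexical NPs only


def pronominal(np: NP) -> bool:
    """NP.pronominal: lexical NPs inherit from N'; an EC is pronominal iff ungoverned."""
    if not np.empty:
        return np.lex_pronominal
    return not np.governed


def anaphoric(np: NP) -> bool:
    """NP.anaphoric, following the cascade of conditions in Appendix B."""
    if not np.empty:
        return np.lex_anaphoric
    if np.wh:
        return False
    if not np.governed:
        return True
    return np.case is None


def classify(np: NP) -> str:
    """Name the type of an empty category from its two features (assumed labels)."""
    names = {(True, True): "PRO", (False, True): "NP-trace",
             (False, False): "variable", (True, False): "pro"}
    return names[(pronominal(np), anaphoric(np))]


print(classify(NP(empty=True, governed=False)))      # ungoverned EC
print(classify(NP(empty=True, case=None)))           # governed, Caseless EC
print(classify(NP(empty=True, case="accusative")))   # governed, Case-marked EC
```

Under these assumptions an ungoverned empty category comes out as PRO, a governed Caseless one as an NP-trace, and a governed Case-marked one as a variable.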
The claim associated with the interpretive 
rule, as it is implemented in Appendix A, is that 
given a surface structure in the sense defined above,
it will derive the correct antecedent-trace relations 
after it applies. An illustrative sample of its opera- 
tion is provided in (3), where the (simplified) struc- 
ture tree of sentence (2) is shown. The annotations 
superscripted to the C, COMP', S, INFL', VP, and 
V' nodes are the A-Chain and A'-Chain attributes,
respectively. Thus, for the root node, the
value of both attributes is zero. Similarly, the 
superscripts on the NP nodes represent the node 
and Chain attributes of the NP. The last NP in 
the tree, complement of 'love', thus bears node 
number 5 and belongs to Chain 1. 
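As a rough check of the rule on sentence (2), the following Python sketch applies a subset of the attribution rules of Appendix A to a hand-built surface structure. It is an illustration under stated assumptions rather than the parser itself: the class `Node`, the function `attribute`, and the flag `theta` (False models θs = 'nil') are hypothetical names, only NP nodes are numbered, NPs are treated as leaves, and the features empty and anaphoric are supplied by hand instead of being computed by the rules of Appendix B.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    cat: str                                   # "C", "COMP'", "S", "INFL'", "VP", "V'", "NP" or a head
    kids: List["Node"] = field(default_factory=list)
    word: str = ""
    empty: bool = False                        # NP only
    pronominal: bool = False                   # NP only
    anaphoric: bool = False                    # NP only
    theta: bool = True                         # NP only; False models θs = 'nil'
    node: int = 0                              # preorder number (NPs only, an assumption)
    chain: int = 0                             # NP.Chain
    a: int = 0                                 # inherited A-Chain
    abar: int = 0                              # inherited A'-Chain


def attribute(n: Node, a: int = 0, abar: int = 0,
              counter: Optional[List[int]] = None) -> None:
    """Percolate A-Chain / A'-Chain downwards and assign NP.node and NP.Chain."""
    counter = [0] if counter is None else counter
    n.a, n.abar = a, abar
    if n.cat == "NP":                          # general rule (i)
        counter[0] += 1
        n.node = counter[0]
        if not n.empty or n.pronominal:
            n.chain = n.node
        elif n.anaphoric:
            n.chain = a
        else:
            n.chain = abar
        if n.chain == 0:                       # condition NP.Chain ≠ 0
            raise ValueError("empty NP without an antecedent")
        return
    np = next((k for k in n.kids if k.cat == "NP"), None)
    if np is not None:
        attribute(np, a, abar, counter)        # NP.x ← parent.x
    for k in n.kids:
        if k is np:
            continue
        ka, kabar = a, abar                    # default: plain percolation
        if n.cat == "C" and np is not None:    # C → NP COMP'
            kabar = np.chain
        elif n.cat == "S" and np is not None:  # S → NP INFL'
            ka = 0 if np.theta else np.chain
            kabar = 0 if np.chain == abar else abar
        elif n.cat == "V'" and np is not None and k.cat == "C":   # V' → V NP C
            ka, kabar = 0, (0 if np.chain == abar else abar)
        attribute(k, ka, kabar, counter)


# Surface structure of (2): Who did John seem [e [e to love e]]
who = Node("NP", word="who")
john = Node("NP", word="John", theta=False)          # subject of 'seem': no θ-role
t_comp = Node("NP", word="e", empty=True)            # Wh-trace in COMP specifier
t_subj = Node("NP", word="e", empty=True, anaphoric=True)   # NP-trace
t_obj = Node("NP", word="e", empty=True)             # Wh-trace in object position
lower_s = Node("S", [t_subj, Node("INFL'", [Node("INFL", word="to"),
               Node("VP", [Node("V'", [Node("V", word="love"), t_obj])])])])
lower_c = Node("C", [t_comp, Node("COMP'", [Node("COMP"), lower_s])])
upper_s = Node("S", [john, Node("INFL'", [Node("INFL"),
               Node("VP", [Node("V'", [Node("V", word="seem"), lower_c])])])])
root = Node("C", [who, Node("COMP'", [Node("COMP", word="did"), upper_s])])

attribute(root)
for x in (who, john, t_comp, t_subj, t_obj):
    print(x.word, (x.node, x.chain))
```

With these inputs the last NP, the complement of 'love', receives node number 5 and Chain 1, in agreement with the description above.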
Some Theoretical Implications: Bounding 
Nodes and Subjacency
In Government-binding theory it is assumed that 
the set of bounding nodes that a language may 
select is not fixed across human languages, but is 
open to parametric variation. Rizzi (1978) observed 
that in Italian the Subjacency condition is systemat- 
ically violated by double Wh-extraction construc- 
tions, as in (4.a), if one assumes for Italian the same 
set of bounding nodes as for English. The analogous 
construction (4.b) is also possible in Spanish. A 
solution, considered by Rizzi to explain the gram- 
maticality of (4), is to assume that in Italian and 
Spanish, COMP specifier position may be "doubly 
filled" in the course of a transformational deriva- 
tion, while requiring that it be not doubly filled (by 
non-empty phrases) at S-Structure. Thus both 
moved phrases 'a cui' and 'che storie' can move to
the lowest COMP position in the first transforma- 
tional cycle, while in the second cycle 'a cui' may 
move to the next higher COMP and 'che storie' 
stays in the first COMP. 
(2) Who_i did John_j seem [e_i [e_j to love e_i]]

(3) Structure tree of (2). NP nodes are annotated (node, Chain);
    all other nodes are annotated (A-Chain, A'-Chain).

    C (0,0)
      NP (1,1): Who_1
      COMP' (0,1)
        COMP: did
        S (0,1)
          NP (2,2): John_2
          INFL' (2,1)
            INFL
            VP (2,1)
              V' (2,1)
                V: seem
                C (2,1)
                  NP (3,1): e_1
                  COMP' (2,1)
                    COMP
                    S (2,1)
                      NP (4,2): e_2
                      INFL' (0,1)
                        INFL: to
                        VP (0,1)
                          V' (0,1)
                            V: love
                            NP (5,1): e_1
A second solution, which is the one adopted 
by Rizzi and constitutes the currently accepted 
explanation of the (apparent) Subjacency violation,
is to assume that Italian and Spanish select C and 
NP as bounding nodes, a set different from that of 
English. The first phrase 'che storie' may then 
move to the lowest COMP position in the first 
transformational cycle, while the second, 'a cui', 
moves in the next cycle in one step to the next 
higher position, crossing two S nodes but, crucially, 
only one C node. Thus Subjacency is satisfied if C,
not S, is taken as a bounding node. 
(4) a. Tuo fratello, [a cui]_i mi domando [che
storie]_j abbiano raccontato e_j e_i, era molto
preoccupato.
Your brother, to whom I wonder what stories
they have told, was very worried.
b. Tu hermano, [a quien]_i me pregunto [que
historias]_j le habran contado e_j e_i, estaba
muy preocupado.
The empirical data that arguably distin- 
guishes between the two proposed solutions is (5.a). 
While the "doubly filled" COMP hypothesis allows 
indefinitely long Wh-chains with doubly filled 
COMPs, making it possible for a wh-chain element 
and its successor to skip more than one COMP posi- 
tion that already contains some wh-phrase, the 
"bounding node" hypothesis states that at most one 
filled COMP position may be skipped. Thus, the 
second hypothesis, but not the first, correctly 
predicts the ungrammaticality of (5.a). 
(5) a. * Juan, [a quien]_i no me imagino [cuanta
gente]_j e_j sabe donde_k han mandado e_i e_k,
desaparecio ayer.
Juan, whom I can't imagine how many people
know where they have sent, disappeared yesterday.
b. La Gorgona, [a donde]_i no me imagino
[cuanta gente]_j e_j sabe [a quienes]_k han
mandado e_k e_i, es una bella isla.
La Gorgona, to where I can't imagine how
many people know whom they have sent, is a
beautiful island.
One might observe, however, that (5.a), even
if it satisfies Subjacency, violates Pesetsky's (1982)
Path Containment Condition (PCC). Thus, on these 
grounds, (5.a) does not decide between the two 
hypotheses. The grammaticality of (5.b), on the 
other hand, which is structurally similar to (5.a) but 
satisfies the PCC, argues in favor of the "doubly 
filled" COMP hypothesis. The wh-phrase 'a donde' 
moves from its D-Structure position to the surface 
position, skipping two intermediate COMP posi- 
tions. This is possible if we assume the doubly filled 
COMP hypothesis, and would violate Subjacency 
under the alternate hypothesis, even if C is taken as 
the bounding node. We expect a similar pattern 
(5.b) to be also valid in Italian. 
Movement across doubly filled COMP nodes, 
satisfying Pesetsky's (1982) Path Containment Con- 
dition, may be explained computationally if we 
assume that the type of the A'-Chain attribute on
chain nodes is a last-in/first-out (LIFO) stack of
integers, onto which the integers identifying A'-chain
heads are pushed as they are first encountered, and
from which chain identifiers are dropped as the
chains are terminated. If we further assume that
the type of the attribute is universal, we may
explain the typological difference between Italian
and English, as it refers to the Subjacency condition,
by assuming the presence of an A'-Chain
stack depth bound, which is parametrized by universal
grammar, and has the value 1 for English, and
2 (or possibly more) for Italian and Spanish.
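The effect of the depth bound can be illustrated with a small Python sketch. It is a simplification under several assumptions that are not part of the specification: the tree is flattened into a left-to-right sequence of chain heads and traces, intermediate traces in COMP are ignored, chains are named by their indices rather than by node numbers, and when the bound is exceeded the oldest pending chain is assumed to be lost (for a bound of 1 this reduces to the replacement of A'-Chain by the chain of the new Wh-phrase, as in Appendix A). The function name `links_ok` is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, str]   # ("head", index) or ("trace", index)


def links_ok(events: List[Event], bound: int) -> bool:
    """True if every trace finds its antecedent on top of a bounded LIFO stack."""
    stack: List[str] = []
    for kind, chain in events:
        if kind == "head":
            stack.append(chain)
            if len(stack) > bound:
                stack.pop(0)          # assumption: the oldest pending chain is lost
        else:
            if not stack or stack[-1] != chain:
                return False          # no antecedent, or paths cross (not nested)
            stack.pop()               # chain terminated
    return True


# (4): [a cui]_i ... [che storie]_j ... e_j e_i
ex4 = [("head", "i"), ("head", "j"), ("trace", "j"), ("trace", "i")]
# (5.a): [a quien]_i ... [cuanta gente]_j e_j ... donde_k ... e_i e_k
ex5a = [("head", "i"), ("head", "j"), ("trace", "j"),
        ("head", "k"), ("trace", "i"), ("trace", "k")]
# (5.b): [a donde]_i ... [cuanta gente]_j e_j ... [a quienes]_k ... e_k e_i
ex5b = [("head", "i"), ("head", "j"), ("trace", "j"),
        ("head", "k"), ("trace", "k"), ("trace", "i")]

for name, ex in (("4", ex4), ("5.a", ex5a), ("5.b", ex5b)):
    print(name, {bound: links_ok(ex, bound) for bound in (1, 2)})
```

In this sketch (4) and (5.b) link successfully with a bound of 2 but not with a bound of 1, while (5.a) fails under both bounds because its paths are not nested.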
To conclude this section, it is worth reviewing
the manner in which the subjacency facts are 
explained by the present attribute grammar imple- 
mentation. Notice first that there is no particular 
set of categories in the theory that have been 
declared as Bounding categories. There is no special 
procedure that checks that the Subjacency condi- 
tion is actually satisfied by, say, traversing paths 
between adjacent chain elements in a tree and 
counting bounding nodes. Instead, the facts follow 
from the attribution rules that determine the values 
of the attributes A-Chain and A'-Chain. This
can be verified by inspection of the possible cases of 
movement. 
Thus, NP-movement is from object or INFL 
specifier position to the nearest INFL specifier which 
c-commands the extraction site. Similarly, Wh- 
movement is from object, INFL specifier, or COMP 
specifier position to the nearest c-commanding 
COMP specifier. If the bound on the depth of the 
A'-Chain stack is 1, either S or COMP' (but not
both) may be taken as bounding node, and Wh- 
island phenomena are observable. If the bound is 2 
or greater, then C is the closest approximation to a 
bounding node (although cf. (5.b)), and Wh-island 
violations which satisfy the PCC are possible. NP 
is a bounding node as a consequence of the strong 
condition that no chain spans across an NP node, 
which in turn is a consequence of the rules (ii.e) in 
Appendix A. 
Parser Implementation 
A prototype of the English parser is currently being 
developed using the Prolog logic programming 
language. As mentioned in the introduction, the 
attribute grammar specification is neutral regarding 
the choice of parsing automaton. Thus, several 
suitable parser construction techniques (Aho and 
Ullman, 1972) may be used to derive a parser. The 
context-free base used by the attribute grammar is 
an X'-grammar, essentially as in Jackendoff (1977), 
although some modifications have been made. In 
particular, following Chomsky (1986) we assume 
that maximal projections have uniformly bar-level 2 
and that S is a projection of INFL, not V, as Jack- 
endoff assumes. The base, due to left-recursion in 
several productions, is not LR(k), for any k. 
We have developed a parser which is essen- 
tially LL(1), and incorporates a stack depth bound 
which is linearly related to the length of the input 
string. Prolog's backtracking mechanism provides 
the means for obtaining alternate parses of syntacti- 
cally ambiguous sentences. The parser performs rea- 
sonably well with a good number of constructions 
and, due to the stack bound, avoids potentially 
infinite derivations which could arise due to the 
application of mutually recursive rules. Attributes 
are implemented by logical variables which are asso- 
ciated with tree nodes (cf. Arbab, 1986). Most attri- 
butes can be evaluated in a preorder traversal of the 
parse tree, and thus attribute evaluation may be 
combined with LL(1) parser actions. Notable excep- 
tions to this evaluation order are the attributes 
Governor, Cases, and θs associated with the NP in
INFL specifier position. The value of these attri- 
butes cannot be determined until the main verb of 
the relevant clause is found. 
Conclusions 
We have presented a computational specification of 
a fragment of Government-binding theory with 
potentially far-reaching theoretical and practical 
implications. From a theoretical point of view, the 
present attribute grammar specification offers a 
fairly concrete framework which may be used to 
study the development and stable state of human 
linguistic competence. From a more practical point 
of view, the attribute grammar serves as a starting
point for the development of high quality parsers for 
natural languages. To the extent that the 
specification is explanatorily adequate, the language 
described by the grammar (recognized by the 
parser) may be changed by altering the values of 
the universal parameters in the grammar and 
changing the underlying lexicon. 
Acknowledgements 
I would like to thank my dissertation advisor, Jaklin 
Kornfilt, for helpful and timely advice at all stages
of this research. Also, I wish to thank an 
anonymous ACL reviewer who pointed out the similarity
of the grammatical model I assume to that
proposed by Koster (1978), Mary Laughren and 
Beth Levin for their discussion and commentary on 
related aspects of this research, Ed Barton, who 
kindly made available some of the early literature 
on GB parsing, Mike Kashket for some critical com- 
ments, and Ed Stabler for his continued support of 
this project. Support for this research has been pro- 
vided in part by the CASE Center at Syracuse 
University. 
References 
Aho, A.V., and J.D. Ullman. 1972. The Theory of 
Parsing, Translation and Compiling. 
Prentice-Hall, Englewood Cliffs, NJ 
Arbab, Bijan. 1986. "Compiling Circular Attribute 
Grammars into Prolog." IBM Journal of 
Research and Development, Vol. 30, No. 3, 
May 1986 
Berwick, Robert and Amy Weinberg. 1984. The 
Grammatical Basis of Linguistic Perfor- 
mance. The MIT Press. Cambridge, MA 
Chomsky, Noam. 1981. Lectures on Government 
and Binding. Foris Publications. Dordrecht
Chomsky, Noam. 1982. Some Concepts and Conse- 
quences of the Theory of Government and 
Binding. The MIT Press. Cambridge, MA 
Chomsky, Noam. 1986. Barriers. The MIT Press. 
Cambridge, MA 
Correa, Nelson. In preparation. Syntactic Analysis 
of English with respect to Government- 
binding Grammar. Ph.D. Dissertation, Syra- 
cuse University 
Gazdar, Gerald, Ewan Klein, Geoffrey Pullum, and
Ivan Sag. 1985. Generalized Phrase Structure 
Grammar. Harvard University Press. Cam- 
bridge, MA 
Jackendoff, Ray. 1977. X̄ Syntax: A Study of
Phrase Structure. The MIT Press. Cambridge, 
MA 
Kashket, Michael. 1986. "Parsing a Free-word 
Order Language: Walpiri." Proceedings of the 
24th Annual Meeting of the Association for
Computational Linguistics, p.60-66.
Knuth, Donald E. 1968. "Semantics of Context-free
Languages." In Mathematical Systems Theory, 
Vol. 2, No. 2, 1968 
Koster, Jan. 1978. "Conditions, Empty Nodes, and 
Markedness." Linguistic Inquiry, Vol. 9, No. 
4. 
Kuhns, Robert. 1986. "A PROLOG Implementation 
of Government-binding Theory." Proceedings
of the Annual Conference of the European 
Chapter of the Association for Computational 
Linguistics, p.546-550. 
Marcus, Mitchell. 1980. A Theory of Syntactic 
Recognition for Natural Language. The MIT 
Press. Cambridge, MA 
Pesetsky, D. 1982. Paths and Categories. Ph.D. 
Dissertation, MIT 
Pullum, Geoffrey. 1985. "Assuming Some Version
of the X-bar Theory." Syntax Research 
Center, University of California, Santa Cruz 
Rizzi, Luigi. 1978. "Violations of the Wh-Island
Constraint in Italian and the Subjacency 
Condition." Montreal Working Papers in 
Linguistics 11 
Sells, Peter. 1985. Lectures on Contemporary Syn- 
tactic Theories. Chicago University Press. 
Chicago, Illinois 
Sharp, Randall M. 1985. A Model of Grammar 
Based on Principles of Government and Binding.
M.Sc Thesis, Department of Computer
Science, University of British Columbia. 
October, 1985 
Van Riemsdijk, Henk and Edwin Williams. 1986. An
Introduction to the Theory of Grammar. The 
MIT Press. Cambridge, MA 
Waite, William M. and Gerhard Goos. 1984. Compiler
Construction. Springer-Verlag. New
York 
Wehrli, Eric. 1984. "A Government-binding Parser
for French." Institut pour les Etudes Seman- 
tiques et Cognitives, Universite de Geneve. 
Working Paper No. 48 
Appendix A: The Chain Rule 
i. General rule and condition 
attribution:
  NP.Chain ← if NP.empty = '-' then NP.node
             else if NP.pronominal = '+' then NP.node
             else if NP.anaphoric = '+' then NP.A-Chain
             else NP.A'-Chain
condition:
  NP.Chain ≠ 0
ii. Productions 
a. Start production 
Z → C
attribution:
  C.A-Chain ← 0
  C.A'-Chain ← 0
b. COMP productions 
C → COMP'
attribution:
  COMP'.x ← C.x, for x = A-Chain, A'-Chain
condition:
  C.A-Chain = 0
C → NP COMP'
attribution:
  NP.x ← C.x, for x = A-Chain, A'-Chain
  COMP'.A-Chain ← C.A-Chain
  COMP'.A'-Chain ← NP.Chain
condition:
  NP.Wh = '+'
COMP' → COMP S
attribution:
  S.x ← COMP'.x, for x = A-Chain, A'-Chain
c. INFL productions
S → NP INFL'
attribution:
  NP.x ← S.x, for x = A-Chain, A'-Chain
  INFL'.A-Chain ← if NP.θs = 'nil'
                  then NP.Chain else 0
  INFL'.A'-Chain ← if NP.Chain = S.A'-Chain
                   then 0 else S.A'-Chain
INFL' → INFL VP
attribution:
  VP.x ← INFL'.x, for x = A-Chain, A'-Chain
d. V productions 
VP → V'
attribution:
  V'.x ← VP.x, for x = A-Chain, A'-Chain
V' → V NP
attribution:
  NP.x ← V'.x, for x = A-Chain, A'-Chain
V' → V C
attribution:
  C.x ← V'.x, for x = A-Chain, A'-Chain
V' → V NP C
attribution:
  NP.x ← V'.x, for x = A-Chain, A'-Chain
  C.A-Chain ← 0
  C.A'-Chain ← if NP.Chain = V'.A'-Chain
               then 0 else V'.A'-Chain
e. N productions
NP1 → (NP2) N'
attribution:
  NP2.A-Chain ← 0
  NP2.A'-Chain ← 0
N' → N (PP) (C)
attribution:
  PP.A-Chain ← 0
  PP.A'-Chain ← 0
  C.A-Chain ← 0
  C.A'-Chain ← 0
Appendix B: Functional determination of 
NP 
i. General Rules 
attribution:
  NP.pronominal ← if NP.empty = '-' then N'.pronominal
                  else if NP.Governor = <0,'nil'> then '+'
                  else '-'
  NP.anaphoric ← if NP.empty = '-' then N'.anaphoric
                 else if NP.Whs = '+' then '-'
                 else if NP.Governor = <0,'nil'> then '+'
                 else if NP.Cases = 'nil' then '+'
                 else '-'
ii. Productions
NP → e
attribution:
  NP.empty ← '+'
NP → (Spec) N'
attribution:
  NP.empty ← '-'
