PARSING HEAD-DRIVEN PHRASE STRUCTURE GRAMMAR 
Derek Proudlan and Carl Pollard 
Hewlett-Packard Laboratories 
1501 Page Mill Road 
Palo Alto, CA. 94303, USA 
Abstract 
The Head-driven Phrase Structure Grammar project 
(HPSG) is an English language database query system 
under development at Hewlett-Packard Laboratories. 
Unlike other product-oriented efforts in the natural lan- 
guage understanding field, the HPSG system was de- 
signed and implemented by linguists on the basis of 
recent theoretical developments. But, unlike other im- 
plementations of linguistic theories, this system is not a 
toy, as it deals with a variety of practical problems not 
covered in the theoretical literature. We believe that 
this makes the HPSG system unique in its combination 
of linguistic theory and practical application. 
The HPSG system differs from its predecessor 
GPSG, reported on at the 1982 ACL meeting (Gawron 
et al. [1982]), in four significant respects: syntax, lexi- 
cal representation, parsing, and semantics. The paper 
focuses on parsing issues, but also gives a synopsis of 
the underlying syntactic formalism. 
1 Syntax 
HPSG is a lexically based theory of phrase struc- 
ture, so called because of the central role played by 
grammatical heads and their associated complements.* 
Roughly speaking, heads are linguistic forms (words 
and phrases) that exert syntactic and semantic restric- 
tions on the phrases, called complements, that charac- 
teristically combine with them to form larger phrases. 
Verbs are the heads of verb phrases (and sentences), 
nouns are the heads of noun phrases, and so forth. 
As in most current syntactic theories, categories 
are represented as complexes of feature specifications. 
But the HPSG treatment of lexical subcategorization 
obviates the need in the theory of categories for the no- 
tion of bar-level (in the sense of X-bar theory, prevalent 
in much current linguistic research). In addition, the 
augmentation of the system of categories with stack- 
valued features - features whose values are sequences 
of categories - unifies the theory of lexical subcatego- 
rization with the theory of binding phenomena. By 
binding phenomena we mean essentially non-clause- 
bounded dependencies, such as those involving dislo- 
cated constituents, relative and interrogative pronouns, 
and reflexive and reciprocal pronouns [12]. 
* HPSG is a refinement and extension of the closely related Generalized Phrase 
Structure Grammar [7]. The details of the theory of HPSG are set forth in [11]. 
More precisely, the subcategorization of a head is 
encoded as the value of a stack-valued feature called 
"SUBCAT". For example, the SUBCAT value of the 
verb persuade is the sequence of three categories [VP, 
NP, NP], corresponding to the grammatical relations 
(GR's): controlled complement, direct object, and sub- 
ject respectively. We are adopting a modified version of 
Dowty's [1982] terminology for GR's, where subject is 
last, direct object second-to-last, etc. For semantic rea- 
sons we call the GR following a controlled complement 
the controller. 
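The SUBCAT encoding described above can be sketched in a few lines. This is an illustrative toy, not the HP implementation: the category labels and the pairing of list positions with GR names (subject last, direct object second-to-last) follow the Dowty-style ordering in the text, but the data structures are invented.

```python
# Hypothetical sketch: SUBCAT values as sequences of category labels,
# ordered so that the subject is last, direct object second-to-last, etc.
SUBCAT = {
    "persuade": ["VP", "NP", "NP"],  # controlled complement, direct object, subject
    "works":    ["NP"],              # subject only
}

def grammatical_relations(verb):
    """Pair each subcategorized category with its GR, reading from the end."""
    gr_names = ["subject", "direct object", "controlled complement"]
    cats = SUBCAT[verb]
    return list(zip(reversed(cats), gr_names))

print(grammatical_relations("persuade"))
# -> [('NP', 'subject'), ('NP', 'direct object'), ('VP', 'controlled complement')]
```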
One of the key differences between HPSG and its 
predecessor GPSG is the massive relocation of linguistic 
information from phrase structure rules into the lexi- 
con \[5\]. This wholesale lexicalization of linguistic infor- 
mation in HPSG results in a drastic reduction in the 
number of phrase structure rules. Since rules no longer 
handle subcategorization, their sole remaining function 
is to encode a small number of language-specific prin- 
ciples for projecting from lexical entries to surface con- 
stituent order. 
The schematic nature of the grammar rules allows 
the system to parse a large fragment of English with 
only a small number of rules (the system currently uses 
sixteen), since each rule can be used in many different 
situations. The constituents of each rule are sparsely 
annotated with features, but are fleshed out when taken 
together with constituents looked for and constituents 
found. 
For example, the sentence The manager works can 
be parsed using the single rule R1 below. The rule 
is applied to build the noun phrase The manager by 
identifying the head H with the lexical element man- 
ager and the complement C1 with the lexical element 
the. The entire sentence is built by identifying the H 
with works and the C1 with the noun phrase described 
above. Thus the single rule R1 functions as both the 
S -> NP VP and the NP -> Det N rules of familiar 
context-free grammars. 
R1. x -> c1 h[(CONTROL INTRANS)] a* 
Figure 1. A Grammar Rule. 
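The double duty done by R1 can be sketched as follows. This is a hedged illustration with an invented toy lexicon, not the system's actual rule machinery: the point is only that one schematic rule, fleshed out by the head's category, yields both the NP and the S instantiation.

```python
# Hypothetical sketch: one schematic rule (x -> c1 h a*) instantiated two ways.
# The head's category determines what the mother and the complement become.
def apply_r1(head, complement):
    """Instantiate R1 for a head; the table below is an assumed toy lexicon."""
    instantiations = {
        "N": ("NP", "Det"),  # manager: head N, complement Det, mother NP
        "V": ("S", "NP"),    # works:   head V, complement NP,  mother S
    }
    mother, wanted = instantiations[head["cat"]]
    assert complement["cat"] == wanted, "complement does not match"
    return {"cat": mother, "head": head, "comps": [complement]}

np = apply_r1({"cat": "N", "word": "manager"}, {"cat": "Det", "word": "the"})
s = apply_r1({"cat": "V", "word": "works"}, np)
print(s["cat"])  # -> S
```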
Feature Passing 
The theory of HPSG embodies a number of sub- 
stantive hypotheses about universal grammatical prin- 
ciples. Such principles as the Head Feature Princi- 
ple, the Binding Inheritance Principle, and the Con- 
trol Agreement Principle require that certain syntac- 
tic features specified on daughters in syntactic trees be 
inherited by the mothers. Highly abstract phrase struc- 
ture rules thus give rise to fully specified grammatical 
structures in a recursive process driven by syntactic in- 
formation encoded on lexical heads. Thus HPSG, un- 
like similar "unification-based" syntactic theories, em- 
bodies a strong hypothesis about the flow of relevant 
information in the derivation of complex structures. 
Unification 
Another important difference between HPSG and 
other unification based syntactic theories concerns the 
form of the expressions which are actually unified. 
In HPSG, the structures which get unified are (with 
limited exceptions to be discussed below) not general 
graph structures as in Lexical Functional Grammar [1] 
or Functional Unification Grammar [10], but rather flat 
atomic-valued feature matrices, such as those shown 
below. 
[(CONTROL 0 INTRANS) (MAJ N A) 
 (AGR 3RDSG) (PRD MINUS) (TOP MINUS)] 
[(CONTROL 0) (MAJ N V) (INV PLUS)] 
Figure 2. Two feature matrices. 
In the implementation of HPSG we have been able 
to use this restriction on the form of feature matrices 
to good advantage. Since for any given version of the 
system the range of atomic features and feature values 
is fixed, we are able to represent flat feature matrices, 
such as the ones above, as vectors of integers, where 
each cell in the vector represents a feature, and the in- 
teger in each cell represents a disjunction of the possible 
values for that feature. 
CON MAJ AGR PRD INV TOP ... 
--------------------------- 
| 1 | 10 | 2 | 1 | 3 | 1 | 
| 1 | 12 | 7 | 3 | 1 | 3 | 
Figure 3: Two transduced feature matrices. 
For example, if the possible values of the MAJ fea- 
ture are N, V, A, and P, then we can uniquely represent 
any combination of these features with an integer in 
the range 0..15. This is accomplished simply by assign- 
ing each possible value an index which is an integral 
power of 2 in this range and then adding up the indices 
so derived for each disjunction of values encountered. 
Unification in such cases is thus reduced to the "logical 
and" of the integers in each cell of the vector represent- 
ing the feature matrix. In this way unification of these 
flat structures can be done in constant time, and since 
"logical and" is generally a single machine instruction 
the overhead is very low. 
  N   V   A   P 
| 1 | 0 | 1 | 0 | = 10 = (MAJ N A) 
| 1 | 1 | 0 | 0 | = 12 = (MAJ N V) 
================= Unification 
| 1 | 0 | 0 | 0 | =  8 = (MAJ N) 
Figure 4: Closeup of the MAJ feature. 
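The bit-vector scheme of Figures 3 and 4 can be reproduced in a few lines. This is a minimal sketch of the encoding the paper describes, with the power-of-two assignment for MAJ chosen to reproduce the values in Figure 4 (N=8, V=4, A=2, P=1); the assignment itself is an assumption.

```python
# Each atomic value of MAJ gets a power-of-two index; a disjunction of
# values is encoded as the sum of their indices (an integer in 0..15).
MAJ_BITS = {"N": 8, "V": 4, "A": 2, "P": 1}

def encode(*values):
    """Encode a disjunction of MAJ values as a single integer."""
    code = 0
    for v in values:
        code |= MAJ_BITS[v]
    return code

def unify(a, b):
    """Unification of two disjunctions reduces to a bitwise AND."""
    result = a & b
    if result == 0:
        raise ValueError("unification failure: no common value")
    return result

na = encode("N", "A")  # 10, as in Figure 4
nv = encode("N", "V")  # 12
print(unify(na, nv))   # -> 8, i.e. (MAJ N)
```

Because the AND is a single machine instruction, unifying a whole flat matrix is just one AND per cell of the vector, which is the constant-time behavior claimed above.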
There are, however, certain cases when the values 
of features are not atomic, but are instead themselves 
feature matrices. The unification of such structures 
could, in theory, involve arbitrary recursion on the gen- 
eral unification algorithm, and it would seem that we 
had not progressed very far from the problem of uni- 
fying general graph structures. Happily, the features 
for which this property of embedding holds constitute 
a small finite set (basically the so-called "binding fea- 
tures"). Thus we are able to segregate such features 
from the rest, and recurse only when such a "category- 
valued" feature is present. In practice, therefore, the 
time performance of the general unification algorithm 
is very good, essentially the same as that of the flat 
structure unification algorithm described above. 
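The two-tier strategy just described might be sketched as follows. This is a hedged illustration, not the actual algorithm: the dict representation, the particular feature names in BINDING_FEATURES, and the treatment of category-valued features as lists are all assumptions made for the example.

```python
# Flat atomic features unify by bitwise AND of their value-set encodings;
# only the few "binding features", whose values are themselves feature
# matrices, trigger recursion. Feature names here are illustrative.
BINDING_FEATURES = {"SUBCAT", "REFL", "BPRO"}

def unify(a, b):
    """Unify two feature matrices represented as dicts of feature -> value."""
    result = {}
    for feat in set(a) | set(b):
        if feat not in a:
            result[feat] = b[feat]
        elif feat not in b:
            result[feat] = a[feat]
        elif feat in BINDING_FEATURES:
            # category-valued: recurse on the embedded matrices
            result[feat] = [unify(x, y) for x, y in zip(a[feat], b[feat])]
        else:
            # atomic: constant-time AND of integer-encoded disjunctions
            v = a[feat] & b[feat]
            if v == 0:
                raise ValueError("unification failure on %s" % feat)
            result[feat] = v
    return result

# MAJ: 10 & 12 == 8; AGR is carried over unchanged from the first matrix.
print(unify({"MAJ": 10, "AGR": 7}, {"MAJ": 12}))
```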
2 Parsing 
As in the earlier GPSG system, the primary job 
of the parser in the HPSG system is to produce a se- 
mantics for the input sentence. This is done composi- 
tionally as the phrase structure is built, and uses only 
locally available information. Thus every constituent 
which is built syntactically has a corresponding seman- 
tics built for it at the same time, using only information 
available in the phrasal subtree which it immediately 
dominates. This locality constraint in computing the 
semantics for constituents is an essential characteristic 
of HPSG. For a more complete description of the se- 
mantic treatment used in the HPSG system see Creary 
and Pollard \[2\]. 
Head-driven Active Chart Parser 
A crucial difference between the HPSG system and 
its predecessor GPSG is the importance placed on the 
head constituent in HPSG. In HPSG it is the head con- 
stituent of a rule which carries the subcategorization 
information needed to build the other constituents of 
the rule. Thus parsing proceeds head first through the 
phrase structure of a sentence, rather than left to right 
through the sentence string. 
The parser itself is a variation of an active chart 
parser \[4,9,8,13\], modified to permit the construction of 
constituents head first, instead of in left-to-right order. 
In order to successfully parse "head first", an edge* 
must be augmented to include information about its 
span (i.e. its position in the string). This is necessary 
because the head can appear as a middle constituent of 
a rule with other constituents (e.g. complements or 
adjuncts) on either side. Thus it is not possible to 
record all the requisite boundary information simply 
by moving a dot through the rule (as in Earley), or by 
keeping track of just those constituents which remain 
to be built (as in Winograd). An example should make 
this clear. 
Suppose as before we are confronted with the task 
of parsing the sentence The manager works, and again 
we have available the grammar rule R1. Since we are 
parsing in a "head first" manner we must match the 
H constituent against some substring of the sentence. 
But which substring? In more conventional chart pars- 
ing algorithms which proceed left to right this is not 
a serious problem, since we are always guaranteed to 
have an anchor to the left. We simply try building the 
leftmost constituent of the rule starting at the leftmost 
position of the string, and if this succeeds we try to 
build the next leftmost constituent starting at one po- 
sition to the right of wherever the previous constituent 
ended. However, in our case we cannot assume any such 
anchoring to the left, since as the example illustrates, 
the H is not always leftmost. 
The solution we have adopted in the HPSG system 
is to annotate each edge with information about the 
span of substring which it covers. In the example be- 
low the inactive edge E1 is matched against the head of 
rule R1, and since they unify the new active edge E2 is 
created with its head constituent instantiated with the 
feature specifications which resulted from the unifica- 
tion. This new edge E2 is annotated with the span of 
the inactive edge E1. Some time later the inactive edge 
E3 is matched against the "np" constituent of our ac- 
tive edge E2, resulting in the new active edge E4. The 
span of E4 is obtained by combining the starting posi- 
tion of E3 (i.e. 1) with the finishing position of E2 (i.e. 
3). The point is that edges are constructed from the 
head out, so that at any given time in the life cycle of 
an edge the spanning information on the edge records 
the span of contiguous substring which it covers. 
Note that in the transition from rule R1 to edge 
E2 we have relabeled the constituent markers x, c1, 
and h with the symbols s, np, and VP respectively. 
This is done merely as a mnemonic device to reflect 
the fact that once the head of the edge is found, the 
subcategorization information on that head (i.e. the 
values of the SUBCAT feature of the verb works) is 
* An edge is, loosely speaking, an instantiation of a rule with some of the 
features on constituents made more specific. 
propagated to the other elements of the edge, thereby 
restricting the types of constituents with which they 
can be satisfied. Writing a constituent marker in upper 
case indicates that an inactive edge has been found 
to instantiate it, while a lower case (not yet found) 
constituent in bold face indicates that this is the next 
constituent which will try to be instantiated. 
E1. V<3,3> 
R1. x -> c1 h a* 
E2. s<3,3> -> np VP a* 

E3. NP<1,2> 
E2. s<3,3> -> np VP a* 
E4. s<1,3> -> NP VP a* 

Figure 5: Combining edges and rules. 
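The head-outward span bookkeeping of Figure 5 amounts to extending a contiguous interval to the left or right. A minimal sketch, with the tuple representation of spans an assumption of the example:

```python
# An edge records the contiguous span <start, end> it covers. Combining an
# active edge with an adjacent inactive edge extends the span on one side.
def combine(active_span, inactive_span):
    """Extend an active edge's span with an adjacent inactive edge's span."""
    a_start, a_end = active_span
    i_start, i_end = inactive_span
    if i_end + 1 == a_start:  # inactive edge sits just to the left
        return (i_start, a_end)
    if a_end + 1 == i_start:  # inactive edge sits just to the right
        return (a_start, i_end)
    raise ValueError("edges are not adjacent")

# E2 spans <3,3> (the head 'works'); E3, the NP, spans <1,2>.
print(combine((3, 3), (1, 2)))  # -> (1, 3), the span of E4
```

Because the head may sit in the middle of a rule, this two-sided extension replaces the one-sided "dot movement" of a conventional left-to-right chart parser.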
Using Semantics Restrictions 
Parsing "head first" offers both practical and theo- 
retical advantages. As mentioned above, the categories 
of the grammatical relations subcategorized for by a 
particular head are encoded as the SUBCAT value of 
the head. Now GR's are of two distinct types: those 
which are "saturated" (i.e. do not subcategorize for 
anything themselves), such as subjects and objects, and 
those which subcategorize for a subject (i.e. controlled 
complements). One of the language-universal gram- 
matical principles (the Control Agreement Principle) 
requires that the semantic controller of a controlled 
complement always be the next grammatical relation 
(in the order specified by the value of the SUBCAT 
feature of the head) after the controlled complement 
to combine with the head. But since the HPSG parser 
always finds the head of a clause first, the grammati- 
cal order of its complements, as well as their semantic 
roles, are always specified before the complements are 
found. As a consequence, semantic processing of con- 
stituents can be done on the fly as the constituents 
are found, rather than waiting until an edge has been 
completed. Thus semantic processing can be done ex- 
tremely locally (constituent-to-constituent in the edge, 
rather than merely node-to-node in the parse tree as in 
Montague semantics), and therefore a parse path can 
be abandoned on semantic grounds (e.g. sortal incon- 
sistency) in the middle of constructing an edge. In this 
way semantics, as well as syntax, can be used to control 
the parsing process. 
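The on-the-fly pruning described above can be sketched as follows. The sorts and role restrictions here are entirely invented for illustration; the point is only that a sort check fires the moment a complement is attached, so a bad path dies mid-edge rather than after the edge is complete.

```python
# Illustrative sketch of semantic pruning during edge construction: each
# complement is sort-checked as it is attached to the edge.
ROLE_SORTS = {"works": {"subject": {"person", "machine"}}}  # invented sorts

def attach(edge, role, constituent):
    """Attach a found constituent; fail immediately on sortal inconsistency."""
    allowed = ROLE_SORTS[edge["head"]][role]
    if constituent["sort"] not in allowed:
        raise ValueError("sortal inconsistency: abandon this parse path")
    edge.setdefault("found", {})[role] = constituent
    return edge

edge = {"head": "works"}
attach(edge, "subject", {"word": "manager", "sort": "person"})  # succeeds
# attach(edge, "subject", {"word": "sincerity", "sort": "abstract"})  # raises
```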
Anaphora in HPSG 
Another example of how parsing "head first" pays 
off is illustrated by the elegant technique this strat- 
egy makes possible for the binding of intrasentential 
anaphors. This method allows us to assimilate cases of 
bound anaphora to the same general binding method 
used in the HPSG system to handle other non-lexically- 
governed dependencies such as gaps, interrogative pro- 
nouns, and relative pronouns. Roughly, the unbound 
dependencies of each type on every constituent are en- 
coded as values of an appropriate stack-valued feature 
("binding feature"). In particular, unbound anaphors 
are kept track of by two binding features: REFL (for 
reflexive pronouns) and BPRO (for personal pronouns 
available to serve as bound anaphors). According to 
the Binding Inheritance Principle, all categories on 
binding-feature stacks which do not get bound under a 
particular node are inherited onto that node. Just how 
binding is effected depends on the type of dependency. 
In the case of bound anaphora, this is accomplished 
by merging the relevant agreement information (stored 
in the REFL or BPRO stack of the constituent contain- 
ing the anaphor) with one of the later GR's subcatego- 
rized for by the head which governs that constituent. 
This has the effect of forcing the node that ultimately 
unifies with that GR (if any) to be the sought-after 
antecedent. The difference between reflexives and per- 
sonal pronouns is this. The binding feature REFL 
is not allowed to inherit onto nodes of certain types 
(those with CONTROL value INTRANS), thus forc- 
ing the reflexive pronoun to become locally bound. In 
the case of non-reflexive pronouns, the class of possible 
antecedents is determined by modifying the subcatego- 
rization information on the head governing the pronoun 
so that all the subcategorized-for GR's later in gram- 
matical order than the pronoun are "contra-indexed" 
with the pronoun (and thereby prohibited from being 
its antecedent). Binding then takes place precisely as 
with reflexives, but somewhere higher in the tree. 
We illustrate this distinction with two examples. 
In sentence S1 below told subcategorizes for three con- 
stituents: the subject NP Pullum, the direct object 
Gazdar, and the oblique object PP about himself.* 
Thus either Pullum or Gazdar are possible antecedents 
of himself, but not Wasow. 
SI. Wasow was convinced that Pullum told 
Gazdar about himself. 
S2. Wasow persuaded Pullum to shave him. 
In sentence S2 shave subcategorizes for the direct 
object NP him and an NP subject eventually filled by 
the constituent Pullum via control. Since the subject 
position is contra-indexed with the pronoun, Pullum is 
blocked from serving as the antecedent. The pronoun 
is eventually bound by the NP Wasow higher up in the 
tree. 
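The contra-indexing step can be sketched in miniature. The representation below (GR's as dicts carrying a set of excluded indices) is an assumption made for the example; it shows only the core idea, that later GR's on the governing head are barred as antecedents, so binding must happen higher in the tree.

```python
# Rough sketch of contra-indexing: every GR later in grammatical order than
# a non-reflexive pronoun is marked as an impossible antecedent for it.
def contra_index(later_grs, pronoun_index):
    """Mark each later GR as excluded for the pronoun's index."""
    return [dict(gr, excluded=set(gr.get("excluded", set())) | {pronoun_index})
            for gr in later_grs]

def can_bind(gr, index):
    """A GR can serve as antecedent only if the index is not excluded."""
    return index not in gr.get("excluded", set())

# In S2, the subject of 'shave' is later in GR order than the pronoun 'him'.
subject = {"gr": "subject"}
[subject] = contra_index([subject], pronoun_index=1)
print(can_bind(subject, 1))  # -> False: Pullum cannot bind 'him' in S2
```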
Heuristics to Optimize Search 
The HPSG system, based as it is upon a care- 
fully developed linguistic theory, has broad expressive 
power. In practice, however, much of this power is often 
not necessary. To exploit this fact the HPSG system 
uses heuristics to help reduce the search space implic- 
itly defined by the grammar. These heuristics allow 
the parser to produce an optimally ordered agenda of 
edges to try based on words used in the sentence, and 
on constituents it has found so far. 
* The preposition is treated essentially as a case marking. 
One type of heuristic involves additional syntactic 
information which can be attached to rules to deter- 
mine their likelihood. Such a heuristic is based on the 
currently intended use for the rule to which it is at- 
tached, and on the edges already available in the chart. 
An example of this type of heuristic is sketched below. 
R1. x -> c1 h a* 
Heuristic-1: Are the features of c1 +QUE? 
Figure 6: A rule with an attached heuristic. 
Heuristic-1 encodes the fact that rule R1, when 
used in its incarnation as the S -> NP VP rule, is pri- 
marily intended to handle declarative sentences rather 
than questions. Thus if the answer to Heuristic-1 is 
"no" then this edge is given a higher ranking than if 
the answer is "yes". This heuristic, taken together with 
others, determines the rank of the edge instantiated 
from this rule, which in turn determines the order in 
which edges will be tried. The result in this case is that 
for a sentence such as S3 below, the system will pre- 
fer the reading for which an appropriate answer is "a 
character in a play by Shakespeare" over the reading 
which has as a felicitous answer "Richard Burton". 
S3. Who is Hamlet? 
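The agenda ordering driven by heuristics like Heuristic-1 can be sketched as a priority queue. This is an illustrative reconstruction, not the HP implementation: the scoring convention (lower score = tried first), the edge representation, and the c1_que flag are all assumptions of the example.

```python
# Illustrative sketch: heuristics score candidate edges, and the agenda is a
# priority queue ordered so that better-ranked edges are tried first.
import heapq

def heuristic_1(edge):
    """Prefer R1-as-declarative: penalize a +QUE first constituent."""
    return 1 if edge.get("c1_que") else 0  # lower score = higher priority

def rank(edge, heuristics):
    return sum(h(edge) for h in heuristics)

agenda = []
for edge in [{"name": "decl", "c1_que": False},
             {"name": "ques", "c1_que": True}]:
    heapq.heappush(agenda, (rank(edge, [heuristic_1]), edge["name"]))

print(heapq.heappop(agenda)[1])  # -> decl: the declarative reading is tried first
```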
It should be emphasized, however, that heuristics 
are not an essential part of the system, as are the fea- 
ture passing principles, but rather are used only for 
reasons of efficiency. In theory all possible constituents 
permitted by the grammar will be found eventually 
with or without heuristics. The heuristics simply help 
a linguist tell the parser which readings are most likely, 
and which parsing strategies are usually most fruitful, 
thereby allowing the parser to construct the most likely 
reading first. We believe that this clearly differenti- 
ates HPSG from "ad hoc" systems which do not make 
sharp the distinction between theoretical principle and 
heuristic guideline, and that this distinction is an im- 
portant one if the natural language understanding pro- 
grams of today are to be of any use to the natural 
language programs and theories of the future. 
ACKNOWLEDGEMENTS 
We would like to acknowledge the valuable assis- 
tance of Thomas Wasow and Ivan Sag in the writing 
of this paper. We would also like to thank Martin Kay 
and Stuart Shieber for their helpful commentary on an 
earlier draft. 
REFERENCES 
[1] Bresnan, J. (ed.) (1982) 
The Mental Representation of Grammatical Rela- 
tions, The MIT Press, Cambridge, Mass. 
[2] Creary, L. and C. Pollard (1985) 
"A Computational Semantics for Natural Lan- 
guage", Proceedings of the 23rd Annual Meeting 
of the Association for Computational Linguistics. 
[3] Dowty, D.R. (1982) 
"Grammatical Relations and Montague Grammar", 
in P. Jacobson and G.K. Pullum (eds.), The Nature 
of Syntactic Representation, D. Reidel Publishing 
Co., Dordrecht, Holland. 
[4] Earley, J. (1970) 
"An Efficient Context-Free Parsing Algorithm", 
CACM 13:2, 1970. 
[5] Flickinger, D., C. Pollard, and T. Wasow (1985) 
"Structure-Sharing in Lexical Representation", 
Proceedings of the 23rd Annual Meeting of the 
Association for Computational Linguistics. 
[6] Gawron, J. et al. (1982) 
"Processing English with a Generalized Phrase 
Structure Grammar", ACL Proceedings 20. 
[7] Gazdar, G. et al. (in press) 
Generalized Phrase Structure Grammar, 
Blackwell and Harvard University Press. 
[8] Kaplan, R. (1973) 
"A General Syntactic Processor", in Rustin (ed.), 
Natural Language Processing, Algorithmics Press, 
N.Y. 
[9] Kay, M. (1973) 
"The MIND System", in Rustin (ed.), Natural 
Language Processing, Algorithmics Press, N.Y. 
[10] Kay, M. (forthcoming) 
"Parsing in Functional Unification Grammar". 
[11] Pollard, C. (1984) 
Generalized Context-Free Grammars, Head Gram- 
mars, and Natural Language, Ph.D. Dissertation, 
Stanford. 
[12] Pollard, C. (forthcoming) 
"A Semantic Approach to Binding in a Monos- 
tratal Theory", to appear in Linguistics and 
Philosophy. 
[13] Winograd, T. (1983) 
Language as a Cognitive Process, 
Addison-Wesley, Reading, Mass. 
