Using Restriction to Extend Parsing Algorithms for 
Complex-Feature-Based Formalisms 
Stuart M. Shieber 
Artificial Intelligence Center 
SRI International 
and 
Center for the Study of Language and Information 
Stanford University 
Abstract 1 Introduction 
Grammar formalisms based on the encoding of grammatical 
information in complex-valued feature systems enjoy some 
currency both in linguistics and natural-language-processing 
research. Such formalisms can be thought of by analogy to 
context-free grammars as generalizing the notion of non- 
terminal symbol from a finite domain of atomic elements 
to a possibly infinite domain of directed graph structures 
nf a certain sort. Unfortunately, in moving to an infinite 
nonterminal domain, standard methods of parsing may no 
longer be applicable to the formalism. Typically, the prob- 
lem manifests itself ,as gross inefficiency or ew, n nontermina- 
t icm of the alg~,rit hms. In this paper, we discuss a solution to 
the problem of extending parsing algorithms to formalisms 
with possibly infinite nonterminal domains, a solution based 
on a general technique we call restriction. As a particular 
example of such an extension, we present a complete, cor- 
rect, terminating extension of Earley's algorithm that uses 
restriction to perform top-down filtering. Our implementa- 
tion of this algorithm demonstrates the drastic elimination 
of chart edges that can be achieved by this technique. Fi- 
t,all.v, we describe further uses for the technique--including 
parsing other grammar formalisms, including definite.clause 
grammars; extending other parsing algorithms, including 
LR methods and syntactic preference modeling algorithms; 
anti efficient indexing. 
This research has been made possible in part by a gift from the Sys* 
terns Development Fonndation. and was also supported by the Defense 
Advancml Research Projects Agency under C,mtraet NOOO39-g4-K- 
0n78 with the Naval Electronics Systems Ckm~mand. The views and 
ronchtsi~ms contained in this &Jcument should not be interpreted a.s 
representative of the official p~dicies, either expressed or implied, of 
the D~'fen~p Research Projects Agency or the United States govern- 
mont. 
The author is indebted to Fernando Pereira and Ray Perrault for their 
comments on ea, riier drafts o\[ this paper. 
Grammar formalisms ba.sed on the encircling of grantmal- 
ical information in complex-valued fealure systems enjoy 
some currency both in linguistics and natural-language- 
processing research. Such formalisms can be thought of by 
analogy to context-free grammars a.s generalizing the no- 
tion of nonterminai symbol from a finite domain of atomic 
elements to a possibly infinite domain of directed graph 
structures of a certain sort. Many of tile sm'fa,',,-bast,,I 
grammatical formalisms explicitly dvfin,,,I ,,r pr~"~Ul~p,~'.,'.l 
in linguistics can be characterized in this way ,,.~.. It.xi ,I- 
functional grammar (I,F(;} \[5\], generalizt,I I,hr:~,' ~l rlt,'l ur,. 
grammar (GPSG) \[.1\], even categorial systems such ,as M,,n- 
tague grammar \[81 and Ades/Steedman grammar Ill --,~s can 
several of the grammar formalisms being used in natural- 
language processing research--e.g., definite clause grammar 
(DCG) \[9\], and PATR-II \[13\]. 
Unfortunately, in moving to an infinite nonlermiual de,- 
main, standard methods of parsing may no h,ngvr t~, ap- 
plicable to the formalism. ~k~r instance, the application 
of techniques for preprocessing of grantmars in ,,rder t,, 
gain efficiency may fail to terminate, ~ in left-c,~rner and 
LR algorithms. Algorithms performing top-dc~wn prediction 
(e.g. top-down backtrack parsing, Earley's algorithm) may 
not terminate at parse time. Implementing backtracking 
regimens~useful for instance for generating parses in some 
particular order, say, in order of syntactic preference--is 
in general difficult when LR-style and top-down backtrack 
techniques are eliminated. 
\[n this paper, we discuss a s~dul.ion to the pr~,blem of ex- 
tending parsing algorithms to formalisms with possibly infi- 
nite nonterminal domains, a solution based on an operation 
we call restriction. In Section 2, we summarize traditional 
proposals for solutions and problems inherent in them and 
propose an alternative approach to a solution using restric- 
tion. In Section 3, we present some technical background 
including a brief description of the PATR-II formalism~ 
which is used as the formalism interpreted by the pars- 
ing algorithms~and a formal definition of restriction for 
145 
PATR-II's nonterminal domain. In Section 4, we develop 
a correct, complete and terminating extension of Earley's 
algorithm for the PATR-II formalism using the restriction 
notion. Readers uninterested in the technical details of the 
extensions may want to skip these latter two sections, refer- 
ring instead to Section 4.1 for an informal overview of the 
algorithms. Finally, in Section 5, we discuss applications 
of the particular algorithm and the restriction technique in 
general. 
2 Traditional Solutions and an Al- 
ternative Approach 
Problems with efficiently parsing formalisms based on 
potentially infinite nonterminal domains have manifested 
themselves in many different ways. Traditional solutions 
have involved limiting in some way the class of grammars 
that can be parsed. 
2.1 Limiting the formalism 
The limitations can be applied to the formalism by, for in- 
stance, adding a context-free "backbone." If we require that 
a context-free subgrammar be implicit in every grammar, 
the subgrammar can be used for parsing and the rest of the 
grammar used az a filter during or aRer parsing. This solu- 
tion has been recommended for functional unification gram- 
mars (FI,G) by Martin Kay \[61; its legacy can be seen in 
the context-free skeleton of LFG, and the Hewlett-Packard 
GPSG system \[31, and in the cat feature requirement in 
PATR-\[I that is described below. 
However, several problems inhere in this solution of man- 
dating a context-free backbone. First, the move from 
context-free to complex-feature-based formalisms wan mo- 
tivated by the desire to structure the notion of nonterminal. 
Many analyses take advantage of this by eliminating men- 
tion of major category information from particular rules a or 
by structuring the major category itself (say into binary N 
and V features plus a bar-level feature as in ~-based theo- 
ries). F.rcing the primacy and atomicity of major category 
defeats part of the purpose of structured category systems. 
Sec, m,l. and perhaps more critically, because only cer- 
tain ,ff the information in a rule is used to guide the parse, 
say major category information, only such information can 
be used to filter spurious hypotheses by top-down filtering. 
Note that this problem occurs even if filtering by the rule 
information is used to eliminate at the earliest possible time 
constituents and partial constituents proposed during pars- 
ing {as is the case in the PATR-II implementation and the 
~Se~'. \[or instance, the coordination and copular "be" aaalyses from 
GPSG \[4 I, the nested VP analysis used in some PATR-ll grammars 
11.5 I, or almost all categorial analyse~, in which general roles of com- 
bination play the role o1' specific phlr~se-stroctur¢ roles. 
Earley algorithm given below; cf. the Xerox LFG system}. 
Thus, if information about subcategorization is left out of 
the category information in the context-free skeleton, it can- 
not be used to eliminate prediction edges. For example, if 
we find a verb that subcategorizes for a noun phrase, but 
the grammar rules allow postverbal NPs, PPs, Ss, VPs, and 
so forth, the parser will have no way to eliminate the build- 
ing of edges corresponding to these categories. Only when 
such edges attempt to join with the V will the inconsistency 
be found. Similarly, if information about filler-gap depen- 
dencies is kept extrinsic to the category information, as in 
a slash category in GPSG or an LFG annotation concern- 
ing a matching constituent for a I~ specification, there will 
be no way to keep from hypothesizing gaps at any given 
vertex. This "gap-proliferation" problem has plagued many 
attempts at building parsers for grammar formalisms in this 
style. 
In fact, by making these stringent requirements on what 
information is used to guide parsing, we have to a certain 
extent thrown the baby out with the bathwater. These 
formalisms were intended to free us from the tyranny of 
atomic nonterminal symbols, but for good performance, we 
are forced toward analyses putting more and more informa- 
tion in an atomic category feature. An example of this phe- 
nomenon can be seen in the author's paper on LR syntactic 
preference parsing \[14\]. Because the LALR table building 
algorithm does not in general terminate for complex-feature- 
based grammar formalisms, the grammar used in that paper 
was a simple context-free grammar with subcategorization 
and gap information placed in the atomic nonterminal sym- 
bol. 
2.2 Limiting grammars and parsers 
On the other hand, the grammar formalism can be left un- 
changed, but particular grammars dew,loped that happen 
not to succumb to the problems inhere, at in the g,,neral 
parsing problem for the formalism. The solution mentioned 
above of placing more information in lilt, category symbol 
falls into this class. Unpublished work by Kent Witwnburg 
and by Robin Cooper has attempted to solve the gap pro- 
liferation problem using special grammars. 
In building a general tool for grammar testing and debug- 
ging, however, we would like to commit as little ,as possible 
to a particular grammar or style of grammar.: Furthermore, 
the grammar designer should not be held down in building 
an analysis by limitations of the algorithms. Thus a solution 
requiring careful crMting of grammars is inadequate. 
Finally, specialized parsing alg~withms can be designed 
that make use of information about the p;trtictd;tr gram- 
mar being parsed to eliminate spurious edges or h vpothe- 
ses. Rather than using a general parsing algorithm on a 
'See \[121 for further discl~sioa of thi~ matter. 
146 
limited formalism, Ford, Bresnan, and Kaplan \[21 chose a 
specialized algorithm working on grammars in the full LFG 
formalism to model syntactic preferences. Current work at 
Hewlett-Packard on parsing recent variants of GPSG seems 
to take this line as well. 
Again, we feel that the separation of burden is inappropri- 
ate in such an attack, especially in a grammar-development 
context. Coupling the grammar design and parser design 
problems in this way leads to the linguistic and technolog- 
ical problems becoming inherently mixed, magnifying the 
difficulty of writing an adequate grammar/parser system. 
2.3 An Alternative: Using Restriction 
Instead, we would like a parsing algorithm that placed no 
restraints on the grammars it could handle as long as they 
could be expressed within the intended formalism. Still, the 
algorithm should take advantage of that part of the arbi- 
trarily large amount of information in the complex-feature 
structures that is significant for guiding parsing with the 
particular grammar. One of the aforementioned solutions 
is to require the grammar writer to put all such signifi- 
cant information in a special atomic symbol--i.e., mandate 
a context-free backbone. Another is to use all of the feature 
structure information--but this method, as we shall see, in- 
evitably leads to nonterminating algorithms. 
A compromise is to parameterize the parsing algorithm 
by a small amount of grammar-dependent information that 
tells the algorithm which of the information in the feature 
structures is significant for guiding the parse. That is, the 
parameter determines how to split up the infinite nontermi- 
nal domain into a finite set of equivalence classes that can be 
used for parsing. By doing so, we have an optimal compro- 
mise: Whatever part of the feature structure is significant 
we distinguish in the equivalence classes by setting the pa- 
rameter appropriately, so the information is used in parsing. 
But because there are only a finite number of equivalence 
ciasses, parsing algorithms guided in this way will terminate. 
The technique we use to form equivalence classes is re- 
strietion, which involves taking a quotient of the domain 
with respect to a rcstrietor. The restrictor thus serves as 
the sole repository, of grammar-dependent information in the 
algorithm. By tuning the restrictor, the set of equivalence 
classes engendered can be changed, making the algorithm 
more or less efficient at guiding the parse. But independent 
of the restrictor, the algorithm will be correct, since it is 
still doing parsing over a finite domain of "nonterminals," 
namely, the elements of the restricted domain. 
This idea can be applied to solve many of the problems en- 
gendered by infinite nonterminal domains, allowing prepro- 
cessing of grammars as required by LR and LC algorithms, 
allowing top-down filtering or prediction as in Earley and 
top-down backtrack parsing, guaranteeing termination, etc. 
3 Technical Preliminaries 
Before discussing the use of restriction in parsing algorithms, 
we present some technical details, including a brief introduc- 
tion to the PATR-II grammar formalism, which will serve 
as the grammatical formalism that the presented algorithms 
will interpret. PATR-II is a simple grammar formalism that 
can serve as the least common denominator of many of 
the complex-feature-based and unification-based formalisms 
prevalent in linguistics and computational linguistics. As 
such it provides a good testbed for describing algorithms 
for complex-feature-based formalisms. 
3.1 The PATR-II nonterminal domain 
The PATR-II nonterminal domain is a lattice of directed, 
acyclic, graph structures (dags). s Dags can be thought of 
similar to the reentrant f-structures of LFG or functional 
structures of FUG, and we will use the bracketed notation 
associated with these formalisms for them. For example. 
the following is a dag {D0) in this notation, with reentrancy 
indicated with coindexing boxes: 
a: 
d: 
b: c\] 
I , i: k: I hl\] 
Dags come in two varieties, complez (like the one above) 
and atomic (like the dags h and c in the example). Con~plex 
dags can be viewed a.s partial functions from labels to dag 
values, and the notation D(l) will therefore denote the value 
associated with the label l in the dag D. In the same spirit. 
we can refer to the domain of a dag (dora(D)). A dag with 
an empty domain is often called an empty dag or variable. 
A path in a dag is a sequence of label names (notated, e.g.. 
(d e ,f)), which can be used to pick out a particular subpart 
of the dag by repeated application {in this case. the dag \[g : 
hi). We will extend the notation D(p) in the obvious way to 
include the subdag of D picked ~,tlt b.v a path p. We will also 
occasionally use the square brackets as l he dag c~mstructor 
function, so that \[f : DI where D is an expression denoting 
a dag will denote the dag whose f feature has value D. 
3.2 Subsumption and Unification 
There is a natural lattice structure for dags based on 
subsumption---an ordering cm ¢lag~ that l'~mghly c~rre~pon~l.~ 
to the compatibility and relative specificity of infi~rmation 
~The reader is referred to earlier works \[15.101 for more detailed dis- 
cussions of dag structures. 
147 
contained in the dags. Intuitively viewed, a dag D subsumes 
a dag D' {notated D ~/T) if D contains a subset of the in- 
formation in (i.e., is more general than)/Y. 
Thus variables subsume all other dags, atomic or complex, 
because as the trivial case, they contain no information at 
all. A complex dag D subsumes a complex dag De if and 
only if D(i) C D'(I) for all l E dora(D) and LF(P) =/Y(q) 
for all paths p and q such that D(p) = D(q). An atomic dag 
neither subsumes nor is subsumed by any different atomic 
dag. 
For instance, the following subsumption relations hold: 
a: m\[b : c\] \] 
field:el r'\[a: {b:el\]c d: ~ 
-- -- t: f e: f 
Finally, given two dags D' and D", the unification of the 
dags is the most general dag D such that LF ~ D and D a C_ 
D. We notate this D = D ~ U D". 
The following examples illustrate the notion of unification: 
to tb:cllot : ,lb:cl\] 
\[ a: {b:cl\]u d - d 
The unification of two dags is not always well-defined. In 
the rases where no unification exists, the unificati,,n is said 
to fail. For example the following pair of dags fail to unify 
with each other: 
d d: \[b d\] =fail 
3.3 Restriction in the PATR-II nontermi- 
r,.al domain 
Now. consider the notion of restriction of a dag, using the 
term almost in its technical sense of restricting the domain 
,)f ,x function. By viewing dags as partial functions from la- 
bels to dag values, we can envision a process ,~f restricting 
the ,l~mlain of this function to a given set of labels. Extend- 
ing this process recursively to every level of the dag, we have 
the ,'-ncept of restriction used below. Given a finite, sperifi- 
,'ati,,n ~ (called a restrictor) of what the allowable domain 
at ,,:u'h node of a dag is, we can define a functional, g', that 
yields the dag restricted by the given restrictor. 
Formally, we define restriction as follows. Given a relation 
between paths and labels, and a dag D, we define D~ 
to be the most specific dag LF C D such that for every path 
p either D'(p) is undefined, or if(p) is atomic, or for every 
! E dom(D'(p)}, pOl. That is, every path in the restricted 
dag is either undefined, atomic, or specifically allowed by the 
restrictor. 
The restriction process can be viewed as putting dags into 
equivalence classes, each equivalence class being the largest 
set of dags that all are restricted to the same dag {which we 
will call its canonical member). It follows from the definition 
that in general O~O C_ D. Finally, if we disallow infinite 
relations as restrictors (i.e., restrictors must not allow values 
for an infinite number of distinct paths) as we will do for the 
remainder of the discussion, we are guaranteed to have only 
a finite number of equivalence classes. 
Actually, in the sequel we will use a particularly simple 
subclass of restrictors that are generable from sets of paths. 
Given a set of paths s, we can define • such that pOI if and 
only if p is a prefix of some p' E s. Such restrictors can be 
understood as ~throwing away" all values not lying on one 
of the given paths. This subclass of restrictors is sut~cient 
for most applications. However, tile algorithms that we will 
present apply to the general class as well. 
Using our previous example, consider a restrictor 4~0 gen- 
erated from the set of paths {(a b), (d e f),(d i j f)}. 
That is, pool for all p in the listed paths and all their pre- 
fixes. Then given the previous dag Do, D0~O0 is 
a: \[b: e l 
Restriction has thrown away all the infi~rmatiou except the 
direct values of (a b), (d e f), and (d i j f). (Note however 
that because the values for paths such as (d e f 9) were 
thrown away, (D0~'¢o)((d e f)) is a variahh,.) 
3.4 PATR-II grammar rules 
PATR-ll rules describe how to combine a sequence ,,f con- 
stituents. X, ..... X,, to form a constituent X0, stating mu- 
tual constraints on the dags associated with tile n + 1 con- 
stituents as unifications of various parts of the dags. For 
instance, we might have the following rule: 
Xo -" Xt .\': : 
(.\,, ,'sO = >' 
(.\', rat) = .X l' 
(.\': cat) = I'P 
(X, agreement) = (.\'~ agreement). 
By notational convention, we can eliminate unifications for 
the special feature cat {the atomic major category feature) 
recording this information implicitly by using it in the 
"name" of the constituent, e.g., 
148 
S-- NP VP: 
(NP agreement) = (VP agreement). 
If we require that this notational convention always be used 
(in so doing, guaranteeing that each constituent have an 
atomic major category associated with it}, we have thereby 
mandated a context-free backbone to the grammar, and can 
then use standard context-free parsing algorithms to parse 
sentences relative to grammars in this formalism. Limiting 
to a context-free-based PATR-II is the solution that previous 
implementations have incorporated. 
Before proceeding to describe parsing such a context-free- 
based PATR-II, we make one more purely notational change. 
Rather than associating with each grammar rule a set of 
unifications, we instead associate a dag that incorporates all 
of those unifications implicitly, i.e., a rule is associated with 
a dug D, such that for all unifications of the form p = q in 
the rule. D,(p) = D,(q). Similarly, unifications of the form 
p = a where a is atomic would require that D,(p) = a. For 
the rule mentioned above, such a dug would be 
X0: \[cat: S\] 
Xl : agreement: m\[\] 
\[eat: V P \] 
X, : agreement : ,~I 
Thus a rule can be thought of as an ordered pair (P, D) 
whore P is a production of the form X0 -- XI -.. X, and D 
is a dug with top-level features Xo,..., X, and with atomic 
values for the eat feature of each of the top-level subdags. 
The two notational conventions--using sets of unifications 
instead of dags, and putting the eat feature information im- 
plicitly in the names of the constituents--allow us to write 
rules in the more compact and familiar.format above, rather 
than this final cumbersome way presupposed by the algo- 
rithm. 
4 Using Restriction to Extend Ear- 
ley's Algorithm for PATR-II 
We now develop a concrete example of the use of restriction 
in parsing by extending Earley's algorithm to parse gram- 
mars in the PATR-\[I formalism just presented. 
4.1 An overview of the algorithms 
Earley's algorithm ia a bottom-up parsing algorithm that 
uses top-down prediction to hypothesize the starting points 
of possible constituents. Typically, the prediction step de- 
termines which categories of constituent can start at a given 
point in a sentence. But when most of the information is 
not in an atomic category symbol, such prediction is rela- 
tively useless and many types of constituents are predicted 
that could never be involved in a completed parse. This 
standard Earley's algorithm is presented in Section 4.2. 
By extending the algorithm so that the prediction step 
determines which dags can start at a given point, we can 
use the information in the features to be more precise in the 
predictions and eliminate many hypotheses. However. be- 
cause there are a potentially infinite number of such feature 
structures, the prediction step may never terminate. This 
extended Earley's algorithm is presented in Section 4.3. 
We compromise by having the prediction step determine 
which restricted dags can start at a given point. If the re- 
strictor is chosen appropriately, this can be as constraining 
as predicting on the basis of the whole feature structure, yet 
prediction is guaranteed to terminate because the domain -f 
restricted feature structures is finite. This final extension ,,f 
Earley's algorithm is presented in Section -t.4. 
4.2 Parsing a context-free-based PATR-II 
We start with the Earley algorithm for context-free-based 
PATR-II on which the other algorithms are based. The al- 
gorithm is described in a chart-parsing incarnation, vertices 
numbered from 0 to n for an n-word sentence TL, I '', Wn. An 
item of the form \[h, i, A -- a.~, D I designates an edge in the 
chart from vertex h to i with dotted rule A -- a.3 and dag 
D. 
The chart is initialized with an edge \[0, 0, X0 -- .a, DI for 
each rule (X0 -- a, D) where D((.% cat)) = S. 
For each vertex i do the following steps until no more items 
can be added: 
Predictor step: For each item ending at i c,f the form 
\[h, i, Xo -- a.Xj~, D I and each rule ,ff the form (-\'o -- 
~, E) such that E((Xo cat)) = D((Xi cat)), add an 
edge of the form \[i, i,.I( 0 -- .3,, E\] if this edge is not 
subsumed by another edge. 
Informally, this involves predicting top-down all r~tles 
whose left-hand-side categor~j matches the eatego~ of 
some constituent being looked for. 
Completer step: For each item of the form \[h, i,.\o -- 
a., D\] and each item of the form \[9. h, Xo -- f3..Yj~/, E\] 
add the item \[9, i, X0 --/LY/.3', Eu iX/ : D(.X'0)I\] if the 
unification succeeds' and this edge is not subsumed by 
another edge. s 
~Note that this unification will fail if D((Xo eat)) # E((X~ cat)) and 
no edge will be added, i.e., if the subphrase is not of the appropriate 
category for IsNrtlos Into the phrase being built. 
SOue edge subsumes another edge if and only if the fit'at three elements 
of the edges are identical and the fourth element o{ the first edge 
subsumes that of the second edge. 
149 
Informally, this involves forming a nsw partial phrase 
whenever the category of a constituent needed b~l one 
partial phrase matches the category of a completed 
phrase and the dug associated with the completed phrase 
can be unified in appropriately. 
Scanner step: If i # 0 and w~ - a, then for all items {h, i- 
1, Xo --* a.a~3, D\] add the item \[h, i, Xo --* oa.B, D\]. 
Informally, this involves aliomin9 lezical items to be in- 
serted into partial phrases. 
Notice that the Predictor Step in particular assumes the 
availability of the eat feature for top-down prediction. Con- 
sequently, this algorithm applies only to PATR-II with a 
context-free base. 
4.3 Removing the Context-Free Base: An 
Inadequate Extension 
A first attempt at extending the algorithm to make use of 
morn than just a single atomic-valued cat feature {or less 
if no .~u,'h feature is mandated} is to change the Predictor 
Step so that instead of checking the predicted rule for a left- 
hand side that matches its cat feature with the predicting 
subphr,'~e, we require that the whole left.hand-side subdag 
unifies with the subphrase being predicted from. Formally, 
we have 
Predictor step: For each item ending at i of the form 
ih. i. Xo -- a.Xj~, DI and each rule of the form (Xo 
"~. E). add an edge of the form \[i, i, X0 -- .7, Ell {X0 : 
D(Xj)II if the unification succeeds and this edge is not 
subsumed by another edge. 
This step predicts top-down all rules whose left-hand 
side matches the dag of some constituent bein 9 looked 
for. 
Completer step: As before. 
Scanner step: As before. 
\[\[owever. this extension does not preserve termination. 
Consi,h,r a %ountin~' grammar that records in the dag the 
numb,,r of terminals in the string, s 
.5' -- T : 
<.~f) = a. 
T,-- T: .4: 
(TIf) = {T:f f). 
.b'-- :i. 
A~G. 
SSimilar problems occur in natural language grammars when keeping 
lists of, say, subcategorized constituents or galm to be found. 
Initially, the ,.q -.- T rule will yield the edge 
\[0,0, Xo ---, .Xt, x0 S\] 1 \[oo, T\] 1 
&: I: a 
which in turn causes the Prediction step to give 
\[0, 0, Xo -'- .Xi, 
eat: T \] X0: I: ~a 
\[ eat : T \] 
Xt: f: \[f: ~\] x,: feat a\] 
yielding in turn 
\[0, 0, .% -..X,, 
cat: T ) 
Xo: f: '~a 
f eat : i .If t : f: f: 
X,: \[cat: A\] 
If: l\]\] 
and so forth ad infinitum. 
4.4 Removing the Context-free Base: An 
Adequate Extension 
What is needed is a way of ~forgetting" some of the structure 
we are using for top-down prediction. But this is just what 
restriction gives us, since a restricted dag always subsumes 
the original, i.e.. it has strictly less information. Takin~ 
advantage of this properly, we can change the Predi,'ri~n 
Step to restrict the top-down infurulation bef~,re unif> in~ it 
into the rule's dag. 
Predictor step: For each item ending at i of the f(~rm 
Ih, i, .% -- c,..Y~;L DI and each rule of the form,{.\'0 -- 
"t, E}, add an edge of the form ft. i..V0 -- .'~. E u 
{D{Xi)I~4~}\] if the unification succeeds and this odge is 
not subsumed by another edge. 
This step predicts top-do,,n flit rules ,'h,.~r lefl.ha,d 
side matrhes the restricted (lag of .~ott:e r,o.~tilttcol fitt- 
ing looked for. 
Completer step: AS before. 
Se~m, er step: As before. 
150 
This algorithm on the previous grammar, using a restrictor 
that allows through only the cat feature of a dag, operates a.s 
before, but predicts the first time around the more general 
edge: 
\[0, o, Xo -- .X,, 
cat: T \] 
X0: f: ITi\[\] 
cat: T 
X,: f: if: l-if l 
A\] 
1 
Another round of prediction yields this same edge so the 
process terminates immediately, duck Because the predicted 
edge is more general than {i.e., subsumes) all the infinite 
nutuber ,,f edges it replaced that were predicted under the 
nonterminating extension, it preserves completeness. On the 
other hand. because the predicted edge is not more general 
than the rule itself, it permits no constituents that violate 
the constraints of the rule: therefore, it preserves correctness. 
Finally, because restriction has a finite range, the prediction 
step can only occur a finite number of times before building 
an edge identical to one already built; therefore, it preserves 
ter,nination. 
5 Applications 
5.1 Some Examples of the Use of the Al- 
gorithm 
The alg.rithnl just described liras been imph,meuted and in- 
(',>rp()rat,,<l into the PATR-II Exp(,rinwntal Syst(,m at SRI 
Itlt,.rnali(,)lal. a gr:lmmar deveh)pment :m(l tt,~,ting envirt)n- 
m,.))t fi,l' I'\TILII ~rammars writt(.u in Z(.t:llisl) for the Syrn- 
l)+)li('~ 3(;(ll). 
The following table gives s,)me data ~ugge~t.ive of the el'- 
feet of the restrictor on parsing etliciency, it shows the total 
mlnlber (,f active and passive edges added to the <'hart for 
five sent,,ncos of up to eleven words using four different re- 
strictors. The first allowed only category information to be 
,ist,d in prodiction, thus generating th,, same l)eh:wi<)r .as the 
un<'xte:M('(} Earh,y's algorithl,,. The -,<'('(,n,{ a<{d,,,l su{w:tle- 
m+,rizati+.n illf-rrllalion in a<l(lili(,n t<)lh(,(-:H+,~<)ry: Thethird 
a<hl-d lill.+r-gap +h,l.'ndency infornlaliou a.s well ~,<+ Ihat the 
~:tp pr.lif<.rati<,n pr-hlem wa.s r<,m<)ved. The lin:d restri<'tor 
ad,lo.I v<,rb form informati.n. The last c<flutnn shows the 
p,,r('entag+, of edges that were elin,inated by using this final 
restrh-tor. 
Prediction % 
Sentence eat\] + s.bcat I + gap t ÷ form elim. 
1 33 33 20 16 I 52 
2 85 50 29 21 I 75 
3 219 124 72 45 79 
4 319 319 98 71 78 
5 812 516 157 100 !i 88 
Several facts should be kept in mind about the data 
above. First, for sentences with no Wh-movement or rel- 
ative clauses, no gaps were ever predicted. In other words, 
the top-down filtering is in some sense maximal with re- 
spect to gap hypothesis. Second, the subcategorization in- 
formation used in top-down filtering removed all hypotheses 
of constituents except for those directly subcategorized \[or. 
Finally, the grammar used contained constructs that would 
cause nontermination in the unrestricted extension of Ear- 
ley's algorithm. 
5.2 Other Applications of Restriction 
This technique of restriction of complex-feature structures 
into a finite set of equivalence cla~ses can be used for a wide 
variety of purposes. 
First. parsing Mg<,rithnls such ~ tile ;d~<)ve (:all be mod- 
ified for u~e by grain<nat (ortnalintus other than P.\TR-ll. 
In particular, definite-clause grammars are amenable to this 
technique, anti it <:an be IIsed to extend the Earley deduc- 
tion of Pereira and Warren \[i 1 I. Pereira has use<l a similar 
technique to improve the ellh'iency of the BI'P (bottom- 
up h,ft-corner) parser \[71 for DCC;. I,F(; and t;PSC parsers 
can nlake use of the top-down filteringdevic,,a~wvll. \[:f'(; 
p,'tl~ot'~ n|ight be \[mill th;tl d() ll(d. r<,<\[11il'i. ;+. c<~llt+,,,;-l'ri,~. 
backl><.m,. 
• ";*'<'(rod. rt,~ll'i<'ti(.ll <';tlt l)e llmt'+l If> ~'llh;lllt'+' ,+l h,'r I+;~l'>ill~, 
:dgorithuls. Ig>r eX;lllll)le, tilt, ancillary fllllttic~ll to c.tlq)uto 
1.1{ <'l.sure-- whMi. like Ihe Earh,y alg-rithm..,itht,r du.,.+ 
not use feature information, or fails to terminate--,-an be 
modified in the same way as the Ea.rh,y I)re<lict~r step to ter- 
nlinate while still using significant feature inf<,rmati(m. LR 
parsing techniques <'an therel+y I)e Ilsed f,,r ellicient par'dn~ 
+J conll)h,x-fe:)+ture-lmn.,<l fiwnlalislun. .\l,,r(' -,l)*','ulaliv+,ly. 
,'++cheme~. l'(+r s,'hed.lin~ I,I{ l>:irnt.r:.-+ h~ yi..hl l,;~r.,,.-, i. l>rvl "- 
or+,m-e ,~r+h'r t.i:~hl I., it,,,lilie~l fi,r .',.mld.,x-f,,:lqur.-l,;r~.,,l 
fl)rlllaliP,.llln, alld et,'cn t1111t,<\[ Iw lll+,:)+tln .d + lilt + l.(,,+.tl'ivt~+r. 
Finally, restriction can be ilsed ill are:~.s of i)arshlg oth+,r 
than top-down prediction and liltering. For inslance, in 
many parsing schemes, edges are indexed by a categ<,ry sym- 
bol for elficient retrieval. In the case of Earley's Mgorithm. 
active edges can be indexed bv the category of the ,'on- 
stituent following the dot in the dotted rule. tlowever, this 
again forces the primacy and atomicity of major category in- 
formation. Once again, restriction can be used to solve the 
problem. Indexing by the restriction of the dag associated 
151 
with the need p.grmits efficient retrieval that can be tuned to 
the particular grammar, yet does not affect the completeness 
or correctness of the algorithm. The indexing can be done 
by discrimination nets, or specialized hashing functions akin 
to the partial-match retrieval techniques designed for use in 
Prolog implementations \[16\]. 
6 Conclusion 
We have presented a general technique of restriction with 
many applications in the area of manipulating complex- 
feature-based grammar formalisms. As a particular exam- 
ple, we presented a complete, correct, terminating exten- 
sion of Earley's algorithm that uses restriction to perform 
top-down filtering. Our implementation demonstrates the 
drastic elimination of chart edges that can be achieved by 
this technique. Finally, we described further uses for the 
technique--including parsing other grammar formalisms, in- 
cluding definite-clause grammars; extending other parsing 
algorithms, including LR methods and syntactic preference 
modeling algorithms; and efficient indexing. 
We feel that the restriction technique has great potential 
to make increasingly powerful grammar formalisms compu- 
tationally feasible. 
References 
\[I\] Ades, A. E. and M. J. Steedman. On theorder of words. 
Linguistics and Philosophy, 4(4):517-558, 1982. 
\[21 Ford, M., J. Bresnan, and R. Kaplan. A competence- 
based theory of syntactic closure. In J. Bresnan, editor, 
The Mental Representation of Grammatical Relations, 
MIT Press, Cambridge, Massachusetts, 1982. 
\[3\] Gawron, J. M., J. King, J. Lamping, E. Loebner, E. 
A. Paulson, G. K. Pullum, I. A. Sag, and T. Wasow. 
Processing English with a generalized phrase structure 
grammar. In Proeecdinos of the ~Oth Annual Meet- 
ing of the Association for Computational Linguistics, 
pages 74-81, University of Toronto. Toronto, Ontario, 
Canada, 16-18 June 1982. 
\[41 Gazdar, G., E. Klein, G. K. Puilum, and I. A. Sag. 
Generalized Phrase Structure Grammar. Blackwell 
Publishing, Oxford, England, and Harvard University 
Press, Cambridge, M~ssachusetts, 1985. 
\[51 Kaplan, R. and J. Bresnan. Lexical-functional gram- 
mar: a formal system for grammatical representation. 
\[n J. Bresnan, editor, The Mental Representation o/ 
Grammatical Relations, MIT Press, Cambridge, Mas- 
sachusetts, 1983. 
\[61 Kay, M. An algorithm for compiling parsing tables from 
a grammar. 1980. Xerox Pale Alto Research Center. 
Pale Alto, California. 
\[7\] Matsumoto, Y., H. Tanaka, H. Hira'kawa. II. Miyoshi. 
and H. Yasukawa. BUP: a bottom-up parser embed- 
dad in Prolog. New Generation Computing, 1:145-158, 
1983. 
\[8\] Montague, R. The proper treatment of quantification 
in ordinary English. In R. H. Thomason. editor. Formal 
Philosophy, pages 188-221, Yale University Press. New 
Haven, Connecticut, 1974. 
\[9\] Pereira, F. C. N. Logic for natural language anal.vsis. 
Technical Note 275, Artificial Intelligence Center, SRI 
International, Menlo Park, California, 1983. 
\[I0\] Pereira, F. C. N. and S. M. Shieber. The semantics of 
grammar formalisms seen as computer languages. In 
Proceedings of the Tenth International Conference on 
Computational Linguistics, Stanford University, Stan- 
ford, California, 2-7 July 198,t. 
\[11\] Pereira, F. C. N. and D. H. D. Warren. Parsing as 
deduction. In Proceedinas o/ the elst Annual Meet- 
inff of the Association for Computational Linguistics. 
pages 137-144, Massachusetts Institute of Technology.. 
Cambridge, Massachusetts, 15-17 June 1983. 
\[12\] Shieber, S. M. Criteria for designing computer facilities 
for linguistic analysis. To appear in Linguistics. 
\[13\] Shieber, S. M. The design of a computer language 
for linguistic information. In Proceedings of the Tenth 
International Conference on Computational Lingui,s- 
ties, Stanford University, Stanford. California. 2-7 July 
1984. 
\[14\] Shieber, S. M. Sentence disambiguation by a shift- 
reduce parsing technique. \[n Proceedinqs of the ~l.~t 
Annual Martin O of the Association for Computational 
Linguistics, pages 1i5--118, Massachusetts Institute of 
Technology, Cambridge, Massachusetts, 15-17 June 
1983. 
\[15\] Shieber, S. M., H. Uszkoreit, F. C. N. Pereira, J. J. 
Robinson, and M. Tyson. The formalism and im- 
plementation of PATR-II. In Re,earth on Interactive 
Acquisition and Use of Knowledge, SRI International. 
Menio Park, California, 1983. 
\[16\] Wise, M. J. and D. M. W, Powors. Indexing Prol.g 
clauses via superimposed code words and lield encoded 
words. In Pvoeeedincs of the 198. i International Svm. 
posture on Logic Prowammin¢, pages 203-210, IEEE 
Computer Society Press, Atlantic City, New Jersey, 6-9 
February 1984. 
152 
