COMPUTING FIRST AND FOLLOW FUNCTIONS FOR, 
FEA'rURE-THEoRE'rR? GRAMMARS 
Arturo Trujillo 
Computer Labora,tory 
University of Cambridge 
Cambridge CB2 3QG, gnghmd 
iat@cl.cam.ac.uk 
ABSTRA(3T 1.1 FIRST ANI) FOLLOW 
This paper des(:ribes an algorithm ff)r (;he com- 
lmtation of FII{ST and F()I~LOW sets for use 
with f(;atur(!-thcor('~(;ic grammars, in which the 
value of the sets t:onsist;s of pairs of feature- 
l;heorctic cat;o, gorit;s. The Mgori(;hm tn'(;sezve, s 
as mu(:h informed;ion from th(; gratnnmrs a,s 
1)ossibh',, using negative restriction to (h;fin(~ 
(',quivah;nc(,, classes. Addition of a simple (Ia(;a 
S(;I;IIt:tIII'e leads to ;m order of ilta, glli|;lld(,; iln- 
\[)rov('Jll(;ii(; ilt exet:tl(;ion l;ilne ov(~r a naivo, iHl- 
ph;mo, ntation. 
1 INTRODUCTION 
The, need for (;flicien(; parsing is a (:onsta,n(; one 
in Natural Langmtge, Pro('cssing. With the ad- 
vo, n|; ()f fea(;ur(>l;he, ore(;i(: gratmn;ti's, In;my of 
(;he optimization te(:hniqu(',s that w(;re a, pi)li(:a- 
bh; to Context Free (CI") grammnrs have r(;- 
(tuircd mo(titi(:ation. For insl;an(:(~, ~ mlml)(~r 
of algorithms used (;o ex(;ra,(:L p;~rsing l;a~bl(~,q 
fl'om CF grammars have involved (list:at(ling 
inform~t(;ion which ol;h(',rwisc, would have con- 
strain('d tim parsing pro('ess, (Brisco(; and Car- 
roll, 19!)3). This t)ap(',r des(:rib(~s an extension 
to an algorithm that ()pe, r~L(;('~s over CF gram- 
nmr to make, it at)pli(:M)h; to f(',atur(',-th(;oretic 
()lieS. One advantage of th('~ (;x(;en(h~d Mgo- 
rithm is (;ha£ it, i)r(;serves ~ts much of l;hc in- 
fOrlnalAon in (;h(! graininar as I)ossibh;. 
\[n order to make more t,,\[\[icient parsers, it; is 
SOln(;gimt',s necessary (;o pr(;process ((:Olnl)ih;) a 
gra, illlBa,r 1;o ext;ra,c(; \]'rom it; (;o\[)-down inforlrm- 
~ion (;o guide (;h(; s(;mr:h during analysi.q. Tim 
\[irs(; sl;(;p in I;ho, prel)ro(:essing stage of sev- 
eral (;omi/ilal;ion algorithms requires (;h(; sohl- 
tion of (;wo \['un(',tions normally (:al\]e(l FII{,ST 
and F()\]AA)W. Intui(;iv(~ly, lrIILS'T(X) gives 
us (;h(; (;(',rmina, l symbols (;ha(; may app(;nr in 
hfil,ial posi(;iou in substrings de, riv(;(I froiD, c;L|;(~- 
gory X. FOLLOW(X) gives us (;Ira (;c'rminals 
whi(:h may imm(;dia,t(;ly follow a su|)string o1' 
(:~(;e, gory X. For (,~xami)lc , in l;h(; gra,mmar S 
> NI' VP; NP - > (lel; II(tltlI; VP --} vtra NP, 
wc go(;: 
l~'H~sr(s')- v'lies'-r(N/,)-: (J,c~,}, 
v~llc, s'~r(yl,) - {,,,~,,,..,}, 
FOI.I.0W(A~,') : {,,~,,.,, ,}, 
ml,mw(s') - t,~01~l~(~p~,,(V~,). (,} ($ 
marks trod of input,) 
Th(;so, (;wo runc(;bmLs a,t:(~ hnpc, rtan|; in a \]a, rgo, 
l'il, II~( ~. ()f algorithms used for constructing el'- 
licit;n(; l)arsexs. For ex;mq)le th(; l,I{-tmrse, r 
const, rll(stion algorithm given in (Aho c't al., 
1986):232 uses FIRST to ('omt)ute item do- 
SllI'( ~, va, hles. AIm(;h(~r examph; is (;h{'~ t:ompu- 
(;ation of the /* rela, tion whit:h is used in tim 
c(mstru('(;ion of go, nt;ralize,(1 left-(:orn(~r parsers, 
(N(;dcrhof, \] !)9:}); (;his relation is etl'ct:tivcly ~m 
ex(;ensi(m of t;h(; funt:(;ion FIRST. 
875 
2 COMPUTING FIRST AND 
FOLLOW 
We propose an algorithm :\[or the computa- 
tion of FIRST values which handles feature- 
theoretic grammars without having to extract 
a CF backbone from theln; the approach is eas- 
ily adapted to compute F()LLOW values too. 
An improvement to the algorithln in presented 
towards the end of' the pat)er. Betbre describ- 
ing the algorithm, we give a well known proce- 
dure for coinputing FIRST for CF grammars 
(taken from (Aho et al., 1986):1189, where e is 
the empty string): 
"rio conlpute FIRST(X) for all grammar sym- 
bols X, apply the following rules until no more 
terminals or e can be added to ally FIRST set:. 
1. If X is terufinal, then FIRST(X) is X. 
2. If X -+ e is a production, then add e to 
m~ST(X). 
3. If X is nonterminal and X --~ Y1Y,2...Y~ is 
a. production, then place a in FI.I?,ST(X) if 
\['or some i, a is in FIR, ST(Yi), and e is in 
all of F1RST(YI) ... FIRST(Yi_:I); that is, 
Yt...Y/ I ~ e. If e is in F\[12,ST(Yj) for all 
j = 1, 2,..., k, then add e to FIRST(X). 
Now, we can compui;e FIRST t:br any string XI 
X.e...Xu as tbllows. Add to FIRST(XIX2...X~z) 
all of the non-e symbols of I,'II?.ST(X,). Also 
add the non-e symbols of 1,'I.BST'(X,2) ire is in 
.FI.RST(Xt), the non-e symbols of P'll{ST(Xa) if 
e is in both t,'IH.ST(X,) and F1RSfl'(X2), and so 
on. Finally, add ~ to FIH.ST(XIX.e...X,~) if, tbr 
all i, FIH, ST(Xi) contains e." 
This algorithln will fbrm the basis of our pro- 
posal. 
3 COMPILING FEATURE- 
THEORETIC C~RAMMARS 
3.1 EQUIVAI,ENCE CLASSES 
The inain reason why the al)ove algorithm can- 
uol: be used with li~al, ure-theoi'etic grammars is 
that in general the number of possibh; nonter- 
minals allowed by the gralnmar is intinit~e. One 
of the simplest ways of showing this is where 
a grammar accumulates the orthographic rep- 
resentation of its terminals as one of its fea- 
ture values, it is not difficult to see how one 
can have an infinite mmlber of NPs in such a 
granlInar: 
NP\[orth: the (log 1 
NP\[orth: the fat clog\] 
NP\[orth: the big Nt dog\], etc. 
This means that l~'Ii~ST(NP\[orth: the (tog\]) 
would have a different value to FllLgT(NP\[ 
orth: the fat dog\]) even though they share 
the same left;most terminal. That is, |:tie ilia - 
ture structure for the substring "det adj noun" 
will be different to that for "det noun" ewm 
though they have tile same starting symbol. 
This point is important since similar situations 
arise with the subcategorization frame of verbs 
and the selnan(;ic value of categories in con- 
temporary theories of grammar, (Pollard and 
Sag, 1992). Without modification, the algo- 
rithm above would not terminate. 
The sohltion to this problem is to define a 
finite number of equivalence classes into which 
the infinite uumber of nnnterminals inay be 
sorted. 'Fhese (',lasses may be established in 
a number of ways; the one we have adopted in 
that presented by (Harrison and Ellison, \] 992) 
which builds on l;he work of (Shieber, 1985): it 
introduces the nol;ion of a negative restrictor 
to define equivalence classes. In this solution 
a predefined portion of a category (a specific 
set of paths) is discarded when determining 
whether a category belongs to an equivalence 
(:lass or not. For instance, in the above ex- 
ample we could define the negative restrictor 
to be {orth}. Applying this negative restrietor 
to each of the three NPs abow~' would discard 
the infbrmation in the %rth' feature t,o give us 
three cquiwflenI; nonterminals. It, is clear that 
the restrictor must be such that it discards fea- 
tures which in one way or another give rise I;o 
an infinil;e munl)er of nOlfl;erminals. Unl'ortu- 
nately, terlnination in not guaranteed for all 
restrict;ors, and \['llrl;hermore, the, best restric- 
tOl' CalUIOt; l)e chosen automatically since it de- 
pends on the amount of grammatical informa- 
tion I;hat is t;o be preserved. Thus, selection 
876 
o\[ :m ~t)t)roi)rial;e restrictor will det)(',IM on the 
parti(:ub~r grammar or system used. 
3.2 VA\] A)E SIIAR.ING 
Ano(;her prol)leln wil;h the Mgo,:il;hm Moove is 
l;ha.t, ree.ntranci(:s bel;w(:en a. category a)\[(t its 
Iqll.ST a.nd F()I,I,()W values are n()(. t)reserved 
in the sohition to (;hese t'unct;iollS; this is be- 
cause (he algoril;hlu assumes al;omic syml)ols 
and /;hese ca,m~ot encode (~xI)licilJy ,~ha, red in- 
f()rmation l)etwe(;l~ c~t(;t:gories, l'br example, 
cousid(:r the \[oIlowing ha,ire gra, mnm, r: 
S: :> Ne\[a.gr: X\] VP{a.gr: X\] 
VP\[agr: X\] ~> Vint\[a,gr: X\] 
NP\[~,gr: X\]-5 Det N\[a.gr: X\] 
We would like l,h(: solul;i(m of I,'OLLOW(N) 
t() in(:h\]de l;h(: l)in(ling o\[ the 'ag\]" f(:a,ture 
Sllch t;ha(; (;he va.hl(: of F()IA,()W ,'(~s(,ml)h:d: 
: x\]): : x\]. 
(:he a.lgoril;hm above, even wi(;h a. r(:s(;ri(:t;or, 
would nol, prese)'ve such at l)indiug siuce the 
a,dditi(m of a new ca, t('~go)'y to I,'OLLOW(N) 
is don(', indel)e.nd(',utly of the bindings \[)(;l:w(',(',n 
(;he new (:a,i:egory ~tlut N. 
4 Tile BASIC AI,QOI{.ITHM 
We l)rOpose an algorithm which, rather than 
cousl;ru(:{; a set; of categories as (;\[t(~ vah\]e of 
l,'II1.S"l' a.nd F()M,()W, <:onstru(:(;s a. set of pairs 
each of which represeuts a (:M;egory and its 
FIRST ov F()I,LOW category, with all the (:or- 
rect biudings exp\]i(:it;ly encoded. For instant(:, 
for l;he a.hove (:xa.iill)l( L (,he pair (Vl>\[agr: X\], 
Vint\[agr: X\]) would t)e in l;ho. set r(,pres(:nting 
the vMue ()f (;he fllll(:I;Joll FII{.ST. In th(~ uext 
section the a.lgorithm for (:OlUf)ul;ilu L FIll.ST is 
d(:s(:ril)(,.(l; (:ompul;a.l;io:t)oi' F()I~\],()W t>ro(:(~e(ls 
in a similar l'ashion. 
4.1 SO\],VIN(; FI.IZSq? 
When modifying the a.lgorit;hm of Section 2 
w(' note 1;ha.l; (:a.ch o(:(:mren(:(: o\[' a. (:al;eg()ry iu 
(;he grammar is pol;e.n(>ia.lly <list;in('.(; \['rom ev- 
(:1' 3, o(;her (:a.Le.gory. \])1 addit;iou, l()r each cat,c- 
gory we nee(| I;o r(:memb(u' a, ll the reentrmtcies 
between it aud the da,ughters wi(;hin the rule 
ill which i(; oc.(:ltrs. Finally, we assmne that 
any ca, tegory hi a, rule which c~m unify with 
a lexica.1 category is marked in some way, say 
by using the t'e~ture-wthle pair 'l;er: +', and 
I;ha.l. llOtl-(;(!rttlilla.l caJx;gori(,s IIIllS|; llni\[y with 
the tool;her o\[' ~ome rule in the grammar; the 
latter con(lit;ion is ne(:essaxy he(:ause the Mgo~ 
rithm only c(mllmLes the solutiou of FIll.ST \[or 
h:xi(:a,l (:a.lx:gories in' for (:a£egories tJml; occur as 
mot, hers. 
\]n corn\]rot;in ~ Iql{.g'r w(' i(,era.l;e over ~1| \[;he 
rules ill t;h(! gF;LIHHI&I', (;re.al;ing t, he i\[loi;h('.l' O\[ 
each rule as the category fl)r which we m'e (;ry- 
ing (,o lind a FIll.ST wdue. Throughout each 
i(x~ral:ion, unific~l;ion of a, (la.ugh(;er with tim lhs 
of an eh:Inent o\[ lql{ST resul(;s in a. modified 
rule and ~ modified pnir in which bindings be- 
(;ween the mot;her category mM the )'hs o\[ the 
pair are (~si;a, lflislmd. The modi\[ied mot;her aim 
rhs are \[;h('.tl ll,q(:(l (,o (:o,lH(;rtl(:l; I,ho 1}air which 
is added to F\[\]{ST. l)'or iusta.nce, giwm rule 
X - > ~" ~u,d pair (L, l~), w(! unify Y and L to 
t<iw: X'- } }7, and (I7, 1{); DOln these the pair 
(X', l~ t) is COllSl, t'llc:l;cd ~tll(| added 1,o \] I ~,S \[. 
The algorith\]n a.ssumes an op('ra£ion -I-~. 
which (:onsLrll(:l;.q a. sel; H' -- ,5' -} <7 /) ill the lbl- 
lowing w~Lv: i\[ pair p sul)smues an element; a 
of 5 then S' = ,ff - o, fl- p; if p is subsulned 
I)y an (Qement of ,%~ (;hen 3;' ~= ,%'; else S' - ,S 
) p. 1(; should b('. uol;ed (;trot the pairs col> 
stitul;h~g the wflue of li'II{.ST can themselw~s 
l)(: comlm.red using the subsumption relation in 
whid~ reeIll;ran(; wdu(;s a.re su\[),'-;ulIcled by non- 
r(:(:lll;ra.ti1; oIlcs~ 3AI(\[ combined using the uni\[i- 
cation olmration. Thus in the pl'in(:ipal step 
of the a.l~;orithm, a. new \]mir is constructed ;is 
described above, ~ restrictov is applied to i(;, 
a.nd the resulting, resl;ricted pair is +<-added 
to FIRST. 'Phe a.lgorithm is a.s follows: 
\]. \[nitia, iise t"i'r,sl,. ~ {}. 
2. l~,un through a J1 the da,ughgers in Lhe 
gramma, r. If X is pre-t;erlninal: then 
fci,~.,~t :- Fi,~.,~t I< (X,X)N, (whore 
(X,X)!q> meaus a.pply the nega.tiw: re.- 
,%ri(:tor (P (x) l~he. ira, it (X, X)). 
3. For each rule in the grammar with mother 
877 
S 
S 
VP\[agr: X, slash: Y\] 
NP\[agr: X, slash: NULL\] 
NP\[slash: NP\] 
=> NP\[agr: X, slash: NULI,\] VP\[agr: X, slash: NULL\] 
NP\[slash: NULL\] NP\[agr: X, slash: NULL\] VP\[agr: X, slash: NP\] 
=> Vtra\[agr: X, ter: +\] Ne\[slash: Y\] 
-~ Det\[ter: +\] N\[ag,': X, ter: ÷.\] 
=>-e 
Figure 1: Example grammar with value sharing. 
X, apply steps 4 and 5 until no more 
changes are made to First. 
4. If the rule is X -+ e, then First = 
First +< (X, e)!e~. 
5. if the rule is X -+ V,..Y~..Yk, then First = 
First +~ (X', a)l(I ). if' ~(Y'i, a) has success- 
flflly unified with an eleinent of First, and 
(~,, e, )... (~%, ei_~) have all successfully 
and simultaneously unified with members 
of First. Also, First = First+< (X', e)\[(l~ 
if (Y(, e~)...(Y\[, e~) haw ~. all suc(:essfully 
and simultaneously unified with elements 
of lvir',vt. 
6. Now, for any string of categories Xl 
..X~..X,~, First = First +< (X',...X\[,,, a)!(I) 
if (X~, a) has sueeessflflly unified with an 
element of First, and a f e. Also, for 
i -- 2...n, Fir.st = t"ir'st+< (X~...X n, a)!q) 
if (X',a) has suceessfiflly unified with 
an eMnent of First, a ~ e, and 
(X~, e, )... (X~_l, ci-1) have all sueeessfidly 
and simultaneously unified with members 
of First. Finally, First = First +< 
(Xf...X,'~, ¢)!(I' if (X',,e,)...(X~, %) have 
all suecessflflly and simultaneously unified 
with members of First. (This step may be 
eomtmted on demand). 
()no observation on this algorithm is in order. 
Tim last; action of steps 5 and 6 adds e as a 
l)ossible wfiue of FII{ST for a mother category 
or a. string of categories; such a wflue results 
when all daughters or categories have e as their 
FII2.ST value. Since most grammatical descrip- 
tions assign a category to e (e.g. to bind onto it 
information necessary for correct gap thread- 
ing), the. pairs (X',{-)or (X\[...X.\[~, ¢) should 
have bindings between their two elements; this 
creates the problem of deciding which of the 
cs in the FIRST pairs to use, since it; is possi- 
t)le in principle that each of these will have 
a difl:erent value for (. In our irnplementa- 
tion, the pair added to First in these situa- 
tions consists of' the mother category or the 
string of categories and the most general cate- 
gory for e as defined by the grammar, thus et- 
fectively ignoring any bindings that e may have 
within the constructed pair. A more accurate 
solution would have been to compute multiple 
pairs with c, construct their least upper bound, 
and then add this to First. However, in our 
implementation this solution has not t)roven 
necessary. 
4.2 EXAMPLE 
Assuming the grammm: in Fig. 1 and the neg- 
ative restrietor (\]) = {slash}, the following is a 
simplified run through the algorithm: 
• ~',:r.~t = {} 
• After processing a.ll in'e-terminal categories 
Fir,st - {(Dot, Dct), (N, N), (Vt'ra, Vtra)} 
(obvious bindings not shown). 
* After I;he first iteration First = {(Det, De.t), 
(N, N),(W, ra, Vt,',*),(VP{,,,r," : X\], W','~+,,:," : 
xJ), (NIP.. Wt), (NI5 ~)} 
* Since ~slash' is in (.1}, any of the NPs in the 
grammar will unit) with the lhs of (NP, e) and 
hence S will have Vtra as part of its FIRST 
w~lue..rir,t = {..,(V l'\[o,,,' : X\], Wra\[a,," : X\]), 
(NP, Det),(NP, e), (S, De, t), (S, Vtra)} 
• The next iteration adds nothing and the first 
stage of the algorithm terntinates. 
The second stage (step 6) is done on demand, 
for example to eomtmte state transitions for 
a parsing table, in order to avoid the expense 
of colntmting FIRST for all possible substrings 
of categories. For instance, to compute FIlq, ST 
for the string \[NP NP VP\] the algorithm wo,'ks 
as follows: 
878 
• {.., (v : x\]), 
(N 
• After considering the firsl, NP: t,'i'rsl, -: 
{.., (fro' NIP Vr\], D~:t)}. 
• (3onsi(l('a:~fl;ion of 1;11(,. sc(:ond NI ) in I;h('~ input 
sl,ring rcsul(;s in no (:ha.uges to t,'iv.,d., given the s(> 
manl;i(:s of-} <, sin(:e tim pair l;lutl; il; wouhl have 
,, Sin(-(; Nf's can rcwri(;(~ as < (i.(', (NI',() 
is in \[;"i','>;l,), l'"i'v,'~l, :: {.., (\[N\[' .N\]' V\["\], I)e.f), 
(\[Nr m' vr\], 
• Finally, (\[NI' NP VI'\], c) ma,y not t)e added since 
(VI', () does nol; unif~y wil;h ;my clemett(, of Fir'sl,. 
5 IMPROVING THE SEARCH 
THI/,OUCdt FiTst 
1\[ (;he a.lgoril;hm is r,m a,s t)r(;s(;nt(~(l, ea.(:h il:- 
eration I;hrough l;ha gramunar rules lm(:()mes 
slow(;r a, nd sl(-)w(;I'. The r('.a,son is (;\]l~:tl;> iH sl;e\[) 
5, when st!a,r(:hing l"i'rst to cr(:at(, a new Imir 
(X', o,), every 1)a, ir in l,'i'rsl, ix cousi(h;red and 
unilical, ion of its lhs with the relevanL daughter 
of ,V ~(;teml)l,('xl. Sin(:(; en(:lL i(,(~raLion n()rmMly 
adds pah's to Fi','st ca.oh i(;(,r~t;ion involves a 
s(mr'(:h I;hrough a larger ~t\[l(l larger s('.(;; fm- 
(;hertm)re, (;his search involves utfilic~rt;ion, a.nd 
in the case of a. su(:(:(;ssful match, tit(; subse- 
quent; (:(instruction and a(l(tition to Fi'rst also 
r('quir(:s sul)sumption che(:ks. All ()f t;hese Ol)- 
erations (:olnbine (;o make ea.t:h a(hlitit)nal ele- 
m(;nl; in 1/i'rsl, lu~ve ~v strong effect, on the per-- 
\[brma,nce o\[ (;he Mgorithm. \Ve (;h(~rel'ore ne(,.d 
(;o mmilnize (;he number of pairs searched. 
C(msi(h,ring the d(:t)('nd(,nci(!s that exist t)(> 
Lwee, u pairs in Fivsl, one nol;iccs (;lust; once a 
pair has been consi(ter(M in rela(:ion wit;h a, ll 
I;hc rlllcs in the gralnnlaa', I;he efl'(~cl; of thai, 
l);rir has |)eeu COml)h;l;(;ly dctermin('.(\[. Thai; is, 
a.ft(;r a, pair is added to Fi'r>d, i( n(>.d only I>(, 
(:onsidcr(:d u I) (;o a.nd int;luding (he rule frOIll 
which it was d(wivo.d, aft;ev which time it; may 
lm excluded from fl~rtho.r se;trches. For exa.m~ 
t>le, ta.ke th(: previous gra.IllIiii-lr, a.lld ilt pa.rtic- 
ul;n' (:h('~ va.hw of l/'irsl, a\['l;o.r 1;\]l('~ first i l;ei'~t;ion 
through th(: algorithln. '\['he lmir (NI), l)c,t), 
a,dded Iwca, use of l;hc rifle NP\[~gr: X, slash: 
NUI3~\] -~ DeL\[tm: +\] N\[a.gr: .X, Ix:r: +\], ha.s I;() Im 
(:onsi(lered only once by every ruh', in the gram- 
max; M't, er thai;, this I)a.ir cmmot hc involved in 
l;he ('.onsl,ru(:tion of new values. 
A siml)le data. stru('.ture which keel)s I;rack 
of thos(! pairs (;hat; n(;ed to be sear(:hcd a.(, any 
one tim(; was added 1;o the Mgoril,hm; the (la.ta 
S(;I'tlC(;III'(~ l,ook l;hc l'()l'l\[l o\[' ~ list of l)oin(;ers I;o 
a.cl:ive pa.irs it) l,"i'rst, whel:(~ m, a.('.l;ivc pair is 
one which has t~o(: linen (:()nsid(,red l)y the rule 
from which il; was c.(mst;ru(:t(M. For exa.ml>l( h 
the pair (NP, I)(t) would 1)('. a.('.l;ive for a com- 
l)le(x~ it(,ral;ion l'vom the moment tim(> the cot-- 
responding rule in(,roduc.ed iL until that rule is 
visited again (hiring (;he second it(u'~ti(m. The 
effe(:t of this policy is (;o allow eaclt pair in 
l;'i'rsl, to be (;este(l against each )'ul(~ exa(:l:ly 
OlI(:(: a, ll(\[ (;hell \])e ex(:lu(led \['rolil slll)st~(lllell(; 
s(:ar('h(:s; this g)'ea.(;ly r(~(lu(:(!s th(: mtml)er (>I' 
pairs considered for ca,dr il;era(,ion. 
Usin/~; th(' 't'yt>e<l l"(,a.l;ure St;ru<:l;m'e sysl:(!m 
(tit<,. LKI\]) of (lh.is<:<m el, al., 1993), we wrole 
two gr~mmmrs and (;esl;(;d l;h(: algoril;hm on 
l;h(~ttl. 'l"a.ble 1 shows the average llllllIllCI' Of' 
pairs c(mside)'(~d for cad1 i(;(:rat;ion (-ompa.r(:(l 
I;o (;h(~ a.vera.ge mtmber of pairs in l,'i'rst. 
\[ 13 Ihfl! Grammar I 21 Rule (,rmnmar \] (,'on~\]d,~,\]~d 
%(:~,1 (,,,.' s,)h,~,(~,\] \[ :coL~L 
1 \]{)271 - i).7 \[ u :7 
- \]2E) .o \] _ A;qLup o 
Tabl(; f: Average mmflmr of 1)a.irs per iteral;ion. 
As we ca.n see., ah;er the first iteral;ion Lhe 
mmflw, r of lmirs I;h~rt needs to be considered 
is lnss (lnltch h.',ss t()i Lhc final iteration) thau 
t, he l;oLal mlml)er o\[ pfirs in I<'i'rsl,. Similar im- 
I)rOV(mWnl;s in per\['ormance were obga.ined for 
the (:Oml)Ul;ation of F()I,IX)W. 
6 \[~iEI,ATED R,ESEARCH 
~l'he exl,eusion to the 1,1{ al~gorit, hn~ presenlx~d 
by (Na,ka,za,wa, 1991) uses a, similar a4)proa,ch 
t;o LhaL dea',rilmd here; the \['undsions inw)lwM 
hOWCVtW ,~LI'(! I;h()s(~ \[l(~CCSS/,/,l'y \['()1' \[;h(; /;OllSLI'llC- 
(;ion of an /J{ pa.rsing t;al)le (i.e. the GOTO 
and A(YI'I()N funcl;i(ms). One technica.l (lit t 
879 
ference between the two approaches is that he 
uses positive restrictors, (Shieber, 1985), in- 
stead of negative ones. In addition, both of 
his algorithms also differ in another way from 
the algorithm described here. The difference 
is that they add items to a set using simple 
set addition whereas in the algorithm of Sec- 
tion 4.1 we add elements using the operator 
+<. Furthermore, when computing the clo- 
sure of a set of items, both of the algorithms 
there ignore the effect that unification has on 
the categories in the rules. 
For example, the states of an LR parser are 
computed using the closure operation on a set 
I of dotted rules or items. In Nakazawa's al- 
gorithms computation of this closure proceeds 
as follows: if dotted rule < A ~ "w.Bz > is 
in I, then add a dotted rule < C -+ .y > to 
the closure of I, where C and B unify. This 
ignores the fact that both dotted rules may be 
modified after unifcation, and therefore, his 
algorithm leads to less restricted \[ values than 
those implicit in the grammar. To adapt our 
algorithm to the computation of the closure 
of I for a feature-theoretic grammar would in- 
volve using a set of pairs of dotted rules as the 
value of I. 
7 CONCLUSION 
We have extended an algorithm that manip- 
ulates CF grammars to allow it to handle 
feature-theoretic ones. It was shown how most 
of the information contained in the grammar 
rules may be preserved by using a set of pairs as 
the value of a function and by using the notion 
of subsumption to update this set. Although 
the algorithm has in fact been used to adapt 
the constraint propagation algorithm of (Brew, 
1992) to phrase structure grammars, the ba- 
sic idea should be applicable to the rest of the 
flmctions needed for constructing LR tables. 
However, such adaptations are left; as a topic 
for future research. 
Finally, improvements in speed obtained 
with the active pairs mechanism of Section 5 
are of an order of magnitude in an implemen- 
tation using Common Lisp. 
ACKNOWLEDGEMENTS 
This work was funded by the UK SERC. I 
am very grateflfl to Ted Briscoe, John Carroll, 
Mark-Jan Nederhof, Ann Copestake and two 
anonymous reviewers. All remaining errors are 
mine. 
References 
Aho, A. V., R. Sethi, and J. D. Ullman , (1986). 
Compilers - Principles, Techniques, and Tools. 
Addison-Wesley Publishing Company, Read- 
ing, MA. 
Brew, C., (1992). Letting the cat; out of the 
bag: Generation for Shake-and-Bake MT. In 
COLING-92, pages 610-616, Nantes, l~-ance. 
Briscoe, E. and J. Carroll, (1993). Generalised 
Probabilistic LR Parsing of Natural Language 
(Corpora) with Unification-Based Grammars. 
Computational Linguistics , 19(1):25 60. 
Briscoe, E., A. Copestake and V. de Paiva (eds). 
(1993). Inheritance, D@ults and the Lexicon. 
Cambridge University Press, Cambridge, UK. 
Harrison, S. P. and T. M. Ellison, (1992). Re- 
striction and \[\[L'rmination in Parsing with 
Feature-Theoretic Grmmnars. Computational 
Linguistics, 18 (4) :519-530. 
Nakazawa, T., (1.991). An Extended LR Parsing 
Algorithm for Grammars using Feature-Based 
Syntactic Categories. In Proceedings European 
ACL 91, pages 69--74, Berlin, Germany. 
Nederhof, M., (1993). Generalized Left-Corner 
Parsing. In Proceedinos European A CL 93, 
pages 305 -314, Utrecht, The Netherlands. 
Pollard, C. and I. Sag, (1992). 
Phrase Structure Grammar. 
sity Press, IL. 
Head Driven 
Chicago Univer- 
Shieber, S. M., (1985). Using Restriction to Ex- 
tend Parsing Algorithms for Complex-Feature- 
Based Formalisms. In Proceedings ACL 85, 
pages 145-152, Chicago, IL. 
880 
