NOTES ON LR PARSER DESIGN 
Christer Samuelsson 
Swedish Institute of Computer Science, 
Box 1263, S-164 28 Kista, Sweden. E-mail: christer@sics.se
1 INTRODUCTION 
This paper discusses the design of an LR parser for
a specific high-coverage English grammar. The design
principles, though, are applicable to a large class
of unification-based grammars where the constraints
are realized as Prolog terms and applied monotonically
through instantiation, where there is no right movement,
and where left movement is handled by gap
threading.
The LR parser was constructed for experiments on
probabilistic parsing and speedup learning, see [10]. LR
parsers are suitable for probabilistic parsing since they
contain a representation of the current parsing state,
namely the stack and the input string, and since the
actions of the parsing tables are easily attributed
probabilities conditional on this parsing state. LR parsers
are suitable for the speedup-learning application since
the learned grammar is much larger than the original
grammar, and the prefixes of the learned rules overlap
to a very high degree, circumstances that are far
from ideal for the system's original parser. Even though
these ends influenced the design of the parser, this
article does not focus on these applications but rather on
the design and testing of the parser itself.
2 LR PARSING 
An LR parser is a type of shift-reduce parser originally
devised by Knuth for programming languages [4]. The
success of LR parsing lies in handling a number of grammar
rules simultaneously, rather than attempting one
at a time, by the use of prefix merging. LR parsing in
general is well described in [1], and its application to
natural-language processing in [12].
An LR parser is basically a pushdown automaton, 
i.e. it has a pushdown stack in addition to a finite set 
of internal states, and a reader head for scanning the 
input string from left to right, one symbol at a time. In 
fact, the "L" in "LR" stands for left-to-right scanning
of the input string. The "R" stands for constructing
the rightmost derivation in reverse.
The stack is used in a characteristic way: The items 
on the stack consist of alternating grammar symbols 
and states. The current state is the state on top of the 
stack. The most distinguishing feature of an LR parser
is, however, the form of the transition relation: the
action and goto tables. A non-deterministic LR parser
can in each step perform one of four basic actions. In
state S with lookahead symbol Sym it can:
1. accept(S,Sym): Halt and signal success.
2. shift(S,Sym,S2): Consume the symbol Sym,
place it on the stack, and transit to state S2.
3. reduce(S,Sym,R): Pop off a number of items from
the stack corresponding to the RHS of grammar
rule R, inspect the stack for the old state S1, place
the LHS of rule R on the stack, and transit to state
S2 determined by goto(S1,LHS,S2).
4. error(S,Sym): Fail and backtrack.
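The four actions can be illustrated with a small depth-first, backtracking driver in Python. This is a sketch only: the action and goto tables are hand-built SLR tables for a hypothetical three-rule grammar (S → NP VP, VP → V, NP → Pron), not tables compiled from Fig. 1, and "$" is an assumed end-of-input marker. Each action entry holds a list of alternatives, so a missing entry plays the role of error(S,Sym) and causes backtracking.

```python
# Hypothetical grammar: rule number -> (LHS, length of RHS)
#   (1) S -> NP VP   (2) VP -> V   (3) NP -> Pron
RULES = {1: ("S", 2), 2: ("VP", 1), 3: ("NP", 1)}

# ACTION maps (state, lookahead) to a list of alternatives; no entry = error.
ACTION = {
    (0, "Pron"): [("shift", 3)],
    (3, "V"): [("reduce", 3)],
    (2, "V"): [("shift", 5)],
    (5, "$"): [("reduce", 2)],
    (4, "$"): [("reduce", 1)],
    (1, "$"): [("accept", None)],
}
GOTO = {(0, "S"): 1, (0, "NP"): 2, (2, "VP"): 4}


def parse(stack, words, trace):
    """Return the rules reduced (the rightmost derivation in reverse) or None.

    The stack alternates states and grammar symbols; its top is the current state."""
    state = stack[-1]
    sym = words[0] if words else "$"
    for act, arg in ACTION.get((state, sym), []):
        if act == "accept":
            return trace
        if act == "shift":
            result = parse(stack + [sym, arg], words[1:], trace)
        else:
            lhs, n = RULES[arg]
            rest = stack[: len(stack) - 2 * n]  # pop n symbol/state pairs
            old = rest[-1]                      # the uncovered old state S1
            result = parse(rest + [lhs, GOTO[(old, lhs)]], words, trace + [arg])
        if result is not None:
            return result
    return None  # every alternative failed: backtrack


print(parse([0], ["Pron", "V"], []))  # [3, 2, 1]
```

The result lists the reductions NP → Pron, VP → V and S → NP VP, i.e. the rightmost derivation S ⇒ NP VP ⇒ NP V ⇒ Pron V read backwards.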
Prefix merging is accomplished by each internal
state corresponding to a set of partially processed grammar
rules, so-called "dotted items" containing a dot (.)
to mark the current position. Since the grammar of
Fig. 1 contains Rules 2, 3, and 4, there will be a state
containing the dotted items
VP → V .
VP → V . NP
VP → V . NP NP
This state corresponds to just having found a verb (V).
Which of the three rules to apply in the end will be
determined by the rest of the input string; at this point
no commitment has been made to any of them.
Compiling LR parsing tables consists of constructing
the internal states (i.e. sets of dotted items) and
from these deriving the shift, reduce, accept and goto
entries of the transition relation. New states can be
induced from previous ones; given a state S1, another
state S2 reachable from it by goto(S1,Sym,S2) (or
shift(S1,Sym,S2) if Sym is a terminal symbol) can be
constructed as follows:
1. Select all items in state S1 where a particular symbol
Sym follows immediately after the dot and move
the dot to after this symbol. This yields the kernel
items of state S2.
2. Construct the non-kernel closure by repeatedly
adding a so-called non-kernel item (with the dot
at the beginning of the RHS) for each grammar
rule whose LHS matches a symbol following the
dot of some item in S2.
Consider for example the grammar of Fig. 1, which will
generate the states of Fig. 2. State 1 can be constructed
from State 0 by advancing the dot in S → . NP VP and
NP → . NP PP to form the items S → NP . VP and
NP → NP . PP, which constitute the kernel of State 1.
The non-kernel items are generated by the grammar
rules for VPs and PPs, the categories following the dot
in the new items, namely Rules 2, 3, 4, 5 and 9.

S → NP VP (1)
VP → V (2)
VP → V NP (3)
VP → V NP NP (4)
VP → VP PP (5)
NP → Det N (6)
NP → Pron (7)
NP → NP PP (8)
PP → Prep NP (9)
Figure 1: A toy grammar
Using this method, the set of all parsing states can
be induced from an initial state whose single kernel item
has the top symbol of the grammar preceded by the
dot as its RHS (the item S' → . S of State 0 in Fig. 2).
The accept, shift and goto entries fall out automatically
from this procedure. Any dotted item where the dot
is at the end of the RHS gives rise to a reduction by
the corresponding grammar rule. Thus it remains to
determine the lookahead symbols of the reduce entries.
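The two construction steps and the induction from State 0 can be sketched in Python for the grammar of Fig. 1. The representation is an assumption made for illustration: a dotted item is a pair (rule number, dot position), with rule 0 standing for S' → S. States are numbered in the order they are discovered, which need not coincide with the numbering of Fig. 2, but the collection has the same fourteen states.

```python
from collections import deque

# Fig. 1 grammar, preceded by the augmenting rule 0: S' -> S.
RULES = [
    ("S'", ("S",)),             # 0
    ("S", ("NP", "VP")),        # 1
    ("VP", ("V",)),             # 2
    ("VP", ("V", "NP")),        # 3
    ("VP", ("V", "NP", "NP")),  # 4
    ("VP", ("VP", "PP")),       # 5
    ("NP", ("Det", "N")),       # 6
    ("NP", ("Pron",)),          # 7
    ("NP", ("NP", "PP")),       # 8
    ("PP", ("Prep", "NP")),     # 9
]


def next_symbol(item):
    """Symbol immediately after the dot of (rule, dot), or None at the end."""
    rule, dot = item
    rhs = RULES[rule][1]
    return rhs[dot] if dot < len(rhs) else None


def closure(kernel):
    """Step 2: add non-kernel items (dot first) until nothing new appears."""
    items, agenda = set(kernel), list(kernel)
    while agenda:
        sym = next_symbol(agenda.pop())
        for r, (lhs, _) in enumerate(RULES):
            if lhs == sym and (r, 0) not in items:
                items.add((r, 0))
                agenda.append((r, 0))
    return frozenset(items)


def goto(state, sym):
    """Step 1: move the dot over sym wherever sym follows the dot, then close."""
    return closure({(r, d + 1) for r, d in state if next_symbol((r, d)) == sym})


def collection():
    """Induce all states from State 0 = closure({S' -> . S})."""
    start = closure({(0, 0)})
    states, index, trans = [start], {start: 0}, {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for sym in sorted({next_symbol(i) for i in state} - {None}):
            target = goto(state, sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
                queue.append(target)
            trans[(index[state], sym)] = index[target]  # shift or goto entry
    return states, trans


states, trans = collection()
# Items with the dot at the end give reduce entries; rule 0 (S' -> S .) is accept.
reductions = {n: sorted(r for r, d in s if next_symbol((r, d)) is None)
              for n, s in enumerate(states)}

print(len(states))  # 14
s1 = trans[(0, "NP")]
s6 = trans[(s1, "V")]
print(sorted(i for i in states[s6] if i[1] == 1))  # [(2, 1), (3, 1), (4, 1)]
```

The last line shows prefix merging at work: the state reached by a verb after State 1 contains the kernel items of Rules 2, 3 and 4 together.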
In Simple LR (SLR) the lookahead is any terminal
symbol that can immediately follow any symbol of the
same type as the LHS of the rule. In LookAhead LR
(LALR) it is any terminal symbol that can immediately
follow the LHS given that it was constructed using this
rule in this state. In general, LALR gives considerably
fewer reduce entries than SLR, and thus results in faster
parsing. In the experiments this reduced the parsing
times by 30 %.
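The SLR lookaheads are the FOLLOW sets of the rule LHSs. A short Python sketch for the grammar of Fig. 1 follows; it assumes "$" as end-of-input marker and relies on the toy grammar having no empty productions, so FIRST of a right-hand side is FIRST of its first symbol. It is a textbook fixpoint computation, not the compiler described in Section 5.

```python
# Fig. 1 grammar with augmenting rule S' -> S; "$" is an assumed end marker.
RULES = [
    ("S'", ("S",)), ("S", ("NP", "VP")),
    ("VP", ("V",)), ("VP", ("V", "NP")), ("VP", ("V", "NP", "NP")),
    ("VP", ("VP", "PP")), ("NP", ("Det", "N")), ("NP", ("Pron",)),
    ("NP", ("NP", "PP")), ("PP", ("Prep", "NP")),
]
NONTERMINALS = {lhs for lhs, _ in RULES}


def first_of(sym, first):
    """FIRST of one symbol: a terminal is its own FIRST set."""
    return first[sym] if sym in NONTERMINALS else {sym}


def first_sets():
    """FIRST sets; without empty productions only the first RHS symbol matters."""
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            new = first_of(rhs[0], first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """FOLLOW sets, i.e. the SLR lookaheads of reductions by rules with that LHS."""
    first = first_sets()
    follow = {a: set() for a in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in RULES:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # FIRST of the next symbol, or FOLLOW(lhs) at the end of the RHS.
                new = first_of(rhs[i + 1], first) if i + 1 < len(rhs) else follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


follow = follow_sets()
print(sorted(follow["VP"]))  # ['$', 'Prep']
```

Under SLR, for instance, the reduction by NP → Pron is entered for all five symbols in FOLLOW(NP), whatever state the item occurs in; LALR can only narrow such sets.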
3 PROBLEMS WITH LR PARSING 
The problems of applying the LR-parsing scheme to
large unification grammars for natural language, rather
than small context-free grammars for programming
languages, stem from three sources. The first is that symbol
matching no longer consists of checking atomic symbols
for equality, but rather of comparing complex feature
structures. The second is the high level of ambiguity
of natural language and the resulting non-determinism.
The third is the sheer size of the grammars.
Resorting straightforwardly to a context-free backbone
grammar and subsequently filtering using the full
constraints of the underlying unification grammar (UG)
is an approach taken by, for example, [3]. The problem
with this approach is that the predictive power of the
unification grammar is vastly diluted when feature
propagation is omitted. Firstly, the context-free backbone
grammar will in general allow very many more
analyses than the unification grammar, leading to poor
parser performance. Secondly, the feature propagation
necessary for gap threading to prevent non-termination
due to empty productions is obstructed.
On the other hand, the treatment of the full UG
constraints in the parsing-table construction phase is
associated with a number of problems, most of which
are discussed in [5]. One of the main questions is that
of equality or similarity between linguistic objects.

State 0
  S' → . S
  S → . NP VP
  NP → . Det N
  NP → . Pron
  NP → . NP PP
State 1
  S → NP . VP
  NP → NP . PP
  VP → . V
  VP → . V NP
  VP → . V NP NP
  VP → . VP PP
  PP → . Prep NP
State 2
  NP → Det . N
State 3
  NP → Pron .
State 4
  S' → S .
State 5
  S → NP VP .
  VP → VP . PP
  PP → . Prep NP
State 6
  VP → V .
  VP → V . NP
  VP → V . NP NP
  NP → . Det N
  NP → . Pron
  NP → . NP PP
State 7
  NP → NP PP .
State 8
  PP → Prep . NP
  NP → . Det N
  NP → . Pron
  NP → . NP PP
State 9
  VP → VP PP .
State 10
  NP → Det N .
State 11
  VP → V NP .
  VP → V NP . NP
  NP → NP . PP
  NP → . Det N
  NP → . Pron
  NP → . NP PP
  PP → . Prep NP
State 12
  VP → V NP NP .
  NP → NP . PP
  PP → . Prep NP
State 13
  PP → Prep NP .
  NP → NP . PP
  PP → . Prep NP
Figure 2: The internal states of the toy grammar
Consider constructing the non-kernel items using
UG phrases following the dot in items already in the
set for prediction. If such a phrase unifies with the
LHS of a grammar rule and we add the new item with
this instantiation, we need a mechanism to ensure
termination, since the risk is that we add more and more
instantiated versions of the same item indefinitely. One
might object that this is easily remedied by only adding
items that are not subsumed by any previous ones.
Unfortunately, this does not work, since it is quite possible
to generate an infinite sequence of items none of which
subsumes another, see [9]. This problem can be solved
by using so-called "restrictors" to block out the feature
propagation leading to non-termination, see [11], but
still the number of items that are slight variants of one
another may be quite large. In her paper [5], Nakazawa
proposes a simple and elegant solution to this problem:
"While the CLOSURE procedure makes top-down
predictions in the same way as before [using the
full constraints of the unification grammar], new
items are added without instantiation. Since
only original productions in a grammar appear as
items, productions are added as new items only
once and the nontermination problem does not
occur, as is the case of the LR parsing algorithm
with atomic categories."
Unfortunately, even with this simplification, computing
the non-kernel closure is quite time-consuming for large
unification grammars.
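Why restriction restores termination can be shown with a deliberately simplified Python sketch. Shieber's restrictors [11] select the feature paths that may be used for prediction; as an assumption made here for brevity, the sketch instead bounds the depth of terms, which are modelled as nested tuples (functor, arg, ...). The ground terms f(a), f(f(a)), ... form an infinite sequence in which no term subsumes another, but after restriction only finitely many distinct items remain.

```python
VAR = "_"  # stands for an anonymous, uninstantiated variable


def restrict(term, depth):
    """Keep the top `depth` levels of a term; replace what lies below by a variable."""
    if depth == 0:
        return VAR
    if isinstance(term, tuple):  # compound term: (functor, arg1, ...)
        functor, *args = term
        return (functor, *(restrict(a, depth - 1) for a in args))
    return term  # atom


def nest(n):
    """The term f(f(...f(a)...)) with n occurrences of f."""
    term = "a"
    for _ in range(n):
        term = ("f", term)
    return term


unrestricted = {nest(n) for n in range(1, 11)}
restricted = {restrict(nest(n), 2) for n in range(1, 11)}
print(len(unrestricted), len(restricted))  # 10 2
```

Nakazawa's solution corresponds to the extreme case where no instantiation at all is kept in the added items.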
Empty productions are a type of grammar rule that
constitutes a notorious problem for parser developers.
The LHSs of these grammar rules have no realization
in the input string since their RHSs are empty. They
are used to model movement as in the sentence What_i
does John seek e_i?, which is viewed as a transformation
of John seeks what?. This is an example of left
movement, since the word "what" has been moved to the
left. Examples of right movement are rare in English,
but frequent in other languages, the prime example
being German subordinate clauses.
The particular unification grammar used keeps 
track of moved phrases by employing gap threading, 
i.e. by passing around a list of moved phrases to ensure 
that an empty production is only applicable if there 
is a moved phrase elsewhere in the sentence to license 
its use, see [6] pp. 125-129. As LR parsing is a parsing
strategy employing bottom-up rule prediction, it is
necessary to limit the applicability of these empty
productions by the use of top-down filtering.
4 PARSER DESIGN 
The parser was implemented and tested in SICStus Prolog
using a version of the SRI Core Language Engine
(CLE) [2] adapted to the air-travel information-service
(ATIS) domain for a spoken-language translation task
[8]. The CLE ordinarily employs a shift-reduce parser
where each rule is tried in turn, although filtering
using precompiled parsing tables makes it acceptably fast.
The ATIS domain is a common ARPA testbench, and
the CLE performance on it is comparable to that of
other systems.
In fact, two slightly different versions of the parser
were constructed, one for the original grammar, employing
a mechanism for gap handling, as described in
Section 4.2, and one for the learned grammar, where
no such mechanism is needed, since this grammar lacks
empty productions. Experiments were carried out over
corpora of 100-200 test sentences, using SLR parsing
tables, to measure the impact on parser performance of
the various modifications described below.
A depth-first, backtracking LR parser was used where
the parsing is split into three phases:
1. Phase one is the LR parsing phase. The grammar
used here is the generalized unification grammar
described in Section 4.1 below. The output is a
parse tree indicating how the rules were applied to
the input word string and what constraints were
associated with each word.
2. Phase two applies the full constraints of the
syntactic rules of the unification grammar and lexicon
to the output parse tree of phase one.
3. Phase three applies the constraints of the
compositional semantic rules of the grammar.
For the learned grammar, phases two and three coincide,
since the learned rules include compositional
semantic constraints. Each rule referred to in the output
parse tree of phase one may be a generalization over
several different rules of the unification grammar.
Likewise, the constraints associated with each word can be
a generalization over several distinct lexicon entries. In
phase two, these different ways of applying the full
constraints of the syntactic rules and the lexicon (and, with
the learned grammar, also the compositional semantic
constraints) are attempted non-deterministically.
The lookahead symbols, on the other hand, are
ground Prolog terms. Firstly, this means that they
can be computed efficiently in the LALR case. Secondly,
this avoids trivial reduction ambiguities where a
particular reduction is performed once for each possible
mapping of the next word to a lookahead symbol.
This is done by producing the set of all possible lookahead
symbols for the next word at once, rather than
producing one at a time non-deterministically. Each
reduction is associated with another set of lookahead
symbols. The intersection is taken, and the result is
passed on to the next parsing cycle.
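The set-at-once treatment of lookaheads can be sketched in a few lines of Python. The lexicon and the symbol names (n, v_tran, v_intran) are hypothetical, and symbols are plain strings standing in for ground Prolog terms. Because the ambiguous word "book" yields one set rather than two alternatives, the reduction below is attempted once instead of once per reading.

```python
# Hypothetical lexicon: word -> lookahead symbols of all its analyses.
LEXICON = {
    "book": {"n", "v_tran"},
    "flights": {"n"},
}


def lookaheads(word):
    """Produce every lookahead symbol for the next word at once, as one set."""
    return frozenset(LEXICON.get(word, ()))


def reduce_lookaheads(reduce_set, current):
    """Intersect the set attached to a reduce entry with the current set.

    The survivors are passed on to the next parsing cycle; an empty
    intersection means the reduction is not applicable (None)."""
    survivors = current & reduce_set
    return survivors or None


current = lookaheads("book")
print(sorted(reduce_lookaheads(frozenset({"v_tran", "v_intran"}), current)))  # ['v_tran']
```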
Prefix merging means that rules starting with similar
phrases are processed together until they branch
away. The problem with this in conjunction with a
unification grammar is that it is not clear what "similar
phrase" means. The choice made here is to regard
phrases that map to the same CF symbol as similar:
Definition: Two phrases are similar if they
map to the same context-free symbol.
Since the processing is performed by applying
constraints incrementally and monotonically, where
constraints are realized as Prolog terms and these are
instantiated stepwise, it is important that a UG phrase
map to the same CF symbol regardless of its degree of
instantiation for this definition to be useful. The mapping
of UG phrases to CF symbols used in the experiments
was the naive one, where UG phrases mapped to
their syntactic categories (i.e. Prolog terms mapped to
their functors), save that verbs with different complements
(intransitive, transitive, etc.) were distinguished.
4.1 Generalization
The grammar used in phase one is neither a context-free
backbone grammar nor the original unification grammar.
Instead a generalized unification grammar is employed.
This generalization is accomplished using
anti-unification. This is the dual of unification (it
constructs the least general term that subsumes two given
terms) and was first described in [7]. This operation is
often referred to as generalization in the
computational-linguistics literature. If T is the anti-unification of T1
and T2, then T subsumes T1 and T subsumes T2, and
if any other term T' subsumes both T1 and T2, then
T' subsumes T. Anti-unification is a built-in predicate
of SICStus Prolog and quite acceptably fast.
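For readers without SICStus at hand, a minimal Python sketch of Plotkin-style anti-unification follows. The term representation is an assumption: atoms are strings, compound terms are tuples (functor, arg, ...), and variables are Var objects. The essential detail is that every recurrence of the same pair of clashing subterms is generalized to the same variable; that is what makes the result least general.

```python
import itertools
from functools import reduce


class Var:
    """A logic variable; two Vars denote the same variable only if identical."""
    _ids = itertools.count()

    def __init__(self):
        self.name = f"_G{next(Var._ids)}"

    def __repr__(self):
        return self.name


def anti_unify(t1, t2, pairs=None):
    """Least general generalization of two terms."""
    if pairs is None:
        pairs = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        # Same functor and arity: generalize argument by argument.
        return (t1[0],) + tuple(anti_unify(a, b, pairs)
                                for a, b in zip(t1[1:], t2[1:]))
    if (t1, t2) not in pairs:  # clash: one fresh variable per distinct pair
        pairs[(t1, t2)] = Var()
    return pairs[(t1, t2)]


g = anti_unify(("f", "a", "a"), ("f", "b", "b"))
print(g[1] is g[2])  # True: the result is f(X,X), not f(X,Y)

# The RHS verbs of three VP rules, with features written positionally as
# v(Agr, Sub), a simplification of v:[agr=Agr,sub=...].
verbs = [("v", Var(), "intran"), ("v", Var(), "tran"), ("v", Var(), "ditran")]
v = reduce(anti_unify, verbs)  # ('v', V1, V2) with two distinct fresh variables
```

Folding anti_unify over the verbs leaves both the agreement and the subcategorization arguments unspecified, analogous to the generalized verb symbol derived for the goto entry between States 1 and 6.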
For each context-free rule, a generalized UG rule is
constructed that is the generalization over all UG rules
that map to that context-free rule. If there is only
one such original UG rule, the full constraints of the
unification grammar are applied already in phase one.
Similarly, the symbols of the action and goto tables
are not context-free symbols. They are the generalizations
of all relevant similar UG phrases. For example,
each entry in the goto table will have as a symbol
the generalization of a set of UG phrases. These
UG phrases are those that map to the same context-free
symbol; occur in a UG rule that corresponds to
an item where this CF symbol immediately follows the
dot; and in such a UG rule occur at the position
immediately following the dot. For example, the symbol
of the goto (or shift) entry for verbs between State 1
and State 6 of Fig. 2 is the anti-unification of the RHS
verbs of the UG rules mapping to Rules 2, 3 and 4, e.g.
    vp:[agr=Agr] => [v:[agr=Agr,sub=intran]].
    vp:[agr=Agr] => [v:[agr=Agr,sub=tran],np:[agr=_]].
    vp:[agr=Agr] =>
        [v:[agr=Agr,sub=ditran],np:[agr=_],np:[agr=_]].
which is v:[agr=_,sub=_]. Here the value of the
subcategorization feature sub is left unspecified.
Lexical ambiguity in the input sentence is handled
in the same way. For each word, a generalized phrase is
constructed from all similar phrases it can be analyzed
as. Again, if there is no lexical ambiguity within the CF
symbol, the full UG constraints are applied. Nothing is
done about lexical ambiguities outside of the same CF
symbol, though.
In the experiments, using the UG constraints,
instead of their generalizations, for the LR-parsing phase
led to an increase in median normalized parsing time¹
from 3.1 to 3.8, i.e. by about 20 %. This was also
typically the case for the individual parsing times. In the
machine-learning experiments, where normally several
UG rules mapped to the same CF rule, this effect was
more marked; it led to an increase in parsing time by a
factor of five.
On the other hand, using truly context-free symbols
for LR parsing actually leads to non-termination due to
the empty productions. Even when banning empty
productions, the parsing times increase by orders of
magnitude; the vast majority (86 %) of the test sentences
were timed out after ten minutes and still the normalized
parsing time exceeded 100 in more than half (54 %)
of the cases. This should be compared with the
0.220 figure using generalized UG constraints. In the
machine-learning experiments, this led to an increase
in processing time by a factor of 100.
4.2 Gap handling 
A technique for limiting the applicability of empty
productions is employed in the version for the original
grammar. It is only correct for left movement. Since
there are no empty productions in the learned grammar,
there is no need for gap handling there.
The idea is that in order for an empty production to
be applicable, some grammar rule must have placed a
phrase corresponding to the moved one on the gap list.
Thus a gap list is maintained where phrases corresponding
to potential left movement are added whenever a
state is visited where there is a "gap-adding phrase"
immediately following the dot in any item. The elements
of the gap list are the corresponding CF symbols. At
this point the stack is "back-checked", as defined below,
to see if the gap-adding rule really is applicable.
¹ The parsing time for the LR parser divided by the parsing
time for the original parser.
Back-checking means matching the prefixes of the
kernel items against the stack in each state. The
rationale for this is twofold. Firstly, it captures constraints
on phrases previously obscured by grammar rules that
have now branched off. Secondly, it captures feature
agreement between phrases in prefixes of greater length
than one. In general this was not useful; it simply
resulted in a small overhead. In conjunction with gap
handling, however, it proved essential.
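Back-checking, and in particular its second purpose of checking agreement across a prefix, can be sketched in Python under some simplifying assumptions: phrases are pairs (category, flat feature dictionary); in an item prefix, capitalised strings are shared variables; a feature missing on the stack counts as unspecified; and the stack alternates states and phrases as described in Section 2. The example item is S → NP VP . of State 5 in Fig. 2, with a hypothetical agreement feature agr shared between NP and VP.

```python
def is_var(value):
    """Capitalised strings stand for variables shared across a prefix."""
    return isinstance(value, str) and value[:1].isupper()


def back_check(prefix, stack):
    """Match a kernel item's prefix (symbols before the dot) against the stack top.

    stack alternates states and phrases: [state, phrase, state, ..., state]."""
    phrases = stack[1::2]
    if len(prefix) > len(phrases):
        return False
    bindings = {}
    top = phrases[len(phrases) - len(prefix):]
    for (cat, feats), (scat, sfeats) in zip(prefix, top):
        if cat != scat:
            return False
        for feat, val in feats.items():
            actual = sfeats.get(feat)
            if actual is None:  # unspecified on the stack: compatible
                continue
            if is_var(val):  # a shared variable must get one value throughout
                val = bindings.setdefault(val, actual)
            if val != actual:
                return False
    return True


prefix = [("np", {"agr": "A"}), ("vp", {"agr": "A"})]
print(back_check(prefix, [0, ("np", {"agr": "sg"}), 1, ("vp", {"agr": "sg"}), 5]))  # True
print(back_check(prefix, [0, ("np", {"agr": "sg"}), 1, ("vp", {"agr": "pl"}), 5]))  # False
```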
The gap list is emptied after applying an empty
production. This is not correct if several phrases are moved
using the same gap list, or for conjunctions where the
gap threading is shared between the conjuncts. For the
former reason two different gap lists are employed,
one for (auxiliary) verbs and one for maximal projections
such as NPs, PPs, AdjPs and AdvPs.
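The bookkeeping of the two gap lists can be sketched as a small Python class. The CF symbol names and their split into verbal and maximal-projection symbols are hypothetical, and the sketch ignores the undoing of list updates on backtracking, which a Prolog implementation would obtain for free. The calls to add_gap stand for visiting a state with a gap-adding phrase after the dot, after back-checking has succeeded.

```python
class GapLists:
    """Two gap lists: one for (auxiliary) verbs, one for maximal projections."""

    VERBAL = frozenset({"v", "aux"})  # hypothetical CF symbols

    def __init__(self):
        self.lists = {"verb": [], "maxproj": []}

    def _kind(self, symbol):
        return "verb" if symbol in self.VERBAL else "maxproj"

    def add_gap(self, symbol):
        """Record the CF symbol of a potentially left-moved phrase."""
        self.lists[self._kind(symbol)].append(symbol)

    def apply_empty(self, symbol):
        """An empty production for symbol is applicable only if a matching
        moved phrase is on the relevant list, which is then emptied."""
        gaps = self.lists[self._kind(symbol)]
        if symbol not in gaps:
            return False
        gaps.clear()
        return True


g = GapLists()
print(g.apply_empty("np"))  # False: no moved phrase licenses the gap
g.add_gap("np")
g.add_gap("aux")
print(g.apply_empty("aux"), g.apply_empty("np"))  # True True: separate lists
```

With a single shared list, consuming the auxiliary gap would also have discarded the NP gap; this is the case the second list is meant to cover.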
In the experiments, omitting the gap-handling
procedure led to non-termination; even just omitting the
back-checking did so. By removing empty productions
altogether, the parsing times decreased by an order of
magnitude; the median normalized parsing time
dropped to 0.270. This reduced the number of analyses
of some sentences, and many sentences failed to parse
at all. Nevertheless, this indicates that these rules have
a strong, adverse effect on parser performance.
5 COMPILER DESIGN 
We turn now to the design of the compiler that
constructs the parsing tables for the grammar. Although
the compilation step involves a fair amount of pre- and
postprocessing, the latter two consist of rather
uninteresting menial tasks.
The parsing tables are constructed using the
context-free backbone grammar, but also here there
is opportunity for interleaving with the full UG
constraints. The closure operation w.r.t. the non-kernel
items is characteristic for the method.
The first point is viewing the closure operation as
operating on sets. Consider the closure/3 predicate
of Fig. 3.² From an item already in the set, a set
of non-kernel items is generated and its union with the
original set is taken. The truly new items are added to
the agenda driving the process.
The second point is matching the corresponding
phrases of the unification grammar when predicting
non-kernel items. This is done by the call to the
predicate check_ug_rules/4 of Fig. 3, and ensures that the
phrase immediately following the "dot" in some UG
rule mapping to Rule1 unifies with the LHS of some UG
rule mapping to Rule2. In item(Rule, LHS, RHS0, RHS),
Rule is an atomic rule identifier and RHS0 and RHS form
a difference list marking the position of the dot.

    :- use_module(library(ordsets)).   % ord_union/3, ord_union/4

    closure(Set, Closure) :-
        closure(Set, Set, Closure).

    % closure(+Agenda, +Set0, -Closure): the agenda holds the items
    % whose non-kernel items have not yet been generated.
    closure([], Closure, Closure).
    closure([Item|Items], Set0, Closure) :-
        findall(NkItem, n_k_item(Item, NkItem), NkItems0),
        sort(NkItems0, NkItems),                   % ordered set, no duplicates
        ord_union(Set0, NkItems, Set1, NewItems),  % NewItems: truly new items
        ord_union(NewItems, Items, Items1),        % add them to the agenda
        closure(Items1, Set1, Closure).

    % The dot of a non-kernel item is first, so both halves of its
    % difference list are the whole RHS. cf_rule/3 and check_ug_rules/4
    % are grammar-specific and defined elsewhere.
    n_k_item(item(Rule1, _, RHS0, RHS),
             item(Rule2, LHS2, RHS2, RHS2)) :-
        RHS = [LHS2|_],
        cf_rule(Rule2, LHS2, RHS2),
        check_ug_rules(Rule1, Rule2, RHS0, RHS).

Figure 3: The non-kernel closure function

² I am indebted to Mats Carlsson for this scheme. An efficient
implementation of the primitive set operations such as union and
intersection is provided by the ordered-set-manipulation package
of the SICStus library. These primitives presuppose that the sets
are represented as ordered lists and consist of ground terms.
This is a compromise between performing the closure
operation with full UG constraints and performing
it efficiently, and achieves the same net effect as the
method in Section 3 advocated by Nakazawa. Especially
in the machine-learning application, where rather
large grammars are used, compiler performance is a
most critical issue.
In the experiments, omitting the checking of UG
rules when performing the closure operation leads to
non-termination when parsing. This is because the
back-checking table for the gap handler becomes too
general. For the learned grammar, this made constructing
the internal states prohibitively time-consuming.
6 SUMMARY 
The design of the LR parser and compiler is based on
interleaving context-free processing with applying the 
full constraints of the unification grammar. 
Using a context-free description level has the
advantages of providing a criterion for similarity between
UG phrases, allowing efficient processing both at
compile time and runtime, and providing a basis for
probabilistic analysis. The first of these makes prefix merging,
which is the very core of LR parsing, well-defined for
unification grammars, and enables using a generalized
unification grammar in the LR parsing phase, which is
one of the major innovations of the scheme. This and
prefix merging are vital when working with the learned
grammar since many rules overlap totally or partially
on the context-free level.
Interleaving context-free processing with applying
the full constraints of the unification grammar to prune
the search space restores some of the predictive power
lost using a context-free backbone grammar. In
particular, using the full UG constraints "inside" the
non-kernel closure operation to achieve the effect of using
the unification grammar itself for performing this
operation constitutes another important innovation.
The experiments emphasize the importance of
restricting the applicability of empty productions through
the use of top-down filtering. Thus the main remaining
issue is to improve the gap-handling mechanism to
perform real gap threading.
ACKNOWLEDGEMENTS 
I wish to thank Mats Carlsson for valuable advice on
Prolog implementation issues and Ivan Bretan, Robert
Moore and Manny Rayner for clear-sighted comments
on draft versions of this article and related publications,
and for useful suggestions for improvements.
References 
[1] Aho, Alfred V., Ravi Sethi and Jeffrey D. Ullman
(1986). Compilers, Principles, Techniques and Tools,
Addison-Wesley.
[2] Alshawi, Hiyan, editor (1992). The Core Language
Engine, MIT Press.
[3] Briscoe, Ted, and John Carroll (1993). "Generalized
Probabilistic LR Parsing of Natural Language (Corpora)
with Unification-Based Grammars", Computational
Linguistics 19(1), pp. 25-59.
[4] Knuth, Donald E. (1965). "On the translation of
languages from left to right", Information and Control
8(6), pp. 607-639.
[5] Nakazawa, Tsuneko (1991). "An Extended LR Parsing
Algorithm for Grammars Using Feature-Based Syntactic
Categories", EACL 91, pp. 69-74.
[6] Pereira, Fernando C. N., and Stuart M. Shieber (1987).
Prolog and Natural Language Analysis, CSLI Lecture
Note 10.
[7] Plotkin, Gordon D. (1970). "A Note on Inductive
Generalization", Machine Intelligence 5, pp. 153-163.
[8] Rayner, M., H. Alshawi, I. Bretan, D. Carter,
V. Digalakis, B. Gambäck, J. Kaja, J. Karlgren, B. Lyberg,
P. Price, S. Pulman and C. Samuelsson (1993). "A
Speech to Speech Translation System Built From
Standard Components", Procs. ARPA Workshop on Human
Language Technology.
[9] Samuelsson, Christer (1993). "Avoiding Non-termination
in Unification Grammars", NLULP 93, pp. 4-16.
[10] Samuelsson, Christer, and Manny Rayner (1991).
"Quantitative Evaluation of Explanation-Based Learning
as an Optimization Tool for a Large-Scale Natural
Language System", IJCAI 91, pp. 609-615.
[11] Shieber, Stuart M. (1985). "Using Restrictions to
Extend Parsing Algorithms for Complex-Feature-Based
Formalisms", ACL 85, pp. 145-152.
[12] Tomita, Masaru (1986). Efficient Parsing of Natural
Language. A Fast Algorithm for Practical Systems,
Kluwer.
