<?xml version="1.0" standalone="yes"?> <Paper uid="C94-1062"> <Title>NOTES ON LR PARSER DESIGN</Title> <Section position="3" start_page="0" end_page="386" type="metho"> <SectionTitle> 2 LR PARSING </SectionTitle> <Paragraph position="0"> An LR parser is a type of shift-reduce parser originally devised by Knuth for programming languages [4]. The success of LR parsing lies in handling a number of grammar rules simultaneously, rather than attempting one at a time, by the use of prefix merging. LR parsing in general is well described in [1], and its application to natural-language processing in [12].</Paragraph> <Paragraph position="1"> An LR parser is basically a pushdown automaton, i.e. it has a pushdown stack in addition to a finite set of internal states, and a reader head for scanning the input string from left to right, one symbol at a time. In fact, the &quot;L&quot; in &quot;LR&quot; stands for left-to-right scanning of the input string. The &quot;R&quot; stands for constructing the rightmost derivation in reverse.</Paragraph> <Paragraph position="2"> The stack is used in a characteristic way: the items on the stack consist of alternating grammar symbols and states. The current state is the state on top of the stack. The most distinguishing feature of an LR parser is, however, the form of the transition relation: the action and goto tables. A non-deterministic LR parser can in each step perform one of four basic actions. In state S with lookahead symbol Sym it can: 1. accept(S,Sym): Halt and signal success.</Paragraph> <Paragraph position="3"> 2. shift(S,Sym,S2): Consume the symbol Sym, place it on the stack, and transit to state S2.</Paragraph> <Paragraph position="4"> 3.
reduce(S,Sym,R): Pop off a number of items from the stack corresponding to the RHS of grammar rule R, inspect the stack for the old state S1, place the LHS of rule R on the stack, and transit to state S2 determined by goto(S1,LHS,S2).</Paragraph> <Paragraph position="5"> 4. error(S,Sym): Fail and backtrack.</Paragraph> <Paragraph position="6"> Prefix merging is accomplished by letting each internal state correspond to a set of partially processed grammar rules, so-called &quot;dotted items&quot;, each containing a dot (.) to mark the current position. Since the grammar of Fig. 1 contains Rules 2, 3, and 4, there will be a state containing the dotted items VP -> V ., VP -> V . NP, and VP -> V . NP NP.</Paragraph> <Paragraph position="8"> This state corresponds to just having found a verb (V).</Paragraph> <Paragraph position="9"> Which of the three rules to apply in the end will be determined by the rest of the input string; at this point no commitment has been made to any of them.</Paragraph> <Paragraph position="10"> Compiling LR parsing tables consists of constructing the internal states (i.e. sets of dotted items) and from these deriving the shift, reduce, accept and goto entries of the transition relation. New states can be induced from previous ones. Given a state S1, another state S2 reachable from it by goto(S1,Sym,S2) (or shift(S1,Sym,S2) if Sym is a terminal symbol) can be constructed as follows: 1. Select all items in state S1 where a particular symbol Sym follows immediately after the dot, and move the dot to after this symbol. This yields the kernel items of state S2.</Paragraph> <Paragraph position="11"> 2. Construct the non-kernel closure by repeatedly adding a so-called non-kernel item (with the dot at the beginning of the RHS) for each grammar rule whose LHS matches a symbol following the dot of some item in S2.</Paragraph> <Paragraph position="12"> Consider for example the grammar of Fig. 1, which will generate the states of Fig. 2.
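The two construction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work. The toy context-free grammar is reconstructed from the dotted items discussed in this section, and its rule order is our assumption. A dotted item is represented as a triple (LHS, RHS, dot position).

```python
# Toy grammar reconstructed from the dotted items discussed in the text
# (an assumption; the rule order is illustrative). Each rule is (lhs, rhs).
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("NP", "VP")),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "NP")),
    ("VP", ("VP", "PP")),
    ("NP", ("Det", "N")),
    ("NP", ("Pron",)),
    ("NP", ("NP", "PP")),
    ("PP", ("Prep", "NP")),
]


def after_dot(item):
    """Return the symbol immediately following the dot, or None at the end."""
    lhs, rhs, dot = item
    return rhs[dot] if dot != len(rhs) else None


def closure(kernel):
    """Step 2: add a non-kernel item (dot first) for every rule whose LHS
    follows the dot of some item, until nothing new is added."""
    items = set(kernel)
    agenda = list(kernel)
    while agenda:
        sym = after_dot(agenda.pop())
        for lhs, rhs in GRAMMAR:
            new = (lhs, rhs, 0)
            if lhs == sym and new not in items:
                items.add(new)
                agenda.append(new)
    return frozenset(items)


def goto(state, sym):
    """Step 1: move the dot over sym in every item where sym follows it
    (the kernel of the new state), then close the kernel (step 2)."""
    kernel = {(lhs, rhs, dot + 1) for (lhs, rhs, dot) in state
              if after_dot((lhs, rhs, dot)) == sym}
    return closure(kernel)


def show(state):
    """Render items as 'LHS -> ... . ...' strings, sorted for stable output."""
    return sorted(lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])
                  for lhs, rhs, dot in state)


state0 = closure({("S'", ("S",), 0)})
state1 = goto(state0, "NP")
for line in show(state1):
    print(line)
```

Under these assumptions, calling goto(state1, "V") likewise yields a state containing VP -> V ., VP -> V . NP and VP -> V . NP NP, plus the non-kernel NP items.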
State 1 can be constructed from State 0 by advancing the dot in S -> . NP VP and NP -> . NP PP, which gives the kernel items S -> NP . VP and NP -> NP . PP.</Paragraph> <Paragraph position="14"> The non-kernel items are generated by the grammar</Paragraph> <Paragraph position="16"> rules for VPs and PPs, the categories following the dot in the new items, namely Rules 2, 3, 4, 5 and 9.</Paragraph> <Paragraph position="17"> Using this method, the set of all parsing states can be induced from an initial state whose single kernel item has the top symbol of the grammar preceded by the dot as its RHS (the item S' -> . S of State 0 in Fig. 2). The accept, shift and goto entries fall out automatically from this procedure. Any dotted item where the dot is at the end of the RHS gives rise to a reduction by the corresponding grammar rule. Thus it remains to determine the lookahead symbols of the reduce entries.</Paragraph> <Paragraph position="18"> In Simple LR (SLR), the lookahead is any terminal symbol that can immediately follow any symbol of the same type as the LHS of the rule, i.e. any terminal in the FOLLOW set of the LHS category. In LookAhead LR (LALR), it is any terminal symbol that can immediately follow the LHS, given that it was constructed using this rule in this state. In general, LALR gives considerably fewer reduce entries than SLR, and thus results in faster parsing. In the experiments this reduced the parsing times by 30 %.</Paragraph> </Section> <Section position="4" start_page="386" end_page="387" type="metho"> <SectionTitle> 3 PROBLEMS WITH LR PARSING </SectionTitle> <Paragraph position="0"> The problems of applying the LR-parsing scheme to large unification grammars for natural language, rather than small context-free grammars for programming languages, stem from three sources. The first is that symbol matching no longer consists of checking atomic symbols for equality, but rather of comparing complex feature structures.
The second is the high level of ambiguity of natural language and the resulting non-determinism.</Paragraph> <Paragraph position="1"> The third is the sheer size of the grammars.</Paragraph> <Paragraph position="2"> Straightforwardly resorting to a context-free backbone grammar, with subsequent filtering using the full constraints of the underlying unification grammar (UG), is an approach taken by, for example, [3]. The problem with this approach is that the predictive power of the unification grammar is vastly diluted when feature propagation is omitted. Firstly, the context-free backbone grammar will in general allow very many more analyses than the unification grammar, leading to poor parser performance. Secondly, the feature propagation necessary for gap threading to prevent nontermination due to empty productions is obstructed.</Paragraph> <Paragraph position="3"> On the other hand, the treatment of the full UG constraints in the parsing-table construction phase is associated with a number of problems, most of which are discussed in [5].</Paragraph> <Paragraph position="4"> [Fig. 2: The LR parsing states (State 0 to State 13) generated from the grammar of Fig. 1; the state listing is not recoverable from this extraction.]</Paragraph> <Paragraph position="8"> One of the main questions is that of equality or similarity between linguistic objects.
Consider constructing the non-kernel items using UG phrases following the dot in items already in the set for prediction. If such a phrase unifies with the LHS of a grammar rule and we add the new item with this instantiation, we need a mechanism to ensure termination, since there is a risk of adding more and more instantiated versions of the same item indefinitely. One might object that this is easily remedied by only adding items that are not subsumed by any previous ones. Unfortunately, this does not work, since it is quite possible to generate an infinite sequence of items in which no item is subsumed by an earlier one, see [9]. This problem can be solved by using so-called &quot;restrictors&quot; to block out the feature propagation leading to non-termination, see [11], but the number of items that are slight variants of one another may still be quite large. In her paper [5], Nakazawa proposes a simple and elegant solution to this problem: &quot;While the CLOSURE procedure makes top-down predictions in the same way as before [using the full constraints of the unification grammar], new items are added without instantiation. Since only original productions in a grammar appear as items, productions are added as new items only once and the nontermination problem does not occur, as is the case of the LR parsing algorithm with atomic categories.&quot; Unfortunately, even with this simplification, computing the non-kernel closure is quite time-consuming for large unification grammars.</Paragraph> <Paragraph position="9"> Empty productions are a type of grammar rule that constitutes a notorious problem for parser developers. The LHSs of these grammar rules have no realization in the input string, since their RHSs are empty. They are used to model movement, as in the sentence &quot;What_i does John seek e_i?&quot;, which is viewed as a transformation of &quot;John seeks what?&quot;.
This is an example of left movement, since the word &quot;what&quot; has been moved to the left. Examples of right movement are rare in English, but frequent in other languages, the prime example being German subordinate clauses.</Paragraph> <Paragraph position="10"> The particular unification grammar used keeps track of moved phrases by employing gap threading, i.e. by passing around a list of moved phrases to ensure that an empty production is only applicable if there is a moved phrase elsewhere in the sentence to license its use, see [6] pp. 125-129. As LR parsing is a parsing strategy employing bottom-up rule prediction, it is necessary to limit the applicability of these empty productions by the use of top-down filtering.</Paragraph> </Section> <Section position="5" start_page="387" end_page="388" type="metho"> <SectionTitle> 4 PARSER DESIGN </SectionTitle> <Paragraph position="0"> The parser was implemented and tested in SICStus Prolog using a version of the SRI Core Language Engine (CLE) [2] adapted to the air-travel information-service (ATIS) domain for a spoken-language translation task [8].
The CLE ordinarily employs a shift-reduce parser where each rule is tried in turn, although filtering using precompiled parsing tables makes it acceptably fast.</Paragraph> <Paragraph position="1"> The ATIS domain is a common ARPA testbench, and the CLE performance on it is comparable to that of other systems.</Paragraph> <Paragraph position="2"> In fact, two slightly different versions of the parser were constructed. One is for the original grammar and employs a mechanism for gap handling, as described in Section 4.2. The other is for the learned grammar, where no such mechanism is needed, since this grammar lacks empty productions. Experiments were carried out over corpora of 100-200 test sentences, using SLR parsing tables, to measure the impact on parser performance of the various modifications described below.</Paragraph> <Paragraph position="3"> A depth-first, backtracking LR parser was used, where the parsing is split into three phases: 1. Phase one is the LR parsing phase. The grammar used here is the generalized unification grammar described in Section 4.1 below. The output is a parse tree indicating how the rules were applied to the input word string and what constraints were associated with each word.</Paragraph> <Paragraph position="4"> 2. Phase two applies the full constraints of the syntactic rules of the unification grammar and lexicon to the output parse tree of phase one.</Paragraph> <Paragraph position="5"> 3. Phase three applies the constraints of the compositional semantic rules of the grammar.</Paragraph> <Paragraph position="6"> For the learned grammar, phases two and three coincide, since the learned rules include compositional semantic constraints. Each rule referred to in the output parse tree of phase one may be a generalization over several different rules of the unification grammar. Likewise, the constraints associated with each word can be a generalization over several distinct lexicon entries.
In phase two, these different ways of applying the full constraints of the syntactic rules and the lexicon, and with the learned grammar also the compositional semantic constraints, are attempted non-deterministically.</Paragraph> <Paragraph position="7"> The lookahead symbols, on the other hand, are ground Prolog terms. Firstly, this means that they can be computed efficiently in the LALR case. Secondly, this avoids trivial reduction ambiguities, where a particular reduction would be performed once for each possible mapping of the next word to a lookahead symbol. This is achieved by producing the set of all possible lookahead symbols for the next word at once, rather than producing them one at a time non-deterministically. Each reduction is associated with another set of lookahead symbols. The intersection of the two sets is taken, and the result is passed on to the next parsing cycle.</Paragraph> <Paragraph position="8"> Prefix merging means that rules starting with similar phrases are processed together until they branch away. The problem with this in conjunction with a unification grammar is that it is not clear what &quot;similar phrase&quot; means. The choice made here is to regard phrases that map to the same CF symbol as similar: Definition: Two phrases are similar if they map to the same context-free symbol.</Paragraph> <Paragraph position="9"> The processing is performed by applying constraints incrementally and monotonically, where constraints are realized as Prolog terms that are instantiated stepwise. For this definition to be useful, a UG phrase must therefore map to the same CF symbol regardless of its degree of instantiation. The mapping of UG phrases to CF symbols used in the experiments was the naive one, where UG phrases mapped to their syntactic categories (i.e. Prolog terms mapped to their functors), save that verbs with different complements (intransitive, transitive, etc.)
were distinguished.</Paragraph> <Section position="1" start_page="387" end_page="388" type="sub_section"> <SectionTitle> 4.1 Generalization </SectionTitle> <Paragraph position="0"> The grammar used in phase one is neither a context-free backbone grammar nor the original unification grammar. Instead, a generalized unification grammar is employed. This generalization is accomplished using anti-unification. This is the dual of unification: it constructs the least general term that subsumes two given terms. It was first described in [7]. This operation is often referred to as generalization in the computational-linguistics literature. If T is the anti-unification of T1 and T2, then T subsumes T1 and T subsumes T2, and if any other term T' subsumes both T1 and T2, then T' subsumes T. Anti-unification is a built-in predicate of SICStus Prolog and quite acceptably fast.</Paragraph> <Paragraph position="1"> For each context-free rule, a generalized UG rule is constructed that is the generalization over all UG rules that map to that context-free rule. If there is only one such original UG rule, the full constraints of the unification grammar are applied already in phase one.</Paragraph> <Paragraph position="2"> Similarly, the symbols of the action and goto tables are not context-free symbols. They are the generalizations of all relevant similar UG phrases. For example, each entry in the goto table will have as a symbol the generalization of a set of UG phrases. These UG phrases are those that map to the same context-free symbol, occur in a UG rule that corresponds to an item where this CF symbol immediately follows the dot, and occur in that UG rule at the position immediately following the dot. For instance, the symbol of the goto (or shift) entry for verbs between State 1 and State 6 of Fig.
2 is the anti-unification of the RHS verbs of the UG rules mapping to Rules 2, 3 and 4, e.g.</Paragraph> <Paragraph position="3"> vp:[agr=Agr] => [v:[agr=Agr,sub=intran]].</Paragraph> <Paragraph position="5"> which is v:[agr=_,sub=_]. Here the value of the subcategorization feature sub is left unspecified.</Paragraph> <Paragraph position="6"> Lexical ambiguity in the input sentence is handled in the same way. For each word, a generalized phrase is constructed from all similar phrases it can be analyzed as. Again, if there is no lexical ambiguity within the CF symbol, the full UG constraints are applied. Nothing is done about lexical ambiguities outside of the same CF symbol, though.</Paragraph> <Paragraph position="7"> In the experiments, using the UG constraints instead of their generalizations for the LR-parsing phase led to an increase in median normalized parsing time (see footnote 1) from a.1 to 3.8, i.e. by 20 %. This was also typically the case for the individual parsing times. In the machine-learning experiments, where normally several UG rules mapped to the same CF rule, this effect was more marked: it led to an increase in parsing time by a factor of five.</Paragraph> <Paragraph position="8"> On the other hand, using truly context-free symbols for LR parsing actually leads to non-termination due to the empty productions. Even when banning empty productions, the parsing times increase by orders of magnitude. The vast majority (86 %) of the test sentences were timed out after ten minutes, and still the normalized parsing time exceeded 100 in more than half (54 %) of the cases. This should be compared with the 0,220 figure using generalized UG constraints.
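The anti-unification operation described at the start of Section 4.1 can be illustrated with a minimal Python sketch of the least general generalization. It is not the SICStus Prolog built-in mentioned there. It assumes that terms are encoded as Python tuples with the functor first and that a Var class stands in for Prolog variables. It also assumes that a feature bundle such as v:[agr=A,sub=S] is flattened to the fixed-arity term ("v", A, S). The sub values tran and ditran are hypothetical, since only the intransitive rule is given above.

```python
import itertools
from functools import reduce


class Var:
    """A Prolog-style logic variable; two Vars are equal only if identical."""
    _counter = itertools.count()

    def __init__(self):
        self.n = next(Var._counter)

    def __repr__(self):
        return "_G%d" % self.n


def anti_unify(t1, t2, seen=None):
    """Least general term subsuming both t1 and t2.

    Terms are Var instances, atoms (str), or compound terms encoded as
    tuples (functor, arg1, ..., argN). Each distinct pair of mismatching
    subterms is replaced by one shared fresh variable; reusing it for
    repeated pairs is what keeps the result least general.
    """
    if seen is None:
        seen = {}
    if t1 == t2:
        return t1
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        return (t1[0],) + tuple(anti_unify(a, b, seen)
                                for a, b in zip(t1[1:], t2[1:]))
    if (t1, t2) not in seen:
        seen[(t1, t2)] = Var()
    return seen[(t1, t2)]


# Hypothetical RHS verbs of three rules like the one above: each rule has
# its own agreement variable, and sub is intran, tran or ditran.
verbs = [("v", Var(), "intran"), ("v", Var(), "tran"), ("v", Var(), "ditran")]
general = reduce(anti_unify, verbs)
print(general)  # ("v", X, Y) with two distinct fresh variables
```

The result corresponds to v:[agr=_,sub=_]. The agr value also becomes a fresh variable, because each rule uses its own variable Agr. By contrast, anti-unifying ("f", "a", "a") with ("f", "b", "b") gives ("f", X, X), with one shared variable, rather than ("f", X, Y).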
In the machine-learning experiments, using context-free symbols led to an increase in processing time by a factor of 100.</Paragraph> </Section> <Section position="2" start_page="388" end_page="388" type="sub_section"> <SectionTitle> 4.2 Gap handling </SectionTitle> <Paragraph position="0"> A technique for limiting the applicability of empty productions is employed in the version for the original grammar. It is only correct for left movement. Since there are no empty productions in the learned grammar, there is no need for gap handling there.</Paragraph> <Paragraph position="1"> The idea is that, in order for an empty production to be applicable, some grammar rule must have placed a phrase corresponding to the moved one on the gap list.</Paragraph> <Paragraph position="2"> Footnote 1: The normalized parsing time is the parsing time for the LR parser divided by the parsing time for the original parser.</Paragraph> <Paragraph position="3"> Thus a gap list is maintained. Phrases corresponding to potential left movement are added to it whenever a state is visited where there is a &quot;gap-adding phrase&quot; immediately following the dot in any item. The elements of the gap list are the corresponding CF symbols. At this point the stack is &quot;back-checked&quot;, as defined below, to see if the gap-adding rule really is applicable.</Paragraph> <Paragraph position="4"> Back-checking means matching the prefixes of the kernel items against the stack in each state. The rationale for this is twofold: firstly, to capture constraints on phrases previously obscured by grammar rules that have now branched off; secondly, to capture feature agreement between phrases in prefixes of length greater than one. In general this was not useful; it simply resulted in a small overhead. In conjunction with gap handling, however, it proved essential.</Paragraph> <Paragraph position="5"> The gap list is emptied after applying an empty production.
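The gap-list bookkeeping described in this section can be sketched as follows. This is a simplification under stated assumptions, not the CLE implementation. States are sets of dotted items (LHS, RHS, dot position). The table GAP_ADDING and the rule Q -> WhNP S are hypothetical. A single gap list is used, and back-checking is left out. Gap lists are kept as immutable tuples, so a backtracking parser gets the old list back automatically on failure.

```python
# Hypothetical: CF symbol of a "gap-adding phrase" -> CF symbol of the gap
# it licenses (the elements of the gap list are CF symbols, as in the text).
GAP_ADDING = {"WhNP": "NP"}


def visit_state(state, gaps):
    """Return the gap list extended with one entry per item that has a
    gap-adding symbol immediately after the dot (back-checking omitted)."""
    added = tuple(GAP_ADDING[rhs[dot]]
                  for lhs, rhs, dot in state
                  if dot != len(rhs) and rhs[dot] in GAP_ADDING)
    return gaps + added


def apply_empty_production(gap_symbol, gaps):
    """An empty production for gap_symbol is only applicable if a moved
    phrase of that category is on the gap list. Return the new gap list,
    or None if the production is blocked. As in the text, the list is
    emptied after the empty production has been applied."""
    if gap_symbol not in gaps:
        return None
    return ()


# Hypothetical state containing Q -> . WhNP S (a fronted wh-phrase is expected).
state = {("Q", ("WhNP", "S"), 0), ("NP", ("Pron",), 0)}
gaps = visit_state(state, ())
print(gaps)                                # the gap list now holds "NP"
print(apply_empty_production("NP", gaps))  # licensed: gap list emptied
print(apply_empty_production("NP", ()))    # blocked: no moved phrase
```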
Emptying the gap list in this way is not correct if several phrases are moved using the same gap list, or for conjunctions where the gap threading is shared between the conjuncts. For the former reason, two different gap lists are employed: one for (auxiliary) verbs and one for maximal projections such as NPs, PPs, AdjPs and AdvPs.</Paragraph> <Paragraph position="6"> In the experiments, omitting the gap-handling procedure led to non-termination; even just omitting the back-checking did so. By removing empty productions altogether, the parsing times decreased by an order of magnitude, and the median normalized parsing time dropped to 0.270. This reduced the number of analyses of some sentences, and many sentences failed to parse at all. Nevertheless, this indicates that these rules have a strong adverse effect on parser performance.</Paragraph> </Section> </Section> </Paper>