<?xml version="1.0" standalone="yes"?> <Paper uid="C90-3013"> <Title>Efficient Disjunctive Unification for Bottom-Up Parsing</Title> <Section position="3" start_page="70" end_page="71" type="metho"> <SectionTitle> 2 Bottom-up Parsing of Japanese </SectionTitle> <Paragraph position="0"> The Nadine system is geared towards the processing of Japanese sentences of the type encountered in telephone conversations. At ATR, a substantial corpus of dialogues has been collected by simulating, both by speech and by keyboard, telephone calls to the organizing office of an international conference. At the time the research described here was carried out, Nadine's grammar and lexicon were being developed and tested mainly on a subcorpus of 100 sentences comprising five of these dialogues. The results presented in this paper therefore all derive from applying Propane to this same sentence set. Although the size of the set is comparatively small, the sentences in it were not in any sense &quot;made up&quot; to suit either the Nadine or Propane parsers. Rather, to the degree that a simulation can approach reality, they can be taken as representatives of the kinds of sentences to be handled in a realistic language processing application. Japanese has several characteristics which suggest that bottom-up parsing approaches might be particularly fruitful. The language is a head-final, strongly left-branching one. This means that modifiers always attach to a head on their right, and that there is a preference for attachment to the nearest such head that obeys the constraints that syntax, semantics and pragmatics place on possible combinations. This preference is so strong as to suggest a parsing algorithm that first constructs analyses that obey it, backtracking and producing analyses with different bracketings only if the initial analysis or analyses are judged unacceptable by some
outside process.</Paragraph> <Paragraph position="2"> Attempts have been made, for example in Nadine and by Shimazu and Naito (1989), to use the left-branching preference to select among alternative actions in a chart parser. However, the approach adopted in Propane is to implement the preference directly in the mechanism of a shift-reduce parser. In general, a shift-reduce parser uses a table of parse states and possible actions that determines, at each stage, whether a shift or a reduction is appropriate, and in the latter case, what grammar rule should be used. However, when Japanese is formalized using a grammar in which every rule has exactly two right-hand-side elements - as is the case in the Nadine grammar - the left-branching preference corresponds to a strategy of reducing the top two categories on the stack whenever there is a grammar rule that allows them to be reduced, and shifting only when this cannot be done. No table is therefore required. Nadine's grammar rules include syntactic, semantic and pragmatic information, so that Propane's decision to reduce or not depends on the acceptability of the result at all three of these linguistic levels. Such a test takes advantage of the maximum amount of available information and applies it in a fairly straightforward and efficient way.
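The table-free strategy just described (reduce the top two stack categories whenever a binary rule licenses it, otherwise shift) can be sketched as follows. This is an illustrative toy, not the Nadine or Propane implementation: the binary grammar and category names are invented, and the three-level unification test is reduced to a dictionary lookup.

```python
# Illustrative sketch of the table-free reduce-first strategy.
# The binary "grammar" below is invented; the real parsers unify
# feature structures at syntactic, semantic and pragmatic levels
# instead of looking categories up in a table.
RULES = {
    ("NP", "CASE-P"): "PP",   # noun phrase + case particle
    ("PP", "V"): "VP",        # head-final: complement then verb
    ("VP", "AUX"): "VP",      # verb phrase + auxiliary
}

def parse(tokens):
    """Greedy left-branching shift-reduce parse over category names."""
    stack, remaining = [], list(tokens)
    while True:
        # Prefer a reduction of the top two categories whenever
        # some rule licenses it...
        if len(stack) >= 2 and (stack[-2], stack[-1]) in RULES:
            right, left = stack.pop(), stack.pop()
            stack.append(RULES[(left, right)])
        # ...and shift only when no reduction is possible.
        elif remaining:
            stack.append(remaining.pop(0))
        else:
            return stack

print(parse(["NP", "CASE-P", "V", "AUX"]))  # -> ['VP']
```

Because every reduction is attempted as soon as its two daughters are adjacent on the stack, the parse is maximally left-branching, which is exactly the attachment preference the strategy is meant to encode.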
Alternative lexical entries for words, and alternative grammar rules that can apply to the same pair of daughter categories, mean that each position on the parser's stack is in fact occupied not by a single category but by a list of categories (each of which, of course, contains a disjunctive structure that may have many realizations). The lengths of these lists do not grow significantly as parsing progresses, because just as the lexicon and the grammar can introduce alternatives, so the application of grammar rules can remove them. The attempt to reduce each of m possible head daughters with each of n possible non-head daughters typically results in far fewer than m x n mother structures, because not every rule application succeeds.</Paragraph> <Paragraph position="3"> One complication that arises in parsing written Japanese is that word boundaries are not indicated explicitly. This means that the lexicon imposes a lattice structure, not a simple sequence of tokens, on the input, so that, when a shift operation is needed, the point to shift from is not necessarily well-defined. Propane deals with this situation in the following way. When shifting, edges of all lengths are placed onto the stack, and are allowed to participate in any following sequence of reductions. Before the next shift, however, Propane &quot;prunes&quot; the edges that constitute the top of the stack, removing all but the longest. This corresponds to the assumption that there is a preference for longer strings of characters to correspond to lexical items where possible, but that this preference should be overturned when a shorter string, but not a longer one, allows a reduction with what precedes it.</Paragraph> <Paragraph position="4"> A large proportion of the 100-sentence subcorpus targeted by Nadine can be parsed correctly by this simple approach of always preferring reductions to shifts and longer edges to shorter ones.
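The long-over-short preference for lattice edges might be sketched as follows; this is a simplified reading of the pruning rule just described, with the edge format (length, category) assumed for illustration and reducibility abstracted into a predicate.

```python
# Simplified reading of the long-over-short edge preference:
# when several lexical edges of different lengths start at the
# shift point, keep a shorter edge only if it can reduce with
# what precedes it; otherwise keep just the longest edge(s).

def choose_edges(candidates, reduces_with_stack):
    """candidates: (length, category) pairs starting at the shift
    point; reduces_with_stack: predicate testing whether a category
    can reduce with the constituent below it on the stack."""
    reducible = [e for e in candidates if reduces_with_stack(e[1])]
    if reducible:
        # A shorter edge that licenses a reduction overturns the
        # preference for longer lexical items.
        return reducible
    longest = max(length for length, _ in candidates)
    return [e for e in candidates if e[0] == longest]
```

In the full parser the pruning happens after the reductions have actually been attempted, rather than via a predicate supplied in advance, but the effect on which edges survive is the same.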
Nevertheless, on many occasions the correct parse will involve at least one violation of these preferences. In general, some kind of intelligent backtracking and/or lookahead is required. In Propane, only a limited form of lookahead exists. Sometimes, an examination of the parts of speech (i.e. category names only and not feature values) in the grammar and those of the constituents in the stack and of the item that would be consumed in a shift shows the following situation: a reduction is possible, but if it is performed, the next shift cannot itself be followed by a reduction, whereas if a shift is performed next, two reductions may be possible. That is, there are two alternatives: reduce now and then be forced to shift twice, or shift now and, unless unification failure prevents it, reduce twice. In such situations, Propane chooses the second option. This often allows sentences to be parsed which would not otherwise be, and does not prevent the parsing of any sentences in the subcorpus. Because only category names, and not features, are examined, the lookahead procedure is very quick.</Paragraph> <Paragraph position="5"> With this small amount of lookahead included, Propane was able to parse 75 of the 100 sentences in the subcorpus. No attempt was made to check thoroughly the validity of these because of the present author's limited familiarity with Japanese and the Nadine grammar; however, they were inspected informally, and none seemed to be obviously wrong.</Paragraph> <Paragraph position="6"> Of the 25 sentences for which no parse was found, ten involved an incorrect reduction. Eight of these might have been prevented had information corresponding to Gunji's (1988) treatment of &quot;sentence levels&quot; for modification been present in the grammar.
Twelve sentences failed through incorrectly favouring longer edges over shorter; all of these failures involved a lexical entry for the same particle sequence, and could have been prevented either by altering the treatment of that sequence or by implementing the same kind of limited lookahead for the long-over-short preference as was done for the reduce-over-shift preference. Of the other three failures, two were sentences on which the Nadine parser also failed, suggesting that they were outside grammatical and/or lexical coverage, and one remained unexplained. Thus in summary, up to 98% of the subcorpus could have been assigned plausible analyses by Propane given the improvements just listed.</Paragraph> </Section> <Section position="4" start_page="71" end_page="71" type="metho"> <SectionTitle> 3 Pruning Irrelevant Disjuncts </SectionTitle> <Paragraph position="0"> If bottom-up parsing is to be efficient, it is important that disjunctions that are irrelevant to a newly-created mother constituent - that is, disjunctions whose values never affect the realizations of the constituent, i.e. the set of terms in its disjunctive normal form - are discarded whenever possible. Otherwise, the number of disjunctions in a constituent will be roughly proportional to the number of lexical entries and grammar rules used to construct it, and the
time taken to unify two constituents will increase at least as fast as that number, and probably rather faster.</Paragraph> <Paragraph position="1"> However, it is not possible simply to discard disjunctive constraints that refer only to the daughter nodes, because feature structures are graphs, not trees; the same substructure frequently appears in more than one place. When a grammar rule has identified part of the mother structure with part of a daughter one, then any disjunctions involving the latter must be preserved. Some means must therefore be found of keeping track of what pieces of structure are shared, or in other words, what pairs of feature paths lead to the same values. If this is done, a disjunction that explicitly involves only daughter constituents can safely be discarded if no feature path through the mother leads to it or to any of its components.</Paragraph> <Paragraph position="2"> Of course, the set of feature paths that share a value will differ for the different realizations (complete choices of disjuncts) of a disjunctive structure. It is not even simply the case that each disjunct contributes its own set of common paths; members of two different disjunctions can each cause two paths to have the same value, in a realization in which they are both selected, if they place the same variable in two different positions. Thus deciding infallibly would require examining every realization. Each grammar rule or grammar predicate, therefore, is assigned a set of &quot;path groups&quot;, which each correspond either to a variable that appears more than once in the original Nadine definition, or to an explicit identity equation between two or more positions in the feature structure. To some extent, a path group is analogous to a set of Eisele and Dörre pointers that all point to the same position.
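The path-group test can be sketched as follows. Paths are represented here as tuples of feature names and path groups as plain sets of paths; both are assumed representations for illustration, not Propane's actual data structures.

```python
# Sketch of the path-group pruning test: a disjunction that
# constrains only daughter paths may be discarded unless some
# path group connects one of its paths to a path through the
# mother. Path groups may be spuriously generous, so the test
# can only err on the side of keeping a disjunction.

def prunable(disjunction_paths, path_groups, is_mother_path):
    """True if a disjunction constraining only the given paths can
    safely be discarded: no path group links any of its paths to a
    mother path, and none of its paths is itself a mother path."""
    for group in path_groups:
        touches = any(p in group for p in disjunction_paths)
        shared_with_mother = any(is_mother_path(q) for q in group)
        if touches and shared_with_mother:
            return False   # its value may be reachable from the mother
    return not any(is_mother_path(p) for p in disjunction_paths)
```

Because a spurious group only makes `prunable` return False more often, the sketch preserves the property described in the text: imprecise path groups cost efficiency, never correctness.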
However, the crucial point is that in Propane, no record is kept of which position in the and/or tree each path comes from. This means two things. Firstly, when deciding whether to throw away a disjunction referring to a particular position in a daughter structure, Propane can check the (unique, disjunction-independent) set of path groups, and if no possible equivalence with part of the mother structure is found, the disjunction can safely be pruned. The price we pay for this disjunction-independence is that the path groups can specify spurious equivalences. It is possible for two paths to be associated when they arise from two different, incompatible disjuncts, or to remain associated after the disjunct(s) from which they arose have been eliminated through later unification. However, since path groups are used only for deciding what disjunctions to discard, and not as part of the feature structure representation itself, a spurious path group can only result in some inefficiency and not in an incorrect result.</Paragraph> <Paragraph position="3"> This technique is thus a compromise between, on the one hand, carrying out a possibly exhaustive comparison to achieve a perfect result, and on the other hand, not discarding anything at all. It avoids any exponential expansion of disjunctions at the cost of some slight unnecessary processing at a later stage. In practice, the cost involved seems quite acceptable, in that the number of disjuncts in a constituent does not increase steadily with its height in the parse tree. Another consequence of keeping
irrelevant disjuncts is that if, at the end of the parse, the set of all full realizations of a disjunctive feature structure is exhaustively enumerated, then the same realization may be encountered repeatedly. However, experience suggests that for the current Nadine grammar this is not a serious problem: the average number of realizations (identical or different) per parse of the 75 sentences successfully parsed was exactly two, and only one sentence received more than six realizations. The pruning operation in fact resulted in, on average, a 20% decrease in the number of disjunctions in a newly created mother constituent, over all &quot;reduce&quot; operations performed in processing the corpus. Probably for this reason, the number of disjunctions in a new mother constituent only barely shows a positive correlation to the size, in constituents, of the subtree that it dominates and from which it has been built. (The correlation between subtree size and number of disjunctions, for the 406 tree nodes created, was only just significant at the 5% level, given the null hypothesis that the number of disjunctions is independent of subtree size.) On the other hand, if pruning were not performed, each constituent could be expected to add its quota of irrelevant disjuncts to every other constituent that dominated it.
Despite the relatively modest figure of a 20% decrease over one reduction, the cumulative effect of such decreases over a whole parse is therefore quite significant.</Paragraph> <Paragraph position="4"> In particular, it is worth noting that if, through pruning, the number of disjunctions in a node does not increase with the number of nodes it dominates, then disjunctive unification will have no effect on the time complexity of parsing as a function of sentence length. There is reason to hope that this will often be the case; while disjunction may be widespread in grammar rules and lexical entries, Kasper (1987) observes that in his implementation, &quot;in the analysis of a particular sentence most features have a unique value, and some features are not present at all. When disjunction remains in the description of a sentence after parsing, it usually represents ambiguity or an underspecified part of the grammar.&quot; It is tempting to interpolate between the extremes of single words and whole sentences and to speculate that, with thorough pruning, the number of disjunctions in a node should decrease with its height in the tree.</Paragraph> </Section> <Section position="5" start_page="71" end_page="71" type="metho"> <SectionTitle> 4 Pairwise Consistency Checking </SectionTitle> <Paragraph position="0"> When a new mother constituent has been created by rule application, it is essential to verify that it</Paragraph> <Paragraph position="1"> does in fact have at least one consistent realization.</Paragraph> <Paragraph position="2"> Although redundancy is not a major problem for our purposes, a representation that did not distinguish between realizable and unrealizable structures (that is, between success and failure in unification) would be seriously flawed. However,
consistency checking is, in the general case, an NP-complete problem.</Paragraph> <Paragraph position="3"> Kasper (1987) describes a technique which, for every set of n conjoined disjunctions, checks the consistency first of single disjuncts against the definite part of the description, then that of pairs, and so on up to n-tuples for full consistency. At each stage k, any disjunct that does not take part in any consistent k-tuple is eliminated.2 If all the disjuncts in a disjunction are eliminated, the conjunction of which that disjunction is a conjunct is eliminated too; and if the outermost conjunction of the whole feature structure is eliminated, unification fails. This technique has the advantage that the pruning of nodes at stage k will make stage k+1 more efficient. Nevertheless, since n can sometimes be quite large, this exhaustive process can be time-consuming, and indeed in the limit will take exponential time.</Paragraph> <Paragraph position="4"> Propane's attempted solution to this problem is based on the hypothesis that the vast majority of large unrealizable disjunctive feature structures that 2 Somewhat confusingly, Kasper uses the term &quot;n-wise consistency&quot; for the checking of n+1-tuples of disjuncts. We avoid this usage.</Paragraph> <Paragraph position="5"> will be created in the use of a practical natural language grammar will be not only unrealizable, but also &quot;pairwise unrealizable&quot;, in the sense that they will fail at or before the second stage of Kasper's consistency check, for k = 2.</Paragraph> <Paragraph position="6"> The reason we can expect most unrealizable structures also to be pairwise unrealizable is that most commonly, unrealizability will result from the contents of two nodes in the tree being incompatible, through assigning non-unifiable values to the same position in a feature structure.
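The first two stages of Kasper's procedure can be illustrated schematically if feature structures are flattened to path-to-value dictionaries, an assumed simplification (they are really graphs); unification then reduces to merging dictionaries without conflict.

```python
# Schematic version of stages k=1 and k=2 of Kasper-style
# consistency checking, over flattened path->value constraints.

def compatible(*constraint_sets):
    """True if the given constraint sets merge without conflict."""
    merged = {}
    for cs in constraint_sets:
        for path, value in cs.items():
            if merged.setdefault(path, value) != value:
                return False   # two different atomic values at one path
    return True

def pairwise_prune(definite, disjunctions):
    """Stage k=1: drop disjuncts inconsistent with the definite part.
    Stage k=2: drop disjuncts taking part in no consistent pair.
    Returns the pruned disjunctions, or None if unification fails."""
    disjunctions = [[d for d in dj if compatible(definite, d)]
                    for dj in disjunctions]
    if any(not dj for dj in disjunctions):
        return None
    for i, dj in enumerate(disjunctions):
        kept = [d for d in dj
                if all(any(compatible(definite, d, e) for e in other)
                       for j, other in enumerate(disjunctions) if j != i)]
        if not kept:
            return None
        disjunctions[i] = kept
    return disjunctions
```

Note how pruning during stage k=2 reuses the lists already thinned at stage k=1, mirroring the observation in the text that elimination at one stage makes the next cheaper.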
Although there can clearly be exceptions, the hypothesis is that it is fairly unlikely, in a large disjunctive structure (which is the case where exponentiality would be harmful), that there would be a non-pairwise inconsistency but no pairwise inconsistency.</Paragraph> <Paragraph position="7"> Following this hypothesis, when the Propane unifier has created a structure, it checks and prunes it first for pairwise consistency, and if this succeeds, risks trying for a single full realization (one choice at each disjunct) straight away. Thus it differs from Kasper's algorithm in two ways: no exhaustive k-wise checks are made for k > 2, and when a full check is made, only one success is required, avoiding an exhaustive search through all combinations of disjuncts.3 Of course, if the structure is pairwise realizable but not fully realizable, the search for a single success will take exponential time; but, according to the hypothesis, such occurrences, for structures with enough disjuncts for exponential time to be unacceptably long, should be extremely rare.</Paragraph> <Paragraph position="8"> The effectiveness of this strategy can only be judged by observing its behaviour in practice. In fact, no instances were observed of the search for a full realization taking an inordinately long time after pairwise consistency checking and pruning have succeeded. Thus it can be tentatively concluded that, with the current version of the Nadine grammar and with bottom-up parsing, the risk is worth taking: that is, a full realization is virtually always possible, in reasonable time, for a pairwise consistent structure. Maxwell and Kaplan's (1989) belief that &quot;... [simple inconsistencies] become less predominant as grammars are extended to cover more and more linguistic phenomena&quot; does not therefore appear to be true of the Nadine grammar, in spite of its coverage of a wide range of phenomena at many linguistic levels; or if it is true, it
does not affect the success of Propane's strategy. That is, even if simple inconsistencies are less predominant, they are still common enough that a large structure that is unrealizable because of complex inconsistencies will also 3 According to Maxwell and Kaplan (1989), &quot;in practice, Kasper noted that ... once bad singleton disjuncts have been eliminated, it is more efficient to switch to DNF [disjunctive normal form] than to compute all of the higher degrees of consistency.&quot; This variation of the algorithm given in Kasper (1987) is closer to Propane's strategy, but the expansion to full DNF is itself in general an exponential process and will, when many disjunctions remain, be far more expensive than looking for a single realization.</Paragraph> <Paragraph position="9"> be unrealizable because of simple ones.</Paragraph> <Paragraph position="10"> Of course, this does not alter the fact that in general, i.e. for an arbitrary input and for an arbitrary grammar written in the Nadine formalism, Propane's unification algorithm, like Kasper's, is exponential in behaviour. In the limit, an exponential term in the formula for the time behaviour of an algorithm will dominate, however small its associated constant factor.</Paragraph> <Paragraph position="11"> Unlike Nadine's unifier, Propane's strategy has the property that when a structure survives consistency checking, not every member of every disjunct in it can necessarily participate in a full realization; that is, ideally, it should have been pruned. However, this property is only undesirable to the extent that, at the end of the parse, it makes any exhaustive search for full realizations inefficient through excessive backtracking.
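The risk step itself, searching depth-first for one full realization and stopping at the first success rather than enumerating every consistent combination, can be sketched by flattening disjuncts to path-to-value constraints (an assumed simplification of the graph-structured reality).

```python
# Sketch of the single-realization search run after pairwise
# pruning: pick one disjunct per disjunction, backtracking on
# conflict, and return as soon as any complete choice succeeds.

def one_realization(definite, disjunctions):
    """Return one consistent merged realization, or None."""
    def extend(merged, cs):
        out = dict(merged)
        for path, value in cs.items():
            if out.setdefault(path, value) != value:
                return None    # conflicting values: dead branch
        return out

    def search(i, merged):
        if i == len(disjunctions):
            return merged
        for d in disjunctions[i]:
            ext = extend(merged, d)
            if ext is not None:
                full = search(i + 1, ext)
                if full is not None:
                    return full   # first success suffices
        return None

    return search(0, dict(definite))
```

The search is exponential in the worst case, as the text concedes, but returning on the first success means the bad case arises only for structures that are pairwise consistent yet fully unrealizable.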
Again, in practice, this seems not to be a problem; exhaustive full realization is extremely quick compared to parsing.</Paragraph> <Paragraph position="12"> An analysis of Propane's processing of its corpus reveals quite wide variation in the relationship between the total number of disjunctions in a rule application (in both daughters and the rule) and the time taken to perform the unification. However, although, unsurprisingly, unification time increases with the number of disjunctions, it appears from inspection to be perhaps linear with a small binomial component, and not exponential. This is, in fact, what an analysis of the algorithm predicts.</Paragraph> <Paragraph position="13"> The linear component derives from the check of each disjunct separately against the definite part, while the parabolic component derives from the pairwise check. The relatively small size of the latter may imply that a majority of disjuncts are eliminated during the first phase, so the second has less work to do.</Paragraph> </Section> </Paper>