<?xml version="1.0" standalone="yes"?>
<Paper uid="C00-1017">
  <Title>Probabilistic Parsing and Psychological Plausibility</Title>
  <Section position="3" start_page="111" end_page="111" type="metho">
    <SectionTitle>
2 Psycholinguistic Motivation
</SectionTitle>
    <Paragraph position="0"> Theories of human sentence processing have largely been shaped by the study of pathologies in tnnnan language processing behaviour. Most psycholinguistic models seek to explain the d{fficulty people have in comprehending structures that are ambiguous or memory-intensive (see (Crocker, 1999) for a recent overview). While often insightflfl, this approach diverts attention from the fact that people are in fact extremely accnrate and effective in understanding the vast majority of their &amp;quot;linguistic experience&amp;quot;. This observation, combined with the mounting psycholinguistic evidence for statistically-based mechanisms, leads us to investigate the merit of exploiting robust, broad coverage, probabilistie parsing systems as models of hmnan linguistic pertbrmance.</Paragraph>
    <Paragraph position="1"> The view that hmnan language processing can be viewed as an optimally adapted system, within a probabilistic fl'amework, is advanced by (Chater et al., 19981, while (Jurafsky, 19961 has proposed a specific probabilistic parsing model of human sentence processing. In work on human lexical category disambiguation, (Crocker and Corley, to appear), have demonstrated that a standard (iimrmnental) HMM-based part-of-speech tagger models the finding from a range of psycholinguistic experiments. In related research, (Crocker and Brants, 19991 present evidence that an incremental stochastic parser based oll Cascaded Markov Models (Brants, 1999) can account tbr a range of experimentally observed local ambiguity preferences. These include NP/S complement ambiguities, reduced relative clauses, noun-verb category ambiguities, and 'that'-ambiguities (where 'that' can be either a complementizer or a determiner) (Crocker and Brants, to appear).</Paragraph>
    <Paragraph position="2"> Crucially, however, there are differences between the classes of mechanisms which are psychologically plausible, and those which prevail in current language technology. We suggest that two of the most important differences concern incrcmentality~ and memory 7vso'urces. There is overwhehning experimental evidence that people construct connected (i.e. semantically interpretable) analyses for each initial substring of an utterance, as it is encountered. That is, processing takes place incrementally, from left to right, on a word by word basis.</Paragraph>
    <Paragraph position="3"> Secondly, it is universally accecpted that people can at most consider a relatively small number of competing analyses (indeed, some would argue that number is one, i.e. processing is strictly serial). In contrast, many existing stochastic parsers are &amp;quot;unrestricted&amp;quot;, in that they are optinfised tbr accuracy, and ignore such t)sychologically motivated constraints. Thus the appropriateness of nsing broad-coverage probabilistic parsers to model the high level of human performance is contingent upon being able to maintain these levels of accuracy when the constraints of&amp;quot; incrementality and resource limirations are imposed.</Paragraph>
  </Section>
  <Section position="4" start_page="111" end_page="113" type="metho">
    <SectionTitle>
3 Incremental Stochastic
Context-Free Parsing
</SectionTitle>
    <Paragraph position="0"> The fbllowing assumes that the reader is familiar with stochastic context-free grammars (SCFG) and stochastic chart-parsing techniques. A good introduction can be found, e.g., in (Manning and Schfitze, 19991. We use standard abbreviations for terminial nodes, 11051terminal nodes, rules and probabilities.</Paragraph>
    <Paragraph position="1">  This t)tq)er invcsl;igates stochastic (;onl;(;xl;fl'ee parsing l)ascd on ~ grmmmu&amp;quot; (;hat is (tcrivc(l from a trcel)ank, starting with 1)art-ofsl)eech ta,gs as t(;rlninals. The gl:;~nllnt~r is (lcriv(;d l)y (:olle(:ting M1 rul('.s X -+ c~ th;tt oc(:ur in the tr(',(;bank mM (;heir ffe(lU(m(:i('~s f. The l)l'()l);tl)ilil;y of a rule is set to</Paragraph>
    <Paragraph position="3"> \],br ~ descril)l;ion of treebank grammars see (Charniak, 1.996). The gr~mmmr does not coiita.in c-rules, oth(:rwis(: th(:r(: is no restriction oll the rules. In particular, w(: do not r(:quir(' C homsky-NormM-Form.</Paragraph>
    <Paragraph position="4"> In addition to the rult:s tha(; corr(:st)ond (;o sl;rucl;ur(:s in th(: corpus, w(: a.dd ;~ new st~u:l; sylnl)ol ROOT to l;h(; grnmmar and rules ROOT -~ X for all non-t;(;rminals X togel;lwx with l)rol)al)iliti('s (h:):iv(:d l'roln th(: root n()(t(:s in th(: tort)us I.</Paragraph>
    <Paragraph position="5"> For t)m:sing th(:se gr~unmn)'s, w(: r(:ly upon n stan(tard l)oLi;onl-U t) (:ha.rl,-t)arsing t(:(:hniqu(: with n modification for in(:rcmental parsing, i.(:., tbt&amp;quot; each word, all edges nr(: proc(:ss(:d and l)ossib\]y 1)run(:d 1)(:ti)r(: \])ro(:e(:(ling to the next word. Th(: outlilm of th(: Mgorithm is as follows.</Paragraph>
    <Paragraph position="6"> A (:hart; (:ntry 1~ (:onsists of a sl;;u:I, aim (:n(l 1)osition i ;rod j, a (tott(:d rul(: X ~ (~:.'7, tim insi(t(: l)rol)nl)ility fl(Xi,.j) thud; X g(:n(:ra.tx:s l;ll(: t(:rmihal string from t)osi(:ion i to .7, mM information M)out th(: most l)robat)\](: ilL~i(t(' stru(:i;ur(:. 1t7 th(: dot of th(: dotte(t ruh: is nt th(' rightmost i)osition, the corresl)ondillg (:(lg(: is an inactive edg(:. If the (tot is at mty other 1)osition, il; is mt ,,ctivc, edge. Imu:l;ivo, e(tgcs repr(',scnt re('ogniz(',d hypo(:heti(:a,1 constituents, whil(; a(:tiv(; (;(lg(',s r(;1)r(;s(:nt 1)r(:lixes of hyl)ol;heticM (:()llsi;it;ll(:lll;s. Th(: ith t(:rminal nod(: I,i l;lla, t; (:nt(:rs th(: (:hart gencra, tcs an inactive edge for l;\]m span (i - 1, i).</Paragraph>
    <Paragraph position="7"> Ba, sed on this, n(;w active mid inactive (;(lges are generated according to the stan(t~tr(t algorithm.</Paragraph>
    <Paragraph position="8"> Sine(: we are ilfl;(:r(:stcd in th(: most i)robM)le pars(:, the chart can be minimized in th(: tbllowing way whik: sti\]l 1)crfi)rming an ('xhaustiv(: search. If&amp;quot; ther(: is mor(: l;hm~ one (:(lg(~ that covers a span (i,j) having (;h(', sa, me non-t(:rminM symbol on th(; lefIAmnd side of th(: (to(,(x:(l rule, 1The ROOT node is used int;ernally fl)r parsing; it is neither emitted nor count,ed for recall and l)recision.</Paragraph>
    <Paragraph position="9"> only the one with the highest inside prol)M)ility is k(;1)t ill tit(; (:\]mrt. The others cmmot contrilmt(; to th(; most i)rol)M)le 1)nrse..</Paragraph>
    <Paragraph position="10"> For an ina('tiv(: edge si)aiming i to j and rei)rcs(mting the rule X --&gt; y1...yq~ the inside l)robM)ility/31 is set to</Paragraph>
    <Paragraph position="12"> wher(: il and jl mm'k the start and end t)ostition of Yl having i = il nnd j = Jr. The insid(: prol)M)ility tbr an active cdg(: fiA with the dot after th(: kth syml)ol of th(: right-hmM side is sol, to k</Paragraph>
    <Paragraph position="14"> W(: (lo not use the t)rol)M)i\]ity of th(: rule a.t this point. This allows us to ('oral)in(: a.ll (:(Ig(:s with (;h(: sam(: st)m~ and th(: dot al; th(: sam(: 1)osition but with (liiI'er(:uI; symbols on the l(,ft-hmM side.</Paragraph>
    <Paragraph position="15"> Jntrodu(:ing a distinguish(:(1 M't-hand sid(: only for in~mtiv(: (:(lg('s significantly r(:du(;(:s th(: nun&gt; b(:r of a(:(;iv(: (:dg(:s in the (:hm't. This goes one st, e t) furth(:r than lint)licitly right-1)inarizing th(: grmmnar; not only suilix(:s of right-hmM si(h:s are join(:(t, but also l;hc ('orr(:sponding l(:fi;-hand sid(:s.</Paragraph>
    <Paragraph position="16"> d Memory Restrictions \Y=(: inv(:stig~rt(: th(: (dimin~I;ion (pruning) ()f edges from th(: ('hnrt in our in('r(:nl(:n|;a |\])re'sing sch(:m(:. Aft(:r processing a word and b(:fi))'(: 1)roc(:cding to the n(:xt word during incremental 1)re:sing, low rnnk(,d edges ~r(: removed. This is (:(luivM(:lfl; t;() imposing m(:mory rcsia'ictions on the t)ro(:('ssing system.</Paragraph>
    <Paragraph position="17"> The, original algorithm k('ei)s on(; edge in th(: (:hart fi)r each (:oml)ination of span (start and cn(l position) ~md non-tcrmimd symbol (for inactive edges) or right-hand side l)r(:fixcs of (lot;te(t rules (for active edges). With 1)tinting, we restric(; the mmfl)cr of edges allowed per span.</Paragraph>
    <Paragraph position="18"> The limit~tion (:an b(: cxi)resscd in two ways:  1. Va'riable bcam,. Sch:ct a threshold 0 &gt; 1.</Paragraph>
    <Paragraph position="19">  Edg(: c. is removed, ill its 1)rol)ability is p~:, I;lm 1)csl; l)rol)M)ility fi)r the span is Pl, and v,; &lt; pl_. (~l)  2. Fixed beam. Select a maximum number of  edges per span m. An edge e is removed, if its prot)ability is not in the first m highest probabilities tbr edges with the same span.</Paragraph>
    <Paragraph position="20"> We pertbrmed exl)eriments using both types of beauls. Fixed beams yielded consistently better results than variable beams when t)lotting chart size vs. F-score. Thereibre, the following results are reported tbr fixed t)eams.</Paragraph>
    <Paragraph position="21"> We, compare and rank edges covering the same span only, and we rank active and inactive edges separately. This is in contrast to (Charniak et al., 1998) who rank all edges. They use nornmlization in order to account tbr different spans since in general, edges for longer spans involve more nmltiplications of t)robabil ities, yielding lower probabilities. Charniak et al.'s normalization value is calculated by a dilferent probability model than the inside probabilities of the edges. So, in addition to the normalization for different span lengths, they need a normalizatio11 constant that accounts tbr the different probability models.</Paragraph>
    <Paragraph position="22"> This investigation is based on a much simt)ler ranking tbrmula. We use what can be described as the unigram probability of a non-terminal node, i.e., the a priori prot)ability of the col resl)onding non-ternlinal symbol(s) times the inside t)robat)ility. Thus, fi~r an inactive edge</Paragraph>
    <Paragraph position="24"> for ranking. This is the prol)ability of the node and its yield being present in a parse. The higher this value, |;lie better is this node. flI is the inside probability for inactive edges as given in eqnation 2, P(X) is the a priori probability tbr non-terminal X, (as estimated from the frequency in the training COrlmS) and Pm is the probability of the edge tbr the non-terminal X spanning positions i to j that is used tbr ranking. null For an active edge {i,j,X --~ y1 ...yk. yk+l ym, y) k</Paragraph>
    <Paragraph position="26"> ter the kth symbol of llSe: the right-hand side) we</Paragraph>
    <Paragraph position="28"> p(yl ,,, yk) can be read ()If the corpus. It is the a priori probability that the right-hand side of a production has the prefix y1 ... y/c, which is estilnated by f(yl ... yt~ is prefix) 00) N where N is the total number of productions in the corpus 2, i = ij, j = j/~ and flA is the inside probability of the pretix.</Paragraph>
  </Section>
class="xml-element"></Paper>