Probabilistic Parsing and Psychological Plausibility 
Thorsten Brants and Matthew Crocker 
Saarland University, Computational Linguistics
D-66041 Saarbrücken, Germany
{brants, crocker}@coli.uni-sb.de
Abstract 
Given the recent evidence for probabilistic mechanisms in models of human ambiguity resolution, this paper investigates the plausibility of exploiting current wide-coverage, probabilistic parsing techniques to model human linguistic performance. In particular, we investigate the performance of standard stochastic parsers when they are revised to operate incrementally, and with reduced memory resources. We present techniques for ranking and filtering analyses, together with experimental results. Our results confirm that stochastic parsers which adhere to these psychologically motivated constraints achieve good performance. Memory can be reduced down to 1% (compared to exhaustive search) without reducing recall and precision. Additionally, these models exhibit substantially faster performance. Finally, we argue that this general result is likely to hold for more sophisticated, and psycholinguistically plausible, probabilistic parsing models.
1 Introduction 
Language engineering and computational psycholinguistics are often viewed as distinct research programmes: engineering solutions aim at practical methods which can achieve good performance, typically paying little attention to linguistic or cognitive modelling. Computational psycholinguistics, on the other hand, is often focussed on detailed modelling of human behaviour for a relatively small number of well-studied constructions. In this paper we suggest that, broadly, the human sentence processing mechanism (HSPM) and current statistical parsing technology can be viewed as having similar objectives: to optimally (i.e. rapidly and accurately) understand the text and utterances they encounter.
Our aim is to show that large scale probabilistic parsers, when subjected to basic cognitive constraints, can still achieve high levels of parsing accuracy. If successful, this will contribute to a plausible explanation of the fact that people, in general, are also extremely accurate and robust. Such a result would also strengthen existing results showing that related probabilistic mechanisms can explain specific psycholinguistic phenomena.
To investigate this issue, we construct a standard 'baseline' stochastic parser, which mirrors the performance of similar systems (e.g. (Johnson, 1998)). We then consider an incremental version of the parser, and evaluate the effects of several probabilistic filtering strategies which are used to prune the parser's search space, and thereby reduce memory load.
To assess the generality of our results for more sophisticated probabilistic models, we also conduct experiments using a model in which parent-node information is encoded on the daughters. This increase in contextual information has been shown to improve performance (Johnson, 1998), and the model is also shown to be robust to the incrementality and memory constraints investigated here.
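The parent encoding mentioned above can be illustrated with a small tree transformation. This is only a sketch under stated assumptions, not the authors' implementation: the tuple-based tree representation, the function name `parent_encode`, and the `^` separator are hypothetical, and the choice to leave part-of-speech tags unannotated is an assumption that the text does not confirm.

```python
def parent_encode(tree, parent=None):
    """Annotate every non-terminal label with the label of its parent.

    tree: (label, children) for a non-terminal, or a plain string for a
    part-of-speech tag (assumed representation, for illustration only).
    """
    if isinstance(tree, str):
        # Assumption: part-of-speech tags are left unchanged.
        return tree
    label, children = tree
    new_label = label if parent is None else f"{label}^{parent}"
    # Children receive the ORIGINAL label of this node as their parent.
    return (new_label, [parent_encode(child, label) for child in children])


# Toy tree, invented for illustration.
tree = ("S", [("NP", ["DT", "NN"]), ("VP", ["VBD", ("NP", ["NN"])])])
encoded = parent_encode(tree)
```

In the toy tree, the two NP nodes end up with different labels (one under S, one under VP), which is the extra contextual information that the encoding makes available to the grammar.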
We present the results of parsing performance experiments, showing the accuracy of these systems with respect to both a parsed corpus and the baseline parser. Our experiments suggest that a strictly incremental model, in which memory resources are substantially reduced through filtering, can achieve precision and recall which equal those of 'unrestricted' systems. Furthermore, implementation of these restrictions leads to substantially faster performance. In conclusion, we argue that such broad-coverage probabilistic parsing
models provide a valuable framework for explaining the human capacity to rapidly, accurately, and robustly understand "garden variety" language. This lends further support to psycholinguistic accounts which posit probabilistic ambiguity resolution mechanisms to explain "garden path" phenomena.
It is important to reiterate that our intention here is only to investigate the performance of probabilistic parsers under psycholinguistically motivated constraints. We do not argue for the psychological plausibility of SCFG parsers (or the parent-encoded variant) per se. Our investigation of these models was motivated rather by our desire to obtain a generalizable result for these simple and well-understood models, since obtaining similar results for more sophisticated models (e.g. (Collins, 1996; Ratnaparkhi, 1997)) might have been attributed to special properties of these models. Rather, the current result should be taken as support for the potential scalability and performance of probabilistic psychological models such as those proposed by (Jurafsky, 1996) and (Crocker and Brants, to appear).
2 Psycholinguistic Motivation 
Theories of human sentence processing have largely been shaped by the study of pathologies in human language processing behaviour. Most psycholinguistic models seek to explain the difficulty people have in comprehending structures that are ambiguous or memory-intensive (see (Crocker, 1999) for a recent overview). While often insightful, this approach diverts attention from the fact that people are in fact extremely accurate and effective in understanding the vast majority of their "linguistic experience". This observation, combined with the mounting psycholinguistic evidence for statistically-based mechanisms, leads us to investigate the merit of exploiting robust, broad coverage, probabilistic parsing systems as models of human linguistic performance.
The view that human language processing can be seen as an optimally adapted system, within a probabilistic framework, is advanced by (Chater et al., 1998), while (Jurafsky, 1996) has proposed a specific probabilistic parsing model of human sentence processing. In work on human lexical category disambiguation, (Crocker and Corley, to appear) have demonstrated that a standard (incremental) HMM-based part-of-speech tagger models the findings from a range of psycholinguistic experiments. In related research, (Crocker and Brants, 1999) present evidence that an incremental stochastic parser based on Cascaded Markov Models (Brants, 1999) can account for a range of experimentally observed local ambiguity preferences. These include NP/S complement ambiguities, reduced relative clauses, noun-verb category ambiguities, and 'that'-ambiguities (where 'that' can be either a complementizer or a determiner) (Crocker and Brants, to appear).
Crucially, however, there are differences between the classes of mechanisms which are psychologically plausible, and those which prevail in current language technology. We suggest that two of the most important differences concern incrementality and memory resources. There is overwhelming experimental evidence that people construct connected (i.e. semantically interpretable) analyses for each initial substring of an utterance, as it is encountered. That is, processing takes place incrementally, from left to right, on a word by word basis.
Secondly, it is universally accepted that people can at most consider a relatively small number of competing analyses (indeed, some would argue that number is one, i.e. processing is strictly serial). In contrast, many existing stochastic parsers are "unrestricted", in that they are optimised for accuracy, and ignore such psychologically motivated constraints. Thus the appropriateness of using broad-coverage probabilistic parsers to model the high level of human performance is contingent upon being able to maintain these levels of accuracy when the constraints of incrementality and resource limitations are imposed.
3 Incremental Stochastic 
Context-Free Parsing 
The following assumes that the reader is familiar with stochastic context-free grammars (SCFG) and stochastic chart-parsing techniques. A good introduction can be found, e.g., in (Manning and Schütze, 1999). We use standard abbreviations for terminal nodes, non-terminal nodes, rules and probabilities.
This paper investigates stochastic context-free parsing based on a grammar that is derived from a treebank, starting with part-of-speech tags as terminals. The grammar is derived by collecting all rules $X \to \alpha$ that occur in the treebank and their frequencies $f$. The probability of a rule is set to
$$P(X \to \alpha) = \frac{f(X \to \alpha)}{\sum_{\beta} f(X \to \beta)} \tag{1}$$
For a description of treebank grammars see (Charniak, 1996). The grammar does not contain $\epsilon$-rules, otherwise there is no restriction on the rules. In particular, we do not require Chomsky-Normal-Form.
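The relative-frequency estimate in equation 1 can be sketched in a few lines. This is a minimal illustration, assuming rule counts are already available as a dictionary; the function name `estimate_rule_probs` and the toy counts are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def estimate_rule_probs(rule_counts):
    """Relative-frequency estimate: f(X -> alpha) / sum over beta of f(X -> beta).

    rule_counts: dict mapping (lhs, rhs_tuple) -> treebank frequency f.
    """
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), f in rule_counts.items():
        lhs_totals[lhs] += f
    return {rule: f / lhs_totals[rule[0]] for rule, f in rule_counts.items()}


# Toy counts, invented for illustration only.
toy_counts = {
    ("NP", ("DT", "NN")): 6,
    ("NP", ("NN",)): 3,
    ("NP", ("NP", "PP")): 1,
    ("PP", ("IN", "NP")): 4,
}
probs = estimate_rule_probs(toy_counts)
```

By construction, the probabilities of all rules sharing a left-hand side sum to one.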
In addition to the rules that correspond to structures in the corpus, we add a new start symbol ROOT to the grammar and rules ROOT $\to X$ for all non-terminals $X$ together with probabilities derived from the root nodes in the corpus¹.
For parsing these grammars, we rely upon a standard bottom-up chart-parsing technique with a modification for incremental parsing, i.e., for each word, all edges are processed and possibly pruned before proceeding to the next word. The outline of the algorithm is as follows.
A chart entry $E$ consists of a start and end position $i$ and $j$, a dotted rule $X \to \alpha \cdot \gamma$, the inside probability $\beta(X_{i,j})$ that $X$ generates the terminal string from position $i$ to $j$, and information about the most probable inside structure. If the dot of the dotted rule is at the rightmost position, the corresponding edge is an inactive edge. If the dot is at any other position, it is an active edge. Inactive edges represent recognized hypothetical constituents, while active edges represent prefixes of hypothetical constituents.
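The chart entry just described maps naturally onto a small record type. The following is a hedged sketch: the class name `Edge` and its field names are assumptions for illustration, and the back-pointer to the most probable inside structure is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """One chart entry: span, dotted rule, and inside probability."""
    start: int              # start position i
    end: int                # end position j
    lhs: str                # left-hand side X of the dotted rule
    rhs: Tuple[str, ...]    # right-hand side symbols
    dot: int                # number of right-hand side symbols already recognized
    inside: float           # inside probability

    @property
    def is_inactive(self) -> bool:
        # Dot at the rightmost position: a complete hypothetical constituent.
        return self.dot == len(self.rhs)

    @property
    def next_symbol(self) -> Optional[str]:
        # Symbol an active edge is still waiting for; None for inactive edges.
        return None if self.is_inactive else self.rhs[self.dot]


# Toy edges with invented probabilities.
active = Edge(0, 1, "NP", ("DT", "NN"), 1, 0.5)
inactive = Edge(0, 2, "NP", ("DT", "NN"), 2, 0.06)
```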
The $i$th terminal node $t_i$ that enters the chart generates an inactive edge for the span $(i-1, i)$. Based on this, new active and inactive edges are generated according to the standard algorithm.
¹ The ROOT node is used internally for parsing; it is neither emitted nor counted for recall and precision.
Since we are interested in the most probable parse, the chart can be minimized in the following way while still performing an exhaustive search. If there is more than one edge that covers a span $(i, j)$ having the same non-terminal symbol on the left-hand side of the dotted rule, only the one with the highest inside probability is kept in the chart. The others cannot contribute to the most probable parse.
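Keeping only the best edge per span and non-terminal can be sketched as a dictionary update. This is an illustrative sketch, assuming the chart is a plain dictionary keyed by span and symbol; the name `add_if_best` is hypothetical.

```python
def add_if_best(chart, span, symbol, inside):
    """Keep only the highest inside probability per (span, symbol).

    chart: dict mapping (span, symbol) -> best inside probability so far.
    Returns True if the new edge entered (or replaced an entry in) the chart.
    """
    key = (span, symbol)
    if inside > chart.get(key, 0.0):
        chart[key] = inside
        return True
    return False


# Toy usage with invented probabilities.
chart = {}
first = add_if_best(chart, (0, 2), "NP", 0.02)    # new entry
better = add_if_best(chart, (0, 2), "NP", 0.05)   # replaces the weaker edge
worse = add_if_best(chart, (0, 2), "NP", 0.01)    # rejected
```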
For an inactive edge spanning $i$ to $j$ and representing the rule $X \to Y^1 \ldots Y^m$, the inside probability $\beta_I$ is set to
$$\beta_I(X_{i,j}) = P(X \to Y^1 \ldots Y^m) \cdot \prod_{l=1}^{m} \beta_I(Y^l_{i_l,j_l}) \tag{2}$$
where $i_l$ and $j_l$ mark the start and end position of $Y^l$, having $i = i_1$ and $j = j_m$. The inside probability for an active edge $\beta_A$ with the dot after the $k$th symbol of the right-hand side is set to
$$\beta_A(Y^1_{i_1,j_1} \ldots Y^k_{i_k,j_k}) = \prod_{l=1}^{k} \beta_I(Y^l_{i_l,j_l}) \tag{3}$$
We do not use the probability of the rule at this point. This allows us to combine all edges with the same span and the dot at the same position but with different symbols on the left-hand side. Introducing a distinguished left-hand side only for inactive edges significantly reduces the number of active edges in the chart. This goes one step further than implicitly right-binarizing the grammar; not only suffixes of right-hand sides are joined, but also the corresponding left-hand sides.
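Equations 2 and 3 differ only in whether the rule probability is multiplied in. A minimal sketch, with hypothetical function names and invented numbers, assuming Python 3.8 or later for `math.prod`:

```python
import math


def inactive_inside(rule_prob, child_insides):
    """Inactive edge: rule probability times the inside probabilities of all children."""
    return rule_prob * math.prod(child_insides)


def active_inside(child_insides_so_far):
    """Active edge: product over the recognized prefix only, without the rule probability."""
    return math.prod(child_insides_so_far)


# Toy numbers, invented for illustration.
beta_active = active_inside([0.5])                 # dot after the first symbol
beta_inactive = inactive_inside(0.6, [0.5, 0.2])   # complete rule with two children
```

Because `active_inside` does not depend on the left-hand side, two dotted rules with the same recognized prefix over the same span yield the same value, which is what allows such active edges to be shared.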
4 Memory Restrictions
We investigate the elimination (pruning) of edges from the chart in our incremental parsing scheme. After processing a word and before proceeding to the next word during incremental parsing, low ranked edges are removed. This is equivalent to imposing memory restrictions on the processing system.
The original algorithm keeps one edge in the chart for each combination of span (start and end position) and non-terminal symbol (for inactive edges) or right-hand side prefixes of dotted rules (for active edges). With pruning, we restrict the number of edges allowed per span. The limitation can be expressed in two ways:
1. Variable beam. Select a threshold $\theta > 1$. Edge $e$ is removed, if its probability is $p_e$, the best probability for the span is $p_1$, and
$$p_e < \frac{p_1}{\theta} \tag{4}$$
2. Fixed beam. Select a maximum number of edges per span $m$. An edge $e$ is removed, if its probability is not in the first $m$ highest probabilities for edges with the same span.
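The two beam definitions above can be sketched as filters over the edges of a single span. This is an illustration under assumptions: edges are represented as hypothetical `(edge_id, prob)` pairs, where `prob` stands for whatever ranking probability is used, and ties at the fixed-beam boundary are broken arbitrarily, a case the text does not specify.

```python
import heapq


def variable_beam(edges, theta):
    """Keep edges whose probability is at least best / theta (theta > 1)."""
    if not edges:
        return []
    best = max(prob for _, prob in edges)
    return [(edge, prob) for edge, prob in edges if prob >= best / theta]


def fixed_beam(edges, m):
    """Keep the m highest-ranked edges of the span."""
    return heapq.nlargest(m, edges, key=lambda pair: pair[1])


# Toy edges for a single span, with invented probabilities.
span_edges = [("NP", 0.4), ("ADJP", 0.03), ("S", 0.1), ("VP", 0.2)]
kept_variable = variable_beam(span_edges, theta=10.0)
kept_fixed = fixed_beam(span_edges, m=2)
```

The variable beam keeps a data-dependent number of edges, while the fixed beam bounds the number of edges per span directly.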
We performed experiments using both types of beams. Fixed beams yielded consistently better results than variable beams when plotting chart size vs. F-score. Therefore, the following results are reported for fixed beams.
We compare and rank edges covering the same span only, and we rank active and inactive edges separately. This is in contrast to (Charniak et al., 1998) who rank all edges. They use normalization in order to account for different spans since in general, edges for longer spans involve more multiplications of probabilities, yielding lower probabilities. Charniak et al.'s normalization value is calculated by a different probability model than the inside probabilities of the edges. So, in addition to the normalization for different span lengths, they need a normalization constant that accounts for the different probability models.
This investigation is based on a much simpler ranking formula. We use what can be described as the unigram probability of a non-terminal node, i.e., the a priori probability of the corresponding non-terminal symbol(s) times the inside probability. Thus, for an inactive edge $\langle i, j, X \to \alpha, \beta_I(X_{i,j}) \rangle$, we use the probability
$$P_m(X_{i,j}) = P(X) \cdot P(t_i \ldots t_{j-1} \mid X) \tag{5}$$
$$\qquad\qquad = P(X) \cdot \beta_I(X_{i,j}) \tag{6}$$
for ranking. This is the probability of the node and its yield being present in a parse. The higher this value, the better the node. $\beta_I$ is the inside probability for inactive edges as given in equation 2, $P(X)$ is the a priori probability for non-terminal $X$ (as estimated from the frequency in the training corpus) and $P_m$ is the probability of the edge for the non-terminal $X$ spanning positions $i$ to $j$ that is used for ranking.
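The effect of multiplying in the a priori probability can be seen in a small ranking sketch. The function name `rank_inactive`, the pair representation of edges, and all numbers are assumptions made for illustration.

```python
def rank_inactive(edges, prior):
    """Sort the inactive edges of one span by prior P(X) times inside probability.

    edges: list of (symbol, inside_probability) pairs covering the same span.
    prior: dict mapping a non-terminal symbol to its a priori probability.
    """
    return sorted(edges, key=lambda e: prior[e[0]] * e[1], reverse=True)


# Toy values, invented for illustration.
prior = {"NP": 0.3, "ADJP": 0.01}
edges = [("ADJP", 0.05), ("NP", 0.01)]
ranked = rank_inactive(edges, prior)
```

In this toy case ADJP has the higher inside probability, but NP is ranked first once the prior is taken into account.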
For an active edge $\langle i, j, X \to Y^1 \ldots Y^k \cdot Y^{k+1} \ldots Y^m, \beta_A(Y^1_{i_1,j_1} \ldots Y^k_{i_k,j_k}) \rangle$ (the dot is after the $k$th symbol of the right-hand side) we use:
$$P_m(Y^1_{i_1,j_1} \ldots Y^k_{i_k,j_k}) = P(Y^1 \ldots Y^k) \cdot \beta_A(Y^1_{i_1,j_1} \ldots Y^k_{i_k,j_k}) \tag{9}$$
$P(Y^1 \ldots Y^k)$ can be read off the corpus. It is the a priori probability that the right-hand side of a production has the prefix $Y^1 \ldots Y^k$, which is estimated by
$$\frac{f(Y^1 \ldots Y^k \text{ is prefix})}{N} \tag{10}$$
where $N$ is the total number of productions in the corpus², $i = i_1$, $j = j_k$ and $\beta_A$ is the inside probability of the prefix.
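The prefix prior in equation 10 can be estimated with a single pass over the productions. This sketch assumes that $N$ counts production occurrences (tokens) in the corpus, which is one reading of the text; the function name `prefix_priors` and the toy data are hypothetical. Only proper prefixes are counted, following the footnote.

```python
from collections import Counter


def prefix_priors(productions):
    """Estimate f(prefix) / N for every proper right-hand side prefix.

    productions: iterable of right-hand sides (tuples of symbols), one per
    production occurrence in the corpus (assumed interpretation of N).
    """
    productions = list(productions)
    n = len(productions)
    counts = Counter()
    for rhs in productions:
        # Proper prefixes only: every prefix that excludes the last element.
        for k in range(1, len(rhs)):
            counts[rhs[:k]] += 1
    return {prefix: c / n for prefix, c in counts.items()}


# Toy productions, invented for illustration.
toy = [("DT", "NN"), ("DT", "JJ", "NN"), ("IN", "NP"), ("NN",)]
priors = prefix_priors(toy)
```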
5 Experiments 
5.1 Data 
We use sections 2 - 21 of the Wall Street Journal part of the Penn Treebank (Marcus et al., 1993) to generate a treebank grammar. Traces, functional tags and other tag extensions that do not mark syntactic category are removed before training³. No other modifications are made. For testing, we use the 1578 sentences of length 40 or less of section 22. The input to the parser is the sequence of part-of-speech tags.
5.2 Evaluation 
For evaluation, we use the parseval measures and report labeled F-score (the harmonic mean of labeled recall and labeled precision). Reporting the F-score makes our results comparable to those of other previous experiments using the same data sets. As a measure of the amount of work done by the parser, we report the size of the chart. The number of active and inactive edges that enter the chart is given for the exhaustive search, not counting those hypothetical edges that are replaced or rejected because there is an alternative edge with higher probability⁴. For pruned search, we give the percentage of edges required.
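Labeled precision, recall and F-score can be sketched as a comparison of constituent multisets. This is a simplified illustration, not the evaluation tool used for the reported results: constituents are assumed to be `(label, start, end)` triples for a single sentence, and details of the standard evaluation (such as which labels are ignored) are left out.

```python
from collections import Counter


def labeled_prf(gold, predicted):
    """Labeled precision, recall and F-score over (label, start, end) triples."""
    gold_c, pred_c = Counter(gold), Counter(predicted)
    # Multiset intersection: a constituent matches at most as often as it occurs in both.
    matched = sum((gold_c & pred_c).values())
    precision = matched / sum(pred_c.values()) if pred_c else 0.0
    recall = matched / sum(gold_c.values()) if gold_c else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy sentence with invented constituents.
gold = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
pred = [("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)]
precision, recall, f_score = labeled_prf(gold, pred)
```

For corpus-level scores, the matched, gold and predicted counts would be summed over all sentences before the ratios are computed.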
5.3 Fixed Beam 
For our experiments, we define the beam by a maximum number of edges per span. Beams for active and inactive edges are set separately. The beams run from 2 to 12, and we test all 121 combinations of these beams for active and inactive edges. Each setting results in a particular average size of the chart and an F-score, which are reported in the following section.
² Here, we use proper prefixes, i.e., all prefixes not including the last element.
³ As an example, PP-TMP=3 is replaced by PP.
⁴ The size of the chart is comparable to the "number of edges popped" as given in (Charniak et al., 1998).
[Figure 1 plot: "Results with Original and Parent Encoding". x-axis: % chart size (1.0 to 3.0); y-axis: F-score. Selected points are labeled with their active and inactive beam settings.]
Figure 1: Experimental results for incremental parsing and pruning. The figure shows the percentage of edges relative to exhaustive search and the F-score achieved with this chart size. Exhaustive search yielded 71.21% for the original encoding and 79.28% for the parent encoding. Results in the grey areas are equivalent with a confidence degree of $\alpha = 0.99$.
5.4 Experimental Results 
The results of our 121 test runs with different settings for active and inactive beams are given in Figure 1. The diagram shows chart sizes vs. labeled F-scores. It sorts chart sizes across different settings of the beams. If several beam settings result in equivalent chart sizes, the diagram contains the one yielding the highest F-score.
The main finding is that we can reduce the size of the chart to between 1% and 3% of the size required for exhaustive search without affecting the results. Only very small beams degrade performance⁵. The effect occurs for both models despite the simple ranking formula. This significantly reduces memory requirements (given as size of the chart) and increases parsing speed.
⁵ Given the amount of test data (26,322 non-terminal nodes), results within a range of around 0.7% are equivalent with a confidence degree of $\alpha$ = 99%.
Exhaustive search yields an F-score of 71.21% when using the original Penn Treebank encoding. Only around 1% of the edges are required to yield equivalent results with incremental processing and pruning after each word is added to the chart. This result is, among other settings, obtained by a fixed beam of 2 for inactive edges and 3 for active edges⁶.
For the parent encoding, exhaustive search yields an F-score of 79.28%. Only between 2 and 3% of the edges are required to yield an equivalent result with incremental processing and pruning. As an example, the point at size = 3.0%, F-score = 79.1% is generated by the beam setting of 12 for inactive and 9 for active edges. The parent encoding yields F-scores that are around 8 percentage points higher, but it also imposes a higher absolute and relative memory load on the process. The higher degree of parallelism in the inactive chart stems from the parent hypothesis in each node. In terms of pure node categories, the average number of parallel nodes at this point is 3.5⁷.
⁶ Using variable beams, we would need 1.95% of the chart entries to achieve an equivalent F-score.
Exhaustive search for the base encoding needs on average 140,000 edges per sentence, for the parent encoding 200,000 edges; equivalent results for the base encoding can be achieved with around 1% of these edges, equivalent results for the parent encoding need between 2 and 3%.
The lower number of edges significantly increases parsing speed. Using exhaustive search for the base model, the parser processes 3.0 tokens per second (measured on a Pentium III 500; no serious efforts of optimization have gone into the parser). With a chart size of 1%, speed is 630 tokens/second. This is a factor of 210 without decreasing accuracy. Speed for the parent model is 0.5 tokens/second (exhaustive) and 111 tokens/second (3.0% chart size), yielding an improvement by a factor of around 220.
6 Related Work 
Most closely related to the work reported here are (Charniak et al., 1998) and (Roark and Johnson, 1999). Both report significantly improved parsing efficiency by selecting only a subset of edges for processing. There are three main differences to our approach. One is that they use a ranking for best-first search while we immediately prune hypotheses. They need to store a large number of edges because it is not known in advance how many of the edges will be used until a parse is found. The second difference is that we proceed strictly incrementally without look-ahead. (Charniak et al., 1998) use a non-incremental procedure, (Roark and Johnson, 1999) use a look-ahead of one word. Thirdly, we use a much simpler ranking formula.
Additionally, (Charniak et al., 1998) and (Roark and Johnson, 1999) do not use the original Penn Treebank encoding for the context-free structures. Before training and parsing, they change/remove some of the productions and introduce new part-of-speech tags for auxiliaries. The exact effect of these modifications is unknown, and it is unclear if these affect comparability to our results.
⁷ For the active chart, parallelism cannot be given for different node types since active edges are introduced for right-hand side prefixes, collapsing all possible left-hand sides.
The heavy restrictions in our method (immediate pruning, no look-ahead, very simple ranking formula) have consequences for the accuracy. Using right context and sorting instead of pruning yields roughly 2% higher results (compared to our base encoding⁸). But our work shows that even with these massive restrictions, the chart size can be reduced to 1% without a decrease in accuracy when compared to exhaustive search.
7 Conclusions 
A central challenge in computational psycholinguistics is to explain how it is that people are so accurate and robust in processing language. Given the substantial psycholinguistic evidence for statistical cognitive mechanisms, our objective in this paper was to assess the plausibility of using wide-coverage probabilistic parsers to model human linguistic performance. In particular, we set out to investigate the effects of imposing incremental processing and significant memory limitations on such parsers.
The central finding of our experiments is that incremental parsing with massive (97% - 99%) pruning of the search space does not impair the accuracy of stochastic context-free parsers. This basic finding was robust across different settings of the beams and for the original Penn Treebank encoding as well as the parent encoding. We did, however, observe significantly reduced memory and time requirements when using combined active/inactive edge filtering. To our knowledge, this is the first investigation on tree-bank grammars that systematically varies the beam for pruning.
Our aim in this paper is not to challenge state-of-the-art parsing accuracy results. For our experiments we used a purely context-free stochastic parser combined with a very simple pruning scheme based on simple "unigram" probabilities, and no use of right context. We do, however, suggest that our result should apply to richer, more sophisticated probabilistic models, e.g. when adding word statistics to the model (Charniak, 1997).
⁸ Comparison of results is not straightforward since (Roark and Johnson, 1999) report accuracies only for those sentences for which a parse tree was generated (between 93 and 98% of the sentences), while our parser (except for very small beams) generates parses for virtually all sentences, hence we report accuracies for all sentences.
We therefore conclude that wide-coverage, probabilistic parsers do not suffer impaired accuracy when subject to strict cognitive memory limitations and incremental processing. Furthermore, parse times are substantially reduced. This suggests that it may be fruitful to pursue the use of these models within computational psycholinguistics, where it is necessary to explain not only the relatively rare 'pathologies' of the human parser, but also its more frequently observed accuracy and robustness.

References 

Thorsten Brants. 1999. Cascaded Markov models. In Proceedings of 9th Conference of the European Chapter of the Association for Computational Linguistics EACL-99, Bergen, Norway.

Eugene Charniak, Sharon Goldwater, and Mark Johnson. 1998. Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora (WVLC-98), Montreal, Canada.

Eugene Charniak. 1996. Tree-bank grammars. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1031-1036, Menlo Park: AAAI Press/MIT Press.

Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 1031-1036, Menlo Park: AAAI Press/MIT Press.

Nicholas Chater, Matthew Crocker, and Martin Pickering. 1998. The rational analysis of inquiry: The case for parsing. In Chater and Oaksford, editors, Rational Models of Cognition. Oxford University Press.

Michael Collins. 1996. A new statistical parser based on bigram lexical dependencies. In Proceedings of ACL-96, Santa Cruz, CA, USA.

Matthew Crocker and Thorsten Brants. 1999. Incremental probabilistic models of human linguistic performance. In The 5th Conference on Architectures and Mechanisms for Language Processing, Edinburgh, U.K.

Matthew Crocker and Thorsten Brants. to appear. Wide coverage probabilistic sentence processing. Journal of Psycholinguistic Research, November 2000.

Matthew Crocker and Steffan Corley. to appear. Modular architectures and statistical mechanisms: The case from lexical category disambiguation. In Merlo and Stevenson, editors, The Lexical Basis of Sentence Processing. John Benjamins.

Matthew Crocker. 1999. Mechanisms for sentence processing. In Garrod and Pickering, editors, Language Processing. Psychology Press, London, UK.

Mark Johnson. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24(4):613-632.

Daniel Jurafsky. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20:137-194.

Christopher Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, Massachusetts.

Mitchell Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313-330.

Adwait Ratnaparkhi. 1997. A linear observed time statistical parser based on maximum entropy models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing EMNLP-97, Providence, RI.

Brian Roark and Mark Johnson. 1999. Efficient probabilistic top-down and left-corner parsing. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, ACL-99, Maryland.
