The information-processing difficulty of incremental parsing
John Hale
Department of Linguistics and Languages
Michigan State University
East Lansing, MI 48824-1027
jthale@msu.edu
Abstract
When an incremental parser gets the next word, its expectations about upcoming grammatical structures can change. When a word greatly constrains these grammatical expectations, uncertainty is reduced. This elimination of possibilities constitutes information processing work. Formalizing this notion of information processing work yields a complexity metric that predicts human repetition accuracy scores across a systematic class of linguistic phenomena, the Accessibility Hierarchy of relativizable grammatical relations.
1 Introduction
An attractive hypothesis in psycholinguistics, dating back at least to the 1950s, has been that the degree of predictability of words in sentences is somehow related to understandability (Taylor, 1953), production difficulty (Goldman-Eisler, 1958) or, more recently, eye-movements (McDonald and Shillcock, 2003). However, since the 1950s, integrating this hypothesis with realistic models of linguistic structure has remained a challenge.
Lounsbury (1954) appreciated the formal character of the problem. He defined a finite, artificial language, endowed with a rudimentary phonology, morphology and syntax, and showed that a word’s informational contribution could be formally defined as the entropy reduction brought about by its addition to the end of a sentence fragment. He qualified the significance of his achievement, saying
An entropy reduction analysis presupposes that the number of possible messages is finite, and that the probabilities of each of the messages is known.... Thus it appears that the entropy reduction analysis could be applied only to limited classes of natural language messages since the number of messages in nearly all languages is indefinitely large
(Lounsbury, 1954, 108)
A fuller presentation of this work can be found in
Hale (forthcoming).
The present paper extends Lounsbury’s original idea to infinite languages by applying two classical ideas in (probabilistic) formal language theory: Grenander’s (1967) closed-form solution for the entropy of a nonterminal in a probabilistic context-free phrase structure grammar, and Lang’s (1974; 1988) insight that an intermediate parser state is itself a specification of a grammar.
This extension makes it possible to examine the following psycholinguistic hypothesis.

Entropy Reduction Hypothesis (ERH): a person’s processing difficulty at a word in a sentence is directly related to the number of bits signaled to the person by that word with respect to a probabilistic grammar the person knows.
Section 2 presents a method for calculating the entropy reduction of a word in a sentence generated by a probabilistic grammar. Section 3 describes the empirical domain of interest, the Accessibility Hierarchy (Keenan and Comrie, 1977). Section 4 describes two probabilistic grammars in the class of mildly context-sensitive Minimalist Grammars (Stabler, 1997). One expresses the “promotion analysis” (Kayne, 1994) of relative clauses, while the other expresses the more standard “adjunction analysis” (Chomsky, 1977). The predictions of these grammars through the lens of the ERH are considered in sections 5 through 7, where it is shown that predictions derived from the promotion analysis match human repetition accuracy scores better than predictions derived from the adjunction analysis. Section 8 concludes.
2 Entropy Reduction
The idea of the entropy reduction of a word is that uncertainty about grammatical continuations fluctuates as new words come in. The ERH is the proposal that fluctuations in this value be taken as psycholinguistic predictions. This proposal is founded on the possibility of viewing nonterminal symbols in probabilistic grammars as random variables. For instance, in the rules given below,
0.87 NP → the boy
0.13 NP → the tall boy
the nonterminal NP can be viewed as a random variable that has two alternative outcomes. Indeed, nonterminals in probabilistic context-free phrase structure grammars (PCFGs) can generally be viewed this way. Since their outcomes are discrete, their entropy H is easily calculated:
$$H(X) = -\sum_{x \in X} p(x) \log_2 p(x) \qquad (1)$$

$$H(\mathrm{NP}) = -\left[(0.87 \times \log_2 0.87) + (0.13 \times \log_2 0.13)\right] \approx 0.56 \text{ bits}$$
There is just over half a bit of uncertainty about how NP is going to rewrite, because the outcome is so heavily weighted towards the first alternative. In this simple example there is no recursion, so the generated language is finite. To obtain the uncertainty about infinite PCFG languages, a recursive relation due to Grenander (1967) can be used to calculate the entropy of the start symbol S, which begins all derivations.
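A few lines of Python reproduce the arithmetic of equation (1) for the NP example. This is a minimal sketch; the function name `entropy_bits` is illustrative and not taken from the paper.

```python
import math


def entropy_bits(probabilities):
    """Equation (1): H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# The two NP rules from the example, weighted 0.87 and 0.13.
h_np = entropy_bits([0.87, 0.13])
print(f"H(NP) = {h_np:.2f} bits")  # H(NP) = 0.56 bits

# For contrast: a uniform choice between the two rules carries a full bit.
print(entropy_bits([0.5, 0.5]))  # 1.0
```

The skewed distribution is what keeps the value well under one bit; the more evenly the rule probabilities are spread, the closer the entropy gets to its maximum of log2 of the number of alternatives.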
2.1 Entropy of nonterminals in a PCFG
Grenander’s theorem is a recurrence relation that gives the entropy of each nonterminal in a PCFG $G$ as the sum of two terms. Let the set of production rules in $G$ be $\Pi$ and the subset rewriting nonterminal $\xi$ be $\Pi(\xi)$. Denote by $p_r$ the probability of a rule $r$ having daughters $\xi_{j_1}, \xi_{j_2}, \ldots$. Then
$$h(\xi_i) = -\sum_{r \in \Pi(\xi_i)} p_r \log_2 p_r$$

$$H(\xi_i) = h(\xi_i) + \sum_{r \in \Pi(\xi_i)} p_r \left[ H(\xi_{j_1}) + H(\xi_{j_2}) + \cdots \right]$$

(Grenander, 1967, 19)
The first term, lowercase h, is simply the definition of entropy for a discrete random variable, here the choice among the rules that rewrite $\xi_i$. The second term, which refers back to uppercase H, is the recurrence. It expresses the intuition that derivational uncertainty is propagated from children to parents.
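As a worked instance of the recurrence, consider a hypothetical one-nonterminal grammar (not from the paper) with the rules S → a S, probability p, and S → b, probability 1 − p. Terminals contribute no entropy, so the recurrence refers only to S itself and can be solved directly:

$$
\begin{aligned}
H(S) &= h(S) + p\,H(S) + (1-p)\cdot 0 \\
\Longrightarrow\quad H(S) &= \frac{h(S)}{1-p}, \qquad h(S) = -p\log_2 p - (1-p)\log_2 (1-p).
\end{aligned}
$$

With p = 0.4, h(S) ≈ 0.971 bits and H(S) ≈ 0.971 / 0.6 ≈ 1.618 bits. The language is infinite, yet its derivational entropy is finite, because each further level of recursion is reached with geometrically decreasing probability.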
For PCFGs that define a probability distribution, the solution to this recurrence can be written as a matrix equation, where $I$ is the identity matrix, $\vec{h}$ is the vector of the $h(\xi_i)$, $A$ is a matrix whose $(i,j)$th component gives the expected number of nonterminals of type $j$ produced when a nonterminal of type $i$ is rewritten once, and the $i$th component of $H$ is $H(\xi_i)$.

$$H = (I - A)^{-1}\vec{h} \qquad (2)$$
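Equation (2) can be sketched in pure Python as follows. The toy grammar `TOY_PCFG` and the helper names are hypothetical. Rather than forming the inverse explicitly, the sketch solves the linear system (I − A)H = h by Gauss-Jordan elimination, which yields the same H for a grammar that defines a probability distribution.

```python
import math

# Hypothetical recursive toy PCFG (not from the paper): (lhs, rhs, probability).
# Any symbol that never appears as a lhs is treated as a terminal.
TOY_PCFG = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("the", "boy"), 0.6),
    ("NP", ("the", "tall", "boy"), 0.3),
    ("NP", ("NP", "RC"), 0.1),  # recursion: relative-clause modification
    ("RC", ("who", "VP"), 1.0),
    ("VP", ("left",), 0.7),
    ("VP", ("saw", "NP"), 0.3),
]


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [list(row) + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def entropy_system(rules):
    """Build the vector h and the expectation matrix A of Grenander's recurrence."""
    nts = sorted({lhs for lhs, _, _ in rules})
    index = {nt: i for i, nt in enumerate(nts)}
    h = [0.0] * len(nts)                 # h(xi_i): uncertainty of the rule choice
    A = [[0.0] * len(nts) for _ in nts]  # A[i][j]: expected j-daughters per rewrite of i
    for lhs, rhs, p in rules:
        i = index[lhs]
        h[i] -= p * math.log2(p)
        for sym in rhs:
            if sym in index:
                A[i][index[sym]] += p
    return nts, h, A


def nonterminal_entropies(rules):
    """H = (I - A)^{-1} h, computed by solving (I - A) H = h (equation 2)."""
    nts, h, A = entropy_system(rules)
    n = len(nts)
    i_minus_a = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    return dict(zip(nts, solve(i_minus_a, h)))


H = nonterminal_entropies(TOY_PCFG)
print({nt: round(v, 3) for nt, v in H.items()})
```

A quick sanity check is that each computed H(ξ_i) satisfies Grenander's recurrence term by term, and that the grammar S → a S (0.4), S → b (0.6) gives h(S)/0.6.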
2.2 Incomplete sentences
Grenander’s theorem supplies the entropy for any PCFG nonterminal in one step by inverting a matrix. To determine the contribution of a particular word, one would like to be able to look at the change in uncertainty about compatible derivations as a given prefix string is lengthened. When this set, the set of derivations generating a given string $w = w_1 w_2 \ldots w_n$ as a left prefix, is finite, it can be expressed as a list. In the case of a recursive grammar this set is generally infinite, and some other representation is necessary.
Lang and Billot (Lang, 1974; Lang, 1988; Billot and Lang, 1989) observe that the incremental state of a parser can be described by another, related grammar. They view parsing as the intersection of a grammar with a regular language, of which ordinary strings are but the simplest examples. This perspective readily accommodates incomplete sentences as regular languages whose members all share the same initial n words but continue with any sequence of words from the terminal vocabulary, of any length. If $L(G)$ is the language of the grammar $G$, parsing an initial substring $w$ is the intersection depicted in (3), where the period denotes any terminal symbol of $G$ and the Kleene star indicates any number of repetitions.
$$w\,(.)^{*} \cap L(G) \qquad (3)$$
The result of this intersection is a new context-free grammar describing just the derivations whose yield begins with the string $w$. By generalizing the input from a single string to a regular set of strings, the grammatical continuations can be captured in the new, output grammar. These grammars are easily read off chart parsers’ internal data structures by attaching position indices to nonterminal names, thus distinguishing recognized constituents in different positions.
The uncertainty associated with the start symbol of this new, resultant grammar is the conditional entropy $H(S \mid w_1, w_2, \ldots, w_n)$. The entropy reduction of word $w_{n+1}$ is then the downward change in this value as the string $w$ is made one word longer. The proposal of the ERH is that these changes measure the disambiguation work the comprehender has performed by ruling out possible syntactic analyses.
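The construction can be sketched end to end for a toy grammar. This is a minimal illustration under stated assumptions, not the chart parser used for the paper: the grammar `RULES` and every function name are hypothetical; the intersection w(.)* ∩ L(G) is built over the states of a small deterministic automaton, with nonterminals indexed by the states they span; and, because the text does not say how the weights of the intersection grammar are normalized, this sketch renormalizes them with inside probabilities (computed by fixed-point iteration) so that they define the conditional distribution over derivations before Grenander's equation (2) is applied.

```python
import math
from itertools import product

# Hypothetical recursive toy PCFG (not one of the paper's grammars): (lhs, rhs, prob).
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("the", "boy"), 0.7),
    ("NP", ("NP", "RC"), 0.3),  # recursion: relative-clause modification
    ("RC", ("who", "VP"), 1.0),
    ("VP", ("left",), 0.6),
    ("VP", ("saw", "NP"), 0.4),
]
NONTERMINALS = {lhs for lhs, _, _ in RULES}


def step(state, word, prefix):
    """Automaton for w(.)*: states 0..n read the prefix, then state n loops on any word."""
    if state == len(prefix):
        return state
    return state + 1 if word == prefix[state] else None


def intersect(prefix):
    """Intersection grammar over triples (i, X, j): X spans automaton states i..j."""
    n, out = len(prefix), []
    for lhs, rhs, p in RULES:
        for states in product(range(n + 1), repeat=len(rhs) + 1):
            kids, ok = [], True
            for a, b, sym in zip(states, states[1:], rhs):
                if sym in NONTERMINALS:
                    ok = ok and a <= b
                    kids.append((a, sym, b))
                else:
                    ok = ok and step(a, sym, prefix) == b
            if ok:
                out.append(((states[0], lhs, states[-1]), tuple(kids), p))
    return out


def inside(rules):
    """Least fixed point of Z(X) = sum_r p_r * prod Z(daughters), by iteration."""
    z = {lhs: 0.0 for lhs, _, _ in rules}
    for _ in range(10_000):
        new = dict.fromkeys(z, 0.0)
        for lhs, kids, p in rules:
            new[lhs] += p * math.prod(z.get(k, 0.0) for k in kids)
        done = max(abs(new[x] - z[x]) for x in z) < 1e-15
        z = new
        if done:
            break
    return z


def solve(m, v):
    """Gauss-Jordan elimination with partial pivoting: returns x with m @ x = v."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def prefix_entropy(prefix):
    """H(S | w_1..w_n) in bits: Grenander's equation (2) on the renormalized intersection."""
    rules, start = intersect(prefix), (0, "S", len(prefix))
    z = inside(rules)
    if z.get(start, 0.0) == 0.0:
        raise ValueError("the grammar generates no sentence with this prefix")
    live = sorted(x for x, zx in z.items() if zx > 0)
    idx = {x: i for i, x in enumerate(live)}
    h = [0.0] * len(live)
    a = [[0.0] * len(live) for _ in live]
    for lhs, kids, p in rules:
        q = p * math.prod(z.get(k, 0.0) for k in kids) / z[lhs] if z[lhs] > 0 else 0.0
        if q > 0:  # renormalized probability of this intersection rule
            h[idx[lhs]] -= q * math.log2(q)
            for k in kids:
                a[idx[lhs]][idx[k]] += q
    n = len(live)
    m = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)] for i in range(n)]
    return solve(m, h)[idx[start]]


def entropy_reductions(words):
    """Downward change in H(S | prefix) at each word; upward changes are counted as 0."""
    hs = [prefix_entropy(words[:k]) for k in range(len(words) + 1)]
    return [max(0.0, before - after) for before, after in zip(hs, hs[1:])]
```

In this toy grammar every sentence begins with “the boy”, so `entropy_reductions(["the", "boy", "left"])` assigns (numerically) zero bits to the first two words and about 3.80 bits, the full H(S), to “left”, which rules out every relative-clause continuation and ends the sentence. Treating only downward changes as reductions is this sketch's reading of “downward change”; summing the per-word values gives a whole-sentence figure of the kind used in section 6.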
SUBJECT ⊃ DIR. OBJECT ⊃ INDIR. OBJECT ⊃ OBLIQUE ⊃ GENITIVE ⊃ OCOMP
Figure 1: The Accessibility Hierarchy of relativizable grammatical relations
3 The Accessibility Hierarchy
This paper examines the processing predictions of the ERH on a systematic class of relative clause types, the Accessibility Hierarchy (AH) shown in figure 1. The AH is an implicational markedness hierarchy of grammatical relations discovered by Keenan and Comrie (1977). The implication is that if a language has a relative-clause formation rule applicable to grammatical relations at some point x on the AH, then it can also form relative clauses on grammatical relations listed at all points before x.
This hierarchy shows up in a variety of modern syntactic theories that have been influenced by Relational Grammar (Perlmutter and Postal, 1974). In Head-driven Phrase Structure Grammar (Pollard and Sag, 1994) the hierarchy corresponds to the order of elements on the SUBCAT list, and interacts with other principles in explanations of binding facts. The hierarchy also figures in Lexical-Functional Grammar (Bresnan, 1982), where it is known as Syntactic Rank.
Keenan and Comrie speculated that their typological generalization might have a basis in performance factors. This idea was examined in a repetition-accuracy experiment carried out in 1974 but not published until 1987 (Keenan and Hawkins, 1987). Subjects in this study repeated back stimulus sentences after a delay while under the additional memory load of a digit-memory task. Stimuli were subject-modifying relative clauses embedded in one of four carrier sentence frames, exemplified in figure 2.
subject extracted: they had forgotten that the boy who told the story was so young
direct object extracted: the fact that the cat which David showed to the man likes eggs is strange
indirect object extracted: I know that the man who Stephen explained the accident to is kind
oblique extracted: he remembered that the food which Chris paid the bill for was cheap
genitive subject extracted: they had forgotten that the girl whose friend bought the cake was waiting
genitive object extracted: the fact that the sailor whose ship Jim took had one leg is important
Figure 2: Relative clauses in each of four carrier sentence types
The results of the human study, given in figure 3, show that repetition accuracy¹ declines across the AH. Keenan and Hawkins (1987) note, however, that “It remains unexplained just why RCs should be more difficult to comprehend-produce as they are formed on positions lower on the AH.”

                      SU    DO    IO    OBL   GenS  GenO
repetition accuracy   406   364   342   279   167   171
Figure 3: Results from Keenan & Hawkins (1987)
The ERH, if correct, would offer just such an explanation. If a person’s difficulty on each word of a sentence is related to the derivational information signaled by that word, then the total difficulty of reading a sentence ought to be the sum of the difficulty on each word².
4 Minimalist Grammars
If correct, the ERH would explain the increasing difficulty across the AH in terms of greater or lesser uncertainty about intermediate parser states. To calculate these predictions, some assumption must be made about the grammatical structures those states describe.
4.1 Two analyses of relativization
Toward this end, two grammars covering the Keenan and Hawkins stimuli were written in the Minimalist Grammars (Stabler, 1997) formalism. These grammars were exactly the same except for their treatment of relative clauses.
One grammar expresses the usual analysis of relative clauses as right-adjoined modifiers (Chomsky, 1977). The other expresses the promotion analysis of relative clauses. This analysis, which dates back to the 1960s, is revived in Kayne (1994). For reasons having to do with Kayne’s general theory of phrase structure, he proposes that, in a sentence like (1), the underlying form of the subject is akin to (2).
¹ Each response was coded for accuracy on a 0-2 scale, where 2 means perfect repetition and 1 suggests minor, grammatical errors. A score of 0 was assigned when the response did not include a relative clause of the indicated grammatical function. Cf. Keenan and Hawkins (1987).
² Summation naturally extends the word-by-word complexity metric ERH to the sentence level. In word-by-word self-paced reading, evidence for the Accessibility Hierarchy is limited (cf. chapter 5 of Hale (2003)).
(1) the boy who the father explained the answer to was honest
(2) [IP the father explained the answer to [DP[+wh] who boy[+f] ] ]
According to Kayne, at an early stage (2) of syntactic derivation, the determiner phrase (DP) “who boy” occupies what will eventually be the gap position. This DP moves to a specifier position of the enclosing, empty-headed (C0) complementizer phrase (CP), thereby checking a feature +wh as indicated in (3).
(3) [CP [DP who boy[+f] ]_i C0 [IP the father explained the answer to t_i ] ]
In a second movement, “boy” evacuates from DP, moving to another specifier (perhaps that of the silent agreement morpheme, Agr) as in (4), checking a different feature, +f.
(4) [AgrP boy_j Agr [CP [DP who t_j ]_i C0 [IP the father explained the answer to t_i ] ] ]
The entire structure becomes the complement of a determiner, yielding the larger DP in (5).
(5) [DP the [AgrP boy_j Agr [CP [DP who t_j ]_i C0 [IP the father explained the answer to t_i ] ] ] ]
No adjunction is used in this derivation, and, unconventionally, the leftmost “the” and “boy” do not share an exclusive common constituent. Nor is the wh-word “who” co-indexed with anything. Structural descriptions involving both the Kaynian analysis and the more standard adjunction analysis are shown in figures 4 and 5 respectively³. The other linguistic assumptions suggested by these diagrams are discussed in chapter 4 of Hale (2003).
4.2 Formal grammars of relativization
The Minimalist Grammars (MG) formalism (cf. Stabler and Keenan (2003) for a systematic presentation) facilitates the relatively transparent implementation of ideas like movement and feature checking that figure prominently in the two analyses of relativization discussed in the previous subsection. MGs define a set of sentences by closing the structure-building functions merge and move on a finite set of lexical entries; however, this does not mean that parsing must happen bottom-up. A fundamental result, obtained independently by Harkema (2001) and Michaelis (2001), is that MGs are weakly equivalent to Multiple context-free grammars (Seki et al., 1991). Multiple context-free grammars generalize standard context-free grammars by allowing the string yields of daughter categories to be manipulated by a function other than simple concatenation. As in Tree Adjoining Grammar (Joshi et al., 1975), a record of these manipulations is kept at each node of an MG derivation tree, while a picture of the result is manifested in derived trees such as the ones in figures 4 and 5.

³ The X-bar structures depicted in figures 4 and 5 are drawn using tools developed by Edward Stabler and colleagues.
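To make the contrast with context-free concatenation concrete, here is a minimal sketch of a hypothetical two-component MCFG (not one of the paper's grammars; all names are illustrative). Its nonterminal A yields a pair of strings: the recursive rule wraps each component separately, and only the start rule concatenates them, so S derives aⁿbⁿcⁿdⁿ, a language no context-free grammar generates. In translations from MGs to MCFGs, tuple components roughly correspond to constituents that are still waiting to move; the sketch illustrates that idea only loosely.

```python
# Hypothetical 2-MCFG (illustration only). Nonterminal A derives PAIRS of strings:
#   A(ab, cd)                          base rule
#   A(a x b, c y d) <- A(x, y)         wraps each component separately
#   S(x y)          <- A(x, y)         concatenates the two components
from typing import Iterator, Tuple

Pair = Tuple[str, str]


def a_base() -> Pair:
    """Base rule: A yields the tuple ("ab", "cd")."""
    return ("ab", "cd")


def a_wrap(pair: Pair) -> Pair:
    """Recursive rule: a non-concatenative yield function on A's two components."""
    x, y = pair
    return ("a" + x + "b", "c" + y + "d")


def s_from_a(pair: Pair) -> str:
    """Start rule: S's single string is A's first component followed by its second."""
    x, y = pair
    return x + y


def generate(max_depth: int) -> Iterator[str]:
    """Yield the S-strings of derivations using 1..max_depth A-rules."""
    pair = a_base()
    for _ in range(max_depth):
        yield s_from_a(pair)
        pair = a_wrap(pair)


print(list(generate(3)))  # ['abcd', 'aabbccdd', 'aaabbbcccddd']
```

Each step of `a_wrap` is recorded in the derivation tree, while the strings it builds correspond to what a derived tree would display, mirroring the derivation-tree/derived-tree split described above.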
The derivation tree on the promotion grammar for the substring “the boy who the father explained the answer to” is shown in figure 6⁴.
Figure 6: Derivation tree on promotion grammar.
The derivation trees encode everything there is to know about MG derivations, and they can be parsed in a variety of orders. Most importantly, if their branches are equipped with weights, they can be generated by probabilistic context-free grammars.
⁴ These derivation trees are drawn using tools developed by Maxime Amblard.
Figure 4: Kaynian promotion analysis
Figure 5: more standard adjunction analysis
5 Procedure
Derivation trees on both grammars were obtained⁵ for each of Keenan and Hawkins’ (1987) twenty-four stimulus sentences⁶. Branches of these derivation trees were viewed as PCFG rules with probabilities set according to the usual relative-frequency estimation technique (Chi, 1999). However, because the stimuli were intentionally constructed to have exactly four examples of each structure, these sentences were weighted in accordance with a corpus study (Keenan, 1975) to make their relative frequencies more realistic.

⁵ Derivations were obtained using a parser described in Appendix A of Hale (2003).
⁶ To eliminate number agreement as a source of derivational uncertainty, the results were calculated using a modified stimulus set in which four noun phrases were changed from plural to singular.
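The estimation step can be sketched as weighted relative-frequency counting over derivation-tree branches. Everything concrete here is an assumption for illustration: the tree encoding, the helper names `rules_in` and `estimate`, the toy trees, and especially the choice to apply the corpus-based weighting by multiplying each sentence's rule counts by a per-sentence weight, since the section does not give the exact weighting formula.

```python
from collections import Counter, defaultdict

# A derivation tree is (label, children); a child is either a subtree or a terminal string.
# The trees and weights below are hypothetical, not the Keenan and Hawkins stimuli.


def rules_in(tree):
    """Yield (lhs, rhs) for every internal node, reading each branch as a PCFG rule."""
    label, children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from rules_in(child)


def estimate(weighted_trees):
    """Weighted relative frequency: p(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for tree, weight in weighted_trees:
        for rule in rules_in(tree):
            counts[rule] += weight  # each sentence contributes `weight` copies of its rules
    totals = defaultdict(float)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}


plain = ("S", [("NP", ["the", "boy"]), ("VP", ["left"])])
relative = ("S", [("NP", [("NP", ["the", "boy"]), ("RC", ["who", ("VP", ["left"])])]),
                  ("VP", ["left"])])
probs = estimate([(plain, 3.0), (relative, 1.0)])
print(probs[("NP", ("NP", "RC"))])  # 0.2
```

Because the plain sentence is weighted three times as heavily, the recursive rule NP → NP RC receives probability 1/5 rather than the 1/3 it would get from unweighted counts.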
6 Results
The summed entropy reductions exhibit a significant correlation with the repetition accuracy scores collected by Keenan and Hawkins (1987). The correlation in figure 7(a) obtains only on the grammar expressing the Kaynian promotion analysis, and not on the grammar expressing the standard adjunction analysis (figure 7(b)). Nor do log-probabilities for stimulus sentences on the grammar exhibit a significant correlation with repetition accuracy scores.

Figure 7: Predictions of two probabilistic Minimalist Grammars through the lens of the ERH. (a) Accessibility Hierarchy, promotion grammar: total bits reduced plotted against error score; r² = 0.45, p < 0.001. (b) Accessibility Hierarchy, adjunction grammar: total bits reduced plotted against error score; r² = 0.02, n.s.
7 Discussion
From the perspective of the ERH, the difference between the promotion and adjunction grammars resides in the uncertainty of particular states an incremental parser would pass through on the way to a complete analysis.
On Keenan and Hawkins’ (1987) stimuli, these grammars specify incremental parser states that support explanations for some of the observed repetition accuracy asymmetries. Below, X < Y abbreviates the claim that relatives of type X are easier than relatives of type Y.
SU < IO: subject extracted relatives are easier than indirect object extracted relatives because a left-to-right incremental parser avoids, only in subject extracted relatives, the uncertainty associated with questions like
• which internal argument is the gap?
• did dative shift happen?
These questions are defined by alternative derivation-subtrees associated with the verb phrase. For the DO stimuli that use potentially ditransitive embedded verbs, the same explanation is available; however, only two out of four items in the Keenan and Hawkins (1987) set qualify.
IO < OBL: there is only one type of extraction from indirect object, whereas on these grammars the head of the oblique phrase (“for”, “with”, “on” or “in”) signals which of four categorically separate kinds of extraction has occurred. These alternatives correspond to four different derivation-nonterminals.
OBL < GEN: both grammars analyze “whose” as taking a common noun argument, for example “whose ship.” But only in the promotion grammar is “whose” further analyzed as the ordinary “who” morpheme plus a complex possessive phrase headed by “-s” (McDaniel et al., 1998). Because of the recursive character of this possessor category, the common noun argument of “whose” introduces additional uncertainty not present in the indirect object extracted relatives.
Strikingly, the two grammars disagree on six outliers in figure 7(b), where only the adjunction grammar, in conjunction with the ERH, predicts very great difficulty. These outlier predictions are made on just the sentences that use the nominal carrier frame beginning with “the fact that...” Because the adjunction grammar analyzes relative clauses with an MG rule analogous to the phrase structure rule (4),

$$\mathrm{DP} \rightarrow \mathrm{DP}\ \mathrm{CP}_{rel} \qquad (4)$$

all DPs are available for modification by any number of stacked relative clauses. The nominal frame introduces an additional DP, not present in the other stimuli, that can be modified in this way.
By contrast, the promotion grammar does not include a +f promotion feature on any lexical entry for “fact,” precluding the possibility of such modification. Moreover, even with such a feature, the promotion grammar assigns different categories to the outermost versus successive relative clause modifiers. Because no DP in the Keenan and Hawkins (1987) stimulus set is modified by more than one relative clause, the relevant recursion is not attested, yielding a category of caseless subject DP that is more certain than it is in the adjunction grammar.
An ERH account that avoids predicting these outliers on the Keenan and Hawkins (1987) stimuli seems to require a grammar where the probability of second and subsequent stacked relative clause modifiers is closer to 0 (its value on the trained promotion grammar) than to 0.31 (its value on the trained adjunction grammar). Beyond these particular stimuli, this modeling motivates a general question about the scale of structural expectations in human sentence processing. Does disconfirmation of a more complicated structural alternative (such as stacked relative clauses) induce greater processing difficulty than disconfirmation of a simpler one? Such empirical issues go beyond the scope of this paper but suggest particular kinds of future work.
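Grenander's recurrence shows in miniature why a stacking probability of this kind matters. Take a deliberately simplified, hypothetical fragment (not either of the trained grammars): a DP either stacks a relative clause, DP → DP CP_rel with probability p, or takes a single base expansion with probability 1 − p; let H_0 be the summed entropy of the base expansion's daughters and H_c = H(CP_rel). Then

$$
\begin{aligned}
H(\mathrm{DP}) &= h(\mathrm{DP}) + p\,\bigl[H(\mathrm{DP}) + H_c\bigr] + (1-p)\,H_0 \\
\Longrightarrow\quad H(\mathrm{DP}) &= H_0 + \frac{h(\mathrm{DP}) + p\,H_c}{1-p}, \qquad h(\mathrm{DP}) = -p\log_2 p - (1-p)\log_2(1-p).
\end{aligned}
$$

At p = 0 the second term vanishes, while at p = 0.31 it comes to about 1.29 + 0.45 H_c bits, so in this toy setting every DP that remains open to stacking carries that extra uncertainty. The fragment only indicates the direction of the effect; it is not a reconstruction of the paper's calculation.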
8 Conclusion
By extending Lounsbury’s (1954) entropy reduction idea to infinite languages, it has become possible to relate predictability and processing difficulty in a way that takes into account linguistic structures defined by one kind of mildly context-sensitive grammar formalism. This relation is the linking hypothesis ERH.
On this linking hypothesis, a grammar expressing the promotion analysis of relative clauses yields whole-sentence predictions more closely approximating human repetition accuracy results than does a grammar expressing the standard adjunction analysis.
If the ERH is true, this result suggests that the promotion grammar has greater psychological validity than the adjunction grammar. On the other hand, to the extent that the promotion grammar correctly characterizes human linguistic competence, this result confirms the ERH as a linking hypothesis. In any case, the information-processing difficulty of incremental parsing can now be given a more specific definition.
Acknowledgments
The author wishes to thank Paul Smolensky, Ed Stabler and Ted Gibson.

References
Sylvie Billot and Bernard Lang. 1989. The structure of shared forests in ambiguous parsing. In Proceedings of the 1989 Meeting of the Association for Computational Linguistics.
Joan Bresnan, editor. 1982. The Mental Representation of Grammatical Relations. MIT Press, Cambridge, MA.
Zhiyi Chi. 1999. Statistical properties of probabilistic context-free grammars. Computational Linguistics, 25(1):131–160.
Noam Chomsky. 1977. On Wh-Movement. In Peter Culicover, Thomas Wasow, and Adrian Akmajian, editors, Formal Syntax, pages 71–132. Academic Press, New York.
Frieda Goldman-Eisler. 1958. Speech production and the predictability of words in context. Quarterly Journal of Experimental Psychology, 10:96–106.
Ulf Grenander. 1967. Syntax-controlled probabilities. Technical report, Brown University Division of Applied Mathematics, Providence, RI.
John Hale. 2003. Grammar, uncertainty and sentence processing. Ph.D. thesis, Johns Hopkins University, Baltimore, Maryland.
Henk Harkema. 2001. Parsing Minimalist Grammars. Ph.D. thesis, UCLA.
Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. 1975. Tree adjunct grammars. Journal of Computer and System Sciences, 10:136–163.
Richard S. Kayne. 1994. The Antisymmetry of Syntax. MIT Press.
Edward L. Keenan and Bernard Comrie. 1977. Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8(1):63–99.
Edward L. Keenan and Sarah Hawkins. 1987. The psychological validity of the Accessibility Hierarchy. In Edward L. Keenan, editor, Universal Grammar: 15 Essays, pages 60–85. Croom Helm, London.
Edward L. Keenan. 1975. Variation in universal grammar. In R. W. Shuy and R. W. Fasold, editors, Analyzing Variation in Language. Georgetown University Press.
Bernard Lang. 1974. Deterministic techniques for efficient non-deterministic parsers. In J. Loeckx, editor, Proceedings of the 2nd Colloquium on Automata, Languages and Programming, number 14 in Springer Lecture Notes in Computer Science, pages 255–269, Saarbrücken.
Bernard Lang. 1988. Parsing incomplete sentences. In Proceedings of the 12th International Conference on Computational Linguistics, pages 365–371.
Floyd G. Lounsbury. 1954. Transitional probability, linguistic structure and systems of habit-family hierarchies. In C. E. Osgood and T. A. Sebeok, editors, Psycholinguistics: a survey of theory and research. Indiana University Press.
Dana McDaniel, Cecile McKee, and Judy B. Bernstein. 1998. How children’s relatives solve a problem for minimalism. Language, pages 308–334.
Scott A. McDonald and Richard C. Shillcock. 2003. Eye movements reveal the on-line computation of lexical probabilities during reading. Psychological Science, 14:648–652.
Jens Michaelis. 2001. On Formal Properties of Minimalist Grammars. Ph.D. thesis, Potsdam University.
David Perlmutter and Paul Postal. 1974. Lectures on Relational Grammar. LSA Linguistic Institute, UMass Amherst.
Carl Pollard and Ivan A. Sag. 1994. Head-driven Phrase Structure Grammar. University of Chicago Press.
Hiroyuki Seki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88:191–229.
Edward Stabler and Edward Keenan. 2003. Structural similarity. Theoretical Computer Science, 293:345–363.
Edward P. Stabler. 1997. Derivational minimalism. In Christian Retoré, editor, Logical Aspects of Computational Linguistics, pages 68–95. Springer.
Wilson Taylor. 1953. Cloze procedure: a new tool for measuring readability. Journalism Quarterly, 30:415–433.
