AN LR CATEGORY-NEUTRAL PARSER WITH LEFT CORNER 
PREDICTION 
Paola Merlo 
University of Maryland/Université de Genève
Faculté des Lettres
CH-1211 Genève 4
merlo@divsun.unige.ch
Abstract 
In this paper we present a new parsing model of
linguistic and computational interest. Linguistically,
the relation between the parser and the theory
of grammar adopted (Government and Binding
(GB) theory as presented in Chomsky (1981,
1986a,b)) is clearly specified. Computationally,
this model adopts a mixed parsing procedure,
using left corner prediction in a modified LR
parser.
ON LINGUISTIC THEORY 
For a parser to be linguistically motivated, it must
be transparent to a linguistic theory, under some
precise notion of transparency (see Abney 1987).
GB theory is a modular theory of abstract principles.
A parser which encodes a modular theory
of grammar must fulfill apparently contradictory
demands: for the parser to be explanatory it must
maintain the modularity of the theory, while for
the parser to be efficient, modularization must be
minimized so that all potentially necessary information
is available at all times.¹ We explore a
possible solution to this contradiction. We observe
that linguistic information can be classified into 5
different classes, as shown in (1), on the basis of
its informational content. These we will call IC
Classes.
(1) a. Configurations: sisterhood, c-command,
m-command, ±maximal projection ...
b. Lexical features: ±N, ±V, ±Funct,
±c-selected, ±Strong Agr ...
c. Syntactic features: ±Case, ±θ, ±γ,
±barrier.
d. Locality information: minimality, binding,
antecedent government.
e. Referential information: ±D-linked,
±anaphor, ±pronominal.
¹ On efficiency of GB-based systems see
Ristad (1990), Kashkett (1991).
This classification can be used to specify precisely
the amount of modularity in the parser.
Berwick (1982:400ff) shows that a modular system
is efficient only if modules that depend on each
other are compiled, while independent modules
are not. We take the notion of dependent and
independent to correspond to IC Classes, in that
primitives that belong to the same IC Class are
dependent on each other, while primitives that
belong to different IC Classes are independent from
each other. We impose a modularity requirement
that makes precise predictions for the design of the
parser.
Modularity Requirement (MR) Only primi- 
tives that belong to the same IC Class can be 
compiled in the parser. 
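
As an illustration only, the MR can be read as a simple check over a
table of IC Classes. The sketch below is not the implementation
described in this paper: the dictionary keys, the function names and
the abbreviated membership lists are assumptions made for the example,
and only some of the primitives listed in (1) are included.

```python
# Hypothetical, abbreviated encoding of the IC Classes in (1).
# The class names and the helper functions are illustrative assumptions.
IC_CLASSES = {
    "configurations": {"sisterhood", "c-command", "m-command", "maximal projection"},
    "lexical": {"N", "V", "Funct", "c-selected", "Strong Agr"},
    "syntactic": {"Case", "barrier"},
    "locality": {"minimality", "binding", "antecedent government"},
    "referential": {"anaphor", "pronominal"},
}


def ic_class_of(primitive):
    """Return the name of the IC Class that lists the given primitive."""
    for name, members in IC_CLASSES.items():
        if primitive in members:
            return name
    raise KeyError(primitive)


def may_compile_together(primitives):
    """MR check: true only if all primitives belong to one IC Class."""
    return len({ic_class_of(p) for p in primitives}) == 1
```

Under these assumptions, sisterhood and c-command may be compiled
together, while sisterhood and Case may not, since they fall into
different IC Classes.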
RECOVERING PHRASE 
STRUCTURE 
According to the MR, notions such as headedness,
directionality, sisterhood, and maximal projection
can be compiled and stored in a data structure,
because these notions belong to the same IC Class,
configurations. These features are compiled into
context-free rules in our parser. These basic X-bar
rules are augmented by A rules licensed by the
part of Trace theory that deals with configurations.
The crucial feature of this grammar is that
nonterminals specify only the X-bar projection level,
and not the category. The full context-free grammar
is shown in Figure 1.
The recovery of phrase structure is a crucial 
component of a parser, as it builds the skeleton 
which is needed for feature annotation. It must 
be efficient and it must fail as soon as an error is 
encountered, in order to limit backtracking. An 
LR(k) parser (Knuth 1965) has these properties, 
since it is deterministic on unambiguous input, 
and it has been proved to recognize only valid 
prefixes. In our parser, we compile the grammar 
shown in Figure 1 into an LALR(1) (Aho and Ullman
1972) parse table. The table has been modified
in order to allow more than one action for each
table entry.² Three stacks are used: a stack for
the states traversed so far; a stack for the semantic
attributes associated with each of the nodes; and
a tree stack of partial trees. The LR algorithm
is encoded in a parse predicate, which establishes
a relation between two sets of 5-tuples, as shown
in (2).³
(2) $T_i \times S_i \times A_i \times C_i \times PT_i \rightarrow T_j \times S_j \times A_j \times C_j \times PT_j$

X" → Y" X'     specification
X" → X' Y"
X' → X Y"      complementation
X' → Y" X
X' → Y" X'     modification
X' → X' Y"
X" → Y" X"     adjunction
X" → X" Y"
X  → empty     empty heads
X" → empty     empty Xmaxs
Figure 1: Category-Neutral Grammar
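
The relation in (2) can be pictured as a step function over 5-tuple
configurations. The following is a minimal sketch under stated
assumptions, not the parse predicate of the actual system: the field
names, the toy table and its contents are hypothetical, and the three
stacks are left out. It only shows how a table entry holding more than
one action yields more than one successor configuration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Config:
    """One 5-tuple of (2); the field names are illustrative assumptions."""
    token: str                  # T: current input token
    state: int                  # S: state of the LR table
    attrs: Tuple[str, ...]      # A: attributes associated with the state
    chain: Tuple[str, ...]      # C: chains, i.e. displaced elements
    predicted: Tuple[str, ...]  # PT: tokens predicted by the LC table


# Hypothetical table: one entry may list several next states,
# which is what makes the parser nondeterministic.
TOY_TABLE = {(0, "fires"): [1, 2]}


def parse_step(cfg: Config, next_token: str) -> List[Config]:
    """Return one successor configuration per action in the table entry."""
    next_states = TOY_TABLE.get((cfg.state, cfg.token), [])
    return [
        Config(next_token, s, cfg.attrs, cfg.chain, cfg.predicted)
        for s in next_states
    ]
```

In this sketch an entry missing from the table yields no successor, so
that path of the analysis is dropped immediately.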
Our parser is more elaborate and less restrictive
than a standard LR parser, because it imposes
conditions on the attributes of the states
and it is nondeterministic. In order to reduce the
amount of nondeterminism, some predictive power
has been introduced. The cooccurrence restrictions
between categories and the subcategorization
information of verbs are compiled in a table, which we
call the Left Corner Prediction Table (LC Table). By
looking at the current token, its category label,
and its subcategorization frame, the number
of choices of possible next states can be restricted.
For instance, if the current token is a verb, and
the LR table allows the parser either to project one
level up to V' or to create an empty object NP,
then, on consulting the subcategorization
information, the parser can eliminate the second
option as incorrect if the verb is intransitive.
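
The verb example can be sketched as a filter over the actions that the
LR table offers. This is an illustration only: the lexicon entries, the
action labels and the function name are assumptions, and the actual LC
Table is a compiled table rather than a lookup in a lexicon of this
kind.

```python
# Hypothetical labels for the two actions the LR table allows after a verb.
PROJECT_TO_V_BAR = "project V to V'"
CREATE_EMPTY_OBJECT = "create empty object NP"

# Hypothetical lexicon: category label and subcategorization frame.
LEXICON = {
    "sleep": {"cat": "V", "subcat": []},      # intransitive verb
    "see": {"cat": "V", "subcat": ["NP"]},    # transitive verb
}


def lc_filter(token, actions):
    """Drop the empty-object action when the verb selects no NP object."""
    entry = LEXICON[token]
    if entry["cat"] == "V" and "NP" not in entry["subcat"]:
        return [a for a in actions if a != CREATE_EMPTY_OBJECT]
    return list(actions)
```

With these toy entries, both actions survive for "see", while only the
projection to V' survives for "sleep".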
RESULTS AND COMMENTS 
The design presented so far embodies the MR,
since it compiles only dependent features in two
tables off-line. Compared to the use of partially
or fully instantiated context-free grammars, this
organization of the parsing algorithms has been
found to be better on several grounds.

Grammar                   Instantiated   X-bar
Number of Rules           51             16
Number of States          46             14
Shift/reduce conflicts    224            24
Reduce/reduce conflicts   270            36
Figure 2: Numbers

² This modification is necessary because the grammar
compiled into the LR table is not an LR grammar.
³ In (2) $T_i$ is an element of the set of input tokens,
$S_i$ is an element of the set of states in the LR table, $A_i$
is an element of the set of attributes associated with
each state in the table, $C_i$ is an element of the set of
chains, i.e. displaced elements, and $PT_i$ is an element
of the set of tokens predicted by the left corner table
(see below).
Consider again the X-bar grammar that we use in
the parser, shown in Figure 1. One of the crucial
features of this grammar is that the nonterminals
are specified only for level and headedness. This
version of the grammar is a recent result. In previous
implementations of the parser, the projections
of the head in a rule were instantiated: for
instance NP → YP N'. Empirically, we find that
on compiling the partially instantiated grammar
the number of rules is increased proportionately
to the number of categories, and so is the number
of conflicts in the table. Figure 2 shows the
relative sizes of the LALR(1) tables and the number
of conflicts. Moreover, on closer inspection
of the entries in the table, categories that belong
to the same level of projection show the same
reduce/reduce conflicts. This means that introducing
unrestricted categorial information increases
the size of the table without decreasing the number
of conflicts in each entry, i.e. without reducing
the nondeterminism in the table.
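
The proportional growth in the number of rules can be illustrated by
instantiating the head projections of the category-neutral rules for a
set of categories. The sketch below is an assumption-laden example: the
rule list is a transcription of the ten rules of Figure 1, the category
set is chosen for illustration, and the resulting counts are not meant
to reproduce the figures reported in Figure 2.

```python
# Transcription of the ten category-neutral rules of Figure 1 as
# (left-hand side, right-hand side) pairs; "empty" marks an empty rule.
NEUTRAL_RULES = [
    ('X"', ('Y"', "X'")), ('X"', ("X'", 'Y"')),   # specification
    ("X'", ("X", 'Y"')), ("X'", ('Y"', "X")),     # complementation
    ("X'", ('Y"', "X'")), ("X'", ("X'", 'Y"')),   # modification
    ('X"', ('Y"', 'X"')), ('X"', ('X"', 'Y"')),   # adjunction
    ("X", ("empty",)),                            # empty heads
    ('X"', ("empty",)),                           # empty Xmaxs
]


def instantiate(rules, categories):
    """Instantiate the head projections (X) for each category; Y stays neutral."""
    instantiated = []
    for cat in categories:
        for lhs, rhs in rules:
            new_rhs = tuple(sym.replace("X", cat) for sym in rhs)
            instantiated.append((lhs.replace("X", cat), new_rhs))
    return instantiated


# Hypothetical category set, for illustration only.
PARTIAL = instantiate(NEUTRAL_RULES, ["N", "V", "A", "P"])
```

Here the ten neutral rules become forty partially instantiated rules
for four categories, one copy of each rule per category.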
These findings confirm that categorial information
can be factored out of the compiled table,
as predicted by the MR. The information about
cooccurrence restrictions, category and subcategorization
frame is compiled in the Left Corner (LC)
table, as described above. Using two compiled tables
that interact on-line is better than compiling
all the information into a fully instantiated,
standard context-free grammar for several reasons.⁴
Computationally, it is more efficient.⁵ Practically,
manipulating a small, highly abstract grammar is
much easier. It is easy to maintain and to embed
in a full-fledged parsing system. Linguistically, a
fully-instantiated parser would not be transparent
to the theory and it would be language dependent.
Finally, it could not model some experimental
psycholinguistic evidence, which we present below.

⁴ Fully instantiated grammars have been used,
among others, by Tomita (1985) in an LR parser, and
by Dorr (1990), Fong (1991) in GB-based parsers.
⁵ It has been argued elsewhere that for context-free
parsing algorithms, the size of the grammar (which is
a constant factor) can easily become the predominant
factor for all useful inputs (see Berwick and Weinberg
1982). Work on compilation of parsers that use GPSG
seems to point in the same direction. The separation of
structural information from cooccurrence restrictions is
advocated in Kilbury (1986); both Shieber (1986) and
Phillips (1987) argue that the combinatorial explosion
(Barton 1985) of a fully expanded ID/LP formalism
can be avoided by using feature variables in the
compiled grammar. See also Thompson 1982.
PSYCHOLINGUISTIC SUPPORT 
A reading task is presented in Frazier and Rayner
1987 where eye movements are monitored: they
find that in locally ambiguous contexts, the
ambiguous region takes less time than an unambiguous
counterpart, while a slowdown in processing
time is registered in the disambiguating region.
This suggests that selection of major categorial
information in lexically ambiguous sentences is
delayed.⁶ This delay means that the parser must
be able to operate in the absence of categorial
information, making use of a set of category-neutral
phrase structure rules. This separation of
item-dependent and item-independent information is
encoded in the grammar used in our parser. A
parser that uses instantiated categories would have
to store categorial cooccurrence restrictions in a
different data structure, to be consulted in case of
lexically ambiguous inputs. Such a design would be
redundant, because categorial information would
be encoded twice.
CONCLUSION 
The module described in this paper is implemented
and embedded in a parser for English of
limited coverage, but it has some shortcomings,
which are currently under investigation. Refinements
are needed to compile the LC table automatically
and to define IC Classes predictively instead
of by exhaustive listing. Finally, a formal proof
is needed to show that our definition of independent
and dependent is always going to increase
efficiency.
ACKNOWLEDGEMENTS 
This work has benefited from suggestions by Bonnie
Dorr, Paul Gorrell, Eric Wehrli and Amy
Weinberg. The author is supported by a Fellowship
from the Swiss-Italian Foundation.
⁶ For instance, in the sentences in (3) (from Frazier
and Rayner 1987), the ambiguous target item, shown
in capitals in (3)a, takes less time than the unambiguous
control in (3)b, while there is a slowdown in the
disambiguating material (in italics).
(3) a. The warehouse FIRES numerous employees
each year.
b. That warehouse fires numerous employees each
year.

REFERENCES 
Abney Steven 1987, "GB Parsing and Psychological
Reality" in MIT Parsing Volume, Cognitive
Science Center.
Aho A.V. and J.D. Ullman 1972, The Theory 
of Parsing, Translation and Compiling, Prentice- 
Hall, Englewood Cliffs, NJ. 
Barton Edward 1985, "The Computational 
Difficulty of ID/LP Parsing" in Proc. of the ACL. 
Berwick Robert 1982, Locality Principles and 
the Acquisition of Syntactic Knowledge, Ph.D 
Diss., MIT. 
Berwick Robert and Amy Weinberg 1982,
"Parsing Efficiency, Computational Complexity
and the Evaluation of Grammatical Theories",
Linguistic Inquiry, 13:165-191.
Chomsky Noam 1981, Lectures on Govern- 
ment and Binding, Foris, Dordrecht. 
Chomsky Noam 1986a, Knowledge of Lan- 
guage: Its Nature, Origin and Use, Praeger, New 
York. 
Chomsky Noam 1986b, Barriers, MIT Press,
Cambridge MA.
Dorr Bonnie J. 1990, Lexical Conceptual Structure
and Machine Translation, Ph.D Diss., MIT.
Fong Sandiway 1991, Computational Prop- 
erties of Principle-based Grammatical Theories, 
Ph.D Diss., MIT. 
Frazier Lyn and Keith Rayner 1987, "Res- 
olution of Syntactic Category Ambiguities: Eye 
Movements in Parsing Lexically Ambiguous Sen- 
tences" in Journal of Memory and Language, 
26:505-526. 
Kashkett Michael 1991, A Parameterised 
Parser for English and Warlpiri, Ph.D Diss., 
MIT. 
Kilbury James 1986, "Category Cooccurrence
Restrictions and the Elimination of Metarules", in
Proc. of COLING, 50-55.
Knuth Donald 1965, "On the Translation of
Languages from Left to Right", Information and
Control, 8.
Phillips John 1987, "A Computational Repre- 
sentation for GPSG", DAI Research Paper 316. 
Ristad Eric 1990, Computational Structure of
Human Language, MIT AI Lab, TR 1260.
Shieber Stuart 1986, "A Simple Reconstruc- 
tion of GPSG" in Proc. of COLING, 211-215. 
Thompson Henry 1982, "Handling Metarules
in a Parser for GPSG" in Proc. of COLING.
Tomita Masaru 1985, Efficient Parsing for
Natural Language, Kluwer, Hingham, MA.
