An Efficient Implementation of the 
Head-Corner Parser 
Gertjan van Noord" 
Rijksuniversiteit Groningen 
This paper describes an efficient and robust implementation of a bidirectional, head-driven parser 
for constraint-based grammars. This parser is developed for the OVIS system: a Dutch spoken 
dialogue system in which information about public transport can be obtained by telephone. 
After a review of the motivation for head-driven parsing strategies, and head-corner parsing 
in particular, a nondeterministic version of the head-corner parser is presented. A memorization 
technique is applied to obtain a fast parser. A goal-weakening technique is introduced, which 
greatly improves average case efficiency, both in terms of speed and space requirements. 
I argue in favor of such a memorization strategy with goal-weakening in comparison with 
ordinary chart parsers because such a strategy can be applied selectively and therefore enormously 
reduces the space requirements of the parser, while no practical loss in time-efficiency is observed. 
On the contrary, experiments are described in which head-corner and left-corner parsers imple- 
mented with selective memorization and goal weakening outperform "standard" chart parsers. 
The experiments include the grammar of the OV/S system and the Alvey NL Tools grammar. 
Head-corner parsing is a mix of bottom-up and top-down processing. Certain approaches to 
robust parsing require purely bottom-up processing. Therefore, it seems that head-corner parsing 
is unsuitable for such robust parsing techniques. However, it is shown how underspecification 
(which arises very naturally in a logic programming environment) can be used in the head-corner 
parser to allow such robust parsing techniques. A particular robust parsing model, implemented 
in OVIS, is described. 
1. Motivation 
In this paper I discuss in full detail the implementation of the head-corner parser. 
But first I describe the motivations for this approach. I will start with considerations 
that lead to the choice of a head-driven parser; I will then argue for Prolog as an 
appropriate language for the implementation of the head-corner parser. 
1.1 Head-driven Processing 
Lexicalist grammar formalisms, such as Head-driven Phrase Structure Grammar 
(HPSG), have two characteristic properties: (1) lexical elements and phrases are as- 
sociated with categories that have considerable internal structure, and (2) instead of 
construction-specific rules, a small set of generic rule schemata is used. Consequently, 
the set of constituent structures defined by a grammar cannot be read off the rule 
set directly, but is defined by the interaction of the rule schemata and the lexical cate- 
gories. Applying standard parsing algorithms to such grammars is unsatisfactory for a 
number of reasons. Earley parsing is intractable in general, as the rule set is simply too 
general. For some grammars, naive top-down prediction may even fail to terminate. 
Alfa-informatica & BCN. E-mail: vannoord@let.rug.nl 
(~ 1997 Association for Computational Linguistics 
Computational Linguistics Volume 23, Number 3 
Shieber (1985) therefore proposes a modified version of the Earley parser, using re- 
stricted top-down prediction. While this modification often leads to better termination 
properties of the parsing method, in practice it easily leads to a complete trivialization 
of the top-down prediction step, thus leading to inferior performance. 
Bottom-up parsing is far more attractive for lexicalist formalisms, as it is driven 
by the syntactic information associated with lexical elements, but certain inadequacies 
remain. Most importantly, the selection of rules to be considered for application may 
not be very efficient. Consider, for instance, the following Definite Clause Grammar 
(DCG) rule: 
s(\[\] ,Sem) --> Arg, vp(\[Arg\] ,Sem). (1) 
A parser in which application of a rule is driven by the left-most daughter, as it is for 
instance in a standard bottom-up active chart parser, will consider the application of 
this rule each time an arbitrary constituent Arg is derived. For a bottom-up active chart 
parser, for instance, this may lead to the introduction of large numbers of active items. 
Most of these items will be useless. For instance, if a determiner is derived, there is no 
need to invoke the rule, as there are simply no VP's selecting a determiner as subject. 
Parsers in which the application of a rule is driven by the right-most daughter, such 
as shift-reduce and inactive bottom-up chart parsers, encounter a similar problem for 
rules such as: 
vp(As,Sem) --> vp(\[ArgIAs\],Sem), Arg. (2) 
Each time an arbitrary constituent Arg is derived, the parser will consider applying 
this rule, and a search for a matching VP-constituent will be carried out. Again, in 
many cases (if Arg is instantiated as a determiner or preposition, for instance) this 
search is doomed to fail, as a VP subcategorizing for a category Arg may simply not 
be derivable by the grammar. The problem may seem less acute than that posed by 
uninstantiated left-most daughters for an active chart parser, as only a search of the 
chart is carried out and no additional items are added to it. Note, however, that the 
amount of search required may grow exponentially, if more than one uninstantiated 
daughter is present: 
vp(As) --> vp(\[AI,A21As\]), AI, A2. (3) 
or if the number of daughters is not specified by the rule: 
vp(\[AO\]) --> vp(\[AO ..... An\]), A1 ..... An. (4) 
as appears to be the case for some of the rule-schemata used in HPSG. 
Several authors have suggested parsing algorithms that may be more suitable for 
lexicalist grammars. Kay (1989) discusses the concept of head-driven parsing. The 
key idea is that the linguistic concept head can be used to obtain parsing algorithms 
that are better suited for typical natural language grammars. Most linguistic theories 
assume that among the daughters introduced by a rule there is one daughter that can 
be identified as the head of that rule. There are several criteria for deciding which 
daughter is the head, two of which seem relevant for parsing. First of all, the head of 
a rule determines to a large extent what other daughters may or must be present, as 
the head selects the other daughters. Second, the syntactic category and morphological 
properties of the mother node are, in the default case, identical to the category and 
morphological properties of the head daughter. These two properties suggest that it 
may be possible to design a parsing strategy in which one first identifies a potential 
head of a rule, before starting to parse the nonhead daughters. By starting with the 
426 
van Noord Efficient Head-Corner Parsing 
head, important information about the remaining daughters is obtained. Furthermore, 
since the head is to a large extent identical to the mother category, effective top-down 
identification of a potential head should be possible. 
In Kay (1989) two different head-driven parsers are presented. First, a head-driven 
shift-reduce parser is presented, which differs from a standard shift-reduce parser 
in that it considers the application of a rule (i.e., a reduce step) only if a category 
matching the head of the rule has been found. Furthermore, it may shift onto the 
parse-stack elements that are similar to the active items (or "dotted rules") of active 
chart parsers. By using the head of a rule to determine whether a rule is applicable, the 
head-driven shift-reduce parser avoids the disadvantages of parsers in which either 
the left-most or right-most daughter is used to drive the selection of rules. Kay also 
presents a head-corner parser. The striking property of this parser is that it does 
not parse a phrase from left to right, but instead operates bidirectionally. It starts by 
locating a potential head of the phrase and then proceeds by parsing the daughters 
to the left and the right of the head. Again, this strategy avoids the disadvantages 
of parsers in which rule selection is uniformly driven by either the left-most or right- 
most daughter. Furthermore, by selecting potential heads on the basis of a head-corner 
table (comparable to the left-corner table of a left-corner parser) it may use top-down 
filtering to minimize the search-space. This head-corner parser generalizes the left- 
corner parser. Kay's presentation is reminiscent of the left-corner parser as presented 
by Pereira and Shieber (1987), which itself is a version without memorization of the 
BUP parser (Matsumoto et al. 1983). 
Head-corner parsing has also been considered elsewhere. In Satta and Stock (1989), 
Sikkel and op den Akker (1992, 1993), and Sikkel (1993), chart-based head-corner 
parsing for context-free grammar is considered. It is shown that, in spite of the fact 
that bidirectional parsing seemingly leads to more overhead than left-to-right parsing, 
the worst-case complexity of a head-corner parser does not exceed that of an Earley 
parser. Some further variations are discussed in Nederhof and Satta (1994). 
In van Noord (1991, 1993) I argue that head-corner parsing is especially useful 
for parsing with nonconcatenative grammar formalisms. In Lavelli and Satta (1991) 
and van Noord (1994) head-driven parsing strategies for Lexicalized Tree Adjoining 
Grammars are presented. 
The head-corner parser is closely related to the semantic-head-driven generation 
algorithm (see Shieber et al. \[1990\] and references cited there), especially in its purely 
bottom-up incarnation. 
1.2 Selective Memorization 
The head-corner parser is in many respects different from traditional chart parsers. An 
important difference follows from the fact that in the head-corner parser only larger 
chunks of computation are memorized. Backtracking still plays an important role for 
the implementation of search. 
This may come as a surprise at first. Common wisdom is that although small 
grammars may be successfully treated with a backtracking parser, larger grammars 
for natural languages always require the use of a data structure such as a chart or a 
table of items to make sure that each computation is only performed once. In the case 
of constraint-based grammars, however, the cost associated with maintaining such a 
chart should not be underestimated. The memory requirements for an implementation 
of the Earley parser for a constraint-based grammar are often outrageous. Similarly, 
in an Earley deduction system too much effort may be spent on small portions of 
computation, which are inexpensive to (re-)compute anyway. 
For this reason, I will argue for an implementation of the head-corner parser in 
427 
Computational Linguistics Volume 23, Number 3 
which only large chunks of computation are memorized. In linguistic terms, I will ar- 
gue for a model in which only maximal projections are memorized. The computation 
that is carried out in order to obtain such a chunk uses a depth-first backtrack search 
procedure. This solution dramatically improves upon the (average case) memory re- 
quirements of a parser; moreover it also leads to an increase in (average case) time 
efficiency, especially in combination with goal weakening, because of the reduced 
overhead associated with the administration of the chart. In each of the experiments 
discussed in Section 7, the use of selective memorization with goal weakening out- 
performs standard chart-parsers. 
1.3 Why Prolog 
Prolog is a particularly useful language for the implementation of a head-corner parser 
for constraint-based grammars because: 
• Prolog provides a built-in unification operation. 
• Prolog provides a built-in backtrack search procedure; memorization can 
be applied selectively. 
• Underspecification can be exploited to obtain results required by certain 
techniques for robust parsing. 
• Prolog is a high-level language; this enables the application of partial 
evaluation techniques. 
The first consideration does not deserve much further attention. We want to exploit 
the fact that the primary data structures of constraint-based grammars and the cor- 
responding information-combining operation can be modeled by Prolog's first order 
terms and unification. 
As was argued above, Prolog backtracking is not used to simulate an iterative 
procedure to build up a chart via side-effects. On the contrary, Prolog backtracking is 
used truly for search. Of course, in order to make this approach feasible, certain well- 
chosen search goals are memorized. This is clean and logically well-defined (consider, 
for example, Warren \[1992\]), even if our implementation in Prolog uses extra-logical 
predicates. 
The third consideration is relevant only for robust parsing. In certain methods in 
robust parsing, we are interested in the partial results obtained by the parser. To make 
sure that a parser is complete with respect to such partial results, it is often assumed 
that a parser must be applied that works exclusively bottom-up. In Section 6 it will be 
shown that the head-corner parser, which uses a mixture of bottom-up and top-down 
processing, can be applied in a similar fashion by using underspecification in the top 
goal. Clearly, underspecification is a concept that arises naturally in Prolog. 
The fact that Prolog is a high-level language has a number of practical advantages 
related to the speed of development. A further advantage is obtained because tech- 
niques such as partial evaluation can be applied. For example, I have successfully 
applied the Mixtus partial evaluator (Sahlin 1991) to the head-corner parser discussed 
below, to obtain an additional 20% speed increase. In languages such as C, partial 
evaluation does not seem to be possible because the low-levelness of the language 
makes it impossible to recognize the concepts that are required. 
1.4 Left-Corner Parsing and Head-Corner Parsing 
As the names suggest, there are many parallels between left-corner and head-corner 
parsing. In fact, head-corner parsing is a generalization of left-corner parsing. Many 
428 
van Noord Efficient Head-Corner Parsing 
of the techniques that will be described in the following sections can be applied to a 
left-corner parser as well. 
A head-corner parser for a grammar in which for each rule the left-most daughter 
is considered to be the head, will effectively function as a left-corner parser. In such 
cases, the head-corner parser can be said to run in left-corner mode. Of course, in a left- 
corner parser, certain simplifications are possible. Based on the experiments discussed 
in Section 7, it can be concluded that a specialized left-corner parser is only about 10% 
faster than a head-corner parser running in left-corner mode. This is an interesting 
result: a head-corner parser performs at least almost as well as a left-corner parser, 
and, as some of the experiments indicate, often better. 
1.5 Practical Relevance of Head-Corner Parsing: Efficiency and Robustness 
The head-corner parser is one of the parsers that is being developed as part of the 
NWO Priority Programme on Language and Speech Technology. An overview of the 
Programme can be found in Boves et al. (1995). An important goal of the Programme 
is the implementation of a spoken dialogue system for public transport information 
(the OVIS system). The language of the system is Dutch. 
In the context of the OVIS system, it is important that the parser can deal with 
input from the speech recognizer. The interface between the speech recognizer and 
the parser consists of word-graphs. In Section 5, I show how the head-corner parser 
is generalized to deal with word-graphs. 
Moreover, the nature of the application also dictates that the parser proceeds in a 
robust way. In Section 6, I discuss the OVIS Robustness component, and I show that 
the use of a parser that includes top-down prediction is not an obstacle to robustness. 
In Section 7, I compare the head-corner parser with the other parsers implemented 
in the Programme for the OVIS application and show that the head-corner parser op- 
erates much faster than implementations of a bottom-up Earley parser and related 
chart-based parsers. Moreover, the space requirements are far more modest. The dif- 
ference with a left-corner parser, which was derived from the head-corner parser, is 
small. 
We performed similar experiments for the Alvey NL Tools grammar of English 
(Grover, Carroll, and Briscoe 1993), and the English grammar of the MiMo2 system 
(van Noord et al. 1991). From these experiments it can be concluded that selective 
memorization with goal-weakening (as applied to head-corner and left-corner parsing) 
is substantially more efficient than conventional chart parsing. We conclude that at 
least for some grammars, head-corner parsing is a good option. 
2. A Specification of the Head-Corner Parser 
Head-corner parsing is a radical approach to head-driven parsing in that it gives up the 
idea that parsing should proceed from left to right. Rather, processing in a head-corner 
parser is bidirectional, starting from a head outward (island-driven). A head-corner 
parser can be thought of as a generalization of the left-corner parser (Rosenkrantz and 
Lewis 1970; Matsumoto et al. 1983; Pereira and Shieber 1987). As in the left-corner 
parser, the flow of information in a head-corner parser is both bottom-up and top- 
down. 
In order to explain the parser, I first introduce some terminology. I assume that 
grammars are defined in the Definite Clause Grammar formalism (Pereira and Warren 
1980). Without any loss of generality I assume that no external Prolog calls (the ones 
that are defined within { and }) are used, and that all lexical material is introduced 
in rules that have no other right-hand-side members (these rules are called lexical 
429 
Computational Linguistics Volume 23, Number 3 
goal 
. 
lex Figure 1 
The head-corner parser. 
g? l goal 
entries). The grammar thus consists of a set of rules and a set of lexical entries• For 
each rule an element of the right-hand side is identified as the head of that rule. The 
head-relation of two categories h, m holds with respect to a grammar iff the grammar 
contains a rule with left-hand side m and head daughter h. The relation head-corner 
is the reflexive and transitive closure of the head relation. 
The basic idea of the head-corner parser is illustrated in Figure 1. The parser selects 
a word (1), and proves that the category associated with this word is the head-corner 
of the goal. To this end, a rule is selected of which this category is the head daughter• 
Then the other daughters of the rule are parsed recursively in a bidirectional fashion: 
the daughters left of the head are parsed from right to left (starting from the head), 
and the daughters right of the head are parsed from left to right (starting from the 
head). The result is a slightly larger head-corner (2). This process repeats itself until a 
head-corner is constructed that dominates the whole string (3). 
Note that a rule is triggered only with a fully instantiated head daughter. The 
generate-and-test behavior discussed in the previous section (examples 1 and 2) is 
avoided in a head-corner parser, because in the cases discussed there, the rule would 
be applied only if the VP is found, and hence arg is instantiated. For example if arg 
= np(sg3, \[\] ,Subj), the parser continues to search for a singular NP, and need not 
consider other categories• 
To make the definition of the parser easier, and to make sure that rules are indexed 
appropriately, grammar rules are represented by the predicate headod_rulo/4 in which 
the first argument is the head of the rule, the second argument is the mother node of 
the rule, the third argument is the reversed list of daughters left of the head, and the 
fourth argument is the list of the daughters right of the head. 1 This representation of 
a grammar will in practice be compiled from a friendlier notation• 
As an example, the DCG rule 
x(A,E) --> a(A), b(B,A), x(C,B), d(C,D), e(D,E). 
of which the third daughter constitutes the head, is represented now as: 
headed_rule(x(C,B), x(A,E), \[b(B,A), a(A)\], \[d(C,D), e(D,E)\]). 
It is assumed furthermore that lexical lookup has been performed already by an- 
other module. This module has asserted clauses for the predicate lexieal_analysis/3 
where the first two arguments are the string positions and the third argument is the 
1 Later we will also allow the use of rules with an empty right-hand side. These will simply be 
represented by the predicate gap/1. 
430 
van Noord Efficient Head-Comer Parsing 
% parse(?Cat,+PO,+P) 
% there is a category Cat from PO to P 
parse(Cat,PO,P) :- parse(Cat,PO,P,PO,P). 
% parse(?Cat,?PO,?P,+E0,+E) 
% there is a category Cat from PO to P within the interval E0-E 
parse(Cat,P0,P,E0,E) :- 
predict(Cat,PO,P,E0,E,SmallCat,Q0,Q), 
head_corner(SmallCat,QO,~,Cat,PO,P,EO,E). 
% head_corner(?Small,+QO,+Q,?Cat,?PO,?P,+EO,+E) 
Small from QO-Q is a head-corner of Cat from PO-P 
where PO-P occurs within EO-E 
head_corner(Cat,PO,P,Cat,PO,P .... ). 
head_corner(Small,QO,Q,Cat,PO,P,EO,E) :- 
headed_rule(Small,Mother,RevLeftDs,RightDs), 
head_link(Cat,PO,P,Mother,QL,QR), 
parse_left_ds(RevLeftDs,QL,QO,EO), parse_right_ds(RightDs,Q,QR,E), 
head_corner(Mother,QL,QR,Cat,PO,P,EO,E). 
% parse_left_ds(+RevLeftDs,-Q0,+Q,+E0) 
% there are categories LeftDs from QO to Q 
s.t. RevLeftDs is reverse of LeftDs, and E0=<~0. 
parse left_ds(\[\],Q,Q,_). 
parse left_ds(\[HIT\],QO,Q,EO) "- 
parse(H,QI,Q,E0,Q), parse_left_ds(T,Q0,QI,E0). 
% parse_right_ds(+RightDs,+Q0,-Q,+E) 
% there are categories RightDs from Q0 to Q s.t. Q =< E. 
parse_right_ds(\[\],Q,Q,_). 
parse right_ds(\[HIT\],Q0,Q,E) "- 
parse(H,QO,Ql,@0,E), parse_right ds(T,QI,Q,E). 
% predict(+Cat,?PO,?P,+EO,+E,-Small,-QO,-Q) 
% Small from QO-Q (within EO-E) is a lexical category and possible 
% head-corner for Cat from PO-P. 
predict(Cat,PO,P,EO,E,Small,QO,Q) :- 
lex_head_link(Cat,PO,P,Small,QO,Q), 
lexical_analysis(QO,Q,Small), 
smaller_equal(EO,QO), 
smaller_equal(Q,E). 
Figure 2 
Definite clause specification of the head-corner parser. 
(lexical) category. For an input sentence Timeyqies like an arrow this module may produce 
the following set of clauses: 
lexical_analysis (0, I, verb). (5) 
lexical_analysis (0,2 ,noun). 
lexieal_analysis (i, 2, verb). 
lexical_analysis (2,3, verb). 
lexical_analysis (4,5, noun). 
A simple definite-clause specification of the head-corner parser is given in Figure 2. 
The predicate visible to the rest of the world will be the predicate parse/3. This 
lexical_analysis(O,l,noun). 
lexical_analysis(1,2,noun). 
lexical_analysis(2,3,prep). 
lexical_analysis(3,4,det). 
431 
Computational Linguistics Volume 23, Number 3 
predicate is defined in terms of the parse/5 predicate. The extra arguments introduce 
a pair of indices representing the extreme positions between which a parse should be 
found. This will be explained in more detail below. A goal category can be parsed if 
a predicted lexical category can be shown to be a head-corner of that goal. The head- 
corner predicate constructs (in a bottom-up fashion) larger and larger head-corners. 
To parse a list of daughter categories, each daughter category is parsed in turn. A 
predicted category must be a lexical category that lies somewhere between the extreme 
positions. The predicate smaller_equal is true if the first argument is a smaller or 
equal integer than the second. The use of the predicates head_link and lex_head_link 
is explained below. 
Note that unlike the left-corner parser, the head-corner parser may need to con- 
sider alternative words as a possible head-corner of a phrase, for example, when 
parsing a sentence that contains several verbs. This is a source of inefficiency if it is 
difficult to determine what the appropriate lexical head for a given goal category is. 
This problem is somewhat reduced because of: 
• the use of extremes 
• the use of top-down information 
2.1 The Use of Extremes 
The main difference between the head-corner parser in the previous paragraph and 
the left-corner parser is--apart from the head-driven selection of rules--the use of 
two pairs of indices, to implement the bidirectional way in which the parser proceeds 
through the string. 
Observe that each parse-goal in the left-corner parser is provided with a category 
and a left-most position. In the head-corner parser, a parse-goal is provided either 
with a begin or end position (depending on whether we parse from the head to the 
left or to the right) but also with the extreme positions between which the category 
should be found. In general, the parse predicate is thus provided with a category and 
two pairs of indices. The first pair indicates the begin and end position of the category, 
the second pair indicates the extreme positions between which the first pair should 
lie. In Figure 3 the motivation for this technique is illustrated with an example. 
2.2 Adding Top-Down Filtering 
2.2.1 Category Information. As in the left-corner parser, a linking table is maintained, 
which represents important aspects of the head-corner relation. For some grammars, 
this table simply represents the fact that the HEAD features of a category and its head- 
corner are shared. Typically, such a table makes it possible to predict that in order to 
parse a finite sentence, the parser should start with a finite verb; to parse a singular 
noun-phrase the parser should start with a singular noun, etc. 
The table is defined by a number of clauses for the predicate head_link/2 where 
the first argument is a category for which the second argument is a possible head- 
corner. A sample linking table may be: 
head_link(s,verb). 
head_link( s, vp). 
head_link(pp,prep). 
head_link( X, X). 
head_link( vp, verb). 
head_link( np, noun). 
head_link(sbar, comp). 
(6) 
432 
van Noord Efficient Head-Corner Parsing 
vp 
v np 
5 6 7 8 
Figure 3 
This example illustrates how the use of two pairs of string positions reduces the number of 
possible lexical head-corners for a given goal. Suppose the parser predicted (for a goal 
category s) a category v from position 5 to 6. In order to construct a complete tree s for this 
head-corner, a rule is selected that dictates that a category np should be parsed to the right, 
starting from position 6. To parse np, the category n from 7 to 8 is predicted. Suppose 
furthermore that in order to connect n to np a rule is selected that requires a category adjp to 
the left of n. It will be clear that this category adjp should end in position 7, but can never 
start before position 6. Hence the only candidate head-corner of this phrase is to be found 
between 6 and 7. 
2.2.2 String Position Information. The head-corner table also includes information 
about begin and end positions, following an idea in Sikkel (1993). For example, if the 
goal is to parse a phrase with category sbar from position 7, and within positions 7 and 
12, then for some grammars it can be concluded that the only possible lexical head- 
corner for this goal should be a complementizer starting at position 7. Such information 
is represented in the table as well. This can be done by defining the head relation as a 
relation between two triples, where each triple consists of a category and two indices 
(representing the begin and end position). The head relation ((Cm, pm, qm), (Ch, ph, qh)) 
holds iff there is a grammar rule with mother Cm and head Ch. Moreover, if the list of 
daughters left of the head of that rule is empty, then the begin positions are identical, 
i.e., Ph = Pro. Similarly, if the list of daughters right of the head is empty, then qh = qm. 
As before, the head-corner relation is the reflexive and transitive closure of the head 
relation. 
The previous example now becomes: 
head_link( s ..... verb .... ). 
head_link( s,_,P, vp,_,P). 
head_link(pp,P,_, prep,P,_). 
head_link( X,P,Q, X,P,Q). 
head_link( vp,P,_, verb,P,_). 
head_link( np ..... noun .... ). 
head_link(sbar,P,_, comp,P,_). 
(7) 
Obviously, the nature of the grammar determines whether it is useful to represent 
such information. In order to be able to run a head-corner parser in left-corner mode, 
this technique is crucial. On the other hand, for grammars in which this technique does 
not provide any useful top-down information no extra costs are introduced either. 
433 
Computational Linguistics Volume 23, Number 3 
2.2.3 Integrating the Head-Corner Table. The linking table information is used to 
restrict which lexical entries are examined as candidate heads during prediction, and 
to check whether a rule that is selected can in fact be used to reach the current goal. 
To distinguish the two uses, we use the relation lex_head_link, which is a subset 
of the head_link relation in which the head category is a possible lexical category. 
An example might be the following (where we assume that the category vp is never 
assigned to a lexical entry), which is a subset of the table in 7. 
lex_head_link( s ..... 
lex_head link( np ..... 
lex_head link(sbar,P,_, 
verb .... ). 
noun,_,_). 
comp,P,_). 
lex_head_link(vp,P,_, verb,P,_). 
lex_head_link(pp,P,_, prep,P,_). 
lex_head_link( X,P,Q, X,P,Q). 
(8) 
A few potential problems arise in connection with the use of linking tables. Firstly, 
for constraint-based grammars of the type assumed here the number of possible non- 
terminals is infinite. Therefore, we generally cannot use all information available in 
the grammar but rather we should compute a "weakened" version of the linking table. 
This can be accomplished, for example, by replacing all terms beyond a certain depth 
by anonymous variables, or by other restrictors (Shieber 1985). 
Secondly, the use of a linking table may give rise to spurious ambiguities. Consider 
the case in which the category we are trying to parse can be matched against two 
different items in the linking table, but in which case the predicted head-category may 
turn out to be the same. 
Fortunately, the memorization technique discussed in Section 3 takes care of this 
problem. Another possibility is to use the linking table only as a check, but not as a 
source of information, by encapsulating the call within a double negation. 2 
The solution implemented in the head-corner parser is to use, for each pair of 
functors of categories, the generalization of the head-corner relation. Such functors 
typically are major and minor syntactic category labels such as NP, VP, S, S-bar, verb, 
.... As a result there will always be at most one matching clause in the linking table 
for a given goal category and a given head category (thus there is no risk of obtaining 
spurious ambiguities). Moreover, this approach allows a very efficient implementation 
technique, as described below. 
2.2.4 Indexing of the Head-Corner Table. In the implementation of the head-corner 
parser, we use an efficient implementation of the head-corner relation by exploiting 
Prolog's first argument indexing. This technique ensures that the lookup of the head- 
corner table can be done in (essentially) constant time. The implementation consists 
of two steps. In the first step, the head-corner table is weakened such that for a given 
goal category and a given head category at most a single matching clause exists. In the 
second step, this table is encoded in such a way that first argument indexing ensures 
that table lookup is efficient. 
As a first step we modify the head-corner relation to make sure that for all pairs 
of functors of categories, there will be at most one matching clause in the head-corner 
table. This is illustrated with an example. Suppose a hypothetical head-corner table 
2 This approach also solves another potential problem: the linking table may give rise to (undesired) 
cyclic terms due to the absence of the occur check. The double negation also takes care of this potential problem. 
434 
van Noord Efficient Head-Corner Parsing 
contains the following two clauses relating categories with functor x/4 and y/4: 
head_link (x (A, B .... ) ..... y(A,B .... ) .... ). 
head_link(x(_,B,C,_) ..... y(_,B,C,_) .... ). 
In this case, the modified head-corner relation table will consist of a single clause 
relating x/4 and y/4 by taking the generalization (or "anti-unification") of the two 
clauses: 
head_link(x(_ ,B .... ) ..... y(_ ,B .... ) .... ). 
As a result, for a given goal and head category, table lookup is deterministic. 
In the second and final step of the modification we re-arrange the information in 
the table such that for each possible goal category functor g/n, there will be a clause: 
head_link(g(Al..An) ,Pg,Qg,Head,Ph,Qh) :- 
head_link_G_N (Head, Ph, Qh, g (AI.. An), Pg, Qg). 
Moreover, all the relations head_link_G_N now contain the relevant information from 
the head-comer table. Thus, for clauses of the form: 
head_link (x (_, B .... ) ..... y(_,B .... ) .... ) . 
we now have: 
head_link_x 4(y(_,B .... ) ..... x(_,B .... ) .... ). 
First argument indexing now ensures that table lookup is efficient. 
The same technique is applied for the lex_head_link relation. This technique sig- 
nificantly improves the practical time efficiency of the parser (especially if the resulting 
code is compiled). 
2.3 Dealing with Epsilon Rules 
In the preceding paragraphs we have said nothing about empty productions (epsilon 
rules). A possible approach is to compile the grammar into an equivalent grammar in 
which no such epsilon rules are defined. It is also possible to deal with epsilon rules 
in the head-corner parser directly. For example, we could assert empty productions 
as possible lexical analyses. In such an approach, the result of lexical analysis may 
contain clauses such as those in (9), in case there is a rule np/np --+ \[\]. 
lexical_analysis (0, O, np/np), lexical_analysis (i, I, np/np) . (9) 
lexical_analysis (2,2, np/np), lexical_analysis (3,3,np/np). 
lexical_analysis (4,4, np/np). 
There are two objections to this approach. The first objection may be that this is a 
task that can hardly be expected from a lexical lookup procedure. The second, more 
important, objection is that empty categories are hypothesized essentially everywhere. 
In the general version of the head-corner parser, gaps are inserted by a special 
clause for the predict/8 predicate (10), where shared variables are used to indicate the 
corresponding string positions. The gap_head_link relation is a subset of the head_link 
relation in which the head category is a possible gap. 
predict (Cat, PO, P, _EO, _E, Small, Q, Q) • - (10) 
gap_head_link (Cat, PO, P, Small, Q, 6)), 
gap(Small). 
435 
Computational Linguistics Volume 23, Number 3 
For this approach to work, other predicates must expect string positions that are 
not instantiated. For example, Prolog's built-in comparison operator cannot be used, 
since that operator requires that its arguments are ground. The definition of the 
smaller_equal predicate therefore reflects the possibility that a string position is a 
variable (in which case, calls to this predicate should succeed). 
For some grammars it turns out that a simplification is possible. If it is never 
possible that a gap can be used as the head of a rule, then we can omit this new clause 
for the predict predicate, and instead use a new clause for the parse/S predicate, as 
follows: 
parse (Small, Q, Q, _EO, _E) :- 
gap(Small). 
(11) 
This will typically be much more efficient because in this case gaps are hypothesized 
in a purely top-down manner. 
It should be noted that the general version of the head-corner parser is not guaran- 
teed to terminate, even if the grammar defines only a finite number of derivations for 
all input sentences. Thus, even though the head-corner parser proceeds in a bottom-up 
direction, it can run into left-recursion problems (just as the left-corner parser can). 
This is because it may be possible that an empty category is predicted as the head, 
after which trying to construct a larger projection of this head gives rise to a parse-goal 
for which a similar empty category is a possible candidate head .... This problem is 
sometimes called "hidden left-recursion" in the context of left-corner parsers. 
This problem can be solved in some cases by a good (but relatively expensive) 
implementation of the memorization technique, e.g., along the lines of Warren (1992) or 
Johnson and DOrre (1995). The simplified (and more efficient) memorization technique 
that I use (see Section 3), however, does not solve this problem. 
A quite different solution, which is often applied for the same problem if a left- 
corner parser is used, is to compile the grammar into an equivalent grammar without 
gaps. For left-corner parsers, this can be achieved by partially evaluating all rules 
that can take gap(s) as their left-most daughter(s). Therefore, the parser only needs 
to consider gaps in non-left-most position, by a clause similar to the clause in (11). 
Obviously, the same compilation technique can be applied for the head-corner parser. 
However, there is a problem: it will be unclear what the heads of the newly created 
rules will be. Moreover, and more importantly, the head-corner relation will typically 
become much less predictive. For example, if there is a rule vp --> np verb where 
verb can be realized as a gap, then after compilation, a rule of the form vp --> np 
will exist. Therefore, an np will be a possible head-corner of vp. The effect will be that 
head-corners are difficult to predict, and hence efficiency will decrease. 
Fortunately, experience suggests that grammars exhibiting hidden head-recursion 
can often be avoided. For example, in the Alvey NL Tools grammar in only 3 rules (out 
of more than 700) the head of the rule could be gapped. These rules are of the form 
x --> not x. Arguably, in such rules the second daughter should not be gapped. 
In the MiMo2 grammar of English, no heads can be gapped. Finally, in the Dutch 
OVIS grammar (in which verb-second is implemented by gap-threading) no hidden 
head-recursion occurs, as long as the head-corner table includes information about the 
feature vslash, which encodes whether or not a v-gap is expected. 
436 
van Noord Efficient Head-Corner Parsing 
3. Selective Memorization and Goal-Weakening 
3.1 Selective Memorization 
The basic idea behind memorization is simple: do not compute things twice. In Prolog, 
we can keep track of each goal that has already been searched and keep a list of the 
corresponding solution(s). If the same goal needs to be solved later, then we can skip 
the computation and simply do a table lookup. Maintaining a table and doing the table 
lookup is rather expensive. Therefore, we should modify the slogan "do not compute 
things twice" to do not compute expensive things twice. 
In the head-corner parser it turns out that the parse/5 predicate is a very good 
candidate for memorization. The other predicates are not. This implies that each max- 
imal projection is computed only once; partial projections of a head can be constructed 
during a parse any number of times, as can sequences of categories (considered as 
sisters to a head). Active chart parsers memo everything (including sequences of cat- 
egories); inactive chart parsers only memo categories, but not sequences of categories. 
In our proposal, we memo only those categories that are maximal projections, i.e., pro- 
jections of a head that unify with the top category (start symbol) or with a nonhead 
daughter of a rule. 
The implementation of memorization uses Prolog's internal database to store the 
tables. The advantage of this technique is that we use Prolog's first argument indexing 
for such tables. Moreover, during the consultation of the table we need not worry about 
modifications to it (in contrast to an approach in which the table would be maintained 
as the value of a Prolog variable). On the other hand, the use of the internal database 
brings about a certain overhead. Therefore, it may be worthwhile to experiment with 
a meta-interpreter along the lines of the XOLDT system (Warren 1992) in which the 
table is maintained dynamically. 
Memorization is implemented by two different tables. The first table encodes 
which goals have already been searched. Items in the first table are called goal items. 
The second table represents all solved (i.e., instantiated) goals. Items in this second 
table are called result items. It might be tempting to use only the second table, but 
in that case, it would not be possible to tell the difference between a goal that has 
already been searched, but did not result in a solution ("fail-goal") and a goal that has 
not been searched at all. If we have two tables, then we can also immediately stop 
working on branches in the search-space for which it has already been shown that 
there is no solution. The distinction between these two kinds of item is inherited from 
BUP (Matsumoto et al. 1983). The memorized version of the parse predicate can be 
defined as in (12). 
parse(Cat,PO,P,EO,E) :- 
( in_tablel(Cat,PO,P,E0,E) 
-> true 
; ( 
assert_table2(Cat,P0,P), 
fail 
; assert_tablel(Cat,PO,P,E0,E) 
) ), 
in_table2(Cat,P0,P,EO,E). 
done before? 
then don't search 
predict(Cat,PO,P,E0,E,SmCat,Q0,Q), ~ otherwise find all 
head_corner(SmCat,Q0,Q,Cat,PO,P,EO,E), ~ results and assert 
these 
Z goal is now done 
pick a solution 
(12) 
437 
Computational Linguistics Volume 23, Number 3 
The first table is represented by the predicate ' GOAL_ITEM'. This predicate sim- 
ply consists of a number of unit-clauses indicating all goals that have been searched 
completely. Thus, before we try to attempt to solve Goal, we first check whether a 
goal item for that goal already exists. Given the fact that Goal may contain vari- 
ables, we should be a bit careful here. Unification is clearly not appropriate, since 
it may result in a situation in which a more general goal is not searched because a 
more specific variant of that goal had been solved. We want exactly the opposite: if 
a more general version of Goal is included in the goal table, then we can continue 
to look for a solution in the result table. It is useful to consider the fact that if we 
had previously solved, for example, the goal parse (s, 3, X, 3,12), then if we later en- 
counter the goal parse(s,3,Y,3,10), we can also use the second table immediately: 
the way in which the extreme positions are used ensures that the former is more gen- 
eral than the latter. The predicates for the maintenance of the goal table are defined 
in (13). 
in_tablel (Cat, P0, P, E0, E) • - (13) 
'GOAL_ITEM'(Cat_d,P0_d,P_d,E0_d,E_d), Z goal exists which is 
subsumes chk((Cat_d,P0_d,P_d), (Cat,P0,P)), Z more general and within 
smaller_equal(E0_d,E0), °h a larger interval 
smaller_equal (E, E_d). 
assert_tablel(Cat,PO,P,EO,E) :- assertz('GOAL_ITEM'(Cat,PO,P,EO,E)). 
The second table is represented by the predicate 'RESULT_ITEM'. It is defined by 
unit-clauses that each represent an instantiated goal (i.e., a solution). Each time a result 
is found, the table is checked to see whether that result is already available. If it is, 
the newer result is ignored. If no (more general version of the) result exists, then the 
result is added to the table. Moreover, more specific results that may have been put 
on the table previously are marked. These results need not be used anymore. 3 This is 
not strictly necessary but is often useful because it decreases the size of the tables; in 
this approach, tables are redundancy free and hence minimal. Moreover, no further 
work will be done based on those results. Note that result items do not keep track 
of the extreme positions. This implies that in order to see whether a result item is 
applicable, we check whether the interval covered by the result item lies within the 
extreme positions of the current goal. The predicates dealing with the result table are 
defined in (14). 
in_table2(Cat,P0,P,E0,E) "- 
clause('RESULT_ITEM'(Cat,P0,P),Ref), 
\+ 'REPLACED_ITEM'(Ref,_), 
smaller_equal(EO,PO), smaller_equal(P,E). 
(14) 
result exists, not 
replaced by general 
result 
within desired interval 
3 Note that such items are not removed, because in that case the item reference becomes available for later items, which is unsound. 
438 
van Noord Efficient Head-Corner Parsing 
assert_table2(Cat,P0,P):- 
( 'RESULT_ITEM'(Cat_d,P0_d,P_d), 
subsumes_chk((Cat_d,P0_d,P_d),(Cat,P0,P) 
-> true 
; assertz('RESULT_ITEM'(Cat,P0,P),Ref), 
mark_item('RESULT_ITEM'(Cat,P0,P),Ref) 
). 
% if result exists 
% which is more general 
then ok 
otherwise assert it, and 
mark more specific items 
mark_item(Cat,NewRef) "- 
(clause(Specific,_,Ref), 
\+ Ref=NewRef, 
subsumes_chk(Cat,Specific), 
assertz('REPLACED_ITEM'(Ref,NewRef)), 
fail 
; true 
). 
% item exists 
not the one just added 
and it's more specific 
then mark it 
% do this for all such 
% items 
The implementation uses a faster implementation of memorizating in which both 
goal items and result items are indexed by the functor of the category and the string 
positions. 
In the head-corner parser, parse-goals are memorized. Note that nothing would 
prevent us from memoing other predicates as well, but experience suggests that the 
cost of maintaining tables for the head_corner relation, for example, is (much) higher 
than the associated profit. The use of memorization for only the parse/5 goals implies 
that the memory requirements of the head-corner parser in terms of the number of 
items being recorded is much smaller than in ordinary chart parsers. Not only do 
we refrain from asserting so-called active items, but we also refrain from asserting 
inactive items for nonmaximal projections of heads. In practice the difference in space 
requirements can be enormous. This difference is a significant reason for the practical 
efficiency of the head-corner parser. 
3.2 The Occur Check 
It turns out that the use of tables defined in the previous subsection can lead to a 
problem with cyclic unifications. If we assume that Prolog's unification includes the 
occur check then no problem would arise. But since most versions of Prolog do not 
implement the occur check it is worthwhile investigating this potential problem. 
The problem arises because cyclic solutions can be constructed that would not have 
been constructed by ordinary SLD-resolution. Furthermore, these cyclic structures lead 
to practical problems because items containing such a cyclic structure may have to be 
put in the table. In SICStus Prolog, this results in a crash. 
An example may clarify the problem. Suppose we have a very simple program 
containing the following unit clause: 
x(A,B). 
439 
Computational Linguistics Volume 23, Number 3 
Furthermore suppose that in the course of the computation a goal of the form 
?- x(f(x) ,x) 
is attempted. This clearly succeeds. Furthermore an item of that form is added to the 
table. Later on it may be the case that a goal of the form 
7- x(Y,Y) 
is attempted. Clearly this is not a more specific goal than we solved before, so we 
need to solve this goal afresh. This succeeds too. Now we can continue by picking up 
a solution from the second table. However, if we pick the first solution then a cyclic 
term results. 
A possible approach to deal with this situation is to index the items of the second 
table with the item of the first table from which the solution was obtained. In other 
words, if you want to select a solution from the second table, it must not only be the 
case that the solution matches your goal, but also that the corresponding goal of the 
solution is more general than your current goal. This strategy works, but turns out to 
be considerably slower than the original version given above. The reason seems to be 
that the size of the second table is increased quite drastically, because solutions may 
now be added to the table more than once (for all goals that could give rise to that 
solution). 
An improvement of the head-corner parser using a goal-weakening technique of- 
ten eliminates this occur check problem. Goal weakening is discussed in the following 
subsection. 
3.3 Goal Weakening 
The insight behind goal weakening (or abstraction \[Johnson and D6rre 1995\]) in the 
context of memorization is that we may combine a number of slightly different goals 
into a single, more general, goal. Very often it is much cheaper to solve this single 
(but more general) goal than to solve each of the specific goals in turn. Moreover, 
the goal table will be smaller (both in terms of number of items, and the size of 
individual items), which can have a positive effect on the amount of memory and 
CPU-time required for the administration of the table. Clearly, one must be careful not 
to remove essential information from the goal (in the worst case, this may even lead 
to nontermination of otherwise well-behaved programs). 
Depending on the properties of a particular grammar, it may, for example, be 
worthwhile to restrict a given category to its syntactic features before attempting to 
solve the parse-goal of that category. Shieber's (1985) restriction operator can be used 
here. Thus we essentially throw some information away before an attempt is made to 
solve a (memorized) goal. For example, the category 
x(A, B, f (A, B), g(A,h(B, i (C)) ) ) 
may be weakened into: 
x(A,B,f (_,_) ,g(_,_)) 
If we assume that the predicate weaken/2 relates a term t to a weakened version tw, 
such that tw subsumes t, then (15) is the improved version of the parse predicate: 
parse_with_weakening (Cat, P0, P, E0, E) • - (15) 
weaken(Cat,WeakenedCat), 
parse(WeakenedCat,P0,P,E0,E), 
Cat=WeakenedCat. 
440 
van Noord Efficient Head-Corner Parsing 
Note that goal weakening is sound. An answer a to a weakened goal g is only 
considered if a and g unify. Also note that goal weakening is complete in the sense 
that for an answer a to a goal g there will always be an answer a t to the weakening 
of g such that a t subsumes a. 
For practical implementations the use of goal weakening can be extremely im- 
portant. It is my experience that a well-chosen goal-weakening operator may reduce 
parsing times by an order of magnitude. 
The goal-weakening technique can also be used to eliminate typical instances of the 
problems concerning the occur check (discussed in the previous subsection). Coming 
back to the example in the previous subsection, if our first goal 
x(f (x) ,x) 
were weakened into 
x(f (_) ,_) 
then the problem would not occur. If we want to guarantee that no cyclic structures 
can be formed, then we would need to define goal weakening in such a way that no 
variable sharing occurs in the weakened goal. 
An important question is how to come up with a good goal-weakening operator. 
For the experiments discussed in the final section all goal-weakening operators were 
chosen by hand, based on small experiments and inspection of the goal table and item 
table. Even if goal weakening is reminiscent of Shieber's (1985) restriction operator, 
the rules of the game are quite different: in the case of goal weakening, as much infor- 
mation as possible is removed without risking nontermination of the parser, whereas 
in the case of Shieber's restriction operator, information is removed until the resulting 
parser terminates. For the current version of the grammar of OVIS, weakening the goal 
category in such a way that all information below a depth of 6 is replaced by fresh 
variables eliminates the problem caused by the absence of the occur check; moreover, 
this goal-weakening operator reduces parsing times substantially. In the latest version, 
we use different goal-weakening operators for each different functor. 
An interesting special case of goal weakening is constituted by a goal-weakening 
operator that ignores all feature constraints, and hence only leaves the functor for 
each goal category. In this case the administration of the goal table can be simplified 
considerably (the table consists of ground facts, hence no subsumption checks are 
required). This technique is used in the MiMo2 grammar and the Alvey NL Tools 
grammar, both discussed in Section 7. 
4. Compact Representation of Parse Trees 
Often a distinction is made between recognition and parsing. Recognition checks 
whether a given sentence can be generated by a grammar. Usually recognizers can be 
adapted to be able to recover the possible parse trees of that sentence (if any). 
In the context of Definite Clause Grammar this distinction is often blurred because 
it is possible to build up the parse tree as part of the complex nonterminal symbols. 
Thus the parse tree of a sentence may be constructed as a side effect of the recognition 
phase. If we are interested in logical forms rather than in parse trees, a similar trick 
may be used. The result of this, however, is that as early as the recognition phase, 
ambiguities will result in a (possibly exponential) increase of processing time. 
For this reason we will assume that parse trees are not built by the grammar, but 
rather are the responsibility of the parser. This allows the use of efficient packing 
441 
Computational Linguistics Volume 23, Number 3 
112:s-adv-s /\ 
46 s-np-vp /\ 
87 vp-vp-np-pp 
vp-v 121 125 
give22 
Figure 4 
Example of a partial derivation tree projected by a history item. 
techniques. The result of the parser will be a parse forest: a compact representation 
of all possible parse trees rather than an enumeration of all parse trees. 
The structure of the parse forest in the head-corner parser is rather unusual, and 
therefore we will take some time to explain it. Because the head-corner parser uses 
selective memorization, conventional approaches to constructing parse forests (Billot 
and Lang 1989) are not applicable. The head-corner parser maintains a table of partial 
derivation trees, each of which represents a successful path from a lexical head (or 
gap) up to a goal category. The table consisting of such partial parse trees is called the 
history table; its items are history items. 
More specifically, each history item is a triple consisting of a result item reference, 
a rule name, and a list of triples. The rule name is always the name of a rule without 
daughters (i.e., a lexical entry or a gap): the (lexical) head. Each triple in the list of 
triples represents a local tree. It consists of the rule name, and two lists of result item 
references (representing the list of daughters left of the head in reverse, and the list of 
daughters right of the head). An example will clarify this. Suppose we have a history 
item: 
' HISTORY_ITEM ' ( 112, give22, 
\[rule (vp_v, \[\] , \[\] ) , 
rule (s_np_vp, \[87\] , \[\] ) , 
rule (vp_vp_np_pp, \[\] , \[121,125\] ), 
rule (s_adv_s, \[46\] , \[\] )\] ). 
(16) 
This item indicates that there is a possible derivation of the category defined in result 
item 112 of the form illustrated in Figure 4. In this figure, the labels of the interior 
nodes are rule names, and the labels of the leaves are references to result items. The 
head-corner leaf is special: it is a reference to either a lexical entry or an epsilon rule. 
The root node is special too: it has both an associated rule name and a reference to a 
result item. The latter indicates how this partial derivation tree combines with other 
partial trees. 
The history table is a lexicalized tree substitution grammar, in which all nodes 
(except substitution nodes) are associated with a rule identifier (of the original gram- 
mar). This grammar derives exactly all derivation trees of the input. 4 As an example, 
4 The tree substitution grammar is lexicalized in the sense that each of the trees has an associated anchor, 
442 
van Noord Efficient Head-Corner Parsing 
nt5:I nt0:a ntl:4 nt2:3 
/\ r 
ntO man home 
nt3:6 /\ 
at nt2 
nt4:5 nt6:l /\ /\ 
4 nt3 nt5 7 /\ /\ 
nt0 man see nt4 
nt6:2 /\ 
1 nt3 /\ 
nt5 7 /\ 
see ntl 
Figure 5 
Tree substitution grammar that derives each of the two derivation trees of the sentence I see a 
man at home, for the grammar of Billot and Lang (1989). The start symbol of this grammar is 
nt6. Note that all nodes, except for substitution nodes, are associated with a rule (or lexical 
entry) of the original grammar. Root nodes have a nonterminal symbol before the colon, and 
the corresponding rule identifier after the colon. The set of derived trees for this tree 
substitution grammar equals the set of derivation trees of the parse (ignoring the nonterminal 
symbols of the tree substitution grammar). 
consider the grammar used by Tomita (1987) and Billot and Lang (1989), given here 
in (17) and (18). 
(I) s --> np, vp. (2) s --> s, pp. (3) np --> n. 
(4) np --> det, n. (5) np --> rip, pp. (6) pp --> prep, rip. 
(7) vp --> v, rip. 
(17) 
n --> \['I'\] . n --> \[man\] . v --> \[see\] . (18) 
prep--> \[at\]. det--> \[a\]. n--> \[home\]. 
The sentence I see a man at home has two derivations, according to this grammar. The 
lexicalized tree substitution grammar in Figure 5, which is constructed by the head- 
corner parser, derives exactly these two derivations. 
Note that the item references are used in the same manner as the computer gener- 
ated names of nonterminals in the approach of Billot and Lang (1989). Because we use 
chunks of parse trees, less packing is possible than in their approach. Correspondingly, 
the theoretical worst-case space requirements are also worse. In practice, however, this 
does not seem to be problematic: in our experiments, the size of the history table is 
always much smaller than the size of the other tables (this is expected because the 
latter tables have to record complex category information). 
Let us now look at how the parser of the previous section can be adapted to be able 
to assert history items. First, we add an (output-) argument to the parse predicate. This 
sixth argument is the reference to the result item that was actually used. The predicates 
to parse a list of daughters are augmented with a list of such references. This enables 
the construction of a term for each local tree in the head_corner predicate consisting 
of the name of the rule that was applied and the list of references of the result items 
which is a pointer to either a lexical entry or a gap. 
443 
Computational Linguistics Volume 23, Number 3 
used for the left and right daughters of that rule. Such a local tree representation is 
an element of a list that is maintained for each lexical head upward to its goal. Such 
a list thus represents in a bottom-up fashion all rules and result items that were used 
to show that that lexical entry indeed was a head-corner of the goal. If a parse goal 
has been solved then this list containing the history information is asserted in a new 
kind of table: the 'HISTORY_ITEM'/3 table. 5 
We already argued above that parse trees should not be explicitly defined in the 
grammar. Logical forms often implicitly represent the derivational history of a cate- 
gory. Therefore, the common use of logical forms as part of the categories will imply 
that you will hardly ever find two different analyses for a single category, because two 
different analyses will also have two different logical forms. Therefore, no packing is 
possible and the recognizer will behave as if it is enumerating all parse trees. The 
solution to this problem is to delay the evaluation of semantic constraints. During the 
first phase, all constraints referring to logical forms are ignored. Only if a parse tree 
is recovered from the parse forest we add the logical form constraints. This is similar 
to the approach worked out in CLE (Alshawi 1992). 
This approach may lead to a situation in which the second phase actually filters 
out some otherwise possible derivations, in case the construction of logical forms is 
not compositional in the appropriate sense. In such cases, the first phase may be said 
to be unsound in that it allows ungrammatical derivations. The first phase combined 
with the second phase is of course still sound. Furthermore, if this situation arose very 
often, then the first phase would tend to be useless, and all work would have to be 
done during the recovery phase. The present architecture of the head-corner parser 
embodies the assumption that such cases are rare, and that the construction of logical 
forms is (grosso modo) compositional. 
The distinction between semantic and syntactic information is compiled into the 
grammar rules on the basis of a user declaration. We simply assume that in the first 
phase the parser only refers to syntactic information, whereas in the second phase 
both syntactic and semantic information is taken into account. 
If we assume that the grammar constructs logical forms, then it is not clear that we 
are interested in parse trees at all. A simplified version of the recover predicate may 
be defined in which we only recover the semantic information of the root category, 
but in which we do not build parse trees. The simplified version may be regarded 
as the run-time version, whereas parse trees will still be very useful for grammar 
development. 
5. Parsing Word-Graphs with Probabilities 
The head-corner parser is one of the parsers developed within the NWO Priority 
Programme on Language and Speech Technology. In this program a spoken dialog 
system is developed for public transportation information (Boves et al. 1995). 
In this system the input for the parser is not a simple list of words, as we have 
assumed up to now, but rather a word-graph: a directed, acyclic graph where the 
states are points in time and the edges are labeled with word hypotheses and their 
corresponding acoustic score. Thus, such word-graphs are acyclic weighted finite-state 
automata. 
In Lang (1989) a framework for processing ill-formed input is described in which 
5 A complication is needed for those cases where items are removed later because a more general item 
has been found. 
444 
van Noord Efficient Head-Corner Parsing 
certain common errors are modeled as (weighted) finite-state transducers. The compo- 
sition of an input sentence with these transducers produces a (weighted) finite-state 
automaton, which is then input for the parser. In such an approach, the need to gen- 
eralize from input strings to input finite-state automata is also clear. 
The generalization from strings to weighted acyclic finite-state automata intro- 
duces essentially two complications: we cannot use string indices anymore and we 
need to keep track of the acoustic scores of the words used in a certain derivation. 
5.1 From String Positions to State Names 
Parsing on the basis of a finite-state automaton can be seen as the computation of 
the intersection of that automaton with the grammar. If the definite clause grammar 
is off-line parsable, and if the finite-state automaton is acyclic, then this computation 
can be guaranteed to terminate (van Noord 1995). This is obvious because an acyclic 
finite-state automaton defines a finite number of strings. More importantly, existing 
techniques for parsing based on strings can be generalized easily by using the names 
of states in the automaton instead of the usual string indices. 
In the head-corner parser, this leads to an alternative to the predicate smaller_ 
equal/2. Rather than a simple integer comparison, we now need to check that a 
derivation from P0 to P can be extended to a derivation from E0 to E by checking that 
there are paths in the word-graph from E0 to P0 and from P to E. 
The predicate connection/2 is true if there is a path in the word-graph from the 
first argument to the second argument. It is assumed that state names are integers; 
to rule out cyclic word-graphs we also require that, for all transitions from P0 to P, it 
is the case that P0 < P. Transitions in the word-graph are represented by clauses of 
the form wordgraph:trans (P0, Sym, P, Score), which indicate that there is a transition 
from state P0 to P with symbol Sym and acoustic score Score. The connection predicate 
can be specified simply as the reflexive and transitive closure of the transition relation 
between states: 
connect ion (A, A) . (19) 
connection(AO,A) "- 
wordgraph : trans (AO, _, A i, _), 
connection (AI, A). 
The implementation allows for the possibility that state names are not instantiated (as 
required by the treatment of gaps). Moreover it uses memorization, and it ensures that 
the predicate succeeds at most once: 
connection(A,B):- 
( var(A) -> true 
; var(B) -> true 
; A=:=B -> true 
; B < A -> fail 
; ok_conn(A,B) -> true 
; fail_conn(A,B) -> fail 
; wordgraph:trans(A,_,X,_), 
connection(X,B) 
% word-graphs are acyclic 
-> assertz(ok_conn(A,B)) 
(20) 
445 
Computational Linguistics Volume 23, Number 3 
; assertz(fail_conn(A,B)), 
fail 
. 
A somewhat different approach that may turn out to be more efficient is to use the 
ordinary comparison operator that we used in the original definition of the head-corner 
parser. The possible extra cost of allowing impossible partial analyses is worthwhile 
if the more precise check would be more expensive. If, for typical input word-graphs, 
the number of transitions per state is high (such that almost all pairs of states are 
connected), then this may be an option. 
5.2 Accounting for Word-Graph Scores 
To account for the acoustic score of a derivation (defined as the sum of the acoustic 
scores associated with all transitions from the word-graph involved in the derivation), 
we assume that the predicate lexical_analysis represents the acoustic score of the 
piece of the word-graph that it covers by an extra argument. During the first phase, 
acoustic scores are ignored. During the second phase (when a particular derivation is 
constructed), the acoustic scores are combined. 
6. Head-Corner Parsing and Robustness 
Certain approaches towards robust parsing use the partial results of the parser. It is 
assumed in such approaches that even if no full parse for the input could be con- 
structed, the discovery of other phrases in the input might still be useful. It is also 
often assumed that a bottom-up parser is essential for such approaches to work: parsers 
that use top-down information (such as the head-corner parser) may fail to recognize 
relevant subparses in the context of an ungrammaticality. 
In the application for which the head-corner parser was developed, robust pro- 
cessing is essential. In a spoken dialogue system it is often impossible to parse a full 
sentence, but in such cases the recognition of other phrases, such as temporal expres- 
sions, might still be very useful. Therefore, a robust processing technique that collects 
the remnants of the parsing process in a meaningful way seems desirable. 
In this subsection, we show how the head-corner parser can be used in such 
circumstances. The parser is modified in such a way that it finds all derivations of the 
start symbol anywhere in the input. Furthermore, the start symbol should be defined in 
such a way that it includes all categories considered useful for the application. 
6.1 Underspecification of the Positions 
Normally the head-corner parser will be called as follows, for example: 
?- parse(s(Sem) ,0,12) . 
indicating that we want to parse a sentence from position zero to twelve with cate- 
gory s (Sere) (a sentence with a semantic representation that is yet to be discovered). 
Suppose, however, that a specific robustness module is interested in all maximal pro- 
jections anywhere in the sentence. Such a maximal projection may be represented by 
a term xp (Sere). Furthermore there may be unary grammar rules rewriting such an xp 
into appropriate categories, for example: 
xp(Sem) --> np(Sem), xp(Sem) --> s(Sem). (21) 
xp(Sem) --> pp(Sem), xp(Sem) --> advp(Sem). 
446 
van Noord Efficient Head-Corner Parsing 
If we want to recognize all maximal projections at all positions in the input, then we 
can simply give the following parse-goah 
?- parse(xp(Sem) .... ). (22) 
Now one might expect that such an underspecified goal will dramatically slow down 
the head-corner parser, but this turns out to be false. In actual fact we have experienced 
an increase of efficiency using underspecification. This can only be understood in the 
light of the use of memorization. Even though we now have a much more general 
goal, the number of different goals that we need to solve is much smaller. 
Also note that even though the first call to the parse predicate has variable extreme 
positions, this does not imply that all power of top-down prediction is lost by this 
move; recursive calls to the parse predicate may still have instantiated left and/or right 
extreme positions. The same applies with even more force for top-down information 
on categories. 
6.2 The Robustness Component in OVIS 
In an attempt to obtain a robust natural language understanding component, we have 
experimented in OVIS with the techniques mentioned in the preceding paragraph. The 
top category (start symbol) of the OVIS grammar is defined as the category max (gem). 
Moreover there are unary rules such as max(gem) --* np(Sem,.. ) for NP, S, PP, AdvP. 
In the first phase, the parser finds all occurrences of the top category in the input 
word-graph. Thus, we obtain items for all possible maximal projections anywhere in 
the input graph. In the second phase, the robustness component selects a sequence 
of such maximal projections. The robustness procedure consists of a best-first search 
from the beginning of the graph to the end of the graph. A path in the input graph 
can be constructed by taking steps of two types. To move from position P to Q you 
can either: 
• use a maximal projection from P to Q (as constructed by the parser), or 
• use a transition from P to Q. In this case we say that we skip that 
transition. 
In order to compare paths in the best-first search method, we have experimented 
with score functions that include some or all of the following factors: 
• the number of skips. We prefer paths with a smaller number of such 
skips. 
• the number of maximal projections. We prefer paths with a smaller 
number of such projections. 
• the combined acoustic score as defined in the word-graph. 
• the appropriateness of the semantic representation given the dialogue 
context 
• the bigram score. 
If bigram scores are not included, then this best-first search method can be im- 
plemented efficiently because for each state in the word-graph we only have to keep 
track of the best path to that state. 
447 
Computational Linguistics Volume 23, Number 3 
The resulting best path in general consists of a number of maximal projections. In 
the OVIS application, these are often simple time or place expressions. The pragmatic 
module is able to deal with such unconnected pieces of information and will perform 
better if given such partial parse results. 
To evaluate the appropriate combination of the factors determining the scoring 
function, and to evaluate this approach with respect to other approaches, we use a 
corpus of word-graphs for which we know the corresponding actual utterances. We 
compare the sentence associated with the best path in the word-graph with the sen- 
tence that was actually spoken. Clearly, the more often the robustness component 
uses the information that was actually uttered, the more confidence we have in that 
component. This notion of word accuracy is an approximation of semantic accuracy 
(or "concept accuracy"). The string comparison is defined by the minimal number of 
deletions and insertions that is required to turn the first string into the second (Lev- 
enshtein distance), although it may be worthwhile to investigate other measures. For 
example, it seems likely that for our application it is much less problematic to "miss" 
information than to "hallucinate". This could be formalized by a scoring function in 
which insertion (into analysis result) is cheaper than deletion. 
Currently, the best results are obtained with a scoring function in which bigram 
scores, acoustic scores, and the number of skips are included. We have also imple- 
mented a version of the system in which acoustic scores and bigram scores are used 
to select the best path through the word-graph. This path is then sent to the parser 
and the robustness component. In this "best-l-mode" the system performs somewhat 
worse in terms of word accuracy, but much faster, as seen in the experiments in the 
next section. 
7. Practical Experience 
There does not exist a generally agreed-upon method to measure the efficiency of 
parsers for grammars of the kind we assume here, i.e., constraint-based grammars for 
natural language understanding. Therefore, I will present the results of the parser for 
the current version of the OVIS grammar in comparison with a number of other parsers 
that have been developed in the same project (by my colleagues and myself). Moreover, 
a similar experiment was performed with two other grammars: the English MiMo2 
grammar (van Noord et al. 1991), and the English Alvey NL Tools grammar (Grover, 
Carroll, and Briscoe 1993). 6 It should be clear that the results to be presented should 
not be taken as a formal evaluation, but are presented solely to give an impression 
of the practical feasibility of the parser, at least for its present purpose. The following 
results should be understood with these reservations in mind. 
7.1 Other Parsers 
The head-corner parser was compared with a number of other parsers. The parsers 
are described in further detail in van Noord, Bouma, Koeling, and Nederhof (1996) 
6 The material used to perform the experiments with the MiMo2 grammar and the Alvey NL Tools 
grammar, including several versions of the head-corner parser, is available via anonymous ftp at: 
ftp ://ftp. let. rug. nllpublprolog-app/CL97/and the world-wide-web at: 
http://www, let. rug. nl/~vannoord/CL97/. The material is ready to be plugged into the Hdrug 
environment available from the same site. 
448 
van Noord Efficient Head-Comer Parsing 
and van Noord, Nederhof, Koeling, and Bouma (1996). The last two parsers of the 
following list were implemented by Mark-Jan Nederhof. 
• lc. Left-corner parser. This parser is derived from the head-corner parser. 
It therefore uses many of the ideas presented above. Most importantly, it 
uses selective memorization with goal weakening and packing. The 
parser is closely related to the BUP parser (Matsumoto et al. 1983). 
• bu-inactive. Inactive chart parser. This is a bottom-up parser that 
records only inactive edges. It uses packing. It uses a precompiled 
version of the grammar in which no empty productions are present. 
• bu-earley. Bottom-up Earley parser. This is a bottom-up chart parser 
that records both active and inactive items. It operates in two phases and 
uses packing. It uses a precompiled version of the grammar in which no 
empty productions are present. 
• bu-active. Bottom-up Earley parser without packing. This is a chart 
parser that constructs only active items (except for categories that unify 
with the top category). It uses a precompiled version of the grammar in 
which no empty productions are present. 
• lr. LR parser. This is an experimental implementation of a generalization 
for Definite Clause Grammars of the parser described in Nederhof and 
Satta (1996). It proceeds in a single phase and does not use packing. It 
uses a table to maintain partial analyses. It was not possible to perform 
all the experiments with this parser due to memory problems during the 
construction of the LR table. 
Note that we have experimented with a number of different versions of each of 
these parsers. We will report only on the most efficient version. The experiments were 
performed on a 125Mhz HP-UX 735 machine with 240 Megabytes of memory. Timings 
measure CPU-time and should be independent of the load on the machine. 7 
7.2 Experiment 1: OVIS 
The OVIS grammar (for Dutch) contains about 1,400 lexical entries (many of which are 
station and city names) and 66 rules (a substantial fraction of which are concerned with 
time and date expressions), including 7 epsilon rules. The most important epsilon rule 
is part of a gap-threading implementation of verb-second. The grammar is documented 
in detail in van Noord, Nederhof, Koeling, and Bouma (1996). The head-corner table 
contains 128 pairs, the lexical head-corner table contains 93 pairs, the gap-head-corner 
table contains 14 pairs. The left-corner table contains 156 pairs, the lexical left-corner 
table contains 114 pairs, the gap-left-corner table contains 20 pairs. The precompiled 
grammar, which is used by the chart parsers, contains 92 rules. 
The input for the parser consists of a test set of 5,000 word-graphs, randomly taken 
from a corpus of more than 25,000 word-graphs. These word-graphs are the latest 
word-graphs that were available to us; they are "real" output of the current version of 
the speech recognizer as developed by our project partners. In this application, typical 
7 Experiments suggest that the load on the machine in fact does influence the timings somewhat. 
However, the experiments were performed at times when the load of the machine was low. It is 
believed, therefore, that no such artifacts are present in the numbers given here. 
449 
Computational Linguistics Volume 23, Number 3 
Table 1 
The left-most table gives information concerning the 
number of transitions per word-graph of the test set for 
the OVIS grammar. As can be seen from this table, more 
than half of the corpus consists of word-graphs with at 
most five transitions. In the right-most table, the number 
of words per utterance is given. Many utterances consists 
of less than five words. 
Number of Number of Number of Number of 
Transitions Word-Graphs Words Utterances 
0-5 2,825 1-2 2,465 
6-10 850 3-4 1,448 
11-15 408 5-6 543 
16-20 246 7-8 319 
21-30 237 9-10 118 
31-40 146 11-12 56 
41-50 83 13-14 26 
51-75 112 15-16 20 
76-100 44 17-18 5 
101-150 36 
151-200 12 
263 1 
utterances are short. As a consequence, the typical size of word-graphs is rather small 
too, as can be seen in Table 1. 
We report on three different experiments with the OVIS grammar and these word- 
graphs. In the first experiment, the system runs in best-l-mode: the best path is selected 
from the word-graph using bigram scores and the acoustic scores (present in the word- 
graph). This best path is then sent to the parser and robustness component. In the 
second experiment, the parser is given the utterance as it was actually spoken (to 
simulate a situation in which speech recognition is perfect). In the third experiment, 
the parser takes the full word-graph as its input. The results are then passed on to 
the robustness component. As explained in the previous section on robustness, each 
of the parsers finds all derivations of the start symbol anywhere in the input (this is 
the case in each of the OVIS experiments). 
For the current version of the OVIS system, parsing on the basis of the best path in 
the word-graph gives results in terms of word accuracy that are similar to the results 
obtained with full word-graphs. Results for concept accuracy are not yet available. 
Details can be found in van Noord, Bouma, Koeling, and Nederhof (1996). 
7.2.1 Parsing Best Path Only. In Table 2, the CPU-time requirements and the maxi- 
mum space requirements of the different parsers are listed. In the table we list, respec- 
tively, the total number of milliseconds CPU-time required for all 5,000 word-graphs 
(timings include lexical lookup, parsing, and the robustness component), the average 
number of milliseconds per word-graph, and the maximum number of milliseconds 
for a word-graph. The final column lists the maximum amount of space requirements 
(per word-graph, in Kbytes). 8 
8 These sizes are obtained using the SICStus prolog built-in predicate statistics (program~pace,X). 
This only measures the size of the internal database, but not the size of the stacks. The size of stacks 
has never been a problem for any of the parsers; the size of the internal database has occasionally led 
450 
van Noord Efficient Head-Corner Parsing 
Table 2 
Total and average CPU-time and maximal space requirements for a test set of 5,000 best 
paths through word-graphs (OVIS grammar). 
Parser Total (msec) msec/Sentence Maximum Maximum Space 
hc 169,370 34 530 163 
lc 180,160 36 530 171 
bu-active 291,870 58 4,220 1,627 
bu-inactive 545,060 109 13,050 784 
bu-earley 961,760 192 24,470 2,526 
lr 1,088,940 218 416,000 4,412 
Table 3 
Total and average CPU-time and maximum space requirements for a test set of 5,000 
utterances (OVIS grammar). 
Parser Total (msec) msec/Sentence Maximum Maximum Space 
hc 126,930 25 510 137 
lc 137,090 27 490 174 
bu-active 257,390 51 4,030 1,438 
bu-inactive 546,650 109 15,170 1,056 
bu-earley 934,810 187 25,490 3,558 
lr 957,980 192 417,580 4,435 
Table 4 
Total and average CPU-time and maximum space requirements for a test set of 5,000 
word-graphs (OVIS grammar). 
Parser Total (msec) msec/Word-Graph Maximum Maximum Space 
lc 410,670 82 15,360 4,455 
hc 435,320 87 16,230 4,174 
7.2.2 Parsing Sentences. The differences in CPU-time for the corpus of 5,000 word- 
graphs are similar to differences we have found for other test sets. The results are also 
very similar to the results we obtain if we parse the utterances actually spoken. Table 3 
lists the results of parsing the set of 5,000 utterances from which the word-graphs were 
derived. 
7.2.3 Parsing Word-Graphs. Obviously, parsing word-graphs is more difficult than 
parsing only the best path through a word-graph, or parsing an ordinary sentence. 
In Table 4, we list the results for the same set of 5,000 word-graphs. This experiment 
could only be performed for the head-corner and the left-corner parser. The other 
parsers ran into memory problems for some very large word-graphs. 
In order to compare the other parsers too, I performed the experiment with a 
time-out of 5,000 msec (the memory problems only occur for word-graphs that take 
longer to process). In Table 5 the percentage of word-graphs that can be treated within 
a certain amount of CPU-time are listed. 
From the experiments with the OVIS grammar and corpus, it can be concluded 
to problems for the bottom-up chart parsers. 
451 
Computational Linguistics Volume 23, Number 3 
Table 5 
Percentage of word-graphs that can be treated within time limit (OVIS grammar). 
Parser 500 1,000 2,000 3,000 4,000 5,000 Time-Outs 
lc 97.72 99.28 99.78 99.92 99.92 99.92 4 
hc 97.42 98.94 99.60 99.84 99.92 99.92 4 
lr 91.44 94.42 96.30 96.98 97.34 97.70 115 
bu-active 91.84 94.76 96.04 96.84 97.30 97.60 120 
bu-inactive 82.36 88.64 92.24 94.10 95.14 95.86 207 
bu-earley 77.10 84.26 89.04 91.42 92.64 93.50 325 
that the head-corner and left-corner parsers (implemented with selective memorization 
and goal weakening) are much more efficient than the other parsers. In the case of 
word-graphs, the left-corner parser is about 5% faster than the head-corner parser; for 
strings, the head-corner parser is about 6% to 8% faster than the left-corner parser. 
7.3 Experiment 2:MiMo2 
Another experiment was carried out for the English grammar of the MiMo2 system. 
This grammar is a unification-based grammar that is compiled into a DCG. The gram- 
mar contains 525 lexical entries, 63 rules including 13 gaps. The head-corner relation 
contains 33 pairs and the lexical head-corner relation contains 18 pairs. The left-corner 
parser runs into hidden left-recursion problems on the original grammar, so it uses 
a version of the grammar in which left-most gaps are compiled out. This compiled 
grammar has 69 rules. The left-corner relation contains 80 pairs; the lexical left-corner 
relation contains 62 pairs. As a result, the left-corner parser only hypothesizes gaps 
for non-left-most daughters. Because the grammar never allows gaps as head, the 
head-corner parser can be optimized in a similar fashion. Both the left-corner and 
head-corner parser use a goal-weakening operator that only leaves the functor sym- 
bol. This simplifies the way in which the goal table is maintained. 
For this experiment we have no notion of typical input, but instead made up a set 
of 25 sentences of various lengths and levels of difficulty, with a total of 338 readings. 
In order to be able to complete the experiment, a time-out of 60 seconds of CPU-time 
was used. Timings include lexical lookup and parse tree recovery. 
The original parser implemented in the MiMo2 system (a left-corner parser with- 
out packing) took 294 seconds of CPU-time to complete the experiment (with three 
time-outs). Because the test environment was (only slightly) different, we have indi- 
cated the latter results in italics. Average CPU-time is only given for those parsers 
that completed each of the sentences within the time limit. The results are given in 
Table 6. 
The bottom-up active chart parser performs better on smaller sentences with a 
small number of readings. For longer and more ambiguous sentences, the head-corner 
parser is (much) more efficient. The other parsers are consistently much less efficient. 
7.4 Experiment 3: Alvey NL Tools 
A final set of experiments was performed for the Alvey NL Tools grammar (Grover, 
Carroll, and Briscoe 1993), similar to the experiments discussed in Carroll (1994). For 
a longer description of the grammar and the test sets we refer the reader to this 
publication. The grammar contains 2,363 lexical entries, and 780 rules (8 of which 
are gaps). The left-corner relation contains 440 pairs; the lexical left-corner relation 
452 
van Noord Efficient Head-Corner Parsing 
Table 6 
Total and average CPU-time and maximum space requirements for a set of 25 sentences 
(MiMo2 grammar). Italicized items are offered for cautious comparison. 
Parser Total (msec) msec/Sentence Maximum Space Time-Outs 
hc 52,670 2,107 2,062 0 
bu-active 52,990 2,120 30,392 0 
lc 109,750 4,390 8,570 0 
mimo2-lc 294,000 3 
bu-earley 439,050 12,910 4 
bu-inactive 498,610 7,236 5 
Table 7 
Total and average CPU-time and maximum space requirements for a set 
of 129 short sentences (Alvey NL Tools grammar). Italicized items are 
offered for cautious comparison. 
Parser msec msec/Sentence Maximum Kbytes 
bu-active 18250 141 1276 
lc 21900 170 137 
Carroll BU-LC 21500 167 
hc (lc mode) 23690 184 165 
bu-earley 27670 214 758 
hc 68880 534 140 
bu-inactive 83690 649 170 
contains 254 pairs. No gaps are possible as left-most elements of the right-hand side 
of a rule. 
To use the head-corner parser, it must be determined for each of the rules which 
element on the right-hand side constitutes the head of the rule. The head-corner re- 
lation contains 352 pairs; the lexical head-corner relation contains 180 pairs. We also 
report on experiments in which, for each rule, the left-most member of the right-hand 
side was selected as the head. The goal-weakening operator used for the left-corner 
and head-corner parser removes all features (only leaving the functor symbol of each 
category); again this simplifies the maintenance of the goal table considerably. 
The bottom-up chart parsers use a version of the grammar in which all epsilon 
rules are compiled out. The resulting grammar has 1,015 rules. 
The first test set consists of 129 short sentences (mean length 6.7 words). Our results 
were obtained with a newer version of the Alvey NL Tools grammar. In Table 7 we 
list the results for the same grammar and test set for Carroll's bottom-up left-corner 
parser (BU-LC). Carroll performed this experiment on a SUN UltraSparc 1/140. It was 
estimated by Carroll and the author that this machine is about 1.62 times faster than 
the HP-UX 735 on which the other experiments were performed. 9 In Table 7, we have 
multiplied the 13.3 seconds of CPU-time (obtained by Carroll) with this factor in order 
to compare his results with our results. Clearly, these numbers should be taken with 
extreme caution, because many factors in the test environment differ (hardware, LISP 
versus Prolog). For this reason we use italics in Table 7. 
The second test set consists of 100 longer and much more complex sentences. The 
length of the sentences is distributed uniformly between 13 and 30 words (sentences 
9 The SPECINT92 figures for the Ultra 1/140 and HP 735/125 confirm this: 215 and 136 respectively. 
453 
Computational Linguistics Volume 23, Number 3 
Table 8 
Total and average CPU-time and maximum space requirements for set of 
100 longer sentences (Alvey NL Tools grammar). Italicized items are 
offered for cautious comparison. 
Parser msec msec/Sentence Maximum Kbytes 
lc 195,850 1,959 
hc (lc mode) 216,180 2,162 
Carroll BU-LC 333,000 3,330 
bu-earley 1,219,120 12,191 
hc 3,053,910 30,539 
bu-inactive 3,578,370 35,784 
bu-active >> 
10,955 
10,969 
18,232 
7,915 
16,936 
> 65,000 
created by Carroll). Many of the sentences have many parses: the maximum number 
of parses is 2,736 for one 29-word sentence. Average number of readings is about 100 
readings per sentence. 
Again, we list the results Carroll obtained with the BU-LC parser. It took 205.7 
seconds on the SUN UltraSparc 1/140.1° The bottom-up active chart parser ran into 
memory problems for some very ambiguous sentences and was very slow on many 
of the other sentences (due to the lack of packing). The results are summarized in 
Table 8. 
The implementation of the left-corner parser based on selective memorization and 
goal weakening seems to be substantially more efficient than the chart-based imple- 
mentation of Carroll. The head-corner parser running in left-corner mode is almost 
as fast as this specialized left-corner parser. This suggests that the use of selective 
memorization with goal weakening is on the right track. 
From these experiments, it can be concluded that the head-corner parser is not 
suitable for the Alvey NL Tools grammar. The reason seems to be that for this gram- 
mar the amount of top-down information available through the head-corner table is 
of limited value--typically, too many different lexical head-corners are available for 
parsing a given goal category. For example, for parsing a sentence, possible head- 
corners include auxiliaries, verbs, adverbs, complementizers, pronouns, prepositions, 
determiners, nouns, and conjunctions. (In contrast, in the MiMo2 grammar, only verbs 
can function as the head-corners of sentences.) As a result, the prediction step intro- 
duces too much nondeterminism. A related reason for the poor performance for this 
grammar might be the large amount of lexical ambiguity. The grammar and lexicon 
used in the experiment is compiled from a compact user notation. In the compiled for- 
mat, all disjunctions are spelled out in different rules and lexical entries. As a result, 
many words have a large number of (only slightly different) readings. It may be that 
the head-corner parser is less suitable in such circumstances. This could also explain 
the fact that the head-corner parser performs better on strings then on word-graphs: 
in many respects the generalization to word-graphs is similar to an increase in lexical 
ambiguity. This suggests that the design of the head-corner parser could be improved 
in the prediction step. 
10 Note that Carroll reports on recognition times only, whereas our results include the construction of all 
individual parse trees. For this experiment the left-corner parser used about 163 seconds on 
recognition. In the recognition phase, however, the parser ignores a number of syntactic features, therefore, this number cannot be compared fairly with Carroll's number either. 
454 
van Noord Efficient Head-Corner Parsing 
Acknowledgments 
Some of the introductory material of this 
article is a modified and extended version 
of the introduction of Bouma and van 
Noord (1993). Gosse Bouma, John Carroll, 
Rob Koeling, Mark-Jan Nederhof, John 
Nerbonne and three anonymous ACL 
reviewers provided useful feedback. The 
grammar used in the OVIS experiments is 
written by Gosse Bouma, Rob Koeling and 
myself. Mark-Jan Nederhof implemented 
the bottom-up active chart parser and the 
experimental LR parser. The MiMo2 
grammar was written by Joke Dorrepaal, 
Pim van der Eijk, and myself. I am very 
grateful to John Carroll for his help in 
making the experiments with the Alvey NL 
Tools grammar possible. This research was 
carried out within the framework of the 
Priority Programme Language and Speech 
Technology (TST). The TST-Programme is 
sponsored by NWO (Dutch Organization 
for Scientific Research). 
References 
Alshawi, Hiyan, editor. 1992. The Core 
Language Engine. ACL-MIT press. 
Billot, S. and B. Lang. 1989. The structure of 
shared parse forests in ambiguous 
parsing. In Proceedings of the 27th Annual 
Meeting, pages 143-151, Vancouver. 
Association for Computational 
Linguistics. 
Bouma, Gosse and Gertjan van Noord. 1993. 
Head-driven parsing for lexicalist 
grammars: Experimental results. In 
Proceedings of the Sixth Conference of the 
European Chapter of the Association for 
Computational Linguistics, Utrecht. 
Available from 
http: //www.let.rug.nl / ~vannoord / 
papers/. 
Boves, Lou, Jan Landsbergen, Remko Scha, 
and Gertjan van Noord. 1995. Language 
and Speech Technology. NWO Den Haag. 
Project plan for the NWO Priority 
Programme 'Language and Speech 
Technology'. Available from 
http://odur.let.rug.nl:4321/. 
Carroll, John. 1994. Relating complexity to 
practical performance in parsing with 
wide-coverage unification grammars. In 
Proceedings of the 32th Annual Meeting, 
pages 287-294, Las Cruces, NM. 
Association for Computational 
Linguistics. 
Grover, Claire, John Carroll, and Ted 
Briscoe. 1993. The Alvey natural language 
tools grammar (4th release). Technical 
Report 284, Computer Laboratory, 
Cambridge University, UK. 
Johnson, Mark and Jochen D6rre. 1995. 
Memorization of coroutined constraints. 
In Proceedings of the 33th Annual Meeting, 
pages 100-107, Boston, MA. Association 
for Computational Linguistics. 
Kay, Martin. 1989. Head driven parsing. In 
Proceedings of the Workshop on Parsing 
Technologies, Pittsburg, PA. 
Lang, Bernard. 1989. A generative view of 
ill-formed input processing. In Proceedings 
of the ATR Symposium on Basic Research for 
Telephone Interpretation (ASTI), Kyoto, 
Japan. 
Lavelli, Alberto and Giorgio Satta. 1991. 
Bidirectional parsing of lexicalized tree 
adjoining grammar. In Proceedings of the 
Fifth Conference of the European Chapter of the 
Association for Computational Linguistics, 
Berlin. 
Matsumoto, Y., H. Tanaka, H. Hirakawa, 
H. Miyoshi, and H. Yasukawa. 1983. BUP: 
A bottom up parser embedded in Prolog. 
New Generation Computing, 1(2). 
Nederhof, Mark-Jan and Giorgio Satta. 1994. 
An extended theory of head-driven 
parsing. In Proceedings of the 32th Annual 
Meeting, Las Cruces, NM. Association for 
Computational Linguistics. 
Nederhof, Mark-Jan and Giorgio Satta. 1996. 
Efficient tabular LR parsing. In Proceedings 
of the 34th Annual Meeting, pages 239-246, 
Santa Cruz, CA. Association for 
Computational Linguistics. 
Pereira, Fernando C.N. and Stuart M. 
Shieber. 1987. Prolog and Natural Language 
Analysis. Center for the Study of 
Language and Information, Stanford. 
Pereira, Eernando C.N. and David Warren. 
1980. Definite clause grammars for 
language analysis--a survey of the 
formalism and a comparison with 
augmented transition networks. Artificial 
Intelligence, 13. 
Rosenkrantz, D.J. and P.M. Lewis II. 1970. 
Deterministic left corner parsing. In 
Proceedings of the IEEE Conference of the 11th 
Annual Symposium on Switching and 
Automata Theory, pages 139-152. 
Sahlin, Dan. 1991. An Automatic Partial 
Evaluator for Full Prolog. Ph.D. thesis, The 
Royal Institute of Technology (KTH) 
Stockholm, Sweden. SICS Dissertation 
Series 04. 
Satta, Giorgio and Oliviero Stock. 1989. 
Head-driven bidirectional parsing. A 
tabular method. In Proceedings of the 
Workshop on Parsing Technologies, pages 
455 
Computational Linguistics Volume 23, Number 3 
43-51, Pittsburg, PA. 
Shieber, Stuart M. 1985. Using restriction to 
extend parsing algorithms for 
complex-feature-based formalisms. In 
Proceedings of the 23th Annual Meeting, 
Chicago, IL. Association for 
Computational Linguistics. 
Shieber, Stuart M., Gertjan van Noord, 
Robert C. Moore, and Fernando C.N. 
Pereira. 1990. Semantic-head-driven 
generation. Computational Linguistics, 
16(1). Available from 
http: / / www. let. rug .nl / ~ vannoord / pap ers/. 
Sikkel, Klaas. 1993. Parsing Schemata. Ph.D. 
thesis, Twente University, Enschede. 
(Published in Texts in Theoretical 
Computer Science, An EATCS Series. 
Springer Verlag, 1997.) 
Sikkel, Klaas and Rieks op den Akker. 1992. 
Head-corner chart parsing. In Proceedings 
of Computer Science in the Netherlands (CSN 
'92), Utrecht. 
Sikkel, Klaas and Rieks op den Akker. 1993. 
Predictive head-comer chart parsing. In 
IWPT 3, Third International Workshop on 
Parsing Technologies, pages 267-276, 
Tilburg/Durbuy. 
Tomita, M. 1987. An efficient augmented 
context-free parsing algorithm. 
Computational Linguistics, 13(1-2):31-46. 
van Noord, Gertjan. 1991. Head corner 
parsing for discontinuous constituency. In 
Proceedings of the 29th Annual Meeting, 
Berkeley, CA. Association for 
Computational Linguistics. Available from 
http://www.let.rug.nl/~vannoord/papers/. 
van Noord, Gertjan. 1993. Reversibility in 
Natural Language Processing. Ph.D. thesis, 
University of Utrecht. Available from 
http: //www.let.rug.nl/~vannoord/papers/. 
van Noord, Gerljan. 1994. Head comer 
parsing for TAG. Computational 
Intelligence, 10(4). Available from 
http://www.let.rug.nl/~vannoord/papers/. 
van Noord, Gertjan. 1995. The intersection 
of finite state automata and definite 
clause grammars. In Proceedings of the 33th 
Annual Meeting, Boston, MA. Association 
for Computational Linguistics. Available 
from 
http://www.let.rug.nl / ~vannoord/papers/. 
van Noord, Gertjan, Gosse Bouma, Rob 
Koeling, and Mark-Jan Nederhof. 1996. 
Conventional natural language processing 
in the NWO Priority Programme on 
Language and Speech Technology. 
October 1996 Deliverables. Technical 
Report 28, NWO Priority Programme 
Language and Speech Technology. 
Avaiable from 
http://odur.let.rug.nl:4321/. 
van Noord, Gertjan, Joke Dorrepaal, Pim 
van der Eijk, Maria Florenza, Herbert 
Ruessink, and Louis des Tombe. 1991. An 
overview of MiMo2. Machine Translation, 
6:201-214. Available from 
http: / / www.let, rug .nl / ~ vannoord / 
papers/. 
van Noord, Gertjan, Mark-Jan Nederhof, 
Rob Koeling, and Gosse Bouma. 1996. 
Conventional natural language processing 
in the NWO Priority Programme on 
Language and Speech Technology. 
January 1996 Deliverables. Technical 
Report 22, NWO Priority Programme 
Language and Speech Technology. 
Available from 
http://odur.let.rug.nl:4321/. 
Warren, David S. 1992. Memoing for logic 
programs. Communications of the ACM, 
35(3):94-111. 
456 
