THE COMPLEXITY OF PARSING WITH 
EXTENDED CATEGOP IAL GP AMMARS 
Esther KSnig, Universit£t Stuttgart, Institute for Computational Linguistics, 
Keplerstrasse 17, D-7000 Stuttgart 1, FRG, koenigQds01ilog.bitnet 
Abstract 
Instead of incorporating a gap-percolation mecha- 
nism for handling certain "movement" phenomena, 
the extended categorial grammars contain special in- 
ference rules for treating these problems. The Lam- 
bek categorial grammar is one representative of the 
grammar family under consideration. It allows for a 
restricted use of hypothetical reasoning. We define 
a modification of the Cocke-Younger-Kasami (CKY) 
parsing algorithm which covers this additional de- 
ductive power and analyze its time complexity. 
1 Introduction 
Categorial grammars have become attractive in the 
last decade. One reason might be the rediscovery of 
their conceptual elegance: the combinetory potential 
of a lexeme is stated in direct association with the 
lexeme itself- in the lexicon. A basic (bidirectional) 
categorial grammar B is defined by a set of categories 
C := C0U{zl ~ = x/Y or z =x\y; x, yC C} (C0 
a finite set of basic categories, the category y is re- 
ferred to as the argument category, the category x is 
called value category, complex categories are named 
functor categories), a goal category g (the start sym- 
bol) which is a basic category, a lexicon L which is a 
function from a finite set of lexemes onto a set of fi- 
nite sets of categories, and the two combination rules 
"leftward application" (app\) and "rightward appli- 
cation" (app/) which state how argument positions 
are filled: 
(app\) y, xky ~ x 
An object U --* x where U is a sequence of cate- 
gories is called a sequent. 
This basic concept of categorial grammar may be 
extended by adding more combinatory rules. E.g. 
the rule of functional composition is the incarna- 
tion of the idea how to handle certain phenomena 
of unbounded dependencies: if a funetor category 
x/y finds only an incomplete argument category y/z, 
i.e. which, itself, still lacks an argument z, then these 
two partial categories can be, nevertheless, combined 
into the category x/z, which expresses the fact that 
z is still missing. 
In general, functional composition cannot do its 
job by itself. One needs rules which allow for chang- 
ing the order in which a functor category takes its 
left arguments with regard to its right arguments. 
The use of the type raising rules is one way to obtain 
this goal: 
(tr\) y-~ x\(x/y) (tr/) y xl( \y) 
The first extensive use of the concept of func- 
tional composition in syntax appears in the paper 
lAdes, Steedman 1982\]. The rules of functional coin- 
position and type raising have already been men- 
tioned in \[Lambek 1958\] as being theorems of the 
Lambek calculus. 
To obtain a Lambek categorial grammar L from 
a basic categorial grammar, one has to add the two 
rules of functional abstraction (T is a non-empty se- 
quence of categories): 
:/Y' T --+ x (ab tr\) 
--~ x\y 
(abstr/) T, y --~ x T--~ x/y 
The rule (abstr\) reads: If x can be derived from 
the category sequence 7' under the assumption that 
y has been put in front of T then it holds that x\y 
is derivable from T alone. An example of an L-deri- 
vation in a somewhat abbreviated notation 1 is given 
in figure 22. This analysis is an adaptation of ideas 
due to \[Hepple 1990a\]. 
In the remainder of the paper, we first give a 
brief overview on other approaches to parsing with 
extended categorial grammars. Then the modified 
CKY-algorithm is presented. 
2 General Processing Issues 
in connection with parsers for extended categorial 
grammars, the problem of "spurious ambiguities" or 
derivational equivalence has appeared in a massive 
way: One syntactic (or semantic) reading of a sen- 
tence can be derived in many, many ways. There 
have been several proposals in tile literature how to 
tackle this problem. The first group of approaches 
uses repeated equivalence tests such that the spu- 
rious ambiguities stay local and do not carry over 
1 Only the categories on the righthand sides of the ---~-signs 
are shown. 
2Ignore the extra decoration in this figure for a moment. 
i 233 
globally to the final parse results (\[Karttunen 1989\], 
\[Hepple, Morrill 1989\]). But the presence of local 
spurious ambiguities means that still some superflu- 
ous work has to be done by the parser. The second 
group of :methods uses top-down information to avoid 
equivalent derivations completely. This has been il- 
lustrated for the method of prediclive combination in 
\[Wall, Wittenburg 1989\] where, instead of functional 
composition, rules like the following one are used: 
(PfCl) xll(x~lx3),x~lx4-+ xll(x41x3) 
This :method has the shortcoming of changing the 
set of derivable sequents with regard to the orig- 
inal rule set. Check e.g. the sequent xil(x2lx3), 
x~l(x~lxs), x~lx6, (x61xs)lxa --+ x~. 
Since those extended categorial grammars which 
are based on the explicit use of functional composi- 
tion and type raising seem to be reluctant to leave be- 
hind the problem of spurious ambiguity, we have cho- 
sen the Lambek calculus as a different setup for an 
extended categorial grammar where those rules are 
theorems but not axioms. In the Lambek calculus, 
the problem of spurious ambiguities disappears since 
one can define a normal form for derivations which 
can be carried out on the fly (cf. \[Hepple 1990b\], 
\[KSnig 1989\]). One essential characteristic of this 
normal form is the following: If there is a choice 
between using an application rule and an abstrac- 
tion rule, the abstraction rule is preferred. Another 
requirement is that no spontaneous uses of the ab- 
straction rules do occur (cf. Prawitz normal form, 
\[Prawitz 1965\]). This means that an abstraction rule 
can only be triggered by lexical material. 
3 A chart parser for extended 
categorial grammar 
The CKY-algorithm (\[Aho, Ullman 1972\]) works 
bottom.-up. It uses a chart of well-formed substrings 
in order to avoid redundant work. By defining a 
parser for the basic categorial grammars, we intro- 
duce our particular representation of chart parsing. 
3.1 Chart parsing with a basic cate- 
gorial grammar 
A chart is a set of items. An item is a triple \[x, i,j\] 
which states the fact that the category x has been 
derived from the continuous part of the input string 
between the positions i and j. We use a slightly 
alienated sequent notation where the lefthand side 
of a sequent is the current chart, and the righthand 
side is the goal category g of the grammar. The 
*-sign marks the current scanning position. 
Figure 1 shows the processing steps which can 
be performed with a chart. The rule (axiom) de- 
scribes the conditions for successful termination of 
the parsing process: The chart has been processed 
until its end and there exists an item which covers 
the whole input string of length n. The rule (scan) 
replaces a lexeme in the input string by one of the 
categories it is assigned to in the lexicon. The rule 
(complete\) implements leftward functional applica- 
tion: a leftward looking functor which follows im- 
mediately its argument causes the addition of a new 
item with the value category. For rightward func- 
tional application, this works symmetrically. The 
inference rule-style representation of our algorithm 
abstracts away from details dealing with the appro- 
priate control mechanism which guarantees that e.g. 
all possible (complete\)-steps are performed with one 
item. 
The completer step is the part of the CKY- 
algorithm which determines its cubic time complex- 
ity. The applications of the rules (complete\) and (complete/) 
stay within the same time bound be- 
cause the search for a pair of reducible items works 
as in the original algorithm. In particular, the num- 
ber of items which span the same section of the input 
string has a constant bound which is the number of 
subcategories of the categories occuring in the lexi- 
con. This number is proportional to the size of the 
finite lexicon. 
(L lexicon; w lexeme; a string; I set of items) 
(~xiom) 
-~, 0 * ---~ Z 
n (sca~) 
x e L(w) 
\[ * W , Ot --* Z 
• ~ "--)" Z 
(complete\) 
'L$ J'L'  J 
(complete/) 
• 001--+ Z 
Figure I: A chart-parser Bc for basic categorial 
grammars 
3.2 Extending the basic chart parser 
In the traditional concept of a chart every item is 
visible to every other item in the chart. For the 
Lambek calculus, general accessability is inadequate 
because of its use of hypothetical reasoning: The ab- 
234 2 
t;traction rules assert hypothetical constituents which 
:must only be used in the subproof for the premise 
~equent. Thus, tile interaction of items must be reg- 
ulated in a more fine-grained way. 
Fig. 3 shows a section of an L-derivation where 
~ use of an abstraction rule occurs 3. The complex 
~rgurnent of the emitter category */\[... 
~;riggers the act of a,sserting its own arguments as hy- 
pothetical categories (in a fixed order) in addition to 
a given non-empty category sequence xi...xj. We 
(:all the sequence of hypothetical categories which are 
~tsserted on the lefthand side tile left mini-chart, its 
symmetric counterpart is a right mini-chart. A sub- 
proof is carried out on the basis of these additional 
premise categories. If this subproof yields a category 
y, then x spans the whole sequence of premises under 
consideration. 
To pertbrm all abstraction rule, obviously an in- 
termingling of top-down and bottom-up directed ac- 
t,ions is needed. The sequence xi...xj, i.e. the po- 
:i;itions where the mini-charts are attached, h~ to 
be chosen non-determinist\tally. Instead of assert- 
ing copies of the mini-charts at all positions in the 
current chart, it is more convenient to put the mini- 
,:harts apart and to keel) them accessible from all 
chart positions. An adapted completer rule will built 
additional kems on the basis of the extended chart. 
Since there might be several emitter categories in 
lhe original sequent, it is important to provide the 
a:~serted mini-charts with enough information in of 
der to control their use. In particular, altachrae'al 
chai~zs can arise because, mird-charts ca". be attached 
I.o other mini-charts. 
We assume that each value category of a corn- 
plex argument of an emitter is marked with a unique 
number m. Items also have mfique nlHnbers a. A 
(left) index of an item is now a quadruple <t, m, i, p) 
where t stands tbr the type of miui-.chart the \tern 
belongs to: 1 (left), r (right), or ~, (,~one); ,n is the 
ltumber of the subcategory which caused the asser- 
tion of this \tern; i is a position nulnber relative to 
the (re\hi-)chart wi~h type t and number m; p is the 
~urnber of an item in another part of the <:hart where 
the mini-chart is attached to. Por items which have 
been deriw,'d on the basis of the items given by the 
input string, t := n, m = 0, and p = 0. 
In figure 4, the extended algorithm is presented. 
From the following discussion, various details and, 
in particular, a proof of correctness, are omitted due 
to the laek of space. The rule (compl\s) is roughly 
equivalent to (complete\) in the basic algorithm: two 
items which are adjacent in the same piece of the 
(:hart are combined. The rule (compl\r) allows an 
il;em at tile beginning of a right mini-chart to corn- 
line witll sorne other item with number al on its 
left. The function newlast adds the information 
td~out this new attachment point at the end of the 
3In the following, symmetric cases are mosl, ly omitted. 
right attachment chain of item a2 (see item 6 in fig- 
ure 2 where the item id 0 is replaced by the id 5 
because of'a (compl\ r)-step which combines items 5 
and 9). In order to avoid cycles, the intersection o\[' 
the mini-chart numbers which are fbund by travers- 
ing all the tbur attachment chain's has to be empty. 
The work of the abstraction rule (abstr\) is split into 
the two sub-tasks: The rule (emit\) pertbrms the 
assertion of the mini-charts. The rule (di,sc\) ("dis- 
charge") is a specialization of the completer rule for 
fllnctor categories which are emitters. The condi- 
tions in the rule guarantee the following: Both mini- 
charts which are due to the current emitter must 
have been used completely in deriving y. The rigi~t 
index of the attachment point p4 of y (determined 
by a function ri) must be equal to the left index 
of the emitter. The value category x must not bear 
any information about the involvement of the cur- 
rent mini-charts in its derivation. This is achieved 
by going back to the attachment point of the left 
mini-chart. 
Since each of the O(n) mini-charts fbr an input 
string of length n can only be used once in a deriva- 
tion, and since we currently do not know any rea- 
sonable restrictions, we assume that any one of the 
©(n!) permutations of the mini-charts possibly can 
be used in a specific derivation. For every item in the 
initial chart, O(n!) completer-steps can be possible. 
This means that the whole parser has a time com- 
plexity of O(n ~ x n!). In order to evaluate the result 
for tim given algorit, hm, one has to be aware of tile 
fact that. this parsing procedure can handle n-fold 
extraction from phrases - to speak linguistically. 
If we restrict ourselves to single extraction (this 
can be implemented by checking the length of the 
attachment chain before performing (compl\ r)) then 
the time complexity reduces drastically: The formula 
n!/(n - k')! tbr tile variations of length k of a set of 
length 'n equals n tbr k' = 1. Thus, the time corn- 
plexity of the algorithm is O(n:~). If we want to 
describe two-tbld extraction like in figure 2, i.e. use 
a "2-restricted version of Lc", the algorithm needs 
O(n 4) steps for deriving an input string of length n. 
Conjecture: 1-restricted Lc covers the rules of 
functional compositio n and type raising, whereas 2- 
rest, rioted LC suffices to derive the rules of predictive 
combination. 
4 Conclusion 
We have presented an extension of tile CKY-algo- 
rithm for an instance of the family of extended cate- 
gorial grammars which uses hypothetical reasoning, 
i.e. for the bidirectional Lambek categorial gram- 
mars. Although the current algorithm has an un- 
desirable time complexity in its gener~d case, one 
carl define restricted versions which cover tile kind 
of structures found in linguistic examples with time 
complexity O(n 3) or O(n 4) depending on the. fact 
3 235 
whether one wishes to describe single or double ex- 
traction from phrases. It is an interesting question 
whether better Mgorithms for the general case can 
be found. 
There are many ways to improve this new algo- 
rithm concerning its "average complexity" in practi- 
cal applications. For example, the search for a part- 
ner item in a completer-step could be reduced by 
using appropriate heuristics. Further incre~es in ef- 
ficiency can be expected, if one allows for arbitrary 
embedding of disjunctions and slashes in categories 
(of. \[Benthem 19891, \[Morrill 1989\]) instead of using 
the flat set notation which amounts to a disjunctive 
normal form of categories in the lexicon. 
Theoretical complexity results for a group of cat- 
egorial grammars reaching beyond context-free lan- 
guages are known, as well. From the fact that 
certain extended versions of categorial grammars 
(CCG) are equivalent with the Tree Adjoining Gram- 
mars (TAG) (\[Weir, Josh\[ 19881), the O(n 4 log n)- 
time complexity of TAG-parsers (\[tIarbusch 1989\]), 
in principle, carries over to CCG. 
The connection of the propositional grammar 
calculi as discussed above with a unification de- 
vice for first order germs is done by taking 
care of the variable bindings in the hypotheti- 
cal categories which are asserted by the abstrac- 
tion rules. The recognition problem is still decid- 
able (cf. \[Pareschi 1988\]) although surely not well- 
behaved as this is the case for all non-propositional 
grammars. For example, the recognition problem 
for a. basic categorial grammar witti feature uni- 
fication is NP-complete (\[Morrill 1988\]). Higher 
order versions of categorial grammars like the 
ones being produced in the CUG/UCG-frameworks 
(\[Uszkoreit 1986\],\[Zeevat et at. 1986\]) where vari- 
ables can range over categories are in general un- 
decidable because higher order unification itself is 
undecidable. 
5 Acknowledgements 
The research reported in this paper is supported by 
the LILOG project, and a doctoral fellowship, both 
from IBM Deutschland OmbH, and by the Esprit Ba- 
sic Research Action Project 3175 (DYANA). I thank 
Andreas Eisele for discussion. The responsibility for 
errors resides with me. 
236 4 
"l(~l~p) (n,o,o,o) 
I~) '°'''°) _~ 
Wel% / 
whom 
S n,o,o,o) 
n,O,5,0) 15) 
s/np 
(n,O,l,O) (r,mt,l,5) 
In,O,1,0) _/" \ 
says 
~/((~\~p)\~) 
n,o,2,o) r,m2,1,11) 
(',3) 
r,m2,1,11} 12) 
(n,o,3,o) (r,ml,l,a) "~. 
np s/((s\np)\np) 
(n,o,2,o) s 
1~; ,°,'' / {~,),~,o~ \.. 
poto~ s/(~/~) 
~),o,4,o) 
{r,mL0,0) liebt np /~),rn3,1,6) 
loves (n,o,4,o) Ig, o,~,o~ /---~ 
(z',ml,O,O) (r,~,a,o,o) } r,,~l,t,CJ) (r,~,a,1 ,o) 
6) (8) 
(8\~p)\s (r,,n%0,o) 
(r,m2,1,0) (7) 
Figure 2: Long dist~mee dependency \[tv = (s\np)\np, iv2 = (s\np)\s\] 
x/\[... (y/x~)\xl~)... \x11)/~21\] 
\[. (y/~)\~ip... \~.)/~q 
Y 
~.. ~ x~. ~ ~...~;j 
Figure 3: Schema for the use of the abstraction rules 
5 237 
(axiom) 
(sca,O 
\[ (~)(ol \] I, 
(n,o,o,o) • -~ z 
(n,o,,,,o) 
\[ (~)(o> x 6 L(w) I, /n,o,i,o) 
(It,O,i+1,0) 
\[ O W ~ O~ ---~ Z 
o O~ --+ Z 
( compl\ s ) 
p2=OVp3=O 
, (t2,m2,i2,p3) , (tl,ml,il,pl) , ooz -+ z 
(t2,rn2,i2,p2) (t4,m4,i4,p4) (t4,m4,i4,p4) \]:\[ \] 
I, \[ Y{tI,ml,ii,pl) x\y (t2,m2,i2,p3) • OL --+ Z {t2,m2,i2,p2) (t 4,m4,14,p4) 
(compl\ I) 
chain(p1) F1 chain(p2) N chain(p3) n chain(p4) = {} 
newlast(pl, a2) \[(v)~o, l \[(~\~)~o2~ l \[(~)~o~. l 
I, (*l,ml,il,pl) / , / (ta,ma,i3,p8) / , / (t~,,~.,.,pl) / ,'c~---+ z 
(1,m2,0,p2} J | (t4,m4,i4,p4} J k (t4,m4,i4,p4} J 
, \[(y)~oi~- -1 '\[ (~\y)(o~ 1 _, 
L{1,'~2,°,P~)J k it4, , ,p~ j 
(compl\ r) 
newlast(p4, al) 
I, (tl,ml,il,pl) , (r,ma,O,p3} , (tl,ml,il,pl\] , IDOL --~ Z 
(t2,m2,i2,p2} {t4,m4,i4,p4) (t4,m4,i4,p4) 
r (tl,ml,il,pl) (r,ma,o,pa} , ,c~ --+ z 
(t2,m2,i2,p2} (Z4,m4,i4,p4} 
(emit\) 
I, 1,ma,~ ,o) ,.,., (1,ma,l,O) , (r ..... o,o) , ..., /r,,na,q-1,0) 1,ma,p- 1,0) (1,m3,O,O) (r,m3, 1,0) (r,mS,q,O) 
x\\[... (vl,,al/x~)\'~,,)... \x~)/~\] 
tl,ml,il,pl) • OL "-+ Z 
2,m2,i2,p2) \[ ~X\[... (v(~al/~2e)\~*~)... i~11)/~=~1 
I, (tl,ml,il,pl) • o~ --+ z 
(t2,m2,i2,p2) 
(disc\) 
~±(p4) = (,1, ,,~1, il, ~i> 
.... 
(tl,ml,il,pl) • Ot --~ Z 
(r,mg,q,p4) (t2,m2,i2,p2) 
o~ --+ Z 
Figure 4: Chart-parser Lc for Lambek categorial grammars (L lexicon; w lexeme; a string; a item id; I set of 
items) 
238 6 

References 

lAdes, Steedman 1982\] Anthony E. Ades, Mark J. 
Steedman. On the order of words. Linguistics 
and Philosophy, 4:517-558, 1982. 

\[Aho, Ullman 1972\] Alfred V. Aho, Jeffrey D. Ull- 
man. The Theory of Parsing, Translation and 
Compiling. Prentice Hall, 1972. 

\[Benthem 1989\] aohan van Benthem. Categorial 
grammar and type theory. Linguistics and Phi- 
Iosophy, 1989. 

\[Harbusch 1989\] Karin ttarbusch. Effiziente Struk- 
turanalyse naliirlicher Sprache mit Tree Adjoining 
Grammars (Efficient Parsing of Natural Language 
with Tree Adjoining Grammars). PhD thesis, Uni- 
versit£t des Saarlandes, Saarbriicken, 1989. 

\[Hepple 1990a\] Mark Hepple. GramrnaticM rela- 
tions and the Lambek calculus. In Proc. of 
the Symposium on Discontinous Constituency, 
Tilburg, The Netherlands, Institute for Language 
Technology and Artificial Intelligence, 1990. 

\[Hepple 1990b\] Mark Hepple. Normal form theorem 
proving for the Lambek calculus. This volume. 

\[Hepple, Morrill 1989\] Mark }tepple, Olyn Morrill. 
Parsing and derivational equivalence. In Proc. 
EACL , Manchester, 1989. 

\[Karttunen 1989\] Lauri Karttunen. Radical lexical- 
ism. In Mark Baltin, Anthony Kroeh, eds., Alter- 
~zalive Conceptions of Phrase Stvact'are, Univer- 
sity of Chicago Press, 1989. 

\[K6nig 1989\] Esther Kbnig. Parsing as natural de- 
duction. In Pr'oc. ACL , Vancouver, B.C., 1989. 

\[Lambek 1958\] Joachim Lambek. The mathemat- 
ics of sentence structure. American Mathematical 
Monthly, 65:154-170, 1958. 

\[Morrill 1988\] Glyn Morrill. £'ztraction and Coor- 
dination in Phrase S'tr~cture Grammar aTtd Cat- 
egorial Grammar. PhD thesis, University of Ed- 
inburgh, Centre for Cognit.ive Science, Edinburgh, 
Scotland, i988. 

\[Morrill 1989\] Gtyn Morrill. Grammar as Logic. 
Technical Report EUCCS/I{P-34, Centre for Cog- 
nitive Science, University of Edinburgh, Edin- 
burgh, Scotland, 1989. 

\[Pareschi 1988\] Remo Pareschi. A definite clause 
version of categorial grammar. In Proc. ACL , 
Buffalo, N.Y., 1988. 

\[Prawitz 1965\] Dag PrawRz. Natural Deduction. 
Alqvist and Wiksell, Uppsata, 1965. 

\[Uszkoreit 1986\] Hans Uszkoreit. Categorial unifica- 
tion grammars. In Proc. COLING , Bonn, 1986. 

\[Wall, Wittenburg 1989\] Robert, E. WM1, Kent Wit- 
tenburg. Predict.ire normM forms tbr function 
,::omposition in cat, egorial granim~rs. In Proc. of 
the I'a~e~alional Workshop on Parsing Technolo- 
gies, Carnegie Mellon University, 1989. 

\[Weir, Josh\[ 1988\] D.J. Weir, A.K. Josh\[. Combi- 
natory categorial grammmars: Generative power 
and relationship to linear context-free rewriting 
systems. In Proc. ACL , Buffalo, N.Y., 1988. 

\[Zeevat et al. 1986\] Henk Zeevat, Ewan Klein, Jo 
Calder. Unification Categorial Grammar. Tech- 
Meal Report EUCCS/R.P-21, Centre for Cogni- 
tive Science, University of Edinburgh, Edinburgh, 
Scotland, 1986. 
