POLYNOMIAL TIME PARSING OF COMBINATORY CATEGORIAL 
GRAMMARS* 
K. Vijay-Shanker 
Department of CIS 
University of Delaware 
Newark, DE 19716 
David J. Weir 
Department of EECS 
Northwestern University 
Evanston, IL 60208 
Abstract 
In this paper we present a polynomial time pars- 
ing algorithm for Combinatory Categorial Grammar. 
The recognition phase extends the CKY algorithm for 
CFG. The process of generating a representation of 
the parse trees has two phases. Initially, a shared
forest is built that encodes the set of all derivation trees
for the input string. This shared forest is then pruned
to remove all spurious ambiguity. 
1 Introduction 
Combinatory Categorial Grammar (CCG) [7, 5] is an 
extension of Classical Categorial Grammar in which 
both function composition and function application 
are allowed. In addition, forward and backward 
slashes are used to place conditions on the relative 
ordering of adjacent categories that are to be combined.
There has been considerable interest in parsing
strategies for CCG [4, 11, 8, 2]. One of the major 
problems that must be addressed is that of spurious 
ambiguity. This refers to the possibility that a CCG 
can generate a large number of (exponentially many) 
derivation trees that assign the same function argu- 
ment structure to a string. In [9] we noted that a CCG
can also generate exponentially many genuinely ambiguous
(non-spurious) derivations. This constitutes
a problem for the approaches cited above since it results
in their respective algorithms taking exponential 
time in the worst case. The algorithm we present is 
the first known polynomial time parser for CCG. 
The parsing process has three phases. Once the 
recognizer decides (in the first phase) that an input 
can be generated by the given CCG the set of parse 
*This work was partially supported by NSF grant IRI-8909810.
We are very grateful to Aravind Joshi, Michael Niv,
Mark Steedman and Kent Wittenburg for helpful discussions. 
trees can be extracted in the second phase. Rather 
than enumerating all parses, in Section 3, we describe 
how they can be encoded by means of a shared forest 
(represented as a grammar) with which an exponential
number of parses are encoded using a polynomially
bounded structure. This shared forest encodes 
all derivations including those that are spuriously am- 
biguous. In Section 4.1, we show that it is possible to 
modify the shared forest so that it contains no spuri- 
ous ambiguity. This is done (in the third phase) by 
traversing the forest, examining two levels of nodes at 
each stage, detecting spurious ambiguity locally. The 
three stage process of recognition, building the shared 
forest, and eliminating spurious ambiguity takes poly- 
nomial time. 
1.1 Definition of CCG 
A CCG, G, is denoted by (V_T, V_N, S, f, R), where V_T is
a finite set of terminals (lexical items), V_N is a finite
set of nonterminals (atomic categories), S is a distinguished
member of V_N, f is a function that maps
elements of V_T to finite sets of categories, and R is a finite
set of combinatory rules. Combinatory rules have
the following form. In each of the rules x, y, z_1, ..., z_n are
variables and |_i ∈ {\, /}. 
1. Forward application: x/y y → x 
2. Backward application: y x\y → x 
3. Forward composition (for n ≥ 1): 
   x/y y|_1 z_1 |_2 z_2 ... |_n z_n → x|_1 z_1 |_2 z_2 ... |_n z_n 
4. Backward composition (for n ≥ 1): 
   y|_1 z_1 |_2 z_2 ... |_n z_n x\y → x|_1 z_1 |_2 z_2 ... |_n z_n 
In the above rules, x | y is the primary category
and the other left-hand-side category is the secondary
category. Also, we refer to the leftmost nonterminal
of a category as the target of the category. We assume
that categories are parenthesis-free. The results presented
here, however, generalize to the case of fully
parenthesized categories. The version of CCG used
in [7, 5] allows for the possibility that the use of these
combinatory rules can be restricted. Such restrictions
limit the possible categories that can instantiate the
variables. We do not consider this possibility here,
though the results we present can be extended to handle
these restrictions. 
Derivations in a CCG involve the use of the combinatory
rules in R. Let ⇒ be defined as follows,
where T_1 and T_2 are strings of categories and terminals
and c, c_1, c_2 are categories. 

• If c_1 c_2 → c is an instance of a rule in R then
T_1 c T_2 ⇒ T_1 c_1 c_2 T_2. 

• If c ∈ f(a) for some a ∈ V_T and category c then
T_1 c T_2 ⇒ T_1 a T_2. 

The string language generated is defined as
L(G) = { w | S ⇒* w, w ∈ V_T* }. 
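To make these rules concrete, the following sketch (our own illustrative encoding, not part of the paper) represents a parenthesis-free category as a target nonterminal plus a sequence of slashed arguments, and implements forward application and forward composition:

```python
# Illustrative encoding (not from the paper): a parenthesis-free
# category is (target, args), where args is a tuple of (slash, atom)
# pairs read left to right; e.g. S/A\B is ("S", (("/", "A"), ("\\", "B"))).

def forward_apply(primary, secondary):
    """Forward application x/y y -> x."""
    tgt, args = primary
    s_tgt, s_args = secondary
    if args and args[-1] == ("/", s_tgt) and not s_args:
        return (tgt, args[:-1])
    return None  # rule not applicable

def forward_compose(primary, secondary):
    """Forward composition x/y y|1z1 ... |nzn -> x|1z1 ... |nzn (n >= 1)."""
    tgt, args = primary
    s_tgt, s_args = secondary
    if args and args[-1] == ("/", s_tgt) and len(s_args) >= 1:
        return (tgt, args[:-1] + s_args)
    return None
```

Note that composition appends the secondary category's arguments wholesale, which is why categories behave like stacks (Section 1.2).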
1.2 Context-Free Paths 
In Section 2 we describe a recognition algorithm that 
involves extending the CKY algorithm for CFG. The 
differences between the CKY algorithm and the one 
presented here result from the fact that the derivation 
tree sets of CCG have more complicated path sets than 
the (regular) path sets of CFG tree sets. Consider
the set of CCG derivation trees of the form shown in
Figure 1 for the language { ww | w ∈ {a, b}* }. 
Due to the nature of the combinatory rules, cate- 
gories behave rather like stacks since their arguments 
are manipulated in a last-in-first-out fashion. This has 
the effect that the paths can exhibit nested dependen- 
cies as shown in Figure 1. Informally, we say that CCG 
tree sets have context-free paths. Note that the tree 
sets of CFG have regular paths and cannot produce 
such tree sets. 
2 Recognition of CCG 
The recognition algorithm uses a 4 dimensional array
L for the input a_1 ... a_n. In entries of the array
L we cannot store complete categories since exponentially
many categories can derive the substring 
[Figure 1: Trees with context-free paths] 
a_i ... a_j,¹ so it is necessary to store categories carefully.
It is possible, however, to share parts of categories between
different entries in L. This follows from the fact
that the use of a combinatory rule depends only on
(1) the target category of the primary category of the
rule; (2) the first argument (suffix of length 1) of the
primary category of the rule; (3) the entire (bounded)
secondary category. Therefore, we need only find this
(bounded) information in each array entry in order
to determine whether a rule can be used. Entries of
the form ((A, α), T) are stored in L[i, j][p, q]. This encodes
all categories whose target is A, whose suffix is α, and
that derive a_i ... a_j. The tail T and the indices p
and q are used to locate the remaining part of these
categories. Before describing precisely the information
that is stored in L we give some definitions. 
If α ∈ ({\, /}V_N)^n then |α| = n. Given a CCG,
G = (V_T, V_N, S, f, R), let k_1 be the largest n such
that R contains a rule whose secondary category is
y|_1 z_1 |_2 z_2 ... |_n z_n, and let k_2 be the maximum of k_1 and
all n such that there is some c ∈ f(a) with c = Aα
and |α| = n. 
In considering how categories that are derived in 
the course of a derivation should be stored we have 
two cases. 
1. Categories that are either introduced by lexical
items appearing in the input string or whose length
is less than k_1 and could therefore be secondary categories
of a rule. Thus all categories whose length is
bounded by k_2 are encoded in their entirety within a single
array entry. 

2. All other categories are encoded with a sharing
mechanism in which we store up to k_1 arguments locally,
together with an indication of where the remaining
arguments can be found. 

¹This is possible since the length of the category can be linear
with respect to j - i. Since previous approaches to CCG parsing
store entire categories they can take exponential time. 
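As a simplified rendering of this sharing scheme (our own sketch; the paper's entries carry more indices), an entry holds a target, a bounded local suffix, and a tail argument together with the coordinates of the cell storing the rest; a full category is recovered by chasing tails:

```python
# Simplified sketch of suffix sharing (our own rendering).  An entry is
# (target, suffix, tail, pq): if tail is None, suffix is the whole
# argument sequence; otherwise pq locates entries whose suffix ends in
# `tail` and which encode the non-local part of the category.

def expand(target, suffix, tail, pq, L):
    """Recover one full argument sequence from a shared entry."""
    if tail is None:
        return suffix
    for (t2, s2, tail2, pq2) in L[pq]:
        if t2 == target and s2 and s2[-1] == tail:
            # The referenced entry expands to alpha' + tail; the full
            # category is alpha' followed by the local suffix.
            return expand(t2, s2, tail2, pq2, L)[:-1] + suffix
    raise KeyError("dangling tail pointer")

# S/A/B stored whole in one cell; S/A/C shares the prefix S/A with it.
L = {(0, 0): [("S", ["/A", "/B"], None, None)],
     (1, 2): [("S", ["/C"], "/B", (0, 0))]}
```

Only the bounded suffix and the tail are inspected when a rule is applied, so expansion is never needed during recognition, only when reading categories back out.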
Next, we give a proposition that characterizes when 
an entry is included in the array by the algorithm. 
An entry ((A, α), T) ∈ L[i, j][p, q], where A ∈ V_N and
α ∈ ({\, /}V_N)*, when one of the following holds. 

If T = γ then γ ∈ {\, /}V_N, 1 ≤ |α| ≤ k_1, and for
some α' ∈ ({\, /}V_N)* the following hold: 
(1) Aα'α ⇒* a_i ... a_{p-1} Aα'γ a_{q+1} ... a_j. 
(2) Aα'γ ⇒* a_p ... a_q. 
(3) Informally, the category Aα'γ in (1) above is "derived"
from Aα'α such that there is no intervening
point in the derivation before reaching Aα'γ at which
all of the suffix α of Aα'α has been "popped". 

Alternatively, if T = - then 0 ≤ |α| < k_1 + k_2,
(p, q) = (0, 0), and Aα ⇒* a_i ... a_j. Note that we
have |α| < k_1 + k_2 rather than |α| ≤ k_2 (as might
have been expected from the discussion above). This
is the case because a category whose length is strictly
less than k_2 can, as a result of function composition,
result in a category of length < k_1 + k_2. Given the
way that we have designed the algorithm below, the
latter category is stored in this (non-sharing) form. 
2.1 Algorithm 
If c ∈ f(a_i) for some category c such that c = Aα,
then include the tuple ((A, α), -) in L[i, i][0, 0]. 

For some i and j, 1 ≤ i < j ≤ n, consider each rule
x/y y|_1 z_1 ... |_m z_m → x|_1 z_1 ... |_m z_m.² 

For some k, i ≤ k < j, we look for some ((B, β), -) ∈
L[k+1, j][0, 0], where |β| = m (corresponding to
the secondary category of the rule), and we look for
((A, α/B), T) ∈ L[i, k][p, q] for some α, T, p and q
(corresponding to the primary category of the rule).
From these entries in L we know that for some α',
Aα'α/B ⇒* a_i ... a_k and Bβ ⇒* a_{k+1} ... a_j. 

²Backward composition and application are treated in the
same way as this rule, except that all occurrences below of i
and k are swapped with occurrences of k+1 and j, respectively. 

Thus, by the combinatory rule given above we have
Aα'αβ ⇒* a_i ... a_j and we should store an encoding
of the category Aα'αβ in L[i, j]. This encoding
depends on α', α, β, and T. 

• T = - 
If |αβ| < k_1 + k_2 then (case 1a) add ((A, αβ), -) to
L[i, j][0, 0]. Otherwise (case 1b) add ((A, β), /B) to
L[i, j][i, k]. 
• T ≠ - and m > 1 
The new category is longer than the one found in
L[i, k][p, q]. If α ≠ ε then (case 2a) add ((A, β), /B)
to L[i, j][i, k]; otherwise (case 2b) add ((A, β), T) to
L[i, j][p, q]. 
• T ≠ - and m = 1 (case 3) 
The new category has the same length as the one found
in L[i, k][p, q]. Add ((A, αβ), T) to L[i, j][p, q]. 
• T = γ ≠ - and m = 0 
The new category has a length one less than the
one found in L[i, k][p, q]. If α ≠ ε then (case 4a)
add ((A, α), T) to L[i, j][p, q]. Otherwise (case 4b),
since α = ε we have to look for the part of the category
that is not stored locally in L[i, k][p, q]. This may be
found by looking in each entry L[p, q][r, s] for each
((A, β'γ), T'). We know that either T' = - or β' ≠ ε,
and we add ((A, β'), T') to L[i, j][r, s]. Note that for some
α'', Aα''β'γ ⇒* a_p ... a_q and Aα''β'/B ⇒* a_i ... a_k,
and thus by the combinatory rule above Aα''β' ⇒*
a_i ... a_j. 
As in the case of the CKY algorithm we have
loop statements that allow i, j to range from 1 through
n such that the length of the spanned substring starts
from 1 (i = j) and increases to n (i = 1 and j = n).
When we consider placing entries in L[i, j] (i.e., to
detect whether a category derives a_i ... a_j) we have
to consider whether there are two subconstituents (to
simplify the discussion let us consider only forward
combinations) which span the substrings a_i ... a_k and
a_{k+1} ... a_j. Therefore we need to consider all values
for k between i and j - 1 and consider the entries
in L[i, k][p, q] and L[k+1, j][0, 0] where i ≤ p ≤ q ≤ k
or p = q = 0. 
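The loop structure can be pictured with a deliberately naive CKY-style recognizer that stores whole categories (a sketch of the control flow only: it omits backward rules and the sharing mechanism above, so unlike the paper's algorithm it can take exponential time):

```python
# Naive CKY-style CCG recognizer (forward application/composition only;
# whole categories stored, so this sketch can blow up on long
# categories -- it illustrates the loop structure, not the sharing).

def recognize(tokens, lexicon, start=("S", ())):
    n = len(tokens)
    # chart[(i, j)] = set of categories deriving tokens[i..j] inclusive
    chart = {(i, i): set(lexicon[a]) for i, a in enumerate(tokens)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            cell = chart.setdefault((i, j), set())
            for k in range(i, j):
                for (xt, xa) in chart.get((i, k), ()):
                    for (yt, ya) in chart.get((k + 1, j), ()):
                        # x/y y|1z1..|nzn -> x|1z1..|nzn  (n = 0 is
                        # application, n >= 1 is composition)
                        if xa and xa[-1] == ("/", yt):
                            cell.add((xt, xa[:-1] + ya))
    return start in chart.get((0, n - 1), set())

# Hypothetical two-word lexicon: "a" is S/B, "b" is B.
lexicon = {"a": {("S", (("/", "B"),))}, "b": {("B", ())}}
```

Here `recognize(["a", "b"], lexicon)` succeeds by one forward application, while `recognize(["b", "a"], lexicon)` fails.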
The above algorithm can be shown to run in time
O(n⁷) where n is the length of the input. In case 4b
we have to consider all possible values for r, s between
p and q. The complexity of this case dominates the
complexity of the algorithm since the other cases
involve fewer variables (i.e., r and s are not involved).
Case 4b takes time O((q - p)²) and with the loops for
i, j, k, p, q ranging from 1 through n the time complexity
of the algorithm is O(n⁷). 
However, this algorithm can be improved to obtain
a time complexity of O(n⁶) by using the same method
employed in [9]. This improvement is achieved by
moving part of case 4b outside of the k loop, since
looking for ((A, β'γ), T') in L[p, q][r, s] need not be
done within the k loop. The details of the improved
method may be found in [9], where parsing of Linear
Indexed Grammar (LIG) was considered. Note that
O(n⁶) (which we achieve with the improved method)
is the best known result for parsing Tree Adjoining
Grammars, which generate the same class of languages
as CCG and LIG. 
A[··α] → A_1[α_1] ... A_{i-1}[α_{i-1}] A_i[··β] A_{i+1}[α_{i+1}] ... A_n[α_n] 
A[α] → a 

The first form of production is interpreted as: if a
nonterminal A is associated with some stack with the
sequence α on top (denoted [··α]), it can be rewritten
such that the i-th child inherits this stack with β replacing
α. The remaining children inherit the bounded
stacks given in the production. 
The second form of production indicates that if a nonterminal
A has a stack containing a sequence α then
it can be rewritten to a terminal symbol a. 
The language generated by a LIG is the set of strings
derived from the start symbol with an empty stack. 
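As a toy illustration of the formalism (our own example, unrelated to the shared-forest grammar), here is a LIG for { aⁿbⁿcⁿ : n ≥ 0 }, in which the stack is passed to a single child and one index g is pushed per a:

```python
# Toy LIG for { a^n b^n c^n : n >= 0 } (our own example).  The single
# stack travels to one child; a depth bound keeps enumeration finite.

def derive(symbol, stack, depth):
    """Yield terminal strings derivable from (symbol, stack)."""
    if depth < 0:
        return
    if symbol == "S":
        # S[..] -> a S[..g] c   (push g on the stack passed down)
        for s in derive("S", stack + ["g"], depth - 1):
            yield "a" + s + "c"
        # S[..] -> T[..]
        yield from derive("T", stack, depth - 1)
    else:  # symbol == "T"
        if stack and stack[-1] == "g":
            # T[..g] -> b T[..]  (pop g)
            for s in derive("T", stack[:-1], depth - 1):
                yield "b" + s
        elif not stack:
            yield ""  # T[] -> epsilon

sorted(set(derive("S", [], 8)))  # all strings within the depth bound
```

Each a pushed a g that must later be popped by a b, so the counts of a, b, and c agree; this is the stack discipline that lets a LIG share suffixes of CCG categories.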
3 Recovering All Parses 
At this stage, rather than enumerating all the parses, 
we will encode these parses by means of a shared forest 
structure. The encoding of the set of all parses must be 
concise enough so that even an exponential number of 
parses can be represented by a polynomial sized shared
forest. Note that this is not achieved by any previously
presented shared forest representation for CCG [8]. 
3.1 Representing the Shared Forest 
Recently, there has been considerable interest in the 
use of shared forests to represent ambiguous parses 
in natural language processing [1, 8]. Following Billot
and Lang [1], we use grammars as a representation
scheme for shared forests. In our case, the gram- 
mars we produce may also be viewed as acyclic and-or 
graphs which is the more standard representation used 
for shared forests. 
The grammatical formalism we use for the repre- 
sentation of shared forests is Linear Indexed Grammar
(LIG).³ Like Indexed Grammars (IG), in a LIG stacks 
containing indices are associated with nonterminals, 
with the top of the stack being used to determine the 
set of productions that can be applied. Briefly, we 
define LIG as follows. 
If α is a sequence of indices and γ is an index, we
use the notation A[αγ] to represent the case where a
stack is associated with a nonterminal A having γ on
top with the remaining stack being α. We use the 
following forms of productions. 
³It has been shown in [10, 3] that LIG and CCG generate
the same class of languages. 
3.2 Building the Shared Forest 
We start building the shared forest after the recognizer 
has completed the array L and decided that a given 
input al ... an is well-formed. In recovering the parses, 
having established that some α is in an element of L,
we search other elements of L to find two categories
that combine to give α. Since categories behave like
stacks, the use of CFG for the representation of the set
of parse trees is not suitable. For our purposes the LIG
formalism is appropriate since it involves stacks and
productions describing how a stack can be decomposed
based on only its top and bottom elements. 
We refer to the LIG representing the shared forest
as G_sf. The indices used in G_sf have the form
(A, α, i, j). The terminals used in G_sf are names for
the combinatory rule or the lexical assignment used
(thus derived terminal strings encode derivations in
G). For example, the terminal F_m indicates the use
of the forward composition rule x/y y|_1 z_1 |_2 z_2 ... |_m z_m
→ x|_1 z_1 ... |_m z_m, and (c, a) indicates the lexical
assignment of the category c to the symbol a. We use
one nonterminal, P. 
An input a_1 ... a_n is accepted if it is the case that
((S, ε), -) ∈ L[1, n][0, 0]. We start by marking this
entry. By marking an entry ((A, α), T) ∈ L[i, j][p, q] 
we are predicting that there is some derivation tree, 
rooted with the category S and spanning the input 
a_1 ... a_n, in which a category represented by this entry
will participate. Therefore at some point we will 
have to consider this entry and build a shared forest 
to represent all derivations from this category. 
Since we start from ((S, ε), -) ∈ L[1, n][0, 0] and 
proceed to build a (representation of) derivation trees 
in a top down fashion we will have loop statements 
that vary the substring spanned (a_i ... a_j) from the
largest possible (i.e., i = 1 and j = n) to the smallest
(i.e., i = j). Within these loop statements the algorithm
(with some particular values for i and j) will
consider marked entries, say ((A, α'), T) ∈ L[i, j][p, q]
(where i ≤ p ≤ q ≤ j or p = q = 0), and will build
representations of all derivations from the category
(specified by the marked entry) such that the input
spanned is a_i ... a_j. Since ((A, α'), T) is a representation
of possibly more than one category, several cases
arise depending on α' and T. All these cases try to uncover
the reasons why the recognizer placed this entry
in L[i, j][p, q]. Hence the cases considered here are inverses
of the cases considered in the recognition phase
(and noted in the algorithm given below). 
Mark ((S, ε), -) in L[1, n][0, 0]. 
By varying i from 1 to n, j from n to i, and for all appropriate
values of p and q, if there is a marked entry,
say ((A, α'), T) ∈ L[i, j][p, q], then do the following. 

• Type 1 Production (inverse of 1a, 3, and 4a) 
If for some k such that i ≤ k < j, some α, β such
that α' = αβ, and B ∈ V_N we have ((A, α/B), T) ∈
L[i, k][p, q] and ((B, β), -) ∈ L[k+1, j][0, 0], then let
p be the production 

P[··(A, α', i, j)] → F_m P[··(A, α/B, i, k)] P[(B, β, k+1, j)] 

where m = |β|. If p is not already present in G_sf then
add p and mark ((A, α/B), T) ∈ L[i, k][p, q] as well as
((B, β), -) ∈ L[k+1, j][0, 0]. 
• Type 2 Production (inverse of 1b and 2a) 
If for some k such that i ≤ k < j, and α, B, T', r, s,
we have ((A, α/B), T') ∈ L[i, k][r, s] where (p, q) =
(i, k), ((B, α'), -) ∈ L[k+1, j][0, 0], T = /B, and the
lengths of α and α' meet the requirements on the corresponding
strings in cases 1b and 2a of the recognition
algorithm, then let p be the production 

P[··(A, α/B, i, k)(A, α', i, j)] → 
F_m P[··(A, α/B, i, k)] P[(B, α', k+1, j)] 

where m = |α'|. If p is not already present in G_sf
then add p and mark ((A, α/B), T') ∈ L[i, k][r, s] and
((B, α'), -) ∈ L[k+1, j][0, 0]. 
• Type 3 Production (inverse of 2b) 
If for some k such that i ≤ k < j, and some B,
it is the case that ((A, /B), T) ∈ L[i, k][p, q] and
((B, α'), -) ∈ L[k+1, j][0, 0] where |α'| > 1, then
let p be the production 

P[··(A, α', i, j)] → F_m P[··(A, /B, i, k)] P[(B, α', k+1, j)] 

where m = |α'|. If p is not already present in G_sf
then add p and mark ((A, /B), T) ∈ L[i, k][p, q] and
((B, α'), -) ∈ L[k+1, j][0, 0]. 
• Type 4 Production (inverse of 4b) 
If for some k such that i ≤ k < j, and some
B, α', r, s, γ', we have ((A, /B), γ') ∈ L[i, k][r, s],
((A, α'γ'), T) ∈ L[r, s][p, q], and ((B, ε), -) ∈
L[k+1, j][0, 0], then let p be the production 

P[··(A, α', i, j)] → 
F_0 P[··(A, α'γ', r, s)(A, /B, i, k)] P[(B, ε, k+1, j)] 

If p is not already present in G_sf then add p and
mark ((A, /B), γ') ∈ L[i, k][r, s] and ((B, ε), -) ∈
L[k+1, j][0, 0]. 
• Type 5 Production 
If j = i, then it must be the case that T = - and there
is a lexical assignment assigning the category Aα' to
the input symbol a_i. Therefore, if it has not
already been included, output the production 

P[(A, α', i, i)] → (Aα', a_i) 
The number of terminals and nonterminals in the
grammar is bounded by a constant. The number of indices
and the number of productions in G_sf are O(n⁵).
Hence the shared forest representation we build is
polynomial with respect to the length of the input, n,
despite the fact that the number of derivation trees
could be exponential. 
We will now informally argue that G_sf can be built
in time O(n⁷). Suppose an entry ((A, α'), T) is in
L[i, j][p, q], indicating that for some β the category
Aβα' derives the substring a_i ... a_j. The method
outlined above will build a shared forest structure to
represent all such derivations. In particular, we will
start by considering a production whose left hand side
is given by P[··(A, α', i, j)]. It is clear that the introduction
of productions of type 4 dominates the time
complexity since this case involves three other variables
(over input positions), i.e., r, s, k; whereas the
introduction of the other types of production involves only
one new variable k. Since we have to consider all possible
values for r, s, k within the range i through j, this
step will take O((j - i)³) time. With the outer loops
for i, j, p, and q allowing these indices to range from 1
through n, the time taken by the algorithm is O(n⁷). 
Since the algorithm given here for building the
shared forest simply finds the inverses of moves made
in the recognition phase, we could have modified the
recognition algorithm so as to output appropriate G_sf
productions during the process of recognition without
altering the asymptotic complexity of the recognizer.
However, this would cause the introduction of useless
productions, i.e., those that describe subderivations which
do not partake in any derivation from the category S
spanning the entire input string a_1 ... a_n. 
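To see how a grammar-encoded forest packs many derivations into few productions, the following generic sketch (schematic; the real G_sf nodes are the stack-valued nonterminals above) unfolds a forest given as a map from nodes to alternative expansions:

```python
# Generic enumeration of derivations from a shared-forest grammar
# given as {node: [(rule_name, children), ...]}; an expansion with no
# children is a leaf.  (Schematic rendering, not the paper's G_sf.)

def derivations(node, forest):
    """Yield derivation trees (rule, subtree, ...) encoded at node."""
    for rule, children in forest[node]:
        if not children:
            yield (rule,)
            continue
        def combine(kids):
            # cartesian product of the children's derivation sets
            if not kids:
                yield ()
                return
            for first in derivations(kids[0], forest):
                for rest in combine(kids[1:]):
                    yield (first,) + rest
        for subs in combine(children):
            yield (rule,) + subs

# Hypothetical forest: 3 nodes encode 3 distinct derivations.
forest = {
    "root": [("F1", ["x", "y"]), ("F2", ["x", "x"])],
    "x": [("lex_a", [])],
    "y": [("lex_b", []), ("lex_c", [])],
}
```

Because shared subtrees (here "x") are expanded on demand rather than copied, the forest stays polynomial even when the number of enumerated derivations is exponential.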
4 Spurious Ambiguity 
We say that a given CCG, G, exhibits spurious am- 
biguity if there are two distinct derivation trees for 
a string w that assign the same function argument 
structure. Two well-known sources of such ambiguity 
in CCG result from type raising and the associativity 
of composition. Much attention has been given to the 
latter form of spurious ambiguity and this is the one 
that we will focus on in this paper. 
To illustrate the problem, consider the following 
string of categories. 
A_1/A_2 A_2/A_3 ... A_{n-1}/A_n 
Any pair of adjacent categories can be combined using 
a composition rule. The number of such derivations 
is given by the Catalan series and is therefore expo- 
nential in n. We return a single representative of the 
class of equivalent derivation trees (arbitrarily chosen 
to be the right branching tree in the later discussion). 
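The growth is easy to check numerically via the standard Catalan recurrence (the code is illustrative, not from the paper):

```python
# Number of derivation trees for a chain of n composable categories:
# the Catalan number C(n-1), by the standard recurrence
# C(0) = 1, C(m) = sum_{i < m} C(i) * C(m-1-i).

def catalan(m):
    c = [1] * (m + 1)
    for k in range(1, m + 1):
        c[k] = sum(c[i] * c[k - 1 - i] for i in range(k))
    return c[m]

[catalan(n - 1) for n in range(2, 8)]  # 1, 2, 5, 14, 42, 132 derivations
```

All of these derivations assign the same function argument structure, so all but one representative are spurious.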
4.1 Dealing with Spurious Ambiguity 
We have discussed how the shared forest representation,
G_sf, is built from the contents of array L. The
recognition algorithm does not consider whether some
of the derivations built are spuriously equivalent, and
this is reflected in G_sf. We show how productions of
G_sf can be marked to eliminate spuriously ambiguous
derivations. Let us call this new grammar G_ns. 
As stated earlier, we are only interested in detecting 
spuriously equivalent derivations arising from the as- 
sociativity of composition. Consider the example in- 
volving spurious ambiguity shown in Figure 2. This 
example illustrates the general form of spurious am- 
biguity (due to associativity of composition) in the 
derivation of a string made up of contiguous substrings 
a_{i_1} ... a_{j_1}, a_{i_2} ... a_{j_2}, and a_{i_3} ... a_{j_3},
resulting in a category A_1 α_1 α_2 α_3. For the sake of
simplicity we assume that each combination indicated is
a forward combination and hence i_2 = j_1 + 1 and
i_3 = j_2 + 1. 
Each of the 4 combinations that occur in the above
figure arises due to the use of a combinatory rule, and
hence will be specified in G_sf by a production. For
example, it is possible for combination 1 to be represented
by the following type 1 production. 

P[··(A_1, α'α_2/A_3, i_1, j_2)] → 
F_m P[··(A_1, α'/A_2, i_1, j_1)] P[(A_2, α_2/A_3, i_2, j_2)] 

where i_2 = j_1 + 1, α' is a suffix of α_1 of length less than
k_1, and m = |α_2/A_3|. 

[Figure 2: Example of spurious ambiguity] 

Since A_2 α_2/A_3 and A_3 α_3 are used
as secondary categories, their lengths are bounded by
k_1 + 1. Hence these categories will appear in their entirety
in their representations in the G_sf productions. 
The four combinations⁴ will hence be represented in
G_sf by the productions: 

Combination 1: P[··(A_1, α'α_2/A_3, i_1, j_2)] → 
F_{m_1} P[··(A_1, α'/A_2, i_1, j_1)] P[(A_2, α_2/A_3, i_2, j_2)] 

Combination 2: P[··(A_1, α'α_2α_3, i_1, j_3)] → 
F_{m_2} P[··(A_1, α'α_2/A_3, i_1, j_2)] P[(A_3, α_3, j_2 + 1, j_3)] 

Combination 3: P[··(A_2, α_2α_3, j_1 + 1, j_3)] → 
F_{m_2} P[··(A_2, α_2/A_3, j_1 + 1, j_2)] P[(A_3, α_3, j_2 + 1, j_3)] 

Combination 4: P[··(A_1, α'α_2α_3, i_1, j_3)] → 
F_{m_3} P[··(A_1, α'/A_2, i_1, j_1)] P[(A_2, α_2α_3, j_1 + 1, j_3)] 

where m_1 = |α_2/A_3|, m_2 = |α_3|, and m_3 = |α_2α_3|. 

⁴We consider the case where each combination is represented
by a Type 1 production. 
These productions give us sufficient information to de- 
tect spurious ambiguity locally, i.e., the local left and 
right branching derivations. Suppose we choose to re- 
tain the right branching derivations only. We are no 
longer interested in combination 2. Therefore we mark 
the production corresponding to this combination. 
This production is not discarded at this stage be- 
cause although it is marked it might still be useful in 
detecting more spurious ambiguity. Notice in Figure 3 
[Figure 3: Reconsidering a marked production] 
that the subtree obtained from considering combination
5 and combination 1 is right branching whereas
the entire derivation is not. Since we are looking for
the presence of spurious ambiguity locally (i.e., by considering
two step derivations), in order to mark this
derivation we can only compare it with the derivation
where combination 7 combines A_0 α_0/A_1 with A_1 α_1 α_2 α_3
(the result of combination 2).⁵ Notice we would have
already marked the production corresponding to combination
2. If this production had been discarded then
the required comparison could not have been made
and the production due to combination 6 could not have
been marked. At the end of the marking process all
marked productions can be discarded.⁶ 
In the procedure to build the grammar G_ns we start
with the productions for lexical assignments (type 5).
By varying i_1 from n to 1, j_3 from i_1 + 2 to n, i_2 from
j_3 to i_1 + 1, and i_3 from i_2 + 1 to j_3, we look for a
group of four productions (as discussed above) that
locally indicates the presence of spurious ambiguity.
Productions involved in derivations that are not
right branching are marked. 
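The local test can be pictured on plain binary derivation trees (a simplification of the grammar-marking procedure: trees instead of G_ns productions, forward composition only): a composition whose primary (left) child is itself a composition is locally left branching, and everything but the right-branching representative is marked.

```python
# Simplified version of the marking idea (our own rendering; the paper
# marks G_ns productions, not trees).  "B" labels a composition node,
# strings are lexical categories.  A tree survives iff no composition
# has another composition as its primary (left) child.

def is_right_branching(tree):
    if isinstance(tree, str):            # leaf: a lexical category
        return True
    _op, left, right = tree
    return isinstance(left, str) and is_right_branching(right)

# Two spuriously equivalent derivations of A1/A2 A2/A3 A3/A4:
left_assoc  = ("B", ("B", "A1/A2", "A2/A3"), "A3/A4")
right_assoc = ("B", "A1/A2", ("B", "A2/A3", "A3/A4"))
```

Only `right_assoc` passes the check; applying the test at every two-step subderivation is what makes the marking local rather than a comparison of whole trees.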
It can be shown that this local marking of spuri- 
ous derivations will eliminate all and only the spuri- 
ously ambiguous derivations. That is, enumerating all 
derivations using unmarked productions, will give all 
and only genuine derivations. If there are two deriva- 
tions that are spuriously ambiguous (due to the as- 
sociativity of composition) then in these derivations 
there must be at least one occurrence of subderiva- 
tions of the nature depicted in Figure 3. This will 
result in the marking of appropriate productions and 
hence the spurious ambiguity will be detected. By 
induction it is also possible to show that only the spu- 
riously ambiguous derivations will be detected by the 
marking process outlined above. 
5 Conclusions 
Several parsing strategies for CCG have been given
recently (e.g., [4, 11, 2, 8]). These approaches have 
concentrated on coping with ambiguity in CCG deriva- 
tions. Unfortunately these parsers can take exponen- 
tial time. They do not take into account the fact that 
categories spanning a substring of the input could be 
of a length that is linearly proportional to the length 
of the input spanned and hence exponential in num- 
ber. We adopt a new strategy that runs in polynomial 
time. We take advantage of the fact that regardless 
of the length of the category only a bounded amount 
of information (at the beginning and end of the cate- 
⁵Although this category is also the result of combination 4,
the tree with combinations 5 and 6 cannot be compared with
the tree having the combinations 7 and 4. 
⁶Steedman [6] has noted that although all multiple derivations
arising due to the so-called spurious ambiguity yield the
same "semantics" they need not be considered useless. 
gory) is used in determining when a combinatory rule 
can apply. 
We have also given an algorithm that builds a 
shared forest encoding the set of all derivations for 
a given input. Previous work on the use of shared
forest structures [1] has focussed on those appropriate
for context-free grammars (whose derivation trees
have regular path sets). Due to the nature of the CCG
derivation process and the degree of ambiguity possible,
this form of shared forest structure is not appropriate
for CCG. We have proposed a shared forest
representation that is useful for CCG and other formalisms
(such as Tree Adjoining Grammars) used in
computational linguistics that share the property of
producing trees with context-free paths. 
Finally, we have shown how the shared forest can be marked
so that during the process of enumerating all parses
we do not list two derivations that are spuriously ambiguous.
In order to be able to eliminate the spurious
ambiguity problem in polynomial time, we examine
two step derivations to locally identify when they are
equivalent, rather than looking at the entire derivation
trees. This method was first considered by [2], where
this strategy was applied in the recognition phase. 
The present algorithm removes spurious ambiguity 
in a separate phase after recognition has been com- 
pleted. This is a reasonable approach when a CKY- 
style recognition algorithm is being used (since the de- 
gree of ambiguity has no effect on recognition time). 
However, if a predictive (e.g., Earley-style) parser were 
employed then it would be advantageous to detect 
spurious ambiguity during the recognition phase. In 
a predictive parser the performance on an ambigu- 
ous input may be inferior to that on an unambiguous 
one. Due to the spurious ambiguity problem in CCG,
even without genuine ambiguity, the parser's performance
may be poor if spurious ambiguity were not detected
during recognition. CKY-style parsers are closely related
to predictive parsers such as Earley's. Therefore,
we believe that the techniques presented here, 
i.e., (1) the sharing of stacks used in recognition and in 
the shared forest representation and (2) the local iden- 
tification of spurious ambiguity (first proposed by \[2\]) 
can be adapted for use in more practical predictive 
algorithms. 
References 

[1] S. Billot and B. Lang. The structure of shared forests in ambiguous parsing. In 27th meeting Assoc. Comput. Ling., 1989. 

[2] M. Hepple and G. Morrill. Parsing and derivational equivalence. In European Assoc. Comput. Ling., 1989. 

[3] A. K. Joshi, K. Vijay-Shanker, and D. J. Weir. The convergence of mildly context-sensitive grammar formalisms. In T. Wasow and P. Sells, editors, The Processing of Linguistic Structure. MIT Press, 1989. 

[4] R. Pareschi and M. J. Steedman. A lazy way to chart-parse with categorial grammars. In 25th meeting Assoc. Comput. Ling., 1987. 

[5] M. Steedman. Combinators and grammars. In R. Oehrle, E. Bach, and D. Wheeler, editors, Categorial Grammars and Natural Language Structures. Foris, Dordrecht, 1986. 

[6] M. Steedman. Parsing spoken language using combinatory grammars. In International Workshop of Parsing Technologies, Pittsburgh, PA, 1989. 

[7] M. J. Steedman. Dependency and coordination in the grammar of Dutch and English. Language, 61:523-568, 1985. 

[8] M. Tomita. Graph-structured stack and natural language parsing. In 26th meeting Assoc. Comput. Ling., 1988. 

[9] K. Vijay-Shanker and D. J. Weir. The recognition of Combinatory Categorial Grammars, Linear Indexed Grammars, and Tree Adjoining Grammars. In International Workshop of Parsing Technologies, Pittsburgh, PA, 1989. 

[10] D. J. Weir and A. K. Joshi. Combinatory categorial grammars: Generative power and relationship to linear context-free rewriting systems. In 26th meeting Assoc. Comput. Ling., 1988. 

[11] K. B. Wittenburg. Predictive combinators: a method for efficient processing of combinatory categorial grammar. In 25th meeting Assoc. Comput. Ling., 1987. 