MASSIVE DISAMBIGUATION OF LARGE TEXT CORPORA
WITH FLEXIBLE CATEGORIAL GRAMMAR
Ton van der WOUDEN (CFLEX/INL) 
Dirk HEYLEN (INL)
INL
Postbus 9515
2300 RA Leiden 
The Netherlands 
ABSTRACT 
A new method of automatic lexical disambiguation of
big texts is described, using recent proof-theoretical
results from the theory of categorial grammar.
0. Introduction
The Institute of Dutch Lexicology (INL), [...]
by the Dutch and the Belgian governments, consists
of two departments. [...] is to [...] a
[...] for lexicological research. This database
is a, potentially [...], set of [...]. [...]
is to supply a representative and, if possible,
complete view of contemporary (since 1970)
standard Dutch. In order to achieve this goal, an
efficient database system and application
software [...]. At this moment (February
1988) the INL corpus is [...]
of the Dutch language; it contains over 45
million [...].
IN5 l~c~l ~ta~qse is not an ~ ~ i~If: 
it is ~t to ~ a ~i for ~ific l~Jects, f~ 
of ~/~ ~s of due data~ is ix9 fore f/~ r~ 
~t for a ~ gestation of ~cti~le~. R~ 
data~ is ~J~, ~ it is .not ~listic ~ 
on J~/~xj it wilt ~ it Ix~ib\]e t~ 
~c/u a ri~ ~ of ~nfonnation. R~fo~, a 
~ tJ~ ~i is %0 ~e all ~/~ we~ ~ t~ 
data~ av~lable for ~ I~, en t/~ 
I~ by ~ ef ficie~nt ~ ~x~ul 
a~plication ~ft~, en ~/~ o~ I~ 
onrich~ ~/~ ma~xgrial. Aut~tic mozl~hologic~ 
~uqalysis I~ r~ ~ c~i~ out m~ ~ l~_~l~s 
will ~ ~ ~rporat~ J~ut~D %/~ da~l~. Or~ 
l~el hig~r, we ~ J~nte~ J~% t/~ sizn~c 
not %/~ of an on-lk~ Imrse~" c~l~o fbr ii~ 
process of lemmatization an effective 
¢\[i~i~%/en ~e is ,~x~ss~ as well. %b 
%~ on ~ ~. ~ was ~ case for i~j 
~l~ic~l ar~ll~ , i~ s~tactic ~ is an 
~l~tion of a c~ial c~l~l~o ~d~ 
~uotion of ~ philo~ ~ i/~ La~ 
ca~i~ parser we use for ~ ~i~ation 
~taotic ~lysls is ~ topic of this pap~l ~ . 
1. A note on ambiguity in Categorial Grammar
Each linguistic theory or framework [...] is
confronted with the problem of ambiguous lexical [...].
Whatever way one deals with it as far as
representation is concerned, and whatever neat
solution [...], the fact remains that
(1) the ambiguity will not disappear, but (2) the
explosions it gives rise to will cause (often
irreparable) damage to (otherwise) neatly conceived
syntactic parsers or analyzers. Categorial
grammars, guided by the [...] of the Lexicon,
may seem by nature to be the first victims of this
[...]. Some categorialists try to circumvent
the problems by [...] inherently [...]
[...] otherwise rigidly defined flexible [...]
take a closer look at some of the restrictions
[...], i.e. at some of the
invariants that come along naturally, but may
remain unnoticed at first sight 3. Interesting
invariants may act as greedy scissors, pruning
away many of the useless branches of the search
tree. Categorial grammars encode all syntactic
information in the lexicon. The effect of this
strategy on the emergence of ambiguities can be
gathered if one takes an ordinary phrase
structure grammar and turns it into a categorial
one. What happens is that for every category in the
PS grammar one gets a set of categories in the
categorial grammar. On average, the number of
new categories equals the number of occurrences of
the old category in the PS rules. A lexical element
that is not at all ambiguous as far as syntactic
category assignment is concerned, in PSG, will
almost certainly become ambiguous in CG. Still, we
claim that effective, i.e. fast, disambiguation is
possible with CG. The rationale behind this claim
is that effective disambiguation does not depend as
much on the degree of ambiguity, but first and
foremost on the nature of the disambiguation
procedure. Whereas ambiguity is damaging to classical
PS procedures because there are no intrinsic
properties of the system that can deal with it,
almost the reverse is true of categorial
ones, if full benefit is made of their defining
characteristics. In order to appreciate these
statements, the best thing to do is look at a
specific implementation of this idea.
2. The Lambek calculus
In this section we would like to present a
categorial reduction system which is analogous to
the implicational fragment of propositional logic.
We will present it as a calculus, and will limit
ourselves to the formal description, thus ignoring
semantic interpretation (which is not immediately
relevant for our purpose at hand).
Some definitions 
Let BASCAT be a finite set of atomic categories
and CONN a finite set of category forming
connectives. Then CAT (the set of all
categories) is the inductive closure of BASCAT
under CONN, i.e. the smallest set such that (i)
BASCAT is a subset of CAT, and (ii) if X, Y are
members of CAT and | is a member of CONN, then (X|Y)
is a member of CAT.
So one could take BASCAT to be {S, N, A, T, P}
and CONN {/, \, *} (these are called right
division, left division and product,
respectively). Some of the members of CAT are:
{N, (N\S), ((N/N)*T), (S/(P\(N/S))), ...}.
A complex category (X|Y) consists of three
immediate subcomponents: X and Y, which are
themselves categories, and the connective. When the
connective is '/' or '\', the complex category is a
functor. Functor categories are associated with
incomplete expressions: they will form an
expression of category Y (result) with an
expression of category X (argument) 4. In the case
of right division, the argument has to be found to
the right of the functor category, whereas in the
case of left division, the argument has to be found
to the left 5. The product connective '*' is to be
interpreted as a concatenation operator, i.e. a
product category (X*Y) is to be associated with an
expression which is the concatenation of an
expression of category X and an expression of
category Y in that order.
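The definitions above can be made concrete with a small data structure. The following is only a sketch under our own assumptions: the class names Basic and Complex and the helpers in_cat and is_functor are hypothetical and do not come from the paper. Complex categories follow the (argument connective result) order used in the text.

```python
from dataclasses import dataclass
from typing import Union

BASCAT = {"S", "N", "A", "T", "P"}   # atomic categories of the example
CONN = {"/", "\\", "*"}              # right division, left division, product


@dataclass(frozen=True)
class Basic:
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Complex:
    left: "Category"    # the argument when conn is a division
    conn: str
    right: "Category"   # the result when conn is a division

    def __str__(self) -> str:
        return f"({self.left}{self.conn}{self.right})"


Category = Union[Basic, Complex]


def in_cat(c: Category) -> bool:
    """Membership in CAT, the inductive closure of BASCAT under CONN."""
    if isinstance(c, Basic):
        return c.name in BASCAT
    return c.conn in CONN and in_cat(c.left) and in_cat(c.right)


def is_functor(c: Category) -> bool:
    """A complex category is a functor when its connective is a division."""
    return isinstance(c, Complex) and c.conn in ("/", "\\")


N, S, P = Basic("N"), Basic("S"), Basic("P")
example = Complex(S, "/", Complex(P, "\\", Complex(N, "/", S)))
print(example, in_cat(example), is_functor(example))
```

Frozen dataclasses are chosen here so that categories can be compared and hashed structurally, which is convenient when sequences of categories are matched against each other.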
Reduction rules 
A specific categorial grammar is
characterized by the choice of basic categories and
connectives on the one hand, and by the set of
reduction rules on the other. The system of
reduction rules says how categories can be combined
to form larger constituents. The application rule,
which combines a functor with domain X and range Y
with a suitable argument of category X to give a Y,
is only one of the possible reduction rules.
Instead of taking a set of reduction laws as
primitive axioms, we will investigate the
categorial reduction system as a calculus, where
the reduction laws can be considered theorems that
follow from a set of axioms and a set of inference
rules. Next we will see that parsing a
syntagm really comes down to the same thing as
attempting a proof for a theorem.
Sequents 
Before we define the axioms and inference rules of
the calculus, we need to define the notion of
sequent 6.
A sequent is a pair (G,D) of finite (possibly
empty) sequences G = [A1, ..., Am], D =
[B1, ..., Bn] of categories. For categorial
L-sequents, we require G to be non-empty and n=1.
For the sequent (G,D) we write G => D. The
sequence G is called the antecedent, D the
succedent. For simplicity square brackets and
commas are often left out.
Axioms and inference rules
(1) The axioms of L are sequents of the form X =>
X.
(2) Inference rules of L: X, Y and Z are
categories, P, T, Q, U, V are sequences of
categories, where P, T and Q are non-empty.
[/R]  T => Y/X          if  T,Y => X
[\R]  T => Y\X          if  Y,T => X
[/L]  U,Y/X,T,V => Z    if  T => Y
                        and U,X,V => Z
[\L]  U,T,Y\X,V => Z    if  T => Y
                        and U,X,V => Z
[*L]  U,X*Y,V => Z      if  U,X,Y,V => Z
[*R]  P,Q => X*Y        if  P => X and Q => Y
Together, axioms and inference rules define the
theorems of a categorial calculus. Suppose we have
a sequent S; to find out whether it is a theorem or
not we have to apply several of the inference rules
above till nothing but axioms remain. As one may
have noticed, all these rules involve the removal
of a connective in some category. Let's paraphrase
the [/L] rule by way of example. It says: to find
out whether a sequent with some functor category
Y/X is a theorem, identify a sequence of
categories that follow this category, and see
whether the identified sequence => Y is a theorem,
and whether what preceded the category + X + what
followed the sequence => old succedent is a theorem.
In the following example we present a proof with
the relevant category printed in bold and the
identified sequence underlined.
a/b, d/(e/(f/a)), d, e, f => b      [/L]
    d => d                          [AXIOM]
    a/b, e/(f/a), e, f => b         [/L]
        e => e                      [AXIOM]
        a/b, f/a, f => b            [/L]
            f => f                  [AXIOM]
            a/b, a => b             [/L]
                a => a              [AXIOM]
                b => b              [AXIOM]
If we could find an efficient automatic decision
procedure that would tell us whether a certain
sequent is a theorem or not, then we would
have an efficient parser as well. The idea is
that the antecedent represents something like a
sentence (the categories of the words that make it
up) and the succedent the S (sentence) category.
In the next section we will discuss an
implementation of the decision procedure.
3. The Theorem prover, alias parser 
An algorithm to prove a theorem could go as
follows.
Given: a sequent with n categories: n-1 in the
antecedent, 1 in the succedent.
Start at the first category of the antecedent.
If this is a functor, pick the relevant inference
rule that will eliminate the connective. If the
rule tells you to identify a part of the sequent to
one of the sides of the category, then first take
this to be one category. See whether you can prove
the resulting sequent(s) (the sequent(s) in the
if-part of the inference rule). If the identification
does not yield a result (i.e. if, recursively
calling the procedure, the bottom of only axioms
remaining is not reached), then take two categories
and see if this does the trick. Continue adding
categories until you have a proof or there are no
categories left. In the latter case, nothing is
lost yet, because one could also have taken the
second or third functor to start the proof with.
If in the end there are no more functors left to
start the elimination with, then the theorem
cannot be proved and one can even say that it is
false.
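The procedure just described can be sketched as a naive backtracking search over the inference rules of L. This is not the authors' implementation. The function name prove and the encoding are our own assumptions: a basic category is a string, a complex category is a tuple (left, connective, right) in the (argument connective result) notation. The search terminates because every rule application removes one connective.

```python
def prove(ante, succ):
    """Return True if the sequent ante => succ is derivable in L."""
    ante = tuple(ante)
    if not ante:                      # antecedents must be non-empty
        return False
    if ante == (succ,):               # axiom: X => X
        return True

    # Right rules: remove the main connective of the succedent.
    if isinstance(succ, tuple):
        left, conn, right = succ
        if conn == "/" and prove(ante + (left,), right):      # [/R]
            return True
        if conn == "\\" and prove((left,) + ante, right):     # [\R]
            return True
        if conn == "*":                                       # [*R]
            for k in range(1, len(ante)):
                if prove(ante[:k], left) and prove(ante[k:], right):
                    return True

    # Left rules: try every complex category in the antecedent in turn.
    for i, cat in enumerate(ante):
        if isinstance(cat, str):
            continue
        arg, conn, res = cat
        u, v = ante[:i], ante[i + 1:]
        if conn == "*":                                       # [*L]
            if prove(u + (arg, res) + v, succ):
                return True
        elif conn == "/":                                     # [/L]
            # identify 1, 2, ... categories to the right as the argument
            for k in range(1, len(v) + 1):
                if prove(v[:k], arg) and prove(u + (res,) + v[k:], succ):
                    return True
        else:                                                 # [\L]
            # identify 1, 2, ... categories to the left as the argument
            for k in range(len(u) - 1, -1, -1):
                if prove(u[k:], arg) and prove(u[:k] + (res,) + v, succ):
                    return True
    return False


# The example sequent from section 2: a/b, d/(e/(f/a)), d, e, f => b
example = [("a", "/", "b"),
           ("d", "/", ("e", "/", ("f", "/", "a"))),
           "d", "e", "f"]
print(prove(example, "b"))
```

The sketch may find a proof in a different order than the one printed in section 2, since it simply takes the first functor for which both premises succeed.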
Clearly, this procedure might take some time to
decide on the validity of a sequent. One might
hope that theorems are proven rapidly, but when
the sequents are false, a lot of work has to be
done. Fortunately enough, there is a simple way to
prune away some branches of the search tree that
are guaranteed to lead to failure. There is a
necessary formal condition that holds of valid
theorems which is easy to detect. If a sequent does
not have this formal characteristic, it cannot be a
theorem. Even if the inputted sequent does have the
required characteristic, in the process of proving,
there will be a lot of subproofs that need not be
carried out because they will fail immediately.
This formal characteristic or invariant is known as
van Benthem's Count, or Count for short. It counts
the number of positive (range) and negative
(domain) occurrences of a basic category X in an
arbitrary category, basic or complex. It may be
defined as follows.
count(X,X) = 1, if X is a member of BASCAT
count(X,Y) = 0, if X,Y members of BASCAT,
X<>Y
count(X,Y/Z) = count(X,Z) - count(X,Y)
count(X,Y\Z) = count(X,Z) - count(X,Y)
count(X,Y*Z) = count(X,Y) + count(X,Z)
Generalized to sequences of categories, the X-
count of a sequence, X being a category, is the
sum of the X-counts of the elements in the
sequence.
count(X, [Y1, ..., Yn]) = count(X,Y1) + ...
+ count(X,Yn)
It was proven by Van Benthem (1986) that the Count
function is an invariant over derivations. This
means that a sequent is not a theorem if the Count
values of the antecedent differ from the Count
values of the succedent for any basic category. The
Count of the category (PP/(NP\S)) can be computed
for each of the basic categories:
               [  S   NP    N   AP   PP ]
PP             [  0    0    0    0    1 ]
(NP\S)         [  1   -1    0    0    0 ]
(PP/(NP\S))    [  1   -1    0    0   -1 ]
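The Count definition translates almost literally into a recursive function. The sketch below is our own: the names count, count_vector and may_be_theorem are hypothetical, and categories are encoded as strings (basic) or tuples (left, connective, right) in the (argument connective result) notation.

```python
BASCAT = ("S", "NP", "N", "AP", "PP")


def count(x, cat):
    """X-count of a single category, following the five clauses above."""
    if isinstance(cat, str):
        return 1 if cat == x else 0
    left, conn, right = cat
    if conn in ("/", "\\"):
        # result (right) counts positively, argument (left) negatively
        return count(x, right) - count(x, left)
    if conn == "*":
        return count(x, left) + count(x, right)
    raise ValueError(f"unknown connective: {conn!r}")


def count_vector(cats):
    """Counts of a sequence of categories, one entry per basic category."""
    return [sum(count(x, c) for c in cats) for x in BASCAT]


def may_be_theorem(ante, succ):
    """Necessary (not sufficient) condition for ante => succ to be a theorem."""
    return count_vector(ante) == count_vector([succ])


print(count_vector([("PP", "/", ("NP", "\\", "S"))]))  # [1, -1, 0, 0, -1]
```

A sequent that fails may_be_theorem can be discarded without any proof search; one that passes still has to be handed to the prover.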
To see the usefulness of this invariant take
[...]. Apart from de, all words in
this NP are ambiguous. The Cartesian product of the
ambiguities gives 12 different combinatory
possibilities:
(N/NP), N, (NP/PP), (N/NP), NP
(N/NP), N, (NP/PP), (N/NP), (N/NP)
(N/NP), N, (NP/PP), (N/NP), N
(N/NP), N, (NP/(N\N)), (N/NP), NP
(N/NP), N, (NP/(N\N)), (N/NP), (N/NP)
(N/NP), N, (NP/(N\N)), (N/NP), N
(N/NP), (NP\S), (NP/PP), (N/NP), NP
(N/NP), (NP\S), (NP/PP), (N/NP), (N/NP)
(N/NP), (NP\S), (NP/PP), (N/NP), N
(N/NP), (NP\S), (NP/(N\N)), (N/NP), NP
(N/NP), (NP\S), (NP/(N\N)), (N/NP), (N/NP)
(N/NP), (NP\S), (NP/(N\N)), (N/NP), N
To figure out whether this phrase is a noun
phrase, one would have to try to build a (NP)
parse tree for each of these twelve possible
combinations of category assignments. Using the
Count invariant, however, one knows beforehand
that one and only one of these combinations (the
sixth in the list above) could possibly be parsed
as a noun phrase, so that parsing itself becomes
superfluous in this case. The following figure
shows the Count values for the correct assignment.
(N/NP)        [  0   1  -1   0   0 ]
N             [  0   0   1   0   0 ]
(NP/(N\N))    [  0  -1   0   0   0 ]
(N/NP)        [  0   1  -1   0   0 ]
N             [  0   0   1   0   0 ]
            + [  0   1   0   0   0 ]
NP            [  0   1   0   0   0 ]
The reader can verify for himself that none of the
other combinations satisfies the Count invariant.
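The check left to the reader can also be run mechanically. The sketch below enumerates the Cartesian product of the category assignments listed above and keeps the combinations whose Count vector equals that of NP. The encoding (strings for basic categories, tuples for complex ones) and all names are our own assumptions.

```python
from itertools import product

BASCAT = ("S", "NP", "N", "AP", "PP")


def count(x, cat):
    """X-count of a category in (argument connective result) notation."""
    if isinstance(cat, str):
        return 1 if cat == x else 0
    left, conn, right = cat
    if conn == "*":
        return count(x, left) + count(x, right)
    return count(x, right) - count(x, left)


def count_vector(cats):
    return [sum(count(x, c) for c in cats) for x in BASCAT]


DET = ("N", "/", "NP")
# One list of candidate categories per word position of the example phrase.
options = [
    [DET],
    ["N", ("NP", "\\", "S")],
    [("NP", "/", "PP"), ("NP", "/", ("N", "\\", "N"))],
    [DET],
    ["NP", DET, "N"],
]

combinations = list(product(*options))
survivors = [c for c in combinations
             if count_vector(c) == count_vector(["NP"])]

print(len(combinations), len(survivors))   # 12 1
print(survivors[0])
```

Every combination containing (NP\S) leaves a surplus S, and every combination containing (NP/PP) leaves a surplus PP, so both groups are rejected without any parsing.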
It is clear that the procedure just presented is
a perfect means to lay hands on the ratios of the
frequencies of lexically ambiguous words, given a
corpus and a lexicon with categorial information.
So, in order to derive these figures for the words
in the [...] database, sentences of the INL corpus
are inputted in a cascade of disambiguation
modules. The implementation of this Lambek-Gentzen
disambiguator is straightforward as it
involves only simple matchings and list
manipulations. The role of the disambiguator in
the process of disambiguating the INL corpus can be
read off from the following figure.
CORPUS        : set of sentences
   => sentence selection
SENTENCE (1)  : list of words
   => category assignment / lexical lookup
SENTENCE (2)  : list of words + categories
   => generate combinations
COMBINATIONS  : list of categories
   => test combinations
        (1) Count
        (2) Parse / prove
RESULT        : grammatical lists of categories
Given a corpus sentence, the syntactic
categories of all the words it contains are looked
up in a parsing lexicon derived from the lexical
database. When all combinations of categories have
been computed, each is tested by the Count module
to reduce the number of possible combinations of
initial category assignments. In the most
successful case, this reduction produces only one
possible combination, implying that all lexical
material in this sentence is disambiguated. In most
other cases, only a small percentage of the
original number of possible combinations of
lexical assignments is left over; these are handed
over to the Gentzen Proof Machine which will find
out which of the remaining assignments fail to
combine to a grammatical sentence.
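The cascade described above (lexical lookup, generation of combinations, Count filter, proof search) can be sketched end to end as follows. This is an illustration under our own assumptions, not the INL software: the function names, the tuple encoding of categories and the toy lexicon with the placeholder words w2, w3 and w5 are all hypothetical.

```python
from itertools import product

BASCAT = ("S", "NP", "N", "AP", "PP")


def count(x, cat):
    if isinstance(cat, str):
        return 1 if cat == x else 0
    left, conn, right = cat
    if conn == "*":
        return count(x, left) + count(x, right)
    return count(x, right) - count(x, left)


def count_vector(cats):
    return [sum(count(x, c) for c in cats) for x in BASCAT]


def prove(ante, succ):
    """Naive backtracking proof search in L (see the rules of section 2)."""
    ante = tuple(ante)
    if not ante:
        return False
    if ante == (succ,):
        return True
    if isinstance(succ, tuple):
        left, conn, right = succ
        if conn == "/" and prove(ante + (left,), right):
            return True
        if conn == "\\" and prove((left,) + ante, right):
            return True
        if conn == "*" and any(prove(ante[:k], left) and prove(ante[k:], right)
                               for k in range(1, len(ante))):
            return True
    for i, cat in enumerate(ante):
        if isinstance(cat, str):
            continue
        arg, conn, res = cat
        u, v = ante[:i], ante[i + 1:]
        if conn == "*":
            if prove(u + (arg, res) + v, succ):
                return True
        elif conn == "/":
            if any(prove(v[:k], arg) and prove(u + (res,) + v[k:], succ)
                   for k in range(1, len(v) + 1)):
                return True
        elif any(prove(u[k:], arg) and prove(u[:k] + (res,) + v, succ)
                 for k in range(len(u) - 1, -1, -1)):
            return True
    return False


def disambiguate(words, lexicon, goal="S"):
    """Return (#combinations, #left after Count, provable assignments)."""
    options = [lexicon[w] for w in words]                 # lexical lookup
    combos = list(product(*options))                      # generate combinations
    kept = [c for c in combos
            if count_vector(c) == count_vector([goal])]   # (1) Count
    return len(combos), len(kept), [c for c in kept if prove(c, goal)]  # (2) prove


DET = ("N", "/", "NP")
TOY_LEXICON = {                      # hypothetical entries, for illustration only
    "de": [DET],
    "w2": ["N", ("NP", "\\", "S")],
    "w3": [("NP", "/", "PP"), ("NP", "/", ("N", "\\", "N"))],
    "w5": ["NP", DET, "N"],
}
total, after_count, parses = disambiguate(["de", "w2", "w3", "de", "w5"],
                                          TOY_LEXICON, goal="NP")
print(total, after_count, parses)
```

Running the Count filter before the prover keeps the expensive proof search away from the eleven combinations that cannot succeed.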
Notes 
1. Much of the work described here is based on
research by Michael Moortgat. See e.g. his (1987a,
1987b, 1988).
2. e.g. Wittenburg (1987), Steedman (1987). 
3. Instead of theorems deducible from the calculus
they are often facts that can be proven of the
calculus as such, outside the calculus
(metatheorems in other words).
4. This operation is called application.
5. Notice that we will use the (argument
connective result) notation, no matter what the
directionality of the functor.
6. We will present the sequent calculus, which
Lambek adapted from Gentzen's work on logic. See
Lambek (1958).
7. Because of space limitations we will not
attempt to show the validity of this procedure.
8. Proof omitted for space's sake. 

References

J. van Benthem (1986)
Categorial Grammar. Chapter 7
of Essays in Logical
Semantics. Reidel,
Dordrecht.

J. Lambek (1958)
The mathematics of
sentence structure, in:
Am. Math. Monthly 65,
154-169. Reprinted in
Buszkowski e.a. (eds.):
Categorial Grammar.
Benjamins, Amsterdam (to
appear).

M. Moortgat (1987a)
Lambek Theorem Proving.
INL-WP 87-04. In Van
Benthem & Klein (eds.):
Categories, Polymorphism
and Unification.

M. Moortgat (1987b)
Generalized Categorial
Grammar. To appear in F.G.
Droste (ed.): Mainstreams
in Linguistics.
Benjamins, Amsterdam.

M. Moortgat (1988)
Categorial Investigations
(dissertation, to appear).

M. Steedman (1987)
Combinators and Grammars.
In Oehrle, Bach & Wheeler
(eds.): Categorial
Grammars and Natural
Language Structures.
Reidel, Dordrecht.

K. Wittenburg (1987)
Predictive Combinators: A
Method for Efficient
Processing of Combinatory
Categorial Grammars. In
Proceedings ACL 25.
