CATEGORIAL GRAMMARS FOR STRATA OF NON-CF LANGUAGES AND THEIR PARSERS 
Michal P. Chytil
Charles University
Malostranské nám. 25
118 00 Praha 1
Czechoslovakia

Hans Karlgren
KVAL
Södermalmstorg 8
116 45 Stockholm
Sweden
Abstract 
We introduce a generalization of categorial grammar extending its
descriptive power, and a simple model of a categorial grammar parser.
Both tools can be adjusted to particular strata of languages via
restricting grammatical or computational complexity.
I. Two questions about categorial grammars
In spite of the fascinating formal simplicity and lucidity of categorial
grammar as developed by Bar-Hillel [1], Lambek [7] and followers, it has
nevertheless never been brought into wide scale use. Why is this so?
We may easily recognize two drawbacks.
1/ Restricted scope of categorial grammars.
It was shown early [1] that the set of languages describable by these
grammars is exactly that of context-free languages. Is this restriction
inevitable or can a similar type of language description be retained
beyond the limit of context-free languages? This is the first question
we try to answer.
2/ No realistic model of categorial grammar parsing.
The schematic description of categorial analysis of a given sentence
a_1 ... a_n is sketched in Fig. 1.

    assign a category c_i            a_1  a_2  ...  a_n
    to each sentence member a_i       |    |         |
                                     c_1  c_2  ...  c_n
    cancel the string of
    categories to the
    target category t                        t

                                  Fig. 1
This abstract scheme cannot serve as a description of a realistic parsing
procedure. The suitable assignment appearing here as the first phase is
in fact the goal of the parsing. The "brute force" approach following the
above scheme, which checks all possible assignments and tries to cancel
them, is not computationally tractable, since for most grammars the
number of all possible assignments grows exponentially with the length
of the analysed sentence.
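The "brute force" scheme can be made concrete with a minimal sketch. It assumes classical categories with the two cancellation rules x/y y -> x and y y\x -> x; the tuple encoding, the toy lexicon and the function names are illustrative assumptions, not taken from the paper.

```python
from itertools import product
from math import prod

# Assumed encoding: an atom is a string; ("/", x, y) stands for x/y
# (seeks y to its right); ("\\", y, x) stands for y\x (seeks y to its left).

def reduces_to(cats, target):
    """True if the tuple of categories cancels to the single category `target`."""
    if len(cats) == 1:
        return cats[0] == target
    for i in range(len(cats) - 1):
        left, right = cats[i], cats[i + 1]
        results = []
        if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
            results.append(left[1])        # x/y  y  ->  x
        if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
            results.append(right[2])       # y  y\x  ->  x
        for r in results:
            if reduces_to(cats[:i] + (r,) + cats[i + 2:], target):
                return True
    return False

def brute_force_parse(words, lexicon, target="s"):
    """Phase 1: pick an assignment; phase 2: try to cancel it.
    Returns (first accepted assignment or None, number of assignments tried)."""
    tried = 0
    for assignment in product(*(lexicon[w] for w in words)):
        tried += 1
        if reduces_to(assignment, target):
            return assignment, tried
    return None, tried

def count_assignments(words, lexicon):
    """Size of the search space: the product of the lexical ambiguities."""
    return prod(len(lexicon[w]) for w in words)

# Hypothetical toy lexicon with two categories per word.
TV = ("/", ("\\", "n", "s"), "n")          # (n\s)/n, a transitive verb
LEXICON = {
    "John": [("/", "n", "n"), "n"],
    "sees": [TV, "n"],
    "Mary": [("/", "n", "n"), "n"],
}
```

With k categories per word the loop in `brute_force_parse` may run k^n times for a sentence of n words, which is the exponential growth referred to above; `count_assignments` returns exactly that bound for a given sentence.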
The moral of this observation is that the assignment cannot be separated
from the cancellation. Just as parsers based on phrase-structure grammars
have to make at each point of time an intelligent choice of rule to apply
next, the categorial parser must make an intelligent choice out of a list
of alternative categories. This necessity to look ahead at cancellation
when making the assignment leads to the conclusion [6] that assignment
and cancellation must in any actual parser be interwoven. Therefore our
second key question reads:
Can this interweaving be grasped by a simple formal model or does it
unavoidably lead to a mess of complicated ad hoc and heuristic
techniques?
II. Proposed solution
We introduce in nontechnical language the essence of the proposed
generalization of categorial grammars and their parsers. The exact
mathematical formulations can be found in [3].
Grammars. The principal difference between the "classical" categorial
grammar and the generalized categorial grammar (GCG) is that instead of
finite sets of categories corresponding to terminal symbols, GCG allows
for infinite sets of categories. Each such infinite set, however, can be
generated by a simple procedure, in fact by a procedure based on a finite
state generator.
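As an illustration of how a finite device can describe an infinite set of categories, the following sketch treats a category as a string of symbols emitted along a path of a finite state generator. The transition table format, the state names and the symbols "x" and "y" are assumptions made for the example; the exact construction is given in [3].

```python
from collections import deque

def generate_categories(transitions, start, accepting, max_len):
    """List, shortest first, all generated categories of length <= max_len.

    transitions -- dict: state -> list of (emitted symbol, next state)
    accepting   -- set of states in which a category may end
    """
    found, seen = [], set()
    queue = deque([(start, ())])
    while queue:
        state, prefix = queue.popleft()
        if state in accepting and prefix not in seen:
            seen.add(prefix)
            found.append(prefix)
        if len(prefix) < max_len:
            for symbol, nxt in transitions.get(state, []):
                queue.append((nxt, prefix + (symbol,)))
    return found

# Hypothetical generator for the infinite set y, xy, xxy, xxxy, ...
TRANSITIONS = {"q0": [("x", "q0"), ("y", "q1")]}
CATEGORIES = generate_categories(TRANSITIONS, "q0", {"q1"}, max_len=3)
```

The length bound exists only so that the enumeration terminates; the generator itself places no bound on the length of a category.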
Automata. We offer the list automaton (LA) as a mathematical model of
categorial grammar parsing. A list automaton is schematically
represented by Fig. 2.

        +----------------+
        | finite control |
        +-------+--------+
                |
      +-----+-------+-----+
      | a_1 |  ...  | a_n |
      +-----+-------+-----+

                Fig. 2
An LA consists of a nondeterministic finite state control unit attached
to a finite tape. At the beginning of the computation the tape contains
the analysed string. The automaton can read and rewrite scanned symbols
and move the scanning head one tape cell to the left or right,
analogously to a Turing machine. In addition, it can delete the scanned
cell, i.e. cut it out and paste the remaining tape parts together.
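The tape operations just described can be sketched as a small data structure. The control unit is omitted, and the position of the head after a deletion is an assumption, since the text does not fix it; the class and method names are illustrative.

```python
class ListTape:
    """Finite tape whose scanned cell can be read, rewritten, left, or deleted."""

    def __init__(self, symbols):
        self.cells = list(symbols)   # initially the analysed string
        self.head = 0                # index of the scanned cell

    def read(self):
        return self.cells[self.head]

    def write(self, symbol):
        self.cells[self.head] = symbol

    def move(self, step):
        """Move the head one cell to the left (-1) or to the right (+1)."""
        new = self.head + step
        if step not in (-1, 1) or not 0 <= new < len(self.cells):
            raise IndexError("illegal move")
        self.head = new

    def delete(self):
        """Cut out the scanned cell and paste the remaining parts together."""
        del self.cells[self.head]
        # Assumption: the head then scans the former right neighbour, or the
        # new last cell if the deleted cell was the rightmost one.
        if self.cells and self.head >= len(self.cells):
            self.head = len(self.cells) - 1
```

Deletion is the only operation that distinguishes this tape from that of a linearly bounded automaton: the tape can shrink but never grow.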
In the remainder of this section we list results indicating, as we
believe, that the concepts of GCG and LA give satisfactory answers to
the above questions.
a/ Mutual correspondence. Both GCGs and LA represent exactly all
context-sensitive languages. As in the case of CF-grammars and pushdown
automata, or context-sensitive grammars and linearly bounded automata
[5], there exist transformations of GCGs to LA and vice versa: an
algorithm A_1, which for each GCG G yields an LA A_1(G) representing the
same language, and conversely an algorithm A_2, which for each LA M
yields an equivalent GCG A_2(M).
The next step in our argument is to point out a remarkable feature of
the interplay between GCGs and LA.
b/ Stratification. The correspondence between GCGs and LA can be
observed not only in the whole class of context-sensitive languages, but
also on the level of CF-languages and in each of infinitely many strata
between CF- and CS-languages. The stratification can be defined via two
complexity measures.
Grammatical complexity: given a GCG G and a string w, the grammatical
complexity of w wrt. G, denoted G(w), is defined as the length of the
longest category used in the analysis of w wrt. G. (For ambiguous
grammars, the complexity is defined for each parse of the string.)
Computational complexity: given an LA M and a string w, the
computational complexity of w wrt. M, denoted M(w), is defined as the
maximal number of visits paid to a single square during the accepting
computation (ambiguity being treated as before).
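Both measures are simple to compute once an analysis or a computation is given. The sketch below assumes that a category is a sequence of symbols, that a computation is recorded as a string of head moves "L", "R" and deletions "D", and that the initial head position and the square reached after a deletion each count as a visit. These conventions and the function names are assumptions for illustration only.

```python
def grammatical_complexity(analysis):
    """Length of the longest category used in one analysis of a string."""
    return max(len(category) for category in analysis)

def max_visits(n, moves):
    """Maximal number of visits paid to a single square of an n-square tape."""
    if n < 1:
        raise ValueError("the tape must contain at least one square")
    squares = list(range(n))   # identities of the squares still on the tape
    visits = [0] * n           # visits per original square
    head = 0
    visits[squares[head]] += 1
    for m in moves:
        if m == "D":
            del squares[head]
            if not squares:    # tape exhausted, computation over
                break
            head = min(head, len(squares) - 1)
        elif m in ("L", "R"):
            head += 1 if m == "R" else -1
            if not 0 <= head < len(squares):
                raise ValueError("head left the tape")
        else:
            raise ValueError("unknown move: %r" % (m,))
        visits[squares[head]] += 1
    return max(visits)
```

A computation that deletes every square on first contact, such as the move sequence "DDD" on a tape of three squares, visits each square once; any back-and-forth movement raises the measure.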
In the light of these complexity measures we can reconsider the relation
between GCGs and LA determined by the above mentioned algorithms A_1 and
A_2. For any GCG G and any sentence w, each grammatical description of w
wrt. G is reflected as a computation of A_1(G) accepting w. The
grammatical complexity of the description is approximately the same as
the computational complexity of the corresponding computation. An
analogous result holds for A_2.
Now, any function f mapping natural numbers to natural numbers
determines a stratum S(f) of languages: a language L belongs to the
stratum S(f) if and only if it can be represented by a GCG G (or
equivalently an LA M) such that for each sentence w from L of length n,
the complexity G(w) (or M(w)) does not exceed the number f(n). Our
previous considerations show that the algorithms A_1, A_2 respect the
stratification. Hence the introduced tools can be adjusted to the
investigated languages.
Two examples:
1/ The grammars in the stratum S(const) (determined by constant
functions) are exactly Bar-Hillel categorial grammars. "Finite visit" LA
appear as their parsers.
2/ The languages in the strata S(f), where f is any function of order
smaller than the function log(log n), belong to "almost context-free
languages" (cf. [2]) sharing crucial properties of CF-languages.
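The definition of a stratum and the two examples can be restated compactly. The notation L(G) for the language represented by G is an assumption, as is reading "order smaller than log(log n)" as little-o; for ambiguous grammars G(w) is taken here to be the complexity of some analysis of w. The first line of the second display combines example 1 with the result of [1] quoted in Section I.

```latex
\[
  S(f) \;=\; \bigl\{\, L \;\bigm|\; \text{there is a GCG } G \text{ with } L(G) = L
      \text{ and } G(w) \le f(|w|) \text{ for all } w \in L \,\bigr\}
\]
\[
  S(\mathrm{const}) = \text{context-free languages}, \qquad
  f = o(\log\log n) \;\Longrightarrow\; S(f) \subseteq \text{almost context-free languages}
\]
```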
c/ Assignment and cancellation interwoven.
To show that list automata, besides their simplicity, also meet the
above formulated requirement for natural parsers of categorial grammar,
we have to examine, at least informally, in more detail the relationship
between a GCG G and its parser A_1(G).
When the automaton A_1(G) analyses a string a_1 ... a_n, then during the
m-th visit to a square originally containing a symbol a_i, the automaton
fixes the m-th symbol in the category belonging to a_i. Thus after m
visits, m symbols of the category are determined. Therefore from the
(infinite) set of categories assignable to a_i, only those which agree
with the determined symbols remain in play. To determine the next symbol
of a category, the automaton can check the environment of the square and
take into account possible cancellations. At the moment when all symbols
in a category are fixed, the corresponding square is deleted. In other
words, a computation of A_1(G) on a string a_1 ... a_n dynamically
evolves a suitable assignment c_1 ... c_n of categories. The information
used by the parser consists of
- the generating mechanism of categories corresponding to particular
  symbols,
- indications of possible cancelling with neighbour categories.
The computation is completed at the moment when the assignment is found.
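The way fixed symbols narrow down an infinite set of candidate categories can be sketched with the set of generator states reachable after the symbols fixed so far. This covers only the first kind of information listed above (the generating mechanism); the inspection of neighbouring squares is left out. The transition table format and all names are illustrative assumptions, not the construction of [3].

```python
def step(transitions, states, symbol):
    """Generator states reachable from `states` by emitting `symbol`."""
    return {nxt
            for s in states
            for sym, nxt in transitions.get(s, [])
            if sym == symbol}

def candidates_after(transitions, start, fixed_symbols):
    """States still in play after the symbols fixed during the visits so far.
    An empty set means that no category agrees with the fixed symbols."""
    states = {start}
    for symbol in fixed_symbols:
        states = step(transitions, states, symbol)
    return states

def can_delete(transitions, start, accepting, fixed_symbols):
    """True if the fixed symbols already form a complete category,
    i.e. the square may be deleted."""
    return bool(candidates_after(transitions, start, fixed_symbols) & accepting)

# Hypothetical generator for the categories y, xy, xxy, ...
TRANSITIONS = {"q0": [("x", "q0"), ("y", "q1")]}
ACCEPTING = {"q1"}
```

Because only a set of generator states has to be remembered per square, a finite amount of information suffices to keep track of an infinite set of candidate categories.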
III. Open questions
1/ In this brief note we tried to grasp what features of the exact
mathematical models described in [3] we consider to be fundamental. We
can imagine alternative models differing in technical details but having
the same features. Which of the models should be chosen as "canonical"
will require more extensive studies.
2/ Our considerations dealt with nondeterministic LA, i.e. in fact with
"methods" of parsing. The step from "methods" to "algorithms" leads from
nondeterministic to deterministic LA. Even a glimpse of the basic
stratum S(const) promises interesting results. An observation of T.
Hibbard [4] shows that deterministic "finite visit" LA represent a class
of languages broader than the class of deterministic context-free
languages. It implies that deterministic categorial grammar (in the
classical sense) parsing will go beyond the limits of e.g. LR-parsing
based on CF-grammars.
References

[1] Y. Bar-Hillel, C. Gaifman, E. Shamir: On categorial and phrase
structure grammars, Bull. Res. Council Israel, F9, 1960

[2] M.P. Chytil: Almost context-free languages, to appear in Fundamenta
Informaticae, 1986

[3] M.P. Chytil, H. Karlgren: Categorial grammars and list automata for
strata of non-CF languages, to appear in J. van Benthem, W. Buszkowski,
W. Marciszewski (ed.), Categorial grammar, J. Benjamins B.V.,
Amsterdam-Philadelphia

[4] T. Hibbard: A generalization of context-free determinism,
Information and Control 11 (1967), 196-238

[5] J.E. Hopcroft, J.D. Ullman: Formal Languages and their relation to
automata, Addison-Wesley 1969

[6] H. Karlgren: Categorial grammar calculus, Scriptor, Stockholm 1974

[7] J. Lambek: On the calculus of syntactic types, in Structure of
language and its mathematical aspects, Proc. 12th Symp. Appl. Math.
AMS, Providence 1961
