On the Mathematical Properties of Linguistic Theories
C. Raymond Perrault
Dept. of Computer Science
University of Toronto
Toronto, Ontario, Canada M5S 1A4
ABSTRACT
Meta-theoretical results on the decidability, generative capacity, and recognition complexity of several syntactic theories are surveyed. These include context-free grammars, transformational grammars, lexical functional grammars, generalized phrase structure grammars, and tree adjunct grammars.
1. Introduction.
The development of new formalisms in which to express linguistic theories has been accompanied, at least since Chomsky and Miller's early work on context-free languages, by the study of their meta-theory. In particular, numerous results on the decidability, generative capacity, and more recently the complexity of recognition of these formalisms have been published (and rumoured!). Strangely enough, much less attention seems to have been devoted to a discussion of the significance of these mathematical results. As a preliminary to the panel on formal properties which will address the significance issue, it seemed appropriate to survey the existing results. Such is the modest goal of this paper.
We will consider context-free languages, transformational grammars, lexical functional grammars, generalized phrase structure grammars, and tree adjunct grammars. Although we will not examine them here, formal studies of other syntactic theories have been undertaken: e.g. Warren [51] for Montague's PTQ [30], and Borgida [7] for the stratificational grammars of Lamb [25]. There follows a brief summary of some comments in the literature about related empirical issues, but we avoid entirely the issue of whether one theory is more descriptively adequate than another.
2. Preliminary Definitions.
We assume the reader is familiar with the basic definitions of regular, context-free (CF), context-sensitive (CS), recursive, and recursively enumerable (r.e.) languages and with their acceptors, as can be found in [2]. Some elementary definitions from complexity theory may be useful. Further details may be found in [2]. Complexity theory is the study of the resources required of algorithms, usually space and time. Let f(x) be a function, say the recognition function for a language L. The most interesting results we could obtain about f would be a lower bound on the resources needed to compute f on a machine of a given architecture, say a von Neumann computer or a parallel array of neurons. These results over whole classes of machines are very difficult to obtain, and none of any significance exist for parsing problems.

(This research was sponsored by the Natural Sciences and Engineering Research Council of Canada under Grant Ag2Rs.)
Restricting ourselves to a specific machine model and an algorithm M for f, we can ask about the cost (e.g. time or space) c(x) of executing M on a specific input x. Typically c is too fine-grained to be useful: what one studies instead is a function c_M whose argument is an integer n denoting the size of the input to M, and which gives some measure of the cost of processing inputs of length n. Complexity theorists have been most interested in the asymptotic behaviour of c_M, i.e. the behaviour of c_M as n gets large.
If one is interested in upper bounds on the behaviour of M, one usually defines c_w(n) as the maximum of c(x) over all inputs x of size n. This is called the worst-case complexity function for M. Notice that other definitions are possible: one could define the expected complexity function c_e(n) for M as the average of c(x) over all inputs of length n. c_e might be more useful than c_w if one had an idea of what the distribution of inputs to M could be. Unfortunately, the introduction of probabilistic considerations makes the study of expected complexity technically more difficult than that of worst-case complexity. For a given problem, expected and worst-case measures may be quite different.
It is quite difficult to get detailed descriptions of c_w, and for many purposes a cruder estimate is sufficient. The next abstraction involves "lumping" classes of c_w functions into simpler ones that more clearly demonstrate their asymptotic behaviour and are easier to manipulate. This is the purpose of O-notation. Let f(n) and g(n) be two functions. f is said to be O(g) if a constant multiple of g is an upper bound for f, for all but a finite number of values of n. More precisely, f is O(g) if there are constants K and n_0 such that for all n > n_0, f(n) < K*g(n).
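The definition can be checked numerically. The following is a minimal sketch, with illustrative constants of our own choosing (not from the paper): it verifies that f(n) = 3n^2 + 10n is O(n^2) by exhibiting K = 4 and n_0 = 10.

```python
# Empirical illustration of the definition: f is O(g) iff there are
# constants K and n0 such that f(n) < K*g(n) for all n > n0.

def is_bounded(f, g, K, n0, horizon=10_000):
    """Check f(n) < K*g(n) for every n with n0 < n <= horizon."""
    return all(f(n) < K * g(n) for n in range(n0 + 1, horizon + 1))

f = lambda n: 3 * n**2 + 10 * n      # a worst-case cost function
g = lambda n: n**2                   # candidate asymptotic bound

# With K = 4 and n0 = 10: 3n^2 + 10n < 4n^2 whenever n > 10.
print(is_bounded(f, g, K=4, n0=10))  # True
print(is_bounded(f, g, K=3, n0=10))  # False: 3n^2 is never above f
```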
Given an algorithm M, we will say that its worst-case time complexity is O(g) if the worst-case time cost function c_w(n) for M is O(g). Notice that this merely says that almost all inputs to M of size n can be processed in time at most a constant times g(n). It does not say that all inputs require g(n) time, or even that any do, even on M, let alone on any other machine that implements f. Also, if two algorithms A_1 and A_2 are available for a function f, and if their worst-case complexities can be given respectively as O(g_1) and O(g_2), with g_1 < g_2, it may still be the case that for a large number of cases (maybe even for all cases one is likely to encounter in practice) A_2 will be the preferable algorithm, simply because the constant K_2 for g_2 may be much smaller than K_1 for g_1.
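The point about constants can be made concrete with hypothetical cost functions of our own (not taken from any real algorithm): A_1 has the better asymptotic bound but a large constant, so A_2 is faster on every input below the crossover point.

```python
# Hypothetical worst-case costs: A1 is asymptotically better (quadratic)
# but carries a large constant; A2 is cubic with a small constant.
c1 = lambda n: 100 * n**2   # cost of A1
c2 = lambda n: n**3         # cost of A2

# A2 wins on every input shorter than the crossover point n = 100.
for n in (10, 50, 100, 500):
    better = "A2" if c2(n) < c1(n) else "A1"
    print(n, better)
```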
In examining known results about the recognition complexity of various theories, it is useful to consider how "robust" they are in the face of changes in the machine model from which they were derived. These models can be divided into two classes: sequential models and parallel models. Sequential models [2] include the familiar single- and multi-tape Turing Machines (TMs) as well as Random Access Machines (RAMs) and Random Access Stored Program Machines (RASPs). A RAM is like a TM except that its working memory is random access rather than sequential. A RASP is like a RAM but stores its program in its memory. Of all these models, it is the most like a von Neumann computer.
All these sequential models can simulate each other in ways that do not cause great changes in time complexity. For example, a k-tape Turing Machine that runs in time O(t) can be simulated by a RAM in time O(t), and conversely, a RAM running in time O(t) can be simulated by a k-tape TM in time O(t^2). In fact, all familiar sequential models are polynomially related: they can simulate each other with at most a polynomial loss in efficiency. Thus if a syntactic model is known to have a difficult recognition problem on one sequential model, then it will not have a much easier one on another.
Transforming a sequential algorithm to one on a parallel machine with a fixed number K of processors provides at most a factor K improvement in speed. More interesting results are obtained when the number of processors is allowed to grow with the size of the problem, e.g. with the length of the string to be parsed. If we view these processors as connected together in a circuit, with input values entering at one end and outputs being produced at the other, then a problem that has a solution on a sequential machine in polynomial time and in space s will have a solution on a parallel machine with a polynomial number of processors and circuit depth (or maximum number of processors data must be passed through from input to output) O(s^2). Since the depth of a parallel circuit corresponds to the (parallel) time required to complete the computation, this means that algorithms with sequential solutions requiring small space (such as recognizers for deterministic CSLs) have fast parallel solutions. For a comprehensive survey of parallel computation, see Cook [9].
3. Context-Free Languages.
Recognition techniques for context-free languages are well-known [3]. The so-called "CKY" or "dynamic programming" method is attributed by Hays [15] to J. Cocke, and it was discovered independently by Kasami [23] and Younger [53], who showed it to be O(n^3). It requires the grammar to be in Chomsky Normal Form, and putting an arbitrary grammar in CNF may square the size of the grammar.
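As a concrete illustration, here is a minimal CKY recognizer. The grammar encoding (a dict from nonterminal to right-hand sides) and the toy a^n b^n grammar are our own illustrative choices, not from the paper.

```python
def cky_recognize(grammar, start, s):
    """CKY recognition for a grammar in Chomsky Normal Form.

    grammar maps a nonterminal to a list of right-hand sides, each
    either a 1-tuple (terminal,) or a 2-tuple (B, C) of nonterminals.
    Runs in O(n^3) time and O(n^2) space in the length n of s.
    """
    n = len(s)
    if n == 0:
        return False
    # table[i][j] = set of nonterminals deriving the substring s[i:j+1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        for A, rhss in grammar.items():
            if (ch,) in rhss:
                table[i][i].add(A)
    for span in range(2, n + 1):            # substring length
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):           # split point
                for A, rhss in grammar.items():
                    for rhs in rhss:
                        if (len(rhs) == 2 and rhs[0] in table[i][k]
                                and rhs[1] in table[k + 1][j]):
                            table[i][j].add(A)
    return start in table[0][n - 1]

# Toy CNF grammar (hypothetical) for a^n b^n, n >= 1:
# S -> A T | A B,  T -> S B,  A -> a,  B -> b
g = {"S": [("A", "T"), ("A", "B")],
     "T": [("S", "B")],
     "A": [("a",)],
     "B": [("b",)]}
print(cky_recognize(g, "S", "aabb"))  # True
print(cky_recognize(g, "S", "aab"))   # False
```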
Earley's algorithm recognizes strings in arbitrary CFGs in time O(n^3) and space O(n^2), and in time O(n^2) for unambiguous CFGs. Graham, Harrison and Ruzzo [13] give an algorithm that unifies CKY and Earley's [10] algorithm, and discuss implementation details.
Valiant [50] showed how to interpret the CKY algorithm as the finding of the transitive closure of a matrix and thus reduced CF recognition to matrix multiplication, for which sub-cubic algorithms exist. Because of the enormous constants of proportionality associated with this method, it is not likely to be of much practical use, either as an implementation method or as a description of the function of the brain.
Ruzzo [55] has shown how CFLs can be recognized by boolean circuits of depth O(log(n)^2), and thus that parallel recognition can be done in time O(log(n)^2). The required circuit has size polynomial in n.

So as not to get mystified by the upper bounds on CF recognition, it is useful to remember that no known CFL requires more than linear time, nor is there a (non-constructive) proof of the existence of such a language.

For an empirical comparison of various parsing methods, see Slocum [44].
4. Transformational Grammar.
From its earliest days, discussions of transformational grammar (TG) have included mention of matters computational.

Peters and Ritchie [33] provided the first non-trivial results on the generative power of TGs. Their model reflects the "Aspects" version quite closely, including transformations that can move and add constituents and delete them subject to recoverability. All transformations are obligatory, and applied cyclically from the bottom up. They show that every recursively enumerable (r.e.) set can be generated by a TG using a context-sensitive base. The proof is quite simple: the right-hand sides of the type-0 rules that generate the r.e. set are padded with a new "blank" symbol to make them at least as long as their left-hand sides. Rules are added to allow the blank symbols to commute with all others. These context-sensitive rules are then used as the base of a TG whose only transformation deletes the blank symbols.
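The padding step of the construction can be sketched mechanically. The rule encoding below and the blank symbol "#" are our own illustrative choices, a sketch of the idea rather than the original construction.

```python
BLANK = "#"  # a new symbol not in the original vocabulary

def pad_rules(rules, alphabet):
    """Make every type-0 rule length-non-decreasing by appending blanks
    to short right-hand sides, then add rules letting the blank commute
    with every other symbol. Rules are (lhs, rhs) pairs of tuples."""
    padded = []
    for lhs, rhs in rules:
        deficit = len(lhs) - len(rhs)
        padded.append((lhs, rhs + (BLANK,) * max(0, deficit)))
    # commuting rules: X # -> # X and # X -> X # for every symbol X
    for x in alphabet:
        padded.append(((x, BLANK), (BLANK, x)))
        padded.append(((BLANK, x), (x, BLANK)))
    return padded

# Hypothetical type-0 rule that erases a symbol: A B -> B
rules = pad_rules([(("A", "B"), ("B",))], alphabet={"A", "B"})
print(rules[0])  # (('A', 'B'), ('B', '#'))
```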
Thus if the transformational formalism itself is supposed to characterize the grammatical strings of possible natural languages, then the only languages being excluded are those which are not enumerable under any model of computation.
At the expense of a considerably more intricate argument, the previous result can be strengthened [32] to show that every r.e. set can be generated by a context-free based TG, as long as a filter (intersection with a regular set) can be applied to the phrase-markers output by the transformations. In fact, the base grammar can be independent of the language being generated. The proof involves simulating a TM by a TG. The transformations first generate an "input tape" for the TM being simulated, and then apply the TM productions, one per cycle of the grammar. The filter ensures that the base grammar generates just as many S nodes as necessary to generate the input string and do the simulation. Again, if the transformational formalism is supposed to characterize the possible natural languages, then the Universal Base Hypothesis [31], according to which all natural languages can be generated from the same base grammar, is empirically vacuous: any recursively enumerable language can.
Several attempts were then made to find a restricted form of the transformational model that was descriptively adequate and yet whose generated languages are recursive (see e.g. [27]). Since a key part of the proof in [32] involves the use of a filter on the final derivation trees, Peters and Ritchie examined the consequences of forbidding final filtering [35]. They show that if S is the only recursive symbol in the CF base then the generated language L is predictably enumerable and exponentially bounded. A language L is predictably enumerable if there is an "easily" computable function t(n) that gives an upper bound on the number of tape squares needed by its enumerating TM to enumerate the first n elements of L. L is exponentially bounded if there is a constant K such that for every string x in L there is another string x' in L whose length is at most K times the length of x.
The class of non-filtering languages is quite unusual, including all the CFLs (obviously), but also some (but not all) CSLs, some (but not all) recursive languages, and some (but not all) r.e. languages.
The source of non-recursivity in transformationally generated languages is that transformations can delete arbitrarily large parts of the tree, thus producing surface trees arbitrarily smaller than the deep structure trees they were derived from. This is what Chomsky's recoverability of deletions condition was meant to avoid. In his thesis, Petrick [36] defines the following terminal-length-increasing condition on transformational derivations: consider two p-markers from a derivation, where the second is derived from the first by applying the cycle of transformations to a subtree t, producing the subtree t'.

[tree diagrams omitted]

Continuing the derivation, apply the cycle to the tree dominating t', yielding tree t''.

[tree diagrams omitted]

A derivation satisfies the terminal-length-increasing condition if the yield of t'' is always longer than the yield of t'.
Petrick shows that if all recursion in the base "passes through S" and if all derivations satisfy the terminal-length-increasing condition, then the generated language is recursive. Using a slightly more restricted model of transformations, Rounds [42] strengthens this result by showing that the resulting languages are in fact context-sensitive.
In an unpublished paper, Myhill shows that if the condition is weakened to terminal-length-non-decreasing, then the resulting languages can be recognized in space at most exponential in the length of the input. This implies that the recognition can be done in at most double-exponential time, but Rounds [41] shows that not only can recognition be done in exponential time, but that every language recognizable in exponential time can be generated by a TG satisfying the terminal-length-non-decreasing condition and recoverability of deletions. This is a very strong result, because of the closure properties of the class of exponential-time languages. To see why this is so requires a few more definitions.
Let P be the class of all languages that can be recognized in polynomial time on a deterministic TM, and NP the class of all languages that can be recognized in polynomial time on a non-deterministic TM. P is obviously contained in NP, but the converse is not known, although there is much evidence that it is false.

There is a class of problems, the so-called NP-complete problems, which are in NP and "as difficult" as any problem in NP in the following sense: if any of them could be shown to be in P, then all the problems in NP would also be in P. One way to show that a language L is NP-complete is to show that L is in NP and that every other language L_0 in NP can be polynomially transformed into L, i.e. that there is a deterministic TM, operating in polynomial time, that will transform an input w_0 to L_0 into an input w to L such that w_0 is in L_0 if and only if w is in L. In practice, to show that a language is NP-complete, one shows that it is in NP, and that some already-known NP-complete language can be polynomially transformed to it.
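The asymmetry behind NP can be illustrated with satisfiability, the canonical NP-complete problem: checking a proposed assignment (a "certificate") takes polynomial time, while the obvious deterministic recognizer tries all 2^k assignments. The encoding below is a hypothetical sketch of our own.

```python
from itertools import product

# A CNF formula as a list of clauses; each clause is a list of literals,
# a literal being (variable, polarity). Hypothetical encoding.
formula = [[("x", True), ("y", True)],    # (x or y)
           [("x", False), ("y", False)]]  # (not x or not y)

def check(formula, assignment):
    """Polynomial-time certificate check: does the assignment satisfy
    every clause?"""
    return all(any(assignment[v] == pol for v, pol in clause)
               for clause in formula)

def satisfiable(formula):
    """The obvious deterministic recognizer: try all 2^k assignments,
    i.e. time exponential in the number of variables."""
    variables = sorted({v for clause in formula for v, _ in clause})
    return any(check(formula, dict(zip(variables, values)))
               for values in product([False, True], repeat=len(variables)))

print(check(formula, {"x": True, "y": False}))  # True: a valid certificate
print(satisfiable(formula))                     # True
```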
All the known NP-complete languages can be recognized in exponential time on a deterministic machine, and none are known to have sub-exponential solutions. Thus since the restricted transformational languages of Rounds characterize the exponential languages, if all of them were to be in P, then P would be equal to NP. Putting it another way, if P is not equal to NP, then some transformational languages (even those satisfying the terminal-length-non-decreasing condition) have no "tractable" (i.e. polynomial time) recognition problems on any deterministic TM. Note that this result also holds for all the other known sequential models of computation, and even for parallel machines with as many as a polynomial number of processors.
5. Lexical Functional Grammar.
In part, transformational grammar seeks to account for a range of constraints or dependencies within sentences. Of particular interest are subcategorization dependencies and predicate-argument dependencies. These dependencies can hold over arbitrarily large distances. Several recent theories suggest different ways of accounting for these dependencies, but without making use of transformations. We will examine three of these, Lexical Functional Grammar, Generalized Phrase Structure Grammar, and Tree Adjunct Grammars, in the next few sections.
The Lexical Functional Grammar (LFG) of Kaplan and Bresnan [24] aims to provide a descriptively adequate syntactic formalism without transformations. All the work done by transformations is instead encoded in structures in the lexicon and in links established between nodes in the constituent structure.

LFG languages are CS and properly include the CFLs [24]. Berwick [5] shows that a set of strings whose recognition problem is known to be NP-complete, namely the set of satisfiable boolean formulas, is an LFG language. Therefore, as was the case for Rounds's restricted class of TGs, if P is not equal to NP, then some languages generated by LFGs do not have polynomial time recognition algorithms. Indeed, only quite "basic" parts of the LFG mechanism are necessary to the reduction. This includes mechanisms necessary for feature agreement, for forcing verbs to take certain cases, and lexical ambiguity. Thus no simple change to the formalism is likely to avoid the combinatorial consequences of the full mechanism.
Berwick has also examined the relation between LFG and the class of languages generated by indexed grammars [1], a class known to be a proper subset of the CSLs, but including some NP-complete languages [42]. He claims (personal communication) that the indexed languages are a proper subset of the LFG languages.
6. Generalized Phrase Structure Grammar. 
In a series of papers, Gerald Gazdar and his colleagues [11] have argued for a joint account of the syntax and semantics of English, like LFG in eschewing the use of transformations but unlike it in positing only one level of syntactic description. The syntactic apparatus is based on a non-standard interpretation of phrase-structure rules and on the use of meta-rules. The formal consequences of both these moves have been investigated.
6.1. Node Admissibility.
There are two ways of interpreting the function of CF rules. The first, and most usual, is as rules for rewriting strings. Derivation trees can then be seen as canonical representatives of classes of derivations producing the same string, and differing only in the order of application of the same productions.

The second interpretation of CF rules is as constraints on derivation trees: a legal derivation tree is one where each node is "admitted" by a rule, i.e. each node dominates a sequence of nodes in a way sanctioned by a rule. For CF rules, the two interpretations obviously generate the same strings and the same set of trees.
Following a suggestion of McCawley's, Peters and Ritchie [34] showed that if one considers context-sensitive rules from the node-admissibility point of view, the languages defined are still CF. Thus the use of CS rules in the base to impose sub-categorization restrictions, for example, does not increase the weak generative capacity of the base component. (For some different restrictions of context-sensitive rules that guarantee that only CFLs will be generated, see Baker [4].)
Rounds [40] gives a simpler proof of Peters and Ritchie's node-admissibility result using techniques from tree-automata theory, a generalization to trees of finite state automata theory for strings. Just as a finite state automaton (FSA) accepts a string by reading it one character at a time, changing its state at each transition, a finite state tree automaton (FSTA) traverses trees, propagating states. The top-down FSTA "attaches" a starting state (from a finite set) to the root of the tree. Transitions are allowed by productions of the form

    (q, a, n) -> (q_1, ..., q_n)

such that if state q is being applied to a node labelled a and dominating n descendants, then state q_i should be applied to its i-th descendant. Acceptance occurs if all leaves of the tree end up labelled with states in the accepting subset. The bottom-up FSTA is similar: starting states are attached to the leaves of the tree and the productions are of the form

    (a, n, q_1, ..., q_n) -> q

indicating that if a node labelled a dominates n descendants each labelled with states q_1 to q_n, then node a gets labelled with state q. Acceptance occurs when the root is labelled by a state from the subset of accepting states.
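A deterministic bottom-up FSTA of this kind can be sketched in a few lines. The tree encoding and the boolean-evaluation example below are our own illustration, not from the paper.

```python
def run_bottom_up(tree, productions):
    """Run a deterministic bottom-up FSTA: a production maps a node
    label together with the tuple of its children's states to a state;
    leaves are looked up under the empty tuple."""
    label, children = tree
    states = tuple(run_bottom_up(c, productions) for c in children)
    return productions[(label, states)]

# Hypothetical FSTA whose states "T"/"F" evaluate boolean trees.
prods = {("1", ()): "T", ("0", ()): "F"}
for a in "TF":
    for b in "TF":
        prods[("and", (a, b))] = "T" if (a, b) == ("T", "T") else "F"
        prods[("or", (a, b))] = "T" if "T" in (a, b) else "F"

accepting = {"T"}  # the accepting subset of states
t = ("and", [("1", []), ("or", [("0", []), ("1", [])])])
print(run_bottom_up(t, prods) in accepting)  # True: the tree is accepted
```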
As is the case with FSAs, FSTAs of both flavours can be either deterministic or non-deterministic. A set of trees is said to be recognizable if it is accepted by a non-deterministic bottom-up FSTA. Again as with FSAs, any set of trees accepted by a non-deterministic bottom-up FSTA is accepted by a deterministic bottom-up FSTA, but this result does not hold for top-down FSTAs, although the recognizable sets are exactly the languages recognized by non-deterministic top-down FSTAs.
A set of trees is local if it is the set of derivation trees of a CF grammar. Clearly, every local set is recognizable by a one-state bottom-up FSTA that checks at each node that it satisfies a CF production. Also, the yield of a recognizable set of trees (the set of strings it generates) is CF. Although not all recognizable sets are local, they can all be mapped into local sets by a simple (homomorphic) mapping.
Rounds's proof [41] that CS rules under node-admissibility generate only CFLs involves showing that the set of trees accepted by the rules is recognizable, i.e. that there is a non-deterministic bottom-up FSTA that can check at each node that some node-admissibility condition holds there. This requires checking that the "strictly context-free" part of the rule holds, and that some proper analysis of the tree passing through the node satisfies the "context-sensitive" part of the rule.

The difficulty comes from the fact that the bottom-up automaton cannot generate the set of proper analyses, but must instead propagate (in its state set) the proper analysis conditions necessary to "admit" the nodes of its subtrees. It must, of course, also check that those rules get satisfied.

A more intuitive proof using tree transducers as well as FSTAs is sketched in the Appendix.
Joshi and Levy [21] strengthened Peters and Ritchie's result by showing that the node admissibility conditions could also include arbitrary Boolean combinations of dominance conditions: a node could specify a bounded set of labels that must occur immediately above it along a path to the root, or immediately below it on a path to the frontier.

In general the CF grammars constructed in the proof of weak equivalence to the CS grammars under node admissibility are much larger than the original, and not useful for practical recognition. Joshi, Levy and Yueh [22], however, show how Earley's algorithm can be extended to a parser that uses the local constraints directly.
6.2. Metarules.
The second important mechanism used by Gazdar [11] is metarules, or rules that apply to rules to produce other rules. Using standard notation for CF rules, one example of a metarule that could replace the transformation known as "particle movement" is:

    V -> V NP Pt X  ==>  V -> V Pt NP[-PRO] X

where X is a variable behaving like variables in structural analyses of transformations. If such variables are restricted to being used as abbreviations, that is if they are only allowed to range over a finite subset of strings over the vocabulary, then closing the grammar under the metarules produces only a finite set of derived rules, and thus the generative power of the formalism is not increased. If, on the other hand, X is allowed to range over strings of unbounded length, as are the essential variables of transformational theory, then the consequences are less clear. It is well known, for example, that if the right-hand sides of phrase structure rules are allowed to be arbitrary regular expressions, then the generated languages are still context-free. Might something like this not be happening with essential variables in metarules? It turns out not.
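The finite-closure case can be sketched directly. The rule encoding and the simplified particle-movement metarule below (with the feature on NP dropped) are our own illustration, not the GPSG formalism itself.

```python
# Sketch: a metarule as a function from rules to derived rules. Rules
# are (lhs, rhs-tuple) pairs; the variable X is simply the (finite)
# tail of the right-hand side, so closure stays finite.

def particle_metarule(rule):
    """Simplified particle-movement metarule (hypothetical encoding):
    V -> V NP Pt X  ==>  V -> V Pt NP X, for any tail X."""
    lhs, rhs = rule
    if lhs == "V" and rhs[:3] == ("V", "NP", "Pt"):
        return [("V", ("V", "Pt", "NP") + rhs[3:])]
    return []

def close_under(rules, metarules):
    """Close a rule set under a list of metarules."""
    closed = set(rules)
    frontier = list(rules)
    while frontier:
        rule = frontier.pop()
        for m in metarules:
            for derived in m(rule):
                if derived not in closed:
                    closed.add(derived)
                    frontier.append(derived)
    return closed

base = {("V", ("V", "NP", "Pt")), ("V", ("V", "NP", "Pt", "PP"))}
closed = close_under(base, [particle_metarule])
print(sorted(closed))  # four rules: two base, two derived
```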
The formal consequences of the presence of essential variables in metarules depend on the presence of another device, so-called phantom categories. It may be convenient in formulating metarules to allow, in the left-hand sides of rules, occurrences of syntactic categories that are never introduced by the grammar, i.e. that never appear in the right-hand sides of rules. In standard CFGs, these are called useless categories, and rules containing them can simply be dropped, with no change in generative capacity. Not so with metarules: it is possible for metarules to rewrite rules containing phantom categories into rules without them. Such a device was proposed at one time as a way to implement passives in the GPSG framework.
Uszkoreit and Peters [49] have shown that essential variables in metarules are powerful devices indeed: CF grammars with metarules that use at most one essential variable and allow phantom categories can generate all recursively enumerable sets. Even if phantom categories are banned, as long as the use of at least one essential variable is allowed, some non-recursive sets can be generated.

Possible restrictions on the use of metarules are suggested in Gazdar and Pullum [12]. Shieber et al. [45] discuss some empirical consequences of these moves.
7. Tree Adjunct Grammars.
The Tree Adjunct Grammars (TAGs) of Joshi and his colleagues present a different way of accounting for syntactic dependencies ([17], [19]). A TAG consists of two (finite) sets of (finite) trees, the centre trees and the adjunct trees.

The centre trees correspond to the surface structures of the "kernel" sentences of the languages. The root of each adjunct tree is labelled with a non-terminal symbol which also appears exactly once on the frontier of the tree. All other frontier nodes are labelled with terminal symbols. Derivations in TAGs are defined by repeated application of the operation of adjunction. If c is a centre tree containing an occurrence of a non-terminal A, and if a is an adjunct tree whose root (and one node n on the frontier) is labelled A, then the adjunction of a to c is performed by "detaching" from c the subtree t rooted at A, attaching a in its place, and reattaching t at node n.
Adjunction may then be seen as a tree analogue of a context-free derivation for strings [40]. The string languages obtained by taking the yields of the tree languages generated by TAGs are called Tree Adjunct Languages, or TALs.
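Adjunction is easy to state operationally. The sketch below, with a tree encoding and an a^n b^n example of our own, adjoins an adjunct tree at each outermost node of the centre tree carrying its root label.

```python
def substitute_foot(tree, root_label, subtree):
    """Replace the foot of an adjunct tree (the frontier node sharing
    the root's label) by `subtree`."""
    label, children = tree
    if label == root_label and not children:
        return subtree
    return (label, [substitute_foot(c, root_label, subtree) for c in children])

def adjoin(centre, aux):
    """Adjoin adjunct tree `aux` at each outermost node of `centre`
    labelled like aux's root: detach the subtree there, put `aux` in
    its place, and reattach the detached subtree at aux's foot."""
    root_label = aux[0]
    label, children = centre
    if label == root_label:
        return substitute_foot(aux, root_label, centre)
    return (label, [adjoin(c, aux) for c in children])

def frontier(tree):
    """Concatenate the labels on the frontier (the tree's yield)."""
    label, children = tree
    return label if not children else "".join(frontier(c) for c in children)

# Hypothetical TAG for a^n b^n: one centre tree and one adjunct tree.
centre = ("S", [("a", []), ("b", [])])
aux = ("S", [("a", []), ("S", []), ("b", [])])   # ("S", []) is the foot

t = adjoin(adjoin(centre, aux), aux)
print(frontier(t))  # aaabbb
```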
In TAGs all long-distance dependencies are the result of adjunctions separating nodes that at one point in the derivation were "close". Both crossing and non-crossing dependencies can be represented [18]. The formal properties of TAGs are fully discussed in [20] and [52]. Of particular interest are the following.

TALs properly contain the CFLs and are properly contained in the indexed languages, which in turn are properly contained in the CSLs. Although the indexed languages contain NP-complete languages, TALs are much better behaved: Joshi and Yokomori report (personal communication) an O(n^4) recognition algorithm and conjecture that an O(n^3) bound may be possible.
8. A Pointer to Empirical Discussions.
The literature on the empirical issues underlying the formal results reported here is not extensive.

Chomsky argues convincingly [8] that there is no argument for natural languages necessarily being recursive. This, of course, is different from the possibility that languages are contingently recursive. Putnam [39] gives three reasons he claims "point in this direction": (1) "speakers can presumably classify sentences as acceptable or unacceptable, deviant or non-deviant, et cetera, without reliance on extra-linguistic contexts. There are of course exceptions to this rule", (2) grammaticality judgements can be made for nonsense sentences, and (3) grammars can be learned. (2) and (3) are irrelevant and (1) contains its own counter-argument.
Peters and Ritchie [33] contains a suggestive but hardly open-and-shut case for contingent recursivity: (1) every TG has an exponentially bounded cycling function, and thus generates only recursive languages, (2) every natural language has a descriptively adequate TG, and (3) the complexity of languages investigated so far is typical of the class.
Hintikka [16] presents a very different argument against the recursivity of English based on the distribution of the words any and every. His account of why John knows everything is grammatical while John knows anything is not is that any can appear only in contexts where replacing it by every changes the meaning. Taking meaning to be logical equivalence, this means that grammaticality is dependent on the determination of logical equivalence of logical formulas, an undecidable problem. Chomsky [8] argues that a simpler solution is available, namely one that replaces logical equivalence by syntactic identity of some kind of logical form.
Pullum and Gazdar [38] is a thorough survey of, and argument against, published claims (mainly the "respectively" examples [26], Dutch cross-serial dependencies, and nominalization in Mohawk [37]) that some natural languages cannot be weakly generated by CF grammars. No claims are made about the strong adequacy of CFGs.
9. Seeking Significance.
When can the supporter of a weak (syntactic) formalism (i.e. one of low recognition complexity and low generative capacity) claim that it is superior to a competing, more powerful formalism?

Linguistic theories can differ along several dimensions, with generative capacity and recognition complexity being only two (albeit related) ones. The evaluation must take into consideration at least the following others:

Coverage. Do the theories make the same grammatical predictions?
Extensibility. The linguistic theory of which the syntactic theory is a part will want to express well-formedness constraints other than syntactic ones. These constraints may be expressed over syntactic representations, or over different representations, presumably related to the syntactic ones. One theory may make this connection possible when another does not. This of course underlies the arguments for strong descriptive adequacy.

Also relevant here is how the linguistic theory as a whole is decomposed. The syntactic theory can obviously be made simpler by transferring some of the explanatory burden to another constituent. The classic example in programming languages is the constraint that all variables must be declared before they are used. This constraint cannot be imposed by a CFG but can be by an indexed grammar, at the cost of a dramatic increase in recognition complexity. Typically, however, the requirement is simply not considered part of "syntax", which thus remains CF, and is imposed separately; in this case, the overall recognition complexity remains some low-order polynomial. Some arguments of this kind can be found in [6].
Separating the constraints into different sub-theories will not in general make the problem of recognizing strings that satisfy all the constraints any more efficient, but it may allow limiting the power of each constituent. To take an extreme example, every r.e. set is the homomorphic image of the intersection of two context-free languages.
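The power hidden in such intersections can be made concrete with a small sketch (mine, not from the paper): each of the two languages below is context-free on its own, yet their intersection is the classic non-context-free language a^n b^n c^n.

```python
import re

# Two context-free languages over {a, b, c}:
#   L1 = { a^i b^i c^j }  (the a's match the b's)
#   L2 = { a^i b^j c^j }  (the b's match the c's)
def in_L1(s):
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_L2(s):
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

# Their intersection is { a^n b^n c^n }, which no CFG generates:
# imposing both constraints at once exceeds context-free power,
# even though each sub-theory separately stays context-free.
def in_intersection(s):
    return in_L1(s) and in_L2(s)

print(in_intersection("aabbcc"))  # True
print(in_intersection("aabbc"))   # False
```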
Implementation. This is probably the most subtle set of issues determining the significance of the formal results, and I don't claim to understand them.
Comparison between theories requires agreement between the machine models used to derive the complexity results. As mentioned above, the sequential models are all polynomially related, and no problem not having a polynomial time solution on a sequential machine is likely to have one on a parallel machine limited to at most a polynomial number of processors, at least if P is not equal to NP. Both these results restrict the improvement one can obtain by changing implementation, but are of little use in comparing algorithms of low complexity. Berwick and Weinberg [6] give examples of how algorithms of low complexity may have different implementations differing by large constant factors. In particular, changes in the form of the grammar and in its representation may have this effect.
But of more interest, I believe, is the fact that implementation is often accompanied by some form of resource limitation that has two effects. First, it is also a change in specification. A context-free parser implemented with a bounded stack recognizes only a finite-state language.
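As a minimal illustration (my sketch, not an implementation discussed in the paper), here is a recognizer for the context-free language a^n b^n whose bounded-stack variant accepts only a finite-state approximation of it:

```python
def balanced(s, max_stack=None):
    """Recognize a^n b^n with an explicit stack.

    With max_stack=None this is a true (unbounded) pushdown
    recognizer; with a finite bound it collapses to a finite-state
    device, and the specification has silently changed: deeply
    nested inputs that are in the language are now rejected.
    """
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("a")
        if max_stack is not None and len(stack) > max_stack:
            return False        # resource limit hit: parse abandoned
        i += 1
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    return i == len(s) and not stack

print(balanced("aaabbb"))               # True
print(balanced("aaabbb", max_stack=2))  # False: same string, new spec
```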
Second, very special implementations can be used if one is willing to restrict the size of the problem to be solved, or even use special-purpose methods for limited problems. Marcus's parser [28] with its bounded look-ahead is another good example. Sentences parsable within the allowed look-ahead have "quick" parses, but some grammatical sentences, such as "garden path" sentences, cannot be recognized without an extension to the mechanism that would distort the complexity measures.
There is obviously much more of this story to be told. Allow me to speculate as to how it might go. We may end up with a space of linguistic theories, differing in the idealization of the data they assume, in the way they decompose constraints, and in the procedural specifications they postulate. (I take it that two theories may differ in that the second simply provides more detail than the first as to how constraints specified by the first are to be used.) Our observations, in particular our measurements of necessary resources, are drawn from the "ultimate implementation", but this does not mean that the "ultimately low-level theory" is necessarily the most informative (witness many examples in the physical sciences), or that less procedural theories are not useful stepping stones to more procedural ones.
It is also not clear that theories of different computational power may not be useful as descriptions of different parts of the syntactic apparatus. For example, it may be easier to learn statements of constraints within the framework of a general machine. The constraints, once learned, might then be subjected to transformation to produce more efficient special-purpose processors, also imposing resource limitations. Indeed, the "possible languages" of the future may be more complex than the present ones, just as earlier ones may have been syntactically simpler. Were ancient languages regular?
Whatever we decide to make of existing formal results, it is clear that continuing contact with the complexity community is important. The driving problems there are the P = NP question, the determination of lower bounds, the study of time-space tradeoffs, and the complexity of parallel computations. We still have some methodological house-cleaning to do, but I don't see how we can avoid being affected by the outcome of their investigations.
ACKNOWLEDGEMENTS
Thanks to Bob Berwick, Aravind Joshi, Jim Hoover, and Stan Peters for their suggestions.
APPENDIX
Rounds [41] proves that context-sensitive rules under node-admissibility generate only context-free languages by constructing a non-deterministic bottom-up tree automaton to recognize the accepted trees. We sketch here a proof that makes use of several deterministic transducers instead.
FSTAs can be generalized so that instead of simply accepting or rejecting trees, they transform them, by adding constant trees, and deleting or duplicating subtrees. Such devices are called finite state tree transducers (FSTT), and like the FSTA they can be top-down or bottom-up. First motivated as models of syntax-directed translations for compilers, they have been extensively studied (e.g. [47], [48], [40]) but a simple subset is sufficient here.
The idea is this. Let T be the set of trees accepted by the CS-based grammar. Let t be in T. FSTTs can be used to label each node n of t with the set of all proper analyses passing through n. It will then be simple to check that each node satisfies one of the node admissibility conditions by sweeping through the labelled tree with a bottom-up FSTA.
The node labelling is done by two FSTTs r1 and r2. Let m be the maximum length of any left or right-context of any node admissibility condition. Thus we need only label nodes with sets of strings of length at most m, and over a finite alphabet there are only a finite number of such strings.
r1 operates bottom-up on a tree t, and labels each node n of t with three sets Prefix(n), Suffix(n), and Yield(n) of proper analyses: if P is the set of all proper analyses of the subtree rooted at n, then Prefix(n) is the set of all substrings of length at most m that are prefixes of strings of P. Similarly, Suffix(n) is the set of all suffixes of length at most m, and Yield(n) is the set of all strings of P of length at most m. It can easily be shown that for any set of trees T, T is recognizable if and only if r1(T) is.
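To make these three sets concrete, here is a small illustrative sketch (mine, and deliberately naive: it unfolds the full set of proper analyses of a toy tree, whereas r1 would compute the truncated sets compositionally in a single bottom-up pass). The tuple encoding of trees and the bound m = 2 are assumptions for the example.

```python
from itertools import product

# A tree is (label, [children]); a leaf has an empty child list.
M = 2  # assumed bound m on context length, for illustration

def proper_analyses(tree):
    """All proper analyses (horizontal cuts) of the subtree rooted here:
    either the root label itself, or a concatenation of one proper
    analysis from each child."""
    label, children = tree
    analyses = {label}                       # the cut at the root
    if children:
        for parts in product(*(proper_analyses(c) for c in children)):
            analyses.add("".join(parts))     # a cut through the children
    return analyses

def label_node(tree, m=M):
    """The three sets r1 attaches to a node, truncated to length m."""
    P = proper_analyses(tree)
    return {
        "Prefix": {p[:m] for p in P},        # prefixes of length <= m
        "Suffix": {p[-m:] for p in P},       # suffixes of length <= m
        "Yield":  {p for p in P if len(p) <= m},  # whole short analyses
    }

t = ("S", [("A", []), ("B", [("C", []), ("D", [])])])
print(sorted(proper_analyses(t)))  # ['AB', 'ACD', 'S']
```

Because only boundedly many strings of length at most m exist over a finite alphabet, these labels range over a finite set, which is what lets a finite-state device carry them.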
Applied to the output of r1, the second transducer r2, operating top-down, labels each node n with all the proper analyses going through n, i.e. with a pair of sets of strings. The first set will contain all left-contexts of node n and the second all right-contexts. r2 also preserves recognizability. A bottom-up FSTA can now be defined to check at each node that both the context-free part of a rule and its context conditions are satisfied.
This argument also extends easily to cover the dominance predicates of Joshi and Levy: transducers can be added to label each node with all its "top contexts" and all its "bottom contexts". The final FSTA must then check that the nodes satisfy whatever Boolean combination of dominance and proper analysis predicates are required by the node admissibility rules.
REFERENCES
[1] Aho A.V., Indexed grammars: an extension of the context-free grammars, JACM 15, 647-671, 1968.
[2] Aho A.V., Hopcroft J.E. and Ullman J.D., The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.
[3] Aho A.V. and Ullman J.D., The Theory of Parsing, Translation, and Compiling, vol. I: Parsing, Prentice Hall, Englewood Cliffs, N.J., 1972.
[4] Baker B.S., Arbitrary grammars generating context-free languages, TR 11-72, Center for Research in Computing Technology, Harvard Univ., 1972.
[5] Berwick R.C., Computational complexity and lexical functional grammar, 19th ACL, 1981.
[6] Berwick R.C. and Weinberg A., Parsing efficiency, computational complexity, and the evaluation of grammatical theories, Ling. Inq. 13, 165-191, 1982.
[7] Borgida A.T., Formal Studies of Stratificational Grammars, Ph.D. Thesis, University of Toronto, 1977.
[8] Chomsky N., Rules and Representations, Columbia University Press, 1980.
[9] Cook S.A., Towards a complexity theory of synchronous parallel computation, L'Enseignement Mathématique 27, 99-124, 1981.
[10] Earley J., An efficient context-free parsing algorithm, Comm. of ACM 13, 2, 94-102, 1970.
[11] Gazdar G., Phrase structure grammar, in Jacobson P. and Pullum G. (eds.), The Nature of Syntactic Representation, Reidel, 1982.
[12] Gazdar G. and Pullum G., Generalized Phrase Structure Grammar: A Theoretical Synopsis, Indiana Univ. Ling. Club, 1982.
[13] Graham S.L., Harrison M.A. and Ruzzo W.L., An improved context-free recognizer, ACM Trans. on Prog. Lang. and Systems 2, 3, 415-462, 1980.
[14] Hopcroft J.E. and Ullman J.D., Introduction to Automata Theory, Languages and Computation, Addison-Wesley, 1979.
[15] Hays D.G., Automatic language data processing, in Computer Applications in the Behavioral Sciences, H. Borko (ed.), Prentice Hall, Englewood Cliffs, N.J., 1962.
[16] Hintikka J.K.K., Quantifiers in natural languages: some logical problems II, Ling. and Phil., 153-172, 1977.
[17] Joshi A.K., How much context-sensitivity is required to provide reasonable structural descriptions: tree adjoining grammars, to appear in Dowty D., Karttunen L. and Zwicky A. (eds.), Natural Language Processing: Psycholinguistic, Computational and Theoretical Properties, Cambridge Univ. Press.
[18] Joshi A.K., Factoring recursion and dependencies: an aspect of Tree Adjoining Grammars and a comparison of some formal properties of TAG's, GPSG's, PLG's and LFG's, these Proceedings, 1983.
[19] Joshi A.K. and Levy L.S., Phrase structure trees bear more fruit than you would have thought, 18th ACL, 1980.
[20] Joshi A.K., Levy L.S. and Takahashi M., Tree adjunct grammars, J. of Comp. and Sys. Sc. 10, 1, 136-163, 1975.
[21] Joshi A.K. and Levy L.S., Constraints on structural descriptions: local transformations, SIAM J. on Computing 6, 1977.
[22] Joshi A.K., Levy L.S. and Yueh K., Local constraints on programming languages, Part 1: syntax, Th. Comp. Sc. 12, 265-290, 1980.
[23] Joshi A.K. and Yokomori T., Some characterization theorems for tree adjunct languages and recognizable sets, forthcoming.
[24] Kaplan R. and Bresnan J., Lexical-Functional Grammar: a formal system for grammatical representation, in Bresnan J. (ed.), The Mental Representation of Grammatical Relations, MIT Press, 1982.
[25] Lamb S., Outline of Stratificational Grammar, Georgetown University Press, Washington, 1966.
[26] Langendoen D.T., On the inadequacy of Type-2 and Type-3 grammars for human languages, in P.J. Hopper (ed.), Studies in Historical Linguistics: festschrift for Winfred P. Lehmann, John Benjamins, Amsterdam, 159-171, 1977.
[27] LaPointe S., Recursiveness and deletion, Ling. Anal. 3, 227-265, 1976.
[28] Marcus M.P., A Theory of Syntactic Recognition for Natural Language, MIT Press, 1980.
[29] Matthews R., Are the grammatical sentences of a language a recursive set?, Synthese 40, 209-224, 1979.
[30] Montague R., The proper treatment of quantification in ordinary English, in Hintikka J., Moravcsik J. and Suppes P. (eds.), Approaches to Natural Language, Reidel, Dordrecht, 1973.
[31] Peters P.S. and Ritchie R.W., A note on the universal base hypothesis, J. of Linguistics 5, 150-152, 1969.
[32] Peters P.S. and Ritchie R.W., On restricting the base component of transformational grammars, Inf. and Control 18, 483-501, 1971.
[33] Peters P.S. and Ritchie R.W., On the generative power of transformational grammars, Inf. Sc. 6, 49-83, 1973.
[34] Peters P.S. and Ritchie R.W., Context-sensitive immediate constituent analysis - context-free languages revisited, Math. Sys. Theory 6, 324-333, 1973.
[35] Peters P.S. and Ritchie R.W., Non-filtering and local filtering grammars, in J.K.K. Hintikka, J.M.E. Moravcsik and P. Suppes (eds.), Approaches to Natural Language, Reidel, 180-194, 1973.
[36] Petrick S.R., A Recognition Procedure for Transformational Grammars, Ph.D. Thesis, MIT, 1965.
[37] Postal P.M., Limitations of phrase-structure grammars, in J.A. Fodor and J.J. Katz (eds.), The Structure of Language: Readings in the Philosophy of Language, Prentice Hall, Englewood Cliffs, 137-151, 1964.
[38] Pullum G.K. and Gazdar G., Natural languages and context-free languages, Ling. and Phil. 4, 471-504, 1982.
[39] Putnam H., Some issues in the theory of grammar, in Proc. of Symposia in Applied Mathematics, American Math. Soc., 1961.
[40] Rounds W.C., Mappings and grammars on trees, Math. Sys. Th. 4, 3, 257-287, 1970.
[41] Rounds W.C., Tree-oriented proofs of some theorems on context-free and indexed languages, 2nd ACM Symp. on Theory of Computing, 109-116, 1970.
[42] Rounds W.C., Complexity of recognition in intermediate-level languages, 14th Symp. on Switching and Automata Theory, 1973.
[43] Rounds W.C., A grammatical characterization of exponential-time languages, Symp. on Found. of Comp. Sc., 135-143, 1975.
[44] Slocum J., A practical comparison of parsing strategies, 19th ACL, 1981.
[45] Shieber S.M., Stucky S.U., Uszkoreit H. and Robinson J.J., Formal constraints on metarules, these Proceedings, 1983.
[46] Thatcher J.W., Characterizing derivation trees of context-free grammars through a generalization of finite automata theory, J. of Comp. and Sys. Sc. 1, 317-322, 1967.
[47] Thatcher J.W., Generalized sequential machine maps, J. of Comp. and Sys. Sc. 4, 339-367, 1970.
[48] Thatcher J.W., Tree automata: an informal survey, in A. Aho (ed.), Currents in the Theory of Computing, Prentice Hall, 148-172, 1973.
[49] Uszkoreit H. and Peters P.S., Essential variables in metarules, forthcoming.
[50] Valiant L., General context-free recognition in less than cubic time, J. of Comp. and Sys. Sc. 10, 308-315, 1975.
[51] Warren D.S., Syntax and Semantics of Parsing: An Application to Montague Grammar, Ph.D. Thesis, University of Michigan, 1979.
[52] Yokomori T. and Joshi A.K., Semi-linearity, Parikh-boundedness and tree adjunct languages, to appear in Inf. Pr. Letters, 1983.
[53] Younger D.H., Recognition and parsing of context-free languages in time n^3, Inf. and Control 10, 2, 189-208, 1967.
[54] Kasami T., An efficient recognition and syntax algorithm for context-free languages, Air Force Cambridge Research Laboratory report AF-CRL-65-758, Bedford, MA, 1965.
[55] Ruzzo W.L., On uniform circuit complexity (extended abstract), Proc. of 20th Annual Symp. on Found. of Comp. Sc., 312-318, 1979.
