<?xml version="1.0" standalone="yes"?> <Paper uid="P83-1015"> <Title>On the Mathematical Properties of Linguistic Theories</Title> <Section position="2" start_page="0" end_page="98" type="metho"> <SectionTitle> 2. Preliminary Definitions </SectionTitle> <Paragraph position="0"> We assume the reader is familiar with the basic definitions of regular, context-free (CF), context-sensitive (CS), recursive, and recursively enumerable (r.e.) languages and with their accepters, as can be found in [?]. Some elementary definitions from complexity theory may be useful; further details may be found in [2]. Complexity theory is the study of the resources required by algorithms, usually space and time. Let f(x) be a function, say the recognition function for a language L. The most interesting results we could obtain about f would be a lower bound on the resources needed to compute f on a machine of a given architecture, say a von Neumann computer or a parallel array of neurons. These results over whole classes of machines are very difficult to obtain, and none of any significance exist for parsing problems. (This research was sponsored by the Natural Sciences and Engineering Research Council of Canada under Grant Ag2Rs.)</Paragraph> <Paragraph position="1"> Restricting ourselves to a specific machine model and an algorithm M for f, we can ask about the cost (e.g. time or space) c(x) of executing M on a specific input x. Typically c is too fine-grained to be useful: what one studies instead is a function c_w whose argument is an integer n denoting the size of the input to M, and which gives some measure of the cost of processing inputs of length n. Complexity theorists have been most interested in the asymptotic behaviour of c_w, i.e. the behaviour of c_w as n gets large.</Paragraph> <Paragraph position="2"> If one is interested in upper bounds on the behaviour of M, one usually defines c_w(n) as the maximum of c(x) over all inputs x of size n. This is called the worst-case complexity function for M. Notice that other definitions are possible: one could define the expected complexity function c_e(n) for M as the average of c(x) over all inputs of length n. c_e might be more useful than c_w if one had an idea of what the distribution of inputs to M could be.</Paragraph> <Paragraph position="3"> Unfortunately, the introduction of probabilistic considerations makes the study of expected complexity technically more difficult than that of worst-case complexity. For a given problem, expected and worst-case measures may be quite different.</Paragraph> <Paragraph position="4"> It is quite difficult to get detailed descriptions of c_w, and for many purposes a cruder estimate is sufficient.</Paragraph> <Paragraph position="5"> The next abstraction involves "lumping" classes of c_w functions into simpler ones that more clearly demonstrate their asymptotic behaviour and are easier to manipulate. This is the purpose of O-notation. Let f(n) and g(n) be two functions. f is said to be O(g) if a constant multiple of g is an upper bound for f, for all but a finite number of values of n. More precisely, f is O(g) if there are constants K and n_0 such that for all n > n_0, f(n) < K·g(n).</Paragraph>
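<Paragraph> Stated compactly in LaTeX (a restatement of the definition just given, with a small worked instance added for concreteness):

    f \in O(g) \iff \exists K > 0\ \exists n_0\ \forall n > n_0 :\ f(n) < K \cdot g(n)

For instance, $f(n) = 3n^2 + 5n$ is $O(n^2)$: take $K = 4$ and $n_0 = 5$; for every $n > 5$ we have $5n < n^2$, hence $3n^2 + 5n < 4n^2$. </Paragraph>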
<Paragraph position="6"> Given an algorithm M, we will say that its worst-case time complexity is O(g) if the worst-case time cost function c_w(n) for M is O(g). Notice that this merely says that almost all inputs to M of size n can be processed in time at most a constant times g(n). It does not say that all inputs require g(n) time, or even that any do, even on M, let alone on any other machine that implements f. Also, if two algorithms A_1 and A_2 are available for a function f, and if their worst-case complexities can be given respectively as O(g_1) and O(g_2), with g_1 < g_2, it may still be the case that for a large number of cases (maybe even for all cases one is likely to encounter in practice) A_2 will be the preferable algorithm, simply because the constant K_2 for g_2 may be much smaller than K_1 for g_1.</Paragraph> <Paragraph position="7"> In examining known results about the recognition complexity of various theories, it is useful to consider how "robust" they are in the face of changes in the machine model from which they were derived. These models can be divided into two classes: sequential models and parallel models. Sequential models [2] include the familiar single- and multi-tape Turing Machines (TMs) as well as Random Access Machines (RAMs) and Random Access Stored Program Machines (RASPs). A RAM is like a TM except that its working memory is random access rather than sequential. A RASP is like a RAM but stores its program in its memory. Of all these models, it is the most like a von Neumann computer.</Paragraph> <Paragraph position="8"> All these sequential models can simulate each other in ways that do not cause great changes in time complexity. For example, a multi-tape Turing Machine that runs in time O(t) can be simulated by a RAM in time O(t), and conversely, a RAM running in time O(t) can be simulated by a multi-tape TM in time O(t^2). In fact, all familiar sequential models are polynomially related: they can simulate each other with at most a polynomial loss in efficiency. Thus if a syntactic model is known to have a difficult recognition problem on one sequential model, then it will not have a much easier one on another.</Paragraph> <Paragraph position="9"> Transforming a sequential algorithm into one on a parallel machine with a fixed number K of processors provides at most a factor-K improvement in speed. More interesting results are obtained when the number of processors is allowed to grow with the size of the problem, e.g. with the length of the string to be parsed. If we view these processors as connected together in a circuit, with input values entering at one end and outputs being produced at the other, then a problem that has a solution on a sequential machine in polynomial time and in space s will have a solution on a parallel machine with a polynomial number of processors and circuit depth (the maximum number of processors data must be passed through from input to output) O(s^2). Since the depth of a parallel circuit corresponds to the (parallel) time required to complete the computation, this means that problems with sequential solutions requiring small space (such as the deterministic CSLs) have fast parallel solutions. For a comprehensive survey of parallel computation, see Cook [9].</Paragraph> </Section> <Section position="3" start_page="98" end_page="99" type="metho"> <SectionTitle> 3. Context-Free Languages </SectionTitle> <Paragraph position="0"> Recognition techniques for context-free languages are well known [3]. The so-called "CKY" or "dynamic programming" method is attributed by Hays [?] to J. Cocke, and was discovered independently by Kasami [?] and Younger [53], who showed it to be O(n^3). It requires the grammar to be in Chomsky Normal Form (CNF), and putting an arbitrary grammar in CNF may square the size of the grammar.</Paragraph>
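<Paragraph> As a concrete illustration of the dynamic-programming method just described, here is a minimal CKY recognizer in Python (a sketch; the rule encoding and the toy grammar are invented for the example, not taken from the paper):

    def cky_recognize(word, grammar, start="S"):
        """CKY recognition for a grammar in Chomsky Normal Form.

        grammar: list of rules, each either (A, (B, C)) for A -> B C
                 or (A, a) for A -> a, with a a terminal string.
        Runs in O(n^3 * |grammar|) time and O(n^2) space.
        """
        n = len(word)
        # table[i][j]: set of nonterminals deriving word[i:j+1]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, a in enumerate(word):
            for lhs, rhs in grammar:
                if rhs == a:                      # unary rule A -> a
                    table[i][i].add(lhs)
        for span in range(2, n + 1):              # width of the substring
            for i in range(n - span + 1):
                j = i + span - 1
                for k in range(i, j):             # split point
                    for lhs, rhs in grammar:
                        if isinstance(rhs, tuple):
                            b, c = rhs
                            if b in table[i][k] and c in table[k + 1][j]:
                                table[i][j].add(lhs)
        return start in table[0][n - 1]

    # Toy grammar for {a^n b^n : n >= 1} in CNF (illustrative only):
    # S -> A T | A B, T -> S B, A -> a, B -> b
    g = [("S", ("A", "T")), ("S", ("A", "B")), ("T", ("S", "B")),
         ("A", "a"), ("B", "b")]
    assert cky_recognize("aabb", g) and not cky_recognize("abab", g)

</Paragraph>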
<Paragraph position="1"> Earley's algorithm [10] recognizes strings in arbitrary CFGs in time O(n^3) and space O(n^2), and in time O(n^2) for unambiguous CFGs. Graham, Harrison and Ruzzo [13] give an algorithm that unifies the CKY and Earley algorithms, and discuss implementation details.</Paragraph> <Paragraph position="2"> Valiant [50] showed how to interpret the CKY algorithm as finding the transitive closure of a matrix, and thus reduced CF recognition to matrix multiplication, for which sub-cubic algorithms exist. Because of the enormous constants of proportionality associated with this method, it is not likely to be of much practical use, either as an implementation method or as a description of the function of the brain.</Paragraph> <Paragraph position="3"> Ruzzo [55] has shown how CFLs can be recognized by boolean circuits of depth O(log(n)^2), and thus that parallel recognition can be done in time O(log(n)^2). The required circuit has size polynomial in n.</Paragraph> <Paragraph position="4"> So as not to be mystified by these upper bounds on CF recognition, it is useful to remember that no known CFL requires more than linear time, nor is there a (non-constructive) proof of the existence of such a language. For an empirical comparison of various parsing methods, see Slocum [44].</Paragraph> <Paragraph position="5"> 4. Transformational Grammar.</Paragraph> <Paragraph position="6"> From its earliest days, discussion of transformational grammar (TG) has included mention of matters computational.</Paragraph> <Paragraph position="7"> Peters and Ritchie [33] provided the first non-trivial results on the generative power of TGs. Their model reflects the "Aspects" version quite closely, including transformations that can move and add constituents, and delete them subject to recoverability. All transformations are obligatory, and are applied cyclically from the bottom up. They show that every recursively enumerable (r.e.) set can be generated by a TG using a context-sensitive base. The proof is quite simple: the right-hand sides of the type-0 rules that generate the r.e. set are padded with a new "blank" symbol to make them at least as long as their left-hand sides. Rules are added to allow the blank symbols to commute with all others. These context-sensitive rules are then used as the base of a TG whose only transformation deletes the blank symbols.</Paragraph>
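<Paragraph> The padding step of this construction is easy to make concrete (a sketch in Python; the rule encoding and symbol names are invented for illustration):

    BLANK = "#"  # fresh symbol, assumed not in the original alphabet

    def pad_to_context_sensitive(rules, alphabet):
        """Peters-and-Ritchie-style padding (a sketch).

        rules: list of (lhs, rhs) pairs, each a tuple of symbols,
               representing unrestricted (type-0) productions.
        Returns length-nondecreasing rules plus commutation rules
        that let BLANK move past every other symbol. A transformation
        deleting BLANK would then recover the original language.
        """
        padded = []
        for lhs, rhs in rules:
            if len(rhs) < len(lhs):            # pad short right-hand sides
                rhs = rhs + (BLANK,) * (len(lhs) - len(rhs))
            padded.append((lhs, rhs))
        for a in alphabet:                      # BLANK commutes with all symbols
            padded.append(((BLANK, a), (a, BLANK)))
            padded.append(((a, BLANK), (BLANK, a)))
        return padded

    # e.g. an erasing rule A B -> A becomes the length-preserving A B -> A #
    print(pad_to_context_sensitive([(("A", "B"), ("A",))], ["A", "B"]))

</Paragraph>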
<Paragraph position="8"> Thus if the transformational formalism itself is supposed to characterize the grammatical strings of possible natural languages, then the only languages being excluded are those which are not enumerable under any model of computation.</Paragraph> <Paragraph position="9"> At the expense of a considerably more intricate argument, the previous result can be strengthened [32] to show that every r.e. set can be generated by a context-free based TG, as long as a filter (intersection with a regular set) can be applied to the phrase-markers output by the transformations. In fact, the base grammar can be independent of the language being generated. The proof involves simulating a TM by a TG. The transformations first generate an "input tape" for the TM being simulated, and then apply the TM productions, one per cycle of the grammar. The filter ensures that the base grammar generates just as many S nodes as necessary to generate the input string and do the simulation. Again, if the transformational formalism is supposed to characterize the possible natural languages, then the Universal Base Hypothesis [31], according to which all natural languages can be generated from the same base grammar, is empirically vacuous: any recursively enumerable language can.</Paragraph> <Paragraph position="10"> Several attempts were then made to find a restricted form of the transformational model that was descriptively adequate and yet whose generated languages are recursive (see e.g. [27]). Since a key part of the proof in [32] involves the use of a filter on the final derivation trees, Peters and Ritchie examined the consequences of forbidding final filtering [35]. They show that if S is the only recursive symbol in the CF base, then the generated language L is predictably enumerable and exponentially bounded. A language L is predictably enumerable if there is an "easily" computable function t(n) that gives an upper bound on the number of tape squares needed by its enumerating TM to enumerate the first n elements of L. L is exponentially bounded if there is a constant K such that for every string z in L there is another string z' in L whose length is at most K times the length of z.</Paragraph> <Paragraph position="11"> The class of non-filtering languages is quite unusual, including all the CFLs (obviously), but also some (but not all) CSLs, some (but not all) recursive languages, and some (but not all) r.e. languages.</Paragraph> <Paragraph position="12"> The source of non-recursivity in transformationally generated languages is that transformations can delete arbitrarily large parts of the tree, thus producing surface trees arbitrarily smaller than the deep-structure trees they were derived from. This is what Chomsky's recoverability-of-deletions condition was meant to avoid. In his thesis, Petrick [36] defines the following terminal-length-increasing condition on transformational derivations. Consider two p-markers from a derivation, where the right one is derived from the left one by applying the cycle of transformations to a subtree t, producing a subtree t'; continuing the derivation, apply the cycle to the tree dominating t', yielding a subtree t''. [The original tree diagrams are not reproduced here.] A derivation satisfies the terminal-length-increasing condition if the yield of t'' is always longer than the yield of t'. Petrick shows that if all recursion in the base "passes through S" and if all derivations satisfy the terminal-length-increasing condition, then the generated language is recursive. Using a slightly more restricted model of transformations, Rounds [42] strengthens this result by showing that the resulting languages are in fact context-sensitive.</Paragraph> <Paragraph position="13"> In an unpublished paper, Myhill shows that if the condition is weakened to terminal-length-non-decreasing, then the resulting languages can be recognized in space at most exponential in the length of the input.</Paragraph>
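<Paragraph> The inference that opens the next paragraph rests on a standard containment from complexity theory (a textbook fact, not a result of this paper):

    \mathrm{DSPACE}(s(n)) \subseteq \mathrm{DTIME}\!\left(2^{O(s(n))}\right)

With $s(n) = 2^{O(n)}$, this gives recognition time $2^{2^{O(n)}}$, i.e. at most double-exponential. </Paragraph>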
<Paragraph position="14"> This implies that the recognition can be done in at most double-exponential time, but Rounds [?] shows not only that recognition can be done in exponential time, but that every language recognizable in exponential time can be generated by a TG satisfying the terminal-length-non-decreasing condition and recoverability of deletions. This is a very strong result, because of the closure properties of the class of exponential-time languages. To see why this is so requires a few more definitions. Let P be the class of all languages that can be recognized in polynomial time on a deterministic TM, and NP the class of all languages that can be recognized in polynomial time on a non-deterministic TM. P is obviously contained in NP, but the converse is not known, although there is much evidence that it is false.</Paragraph> <Paragraph position="15"> There is a class of problems, the so-called NP-complete problems, which are in NP and "as difficult" as any problem in NP in the following sense: if any of them could be shown to be in P, then all the problems in NP would also be in P. One way to show that a language L is NP-complete is to show that L is in NP and that every other language L_0 in NP can be polynomially transformed into L, i.e. that there is a deterministic TM, operating in polynomial time, that will transform an input w_0 to L_0 into an input w to L such that w_0 is in L_0 if and only if w is in L. In practice, to show that a language is NP-complete, one shows that it is in NP, and that some already-known NP-complete language can be polynomially transformed to it.</Paragraph> <Paragraph position="16"> All the known NP-complete languages can be recognized in exponential time on a deterministic machine, and none are known to have sub-exponential solutions. Thus, since the restricted transformational languages of Rounds characterize the exponential-time languages, if all of them were in P, then P would be equal to NP. Putting it another way, if P is not equal to NP, then some transformational languages (even those satisfying the terminal-length-non-decreasing condition) have no "tractable" (i.e. polynomial-time) recognition procedures on any deterministic TM. Note that this result also holds for all the other known sequential models of computation, and even for parallel machines with as many as a polynomial number of processors.</Paragraph>
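<Paragraph> The notion of polynomial transformation can be made concrete with a standard textbook example, not taken from this paper: Vertex Cover transforms to Independent Set, since a graph with n vertices has a vertex cover of size k exactly when it has an independent set of size n - k.

    def vertex_cover_to_independent_set(n_vertices, edges, k):
        """Polynomial transformation: (G, k) is in VERTEX-COVER
        iff the returned instance is in INDEPENDENT-SET.

        A set C of vertices covers every edge exactly when its
        complement V - C contains no edge, i.e. is independent.
        The transformation itself is trivially polynomial (here O(1):
        the graph is passed through unchanged).
        """
        return n_vertices, edges, n_vertices - k

    # Brute-force checker, exponential time -- fine for tiny instances only.
    from itertools import combinations

    def has_independent_set(n_vertices, edges, size):
        for cand in combinations(range(n_vertices), size):
            s = set(cand)
            if all(not (u in s and v in s) for (u, v) in edges):
                return True
        return False

    # Triangle graph: its minimum vertex cover has size 2, so (G, 2) is a
    # "yes" instance; the transformed instance asks for an independent set
    # of size 3 - 2 = 1, which exists.
    print(has_independent_set(
        *vertex_cover_to_independent_set(3, [(0, 1), (1, 2), (0, 2)], 2)))  # True

</Paragraph>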
</Section> <Section position="4" start_page="99" end_page="99" type="metho"> <SectionTitle> 5. Lexical Functional Grammar </SectionTitle> <Paragraph position="0"> In part, transformational grammar seeks to account for a range of constraints or dependencies within sentences. Of particular interest are subcategorization dependencies and predicate-argument dependencies. These dependencies can hold over arbitrarily large distances. Several recent theories suggest different ways of accounting for these dependencies, but without making use of transformations. We will examine three of these, Lexical Functional Grammar, Generalized Phrase Structure Grammar, and Tree Adjunct Grammars, in the next few sections.</Paragraph> <Paragraph position="1"> The Lexical Functional Grammar (LFG) of Kaplan and Bresnan [24] aims to provide a descriptively adequate syntactic formalism without transformations. All the work done by transformations is instead encoded in structures in the lexicon and in links established between nodes in the constituent structure. LFG languages are CS and properly include the CFLs [24]. Berwick [5] shows that a set of strings whose recognition problem is known to be NP-complete, namely the set of satisfiable boolean formulas, is an LFG language. Therefore, as was the case for Rounds's restricted class of TGs, if P is not equal to NP, then some languages generated by LFGs do not have polynomial-time recognition algorithms. Indeed, only quite "basic" parts of the LFG mechanism are necessary to the reduction, including the mechanisms necessary for feature agreement, for forcing verbs to take certain cases, and for lexical ambiguity. Thus no simple change to the formalism is likely to avoid the combinatorial consequences of the full mechanism. Berwick has also examined the relation between LFG and the class of languages generated by indexed grammars [1], a class known to be a proper subset of the CSLs, but including some NP-complete languages [42]. He claims (personal communication) that the indexed languages are a proper subset of the LFG languages.</Paragraph>
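<Paragraph> Berwick's reduction exploits LFG's feature machinery. The following is a minimal sketch of feature-structure unification of the general kind involved in agreement checking; the representation and feature names are invented for illustration and are not Kaplan and Bresnan's notation:

    FAIL = None

    def unify(f, g):
        """Unify two feature structures represented as nested dicts.

        Atomic values must match exactly; complex values are unified
        recursively. Returns FAIL (None) on conflict -- the mechanism
        that lets agreement features rule out ill-formed analyses.
        """
        if not isinstance(f, dict) or not isinstance(g, dict):
            return f if f == g else FAIL
        out = dict(f)
        for key, gval in g.items():
            if key in out:
                sub = unify(out[key], gval)
                if sub is FAIL:
                    return FAIL
                out[key] = sub
            else:
                out[key] = gval
        return out

    subj = {"num": "sg", "pers": "3"}                  # from the subject NP
    verb = {"num": "sg", "pers": "3", "tense": "pres"} # demanded by the verb
    print(unify(subj, verb))   # {'num': 'sg', 'pers': '3', 'tense': 'pres'}
    print(unify({"num": "sg"}, {"num": "pl"}))         # None: agreement clash

</Paragraph>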
</Section> <Section position="5" start_page="99" end_page="99" type="metho"> <SectionTitle> 6. Generalized Phrase Structure Grammar </SectionTitle> <Paragraph position="0"> In a series of papers, Gerald Gazdar and his colleagues [11] have argued for a joint account of the syntax and semantics of English, like LFG in eschewing the use of transformations but unlike it in positing only one level of syntactic description. The syntactic apparatus is based on a non-standard interpretation of phrase-structure rules and on the use of metarules. The formal consequences of both these moves have been investigated.</Paragraph> <Paragraph position="1"> 6.1. Node Admissibility.</Paragraph> <Paragraph position="2"> There are two ways of interpreting the function of CF rules. The first, and most usual, is as rules for rewriting strings. Derivation trees can then be seen as canonical representatives of classes of derivations producing the same string, and differing only in the order of application of the same productions. The second interpretation of CF rules is as constraints on derivation trees: a legal derivation tree is one where each node is "admitted" by a rule, i.e. each node dominates a sequence of nodes in a way sanctioned by a rule. For CF rules, the two interpretations obviously generate the same strings via the same set of trees.</Paragraph> <Paragraph position="3"> Following a suggestion of McCawley's, Peters and Ritchie [34] showed that if one considers context-sensitive rules from the node-admissibility point of view, the languages defined are still CF. Thus the use of CS rules in the base to impose sub-categorization restrictions, for example, does not increase the weak generative capacity of the base component. (For some different restrictions on context-sensitive rules that guarantee that only CFLs will be generated, see Baker [?].)</Paragraph> <Paragraph position="4"> Rounds [40] gives a simpler proof of Peters and Ritchie's node-admissibility result using techniques from tree-automata theory, a generalization to trees of finite-state automata theory for strings. Just as a finite-state automaton (FSA) accepts a string by reading it one character at a time, changing its state at each transition, a finite-state tree automaton (FSTA) traverses trees, propagating states. The top-down FSTA "attaches" a starting state (from a finite set) to the root of the tree. Transitions are allowed by productions of the form (q, a, n) --> (q_1, ..., q_n), such that if state q is being applied to a node labelled a and dominating n descendants, then state q_i should be applied to its i-th descendant. Acceptance occurs if all leaves of the tree end up labelled with states in the accepting subset. The bottom-up FSTA is similar: starting states are attached to the leaves of the tree and the productions are of the form (a, n, q_1, ..., q_n) --> q, indicating that if a node labelled a dominates n descendants labelled with states q_1 to q_n, then node a gets labelled with state q. Acceptance occurs when the root is labelled by a state from the subset of accepting states. As is the case with FSAs, FSTAs of both flavours can be either deterministic or non-deterministic. A set of trees is said to be recognizable if it is accepted by a non-deterministic bottom-up FSTA. Again as with FSAs, any set of trees accepted by a non-deterministic bottom-up FSTA is accepted by a deterministic bottom-up FSTA, but the result does not hold for top-down FSTAs, although the recognizable sets are exactly the languages recognized by non-deterministic top-down FSTAs.</Paragraph> <Paragraph position="5"> A set of trees is local if it is the set of derivation trees of a CF grammar. Clearly, every local set is recognizable by a one-state bottom-up FSTA that checks at each node that it satisfies a CF production. Also, the yield of a recognizable set of trees (the set of strings it generates) is CF. Although not all recognizable sets are local, they can all be mapped into local sets by a simple (homomorphic) mapping.</Paragraph> <Paragraph position="6"> Rounds's proof [41] that CS rules under node admissibility generate only CFLs involves showing that the set of trees accepted by the rules is recognizable, i.e. that there is a non-deterministic bottom-up FSTA that can check at each node that some node-admissibility condition holds there. This requires checking that the "strictly context-free" part of the rule holds, and that some proper analysis of the tree passing through the node satisfies the "context-sensitive" part of the rule. The difficulty comes from the fact that the bottom-up automaton cannot generate the set of proper analyses, but must instead propagate (in its state set) the proper-analysis conditions necessary to "admit" the nodes of its subtrees. It must, of course, also check that those conditions get satisfied. A more intuitive proof using tree transducers as well as FSTAs is sketched in the Appendix.</Paragraph> <Paragraph position="7"> Joshi and Levy [21] strengthened Peters and Ritchie's result by showing that the node-admissibility conditions could also include arbitrary Boolean combinations of dominance conditions: a node could specify a bounded set of labels that must occur immediately above it along a path to the root, or immediately below it on a path to the frontier. In general the CF grammars constructed in the proof of weak equivalence to the CS grammars under node admissibility are much larger than the original, and not useful for practical recognition. Joshi, Levy and Yueh [22], however, show how Earley's algorithm can be extended to a parser that uses the local constraints directly.</Paragraph>
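<Paragraph> To make the node-admissibility reading concrete, here is a sketch of the one-state bottom-up automaton mentioned above, checking that every node of a tree is admitted by some context-free production (the tree and rule encodings are invented for the example):

    def admits(tree, productions, start="S"):
        """Bottom-up check that a tree is a derivation tree of a CFG.

        tree: (label, [subtrees]); a leaf is (label, []).
        productions: set of pairs (lhs, tuple-of-child-labels).
        This is the one-state bottom-up automaton of the text: at each
        node it checks that the node dominates its children in a way
        sanctioned by some production.
        """
        def state(node):
            label, children = node
            if not children:                 # frontier node: nothing to check
                return label
            child_states = tuple(state(c) for c in children)
            if (label, child_states) not in productions:
                raise ValueError(f"node {label} -> {child_states} not admitted")
            return label

        try:
            return state(tree) == start
        except ValueError:
            return False

    prods = {("S", ("NP", "VP")), ("NP", ("John",)), ("VP", ("sleeps",))}
    t = ("S", [("NP", [("John", [])]), ("VP", [("sleeps", [])])])
    print(admits(t, prods))   # True

</Paragraph>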
<Paragraph position="8"> 6.2. Metarules.</Paragraph> <Paragraph position="9"> The second important mechanism used by Gazdar [11] is metarules, or rules that apply to rules to produce other rules. Using standard notation for CF rules, one example of a metarule that could replace the transformation known as "particle movement" is: V --> V NP Pt X ==> V --> V Pt NP[-PRO] X. Here X is a variable behaving like the variables in structural analyses of transformations. If such variables are restricted to being used as abbreviations, that is, if they are only allowed to range over a finite subset of strings over the vocabulary, then closing the grammar under the metarules produces only a finite set of derived rules, and thus the generative power of the formalism is not increased (a sketch of this closure process follows below). If, on the other hand, X is allowed to range over strings of unbounded length, as are the essential variables of transformational theory, then the consequences are less clear. It is well known, for example, that if the right-hand sides of phrase-structure rules are allowed to be arbitrary regular expressions, then the generated languages are still context-free. Might something like this not be happening with essential variables in metarules? It turns out not.</Paragraph> <Paragraph position="10"> The formal consequences of the presence of essential variables in metarules depend on the presence of another device, the so-called phantom categories. It may be convenient in formulating metarules to allow, in the left-hand sides of rules, occurrences of syntactic categories that are never introduced by the grammar, i.e. that never appear in the right-hand sides of rules. In standard CFGs, these are called useless categories, and rules containing them can simply be dropped, with no change in generative capacity. Not so with metarules: it is possible for metarules to rewrite rules containing phantom categories into rules without them. Such a device was proposed at one time as a way to implement passives in the GPSG framework.</Paragraph> <Paragraph position="11"> Uszkoreit and Peters [49] have shown that essential variables in metarules are powerful devices indeed: CF grammars with metarules that use at most one essential variable and allow phantom categories can generate all recursively enumerable sets. Even if phantom categories are banned, as long as the use of at least one essential variable is allowed, some non-recursive sets can be generated.</Paragraph> <Paragraph position="12"> Possible restrictions on the use of metarules are suggested in Gazdar and Pullum [12]. Shieber et al. [45] discuss some empirical consequences of these moves.</Paragraph>
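<Paragraph> A sketch of the metarule closure referred to above, under the finite-range ("abbreviatory") reading of X, using the particle-movement metarule as the example (the rule encodings are invented for illustration):

    def close_under_metarule(rules, metarule, x_range, max_rounds=10):
        """Close a CF rule set under one metarule (a sketch).

        rules: set of (lhs, rhs-tuple) pairs.
        metarule: function from (rule, X-value) to a derived rule,
                  or None when the metarule does not apply.
        x_range: the finite set of symbol strings X may stand for;
                 this finiteness is what keeps the closure finite.
        """
        closed = set(rules)
        for _ in range(max_rounds):              # iterate to a fixed point
            new = set()
            for rule in closed:
                for x in x_range:
                    derived = metarule(rule, x)
                    if derived and derived not in closed:
                        new.add(derived)
            if not new:
                break
            closed |= new
        return closed

    # Particle movement:  V -> V NP Pt X  ==>  V -> V Pt NP[-PRO] X
    def particle_movement(rule, x):
        lhs, rhs = rule
        if lhs == "V" and rhs[:3] == ("V", "NP", "Pt") and rhs[3:] == x:
            return ("V", ("V", "Pt", "NP[-PRO]") + x)
        return None

    rules = {("V", ("V", "NP", "Pt")), ("V", ("V", "NP", "Pt", "PP"))}
    print(close_under_metarule(rules, particle_movement, {(), ("PP",)}))

</Paragraph>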
<Paragraph position="13"> 7. Tree Adjunct Grammars.</Paragraph> <Paragraph position="14"> The Tree Adjunct Grammars (TAGs) of Joshi and his colleagues present a different way of accounting for syntactic dependencies ([17], [19]). A TAG consists of two (finite) sets of (finite) trees, the centre trees and the adjunct trees. The centre trees correspond to the surface structures of the "kernel" sentences of the language. The root of each adjunct tree is labelled with a non-terminal symbol which also appears exactly once on the frontier of the tree; all other frontier nodes are labelled with terminal symbols. Derivations in TAGs are defined by repeated application of the operation of adjunction. If c is a centre tree containing an occurrence of a non-terminal A, and if a is an adjunct tree whose root (and one node n on the frontier) is labelled A, then the adjunction of a to c is performed by "detaching" from c the subtree t rooted at A, attaching a in its place, and reattaching t at node n.</Paragraph> <Paragraph position="15"> Adjunction may then be seen as a tree analogue of a context-free derivation for strings [40]. The string languages obtained by taking the yields of the tree languages generated by TAGs are called Tree Adjunct Languages, or TALs.</Paragraph> <Paragraph position="16"> In TAGs all long-distance dependencies are the result of adjunctions separating nodes that at one point in the derivation were "close". Both crossing and non-crossing dependencies can be represented [18]. The formal properties of TAGs are fully discussed in [30], [52], [?]. Of particular interest are the following. TALs properly contain the CFLs and are properly contained in the indexed languages, which in turn are properly contained in the CSLs. Although the indexed languages contain NP-complete languages, TALs are much better behaved: Joshi and Yokomori report (personal communication) an O(n^?) recognition algorithm and conjecture that an O(n^?) bound may be possible.</Paragraph>
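<Paragraph> A sketch of the adjunction operation itself (the tree encoding is invented for the example): detach the subtree of the centre tree rooted at A, put the adjunct tree in its place, and re-attach the detached subtree at the A node on the adjunct's frontier.

    def adjoin(centre, adjunct, label="A"):
        """Adjoin `adjunct` into `centre` at its node labelled `label`.

        Trees are (symbol, list-of-subtrees). The adjunct's root is
        labelled `label`, which also occurs exactly once as a leaf on
        its frontier (the "foot"); the subtree detached from the centre
        tree is re-attached at the foot. A sketch only, assuming a
        single adjunction site, not any particular TAG system.
        """
        def replace_foot(node, subtree):
            sym, kids = node
            if sym == label and not kids:            # the foot node
                return subtree
            return (sym, [replace_foot(k, subtree) for k in kids])

        def walk(node):
            sym, kids = node
            if sym == label:                         # adjunction site: detach
                return replace_foot(adjunct, node)   # and re-attach below foot
            return (sym, [walk(k) for k in kids])

        return walk(centre)

    centre  = ("S", [("A", [("a", [])])])               # a kernel tree
    adjunct = ("A", [("b", []), ("A", []), ("c", [])])  # root/foot labelled A
    print(adjoin(centre, adjunct))
    # ('S', [('A', [('b', []), ('A', [('a', [])]), ('c', [])])])

</Paragraph>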
<Paragraph position="17"> 8. A Pointer to Empirical Discussions.</Paragraph> <Paragraph position="18"> The literature on the empirical issues underlying the formal results reported here is not extensive. Chomsky argues convincingly [8] that there is no argument for natural languages necessarily being recursive. This, of course, is different from the possibility that languages are contingently recursive. Putnam [39] gives three reasons he claims "point in this direction": (1) "speakers can presumably classify sentences as acceptable or unacceptable, deviant or non-deviant, et cetera, without reliance on extra-linguistic contexts. There are of course exceptions to this rule", (2) grammaticality judgements can be made for nonsense sentences, and (3) grammars can be learned. (2) and (3) are irrelevant, and (1) contains its own counter-argument.</Paragraph> <Paragraph position="19"> Peters and Ritchie [?] contains a suggestive but hardly open-and-shut case for contingent recursivity: (1) every TG has an exponentially bounded cycling function, and thus generates only recursive languages, (2) every natural language has a descriptively adequate TG, and (3) the complexity of the languages investigated so far is typical of the class.</Paragraph> <Paragraph position="20"> Hintikka [16] presents a very different argument against the recursivity of English, based on the distribution of the words any and every. His account of why "John knows everything" is grammatical while "John knows anything" is not is that any can appear only in contexts where replacing it by every changes the meaning. Taking meaning to be logical equivalence, this means that grammaticality is dependent on the determination of logical equivalence of logical formulas, an undecidable problem. Chomsky [8] argues that a simpler solution is available, namely one that replaces logical equivalence by syntactic identity of some kind of logical form.</Paragraph> <Paragraph position="21"> Pullum and Gazdar [38] is a thorough survey of, and argument against, published claims (mainly the "respectively" examples [26], Dutch cross-serial dependencies, and nominalization in Mohawk [37]) that some natural languages cannot be weakly generated by CF grammars. No claims are made about the strong adequacy of CFGs.</Paragraph> </Section> <Section position="6" start_page="99" end_page="102" type="metho"> <SectionTitle> 9. Seeking Significance </SectionTitle> <Paragraph position="0"> When can the supporter of a weak syntactic formalism (i.e. one of low recognition complexity and low generative capacity) claim that it is superior to a competing, more powerful formalism? Linguistic theories can differ along several dimensions, with generative capacity and recognition complexity being only two (albeit related) ones. The evaluation must take into consideration at least the following others.</Paragraph> <Paragraph position="1"> Coverage. Do the theories make the same grammatical predictions?</Paragraph> <Paragraph position="2"> Extensibility. The linguistic theory of which the syntactic theory is a part will want to express well-formedness constraints other than syntactic ones. These constraints may be expressed over syntactic representations, or over different representations, presumably related to the syntactic ones. One theory may make this connection possible when another does not. This of course underlies the arguments for strong descriptive adequacy. Also relevant here is how the linguistic theory as a whole is decomposed. The syntactic theory can obviously be made simpler by transferring some of the explanatory burden to another constituent. The classic example in programming languages is the constraint that all variables must be declared before they are used. This constraint cannot be imposed by a CFG but can be by an indexed grammar, at the cost of a dramatic increase in recognition complexity. Typically, however, the requirement is simply not considered part of "syntax", which thus remains CF, and is imposed separately; in this case, the overall recognition complexity remains some low-order polynomial. Some arguments of this kind can be found in [?]. Separating the constraints into different subtheories will not in general make the problem of recognizing strings that satisfy all the constraints any more efficient, but it may allow limiting the power of each constituent. To take an extreme example, every r.e. set is the homomorphic image of the intersection of two context-free languages.</Paragraph> <Paragraph position="3"> Implementation. This is probably the most subtle set of issues determining the significance of the formal results, and I don't claim to understand them. Comparison between theories requires agreement between the machine models used to derive the complexity results. As mentioned above, the sequential models are all polynomially related, and no problem not having a polynomial-time solution on a sequential machine is likely to have one on a parallel machine limited to at most a polynomial number of processors, at least if P is not equal to NP.
Both these results restrict the improvement one can obtain by changing implementation, but are of little use in comparing algorithms of low complexity. Berwick and Weinberg [6] give examples of how algorithms of low complexity may have different implementations differing by large constant factors. In particular, changes in the form of the grammar and in its representation may have this effect.</Paragraph> <Paragraph position="4"> But of more interest, I believe, is the fact that implementation is often accompanied by some form of resource limitation, and that this has two effects. First, it is also a change in specification: a context-free parser implemented with a bounded stack recognizes only a finite-state language. Second, very special implementations can be used if one is willing to restrict the size of the problem to be solved, or even to use special-purpose methods for limited problems. Marcus's parser [28], with its bounded look-ahead, is another good example. Sentences parsable within the allowed look-ahead have "quick" parses, but some grammatical sentences, such as "garden path" sentences, cannot be recognized without an extension to the mechanism that would distort the complexity measures.</Paragraph> <Paragraph position="5"> There is obviously much more of this story to be told. Allow me to speculate as to how it might go. We may end up with a space of linguistic theories, differing in the idealization of the data they assume, in the way they decompose constraints, and in the procedural specifications they postulate. (I take it that two theories may differ in that the second simply provides more detail than the first as to how the constraints specified by the first are to be used.) Our observations, in particular our measurements of necessary resources, are drawn from the "ultimate implementation", but this does not mean that the "ultimately low-level theory" is necessarily the most informative (witness many examples in the physical sciences), or that less procedural theories are not useful stepping stones to more procedural ones.</Paragraph> <Paragraph position="6"> It is also not clear that theories of different computational power may not be useful as descriptions of different parts of the syntactic apparatus. For example, it may be easier to learn statements of constraints within the framework of a general machine. The constraints, once learned, might then be subjected to transformation to produce more efficient special-purpose processors, also imposing resource limitations. Indeed, the "possible languages" of the future may be more complex than the present ones, just as earlier ones may have been syntactically simpler. (Were ancient languages regular?)</Paragraph> <Paragraph position="7"> Whatever we decide to make of existing formal results, it is clear that continuing contact with the complexity community is important. The driving problems there are the P = NP question, the determination of lower bounds, the study of time-space tradeoffs, and the complexity of parallel computations.
We still have some methodological house-cleaning to do, but I don't see how we can avoid being affected by the outcome of their investigations.</Paragraph> <Paragraph position="8"> ACKNOWLEDGEMENTS. Thanks to Bob Berwick, Aravind Joshi, Jim Hoover, and Stan Peters for their suggestions.</Paragraph> <Paragraph position="9"> APPENDIX. Rounds [41] proves that context-sensitive rules under node admissibility generate only context-free languages by constructing a non-deterministic bottom-up tree automaton to recognize the accepted trees. We sketch here a proof that makes use of several deterministic transducers instead.</Paragraph> <Paragraph position="10"> FSTAs can be generalized so that instead of simply accepting or rejecting trees, they transform them, by adding constant trees, and deleting or duplicating subtrees. Such devices are called finite-state tree transducers (FSTTs), and like FSTAs they can be top-down or bottom-up. First motivated as models of syntax-directed translation for compilers, they have been extensively studied (e.g. [47], [48], [40]), but a simple subset is sufficient here.</Paragraph> <Paragraph position="11"> The idea is this. Let T be the set of trees accepted by the CS-based grammar, and let t be in T. FSTTs can be used to label each node n of t with the set of all proper analyses passing through n. It will then be simple to check that each node satisfies one of the node-admissibility conditions by sweeping through the labelled tree with a bottom-up FSTA.</Paragraph> <Paragraph position="12"> The node labelling is done by two FSTTs, τ1 and τ2. Let m be the maximum length of any left or right context of any node-admissibility condition. Thus we need only label nodes with sets of strings of length at most m, and over a finite alphabet there are only a finite number of such strings.</Paragraph> <Paragraph position="13"> τ1 operates bottom-up on a tree t, and labels each node n of t with three sets Prefix(n), Suffix(n), and Yield(n) of proper analyses: if P is the set of all proper analyses of the subtree rooted at n, then Prefix(n) is the set of all substrings of length at most m that are prefixes of strings of P. Similarly, Suffix(n) is the set of all suffixes of length at most m, and Yield(n) is the set of all strings of P of length at most m. It can easily be shown that for any set of trees T, T is recognizable if and only if τ1(T) is.</Paragraph> <Paragraph position="14"> Applied to the output of τ1, the second transducer τ2, operating top-down, labels each node n with all the proper analyses going through n, i.e. with a pair of sets of strings. The first set will contain all left contexts of node n and the second all right contexts. τ2 also preserves recognizability. A bottom-up FSTA can now be defined to check at each node that both the context-free part of a rule and its context conditions are satisfied.</Paragraph> <Paragraph position="15"> This argument also extends easily to cover the dominance predicates of Joshi and Levy: transducers can be added to label each node with all its "top contexts" and all its "bottom contexts". The final FSTA must then check that the nodes satisfy whatever Boolean combination of dominance and proper-analysis predicates is required by the node-admissibility rules.</Paragraph> </Section> </Paper>