<?xml version="1.0" standalone="yes"?>
<Paper uid="C65-1016">
  <Title>SETS OF GRAMMARS BETWEEN CONTEXT-FREE AND CONTEXT-SENSITIVE</Title>
  <Section position="1" start_page="16" end_page="16" type="metho">
    <SectionTitle>
1965 International Conference on Computational Linguistics
SETS OF GRAMMARS BETWEEN CONTEXT-FREE
AND CONTEXT-SENSITIVE
</SectionTitle>
    <Paragraph position="0"> U.S.A.</Paragraph>
  </Section>
  <Section position="2" start_page="16" end_page="16" type="metho">
    <SectionTitle>
ABSTRACT
</SectionTitle>
    <Paragraph position="0"> We discuss some sets of grammars whose generative power lies between that of the set of context-free grammars and that of the set of context-sensitive grammars. These sets are developed by subjecting generators of context-sensitive grammars to abstract versions of a &quot;hardware&quot; restriction to which the users of natural languages, unlike the describers of natural languages, might be subject.</Paragraph>
    <Paragraph position="1"> Kugel 1 The notion of a formal grammar was first introduced to provide formal models of techniques used by the describers of natural languages (linguists) (1). Later, formal grammars were used as models of the capabilities of users of natural languages (see (2) for a review). Language users differ from language describers in being subject to restrictions on the amount of &quot;hardware&quot; that they have available to them and the amount of time that they have to perform their operations. Where the linguist has available (at least theoretically) an unlimited amount of material with which (pencil) and on which (paper) to store his intermediate results, it is probable that the internal organization of the natural language user may not permit him the use of such unlimited resources.</Paragraph>
    <Paragraph position="2"> Therefore, when one uses a formal grammar as a model of the language user, one may consider the effects of subjecting such grammars to abstract versions of certain types of hardware limitations. One model in this vein is that of Yngve (3), which considers the natural language user to be like a device capable of dealing with context-free languages and then subjects it to further limitations. However, there are reasons for thinking that natural language users may have available to them powers beyond those of the context-free grammars. According to current views, these additional powers are those that are required to construct transformational grammars. Among these one might include the ability to permute the order of elements in a string and the ability to erase elements (4).</Paragraph>
    <Paragraph position="3"> The ability to effect the permutation of elements is a property of context-sensitive grammars. However, context-sensitive grammars have additional drawbacks as models for the capabilities of the users of natural languages (1). Permitting erasure as an element in the generation of a phrase marker has the difficulty that it is not always clear whether the resulting rewriting systems generate only recursive sets of strings: any semi-Thue system (for a definition see (5), p. 84) can be looked at as a context-free grammar which permits the shortening of strings (erasure), but semi-Thue systems are capable of generating non-recursive sets of strings ((5), Theorem 2.6, p. 93). These considerations suggest that one</Paragraph>
    <Paragraph position="4"> might want some context-sensitivity and some erasure, but not enough to produce the undesirable features of context-sensitive grammars or of semi-Thue systems.</Paragraph>
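The point about semi-Thue systems can be made concrete with a minimal sketch (not from the paper; all names are illustrative): a semi-Thue system is a finite set of rules applicable anywhere in a string, with nothing forbidding a rule whose right side is shorter than its left.

```python
# A toy semi-Thue system: rules may be applied anywhere in a string, and
# nothing forbids a rule that shortens it. The rule "ab" -> "" erases.

def apply_rule(s, lhs, rhs):
    """All strings obtainable from s by one application of lhs -> rhs."""
    results = []
    i = s.find(lhs)
    while i != -1:
        results.append(s[:i] + rhs + s[i + len(lhs):])
        i = s.find(lhs, i + 1)
    return results

# One derivation: S => aSb => aabb => ab => "" (the last two steps erase)
assert apply_rule("S", "S", "aSb") == ["aSb"]
assert apply_rule("aSb", "S", "ab") == ["aabb"]
assert apply_rule("aabb", "ab", "") == ["ab"]
assert apply_rule("ab", "ab", "") == [""]
```

Grammar rules in the sense of the appendix never shorten a string in this way, which is why they generate only recursive sets while semi-Thue systems need not.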
    <Paragraph position="5"> One way of getting at such grammars might be to consider a device for generating context-sensitive languages and to subject it to abstract versions of the types of hardware limitations to which the users of natural languages might be subject.</Paragraph>
    <Paragraph position="6"> Assume that users of natural languages are information-processing systems organized in the manner of a present-day digital computer. They have a storage unit (memory), a processing unit, and some input/output equipment. One way of suggesting the roles of these parts is to say that they correspond roughly to those parts of the handling of a natural language that are described by the semantic, syntactic, and phonetic components of a language description, respectively. Since our concern in this paper is largely with the syntactic component, we will consider the effects of limitations on the processing unit. Suppose that the processing unit has the machinery for applying the rewriting rules of a context-sensitive grammar, but that this application has to be done by changing the state of something like a register in the arithmetic unit of a present-day computer. (We are also assuming that there is no way of doing anything like multiple-precision arithmetic.) Such a register can be looked at as a sequence of pigeon-holes into which symbols can be placed. A rule is then applied to change the contents of the pigeon-holes, and the results are returned to the memory or output. To say that the registers have a given size is to say that there is only a fixed number of such pigeon-holes. Such an assumption finds a formal analogue in a restriction, in the notion of a formal grammar, on the length of the strings that can appear on either side of the arrow in a rewriting rule. To say that a register has only n pigeon-holes is to say that the strings on either side of the arrow can contain at most n symbols.** However, such a restriction does not accomplish much that</Paragraph>
    <Paragraph position="7"> ** Or, equivalently, that the string on the right hand side of the arrow can contain at most n symbols.</Paragraph>
    <Paragraph position="8"> is of interest, for it is easy to prove that: Theorem 1: The set of all grammars that contain strings of no more than two symbols on either side of their rewriting rules has the generative power of the set of context-sensitive grammars. The set of all grammars that contain no more than three symbols in any rewriting rule has the generative power of the set of context-free grammars.</Paragraph>
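The register restriction just described can be sketched as a simple check (illustrative code, not from the paper): a grammar fits a register of n pigeon-holes just in case no string on either side of an arrow exceeds n symbols.

```python
def fits_register(rules, n):
    """True iff every rule's strings fit into n pigeon-holes.
    Rules are (left, right) pairs of strings of symbols."""
    return all(len(lhs) <= n and len(rhs) <= n for lhs, rhs in rules)

grammar = [("S", "AB"), ("AB", "BA"), ("A", "a"), ("B", "b")]
assert fits_register(grammar, 2)       # all strings have length <= 2
assert not fits_register(grammar, 1)   # "AB" needs two pigeon-holes
```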
    <Paragraph position="9"> It is clear from an examination of the proof of the first part of this theorem that the restriction on the length of the strings used in stating rules of the grammar is overcome by introducing new letters. Such an introduction of additional letters is common in proofs of theorems about formal grammars, and it is reasonable so long as one is considering these grammars as models of the procedures used by language describers, who have available to them a medium (pencil marks on paper) which is unlimited not only in amount but which permits an unlimited variety of symbols within a given space (at least in theory). The fact that language users might have to represent their grammatical categories in a discrete rather than continuous medium suggests that one might limit the number of available (distinct) symbols that can appear in a rule of grammar. However, this restriction also is of no great interest, since we can prove the following: Theorem 2: There is a sense in which the generative power of the set of grammars whose rules can be expressed using only two distinct symbols is equivalent to that of the set of all context-sensitive grammars.</Paragraph>
    <Paragraph position="10"> Suppose, therefore, that one attempts to limit both of these simultaneously. Thus, let us define a &quot;grammar of size (m, p)&quot; as a grammar whose rules are constructed of strings (on either side of the rewriting rule's arrows) such that no string contains more than m occurrences of letters and such that the non-terminal vocabulary of the grammar contains no more than p distinct letters. (Definitions and proofs of theorems can be found in the appendix.)</Paragraph>
    <Paragraph position="11"> (The sense in which the equivalence of Theorem 2 holds is explicated in the appendix.)</Paragraph>
    <Paragraph position="12"> Let us first consider such grammars as augmented simply by dictionaries.</Paragraph>
    <Paragraph position="13"> These grammars turn out to be curious hybrids. For one thing, given a size, there is only a finite number of grammars of that size (if one equates straightforward reletterings of the same grammar). Furthermore: Theorem 3: The set of grammars of size (m, p) with dictionaries, for sufficiently large m and p, cannot generate all context-free languages and can generate some languages which are not context-free.</Paragraph>
    <Paragraph position="14"> Nevertheless, it is obvious that the union of the grammars of size (m, p) for all values of m and p has the generative power of the set of all context-sensitive grammars (since any context-sensitive grammar has some finite size).</Paragraph>
    <Paragraph position="15"> These grammars are not particularly interesting because we have put limits on the amount of recursion that can appear in them. This can be overcome by permitting some recursion in either a pre- or a post-processor, limiting recursion to context-free rules only. Thus, we are led to consider systems consisting of three parts in tandem. The first part is a context-free grammar, the second part is a grammar of size (m, p), and the third is a dictionary. Although such systems appear to be rather ad hoc, one can give some arguments for considering them. The arguments for the two grammars in tandem are roughly those for a context-free grammar followed by a transformational component. If we allow erasure in the final processing we can permit the intermediate string generated by the context-free grammar to be the phrase marker in something approximating Polish notation. Thus, the phrase marker (shown in the original as a tree diagram) could be represented by the string SACxDyBz. The context-sensitive grammar of restricted size could operate on these markers in the manner of a transformational component. The dictionary would contain rules of the form X -> (empty), A -> (empty), etc., to erase the non-terminal symbols. This argument suggests that if one wants such a system as a model for a natural language user, one might consider different primitive operations in the part of the system that is to represent the transformational component. Thus, using the suggestion of (4), one might permit not only what we have been calling grammar rules but also rules which permute the order of strings directly, such as rules of the form XYZ -> ZYX. By making these primitive one makes them cost less of the &quot;size&quot; of the underlying grammar. The argument for allowing something like a dictionary is that something of this sort appears to be required for the phonetic component of a language description anyway.</Paragraph>
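The dictionary's erasing step can be sketched as follows (illustrative code; the split of SACxDyBz into non-terminals S, A, B, C, D and terminals x, y, z is read off the example in the text):

```python
NONTERMINALS = set("SABCD")  # non-terminals of the example phrase marker

def apply_erasing_dictionary(marker):
    """Apply rules X -> (empty) for every non-terminal X, keeping the
    terminal letters of the marker in order."""
    return "".join(ch for ch in marker if ch not in NONTERMINALS)

assert apply_erasing_dictionary("SACxDyBz") == "xyz"
```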
    <Paragraph position="16"> Let us call such systems &quot;grammar systems of size (m, p).&quot; Those systems which have primitive permutation rules we might call &quot;permutation systems.&quot; We can prove: Theorem 4: Grammar systems define infinite hierarchies of languages L0, ..., Li, ... such that (a) L0 is the set of context-free languages; (b) Li is properly contained in Lj for j sufficiently greater than i; and (c) the union of the Li for all i is the set of context-sensitive languages.</Paragraph>
    <Paragraph position="17"> We have suggested that if a natural language user is organized like a present-day digital computer, he might find that the size of the registers in what corresponds to his &quot;processing unit&quot; has an effect on the kinds of languages with which he can deal. We have given a rather preliminary sketch of how this might occur. Such effects appear, however, to be critically dependent on the &quot;machine code&quot; of such a system, and in view of the current lack of knowledge as to what this code might be, it is not clear whether the kinds of notions that we have discussed have any applications in computational linguistics, even if the underlying notion of some sort of a &quot;register&quot; limitation applies to the competence of natural language users.</Paragraph>
  </Section>
  <Section position="3" start_page="16" end_page="16" type="metho">
    <SectionTitle>
APPENDIX
</SectionTitle>
    <Paragraph position="0"> This appendix contains definitions of some of the terms used in the body of the paper and proofs of the theorems. We begin by defining some basic notions. A rewriting rule is a rule of the form PhQ -> PHQ, where P, Q, h, and H are (possibly empty*) strings. If h is a single letter and H is a non-empty string of letters, a rewriting rule is called a grammar rule. A grammar (or a context-sensitive grammar) G is a single letter S together with a finite set of grammar rules. The alphabet of G is the set of all letters in rewriting rules of G. The non-terminal vocabulary of G is the set of all letters appearing on the left-hand side of some grammar rule in G. The terminal vocabulary of G is the alphabet of G minus the non-terminal vocabulary of G. We will assume that S is always in the non-terminal vocabulary of G.</Paragraph>
    <Paragraph position="1"> A set of rewriting rules which contains no non-terminal letters on the right-hand side of a rewriting rule is called a dictionary. If P and Q in all the grammar rules of G are empty, then G is a context-free grammar. A derivation of a string Sn in a grammar G is a sequence of strings S1, ..., Sn such that S1 is S and such that Si+1 is the result of replacing some sequence of letters L in Si by a sequence of letters L' such that L -> L' is one of the grammar rules of G. The language generated by a grammar G is the set of all strings M such that there exists a derivation of M in G and such that M consists of only letters in the terminal vocabulary of G. Two sets of grammars that generate the same sets of languages are said to have the same generative power.</Paragraph>
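The definition of a derivation can be sketched directly (illustrative code, not from the paper):

```python
def one_step(s, t, rules):
    """True iff t results from s by replacing one occurrence of some
    rule's left side L with its right side L'."""
    for lhs, rhs in rules:
        i = s.find(lhs)
        while i != -1:
            if s[:i] + rhs + s[i + len(lhs):] == t:
                return True
            i = s.find(lhs, i + 1)
    return False

# A context-free grammar for { a^n b^n : n >= 1 }
rules = [("S", "aSb"), ("S", "ab")]
derivation = ["S", "aSb", "aaSbb", "aaabbb"]
assert all(one_step(u, v, rules) for u, v in zip(derivation, derivation[1:]))
```

The final string aaabbb contains only terminal letters, so it belongs to the language generated by the grammar.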
    <Paragraph position="2"> Theorem 1: (a) The set of grammars none of whose rules contain more than four letters has the same generative power as the set of context-sensitive grammars. (b) The set of grammars none of whose rules contain more than three letters has the generative power of the set of all context-free grammars.</Paragraph>
    <Paragraph position="3"> * PhQ, however, is not empty (i.e., not all of P, h, and Q can be empty).</Paragraph>
    <Paragraph position="5"> Proof: (a) It is only necessary to provide an effective procedure for replacing each of the rules of an arbitrary context-sensitive grammar G with a set of rules containing no more than two letters on either side of the arrow, and such that the generative power of the resulting grammar G' remains the same. Consider an arbitrary rule of the form L -> R, where L = a1 ... ai ... aj and R = a1 ... a(i-1) b1 ... bk a(i+1) ... aj. Replace this by the following new rules, in which the letters cm and dm are new letters not in the alphabet of G:</Paragraph>
    <Section position="1" start_page="16" end_page="16" type="sub_section">
      <SectionTitle>
Rules
</SectionTitle>
      <Paragraph position="0"/>
      <Paragraph position="2"> In schematizing the effects of a sequence of rules we have assumed an order in their application. However, where the order of application is arbitrary, parts of the strings might be different if the order of application were different.</Paragraph>
      <Paragraph position="3"> These parts are indicated by surrounding them with parentheses.</Paragraph>
    </Section>
    <Section position="2" start_page="16" end_page="16" type="sub_section">
      <SectionTitle>
Rules Effect of Added Rules
</SectionTitle>
      <Paragraph position="0"> di -> b1 d'(i+1), d'(i+n) -> b(n+1) d'(i+n+1), ..., d'(i+k-1) -> bk, with the effect (...)di(...) -> (...)b1 ... bk(...); c2 -> a2, ..., c(i-1) -> a(i-1), with the effect a1 c2 ... c(i-1)(...) -> a1 ... a(i-1)(...); and d(i+1) -> a(i+1), ..., dj -> aj, with the effect (...)d(i+1) ... dj -> (...)a(i+1) ... aj. The equivalence in the other direction (i.e., the fact that all four-letter grammars are at most context-sensitive) is obvious. (b) Because of the definition of a &quot;grammar rule&quot;, rules containing three letters can only be of the form a -> bc (and not ab -> c), so clearly all three-letter rules are context-free. To produce a three-letter equivalent of a longer context-free rule, say a -> a1 ... an, one replaces it by the rules a -> a1 a'1, a'1 -> a2 a'2, ..., a'(n-1) -> an, where the a'i are new letters. Theorem 2: The set of grammars containing only two letters, together with a dictionary, has the generative power of the set of all context-sensitive grammars. Proof: Let the two letters be 0 and 1. Again, it is only necessary to provide an effective procedure for replacing any rule in a given context-sensitive grammar with a new set of rules containing only two letters, plus some dictionary rules. Suppose that G contains m rules and that the alphabet of G contains n letters. Let each rule be of the form Li -> Ri (for the i-th rule). We construct G' as follows. To replace each rule Li -> Ri we add new rules as follows: rewrite each letter aj in Li as the string 011...1 000 11...0...11 0 (= a'j), where the first block of 1s is m long (one position per rule of G) and the second block is n long with a 0 in the j-th position (identifying the letter). The first replacing rule takes the revised Li into a string whose m-tuple carries a 0 in the i-th position; the effect of this is to tag the string as being subjected to rule number i. The second replacing rule takes the 0 in the n-tuple of ones of the letter being replaced, and it turns it into a 1. 
If the only effect of the rule is simply to replace this letter by another letter, the rest of the new rules place the 0 in the n-tuple appropriately and then erase the tag in the left-most m-tuple to signal the end of the application of the rule. If the replaced letter is expanded, then the replacements are added one letter at a time and the process is finished off by &quot;untagging&quot; the left-most m-tuple in the replacement for Li.</Paragraph>
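One plausible reading of the encoding in this proof, whose diagram is garbled in the source, can be rendered as code. Everything here is a reconstruction: the exact delimiters and the placement of the rule tag are assumptions, but the essential points, an m-tuple tagging the rule in progress and an n-tuple whose single 0 names the letter, follow the text.

```python
def encode_letter(j, n, m, tagged_rule=None):
    """Spell letter a_j of an n-letter alphabet over {0, 1}: an m-tuple
    of 1s (with a 0 in position i while rule i is being applied) and an
    n-tuple of 1s with a 0 in position j, separated by 0s."""
    tag = ["1"] * m
    if tagged_rule is not None:
        tag[tagged_rule - 1] = "0"   # mark which rule is in progress
    body = ["1"] * n
    body[j - 1] = "0"                # the 0 in position j names a_j
    return "0" + "".join(tag) + "000" + "".join(body) + "0"

# Letter a_2 of a 4-letter alphabet, grammar with 3 rules, no rule tagged:
assert encode_letter(2, 4, 3) == "011100010110"
# The same letter while rule 1 is in progress:
assert encode_letter(2, 4, 3, tagged_rule=1) == "001100010110"
```

Only codes whose m-tuple is all ones can be terminal, which is what keeps tagged intermediate strings out of the generated language.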
      <Paragraph position="1"> The dictionary has the job of translating back into the vocabulary of G. It lacks any procedures for dealing with letters whose m-tuple is not all ones, so that no intermediate product of a rule can be terminal. The dictionary is simply the set of rules a'j -> aj, for each aj in the terminal vocabulary of G. It is clear that G' generates exactly the same language as G.</Paragraph>
      <Paragraph position="2"> This proof suggests a problem that might be of some interest. In devising a procedure for reducing the number of letters in what is, in some sense, a program, one is required to add new rules. These rules introduce intermediate products (strings), and the basic problem in the proof was that of devising a way in which these intermediate products can be prevented from being caught up by rules other than those that are intended to apply to them. We have used an extremely straightforward technique for doing this, but the technique is costly in the size of the required strings.</Paragraph>
      <Paragraph position="3"> Kugel 10 One might ask what more efficient general procedures there are for such reduction. A reason for asking this question (other than a theoretic interest} is that the world as seen by a biological organism can be looked at as consisting of an arbitrary alphabet, the units (or letters} of v&amp;ich are the basic percepts of that organism. However, the organismTs brain might have a fixed alphabet into which the processing of this (probably larger} alphabet has to be encoded. Such encoding would probably have to be done by an algorithm that avoided this crossing of intermediate products.</Paragraph>
      <Paragraph position="4"> We define a grammar of size (m, p) as a set of grammar rules which has a non-terminal vocabulary of no more than m letters and such that no rule contains a string of more than p occurrences of letters on the right-hand side of the arrow.</Paragraph>
      <Paragraph position="5"> Theorem 3: The set of grammars of size (m, p), plus an arbitrary number of dictionary rules, for sufficiently large m and p, cannot generate all context-free languages and can generate some languages that are not context-free. Proof: Consider the language that consists of the strings a_i b_i ... b_i a_i (b_i repeated an arbitrary number of times) for some range r of i (1 &lt;= i &lt;= r). If r &gt; m + m^2 + ... + m^p, then this language cannot be generated by a grammar of size (m, p), since all the recursion must be in the context-sensitive part. But there are only m + m^2 + ... + m^p distinct left-hand sides of such rules, so that the grammar must generate some string of the form a_i b_i ... b_i a_j for i ≠ j. Since any context-sensitive grammar is a grammar of size (m, p), and since Chomsky has proved that not all context-sensitive languages are context-free (6), it is obvious that there are languages generated by grammars of size (m, p), for sufficiently large (m, p), that are not context-free. We define a grammar system of size (m, p) as three rewriting systems, the first of which is a context-free grammar, the second of which is a grammar of size (m, p), and the third of which is a dictionary. The language generated by such a system is defined in the obvious way.</Paragraph>
      <Paragraph position="6"> Theorem 4: The sets of languages generated by grammar systems of size (m, p), where m x p = y, define a hierarchy of languages L_y such that (a) L_0 is the set of context-free languages, (b) L_i is properly contained in L_j for j sufficiently greater than i, and (c) the union of the L_y is the set of context-sensitive languages. Proof: The set of languages whose strings are of the form PhP, where h is a fixed string and P ranges over arbitrary strings on a given alphabet A, is context-sensitive and not context-free (6). Therefore, in a grammar system which generates such a language, the part that generates such strings must be in the context-sensitive part. Although the dictionary can introduce arbitrary new letters, it cannot insure that, if the substitution for some given letter a_i is to be a_j at one time and a_k at another, the substitutions in a given string will be uniform (i.e., always a_j and never a_k) for the entire length of an arbitrarily long string. Therefore, the rules of the context-sensitive part of the grammar system generating PhP must have different letters (or distinct strings representing different letters) in the left-hand sides of its rules. But in the grammar generating the copy of a given string P there must be at least one rule to produce the effect of copying each letter of A. If we let the alphabet A be larger than m x p, then this cannot occur in a grammar of size (m, p).</Paragraph>
      <Paragraph position="7"> Therefore, for every grammar of size (m, p) there is a context-sensitive language that cannot be generated by a grammar system limited to a grammar of that size. But clearly this language can be generated by a system having a grammar of some finite size. This proves part (b). Part (a) of the theorem is proved by observing that the set of context-free languages is generated by grammar systems with a grammar of size (0, 0). This is so because the context-sensitive part is empty, and the amount of erasure that can be produced by any dictionary is always finite, so its effect can be incorporated into a context-free grammar. Part (c) of the theorem is obvious.</Paragraph>
    </Section>
  </Section>
</Paper>