The Elimination of Grammatical Restrictions in a String Grammar of English

M. Salkoff and N. Sager

Institute for Computer Research in the Humanities, New York University, New York
 
1. Summary of String Theory

In writing a grammar of a natural language, one is faced with the problem of expressing grammatical dependencies. For example, in the sentence form N1 V N2 (N, noun; V, verb), the subject N1 and the verb V must agree in number: The boy eats the meat; ∄ The boys eats the meat (∄ marks a sequence which does not occur in the language). Or, in the sequence Q N1 P N2 (Q, a number; P, preposition), e.g., five feet in length, N1 and N2 are of particular subclasses. One of the theories of linguistic structure which is particularly relevant to this problem is linguistic string analysis [1]. In this theory, the major syntactic structures of English are stated as a set of elementary strings (a string is a sequence of word categories, e.g., N V N, N V P N, etc.).
 
Each sentence of the language consists of one elementary sentence (its center string) plus zero or more elementary adjunct strings which are adjoined either to the right or left or in place of particular elements of other elementary strings in the sentence.
 
The elementary strings can be grouped into classes according to how and where they can be inserted into other strings. If Y = X1 X2 ... Xn is an elementary string, X ranging over the category symbols, the following classes of strings are defined:

lX: left adjuncts of X, adjoined to a string Y to the left of X in Y, or to the left of an lX adjoined to Y in this manner.

rX: right adjuncts of X, adjoined to a string Y to the right of X in Y, or to the right of an rX adjoined to Y in this manner.

Replacement strings of X: adjoined to a string Y, replacing X in Y.

sY: sentence adjuncts of the string Y, adjoined to the left of X1 or after Xi in Y (1 ≤ i ≤ n), or to the right of an sY adjoined to Y in this manner.

cY,i: conjunctional strings of Y, conjoined after Xi in Y (1 ≤ i ≤ n), or to the right of a cY,i adjoined to Y in this manner.

z: center strings, not adjoined to any string.
 
These string-class definitions, with various restrictions on the repetition and order of members of the classes, constitute rules of combination on the elementary strings to form sentences. Roughly speaking, a center string is the skeleton of a sentence and the adjuncts are modifiers. An example of a left adjunct of N is the adjective green in the green blackboard. A right adjunct of N is the clause whom we met in the man whom we met. A replacement string of N is, for example, what he said in the sentence What he said was interesting. The same sentence with a noun instead of a noun replacement string might be The lecture was interesting. Examples of sentence adjuncts are in general, at this time, since he left. The c strings have coordinating conjunctions at their head. An example is but left in He was here but left. Examples of center strings are He understood and also We wondered whether he understood.
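The three ways of combining strings described above can be pictured with a small sketch. It is only an illustration under simplifying assumptions: strings are plain lists of words rather than sequences of word categories, the symbol "N" marks an unfilled noun position, and the helper names adjoin_left, adjoin_right and replace_element are hypothetical, not taken from the string analysis programs.

```python
from typing import List


def adjoin_left(host: List[str], element: str, adjunct: List[str]) -> List[str]:
    """Insert an adjunct string immediately to the left of `element` in `host`."""
    i = host.index(element)
    return host[:i] + adjunct + host[i:]


def adjoin_right(host: List[str], element: str, adjunct: List[str]) -> List[str]:
    """Insert an adjunct string immediately to the right of `element` in `host`."""
    i = host.index(element)
    return host[:i + 1] + adjunct + host[i + 1:]


def replace_element(host: List[str], element: str, replacement: List[str]) -> List[str]:
    """Put a replacement string in place of `element` in `host`."""
    i = host.index(element)
    return host[:i] + replacement + host[i + 1:]


# Left adjunct of N: the adjective "green".
noun_phrase = adjoin_left(["the", "blackboard"], "blackboard", ["green"])

# Right adjunct of N: the clause "whom we met".
relative = adjoin_right(["the", "man"], "man", ["whom", "we", "met"])

# Replacement string of N: "what he said" stands in the noun position.
replaced = replace_element(["N", "was", "interesting"], "N", ["what", "he", "said"])
```

Each helper returns a new list and leaves the host untouched, so the same center string can be reused with different adjuncts.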
 
The grammatical dependencies are expressed by restrictions on the strings as to the word subcategories which can occur together in a string or in strings related by the rules of combination. Thus, in the center string N1 V N2, the grammatical dependency mentioned above is formulated by the restriction: if N1 is plural, then V does not carry the singular morpheme -s. The string grammar with restrictions gives a compact representation of the linguistic data of a language, and provides a framework within which it is relatively simple to incorporate more linguistic refinement, i.e., more detailed restrictions.
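The agreement restriction just stated can be sketched as a small predicate. This is an illustrative sketch only: the toy lexicon, the feature labels and the function name agreement_restriction are assumptions made for the example, not part of the grammar described here.

```python
# Hypothetical toy lexicon: word -> (category, subcategory).
LEXICON = {
    "boy": ("N", "singular"),
    "boys": ("N", "plural"),
    "eats": ("V", "singular"),  # carries the singular morpheme -s
    "eat": ("V", "plural"),
}


def agreement_restriction(n1: str, v: str) -> bool:
    """Return True if (N1, V) passes the restriction:
    if N1 is plural, V must not carry the singular morpheme -s."""
    n_cat, n_sub = LEXICON[n1]
    v_cat, v_sub = LEXICON[v]
    if n_cat != "N" or v_cat != "V":
        raise ValueError("expected a noun followed by a verb")
    return not (n_sub == "plural" and v_sub == "singular")
```

With this lexicon the test accepts the pairs boy/eats and boys/eat and rejects boys/eats, mirroring the example sentences above. It checks only the plural case, exactly as the restriction is worded.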
 
One may ask whether it is possible to write such a string grammar without any restrictions at all, i.e., to express the grammatical dependencies (restrictions) in the syntactic structures themselves. In the resulting restrictionless grammar, any elements which are related by a grammatical dependency will be elements of the same elementary string. No grammatical relations, other than those given by the simple rule of string combination, obtain between two strings of a sentence. The result of this paper is to demonstrate that such a restrictionless grammar can be written [4].

In order to obtain a restrictionless form of a string grammar of English, we take as a point of departure the grammar used by the computer program for string decomposition of sentences, developed at the University of Pennsylvania [2,3]. This grammar is somewhat more detailed than the sketch of an English string grammar in [1]. A summary of the form of the computer grammar is presented below in section 2. In section 3 we show how the restrictions can be eliminated from the grammar.

An example of a typical output obtained for a short sentence from a text of a medical abstract is shown in Figs. 1 and 2. The decomposition of the sentence into a sequence of nested strings is indicated in the output by the numbering of the strings. As indicated in line 1., the sentence consists of the two assertion centers in lines 2. and 10. conjoined by and. The line B contains a sentence adjunct (thus) on the assertion center as a whole. The assertion center 2. is of the form N V A: Spikes would be effective. The noun spikes has a left adjunct (such enhanced) in line 5., as indicated by the appearance of 5. to the left of spikes. The object effective has a left adjunct (more) in line 6. and a right adjunct in line 7. In the same way, each of the elements of the adjunct strings may have its own left and right adjuncts. Line 10. contains an assertion center in which the subject and the modal verb (would) have been zeroed. This zeroing is indicated in the output by printing the zeroed element in parentheses.

The difference between the two analyses in Figs. 1 and 2 lies in the decomposition of the sequence in initiating synaptic action. In the first analysis (Fig. 1), this sequence is taken as a P N right adjunct of effective, where initiating synaptic is a left adjunct (on action) of the form of a repeated adjective (parallel to escaping toxic in the sequence in escaping toxic gases). In the second analysis (Fig. 2), this same sequence is taken as a right adjunct of effective in which initiating is the Ving, and synaptic action is the Object of initiating.
 
2. The Computer String Grammar

In representing the string grammar in the computer, a generalized grammar string Y is defined as

Y = Y1 / Y2 / ... / Yn (1)

Yi = Yi1 Yi2 ... Yim (2)

where

Yij = Y' (3)

where Y' is a grammar string like Y.

This system of nested grammar strings terminates when one of the grammar strings is equal to an atomic string (one of the word-category symbols). The Yi are called the options of Y, and each option Yi consists of the elements Yij. Not every option of a grammar string Y will be well-formed each time the sentence analysis program finds an instance of Y in the sentence being analyzed. Associated with each option Yi is a series of zero or more tests, called restrictions. If Ri is the set of tests associated with Yi, then in the grammar each option Yi of Y is accompanied by its tests Ri (eq. 4).
 
A restriction is a test (which will be described below) so written that if it does not give a positive result its attached option may not be chosen. All of the restrictions in the grammar fall into two types:

Type A: The restrictions of type A enable one to avoid defining many similar related sets of grammar strings. The options of the grammar string Y have been chosen so that Y represents a group of strings which have related linguistic properties. This allows the grammar to be written very compactly, and each grammar string can be formulated as best suits the linguistic data. However, when a grammar string Y appears as a Y'ij of some other string Y', some of the options of Y may lead to non-wellformed sequences. In order to retain the group of options of Y and yet not allow non-wellformed sequences wherever options of Y which would have that effect are used, we attach a restriction of type A to those options of Y. For example, let Y be

Y = Y1 / Y2 (5)

where

Y1 = which Σ V (e.g., which he chose)

and

Y2 = what Σ V (e.g., what he chose)

Then Y can appear in the subject Σ of the linguistic center string C1:

C1 = Σ V Ω

This yields Which he chose was important; What he chose was important.

As it is defined here, Y can also be used to represent the wh-clauses in the right adjuncts of the noun, rN, but in rN only the which option of Y gives wellformed sequences: ∃ the book which he chose; ∄ the book what he chose.

Hence a restriction Ra is attached to the what option of Y (eq. 5) whose effect is to prevent that option from being used in rN.

Type B: With some given set of rather broadly defined major categories (noun, verb, adjective, etc.) it is always possible to express more detailed linguistic relations by defining sub-categories of the major categories. These relations then appear as constraints on how the sub-categories may appear together in the grammar strings Y. If some element Yij of Yi is an atomic string (hence a word-category symbol) representing some major category, say C, then Rb may exclude the subcategory Cj as value of Yij if some other element Yik of Yi has the value Ck. Yik may also be a grammar string, in which case Rb may exclude a particular option of Yik when Yij has value Cj. The restrictions Rb may be classified into three kinds:

(a) Between elements of some string Yi where the Yij correspond to elements of a linguistic string. For example, a noun in the sub-category singular cannot appear with a verb in the sub-category plural: ∄ The man agree.
 
Only a certain sub-category of adjective can appear in the sentence adjunct P A: in general, in particular, ∄ in happy.

(b) Between a Yij and a Yik where Yij corresponds to an element of a linguistic string and Yik corresponds to a set of adjuncts of that element. For example, in rN, the string to V Ω cannot adjoin a noun of sub-category N2 (proper names): the man to do the job; ∄ John to do the job. Only a certain adjective sub-category (e.g., present, available) can appear in rN without any left or right adjunct of its own: the people present; ∄ the people happy.

(c) Between Yij and Yik, where one corresponds to an element of a linguistic string and the other corresponds to an adjunct set which can repeat itself, i.e., which allows 2 or more adjuncts on the same linguistic element. These restrictions enable one to express the ordering among adjuncts in some adjunct sets. For example, Q (quantifier) and A (adjective) are both in the set lN, the left adjuncts of the noun. However, Q can precede A but A cannot precede Q when both are adjuncts of the same N in a sentence: ∃ Q A N, e.g., five green books, but ∄ A Q N, e.g., green five books.
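The idea of options carrying tests can be sketched as follows, using the which/what example of a type A restriction. This is a minimal sketch under stated assumptions, not the authors' program: the class Option, the function usable_options, the test not_in_rN and the convention that a test sees only the name of the host string are all hypothetical, and SUBJECT stands for the subject element of the clause.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A restriction is a test on the context in which an option is tried.
# Here the context is simply the name of the host string (an assumption).
Restriction = Callable[[str], bool]


@dataclass
class Option:
    elements: List[str]
    restrictions: List[Restriction] = field(default_factory=list)

    def allowed(self, host: str) -> bool:
        # An option may be chosen only if every attached test is positive.
        return all(test(host) for test in self.restrictions)


def not_in_rN(host: str) -> bool:
    """Type A restriction: block the option when the host string is rN."""
    return host != "rN"


# Y has a `which` option and a `what` option; the test sits on `what`.
Y = [
    Option(["which", "SUBJECT", "V"]),
    Option(["what", "SUBJECT", "V"], [not_in_rN]),
]


def usable_options(options: List[Option], host: str) -> List[List[str]]:
    """Return the element sequences of the options that pass their tests."""
    return [opt.elements for opt in options if opt.allowed(host)]
```

Calling usable_options(Y, "C1") keeps both options, while usable_options(Y, "rN") keeps only the which option, so a single definition of Y serves both hosts.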
 
The string grammar defined by eqs. 1-3, together with the atomic strings (word-category symbols), has the form of a BNF definition. The system with eq. 4, however, departs from a BNF definition in two important respects: (a) it contains restrictions (tests) on the options of a definition; (b) the atomic strings (word-categories) of the grammar have sub-classifications. With the elimination of the restrictions, the computer grammar will again have the form of a BNF definition.
 
3. Elimination of the Restrictions

The restrictionless string grammar is obtained from the grammar described above by the methods of (A) and (B) below. Initially (in this paper), conjunctional strings have not been included in the restrictionless grammar. We estimate that the addition of conjunctional strings will increase the size of the restrictionless grammar by a factor of about 5.
 
(A) The linguistic strings represented in the computer grammar are reformulated in accordance with the following requirement. Given any utterance of a language containing elements A and B such that a grammatical dependency obtains between A and B, the elementary strings of a restrictionless string grammar are defined so that A and B appear together in the same linguistic string, and any iterable sequence between A and B is an adjunct of that string. Iterable sequences of the type seemed to begin to in It seemed to begin to surprise him that we worked seriously, or is said to be known to in It is said to be known to surprise him that we worked seriously, are analyzed as adjuncts.
 
If we place such sequences among the left adjuncts of the verb, lv, then the sentences above can be put in the form

(9) It lv surprise him that we worked seriously

lv = seemed to begin to; is said to be known to; etc.

However, when the adjunct lv takes on the value zero (as can all adjuncts, by definition), then (9) above becomes the non-grammatical sequence ∄ It surprise him that we worked seriously. This happens because the first verb of lv (seemed, is) carries the tense morpheme, and the latter disappears when lv takes on the value zero. We separate the tense morpheme from the verb, and place it in the center string as one of the required elements.

(10) C1 = Σ t lv V Ω
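The effect of making the tense morpheme a separate element of the center string can be illustrated with a short sketch. It is a toy realization under stated assumptions: the inflection table INFLECT, the function realize and the rule that the tense attaches to the first verbal word after it are hypothetical simplifications introduced for the example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical inflection table: (base verb, tense) -> tensed form.
INFLECT: Dict[Tuple[str, str], str] = {
    ("seem", "past"): "seemed",
    ("be", "present"): "is",
    ("surprise", "past"): "surprised",
    ("surprise", "present"): "surprises",
}


def realize(subject: str, tense: str, lv: Sequence[str], verb: str, obj: str) -> str:
    """Spell out the sequence subject, tense, lv, verb, object.

    The tense is carried by the first verbal word that follows it, so it
    survives when the adjunct lv takes on the value zero (an empty sequence).
    """
    verbal: List[str] = list(lv) + [verb]
    verbal[0] = INFLECT[(verbal[0], tense)]
    return " ".join([subject] + verbal + [obj])


OBJ = "him that we worked seriously"
with_adjunct = realize("It", "past", ["seem", "to", "begin", "to"], "surprise", OBJ)
with_passive = realize("It", "present", ["be", "said", "to", "be", "known", "to"], "surprise", OBJ)
zero_adjunct = realize("It", "past", [], "surprise", OBJ)
```

With lv zeroed the tense falls on surprise itself, giving It surprised him that we worked seriously rather than the untensed sequence obtained when the tense is part of the adjunct.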
 
This formulation of the assertion center string C1 (10), in which the tense morpheme is an independent element and iterable sequences are taken as adjuncts, is necessary in order to preserve, for example, the dependence between the particle it and the succeeding sequence surprises him that we worked seriously: ∄ The book surprises him that we worked seriously. In the grammar which includes restrictions, this formulation is not necessary because this dependence can be checked by a restriction.

(B) Turning to the computer form of the grammar, all the restrictions of the grammar are eliminated either by defining new grammar strings (for the elimination of the restrictions Ra), or by replacing the general word-categories by the particular subclasses of those categories which are required by the restriction (to eliminate Rb). The application of this procedure increases the number of strings in the grammar, of course.

The restrictions Ra can be eliminated in the following manner. Suppose the option Yi of Y has a restriction Ra on it which prevents it from being chosen in Y' (Y is a Y'ij of Y'). Then define a new grammar string Y* which contains all the options of Y but Yi. Then the new grammar string Y* replaces Y in Y'. Thus, in the example of Ra in section 2, the string Y* = which Σ t lv V / .... (in the modified treatment of tense and iterable sequences) would replace Y in rN.

The restrictions Rb are eliminated in a different way, according to the types described in section 2.

(a) New strings must be written in which only the wellformed sequences of subcategories appear. In the example of subject-verb agreement, the center string is replaced by strings in which Ns appears only with Vs and Np only with Vp, where Ns and Np are singular and plural nouns, Vs and Vp singular and plural verbs.

(b) If an element of a particular subcategory, say Ai, can take only a subset of the adjuncts rA, then a new adjunct string rAi is defined. It contains those options of rA which can appear only with Ai plus all the options of rA which are common to all the sub-categories of A. When this has been done for all Ai having some particular behavior with respect to rA, all the remaining sub-categories of A will have a common adjunct string rA. The sequence A rA is thus replaced by sequences of the form A1 rA1, one for each special sub-category, together with A rA for the remaining sub-categories. As many new sets rAi must be defined as there were special sub-categories Ai. A similar argument holds for lA and other adjunct sets which depend on A.
 
(c) A new element corresponding to the adjunct set must be defined in which the adjuncts appear correctly ordered with respect to each other, and each one must be able to take on the value zero.

This procedure for eliminating restrictions is also the algorithm for introducing further grammatical refinements into the restrictionless grammar. Such a general procedure can be formulated because of an essential property of a string grammar: in terms of linguistic (elementary) strings, all restrictions are either a) between elements of a string, or b) between an element of the string and its adjunct, or c) between related adjuncts of the same string. Further, there is no problem with discontinuous elements in a string grammar: all elements which depend in some way on each other grammatically appear in the same string or in strings which are contiguous by adjunction.
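The elimination steps described above can be sketched in a few lines. This is a hedged illustration, not the procedure as implemented by the authors: the function names drop_option, split_by_subcategory and ordered_adjuncts are hypothetical, strings are lists of category symbols, SUBJECT and OBJECT are placeholder symbols, and the tense element of the center string is left out for brevity.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def drop_option(options: List[List[str]], banned: List[str]) -> List[List[str]]:
    """Type A: build the new string Y* holding every option of Y but the banned one."""
    return [opt for opt in options if opt != banned]


def split_by_subcategory(
    string: List[str],
    subcats: Dict[str, List[str]],
    wellformed: Callable[[Tuple[str, ...]], bool],
) -> List[List[str]]:
    """Type B, kind (a): replace each general category by its sub-categories
    and keep only the well-formed combinations."""
    choices = [subcats.get(sym, [sym]) for sym in string]
    return [list(combo) for combo in product(*choices) if wellformed(combo)]


def ordered_adjuncts(order: List[str]) -> List[List[str]]:
    """Type B, kind (c): every sequence that respects `order`,
    with each adjunct free to take on the value zero."""
    return [
        [sym for sym, keep in zip(order, mask) if keep]
        for mask in product([False, True], repeat=len(order))
    ]


# Type A: the `what` option is removed from the copy of Y used in rN.
Y = [["which", "SUBJECT", "V"], ["what", "SUBJECT", "V"]]
Y_star = drop_option(Y, ["what", "SUBJECT", "V"])


def agrees(combo: Tuple[str, ...]) -> bool:
    # Ns goes with Vs and Np with Vp: the number suffixes must match.
    return combo[0][-1] == combo[1][-1]


center = split_by_subcategory(
    ["N", "V", "OBJECT"], {"N": ["Ns", "Np"], "V": ["Vs", "Vp"]}, agrees
)

# Q may precede A, but A never precedes Q.
left_of_noun = ordered_adjuncts(["Q", "A"])
```

Every function returns plain lists of strings with no attached tests, which is the point of the exercise: the dependencies are carried by the enlarged set of strings themselves, at the cost of a larger grammar.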
The cost of the elimination of all restrictions in this way is about an order of magnitude increase in the number of strings of the grammar. Instead of about 200 strings of the computer grammar, the grammar presented here has about 2000 strings. It is interesting that the increase in the size of the grammar is not greater than roughly one order of magnitude. This suggests that there may be practical applications for such a grammar, e.g., in a program designed to carry out all analyses of a sentence in real time. Also, since the restrictionless grammar is equivalent to a B.N.F. grammar of English, it may prove useful in adding English-language features to programming languages which are written in B.N.F.
 
[Fig. 1. Computer output, first analysis of the sentence: SUCH ENHANCED SPIKES WOULD BE MORE EFFECTIVE IN INITIATING SYNAPTIC ACTION AND THUS BE RESPONSIBLE FOR THE OBSERVED POST-TETANIC POTENTIATION. INITIATING appears as an adjective among the left adjuncts (LN) of ACTION.]

[Fig. 2. Computer output, second analysis of the same sentence. INITIATING appears as the Ving with ACTION in its object; the output ends with NO MORE PARSES.]
 
NOTES

4. This problem was suggested by Professor J. Schwartz of the Courant Institute of Mathematical Sciences, New York University.

5. The option Yi here corresponds to the linguistic string Y of the previous section. The symbol / separates the options of a string definition.

REFERENCES
 
1. Harris, Z. S., String Analysis of Sentence Structure. Papers on Formal Linguistics, No. 1, Mouton and Co., The Hague, 1962.

2. Sager, N., Salkoff, M., Morris, J., and Raze, C., Report on the String Analysis Programs. Department of Linguistics, University of Pennsylvania, March 1966.

3. Sager, N., Syntactic Analysis of Natural Language. Advances in Computers (Alt, F. and Rubinoff, M., eds.), vol. 8, pp. 153-188. Academic Press, New York, 1967.
