FORMAL SPECIFICATION OF NATURAL LANGUAGE SYNTAX 
ABSTRACT 
The two-level grammar is investigated as a notation for giving formal
specification of the context-free and context-sensitive aspects of natural
language syntax. In this paper, a large class of English declarative
sentences, including post-noun-modification by relative clauses, is
formalized using a two-level grammar. The principal advantages of two-level
grammar are: 1) it is very easy to understand and may be used to
give a formal description using a structured form of natural language; 2) it
is formal with many well-known mathematical properties; and 3) it is
directly implementable by interpretation. The significance of this last fact
is that once we have written a two-level grammar for natural language
syntax, we can derive a parser automatically without writing any
additional specialized computer programs. Because of the ease with which
two-level grammars may express logic and their Turing computability, we
expect that they will also be very suitable for future extensions to
semantics and knowledge representation.
1. INTRODUCTION 
Formal specifications of natural language syntax should serve as a
standard definition for the syntax of the subject language. The
specification must be complete, concise, consistent, precise, unambiguous,
understandable, and useful to language scholars, users, and implementors
who wish to develop a parser for the language to run on a computer.
Furthermore, the specification should be mathematically rigorous to the
degree that an implementation of the language can be automatically
derived from the specification [10]. Unfortunately, many of these aims are
difficult to accomplish, primarily because of the dynamic and informal
nature of natural language. Formal specification is still a worthy goal to
the degree allowed by present knowledge about natural language, and in
this paper we propose a metalanguage for specifying both syntax and
semantics of natural language that has potential for satisfying these goals.
The metalanguage we propose is the two-level grammar [16] (also called
W-grammars and tlgs). Two-level grammars have been used extensively for
specifying the syntax and semantics of programming languages [2], but
their use in specifying natural language was first introduced by the authors
[7, 8, 9].
Existing formal specification methods for natural language syntax
take many forms. Of these, some of the more common are augmented
transition network grammars [18], transformational grammars [1], and
generalized phrase-structure grammars [5]. These methods and others are
also surveyed in [17]. The degree to which any formal specification method
satisfies the above stated goals is sometimes difficult to evaluate and relies
on subjective judgment. The authors do not intend to evaluate these existing
methods with respect to the requirements of formal specification languages
but will instead concentrate on why two-level grammars satisfy the
necessary goals in a mathematically rigorous but readable and easy to
understand way. In this paper, the two-level grammar metalanguage will
be used to define a large class of English declarative sentences,
extending work described in [8] and [9]. We will emphasize the method of
using two-level grammars for this purpose and the advantages gained
rather than any particular characteristics of the given grammar.
2. TWO-LEVEL GRAMMARS 
A two-level grammar consists of two separate grammars, the
metaproduction rules (metarules) and the hyperrules. The metarules are
generally context-free rules which take the form:
METANOTION :: hypernotion-1; hypernotion-2; ... ; hypernotion-n.
where METANOTION is the left-hand side "nonterminal" symbol of the
production and hypernotion-1, hypernotion-2, ..., hypernotion-n are the n
alternatives of the production right-hand side. Each hypernotion consists
of protonotions (terminal symbols) and other metanotions. In the case of
English, the terminal symbols of the meta-grammar are English words.
The meta-grammar itself is used to define the context-free aspects of
English. Example metarules are:
SENTENCE :: DETERMINER NOUN VERB.
DETERMINER :: a; an; the; these; those; this; that.
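As a rough illustration of how such metarules behave as an ordinary context-free grammar, the following Python sketch encodes the two example metarules in a dictionary and enumerates every word sequence they derive. The dictionary layout, the function name derive, and the NOUN and VERB entries are assumptions made for this illustration only; the paper does not list those lexical categories.

```python
from itertools import product

# Hypothetical encoding: each metanotion maps to a list of alternatives,
# and each alternative is a list of symbols (metanotions or English words).
METARULES = {
    "SENTENCE": [["DETERMINER", "NOUN", "VERB"]],
    "DETERMINER": [["a"], ["an"], ["the"], ["these"], ["those"], ["this"], ["that"]],
    # NOUN and VERB are not enumerated in the text; these entries are assumptions.
    "NOUN": [["dog"], ["students"]],
    "VERB": [["barks"], ["study"]],
}


def derive(symbol):
    """Return every word sequence derivable from symbol (finite grammars only)."""
    if symbol not in METARULES:
        # A protonotion (terminal word) derives only itself.
        return [[symbol]]
    results = []
    for alternative in METARULES[symbol]:
        parts = [derive(s) for s in alternative]
        # Combine one derivation of each symbol, in order.
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in derive("SENTENCE")]
```

Note that the list contains strings such as "these dog barks", which is acceptable to the context-free metarules but violates agreement. Filtering out such strings is the job of the hyperrules.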
USING TWO-LEVEL GRAMMAR
Barrett R. Bryant
Dale Johnson
Balanjaninath Edupuganty
Department of Computer and Information Sciences
The University of Alabama at Birmingham
Birmingham, Alabama, U.S.A. 35294
The hyperrules are of the form
hypernotion : hyperaltern-1; hyperaltern-2; ... ; hyperaltern-n.
The hyperalternatives separated by semicolons are distinct production
alternatives. Each of these hyperalternatives may be divided into a
sequence of hypernotions separated by commas. In a two-level grammar
derivation tree, there will be one branch for each element in the sequence.
A two-level grammar with either hyperrules having more than one
hyperalternative or two distinct hyperrules having the same hypernotion
on the production left-hand side is nondeterministic. If each hyperrule has
only one hyperalternative and all hypernotions in production left-hand
sides are distinct from one another, then the tlg is deterministic.
A hyperrule is actually a production rule "pattern" since each
hyperrule can possibly represent an infinite number of production rules in a
context-free grammar. This is because each occurrence of a metanotion in
the hyperrule represents all sequences of protonotions that can be derived
from that metanotion. That is, a hyperrule may be viewed as a set of
production rules (called strict production rules) in which all metanotions
are replaced by the protonotions they derive. The only restriction here is
that if there is more than one occurrence of a single metanotion, then
each is replaced by the same protonotion sequence in deriving the strict
production rules. This is called consistent substitution. For example, in the
hyperrule
where WORD is WORD : true.
both occurrences of the metanotion WORD represent the same
protonotion. The set of allowable protonotions in this rule is defined by
the metarules for WORD. If these metarules define an infinite number of
possible protonotions, then the above hyperrule also represents an infinite
number of strict production rules. It is this feature of two-level grammars
that allows them to define context-sensitive and recursively enumerable
languages [12].
If consistent substitution is not required (or desired) for metanotions
with the same root metarules (and name), then these metanotions may be
distinguished by subscripts. For example,
where SENTENCE1 and SENTENCE2 are correct :
where SENTENCE1 is correct, where SENTENCE2 is correct.
In this hyperrule, SENTENCE1 and SENTENCE2 are defined by the
same metarules (and root metanotion SENTENCE) but need not have the
same instantiations.
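Consistent substitution and subscripted metanotions can be sketched in a few lines of Python. The sketch below expands a hyperrule into its strict production rules for a deliberately tiny, finite set of metarules; the METARULES table, the helper names root and strict_rules, and the sample words are all assumptions for illustration, and the hyperrule is tokenized naively on whitespace.

```python
from itertools import product

# Hypothetical finite metarules; real metarules may derive infinitely many protonotions.
METARULES = {
    "WORD": ["cat", "dog"],
    "SENTENCE": ["it rains", "we sing"],
}


def root(token):
    """Strip a trailing subscript, e.g. SENTENCE1 -> SENTENCE."""
    return token.rstrip("0123456789")


def strict_rules(hyperrule):
    """Expand a hyperrule into strict production rules by consistent substitution."""
    tokens = hyperrule.split()
    # Collect distinct metanotions; WORD twice is one entry, SENTENCE1/SENTENCE2 are two.
    metanotions = []
    for token in tokens:
        if root(token) in METARULES and token not in metanotions:
            metanotions.append(token)
    choices = [METARULES[root(m)] for m in metanotions]
    rules = []
    for values in product(*choices):
        binding = dict(zip(metanotions, values))
        # Every occurrence of the same metanotion receives the same protonotion.
        rules.append(" ".join(binding.get(t, t) for t in tokens))
    return rules


word_rules = strict_rules("where WORD is WORD : true.")
sentence_rules = strict_rules(
    "where SENTENCE1 and SENTENCE2 are correct : "
    "where SENTENCE1 is correct, where SENTENCE2 is correct."
)
```

With two words, the WORD hyperrule yields two strict rules (never "where cat is dog"), whereas the subscripted SENTENCE hyperrule yields four, because SENTENCE1 and SENTENCE2 are instantiated independently.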
Some hyperrules called predicates act as conditions which must be
satisfied for the derivation to be successful. A predicate begins with the
word where or condition. The terminal derivation of the hyperrule is
the empty string if the condition is satisfied; otherwise it derives a "blind alley"
(i.e. does not derive any terminal string). In the
two-level grammar of English presented in this paper, all hyperrules are
predicates and serve to perform context checks such as subject-verb
agreement, object-verb agreement, and any additional required context
checks which cannot be conveniently specified by a context-free grammar
(i.e. the metarules).
3. METARULES FOR ENGLISH 
The metarules of the two-level grammar for English define the
context-free aspects of English syntax. Some lexical items from English
cannot be easily defined in a formal way (i.e. using context-free rules). These
include the nouns, verbs, adjectives, proper names, and titles, given names,
and surnames for people, which are lexical categories containing a large
number of elements. The formal specification of these categories would be
production rules of the form:
NOUN :: aardvark; abacus; ...; zucchini.
VERB :: abandon; abate; ...; zoom.
ADJECTIVE :: abdominal; abhorrent; ...; zoned.
PROPER_NAME :: Aberdeen; Abilene; ...; Zambia.
TITLE :: Admiral; Archbishop; ...; Warrant Officer.
For simplicity we choose to omit more formal specifications of the above
categories. A more complete list of words in these categories may be found
in [14].
The metarules in our two-level grammar illustrate the specific subset
of English grammar defined in this paper. The subset includes declarative
sentences with the subject noun premodified and postmodified, including
postmodification by relative clauses. The choice of this subset is rather
arbitrary since we have used two-level grammars to define a wide variety
of English sentences (e.g. in [7], more extensive modification is allowed and
also compound sentences). This subset will serve to illustrate the power of
two-level grammars for the purposes of defining English syntax. Because
the notation for metarules follows context-free grammar conventions using
natural language vocabulary, our meta-grammar is fairly self-explanatory.
The rules of English syntax that have been incorporated into our grammar
are based on English grammar rules given in [3], [11], [13], and [19].
We now enumerate the metarules used in our two-level grammar of
English. A sentence consists of a noun phrase and a verb phrase. The
noun phrase consists of an optional sentence modifier such as a "viewpoint"
adverbial and a subject sequence. The subject sequence consists of either a
single main subject or two main subjects separated by the coordinator and.
The main subjects may be either a list of nouns premodified and postmodified
or a proper name premodified by a restricter.
1. SENTENCE :: NOUN_PHRASE VERB_PHRASE PERIOD.
2. NOUN_PHRASE ::
SENTENCE_MODIFIER SUBJECT_SEQUENCE.
3. SENTENCE_MODIFIER :: VIEWPOINT COMMA; EMPTY.
4. VIEWPOINT :: artistically; economically; ethically; financially;
geographically; linguistically; militarily; morally; personally;
politically; psychologically; publically; theoretically; visually.
5. SUBJECT_SEQUENCE ::
MAIN_SUBJECT; MAIN_SUBJECT and MAIN_SUBJECT.
6. MAIN_SUBJECT :: MODIFIED_NAMED_SUBJECT;
PRE_NOUN_MODIFICATION NOUN_HEAD
POST_NOUN_MODIFICATION.
7. MODIFIED_NAMED_SUBJECT ::
RESTRICTERS NAMED_SUBJECT.
8. NAMED_SUBJECT :: PROPER_NAME; GIVEN_NAME;
SURNAME; TITLE SURNAME.
9. RESTRICTERS :: chiefly; especially; even; just; largely; mainly;
mostly; primarily; not even; only; EMPTY.
10. NOUN_HEAD :: NOUN; NOUN and NOUN;
NOUN_LIST COMMA_OPTION and NOUN.
11. NOUN_LIST ::
NOUN_LIST COMMA NOUN; NOUN COMMA NOUN.
The verb phrase consists of a predicate sequence and an object
sequence. The predicate sequence consists of an auxiliary sequence (an
optional auxiliary adverb such as a focusing or maximizing adverb
followed by an active or passive auxiliary verb) and the main verb of the
sentence.
12. VERB_PHRASE ::
PREDICATE_SEQUENCE OBJECT_SEQUENCE.
13. PREDICATE_SEQUENCE :: AUXILIARY_SEQUENCE VERB.
14. AUXILIARY_SEQUENCE :: AUXILIARY_ADVERB_OPTION;
AUXILIARY_ADVERB_OPTION
ACTIVE_OR_PASSIVE_AUXILIARY.
15. AUXILIARY_ADVERB_OPTION :: AUXILIARY_ADVERB; EMPTY.
16. AUXILIARY_ADVERB ::
FOCUSING_ADVERB; MAXIMIZING_ADVERB.
17. FOCUSING_ADVERB :: again; also; as well; at least; equally;
especially; even; further; in addition; in particular; just; largely;
likewise; mainly; merely; mostly; notably; only; particularly;
primarily; principally; purely; purely and simply; similarly;
simply; specifically.
18. MAXIMIZING_ADVERB :: absolutely; altogether; completely;
entirely; fully; in all respects; perfectly; quite; thoroughly;
totally; utterly; very fully; very thoroughly.
19. ACTIVE_OR_PASSIVE_AUXILIARY ::
ACTIVE_AUXILIARY; PASSIVE_AUXILIARY.
20. ACTIVE_AUXILIARY ::
AUXILIARY_HAVE AUXILIARY_ADVERB_OPTION.
21. PASSIVE_AUXILIARY ::
AUXILIARY_BE AUXILIARY_ADVERB_OPTION;
AUXILIARY_HAVE AUXILIARY_ADVERB_OPTION been.
22. AUXILIARY_BE :: am; is; were; was.
23. AUXILIARY_HAVE :: have; had; has.
24. AUXILIARY_VERB :: AUXILIARY_BE; AUXILIARY_HAVE.
25. AUXILIARY_TRAILER :: AUXILIARY_ADVERB_OPTION;
AUXILIARY_ADVERB_OPTION been.
The object sequence of a verb phrase can contain both direct and
indirect objects followed by an optional adverbial such as a maximizing
adverb or a time adverb. Objects can be either a proper name, possibly
modified by the restricters given above, or a noun expression, possibly
premodified and postmodified.
26. OBJECT_SEQUENCE ::
INDIRECT_OBJECT DIRECT_OBJECT
OBJECT_SEQUENCE_ADVERB;
DIRECT_OBJECT OBJECT_SEQUENCE_ADVERB.
27. OBJECT_SEQUENCE_ADVERB ::
OBJECT_SEQUENCE_ADVERBIAL; EMPTY.
28. OBJECT_SEQUENCE_ADVERBIAL ::
MAXIMIZING_ADVERB; TIME_ADVERB.
29. TIME_ADVERB :: again; early; first; last; late; next; now; recently;
simultaneously; since; then; today; yesterday.
30. INDIRECT_OBJECT :: OBJECT.
31. DIRECT_OBJECT :: OBJECT.
32. OBJECT :: MODIFIED_NAMED_SUBJECT;
PRE_NOUN_MODIFICATION NOUN_HEAD
POST_NOUN_MODIFICATION.
We now turn to the pre-noun-modifiers specified in our grammar. The
modifier is a determiner optionally followed by a list of possessive nouns,
an adjective, a sequence of nouns, another list of possessive nouns and a
denominal noun. Examples of this type of construct include "the
murderer's empty black pistol" and "a very rich man's thick wallet." For
context-sensitive purposes, the determiners are divided into "universal"
determiners which may precede both singular and plural nouns and
determiners which may only precede singular nouns. Furthermore, a
context-free restriction of the pre-noun-modifiers is that there can be at
most one list of possessive nouns in a sequence. For convenience we choose
to enforce this condition in the hyperrules instead of the metarules.
33. PRE_NOUN_MODIFICATION ::
DETERMINER PRE_NOUN_MODIFIERS.
34. PRE_NOUN_MODIFIERS :: EMPTY;
POSSESSIVE_NOUN_LIST ADJECTIVE_OPTION
NOUN_SEQUENCE POSSESSIVE_NOUN_LIST
DENOMINAL_NOUN.
35. DETERMINER ::
UNIVERSAL_DETERMINER; SINGULAR_DETERMINER.
36. UNIVERSAL_DETERMINER ::
the; some; any; my; your; his; her; its; our; their.
37. SINGULAR_DETERMINER :: either; neither; another;
NOT_OPTION NEGATABLE_SINGULAR_DETERMINER.
38. NEGATABLE_SINGULAR_DETERMINER :: a; an; each; every.
39. NOT_OPTION :: not; EMPTY.
40. POSSESSIVE_NOUN_LIST :: EMPTY;
POSSESSIVE_NOUN_LIST POSSESSIVE_NOUN.
41. POSSESSIVE_NOUN :: NOUN's; NOUN'.
42. ADJECTIVE_OPTION :: ADJECTIVE; EMPTY.
43. NOUN_SEQUENCE :: NOUN; NOUN and NOUN; EMPTY.
The nouns in the NOUN_SEQUENCE denote the physical composition of
items (e.g. "the fisherman's rusted iron hook") and thus act as adjectives.
Denominal nouns are adjectives which denote some quality of the noun
being modified (e.g. "her social life" and "his moral responsibility"). Since
there are a large number of these, we omit their formal specification here.
In our grammar subset we restrict post-noun-modifiers to relative
clauses involving people. Many other forms of post-noun-modification are
formally specified in [7].
44. POST_NOUN_MODIFICATION :: RELATIVE_CLAUSE; EMPTY.
45. RELATIVE_CLAUSE ::
who PREDICATE_SEQUENCE OBJECT_SEQUENCE.
Finally, the punctuation in our grammar is given below:
46. PERIOD :: . .
47. COMMA :: , .
48. COMMA_OPTION :: COMMA; EMPTY.
49. EMPTY :: .
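Because the metarules are ordinary context-free productions, a small backtracking recognizer is enough to experiment with them. The Python sketch below covers only a simplified fragment: MAIN_SUBJECT, VERB_PHRASE and OBJECT are reduced to a determiner, noun and verb, the word lists are invented, and the names RULES, parse and accepts are assumptions for illustration. Punctuation is expected as separate tokens, and left-recursive metarules such as NOUN_LIST are deliberately left out because this naive top-down strategy would not terminate on them.

```python
# Simplified fragment of the metarules; alternatives are lists of symbols.
RULES = {
    "SENTENCE": [["NOUN_PHRASE", "VERB_PHRASE", "PERIOD"]],
    "NOUN_PHRASE": [["SENTENCE_MODIFIER", "SUBJECT_SEQUENCE"]],
    "SENTENCE_MODIFIER": [["VIEWPOINT", "COMMA"], ["EMPTY"]],
    "VIEWPOINT": [["financially"], ["personally"]],
    "SUBJECT_SEQUENCE": [["MAIN_SUBJECT"], ["MAIN_SUBJECT", "and", "MAIN_SUBJECT"]],
    "MAIN_SUBJECT": [["DETERMINER", "NOUN"]],   # simplification of metarule 6
    "VERB_PHRASE": [["VERB", "OBJECT"]],        # simplification of metarules 12-26
    "OBJECT": [["DETERMINER", "NOUN"]],         # simplification of metarule 32
    "DETERMINER": [["the"], ["a"]],
    "NOUN": [["student"], ["students"], ["book"]],   # assumed lexicon
    "VERB": [["reads"], ["read"]],                   # assumed lexicon
    "PERIOD": [["."]],
    "COMMA": [[","]],
    "EMPTY": [[]],
}


def parse(symbol, words, i):
    """Yield every position j such that words[i:j] can be derived from symbol."""
    if symbol not in RULES:
        # Terminal word: it must match the next input token.
        if i < len(words) and words[i] == symbol:
            yield i + 1
        return
    for alternative in RULES[symbol]:
        positions = {i}
        for part in alternative:
            positions = {j for p in positions for j in parse(part, words, p)}
        yield from positions


def accepts(sentence):
    """True if the whole token sequence is derivable from SENTENCE."""
    words = sentence.split()
    return len(words) in set(parse("SENTENCE", words, 0))
```

For instance, accepts("the students reads a book .") holds in this fragment: the string is context-free correct, and the agreement error is left for the hyperrules to reject.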
4. HYPERRULES FOR ENGLISH 
The hyperrules of the two-level grammar for English define the
context-sensitive aspects of English syntax which cannot be specified by
the context-free rules of the meta-grammar. Unlike the meta-grammar, the
hyperrules do not generate any part of the English sentence. They serve
only to verify the context-sensitive conditions of the grammar. This is
done by using predicates as described earlier. Predicates will derive the
empty string if they are satisfied and will derive nonterminal strings of
useless symbols otherwise. The notion that the hyperrules will not
generate any terminal string but instead verify context-sensitive conditions
of a terminal string already generated by the context-free metarules is a
unique feature of our approach to designing two-level grammars (e.g. in
contrast, see [2]). This will greatly simplify parsing two-level grammars, as
we will see later.
We will define two types of predicates. The first of these will be
preceded by the protonotion condition and will be given explicitly in the
formal grammar. As with the meta-grammar, however, there will be some
rules which cannot be precisely defined in the formal system. These rules
relate to qualities of the unspecified lexical classes (e.g. nouns, verbs, etc.)
and will be designated by the protonotion where. For example, the
hypernotions where NOUN is singular, where VERB is past participle,
and where NOUN and VERB agree in person and number cannot be
precisely defined except by a very large number of formal rules such as
those given below:
where aardvark is singular : EMPTY.
where abandoned is past participle : EMPTY.
where Adam and ate agree in person and number : EMPTY.
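In an implementation, where predicates of this kind would most naturally be answered by a lexicon lookup rather than by enumerated rules. The following Python sketch shows one possible shape for such lookups; the LEXICON table, its feature names, and the function names are assumptions for illustration and are not taken from the authors' interpreter. A satisfied predicate returns the empty string and an unsatisfied one returns None, standing in for a blind alley.

```python
# Hypothetical lexicon; entries and feature names are assumptions for illustration.
LEXICON = {
    "aardvark": {"person": 3, "number": "singular"},
    "students": {"person": 3, "number": "plural"},
    "sleeps": {"person": 3, "number": "singular"},
    "sleep": {"person": 3, "number": "plural"},
    "abandoned": {"form": "past participle"},
}

EMPTY = ""          # what a satisfied predicate derives
BLIND_ALLEY = None  # an unsatisfied predicate derives no terminal string


def where_is(word, feature, value):
    """Predicate of the form: where WORD is <value>."""
    satisfied = LEXICON.get(word, {}).get(feature) == value
    return EMPTY if satisfied else BLIND_ALLEY


def where_agree(noun, verb):
    """Predicate: where NOUN and VERB agree in person and number."""
    n = LEXICON.get(noun, {})
    v = LEXICON.get(verb, {})
    satisfied = all(
        key in n and n[key] == v.get(key) for key in ("person", "number")
    )
    return EMPTY if satisfied else BLIND_ALLEY
```

Keeping the two outcomes as distinct values (rather than True and False) mirrors the grammar's view that a predicate either derives the empty string or derives nothing.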
In the subsequent discussion of hyperrules we will use the notation Hn
to denote hyperrule number n. The start hyperrule (H1) of the two-level
grammar is:
1. SENTENCE : condition SENTENCE is a well-formed sentence.
This hyperrule has as its start notion an English sentence which is
well-formed with respect to the context-free rules of the meta-grammar for
metanotion SENTENCE. The next hyperrule (H2) expands the sentence
with respect to what conditions must be satisfied. The formalization of
these is self-explanatory.
2. condition SENTENCE_MODIFIER SUBJECT_SEQUENCE
AUXILIARY_SEQUENCE VERB OBJECT_SEQUENCE
PERIOD is a well-formed sentence :
condition SUBJECT_SEQUENCE shows subject-predicate
agreement with AUXILIARY_SEQUENCE VERB,
condition SUBJECT_SEQUENCE is a well-formed subject,
condition OBJECT_SEQUENCE
shows object-predicate agreement with VERB,
condition AUXILIARY_SEQUENCE VERB
is a well-formed predicate,
condition OBJECT_SEQUENCE is a well-formed object.
The first condition is that the subject sequence must agree with the
predicates specified by the auxiliary sequence and verb. In our grammar,
agreement means that the subject and the subject-verb must agree in
person and number. There are two possibilities for subject-verbs: 1) the
auxiliary sequence is empty (H3), in which case the main verb must be
consistent with the subject, and 2) the auxiliary sequence is non-empty
(H4), in which case it is the auxiliary verb which must be consistent with
the subject. Subjects may be in one of three forms: 1) the subject is a
proper name (H5), possibly modified by a restricter (e.g. "even Mr. Smith"
or "primarily Mrs. Jones"), and therefore requires a singular verb; 2) the
subject is a single subject (H6-H7), in which case it need only agree with
the subject-verb; or 3) the subject may be a compound subject co-ordinated
with and (H8-H9), in which case it requires a plural verb (e.g.
"John and Bill are here.").
3. condition SUBJECT_SEQUENCE
shows subject-predicate agreement with VERB :
condition SUBJECT_SEQUENCE agrees in person and number
with VERB.
4. condition SUBJECT_SEQUENCE
shows subject-predicate agreement
with AUXILIARY_ADVERB_OPTION AUXILIARY_VERB
AUXILIARY_TRAILER VERB :
condition SUBJECT_SEQUENCE agrees in person and number
with AUXILIARY_VERB.
5. condition MODIFIED_NAMED_SUBJECT
agrees in person and number with VERB :
where VERB is singular.
6. condition PRE_NOUN_MODIFICATION NOUN_HEAD
POST_NOUN_MODIFICATION
agrees in person and number with VERB :
condition NOUN_HEAD
agrees in person and number with VERB.
7. condition NOUN agrees in person and number with VERB :
where NOUN and VERB agree in person and number.
8. condition NOUN_LIST COMMA_OPTION and NOUN
agrees in person and number with VERB :
where VERB is plural.
9. condition MAIN_SUBJECT1 and MAIN_SUBJECT2
agrees in person and number with VERB :
where VERB is plural.
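The case analysis in H3-H9 can be paraphrased as a short Python function. This is a sketch under simplifying assumptions, not the authors' interpreter: subjects are represented as small dictionaries with a hypothetical "kind" field, only number (not person) is checked, and the VERB_NUMBER and NOUN_NUMBER tables are invented sample data.

```python
# Assumed sample data: which grammatical numbers each verb form is compatible with.
VERB_NUMBER = {
    "is": {"singular"}, "was": {"singular"}, "has": {"singular"},
    "were": {"plural"}, "have": {"plural"},
    "fixes": {"singular"}, "fix": {"plural"},
    "gave": {"singular", "plural"},
}
NOUN_NUMBER = {"student": "singular", "students": "plural", "men": "plural"}


def verb_is(verb, number):
    """Stand-in for the predicates: where VERB is singular / where VERB is plural."""
    return number in VERB_NUMBER.get(verb, set())


def subject_agrees(subject, verb, auxiliary=None):
    """Paraphrase of H3-H9 for a subject given as a dict with a 'kind' field."""
    # H3: no auxiliary, so the main verb is checked; H4: otherwise the auxiliary is.
    subject_verb = auxiliary if auxiliary is not None else verb
    kind = subject["kind"]
    if kind == "named":                      # H5: proper names take a singular verb
        return verb_is(subject_verb, "singular")
    if kind in ("compound", "noun_list"):    # H8-H9: co-ordination takes a plural verb
        return verb_is(subject_verb, "plural")
    # H6-H7: a single noun head must match the verb's number.
    return verb_is(subject_verb, NOUN_NUMBER[subject["noun"]])
```

For example, subject_agrees({"kind": "noun", "noun": "men"}, "fixed", auxiliary="have") checks the auxiliary have rather than the participle, as H4 prescribes.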
To satisfy the second condition that the subject of a sentence must be
well-formed, the subject may fall into one of the following categories: 1) if
the subject is a name (H10), then it is already well-formed by the
metarules; 2) if the subject is modified (H11), then the modifiers must be
correct; and 3) if the subject is a compound subject (H12), then each
component of the compound subject must be well-formed according to
rules 1 and 2.
10. condition MODIFIED_NAMED_SUBJECT is a well-formed subject : 
EMPTY. 
11. condition DETERMINER PRE_NOUN_MODIFIERS
NOUN_HEAD POST_NOUN_MODIFICATION
is a well-formed subject :
condition DETERMINER PRE_NOUN_MODIFIERS
NOUN_HEAD is correct in premodification,
condition DETERMINER NOUN_HEAD
POST_NOUN_MODIFICATION
is correct in postmodification.
12. condition MAIN_SUBJECT1 and MAIN_SUBJECT2
is a well-formed subject :
condition MAIN_SUBJECT1 is a well-formed subject,
condition MAIN_SUBJECT2 is a well-formed subject.
Correctness of modification implies that a subject must be correctly
premodified and postmodified. We first give the hyperrules which enforce
correct premodification. Premodification (H13) requires 1) correct
determiner usage (i.e. with respect to singular and plural nouns) and 2)
any premodifying nouns must be singular or "mass" nouns (i.e. nouns
which denote item composition such as aluminum, brass, etc.). A singular
determiner (e.g. a, an, each, etc.) requires a singular noun (H14) but a
"universal" determiner (e.g. some, the, etc.) may be used with singular or
plural nouns (H15). If there are no premodifying nouns, then hyperrule
H16 will apply. A single premodifying noun (H17) may be either singular
or a mass noun. Note that rule H17 is nondeterministic in that there are
two hyperalternatives. The condition is satisfied if either one of these
hyperalternatives is satisfied. If the premodifying nouns are co-ordinated with and
(H18), then both nouns must be mass nouns (e.g. "the wooden and iron
door" is correct but "the forest and garden path" is not).
13. condition DETERMINER POSSESSIVE_NOUN_LIST1
NOUN_SEQUENCE POSSESSIVE_NOUN_LIST2
DENOMINAL_NOUN NOUN_HEAD
is correct in premodification :
condition DETERMINER correctly premodifies NOUN_HEAD,
condition NOUN_SEQUENCE are singular or mass nouns.
14. condition SINGULAR_DETERMINER correctly premodifies NOUN :
where NOUN is singular.
15. condition UNIVERSAL_DETERMINER
correctly premodifies NOUN_HEAD : EMPTY.
16. condition EMPTY are singular or mass nouns : EMPTY.
17. condition NOUN are singular or mass nouns :
where NOUN is singular; where NOUN is a mass noun.
18. condition NOUN1 and NOUN2 are singular or mass nouns :
where NOUN1 is a mass noun, where NOUN2 is a mass noun.
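The premodification checks of H13-H18 reduce to two independent tests, which the following Python sketch spells out. The determiner sets are taken from metarules 36-38 (ignoring NOT_OPTION), while the noun sets and all function names are assumptions for illustration; possessive lists and denominal nouns are not modelled.

```python
UNIVERSAL_DETERMINERS = {"the", "some", "any", "my", "your", "his", "her",
                         "its", "our", "their"}
SINGULAR_DETERMINERS = {"either", "neither", "another", "a", "an", "each", "every"}

# Assumed sample lexicon for the "where" predicates.
SINGULAR_NOUNS = {"hook", "door", "path", "forest", "garden"}
MASS_NOUNS = {"iron", "wood", "aluminum", "brass"}


def determiner_ok(determiner, noun_head):
    """H14-H15: a universal determiner always fits; a singular one needs one singular noun."""
    if determiner in UNIVERSAL_DETERMINERS:
        return True
    return (determiner in SINGULAR_DETERMINERS
            and len(noun_head) == 1
            and noun_head[0] in SINGULAR_NOUNS)


def noun_sequence_ok(noun_sequence):
    """H16-H18: empty is fine; one noun is singular or mass; two nouns must both be mass."""
    if not noun_sequence:
        return True
    if len(noun_sequence) == 1:
        noun = noun_sequence[0]
        return noun in SINGULAR_NOUNS or noun in MASS_NOUNS
    return all(noun in MASS_NOUNS for noun in noun_sequence)


def correct_in_premodification(determiner, noun_sequence, noun_head):
    """H13: both conditions must hold."""
    return determiner_ok(determiner, noun_head) and noun_sequence_ok(noun_sequence)
```

The two alternatives of H17 appear here as a plain "or", which is how the nondeterminism of that hyperrule would typically be resolved in a checker.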
Hyperrules H19-H27 define the conditions for postmodification. Any
postmodification of the subject must be in the form of a relative clause
which begins with who. This type of relative clause requires a human noun
and the verb of the relative clause must agree with the modified noun. For
example, in "The men who fix computers were very helpful," the noun men
must be a human noun since it is modified by who and the verb fix must
be compatible with men. This type of relative clause may be considered as
describing two separate sentences: "The men fix computers." and "The men
were very helpful." In the hyperrules which verify these conditions, the
sub-sentence described by the relative clause is formed and then checked
for correctness using hyperrule H2 recursively.
19. condition DETERMINER NOUN_HEAD
POST_NOUN_MODIFICATION
is correct in postmodification :
condition POST_NOUN_MODIFICATION
correctly postmodifies DETERMINER NOUN_HEAD.
20. condition EMPTY correctly postmodifies
DETERMINER NOUN_HEAD : EMPTY.
21. condition RELATIVE_CLAUSE correctly postmodifies
DETERMINER NOUN_HEAD :
condition NOUN_HEAD is a human noun,
condition the verb of RELATIVE_CLAUSE
agrees with DETERMINER NOUN_HEAD.
22. condition NOUN is a human noun : where NOUN is a human noun.
23. condition NOUN1 and NOUN2 is a human noun :
where NOUN1 is a human noun,
where NOUN2 is a human noun.
24. condition NOUN_LIST COMMA_OPTION and NOUN
is a human noun :
condition NOUN_LIST is a human noun,
where NOUN is a human noun.
25. condition NOUN1 COMMA NOUN2 is a human noun :
where NOUN1 is a human noun,
where NOUN2 is a human noun.
26. condition NOUN_LIST COMMA NOUN is a human noun :
condition NOUN_LIST is a human noun,
where NOUN is a human noun.
27. condition the verb of
who PREDICATE_SEQUENCE OBJECT_SEQUENCE
agrees with DETERMINER NOUN_HEAD :
condition DETERMINER NOUN_HEAD
PREDICATE_SEQUENCE OBJECT_SEQUENCE PERIOD
is a well-formed sentence.
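The key move in H27 is to turn the relative clause into a sentence of its own and hand it back to the sentence-level check. A minimal Python sketch of that idea follows; the function names, the HUMAN_NOUNS set, and the is_well_formed callback (which stands in for a recursive application of H2) are all assumptions for illustration.

```python
HUMAN_NOUNS = {"men", "students", "professors"}   # assumed sample lexicon


def is_human_noun_head(noun_head):
    """H22-H26: every noun in the (possibly co-ordinated) head must be a human noun."""
    nouns = [word for word in noun_head if word not in ("and", ",")]
    return bool(nouns) and all(noun in HUMAN_NOUNS for noun in nouns)


def sub_sentence(determiner, noun_head, relative_clause):
    """H27: replace 'who' by DETERMINER NOUN_HEAD and append PERIOD."""
    if not relative_clause or relative_clause[0] != "who":
        raise ValueError("relative clause must begin with 'who'")
    return [determiner] + noun_head + relative_clause[1:] + ["."]


def correctly_postmodifies(relative_clause, determiner, noun_head, is_well_formed):
    """H20-H21: empty postmodification is fine; otherwise check H22-H27."""
    if not relative_clause:
        return True
    return is_human_noun_head(noun_head) and is_well_formed(
        sub_sentence(determiner, noun_head, relative_clause)
    )
```

For "the men who fix computers", sub_sentence produces the tokens of "the men fix computers .", which is exactly the first of the two sentences the relative clause is said to describe.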
The third condition that the English sentences defined by our
grammar must satisfy is that the predicate (verb) and objects should agree.
The type of verb must correspond to the number of objects in the sentence:
if the verb is intransitive, then no objects are allowed except for adverbs
(H28); if the verb is transitive, then a direct object is required (H29); and if
the verb is ditransitive, then both a direct and an indirect object are
required (H30).
28. condition OBJECT_SEQUENCE_ADVERB
shows object-predicate agreement with VERB :
where VERB is intransitive.
29. condition DIRECT_OBJECT OBJECT_SEQUENCE_ADVERB
shows object-predicate agreement with VERB :
where VERB is transitive.
30. condition INDIRECT_OBJECT DIRECT_OBJECT
OBJECT_SEQUENCE_ADVERB
shows object-predicate agreement with VERB :
where VERB is ditransitive.
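H28-H30 amount to matching the number of objects against the verb's transitivity class. A compact Python sketch, assuming a hypothetical VERB_TYPES table and objects passed as a list with any trailing adverb already removed:

```python
# Assumed sample lexicon: the transitivity classes each verb belongs to.
VERB_TYPES = {
    "sleep": {"intransitive"},
    "attend": {"transitive"},
    "gave": {"transitive", "ditransitive"},
}

# Number of objects required by each of H28, H29 and H30.
REQUIRED_TYPE = {0: "intransitive", 1: "transitive", 2: "ditransitive"}


def object_predicate_agreement(objects, verb):
    """Return True if the verb's class matches the number of objects present."""
    required = REQUIRED_TYPE.get(len(objects))
    return required is not None and required in VERB_TYPES.get(verb, set())
```

A verb may belong to more than one class, which is why the table stores sets: gave is accepted with one object or with two, but not with none.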
The fourth condition for a well-formed sentence is that the auxiliary
adverbs and main verb are in correct grammatical sequence. If there are no
auxiliary verbs (H31), then the auxiliary sequence is correct according to
the meta-grammar. If auxiliary verbs are present then the verb must be a
past participle (H32).
31. condition AUXILIARY_ADVERB_OPTION VERB 
is a well-formed predicate : EMPTY. 
32. condition AUXILIARY_ADVERB_OPTION
ACTIVE_OR_PASSIVE_AUXILIARY VERB
is a well-formed predicate :
where VERB is a past participle.
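The two predicate rules can be sketched as a single Python function. The auxiliary verb set is taken from metarules 22-23; the PAST_PARTICIPLES set and the function name are assumptions for illustration.

```python
AUXILIARY_VERBS = {"am", "is", "were", "was", "have", "had", "has"}
PAST_PARTICIPLES = {"given", "fixed", "abandoned"}   # assumed sample lexicon


def well_formed_predicate(auxiliary_sequence, verb):
    """H31: no auxiliary verb, nothing more to check; H32: verb must be a past participle."""
    has_auxiliary = any(word in AUXILIARY_VERBS for word in auxiliary_sequence)
    if not has_auxiliary:
        return True                     # H31 derives EMPTY unconditionally
    return verb in PAST_PARTICIPLES     # H32
```

An auxiliary sequence consisting only of an adverb, such as ["also"], falls under H31 because it contains no auxiliary verb.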
The fifth and final condition which must be satisfied is for the object
of the sentence to be well-formed. A simple object (H33) must satisfy the
same conditions as a subject and hyperrules H10-H12 will apply
recursively. An object sequence (H34) is well-formed if the indirect and
direct objects are well-formed.
33. condition OBJECT OBJECT_SEQUENCE_ADVERB
is a well-formed object :
condition OBJECT is a well-formed subject.
34. condition INDIRECT_OBJECT DIRECT_OBJECT
OBJECT_SEQUENCE_ADVERB is a well-formed object :
condition INDIRECT_OBJECT is a well-formed object,
condition DIRECT_OBJECT is a well-formed object.
It can be seen that the above set of hyperrules is relatively concise and the 
conditions being described are readily understandable. We claim that the 
other goals of consistency, precision (for our subset of English), and 
unambiguity are also achieved. In the next section it will be shown how 
this specification may be implemented automatically. 
5. TWO-LEVEL PARSING
Our method of natural language specification has two levels:
metarules for context-free syntax and hyperrules for context-sensitive
syntax. Similarly our method of parsing a two-level grammar requires a
parser for metarules and a parser for hyperrules. Since the metarules are
context-free, any of the well-known context-free parsing algorithms (e.g.
see [17]) may be used to derive a context-free structure of some input
sentence. Context-free parsing will eliminate all sentences which do not
satisfy the context-free syntax of the language but is unable to eliminate
structures which are correct in the context-free sense but incorrect with 
respect to context-sensitive syntax. The hyperrule parser will further reduce 
the set of sentences which are considered to be grammatically valid by
analyzing the context-free parse tree for context-sensitive violations. 
The "parser" for the hyperrules is actually an interpreter developed by
the authors in [4] which evaluates the hyperrules in much the same way as
a programming language interpreter executes programs. The hyperrules
are interpreted sequentially in the order that conditions are enumerated in
the grammar. Interpretation proceeds by expanding the start notion and
applying the hyperrules to all of the branches of the hyperrule derivation
tree until all of the predicates are evaluated. As interpretation proceeds,
each node of the derivation tree (corresponding to a hypernotion) is
expanded by matching it with a hyperrule left-hand side. The right-hand
side of the matched hyperrule is then used to create a subtree for that
node. Each branch of the tree is evaluated from left to right in a pre-order
traversal. The English sentence is syntactically correct if and only if the
resulting terminal string derived by the hyperrule tree is the empty string.
The method of writing hyperrules to derive only the empty string
greatly simplifies the parsing process. Traditionally (e.g. [2, 10]), two-level
grammars use the hyperrules to generate the terminal strings of the
language with the metarules being used only to instantiate hyperrules. For
example, in our grammar the metanotion SENTENCE is used to generate
English sentences which are then input to the hyperrules for analysis. In
other two-level grammar styles, however, the components of the sentence
would also be generated by hyperrules. The result of hyperrules generating
terminal strings is that parsing becomes considerably more difficult and is
not accomplished without restrictions being placed on hyperrules (e.g. [15]).
Our method of interpreting hyperrules places no restrictions, therefore
allowing the tlg to be more general. The differences in writing styles are
explored further in [4].
The hyperrule interpretation algorithm is outlined below:
Procedure Evaluate (hypernotion)
1. Find the hyperrule to apply which has the hypernotion as its
left-hand side. This rule will be of the form:
hypernotion : hypernotion-1, hypernotion-2, ..., hypernotion-n.
2. Expand the derivation tree with hypernotion as the root of the
current subtree and the branches being hypernotion-1, hypernotion-2,
..., hypernotion-n.
3. Evaluate (hypernotion-i) for i = 1, 2, ..., n.
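The three steps of Evaluate can be sketched as a short recursive Python function. This is a simplified illustration rather than the interpreter of [4]: hypernotions are represented as tuples whose first element names the condition, rule lookup is a dictionary access instead of pattern matching on metanotions, and the rule functions h3, h9 and where_plural together with the PLURAL_VERBS set are assumptions chosen to mimic H3, H9 and a library predicate.

```python
def evaluate(hypernotion, rules, trace, depth=0):
    """Return True if the hypernotion derives the empty string."""
    name, *arguments = hypernotion
    trace.append("  " * depth + name)      # record the pre-order traversal
    rule = rules.get(name)                 # step 1: find the hyperrule
    if rule is None:
        return False                       # no rule applies: blind alley
    for branches in rule(*arguments):      # step 2: expand one alternative
        # Step 3: evaluate each branch from left to right.
        if all(evaluate(b, rules, trace, depth + 1) for b in branches):
            return True
    return False


PLURAL_VERBS = {"gave", "attend"}          # assumed sample lexicon


def h3(subject, verb):
    # Mimics H3: agreement is delegated to the person-and-number condition.
    return [[("agrees in person and number", subject, verb)]]


def h9(subject, verb):
    # Mimics H9 only: a compound subject requires a plural verb.
    return [[("where plural", verb)]] if "and" in subject else []


def where_plural(verb):
    # A satisfied predicate has one alternative with no branches (EMPTY).
    return [[]] if verb in PLURAL_VERBS else []


RULES = {
    "shows subject-predicate agreement": h3,
    "agrees in person and number": h9,
    "where plural": where_plural,
}

subject = ("Professor", "White", "and", "the", "students")
trace = []
ok = evaluate(("shows subject-predicate agreement", subject, "gave"), RULES, trace)
```

An alternative with an empty list of branches plays the role of an EMPTY right-hand side, and a rule that returns no alternatives at all models a blind alley. Trying the alternatives in order is one simple way to handle nondeterministic hyperrules.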
To explain how this interpreter works, consider the example sentence
"Professor White and the students who attend the university gave Mrs.
White a present today." This sentence is seen to be correct with respect to
context-free syntax and its structural representation is shown in Figure 1.
The specific metarules applied are numbered. We will now apply the
hyperrules to this sentence to show how the context-sensitive conditions
are verified. For notational convenience we have italicized the protonotions
which correspond to metanotions in the hyperrules. Since the tree will be
traversed from left to right we will label the branches (i.e. nodes) using a
number (0-8) to denote the level in the tree and a letter (a-e) to indicate
left to right ordering.
The root of the hyperrule derivation tree is the sentence itself.
Hyperrule H1 will be applied to initiate the verification process. This will
be followed by H2 which divides the derivation tree into five separate
branches, one for each condition which the sentence must satisfy.
0 • Professor White and the students who attend the university gave Mrs.
White a present today.
1 • condition Professor White and the students who attend the university
gave Mrs. White a present today, is a well-formed sentence
2a • condition Professor White and the students who attend the university
shows subject-predicate agreement with gave
2b • condition Professor White and the students who attend the university is
a well-formed subject
2c • condition Mrs. White a present today shows object-predicate agreement
with gave
2d • condition gave is a well-formed predicate
2e • condition Mrs. White a present today is a well-formed object
To expand branch 2a and check the first condition, hyperrule H3 (no
auxiliary verbs) is applied. Since the subject is compound, rule H9 will be
applied, requiring the verb to be plural. The "library" predicate will verify
the plurality of gave.
2a • condition Professor White and the students who attend the university
shows subject-predicate agreement with gave 
3a • condition Professor White and the students who attend the university 
agrees in person and number with gave 
4a • where gave is plural 
5a • 
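The agreement check on branch 2a can be sketched in Python as below. The lexicon VERB_NUMBER and both function names are hypothetical stand-ins for the "library" predicates, and detecting a compound subject by the substring " and " is a deliberate simplification; this is an illustration of the idea, not the hyperrules H3 and H9 themselves.

```python
# Hypothetical lexicon standing in for the "library" predicates; the
# entries are illustrative and not taken from the paper.
VERB_NUMBER = {
    "gave": {"singular", "plural"},  # past tense is unmarked for number
    "gives": {"singular"},
}


def where_is_plural(verb):
    """Library-style predicate: can `verb` take a plural subject?"""
    return "plural" in VERB_NUMBER.get(verb, set())


def compound_subject_agrees(subject, verb):
    """Check agreement for a compound subject, which needs a plural verb."""
    if " and " not in subject:
        raise ValueError("this sketch only covers compound subjects")
    return where_is_plural(verb)


subject = "Professor White and the students who attend the university"
agrees_with_gave = compound_subject_agrees(subject, "gave")
agrees_with_gives = compound_subject_agrees(subject, "gives")
```

Under these assumptions the compound subject is accepted with gave and rejected with gives.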
Hyperrule H12 will be applied to expand branch 2b and decompose
the compound subject into its components. Hyperrules H10 and H11 will
then analyze each of the two respective sub-subjects for well-formedness.
2b • condition Professor White and the students who attend the university
is a well-formed subject
3b • condition Professor White is a well-formed subject
4b •
3c • condition the students who attend the university is a well-formed
subject
4c • condition the students is correct in premodification
4d • condition the students who attend the university is correct in
postmodification
Proceeding to construct the tree in a left-to-right manner, branch 4c is
expanded next using hyperrule H13. Since the determiner is universal and
there is no premodifying noun sequence, hyperrules H15 and H16 complete
this subtree.
4c • condition the students is correct in premodification
5b • condition the correctly premodifies students
6a •
5c • condition EMPTY are singular or mass nouns
6b •
The expansion of branch 4d is one of the more interesting aspects of 
the context-sensitive analysis since it involves a relative clause. The 
analysis is performed by hyperrules H19, H21, H22 and H27. Note that
rule H27 rearranges the relative clause into a new sentence and recursively
calls hyperrule H2 to analyze the new sentence.
4d • condition the students who attend the university is correct in
postmodification
5d • condition who attend the university correctly postmodifies the students
6c • condition students is a human noun
7a • where students is a human noun
8a •
6d • condition the verb of who attend the university agrees with the
students
7b • condition the students attend the university, is a well-formed sentence
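The rearrangement performed on the way from branch 6d to branch 7b can be illustrated with a small Python sketch. It assumes the relative pronoun is the subject of its clause, so that dropping it and prefixing the head noun phrase yields a declarative sentence; the function name, the RELATIVE_PRONOUNS tuple and the trailing period are assumptions of the sketch and are not taken from hyperrule H27.

```python
# Assumed set of subject relative pronouns; only "who" occurs in the example.
RELATIVE_PRONOUNS = ("who",)


def rearrange_relative_clause(head, clause):
    """Turn a head noun phrase plus relative clause into a new sentence."""
    words = clause.split()
    if not words or words[0] not in RELATIVE_PRONOUNS:
        raise ValueError(f"not a subject relative clause: {clause!r}")
    # Drop the relative pronoun and let the head act as the subject.
    return f"{head} {' '.join(words[1:])}."


new_sentence = rearrange_relative_clause(
    "the students", "who attend the university"
)
```

The resulting string is what a recursive well-formedness check would then receive; because the verb now stands next to its logical subject, the same subject-predicate agreement conditions apply to it as to any other sentence.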
Instead of expanding branch 7b further, we will resume our example
at branch 2c to verify the condition that the original sentence must have
object-predicate agreement. Since the object sequence contains an indirect
object, direct object and an adverb, hyperrule H30 will be applied next and
since the verb gave is ditransitive, object-predicate agreement will be
satisfied.
2c • condition Mrs. White a present today shows object-predicate
agreement with gave
3d • where gave is ditransitive
4e •
Returning to the top-level conditions, we next verify the well-formedness
of the verb gave. Since there are no auxiliary verbs, hyperrule H31 is
satisfied.
2d • condition gave is a well-formed predicate 
3e • 
The final condition that the sentence must satisfy is well-formedness of the
object. Since the object is a sequence, rule H34 will be applied to branch 2e
to decompose the object sequence and analyze the indirect and direct
objects individually by rule H33. Rule H33 calls rules H10-H12 recursively.
Since Mrs. White is a named subject, hyperrule H10 is satisfied for the
indirect object. By applying hyperrules H11, H13, H14, H16, H19 and H20,
the direct object a present will also be verified as a well-formed object. The
analysis is now complete and the sentence has been determined to be
correct through the process of our two-level grammar interpretation
method.
6. CONCLUSIONS 
We have shown that two-level grammars may be used very elegantly
to give a formal specification of English context-free and context-sensitive
syntax. In addition to the subset we have defined in this paper, many
other types of English declarative sentences have been formally specified
using two-level grammars [7]. There seems to be no obstacle to using tlg
specifications for any type of natural language syntactic specification.
The principal advantages of the two-level grammar metalanguage are:
1) it is very readable and may be used to give a formal description using a
structured form of natural language; 2) it is formal with many well-known
mathematical properties; and 3) it is directly implementable by
interpretation. The significance of the latter fact is that once we have
written a two-level grammar for natural language syntax, we can derive a
parser automatically without writing any additional specialized computer
programs. The combination of readability and implementability is unique
in grammar theory for natural languages.
To give a complete specification of natural language, semantics and
knowledge representation must be specified in addition to syntax. Our
future goals are the investigation of two-level grammar for semantic
specification. Because of the ease with which two-level grammars may
express logic [6] and their Turing computability [12], we expect that tlgs
will also be very suitable for these goals.
Figure 1. Meta-Grammar Derivation Tree.
(Tree diagram not reproduced: context-free derivation tree of the example
sentence "Professor White and the students who attend the university gave
Mrs. White a present today." The parenthesized numbers on the branches
identify the metarules applied.)

REFERENCES 

[1] Chomsky, N. Syntactic Structures. Mouton Publishers, The Hague,
Netherlands, 1957.

[2] Cleaveland, J. C. and Uzgalis, R. C. Grammars for Programming
Languages. Elsevier North-Holland, New York, 1977.

[3] Culicover, P. W. Syntax. 2nd ed. Academic Press, New York, 1982.

[4] Edupuganty, B. and Bryant, B. R. "Two-Level Grammars for
Automatic Interpretation." Proc. 1985 ACM Annual Conference,
1985, pp. 417-423.

[5] Gazdar, G. and Pullum, G. K. Generalized Phrase Structure
Grammar: A Theoretical Synopsis. Indiana University Linguistics
Club, Indiana University, Bloomington, Ind., 1982.

[6] Hesse, W. "A Correspondence Between W-Grammars and Formal
Systems of Logic and Its Application to Formal Language
Description." Comput. Linguist. Comput. Lang. 13 (1979), 19-30.

[7] Johnson, D. Using Two-Level Grammars to Describe the Syntax of
English. M. S. Thesis, Department of Computer and Information
Sciences, The University of Alabama at Birmingham, 1984.

[8] Johnson, D. and Bryant, B. R. "Using Two-Level Grammars to
Describe the Syntax of English." Papers on Computational and
Cognitive Science, ed. E. Battistella. Indiana University Linguistics
Club, Bloomington, Ind., Aug. 1984, pp. 61-86.

[9] Johnson, D. and Bryant, B. R. "Formal Syntax Methods for Natural
Language." Inf. Process. Lett. 19, 3 (Oct. 1984), 135-143.

[10] Pagan, F. G. Formal Specification of Programming Languages: A
Panoramic Primer. Prentice-Hall, Englewood Cliffs, N. J., 1981.

[11] Quirk, R. et al. A Grammar of Contemporary English. Longman,
White Plains, N. Y., 1972.

[12] Sintzoff, M. "Existence of van Wijngaarden's Syntax for Every
Recursively Enumerable Set." Ann. Soc. Sci. Bruxelles 2 (1967), 115-
118.

[13] Stageberg, N. C. An Introductory English Grammar. 4th ed. Holt,
Rinehart and Winston, New York, 1981.

[14] Webster's Third New International Dictionary, Unabridged. The Great
Library of the English Language. Merriam-Webster, Springfield,
Mass., 1981.

[15] Wegner, L. M. "On Parsing Two-Level Grammars." Acta Inf. 14
(1980), 175-193.

[16] van Wijngaarden, A. "Orthogonal Design and Description of a Formal
Language." Technical Report MR 76, Mathematisch Centrum,
Amsterdam, 1965.

[17] Winograd, T. Natural Language as a Cognitive Process. Volume 1:
Syntax. Addison-Wesley, Reading, Mass., 1983.

[18] Woods, W. A. "Transition Network Grammar for Natural Language
Analysis." Commun. ACM 13 (1970), 591-602.

[19] Zandvoort, R. W. A Handbook of English Grammar. Prentice-Hall,
Englewood Cliffs, N. J., 1965.
