CONJUNCTIONS AND MODULARITY IN LANGUAGE ANALYSIS PROCEDURES 
Ralph Grishman 
Department of Computer Science 
Courant Institute of Mathematical Sciences 
New York University 
New York, New York, U. S. A. 
Summary 
The further enrichment of natural 
language systems depends in part on 
finding ways of "factoring" the effects 
of various linguistic phenomena, so that 
these systems can be partitioned into 
modules of comprehensible size and 
structure. Coordinate conjunction has a 
substantial impact on all aspects of 
syntactic analysis -- constituent struc- 
ture, grammatical constraints, and 
transformations. If the rules of syn- 
tactic analysis were directly expanded 
to accommodate conjunction, their size 
would increase severalfold. We describe 
below the mechanisms we have used to lo- 
calize the effect of conjunction in our 
natural language analyzer, so that most 
of the rules of our grammar need not ex- 
plicitly take conjunction into account. 
Introduction 
Progress in computational linguis- 
tics depends in part on identifying ways 
of decomposing complex components of the 
analysis procedure into more elementary 
constituents corresponding to separable 
or nearly separable aspects of linguis- 
tic structure. If this "factoring" is 
successful, the constituents will in sum 
be substantially simpler than the compo- 
nent they replace, thus clarifying the 
linguistic theory represented by the an- 
alysis procedure and paving the way for 
further enrichment of the language sys- 
tem. 
A familiar example of such factor- 
ing began with early context-free natur- 
al language parsers. Such parsers tried 
to use a context-free grammar to des- 
cribe both the constituent structure of 
sentences and the most basic grammatical 
constraints (e.g., number agreement 
between subject and verb and within noun 
phrases, subcategorization constraints). 
Because these grammatical constraints 
have a multiplicative rather than addi- 
tive effect on the size of the grammar, 
this approach rapidly becomes unwieldy. 
In its place arose two-component systems 
with a separate procedural restriction 
component for expressing such constra- 
ints. 
We have devoted considerable effort 
to factoring out the effects of coordi- 
nate conjunction on language analysis. 
Conjunction greatly increases the number 
of different possible structures which 
the components of syntactic and semantic 
analysis must be able to process. If 
each component were simply expanded to 
accommodate all these additional struc- 
tures, the resulting system would be 
huge and the essential function of the 
components greatly obscured. We have 
sought instead to isolate, as much as 
possible, the effects of conjunction 
within separate modules which modify the 
operation of the parser and restructure 
the result of the parse. 
Our System in Brief 
Over the past 15 years, members of 
the Linguistic String Project and Com- 
puter Science Department at New York Un- 
iversity have developed a powerful set 
of tools for natural language analysis 
[1,2]. Our primary objective has been 
the automated retrieval of information 
from natural language texts; for the 
past several years we have been applying 
these tools to the construction of sys- 
tems for gathering statistics and 
answering questions about hospital re- 
cords (initially, radiology reports; 
currently, discharge summaries). 
We have divided the task of answer- 
ing questions about such texts into two 
subtasks [3]. The first of these is the 
automatic mapping of these texts into a 
tabular data base structure called an 
information format. We term this map- 
ping procedure formatting [4]. The sec- 
ond subtask is the retrieval of informa- 
tion from this data base in response to 
natural language queries. Both subtasks 
-- the analysis of the texts and the an- 
alysis of the queries -- involve several 
stages of syntactic processing followed 
by several of semantic processing. The 
syntactic processing is similar in the 
two cases, although the semantic pro- 
cessing is quite disparate: in the for- 
matting, it involves the mapping of sen- 
tence constituents into the format; in 
the question answering, it involves a 
translation into logical form (an exten- 
sion of predicate calculus) and thence 
into a data base retrieval request. We 
shall focus in this paper on the effects 
of conjunction on the syntactic process- 
ing, although we shall also comment at 
the end on the interaction of the syn- 
tactic and semantic processing. 
Syntactic processing is done in 
two stages: parsing and transformation- 
al decomposition. Parsing is done by a 
top-down context-free parser augmented 
by grammatical constraints expressed in 
a special-purpose procedural language, 
Restriction Language [2,5]. The result- 
ing organization is similar to that of 
an augmented transition network (ATN), 
although sharper separation is mainta- 
ined between the context-free grammar 
which defines the constituent structure 
and the restrictions which implement the 
grammatical constraints. In contrast to 
most ATNs, transformational decomposi- 
tion is performed as a separate stage 
following the parse [6].* The decomposi- 
tion regularizes the sentence structure 
by such operations as converting passive 
sentences to active, expanding relative 
clauses and other noun modifiers to full 
sentential structures, etc. 
Incorporating Coordinate Conjunction 
In this section we shall briefly 
consider the effect of coordinate con- 
junction on each aspect of syntactic 
processing, and describe how we have 
added modules to our processing compo- 
nents to account for these effects. 
Constituent Structure 
Let us consider first the problem 
of the new structures introduced by con- 
junction. The allowed patterns of con- 
joinings in a sentence are quite regu- 
lar. Loosely speaking, a sequence of 
elements in the sentence tree may be 
followed by a conjunction and by some or 
all of the elements immediately preced- 
ing the conjunction. For example, if 
the top-level sentence structure is sub- 
ject - verb - object and an "and" ap- 
pears after the object, the allowed pat- 
terns of conjoinings include subject - 
verb - object - and - subject - verb - 
object ("I drank milk and Mary ate 
cake."), subject - verb - object - and - 
verb - object ("I drank milk and ate 
cake."), and subject - verb - object - 
and - object ("I drank milk and 
seltzer."). There are certain excep- 
tions, known as gapping phenomena, in 
which one of the elements following the 
conjunction may be omitted; for exam- 
ple, subject - verb - object - and - 
subject - object ("I drank milk and Mary 
seltzer."). 
* The separation of constituent struc- 
ture, restrictions, and transformations 
is another example of the modularity we 
try to achieve in our system. See Pratt 
[7] for a discussion of the modularity 
of augmented context-free analyzers. 
We could extend the context-free 
component of our surface grammar to ac- 
count for these patterns. For example, 
in place of the production 
S -> SUBJ VERB OBJ 
we would have the set of productions 
S -> SUBJ CA1 VERB CA2 OBJ CA3 
CA1 -> SUBJ CA1 | 
       null 
CA2 -> SUBJ CA1 VERB CA2 | 
       VERB CA2 | 
       null 
CA3 -> SUBJ CA1 VERB CA2 OBJ CA3 | 
       VERB CA2 OBJ CA3 | 
       OBJ CA3 | 
       null 
(this does not include gapping). The 
trouble with coordinate conjunctions is 
that they can occur almost anywhere in 
the structure of a sentence. Thus the 
same changes which we made above to the 
definition of S would have to be made to 
all (or at least many) of the produc- 
tions in the grammar. Clearly, such an 
extension to the grammar could increase 
its size by perhaps an order of magni- 
tude. 
One alternative is to automatically 
generate the additional elements and 
productions needed to account for con- 
junction as required during the parsing 
process. When a conjunction is encoun- 
tered in the sentence, the normal pars- 
ing procedure is interrupted, a special 
conjunction node is inserted in the 
parse tree (such as the CAn nodes 
above), and the appropriate definition 
is generated for this conjunction node. 
This definition allows for all the al- 
ternative conjoined element sequences, 
like the definitions of the CAn shown 
above. Conjoinings not fitting the 
basic pattern, such as gappings, are 
still included explicitly in the gram- 
mar. An interrupt mechanism of this 
sort is part of the Linguistic String 
Project parser [1]. A similar mechanism 
is included in Woods' augmented transi- 
tion network parser [8] and a number of 
other systems. 
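The alternatives that such a generated definition must allow can be sketched in a few lines of Python. This is an illustrative sketch only: the function name conjunction_options and the representation of a production as a list of element names are assumptions, not part of the Linguistic String Project parser. The sketch omits the nested conjunction nodes and the null option of the CAn definitions, and it does not cover gapping.

```python
def conjunction_options(elements, position):
    """Return the element sequences that may follow a conjunction.

    `elements` is the right-hand side of a production, and `position` is
    the number of elements already matched when the conjunction was
    encountered. Each option repeats some or all of the elements
    immediately preceding the conjunction. (Hypothetical representation,
    chosen for illustration.)
    """
    preceding = elements[:position]
    # Every non-empty suffix of the preceding elements, longest first.
    return [preceding[start:] for start in range(len(preceding))]


# A conjunction met after the object of SUBJ VERB OBJ:
for option in conjunction_options(["SUBJ", "VERB", "OBJ"], 3):
    print(" ".join(option))
```

For a conjunction after the object, the three options correspond to the patterns illustrated earlier by "I drank milk and Mary ate cake.", "I drank milk and ate cake.", and "I drank milk and seltzer."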
Restrictions 
The restrictions enforce grammati- 
cal constraints by locating and testing 
constituents of the parse tree. One of 
the simpler restrictions in the Linguis- 
tic String Project grammar is WSELI, 
verb-object selection for noun objects. 
Verbs may be marked (in the dictionary) 
as excluding certain classes of noun ob- 
jects; WSELI verifies that the object 
is not a member of one of these classes. 
For instance, the verb "eat" is coded as 
excluding objects of the class NSENTI, 
which includes such words as "fact", 
"knowledge", and "thought."* The sen- 
tence "John ate his thought." would 
therefore fail WSELI and be marked as 
ungrammatical by the parser. 
Explicitly modifying each restric- 
tion to account for possible conjoined 
structures would expand that component 
severalfold. Most restrictions, 
however, apply distributively to conjo- 
ined structures -- a constraint is sa- 
tisfied if it is satisfied separately by 
each conjunct. For example, when the 
object is conjoined (verb noun1 and 
noun2), verb-object selection must be sa- 
tisfied both between verb and noun1 and 
between verb and noun2. Thus in "John 
ate meat and potatoes.", WSELI must sep- 
arately check selection between "ate" 
and "meat" and between "ate" and "pota- 
toes". This constraint can exclude in- 
correct analyses for some conjoined sen- 
tences. For instance, in "John ate his 
sandwich and thought about Mary.", it 
excludes the analysis where John ate his 
thought about Mary. 
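A minimal Python sketch of a distributively applied selection check follows. The dictionary contents, the table names, and the two function names are assumptions made for illustration; only the class name NSENTI and the example words come from the text above.

```python
# Hypothetical dictionary: classes of noun objects that each verb excludes.
EXCLUDED_OBJECT_CLASSES = {"ate": {"NSENTI"}}

# Hypothetical noun classification.
NOUN_CLASSES = {
    "fact": {"NSENTI"},
    "knowledge": {"NSENTI"},
    "thought": {"NSENTI"},
    "meat": set(),
    "potatoes": set(),
    "sandwich": set(),
}


def selection_ok(verb, noun):
    """True if `noun` belongs to no class excluded by `verb`."""
    excluded = EXCLUDED_OBJECT_CLASSES.get(verb, set())
    return not (excluded & NOUN_CLASSES.get(noun, set()))


def selection_ok_distributive(verb, conjuncts):
    """The constraint holds only if it holds separately for each conjunct."""
    return all(selection_ok(verb, noun) for noun in conjuncts)


print(selection_ok_distributive("ate", ["meat", "potatoes"]))
print(selection_ok_distributive("ate", ["sandwich", "thought"]))
```

The first call accepts "ate meat and potatoes"; the second rejects the analysis in which "thought" is a conjoined object of "ate".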
Our implementation takes advantage 
of the fact that most restrictions apply 
distributively. The restrictions are 
stated in terms of a set of grammatical 
routines which locate constituents of 
the parse tree; for example, the CORE 
routine locates the head noun of a noun 
phrase. In a conjoined context, these 
routines are in effect multi-valued; in 
"John ate meat and potatoes.", the CORE 
OF THE OBJECT has two values, "meat" and 
"potatoes". We achieve this effect 
through a non-deterministic programming 
mechanism which is invoked by the rou- 
tines when a conjoined structure is en- 
countered [2,9]. This mechanism auto- 
matically reexecutes the remainder of 
the restriction for each value of the 
routine (each conjunct). In this way, 
the effect of conjunction is largely 
isolated within these grammatical rou- 
tines. Restrictions which do not dis- 
tribute (such as number agreement) must 
still be explicitly modified for con- 
junction, but these represent a rela- 
tively small fraction of the grammar. 
* NSENTI is one of several noun classes 
defined in the Linguistic String Project 
grammar in terms of the types of senten- 
tial right modifiers they can take (such 
as "the fact that John is here"). 
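The idea of a multi-valued routine whose caller is re-executed for each value can be approximated in Python with a generator. This is a sketch under stated assumptions, not the Restriction Language mechanism itself: the dictionary tree representation and the names core and apply_restriction are hypothetical.

```python
def core(phrase):
    """Yield the head noun of a noun phrase, once per conjunct.

    A phrase is either {"head": word} or {"conjuncts": [phrase, ...]}
    (hypothetical representation). In a conjoined context the routine
    is in effect multi-valued.
    """
    if "conjuncts" in phrase:
        for conjunct in phrase["conjuncts"]:
            yield from core(conjunct)
    else:
        yield phrase["head"]


def apply_restriction(locate, remainder, node):
    """Re-execute `remainder` for each value that `locate` produces.

    The remainder of the restriction is written as if `locate` returned
    a single constituent; conjunction is handled here, not in the
    restriction itself.
    """
    return all(remainder(value) for value in locate(node))


conjoined_object = {"conjuncts": [{"head": "meat"}, {"head": "potatoes"}]}
checked = []
apply_restriction(core, lambda noun: checked.append(noun) or True, conjoined_object)
print(checked)
```

The restriction body passed as remainder never mentions conjunction, which is the sense in which the effect is isolated within the locating routines.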
Transformational decomposition 
The transformations regularize the 
parse by incrementally restructuring the 
parse tree, and are therefore almost all 
affected by the possible presence of 
conjunctions in the portion of the tree 
they manipulate. Most of the transfor- 
mations, however, only rearrange ele- 
ments within a single sentential struc- 
ture or noun phrase. We therefore chose 
to expand each conjoined structure into 
conjoined complete sentential structures 
or noun phrases at the beginning of 
transformational decomposition (for ex- 
ample, "John baked cookies and made 
tea." would be expanded to "John baked 
cookies and John made tea."); in this way 
most of the transformations are unaf- 
fected by the presence of conjunctions. 
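For the simple case of conjoined verb phrases sharing one subject, the expansion step can be sketched as follows. The tuple representation and the function name are assumptions for illustration; the actual expansion procedure operates on parse trees and covers many more configurations.

```python
def expand_conjoined_verb_phrases(subject, verb_phrases):
    """Copy the shared subject into each conjoined verb phrase.

    `verb_phrases` is a list of (verb, object) pairs; the result is a
    list of complete (subject, verb, object) structures. (Hypothetical
    flat representation of a sentential structure.)
    """
    return [(subject, verb, obj) for verb, obj in verb_phrases]


# "John baked cookies and made tea."
clauses = expand_conjoined_verb_phrases("John", [("baked", "cookies"), ("made", "tea")])
print(clauses)
```

Each resulting structure is a complete clause, so a later transformation can process it without checking for conjunctions.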
The rules for determining quantifi- 
cational structure, however, must take 
account of the copying which occurs when 
expanding conjoined structures (for ex- 
ample, "Some people speak English and 
understand Japanese." is not synonymous 
with "Some people speak English and some 
people understand Japanese."). In sim- 
plest terms, quantifiers derived from 
noun phrases which are copied during 
conjunction expansion (such as "some pe- 
ople" in the last example) must be as- 
signed wider scope than the logical con- 
nective derived from the conjunction. 
We do this by assigning a unique index 
to each noun phrase in the parse tree, 
copying the index along with the noun 
phrase in the transformations, and 
checking these indices during the scope 
analysis which is a part of the transla- 
tion to logical form. Similar account 
must be taken of copied determiners and 
quantifiers in conjoined noun phrases 
(because, for example, "ten colleges and 
universities" is not necessarily synony- 
mous with "ten colleges and ten univer- 
sities"). 
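The role of the noun phrase indices in scope analysis can be illustrated with a small Python sketch. Everything concrete here is an assumption: the NounPhrase class, the bracketed string notation for logical form, and the restriction to the two cases where the subject is either copied into every conjunct or distinct in each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    index: int        # unique index assigned in the parse tree
    quantifier: str
    noun: str


def to_logical_form(conjuncts):
    """Translate a list of (NounPhrase, predicate) conjuncts.

    If every conjunct carries the same index, the noun phrase was copied
    during conjunction expansion, so its quantifier is given wider scope
    than the connective derived from the conjunction.
    """
    indices = {np.index for np, _ in conjuncts}
    if len(conjuncts) > 1 and len(indices) == 1:
        np = conjuncts[0][0]
        var = f"x{np.index}"
        body = " & ".join(f"{pred}({var})" for _, pred in conjuncts)
        return f"{np.quantifier} {var} [{np.noun}({var}) & ({body})]"
    parts = []
    for np, pred in conjuncts:
        var = f"x{np.index}"
        parts.append(f"{np.quantifier} {var} [{np.noun}({var}) & {pred}({var})]")
    return " & ".join(parts)


# "Some people speak English and understand Japanese." (one copied subject)
people = NounPhrase(1, "some", "people")
print(to_logical_form([(people, "speak_English"), (people, "understand_Japanese")]))

# "Some people speak English and some people understand Japanese."
others = NounPhrase(2, "some", "people")
print(to_logical_form([(people, "speak_English"), (others, "understand_Japanese")]))
```

In the first case a single quantifier governs both predicates; in the second each conjunct keeps its own quantifier, which reflects the difference in meaning noted above.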
Sentence Generation 
As part of our question-answering 
system, we generate answers by translat- 
ing from logical form (extended predi- 
cate calculus) into full English sen- 
tences [10]. There is a close parallel 
between the components for sentence ana- 
lysis and sentence generation; in par- 
ticular, the last major step in genera- 
tion is the application of a set of gen- 
erative transformations. In accordance 
with the basic symmetry between analysis 
and generation, the generative transfor- 
mations operate on trees containing con- 
junctions only of full sentential struc- 
tures and noun phrases. Conjunction re- 
duction (changing, for example, "John 
ate cake and John drank milk." to "John 
ate cake and drank milk.") is performed 
at the end of the transformational 
cycle. Most of the generative transfor- 
mations operate within a single senten- 
tial structure or noun phrase. As a re- 
sult, the generative transformations, 
like the analytic transformations, are 
for the most part not affected by the 
presence of conjoined structures. 
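Conjunction reduction for the shared-subject case can be sketched as the inverse of expansion. The sketch below is illustrative only: the function name and the flat (subject, verb, object) representation are assumptions, and subjects are compared as strings, whereas a real system would have to establish that the two noun phrases are the same constituent.

```python
def reduce_conjunction(clauses):
    """Join clauses with "and", omitting a subject that repeats the
    subject of the immediately preceding clause.

    `clauses` is a list of (subject, verb, object) tuples (hypothetical
    representation of full sentential structures).
    """
    parts = []
    previous_subject = None
    for subject, verb, obj in clauses:
        if subject == previous_subject:
            parts.append(f"{verb} {obj}")
        else:
            parts.append(f"{subject} {verb} {obj}")
        previous_subject = subject
    return " and ".join(parts) + "."


print(reduce_conjunction([("John", "ate", "cake"), ("John", "drank", "milk")]))
```

Because reduction runs after the other generative transformations, those transformations only ever see full clauses.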
Discussion 
In the preceding sections we have 
described the effect of coordinate con- 
junction on all the components of a syn- 
tactic analyzer. We have shown how we 
have been able to encapsulate the 
changes required to these components -- 
as an interrupt mechanism for our con- 
text-free parser; as a 
non-deterministic programming mechanism 
invoked by the routines used by the res- 
trictions; as a set of expansion rou- 
tines preceding transformational decom- 
position. We have thus avoided the need 
for pervasive changes which would have 
substantially enlarged, complicated, and 
obscured the original components. In 
addition, our approach has isolated and 
characterized the effect of conjunction 
in such a way that it may be carried 
forward to future systems and other gro- 
ups. 
Although modularity is generally 
regarded as a desirable objective, it is 
sometimes claimed that it imposes con- 
straints on the communication between 
modules which will ultimately lead to 
unacceptable losses of efficiency. We 
would respond that some constraints are 
necessary if a complex system is to be 
manageable and comprehensible. If the 
mode of interaction is appropriately 
chosen and sufficiently powerful (such 
as the interrupt mechanism and the 
non-deterministic programming mechanism) 
the resulting system will be both clear- 
ly structured and reasonably efficient. 
Acknowledgements 
The author wishes to acknowledge 
the primary roles played by Naomi Sager 
and Carol Raze Friedman in the design of 
the conjunction mechanisms. Carol 
Friedman developed the routines and res- 
trictions for conjunction and the con- 
junction expansion procedure. Ngo Thanh 
Nhan implemented the conjunction trans- 
formations for question analysis and the 
conjunction reduction routine for sen- 
tence generation. 
This research has been supported in 
part by Grant No. N00014-75-C-0571 from 
the Office of Naval Research; in part 
by the National Science Foundation under 
Grants MCS78-03118 from the Division of 
Mathematical and Computer Sciences and 
IST-7920788 from the Division of Infor- 
mation Science and Technology; and in 
part by National Library of Medicine 
Grant No. LM02616, awarded by the Na- 
tional Institutes of Health, DHEW. 

References 

1. N. Sager, Syntactic Analysis of Na- 
tural Language. Advances in Computers 
8, 153-188. Academic Press, New York 
(1967). 

2. R. Grishman, N. Sager, C. Raze, 
and B. Bookchin, The Linguistic String 
Parser. AFIPS Conference Proceedings 
42, 427-434. AFIPS Press, Montvale, New 
Jersey (1973). 

3. R. Grishman and L. Hirschman, 
Question Answering from Natural Language 
Medical Data Bases. Artificial 
Intelligence 11, 25-43 (1978). 

4. N. Sager, Natural Language Informa- 
tion Formatting: The Automatic Conver- 
sion of Texts to a Structured Data Base. 
Advances in Computers 17, 89-162. 
Academic Press, New York (1978). 

5. N. Sager and R. Grishman, The Res- 
triction Language for Computer Grammars 
of Natural Language. Communications of 
the ACM 18, 390-400 (1975). 

6. J. Hobbs and R. Grishman, The Au- 
tomatic Transformational Analysis of En- 
glish Sentences: An Implementation. 
Int'l J. of Computer Mathematics, 
Section A, 5, 267-283 (1976). 

7. V. Pratt, Lingol -- A Progress Re- 
port. Advance Papers Fourth Int'l Joint 
Conf. Artificial Intelligence, 422-428 
(1975). 

8. W. A. Woods, An Experimental Pars- 
ing System for Transition Network Gram- 
mars. Natural Language Processing, 
Courant Computer Science Symposium 8, 
111-154 (1973). 

9. C. Raze, A Computational Treatment 
of Coordinate Conjunctions. American Journal of 
Computational Linguistics, microfiche 52 
(1976). 

10. R. Grishman, Response Generation 
in Question-Answering Systems. Proc. 
17th Annl. Meeting Assn. Computational 
Linguistics, 99-101 (1979). 
