DEALING WITH CONJUNCTIONS 
IN A MACHINE TRANSLATION ENVIRONMENT 
Xiuming Huang 
Institute of Linguistics 
Chinese Academy of Social Sciences 
Beijing, China* 
ABSTRACT 
A set of rules, named CSDC (Conjunct Scope
Determination Constraints), is suggested for
attacking the conjunct scope problem, the major
issue in the automatic processing of conjunctions,
which has long posed great difficulty for natural
language processing systems. Grammars embodying
the CSDC are incorporated into an existing ATN
parser and are tested successfully against a wide
range of "and" conjunctive sentences of three
types, namely clausal coordination, phrasal
coordination, and gapping. Within phrasal
coordination, the structure with two NPs
coordinated by "and" has been given most attention.
It is hoped that an ATN parser capable of 
dealing with a large variety of conjunctions in an 
efficient way will finally emerge from the present 
work. 
0 INTRODUCTION 
One of the most complicated phenomena in
English is the conjunction construction. Even quite
simple noun phrases like
(1) Cats with whiskers and tails
are structurally ambiguous and would cause problems
when translated from English into, say, Chinese.
Since in Chinese all the modifiers of a noun
must precede it, two different Chinese
translations might be obtained from the above phrase:
(1a) (With whiskers and tails) de (cats) ("de" is
a particle which connects the modifiers and
the modified);
(1b) ((With whiskers) de (cats)) and (tails).
Needless to say, a machine translation system
must be able to analyse conjunction constructions
correctly, among other things, before high
quality translation can be achieved.
As is well known, ATN (Augmented Transition
Network) grammars are powerful in natural language
parsing and have been widely applied in various
NL processing systems. However, the standard ATN
grammars are rather weak in dealing with
conjunctions.
* Mailing address:
Cognitive Studies Centre
University of Essex
Colchester CO4 3SQ, England.
In (Woods 73), a special facility SYSCONJ for 
processing conjunctions was designed and imple- 
mented in the LUNAR speech question-answering sys- 
tem. It is capable of analysing reduced conjunc- 
tions impressively (eg, "John drove his car 
through and completely demolished a plate glass 
window"), but it has two drawbacks: first, for the 
processing of general types of conjunction con- 
structions, it is too costly and too inefficient; 
secondly, the method itself is highly non-deter- 
ministic and easily results in combinatorial ex- 
plosions. 
In (Blackwell 81), a WRD AND arc was propos- 
ed. The arc would take the interpreter from the 
final to the initial state of a computation, then 
analyse the second argument of a coordinated con- 
struction on a second pass through the ATN net- 
work. With this method she can deal with some 
rather complicated conjunction constructions, but 
in fact a WRD AND arc could have been added to 
nearly every state of the network, thus making the 
grammar extremely bulky. Furthermore, her system
lacks the power to resolve the ambiguities
contained in structures like (1).
In the machine translation system designed by 
(Nagao et al 82), when dealing with conjunctions, 
only the nearest two items of the same parts of 
speech were processed, while the following types 
of coordinated conjunctions were not analysed 
correctly: 
(noun + prep + noun) + and + (noun + prep + noun); 
(adj + noun) + and + noun. 
(Boguraev in press) suggested that a demon 
should be created which would be woken up when 
"and" is encountered. The demon will suspend the 
normal processing, inspect the current context 
(the local registers which hold constituents re- 
cognised at this level) and recent history, and 
use the information thus gained to construct a new 
ATN arc dynamically which seeks to recognise a 
constituent categorially similar to the one just 
completed or being currently processed. Obviously 
the demon is based on expectations, but what fol- 
lows the "and" is extremely uncertain so that it 
would be very difficult for the demon to reach a 
high efficiency. A kind of "data-driven" alternative
which may reduce the non-determinism is to
try to decide the scope of the left conjunct re- 
trospectively by recognising first the type of 
the right conjunct, rather than to predict the 
latter by knowing the category of the constituent 
to the left of the coordinator which is "just com- 
pleted or being currently processed" -- an obscure 
or even misleading specification. 
I CASSEX PACKAGE
CASSEX (Chinese Academy of Social Sciences;
University of Essex) is an ATN parser based on part
of the programs developed by Boguraev (1979), which
were designed for the automatic resolution of
linguistic ambiguities. Conjunctions, one major
source of linguistic ambiguities, were however not
taken into consideration there because, as the
author himself put it, "they were felt to be too
large a problem to be tackled along with all the
others" (Boguraev 79, 1.6).
A new set of grammars has been written, and
many modifications have been made to the grammar
interpreter, so that conjunctions can be dealt
with within the ATN framework.
II PARSING MATERIALS
The following are the example sentences
correctly parsed by the package:
Ex1. The man with the telescope and the
umbrella kicked the ball.
Ex2. The man with the telescope and the
umbrella with a handle kicked the ball.
Ex3. The man with the telescope and the
woman kicked the ball.
Ex4. The man with the telescope and the
woman with the umbrella kicked the ball.
Ex5. The man with the child and the woman
kicked the ball.
Ex6. The man with the child and the woman
with the umbrella kicked the ball.
Ex7. The man with the child and the woman
is kicking the ball.
Ex8. The man with the child and the woman
are kicking the ball.
Ex9. The man with the child and the umbrella
fell.
Ex10. The man kicked the ball and the child
threw the ball.
Ex11. The man kicked the ball and the child.
Ex12. The man kicked the child and the woman
the ball.
Ex13. The man kicked the child and threw the
ball.
Ex14. The man kicked and threw the ball.
Ex15. The man kicked and the woman threw the
ball.
III ELEMENTARY NP AND EXPANDED NP 
The term 'elementary NP' is used to indicate
a noun phrase which can be embedded in other noun
phrases but has no noun phrases embedded in it. A
noun phrase which contains other, embedded, NPs is
called 'expanded NP'. Thus, when analysing the
sentence fragment "the man with the telescope and
the woman with the umbrella", we will have four
elementary NPs ("the man", "the telescope", "the
woman" and "the umbrella") and two expanded NPs
("the man with the telescope" and "the woman with
the umbrella"). We may well have a third kind of
NP, the coordinated NP with a conjunction in it,
but it is the result of, rather than the material
for, conjunction processing, and therefore will not
receive particular attention. In the text that
follows we will use 'EL-NP' and 'EXP-NP' to
represent the two types of noun phrases, respectively.
LEFT-PART will stand for the whole fragment
to the left of the coordinator, and RIGHT-PART for
the fragment to the right of it. LEFT-WORD and
RIGHT-WORD will indicate the words that immediately
precede and follow, respectively, the coordinator.
The conjunct to the right of the coordinator will
be called RIGHT-PHRASE.
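The positional terms just defined can be made concrete with a small sketch. It is written in Python rather than in the LISP of the package, and the function name `split_on_coordinator`, the whitespace tokenisation and the restriction to a single "and" are assumptions made purely for illustration; only the four terms themselves come from the text.

```python
def split_on_coordinator(sentence, coordinator="and"):
    """Return LEFT-PART, RIGHT-PART, LEFT-WORD and RIGHT-WORD for a sentence.

    Assumption: whitespace tokenisation and exactly one coordinator.
    """
    words = sentence.rstrip(".").split()
    i = words.index(coordinator)          # raises ValueError if absent
    return {
        "LEFT-PART": words[:i],
        "RIGHT-PART": words[i + 1:],
        "LEFT-WORD": words[i - 1] if i > 0 else None,
        "RIGHT-WORD": words[i + 1] if i + 1 < len(words) else None,
    }


# Ex15: LEFT-WORD is a verb and RIGHT-PART starts a new noun phrase.
terms = split_on_coordinator("The man kicked and the woman threw the ball.")
print(terms["LEFT-WORD"], terms["RIGHT-WORD"])   # kicked the
```

RIGHT-PHRASE is deliberately left out: finding the extent of the right conjunct requires parsing, not just splitting.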
IV CSDC RULES 
Constraints for determining the grammaticality
of constructions involving coordinating
conjunctions have been suggested by linguists, among
them (Ross 67)'s CSC (Coordinate Structure
Constraint), (Schachter 77)'s CCC (Coordinate
Constituent Constraint), (Williams 78)'s
Across-the-Board (ATB) Convention, and (Gazdar 81)'s
nontransformational treatment of coordinate structures
using the concept of 'derived categories'. These
constraints are useful in the investigation of
coordination phenomena, but in order to process
coordinate structures automatically, some constraint
defined from the procedural point of view is still
required.
The following ordered rules, named CSDC (Conjunct
Scope Determination Constraints), are suggested
and embodied in the CASSEX package so as to
meet the need for automatically deciding the scope
of the conjuncts:
1. Syntactical constraint.
The syntactical constraint has two parts:
1.1 The conjuncts should be of the same
syntactical category;
1.2 The coordinated constituent should be in
conformity syntactically with the other
constituents of the sentence, eg if the coordinated
constituent is the subject, it should agree with the
finite verb in terms of person and number.
According to this constraint, Ex8 should be
analysed as follows (the representation is a tree
diagram with 'CLAUSE' as the root and centred
around the verb, with various case nodes indicating
the dependency relationships between the verb and
the other constituents):
(CLAUSE
  (TYPE DCL)
  (QUERY NIL)
  (TNS PRESENT)
  (ASPECT PROGRESSIVE)
  (MODALITY NIL)
  (NEG NIL)
  (V (KICK ((*ANI SUBJ)
            ((*PHYSOB OBJE)
             (((THIS (MAN PART)) INST) STRIK)))*
  (OBJECT ((BALL1 ...))
          (NUMBER SINGLE)
          (QUANTIFIER SG)
          (DETERMINER ((DET1 ONE))))
  (AGENT
    AND
    ((MAN ...)
     (NUMBER SINGLE)
     (QUANTIFIER SG)
     (DETERMINER ((DET1 ONE)))
     (ATTRIBUTE ((PREP (PREP WITH))
                 ((CHILD ...)
                  (NUMBER ...)
    ((WOMAN ...)
while Ex7 (and the more general case of Ex5) should
be analysed roughly as:
(AGENT
  ((MAN ...)
   (NUMBER SINGLE)
   (QUANTIFIER SG)
   (DETERMINER ((DET1 ONE)))
   (ATTRIBUTE ((PREP (PREP WITH))
               AND
               ((CHILD ...)
                (NUMBER ...) ...)
               ((WOMAN ...)
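Rule 1.2 is what separates Ex7 from Ex8: the two sentences differ only in the number of the finite verb. The following is a minimal sketch of that check in Python, not the package's own code. The `Reading` record, the function names and the "SINGLE"/"PLURAL" verb labels are hypothetical, and the sketch assumes that a coordinated subject counts as plural.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One candidate analysis of 'The man with the child and the woman ...'."""
    bracketing: str
    subject_is_coordinated: bool   # True if "and" joins two subject NPs


def subject_number(reading):
    # Assumption: a coordinated subject is plural; otherwise the number is
    # that of the single head noun ("man"), which is singular here.
    return "PLURAL" if reading.subject_is_coordinated else "SINGLE"


def agrees(reading, verb_number):
    """Rule 1.2: the subject must agree in number with the finite verb."""
    return subject_number(reading) == verb_number


READINGS = [
    Reading("(AND (man with child) (woman))", True),
    Reading("man with (AND (child) (woman))", False),
]

ex7 = [r.bracketing for r in READINGS if agrees(r, "SINGLE")]   # "is kicking"
ex8 = [r.bracketing for r in READINGS if agrees(r, "PLURAL")]   # "are kicking"
print(ex7)   # ['man with (AND (child) (woman))']
print(ex8)   # ['(AND (man with child) (woman))']
```

For Ex5, where the verb "kicked" carries no number information, both readings would survive this filter, which is why the later rules are needed.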
2. Semantic constraint.
NPs whose head noun semantic primitives are
the same should be preferred when deciding the
scope of the two conjuncts coordinated by "and".
However, if no such NPs can be found, NPs with
different head noun semantic primitives are
coordinated anyhow.
Cf (Wilks 75).
According to rule 2, Ex1 should be roughly
represented as 'The man with (AND (telescope)
(umbrella))'; Ex2, 'The man with (AND (telescope)
(umbrella with a handle))'; Ex3, '(AND (man with
telescope) (woman))'; and Ex4, '(AND (man with
telescope) (woman with umbrella))'.
3. Symmetry constraint.
When rules 1 and 2 are not enough for
deciding the scope of the conjuncts, as for Ex5 and
Ex6, this rule of preferring conjuncts with
symmetrical pre-modifiers and/or post-modifiers will be
in effect:
Ex5. ... with (AND (child) (woman)) ...
Ex6. (AND (the man with ...) (the woman with
...)) ...
4. Closeness constraint.
If none of the three rules above can help, the
NP to the left of "and" which is closest to the
coordinator should be coordinated with the NP
immediately following the coordinator:
Ex9. The man with (AND (child) (umbrella))
fell.
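The ordering of rules 2 to 4 can be illustrated by ranking the candidate left conjuncts with a lexicographic key. This is a sketch in Python under stated assumptions, not the mechanism of the package: the `PRIMITIVE` table is a hypothetical lexicon with made-up labels, the `NP` record keeps only a head noun and a post-modifier flag, and rule 1 is taken to have been applied already.

```python
from dataclasses import dataclass

# Hypothetical head-noun semantic primitives; the labels are illustrative.
PRIMITIVE = {"man": "MAN", "woman": "MAN", "child": "MAN",
             "telescope": "THING", "umbrella": "THING"}


@dataclass
class NP:
    head: str
    post_modified: bool = False   # True if the NP carries a PP post-modifier


def choose_left_conjunct(candidates, right):
    """Pick the left conjunct; `candidates` are ordered nearest-to-"and" first."""
    def rank(item):
        distance, np = item
        return (
            PRIMITIVE[np.head] == PRIMITIVE[right.head],   # rule 2: semantic
            np.post_modified == right.post_modified,       # rule 3: symmetry
            -distance,                                     # rule 4: closeness
        )
    return max(enumerate(candidates), key=rank)[1]


candidates = [NP("child"), NP("man", post_modified=True)]   # "the man with the child"
ex5 = choose_left_conjunct(candidates, NP("woman"))
ex6 = choose_left_conjunct(candidates, NP("woman", post_modified=True))
ex9 = choose_left_conjunct(candidates, NP("umbrella"))
print(ex5.head, ex6.head, ex9.head)   # child man child
```

A tuple key is a simplification: it lets the symmetry test break ties even when no candidate matches semantically, as in Ex9, where it happens to select the same NP as the Closeness constraint.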
V THE IMPLEMENTATION 
The seemingly straightforward way of dealing
with conjunctions using ATN grammars would
be to add extra WRD AND arcs to the existing
states, as (Blackwell 81) proposed. The problem with
this method is that, as (Boguraev in press) pointed
out, "generally speaking, one will need WRD AND
arcs to take the ATN interpreter from just about
every state in the network back to almost each
preceding state on the same level, thus introducing
large overheads in terms of additional arcs and
complicated tests."
Instead of adding extra WRD AND arcs to the
existing states in a standard ATN grammar, I set
up a whole set of states to describe coordination
phenomena. The first few states in the set are as
follows:
; At the moment only "and" is taken into consideration.
(CONJ/
  ((JUMP AND/)
   (EQ (GETR CONJUNCTION) 'AND))
  ...)
(AND/
  ; Try to analyse RIGHT-PART as a clause, if LEFT-PART is one.
  ((JUMP S/)
   LEFT-PART-IS-CLAUSE)
  ; This arc is for such cases as Ex15.
  ((JUMP S/)
   (AND (EQ LEFT-WORD-CAT 'VERB)
        NPSTART)
   ((SETQ BUILD-RIGHT-CLAUSE-FIRST 'T)))
  ; Try phrasal coordination.
  ; The role of **NP-STACK will be explained later.
  ((PUSH NP/) (NPSTART)
   ((SENDR SUBJNP T)
    (SETR RIGHT-PHRASE *)
    (SETR RIGHT-PHRS-SMNTC-CAT (HEAD (CAAR *)))
    (IF NMODS-CONJ THEN
        (SETQ **NP-STACK
              (REVERSE **NP-STACK))))
   (TO AND/NP/PREPARE))
  ; For cases like Ex14.
  ((JUMP S/NP)
   (EQ (GET CURRENT-WORD 'CAT)
       'VERB)
   ((SETQ BUILD-RIGHT-CLAUSE-FIRST 'T))))
(AND/NP/PREPARE
  ((JUMP AND/NP) T
   (SETQ **TOP-OF-NP-STACK (POP **NP-STACK))))
(AND/NP
  ((JUMP AND/NP/MATCH) T
   ((SETR LEFT-PHRASE (CAR (GETR **TOP-OF-NP-STACK)))
    (SETR LEFT-PHRASE-SYN (CAR (REVERSE
                                (GETR **TOP-OF-NP-STACK))))
    (SETR LEFT-PHRS-SMNTC-CAT (HEAD (CAAR
                                     (GETR **TOP-OF-NP-STACK)))))))
(AND/NP/MATCH
  ; To implement the semantic constraint.
  ((JUMP AND/NP/COORD)
   (EQ (GETR LEFT-PHRS-SMNTC-CAT)
       (GETR RIGHT-PHRS-SMNTC-CAT))
   ...)
  ((JUMP AND/NP)
   (NOT (NULL **NP-STACK))
   (SETR **TOP-OF-NP-STACK (POP **NP-STACK)))
  ((JUMP AND/NP/COORD) T) ...)
The CONJ/ states can be seen as a subgrammar
which is separate from the main (conventional) ATN
grammar, and is connected with the main grammar via
the interpreter.
The parser works in the following way. 
Before a conjunction is encountered, the parser
works normally except that two extra stacks are
set up: **NP-STACK and **PREP-STACK. Each NP, either
EL-NP or EXP-NP, is pushed onto **NP-STACK, together
with a label indicating whether the NP in question
is a subject (SUBJ), an object (OBJ) or a
prepositional object (NP-IN-NMODS).
The interpreter takes responsibility for looking
ahead one word to see whether the word to come
is a conjunction. This happens when the interpreter
is processing "word-consuming" arcs, ie CAT,
WRD, MEM and TST arcs. Hence there is no need to
write WRD AND arcs explicitly into the grammar at all.
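The lookahead can be pictured as a single check placed inside the interpreter's arc-traversal step, which is why the grammar itself needs no WRD AND arcs. The Python sketch below is illustrative only: the function `follow_arc`, the state names "VP/V" and "NP/DET", and the decision to consume the conjunction at this point are all assumptions; the CONJUNCTION register and the CONJ/ state are the ones used in the listing above.

```python
CONJUNCTIONS = {"and"}
WORD_CONSUMING_ARCS = {"CAT", "WRD", "MEM", "TST"}


def follow_arc(arc_type, next_state, words, pos, registers):
    """Traverse one arc and return (state, position) for the next step.

    After a word-consuming arc the interpreter peeks at the upcoming word;
    if it is a conjunction, control is diverted to CONJ/ instead of
    `next_state`. Consuming the conjunction here is an assumption.
    """
    if arc_type not in WORD_CONSUMING_ARCS:
        return next_state, pos                  # e.g. JUMP: no input consumed
    pos += 1                                    # the arc consumed words[pos]
    if pos < len(words) and words[pos] in CONJUNCTIONS:
        registers["CONJUNCTION"] = words[pos].upper()
        return "CONJ/", pos + 1
    return next_state, pos


words = "the man kicked and threw the ball".split()
regs = {}
print(follow_arc("CAT", "VP/V", words, 2, regs))     # ('CONJ/', 4)
print(follow_arc("CAT", "NP/DET", words, 0, regs))   # ('NP/DET', 1)
```

Because the test lives in one place, every state reachable by a word-consuming arc gains conjunction handling without any change to its arc set.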
By the time a conjunction is met, when the
interpreter is ready to enter the CONJ/ state,
either a clause (Ex10-13) or a noun phrase in subject
position (Ex1-9) will have been POPed, or a verb
(Ex14-15) will have been found. In the first case,
a flag LEFT-PART-IS-CLAUSE will be set to true,
and the interpreter will try to parse RIGHT-PART as
a clause. If it succeeds, the representation of a
sentence consisting of two coordinated clauses will
be output. If it fails, a flag RIGHT-PART-IS-
NOT-CLAUSE is set, and the sentence will be
reparsed. This time the left part will not be
treated as a clause, and a coordinated NP object will
be looked for instead. Ex10 and Ex11 are examples
of coordinated clauses and a coordinated NP object,
respectively. One case is treated specially: when
LEFT-PART-IS-CLAUSE is true and RIGHT-WORD is a
verb (Ex13), the subject will be copied from the
left clause so that a right clause can be built.
In the second case, a coordinated NP subject
will be looked for. Eg, for Ex4, by the time "and"
is met, an NP "the man with the telescope" will
have been POPed, and the state of the
**NP-STACK will be like this:
(((MAN ...) (NUMBER ...) (QUANTIFIER ...) (DETERMINER ...)
  (ATTRIBUTE ((PREP (PREP WITH)) ((TELESCOPE ...))) SUBJ)
 ((TELESCOPE ...) NP-IN-NMODS))
After the execution of the arc ((PUSH NP/)
(NPSTART)), RIGHT-PHRASE has been found. If it has
a PP modifier, a register NMODS-CONJ will be set
to the value of the modifier. Now the NPs in the
**NP-STACK will be POPed one by one to be compared
semantically with the right phrase. The NP whose
formula head (the head of the NOUN in it) is the
same as that of the right conjunct will be taken
as the proper left conjunct. If the NP matched is
a subject or object, then a coordinated NP
subject or object will be output; if it is an
EL-NP in a PP modifier, then a function REBUILD-SUBJ
or REBUILD-OBJ, depending on whether the modified
EXP-NP is the subject or the object, will be
called to rebuild the EXP-NP, whose PP modifier should
then consist of a preposition and two coordinated NPs.
Here one problem arises: for Ex5, the first
NP to be compared with the right phrase ("the
woman") would be "the man with the child", whose head
noun "man" would be matched to "woman"; but,
according to our Symmetry Constraint, it is "child"
that should be matched. In order to implement this
rule, whenever NMODS-CONJ is empty (meaning that
the right NP has no post-modifier), the **NP-STACK
is reversed so that the first NP to be tried
is the one nearest to the coordinator (in
this case "the child").
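The pop-and-compare search, with the reversal just described, can be sketched as follows. The sketch is in Python and follows the prose description rather than the LISP listing. The `PRIMITIVE` table is a hypothetical lexicon, stack entries are reduced to (head, label) pairs, and the fallback used when no NP matches (take the NP nearest the coordinator, as the Closeness constraint requires) is an assumption about how that case is resolved.

```python
# Hypothetical head-noun semantic primitives; the labels are illustrative.
PRIMITIVE = {"man": "MAN", "woman": "MAN", "child": "MAN",
             "telescope": "THING", "umbrella": "THING"}


def find_left_conjunct(np_stack, right_head, nmods_conj):
    """Search the NP stack for the left conjunct of `right_head`.

    `np_stack` lists (head, label) pairs with the top of the stack first,
    i.e. the expanded NP before the NPs embedded in it. `nmods_conj` is the
    right phrase's PP modifier, or None if it has none.
    """
    stack = list(np_stack)
    if nmods_conj is None:
        stack.reverse()              # try the NP nearest the coordinator first
    nearest = np_stack[-1]           # assumed fallback: Closeness constraint
    while stack:
        head, label = stack.pop(0)   # POP the next candidate
        if PRIMITIVE[head] == PRIMITIVE[right_head]:
            return head, label
    return nearest


stack = [("man", "SUBJ"), ("child", "NP-IN-NMODS")]   # "the man with the child"
print(find_left_conjunct(stack, "woman", None))               # Ex5: child
print(find_left_conjunct(stack, "woman", "with the umbrella"))  # Ex6: man
print(find_left_conjunct(stack, "umbrella", None))            # Ex9: child
```

When the result carries the label NP-IN-NMODS, the enclosing expanded NP would then have to be rebuilt, which is the job the text assigns to REBUILD-SUBJ or REBUILD-OBJ.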
In the third case (LEFT-WORD is a transitive
verb and the object slot is empty, Exs 14 and 15),
the right clause will be built first, with or without
copying the subject from LEFT-PART depending on
whether a subject can be found in RIGHT-PART. Then
the left clause will be completed by copying the
object from the right clause, and finally a
clausal coordination representation will be returned.
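The order of operations in the third case (right clause first, then completion of the left clause) can be shown with flat records standing in for the clause representations. This is a Python sketch under simplifying assumptions: clauses are plain dictionaries with hypothetical "subject", "verb" and "object" keys, and the function name is invented for the illustration.

```python
def coordinate_gapped_clauses(left, right):
    """Build two full clauses when LEFT-PART ends in an object-less verb.

    Each clause is a dict with "subject", "verb" and "object" keys; a missing
    constituent is None. The input dicts are not modified.
    """
    right = dict(right)
    if right["subject"] is None:             # Ex14: no subject in RIGHT-PART
        right["subject"] = left["subject"]   # copy it from LEFT-PART
    left = dict(left)
    left["object"] = right["object"]         # complete the left clause
    return ("AND", left, right)


left = {"subject": "the man", "verb": "kicked", "object": None}
ex14 = coordinate_gapped_clauses(
    left, {"subject": None, "verb": "threw", "object": "the ball"})
ex15 = coordinate_gapped_clauses(
    left, {"subject": "the woman", "verb": "threw", "object": "the ball"})
print(ex14[2]["subject"], "/", ex15[2]["subject"])   # the man / the woman
print(ex14[1]["object"])                             # the ball
```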
In the course of parsing, whenever a finite
verb is met, the NPs which are at the same level as
the verb and have been PUSHed onto the **NP-STACK
should be deleted from it, so that when a possible
coordinated NP object is constructed, the NPs in
the subject position will not confuse the matching.
Ex11 is thus correctly analysed.
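The effect of this clean-up on Ex11 can be sketched as below, in Python. How the package records the level of an NP is not stated, so the integer `level` field, the triple layout of the stack entries and the function name are all hypothetical.

```python
def on_finite_verb(np_stack, verb_level):
    """Drop the NPs already pushed at the verb's own level (its subject NPs).

    `np_stack` holds (head, label, level) triples; `level` is a hypothetical
    integer recording the clause depth at which the NP was recognised.
    """
    return [entry for entry in np_stack if entry[2] != verb_level]


# Ex11: "The man kicked the ball and the child."
stack = [("man", "SUBJ", 0)]
stack = on_finite_verb(stack, verb_level=0)   # "kicked" is met
stack.insert(0, ("ball", "OBJ", 0))           # the object NP is pushed afterwards
print(stack)   # [('ball', 'OBJ', 0)]
```

When "and the child" is then processed, "the ball" is the only candidate left conjunct, so "the man" can no longer be matched with "the child" on semantic grounds.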
VI DISCUSSION 
The package is written in RUTGERS-UCI LISP and
is implemented on the PDP-10 computer at the
University of Essex. It performs satisfactorily.
However, there is still much work to be done. For
instance, the most efficient way of treating reduced
conjunctions is yet to be found. Another problem is
the scope of the pre-modifiers and post-modifiers
in coordinate constructions, for the resolution of
which the Symmetry constraint may prove inadequate
(eg, it cannot discriminate between "American history
and literature" and "American history and physics").
It is hoped that an ATN parser capable of
dealing with a large variety of coordinated
constructions in an efficient way will finally emerge from
the present work.
ACKNOWLEDGEMENTS 
I would like to thank Prof. Wilks of the De- 
partment of Language and Linguistics of the Uni- 
versity of Essex for his advice and his patience 
in reading this paper and discussing it with me. 
Any errors in the paper are mine, of course. I 
would also like to thank Dr. Boguraev and my col- 
league Fass for part of their parsing programs. 
REFERENCES 
Blackwell, S.A. "Processing Conjunctions in an ATN
Parser". Unpublished M.Phil. Dissertation,
University of Cambridge, 1981.
Boguraev, B.K. "Automatic Resolution of Linguistic
Ambiguities". Technical Report No. 11, University
of Cambridge Computer Laboratory, Cambridge,
1979.
Boguraev, B.K. "Recognising Conjunctions within
the ATN Framework". Sparck-Jones, K. and Wilks,
Y. (eds), Automatic Natural Language Parsing,
Ellis Horwood (in press).
Gazdar, G. "Unbounded Dependencies and Coordinate
Structure". Linguistic Inquiry 12, 155-84, 1981.
Nagao, M., Tsujii, J., Yada, K., and Kakimoto, T.
"An English Japanese Machine Translation System
of the Titles of Scientific and Engineering
Papers". In Horecky, J. (ed), COLING 82,
North-Holland Publishing Company, 1982.
Radford, A. Transformational Syntax. Cambridge
University Press, 1981.
Ross, J.R. Constraints on Variables in Syntax.
Doctoral Dissertation, MIT, Cambridge,
Massachusetts, 1967. Also distributed by the Indiana
University Linguistics Club, Bloomington, Indiana,
1968.
Schachter, P. "Constraints on Coordination".
Language 53, 86-103, 1977.
Wilks, Y.A. "Preference Semantics". In Keenan (ed),
Formal Semantics of Natural Language, Cambridge
University Press, London, 1975.
Wilks, Y.A. "Making Preferences More Active". AI
1978.
Williams, E.S. "Across-the-Board Rule Application".
Linguistic Inquiry 9, 31-34, 1978.
Winograd, T. Understanding Natural Language,
Academic Press, N.Y., 1972.
Woods, W. "An Experimental Parsing System for
Transition Network Grammar". In Rustin, R. (ed),
Natural Language Processing, Algorithmic Press,
N.Y., 1973.
