COLING 82, J. Horecký (ed.)
North-Holland Publishing Company 
© Academia, 1982 
PROCESSING OF SENTENCES WITH INTRA-SENTENTIAL CODE-SWITCHING
Aravind K. Joshi 
Department of Computer and Information Science
R. 268 Moore School 
University of Pennsylvania 
Philadelphia, PA 19104 
Speakers of certain bilingual communities systematically produce
utterances in which they switch from one language to another, suggesting
that the two language systems systematically interact with each
other in the production (and recognition) of these sentences. We
have investigated this phenomenon in a formal or computational framework
which consists of two grammatical systems and a mechanism for
switching between the two systems. A variety of constraints apparent
in these sentences are then explained in terms of constraints on the
switching mechanism, especially those on closed class items.
1. INTRODUCTION
Speakers of certain bilingual communities systematically produce utterances in
which they switch from one language to another (called code-switching), possibly
several times, in the course of an utterance. Production and comprehension of
utterances with intrasentential code-switching is part of the linguistic
competence of the speakers and hearers of these communities. Much of the work on
code-switching is in the sociolinguistic framework and also at the discourse
level. Recently there have been a few studies of code-switching within the scope
of a single sentence. (See Sridhar (1980) for a good review, also Pfaff (1979).)
Also, until recently, this phenomenon has not been studied in a formal or
computational framework. (See Sankoff and Poplack (1980), Woolford (1980), Joshi (1980),
and Doron (1981). Space does not permit a detailed comparison.)
The discourse level of code-switching is important; however, it is only at the
intrasentential level that we are able to observe, with some certainty, the
interaction between two grammatical systems. These interactions, to the extent they
can be systematically characterized, provide a nice framework for investigating
some processing issues from both the generation and parsing points of view.
There are some important characteristics of intrasentential code-switching which
give hope for the kind of work described here. These are as follows. 1. The
situation which we are concerned with involves participants who are about equally
fluent in both languages. 2. Participants have fairly consistent judgements
about the "acceptability" of mixed sentences. (In fact it is amazing that
participants have such acceptability judgements at all.) 3. Mixed utterances are
spoken without hesitation, pauses, repetitions, corrections, etc., suggesting
that intrasentential code-switching is not some random interference of one system
with the other. Rather, the switches seem to be due to systematic interactions
between the two systems. 4. The two language systems seem to be simultaneously
active. 5. Intrasentential code-switching is sharply distinguished from other
interferences such as borrowing, learned use of foreign words, filling lexical
gaps, etc., all of which could be exhibited by monolingual speakers. 6. Despite
extensive intrasentential switching, speakers and hearers usually agree on which
language the mixed sentence is "coming from". We call this language the matrix
language and the other language the embedded language. These interesting
characteristics of the mixed sentences suggest that the two language systems are
systematically interacting with each other in the production (and recognition) of the
mixed sentences.
Our main objectives in this paper are (1) to formulate a system in terms of the
grammars of the two languages and a switching rule, and (2) to show that a variety of
observable constraints on intrasentential code-switching can be formulated in
terms of constraints on the switching rule. The main result of this paper is
that a large number of constraints can be derived from a general constraint on
the switchability of the so-called closed class items (determiners, quantifiers,
prepositions, tense morphemes, auxiliaries or helping verbs, complementizers,
pronouns, etc.). This result is of interest because the differential behavior of
closed class items (as compared to the open class items) has been noted in
various aspects of language processing (in the monolingual case), for example,
(1) certain types of speech errors which strand the closed class items, (2)
resistance to change as well as resistance to incorporating new items as closed class
items, (3) frequency-independent lexical decision for closed class items (as
compared to open class items, for which lexical decision is frequency dependent), (4)
the absence of frequency independence for closed class items in certain types of
aphasia, (5) closed class items aiding in comprehension strategies, etc. (This
list is based on a talk given by Mary-Louise Kean at the University of
Pennsylvania.) It is not clear what the relationship is between the behavior of closed
classes in intrasentential code-switching and the other behaviors (in monolingual
situations) described above. We believe, however, that investigating this
relationship may give some clues concerning the organization of the grammar and
the lexicon, and the nature of the interface between the two language systems.
The examples in our paper are all from the language pair Marathi (m) and English
(e). Marathi (m) is the matrix language and English (e) is the embedded language.
(The coincidence of the abbreviation m for the matrix language, which is Marathi,
and e for the embedded language, which is English, is accidental, but a happy
one.) A few facts about Marathi will be useful to note. It is an Indo-European
language (spoken on the west coast of India near Bombay and in parts of
central India by about 60 million people). It is an SOV language. Adjectives
and relative clauses appear prenominally, and it has postpositions instead of
prepositions. It uses a rich supply of auxiliary or helping verbs. Other facts
about Marathi will become apparent in the examples. (See Section 3.)
2. FORMULATION OF THE SYSTEM
Let L_m be the matrix language and L_e be the embedded language. Further, let G_m and
G_e be the corresponding grammars, i.e., G_m is the matrix grammar and G_e is the
embedded grammar. A "mixed" sentence is a sentence which contains lexical items
from both L_m and L_e. Let L_x be the set of all mixed sentences that are judged to
be acceptable. Note that a mixed sentence is not a sentence of either L_m or L_e.
However, it is judged to be "coming from" L_m. The task is to formulate a system
characterizing L_x. Our approach is to formulate a system for L_x in terms of G_m
and G_e and a 'control structure' which permits shifting control from G_m to G_e but
not from G_e to G_m. We assume a 'correspondence' between categories of G_m and G_e;
for example, NP_m corresponds to NP_e (written as NP_m ≈ NP_e). Control is shifted
by a switching rule of the form
(2.1) A_m x A_e, where A_m is a category of G_m, A_e is a category of G_e, and
A_m ≈ A_e.
At any stage of the derivation, (2.1) can be invoked, permitting A_m to be switched
to A_e. Thus further derivation involving A_m will be carried out by using rules
of G_e, starting with A_e. The switching rule in (2.1) is asymmetric, i.e.,
switching a category of the matrix grammar to a category of the embedded grammar
is permitted but not vice versa. This asymmetry can be stated directly in the
rule itself, as we have done, or it can be stated as a constraint on a more
generalized switching rule which will permit switching from A_m to A_e as well as the
other way round. We have chosen to state the asymmetry by incorporating it in
the rule itself because the asymmetry plays such a central role in our formulation.
This asymmetric switching rule, together with the further constraints described in
Section 3, is intended to capture the overpowering judgement of speakers about a
mixed sentence "coming from" the matrix language L_m. The switching rule in (2.1)
is neither a rule of G_m nor a rule of G_e. It is also not a rule of a grammar,
say G_x, for L_x. As we have said before, we will construct a system for L_x in
terms of G_m and G_e and a switching rule, and not in terms of a third grammar, say
G_x. Although formally this can be done, there are important reasons for not
doing so. Using this general framework we will now show that the system for L_x
can be formulated by specifying a set of constraints on the switching rule (besides
the asymmetry constraint). These further constraints primarily pertain to the
closed class items.
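As a minimal sketch of how the asymmetric rule (2.1) might be modeled, the Python fragment below represents a category as a name tagged with the grammar it belongs to. The names Category and switch are illustrative and do not come from the paper, and the correspondence between matrix and embedded categories is assumed, for simplicity, to hold between categories with the same name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str  # e.g. "NP", "V", "A"
    lang: str  # "m" = matrix grammar G_m, "e" = embedded grammar G_e


def switch(cat: Category) -> Category:
    """Apply rule (2.1): hand a matrix category A_m over to G_e as A_e.

    The asymmetry is built into the function itself: there is no
    operation that returns control from G_e to G_m.
    """
    if cat.lang != "m":
        raise ValueError(
            f"{cat.name}_{cat.lang} cannot be switched: "
            "control only moves from G_m to G_e"
        )
    return Category(cat.name, "e")


np_e = switch(Category("NP", "m"))
print(np_e)  # Category(name='NP', lang='e')
```

Calling switch on np_e raises ValueError, which is how this sketch rules out back and forth switching along a single path.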
3. CONSTRAINTS ON THE SWITCHING RULE 
Our hypothesis is that L_x can be completely characterized in terms of constraints
on the switching rule (2.1). The types of constraints can be characterized as
follows:
3.1 Asymmetry: We have already discussed this constraint. In fact, we have
incorporated it in the definition of the switching rule itself. The main
justifications for asymmetry are as follows. (a) We want to maintain the notion of
matrix and embedded languages and the asymmetry associated with this distinction.
(b) Without asymmetry, arbitrarily long derivations would be possible, for example,
by allowing back and forth switching of A_m and A_e along a nonbranching path. There
appears to be no motivation for allowing such derivations. (c) The asymmetry constraint
together with certain constraints on the non-switchability of closed class items
seems to allow a fairly complete characterization of L_x.
3.2 Constraint on switchability of certain categories: Rule (2.1) permits
switching any category A_m to A_e if A_m ≈ A_e. However, certain categories cannot be
switched. Although all major categories can be switched, we must exclude the
root node S_m. Obviously, if we permit S_m to be switched to S_e, we can derive
a sentence in L_e starting with S_m in a trivial manner. Hence, we need the
following constraint.
(3.2.1) Root node S_m cannot be switched.
Constraints on closed class items: (3.2.2) Certain closed class items such as
tense, aux, and helping verbs, when they appear in the main VP, cannot be switched.
Examples: (underlined items in the examples are from L_m).
(3.1) mula khurcya rangawtat.
      boys chairs  paint
(3.2) mula khurcya paint kartat.
                         do(+tense)
In (3.2) the root verb has been switched from Marathi to English. The closed
class item tat is not switched; however, it is attached to an auxiliary or
helping verb kar, since it cannot be stranded. This phenomenon appears in mixed
sentences of other language pairs (see Pfaff (1979)).
It is not possible to switch both the V and the tense in (3.1), and also not the 
entire VP. 
(3.3) * mula khurcya paint. (3.4) * mula paint chairs.
Note that (3.4) could be derived by starting with S_e (i.e., by starting to derive
a sentence of L_e) and then switching VP_e to VP_m, but this is not permitted by the
asymmetry constraint. Of course, one cannot start with the S_e node because this
requires switching S_m to S_e, which is blocked by the constraint on the switchability
of the root node.
(3.2.3) Closed class items (e.g., determiners, quantifiers, prepositions,
possessive, aux, tense, helping verbs, etc.) cannot be switched. Thus, for example,
DET_m cannot be switched to DET_e. This does not mean that a lexical item
belonging to DET_e cannot appear in the mixed sentence. It can indeed appear if NP_m has
already been switched to NP_e and then NP_e is expanded into DET_e N_e according to G_e.
(3.5)   kahi khurcya    DET_m N_m
        some chairs
(3.6)   some chairs     DET_e N_e
(3.7)   kahi chairs     DET_m N_e
(3.8) * some khurcya    DET_e N_m
Adjectives are not closed classes; hence all four combinations below are possible. 
(3.9) unca peti (3.10) unca box (3.11) tall peti (3.12) tall box
Note that (3.12) is a Marathi NP_m in which both the A_m and N_m have been switched.
It is not derived from NP_e; if it were, it would have a determiner. (Determiner
is optional in Marathi.)
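The determiner and adjective patterns above can be reproduced with a small enumeration. The following is only a sketch under stated assumptions: the toy rules NP -> DET N and NOM -> A N are assumed to have the same shape in both grammars, the closed class list is abbreviated, and the names RULES, can_switch and derive are hypothetical.

```python
from itertools import product

ROOT = "S"
CLOSED_CLASS = {"DET", "P", "AUX", "TENSE", "COMP"}
# Toy phrase structure, assumed identical in G_m and G_e for illustration only.
RULES = {"NP": ("DET", "N"), "NOM": ("A", "N")}


def can_switch(name):
    """Root node and closed class items are barred from rule (2.1)."""
    return name != ROOT and name not in CLOSED_CLASS


def derive(name, lang="m"):
    """Return every terminal category sequence derivable from name in grammar lang."""
    if name in RULES:
        parts = [derive(child, lang) for child in RULES[name]]
        results = {" ".join(combo) for combo in product(*parts)}
    else:
        results = {f"{name}_{lang}"}
    # Asymmetric switch: only a matrix category may be handed over to G_e.
    if lang == "m" and can_switch(name):
        results |= derive(name, "e")
    return results


print(sorted(derive("NP")))   # ['DET_e N_e', 'DET_m N_e', 'DET_m N_m']
print(sorted(derive("NOM")))  # ['A_e N_e', 'A_e N_m', 'A_m N_e', 'A_m N_m']
```

DET_e N_m never appears in the output, matching the starred example (3.8), while all four adjective and noun combinations of (3.9) to (3.12) are produced.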
Prepositions and postpositions are closed class items. Marathi has postpositions 
while English has prepositions. 
(3.13)   kahi khurcyawar     (3.14)   kahi chairswar       (3.15) * some chairswar +
         some chairs on
(3.16) * some chairs on      (3.17) * kahi khurcya on      (3.18)   on some chairs
(3.19) * on kahi khurcya     (3.20) * war kahi khurcya     (3.21) * war some chairs
(3.2.3) Constraints on Complementizers: Complementizers are closed class items
and therefore cannot be switched in the same sense as in (3.2.2) above. However,
often we have a choice of a complementizer. This choice depends both on the
matrix verb V_m and the embedded verb V_e (V_m ≈ V_e) to which V_m has been switched.
Let the complementizers of V_m be COMP_m = {C_1, C_2, C_3} and the complementizers of
V_e (≈ V_m) be COMP_e = {C'_1, C'_2, C'_4}, where C_1 ≈ C'_1 and C_2 ≈ C'_2. Now if V_m
is switched to V_e, i.e., the verb is lexically realized in the embedded language,
then the choice of the complementizer is constrained in the following manner. Since
complementizers are closed classes, they cannot be switched. Hence, the choice is C_1, C_2,
or C_3; however, only C_1 and C_2 are permitted, as the equivalent lexical verb V_e
permits C'_1 and C'_2, which are the equivalents of C_1 and C_2 respectively. C_3 is not
permitted because its equivalent C'_3 is not permitted for V_e, and C_4, which is the
equivalent of C'_4, is not permitted because it is not allowed by V_m. Thus the only
complementizers that are permitted, if V_m is switched to V_e, are those that are
permitted by V_m and the equivalents of which are permitted by V_e (V_m ≈ V_e). Thus
the choice is constrained not only to the complementizers of V_m (because of
nonswitchability of complementizers) but it is further constrained by the choice of
complementizers of V_e as explained above.
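The condition just described can be written compactly as a set. The name Permitted is introduced here only for illustration, and primes mark complementizers of the embedded language.

```latex
\mathrm{Permitted}(V_m, V_e) \;=\;
  \{\, C \in \mathrm{COMP}_m \;\mid\; \exists\, C' \in \mathrm{COMP}_e :\ C \approx C' \,\}
% With COMP_m = {C_1, C_2, C_3} and COMP_e = {C'_1, C'_2, C'_4}:
\mathrm{Permitted}(V_m, V_e) \;=\; \{\, C_1,\ C_2 \,\}
```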
+This is a problematic case which is discussed in detail in the longer version 
of this paper. 
Examples:
(3.22)   to parat jayca tharawto.            ca: ing
         he back  going decides
(3.23) * to parat jayla tharawto.            la: to
         he back  to go decides
The Marathi verb tharaw (decide) takes the complementizer ca (ing) but not the
complementizer la (to). The corresponding English verb decide takes both the
complementizers to and ing (after on). We now switch the Marathi verb V_m (tharaw)
to V_e (decide) in both (3.22) and (3.23). Since the tense in the main VP cannot
be switched (as we have seen in (3.1) and (3.2) earlier), a helping verb kar (do)
has to be introduced so that the tense can be attached to it. Thus we have
(3.24)   to parat jayca decide karto.        ca: ing
         he back  going        do(+tense)
(3.25) * to parat jayla decide karto.        la: to
         he back  to go        do(+tense)
Note that although decide takes both the complementizers to and ing, only (3.24)
is allowed. (3.25) is blocked because the Marathi verb tharaw does not allow
the complementizer to. Thus the only complementizer that appears in the mixed
sentence is ing.
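The tharaw/decide case can be checked mechanically. The fragment below is a small sketch, not the paper's own formalization: the dictionaries hold only the facts stated for (3.22) to (3.25), and the names COMP_M, COMP_E, EQUIVALENT and permitted_complementizers are hypothetical.

```python
# Complementizers allowed by each verb, taken from the discussion of (3.22)-(3.25).
COMP_M = {"tharaw": {"ca"}}             # Marathi: ca (ing) but not la (to)
COMP_E = {"decide": {"to", "ing"}}      # English: both to and ing
EQUIVALENT = {"ca": "ing", "la": "to"}  # Marathi complementizer -> English equivalent


def permitted_complementizers(v_m, v_e):
    """Matrix complementizers that remain usable after V_m is switched to V_e.

    A complementizer must be allowed by the matrix verb (it cannot be
    switched) and its equivalent must be allowed by the embedded verb.
    """
    return {c for c in COMP_M[v_m] if EQUIVALENT.get(c) in COMP_E[v_e]}


print(permitted_complementizers("tharaw", "decide"))  # {'ca'}
```

The complementizer la is excluded even though decide accepts to, because tharaw itself does not take la; this is the pattern that blocks (3.25).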
There are several interesting issues concerning the generation and recognition
of sentences such as (3.24) and (3.25). For example, at what point is the decision
to switch the main verb made? (We could have raised this issue earlier when
we discussed (3.1) and (3.2).) Since a new helping verb has to be introduced
when the switch is made, does it mean that some 'local' structural change has to
be made along with the switching of the verb? Another point is that the choice
of the complementizer (which comes before the matrix verb) also determines whether
the verb can be switched or not. The machinery we have provided so far may have
to be augmented to provide systematic answers to these questions. Thus, for
example, we may have to introduce additional constraints on the switching rules.
4. PARSING CONSIDERATIONS 
In this paper, we have given an account of the constraints on intrasentential
code-switching in a generative framework. The formal model and the constraints
on switching that we have proposed clearly have implications for the kind of
parser we may be able to construct. We will not pursue this aspect in this
paper. However, we would like to point out that by adopting some parsing
strategies, we can account for some of the constraints described earlier. A
preliminary attempt was made in Joshi (1981) by proposing a strategy involving a
so-called left corner constraint. This strategy has some serious drawbacks, as was
pointed out by Doron (1981). She has proposed an alternate strategy called the
'early determination strategy', according to which the parser tries to determine
as early as possible the language of the major constituent it is currently
parsing. Thus upon encountering a Marathi (m) determiner, i.e., DET_m, the parser
would predict a Marathi NP_m. The Marathi N_m could then be realized lexically in
Marathi, or the N_m would be switched to N_e and then lexically realized in English.
NP_m is expanded into DET_m NOM_m, where NOM_m is expanded into A_m N_m.
Note that A_m and N_m could be independently switched to A_e and N_e respectively,
thus giving four possible sequences:
DET_m A_m N_m, DET_m A_m N_e, DET_m A_e N_m, DET_m A_e N_e, all of which
are permissible.
If the parser encountered an English determiner, i.e., DET_e, then it would predict
NP_e, but now N_e or A_e N_e, into which NP_e can expand, cannot be switched to N_m or A_m
because of the asymmetry constraint. Thus the only permissible sequence is
DET_e (A_e) N_e, and the following are excluded: * DET_e N_m, * DET_e A_e N_m, * DET_e
A_m N_e, * DET_e A_m N_m, which checks with the data.
Of course, so far we have the same predictions as we had with the constraint on 
the nonswitchability of closed class items. However, there is some evidence to 
the effect that a parsing strategy as described above may be in effect. The 
following distribution is correctly predicted by the above strategy: 
(5.1) * tall petya (5.2) tall boxes (5.3) unca petya (5.4) unca boxes.
(5.1) is disallowed because, upon encountering an English adjective, A_e, the parser
predicts NOM_e, which is expanded into A_e N_e. However, N_e cannot be realized
lexically in Marathi unless N_e is switched to N_m, which is disallowed. Note that
(5.1) cannot be disallowed by invoking nonswitchability of adjectives, because
these are not closed classes. This early determination strategy does not help,
however, in accounting for the distribution of phrases involving postpositions (see
Section 3).
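The early determination strategy, as applied to the noun phrase examples above, can be sketched as a simple check on a tagged sequence. This is an illustrative simplification and not Doron's parser: it handles only flat sequences such as DET A N or A N, and the names parse_tags and permissible are hypothetical.

```python
CLOSED_CLASS = {"DET"}  # only the closed class item relevant to these NP examples


def parse_tags(text):
    """Turn 'DET_m A_e N_m' into [('DET', 'm'), ('A', 'e'), ('N', 'm')]."""
    return [tuple(tag.split("_")) for tag in text.split()]


def permissible(text):
    """Check one flat NP or NOM sequence under early determination.

    The leftmost item fixes the predicted language of the constituent.
    An English prediction cannot be undone (asymmetry), so every later
    item must be English. A Marathi prediction lets open class items
    switch individually, but closed class items must stay Marathi.
    """
    seq = parse_tags(text)
    predicted = seq[0][1]
    if predicted == "e":
        return all(lang == "e" for _, lang in seq)
    return all(lang == "m" for name, lang in seq if name in CLOSED_CLASS)


print(permissible("DET_m A_e N_m"))  # True
print(permissible("DET_e A_e N_m"))  # False
print(permissible("A_e N_m"))        # False, the pattern of (5.1) * tall petya
print(permissible("A_m N_e"))        # True, the pattern of (5.4) unca boxes
```

As the text notes, a check of this kind says nothing about the postposition data, where the closed class item is not the leftmost constituent.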
Our conclusion at present is that the framework described in Section 3, along with
the constraints on closed class items, is the proper way to formulate the
code-switching system. A parsing strategy as discussed above is perhaps also
operative (see Examples (5.1) - (5.4)), and when a closed class item is the leftmost
constituent of a major category, the two formulations make the same predictions.
5. CONCLUSION 
We have presented a formal model for characterizing intra-sentential
code-switching. The main features of this model are that 1) the model treats the two
grammars (languages) asymmetrically, 2) there is no third grammar, and 3) the
constraints are primarily on the switchability of closed class items. We believe that
further investigation of code-switching in the proposed framework will be very productive,
as it captures some essential aspects of intrasentential code-switching. Another
interesting result concerns the role of closed class items. Since several
important characteristics of closed class items are well known in the context of
processing of monolingual utterances, we think that further investigation of the role
of closed class items in the context of code-switching will give us some insights
into the processing of monolingual utterances. Our investigation of
intrasentential code-mixing can also be considered as a small contribution towards the
larger problem of determining the nature of the interface between the two language
systems of a bilingual speaker or hearer.

REFERENCES 

Doron, E., "On formal models of code-switching", MS., U. of Texas at Austin (1981).

Joshi, A.K., "Some problems in processing sentences with intrasentential code 
switching", Extended Abstract of paper read at the U. of Texas Parsing Workshop, March 1981. 

Pfaff, C., "Constraints on language switching", Language, Vol. 55, 1979.

Sankoff, D. and Poplack, S., "A formal grammar of code-switching", Working Paper,
1980.

Sridhar, S.N., "The syntax and psycholinguistics of bilingual code-mixing", 
Studies in the Linguistic Sciences, Spring 1980, University of Illinois.

Woolford, E., "A formal model of bilingual code-switching", Working Paper,
M.I.T., 1980. 
