File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/82/c82-1023_metho.xml
Size: 20,745 bytes
Last Modified: 2025-10-06 14:11:23
<?xml version="1.0" standalone="yes"?> <Paper uid="C82-1023"> <Title>PROCESSING OF SENTENCES WITH INTRA-SENTENTIAL CODE-SWITCHING</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> PROCESSING OF SENTENCES WITH INTRA-SENTENTIAL CODE-SWITCHING </SectionTitle> <Paragraph position="0"> Speakers of certain bilingual communities systematically produce utterances in which they switch from one language to another, suggesting that the two language systems systematically interact with each other in the production (and recognition) of these sentences. We have investigated this phenomenon in a formal or computational framework which consists of two grammatical systems and a mechanism for switching between the two systems. A variety of constraints apparent in these sentences are then explained in terms of constraints on the switching mechanism, especially those on closed class items.</Paragraph> </Section> <Section position="2" start_page="0" end_page="145" type="metho"> <SectionTitle> 1. INTRODUCTION </SectionTitle> <Paragraph position="0"> Speakers of certain bilingual communities systematically produce utterances in which they switch from one language to another (called code-switching), possibly several times, in the course of an utterance. Production and comprehension of utterances with intrasentential code-switching is part of the linguistic competence of the speakers and hearers of these communities. Much of the work on code-switching has been in the sociolinguistic framework and at the discourse level. Recently there have been a few studies of code-switching within the scope of a single sentence. (See Sridhar (1980) for a good review, also Pfaff (1979)).</Paragraph> <Paragraph position="1"> Also, until recently, this phenomenon has not been studied in a formal or computational framework. (See Sankoff and Poplack (1980), Woolford (1980), Joshi (1980), and Doron (1981). Space does not permit a detailed comparison. 
) The discourse level of code-switching is important; however, it is only at the intrasentential level that we are able to observe, with some certainty, the interaction between two grammatical systems. These interactions, to the extent they can be systematically characterized, provide a nice framework for investigating some processing issues from both the generation and parsing points of view.</Paragraph> <Paragraph position="2"> There are some important characteristics of intrasentential code-switching which give hope for the kind of work described here. These are as follows. 1. The situation which we are concerned with involves participants who are about equally fluent in both languages. 2. Participants have fairly consistent judgements about the &quot;acceptability&quot; of mixed sentences. (In fact it is amazing that participants have such acceptability judgements at all.) 3. Mixed utterances are spoken without hesitation, pauses, repetitions, corrections, etc., suggesting that intrasentential code-switching is not some random interference of one system with the other. Rather, the switches seem to be due to systematic interactions between the two systems. 4. The two language systems seem to be simultaneously active. 5. Intrasentential code-switching is sharply distinguished from other interferences such as borrowing, learned use of foreign words, filling lexical gaps, etc., all of which could be exhibited by monolingual speakers. 6. Despite extensive intrasentential switching, speakers and hearers usually agree on which language the mixed sentence is &quot;coming from&quot;. We call this language the matrix</Paragraph> </Section> <Section position="3" start_page="145" end_page="145" type="metho"> <SectionTitle> 146 A.K. JOSHI </SectionTitle> <Paragraph position="0"> language and the other language the embedded language. 
These interesting characteristics of the mixed sentences suggest that the two language systems are systematically interacting with each other in the production (and recognition) of the mixed sentences.</Paragraph> <Paragraph position="1"> Our main objectives in this paper are (1) to formulate a system in terms of the grammars of the two languages and a switching rule, and (2) to show that a variety of observable constraints on intrasentential code-switching can be formulated in terms of constraints on the switching rule. The main result of this paper is that a large number of constraints can be derived from a general constraint on the switchability of the so-called closed class items (determiners, quantifiers, prepositions, tense morphemes, auxiliaries or helping verbs, complementizers, pronouns, etc.). This result is of interest because the differential behavior of closed class items (as compared to the open class items) has been noted in various aspects of language processing (in the monolingual case), for example, (1) certain types of speech errors which strand the closed class items, (2) resistance to change as well as resistance to incorporating new items as closed class items, (3) frequency independent lexical decision for closed class items (as compared to open class items, for which lexical decision is frequency dependent), (4) the absence of frequency independence for closed class items in certain types of aphasia, (5) closed class items aiding in comprehension strategies, etc. (This list is based on a talk given by Mary-Louise Kean at the University of Pennsylvania). It is not clear what the relationship is between the behavior of closed classes in intrasentential code-switching and the other behaviors (in monolingual situations) described above. 
We believe, however, that investigating this relationship may give some clues concerning the organization of the grammar and the lexicon, and the nature of the interface between the two language systems.</Paragraph> <Paragraph position="2"> The examples in our paper are all from the language pair Marathi (m) and English (e). Marathi (m) is the matrix language and English (e) is the embedded language.</Paragraph> <Paragraph position="3"> (The coincidence of the abbreviation m for the matrix language, which is Marathi, and e for the embedded language, which is English, is accidental, but a happy one.) A few facts about Marathi will be useful to note. It is an Indo-European language (spoken on the west coast of India near Bombay and in parts of central India by about 60 million people). It is an SOV language. Adjectives and relative clauses appear prenominally and it has postpositions instead of prepositions. It uses a rich supply of auxiliary or helping verbs. Other facts about Marathi will become apparent in the examples. (See Section 3).</Paragraph> </Section> <Section position="4" start_page="145" end_page="145" type="metho"> <SectionTitle> 2. FORMULATION OF THE SYSTEM </SectionTitle> <Paragraph position="0"> Let Lm be the matrix language and Le be the embedded language. Further, let Gm and Ge be the corresponding grammars, i.e., Gm is the matrix grammar and Ge is the embedded grammar. A &quot;mixed&quot; sentence is a sentence which contains lexical items from both Lm and Le. Let Lx be the set of all mixed sentences that are judged to be acceptable. Note that a mixed sentence is not a sentence of either Lm or Le.</Paragraph> <Paragraph position="1"> However, it is judged to be &quot;coming from&quot; Lm. The task is to formulate a system characterizing Lx. Our approach is to formulate a system for Lx in terms of Gm and Ge and a 'control structure' which permits shifting control from Gm to Ge but not from Ge to Gm. 
We assume a 'correspondence' between categories of Gm and Ge; for example, NPm corresponds to NPe (written as NPm ~ NPe). Control is shifted by a switching rule of the form (2.1) Am x Ae, where Am is a category of Gm, Ae is a category of Ge, and Am ~ Ae.</Paragraph> <Paragraph position="2"> At any stage of the derivation, (2.1) can be invoked, permitting Am to be switched to Ae. Thus further derivation involving Am will be carried out by using rules of Ge, starting with Ae. The switching rule in (2.1) is asymmetric, i.e., switching a category of the matrix grammar to a category of the embedded grammar</Paragraph> </Section> <Section position="5" start_page="145" end_page="145" type="metho"> <SectionTitle> INTRA-SENTENTIAL CODE-MIXING 147 </SectionTitle> <Paragraph position="0"> is permitted but not vice versa. This asymmetry can be stated directly in the rule itself, as we have done, or it can be stated as a constraint on a more generalized switching rule which will permit switching from Am to Ae as well as the other way round. We have chosen to state the asymmetry by incorporating it in the rule itself because the asymmetry plays such a central role in our formulation.</Paragraph> <Paragraph position="1"> This asymmetric switching rule, together with the further constraints described in Section 3, is intended to capture the overpowering judgement of speakers about a mixed sentence &quot;coming from&quot; the matrix language Lm. The switching rule in (2.1) is neither a rule of Gm nor a rule of Ge. It is also not a rule of a grammar, say Gx, for Lx. As we have said before, we will construct a system for Lx in terms of Gm and Ge and a switching rule, and not in terms of a third grammar, say Gx. Although formally this can be done, there are important reasons for not doing so. Using this general framework we will now show that the system for Lx can be formulated by specifying a set of constraints on the switching rule (besides the asymmetry constraint). 
These further constraints primarily pertain to the closed class items.</Paragraph> </Section> <Section position="6" start_page="145" end_page="145" type="metho"> <SectionTitle> 3. CONSTRAINTS ON THE SWITCHING RULE </SectionTitle> <Paragraph position="0"> Our hypothesis is that Lx can be completely characterized in terms of constraints on the switching rule (2.1). The types of constraints can be characterized as follows:</Paragraph> <Section position="1" start_page="145" end_page="145" type="sub_section"> <SectionTitle> 3.1 Asymmetry: We have already discussed this constraint. In fact, we have </SectionTitle> <Paragraph position="0"> incorporated it in the definition of the switching rule itself. The main justifications for asymmetry are as follows. (a) We want to maintain the notion of matrix and embedded languages and the asymmetry associated with this distinction.</Paragraph> <Paragraph position="1"> (b) Without asymmetry, arbitrarily long derivations would be possible, for example, by allowing back and forth switching of Am and Ae along a nonbranching path. There appears to be no motivation for allowing such derivations. (c) The asymmetry constraint together with certain constraints on the non-switchability of closed class items seems to allow a fairly complete characterization of Lx.</Paragraph> </Section> <Section position="2" start_page="145" end_page="145" type="sub_section"> <SectionTitle> 3.2 Constraint on switchability of certain categories: Rule (2.1) permits </SectionTitle> <Paragraph position="0"> switching any category Am to Ae if Am ~ Ae. However, certain categories cannot be switched. Although all major categories can be switched, we must exclude the root node Sm. Obviously, if we permit Sm to be switched to Se, we can derive a sentence in Le starting with Sm in a trivial manner. 
Hence, we need the following constraint.</Paragraph> <Paragraph position="1"> (3.2.1) Root node Sm cannot be switched.</Paragraph> <Paragraph position="2"> Constraints on closed class items: (3.2.2) Certain closed class items such as tense, aux, and helping verbs, when they appear in the main VP, cannot be switched. Examples: (underlined items in the examples are from Lm).</Paragraph> <Paragraph position="3"> (3.1) mula khurcya rangawtat.</Paragraph> <Paragraph position="4"> boys chairs paint (3.2) mula khurcya paint kartat.</Paragraph> <Paragraph position="6"> In (3.2) the root verb has been switched from Marathi to English. The closed class item tat is not switched; however, it is attached to an auxiliary or helping verb kar since it cannot be stranded. This phenomenon appears in mixed sentences of other language pairs (see Pfaff (1979)). It is not possible to switch both the V and the tense in (3.1), and also not the entire VP.</Paragraph> <Paragraph position="7"> a sentence of Le) and then switching VPe to VPm, but this is not permitted by the asymmetry constraint. Of course, one cannot start with the Se node because this requires switching Sm to Se, which is blocked by the constraint on the switchability of the root node.</Paragraph> <Paragraph position="8"> (3.2.3) Closed class items (e.g., determiners, quantifiers, prepositions, possessive, aux, tense, helping verbs, etc.) cannot be switched. Thus, for example, DETm cannot be switched to DETe. This does not mean that a lexical item belonging to DETe cannot appear in the mixed sentence. It can indeed appear if NPm has already been switched to NPe and then NPe is expanded into DETe Ne according to Ge. (3.5) kahi khurcya DETm Nm (3.6) some chairs DETe Ne (3.7) kahi chairs DETm Ne (3.8) * some khurcya DETe Nm Adjectives are not closed classes; hence all four combinations below are possible. 
(3.9) unca peti (3.10) unca box (3.11) tall peti (3.12) tall box Note that (3.12) is a Marathi NPm in which both the Am and Nm have been switched. It is not derived from NPe; if it were, it would have a determiner. (Determiner is optional in Marathi).</Paragraph> <Paragraph position="9"> Prepositions and postpositions are closed class items. Marathi has postpositions while English has prepositions.</Paragraph> <Paragraph position="10"> (3.13) kahi khurcyawar (3.14) kahi chairswar (3.15) * some chairswar + (some chairs on) (3.16) * some chairs on (3.17) * kahi khurcya on (3.18) on some chairs (3.19) * on kahi khurcya (3.20) * war kahi khurcya (3.21) * war some chairs (3.2.3) Constraints on Complementizers: Complementizers are closed class items and therefore cannot be switched in the same sense as in (3.2.2) above. However, often we have a choice of a complementizer. This choice depends both on the matrix verb Vm and the embedded verb Ve (Vm ~ Ve) to which Vm has been switched. Let the complementizers of Vm be COMPm = {C1, C2, C3} and the complementizers of Ve (~ Vm) be COMPe = {C1', C2', C4'}, where C1 ~ C1' and C2 ~ C2'. Now if Vm is switched to Ve, i.e., the verb is lexically realized in the embedded language, then the choice of the complementizer is constrained in the following manner. Since complementizers are closed classes, they cannot be switched. Hence, the choice is C1, C2, or C3; however, only C1 and C2 are permitted, as the equivalent lexical verb Ve permits C1' and C2', which are the equivalents of C1 and C2 respectively. C3 is not permitted because its equivalent C3' is not permitted for Ve, and C4', which is the equivalent of C4, is not permitted because C4 is not allowed by Vm. Thus the only complementizers that are permitted, if Vm is switched to Ve, are those that are permitted by Vm and the equivalents of which are permitted by Ve (Vm ~ Ve). 
Thus the choice is constrained not only to the complementizers of Vm (because of the nonswitchability of complementizers) but is further constrained by the choice of complementizers of Ve, as explained above.</Paragraph> <Paragraph position="11"> + This is a problematic case which is discussed in detail in the longer version of this paper.</Paragraph> <Paragraph position="12"> The Marathi verb tharaw (decide) takes the complementizer ca (ing) but not the complementizer la (to). The corresponding English verb decide takes both the complementizers to and ing (after on). We now switch the Marathi verb Vm (tharaw) to Ve (decide) in both (3.22) and (3.23). Since the tense in the main VP cannot be switched (as we have seen in (3.1) and (3.2) earlier), a helping verb kar (do) has to be introduced so that the tense can be attached to it. Thus we have (3.24) tō parat jayca decide kartō. ca: ing. Gloss: he back going (tense). (3.25) * tō parat jayla decide kartō. la: to. Gloss: he to go (tense). Note that although decide takes both the complementizers to and ing, only (3.24) is allowed. (3.25) is blocked because the Marathi verb tharaw does not allow the complementizer la (to). Thus the only complementizer that appears in the mixed sentence is ca (ing).</Paragraph> <Paragraph position="13"> There are several interesting issues concerning the generation and recognition of sentences such as (3.24) and (3.25). For example, at what point is the decision to switch the main verb made? (We could have raised this issue earlier when we discussed (3.1) and (3.2)). Since a new helping verb has to be introduced when the switch is made, does it mean that some 'local' structural change has to be made along with the switching of the verb? Another point is that the choice of the complementizer (which comes before the matrix verb) also determines whether the verb can be switched or not. The machinery we have provided so far may have to be augmented to provide systematic answers to these questions. 
Thus, for example, we may have to introduce additional constraints on the switching rules.</Paragraph> </Section> </Section> <Section position="7" start_page="145" end_page="145" type="metho"> <SectionTitle> 4. PARSING CONSIDERATIONS </SectionTitle> <Paragraph position="0"> In this paper, we have given an account of the constraints on intrasentential code-switching in a generative framework. The formal model and the constraints on switching that we have proposed clearly have implications for the kind of parser we may be able to construct. We will not pursue this aspect in this paper. However, we would like to point out that by adopting some parsing strategies, we can account for some of the constraints described earlier. A preliminary attempt was made in Joshi (1981) by proposing a strategy involving a so-called left corner constraint. This strategy has some serious drawbacks, as was pointed out by Doron (1981). She has proposed an alternate strategy called the 'early determination strategy', according to which the parser tries to determine as early as possible the language of the major constituent it is currently parsing. Thus upon encountering a Marathi (m) determiner, i.e., DETm, the parser would predict a Marathi NPm. The Marathi Nm could then be realized lexically in Marathi, or the Nm would be switched to Ne and then lexically realized in English.</Paragraph> <Paragraph position="1"> NPm is expanded into DETm NOMm, where NOMm is expanded into Am Nm.</Paragraph> <Paragraph position="2"> Note that Am and Nm could be independently switched to Ae and Ne respectively, thus giving four possible sequences: DETm Am Nm, DETm Am Ne, DETm Ae Nm, DETm Ae Ne, all of which are permissible.</Paragraph> <Paragraph position="3"> If the parser encountered an English determiner, i.e., DETe, then it would predict NPe, but now Ne or Ae Ne, into which NPe can expand, cannot be switched to Nm or Am because of the asymmetry constraint. 
Thus the only permissible sequence is DETe (Ae) Ne, and the following are excluded: * DETe Nm, * DETe Ae Nm, * DETe Am Ne, * DETe Am Nm, which checks with the data.</Paragraph> <Paragraph position="4"> Of course, so far we have the same predictions as we had with the constraint on the nonswitchability of closed class items. However, there is some evidence that a parsing strategy as described above may be in operation. The following distribution is correctly predicted by the above strategy: (5.1) * tall petya (5.2) tall boxes (5.3) unca petya (5.4) unca boxes.</Paragraph> <Paragraph position="5"> (5.1) is disallowed because, upon encountering an English adjective Ae, the parser predicts NOMe, which is expanded into Ae Ne. However, Ne cannot be realized lexically in Marathi unless Ne is switched to Nm, which is disallowed. Note that (5.1) cannot be disallowed by invoking nonswitchability of adjectives, because these are not closed classes. This early determination strategy does not help, however, in accounting for the distribution of phrases involving postpositions (see Section 3).</Paragraph> <Paragraph position="6"> Our conclusion at present is that the framework described in Section 3, along with the constraints on closed class items, is the proper way to formulate the code-switching system. A parsing strategy as discussed above is perhaps also operative (see Examples (5.1) - (5.4)), and when a closed class item is the leftmost constituent of a major category, the two formulations make the same predictions.</Paragraph> </Section> </Paper>