<?xml version="1.0" standalone="yes"?> <Paper uid="J87-1003"> <Title>SIMULTANEOUS-DISTRIBUTIVE COORDINATION AND CONTEXT-FREENESS</Title> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 MATHEMATICAL TECHNIQUES FOR ESTABLISHING TRANS-CONTEXT-FREENESS </SectionTitle> <Paragraph position="0"> We shall rely here on a number of established mathematical results which, taken together, give us a way of establishing trans-context-freeness for a language with certain syntactic properties.</Paragraph> <Paragraph position="1"> Theorem 1 (Bar-Hillel et al. 1961) The set of context-free languages is closed under homomorphism.</Paragraph> <Paragraph position="2"> Theorem 2 (Interchange Lemma, Ogden et al.</Paragraph> <Paragraph position="3"> 1985) Let L be a CFL, and let $L_n$ be the set of length-n strings in L. Then there is a constant $C_L$ such that for any n, any nonempty subset $Q_n$ of $L_n$, and any integer m such that $n \ge m \ge 2$, the following holds: Let $k = \lceil \|Q_n\| / (C_L n^2) \rceil$, where $\lceil x \rceil$ denotes x rounded up to the nearest integer, and $\|Q_n\|$ is the cardinality of $Q_n$. Then there are k distinct strings $z_1, \ldots, z_k$ in $Q_n$ such that $z_i$ can be written $w_i x_i y_i$ for $1 \le i \le k$, and: (i) $|w_i| = |w_j|$ for all $i, j \le k$; (ii) $|y_i| = |y_j|$ for all $i, j \le k$; (iii) $m \ge |x_i| &gt; m/2$; (iv) $|x_i| = |x_j|$ for all $i, j \le k$; and (v) $w_i x_j y_i \in L$ for all $i, j \le k$.</Paragraph> <Paragraph position="4"> Since this result is likely to be unfamiliar to some readers, we shall provide some commentary that should prove helpful in following the remainder of the presentation. Less forbiddingly stated, the Interchange Lemma says (in part) that in a CFL it is possible, for any n, to find at least two strings of length n with internal parts of the same length that can be exchanged for each other to produce strings that are also in L, provided that there are at least two distinct strings of length n.
This makes it possible to prove the trans-context-freeness of a certain kind of language (what kind will be stated in a moment) by showing that once n becomes sufficiently large, the possibility of interchange no longer exists. For this strategy to work, it is required that the cardinality of $L_n$ grow very rapidly as a function of n, which we can illustrate with the case of the copying language over the vocabulary {a, b} (call this language CP): For every even $n \ge 2$, $\|CP_n\| = 2^{n/2}$, thus growing exponentially, while $C_{CP} n^2$ grows only polynomially; the necessary conditions for use of the Interchange Lemma to prove trans-context-freeness are thus satisfied by this case. Making use of the Interchange Lemma, we have proved the following further result: Theorem 3.</Paragraph> <Paragraph position="5"> Let $H = \{xy \mid x \in \{a, b\}^*, y \in \{c, d\}^*, y = h(x)\}$, where $h(a) = c$ and $h(b) = d$, and let $G = \{xy \mid x \in \{a, b\}^*, y \in \{c, d\}^*, |x| \ne |y|\}$. Then any set I is trans-CF if I is a superset of H and a subset of $G \cup H$.</Paragraph> <Paragraph position="6"> The proof is presented as an appendix to the paper. As an immediate corollary, we obtain the following result: 26 Computational Linguistics, Volume 13, Numbers 1-2, January-June 1987 Michael B. Kac, Alexis Manaster-Ramer, William C. Rounds Simultaneous-Distributive Coordination and Context-Freeness Theorem 4.</Paragraph> <Paragraph position="7"> Let G, H, and I be as defined in Theorem 3, and let $J = \{xy \mid x \in \{a, b\}^*, y \in \{c, d\}^*, |x| = |y|\}$ and $K = J - H$. Then any set L containing H and disjoint from K is trans-CF.</Paragraph> <Paragraph position="8"> Proof: Intersect L with the regular set $M = \{xy \mid x \in \{a, b\}^*, y \in \{c, d\}^*\}$ and let $N = L \cap M$. H is a subset of N, K is disjoint from N, and the only other strings that might be in N are those in G.</Paragraph> <Paragraph position="9"> Therefore, N contains H and some subset (possibly empty) of G, and is trans-CF by Theorem 3.
Since the intersection of any CFL with any regular set is CF, and N is not CF, L is trans-CF. ∎ Theorem 4 says that an arbitrary language L is trans-CF if it meets the following conditions: * It includes H.</Paragraph> <Paragraph position="10"> * It excludes every string not in H that is nonetheless divisible into two equal parts, the first over {a, b} and the second over {c, d}. (This is the set K.) In order to apply these results, we will actually need to consider not quite the sets G through N as defined above but the corresponding sets G' through N', where the latter differ from the former in including only strings whose x and y parts are of at least length 2. The subtraction of a finite subset obviously changes nothing essential, and Theorems 3 and 4 will hold, mutatis mutandis, of sets G' through N'.</Paragraph> <Paragraph position="11"> Hence: Theorem 5.</Paragraph> <Paragraph position="12"> Let H' and K' be as defined above. Then any set L' containing H' and disjoint from K' is trans-CF. Our strategy in applying these results to English will be to show that there is a subset F of English that can be homomorphically mapped to some L', and that F is the intersection of English with a regular language. This suffices to show that English itself is trans-CF.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 GRAMMATICAL NUMBER AGREEMENT IN ENGLISH </SectionTitle> <Paragraph position="0"> Our empirical argument rests on the claim that number agreement between reflexive pronouns and their antecedents is a syntactic phenomenon in English. For example, the string (2) *The girl likes themselves.</Paragraph> <Paragraph position="1"> must be considered ungrammatical rather than merely semantically ill-formed by virtue of the impossibility (because of number incompatibility) of supplying an intraclausal antecedent for the reflexive pronoun.
The reason that this is so has to do with a fact about grammatical number in English that has not been generally recognized; namely, that it is, like grammatical gender in languages such as French and German, partly arbitrary.</Paragraph> <Paragraph position="2"> This can be shown by a number of different kinds of examples, among them the following. First, there are synonym pairs in English, each consisting of a grammat- * a mountain range in Central Asia ** as used in the garment trade. Further, these examples can be elaborated in various ways. For example, names of some mountain ranges are strictly singular (Caucasus, Hindu Kush), while those of others are strictly plural (Alps, Rockies); items from similar semantic fields may vary as to their grammatical number properties (compare odds-probability, wheat-oats, yoghurt-curds, pasta-noodles, mush-grits (in some dialects), Granola-Rice Krispies). Note further that there is dialect variation regarding the grammatical number of certain collective nouns (such as government and company), which are strictly singular in American English, but which can be used as plurals in British English.</Paragraph> <Paragraph position="3"> A further phenomenon on which we shall capitalize is the existence in English of an idiomatic way of expressing the ease with which an activity can be performed involving the use of reflexive constructions, as illustrated by, for example, This land will rent itself, or These woods will sell themselves.
With this in mind, compare now (3) This land and these woods can be expected to rent itself and sell themselves respectively.</Paragraph> <Paragraph position="4"> (4) *This land and these woods can be expected to rent themselves and sell itself respectively.</Paragraph> <Paragraph position="5"> It is clear that strings like (3), in which each reflexive pronoun agrees with the corresponding noun, are grammatical, while those like (4), in which each pronoun disagrees with the corresponding noun, are not. This fact will be the basis of our demonstration that English is trans-CF.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 ARGUMENT REGAINED </SectionTitle> <Paragraph position="0"> Let A = {{this land, these woods}$^+$ and {this land, these woods} can be expected to {{rent, sell} {itself, themselves}}$^+$ and {{rent, sell} {itself, themselves}} respectively}, and note that A is regular. Now let B be the subset of A that satisfies the following condition: In case the number of occurrences of members of {this land, these woods} is equal to the number of occurrences of members of {itself, themselves}, then for all $i \ge 1$, if the ith noun in the string is land, then the ith pronoun is itself and if the ith noun is woods, then the ith pronoun is themselves (i.e., number agreement obtains between the nouns and the reflexive pronouns).</Paragraph> <Paragraph position="1"> Now let C = A - B. Every string in C contains exactly as many pronouns as it does nouns, but for some $i \ge 1$, the ith pronoun fails to agree in number with the ith noun.
Finally, let D be the subset of B consisting of just those strings that contain exactly as many nouns as pronouns.</Paragraph> <Paragraph position="2"> It is clear that D is part of English, and that C is disjoint from English, inasmuch as D exhibits the required number agreement and C does not. If D were the intersection of English with the regular set A, then the result that English is trans-CF would follow immediately. 4 However, things are not that simple, and it is conceivable that the intersection of English with A is some proper superset F of D that is a subset of B (possibly B itself). The point of uncertainty here is the status of strings of A with unequal numbers of occurrences of nouns and pronouns, such as the following: (5) This land, these woods, and this land can be expected to rent itself and sell themselves respectively. (6) This land and these woods can be expected to rent itself, sell themselves, and rent itself respectively.</Paragraph> <Paragraph position="3"> While it might seem that such strings are ungrammatical, this assumption is called into question by Pullum and Gazdar's (1982) observation that there is no syntactic constraint in English governing the number of conjuncts in SD-coordinations. Thus, contrary to conventional wisdom, there are perfectly well-formed SD-coordinations with mismatched numbers of conjuncts; for example, The last two people in this picture live in Columbus and Chicago respectively. This undermines a number of older arguments that English is trans-CF that crucially assume that grammatical SD-coordinations must have equal numbers of conjuncts.
In light of this, it may be that strings like (5-6) are to be considered syntactically well-formed, albeit lacking sensible interpretations, and so an argument presupposing the contrary cannot be used to show that English is trans-CF.</Paragraph> <Paragraph position="4"> We now show that Theorems 3, 4, and 5 allow us to get around this obstacle by, in effect, ignoring the strings with mismatched numbers of conjuncts in constructing our argument. If strings like (5-6) are grammatical, then the intersection of English with the regular language A is not the trans-CF language D but some proper superset F thereof. Theorem 5 tells us in effect that, no matter what its exact identity, if there is a sublanguage of English homomorphic to H' but none homomorphic to K', then English is trans-CF. This &quot;separation&quot; strategy yields the conclusion that so long as English contains D and excludes C, it is trans-CF.</Paragraph> <Paragraph position="5"> Recall now that, according to our definition, B includes strings like (5-6), along with sentences like (3) but, crucially, excludes (4) and all strings like it. To be precise, B includes all strings of A which, like (3), have exactly as many nouns as pronouns and cross-serial number agreement, but also all strings in A that, like (5) and (6), have more pronouns than nouns or vice versa.</Paragraph> <Paragraph position="6"> In virtue of Theorem 5, we will be able to show that English is trans-CF no matter what position we take on the grammaticality of strings like (5-6), so long as there are no English sentences in the subset C of A consisting of strings in which there are as many nouns as pronouns but at least one of the pronouns fails to agree with the corresponding noun in the first part of the string.
Thus, the intersection of English with A is some subset F of B that is disjoint from C; it is of no consequence whether F is equal to all of B, or only to D, or to some proper subset of B that is a proper superset of D.</Paragraph> <Paragraph position="7"> We now define the homomorphism h such that</Paragraph> <Paragraph position="9"> This homomorphism maps F to L'. Since the CFLs are closed under homomorphism, and L' is trans-CF, F is trans-CF. And since the CFLs are also closed under intersection with regular sets, and the intersection of English with the regular set A has turned out to be trans-CF, it follows that English is also trans-CF.</Paragraph> <Paragraph position="10"> It should be immediately apparent that a similar strategy can be applied in instances such as the one mentioned in Section 1, where grammatical gender agreement is involved, provided that the language in question has instances of arbitrary gender. So, for example, we can construct for French a sublanguage parallel to D consisting of sentences like (7) Cette nation et ce pays sont respectivement une alliée et un associé des Etats-Unis.</Paragraph> <Paragraph position="11"> 'This nation and this country are respectively an ally and a partner of the United States.' In this example, we capitalize on the fact that the inanimate nouns nation 'nation' and pays 'country' belong to different gender classes, reflected in the predicate nominals une alliée 'an ally' and un associé 'a partner'. 5 Inversion of the predicate nominals yields an ungrammatical string, but corresponding inversion of the subjects restores grammaticality: (8) *Cette nation et ce pays sont respectivement un associé et une alliée des Etats-Unis.</Paragraph> <Paragraph position="12"> (9) Ce pays et cette nation sont respectivement un associé et une alliée des Etats-Unis.</Paragraph> <Paragraph position="13"> Comparable examples can be constructed in other languages, a case in point being the Polish sentence (10) Francja i Kongo są przeciwniczką względnie zwolennikiem traktatu.</Paragraph> <Paragraph position="14"> which translates literally as 'France and (the) Congo are opponent respectively supporter (of the) treaty'; here, as in the French example, the two nouns in the subject phrase are of different grammatical genders, and are matched with corresponding gender-compatible nouns in the predicate phrase. Example (11) is ungrammatical (notice the suffixes of the predicate nouns) while (12) is grammatical again.</Paragraph> <Paragraph position="15"> (11) *Francja i Kongo są zwolennikiem względnie przeciwniczką traktatu.</Paragraph> <Paragraph position="16"> (12) Kongo i Francja są zwolennikiem względnie przeciwniczką traktatu.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 CONCLUSION </SectionTitle> <Paragraph position="0"> We would like to close by pointing out the importance and interest of the following question: Given that it is logically possible for a language to have an operator just like English respectively except that the conjuncts to be linked with each other are paired in center-embedding rather than cross-serial fashion, and given that the properties of such an operator can be characterized by formal apparatus apparently more elementary than what is required to characterize respectively, why do operators of this seemingly more elementary type appear not to exist in any natural language?
The use of apparently is important here: from the standpoint of the Chomsky hierarchy, nesting is less complex than mutual intercalation, in the sense that the type of grammar required to handle the former is more restricted than the type required to handle the latter, but the possibility is always open that this type of complexity is not germane to human psychological capacity. We strongly suspect that this is the case (see Manaster-Ramer and Kac 1985), though that is a topic for another time.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> APPENDIX: PROOF OF THEOREM 3 </SectionTitle> <Paragraph position="0"> Assume that $I \subseteq G \cup H$ is context-free and apply the Interchange Lemma to $Q_{2n} = \{x h(x) \mid x \in \{a, b\}^n\}$. Since $H \subseteq I$, $Q_{2n} \subseteq I_{2n}$, the set of length-2n strings in I. Choose n suitably large (for the exact choice, see the proof of Claim 1 below), and let m = n. Let $k = \lceil \|Q_{2n}\| / (C_I (2n)^2) \rceil$ be the number defined in the lemma. We get k distinct $z_i$ in $Q_{2n}$ satisfying the conclusion of the lemma. Our result will follow from the next two claims: Claim 1. Let $x_1, \ldots, x_k$ be the middle parts of $z_1, \ldots, z_k$ respectively. Then there are i and j such that $x_i \ne x_j$, provided that n is suitably chosen.</Paragraph> <Paragraph position="1"> Claim 2. If $x_i \ne x_j$, then $w_i x_j y_i$ is not in $G \cup H$ and, a fortiori, not in I.</Paragraph> <Paragraph position="2"> Proof of Claim 1. Suppose that all the $x_i$ were equal. By (iii) of the Interchange Lemma, $|x_i| &gt; n/2$. Therefore, at least n/4 characters from the $x_i$ are in the {a, b} half or in the {c, d} half of $z_i$. We may assume the former possibility since the argument works exactly the same way in the latter case. Each $z_i$ is determined by the n characters of its {a, b} half and, by supposition, at least n/4 of these characters are fixed. So there can be at most $2^{n - n/4} = 2^{3n/4}$ strings $z_1, \ldots, z_k$.
But $\|Q_{2n}\| = 2^n$, and if n is chosen so that $2^{3n/4} &lt; 2^n / (C_I (2n)^2) = \|Q_{2n}\| / (C_I (2n)^2)$, we contradict the Interchange Lemma, which says that all the z's are distinct. Note that n can always be chosen this way, no matter what $C_I$ is, by elementary inequalities from college algebra and calculus.</Paragraph> <Paragraph position="3"> Proof of Claim 2. Observe first that interchanging $x_i$ with $x_j$ does not produce a string in G, since the {a, b} and {c, d} parts of the result still have equal length. The substrings $x_i$ and $x_j$ disagree in some position, which is also a position in one half or the other of the strings $z_i$ and $z_j$. Thus the matching position in $z_i$ and $z_j$ (in the other half of the word) does not occur in $x_i$ or $x_j$ because $|x_i| = |x_j| \le n$, by (iii) of the Interchange Lemma, so interchanging $x_i$ and $x_j$ produces a string not in H.</Paragraph> <Paragraph position="4"> Since Claims 1 and 2 together contradict clause (v) of the Interchange Lemma, I cannot be context-free. ∎</Paragraph> </Section> </Paper>