<?xml version="1.0" standalone="yes"?> <Paper uid="C86-1011"> <Title>Ilarion Ilarionov Mathematics Dpt Higher Inst of Eng &amp; Building</Title> <Section position="1" start_page="0" end_page="0" type="metho"> <SectionTitle> TESTING THE PROJECTIVITY HYPOTHESIS </SectionTitle> <Paragraph position="0"/> </Section> <Section position="2" start_page="0" end_page="0" type="metho"> <SectionTitle> ABSTRACT </SectionTitle> <Paragraph position="0"> The empirical validity of the projectivity hypothesis for Bulgarian is tested. It is shown that the justification of the hypothesis presented for other languages suffers from serious methodological deficiencies.</Paragraph> <Paragraph position="1"> Our automated testing, designed to avoid such deficiencies, yielded results falsifying the hypothesis for Bulgarian: the non-projective constructions studied were in fact grammatical rather than ungrammatical, as the projectivity thesis implies. Despite this, the projectivity/non-projectivity distinction itself has to be retained in Bulgarian syntax and, with some provisions, in systems for automatic processing as well.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="metho"> <SectionTitle> 1 THE PROJECTIVITY HYPOTHESIS </SectionTitle> <Paragraph position="0"> Projectivity is a word order constraint in dependency grammars, analogous to continuous constituency within phrase-structure systems. In a projective sentence, the only words that can be positioned between two words connected by a dependency arc are words governed (directly or indirectly) by one of these two words. 
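The condition just stated can be checked mechanically. What follows is a minimal sketch in Python, not the procedure used in the paper: the encoding of a sentence as a list of governor positions, the function names, and the dependency analysis assumed for the two example sentences (took governs He and book; book governs the) are illustrative assumptions.

```python
def governs(heads, ancestor, node):
    """Return True if `ancestor` governs `node` directly or indirectly."""
    current = heads[node]
    while current is not None:
        if current == ancestor:
            return True
        current = heads[current]
    return False


def is_projective(heads):
    """Check the projectivity condition for every dependency arc.

    heads[i] is the position of the governor of word i (None for the root).
    Assumption: the list encodes a well-formed tree (one root, no cycles).
    """
    for dependent, head in enumerate(heads):
        if head is None:
            continue
        left, right = min(dependent, head), max(dependent, head)
        # Every word strictly between the two ends of the arc must be
        # governed by one of the two words the arc connects.
        for between in range(left + 1, right):
            if not (governs(heads, head, between)
                    or governs(heads, dependent, between)):
                return False
    return True


# "He took the book" (assumed analysis: took governs He and book; book governs the)
sentence_1 = [1, None, 3, 1]
# "He the took book": the same dependencies in a different word order
sentence_2 = [2, 3, None, 2]

print(is_projective(sentence_1))  # True
print(is_projective(sentence_2))  # False
```

In the second word order the arc between the and book spans took, which neither of the two words governs, so the check fails.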
Stated differently, a sentence is projective if its dependency tree diagram contains no intersections between arcs and projections.</Paragraph> <Paragraph position="1"> Thus, for instance, sentence (1) is projective, whereas sentence (2) is non-projective: (1) He took the book; (2) He the took book. We might note that sentence (2) is ungrammatical.</Paragraph> <Paragraph position="2"> The projectivity hypothesis, originally propounded by Lecerf (cf. e.g. Lecerf 1960) and later gaining wide acceptance, amounts to the following: Natural languages are projective in the sense that the non-projective constructions in them are ungrammatical. This has an important consequence. Given the self-evident fact that ungrammatical phrases do not occur in texts, in the processing of texts we can rule out the non-projective parses on the basis of ungrammaticality. Projectivity thus serves as a filtering device, shown further to be extremely powerful (op. cit.).</Paragraph> <Paragraph position="3"> Estimating the usefulness of the projectivity hypothesis for each particular language requires extensive empirical testing. On the basis of statistical accounts from inspection of texts, French was reported by Lecerf to be almost 100% projective. The same would be true, according to him, for other languages like German, Italian, Dutch etc., although the material available (at the time) was not sufficient for statistical processing. 
English is also believed to be a projective language: in 30 000 phrases only two non-projective ones were found (Harper and Hays 1959); in Kareva (1965) a somewhat different result, though still in the same vein, was obtained (using different notation): of 10 000 phrases of connected text, 620 were found to be non-projective.</Paragraph> <Paragraph position="4"> Such investigations are bound together by their approach to the testing of the projectivity hypothesis: texts are explored and statistical accounts are made of the relative frequency of projective and non-projective phrases. The very rare occurrence of non-projective sentences in such texts is interpreted as confirming evidence. Such studies represent what we shall henceforth refer to as &quot;the textual approach to the testing of the projectivity hypothesis&quot; (or simply, &quot;the textual approach&quot;).</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 2 DEFICIENCIES OF THE TEXTUAL APPROACH </SectionTitle> <Paragraph position="0"> The textual approach, in addition to involving the tedious task of inspecting thousands of sentences, suffers from serious methodological shortcomings, which can be summarized as follows: (i) Irrelevance of data. The data the textual approach presents in justification of the hypothesis is, strictly speaking, irrelevant. Knowing that non-projective phrases do not occur in texts gives us, naturally, no formal right to infer that such phrases are ungrammatical as well.</Paragraph> <Paragraph position="1"> (ii) Insufficiency of data. The data provided by this approach is insufficient to justify even the weaker claim that non-projective structures do not occur in texts. To justify this latter claim, further steps should be taken in addition to direct inspection of certain corpora of texts (no matter how large). 
In particular, an adequate justification would have to involve both further factual confirmation (e.g. demonstration that predictions from the hypothesis in fact comply with actual data) and &quot;systematic&quot; confirmation (demonstration that the hypothesis is consistent with other linguistic principles, facts, etc.) (cf. e.g. Botha 1981: Ch. 9; also § 3 below).</Paragraph> <Paragraph position="2"> (iii) Heuristic futility. The textual approach is heuristically futile in the sense that, being confined to a mere registration of non-projective constructions within specific texts, we have no way of knowing whether the structures encountered (if any are encountered at all) are all the non-projective structures in a given language and, if not, how many more there are and exactly which they are.</Paragraph> </Section> <Section position="5" start_page="0" end_page="57" type="metho"> <SectionTitle> 3 TESTING THE PROJECTIVITY HYPOTHESIS FOR BULGARIAN </SectionTitle> <Paragraph position="0"> The considerations given in § 2 seriously undermine the credibility of the results obtained for other languages following the textual approach. What was important for our investigation, however, was to avoid these methodological deficiencies in the study of Bulgarian. Accordingly, rather than inspecting texts, we had to generate all logically admissible non-projective structures in Bulgarian and then inspect them for grammaticality. It was appropriate to accomplish our testing in two phases: preliminary (manual) testing, in which the plausibility of the projectivity hypothesis was to be estimated for the Bulgarian language, and testing proper (automated testing), in which the non-projective structures in Bulgarian were to be automatically generated and then checked for grammaticality/ungrammaticality.
null</Paragraph> <Section position="1" start_page="56" end_page="56" type="sub_section"> <SectionTitle> 3.1 Preliminary testing </SectionTitle> <Paragraph position="0"> The preliminary (manual) testing comprised: (i) factual testing, and (ii) systematic testing (cf. SS z (ii)).</Paragraph> <Paragraph position="1"> In the factual testing it was inspected whether certain predictions from the projectivity hypothesis are consistent with actual data. l hat is, we take an arbitrery non-projective situation, say, a situation of the form:</Paragraph> <Paragraph position="3"> and then, subatituting X1, X2, and X3 with appropriate word classea, check whether the resultant construction is well-formed in Bulgarian or not.</Paragraph> <Paragraph position="4"> In the systematic testing it was inspected whether the projectivity hypothesis in fact fits in with other known word order principles, rules, etc.</Paragraph> <Paragraph position="5"> (of univers~ql or language-specific nature).</Paragraph> <Paragraph position="6"> By way of illustration, consider the generally recognized universal principle: In all languages there exist classes of words occupying a rigidly fixed position in the sentence (the particular words and positions of course being language-specific). On inspection, this prineiple turns out to contradict the projectivity hypothesis. This is so, since such situations may occur in which this fixed position of certain words leads to non-projeetivity. Thus, one manifestation of this principle in Bulgarian syntax ia reflected in the fact that the verb sam 'be' never occurs in sentence-initial or eentenee-final position. Now, assume that we have a three-word sentence containing be in which moreover: (a) be governa another word, X2, X2 being positioned to the left or right of be; and (b) X2 governs X3, X3 being obligatorily positioned to the right of X2. This being the ease, three structures are theoretically admissible, two projective and one non-projective: (4) ~ (5). 
(6) .~_.~ .... .~ .~' *, ~.~. ~_~, X2 X3 be be X2 X3 X2 be X3 However, structures (4) and (5) will be ungrammatieel, as predicted by the principle mentioned (notice the position of be). This latter fact, in turn, predicts the grammatioality of the non-projective structure (6) (knowing of course that there is nothing to forbid in Bulgarian the occurrence of three-word sentences containing be). As another illustration, this mode of testing would have to lead to the discovery of non-projectivities of the type: &quot;A j~_~oedure is discussed whish...&quot; in English which are due to the sentence-initial position of the subject in the English sentence.</Paragraph> <Paragraph position="7"> In summary, the results obtained from our preliminary testing showed the implausibility of the hypothesis for Bulgarian: we easily found numerous and diverse kinds of counterexamples to it. We further noticed that the counterexamples belonged, informally speaking, to two stylistic layers which could be labeled a8 stylistieally marked and stylistically unmarked. null</Paragraph> </Section> <Section position="2" start_page="56" end_page="57" type="sub_section"> <SectionTitle> 3.2 Testinq proper </SectionTitle> <Paragraph position="0"> As a next step in our investigation, the non-projective constructions in Bulgarian had to be generated, and then assessed for well-formedneas. More specifically, non-projectivity in triples and quadruples was to be examined (in so far aa non-projeotivity in more than four-words constructions is reducible to triples or quadruples).</Paragraph> <Paragraph position="1"> in triples, there are two possible non-projective situations, viz. (the mirror-images):</Paragraph> <Paragraph position="3"> In quadruples,these nan-projective situations are 30 in number. That is, the total number of non-pro= jeetive situations is 32. The number and content of constructions in Bulgarian conforming to these situations will be language-specific, i.e. 
it will depend on the specific Bulgarian word classes and the possibilities for their mutual positioning. E.g. the constructions conforming to situation (7) will be the set of all triples X1 X2 X3 such that X2 governs X1, X1 being positioned to the left of X2, and X1 governs X3, X3 being positioned to the right of X1.</Paragraph> <Paragraph position="4"> Then, a program was written in BASIC, implemented on the Bulgarian microcomputer &quot;Pravetz&quot; (a machine compatible with the Apple II), which generated the constructions conforming to the non-projective situations. The input to the program was a fragment of the dependency grammar for Bulgarian given in Pericliev 1983. In particular, 30 rules were stored, each rule consisting of a pair of word classes, a master and a slave, and their mutual position(s). For obvious reasons the rules were not arbitrarily chosen; rather, it was required that they be maximally diverse in syntactic nature. That is, they included pairs of notional and/or functional words (particles, pronouns/adverbs introducing clauses, paired conjunctions, duplicating parts of the sentence, etc.). The generated constructions were then inspected for well-formedness. The results from our experiment may be summarized as follows. Of about 300 non-projective constructions generated, approximately 15% turned out to be ungrammatical. The remaining constructions were grammatical. 
As already expected, they could be classed into two groups according to their stylistic value: stylistically unmarked and stylistically marked constructions.</Paragraph> <Paragraph position="5"> The unmarked constructions, informally speaking, included diverse kinds of structures: some questions (with the question particle li 'do', or with li together with a notional questioning word), some exclamatory sentences (with the structure of questions), different complex sentences (a word belonging to some subordinate clause, most often an objective or attributive clause, is positioned somewhere in the main clause), sentences containing clitics (be, short possessive and dative pronouns, etc.), various constructions with &quot;strongly linked&quot; parts (paired conjunctions/particles, duplicating parts, Bulgarian equivalents of more ... than, such ... that, etc.) and many others. The ratio between stylistically unmarked and stylistically marked constructions was about 1:5.</Paragraph> </Section> </Section> </Paper>