TESTING THE PROJECTIVITY HYPOTHESIS 
Vladimir Pericliev 
Mathematical Linguistics Dpt 
Institute of Mathematics with Cemp Centre 
lll3 Sofia, bl.8, BULGARIA 
Ilarion Ilarionov 
Mathematics Dpt 
Higher Inst of Eng & Building 
Sofia, BULGARIA 
ABSTRACT 
The empirical validity of the projeetivity hypothesis 
for Bulgarian is tested. It is shown that the justi- 
fication of the hypothesis presented for other lan- 
guages suffers serious methodological deficiencies. 
Our automated testing, designed to evade such defi- 
ciencies~ yielded results falsifying the hypothesis 
for Bulgarian: the non-projective constructions stu- 
died were in fact grammatical rather than ungrammati- 
cal, as implied by the projeetivity thesis. Despite 
this, the projectivity/non-projectivity distinction 
itself has to be retained in Bulgarian syntax and, 
with some provisions, in the systems for automatic 
processing as well. 
1 THE PROJECTIVIrY HYPOTHESIS 
Projectivity is word order constraint in depen- 
dency grammars, which is analogous to continuous con- 
stituency within phrase-structure systems. In a pro- 
jective sentence, between two words connected by a 
dependency arc only such words can be positioned 
which are governed (directly or indirectly) by one of 
these words. Or, in other words, a sentence is pro- 
jective in case there are no intersections between 
arcs and projections in its dependency tree diagram. 
Thus, for instance, sentence (i) is projective, whe- 
reas sentence (2) is non-projective: 
He took the book He the took book 
We might note that sentence (2) is ungrammatical. 
The projectivity hypothesis, originally propounded 
by Lecerf (of. e.g. Lecerf 1960) and later gaining 
wide acceptence, amounts to the following: Natural 
languages are projective in the sense that the non- 
projective constructions in them are ungrammatical. 
And this has an important consequence. Thus, taking 
into account the self-evident fact that ungrammatical 
phrases do not occur in texts, in the processing of 
texts we can rule out from consideration the non-pro- 
jective parses on the basis of ungrammatioslity. Pro- 
jectivity thus serves as a filtering device, shown 
further to be of extremely powerful nature (op.oit.). 
To estimate the usefulness of the projectivity hy- 
pothesis for each particular language requires the 
conduct of extensive empirical testings. On the basis 
of statistical accounts from inspection of texts 
French was reported by Leoerf to be almost lO0."~ pro- 
jective. The same would be true, according to him, 
for other languages like German, Italian, Dutch etc., 
although the material available Cat the time) was not 
sufficient for statistical processing. English is al- 
so believed to be a projective language: in 30 000 
phrases only two non-projective ones were found (Har- 
per and Hays 1959); in Kareva (1965) somewhat diffe- 
rent, but still result in the same vein was obtained 
(using different notation): from lO 000 phrases of 
connected text 620 were found to be non-projective. 
Such investigations can be seen to be bound toge- 
ther by their a r~h to the testing of the pro- 
jectivity hypothesis: texts are explored and statis- 
tical accounts are made of the correlation between 
projective and non-projective phrases. The very rare 
occurrence in such texts of non-projective sentences 
is interpreted as a confirming evidence. Such studies 
represent what we shall furtheron refer to as "the 
textual approach to the testing of the projeotivity 
hypothesis" (or simply, "the textual approach"). 
2 DEFICIENCIES OF THE TEXTUAL APPROACH 
The textual approach, in addition to the fact that 
it involves the tedious task of inspection of thousa- 
nds of sentences, suffers serious methodological 
shortcomingswhich can be summarized as follows: 
(i) Irrelevancz of data. The data the textual app- 
roach presents in justification of the hypothesis is, 
strictly speaking, irrelevant. Knowing that non-pro- 
jective phrases do not occur in texts, naturally, 
gives us no formal right to infer that such phrases 
are ungrammatical as well. 
(ii) I~is_.u~fi_c~en_.c~ of data. The data provided by 
this approach is insufficient to justify even a __wea- 
ker claim to the effect that non-projective structur- 
es do not occur in texts. To justify this latter 
claim further steps in addition to direct inspection 
of certain (immaterially how large) corpora of texts 
should be made. In particular, a justifiable justifi- 
cation would have to involve both further factual 
confirmation (e.g. demonstration that predictions 
from the hypothesis in fact comply with actual data) 
and "systematic" confirmation (demonstration that the 
hypothesis is consistent with other linguistic prin- 
ciples, facts, etc.) (of. e.g. Baths 19Bl: Ch.9; al- 
so § 3 below). 
(iii) Heuristic futility. The textual approach is 
heuristically futile in the sense that, being confi- 
ned to a mere registration of non-projective constru- 
ctions within specific texts, we have no way of know- 
ing whether the structures encountered (if some are 
at all encountered) are all the non-projective struc- 
tures in a given language, and if not, how many more 
are there, and which exactly they are. 
3 TESTING THE PROOECTIVITY HYPOTHESIS FOR BULGARIAN 
The considerations given in § 2 seriously under- 
mine the credulousness of the results obtained for 
other languages following the textual approach. What 
was important for our investigation however was to 
evade these methodological deficiencies in the study 
of Bulgarian. Accordingly, we had to address not 
texts, but rather what we had to do was to generate 
all logically admissible non-projective structures 
56 
in Bulgarian, and then inspect them for grammatiea- 
lity. 
It was appropriate to aeeomplieh our testing in 
two phases: preliminar~ (manual) ~, in whieh 
the plausibility of the prejeotivity hypothesis was 
to be estimated for the Bulgarian language, and 
testinq ~i4ezr (automated tsstinA), in which the 
non-projective structures in Bulgarian were to be 
automatically generated, and then checked for gra- 
mmatieality/ungrammaticality. 
3.1 Preliminary testing 
The preliminary (manual) testing comprised: 
(i) factual testing, and (ii) systematic testing 
(cf. § z (ii)). 
In the factual testing it was inspected whether 
certain predictions from the projectivity hypothesis 
are consistent with actual data. l hat is, we take 
an arbitrery non-projective situation, say, a situa- 
tion of the form: 
(3) ./. 
X1 X2 X3 
and then, subatituting X1, X2, and X3 with approp- 
riate word classea, check whether the resultant con- 
struction is well-formed in Bulgarian or not. 
In the systematic testing it was inspected whe- 
ther the projectivity hypothesis in fact fits in 
with other known word order principles, rules, etc. 
(of univers~ql or language-specific nature). 
By way of illustration, consider the generally 
recognized universal principle: In all languages 
there exist classes of words occupying a rigidly 
fixed position in the sentence (the particular words 
and positions of course being language-specific). On 
inspection, this prineiple turns out to contradict 
the projectivity hypothesis. This is so, since such 
situations may occur in which this fixed position of 
certain words leads to non-projeetivity. Thus, one 
manifestation of this principle in Bulgarian syntax 
ia reflected in the fact that the verb sam 'be' ne- 
ver occurs in sentence-initial or eentenee-final po- 
sition. Now, assume that we have a three-word sen- 
tence containing be in which moreover: (a) be go- 
verna another word, X2, X2 being positioned to the 
left or right of be; and (b) X2 governs X3, X3 being 
obligatorily positioned to the right of X2. This 
being the ease, three structures are theoretically 
admissible, two projective and one non-projective: 
(4) ~ (5). (6) .~_.~ .... .~ .~' 
•, ~.~. ~_~, 
X2 X3 be be X2 X3 X2 be X3 
However, structures (4) and (5) will be ungrammati- 
eel, as predicted by the principle mentioned (notice 
the position of be). This latter fact, in turn, pre- 
dicts the grammatioality of the non-projective 
structure (6) (knowing of course that there is no- 
thing to forbid in Bulgarian the occurrence of three- 
word sentences containing be). As another illustra- 
tion, this mode of testing would have to lead to the 
discovery of non-projectivities of the type: "A j~_~- 
oedure is discussed whish..." in English which are 
due to the sentence-initial position of the subject 
in the English sentence. 
In summary, the results obtained from our preli- 
minary testing showed the implausibility of the hy- 
pothesis for Bulgarian: we easily found numerous and 
diverse kinds of counterexamples to it. We further 
noticed that the counterexamples belonged, informally 
speaking, to two stylistic layers which could be la- 
beled a8 stylistieally marked and stylistically un- 
marked. 
3.2 Testinq proper 
As a next step in our investigation, the non-pro- 
jective constructions in Bulgarian had to be generat- 
ed, and then assessed for well-formedneas. More spe- 
cifically, non-projectivity in triples and quadruples 
was to be examined (in so far aa non-projeotivity in 
more than four-words constructions is reducible to 
triples or quadruples). 
in triples, there are two possible non-projective 
situations, viz. (the mirror-images): 
(7) ./" _~ (a) ------'" . 
X\]. X2 X3 X3 X2 X1 
In quadruples,these nan-projective situations are 
30 in number. That is, the total number of non-pro= 
jeetive situations is 32. The number and content of 
constructions in Bulgarian conforming to these situa- 
tions will be language-specific, i.e. it will depend 
on the specific Bulgarian word classes and the po- 
ssibilities for their mutual positioning. E.g. the 
constructions conforming to situation (7) will be the 
set of all triples X1 X2 X3 such that X2 governs X1, 
X1 being positioned to the left of X2, and X1 governs 
X3, X3 being positioned to the riqht of X1. 
Then, a program was written in BASIC implemented 
on the Bulgarian microcomputer "Pravetz" (a machine 
compatible with Apple 11) which generated the con- 
structions conforming to the non-projective situa- 
tions, l he input to the program was a fragment of the 
dependency grammar for Bulgarian given in Pericliev 
1983. In particular, 30 rules were stored, each rule 
eoneiting of a pair of word classes, a master and a 
slave, and their mutual position(s).Fer obvious rea- 
sons the rules were not arbitrarily chosen, but ra- 
ther it was required that they be maximally diverse 
in syntactic nature. That is, they included pairs of 
notional and/or functional words (particles, pro- 
nouns/adverbs introducing clauses, paired conjunc- 
tions, duplicating parts of the sentence, etc.). The 
generated constructions were then inspected for well- 
formedness. 
The results from our experiment may be summarized 
as follows. From about 3\[)0 non-projective construc- 
tions generated, approximately 15% turned out to be 
ungrammatical. The remaining part of the construc- 
tions were grammatical. As already expected, they 
could be classed into two groups according to their 
stylistic value: stylistically unmarked and stylis- 
tically marked constructions. 
The unmarked constructions, informally speaking, 
included diverse kinds oi' structures: some questions 
(with the question particle li 'do' or with litoge- 
thor with a notional questioning word), some excla- 
matory sentences (with structure of questions), di~ 
57 
fferent complex sentences Ca word belonging to some 
subordinate clause, most often objective and attribu- 
tive clause, is positioned somewhere in the main 
clause), sentences containing clitice (be, short po- 
ssessive and dative pronouns, etc.), various constru- 
ctions with "strongly linked" parts (paired conjun- 
ctions/particles, duplicating parts, Bulgarian equi- 
valents of more ... than, such ... that , etc.) and 
many othera~"T'he rati-~"6&tw-6e-~styli-~&lly unmar- 
ked and stylistically marked constructions was 
about 1:5. 
4 DISCUSSION OF RESULTS 
In principle, a hypothesis, in empirical scien- 
ces such as linguistics, may be said to be: Ca) ab- 
solutely true (i.e. true without exceptions), (b) on 
the whole true (i.e. true, but with certain excep- 
tions), and (e) false. 
The projectivity hypothesis would hove been usa- 
ble as a filtering device (in Bulgarian) if it fell 
under cases (a) or (b), ease (b) presupposing fur- 
ther that there is a list available of all exceptio- 
ns. The results obtained however unambiguously class 
it under case (c). Indeed, the great majority of non- 
projective constructions in Bulgarian are well-formed 
rather than unacceptable, as implied by the hypothe- 
sis. This seems to be the only correct conclusion, 
despite the fact that the results themselves should 
not be considered as absolutely final, this being 
due to the following circumstances: (1) We did not 
in fact inspect literally all admissible non-pro- 
jectivitiea but only those that were obtainable from 
the fragment of grammar stored for the experiment; 
and (2) the presense/absense of projectivity signi- 
ficantly depends on the conventions chosen for de- 
pendency arcs distribution. Still, our investigation 
on the whole is to be viewed as sufficiently reliab- 
le (another experimental testing in which a larger 
grammar was used end slightly different notation 
where linguistically justified gave similar results). 
Another point deserves special attention. Des- 
pite the fact that non-projective structures in Bul- 
garian are grammatical, the prevalent part of them 
are "stylistically marked". Whatever that means, the 
latter circumstance implies that, quite probably, 
such constructions will not occur in some kinds of 
texts at least. Our chances to pinpoint such texts 
to a great extent depend on our understanding of the 
constructions in question (the label "stylistic mar- 
kedness" in itself is no great progress in this di- 
rection). So we focused our attention on these con- 
structions, having already completed the experimen- 
tal testing described. The preliminary results of 
this latter study were quite intereatingi non-pro- 
jeetivity in fact turned out to finely describe 
sentences with a special type of logical emphasis in 
Bulgarian, characterized by: 
(i) presense of both non-projectivity and con- 
trastive ~tress (CS) (in contrast to other emphatic 
sentences in Bulgarian, having a CS and "normal", 
projective word order); 
(ii) preaense of CS on, on one of the words 
(immaterially which) from the non-projective "inter~ 
val" (an "interval" is constituted by a pair of 
words connected by an arc such that intersects with 
a projection); the non-projective constructions in 
this sense contrast with other emphatic constructions 
with projective word order , which can take a CS on 
58 
on, word in them; 
(iii) having synonymous constructions of two 
types: with a projective word order and CS, or such 
analogous to the English cleft sentences ("It is he 
who ..., where the underlined words are connected by 
an----are); perhaps, not by coincidence the latter con- 
structione, both in Bulgarian and in English, are 
non-projective; 
(iv) having a relation to the topic-comment dis- 
tinction; more specifically, one of the words from 
the non-projective interval (immaterially which) is 
necessarily the comment of the sentence (though, na- 
turally, not each comment requires non-projeetivity). 
These findings rehabilitate the projectivity/ 
/non-projectivity distinction (thogh not the hypothe- 
sis) for Bulgarian syntactic the~: non-projectivi- 
ty happens to be the word order part of s very gene- 
ral mechanism for formation of sentences with lo~ 
gical emphasis. Thus, though Bulgarian is generally 
characterized as a free word order language, senten- 
ces with logical emphasis are not with a "free", 
but rather with a precisely specified word order 
coinciding with non-projectivity. 
As to the need to retain the above distinction 
in syntactic processors of Bulgarian, things will 
be determined by the concrete applied goals: some 
small applications can certainly ignore non-projec- 
tive structures (their emphatic part), whereas ro- 
bust systems cannot do without them. That is, the 
former systems would have to keep the distinction 
and the latter reject it. Still, it will not be sur- 
prising if the distinction might be usefully acco- 
modated even in the latter case. However, this needs 
further investigations, mostly in the line of textu- 
al linguistics: the study of context, topic/comment, 
etc. One is reminded in this train of thought of 
the remark of the philosopher K. Popper to the 
effect that a scientific investigation begins and 
ends with problems. 
REFERENCES 

Botha R., 1981, The Conduct of Linquia_tic I__nn- 
u q~, Mouton: The Hague. 

Harper K. and D. Hays, 1959, The use of machines 
in the construction of a grammar and computer prog- 
ram for structural analysis, Rapport UNESCO. 

Kareva N.~ 1965, A classification of non-projec- 
tive constructions, Scientific ~ Technical Informa- 
tio___En , No.4 (in Russian). 

Leeerf Y., 1960, Programme dee conflits, mod&le 
des conflits, Tr_aduction automati__2.~, l, No.4° 

Perioliev V., 1983, ~~in Bul~ 
~and in En liq~ (Ph.D. Dissertation; in Bul- 
garian)~ 
