SYNTACTIC PATTERNS IN A SAMPLE 
OF TECHNICAL ENGLISH 
The Importance of the Concept of Homogeneity 
A fundamental assumption of statistical linguistics 
is that there are differences worthy of note in the fre- 
quency of various units in certain texts. At the same 
time, there are differences in frequencies which would 
not be considered important. The question is, how is an 
"important" difference to be determined? 
The resolution of this problem has been made more im- 
portant by the increasing popularity of statistical ap- 
proaches to questions of style and authorship. Defini- 
tions of style from this point of view are based on notions 
of distinctiveness and consistency in literary performance. 
While distinctiveness appears to be the more important com- 
ponent of style, it is recognized that some consistency is 
necessary to lend significance to whatever feature might 
be distinctive. 
The Determination of Homogeneity 
For this discussion we define homogeneity as the 
similarity of parts of the whole with respect to certain 
features. For some features it may be perfectly clear, 
even without counting, that parts of a text or texts from 
a genre are not alike. This seems more likely to occur 
for some features and for some genres than for others, for 
example, syntactic or phonological constructions in poetry, 
as opposed to parts of speech in technical writing. 
Few would be satisfied to rely solely on subjective 
impression for the estimation of the similarity of text 
samples. For statistical linguists the decision to count 
is the foundation of their science. For literary scholars 
the decision to count stems from a desire to give quanti- 
tative verification of existing theories and interpreta- 
tions, and to gain greater insight into the structure of 
literary works for the purpose of proposing new theories 
and interpretations. Both groups are faced with the prob- 
lem of evaluating the results of the counting. 
The Nature of Statistical Tests 
The techniques of statistical description are, of 
course, uniquely suited to the statement of the raw, 
uninterpreted results. Measures of location such as means, 
modes, and medians are commonly used for this purpose. 
In examining the raw results it may be clear at once 
that there is a meaningful difference among the counts or 
scores. If samples of 100 sentences were taken at random 
from each of two texts, and the mean lengths for the two 
samples were 20 words and 40 words, no one would hesitate 
to conclude that one text revealed a "significantly" 
greater sentence length than the other. But if the figures 
were closer, say 27 and 33, more exact methods are needed. 
It is a law of nature that a sample taken from a population 
will not always yield exactly the statistics of the popula- 
tion, that on occasion even a large discrepancy will be 
found. The extent to which sample values may be expected 
to vary from population values through chance alone is a 
subject of mathematical statistics, as is the extent to 
which two or more sample values from the same population 
will differ. 
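As a rough illustration of why the closer figures call for a formal method, the difference between two sample means can be expressed in units of its standard error. The sketch below uses the means of 27 and 33 words and the sample size of 100 from the example above; the standard deviations of 15 and 25 words are purely assumed values, since no dispersion figures are given here, and the function name is hypothetical.

```python
import math

def z_for_mean_difference(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Return the difference between two sample means in standard-error units."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error

# Means and sample sizes follow the example in the text.
# The standard deviations (15 and 25 words) are assumptions for illustration.
z_tight = z_for_mean_difference(27, 33, 15, 15, 100, 100)
z_loose = z_for_mean_difference(27, 33, 25, 25, 100, 100)

print(f"assumed s.d. 15: z = {z_tight:.2f}")
print(f"assumed s.d. 25: z = {z_loose:.2f}")
```

The same six-word difference looks much less convincing when the sentence lengths are assumed to be more widely dispersed, which is why the raw means alone cannot settle the question.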
Language Statistics and Homogeneity 
There is considerable data that demonstrates overall 
similarities in the frequencies of various units between 
samples from the same writer, from different writers, and 
even from different languages.1 The problem for statis- 
tical linguistics and stylistics is the ordering of degrees 
of similarity into groups according to some notion of homo- 
geneity. If the sample values differ no more than could 
reasonably be attributed to chance, we see no reason why 
the populations from which the samples were taken could 
not be called one homogeneous population. 
1 See, for example, Herdan, The Advanced Theory of Language 
as Choice and Chance, pp. i--7-/-27, and M. Renský, "The Noun- 
Verb Quotient in English and Czech," Philologica Pragensia, 
VIII (1965), pp. 289-302. 
Whether text samples pass a statistical test for 
homogeneity depends on the nature of the text, the chosen 
significance level, and the power of the test as deter- 
mined by characteristics of the test itself in conjunction 
with the size of the sample. It is possible to imagine a 
perfectly uniform text, for example, one composed of 
nothing more than repetitions of the same identical sentence. 
In this case, a statistical test will reveal this homogen- 
eity for any significance level or sample size. For real 
texts, though, the selection of the significance level (s.l.) and sample size (s.s.) poses a 
problem of practical and theoretical interest. The danger 
is that an investigator will be tempted to make a flat 
statement concerning the homogeneity of a feature for a 
text or a genre, when a slight change in s.l. or s.s. could 
have led to a reversal of that finding. Homogeneity, then, 
as a product of statistical hypothesis testing, should not 
be regarded as a function of the text alone, but rather 
as a function of the text and the significance level and 
power associated with the test and the sample size. If 
the samples represent different populations, even if different 
only in some minimal way, it is only a question of increas- 
ing sufficiently the sample size to cause the hypothesis 
of homogeneity to be rejected. 
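The dependence of the verdict on sample size can be sketched numerically. The example below computes the ordinary Pearson chi-square statistic for a table of counts and then multiplies every count by a constant, leaving the proportions untouched. The counts are hypothetical (two samples that differ only as 52 to 48), and 10.828 is the standard tabled critical value for one degree of freedom at the .001 level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a rows-by-columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: two samples whose proportions differ only slightly.
base = [[52, 48], [48, 52]]
CRITICAL_001_DF1 = 10.828  # tabled chi-square value, 1 degree of freedom, .001 level

results = {}
for factor in (1, 10, 100):
    scaled = [[cell * factor for cell in row] for row in base]
    results[factor] = chi_square(scaled)
    verdict = "rejected" if results[factor] > CRITICAL_001_DF1 else "not rejected"
    print(f"sample x{factor}: chi-square = {results[factor]:.2f}, homogeneity {verdict}")
```

With the proportions held fixed, the statistic grows in direct proportion to the number of units counted, so the same minimal difference eventually crosses any fixed critical value.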
In discussing the size of samples to be taken, Herdan 
states that "for statistical investigations in general, it 
is usually a question of how small the sample should be-- 
for reasons of economy--without becoming unrepresentative 
of the universe, and without the errors acquiring such 
dimensions as to make significance testing illusory."2 
2 Ibid., p. 170. 
It is clear that hard information is needed on the 
extent to which parts of a single text will differ with 
respect to the frequency of various measured units. It 
is also clear that different units may occur with varying 
degrees of consistency throughout a text. The question 
of the homogeneity of a text is complex. But until the 
nature of variation within texts is understood, statements 
about variation between texts cannot be made with great 
authority. 
The Design of the Study 
A suitable model for the study of quantitative change 
in linguistic behavior is one which views change as taking 
place along dimensions, such that if two texts vary signi- 
ficantly in the proportion or distribution of one or more 
units, this difference would be attributed to the two texts 
occupying different positions in a context space. The exam- 
ination of other texts of varying similarity to each of 
the original two texts should lead to the description of 
factors (dimensions) responsible for the original observed 
difference. The proposed dimensions can then be tested 
by predicting the behavior of texts not yet examined. 
In this study we propose to examine some aspects of 
the statistical behavior of certain syntactic units in a 
sample of technical English. In this as in any other study 
we must carefully set our goals and gather an appropriate 
amount of data to carry them out. 
The major focus of this study will be on the varia- 
tion in frequency of syntactic units within the writing 
of two individuals. A primary hypothesis to be tested is 
that the distributions of units will remain reasonably 
the same throughout a single text written by one person. 
If the distributions are not uniform, several explanations 
could be offered. For example, the varying content could 
influence the frequencies; that is, even in a single text 
there might be contextual variations. A comparison of 
the individual chapters should reveal such variations 
since the chapters represent the way in which the content 
has been divided in the text. For this reason the chapters 
will be compared with each other in each of the two texts. 
There may be other causes for internal differences 
in a text. During the time that the text was written vari- 
ous circumstances could have arisen to influence the fre- 
quencies. This study does not attempt, however, to account 
for such influences except as they may be correlated with 
chapter content and position. 
The other primary hypothesis to be tested is that 
the two sample texts will reveal essentially the same 
distributions. Several studies have compared samples of 
technical writing as a whole with samples of non-technical 
writing, but no one seems to have reported on the varia- 
tion in linguistic performance among individual American 
technical writers. 
In order to be sure that differences between the 
texts would be attributable as much as possible to the 
writers themselves it was decided to select the sample 
texts from the same discipline. In other words, if a 
history text differed in average sentence length from a 
biology text this could be due either to the different 
writers or the subject areas or both. While it may 
seem unreasonable to believe that biology and history 
writings could exhibit distinctive patterns, there is 
also no inherent reason why technical and non-technical 
writing should vary. 
The texts selected for this study are both from 
linguistics. They are: 
1. Emmon Bach's Introduction to Transformational 
Grammars (New York, 1964), all but exercises at the end 
of chapters. 
2. Kenneth Pike's Language in Relation to a Unified 
Theory of the Structure of Human Behavior (The Hague, 1967), 
pp. 25-82, excluding bibliographical sections. 
The choice of linguistics as the technical field was 
arbitrary. These samples of technical writing cannot be 
regarded as random samples of technical writing as a whole, 
or even of linguistic writing, or even of Bach's or Pike's 
writing. The requirement of this study for large amounts 
of data from single texts precluded the possibility of gain- 
ing representativeness through the use of many smaller 
samples. Factors leading to the selection of the particu- 
lar text by Bach were its relative shortness as a complete 
book, its recent publication date, and the varied material 
covered. The three chapters by Pike may be regarded as a 
smaller control sample to be available to confirm any major 
conclusions for the Bach sample. Moreover, it was felt 
that Pike exhibited a rather different approach to sentence 
construction from Bach, and that this difference, when 
demonstrated quantitatively, would dispel any notion that 
technical writers could not show individual styles. For 
convenience the samples from Bach and Pike will be referred 
to hereafter as simply Bach and Pike. 
Before conducting a statistical investigation of texts, 
various parameters or units must be selected which later 
will be counted and used as the basis for determining the 
similarity of the samples to be compared. The parameters 
discussed here represent two syntactic levels, those of clause 
and sentence. Table 1 depicts the basic clause level units. 
TABLE 1 
CLAUSE LEVEL CLASSIFICATION 
Type  Name              Examples 
3     "Be" Clause       This theorem is true. The description 
                        has not been useful. 
4     "Active" Clause   This description has many parts. 
                        Ideas flourish. Progress gives men 
                        hope. Linguists study language. We 
                        consider this false. 
5     Passive Clause    This was realized by others. 
C     "There" Clause    There are few days left. There seems 
                        to be no way to do this. 
E     "It" Clause       It is not easy to estimate this 
                        quantity. It seems futile to try this. 
Sentence types are defined through constituent 
clause types. A sentence is assumed to consist of a 
sequence of clauses, each of which is either a main 
clause or a subordinate clause. In the coded text 
symbols for main clauses are preceded by an "M". 
Further, some clauses will be embedded within another 
clause. Embedded clauses appear in parentheses follow- 
ing the clause in which they are embedded. Thus, those 
sentences which are composed of the same clauses in the 
same order are considered to belong to the same sentence 
type. The following examples should clarify the clause 
and sentence type classifications: 
1. Numerous examples and problems are presented 
throughout this introduction. Bach, page 2. One main 
passive clause: M5. 
2. These are works that embody in the medium of 
language the esthetic values of the individual or the com- 
munity. Bach, page 1. A main be clause followed by a sub- 
ordinate transitive clause: M34. 
3. The particular way of stating a theory of a lang- 
uage with which we shall be concerned has taken inspira- 
tion from modern logic. Bach, page 9. A main transitive 
clause with an embedded be clause: M4(3). 
4. It is doubtful whether there are any natural 
languages conforming to any of these types. Bach, page 
105. A main it clause followed by subordinate there and 
transitive clauses: MEC4. 
5. We set up terminally discontinuous constructions 
as continuous ones and then separate them. Bach, page 
120. Two main transitive clauses: M4M4. 
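The coding convention just illustrated is regular enough to be read mechanically. The following is a minimal sketch of how a coded sentence type might be decomposed into its clauses, assuming only the five clause symbols of Table 1, the "M" prefix for main clauses, and parentheses for embedded clauses. The function names and the record layout are hypothetical and are not part of the programs used in the study.

```python
CLAUSE_NAMES = {"3": "be", "4": "active", "5": "passive", "C": "there", "E": "it"}

def parse_sentence_type(code):
    """Parse a coded sentence type such as 'M4(3)' into a list of clause records."""
    code = code.upper()
    clauses, pos = _parse_sequence(code, 0, embedded=False)
    if pos != len(code):
        raise ValueError(f"unexpected character at position {pos} in {code!r}")
    return clauses

def _parse_sequence(code, pos, embedded):
    """Read clause symbols until the end of the string or a closing parenthesis."""
    clauses = []
    while pos < len(code) and code[pos] != ")":
        is_main = code[pos] == "M"
        if is_main:
            pos += 1
        if pos >= len(code) or code[pos] not in CLAUSE_NAMES:
            raise ValueError(f"expected a clause symbol at position {pos} in {code!r}")
        if embedded:
            role = "embedded"
        else:
            role = "main" if is_main else "subordinate"
        clause = {"type": CLAUSE_NAMES[code[pos]], "role": role, "embedded": []}
        pos += 1
        # Embedded clauses follow in parentheses the clause that contains them.
        if pos < len(code) and code[pos] == "(":
            clause["embedded"], pos = _parse_sequence(code, pos + 1, embedded=True)
            if pos >= len(code) or code[pos] != ")":
                raise ValueError(f"missing ')' in {code!r}")
            pos += 1
        clauses.append(clause)
    return clauses, pos

for coded in ("M5", "M34", "M4(3)", "MEC4", "M4M4"):
    summary = [(c["role"], c["type"], len(c["embedded"])) for c in parse_sentence_type(coded)]
    print(coded, summary)
```

Two sentences belong to the same sentence type exactly when their coded strings are identical, so counting types reduces to counting distinct strings; the parser is only needed when the clause-level make-up of a type is of interest.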
The coding of the original texts was carried out "man- 
ually," that is, no computer program was written to convert 
the source text to coded text. For each chapter (8 in 
Bach, 3 in Pike) the occurrences or tokens of each of the 
clause and sentence types were counted and compared. The 
chi-square test was employed to determine the validity of 
the assumption that the chapters in each text can be re- 
garded as random samples from one population. 
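The test described here can be sketched as a chi-square test of homogeneity on a chapters-by-clause-types table of counts. The example below is not the original assembly language or FORTRAN IV program; it is a small illustration that uses the Pike clause counts reported in Table 4, with a hypothetical function name.

```python
def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    Each row is one sample (here, a chapter); each column is one category
    (here, a clause type). Expected counts assume all rows share the
    same underlying proportions.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Clause counts for Pike's three chapters (be, active, passive, there, it), from Table 4.
pike = [
    [51, 143, 76, 11, 6],
    [145, 338, 132, 15, 17],
    [60, 227, 65, 17, 4],
]

stat, dof = chi_square_homogeneity(pike)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

The statistic should come out close to the value of 23.19 reported for Pike. With 8 degrees of freedom it falls between the standard tabled critical values for the .005 and .001 levels (21.955 and 26.124).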
The counting and statistical analysis were carried out 
through the facilities of the Michigan Terminal System at 
the University of Michigan Computing Center. This time- 
sharing system is presently driven by two IBM System/360-67 
processors. The clause level unit analysis programs 
were written in assembly language and FORTRAN IV. The sen- 
tence type counting was programmed in SNOBOL4. 
Results for Bach 
Table 2 depicts the frequency counts of the five clause 
types in Bach. Here considerable variation is apparent, 
especially in the be clause and the passive clause. The 
there and it clause frequencies appear to be relatively 
constrained. The assumption that the chapters may be re- 
garded as random samples from one population must be 
rejected. 
The frequency of the most common sentence types in 
Bach is illustrated in Table 3. The percentages given in 
the table represent the proportion of a sentence type among 
the five sentence types listed. It was expected that a 
few sentence types would occur quite often, and that many 
types would be found only once. It was disappointing, 
TABLE 2 
CLAUSE DISTRIBUTION FOR BACH 
Chapter                  Clause Type 
           be       active   passive  there    it 
1          54       114      41       4        9 
           24.3%    51.4%    18.5%    1.8%     4.1% 
2          90       191      135      18       21 
           19.8%    42.0%    29.7%    4.0%     4.6% 
3          95       179      139      8        18 
           21.6%    40.8%    31.7%    1.8%     4.1% 
4          118      238      171      12       21 
           21.1%    42.5%    30.5%    2.1%     3.7% 
5          185      405      167      35       18 
           22.8%    50.0%    20.6%    4.3%     2.2% 
6          55       174      98       8        14 
           15.8%    49.9%    28.1%    2.3%     4.0% 
7          238      301      171      31       39 
           30.5%    38.6%    21.8%    4.0%     5.0% 
8          86       208      94       28       28 
           19.4%    46.8%    21.2%    6.3%     6.3% 
Total      921      1805     1016     144      168 
           22.7%    44.5%    25.1%    3.6%     4.1% 
Chi-square value = 123.99. 
Probability = less than .001. 
TABLE 3 
DISTRIBUTION OF MOST FREQUENT SENTENCE TYPES 
Chapter                  Type 
           M3       M4       M5       M44      M45 
1          21       19       4        5        3 
           40.4%    36.5%    7.7%     9.6%     5.8% 
2          17       31       25       7        12 
           18.5%    33.7%    27.2%    7.5%     13.0% 
3          15       33       22       2        8 
           18.1%    41.2%    27.5%    2.5%     10.0% 
4          14       27       37       14       4 
           14.6%    28.1%    38.5%    14.6%    4.2% 
5          35       52       27       22       7 
           24.5%    36.4%    18.9%    15.4%    4.9% 
6          9        32       13       7        3 
           14.1%    50.0%    20.3%    10.9%    4.7% 
7          28       43       33       4        7 
           24.3%    37.4%    28.7%    3.5%     5.1% 
8          14       42       15       8        11 
           15.6%    45.7%    16.7%    8.9%     12.2% 
Total      153      279      176      69       55 
           21.0%    38.1%    24.0%    9.4%     7.5% 
Chi-square value = 71.32. 
Probability = less than .001. 
however, to find that only five types occurred with suffi- 
cient frequency for statistical testing. 
There is clearly little consistency in the frequency 
of these sentence types, and the chi-square test is able to 
reject strongly the hypothesis of homogeneity of the chapters. 
A cursory inspection of the table reveals little overall 
pattern. The main passive type (M5) occurs least in chap- 
ters 1 and 8, the introduction and the conclusion. This 
is consistent with the notion of the passive clause being 
highly correlated with technical material. Of course, the 
main passive type is not the only source of passive clauses. 
The active plus subordinate passive type (M45) listed in 
the table also provides one passive clause per sentence. 
We find that this type has its lowest frequencies in chap- 
ters 4 and 7. There is, then, no strong correlation be- 
tween sentence types on the basis that they both contain 
passive clauses. 
Bach and Pike Compared 
Table 4 depicts the distribution of clauses in Pike. 
As for Bach, the assumption that the chapters represent 
random samples from one population must be rejected. As 
in Bach, the passive varies considerably from chapter to 
chapter. Bach's first chapter, the introduction, has the 
smallest proportion of passives but Pike's first chapter 
has the most passives. Bach's be clauses range from 15.8 
per cent to 30.5 per cent, but Pike's be clauses are more 
stable, ranging from 16.1 per cent to 22.4 per cent. 
Pike's active and passive clauses are also more consistent, 
but with eight chapters it must be taken into account that 
Bach has a greater opportunity to reveal inconsistency. 
Bach appears to use slightly more be clauses, many fewer 
active clauses, and somewhat more passive and it clauses. 
The difference in the frequency of there clauses does not 
seem substantial. A chi-square test comparing Bach's and 
Pike's clause totals yields a probability far less than 
.001. 
TABLE 4 
CLAUSE DISTRIBUTION FOR PIKE 
Chapter                  Clause Type 
           Be       Active   Passive  There    It 
1          51       143      76       11       6 
           17.8%    49.8%    26.5%    3.8%     2.1% 
2          145      338      132      15       17 
           22.4%    52.2%    20.4%    2.3%     2.6% 
3          60       227      65       17       4 
           16.1%    60.9%    17.4%    4.6%     1.1% 
Total      256      708      273      43       27 
           19.6%    54.2%    20.9%    3.3%     2.1% 
Chi-square value: 23.19 
Probability: between .001 and .005. 
We recall that in examining Bach's sentence types 
only a handful occurred with sufficient frequency in each 
chapter to allow statistical testing, in spite of a 
sample of almost 2000 sentences. There are far fewer sen- 
tences in the Pike sample of 446 sentences, and in addition 
Pike appears to use proportionally more sentence types due 
to his preference for sentences with four or more clauses, 
copies of which are not likely to be found again. It is 
not surprising, then, that just two or three sentence types 
occur often enough for testing. Rather than attempt any 
judgment on the consistency of Pike's sentence types on 
such meager evidence, we proceed to a summary of the most 
frequent sentence types in Pike and Bach. 
The results, given in Table 5, clearly indicate the 
authors' different preferences, but at the same time there 
are marked similarities in their frequency of usage of 
some types, for example the M34 and M43 types. We must re- 
member that Bach's most common sentence types were shown 
to be strongly non-homogeneous, and thus the data in 
Table 5 cannot be regarded as highly predictive of the per- 
formance to be found in other Bach samples. Because of 
this great internal inconsistency a chi-square test was 
not carried out on the data in Table 5. 
Conclusions 
This study has produced, we believe, much useful and 
interesting data which leads to several major conclusions 
TABLE 5 
MOST FREQUENT SENTENCE TYPES IN BACH AND PIKE 
        Rank   Propor.  Propor.          Rank   Propor.  Propor. 
Type    in     in       in       Type    in     in       in 
        Bach   Bach     Pike             Pike   Pike     Bach 
M4      1      14.3%    11.2%    M4      1      11.2%    14.3% 
M5      2      8.9%     3.6%     M3      2      6.3%     7.?% 
M3      3      7.?%     6.3%     M44     3      5.4%     3.6% 
M44     4      3.6%     5.4%     M5      4      3.6%     8.9% 
M45     5      2.8%     2.2%     MC      5      2.5%     1.5% 
MF*     6      2.2%     .45%     M45     6      2.2%     2.8% 
?       7      2.0%     ?        M43     7      2.0%     1.7% 
M54     8      1.?%     .90%     M4M4    8      1.8%     1.2% 
?       9      1.?%     1.6%     ?       9      1.8%     ? 
M43     10     1.7%     2.0%     M4(4)   10     1.3%     ? 
MC      11     1.5%     2.5%     M344    11     1.5%     ? 
M35     12     1.5%     1.1%     M455    12     1.1%     ? 
M4M4    13     1.2%     1.8%     M35     13     1.1%     1.5% 
ME4     14     1.1%     .6?%     M54     14     .90%     1.?% 
M33     15     .86%     ..       M444    15     .90%     ? 
* F represents an imperative clause. 
about the nature of language performance. 
The first conclusion is that the model of a writer 
producing language by drawing samples of linguistic units 
at random from a specific and unchanging population is 
untenable. The evidence given here is strongly against 
such a model, but it is not certain whether the difficulty 
with such a model is to be traced to non-random sampling 
from a constant population, or random sampling from a 
changing population, or non-random sampling from 
a changing population. Moreover, it is not clear how any 
one of the three alternative models could be demonstrated 
superior to any of the others, since there seems to be no 
way to distinguish empirically between the effects of non- 
random sampling and a changing population. 
The random sample-uniform population (RSUP) model for 
a single writer appears to be the foundation for many 
studies in statistical stylistics and linguistics, al- 
though this is often not expressed in any explicit way. 
These studies are designed as follows. The hypothesis is 
that two or more writers or genres differ substantially 
in the use of one or more linguistic units such as sentence 
type, sentence length, adjective-verb ratio, etc. Brinegar, 
for example, has stated this hypothesis in this way: 
The use of this method assumes that every 
author unconsciously uses words that, at 
least in the long run, could be considered 
as random drawings from a fixed frequency 
distribution of word lengths. This should 
be true at least for writings of a related 
type over a reasonable span of years.3 
A null hypothesis of no significant difference (homo- 
geneity) is tested through the drawing of random samples 
from the writers or genres and the selection of a stat- 
istical test and a significance level. The sample sizes 
are chosen on the basis of their being large enough to be 
representative of the writers or genres. The test is 
applied to the sample data and the null hypothesis is 
either maintained or rejected. Thus homogeneity is a 
black or white proposition in this approach, a function 
of the vagueness of sample size and significance level. 
The RSUP model interprets the discovery of a statis- 
tically "significant" difference between writers or 
genres as something unexpected and hence worthy of note, 
meaning that the writers or genres represent distinct 
populations. Yet, the demonstration of statistically 
significant differences is something to be expected with 
a sufficiently large sample size. What is needed is an 
approach which takes sample size out of immediate consi- 
deration and relativizes the concept of homogeneity. 
3 Claude S. Brinegar, "Mark Twain and the Quintus Curtius 
Snodgrass Letters: A Statistical Test of Authorship," 
Journal of the American Statistical Association, LVIII 
(March, 1963), p. 87. 
There is really no intuitive support for the notion 
of the homogeneity of linguistic units as something abso- 
lute. There is nothing intuitively objectionable about a 
statement that one feature is more or less consistent than 
another. 
This study is not the first, of course, to use large 
enough sample sizes within a genre to demonstrate very 
significant internal differences. A recent analysis of 
some aspects of the Brown University corpus reveals such 
differences within fifteen genres for parts of the sentence 
length distribution in words, using the chi-square test.4 
There was no attempt to relativize the results of the 
chi-square test, but sentence length was described for each 
genre in terms of the mean, standard deviation, and co- 
efficient of variation, the latter being the standard de- 
viation divided by the mean, a measure better suited than 
the standard deviation alone to indicate the extent of dis- 
persion in the distribution. 
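The advantage of the coefficient of variation over the standard deviation alone can be seen in a small sketch. The sentence lengths below are invented for illustration, the function name is hypothetical, and the sample standard deviation (n - 1 in the denominator) is assumed, since the choice of formula is not stated.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical sentence lengths in words for two genres.
# Both sets have the same standard deviation but different means.
short_sentences = [8, 10, 12]
long_sentences = [28, 30, 32]

cv_short = coefficient_of_variation(short_sentences)
cv_long = coefficient_of_variation(long_sentences)

print(f"short: s.d. = {statistics.stdev(short_sentences):.1f}, c.v. = {cv_short:.3f}")
print(f"long:  s.d. = {statistics.stdev(long_sentences):.1f}, c.v. = {cv_long:.3f}")
```

A spread of two words is a large share of a ten-word average but a small share of a thirty-word average, a distinction the standard deviation by itself does not show.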
An investigation of the homogeneity of individual 
vocabulary items between genres revealed once again very 
significant differences.5 
4 Henry Kucera and W. Nelson Francis, Computational 
Analysis of Present-Day American English (Providence, 
R.I.: Brown University Press, 1967), pp. 378-379. 
5 Ibid., pp. 277-293. 
Rather than settle for a statement that certain words 
were found to be non-homogeneous 
in the corpus at certain sample sizes, Mosteller proposes 
an "index of contextuality" as a measure of relative con- 
sistency for a specific word.6 This index is computed by 
dividing the chi-square value by the sample size and then 
multiplying by 1000. The effect is to treat each sample 
as if it consisted of exactly 1000 units, and the resulting 
index can be used to rank the homogeneity of individual 
words. In this fashion Mosteller computes indexes of 6.2, 
6.9, and 9.6 for to, and, and the, which were the least 
contextual, or least influenced in frequency by context, 
in the Brown corpus. 
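The computation of the index is simple enough to state as a short sketch. The function name is hypothetical. The inputs are the chi-square values and clause totals given for the chapter comparisons in Tables 2 and 4 (123.99 on 4054 clauses for Bach, 23.19 on 1307 clauses for Pike); the resulting index values are computed here for illustration and are not figures reported in the study.

```python
def index_of_contextuality(chi_square, sample_size, per=1000):
    """Chi-square value rescaled as if the sample consisted of exactly `per` units."""
    return chi_square / sample_size * per

# Chi-square values and clause totals as given in Tables 2 and 4.
bach_index = index_of_contextuality(123.99, 4054)
pike_index = index_of_contextuality(23.19, 1307)

print(f"Bach chapters: {bach_index:.2f}")
print(f"Pike chapters: {pike_index:.2f}")
```

Because the Bach table has eight chapters and the Pike table three, the two tables have different degrees of freedom, so these two values should not be read as directly comparable.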
The second major conclusion of this study is that an 
index like Mosteller's is an appropriate way to treat homo- 
geneity in a corpus. Using this approach to study different 
writers one would segment each writer's works into n seg- 
ments, take random samples, preferably of equal size, from 
each of the segments and compute an index of contextuality 
for each feature measured. If there is a central tendency 
in a feature for the writers, an index of contextuality 
for pairs of writers considered together may be computed 
as a measure of their variance. 
6 F. Mosteller, "Association and Estimation in Contingency 
Tables," Journal of the American Statistical Association, 
LXIII (March, 1968), pp. 1-28. 
Such a comparison of indexes of contextuality pre- 
supposes a consistent number of segments and data divisions 
(parts of speech, sentence length groupings, etc.), since 
these lead to the degrees of freedom of the contingency 
table, and for a greater number of degrees of freedom a 
greater chi-square value is expected for a given deviation 
from randomness. Nevertheless, for purposes of rough com- 
parison one may wish to examine the homogeneity of features 
with different underlying degrees of freedom. Another 
index should be of interest in this regard: the sample 
size necessary to reject the null hypothesis at a level 
of .001. This index is in a way more concrete than the 
index of contextuality in that the degree of consistency 
is related to the number of units being measured. More 
important, the degrees of freedom are taken into account. 
For the index of contextuality a higher value means less 
uniformity for the feature, while a higher value for the 
rejection size means more uniformity. 
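One way to relate the two indexes can be sketched under a stated simplification: if the proportions in the table are held fixed, the chi-square value grows in direct proportion to the sample size, so the chi-square expected for N units is roughly the index of contextuality times N divided by 1000. The rejection size is then the N at which this reaches the critical value for the relevant degrees of freedom. This is an assumed reading and not necessarily the computation used in the study; the function name is hypothetical, and the critical values are the standard tabled chi-square values for the .001 level.

```python
# Upper-tail critical values of chi-square at the .001 level, from standard tables.
CRITICAL_001 = {1: 10.828, 2: 13.816, 4: 18.467, 6: 22.458, 15: 37.697}

def rejection_size(index, dof):
    """Approximate sample size at which homogeneity is rejected at the .001 level.

    Assumes chi-square is proportional to sample size, i.e.
    chi-square = index * N / 1000 for a sample of N units.
    """
    return 1000 * CRITICAL_001[dof] / index

for index, dof in [(1.3, 1), (63.5, 6), (98.4, 1)]:
    size = rejection_size(index, dof)
    print(f"index {index}, {dof} d.f.: about {size:.0f} units")
```

The sketch makes the inverse relation explicit: a feature with a low index of contextuality needs a very large sample before the null hypothesis is rejected, and a larger number of degrees of freedom raises the critical value and hence the rejection size.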
Table 6 gives values for these two indexes for a 
number of features as a basis for determining the relative 
similarity of Bach and Pike. As can be seen, the two 
writers agree relatively closely on the ratio of main to 
subordinate clauses of the passive type, but differ greatly 
on this same ratio for the there type. 
We believe that the categorization of clause and 
sentence types used here is reasonable and simple, and 
that this sort of categorization would be readily appli- 
cable to other languages. In addition, statistical measures 
such as the index of contextuality and rejection size 
appear to be quite useful as indicators of the consis- 
tency of linguistic performance. 
TABLE 6 
INDEX VALUES FOR BACH AND PIKE COMPARED 
Feature                          Degrees of   Index of        Rejection 
                                 Freedom      Contextuality   Size 
Word Level                       15           1.7             22200 
Clause Level                     4            11.1            1565 
Be Clause, S-M                   1            1.3             8300 
Active Clause, S-M               1            1.5             7200 
Passive Clause, S-M              1            .65             16620 
It Clause, S-M                   1            5.3             2038 
There Clause, S-M                1            29.15           370 
Clauses, Nested vs. Non-Nested   1            .1              10800 
Material in Parentheses          1            2.9             3720 
Sentence Length in Words         6            63.5            354 
Sentence Length in Clauses       4            61.1            300 
Paragraph Length in Clauses      2            76.35           154 
Paragraph Length in Sentences    1            98.4            110 

REFERENCES 

Bailey, Richard W. "Statistics and Style: A Historical 
Survey." Statistics and Style. Edited by Lubomír 
Doležel and Richard W. Bailey. New York: American 
Elsevier, 1969. 

Bailey, Richard W., and Doležel, Lubomír, eds. An Anno- 
tated Bibliography of Statistical Stylistics. Ann 
Arbor: Michigan Slavic Contributions, Bibliograph- 
ical Series No. 2, 1968. 

Doležel, Lubomír. "A Framework for the Statistical Analy- 
sis of Style." Statistics and Style. Edited by 
Lubomír Doležel and Richard W. Bailey. New York: 
American Elsevier, 1969. 

Edmundson, H. P. "Mathematical Models in Linguistics and 
Language Processing." Automated Language Processing. 
Edited by Harold Borko. New York: John Wiley and 
Sons, 1967. 

Huddleston, R. D.; Hudson, A.; Winter, E. O.; and Henrici, 
A. Sentence and Clause in Scientific English. London: 
Communication Research Centre, University College 
London, 1968. 

Kaufman, S. I. "Ob Imennom Kharaktere Tekhnicheskovo 
Stilja." Voprosy Jazykoznanija, X, No. 5 (1961), 
104-06. 

Simpson, Harold. "A Descriptive Analysis of Scientific 
Writing." Unpublished Ph.D. dissertation, The Univ- 
ersity of Michigan, 1965. 

Streeter, Victor J. "Homogeneity in a Sample of Techni- 
cal English." Unpublished Ph.D. dissertation, The 
University of Michigan, 1969. 
