SEGMENTING NATURAL LANGUAGE BY ARTICULATORY FEATURES 
David Shillan 
Cambridge Language Research Unit, 
ENGLAND. 
1. For many purposes it is necessary to segment text
into units convenient for handling. The sentence has
been generally accepted as the natural unit, since
there was no obvious alternative other than the word
- which by itself tells us too little - or the
paragraph - which is a vague and shifting unit, unless
redefined. But the sentence is not satisfactory
either: it is very variable in length; studies of
speech show that in its conventional form it is not
always recognizably present¹; it may depend
semantically upon its context up to at least paragraph
length; and in any case what constitutes a sentence
is not consistently defined (Fries² indicates more
than 200 definitions).
2. There is another way of segmenting text, which 
does not suffer from these limitations, being based 
upon the rhythmical features of articulated speech. 
This use of the term "articulated" results from a 
view of language as basically speech, that is, as
skilled bodily movement. We have found it possible 
to bridge the gap between spoken language and written 
language by using features which both the writer and 
the reader of language tend to adopt from speech. 
3. Studies of spoken language, particularly in relation
to foreign language teaching, show agreement on at
least the terminal boundary of the "tone group", which
Crystal & Quirk³ call "the most striking prosodic unit
in English speech", and on which they have found
experimentally a high rate of agreement by informants.
Many different teaching books⁴ exemplify this agreed
feature, despite the lack of satisfactory instrumental
evidence on continuous speech (into which research is
now being planned).
4. Less agreement is found on the configuration of 
the whole unit which terminates in the "nucleus". 
Some authors refer to "tone groups" or "tone units", 
some to "sense groups", some use both terms: this 
overlapping category of tone and sense suggested a 
field for further study, which has been proceeding 
at C.L.R.U. for some time. Syntax is not usually 
brought into the treatment of this subject, since 
the approach is phonological; but among the authors
referred to⁴, MacCarthy does indicate that syntactic
criteria determine the structure of his "intonation
groups". Our studies support the work of those who
suggest that what is commonly called "stress" has a
semantic function⁵, and that what can be analysed in
terms of intonation is the syntactic feature⁶ - a kind
of audible syntactic bracketing.

* Work supported by Canadian National Research Council.

5. It is common practice in the teaching of English
as a foreign language (see Baird⁷) to use tone groups
of two stresses (head and nucleus) as examples, but
this configuration is not usually formalized. In my
own use of such drill material for the foreign learner,
I have for many years adopted this unit, marked it
with a musical phrase-mark, and called it, since my
1954 publication⁸, a "phrasing". My drill use of this
unit gives a minimal context of not less than one
sentence - a sentence being segmentable into one or
more phrasings, the phrasing being thus a unit between
the word and the sentence but not necessarily
coterminous with the clause or grammatical phrase.
(The musical analogy shows phrasing as a category
distinct from the note, the bar, and the section.)
6. Ten years after publication of these drills, my
work was called upon by Margaret Masterman⁹ in relation
to her own semantic approach, for which the two
stress-points of the phrasing were seen to correspond
to two information points. In the meantime I had been
led by teaching experience to consider the difficulty
of foreign learners with adequate vocabulary and
adequate syntax but no adequate speech-experience of
English. They were unable to read a piece of current
English (e.g. a "Times" leading article) with
understanding, whereas the native English reader, even
if momentarily puzzled by perhaps a hastily-worded
sentence, would immediately feed back into his reading
of it (i.e. "in his mind's ear") the natural speech
form (i.e. the phrasing) with which the writer had
written it.
7. From this the conception of "stress-point" became
differentiated from precise syllabic location of stress
(which is itself a complex of amplitude, frequency, and
duration) and was defined as the word or words centred,
in stress-and-tone prominence, on the nuclear tone,
and the word or words centred (in the same sense of
"prominence") on that head tone which predominates
above any other head or heads which might follow the
preceding nucleus.
This method of dealing with tone groups which
apparently have more than one head proves to be
operationally satisfactory. It gives us a consistent
phrasing of two beats, the second of which consists,
in certain cases, of a "silent stress" (a phenomenon
vouched for by many phoneticians¹⁰). It also helps to
meet the difficulty of differently timed languages,
referred to in para. 13 below.
8. It follows from the treatment of stress-points in- 
dicated in para. 7 above, that spread stress will occur 
in regular compounds, such as "semi+readiness", and it 
also occurs very frequently in cases of a noun with its 
qualifier, whether true adjective or noun acting as 
adjective, e.g. "political+requirements", or "staff+ 
planning", and in general where we find intimately
associated words on which the stress falls with vir- 
tually equal emphasis. 
9. The silent beat may or may not be a perceptible 
pause, but tends to occur in certain typical locations, 
e.g. where some expression of significant semantic 
content is about to follow. It would also be possible 
in many cases to imagine the phrasing re-written using 
relevant syllables instead of the silent beat, e.g. 
"in a review of progress" instead of 
"in a review () ". 
In marking phrasings on text two symbols are used in
addition to the + sign for spread stress and the ()
sign for silent beat. They are the well-known tonetic
marker ` (originally representing a high falling tone)
used for the nuclear stress, and the stress-mark ' used
for the head stress. These may also be referred to as
primary and secondary stress-points, the nucleus being
primary because in general it indicates the topic of
the utterance and the head being secondary because in
general it indicates the comment. Thus reading down
all the nuclear stress-points of a text printed as a
series of phrasings one below the other, we have an
index of the topic of the whole text.
10. A piece of text reading
"Politically Canada is divided into ten provinces and
two territories" can be phrased-up either as
"`Politically ( ) 'Canada is `divided into 'ten `provinces
and 'two `territories" or as
`Politically ( )
'Canada is `divided
into 'ten `provinces
and 'two `territories.
The "quatrain" form into which this falls proves to be
very frequent, particularly at the beginning of a
passage. This passage continues in two more quatrains:
'Each+province is `sovereign
in its 'own `sphere
and 'administers its `own
'natural `resources,

and upon 'such `resources
as 'related to `topography,
'position and `climate
is 'based the `economy+of+the+province.
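
The marking conventions of para. 9 can be read mechanically. What follows is a minimal sketch in Python and not part of the original work. It assumes the marks are typed as plain characters (' before a head word, ` before a nuclear word, + for spread stress, ( ) for a silent beat), and the names `Phrasing`, `parse_phrasing` and `nuclear_index` are hypothetical. Reading down the nuclei of the parsed phrasings gives the kind of topic index described above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

SILENT_BEAT = re.compile(r"\(\s*\)")  # the "( )" sign for a silent beat


@dataclass
class Phrasing:
    text: str
    head: Optional[str]      # secondary stress-point, marked with '
    nucleus: Optional[str]   # primary stress-point, marked with `
    silent_beat: bool


def parse_phrasing(line: str) -> Phrasing:
    """Read one hand-marked phrasing (assumed ASCII marks, see lead-in)."""
    head = nucleus = None
    for token in line.split():
        word = token[1:].rstrip(".,;:")  # drop the mark and trailing punctuation
        if token.startswith("'"):
            head = word
        elif token.startswith("`"):
            nucleus = word
    return Phrasing(line, head, nucleus, bool(SILENT_BEAT.search(line)))


def nuclear_index(phrasings: List[Phrasing]) -> List[str]:
    """Read down the nuclear stress-points, one per phrasing."""
    return [p.nucleus for p in phrasings if p.nucleus is not None]


# One quatrain of the Canada passage, with marks typed as assumed above.
MARKED = [
    "'Each+province is `sovereign",
    "in its 'own `sphere",
    "and 'administers its `own",
    "'natural `resources,",
]

if __name__ == "__main__":
    parsed = [parse_phrasing(line) for line in MARKED]
    print(nuclear_index(parsed))
```

A spread stress such as Each+province is kept as a single stress-point, which seems to match the treatment of spread stress in para. 8.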
A straightforward text of this kind offers, if not a
word-for-word, at least something like a
phrasing-for-phrasing possibility in translation. But
the translation correspondence, for French for example,
is often not direct but expanded (e.g. 2 or more French
for 1 English), or transposed in order. Apart from
these considerations, there are many cases in which the
phrasing structure resolves syntactic or semantic
uncertainty. Here is a case where the lack of such a
means of segmentation led to a serious mistranslation:
It 'may be `assumed
that an 'international `force
on a 'standby `basis
will 'take+shape as a `development
out of 'practice which has already `begun.
The published translation has turned the last two lines
into
"prendra une forme assez singulière, ce qu'elle a déjà
commencé à faire"
(roughly, "will take a rather singular form, which it
has already begun to do").
11. Passages of text in various styles and of various
lengths have been analysed by hand, and show a
consistent tendency for this rhythm to be found. There
may be physiological reasons for this. Neurological
studies* show persistence of tone and rhythm in cases
where normal articulation is impaired¹¹. Good reasons
for this rhythm to be binary include the fact that the
rhythm of the mother's heart-beat is present even to
the unborn child, and the in/out rhythm of respiration
and the left/right rhythm of walking are basic to human
life in general. Studies in articulatory phonetics
support the belief that some form of kinaesthetic
activity is involved in silent reading, as well as in
listening to live speech, which is why we can
legitimately refer to "the rhythm of the prose" in
spite of the lack, up to the present, of acoustic
instrumental documentation of this.

*For neurological literature I am indebted to Dr.
Violet MacDermot.

12. Though intonation supplies the contour on which the
phrasing is founded, the rhythm of stress is the more
essential factor. As Tibbitts says: "The correct
basic stressing is mandatory while the intonation is
variable within as yet undefined limits". This is the
reason why the phrasing hypothesis is unaffected by
differences of dialect or accent. The question of
isochronicity in English prose has a literature
stretching back to Joshua Steele in 1775, through
Coventry Patmore in 1856, and on to its thorough
experimental (though not instrumental) examination by
André Classe in 1939 and discussion by Abercrombie in
1951. There is evidence for at least a strong tendency
towards a normal regular periodicity of stress-points. Our
observations suggest that a speaker tends to select 
and order his words so as to distribute them about 
these pulsations of stress in such a way that points 
of emphasis fall naturally upon them. 
13. The question of whether the phrasing can be equally
well observed in languages other than English is not
included in the present paper, except by the
observation that when parallel texts in English and
French are analysed in this way, the French equivalent
of the English phrasing, as clearly delimited by the
French nuclear tone (and notwithstanding the difference
between a syllable-timed and a stress-timed language),
supplies a form of "translation unit" with a measurable
rate of correspondence with the English.
13. Examination of given phrasings in a text of 377
phrasings, followed by another of over 900 phrasings,
led Dolby¹⁹ to say: "Phrasing length, as measured by
the number of syllables, appears to be a reasonably
behaved statistic when viewed in isolation with routine
statistical tools". (See Appendix I.)
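
The kind of tabulation behind Appendix I can be sketched as follows. This is an illustration only and not Dolby's procedure: the syllable count is a crude vowel-group estimate adopted here as an assumption, and the names `count_syllables` and `length_histograms` are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def count_syllables(word: str) -> int:
    """Crude estimate (an assumption): one syllable per run of vowel letters."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def length_histograms(phrasings: List[str]) -> Tuple[Counter, Counter]:
    """Tally phrasings by length in words and by estimated length in syllables."""
    by_words: Counter = Counter()
    by_syllables: Counter = Counter()
    for phrasing in phrasings:
        # Letters only, so marks, punctuation and the + sign separate words.
        words = re.findall(r"[A-Za-z]+", phrasing)
        by_words[len(words)] += 1
        by_syllables[sum(count_syllables(w) for w in words)] += 1
    return by_words, by_syllables


SAMPLE = [
    "Politically",
    "Canada is divided",
    "into ten provinces",
    "and two territories",
]

if __name__ == "__main__":
    words_hist, syllables_hist = length_histograms(SAMPLE)
    print(sorted(words_hist.items()))
    print(sorted(syllables_hist.items()))
```

A vowel-group count misjudges many English words (silent final e, for instance), so a real study would need a pronouncing dictionary or hand counts.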
14. A method of observing the phonological configura- 
tion of phrasings is to turn written text into spoken 
prose on magnetic tape, pass this through a suitable 
pitch detector and intensity detector (such as that of 
the University of Grenoble or the University of 
Copenhagen), and record the result on mingograph
scrolls. Research now being started at C.L.R.U. is 
comparing the output of these two sets of apparatus 
with that of apparatus developed in England, with a 
view to finding the best selection of acoustic data 
by which to observe the terminal point of the phrasing 
(frequently a steep fall or rise in pitch), and the two 
stress-points as peaks of frequency-plus-amplitude-plus- 
duration. 
15. An extension of the usefulness of this unit of 
segmentation can be seen in algorithmic production by 
computer of a form of phrasing, based on observation 
of the criteria used in making articulatory phrasings.
This has been done at C.L.R.U. by J.E. Dobson²⁰ in a
form which while not in every single case identical 
with hand-marked phrasings nevertheless provides a 
new and operational segmentation of continuous text. 
As part of the work done under contract to the National 
Research Council of Canada, this programme is now being 
applied to the phrasing of a text of 20,000 words from 
the Canada Year Book of 1962.
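
The criteria used by the programme are not stated in this paper, so the following Python sketch should not be read as that algorithm. It is a purely illustrative stand-in built on two assumed rules: close a phrasing after a punctuation mark, and open a new one before certain function words once the current phrasing holds at least two words. The word list `BREAK_BEFORE` and the function `rough_phrasings` are hypothetical.

```python
from typing import List

# Assumed list of words before which a new phrasing may open (illustrative only).
BREAK_BEFORE = {
    "and", "but", "or", "that", "which", "in", "into",
    "for", "from", "with", "under", "during", "at", "on",
}


def rough_phrasings(text: str, min_words: int = 2) -> List[str]:
    """Cut running text into rough phrasings using the two assumed rules."""
    phrasings: List[str] = []
    chunk: List[str] = []
    for word in text.split():
        key = word.strip(".,;:").lower()
        # Rule 2: open a new phrasing before a listed function word.
        if key in BREAK_BEFORE and len(chunk) >= min_words:
            phrasings.append(" ".join(chunk))
            chunk = []
        chunk.append(word)
        # Rule 1: close the phrasing after punctuation.
        if word[-1] in ".,;:":
            phrasings.append(" ".join(chunk))
            chunk = []
    if chunk:
        phrasings.append(" ".join(chunk))
    return phrasings


if __name__ == "__main__":
    sentence = ("Politically Canada is divided into ten provinces "
                "and two territories.")
    for line in rough_phrasings(sentence):
        print(line)
```

On the sentence of para. 10 this keeps "Politically Canada is divided" as one unit, where the hand marking separates "Politically ( )". That is the sort of divergence from hand-marked phrasings that a mechanical procedure can be expected to show.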
16. The normal rhythmical stress can also be provided 
algorithmically. This makes possible a computerized 
ordering of the phrasings of a text alphabetically 
according to four different valuations, i.e. 
(i) the primary (= nuclear) stress;
(ii) the secondary (= head) stress;
(iii) pendants (= unstressed strings attached) to
primary stress;
(iv) pendants (= unstressed strings attached) to
secondary stress.
This gives a semantic concordance (called SEMCO) from
which statistical and other information can be derived. 
The computer can process text in this way as it could 
not do using the sentence as a unit, and both more 
economically and with more information than it could 
by merely cutting the text into lines of the length 
of the computer print-out. 
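
The four orderings can be sketched in Python as below. This is an illustration under assumptions and not the concordance programme itself: each phrasing is taken as already analysed into its two stress-points and their pendants, and the allocation of unstressed words to one stress-point or the other in the sample data is simply supplied by hand. The names `Phrasing`, `KEYS` and `concordance` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Phrasing(NamedTuple):
    text: str
    nucleus: str          # (i) primary stress
    head: str             # (ii) secondary stress
    nucleus_pendant: str  # (iii) unstressed string attached to primary stress
    head_pendant: str     # (iv) unstressed string attached to secondary stress


KEYS = ("nucleus", "head", "nucleus_pendant", "head_pendant")


def concordance(phrasings: List[Phrasing]) -> Dict[str, List[Phrasing]]:
    """Return the phrasings ordered alphabetically under each of the four keys."""
    return {
        key: sorted(phrasings, key=lambda p: getattr(p, key).lower())
        for key in KEYS
    }


# Pendant allocation here is an assumption made for the example.
SAMPLE = [
    Phrasing("into ten provinces", "provinces", "ten", "", "into"),
    Phrasing("Canada is divided", "divided", "Canada", "is", ""),
    Phrasing("and two territories", "territories", "two", "", "and"),
]

if __name__ == "__main__":
    for key, ordered in concordance(SAMPLE).items():
        print(key, [p.text for p in ordered])
```

Because Python's sort is stable, phrasings with identical keys (for example an empty pendant) keep their text order within each listing.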
17. The patterning of stressed and unstressed words,
i.e. of stress-points and unstressed words, can be
expressed as a calculus of ordered pairs, on which
research is proceeding.
APPENDIX IA: Histogram of phrasing frequency
versus phrasing length in words.
[Histogram not reproduced. Horizontal axis: phrasing
length in words (1 to 10). Vertical axis: frequency
(scale 0 to 280).]
APPENDIX IB: Histogram of phrasing frequency
versus phrasing length in syllables.
[Histogram not reproduced. Horizontal axis: phrasing
length in syllables (1 to 18). Vertical axis: frequency
(ticks legible from 0 to 140).]
APPENDIX II: Computer output from phrasing program.
WHILE THEY ARE WELL KNOWN AND ESTABLISHED,
I THOUGHT IT WOULD BE APPROPRIATE
TO DRAW YOUR ATTENTION
TO CERTAIN OF THE DEPARTMENTAL PROGRAMMES
THAT ARE LESS WELL KNOWN
IN RELATIONSHIP TO SERVICES
FOR THE AGED,
BUT WHICH NEVERTHELESS
CAN CONTRIBUTE SIGNIFICANTLY
TO THEIR WELL BEING.
ONE OF THESE IS
THE NATIONAL WELFARE GRANT PROGRAMME
WHICH WAS ESTABLISHED
AS LATE AS NINETEEN SIXTYTWO
WITH CONSIDERABLE SUPPORT AND ENTHUSIASM
FROM THE PROVINCIAL GOVERNMENTS,
AND FROM THE NATIONAL AND LOCAL VOLUNTARY WELFARE
AGENCIES.
ONE MILLION DOLLARS
IS AVAILABLE
UNDER THIS PROGRAMME
DURING THE CURRENT FISCAL YEAR
AND THAT AMOUNT IS TO INCREASE
AT THE RATE OF HALF A MILLION DOLLARS
A YEAR

References

C.C. Fries: "The Structure of English"; Harcourt
Brace, 1952, Longmans Green, 1957.

R. Quirk, A. Duckworth, J. Svartvik, J.P.L. Rusiecki,
A.V.T. Colin: "Studies in the correspondence of
prosodic to grammatical features in English"; IXth
International Congress of Linguistics, 1962.

D. Crystal & R. Quirk: "Systems of prosodic and
paralinguistic features in English"; Mouton, 1964.

Armstrong & Ward: "Handbook of English intonation";
Heffer, 1931.

W. Stannard Allen: "Living English Speech";
Longmans Green, 1954.

O'Connor & Arnold: "Intonation of colloquial
English"; Longmans Green, 1961.

Arnold & Gimson: "English Pronunciation Practice";
London Univ. Press, 1965.

J.T. Pring: "Colloquial English Pronunciation";
Longmans Green, 1959.

R.A. Close: "Patterns of Spoken English"; Kenkyusha
(Tokyo), 1954.

R. Kingdon: "The Groundwork of English Intonation";
Longmans Green, 1958.

Lado & Fries: "English Pronunciation"; Ann Arbor,
1954.

L.A. Hill: "Stress and Intonation step by step";
Oxford, 1965.

W.R. Lee: "An English intonation reader"; Macmillan,
1963.

P. MacCarthy: "English Pronunciation"; Heffer,
1944/50.

D. Shillan: "Spoken English"; Longmans Green,
1954/65.

R. Gunter: in Journal of Linguistics 2, 2,
Oct. 1966.

M.A.K. Halliday: "Some aspects of the thematic
organisation of the English clause"; Rand
Memorandum, Jan. 1967.

A. Baird: "Transformation and sequence in
pronunciation"; English Language Teaching XX, 2,
Jan. 1966.

Margaret Masterman: "Commentary on the Guberina
hypothesis"; Methodos 57-58, XV, 1963.

D. Jones: "Outline of English Phonetics"; Cambridge, 1932.

D. Abercrombie: "Studies in Phonetics and
Linguistics"; Oxford, 1965.

T. Alajouanine: "Verbal realization in
aphasia"; Brain 79, part I, March 1956.

R.H. Stetson: "Motor Phonetics"; Amsterdam, 1951.

E.L. Tibbitts: in English Language Teaching XXI,
1, Oct. 1966.

A. Classe: "The Rhythm of English Prose";
Blackwell, 1939.

K.L. Pike: "The Intonation of American English"; Ann Arbor, 1946.

D. Shillan: in Meta (Montreal) XI, 3, Sept. 1966.

D. Shillan: in English Language Teaching XXI, 2,
Jan. 1967.

J.Y. Dolby: Reports to C.L.R.U. 1965-66.

J.E. Dobson: C.L.R.U. work paper ML 185, and
later developments.