American Journal of Computational Linguistics
Microfiche 16
SIMPLE DIGITAL SPEECH SYNTHESIS
William M. Fisher and A. Maynard Engebretson
Central Institute for the Deaf
818 South Euclid Street
St. Louis, Mo. 63110
and
Biomedical Computer Laboratory
Washington University School of Medicine
700 South Euclid Street
St. Louis, Mo. 63110
Copyright 1975
Association for Computational Linguistics
Relatively simple computer methods for synthesizing speech to
be used in phonetic/perceptual research are presented, with particular
reference to the problems and successes encountered in the
development of such a system at Central Institute for the Deaf and the
Biomedical Computer Laboratory of Washington University. The
purpose of this paper is to present a synthesis and clarification
of established methods so as to encourage other computational
linguists to tackle digital speech synthesis. The approach is
semi-tutorial: crucial algorithms are given in Fortran or block-diagram
form, and bibliographic references that were found to be most
useful in the system development are listed and discussed.
The system described requires a minimum of hardware; a
minicomputer is sufficient if it is equipped with tape or disk
secondary memory. The sound pressure wave is calculated entirely
by software, and only a digital-to-analog converter and a low-pass
filter are required to convert it to a recordable electrical signal.
The vocal apparatus is simulated by a rough model which is still
general enough to make most speech sounds. The two main types of
excitation of the vocal tract -- periodic glottal waves for voicing
and random noise for frication or aspiration -- are supplied by
algorithms presented as function subroutines. The effect of the
vocal tract on these inputs is modelled by combinations of three
other elemental functions, whose coding is based on recursive
equation theory for computational efficiency. A resonance provides the
user with a means for accentuating the signal at a certain
frequency, such as a formant frequency of a vowel; an anti-resonance or
notch filter is provided to cut back the energy at a certain
frequency, as in simulation of the nasal anti-formant; and a
radiation-effect subroutine simulates the effect on the speech signal of
passage from the lips through a short stretch of air. Empirically
obtained wave shapes and spectra of the outputs of these five basic
functions are given in order to give the reader a better feel for
what they do.
These five elements can be combined in a number of ways.
A detailed discussion is given for one of the simplest reasonable
models, in which the glottal wave and frication generators excite
a series of three variable resonators, using a set of fixed
resonances to simulate higher frequency formants in addition to the
radiation-effect simulator. Several other more complex arrangements
are presented, including parallel resonator models and models with
separate filters for shaping voiced, fricative, and nasal components,
and their advantages and disadvantages are discussed.
An example is given of a complete modular Fortran program
generating the word "seat". The equations for specifying the
parameters controlling the elemental functions were derived, with much
effort, from analysis of one token utterance, and spectrograms of
the real and synthetic words are shown to illustrate the degree of
naturalness obtainable with the simple three-resonator series model.
A simpler example for generating a constant vowel sound is also
given, along with a summary of data useful in making many vowels.
This paper is a slightly expanded version of one given orally
at the 12th annual A.C.L. meeting in Amherst, Massachusetts.
Table of Contents
                                                    Page No.
  I. Introduction .................................... 4
 II. Overview
III. Basic Elements
     A. Sources ..................................... 10
        1. Glottal Wave Generator ................... 10
        2. White Noise Generator .................... 16
     B. Spectral Shaping Elements ................... 17
        1. Resonances and Anti-resonances ........... 17
        2. Radiation Effect
 IV. Organization of Elements ....................... 22
     A. The Simplest Model
     B. Control ..................................... 28
     C. Other Models ................................ 30
        1. Parallel Formant
        2. Separate Noise Shaping Channel
        3. Other More Complex Models ................ 33
     D. A Complete Example .......................... 33
  V. Final Remarks .................................. 44
Bibliography ........................................ 46
I. Introduction
Several years ago, it was decided that the Research Department
at Central Institute for the Deaf in St. Louis should have
a digital speech synthesizer to aid studies in psychoacoustics
and phonetic perception. The equipment on hand at the time
consisted primarily of a 12-bit-word mini-computer with keyboard
and scope, with special-purpose hardware for doing
digital-to-analog conversion, low-pass filtering, and floating-point
arithmetic. More peripheral devices and core memory have now
been added. We have been working since then on writing digital
computer programs to synthesize speech, studying the literature,
and gaining practical experience.
Almost all of the theory and techniques necessary to program
the synthesis of English sounds can be found in published literature,
but in bits and pieces, here and there. Utilizing the contributions
of many authors, plus our own experience, we present and explain
a basic program for synthesizing speech, in the hope that
computational linguists who have not worked with low-level speech phenomena
may be encouraged to program synthesizers.
The use of synthetic-speech stimuli has been extremely
important to the investigation of the perceptually distinctive features
of speech and of low-level phonological rules, but much work remains
undone. Synthesizing speech is clearly important to phonetic
research, and the field could well use more researchers with
linguistic training. The system described in this paper requires
relatively small investments in equipment and programming.
II. Overview
Synthesizers use varying amounts of special-purpose hardware.
The type of synthesis we describe here uses the bare minimum,
calculating the speech wave on a digital computer and requiring
only a digital-to-analog converter and a low-pass analog filter
as special hardware. This minimum set of equipment is shown in
Figure 1, page 6. If we assume that tape or disk secondary
storage is available, then a 4 k 12-bit-word mini-computer is
sufficiently large, and a 12-bit D/A converter will give enough dynamic
range.
Once the speech wave is generated and stored on tape or
disk, there is the problem of writing a program to output enough
of it synchronously at a fast enough rate. We will not go
further into this problem here, since the solution will depend
on the particular machine installation you have. Whatever output
sample rate you achieve, there are two things to note: the
analog low-pass filter should pass only frequencies below one-half
the output sample rate, and the output sample rate is a parameter
whose value must be fed into the digital calculations.
Although our synthesizer can be described as a terminal
analog model of the vocal apparatus, the attitude we take is
that our method of synthesis is used rather to produce the
significant acoustic features of speech. To set the stage for an
understanding of the synthesizer presented here, and for the
benefit of those not familiar with acoustic phonetics, let us
review for a minute what we are synthesizing.
Figure 1. Minimum Hardware Needed.
Figure 2, page 8, shows a typical spectrogram of the word
"seat". Frequency is the vertical axis, time the horizontal,
and darker marks indicate more energy. The mottled area at
the upper left is the quasi-random noise of the /s/ sound.
To make other fricatives, such as /f/, we need to alter the
frequency spectrum and intensity of the noise. The dark
horizontal bands in the center are concentrations of energy
-- resonances called "formants" -- characteristic of vowel
sounds. To make different vowels, we need to change the
center frequencies, bandwidths, and relative intensities of
the three lower formants visible here. There are
higher-frequency formants, which do not show up well in this
spectrogram, but they seem to be important only to the naturalness
of the speech, not to which vowel is perceived. Note the
beginning and ending slopes of the formants; these formant
transitions are crucial to the perception of occlusive
consonants such as the stops /b/, /d/, and /g/. And finally, the
vertical mark at the far right is a burst of noise marking the
release of the final consonant.
The detailed algorithms we present here will be expressed
in Fortran for clarity, although the synthesizer with which
we have had the most experience is coded in assembly language.
We are presently converting to Fortran, and the subroutines
and final example program listed in this paper have been tested
in their Fortran form.
Before we get into details of synthesis, consider the
overall logic, Figure 3, page 9. The main thing to note in
this logic is that we synthesize the speech a point at a time,
in one pass, storing each buffer-full on tape or disk as it is
generated. This program will produce an integral number of
buffer-loads, but other methods for terminating the main loop
are easy to implement.
Figure 2. Typical Spectrogram of "Seat" (horizontal axis: time in msec).
C     SKELETON OF SYNTHESIS PROGRAM
      DIMENSION NBUF(256)
C     THE TWO DO STATEMENTS BELOW ARE RECONSTRUCTED FROM LABELS
C     750 AND 800; NBLK AND NBLKS ARE ASSUMED NAMES
      DO 800 NBLK=1,NBLKS
      DO 750 NPT=1,256
C
C     GENERATE NEXT SPEECH WAVE POINT
C     AS THE VALUE OF YN
C
C
C     STORE SPEECH WAVE POINT
C
      NBUF(NPT)=IFIX(YN)
  750 CONTINUE
C
C     WRITE OUTPUT BUFFER
C
      CALL WRITB(NBUF)
  800 CONTINUE
      CALL EXIT
      END
Figure 3. Over-all Synthesizer Program Logic.
WRITB is a subroutine to write the contents
of the array NBUF onto tape or disk.
III. Basic Elements
There are five basic elements in this method of speech
synthesis:
1. A glottal wave generator, with controls for repetition
rate (pitch) and amplitude;
2. A white-noise generator, with a control for amplitude;
3. A resonant filter, with center frequency and bandwidth
controls;
4. An anti-resonant filter, with similar controls;
and 5. A radiation-effect simulator.
These five elements can be connected in a variety of ways to
produce models of greater or lesser complexity. The glottal wave
generator and noise generator produce sounds whose spectra are
then shaped by combinations of the other elements.
A. Sources 
1. Glottal-Wave Generator 
Natural glottal waves, while subject to much variation, are 
usually considered to consist of three parts: a glottis-opening 
phase in which the volume velocity is increasing, a glottis-closing 
phase in which the volume velocity is decreasing, and a glottis- 
closed phase in which the volume velocity is zero. The spectrum of 
such waves is supposed to fall off at about 10 to 12 dB/octave,
1 
1 Cf. Flanagan (1958) 

Figure 4. Wave Shape and Spectral Analysis of a
Typical Glottal Wave Period. (Panels: speech wave,
normalized, against time in msec; intensity spectrum;
phase angle spectrum.) The intensity spectrum
has been normalized so that the largest component is
at the 80 dB level; before normalization it was 61.54
dB relative to an arbitrary standard of 0.25. The
horizontal line across the intensity spectrum graph is
a conservatively estimated noise cut-off line; any
component with magnitude less than this level may be the
product of computational noise. The phase angles of the
components are $\varphi_n$, where the wave is represented by
$\sum_n A_n \sin(n\omega_1 t + \varphi_n)$ and $\varphi_1$ is
arbitrarily zero. The phase angle display is suppressed for any
component whose intensity falls below the noise cut-off line.
execution time and the polynomial scored the highest in his
tests of naturalness. Figure 5, page 14, shows the wave shape
and spectrum of a linear approximation and Figure 6, page 14,
shows that of a polynomial approximation. The over-all falloff
of the spectrum of the polynomial more nearly matches our example
wave, but no lobes are apparent, as they are in the spectra of
the linear approximation and the natural example. Judging
from some informal listening tests we have made, these
distinctions do not seem to make a great difference: both
approximations sound good.
As Figure 7, page 15, we present a Fortran function
GLOT(P,AV) for generating glottal waves using the polynomial
approximation. Values of pitch (P) and zero-to-peak amplitude
(AV) are passed directly as control parameters.
The subroutine uses the common area to store several variables,
which of course could be declared as formal parameters instead.
ISWV is a voicing switch used in the logic internal to GLOT.
TDEL is the period between output sample points in milliseconds;
in the over-all initialization of the program, its value should
be calculated as 1000.0/OSR, where OSR is the output sample
rate in samples per second. TG is a variable used as a simulated
time clock by GLOT, keeping track of how far through the glottal
wave it has gone. TP, T1, and T2 are durations (in milliseconds)
from the beginning of the glottal pulse, calculated and used by
GLOT: TP is the duration of the pulse, T1 is the duration of
the opening phase, and T2 is the duration of the opening and
closing phases combined. OPTR (an acronym for "opening-time
Figure 5. Wave Shape and Spectral Analysis of a
Linear Approximation to a Glottal Wave. For
details of the display, cf. Figure 4.
Figure 6. Wave Shape and Spectral Analysis of a
Polynomial Approximation to a Glottal Wave. For
details of the display, cf. Figure 4.
      FUNCTION GLOT(P,AV)
      COMMON ISWV,TDEL,TG,TP,T1,T2,OPTR,CLTR,AVSAVE
C
C     PRODUCES A POLYNOMIAL APPROXIMATION TO A GLOTTAL WAVE
C     WITH CONSTANT WAVE SHAPE
C
C     USES THE FOLLOWING VARIABLES FROM COMMON: ISWV,
C     TDEL,TG,TP,T1,T2,OPTR,CLTR,AVSAVE
C
C     FIRST SEE IF WE NEED TO RE-INITIALIZE
C     FOR BEGINNING OF GLOTTAL WAVE
C
C     ARE WE NOW GENERATING VOICE?
      IF (ISWV) 900,20,10
C     YES -- IF WE ARE NOT AT THE END OF A PULSE,
C     PARAMETERS ARE O.K.
C     OTHERWISE WE NEED TO CHECK AV TO SEE
C     IF WE NEED ANOTHER PULSE
   10 IF (TG+0.5*TDEL-TP) 50,20,20
C     EITHER WE HAVE NOT BEEN GENERATING VOICE OR
C     WE HAVE JUST FINISHED A PULSE --
C     IF AV > 0, INITIALIZE TO GENERATE A(NOTHER) PULSE
C     AND RESET ISWV TO 1
C     OTHERWISE (RE)SET ISWV TO 0
   20 IF (AV) 30,30,40
   30 ISWV=0
      GO TO 50
C     INITIALIZE FOR ANOTHER PULSE
   40 ISWV=1
      AVSAVE=AV
      TG=0.0
      TP=1000.0/P
      T1=OPTR*TP
      T2=T1+CLTR*TP
   50 CONTINUE
C
C     END OF PARAMETER SETTING
C
C     BEGINNING OF LOGIC TO GENERATE GLOTTAL WAVE
C
      IF (ISWV) 57,55,57
   55 Y=0.0
      GO TO 180
   57 CONTINUE
C
C     IF TG < T1, Y=AV*(3*(TG/T1)**2-2*(TG/T1)**3)
  130 IF (TG-T1) 140,150,150
  140 Y=AVSAVE*(3.0*((TG/T1)**2)-2.0*((TG/T1)**3))
      GO TO 180
C     ELSE IF TG < T2, Y=AV*(1-(((TG-T1)/(T2-T1))**2))
  150 IF (TG-T2) 160,170,170
  160 Y=AVSAVE*(1.0-(((TG-T1)/(T2-T1))**2))
      GO TO 180
C     ELSE ZERO
  170 Y=0.0
      GO TO 180
C
C     END OF GLOTTAL PULSE GENERATION
C     RETURN VALUE OF GLOTTAL WAVE
C     AND INCREMENT TG
C
  180 GLOT=Y
      TG=TG+TDEL
      RETURN
C
C     ERROR IN VALUE OF ISWV
C
  900 WRITE(1,910) ISWV
  910 FORMAT(' ERR IN GLOT: ISWV=',I5)
      CALL HOLD
      CALL EXIT
      END
C     GLOT
Figure 7. Fortran Function GLOT to Generate Glottal Waves.
ratio") is the fraction of the wave occupied by the opening phase,
and CLTR is the fraction occupied by closing. In the over-all
initialization, OPTR should be set to .40 and CLTR to .16, values
which maximize naturalness according to Rosenberg's paper.
If the instantaneous values of AV were used by GLOT, the
standard wave shape would be altered if AV were changing during
generation of a glottal wave. To keep the wave shape constant,
GLOT uses the variable AVSAVE to store the value of AV at the
beginning of each pitch period, and during the generation of
the pulse, AVSAVE is used as the (constant) amplitude.
Between calls to GLOT, the values of ISWV, TG, TP, T1, T2,
and AVSAVE should not be altered.
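The three-phase pulse logic described above can also be sketched without the COMMON-block bookkeeping. The short program below is only an illustration in free-form modern Fortran, not the GLOT listing itself: the sample rate of 10000 samples per second, the pitch of 100 Hz, the unit amplitude, and the names glottal_sketch and pulse are assumptions made for the example, while the values .40 and .16 for OPTR and CLTR are those recommended in the text.

```fortran
program glottal_sketch
  implicit none
  real, parameter :: osr  = 10000.0   ! assumed output sample rate, samples/sec
  real, parameter :: optr = 0.40      ! opening-time ratio, value from the text
  real, parameter :: cltr = 0.16      ! closing-time ratio, value from the text
  real :: tdel, tp, t1, t2, tg, pitch, av

  pitch = 100.0                       ! assumed pitch, Hz
  av    = 1.0                         ! assumed zero-to-peak amplitude
  tdel  = 1000.0 / osr                ! period between samples, msec
  tp    = 1000.0 / pitch              ! duration of one pulse, msec
  t1    = optr * tp                   ! end of opening phase, msec
  t2    = t1 + cltr * tp              ! end of closing phase, msec

  ! Print one pitch period: time in msec, then the pulse value.
  tg = 0.0
  do while (tg + 0.5 * tdel < tp)
     print '(f8.3, 1x, f8.5)', tg, pulse(tg, t1, t2, av)
     tg = tg + tdel
  end do

contains

  ! One sample of the polynomial pulse, tg msec into the period.
  pure function pulse(tg, t1, t2, av) result(y)
    real, intent(in) :: tg, t1, t2, av
    real :: y, r
    if (tg < t1) then
       r = tg / t1
       y = av * (3.0 * r**2 - 2.0 * r**3)   ! glottis-opening phase
    else if (tg < t2) then
       r = (tg - t1) / (t2 - t1)
       y = av * (1.0 - r**2)                ! glottis-closing phase
    else
       y = 0.0                              ! glottis-closed phase
    end if
  end function pulse

end program glottal_sketch
```

Because the amplitude is passed in once per period here, the sketch sidesteps the AVSAVE mechanism; a version with a time-varying AV would need to latch the amplitude at the start of each pulse as GLOT does.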
Rosenberg's equations for the polynomial approximation
are used in GLOT; to get the linear approximation, the
following two lines of Fortran should be substituted in GLOT for
lines number 60 and 64:
2. White Noise Generator 
Almost any reasonably good random-number generator can
be used as a source of white noise. If the spectrum of the
random numbers produced is flat, it will be easier to shape
into the desired spectra for the different fricative sounds.
The algorithm we use was developed for use in synthetic
speech work: it is very fast, and produces noise with a quite
flat spectrum. Its presentation by Rader, Rabiner, Schafer,
and Perry¹ is easy to follow and implement. The logic of the
algorithm is formally stated in Fortran in Figure 8, page 18,
but if possible this should be one function coded in assembly
language: if the right machine instructions are available,
it will be a snap, but bit manipulation in Fortran is very slow.
Our function IRN4(X) -- X is a dummy variable required
by our Fortran compiler -- contains this algorithm in assembly
language, producing on successive calls a series of random
numbers with a uniform distribution over the interval from
-2047 to +2047. To implement a white noise generator, only this
line of coding is needed:
where AN is a variable whose value is the amplitude of noise
desired.
A typical stretch of noise produced in this manner, along 
with its spectrum, is given as Figure 9, page 19. Note that 
there does not appear to be any significant deviation from 
flatness in the spectrum intensity. 
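Where bit intrinsics are available, the exclusive-or, rotate, and shift steps that Figure 8 spells out element by element can be written on whole integers. The following is a minimal sketch under stated assumptions, not the authors' assembly routine: the seed values, the names noise_sketch and next_random, and the two's-complement reading of the 12-bit result (which spans -2048 to +2047 rather than the -2047 to +2047 quoted above) are all assumptions; IEOR, ISHFTC, and ISHFT are standard Fortran intrinsics.

```fortran
program noise_sketch
  implicit none
  integer :: m1, m2, k, r

  ! Assumed starting states, each less than 2**19. The generator only
  ! sticks at zero if both words start at zero.
  m1 = 123456
  m2 = 345678
  do k = 1, 10
     call next_random(m1, m2, r)
     print '(i6)', r
  end do

contains

  subroutine next_random(m1, m2, r)
    integer, intent(inout) :: m1, m2   ! the two remembered 19-bit words
    integer, intent(out)   :: r        ! signed 12-bit random value
    integer :: x
    x  = ieor(m1, m2)                  ! bit-wise exclusive or of past words
    x  = ishftc(x, -8, 19)             ! rotate low 19 bits 8 places right
    m2 = m1                            ! shift past values
    m1 = x
    r  = ishft(x, -7)                  ! left-most 12 of the 19 bits: 0..4095
    if (r >= 2048) r = r - 4096        ! assumed two's-complement reading
  end subroutine next_random

end program noise_sketch
```

A noise sample of amplitude AN would then be formed by scaling r, for example AN*real(r)/2047.0, in the manner of the fricative-noise line of Figure 17.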
B. Spectral Shaping Elements 
1. Resonances and Anti-resonances 
We use recursive equations, a technique developed by
electrical engineers, to simulate resonant and anti-resonant
(notch) filters as elements to shape spectra. Each individual
filter can be represented by a second-order linear differential
1. Rader, Rabiner, and Schafer (1970) and Perry, Schafer,
and Rabiner (1972)
      FUNCTION IRN4(X)
      COMMON NM1(19),NM2(19)
      DIMENSION NX1(19),NX2(19)
C     FORM BIT-WISE EXCLUSIVE OR OF NM1,NM2
      DO 10 I=1,19
   10 NX1(I)=MOR(NM1(I),NM2(I))
C     ROTATE NX1 8 PLACES TO THE RIGHT
      DO 20 I=1,11
   20 NX2(I+8)=NX1(I)
      DO 30 I=12,19
   30 NX2(I-11)=NX1(I)
C     SHIFT PAST VALUES
      DO 40 I=1,19
      NM2(I)=NM1(I)
   40 NM1(I)=NX2(I)
C     RETURN VALUE OF LEFT-MOST 12 BITS OF NX2
      IRN4=MINT(NX2)
C     END
      RETURN
      END
Figure 8. Random Number Generator Documented in Fortran.
NM1(19) and NM2(19) are arrays whose elements have either
the value 0 or 1. MOR(N1,N2) is a function returning the
exclusive or of N1 and N2, variables having either the value
0 or 1. MINT(N) is a function whose argument N is a
bit-string array such as NM1 and which returns a 12-bit integer
value consisting of the left-most 12 elements of N packed
into a single word.
Figure 9. Typical Wave Form and Spectral Analysis
of the Output From the White Noise Generator.
For details of the display cf. Figure 4.
equation, giving only one resonance or anti-resonance. For those
who feel at home in the s-plane, the recent book Speech Synthesis
edited by Flanagan and Rabiner contains reprints of papers
developing the theory of recursive equation filter simulation;
for the rest of us, the paper by Lovell et al. (1973) is a clear
presentation, with some more general Fortran algorithms than
will be given here.
Figure 10, page 21, gives one Fortran subroutine and
two Fortran functions which we use to simulate resonant and
anti-resonant filters.
The functions RES and ARES return the output values of
simple resonant (conjugate pole pair) and anti-resonant
(conjugate zero pair) filters, respectively. A0, A1, and A2 are
coefficients used in the recursive equations. YM1 and YM2 are
remembered previous values of the signal, and Y is the input
to the filter. A0, A1, and A2 determine the characteristics
of the filter: center frequency and bandwidth. Each simulated
filter should have its own variables in which to save the values
of YM1 and YM2, and between calls to the function simulating
that filter, the values of these variables should not be changed.
The subroutine COEFF is used to calculate appropriate
values for A0, A1, and A2, based on CF, the center frequency,
and BW, the bandwidth, of the resonance or anti-resonance.¹
SR is the output sample rate, and MPZ tells the subroutine
1. In the version of this paper presented at the 12th annual 
A.C.L. meeting, there was an error in line 16 of subroutine COEFF 
      SUBROUTINE COEFF(CF,BW,A0,A1,A2,SR,MPZ)
C
C     COMPUTES THE RECURSIVE EQUATION COEFFICIENTS
C     A0,A1,A2 FOR EITHER A RESONANT OR ANTIRESONANT FILTER
C     SPECIFIED BY CENTER FREQUENCY CF AND BANDWIDTH BW (HZ.)
C     SR IS THE SAMPLE RATE (SAMPLES/SEC)
C     IF MPZ=1, FILTER WILL BE A RESONANCE
C     IF MPZ=0, FILTER WILL BE AN ANTIRESONANCE
      PI=3.14159265
      A=PI*BW/SR
      B=2.0*PI*CF/SR
      A2=EXP(-2.0*A)
      A1=2.0*EXP(-A)*COS(B)
      A0=1.0-A1+A2
      IF (MPZ) 20,10,20
   10 A0=1.0/A0
   20 RETURN
      END
      FUNCTION RES(Y,YM1,YM2,A0,A1,A2)
C
C     SIMULATES RESONATOR (CONJUGATE POLE PAIR)
C     GIVEN BY RECURSIVE EQUATION COEFFICIENTS A0,A1,A2
C     YM1 AND YM2 ARE PAST VALUES OF Y; THEIR VALUES
C     MUST BE SAVED
      YRES=A0*Y+A1*YM1-A2*YM2
      YM2=YM1
      YM1=YRES
      RES=YRES
      RETURN
      END
      FUNCTION ARES(Y,YM1,YM2,A0,A1,A2)
C
C     SIMULATES ANTIRESONATOR (CONJUGATE ZERO PAIR)
C     GIVEN BY RECURSIVE EQUATION COEFFICIENTS A0,A1,A2
C     YM1 AND YM2 ARE PAST VALUES OF Y; THEIR VALUES
C     MUST BE SAVED
      TEMP=Y*A0
      ARES=TEMP-A1*YM1+A2*YM2
      YM2=YM1
      YM1=TEMP
      RETURN
      END
Figure 10. Fortran Implementation of Elemental Filters 
whether to compute coefficients for a resonance or an
anti-resonance.
The spectral effect of a resonance with center frequency
of 3000 Hz and bandwidth of 200 Hz is illustrated in Figure 11,
page 23, while Figure 12, page 23, is a similar illustration of
the effect of an anti-resonance of the same center frequency
and bandwidth. In these figures, the first graph shows output
for an impulse input and the second graph shows the normalized
intensity spectrum of that output.
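To make the impulse experiment concrete, the sketch below computes the coefficients of a 3000 Hz resonance with a 200 Hz bandwidth and prints the first twenty samples of its impulse response. It is an illustration only: the sample rate of 20000 samples per second and the names resonator_sketch and coeff are assumptions, and the coefficient formulas are the standard two-pole digital resonator with unity gain at zero frequency, which is how the COEFF and RES listings of Figure 10 are read here.

```fortran
program resonator_sketch
  implicit none
  real, parameter :: pi = 3.14159265
  real, parameter :: sr = 20000.0     ! assumed sample rate, samples/sec
  real :: a0, a1, a2, ym1, ym2, x, y
  integer :: n

  call coeff(3000.0, 200.0, sr, a0, a1, a2)
  print '(a, 3f10.6)', ' a0, a1, a2 = ', a0, a1, a2

  ! Drive the resonator with a unit impulse and print its response.
  ym1 = 0.0
  ym2 = 0.0
  do n = 0, 19
     x = merge(1.0, 0.0, n == 0)        ! 1 at the first sample, else 0
     y = a0 * x + a1 * ym1 - a2 * ym2   ! second-order recursive equation
     ym2 = ym1                          ! remember the two past outputs
     ym1 = y
     print '(i4, f12.6)', n, y
  end do

contains

  ! Resonance coefficients from center frequency cf and bandwidth bw (Hz).
  subroutine coeff(cf, bw, sr, a0, a1, a2)
    real, intent(in)  :: cf, bw, sr
    real, intent(out) :: a0, a1, a2
    real :: a, b
    a  = pi * bw / sr
    b  = 2.0 * pi * cf / sr
    a2 = exp(-2.0 * a)
    a1 = 2.0 * exp(-a) * cos(b)
    a0 = 1.0 - a1 + a2                  ! gives unity gain at zero frequency
  end subroutine coeff

end program resonator_sketch
```

The printed response is a decaying oscillation; a narrower bandwidth makes the decay slower, and a different center frequency changes the spacing of the zero crossings.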
2. Radiation Effect 
The effect on the spectrum of the radiation of sound from
the lips through a short stretch of air can be reasonably
approximated by a differentiator.¹ Figure 13, page 24, gives a simple
Fortran function RAD simulating this effect. Y is the speech
wave input to the simulator, YM1 is the remembered immediately
previous value of Y, and G is a normalizing gain control which
should be calculated in the over-all initialization as a direct
function of the output sample rate, some K times OSR. The
spectral effect of RAD, approximately a 6 dB/octave rise, is
illustrated in Figure 14, page 24.
IV. Organization of Elements 
A. The Simplest Model 
The simplest reasonable model for connecting these elements,
which we have taken to calling "Model T", is given in block
diagram form in Figure 15, page 25. We use this organization
in our currently running synthesizer. The three variable
1. Rabiner (1968) p. 823 
Figure 11. Spectral Shaping Effect of an Elemental
Resonant Filter with CF=3000 Hz and BW=200 Hz.
Figure 12. Spectral Shaping Effect of an Elemental
Anti-Resonant Filter with CF=3000 Hz and BW=200 Hz.
      FUNCTION RAD(Y,YM1,G)
      RAD=G*(Y-YM1)
      YM1=Y
      RETURN
      END
Figure 13. Fortran Function to Simulate Radiation
Effect.
Figure 14. Spectral Shaping Effect of Radiation
as Simulated by Function RAD. (Horizontal axis: frequency
in kHz; plotted quantity: normalized intensity spectrum.)

resonant filters are used to make the formants of voiced speech
and, in a rather strained fashion, to shape the noise spectrum
during frication and aspiration.
All of the elements in this figure should now be familiar
except the one called "higher order correction filter." This is
a series of resonant filters of fixed center frequency and
bandwidth which compensate for the effect of higher-frequency
resonances present in a real vocal tract but absent in a digital
simulation of this kind. Their use is discussed in Rabiner (1968),
from which the values presented in Figure 16, page 27, were taken.
These are the values to use for center frequency and bandwidth of
the higher order correcting filters. Only higher-order filters
with center frequency less than one-half the output sample rate
should be used. The recursive equation coefficients A0, A1, and A2
need be calculated only once, in the over-all initialization.
Theoretically, the order of computation of the series elements,
such as those in the main stem of Model T, makes no difference.
However, because the digital numbers are finite in length, round-off
or truncation errors are introduced at each step of the computation.
The overall error increases as the number of computational steps
increases. Some types of computation such as differentiation tend
to increase the error, while other types such as integration tend
to decrease the error. For this reason, overall system error is
related in a complex way to the order of computation. An understanding
of error buildup and testing of the various algorithms will help in
choosing the computational sequence that results in smallest errors.
In the case of cascaded resonators, it is better to perform the
computation in reverse order from that implied by Fig. 15 --
radiation effect first, then higher order filters in descending center
frequency order, then formant filters.
Resonator No.   Center Freq. (Hz)   Bandwidth (Hz)
Figure 16. Higher Order Correction Filter Center
Frequencies and Bandwidths. From Rabiner (1968).
C     ZERO VARIABLE HOLDING NEXT SPEECH WAVE POINT
      YN=0.0
C     ADD GLOTTAL WAVE
      YN=YN+GLOT(P,AV)
C     ADD FRICATIVE NOISE
      YN=YN+AN*(FLOAT(IRN4(X))/2047.0)
C     APPLY FORMANT FILTERS
      DO 200 I=1,3
  200 YN=RES(YN,YM1(I),YM2(I),A0(I),A1(I),A2(I))
C     APPLY HIGHER ORDER CORRECTING FILTERS
      DO 250 I=1,7
  250 YN=RES(YN,HYM1(I),HYM2(I),HA0(I),HA1(I),HA2(I))
C     APPLY RADIANCE EFFECT
      YN=RAD(YN,RYM1,GRAD)
Figure 17. Model T Logic in Fortran.
The conversion of the Model T block diagram into Fortran is
illustrated by Figure 17, page 27, the series elements being
here computed in their natural order. This coding is an example
of what should be inserted into the over-all logic (Figure 3,
page 9) following the comment lines "GENERATE NEXT SPEECH WAVE
POINT...".
B. Control 
In the Model T organization, nine control parameters are
available -- P (pitch), AV (amplitude of voicing), AN (amplitude of noise),
and the center frequency and bandwidth of three variable formant
filters. In the main loop, just before generating the next speech
wave point, a subroutine (it can be in-line code, of course)
calculating values of these parameters is needed. If at the beginning
of the program the variable T (time) is initialized to zero
and incremented by TDEL at the end of the main loop, it can serve
as a simulated-time clock on which to base calculation of the
control parameters. The simplest method of control is to
formulate the desired control parameter curves algebraically and just
include Fortran statements in this section calculating their
values as in the algebraic equations. For example, suppose we
wanted the pitch to rise linearly from 80 to 120 Hz in the first
100 msec, stay constant at 120 Hz for 200 msec, then fall linearly
to 100 Hz in the next 100 msec and stay at that value from then on.
The following Fortran statements can be used to calculate P:
      GO TO 270
  260 P=100.0
  270 CONTINUE
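The pitch contour described in words above (80 to 120 Hz over the first 100 msec, 120 Hz for the next 200 msec, a fall to 100 Hz over the following 100 msec, then constant) can be written as a piecewise function of the simulated time T. The program below is a minimal sketch in free-form modern Fortran; the names pitch_sketch and pitch and the 50 msec printing step are assumptions made for the example.

```fortran
program pitch_sketch
  implicit none
  real :: t
  integer :: k

  ! Tabulate the contour every 50 msec from 0 to 450 msec.
  do k = 0, 9
     t = 50.0 * real(k)
     print '(f7.1, f8.2)', t, pitch(t)
  end do

contains

  ! Pitch in Hz as a function of simulated time t in msec.
  pure function pitch(t) result(p)
    real, intent(in) :: t
    real :: p
    if (t < 100.0) then
       p = 80.0 + 40.0 * t / 100.0               ! rise from 80 to 120 Hz
    else if (t < 300.0) then
       p = 120.0                                 ! hold at 120 Hz
    else if (t < 400.0) then
       p = 120.0 - 20.0 * (t - 300.0) / 100.0    ! fall from 120 to 100 Hz
    else
       p = 100.0                                 ! stay at 100 Hz
    end if
  end function pitch

end program pitch_sketch
```

In a synthesis loop the same function would be evaluated with T advanced by TDEL for each sample, or only at the start of each pitch period.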
When new values of CF and BW are computed for the three
variable formant filters, subroutine COEFF should be called to
translate these into the coefficients A0, A1, and A2 actually
used by function RES.
If new values of the control parameters are calculated every
sample point, the execution time of the program will be very long.
There are several obvious ways to speed up this calculation.
One way is to calculate new values for P and AV only at
the beginning of each pitch period, since this is the only time
GLOT uses them.
With some error introduced, the control parameters can be
re-computed only every so many msec to speed things up. A
variable used as a time clock in the same way that TG is used
by GLOT can control this period. Computing the source control
parameters (P, AV, and AN) this way introduces error only in that
the actual parameter curves will follow the desired curve in a
step-wise fashion, but changing the characteristics of the
formant filters this way will introduce another type of error,
which comes out sounding like clicks or static if the change in
filter characteristics is too large.
Our currently-implemented assembly-language synthesizer
reads a file of tabled values created by another program as
values representing the parameter curves. The period between
tabled parameter values is changeable, but on the order of 5
to 10 msec. In computing the actual parameter values used, the
synthesizer interpolates linearly along the tabled parameter data
curves. The step size of the interpolation can be easily changed,
allowing a smooth trade-off between accuracy and execution time.
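Linear interpolation along a table of parameter values can be sketched as follows. This is an illustration under stated assumptions rather than a description of the assembly-language synthesizer: the table spacing of 10 msec, the made-up center-frequency values, the 2.5 msec evaluation step, and the names interp_sketch and tabval are all hypothetical.

```fortran
program interp_sketch
  implicit none
  real, parameter :: tstep = 10.0     ! assumed spacing of table entries, msec
  ! Made-up formant center frequencies in Hz, one entry every tstep msec.
  real, parameter :: f1tab(4) = [300.0, 320.0, 400.0, 400.0]
  real :: t
  integer :: k

  ! Evaluate the interpolated curve every 2.5 msec from 0 to 30 msec.
  do k = 0, 12
     t = 2.5 * real(k)
     print '(f6.1, f9.2)', t, tabval(f1tab, tstep, t)
  end do

contains

  ! Value at time t (msec, t >= 0) interpolated linearly between entries.
  pure function tabval(tab, tstep, t) result(v)
    real, intent(in) :: tab(:), tstep, t
    real :: v, frac
    integer :: i
    i = int(t / tstep) + 1                 ! table entry at or before t
    if (i >= size(tab)) then
       v = tab(size(tab))                  ! hold the last value past the end
    else
       frac = t / tstep - real(i - 1)      ! fraction of the way to entry i+1
       v = tab(i) + frac * (tab(i + 1) - tab(i))
    end if
  end function tabval

end program interp_sketch
```

Evaluating tabval less often than once per sample corresponds to a larger interpolation step, trading accuracy for execution time in the way the text describes.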
To sum up, computing new control parameter values for each
sample point generated is the easiest and most accurate way, but
alternative schemes allowing a convenient trade of accuracy for
speed are easily programmed.
C. Other Models 
We will briefly describe several alternative organizations 
of the elements, although most of our practical experience has 
been with the Model T organization. 
1. Parallel Formant
One model used in some synthesizers is the parallel
formant model, whose block diagram is given as Figure 18,
page 32. In serial formant models such as Model T, no
independent control of the relative intensities of formants
is possible, since the order of operations is immaterial. It
has been shown that the relative intensities of formants in a
serial synthesis closely match those found in natural speech,¹
which is some justification of the serial model as an analog
of the vocal tract. But in a parallel formant arrangement, each
parallel channel must have a separate gain control. This is fine
if you're investigating the perception of relative formant
intensities, but not many have chosen this model for general speech
synthesis.
Rabiner (1968) contains a worthwhile discussion of the
relative merits of serial and parallel synthesis. If you decide
to try a parallel formant model, the higher order correcting
filters are apparently unnecessary, and Rabiner (1968) mentions
that zeros -- anti-resonances -- are introduced into the spectrum.²
2. Separate Noise Shaping Channel 
In Model T, the same three filters are used to make the
formants of voiced speech and to shape the spectrum of noise
during unvoiced speech. This is cumbersome and difficult, and
a simple alternative is illustrated in Figure 19, page 34.
1. Fant (1956)
2. So does Flanagan (1957)
Figure 18. Parallel Formant Organization Model.
(Block diagram labels: glottal wave, AV; resonances No. 1,
No. 2, and No. 3 with controls CF1, BW1, CF2, CF3, BW3;
radiation effect.)
A separate channel is devoted to noise, with its own resonant
and anti-resonant filters for spectral shaping. It has been
suggested that one resonance and one anti-resonance are sufficient
to model most English fricatives.¹ Of course, this model adds
two new control parameters to be computed.
3. Other More Complex Models 
Model T does not use anti-resonances; they are not typical
of voiced speech, but rather are present in the spectra of
fricatives and nasalized segments. For making nasal sounds, a
parallel nasal channel whose input is the glottal wave and whose
output is added in just before the radiance effect calculation
can be added. The spectral shaping filters needed in this channel
are not obvious from published reports, but one variable
anti-resonance and several fixed resonances are probably a minimum
complement.
A multitude of more complex models can be seen in the litera- 
ture, -- Rabiner (1968). to take one example, includes a special 
a'rrangement for generating voiced fricatives. 
1. Heinz and Stevens (1961)

[Figure 19 block diagram; labels: GLOTTAL WAVE PARAMETERS,
VOCAL TRACT PARAMETERS, RESONANCES, ANTI-RESONANCE]
Figure 19. Block Diagram of Model With Separate
Noise Shaping Channel

D. A Complete Example

To illustrate the capabilities of the simplest synthesis
model, on the following pages we present as Figure 20 a complete
Fortran program for synthesizing the word "seat" ([sitʰ]).
We tried to duplicate one particular token utterance of this word.
Spectrograms of the original sound used as a model and the
synthesized sound calculated by the Fortran program are shown in
Figure 21, page 44. Deriving the parameter curves from the
token utterance took a lot of work, but as Figure 21 shows, the
resulting synthetic word is a reasonably close copy of the original.

For those who may want to use this program as a beginning to
their work, several features of it will be explained.
The basic model used is the simplest, Model T (Figure 15,
page 25), with one addition: the noise signal is multiplied by a
relative gain constant (GFRIC) before entering the vocal tract
(variable filter) section.
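The order of operations for one output sample can be sketched as below. This is a Python paraphrase for orientation, not a translation of Figure 20: the uniform noise source, the first-difference radiation effect, and the function and argument names are assumptions, while the noise gain GFRIC and the clipping limit of 2047 follow the sample program.

```python
import random

def synthesize_block(n, source, an, gfric, filters, gain, seed=0):
    """Serial synthesis of n samples.

    source  : glottal wave samples (any excitation; supplied by the caller)
    an      : noise amplitude; gfric : relative noise gain
    filters : list of (a, b, c) resonator coefficients, applied in series
    """
    rng = random.Random(seed)
    state = [[0.0, 0.0] for _ in filters]   # y[n-1], y[n-2] for each resonator
    previous = 0.0                          # last filter output, for radiation
    out = []
    for i in range(n):
        # Excitation: glottal wave plus noise scaled by the relative gain.
        yn = source[i] + gfric * an * rng.uniform(-1.0, 1.0)
        # Vocal tract: resonators in series.
        for k, (a, b, c) in enumerate(filters):
            y1, y2 = state[k]
            yn = a * yn + b * y1 + c * y2
            state[k] = [yn, y1]
        # Radiation effect, approximated here by a first difference (assumption).
        radiated = yn - previous
        previous = yn
        sample = gain * radiated
        # Clipping check against a 12-bit converter range.
        if abs(sample) > 2047.0:
            raise ValueError("speech is clipped")
        out.append(int(sample))             # truncate toward zero
    return out
```

With `an` set to zero the routine reduces to voiced synthesis; with a zero `source` it produces shaped noise only.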
The output sound wave is stored a block at a time in file
WRK1. This file is opened by the subroutine called in line 43,
written into in line 455, and closed in line 470.
Parameter values are periodically calculated from piecewise
polynomial algebraic specifications in the section called
"GPAR", lines 142 to 415. The periods between parameter
re-calculations are controlled by two variables serving as
clocks, TVOC for voicing parameters and TFRIC for frication
parameters. The values of PVOC and PFRIC are the times in msec
between re-calculations of vocalic and fricative parameter values,
respectively. These parameter values can be reset more often by
merely changing the values assigned to PVOC and PFRIC in lines
93 and 37; at present frication parameters are reset every
0.1 msec and vocalic parameters every 0.2 msec. The Fortran
logic calculating the parameters was coded for clarity, not
economy, and though lengthy should be easy to follow.
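The clock scheme can be sketched as follows, in Python. Two points are assumptions on our part: that each clock is decremented by DT once per output sample (that statement is not visible in the listing as reproduced here), and the small tolerance `eps`, which is added only to guard against floating-point round-off in the countdown.

```python
def count_resets(n_samples, dt=0.05, pfric=0.1, pvoc=0.2, eps=1e-9):
    """Count how often each parameter set would be recalculated.

    dt, pfric and pvoc are in msec; dt = 0.05 corresponds to
    20000 samples/sec.
    """
    tfric = tvoc = 0.0          # both clocks start expired
    fric_resets = voc_resets = 0
    for _ in range(n_samples):
        if tfric <= eps:        # time to recompute frication parameters
            fric_resets += 1
            tfric = pfric
        if tvoc <= eps:         # time to recompute voicing parameters
            voc_resets += 1
            tvoc = pvoc
        tfric -= dt             # assumed per-sample countdown
        tvoc -= dt
    return fric_resets, voc_resets

# 40 samples span 2 msec of speech.
resets = count_resets(40)
```

With these settings the frication parameters are recomputed every second sample and the voicing parameters every fourth sample.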
The primitive subroutine TPOW returns powers of a variable for
ease in calculating polynomial functions of time: after calling
TPOW(T, N, TX),

    TX(1)=T, TX(2)=T**2, ..., TX(N)=T**N
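A Python equivalent of this helper, together with the way it is used to evaluate a polynomial, might look like the sketch below. The names `tpow` and `poly` and the list-based interface are ours; the Fortran version fills a caller-supplied array instead.

```python
def tpow(t, n):
    """Return [t**1, t**2, ..., t**n], built by repeated multiplication."""
    powers = []
    p = 1.0
    for _ in range(n):
        p *= t
        powers.append(p)
    return powers

def poly(coeffs, t):
    """Evaluate coeffs[0] + coeffs[1]*t + coeffs[2]*t**2 + ... using tpow."""
    tx = tpow(t, len(coeffs) - 1)
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], tx))

# Illustrative polynomial 1 + 2t + 3t**2 at t = 2.
value = poly([1.0, 2.0, 3.0], 2.0)
```

In the sample program the argument is usually a time offset from the start of a segment, so that each piece of a parameter curve has its own local origin.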
Certain parameters are constant during some of the sounds,
e.g., spectral parameters during "S". As a minor concession to
execution speed, these constant parameters are not reset after
the first entry into the section during which they are constant.
The variables ISW1, ISW2, and ISW3 are "first-time-through"
switches, used to remember whether or not the temporarily
constant parameters have been calculated yet.
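The idea of a first-time-through switch can be sketched in Python as below. The class and its names are hypothetical; the Fortran program uses a plain integer variable and an arithmetic IF for the same purpose.

```python
class ConstantSection:
    """Compute a set of constant parameters once, on first entry only."""

    def __init__(self, compute):
        self.isw = 0            # 0 until the constants have been calculated
        self.compute = compute  # function returning the constant parameters
        self.values = None

    def enter(self):
        if self.isw == 0:       # first time through: do the expensive work
            self.values = self.compute()
            self.isw = 1
        return self.values

# Illustrative use: three bandwidths that stay fixed during a vowel.
vowel = ConstantSection(lambda: (54.0, 55.0, 170.0))
bandwidths = vowel.enter()
```

Every later call to `enter` returns the stored values without recomputing them, which is where the saving in execution time comes from.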
We have found that duplicating a token of natural speech
using this simple basic synthesis model, though possible, can
be quite difficult, requiring much trial-and-error work.
Fortunately, much research on speech perception does not require
exact duplication of given utterances, but instead uses simpler
sets of parameter curves. An example of such a program, which
synthesizes the vowel /i/ with constant pitch and intensity, can
be made by substituting the following code for the "GPAR" section,
lines 151 to 411 of the sample program.
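C     CONSTANT PARAMETERS ARE SET ON THE FIRST PASS ONLY
      IF (ISW1) 150,100,150
  100 ISW1=1
      P=100.0
      AVDB=50.0
C     CONVERT VOICING AMPLITUDE FROM DB TO A LINEAR FACTOR
      AV=10.0**(AVDB/20.0)
      DO 110 I=1,3
      CALL COEFF(CF(I),BW(I),A0(I),A1(I),A2(I),OSR,1)
  110 CONTINUE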
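Amplitudes in the program are specified in decibels and converted to linear multipliers before use. A minimal Python sketch of that conversion and its inverse follows; the function names are ours.

```python
import math

def db_to_amplitude(db):
    """Linear amplitude factor for a level in dB: 10 ** (db / 20)."""
    return 10.0 ** (db / 20.0)

def amplitude_to_db(amplitude):
    """Level in dB for a positive linear amplitude factor."""
    return 20.0 * math.log10(amplitude)

# Every 20 dB is a factor of ten in amplitude.
factor = db_to_amplitude(40.0)
```

Because the scale is logarithmic, a piecewise-linear curve in dB, such as those used for the amplitude of voicing, becomes a piecewise-exponential curve in linear amplitude.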
C     FORTRAN SYNTHESIS TESTER
C     WSYN1.FO
C     SYNTHESIZES "SEAT"
CQMqON ISWV,DT~TGcTP,T~,T2~TRpCITRtAVS~VE 
      DIMENSION IBUF(256),CF(10),BW(10),A0(10),A1(10),A2(10),
     1 YM1(10),YM2(10),TX(10)
C
C
C     OVERALL INITIALIZATION
C
C
C     SET OUTPUT SAMPLE RATE IN SAMPLES/SEC
      OSR=20000.0
C     SET OUTPUT BLOCK SIZE AND NO. OF BLOCKS
      NBSIZE=256
      NBLKS=51
C     SET WAVE SHAPE CONSTANTS FOR GLOTTAL WAVE GENERATION
      OPTR=0.40
      CTR=0.16
C     SET OVERALL TIME CLOCK AND DELTA T IN MSEC.
      T=0.0
      DT=1000.0/OSR
C     SET PARAMETER-RESETTING CLOCK FOR FRICATION
      TFRIC=0.0
C     SET PARAMETER-RESETTING PERIOD FOR FRICATION
      PFRIC=0.1
C     SET PARAMETER-RESETTING CLOCK FOR VOICING
      TVOC=0.0
C     SET PARAMETER-RESETTING PERIOD FOR VOICING
      PVOC=0.2
C     OPEN OUTPUT STORAGE FILE
C     OUTPUT BUFFER IS ARRAY IBUF
      NOUT=5
CALI,, OPENR(NOUT~"RAf'wRK1 'InOD''clcLENG,XflUFt%l@) 
      GO TO 20
C     ERROR HANDLING IF OUTPUT FILE CANNOT BE OPENED
   10 WRITE(1,15)
   15 FORMAT(" *** ERROR OPENING WORK FILE")
      GO TO 900
C     FILE OPENED O.K.; CHECK IF BIG ENOUGH
   20 IF (NBLKS-LENG) 35,35,25
C     ERROR HANDLING -- FILE NOT BIG ENOUGH
   25 WRITE(1,30)
   30 FORMAT(" *** WORK FILE NOT BIG ENOUGH")
      GO TO 900
C     OUTPUT WORK FILE READY TO GO
C     SET VOICING SWITCH OFF
   35 ISWV=0
C     INITIALIZE HIGHER ORDER FILTER VALUES
C     IN REVERSE ORDER BY CENTER FREQUENCY
      CF(4)=9500.0
      CF(5)=8500.0
      CF(6)=7500.0
      CF(7)=6500.0
      CF(8)=5500.0
      CF(9)=4500.0
Figure 20. Example Program Synthesizing "Seat".

C     IF FIRST ENTRY INTO "S" ROUTINE, SET CONSTANT PARAMETERS
      IF (ISW1) 105,105,115
  105 AV=0.0
CF (I)a40flBn0 
EP f?)*37fl0,B 
CF (3) "98flfl.@ 
8WCt)tlSfl@.fl 
BW (2) 93VIVI6,Pl 
8W (3) m49Pfl*8 
      DO 110 I=1,3
  110 CALL COEFF(CF(I),BW(I),A0(I),A1(I),A2(I),OSR,1)
      ISW1=1
C     IF TIME TO DO SO, RESET VARIABLE PARAMETERS
  115 IF (TFRIC) 120,120,125
  120 CONTINUE
C     IF T<5, NO NOISE
      IF (T-5.0) 122,124,124
  122 AN=0.0
      GO TO 125
  124 CALL TPOW(T,4,TX)
ANOl3~+3~!50*1~6375+TX C1l~OJ~02O32*TXC2l+B~~@0~I~~*T% (3) 
1 -0.~1~91~191~~2242wTXt4) 
      AN=10.0**(ANDB/20.0)
  125 CONTINUE
      GO TO 650
C
C     END OF "S"
C
C     "EA" VOWEL?
2041 tQ (Tm448m981 2flSr31Br316 
C
C     "EA"
C
C     IF FIRST TIME THRU, SET CONSTANT PARAMETERS
  205 IF (ISW2) 210,210,215
  210 AN=0.0
      BW(1)=54.0
      BW(2)=55.0
      BW(3)=170.0
      ISW2=1
C     IF TIME, RESET VARIABLE PARAMETERS
  215 IF (TVOC) 220,220,650
  220 CONTINUE
C     PITCH
CALL TPOW~TU~~~,~,~,~XI 
P~146,243*0,A25185d*TX~t)~@mQlQIP)1925P)78fT%[2) 
1 mG.OBR@G5R18408+TX (3) mBe~0@0@42d6ld68 1 *TX (4) 
C     AMPLITUDE OF VOICING
      IF (T-245.0) 410,410,420
  410 CALL TPOW(T-219.51,3,TX)
A~~A~43,493+B,fi19573*TX(11~0,QI2476343+TXCZ) 
1 *0,00&72638669+TX (3) 
      GO TO 510
  420 IF (T-290.0) 430,430,440
  430 AVDB=47.34963-0.1457143*(T-245.0)
      GO TO 510
440 fFCTm33Sm8) 458,450,468 
Figure 20. (continued)
45Q) AVDRs4Fl ,8rRP072* [7r2991,0¶ 
      GO TO 510
48a IF (Tm4VIgm0) 478,478r480 
470 AVPB~37,2964~*6nf143*CT~33B~0) 
      GO TO 510
  480 IF (T-430.0) 490,490,500
490 CALL TPOW(T~4@8,t9r2rT%) 
AVQB~39,86422a0,~7127*TXC~~*O,02243*~XC2) 
      GO TO 510
SQQ CALL t~owtt~nzs,n,2,~~) 
AVQ~~29,48846~lr3794i6*TX(~1*0,0285?102*TX(2~ 
  510 AV=10.0**(AVDB/20.0)
C     CF(1)
      IF (T-260.0) 225,230,226
  225 CF(1)=250.0+1.23*(T-219.51)
      GO TO 230
  226 IF (T-395.0) 227,228,228
  227 CF(1)=300.0
      GO TO 230
  228 CF(1)=300.0-4.0*(T-395.0)
  230 CONTINUE
C     CF(2)
      IF (T-320.0) 232,232,234
  232 CALL TPOW(T-219.51,2,TX)
CF (21 *1732.857+14.95833*TX Cl>bO1t'l7tI76rTX (2) 
      GO TO 250
  234 IF (T-395.0) 235,235,240
  235 CF(2)=2500.0
      GO TO 250
  240 CF(2)=2500.0-20.0*(T-395.0)
  250 CONTINUE
C     BW(2) IS CONSTANT
C     CF(3)
      IF (T-344.0) 255,255,260
  255 CALL TPOW(T-219.51,2,TX)
      CF(3)=2411.078+10.77844*TX(1)-0.04028484*TX(2)
      GO TO 275
  260 IF (T-395.0) 265,265,270
  265 CF(3)=3125.0
      GO TO 275
  270 CF(3)=3125.0-25.0*(T-395.0)
  275 CONTINUE
C     BW(3) IS CONSTANT
  300 CONTINUE
C     NO NEED TO RESET FILTERS IF 350<T<394
      IF (T-350.0) 305,305,302
  302 IF (T-394.0) 650,305,305
  305 DO 308 I=1,3
  308 CALL COEFF(CF(I),BW(I),A0(I),A1(I),A2(I),OSR,1)
      GO TO 650
C
C
C     SILENCE+RELEASE OF "T"
C
  310 IF (T-502.70) 315,320,320
  315 AN=0.0
      AV=0.0
Figure 20. (continued)
      GO TO 400
32 41 JF [TmCI17*77) 325,310,315 
C     IF FIRST TIME, SET CONSTANT FEATURES OF RELEASE
  325 IF (ISW3) 330,330,340
  330 ISW3=1
      CF(1)=1000.0
CP (21=ia~10.0 
      CF(3)=4300.0
      BW(1)=150.0
      BW(2)=150.0
BH [a) ~25G¶0,0 
      DO 335 I=1,3
  335 CALL COEFF(CF(I),BW(I),A0(I),A1(I),A2(I),OSR,1)
C     AN
  340 IF (TFRIC) 345,345,400
343 ZF (T~!3S2~811 344,3477347 
346 ANDB=ll I 
GO TO 395 
  347 IF (T-512.0) 350,355,355
350 ANDOt94 ,Ci 
GO TO 395 
  355 IF (T-538.0) 360,365,365
368 ANDBg89,S 
GO TO 395 
365 ANOB~0gn5rP!,40* (T~538~0) 
  395 AN=10.0**(ANDB/20.0)
  400 CONTINUE
C
C
C
C     FINAL STEP -- RESET TIMERS
C
  650 IF (TFRIC) 660,660,670
  660 TFRIC=PFRIC
  670 IF (TVOC) 680,680,690
  680 TVOC=PVOC
  690 CONTINUE
C*                                                       *
C*   END OF GPAR                                         *
C*                                                       *
C*********************************************************
C
C     ADD GLOTTAL WAVE
      YN=YN+GLOT(P,AV)
C     ADD NOISE
C     MULTIPLIED BY RELATIVE NOISE GAIN
VNPYN+GFRIC+AN*FLOA~ILRN~(X)~/~~~~,~ 
C     APPLY LOWER THREE (FORMANT) FILTERS
C     AND HIGHER ORDER CORRECTION FILTERS
      DO 700 I=1,10
  700 YN=RES(YN,YM1(I),YM2(I),A0(I),A1(I),A2(I))
C     SIMULATE RADIATION EFFECT
      YN=RAD(YN,YM1RAD,OSR)
C     MULTIPLY BY OVERALL GAIN FACTOR
      YN=YN*GAIN
C     CHECK FOR CLIPPING
      IF (ABS(YN)-2047.0) 730,730,710
Figure 20. (continued) 
C     ERROR -- SPEECH IS CLIPPED
  710 WRITE(1,720)
  720 FORMAT(" *** CLIPPED")
      GO TO 900
C     O.K. -- NOT CLIPPED
  730 CONTINUE
C     STORE SPEECH WAVE POINT
      IBUF(IPT)=IFIX(YN)
C     INCREMENT GENERAL TIME CLOCK
      T=T+DT
  750 CONTINUE
C
C     END OF BLOCK-SIZE LOOP
C
C     WRITE OUT BLOCK
CALL, WR1TRCNOUT,IBLK~IBUFIX760) 
      GO TO 800
C     I/O ERROR HANDLING
  760 WRITE(1,770) IBLK
  770 FORMAT(" *** I/O ERROR BLOCK NO. ",I5)
      GO TO 900
  800 CONTINUE
C
C     END OF LOOP ON NO. OF BLOCKS
C
C     CLOSE WORK FILE
      CALL CLOSR(NOUT)
C     END
  900 WRITE(1,910)
  910 FORMAT(" HIT SPACE BAR TO QUIT")
      PAUSE
      CALL EXIT
      END
C
C     WSYN1.FO
C
C     SYNTHESIZES "SEAT"
C     9/27/74
Figure 20. (continued) 

V. Final Remarks 
The simple schemes for speech synthesis presented here are
sufficient to make speech containing most of the sounds of English.
Figure 22, page 45, is a handy reference containing
characteristics of English vowels from several primary sources,
assembled in tabular form by Dunn as part of his section of
Automatic Speech Recognition (1963). Since this reference is
not easily available, the primary sources of Dunn's data are also
listed in the bibliography. The fundamental and three formant
frequencies for men, women, and children were taken from Peterson
and Barney (1952). The three formant amplitudes for each vowel
are also from the Peterson and Barney measurements, but averaged
over all three classes of speakers. Dunn took the relative
strengths of the first formants of the different vowels from the
measurements of Sacia and Beck (1926), as quoted by Fletcher (1953),
arbitrarily assigning the zero decibel level to the strongest
vowel, [ɔ]. The three formant bandwidths are the average of
three sets given by Fant (1962), one set his own measurements,
one those of House and Stevens, and the third measured by Dunn
himself (1961). Dunn cautions that the bandwidths vary widely
among individual speakers.

These values of center frequency and bandwidth of formants
for the different vowels can be used as variable formant filter
control parameters in Model T with unimpeachable results.
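As a rough check on how such values translate into a digital filter, the Python sketch below converts one center frequency and bandwidth into resonator coefficients and locates the peak of the resulting frequency response. The coefficient formulas are a common two-pole form assumed here for illustration, and are not necessarily those of the COEFF subroutine; the 270 Hz first formant of [i] for men is from Figure 22, the 54 Hz bandwidth is the value set in the sample program, and the 20000 samples/sec rate matches the higher order filter layout of that program.

```python
import cmath
import math

def resonator_coeffs(cf, bw, rate):
    """Two-pole resonator coefficients (assumed form, unity gain at 0 Hz)."""
    t = 1.0 / rate
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * cf * t)
    return 1.0 - b - c, b, c

def gain_at(freq, coeffs, rate):
    """Magnitude of H(z) = a / (1 - b/z - c/z**2) at the given frequency."""
    a, b, c = coeffs
    z1 = cmath.exp(-2j * math.pi * freq / rate)   # 1/z on the unit circle
    return abs(a / (1.0 - b * z1 - c * z1 * z1))

def peak_frequency(coeffs, rate, step=1.0):
    """Frequency (Hz) of maximum gain, found by a simple scan below rate/2."""
    n = int(rate / 2.0 / step)
    return max((k * step for k in range(1, n)),
               key=lambda f: gain_at(f, coeffs, rate))

coeffs = resonator_coeffs(270.0, 54.0, 20000.0)
peak = peak_frequency(coeffs, 20000.0)
```

For a narrow bandwidth the peak falls within a few Hz of the nominal center frequency, so table values can be used directly as filter control parameters.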
Vowel         i     ɪ     ɛ     æ     ɑ     ɔ     u     ʊ     ʌ     ɝ
As in       heed   hid  head   had   hod hawed who'd  hood   hud heard

Fundamental Frequencies (cycles per second)
      M      136   135   130   127   124   129   141   137   130   133
      W      235   232   223   210   212   216   231   232   221   218
      Ch.    272   269   260   251   256   263   274   276   261   261

Formant Frequencies (cycles per second)
F1    M      270   390   530   660   730   570   300   440   640   490
      W      310   430   610   860   850   590   370   470   760   500
      Ch.    370   530   690  1010  1030   680   430   560   850   560
F2    M     2290  1990  1840  1720  1090   840   870  1020  1190  1350
      W     2790  2480  2330  2050  1220   920   950  1160  1400  1640
      Ch.   3200  2730  2610  2320  1370  1060  1170  1410  1590  1820

Formant Amplitudes (db)
L1            -4    -3    -2    -1    -1     0    -3    -1    -1    -5
L2           -24   -23   -17   -12    -5    -7   -19   -12   -10   -15
L3           -28   -27   -24   -22   -28   -34   -43   -34   -27   -20

Formant Bandwidths (cycles per second)
B1            59    53    18    63    51    $3    50    40    52    34
B2            55    69    69    81    57    $7    $9    74    57    58
B3           170   113   101   126    93    68    77    62    89    6q

Figure 22. Average Measured Characteristics of Vowels.
From Dunn's section (p. D-2) of Automatic Speech Recognition
(1963).

Bibliography 

Automatic Speech Recognition, material for an intensive
course for Engineers and Scientists, The University of
Michigan Summer Conferences, Ann Arbor, 1963 (2 vols.)

Dunn, H.K., "Methods of Measuring Vowel Formant Bandwidths",
J. Acoust. Soc. Am. 33 (12), 1737-1746 (1961)

Dunn, H.K., Flanagan, J.L., and Gestrin, P.J., "Complex
Zeros of a Triangular Approximation to the Glottal Wave",
J. Acoust. Soc. Am. 34, 1977 (A) (1962); reprinted in
Automatic Speech Recognition.

Fant, C.G.M., "On the Predictability of Formant Levels and
Spectrum Envelopes from Formant Frequencies", in For Roman
Jakobson (The Hague, Mouton, 1956), pp. 109-120.

Fant, C.G.M., "Speech Analysis and Synthesis", in Air Force
Cambridge Research Laboratories Technical Report #62-790,
pp. 32-34, January 31, 1962.

Fant, C.G.M., and Martony, J., "Instrumentation for Parametric
Synthesis (OVE II)", STL-QPSR 2, 18-24 (1962).

Flanagan, J.L., "Note on the Design of 'Terminal-Analog'
Speech Synthesizers", J. Acoust. Soc. Am. 29 (2), 306-310 (1957)

Flanagan, J.L., "Some Properties of the Glottal Sound Source",
J. Speech & Hearing Res. 1, 1-18 (1958)

Flanagan, J.L., and Rabiner, L.R. (eds.), Speech Synthesis,
Dowden, Hutchinson & Ross Inc., Stroudsburg, PA (1973)

Fletcher, H., Speech and Hearing in Communication,
D. Van Nostrand, Princeton, N.J. (1953)

Heinz, J.M., and Stevens, K.N., "On the Properties of
Voiceless Fricative Consonants", J. Acoust. Soc. Am. 33
(5), 589-596 (1961)

Lovell, J.D., Nagel, D.C., and Carterette, E.C., "Digital
Filtering and Signal Processing", Behav. Res. Meth. & Instru.

Perry, J.L., Schafer, R.W., and Rabiner, L.R., "A Digital
Hardware Realization of a Random Number Generator", IEEE
Trans. Aud. Elect. AU-20 (4), 236-240 (Oct. 1972)

Peterson, G.E., and Barney, H.L., "Control Methods Used
in a Study of the Vowels", J. Acoust. Soc. Am. 24 (2),
175-184 (1952).

Rabiner, L.R., "Digital-Formant Synthesizer for Speech-
Synthesis Studies", J. Acoust. Soc. Am. 43, 822-828 (1968)

Rader, C.M., Rabiner, L.R., and Schafer, R.W., "A Fast Method
of Generating Digital Random Numbers", Bell Sys. Tech. J. 49,
2303-2310 (Nov. 1970).

Rosenberg, A.E., "Effect of Glottal Pulse Shape on the Quality
of Natural Vowels", J. Acoust. Soc. Am. 49 (2), 583-590 (1971).

Sacia, C.F., and Beck, C.J., "The Power of Fundamental Speech
Sounds", Bell Sys. Tech. J. 5, 393-403 (1926).

Sekimoto, S., "A Computer-Controlled Speech Synthesizer", Ann.
Bull. No. 7, Res. Inst. Logo. & Phon., U. of Tokyo, 39-44 (1973)
