TEJ K. BHATIA
A COMPUTATIONAL INVESTIGATION ON THE 
PERCEPTION AND ACQUISITION OF ASPIRATION 
0. Introduction. 1
The phenomenon of aspiration in Hindi has intrigued phoneticians 
and phonologists for some time. However, so far no adequate
investigation of this phenomenon has been made.
The earlier acoustical (i.e., perceptual) studies which have been
performed on the phonetic aspects of Hindi can be grouped into two
types: 1) acoustically-oriented, and 2) linguistically-oriented studies. In the
area of acoustic studies, the most significant work was done by J. GUPTA,
S. AGRAWAL and R. AHMAD (1969) and R. AHMAD and S. AGRAWAL
(1969). Their experiments revealed the significant features
in the perception of Hindi consonants in normal as well as in clipped
speech. For example, they pointed out that the average effect of
clipping on features follows the order: 1) place; 2) nasality; 3) flapped
liquids; 4) liquids; 5) continuants; 6) voicing; 7) friction; 8) aspiration;
9) affrication, i.e., the place of articulation is most important in the
intelligibility of any sound, and affrication is least important. The
higher the rank, the higher the intelligibility. Earlier, W. J. BLACK
and S. SINGH (1966), in their experiment with four language groups,
namely English, Hindi, Arabic and Japanese, have also pointed out 
significant features. However, their rank order is as follows: 1) na- 
sality; 2) place; 3) voicing; 4) friction; 5) liquid; 6) duration; 7) aspi- 
ration. The focus of the above experiments was not aspiration. The 
same is true about the linguistically oriented work done by D. P. GANDHI
and S. JAGGI (1971). The present study of Hindi consonants is
significant for the following reasons: first, it examines the predictive role of
1 My thanks are due to the following for their comments on this paper: Chin-chuan 
Cheng and Chin W. Kim. They are, however, not responsible for any mistakes in this 
project. 
phonetic science in the light of the recent theory of aspiration pro- 
pounded by C. W. KIM (1970). Second, it questions the absolute pre- 
dictive power of contrastive analysis. Third, it investigates the acqui- 
sition and the development of perceptual cues in a certain amount of 
time by proper language training. Fourth, it presents an account of 
"perceptual interference" and also establishes " the hierarchy of dif- 
ficulties" (or probable error) on the part of English speakers. Thus 
this investigation has pedagogical merit, too. 
1.0. Methodology. 
A context-free set of minimal pairs covering 22 consonants in
initial, medial and final position was collected. The minimal pairs are
of two types: 1) unvoiced unaspirated vs. unvoiced aspirated; 2) voiced 
unaspirated vs. voiced aspirated. Minimal pairs across the two types 
were also collected. The total number of items in the data is 62, with 
the following syllabic structures: CVCVC (18), CVCV (3), CVC (37), 
VC (2), VCC (1), VCV (1). Both meaningful and non-sensical, but 
phonologically possible, pairs of words were included in the data. 
The randomized data was presented to three native speakers 2 of 
Hindi for recording. The recording of this data was made in the
University of Illinois Phonetic Laboratory at the speed of 3 3/4 ips on an AMPEX
Model AG 440 tape recorder.
In order to include all 62 items but to maintain the random nature 
of the data, the recording of each speaker (S1, S2 and S3) was cut at
two uniform points. Thus, the recording of each speaker was divided 
into three parts (X, Y and Z) and was joined together as shown in the 
diagram. 
[Diagram: Speaker / Tape]
2 I had three informants: two males (myself and Mr. Anil Arora) and one female,
Mrs. Vimala Mohan. They are from Delhi, Pant Nagar (U.P.), and Lucknow (U.P.) 
respectively. My thanks are due to them. 
This tape (which I shall call T1) included three readings and each 
reading contained the voice of three informants. 
The final version of the perception test tape (T2) was prepared by
copying T1 and by inserting the necessary instructions. In T2 sufficient
space was inserted between each item so as to allow subjects enough 
time to mark their responses. 
The test matrix 3 of 62 × 4 was constructed by presenting the
minimal pairs of every correct item. For example, if the correct
recorded item is /kor/, the test matrix was prepared in the following
way: /khor/ /kor/ /gor/ /ghor/.
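The construction of one such row can be sketched as follows. This is only an illustration under stated assumptions: the function name `matrix_row`, the `_` slot notation and the fixed column order (copied from the /kor/ example) are introduced here and are not taken from the original procedure. The two-way pair R/RH is left out of the sketch.

```python
# Hypothetical sketch: one row of the four-way test matrix.
# Column order (unvoiced aspirated, unvoiced unaspirated, voiced unaspirated,
# voiced aspirated) is copied from the /kor/ example; that the same order was
# used for every item is an assumption.
STOP_SERIES = [
    ("kh", "k", "g", "gh"),   # velar
    ("ch", "c", "j", "jh"),   # palatal
    ("TH", "T", "D", "DH"),   # retroflex
    ("th", "t", "d", "dh"),   # dental
    ("ph", "p", "b", "bh"),   # bilabial
]


def matrix_row(consonant, frame):
    """Return the four alternatives for `consonant` placed into `frame`.

    `frame` marks the consonant slot with '_', e.g. '_or' for /kor/.
    """
    for series in STOP_SERIES:
        if consonant in series:
            return ["/" + frame.replace("_", member, 1) + "/" for member in series]
    raise ValueError(f"no four-way series for {consonant!r}")


print(matrix_row("k", "_or"))  # ['/khor/', '/kor/', '/gor/', '/ghor/']
```

The same call with a medial or final frame (for example `'a_'`) gives the row for the other positions.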
The perception test was relayed in the sound-isolated phonetic lab- 
oratory of the University of Wisconsin, Madison. The test-tape was 
played from the teacher's booth and 25 English-speaking subjects 4 
heard it in their respective booths. 
The IBM 360-25 was used to perform a quantitative analysis of more 
than 18,000 items. The test matrix was assigned codes. The integers rep- 
resented the vertical position of the item. On the horizontal scale 
A, B, C, D represented 1st, 2nd, 3rd and 4th position, respectively. 
On the data cards, all the responses were punched according to 
the following input format: 1) one or two integers represented the 
vertical position of the item; 2) A/B/C/D represented the horizontal
position and 3) was followed by ','; 4) representing the end of the reading.
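A minimal sketch of how responses coded in this way could be read back is given below. It assumes that each response is an item number immediately followed by one of the letters A to D and that responses are separated by commas. The end-of-reading symbol is not handled, and the function name `parse_reading` is hypothetical.

```python
import re

# One response: one or two digits (vertical position) plus A-D (horizontal position).
TOKEN = re.compile(r"^(\d{1,2})([ABCD])$")


def parse_reading(card_text):
    """Turn e.g. '1B,2A,10D' into [(1, 2), (2, 1), (10, 4)].

    Assumption: responses within one reading are comma-separated; the
    end-of-reading marker is expected to have been stripped already.
    """
    responses = []
    for token in card_text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"malformed response: {token!r}")
        row = int(match.group(1))
        column = "ABCD".index(match.group(2)) + 1  # A -> 1 ... D -> 4
        responses.append((row, column))
    return responses


print(parse_reading("1B,2A,10D,"))  # [(1, 2), (2, 1), (10, 4)]
```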
2.0. Results. 
Tables I, II and III represent the distribution of the responses made
by the subjects in the initial, middle, and final positions respectively.
The consonants given along the horizontal axis represent the sound 
which was perceived by the subjects and the consonants along the ver- 
tical axis indicate the consonants which were spoken by informants. 
For example, in Table I, the second line indicates that kh was spoken 
in the initial position. Out of 75 occurrences of kh, 19 times it was
3 I am thankful to Mrs. Y. Kachru for the various suggestions in selecting data and
for helping me design the test matrix. 
4 My subjects were 25 English speakers who were from various universities of the 
United States. In the summer of 1971 they came to Wisconsin to attend Summer School. 
All of them were going to leave for India to stay there for a year after the completion 
of intensive language training. They were well-motivated and the perception test was 
presented on the last day of language training. 
perceived as k; 55 times correctly as kh; and zero times as g and gh.
Once there was no response. Thus, out of 75 occurrences of kh, it was
correctly identified 55 times and confused 20 times (19 + 1, counting the missing response).
The diagonal represents the correct responses given by the subjects 
while readings on the left or right of it denote errors. In the tables, 
NR stands for "No response", and TC stands for "Total Confusion",
which is the sum of all the readings which appear on the left or right
of the diagonal plus NR.
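The bookkeeping for one line of such a table can be sketched as follows, using the kh figures quoted above for the initial position. The function name `summarize_row` and the dictionary layout are assumptions made for illustration only.

```python
def summarize_row(spoken, counts):
    """Return (total, correct, total_confusion) for one spoken consonant.

    `counts` maps each perceived consonant, plus 'NR' for no response,
    to the number of times it was marked.
    """
    total = sum(counts.values())
    correct = counts.get(spoken, 0)          # the diagonal cell
    total_confusion = total - correct        # off-diagonal cells plus NR
    return total, correct, total_confusion


# Line for kh in initial position, as described in the text.
kh_row = {"k": 19, "kh": 55, "g": 0, "gh": 0, "NR": 1}
print(summarize_row("kh", kh_row))  # (75, 55, 20)
```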
The results presented in these Tables (I, II and III) are summarized
below: 
(1) In all the positions, unvoiced unaspirated consonants, such 
as k, c, t, T and p are mistaken more than unvoiced aspirated consonants. 
In medial position the only exception is TH. TH is more confused 
than T. 
(2) In initial and medial positions, voiced aspirated consonants 
are more confused than voiced unaspirated consonants. The exceptions 
are DH and gh in initial position. 
(3) In contrast to initial and medial position, the confusion in 
voiced unaspirated consonants is more than in voiced aspirated con- 
sonants in final position. The only exception is g. 
(4) The rate of confusion in the palatal series is much higher 
than the rate of confusion which took place in other series. 
Thus, the above results indicate that subjects reacted differently in 
final position and in initial and medial position in the case of voiced 
aspirated consonants. 
Table IV points out the first and second probable errors and
presents a clear picture of mistakes made by the subjects. The probable
error is drawn from the readings of Tables I, II, and III. First probable
error refers to the most frequent mistake while the second probable 
error to the next most frequent mistake. For example if g is 5 times 
mistaken for k and 3 times mistaken for kh, then the first probable 
error for g will be k and the second probable error will be kh. 
In some cases there is a probability of three errors but the third 
one is the least confused; that is why it is omitted in Table IV. The
most important error is the first probable error. An error which is
responsible for 33 % or more of the confusion is marked as a significant
error and is indicated by a line under it; if 5 % or less of the confusion
is caused by an error, that error is considered to be insignificant and
is indicated by a star.
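The selection of probable errors can be sketched as below. The sketch assumes that the percentages are taken relative to the total confusion for the spoken consonant (wrong responses plus NR); the text does not state the base explicitly. The function name `probable_errors` is hypothetical, and ties are simply kept in input order.

```python
def probable_errors(spoken, counts, top=2):
    """Rank the wrong responses for one spoken consonant.

    Returns up to `top` tuples (perceived, share_in_percent, label), where
    label is 'significant' (33 % or more of the confusion), 'insignificant'
    (5 % or less) or '' otherwise.
    """
    # Assumption: every response other than the correct one, NR included,
    # counts towards the total confusion used as the percentage base.
    total_confusion = sum(n for c, n in counts.items() if c != spoken)
    wrong = [(c, n) for c, n in counts.items()
             if c not in (spoken, "NR") and n > 0]
    wrong.sort(key=lambda item: item[1], reverse=True)  # most frequent first
    ranked = []
    for perceived, n in wrong[:top]:
        share = 100.0 * n / total_confusion
        if share >= 33:
            label = "significant"
        elif share <= 5:
            label = "insignificant"
        else:
            label = ""
        ranked.append((perceived, share, label))
    return ranked


# The example from the text: g mistaken 5 times for k and 3 times for kh.
errors = probable_errors("g", {"k": 5, "kh": 3})
print([perceived for perceived, _, _ in errors])  # ['k', 'kh']
```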
TABLE IV. Probable error matrix (for Tables I, II and III) for initial, middle and final consonants.
Consonants   First Probable Error                        Second Probable Error
             Initial   Middle    Final     Combined      Initial   Middle    Final
             position  position  position  Error         position  position  position
k kh kh kh kh g~h* 
kh k k k k 
g gh gh gh gh k* k* 
gh g kh g g kh* g 
c ch ch ch ch j j* 
ch c c c c 
j jh jh jh jh c* 
jh ch ch j ch/j j j 
T D TH TH TH/D TH D* 
TH T T T T D 
D DH DH DH T* 
DH D D D T/TH* T* 
t th th th th dh* dh* 
th t t t t dh* 
d dh dh 
dh d d d d t/th th* 
p ph b ph ph bh* 
ph p p p* p 
b bh bh bh bh p ph* 
bh b b ph b~h p~h* ph* 
R RH RH RH 
RH R R R 
gh* 
g* 
kh* 
e* 
C* 
DH* 
DH* 
d 
t* 
t#h* 
b 
b 
* Represents insignificant error (CONFUSION is 5 % or less)
Underlined consonants are significant errors (CONFUSION is 33 % or more) 
The probable error in initial, medial and final position is determined 
from Tables I, II and III respectively. Then, on the basis of the significance
and frequency of the error in all three positions, a combined
error is determined. The two other results which can be drawn from 
Table IV are given below: 
(5) First probable error indicates that the confusion occurred 
most frequently between the consonant classes which can be distinguished 
by a single feature, i.e., either by aspiration or by voicing. 
The other indirect result which can be arrived at is that there is 
not a single example in the first probable error which indicates that 
the confusion took place between consonant classes which can be distin- 
guished by two features, i.e. voicing and aspiration. Second probable 
error record shows that such type of confusion did take place but it 
was insignificant. 
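The distinction between one-feature and two-feature confusions can be made explicit with a small sketch. It encodes only voicing and aspiration, so it is meaningful only for consonants of the same place series; the table `FEATURES` and the function name `feature_distance` are assumptions introduced for illustration.

```python
# Each series: unvoiced unaspirated, unvoiced aspirated,
#              voiced unaspirated,   voiced aspirated.
STOP_SERIES = [
    ("k", "kh", "g", "gh"),
    ("c", "ch", "j", "jh"),
    ("T", "TH", "D", "DH"),
    ("t", "th", "d", "dh"),
    ("p", "ph", "b", "bh"),
]

# Map every consonant to the pair (voiced, aspirated).
FEATURES = {}
for plain, aspirated, voiced, voiced_aspirated in STOP_SERIES:
    FEATURES[plain] = (False, False)
    FEATURES[aspirated] = (False, True)
    FEATURES[voiced] = (True, False)
    FEATURES[voiced_aspirated] = (True, True)


def feature_distance(spoken, perceived):
    """Number of features (voicing, aspiration) in which two consonants differ."""
    return sum(a != b for a, b in zip(FEATURES[spoken], FEATURES[perceived]))


print(feature_distance("k", "kh"))  # 1: aspiration only
print(feature_distance("g", "k"))   # 1: voicing only
print(feature_distance("k", "gh"))  # 2: voicing and aspiration
```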
TABLE V. Rank Order of the Perceptually Confused Consonants.
                          Initial   Middle    Final     Combined
One Feature               Position  Position  Position  Rank Order
[- aspirate]              1         1         1         1
[+ aspirate]              2         2         2         2
[+ voiced]                3         3         3         3
[- voiced]                4         4         4         4
Two Features
[+ voiced, + aspirate]    1         1         2         1
[- voiced, - aspirate]    2         2         1         2
[+ voiced, - aspirate]    4         3         2         3
[- voiced, + aspirate]    2         4         4         4
Table V presents the rank ordering of features. The rank ordering
has been expressed in terms of one feature as well as in two features.
The rank ordering of the consonants is determined by adding the total
number of confusions which took place in the perception of those
consonants. First, the ranks have been established according to initial,
medial and final position, i.e. information transmitted by Tables I, II,
and III, respectively. For example, if any consonant is confused the least
then rank 4 is assigned. On the other hand, if any consonant is confused 
the most in any position, it is assigned rank 1. Second, by summing 
up the ranks in all the positions the combined rank is determined. If 
the sum of all the three positions is least, rank 1 is assigned and if it 
is highest, rank 4 is allotted. The rank of 1 indicates the highest number 
of confusions and the rank of 4, the least number of confusions. 
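The second step, summing the positional ranks and ranking the sums, can be sketched as follows. The rank values are copied as printed in the two-feature part of the rank-order table; the row labels are an interpretation of the printed table, and the function name `combined_ranks` is hypothetical. Equal sums are not treated specially here and simply keep their input order.

```python
def combined_ranks(ranks_by_position):
    """Sum the positional ranks per label and rank the sums (smallest sum -> rank 1)."""
    sums = {label: sum(ranks) for label, ranks in ranks_by_position.items()}
    ordered = sorted(sums, key=sums.get)  # stable: ties keep input order
    return {label: place for place, label in enumerate(ordered, start=1)}


# (initial, middle, final) ranks for the two-feature confusions, as printed.
two_feature = {
    "[+voiced, +aspirate]": (1, 1, 2),
    "[-voiced, -aspirate]": (2, 2, 1),
    "[+voiced, -aspirate]": (4, 3, 2),
    "[-voiced, +aspirate]": (2, 4, 4),
}

for label, rank in combined_ranks(two_feature).items():
    print(label, rank)
```

With these values the sums are 4, 5, 9 and 10, which reproduces the combined ranks 1 to 4 given in the table.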
The labels in Table V are explained below:
(a) [- Aspirate] indicates that consonants such as k and g are
mistaken for kh and gh respectively.
(b) [+ Aspirate] presents the opposite case of (a).
(c) [+ Voiced] indicates that voiced consonants such as g and gh
were confused for unvoiced consonants k and kh respectively.
(d) [- Voiced] shows that confusion was caused as a result of the
addition of voicing, i.e. unvoiced consonants such as k and kh
were mistaken for voiced consonants g and gh respectively.
Rank-ordering in terms of two features is presented below:
(a) [+ voiced, + aspirated] refers to the reverse case of (b).
(b) [- voiced, - aspirated] means unvoiced unaspirated consonants are
mistaken for voiced aspirated, i.e. consonants like k are mistaken for gh.
(c) [+ voiced, - aspirated] expresses that consonants such as g and j are
mistaken for kh and ch respectively.
(d) [- voiced, + aspirated] shows that unvoiced aspirated consonants were
mistaken for voiced unaspirated consonants, such as the confusion of ch for j.
The results which can be drawn from Table V are given below.
(6) The confusion of unaspirated consonants is the highest of all
in all the positions.
Consequently, [- aspirated] has the highest rank, number one, while
[+ aspirated] has a lower rank. The confusion which took place in
terms of the two features is insignificant except for the one which
has rank one.
TABLE VI. The two types of interaction are shown below (on the basis of First Probable Error):
1. between unaspirated and aspirated consonants.
2. between voiced aspirated and unvoiced aspirated consonants, and between unvoiced
unaspirated and voiced unaspirated consonants.
[Diagram 1. Example: confusion of /k/ and /kh/ (diagonal indicates confusion of /g/ for /gh/
etc. and vice versa). Classes: unvoiced unaspirated (k, c, t, T, p); unvoiced aspirated
(kh, ch, TH, th, ph); voiced unaspirated (g, j, D, d, b); voiced aspirated (gh, jh, DH, dh, bh).]
[Diagram 2. Example: confusion of /T/ for /D/ (/bh/ was mistaken for /ph/). Classes: unvoiced
unaspirated (k, c, T, t, p); unvoiced aspirated (kh, ch, TH, th, ph); voiced unaspirated
(g, j, D, d, b, R); voiced aspirated (gh, jh, DH, dh, bh, RH).]
1. The straight lines indicate the correct recognition of consonants.
2. Diagonals show confusion of sounds.
3.0. Discussion. 
A contrastive analysis of a fragment of Hindi and English sounds 
will predict the following bilingual interference: 
(1) In English, only unvoiced aspirated consonants occur in in- 
itial position so it is likely that an English speaker will replace unvoiced 
unaspirated consonants by unvoiced aspirated ones. As a result of this,
the perceptual confusion of unvoiced unaspirated consonants will be 
more. 
My results mostly agree with the above statement. 
(2) In medial and final position unvoiced aspirated consonants 
do not occur in English. Therefore, such consonants are likely to be 
replaced by unvoiced unaspirated consonants unless these syllables are 
stressed. 
My results partially agree with this prediction. In medial position 
unvoiced unaspirated consonants are preceded by su or ku CV-type 
prefix. The stress is carried by the second syllable; that is why unvoiced 
unaspirated consonants are mistaken more in medial position. 
(3) The voiced aspirated consonants will be mistaken more than 
voiced unaspirated in all the positions because they are not present in 
English. 
My results indicate that the conclusion of contrastive analysis is 
relevant. The subjects confused voiced aspirated consonants more than 
voiced unaspirated in initial and in medial position. But in the final 
position the situation changes completely. 
In a recent study an attempt has been made to explain aspiration 
in terms of "voicing lag" (see L. LISKER and A. ABRAMSON, 1964;
C. W. KIM, 1970). 5 Aspiration is explained in terms of two reference 
points, i.e. (a) release of closure of a stop; and (b) the onset of voicing. 
Since in final position one reference point, i.e. the onset of voicing, is
lost, the theory implies that aspiration will be neutralized in word
final position. In other words, aspirated sounds will be pronounced as 
unaspirated sounds in final position, and as a result, aspirated sounds 
will be perceived as unaspirated sounds in the word final position. 
In final position my results indicate that aspirated consonants are 
recognized more than unaspirated ones. On the contrary, unaspirated 
consonants are mistaken more frequently. 
5 Kim's explanation of aspiration differs from Lisker and Abramson's in terms of the
underlying control mechanism. Kim agrees that aspiration is laryngeally controlled. But
what is controlled by the laryngeal muscles in the case of aspiration is not the timing of 
glottal closing (Lisker and Abramson's view) but the size of the glottal opening. 
Manjari and John Ohala refute Chomsky and Halle's claim that heightened sub-glottal
air pressure is a necessary characteristic of all aspirated consonants. According to them, 
during h and upon the release of the aspirated stops there occurs a moment when there 
is no oral constriction and when the glottal resistance is markedly lower than that of 
normal voicing. Given such lowered resistance to the lung air, the air naturally rushes out 
in great volume, and consequently the air pressure just below the glottis is momentarily 
lowered. 
My results get further support from another experiment which 
I performed with native speakers of Hindi. The results of that experi- 
ment showed the same directions. 
The analysis of my results in final position raises two questions: 
1) Why are aspirated (voiced and unvoiced) stops recognized more 
than unaspirated stops by the English speakers, while these sounds 
don't exist in English in final position? 2) Why are unaspirated conso- 
nants confused more although such sounds are present in English? 
The answer to the first question is that in the pronunciation of aspi- 
rated consonants of Hindi a sort of strong final release is present which 
helps English speakers to perceive aspirated consonants more accurately 
in final position. 
As for question 2, two possibilities can be presented as an answer. 1) 
The unaspirated consonants in word-final position are released, and the
release causes the English speakers to interpret them as aspirated. 2) The 
nature of pronunciation (of native speakers) can be responsible for the 
perceptual confusion of those consonants which are common to both 
Hindi and English. W. J. BLACK and S. SINGH'S (1966) experiment 
shows that when a set of data which included the identical sounds of 
languages was presented by native speakers to other native speakers 
and to non-native speakers, the confusion in the latter case was relative- 
ly high. It seems that the nature of pronunciation is responsible for 
the perceptual confusion of identical sounds. 
In my results, I noticed certain exceptions. Interestingly enough, I 
found similar exceptions in experiments with native speakers. This 
shows that these exceptions seem to be related with some underlying 
phenomenon which is operating not only in the case of native speakers 
of English but also in the case of native speakers of Hindi. 
Below, I will discuss the exceptions and will propose some explana- 
tions. 
(1) In the case of unvoiced unaspirated consonants the only 
exception was present in the retroflex consonant in medial position, 
i.e. T is less confused in medial position. However, it is negligible. 
(2) The exceptions, in the case of voiced aspirated consonants, 
occur in the retroflex and velar consonants. In initial and medial po- 
sitions, DH is less confused and gh in the initial position is less mistaken 
too, while other voiced aspirated consonants are more confused in these 
two positions. 
Now two questions arise: 1) Is this distinction parallel to the distinc- 
tion which the native speakers of Hindi maintain? 2) Can they maintain 
this distinction because of a relative strength of aspiration present in 
such unaspirated and aspirated consonants? 
It seems that English speakers maintain the latter type of distinction. 
The retroflexes are considered to be [+ tense], and velars, because of
their [+ back]ness, inherit some aspiration.
There is another exception in final position. In final position all 
voiced aspirates are recognized more than voiced unaspirates. But gh 
is an exception. 
Velar voiced aspirated consonant gh should not be mistaken more 
than its unaspirated counter-part, because of the following reasons: 
First, it carries a final release since it is an aspirated consonant. Second, 
it has a relatively higher degree of aspiration than dental, bilabial and 
palatal consonants. At this point it appears to me that either voiced 
aspirated consonant gh behaves like unaspirated in final position and 
loses its final release as well as higher degree of aspiration simultaneously 
(further research with acoustic instruments is needed to support this); 
or this exception points towards a psychological process of " over- 
compensation " which is going on in the subject's mind, i.e. English 
speakers, like the native speakers of Hindi, realize that aspiration is 
the most characteristic phonological feature of Hindi. That is why they 
sometimes substitute aspirated sounds for unaspirated and, as a result, 
we may get exceptions in cases such as gh. The shortcomings of this 
proposal can be easily noticed since the question arises why the phenom- 
enon of " overcompensation " fails to operate upon other segments. 
Similarly, the first hypothesis can be questioned on the ground that 
if all other unvoiced aspirated consonants as well as voiced aspirated 
consonants maintain their own identity (i.e. 1) final release; 2) final- 
release and relatively high degree of aspiration, respectively) then why 
does only gh lose it in the final position? Instrument measurements are 
needed to answer this question. 
4.0. Comparison of this Investigation with Gandhi's and Jaggi's Research. 
Gandhi's and Jaggi's investigation of Hindi consonants also shows 
two results with regard to aspiration. First, in all the positions aspirates 
are mistaken more than unaspirates by English speakers. Second, 
unaspirates are substituted for aspirated sounds. 
My results show disagreement with their results in the final position 
only, since my results show that the intelligibility of aspirated consonants 
is more than unaspirated, with the exception of gh in final position. 
My results completely agree with their second finding. The
disagreement in the final position may have several causes:
first, in their study aspiration is not the focus; thus, their results
have been determined on the basis of a very restricted amount of data. 
Second, from their experiment it is not clear which kind of data was 
used to perform such an experiment. Third, such disagreement can hap- 
pen because of their inaccurate recording and listening conditions. Lastly, 
it may depend on the language training of the subject. 
It was not mentioned in their study whether the second syllable 
was stressed in the middle position or not. In such a situation, it is hard 
to conclude whether my results agree or disagree with their findings. 
5.0. Summary. 
The following conclusions can be drawn from the above discussion. 
First, unvoiced unaspirated consonants are more confused than unvoi- 
ced aspirated consonants in all positions. Second, voiced aspirated con- 
sonants behave differently: 1) in initial and medial position; and 2) in 
final position. In initial and medial position they are mistaken more 
while they are better recognized in final position. Third, the confusion 
occurred primarily between the consonant classes which can be distin- 
guished by a single feature, i.e., either by aspiration or by voicing. 
Fourth, unaspirated segments were more frequently confused than 
aspirated ones. [- voicing, + aspiration] has the lowest rank, i.e. the least
confusion took place in the perception of these segments. Fifth, exceptions
are present only in the retroflex or velar series. In such consonants 
(voiced) the degree of aspiration present is relatively high. It seems that 
voiced unaspirated consonants of velar and retroflex series possess almost
the same amount of aspiration as is present in palatal, dental, and
bilabial aspirated (voiced) consonants, and that aspirated consonants of
velar and retroflex series preserve a higher degree of aspiration than the
aspirates of the palatal, dental or bilabial series. This is a highly tentative 
conclusion since it lacks empirical support. Sixth, after undergoing
a semester of intensive Hindi instruction, motivated students can develop
perceptual cues for aspiration. They can hear aspiration in more
than 50 % of the cases. Lastly, the rate of confusion in the palatal series
is much higher than the rate of confusion which takes place in other 
series. 

REFERENCES 

R. AHMAD, S. AGRAWAL, Significant
Features in the Perception of [Hindi]
Consonants, in «J. Acoust. Soc. Amer.»,
XLV (1969) 3, pp. 758-63.

W. J. BLACK, S. SINGH, Study of Twenty-
six Intervocalic Consonants as Spoken
and Recognized by Four Language
Groups, in «J. Acoust. Soc. Amer.»,
XXXIX (1966), pp. 372-87.

D. P. GANDHI, S. JAGGI, Perceptual
Interference and Hierarchy of Difficulties,
in «Indian Linguistics», XXXII (1971).

J. GUPTA, S. AGRAWAL, R. AHMAD,
Perception of [Hindi] Consonants in Clipped
Speech, in «J. Acoust. Soc. Amer.»,
XLV (1969) 3, pp. 770-73.

D. JONES, An Outline of English Phonetics, 
New York, 1956. 

C. W. KIM, A Theory of Aspiration, in
«Phonetica», XXI (1970), pp. 107-116.

L. LISKER, A. ABRAMSON, A Cross-Language
Study of Voicing in Initial Stops:
Acoustical Measurements, in «Word», XX
(1964), pp. 384-422.

P. E. NICELY, A. MILLER, An Analysis of
Perceptual Confusion Among Some
English Consonants, in «J. Acoust. Soc.
Amer.», XXVII (1955), pp. 338-52.

M. OHALA, J. OHALA, The Problem of
Aspiration in Hindi Phonetics, in
Annual Bulletin N. 6, Research Institute
of Logopedics and Phoniatrics,
University of Tokyo, 1972.

R. N. SRIVASTAV, Theory of
Morphonematics and Aspirated Phonemes of Hindi,
in Studies in Hindi Linguistics, AIIS,
New Delhi, 1968.
