THE MEASUREMENT OF PHONETIC SIMILARITY
Peter Ladefoged 
University of California, Los Angeles 
There are many reasons for wanting to measure the degree of phonetic
similarity between members of a group of languages or dialects. The
present study grew out of a research project which was designed to get 
data that might have a bearing on some of the practical problems which 
exist in Uganda. In the Southern part of Uganda, where two thirds of 
the nine million people live, there are numerous closely related Bantu 
languages or dialects. The official Ugandan census data lists 15 Bantu 
languages. The current study uses data on these and six others. We 
wanted to assess their phonetic similarity so that there would be data 
on which to base decisions on which languages to use for broadcasting 
(the government currently broadcasts in 8 or 9 of these languages, as 
well as in 10 non-Bantu languages), which to use in schools (3 are used
officially and a further 5 unofficially, but with the connivance of the 
local education authorities), and which for other purposes. 
One method of obtaining a measure might have been by devising a 
metric that could be applied to formal comparisons of phonological 
descriptions of each of these languages. This method was not attempted, 
largely because of time limitations. The data had to be collected and 
first analyses made within a period of one year. Furthermore, it soon 
appeared that the sound patterns of nearly all of these languages were 
very similar, and the phonological descriptions would have to be
extremely detailed before systematic differences became apparent. Finally,
before we could quantify, in practical terms, the overall degree of
phonetic similarity between a pair of languages, the phonological
descriptions would have to be supported by counts of the frequency of
occurrence of each rule. A difference between two languages due to, say,
the addition of a rule in one but not the other would be more or less 
important depending on the number of times in which the rule was involved 
in ordinary utterances. 
The technique which we chose to use instead was to measure the 
degree of phonetic similarity in a list of 30 common words in each
language, all of which were historically cognate forms in at least 16 out
of the 20 languages. The list was a subset of a list of 100 words which
had been recorded so that lexico-statistical comparisons might be made. 
The complete lists had been recorded in a narrow phonetic transcription 
by the author, using IPA symbols except for the voiced and voiceless
palatal affricates, which were transcribed j and c in accordance
with the conventions of Ugandan orthographies. Long vowels and long
consonants (both of which are phonemic in some of these languages) were
transcribed with double letters. Tones were transcribed by acute accents 
(high), grave accents (low) and circumflex accents (falling); as far as 
is known these possibilities will account for nearly all the tonal con- 
trasts that occur in these languages. Table 1 exemplifies the data for 
two words in each of the 20 languages. 
The fundamental problem in making phonetic comparisons is how to 
line up two words, one in one dialect and one in another, in such a way
that we can make a valid point by point comparison of all the things 
which affect phonetic similarity. In the Bantu languages with which we 
were concerned, each noun consists of a stem, and a prefix indicating 
the noun class. Only the stems were used in these phonetic comparisons. 
In general, a stem begins with a consonant, C, followed by a vowel, V, 
and may contain additional alternations of consonants and vowels. The 
commonest form is CVCV. Some problems in lining up segments will be 
considered after we have considered how they may be compared. 
There have been a number of attempts to devise measures of the 
degree of phonetic similarity of isolated segments. Some of these have 
been based on experimental studies showing, for instance, the degree of 
confusability of different segments (Miller and Nicely 1955, Peters 1963, 
Wickelgren 1965, 1966, Klatt 1968, Greenberg and Jenkins 1964, Mohr and
Wang 1968); others have been based on more theoretical arguments (Austin 
1957, Peterson and Harary 1961). All of these are of interest here, in 
that the knowledge of the degree of phonetic similarity between segments 
is a necessary prerequisite to a statement about the degree of phonetic 
similarity of languages as a whole. 
Some of the studies cited above have discussed the possibility of 
quantifying the degree of difference between segments by counting the 
number of differences in their specifications in terms of features. 
Various ways of specifying segments in terms of features have been sug- 
gested, the most important being the early distinctive feature system 
of Jakobson, Fant, and Halle (1951), its revision by Jakobson and Halle 
(1956), and the system proposed by Chomsky and Halle (1968). All these 
feature sets are intended for classifying the segments which occur in
phonemic or phonological contrasts within a language. But it is by no 
means obvious that the specification of the phonetic level in the way 
suggested by Chomsky and Halle, for instance, is directly related to the
specification of the kind of phonetic similarity measure which is useful 
in cross language studies. Chomsky and Halle were certainly not trying 
to produce a phonetic specification of this kind. Accordingly for the 
purposes of the present study an ad hoc set of phonetic features was 
used. 
For the sake of computational simplicity, the phonetic features were 
considered to be independent binary categories. This is obviously an 
invalid assumption which will be discussed further towards the end of 
this paper. Because vowels were being compared only with vowels, and 
consonants only with consonants, there was no need for features such as 
consonantal and vocalic; they would never have contributed anything to 
the cross language comparisons. Furthermore there was no need to use 
the same features for both consonants and vowels. The feature system 
which was set up was adequate for specifying all the phonetic differences 
which had been observed among Ugandan Bantu languages and seemed, on the
basis of the experimental studies cited above, likely to be the best 
possible measure of segment similarity within the constraints previously 
noted. 
Each consonant segment in a Ugandan Bantu language was described as 
being, or not being: (1) a stop; (2) a nasal; (3) a fricative; (4)
anterior -- made in the front of the mouth; (5) alveolar -- made near the
teeth ridge; (6) coronal -- made in the centre of the mouth; (7) voiced;
(8) long; (9) followed by a w-glide; (10) followed by a y-glide. The
easiest way of appreciating the way in which these terms were used is 
through the examples showing the partial characterization of some
consonants given in Tables 2 and 3. A plus sign indicates the presence of a
feature, and a minus sign shows its absence.
The degree of similarity between segments is exemplified in Table 4.
Thus b and d̪ have nine out of the ten points in common; and b and
ʃʸ differ in seven points, and have only three points in common.
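
The counting procedure can be illustrated with a minimal Python sketch. The place values for b, the dental stop and the alveolar stop follow Table 2; the feature names, the dictionary layout and the helper names `segment` and `similarity` are assumptions made for illustration and are not taken from the original program.

```python
# Ten binary consonant features, in the order listed in the text.
FEATURES = ("stop", "nasal", "fricative", "anterior", "alveolar",
            "coronal", "voiced", "long", "w_glide", "y_glide")


def segment(*present):
    """Build a full specification: named features are '+', all others '-'."""
    unknown = set(present) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return {name: name in present for name in FEATURES}


def similarity(a, b):
    """Count the features on which two segments have the same value."""
    return sum(a[name] == b[name] for name in FEATURES)


# Place values as in Table 2; manner and voicing values are assumed here.
b = segment("stop", "anterior", "voiced")
dental_d = segment("stop", "anterior", "coronal", "voiced")
alveolar_d = segment("stop", "anterior", "alveolar", "coronal", "voiced")

print(similarity(b, dental_d))    # 9
print(similarity(b, alveolar_d))  # 8
```

Because every feature counts equally, the score is simply ten minus the number of differing features.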
In one or two details this measure is not entirely satisfactory. 
There is no reason why b should be considered to have seven points in
common with l and only six points in common with r; and, what is more
important, there is no reason why h should have such varying degrees of
similarity with b, d̪, d, d-. These anomalies occur because
segments were specified in terms of independent binary categories. With a
classification system of this kind it is impossible to give a
specification of h which is equally different from all the stop consonants.
But these inequities probably did not have a significant effect. Among the
2,400 segments compared, h occurred only 31 times.
In specifying the vowels we stated whether each one was, or was not:
(1) high; (2) mid; (3) low; (4) front; (5) central; (6) back; (7) long;
(8) high tone; (9) falling tone. At one time we added the possibility:
(10) low tone. But preliminary results showed that this gave too much
importance to tonal similarity, and it was better to consider low tone
as simply the absence of high or falling tone. The degree of similarity
in vowels was measured by counting the number of features they had in
common, in the same way as for consonants. 
Using this measure of the degree of phonetic similarity, the
features in each segment were compared with the corresponding features in
the corresponding segment in each of 30 words in each of the 20 Bantu
languages. The 144,000 comparisons involved, the sums indicating the
degree of phonetic similarity of each pair of languages, and the
tabulations were all done on a computer.
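
The summation over words and segments might be sketched as follows. This is only an illustration in Python under stated assumptions: segments are taken to be tuples of 0/1 feature values, stems are taken to be already aligned segment by segment, and the function names and toy data are hypothetical rather than drawn from the original program or word lists.

```python
from itertools import combinations


def segment_similarity(a, b):
    """Number of feature positions on which two segments agree."""
    return sum(x == y for x, y in zip(a, b))


def word_similarity(word_a, word_b):
    """Sum segment agreements over two stems aligned segment by segment."""
    if len(word_a) != len(word_b):
        raise ValueError("stems must be aligned to the same length")
    return sum(segment_similarity(a, b) for a, b in zip(word_a, word_b))


def similarity_matrix(languages):
    """Return {(lang1, lang2): summed similarity} for every language pair.

    `languages` maps a language name to a list of stems; each stem is a
    list of segments; each segment is a tuple of 0/1 feature values.
    """
    scores = {}
    for (name_a, words_a), (name_b, words_b) in combinations(languages.items(), 2):
        scores[(name_a, name_b)] = sum(
            word_similarity(wa, wb) for wa, wb in zip(words_a, words_b)
        )
    return scores


# Toy data: two-feature segments, one CV stem per language (not real data).
toy = {
    "A": [[(1, 0), (0, 1)]],
    "B": [[(1, 0), (1, 1)]],
    "C": [[(0, 1), (1, 0)]],
}
print(similarity_matrix(toy))
# {('A', 'B'): 3, ('A', 'C'): 0, ('B', 'C'): 1}
```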
A number of problems arose in the comparison of specific segments, 
two of which will be considered here. Both are due to the constraint 
of having to compare words segment by segment, a constraint which is 
necessary only because of the difficulties of formalizing the compari- 
sons in any other way. 
The first was that not all the stems to be compared were the same 
length. For example, the stem in the word for 'ear' has the form -~
or -~wf in many of these languages; but in two languages it is disyllabic, 
being either -t~yf or -t~yf. One might guess that these are the older 
forms, and there has been some kind of shortening process in all the 
other languages. The solution that was adopted was to add dummy
segments with entirely negative feature values to all the languages having
a monosyllabic form. This did not affect the similarity measure within
the monosyllabic group of languages; and it made the two languages having
disyllabic forms more similar to the monosyllabic group than they would 
have been to another language which had a different second syllable. 
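
The effect of the dummy segments can be seen in a small Python sketch. It assumes that dummy segments are appended at the end of the shorter stem and that a segment is a tuple of ten 0/1 values; both choices, and the two "real" segments used, are hypothetical illustrations rather than details given in the text.

```python
N_FEATURES = 10  # assumed width (the text uses 10 consonant features)

DUMMY = (0,) * N_FEATURES  # every feature negative


def pad(stem, length):
    """Append dummy segments until the stem has `length` segments."""
    if len(stem) > length:
        raise ValueError("stem is longer than the target length")
    return list(stem) + [DUMMY] * (length - len(stem))


def agreement(a, b):
    """Number of features on which two segments share a value."""
    return sum(x == y for x, y in zip(a, b))


# Hypothetical segments, not transcriptions from the word lists.
real_a = (1, 0, 0, 1, 0, 0, 1, 0, 0, 0)   # three positive features
real_b = (0, 0, 1, 0, 1, 1, 0, 0, 0, 1)   # four positive features, none shared

print(agreement(DUMMY, DUMMY))    # 10
print(agreement(real_a, DUMMY))   # 7
print(agreement(real_a, real_b))  # 3
```

Two padded stems agree completely on their dummy segments, while a real segment is penalized against a dummy only for its positive features, which is usually less than the penalty against a different real segment.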
The second problem arose when a phonetic feature such as palatali- 
zation was realized in one language in a consonant and in another in a 
vowel. The word for 'crocodile', for example, often has a stem of the 
form -g66~ ; but sometimes, instead of the palatal nasal, the form is
-g6fn~. Note that if these two forms were lined up so that the conso- 
nants were compared only with the consonants and the vowels only with 
the vowels, then there would be differences in both the last vowel and 
the last consonant. Consequently this pair would be counted as less 
similar than a pair such as -g6~n~ and -g6~. This is not a desirable 
result. It was avoided by an ad hoc solution in which -in was arbi- 
trarily specified as a consonant differing in one feature from the 
palatal nasal ɲ. Note also that the problem is not avoided by using
the same features for consonants and vowels; it is simply a matter of
the lining up of the segments to be compared. 
The ad hoc approaches discussed above are, of course, unsatisfac- 
tory. They were adopted simply in the interests of expediency. Work 
is continuing on a better formalization of the problem of comparing 
whole words, but so far without success. Meanwhile, a computer program 
has been written which compares the features in each segment in each 
word in each language with the corresponding features in each word in
every other language. The sums indicating the degree of phonetic 
similarity of each pair of languages are printed out in matrix form. 
The results for this particular group of 20 Ugandan languages are not 
particularly relevant here; they are given in detail elsewhere (Criper,
Glick, and Ladefoged, forthcoming). It is sufficient to note that the
relationships revealed suggested plausible and interesting groupings 
into dialect clusters. 
What is of more interest here is the validation of the claim that 
this technique measures phonetic similarity between languages. We 
attempted to do this in two ways, first by assessing local opinion 
concerning the degree of similarity between one language and another, 
and secondly by testing the extent to which people actually understand 
other languages. The first of these two methods did not produce reliable
data; different local experts gave different figures, and even the same
man gave different estimates when the questions were put to him in a
slightly different way on different occasions. The second method
produced limited but valid data. The procedures are described in full
elsewhere (Criper, Glick, and Ladefoged, forthcoming). We conducted 
tests with speakers of two different languages. For each of these
languages we used five groups of speakers, and played them recordings of
stories in their own and four other languages, rotating stories,
languages, and groups in a Latin square design. The group scores in
answering questions about these stories were subjected to an analysis 
of variance, which showed that there were no significant differences 
between any of the listening groups, or between any of the stories; but
there were very significant differences in the comprehension of the
different languages. We therefore had valid scores on the comprehension
of two languages relative to four other languages. These eight scores
were compared with the degrees of phonetic similarity of the corresponding
pairs of languages and, provided one score was left out for reasons
discussed below, a high correlation was found (r = 0.98). 
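
For readers who wish to repeat this kind of check, a product-moment correlation can be computed as in the Python sketch below. The function name is an assumption, and the two lists are placeholder numbers chosen only to make the example run; they are NOT the similarity or comprehension scores obtained in the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


# Placeholder values only (seven pairs, as when one of eight scores is omitted).
placeholder_similarity = [10, 20, 30, 40, 50, 60, 70]
placeholder_comprehension = [1, 2, 3, 4, 5, 6, 7]

print(pearson_r(placeholder_similarity, placeholder_comprehension))
```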
It is virtually impossible to test the relative comprehension of 
all possible pairs of a large number of languages, because of the com- 
plexities in the experimental design which are necessary. But it would 
appear that, at least in the case of these Ugandan Bantu languages, 
valid predictions may be made on the basis of the phonetic similarity
measure described above. There are, however, circumstances in which 
our predictions would be wrong. The degree of comprehension of one 
language to another is not always a reversible relationship; speakers
of a prestige language do not understand a minor language as well as
speakers of the minor language understand the prestige language. It
is this discrepancy which accounts for our having to leave out one
score in order to get a high correlation as described above. Phonetic 
similarity is a good predictor of intelligibility only if questions 
of prestige are not involved. 
Finally we must consider ways in which we could improve the
metric used for comparing the phonetic similarity of segments. Perhaps
the most obvious improvement is to allow for variations in the importance
of different features. The experimental studies cited above generally 
agree in finding that differences in manner of articulation contribute 
more to perceptual distance than differences in voicing, and both con- 
tribute more than differences in place of articulation. Accordingly 
features must be assigned different weights. 
The situation is, however, more complicated. We must also allow 
for the interaction of features. For example, the experimental studies 
cited above have shown that there is a greater difference between the 
members of the set pa - ta - ka than there is between the members of
the set ba - da - ga; and the members of the set ma - na - ŋa
are even less different from one another. Consequently differences in
place of articulation, however coded, must be made to have less effect
when the feature voiced is also present; and even less effect when the 
feature nasal is also present. 
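
One way of combining feature weights with this kind of interaction is sketched below in Python. All the numbers are hypothetical: the weights merely respect the ordering manner > voicing > place, and the two scaling factors merely make place differences count for less among voiced segments and less again among nasals. The labial and velar place values follow Table 2; the names `WEIGHTS`, `seg` and `distance` are assumptions.

```python
PLACE = {"anterior", "alveolar", "coronal"}

# Hypothetical weights: manner > voicing > place.
WEIGHTS = {"stop": 3.0, "nasal": 3.0, "fricative": 3.0,
           "voiced": 2.0,
           "anterior": 1.0, "alveolar": 1.0, "coronal": 1.0}

# Hypothetical scaling of place differences.
VOICED_PLACE_SCALE = 0.75
NASAL_PLACE_SCALE = 0.5


def seg(*present):
    """Segment specification: named features are present, others absent."""
    return {name: name in present for name in WEIGHTS}


def distance(a, b):
    """Weighted count of differing features; place differences are scaled
    down when both segments are voiced, and further when both are nasal."""
    scale = 1.0
    if a["nasal"] and b["nasal"]:
        scale = NASAL_PLACE_SCALE
    elif a["voiced"] and b["voiced"]:
        scale = VOICED_PLACE_SCALE
    total = 0.0
    for name, weight in WEIGHTS.items():
        if a[name] != b[name]:
            total += weight * (scale if name in PLACE else 1.0)
    return total


p, k = seg("stop", "anterior"), seg("stop")
b, g = seg("stop", "anterior", "voiced"), seg("stop", "voiced")
m, ng = seg("nasal", "anterior", "voiced"), seg("nasal", "voiced")

print(distance(p, k), distance(b, g), distance(m, ng))  # 1.0 0.75 0.5
```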
It seems that it would also be advisable to allow for non-binary 
specifications of features. Multivalued feature specifications can be 
treated in either of two ways. In one way, each value is regarded as
being equally different from all others. Thus if the consonants
p, t, c, k are assigned the values 1, 2, 3, 4 on a feature of
articulatory place, they will each be regarded as being one point
different from each other with respect to this feature, assuming it has
been given a weight of 1. Alternatively multivalued specifications can
be treated as scalar quantities. If this is done and, for example, the
vowels i, e, a are specified as having the values 1, 2, 3 on a
feature of vowel height, then e would be counted as one point different
from i and a, but i and a would be two points different from
each other (assuming this feature has a weight of 1). If they had been
specified as 1, 4, 7 then e would have been three points from i and a
and they would have been six points different from each other.
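
The two treatments of a multivalued feature reduce to two small functions, sketched here in Python. The function names are assumptions; the values are those used in the discussion, with a spacing of 1, 4, 7 giving the three-point and six-point differences.

```python
def nominal_difference(a, b, weight=1):
    """Independent treatment: any two distinct values differ by `weight`."""
    return weight if a != b else 0


def scalar_difference(a, b, weight=1):
    """Scalar treatment: the difference grows with the distance in value."""
    return weight * abs(a - b)


place = {"p": 1, "t": 2, "c": 3, "k": 4}
print(nominal_difference(place["p"], place["t"]))  # 1
print(nominal_difference(place["p"], place["k"]))  # 1

height = {"i": 1, "e": 2, "a": 3}
print(scalar_difference(height["i"], height["e"]))  # 1
print(scalar_difference(height["i"], height["a"]))  # 2

spaced = {"i": 1, "e": 4, "a": 7}
print(scalar_difference(spaced["e"], spaced["i"]))  # 3
print(scalar_difference(spaced["e"], spaced["a"]))  # 3
print(scalar_difference(spaced["i"], spaced["a"]))  # 6
```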
The use of independent multivalued feature specifications allows 
us to correct an anomaly which was mentioned above. It will be
remembered that using the previous system it was impossible to specify h
in a way such that it was equally different from all stop consonants. 
But if place of articulation is an independent multivalued feature, and 
if h is assigned a value different from any of the stop consonants, 
then it can be made equally different from all of them. In other words, 
this type of specification allows us to formalize within the metric the 
notion of an irrelevant feature. 
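
The contrast can be made concrete with a short Python sketch. The binary place values for b, the dental stop and the alveolar stop are those of Table 2; treating h as negative on all three place features is an assumption, as are the multivalued codes, of which only the distinctness matters.

```python
# Binary place features (anterior, alveolar, coronal), values as in Table 2.
BINARY_PLACE = {
    "b": (1, 0, 0),
    "dental d": (1, 0, 1),
    "d": (1, 1, 1),
    "h": (0, 0, 0),   # assumption: h is negative on every place feature
}

# Hypothetical multivalued codes; h receives a value no stop uses.
MULTI_PLACE = {"b": 1, "dental d": 2, "d": 3, "h": 0}


def binary_place_difference(a, b):
    """Number of binary place features on which two segments differ."""
    return sum(x != y for x, y in zip(BINARY_PLACE[a], BINARY_PLACE[b]))


def nominal_place_difference(a, b, weight=1):
    """Independent multivalued place: distinct values always differ by `weight`."""
    return 0 if MULTI_PLACE[a] == MULTI_PLACE[b] else weight


stops = ["b", "dental d", "d"]
print([binary_place_difference("h", s) for s in stops])   # [1, 2, 3]
print([nominal_place_difference("h", s) for s in stops])  # [1, 1, 1]
```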
A computer program has now been written which compares segments 
which may be specified in terms of weighted, interacting, multivalued, 
independent or scalar, features. It is hoped that results of experiments 
using this program will be available for reporting to the conference. 
Table 1: Phonetic transcriptions of the words for 'bee' and 'bone'
in 20 Ugandan Bantu languages. IPA symbols are used,
except that j and c are used for the voiced and voiceless
palatal affricates. Doubled letters denote long
sounds. The stems (which are all that were used in the
comparisons) are separated from the noun class prefixes
by a vertical line.
Language         'bee'     'bone'
Lumasaba
Lunyole
Lusamia
Lugwe
Lugwere
Lukenyi
Lusoga
Luganda
Ruruli
Runyoro
Rutooro
Ruhororo
Rutagwenda
Runyankore
Rukiga
Lubwisi
Rukonjo
Rugungu
Runyarwanda
Rwamba
[The transcriptions in the 'bee' and 'bone' columns are illegible in the source.]
Table 2: The classification of the places of articulation required
for the description of Ugandan Bantu languages.

Example   Phonetic term     Characteristic features
                            anterior   alveolar   coronal
b         labial               +          -          -
d̪         dental               +          -          +
d         alveolar             +          +          +
d-        post alveolar        -          +          +
j         prepalatal           -          -          +
g         velar                -          -          -
Table 3: The classification of some manners of articulation required
for the description of Ugandan Bantu languages.

Example   Phonetic term         Characteristic features
                                nasal   stop   fricative
n         nasal                   +       -        -
nz        prenasal fricative      +       -        +
nd        prenasal stop           +       +        -
d         stop                    -       +        -
j         affricate               -       +        +
z         fricative               -       -        +
l         approximant             -       -        -
Table 4: The degree of similarity between some consonant segments
in Ugandan Bantu languages.

      d̪   d   d-  j   g   dʸ  dʷ  d:  dz  z   nz  l   r   h   s   ʃ   sʸ  ʃʸ
b     9   8   7   7   9   7   7   7   7   6   5   7   6   7   5   4   4   3
d̪         9   8   8   8   8   8   8   8   7   6   8   7   6   6   5   5   4
d             9   7   7   9   9   9   9   8   7   9   8   5   7   6   6   5
d-                8   8   8   8   8   8   7   6   8   9   6   6   7   5   6
j                     8   6   6   6   8   7   6   6   7   6   6   7   5   6
g                         6   6   6   6   5   4   6   7   8   4   5   3   4
dʸ                            8   8   8   7   6   8   7   4   6   5   7   6
dʷ                                8   8   7   6   8   7   4   6   5   5   4
d:                                    8   7   6   8   7   4   6   5   5   4
dz                                        9   8   8   7   4   8   7   7   6
z                                             9   9   8   5   9   8   8   7
nz                                                8   7   4   8   7   7   6
l                                                     9   6   8   7   7   6
r                                                         7   7   8   6   7
h                                                             6   7   5   6
s                                                                 9   9   8
ʃ                                                                     8   9
sʸ                                                                        9

References 

Austin, W.M. (1957) 'Criteria for phonetic similarity' Language 33, 
538-~3. 

Chomsky, A.N. and Halle, M. (1968) The Sound Pattern of English
Harper and Row, New York, New York. 

Criper, C., Glick, R., and Ladefoged, P. (forthcoming) Language in Uganda.

Greenberg, J.H. and Jenkins, J.J. (1964) 'Studies in the psychological
correlates to the sound system of American English' Word 20, No. 2, 
157-77. 

Jakobson, R., Fant, G., and Halle, M. (1951) Preliminaries to Speech
Analysis (sixth printing, 1965) Cambridge, Mass., M.I.T. Press. 

Jakobson, R. and Halle, M. (1956) Fundamentals of Language Mouton,
The Hague. 

Klatt, D.H. (1968) 'Structure of confusions in short-term memory between 
English consonants' J. Acoust. Soc. Amer. 44, No. 2, 401-7.

Miller, G.A. and Nicely, P.E. (1955) 'An analysis of perceptual
confusions among some English consonants' J. Acoust. Soc. Amer.
27, 338-52. 

Mohr, B. and Wang, W. (1968) 'Perceptual distance and the specification 
of phonological features' Phonetica 18, 31-45. 

Peters, R.W. (1963) 'Dimensions of perception for consonants' J. Acoust. 
Soc. Amer. 35, 1985-9. 

Peterson, G.E. and Harary, F. (1961) 'Foundations of phonemic theory' in 
Structure of Language and its Mathematical Aspects (ed. R. Jakobson) 
American Mathematical Society, Providence, Rhode Island. 

Wickelgren, W.A. (1965) 'Distinctive features and errors in short-term 
memory for English vowels' J. Acoust. Soc. Amer. 38, 583-8. 

Wickelgren, W.A. (1966) 'Distinctive features and errors in short-term
memory for English consonants' J. Acoust. Soc. Amer. 39, 388-98. 
