PART ONE - LEXICOSTATISTICS
The Swadesh theory of lexicostatistics (1950, 1952, 1955) 
provided the first quantitative comparison of related languages 
based on a well-defined model of language change. The stochastic 
nature of this model was poorly understood by linguists, in the 
main, and many have rejected the theory in the course of a protracted 
and confused controversy. Meanwhile field linguists, especially 
those working with language groups of unknown history, have accepted 
lexicostatistics and have found it to be an efficient, valid and
reliable technique.
The Swadesh theory
There are serious oversimplifications of reality implicit 
in lexicostatistics, and it is these, rather than the stochastic 
aspects, which are limitations of the theory. Swadesh hypothesized, 
in effect, that 
(i) it is possible to discover a set of basic, universal and 
non-cultural meanings, and he constructed a list of about 200 such 
meanings; 
(ii) in every natural language, at a given time, there is a
unique lexical representation (word) corresponding to each of these
meanings; but 
(iii) over short time intervals, the word representing any 
meaning runs a small but constant risk of being replaced by a 
different (non-cognate) word; and 
(iv) the replacement, or non-replacement, of the lexical 
representation of a meaning occurs independently of that of any other 
meaning, and independently over different periods of time. 
To formalize (i) and (ii), we must postulate the existence,
for each natural language, at all points t in time, of a lexicon,
represented by a finite abstract set $L_t$. A well-defined equivalence
relation corresponding to cognation partitions the elements of
$\bigcup_{t \in T} L_t$ ($T$ a real interval) into equivalence
classes. If $k \in L_s$, $l \in L_t$ ($t, s \in T$) are cognate, we
write $\delta(k,l) = 1$. Otherwise $\delta(k,l) = 0$.
Further, we must postulate the existence of a finite
abstract set $M$ (corresponding to the universal set of meanings), and
a procedure for defining, for any $t$, and any $L_t$, a unique map from
$M$ into $L_t$. This map, written $M \xrightarrow{t} L_t$, specifies
that for each $m \in M$ there is an $l \in L_t$ such that
$m \xrightarrow{t} l$ ($l$ means $m$).
Hypotheses (iii) and (iv) imply that the changes over
time in the image of the map $M \xrightarrow{t} L_t$ have a certain
stochastic aspect. This can be modelled by the probability statement
$$P[\delta(k,l) = 1] = 1 - \lambda(t-s) + \epsilon_{t-s},$$
$\lambda$ a universal constant, $t > s$, and
$\epsilon_{t-s}/(t-s) \to 0$ as $t \to s$; and two
independence conditions: let
$$m_i \xrightarrow{s} k_i, \qquad m_i \xrightarrow{t} l_i$$
for $i = 1, 2, \ldots, |M|$ ($|M|$ the number of elements in $M$);
then $\delta(k_1,l_1), \delta(k_2,l_2), \ldots, \delta(k_{|M|},l_{|M|})$
are independent random variables; let
$$m \xrightarrow{s_i} h_i, \qquad m \xrightarrow{t_i} j_i$$
for a finite number of disjoint intervals $[s_i, t_i)$,
$i = 1, 2, \ldots, N$; then
$\delta(h_1,j_1), \delta(h_2,j_2), \ldots, \delta(h_N,j_N)$ are
independent random variables.
This model has a number of immediate properties which form
the central thesis of the Swadesh theory. These are presented here
as Theorems 1, 2 and 3. For simplicity we will assume that at any
time t, at most one word in $L_t$ can belong to a cognation equivalence
class. This simplifies notation and proofs, although the assumption
may be relaxed without substantively affecting this development.
One further type of assumption is required to ensure a
degree of randomness in the choice of replacing word during lexical
replacement. To prove Theorem 1 as stated below, we require
$\exists\, C > 1$ such that
$$P[m \xrightarrow{t} l \mid m \xrightarrow{s} k] < \frac{C}{|L_t|} = O(|L_t|^{-1})$$
for all $m$, $t > s$, $\delta(k,l) = 0$.
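Theorem 1
$$|M|^{-1} E \sum_{m \in M} \delta(k_m, l_m) = e^{-\lambda(t-s)} + O(|L_t|^{-1})$$
Proof
Let $N(t-s)$ be the number of changes (with respect to the
cognation relation) of the mapping $m \xrightarrow{t} l$ in the
interval $(s,t]$. Then $N(t-s) = 0$ is just the event that a Poisson
process remains at zero on the interval $(s,t]$ (see, e.g., Parzen,
1960, p. 252), and
$$P[N(t-s) = 0] = e^{-\lambda(t-s)}.$$
However,
$$P[\delta(k,l) = 1] = P[N(t-s) = 0] + P[N(t-s) > 0] \times P[\text{last change is to } k \text{ (or cognate)} \mid N(t-s) > 0]$$
$$= e^{-\lambda(t-s)} + (1 - e^{-\lambda(t-s)})\, O(|L_t|^{-1})$$
$$= e^{-\lambda(t-s)} + O(|L_t|^{-1}).$$
Then
$$E\,\delta(k,l) = e^{-\lambda(t-s)} + O(|L_t|^{-1})$$
and
$$|M|^{-1} E \sum_{m \in M} \delta(k_m,l_m) = |M|^{-1} \sum_{m \in M} E\,\delta(k_m,l_m)$$
$$= |M|^{-1} |M| \left[ e^{-\lambda(t-s)} + O(|L_t|^{-1}) \right]$$
$$= e^{-\lambda(t-s)} + O(|L_t|^{-1}).$$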
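The content of Theorem 1 can be illustrated numerically. The following is a minimal sketch, not part of the original development: the function name, the parameter `c` and the lexicon size of 5000 are assumptions chosen for illustration, and the correction term is only one simple form consistent with an error of order one over the lexicon size.

```python
import math


def expected_retention(lam, elapsed, lexicon_size=None, c=1.0):
    """Expected proportion of meanings whose word is still cognate.

    lam          -- replacement rate per year (the constant of the theory)
    elapsed      -- length of the interval t - s, in years
    lexicon_size -- if given, add an illustrative error term of order
                    1 / lexicon_size (an assumed form, not the exact one)
    c            -- assumed constant in the bound c / lexicon_size
    """
    stay = math.exp(-lam * elapsed)  # probability of no replacement at all
    if lexicon_size is None:
        return stay
    # A replaced word is assumed to land back on a cognate with
    # probability c / lexicon_size.
    return stay + (1.0 - stay) * c / lexicon_size


# Hypothetical rate of 2e-4 per year and a lexicon of 5000 words.
for years in (500, 1000, 2000, 5000):
    leading = expected_retention(2e-4, years)
    corrected = expected_retention(2e-4, years, lexicon_size=5000)
    print(years, round(leading, 4), round(corrected, 4))
```

With a lexicon of several thousand words the two columns differ only in the fourth decimal place, which is why the error term can usually be neglected.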
Definition 
$L'_t$, $L''_t$, $t > s$, represent the lexicons of two languages
which are independent daughter languages of the same parent language
(which are said to split at time $s$) if $L'_s = L''_s$ and if, whenever
$m \xrightarrow{s} k$ (in both languages),
$m \xrightarrow{t} l'$ in the first language, and
$m \xrightarrow{t} l''$ in the second language,
then $\delta(k,l')$ and $\delta(k,l'')$ are independent random variables.
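Theorem 2
Let $L'_t$, $L''_t$, $t > s$ be as above. Then
$$P[\delta(l',l'') = 1] = e^{-2\lambda(t-s)} + O\left([\min(|L'_t|, |L''_t|)]^{-1}\right)$$
and
$$|M|^{-1} E \sum_{m \in M} \delta(l'_m, l''_m) = e^{-2\lambda(t-s)} + O\left([\min(|L'_t|, |L''_t|)]^{-1}\right).$$
Proof
Assuming $m \xrightarrow{s} k$ in both languages,
$$P[\delta(l',l'') = 1] = P[\delta(l',l'') = 1,\ \delta(k,l') = 1] + P[\delta(l',l'') = 1,\ \delta(k,l') = 0].$$
By transitivity of the equivalence relation represented by $\delta$, the
first term on the right is
$$P[\delta(k,l'') = 1,\ \delta(k,l') = 1]$$
which, by independence,
$$= \left[ e^{-\lambda(t-s)} + O(|L'_t|^{-1}) \right] \left[ e^{-\lambda(t-s)} + O(|L''_t|^{-1}) \right]$$
$$= e^{-2\lambda(t-s)} + e^{-\lambda(t-s)} \left( O(|L'_t|^{-1}) + O(|L''_t|^{-1}) \right) + O(|L'_t|^{-1} |L''_t|^{-1})$$
$$= e^{-2\lambda(t-s)} + O\left([\min(|L'_t|, |L''_t|)]^{-1}\right).$$
Now the second probability on the right hand side above is, similarly,
$$P[\delta(l',l'') = 1,\ \delta(k,l') = 0,\ \delta(k,l'') = 0]$$
$$= \sum P[m \xrightarrow{t} l' \text{ (1st language)},\ m \xrightarrow{t} l'' \text{ (2nd language)}]\ \delta(l',l'')$$
$$= \sum P[m \xrightarrow{t} l']\ P[m \xrightarrow{t} l'']\ \delta(l',l''), \text{ by independence,}$$
where both sums run over $l' \in L'_t$, $l'' \in L''_t$ with
$\delta(l',k) = 0$ and $\delta(l'',k) = 0$.
Since we have fixed $m \xrightarrow{s} k$, each factor in the sum is
$O(|L'_t|^{-1})$ or $O(|L''_t|^{-1})$ respectively, by the randomness
assumption above. The summation contains at most
$\max(|L'_t|, |L''_t|)$
terms which are not annihilated by $\delta(l',l'')$,
and so the total is
$$O\left( \frac{\max(|L'_t|, |L''_t|)}{|L'_t|\,|L''_t|} \right) = O\left([\min(|L'_t|, |L''_t|)]^{-1}\right).$$
This completes the proof of the first statement of the theorem.
The proof of the second parallels the analogous result in the
previous theorem.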
In natural languages, $|L_t|$ is several thousands and $|L_t|^{-1}$
is negligible compared to the exponential term, except for
very high values of t (where the theory has little applicability).
In the next theorem, the results of Theorems 1 and 2 are utilized,
neglecting the error terms of the form $O(|L_t|^{-1})$.
Under certain, more specific restrictions on
$P[m \xrightarrow{t} l \mid m \xrightarrow{s} k]$, an exact form of
the error term attached to the exponential laws (here formulated as
Theorems 1 and 2) has been solved for.
Theorem 3
Insofar as we may approximate the results of Theorems 1 and 2
by
$$|M|^{-1} E \sum_{m \in M} \delta(k,l) = e^{-\lambda(t-s)}$$
and
$$|M|^{-1} E \sum_{m \in M} \delta(l',l'') = e^{-2\lambda(t-s)}$$
respectively, if it is known that $t-s = T$, then
$$\hat\lambda = -\frac{\log\left( |M|^{-1} \sum \delta(k,l) \right)}{T}$$
is the maximum likelihood estimator (MLE) of $\lambda$ in the first
formula above, and if $\lambda$ is known,
$$\widehat{t-s} = -\frac{\log\left( |M|^{-1} \sum \delta(k,l) \right)}{\lambda}$$
is the MLE of $t-s$.
In the case of two independent daughter languages (Thm. 2),
$$\widehat{t-s} = -\frac{\log\left( |M|^{-1} \sum \delta(l',l'') \right)}{2\lambda}$$
is the MLE of $t-s$.
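Proof
It suffices to find the MLE of $\lambda$, the other cases being
analogous.
Consider binomial trials with parameter $p = e^{-\lambda T}$.
$\delta(k,l) = 1$ is the equivalent of a success in one such trial.
$\sum_{m \in M} \delta(k_m,l_m) = r$ is the equivalent of r successes
in $|M|$ trials.
The likelihood function of $\lambda$ in such a case is
$$\log L(\lambda) = \text{constant} - \lambda T r + (|M| - r) \log(1 - e^{-\lambda T}).$$
$$\frac{d \log L(\lambda)}{d\lambda} = -Tr + \frac{T e^{-\lambda T} (|M| - r)}{1 - e^{-\lambda T}}$$
At the MLE, $\hat\lambda$, this derivative should be zero, which gives
$e^{-\hat\lambda T} = r/|M|$, so that
$$\hat\lambda = -\frac{\log(r/|M|)}{T}$$
and the same process yields $\widehat{t-s}$.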
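The estimators of Theorem 3 are simple enough to compute directly. The sketch below is illustrative only: the function names and the counts (170 cognates out of 200 meanings over 1000 years) are hypothetical, and the log-likelihood omits the constant binomial term.

```python
import math


def lambda_hat(r, n_meanings, elapsed):
    """MLE of the replacement rate when the elapsed time is known."""
    if not 0 < r <= n_meanings:
        raise ValueError("r must satisfy 0 < r <= n_meanings")
    return -math.log(r / n_meanings) / elapsed


def elapsed_hat(r, n_meanings, lam, daughters=False):
    """MLE of t - s when the rate is known.

    With daughters=True, r counts cognates between two independent
    daughter languages, and the effective rate is 2 * lam.
    """
    rate = 2.0 * lam if daughters else lam
    return -math.log(r / n_meanings) / rate


def log_likelihood(lam, r, n_meanings, elapsed):
    """Binomial log-likelihood up to an additive constant (needs r < n)."""
    p = math.exp(-lam * elapsed)
    return r * math.log(p) + (n_meanings - r) * math.log(1.0 - p)


# Hypothetical calibration: 170 of 200 meanings retained over 1000 years.
lam = lambda_hat(170, 200, 1000)
print("estimated rate per year:", lam)
print("time from the same count, one lineage:", elapsed_hat(170, 200, lam))
print("time from the same count, two daughters:",
      elapsed_hat(170, 200, lam, daughters=True))
```

Evaluating `log_likelihood` slightly above and below the estimate is a quick way to confirm that the closed form sits at the maximum.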
Let $r = \sum_{m \in M} \delta(k_m, l_m)$ as in Theorem 3 (with r
counted between the two daughter languages in the last case below).
Swadesh (1950) derived a methodology to utilize the three results
$$\hat\lambda = -\frac{\log(r/|M|)}{t-s}, \qquad \widehat{t-s} = -\frac{\log(r/|M|)}{\lambda}, \qquad \widehat{t-s} = -\frac{\log(r/|M|)}{2\lambda}$$
as follows. He first selected his list of meanings which he
considered basic to all languages. He then compared Old English
with Modern English ($t-s \approx 1000$ years), i.e. he compared the
words in each language corresponding to the basic meanings. The
etymology of words in those languages being fairly well known,
he was able to decide when a pair of words corresponding to
the same meaning were cognate (i.e. one was historically derived
from the other, or both were derived from a common root, by a
series of phonological alterations, each of which affected only
a part of the word in question). This immediately led to
$\hat\lambda \approx 2 \times 10^{-4}$. Using the estimate which he
obtained as a constant, he dated the relative times of separation or
"split" of various Salish (western North American Indian) languages
from a common parent with the two-daughter estimator of $t-s$. After
the work of Lees (1953), $\lambda$ was considered to be a universal
constant; the two-daughter estimator could estimate absolute dates of
split, and the single-lineage estimator could date a collection of
texts from a dead language.
Criticisms of the theory
Criticisms of lexicostatistics fall into two classes. In
the first class are protests based on or resulting from the stochastic
nature of the model and/or the stochastic nature of the phenomena
of lexical loss and replacement. The second class of criticisms
refers to particular assumptions in the model, and I will discuss
these in the next section.
Bergsland and Vogt (1962) presented four cases where the estimates
$\widehat{t-s}$ (or $\hat\lambda$) are not accurate (three too low and
one too high), and rejected the Swadesh theory on this basis. In
statistical terms, the authors constructed a sample consisting entirely
of outliers and rejected an hypothesis without even considering the
distribution of the test statistic. Fodor (1962) took the same approach
to "disprove" lexicostatistics. Chretien (1962) calculated and
published pages of ordinary binomial functions to prove, in essence,
that $\widehat{t-s}$ is a random variable and hence not "an acceptable
mathematical formulation" of the Swadesh theory. This basic
misunderstanding of the nature of statistical estimation is
characteristic not only of critics of lexicostatistics, but also of
many of its practitioners.
A more important criticism has been expounded, at great
length, by Fodor (1965) and, more clearly, by Teeter (1963).
Quoting from the latter:
"Lexical similarities and dissimilarities do not
come about in any one simple way, and any mechanical 
method of counting lexical similarities cannot 
separate those due to chance, universals, diffusion, 
and common origin. Lexical change is the result of 
many factors, and all are scrambled together in the 
final result." (p.~l) 
This diversity of causes of lexical and semantic change has received 
detailed study by linguists and semanticists; see, for example, 
Bloomfield (1933) p.392 ff., Ullman (1957) p.183 ff. Quoting from 
Lees (1953): 
" The reasons for morpheme decay, i.e. for changes 
in vocabulary, have been classified by many authors; 
they include such processes as word tabu, phonemic 
confusion of etymologically distinct items close in 
meaning, change in material culture with loss of ob- 
solete terms, rise of witty terms or slang, adoption 
of prestige forms from a superstratum language, and 
various gradual semantic shifts, such as specialization,
generalization, and pejoration." (p. 114)
And it is just this diversity and the difficulty of "unscrambling"
which, contrary to Teeter and to Fodor, justifies a stochastic
model incorporating retention parameters. Consider, for comparison,
the problem of constructing a model for the behaviour of gases. We
have an enclosed volume containing a large number of particles of
finite dimension, undergoing rapid motion. We can assume everything
is perfectly deterministic, all the particles obeying Newton's three
laws of motion, and all collisions perfectly elastic. The position
of any particle at any time can, theoretically, be calculated
precisely if we know the initial state of the system and the time
elapsed. Practically speaking, of course, this would be impossibly
tedious, boring and pointless, there being so many particles, any
two of which may collide, plus the walls, plus gravitational or
electrical charge attractions and repulsions to consider. What is
possible, interesting, and of great value (witness the fields of
kinetic theory and statistical mechanics, dating from the work of
men such as Maxwell, Boltzmann, and Einstein) is to consider the
nature of each particle as a random process involving appropriate
parameters and to consider the statistical behaviour of the model
thus constructed. It is complexity and great difficulty of
prediction which make a statistical model workable. In the same way,
Fodor and others have inadvertently justified the proposition that some
sort of stochastic process might be an appropriate model for lexical
change phenomena. The question remains, what process? The Swadesh
theory provides at least a first approximation to the correct answer.
Problems with Swadesh's model
Before discussing details of the model, it is appropriate
to present the results of an early (1953) lexicostatistic
investigation of R. Lees. He chose thirteen language pairs, each pair
consisting of an historic language and a modern descendant. The
particular choice of pairs presumably stemmed from availability and
not from any sampling technique. He translated each word in
Swadesh's 215-word list (1950) into the 26 languages. After counting
the number, r, of cognates between each language pair, he used
(in effect),
$$\hat\lambda = -\frac{\log(r/|M|)}{t-s},$$
where $|M| \le 215$ according to the number of indeterminate cognations
and uncertainties of translation. To get an estimate of a "universal"
$\lambda$, he combined the thirteen individual estimates into a single
pooled estimate. (An alternative way of combining the individual
estimates gives approximately the same result.)
Using $p = e^{-\hat\lambda t}$ as the parameter in the binomial
experiment he calculated, for each language pair,
$$\frac{(|M| p - r)^2}{|M|\, p (1-p)}$$
which should be approximately the square of a standard normal random
variable, if the assumptions of the theory are true. Since an
estimate of $\lambda$ is used in calculating p, the sum of the squared
variables should be $\chi^2_{12}$-distributed. But $\chi^2 = 29.5$,
significant at the 1% level, suggesting rejection of the theory.
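The goodness-of-fit calculation described above can be sketched as follows. This is a hedged illustration, not Lees' computation: the four language pairs are invented, the function names are hypothetical, and pooling by a simple average of the individual rate estimates is an assumption made here for concreteness.

```python
import math


def pooled_rate(pairs):
    """Average of the individual rate estimates (assumed pooling rule).

    pairs -- list of (n_meanings, r_cognates, elapsed_years)
    """
    rates = [-math.log(r / n) / t for n, r, t in pairs]
    return sum(rates) / len(rates)


def chi_square(pairs, lam):
    """Sum over pairs of (n*p - r)^2 / (n*p*(1-p)), with p = exp(-lam*t)."""
    total = 0.0
    for n, r, t in pairs:
        p = math.exp(-lam * t)
        total += (n * p - r) ** 2 / (n * p * (1.0 - p))
    return total


# Hypothetical data: (meanings compared, cognates found, years elapsed).
pairs = [(200, 160, 1000), (210, 150, 1500), (205, 120, 2200), (198, 170, 900)]
lam = pooled_rate(pairs)
stat = chi_square(pairs, lam)
print("pooled rate:", lam)
print("chi-square statistic:", stat, "on", len(pairs) - 1, "degrees of freedom")
```

The statistic is compared with a chi-square distribution whose degrees of freedom are one fewer than the number of pairs, because one parameter was estimated from the same data.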
Lees, however, suggested four reasons for not rejecting on
the basis of the $\chi^2$ test: the large values for $|M|$ and r,
uncertainty in t, possible inappropriateness of the $\chi^2$ test, and
the error in estimating $\lambda$. The first and third of these are not
valid statistically, and the fourth is a source of very little of the
excess $\chi^2$. The variability in the time parameter can be
incorporated into the $\chi^2$ calculation. This only reduces
$\chi^2$ to 25.9 - 27.5 depending on the variation assumed in t. Lees'
results, then, indicate strongly that the theory is an inadequate
model for the phenomena.
We turn now to the second class of criticisms of the
Swadesh model, those that involve objections, evaluations or
improvements related to the generalizations and simplification of
reality inherent in lexicostatistic theory. The listing of
assumptions earlier in this chapter will serve as a framework for
classifying this latter class of criticisms.
(i) There are no universal sets of meanings, it being difficult
to specify most meanings without recourse to particular natural
languages. No list of meanings yet devised is completely satisfactory
for sufficiently diverse languages; Hoijer (1956), O'Grady (1960),
Cohen (196~), Levin (1964), Trager (1966).
(ii) The existence of synonymy proves the non-uniqueness of
the meaning map $M \xrightarrow{t} L$; and no known methods of
eliciting words for given meanings are completely and reliably
reproducible, from speaker to speaker or even from occasion to
occasion for a single speaker; Gudschinsky (1960). The existence of
general and specific terms for a single entity provides a further
complication.
(iii) If the parameter $\lambda$ can be said to exist at all, it is
constant neither from language to language; Bergsland and Vogt (1962),
Fodor (1962), nor from meaning to meaning; Swadesh (1955), Androyev
(1962), Ellegard (1962), and especially Dyen (1964), van der Merwe
(1966), Dyen, James and Cole (1967), nor even from time interval
to time interval for the same meaning; Swadesh (1962).
Judgements about cognation are unreliable, especially 
with respect to languages which are separated by large t-s and 
whose history is mostly unknown; Fairbanks (1955), Teeter (1963), 
Lunt (1964). An analysis of this latter problem is beyond the scope 
of this study. 
(iv) Lexical loss and replacement do not occur independently
for different meanings, neither are current and future trends
entirely independent of what has happened in the past, especially in
languages which have possessed an orthography for some time. This has
been noted especially in connection with the independence assumption
of Theorem 2, as in the interval immediately after a split we might
expect parallel (to some extent, at least) evolution of the two
daughter languages; Lees (1953), Hymes (1960), Teeter (1963). Also
in this connection, independence of evolution does not strictly hold
where borrowings, loan-translations and imitations of other types are
frequent occurrences.
Towards a new theory
A number of authors have attempted to deal with one or more
of these problems. Swadesh (1952) discarded more than half of the
meanings in his original list. For choosing among synonyms,
Gudschinsky (19~) proposed a random selection, Hymes (1960) suggested a
procedure which would select cognate forms whenever they were
available, Satterthwaite (1960) and Dyen (1960) pointed out that
it would be more reasonable to choose the word which is most
frequently used for the meaning in question.
Little could be done about the central postulate or result
of the theory, that $\lambda$ is a constant, until the work of Dyen
became well known. Dyen, on the basis of comparisons of a large number
of Malayopolynesian languages, was able to segregate meanings into
groups on the basis of their individual $\lambda$'s. A discussion of the
mathematical implications of this ($p = e^{-\lambda_i (t-s)}$ for
meaning $m_i$ leads to
$E(r/|M|) = |M|^{-1} \sum_i e^{-\lambda_i (t-s)}$) was published by
van der Merwe (1966).
Meanwhile, Dyen (1964) had statistically demonstrated that meanings
with high $\lambda$ in the Malayopolynesian languages tend to have high
$\lambda$ in the Indoeuropean languages and vice versa. This was the
first new type of lexicostatistic result since the work of Lees. Later
(1967) this work was refined so that Dyen et al. were able to
estimate a separate $\lambda$ for each meaning on a 196-word list of the
Swadesh type.
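The effect of meaning-specific rates on dating can be seen in a small calculation. The sketch below is an illustration under stated assumptions: the two rate groups (1e-4 and 3e-4 per year, 100 meanings each) and the function names are hypothetical, and the "single-rate" estimate simply plugs the mean rate into the exponential law.

```python
import math


def mixed_retention(rates, elapsed):
    """Expected retained proportion when meaning i has its own rate."""
    return sum(math.exp(-lam * elapsed) for lam in rates) / len(rates)


def single_rate_estimate(retention, lam):
    """Elapsed time implied by a single exponential with rate lam."""
    return -math.log(retention) / lam


# Hypothetical list: half the meanings are stable, half are volatile.
rates = [1e-4] * 100 + [3e-4] * 100
mean_rate = sum(rates) / len(rates)

true_elapsed = 2000
retained = mixed_retention(rates, true_elapsed)
estimate = single_rate_estimate(retained, mean_rate)
print("expected retention:", retained)
print("true elapsed time:", true_elapsed, "single-rate estimate:", estimate)
```

Because an average of exponentials is never smaller than the exponential of the average rate, the single-rate estimate in this setup falls short of the true elapsed time, and the shortfall grows with the spread of the rates.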
On the problem of independence, Swadesh pointed out that
interaction between languages because of contact would bias estimates
of $t-s$ downward. Hattori (1953) suggested and Hymes (1960) discussed
the formula
$$E(r/|M|) = e^{-1.4\lambda(t-s)}$$
as a way of taking into account parallel evolution and the effect of
those meanings with lower $\lambda$ than the rest of the list. The
latter effect is, however, properly described by using a sum of
exponentials and, for the former, it is unreasonable to expect a
constant multiplier (1.4) to express the dependence of two languages
over all time. It is clear that the multiplier of $-\lambda(t-s)$
should be near zero when t is close to s and should approach 2 as t
gets very large. This was noted by Gleason (1960) who rightly suggested
that for all sufficiently large t, estimates of $t-s$ could be corrected
by adding a small positive constant.
One further suggestion that has been made by many authors
and implemented by some, e.g. Hirsch (19~), Hattori (1957), is to
attempt to construct a larger set M to provide a better (i.e.
lower variance) estimate of time intervals.
The primary purpose of this paper will be to develop a
formal theory of word-meaning relationship, applicable to lexical
and semantic change, which incorporates most of the criticisms
levelled against the Swadesh theory.
Relationship to linguistic theories
This theory is unique in that it provides a link between
two previously unrelated linguistic theories, that of generative
grammar, and the conventional descriptive semantics. Elsewhere (1969)
we show how stochastic models, like our theory of word meaning
behaviour, and Labov's (1967, 1968) frequency approach to optional
grammatical rules, can be derived by imposing probabilistic
structure on formal grammars. On the other hand, the major phenomena
and problems of descriptive and historical semantics can be
elegantly formalized in terms of this same model.
PART TWO - WORD-MEANING PROCESSES
The problems of the Swadesh theory stem from its assump- 
tions about the nature of meaning, and its oversimplified mechanism 
of lexical replacement. I propose a model of word-meaning relation- 
ship in which lexical replacement is a consequence of a more basic 
stochastic phenomenon - fluctuations in probabilities of word usage. 
The only aspect of a "meaning" which is relevant to this model is 
its representability by one or more words. I make no assumption 
as to the psychological or cultural nature of meaning. In fact, Thm. 4 
below shows that the set of meanings as defined here can be 
considered a purely analytical construct. This set is completely 
determined by comparing word usage probabilities in certain con- 
texts. For a natural language there is the possibility of construct- 
ing the set of meanings by empirical means (from word usage frequency 
data). 
Whether the entities I refer to as meanings correspond well
to aspects of the intuitive (or the semanticists') concept of
meaning depends on whether they have important properties in common
and whether they behave similarly over time. It is my thesis that
these entities model the processes of historical semantics at least
as closely as, say, the "meanings" of Osgood et al. (1957) model
psychological aspects of meaning or the "meanings" of Katz and
Postal (1964) model the grammatical function of meaning.
The word-meaning relationship
The mapping type of relationship in the Swadesh theory
can be represented by a bipartite graph as in Fig. 1.
Fig. 1. Map relationship (many-to-one possible but not
one-to-many): each meaning $m_1, m_2, \ldots, m_{|M|}$ in M is joined
to a single word in L.
The first generalization to be made is to allow a many-to-one
(in both directions) relation, as in Fig. 2.
Fig. 2. Unrestricted word-meaning relationship.
The next important refinement of the model is the introduction
of probability distributions on words and meanings. The
frequency with which a word takes on a meaning in M has, as cited
in PART 1, been recognized as important to lexicostatistics.
Dyen's (1960) essay contains a clear description of how fluctuations
in these frequencies underlie the phenomena of lexical replacement.
In what follows, L can be understood as in PART 1, but M is
completely reinterpreted.
Definition
Let L and M be finite sets.
Let $p(\cdot,\cdot)$ be a bivariate probability distribution on $M \times L$.
Let $S_m = \{ l \in L \mid p(m,l) > 0 \}$.
If $S_m \ne \emptyset$ for all $m \in M$, and if for distinct
$m, n \in M$, $S_m \ne S_n$,
then M is a set of meanings on L, with respect to the distribution
p, and each non-zero p(m,l) represents a word-meaning relationship
between l and m.
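The two conditions of the definition (non-empty word sets, and distinct word sets for distinct meanings) can be checked mechanically. The following is a minimal sketch; the dictionary representation of the joint distribution, the function names and the probabilities are all assumptions made for illustration.

```python
def supports(p):
    """Map each meaning to the set of words it has positive probability on.

    p -- dict {(meaning, word): probability}
    """
    out = {}
    for (m, l), prob in p.items():
        if prob > 0:
            out.setdefault(m, set()).add(l)
    return {m: frozenset(words) for m, words in out.items()}


def is_set_of_meanings(meanings, p, tol=1e-9):
    """Check that p is a distribution and that the definition holds."""
    if any(prob < 0 for prob in p.values()):
        return False
    if abs(sum(p.values()) - 1.0) > tol:
        return False
    word_sets = supports(p)
    sets = [word_sets.get(m, frozenset()) for m in meanings]
    if any(len(s) == 0 for s in sets):   # every meaning needs some word
        return False
    return len(set(sets)) == len(sets)   # lexical distinguishability


# Hypothetical joint distribution over three meanings and four words.
p = {
    ("m_i", "calm"): 0.2, ("m_i", "happy"): 0.1,
    ("m_j", "happy"): 0.3, ("m_j", "overjoyed"): 0.1,
    ("m_k", "overjoyed"): 0.2, ("m_k", "exuberant"): 0.1,
}
print(is_set_of_meanings(["m_i", "m_j", "m_k"], p))
```

Representing each meaning by a frozen set of words makes the distinctness test a plain comparison of sets.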
p(m,l) should be understood as the probability that the
word l will be used, and that meaning m will be intended (when no
information is given about the context). The definition incorporates
two restrictions on abstract meanings, neither of which is overly
restrictive when considered as properties of meanings in the
intuitive sense. First, if a meaning is expressible by some word or
other in the lexicon, that word must have a non-zero probability
of expressing it (in some context which has a non-zero probability
of occurring). Second, if two meanings are to be distinct, on our
level of analysis, at least one of them must be expressible by at
least one word which the other is not. Fig. 3 illustrates these
conditions. The latter principle, lexical distinguishability of
meanings, might seem to place too much emphasis on marginal or
threshold word-meaning relationships (those with very low
$p(\cdot,\cdot)$).
Fig. 3. Part of a word-meaning system: meanings (among them $m_j$
and $m_k$) joined to the words calm, happy, overjoyed and exuberant.
A line joins m and l iff p(m,l) > 0.
Such objections will be seen to have little importance, however,
after Theorem 9 below, where M is embedded in a metric space.
Here all meanings which do not differ greatly in their usage
probabilities will cluster together in the metric space, and any
comparisons between meanings will be in terms of the metric. Assuming
lexical distinguishability facilitates the particular line of
development followed here, but relaxing it (e.g. in favour of a more
quantitative distinction between meanings, or in favour of a
definition of meaning grouping closely related lexically distinguished
entities) is not likely to radically affect the behaviour of
meanings in the metric space. An important consequence of the
definition of a set of meanings is
Theorem 4
Let $\mathcal{P}(L)$ be the set of subsets of L, and let M be a set
of meanings on L with respect to p. If
$S_m = \{ l \in L \mid p(m,l) > 0 \}$,
then
$$m \to S_m$$
is a one-one map from M onto a subset of $\mathcal{P}(L)$.
Proof
It need only be shown that if
$$m \to S_m, \qquad n \to S_n$$
then
$$S_n = S_m \Rightarrow m = n$$
or equivalently,
$$m \ne n \Rightarrow S_m \ne S_n,$$
but this is just the condition of lexical distinguishability in the
definition.
Theorem 4 tells us that, for analytical or computational
purposes, we can treat meanings as sets of words. Two meanings
are distinguished by the words they do not share and are related by
those they have in common. Note that the case p(m,l) = 0 can arise
in two ways. Either p(m,l) = 0 for all l, in which case
$S_m = \emptyset$ and m is not a meaning, or m is a meaning but
$l \notin S_m$. From now on, no distinction will be drawn between the
meaning m and the set $S_m$, and the latter notation will be discarded.
Sometimes, an entity whose status as a meaning or not is under study
will be labelled m. If m is not a meaning, p(m,l) = 0 for all l;
$m \notin M$; $S_m = \emptyset$, etc., and every attempt will be made
to keep this usage unambiguous.
Interpretation of the marginal distributions
With the usage probability interpretation of $p(\cdot,\cdot)$,
$$g(l) = \sum_{m} p(m,l)$$
is the overall probability that l is used. The probability
function g(l) underlies word-frequency distributions, e.g. those
of Zipf (1945), Josselson (1953), and Juilland (196~a, 1965b).
$$f(m) = \sum_{l} p(m,l)$$
is the overall probability that m is used. This is related (at
least conceptually) to the "semantic frequency lists" of Eaton (1940).
Since these are probability distribution functions,
$$\sum_{m} f(m) = \sum_{l} g(l) = \sum_{m,l} p(m,l) = 1$$
and, of course,
$$p(m,l) \ge 0.$$
Recapitulating, a word-meaning relationship exists between
m and l, or a line is drawn between m and l on a word-meaning graph
like Fig. 2 or 3, iff l can take on meaning m, which occurs iff
p(m,l) > 0. (I.e., we require that if a word can take on a meaning,
there is a non-zero probability that it will do so.) The statement
f(m) = 0 is equivalent to saying that m is not lexically representable
by elements of L, and $m \notin M$.
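The two marginal distributions are obtained by summing the joint probabilities over one index. A minimal sketch follows; the joint distribution is hypothetical and the names `f` and `g` follow the text, while the dictionary representation is an assumption of the example.

```python
from collections import defaultdict


def marginals(p):
    """Return (f, g): meaning and word marginals of the joint p.

    p -- dict {(meaning, word): probability}
    f[m] is the overall probability that meaning m is intended,
    g[l] is the overall probability that word l is used.
    """
    f = defaultdict(float)
    g = defaultdict(float)
    for (m, l), prob in p.items():
        f[m] += prob
        g[l] += prob
    return dict(f), dict(g)


# Hypothetical joint distribution.
joint = {
    ("m_i", "calm"): 0.2, ("m_i", "happy"): 0.1,
    ("m_j", "happy"): 0.3, ("m_j", "overjoyed"): 0.1,
    ("m_k", "overjoyed"): 0.2, ("m_k", "exuberant"): 0.1,
}
f, g = marginals(joint)
print("meaning marginal:", f)
print("word marginal:", g)
```

A meaning that never appears with positive probability simply has no entry in `f`, which matches the statement that such an entity is not lexically representable.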
Precision of speech 
In constructing a model involving the grouping of words
and the distinction between meanings, provision should be made
for some degree of variation to correspond to the variation which
occurs in reality, from person to person and, more especially, from
situation to situation. This variation is a complex effect, but
a good deal of it may be interpreted as alternation between precise
and loose speech. In certain situations, and for certain topics,
effective communication requires unambiguous usages, specific rather
than generic terms, and other manifestations of precision which
are, on the other hand, inefficient, uneconomical or just too
difficult to sustain in everyday speech. This alternation may occur
independently in different parts of the lexicon in a natural
language, but for our model we will use a single precision parameter
$\alpha$. Each value of $\alpha$ will specify a set $M_\alpha$ of
meanings on L. In the next few sections, the probability distributions
and other entities dependent on $\alpha$ will be so subscripted (e.g.
$p_\alpha(m,l)$, $M_\alpha$).
In what manner should the system depend on $\alpha$? In natural
languages, as a speaker becomes more precise he draws more
distinctions between words and he groups two words of similar meaning
less frequently (i.e. with smaller probabilities). One measurement
which is sensitive to this process in the model is the average size
of the meanings
$$E_\alpha(|m|) = \sum_{m \in M_\alpha} |m|\, f_\alpha(m), \qquad f_\alpha(m) = \sum_{l} p_\alpha(m,l),$$
where $|m| = |S_m|$, the number of words connected to, representing,
or simply in, a meaning. This measurement would be too crude, by
itself, to serve as a precision parameter, since it does not
distinguish between overall precision in the system and extreme
precision in one part of the system but little precision in the
rest. Instead, a condition should be placed on the system so that
if $\alpha$ increases, then in any part of the system, this increase
would coincide with an increase in the probability weight on small
meanings (i.e. $|m|$ is small) and a decrease in $\alpha$ would coincide
with an increase on large meanings. Such a restriction may be
formalized as follows.
Let $\alpha \in [0,1]$. Let $D \subset \mathcal{P}(L)$ be any set of
subsets of L, meanings or not, such that
$$m \in D,\ n \subset m \Rightarrow n \in D.$$
Then it is required that
$$\sum_{m \in D} f_\alpha(m) = \sum_{m \in D} \sum_{l \in m} p_\alpha(m,l)$$
is monotonic and non-decreasing with $\alpha$. Another way of looking at
this is in terms of the lattice of subsets of L. If we choose any
points in the lattice or even draw a line right across it, the
probability assigned to all sets below these points, or below the line,
must increase (or at least not decrease) as $\alpha$, the precision,
increases. A simple example will illustrate this. Let
$L = \{l_1, l_2, l_3\}$.
Fig. 4 depicts the lattice of subsets of L.
Fig. 4. Possible meanings when $L = \{l_1, l_2, l_3\}$: the lattice of
subsets of L, with $\{l_1, l_2, l_3\}$ at the top.
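For a three-word lexicon the precision condition can be verified by brute force, since the families of subsets that are closed under taking subsets can all be enumerated. The sketch below is illustrative: the function names and the two weight assignments (one loose, one precise) are hypothetical, and meanings are represented as frozen sets of words.

```python
from itertools import combinations


def nonempty_subsets(words):
    """All non-empty subsets of the lexicon, as frozensets."""
    words = list(words)
    return [frozenset(c)
            for size in range(1, len(words) + 1)
            for c in combinations(words, size)]


def down_sets(words):
    """All families D with: m in D and n a proper subset of m => n in D."""
    subsets = nonempty_subsets(words)
    families = []
    for mask in range(1 << len(subsets)):
        family = {s for i, s in enumerate(subsets) if (mask >> i) & 1}
        if all(n in family for m in family for n in subsets if n < m):
            families.append(family)
    return families


def satisfies_precision(f_low, f_high, words, tol=1e-12):
    """True if every down-set carries at least as much weight under the
    higher-precision weights f_high as under f_low."""
    for family in down_sets(words):
        low = sum(f_low.get(m, 0.0) for m in family)
        high = sum(f_high.get(m, 0.0) for m in family)
        if high + tol < low:
            return False
    return True


words = ["l1", "l2", "l3"]
loose = {frozenset(words): 1.0}                    # one catch-all meaning
precise = {frozenset([w]): 1 / 3 for w in words}   # one meaning per word
print(satisfies_precision(loose, precise, words))
print(satisfies_precision(precise, loose, words))
```

The enumeration is exponential in the number of subsets, so this approach is only practical for very small lexicons such as the three-word example.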
For three values of $\alpha$, values of $p_\alpha(m,l)$ might be as in
Table 1, and it is easy to verify that the precision condition
holds for every admissible D, that is, for every family of subsets
of L which contains, along with each of its members, all subsets of
that member.
Table 1. A word-meaning system at 3 levels of precision:
$\alpha = 1$ (high precision), $\alpha = 0.5$ (medium precision) and
$\alpha = 0$ (low precision).
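The example suggests the next theorem, which confirms
that the precision requirement is strong enough to imply
monotonicity of the average meaning size.
Theorem 5
$E_\alpha(|m|)$ is a decreasing function of $\alpha$.
Proof
Let $\alpha < \beta$, and let
$$a(i) = \sum_{|m| = i} f_\alpha(m), \qquad b(i) = \sum_{|m| = i} f_\beta(m), \qquad i = 1, 2, \ldots, |L|.$$
Then $a(\cdot)$ and $b(\cdot)$ are probability distributions on the
integers, where a(i) is the probability that an unspecified meaning
will contain i words. Consider
$$D_j = \{ m \subset L : |m| \le j \}.$$
Clearly $m \in D_j$, $n \subset m \Rightarrow n \in D_j$. Then the
precision condition requires
$$\sum_{|m| \le j} f_\alpha(m) \le \sum_{|m| \le j} f_\beta(m).$$
Therefore
$$\sum_{i=1}^{j} a(i) \le \sum_{i=1}^{j} b(i).$$
Since $a(\cdot)$ and $b(\cdot)$ are probability distributions,
$$\sum_{i=1}^{|L|} a(i) = \sum_{i=1}^{|L|} b(i) = 1,$$
so that
$$\sum_{i=j}^{|L|} a(i) = 1 - \sum_{i=1}^{j-1} a(i) \ge 1 - \sum_{i=1}^{j-1} b(i) = \sum_{i=j}^{|L|} b(i).$$
Then, summing over j,
$$\sum_{j=1}^{|L|} j\,a(j) \ge \sum_{j=1}^{|L|} j\,b(j),$$
that is, $E_\alpha(|m|) \ge E_\beta(|m|)$,
since $a(\cdot)$ and $b(\cdot)$ are the probability distributions of the
values of $|m|$.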
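The step from cumulative sums to the mean rests on the identity that the mean of a positive integer variable equals the sum of its tail probabilities. A small sketch, using two hypothetical weight assignments on subsets of a three-word lexicon (the names and numbers are assumptions), checks the identity and the direction of the inequality.

```python
def mean_size(f):
    """Average number of words per meaning under the weights f."""
    return sum(len(m) * prob for m, prob in f.items())


def size_distribution(f, n_words):
    """a[i] = total weight on meanings containing exactly i words."""
    a = [0.0] * (n_words + 1)
    for m, prob in f.items():
        a[len(m)] += prob
    return a


def mean_size_from_tails(f, n_words):
    """Same mean, computed as the sum over j of P(size >= j)."""
    a = size_distribution(f, n_words)
    return sum(sum(a[j:]) for j in range(1, n_words + 1))


# Hypothetical weights at a low and at a higher precision level.
low = {frozenset({"l1", "l2", "l3"}): 0.5, frozenset({"l1", "l2"}): 0.5}
high = {frozenset({"l1"}): 0.5, frozenset({"l2", "l3"}): 0.25,
        frozenset({"l2"}): 0.25}

print(mean_size(low), mean_size_from_tails(low, 3))
print(mean_size(high), mean_size_from_tails(high, 3))
```

In this example the higher-precision weights put more mass on small sets, and the average meaning size drops accordingly.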
Regularity conditions 
We have imposed a condition on the $p_\alpha(m,l)$ so that the
probability weight must flow down the lattice of subsets of L as
$\alpha$ increases. It would be desirable, from the viewpoints of model
realism and analytical convenience, to have this "flow" behave in
as continuous a manner as possible. It would be most convenient if
the $p_\alpha(m,l)$ were required to be continuous functions of
$\alpha$, but there are good reasons to relax this somewhat.
Again trying to model natural languages, it would be
realistic to require that the following process may occur in the
system. Suppose a meaning $m'$ is connected to
$k, l_1, l_2, \ldots, l_r \in L$. (In our earlier notation,
$S_{m'} = \{k, l_1, l_2, \ldots, l_r\}$; in our current
notation $m' = \{k, l_1, l_2, \ldots, l_r\}$, $p_0(m',k) > 0$,
$p_0(m',l_i) > 0$ for all $l_i$.)
As $\alpha$ increases, the values of all the $p_\alpha(m',l_i)$
fluctuate but remain greater than some positive value, except for
$p_\alpha(m',k)$ which gradually drops to zero at $\alpha_0$. In terms
of speech behaviour, the words $k, l_1, l_2, \ldots, l_r$ are used
interchangeably (in certain contexts) to mean $m'$, when precision is
low. As precision increases, $l_1, l_2, \ldots, l_r$ continue to be
interchangeable but k is seldom usable in this sense and, at
$\alpha_0$, never. It is most important in what ensues to understand
that the set $m' = \{k, l_1, l_2, \ldots, l_r\}$ ceases to
be a meaning when the precision is $\alpha_0$,
i.e. $m' \in M_\alpha$ for $\alpha < \alpha_0$, but
$m' \notin M_{\alpha_0}$.
It is, however, most natural that
$m = m' - \{k\} = \{l_1, l_2, \ldots, l_r\}$
be a meaning at $\alpha_0$, since the interchangeability of these words
is not necessarily dependent on the behaviour of k. Hence, if any
psychological interpretation is to be attached to the set of abstract
meanings in our model, it must be realized that as precision changes,
the abstract label attached to a psychological or cognitive entity
may suddenly change as lexical representability of that entity
changes. If this seems strange behaviour for a symbolic system,
it should seem less so later, when the $M_\alpha$ are embedded in a
metric space and the relative position of meanings in this space becomes
more important than the letters that identify them.
Returning to quantitative considerations, since $m'$ ceases
to be a meaning at $\alpha_0$ and m suddenly takes over its role, it is
necessary that $p_\alpha(m',l_1), \ldots, p_\alpha(m',l_r)$ drop
discontinuously to zero at $\alpha_0$ and
$p_\alpha(m,l_1), \ldots, p_\alpha(m,l_r)$ jump to compensate.
We must, therefore, accept certain discontinuities of
this sort in the model. For simplicity's sake, we restrict
occurrences such as this so that only one $p_\alpha(m,l)$ may drop
continuously to zero at any particular value of $\alpha_0$ ($p_\alpha(m',k)$ in
the example above). This is in fact a weak restriction, in that
we can approximate situations where N of the $p_\alpha(m,l)$ go to zero
at $\alpha_0$ by having them do this one at a time, at $\alpha_0$, $\alpha_0 + \epsilon$,
$\alpha_0 + 2\epsilon$, . . . , $\alpha_0 + (N-1)\epsilon$ for arbitrarily small $\epsilon$.
An appropriate continuity-discontinuity condition may be
most economically phrased as in condition (iii) in the next section.
Summary of development thus far 
We assume that there exists a finite set L (the set of
words) and for each $\alpha \in [0,1]$ a finite set $M_\alpha$ (a set of meanings
on L) and a bivariate probability distribution $p_\alpha$ on $M_\alpha \times L$ such that
$$\sum_{m \in M_\alpha} \sum_{l \in L} p_\alpha(m,l) = 1.$$
The elements of $M_\alpha$ are in one-one correspondence with certain
non-empty subsets of L:
$$m \leftrightarrow S_m \iff p_\alpha(m,l) > 0,\ \forall\, l \in S_m.$$
This correspondence enables us to unambiguously identify $S_m$ with
m, and we may rewrite the above condition as
(i) $p_\alpha(m,l) > 0 \iff l \in m$ and $m \in M_\alpha$.
As $\alpha$ varies between zero and 1, the following conditions must hold:
(ii) If $D \subseteq \mathcal{P}(L)$ such that $m \in D$, $n \subseteq m \Rightarrow n \in D$, then
$$\sum_{m \in D} \sum_{l \in m} p_\alpha(m,l)$$
is monotone non-decreasing with $\alpha$.
(iii) The $p_\alpha(m,l)$ are continuous functions of $\alpha$ only where
$M_\alpha$ is fixed. $M_\alpha$ changes at $\alpha_0$ only as a result of discontinuities
occurring, for unique m, and unique $k \notin m$, to all of
$p_\alpha(m,l)$ and $p_\alpha(m + \{k\},l)$, $l \in m$, but
$p_\alpha(m,l) + p_\alpha(m + \{k\},l)$ is continuous, for all $l \in L$.
Before enunciating the continuity and discontinuity
condition (iii), we described the desired behaviour of some of the
functions $p_\alpha(\cdot,\cdot)$ at a point where the condition is relevant. We can
prove that this condition implies this behaviour.
Theorem
In the system as described above, if $\alpha_0$ is a point where
$M_\alpha$ changes, then $p_\alpha(m + \{k\},k)$ (as in condition (iii) above) is
continuous at $\alpha_0$, and if it goes to zero at $\alpha_0$ it is the only
such function.
Proof
By condition (iii), $p_\alpha(m,l) + p_\alpha(m + \{k\},l)$ is continuous for all $l \in L$.
Therefore
$p_\alpha(m,k) + p_\alpha(m + \{k\},k)$ is continuous at $\alpha_0$.
But
$p_\alpha(m,k) = 0$ since $k \notin m$;
hence the continuity of $p_\alpha(m + \{k\},k)$.
Now if any other $p_\alpha(n,l')$ goes to zero at $\alpha_0$, n ceases to be a
meaning and $M_\alpha$ changes as a result. This contradicts condition
(iii) unless n = m or $m + \{k\}$, in which case discontinuities are
prescribed by the same condition.
Existence and local behaviour 
The next theorem gives assurance that the conditions 
on the components of a word-meaning system, as developed so far, 
are not contradictory. The proof consists of a construction of a 
particular system (which is otherwise uninteresting) and is presented 
as Appendix 3 in Sankoff (1969). 
Theorem 7
Word-meaning systems exist.
Specifically, it is possible to construct a word-meaning
system using any finite set L.
The regularity conditions are strong enough, however, so
that aside from continuous variation in the $p_\alpha(\cdot,\cdot)$, only certain
types of change in $M_\alpha$ are possible.
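Theorem 8
Suppose $M_\alpha$ changes at $\alpha_0$. Let $M^-$, $M^+$ be the state of
$M_\alpha$ in small enough intervals to the left and right of $\alpha_0$,
respectively. Then one of A, B or C must hold.
A. For a unique m, and unique $k \notin m$, as in condition (iii),
$m \in M^-$, $m + \{k\} \in M^-$; $m \in M^+$, $m + \{k\} \notin M^+$,
represented by
$(\in,\in;\ \in,\notin)$, $p_{\alpha_0}(m + \{k\},k) = 0$,
B. $(\notin,\in;\ \in,\in)$, $p_{\alpha_0}(m + \{k\},k) > 0$,
C. $(\notin,\in;\ \in,\notin)$, $p_{\alpha_0}(m + \{k\},k) = 0$.
Proof
There are 16 ways of filling four places with $\in$ or $\notin$.
$(\in,\in;\ \in,\in)$, $(\notin,\notin;\ \notin,\notin)$, $(\in,\notin;\ \in,\notin)$ and $(\notin,\in;\ \notin,\in)$
involve no change in $M_\alpha$.
$(\in,\notin;\ \notin,\notin)$, $(\notin,\notin;\ \in,\notin)$, $(\notin,\in;\ \notin,\notin)$ and $(\notin,\notin;\ \notin,\in)$
imply either $p_\alpha(m,l) \equiv 0$ or $p_\alpha(m + \{k\},l) \equiv 0$ near $\alpha_0$, and hence
have no discontinuity.
In $(\in,\in;\ \notin,\notin)$ and $(\notin,\notin;\ \in,\in)$, $p_\alpha(m,l)$ and $p_\alpha(m + \{k\},l)$ "jump"
in the same direction, hence their sum could not be continuous.
$(\in,\notin;\ \in,\in)$, $(\in,\in;\ \notin,\in)$ and $(\in,\notin;\ \notin,\in)$ violate condition (ii).
There remain only the three possibilities,
A. $m + \{k\}$ disappears, $\lim_{\alpha \to \alpha_0} p_\alpha(m + \{k\},k) = p_{\alpha_0}(m + \{k\},k) = 0$.
B. m appears, $m + \{k\}$ in $M^-$ and $M^+$, $p_{\alpha_0}(m + \{k\},k) > 0$.
C. m appears, $m + \{k\}$ disappears, $p_{\alpha_0}(m + \{k\},k) = 0$.
These three situations are illustrated in Fig. 5A, 5B and 5C.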
Fig. 5A. Possibility A, Thm. 8. (N.B. right continuity instead of
left continuity would be equally possible here.)
Fig. 5B. Possibility B, Thm. 8.
Fig. 5C. Possibility C, Thm. 8.
Meanings as points in a metric space
The idea of distances between meanings is not new, and
there have been a number of attempts to operationalize this concept.
We shall examine a very natural way of defining such a distance for
the meanings in a word-meaning system in terms of the functions
$p_\alpha(m,l)$.
Definition
Let $m \in M_\alpha$, $n \in M_\beta$.
Theorem 9
$d_{\alpha,\alpha}$ defines a metric on $M_\alpha$.
Proof
The norm $\sum |\cdot|$ defines a metric on probability distributions, and
$z_\alpha(\cdot)$ defines a probability distribution on L.
It remains to prove that two such $m \in M_\alpha$ do not define the
same distribution. But this follows from the fact that each $m \in M_\alpha$
defines a unique subset of L such that $p_\alpha(m,l) > 0$.
Remark
If, as $\beta$ increases beyond $\alpha$, $p_\beta(m,l)$ changes, $d_{\alpha,\beta}(m,m)$
will have a minimum value at $\beta = \alpha$ and will increase for $\beta$ on
either side of $\alpha$. In a neighbourhood of $\alpha$, $d_{\alpha,\beta}(m,m)$ for fixed
m, then, measures distance from $\alpha$. This relevance of d to the
parameter as well as to the meanings will become important in later
sections.
Theorem 10
If $M_\alpha = M_I$, $M_\beta = M_J$ for
$\alpha \in I$, $\beta \in J$, two intervals,
and if
$m \in M_I$, $n \in M_J$, then
$d_{\alpha,\beta}(m,n)$ is continuous on $I \times J$.
Proof
This follows from the continuity of the $p_\alpha$ on such intervals
and from the fact that d is a continuous function of such $p_\alpha$.
As $\alpha$ changes, the points in $M_\alpha$ move continuously.
When $M_\alpha$ changes, two (at most) points experience a sudden shift in
position with respect to the rest of the points. This may involve
the creation or annihilation of these points. When $\alpha$ is close to
1, there will be few words in common between two meanings, on the
average, and hence the distance between them will be close to 1.
When $\alpha$ is close to zero, on the other hand, the reverse is true,
and distances will tend toward zero. This rather succinct comparison
of precise versus loose usage accords well with more intuitive notions
of precision of speech. Fig. 6 and Table 2 present, as illustrations,
the distances in the metric spaces defined by the 3-word system
described earlier in this chapter.
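The behaviour just described (distance 1 for meanings with no words in common, distance 0 for identical usage) can be illustrated with a short Python sketch. It rests on an assumption that should be read as such: the distance is taken to be half the sum of absolute differences between the conditional word distributions of the two meanings, a form chosen here because it is built on the norm $\sum |\cdot|$ and takes values between 0 and 1. The function names and the small system `p` are hypothetical.

```python
def conditional(p, m):
    """Normalise the weights p[m][l] of meaning m into a distribution over words."""
    total = sum(p[m].values())
    return {l: w / total for l, w in p[m].items()}


def distance(p_a, m, p_b, n):
    """Assumed form: half the L1 distance between conditional word distributions."""
    za, zb = conditional(p_a, m), conditional(p_b, n)
    words = set(za) | set(zb)
    return 0.5 * sum(abs(za.get(l, 0.0) - zb.get(l, 0.0)) for l in words)


# Hypothetical system: meanings are frozensets of words, weights sum to 1.
p = {
    frozenset({"k", "l1"}): {"k": 0.2, "l1": 0.2},
    frozenset({"l1", "l2"}): {"l1": 0.3, "l2": 0.1},
    frozenset({"l3"}): {"l3": 0.2},
}

for m in p:
    for n in p:
        print(sorted(m), sorted(n), round(distance(p, m, p, n), 3))
```

Passing two different weight tables (`p_a`, `p_b`) allows the same function to compare meanings at two values of the parameter.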
Table 2. $d_{\alpha,\alpha}(\cdot,\cdot)$ for system of Table 1 ($\alpha = 1$, $\alpha = 0.5$, $\alpha = 0$).
Fig. 6. 2-dimensional visualization of distances in Table 2.
Diachronic word-meaning systems
We have developed, in some detail, a synchronic (i.e. at a
fixed point in time) theory of words and meanings. It remains to show
what relevance this has to historical linguistics and lexicostatistics.
As Ullman (1957) remarks:
"The two [semantic relationship, simple or multiple,
and semantic change] are interdependent, one being
the projection of the other on a different plane.
The functional analysis of meaning will entail there-
fore a definition of semantic change along similar
lines. If a meaning is conceived as a reciprocal
relation obtaining between name and sense [word and
meaning], then a semantic change will occur whenever
a new name becomes attached to a sense and/or a new
sense to a name." (p.171)
and, as he points out, word-meaning phenomena at a fixed time have
parallels in processes of change over time.
In our particular model, changes in the system as the 
precision parameter changes will provide the prototype for change 
with time. 
Definition 
A word-meaning system history is a word-meaning system with
$\alpha \in [0,1]$ replaced by $t \in [0,T]$ (time parameter) and with condition (ii)
relaxed entirely. Condition (iii) is changed so that if k and m are
given as before, and $m + \{k\}$ is a new meaning or if m disappears
starting at $t_0$, there are discontinuities in $p_t(m + \{k\},k)$ and
$p_t(m + \{k\},l) + p_t(m,l)$ for one $l \in m$, but
$$p_t(m + \{k\},k) + p_t(m + \{k\},l) + p_t(m,l)$$
is continuous.
Although an adjustment to the construction necessary for Thm. 8
could adapt the existence proof of word-meaning systems to that
of word-meaning system histories, it will be simpler to leave
existence to be implicit in the constructions carried out later.
Theorem 11
Suppose $M_t$ changes at $t_0$. Let $M^-$, $M^+$ be as in Thm. 8.
Then one of A, B, C, A', B', C' holds (each 4-tuple records whether
$m \in M^-$, $m + \{k\} \in M^-$; $m \in M^+$, $m + \{k\} \in M^+$).
A. $(\in,\in;\ \in,\notin)$, $p_{t_0}(m + \{k\},k) = 0$,
A'. $(\in,\notin;\ \in,\in)$, $p_{t_0^+}(m + \{k\},k) > 0$, $p_{t_0^-}(m + \{k\},k) = 0$,
B. $(\notin,\in;\ \in,\in)$, $p_{t_0}(m + \{k\},k) > 0$,
B'. $(\in,\in;\ \notin,\in)$, $p_{t_0}(m + \{k\},k) > 0$,
C. $(\notin,\in;\ \in,\notin)$, $p_{t_0}(m + \{k\},k) = 0$,
C'. $(\in,\notin;\ \notin,\in)$, $p_{t_0}(m + \{k\},k) > 0$.
Proof
A, B and C were the three possibilities admitted in Thm. 8.
The only new restriction applies when $m + \{k\}$ appears or m disappears
at $t_0$, and therefore applies to none of the three. A',
B' and C' were discarded in Thm. 8 because they violated condition
(ii). The new condition (iii) applies to all of these cases. In
A' and C', $m + \{k\}$ appears so $p_t(m + \{k\},k)$ must jump from zero at $t_0$.
The cases A, B and C are still represented by Fig. 5A, 5B
and 5C, with $\alpha$ replaced by t. Cases A', B' and C' would be represented
by mirror images of these three figures, except that
$p_t(m + \{k\},k)$ must exhibit a discontinuity at $t_0$, and one of the
$p_t(m + \{k\},l)$ must compensate for this.
The asymmetry with respect to time of the conditions for
changes in $M_t$ may be interpreted as follows. The probability
that a word may be used for a meaning may drop to zero continuously,
but it may not increase from zero continuously. Instead, it must
at some time jump to some finite value. This distinction is not too
important to the overall characteristics of word-meaning system
histories, but we note it because the particular type of histories
we shall study have this property.
The development of the metric d in the previous section
carries over completely when the time parameter replaces the precision
parameter, except, of course, that there is no longer any
necessary trend in the average distance between meanings as t increases.
Anticipating some of our later discussion, consider the
case where all meanings consist of exactly one word, as in the
Swadesh model. In this case, letting s and t be time as in Thm. 1,
$$d_{s,t}(m,n) = 1 - \delta(k,l)$$
where $m = \{k\} \in M_s$, $n = \{l\} \in M_t$. d, then, is in a certain sense a
generalization of the cognation indicator $\delta$.
Word-meaning processes
So far, changes in $M_\alpha$ or $M_t$ have been deterministic as the
value of the parameter changes. (Even though the $p_\alpha$ or $p_t$ are
probability functions, we have not studied further properties of the
random variables which are distributed according to these functions,
and we will not do so. In linguistic terms, we are still dealing
with langue and not parole.) To generalize the Swadesh theory, and
to provide a realistic model, we must take into account unpredictability
of lexical and semantic change. In probability theoretical
terms, we must impose a probability measure on the set of all
possible histories. We shall not do this explicitly. Rather we
shall assume it is possible, and assume that the examples we construct
by specifying local behaviour are well-behaved in terms of an
underlying probability measure space.
Definition
A word-meaning process is a set of word-meaning system
histories indexed by $\omega \in \Omega$ where $(\Omega, \mathcal{F}, P)$ is a probability
measure space. This means that any event or combination of events
in which we may be interested is represented by a set, A, of
histories ($\omega \in \Omega$) where A is a member of the $\sigma$-algebra $\mathcal{F}$,
and where P(A) is well-defined for all $A \in \mathcal{F}$.
A word-meaning process based on Brownian motion
To construct the word-meaning process which is the best
model for natural languages would require the operationalizing of
definitions, collection of much data and its statistical analysis.
At present, we shall attempt only an heuristic investigation.
In PART 1, we emphasized the basic unpredictability
of change in the word-meaning relationship. In terms of our model,
(and considering only small intervals of time) this means that for
t > s,
$$E[p_t(m,l) - p_s(m,l)] = 0.$$
Furthermore, it should not be possible to predict the future
behaviour of individual $p_t(m,l)$ from trends established in the
past: for any $t > s_1 > s_2 > \ldots > s_r$,
$$P[p_t(m,l) \mid p_{s_1}(m,l), p_{s_2}(m,l), \ldots, p_{s_r}(m,l)] = P[p_t(m,l) \mid p_{s_1}(m,l)]$$
(the Markov condition).
But these two conditions and the continuity conditions on
$p_t$ indicate that the local behaviour of $p_t(m,l)$ should resemble a
diffusion process, with zero drift. The simplest such process is
the well-known Brownian motion, whose behaviour characteristics
change neither with time, t, nor with position, x.
We proceed to construct a word-meaning process satisfying
these properties. Let $(L, M_\alpha, p_\alpha(\cdot,\cdot))$ be a word-meaning system
for a fixed $\alpha$. For t = 0, let $p_t(m,l) = p_\alpha(m,l)$, $M_t = M_\alpha$.
Let
$$n_0 = \sum_{m \in M_0} |m| ;$$
$n_0$ is the number of word-meaning relationships in the system.
Let $x_1(t), x_2(t), \ldots, x_{n_0}(t)$, $t \geq 0$, be $n_0$ sample paths of a
Brownian motion process, chosen independently, and
$$\bar{x}(t) = \frac{1}{n_0} \sum_{i=1}^{n_0} x_i(t).$$
Let $y_i(t) = x_i(t) - \bar{x}(t)$. The $y_i$ are also Brownian sample paths,
but are no longer completely independent in that
$$\sum_{i=1}^{n_0} y_i(t) = \sum_{i=1}^{n_0} x_i(t) - \sum_{i=1}^{n_0} \bar{x}(t) = n_0 \bar{x}(t) - n_0 \bar{x}(t) = 0.$$
Let
$$p_t(m,l) = p_0(m,l) + y_i(t),$$
where i = i(m,l) is determined beforehand. Then $p_t$ is continuous
in [0,T] with probability 1. We must ensure that $p_t(\cdot,\cdot)$ is a
probability distribution.
$$\sum_{m \in M_t} \sum_{l \in m} p_t(m,l) = \sum_{m \in M_0} \sum_{l \in m} p_0(m,l) + \sum_{i=1}^{n_0} y_i(t)$$
$$= 1 + 0$$
$$= 1$$
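The centring step can be sketched in Python as follows. This is a minimal discrete-time illustration, not the construction itself: continuous paths are replaced by independent normal increments over a step `dt`, and the names `centred_increments` and `step`, the seed and the step size are assumptions made for the example.

```python
import random


def centred_increments(n, dt, rng):
    """n independent N(0, dt) increments with their mean removed, so they sum to zero."""
    x = [rng.gauss(0.0, dt ** 0.5) for _ in range(n)]
    mean = sum(x) / n
    return [xi - mean for xi in x]


def step(p, dt, rng):
    """Advance every weight p[i] by one centred increment; the total is preserved."""
    y = centred_increments(len(p), dt, rng)
    return [pi + yi for pi, yi in zip(p, y)]


rng = random.Random(0)          # fixed seed so the run is repeatable
p0 = [0.01] * 100               # 100 relationships of equal weight
p1 = step(p0, 1e-6, rng)
print(sum(p0), sum(p1))
```

Removing the mean lowers the variance of each increment from `dt` to `dt * (1 - 1/n)`, which is the price paid for making the increments sum to zero. Nothing in this step prevents a weight from becoming negative; that is handled separately.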
It is not necessarily true, however, that $p_t(m,l) \geq 0$,
since $y_i(t)$ may be negative. To adjust for this, let
$$\tau = \min \{ t : p_t(m,l) = 0 \text{ for some } m \in M_0,\ l \in m \}.$$
In other words, all the $p_t(m,l)$ are positive before $\tau$. Then
with probability 1, there is a unique $m' \in M_0$, $k \in m'$, such that
$$\lim_{t \to \tau} p_t(m',k) = p_\tau(m',k) = 0.$$
But this is reminiscent of case A or C in Thm. 11 (see Fig. 5),
where one word in a meaning loses its ability to be grouped with the
others. Then all the $p_\tau(m',l)$ should drop to zero and all the
$p_\tau(m' - \{k\},l)$ jump to compensate. According to whether $m' - \{k\} \in M_0$
or not, we have case A or case C respectively. Then it is a simple
matter to determine $M_\tau$. Now, change the definition of all the
$p_t(m,l)$ for $t > \tau$, by calculating $n_\tau$
and starting over as for t = 0.
Continuing this way until t = T, we ensure that no $p_t(m,l)$
ever drops below zero.
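The restart rule can be sketched in Python as a single update applied when a weight reaches zero. The data layout (a dictionary from frozenset meanings to word weights) and the function name `absorb` are assumptions of this sketch, and the handling of a one-word meaning that loses its only word is a choice made here for completeness rather than something taken from the cases above.

```python
def absorb(system, m_prime, k):
    """Remove word k from meaning m_prime once p(m_prime, k) has reached zero.

    system maps frozenset meanings to {word: weight}. The remaining weights of
    m_prime are transferred to m = m_prime - {k}: case "A" if m is already a
    meaning, case "C" if m is created by the transfer.
    """
    m = m_prime - {k}
    if not m:
        # One-word meaning losing its only word: not covered by cases A or C.
        # Assumption of this sketch: the meaning is simply dropped.
        system.pop(m_prime)
        return None
    case = "A" if m in system else "C"
    old = system.pop(m_prime)
    target = system.setdefault(m, {})
    for l, w in old.items():
        if l != k:
            target[l] = target.get(l, 0.0) + w
    return case


fs = frozenset
system = {
    fs({"k", "l1", "l2"}): {"k": 0.0, "l1": 0.3, "l2": 0.2},
    fs({"l1", "l2"}): {"l1": 0.1, "l2": 0.4},
}
print(absorb(system, fs({"k", "l1", "l2"}), "k"), system)
```

Because the weight of the vanishing relationship is already zero, the transfer leaves the total probability unchanged.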
We now have a word-meaning process, but not a very healthy
one, in that $|M_t|$ decreases monotonically with t.
To counteract this, we superimpose another process on our
construction. We select points in [0,T] at random as follows:
The probability of no points being selected in an interval $[t, t + \Delta t]$
is
$$1 - \mu \Delta t + o(\Delta t)$$
(where $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$). At each point $\tau$ selected, randomly
choose $m \in M_\tau$ and $k \in L$, $k \notin m$. If $p_\tau(m + \{k\}) > 0$ (i.e. $m + \{k\}$ is
already a meaning), the system
undergoes a change as in case B' of Thm. 11. If $p_\tau(m + \{k\}) = 0$,
the system undergoes an A'-type change with probability $\gamma$, and
a C'-type change with probability $1 - \gamma$. In each of these cases
an element $l \in m$ must be selected at random so that $p_t(m + \{k\},k)$,
$p_t(m + \{k\},l)$ and $p_t(m,l)$ are discontinuous but their sum is
continuous. The size of the discontinuity is uniformly distributed
between 0 and $p_t(m,l)$. In case A', we assume that after this latter
step is done, each element in m loses a random (but fixed) proportion
of its probability weight to the corresponding element in $m + \{k\}$.
This ends the construction. Note that case B of Thm. 11
does not occur in this example.
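The random selection of points and the choice among the primed cases can be sketched in Python. The sketch uses the standard fact that a process with no-event probability $1 - \mu \Delta t + o(\Delta t)$ over short intervals has exponentially distributed gaps between events; the function names, the parameter name `gamma` for the A'/C' probability, and the seed are assumptions for illustration.

```python
import random


def poisson_points(mu, T, rng):
    """Event times of a rate-mu Poisson process on [0, T], built from exponential gaps."""
    times = []
    t = rng.expovariate(mu)
    while t <= T:
        times.append(t)
        t += rng.expovariate(mu)
    return times


def change_type(has_m_plus_k, gamma, rng):
    """Pick the change at a selected point.

    B' if m + {k} is already a meaning; otherwise A' with probability gamma
    and C' with probability 1 - gamma.
    """
    if has_m_plus_k:
        return "B'"
    return "A'" if rng.random() < gamma else "C'"


rng = random.Random(1)
points = poisson_points(mu=2.0, T=10.0, rng=rng)
print(len(points), [change_type(False, 0.5, rng) for _ in points][:5])
```

The expected number of selected points is `mu * T`; the choice of m, k and l at each point is left out of the sketch.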
Had we not insisted on the extra discontinuities (in
$p_t(m + \{k\},k)$) in the definition of a word-meaning system history,
we would not have been able to use the Brownian motion. If
$p_{t_0}(m + \{k\},k) = 0$, and if we add a Brownian motion $y_i(t)$,
$p_{t_0 + t}(m + \{k\},k)$ will be zero again for arbitrarily small t. Hence
we must start $p_t(m + \{k\},k)$ at a finite value, i.e. discontinuously.
Stability
The first thing we would like to know about our system is
whether or not it is degenerate. Does it tend to degenerate into a
single word-meaning relationship with $p(\{l\},l) = 1$? Does the
number of meanings $|M_t|$ or word-meaning relationships $n_t$ tend to
grow without bounds as T and |L| increase?
By increasing $\mu$ to a high enough value, we can increase
the rate at which new word-meaning relationships are created,
and hence reduce the time during which $n_t$ is at low values. At
the same time, $n_t$ cannot increase without bound, since as the
number of word-meaning relationships increases, the probability
weight attached to each must decrease, on the average. Hence a
higher proportion of relationships tends to be annihilated per
unit time, as in cases A and C of Theorem 11. A rigorous proof
that $n_t$ is neither too large nor too small most of the time does
not seem easy to achieve, simply because of the complication of
the model and the importance of the initial conditions. In any
case, such a result would be rather weak. It seems likely, and
we will present evidence from sampling experiments to support
this, that as $t \to \infty$, $n_t$ tends to vary about an equilibrium
mean value according to an equilibrium distribution, depending only
on the system parameters $\mu$ and $\gamma$.
Regularity of change in $(M_t, d_{t,t})$
For each $\alpha$, a word-meaning system (relatively complicated)
was associated with a relatively simple metric space $(M_\alpha, d_{\alpha,\alpha})$. The
meanings corresponded to points in the metric space and the distance
between meanings varied continuously almost everywhere with respect
to $\alpha$.
The same remarks hold true, of course, for the analogous
metric spaces $(M_t, d_{t,t})$. As t increases each meaning moves continuously
except at certain points where it can split into two or merge
with another meaning. At such times there are discontinuities in
$d_{t,t}$, but these are not usually very large. This regularity of motion
ensures that we have some sort of correspondence between the sets of
meanings at two distinct times. In the Swadesh model, a well defined
correspondence is assumed, in terms of the universal set of meanings.
If we do not postulate anything of this nature, since it must
necessarily refer to cultural universals, not linguistic universals,
it becomes more difficult to make word-meaning comparisons at two
points in time. Indeed, if after a point in time, s, a meaning
loses a lexical representation (as in case C in Thm. 11), it ceases
to exist, in our technical sense, and others close to it take up
its semantic load - and we must, at the very least, assume some rule
for choosing a related or close meaning, for all later points in
time, if we are to make lexical comparisons. The intuitive use of
the term "close" gives a clue as to the appropriate choice - the
meaning n which minimizes
$$d_{s,t}(m,n).$$
This has one important desirable property for such a rule. For t
very close to s, in most cases n will, of course, be m itself.
$d_{s,t}(m,m) = d_{s,t}(m,n)$ will then be the sum of the absolute values
of quantities approximately proportional to Brownian motion (see
definition of $d_{\alpha,\beta}$) and hence will, on the average (or in expectation)
increase monotonically. $1 - d_{s,t}(m,m)$ will decrease monotonically.
After a discontinuity $1 - \min_n d_{s,t}(m,n)$ will continue to decrease.
Since it is the processes of lexical loss and lexical replacement
which are responsible for this decrease,
$$\frac{1}{|M_s|} \sum_{m \in M_s} \bigl(1 - \min_n d_{s,t}(m,n)\bigr)$$
is a likely candidate to replace Swadesh's
lexicostatistic indicator. We will so use it, keeping in mind that
it does not involve any pan-cultural or pan-linguistic method of
selecting universal meanings to compare. If such a method existed
(and it does, approximately speaking, e.g. the Swadesh list) our
indicator must necessarily provide an upper bound for any indicator
of the form $1 - d_{s,t}$.
Simulating word-meaning processes 
A complete, purely mathematical treatment of the Brownian-based
word-meaning system would be difficult, and no results analogous
to Theorems 1 - 3 are yet available. On the other hand, by
choosing a set of $p_0(m,l) = p_{\alpha_0}(m,l)$ from a word-meaning system,
and fixing $\mu$ and $\gamma$ it is possible to simulate the behaviour of
the bivariate functions $p_t(m,l)$. A sample from a number of simulated
histories might produce some hint of what the corresponding theorems
might be. The remainder of this chapter consists of an account of
such an experiment.
A simulation program
A computer program (see Fig. 7) was written to provide
word-meaning histories sampled from the Brownian-based process
(actually an approximation of this process).
The program accepts as initial data T (the length of the
simulation), parameters I (from which $\mu$ can be calculated), $\gamma$
and $\theta$; and two matrices N(i,j) and P(i,j) with |M| rows and 20
columns. The row index i identifies the meaning being considered,
and the non-zero N(i,j) identify the words connected to that meaning
(up to 20). P(i,j) then, represents $p_0(m_i,l_k)$ of the system where
$N(i,j) = l_k$. (It is more economical to store two |M| × 20 matrices
than one |M| × |L| matrix if |L| > 40.)
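The storage comparison is simple arithmetic, sketched here in Python together with a conversion from a dense row of weights to the paired (N, P) layout. The function names, the use of 0 as the marker for an unused slot and the 1-based word labels are assumptions of this sketch.

```python
def cells_compact(num_meanings, max_words=20):
    """Cells used by two num_meanings x max_words matrices (labels N and weights P)."""
    return 2 * num_meanings * max_words


def cells_dense(num_meanings, num_words):
    """Cells used by one num_meanings x num_words matrix of weights."""
    return num_meanings * num_words


def to_compact(dense_row, max_words=20):
    """Convert one dense row of weights into padded (N, P) rows.

    Word labels are 1-based; 0 marks an unused slot.
    """
    pairs = [(j + 1, w) for j, w in enumerate(dense_row) if w > 0]
    if len(pairs) > max_words:
        raise ValueError("meaning has more words than the layout allows")
    pad = max_words - len(pairs)
    n_row = [j for j, _ in pairs] + [0] * pad
    p_row = [w for _, w in pairs] + [0.0] * pad
    return n_row, p_row


print(cells_compact(20), cells_dense(20, 40), cells_dense(20, 41))
print(to_compact([0, 0.3, 0, 0.7], max_words=3))
```

The two layouts use the same number of cells when there are exactly 40 words, and the paired layout is smaller beyond that.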
To approximate the Brownian motion from time t=0 to t=I,
one part of the program adds a normal random variable to each of the
non-zero P(i,j). These variables have mean zero and variance I and
their sum is zero, as specified in the model. Each of these P(i,j)
is then examined to see whether it has dropped to zero or below. If
it has, the rest of the non-zero P(i,k) are set to zero as in cases
A and C of Thm. 11 and P(h,g) are increased by compensating amounts
where h and g are the appropriate meanings and words for the cases.
Another part of the program picks an integer according to
a Poisson random variable, with mean 10, and this variable represents
the number of cases A', B' and C' which have occurred during the time
increment I. Hence $\mu = 10/I$. For each of these occurrences the
program then allows a choice of whether the word k (see Thm. 11) is to
be a new word (borrowing) or a word that is already used for another
meaning (this choice is made at random with probabilities $\theta$, $1 - \theta$).
The meaning m and the word $l \in m$ (again as in Thm. 11) are chosen at
random. If necessary (not in case B') a random choice is made
between A' and C' according to parameter $\gamma$, and if necessary (case
A') the allocation of probabilities between m and $m + \{k\}$ is decided
by choosing a random number (uniformly distributed between 0 and
$p_t(m,l)$).
The program then provides for the examination of the system
to calculate the resulting values of $|M_t|$, $|L_t|$, N(.,.), P(.,.) and
$n_t$ and it prints these out. From this point it returns to the Brownian
motion section and sets t = 2I and adds another batch of normal variables
with variance I, etc.
The above is only a summary of the program. Other routines
relabel words or meanings so that they may be stored and examined
economically, and others allocate any "negative probability" from
Brownian paths going below zero during a time increment (when in
theory they are only allowed to go as far as zero) among the other
word-meaning relationships of the meaning involved. Finally, in the
version represented in Fig. 7 there is a routine which compares the
word-meaning system at time t with the initial word-meaning system (at
time t=0) according to our lexicostatistic indicator
$$F(t) = \frac{1}{|M_0|} \sum_{m \in M_0} \bigl(1 - \min_n d_{0,t}(m,n)\bigr).$$
Fig. 7. Flow chart for simulation program.
Results of a simulation experiment
To illustrate the properties of a Brownian-based process,
we will present the results on 12 sample histories of a simulated
process with the parameters fixed.
These histories were obtained as follows. For the first, the
initial system was represented as in Fig. 8,
Fig. 8. Initial word-meaning system.
where each line between an m and an l represents $p_0(m,l) = .01$.
Here $|M_0| = 20$, $|L_0| = 20$, $n_0 = 100$. [0,T] was divided into 100 increments,
and details of the system were extracted at time T
and these were used to provide the initial system for the second
history. This general procedure was followed thereafter with the
final status of some of the systems serving as the initial systems
for others.
Stability and equilibrium distributions 
As we conjectured earlier, the system moves rather quickly
to equilibrium and we can trace this in the first history. Fig. 9
shows how $|M_t|$, $|L_t|$ and $n_t$ tend to approach and then oscillate around
an equilibrium value.
The "equilibrium" distributions in Fig. 10 are calculated
from all the values of the system characteristics, at all points in
time, of the last 11 histories (since the first history started with
a non-equilibrium state).
Zipf's Law
It is a property of natural languages that, aside
from the few most frequent words, the frequency of occurrence of a
word G(l) and the rank order of this frequency, H(l), are related
approximately as
$$G(l) = C e^{-K H(l)}$$
where C and K are constants.
Our word-meaning systems do not have as many words as natural
languages. Nevertheless, it is possible to calculate the probabilities
(not frequencies) g(l) from
$$g(l) = \sum_{m :\, l \in m} p_t(m,l).$$
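The calculation of g(l), the ranking H(l) and a fit of the exponential rank law on a semi-logarithmic scale can be sketched in Python. The least-squares fit is an illustrative choice made here, not a procedure taken from the text, and the function names and the small example system are hypothetical.

```python
import math


def word_probabilities(system):
    """g(l): sum over meanings m containing l of p(m, l)."""
    g = {}
    for weights in system.values():
        for l, w in weights.items():
            g[l] = g.get(l, 0.0) + w
    return g


def rank_order(g):
    """H(l): rank 1 for the most probable word."""
    ordered = sorted(g, key=g.get, reverse=True)
    return {l: r for r, l in enumerate(ordered, start=1)}


def fit_semilog(g, h):
    """Least-squares fit of log g(l) = log C - K * H(l); returns (C, K)."""
    xs = [h[l] for l in g]
    ys = [math.log(g[l]) for l in g]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope


# Hypothetical system: meanings map to {word: weight}.
system = {"m1": {"a": 0.2, "b": 0.1}, "m2": {"a": 0.3, "c": 0.4}}
g = word_probabilities(system)
h = rank_order(g)
print(g, h, fit_semilog(g, h))
```

On a semi-logarithmic plot of g(l) against H(l), a relation of this form appears as a straight line with slope -K.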
This was carried out for eight of the terminal word-meaning
systems of our simulation and the g(l) were then ordered to give H(l).
Plotting these (Fig. 11), it is clear that a Zipf's law can be stated
which holds for the majority of the words in the system, excepting
the first few and the last few. The "tailing off" effect can perhaps
be ascribed to the homogeneity of the Brownian process - any word,
whose total probability fluctuates close to zero, is very likely to
hit zero and be absorbed. By introducing an inhomogeneous diffusion,
where the variance of the displacement of p(m,l) after time $\Delta t$ is
an increasing function of $p_t(m,l)$, this effect could be removed, and
the total number of words and meanings could increase as well.
One interesting comparison can be made between the g(l)
vs. H(l) curves for the initial and the terminal states of the first
history (see Fig. 8). In the initial, non-equilibrium state all
words have equal probability g(l) = .05. The terminal state has
shifted to a typical Zipf's law.
Lexicostatistics
Finally, we present the results of the lexicostatistic
survey of the 11 equilibrium system histories. These are displayed
in Fig. 12 and the mean behaviour is extracted and is displayed in
Fig. 13. These diagrams speak for themselves - after an initial
sharp drop, the index
$$\frac{1}{|M_0|} \sum_{m \in M_0} \bigl(1 - \min_n d_{0,t}(m,n)\bigr)$$
undergoes an unmistakably exponential decline.
Fig. 11. Zipf's law for 8 examples (note semi-logarithmic plot of
g(l) against H(l)). Successive examples shifted downward by
factors of 0.1.
To what extent the initial drop is a property of the
particular metric being used and to what extent it is an inevitable
consequence of the Brownian motion, must await further study. In
any case, it does not seem to be a simple consequence of a Zipf's
law distribution of word probabilities or the analogous effect for
meanings, since it also occurs for very symmetrical initial systems
such as the one in Fig. 8.
Without coming to any specific conclusions, it is
appropriate to end this chapter by pointing out that both Swadesh's
relatively simple model of lexical loss, using a universal meaning
set to compare language stages, and our more complicated model, in
which comparisons between stages of languages are made in terms of
internal properties of the lexicon, concur in the very similar
behaviour of their lexicostatistic indexes.

References

Andreyev, N.D.
1962 "Comment" on Bergsland and Vogt, Current Anthropology 3: 130.

Bergsland, Knut & Hans Vogt
1962 "On the validity of glottochronology", Current Anthropology
3: 115-153.

Bloomfield, Leonard 
1933 Language. New York. 

Brainerd, B. 
n.d. "A stochastic process related to language change" 

Chretien, C.D.
1962 "The mathematical models of glottochronology", Language
38: 11-37.

Cohen, David
1964 "Problèmes de lexicostatistique sud-sémitique", Proceedings
of the Ninth International Congress of Linguists, H. Lunt,
ed., The Hague, pp.490-496.

Dyen, Isidore
1960 "Comment" on Hymes, Current Anthropology 1: 34-39.

Dyen, Isidore 
1964 "On the validity of comparative lexicostatistics", Proceed- 
ings of the Ninth International Congress of Linguists, 
H. Lunt, ed., The Hague, pp.238-252. 

Dyen, I., James, A.T., & J.W.L. Cole
1967 "Language divergence and estimated word retention rate",
Language 43: 150-171.

Eaton, Helen S. 
1940 Semantic frequency list for English, French, German and 
Spanish. Chicago, University of Chicago Press. 

Ellegard, A. 
1962 "Comment" on Bergsland & Vogt, Current Anthropology 3: 130-131. 

Fairbanks, G.H. 
1955 "A note on glottochronology", International Journal of 
American Linguistics 21: 116-120. 

Fodor, Istvan 
1962 "Comment" on Bergsland & Vogt, Current Anthropology 3: 132-134. 

Fodor, Istvan 
1965 The rate of linguistic change: limits of the application 
of mathematical methods in linguistics. University of 
Budapest. 

Gleason, H.A., Jr. 
1960 "Comment" on Hymes, Current Anthropology 1: 20. 

Gudschinsky, S.C. 
1956 "The ABC's of lexicostatistics", Word 12: 175-210. 

Gudschinsky, S.C. 
1960 "Comment" on Hymes, Current Anthropology 1: 39-40. 

Hattori, S. 
1953 "On the method of glottochronology and the time-depth of 
proto-Japanese", Journal of the Linguistic Society of Japan, 
nos. 22, 23, pp. 29-7[?] (English summary pp. 7[?]-[?]). 

Hattori, S. 
1957 Kiso goi chosahyo (A test list of basic vocabulary). 

Hirsch, David I. 
1954 "Glottochronology and Eskimo and Eskimo-Aleut prehistory", 
American Anthropologist 56: 825-838. 

Hockett, C.F. 
1958 A course in modern linguistics. New York, Macmillan. 

Hoijer, Harry 
1956 "Lexicostatistics: a critique", Language 32: 49-60. 

Hymes, D.H. 
1960 "Lexicostatistics so far", Current Anthropology 1: 3-43. 

Josselson, H. 
1953 The Russian word count. 

Juilland, Alphonse, & E. Chang-Rodriguez 
1965a Frequency dictionary of Spanish words. Mouton, The Hague. 

Juilland, A., F.M.H. Edwards & I. Juilland 
1965b Frequency dictionary of Rumanian words. Mouton, The Hague. 

Katz, J.J. & P.M. Postal 
1964 An integrated theory of linguistic descriptions. MIT, Cambridge. 

Labov, W. 
1967 "Contraction, deletion and inherent variability of the 
English copula", paper given before the Linguistic Society 
of America, Chicago, December 1967. 

Labov, W. 
1968 "Consonant cluster simplification and the reading of the 
'-ed' suffix", unpublished manuscript, Columbia University. 

Lees, Robert B. 
1953 "The basis of glottochronology", Language 29: 113-127. 

Levin, Saul 
1964 "The fallacy of a universal list of basic vocabulary", 
Proceedings of the Ninth International Congress of Linguists, 
H. Lunt, ed., pp.232-236. Mouton, The Hague. 

Lunt, H. 
1964 "Comment" on Dyen, Proceedings of the Ninth International 
Congress of Linguists, pp.247-252. 

O'Grady, G.N. 
1960 "Comment" on Hymes, Current Anthropology 1: 338-339. 

Osgood, C.E., G.J. Suci & P.H. Tannenbaum 
1957 The measurement of meaning. Urbana. 

Parzen, Emanuel 
1960 Modern probability theory and its applications. Wiley, 
New York. 

Sankoff, D. 
1969 Historical linguistics as stochastic process. Unpublished Ph.D. thesis. McGill University. 

Satterthwaite, A.C. 
1960 "Rate of morphemic decay in Meccan Arabic", International 
Journal of American Linguistics 26. 

Swadesh, Morris 
1950 "Salish internal relationships", International Journal of 
American Linguistics 16: 157-167. 

Swadesh, Morris 
1952 "Lexico-statistic dating of prehistoric ethnic contacts", 
Proceedings of the American Philosophical Society 96: 452-463. 

Swadesh, Morris 
1955 "Towards greater accuracy in lexicostatistic dating", 
International Journal of American Linguistics 21: 121-137. 

Swadesh, Morris 
1962 "Comment" on Bergsland & Vogt, Current Anthropology 3: 143-145. 

Teeter, Karl V. 
1963 "Lexicostatistics and genetic relationship", Language 
39: 638-648. 

Trager, G.L. 
1966 "Comment" on van der Merwe, Current Anthropology 7: 497-498. 

Ullman, Stephen 
1957 The principles of semantics. Barnes & Noble, New York. 

van der Merwe, N.J. 
1966 "New mathematics for glottochronology", Current Anthropology 
7: 485-500. 

Zipf, G.K. 
1945 "The meaning-frequency relationship of words", Journal of 
Psychology 33: 251-256. 
