<?xml version="1.0" standalone="yes"?> <Paper uid="C65-1015"> <Title>MODELS OF LEXICAL DECAY</Title> <Section position="2" start_page="15" end_page="15" type="concl"> <SectionTitle> Abstract </SectionTitle> <Paragraph position="0"> Lexical decay is the phenomenon underlying the dating techniques known as &quot;glottochronology&quot; and &quot;lexicostatistics.&quot; Much of the controversial nature of work in this field is the result of extremely imprecise foundations and lack of attention to the underlying statistical and semantic models.</Paragraph> <Paragraph position="1"> A satisfactory semantic model can be found in the concept of the semantic atom. Notwithstanding a number of philosophical objections, the semantic atom is an operationally feasible support for a lexicon which is a semantic subset of all possible meanings and, at the same time, exhausts the vocabulary of a language. Lexical decay is the process by which the lexical item covering an atom is replaced by another lexical item.</Paragraph> <Paragraph position="2"> Exponential lexical preservation is, in this model, directly analogous to decay phenomena in nuclear physics. Consistency requires that the decay process involved in exponentially preserved vocabularies be a Poisson process. This shows how to form test vocabularies for dating and proves that presently used vocabularies are not correctly formed.</Paragraph> <Paragraph position="3"> Dialectation studies show that historically diverging populations must be modelled by correlated Poisson processes. Definitive statistical treatment of these questions is not possible at this time, but much desirable research can be indicated.</Paragraph> <Section position="1" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Introduction </SectionTitle> <Paragraph position="0"> This paper is an attempt to establish the method of dating by lexical decay upon an adequate theoretical foundation. The method discussed is that invented by Swadesh (1) over a decade ago and usually known as glottochronology or lexicostatistics. In the intervening years it has been widely applied, but often to the accompaniment of much confusion and controversy. It seems that much of the confusion can be removed by a rigorous treatment of the phenomenological model and careful application of statistics. The controversy can be removed only by the completion of a sufficient number of supporting studies. Rigorous formulation permits us to pinpoint what studies are needed and what conclusions are being sought.</Paragraph> <Paragraph position="1"> Granting (as not everyone seems willing to do) that the basic fact of &quot;uniform&quot; lexical decay occurs, the problem to be attacked is that of correctly formulating models for lexical decay and of correctly deriving statistical consequences from these models. In what follows, we will construct a set of models which seem to fit the needs of the method of dating by lexical decay. Our approach is strictly pragmatic; that is, we construct the model we need without concerning ourselves about its a priori reasonableness. Later we try to assemble some arguments which justify the model. In no sense is this an approach from first principles.</Paragraph> <Paragraph position="2"> The analogy between lexical decay and the decay phenomena of nuclear physics has often been noted and dismissed. In the present paper, we insist that this analogy is much more than an analogy; it is, on the first level, an identity.</Paragraph>
<Paragraph position="3"> The only alternative to this hypothesis seems to be a kind of mystic faith that the decay occurs but without palpable, manipulable principles. The burden of proof that the identity is false lies with the doubter, and we will make no further demonstration of its validity.</Paragraph> <Paragraph position="4"> Decay phenomena in nuclear physics are governed by relatively simple, well understood principles. To apply these results to lexical decay we first establish the concepts of a semantic atom and a set of independent semantic atoms. The observed fact of exponential decay of vocabulary is then accounted for by assuming that the lexical item covering an atom decays according to a Poisson process. Generally speaking, the converse of this is also true, and only a Poisson process would produce exponential decay. From these considerations, we can draw many conclusions about how to and how not to construct test vocabularies for dating purposes.</Paragraph> <Paragraph position="5"> With this model in hand, we can draw conclusions of a statistical nature. For example, we can develop formulas for the proper method of dating the split between three or more languages and for good estimators in more complex situations.</Paragraph> <Paragraph position="6"> We can construct an imprecise heuristic model for the dynamic semantics underlying the Poisson process. So long as the first order theory is adequate, this is much in the nature of a curiosity. It seems, however, that first order theory is not adequate. Actually, such a conclusion is really premature because the kind of verification studies needed have not been made. Assuming the pessimistic conclusion, we have to construct second (or higher) order theories to account for the inadequacies of first order theory. At the moment, we have no useful results in this direction--the problem merges into the problem of dialectation. Probably the most important service we can render is to indicate exactly what kind of detailed studies are needed.</Paragraph> </Section> <Section position="3" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Semantic Atoms </SectionTitle> <Paragraph position="0"> It is very easy to raise objections of a philosophical nature to the concept of a semantic atom. In this paper we will simply ignore these objections and define the semantic atom in an operational way. There are also operational difficulties, but these seem to be surmountable.</Paragraph> <Paragraph position="1"> A semantic atom is a completely defined unit concept sufficiently specified to remove all ambiguity. For example, in an anthropological context, we might have &quot;sun, as pointed at by a male anthropologist at high noon in the middle of summer on an average day among a group of young men with plenty to drink&quot;. The kind of subtleties needed to complete the definition reminds one of Korzybskian General Semantics, but the intention is not the same. We seek to remove ambiguity, but we must have a non-unique concept--one that is always present.</Paragraph> <Paragraph position="2"> Certainly there has been little use of semantic atoms anywhere in the past. Those interested in semantics for its own sake will reject them as useless or meaningless; lexicographers deal in more generalized concepts. It would be hard to argue that they have general utility, but they are precisely what is needed for studying lexical decay.</Paragraph>
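As an aid to concreteness, here is a minimal sketch, not from the paper, of how a semantic atom might be recorded operationally: an unambiguous unit concept plus the lexical item covering it in each language examined. The Python representation and field names are our own illustration; the definition string is the paper's own example.

```python
# A hedged sketch of a semantic atom as an operational record (hypothetical
# representation): the fully specified definition removes ambiguity, and the
# covers map lists the covering lexical item per language.
from dataclasses import dataclass, field

@dataclass
class SemanticAtom:
    definition: str                                # operational definition
    covers: dict = field(default_factory=dict)     # language -> lexical item

sun = SemanticAtom(
    definition=("sun, as pointed at by a male anthropologist at high noon "
                "in the middle of summer on an average day among a group of "
                "young men with plenty to drink"),
    covers={"English": "sun", "German": "Sonne"},
)
print(sun.covers["English"])   # -> "sun"
```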
<Paragraph position="3"> Each semantic atom, in any speech at any time, is assumed to be covered by some lexical item. That is, there is some word whose meaning includes that of the atom. Thus, vocabularies can be formed over any set of semantic atoms by listing the covering lexical item for each atom. The kind of decay being studied is that where the covering lexical item is replaced by another item. The replaced word only rarely disappears immediately from the language as a whole, but it has disappeared from the semantic atom.</Paragraph> <Paragraph position="4"> An independent set of semantic atoms is a set of atoms all of which differ among themselves enough to make the decay at any atom completely independent of that at any other atom. Thus, only one word from any set of words with habitual interrelations, like the numerals or the pronouns, can appear in the set. Independent sets are useful because in them, problems of inter-atom correlations need not be considered.</Paragraph> <Paragraph position="5"> Before passing on, we should say a few words as to the practical use of semantic atoms. There does not seem to be any doubt that the collectors of vocabularies want to work with semantic atoms--even if their results are completely unsuccessful. In an entry &quot;dog = Hund&quot; they would like to say that there is a semantic atom and its cover in English is &quot;dog&quot;, in German, &quot;Hund&quot;. The pitfalls of this sort of thing are well known. Some care in defining atoms might make it feasible if we require not complete identity of the English and German semantics, but rather the existence of some concrete concept where both the English and German words are appropriate. Clearly this much weaker requirement will be easier to satisfy, so we adopt it.</Paragraph> <Paragraph position="6"> We conclude that, with adequate precautions, semantic atoms can be operationally feasible even if true rigor is impossible. In the case of little-known languages, there is much more chance for error. We should encourage collectors of vocabularies to improve the precision of their definitions so that the atom in question can be identified.</Paragraph> </Section> <Section position="4" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Decay Process </SectionTitle> <Paragraph position="0"> We assume that lexical decay, for a set of independent semantic atoms, is a Poisson process. That is, it satisfies three conditions: 1. Each atom decays independently of all the other atoms. 2. Each atom decays independently of its history of earlier decay. 3. There is a constant λ such that for each atom the probability of one decay in a short time interval Δt is λΔt, and the probability of more than one decay is negligible.</Paragraph> <Paragraph position="1"> It is rather easy to deduce that for longer time intervals t, the probability of not decaying is exp(-λt), and if there are N atoms, the expected number of undecayed atoms after time t is N exp(-λt). This formula is the usual formula for lexical decay. It should be pointed out that it was tested, statistically, in the first publication by Swadesh, and it failed to pass. The difficulty is probably due to the word list used, which is not an independent set of atoms. If we examine the assumptions made so far, we see that any list of semantic atoms can be used if they are: (1) independent; and (2) assured of existence throughout the time in question.</Paragraph>
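A minimal simulation sketch of this decay process follows. The rate λ is the paper's rough 1/5000 per year; the values of N, t and the random seed are arbitrary illustrations. Independent Poisson decay of the atoms reproduces the exponential preservation formula N exp(-λt).

```python
# Simulate N independent semantic atoms, each decaying as a Poisson process
# with rate LAMBDA, and compare the survivors at time T with N*exp(-LAMBDA*T).
import math
import random

LAMBDA = 1.0 / 5000.0   # decay rate per atom per year (paper's rough value)
N = 100_000             # number of independent atoms (illustrative)
T = 2000.0              # elapsed time in years (illustrative)

random.seed(1)
# The first decay time of each atom is exponential with rate LAMBDA; the atom
# still carries its original covering item at time T iff that time exceeds T.
undecayed = sum(1 for _ in range(N) if random.expovariate(LAMBDA) > T)

print("simulated undecayed:       ", undecayed)
print("expected N*exp(-lambda*t): ", N * math.exp(-LAMBDA * T))
```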
<Paragraph position="2"> There is no satisfactory a priori basis for assuming that some kinds of semantic atoms decay at different rates than other kinds, and it is doubtful if enough historical evidence can be collected to make such a conclusion statistically significant.</Paragraph> <Paragraph position="3"> The question whether λ is a universal constant, a constant within any one language but possibly differing between languages, or a variable, is easier to discuss. So far, indications are that λ is about 1/5000 per year. This means that over the span of most historic evidence, exp(-λt) will be greater than about 0.60. There is a great deal of scatter to be expected in the results because N exp(-λt) is an expectation, not an exact prediction.</Paragraph> <Paragraph position="4"> There have been a number of studies of the exponent of exponential decay. All of them are too superficial to be conclusive (2). An adequate study in any one language would have to meet several criteria which make it into a major research effort. A set of independent semantic atoms must be selected--selected prior to detailed study--and no atoms, however difficult, dropped without complete explanations (3). Then the history of each atom must be traced through the historical record to locate the lexical item covering the atom at each point in time. In reporting the study, all of this should be fully documented in detail. Each instance of decay can then be recognized and tallied. Statistical tests should be applied to see whether or not the model is satisfied and to estimate λ. For example, if there are 100 semantic atoms, there should be about one decay every 50 years, uniformly spread through time. These things can be checked statistically. We hope that scholars will undertake definitive studies of this type for as many cases as possible (4).</Paragraph> <Paragraph position="5"> Until the results of the kind of research just mentioned are available, the status of λ is uncertain. We anticipate it will be recognized as a universal constant.</Paragraph> <Paragraph position="6"> There remains the problem of making a Poisson process a reasonable assumption. In other words, we need to describe some sort of mechanism which makes words slip off semantic atoms independently of how long they have been covering the atom, and at a constant rate per unit time, at least over short time intervals. Incidentally, since λ is on the order of 1/5000 per year, 50 years is a short time interval. Since the speakers of normal languages are not historians, the independence from history seems easy to accept.</Paragraph> <Paragraph position="7"> The constant rate is harder to accept. First of all, we have to account for an identical figure in literate and illiterate populations, and between a handful of speakers and half a billion speakers. The decay effect must be independent of the number of speakers, hence it must be operative at the level of the single isolated speaker. This is satisfactory since, by and large, the amount of speech reaching an individual does not seem to have changed much throughout history and does not vary much between cultures at the present day.</Paragraph> <Paragraph position="8"> But why does a speaker change an occasional lexical item--about 1% in his lifetime--and maintain the rest? The only hypothesis we have been able to construct is that all words are always under pressure--perhaps from several semantic &quot;directions&quot; at the same time.</Paragraph>
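To make the pressure hypothesis concrete, here is a hedged simulation sketch, not from the paper: challenges to an atom's principal cover arrive as a Poisson stream at a hypothetical rate μ, and each challenge displaces the cover with a hypothetical probability q, so that the net decay rate is μq. The figures below are chosen only so that μq comes out at 1/5000 per year; the displacement chance of one in four anticipates the casual guess made just below.

```python
# A hedged sketch of the pressure mechanism: challenges arrive as a Poisson
# stream; most recede, a few displace the principal cover. MU and Q are
# hypothetical; only their product (the net decay rate) is pinned near the
# paper's 1/5000 per year.
import random

MU = 1.0 / 1250.0   # hypothetical challenge rate per atom per year
Q = 0.25            # hypothetical chance a challenge displaces the cover
YEARS = 5000.0
ATOMS = 10_000

random.seed(7)
decayed = 0
for _ in range(ATOMS):
    t = random.expovariate(MU)          # time of the first challenge
    while t < YEARS:
        if random.random() < Q:         # the challenger displaces the cover
            decayed += 1
            break
        t += random.expovariate(MU)     # the challenge recedes; wait again

# With MU*Q = 1/5000, about exp(-1) ~ 0.37 of atoms should survive 5000 years.
print("fraction decayed over 5000 years:", decayed / ATOMS)
```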
<Paragraph position="9"> Most atoms resist change most of the time, but some set of accidents (all very real events at the sociological and psychological levels, but random accidents in our context) weakens a few, and the lexicon decays. In other words, there is a constant dynamic movement among secondary and incidental covers of the semantic atom which threaten the principal cover. Usually the threatening lexical items recede, but occasionally, in a random way, about once every five thousand years, the principal cover is displaced and a lexical decay occurs.</Paragraph> <Paragraph position="10"> The hypothetical mechanism advanced to explain lexical decay can be checked against history by case studies of semantic atoms. Each atom should show time periods when the principal word was nearly displaced. During these periods it is difficult to decide whether the old word or a new word is the principal cover. Usually the new word will pass away again, but sometimes it will displace the old word. A very tentative guess based on a casual examination of one hundred current English words suggests there are about four very heavily threatened words per hundred. Since we can expect about one word to be decaying at this moment, we conclude that about three out of four times the old word survives. All of this needs to be verified or disproven in detailed studies.</Paragraph> </Section> <Section position="5" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Decay Statistics </SectionTitle> <Paragraph position="0"> The statistical consequences of the model--the first order model described above--need to be explored. We cannot handle all possible situations, but the following examples should provide an adequate demonstration of technique so that any other problems which occur can be solved in the same manner.</Paragraph> <Paragraph position="1"> First, let us consider N languages deviating independently from a common parent which is not known to us. The following discussion is a bit more cumbersome than some alternative approaches, but it generalizes more easily.</Paragraph> <Paragraph position="2"> Let α be any subset of the N languages and let P(α) be the probability that the given semantic atom is covered by the original lexical item in exactly the languages of the set α. New covering words are assumed to be different in each of the innovating languages. P(α) is a function of time and satisfies the differential equation dP(α)/dt = -λ Σ_{i∈α} P(α) + λ Σ_{j∉α} P(α∪j), where i and j are languages, ∈ and ∉ mean &quot;belongs to&quot; and &quot;does not belong to&quot; respectively, and α∪j is the union of α and the set containing only the language j.</Paragraph> <Paragraph position="3"> Let |α| denote the number of languages in α. If |α| = N, the equation is easy to solve: P(α) = exp(-Nλt). If a few cases--|α| = N-1, |α| = N-2, etc.--are solved, we are led to hypothesize that P(α) = x^n (1-x)^{N-n}, where n = |α| and x = exp(-λt), and the hypothesis is proved by induction.</Paragraph> <Paragraph position="4"> Thus, P(α) depends only on the value of |α| = n. We can recognize P(n) for n = 2, 3, ..., N, but P(0) and P(1) cannot be distinguished, so we combine these into P', which is obtained by P' = (1-x)^N + Nx(1-x)^{N-1}.</Paragraph> <Paragraph position="5"> Now suppose that from K semantic atoms we observe that k_N atoms are covered by the same word in all languages, and k_{N-1} in all but one, and so on to k_2, and there are k' atoms differently covered in all languages.</Paragraph>
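Before forming the likelihood, the category probabilities just derived can be checked in a short sketch (the values of N, λ and t below are illustrative): an atom keeps its original cover in exactly n of the N languages with probability C(N,n) x^n (1-x)^{N-n}, and the pooled class P' absorbs the indistinguishable cases n = 0 and n = 1.

```python
# Check that the observable agreement categories exhaust all cases:
# sum over n = 2..N of C(N,n)*x^n*(1-x)^(N-n), plus P', must equal 1.
from math import comb, exp

def category_probs(N, x):
    """Return {n: P(same word in exactly n languages)} for n >= 2, plus P'."""
    p = {n: comb(N, n) * x**n * (1 - x) ** (N - n) for n in range(2, N + 1)}
    p_prime = (1 - x) ** N + N * x * (1 - x) ** (N - 1)
    return p, p_prime

N, lam, t = 5, 1 / 5000, 2000          # illustrative values
x = exp(-lam * t)
p, p_prime = category_probs(N, x)
print(sum(p.values()) + p_prime)       # -> 1.0 (up to rounding)
```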
<Paragraph position="6"> The probability of this occurring is proportional to the product over n = 2, ..., N of [C(N,n) x^n (1-x)^{N-n}]^{k_n}, multiplied by (P')^{k'}, where C(N,n) denotes the binomial coefficient. A maximum likelihood estimate for x seems to be the best single value we can assign to x. It is obtained by setting the (logarithmic) derivative of the probability to zero, so that A/x - [N(k_2 + k_3 + ... + k_N) - A + (N-1)k']/(1-x) + (N-1)k'/[1 + (N-1)x] = 0, where A = 2k_2 + 3k_3 + ... + Nk_N. Clearing fractions turns this into a quadratic equation in x. For N = 2 it reduces to x² = k_2/(k_2 + k'), the classical formula for the time of separation between two languages. For general N, x is the solution of the quadratic equation given above. Note that the answer depends on the statistic A, which does not usually appear in discussions of lexical dating.</Paragraph> <Paragraph position="7"> An even more general difference between this treatment and the usual treatment by pairs is found in the use made of the number of all the languages containing a certain lexical item as the cover of a semantic atom. This kind of count is almost never made in the literature on dating problems.</Paragraph> <Paragraph position="8"> Another case which constantly recurs in practice is that of three languages: 1, 2 and 3, say. The pair 1 and 2 are more closely related than language 3 is to either 1 or 2. Suppose t is the time from the common ancestor of 1, 2 and 3 to 3, and t' the time from the common ancestor of 1 and 2 to 1 or 2. Let x = exp(-λt) and x' = exp(-λt'), so that x/x' is the probability associated with the time from the common ancestor of 1, 2 and 3 to that of 1 and 2.</Paragraph> <Paragraph position="9"> We might observe any of five situations concerning the cover of a semantic atom. It may be the same in all three (1, 2, 3); or in exactly one pair (1, 2), (1, 3) or (2, 3); or different in each. The probabilities of these events are x²x', x'(x' - x²), x²(1 - x'), x²(1 - x') and 1 - x'² - 2x²(1 - x'), respectively.</Paragraph> <Paragraph position="10"> Suppose k_123, k_12, k_13, k_23 and k' of each of these are observed when K atoms are considered. The total probability is proportional to x^{2(k_123 + k_13 + k_23)} x'^{k_123 + k_12} (x' - x²)^{k_12} (1 - x')^{k_13 + k_23} [1 - x'² - 2x²(1 - x')]^{k'}. Maximum likelihood estimates for x and x' are obtained from the equations found by setting the (logarithmic) derivatives with respect to x and x' to zero separately: 2(k_123 + k_13 + k_23)/x - 2x k_12/(x' - x²) - 4x(1 - x') k'/[1 - x'² - 2x²(1 - x')] = 0 and (k_123 + k_12)/x' + k_12/(x' - x²) - (k_13 + k_23)/(1 - x') - 2(x' - x²) k'/[1 - x'² - 2x²(1 - x')] = 0. These equations are best solved numerically for given values of k_123, k_12, k_13, k_23 and k'.</Paragraph> <Paragraph position="11"> The methodology is straightforward and there is no need to multiply examples. In every case we obtain new formulas based on maximum likelihood estimators. These methods could also be utilized in the construction of significance tests and confidence bands. With this basis, most of the machinery of modern statistics would be available for use.</Paragraph> </Section> <Section position="6" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Criticism of First Order Theory </SectionTitle> <Paragraph position="0"> As we explained in discussing semantic atoms, we feel there is no adequate observational data to which to apply these formulas for a conclusive test of their value. We have made a few experimental applications using the unsatisfactory data available in the literature. Numerically, the time estimates we obtained, which we will not quote here, do not differ a great deal from those obtained by considering pairs alone. This is to be expected if the phenomena are at all consistent.</Paragraph>
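As an illustration of such an experimental application, here is a hedged numerical sketch of the three-language estimate. The observed counts below are hypothetical, and the maximum is found by a crude grid search over x and x' rather than by solving the derivative equations directly; with λ = 1/5000 per year, the time depths would be t = -5000 ln x and t' = -5000 ln x'.

```python
# Maximize the three-language log-likelihood in x = exp(-lambda*t) and
# xp = exp(-lambda*t') by grid search. Counts are hypothetical illustrations.
from math import log

k123, k12, k13, k23, kp = 55, 15, 5, 5, 20   # hypothetical counts, K = 100

def loglik(x, xp):
    p123 = x * x * xp                            # same cover in 1, 2 and 3
    p12 = xp * (xp - x * x)                      # same in 1 and 2 only
    p13 = x * x * (1 - xp)                       # same in 1 and 3 only
    p23 = x * x * (1 - xp)                       # same in 2 and 3 only
    pnone = 1 - xp * xp - 2 * x * x * (1 - xp)   # different in all three
    if min(p123, p12, p13, p23, pnone) <= 0:
        return float("-inf")
    return (k123 * log(p123) + k12 * log(p12) + k13 * log(p13)
            + k23 * log(p23) + kp * log(pnone))

# Crude grid search with the constraint xp >= x (that is, t' <= t).
best = max(((loglik(i / 400, j / 400), i / 400, j / 400)
            for i in range(1, 400) for j in range(i, 400)),
           key=lambda r: r[0])
_, x_hat, xp_hat = best
print("x =", x_hat, " x' =", xp_hat)
print("t =", -5000 * log(x_hat), " t' =", -5000 * log(xp_hat))
```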
<Paragraph position="1"> The value in the formulas derived above lies in the fact that they correctly combine the data from several pairs.</Paragraph> <Paragraph position="2"> The first-order method does have one very important difficulty, which appears almost immediately if we try to treat more than three languages. This difficulty is in the family tree of the languages. In the entire first-order development, we have implicitly used the concept of a tree. Languages go together as a &quot;common ancestor&quot; until some point in time when they divide and become two separate languages. The tree is the first-order model of dialectation--it is known to be inadequate, at least in many situations. In spite of a century or so of studies, we simply do not understand how dialectation occurs. More study is greatly needed, especially in the construction of higher-order models, but the problem lies outside the scope of this paper.</Paragraph> <Paragraph position="3"> The difficulty with the tree arises in decay studies because only splitting is compatible with our statistical model. We have no alternative to constructing a family tree if we wish to apply the method outlined above. However, it seems to be easy to find examples which do not allow a tree to be constructed. Consider four languages: A, B, C and D. Suppose one semantic atom has the same cover in A and B, and another, different cover in C and D. And at the same time, some other atom has one cover in A and C, and a different cover in B and D. We cannot fit this data into any family tree.</Paragraph> <Paragraph position="4"> A little more specifically, in the Romance languages we find that the same innovation with respect to Latin is shared by several or all of the later languages. Some of this can be explained by the colloquial versus learned speech theory, but no family tree can be constructed to explain all the combinations of innovations. If we had an adequate explanation of the phenomena involved in these shared innovations, it is quite possible that we could assume Romance was the direct descendant of Imperial Latin without going back to Plautus or thereabouts, as seems to be required by the first order theory.</Paragraph> <Paragraph position="5"> A tentative beginning in this direction can be made by a second-order theory based on the dynamic model of lexical influence.</Paragraph> </Section> <Section position="7" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Second-Order Lexical Decay </SectionTitle> <Paragraph position="0"> The imprecise model of semantic pressures we formed to explain lexical decay suggests the following second-order model. For each semantic atom, we consider not only a covering lexical item as before, but also a potential covering item. The potential cover is the source of pressure against the cover. When the cover decays, it is replaced by the potential cover. Naturally we also assume that the potential cover decays and is replaced by a new potential cover. In the interest of simplicity, and because we have no numerical data, we will assume both decays have the same constant λ.</Paragraph> <Paragraph position="1"> First, let us consider a single language.</Paragraph>
<Paragraph position="2"> The situation at an atom can be of four types: (I) both the original cover and potential cover remain; (II) the original cover remains, but the potential cover has decayed; (III) the original cover has decayed and the potential cover has replaced it; (IV) the cover is now neither the original nor the potential cover.</Paragraph> <Paragraph position="3"> Let p_I and p_II be the probabilities of the first two situations. They satisfy dp_I/dt = -2λp_I and dp_II/dt = λp_I - λp_II, so that p_I = exp(-2λt) and p_II = exp(-λt) - exp(-2λt). The original cover remains in these two cases only, so that the probability of it remaining is p_I + p_II = exp(-λt), which is exactly the same as in first-order theory. In the same way, p_III = exp(-λt) - exp(-2λt), since situation III can be entered only from situation I.</Paragraph> <Paragraph position="4"> When the second-order theory is applied to N languages, the results are quite complicated. We divide the languages into four sets α, β, γ, δ depending on which situation holds in the language; in set α situation I holds, and so on. Then we have the basic differential equation dP(α, β, γ, δ)/dt = -λ(2|α| + |β| + |γ|) P(α, β, γ, δ) + λ Σ_{j∈β} P(α∪j, β∖j, γ, δ) + λ Σ_{j∈γ} P(α∪j, β, γ∖j, δ) + λ Σ_{j∈δ} [P(α, β∪j, γ, δ∖j) + P(α, β, γ∪j, δ∖j)].</Paragraph> <Paragraph position="5"> We have no way of recognizing the condition of the potential cover, so the sets α and β should be combined into a set η, P(η, γ, δ) being the sum of P(α, β, γ, δ) over all divisions of η into α and β.</Paragraph> <Paragraph position="6"> Before we can actually apply the maximum likelihood technique to languages without known ancestors, we have to make some further combinations, because sets with |η| = 1 cannot be distinguished from those with |η| = 0, nor those with |γ| = 1 from those with |γ| = 0. Moreover, we cannot distinguish original covers from potential covers, so that the two sets η and γ must be combined with the same sets in the reverse order.</Paragraph> <Paragraph position="7"> The general case is very complicated, so we restrict ourselves to two languages. We then observe that the covers are either the same or different. If they are the same, we have either |η| = 2 (both languages retain the original cover, probability exp(-2λt)) or |γ| = 2 (both have replaced it by the common potential cover, probability p_III²). The probability that the covers are the same is therefore exp(-2λt)[1 + (1 - exp(-λt))²], which differs from the first order theory by the term in the square bracket.</Paragraph> <Paragraph position="8"> The simplest case where the second-order theory is really required is that of four languages. We will illustrate the results by one expression. If k_22 words are covered by two items, each in two languages, k_4 words by one item in all languages, k_3 by one item in three languages, k_2 by one item in two languages, and k' by no common items, then the expression to be solved for maximum likelihood is obtained, as before, by setting the logarithmic derivative of the total probability with respect to p = exp(-λt) to zero.</Paragraph> <Paragraph position="9"> This second-order theory is unsatisfactory, not only because it leads to very complex formulas, but also because it seems to be qualitatively inadequate. The formula for splitting between two languages is not greatly modified except for very long times, and the change does not seem to be enough to account for data showing short times of division. It is hard to tell whether the formula for several languages, including the quantity k_22, is any help--so far we have no striking results to quote from its use.</Paragraph> <Paragraph position="10"> A second-order theory where the potential cover decayed at a different rate than the original cover might correct some of these defects, but we have no evidence upon which to estimate the decay rate in this case. It is more likely that a more elaborate mechanism must be postulated--it need not lead to more elaborate results.</Paragraph>
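Before turning to what such a study would require, here is a minimal simulation sketch of the two-decay process just described for two languages (assumptions: both decay rates equal λ, time measured in units of 1/λ, and the elapsed time T is an arbitrary illustration). It checks the same-cover probability p²[1 + (1 - p)²] derived above.

```python
# Simulate the second-order process: each language carries a (cover, potential)
# pair; the cover and the potential each decay at rate LAM. A decayed cover is
# replaced by the potential; a decayed potential is replaced by a fresh item.
import math
import random

LAM = 1.0          # decay rate; time is measured in units of 1/lambda
T = 0.7            # elapsed time since the split (arbitrary illustration)
TRIALS = 100_000

def evolve(cover, potential, t, fresh):
    """Run one language's (cover, potential) pair forward for time t."""
    now = 0.0
    while True:
        now += random.expovariate(2 * LAM)   # next decay of either kind
        if now > t:
            return cover
        if random.random() < 0.5:            # the cover decayed
            cover = potential
        potential = next(fresh)              # either way, a fresh potential

def fresh_items(start):
    n = start
    while True:
        yield n
        n += 1

random.seed(3)
same = 0
for _ in range(TRIALS):
    fresh = fresh_items(2)          # items 0 (cover) and 1 (potential) shared
    c1 = evolve(0, 1, T, fresh)
    c2 = evolve(0, 1, T, fresh)     # one generator => no accidental matches
    if c1 == c2:
        same += 1

p = math.exp(-LAM * T)
print("simulated P(same cover):    ", same / TRIALS)
print("formula p^2*(1 + (1-p)^2): ", p * p * (1 + (1 - p) ** 2))
```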
<Paragraph position="11"> Such a model must be based on a kind of dialectation study which seems to be absent as yet from the literature.</Paragraph> </Section> <Section position="8" start_page="15" end_page="15" type="sub_section"> <SectionTitle> Conclusion </SectionTitle> <Paragraph position="0"> We have derived a number of formulas relating to the estimation of time depths by observations of lexical decay. The methods used can be applied to obtain many more similar formulas as required in studies of actual data.</Paragraph> <Paragraph position="1"> All of these formulas are based on models of lexical decay using the concept of semantic atoms and their lexical covers. Lexical decay is identified with a change in lexical cover. If the semantic atoms are sufficiently independent, the decay is a Poisson process.</Paragraph> <Paragraph position="2"> Probably the most important practical conclusion is the result that any set of semantic atoms can be used to evaluate lexical decay provided the set is made up of atoms: 1. far enough removed in meaning from one another to assure independence; and 2. representing concepts assured to have been in existence throughout the time period being studied.</Paragraph> <Paragraph position="3"> There is no outstanding study of this problem. Attempts to &quot;improve&quot; the test vocabulary by limiting it to meanings which have behaved well in earlier studies are methodologically disastrous because they bias the value of λ; the first requirement is also intended to remove such bias from the estimate of λ.</Paragraph> <Paragraph position="4"> The second requirement is a matter of classical philological research independent of the statistical syntheses made from the results.</Paragraph> </Section> </Section> </Paper>