AUTOMATIC SI~IATION OF HISTORICAL CHANGE 
Raoul N. Smith 
Northwestern University 
0.0 Purpose. One of the principal reasons for studying the history 
of a language has been to explain the system of its modern reflex, 
the contemporary language. This has been especially true in 
attempting to deal with certain anomalies in the modern language. 
But the role, if appropriate, of utilizing information concerning 
diaehronie processes in a synchronic description is not a~ all 
i 
clear. Recent studies describing contemporary languages, based 
purely on synchronically motivated grounds, suggest a much more 
intimate relation between a synchronic grammar and what has been 
2 
previously posited as a dlachronic description of that language. 
The two major problems involved in historical studies have 
been the statement of the sound change (or, as this has been 
reinterpreted, the grammar change) and the relation of this change 
to other diachronic changes, that is, its relative chronology. 
A great deal of attention has been paid to the former but very 
little to the latter, whose significance has g-oatly increased 
1 
See, for example, Lightner, 1965 and Schaehter and Fromkin, 1968. 
2 
Since a naive native speaker of a language can not be ex~cte + 
know the history of his language, the reason for this rei~.ion 
may lie in the manner in which the rules were added to the grammars 
of his ancestors. It is hoped that the future results of this 
study will help to shed light on this relation by comparing them 
and their associated grammars with synchronic descriptions. 
due to recent results in generative grammar. One of the reasons 
for the lack of rigor in stating the relative chronology has 
probably been the large amounts of data required for input to the 
set of rules and the very large number of stages/rules which must 
be accounted for and the many permutations of these rules which 
should be tested. This lack of rigor, in turn, has made it very 
difficult to discuss coherently the historical development of a 
language. 
The purpose of this paper is to discuss certain limited 
aspects of historical language change and suggest the possible 
use of the computer in approaching their solution. The types of 
problems discussed are only phonological and include only those 
changes conditioned by phonetic environment and which do not 
require syntactic information (for example, the change of the 
Old Russian unstressed infinitive ending /t'i/ to /t'/). 
1.0 Language change. The discovery or postulation of the history 
of a language has been approached by two rather well-known methods. 
These are the reconstruction of the parent language of a set of 
genetically related languages and the reconstruction of an older 
stage of a language given a later stage. Both assume that 
languages evolve, thst is, change (either gradually or abruptly) 
and that the relation of one language stage tothe other is that 
one preceded the other. By comparing these stages in the 
development of the same or related sister languages one can 
3 
therefore reconstruct or recover the parent or proto language from 
which it or its sister languages evolved. 
The two problems or reconstruct lon -- the reconstruction of 
a parent language of related sister languages and the reconstruction 
of an earlier form of a single language -- have been approached by 
separate methods. 
The problem of reconstructing the parent of a set of related 
sister languages has been formalized by the so-called comparative 
method. The comparative method assumea~ among other things and in 
a simplified version, that hF comparing sets of sounds occurring 
in the same positions of the same words in the sister languages 
one can reconstruct the sound from whi h these sister sounds 
evolved. ("Same position" and "same word" may be difficult to 
define in a particular case.) For example, the word for "three" 
in various Indo-European languages is 
J "Sanskrit trayah 
Greek LMI #5 " 
Latin tres 
Gothic ~reis 
Lithuanian trys 
Old Church Slavonic tr~je 
Since five of the six forms have some kind of voiceless, dental stop 
in word initial position and that the fricative in Gothic can be 
accounted for by a change particular to Gothic, the assumption is 
made that the parent language of these, Proto-lndo-European, had 
4 
a voiceless, dental stop symbolized by *t, and this fact is 
highlighted by arranging these correspondences in tabular form: 
PIE Skr G~ Lat Go Lith OCS 
*t t ~ t ~ t t 
I 
Once the proto language has been reconstructed, the correspon- 
dences used in the reconstruction can be reinterpreted as the 
results of phonological changes, for example, Proto-lndo-European 
*t became t in Old Church Slavonic, or, as it is usually expressed 
PIE *t > OCS t 
The problem of discovering an older stage of a language 
given only a later stage of that language is approached by 
examining certain alternations in the language at the later stage 
and from these postulate a proto form from which it could have 
evolved. The alternations in the later stagewhich are usually 
chosen are in the form of morphophonemic alternations, that is, 
phonemically differently shaped forms of the same morpheme. The 
assumption is made that these irregularities in the shape of one 
n and the same morpheme must have been conditi~d regularly so that 
by postulating one proto form and accounting for the change by a 
general rule, we have successfully reconstructed that form of the 
earlier language. 
I 
The forms of the proto language reconstructed by the 
comparative method can be interpreted as a statement of the 
state of the art in reconstruction for that language family. 
5 ~ 
For example, in Modern Russian the first person singular 
present of the verb "to be able" is /mogu/ and the second person 
singular is /mo~i~/. The first singular of "to read" is /~itaju/ 
and the second singular is /~itaji~/. From these and other sets 
of verbs we would conclude that the first singular ending is /u/ 
and that of the second singular is /i~/. Therefore, the stem 
morpheme alternants must be ImogNmo~# and \[~itaj ~ ~itaj} 
The first of these two sets exhibits a morphophonemic alternation 
of g/~. This same alternation in the first person singular occurs 
in other sets of verbs when the stem ends in a velar. From this 
(and other corroborative forms) we postulate that an earlier 
stage of Russian had one form for this verb stem, namely /mog/, 
and that before the vowel /i/ /g/ later became /~/. 
The proposal in this paper is to reverse the bottom-to-top 
model of the comparative method and that of internal reconstruction 
into a top-to-bottom generative model where the input forms are 
reconstructed leKical items and the rules are the set of postulated 
sound changes for the language. But there are two major difficul- 
ties in reversing the older models. One, the documented changes 
have often been incorrectly or incompletely stated and, two, the 
relative chronology of various rules ~as not been adequately 
described. We hope to show how the computer can be used at least 
to test the accuracy of the rules and secondly to test or, hopefully, 
to help discover the relative chronology of the rules exhibited 
by their ordering. 
6 
The model )roposed here is one where the proto language is a 
set of reconstructed forms (chosen, for example, from a standard 
reference work). The rules describing the phonological changes 
in that language are then described and ordered. As the program 
operates on these forms, the output from each rule represents a 
synchronic stage in the development of that language. As final 
output one hopes to get the modern language. If any of the output 
is incorrect, then it is assumed to be from one of four possible 
sources: an incorrectly formulated rule (including analogical 
formation), a non-existent rule, an incorrectly ordered rule, 
or an incorrectly reconstructed form. Being able to differentiate 
which of these is the actual cause for the incorrect output is 
simplest only in the case where all of the output was the result 
of the application o£ only one rule. 
2.0 A sketch of the phonological ~ of Russian. The rules 
which were tested were an abridged version of a set presented by 
1 
the author in a recent paper. The rules attempt to account for 
certain aspects of the development of the phonological system of 
C~ntemporary Standard Russian from a late form of Proto-Indo- 
European. These rules were: 
I 
Kantor, Marvin and R.N. Smith, "A sketch of the major develppments 
in Russian historical phonology" (to appear). The original formu- 
lation was in terms of distinctive features; however, for this 
programmatic study a segmental notation has been used for ease of 
statement, etc. 
1. ,\] 
g~L. Lkl 
g k~ 
g~ 
2. 
3. 
8 > ,li} non-obstruent 
4, 
5. \[ {~:} 
a 
6. t k \] 
g 
x \],-If} 
7. t 
d 
S 
Z 
j> / J 
8. 
J l J f 
W 
9. 
10. 
11. 
13. 
i I°l 
Y 
b 
e 
L-~ j{1} 
! n 
r 
\[i 1 
14, 
x 
v > 15. e a 
J 
16. or oro 
ol C C ol ~ /ere/ 
er 
Lo:oj 
\[:i > L_I/ 
9 
18. w > v 
,~. ~ > ~, / ~ 
21. ~ > e 
Some of these rules appear in a less formal and explicit 
way in all standard texts of the history of the Russian language 
but none of them has all of these rules and in most cases there 
is little or no discussion of ordering° 
10 
~.0 Description of test. Approximately five hundred reconstructed 
Proto-lndo-European forms were chosen from Walde and Pokorny (1932). 
These were punched onto cards along with their English glosses. 
A separate Russian gloss was typed onto a print-out of the PIE 
forms. The transcription of Walde and Pokorny for the PIE 
lexical items was maintained as closely as possible, £ncluding 
such notations as subscript e and o. The only criterion imposed 
on choice of words was that they be as long as possible, so as to 
have a variety of environments. 
The program was written in SNOBOL4 for the CDC 6400. Each rule 
set was numbered so as to coincide with the set of rules listed 
in section 2, wlth a zero appended to each rule number so as to 
allow for later insertions. Changing a rule consists at the 
moment of simple removal and replacement of cards. The history 
of a word or set of words can be gotten by a11'owlng it to be 
processed with accompanying output generated by each rule set. 
Similarly, the lexicon for a particular stage can be generated by 
allowing the input to be processed up through the rule covering that 
stage and, if wanted, suppressing output from intermediate stages. 
With the availability of larger storage capacity the output f~em 
each stage can be generated once and stored in such a way that 
it can be referenced simply and thereby ellminate regeneration of 
input forms when the need for a rule change arises. 
Frequency counters will be added in older to measure the 
functional load of a rule, at least in terms of dictionary 
II 
frequency. How this can be incorporated meaningfully into a 
theory of language change is not clear at this time. 
The effect of borrowing can be simulated by introduction of 
lexical items just prior to a specific stage, There are too 
many variables involved in this case and the predictions have been 
poor. The effects of loss of original PIE are even more obvious 
but will require much further study. 
4.0 Discussion of results. The program is obviously language 
dependent but the basic conception is of general applicability. 
The set of rules described in section 2.0 has been programmed 
and has successfully predicted the Modern Russian form from the 
PIE input in many cases including the following: 
PIEE Hod. Russian 
*bhel~ h- boloz- 
*medhi - me~- 
*aloNg- slug- 
*ang~hi - u~ 
*~orm- sram 
*~omb h- zub 
*g~rffg h- griz- 
The number of forms of PIE which were not related to 
Modern Russian was much greater. The main reason is probably 
due to loss (again assuming a uniform parent PIE as the sole 
source of the lexicon). The differences between generated 
12 
and actual Modern Russian could be accounted for in a few instances, 
for example *apsa did not become Mod. R 'osina' in part because 
there are no explicit rules in the program for the simplification 
of consonant clusters. But other incorrect outputs can not be 
accounted for in many instances by any easily accessible, docu- 
mented rule. For example, there is no rule to handle the two 
disparate outputs from similar input forms with respect to the 
initial cluster 'sp-' in the forms 
PIE Mod. Russian 
*sperg- pr'ad- 
*sple~ h- selez- 
Similarly, there is a rule eu>u, for example, to account for 
*leut- becoming l~ud -, and others, but there are at least two 
exceptions to this r~le: 
~bhreu~k - bros- 
*bhleu - bles- 
Whether the original rule has too general an environmental 
condition or whether u was generated and later underwent some 
other changes, heretofore unpostulated, is unknown. It is 
possible that these forms should not have been considered as 
correspondences. 
No case of an error in rule ordering has been found yet for 
l 
this sample. 
Examining the output is at times a forbidding task. It may 
help to simplify the discovery of causes of error by generating 
the output of each rule in the form of a KWIC index where the 
occurrences of each phone would be grouped together and the 
13 
environments then made quite clear. When input and output are 
then compared one might find more easily why a rule was not 
applied or why it should be generalized, etc. 
5.0 Conclusions. This paper is necessarily meant to be only a 
preliminary progress report and as such has raised many other 
questions in addition to its concrete results. Some of these 
questions are very basic, in particular, since many of the output 
forms could not be accounted for, should one really attempt to 
generate forms of a modern language from reconstructed lexical 
items if the rules used are not those postulated during the 
process of reconstruction since the former should be a record 
of the latter. Given, therefore, a set of reconstructed forms 
and a separate set of rules, it becomes very difficult to 
account for the source of errors in the output. Also, the 
assumption of a uniform, single-stage ~roto language may require 
many restrictions. 
The computer can under ideal conditions be successfully 
used in testing hypothesized changes in the history of a 
language, given certain simplifying assumptions. It can be 
expected to operate best when the rules and reconstructed proto 
forms are established by the same investigator, working within 
the bounds established by a general theory of historical change. 
14 

References

Kantor, Marvin and R. N. Smith, "A sketch of the major developments 
in Russian historical phonology" (to appear). 

Lightner, Theodore M., Segmental phonology of modern standard 
Russian. MIT dissertation, 1965. 

Schachter, Paul and V. Fromkin, ~ phonology of Akan: Akuapem, 
Asante and Fante. (Working Papers in Phonetics No. 9) 
UCLA, 1968. 

Walde, A. and J. Pokorny, Ver~leichendes Worterhuch der indo- 
~ermanlschenSprachen. Berlin, 1932. 
