Alignment of Multiple Languages for Historical Comparison 
Michael A. Covington 
Artificial Intelligence Center 
The University of Georgia 
Athens, GA 30602-7415 U.S.A. 
mc@uga.edu 
Abstract 
An essential step in comparative reconstruction 
is to align corresponding phonological segments 
in the words being compared. To do this, one 
must search among huge numbers of potential 
alignments to find those that give a good pho- 
netic fit. This is a hard computational prob- 
lem, and it becomes exponentially more difficult 
when more than two strings are being aligned. 
In this paper I extend the guided-search align- 
ment algorithm of Covington (Computational 
Linguistics, 1996) to handle more than two 
strings. The resulting algorithm has been im- 
plemented in Prolog and gives reasonable results 
when tested on data from several languages. 
1 Background 
The Comparative Method for reconstructing 
languages consists of at least the following steps: 
1. Choose sets of words in the daughter lan- 
guages that appear to be cognate; 
2. Align the phonological segments that
appear to correspond (e.g., skip the [k] when
aligning German [kni:] with English [niy]
'knee');¹
3. Find regular correspondence sets (proto- 
allophones, Hoenigswald 1950); 
4. Classify the proto-allophones into proto- 
phonemes with phonological rules (sound 
laws). 
The results of each step can be used to refine
guesses made at previous steps. For example,
a regular correspondence, once discovered, can
be used to refine one's choice of alignments and
even putative cognates.

¹ These phonetic transcriptions may or may not be
phonemic. Because of the way the Comparative Method
works, synchronic allophony is, in general, factored out
along with diachronic allophony as the reconstruction
proceeds.
Parts of the Comparative Method have been
computerized by Frantz (1970), Hewson (1974),
Wimbish (1989), and Lowe and Mazaudon
(1994), but none of them have tackled the
alignment step. Covington (1996) presents a
workable alignment algorithm for comparing two
languages. In this paper I extend that algorithm
to handle more than two languages at once.
2 Multiple-string alignment 
The alignment step is hard to automate
because there are too many possible alignments
to choose from. For example, French le [lə] and
Spanish el [el] can be lined up at least three
ways:

    e l      e l -      - e l
    l ə      - l ə      l ə -

Of these, the second is etymologically correct, 
and the third would merit consideration if one 
did not know the etymology. 
The number of alignments rises exponentially 
with the length of the strings and the number 
of strings being aligned. Two ten-letter strings 
have anywhere from 26,797 to 8,079,453 differ- 
ent alignments depending on exactly what align- 
ments are considered distinct (Covington 1996, 
Covington and Canfield 1996). As for multiple
strings, if two strings have A alignments then
n strings have roughly A^(n-1) alignments,
assuming the alignments are generated by aligning
the first two strings, then aligning the third string
against the second, and so forth. In fact, the
search space isn't quite that large because some 
combinations are equivalent to others, but it is 
clearly too large to search exhaustively. 
Table 1: Evaluation metric used by Covington
(1996).

Badness   Conditions
    0     Exact match of consonants or glides
    5     Exact match of vowels (nonzero so the
          aligner will prefer to match consonants,
          given a choice)
   10     Match of 2 vowels that differ only in
          length, or [i] and [y], or [u] and [w]
   30     Match of 2 dissimilar vowels
   60     Match of 2 dissimilar consonants
  100     Match of 2 unrelated segments
   40     Skip preceded by another skip in the
          same string
   50     Skip not preceded by another skip in
          the same string

Fortunately the comparative linguist is not 
looking for all possible alignments, only the ones 
that are likely to manifest regular sound corre- 
spondences - that is, those with a reasonable 
degree of phonetic similarity. Thus, phonetic 
similarity can be used to constrain the search. 
3 Applying an evaluation metric 
The phonetic similarity criterion used by Cov- 
ington (1996) is shown in Table 1. It is obviously 
just a stand-in for a more sophisticated, per- 
haps feature-based, system of phonology. The 
algorithm computes a "badness" or "penalty" for 
each step (column) in the alignment, summing 
the values to judge the badness of the whole 
alignment, thus: 

      e     l
      l     ə
    100 + 100 = 200

      e     l     -
      -     l     ə
     50 +   0 +  50 = 100

The alignment with the lowest total badness is 
the one with the greatest phonetic similarity. 
Note that two separate skips count exactly the
same as one complete mismatch; thus the
alignments

    e      - e
    l      l -

are equally valued. In fact, a "no-alternating-skips
rule" prevents the second one from being
generated; deciding whether [e] and [l]
correspond is left for another, unstated, part of the
comparison process. I will explain below why
this is not satisfactory.
Naturally, the alignment with the best overall 
phonetic similarity is not always the etymolog- 
ically correct one, although it is usually close; 
we are looking for a good phonetic fit, not nec- 
essarily the best one. 
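
As a rough illustration of how the metric in Table 1 turns into a score, the following Python sketch computes the badness of a two-string alignment column by column. It is not the Prolog program described in this paper. The vowel inventory, the use of a trailing ":" for length, and the function names are assumptions made for the example, and scoring a glide against a non-matching consonant as "dissimilar consonants" is a guess. The penalties of 0 and 5 for exact matches follow Covington (1996).

```python
SKIP = "-"
VOWELS = set("aeiouə")            # assumed inventory, for illustration only
GLIDE_OF = {"i": "y", "u": "w"}   # vowel/glide pairs scored as near-matches


def is_vowel(seg):
    # A trailing ":" marks a long vowel (assumed notation).
    return seg.rstrip(":") in VOWELS


def badness(a, b):
    """Penalty for matching segment a with segment b, after Table 1."""
    if a == b:
        return 5 if is_vowel(a) else 0      # exact match
    base_a, base_b = a.rstrip(":"), b.rstrip(":")
    if is_vowel(a) and is_vowel(b):
        return 10 if base_a == base_b else 30
    if GLIDE_OF.get(base_a) == b or GLIDE_OF.get(base_b) == a:
        return 10                           # [i]/[y] or [u]/[w]
    if not is_vowel(a) and not is_vowel(b):
        return 60                           # dissimilar consonants
    return 100                              # vowel against consonant


def alignment_badness(top, bottom):
    """Sum the per-column penalties of a two-string alignment."""
    total = 0
    prev = (None, None)
    for a, b in zip(top, bottom):
        if SKIP in (a, b):
            row = 0 if a == SKIP else 1
            # Cheaper if the previous column had a skip in the same row.
            total += 40 if prev[row] == SKIP else 50
        else:
            total += badness(a, b)
        prev = (a, b)
    return total


print(alignment_badness("el", "lə"))     # 100 + 100
print(alignment_badness("el-", "-lə"))   # 50 + 0 + 50
```

The two calls reproduce the totals of 200 and 100 worked out above for the [el]/[lə] example.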
4 Generalizing to three or more 
languages 
When a guided search is involved, aligning 
strings from three or more languages is not sim- 
ply a matter of finding the best alignment of 
the first two, then adding a third, and then a 
fourth, and so on. Thus, an algorithm to align 
two strings cannot be used iteratively to align 
more than two. 
The reason is that the best overall alignment 
of three or more strings is not necessarily the 
best alignment of any given pair in the set. Fox 
(1995:68) gives a striking example, originally 
from Haas (1969). The best alignment of the 
Choctaw and Cree words for 'squirrel' appears 
to be: 

    Choctaw   f a n i
    Cree      - i ł u

Here the correspondence [a]:[i] is problematic.
Add the Koasati word, though, and it becomes
clear that the correct alignment is actually:

    Choctaw   - f a n i
    Koasati   i p - ł u
    Cree      i - - ł u

Any algorithm that started by finding the best 
alignment of Choctaw against Cree would miss 
this solution. 
A much better strategy is to evaluate each
column of the alignment (I'll call it a "step") before
generating the next column. That is, evaluate
the first step,

    -
    i
    i

and then the second step,

    f
    p
    -

and so on. At each step, the total badness is
computed by comparing each segment to all of
the other segments. Thus the total badness of

    a
    b
    c

is badness(a, b) + badness(b, c) + badness(a, c).
That way, no string gets aligned against another
without considering the rest of the strings in the
set.
Another detail has to do with skips.
Empirically, I found that the badness of

    f
    p
    -

comes out too high if computed as
badness(f, p) + badness(p, -) + badness(f, -);
that is, the algorithm is too reluctant to take
skips. The reason, intuitively, is that in this
alignment step, there is really only one skip,
not two separate skips (one skipping [f] and
one skipping [p]). This becomes even more
apparent when more than three strings are
being aligned.
Accordingly, when computing badness I count 
each skip only once (assessing it 50 points), 
then ignore skips when comparing the segments 
against each other. I have not implemented the 
rule from Covington (1996) that gives a reduced 
penalty for adjacent skips in the same string to 
reflect the fact that affixes tend to be contigu- 
ous. 
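
The per-step scoring just described can be sketched in Python as follows. This is an illustrative reading, not the author's Prolog code: it charges 50 points for each skip symbol in a column and then sums the pairwise badness of the remaining segments. The simplified `badness` function (no vowel length) and the vowel inventory are assumptions made for the example. Under this reading the sketch reproduces the totals of 365 and 345 reported in Section 8 for the 'field' alignments.

```python
from itertools import combinations

SKIP = "-"
VOWELS = set("aeiouə")   # assumed inventory, for illustration only


def badness(a, b):
    """Simplified Table 1 penalty for two non-skip segments."""
    a_vowel, b_vowel = a in VOWELS, b in VOWELS
    if a == b:
        return 5 if a_vowel else 0
    if a_vowel and b_vowel:
        return 30
    if {a, b} in ({"i", "y"}, {"u", "w"}):
        return 10
    return 100 if a_vowel or b_vowel else 60


def step_badness(column):
    """Badness of one column (step) across any number of strings."""
    segments = [s for s in column if s != SKIP]
    skips = len(column) - len(segments)
    # Skips are charged once each and then left out of the comparisons.
    pairs = sum(badness(a, b) for a, b in combinations(segments, 2))
    return 50 * skips + pairs


def total_badness(rows):
    """Badness of a whole alignment given as equal-length rows."""
    return sum(step_badness(col) for col in zip(*rows))


print(step_badness(("f", "p", "-")))                   # one skip plus f:p
print(total_badness(["ager--", "ag-ros", "aj-ras"]))
print(total_badness(["ager--", "ag-ros", "a-jras"]))
```

For the column f/p/- this gives 110, against 160 under the naive formula that scores the skip twice.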
5 Searching the set of alignments 
The standard way to find the best alignment of 
two strings is a matrix-based technique known 
as dynamic programming (Ukkonen 1985, Wa- 
terman 1995). However, dynamic program- 
ming cannot accommodate rules that look ahead 
along the string to recognize assimilation or 
metathesis, a possibility that needs to be left 
open when implementing comparative
reconstruction. Additionally, the generalization of
dynamic programming to multiple strings does not
appear to be an entirely solved problem (cf.
Kececioglu 1993).
Accordingly, I follow Covington (1996) in
recasting the problem as a tree search. Consider
the problem of aligning [el] with [le].
Covington (1996) treats this as a process that steps
through both strings and, at each step,
performs either a "match" (accepting a character
from both strings), a "skip-1" (skipping a
character in the first string), or a "skip-2" (skipping
a character in the second string). That results
in the search tree shown in Fig. 1 (ignoring
Covington's "no-alternating-skips rule").
The search tree can be generalized to multiple
strings by breaking up each step into a series
of operations, one on each string, as shown in
Fig. 2. Instead of three choices, match, skip-1,
and skip-2, there are really 2 × 2: accept or skip
on string 1 and then accept or skip on string
2. One of the four combinations is disallowed:
you can't have a step in which no characters are
accepted from any string.
Similarly, if there were three strings, there
would be three two-way decisions, leading to
eight (= 2^3) states, one of which would be
disallowed. Using search trees of this type, the
decisions necessary to align any number of strings
can be strung together in a satisfactory way.
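
The factoring into accept-or-skip decisions can be sketched in Python as below. The function name `moves` and the representation of a move as a tuple of booleans (True for accept, False for skip) are assumptions made for the example. Sorting by the number of skips reflects the matches-before-skips ordering used when searching.

```python
from itertools import product


def moves(n):
    """All accept/skip patterns for one step over n strings.

    The all-skip pattern is left out, since a step must accept at
    least one character. Patterns with fewer skips come first.
    """
    patterns = [p for p in product((True, False), repeat=n) if any(p)]
    return sorted(patterns, key=lambda p: p.count(False))


print(len(moves(2)))   # 2 * 2 combinations minus the disallowed one
print(len(moves(3)))   # 2 ** 3 combinations minus the disallowed one
print(moves(2))
```

In a full aligner, a pattern would also have to be rejected whenever it accepts a character from a string that has already been used up.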
6 Alternating skips 
Covington (1996) considers the alignments

    e      - e
    l      l -

equivalent and generates only the first of them,
leaving it to some later step in the comparison
process to decide whether [e] and [l] really
correspond. The rule is:

    NO-ALTERNATING-SKIPS RULE: If there is
    a skip in one string, there cannot be a skip
    in the other string at the next step.

Although this tactic narrows the search space,
I do not think it is linguistically satisfactory;
after all, aligning [e] with [l] and skipping them
in tandem are quite different linguistic claims.
Consider for example the final segment of
Spanish [dos] and Italian [due] 'two'; it is correct to
skip the [s] and the [e] in tandem because they
come from different Latin endings. It is not
historically correct to pair [s] with [e] in a
correspondence set.

[Figure 1 diagram not reproduced: from Start, each node branches three ways (match, skip in string 1, skip in string 2), except in situations where only one move is possible.]
Figure 1: Part of a 3-way-branching search tree for generating potential alignments (Covington
1996, ignoring no-alternating-skips rule).

[Figure 2 diagram not reproduced: from Start, each step (Step 1, Step 2, Step 3, ...) is split into an accept-or-skip decision while processing string 1, followed by one while processing string 2.]
Figure 2: Search tree factored into 2-way branchings with a disallowed state at each step. This tree
generalizes to handle more than 2 strings.

Also, the no-alternating-skips rule does not 
generalize easily to multiple strings. I therefore 
replace it with a different restriction: 

    ORDERED-ALTERNATING-SKIPS RULE: A
    skip can be taken in strings i and j in
    successive steps only if i ≤ j.

That lets us generate

    - e    (String 1)
    l -    (String 2)

but not

    e -
    - l

which is undeniably equivalent. It also ensures
that there is only one way of skipping several
consecutive segments; we get

    - - - a b c
    d e f - - -

but not

    - a - b - c      a b c - - -
    d - e - f -      - - - d e f

or numerous other equivalent combinations of
skips.
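
For the two-string case, the ordered-alternating-skips rule can be checked with a short Python sketch like the one below. The function name and the representation of an alignment as two equal-length strings with "-" for a skip are assumptions made for the example. How the rule applies when one step contains skips in several strings at once is not spelled out above, so the sketch does not attempt that case.

```python
SKIP = "-"


def obeys_ordered_skips(top, bottom):
    """True if no skip in string 2 is followed at once by a skip in string 1."""
    prev = None   # row skipped at the previous step (0 or 1), None after a match
    for a, b in zip(top, bottom):
        cur = 0 if a == SKIP else 1 if b == SKIP else None
        if prev is not None and cur is not None and prev > cur:
            return False   # skip in string i, then in string j, with i > j
        prev = cur
    return True


print(obeys_ordered_skips("-e", "l-"))
print(obeys_ordered_skips("e-", "-l"))
print(obeys_ordered_skips("---abc", "def---"))
print(obeys_ordered_skips("-a-b-c", "d-e-f-"))
print(obeys_ordered_skips("abc---", "---def"))
```

Of the five alignments tested, only the first and third satisfy the rule, matching the examples given above.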
7 Pruning the search 
The goal of the algorithm is, of course, to gen- 
erate not the whole search tree, but only the 
parts of it likely to contain the best alignments, 
thereby narrowing the intractably large search 
space into something manageable. 
Following Covington (1996), I implemented 
a very simple pruning strategy. The program 
keeps track of the badness of the best complete 
alignment found so far. Every branch in the 
search tree is abandoned as soon as its total bad- 
ness exceeds that value. Thus, bad alignments 
are abandoned when they have only partly been 
generated. 
A second part of the strategy is that the com- 
puter always tries matches before it tries skips. 
As a result, if not much material needs to be 
skipped, a good alignment is found very quickly. 
For example, three four-character strings have
10,536 alignments (generated my way), but
when comparing Spanish tres, French trois, and
English three,² the algorithm finds its "best"
alignment,

    t r - e s
    t r w a -
    θ r - i y

after completing only ten other alignments,
although it also pursues several hundred branches
of the tree part of the way. (Here the match of [s]
with [y] is problematic, but the computer can't
know that; it also finds a number of alternative
alignments.)

Table 2: Some alignments found by the
prototype program.

Spanish/Italian/French 'three':
    t r - e s
    t r - e -
    t r w a -
Spanish/Italian/French 'four':
    k w a - t r o
    k w a t t r o
    k - a - t r -
Spanish/Italian/French 'five':
    θ i ŋ k - o
    c i ŋ k w e
    s ɛ̃ - k - -
Koasati / Cree / Choctaw 'squirrel':
    i p - ł u
    i - - ł u
    - f a n i

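
The pruning strategy of this section can be sketched as a small depth-first search in Python. This is a simplified illustration under stated assumptions, not the Prolog prototype: the `badness` function is a cut-down version of Table 1 (no vowel length), every skip costs 50, the vowel inventory is invented for the example, and the ordered-alternating-skips rule is left out, so equivalent alignments may be visited more than once. The names `align`, `step_badness`, and `search` are hypothetical.

```python
from itertools import combinations, product

SKIP = "-"
VOWELS = set("aeiouə")   # assumed inventory, for illustration only


def badness(a, b):
    """Simplified Table 1 penalty for two non-skip segments."""
    a_vowel, b_vowel = a in VOWELS, b in VOWELS
    if a == b:
        return 5 if a_vowel else 0
    if a_vowel and b_vowel:
        return 30
    if {a, b} in ({"i", "y"}, {"u", "w"}):
        return 10
    return 100 if a_vowel or b_vowel else 60


def step_badness(column):
    """50 per skip, plus pairwise badness of the non-skip segments."""
    segments = [s for s in column if s != SKIP]
    pairs = sum(badness(a, b) for a, b in combinations(segments, 2))
    return 50 * (len(column) - len(segments)) + pairs


def align(words):
    """Return (badness, rows) for a lowest-badness alignment of words."""
    n = len(words)
    # Accept/skip patterns, fewest skips first: matches are tried before skips.
    patterns = sorted((p for p in product((True, False), repeat=n) if any(p)),
                      key=lambda p: p.count(False))
    best = {"score": float("inf"), "columns": None}

    def search(pos, columns, score):
        if score > best["score"]:
            return                      # prune: already worse than the best
        if all(pos[i] == len(words[i]) for i in range(n)):
            if score < best["score"]:
                best["score"], best["columns"] = score, list(columns)
            return
        for p in patterns:
            # A pattern cannot accept from a string that is used up.
            if any(p[i] and pos[i] == len(words[i]) for i in range(n)):
                continue
            column = tuple(words[i][pos[i]] if p[i] else SKIP
                           for i in range(n))
            new_pos = tuple(pos[i] + (1 if p[i] else 0) for i in range(n))
            columns.append(column)
            search(new_pos, columns, score + step_badness(column))
            columns.pop()

    search((0,) * n, [], 0)
    rows = ["".join(col[i] for col in best["columns"]) for i in range(n)]
    return best["score"], rows


print(align(["el", "lə"]))
```

Because a branch is abandoned only when its running total exceeds the best complete alignment so far, and penalties are never negative, the search still returns an alignment of minimal badness under the assumed metric.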
8 Results and evaluation 
The algorithm has been prototyped in LPA Pro- 
log, and Table 2 shows some of the alignments 
it found. None of these took more than five sec- 
onds on a 133-MHz Pentium, and the Prolog 
program was written for versatility, not speed. 
² Admittedly an odd set to compare because of the
different depth of branching, but they are cognates and
each has four segments.

As comparative linguists know, the alignment
that gives the best phonetic fit (by any
criterion) is not always the etymologically correct
one. This is evident with my algorithm. For
instance, comparing the Latin, Greek, and
Sanskrit words for 'field,' the algorithm finds the
correct alignment,

    a g e r - -
    a g - r o s
    a j - r a s     (badness = 365)

but then discards it in favor of a seemingly
better alignment:

    a g e r - -
    a g - r o s
    a - j r a s     (badness = 345)

It doesn't know, of course, that [g]:[j] is a
phonetically probable correspondence.
Worse, occasionally the present algorithm 
doesn't consider the etymologically correct 
alignment at all because something that looks 
better has already been found. For example, 
taking the Avestan, Greek, and Latin words for 
'100', the algorithm settles on 
--satom 
hekaton 
ken-tum (badness 610) 
without ever considering the etymologically cor- 
rect alignment: 
--sa-tom 
heka-ton 
--kentum (badness 690) 
The penalties for skips may still be too high 
here, but the real problem is, of course, that the 
algorithm is looking for the one best alignment, 
and that's not what comparative reconstruction 
needs. Instead, the computer should prune the 
search tree less eagerly, pursuing any alignment 
whose badness is, say, no more than 120% of 
the lowest found so far, and delivering all solu- 
tions that are reasonably close to the best one 
found during the entire procedure. Indeed, the 
availability of multiple potential alignments is 
the keystone of Kay's (1964) proposal to imple- 
ment the Comparative Method, which could not 
be implemented at the time Kay proposed it be- 
cause of the lack of an efficient search algorithm. 
The requisite modification is easily made and I 
plan to pursue it in subsequent work. 
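
As a small worked illustration of the proposed tolerance, the Python sketch below keeps every alignment whose badness is within 120% of the lowest one. It shows only the final filtering of completed alignments; relaxing the pruning inside the search itself is the modification left for later work. The function name `near_best` and the list-of-pairs input format are assumptions made for the example.

```python
def near_best(scored, tolerance=1.20):
    """Keep (badness, alignment) pairs within tolerance of the lowest badness."""
    best = min(score for score, _ in scored)
    return [(s, a) for s, a in scored if s <= tolerance * best]


# Badness values reported above for 'field' and '100'.
field = [(365, "correct"), (345, "preferred")]
hundred = [(610, "preferred"), (690, "correct")]

print(near_best(field))     # cutoff is 1.2 * 345 = 414
print(near_best(hundred))   # cutoff is 1.2 * 610 = 732
```

With a 120% tolerance, the etymologically correct alignment survives in both examples: 365 is under the cutoff of 414, and 690 is under the cutoff of 732.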

References 
Covington, Michael A. (1996) An algorithm to 
align words for historical comparison. Com- 
putational linguistics 22:481-496. 
Covington, Michael A., and Canfield, E. Rodney 
(1996) The number of distinct alignments of 
two strings. Unpublished manuscript, Univer- 
sity of Georgia. 
Fox, Anthony (1995) Linguistic reconstruction: 
an introduction to theory and method. Oxford: 
Oxford University Press. 
Frantz, Donald G. (1970) A PL/1 program to 
assist the comparative linguist. Communica- 
tions of the ACM 13:353-356. 
Haas, Mary R. (1969) The prehistory of lan- 
guages. The Hague: Mouton. 
Hewson, John (1974) Comparative reconstruc- 
tion on the computer. John M. Anderson and 
Charles Jones, eds., Historical linguistics I: 
syntax, morphology, internal and comparative 
reconstruction, 191-197. Amsterdam: North 
Holland. 
Hoenigswald, Henry (1950) The principal step 
in comparative grammar. Language 26:357- 
364. Reprinted in Martin Joos, ed., Readings 
in Linguistics I, 4th ed., 298-302. Chicago: 
University of Chicago Press, 1966. 
Kay, Martin (1964) The logic of cognate
recognition in historical linguistics. (Memorandum
RM-4224-PR.) Santa Monica: The RAND 
Corporation. 
Kececioglu, John (1993) The maximum weight 
trace problem in multiple sequence alignment. 
Combinatorial pattern matching: 4th annual 
symposium, ed. A. Apostolico et al., 106-119. 
Berlin: Springer. 
Lowe, John B., and Mazaudon, Martine (1994) 
The Reconstruction Engine: a computer im- 
plementation of the comparative method. 
Computational Linguistics 20:381-417. 
Ukkonen, Esko (1985) Algorithms for approxi- 
mate string matching. Information and Con- 
trol 64:100-118. 
Waterman, Michael S. (1995) Introduction to 
computational biology: maps, sequences and 
genomes. London: Chapman & Hall. 
Wimbish, John S. (1989) WORDSURV: a pro- 
gram for analyzing language survey word lists. 
Dallas: Summer Institute of Linguistics. 
