<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1043">
<Title>Alignment of Multiple Languages for Historical Comparison</Title>
<Section position="3" start_page="275" end_page="275" type="intro">
<SectionTitle> 2 Multiple-string alignment </SectionTitle>
<Paragraph position="0"> The alignment step is hard to automate because there are too many possible alignments to choose from. For example, French le [lə] and Spanish el [el] can be lined up at least three ways:
el    el-    -el
lə    -lə    lə-
Of these, the second is etymologically correct, and the third would merit consideration if one did not know the etymology.</Paragraph>
<Paragraph position="1"> The number of alignments rises exponentially with the length of the strings and the number of strings being aligned. Two ten-letter strings have anywhere from 26,797 to 8,079,453 different alignments depending on exactly what alignments are considered distinct (Covington 1996, Covington and Canfield 1996). As for multiple strings, if two strings have A alignments then n strings have roughly A^(n-1) alignments, assuming the alignments are generated by aligning the first two strings, then aligning the third string against the second, and so forth. In fact, the search space isn't quite that large because some combinations are equivalent to others, but it is clearly too large to search exhaustively.</Paragraph>
<Paragraph position="2"> [Table fragment: ... aligner will prefer to match consonants, given a choice); match of 2 vowels that differ only in length, or [i] and [y], or [u] and [w]; skip preceded by another skip in the same string; skip not preceded by another skip in the same string.] Fortunately the comparative linguist is not looking for all possible alignments, only the ones that are likely to manifest regular sound correspondences - that is, those with a reasonable degree of phonetic similarity. Thus, phonetic similarity can be used to constrain the search.</Paragraph>
</Section>
</Paper>