<?xml version="1.0" standalone="yes"?> <Paper uid="C02-1016"> <Title>Determining Recurrent Sound Correspondences by Inducing Translation Models</Title> <Section position="3" start_page="0" end_page="2" type="intro"> <SectionTitle> 2 Related work </SectionTitle> <Paragraph position="0"> In a schematic description of the comparative method, the two steps that precede the determination of correspondences are the identification of cognate pairs (Kondrak, 2001), and their phonetic alignment (Kondrak, 2000). Indeed, if a comprehensive set of correctly aligned cognate pairs were available, the correspondences could be extracted by simply following the alignment links. Unfortunately, in order to make reliable judgments of cognation, it is necessary to know in advance what the correspondences are. Historical linguists solve this apparent circularity by guessing a small number of likely cognates and refining the set of correspondences and cognates in an iterative fashion.</Paragraph> <Paragraph position="1"> Guy (1994) outlines an algorithm for identifying cognates in bilingual wordlists which is based on correspondences. The algorithm estimates the probability of phoneme correspondences by employing a variant of the χ² statistic on a contingency table, which indicates how often two phonemes co-occur in words of the same meaning. The probabilities are then converted into estimates of cognation by means of some experimentation-based heuristics. The paper does not contain any evaluation on authentic language data, but Guy's program COGNATE, which implements the algorithm, is publicly available. An experimental evaluation of COGNATE is described in Section 6.</Paragraph> <Paragraph position="2"> Oakes (2000) describes a set of programs that together perform several steps of the comparative method, from the determination of correspondences in wordlists to the actual reconstruction of the protoforms.
Word pairs are considered cognate if their edit distance is below a certain threshold. The edit operations cover a number of sound-change categories. Sound correspondences are deemed to be regular if they are found to occur more than once in the data. The paper describes experimental results of running the programs on a set of wordlists representing four Indonesian languages, and compares those to the reconstructions found in the linguistic literature. Section 6 contains an evaluation of one of the programs in the set, JAKARTA, on the cognate identification task.</Paragraph> </Section> </Paper>