<?xml version="1.0" standalone="yes"?>
<Paper uid="W05-0606">
<Title>Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models</Title>
<Section position="8" start_page="44" end_page="45" type="evalu">
<SectionTitle> 6 Experimental Results </SectionTitle>
<Paragraph position="0"> In this section, we first report on the effect of model variations on the overall performance, and then we compare the best results for each algorithm.</Paragraph>
<Section position="1" start_page="44" end_page="45" type="sub_section">
<SectionTitle> 6.1 Model Variations </SectionTitle>
<Paragraph position="0"> Table 1 shows the average cognate recognition precision on the test set for a number of model variations combined with the four basic algorithms, VIT, FOR, LOG, and FLO, which were introduced in Section 4.1. (Footnote 5: The complete separation of training and testing data is difficult to achieve in this case because of the similarity of cognates across languages in the same family. For each of the removed languages, there are other closely related languages that are retained in the training set, which may exhibit similar or even identical correspondences.) [Table 1: average cognate recognition precision for each model and algorithm combination.] The first row refers to the fully trained model without changes. The remaining rows contain the results for the model variations described in Section 4.2. In all cases, the simplifications are in effect during testing only, after the full model has been trained. We also performed experiments with the model simplified prior to training, but their results were consistently lower than the results presented here.</Paragraph>
<Paragraph position="1"> With the exception of the forward log odds algorithm, the best results are obtained with simplified models. The model with only a single transition parameter performs particularly well. On the other hand, the removal of the end state invariably causes a decrease in performance with respect to the full model. If a non-essential part of the model is made constant, only the Viterbi-based log odds algorithm improves significantly; the performance of the other three algorithms either deteriorates or shows no significant difference.</Paragraph>
<Paragraph position="2"> Overall, the top four variations of the Viterbi-based log odds algorithm (shown in italics in Table 1) significantly outperform all other PHMM variations and algorithms. This is not entirely unexpected, as LOG is a more complex algorithm than both VIT and FOR. It appears that the incorporation of the random model allows LOG to better distinguish true similarity from chance similarity. In addition, the log odds algorithms automatically normalize the results based on the lengths of the words under examination. However, from the rather disappointing performance of FLO, we conclude that considering all possible alignments does not help the log odds approach.</Paragraph>
</Section>
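The LOG algorithm itself is defined in Section 4.1 and is not reproduced here; the following is only a minimal sketch of the log odds idea it relies on: score the best (Viterbi) alignment under a similarity model relative to a random model that generates the two words independently. The function and parameter names (`viterbi_log_odds`, `sub_logodds`, `gap_logodds`) and the default penalties are hypothetical placeholders, and the transition and end-state probabilities of the full PHMM are omitted.

```python
def viterbi_log_odds(x, y, sub_logodds, gap_logodds):
    """Best-alignment log odds score for a word pair (simplified sketch).

    sub_logodds[(a, b)]: log P(a, b | match state) - log(q(a) * q(b))
    gap_logodds[c]:      log P(c | gap state)      - log q(c)
    where q(.) is the random (independence) model.  The transition and
    end-state probabilities of the full PHMM are omitted in this sketch.
    """
    n, m = len(x), len(y)
    NEG = float("-inf")
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if i < n and j < m:  # emit a pair of symbols (match state)
                s = dp[i][j] + sub_logodds.get((x[i], y[j]), -4.0)  # hypothetical default
                dp[i + 1][j + 1] = max(dp[i + 1][j + 1], s)
            if i < n:            # emit a symbol of x against a gap
                dp[i + 1][j] = max(dp[i + 1][j], dp[i][j] + gap_logodds.get(x[i], -2.0))
            if j < m:            # emit a symbol of y against a gap
                dp[i][j + 1] = max(dp[i][j + 1], dp[i][j] + gap_logodds.get(y[j], -2.0))
    # Because every emission is scored relative to the random model, longer
    # word pairs are not favoured simply for being longer, which is the
    # length normalization effect mentioned above.
    return dp[n][m]
```
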
<Section position="2" start_page="45" end_page="45" type="sub_section">
<SectionTitle> 6.2 Comparison </SectionTitle>
<Paragraph position="0"> Table 2 contains the results of the best variants, which are shown in boldface in Table 1, along with other methods for comparison. The results are separated into individual language pairs from the test set. For the baseline method, we selected the Longest Common Subsequence Ratio (LCSR), a measure of orthographic word similarity often used for cognate identification (Brew and McKelvie, 1996; Melamed, 1999; Koehn and Knight, 2001). The LCSR of two words is computed by dividing the length of their longest common subsequence by the length of the longer word. LLW stands for &quot;Levenshtein with learned weights&quot;, which is described in Section 4.4. We also include the results obtained by the ALINE word aligner (Kondrak, 2000) on phonetically transcribed word lists.</Paragraph>
<Paragraph position="1"> Because of the relatively small size of the lists, the differences among the results for individual language pairs are not statistically significant in many cases. However, when the average over all language pairs is considered, the Viterbi-based log odds algorithm (LOG) is significantly better than all the other algorithms in Table 2. The differences between the remaining algorithms are not statistically significant, except that they all significantly outperform the LCSR baseline.</Paragraph>
<Paragraph position="2"> The fact that LOG is significantly better than ALINE demonstrates that, given a sufficiently large training set, an HMM-based algorithm can automatically learn the notion of phonetic similarity that is incorporated into ALINE. ALINE does not involve extensive supervised training, but it requires the words to be in phonetic, rather than orthographic, form. We conjecture that the performance of LOG would further improve if it could be trained on phonetically transcribed multilingual data.</Paragraph>
</Section>
</Section>
</Paper>
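The LCSR baseline described in Section 6.2 follows directly from its definition (LCS length divided by the length of the longer word). The sketch below is only an illustration of that definition; the function names and the example word pair are illustrative and not taken from the paper's data.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def lcsr(a, b):
    """Longest Common Subsequence Ratio: LCS length divided by the length
    of the longer word (the baseline measure used in Table 2)."""
    if not a and not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


# Illustrative cognate pair (not from the paper's test data):
# LCS of "couleur" and "colour" is "colur" (length 5), so LCSR = 5/7 ~= 0.71.
print(lcsr("couleur", "colour"))
```
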