<?xml version="1.0" standalone="yes"?> <Paper uid="W06-1108"> <Title>Evaluation of String Distance Algorithms for Dialectology</Title> <Section position="8" start_page="60" end_page="61" type="concl"> <SectionTitle> 6 Conclusions and Prospects </SectionTitle> <Paragraph position="0"> In this paper we examined a range of string comparison algorithms by applying them to Norwegian and German dialect comparison. The Norwegian results suggest that sensitivity to linguistic context in the form of n-grams, and to linguistic structure in alignment, improves measurement techniques, but they do not confirm the value of differential weighting for n-grams. The results mostly suggest that sensitivity to the order of segments improves the measurements.</Paragraph> <Paragraph position="1"> The larger German data set is unfortunately more recalcitrant (as are other data sets we have examined, but in which we have less confidence). A disadvantage of the German data may be that several transcribers were involved, working over a period of twenty years, and that two types of surveys were used, with sentences in different orders. There may be subtle differences in pronunciation as a result of subjects' becoming more relaxed or more impatient in the course of a survey interview.</Paragraph> <Paragraph position="2"> On the other hand, the Norwegian data set is small (15 dialect sites). Our conclusions rely on assumptions about its quality and transcriber consistency, and these assumptions warrant further examination. We also cannot exclude the possibility that optimal measurements depend on features of the language and/or data set.</Paragraph> <Paragraph position="3"> It is tempting to redo this study using a large, antiseptically clean data set, transcribed reliably by a minimal number of phoneticians, but the more important practical direction may be to try to understand which properties of data sets are important in selecting variants of pronunciation distance measures. 
Atlases of material on language varieties simply are not always clean and reliable, and if we wish to contribute to their analysis, we must keep this in mind.</Paragraph> </Section> </Paper>