File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/01/p01-1053_concl.xml
Size: 2,657 bytes
Last Modified: 2025-10-06 13:53:06
<?xml version="1.0" standalone="yes"?> <Paper uid="P01-1053"> <Title>Automatic Detection of Syllable Boundaries Combining the Advantages of Treebank and Bracketed Corpora Training</Title> <Section position="7" start_page="2" end_page="2" type="concl"> <SectionTitle> 6 Discussion </SectionTitle> <Paragraph position="0"> We presented an approach to supervised learning and automatic detection of syllable boundaries that combines the advantages of treebank training (TT) and bracketed corpora training (BCT). The method exploits the advantages of BCT by using the brackets of a pronunciation dictionary, which results in an unambiguous analysis. Furthermore, a manually constructed linguistic grammar admits the use of maximal linguistic knowledge. The advantages of TT are exploited as well: a simple estimation procedure and a definite analysis of a given phoneme string. Compared with the treebank grammar, our approach yields high word accuracy with linguistically motivated grammars trained on small corpora. The more linguistic knowledge is added to the grammar, the higher its accuracy. The best model achieved a word accuracy rate of 96.4% (word accuracy is a harder criterion than syllable accuracy).</Paragraph> <Paragraph position="1"> Comparing this performance with other systems is difficult: (i) hardly any quantitative syllabification performance data is available for German; (ii) comparisons across languages are hard to interpret; (iii) comparisons across different approaches require cautious interpretation. Nevertheless, we refer to several approaches that have examined the syllabification task. The most direct point of comparison is the method presented by Muller (to appear 2001). In one of her experiments, the standard probability model was applied to a syllabification task, yielding about 89.9% accuracy. However, that figure measures syllable boundary accuracy, not word accuracy. 
Van den Bosch (1997) investigated the syllabification task with five inductive learning algorithms. He reported a word-level generalisation error of 2.22% on English data. However, in German (as well as in Dutch and the Scandinavian languages), compounding by concatenating word forms is an extremely productive process. Thus, the syllabification task is much more difficult in German than in English. Daelemans and van den Bosch (1992) report 96% accuracy in finding syllable boundaries for Dutch with a backpropagation learning algorithm. Vroomen et al. (1998) report a syllable boundary accuracy of 92.6%, obtained by measuring the sonority profile of syllables. In future work, we plan to apply our method to a variety of other languages.</Paragraph> </Section> </Paper>