<?xml version="1.0" standalone="yes"?> <Paper uid="P03-1062"> <Title>Learning to predict pitch accents and prosodic boundaries in Dutch</Title> <Section position="5" start_page="0" end_page="0" type="concl"> <SectionTitle> 5 Conclusion </SectionTitle> <Paragraph position="0"> Using shallow features as input, we trained machine learning algorithms to predict the placement of pitch accents and prosodic breaks in Dutch text, a capability a TTS system needs in order to produce synthetic speech with good prosody. Both algorithms, the memory-based classifier MBL and the decision tree inducer CART, were automatically optimized by an Iterative Deepening procedure, a classifier wrapper technique with progressive sampling of training data. MBL significantly outperforms CART on both tasks, as well as on the combined task (predicting accents and breaks simultaneously). This again indicates that it is advantageous to retain individual instances in memory (MBL) rather than to discard outlier cases as noise (CART).</Paragraph> <Paragraph position="1"> Training on both tasks simultaneously, in one model rather than divided over two, results in generalization accuracies similar to those of the individually learned models (identical on accent placement, and slightly lower on break placement).</Paragraph> <Paragraph position="2"> This shows that learning one task does not seriously hinder learning the other. From a practical point of view, it means that a TTS developer can rely on one system for both tasks instead of two.</Paragraph> <Paragraph position="3"> Pitch accent placement can be learned from shallow input features with fair accuracy. Break insertion seems a harder task, certainly in view of the informed punctuation baseline PUNC-rule. Precision is especially hard to achieve when inserting breaks at points other than those already indicated by commas and other 'pseudo-prosodic' orthographic markup. 
This may be due to a lack of crucial information in the shallow features or to inherent limitations of the ML algorithms, but it may equally point to a certain amount of optionality or personal preference, which puts an upper bound on what can be achieved in break prediction (Koehn et al., 2000).</Paragraph> <Paragraph position="4"> We plan to integrate the placement of pitch accents and breaks into a TTS system for Dutch, which will enable closed-loop annotation of more data using the TTS itself, as well as on-line (active) learning.</Paragraph> <Paragraph position="5"> Moreover, we plan to investigate the perceptual cost of false insertions and deletions of accents and breaks in experiments with human listeners.</Paragraph> </Section></Paper>