<?xml version="1.0" standalone="yes"?>
<Paper uid="A97-1009">
  <Title>Name pronunciation in German text-to-speech synthesis</Title>
  <Section position="6" start_page="54" end_page="55" type="concl">
    <SectionTitle>
6 Discussion and future work
</SectionTitle>
    <Paragraph position="0"> After the evaluation, the name analysis transducer was integrated into the text analysis component of the German TTS system. The weights were adjusted so that, for any token (i.e., word or word form) in the input text, an immediate match in the lexicon is always favored over name analysis, which in turn is preferred to unknown word analysis. Even though the evaluation experiments reported in this paper were performed on names in isolation rather than in sentential contexts, the error rates obtained in these experiments (Table 2) correspond to the performance on names by the integrated text analysis component for arbitrary text.</Paragraph>
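The priority ordering described above (lexicon match, then name analysis, then unknown word analysis) can be illustrated with a minimal sketch. It assumes a cost-based scheme in which the lowest total cost wins; the cost values, the function name pick_analysis, and the placeholder transcriptions are hypothetical and are not taken from the system itself.

```python
# Hypothetical per-source costs; lower cost wins. The values are illustrative
# assumptions, not the weights used in the actual system.
SOURCE_COST = {"lexicon": 0.0, "name": 1.0, "unknown": 2.0}


def pick_analysis(candidates):
    """Return the candidate analysis with the lowest total cost.

    Each candidate is a (source, transcription, path_cost) tuple, where
    path_cost is an assumed within-source cost in the range [0, 1).
    """
    return min(candidates, key=lambda c: SOURCE_COST[c[0]] + c[2])


# Toy token with three competing analyses (transcriptions are placeholders).
candidates = [
    ("unknown", "analysis-from-unknown-word-model", 0.10),
    ("name", "analysis-from-name-transducer", 0.40),
    ("lexicon", "analysis-from-lexicon", 0.90),
]
best = pick_analysis(candidates)
print(best[0])
```

Because the assumed source costs are spaced one unit apart and the within-source costs stay below one, a lexicon match beats any name analysis, and a name analysis beats any unknown word analysis, regardless of the within-source cost.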
    <Paragraph position="1"> There are two ways of interpreting the results. On the one hand, despite a significant improvement over the previous general-purpose text analysis we have to expect a pronunciation error rate of 11-13% for unknown names. In other words, roughly one out of eight names will be pronounced incorrectly.</Paragraph>
    <Paragraph position="2"> On the other hand, this performance compares rather favorably with the results reported for the German branch of the European Onomastica project (Onomastica, 1995). Onomastica was funded by the European Community from 1993 to 1995 and aimed to produce pronunciation dictionaries of proper names and place names in eleven languages. The final report describes the performance of grapheme-to-phoneme rule sets developed for each language. For German, the accuracy rate for quality band III (names that were transcribed by rule only) was 71%; in other words, the error rate in the same sense as used in this paper was 29%. The grapheme-to-phoneme conversion rules were written by experts, based on tens of thousands of the most frequent names that were manually transcribed by an expert phonetician.</Paragraph>
    <Paragraph position="3"> However, the Onomastica results can only serve as a qualitative point of reference and should not be compared to our results in a strictly quantitative sense, for the following reasons. First, the percentage of proper names is likely to be much higher in the Onomastica database (no numbers are given in the report), in which case higher error rates should be expected due to the inherent difficulty of proper name pronunciation. In our study, proper names were only covered in the context of street names.</Paragraph>
    <Paragraph position="4"> Second, Onomastica did not apply morphological analysis to names, while morphological decomposition, and word and syllable models, are the core of our approach. Third, Onomastica developed name-specific grapheme-to-phoneme rule sets, whereas we did not augment the general-purpose pronunciation rules.</Paragraph>
    <Paragraph position="5"> How can the remaining problems be solved, and what are the topics for future work? For the task of grapheme-to-phoneme conversion, several approaches have been proposed as alternatives to explicit rule systems, particularly self-learning methods (van Coile, 1990; Torkkola, 1993; Andersen and Dalsgaard, 1994) and neural networks (Sejnowski and Rosenberg, 1987; An et al., 1988). None of these methods were explored or applied in the present study. One reason is that it is difficult to construct or select a database if the set of factors that influence name pronunciation is at least partially unknown. In addition, even for an initially incomplete factor set, the corresponding feature space is likely to cause coverage problems; neural nets, for instance, are known to perform rather poorly at predicting unseen feature vectors. However, with the results of the error analysis as a starting point, we feel that a definition of the factor set is now more feasible.</Paragraph>
    <Paragraph position="6"> One obvious area for improvement is to add a name-specific set of pronunciation rules to the general-purpose one. Using this approach, Belhoula (Belhoula, 1993) reports error rates of 4.3% for German place names and 10% for last names. These results were obtained in recall tests on a manually transcribed training corpus; it remains unclear, however, whether the error rates are reported by letter or by word.</Paragraph>
    <Paragraph position="7"> The addition of name-specific rules presupposes that the system knows which orthographic strings are names and which are regular words. The problem of name detection in arbitrary text (see (Thielen, 1995) for an approach to German name tagging) has not been addressed in our study; instead, it was bypassed for the time being by integrating the name component into the general text analysis system and by adjusting the weights appropriately.</Paragraph>
    <Paragraph position="8"> Other areas for future work are the systematic treatment of proper names outside the context of street names, and of brand names, trademarks, and company names. One important consideration here is the recognition of the ethnic origin of a name and the application of appropriate specific pronunciation rules. Heuristics, such as name pronunciation by analogy and rhyming (Coker, Church, and Liberman, 1990), and methods for, e.g., syllabic stress assignment (Church, 1986), can serve as role models for this ambitious task.</Paragraph>
  </Section>
</Paper>