<?xml version="1.0" standalone="yes"?>
<Paper uid="P00-1029">
  <Title>Inducing Probabilistic Syllable Classes Using Multivariate Clustering</Title>
  <Section position="6" start_page="55" end_page="55" type="concl">
    <SectionTitle>
5 Discussion
</SectionTitle>
    <Paragraph position="0"> We have presented an approach to unsupervised learning and automatic detection of syllable structure, using EM-based multivariate clustering. The method yields phonologically meaningful syllable classes. These classes are shown to provide valuable input information in a grapheme-to-phoneme (g2p) conversion task.</Paragraph>
    <Paragraph position="1"> In contrast to the application of two-dimensional EM-based clustering to syntax (Rooth et al., 1999), where semantic relations were revealed between verbs and objects, the syllable models cannot a priori be expected to yield similarly meaningful properties. This is because the syllable constituents (or phones) represent an inventory with a small number of units which can be combined to form meaningful larger units, viz. morphemes and words, but which do not themselves carry meaning. Thus, there is no reason why certain syllable types should occur significantly more often than others, except for the fact that certain morphemes and words have a higher frequency count than others in a given text corpus. As discussed in Section 4.2, however, we do find some interesting properties of syllable classes, some of which apparently represent high-frequency function words and productive affixes, while others are typically found in lexical content words. Subjected to a pseudo-disambiguation task (Section 4.1), the 3-dimensional models confirm the intuition that the onset is the most variable part of the syllable.</Paragraph>
    <Paragraph position="2"> 45 and 95 words could not be disambiguated by the 3- and 5-dimensional empirical distributions, respectively. The reported relatively small gains can be explained by the fact that our syllable models were applied only to this small number of ambiguous words.</Paragraph>
    <Paragraph position="3"> In a feasibility study we applied the 5-dimensional syllable model obtained for German to a g2p conversion task. Automatic conversion of a string of characters, i.e. a word, into a string of phonemes, i.e. its pronunciation, is essential for applications such as speech synthesis from unrestricted text input, which can be expected to contain words that are not in the system's pronunciation dictionary or otherwise unknown to the system. The main purpose of the feasibility study was to demonstrate the relevance of the phonological information on syllable structure for g2p conversion. Therefore, information and probabilities derived from an alignment of grapheme and phoneme strings, i.e. the lowest two levels in the trees displayed in Figure 6, were deliberately ignored. Data-driven pronunciation systems usually rely on training data that include an alignment of graphemes and phonemes. Damper et al. (1999) have shown that the use of unaligned training data significantly reduces the performance of g2p systems. In our experiment, with training on unannotated text corpora and without an alignment of graphemes and phonemes, we obtained a word accuracy rate of 75.3% for the 5-dimensional German syllable model.</Paragraph>
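    As an illustration of the evaluation measure referred to above, word accuracy can be sketched as the fraction of test words whose entire predicted phoneme string matches the reference pronunciation. The following is a minimal sketch under that assumption; the function name word_accuracy and the toy transcriptions are hypothetical and are not taken from our experiments.

```python
def word_accuracy(predicted, reference):
    """Fraction of words whose predicted phoneme string equals the reference.

    Assumption: a word counts as correct only if the whole phoneme
    string matches; partial matches receive no credit.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("the test set must not be empty")
    # Count exact string matches, word by word.
    correct = sum(1 for p, r in zip(predicted, reference) if p == r)
    return correct / len(reference)


# Hypothetical toy test set (illustrative transcriptions only).
reference = ["ta:k", "hUnt", "kInt", "ba:n"]
predicted = ["ta:k", "hUnt", "kInd", "ba:n"]

accuracy = word_accuracy(predicted, reference)
print(f"word accuracy: {accuracy:.1%}")
```

    Under this definition a single wrong phoneme makes the whole word count as an error, which is one reason why word accuracy figures are typically lower than phoneme-level figures for the same system.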
    <Paragraph position="4"> Comparison of this performance with other systems is difficult: (i) hardly any quantitative g2p performance data are available for German; (ii) comparisons across languages are hard to interpret; (iii) comparisons across different approaches require cautious interpretation. The most direct point of comparison is the method presented by Muller (2000). In one of her experiments, the standard probability model was applied to the hand-crafted cfg presented in this paper, yielding 42% word accuracy as evaluated on our test set. Running the test set through the pronunciation rule system of the IMS German Festival TTS system (Mohler, 1999) resulted in 55% word accuracy. The Bell Labs German TTS system (Mobius, 1999) performed at better than 94% word accuracy on our test set. This TTS system relies on an annotation of morphological structure for the words in its lexicon and it performs a morphological analysis of unknown words (Mobius, 1998); the pronunciation rules draw on this structural information. These comparative results emphasize the value of phonotactic knowledge and information on syllable structure and morphological structure for g2p conversion.</Paragraph>
    <Paragraph position="5"> In a comparison across languages, the word accuracy rate of 75.3% for our 5-dimensional German syllable model is slightly higher than the 72% reported for the best data-driven method for English (Damper et al., 1999). Recently, Bouma (2000) has reported a word accuracy of 92.6% for Dutch, using a 'lazy' training strategy on data aligned with the correct phoneme string, and a hand-crafted system that relied on a large set of rule templates and a many-to-one mapping of characters to graphemes preceding the actual g2p conversion.</Paragraph>
    <Paragraph position="6"> We are confident that a judicious combination of phonological information of the type employed in our feasibility study with standard techniques such as g2p alignment of training data will produce a pronunciation system with a word accuracy that matches the one reported by Bouma (2000). We believe, however, that for an optimally performing system, as is desired for TTS, an even more complex design will have to be adopted.</Paragraph>
    <Paragraph position="7"> In many languages, including English, German and Dutch, access to morphological and phonological information is required to reliably predict the pronunciation of words; this view is further supported by the performance of the Bell Labs system, which relies on precisely this type of information. We agree with Sproat (1998, p. 77) that it is unrealistic to expect optimal results from a system that has no access to this type of information or is trained on data that are insufficient for the task.</Paragraph>
  </Section>
</Paper>