File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/evalu/00/p00-1029_evalu.xml
Size: 9,901 bytes
Last Modified: 2025-10-06 13:58:38
<?xml version="1.0" standalone="yes"?> <Paper uid="P00-1029"> <Title>Inducing Probabilistic Syllable Classes Using Multivariate Clustering</Title> <Section position="5" start_page="55" end_page="55" type="evalu"> <SectionTitle> 4 Evaluation </SectionTitle> <Paragraph position="0"> In the following sections, (i) the 3-dimensional models are subjected to a pseudo-disambiguation task (4.1); (ii) the syllable classes are qualitatively evaluated (4.2); and (iii) the 5-dimensional syllable model for German is tested in a g2p task (4.3).</Paragraph> <Section position="1" start_page="55" end_page="55" type="sub_section"> <SectionTitle> 4.1 Pseudo-Disambiguation </SectionTitle> <Paragraph position="0"> We evaluated our 3-dimensional clustering models on a pseudo-disambiguation task similar to the one described by Rooth et al. (1999), but specific to onset, nucleus, and coda ambiguity. The first task is to judge which of two onsets, on and on′, is more likely to appear in the context of a given nucleus n and a given coda cod. For this purpose, we constructed an evaluation corpus of 3000 syllables (on, n, cod) selected from the original data. Then, randomly chosen onsets on′ were attached to all syllables in the evaluation corpus, with the resulting syllables</Paragraph> <Paragraph position="2"> (on′, n, cod) appearing neither in the training nor in the evaluation corpus. Furthermore, the elements on, n, cod, and on′ were required to be part of the training corpus. Clustering models were parameterized in (up to 10) starting values of EM-training and in the number of classes of the model (up to 200), resulting in a sequence of 10 × 20 models. Accuracy was calculated as the number of times the model decided p(on, n, cod) ≥ p(on′, n, cod) for all choices made. Two similar tasks were designed for nucleus and coda. Results for the best starting values are shown in Figure 4. Models of 12 classes show the highest accuracy rates. 
For German we reached accuracy rates of 88-90% (nucleus and coda) and 77% (onset). For English we achieved accuracy rates of 92% (coda), 84% (nucleus), and 76% (onset). The results of the pseudo-disambiguation agree with intuition: in both languages (i) the onset is the most variable part of the syllable, as it is easy to find minimal pairs that vary in the onset, and (ii) it is easier to predict the coda and nucleus, as their choice is more restricted.</Paragraph> </Section> <Section position="2" start_page="55" end_page="55" type="sub_section"> <SectionTitle> 4.2 Qualitative Evaluation </SectionTitle> <Paragraph position="0"> The following discussion is restricted to the 5-dimensional syllable models, as the quality of the output increased when more dimensions were added. We can look at the results from different angles. For instance, we can verify whether any of the classes are mainly representatives of a syllable class pertinent to a particular nucleus (as is the case with the 3-dimensional models). Another interesting aspect is whether there are syllable classes that represent parts of lexical content words, as opposed to high-frequency function words. Finally, some syllable classes may correspond to productive affixes.</Paragraph> <Paragraph position="1"> class 4 0.032</Paragraph> <Paragraph position="3"> German. The majority of syllable classes obtained for German is dominated by one particular nucleus per syllable class. In 24 out of 50 classes the probability of the dominant nucleus is greater than 99%, and in 9 cases it is indeed 100%. The only syllable nuclei that do not dominate any class are the front rounded vowels /y:, Y, 2:, 9/, the front vowel /E:/ and the diphthong /OY/, all of which are among the least frequently occurring nuclei in the lexicon of German. Figure 5 depicts the classes that will be discussed now.</Paragraph> <Paragraph position="4"> Almost one third (28%) of the 50 classes are representatives of high-frequency function words. 
For example, class #7 is dominated by the function words in, ich, ist, im, sind, sich, all of which contain the short vowel /I/.</Paragraph> <Paragraph position="5"> Another 32% of the 50 classes represent syllables that are most likely to occur in initial, medial and final positions in the open word classes of the lexicon, i.e. nouns, adjectives, and verbs. Class #4 covers several lexical entries involving the diphthong /aI/, mostly in stressed word-initial syllables. Class #40 provides complementary information, as it also includes syllables containing /aI/, but here mostly in word-medial position.</Paragraph> <Paragraph position="6"> We also observe syllable classes that represent productive prefixes (e.g., ver-, er-, zer-, vor-, her- in class #26) and suffixes (e.g., -lich, -ig in class #34). Finally, there are two syllable classes (not displayed) that cover the most common inflectional suffixes involving the vowel /@/ (schwa).</Paragraph> <Paragraph position="7"> Class numbers are informative insofar as the classes are ranked by decreasing probability. Lower-ranked classes tend (i) not to be dominated by one nucleus; (ii) to contain vowels with relatively low frequency of occurrence; and (iii) to yield less clear patterns in terms of word class, stress or position. For illustration, class #46 (Figure 2) represents the syllable ent [Ent], both as a prefix (INI) and as a suffix (FIN), the former being unstressed (as in Entwurf 'design') and the latter stressed (as in Dirigent 'conductor').</Paragraph> <Paragraph position="8"> English. In 24 out of the 50 syllable classes obtained for English one dominant nucleus per syllable class is observed. In all of these cases the probability of the nucleus is larger than 99%, and in 7 classes the nucleus probability is 100%. Besides several diphthongs, only the relatively infrequent vowels /V/, /A:/ and /3:/ do not dominate any class. 
Figure 3 shows the classes that are described below.</Paragraph> <Paragraph position="9"> High-frequency function words are represented by 10 syllable classes. For example, classes #0 and #17 are dominated by the determiners the and a, respectively, and class #1 contains function words that involve the short vowel /I/, such as in, is, it, his, if, its.</Paragraph> <Paragraph position="10"> Productive word-forming suffixes are found in class #3 (-ing), and common inflectional suffixes in class #4 (-er, -es, -ed). Class #10 is particularly interesting in that it represents a comparatively large number of common suffixes, such as -tion, -ment, -al, -ant, -ent, -ence and others.</Paragraph> <Paragraph position="11"> The majority of syllable classes, viz. 31 out of 50, contains syllables that are likely to be found in initial, medial and final positions in the open word classes of the lexicon. For example, class #14 represents mostly stressed syllables involving the vowels /eI, A:, e:, O:/ and others, in a variety of syllable positions in nouns, adjectives or verbs.</Paragraph> </Section> <Section position="3" start_page="55" end_page="55" type="sub_section"> <SectionTitle> 4.3 Evaluation by g2p Conversion </SectionTitle> <Paragraph position="0"> In this section, we present a novel method of g2p conversion (i) using a cfg to produce all possible phonemic correspondences of a given grapheme string, (ii) applying a probabilistic syllable model to rank the pronunciation hypotheses, and (iii) predicting pronunciation by choosing the most probable analysis. We used a cfg for generating transcriptions because grammars are expressive and writing grammar rules is easy and intuitive.</Paragraph> <Paragraph position="1"> Our grammar describes how words are composed of syllables and syllables branch into onset, nucleus and coda. These syllable parts are re-written by the grammar as sequences of natural phone classes, e.g. 
stops, fricatives, nasals, liquids, as well as long and short vowels, and diphthongs. The phone classes are then re-interpreted as the individual phonemes that they are made up of. Finally, for each phoneme all possible graphemic correspondences are listed.</Paragraph> <Paragraph position="2"> Figure 6 illustrates two analyses (out of 100) of the German word Lötzinn ('tin solder'). The phoneme strings (represented by non-terminals named phon=...) and the syllable boundaries (represented by the non-terminal Syl) can be extracted from these analyses. Figure 6 depicts both an incorrect analysis [l2:ts][i:n] and its correct counterpart [l2:t][tsIn]. The next step is to rank these transcriptions by assigning probabilities to them. The key idea is to take the product of the syllable probabilities. Using the 5- [...] for the correct one. Thus we achieve the desired result of assigning the higher probability to the correct transcription.</Paragraph> <Paragraph position="3"> We evaluated our g2p system on a test set of 1835 unseen words. The ambiguity, expressed as the average number of analyses per word, was 289. The test set was constructed by collecting 295,102 words from the German Celex dictionary (Baayen et al., 1993) that were not seen in the STZ corpus. From this set we manually eliminated (i) foreign words, (ii) acronyms, (iii) proper names, (iv) verbs, and (v) words with more than three syllables.</Paragraph> <Paragraph position="4"> The resulting test set is available on the World [...] the accuracy of two baseline systems: g2p conversion using the 3- and 5-dimensional empirical distributions (Section 2), respectively. 
The third and fifth columns show the word accuracy of two g2p systems using 3- and 5-dimensional syllable models, respectively. (Position can be derived from the cfg analyses; stress placement is controlled by the most likely distribution.)</Paragraph> <Paragraph position="5"> The g2p system using 5-dimensional syllable models achieved the highest performance (75.3%), which is a gain of 3% over the performance of the 5-dimensional baseline system and a gain of 8% over the performance of the</Paragraph> </Section> </Section> </Paper>