<?xml version="1.0" standalone="yes"?>
<Paper uid="P98-1111">
  <Title>Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS</Title>
  <Section position="3" start_page="675" end_page="675" type="metho">
    <SectionTitle>
2 Features of Spoken Korean
</SectionTitle>
    <Paragraph position="0"> This section briefly explains the linguistic characteristics of spoken Korean before describing the architecture.</Paragraph>
    <Paragraph position="1"> A Korean word (called an eojeol) consists of one or more morphemes with clear-cut morpheme boundaries (Korean is an agglutinative language). Korean is a postpositional language with many kinds of noun-endings, verb-endings, and prefinal verb-endings. These functional morphemes determine a noun's case role, a verb's aspect and tense, modality, and the modification relations between words. The unit of pause in speech (the phrase break) usually differs from the unit used in written text. No phonological change occurs across a phrase break. Phonological changes can occur within a morpheme, between morphemes in a word, and even between words within a phrase, as described in the 30 general phonological rules for Korean (Korean Ministry of Education, 1995). These changes include consonant and vowel assimilation, dissimilation, insertion, deletion, and contraction. For example, the noun "kag-ryo" (meaning "cabinet"), pronounced "kangnyo", illustrates phonological change within a morpheme. The noun plus noun-ending "such+gwa", in which "such" means "charcoal" and "gwa" means "and", is pronounced "sudggwa", an example of phonological change between morphemes. "Ta-seos gae", which means "five items", is pronounced "taseot ggae", an example of phonological change between words. In addition, phonological changes are conditioned not only on morphotactic environments but also on phonotactic environments.</Paragraph>
  </Section>
  <Section position="4" start_page="675" end_page="676" type="metho">
    <SectionTitle>
3 Architecture of the Grapheme-to-Phoneme Converter
</SectionTitle>
    <Paragraph position="0"> Part-of-speech (POS) tagging is a basic step in grapheme-to-phoneme conversion because phonological changes depend on morphotactic and phonotactic environments. The POS tagging system has to handle out-of-vocabulary (OOV) words for accurate grapheme-to-phoneme conversion of unlimited vocabulary (Bechet and El-Beze, 1997). Figure 1 shows the architecture of our grapheme-to-phoneme converter integrated with the hybrid POS tagging system (Lee et al., 1997). The hybrid POS tagging system employs generalized OOV word handling mechanisms in morphological analysis, and cascades statistical and rule-based approaches in a two-phase training architecture for POS disambiguation.</Paragraph>
    <Paragraph position="1"> [Figure 1: Architecture of the grapheme-to-phoneme converter in TTS applications] Each morpheme tagged by the POS tagger is normalized by replacing non-Korean symbols with Korean graphemes, which expands numbers, abbreviations, and acronyms. The phrase-break detector segments the POS sequence into phrases according to phrase-break detection rules. In the phoneme converter, each morpheme in a phrase is converted into phoneme sequences by consulting the morpheme phonetic dictionary. OOV morphemes, which are not registered in the morpheme phonetic dictionary, are processed in two different ways: the graphemes at morpheme boundaries are converted into phonemes by consulting the morpheme phonetic pattern dictionary, and the graphemes within morphemes are converted into phonemes according to the CCV conversion rules. To model which phonemes can connect across morpheme boundaries, a separate phoneme connectivity table encodes the phonological changes between morphemes together with their POS tags. The output of the grapheme-to-phoneme converter, that is, the phoneme sequence of the input sentence, can be fed directly to the lower-level signal processing module of a TTS system. The next section describes each component of the grapheme-to-phoneme converter in detail. The hybrid POS tagging system is not explained in this paper; interested readers are referred to (Lee et al., 1997).</Paragraph>
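The routing described above can be sketched as follows. This is a minimal sketch rather than the authors' code: the function and type names are hypothetical, the two OOV steps are passed in as stand-in functions, and treating the first and last grapheme as the boundary graphemes (as in the "teud" example of Section 4.3) is an assumption.

```python
from typing import Callable

# A candidate pronunciation: (phoneme string, left connectivity, right connectivity).
Candidate = tuple[str, str, str]
# A boundary option: (first phoneme, last phoneme, left connectivity, right connectivity).
BoundaryOption = tuple[str, str, str, str]


def convert_morpheme(
    morpheme: str,
    pos: str,
    dictionary: dict[tuple[str, str], list[Candidate]],
    convert_boundary: Callable[[str, str, str], list[BoundaryOption]],
    convert_inner: Callable[[str], str],
) -> list[Candidate]:
    """Return every candidate pronunciation of one POS-tagged morpheme."""
    # Registered morpheme: take its candidates straight from the dictionary.
    if (pos, morpheme) in dictionary:
        return dictionary[(pos, morpheme)]
    # OOV morpheme (assumed to have at least two graphemes): split it into
    # boundary graphemes and the inner graphemes between them.
    first, inner, last = morpheme[0], morpheme[1:-1], morpheme[-1]
    middle = convert_inner(inner)  # CCV step (hypothetical stand-in)
    return [
        (head + middle + tail, left, right)
        for head, tail, left, right in convert_boundary(pos, first, last)  # pattern step
    ]
```

Keeping the pattern-dictionary and CCV steps as parameters leaves the routing independent of how those resources are stored.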
  </Section>
  <Section position="5" start_page="676" end_page="678" type="metho">
    <SectionTitle>
4 Component Descriptions of the Converter
</SectionTitle>
    <Paragraph position="0"/>
    <Section position="1" start_page="676" end_page="676" type="sub_section">
      <SectionTitle>
4.1 Morpheme Normalization
</SectionTitle>
      <Paragraph position="0"> The normalization replaces non-Korean symbols with the corresponding Korean graphemes. Non-Korean symbols include numbers (e.g. 54, 12, 5,400, 4.2), dates (e.g. 20/1/97, 20-Jan97), times (e.g. 12:46), scores (e.g. 74:64), mathematical expressions (e.g. 4+5, 1/3), telephone numbers, abbreviations (e.g. km, ha), and acronyms (e.g. UNESCO, OECD). Acronyms are of two types: spelled acronyms such as OECD, and acronyms pronounced as words, such as UNESCO.</Paragraph>
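A minimal sketch of how tokens could be sorted into these categories by format, assuming the regular expressions and category names below (they are illustrative, not the authors' rules). Times and scores share the same colon format, so the sketch separates them by value, treating a valid hour and minute as a time; that heuristic is also an assumption.

```python
import re

# Ordered (category, pattern) pairs; the first full match wins.
# All patterns are illustrative assumptions, not the paper's actual rules.
PATTERNS = [
    ("telephone", r"\d{2,3}-\d{3,4}-\d{4}"),               # e.g. 02-123-4567
    ("date", r"\d{1,2}/\d{1,2}/\d{2,4}"),                  # e.g. 20/1/97
    ("date", r"\d{1,2}-[A-Z][a-z]{2}\d{2}"),               # e.g. 20-Jan97
    ("clock", r"(\d{1,3}):(\d{1,3})"),                     # time or score, resolved below
    ("expression", r"\d+(?:\.\d+)?[-+*/]\d+(?:\.\d+)?"),   # e.g. 4+5, 1/3
    ("number", r"\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?"),  # e.g. 5,400, 4.2
    ("acronym", r"[A-Z]{2,}"),                             # e.g. OECD, UNESCO
    ("abbreviation", r"[a-z]{1,4}"),                       # e.g. km, ha
]


def classify(token: str) -> str:
    """Return the normalization category of a non-Korean token."""
    for category, pattern in PATTERNS:
        m = re.fullmatch(pattern, token)
        if m is None:
            continue
        if category == "clock":
            # Use the values: a valid hour:minute pair is read as a time,
            # anything else (such as 74:64) as a score.
            hour, minute = int(m.group(1)), int(m.group(2))
            return "time" if hour in range(24) and minute in range(60) else "score"
        return category
    return "other"
```

The value check cannot resolve every case: a score such as 12:46 would still be read as a time, which is one reason the text pairs formats with values rather than relying on either alone.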
      <Paragraph position="1"> Numbers are converted into the corresponding Korean graphemes using deterministic finite automata. Dates, times, scores, expressions, and telephone numbers are converted into equivalent graphemes based on their formats and values. Abbreviations and acronyms are registered in the morpheme phonetic dictionary and converted into phonemes by the morpheme-to-phoneme conversion module.</Paragraph>
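As an illustration of number expansion, the sketch below reads a non-negative integer as Sino-Korean numeral syllables. It is an assumption-laden stand-in: a positional reading replaces the authors' deterministic finite automata, only integers below 10^8 are handled (no decimals), and the syllables use an ad hoc romanization.

```python
# Sino-Korean digit names and place units, in an assumed romanization.
DIGITS = ["", "il", "i", "sam", "sa", "o", "yuk", "chil", "pal", "gu"]
PLACES = ["", "sip", "baek", "cheon"]  # 10^0, 10^1, 10^2, 10^3


def read_group(n: int) -> list[str]:
    """Read 0..9999 as syllables; a digit 1 is silent before sip, baek, or cheon."""
    syllables = []
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        if not (digit == 1 and power > 0):
            syllables.append(DIGITS[digit])
        if power > 0:
            syllables.append(PLACES[power])
    return syllables


def read_number(token: str) -> str:
    """Expand a non-negative integer token such as '5,400' into hyphen-joined syllables."""
    n = int(token.replace(",", ""))
    if n == 0:
        return "yeong"
    if n >= 100_000_000:
        raise ValueError("this sketch stops below 10^8 (eok)")
    man, rest = divmod(n, 10_000)
    syllables = []
    if man:
        # A leading 1 before man (10^4) is conventionally dropped: 10,000 is just "man".
        syllables += (read_group(man) if man != 1 else []) + ["man"]
    syllables += read_group(rest)
    return "-".join(syllables)
```

Skipping the digit 1 before sip, baek, cheon, and man is what makes 12 come out as sip-i rather than il-sip-i.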
    </Section>
    <Section position="2" start_page="676" end_page="676" type="sub_section">
      <SectionTitle>
4.2 Phrase-Break Detection
</SectionTitle>
      <Paragraph position="0"> Phrase-break boundaries are important for subsequent processing such as morpheme-to-phoneme conversion and prosodic feature generation. Graphemes at phrase-break boundaries undergo no phonological change and are pronounced as their original corresponding phonemes in Korean.</Paragraph>
      <Paragraph position="1"> A number of different algorithms have been proposed and implemented for phrase-break detection (Black and Taylor, 1997). The simplest algorithms use deterministic rules, while more complicated ones can use syntactic and even semantic knowledge. We designed simple rules using a corpus annotated with phrase breaks and POS tags. We found that, in Korean, the average phrase length is 5.6 words and over 90% of breaks occur after six POS tags: conjunctive ending, auxiliary particle, case particle, other particle, adverb, and adnominal ending. The phrase-break detector assigns breaks after these six POS tags, taking phrase length into account.</Paragraph>
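A minimal sketch of such a rule-based detector, assuming hypothetical tag names for the six POS categories and a simple minimum-length threshold `min_len`; the text says only that phrase length is taken into account, so the exact criterion here is an assumption.

```python
# Hypothetical labels for the six POS categories the text reports as
# preceding over 90% of phrase breaks.
BREAK_TAGS = {
    "conjunctive_ending",
    "auxiliary_particle",
    "case_particle",
    "other_particle",
    "adverb",
    "adnominal_ending",
}


def detect_phrase_breaks(words, min_len=3):
    """Group POS-tagged words (eojeols) into phrases.

    `words` is a list of words, each a list of (morpheme, tag) pairs. A break is
    placed after a word whose last morpheme carries a break tag, but only once
    the current phrase has at least `min_len` words. `min_len` is an assumed
    stand-in for the paper's unspecified phrase-length criterion.
    """
    phrases, current = [], []
    for word in words:
        current.append(word)
        last_tag = word[-1][1]
        if last_tag in BREAK_TAGS and len(current) >= min_len:
            phrases.append(current)
            current = []
    if current:  # words after the last break form the final phrase
        phrases.append(current)
    return phrases
```

Raising `min_len` trades break recall for longer phrases; tuning it against the reported 5.6-word average would be one way to set it.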
    </Section>
    <Section position="3" start_page="676" end_page="677" type="sub_section">
      <SectionTitle>
4.3 Morpheme-to-Phoneme Conversion
</SectionTitle>
      <Paragraph position="0"> The morphemes registered in the morpheme phonetic dictionary can be converted directly into phonemes by consulting the dictionary entries. However, OOV morphemes, which are not registered in the dictionary, require a separate method. We developed a new method, shown in Figure 2.</Paragraph>
      <Paragraph position="1"> The morpheme phonetic dictionary contains the POS tag, the morpheme, the phoneme connectivity (left and right), and the phoneme sequence for each entry. We try to register as few morphemes as possible, so the dictionary contains only morphemes that are difficult to process with the OOV morpheme conversion modules described next. Table 1 shows example entries for the common noun "pang-gabs", meaning "price of a room" in hotel reservation dialogs. The common noun "pang-gabs" can be pronounced "pang-ggam", "pang-ggab", or "pang-ggabss" depending on the first phoneme of the following morpheme. To handle OOV morphemes, we developed a morpheme phonetic pattern dictionary that contains all the general patterns of Korean POS tags, morphemes, phoneme connectivity, and phoneme sequences. By consulting this pattern dictionary, the boundary graphemes of an OOV morpheme can be converted into their candidate phonemes, and the phonological connectivity of those candidates can be obtained.
Table 1: Example entries of the morpheme phonetic dictionary
POS tag | morpheme | phoneme sequence | left connectivity | right connectivity
common noun | pang-gabs | pang-ggam | 'p' no change | 'bs' changed to 'm'
common noun | pang-gabs | pang-ggab | 'p' no change | 'bs' changed to 'b'
common noun | pang-gabs | pang-ggabss | 'p' no change | 'bs' changed to 'bss'
Example entries for the irregular verb "teud", meaning "hear", are shown in Table 2. The meta characters 'Z', 'Y', 'V', and '*' designate, respectively, a single consonant, a consonant other than the silent phoneme, a vowel, and a character sequence of any length. The table shows that the first grapheme 't' can be phonologically changed to 'tt' depending on the last phoneme of the preceding morpheme (left connectivity), and the last grapheme 'd' can be changed to 'n' depending on the first phoneme of the following morpheme (right connectivity).</Paragraph>
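Table 1 suggests a dictionary keyed by POS tag and morpheme whose value lists every pronunciation together with its connectivity. In this minimal sketch the class and function names are hypothetical; all three "pang-gabs" rows are returned, leaving the choice among them to the phoneme connectivity check, and an empty result signals an OOV morpheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of the morpheme phonetic dictionary (field names are assumptions)."""
    pos: str
    morpheme: str
    phonemes: str
    left: str   # change at the left boundary
    right: str  # change at the right boundary


ENTRIES = [
    Entry("common noun", "pang-gabs", "pang-ggam", "'p' no change", "'bs' changed to 'm'"),
    Entry("common noun", "pang-gabs", "pang-ggab", "'p' no change", "'bs' changed to 'b'"),
    Entry("common noun", "pang-gabs", "pang-ggabss", "'p' no change", "'bs' changed to 'bss'"),
]

# Index the rows by (POS tag, morpheme); one key can hold several candidates.
DICTIONARY: dict[tuple[str, str], list[Entry]] = {}
for entry in ENTRIES:
    DICTIONARY.setdefault((entry.pos, entry.morpheme), []).append(entry)


def lookup(pos: str, morpheme: str) -> list[Entry]:
    """Return all registered pronunciations; an empty list marks an OOV morpheme."""
    return DICTIONARY.get((pos, morpheme), [])
```

Keying on the POS tag as well as the morpheme keeps homographs with different tags apart.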
      <Paragraph position="2"> The morpheme phonetic pattern dictionary contains 1,992 such entries, which model the general phonological rules of Korean.</Paragraph>
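The meta characters 'Z', 'Y', 'V', and '*' can be implemented by compiling each pattern into a regular expression. In this minimal sketch the grapheme inventory (Hangul compatibility jamo), the treatment of ㅇ as the silent phoneme in every position, and the function names are assumptions; decomposing syllables into jamo is not shown.

```python
import re

# Hangul compatibility jamo (assumed grapheme inventory; the paper's is not listed).
SINGLE_CONSONANTS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
SILENT = "ㅇ"  # treated as the silent phoneme everywhere, a simplification
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"

# Meta characters from the text, each translated into a regular-expression piece.
META = {
    "Z": "[" + SINGLE_CONSONANTS + "]",                      # single consonant
    "Y": "[" + SINGLE_CONSONANTS.replace(SILENT, "") + "]",  # consonant except silent
    "V": "[" + VOWELS + "]",                                 # vowel
    "*": ".*",                                               # any sequence
}


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a pattern such as 'Y*ㄷ' into a compiled regular expression."""
    return re.compile("".join(META.get(ch, re.escape(ch)) for ch in pattern))


def matches(pattern: str, graphemes: str) -> bool:
    """True if the whole jamo string fits the pattern."""
    return compile_pattern(pattern).fullmatch(graphemes) is not None
```

Any other character in a pattern is matched literally, so fixed graphemes and meta characters can be mixed freely.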
      <Paragraph position="3"> The graphemes within an OOV morpheme are converted into phonemes using the CCV conversion rules. These are grapheme-to-phoneme mapping rules over character trigrams of the form consonant (C), consonant (C), vowel (V) spanning two consecutive syllables, that is, the final consonant of one syllable followed by the first consonant and the vowel of the next.</Paragraph>
      <Paragraph position="4"> The CCV rules are designed to reflect the following Korean phonological facts and are learned automatically from a corpus.</Paragraph>
      <Paragraph position="5"> * Korean is a syllable-based language; that is, the syllable is the basic unit of written Korean and consists of a first consonant, a vowel, and a final consonant (CVC).</Paragraph>
      <Paragraph position="6"> * The number of consonants in a syllable can change during grapheme-to-phoneme conversion.</Paragraph>
      <Paragraph position="7"> * The number of vowels in a syllable does not change.</Paragraph>
      <Paragraph position="8"> * Phonological changes of the first consonant are affected only by the final consonant of the preceding syllable and the following vowel of the same syllable.</Paragraph>
      <Paragraph position="9"> * Phonological changes of the final consonant are affected only by the first consonant of the following syllable.</Paragraph>
      <Paragraph position="10"> * Phonological changes of the vowel are not affected by the following consonant.</Paragraph>
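The facts above suggest one way CCV rules could be learned from a corpus. Because every syllable has exactly one vowel and the number of vowels does not change, a spelling and its pronunciation can be aligned syllable by syllable, and each adjacent pair yields one CCV trigram with its observed output. This sketch assumes a corpus of (spelling, pronunciation) pairs written as Hangul syllables, keeps the most frequent output per trigram, and respells the result in Hangul; the paper does not describe its learning procedure at this level, so this is an illustration, not the authors' method.

```python
from collections import Counter, defaultdict

# Unicode Hangul syllable layout: 19 initials x 21 vowels x 28 finals from U+AC00.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]


def decompose(syllable: str) -> tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, vowel, final)."""
    index = ord(syllable) - 0xAC00
    if index not in range(11172):
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    return INITIALS[index // 588], VOWELS[index % 588 // 28], FINALS[index % 28]


def compose(initial: str, vowel: str, final: str) -> str:
    """Inverse of decompose: build a syllable from its three graphemes."""
    return chr(0xAC00 + (INITIALS.index(initial) * 21 + VOWELS.index(vowel)) * 28
               + FINALS.index(final))


def learn_ccv_rules(corpus):
    """Learn (final C, next initial C, next vowel V) -> (final C', next initial C')."""
    counts = defaultdict(Counter)
    for spelling, pronunciation in corpus:
        g = [decompose(s) for s in spelling]
        p = [decompose(s) for s in pronunciation]
        for i in range(len(g) - 1):
            key = (g[i][2], g[i + 1][0], g[i + 1][1])    # C C V trigram
            counts[key][(p[i][2], p[i + 1][0])] += 1     # observed C C output
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def apply_ccv_rules(rules, spelling: str) -> str:
    """Convert a morpheme's graphemes; unseen trigrams are left unchanged."""
    g = [decompose(s) for s in spelling]
    out = [list(s) for s in g]
    for i in range(len(g) - 1):
        key = (g[i][2], g[i + 1][0], g[i + 1][1])
        out[i][2], out[i + 1][0] = rules.get(key, (g[i][2], g[i + 1][0]))
    return "".join(compose(*s) for s in out)
```

Only trigrams seen in training are changed, so coverage depends entirely on the corpus; a real system would also need a policy for unseen contexts.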
      <Paragraph position="11"> The boundary graphemes of OOV morphemes are phonologically changed according to the POS tag and the boundary graphemes of the preceding and following morphemes. In contrast, inner grapheme conversion is affected not by the POS tag but only by the adjacent graphemes within the same morpheme. The CCV conversion rules model this directly, whereas the conventional CC conversion rules (Park and Kwon, 1995) cannot model the influence of vowels.</Paragraph>
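The advantage of CCV over CC rules shows up when the same observations are indexed both ways. In this sketch the observations are standard Korean examples chosen for illustration, and the CC table is a simplified stand-in rather than the rules of Park and Kwon (1995). A final ㄷ before a silent initial ㅇ becomes ㅈ before ㅣ (palatalization, 굳이 to 구지) but stays ㄷ before ㅏ (liaison, 받아 to 바다), so the consonant-pair table receives conflicting outputs while the CCV table stays deterministic.

```python
from collections import defaultdict

# Observed (final C, next initial C, next vowel V) -> (final C', next initial C').
OBSERVATIONS = [
    (("ㄷ", "ㅇ", "ㅣ"), ("", "ㅈ")),    # 굳이 -> 구지 (palatalization)
    (("ㄷ", "ㅇ", "ㅏ"), ("", "ㄷ")),    # 받아 -> 바다 (liaison)
    (("ㅌ", "ㅇ", "ㅣ"), ("", "ㅊ")),    # 같이 -> 가치 (palatalization)
    (("ㅌ", "ㅇ", "ㅏ"), ("", "ㅌ")),    # 같아 -> 가타 (liaison)
    (("ㄱ", "ㄴ", "ㅐ"), ("ㅇ", "ㄴ")),  # 국내 -> 궁내 (nasalization)
]


def build_tables(observations):
    """Index the same observations by CC context and by CCV context."""
    cc, ccv = defaultdict(set), defaultdict(set)
    for (c1, c2, v), output in observations:
        cc[(c1, c2)].add(output)
        ccv[(c1, c2, v)].add(output)
    return cc, ccv


def conflicts(table):
    """Contexts that map to more than one output."""
    return sorted(key for key, outputs in table.items() if len(outputs) > 1)
```

Every CC conflict here disappears once the following vowel is part of the key.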
    </Section>
    <Section position="4" start_page="677" end_page="678" type="sub_section">
      <SectionTitle>
4.4 Phoneme Connectivity Check
</SectionTitle>
      <Paragraph position="0"> To verify that boundary phonemes can connect to one another, a separate phoneme connectivity table encodes the phonologically connectable pairs for each morpheme whose boundary graphemes are phonologically changed.</Paragraph>
      <Paragraph position="1"> This phoneme connectivity table specifies the grammatical sound combinations in Korean phonology using the defined left and right connectivity information.</Paragraph>
      <Paragraph position="2"> The morpheme-to-phoneme conversion can generate many phoneme sequence candidates for a single morpheme. We place all the candidates in a phoneme graph, from which the correct phoneme sequence path for the input sentence can be selected. The phoneme connectivity check performs this selection by pruning the ungrammatical phoneme sequences in the graph.</Paragraph>
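The phoneme graph and the connectivity check can be sketched as a lattice search: each morpheme contributes a column of candidates, and an edge between adjacent candidates survives only if the connectivity table licenses the pair (right connectivity of the left candidate, left connectivity of the right candidate). The connectivity labels, the table contents, and returning every surviving path (the text does not say how remaining ties are broken) are assumptions of this minimal sketch.

```python
from itertools import product

# Candidate = (phoneme string, left connectivity, right connectivity).
Candidate = tuple[str, str, str]


def grammatical_paths(
    columns: list[list[Candidate]],
    table: set[tuple[str, str]],
) -> list[list[Candidate]]:
    """Search the phoneme graph column by column.

    `columns[i]` holds the candidates of the i-th morpheme (at least one column
    is assumed). An edge between adjacent candidates is kept only if
    (right connectivity of the left one, left connectivity of the right one)
    appears in the connectivity table; paths through a pruned edge are dropped
    as soon as that edge is reached.
    """
    paths = [[cand] for cand in columns[0]]
    for column in columns[1:]:
        paths = [
            path + [cand]
            for path, cand in product(paths, column)
            if (path[-1][2], cand[1]) in table
        ]
    return paths
```

With the three "pang-gabs" candidates from Table 1 and a following morpheme labelled as nasal-initial, a table containing ("bs-to-m", "nasal") leaves only the "pang-ggam" path.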
    </Section>
  </Section>
</Paper>