<?xml version="1.0" standalone="yes"?>
<Paper uid="P06-2114">
  <Title>Sinhala Grapheme-to-Phoneme Conversion and Rules for Schwa Epenthesis</Title>
  <Section position="3" start_page="0" end_page="890" type="intro">
    <SectionTitle>
1 Introduction
</SectionTitle>
    <Paragraph position="0"> Text-to-Speech (TTS) conversion involves several important processes. These processes can be divided into three main parts: text analysis, linguistic analysis and waveform generation (Black and Lenzo, 2003). The text analysis process is responsible for converting non-textual content into text. This process also involves tokenization and normalization of the text. The identification of words or chunks of text is called text tokenization. Text normalization establishes the correct interpretation of the input text by expanding abbreviations and acronyms, and by replacing non-alphabetic characters, numbers, and punctuation with appropriate text strings depending on the context. The linguistic analysis process involves finding the correct pronunciation of words and assigning prosodic features (e.g., phrasing, intonation, stress) to the phonemic string to be spoken. The final process of a TTS system is waveform generation, which involves the production of an acoustic digital signal using a particular synthesis approach such as formant synthesis, articulatory synthesis or waveform concatenation (Lemmetty, 1999). The text analysis and linguistic analysis processes together are known as the Natural Language Processing (NLP) component, while the waveform generation process is known as the Digital Signal Processing (DSP) component of a TTS system (Dutoit, 1997).</Paragraph>
    <Paragraph position="1"> Finding the correct pronunciation for a given word is one of the first and most significant tasks in the linguistic analysis process. The component responsible for this task in a TTS system is often named the Grapheme-to-Phoneme (G2P), Text-to-Phone or Letter-to-Sound (LTS) conversion module. This module accepts a word and generates the corresponding phonemic transcription. This phonemic transcription can further be annotated with appropriate prosodic markers (syllables, accents, stress, etc.).</Paragraph>
    <Paragraph position="2"> In this paper, we describe the implementation and evaluation of a G2P conversion model for a Sinhala TTS system. The Sinhala TTS system is being developed on Festival, the open source speech synthesis framework. For most Sinhala letters, letter-to-sound conversion is a simple one-to-one mapping between orthography and phonemic transcription.</Paragraph>
    <Paragraph position="3"> However, this paper proposes some G2P conversion rules that complement this mapping in order to generate more accurate phonemic transcriptions.</Paragraph>
    <Paragraph position="4"> The rest of this paper is organized as follows. Section 2 gives an overview of the Sinhala phonemic inventory and the Sinhala writing system. Section 3 briefly discusses G2P conversion approaches. Section 4 describes the schwa epenthesis issue peculiar to Sinhala, and Section 5 explains the Sinhala G2P conversion architecture.</Paragraph>
    <Paragraph position="5"> Section 6 presents the experimental results and discusses them. The work is summarized in the final section.</Paragraph>
  </Section>
</Paper>