Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 890–897,
Sydney, July 2006. ©2006 Association for Computational Linguistics
Sinhala Grapheme-to-Phoneme Conversion and  
Rules for Schwa Epenthesis 
 
 
Asanka Wasala, Ruvan Weerasinghe and Kumudu Gamage 
Language Technology Research Laboratory 
University of Colombo School of Computing 
35, Reid Avenue, Colombo 07, Sri Lanka 
{awasala,kgamage}@webmail.cmb.ac.lk, arw@ucsc.cmb.ac.lk 
 
  
 
Abstract 
This paper describes an architecture to
convert Sinhala Unicode text into a phonemic
specification of pronunciation. The study
focused mainly on disambiguating the epenthesis
of schwa /ə/ and the vowel /a/ for consonants,
which is one of the significant problems found
in Sinhala. This problem is addressed by
formulating a set of rules. The proposed rules
were tested on 30,000 distinct words obtained
from a corpus, and the output was compared with
manual phonemic transcriptions of the same
words by an expert. The Grapheme-to-Phoneme
(G2P) conversion model achieves 98% accuracy.
1 Introduction 
Text-to-Speech (TTS) conversion involves many
important processes, which can be divided into
three main parts: text analysis, linguistic
analysis and waveform generation (Black and
Lenzo, 2003). The text analysis process is
responsible for converting non-textual content
into text. This process also involves
tokenization and normalization of the
text. The identification of words or chunks of 
text is called text-tokenization. Text normaliza-
tion establishes the correct interpretation of the 
input text by expanding the abbreviations and 
acronyms. This is done by replacing the non-
alphabetic characters, numbers, and punctuation 
with appropriate text strings depending on the 
context. The linguistic analysis process involves
finding the correct pronunciation of words and
assigning prosodic features (e.g. phrasing,
intonation, stress) to the phonemic string to be spoken.
The final process of a TTS system is waveform 
generation which involves the production of an 
acoustic digital signal using a particular synthesis 
approach such as formant synthesis, articulatory 
synthesis or waveform concatenation (Lemmetty, 
1999). The text analysis and linguistic analysis 
processes together are known as the Natural 
Language Processing (NLP) component, while 
the waveform generation process is known as the 
Digital Signal Processing (DSP) component of a 
TTS System (Dutoit, 1997). 
Finding the correct pronunciation for a given
word is one of the first and most significant tasks 
in the linguistic analysis process. The component 
which is responsible for this task in a TTS sys-
tem is often named the Grapheme-To-Phoneme 
(G2P), Text-to-Phone or Letter-To-Sound (LTS) 
conversion module. This module accepts a word 
and generates the corresponding phonemic tran-
scription. Further, this phonemic transcription 
can be annotated with appropriate prosodic 
markers (syllables, accents, stress, etc.) as well.
In this paper, we describe the implementation 
and evaluation of a G2P conversion model for a 
Sinhala TTS system. A Sinhala TTS system is 
being developed based on Festival, the open 
source speech synthesis framework. For most
Sinhala letters, letter-to-sound conversion is a
simple one-to-one mapping between orthography
and phonemic transcription. However, some G2P
conversion rules are proposed in this paper to
make the generated phonemic transcription more
accurate.
The rest of this paper is organized as follows: 
Section 2 gives an overview of the Sinhala
phonemic inventory and the Sinhala writing system.
Section 3 briefly discusses G2P conversion
approaches. Section 4 describes the schwa
epenthesis issue peculiar to Sinhala, and Section 5
explains the Sinhala G2P conversion architecture.
Section 6 gives experimental results and
discusses them. The work is summarized in the
final section.
2 Sinhala Phonemic Inventory and 
Writing System 
2.1 The Sinhala Phonemic Inventory 
Sinhala is the official language of Sri Lanka and
the mother tongue of the majority (74%) of its
population. Spoken Sinhala contains 40 segmental
phonemes: 14 vowels and 26 consonants, as
classified in Table 1 and Table 2
(Karunatillake, 2004).
There are two nasalized vowels, each in a short
and a long form, occurring in two or three words
in Sinhala: /a~/, /a~:/, /æ~/ and /æ~:/
(Karunatillake, 2004). Spoken Sinhala also has
the following diphthongs: /iu/, /eu/, /æu/, /ou/,
/au/, /ui/, /ei/, /æi/, /oi/ and /ai/
(Disanayaka, 1991).
 
        Front          Central        Back
        Short  Long    Short  Long    Short  Long
High    i      i:                     u      u:
Mid     e      e:      ə      ə:      o      o:
Low     æ      æ:      a      a:

Table 1. Spoken Sinhala Vowel Classification.
 
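Places of articulation (column order): Lab., Den., Alv., Ret., Pal., Vel., Glo.
Each row lists its phonemes in column order; empty cells are omitted.

Stops, voiceless              p    t    ʈ    k
Stops, voiced                 b    d    ɖ    g
Affricates, voiceless         c
Affricates, voiced            ɟ
Pre-nasalized voiced stops    b~   d~   ɖ~   g~
Nasals                        m    n    ɲ    ŋ
Trill                         r
Lateral                       l
Spirants                      f    s    ʃ    h
Semivowels                    w    j

Table 2*. Spoken Sinhala Consonant Classification.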
 
A separate sign for the vowel /ə/ is not provided by
the Sinhala writing system. In terms of
distribution, the vowel /ə/ does not occur at the
beginning of a syllable except in the conjugational
variants of verbs formed from the verbal stem
/kərə/ (to do). In contrast, although the letter
“ඦ”, which symbolizes the consonant sound /ɟ~/,
exists, it is not considered a phoneme in Sinhala.

* Lab. – Labial, Den. – Dental, Alv. – Alveolar, Ret. –
Retroflex, Pal. – Palatal, Vel. – Velar and Glo. – Glottal.
2.2 The Sinhala Writing System 
The Sinhala character set has 18 vowels, and 42 
consonants as shown in Table 3. 
 
Vowels and corresponding vowel modifiers 
(within brackets): 
අ   ආ (◌ා)  ඇ (◌ැ)  ඈ (◌ෑ)  ඉ (◌ි)  ඊ (◌ී)  උ (◌ු)  ඌ (◌ූ)  ඍ (◌ෘ)
ඎ (◌ෲ)  ඏ (◌ෟ)  ඐ (◌ෳ)  එ (◌ෙ)  ඒ (◌ේ)  ඓ (◌ෛ)
ඔ (◌ො)  ඕ (◌ෝ)  ඖ (◌ෞ)
 
Consonants: 
ක  ඛ  ග  ඝ  ඞ  ඟ  ච  ඡ  ජ  ඣ  ඤ  ඦ  ට  ඨ  ඩ  ඪ  ණ  ඬ  ත  ථ ද  
ධ  න  ඳ  ප  ඵ  බ  භ  ම  ඹ  ය  ර  ල  ව  ශ  ෂ  ස  හ  ළ  ෆ  ◌ං  ◌ඃ   
 
Special symbols: ◌ null    ◌null      null   ඥ  
Inherent vowel remover (Hal marker): ◌්
 
Table 3. Sinhala Character Set. 
 
Sinhala characters are written left to right in
horizontal lines. Words are generally delimited
by a space. Vowels have corresponding
full-character forms when they appear in the
absolute initial position of a word. In other
positions, they appear as ‘strokes’ attached to
consonants, that is, as vowel modifiers. All
vowels except “ඎ” /iru:/ can occur in
word-initial position (Disanayaka, 1995). The
vowels /ə/ and /ə:/ occur only in loan words of
English origin. Since there are no special
symbols to represent them, the vowel “අ” is
frequently used to symbolize them
(Karunatillake, 2004).
All consonants occur in word-initial position
except /ŋ/ and nasals (Disanayaka, 1995). The
symbols “ණ” and “ළ” represent the retroflex
nasal /ɳ/ and the retroflex lateral /ɭ/
respectively, but they are pronounced as their
respective alveolar counterparts “න”-/n/ and
“ල”-/l/. Similarly, the symbol “ෂ”, representing
the retroflex sibilant /ʂ/, is pronounced as the
palatal sibilant “ශ”-/ʃ/. The aspirated
counterparts of the letters ක, ග, ච, ජ, ට, ඩ, ත, ද,
ප, බ, namely ඛ, ඝ, ඡ, ඣ, ඨ, ඪ, ථ, ධ, ඵ, භ
respectively, are pronounced like the
corresponding unaspirated letters
(Karunatillake, 2004). When consonants are
combined with /r/ or /j/, special conjunct
symbols are used. “ර්”-/r/ immediately following
a consonant can be marked by the symbol “◌්‍ර”
added to the bottom of the preceding consonant.
Similarly, “ය්”-/j/ immediately following a
consonant can be marked by the symbol “◌්‍ය”
added to the right-hand side of the preceding
consonant (Karunatillake, 2004). “ඏ” /ilu/ and
“ඐ” /ilu:/ do not occur in contemporary Sinhala
(Disanayaka, 1995). Though there are 60 symbols
in Sinhala (Disanayaka, 1995), only 42 symbols
are necessary to represent spoken Sinhala
(Karunatillake, 2004).
3 G2P Conversion Approaches 
The issue of mapping textual content into pho-
nemic content is highly language dependent. 
The three main approaches to G2P conversion are
the use of a pronunciation dictionary, the use of
well-defined language-dependent rules, and
data-driven methods (El-Imam and Don, 2005).
One of the easiest ways of G2P conversion is 
the use of a lexicon or pronunciation dictionary. 
A lexicon consists of a large list of words to-
gether with their pronunciation. There are several 
limitations to the use of lexicons. It is practically
impossible to construct a lexicon that covers the
whole vocabulary of a language, owing to Zipfian
phenomena. Even if a large lexicon is constructed,
one would face other limitations such as efficient
access and memory storage. Most lexicons
do not include many proper names, and only 
very few provide pronunciations for abbrevia-
tions and acronyms. Only a few lexicons provide 
distinct entries for morphological productions of 
words. In addition, pronunciations of some 
words differ based on the context and their parts-
of-speech. Further, an enormous effort has to be 
made to develop a comprehensive lexicon. In 
practical scenarios, speech synthesizers as well 
as speech recognizers need to be able to produce 
the pronunciation of words that are not in the 
lexicon. Names, morphological productivity and 
numbers are the three most important cases that 
cause the use of lexica to be impractical (Juraf-
sky and Martin, 2000).  
To overcome these difficulties, rules can be
specified for mapping letters to phonemes.
In this way, the size of the lexicon can be
reduced so that it contains only the exceptions
to the rules. In contrast, some systems
rely on using very large lexicons, together with a 
set of letter-to-sound conversion rules to deal 
with words which are not found in the lexicon 
(Black and Lenzo, 2003). 
These language and context dependent rules 
are formulated using phonetic and linguistic 
knowledge of a particular language. The com-
plexity of devising a set of rules for a particular 
language is dependent on the degree of corre-
spondence between graphemes and phonemes. 
For some languages such as English and French, 
the relationship is complex and requires a large
number of rules (El-Imam and Don, 2005;
Damper et al., 1998), while some languages such 
as Urdu (Hussain, 2004), and Hindi (Ramakish-
nan et al., 2004; Choudhury, 2003) show regular 
behavior and thus pronunciation can be modeled 
by defining fairly regular simple rules. 
Data-driven methods are widely used to avoid 
tedious manual work involving the above ap-
proaches. In these methods, G2P rules are cap-
tured by means of various machine learning 
techniques based on a large amount of training 
data. Most previous data-driven approaches have 
been used for English. Widely used data-driven 
approaches include Pronunciation by Analogy
(PbA), Neural Networks (Damper et al., 1998), 
and Finite-State-Machines (Jurafsky and Martin, 
2000). Black et al. (1998) discussed a method for 
building general letter-to-sound rules suitable for 
any language, based on training a CART
decision tree.
4 Schwa Epenthesis in Sinhala 
G2P conversion problems encountered in Sinhala
are similar to those encountered in Hindi
(Ramakishnan et al., 2004). All consonant
graphemes in Sinhala are associated with an
inherent vowel, schwa /ə/ or /a/, which is not
represented in orthography. Vowels other than
/ə/ and /a/ are represented in orthographic text
by placing specific vowel modifier diacritics
around the consonant grapheme. In the absence of
any vowel modifier for a particular consonant
grapheme, it is ambiguous whether /ə/ or /a/
should be associated with it. The inherent vowel
association in Sinhala differs from that in
Hindi: in Hindi the only possible association is
the schwa vowel, whereas in Sinhala either the
vowel /a/ or schwa /ə/ can be associated with a
consonant. Native Sinhala speakers are naturally
capable of choosing the appropriate vowel
(/ə/ or /a/) in context. Moreover, linguistic
rules describing the G2P transformation are
rarely found in the literature, and the available
literature does not provide any precise procedure
suitable for G2P conversion of contemporary
Sinhala. Automating the G2P conversion process
is therefore a difficult task, owing to the
ambiguity of choosing between /ə/ and /a/.
A similar phenomenon is observed in Hindi 
and Malay as well. In Hindi, the “deletion of the 
schwa vowel (in some cases)” is successfully 
solved by using rule based algorithms (Choud-
hury 2003; Ramakishnan et al., 2004). In Malay, 
the character ‘e’ can be pronounced as either 
vowel /e/ or /ə/, and rule-based algorithms are
used to address this ambiguity (El-Imam and 
Don, 2005). 
In our research, a set of rules is proposed to
disambiguate the epenthesis of /a/ and /ə/ when
associating vowels with consonants. Unlike in
Hindi, in Sinhala the schwa is not deleted but
always inserted. Hence, this process is named
“Schwa Epenthesis” in this paper.
5 Sinhala G2P Conversion Architecture 
An architecture is proposed to convert Sinhala 
Unicode text into phonemes encompassing a set 
of rules to handle schwa epenthesis. The G2P 
architecture developed for Sinhala is identical to 
the Hindi G2P architecture (Ramakishnan et al., 
2004). The input to the system is normalized 
Sinhala Unicode text. The G2P engine first maps 
all characters in the input word into correspond-
ing phonemes by using the letter-to-phoneme 
mapping table below (Table 4).  
 
අ          /a/      ඔ, ◌ො    /o/      ඬ        /ɖ~/     ෆ      /f/
ආ, ◌ා      /a:/     ඕ, ◌ෝ    /o:/     ත, ථ     /t/      ◌ෲ     /ru:/
ඇ, ◌ැ      /æ/      ඖ, ◌ෞ    /ou/     ද, ධ     /d/
ඈ, ◌ෑ      /æ:/     ක, ඛ     /k/      ඳ        /d~/
ඉ, ◌ි      /i/      ග, ඝ     /g/      ප, ඵ     /p/
ඊ, ◌ී      /i:/     ඞ, ◌ං    /ŋ/      බ, භ     /b/
උ, ◌ු      /u/      ඟ        /g~/     ම        /m/
ඌ, ◌ූ      /u:/     ච, ඡ     /c/      ඹ        /b~/
ඍ          /ri/     ජ, ඣ     /ɟ/      ය        /j/
◌ෘ         /ru/     ඤ        /ɲ/      ර        /r/
ඏ          /ilu/    ඥ        /jɲ/     ල, ළ     /l/
ඐ          /ilu:/   ඦ        /ɟ~/     ව        /w/
එ, ◌ෙ      /e/      ට, ඨ     /ʈ/      ශ, ෂ     /ʃ/
ඒ, ◌ේ      /e:/     ඩ, ඪ     /ɖ/      ස        /s/
ඓ, ◌ෛ      /ai/     න, ණ     /n/      හ, ◌ඃ    /h/

Table 4. G2P Mapping Table
 
The mapping procedure is given in section 5.1.
Then, a set of rules is applied to this phonemic
string in a specific order to obtain a more
accurate version. This phonemic string is then
compared with the entries in the exception
lexicon. If a matching entry is found, the correct
pronunciation of the text is obtained from the
lexicon; otherwise the resultant phonemic string
is returned. Hence, the final output of the G2P
model is the phonemic transcription of the input text.
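The flow described above (letter mapping, ordered rules, then an exception-lexicon lookup) can be sketched as follows. This is only an illustrative outline: the function name `g2p`, its parameters and the toy stages are assumptions made for the example, not the authors' implementation. The single lexicon entry reuses the /ammə/ -> /amma:/ case reported in Section 6.

```python
from typing import Callable, Dict, List


def g2p(word: str,
        map_letters: Callable[[str], str],
        rules: List[Callable[[str], str]],
        exception_lexicon: Dict[str, str]) -> str:
    """Hypothetical driver: map letters, apply rules in order, consult lexicon."""
    phonemes = map_letters(word)      # step 1: table lookup (Table 4)
    for rule in rules:                # step 2: rules in their fixed order
        phonemes = rule(phonemes)
    # step 3: a matching lexicon entry overrides the rule-based result
    return exception_lexicon.get(phonemes, phonemes)


# Toy stand-ins so the sketch runs; the real stages are described in 5.1 and 5.2.
def identity(s: str) -> str:
    return s


lexicon = {"ammə": "amma:"}

print(g2p("ammə", identity, [], lexicon))      # lexicon entry is used
print(g2p("nəmjəji", identity, [], lexicon))   # no entry: rule output is returned
```

Keying the lexicon on the rule output follows the description above, where the phonemic string (not the orthographic word) is compared with the lexicon entries.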
5.1 G2P Mapping Procedure 
Each tokenized word, represented in Unicode
normalization form, is analyzed grapheme by
grapheme from left to right. The corresponding
phonemes are obtained from the G2P mapping
table (Table 4). As in the example in Figure 1,
no mappings are required for the
Zero-Width-Joiner or for the diacritic Hal marker
“◌්” (Halant), which is used to remove the
inherent vowel of a consonant.
 
  
  
Figure 1. G2P Mapping (Example). 
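A minimal sketch of this mapping step is given below. The table is a four-entry excerpt of Table 4, and the example word is an assumed spelling chosen to be consistent with the phonemic string /nəmjəji/ reported later in this section; it is not taken from Figure 1. The names `G2P_TABLE` and `map_graphemes` are hypothetical, and the choice of NFC as the normalization form is also an assumption.

```python
import unicodedata

ZWJ = "\u200D"   # Zero-Width-Joiner
HAL = "\u0DCA"   # Hal marker (inherent vowel remover)

# Small excerpt of Table 4, keyed by code point (assumed subset for the demo).
G2P_TABLE = {
    "\u0DB1": "n",  # න
    "\u0DB8": "m",  # ම
    "\u0DBA": "j",  # ය
    "\u0DD2": "i",  # vowel modifier for ඉ
}


def map_graphemes(word: str) -> list:
    """Map each code point to its phoneme, skipping ZWJ and the Hal marker."""
    phonemes = []
    for ch in unicodedata.normalize("NFC", word):
        if ch in (ZWJ, HAL):
            continue                # these carry no phoneme of their own
        phonemes.append(G2P_TABLE[ch])
    return phonemes


# Assumed example word: න + ම + Hal + ZWJ + ය + ය + vowel modifier for ඉ
word = "\u0DB1\u0DB8\u0DCA\u200D\u0DBA\u0DBA\u0DD2"
print(map_graphemes(word))
```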
 
The next step is the epenthesis of schwa /ə/ for
consonants. In Sinhala, a consonant is far more
likely to be associated with /ə/ than with the
vowel /a/. Therefore, initially, all plausible
consonants are associated with /ə/. To obtain the
accurate pronunciation, the assigned /ə/ is
altered to /a/ or vice versa by applying the set
of rules given in the next section. However, /ə/
should be associated only with consonant
graphemes, excluding “◌ං”, “ඞ” and “◌ඃ”, that
carry neither a vowel modifier nor the diacritic
Hal marker. In the above example, only /n/ and
the first /j/ are associated with schwa, because
the other consonants violate this principle. When
schwa is associated with the appropriate
consonants, the resultant phonemic string for the
given example (section 5.1) is /nəmjəji/.
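The default schwa assignment can be sketched as below, under stated assumptions: consonants are recognised by their Unicode range (U+0D9A to U+0DC6), vowel modifiers by the ranges U+0DCF to U+0DDF plus U+0DF2 and U+0DF3, and the example word is the same assumed spelling as before, not a reproduction of Figure 1. All function and variable names are hypothetical.

```python
HAL = "\u0DCA"                                   # Hal marker
ZWJ = "\u200D"                                   # Zero-Width-Joiner
NO_INHERENT_VOWEL = {"\u0D82", "\u0D9E", "\u0D83"}   # ං, ඞ, ඃ

TABLE = {"\u0DB1": "n", "\u0DB8": "m", "\u0DBA": "j", "\u0DD2": "i"}


def is_consonant(ch: str) -> bool:
    return "\u0D9A" <= ch <= "\u0DC6"


def is_vowel_modifier(ch: str) -> bool:
    return "\u0DCF" <= ch <= "\u0DDF" or ch in ("\u0DF2", "\u0DF3")


def map_with_default_schwa(word: str) -> str:
    """Map graphemes and attach schwa to every bare consonant."""
    out = []
    for i, ch in enumerate(word):
        if ch in (HAL, ZWJ):
            continue
        out.append(TABLE[ch])
        nxt = word[i + 1] if i + 1 < len(word) else ""
        bare = nxt != HAL and not is_vowel_modifier(nxt)
        if is_consonant(ch) and ch not in NO_INHERENT_VOWEL and bare:
            out.append("ə")          # default choice, revised later by the rules
    return "".join(out)


word = "\u0DB1\u0DB8\u0DCA\u200D\u0DBA\u0DBA\u0DD2"
print(map_with_default_schwa(word))
```

For the assumed word this yields the string given above, /nəmjəji/: only the first consonant and the first ය are bare.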
5.2 G2P Conversion Rules 
It is observed that resultant phoneme strings 
from the above procedure should undergo several 
modifications in terms of schwa assignments into 
vowel /a/ or vice versa, in order to obtain the ac-
curate pronunciation of a particular word. 
Guided by the literature (Karunatillake, 2004), it 
was noticed that these modifications can be car-
ried out by formulating a set of rules.  
The G2P rules were formulated with the aid of 
phonological rules described in the linguistic 
literature (Karunatillake, 2004) and by a com-
prehensive word search analysis using the UCSC 
Sinhala corpus BETA (2005). Some of these ex-
isting phonological rules were altered in order to 
reflect the observations made in the corpus word 
analysis and to achieve more accurate results. 
The proposed new set of rules is empirically 
shown to be effective and can be conveniently 
implemented using regular expressions. 
Each rule given below is applied from left to 
right, and the presented order of the rules is to be 
preserved. Except for rule #1, rule #5, rule #6
and rule #8, all rules are applied repeatedly
to a single word until the conditions
presented in the rules are satisfied.
Rule #1: If the nucleus of the first syllable is a
schwa, the schwa should be replaced by the vowel
/a/ (Karunatillake, 2004), except in the following
situations:
(a) The syllable starts with /s/ followed by /v/
(i.e. /sv/).
(b) The first syllable starts with /k/, where /k/
is followed by /ə/ and /ə/ is in turn followed
by /r/ (i.e. /kər/).
(c) The word consists of a single syllable having
a CV structure (e.g. /də/).
Rule #2:
(a) If /r/ is preceded by any consonant, followed
by /ə/ and subsequently followed by /h/, then /ə/
should be replaced by /a/.
(/[consonant]rəh/ -> /[consonant]rah/)
(b) If /r/ is preceded by any consonant, followed
by /ə/ and subsequently followed by any
consonant other than /h/, then /ə/ should be
replaced by /a/.
(/[consonant]rə[!h]/ -> /[consonant]ra[!h]/)
(c) If /r/ is preceded by any consonant, followed
by /a/ and subsequently followed by any
consonant other than /h/, then /a/ should be
replaced by /ə/.
(/[consonant]ra[!h]/ -> /[consonant]rə[!h]/)
(d) If /r/ is preceded by any consonant, followed
by /a/ and subsequently followed by /h/, then /a/
is retained.
(/[consonant]ra[h]/ -> /[consonant]ra[h]/)
Rule #3: If any vowel in the set {/a/, /e/, /æ/, /o/,
/ə/} is followed by /h/ and /h/ is in turn
followed by schwa, then the schwa should be
replaced by the vowel /a/.
Rule #4: If schwa is followed by a consonant
cluster, the schwa should be replaced by /a/
(Karunatillake, 2004).
Rule #5: If /ə/ is followed by the word-final
consonant, it should be replaced by /a/, except
where the word-final consonant is /r/,
/b/, /ɖ/ or /ʈ/.
Rule #6: At the end of a word, if schwa precedes
the phoneme sequence /ji/, the schwa should be
replaced by /a/ (Karunatillake, 2004).
Rule #7: If /k/ is followed by schwa, and the
subsequent phonemes are /r/ or /l/ followed by
/u/, then the schwa should be replaced by the
phoneme /a/ (i.e. /kə(r|l)u/ -> /ka(r|l)u/).
Rule #8: Within the context of the following
words, the /a/ found in the phoneme sequence
/kal/ (left-hand side of the arrow) should be
changed to /ə/, as shown on the right-hand side.
• /kal(a:|e:|o:)y/ -> /kəl(a:|e:|o:)y/
• /kale(m|h)(u|i)/ -> /kəle(m|h)(u|i)/
• /kaləh(u|i)/ -> /kəleh(u|i)/
• /kalə/ -> /kələ/
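As noted above, the rules lend themselves to regular expressions. The sketch below covers only rules #1, #6 and #7, applied in that order; it is an illustration under simplifying assumptions, not the authors' implementation. In particular, it treats the first vowel of the word as the nucleus of the first syllable, and it assumes phoneme strings written in the plain notation used in this paper (length as ":", pre-nasalization as "~"). All function names are hypothetical.

```python
import re

# One consonant phoneme in the paper's notation, e.g. "k" or pre-nasalized "b~".
CONSONANT = r"[^aeiouæə:~]~?"
FIRST_VOWEL = re.compile(r"[aeiouæə]")


def rule1(p: str) -> str:
    """Schwa in the first syllable becomes /a/, barring exceptions (a)-(c)."""
    if p.startswith("sv") or p.startswith("kər"):    # exceptions (a) and (b)
        return p
    if re.fullmatch(CONSONANT + "ə", p):             # exception (c): CV word
        return p
    m = FIRST_VOWEL.search(p)
    if m and m.group() == "ə":
        return p[:m.start()] + "a" + p[m.end():]
    return p


def rule6(p: str) -> str:
    """Word-final /əji/ becomes /aji/."""
    return re.sub(r"əji$", "aji", p)


def rule7(p: str) -> str:
    """/kə(r|l)u/ becomes /ka(r|l)u/."""
    return re.sub(r"kə([rl])u", r"ka\1u", p)


def apply_rules(p: str) -> str:
    for rule in (rule1, rule6, rule7):   # order follows the rule numbering
        p = rule(p)
    return p


for word in ("nəmjəji", "də", "kəru"):
    print(word, "->", apply_rules(word))
```

The remaining rules would slot into the same loop between these three; they are omitted here to keep the sketch short.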
The above rules handle the schwa epenthesis 
problem. The corresponding diphthongs (refer 
section 2) are then obtained by processing the 
resultant phonetized string. This string is again 
analyzed from left to right, and the phoneme se-
quences given in the first column of Table 5 are 
replaced by the diphthong, represented in the 
second column. 
 
Phoneme Sequence    Diphthong
/i/ /w/ /u/         /iu/
/e/ /w/ /u/         /eu/
/æ/ /w/ /u/         /æu/
/o/ /w/ /u/         /ou/
/a/ /w/ /u/         /au/
/u/ /j/ /i/         /ui/
/e/ /j/ /i/         /ei/
/æ/ /j/ /i/         /æi/
/o/ /j/ /i/         /oi/
/a/ /j/ /i/         /ai/
 
Table 5. Diphthong Mapping Table. 
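The replacements in Table 5 follow two patterns (a vowel followed by /w/ /u/, and a vowel followed by /j/ /i/), so they can be sketched with two substitutions. The function name `merge_diphthongs` is hypothetical, and the input is assumed to be a phoneme string in the plain notation used in this paper.

```python
import re


def merge_diphthongs(p: str) -> str:
    """Collapse the phoneme sequences of Table 5 into diphthongs."""
    p = re.sub(r"([ieæoa])wu", r"\1u", p)   # /iu/ /eu/ /æu/ /ou/ /au/
    p = re.sub(r"([ueæoa])ji", r"\1i", p)   # /ui/ /ei/ /æi/ /oi/ /ai/
    return p


print(merge_diphthongs("namjaji"))
```

Because each pattern requires a short vowel immediately before /w/ or /j/, a long vowel such as /a:/ followed by /w/ /u/ is left unchanged, which matches the sequences listed in the table.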
 
The application of the above rules for the 
given example (section 5.1) is illustrated in Fig-
ure 2. 
 
 
Figure 2. Application of G2P Rules – An Example.
6  Results and Discussion 
Text obtained from the category “News Paper >
Feature Articles > Other” of the UCSC Sinhala
corpus was chosen for testing because of the
heterogeneous nature of these texts and hence
the perceived better representation of the
language in this part of the corpus*. A list of
distinct words was first extracted, and the
30,000 most frequently occurring words were
chosen for testing.
The overall accuracy of our G2P module was 
calculated at 98%, in comparison with the same 
words correctly transcribed by an expert.  
Since this is the first known documented work 
on implementing a G2P scheme for Sinhala, its 
contribution to the existing body of knowledge is 
difficult to evaluate. However, an experiment 
was conducted in order to arrive at an approxi-
mation of the scale of this contribution. 
It was first necessary to define a baseline
against which this work could be measured.
While this could be done by giving a single
default letter-to-sound mapping for any Sinhala
letter, owing to the near-universal application of
rule #1 in Sinhala words (22,766 of the 30,000
words used in testing), the baseline was defined
by the application of this rule in addition to the
‘default mapping’. This baseline gives an error
of approximately 24%. Since the proposed
solution reduces this error to 2%, this work can
claim to have improved accuracy by 22
percentage points.
An error analysis revealed the following types 
of errors (Table 6): 
 
Error description                                        # of words
Compound words (i.e. single words formed by              382
combining 2 or more distinct words, such as
the English word “thereafter”)
Foreign (mainly English) words directly encoded          116
in Sinhala, e.g. ෆැෂන් - fashion, කැම්පස් - campus
Other                                                    118

Table 6. Types of Errors.
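The figures in Table 6 can be cross-checked against the reported accuracy with a few lines of arithmetic. The sketch below uses only the counts given in the table and the test-set size of 30,000 words; the variable names are arbitrary.

```python
TOTAL_WORDS = 30_000
errors = {"compound": 382, "foreign": 116, "other": 118}   # counts from Table 6

total_errors = sum(errors.values())
error_rate = 100 * total_errors / TOTAL_WORDS
compound_foreign_rate = (
    100 * (errors["compound"] + errors["foreign"]) / TOTAL_WORDS
)

print(f"errors: {total_errors} ({error_rate:.2f}% of words)")
print(f"accuracy: {100 - error_rate:.2f}%")
print(f"compound + foreign: {compound_foreign_rate:.2f}%")
```

The total error rate comes to roughly 2%, in line with the 98% accuracy reported above, and compound plus foreign words account for 1.66% of the test words.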
 
The errors categorized as “Other” are given
below with clarifications:
• The modifier used to denote the long vowel
“ආ” /a:/ is “◌ා”, which is known as
“Aela-pilla”. e.g. the consonant “ක්” /k/
combines with “◌ා” /a:/ to produce the
grapheme “කා”, which is pronounced as /ka:/.
The analysis revealed 37 words that end
without the vowel modifier “◌ා” but are
usually pronounced with the associated long
vowel /a:/. In the following examples, each
input word is listed first, followed by the
erroneous output of G2P conversion and the
correct transcription.
“අම්ම” (mother) -> /ammə/ -> /amma:/
“අක්ක” (sister) -> /akkə/ -> /akka:/
“ගත්ත” (taken) -> /gattə/ -> /gatta:/
• There were 27 erroneous conversions of
words containing the letter “හ”, which
corresponds to the phoneme /h/. The study
revealed that this letter shows unusual
behavior in G2P conversion.
• The modifier used to denote the vowel “ඍ”,
“◌ෘ”, is known as “Geta-pilla”. When this
vowel appears as the initial letter of a word,
it is pronounced as /ri/, as in “ඍණ” /rinə/
(minus). When the corresponding vowel
modifier appears in the middle of a word, it
is most often pronounced as /ru/
(Disanayaka, 2000). e.g. “කෘතිය” (book) is
pronounced as /krutijə/, “පෘෂ්ඨය” (surface)
as /pruʃʈəjə/ and “උත්කෘෂ්ට” (excellent) as
/utkruʃʈə/. However, 13 words were found to
be exceptions to this general rule. In those
words, “◌ෘ” is pronounced as /ur/ rather
than /ru/. e.g. “ප්‍රවෘත්ති” (news) -
/prəwurti/, “සමෘද්ධි” (prosperity) -
/samurdi/, “විවෘත” (opened) - /wiwurtə/.
• In general, the vowel modifiers “◌ැ”
(Adha-pilla) and “◌ෑ” (Diga Adha-pilla)
symbolize the vowels “ඇ” /æ/ and “ඈ” /æ:/
respectively. e.g. the consonant “ක්” /k/
combines with the vowel modifier “◌ැ” to
create “කැ”, which is pronounced as /kæ/. A
few words were found where this rule is
violated. In such words, the vowel modifiers
“◌ැ” and “◌ෑ” represent the vowels “උ” /u/
and “ඌ” /u:/ respectively. e.g. “ජනශ්‍රැති”
(legend) - /ɟanəʃruti/, “ක්‍රෑර” (cruel) -
/kru:rə/.
• The verbal stem “කර” (to do) is pronounced
as /kərə/. Though there are many words
starting with the same verbal stem, there are
a few other words pronounced differently,
as /karə/ or /kara/. e.g. “කරත්තය” (cart)
/karattəyə/, “කරවල” (dried fish) /karəvələ/.
• A few of the remaining errors are due to
homographs: “වන” - /vanə/, /vənə/; “කල” -
/kalə/, /kələ/; “කර” - /karə/, /kərə/.

* This accounts for almost two-thirds of the size of this
version of the corpus.
The above error analysis itself shows that the 
model can be extended. Failures in the current 
model are mostly due to compound words and 
foreign words directly encoded in Sinhala 
(1.66%). The accuracy of the G2P model can be 
increased significantly by incorporating a 
method to identify compound words and tran-
scribe them accurately. If the constituent words 
of a compound word can be identified and sepa-
rated, the same set of rules can be applied for 
each constituent word, and the resultant pho-
netized strings combined to obtain the correct 
pronunciation. The same problem is observed in 
the Hindi language too. Ramakishnan et al. 
(2004) proposed a procedure for extracting
compound words from a Hindi corpus. The use
of a compound word lexicon in their rule-based
G2P conversion module improved the accuracy
of G2P conversion by 1.6% (Ramakishnan
et al., 2004). In our architecture, the most
frequently occurring compound words and
foreign words are handled with the aid of an
exceptions lexicon. Homographs are also
disambiguated using the most frequently
occurring words in Sinhala. Future improvements
of the architecture will include the incorporation
of a compound word identification and
phonetization module.
7 Conclusion 
In this paper, the problem of Sinhala grapheme-
to-phoneme conversion is addressed with a spe-
cial focus on dealing with the schwa epenthesis. 
The proposed G2P conversion mechanism will 
be useful in various applications in the speech 
domain. To the best of our knowledge no other 
documented evidence has been reported for Sin-
hala grapheme-to-phoneme conversion in the 
literature. There are no other approaches available
for the transcription of Sinhala text that would
provide a platform for comparison with the proposed
rule-based method. The empirical evidence from 
a wide spectrum Sinhala corpus indicates that the 
proposed model can account for nearly 98% of 
cases accurately. 
The proposed G2P module is fully implemented
in the Sinhala TTS system being developed at the
Language Technology Research Lab, UCSC. A
demonstration tool of the proposed G2P module,
integrated with the Sinhala syllabification
algorithm proposed by Weerasinghe et al. (2005),
is available for download from:
http://www.ucsc.cmb.ac.lk/ltrl/downloads.html 
Acknowledgement 
This work has been supported through the PAN 
Localization Project, (http://www.PANL10n.net) 
grant from the International Development Re-
search Center (IDRC), Ottawa, Canada, adminis-
tered through the Center for Research in Urdu 
Language Processing, National University of 
Computer and Emerging Sciences, Pakistan. The 
authors would like to thank Sinhala Language 
scholars Prof. R.M.W. Rajapaksha, and Prof. J.B. 
Dissanayake for their invaluable support and ad-
vice throughout the study. Special thanks to Dr. 
Sarmad Hussain (NUCES, Pakistan) for his 
guidance and advice. We also wish to acknowledge
the contribution of Mr. Viraj Welgama, Mr.
Dulip Herath, and Mr. Nishantha Medagoda of 
Language Technology Research Laboratory of 
the University of Colombo School of Comput-
ing, Sri Lanka. 
References 
Alan W. Black and Kevin A. Lenzo. 2003. Building 
Synthetic Voices, Language Technologies Insti-
tute, Carnegie Mellon University and Cepstral 
LLC. Retrieved from http://festvox.org/bsv/ 
Alan W. Black, Kevin Lenzo, and Vincent Pagel. 
1998. Issues in Building General Letter to Sound 
Rules. In Proc. of the 3rd ESCA Workshop on 
Speech Synthesis, pages 77–80. 
Monojit Choudhury. 2003. Rule-Based Grapheme to 
Phoneme Mapping for Hindi Speech Synthesis, 
presented at the 90th Indian Science Congress 
of the International Speech Communication 
Association (ISCA), Bangalore. 
R.I. Damper, Y. Marchand, M.J. Adamson and K. 
Gustafson. 1998. Comparative Evaluation of Let-
ter-to-Sound Conversion Techniques for English 
Text-to-Speech Synthesis. In Proc. Third 
ESCA/COCOSDA Workshop on Speech Synthesis,
pages 53-58, Blue Mountains, NSW, Australia.
J.B. Disanayaka. 1991. The Structure of Spoken 
Sinhala, National Institute of Education, Ma-
haragama.  
J.B. Disanayaka. 2000. Basaka Mahima: 2, Akuru 
ha pili, S. Godage & Bros., 661, P. D. S. 
Kularathna Mawatha, Colombo 10. 
J.B. Disanayaka. 1995. Grammar of Contemporary 
Literary Sinhala - Introduction to Grammar, 
Structure of Spoken Sinhala, S. Godage & Bros., 
661, P. D. S. Kularathna Mawatha, Colombo 10. 
T. Dutoit. 1997.  An Introduction to Text-to-
Speech Synthesis, Kluwer Academic Publishers, 
Dordrecht,  Netherlands. 
Yousif A. El-Imam and Zuraidah M. Don. 2005. 
Rules and Algorithms for Phonetic Transcription of 
Standard Malay, IEICE Trans Inf & Syst, E88-D 
2354-2372. 
Sarmad Hussain. 2004. Letter-to-Sound Conversion 
for Urdu Text-to-Speech System, Proceedings of 
Workshop on "Computational Approaches to 
Arabic Script-based Languages," COLING 
2004, p. 74-49, Geneva, Switzerland. 
Daniel Jurafsky and James H. Martin. 2000. Speech 
and Language Processing: An Introduction to 
Natural Language Processing, Computational 
Linguistics, and Speech Recognition. Pearson 
Education (Singapore) Pte. Ltd, Indian Branch, 482 
F.I.E. Patparganj, Delhi 110 092, India. 
W.S. Karunatillake. 2004. An Introduction to Spoken
Sinhala, 3rd edn., M.D. Gunasena & Co. Ltd.,
217, Olcott Mawatha, Colombo 11.
Sami Lemmetty. 1999. Review of Speech Synthesis 
Technology, MSc. thesis, Helsinki University of 
Technology. 
A.G. Ramakishnan, Kalika Bali, Partha Pratim Taluk-
dar N. and Sridhar Krishna. 2004. Tools for the 
Development of a Hindi Speech Synthesis System, 
In 5th ISCA Speech Synthesis Workshop, Pitts-
burgh. pages 109-114. 
Ruvan Weerasinghe, Asanka Wasala and Kumudu
Gamage. 2005. A Rule Based Syllabification
Algorithm for Sinhala, Proceedings of the 2nd
International Joint Conference on Natural Language
Processing (IJCNLP-05), p. 438-449, Jeju Island,
Korea.
UCSC Sinhala Corpus BETA. 2005. Retrieved Au-
gust 30, 2005, from University of Colombo School 
of Computing, Language Technology Research 
Laboratory Web site: 
http://www.ucsc.cmb.ac.lk/ltrl/downloads.html 
 
